floki

A simple HTML parser that enables searching for nodes using CSS-like selectors.


```elixir
Floki.find(document, "p.headline")
# => [{"p", [{"class", "headline"}], ["Floki"]}]

document
|> Floki.find("p.headline")
|> Floki.raw_html()
# => <p class="headline">Floki</p>
```


Each HTML node is represented by a tuple like:

    {tag_name, attributes, children_nodes}

Example of node:

    {"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.
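
Because nodes are plain tuples, they can be taken apart with ordinary pattern matching. The following is a minimal sketch that uses a hand-written node rather than one returned by Floki; the variable names are illustrative only.

```elixir
# A node in the tuple format described above, written by hand (no Floki call needed).
node = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the three positions of the tuple.
{tag_name, attributes, children} = node

# Attributes are {name, value} pairs, so List.keyfind/3 can look one up by name.
{"class", class_value} = List.keyfind(attributes, "class", 0)

# Text children are plain binaries; element children would be nested tuples.
text_children = Enum.filter(children, &is_binary/1)

IO.inspect({tag_name, class_value, text_children})
# prints {"p", "headline", ["Floki"]}
```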

#### Using `html5ever` as the HTML parser

This dependency is written with a NIF using [Rustler](https://github.com/rusterlium/rustler), but
you don't need to install anything to compile it thanks to [RustlerPrecompiled](https://hexdocs.pm/rustler_precompiled/).

First, add `html5ever` to your dependencies:

```elixir
defp deps do
  [
    {:floki, "~> 0.36.0"},
    {:html5ever, "~> 0.15.0"}
  ]
end
```

Run `mix deps.get` and compile the project with `mix compile` to make sure it works.

Then you need to configure your app to use `html5ever`:

```elixir
# in config/config.exs

config :floki, :html_parser, Floki.HTMLParser.Html5ever
```

Note that you can also pass the HTML parser as an option to `parse_document/2` and `parse_fragment/2`.
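
As a rough sketch of the per-call option, assuming Floki is already available in your project (for example inside `iex -S mix`): the example passes `Floki.HTMLParser.Mochiweb`, the built-in parser, so it does not depend on any extra package. Replacing it with `Floki.HTMLParser.Html5ever` should work the same way once that dependency is installed, although the resulting tree may differ between parsers.

```elixir
html = ~s(<p class="headline">Floki</p>)

# Choose the parser for this call only, without touching config/config.exs.
# Floki.HTMLParser.Mochiweb is the built-in parser; swap in another parser
# module here once its dependency is installed.
{:ok, document} = Floki.parse_document(html, html_parser: Floki.HTMLParser.Mochiweb)

# The same option is accepted by parse_fragment/2.
{:ok, fragment} = Floki.parse_fragment(html, html_parser: Floki.HTMLParser.Mochiweb)

IO.inspect(document)
IO.inspect(fragment)
```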

#### Using `fast_html` as the HTML parser

A C compiler, GNU Make, and CMake need to be installed on the system in order to compile lexbor.

First, add `fast_html` to your dependencies:

```elixir
defp deps do
  [
    {:floki, "~> 0.36.0"},
    {:fast_html, "~> 2.0"}
  ]
end
```

Run `mix deps.get` and compile the project with `mix compile` to make sure it works.

Then you need to configure your app to use `fast_html`:

```elixir
# in config/config.exs

config :floki, :html_parser, Floki.HTMLParser.FastHtml
```

## More about Floki API

To parse an HTML document, try:

```elixir
html = """
  <html>
  <body>
    <div class="example"></div>
  </body>
  </html>
"""

{:ok, document} = Floki.parse_document(html)
# => {:ok, [{"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}]}
```

To find elements with the class `example`, try:

```elixir
Floki.find(document, ".example")
# => [{"div", [{"class", "example"}], []}]
```

To convert your node tree back to raw HTML (spaces are ignored):

```elixir
document
|> Floki.find(".example")
|> Floki.raw_html()
# => <div class="example"></div>
```

To fetch some attribute from elements, try:

```elixir
Floki.attribute(document, ".example", "class")
# => ["example"]
```

You can get attributes from elements that you already have:

```elixir
document
|> Floki.find(".example")
|> Floki.attribute("class")
# => ["example"]
```

If you want to get the text from an element, try:

```elixir
document
|> Floki.find(".headline")
|> Floki.text()
# => "Floki"
```

## Suppressing log messages

Floki may log debug messages about problems parsing selectors or the HTML tree. It may also log "info" messages about deprecated APIs. To suppress these log messages, consider setting the `:compile_time_purge_matching` option for `:logger` in your compile-time configuration.

See https://hexdocs.pm/logger/Logger.html#module-compile-configuration for details.
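
A configuration along these lines might look like the sketch below. The `:warning` threshold is only an assumption for illustration; pick whichever level suits your app. Since the option takes effect at compile time, Floki probably needs to be recompiled afterwards (for example with `mix deps.compile floki --force`).

```elixir
# in config/config.exs
import Config

# Purge Floki's log calls below :warning at compile time.
# The threshold is an illustrative choice, not a recommendation from Floki.
config :logger,
  compile_time_purge_matching: [
    [application: :floki, level_lower_than: :warning]
  ]
```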
