# => [{"p", [{"class", "headline"}], ["Floki"]}]
document |> Floki.find("p.headline") |> Floki.raw_html()
# => <p class="headline">Floki</p>
Each HTML node is represented by a tuple like:

```elixir
{tag_name, attributes, children_nodes}
```

Example of node:

```elixir
{"p", [{"class", "headline"}], ["Floki"]}
```
So even if the only child node is the element text, it is represented inside a list.
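
Because nodes are plain tuples, they can be taken apart with ordinary pattern matching. The following is a minimal sketch that uses only core Elixir (no Floki calls); the variable names are illustrative and not part of Floki's API:

```elixir
# A node in the shape described above: {tag_name, attributes, children_nodes}
html_node = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the tuple with pattern matching.
{tag_name, attributes, children} = html_node

# Attributes are a list of {name, value} tuples, so List.keyfind/3 works on them.
{"class", class} = List.keyfind(attributes, "class", 0)

# The text is a child node, so it sits inside the children list.
[text] = children

IO.inspect({tag_name, class, text})
# => {"p", "headline", "Floki"}
```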
#### Using `html5ever` as the HTML parser
This dependency is written with a NIF using [Rustler](https://github.com/rusterlium/rustler), but
you don't need to install anything to compile it thanks to [RustlerPrecompiled](https://hexdocs.pm/rustler_precompiled/).
First, add `html5ever` to your dependencies:

```elixir
defp deps do
  [
    {:floki, "~> 0.36.0"},
    {:html5ever, "~> 0.15.0"}
  ]
end
```

Run `mix deps.get` and compile the project with `mix compile` to make sure it works.
Then you need to configure your app to use `html5ever`:

```elixir
# in config/config.exs
config :floki, :html_parser, Floki.HTMLParser.Html5ever
```
Note that you can also pass the HTML parser as an option to `parse_document/2` and `parse_fragment/2`.
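
As a sketch of passing the parser per call, the parser module can be given through the `:html_parser` option. The example below is written as a standalone script and uses `Floki.HTMLParser.Mochiweb`, the parser bundled with Floki, so that it runs without extra dependencies. Swapping in `Floki.HTMLParser.Html5ever` assumes the `html5ever` dependency is installed. The `Mix.install/1` call is only there to make the script self-contained and is not needed inside a Mix project.

```elixir
# Only needed when running this as a standalone script outside a Mix project.
Mix.install([{:floki, "~> 0.36.0"}])

html = ~s(<p class="headline">Floki</p>)

# The :html_parser option selects the parser for this call only,
# taking precedence over the parser set in config/config.exs.
{:ok, fragment} = Floki.parse_fragment(html, html_parser: Floki.HTMLParser.Mochiweb)

IO.inspect(fragment)
# => [{"p", [{"class", "headline"}], ["Floki"]}]
```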
#### Using `fast_html` as the HTML parser

A C compiler, GNU Make and CMake need to be installed on the system in order to compile lexbor.
First, add `fast_html` to your dependencies:

```elixir
defp deps do
  [
    {:floki, "~> 0.36.0"},
    {:fast_html, "~> 2.0"}
  ]
end
```

Run `mix deps.get` and compile the project with `mix compile` to make sure it works.
Then you need to configure your app to use `fast_html`:

```elixir
# in config/config.exs
config :floki, :html_parser, Floki.HTMLParser.FastHtml
```
## More about Floki API

To parse an HTML document, try:

```elixir
html = """
<html>
<body>
<div class="example"></div>
</body>
</html>
"""

{:ok, document} = Floki.parse_document(html)
# => {:ok, [{"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}]}
```
To find elements with the class `example`, try:

```elixir
Floki.find(document, ".example")
# => [{"div", [{"class", "example"}], []}]
```
To convert your node tree back to raw HTML (spaces are ignored):

```elixir
document
|> Floki.find(".example")
|> Floki.raw_html()
# => <div class="example"></div>
```
To fetch an attribute from elements, try:

```elixir
Floki.attribute(document, ".example", "class")
# => ["example"]
```
You can also get attributes from elements that you already have:

```elixir
document
|> Floki.find(".example")
|> Floki.attribute("class")
# => ["example"]
```
If you want to get the text from an element, such as the `p.headline` paragraph from the earlier example, try:

```elixir
document
|> Floki.find(".headline")
|> Floki.text()
# => "Floki"
```
## Suppressing log messages

Floki may log debug messages related to problems in the parsing of selectors or of the HTML tree.
It may also log some "info" messages related to deprecated APIs. If you want to suppress these log messages,
consider setting the `:compile_time_purge_matching` option for `:logger` in your compile-time configuration.

See https://hexdocs.pm/logger/Logger.html#module-compile-configuration for details.
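
As a hedged sketch of that setting, the following compile-time configuration asks Logger to purge every log call made from the `:floki` application. The `application: :floki` matcher is an assumption about how you want to scope the purge; you may prefer a narrower rule such as adding `level_lower_than: :warning`. Purging calls from a dependency only takes effect once that dependency is recompiled, for example with `mix deps.compile floki --force`.

```elixir
# in config/config.exs
import Config

# Remove all Logger calls originating from the :floki application at compile time.
config :logger,
  compile_time_purge_matching: [
    [application: :floki]
  ]
```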
## Special thanks

- @arasatasaygin for Floki’s logo from the Open Logos project.