Beaver

Beaver is an LLVM/MLIR toolkit in Elixir and Zig.


Beaver 🦫

Boost the almighty blue-silver dragon with some magical elixir! 🧙🧙‍♀️🧙‍♂️

Goals

  • Powered by Elixir’s composable modularity and meta-programming features, provide a simple, intuitive, and extensible interface for MLIR.
  • An edit-build-test-debug loop measured in seconds. Everything in Elixir and Zig is compiled in parallel.
  • Compile Elixir to native/WASM/GPU with help from MLIR.
  • Revisit and reincarnate symbolic AI in the HW-accelerated world. Erlang/Elixir has Prolog roots!
  • Introduce a new stack to machine learning.
    • Higher-level: Elixir
    • Representation: MLIR
    • Lower-level: Zig

Why is it called Beaver?

The beaver is an umbrella species that increases biodiversity. We hope this project can enable other compilers and applications in the way a beaver pond becomes the habitat of many other creatures. Many Elixir projects also use animal names as their package names, often to raise awareness of endangered species. To read more about why beavers are important to our planet, check out this National Geographic article.

Quick introduction

Beaver is essentially LLVM/MLIR on Erlang/Elixir. It is interesting to see a crossover of two well-established communities and four sub-communities. Here is some brief information about each of them.

For Erlang/Elixir folks

  • Explain this MLIR thing to me in one sentence

MLIR can be regarded as the XML for compilers, and an MLIR dialect acts like the HTTP standard, which gives the generic format real-world semantics and functionality.

For LLVM/MLIR folks

  • What’s so good about this programming language Elixir?

    • It gets compiled to Erlang and runs on BEAM (Erlang’s VM). So it has all the fault-tolerance and concurrency features of Erlang.
    • As a Lisp, Elixir has all the good stuff of a Lisp-y language, including hygienic macros and protocol-based polymorphism.
    • Elixir has a powerful module system to persist compile-time data and this allows library users to easily adjust runtime behavior.
    • Minimal, with very few keywords. Most of the language is built with itself.
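
For readers coming from LLVM/MLIR, here is a small sketch of two of the features listed above, protocol-based polymorphism and hygienic macros, in plain Elixir. The names Describe, Twice, and Demo are hypothetical and are not part of Beaver.

```elixir
# Protocol-based polymorphism: one function, dispatched on the data type.
defprotocol Describe do
  @doc "Returns a short, human-readable description of a value."
  def describe(value)
end

defimpl Describe, for: Integer do
  def describe(n), do: "integer #{n}"
end

defimpl Describe, for: BitString do
  def describe(s), do: "string #{inspect(s)}"
end

defmodule Twice do
  # A hygienic macro: the `tmp` bound here cannot clash with a caller's `tmp`.
  defmacro twice(expr) do
    quote do
      tmp = unquote(expr)
      tmp + tmp
    end
  end
end

defmodule Demo do
  require Twice

  def run do
    tmp = 10
    doubled = Twice.twice(tmp + 1)
    # The caller's `tmp` is untouched by the macro's internal `tmp`.
    {tmp, doubled, Describe.describe(doubled), Describe.describe("hi")}
  end
end
```

Calling Demo.run() evaluates the macro argument once in the caller's scope and dispatches describe/1 on the type of each value.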

Getting started

LLVM/MLIR is a giant project, and Beaver, built around it, has thousands of functions. To properly ship LLVM/MLIR and streamline the development process, we need to carefully split the functionality at different levels into different Erlang apps under the same umbrella.

  • :beaver: Elixir and C/C++ hybrid.
    • Top level app ships the high level functionalities including IR generation and pattern definition.
    • MLIR CAPI wrappers built by parsing the LLVM/MLIR CAPI C headers, plus some middle-level helper functions that hide the C pointer related operations. This app adds the loaded MLIR C library and the managed MLIR context to the Erlang supervision tree. Rust is also used in this app, but mainly for LLVM/MLIR CMake integration.
    • All the Ops defined in stock MLIR dialects, built by querying the registry. This app will ship MLIR Ops with Erlang idiomatic practices like behavior compliance.
  • :kinda: Elixir and Zig hybrid, generating NIFs from MLIR C headers. Repo: https://github.com/beaver-lodge/kinda
  • :manx: Pure Elixir, compiler backend for Nx.

Notes on consuming and development

  • Only :beaver and :kinda are designed to be used as stand-alone apps consumed directly by other apps.
  • :manx works only with Nx.
  • Although :kinda is built for Beaver, any Erlang/Elixir app interested in bundling a C API can take advantage of it as well.
  • The namespace Beaver.MLIR is for standard features that are generally expected in any MLIR tool.
  • The namespace Beaver is for concepts and practices that exist only in Beaver, mostly in a DSL provided as a set of macros (including mlir/0, block/1, defpat/2, etc.). The implementations are usually under the Beaver.DSL namespace.
  • In Beaver, there are no strict requirements on consistency between the Erlang app name and the Elixir module name. Two modules with the same namespace prefix can live in different Erlang apps (this happens a lot in the Beaver.MLIR namespace). Of course, redefining an Elixir module under an identical name should be avoided.

How does it work?

To implement an MLIR toolkit, we need at least these groups of APIs:

  • IR API, to create and update Ops and blocks in the IR
  • Pass API, to create and run passes
  • Pattern API, in which you declare the transformation of a specific structure of Ops

We implement the IR API and Pass API with the help of the MLIR C API. There are both lower-level APIs generated from the C headers and higher-level APIs that are more idiomatic in Elixir. The Pattern API is implemented with the help of the PDL dialect. We use the lower-level IR APIs to compile your Elixir code to PDL. Another way to look at this is that Elixir/Erlang pattern matching serves as an alternative frontend to PDLL.
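
To give a feel for why pattern matching suits the Pattern API, here is a minimal sketch in plain Elixir. It does not use Beaver or PDL: the tuple-based IR and the ToyRewrite module are hypothetical and only show how function clauses can declare a structure of ops together with its replacement.

```elixir
defmodule ToyRewrite do
  # Hypothetical toy IR, not MLIR:
  #   {:constant, integer} | {:arg, name} | {:addi, lhs, rhs}

  # Rewrite bottom-up so operands are simplified before their user.
  def simplify({:addi, lhs, rhs}), do: rewrite({:addi, simplify(lhs), simplify(rhs)})
  def simplify(op), do: op

  # Each clause declares a structure of ops and what replaces it.
  defp rewrite({:addi, x, {:constant, 0}}), do: x
  defp rewrite({:addi, {:constant, 0}, x}), do: x
  defp rewrite({:addi, {:constant, a}, {:constant, b}}), do: {:constant, a + b}
  defp rewrite(op), do: op
end
```

In this toy, ToyRewrite.simplify({:addi, {:arg, :x}, {:constant, 0}}) drops the addition of zero and returns the argument unchanged.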

Design principles

Transformation over builder

It is very common to use the builder pattern to construct IR, especially in an OO programming language like C++ or Python. One problem with this approach is that the compiler code looks very different from the code it generates. Because Erlang/Elixir is SSA by nature, in Beaver the creation of an MLIR Op is very declarative, and its container transforms it with the correct contextual information. By doing this, we can:

  • Keep compiler code’s structure as close as possible to the generated code, with less noise and more readability.
  • Allow dialects with different targets and semantics to introduce different DSLs. For instance, CPU, SIMD, and GPU could each have a specialized transformation tailored to their own unique concepts.

One example:

module do
  v2 = Arith.constant(1) >>> ~t<i32>
end
# module/1 is a macro; it transforms the SSA `v2 = Arith.constant...` into:
v2 =
 %Beaver.SSA{}
  |> Beaver.SSA.put_arguments(value: ~a{1})
  |> Beaver.SSA.put_block(Beaver.Env.block())
  |> Beaver.SSA.put_ctx(Beaver.Env.context())
  |> Beaver.SSA.put_results(~t<i32>)
  |> Arith.constant()

Also, when IR is constructed declaratively, proper dominance and operand references are formed naturally.

SomeDialect.some_op do
  region do
    block entry() do
      x = Arith.constant(1) >>> ~t<i32>
      y = Arith.constant(1) >>> ~t<i32>
    end
  end
  region do
    block entry() do
      z = Arith.addi(x, y) >>> ~t<i32>
    end
  end
end

# will be transformed to:

SomeDialect.some_op(
  regions: fn ->
    region = Beaver.Env.region() # first region created
    block = Beaver.Env.block()
    x = Arith.constant(...)
    y = Arith.constant(...)

    region = Beaver.Env.region() # second region created
    block = Beaver.Env.block()
    z = Arith.addi([x, y, ...]) # x and y dominate z
  end
)
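
The idea of a container transforming its declarative body can be sketched with an ordinary Elixir macro. This is a toy under stated assumptions, not Beaver's implementation: ToyDSL, its block/2 macro, the op/2 placeholder, and create/3 are all hypothetical names. The macro walks the body and rewrites every op(name, args) call so that the enclosing block is passed in explicitly.

```elixir
defmodule ToyDSL do
  # Hypothetical toy, not Beaver's implementation.
  # `block id do ... end` rewrites every `op(name, args)` call in its body
  # into `ToyDSL.create(id, name, args)`, injecting the enclosing block.
  defmacro block(id, do: body) do
    Macro.prewalk(body, fn
      {:op, _meta, [name, args]} ->
        quote do: ToyDSL.create(unquote(id), unquote(name), unquote(args))

      other ->
        other
    end)
  end

  # Stand-in for real op creation: just record the contextual information.
  def create(block_id, name, args), do: %{block: block_id, name: name, args: args}
end

defmodule ToyDSLDemo do
  require ToyDSL

  def run do
    ToyDSL.block :entry do
      # The body reads like the IR it generates; no block is passed by hand.
      x = op(:constant, [1])
      y = op(:constant, [2])
      op(:addi, [x, y])
    end
  end
end
```

The body of the block keeps the shape of the generated code, while the contextual argument is supplied by the container at compile time.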

Beaver DSL as higher level AST for MLIR

There should be a 1:1 mapping between the Beaver SSA DSL and MLIR SSA. It is possible to do a round trip: parse the MLIR text format and dump it to the Beaver DSL, which is essentially Elixir AST. This makes it easy to debug a piece of IR in a more programmable and readable way.

In Beaver, working with MLIR should happen in one format, whether you are generating, transforming, or debugging it.

Is Beaver a compiler or binding to LLVM/MLIR?

Elixir is a general-purpose programming language. There are multiple sub-ecosystems within the general Erlang/Elixir ecosystem. These sub-ecosystems appear distinct from and unrelated to each other, but they actually complement each other in real-world production. To name a few:

Each of these sub-ecosystems starts with a seed project/library. Beaver should evolve to become a sub-ecosystem for compilers built with Elixir and MLIR.

MLIR context management

When calling higher-level APIs, it is ideal not to pass the MLIR context around everywhere. If no MLIR context is provided, an attribute or type getter should return an anonymous function that takes the MLIR context as its argument. In Erlang, all values are copied, so it is very safe to pass these anonymous functions around. When an operation is created, these functions are called with the MLIR context held in the operation state. With this approach we achieve both succinctness and modularity without having a global MLIR context. In Beaver, a function that accepts an MLIR context to create an operation or type is usually called a “creator”.
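
The deferred-context idea can be sketched in a few lines of plain Elixir. This is an illustration only: ToyType, ToyOp, and the tuple standing in for a type are hypothetical and do not reflect Beaver's actual API.

```elixir
defmodule ToyType do
  # Hypothetical stand-in for an MLIR type getter; not Beaver's API.

  # With an explicit context, build the value right away.
  def i32(ctx: ctx), do: {:type, :i32, ctx}

  # Without a context, return a "creator": a function awaiting the context.
  def i32(), do: fn ctx -> i32(ctx: ctx) end
end

defmodule ToyOp do
  # Operation creation resolves any pending creators with its own context.
  def create(name, result_types, ctx) do
    resolved =
      Enum.map(result_types, fn
        creator when is_function(creator, 1) -> creator.(ctx)
        type -> type
      end)

    %{name: name, results: resolved}
  end
end
```

With this shape, ToyOp.create(:constant, [ToyType.i32()], ctx) never needs a global context: the creator is resolved only when the operation that owns it is built.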

Release a new version

Update Elixir source

Linux

Mac

  • Run macOS build with:
  rm -rf _build/prod
  bash scripts/build-for-publish.sh
  • Upload the beaver-nif-[xxx].tar.gz file to release

Generate checksum.exs

rm checksum.exs
mix clean
mix
mix elixir_make.checksum --all --ignore-unavailable --print

Check that the version in the output is correct.

Publish to Hex

BEAVER_BUILD_CMAKE=1 mix hex.publish

(Optional) Format CMake files

python3 -m pip install cmake-format
cmake-format -i native/**/CMakeLists.txt native/**/*.cmake

Articles

  • coming soon...