gen_rpc: A scalable RPC library for Erlang-VM based languages

Build Dependencies

To build this project you need to have the following:

  • Erlang/OTP >= 19.1

  • git >= 1.7

  • GNU make >= 3.80

  • rebar3 >= 3.2

API

gen_rpc implements only the subset of the functions of the rpc library that make sense for the problem it’s trying to solve. The library’s function interface and return values are 100% compatible with rpc, with one addition: error return values include {badrpc, Error} for RPC-based errors but also {badtcp, Error} for TCP-based errors.

For more information on what the functions below do, run erl -man rpc.

Functions exported

  • call(NodeOrNodeAndKey, Module, Function, Args) and call(NodeOrNodeAndKey, Module, Function, Args, Timeout): A blocking synchronous call, in the gen_server fashion.

  • cast(NodeOrNodeAndKey, Module, Function, Args): A non-blocking fire-and-forget call.

  • async_call(NodeOrNodeAndKey, Module, Function, Args), yield(Key), nb_yield(Key) and nb_yield(Key, Timeout): Promise-based calls. Make a call with async_call and retrieve the result asynchronously with yield or nb_yield when you need it (see the sketch after this list).

  • multicall(Module, Function, Args), multicall(Nodes, Module, Function, Args), multicall(Module, Function, Args, Timeout) and multicall(NodesOrNodesWithKeys, Module, Function, Args, Timeout): Multi-node version of the call function.

  • abcast(NodesOrNodesWithKeys, Name, Msg) and abcast(Name, Msg): An asynchronous broadcast function, sending the message Msg to the named process Name in all the nodes in NodesOrNodesWithKeys.

  • sbcast(NodesOrNodesWithKeys, Name, Msg) and sbcast(Name, Msg): A synchronous broadcast function, sending the message Msg to the named process Name in all the nodes in NodesOrNodesWithKeys. Returns the nodes in which the named process is alive and the nodes in which it isn’t.

  • eval_everywhere(Module, Function, Args) and eval_everywhere(NodesOrNodesWithKeys, Module, Function, Args): Multi-node version of the cast function.
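
A minimal usage sketch of some of these calls, assuming a remote node 'other@host' that already runs gen_rpc and is reachable over the configured driver (handle_rpc_error/1 and handle_tcp_error/1 are hypothetical helpers):

%% Blocking call: returns the result, or {badrpc, _} / {badtcp, _} on failure.
case gen_rpc:call('other@host', erlang, node, []) of
    {badrpc, RpcReason} -> handle_rpc_error(RpcReason);
    {badtcp, TcpReason} -> handle_tcp_error(TcpReason);
    RemoteNode          -> RemoteNode
end,

%% Non-blocking fire-and-forget cast.
gen_rpc:cast('other@host', error_logger, info_msg, ["ping~n"]),

%% Promise-based call: issue the request now, collect the result later.
Key = gen_rpc:async_call('other@host', erlang, memory, []),
%% ... do other work ...
{value, Memory} = gen_rpc:nb_yield(Key, 5000).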

Application settings

  • tcp_server_port: The plain TCP port gen_rpc will use for incoming connections or false if you do not want plain TCP enabled.

  • tcp_client_port: The plain TCP port gen_rpc will use for outgoing connections.

  • ssl_server_port: The port gen_rpc will use for incoming SSL connections or false if you do not want SSL enabled.

  • ssl_client_port: The port gen_rpc will use for outgoing SSL connections.

  • ssl_server_options and ssl_client_options: Settings for the ssl interface that gen_rpc will use to connect to a remote gen_rpc server.

  • default_client_driver: The default driver gen_rpc is going to use to connect to remote gen_rpc nodes. It should be either tcp or ssl.

  • client_config_per_node: A map of Node => {Driver, Port} or Node => Port entries that tells gen_rpc which Port and/or Driver to use when connecting to a Node. If you prefer to resolve nodes through an external discovery service instead of a static map, define a {Module, Function} tuple instead, with a function that takes the Node as its single argument, consults the discovery service and returns a {Driver, Port} tuple (see the configuration sketch after this list).

  • rpc_module_control: Set it to blacklist to define a list of modules that will not be exposed to gen_rpc or to whitelist to define the list of modules that will be exposed to gen_rpc. Set it to disabled to disable this feature.

  • rpc_module_list: The list of modules that are going to be blacklisted or whitelisted.

  • authentication_timeout: Default timeout for the authentication state of an incoming connection in milliseconds. Used to protect against half-open connections in a DoS attack.

  • connect_timeout: Default timeout for the initial node-to-node connection in milliseconds.

  • send_timeout: Default timeout for the transmission of a request (call/cast etc.) from the local node to the remote node in milliseconds.

  • call_receive_timeout: Default timeout for the reception of a response in a call in milliseconds.

  • sbcast_receive_timeout: Default timeout for the reception of a response in an sbcast in milliseconds.

  • client_inactivity_timeout: Inactivity period in milliseconds after which a client connection to a node will be closed (and hence have the TCP file descriptor freed).

  • server_inactivity_timeout: Inactivity period in milliseconds after which a server port will be closed (and hence have the TCP file descriptor freed).

  • async_call_inactivity_timeout: Inactivity period in milliseconds after which a pending process holding an async_call return value will exit. This is used for process sanitation purposes, so please make sure to set it to a sufficiently high number (or to infinity).

  • socket_keepalive_idle: Seconds of idle time after the last data packet before TCP starts sending keepalive probes (applies to both drivers).

  • socket_keepalive_interval: Seconds between keepalive probes.

  • socket_keepalive_count: Number of keepalive probes that must be lost before the socket is considered closed.
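
As an illustration, a hypothetical sys.config fragment wiring several of these settings together could look like the following (the ports, node names and module lists are examples, not documented defaults):

[
 {gen_rpc, [
   %% Plain TCP listener; set to false to disable plain TCP entirely.
   {tcp_server_port, 5369},
   {default_client_driver, tcp},
   %% Per-node overrides: a bare Port or a {Driver, Port} tuple.
   {client_config_per_node, #{
       'node1@host1' => 5369,
       'node2@host2' => {ssl, 5370}
   }},
   %% Never expose these modules over gen_rpc.
   {rpc_module_control, blacklist},
   {rpc_module_list, [os, init]},
   {call_receive_timeout, 15000}
 ]}
].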

Build Targets

gen_rpc bundles a Makefile that makes development straightforward.

To build gen_rpc simply run:

make

To run the full test suite, run:

make test

To run the full test suite, the XRef tool and Dialyzer, run:

make dist

To build the project and drop into a console while developing, run:

make shell-master

or

make shell-slave

The two targets let you run a “master” and a “slave” gen_rpc node side by side for testing.

To clean every build artifact and log, run:

make distclean

Testing

A full suite of tests has been implemented for gen_rpc. You can run the CT-based test suite, Dialyzer and XRef with:

make dist

If you have Docker available on your system, you can run dynamic integration tests with “physically” separated hosts/nodes by running the command:

make integration

This will launch 3 slave containers and 1 master (change the slave count with NODES=5 make integration) and run the integration_SUITE CT test suite.

Rationale

TL;DR: gen_rpc uses a mailbox-per-node architecture and gen_tcp processes to parallelize data reception from multiple nodes without blocking the VM’s distributed port.

The reasons for developing gen_rpc became apparent after a lot of trial and error while trying to scale a distributed Erlang infrastructure using the rpc library initially and subsequently erlang:spawn/4 (remote spawn). Both these solutions suffer from very specific issues under a sufficiently high number of requests.

The rpc library operates by shipping data over the wire via Distributed Erlang’s ports into a registered gen_server on the other side called rex (Remote EXecution server), which runs as part of the standard distribution. In high-traffic scenarios this exposes the inherent problem of funneling everything through a single gen_server: mailbox flooding. As the number of nodes participating in a data exchange with the node in question increases, so do the messages that rex has to deal with, eventually becoming too many for a single process to handle (don’t forget a process is confined to a single scheduler thread).

Enter erlang:spawn/4 (remote spawn from now on). Remote spawn dynamically spawns processes on a remote node, sidestepping the single-mailbox restriction that rex has. There are various libraries written to leverage that loophole (such as Rexi), but there’s a catch.

Remote spawn was not designed to ship large amounts of data as part of the call’s arguments. Hence, if you want to ship a large binary such as a picture or a transaction log (large can also be small if your network is slow) over remote spawn, sooner or later you’ll see this message popping up in your logs if you have subscribed to the system monitor through erlang:system_monitor/2:

{monitor,<4685.187.0>,busy_dist_port,#Port<4685.41652>}

This message essentially means that the VM’s distributed port pair was busy while the VM was trying to use it for some other task, like Distributed Erlang heartbeat beacons or mnesia synchronization. This of course wreaks havoc on certain timing expectations these subsystems have, and the results can be very problematic: the VM might detect a node as disconnected even though everything is perfectly healthy, and mnesia might misdetect a network partition.
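
For reference, a process can subscribe to these events through the standard OTP system monitor; the logging below is just one illustrative way to react:

%% Subscribe the calling process to busy_dist_port events.
erlang:system_monitor(self(), [busy_dist_port]),
receive
    {monitor, SuspendedPid, busy_dist_port, Port} ->
        error_logger:warning_msg(
            "distribution port ~p busy; process ~p was suspended~n",
            [Port, SuspendedPid])
end.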

gen_rpc solves both these problems by sharding data coming from different nodes to different processes (hence different mailboxes) and by using a different gen_tcp port for different nodes (hence not utilizing the Distributed Erlang ports).

Performance

gen_rpc is being used extensively in production, with over 150,000 incoming calls/sec/node on an 8-core Intel Xeon E5 CPU and Erlang 19.1. The median payload size is 500 KB. No stability or scalability issues have been detected in over a year.

Known Issues

  • When shipping an anonymous function over to another node, it will fail to execute because of the way Erlang implements anonymous functions (Erlang serializes the function metadata but not the function body). This issue also exists in both rpc and remote spawn.
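
For illustration, a call shaped like the following is affected; 'other@host' is a placeholder node name, and the call can only succeed if the remote node has loaded the exact same version of the module that defines the fun:

%% The fun is serialized as a reference to its defining module, not as
%% code; a remote node without identical bytecode for that module cannot
%% execute it and the call fails.
F = fun() -> node() end,
gen_rpc:call('other@host', erlang, apply, [F, []]).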

Articles

  • coming soon...