XUtils

parallel-hashmap

A family of header-only, very fast and memory-friendly hashmap and btree containers [Apache2] [website](https://greg7mdp.github.io/parallel-hashmap/)


Fast and memory friendly

Click here For a full writeup explaining the design and benefits of the Parallel Hashmap.

The hashmaps and btree provided here are built upon those open sourced by Google in the Abseil library. The hashmaps use closed hashing, where values are stored directly into a memory array, avoiding memory indirections. By using parallel SSE2 instructions, these hashmaps are able to look up items by checking 16 slots in parallel, allowing the implementation to remain fast even when the table is filled up to 87.5% capacity.

IMPORTANT: This repository borrows code from the abseil-cpp repository, with modifications, and may behave differently from the original. This repository is an independent work, with no guarantees implied or provided by the authors. Please visit abseil-cpp for the official Abseil libraries.

Example

#include <iostream>
#include <string>
#include <parallel_hashmap/phmap.h>

using phmap::flat_hash_map;

int main()
{
    // Create an unordered_map of three strings (that map to strings)
    flat_hash_map<std::string, std::string> email =
    {
        { "tom",  "tom@gmail.com"},
        { "jeff", "jk@gmail.com"},
        { "jim",  "jimg@microsoft.com"}
    };

    // Iterate and print keys and values
    for (const auto& n : email)
        std::cout << n.first << "'s email is: " << n.second << "\n";

    // Add a new entry
    email["bill"] = "bg@whatever.com";

    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";

    return 0;
}

Iterator invalidation for hash containers

The rules are the same as for std::unordered_map, and are valid for all the phmap hash containers:

Operations Invalidated
All read only operations, swap, std::swap Never
clear, rehash, reserve, operator= Always
insert, emplace, emplace_hint, operator[] Only if rehash triggered
erase Only to the element erased

Iterator invalidation for btree containers

Unlike for std::map and std::set, any mutating operation may invalidate existing iterators to btree containers.

Operations Invalidated
All read only operations, swap, std::swap Never
clear, operator= Always
insert, emplace, emplace_hint, operator[] Yes
erase Yes

Example 2 - providing a hash function for a user-defined class

In order to use a flat_hash_set or flat_hash_map, a hash function should be provided. This can be done with one of the following methods:

  • Provide a hash functor via the HashFcn template parameter

  • As with boost, you may add a hash_value() friend function in your class.

For example:

#include <parallel_hashmap/phmap_utils.h> // minimal header providing phmap::HashState()
#include <string>
using std::string;

struct Person
{
    bool operator==(const Person &o) const
    {
        return _first == o._first && _last == o._last && _age == o._age;
    }

    friend size_t hash_value(const Person &p)
    {
        return phmap::HashState().combine(0, p._first, p._last, p._age);
    }

    string _first;
    string _last;
    int    _age;
};
  • Inject a specialization of std::hash for the class into the “std” namespace. We provide a convenient and small header phmap_utils.h which allows to easily add such specializations.

For example:

file “Person.h”

#include <parallel_hashmap/phmap_utils.h> // minimal header providing phmap::HashState()
#include <string>
using std::string;

struct Person
{
    bool operator==(const Person &o) const
    {
        return _first == o._first && _last == o._last && _age == o._age;
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            return phmap::HashState().combine(0, p._first, p._last, p._age);
        }
    };
}

The std::hash specialization for Person combines the hash values for both first and last name and age, using the convenient phmap::HashState() function, and returns the combined hash value.

file “main.cpp”

#include "Person.h"   // defines Person  with std::hash specialization

#include <iostream>
#include <parallel_hashmap/phmap.h>

int main()
{
    // As we have defined a specialization of std::hash() for Person,
    // we can now create sparse_hash_set or sparse_hash_map of Persons
    // ----------------------------------------------------------------
    phmap::flat_hash_set<Person> persons =
        { { "John", "Mitchell", 35 },
          { "Jane", "Smith",    32 },
          { "Jane", "Smith",    30 },
        };

    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << " (" << p._age << ")" << '\n';

}

Thread safety

Parallel Hashmap containers follow the thread safety rules of the Standard C++ library. In Particular:

  • A single phmap hash table is thread safe for reading from multiple threads. For example, given a hash table A, it is safe to read A from thread 1 and from thread 2 simultaneously.

  • If a single hash table is being written to by one thread, then all reads and writes to that hash table on the same or other threads must be protected. For example, given a hash table A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.

  • It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given hash tables A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.

  • The parallel tables can be made internally thread-safe for concurrent read and write access, by providing a synchronization type (for example std::mutex) as the last template argument. Because locking is performed at the submap level, a high level of concurrency can still be achieved. Read access can be done safely using if_contains(), which passes a reference value to the callback while holding the submap lock. Similarly, write access can be done safely using modify_if, try_emplace_l or lazy_emplace_l. However, please be aware that iterators or references returned by standard APIs are not protected by the mutex, so they cannot be used reliably on a hash map which can be changed by another thread.

  • Examples on how to use various mutex types, including boost::mutex, boost::shared_mutex and absl::Mutex can be found in examples/bench.cc

Acknowledgements

Many thanks to the Abseil developers for implementing the swiss table and btree data structures (see abseil-cpp) upon which this work is based, and to Google for releasing it as open-source.


Articles

  • coming soon...