Why ASDF?
asdf was originally developed at Tamedia to extract and transform real-time click streams.
- ASDF is fast. It can be really helpful if you have gigabytes of JSON line separated values.
- ASDF is simple. It uses D’s modelling power to make you write less boilerplate code.
- ASDF is tested and used in production for real World JSON generated by millions of web clients (we call it the great fuzzer).
see also github.com/tamediadigital/je a tool for fast extraction of json properties into a csv/tsv.
Simple Example
- define your struct
- call
serializeToJson
( orserializeToJsonPretty
for pretty printing! ) - profit!
/+dub.sdl:
dependency "asdf" version="~>0.2.5"
#turns on SSE4.2 optimizations when compiled with LDC
dflags "-mattr=+sse4.2" platform="ldc"
+/
import asdf;
struct Simple
{
string name;
ulong level;
}
void main()
{
auto o = Simple("asdf", 42);
string data = `{"name":"asdf","level":42}`;
assert(o.serializeToJson() == data);
assert(data.deserialize!Simple == o);
}
Documentation
See ASDF API and Specification.
I/O Speed
- Reading JSON line separated values and parsing them to ASDF - 300+ MB per second (SSD).
- Writing ASDF range to JSON line separated values - 300+ MB per second (SSD).
Fast setup with the dub package manager
Dub is D’s package manager. You can create a new project with:
dub init <project-name>
Now you need to edit the dub.json
add asdf
as dependency and set its targetType to executable
.
(dub.json)
{
...
"dependencies": {
"asdf": "~><current-version>"
},
"targetType": "executable",
"dflags-ldc": ["-mcpu=native"]
}
(dub.sdl)
dependency "asdf" version="~><current-version>"
targetType "executable"
dflags "-mcpu=native" platform="ldc"
Now you can create a main file in the source
and run your code with
dub
Flags --build=release
and --compiler=ldmd2
can be added for a performance boost:
dub --build=release --compiler=ldmd2
ldmd2
is a shell on top of LDC (LLVM D Compiler).
"dflags-ldc": ["-mcpu=native"]
allows LDC to optimize ASDF for your CPU.
Instead of using -mcpu=native
, you may specify an additional instruction set for a target with -mattr
.
For example, -mattr=+sse4.2
. ASDF has specialized code for
SSE4.2.
Main transformation functions
uda | function |
---|---|
@serdeKeys("bar_common", "bar") |
tries to read the data from either property. saves it to the first one |
@serdeKeysIn("a", "b") |
tries to read the data from a , then b . last one occuring in the json wins |
@serdeKeyOut("a") |
writes it to a |
@serdeIgnore |
ignore this property completely |
@serdeIgnoreIn |
don’t read this property |
@serdeIgnoreOut |
don’t write this property |
@serdeIgnoreOutIf!condition |
run function condition on serialization and don’t write this property if the result is true |
@serdeScoped |
Dangerous! non allocating strings. this means data can vanish if the underlying buffer is removed. |
@serdeProxy!string |
call to!string |
@serdeTransformIn!fin |
call function fin to transform the data |
@serdeTransformOut!fout |
run function fout on serialization, different notation |
@serdeAllowMultiple |
Allows deserialiser to serialize multiple keys for the same object member input. |
@serdeOptional |
Allows deserialiser to to skip member desrization of no keys corresponding keys input. |
Please also look into the Docs or Unittest for concrete examples!
ASDF Example (incomplete)
import std.algorithm;
import std.stdio;
import asdf;
void main()
{
auto target = Asdf("red");
File("input.jsonl")
// Use at least 4096 bytes for real world apps
.byChunk(4096)
// 32 is minimum size for internal buffer. Buffer can be reallocated to get more memory.
.parseJsonByLine(4096)
.filter!(object => object
// opIndex accepts array of keys: {"key0": {"key1": { ... {"keyN-1": <value>}... }}}
["colors"]
// iterates over an array
.byElement
// Comparison with ASDF is little bit faster
// than comparison with a string.
.canFind(target))
//.canFind("red"))
// Formatting uses internal buffer to reduce system delegate and system function calls
.each!writeln;
}
Input
Single object per line: 4th and 5th lines are broken.
null
{"colors": ["red"]}
{"a":"b", "colors": [4, "red", "string"]}
{"colors":["red"],
"comment" : "this is broken (multiline) object"}
{"colors": "green"}
{"colors": "red"]}}
[]
Output
{"colors":["red"]}
{"a":"b","colors":[4,"red","string"]}
JSON and ASDF Serialization Examples
Simple struct or object
struct S
{
string a;
long b;
private int c; // private fields are ignored
package int d; // package fields are ignored
// all other fields in JSON are ignored
}
Selection
struct S
{
// ignored
@serdeIgnore int temp;
// can be formatted to json
@serdeIgnoreIn int a;
//can be parsed from json
@serdeIgnoreOut int b;
// ignored if negative
@serdeIgnoreOutIf!`a < 0` int c;
}
Key overriding
struct S
{
// key is overrided to "aaa"
@serdeKeys("aaa") int a;
// overloads multiple keys for parsing
@serdeKeysIn("b", "_b")
// overloads key for generation
@serdeKeyOut("_b_")
int b;
}
User-Defined Serialization
struct DateTimeProxy
{
DateTime datetime;
alias datetime this;
SerdeException deserializeFromAsdf(Asdf data)
{
string val;
if (auto exc = deserializeScopedString(data, val))
return exc;
this = DateTimeProxy(DateTime.fromISOString(val));
return null;
}
void serialize(S)(ref S serializer)
{
serializer.putValue(datetime.toISOString);
}
}
//serialize a Doubly Linked list into an Array
struct SomeDoublyLinkedList
{
@serdeIgnore DList!(SomeArr[]) myDll;
alias myDll this;
//no template but a function this time!
void serialize(ref AsdfSerializer serializer)
{
auto state = serializer.listBegin();
foreach (ref elem; myDll)
{
serializer.elemBegin;
serializer.serializeValue(elem);
}
serializer.listEnd(state);
}
}
Serialization Proxy
struct S
{
@serdeProxy!DateTimeProxy DateTime time;
}
@serdeProxy!ProxyE
enum E
{
none,
bar,
}
// const(char)[] doesn't reallocate ASDF data.
@serdeProxy!(const(char)[])
struct ProxyE
{
E e;
this(E e)
{
this.e = e;
}
this(in char[] str)
{
switch(str)
{
case "NONE":
case "NA":
case "N/A":
e = E.none;
break;
case "BAR":
case "BR":
e = E.bar;
break;
default:
throw new Exception("Unknown: " ~ cast(string)str);
}
}
string toString()
{
if (e == E.none)
return "NONE";
else
return "BAR";
}
E opCast(T : E)()
{
return e;
}
}
unittest
{
assert(serializeToJson(E.bar) == `"BAR"`);
assert(`"N/A"`.deserialize!E == E.none);
assert(`"NA"`.deserialize!E == E.none);
}
Finalizer
If you need to do additional calculations or etl transformations that happen to depend on the deserialized data use the finalizeDeserialization
method.
struct S
{
string a;
int b;
@serdeIgnoreIn double sum;
void finalizeDeserialization(Asdf data)
{
auto r = data["c", "d"];
auto a = r["e"].get(0.0);
auto b = r["g"].get(0.0);
sum = a + b;
}
}
assert(`{"a":"bar","b":3,"c":{"d":{"e":6,"g":7}}}`.deserialize!S == S("bar", 3, 13));