Binary Encodings for JSON and Variant

(jincongho.com)

14 points | by jincongho 3 days ago

2 comments

  • boricj 1 hour ago
    At work, I wrote a C++20 data binding library. It works by running visitors over a data model that binds to the application state. My comment comes from a different set of trade-offs driven by memory constraints.

    I've implemented a bunch of serialization visitors. For the structured formats, most (JSON, YAML, CBOR with indefinite lengths) use an output iterator and can stream out one character/byte at a time, which is useful when your target is an MCU with 640 KiB of SRAM and you need to reply with large REST API responses.
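
    A minimal sketch of that streaming style, not the actual library code: write_json_string is a hypothetical helper that pushes an escaped JSON string through any output iterator one character at a time, so nothing beyond the current character is buffered. It only escapes the quote, backslash and newline; a real serializer would also handle the remaining control characters.

```cpp
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper: emit `s` as a JSON string literal through an
// output iterator, one character at a time.
// Sketch only: escapes '"', '\\' and '\n'; other control characters
// are passed through unchanged.
template <typename Out>
Out write_json_string(Out out, std::string_view s) {
    *out++ = '"';
    for (char c : s) {
        switch (c) {
            case '"':  *out++ = '\\'; *out++ = '"';  break;
            case '\\': *out++ = '\\'; *out++ = '\\'; break;
            case '\n': *out++ = '\\'; *out++ = 'n';  break;
            default:   *out++ = c;                   break;
        }
    }
    *out++ = '"';
    return out;  // lets the caller keep streaming the rest of the document
}
```

    On a target, the iterator could wrap a socket or UART write instead of a std::string, e.g. std::back_inserter(buffer) on a host build.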

    And there's the BSON serializer, which writes to a byte buffer because it uses tag-length-value and I need to backtrack in order to patch in the lengths after serializing the values. This means that the entire document needs to be written out before I can do anything with it. It also has some annoying quirks, like array indices being strings in base 10.
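
    The backpatching can be sketched like this, assuming a document with a single string element. The helper names (put_i32, patch_i32, bson_single_string) are made up for illustration; the byte layout follows the BSON spec: a little-endian int32 total length, the elements, then a 0x00 terminator.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: append a 32-bit little-endian integer.
inline void put_i32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Hypothetical helper: overwrite 4 bytes at `pos` with a little-endian integer.
inline void patch_i32(std::vector<std::uint8_t>& out, std::size_t pos, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out[pos + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Serialize the document { key: value } with one UTF-8 string element.
// Assumes `key` contains no NUL byte (BSON keys are C strings).
inline std::vector<std::uint8_t> bson_single_string(std::string_view key,
                                                    std::string_view value) {
    std::vector<std::uint8_t> out;
    const std::size_t start = out.size();
    put_i32(out, 0);                       // placeholder for the document length
    out.push_back(0x02);                   // element type: UTF-8 string
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0x00);                   // key terminator
    put_i32(out, static_cast<std::uint32_t>(value.size() + 1));  // length incl. NUL
    out.insert(out.end(), value.begin(), value.end());
    out.push_back(0x00);                   // string terminator
    out.push_back(0x00);                   // document terminator
    // Only now is the total size known, so go back and patch it in.
    patch_i32(out, start, static_cast<std::uint32_t>(out.size() - start));
    return out;
}
```

    The first byte of the output can't be sent until the last one has been written, which is why this serializer can't stream the way the others do.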

    There are also other trade-offs when dealing with JSON vs. its binary encodings. Strings in JSON may contain escape sequences that need decoding. If a string has them, you can't return a view into the document; you need to allocate a string to hold the decoded value. In BSON or CBOR (excluding indefinite-length strings), strings are not escaped, so you can return a std::string_view straight from the document (and even a const char* for BSON, as it embeds a terminating NUL character).
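
    A rough sketch of the JSON side of that trade-off, assuming `raw` is the text between the quotes of a string token. decode_json_string and DecodedString are hypothetical names, and \uXXXX escapes are deliberately left out to keep it short.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Either a zero-copy view into the document or an owned, decoded copy.
using DecodedString = std::variant<std::string_view, std::string>;

// Hypothetical helper: decode the contents of a JSON string token.
// Sketch only: \uXXXX is not handled; unknown escapes yield the escaped character.
inline DecodedString decode_json_string(std::string_view raw) {
    if (raw.find('\\') == std::string_view::npos) {
        // No escapes: the bytes in the document are already the value.
        return DecodedString{std::in_place_type<std::string_view>, raw};
    }
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        char c = raw[i];
        if (c == '\\' && i + 1 < raw.size()) {
            switch (raw[++i]) {
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                case 'r': c = '\r'; break;
                case 'b': c = '\b'; break;
                case 'f': c = '\f'; break;
                default:  c = raw[i]; break;  // '"', '\\', '/'
            }
        }
        out.push_back(c);
    }
    return DecodedString{std::in_place_type<std::string>, std::move(out)};
}
```

    With a length-prefixed BSON or CBOR string the first branch is the only one needed, so the allocation path disappears entirely.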

    Some encodings like CBOR are also more expressive than JSON, allowing for example any value type to be used for map keys and not just strings.
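
    For instance, a map with integer keys, which JSON can't represent directly, is a few bytes in CBOR. A small sketch following RFC 8949 (cbor_head and cbor_uint_map are hypothetical names, and only unsigned integers are supported here):

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Append a CBOR head: a 3-bit major type plus an unsigned argument.
inline void cbor_head(std::vector<std::uint8_t>& out, unsigned major, std::uint64_t v) {
    const auto mt = static_cast<std::uint8_t>(major << 5);
    if (v < 24) {  // small arguments are stored in the initial byte itself
        out.push_back(static_cast<std::uint8_t>(mt | v));
        return;
    }
    // Additional info 24..27 selects a 1, 2, 4 or 8 byte big-endian argument.
    int info = 24, bytes = 1;
    if (v > 0xFFFFFFFFull)  { info = 27; bytes = 8; }
    else if (v > 0xFFFFull) { info = 26; bytes = 4; }
    else if (v > 0xFFull)   { info = 25; bytes = 2; }
    out.push_back(static_cast<std::uint8_t>(mt | info));
    for (int i = bytes - 1; i >= 0; --i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Encode a definite-length map whose keys and values are unsigned integers.
inline std::vector<std::uint8_t> cbor_uint_map(
        const std::map<std::uint64_t, std::uint64_t>& m) {
    std::vector<std::uint8_t> out;
    cbor_head(out, 5, m.size());  // major type 5: map, argument = number of pairs
    for (const auto& [k, v] : m) {
        cbor_head(out, 0, k);     // major type 0: unsigned integer key
        cbor_head(out, 0, v);     // major type 0: unsigned integer value
    }
    return out;
}
```

    The map {1: 2, 3: 4} comes out as the five bytes A2 01 02 03 04, matching the example in the RFC's appendix.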

  • kstenerud 1 hour ago
    As the author stated, it really depends on what you intend to use it for.

    Fast internal scanning isn't free, because now you need pre-indexing, which means more data and costs you incremental buildability on the encoding end.

    Small transfer size and fast (full) decoding are both possible with a single binary format, but unfortunately designers keep falling into the trap of adding extra things that make their formats incompatible with JSON. It's why I wrote https://github.com/kstenerud/bonjson/