RRB-Trees: Efficient Immutable Vectors (2012) [pdf]

(infoscience.epfl.ch)

41 points | by azhenley 2 days ago

4 comments

  • xeubie 8 hours ago
    I used this data structure in my immutable database (see profile) but eventually switched to a B-tree because I believe RRB trees are inherently flawed. If you do a large number of slices and concats, it is possible for the tree to contain so many gaps it that the tree grows so deep it can't be modified anymore. At first I thought it was a bug in my own implementation but I eventually found the same bug in rrb-vector, the clojure implementation (see CRRBV-14). In fact, the maintainer of that library reached the same conclusion I did and switched to B-trees: https://github.com/jafingerhut/core.btree-vector

    Still, I have huge respect for Phil Bagwell and I make heavy use of his hash array mapped trie. But this issue with RRB trees makes it impossible for me to use them, especially for an on-disk data structure whose long lifespan makes it way more likely that the problem will eventually happen.

    • bjoli 2 hours ago
      that is because many implementations bypass strict rebalancing to squeeze out extra concat performance. And because the strict rebalancing algorithm is awful to implement.
    • HexDecOctBin 4 hours ago
      Did you use some resource on how to layer immutability on top of B-trees? This is finicky enough that I don't feel comfortable YOLOing.
  • ashton314 9 hours ago
    This is Rhombus’ native data type! Such a nice data structure.
  • erichocean 1 day ago
    If you like this kind of thing, Bifurcan [0] is a Java library with implementations of RBB-trees and related (fast) immutable data structures.

    [0] https://github.com/lacuna/bifurcan

  • wasting_time 2 days ago
    A refreshing break from Molt News. Now I want to check how vectors are implemented in my favorite languages.
    • inhumantsar 2 days ago
      the `im` rust crate provides immutable data structures, one of them being an RRB-based Vec. don't remember what the stdlib Vec uses.
      • oniony 1 day ago
        I believe Vec is a straight array underneath, which is reallocated at a larger size when full. And Vector in the `im` crate you mentioned looks very interesting indeed.