Trie

Functional maps

Functional maps (and sets) whose representation is "canonical", and independent of their operation history (unlike other popular search trees).

Background

The representation we use here comes from Section 6 of ["Incremental computation via function caching", Pugh & Teitelbaum](https://dl.acm.org/citation.cfm?id=75305).

Trie

type Trie<K, V> = {#empty; #leaf : Leaf<K, V>; #branch : Branch<K, V>}

binary hash tries: either empty, a leaf node, or a branch node

Leaf

type Leaf<K, V> = { size : Nat; keyvals : AssocList<Key<K>, V> }

leaf nodes of trie consist of key-value pairs as a list.

Branch

type Branch<K, V> = { size : Nat; left : Trie<K, V>; right : Trie<K, V> }

branch nodes of the trie discriminate on a bit position of the keys' hashes. we never store this bitpos; rather, we enforce a style where this position is always known from context.

AssocList

type AssocList<K, V> = AssocList.AssocList<K, V>

Key

type Key<K> = { hash : Hash.Hash; key : K }

equalKey

func equalKey<K>(keq : (K, K) -> Bool) : ((Key<K>, Key<K>) -> Bool)

Equality function for two `Key<K>`s, in terms of equality of `K’s.

isValid

func isValid<K, V>(t : Trie<K, V>, enforceNormal : Bool) : Bool

checks the invariants of the trie structure, including the placement of keys at trie paths

Trie2D

type Trie2D<K1, K2, V> = Trie<K1, Trie<K2, V>>

A 2D trie maps dimension-1 keys to another layer of tries, each keyed on the dimension-2 keys.

Trie3D

type Trie3D<K1, K2, K3, V> = Trie<K1, Trie2D<K2, K3, V>>

A 3D trie maps dimension-1 keys to another layer of 2D tries, each keyed on the dimension-2 and dimension-3 keys.

empty

func empty<K, V>() : Trie<K, V>

An empty trie.

size

func size<K, V>(t : Trie<K, V>) : Nat
Get the number of key-value pairs in the trie, in constant time.

branch

func branch<K, V>(l : Trie<K, V>, r : Trie<K, V>) : Trie<K, V>

Construct a branch node, computing the size stored there.

leaf

func leaf<K, V>(kvs : AssocList<Key<K>, V>, bitpos : Nat) : Trie<K, V>

Construct a leaf node, computing the size stored there.

This helper function automatically enforces the MAX_LEAF_SIZE by constructing branches as necessary; to do so, it also needs the bitpos of the leaf.

fromList

func fromList<K, V>(kvc : ?Nat, kvs : AssocList<Key<K>, V>, bitpos : Nat) : Trie<K, V>

clone

func clone<K, V>(t : Trie<K, V>) : Trie<K, V>

clone the trie efficiently, via sharing.

Purely-functional representation permits O(1) copy, via persistent sharing.

replace

func replace<K, V>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool, v : ?V) : (Trie<K, V>, ?V)

replace the given key’s value option with the given one, returning the previous one

put

func put<K, V>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool, v : V) : (Trie<K, V>, ?V)

put the given key’s value in the trie; return the new trie, and the previous value associated with the key, if any

find

func find<K, V>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool) : ?V
find the given key's value in the trie, or return null if nonexistent

merge

func merge<K, V>(tl : Trie<K, V>, tr : Trie<K, V>, k_eq : (K, K) -> Bool) : Trie<K, V>
merge tries, preferring the right trie where there are collisions
in common keys.
note: the `disj` operation generalizes this `merge`
operation in various ways, and does not (in general) lose
information; this operation is a simpler, special case.
See also:
  • disj

  • join

  • prod

mergeDisjoint

func mergeDisjoint<K, V>(tl : Trie<K, V>, tr : Trie<K, V>, k_eq : (K, K) -> Bool) : Trie<K, V>

like merge, it merges tries, but unlike merge, it signals a dynamic error if there are collisions in common keys between the left and right inputs.

diff

func diff<K, V, W>(tl : Trie<K, V>, tr : Trie<K, W>, k_eq : (K, K) -> Bool) : Trie<K, V>

The key-value pairs of the final trie consists of those pairs of the left trie whose keys are not present in the right trie; the values of the right trie are irrelevant.

disj

func disj<K, V, W, X>(tl : Trie<K, V>, tr : Trie<K, W>, k_eq : (K, K) -> Bool, vbin : (?V, ?W) -> X) : Trie<K, X>

This operation generalizes the notion of "set union" to finite maps.

Produces a "disjunctive image" of the two tries, where the values of matching keys are combined with the given binary operator.

For unmatched key-value pairs, the operator is still applied to create the value in the image. To accomodate these various situations, the operator accepts optional values, but is never applied to (null, null).

See also:

  • join

  • merge

  • prod

join

func join<K, V, W, X>(tl : Trie<K, V>, tr : Trie<K, W>, k_eq : (K, K) -> Bool, vbin : (V, W) -> X) : Trie<K, X>

This operation generalizes the notion of "set intersection" to finite maps. Produces a "conjuctive image" of the two tries, where the values of matching keys are combined with the given binary operator, and unmatched key-value pairs are not present in the output.

See also:

  • disj

  • merge

  • prod

foldUp

func foldUp<K, V, X>(t : Trie<K, V>, bin : (X, X) -> X, leaf : (K, V) -> X, empty : X) : X

This operation gives a recursor for the internal structure of tries. Many common operations are instantiations of this function, either as clients, or as hand-specialized versions (e.g., see , map, mapFilter, some and all below).

prod

func prod<K1, V1, K2, V2, K3, V3>(tl : Trie<K1, V1>, tr : Trie<K2, V2>, op : (K1, V1, K2, V2) -> ?(Key<K3>, V3), k3_eq : (K3, K3) -> Bool) : Trie<K3, V3>

Conditional catesian product, where the given operation op conditionally creates output elements in the resulting trie.

The keyed structure of the input tries are not relevant for this operation: all pairs are considered, regardless of keys matching or not. Moreover, the resulting trie may use keys that are unrelated to these input keys.

See also:

  • disj

  • join

  • merge

Build

let Build

Represent the construction of tries as data.

This module provides optimized variants of normal tries, for more efficient join queries.

The central insight is that for (unmaterialized) join query results, we do not need to actually build any resulting trie of the resulting data, but rather, just need a collection of what would be in that trie. Since query results can be large (quadratic in the DB size), avoiding the construction of this trie provides a considerable savings.

To get this savings, we use an ADT for the operations that would build this trie, if evaluated. This structure specializes a rope: a balanced tree representing a sequence. It is only as balanced as the tries from which we generate these build ASTs. They have no intrinsic balance properties of their own.

fold

func fold<K, V, X>(t : Trie<K, V>, f : (K, V, X) -> X, x : X) : X

Fold over the key-value pairs of the trie, using an accumulator. The key-value pairs have no reliable or meaningful ordering.

some

func some<K, V>(t : Trie<K, V>, f : (K, V) -> Bool) : Bool

Test whether a given key-value pair is present, or not.

all

func all<K, V>(t : Trie<K, V>, f : (K, V) -> Bool) : Bool

Test whether all key-value pairs have a given property.

nth

func nth<K, V>(t : Trie<K, V>, i : Nat) : ?(Key<K>, V)

Project the nth key-value pair from the trie.

Note: This position is not meaningful; it’s only here so that we can inject tries into arrays using functions like Array.tabulate.

toArray

func toArray<K, V, W>(t : Trie<K, V>, f : (K, V) -> W) : [W]

Gather the collection of key-value pairs into an array of a (possibly-distinct) type.

Implementation notes:

we use this function repeatedly in the Produce Exchange example application, often on very large tries.

Performance Profiling shows that it is important that this be memory efficient, and reasonably time efficient, at large scales.

To do so, we use a single array allocation (for the returned array) and we sacrifice some efficiency in reading the input trie, and instead use function nth to project each element with an independent trie traversal.

This approach is somewhat forced on us by the type signature of A.tabulate, and the desire to only allocate one array; that requirement rules out iterative mutation of an optionally-null array, since an imperative approach which would give us the wrong return type.

Since we want to statically rule out null output elements, and since the AS type system cannot do that for an imperative approach unless we assume more about the type W (e.g., the existence of "default values"), we settle for using nth.

isEmpty

func isEmpty<K, V>(t : Trie<K, V>) : Bool

Test for "deep emptiness": subtrees that have branching structure, but no leaves. These can result from naive filtering operations; filter uses this function to avoid creating such subtrees.

filter

func filter<K, V>(t : Trie<K, V>, f : (K, V) -> Bool) : Trie<K, V>

filter the key-value pairs by a given predicate.

mapFilter

func mapFilter<K, V, W>(t : Trie<K, V>, f : (K, V) -> ?W) : Trie<K, W>

map and filter the key-value pairs by a given predicate.

equalStructure

func equalStructure<K, V>(tl : Trie<K, V>, tr : Trie<K, V>, keq : (K, K) -> Bool, veq : (V, V) -> Bool) : Bool

Test for equality, but naively, based on structure. Does not attempt to remove "junk" in the tree; For instance, a "smarter" approach would equate #bin{left=#empty;right=#empty} with #empty. We do not observe that equality here.

replaceThen

func replaceThen<K, V, X>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool, v2 : V, success : (Trie<K, V>, V) -> X, fail : () -> X) : X

replace the given key’s value in the trie, and only if successful, do the success continuation, otherwise, return the failure value

putFresh

func putFresh<K, V>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool, v : V) : Trie<K, V>

put the given key’s value in the trie; return the new trie; assert that no prior value is associated with the key

put2D

func put2D<K1, K2, V>(t : Trie2D<K1, K2, V>, k1 : Key<K1>, k1_eq : (K1, K1) -> Bool, k2 : Key<K2>, k2_eq : (K2, K2) -> Bool, v : V) : Trie2D<K1, K2, V>

put the given key’s value in the 2D trie; return the new 2D trie.

put3D

func put3D<K1, K2, K3, V>(t : Trie3D<K1, K2, K3, V>, k1 : Key<K1>, k1_eq : (K1, K1) -> Bool, k2 : Key<K2>, k2_eq : (K2, K2) -> Bool, k3 : Key<K3>, k3_eq : (K3, K3) -> Bool, v : V) : Trie3D<K1, K2, K3, V>

put the given key’s value in the trie; return the new trie;

remove

func remove<K, V>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool) : (Trie<K, V>, ?V)

remove the given key’s value in the trie; return the new trie

removeThen

func removeThen<K, V, X>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool, success : (Trie<K, V>, V) -> X, fail : () -> X) : X

remove the given key’s value in the trie, and only if successful, do the success continuation, otherwise, return the failure value

remove2D

func remove2D<K1, K2, V>(t : Trie2D<K1, K2, V>, k1 : Key<K1>, k1_eq : (K1, K1) -> Bool, k2 : Key<K2>, k2_eq : (K2, K2) -> Bool) : (Trie2D<K1, K2, V>, ?V)

remove the given key-key pair’s value in the 2D trie; return the new trie, and the prior value, if any.

remove3D

func remove3D<K1, K2, K3, V>(t : Trie3D<K1, K2, K3, V>, k1 : Key<K1>, k1_eq : (K1, K1) -> Bool, k2 : Key<K2>, k2_eq : (K2, K2) -> Bool, k3 : Key<K3>, k3_eq : (K3, K3) -> Bool) : (Trie3D<K1, K2, K3, V>, ?V)

remove the given key-key pair’s value in the 3D trie; return the new trie, and the prior value, if any.

mergeDisjoint2D

func mergeDisjoint2D<K1, K2, V>(t : Trie2D<K1, K2, V>, k1_eq : (K1, K1) -> Bool, k2_eq : (K2, K2) -> Bool) : Trie<K2, V>

Like [mergeDisjoint](#mergedisjoint), except instead of merging a pair, it merges the collection of dimension-2 sub-trees of a 2D trie.