Quamina: Traversal for Pattern Matching

This series is an excuse to build a writing habit while exploring a codebase I’ve wanted to understand better.

This post answers how does Quamina execute this automaton at runtime to determine which patterns match an incoming event in mostly O(N) performance. Every event that arrives triggers this traversal path, making it the true bottleneck that’s been heavily optimize.

The challenge is messier than it first appears. Events might have fields in any order, though flattening sorted them lexically. Patterns can start matching at any field, not just the first one. A single pattern might require multiple fields to match, creating dependencies that the traversal must track. The solution is a recursive field-by-field walk that tries every field as a potential match start, following state transitions through both the field-level and value-level automata built during pattern compilation.

Field-by-Field Execution Link to heading

The entry point receives a sorted array of field path/value pairs from the flattening layer (core_matcher.go). The first operation sorts these fields lexically to match the order used during pattern compilation. Without this sorting alignment, the automaton cannot work. The fields are sorted by pathname before constructing the automaton, and incoming field lists must be sorted identically to allow proper state transitions.

Once sorted, the traversal tries each field as a potential pattern start. This is necessary because patterns can begin at any field. A pattern checking {"status": "active", "priority": 1} must match events whether they arrive as {"priority": 1, "status": "active"} or {"region": "us-west", "status": "active", "priority": 1}. The lexical sorting means both get reordered to status-then-priority, but the traversal still needs to handle the “region” field that doesn’t appear in any pattern:

for i := 0; i < len(fields); i++ {
    tryToMatch(fields, i, cmFields.state, matches, bufs)
}

Each iteration calls tryToMatch() with a different starting index, attempting to match the automaton from that field onward. The function is recursive, trying to extend matches by processing subsequent fields. A single event can match multiple patterns, so there’s no early exit. The accumulator collects all matches across all starting positions, deduplicating via a map structure before returning the final list.

The recursive structure of tryToMatch() (core_matcher.go) handles three types of transitions for each field. First, it checks for existence-true patterns that match whenever the field is present, regardless of value. These use a direct map lookup on the field path, returning immediately if found. Second, it checks existence-false patterns after attempting normal transitions, verifying that certain field paths do not appear anywhere in the event. Third, it processes normal field/value transitions by delegating to the dual-automaton chain.

For each successful transition, the function accumulates any patterns that have matched at that state, then recursively tries all subsequent fields against the new state. Array-aware conflict checking prevents invalid matches where fields come from different elements of the same JSON array. A flattened event like [{"status": "a"}, {"status": "b"}] produces two fields with different array trails, and the conflict checker ensures they cannot both match the same pattern instance.

Dual-Automaton Dispatch Link to heading

State transitions follow a two-level dispatch model reflecting the compilation architecture. The field matcher performs a map lookup on the field pathname (field_matcher.go), returning nil immediately if no patterns reference that field. This provides the first level of early termination, avoiding expensive value matching when the field is irrelevant:

func (m *fieldMatcher) transitionOn(field *Field, bufs *nfaBuffers) []*fieldMatcher {
    valMatcher, ok := m.fields().transitions[string(field.Path)]
    if !ok {
        return nil
    }
    return valMatcher.transitionOn(field, bufs)
}

When a field transition exists, the value matcher examines the field’s content. Three fast paths handle common cases before falling back to full automaton traversal (value_matcher.go). The singleton optimization applies when a field has only one allowed value across all patterns. This is extremely common for example many patterns specifying "status": "error" as part of the rules. Instead of building and traversing an automaton, the code performs a direct byte comparison using bytes.Equal() against the singleton value, then returns the pre-computed next state if it matches.

The second fast path distinguishes between deterministic and nondeterministic automata. Most patterns use exact string matches, numeric ranges, or simple prefixes; all of which compile to deterministic automata where each state has at most one successor per input byte. These can use DFA-style traversal with a single active state cursor, avoiding the overhead of epsilon closures and state-set management required for full NFA processing. Wildcard patterns, shell-style globs, and regular expressions require NFA traversal where multiple states are active simultaneously.

Numeric patterns generate a third branch. When the field value is tagged as a number during flattening and the pattern contains numeric constraints, the code converts the value bytes to Q-number encoding before traversal. The Q-number representation enables automaton-based numeric comparison by making lexical byte ordering match numeric ordering. A pattern checking x >= 100 compiles to an automaton accepting Q-encoded values from the encoded form of 100 onward, and the traversal simply walks this automaton with the encoded event value.

DFA Fast Path Link to heading

The DFA traversal is a simple byte-by-byte walk through the automaton (in nfa.go). Starting from the initial state table, it fetches the next table for each byte, accumulating field transitions as it goes. When it reaches the end of the value, it appends the special value terminator byte (0xF5) before taking a final step. This terminator unifies exact and prefix matching logic. Patterns can match at the terminator or at earlier positions where prefix matches are valid.

func traverseDFA(table *smallTable, val []byte, transitions []*fieldMatcher) []*fieldMatcher {
    for index := 0; index <= len(val); index++ {
        var utf8Byte byte
        if index < len(val) {
            utf8Byte = val[index]
        } else {
            utf8Byte = valueTerminator
        }
        next := table.dStep(utf8Byte)
        if next == nil {
            break
        }
        transitions = append(transitions, next.fieldTransitions...)
        table = next.table
    }
    return transitions
}

The simplicity enables good performance. There are no branches for epsilon handling, no state-set allocations, no closure computations. The dStep() function performs a linear scan through the ceiling array to find the transition for the input byte, a pattern that works well when the average number of transitions per state is small.

Early termination kicks in when next becomes nil, indicating no pattern can match the remaining bytes. This occurs frequently: an event with status = "inactive" will fail during DFA traversal of a pattern checking for “active” after the first few bytes diverge, avoiding unnecessary processing of the rest of the value.

NFA Traversal Link to heading

Nondeterministic patterns require tracking multiple active states as the automaton consumes input bytes (nfa.go). The implementation uses two pre-allocated buffers (buf1 and buf2) that swap roles on each iteration. Starting with a single initial state, the traversal loops through input bytes, computing epsilon closures and applying transitions for all currently-active states.

The epsilon closure computation finds all states reachable via zero-width epsilon transitions from a given state. These transitions represent nondeterministic choice points, typically from wildcard patterns where the automaton can either consume another byte with the wildcard or try to match the subsequent literal character. The implementation caches closures to avoid recomputing them on every iteration, since automaton structure doesn’t change during event processing.

For each active state in the current set, the code follows its epsilon closure, accumulates any field transitions it encounters, then computes the next states for the input byte. These next states become the current set for the next iteration. Buffer swapping avoids allocation: after processing a byte, the code clears the old current buffer and reuses it as the next buffer for the following iteration.

for index := 0; len(currentStates) != 0 && index <= len(val); index++ {
    var utf8Byte byte
    if index < len(val) {
        utf8Byte = val[index]
    } else {
        utf8Byte = valueTerminator
    }
    for _, state := range currentStates {
        closure := bufs.eClosure.getAndCacheClosure(state)
        for _, ecState := range closure {
            newTransitions.add(ecState.fieldTransitions)
            ecState.table.step(utf8Byte, stepResult)
            if stepResult.step != nil {
                nextStates = append(nextStates, stepResult.step)
            }
        }
    }
    swapStates := currentStates
    currentStates = nextStates
    nextStates = swapStates[:0]
}

The buffer reuse is critical for performance. Buffers will first grow to accommodate the workload, then stabilize, eventually reducing event-matching memory allocation to nearly zero.

Match Accumulation Link to heading

As traversal discovers successful pattern matches, it accumulates them in a deduplicating set structure (match_set.go). Multiple paths through the automaton might reach the same terminal state, particularly when patterns share common prefixes or when the recursive traversal tries different field orderings. The set ensures each pattern ID appears only once in the final result.

The accumulation uses an in-place mutation optimization. During single-threaded traversal, the code can safely mutate the match set’s internal map without copying it. This optimization improved performance compared to the previous copy-on-add approach:

func (m *matchSet) addXSingleThreaded(exes ...X) *matchSet {
    for _, x := range exes {
        m.set[x] = true
    }
    return m
}

The map keys automatically deduplicate. After traversal completes, a final conversion step extracts the keys into a slice for return to the caller. The slice order is non-deterministic due to map iteration randomization, but this doesn’t matter since pattern matches have no inherent ordering.

Performance Characteristics Link to heading

The traversal complexity is harder to pin down than it first appears. The obvious analysis suggests O(F × V) where F is the number of fields in the event and V is the average value length. The main loop tries each field as a starting position (O(F) iterations), and value matching walks through bytes (O(V) per field). However, the recursive structure creates dependencies that push worst-case complexity to O(F² × V) as each tryToMatch() call recurses on all subsequent fields, creating nested loops through the field list.

In practice, early termination makes this worst case rare. When a field doesn’t appear in any pattern, the field matcher returns nil immediately, short-circuiting the entire recursive branch. When a value doesn’t match the automaton, DFA or NFA traversal breaks early, typically after examining only a few bytes. Array conflict checking prevents many recursive calls from even starting. The result is that typical performance stays close to O(F × V) despite the theoretical quadratic bound.

The exists-false check is a known potential bottleneck since ruler times. Unlike normal field transitions that use map lookups, exists-false patterns require scanning all event fields to verify a field path does not appear (core_matcher.go). It gets slow for big events with hundreds or more fields though most cases (at least in ruler) of such checks were workaround for lack of regex support which is not as big of problem in Quamina with its RegEx matched support.

What This Enables Link to heading

The traversal implementation delivers the matching performance promised by the compilation architecture. Adding pattern number 10,000 to a Quamina instance doesn’t slow down event processing by 10,000x because those patterns share automaton structure. If the new pattern checks the same fields as existing patterns with different values, it extends existing value automata without adding new field transitions. The traversal cost scales with the number of unique fields referenced across all patterns and the complexity of value matching on those fields, not with the raw pattern count.

This makes Quamina suitable for systems that need to route events based on content filtering with thousands or millions of patterns. Event routers, pub-sub systems, and rule engines can compile user-defined patterns once, then match them against high-throughput event streams with predictable per-event costs. The state management layer handles dynamic pattern addition and deletion while traversal is occurring, but that’s a topic for future exploration of Quamina’s concurrent update mechanisms.