📖 WHAT IS A DAWG???!
A Directed Acyclic Word Graph (also called a Suffix Automaton) is a minimal finite automaton that accepts every suffix of a word — and therefore every substring. It is the smallest possible structure that can recognise all substrings of a word.
Unlike a trie, which stores each suffix separately, the DAWG merges states that accept the same set of future suffixes into a single node. The result has at most 2n − 1 states and 3n − 4 transitions for a word of length n.
What can you do with it? After building the DAWG for string S, you can find the Longest Common Substring between S and any query string T in O(|T|) time by traversing the automaton.
🧠 Key Concepts
- State
- A node representing one endpos-equivalence class. All strings in a state have lengths in a contiguous range [link.len + 1 … len(u)], and they all occur at exactly the same ending positions.
- Transition
- A labelled directed edge. Following edge labelled c from state u means "I just read character c, and I'm now in the class of strings that end with c".
- Suffix Link
- A dashed edge from state u to the state whose endpos is the smallest superset of endpos(u). Concretely: the state representing the longest proper suffix of u's shortest string. Because a shorter string must occur at least as often, its endpos is always a superset.
- len(u)
- The length of the longest string in state u's equivalence class. The invariant
len(link(u)) < len(u)always holds, making the suffix-link tree well-founded. - endpos(u)
- The set of ending positions in the original word where the strings in state u occur. Two substrings share a state if and only if they end at exactly the same positions — this partition into endpos-equivalence classes is the key idea behind the whole structure.
⚙️ ENTER YOUR WORD!!!
Type a short word (2–25 lowercase letters). We will build the DAWG for it step by step!