A .Net Trie in C# implementing IDictionary

Recently I decided to write a straight forward solver for a popular word game similar to Boggle. The obvious data structure for such a game is the Trie. When searching the .Net library for such a data structure, I could not find one that suited my needs. Also looking over the internet did not yield the result I desired. I decided to implement my own Trie for this application.

Before we can start implementing the Trie, we need to understand the data structure at a fundamental level. Though often thought of as a neat way to store strings in memory, the structure is much more than that. A Trie is an associative array mapping sets of keys to values. This differs from most associative arrays that map keys to values. In many uses of a Trie, the keys are strings and the values are unused. Even in this configuration a Trie's internal structure yields very useful properties such as finding all sets of keys that share the same prefix. This was the property that I needed for my word game solver, but if we are implementing the structure, we should do it correctly. I decided that a Trie can be mapped to the interface of IDictionary with at least one added method to look up entries that share the same prefix.

Seperating these Trie specific methods into a seperate interface yields:

    /// <summary>
    /// Interface for a data structure that implements a Trie.
    /// </summary>
    /// <typeparam name="TKey">Type of the Key to store. Note each entry in the Trie has multiple Keys represented by an IEnumerable.</typeparam>
    /// <typeparam name="TValue">Type of the Value to store. This type must have a public parameterless constructor.</typeparam>
    public interface ITrie<TKey, TValue> : IDictionary<IEnumerable<TKey>, TValue>
    {
        /// <summary>
        /// Find all KeyValuePairs whose keys have the prefix specified by keys
        /// </summary>
        /// <param name="keys">Keys that all results must begin with</param>
        /// <returns>Collection of KeyValuePairs whose Keys property begins with the supplied keys prefix</returns>
        ICollection<KeyValuePair<IEnumerable<TKey>, TValue>> Suffixes(IEnumerable<TKey> keys);
    }
Notice that I abstracted out the concept of sets of keys from the interface. This cleans up many methods, while still making the user aware via method signatures that entries in the associative array are represented by sets of keys and not keys. Here is what the interface becomes for a Trie implementing ITrie:
    /// <summary>
    /// Implementation of the ITrie interface, representing a collection of keys of keys and values.
    /// </summary>
    /// <typeparam name="TKey">Type of the Key to store. Note each entry in the Trie has multiple Keys represented by an IEnumerable.</typeparam>
    /// <typeparam name="TValue">Type of the Value to store. This type must have a public parameterless constructor.</typeparam>
    public class Trie<TKey, TValue> : ITrie<TKey, TValue> where TValue : new()
    {
        /// <summary>
        /// Initializes a new instance of the Trie class.
        /// </summary>
        public Trie()
        {
            this.root = new Node<TKey, TValue>();
            this.keys = new List<IEnumerable<TKey>>();
            this.values = new List<TValue>();
        }

        #region ICollectionProperties
        /// <summary>
        /// Gets the number of elements contained in the ICollection.
        /// </summary>
        public int Count;

        /// <summary>
        /// Gets a value indicating whether the ICollection is read-only.
        /// </summary>
        public bool IsReadOnly;
        #endregion

        #region IDictionaryProperties
        /// <summary>
        /// Gets an ICollection containing the keys of the IDictionary.
        /// </summary>
        public ICollection<IEnumerable<TKey>> Keys;

        /// <summary>
        /// Gets an ICollection containing the values in the IDictionary.
        /// </summary>
        public ICollection<TValue> Values;
        #endregion

        #region IDictionaryIndexers
        /// <summary>
        /// Gets or sets the element with the specified key.
        /// </summary>
        /// <param name="keys">The key of the element to get or set.</param>
        /// <returns>The element with the specified key.</returns>
        public TValue this[IEnumerable<TKey> keys];
        #endregion

        #region ICollectionMethods
        /// <summary>
        /// Adds an item to the ICollection.
        /// </summary>
        /// <param name="item">The object to add to the ICollection.</param>
        public void Add(KeyValuePair<IEnumerable<TKey>, TValue> item);

        /// <summary>
        /// Removes all items from the ICollection.
        /// </summary>
        public void Clear();

        /// <summary>
        /// Determines whether the ICollection contains a specific value.
        /// </summary>
        /// <param name="item">The object to locate in the ICollection.</param>
        /// <returns>true if item is found in the ICollection; otherwise, false.</returns>
        public bool Contains(KeyValuePair<IEnumerable<TKey>, TValue> item);

        /// <summary>
        /// Copies the elements of the ICollection to an Array, starting at a particular Array index. 
        /// </summary>
        /// <param name="array">The one-dimensional Array that is the destination of the elements copied from ICollection. The Array must have zero-based indexing.</param>
        /// <param name="arrayIndex">The zero-based index in array at which copying begins.</param>
        public void CopyTo(KeyValuePair<IEnumerable<TKey>, TValue>[] array, int arrayIndex);

        /// <summary>
        /// Removes the first occurrence of a specific object from the ICollection. 
        /// </summary>
        /// <param name="item">The object to remove from the ICollection.</param>
        /// <returns>true if item was successfully removed from the ICollection; otherwise, false. This method also returns false if item is not found in the original ICollection.</returns>
        public bool Remove(KeyValuePair<IEnumerable<TKey>, TValue> item);
        #endregion

        #region IDictionaryMethods
        /// <summary>
        /// Adds an element with the provided key and value to the IDictionary.
        /// </summary>
        /// <param name="keys">The object to use as the key of the element to add.</param>
        /// <param name="value">The object to use as the value of the element to add.</param>
        public void Add(IEnumerable<TKey> keys, TValue value);
        
        /// <summary>
        /// Determines whether the IDictionary contains an element with the specified key.
        /// </summary>
        /// <param name="keys">The key to locate in the Dictionary.</param>
        /// <returns>true if the IDictionary contains an element with the key; otherwise, false.</returns>
        public bool ContainsKey(IEnumerable<TKey> keys);

        /// <summary>
        /// Removes the element with the specified key from the IDictionary.
        /// </summary>
        /// <param name="keys">The key of the element to remove.</param>
        /// <returns>true if the element is successfully removed; otherwise, false. This method also returns false if key was not found in the original IDictionary.</returns>
        public bool Remove(IEnumerable<TKey> keys);

        /// <summary>
        /// Gets the value associated with the specified key.
        /// </summary>
        /// <param name="keys">The key whose value to get.</param>
        /// <param name="value">When this method returns, the value associated with the specified key, if the key is found; otherwise, the default value for the type of the value parameter. This parameter is passed uninitialized.</param>
        /// <returns>true if the object that implements IDictionary contains an element with the specified key; otherwise, false.</returns>
        public bool TryGetValue(IEnumerable<TKey> keys, out TValue value);
        #endregion

        #region ITrieMethods
        /// <summary>
        /// Find all KeyValuePairs whose keys have the prefix specified by keys
        /// </summary>
        /// <param name="keys">Keys that all results must begin with</param>
        /// <returns>Collection of KeyValuePairs whose Keys property begins with the supplied keys prefix</returns>
        public ICollection<KeyValuePair<IEnumerable<TKey>, TValue>> Suffixes(IEnumerable<TKey> keys);
        #endregion

        #region IEnumerableMethods
        /// <summary>
        /// Returns an enumerator that iterates through the collection. 
        /// </summary>
        /// <returns>A IEnumerator that can be used to iterate through the collection.</returns>
        public IEnumerator<KeyValuePair<IEnumerable<TKey>, TValue>> GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through a collection.
        /// </summary>
        /// <returns>An IEnumerator object that can be used to iterate through the collection.</returns>
        IEnumerator IEnumerable.GetEnumerator();
        #endregion
}
This makes a nice Trie that is useful not only for simple word game solvers, but also complex associate array uses. For an example use, we can use the word game. Say we are given the prefix "do" and told to determine whether for a given list of words, if there are any words that begin with this prefix. We could use linq, but this is inefficient as we have to do a string comparison on every word in the word list:
words.Where(word => word.StartsWith("do")).Any();
A Trie on the otherhand can simply navigate to a certain node of the prefix tree, and enumerate all decendents. If there are no decendents, then we know there are not any words that start with the given set of keys:
words.Suffixes("do").Count == 0;
I've zipped the library and made availble under the lenient public domain license where available. Please comment on my implementation in the comments! I definitely know we can improve the enumerator to have a smaller memory footprint.

5 comments:

Katlan Merrill said...

Hey Chris,

Thanks for posting this! I am currently trying to use and extend it for my uses, but after a bit of testing, I decided I either don't know how to use this, or it doesn't quite work.

The problem is that if I were to add words "hello", "hell", and "he", all three would be found. But in addition, the suffix "hel" would be found and interpreted as a word. So this becomes unusable for word-recognition.

In a different post someone suggested using the [currently unused] Value as the indicator at each node whether ending there would constitute a word. This may work, but will require rewriting some of your code, and losing the generic-ness and multi-usability of the Value property.

Just some thoughts, correct me if I'm wrong. Hope this helps!

Also, a bit more How-To would be greatly appreciated, regardless of your comments. Would've saved me a chunk of time.

Thanks again for the post!
-K

PS What's with the unfinished Microsoft copywrite?

Christopher Miceli said...

@Katlan Merrill
Thanks for your comments. You can ignore the Microsoft copyright notice. I was copying some files around and that ended up on there. as I said at the end of the post, anyone can use this.

As for your scenario involving hello, hell, and he, and I am not experiencing this issue. I think what is happening is your value (the second type parameter) is a value type and not reference type. The current implementation of Trie uses null checks to determine if a node exists in the trie or not. Can you retry your example with a nullable type, such as int? or string class as the value? An example would be:

ITrie trie = new Trie();
trie.Add("Hello", 0);
trie.Add("Hell", 0);
trie.Add("He", 0);
foreach (var entry in trie)
{
Console.WriteLine("Key: {0}, value: {1}", new string(entry.Key.ToArray()), entry.Value);
}

Katlan Merrill said...

Hey Christopher,

Thanks for replying! Glad to see comments don't go unnoticed :)

So I found later the null check you're speaking of. I was using ints and bools as my Value type, which I believe are both non-nullable. Once I gave it an object it seemed to work better. To clarify, you need to give it a nullable type, not a value type, so Boolean, not bool, etc. Haven't performed too much testing, but I believe this is the case.

But I soon came upon the need to serialize the Trie due to the slowness in performing 50k Add calls to load my dictionary. So to simplify things, I ended up just scrapping the Generic types and replacing them with direct string/bool key/values, making serialization quite a bit easier. And as it turns out, I took the easy road and set it up so that Unity performed the Serialization for me :)

Anyways, it works now perfectly with ~zero load time! And since I didn't really need it for anything but my dictionary, the simplification of types works for me.

Thanks again for posting this, helped a lot!

-K

Christopher Miceli said...

@Katlan Merrill
Glad to be of help. I should really add a type constraint to the generic so that it must be an object type like you say:
where T : class
I too noticed that insertion is not the most efficient, maybe that is something I can look at in the future, but lookup is very good compared to other data structures so that is good.

Andy said...

Hi Chris,

Just tried to iterate trough the keys and extract the corect key-sequence and it failed, the stored key elements are a stateful result from a .Skip() linq operation.
You did an error on the Trie construction, the keys must be stored in a non-iterative state, you use key.Skip(1) and this yields a iterator in a state, not a su-sequence, so the trie is subject of errors when seeking or using the keys, they are not inmutable, and thus yields errors by traversing any of them once again.!
kindly,
Andres