Implementing Generators/Yield in Languages Without Support

As a follow up to my previous post, I intent to go into detail how generators are actually implemented in a language. The way I intend to do this is to actually implement the generator paradigm without using language provided tools such as keywords. The first step to this process is actually understanding what is happening when you use the yield keyword. Perhaps an simple walk through will be illustrative.

A function that wishes to use a generator will iterate over that generator. To the function using the generator, the generator acts more like a data structure that implements the enumerable (iterable) interface than a function at all. The generator function cannot be called except by iterating over it. This is the proper way to think of a generator, a data structure. If you are only using a generator, than this is all you need to know. Implementing a generator requires a little more detail. Let's examine a the most basic generator, one that counts down. This example uses, C#, but the syntax is only slightly different than other languages such as python.
public IEnumerable<int> count(int start, int end)
{
   int counter = end;
   while (counter <= end)
   {
      yield return counter;
      counter++;
   }
}
I used a while loop here intentionally so the control flow is obvious. Up until the yield statement, everything is normal. So what is happening at "yield return counter"? The function execution stops and returns the variable counter for each "call" happening indirectly through the iterator. So what happens the next time the iterator issues a call to this function? Well, the execution picks up at counter++! But what is the value of counter? The special yield return statement is saving the entire state of the generator(local variables, instruction pointer, and any other state information), allowing the generator to be resumed exactly where it left off. I'll get to that other information later. To wrap up the basic, right before yielding a value, save the state of the generator. Upon entering the generator again, restore the saved state and jump to where we left off.

Let's examine another example. This time we will be looking at a recursive implementation of our count generator. I'm going to be intentionally avoiding the use of foreach to bring out the enumerator (iterator).
public IEnumerable<int> count(int start, int end)
{
   yield return start;
   if (start != end)
   {
      // foreach (int result in count(start + 1, end))
      // {
      //    yield return result;
      // }
      IEnumerator<int> enumerator = count(start + 1, end).GetEnumerator();
      while (false != enumerator.MoveNext())
      {
         yield return enumerator.Current;
      }
   }
}
This generator does the same thing, just in a recursive fashion. The first yield is exactly as the one previously examined. Things get slightly more difficult to keep track when the recursion begins, but the same exact process is followed. The enumerator is saved as part of the state of the count generator. When returning, the iterator is restored and so the generator continues as expected.

If C# allowed us to directly recur on the generator itself instead of iterating over the results as shown below:
public IEnumerable<int> count(int start, int end)
{
   yield return start;
   if (start != end)
   {
      // Will not compile
      yield return count(start + 1, end);
   }
}
then the state would have to include the call stack of the recursion produced by the generator. By explicitly restricting yield to return the enumerated base type (int), we are prevented from this operation because count returns an enumerable type. This makes implementing generators manually easier. We don't have to worry about recursion at all.

So now we get to implementing a generator. The basic ideas have been captured above. We need to save locals upon yielding, restore state upon calling, and allow enumeration. Since we generators are essentially data structures, we are going to have our desired operation (count) be a member function of a class that extends from a Generator class. Then your program can call foreach on this data type. Let's shown the Generator class first.
class Generator<T, State, RetType>

   : IEnumerable<RetType> where T : GeneratorIterator<State, RetType>, new()
{
   public Generator(State initialState)
   {
      this.initialState = initialState;
   }

   private State initialState;

   #region IEnumerable
   public IEnumerator<RetType> GetEnumerator()
   {
      GeneratorIterator<State, RetType> yieldedFunction = new T();
      yieldedFunction.Initialize(initialState);
      return yieldedFunction;
   }

   IEnumerator IEnumerable.GetEnumerator()
   {
      return this.GetEnumerator();
   }
   #endregion
}

public abstract class GeneratorIterator<State, RetType> : IEnumerator<RetType>
{
   public void Initialize(State initialState)
   {
      this.initialState = initialState;
      this.currentState = initialState;
   }

   public RetType YieldReturn(State state, RetType result)
   {
      // Save state before returning
      this.currentState = state;
      return result;
   }

   private State initialState;

   private State currentState;

   private RetType nextResult;

   public abstract RetType function(State currentState);

   #region IEnumerator
   public RetType Current
   {
      get
      {
         return this.nextResult;
      }
   }

   object IEnumerator.Current
   {
      get
      {
         return this.Current;
      }
   }

   public void Reset()
   {
      // Set the state back and invalid nextResult
      this.Initialize(this.initialState);
      this.nextResult = default(RetType);
   }

   public bool MoveNext()
   {
      this.nextResult = this.function(this.currentState);
      return this.currentState != null;
   }

   public void Dispose() { }
   #endregion
}
Let's examine what this class is doing. It is taking a class that is expected to override a member called function. This will be the function that is expected to be using the YieldReturn method provided by this class. Most of the code is setting up the enumerator. The enumerator takes care of calling the function when appropriate. The real magic that enables this class to be a generator is the currentState variable. This is where the function's state is saved. Whenever the function is needed, the function is passed the current state that was saved by YieldReturn. The function is responsible for using the state to produce the expected results.

To actually use the state, we are going to have functions that resemble finite state machines more than traditional functions. This is due to limitations of the C# compiler to allow us to jump to exactly where we need to. Below is a version of the first non recursive count and the appropriate state class representing all locals.
class Count : GeneratorIterator<CountState, int>
{
   public override int function(CountState currentState)
   {
      if (currentState.stateIndex == 0)
      {
         currentState.counter = currentState.start;
         currentState.stateIndex = 1;
         // Proceed to state 1
      }

      if (currentState.stateIndex == 1)
      {
         while (currentState.counter <= currentState.end)
         {
            // Set the next state
            currentState.stateIndex = 2;
            // yield return
            return YieldReturn(currentState, currentState.counter);
         }

         // Signal no more to yield, normally accomplished by reaching end of the function
         return YieldReturn(null, default(int));
      }

      if (currentState.stateIndex == 2)
      {
         currentState.counter = currentState.counter + 1;
         currentState.stateIndex = 1;
         // Use recursion to jump to state already passed
         return function(currentState);
      }
      
      // Can only get here if an invalid stateIndex was given
      throw new Exception("Unreachable");
   }
}

class CountState
{
   public int stateIndex;
   public int start;
   public int end;
   public int counter;
}
The stateIndex variable of the CountState class is used to determine where in the code path we left off. The function has no locals to avoid having variables no properly restored when upon reentry. To use this generator, we simply instantiate the wrapper and then the class behaves as all enumerator types do.
CountState initialState = new CountState
{
   start = 0,
   end = 5,
   stateIndex = 0
};

Generator<Count, CountState, int> generator = new Generator<Count, CountState, int>(initialState);
foreach (int result in generator)
{
   Console.WriteLine(result);
}
To illustrate that recursion is supported, I am translated the recursive count above to an appropriate finite state machine. You can try for yourself, but you can observe that there is no need to save a call stack to support recursion in your generators.
class RecursiveCount : GeneratorIterator<RecursiveCountState, int>
{
   public override int function(RecursiveCountState currentState)
   {
      if (currentState.stateIndex == 0)
      {
         currentState.stateIndex = 1;
         return YieldReturn(currentState, currentState.start);
      }

      if (currentState.stateIndex == 1)
      {
         if (currentState.start == currentState.end)
         {
            // We are done, signal no more results in this generator
            currentState.stateIndex = 3;
         }
         else
         {
            /// repeat
            currentState.recursiveCall = new Generator<RecursiveCount, RecursiveCountState, int>(
               new RecursiveCountState
                  {
                     start = currentState.start + 1,
                     end = currentState.end,
                     stateIndex = 0
                  }).GetEnumerator();
            // Go to state 2
            currentState.stateIndex = 2;
         }
      }

      if (currentState.stateIndex == 2)
      {
         while (false != currentState.recursiveCall.MoveNext())
         {
            return YieldReturn(currentState, currentState.recursiveCall.Current);
         }

         // We are done, signal no more results in this generator
         currentState.stateIndex = 3;
      }

      if (currentState.stateIndex == 3)
      {
         return YieldReturn(null, default(int));
      }

      throw new Exception("Unreachable");
   }
}

class RecursiveCountState
{
   public int stateIndex;
   public int start;
   public int end;
   public IEnumerator<int> recursiveCall;
}
I hope that this has been informative of the inner workings of generators and the yield statement. Please comment below with any questions.
Post a Comment