Generators in Python Explained

Before beginning, I ask people who are new to Python to rate their knowledge of various Python concepts. Basic concepts such as "if/else" or "defining and using functions" are understood by the majority. Some topics, however, are poorly understood by many, and "generators and the yield keyword" is a major one. This is the case for most novice Python programmers.

Any Python function whose body contains the "yield" keyword is a generator function, and calling it returns a generator. In a normal Python function, execution starts at the first line and continues until a return statement, an exception, or the end of the function (which implicitly returns None). Once the function exits, its local variables are destroyed and are no longer accessible. A generator behaves differently: when it reaches a yield, it hands the yielded value to the caller, its state is frozen, and all its local variables are kept in memory until the next value is requested.

A generator can be consumed by a for loop, like any other iterator, or advanced explicitly with the built-in next() function. A simple example is as follows:

Using Generators with a for Loop:

>>> def test_generator():
...     yield 'abc'
...     yield 123
...     yield '789'
...
>>> for i in test_generator():
...     print(i)
...
abc
123
789

Using Generators with next:

>>> printer = test_generator()
>>> next(printer)
'abc'
>>> next(printer)
123
>>> next(printer)
'789'
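
To make the pausing and resuming visible, here is a small sketch. The traced_generator function and its log argument are hypothetical names invented for this illustration; the generator simply records in a list each time a piece of its body runs, so you can see that nothing executes until a value is requested.

```python
def traced_generator(log):
    """Hypothetical generator that records in `log` each time its body runs."""
    log.append("started")
    yield "abc"
    log.append("resumed after first yield")
    yield 123
    log.append("reached the end")


log = []
gen = traced_generator(log)
print(log)        # prints []  (calling the function runs none of its body)

print(next(gen))  # prints abc  (the body runs up to the first yield, then pauses)
print(log)        # prints ['started']

print(next(gen))  # prints 123  (execution resumes right after the first yield)
print(log)        # prints ['started', 'resumed after first yield']

# A third request runs the rest of the body. Nothing is left to yield, so
# next() returns the default we pass here instead of raising StopIteration.
print(next(gen, "exhausted"))  # prints exhausted
print(log)  # prints ['started', 'resumed after first yield', 'reached the end']
```

Without the second argument, the third next(gen) call would raise StopIteration, which is how a generator signals that it has no more values.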

You can think of a generator as returning multiple items, as if it returned a list, except that instead of returning them all at once it returns them one by one, and the generator function is paused until the next item is requested. Generators are well suited to computing large sets of results (in particular, computations that themselves involve loops) when you don't know whether you will need all the results, or when you don't want to allocate memory for all the results at the same time. They also suit situations where the generator uses another generator, or consumes some other resource, and it is more convenient for that to happen as late as possible.
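
As a rough illustration of both points (not needing all the results, and one generator using another), here is a minimal sketch. The names naturals and squares are made up for this example; itertools.islice is the standard-library helper that takes only the first few items from any iterator.

```python
from itertools import islice


def naturals():
    """Hypothetical endless generator: yields 1, 2, 3, ... forever."""
    n = 1
    while True:
        yield n
        n += 1


def squares(numbers):
    """Hypothetical generator that consumes another iterator lazily."""
    for x in numbers:
        yield x * x


# Only five values are ever computed, even though naturals() never ends.
first_five = list(islice(squares(naturals()), 5))
print(first_five)  # prints [1, 4, 9, 16, 25]
```

Because each stage only produces a value when the next stage asks for one, the endless naturals() generator is safe to use here; building the same pipeline out of lists would never finish.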

Consider writing a simple program that prints the squares of the numbers from 1 up to n.

Our initial approach would be:

>>> numbers_list = list(range(1, n + 1))  # builds a list of the numbers from 1 to n (in Python 3, range alone is lazy)
>>> for i in numbers_list:  # this for loop prints the squares of the numbers from 1 up to n
...     print(i * i)
...

But what if n is a really big number, so big that a list of all the numbers up to n would not fit in the system's memory? The list grows in proportion to n, so the approach above cannot be used: it would exhaust the available memory.

It would be better to have a mechanism for iterating over the numbers, as in the approach above, without ever creating the list, so that the memory is never consumed. The preferred solution is to use a generator.

>>> def number_generator(n):  # assumes n >= 1
...     num = 1
...     while True:
...         yield num
...         if num == n:
...             return
...         else:
...             num += 1
...
>>> for i in number_generator(200000000):
...     print(i * i)
...

In the approach above, when the for loop starts, number_generator(200000000) is called. This runs none of the function body; it returns a generator object that remembers n = 200000000. The for loop then requests the first value, so the body starts executing: num = 1 is initialised and execution enters the while loop, which on its own would loop forever. Then yield num is reached; at this point the function is frozen and all its local variables are kept in memory. Since num is 1, the value 1 is handed back to the for loop and assigned to i, 1 (i*i) is printed, and the for loop requests the next value from the generator.

Execution now resumes from the place where it was frozen, so the test 'num == n' runs, which evaluates to '1 == 200000000'. Since this is false, 'num += 1' is executed, making num equal to 2; the while loop runs once more, and the process continues.

The while loop keeps executing until num reaches 200000000. After 200000000 is yielded, the next line, 'num == n', evaluates to '200000000 == 200000000'; since this is true, the return statement is executed.

Whenever a generator executes a return statement or reaches the end of its body, a StopIteration exception is raised; the for loop catches it and stops iterating at that point. (Any other exception raised inside the generator is not turned into StopIteration; it propagates to the code that requested the value.) So in this example we were able to print the squares of the numbers from 1 to 200000000 without ever creating a big list of numbers, which would have occupied a large amount of memory.
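
To show what the for loop is doing with StopIteration, here is a sketch that drives the same generator by hand. The helper squares_with_explicit_next is a hypothetical name for this illustration, and it collects the squares in a list only so that the result is easy to inspect with a small n.

```python
def number_generator(n):  # assumes n >= 1, as in the example above
    num = 1
    while True:
        yield num
        if num == n:
            return
        num += 1


def squares_with_explicit_next(n):
    """Hypothetical helper: roughly what `for i in number_generator(n)` does."""
    results = []
    iterator = number_generator(n)
    while True:
        try:
            i = next(iterator)   # resume the generator until its next yield
        except StopIteration:    # raised when the generator returns
            break                # a for loop ends silently at this point
        results.append(i * i)
    return results


print(squares_with_explicit_next(5))  # prints [1, 4, 9, 16, 25]
```

The for loop does this bookkeeping for you, which is why you rarely see StopIteration in everyday code.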

This simple use case shows how we can work generators into our daily programming to write more memory-efficient programs.
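
As a rough sanity check of the memory argument, the sketch below compares the size of a list with the size of a generator object using sys.getsizeof. The value of n is an arbitrary choice for illustration, and the exact byte counts depend on the Python version and platform. Note also that getsizeof reports only the list object itself, not the integers it refers to, so the real cost of the list is even higher.

```python
import sys


def number_generator(n):  # same generator as above; assumes n >= 1
    num = 1
    while True:
        yield num
        if num == n:
            return
        num += 1


n = 100000  # arbitrary illustrative value

numbers_list = list(range(1, n + 1))  # every number exists in memory at once
numbers_gen = number_generator(n)     # only the paused function state exists

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)

print("list object:      %d bytes" % list_size)
print("generator object: %d bytes" % gen_size)
```

The list's size grows with n, while the generator object stays the same small size however large n becomes.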