Objectives

In this short blog, I’d like to introduce Python generators. After reading it, I hope that you will understand:

  • What Python generators are
  • What generator functions and generator expressions are
  • How to use them

What are Python Generators?

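A generator function is a special kind of function in Python that returns a lazy iterator, called a generator. You can loop over a generator just like a list, but it doesn’t store all of its values in memory. Instead, it produces each value only when it is asked for.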
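To make "lazy iterator" concrete, here is a minimal sketch. The function name count_up_to is hypothetical and used only for illustration. Calling a generator function does not compute any values up front. It returns a generator object, which is an iterator that works out each value only when the loop asks for the next one.

```python
import types
from collections.abc import Iterator


def count_up_to(limit):
    """Hypothetical example: yield 1, 2, ..., limit one at a time."""
    n = 1
    while n <= limit:
        yield n
        n += 1


gen = count_up_to(3)

# Calling the function returned a generator object, not a list of values.
print(isinstance(gen, types.GeneratorType))  # True
# A generator is an iterator: it is its own iterator and supports next().
print(isinstance(gen, Iterator))             # True
print(iter(gen) is gen)                      # True

# Values are produced on demand as the loop asks for them.
for value in gen:
    print(value)  # prints 1, then 2, then 3
```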

Implementation

So why use generators instead of ordinary iterables such as a list, tuple or set? The answer is: YOU DON’T NEED TO for general programming tasks. However, when you are dealing with enough data to exhaust your computer’s memory, generators come in handy.

Let’s say we have a list of the numbers 0 to 10. We can easily store the list and iterate through it without any problems.

num_list = list(range(11)) # 0 - 10

for num in num_list:
    print(num)

However, imagine that instead of a few elements, our data source gives us millions of them. To make matters worse, each element is a big integer. Realistically, we might not have enough memory to store all of those numbers on our tiny computers.

We can solve this problem by using yield instead of return to create a generator function. In the example below, datasource stands in for reading data from a large dataset.

def datasource():
    num = 9999
    while True:
        yield num
        num *= 2
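The key difference from return is that yield pauses the function instead of ending it. Local variables such as num keep their values, and execution resumes right after the yield on the next request. The sketch below records when the body actually runs. The generator traced_counter and its events list are hypothetical, added only for illustration.

```python
def traced_counter(events):
    """Hypothetical generator that logs each step of its execution."""
    events.append("started")
    num = 1
    while num <= 2:
        events.append(f"about to yield {num}")
        yield num                       # pause here and hand num to the caller
        events.append(f"resumed after {num}")
        num += 1                        # num survived the pause
    events.append("finished")


events = []
gen = traced_counter(events)
print(events)        # [] (creating the generator ran none of the body)

print(next(gen))     # 1
print(events)        # ['started', 'about to yield 1']

print(next(gen))     # 2
print(events)        # [..., 'resumed after 1', 'about to yield 2']

# Exhaust the generator: the body runs to completion after the last yield.
print(list(gen))     # []
print(events[-2:])   # ['resumed after 2', 'finished']
```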

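We can then iterate through datasource one value at a time without ever holding all of the values in memory. Note that this particular generator never stops (it contains while True), so the loop below keeps printing until you interrupt it.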

for data in datasource():
    print(data, end=' ')

# 9999 19998 39996 79992 ... (never stops on its own)
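Because datasource never runs out, the loop above never ends on its own. One way to take just the first few values is itertools.islice from the standard library, which slices any iterator lazily. This sketch repeats the datasource definition so that it runs on its own.

```python
from itertools import islice


def datasource():
    num = 9999
    while True:
        yield num
        num *= 2


# Take only the first five values. islice stops asking after that,
# so the infinite generator is never run "to the end".
first_five = list(islice(datasource(), 5))
print(first_five)  # [9999, 19998, 39996, 79992, 159984]

# islice also works directly in a for loop without building a list.
for data in islice(datasource(), 3):
    print(data, end=' ')  # prints: 9999 19998 39996
```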

We can also retrieve the next value from a generator by using the built-in next() function. This is really useful for testing our generators, since it lets us examine each element one at a time.

data_gen = datasource()  # a new name, so the datasource function isn't overwritten
next(data_gen)  # 9999
next(data_gen)  # 19998
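Each call to next() resumes the generator until its next yield. When a finite generator has no values left, next() raises StopIteration, which a for loop catches for you. If you pass a second argument to next(), it returns that default instead of raising. The small generator three_numbers below is a hypothetical example.

```python
def three_numbers():
    """Hypothetical finite generator used to show next() behaviour."""
    yield 1
    yield 2
    yield 3


gen = three_numbers()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3

# The generator is now exhausted, so next() raises StopIteration.
try:
    next(gen)
except StopIteration:
    print("no more values")

# With a default, next() returns it instead of raising StopIteration.
print(next(gen, None))  # None
```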

Additionally, a generator can be created with a generator expression. It looks like a list comprehension but uses parentheses instead of square brackets.

datasource_list = [n for n in range(11)]
datasource_gen = (n for n in range(11))
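A generator expression is especially handy when a function that accepts any iterable consumes its values straight away. Examples of such functions are sum(), max() and any(). When the expression is the only argument, you can drop the extra parentheses. Here is a short sketch:

```python
# Sum of the squares of 0..10 without building an intermediate list.
total = sum(n * n for n in range(11))
print(total)  # 385

# A generator expression can be consumed only once.
squares = (n * n for n in range(11))
print(sum(squares))  # 385
print(sum(squares))  # 0 (already exhausted)

# any() stops pulling values as soon as it finds a match.
print(any(n > 5 for n in range(11)))  # True
```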

The difference between datasource_list and datasource_gen is that datasource_list builds the whole list of numbers and stores it in memory. By contrast, datasource_gen produces each number only when it is requested.
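One rough way to see this difference is sys.getsizeof. It reports the size of the container object itself, not the size of the integers it refers to. The exact byte counts vary between Python versions and platforms, so the sketch below compares the two sizes instead of claiming fixed numbers.

```python
import sys

size = 1_000_000

big_list = [n for n in range(size)]  # all one million items built up front
big_gen = (n for n in range(size))   # nothing computed yet

list_bytes = sys.getsizeof(big_list)
gen_bytes = sys.getsizeof(big_gen)

print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")
# The list grows with the number of items; the generator object does not.
print(gen_bytes < list_bytes)  # True
```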

Implementation (Contd.)

For another example, do you still remember the Fibonacci sequence? We can easily implement a generator function to produce it as follows:

def fibo_sequence(n=2):
    a, b = 0, 1
    for _ in range(n):  # produce the first n Fibonacci numbers
        yield b
        a, b = b, a + b

fibo_seq = fibo_sequence(10)
list(fibo_seq) # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
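Like every generator, fibo_seq can be iterated only once. After list() consumes it, it is empty. To go through the sequence again, call fibo_sequence() to get a fresh generator. This sketch repeats the definition above so that it runs on its own.

```python
def fibo_sequence(n=2):
    a, b = 0, 1
    for _ in range(n):
        yield b
        a, b = b, a + b


fibo_seq = fibo_sequence(10)
print(list(fibo_seq))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(list(fibo_seq))  # [] (the generator is already exhausted)

# Each call creates a new, independent generator.
print(list(fibo_sequence(5)))  # [1, 1, 2, 3, 5]
print(sum(fibo_sequence(10)))  # 143
```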

Conclusion

Python generators can greatly reduce memory usage when you are dealing with large amounts of data. There are two ways to create them. You can use yield to write a generator function, or you can wrap a comprehension in parentheses instead of square brackets to write a generator expression. A rule of thumb to bear in mind is that if memory is an issue, go for generators.