Welcome to the Python generators tutorial. If you have a huge amount of data to read, generators make it easy to iterate through it without loading everything at once. If you don't use generators yet, they are well worth learning, because they will make your coding life easier. Whether you are a beginner or simply new to Python, this post will give you a complete understanding of Python generators and how to use them. Let us get started.
The main purpose of a generator is to help us create our own iterators. A generator function is a special type of function that returns an iterator instead of a single value. The iterators that we create with generators are referred to as lazy iterators: their contents are not stored in memory all at once but are produced one item at a time, on demand. If you want to iterate through large files, data streams, CSV files, and so on, generators are a good choice. Generators were introduced in PEP 255 and have been available since Python 2.2.
How to create generator functions
Let us create a sample generator. Create a new file in any text editor and copy the code below.
def sample():
    a = ["Hello", "Welcome"]
    yield a

for i in sample():
    print("This is a sample generator")
In this code, sample() is the generator function. The yield keyword is used to return items to the caller. Unlike return in a normal function, yield does not exit the function: execution pauses at the yield, and the function's state is kept so that it can resume from the same point when the next item is requested. Once a generator function is defined, it is called like a normal function.
Save the file as script.py. Open a command prompt, navigate to the folder that contains the script, and execute the command below.
python script.py
You should see the output 'This is a sample generator' on the command prompt. Let us look at one more example, which yields the squares of the numbers in a given range.
def Squared_numbers(num):
    for n in range(num):
        yield n ** 2
for i in Squared_numbers(5):
    print(i)
This program calls the Squared_numbers generator with 5 as the range. The generator iterates over the numbers 0 through 4 and yields the square of each one, so the values produced are 0, 1, 4, 9, and 16.
The yield keyword controls the flow of a generator function. When we call a generator function or evaluate a generator expression, we get an iterator in return. This iterator is the generator object.
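We usually assign the generator object to a variable and then use it. Calling a generator function does not run its body right away. The body starts only when the first value is requested, for example with next() or a for loop, and it then runs only until it encounters a yield statement. The yielded value is sent back to the caller, and execution resumes from that point when the next value is requested.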
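To make the pause-and-resume behaviour concrete, here is a minimal sketch that steps through a generator by hand with next(). The countdown() function is a hypothetical example written for this illustration and is not part of the earlier script.

```python
def countdown(start):
    """Hypothetical example generator: yields start, start - 1, ..., 1."""
    print("Generator body starts running")
    while start > 0:
        yield start      # pause here and hand the value to the caller
        start -= 1       # execution resumes here on the following next() call
    print("Generator body finished")


gen = countdown(2)       # nothing is printed yet: the body has not started
first = next(gen)        # runs the body up to the first yield
second = next(gen)       # resumes after the first yield, stops at the second

try:
    next(gen)            # no yields are left, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```

A for loop does the same thing behind the scenes: it calls next() repeatedly and stops quietly when StopIteration is raised.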
Generator expressions are similar to list comprehensions. They let us create a generator object with minimal code, and the resulting object does not hold all of its items in memory before iteration. Let us create a list and a generator object and look at the difference between the two.
# Creating a list
numbers_list = [num for num in range(5)]
print(numbers_list)
# Creating a generator object
numbers_generatorObject = (num for num in range(5))
print(numbers_generatorObject)
In the above code, we have created a list and a generator object for the same numbers. The syntax is very similar. The difference is the type of brackets that we use: square brackets create a list, whereas parentheses create a generator object. When you execute the above code, this will be the output.
[0, 1, 2, 3, 4]
You can observe here that numbers_list is a list, so its numbers were printed on the command line. Printing numbers_generatorObject, on the other hand, does not show any numbers. It shows only a generator object representation of the form <generator object <genexpr> at 0x...>, where the hexadecimal value is the memory location of the object and differs from run to run.
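Since printing a generator object does not reveal its values, the short sketch below shows a few ways to consume one. The variable names are chosen for this illustration only. It also shows that a generator can be iterated just once.

```python
# A generator expression: no squares are computed at this point.
squares = (n * n for n in range(5))

first = next(squares)          # computes only the first value
total_of_rest = sum(squares)   # consumes the remaining values: 1 + 4 + 9 + 16
leftover = list(squares)       # the generator is now exhausted, so this is empty

# A generator expression can be passed straight to a function.
# When it is the only argument, the extra parentheses can be dropped.
total = sum(n * n for n in range(5))

print(first, total_of_rest, leftover, total)
```

If the values are needed more than once, either build a list instead or create a fresh generator each time.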
As I mentioned before, generators optimize memory. Let's take the same example as above and increase the range to 150 numbers. Let us see how much memory the list and the generator object take to hold the same numbers. Here is a small program that uses sys.getsizeof() to get the sizes.
import sys

# Creating a list
numbers_list = [num for num in range(150)]
print("The size of the list is", sys.getsizeof(numbers_list))
# Creating a generator object
numbers_generatorObject = (num for num in range(150))
print("The size of the generator is", sys.getsizeof(numbers_generatorObject))
The output for the above program will be as follows.
The size of the list is 1448
The size of the generator is 88
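You can see that the list took 1448 bytes, whereas the generator object took only 88 bytes. The exact figures depend on the Python version and platform, but the pattern is the same: the list grows with the number of items, while the generator object stays the same size. The difference becomes huge when you work with a larger dataset.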
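To see how the gap widens with larger inputs, the following sketch repeats the measurement for a few sizes. The sizes are arbitrary values picked for illustration, and no byte counts are quoted because they vary between Python versions. Note that sys.getsizeof() reports only the size of the container object itself, not of the integers it refers to.

```python
import sys

list_sizes = []
generator_sizes = []

# Arbitrary input sizes, chosen only for illustration.
for count in (150, 15_000, 150_000):
    as_list = [num for num in range(count)]
    as_generator = (num for num in range(count))
    list_sizes.append(sys.getsizeof(as_list))
    generator_sizes.append(sys.getsizeof(as_generator))
    print(count, "items: list", list_sizes[-1], "bytes, generator",
          generator_sizes[-1], "bytes")
```

The list size should climb with every step, while the generator size should stay constant, because the generator stores only the state it needs to produce the next value.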
Generators provide three special methods, which were introduced in PEP 342 and have been available since Python 2.5.
send() - It is a method used to send a value into a generator. The value passed to send() becomes the result of the yield expression at which the generator is paused, and the generator then runs on to the next yield. send() requires an argument, and calling send(None) is equivalent to calling next().
throw() - It is a method used to raise an exception inside the generator, at the point where it is paused. The generator can catch the exception and continue. If it does not handle the exception, the exception propagates back to the caller.
close() - It is a method used to stop a generator. It raises GeneratorExit inside the generator at the point where it is paused. This is really helpful when we want to stop a generator that would otherwise run in an infinite loop.
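The three methods can be tried together in one small sketch. The running_total() generator below is a hypothetical example written for this illustration. It keeps a running sum of the values sent to it and resets the sum when a ValueError is thrown into it.

```python
def running_total():
    """Hypothetical generator: adds up sent values and yields the running total."""
    total = 0
    try:
        while True:
            try:
                value = yield total   # send(x) makes this expression evaluate to x
            except ValueError:
                total = 0             # throw(ValueError(...)) lands here: reset
                continue
            if value is not None:
                total += value
    finally:
        print("generator closed")     # runs when close() raises GeneratorExit


gen = running_total()
start = next(gen)                     # prime the generator: run to the first yield
after_five = gen.send(5)              # total becomes 5
after_ten = gen.send(10)              # total becomes 15
after_none = gen.send(None)           # same as next(gen): total is unchanged
after_reset = gen.throw(ValueError("reset"))  # handled inside, total back to 0
gen.close()                           # stops the otherwise infinite loop

print(start, after_five, after_ten, after_none, after_reset)
```

The generator has to be advanced to its first yield with next() before a real value can be sent, because a generator that has not started yet has no yield expression waiting to receive it.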
When you have a huge dataset that needs processing, you can't really do all the processing in a single place. Instead, we can create a pipeline of generators. Each stage in the pipeline receives an item, applies a transformation to it, and passes the transformed item on to the next stage. Because the stages are independent, we can even change the order of the transformations.
For example, if we want to process the data in a CSV file, we can split the work into stages: read the lines of the file one by one, split each line into a list of values, use the first row to identify the column names, create a dictionary for each row that maps the column names to its values, filter out any unwanted data, and apply whatever transformations we want to the remaining rows. Each stage is a generator, and together the generators function as a pipeline.
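The stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete CSV reader: the sample data, the column names, and all function names are hypothetical, an in-memory string stands in for a large file, and the naive split on commas would not cope with quoted fields (the standard csv module handles those).

```python
import io

# Hypothetical sample data standing in for a large CSV file on disk.
SAMPLE_CSV = """name,city,amount
Asha,Pune,120
Ben,Oslo,-5
Chen,Lima,300
"""


def read_lines(handle):
    """Stage 1: yield one line at a time without the trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def split_rows(lines):
    """Stage 2: split each line into a list of values."""
    for line in lines:
        yield line.split(",")


def to_dicts(rows):
    """Stage 3: use the first row as column names for the remaining rows."""
    header = next(rows)
    for row in rows:
        yield dict(zip(header, row))


def keep_positive(records):
    """Stage 4: filter out unwanted records (here, non-positive amounts)."""
    for record in records:
        if int(record["amount"]) > 0:
            yield record


# Wire the stages together. No line is read until sum() asks for a value.
lines = read_lines(io.StringIO(SAMPLE_CSV))
rows = split_rows(lines)
records = to_dicts(rows)
valid = keep_positive(records)
total = sum(int(record["amount"]) for record in valid)

print(total)  # 120 + 300 = 420
```

Only one row travels through the pipeline at a time, so memory use stays flat no matter how long the input is. Replacing io.StringIO(SAMPLE_CSV) with an open file object would leave the rest of the pipeline unchanged.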
As you have learned, generators simplify code, and generator expressions simplify it even further. They might be a little confusing at first, but with enough practice you will understand them completely. Then you will know how easy it is to code in Python with the help of generators.
Generators are especially useful when dealing with huge datasets. We can build pipelines with them and make the developer’s job easier, because calculations on the data are performed on demand. We can also use generators to simulate concurrency. Enjoy coding with Python!
5th April | 08:00 AM