The concept of iterators isn't in any way specific to Python. In the most general terms, an iterator is an object that is used to iterate (loop) over a sequence of items. However, different programming languages implement them differently or not at all. In Python, every for-loop uses an iterator, unlike in many other languages. In this post, I will cover how iterators are defined in Python, and along the way, we will take a look at iterables and what I call nextables.
Iterables
An iterable is an object that can be iterated over, for example, a list. In Python, for an object to be an iterable it has to implement the __iter__ method. This dunder method takes only the object instance, self, as input and has to return an iterator object. You can use the built-in function iter to get the iterator of an iterable object.
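As a small sketch of what this looks like in practice (the variable names are my own, chosen for illustration), calling iter on a list hands back a separate iterator object:

```python
numbers = [1, 2, 3]

# iter() calls numbers.__iter__() behind the scenes
numbers_iterator = iter(numbers)

# The list is iterable because it has an __iter__ method
has_iter = hasattr(numbers, "__iter__")

# The iterator is a different object from the list it came from
is_same_object = numbers_iterator is numbers

print(has_iter)        # True
print(is_same_object)  # False
```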
Note that an iterable isn't necessarily an iterator as it doesn't actually perform the iteration. You could have a separate iterator object that is returned from the iterable class, rather than the class handling its own iteration. But, more on this later.
Iterators
Let's move on to the actual iterators, the workhorse of iteration (especially in Python). Iterators are an abstraction layer that encapsulates the knowledge of how to take items from some sequence. I am intentionally very general here as the "sequence" can be anything from lists and files to data streams from a database or remote service. What's cool about iterators is that the code using the iterator doesn't even have to know what source is used. Instead, it can focus on one thing only and that is, "what should I do with each element?".
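Here is a minimal sketch of that idea. The function count_characters is a hypothetical helper of my own, and io.StringIO stands in for a real file. The point is only that the same loop works regardless of where the items come from:

```python
import io


def count_characters(lines):
    """Sum the length of every item, without caring where the items come from."""
    total = 0
    for line in lines:
        total += len(line)
    return total


# Source 1: a plain list of strings
from_list = count_characters(["ab", "cde"])

# Source 2: a file-like object (io.StringIO stands in for a real file here)
from_file = count_characters(io.StringIO("ab\ncde\n"))

print(from_list)  # 5
print(from_file)  # 7, since each line keeps its trailing newline
```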
Non-iterator Iteration
To better understand the benefits of iterators, let's take a short look at iteration without iterators. An example of non-iterator iteration is the classic C-style for-loop. This style exists not only in C but also in, e.g., C++, Go, and JavaScript.
An example of this in JavaScript:
let numbers = [1, 2, 3, 4, 5]
for (let i = 0; i < numbers.length; i++) {
    // Extract element
    const current_number = numbers[i]
    // Perform action on element
    const squared = current_number ** 2
    console.log(squared)
}
Here we see how this type of for-loop bundles the extraction of each element with the action performed on it.
All Python for-loops Use Iterators
In Python, we don't have C-style for-loops. Instead, Python's for-loops are what other languages might call "for each"-loops, a type of loop that uses iterators. This means that every for-loop you write in Python has to use an iterator.
First, let's look at the syntactically closest equivalent of the previous example:
numbers = [1, 2, 3, 4, 5]
for i in range(len(numbers)):
    number = numbers[i]
    squared = number ** 2
    print(squared)
Yes, I know I should just have looped over numbers, but I wanted to push it a bit closer to the JavaScript for-loop. Here we need to perform the extraction ourselves, so we are not using an iterator over numbers. Rather, we are creating a range over the indices of numbers, and the for-loop iterates over that range instead. While this is relatively close to the JavaScript example, it still works at a higher abstraction level.
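For comparison, this is roughly what looping over numbers directly looks like. The squares list is my own addition, there only to collect the results:

```python
numbers = [1, 2, 3, 4, 5]
squares = []  # only here to collect the results

# The for-loop extracts each element for us; no index needed
for number in numbers:
    squared = number ** 2
    squares.append(squared)
    print(squared)
```

The loop body now only deals with what to do with each element.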
If you look closely at the JavaScript example, you can see how we actually tell the loop when to end (i < numbers.length) and also how to increment (i++). So to get Python code closer to this abstraction level, we would actually need to write something like this:
numbers = [1, 2, 3, 4, 5]
i = 0
while i < len(numbers):
    number = numbers[i]
    squared = number ** 2
    print(squared)
    i += 1
Here we initialize i, define what condition should be met to exit the loop, and how to increment. We need to very explicitly manage the value of i, something not required when using range.
Python Iterator Protocol
In the Python documentation, an iterator is defined as an object that implements __next__ and __iter__. By this definition, an iterator is also an iterable (it implements __iter__). It is also what I will call a nextable, as it implements the __next__ method. No, nextable is not a commonly used term, and that's because you might as well make it into an iterator. You see, the __iter__ method is trivial to implement for iterators. The iterator definition actually spells out what the method should do:
class MyABCIterator:
    ...
    def __iter__(self):
        return self
That's it, the method just returns a reference to the iterator itself. So if you copy-paste this __iter__ code into your nextable, you have an iterator. I named the class MyABCIterator as I will build this into an iterator that iterates over the alphabet.
Now let's make it into an iterator by making it nextable as well. The __next__ method should return the next object in the sequence. It also has to raise StopIteration when we have reached the end of the sequence, often referred to as having "exhausted the iterator". In this case, that is when we have reached the end of the alphabet.
I will use a handy import, ascii_lowercase, from the string module in Python's standard library. This is a string of all lowercase letters in the English alphabet.
>>> from string import ascii_lowercase
>>> ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
Okay, now let's look at the code for our class, then I will explain the implementation.
from string import ascii_lowercase

class MyABCIterator:
    def __init__(self):
        self.index = 0

    def __next__(self):
        if self.index >= len(ascii_lowercase):
            raise StopIteration()
        char = ascii_lowercase[self.index]
        self.index += 1
        return char

    def __iter__(self):
        return self
To know which character to return each time __next__ is called, we need an index. Therefore, we add an __init__ to our class, where we initialize self.index to zero. Then, on each call to __next__, we first check whether we have reached the end of the alphabet. If the index is past the end of the string, we raise StopIteration, as dictated in the Python documentation. Next, we extract the current character before incrementing self.index; otherwise, we would start with b instead of a. Finally, we increment the index and return the character previously extracted.
Now, let's try it out in a for-loop:
>>> for char in MyABCIterator():
...     print(char)
a
b
c
d
e
...
I truncated the output as the alphabet isn't that interesting now, is it? This iterator, as you might expect, is utterly useless. We could just as well have iterated over ascii_lowercase directly. But I hope you learned something about iterators at least.
A final note on the iterator protocol. Note how the for-loop does all the work in using the protocol. It automatically gets the iterator using __iter__ and repeatedly advances it using __next__. This follows Python's pattern of dunder methods not being called directly, but rather serving as a way for us to hook into Python's syntax or built-in functions.
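To make the for-loop's work visible, here is a rough hand-written equivalent using the built-ins iter and next. It is a sketch of the behavior, not how the interpreter literally implements the loop, and the variable names are my own:

```python
numbers = [1, 2, 3]
squares = []

# What the for-loop does first: ask the iterable for an iterator (__iter__)
iterator = iter(numbers)

while True:
    try:
        # On each pass: ask the iterator for the next item (__next__)
        number = next(iterator)
    except StopIteration:
        # The iterator is exhausted, so the loop ends quietly
        break
    squares.append(number ** 2)

print(squares)  # [1, 4, 9]
```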
Nextables
Now that we have properly implemented an iterator, let's experiment a bit. The first question I'd like to ask is:
What does Python think about looping over a nextable?
For this, I removed the __iter__ method from the previous example, resulting in:
from string import ascii_lowercase

class MyABCNextable:
    def __init__(self):
        self.index = 0

    def __next__(self):
        if self.index >= len(ascii_lowercase):
            raise StopIteration()
        char = ascii_lowercase[self.index]
        self.index += 1
        return char
Trying to create a for-loop with this object gives us: TypeError: 'MyABCNextable' object is not iterable. Which, you know, isn't that surprising. The interpreter can't find an __iter__ to call and can, therefore, not create an iterator.
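A quick sketch of what still works and what doesn't: the built-in next is happy with a nextable, while iter is not. The class is repeated so the snippet runs on its own, and the variable names are mine:

```python
from string import ascii_lowercase


class MyABCNextable:
    def __init__(self):
        self.index = 0

    def __next__(self):
        if self.index >= len(ascii_lowercase):
            raise StopIteration()
        char = ascii_lowercase[self.index]
        self.index += 1
        return char


nextable = MyABCNextable()

# next() only needs __next__, so this works fine
first = next(nextable)
second = next(nextable)
print(first, second)  # a b

# iter() needs __iter__, which we removed, so this fails
try:
    iter(nextable)
    iter_error = None
except TypeError as error:
    iter_error = str(error)

print(iter_error)
```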
Python Separates the Iterator from the Sequence
I started playing around with the built-in sequences and made a fun little discovery. In Python, the sequences aren't iterators themselves. Rather, each has a corresponding iterator class responsible for the iteration. Let's look at range as an example.
>>> range_0_to_9 = range(10)
>>> type(range_0_to_9)
<class 'range'>
range() gives us an object of type range. Now let's see what happens when we try to use next on this object.
>>> next(range_0_to_9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'range' object is not an iterator
So if the range object isn't an iterator, what do we get from iter?
>>> range_0_to_9_iterator = iter(range_0_to_9)
>>> type(range_0_to_9_iterator)
<class 'range_iterator'>
Just to verify, we can use next with the range_iterator.
>>> next(range_0_to_9_iterator)
0
>>> next(range_0_to_9_iterator)
1
Here we get back a range_iterator object that is actually responsible for doing the iteration. This seems to be the case for all the built-in container types, including lists, dictionaries, and sets. It seems that in CPython, the developers have chosen to separate the iteration of an object from the object itself.
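If you want to check this yourself, a sketch along these lines should do. It uses collections.abc.Iterator from the standard library to test each object. The exact iterator type names it prints are CPython details and may differ between versions, so I'm not listing them here:

```python
from collections.abc import Iterator

containers = [[1, 2], {"a": 1}, {1, 2}, (1, 2), "ab"]

results = []
for container in containers:
    iterator = iter(container)
    results.append((
        type(container).__name__,         # e.g. the container's type name
        type(iterator).__name__,          # the separate iterator's type name
        isinstance(container, Iterator),  # is the container itself an iterator?
        isinstance(iterator, Iterator),   # is the object from iter() an iterator?
    ))

for row in results:
    print(row)
```

In each row, the container itself is not an iterator, while the object returned from iter is.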
Creating a Separate Iterable and Nextable
Armed with this new knowledge of separating the iterable from the iterator, I got a new idea:
Can I return a plain nextable from the iterable?
The iterable is really easy:
class MyABCIterable:
    def __iter__(self):
        return MyABCNextable()
This is just a wrapper around our nextable from the nextables example. Then we write the loop:
for char in MyABCIterable():
    print(char)
It works! So, while the Python documentation says that the __iter__ method must return an iterator (implementing both __iter__ and __next__), this isn't demanded by the for-loop.
However, this setup is fragile. Going back to our properly implemented iterator, this code will work:
my_abc = iter(MyABCIterator())
for char in my_abc:
    print(char)
However, with our iterable + nextable combo, it doesn't:
my_abc = iter(MyABCIterable())
for char in my_abc:
    print(char)
It raises TypeError: 'MyABCNextable' object is not iterable. This is the reason the iterator protocol is defined the way it is. It makes it possible to pass around the iterator and still loop over it. For example, you could first create the iterator and then pass it off to another function. Our hacky solution wouldn't work for this.
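To make that concrete, here is a sketch of handing a partly consumed iterator to another function. The helper skip_vowels is a hypothetical name of my own, and the iterator class is repeated so the snippet runs on its own:

```python
from string import ascii_lowercase


class MyABCIterator:
    def __init__(self):
        self.index = 0

    def __next__(self):
        if self.index >= len(ascii_lowercase):
            raise StopIteration()
        char = ascii_lowercase[self.index]
        self.index += 1
        return char

    def __iter__(self):
        return self


def skip_vowels(chars):
    """Hypothetical helper: loops over whatever iterator it is handed."""
    return [char for char in chars if char not in "aeiou"]


my_abc = iter(MyABCIterator())

# Consume the first item ourselves...
first = next(my_abc)

# ...then hand the same iterator off; it continues where we left off
rest = skip_vowels(my_abc)

print(first)     # a
print(rest[:3])  # ['b', 'c', 'd']
```

This only works because the iterator returns itself from __iter__, so the loop inside the helper can pick it up again.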
Final Notes
I just want to end with some key takeaways from this post. The first one I think most of you know: all for-loops in Python use iterators! We also took a look at how iterators allow us to separate the code doing the iteration from the code working with each item. I also hope you learned a bit more about how iterators work in Python and what the iterator protocol is.
The final takeaway is a bit more abstract; it actually doesn't have anything to do with iterators or Python at all. Rather, it is how code might seem to work well until you stumble on the case where it breaks. If I had called it a day after just trying the iterable + nextable combination in a for-loop, I wouldn't have discovered how it breaks when passing around the iterator.
That's it for now and I hope you enjoyed this deeper look at Python's iterator protocol!
Edit (2022-01-19):
There was some discussion about how CPython only requires __next__ on the object returned from __iter__ for, e.g., for-loops to work. So the question was whether the definition of an iterator should include just __next__ or not. The Python Steering Council decided that the protocol should still be defined as __next__ and __iter__, but that it is a CPython implementation detail that it is inconsistent in validating this (reference comment on GitHub).
So it seems everything in the post is still correct, but there will be an addition to the documentation noting this implementation detail of CPython. So, really, just implement __iter__ for your iterators. Swapping to other Python runtimes might cause issues otherwise.
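One way to check your own classes, sketched here under the assumption that you are fine leaning on the standard library, is collections.abc.Iterator. Its isinstance check looks for both __iter__ and __next__, and subclassing it provides an __iter__ that returns self. The class bodies below are repeated from the post, with only the base class added:

```python
from collections.abc import Iterator
from string import ascii_lowercase


class MyABCNextable:
    """Only __next__, as in the nextable example."""

    def __init__(self):
        self.index = 0

    def __next__(self):
        if self.index >= len(ascii_lowercase):
            raise StopIteration()
        char = ascii_lowercase[self.index]
        self.index += 1
        return char


class MyABCIterator(Iterator):
    """Subclassing Iterator supplies __iter__ (returning self) for us."""

    def __init__(self):
        self.index = 0

    def __next__(self):
        if self.index >= len(ascii_lowercase):
            raise StopIteration()
        char = ascii_lowercase[self.index]
        self.index += 1
        return char


nextable_is_iterator = isinstance(MyABCNextable(), Iterator)
iterator_is_iterator = isinstance(MyABCIterator(), Iterator)

print(nextable_is_iterator)  # False
print(iterator_is_iterator)  # True
```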