iterator, iterable, iteration, iterator protocol, generator functions, generator expressions, and lazy evaluation
DynamicPL/Python, 2019. 10. 22. 08:44
1. Overview
In Python, iterable and iterator have specific meanings. This post covers:
- How iteration in Python works under the hood
- What iterables and iterators are and how to create them
- What the iterator protocol is
- What lazy evaluation is
- What generator functions and generator expressions are
2. Description
2.1 Loop
2.1.1 For Loop
Python has a for loop, but it works like the foreach loop of other languages: there is no initialization, condition, or increment section. Under the hood, Python’s for loop uses iterators.
    numbers = [10, 12, 15, 18, 20]
    for number in numbers:
        print(number)
2.1.2 Indices
Without a for loop, we could try to loop over a list manually with a while loop and an index.

    index = 0
    numbers = [1, 2, 3, 4, 5]
    while index < len(numbers):
        print(numbers[index])
        index += 1

    #output
    1
    2
    3
    4
    5
It seems that this approach works very well for lists and other sequence objects. What about the non-sequence objects? They don’t support indexing, so this approach will not work for them.
    index = 0
    numbers = {1, 2, 3, 4, 5}
    while index < len(numbers):
        print(numbers[index])
        index += 1

    #output
    --------------------------------------------------------------------
    TypeError                          Traceback (most recent call last)
    <ipython-input-22-af1fab82d68f> in <module>()
          2 numbers = {1, 2, 3, 4, 5}
          3 while index < len(numbers):
    ----> 4     print(numbers[index])
          5     index += 1

    TypeError: 'set' object does not support indexing
But how does Python’s for loop work on these iterables, then? We can see that it works with sets.
    numbers = {1, 2, 3, 4, 5}
    for number in numbers:
        print(number)

    #output
    1
    2
    3
    4
    5
2.1.3 Loop without for loop
We can try to define a function that loops through an iterable without using a for loop.
To achieve this, we need to:
- Create an iterator from the given iterable
- Repeatedly get the next item from the iterator
- Execute the wanted action
- Stop looping if we get a StopIteration exception while trying to get the next item
    def custom_for_loop(iterable, action_to_do):
        iterator = iter(iterable)
        done_looping = False
        while not done_looping:
            try:
                item = next(iterator)
            except StopIteration:
                done_looping = True
            else:
                action_to_do(item)
Let’s try to use this function with a set of numbers and the print built-in function.
    numbers = {1, 2, 3, 4, 5}
    custom_for_loop(numbers, print)

    #output
    1
    2
    3
    4
    5
We can see that the function we’ve defined works very well with sets, which are not sequences. This time we can pass any iterable and it will work. Under the hood, all forms of looping over iterables in Python work this way.
2.1 iterable
An iterable is an object capable of returning its members one by one. In other words, an iterable is anything that you can loop over with a for loop in Python.
An iterable is an object that has an __iter__ method that returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
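As a small sketch of the second case, the hypothetical class Countdown below (it is not part of the original post) defines only __getitem__. Python should still accept it wherever an iterable is expected, because iter() falls back to calling __getitem__ with 0, 1, 2, ... until an IndexError is raised.

```python
class Countdown:
    """Hypothetical example class: iterable through __getitem__ only."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Valid indexes are 0 .. start - 1; any other index ends the iteration.
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


countdown = Countdown(3)
print(hasattr(countdown, "__iter__"))  # False
print(list(countdown))                 # [3, 2, 1]
for value in countdown:
    print(value)
```

Defining __iter__ is the more common choice; the __getitem__ fallback mostly matters for older sequence-style classes.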
2.1.1 Iterable but not sequence
Many things in Python are iterables, but not all of them are sequences. Dictionaries, file objects, sets, and generators are all iterables, but none of them is a sequence.
    my_set = {2, 3, 5}
    my_dict = {"name": "Ventsislav", "age": 24}
    my_file = open("file_name.txt")
    squares = (n**2 for n in my_set)
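One way to check the distinction, assuming the abstract base classes in collections.abc are an acceptable test, is to compare isinstance() results for Iterable and Sequence. The samples dictionary is only an illustration. Note that this check does not detect objects that are iterable solely through __getitem__.

```python
from collections.abc import Iterable, Sequence

samples = {
    "list": [1, 2, 3],
    "str": "abc",
    "set": {2, 3, 5},
    "dict": {"name": "Ventsislav"},
    "generator": (n**2 for n in range(3)),
}

# Every sample is iterable, but only the list and the string are sequences.
for name, obj in samples.items():
    print(name, isinstance(obj, Iterable), isinstance(obj, Sequence))
```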
2.2 iterator
An iterator is an object representing a stream of data. You can create an iterator object by applying the iter() built-in function to an iterable. It does the iterating over an iterable. An iterator has a next (Python 2) or __next__ (Python 3) method.
    numbers = [10, 12, 15, 18, 20]
    fruits = ("apple", "pineapple", "blueberry")
    message = "I love Python ❤️"

    print(iter(numbers))
    print(iter(fruits))
    print(iter(message))

    #output
    <list_iterator object at 0x000001DBCEC33B70>
    <tuple_iterator object at 0x000001DBCEC33B00>
    <str_iterator object at 0x000001DBCEC33C18>
You can use an iterator to manually loop over the iterable it came from. Repeatedly passing the iterator to the built-in function next() returns successive items in the stream. Once you have consumed an item from an iterator, it’s gone. When no more data is available, a StopIteration exception is raised.
    values = [10, 20, 30]
    iterator = iter(values)
    print(next(iterator))
    print(next(iterator))
    print(next(iterator))
    print(next(iterator))

    #output
    10
    20
    30
    --------------------------------------------------------------------
    StopIteration                      Traceback (most recent call last)
    <ipython-input-14-fd36f9d8809f> in <module>()
          4 print(next(iterator))
          5 print(next(iterator))
    ----> 6 print(next(iterator))

    StopIteration:
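If the exception is not wanted, next() also accepts a second argument that is returned when the iterator is exhausted. The sketch below uses a hypothetical sentinel object named _SENTINEL so that a legitimate None value in the data would not be mistaken for the end of the stream.

```python
values = [10, 20, 30]
iterator = iter(values)

_SENTINEL = object()  # hypothetical marker meaning "no more items"
collected = []
while True:
    item = next(iterator, _SENTINEL)  # returns _SENTINEL instead of raising
    if item is _SENTINEL:
        break
    collected.append(item)

print(collected)             # [10, 20, 30]
print(next(iterator, None))  # None
```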
Iterators are also iterables that act as their own iterators. The difference is that iterators don’t have some of the features that some iterables have: they don’t have a length and can’t be indexed.
    numbers = [100, 200, 300]
    iterator = iter(numbers)
    print(len(iterator))

    #output
    --------------------------------------------------------------------
    TypeError                          Traceback (most recent call last)
    <ipython-input-15-778b5f9befc3> in <module>()
          1 numbers = [100, 200, 300]
          2 iterator = iter(numbers)
    ----> 3 print(len(iterator))

    TypeError: object of type 'list_iterator' has no len()
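Another feature iterators lack is repeatability. A minimal sketch: a second pass over the same iterator finds nothing, while the list it came from can be looped over again.

```python
numbers = [100, 200, 300]
iterator = iter(numbers)

first_pass = list(iterator)   # consumes every item
second_pass = list(iterator)  # nothing is left to consume
print(first_pass)   # [100, 200, 300]
print(second_pass)  # []

# The list itself can be looped over again at any time.
print(list(numbers))  # [100, 200, 300]
```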
2.2.1 Example
- Enumerate

    fruits = ("apple", "pineapple", "blueberry")
    iterator = enumerate(fruits)
    print(type(iterator))
    print(next(iterator))

    #output
    <class 'enumerate'>
    (0, 'apple')

- Reversed

    fruits = ("apple", "pineapple", "blueberry")
    iterator = reversed(fruits)
    print(type(iterator))
    print(next(iterator))

    #output
    <class 'reversed'>
    blueberry

- Zip

    numbers = [1, 2, 3]
    squares = [1, 4, 9]
    iterator = zip(numbers, squares)
    print(type(iterator))
    print(next(iterator))
    print(next(iterator))

    #output
    <class 'zip'>
    (1, 1)
    (2, 4)

- Map

    numbers = [1, 2, 3, 4, 5]
    squared = map(lambda x: x**2, numbers)
    print(type(squared))
    print(next(squared))
    print(next(squared))

    #output
    <class 'map'>
    1
    4

- Filter

    numbers = [-1, -2, 3, -4, 5]
    positive = filter(lambda x: x > 0, numbers)
    print(type(positive))
    print(next(positive))

    #output
    <class 'filter'>
    3

- File

    file = open("example.txt")
    print(type(file))
    print(next(file))
    print(next(file))
    print(next(file))
    file.close()

    #output
    <class '_io.TextIOWrapper'>
    This is the first line.
    This is the second line.
    This is the third line.
- items() of dictionary
    my_dict = {"name": "Ventsislav", "age": 24}
    # items() returns a view object: it is iterable, but it is not itself an iterator
    items_view = my_dict.items()
    print(type(items_view))
    for key, item in items_view:
        print(key, item)

    #output
    <class 'dict_items'>
    name Ventsislav
    age 24
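Unlike the other entries in this list, the object returned by items() is an iterable view rather than an iterator. The sketch below, which uses illustrative variable names, shows that next() rejects the view directly and that iter() is needed to obtain an actual iterator from it.

```python
my_dict = {"name": "Ventsislav", "age": 24}
items_view = my_dict.items()

try:
    next(items_view)  # a view has no __next__ method
    view_is_iterator = True
except TypeError:
    view_is_iterator = False
print(view_is_iterator)  # False

# iter() turns the view into an iterator that next() accepts.
items_iterator = iter(items_view)
print(next(items_iterator))  # ('name', 'Ventsislav')
print(next(items_iterator))  # ('age', 24)
```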
2.2.2 Custom iterator
In some cases, we may want to create a custom iterator. We can do that by defining a class that has __init__, __next__, and __iter__ methods. Let’s try to create a custom iterator class that generates numbers between min value and max value.
    class generate_numbers:
        def __init__(self, min_value, max_value):
            self.current = min_value
            self.high = max_value

        def __iter__(self):
            return self

        def __next__(self):
            if self.current > self.high:
                raise StopIteration
            else:
                self.current += 1
                return self.current - 1

    numbers = generate_numbers(40, 50)
    print(type(numbers))
    print(next(numbers))
    print(next(numbers))
    print(next(numbers))

    #output
    <class '__main__.generate_numbers'>
    40
    41
    42
However, it is much easier to use a generator function or generator expression to create a custom iterator.
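As a hedged illustration of that point, the hypothetical class NumberRange below writes __iter__ as a generator function. It keeps the inclusive upper bound of the class above, needs no __next__ method, and each call to __iter__ returns a fresh generator, so the object can be looped over more than once.

```python
class NumberRange:
    """Hypothetical iterable: every __iter__ call returns a new generator."""

    def __init__(self, min_value, max_value):
        self.min_value = min_value
        self.max_value = max_value

    def __iter__(self):
        # The yield makes this method a generator function.
        current = self.min_value
        while current <= self.max_value:
            yield current
            current += 1


numbers = NumberRange(40, 42)
print(list(numbers))  # [40, 41, 42]
print(list(numbers))  # [40, 41, 42]
```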
2.3 Sequences
2.3.1 Definition
In Math: $S=x_{1},x_{2},x_{3},\cdots $
Note the sequence of indices: 1, 2, 3, ...
We can refer to any item in the sequence by its index number, e.g. S[2]
So we have a concept of the first element, the second element, and so on
Python lists have a concept of positional order, but sets do not
- A list is a sequence type
- A set is not a sequence type
2.3.2 Built-In Sequence Types
- Mutable
  - lists
  - bytearrays
- Immutable
  - strings
  - tuples: in reality, a tuple is more than just a sequence type
  - range: more limited than lists, strings, and tuples
  - bytes
- Additional Standard Types
  - collections package
    - namedtuple
    - deque
  - array module
    - array
2.3.3 Homogeneous vs Heterogeneous Sequences
- homogeneous
  - each element is of the same type (a character)
    - 'python'
  - strings
- heterogeneous
  - each element may be a different type
    - [1, 10.5, 'python']
  - lists
Homogeneous sequence types are usually more efficient, at least in terms of storage. For example, prefer a string of characters to a list or tuple of characters.
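A rough way to see the storage difference, assuming CPython and sys.getsizeof() as the measuring tool, is to compare a string with a list of its characters. The figure for the list counts only the list object and its references, not the character objects themselves, so the real gap is at least this large.

```python
import sys

text = "python" * 1000      # one string of 6000 characters
chars_list = list(text)     # a list of 6000 one-character strings

print(sys.getsizeof(text))        # size of the string object in bytes
print(sys.getsizeof(chars_list))  # size of the list object alone in bytes
print(sys.getsizeof(text) < sys.getsizeof(chars_list))  # True
```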
2.3.4 Features
Sequences are a very common type of iterable. Any sequence type is iterable, but an iterable is not necessarily a sequence type: iterables are more general. Some examples of built-in sequence types are lists, strings, and tuples.
    numbers = [10, 12, 15, 18, 20]
    fruits = ("apple", "pineapple", "blueberry")
    message = "I love Python ❤️"
They support efficient element access using integer indices via the __getitem__() special method (indexing) and define a __len__() method that returns the length of the sequence.
Also, we can use the slicing technique on them.
    # Slicing the sequences
    print(numbers[:2])
    print(fruits[1:])
    print(message[2:])
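To connect the syntax to the special methods, the short sketch below calls the methods directly on a list. This is only for illustration; in normal code the len() function and the square-bracket syntax are preferred.

```python
numbers = [10, 12, 15, 18, 20]

# len() delegates to __len__, and indexing delegates to __getitem__.
print(len(numbers) == numbers.__len__())     # True
print(numbers[1] == numbers.__getitem__(1))  # True

# A slice such as numbers[:2] is passed to __getitem__ as a slice object.
print(numbers[:2] == numbers.__getitem__(slice(None, 2)))  # True
```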
2.4 iteration
Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.
2.5 Iterator Protocol
- iterator.__iter__()
  Return the iterator object itself. This is required to allow both containers (also called collections) and iterators to be used with the for and in statements.
- iterator.__next__()
  Return the next item from the container. If there are no more items, raise the StopIteration exception.
From the method descriptions above, we see that we can loop over an iterator. So, iterators are also iterables.
Remember that when we apply the iter() function to an iterable we get an iterator. If we call the iter() function on an iterator, it will always give us the same iterator back.
    numbers = [100, 200, 300]
    iterator1 = iter(numbers)
    iterator2 = iter(iterator1)

    # Check if they are the same object
    print(iterator1 is iterator2)

    for number in iterator1:
        print(number)

    #output
    True
    100
    200
    300
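A practical consequence, sketched below with illustrative variable names, is that calling iter() on a list twice gives two independent iterators, while calling iter() on an iterator gives a second name for the same object, so both names share one position in the stream.

```python
numbers = [100, 200, 300]

# Two iter() calls on the list give two independent iterators.
first = iter(numbers)
second = iter(numbers)
print(next(first), next(second))  # 100 100

# iter() on an iterator returns the same object, so the position is shared.
alias = iter(first)
print(alias is first)  # True
print(next(alias))     # 200
print(next(first))     # 300
```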
3. Difference between iterable and iterator
- An iterable is something you can loop over.
- An iterator is an object representing a stream of data. It does the iterating over an iterable.
Additionally, in Python, the iterators are also iterables which act as their own iterators.
However, the difference is that iterators don’t have some of the features that some iterables have. They don’t have length and can’t be indexed.
    numbers = [100, 200, 300]
    iterator = iter(numbers)
    print(len(iterator))

    #output
    --------------------------------------------------------------------
    TypeError                          Traceback (most recent call last)
    <ipython-input-15-778b5f9befc3> in <module>()
          1 numbers = [100, 200, 300]
          2 iterator = iter(numbers)
    ----> 3 print(len(iterator))

    TypeError: object of type 'list_iterator' has no len()
4. Lazy evaluation
Iterators allow us to both work with and create lazy iterables that don’t do any work until we ask them for their next item.
Because of their laziness, the iterators can help us to deal with infinitely long iterables. In some cases, we can’t even store all the information in the memory, so we can use an iterator which can give us the next item every time we ask it. Iterators can save us a lot of memory and CPU time.
This approach is called lazy evaluation.
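As a minimal sketch of an infinitely long iterable, the example below combines itertools.count(), which never stops on its own, with itertools.islice(), which takes only as many items as requested. The names evens and first_five are illustrative.

```python
from itertools import count, islice

# An endless stream of even numbers; nothing is computed until we ask for it.
evens = (n for n in count() if n % 2 == 0)

first_five = list(islice(evens, 5))
print(first_five)   # [0, 2, 4, 6, 8]

# The stream continues from where the previous request stopped.
print(next(evens))  # 10
```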
Many people use Python to solve Data Science problems. In some cases, the data you work with can be very large. In these cases, we can’t load all the data into memory.
The solution is to load the data in chunks, perform the desired operations on each chunk, discard the chunk, and load the next chunk of data. In other words, we need to create an iterator. We can achieve this by using the read_csv function in pandas. We just need to specify the chunksize.
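The same load, process, discard pattern can be sketched with the standard library alone. The helper iter_chunks below is hypothetical (it is not a pandas or standard library function) and relies only on itertools.islice() to pull a limited number of items at a time from any iterable.

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Hypothetical helper: yield lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # the source is exhausted
        yield chunk


total = 0
for chunk in iter_chunks(range(10), 4):
    print(chunk)         # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
    total += sum(chunk)  # process the chunk, then let it be discarded
print(total)  # 45
```

Only one chunk is held in memory at a time, which is the property that matters for large inputs.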
4.1 Large Dataset Example
In this example, we’ll see the idea with a small dataset called “iris species”, but the same concept will work with very large datasets, too. There are a lot of iterator objects in the Python standard library and in third-party libraries.
    import pandas as pd

    # Initialize an empty dictionary
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv("iris.csv", chunksize=10):
        # Iterate over the "species" column in DataFrame
        for entry in chunk["species"]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Print the populated dictionary
    print(counts_dict)

    #output
    {'Iris-setosa': 50, 'Iris-versicolor': 50, 'Iris-virginica': 50}
5. Generator Functions and Generator Expressions
5.1 Generator Functions
A generator function is a function that returns a generator iterator. It looks like a normal function except that it contains yield expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.
    def generate_numbers(min_value, max_value):
        while min_value < max_value:
            yield min_value
            min_value += 1

    numbers = generate_numbers(10, 20)
    print(type(numbers))
    print(next(numbers))
    print(next(numbers))
    print(next(numbers))

    #output
    <class 'generator'>
    10
    11
    12
The yield expression is what separates a generator function from a normal function. It is what lets us take advantage of the iterator’s laziness. Each yield temporarily suspends processing, remembering the execution state (including local variables and pending try-statements). When the generator iterator resumes, it picks up where it left off (in contrast to functions, which start fresh on every invocation).
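The suspend and resume behavior can be made visible with a small sketch. The generator function countdown and the events list are illustrative names; the list records which parts of the function body have run so far.

```python
events = []


def countdown(start):
    events.append("started")
    while start > 0:
        yield start  # execution pauses here until the next request
        events.append(f"resumed after {start}")
        start -= 1


generator = countdown(2)
print(events)           # [] because calling the function runs none of its body
print(next(generator))  # 2
print(events)           # ['started']
print(next(generator))  # 1
print(events)           # ['started', 'resumed after 2']
```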
5.2 Generator Expressions
Generator expressions are very similar to list comprehensions. Just like a list comprehension, a generator expression is concise. In most cases, it is written in one line of code.
A generator expression is an expression that returns an iterator. It looks like a normal expression followed by a "for" expression defining a loop variable, range, and an optional "if" expression.
    numbers = [1, 2, 3, 4, 5]
    squares = (number**2 for number in numbers)
    print(type(squares))
    print(next(squares))
    print(next(squares))
    print(next(squares))

    #output
    <class 'generator'>
    1
    4
    9
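To illustrate the difference from a list comprehension, the sketch below builds the same squares both ways and compares the sizes reported by sys.getsizeof(). This assumes CPython, and the variable names are illustrative. The list holds every result at once, while the generator object stays small no matter how many items it will produce.

```python
import sys

squares_list = [n**2 for n in range(100_000)]  # all values computed now
squares_gen = (n**2 for n in range(100_000))   # values computed on demand

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both produce the same values when fully consumed.
total_from_gen = sum(squares_gen)
print(total_from_gen == sum(squares_list))  # True
```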
We can also add one or more if clauses after the iterable to filter its items. We can do it like this:
    numbers = [1, 2, 3, 4, 5]
    squares = (number**2 for number in numbers if number % 2 == 0 if number % 4 == 0)
    print(type(squares))
    print(list(squares))

    #output
    <class 'generator'>
    [16]
Also, we can add an if-else clause on the output expression like this:
    numbers = [1, 2, 3, 4, 5]
    result = ("even" if number % 2 == 0 else "odd" for number in numbers)
    print(type(result))
    print(list(result))

    #output
    <class 'generator'>
    ['odd', 'even', 'odd', 'even', 'odd']
6. Reference
https://github.com/fbaptiste/python-deepdive
https://realpython.com/python-for-loop/
https://code-maven.com/list-comprehension-vs-generator-expression
https://docs.python.org/3/tutorial/classes.html#iterators
https://docs.python.org/dev/howto/functional.html#iterators
https://docs.python.org/dev/library/stdtypes.html#iterator-types
https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration