Saturday, March 7, 2020

Python - Iterables


Iterables

v A central abstraction in Python is the notion of an iterable: an object from which you can fetch a sequence of other objects.
The act of fetching a sequence from an iterable object is known as iteration.

v What are Comprehensions in Python:

A concise syntax for describing lists, sets, or dictionaries in a declarative or functional style.
This shorthand is readable and expressive, meaning that comprehensions are very effective at communicating intent to human readers.


v List Comprehension:

Ø List comprehension is enclosed in square brackets just like a literal list, but instead of literal elements it contains a fragment of declarative code, which describes how to construct the elements of the list.

Ø General form of list comprehensions: [expr(item) for item in iterable].
That is, for each item in the iterable object on the right, we evaluate the expression on the left and use the result as the next element of the new list. The expression on the left is almost always written in terms of the item, but that is not mandatory.

Ø The source object can be any iterable object such as a tuple.
The expression can be any Python expression (which may or may not be in terms of the item).

Ø The type of object produced by a list comprehension is a regular list.

List Comprehension:
In [8]: words="An enhanced Interactive Python."
In [9]: print(words)
An enhanced Interactive Python.
In [10]: [len(word) for word in words.split()]
Out[10]: [2, 8, 11, 7]
In [17]: type([len(word) for word in words.split()])
Out[17]: list

Equivalent for loop:
In [14]: lengths=[]
In [15]: for word in words.split():
    ...:     lengths.append(len(word))
In [16]: print(lengths)
[2, 8, 11, 7]

v Set Comprehension:

Ø Sets support a similar comprehension syntax, using curly braces instead of square brackets.
Note that the resulting set is not stored in any meaningful order, since sets are unordered containers.
A set comprehension is a handy way to remove duplicates (a deduplication sketch follows the example below).

Ø General form of Set comprehensions: {expr(item) for item in iterable}
Set Comprehension
In [18]: {len(word) for word in words.split()}
Out[18]: {2, 7, 8, 11}
In [19]: type({len(word) for word in words.split()})
Out[19]: set
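
For instance, a set comprehension makes short work of removing duplicates. The following is a small sketch; the names list is illustrative data, not taken from the session above:

names = ["alice", "Bob", "ALICE", "bob", "carol"]
unique = {name.lower() for name in names}
# unique is {'alice', 'bob', 'carol'} - case-insensitive duplicates collapse into one entry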

v Dictionary Comprehension:

Ø Dictionary Comprehension also uses curly braces, but is distinguished from the set comprehension by the fact that we now provide two colon-separated expressions for the key and the value, which are evaluated in tandem for each item.

Ø General form of dictionary comprehensions: {key_expr: value_expr for item in iterable}

Dictionary Comprehension
In [20]: {word:len(word) for word in words.split()}
Out[20]: {'An': 2, 'enhanced': 8, 'Interactive': 11, 'Python.': 7}
In [21]: type({word:len(word) for word in words.split()})
Out[21]: dict

Ø One use for a dictionary comprehension is to invert a dictionary so we can perform efficient lookups in the opposite direction.
Note: Iterating directly over a dict yields only its keys. Use dict.items() to get the keys and values together, and then use tuple unpacking to access the key and value separately.
In [22]: from pprint import pprint as pp
    ...: d1={word:len(word) for word in words.split()}
In [23]: pp(d1)
{'An': 2, 'Interactive': 11, 'Python.': 7, 'enhanced': 8}
In [25]: invertedd1={v:k for k,v in d1.items()}
In [26]: pp(invertedd1)
{2: 'An', 7: 'Python.', 8: 'enhanced', 11: 'Interactive'}

Ø Caution: If a dictionary comprehension produces duplicate keys, later values will overwrite earlier ones.
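
A small sketch of this behaviour (the sample list is illustrative, not from the session above):

animals = ["cat", "dog", "cow"]
{len(animal): animal for animal in animals}
# All three words have length 3, so the result is {3: 'cow'} - only the last value for the key survives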

v Limit on expression complexity in Comprehensions: There is no limit to the complexity of the expression we can use in any of the comprehensions, but we should avoid going overboard; extract complex expressions into separate functions to preserve readability.

v Filter predicates in Comprehension: All three types of collection comprehension support an optional filtering clause, which allows us to choose which items of the source are evaluated by the expression on the left.
General form of a list comprehension with a filter predicate:
[expr(item) for item in iterable if predicate(item)]
In [28]: [x*x for x in range(1,11) if x%2==0]
Out[28]: [4, 16, 36, 64, 100]
v Comprehensions are often more readable than the alternative; however, sometimes a long or complex comprehension may be less readable than the equivalent for loop. There's no hard and fast rule about when one form should be preferred, but we should be conscientious when writing our code and try to choose the best form for the situation.

v Comprehensions should ideally be Purely Functional: They should have no side effects. If we need to create side effects such as printing to the console during iteration, use another construct such as a for loop instead.
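
For example, a comprehension written only for its side effect builds a throwaway list of None values, whereas a plain for loop states the intent directly. A minimal sketch:

items = [1, 2, 3]

# Avoid: [print(item) for item in items]  - works, but builds a useless list of None
for item in items:
    print(item)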

v Iterable and Iterator Protocol:

Ø Comprehensions and for loops are the most frequently used language features for performing iteration, that is, taking items one by one from a source and doing something with each in turn. However, both comprehensions and for loops iterate over the whole sequence by default, whereas sometimes finer-grained control is needed. This is provided by iterable and iterator objects.

Ø The iterable protocol allows us to pass an iterable object, usually a collection or stream of objects such as a list, to the built-in iter() function to get an iterator for it. In short, an iterable is any object that returns an iterator when passed to the iter() function.

Ø Iterator objects support the iterator protocol, which requires that we can pass the iterator object to the built-in next() function to fetch the next value from the underlying collection.

Ø Example: In the example below we ask our iterable object to give us an iterator using the built-in iter() function, and then request a value from the iterator using the next() function. Each call to next() moves the iterator through the sequence.

In [29]: iterable=["spring","summer","autumn","winter"]
In [30]: type(iterable)
Out[30]: list
In [31]: iterator=iter(iterable)
In [32]: type(iterator)
Out[32]: list_iterator
In [33]: next(iterator)
Out[33]: 'spring'
In [34]: next(iterator)
Out[34]: 'summer'
In [35]: next(iterator)
Out[35]: 'autumn'
In [36]: next(iterator)
Out[36]: 'winter'
In [37]: next(iterator)
Traceback (most recent call last):
File "<ipython-input-37-4ce711c44abc>", line 1, in <module>
next(iterator)
StopIteration
Note that when we reach the end, Python raises a StopIteration exception.

Ø Higher-level iteration constructs such as for loops and comprehensions are built directly upon this lower-level iteration protocol.
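
As a rough sketch, a for loop over an iterable behaves much like this explicit combination of iter(), next(), and StopIteration handling:

seasons = ["spring", "summer", "autumn", "winter"]
iterator = iter(seasons)
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break
    print(item)          # this plays the role of the for-loop body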

v Generators:

Ø Python generators provide the means for describing iterable series with code in functions. These series are evaluated lazily, meaning they only compute the next value on demand. This important property allows them to model infinite sequences of values with no definite end, such as streams of data from a sensor or active log files, for example simple code that sends events to an Event Hub.

Ø Generators are defined by any Python function which uses the yield keyword at least once in its definition (it may appear many times, possibly inside a loop). They may also contain the return keyword with no arguments, which is useful if we want to terminate the stream based on some condition. And just like any other function, there's an implicit return at the end of the definition.
In [38]: def gen123():
    ...:     yield 1
    ...:     yield 2
    ...:     yield 3
    ...:
    ...: g=gen123()
In [39]: type(g)
Out[39]: generator
In [40]: print(g)
<generator object gen123 at 0x0000020E121FED68>
Ø Generators are in fact Python iterators, so we can use the standard ways of working with iterators to retrieve successive values from the sequence. To retrieve the next value, we pass the iterator (or generator, in this case) to the built-in next() function.

Because generators are iterators, they can be used in all the usual Python constructs which expect iterators, such as for loops.
Like other iterators, if we call next() after the last item we get a StopIteration exception.

In [41]: next(g)
Out[41]: 1
In [42]: next(g)
Out[42]: 2
In [43]: next(g)
Out[43]: 3
In [44]: next(g)
Traceback (most recent call last):
File "<ipython-input-44-e734f8aca5ac>", line 1, in <module>
next(g)
StopIteration
Ø Each call to the generator function returns a new generator object. This means that each generator can be advanced independently.

In [46]: g1=gen123()
    ...: g2=gen123()
In [47]: next(g1)
Out[47]: 1
In [48]: next(g1)
Out[48]: 2
In [49]: next(g2)
Out[49]: 1

Ø The following shows how generators work internally. When the generator g is created, none of the code within the generator body has yet been executed. When we request the first value, the generator body runs up to and including the first yield statement: the code executes just far enough to literally yield the next value. When we call next(g) again, execution of the generator function resumes at the point it left off and continues running until the next yield. After the final value has been yielded, the next request causes the generator function to execute until it returns at the end of the function body, which in turn raises the expected StopIteration exception.

In [60]: def gen123():
    ...:     for i in range(3):
    ...:         print("About to yield "+str(i))
    ...:         yield i*i
In [61]: g=gen123()
In [62]: next(g)
About to yield 0
Out[62]: 0
In [63]: next(g)
About to yield 1
Out[63]: 1
In [64]: next(g)
About to yield 2
Out[64]: 4

Ø Note that generator functions, which resume execution each time the next value is requested, can maintain state in local variables. This means we can have counters defined inside the generator. We can use these stateful local variables to check conditions and perhaps exit the generator function.

The following shows a counter local variable that maintains state:

In [94]: def take(count,iterable1):
    ...:     counter=0
    ...:     for i in iterable1:
    ...:         if counter == count:
    ...:             return
    ...:         else:
    ...:             yield i
    ...:         counter = counter+1

In [95]: list1=["Mumbai","Delhi","Hyder","Bang"]
    ...: g1=take(2,list1)

In [96]: for j in g1:
    ...:     print(j)
Mumbai
Delhi

Ø Generators are lazy, meaning that computation only happens just in time, when the next result is requested. This interesting and useful property means that generators can be used to model infinite sequences, for example simulating a continuous stream of events to be sent to Azure Event Hubs. Since values are only produced as requested by the caller, and since no data structure needs to be built to contain the elements of the sequence, generators can safely be used to produce never-ending or very large sequences: sensor readings, mathematical sequences such as primes or factorials, or perhaps the contents of multi-terabyte files.
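
As a minimal sketch, here is a hypothetical generator standing in for an endless stream of sensor readings; the caller decides how many values to pull:

def sensor_readings():
    """Yield an unending series of made-up readings; the loop never ends on its own."""
    value = 0.0
    while True:
        yield value
        value += 0.5

readings = sensor_readings()
for _ in range(5):
    print(next(readings))    # prints 0.0, 0.5, 1.0, 1.5, 2.0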

v Generator Comprehension:

Ø Generator expressions are a cross between comprehensions and generator functions. They use a syntax similar to comprehensions, but they result in the creation of a generator object, which produces the specified sequence lazily.

Ø The syntax for generator expressions is very similar to that of list comprehensions: (expr(item) for item in iterable), delimited by parentheses instead of the square brackets used for list comprehensions.

Ø Generator expressions are useful for situations where you want the lazy evaluation of generators with the declarative concision of comprehensions.

In [8]: millionsquares=(i*i for i in range(100001))
In [9]: print(millionsquares)
<generator object <genexpr> at 0x00000262E5C95C00>
In [10]: list(millionsquares)[-10:]
Out[10]:
[9998200081,
9998400064,
9998600049,
9998800036,
9999000025,
9999200016,
9999400009,
9999600004,
9999800001,
10000000000]
In [11]: list(millionsquares)[-10:]
Out[11]: []
Note that at In [9] no squares have been created yet. We can force evaluation of a generator by converting it to a list.
Note that the generator itself consumes almost no memory, but converting it to a list materializes every element and consumes a significant amount of memory.

Ø Imp: Just like iterators, generators are single-use objects. Once exhausted, a generator cannot yield any more items.
Notice that the second time we try to fetch the last 10 elements, we get an empty list.

Ø Each time we call a generator function, we create a new generator object.
To recreate a generator from a generator expression, we must execute the expression itself once more.

Ø Memory Usage: Computing the sum of the squares of the first million or so numbers takes a lot of memory if we first create a list of them all. If we use a generator expression instead, we get the same result, but the amount of memory consumed is far smaller.

In [13]: sum(i*i for i in range(1000010))
Out[13]: 333342833423500285
In [14]: sum([i*i for i in range(1000010)])
Out[14]: 333342833423500285
Note that we didn't supply separate enclosing parentheses for the generator expression in addition to those needed for the sum function call. This elegant ability to have the parentheses used for the function call also serve for the generator expression aids readability. You can include the second set of parentheses if you wish, but it's not required.

Ø As with comprehensions, we can include an if clause at the end of the generator expression.
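
For example, a sketch of a generator expression with a filtering if clause, mirroring the earlier list comprehension:

even_squares = (x*x for x in range(1, 11) if x % 2 == 0)
print(list(even_squares))    # [4, 16, 36, 64, 100]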

v Additional Iteration Functionality:

Ø Python provides several built-in functions for performing common iterator operations. These functions form the core of a sort of vocabulary for working with iterators, and they can be combined to produce powerful statements in very concise, readable code. Examples are enumerate() for producing integer indices, and sum(), min(), and max() for aggregating numbers.

Ø The itertools module contains a wealth of useful functions and generators for processing iterable streams of data.

Ø itertools.islice allows us to perform lazy slicing, like the built-in list slicing functionality.

Ø itertools.count gives us an open-ended version of range(). (Note that range() is not open-ended; it needs to know where to stop.)

The following prints the first 1,000 prime numbers (it assumes an is_prime() predicate is defined elsewhere):

import itertools

for i in itertools.islice((x for x in itertools.count() if is_prime(x)), 1000):
    print(i)

Ø itertools.chain allows us to lazily concatenate iterables without having to create a new combined list, and thus without the memory impact of duplicating the data.
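
A small sketch of chain (the month lists are illustrative):

import itertools

spring = ["March", "April", "May"]
summer = ["June", "July", "August"]

# No combined list is built; items are drawn lazily from each source in turn.
for month in itertools.chain(spring, summer):
    print(month)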

v Iteration Built-ins:

Ø Boolean Aggregation:
o   The any() function determines whether any of the elements in a series are True.
o   The all() function determines whether all of the elements in a series are True.
Using a comprehension together with any() or all() makes it easy to apply a test across an entire iterable and get a single collective True or False result.
In [15]: any([True,False,False])
Out[15]: True
In [16]: all([True,False,False])
Out[16]: False
In [17]: all([True,True])
Out[17]: True
In [18]: any([x%2==0 for x in range(1,100)])
Out[18]: True
In [20]: names=['London','Tokya','Paros','Sydney']
    ...: all([name == name.title() for name in names])
Out[20]: True

Ø Zip: Synchronizes iteration over two or more iterables.
zip yields tuples when iterated, which means we can use tuple unpacking in the for loop.
zip can accept any number of iterable arguments.
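
A small sketch of zip with tuple unpacking (the temperature lists are illustrative):

sunday = [12, 14, 15, 15]
monday = [13, 14, 14, 16]

for sun, mon in zip(sunday, monday):
    print("average =", (sun + mon) / 2)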

v Summary:

Ø Comprehensions are a concise and readable syntax for describing lists, sets, and dictionaries in a declarative way. These comprehensions iterate over an iterable source object, applying an optional filter predicate and a mandatory expression. Both the filter and the expression are usually written in terms of the current item.

Ø Iterable objects are objects over which we can iterate item-by-item.

Ø We retrieve an iterator from an iterable using the built-in iter() function.

Ø Iterators produce items one-by-one from the underlying iterable series each time they are passed to the built-in next() function.

Ø When the series is exhausted, iterators raise a StopIteration exception.

Ø Generator functions look just like regular functions and have all the same facilities, but they must contain at least one instance of the yield keyword.

Ø Generators are iterators.

Ø When the iterator is advanced with next(), the generator starts or resumes execution up to and including the next yield statement.

Ø Each call to a generator function creates a new generator object.

Ø Generators can maintain state between calls in local variables and because they are lazy can model infinite series of data.

Ø Generator expressions are a sort of hybrid of generator functions and list comprehensions. These allow for a more declarative and concise way of creating generator objects.

Ø Python includes a rich set of tools for dealing with iterable series both in the form of built-in functions such as sum(), any(), and zip(), but also in the itertools module.
