Iterables
v A central abstraction in Python is the notion of an iterable: an object from which you can fetch a sequence of other objects. The act of fetching a sequence from an iterable object is known as iteration.
v What are Comprehensions in Python:
A concise syntax for describing lists, sets, or dictionaries in a declarative or functional style. This shorthand is readable and expressive, meaning that comprehensions are very effective at communicating intent to human readers.
v List
Comprehension:
Ø List comprehension is enclosed in square brackets just like a literal
list, but instead of literal elements it contains a
fragment of declarative code, which describes how to construct the
elements of the list.
Ø General form of list comprehensions: [expr(item) for item in iterable].
That is, for each item in the iterable object on the right, we evaluate the expression on the left and use that as the next element of the new list. The expression on the left is almost always in terms of the item, but that is not mandatory.
Ø The source object can be any iterable object, such as a tuple. The expression can be any Python expression (which may or may not be in terms of the item).
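Both points can be sketched briefly; the tuple source and the item-ignoring expression below are illustrative assumptions, not part of the transcript that follows:

```python
# The source can be any iterable, such as a tuple
temps = (12.5, 18.0, 21.3)

rounded = [round(t) for t in temps]  # expression in terms of the item
placeholders = [0 for t in temps]    # expression that ignores the item

print(rounded)       # [12, 18, 21]
print(placeholders)  # [0, 0, 0]
```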
Ø The type of object produced by list
comprehensions is a regular list
List Comprehension:
In [8]: words="An enhanced Interactive Python."
In [9]: print(words)
An enhanced Interactive Python.
In [10]: [len(word) for word in words.split()]
Out[10]: [2, 8, 11, 7]
In [17]: type([len(word) for word in words.split()])
Out[17]: list

Equivalent for loop:
In [14]: lengths=[]
In [15]: for word in words.split():
    ...:     lengths.append(len(word))
In [16]: print(lengths)
[2, 8, 11, 7]
v Set
Comprehension:
Ø Sets support a similar comprehension syntax using curly braces instead of square brackets. Note that the resulting set will not be stored in a meaningful order, since sets are unordered containers. A set comprehension is a convenient way to remove duplicates.
Ø General form of set comprehensions: {expr(item) for item in iterable}
Set Comprehension:
In [18]: {len(word) for word in words.split()}
Out[18]: {2, 7, 8, 11}
In [19]: type({len(word) for word in words.split()})
Out[19]: set
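Because sets cannot contain duplicates, a set comprehension also deduplicates derived values automatically; a minimal sketch (the sentence below is assumed sample data):

```python
words = "the quick brown fox jumps over the lazy dog"
# 'the' appears twice, so its first letter 't' collapses into one set element
first_letters = {w[0] for w in words.split()}
print(sorted(first_letters))  # ['b', 'd', 'f', 'j', 'l', 'o', 'q', 't']
```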
v Dictionary
Comprehension:
Ø Dictionary comprehensions also use curly braces but are distinguished from set comprehensions by the fact that we now provide two colon-separated expressions for the key and value, which are evaluated in tandem for each item.
Ø General form of dictionary comprehensions: {key_expr: value_expr for item in iterable}
Dictionary Comprehension:
In [20]: {word:len(word) for word in words.split()}
Out[20]: {'An': 2, 'enhanced': 8, 'Interactive': 11, 'Python.': 7}
In [21]: type({word:len(word) for word in words.split()})
Out[21]: dict
Ø One use for a dictionary comprehension is
to invert a dictionary so we can perform efficient lookups in the opposite
direction.
Note: Dictionary comprehensions do not iterate directly over the keys and values of a dict source. Use dict.items() to get the key-value pairs from dict sources, and then use tuple unpacking to access the key and value separately.
In [22]: from pprint import pprint as pp
...: d1={word:len(word) for word in words.split()}
In [23]: pp(d1)
{'An': 2, 'Interactive': 11, 'Python.': 7, 'enhanced': 8}
In [25]: invertedd1={v:k for k,v in d1.items()}
In [26]: pp(invertedd1)
{2: 'An', 7: 'Python.', 8: 'enhanced', 11: 'Interactive'}
Ø Caution: If a dictionary comprehension produces some identical keys, later keys will override earlier ones.
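A minimal sketch of this later-key-wins behavior (the city names are assumed sample data):

```python
# Both 'PUNE' and 'pune' produce the key 'pune'; the later value wins
cities = ['PUNE', 'Mumbai', 'pune']
by_lower = {c.lower(): c for c in cities}
print(by_lower)  # {'pune': 'pune', 'mumbai': 'Mumbai'}
```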
v Limit on expression complexity in Comprehensions: Although there is no limit to the complexity of the expression we can use in any of the comprehensions, we should avoid going overboard and instead extract complex expressions into separate functions to preserve readability.
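A sketch of this advice, pulling the complexity into a hypothetical helper named prime_factors so the comprehension itself stays short:

```python
def prime_factors(n):
    # Hypothetical helper: prime factors of n, with multiplicity
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# The comprehension reads cleanly because the complexity lives in the function
factor_counts = [len(prime_factors(n)) for n in range(2, 10)]
print(factor_counts)  # [1, 1, 2, 1, 2, 1, 3, 2]
```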
v Filter
predicates in Comprehension: All three types of collection comprehension
support an optional filtering clause, which allows us to choose which items of the source are
evaluated by the expression on the left.
General form of a list comprehension with a filter predicate: [expr(item) for item in iterable if predicate(item)]
In [28]: [x*x for x in range(1,11) if x%2==0]
Out[28]: [4, 16, 36, 64, 100]
v Comprehensions are often more readable than the alternative; however, sometimes a long or complex comprehension may be less readable than the equivalent for loop. There's no hard and fast rule about when one form should be preferred, but we should be conscientious when writing our code and try to choose the best form for the situation.
v Comprehensions
should ideally be Purely Functional: They should have no side effects.
If we need to create side effects such as printing to the console during iteration,
use another construct such as a for loop instead.
v Iterable
and Iterator Protocol:
Ø Comprehensions and for loops are the most
frequently used language features for performing iteration. That is, taking
items one-by-one from a source and doing something with each in turn. However,
both comprehensions and for loops iterate over the whole sequence by default, whereas sometimes more fine-grained control is needed. This is provided by iterable and iterator objects.
Ø The iterable protocol allows us to pass an iterable object, usually a collection or stream of objects such as a list, to the built-in iter() function to get an iterator for the iterable object. In short, an iterable is any object that returns an iterator when passed to the iter() function.
Ø Iterator
objects support the iterator protocol, which requires that we can pass the iterator object to the
built-in next() function to fetch
the next value from the underlying collection.
Ø Example: In the example below we ask our iterable object to give us an iterator using the built-in iter() function, and then request a value from the iterator using the next() function. Each call to next() moves the iterator through the sequence.
In [29]:
iterable=["spring","summer","autumn","winter"]
In [30]: type(iterable)
Out[30]: list
In [31]: iterator=iter(iterable)
In [32]: type(iterator)
Out[32]: list_iterator
In [33]: next(iterator)
Out[33]: 'spring'
In [34]: next(iterator)
Out[34]: 'summer'
In [35]: next(iterator)
Out[35]: 'autumn'
In [36]: next(iterator)
Out[36]: 'winter'
In [37]: next(iterator)
Traceback (most recent call last):
  File "<ipython-input-37-4ce711c44abc>", line 1, in <module>
    next(iterator)
StopIteration
Note that when we reach the end, Python raises a StopIteration exception.
Ø Higher-level iteration constructs such as
for loops and comprehensions are built directly upon this lower-level iteration
protocol.
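To make that connection concrete, a for loop can be sketched in terms of the lower-level protocol roughly like this:

```python
iterable = ["spring", "summer", "autumn", "winter"]

# Roughly what `for season in iterable: print(season)` does under the hood
iterator = iter(iterable)
while True:
    try:
        season = next(iterator)
    except StopIteration:
        break  # the for loop ends silently when StopIteration is raised
    print(season)
```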
v Generators:
Ø Python generators provide the means for describing iterable series with code and
functions. These sequences are evaluated
lazily meaning they only compute the next value on demand. This
important property allows them to model infinite
sequences of values with no definite end such as streams of data from a
sensor or active log files. Ex: Simple code to send events to event Hub.
Ø Generators are defined by any Python function which uses the yield keyword at least once in its definition (it may appear many times, or as part of a loop). They may also contain the return keyword with no arguments (this return can be useful if we want to terminate the stream based on some condition). And just like any other function, there's an implicit return at the end of the definition.
In [38]: def gen123():
    ...:     yield 1
    ...:     yield 2
    ...:     yield 3
    ...:
    ...: g=gen123()
In [39]: type(g)
Out[39]: generator
In [40]: print(g)
<generator object gen123 at 0x0000020E121FED68>
Ø Generators are in fact Python iterators, so we can use the standard ways of working with iterators to retrieve, or yield, successive values from the sequence. To retrieve the next value, we use the built-in next() function, passing it the iterator (in this case, the generator). Because generators are iterators, they can be used in all the usual Python constructs which expect iterators, such as for loops. Like other iterators, if we call next() after the last item we get a StopIteration exception.
In [41]: next(g)
Out[41]: 1
In [42]: next(g)
Out[42]: 2
In [43]: next(g)
Out[43]: 3
In [44]: next(g)
Traceback (most recent call last):
  File "<ipython-input-44-e734f8aca5ac>", line 1, in <module>
    next(g)
StopIteration
Ø Each
call to the generator function returns a new generator object. This means that each generator can be advanced independently.
In [46]: g1=gen123()
...: g2=gen123()
In [47]: next(g1)
Out[47]: 1
In [48]: next(g1)
Out[48]: 2
In [49]: next(g2)
Out[49]: 1
Ø The following shows how generators work internally. When the generator g is created, none of the code within the generator body has yet been executed. When we request the first value, the generator body runs up to and including the first yield statement; the code executes just far enough to literally yield the next value. When we call next(g) again, execution of the generator function resumes at the point it left off and continues running until the next yield. After the final value is yielded, the next request causes the generator function to execute until it returns at the end of the function body, which in turn raises the expected StopIteration exception.
In [60]: def gen123():
    ...:     for i in range(3):
    ...:         print("About to yield "+str(i))
    ...:         yield i*i
In [61]: g=gen123()
In [62]: next(g)
About to yield 0
Out[62]: 0
In [63]: next(g)
About to yield 1
Out[63]: 1
In [64]: next(g)
About to yield 2
Out[64]: 4
Ø Note that generator functions, which resume execution each time the next value is requested, can maintain state in local variables. This means we can have counters defined inside the generator. We can use these stateful local variables to check conditions and, if needed, exit the generator function.
The following shows a counter local variable which maintains state:
In [94]: def take(count,iterable1):
    ...:     counter=0
    ...:     for i in iterable1:
    ...:         if counter == count:
    ...:             return
    ...:         else:
    ...:             yield i
    ...:             counter = counter+1
In [95]:
list1=["Mumbai","Delhi","Hyder","Bang"]
...: g1=take(2,list1)
In [96]: for j in g1:
...: print(j)
Mumbai
Delhi
Ø Generators are lazy, meaning that computation only happens just in time, when the next result is requested. This interesting and useful property of generators means that they can be used to model infinite sequences (e.g. simulating continuous events to be sent to Azure Event Hub). Since values are only produced as requested by the caller, and since no data structure needs to be built to contain the elements of the sequence, generators can safely be used to produce never-ending or just very large sequences, like sensor readings, mathematical sequences such as primes or factorials, or perhaps the contents of multi-terabyte files.
v Generator
Comprehension:
Ø Generator expressions are a cross between
comprehensions and generator functions. They use a
similar syntax as comprehensions, but they result in the creation of a
generator object, which produces the
specified sequence lazily.
Ø The syntax for generator expressions is very similar to list comprehensions: (expr(item) for item in iterable), delimited by parentheses instead of the brackets used for list comprehensions.
Ø Generator expressions are useful for
situations where you want the lazy evaluation of generators with the
declarative concision of comprehensions.
In [8]: millionsquares=(i*i for i in range(100001))
In [9]: print(millionsquares)
<generator object <genexpr> at 0x00000262E5C95C00>
In [10]: list(millionsquares)[-10:]
Out[10]:
[9998200081,
9998400064,
9998600049,
9998800036,
9999000025,
9999200016,
9999400009,
9999600004,
9999800001,
10000000000]
In [11]: list(millionsquares)[-10:]
Out[11]: []
Note that at In [9], no squares have been created yet. We can force evaluation of a generator by converting the generator to a list. Note that the generator itself takes almost no memory, but when we convert it to a list it consumes a significant amount of memory.
Ø Imp: Just like iterators, generators are single-use objects. Once exhausted, a generator cannot yield more items. Notice that the second time we try to fetch the last 10 elements, we get an empty list.
Ø Each time we call a generator function, we create a new generator object. To recreate a generator from a generator expression, we must execute the expression itself once more.
Ø Memory usage: Computing the sum of the squares of the first million numbers would take a lot of memory if we first created a list of a million numbers. However, if we make use of a generator we get the same result, but the amount of memory consumed is far smaller.
In [13]: sum(i*i for i in range(1000010))
Out[13]: 333342833423500285
In [14]: sum([i*i for i in range(1000010)])
Out[14]: 333342833423500285
Note that we didn't supply separate
enclosing parentheses for the generator expression in addition to those needed
for the sum function call. This elegant ability to have the parentheses used
for the function call also serve for the generator expression aids readability.
You can include the second set of parentheses if you wish, but it's not
required.
Ø As with comprehensions, we can include an
if clause at the end of the generator expression.
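For example, the same filtering clause used earlier with list comprehensions works in a generator expression:

```python
# Lazily yields the squares of even numbers only
even_squares = (x * x for x in range(1, 11) if x % 2 == 0)
print(list(even_squares))  # [4, 16, 36, 64, 100]
```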
v Additional
Iteration Functionality:
Ø Python provides several built-in functions for performing common iterator operations. These functions form the core of a sort of vocabulary for working with iterators, and they can be combined to produce powerful statements in very concise, readable code. Examples are enumerate for producing integer indices and sum for computing the summation of numbers. We also have max, min, etc.
Ø The itertools
module contains a wealth of useful functions and generators for processing
iterable streams of data.
Ø itertools.islice allows us to perform lazy slicing, similar to the built-in list slicing functionality.
Ø itertools.count gives us an open-ended version of range. (Note that range is not open-ended; it needs to know how many items to create.)
The following shows how to generate the first 1000 prime numbers (is_prime is assumed to be a primality-testing predicate defined elsewhere):
import itertools

# is_prime(x) is assumed: returns True when x is prime
for i in itertools.islice((x for x in itertools.count() if is_prime(x)), 1000):
    print(i)
Ø itertools.chain allows us to lazily concatenate iterables without having to create a new list, and thus without the memory impact of data duplication.
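A small sketch of itertools.chain (the temperature lists are assumed sample data):

```python
import itertools

sunday = [12, 14, 15]
monday = [13, 14, 14]

# chain iterates over sunday, then monday, without building a combined list
temps = itertools.chain(sunday, monday)
print(list(temps))  # [12, 14, 15, 13, 14, 14]
```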
v Iteration
Built-ins:
Ø Boolean Aggregation:
o The any() function determines if any of the elements in a series are True.
o The all() function determines if all of the elements in a series are True.
Using a comprehension with any and all makes it easy to check a test over an entire iterable and give a collective True or False result.
In [15]:
any([True,False,False])
Out[15]: True
In [16]:
all([True,False,False])
Out[16]: False
In [17]: all([True,True])
Out[17]: True
In [18]: any([x%2==0 for x in range(1,100)])
Out[18]: True
In [20]: names=['London','Tokya','Paros','Sydney']
...: all([name == name.title() for name in names])
Out[20]: True
Ø zip: synchronizes iteration over two or more iterables. Note that zip yields tuples when iterated; this in turn means we can use it with tuple unpacking in the for loop. zip can accept any number of iterable arguments.
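A brief sketch of zip with tuple unpacking in a for loop (the names and temperatures are assumed sample data):

```python
names = ["London", "Sydney", "Tokyo"]
temps = [11, 22, 18]

# zip yields (name, temp) tuples, which we unpack directly in the for statement
for name, temp in zip(names, temps):
    print(name, temp)
# London 11
# Sydney 22
# Tokyo 18
```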
v Summary:
Ø Comprehensions are a
concise and readable syntax for describing lists, sets, and dictionaries in a
declarative way. These comprehensions iterate on an iterable source object
and apply an optional predicate filter and a mandatory expression. Both
filter and expression are usually in terms of the current item.
Ø Iterable objects are objects
over which we can iterate item-by-item.
Ø We retrieve an iterator from an iterable using the built-in iter() function.
Ø Iterators produce items one-by-one
from the underlying iterable series each time they are passed to the built-in
next() function.
Ø When the series is exhausted,
iterators raise a StopIteration exception.
Ø Generator functions look just
like regular functions and have all the same facilities, but they must contain at
least one instance of the yield keyword.
Ø Generators are iterators.
Ø When the iterator is advanced
with next(), the generator starts or resumes execution up to and including
the next yield statement.
Ø Each call to a generator
function creates a new generator object.
Ø Generators can maintain
state between calls in local variables and because they are lazy can model
infinite series of data.
Ø Generator expressions are a
sort of hybrid of generator functions and list comprehensions. These allow for
a more declarative and concise way of creating generator objects.
Ø Python includes a rich set of
tools for dealing with iterable series both in the form of built-in functions
such as sum(), any(), and zip(), but also in the itertools module.