Files and Resource Management
Ø To open a file in Python, we call the built-in open() function. Common arguments:
1) file: the path to the file (required).
2) mode: specifies read/write/append, and binary or text mode. This is optional, but we always recommend specifying it for clarity. Explicit is better than implicit.
3) encoding: if the file contains encoded text data, this is the text encoding to use. It's often a good idea to specify this. If you don't specify it, Python will choose a default encoding for you.
The exact type of the object returned by open() depends on how the file was opened; this is dynamic typing in action. In every case, however, the object returned is a file-like object.
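As a rough illustration of the point about return types, the sketch below opens the same file in text and in binary mode and records the type of each returned object. The scratch path and the variable names are made up for this example.

```python
import io
import os
import tempfile

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode, writing: file, mode, and encoding are all given explicitly.
f = open(path, mode="wt", encoding="utf-8")
text_type = type(f)
f.write("hello\n")
f.close()

# Binary mode, reading: no encoding, because no decoding takes place.
f = open(path, mode="rb")
binary_type = type(f)
f.close()

# Two different concrete types, both of them file-like objects.
print(text_type.__name__)
print(binary_type.__name__)
```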
Ø At the file system level, files contain only a series of bytes. Python distinguishes between files opened in binary and text modes even when the underlying operating system doesn't.
o Files opened in binary mode return and manipulate their contents as bytes objects without any decoding. Binary mode files reflect the raw data in the file.
o A file opened in text mode treats its contents as text strings of the str type, the raw bytes having first been decoded using the specified encoding if given, or a platform-dependent encoding otherwise.
By default, text mode also engages support for Python's universal newlines. This causes translation between a single portable newline character in our program strings, \n, and a platform-dependent newline representation in the raw bytes stored in the file system, for example carriage return plus line feed, \r\n, on Windows.
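The translation can be observed on any platform with a small sketch like the one below. It relies on the newline parameter of open(), which the notes do not cover, to force Windows-style endings on write; the file name is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

# newline="\r\n" asks text mode to write every "\n" as "\r\n",
# imitating on any platform what Windows does by default.
f = open(path, mode="wt", encoding="utf-8", newline="\r\n")
f.write("one\ntwo\n")
f.close()

# Binary mode shows the raw bytes actually stored in the file.
f = open(path, mode="rb")
raw = f.read()
f.close()

# Default text mode translates the stored endings back to "\n".
f = open(path, mode="rt", encoding="utf-8")
text = f.read()
f.close()

print(raw)         # raw bytes, with \r\n endings
print(repr(text))  # decoded text, with \n endings
```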
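Ø Default Encoding: Getting the encoding right is crucial for correctly interpreting the contents of a text file. If you don't specify an encoding, open() uses a platform-dependent default taken from the locale, as reported by locale.getpreferredencoding(False). This is not the same as sys.getdefaultencoding(), which is always 'utf-8' in Python 3.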
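To check which encoding is actually in effect on a given machine, one option is to open a file without an encoding argument and inspect the encoding attribute of the returned object. This is only a sketch; the file name is hypothetical and the printed values vary by platform and Python version.

```python
import locale
import os
import sys
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "default_encoding.txt")

# No encoding argument on purpose, so Python picks the default.
f = open(path, mode="wt")
chosen = f.encoding
f.close()

print("encoding chosen by open():", chosen)
print("locale preferred encoding:", locale.getpreferredencoding(False))
# Used for str/bytes conversions, not for choosing a file's encoding.
print("sys.getdefaultencoding():", sys.getdefaultencoding())
```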
Ø File open Modes: The mode argument of the built-in open() function is a string containing letters with different meanings. All mode strings should consist of a read, write, or append mode: one of r, w, or a, with the optional plus modifier (+), combined with a selector for text or binary mode, t or b.
f=open('wasteland.txt',mode='wt',encoding='utf-8')
Both parts of the mode code support defaults, but being explicit is recommended for the sake of readability.
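The following sketch exercises a few mode strings: the default (equivalent to 'rt'), a combination with the plus modifier, and two malformed strings that open() rejects. The file name and variable names are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

f = open(path, mode="wt", encoding="utf-8")
f.write("abc")
f.close()

# With no mode given, open() reads in text mode, as if mode="rt".
f = open(path, encoding="utf-8")
default_result = f.read()
f.close()

# "r+b": read with the plus modifier (update), in binary mode.
f = open(path, mode="r+b")
f.write(b"X")      # overwrite the first byte in place
f.seek(0)
updated = f.read()
f.close()

# Malformed mode strings are rejected with ValueError.
rejected = []
for bad_mode in ("rw", "tb"):
    try:
        open(path, mode=bad_mode)
    except ValueError:
        rejected.append(bad_mode)

print(repr(default_result), updated, rejected)
```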

Ø The write method: used to write to a file. The write call returns the number of codepoints or characters written to the file. It is the caller's responsibility to provide newline characters where they are needed. There is no writeline method.
When we finish writing, we should remember to close the file by calling the close method.
Ø The size of the files written on Windows and Linux may be different. The difference arises because Python's universal newline behavior translates the line endings to your platform's native endings (on Windows, \n is translated by Python to \r\n).
The number returned by the write method is the number of codepoints or characters in the string passed to write, not the number of bytes written to the file after encoding and universal newline translation. This means that when working with text files, you cannot sum the quantities returned by write to determine the length of the file in bytes.
In [22]: f1=open('wasteland.txt',mode='wt',encoding='utf-8')
In [28]: type(f1)
Out[28]: _io.TextIOWrapper
In [23]: f1.write("This is a crazy world\n")
Out[23]: 22
In [24]: f1.write("filled with stupid ppl")
Out[24]: 22
In [25]: f1.close()
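The gap between characters written and bytes stored also shows up with non-ASCII text. In the sketch below (hypothetical file name), a five-character string occupies more than five bytes on disk once it has been encoded as UTF-8.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "sizes.txt")

f = open(path, mode="wt", encoding="utf-8")
# "caf\u00e9\n" is five characters: c, a, f, e-acute, newline.
chars_written = f.write("caf\u00e9\n")
f.close()

# The e-acute takes two bytes in UTF-8, and on Windows the newline
# is stored as two bytes as well.
size_in_bytes = os.path.getsize(path)

print(chars_written, size_in_bytes)
```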
Ø The read method:
o If we know how much data to read, or if we want to read the whole file, we can use the read method. In text mode the read method accepts the number of characters to read from the file, not the number of bytes.
o The call returns the text and advances the file pointer to the end of what was read. A subsequent read call will read the next piece of data.
o In text mode, the return type is str. In binary mode, the return type is bytes (i.e. no decoding).
o To read all the remaining data in the file, we can call read without an argument. This gives us multiple lines in one string with newline characters embedded in the middle.
o At the end of the file, further calls to read return an empty string.
In [45]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
In [46]: type(f2)
Out[46]: _io.TextIOWrapper
In [47]: f3=open('wasteland.txt',mode='rb')
In [48]: s1=f2.read(5)
In [49]: print(s1)
This
In [50]: type(s1)
Out[50]: str
In [51]: b1=f3.read(5)
In [52]: print(b1)
b'This '
In [53]: type(b1)
Out[53]: bytes
In [54]: s2=f2.read()
In [55]: print(s2)
is a crazy world
filled with stupid ppl
In [56]: print(f2.read())
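Because read returns an empty string at the end of the file, a file can be consumed in fixed-size pieces with a simple loop. This is a minimal sketch; the file name, sample text, and chunk size of 8 characters are arbitrary choices for the example.

```python
import os
import tempfile

# Hypothetical scratch file and sample text for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "chunks.txt")
sample = "first line\nsecond line\nthird line"

f = open(path, mode="wt", encoding="utf-8")
f.write(sample)
f.close()

chunks = []
f = open(path, mode="rt", encoding="utf-8")
while True:
    chunk = f.read(8)    # at most 8 characters per call in text mode
    if chunk == "":      # empty string signals the end of the file
        break
    chunks.append(chunk)
f.close()

print(chunks)
```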
Ø The seek method can be used to move the file pointer to another location. Use an offset of 0 to move it to the start of the file. We can use this to go over the file repeatedly without having to close and reopen it. In text mode, the only reliable offsets are 0 and values previously returned by tell().
Ø Use the readline() method to read a file line by line. Each returned line is terminated by a single newline character if one is present in the file. In our example, the last line does not end with a newline because there is no newline sequence at the end of the file.
Again, universal newline support will have translated whatever the platform-native newline sequence is to \n. This means that on Windows, \r\n will be translated by Python to \n.
Once we reach the end of the file, further calls to readline return an empty string (similar to the read() method).
Ø Use the readlines() method to read all lines into a list. Note that memory may be an issue for large files. This is particularly useful if parsing the file involves hopping backwards and forwards between lines.
In [57]: f2.seek(0)
Out[57]: 0
In [58]: f2.readline()
Out[58]: 'This is a crazy world\n'
In [59]: f2.readline()
Out[59]: 'filled with stupid ppl'
In [60]: f2.readline()
Out[60]: ''
In [61]: f2.seek(0)
Out[61]: 0
In [62]: f2.readlines()
Out[62]: ['This is a crazy world\n', 'filled with stupid ppl']
Ø To append to an existing file, we can open the file with mode a, which opens the file for writing, appending to the end of the file if it already exists.
There is no writeline method in Python, but there is a writelines method, which writes an iterable series of strings to a stream. If you want line endings on your strings, you must provide them yourself.
In [66]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
...: f2.readlines()
...: f2.close()
In [67]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
...: print(f2.readlines())
...: f2.close()
['This is a crazy world\n', 'filled with stupid ppl']
In [68]: f3=open('wasteland.txt',mode='at',encoding='utf-8')
In [69]: f3.writelines(['most of which want to\n','watch world burn'])
In [70]: f3.close()
In [71]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
...: print(f2.readlines())
...: f2.close()
['This is a crazy world\n', 'filled with stupid pplmost of which want to\n', 'watch world burn']
Ø File objects support the iterator protocol, with each iteration yielding the next line in the file. This means they can be used in for loops and any other place where an iterator can be used.
In [74]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
In [75]: for i in f2:
...:     print(i)
This is a crazy world

filled with stupid pplmost of which want to

watch world burn
The double line spacing occurs because each line of the file is terminated by a newline, and then print adds its own. To fix that, we could use the strip method to remove the whitespace from the end of each line prior to printing.
Instead, we can use the write method of the standard out stream. Files and streams are closely related, and this works because the stream is a file-like object. We can get hold of a reference to the standard out stream from the sys module.
In [76]: import sys
...: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
...: for i in f2:
...:     sys.stdout.write(i)
This is a crazy world
filled with stupid pplmost of which want to
watch world burn
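For comparison, the strip-based alternative mentioned earlier might look like the sketch below. It uses rstrip('\n') rather than strip so that only the trailing newline is removed; the file name and sample lines are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

f = open(path, mode="wt", encoding="utf-8")
f.writelines(["first line\n", "second line\n", "third line"])
f.close()

cleaned = []
f = open(path, mode="rt", encoding="utf-8")
for line in f:
    stripped = line.rstrip("\n")  # drop the newline read from the file
    cleaned.append(stripped)
    print(stripped)               # print supplies the only newline now
f.close()
```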
Ø Context Managers: When working with files, the close method call is important. It informs the underlying OS that we are done working with a file. If we don't close a file, it's possible to lose data: there may be pending writes buffered up, which might not get written completely.
When an exception is raised, the close call is often never executed.
Furthermore, if you're opening lots of files, your system may run out of resources.
One option to make sure that files are closed no matter what is to make use of a try-finally clause. The finally block makes sure the close call is executed every time, irrespective of how execution exits the try block.
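A minimal sketch of the try-finally approach follows. The function name, the simulated failure, and the handles list (kept only so the example can inspect the file object afterwards) are all hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "cleanup.txt")

# Kept only so the example can inspect the file object afterwards.
handles = []

def write_and_fail(target):
    f = open(target, mode="wt", encoding="utf-8")
    handles.append(f)
    try:
        f.write("partial data\n")
        raise RuntimeError("simulated failure")
    finally:
        # Runs whether the try block finishes normally or raises.
        f.close()

try:
    write_and_fail(path)
except RuntimeError:
    pass

closed_after_error = handles[0].closed
print(closed_after_error)
```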
To ease resource cleanup, Python provides a control flow structure called the with-block. With-blocks can be used with any object that supports the context-manager protocol, and that includes the file objects returned by open().
We no longer need to call close explicitly, because the with construct will call it for us when, and by whatever means, execution exits the block.
The with-block syntax is so-called syntactic sugar for a much more complex arrangement of try/except and try/finally blocks.
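A with-block version might look like the following sketch, which checks that the file is closed both after a normal exit and after an exception. The file name and the simulated failure are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "with_block.txt")

# Normal exit: the file is closed when the block ends.
with open(path, mode="wt", encoding="utf-8") as f:
    f.write("line one\n")
    f.write("line two\n")
closed_after_block = f.closed

# Exit by exception: the file is closed before the error propagates.
try:
    with open(path, mode="rt", encoding="utf-8") as g:
        first = g.readline()
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
closed_after_error = g.closed

print(closed_after_block, closed_after_error)
```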
Ø Working with Binary Files: We open the file for writing in binary mode using the 'wb' mode string. With binary files we don't specify an encoding, as that makes no sense for raw binary data. Since the file is opened in binary mode, we must pass a bytes object to the write method. To convert things to bytes, use the bytes constructor, and use b'' for byte literals. Ex: b'\x01'
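As a small sketch of binary writing, the example below builds a bytes object from both the bytes constructor and a byte literal, writes it in 'wb' mode, and reads it back in 'rb' mode. The file name and the byte values are arbitrary.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.bin")

# bytes constructor from a list of integers, plus a byte literal.
payload = bytes([0x42, 0x4D]) + b"\x01\x02"

# No encoding argument: binary mode stores the bytes as given.
with open(path, mode="wb") as f:
    f.write(payload)

with open(path, mode="rb") as f:
    read_back = f.read()

size_on_disk = os.path.getsize(path)
print(read_back, size_on_disk)
```

Unlike the text-mode case, the file size here matches the length of the data written, since no encoding or newline translation is involved.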
Ø Bitwise operators to work on bytes:
& - bitwise AND (remember that Python uses 'and' for logical AND)
| - bitwise OR
>> - right-shift
<< - left-shift
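One common use of these operators is splitting an integer into individual bytes before writing it to a binary file. The helper names below are hypothetical and the little-endian byte order is an arbitrary choice; this is a sketch of the idea rather than a prescribed method.

```python
def int32_to_le_bytes(value):
    """Split a 32-bit unsigned value into four bytes, least significant first."""
    # Shift the wanted byte down to the low position, then mask it out.
    return bytes((value >> shift) & 0xFF for shift in (0, 8, 16, 24))

def le_bytes_to_int32(data):
    """Reassemble four little-endian bytes into an integer."""
    # Shift each byte back to its position and combine with bitwise OR.
    return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24)

encoded = int32_to_le_bytes(0x12345678)
decoded = le_bytes_to_int32(encoded)
print(encoded, hex(decoded))
```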
v Summary:
Ø Files are opened using the built-in open() function, which accepts a file mode. This controls read/write/append behavior and also whether the file is treated as binary or encoded text data.
Ø For text data, it's good practice to always specify an encoding.
Ø Text files differ from binary files by dealing with string objects and performing universal newline translation and string encoding. Binary files deal with bytes objects with no newline translation or encoding.
Ø When you write text files, it's up to you to provide newline characters for line breaks.
Ø Files should always be closed after use to prevent resource leaks and to ensure that all data has been committed to the file system.
Ø Files provide various convenient methods for working with lines, but are also iterators, which yield values line-by-line.
Ø Files are also context managers and can be used with the with-statement. This ensures that cleanup operations such as closing the files are performed.
Ø Context managers aren't restricted to file-like objects. We can use the tools in the contextlib standard library module, such as the closing() wrapper, to create our own context managers.
Ø Python supports the bitwise operators & (AND) and | (OR), as well as left and right bit shifts (<< and >>).
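As a brief illustration of the contextlib point, the sketch below wraps an object that has a close method but no context-manager support of its own. The Connection class is entirely hypothetical and stands in for any such resource.

```python
from contextlib import closing

class Connection:
    """Hypothetical resource with close() but no with-statement support."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def send(self, message):
        self.sent.append(message)

    def close(self):
        self.closed = True

# closing() calls conn.close() when the block exits, even on error.
with closing(Connection()) as conn:
    conn.send("hello")

print(conn.closed, conn.sent)
```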