Saturday, March 7, 2020

Python - Files


Files and Resource Management

Ø To open a file in Python, we call the built-in open() function. 
Common arguments:
1) File, the path to the file (required)
2) Mode, which specifies read/write/append, and binary or text mode. This is optional, but we always recommend specifying it for clarity. Explicit is better than implicit.
3) Encoding. If the file contains encoded text data, this is the text encoding to use. It's often a good idea to specify this. If you don't specify it, Python will choose a default encoding for you.

The exact type of the object returned by open() depends on how the file was opened; this is dynamic typing in action. Whatever the exact type, the object returned is a file-like object.
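As a rough illustration of how the returned type varies with the mode, the sketch below opens a scratch file three ways and records the class name each time. The file name and the temporary directory are assumptions made for the example only.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode for writing: a text wrapper that encodes str to bytes.
f = open(path, mode="wt", encoding="utf-8")
text_write_type = type(f).__name__
f.close()

# Binary mode for reading: a buffered reader that returns bytes.
f = open(path, mode="rb")
binary_read_type = type(f).__name__
f.close()

# Binary mode for writing: a buffered writer that accepts bytes.
f = open(path, mode="wb")
binary_write_type = type(f).__name__
f.close()

print(text_write_type)    # TextIOWrapper
print(binary_read_type)   # BufferedReader
print(binary_write_type)  # BufferedWriter
```

All three objects share the familiar read/write/close interface, which is what "file-like" means in practice.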

Ø At the file system level, files contain only a series of bytes. Python distinguishes between files opened in binary and text modes even when the underlying operating system doesn't.

o   Files opened in binary mode return and manipulate their contents as bytes objects without any decoding. Binary mode files reflect the raw data in the file.

o   A file opened in text mode treats its contents as if it contains text strings of the str type, the raw bytes having first been decoded using a platform dependent encoding or using the specified encoding if given.      
By default, text mode also engages support for Python's universal newlines. This causes translation between a single portable newline character in our program strings, \n, and a platform-dependent newline representation in the raw bytes stored in the file system, for example carriage return plus line feed, \r\n, on Windows.
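The difference between the two modes can be sketched by writing raw bytes with a Windows-style line ending and reading them back both ways. The scratch file name is an assumption for the example; reading is used here because newline translation on read behaves the same on every platform.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

# Write raw bytes: one \r\n line ending and one \n line ending.
f = open(path, mode="wb")
f.write(b"first\r\nsecond\n")
f.close()

# Text mode decodes the bytes and translates \r\n to \n.
f = open(path, mode="rt", encoding="utf-8")
text = f.read()
f.close()

# Binary mode returns the bytes exactly as stored.
f = open(path, mode="rb")
raw = f.read()
f.close()

print(repr(text))  # 'first\nsecond\n'
print(repr(raw))   # b'first\r\nsecond\n'
```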

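Ø Default Encoding: Getting the encoding right is crucial for correctly interpreting the contents of a text file. If you don't specify an encoding, open() uses the platform's preferred locale encoding, as reported by locale.getpreferredencoding(). Note that this is not sys.getdefaultencoding(), which in Python 3 always returns 'utf-8'.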
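One way to check which encoding a given machine would pick is to open a file without the encoding argument and inspect the encoding attribute of the returned object. This is only a sketch; the scratch file name is an assumption, and the first value printed depends on the platform and its settings.

```python
import locale
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "encoding_check.txt")

# No encoding given: Python chooses one based on the platform.
f = open(path, mode="wt")
default_encoding = f.encoding
f.close()

# Encoding given explicitly: the result no longer depends on the machine.
f = open(path, mode="wt", encoding="utf-8")
explicit_encoding = f.encoding
f.close()

print(default_encoding)                     # platform dependent
print(locale.getpreferredencoding(False))   # the locale's preferred encoding
print(explicit_encoding)                    # utf-8
```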

Ø File open Modes: The mode argument of the built-in open() function is a string containing letters with different meanings. Every mode string should consist of a read, write, or append mode (one of r, w, or a, with the optional + modifier) combined with a selector for text or binary mode (t or b).

f=open('wasteland.txt',mode='wt',encoding='utf-8')

Both parts of the mode string have defaults, but being explicit is recommended for the sake of readability.

Ø The write method: used to write to a file. The write call returns the number of codepoints or characters written to the file. It is the caller's responsibility to provide newline characters where they are needed. There is no writeline method.
When we finish writing, we should remember to close the file by calling the close method.

Ø The size of the files written on Windows and Linux may be different. The difference arises because Python's universal newline behavior has translated the line endings to your platform's native endings (on Windows, Python translates \n to \r\n).
The number returned by the write method is the number of codepoints or characters in the string passed to write, not the number of bytes written to the file after encoding and universal newline translation. This means that when working with text files, you cannot sum the quantities returned by write to determine the length of the file in bytes.

In [22]: f1=open('wasteland.txt',mode='wt',encoding='utf-8')
In [28]: type(f1)
Out[28]: _io.TextIOWrapper
In [23]: f1.write("This is a crazy world\n")
Out[23]: 22
In [24]: f1.write("filled with stupid ppl")
Out[24]: 22
In [25]: f1.close()
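The gap between characters written and bytes on disk also shows up without any newlines, as soon as the text contains non-ASCII characters. The sketch below assumes a scratch file and a sample string chosen for illustration; the string contains two accented letters, each of which takes two bytes in UTF-8.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "sizes.txt")

# Ten characters, two of them accented (i with diaeresis, e with acute).
# No newline is included, so newline translation does not affect the size.
text = "na\u00efve caf\u00e9"

f = open(path, mode="wt", encoding="utf-8")
chars_written = f.write(text)
f.close()

bytes_on_disk = os.path.getsize(path)

print(chars_written)  # 10
print(bytes_on_disk)  # 12
```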

Ø The Read Function:
o   If we know how many bytes to read or if we want to read the whole file, we can use the read function. In text mode the read method accepts the number of characters to read from the file, not the number of bytes.
o   The call returns the text and advances the file pointer to the end of what was read. A subsequent read call will read the next piece of data.
o   In text mode, the return type is str. In binary mode, the return type is bytes (i.e. no decoding is performed).
o   To read all the remaining data in the file, we can call read without an argument. This gives us multiple lines in one string with newline characters embedded in middle.
o   At the end of the file, further calls to read return an empty string.

In [45]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
In [46]: type(f2)
Out[46]: _io.TextIOWrapper
In [47]: f3=open('wasteland.txt',mode='rb')
In [48]: s1=f2.read(5)
In [49]: print(s1)
This
In [50]: type(s1)
Out[50]: str
In [51]: b1=f3.read(5)
In [52]: print(b1)
b'This '
In [53]: type(b1)
Out[53]: bytes
In [54]: s2=f2.read()
In [55]: print(s2)
is a crazy world
filled with stupid ppl
In [56]: print(f2.read())

Ø The seek method can be used to move the file pointer to any location. Use an offset of 0 to move it to the start of the file. We can use this to go over the file repeatedly without having to close and reopen it.

Ø Use the readline() method to read a file line by line. Each returned line is terminated by a single newline character if one is present in the file. The last line here does not end with a newline because there is no newline sequence at the end of the file.

Again, the universal newline support will have translated the platform's native newline sequence to \n. This means that on Windows, Python translates \r\n to \n.

Once we reach the end of the file, further calls to readline return an empty string (similar to the read() method).

Ø Use the readlines() method to read all lines into a list. Note that memory may be an issue for large files. This is particularly useful if parsing the file involves hopping backwards and forwards between lines.

In [57]: f2.seek(0)
Out[57]: 0
In [58]: f2.readline()
Out[58]: 'This is a crazy world\n'
In [59]: f2.readline()
Out[59]: 'filled with stupid ppl'
In [60]: f2.readline()
Out[60]: ''
In [61]: f2.seek(0)
Out[61]: 0
In [62]: f2.readlines()
Out[62]: ['This is a crazy world\n', 'filled with stupid ppl']

Ø To append to an existing file, we can open the file with mode a, which opens the file for writing, appending to the end of the file if it already exists.

There is no writeline method in Python, but there is a writelines method, which writes an iterable series of strings to a stream. If you want line endings on your strings, you must provide them yourself.

In [66]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
    ...: f2.readlines()
    ...: f2.close()
In [67]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
    ...: print(f2.readlines())
    ...: f2.close()
['This is a crazy world\n', 'filled with stupid ppl']
In [68]: f3=open('wasteland.txt',mode='at',encoding='utf-8')
In [69]: f3.writelines(['most of which want to\n','watch world burn'])
In [70]: f3.close()
In [71]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
    ...: print(f2.readlines())
    ...: f2.close()
['This is a crazy world\n', 'filled with stupid pplmost of which want to\n', 'watch world burn']
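The run-together line in the output above is what happens when a string passed to writelines lacks its newline. A minimal sketch of one way to avoid this is to add the newline to each string as it is written; the scratch file name and the sample lines are assumptions for the example.

```python
import os
import tempfile

# Hypothetical scratch file and sample data (illustration only).
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
lines = ["first", "second", "third"]

# writelines adds nothing itself, so append "\n" to each string on the way in.
f = open(path, mode="wt", encoding="utf-8")
f.writelines(line + "\n" for line in lines)
f.close()

f = open(path, mode="rt", encoding="utf-8")
read_back = f.readlines()
f.close()

print(read_back)  # ['first\n', 'second\n', 'third\n']
```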

Ø File objects support the iterator protocol with each iteration yielding the next line in the file. This means they can be used in for loops and any other place where an iterator can be used.

In [74]: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
In [75]: for i in f2:
    ...:     print(i)
This is a crazy world

filled with stupid pplmost of which want to

watch world burn

The double line spacing occurs because each line of the file is terminated by a newline, and then print adds its own. To fix that, we could use the strip method to remove the whitespace from the end of each line prior to printing.
Instead, we can use the write method of the standard output stream. Files and streams are closely related, and this works because the stream is a file-like object. We can get a reference to the standard output stream from the sys module.

In [76]: import sys
    ...: f2=open('wasteland.txt',mode='rt',encoding='utf-8')
    ...: for i in f2:
    ...:     sys.stdout.write(i)
This is a crazy world
filled with stupid pplmost of which want to
watch world burn
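The strip-based alternative mentioned earlier can be sketched as follows. It uses rstrip("\n") rather than strip so that only the trailing newline is removed; the scratch file and its contents are assumptions for the example.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

f = open(path, mode="wt", encoding="utf-8")
f.write("first line\nsecond line\n")
f.close()

stripped = []
f = open(path, mode="rt", encoding="utf-8")
for line in f:
    # Drop only the trailing newline; print then adds exactly one back.
    stripped.append(line.rstrip("\n"))
    print(stripped[-1])
f.close()
```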

Ø Context Managers: When working with files, the close method call is important. It informs the underlying OS that we are done working with a file. If we don't close a file, it's possible to lose data. There may be pending writes buffered up, which might not get written completely.

Often, when an exception is raised, the close call is never executed.

Furthermore, if you're opening lots of files, your system may run out of resources.

One option to make sure that files are closed no matter what, is to make use of try-finally clause. The finally block will make sure the close call is executed every time (irrespective of how execution exits the try block)
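A minimal sketch of the try-finally approach is shown below. The failure is simulated with a RuntimeError, and the outer try/except exists only so the example can continue and inspect the file afterwards; the scratch file name is an assumption.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "guarded.txt")

f = open(path, mode="wt", encoding="utf-8")
caught = None
try:
    try:
        f.write("partial data\n")
        raise RuntimeError("simulated failure")  # stands in for a real error
    finally:
        # Runs whether the try block finished normally or raised.
        f.close()
except RuntimeError as error:
    caught = str(error)

print(caught)    # simulated failure
print(f.closed)  # True
```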

To make resource cleanup easier, Python provides a control flow structure called the with-block. With-blocks can be used with any object that supports the context-manager protocol, and that includes the file objects returned by open().

We no longer need to call close explicitly because the with construct will call it for us when, and by whatever means, execution exits the block.

The with-block syntax is so-called syntactic sugar for a much more complex arrangement of try/except and try/finally blocks.
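As a sketch of the with-block in use, the example below writes a file and then reads it back while simulating a failure inside the block. In both cases the file ends up closed without an explicit close call. The scratch file name and the simulated error are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "managed.txt")

# Normal exit: the file is closed when the block ends.
with open(path, mode="wt", encoding="utf-8") as f:
    f.write("hello\n")
closed_after_write = f.closed

# Exit by exception: the file is still closed before the error propagates.
first_line = None
try:
    with open(path, mode="rt", encoding="utf-8") as g:
        first_line = g.readline()
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
closed_after_error = g.closed

print(closed_after_write)  # True
print(closed_after_error)  # True
print(repr(first_line))    # 'hello\n'
```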

Ø Working with Binary Files: We open the file for writing in binary mode using the 'wb' mode string. With binary files we don't specify an encoding, as that makes no sense for raw binary data. Because the file is opened in binary mode, we must pass a bytes object to the write method. To convert things to bytes, use the bytes constructor, and use b'' for byte literals. Ex: b'\x01'
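A small sketch of writing and reading raw bytes follows. The two-byte marker and the payload values are made up for the example, as is the scratch file name.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "data.bin")

marker = b"BM"                    # a bytes literal
payload = bytes([0, 1, 2, 255])   # bytes constructor from a list of ints

with open(path, mode="wb") as f:
    # In binary mode, write returns the number of bytes written.
    count = f.write(marker + payload)

with open(path, mode="rb") as f:
    raw = f.read()

print(count)   # 6
print(raw)     # b'BM\x00\x01\x02\xff'
print(raw[0])  # 66, because indexing a bytes object gives an int
```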

Ø Bitwise operators to work on bytes:
& - bitwise AND (remember that Python uses 'and' for logical and)
| - bitwise OR
>> - right-shift
<< - left-shift


v Summary:

Ø Files are opened using the built-in open() function, which accepts a file mode. This controls read/write/append behavior and also whether the file is treated as binary or encoded text data.

Ø For text data, it's good practice to always specify an encoding.

Ø Text files differ from binary files by dealing with string objects and performing universal newline translation and string encoding. Binary files deal with bytes objects with no newline translation or encoding.

Ø When you write text files, it's up to you to provide newline characters for line breaks.

Ø Files should always be closed after use to prevent resource leaks and to ensure that all data has been committed to the file system.
 
Ø Files provide various convenient methods for working with lines, but are also iterators, which yield values line-by-line.

Ø Files are also context managers and can be used with the with-statement. This ensures that cleanup operations such as closing the files are performed.

Ø Context managers aren't restricted to file-like objects. We can use the tools in the contextlib standard library module such as the closing() wrapper to create our own context managers.

Ø Python supports the bitwise operators & (AND) and | (OR), as well as left and right bitwise shifts (<< and >>).
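The closing() wrapper mentioned in the summary can be sketched with a small class that has a close method but no context-manager support of its own. The Connection class here is hypothetical and exists only for the example.

```python
from contextlib import closing


class Connection:
    """Hypothetical resource with a close() method (illustration only)."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


# closing() calls conn.close() when the block exits, however it exits.
with closing(Connection()) as conn:
    open_inside_block = conn.is_open

print(open_inside_block)  # True
print(conn.is_open)       # False
```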
