Takeaways from Python Crash Course: Read and Write Files, and Handle Exceptions in Python
This post is a record of my notes from Chapter 10, "Files and Exceptions", of Eric Matthes's book Python Crash Course [1].
Read text from a file
Read an entire file
We can use the following code to open the file pi_digits.txt, read and print its contents to the screen:
with open('pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents)

3.1415926535
8979323846
2643383279
where the pi_digits.txt file contains $\pi$ to 30 decimal places with 10 decimal places per line:
3.1415926535
8979323846
2643383279
open() function
To do any work with a file, we first need to open it with the open() function, which returns a file object representing the file. In this case, open('pi_digits.txt') returns an object representing pi_digits.txt, and Python assigns this object to the variable file_object.
Once we have a file object representing pi_digits.txt, we can use the read() method to read the entire contents of the file and store it as one long string in contents. When we print the value of contents, we get the entire text file back.
The only difference between this output and the original file may be an extra blank line at the end of the output. The blank line is sometimes attributed to read() returning an empty string at the end of the file, but an empty string prints nothing. It appears when the file itself ends with a newline character, because print() adds its own newline after it. To remove the extra blank line, call the rstrip() method on contents in the call to print():
with open('pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents.rstrip())

3.1415926535
8979323846
2643383279
with keyword
The with keyword automatically closes the file that was opened by open(). This is especially helpful when an error occurs after the file has been opened but before close() has been called:
The keyword with closes the file once access to it is no longer needed. Notice how we call open() in this program but not close(). You could open and close the file by calling open() and close(), but if a bug in your program prevents the close() method from being executed, the file may never close. This may seem trivial, but improperly closed files can cause data to be lost or corrupted. And if you call close() too early in your program, you'll find yourself trying to work with a closed file (a file you can't access), which leads to more errors. It's not always easy to know exactly when you should close a file, but with the structure shown here, Python will figure that out for you. All you have to do is open the file and work with it as desired, trusting that Python will close it automatically when the with block finishes execution.
I think the easiest way to check whether a file has been closed properly is to try deleting it.
Suppose we execute the following code:
file_object = open('pi_digits.txt')
print('9' >= 1)
file_object.close()

As expected, we'll get an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[2], line 2
1 file_object = open('pi_digits.txt')
----> 2 print('9' >= 1)
3 file_object.close()
TypeError: '>=' not supported between instances of 'str' and 'int'
At this point, if we try to delete the file pi_digits.txt, Windows reports an error: "The action cannot be completed because the file is open in Python". This is easy to understand, because Python never executed file_object.close(). To delete the file successfully, we have to run file_object.close() ourselves first.
On the other hand, the following snippet does the same work using the with keyword:
with open('pi_digits.txt') as file_object:
    contents = file_object.read()
    print('9' >= 1)

print(contents)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 3
1 with open('pi_digits.txt') as file_object:
2 contents = file_object.read()
----> 3 print('9' >= 1)
5 print(contents)
TypeError: '>=' not supported between instances of 'str' and 'int'
Although the same error occurs, we can delete pi_digits.txt right away at this point. This small example shows the advantage of using with.
Technically, the with statement is a concise alternative to a try-finally block, and it is not limited to open(): any context manager supports the with statement, and the file object returned by open() is one such context manager. To make a class usable as a context manager, we define __enter__() and __exit__() methods for it.
In Python, the with statement replaces a try-catch block with a concise shorthand. More importantly, it ensures closing resources right after processing them. A common example of using the with statement is reading or writing to a file. A function or class that supports the with statement is known as a context manager. A context manager allows you to open and close resources right when you want to. For example, the open() function is a context manager. When you call the open() function using the with statement, the file closes automatically after you've processed the file. [2]
... the with statement replaces this kind of try-catch block [2]:

f = open("example.txt", "w")

try:
    f.write("hello world")
finally:
    f.close()
The with statement is popularly used with file streams, as shown above [open() function] and with Locks, sockets, subprocesses and telnets etc. [3]
There is nothing special in open() which makes it usable with the with statement and the same functionality can be provided in user defined objects. Supporting with statement in your objects will ensure that you never leave any resource open. To use with statement in user defined objects you only need to add the methods __enter__() and __exit__() in the object methods. [3]
See also references [4], [5], and [6] for more information about the with keyword.
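To make the role of __enter__() and __exit__() concrete, here is a minimal sketch of a user-defined context manager. The class name ManagedResource and its is_open attribute are made up for illustration; they are not taken from the book or the cited references.

```python
class ManagedResource:
    """A hypothetical resource that records whether it has been released."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        # Runs when the with block starts; the return value is bound by `as`.
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even if an exception was raised.
        self.is_open = False
        return False  # A false value lets the exception propagate.


resource = ManagedResource('demo')
try:
    with resource as r:
        print(r.is_open)   # The resource is open inside the block
        print('9' >= 1)    # Raises TypeError, like the file example
except TypeError:
    pass

print(resource.is_open)    # __exit__() has run despite the error
```

The outer try-except is only there so the script keeps running after the deliberate TypeError; the cleanup itself is done by __exit__().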
Absolute file path vs. relative file path
If the text file isn't in the same folder as the script, we need to pass a file path to open() to tell Python which directory to look in.
For example, if the file pi_digits.txt is on the Desktop, we could provide an absolute file path:
file_path = 'C:/Users/whatastarrynight/Desktop/pi_digits.txt'
# file_path = 'C:\\Users\\whatastarrynight\\Desktop\\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
print(contents.rstrip())

3.1415926535
8979323846
2643383279
It's okay to use a forward slash / or a double backslash \\ to separate the parts of the path, but a single backslash \ does not work here:
file_path = 'C:\Users\whatastarrynight\Desktop\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
print(contents.rstrip())

Cell In[1], line 1
file_path = 'C:\Users\whatastarrynight\Desktop\pi_digits.txt'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The reason is that the backslash starts an escape sequence in Python strings, which is not our intention. Here, \U begins a Unicode escape that expects eight hexadecimal digits, hence the error. Similarly, in the path 'C:\path\to\file.txt', the sequence \t is interpreted as a tab. This also explains why \\ works: the first backslash escapes the second one.
If pi_digits.txt is in a sub-folder, say text_files, under the current folder, we can choose to provide a relative file path for open() function:
file_path = 'text_files\pi_digits.txt'
# file_path = 'text_files/pi_digits.txt'
# file_path = 'text_files\\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
print(contents.rstrip())

3.1415926535
8979323846
2643383279
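In this relative file path, \, /, and \\ all work as path delimiters. The single backslash works only because \p is not a recognized escape sequence; a folder or file name starting with a letter such as t or n would break it, and recent Python versions warn about unrecognized escape sequences, so / or \\ is the safer choice.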
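Two other ways to sidestep the escape-sequence problem are raw strings and the standard pathlib module. Neither appears in the chapter notes above, so treat the following as a supplementary sketch. It uses PureWindowsPath so that it behaves the same on any operating system; for real file access we would normally use pathlib.Path, which open() also accepts.

```python
from pathlib import PureWindowsPath

# A raw string (prefix r) leaves backslashes untouched.
escaped = 'C:\\Users\\whatastarrynight\\Desktop\\pi_digits.txt'
raw = r'C:\Users\whatastarrynight\Desktop\pi_digits.txt'
print(escaped == raw)

# In an ordinary string "\t" is one tab character; in a raw string it is two characters.
print(len('a\tb'), len(r'a\tb'))

# pathlib joins path parts with the / operator and picks the separator itself.
path = PureWindowsPath('C:/Users/whatastarrynight/Desktop') / 'pi_digits.txt'
print(path.name)
print(str(path) == raw)
```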
Read an entire file line by line using for loop
We can use a for loop on the file object to examine each line from a file one at a time:
filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object:  # Loop over the file object
        print(line)

3.1415926535

8979323846

2643383279
These blank lines appear because an invisible newline character is at the end of each line in the text file. The print() function adds its own newline each time we call it, so we end up with two newline characters at the end of each line: one from the file and one from print(). As before, we can call the rstrip() method on each line to eliminate these extra blank lines.
filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object:  # Loop over the file object
        print(line.rstrip())

3.1415926535
8979323846
2643383279
Make a list of lines from a file: readlines() method
filename = 'pi_digits.txt'

with open(filename) as file_object:
    # The `readlines()` method takes each line from the file and stores it in a list.
    lines = file_object.readlines()

print(lines)

for line in lines:
    print(line.rstrip())

['3.1415926535\n', ' 8979323846\n', ' 2643383279']
3.1415926535
8979323846
2643383279
Afterwards, we can work further with the file's contents, for example by concatenating the three lines above into one long string:
filename = 'pi_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

print(pi_string)
print(len(pi_string))

3.141592653589793238462643383279
32
When Python reads from a text file, it interprets all text in the file as a string. If we read in a number and want to work with that value in a numerical context, we should convert it to an integer by int() function or to a float by float().
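As a small illustration of that conversion, the sketch below turns the concatenated string into a float and a slice of it into an int. The variable names pi_value and first_four are only for this example.

```python
import math

pi_string = '3.141592653589793238462643383279'

pi_value = float(pi_string)   # str -> float
print(pi_value * 2)

# A float keeps only about 15 to 17 significant digits, so most of the
# 30 decimal places are lost in the conversion.
print(pi_value == math.pi)

first_four = int(pi_string[2:6])   # '1415' -> 1415
print(first_four + 1)
```

If all 30 decimal places matter, keeping the value as a string (or using the decimal module) is the safer choice.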
Read in a large file
The value of $\pi$ in the file pi_digits.txt above has only 30 decimal places, while the one in pi_million_digits.txt (which can be obtained from [7]) has 1,000,000 decimal places. We can use the same approach to read pi_million_digits.txt and concatenate its contents into one long string:
filename = 'pi_million_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

print(f"{pi_string[:52]}...")
print(len(pi_string))

3.14159265358979323846264338327950288419716939937510...
1000002
Building on this, we can write a fun program that checks whether someone's birthday appears in the first million digits of $\pi$. Take mine, 101798 (in the date format mmddyy):
filename = 'pi_million_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

birthday = input("Enter your birthday, in the form mmddyy: ")
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")

Enter your birthday, in the form mmddyy: 101798
Your birthday appears in the first million digits of pi!
Interesting!
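The in operator only tells us whether the digits occur. If we also want to know where, str.find() gives the position. The sketch below is my own extension, not part of the book's program: the helper name find_in_pi is made up, and it uses a short 50-digit sample so that it runs without pi_million_digits.txt.

```python
pi_string = '3.14159265358979323846264338327950288419716939937510'


def find_in_pi(digits, pi_text):
    """Return the 1-based decimal place where digits first appears, or None."""
    decimals = pi_text[2:]          # Drop the leading '3.'
    index = decimals.find(digits)   # find() returns -1 when there is no match
    if index == -1:
        return None
    return index + 1


print(find_in_pi('926', pi_string))      # Found within the sample
print(find_in_pi('101798', pi_string))   # Not within the first 50 digits
```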
Write to a file: open() function
One of the simplest ways to save data is to write it to a file. To do this, we need to call open() function with a second argument.
Write to an empty file: write mode 'w'
filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.")
In the open() function, the second argument 'w' tells Python that we want to open the file in write mode. In write mode, open() automatically creates the file programming.txt if it doesn't already exist in the current folder, and erases the file's contents before returning the file object if the file already exists.
Besides write mode 'w', there are other modes: read mode 'r' (the default), append mode 'a', and 'r+', a mode that allows us to both read from and write to the file.
The write() method on the file object is used to write a string to the file. Running the above script produces no terminal output, but we can see one line in programming.txt:
I love programming.
Python can only write strings to a text file. If we want to output numerical data to a text file, we need to convert the data to string format first using the str() function.
The write() method doesn't add any newlines to the text we write. So if we want to write more than one line of text to the file, we need to add the newline characters ourselves:
filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
    file_object.write("I love creating new games.\n")
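The two points above, converting numbers with str() and adding newlines by hand, can be combined in a short round trip. This is only a sketch: the file name numbers.txt and the use of a temporary folder (so nothing is left on disk) are my own choices.

```python
import os
import tempfile

numbers = [3, 1.5, 42]

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'numbers.txt')

    with open(filename, 'w') as file_object:
        for number in numbers:
            # write() accepts only strings and adds no newline of its own.
            file_object.write(str(number) + '\n')

    with open(filename) as file_object:
        # Everything read back is a string, so convert it again.
        restored = [float(line) for line in file_object]

print(restored)
```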
Append content to a file: append mode, 'a'
By opening a file in append mode (with the argument 'a'), we can add content to the file rather than writing over existing content: Python doesn't erase the contents of the file before returning the file object, and any lines we write are added at the end of the file. As in write mode, Python creates an empty file if the file doesn't exist yet.
filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

I love programming.
I love creating new games.
I also love finding meaning in large datasets.
I love creating apps that can run in a browser.
Exception
Python uses special objects called exceptions to manage errors that arise during a program's execution. If an error occurs that makes Python unsure what to do next, it creates an exception object. If we have written code that handles the exception, the program continues running; otherwise, the program halts and displays a traceback, which includes a report of the exception that was raised.
Exceptions are handled with try-except blocks. A try-except block not only asks Python to do something, but also tells Python what to do if an exception is raised. When we use try-except blocks, our programs keep running instead of exiting, even if things start to go wrong. Instead of tracebacks, which can be confusing for users to read, users will see friendly error messages that the programmer writes.
A Python try-except block works like a MATLAB try-catch block [8].
Handle the ZeroDivisionError exception by try-except block
When we divide a number by zero in Python, an error will occur:
print(5/0)

---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[21], line 1
----> 1 print(5/0)
ZeroDivisionError: division by zero
In this example, ZeroDivisionError is the type of the exception that Python raised. We can use a try-except block to handle it, so that Python shows a user-friendly message instead of a traceback that interrupts the program:
try:
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")

You can't divide by zero!
Generally, we put the code that we think may raise an error (simply print(5/0) in this case) into a try block. If the code in the try block works, Python skips over the except block. Otherwise, when the code in the try block causes an error (here ZeroDivisionError), Python looks for an except block whose error matches the raised one and runs the code in that block (the indented code after except ZeroDivisionError:, which is print("You can't divide by zero!")). As a result, users see a friendly error message instead of a traceback.
We can use the same pattern in a more complicated case:
print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
    first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
    second_number = input("Second number: ")
    if second_number == 'q':
        break

    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        print("You can't divide by 0!")
    else:
        print(answer)

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
You can't divide by 0!

First number: q
This example also shows how to use a complete try-except-else block. The additional else block contains any code that depends on the try block succeeding.
Handling particular errors correctly is especially important when the program has more work to do after the error occurs; in that case, using exceptions to prevent crashes is practical. This happens often in programs that prompt users for input: if the program responds to invalid input appropriately, it can prompt for more valid input instead of crashing.
It's also not a good idea to let users see tracebacks. Nontechnical users will be confused by them, and in a malicious setting, attackers will learn more than programmers want them to know from a traceback. For example, they'll know the name of the program file, and they'll see a part of the code that isn't working properly. A skilled attacker can sometimes use this information to determine which kind of attacks to use against the code.
Anyway, by anticipating likely sources of errors, we can write robust programs that continue to run even when they encounter invalid data and missing resources. The code will be resistant to innocent user mistakes and malicious attacks.
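One likely source of errors that the division program above does not cover is non-numeric input: int('five') raises a ValueError. The sketch below adds a second except clause for it. The function name divide and the wording of the messages are my own; the function returns the message instead of printing it so that the behavior is easy to check.

```python
def divide(first_number, second_number):
    """Divide two numbers given as text and return the result as a string."""
    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        return "You can't divide by 0!"
    except ValueError:
        # Raised by int() when the text is not a whole number.
        return "Please enter whole numbers only."
    else:
        return str(answer)


print(divide('5', '2'))
print(divide('5', '0'))
print(divide('five', '2'))
```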
Handle the FileNotFoundError exception
filename = 'alice.txt'

try:
    with open(filename, encoding='utf-8') as f:
        contents = f.read()
except FileNotFoundError:
    print(f"Sorry, the file `{filename}` does not exist.")

Sorry, the file `alice.txt` does not exist.
There are two changes here. One is the use of the variable f to represent the file object, which is a common convention. The second is the use of the encoding argument of the open() function. This argument is needed when the system's default encoding doesn't match the encoding of the file that's being read.
A complicated example: count the approximate number of words in a text file
The following code snippet counts the approximate number of words in the text file alice.txt (Alice in Wonderland; the file can also be found in resource [7]):
filename = 'alice.txt'

try:
    with open(filename, encoding='utf-8') as f:
        contents = f.read()
except FileNotFoundError:
    print(f"Sorry, the file `{filename}` does not exist.")
else:
    # Count the approximate number of words in the file.
    words = contents.split()
    num_words = len(words)
    print(f"The file {filename} has about {num_words} words.")

The file alice.txt has about 29461 words.
Here the split() method, called without arguments, separates a string into parts wherever it finds whitespace (spaces, tabs, or newlines) and stores all the parts in a list. The result is a list of words from the string, although some punctuation may also appear with some of the words. By the way, the count is a little high because the text file includes extra information provided by the publisher.
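A tiny example makes the behavior of split() easier to see. The sample sentence below is just an illustration; note how runs of spaces, a tab, and a newline all act as separators, while punctuation stays attached to the words.

```python
text = "Down,  down, down.\nWould the fall\tnever come to an end?"

words = text.split()   # No argument: split on any run of whitespace
print(words)
print(len(words))
```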
We can also download the text files siddhartha.txt and little_women.txt from resource [7]. Then, by wrapping the above code snippet in a function count_words(), we can easily work with multiple files:
def count_words(filename):
    """Count the approximate number of words in a file."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        print(f"Sorry, the file {filename} does not exist.")
    else:
        words = contents.split()
        num_words = len(words)
        print(f"The file {filename} has about {num_words} words.")

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dict.txt', 'little_women.txt']
for filename in filenames:
    count_words(filename)

The file alice.txt has about 29461 words.
The file siddhartha.txt has about 42172 words.
Sorry, the file moby_dict.txt does not exist.
The file little_women.txt has about 189079 words.
Using the try-except block in this example provides two significant advantages: it prevents users from seeing a traceback, and it lets the program continue analyzing the texts it's able to find. If we didn't catch the FileNotFoundError raised for moby_dict.txt, the user would see a full traceback, and the program would stop as soon as it tried to open moby_dict.txt, so it would never analyze little_women.txt.
Make a program fail silently: pass statement
We don't need to report every exception. Sometimes we want the program to fail silently when an exception occurs and continue as if nothing happened. To do so, we write the try block as usual, but explicitly tell Python to do nothing in the except block by using the pass statement.
def count_words(filename):
    """Count the approximate number of words in a file."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        pass
    else:
        words = contents.split()
        num_words = len(words)
        print(f"The file {filename} has about {num_words} words.")

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dict.txt', 'little_women.txt']
for filename in filenames:
    count_words(filename)

The file alice.txt has about 29461 words.
The file siddhartha.txt has about 42172 words.
The file little_women.txt has about 189079 words.
The pass statement also acts as a placeholder. It's a reminder that we're choosing to do nothing at a specific point in the program's execution and that we might want to do something there later. For example, in this program we might decide to write any missing filenames to a file called missing_files.txt. Our users wouldn't see this file, but we, as programmers, would be able to read the file and deal with any missing texts.
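The missing_files.txt idea could be sketched as follows. This is one possible way to do it, not code from the book: count_words() here takes an extra missing_log parameter and returns the count instead of printing it, and the demo runs in a temporary folder with a one-line stand-in for alice.txt.

```python
import os
import tempfile


def count_words(filename, missing_log):
    """Return the approximate word count, or None if the file is missing."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        # Stay silent for the user, but record the missing file for ourselves.
        with open(missing_log, 'a', encoding='utf-8') as log:
            log.write(filename + '\n')
        return None
    else:
        return len(contents.split())


with tempfile.TemporaryDirectory() as folder:
    existing = os.path.join(folder, 'alice.txt')
    missing = os.path.join(folder, 'moby_dict.txt')
    missing_log = os.path.join(folder, 'missing_files.txt')

    with open(existing, 'w', encoding='utf-8') as f:
        f.write("Alice was beginning to get very tired")

    counts = [count_words(name, missing_log) for name in (existing, missing)]

    with open(missing_log, encoding='utf-8') as log:
        logged = log.read().splitlines()

print(counts)
print(len(logged))
```

Append mode 'a' is used for the log so that each missing file adds a line instead of overwriting the previous ones.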
Decide which errors to report
Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But every time the program depends on something external, such as user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. A little experience will help us know where to include exception handling blocks in the program and how much to report to users about errors that arise.
Save and read data by json module
json.dump() and json.load() function
A simple way to save and read data is to use Python's json module [9]. The json module allows us to dump simple Python data structures into a file (with the json.dump() function):
import json

numbers = [2, 3, 5, 7, 11, 13]

filename = 'numbers.json'
with open(filename, 'w') as f:
    json.dump(numbers, f)
and load the data from that file (by json.load() function) the next time the program runs.
import json

filename = 'numbers.json'
with open(filename) as f:
    numbers = json.load(f)

print(numbers)
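One thing worth knowing about such round trips is that JSON has fewer types than Python, so some data comes back in a slightly different form. The sketch below uses json.dumps() and json.loads(), the string-based counterparts of json.dump() and json.load(), so that it needs no file; the sample dictionary is made up for illustration.

```python
import json

data = {'primes': (2, 3, 5), 1: 'one', 'nested': {'ok': True, 'missing': None}}

text = json.dumps(data)       # Encode to a JSON string instead of a file
restored = json.loads(text)   # Decode the JSON string back into Python objects

print(text)
print(restored)

# The tuple came back as a list and the integer key as a string.
print(restored == data)
```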
Save and read user-generated data
Saving data with JSON is useful when working with user-generated data (i.e. user input in the following example):
import json

# Load the username, if it has been stored previously.
# Otherwise, prompt for the username and store it.
filename = 'username.json'
try:
    with open(filename) as f:
        username = json.load(f)
except FileNotFoundError:
    username = input("What is your name? ")
    with open(filename, 'w') as f:
        json.dump(username, f)
        print(f"We'll remember you when you come back, {username}!")
else:
    print(f"Welcome back, {username}!")
If the file username.json doesn't exist, the program prompts for a name and stores it:

What is your name? Eric
We'll remember you when you come back, Eric!

Otherwise, it greets the stored user:

Welcome back, Eric!
Refactor an existing program
Often, we'll come to a point where the program works, but we'll recognize that we could improve the code by breaking it up into a series of functions that have specific jobs. This process is called refactoring. Refactoring makes our code cleaner, easier to understand, and easier to extend.
For the above script, we can put the main code into a function greet_user():
import json

def greet_user():
    """Greet the user by name."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        username = input("What is your name? ")
        with open(filename, 'w') as f:
            json.dump(username, f)
            print(f"We'll remember you when you come back, {username}!")
    else:
        print(f"Welcome back, {username}!")

greet_user()
Next, we can continue refactoring the greet_user() function so it's not doing so many different tasks:
import json

def get_stored_username():
    """Get stored username if available."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        return None
    else:
        return username

def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        filename = 'username.json'
        with open(filename, 'w') as f:
            json.dump(username, f)
            print(f"We'll remember you when you come back, {username}!")

greet_user()
Finally, we can factor one more block of code out of the greet_user() function:
import json

def get_stored_username():
    """Get stored username if available."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        return None
    else:
        return username

def get_new_username():
    """Prompt for a new username."""
    username = input("What is your name? ")
    filename = 'username.json'
    with open(filename, 'w') as f:
        json.dump(username, f)
    return username

def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = get_new_username()
        print(f"We'll remember you when you come back, {username}!")

greet_user()
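With the loading logic isolated in get_stored_username(), it becomes easy to harden just that one function. As a possible next step (my own addition, not from the book), the sketch below also catches json.JSONDecodeError, which json.load() raises when the file exists but does not contain valid JSON, for example when it is empty. Unlike the version above, this variant takes the filename as a parameter so that it can be tried out in a temporary folder.

```python
import json
import os
import tempfile


def get_stored_username(filename):
    """Get the stored username, or None if the file is missing or invalid."""
    try:
        with open(filename) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


results = []
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'username.json')
    results.append(get_stored_username(filename))   # File does not exist yet

    with open(filename, 'w') as f:
        f.write('')                                  # Empty file: not valid JSON
    results.append(get_stored_username(filename))

    with open(filename, 'w') as f:
        json.dump('Eric', f)
    results.append(get_stored_username(filename))   # Valid stored username

print(results)
```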
References