Takeaways from Python Crash Course: Read and Write Files, and Handle Exceptions in Python
This post records my notes from studying Chapter 10, “Files and Exceptions”, of Eric Matthes’s book Python Crash Course [1].
Read text from a file
Read an entire file
We can use the following code to open the file pi_digits.txt, read its contents, and print them to the screen:

with open('pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents)

3.1415926535
8979323846
2643383279
where the pi_digits.txt file contains $\pi$ to 30 decimal places with 10 decimal places per line:

3.1415926535
8979323846
2643383279
open() function
To do any work with a file, we first need to open it with the open() function, which returns a file object representing the file. In this case, open('pi_digits.txt') returns an object representing pi_digits.txt, and Python assigns this object to the variable file_object.
Once we have a file object representing pi_digits.txt, we can use the read() method to read the entire contents of the file and store them as one long string in contents. When we print the value of contents, we get the entire text file back.
The output can differ from the original file by an extra blank line at the end. This happens when the file itself ends with a newline character: read() returns that newline as part of the string, and print() then adds a newline of its own. (At the end of the file, read() returns an empty string, but printing an empty string adds nothing by itself.) To remove the extra blank line, call the rstrip() method on the string in the call to print():

with open('pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents.rstrip())

3.1415926535
8979323846
2643383279
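To see what read() actually returns, here is a minimal sketch. It does not use the real pi_digits.txt; instead it writes a small sample file into a temporary folder (the sample contents and the variable names are assumptions made for this illustration), reads it back, and inspects the resulting string with repr().

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / 'pi_digits.txt'
    sample_path.write_text('3.1415926535\n8979323846\n2643383279\n', encoding='utf-8')

    with open(sample_path, encoding='utf-8') as file_object:
        contents = file_object.read()   # the whole file as one string
        at_end = file_object.read()     # nothing left to read: empty string

print(type(contents).__name__)  # the type of the value returned by read()
print(repr(contents[-11:]))     # the trailing newline comes from the file itself
print(repr(at_end))             # a second read() at end of file returns ''

stripped = contents.rstrip()    # removes trailing whitespace, including the newline
print(repr(stripped[-10:]))
```

Using repr() makes the invisible newline characters visible, which is often the quickest way to find out where an unexpected blank line comes from.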
with keyword
The with keyword is used to close the file opened by open() automatically. This is especially helpful when an error occurs after the file has been opened but before it has been closed with the close() method:
The keyword with closes the file once access to it is no longer needed. Notice how we call open() in this program but not close(). You could open and close the file by calling open() and close(), but if a bug in your program prevents the close() method from being executed, the file may never close. This may seem trivial, but improperly closed files can cause data to be lost or corrupted. And if you call close() too early in your program, you’ll find yourself trying to work with a closed file (a file you can’t access), which leads to more errors. It’s not always easy to know exactly when you should close a file, but with the structure shown here, Python will figure that out for you. All you have to do is open the file and work with it as desired, trusting that Python will close it automatically when the with block finishes execution.
I think the easiest way to check whether a file has been closed properly is to see whether it can be deleted.
If we execute the following code:

file_object = open('pi_digits.txt')
print('9' >= 1)
file_object.close()

As expected, we’ll get an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 2
      1 file_object = open('pi_digits.txt')
----> 2 print('9' >= 1)
      3 file_object.close()

TypeError: '>=' not supported between instances of 'str' and 'int'

At this point, if we try to delete the file pi_digits.txt, a Windows error occurs: “The action cannot be completed because the file is open in Python”. This is easy to understand, because Python never executed file_object.close(). To delete the file successfully, we first have to run file_object.close() ourselves.
On the other hand, with the with keyword, the following code snippet does the same work:

with open('pi_digits.txt') as file_object:
    contents = file_object.read()
    print('9' >= 1)

print(contents)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 3
      1 with open('pi_digits.txt') as file_object:
      2     contents = file_object.read()
----> 3     print('9' >= 1)
      5 print(contents)

TypeError: '>=' not supported between instances of 'str' and 'int'

Although the same error occurs, we can delete pi_digits.txt right away at this point. This small example shows the advantage of using with.
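Deleting the file is one way to check; another is the closed attribute of the file object. The sketch below repeats both experiments on a throwaway sample file in a temporary folder (the sample file and the variable names are assumptions for this illustration) and catches the TypeError so that the script can go on to inspect the file objects.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / 'pi_digits.txt'  # hypothetical sample file
    sample_path.write_text('3.1415926535\n', encoding='utf-8')

    # Manual open/close: the error skips close(), so the file stays open.
    manual_file = open(sample_path, encoding='utf-8')
    try:
        print('9' >= 1)          # raises TypeError
        manual_file.close()      # never reached
    except TypeError:
        pass
    closed_without_with = manual_file.closed
    manual_file.close()          # clean up by hand

    # with block: the file is closed although the same error occurs.
    try:
        with open(sample_path, encoding='utf-8') as file_object:
            print('9' >= 1)      # raises TypeError
    except TypeError:
        pass
    closed_with_with = file_object.closed

print(closed_without_with)  # False: the file was left open
print(closed_with_with)     # True: with closed it for us
```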
Technically, the with statement is a simplified form of a try-finally block (Python spells its error-handling blocks try-except and try-finally; the sources quoted below call them try-catch). It is not limited to open(): every context manager supports the with statement, and the file object returned by open() is one of them. To make a class a context manager, we define the __enter__() and __exit__() methods for it.
In Python, the with statement replaces a try-catch block with a concise shorthand. More importantly, it ensures closing resources right after processing them. A common example of using the with statement is reading or writing to a file. A function or class that supports the with statement is known as a context manager. A context manager allows you to open and close resources right when you want to. For example, the open() function is a context manager. When you call the open() function using the with statement, the file closes automatically after you’ve processed the file. [2]
… the with statement replaces this kind of try-catch block [2]:

f = open("example.txt", "w")

try:
    f.write("hello world")
finally:
    f.close()

The with statement is popularly used with file streams, as shown above [open() function] and with Locks, sockets, subprocesses and telnets etc. [3]
There is nothing special in open() which makes it usable with the with statement and the same functionality can be provided in user defined objects. Supporting with statement in your objects will ensure that you never leave any resource open. To use with statement in user defined objects you only need to add the methods __enter__() and __exit__() in the object methods. [3]
See also references [4], [5], and [6] for more information about the with keyword.
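As a small illustration of a user-defined context manager, consider the sketch below. The class ManagedResource is hypothetical and does not come from the book or the cited references; it only records whether it is currently "open", so that we can observe when __enter__() and __exit__() run.

```python
class ManagedResource:
    """Hypothetical resource that records whether it is open."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        # Called when the with block is entered.
        self.is_open = True
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the with block is left, with or without an error.
        self.is_open = False
        return False  # False: do not suppress the exception


with ManagedResource('demo') as resource:
    state_inside = resource.is_open
state_after = resource.is_open

try:
    with ManagedResource('failing') as failing:
        raise ValueError('something went wrong')
except ValueError:
    state_after_error = failing.is_open

print(state_inside, state_after, state_after_error)  # True False False
```

Returning False from __exit__() lets the exception propagate after the cleanup, which matches the behavior we saw with open(): the file is closed, but the traceback still appears.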
Absolute file path vs. relative file path
If the text file isn’t in the same folder as the script file, we need to give the open() function a file path that tells Python which directory to look in.
For example, if the file pi_digits.txt is on the Desktop, we could provide an absolute file path:

file_path = 'C:/Users/whatastarrynight/Desktop/pi_digits.txt'
# file_path = 'C:\\Users\\whatastarrynight\\Desktop\\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
print(contents.rstrip())

3.1415926535
8979323846
2643383279
It’s fine to separate the parts of the path with a forward slash / or a double backslash \\, but a single backslash \ does not work:

file_path = 'C:\Users\whatastarrynight\Desktop\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
print(contents.rstrip())

  Cell In[1], line 1
    file_path = 'C:\Users\whatastarrynight\Desktop\pi_digits.txt'
                ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The reason is that a backslash starts an escape sequence in Python strings, which is not what we intend here. In the path above, \U is read as the start of a Unicode escape, hence the error. Similarly, in the path “C:\path\to\file.txt”, the sequence \t is interpreted as a tab. This also explains why \\ works: the first backslash escapes the second one.
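If pi_digits.txt is in a sub-folder, say text_files, under the current folder, we can provide a relative file path to the open() function:

file_path = 'text_files\pi_digits.txt'
# file_path = 'text_files/pi_digits.txt'
# file_path = 'text_files\\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
print(contents.rstrip())

3.1415926535
8979323846
2643383279

In this relative file path, \, /, and \\ all work as the path delimiter. Note, however, that the single backslash only happens to work here because \p is not a recognized escape sequence, so Python leaves the backslash in place (recent Python versions emit a warning for such invalid escapes). A folder or file name starting with a letter such as t or n would silently break the path, so / or \\ remains the safer choice.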
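Two other ways to sidestep the escaping problem are raw strings and the standard pathlib module. Neither is used in the examples above, so the sketch below is only an illustration; it reuses the Desktop path from earlier purely as a string and never opens the file.

```python
from pathlib import Path, PureWindowsPath

# A raw string (prefix r) turns off escape processing, so single
# backslashes can be written as they are.
escaped = 'C:\\Users\\whatastarrynight\\Desktop\\pi_digits.txt'
raw = r'C:\Users\whatastarrynight\Desktop\pi_digits.txt'
print(escaped == raw)  # True: both spell the same path

# pathlib builds paths with the / operator and inserts the
# separator that suits the current operating system.
relative_path = Path('text_files') / 'pi_digits.txt'
print(relative_path.name)  # pi_digits.txt

# PureWindowsPath parses a Windows-style path on any platform.
windows_path = PureWindowsPath(raw)
print(windows_path.as_posix())  # C:/Users/whatastarrynight/Desktop/pi_digits.txt
```

A Path object can be passed directly to open(), so open(relative_path) works in the same way as open('text_files/pi_digits.txt').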
Read an entire file line by line using a for loop
We can use a for loop on the file object to examine the lines of a file one at a time:

filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object: # Loop over the file object
        print(line)

3.1415926535

8979323846

2643383279

These blank lines appear because an invisible newline character is at the end of each line in the text file. The print() function adds its own newline each time we call it, so we end up with two newline characters at the end of each line: one from the file and one from print(). As before, we can call the rstrip() method on each line to eliminate these extra blank lines:

filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object: # Loop over the file object
        print(line.rstrip())

3.1415926535
8979323846
2643383279
Make a list of lines from a file: readlines() method

filename = 'pi_digits.txt'

with open(filename) as file_object:
    # The `readlines()` method takes each line from the file and stores it in a list.
    lines = file_object.readlines()

print(lines)

for line in lines:
    print(line.rstrip())

['3.1415926535\n', ' 8979323846\n', ' 2643383279']
3.1415926535
8979323846
2643383279
Afterwards, we can work further with the file’s contents, for example by concatenating the three lines above into one long string:

filename = 'pi_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

print(pi_string)
print(len(pi_string))

3.141592653589793238462643383279
32

When Python reads from a text file, it interprets all text in the file as a string. If we read in a number and want to work with that value in a numerical context, we have to convert it to an integer with the int() function or to a float with the float() function.
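A short sketch of such conversions follows. The string below is the 32-character value built in the previous example; the use of Decimal from the standard decimal module is an addition of mine, not something the chapter covers.

```python
from decimal import Decimal

pi_string = '3.141592653589793238462643383279'  # as built above

# float() converts the text to a number, but a float only keeps
# about 15 to 17 significant digits.
pi_value = float(pi_string)
print(pi_value)
print(pi_value * 2)

# int() works for text that represents a whole number.
print(int('42') + 1)  # 43

# Decimal keeps every digit of the original text.
pi_decimal = Decimal(pi_string)
print(pi_decimal)
```

Without the conversion, an expression such as pi_string * 2 would repeat the text twice instead of doubling the number.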
Read in a large file
The file pi_digits.txt above contains $\pi$ to only 30 decimal places, while pi_million_digits.txt (which can be obtained from [7]) contains 1,000,000 decimal places. We can use the same method to read pi_million_digits.txt and concatenate its contents into one long string:

filename = 'pi_million_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

print(f"{pi_string[:52]}...")
print(len(pi_string))

3.14159265358979323846264338327950288419716939937510...
1000002
Based on this, we can write an interesting program that checks whether someone’s birthday appears in the first million digits of $\pi$. Take mine, 101798 (in date format mmddyy):

filename = 'pi_million_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

birthday = input("Enter your birthday, in the form mmddyy: ")
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")

Enter your birthday, in the form mmddyy: 101798
Your birthday appears in the first million digits of pi!
Interesting!
Write to a file: open() function
One of the simplest ways to save data is to write it to a file. To do this, we need to call the open() function with a second argument.
Write to an empty file: write mode 'w'

filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.")

In the open() function, the second argument 'w' tells Python that we want to open the file in write mode. In write mode, open() automatically creates the file programming.txt if it doesn’t already exist in the current folder, and erases the file’s contents before returning the file object if the file already exists.
Besides write mode 'w', the open() function also supports read mode 'r' (the default), append mode 'a', and a mode that allows us to both read from and write to the file, 'r+'.
The write() method on the file object is used to write a string to the file. Running the above script produces no terminal output, but we can see one line in programming.txt:

I love programming.

Python can only write strings to a text file. If we want to output numerical data to a text file, we need to convert the data to string format first using the str() function.
The write() method doesn’t add any newlines to the text we write. So if we want to write more than one line of text to the file, we have to add the newline characters ourselves:

filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
    file_object.write("I love creating new games.\n")
Append content to a file: append mode, 'a'
By opening a file in append mode (with argument 'a'), we can add content to the file rather than writing over existing content: Python doesn’t erase the contents of the file before returning the file object, and any lines we write to the file will be added at the end of the file. Similar to write mode, Python will create an empty file if the file doesn’t exist yet.

filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

I love programming.
I love creating new games.
I also love finding meaning in large datasets.
I love creating apps that can run in a browser.
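The difference between write mode and append mode can be checked in one short script. The sketch below works on a throwaway file in a temporary folder (so it does not touch a real programming.txt) and also shows the str() conversion needed for numerical data; the sample text and the number 42 are arbitrary choices for this illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / 'programming.txt'

    # Write mode creates the file.
    with open(filename, 'w', encoding='utf-8') as file_object:
        file_object.write("first draft\n")

    # Opening in write mode again erases the previous contents.
    with open(filename, 'w', encoding='utf-8') as file_object:
        file_object.write("I love programming.\n")

    # Append mode keeps the contents and adds to the end.
    with open(filename, 'a', encoding='utf-8') as file_object:
        file_object.write(str(42) + "\n")  # numbers must be converted first

    with open(filename, encoding='utf-8') as file_object:
        lines = file_object.read().splitlines()

print(lines)  # ['I love programming.', '42']
```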
Exception
Python uses special objects called exceptions to manage errors that arise during a program’s execution. Whenever an error occurs that makes Python unsure what to do next, it creates an exception object. If we write code that handles the exception, the program will continue running; if we don’t, the program will halt and display a traceback, which includes a report of the exception that was raised.
Exceptions are handled with try-except blocks. A try-except block not only asks Python to do something, but also tells Python what to do if an exception is raised. When we use try-except blocks, our programs will continue running even if things start to go wrong. Instead of tracebacks, which can be confusing to read, users will see the friendly error messages that the programmer writes.
A Python try-except block plays the same role as a MATLAB try-catch block [8].
Handle the ZeroDivisionError exception with a try-except block
When we divide a number by zero in Python, an error will occur:

print(5/0)

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[21], line 1
----> 1 print(5/0)

ZeroDivisionError: division by zero

In this example, ZeroDivisionError is an exception object, and we can use a try-except block to handle it, so that Python shows a user-friendly message instead of raising an error that interrupts the program:

try:
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")

You can't divide by zero!

Generally, we put the code that we think may raise an error (simply print(5/0) in this case) into a try block. If the code in the try block works, Python skips over the except block. Otherwise, i.e. if the code in the try block causes an error (here ZeroDivisionError), Python looks for an except block whose error matches the one that was raised and runs the code in that block (the indented code after except ZeroDivisionError:, here print("You can't divide by zero!")). As a result, users see a friendly error message instead of a traceback.
We can put the above code snippet into a more complicated program, as follows:

print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
    first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
    second_number = input("Second number: ")
    if second_number == 'q':
        break
    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        print("You can't divide by 0!")
    else:
        print(answer)

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
You can't divide by 0!

First number: q

This example also shows how to use a complete try-except-else block. The additional else block contains any code that depends on the try block succeeding.
Handling particular errors correctly is especially important when the program has more work to do after the error occurs; in that case, using exceptions to prevent crashes is practical. (This happens often in programs that prompt users for input. If the program responds to invalid input appropriately, it can prompt for more valid input instead of crashing.)
On the other hand, it’s also not a good idea to let users see tracebacks. Nontechnical users will be confused by them, and in a malicious setting, attackers will learn more than programmers want them to know from a traceback. For example, they’ll know the name of the program file, and they’ll see a part of the code that isn’t working properly. A skilled attacker can sometimes use this information to determine which kind of attacks to use against the code.
In short, by anticipating likely sources of errors, we can write robust programs that continue to run even when they encounter invalid data and missing resources. The code will be resistant to innocent user mistakes and malicious attacks.
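The division program above still crashes if the user types something that is not a number, because int() then raises a ValueError. A minimal sketch of handling both errors is shown below. To keep it runnable without keyboard input, the logic is moved into a helper function, divide_text(), which is a hypothetical name of mine and not part of the book’s program.

```python
def divide_text(first_text, second_text):
    """Hypothetical helper: divide two numbers given as strings."""
    try:
        answer = int(first_text) / int(second_text)
    except ZeroDivisionError:
        return "You can't divide by 0!"
    except ValueError:
        # int() raises ValueError for text such as 'five'.
        return "Please enter whole numbers only."
    else:
        return str(answer)


for first, second in [('5', '2'), ('5', '0'), ('five', '2')]:
    print(divide_text(first, second))
```

Each except block catches only the error it names, so an unexpected kind of error would still produce a traceback and point us to a case we have not thought about yet.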
Handle the FileNotFoundError exception

filename = 'alice.txt'

try:
    with open(filename, encoding='utf-8') as f:
        contents = f.read()
except FileNotFoundError:
    print(f"Sorry, the file `{filename}` does not exist.")

Sorry, the file `alice.txt` does not exist.

Compared with the earlier file-reading examples, there are two changes here. One is the use of the variable f to represent the file object, which is a common convention. The second is the use of the encoding argument of the open() function. This argument is needed when the system’s default encoding doesn’t match the encoding of the file that’s being read.
A complicated example: count the approximate number of words in a text file
The following code snippet counts the approximate number of words in the text file alice.txt (Alice in Wonderland; the file can also be found in resource [7]):

filename = 'alice.txt'

try:
    with open(filename, encoding='utf-8') as f:
        contents = f.read()
except FileNotFoundError:
    print(f"Sorry, the file `{filename}` does not exist.")
else:
    # Count the approximate number of words in the file.
    words = contents.split()
    num_words = len(words)
    print(f"The file {filename} has about {num_words} words.")

The file alice.txt has about 29461 words.

Here the split() method separates a string into parts wherever it finds whitespace (spaces, tabs, or newlines) and stores all the parts in a list. The result is a list of words from the string, although some punctuation may also appear with some of the words. Note that the count is a little high because the text file also contains extra information provided by the publisher.
Afterwards, by wrapping the above code snippet in a function count_words(), we can easily work with multiple files. The text files siddhartha.txt and little_women.txt can likewise be downloaded from resource [7].

def count_words(filename):
    """Count the approximate number of words in a file."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        print(f"Sorry, the file {filename} does not exist.")
    else:
        words = contents.split()
        num_words = len(words)
        print(f"The file {filename} has about {num_words} words.")

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dict.txt', 'little_women.txt']
for filename in filenames:
    count_words(filename)

The file alice.txt has about 29461 words.
The file siddhartha.txt has about 42172 words.
Sorry, the file moby_dict.txt does not exist.
The file little_women.txt has about 189079 words.

Using the try-except block in this example provides two significant advantages: it prevents users from seeing a traceback, and it lets the program continue analyzing the texts it is able to find. If we didn’t catch the FileNotFoundError raised for moby_dict.txt, the user would see a full traceback, and the program would stop running as soon as it failed to open moby_dict.txt, so it would never analyze little_women.txt.
Make a program fail silently: pass statement
We don’t need to report every exception. Sometimes we want the program to fail silently when an exception occurs and continue on as if nothing happened. To make a program fail silently, we write a try block as usual, but explicitly tell Python to do nothing in the except block with the pass statement.

def count_words(filename):
    """Count the approximate number of words in a file."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        pass
    else:
        words = contents.split()
        num_words = len(words)
        print(f"The file {filename} has about {num_words} words.")

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dict.txt', 'little_women.txt']
for filename in filenames:
    count_words(filename)

The file alice.txt has about 29461 words.
The file siddhartha.txt has about 42172 words.
The file little_women.txt has about 189079 words.

The pass statement also acts as a placeholder. It’s a reminder that we’re choosing to do nothing at a specific point in the program’s execution and that we might want to do something there later. For example, in this program we might decide to write any missing filenames to a file called missing_files.txt. Our users wouldn’t see this file, but we, as programmers, would be able to read the file and deal with any missing texts.
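One possible way to realize the missing_files.txt idea is sketched below. This is my own variation, not the book’s code: count_words() takes an extra, hypothetical missing_log parameter and returns the word count instead of printing it, and everything runs in a temporary folder with a one-line stand-in for alice.txt.

```python
import tempfile
from pathlib import Path


def count_words(filename, missing_log):
    """Return the approximate word count, or None if the file is missing."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        # Stay silent for the user, but record the name for ourselves.
        with open(missing_log, 'a', encoding='utf-8') as log:
            log.write(f"{filename}\n")
        return None
    else:
        return len(contents.split())


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / 'alice.txt'        # stand-in sample text
    existing.write_text('Alice was beginning to get very tired', encoding='utf-8')
    missing = Path(tmp_dir) / 'moby_dict.txt'     # deliberately not created
    missing_log = Path(tmp_dir) / 'missing_files.txt'

    counts = [count_words(existing, missing_log),
              count_words(missing, missing_log)]
    logged = missing_log.read_text(encoding='utf-8').splitlines()

print(counts)       # [7, None]
print(len(logged))  # 1
```

Append mode is used for the log so that each missing file adds a line instead of overwriting the previous ones.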
Decide which errors to report
Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But every time the program depends on something external, such as user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. A little experience will help us know where to include exception handling blocks in the program and how much to report to users about errors that arise.
Save and read data with the json module
json.dump() and json.load() functions
A simple way to save and read data is to use the Python json module [9]. The json module allows us to dump simple Python data structures into a file (with the json.dump() function):

import json

numbers = [2, 3, 5, 7, 11, 13]

filename = 'numbers.json'
with open(filename, 'w') as f:
    json.dump(numbers, f)

and load the data from that file (with the json.load() function) the next time the program runs:

import json

filename = 'numbers.json'
with open(filename) as f:
    numbers = json.load(f)

print(numbers)
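The same pair of functions also handles nested structures such as dictionaries. The sketch below performs a full round trip in a temporary folder; the dictionary favorite and the file name favorite.json are made up for this illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical data: a dictionary containing a string, a list, and a boolean.
favorite = {'name': 'pi', 'digits': [3, 1, 4, 1, 5], 'exact': False}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / 'favorite.json'

    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(favorite, f)

    # The file holds plain text in JSON notation.
    raw_text = filename.read_text(encoding='utf-8')

    with open(filename, encoding='utf-8') as f:
        loaded = json.load(f)

print(raw_text)
print(loaded == favorite)  # True: the data survives the round trip
```

Looking at raw_text shows that JSON has its own spelling for some values; for example, Python’s False is stored as false.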
Save and read user-generated data
Saving data with JSON is useful when working with user-generated data (i.e. user input in the following example):

import json

# Load the username, if it has been stored previously.
# Otherwise, prompt for the username and store it.
filename = 'username.json'
try:
    with open(filename) as f:
        username = json.load(f)
except FileNotFoundError:
    username = input("What is your name? ")
    with open(filename, 'w') as f:
        json.dump(username, f)
    print(f"We'll remember you when you come back, {username}!")
else:
    print(f"Welcome back, {username}!")

If the file username.json doesn’t exist,

What is your name? Eric
We'll remember you when you come back, Eric!

otherwise:

Welcome back, Eric!
Refactor an existing program
Often, we’ll come to a point where the program works, but we’ll recognize that we could improve the code by breaking it up into a series of functions that have specific jobs. This process is called refactoring. Refactoring makes our code cleaner, easier to understand, and easier to extend.
For the above script, we can put the main code into a function greet_user():

import json

def greet_user():
    """Greet the user by name."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        username = input("What is your name? ")
        with open(filename, 'w') as f:
            json.dump(username, f)
        print(f"We'll remember you when you come back, {username}!")
    else:
        print(f"Welcome back, {username}!")

greet_user()

Next, we can continue refactoring the greet_user() function so it’s not doing so many different tasks:

import json

def get_stored_username():
    """Get stored username if available."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        return None
    else:
        return username

def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        filename = 'username.json'
        with open(filename, 'w') as f:
            json.dump(username, f)
        print(f"We'll remember you when you come back, {username}!")

greet_user()

Finally, we can factor one more block of code out of the greet_user() function:

import json

def get_stored_username():
    """Get stored username if available."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        return None
    else:
        return username

def get_new_username():
    """Prompt for a new username."""
    username = input("What is your name? ")
    filename = 'username.json'
    with open(filename, 'w') as f:
        json.dump(username, f)
    return username

def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = get_new_username()
        print(f"We'll remember you when you come back, {username}!")

greet_user()
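One benefit of giving each function a single job is that it becomes easy to harden one of them without touching the rest. As a sketch of that idea (my own extension, not from the book), the variant of get_stored_username() below takes the file name as a parameter and also returns None when the file exists but does not contain valid JSON, in which case json.load() raises json.JSONDecodeError.

```python
import json
import tempfile
from pathlib import Path


def get_stored_username(filename):
    """Get stored username if available; None if missing or unreadable."""
    try:
        with open(filename, encoding='utf-8') as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # Either there is no file, or its contents are not valid JSON.
        return None


with tempfile.TemporaryDirectory() as tmp_dir:
    good = Path(tmp_dir) / 'username.json'
    good.write_text(json.dumps('Eric'), encoding='utf-8')

    broken = Path(tmp_dir) / 'broken.json'       # hypothetical damaged file
    broken.write_text('{not valid json', encoding='utf-8')

    absent = Path(tmp_dir) / 'absent.json'       # deliberately not created

    results = [get_stored_username(path) for path in (good, broken, absent)]

print(results)  # ['Eric', None, None]
```

Because greet_user() only checks whether a username came back, it would treat a damaged file the same way as a missing one and simply ask for the name again.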
References