Takeaways from Python Crash Course: Read and Write Files, and Handle Exceptions in Python

Aug. 23, 2024

This post is a record of what I learned from Chapter 10, “Files and Exceptions”, of Eric Matthes’s book Python Crash Course [1].

Read text from a file

Read an entire file

We can use the following code to open the file pi_digits.txt, read and print its contents to the screen:

with open('pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents)

Output:

3.1415926535
   8979323846
   2643383279

where the pi_digits.txt file contains $\pi$ to 30 decimal places with 10 decimal places per line:

3.1415926535
   8979323846
   2643383279

open() function

To do any work with a file, we first need to open it with the open() function, which returns a file object representing the file. In this case, open('pi_digits.txt') returns an object representing pi_digits.txt, and the with statement assigns this object to the variable file_object.

Once we have a file object representing pi_digits.txt, we can use the read() method to read the entire contents of the file and store it as one long string in contents. When we print the value of contents, we get the entire text file back.

If the file ends with a newline character, the output shows one extra blank line compared with the original file. The blank line appears because that final newline is part of the string returned by read(), and print() then adds a newline of its own. To remove the extra blank line, call the rstrip() method on contents in the call to print():

with open('pi_digits.txt') as file_object:
    contents = file_object.read()
    print(contents.rstrip())

Output:

3.1415926535
   8979323846
   2643383279
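
One way to inspect what read() actually returns is to print its repr(). The following is a minimal sketch, assuming a throwaway file whose last line ends with a newline; the temporary directory and the name demo.txt are made up for illustration and are not part of the book's example:

```python
import tempfile
from pathlib import Path

# Hypothetical throwaway file; its last line ends with a newline character.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'demo.txt'
    demo_path.write_text('3.1415926535\n   8979323846\n', encoding='utf-8')

    with open(demo_path, encoding='utf-8') as file_object:
        contents = file_object.read()    # the whole file as one string
        leftover = file_object.read()    # nothing is left to read, so this is ''

print(repr(contents))           # the trailing '\n' is part of the string
print(repr(leftover))
print(repr(contents.rstrip()))  # rstrip() removes the trailing whitespace
```

A second call to read() at the end of the file gives an empty string, while the trailing newline lives in the first result; rstrip() strips it before print() adds its own.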

with keyword

The with keyword automatically closes the file that was opened by the open() function. This is especially helpful when an error occurs after the file has been opened but before the close() method has been called:

The keyword with closes the file once access to it is no longer needed. Notice how we call open() in this program but not close(). You could open and close the file by calling open() and close(), but if a bug in your program prevents the close() method from being executed, the file may never close. This may seem trivial, but improperly closed files can cause data to be lost or corrupted. And if you call close() too early in your program, you’ll find yourself trying to work with a closed file (a file you can’t access), which leads to more errors. It’s not always easy to know exactly when you should close a file, but with the structure shown here, Python will figure that out for you. All you have to do is open the file and work with it as desired, trusting that Python will close it automatically when the with block finishes execution.

I think the easiest way to check whether a file has been closed properly (on Windows, at least) is to see whether it can be deleted successfully.

If we execute the following code:

file_object = open('pi_digits.txt')
print('9' >= 1)
file_object.close()

As expected, we’ll get an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 2
      1 file_object = open('pi_digits.txt')
----> 2 print('9' >= 1)
      3 file_object.close()

TypeError: '>=' not supported between instances of 'str' and 'int'

At this point, if we try to delete the file pi_digits.txt, Windows reports the error “The action cannot be completed because the file is open in Python”. This is easy to understand: Python never executed file_object.close(). To delete the file, we first have to run file_object.close() ourselves.

On the other hand, the following code snippet does the same work using the with keyword:

with open('pi_digits.txt') as file_object:
    contents = file_object.read()
    print('9' >= 1)
    
print(contents)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 3
      1 with open('pi_digits.txt') as file_object:
      2     contents = file_object.read()
----> 3     print('9' >= 1)
      5 print(contents)

TypeError: '>=' not supported between instances of 'str' and 'int'

Although the same error occurs, we can delete pi_digits.txt directly at this point. This small example shows the advantage of using with.
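
The same check can be made from inside Python with the closed attribute of the file object. This is a minimal sketch, assuming a temporary copy of pi_digits.txt created only for the demonstration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'pi_digits.txt'   # hypothetical copy of the file
    demo_path.write_text('3.1415926535\n', encoding='utf-8')

    try:
        with open(demo_path, encoding='utf-8') as file_object:
            contents = file_object.read()
            print('9' >= 1)          # raises TypeError inside the with block
    except TypeError as error:
        print(f"Caught: {error}")

    # The with statement closed the file even though the block failed.
    closed_after_error = file_object.closed
    print(closed_after_error)

    demo_path.unlink()               # deleting works because no handle is open
    deleted = not demo_path.exists()
```

Here closed_after_error is True, so the file can be deleted right away without calling close() by hand.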

Technically, the with statement is a compact form of a try-finally block, and it is not limited to the open() function. Any context manager supports the with statement, and the file object returned by open() is one such context manager. To make our own context manager, we define the __enter__() and __exit__() methods on a class.

In Python, the with statement replaces a try-catch block with a concise shorthand. More importantly, it ensures closing resources right after processing them. A common example of using the with statement is reading or writing to a file. A function or class that supports the with statement is known as a context manager. A context manager allows you to open and close resources right when you want to. For example, the open() function is a context manager. When you call the open() function using the with statement, the file closes automatically after you’ve processed the file. [2]


… the with statement replaces this kind of try-catch block [2]:

f = open("example.txt", "w")

try:
    f.write("hello world")
finally:
    f.close()


The with statement is popularly used with file streams, as shown above [open() function] and with Locks, sockets, subprocesses and telnets etc. [3]


There is nothing special in open() which makes it usable with the with statement and the same functionality can be provided in user defined objects. Supporting with statement in your objects will ensure that you never leave any resource open. To use with statement in user defined objects you only need to add the methods __enter__() and __exit__() in the object methods. [3]


See also references [4], [5], and [6] for more information about the with keyword.
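
To make the __enter__() and __exit__() idea concrete, here is a minimal sketch of a user-defined context manager. The class name ManagedResource and its events list are hypothetical and exist only to show the order in which things happen:

```python
class ManagedResource:
    """Hypothetical resource that records when it is opened and closed."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def __enter__(self):
        self.events.append('open')
        return self              # this value is bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('close')
        return False             # False means: do not suppress the exception


resource = ManagedResource('demo')
try:
    with resource as r:
        r.events.append('work')
        raise ValueError('something went wrong')
except ValueError:
    pass

print(resource.events)   # ['open', 'work', 'close']
```

The 'close' entry is recorded even though the block raised an exception, which is the same guarantee that with gives for files.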

Absolute file path vs. relative file path

If the text file isn’t in the same folder as the script file, we need to pass a file path to the open() function to tell Python which directory to look in.

For example, if the file pi_digits.txt is on the Desktop, we could provide an absolute file path:

file_path = 'C:/Users/whatastarrynight/Desktop/pi_digits.txt'
# file_path = 'C:\\Users\\whatastarrynight\\Desktop\\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
    print(contents.rstrip())

Output:

3.1415926535
   8979323846
   2643383279

Either a forward slash / or a double backslash \\ can be used to separate the parts of the file path, but a single backslash \ does not work here:

file_path = 'C:\Users\whatastarrynight\Desktop\pi_digits.txt'

with open(file_path) as file_object:
    contents = file_object.read()
    print(contents.rstrip())

Output:

  Cell In[1], line 1
    file_path = 'C:\Users\whatastarrynight\Desktop\pi_digits.txt'
                                                                 ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The reason is that a backslash starts an escape sequence in Python strings, which is not our intention here. In the path above, \U starts a Unicode escape that expects eight hexadecimal digits, hence the “truncated \UXXXXXXXX escape” error. Similarly, in the path “C:\path\to\file.txt”, the sequence \t is interpreted as a tab. This also explains why \\ works: the first backslash escapes the second one.

If pi_digits.txt is in a sub-folder, say text_files, under the current folder, we can pass a relative file path to the open() function:

file_path = 'text_files/pi_digits.txt'
# file_path = 'text_files\\pi_digits.txt'
# file_path = 'text_files\pi_digits.txt'  # works only because \p is not an escape sequence

with open(file_path) as file_object:
    contents = file_object.read()
    print(contents.rstrip())

Output:

3.1415926535
   8979323846
   2643383279

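In this relative file path, \, /, and \\ all work as path delimiters, but the single backslash works only because \p is not a recognized escape sequence (recent Python versions warn about it). A folder or file name starting with, say, t or n would break the path, so / or \\ is the safer choice.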
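
Two alternatives that the text above does not cover are raw strings and the standard pathlib module. The sketch below is only an illustration of how escapes behave; PureWindowsPath is used so that the result is the same on any operating system, and no file is actually opened:

```python
from pathlib import PureWindowsPath

# In a normal string literal, \t is a tab and \\ is one backslash.
print(len('a\tb'))      # 3 characters: 'a', tab, 'b'
print(len('a\\tb'))     # 4 characters: 'a', backslash, 't', 'b'

# A raw string (r'...') keeps every backslash exactly as written.
raw_path = r'C:\Users\whatastarrynight\Desktop\pi_digits.txt'
escaped_path = 'C:\\Users\\whatastarrynight\\Desktop\\pi_digits.txt'
print(raw_path == escaped_path)   # True

# PureWindowsPath joins parts with the / operator, so no separator is typed.
relative_path = PureWindowsPath('text_files') / 'pi_digits.txt'
print(relative_path)              # text_files\pi_digits.txt
print(relative_path.as_posix())   # text_files/pi_digits.txt
```

For real file access, pathlib.Path works the same way, and open() accepts such path objects directly.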

Read an entire file line by line using for loop

We can use a for loop on the file object to examine each line from a file one at a time:

filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object: # Loop over the file object
        print(line)

Output:

3.1415926535

   8979323846

   2643383279

These blank lines appear because an invisible newline character is at the end of each line in the text file. The print() function adds its own newline each time we call it, so we end up with two newline characters at the end of each line: one from the file and one from print(). As before, we can call the rstrip() method on each line to eliminate these extra blank lines.

filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object: # Loop over the file object
        print(line.rstrip())

Output:

3.1415926535
   8979323846
   2643383279

Make a list of lines from a file: readlines() method

filename = 'pi_digits.txt'

with open(filename) as file_object:
    # The `readlines()` method takes each line from the file and stores it in a list.
    lines = file_object.readlines()

print(lines)

for line in lines:
    print(line.rstrip())

Output:

['3.1415926535\n', '   8979323846\n', '   2643383279']
3.1415926535
   8979323846
   2643383279

Afterwards, we can work further with the file’s contents, for example by concatenating the three lines above into one long string:

filename = 'pi_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()
 
print(pi_string)
print(len(pi_string))

Output:

3.141592653589793238462643383279
32

When Python reads from a text file, it interprets all text in the file as a string. If we read in a number and want to work with that value in a numerical context, we have to convert it to an integer with the int() function or to a float with the float() function.
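
A short sketch of that conversion, assuming pi_string holds the 32-character string built above (it is written out literally here so that the snippet runs on its own):

```python
pi_string = '3.141592653589793238462643383279'   # text, as read from the file

pi_value = float(pi_string)      # a float keeps only about 15-17 significant digits
print(pi_value)
print(pi_value * 2)              # arithmetic now works

decimals = pi_string.split('.')[1]
first_six = int(decimals[:6])    # int() turns the digit string '141592' into a number
print(first_six + 1)             # 141593
```

Note that float() cannot hold all 30 decimal places; the string keeps them, the number does not.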

Read in a large file

The file pi_digits.txt above holds $\pi$ to only 30 decimal places, while pi_million_digits.txt (which can be obtained from [7]) holds it to 1,000,000 decimal places. We can use a similar method to read pi_million_digits.txt and concatenate its contents into one long string:

filename = 'pi_million_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()
    
pi_string = ''
for line in lines:
    pi_string += line.strip()
    
print(f"{pi_string[:52]}...")
print(len(pi_string))

Output:

3.14159265358979323846264338327950288419716939937510...
1000002

Based on this, we can write an interesting program that checks whether someone’s birthday appears in the first million digits of $\pi$. Take mine, 101798 (in the date format mmddyy):

filename = 'pi_million_digits.txt'

with open(filename) as file_object:
    lines = file_object.readlines()
    
pi_string = ''
for line in lines:
    pi_string += line.strip()
    
birthday = input("Enter your birthday, in the form mmddyy: ")

if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")

Output:

Enter your birthday, in the form mmddyy: 101798
Your birthday appears in the first million digits of pi!

Interesting!


Write to a file: open() function

One of the simplest ways to save data is to write it to a file. To do this, we call the open() function with a second argument.

Write to an empty file: write mode 'w'

filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.")

In the open() function, the second argument 'w' tells Python that we want to open the file in write mode. In write mode, open() automatically creates the file programming.txt if it doesn’t already exist in the current folder, and erases the file’s contents before returning the file object if the file already exists.

Besides write mode 'w', open() supports read mode 'r' (the default), append mode 'a', and a mode that allows us to both read from and write to the file, 'r+'.

The write() method of the file object writes a string to the file. Running the above script produces no terminal output, but programming.txt now contains one line:

I love programming.

Python can only write strings to a text file. If we want to output numerical data to a text file, we need to convert the data to string format first with the str() function.

The write() method doesn’t add any newlines to the text we write. So if we want to write more than one line of text to the file, we have to add the newline characters ourselves:

filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
    file_object.write("I love creating new games.\n")
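
Both points, converting with str() and adding newlines by hand, can be combined in a small round trip. This is only a sketch; the list numbers and the file name numbers.txt are made up, and the file lives in a temporary directory:

```python
import tempfile
from pathlib import Path

numbers = [2, 3, 5, 7]

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / 'numbers.txt'     # hypothetical file name

    with open(filename, 'w') as file_object:
        for number in numbers:
            # write() only accepts strings, so convert and add the newline ourselves.
            file_object.write(str(number) + '\n')

    with open(filename) as file_object:
        # int() ignores the surrounding whitespace, including the newline.
        restored = [int(line) for line in file_object]

print(restored)   # [2, 3, 5, 7]
```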

Append content to a file: append mode, 'a'

By opening a file in append mode (with the argument 'a'), we can add content to the file rather than writing over existing content. Python doesn’t erase the contents of the file before returning the file object, and any lines we write are added at the end of the file. As in write mode, Python creates an empty file if the file doesn’t exist yet.

filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

programming.txt now contains:

I love programming.
I love creating new games.
I also love finding meaning in large datasets.
I love creating apps that can run in a browser.
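
The four modes mentioned in this section can be compared side by side. The following sketch assumes a scratch file named modes.txt in a temporary directory; both are hypothetical and chosen only for the demonstration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / 'modes.txt'       # hypothetical file name

    with open(filename, 'w') as f:      # 'w' creates the file
        f.write("first\n")
    with open(filename, 'w') as f:      # 'w' again: the old contents are erased
        f.write("second\n")
    with open(filename, 'a') as f:      # 'a': new text goes after the existing text
        f.write("third\n")
    with open(filename, 'r+') as f:     # 'r+': read and write through one file object
        before = f.read()               # reading moves the position to the end
        f.write("fourth\n")             # so this write lands at the end

    with open(filename) as f:           # default mode 'r'
        after = f.read()

print(repr(before))
print(repr(after))
```

Unlike 'w' and 'a', the modes 'r' and 'r+' do not create a missing file; they raise FileNotFoundError instead.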


Exception

Python uses special objects called exceptions to manage errors that arise during a program’s execution. Whenever an error occurs that makes Python unsure what to do next, it creates an exception object. If we write code that handles the exception, the program continues running; if we don’t, the program halts and displays a traceback, which includes a report of the exception that was raised.

Exceptions are handled with try-except blocks. A try-except block not only asks Python to do something, but also tells Python what to do if an exception is raised. With try-except blocks, our programs keep running even when things start to go wrong. Instead of tracebacks, which can be confusing to read, users see the friendly error messages that the programmer wrote.

Python’s try-except block plays the same role as MATLAB’s try-catch block [8].

Handle the ZeroDivisionError exception by try-except block

When we divide a number by zero in Python, an error will occur:

print(5/0)

Output:

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[21], line 1
----> 1 print(5/0)

ZeroDivisionError: division by zero

In this example, ZeroDivisionError is the type of the exception object that Python raised. We can use a try-except block to handle it, so that Python shows a user-friendly message instead of interrupting the program with a traceback:

try:
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")

Output:

You can't divide by zero!

Generally, we put the code that we think may raise an error (simply print(5/0) in this case) into a try block. If the code in the try block works, Python skips over the except block. Otherwise, that is, if the code in the try block causes an error (ZeroDivisionError here), Python looks for an except block whose error matches the raised one and runs the code in that block (the indented code following except ZeroDivisionError:, here print("You can't divide by zero!")). As a result, users see a friendly error message instead of a traceback.

We can use the above code snippet in a more complicated case, as follows:

print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
    first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
    second_number = input("Second number: ")
    if second_number == 'q':
        break
        
    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        print("You can't divide by 0!")
    else:
        print(answer)

Output:

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
You can't divide by 0!

First number: q

This example also shows how to use a complete try-except-else block. The additional else block contains any code that depends on the try block succeeding.
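
The calculator above still crashes if the user types something that is not a number, because int() then raises a ValueError. As a sketch of how a second except block could cover that case, here is a hypothetical helper divide_text(); returning the messages instead of printing them is my own choice, made so that the function is easy to try out:

```python
def divide_text(first_number, second_number):
    """Hypothetical helper: divide two numbers given as strings."""
    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        return "You can't divide by 0!"
    except ValueError:
        # int() raises ValueError for text such as 'five'.
        return "Please enter whole numbers only."
    else:
        # Runs only if the try block succeeded.
        return str(answer)


print(divide_text('5', '0'))      # You can't divide by 0!
print(divide_text('five', '2'))   # Please enter whole numbers only.
print(divide_text('6', '3'))      # 2.0
```

Python checks the except blocks from top to bottom and runs the first one that matches the raised exception.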

Handling particular errors correctly is especially important when the program has more work to do after the error occurs; using exceptions to prevent crashes is then practical. (This happens often in programs that prompt users for input. If the program responds to invalid input appropriately, it can prompt for more valid input instead of crashing.)

On the other hand, it’s also not a good idea to let users see tracebacks. Nontechnical users will be confused by them, and in a malicious setting, attackers will learn more than programmers want them to know from a traceback. For example, they’ll know the name of the program file, and they’ll see a part of the code that isn’t working properly. A skilled attacker can sometimes use this information to determine which kind of attacks to use against the code.

By anticipating likely sources of errors, we can write robust programs that continue to run even when they encounter invalid data and missing resources. The code will be resistant to innocent user mistakes and malicious attacks.

Handle the FileNotFoundError exception

filename = 'alice.txt'

try:
    with open(filename, encoding='utf-8') as f:
        contents = f.read()
except FileNotFoundError:
    print(f"Sorry, the file `{filename}` does not exist.")

Output:

Sorry, the file `alice.txt` does not exist.

There are two changes here. One is the use of the variable f to represent the file object, which is a common convention. The second is the use of the encoding argument of the open() function. This argument is needed when the system’s default encoding doesn’t match the encoding of the file that’s being read.
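
What happens when the encodings don't match can be shown with a small sketch. It assumes a scratch file cafe.txt (a made-up name) that is written as UTF-8 and then read once with the matching encoding and once as ASCII:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / 'cafe.txt'        # hypothetical file name
    with open(filename, 'w', encoding='utf-8') as f:
        f.write("caf\u00e9")                     # 'café'; the last letter takes two bytes in UTF-8

    with open(filename, encoding='utf-8') as f:
        contents = f.read()                      # matching encoding: reads fine

    try:
        with open(filename, encoding='ascii') as f:
            f.read()                             # ASCII cannot decode those two bytes
    except UnicodeDecodeError:
        message = f"Sorry, {filename.name} is not valid ASCII text."

print(len(contents))   # 4
print(message)
```

UnicodeDecodeError can be handled with try-except in the same way as FileNotFoundError.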

A complicated example: count the approximate number of words in a text file

The following code snippet counts the approximate number of words in the text file alice.txt (Alice in Wonderland; the file can also be found in resource [7]):

filename = 'alice.txt'

try:
    with open(filename, encoding='utf-8') as f:
        contents = f.read()
except FileNotFoundError:
    print(f"Sorry, the file `{filename}` does not exist.")
else:
    # Count the approximate number of words in the file.
    words = contents.split()
    num_words = len(words)
    print(f"The file {filename} has about {num_words} words.")

Output:

The file alice.txt has about 29461 words.

Here, the split() method separates a string into parts wherever it finds whitespace (spaces, tabs, or newlines) and stores all the parts in a list. The result is a list of words from the string, although some punctuation may remain attached to some of the words. Note that the count is a little high because the publisher provides extra information in the text file.
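
A quick sketch of how split() behaves on a short sample string (the sentence is made up for illustration):

```python
text = "Alice was beginning\nto get very tired."

words = text.split()          # no argument: split on any run of whitespace
print(words)
print(len(words))             # 7

pieces = text.split(' ')      # explicit ' ': split on single spaces only
print(pieces)                 # 'beginning\nto' stays together as one piece
```

The period stays attached to 'tired.', which is one reason the word count is only approximate.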

Afterwards, by wrapping the above code snippet in a function count_words(), we can easily work with multiple files. The text files siddhartha.txt and little_women.txt can likewise be downloaded from resource [7]:

def count_words(filename):
    """Count the approximate number of words in a file."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        print(f"Sorry, the file {filename} does not exist.")
    else:
        words = contents.split()
        num_words = len(words)
        print(f"The file {filename} has about {num_words} words.")


filenames = ['alice.txt', 'siddhartha.txt', 'moby_dict.txt', 'little_women.txt']
for filename in filenames:
    count_words(filename)

Output:

The file alice.txt has about 29461 words.
The file siddhartha.txt has about 42172 words.
Sorry, the file moby_dict.txt does not exist.
The file little_women.txt has about 189079 words.

Using the try-except block in this example provides two significant advantages: it prevents users from seeing a traceback, and it lets the program continue analyzing the texts it is able to find. If we didn’t catch the FileNotFoundError that moby_dict.txt raised, the user would see a full traceback, and the program would stop running right after analyzing siddhartha.txt, and hence would never analyze little_women.txt.

Make a program fail silently: pass statement

We don’t need to report every exception. Sometimes we want the program to fail silently when an exception occurs and continue as if nothing happened. To make a program fail silently, we write the try block as usual, but explicitly tell Python to do nothing in the except block with the pass statement.

def count_words(filename):
    """Count the approximate number of words in a file."""
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        pass
    else:
        words = contents.split()
        num_words = len(words)
        print(f"The file {filename} has about {num_words} words.")


filenames = ['alice.txt', 'siddhartha.txt', 'moby_dict.txt', 'little_women.txt']
for filename in filenames:
    count_words(filename)

Output:

The file alice.txt has about 29461 words.
The file siddhartha.txt has about 42172 words.
The file little_women.txt has about 189079 words.

The pass statement also acts as a placeholder. It’s a reminder that we’re choosing to do nothing at a specific point in the program’s execution and that we might want to do something there later. For example, in this program we might decide to write any missing filenames to a file called missing_files.txt. Our users wouldn’t see this file, but we, as programmers, would be able to read it and deal with any missing texts.
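
The missing_files.txt idea could look roughly like the sketch below. It is not the book's code: the extra parameter missing_log, the returned count, and the temporary folder with a one-line alice.txt are all assumptions made so that the example runs on its own:

```python
import tempfile
from pathlib import Path


def count_words(filename, missing_log):
    """Return the approximate word count, or None if the file is missing.

    missing_log is a hypothetical extra parameter: the path of the log file.
    """
    try:
        with open(filename, encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        # Stay silent for the user, but record the name for the programmer.
        with open(missing_log, 'a', encoding='utf-8') as log:
            log.write(f"{filename}\n")
        return None
    else:
        return len(contents.split())


with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    (folder / 'alice.txt').write_text("Alice was beginning to get very tired",
                                      encoding='utf-8')
    missing_log = folder / 'missing_files.txt'

    counts = [count_words(folder / name, missing_log)
              for name in ['alice.txt', 'moby_dict.txt']]
    logged = missing_log.read_text(encoding='utf-8').splitlines()

print(counts)                                   # [7, None]
print([Path(entry).name for entry in logged])   # ['moby_dict.txt']
```

Append mode 'a' is used for the log so that each missing file adds a line instead of overwriting the previous ones.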

Decide which errors to report

Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But every time the program depends on something external, such as user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. A little experience will help us know where to include exception handling blocks in the program and how much to report to users about errors that arise.


Save and read data with the json module

json.dump() and json.load() functions

A simple way to save and read data is to use Python’s json module [9]. The json module allows us to dump simple Python data structures into a file (with the json.dump() function):

import json

numbers = [2, 3, 5, 7, 11, 13]

filename = 'numbers.json'

with open(filename, 'w') as f:
    json.dump(numbers, f)

and to load the data from that file (with the json.load() function) the next time the program runs.

import json

filename = 'numbers.json'
with open(filename) as f:
    numbers = json.load(f)

print(numbers)
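
To see what actually ends up in numbers.json, the string-based counterparts json.dumps() and json.loads() are handy. The sketch below also shows two conversions worth knowing about; the tuple and dictionary values are made up for illustration:

```python
import json

numbers = [2, 3, 5, 7, 11, 13]

# json.dumps() returns the same text that json.dump() would write to a file.
text = json.dumps(numbers)
print(text)                           # [2, 3, 5, 7, 11, 13]

# json.loads() is the string counterpart of json.load().
print(json.loads(text) == numbers)    # True

# Not every Python type survives unchanged: tuples come back as lists,
# and dictionary keys come back as strings.
print(json.loads(json.dumps((1, 2))))        # [1, 2]
print(json.loads(json.dumps({1: 'one'})))    # {'1': 'one'}
```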

Save and read user-generated data

Saving data with JSON is useful when working with user-generated data (i.e. user input in the following example):

import json

# Load the username, if it has been stored previously.
# Otherwise, prompt for the username and store it.
filename = 'username.json'
try:
    with open(filename) as f:
        username = json.load(f)
except FileNotFoundError:
    username = input("What is your name? ")
    with open(filename, 'w') as f:
        json.dump(username, f)
        print(f"We'll remember you when you come back, {username}!")
else:
    print(f"Welcome back, {username}!")

If the file username.json doesn’t exist,

What is your name? Eric
We'll remember you when you come back, Eric!

otherwise:

Welcome back, Eric!


Refactor an existing program

Often, we’ll come to a point where the program works, but we’ll recognize that we could improve the code by breaking it up into a series of functions that have specific jobs. This process is called refactoring. Refactoring makes our code cleaner, easier to understand, and easier to extend.

For the above script, we can put the main code into a function greet_user():

import json

def greet_user():
    """Greet the user by name."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        username = input("What is your name? ")
        with open(filename, 'w') as f:
            json.dump(username, f)
            print(f"We'll remember you when you come back, {username}!")
    else:
        print(f"Welcome back, {username}!")

greet_user()

Next, we can continue refactoring the greet_user() function so it’s not doing so many different tasks:

import json

def get_stored_username():
    """Get stored username if available."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        return None
    else:
        return username


def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        filename = 'username.json'
        with open(filename, 'w') as f:
            json.dump(username, f)
            print(f"We'll remember you when you come back, {username}!")

greet_user()

Finally, we can factor one more block of code out of the greet_user() function:

import json

def get_stored_username():
    """Get stored username if available."""
    filename = 'username.json'
    try:
        with open(filename) as f:
            username = json.load(f)
    except FileNotFoundError:
        return None
    else:
        return username


def get_new_username():
    """Prompt for a new username."""
    username = input("What is your name? ")
    filename = 'username.json'
    with open(filename, 'w') as f:
        json.dump(username, f)
    return username


def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = get_new_username()
        print(f"We'll remember you when you come back, {username}!")

greet_user()


References