Thursday, August 20, 2009

Python 3.1 File I/O open() Is No Longer Binary Mode By Default

After a long break for my wedding preparations, the big day itself and the honeymoon, I am back in shape.

A few things happened while I was away, such as the acquisition of SpringSource by VMware, Inc.

I tried to run this piece of XML parsing code in Python 3.1:

[sourcecode lang="python"]
import xml.parsers.expat
import sys
import os.path

filename = "companies.xml"
parser = xml.parsers.expat.ParserCreate()
f = open(filename, "r")
parser.ParseFile(f)
f.close()
[/sourcecode]

The code above throws an error like this:

[sourcecode lang="shell"]
TypeError: read() did not return a bytes object (type=str)
[/sourcecode]

This code works in Python 2.6 but fails when run on Python 3.1.

After comparing the documentation for the built-in open() function in the two versions (2.6 and 3.x), I found a change in Python 3.x that is not backward compatible (unlike transitions between 2.x versions, the transition from 2.x to 3.x may break backward compatibility).

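The mode string of open() in Python 3.x accepts "t" for text, "b" for binary, "+" for updating (read/write) and the "U" modifier. Most of these were already available in 2.x; what changed is that the choice between text and binary now determines the type that read() returns.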

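It turns out that Python 3.1 no longer treats files as binary unless asked to. Text mode ("t") is the default in Python 3.1, and read() in text mode returns str, which in 3.x is decoded Unicode text. In Python 2.6, by contrast, read() returns a byte string (the 2.x str type) whichever mode is used, so every file access effectively behaves like a binary one. The mode characters still resemble those of C stdio's fopen(), but in 3.x the text/binary distinction is handled by Python itself rather than by the C library.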

Since the mode now matters in Python 3.x, we must add "b" to the mode argument of open() to get binary access, so that read() returns bytes; otherwise it returns str.
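The difference is easy to see in isolation. Here is a minimal sketch for Python 3.x, assuming a throwaway temporary file (the helper name read_types and the sample contents are made up for illustration), that reads the same file in both modes and reports the type read() returns:

```python
import os
import tempfile


def read_types(path):
    """Return the types that read() yields in text mode and in binary mode."""
    # "r" is shorthand for "rt": text mode, data is decoded to str
    with open(path, "r", encoding="utf-8") as text_file:
        text_data = text_file.read()
    # "rb": binary mode, data is returned as raw bytes
    with open(path, "rb") as binary_file:
        binary_data = binary_file.read()
    return type(text_data), type(binary_data)


# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as sample:
    sample.write(b"<companies/>")

text_type, binary_type = read_types(sample_path)
print(text_type.__name__, binary_type.__name__)  # prints "str bytes"
os.remove(sample_path)
```

The same bytes are on disk in both cases; only the mode string decides whether Python decodes them before handing them back.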

[sourcecode lang="python"]
import xml.parsers.expat
import sys
import os.path

xmlFilename = "companies.xml"
p = xml.parsers.expat.ParserCreate()
# "rb" opens the file in binary mode, so read() returns bytes
with open(xmlFilename, "rb") as f:
    p.ParseFile(f)
[/sourcecode]

Now ParseFile() gets what it asks for: a file object whose read() method returns bytes instead of str.
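To check both behaviours side by side without touching companies.xml, something like the following sketch can be used on Python 3.x. The helper name try_parse and the sample document are assumptions for illustration, not part of the original script:

```python
import os
import tempfile
import xml.parsers.expat


def try_parse(path, mode):
    """Parse the file at path opened with mode.

    Returns None on success, or the TypeError raised by ParseFile().
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, mode) as f:
            parser.ParseFile(f)
    except TypeError as exc:
        return exc
    return None


# Hypothetical sample document, written as bytes to a temporary file.
fd, sample_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as sample:
    sample.write(b"<companies><company name='Example'/></companies>")

text_result = try_parse(sample_path, "r")     # text mode: read() returns str
binary_result = try_parse(sample_path, "rb")  # binary mode: read() returns bytes
os.remove(sample_path)

print("text mode:", text_result)
print("binary mode:", binary_result)
```

In text mode the helper should hand back the TypeError quoted earlier, while in binary mode it returns None because the parse succeeds.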