Showing posts with label non-ascii. Show all posts

Sunday, March 11, 2012

Reading unicode data through Python

Reading a file that contains non-ASCII (Unicode) characters in Python 2 works best with the codecs module rather than the built-in open(), because codecs.open() decodes the bytes into unicode strings as it reads. The relevant code:


import codecs

# Decode the file as UTF-8 while reading; the with block closes it automatically.
with codecs.open("file_with_unicode_data.txt", "r", "utf-8") as f:
    print(f.readlines())
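For reference, here is a minimal sketch of the same idea using io.open, which accepts an encoding argument on both Python 2.6+ and Python 3 (where it is the same function as the built-in open). The helper name read_unicode_lines and the sample text are assumptions made for illustration; the sample file is written to a temporary directory only so the snippet can run on its own.

```python
import io
import os
import tempfile


def read_unicode_lines(path, encoding="utf-8"):
    """Return the lines of a text file, decoded with the given encoding."""
    # io.open decodes bytes to text while reading, like codecs.open does.
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.readlines()


# Write a small sample file containing non-ASCII characters (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), "file_with_unicode_data.txt")
with io.open(sample_path, "w", encoding="utf-8") as handle:
    handle.write(u"caf\u00e9\nna\u00efve\n")

# Read it back as decoded text, one string per line.
lines = read_unicode_lines(sample_path)
print(lines)
```

Passing the encoding explicitly avoids relying on the platform default, which may not be UTF-8.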

Thursday, July 7, 2011

Python Non-ASCII character '\xc3'

If running a Python script fails with the following error:

SyntaxError: Non-ASCII character '\xc3' in file text2term_topia.py on line 21, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Add the following encoding declaration at the very beginning of the file, before all import statements. Python only recognizes it on the first or second line.

# -*- coding: utf-8 -*-

The same fix applies when the error mentions '\xe2' or any other non-ASCII byte.
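To check whether a script already carries such a declaration, something like the sketch below could be used. It applies the regular expression given in PEP 263 to the first two lines of the source text. The function name declared_encoding and the two sample sources are assumptions for illustration, and the check is simplified compared with what the interpreter itself does.

```python
import re

# Pattern from PEP 263 for recognizing an encoding declaration comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the encoding named in the first two lines, or None if absent."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical sources: one declares its encoding on line 2, one too late.
with_declaration = "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport os\n"
without_declaration = "import os\nimport sys\n# -*- coding: utf-8 -*-\n"

print(declared_encoding(with_declaration))
print(declared_encoding(without_declaration))
```

The second sample shows why placement matters: a declaration on the third line is ignored, so the script would still raise the SyntaxError under Python 2.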