Tuesday, June 19, 2012

Removing punctuation


Python's string module provides a constant, punctuation, that contains all the ASCII punctuation characters:

>>> from string import punctuation
>>> print(punctuation)
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
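
Since punctuation is an ordinary string rather than a function, the usual string operations work on it directly. A minimal sketch of a few checks, assuming only the standard library:

```python
from string import punctuation

# punctuation is a plain str, so len() and the "in" operator apply
print(len(punctuation))      # 32
print('!' in punctuation)    # True
print('a' in punctuation)    # False
```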

Code to remove punctuation from a given string:

from string import punctuation

def removePunctuation(string, replacement='', exclude=''):
    # Replace every punctuation character that is not listed in exclude
    for p in set(punctuation) - set(exclude):
        string = string.replace(p, replacement)
    return string

>>> removePunctuation("Hello World!!",' ')
"Hello World  "

>>> removePunctuation("Hello World!!")
"Hello World"

>>> removePunctuation("Hello-World!!",' ','!')
"Hello World!!"

The replacement parameter is the string substituted for each punctuation character.
It defaults to an empty string, which simply deletes the punctuation.

The exclude parameter lets you retain specific punctuation marks. For example, when cleaning a paragraph of text, we might want to keep the full stop (.). The exclude parameter takes a string containing all the punctuation characters that should be left untouched.
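
The full-stop case can be sketched as follows. The function is restated so the snippet runs on its own, and the sample paragraph is a made-up illustration, not text from a real data set:

```python
from string import punctuation

def removePunctuation(string, replacement='', exclude=''):
    # Same logic as the function above, repeated to keep this sketch self-contained
    for p in set(punctuation) - set(exclude):
        string = string.replace(p, replacement)
    return string

paragraph = "Wait, what? It works. Really!"  # hypothetical sample text
cleaned = removePunctuation(paragraph, exclude='.')
print(cleaned)  # Wait what It works. Really
```

Several marks can be retained at once by listing them together, for example exclude='.,' to keep both full stops and commas.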
