Python methods to escape or unescape HTML special characters
1. The cgi module that comes with Python has an escape() function (deprecated since Python 3.2 and removed in Python 3.8):
import cgi

s = cgi.escape("""& < >""")  # s = "&amp; &lt; &gt;"
However, it escapes only &, < and >. Called as cgi.escape(string_to_escape, quote=True), it also escapes the double quote (") as &quot;.
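Because cgi.escape() no longer exists on Python 3.8+, code that relied on it needs a replacement. Below is a minimal sketch that reproduces cgi.escape()'s behaviour, where quote=True escapes only the double quote and leaves the apostrophe alone. The helper name cgi_style_escape is hypothetical:

```python
import html


def cgi_style_escape(s, quote=False):
    """Hypothetical stand-in for the removed cgi.escape()."""
    # html.escape(..., quote=False) escapes exactly &, < and >
    s = html.escape(s, quote=False)
    if quote:
        # cgi.escape(quote=True) only added &quot;, not &#x27;
        s = s.replace('"', "&quot;")
    return s


print(cgi_style_escape('& < > "'))
print(cgi_style_escape('"it\'s"', quote=True))
```

The quote replacement runs after html.escape(). Running it afterwards is safe because every & that was originally in the text has already been turned into &amp;.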
Python 3.2 and later include the html module: html.escape() was added in 3.2 and html.unescape() in 3.4. html.escape() differs from cgi.escape() in that quote defaults to True, so it also escapes both " and ':
import html

s = html.escape("""& < " ' >""")  # s = '&amp; &lt; &quot; &#x27; &gt;'
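The following round trip uses both functions and assumes Python 3.4+ for html.unescape(). The unescape function also decodes named and numeric character references that html.escape() itself never produces:

```python
import html

original = """& < " ' >"""
escaped = html.escape(original)
print(escaped)  # &amp; &lt; &quot; &#x27; &gt;

# Named, decimal and hexadecimal references all decode to the same character
print(html.unescape("&eacute; &#233; &#xe9;"))  # é é é

# Unescaping restores the original text exactly
assert html.unescape(escaped) == original
```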
Here's a small snippet that will let you escape quotes and apostrophes as well:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)
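The same table can also drive str.translate(). That method maps each character in a single pass, so the order of the entries does not matter. This is a sketch only, and html_escape_translate is a hypothetical name, not a library function:

```python
# Same mapping as above; str.maketrans() accepts a dict whose keys are
# single characters and whose values are replacement strings.
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}
_html_translation = str.maketrans(html_escape_table)


def html_escape_translate(text):
    """Hypothetical variant of html_escape() built on str.translate()."""
    return text.translate(_html_translation)


print(html_escape_translate('Tom & "Jerry" <3'))
# Tom &amp; &quot;Jerry&quot; &lt;3
```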
You can also use escape() from xml.sax.saxutils to escape HTML. It is built on str.replace(), so it is likely faster than the character-by-character join above. The unescape() function of the same module decodes a string when given the inverse mapping of the extra entities:
from xml.sax.saxutils import escape, unescape

# escape() and unescape() take care of &, < and >.
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;",
}
html_unescape_table = {v: k for k, v in html_escape_table.items()}

def html_escape(text):
    return escape(text, html_escape_table)

def html_unescape(text):
    return unescape(text, html_unescape_table)
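Here is a self-contained round trip that calls xml.sax.saxutils directly with the same two tables. The sample string is made up for illustration:

```python
from xml.sax.saxutils import escape, unescape

html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v: k for k, v in html_escape_table.items()}

text = """<a href="x">Rock 'n' Roll & more</a>"""
escaped = escape(text, html_escape_table)
print(escaped)
# &lt;a href=&quot;x&quot;&gt;Rock &apos;n&apos; Roll &amp; more&lt;/a&gt;

# unescape() handles &lt; and &gt; itself, applies the extra table,
# and decodes &amp; last, so the original text comes back unchanged
assert unescape(escaped, html_unescape_table) == text
```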
The cgi module offers no function that undoes cgi.escape() (on Python 3.4+, html.unescape() does). A fairly simple function does the job, however:
def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    # only needed if the text was escaped with quote=True:
    s = s.replace("&quot;", '"')
    # this has to be last:
    s = s.replace("&amp;", "&")
    return s
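The following sketch shows why &amp; has to be decoded last. It compares the function with a hypothetical variant, unescape_wrong_order, that decodes &amp; first. When the original text literally contained the characters "&lt;", the wrong variant decodes it twice:

```python
def unescape(s):
    """Reverse cgi.escape(s, quote=True)."""
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    s = s.replace("&quot;", '"')
    # Last, so that "&amp;lt;" ends up as "&lt;" and is not decoded again
    return s.replace("&amp;", "&")


def unescape_wrong_order(s):
    """Hypothetical buggy variant that decodes &amp; first."""
    s = s.replace("&amp;", "&")
    s = s.replace("&lt;", "<")
    return s.replace("&gt;", ">")


escaped = "&amp;lt;b&amp;gt;"  # escaped form of the literal text "&lt;b&gt;"
print(unescape(escaped))              # &lt;b&gt;
print(unescape_wrong_order(escaped))  # <b>
```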
A very easy way to turn non-ASCII characters, such as German umlauts or accented letters, into their HTML equivalents is to encode the string to ASCII with the xmlcharrefreplace error handler, which replaces each such character with a numeric character reference:
>>> a = "äöüßáà"
>>> a.encode('ascii', 'xmlcharrefreplace')
b'&#228;&#246;&#252;&#223;&#225;&#224;'
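The encoding trick combines well with html.escape(). Run html.escape() first and then encode; in the reverse order, the & in each generated &#...; reference would itself get escaped. This is a minimal sketch assuming Python 3, and to_ascii_html is a hypothetical helper name:

```python
import html


def to_ascii_html(text):
    """Escape markup characters, then replace every non-ASCII character
    with a numeric character reference so the result is pure ASCII."""
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


s = "Straße <äöü>"
encoded = to_ascii_html(s)
print(encoded)  # Stra&#223;e &lt;&#228;&#246;&#252;&gt;

# html.unescape() decodes both the named and the numeric references
assert html.unescape(encoded) == s
```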