Convert HTML entities to Unicode and vice versa

Possible duplicates:

Convert XML/HTML Entities into Unicode String in Python
HTML Entity Codes to Text

How do you convert HTML entities to Unicode and vice versa in Python?

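Best answer

For the "vice versa" direction (which I needed myself, and which is how I found this question; it did not help, but another site had the answer):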

u'some string'.encode('ascii', 'xmlcharrefreplace')

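will return a plain string in which every non-ASCII character has been converted to an XML (HTML) numeric character reference, for example &#169; for ©.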
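The one-liner above is written for Python 2. As a minimal sketch of the same idea on Python 3, where `encode` returns `bytes` rather than a string, the result can be decoded back to text. The helper name `to_char_refs` is made up for this illustration and is not part of any library.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # In Python 3, encode() returns bytes; decoding as ASCII is safe here
    # because the 'xmlcharrefreplace' handler leaves only ASCII behind.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("caf\u00e9 \u00a9"))  # caf&#233; &#169;
```

Note that this handler only touches non-ASCII characters. ASCII characters that are special in markup, such as `&`, `<` and `>`, pass through unchanged and have to be escaped separately.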

You need to have BeautifulSoup.

from BeautifulSoup import BeautifulStoneSoup
import cgi

def HTMLEntitiesToUnicode(text):
    """Converts HTML entities to unicode.  For example '&amp;' becomes '&'."""
    text = unicode(BeautifulStoneSoup(text, convertEntities=BeautifulStoneSoup.ALL_ENTITIES))
    return text

def unicodeToHTMLEntities(text):
    """Converts unicode to HTML entities.  For example '&' becomes '&amp;'."""
    text = cgi.escape(text).encode('ascii', 'xmlcharrefreplace')
    return text

text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

uni = HTMLEntitiesToUnicode(text)
htmlent = unicodeToHTMLEntities(uni)

print uni
print htmlent
# &, ®, <, >, ¢, £, ¥, €, §, ©
# &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
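The code above targets Python 2 and BeautifulSoup 3 (`unicode`, the `print` statement, `cgi.escape`). A rough Python 3 equivalent, assuming the standard-library `html` module is an acceptable substitute for BeautifulSoup, might look like the sketch below. This is a swapped-in technique, not the answer's original method, and the function names are hypothetical.

```python
import html


def html_entities_to_unicode(text: str) -> str:
    """Convert named and numeric character references to Unicode characters."""
    # html.unescape is available in the standard library since Python 3.4.
    return html.unescape(text)


def unicode_to_html_entities(text: str) -> str:
    """Escape &, < and >, then turn non-ASCII characters into numeric references."""
    # quote=False mirrors the default behaviour of the old cgi.escape,
    # which left quotation marks untouched.
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

uni = html_entities_to_unicode(text)
htmlent = unicode_to_html_entities(uni)

print(uni)      # &, ®, <, >, ¢, £, ¥, €, §, ©
print(htmlent)  # &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
```

As in the original, the round trip does not restore named entities such as `&reg;`; non-ASCII characters come back as numeric references, which browsers render identically.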
