Convert HTML entities to Unicode and vice versa
Possible duplicates:
Convert XML/HTML Entities into Unicode String in Python
HTML Entity Codes to Text
How do you convert HTML entities to Unicode and vice versa in Python?
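Best answer
As to the "vice versa" (which I needed myself, leading me to find this question, which didn't help, and subsequently another site which had the answer):
u'some string'.encode('ascii', 'xmlcharrefreplace') will return a plain string with any non-ASCII characters turned into XML (HTML) numeric character references.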
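As a minimal sketch of the call above, here is the same idea in Python 3 (an assumption, since the answer itself uses the Python 2 `u'...'` literal). In Python 3 the result is a `bytes` object, so it is decoded back to `str` here. The sample string is made up for illustration.

```python
# Python 3 sketch: replace every non-ASCII character with a numeric
# character reference. The sample text is a hypothetical example.
text = "café €5"

# encode() returns bytes in Python 3
encoded = text.encode("ascii", "xmlcharrefreplace")

# decode back to str if a text value is needed
as_str = encoded.decode("ascii")

print(encoded)  # b'caf&#233; &#8364;5'
print(as_str)   # caf&#233; &#8364;5
```

Note that this only touches non-ASCII characters. `&`, `<` and `>` are left as they are, so they need separate escaping if the result goes into markup.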
You need to have BeautifulSoup.
    # Python 2 with BeautifulSoup 3 (the "BeautifulSoup" package)
    from BeautifulSoup import BeautifulStoneSoup
    import cgi

    def HTMLEntitiesToUnicode(text):
        """Converts HTML entities to unicode.  For example '&amp;' becomes '&'."""
        text = unicode(BeautifulStoneSoup(text, convertEntities=BeautifulStoneSoup.ALL_ENTITIES))
        return text

    def unicodeToHTMLEntities(text):
        """Converts unicode to HTML entities.  For example '&' becomes '&amp;'."""
        # cgi.escape handles &, < and >; xmlcharrefreplace handles non-ASCII characters
        text = cgi.escape(text).encode('ascii', 'xmlcharrefreplace')
        return text

    text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

    uni = HTMLEntitiesToUnicode(text)
    htmlent = unicodeToHTMLEntities(uni)

    print uni
    print htmlent
    # &, ®, <, >, ¢, £, ¥, €, §, ©
    # &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
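The answer above targets Python 2 and BeautifulSoup 3. For Python 3, a rough equivalent can be sketched with only the standard library `html` module. This is a swapped-in technique, not the answer author's code, and the snake_case function names are hypothetical. `html.escape(..., quote=False)` is used to mirror the default behaviour of `cgi.escape`, which was removed in Python 3.8.

```python
import html

def html_entities_to_unicode(text):
    """Decode named and numeric HTML entities, e.g. '&amp;' becomes '&'."""
    return html.unescape(text)

def unicode_to_html_entities(text):
    """Escape &, <, > and turn non-ASCII characters into numeric references."""
    escaped = html.escape(text, quote=False)  # leaves quotes alone, like cgi.escape
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

uni = html_entities_to_unicode(text)
htmlent = unicode_to_html_entities(uni)

print(uni)
# &, ®, <, >, ¢, £, ¥, €, §, ©
print(htmlent)
# &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
```

As in the original, the round trip does not restore named entities such as `&reg;`: non-ASCII characters come back as numeric references like `&#174;`.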