Can't open Unicode URL with Python
Using Python 2.5.2 on Debian Linux, I'm trying to fetch the content of a Spanish URL that contains the Spanish character 'í':
    import urllib
    url = u'http://mydomain.es/índice.html'
    content = urllib.urlopen(url).read()
I'm getting this error:
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)
Before passing the URL to urllib, I've tried this:
    url = urllib.quote(url)
and this:
    url = url.encode('UTF-8')
but neither worked.
Can you tell me what I am doing wrong?
Best answer
Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. Good explanation here, and I quote:
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
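As the URLs I've given explain, this probably means you'll have to replace that "lowercase i with acute accent" with '%ED'. Note that %ED is the ISO-8859-1 (Latin-1) byte for 'í'; if the server expects UTF-8 paths, the escape is '%C3%AD' instead.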
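To make the percent-encoding step concrete, here is a minimal sketch of one way to do it. It is not taken from the answer above: the helper name `to_ascii_url` and its `charset` parameter are hypothetical, and the choice between UTF-8 and Latin-1 is an assumption that depends on what the server expects. The sketch escapes only the path, because calling `quote` on the whole URL would also escape the ':' after the scheme. The imports fall back to the Python 2 module names so the same code should run on both major versions.

```python
try:
    from urllib.parse import quote, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urllib import quote  # Python 2
    from urlparse import urlsplit, urlunsplit


def to_ascii_url(url, charset='utf-8'):
    """Return `url` with the non-ASCII characters of its path percent-encoded.

    Hypothetical helper for illustration. `charset` is an assumption:
    it must match the encoding the server uses for its paths.
    """
    parts = urlsplit(url)
    # Encode the path to bytes first, then escape every byte outside the
    # unreserved set. '/' stays unescaped (the default for quote).
    path = quote(parts.path.encode(charset))
    # Scheme, host, query and fragment are passed through unchanged, so a
    # non-ASCII host name or query string would need separate handling.
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


url = u'http://mydomain.es/\xedndice.html'  # \xed is 'í'
print(to_ascii_url(url))             # http://mydomain.es/%C3%ADndice.html
print(to_ascii_url(url, 'latin-1'))  # http://mydomain.es/%EDndice.html
```

The resulting ASCII string can then be passed to `urllib.urlopen` (Python 2) or `urllib.request.urlopen` (Python 3). If the UTF-8 form returns a 404, trying the Latin-1 form is a reasonable next step.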