Can't open Unicode URL with Python

Using Python 2.5.2 on Debian Linux, I'm trying to get the content from a Spanish URL that contains the Spanish character 'í':

import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url).read()

I'm getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)

Before passing the URL to urllib, I tried this:

url = urllib.quote(url)

and this:

url = url.encode('UTF-8')

but neither worked.

Can you tell me what I am doing wrong?

Best answer

Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. Good explanation here, and I quote:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

As the URLs I've given explain, this probably means you'll have to replace that "lowercase i with acute accent" with '%ED'.
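
To make the replacement concrete, here is a minimal sketch that percent-encodes only the path of the URL. The helper name percent_encode_url is hypothetical, and the choice of byte encoding is an assumption about the server: '%ED' is the Latin-1 byte for 'í', while a server that expects UTF-8 would need '%C3%AD' instead. The host name is left untouched, so non-ASCII domain names are not handled here.

```python
# -*- coding: utf-8 -*-
try:
    # Python 2 module layout
    from urllib import quote
    from urlparse import urlsplit, urlunsplit
except ImportError:
    # Python 3 module layout
    from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url, encoding='utf-8'):
    """Percent-encode non-ASCII characters in the path of `url`.

    Hypothetical helper: `encoding` is an assumption about what the
    server expects ('utf-8' or 'latin-1' are the usual candidates).
    """
    parts = urlsplit(url)
    # Encode the path to bytes first, then escape every byte that is not
    # an unreserved ASCII character; '/' is kept as the path separator.
    path = quote(parts.path.encode(encoding), safe='/')
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


url = u'http://mydomain.es/\xedndice.html'   # \xed is 'í'
print(percent_encode_url(url))               # http://mydomain.es/%C3%ADndice.html
print(percent_encode_url(url, 'latin-1'))    # http://mydomain.es/%EDndice.html
```

The result is plain ASCII and can be passed to urlopen. Quoting the whole URL, as in the question, would also escape the ':' after the scheme, which is one reason to quote the path alone.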
