How to crawl a specific value out of an HTML page with Python HTMLParser

Let's imagine I want to crawl a specific value out of an HTML page, but I have no clear identifier (such as name="abc") for that value. I have to find the value (in this case "dfgd454") through the HTML hierarchy.

How can I extract that value with Python's HTMLParser?

It has to be something along the lines of:

def handle_starttag(self, tag, attrs):
    if tag == 'div':
        attrD = dict(attrs)
        if attrD['class'] == 'attr':

But I know that code is not sufficient...

Thankful for any help; I have googled a lot so far and have not found a proper solution.

Best answer

You could use the BeautifulSoup parser.

from bs4 import BeautifulSoup

s = '''<html><body><div id="pagecontent"><div id="container"><div id="content">
<div id="tab-description"><div id="attributes">
<div class="attr"> <span class="name">Ugug</span> <span class="value">dfgd454</span> </div>'''

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(s, 'html.parser')
# 'div > span.value' matches span tags with class "value" that are direct children of a div.
print(soup.select('div > span.value')[0].text)

The selector matches every span tag whose class attribute is "value" and that is an immediate child of a div tag; [0] takes the first match.

Output:

dfgd454
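
Since the question asks specifically about HTMLParser, the same lookup can also be sketched with the standard library's html.parser module (Python 3) by keeping a stack of open tags and reading text only when the top of the stack matches the wanted hierarchy. This is a minimal sketch, not part of the original answer: the names ValueParser, VOID_TAGS, stack and values are illustrative, and it assumes the class attributes are exactly "attr" and "value" (no additional classes on the same element).

```python
from html.parser import HTMLParser

# Same markup as in the answer above (closing tags are missing there too).
HTML = '''<html><body><div id="pagecontent"><div id="container"><div id="content">
<div id="tab-description"><div id="attributes">
<div class="attr"> <span class="name">Ugug</span> <span class="value">dfgd454</span> </div>'''

# Elements that never get an end tag; pushing them would corrupt the stack.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class ValueParser(HTMLParser):
    """Collect the text of <span class="value"> directly inside <div class="attr">."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, class attribute) for every currently open element
        self.values = []  # extracted texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get('class')))

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this name;
        # stray end tags that match nothing are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if (text
                and len(self.stack) >= 2
                and self.stack[-1] == ('span', 'value')
                and self.stack[-2] == ('div', 'attr')):
            self.values.append(text)


parser = ValueParser()
parser.feed(HTML)
parser.close()
print(parser.values[0])  # prints: dfgd454
```

Using dict(attrs).get('class') instead of attrD['class'] avoids a KeyError on tags that have no class attribute, which is one reason the snippet in the question fails on real pages. To require a longer ancestor chain, compare more entries from the end of the stack.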
