FeedParser Processes RSS Documents

Before introducing the feedparser package, let's understand RSS first.

RSS (Really Simple Syndication) is a simple way for sites to share content with other sites.
Instead of spending a lot of time surfing news sites and downloading articles yourself, you can install a small program, and this technology called RSS will collect and organize customized news and deliver it straight to your computer, in the format, at the time, and in the way you want. News websites and bloggers have long enjoyed the benefits of RSS feeds, which also make it easier for readers to follow their updates. With its convenient, fast way of working, RSS has greatly improved efficiency across the web, though it has also contributed to the rapid duplication of information.

Almost all blogs support RSS subscriptions. An RSS feed is a simple XML document that contains the feed's metadata and all of its article entries. With Universal Feed Parser (the feedparser package), these online RSS feeds can be processed easily, and the titles, links, and article entries in RSS or Atom feeds can be extracted with little effort. The following explains how to use feedparser.

http://rss.huanqiu.com/

feedparser installation


feedparser is a Python package. It can be downloaded from the Google Code project page below, which also links to a short introductory document:

Project Home: https://code.google.com/p/feedparser/
Project doc: http://packages.python.org/feedparser/introduction.html#parsing-a-feed-from-a-remote-url

After downloading, open the package folder in the command line and run python setup.py install.

Note that this step can fail with an error, most likely because the setuptools module is not installed:

python setup.py install reports an error. In that case, install the setuptools module first, then rerun the install.
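
As a side note, on newer Python setups the package can usually be installed directly from PyPI instead, which avoids the manual download and the setuptools issue entirely: pip install feedparser.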

Python code

import feedparser

# Two example feeds; the second assignment overrides the first,
# so only the xinhuanet feed is actually parsed.
# url = 'http://rss.huanqiu.com/mil/world.xml'
url = 'http://www.xinhuanet.com/politics/news_politics.xml'
feedtext = feedparser.parse(url)

# Feed-level metadata (title, link, description, ...)
feed = feedtext['feed']
print(feed)

title = feedtext['feed']['title']
print(title)

print('\nnumber of entries:', len(feedtext.entries), '\n')

# The first article entry in the feed
entry = feedtext['entries'][0]
print(entry)

author = feedtext.entries[0].author
print(author)

# Some feeds also carry a comments link
# com = feedtext.entries[0].comments
# print(com)

summ = feedtext.entries[0].summary  # the entry's summary (the content)
print(summ)
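
Building on the script above, here is a minimal sketch (using the same xinhuanet feed, though any RSS/Atom URL works) that loops over every entry and prints its title and link:

import feedparser

d = feedparser.parse('http://www.xinhuanet.com/politics/news_politics.xml')
for entry in d.entries:
    # Each entry exposes common fields such as title and link
    print(entry.title, '->', entry.link)

Note that feedparser lets you use attribute access (d.entries) and dictionary access (d['entries']) interchangeably.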

If you want to learn more about feedparser, including the many other elements it exposes, refer to the documentation:

http://pythonhosted.org/feedparser/

In summary, feedparser mainly fetches the source of a given feed URL, converts it into the structure it needs, and lets you analyze that structure to process it and extract the required data or links.

So if you want to grab a specific item's information, you need to analyze the content of the entries and extract the item's link; the final content is then fetched through that link. In other words, to get the full content you must take the extraction one step further: use feedparser to get the link first, then combine it with other crawler code to retrieve the content through that link.
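
As a minimal sketch of that last step (assuming the xinhuanet feed above is reachable and the linked article page is UTF-8 encoded), the standard library's urllib can fetch the page behind an entry's link:

import urllib.request
import feedparser

d = feedparser.parse('http://www.xinhuanet.com/politics/news_politics.xml')
link = d.entries[0].link  # URL of the full article behind the first entry

# Download the raw HTML of the article page
with urllib.request.urlopen(link) as resp:
    html = resp.read().decode('utf-8', errors='replace')

print(html[:200])  # show the start of the page source

From here, an HTML parser such as BeautifulSoup could extract the article body, but that is a separate crawling step beyond feedparser itself.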
