Using aiohttp to make asynchronous crawlers

Introduction
asyncio enables single-threaded concurrent I/O and is a commonly used asynchronous-processing module in Python. A detailed introduction to asyncio will follow in a later article. This article covers an HTTP framework built on asyncio, aiohttp, which lets us issue HTTP requests asynchronously and thereby greatly improve a program's efficiency.
This article introduces a simple application of aiohttp in web crawlers.
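To get a feel for what asyncio's single-threaded concurrency means before we apply it to crawling, here is a minimal sketch (not from the original project; `asyncio.run` requires Python 3.7+) in which three one-second waits finish in roughly one second of wall-clock time rather than three:

```python
import asyncio
import time

async def task(name: str, delay: float) -> str:
    # await yields control to the event loop, so other coroutines can run
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.time()
    # gather schedules all three coroutines concurrently on one thread
    results = await asyncio.gather(task("a", 1), task("b", 1), task("c", 1))
    return results, time.time() - start

results, elapsed = asyncio.run(main())
print(results, "in %.1f seconds" % elapsed)
```

The same mechanism is what makes the aiohttp crawler below fast: while one request is waiting on the network, the event loop runs the others.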
In the original project, we used Python’s crawler framework scrapy to crawl the book information of Dangdang.com’s best-selling books. In this article, the author will make crawlers in two ways, compare the efficiency of synchronous crawlers and asynchronous crawlers (implemented by aiohttp), and show the advantages of aiohttp in crawlers.
Synchronous crawler
First, let’s take a look at the crawler implemented by the general method, that is, the synchronous method. The complete Python code is as follows:

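The original code was published as a screenshot that is not reproduced here. As a stand-in, here is a minimal synchronous sketch using only the standard library; the Dangdang bestseller URL template and the title regex are illustrative assumptions and must be checked against the live page's markup before use:

```python
import re
from urllib.request import Request, urlopen

# Both the URL template and the regex are assumptions for illustration;
# inspect the real listing page before relying on them.
BASE_URL = ("http://bang.dangdang.com/books/bestsellers/"
            "01.00.00.00.00.00-recent7-0-0-1-{}")
TITLE_RE = re.compile(r'class="name"><a[^>]*title="([^"]+)"')

def parse_page(html: str) -> list:
    """Extract book titles from one bestseller listing page."""
    return TITLE_RE.findall(html)

def crawl(pages: int = 25) -> list:
    """Fetch the listing pages one after another (synchronously)."""
    titles = []
    for page in range(1, pages + 1):
        req = Request(BASE_URL.format(page),
                      headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=10) as resp:
            titles.extend(parse_page(resp.read().decode("gbk", "ignore")))
    return titles

# Usage (network required):
#   titles = crawl(pages=25)
#   print(len(titles))
```

Each request here blocks until its response arrives, so the total running time is roughly the sum of all the individual request times.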

The output results are as follows:


The program ran for 23.5 seconds and crawled information on 500 books, which is reasonably efficient. The scraped data can then be found as files in the output directory.
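The article does not show how those files are produced, so their exact layout is unknown; one plausible approach (a hypothetical helper writing a single UTF-8 text file of titles) might look like this:

```python
import pathlib

def save_books(titles, out_dir="dangdang_books"):
    # The layout is an assumption: the original article's file format
    # is only shown as a screenshot, not as code.
    path = pathlib.Path(out_dir)
    path.mkdir(exist_ok=True)
    out_file = path / "bestsellers.txt"
    out_file.write_text("\n".join(titles), encoding="utf-8")
    return out_file
```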


Asynchronous crawler
Next, let's look at the efficiency of an asynchronous crawler built with aiohttp. The complete source code is as follows:

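The asynchronous code was likewise published as a screenshot. Below is a minimal aiohttp sketch of the same idea; as before, the URL template and regex are illustrative assumptions, and the concurrency here is unbounded (a real crawler should throttle, e.g. with an `asyncio.Semaphore`):

```python
import asyncio
import re

import aiohttp  # third-party: pip install aiohttp

# URL template and regex are assumptions, as in the synchronous sketch
BASE_URL = ("http://bang.dangdang.com/books/bestsellers/"
            "01.00.00.00.00.00-recent7-0-0-1-{}")
TITLE_RE = re.compile(r'class="name"><a[^>]*title="([^"]+)"')

def parse_page(html: str) -> list:
    """Extract book titles from one listing page (same as the sync version)."""
    return TITLE_RE.findall(html)

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url) as resp:
        return await resp.text(encoding="gbk", errors="ignore")

async def crawl(pages: int = 25) -> list:
    headers = {"User-Agent": "Mozilla/5.0"}
    async with aiohttp.ClientSession(headers=headers) as session:
        # All page requests are issued concurrently and awaited together
        pages_html = await asyncio.gather(
            *(fetch(session, BASE_URL.format(p)) for p in range(1, pages + 1)))
    titles = []
    for html in pages_html:
        titles.extend(parse_page(html))
    return titles

# Usage (network required):
#   titles = asyncio.run(crawl())
```

Because all requests are in flight at once, the total time is close to that of the single slowest request rather than the sum of them all.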

As we can see, this crawler's overall structure and approach are basically the same as the synchronous version. The differences are that HTTP requests are made with the aiohttp module, the page-fetching and parsing functions become coroutines, and asyncio is used to run them concurrently, which greatly improves the crawler's efficiency. Its running results are as follows:


Only 2.4 seconds, an impressive speedup! The files in the output directory contain the same book information as before.


Summary
To sum up, the difference in efficiency between crawlers built synchronously and asynchronously is dramatic. When building crawlers in practice, it is therefore well worth considering asynchronous crawlers and making more use of asynchronous modules such as asyncio and aiohttp. Note that aiohttp only supports Python 3.5.3 and later.

Of course, this article is only an example of an asynchronous crawler and does not go deeply into the story behind asynchrony; asynchronous thinking has a wide range of applications both in real life and in building websites. This concludes the article. Comments and discussion are welcome.
