Simon Technology Blog

Tag: Crawl

The concept and role of crawlers

Contents

01 The concept of crawlers

02 Crawler process

03 HTTP protocol

04 WebSocket

Crawler concept

The more formal name for crawling is data collection, which

September 29, 2021 | By Simo | Category: Web Crawler | Tags: Concept, Crawl, role

Crawler bug handling: the website's less-than signs are not converted into entities

1. Bug found: while crawling the trial information published on the chinadrugtrials detail pages, the program was found to break in some places, as follows:

After investigation,

September 29, 2021 | By Simo | Category: Web Crawler | Tags: BUG, Crawl, entity, less than, no, Number, transformation, treatment, Website
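
The excerpt above stops before the fix is described, so the following is only a minimal sketch of one common workaround, assuming the breakage comes from a literal "<" in page text (for example "Age < 18") that the parser mistakes for the start of a tag. The function name `escape_bare_lt` and the sample string are hypothetical, not taken from the post.

```python
import re

# Assumption: a "<" that is not followed by a letter, "/", "!" or "?"
# cannot start a tag, closing tag, comment or processing instruction,
# so it is treated as literal text.
BARE_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_bare_lt(html: str) -> str:
    """Replace stray "<" characters with the &lt; entity before parsing."""
    return BARE_LT.sub("&lt;", html)


sample = "<td>Age < 18 years, dose <5 mg</td>"
print(escape_bare_lt(sample))
# prints: <td>Age &lt; 18 years, dose &lt;5 mg</td>
```

This heuristic misses a stray "<" that is directly followed by a letter (such as "x <y"), so it is a pre-processing aid rather than a complete repair.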

Crawler framework Scrapy (2)

Part one: Introduction to the core components of Scrapy

1. Engine (Scrapy Engine): controls the data flow through the whole system and triggers events (the core component)

2. Scheduler: Put the

September 29, 2021 | By Simo | Category: Web Crawler | Tags: Crawl, frame, SCRAPY, two
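
To make the division of labour between the engine and the scheduler concrete, here is a toy model in plain Python. It does not use Scrapy's API; the classes `Scheduler` and `Engine`, the `PAGES` data, and the stand-in downloader and spider are all hypothetical and only illustrate how an engine can drive requests from a queue through download, parsing and item handling.

```python
from collections import deque


class Scheduler:
    """Queues URLs and skips ones it has already seen."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def enqueue(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next_url(self):
        return self._queue.popleft() if self._queue else None


class Engine:
    """Drives the loop: scheduler -> downloader -> spider -> pipeline."""

    def __init__(self, scheduler, downloader, spider, pipeline):
        self.scheduler = scheduler
        self.downloader = downloader
        self.spider = spider
        self.pipeline = pipeline

    def run(self, start_urls):
        for url in start_urls:
            self.scheduler.enqueue(url)
        while (url := self.scheduler.next_url()) is not None:
            response = self.downloader(url)
            for result in self.spider(url, response):
                if isinstance(result, str):  # a follow-up URL
                    self.scheduler.enqueue(result)
                else:  # a scraped item
                    self.pipeline(result)


# Stand-ins for real pages, the downloader and the spider (all hypothetical).
PAGES = {"page1": ["page2", "page1"], "page2": []}


def fake_downloader(url):
    return PAGES[url]


def fake_spider(url, links):
    yield {"url": url, "link_count": len(links)}
    yield from links


items = []
Engine(Scheduler(), fake_downloader, fake_spider, items.append).run(["page1"])
print(items)
# prints: [{'url': 'page1', 'link_count': 2}, {'url': 'page2', 'link_count': 0}]
```

In this sketch the scheduler's only jobs are ordering and de-duplication, which is why "page1" is fetched once even though it links back to itself.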

Crawler: general code framework

1. Baidu search keyword submission

The Baidu search URL has the format: http://www.baidu.com/s?wd=keyword

import requests
keyword = "Python"
try:
    kv = {'wd': keyword}
    url = "http:/

September 29, 2021 | By Simo | Category: Web Crawler | Tags: code, Crawl, frame, General
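
Since the excerpt's code is cut off, the following is a self-contained sketch of keyword submission built around the URL format quoted above. The function names `build_search_url` and `fetch_search_page`, the 10-second timeout and the encoding handling are assumptions made for illustration, not the post's own code.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.baidu.com/s"  # search path quoted in the post


def build_search_url(keyword: str) -> str:
    """Return the search URL with the keyword in the wd parameter."""
    return f"{BASE_URL}?{urlencode({'wd': keyword})}"


def fetch_search_page(keyword: str, timeout: float = 10.0) -> str:
    """Submit the keyword and return the response body as text."""
    # Third-party package; imported here so the URL helper works without it.
    import requests

    response = requests.get(BASE_URL, params={"wd": keyword}, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx status codes
    response.encoding = response.apparent_encoding  # guess encoding from the body
    return response.text


print(build_search_url("Python"))
# prints: http://www.baidu.com/s?wd=Python
```

Wrapping a call such as `fetch_search_page("Python")` in `try`/`except` keeps a network failure or an error status from stopping the whole crawl, which matches the `try:` structure visible in the excerpt.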

Splitting and combining links in crawlers

from urllib.parse import urlencode, quote
from oauthlib.common import urldecode

def decodeUrl(url):
    """
    :param url: Pass in a link to be decoded
    :return: output tuple url, a dictionary containing par

September 29, 2021 | By Simo | Category: Web Crawler | Tags: Crawl, link, Merge, middle, Split
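
As a rough illustration of splitting a link into its base and parameters and joining them again, the sketch below uses only the standard library: `parse_qsl` from `urllib.parse` stands in for the `urldecode` helper from `oauthlib` imported in the excerpt. The names `decode_url` and `encode_url` and the sample link are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def decode_url(url):
    """Split a link into (base URL without query, dict of query parameters)."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, params


def encode_url(base, params):
    """Join a base URL and a parameter dict back into one link."""
    query = urlencode(params)
    return f"{base}?{query}" if query else base


base, params = decode_url("http://www.baidu.com/s?wd=Python&pn=10")
print(base, params)
# prints: http://www.baidu.com/s {'wd': 'Python', 'pn': '10'}
print(encode_url(base, {**params, "pn": "20"}))
# prints: http://www.baidu.com/s?wd=Python&pn=20
```

Two simplifications are worth noting: the fragment is dropped, and a parameter that appears more than once keeps only its last value because the result is a plain dict.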

Tutorial: automatically generating request headers for a crawler

Recap:

The request header is one way to disguise who is making the request. Because the request header contains a lot of content;

September 28, 2021 | By Simo | Category: Web Crawler | Tags: automatic, Crawl, generated, Head, Request, tutorial
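
The excerpt does not show how the headers are generated. One plausible reading, offered here only as an assumption, is that raw "Name: value" header text copied from a browser's developer tools is converted into a dictionary that an HTTP client can send. The function `headers_from_text` and the sample header text are hypothetical.

```python
def headers_from_text(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict, skipping unusable lines."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, HTTP/2 pseudo-headers (":authority" etc.)
        # and anything without a name/value separator.
        if not line or line.startswith(":") or ":" not in line:
            continue
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip()] = value.strip()
    return headers


RAW = """
Accept: text/html
Accept-Language: en-US,en;q=0.9
User-Agent: Mozilla/5.0 (X11; Linux x86_64)
"""
print(headers_from_text(RAW))
```

The resulting dictionary can be passed to a client library, for example as the `headers` argument of `requests.get`.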

Importing multi-site RSS news text into a Discuz forum with automatic posting (1)

The company’s R&D department cannot access the Internet, but the company hopes that its R&D colleagues can follow the news, stay aware of hot topics in science and technology, and keep up with the t

September 27, 2021 | By Simo | Category: Rss | Tags: Auto, Crawl, Discuz, Forum, import, Multi, News, one, POST, realization, RSS, site, Text