Yi Niu Cloud crawler proxy: configuring independent IP switching

1. Independent IP switching
This mode suits crawlers that must control exactly when the IP changes, such as tasks that require a login or rely on cached cookies. The crawler sets the HTTP header Proxy-Tunnel: <random number>. Requests that carry the same number reach the target website through the same proxy IP.

For example

Suppose you need to log in and then fetch data, and both requests must come from one IP. Set the same Proxy-Tunnel value on both requests, for example Proxy-Tunnel: 12345. The group then uses the same proxy IP for as long as that proxy IP remains valid.
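The grouping idea can be sketched in Python with the standard library alone. The helper names `new_tunnel_id` and `build_request` and the plain-HTTP placeholder URLs are hypothetical; only the `Proxy-Tunnel` header comes from the text above. Proxy address and credentials are deliberately left out.

```python
import random
import urllib.request


def new_tunnel_id() -> str:
    """Return a random number (as a string) to use as the Proxy-Tunnel value."""
    return str(random.randint(1, 10**9))


def build_request(url: str, tunnel_id: str) -> urllib.request.Request:
    """Build a request tagged with the given Proxy-Tunnel value."""
    return urllib.request.Request(url, headers={"Proxy-Tunnel": tunnel_id})


# One group: log in, then fetch data, both tagged with the same number.
tunnel_id = new_tunnel_id()
login_req = build_request("http://example.com/login", tunnel_id)
data_req = build_request("http://example.com/data", tunnel_id)
```

Both request objects carry the same number, so they belong to one group. Picking a new number for the next group is what moves it to a different tunnel.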

Note

Different request groups can each set a different Proxy-Tunnel number, so several groups can crawl data concurrently.
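A rough sketch of running several groups at once, assuming each group is a list of URLs handled by one worker. The function `crawl_group` is a hypothetical stand-in: it only returns the headers each request would carry rather than performing any network I/O.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def crawl_group(urls, tunnel_id):
    """Return the (url, headers) pairs one group would send.

    A real crawler would issue the HTTP requests here; this sketch only
    shows that every request in a group carries the same Proxy-Tunnel value.
    """
    headers = {"Proxy-Tunnel": str(tunnel_id)}
    return [(url, dict(headers)) for url in urls]


groups = [
    ["http://example.com/login", "http://example.com/data"],
    ["http://example.org/login", "http://example.org/data"],
]
# random.sample draws without replacement, so each group gets a distinct number.
tunnel_ids = random.sample(range(1, 10**9), k=len(groups))

with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    results = list(pool.map(crawl_group, groups, tunnel_ids))
```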

Using the same IP to access an HTTPS target website

There are two options:
(1) Send Connection: keep-alive and Proxy-Connection: keep-alive when accessing the target website. The proxy then ensures that all requests in one session reach the target website through one IP.
(2) Set the same Proxy-Tunnel value. Some libraries wrap HTTP at a higher level, so make sure the header is actually sent to the proxy.
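For HTTPS, a header reaches the proxy only if it is part of the CONNECT request that opens the tunnel. As one possible illustration, Python's `http.client` accepts such headers through `set_tunnel()`. The proxy address below and the helper names are hypothetical, and authentication is omitted.

```python
import http.client


def connect_headers(tunnel_id):
    """Headers meant for the proxy, sent with the CONNECT request."""
    return {
        "Proxy-Tunnel": str(tunnel_id),
        "Proxy-Connection": "keep-alive",
    }


def open_https_via_proxy(proxy_host, proxy_port, target_host, tunnel_id):
    """Prepare an HTTPS connection that tunnels through the proxy.

    Nothing is sent until the first request() call on the returned object.
    """
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=10)
    # Headers passed to set_tunnel() go to the proxy in the CONNECT request,
    # not to the target website.
    conn.set_tunnel(target_host, 443, headers=connect_headers(tunnel_id))
    return conn


# Hypothetical proxy address; replace it with the one from your account.
conn = open_https_via_proxy("proxy.example.com", 8080, "example.com", 12345)
```

Requests issued afterwards on the same `conn` object, such as `conn.request("GET", "/")`, travel through the tunnel that the CONNECT request established.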

2. Switching IP per TCP request (KeepAlive)
In this mode the IP switches automatically with every TCP request: the crawler proxy assigns a random proxy IP to each TCP request the crawler program sends. It suits cases where several requests in one session need to keep using the same IP for continuous access.

For example

Suppose you need to log in and then fetch data, and both requests must come from one IP. Just make sure the group of requests is sent within one TCP session; the group then uses the same proxy IP for as long as that proxy IP remains valid.
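One way to keep a group inside a single TCP session is to reuse one connection object for every request. The sketch below assumes a plain-HTTP target reached through the proxy; the function names and the proxy address are hypothetical, and authentication is omitted.

```python
import http.client


def open_proxy_connection(proxy_host, proxy_port):
    """Create a connection object for the proxy.

    No socket is opened until the first request is sent.
    """
    return http.client.HTTPConnection(proxy_host, proxy_port, timeout=10)


def fetch_on_one_connection(conn, urls):
    """Send each URL over the same connection and return the status codes."""
    statuses = []
    for url in urls:
        # When talking to an HTTP proxy, the request line carries the full URL.
        conn.request("GET", url, headers={
            "Connection": "keep-alive",
            "Proxy-Connection": "keep-alive",
        })
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        statuses.append(response.status)
    return statuses
```

A call might look like `fetch_on_one_connection(open_proxy_connection("proxy.example.com", 8080), [login_url, data_url])`, where the two URL variables stand for the login page and the data page.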

3. Username and password authentication
Authentication uses a username and password. The credentials are ultimately converted into a Proxy-Authorization header and sent along with the request. Tunnel authentication through the Authorization header is also supported. If the credentials are wrong, the system returns 401 Unauthorized or 407 Proxy Authentication Required.

For example

When you use the HTTP tunnel from code and your HTTP client does not accept proxy credentials as a username/password pair, add the Proxy-Authorization header to every HTTP request manually. Its value is Basic <credentials>, where <credentials> is the username and password joined with a colon and then Base64-encoded. Once this is set correctly, every request carries a header in the following format: Proxy-Authorization: Basic MTZZVU4xMjM6MTIzNDMyMw==
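The header value can be built by hand as in the sketch below. The function name `proxy_authorization` is hypothetical. The sample username and password are simply what the header value shown above decodes to, so they serve as a check and are not real credentials.

```python
import base64


def proxy_authorization(username: str, password: str) -> str:
    """Return the value for a Proxy-Authorization header (Basic scheme)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


headers = {"Proxy-Authorization": proxy_authorization("16YUN123", "1234323")}
```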

Note

Proxy-Authorization is the recommended header for username and password authentication. If Authorization is used instead, that header is sent on to the target website with the request. When visiting an HTTPS website, use the proxy authentication mechanism built into your library: a manually set Proxy-Authorization header is forwarded by the proxy directly to the target website, which breaks anonymity.

Domain name resolution failed

The TTL of the crawler proxy domain name is relatively short (multiple machines with several hot standbys). If the crawler proxy domain name fails to resolve, use 114.114.114.114 or your ISP's DNS server for resolution.
