Basics
headers: a dictionary of request headers to pass in with each request. The one below is a generic set of headers; site-specific headers should be obtained by capturing the actual requests the browser sends.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
}
Simulated login
This example simulates a login to Visual China (vcg.com).
First, submit a wrong account and password on Visual China to see the form data the site sends; it can be found with the browser's inspect (developer tools) function. The captured form data, stored in a variable named date, looks like this:
date = {'username': '*****', 'password': '*******', 'captcha': '', 'lgt': '0', 'token': ''}
Use the post() function to pass in the login URL, the real account and password, and the headers.
Write a function to test whether cookies are returned. If nothing is returned, capture the packets again to find the values that are actually sent and extract them.
For details, see https://blog.csdn.net/churximi/article/details/50917322, which is where I learned this.
import requests

def login():
    s = requests.session()
    loginURL = "https://www.vcg.com/ajax/login/submit"  # URL the POST is sent to
    response = s.post(loginURL, data=date, headers=headers)  # send the login data; the response includes the cookies
    cookies = response.cookies
    return cookies
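The check mentioned above, whether the login response actually returned any cookies, could be sketched roughly as follows. The helper name has_cookies and the sample cookie are hypothetical, not from the original post; the example only inspects hand-built cookie jars and sends no request.

```python
from requests.cookies import RequestsCookieJar


def has_cookies(cookies):
    """Return True if the cookie jar holds at least one cookie."""
    # A RequestsCookieJar supports len(); an empty jar means the server
    # set no cookies, so the submitted login data was probably wrong.
    return cookies is not None and len(cookies) > 0


# Offline demonstration with hand-built jars (no request is sent).
empty_jar = RequestsCookieJar()
filled_jar = RequestsCookieJar()
filled_jar.set("sessionid", "example-value")  # made-up cookie for illustration
```

With the login() function from this post, the check would be called as has_cookies(login()).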
Get webpage
Use the get() function to fetch a page, passing in the url (or each of several urls), the headers, and a timeout. The resulting html value is the source of the webpage.
table holds the matching tag found in html. If no tag matches, find() returns None, and calling find_all() on that result raises an error.
import requests
from bs4 import BeautifulSoup

html = requests.get('https://18moe.com/category/game', headers=headers, timeout=5).text
table = BeautifulSoup(html, 'lxml').find('select', {'class': 'poi-pager__item_middle_select poi-form__control'})
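Because find() returns None when nothing matches, it helps to check the result before calling find_all() on it. A minimal sketch, assuming an inline HTML snippet in place of the real page and the built-in 'html.parser' in place of 'lxml' so that it runs without extra dependencies; the helper name option_values and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup


def option_values(html):
    """Return the value of every <option> inside the pager <select>, or []."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("select", {"class": "poi-pager__item_middle_select poi-form__control"})
    if table is None:
        # No matching tag: return an empty list instead of failing on find_all().
        return []
    return [opt.get("value") for opt in table.find_all("option")]


# Made-up markup standing in for the downloaded page.
sample = (
    '<select class="poi-pager__item_middle_select poi-form__control">'
    '<option value="1">1</option><option value="2">2</option>'
    "</select>"
)
```

Calling option_values() on a page without the pager simply yields an empty list, so the caller can decide how to handle a missing element.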
Use proxy
Not used yet; to be added later.
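As a placeholder until then, here is a minimal sketch of how requests takes proxies through its proxies setting. The proxy address is a made-up example, not a working server, and the per-request call is left commented out because it would need a real proxy.

```python
import requests

# Placeholder proxy address (an assumption for illustration only).
proxies = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}

# Option 1: attach the proxies to a session so every request made
# through that session is routed via the proxy.
s = requests.Session()
s.proxies.update(proxies)

# Option 2: pass them per request (commented out: it needs a real proxy).
# html = requests.get('https://18moe.com/category/game', proxies=proxies, timeout=5).text
```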