PyCharm - getting started, PyCharm, crawler

Today I tried using PyCharm with BeautifulSoup to test a simple crawler. As I understand it, there are two main cases: parsing an HTML file I wrote myself, and parsing a live web page such as Baidu. The first case, reading a local HTML file, goes like this (straight to the code):

(Mainly based on this blog post: https://blog.csdn.net/Ka_Ka314/article/details/80999803)


```python
from bs4 import BeautifulSoup

# Read the local HTML file as bytes; BeautifulSoup works out the encoding
with open('aa.html', 'rb') as file:
    html = file.read()

bs = BeautifulSoup(html, "html.parser")

# Print the document with indentation
print(bs.prettify())

# Get the whole title tag
print(bs.title)

# Get the name of the title tag
print(bs.title.name)

# Get the text content of the title tag
print(bs.title.string)

# Get the whole head tag
print(bs.head)

# Get the first div tag and everything inside it
print(bs.div)

# Get the value of the id attribute of the first div tag
print(bs.div["id"])

# Get the first a tag
print(bs.a)

# Get all a tags
print(bs.find_all("a"))

# Get the tag with id="u1"
print(bs.find(id="u1"))

# Loop over all a tags and print each href value
for item in bs.find_all("a"):
    print(item.get("href"))

# Loop over all a tags and print each tag's text
for item in bs.find_all("a"):
    print(item.get_text())
```
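The script reads a file named aa.html, which is not shown in this post. If you do not have one yet, the sketch below writes a small stand-in page. Everything in it is an assumption made for illustration: the page content, the example.com links, and the helper name `write_sample_page` are all hypothetical. Only the tag names and `id="u1"` are chosen to match what the script looks up.

```python
from pathlib import Path

# Hypothetical sample page: the tag names and id="u1" mirror what the
# script looks up; the links and text are made up for illustration.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Test page</title>
</head>
<body>
    <div id="u1">
        <a href="https://example.com/news">News</a>
        <a href="https://example.com/maps">Maps</a>
    </div>
</body>
</html>
"""


def write_sample_page(path="aa.html"):
    """Write the sample page to `path` and return it as a Path object."""
    target = Path(path)
    target.write_text(SAMPLE_HTML, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_sample_page()}")
```

Run this once in the same folder as the crawler script. With this particular page, `bs.title.string` should give the title text, `bs.div["id"]` should give `u1`, and the two loops should each print one line per link.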
