Extracting data with XPath, re, and CSS

XPATH

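(1) / Extract layer by layer: each / steps down one level from the current node

(2) text() Extract the text inside the tag

(3) //tagname Extract every tag with that name, at any depth

(4) //tagname[n] (n >= 1) Pick the n-th of several sibling nodes that share a tag name; counting starts at 1.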

Example: a table row (tr with class "h") whose td cells are, in order:

Job title | Job category | Number of people | Location | Publication time


xpath('//tr[@class="h"]/td[1]/text()')  # Job title
xpath('//tr[@class="h"]/td[2]/text()')  # Job category
xpath('//tr[@class="h"]/td[3]/text()')  # Number of people
xpath('//tr[@class="h"]/td[4]/text()')  # Location
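
Rules (1) to (4) can be tried on a small table. What follows is only a minimal sketch, and it makes two assumptions. First, it uses Python's standard-library xml.etree.ElementTree instead of the selector object the xpath(...) calls belong to. ElementTree supports only a subset of XPath and has no text() step, so the sketch reads the .text attribute instead. Second, the sample markup and the helper name cell_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup modelled on the job table.
SAMPLE = """
<table>
  <tr class="h">
    <td>Job title</td>
    <td>Job category</td>
    <td>Number of people</td>
    <td>Location</td>
    <td>Publication time</td>
  </tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)  # root is the <table> element


def cell_text(n):
    """Return the text of the n-th <td> (1-based) in the row with class "h"."""
    # .//tr searches at any depth, [@class='h'] filters by attribute,
    # td[n] picks the n-th <td> sibling. .text stands in for /text().
    cell = root.find(f".//tr[@class='h']/td[{n}]")
    return cell.text if cell is not None else None


# (1) layer by layer: one level per step, starting from the root
first_by_steps = root.find("./tr/td").text

# (3) every <td> at any depth
all_cells = [td.text for td in root.findall(".//td")]

print(first_by_steps)  # Job title
print(cell_text(4))    # Location
print(len(all_cells))  # 5
```

With a full XPath engine the same lookups are written as in the xpath(...) lines, ending in /text().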

(5) //tagname[@attribute='value'] Extract the tags whose attribute has the given value
//a[@class='noactive']
//a[@class='noactive' and @id='next']

(6) @attribute Get the value of the named attribute
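
Rules (5) and (6) can be sketched the same way. This again assumes the standard-library xml.etree.ElementTree, which differs from full XPath in two respects: it does not accept "and" inside a predicate, and a path cannot end in @attribute. The sketch therefore chains two predicates and reads the attribute with .get(). The pagination markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination links, for illustration only.
SAMPLE = """
<div>
  <a class="noactive" id="prev" href="/page/1">Previous</a>
  <a class="active" href="/page/2">2</a>
  <a class="noactive" id="next" href="/page/3">Next</a>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# (5) //a[@class='noactive']: every <a> whose class is 'noactive'
noactive = root.findall(".//a[@class='noactive']")

# ElementTree has no 'and'; two chained predicates select the same nodes
# as //a[@class='noactive' and @id='next'] would in full XPath.
next_link = root.find(".//a[@class='noactive'][@id='next']")

# (6) .get() stands in for a trailing /@href step.
next_href = next_link.get("href")

print([a.get("id") for a in noactive])  # ['prev', 'next']
print(next_href)                        # /page/3
```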

============================================================

RE

re.compile(pattern, flags=0)
flags: a bitmask of the options listed below; combine several of them with the bitwise OR operator |
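
As a small illustration of the flags argument, the sketch below compiles one pattern with two flags joined by |. The sample text and variable names are made up.

```python
import re

# flags is a bitmask, so several options are joined with |.
pattern = re.compile(r"^job title: (.+)$", flags=re.I | re.M)

text = "id: 7\nJob Title: Engineer\nlocation: Shenzhen"

match = pattern.search(text)
title = match.group(1) if match else None

# The same pattern without flags finds nothing: ^ only matches at the
# start of the string and the letter case differs.
no_flags = re.compile(r"^job title: (.+)$").search(text)

print(title)     # Engineer
print(no_flags)  # None
```

A compiled pattern can be reused for many searches without passing the flags again.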

re.I (re.IGNORECASE)
Make matching case-insensitive.

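re.L (re.LOCALE)
Locale-aware matching: \w, \W, \b, \B and case-insensitive matching depend on the current locale. In Python 3 this flag is only allowed with bytes patterns.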

re.M (re.MULTILINE)
Multi-line matching: ^ and $ also match at the start and end of every line, not only at the start and end of the whole string.
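
A short sketch of the difference, using a made-up three-line string:

```python
import re

text = "alpha 1\nbeta 2\ngamma 3"

# Without re.M, ^ matches only at the very start of the string.
first_only = re.findall(r"^\w+", text)

# With re.M, ^ also matches right after every newline.
each_line = re.findall(r"^\w+", text, flags=re.M)

# Likewise, $ matches before every newline as well as at the very end.
line_ends = re.findall(r"\d$", text, flags=re.M)

print(first_only)  # ['alpha']
print(each_line)   # ['alpha', 'beta', 'gamma']
print(line_ends)   # ['1', '2', '3']
```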

re.S (re.DOTALL)
Make . match any character, including a newline.
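
This matters when the text to extract spans several lines, as table cells in scraped HTML often do. A minimal sketch with a made-up fragment:

```python
import re

html = "<td>Job\ntitle</td>"

# By default . stops at a newline, so the pattern cannot span both lines.
without_s = re.search(r"<td>(.*)</td>", html)

# With re.S the dot also consumes the newline.
with_s = re.search(r"<td>(.*)</td>", html, flags=re.S)
cell = with_s.group(1) if with_s else None

print(without_s)   # None
print(repr(cell))  # 'Job\ntitle'
```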

re.U (re.UNICODE)
Match according to the Unicode character database. This flag affects \w, \W, \b, \B, \d, \D, \s and \S. In Python 3 it is already the default for str patterns.
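
The sketch below assumes Python 3, where str patterns are Unicode-aware even without re.U. To make the effect visible it compares against re.A (re.ASCII), the opposite flag, which is not part of the list here.

```python
import re

# "café 42", written with an escape so the source stays plain ASCII.
text = "caf\u00e9 42"

default_words = re.findall(r"\w+", text)            # Unicode-aware by default
same_with_u = re.findall(r"\w+", text, flags=re.U)  # accepted, but redundant
ascii_words = re.findall(r"\w+", text, flags=re.A)  # \w limited to [a-zA-Z0-9_]

print(default_words == same_with_u)  # True
print(ascii_words)                   # ['caf', '42']
```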

re.X (re.VERBOSE)
Allows a more flexible layout: whitespace inside the pattern is ignored and # starts a comment, so regular expressions can be written in a form that is easier to read.
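
For example, a date pattern can be spread over several commented lines. The sample string is hypothetical.

```python
import re

# With re.X, whitespace in the pattern is ignored and # starts a comment.
date_pattern = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    flags=re.X,
)

match = date_pattern.search("Publication time: 2018-03-15")
parts = match.groups() if match else None

print(parts)  # ('2018', '03', '15')
```

To match a literal space or # under re.X, escape it with a backslash or put it inside a character class.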

============================================================
