Core Competence Summary
Responsibilities: multi-platform information capture, cleaning, and analysis
Requirements:
- Familiar with common open source crawler frameworks, such as Scrapy / pyspider
- Understand the principles of cookie-based login, and be familiar with common information extraction techniques such as regular expressions and XPath
- Familiar with common anti-crawler techniques, and able to work around them
- Experience with distributed crawler architectures
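The extraction techniques these requirements keep coming back to (regular expressions and XPath) can be sketched with the standard library alone; the sample HTML, URLs, and selectors below are made-up illustrations, and real crawlers typically use lxml for full XPath 1.0 support:

```python
import re
import xml.etree.ElementTree as ET

# Illustrative, well-formed sample page (real pages need an HTML parser).
HTML = """<html><body>
<div class='item'><a href="/post/1">First post</a></div>
<div class='item'><a href="/post/2">Second post</a></div>
</body></html>"""

def extract_links_regex(html):
    # Regex extraction: quick for known, stable markup, but brittle
    # against attribute reordering or nested tags.
    return re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)

def extract_links_xpath(html):
    # ElementTree supports a limited XPath subset, enough for this sketch.
    root = ET.fromstring(html)
    return [(a.get("href"), a.text)
            for a in root.findall(".//div[@class='item']/a")]
```

Both functions return the same `(href, text)` pairs for this sample; the XPath route degrades more gracefully as the markup grows.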
ByteDance Python Crawler Engineer 22-40k
Responsibilities:
- Design and develop a distributed web crawler system; capture and analyze multi-platform information; monitor crawler progress in real time with alert feedback
- Extract, clean, and deduplicate web page information and app data
Requirements:
- Solid grounding in algorithms and data structures
- Familiar with crawler principles and common anti-crawler techniques
- Master the HTTP protocol; familiar with common data extraction technologies such as HTML, the DOM, and XPath
- Experience in large-scale data processing, data mining, or information extraction is a plus
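Where the algorithm/data-structure and large-scale-processing requirements meet in practice is URL de-duplication in the crawl frontier. A minimal Bloom filter sketch follows (sizes and hash count are illustrative; production crawlers often use Redis-backed sets or Scrapy's built-in duplicate filter instead):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for 'have we seen this URL?' checks.

    False positives are possible (a tiny fraction of new URLs get
    skipped); false negatives are not, so no URL is fetched twice.
    """

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

With 2^20 bits and 4 hashes, millions of URLs fit in 128 KiB per worker, which is why this structure shows up so often in distributed crawler designs.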
Xiaomi Data Crawler Engineer 20-40k
Responsibilities:
- Design and develop a distributed web crawler system for the capture and analysis of multi-platform information
- Extract page content for web search; handle search-side deduplication (simhash/minhash), clustering, anti-spam, page analysis, tagging, classifiers (Bayes/LR/SVM), data mining, etc., to improve the platform's crawling efficiency
- Participate in core crawler algorithm and strategy optimization; familiar with the collection system's scheduling strategy
- Monitor crawler progress in real time and feed back alerts
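The simhash deduplication mentioned above is worth a sketch: unlike an exact hash, simhash gives near-duplicate documents fingerprints that differ in only a few bits, so "is this page almost the same as one we indexed?" becomes a Hamming-distance check. This is a generic textbook version, not any particular company's implementation:

```python
import hashlib

def simhash(tokens, bits=64):
    # Each token votes on every bit of the fingerprint, weighted by
    # its own hash; the sign of each bit-tally becomes the final bit.
    votes = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i in range(bits):
        if votes[i] > 0:
            fp |= 1 << i
    return fp

def hamming(a, b):
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")
```

In a dedup pipeline, pages whose fingerprints sit within a small Hamming radius (commonly 3 for 64-bit fingerprints) are treated as near-duplicates.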
Requirements:
- Familiar with Linux; proficient in languages such as Python
- Master web crawling principles and techniques; understand the principles of cookie-based login; familiar with web page information extraction based on regular expressions, XPath, CSS selectors, etc.
- Familiar with the full crawler design and implementation process; experience in large-scale web information extraction and development; familiar with various anti-crawler techniques; experience with distributed crawler architecture
- Ability in link analysis (PageRank, TrustRank), feature extraction (page quality, authority, topic, linear/non-linear regression, LDA), etc. is preferred
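The link-analysis preference above centers on PageRank, which is small enough to sketch in full. This is the plain iterative power-method form over an adjacency dict; the graph below is a made-up example:

```python
def pagerank(links, damping=0.85, iters=50):
    """Iterative PageRank. links: {node: [nodes it links to]}.

    Nodes that appear only as targets are treated as dangling and
    spread their rank evenly, so total rank mass stays at 1.
    """
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Teleport term: every node gets a base share.
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling node: distribute its mass uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank
```

On a graph where A and B link to each other and C links only to A, A ends up ranked above C, which is the intuition the requirement is probing for.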
NetEase Crawler Engineer 12-24k
Requirements:
- Proficient in Python and computer networks; proficient in multi-threading; familiar with common crawler frameworks such as Scrapy
- Familiar with Linux operations, regular expressions, and common databases such as MySQL and MongoDB; understand various Web front-end technologies
- Able to solve problems such as account bans, IP bans, CAPTCHA recognition, and image recognition
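One standard answer to the IP-ban problem above is a rotating proxy pool: spread requests across many exit IPs and evict any proxy that gets blocked. A minimal round-robin sketch (the proxy names are placeholders, not real endpoints):

```python
import itertools

class ProxyRotator:
    """Round-robin proxy pool with eviction of banned proxies."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self._cycle = itertools.cycle(self.proxies)

    def next(self):
        # Hand out proxies in round-robin order.
        return next(self._cycle)

    def ban(self, proxy):
        # Drop a proxy that triggered a block and restart the rotation
        # over the survivors.
        self.proxies.remove(proxy)
        self._cycle = itertools.cycle(self.proxies)
```

In practice this sits inside a downloader middleware, combined with per-domain rate limits and retry-with-a-different-proxy on 403/429 responses.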
Scallop Crawler Engineer 8-16k
Responsibilities:
- Develop a distributed web crawler system to capture and analyze information from multiple platforms
- Extract and deduplicate web page information and app data
- Cooperate with algorithm roles to complete ETL-related tasks
Requirements:
- Master web crawling principles and techniques, understand the principles of cookie-based login, and be familiar with web information extraction based on regular expressions and XPath
- Solid coding ability and algorithm foundation, familiar with Python / Shell development under Linux
- Familiar with commonly used open source crawler frameworks, such as Scrapy / pyspider
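The extract-clean-deduplicate loop that nearly every posting above describes is the crawler side of ETL. A minimal sketch with invented sample records, using exact dedup on the cleaned form:

```python
import re

def clean(record):
    # Strip leftover tags from scraped text, then normalize whitespace.
    text = re.sub(r"<[^>]+>", "", record)
    return re.sub(r"\s+", " ", text).strip()

def dedupe(records):
    # Exact deduplication on the cleaned form, keeping first-seen order.
    seen, out = set(), []
    for r in records:
        c = clean(r)
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out
```

Exact dedup catches records that differ only in markup or spacing; near-duplicate content (the simhash case) needs the fingerprinting step on top.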