Crawler Engineer JD

Core Competence Summary

Responsibilities: multi-platform information capture, cleaning, and analysis

Requirements:

  • Familiar with common open source crawler frameworks, such as Scrapy / pyspider
  • Understand the principle of cookie-based login, and be familiar with common information extraction techniques, such as regular expressions and XPath
  • Familiar with common anti-crawler technologies, with some ability to counter them
  • Experience in distributed crawler architecture
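
As a rough illustration of the extraction techniques named in these requirements, the sketch below pulls fields out of a small page fragment, once with a regular expression and once with an XPath expression. It uses only the Python standard library. The `PAGE` fragment and the function names `salaries_by_regex` and `titles_by_xpath` are made up for this example, and `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so a real crawler would likely rely on a dedicated HTML parser instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div class="job">
      <h2>Crawler Engineer</h2>
      <span class="salary">10-20k</span>
    </div>
    <div class="job">
      <h2>Data Crawler Engineer</h2>
      <span class="salary">15-30k</span>
    </div>
  </body>
</html>
"""


def salaries_by_regex(html):
    """Extract salary ranges from the raw markup with a regular expression."""
    return re.findall(r'<span class="salary">(\d+-\d+k)</span>', html)


def titles_by_xpath(html):
    """Extract job titles with an XPath expression (ElementTree's XPath subset)."""
    root = ET.fromstring(html.strip())
    return [h2.text for h2 in root.findall(".//div[@class='job']/h2")]


if __name__ == "__main__":
    print(salaries_by_regex(PAGE))
    print(titles_by_xpath(PAGE))
```

The regular expression is tied to the exact markup and breaks easily when the page changes, while the XPath version follows the document structure. That trade-off is one reason postings like this ask for both.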

ByteDance Python Crawler Engineer 22-40k

Responsibilities:

  • Design and develop a distributed web crawler system, capture and analyze multi-platform information, and monitor crawler progress in real time with alerting and feedback
  • Extract, clean, and deduplicate web page information and App data

Requirements:

  • Solid grounding in algorithms and data structures
  • Familiar with crawler principles and common anti-crawler technologies
  • Master the HTTP protocol; familiar with common data extraction technologies such as HTML, DOM, and XPath
  • Candidates with experience in large-scale data processing, data mining, information extraction, etc. are preferred

Xiaomi Data Crawler Engineer 20-40k

Responsibilities:

  • Responsible for the design and development of a distributed web crawler system for capturing and analyzing multi-platform information
  • Responsible for page content extraction for web search, and for search-related work such as filtering and deduplication (simhash/minhash), clustering, anti-spam, page analysis, tagging, classifiers (Bayes/LR/SVM), and data mining, to improve the platform's crawling efficiency
  • Participate in optimizing the crawler's core algorithms and strategies; familiar with the scheduling strategy of the collection system
  • Real-time monitoring of crawler progress, with alerting and feedback
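
The simhash technique named in the list above can be pictured with a minimal sketch like the one below. It is an illustration under stated assumptions, not any company's actual pipeline: tokens are plain lowercase words, every token has weight 1, the 64-bit token hash is taken from the first 8 bytes of MD5, and the near-duplicate threshold of 3 differing bits is an arbitrary choice. The function names are made up for this example.

```python
import hashlib
import re

BITS = 64  # fingerprint width; matches the 8 bytes taken from each token hash


def simhash(text):
    """Compute a 64-bit SimHash fingerprint from word tokens (all weights = 1)."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, threshold=3):
    """Treat two pages as near-duplicates if their fingerprints differ in few bits."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= threshold


if __name__ == "__main__":
    a = "Design and develop a distributed web crawler system"
    b = "design and develop a distributed web crawler system"
    print(hamming_distance(simhash(a), simhash(b)))
```

Because similar token sets tend to produce fingerprints that differ in only a few bits, a crawler can compare compact integers instead of full page texts when filtering duplicates.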

Requirements:

  • Familiar with Linux and proficient in languages such as Python
  • Master the principles and techniques of web crawling, understand the principle of cookie-based login, and be familiar with web page information extraction technologies based on regular expressions, XPath, CSS, etc.
  • Familiar with the entire crawler design and implementation process, with experience in large-scale web information extraction and development; familiar with various anti-crawler technologies, with experience in distributed crawler architecture
  • Ability in link analysis (PageRank, TrustRank), feature extraction (page quality, authority, topic, linear/non-linear regression, LDA), etc. is preferred

NetEase Crawler Engineer 12-24k

Responsibilities:

  • Responsible for designing and developing a general-purpose crawler system that extracts and analyzes page content from various platforms
  • Study various websites and link formats, and identify their characteristics and patterns
  • Solve technical problems, including countering anti-crawler measures and controlling crawl pressure, to improve the efficiency and quality of web crawling

Requirements:

  • Proficient in Python and computer networking, skilled in multi-threading, familiar with common crawler frameworks such as Scrapy
  • Familiar with Linux operations, regular expressions, and common databases such as MySQL and MongoDB; understand various Web front-end technologies
  • Able to handle account bans, IP bans, CAPTCHA recognition, image recognition, and similar problems

Scallop Crawler Engineer 8-16k

Responsibilities:

  • Develop a distributed web crawler system to capture and analyze information from multiple platforms
  • Responsible for extracting and deduplicating web page information and App data
  • Work with the algorithm team to complete ETL-related tasks

Requirements:

  • Master the principles and technologies of web crawling, understand the principles of cookie-based login, and be familiar with web information extraction technologies based on regular expressions and XPath
  • Familiar with commonly used open source crawler frameworks, such as Scrapy / pyspider
  • Solid coding ability and algorithm foundation, familiar with Python / Shell development under Linux
