Are crawlers legal or illegal?

It is said that more than 50% of Internet traffic is generated by crawlers. Much of the popular data you see was in fact collected by crawlers, so it's fair to say that without crawlers there would be no Internet prosperity.

The day before yesterday I published an article, "Just because I wrote a crawler, more than 200 people in the company were arrested!", which tells the story of a programmer who was criminally investigated for writing a crawler. The article spread widely, and the hottest question in the comments was: are crawlers legal or illegal?

This topic touches on the daily work of many programmers, so it is worth discussing with you in detail.

01. Is technology innocent?

Many friends left me messages along the lines of: technology is innocent; technology itself is neither right nor wrong, only the people who use it can be right or wrong. If a company or a programmer knows that the technology is being used for illegal purposes, then that company or person will have to pay the price for it.

After the state promulgated the "Network Security Law of the People's Republic of China" this year, many businesses that used to operate in a gray area can no longer be run.

Haven't you noticed that the once-popular "social engineering database" websites have mostly disappeared? That's because the latest security law makes it clear that selling more than 50 pieces of personal information counts as a "serious circumstance" and must be prosecuted.

Many grassroots webmasters have voluntarily shut down their websites. Sites involving copyrighted content such as books, film and TV dramas, and online courses will also face increasingly strict scrutiny going forward. This is the current situation.

On December 20, 2014, the Renren Film and Television subtitle site announced on Weibo that it was officially shutting down, saying it might continue to provide translation services for licensed distributors, or might be transformed into a discussion community.

In June 2019, the "I Love Cracking" forum was shut down for rectification because of copyright issues…

As the Chinese economy continues to develop, more and more attention will be paid to intellectual property rights, and illegal crawlers are now an important target of crackdowns.

If you are a programmer walking along that gray edge, stop as soon as possible. Don't break the law for a small profit; the gain is simply not worth the loss.

Technology is innocent, but the cost of using it in the wrong place is very high.

02. Everyone in a crawler job is at risk

I searched for "crawler engineer" on Lagou and found 217 related job postings, with salaries ranging from 10k to 60k, which shows there is strong market demand for crawlers.


After the article went out the day before yesterday, many programmers left me messages:

  • Our leader arranged for me to crawl the company’s internal information. Is this a crime?
  • Is it a crime to crawl public information on the Internet?
  • I wrote a piece of code and uploaded it to GitHub. Am I breaking the law if someone uses it for something illegal?

Let me briefly answer these questions:

  • 1. Of course it's not a crime to crawl the company's internal information with the company's authorization. But for the company's own data, I don't understand why a crawler is used instead of an internal interface.

  • 2. It is not illegal to crawl public information on the Internet, but it becomes illegal if you launch a large number of crawlers and crash the other party's server; that falls into the category of a brute-force attack.

  • 3. If you write a piece of code, upload it to GitHub, and someone uses it to do something illegal, in most cases you are fine. But if the software you wrote itself involves intrusion, brute-force cracking, viruses, and the like, it's hard to say.

Some friends believe that individual programmers don't need to worry about this: in daily work, the initial design and final launch of a project have to be approved by the company's legal department, and all code has to be reviewed by other programmers and colleagues before it can be submitted.

What this friend said is reasonable. In theory, every company should have legal and risk-control teams up front, with product design and development behind them. But if a company is chasing profit and the boss simply silences those two departments, can the programmers refuse to do the work?

Furthermore, many companies don't actually have these two departments at all, or they exist in name only. So as a programmer you also need to look out for yourself, and you must not write any program that involves intrusion, because there is such a thing as a unit crime.

A unit crime refers to an act harmful to society that is committed for the benefit of the unit by a company, enterprise, public institution, state organ, or organization, is decided by the unit's decision-making body or the person in charge, and for which the law prescribes criminal responsibility.

In principle, China's Criminal Law applies a double-penalty system to unit crimes: when a unit commits a crime, a fine is imposed on the unit, and the person in charge who is directly responsible and other directly responsible personnel are also criminally punished.

03. What kind of crawler is illegal?

Crawlers cannot involve personal privacy!

If a crawler collects citizens' personal information such as names, ID numbers, contact details, addresses, account passwords, property status, or whereabouts, and that information is then used illegally, it definitely constitutes the illegal act of unlawfully obtaining citizens' personal information.

In other words, crawling information is not a problem in itself, but it must not touch personal privacy. If it does, and you profit from it through illegal channels, it is definitely an illegal act.

In addition, in the following three situations a crawler may break the law, or in serious cases even constitute a crime:

  • 1. The crawler evades the anti-crawling measures set by the website operator, or cracks the server's anti-crawling measures, and illegally obtains the relevant information.

  • 2. The crawler interferes with the normal operation of the visited website or system; if the consequences are serious, it violates the Criminal Law and constitutes the "crime of destroying computer information systems".

  • 3. The information collected by the crawler is citizens' personal information, which may constitute the illegal act of unlawfully obtaining citizens' personal information; if the circumstances are serious, it may constitute the "crime of infringing on citizens' personal information".

There are many paid courses on the Internet, such as GeekTime, GitChat, MOOC, and Knowledge Planet. Crawling this paid content through illegal techniques and selling it for profit is an illegal act.

I once came across a netizen who scraped the content of various Knowledge Planet groups and bundled it together for resale. He thought he had found a big business opportunity, not realizing that what he was doing was actually very dangerous; the risk clearly outweighed the benefit.

In the past couple of days I noticed that one of his official accounts had been banned, and he had switched to a backup account to keep operating; sooner or later that one will be banned too. It's really not worth it. The most pitiable are the users who bought his service: he promised "permanent access" in his advertising, and it certainly won't be permanent.

1. Comply with the Robots protocol

The Robots protocol, also known as robots.txt (always lowercase), is an ASCII-encoded text file stored in the root directory of a website. It usually tells the bots of web search engines (also known as web spiders) which content on the site should not be fetched by them and which content may be fetched.
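For illustration only (the paths below are made up, not taken from any real site), a robots.txt file typically looks like this:

```
# Hypothetical robots.txt served at https://example.com/robots.txt
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
```

Here every bot ("User-agent: *") is told to stay out of /admin/ and /private/ but may fetch everything else.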

The Robots protocol tells a crawler which information it may crawl and which it may not. If you crawl a website strictly in accordance with its Robots protocol, it generally won't cause much of a problem.
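As a minimal sketch of checking the protocol programmatically (the site URL and user-agent name below are placeholders, not from any real project), Python's standard-library urllib.robotparser can tell you whether a page may be fetched before you request it:

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyCrawlerBot"                 # hypothetical user-agent name
PAGE = "https://example.com/some/article"   # placeholder page to crawl

# Download and parse the site's robots.txt from the site root.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Only crawl the page if robots.txt allows this user-agent to fetch it.
if rp.can_fetch(USER_AGENT, PAGE):
    print("robots.txt allows crawling:", PAGE)
else:
    print("robots.txt forbids crawling:", PAGE)
```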

2. Don't paralyze the other party's server

However, complying with the Robots protocol alone does not mean there will be no problems; two other factors are involved. First, you must not crawl on such a large scale that the other party's server is paralyzed, which is tantamount to a network attack.

The "Data Security Management Measures (Draft for Comment)" issued by the Cyberspace Administration of China on May 28, 2019 proposes restricting the use of crawlers through administrative regulation:

Network operators that use automated means to access and collect website data must not hinder the normal operation of the website. If such behavior seriously affects the website's operation, for example if the traffic from automated access and collection exceeds one third of the website's average daily traffic, and the website asks for the automated collection to stop, it must be stopped.
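As a rough sketch of what "polite" crawling can look like in practice (the URLs, user-agent name, and two-second delay below are illustrative assumptions, not values taken from the regulation), the crawler can simply pause between requests so its load on the target server stays negligible:

```python
import time
import urllib.error
import urllib.request

CRAWL_DELAY_SECONDS = 2.0                   # illustrative delay between requests
URLS = [                                    # placeholder pages to fetch
    "https://example.com/page1",
    "https://example.com/page2",
]

def fetch(url: str) -> bytes:
    """Fetch one page with a clearly identified user-agent and a timeout."""
    req = urllib.request.Request(url, headers={"User-Agent": "MyCrawlerBot"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

for url in URLS:
    try:
        body = fetch(url)
        print(url, "->", len(body), "bytes")
    except urllib.error.URLError as exc:
        print(url, "-> failed:", exc)
    # Sleep between requests; never hammer the target server with request floods.
    time.sleep(CRAWL_DELAY_SECONDS)
```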

3. Don't make illegal profits

Maliciously using crawling technology to grab data, gain an unfair competitive advantage, or even seek illegal profits may violate the law. In practice, there are actually quite a few disputes arising from illegally using crawlers to capture data, and most of them are litigated as unfair-competition cases.

For example, if you scrape all the public information on Dianping, clone an identical website of your own, and make a lot of money from that website, that is also a problem.

Under normal circumstances, crawlers are run for corporate profit. Therefore, the moral self-discipline of crawler developers and the conscience of business operators are what fundamentally keep them from crossing the legal bottom line.

05. Finally

Recently I have seen a lot of news about programmers getting into trouble: programmers in Southeast Asia being beaten, several big-data companies being investigated, and so on. As an ordinary programmer, I hope everyone pays more attention to such events as a reminder to themselves.

Be cautious about entering risky industries such as cash loans, non-compliant P2P lending, gambling games, and "black five" products. If the company asks you to break into a certain website's data, or a colleague or friend asks you to leak company information, stay vigilant; sometimes a small action can cause big problems.

For most companies and individuals, using crawlers causes no problems at all. There is no need to put yourself at risk: as long as you don't crawl personal information, don't use crawlers for illegal profit, and don't crawl a website's paid content, there is basically nothing to worry about.

Programmers are among the most straightforward people in the world, often with high IQ and low EQ. Work is work, but you still need to be cautious, and keep your distance from anything that walks along the edge of the law.

Fear the law, abide by the law, start with me.

Reference:
https://www.zhihu.com/question/291554395
