TiDB source code reading series (1)

At TiDB DevCon2018, we announced the TiDB source code reading and sharing activity and promised to publish a series of articles and videos to help everyone understand the TiDB source code. Many people have been asking when it would start, but we were busy developing the new version. During the Spring Festival holiday, I finally had time to start writing this series.

Why are we doing this? As the TiDB project has grown, the code has become more and more complex, and we found it increasingly difficult for new hires to modify it. So we started internal training: by recording videos and writing tutorials, we helped new colleagues get up to speed faster. After a few sessions we found that the results were good. New colleagues benefited, and veteran colleagues also learned about modules they had not been familiar with, so everyone gained something. We figured that the open source community faces the same problem and could benefit from this work as well, so we decided to open the activity up to a wider audience.

TiDB, as an open source project, has received extensive attention from the community throughout its development. Many people are trying out TiDB or already running it in production, and they have given us many good suggestions and much feedback that help make the project better. This holds for project development, and it holds for research on database technology as well. We very much hope to communicate with database researchers and enthusiasts. We have organized nearly a hundred technical meetups and talks in the past two years. In talking with you, we found that the level of database technology in China is very high, and these exchanges always spark new ideas. Through this activity we hope to have more in-depth exchanges with you and, through source code reading, let TiDB be completely open with you.

Preface

The best way to learn a system is to read some classic works and study an open source project, and databases are no exception. There are many good open source projects among single-node databases. MySQL and PostgreSQL are the two best known, and many people have read their code. We also read a lot of MySQL and PostgreSQL code when we first started working on databases, and benefited a lot from it. For distributed databases, however, there are not many good open source projects. Some well-known systems, such as F1/Spanner, are not open source, and some are no longer maintained or went from open source to closed source, such as FoundationDB, which was closed after being acquired by Apple (fortunately, I cloned a copy of the code at the time :), see here). We have also organized some code reading talks on open source systems, both internally and externally, but they were not systematic.

TiDB is currently receiving widespread attention, and some technical enthusiasts in particular hope to participate in the project. Because the whole system is complex, many people do not understand the project well. Through this series of articles we hope to describe TiDB’s technical principles and implementation details from top to bottom and from basic to advanced, to help you master the project.

Background knowledge

This series of articles will focus on TiDB itself. Readers need to have some basic knowledge, including but not limited to:

  • The Go language. You do not need to be proficient, but you must at least be able to read the code and know how to use goroutines, channels, the sync package, and similar components
  • Basic database knowledge: understand what functions and components a single-node database has
  • Basic SQL knowledge: know basic DDL and DML statements and the basics of transactions
  • Basic knowledge of back-end services, such as how to start a background process and how RPC works
  • Some general knowledge of networks and operating systems

In general, readers need to understand basic database concepts and be able to read Go programs. I believe this is not a problem for most readers.
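
As a rough yardstick for the Go reading level mentioned above, here is a minimal sketch. It is not taken from the TiDB code base, and the function name sumSquares is made up for illustration. It starts one goroutine per input, collects results over a channel, and uses sync.WaitGroup to wait for the workers. If this reads comfortably, the Go side of the series should not be an obstacle.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical helper: it squares each input in its own
// goroutine, collects the results over a channel, and returns their sum.
func sumSquares(nums []int) int {
	// Buffered to len(nums) so no worker ever blocks on send.
	results := make(chan int, len(nums))
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * v
		}(n)
	}

	wg.Wait()      // all workers have sent their result
	close(results) // no more senders, so closing is safe

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // 1 + 4 + 9 + 16
}
```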

In addition to the general knowledge above, I also hope readers will read the three articles I wrote earlier (on storage, computing, and scheduling) to understand some of the basic principles of TiDB.

What can readers gain?

What can be gained from this series? First, by understanding the basic principles of TiDB, you will understand the basic principles of a relational database. Second, by reading the TiDB code, you will see how a database is implemented and how the database principles in textbooks are put into practice. Third, understanding how a database’s implementation affects its behavior helps explain why the database behaves the way it does. This carries over to other databases and should help readers use them well too. Fourth, you can see how a large-scale distributed system is designed, built, and optimized. Finally, once you understand the TiDB code, you can reuse it in your own work if the need arises. Some companies already use TiDB modules, such as the Parser, in their products.

Summary of content

First, let us clarify a concept. Generally, TiDB refers to the entire distributed database, which includes three major components: tidb-server, pd-server, and tikv-server. The whole project is fairly complex and involves two programming languages (Go and Rust), but if you want to understand the database-related parts, you only need to look at the tidb-server code. The computation-related logic on tikv-server can also be found in the tidb-server code: in the tidb-server code directory there is a component called mock-tikv, which uses local storage to simulate the behavior of tikv-server. There you can find much of the same logic as on tikv-server, especially that of the Coprocessor module. The logic on tikv-server was ported from mock-tikv. This series therefore mainly covers the tidb-server code. Unless otherwise specified, TiDB in these articles refers to tidb-server.

Following the components of a database and the usual flow of SQL processing, this series will explain the Protocol layer and important modules such as the Parser, Preprocess, Optimizer, Executor, and Storage Engine. The series is divided into two parts. The first part includes the following four articles:

  • The first article introduces the overall architecture: what modules TiDB has, what each does, where it is best to start, which modules can be skipped, and which need to be read carefully.
  • The second article follows the SQL processing flow: where the entry point is, what operations are performed, and where a SQL statement comes in, where it is processed, and where the result is returned.
  • The third article starts from the code itself and explains how to read and understand the code of a given module.
  • The fourth article uses an example to show how to make TiDB support a new syntax.

After reading this part, I hope you will have a basic foundation in TiDB and understand the general flow, so that when you run into problems or want to add a new feature to TiDB, you will know where to start.

The second part goes deeper and explains each important module of TiDB, including the detailed implementation of the optimizer, how logical and physical optimization are done, the implementation of important physical operators, and more. I hope that after reading it you will have a deep understanding of TiDB and can fully understand its code. This part will contain many more articles than the first, and the exact number has not yet been determined.

This series will also serve as PingCAP’s internal training material, and we hope the community can benefit from it too. All articles will be published on PingCAP’s WeChat official account (WeChat ID: pingcap2015), Zhihu column, and official blog. You are welcome to follow along through these channels.

Beyond the articles

In addition to this series of articles, we also plan to open up our internal training videos. So far the internal source code walkthroughs have been held four times. The format is that a colleague spends a week studying a module he or she is not familiar with and then spends an hour explaining it to other colleagues. The goal is for everyone to understand all the modules. This training will continue, and every session is recorded. We plan to edit and organize the videos before releasing them. In the near future we will invite some community contributors for a closed trial, make adjustments based on their feedback, and then open the videos to the entire community.

Schedule

This series has only just begun to be written, and at present there is only a rough plan. We will try our best to ensure that each article is released according to plan. The articles in the first part will be published before mid-March, and those in the second part will be released gradually after that.

As for the videos, we will announce the schedule in advance, depending on the progress of editing and the trial.

Some expectations

We have no experience in writing a tutorial series. We hope that as the articles are released, we will receive feedback from readers that guides us to keep improving this work, so that in the end we can do this well together. Throughout the activity, we will pay close attention to feedback and make adjustments at any time.

In addition, we hope that like-minded people will take part in the development of TiDB, whether through the open source community or by joining us in person :).

Finally, the purpose of this series is to help readers better understand the TiDB source code, not to replace reading the source code. I hope readers will use these articles as a reference while reading the code, rather than reading only the articles and ignoring the code. Remember: “What you learn on paper always feels shallow; to truly understand this, you have to submit a PR.”

Author: Shen Li
