TiDB Source Code Reading Series (2): A First Look at the TiDB Source Code

This article is the second in the TiDB source code reading series. The first article introduced the overall architecture of TiDB: which modules TiDB has, what they do, where to start, which ones can be skipped, and which ones need to be read carefully.

This article is an introductory document with a relatively low level of difficulty. Some of its content may already have appeared elsewhere, but for completeness we include it here as well.

TiDB architecture

This TiDB source code journey starts with a simple architecture diagram that many people have already seen. The diagram can be summed up in one sentence: "TiDB is a SQL engine that supports the MySQL protocol and uses a distributed KV storage engine with transaction support as its underlying storage." Three important points follow from this sentence. The first is how TiDB supports the MySQL protocol and interacts with the client. The second is how it interacts with the underlying storage engine to access data. The third is how it implements SQL functionality. This article first introduces the TiDB modules and their functions, and then uses these three points as threads to tie the modules together.

Code introduction

The TiDB source code is fully hosted on GitHub, and all the relevant information is available from the project homepage. The entire project is written in Go and is divided into many packages by functional module. With a dependency analysis tool, you can see the dependencies between the project's internal packages.

Most packages expose their services through interfaces, and most functionality is concentrated within a single package. Some packages, however, provide very basic functionality and are depended on by many other packages; these deserve special attention.

The project's main file is tidb-server/main.go, which defines how the service is started. The build process for the whole project can be found in the Makefile.

Besides the code itself, there are many test cases, which can be found in the xx_test.go files. In addition, the cmd directory contains several tool packages used for performance testing or for constructing test data.

Module introduction

TiDB has many modules. The overview below shows what each module is for. If you want to read the code for a particular feature, you can go straight to the corresponding module.

Package Introduction
ast Data structure definitions for the abstract syntax tree (AST); for example, SelectStmt defines the data structure that a Select statement is parsed into
cmd/benchdb Simple benchmark tool, used for performance optimization
cmd/benchfilesort Simple benchmark tool, used for performance optimization
cmd/benchkv Benchmark tool for the Transactional KV API; can also be read as an example of how to use the KV interface
cmd/benchraw Benchmark tool for the Raw KV API; can also be read as an example of how to use the KV interface without transactions
cmd/importer Tool that fabricates data based on table structure and statistics, used to construct test data
config Configuration file related logic
context Mainly contains the Context interface, which provides abstractions of some basic functionality. Many packages and functions depend on this interface; abstracting these functions into an interface resolves dependency problems between packages
ddl DDL execution logic
distsql Abstraction of the distributed computing interface; this package isolates the logic of the Executor from that of the TiKV Client
domain A domain can be thought of as an abstraction of a storage space in which databases and tables can be created. Different domains can contain databases with the same name, somewhat like a namespace. Generally, a single TiDB instance creates only one Domain instance, which holds the information schema, statistics, and so on
executor Executor logic; most of the statement execution logic lives here. It is relatively complex and will be covered separately later
expression Expression related logic, including the various operators and built-in functions
expression/aggregation Aggregate expression related logic, such as the Sum and Count functions
infoschema SQL metadata management module; operations on the Information Schema also go through here
kv KV engine interfaces and some common methods; the underlying storage engine must implement the interfaces defined in this package
meta Uses the functionality provided by the structure package to manage the SQL metadata stored in the storage engine. infoschema and DDL use this module to access or modify SQL metadata
meta/autoid Module that generates globally unique auto-increment IDs. Besides the auto-increment ID for each table, it is also used to generate other globally unique IDs
metrics Metrics related information; the metrics of all modules are defined here
model SQL metadata data structures, including DBInfo / TableInfo / ColumnInfo / IndexInfo, etc.
mysql MySQL related constant definitions
owner Some tasks in a TiDB cluster can be executed by only one instance, such as asynchronous schema changes. This module coordinates among multiple tidb-servers to elect an executor for a task. Each kind of task has its own executor
parser Syntax parsing module, mainly comprising lexical analysis (lexer.go) and syntax analysis (parser.y); the main interface of this package is Parse(), which parses SQL text into an AST
parser/goyacc Wrapper around GoYacc
parser/opcode Some constant definitions for operators
perfschema Performance Schema related functionality; not enabled by default
plan Query optimization related logic
privilege User privilege management interface
privilege/privileges Implementation of user privilege management
server MySQL protocol and Session management related logic
sessionctx/binloginfo Outputs Binlog information to the Binlog module
sessionctx/stmtctx Information needed at runtime by the statements in a Session; relatively complex
sessionctx/variable System Variable related code
statistics Statistics module
store Storage engine related logic; this is where the storage engine and the SQL layer interact
store/mockoracle Simulates the TSO component
store/mockstore Logic for instantiating a Mock TiKV; the main method is NewMockTikvStore. This logic was extracted from mocktikv to avoid circular dependencies
store/mockstore/mocktikv Simulates some behaviors of TiKV on a stand-alone storage engine; mainly used for local debugging, for constructing unit tests, and for guiding TiKV's development of Coprocessor related logic
store/tikv TiKV's Go client
store/tikv/gcworker TiKV GC related logic; tidb-server sends GC commands to TiKV according to the configured policy
store/tikv/oracle TSO service interface
store/tikv/oracle/oracles Client of the TSO service
store/tikv/tikvrpc Some constant definitions of the TiKV API
structure A layer of structured API defined on top of the Transactional KV API, providing structures such as List, Queue, and HashMap
table Abstraction of a SQL Table
table/tables Implementation of the interfaces defined in the table package
tablecodec Encoding and decoding between SQL data and Key-Value. For the specific encoding scheme of each data type, see the codec package
terror TiDB's error package
tidb-server The main method of the service
types All type related logic, including type definitions, operations on types, etc.
types/json JSON type related logic
util Utility tools; this directory contains many packages, and only a few important ones are introduced here
util/admin Methods used by TiDB's management statements (Admin statements)
util/charset Character set related logic
util/chunk Chunk is a data representation structure introduced in TiDB 1.1. A Chunk stores several rows of data; during SQL computation, data flows between modules in units of Chunks
util/codec Encoding and decoding of the various data types
x-server X-Protocol implementation

Where to start

At first glance, TiDB's 80 packages can feel overwhelming, but not every package is important, and some features involve only a small number of packages. Where to start reading depends on why you are reading the source code.

If you want to know the implementation details of a specific function, you can refer to the module introduction above and find the corresponding module.

If you want a comprehensive understanding of the source code, start with tidb-server/main.go to see how tidb-server starts up and how it waits for and handles user requests. Then follow the code through the execution of a SQL statement. Some of the other important modules are worth reading to understand how they are implemented. As for the auxiliary modules, read them selectively; a general impression is enough.

Important modules

Among all 80 modules, the following are the most important, and we hope you will read them carefully. We will also cover each of these modules in a dedicated article. Once all the articles are ready, I will replace the TODOs in the table below with links to the corresponding articles.

Package Related Articles
plan TODO
expression TODO
executor TODO
distsql TODO
store/tikv TODO
ddl TODO
tablecodec TODO
server TODO
types TODO
kv TODO
tidb TODO

Auxiliary modules

Apart from the important modules, the rest are auxiliary modules. This does not mean they are unimportant, only that they are not on the critical path of SQL execution. We will also devote some space to describing most of these packages.

SQL layer architecture

This diagram is much more detailed than the previous one. It roughly depicts the core SQL modules; start from the left and follow the direction of the arrows.

Protocol Layer

On the far left is TiDB's Protocol Layer, the interface for interacting with the client. TiDB currently supports only the MySQL protocol, and the related code is in the server package.

The main job of this layer is to manage client connections, parse MySQL commands, and return execution results. The implementation follows the MySQL protocol; for details, refer to the MySQL protocol documentation. We consider this module to be the best MySQL protocol component currently available. If your project needs MySQL protocol parsing and handling, you can consult or import this module.

The connection establishment logic is in the Run() method of server.go, mainly in the following two lines:

conn, err := s.listener.Accept()
go s.onConn(conn)
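The accept-then-spawn pattern in those two lines can be sketched as a small standalone program. This is only an illustration, not TiDB's server code: handleConn is a hypothetical stand-in for s.onConn that echoes lines back instead of speaking the MySQL protocol, and serve and demo are names invented for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
)

// handleConn is a hypothetical stand-in for s.onConn: it serves a single
// client until the connection is closed. Here it only echoes each line.
func handleConn(conn net.Conn) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		fmt.Fprintf(conn, "echo: %s\n", scanner.Text())
	}
}

// serve accepts connections until the listener is closed and starts one
// goroutine per client, so a slow client never blocks the accept loop.
func serve(l net.Listener) {
	for {
		conn, err := l.Accept()
		if err != nil {
			return // listener closed
		}
		go handleConn(conn)
	}
}

// demo starts a server on a random loopback port, connects to it once as a
// client, and returns the server's reply.
func demo() (string, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer l.Close()
	go serve(l)

	c, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := fmt.Fprintln(c, "ping"); err != nil {
		return "", err
	}
	return bufio.NewReader(c).ReadString('\n')
}

func main() {
	reply, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(reply)
}
```

One goroutine per connection keeps the accept loop trivial; the per-connection state then lives entirely inside the handler.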

The entry point for command processing in a single Session is the dispatch method of clientConn, where the protocol is parsed and the command is handed to the appropriate handler function.
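As a rough illustration of this kind of dispatching, the sketch below switches on the first byte of a command packet, which in the MySQL protocol identifies the command. The function here is a simplified, hypothetical analogue and not the actual clientConn.dispatch; it returns a description of what would happen instead of doing any real work.

```go
package main

import "fmt"

// Command bytes as defined by the MySQL client/server protocol.
const (
	comQuit   byte = 0x01
	comInitDB byte = 0x02
	comQuery  byte = 0x03
	comPing   byte = 0x0e
)

// dispatch is a hypothetical, simplified dispatcher: the first byte of the
// payload selects the command and the remaining bytes are its argument.
func dispatch(payload []byte) (string, error) {
	if len(payload) == 0 {
		return "", fmt.Errorf("empty packet")
	}
	cmd, data := payload[0], payload[1:]
	switch cmd {
	case comQuit:
		return "close connection", nil
	case comInitDB:
		return "use database " + string(data), nil
	case comQuery:
		return "execute SQL: " + string(data), nil
	case comPing:
		return "OK", nil
	default:
		return "", fmt.Errorf("unsupported command 0x%02x", cmd)
	}
}

func main() {
	// Build a COM_QUERY packet payload: command byte followed by SQL text.
	packet := append([]byte{comQuery}, "SELECT 1"...)
	res, err := dispatch(packet)
	fmt.Println(res, err)
}
```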

SQL Layer

Generally speaking, a SQL statement goes through a series of steps: syntax parsing --> validity checking --> building the query plan --> optimizing the query plan --> generating executors from the plan --> executing and returning results. This backbone corresponds to the following TiDB packages:

Package Function
tidb The interface between the Protocol layer and the SQL layer
parser Syntax parsing
plan Validity checking + query plan building + query plan optimization
executor Executor generation and execution
distsql Sends requests to TiKV via the TiKV Client and aggregates the returned results
store/tikv TiKV Client
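To make the flow between these stages concrete, here is a toy pipeline for statements of the form "SELECT a + b". Every type and function in it (ast, plan, parse, buildPlan, optimize, buildExecutor, run) is invented for this sketch and far simpler than the real packages; the point is only the shape of the hand-off from one stage to the next.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ast is a toy syntax tree for "SELECT a + b".
type ast struct{ left, right string }

// plan is a toy query plan: add two constants, or return a folded constant.
type plan struct {
	folded bool
	a, b   int
}

// executor is a toy executor: calling it produces the result.
type executor func() int

// parse turns SQL text into an AST (syntax parsing).
func parse(sql string) (*ast, error) {
	f := strings.Fields(sql)
	if len(f) != 4 || !strings.EqualFold(f[0], "SELECT") || f[2] != "+" {
		return nil, errors.New("syntax error")
	}
	return &ast{left: f[1], right: f[3]}, nil
}

// buildPlan checks validity (operands must be integers) and builds a plan.
func buildPlan(n *ast) (*plan, error) {
	a, err := strconv.Atoi(n.left)
	if err != nil {
		return nil, fmt.Errorf("invalid operand %q", n.left)
	}
	b, err := strconv.Atoi(n.right)
	if err != nil {
		return nil, fmt.Errorf("invalid operand %q", n.right)
	}
	return &plan{a: a, b: b}, nil
}

// optimize applies one rewrite rule: constant folding.
func optimize(p *plan) *plan {
	return &plan{folded: true, a: p.a + p.b}
}

// buildExecutor generates an executor from the plan.
func buildExecutor(p *plan) executor {
	if p.folded {
		return func() int { return p.a }
	}
	return func() int { return p.a + p.b }
}

// run chains the stages: parse, plan, optimize, build executor, execute.
func run(sql string) (int, error) {
	node, err := parse(sql)
	if err != nil {
		return 0, err
	}
	p, err := buildPlan(node)
	if err != nil {
		return 0, err
	}
	return buildExecutor(optimize(p))(), nil
}

func main() {
	fmt.Println(run("SELECT 1 + 2"))
}
```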

KV API Layer

TiDB relies on the underlying storage engine for data access, but it does not depend on one specific storage engine (such as TiKV). Instead, it places some requirements on the storage engine, and any engine that meets them can be used (TiKV is the most suitable one).

The most basic requirement is "a Key-Value engine with transactions and a Go driver". The more advanced requirement is "support for a distributed computing interface", so that TiDB can push some computation down to the storage engine.

These requirements can be found in the interface of the kv package. The storage engine needs to provide a Go language Driver that implements these interfaces, and then TiDB uses these interfaces to manipulate the underlying data.

For the most basic requirements, you can focus on these interfaces:

  • Transaction: Basic transaction operations
  • Retriever: Interface for reading data
  • Mutator: Interface for modifying data
  • Storage: Basic functions provided by the Driver
  • Snapshot: Operations on a data Snapshot
  • Iterator: The result returned by Seek, used to traverse the data
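The sketch below shows how interfaces of this kind fit together, using a toy in-memory engine. The interface names follow the list above, but the method signatures are simplified assumptions for illustration and do not match the real kv package; Snapshot and Iterator are left out, and memStore and memTxn are invented types with no concurrency control or isolation.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("key not found")

// Retriever reads data (simplified, assumed signature).
type Retriever interface {
	Get(key string) ([]byte, error)
}

// Mutator modifies data (simplified, assumed signature).
type Mutator interface {
	Set(key string, value []byte)
	Delete(key string)
}

// Transaction combines reading and writing with commit and rollback.
type Transaction interface {
	Retriever
	Mutator
	Commit() error
	Rollback()
}

// Storage is what a driver exposes to the SQL layer.
type Storage interface {
	Begin() Transaction
}

// memStore is a toy single-node, in-memory engine.
type memStore struct{ data map[string][]byte }

// memTxn buffers writes until Commit; a nil buffered value marks a deletion.
type memTxn struct {
	store  *memStore
	writes map[string][]byte
}

func (s *memStore) Begin() Transaction {
	return &memTxn{store: s, writes: map[string][]byte{}}
}

// Get checks the transaction's own buffered writes first, then the store.
func (t *memTxn) Get(key string) ([]byte, error) {
	if v, ok := t.writes[key]; ok {
		if v == nil {
			return nil, errNotFound
		}
		return v, nil
	}
	if v, ok := t.store.data[key]; ok {
		return v, nil
	}
	return nil, errNotFound
}

// Set buffers a copy of value; the copy is never nil, so it cannot be
// confused with a deletion marker.
func (t *memTxn) Set(key string, value []byte) {
	t.writes[key] = append([]byte{}, value...)
}

func (t *memTxn) Delete(key string) { t.writes[key] = nil }

// Commit applies all buffered writes to the store.
func (t *memTxn) Commit() error {
	for k, v := range t.writes {
		if v == nil {
			delete(t.store.data, k)
		} else {
			t.store.data[k] = v
		}
	}
	t.writes = map[string][]byte{}
	return nil
}

// Rollback discards all buffered writes.
func (t *memTxn) Rollback() { t.writes = map[string][]byte{} }

func main() {
	var s Storage = &memStore{data: map[string][]byte{}}
	txn := s.Begin()
	txn.Set("k1", []byte("v1"))
	if err := txn.Commit(); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	v, err := s.Begin().Get("k1")
	fmt.Println(string(v), err)
}
```

Because the SQL layer would only see Storage and Transaction, a different engine could be swapped in without touching the callers, which is the reason for defining these as interfaces.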

With the interfaces above, you can perform all the required operations on the data and implement all SQL functionality. To compute more efficiently, however, we also define an advanced computing interface; pay attention to these three interfaces/structs:

  • Client: Sends requests to the lower layer and exposes the computing capabilities of the underlying storage engine
  • Request: The content of the request
  • Response: The abstraction of the returned result
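The idea of pushing computation down can be sketched with the same three names. Everything below is a hypothetical simplification, not the real kv package: the Request fields, the Op constants, the mockClient with its in-memory "regions", and the aggregate helper are all assumptions made for this example. Each region computes a partial result locally, and the caller only merges the partial results.

```go
package main

import "fmt"

// Op is the computation pushed down to storage (hypothetical).
type Op int

const (
	OpCount Op = iota // count the keys in the range
	OpSum             // sum the values in the range
)

// Request describes what to compute over the key range [StartKey, EndKey).
type Request struct {
	StartKey, EndKey string
	Op               Op
}

// Response yields one partial result per storage node.
type Response interface {
	Next() (int64, bool)
}

// Client sends a Request to the storage layer.
type Client interface {
	Send(req *Request) Response
}

// region is a toy shard of the key space held by one storage node.
type region map[string]int64

// mockClient simulates a storage layer made of several regions.
type mockClient struct{ regions []region }

type sliceResponse struct {
	partials []int64
	pos      int
}

func (r *sliceResponse) Next() (int64, bool) {
	if r.pos >= len(r.partials) {
		return 0, false
	}
	v := r.partials[r.pos]
	r.pos++
	return v, true
}

// Send lets every region compute its own partial result, so only one number
// per region travels back instead of every matching row.
func (c *mockClient) Send(req *Request) Response {
	resp := &sliceResponse{}
	for _, reg := range c.regions {
		var partial int64
		for k, v := range reg {
			if k >= req.StartKey && k < req.EndKey {
				if req.Op == OpCount {
					partial++
				} else {
					partial += v
				}
			}
		}
		resp.partials = append(resp.partials, partial)
	}
	return resp
}

// aggregate merges the partial results on the caller's side.
func aggregate(c Client, req *Request) int64 {
	var total int64
	resp := c.Send(req)
	for {
		v, ok := resp.Next()
		if !ok {
			return total
		}
		total += v
	}
}

func main() {
	c := &mockClient{regions: []region{{"a": 1, "b": 2}, {"c": 3, "d": 4}}}
	fmt.Println(aggregate(c, &Request{StartKey: "b", EndKey: "d", Op: OpSum}))
}
```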

Summary

By now, the reader should understand the structure of the TiDB source code and the three main parts of the architecture. More detail will follow in later articles.

Author: Shen Li
