TiDB in Production at FUNYOURS JAPAN

Background

Since its establishment in Japan in 2014, FUNYOURS JAPAN Co., Ltd. has operated a variety of well-received browser and mobile games, such as 鉣頭のソティラス and Ninety-Nine Hime. For game operations it is essential to understand what players are doing in the game, what their preferences are, and whether the level design is balanced. As operation time grew, it became quite ordinary for our databases to hold more than 100 million records.

Our technical team has therefore been continuously evaluating databases on the market and looking at how to improve our existing systems and architecture. In recent years the most talked-about database category has been NoSQL: MongoDB, Cassandra, Redis, HBase, and others all hold significant market share, and they are known for fast reads and writes and easy scaling. After a preliminary study, however, we found that adopting NoSQL would require a complete redesign of our current data storage architecture, and the business logic would have to be redesigned around whichever NoSQL database we chose, so which NoSQL database to use would itself be a question requiring careful consideration. Stepping back, the two issues that most urgently needed solving were: 1. storage space was not easy to expand, and 2. a single database had limited performance.

Initial plan

To deal with the lack of storage space, we first adopted the compressed table format provided by MySQL InnoDB, using an 8K page size for tables that are read and written frequently and a 4K page size for log tables. The results were very satisfactory: a great deal of storage space was freed up with no noticeable impact on performance. There are plenty of benchmarks of InnoDB compression online, so I won't elaborate here. But the space saved by compressing tables is limited after all, so the next step was to enlarge the volumes and move historical logs that no longer need updating out to other databases. Although this made maintenance more complicated and burdensome, it solved the problem for the time being.
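The compression setup was roughly along these lines (the table names here are illustrative, not our actual schema; on MySQL 5.6 this also requires innodb_file_per_table=ON and innodb_file_format=Barracuda):

```sql
-- Tables with frequent reads and writes: 8K compressed pages
ALTER TABLE player_status ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Append-mostly log tables: 4K pages for a higher compression ratio
ALTER TABLE action_log ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```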

To work around the performance limits of a single MySQL instance, we deployed multiple groups of database servers to reach the required performance. Of course, data is not shared between groups, meaning SQL cannot operate across groups directly and extra application code is required. And naturally, for efficient access to large volumes of data, sharding by table and by database was indispensable.
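The sharded layout was along these lines (all names here are made up for illustration): each group of servers holds its own schemas, large tables are further split by suffix, and the application must first compute which group, schema, and table a player maps to:

```sql
-- Group 1 server: schemas game_db_01 .. game_db_04
-- Group 2 server: schemas game_db_05 .. game_db_08
-- Within each schema, logs are split by table suffix:
SELECT * FROM game_db_03.action_log_17 WHERE player_id = 123456;

-- A query spanning groups cannot be written as one SQL statement;
-- the application has to query each group and merge the results.
```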

Getting to know TiDB

Using a NoSQL database seemed to promise a complete solution, but the cost was also high, so we turned our attention to MySQL Cluster. Around that time we saw the news that Google had released the Cloud Spanner beta. NewSQL? What was that? It quickly aroused our strong interest, and after much investigation we found TiDB, an open-source NewSQL database on GitHub. PingCAP has also continuously published many related articles. The more we learned about TiDB, the more it looked like a very suitable optimization for our situation: compatible with MySQL, highly available, and easy to scale horizontally.

During the feasibility evaluation and testing, I used an architecture of 3 TiKV nodes, 3 PD nodes, and 2 TiDB instances co-located with PD, installed with Ansible as the documentation suggested. I ran into two difficulties. First, Ansible's machine benchmark failed on disk read/write performance and refused to install; since we were on cloud machines there was little flexibility in hardware, so I had to modify the script manually to complete the installation. Second, Ansible also checks whether the NTP synchronization service is running, but the default time service on CentOS 7 is chrony, so I modified the script there as well (later versions provide a flag to switch, along with an option to install ntpd automatically). In short, the installation succeeded. At the time, PingCAP had only just released the Ansible installation method, and the official docs did not yet elaborate on horizontal scaling, such as adding or removing TiKV, PD, or TiDB machines, so I emailed PingCAP. By the time I got back from lunch they had already replied and invited me to WeChat for more immediate communication and support, which was really impressive.
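For reference, later versions of tidb-ansible expose the time-sync behavior as a variable in inventory.ini (the variable name below is from the tidb-ansible repository; check your version, since this area changed quickly in early releases, and the disk benchmark itself lives in the bootstrap playbook's tasks):

```ini
## inventory.ini (excerpt)
[all:vars]
# Set to False when chrony already handles time synchronization;
# when True, ansible deploys and checks ntpd itself.
enable_ntpd = True
```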

Backup and restore mechanism. Here TiDB provides mydumper/loader, a solution faster than the official mysqldump. We initially built it ourselves from the source on GitHub and ran into a problem; after communicating with PingCAP, I learned it was already shipped in the tidb-enterprise-tools toolkit. mydumper can use regular expressions to select which databases and tables to back up, which is very convenient for a layout that was originally sharded by database and table. The backup files are all written to one directory, and loader can then load the backed-up data back into the DB. But isn't the point of TiDB precisely that everything can live in the same table? When a huge amount of data sits in a single table, mydumper/loader performs very well, but a full backup still takes time. Since TiDB also supports mysqldump, incremental backups can be done with mysqldump plus a WHERE conditional expression.
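The commands looked roughly like this (host, port, user, paths, and the regex are illustrative; mydumper's -x option filters schema.table names by regular expression, and mysqldump's --where option restricts the rows dumped):

```shell
# Full backup of all shards whose schema.table matches a pattern
./bin/mydumper -h 127.0.0.1 -P 4000 -u backup \
    -x '^game_db_[0-9]+\.action_log_[0-9]+$' -o /data/backup

# Load the dump back into TiDB
./bin/loader -h 127.0.0.1 -P 4000 -u restore -d /data/backup

# Incremental backup of recent rows via mysqldump with WHERE
mysqldump -h 127.0.0.1 -P 4000 -u backup game_db action_log \
    --where="created_at >= '2017-06-01'" > incremental.sql
```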

Because different services need different permissions, we also tested TiDB's account privilege mechanism. At the time it was still the pre-GA version: privileges granted with fuzzy (wildcard) matching did not take effect and had to match exactly to work normally; in addition, privileges were not properly reclaimed when using REVOKE. After we reported these issues to PingCAP, both were quickly fixed in the GA release.
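The behavior could be exercised with ordinary GRANT/REVOKE statements like these (user and schema names are made up): in the pre-GA version the wildcard grant did not take effect while the exact-match grant did, and REVOKE failed to remove the privilege.

```sql
-- Fuzzy (wildcard) match: did not take effect in pre-GA TiDB
GRANT SELECT ON `game_%`.* TO 'analyst'@'%';

-- Exact match: worked normally
GRANT SELECT ON `game_db`.* TO 'analyst'@'%';

-- Reclaiming a privilege: pre-GA REVOKE did not take effect
REVOKE SELECT ON `game_db`.* FROM 'analyst'@'%';
```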

Launching TiDB

For the initial launch we used machines with 4-core CPUs and 32 GB of memory for TiKV, and 8-core CPUs with 16 GB of memory for TiDB/PD: 3 TiKV, 3 PD, and 2 TiDB instances co-located with PD. Observing through Prometheus, we found the load concentrated on a single TiKV node, with the load average spiking above 7 at peak. Our preliminary judgment was that the machines were under-specified, so we decided to upgrade all TiKV nodes to 16 cores and 24 GB of memory. Because an in-game event was running, we did not want to stop the machines, so we first added three new TiKV machines and then removed the three original ones. Special thanks to PingCAP for its continuous online support during the replacement; the switchover completed smoothly. After the upgrade, the peak load average dropped to 4, but it was still concentrated on one of the TiKV nodes rather than spread across all three. With PingCAP's assistance, we judged that the cause was an overly frequent SELECT COUNT(1) in our business logic, which kept hitting the same Region. Tuning the configuration files did not solve it (the latest v1.1 release has since optimized COUNT(*)), so we changed the business behavior in line with the data's characteristics, after which the load average stayed almost always below 1.
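The change amounted to no longer issuing a COUNT over the hot Region on every request. One common pattern, sketched here with made-up names and not necessarily our exact change, is to maintain a small summary counter instead:

```sql
-- Writers bump a counter row instead of relying on readers
-- running SELECT COUNT(1) against the large table.
UPDATE stats SET cnt = cnt + 1 WHERE name = 'active_sessions';

-- Readers fetch one small row instead of scanning a hot Region.
SELECT cnt FROM stats WHERE name = 'active_sessions';
```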

Comparing the original architecture with the TiDB architecture: the original architecture spread users across multiple groups of DBs to reach the required performance, but when some groups were under heavy load, the DBs of the other groups could not help share it. With TiDB, the machines are used much more efficiently. And when querying and analyzing data from the back office, under the original architecture any query spanning more than a month would hurt that DB group's performance; under the TiDB architecture there is no such problem.
The most direct operational benefit now is the saving in hardware costs. Under the original architecture, every group of DBs had to be specified for peak load; under the TiDB architecture, after consolidating all the machines into one cluster, it is enough for their combined capacity to cover the peak. Monitoring comes with the Prometheus/Grafana visualization system and flexible custom alert rules, which also eliminates the cost of building our own SNMP-based monitoring system. In addition, since the application code is less complex, when operation planners ask for new analytics, the data can be retrieved from the database more quickly, leaving more time to study and analyze player preferences.

Future plan

Currently, we are evaluating TiSpark and plan to use it for back-office data analysis. Because TiSpark can operate on TiKV directly, it also lets us use the many ready-made libraries Spark provides to analyze our collected log data. We expect to use Spark's machine learning to make an initial judgment on whether each function in the system is operating normally and to raise alerts, for example when users' login frequency is abnormal, to assist manual monitoring of the game's running status.

Author: Zhang Mingtang, Operation System Engineer, FUNYOURS JAPAN
