Distributed global unique ID generation strategy

Why a distributed system needs an ID generation system

In a complex distributed system, it is often necessary to uniquely identify a large amount of data and messages. For example, in the financial, payment, catering, hotel, Maoyan movie and other product systems of Meituan Dianping, the data is growing day by day. After the sub-database and sub-table of the database, a unique ID is required to identify a piece of data or message. The database is self-increasing. ID obviously cannot meet the demand; special ones such as orders, riders, and coupons also need to have a unique ID for identification. At this time, a system that can generate a globally unique ID is very necessary.

To sum up, what are the requirements of the business system for ID numbers?

Requirements for the ID generation system

1. Global uniqueness: no duplicate IDs, the most basic requirement.
2. Increasing trend: MySQL InnoDB engine uses clustered index. Since most RDBMS use B-tree data structure to store index data, we should try our best to use ordered primary keys to ensure write performance in the selection of primary keys.
3. Monotonically increasing: Ensure that the next ID must be greater than the previous ID.
4. Information security: If the ID is continuously increasing, a malicious user can easily see the rules of the order number and guess the next order number. If it is a competitor, we can directly know the order volume of our day . Therefore, in some scenarios, the ID needs to be irregular.

The third and fourth requirements are mutually exclusive and cannot be met at the same time.

At the same time, in a large-scale distributed website architecture, in addition to meeting the needs of ID generation itself, the ID generation system also needs to be highly available. Imagine the following, if the ID generation system is paralyzed, then the entire business cannot go on, it will be a disaster.
Therefore, the summary ID generation system also needs to meet the following requirements:
1. High availability, availability reaches 5 9s or 4 9s.
2. High QPS, performance should not be too bad, otherwise it is easy to cause thread blockage.
3. The average delay and TP999 (the lowest delay that guarantees 99.9% of requests can succeed) delays should be as low as possible.

Type of ID generation system

UUID

UUID means that the number generated by one machine at the same time is unique among all machines. According to the standard calculation established by the Open Software Foundation (OSF), the Ethernet card address, nanosecond time, chip ID code and many possible numbers are used.
UUID is a combination of the following parts:
(1) Current Date and time.
(2) Clock sequence.
(3) The globally unique IEEE machine identification number. If there is a network card, it will be obtained from the MAC address of the network card. If there is no network card, it will be obtained by other means.
The standard UUID format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (8-4-4-4-12), 36 characters divided into five segments by hyphen, example: 550e8400-e29b-41d4 -a716-446655440000
The UUID API has been provided in the Java Standard Class Library.

UUID.randomUUID()  

Advantages

  • Very high performance: local generation, no network consumption.

Disadvantages

  • Difficult to store: UUID is too long, 16 bytes and 128 bits, usually expressed as a 36-length string , Many scenarios are not applicable.
  • Insecure information: The algorithm that generates UUID based on MAC address may cause MAC address disclosure. This vulnerability was used to find the location of the creator of Melissa virus.
  • When ID is used as the primary key, there will be some problems in specific environments. For example, in the scenario where the primary key of DB is used, UUID is very unsuitable.

SnowFlake algorithm

SnowFlake ID is a 64-bit binary positive integer, and then converted into a decimal number. A 64-bit binary number is composed of the following parts:

share picture snowflake id generation rules

  • 1-digit identifier

strong>: It is always 0. Because the long basic type is signed in Java, the highest bit is the sign bit, positive numbers are 0, and negative numbers are 1, so id is generally positive, and the highest bit is 0.

< li> 41-bit time stamp: The 41-bit time cutoff is not the time cutoff for storing the current time, but the value obtained by storing the difference between the time cutoff (current time cutoff-start time cutoff), here is The start time cut off is generally the time when our id generator starts to use, which is specified by our program.

  • 10-digit machine identification code: It can be deployed on 1024 nodes, If an IDC is deployed, these 10 digits can be composed of 5 digits of the computer room ID + 5 digits of the machine ID.
  • 12-digit sequence: Counting within milliseconds, the 12-bit counting sequence number supports each node to generate 4096 ID sequence numbers every millisecond (the same machine, the same time)
  • Advantages< /p>

    • Simple and efficient, fast generation speed.
    • The timestamp is in the high position, and the auto-increasing sequence is in the low position. The entire ID is trending and increasing in order of time.
    • >

    • High flexibility, the division of bits can be adjusted according to business needs to meet different needs.

    Disadvantages

    • Depending on the machine’s clock, if the server clock is dialed back, it will cause duplicate ID generation.
    • In a distributed environment, the clocks of each server cannot be completely synchronized, and sometimes it may not increase globally.

    Snowflake Java implementation

    /** * Twitter_Snowflake< br> * The structure of SnowFlake is as follows (each part is separated by -): 
    * 0-0000000000 0000000000 0000000000 0000000000 0-00000-00000-000000000000
    * 1-bit identification, because the long basic type is signed in Java Yes, the highest bit is the sign bit, positive numbers are 0, negative numbers are 1, so id is generally positive, and the highest bit is 0
    * 41-bit time cutoff (millisecond level), note that 41-bit time cutoff is not storing the current time Time cutoff, but the difference of the stored time cutoff (current time cutoff-start time cutoff) * the value obtained), the start time cutoff here is generally the time when our id generator starts to use, which is determined by our program Specified (as follows the startTime attribute of the IdWorker class of the program below). * 41-bit time cut-off, you can use 69 years, year T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69
    * 10-bit data machine bits, which can be deployed in 1024 Nodes, including 5-digit datacenterId and 5-digit workerId
    * 12-digit sequence, counting within milliseconds, 12-digit counting sequence number supports each node to generate 4096 ID sequence numbers every millisecond (same machine, same time cut)< br> * Add up to exactly 64 bits, which is a Long type.
    * The advantage of SnowFlake is that, as a whole, it is automatically sorted by time, and there will be no ID collisions in the entire distributed system (differentiated by data center ID and machine ID), and it is more efficient. * After testing, SnowFlake It can generate about 260,000 IDs per second. */ public class SnowflakeIdWorker {// =============================Fields=========== ================================ /** Start time cut off (2015 -01-01) */ private final long twepoch = 1420041600000L; /** The number of digits occupied by the machine id*/ private final long workerIdBits = 5L; /** The number of bits occupied by the data identifier id*/ private final long datacenterIdBits = 5L; /** The maximum machine id supported, the result is 31 (this shift algorithm can quickly calculate the maximum decimal number that can be represented by a few binary numbers) */ private final long maxWorkerId = -1L ^ (-1L << workerIdBits); /** The largest supported data identification id, the result is 31 */ private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); /** The sequence is in the id Number of bits*/ private final long sequenceBits = 12L; /** Machine ID is shifted 12 digits to the left*/ private final long workerIdShift = sequenceBits; /** Data identification id is shifted 17 bits to the left (12+5) */ private final long datacenterIdShift = sequenceBits + workerIdBits; /** Time cut is shifted to the left by 22 bits ( 5+5+12) */ private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; / ** The mask of the generated sequence, here is 4095 (0b111111111111=0xfff=4095) */ private final long sequenceMask = -1L ^ (-1L << sequenceBits); /** work Machine ID (0~31) */ private long workerId; /** Data center ID( 0~31) */ private long datacenterId; /** Sequence within milliseconds (0~4095 ) */ private long sequence = 0L; / ** The time when the ID was last generated*/ private long lastTimestamp = -1 L; //=============================Constructors===== ================================ /** * Constructor* < span class="hljs-doctag">@param workerId job ID (0~31) * @param datacenterId data center ID (0~31) */ public SnowflakeIdWorker(long workerId, < span class="hljs-keyword">long datacenterId) {if (workerId> maxWorkerId || workerId <0) {throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));} if (datacenterId> maxDatacenterId || datacenterId <0) {throw new IllegalArgumentException(String.for mat("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));} this.workerId = workerId; this.datacenterId = datacenterId;} // ==================== =========Methods======================================== == /** * Get the next ID (this method is thread-safe) * @return SnowflakeId */ public synchronized long nextId() {long timestamp = timeGen(); //If the current time is less than the time of the last ID generation Stamp, indicating that an exception should be thrown when the system clock has rolled backif (timestamp throw new RuntimeException( String.format("Clock moved backwards. Refusing to generate id for %d milli seconds", lastTimestamp-timestamp));} //If it is generated at the same time, then the sequence within milliseconds if (lastTimestamp == timestamp ) {sequence = (sequence + 1) & sequenceMask; //Sequence overflow within millisecondsif ( sequence == 0) {//Block to the next millisecond, get a new timestamp timestamp = tilNextMillis(lastTimestamp);}} //The timestamp changes, the sequence is reset within millisecondselse {sequence = 0L;} //The last time the ID was generated lastTimestamp = timestamp; //Shifted and combined by OR operation to form a 64-bit ID return ((timestamp-twepoch) << timestampLeftShift) // | (datacenterId << datacenterIdShift) // | (workerId << workerIdShift) // | sequence;} /** * Block until the next millisecond until a new timestamp is obtained* @param la stTimestamp Last ID generation time cutoff* @return current timestamp*/ protected long tilNextMillis(long lastTimestamp) {long timestamp = timeGen(); while (timestamp <= lastTimestamp) {timestamp = timeGen();} return timestamp;} /** * Returns the current time in milliseconds* @return current time (milliseconds) */ protected long timeGen() {return System.currentTimeMillis();} //====================== =========Test======================================== ==== /** Test*/ public static void main(String[] args) {SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0); for (int i = 0; i <1000; i++) {long id = idWorker.nextId(); System.out.println(Long.toBinaryString(id)); System.out.println(id);}}} < /span>< /span>< / span> span> span>
    span>

    Database self-increment ID mechanism

    The main idea is to use database self-increment ID + replace_into realizes the acquisition of unique ID.

    create table t_global_id( id bigint(20) unsigned not null auto_increment, stub char(1) not null default ", primary key (id), unique key stub (stub)) engine=MyISAM; < /span>
    # Each business can use the following SQL to read and write MySQL to get the ID number
    replace into t_golbal_id(stub) values('a '); select last_insert_id();  span>

    The replace into function is similar to the insert function. The difference is: replace into first tries to insert into the data list, if it is found that there is already this row of data in the table (judging by the primary key or unique index), delete it first. Insert again. Otherwise, insert new data directly.
    Of course, in order to avoid a single point of failure of the database, at least two database instances are required to generate odd and even IDs by distinguishing the starting value and step length of auto_increment. As follows:

    Server1:
    auto-increment-increment = 2
    auto-increment-offset = 1
    
    Server2:
    auto-increment-increment = 2
    auto-increment-offset = 2

    Advantages

    • Simple. Take full advantage of the self-incrementing ID mechanism of the database, which has high reliability and generates an orderly ID.

    Disadvantages

    • ID generation depends on the read and write performance of a single database.
    • Depending on the database, the entire system is unavailable when the database is abnormal.

    For MySQL performance problems, the following solutions can be used
    In a distributed environment, we can deploy N database instances, each set to For different initial values, the self-increment step is the number of machines. The initial value of each unit is 1, 2, 3…N, and the step size is N.

    Share pictures MySQL database self-increase ID multi-machine map

    Although the above solution solves the performance problem, it also has great limitations:

    • Difficulty in horizontal expansion of the system: After the system has defined the step size, it is difficult to adjust the step size after adding machines. What if you want to add a machine? Assuming that there is only one machine that issues the number 1,2,3,4,5 (step size is 1), this time you need to expand one machine. You can do this: set the initial value of the second machine much more than the first one, such as 14 (assuming that The first one cannot be sent to 14 within the expansion time), and the step size is set to 2, then the numbers issued by this machine are even numbers after 14. Then remove the first one and keep the ID value as an odd number. For example, 7, and then modify the step size of the first station to 2. Let it meet the number segment standard we defined, for this example, let the first station only produce odd numbers. Does the expansion plan look complicated? It looks okay , Now imagine if we have 100 machines online, what should we do to expand at this time? It’s a nightmare.
    • Database pressure: every time we get an ID, we must Read and write the database once. Of course, there is a corresponding solution to this problem, which is to obtain a range of numbers in the memory every time the ID is obtained, and then get it after using it. The performance of the database has improved by several orders of magnitude. .

    Third-party software generation (Redis)

    Redis implements an atomic operation INCR and INCRBY to achieve incremental operations. When the database performance is not enough, Redis can be used Instead, use Redis cluster at the same time to improve throughput. You can initialize the initial value of each Redis to 1, 2, 3, 4, 5, Then the step size is 5. The IDs generated by each Redis are:

    A: 1, 6, 11, 16, 21
    B: 2, 7, 12, 17, 22
    C: 3, 8, 13, 18, 23
    D: 4, 9, 14, 19, 24
    E: 5, 10, 15, 20, 25

    Advantages

    • It is not dependent on the database, flexible and convenient, and has better performance than the database.
    • Digital ID is sorted naturally, which is very helpful for paging or results that need to be sorted.

    Disadvantages:

    • If there is no Redis in the system, new components need to be introduced to increase system complexity.
    • The workload of coding and configuration is relatively large. This is not the biggest problem.

    Regarding the generation of distributed global unique IDs, various Internet companies have many implementation solutions, such as Leaf-snowflake of Meituan Dianping, which uses zookeeper to solve the problem of clock callback of each server. Rely on zookeeper. And Leaf-segment is similar to the above database batch ID acquisition scheme.

    Author: Misout Link: https://www.jianshu.com/p/9d7ebe37215e Source: Jianshu

    Why distributed The system needs an ID generation system

    In a complex distributed system, it is often necessary to uniquely identify a large amount of data and messages. For example, in the financial, payment, catering, hotel, Maoyan movie and other product systems of Meituan Dianping, the data is growing day by day. After the sub-database and sub-table of the database, a unique ID is required to identify a piece of data or message. The database is self-increasing. ID obviously cannot meet the demand; special ones such as orders, riders, and coupons also need to have a unique ID for identification. At this time, a system that can generate a globally unique ID is very necessary.

    To sum up, what are the requirements of the business system for ID numbers?

    Requirements for the ID generation system

    1. Global uniqueness: no duplicate IDs, the most basic requirement.
    2. Increasing trend: MySQL InnoDB engine uses clustered index. Since most RDBMS use B-tree data structure to store index data, we should try our best to use ordered primary keys to ensure write performance in the selection of primary keys.
    3. Monotonically increasing: Ensure that the next ID must be greater than the previous ID.
    4. Information security: If the ID is continuously increasing, a malicious user can easily see the rules of the order number and guess the next order number. If it is a competitor, we can directly know the order volume of our day . Therefore, in some scenarios, the ID needs to be irregular.

    The third and fourth requirements are mutually exclusive and cannot be met at the same time.

    At the same time, in a large-scale distributed website architecture, in addition to meeting the needs of ID generation itself, the ID generation system also needs to be highly available. Imagine the following, if the ID generation system is paralyzed, then the entire business cannot go on, it will be a disaster.
    Therefore, the summary ID generation system also needs to meet the following requirements:
    1. High availability, availability reaches 5 9s or 4 9s.
    2. High QPS, performance should not be too bad, otherwise it is easy to cause thread blockage.
    3. The average delay and TP999 (the lowest delay that guarantees 99.9% of requests can succeed) delays should be as low as possible.

    Type of ID generation system

    UUID

    UUID means that the number generated by one machine at the same time is unique among all machines. According to the standard calculation established by the Open Software Foundation (OSF), the Ethernet card address, nanosecond time, chip ID code and many possible numbers are used.
    UUID is a combination of the following parts:
    (1) Current Date and time.
    (2) Clock sequence.
    (3) The globally unique IEEE machine identification number. If there is a network card, it will be obtained from the MAC address of the network card. If there is no network card, it will be obtained by other means.
    The standard UUID format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (8-4-4-4-12), 36 characters divided into five segments by hyphen, example: 550e8400-e29b-41d4 -a716-446655440000
    The UUID API has been provided in the Java Standard Class Library.

    UUID.randomUUID()  

    Advantages

    • Very high performance: local generation, no network consumption.

    Disadvantages

    • Difficult to store: UUID is too long, 16 bytes and 128 bits, usually expressed as a 36-length string , Many scenarios are not applicable.
    • Insecure information: The algorithm that generates UUID based on MAC address may cause MAC address disclosure. This vulnerability was used to find the location of the creator of Melissa virus.
    • When ID is used as the primary key, there will be some problems in specific environments. For example, in the scenario where the primary key of DB is used, UUID is very unsuitable.

    SnowFlake algorithm

    SnowFlake ID is a 64-bit binary positive integer, and then converted into a decimal number. A 64-bit binary number is composed of the following parts:

    share picture snowflake id generation rules

    • 1-digit identifier

    strong>: It is always 0. Because the long basic type is signed in Java, the highest bit is the sign bit, positive numbers are 0, and negative numbers are 1, so id is generally positive, and the highest bit is 0.

    < li> 41-bit time stamp: The 41-bit time cutoff is not the time cutoff for storing the current time, but the value obtained by storing the difference between the time cutoff (current time cutoff-start time cutoff), here is The start time cut off is generally the time when our id generator starts to use, which is specified by our program.

  • 10-digit machine identification code: It can be deployed on 1024 nodes, If an IDC is deployed, these 10 digits can be composed of 5 digits of the computer room ID + 5 digits of the machine ID.
  • 12-digit sequence: Counting within milliseconds, the 12-bit counting sequence number supports each node to generate 4096 ID sequence numbers every millisecond (the same machine, the same time)
  • Advantages< /p>

    • Simple and efficient, fast generation speed.
    • The timestamp is in the high position, and the auto-increasing sequence is in the low position. The entire ID is trending and increasing in order of time.
    • >

    • High flexibility, the division of bits can be adjusted according to business needs to meet different needs.

    Disadvantages

    • Depending on the machine’s clock, if The server clock is dialed back, which will cause duplicate ID generation.
    • In a distributed environment, the clocks of each server cannot be completely synchronized, and sometimes it may not increase globally.

    Snowflake Java implementation

    /** * Twitter_Snowflake< br> * The structure of SnowFlake is as follows (each part is separated by -): 
    * 0-0000000000 0000000000 0000000000 0000000000 0-00000-00000-000000000000
    * 1-bit identification, because the long basic type is signed in Java Yes, the highest bit is the sign bit, positive numbers are 0, negative numbers are 1, so id is generally positive, and the highest bit is 0
    * 41-bit time cutoff (millisecond level), note that 41-bit time cutoff is not storing the current time Time cutoff, but the difference of the stored time cutoff (current time cutoff-start time cutoff) * the value obtained), the start time cutoff here is generally the time when our id generator starts to use, which is determined by our program Specified (as follows the startTime attribute of the IdWorker class of the program below). * 41-bit time cut-off, you can use 69 years, year T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69
    * 10-bit data machine bits, which can be deployed in 1024 Nodes, including 5-digit datacenterId and 5-digit workerId
    * 12-digit sequence, counting within milliseconds, 12-digit counting sequence number supports each node to generate 4096 ID sequence numbers every millisecond (same machine, same time cut)< br> * Add up to exactly 64 bits, which is a Long type.
    * The advantage of SnowFlake is that, as a whole, it is automatically sorted by time, and there will be no ID collisions in the entire distributed system (differentiated by data center ID and machine ID), and it is more efficient. * After testing, SnowFlake It can generate about 260,000 IDs per second. */ public class SnowflakeIdWorker {// =============================Fields=========== ================================ /** Start time cut off (2015 -01-01) */ private final long twepoch = 1420041600000L; /** The number of digits occupied by the machine id*/ private final long workerIdBits = 5L; /** The number of bits occupied by the data identifier id*/ private final long datacenterIdBits = 5L; /** The maximum machine id supported, the result is 31 (this shift algorithm can quickly calculate the maximum decimal number that can be represented by a few binary numbers) */ private final long maxWorkerId = -1L ^ (-1L << workerIdBits); /** The largest supported data identification id, the result is 31 */ private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); /** The sequence is in the id Number of bits*/ private final long sequenceBits = 12L; /** Machine ID is shifted 12 digits to the left*/ private final long workerIdShift = sequenceBits; /** Data identification id is shifted 17 bits to the left (12+5) */ private final long datacenterIdShift = sequenceBits + workerIdBits; /** Time cut is shifted to the left by 22 bits ( 5+5+12) */ private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; / ** The mask of the generated sequence, here is 4095 (0b111111111111=0xfff=4095) */ private final long sequenceMask = -1L ^ (-1L << sequenceBits); /** 工作机器ID(0~31) */ private long workerId; /** 数据中心ID( 0~31) */ private long datacenterId; /** 毫秒内序列(0~4095 ) */ private long sequence = 0L; / ** 上次生成ID的时间截*/ private long lastTimestamp = -1 L; //==============================Constructors===================================== /** * 构造函数 * @param workerId 工作ID (0~31) * @param datacenterId 数据中心ID (0~31) */ public SnowflakeIdWorker(long workerId, long datacenterId) { if (workerId > maxWorkerId || workerId < 0) { throw new IllegalArgumentException(String.format("worker Id can‘t be greater than %d or less than 0", maxWorkerId)); } if (datacenterId > maxDatacenterId || datacenterId < 0) { throw new IllegalArgumentException(String.for mat("datacenter Id can‘t be greater than %d or less than 0", maxDatacenterId)); } this.workerId = workerId; this.datacenterId = datacenterId; } // ==============================Methods========================================== /** * 获得下一个ID (该方法是线程安全的) * @return SnowflakeId */ public synchronized long nextId() { long timestamp = timeGen(); //如果当前时间小于上一次ID生成的时间戳,说明系统时钟回退过这个时候应当抛出异常 if (timestamp < lastTimestamp) { throw new RuntimeException( String.format("Clock moved backwards. Refusing to generate id for %d milli seconds", lastTimestamp - timestamp)); } //如果是同一时间生成的,则进行毫秒内序列 if (lastTimestamp == timestamp) { sequence = (sequence + 1) & sequenceMask; //毫秒内序列溢出 if (sequence == 0) { //阻塞到下一个毫秒,获得新的时间戳 timestamp = tilNextMillis(lastTimestamp); } } //时间戳改变,毫秒内序列重置 else { sequence = 0L; } //上次生成ID的时间截 lastTimestamp = timestamp; //移位并通过或运算拼到一起组成64位的ID return ((timestamp - twepoch) << timestampLeftShift) // | (datacenterId << datacenterIdShift) // | (workerId << workerIdShift) // | sequence; } /** * 阻塞到下一个毫秒,直到获得新的时间戳 * @param la stTimestamp 上次生成ID的时间截 * @return 当前时间戳 */ protected long tilNextMillis(long lastTimestamp) { long timestamp = timeGen(); while (timestamp <= lastTimestamp) { timestamp = timeGen(); } return timestamp; } /** * 返回以毫秒为单位的当前时间 * @return 当前时间(毫秒) */ protected long timeGen() { return System.currentTimeMillis(); } //==============================Test============================================= /** 测试 */ public static void main(String[] args) { SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0); for (int i = 0; i < 1000; i++) { long id = idWorker.nextId(); System.out.println(Long.toBinaryString(id)); System.out.println(id); } } }

    数据库自增ID机制

    主要思路是采用数据库自增ID + replace_into实现唯一ID的获取。

    create table t_global_id( id bigint(20) unsigned not null auto_increment, stub char(1) not null default ‘‘, primary key (id), unique key stub (stub) ) engine=MyISAM; 
    # 每次业务可以使用以下SQL读写MySQL得到ID号
    replace into t_golbal_id(stub) values(‘a‘); select last_insert_id(); 

    replace into跟insert功能类似,不同点在于:replace into首先尝试插入数据列表中,如果发现表中已经有此行数据(根据主键或唯一索引判断)则先删除,再插入。否则直接插入新数据。
    当然为了避免数据库的单点故障,最少需要两个数据库实例,通过区分auto_increment的起始值和步长来生成奇偶数的ID。如下:

    Server1:
    auto-increment-increment = 2
    auto-increment-offset = 1
    
    Server2:
    auto-increment-increment = 2
    auto-increment-offset = 2

    优点

    • 简单。充分借助数据库的自增ID机制,可靠性高,生成有序的ID。

    缺点

    • ID生成依赖数据库单机的读写性能。
    • 依赖数据库,当数据库异常时整个系统不可用。

    对于MySQL的性能问题,可以用如下方案解决
    在分布式环境中,我们可以部署N台数据库实例,每台设置成不同的初始值,自增步长为机器的台数。每台的初始值分别为1,2,3…N,步长为N。

     
    分享图片 MySQL数据库自增ID多机图

    以上方案虽然解决了性能问题,但是也存在很大的局限性:

    • 系统水平扩容困难:系统定义好步长之后,增加机器之后调整步长困难。如果要添加机器怎么办?假设现在只有一台机器发号是1,2,3,4,5(步长是1),这个时候需要扩容机器一台。可以这样做:把第二台机器的初始值设置得比第一台超过很多,比如14(假设在扩容时间之内第一台不可能发到14),同时设置步长为2,那么这台机器下发的号码都是14以后的偶数。然后摘掉第一台,把ID值保留为奇数,比如7,然后修改第一台的步长为2。让它符合我们定义的号段标准,对于这个例子来说就是让第一台以后只能产生奇数。扩容方案看起来复杂吗?貌似还好,现在想象一下如果我们线上有100台机器,这个时候要扩容该怎么做?简直是噩梦。
    • 数据库压力大:每次获取一个ID都必须读写一次数据库。当然对于这种问题,也有相应的解决方案,就是每次获取ID时都批量获取一个区间的号段到内存中,用完之后再来获取。数据库的性能提高了几个量级。

    第三方软件生成(Redis)

    Redis实现了一个原子操作INCR和INCRBY实现递增的操作。当使用数据库性能不够时,可以采用Redis来代替,同时使用Redis集群来提高吞吐量。可以初始化每台Redis的初始值为1,2,3,4,5,然后步长为5。各个Redis生成的ID为:

    A:1,6,11,16,21
    B:2,7,12,17,22
    C:3,8,13,18,23
    D:4,9,14,19,24
    E:5,10,15,20,25

    优点

    • 不依赖于数据库,灵活方便,且性能优于数据库。
    • 数字ID天然排序,对分页或者需要排序的结果很有帮助。

    缺点:

    • 如果系统中没有Redis,还需要引入新的组件,增加系统复杂度。
    • 需要编码和配置的工作量比较大。这个都不是最大的问题。

    关于分布式全局唯一ID的生成,各个互联网公司有很多实现方案,比如美团点评的Leaf-snowflake,用zookeeper解决了各个服务器时钟回拨的问题,弱依赖zookeeper。以及Leaf-segment类似上面数据库批量ID获取的方案。

     
    分享图片 snowflake id生成规则