Interpretation of distributed lock principle and three implementations

At present, almost many large-scale websites and applications are deployed in a distributed manner. The issue of data consistency in distributed scenarios has always been a relatively important topic. The distributed CAP theory tells us that “Any distributed system cannot satisfy Consistency, Availability, and Partition tolerance at the same time, and can only satisfy two at the same time.” So , Many systems have to make a choice between these three at the beginning of the design. In most scenarios in the Internet field, strong consistency needs to be sacrificed in exchange for high availability of the system. The system often only needs to ensure “final consistency”, as long as the final time is within the range acceptable to users.

In many scenarios, in order to ensure the final consistency of the data, we need a lot of technical solutions to support, such as distributed transactions, distributed locks Wait. Sometimes, we need to ensure that a method can only be executed by the same thread at the same time. In a stand-alone environment, Java actually provides a lot of APIs related to concurrent processing, but these APIs are powerless in a distributed scenario. In other words, pure Java APIs cannot provide distributed lock capabilities. Therefore, there are currently many solutions for the realization of distributed locks.

For the realization of distributed locks, the following schemes are currently more commonly used:

Realize distributed lock based on database
based on cache (redis, memcached, tair) Realize distributed locks
Realize distributed locks based on Zookeeper

Before analyzing these implementations, let’s think about what we need What should a distributed lock look like? (The method lock is taken as an example here, and the resource lock is the same.)

It can ensure that the same application cluster in a distributed deployment A method can only be executed by one thread on one machine at the same time.
If this lock is a reentrant lock (to avoid deadlock)
This lock is best to be a blocking lock (consider whether or not this one is required based on business needs)
Have highly available access Lock and release lock function
The performance of acquiring and releasing lock is better

Realize distributed lock based on database

based on database table

To implement distributed locks, the easiest way may be to directly create a lock table, and then manipulate the table The data in it came true.
When we want to lock a method or resource, we add a record to the table, and delete it when we want to release the lock This record.

Create such a database table:

 CREATE TABLE `methodLock` (

`id`
int(11) NOT NULL AUTO_INCREMENT COMMENT'primary key' ,
`method_name` varchar(
64) NOT NULL DEFAULT ‘‘ COMMENT ‘Locked method name’,
`desc` varchar(
1024) NOT NULL DEFAULT ‘Remarks’,
`update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT
‘Save data time, automatically generate ‘,
PRIMARY KEY (`id`),
UNIQUE KEY `uidx_method_name` (`method_name `) USING BTREE
) ENGINE
=InnoDB DEFAULT CHARSET=utf8 COMMENT='The method in lock';

When we want to lock a method, execute the followingSQL:

insert into methodLock(method_name,desc) values ​​( 'method_name','desc')

Because we have made a unique constraint onmethod_name, if there are more If a request is submitted to the database at the same time, the database will ensure that only one operation can succeed, then we can think that the thread that succeeded in the operation has obtained the lock of the method and can execute the method body content.

After the method is executed, if you want to release the lock, you need to execute the followingSql:

delete from methodLock where method_name ='method_name'

The above simple implementation has the following problems :

1. This lock strongly depends on the availability of the database. The database is a single point. Once the database goes down, the business system will become unavailable.
2. This lock has no expiration time. Once the unlock operation fails, the lock record will always be in the database, and other threads can no longer obtain the lock.
3. This lock can only be non-blocking, because the insert operation of the data will directly report an error once the insert fails. Threads that have not acquired the lock will not enter the queue. If they want to acquire the lock again, they must trigger the acquisition operation again.
4. This lock is non-reentrant, and the same thread cannot obtain the lock again before releasing the lock. Because the data in the data already exists.

Of course, we can also solve the above problems in other ways.

Is the database single point? Engage in two databases and synchronize the data in both directions. Once hung up, quickly switch to the standby database.
No expiration time? Just do a timed task to clean up the overtime data in the database at regular intervals.
non-blocking? Make a while loop, and then return to success until insert is successful.
Non-reentrant? Add a field to the database table to record the host information and thread information of the machine currently acquiring the lock, then query the database first when acquiring the lock next time. If the host information and thread information of the current machine can be found in the database, directly Just assign the lock to him.

based on database exclusive lock

In addition to adding and deleting records in the data table, you can actually use the locks in the data to implement distributed locks.

We also use the database table we just created. Distributed locks can be realized through exclusive locks in the database. Based onMySql’s InnoDB engine, you can use the following methods to achieve lock operation:

< pre>public Boolean lock(){

connection.setAutoCommit(
false)

while(true){

try{

result
= select * from methodLock where method_name=xxx for update;

if(result==null){

return true;

}

}

catch(Exception e){

}

sleep(
1000);

}

return false;

}

Add for update after the query statement, the database will be in the query process Add exclusive locks to the database table in the database table (here is one more sentence, when the InnoDB engine locks, it will only use row-level locks when searching through the index, otherwise it will use table-level locks. Here we hope to use row-level locks , It is necessary to add an index to method_name. It is worth noting that this index must be created as a unique index, otherwise there will be a problem that multiple overloaded methods cannot be accessed at the same time. It is recommended to add parameter types to overloaded methods. .). When an exclusive lock is added to a record, other threads can no longer add an exclusive lock on the row of records.

We can think that the thread that obtains the exclusive lock can obtain the distributed lock. After the lock is obtained, the business logic of the method can be executed, and the After finishing the method, unlock it by the following method:

public void unlock(){

connection.commit();
}

viaconnection.commit() Operate to release the lock.

This method can effectively solve the problem of unable to release locks and blocking locks mentioned above.

blocking lock? The for update statement will return immediately after the execution is successful, and will remain in a blocking state when the execution fails until it succeeds.
The service is down after being locked and cannot be released? In this way, the database will release the lock by itself after the service goes down.

But it still cannot directly solve the problem of single point and reentrant database.

There may be another problem here, although we use a unique index for method_name and show the use of for update To use row-level locks. However, MySql will optimize the query. Even if the index field is used in the condition, whether to use the index to retrieve the data is determined by MySQL by judging the cost of different execution plans. If MySQL thinks that the full table scan is more efficient, such as For some very small tables, it will not use indexes. In this case, InnoDB will use table locks instead of row locks. If this happens, it will be a tragedy. . .

There is another problem, that is, we need to use an exclusive lock for distributed locklock, then an exclusive lock If it is not submitted for a long time, the database connection will be occupied. Once there are more similar connections, the database connection pool may burst

Summary< /strong>

Summarize the method of using a database to implement distributed locks. Both methods rely on a table in the database, one It is to determine whether there is a lock currently exists through the existence of records in the table, and the other is to implement distributed locks through exclusive locks in the database.

The advantages of database implementation of distributed lock

< span style="font-family: Tahoma;">Directly with the help of the database, easy to understand.

Disadvantages of implementing distributed locks in databases

< span style="font-family: Tahoma;">There will be various problems, and the process of solving the problems will make the whole program more and more complicated.
Operating the database requires a certain amount of overhead, and performance issues need to be considered.
The row-level lock of the database is not necessarily reliable, especially when our lock table is not large.

Realize distributed lock based on cache

Compared with the solution based on the database to realize the distributed lock, the implementation based on the cache will perform better in terms of performance. And many caches can be deployed in clusters, which can solve single-point problems.

There are currently many mature caching products, including Redis, memcached and Tair within our company.

Here we takeTair as an example to analyze the scheme of using cache to realize distributed lock. There are many related articles about Redis and memcached on the Internet, and there are also some mature frameworks and algorithms that can be used directly.

The implementation of distributed locks based onTair is actually similar to Redis, and the main implementation method is to use the TairManager.put method.

public boolean< span style="color: #000000;"> trylock(String key) {

ResultCode code = ldbTairManager.put(NAMESPACE, key, "This is a Lock.", 2, 0);
if (ResultCode.SUCCESS.equals(code))
return true;
else
return false;
}
public boolean unlock(String key) {
ldbTairManager.invalid(NAMESPACE, key);
}

The above implementation also has several problems:

1. This lock has no expiration time. Once the unlock operation fails, the lock record will always be in tair, and other threads can no longer obtain the lock.
2. This lock can only be non-blocking, and it will return directly regardless of success or failure.
3. This lock is non-reentrant. After a thread acquires the lock, it cannot acquire the lock again before releasing the lock, because the key used already exists in tair. The put operation can no longer be performed.

Of course, there are also ways to solve it.

No expiration time? Tair’s put method supports passing in the expiration time, and the data will be automatically deleted after the time is reached.
Non-blocking? while repeat execution.
Non-reentrant? After a thread acquires the lock, it saves the current host information and thread information, and checks whether it owns the current lock before acquiring it next time.

However, how long should I set the expiration time? How to set the invalidation time is too short, the lock is automatically released before the method is executed, then concurrency problems will occur. If the set time is too long, other threads that acquire the lock may have to wait for a while. This problem also exists when using a database to implement distributed locks

Summary

You can use the cache instead of the database to implement distributed locks. This can provide better performance. At the same time, many cache services are deployed in clusters, which can avoid single Point question. And many cache services provide methods that can be used to implement distributed locks, such as the put method of Tair and the setnx method of redis. Moreover, these cache services also provide support for automatic deletion of data after expiration, and you can directly set the timeout period to control the release of the lock.

The advantages of using cache to implement distributed locks

The performance is good and it is more convenient to implement.

Disadvantages of using cache to implement distributed locks

It is not very reliable to control the expiration time of the lock through the timeout period.

Based on Zookeeper to implement distributed locks

Based on zookeeper temporary ordered nodes can achieve distributed locks.

The general idea is: When each client locks a method, the and This method generates a unique instantaneous ordered node under the directory of the specified node corresponding to this method. The way to determine whether to acquire a lock is very simple, only the one with the smallest sequence number among the ordered nodes is judged. When the lock is released, only the instantaneous node needs to be deleted. At the same time, it can avoid deadlock problems caused by locks that cannot be released due to service downtime.

Let’s see whether Zookeeper can solve the aforementioned problems.

The lock cannot be released? Using Zookeeper can effectively solve the problem that the lock cannot be released, because when the lock is created, the client will create a temporary node in ZK. Once the client acquires the lock, it suddenly hangs up (Session connection is disconnected) , Then this temporary node will be deleted automatically. Other clients can obtain the lock again.
Non-blocking lock? Using Zookeeper can achieve blocking locks. The client can create sequential nodes in ZK and bind listeners on the nodes. Once the node changes, Zookeeper will notify the client, and the client can check the one created by itself. Is the node with the smallest serial number among all the current nodes? If it is, then you have acquired the lock and can execute the business logic.

Not reentrant? Using Zookeeper can also effectively solve the problem of non-reentrancy. When the client creates a node, it writes the host information and thread information of the current client directly to the node, and the next time it wants to acquire the lock, and Just compare the data in the smallest node at present. If the information is the same as your own, then you directly obtain the lock. If it is not the same, you will create a temporary sequence node to participate in the queuing.

Single point question? Using Zookeeper can effectively solve single-point problems. ZK is deployed in a cluster. As long as more than half of the machines in the cluster survive, it can provide services to the outside world.

Can be used directlyzookeeper third-party library CuratorClient, this client A reentrant lock service is encapsulated in the end.

public boolean< /span> tryLock(long timeout, TimeUnit unit) throws InterruptedException {

try {
return interProcessMutex.acquire(timeout, unit);
}
catch (Exception e) {
e.printStackTrace();
}
return true;
}
public boolean unlock() {
try {
interProcessMutex.release();
}
catch (Throwable e) {
log.error(e.getMessage(), e);
}
finally {
executorService.schedule(
new Cleaner(client, path), delayTimeForClean, TimeUnit.MILLISECONDS);
}
return true;
}

InterProcessMutex provided by Curator is an implementation of distributed locks. The acquire method user acquires the lock, and the release method is used to release the lock.

The distributed lock implemented using ZK seems to fully meet all our expectations of a distributed lock at the beginning of this article. However, it is not. The distributed lock implemented by Zookeeper actually has a disadvantage, that is, the performance may not be as high as the cache service. Because each time in the process of creating and releasing locks, instantaneous nodes must be dynamically created and destroyed to achieve the lock function. The creation and deletion of nodes in ZK can only be performed through the Leader server, and then the data cannot be shared on all Follower machines.

In fact, using Zookeeper may also cause concurrency problems, but it is not common. Considering such a situation, due to network jitter, the client can disconnect the session of the ZK cluster. Then zk thinks that the client is down, and will delete the temporary node. At this time, other clients can obtain the distributed lock. It may cause concurrency problems. This problem is not common because zk has a retry mechanism. Once the zk cluster fails to detect the client’s heartbeat, it will retry. The Curator client supports multiple retry strategies. The temporary node will be deleted if it fails after several retries. (Therefore, choosing a suitable retry strategy is also more important. It is necessary to find a balance between lock granularity and concurrency.)

Summary

UseZookeeper to realize the advantages of distributed lock< /strong>

Effectively solve single point problems, non-reentrant problems, non-blocking problems, and locks cannot be released. It is relatively simple to implement.

Disadvantages of using Zookeeper to implement distributed locks

The performance is not as good as using cache to implement distributed locks. Need to understand the principle of ZK.

Comparison of the three options

top There are several ways, none of which can be perfect. Just like CAP, complexity, reliability, performance, etc. cannot be satisfied at the same time. Therefore, it is the kingly way to choose the most suitable one according to different application scenarios.

From the perspective of difficulty of understanding (from low to high)
Database> Cache> Zookeeper

From the perspective of implementation complexity (from low to high)
Zookeeper >= Cache> Database

From a performance perspective (from high to low)
Cache> Zookeeper >= Database

From a reliability perspective (from high to low)
Zookeeper> Cache> Database

https://mp.weixin.qq.com/s/vkvYJnKfQyuUeD_BDQy_1g

For more learning materials, you can add a group: 473984645 or scan The QR code below

share picture

CREATE TABLE `methodLock` (

`id`
int(11) NOT NULL AUTO_INCREMENT COMMENT'primary key' ,
`method_name` varchar(
64) NOT NULL DEFAULT ‘‘ COMMENT ‘Locked method name’,
`desc` varchar(
1024) NOT NULL DEFAULT ‘Remarks’,
`update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT
‘Save data time, automatically generate ‘,
PRIMARY KEY (`id`),
UNIQUE KEY `uidx_method_name` (`method_name `) USING BTREE
) ENGINE
=InnoDB DEFAULT CHARSET=utf8 COMMENT='The method in the lock';

insert into methodLock(method_name,desc) values ​​(' method_name','desc')

delete from methodLock where method_name ='method_name'

public Boolean lock(){

connection.setAutoCommit(
false)
while(true){
try{
result
= select * from methodLock where method_name=xxx for update;
if(result==null){
return true;
}
}
catch(Exception e){
}
sleep(
1000);
}
return false;
}

public void unlock(){

connection.commit();
}

public boolean trylock(String key) {

ResultCode code
= ldbTairManager.put(NAMESPACE, key, "This is a Lock.", 2, 0);
if (ResultCode.SUCCESS.equals(code))
return true;
else
return false;
}
public boolean unlock(String key) {
ldbTairManager.invalid(NAMESPACE, key);
}

public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {

try {
return interProcessMutex.acquire(timeout, unit);
}
catch (Exception e) {
e.printStackTrace();
}
return true;
}
public boolean unlock() {
try {
interProcessMutex.release();
}
catch (Throwable e) {
log.error(e.getMessage(), e);
}
finally {
executorService.schedule(
new Cleaner(client, path), delayTimeForClean, TimeUnit.MILLISECONDS);
}
return true;
}

Leave a Comment

Your email address will not be published.