Foreword: Many of you have probably seen flash-sale ("seckill") systems in action, such as the flash sales on JD or Taobao, or Xiaomi's phone launches. So how is the backend of a seckill system implemented? How do we design one? What issues must it take into account, and how do we make it robust? In this post, let's walk through these questions:
Blog directory
One: Issues the seckill system should consider
Two: Seckill system design and technical solutions
Three: Summary
One: Issues the seckill system should consider
1.1: Overselling
The most important point in the seckill business scenario is overselling: if there are only 100 items in stock but 200 end up sold, the company loses money, since seckill prices are usually very low. Overselling directly damages the company's financial interests, so preventing it is the first problem to solve.
1.2: High concurrency
A seckill is short in duration and huge in concurrency: the sale itself lasts only a few minutes, but to create buzz companies usually offer a very low price, so a large number of users rush to buy at the same time. A flood of requests arrives within a short window, and we must consider how to keep the backend from suffering cache breakdown or invalidation under the load and letting the traffic crush the database.
1.3: Interface abuse (anti-bot)
Most seckills nowadays attract dedicated bot software that simulates clients and fires requests at the backend server continuously; hundreds of requests per second from a single bot are common. How to reject the repeated, invalid requests from such software and throttle the continuous ones also needs targeted consideration.
1.4: Seckill URL
An ordinary user only sees a simple seckill page: before the start time the buy button is grayed out, and once the time arrives it becomes clickable. That is enough for novice users. But a user with a little technical knowledge can press F12, watch the browser's network panel to find the seckill URL, and trigger the purchase through a script. And anyone who knows the URL in advance can fire a request the instant the sale starts and win every time. We need to solve this problem.
1.5: Database Design
A seckill risks overwhelming our servers. If it shares a database with our other business systems, they become coupled, and the seckill load is very likely to drag down and affect those other businesses. How do we prevent this? Even if the seckill crashes or its servers stall, the normal online business should be affected as little as possible.
1.6: Massive request volume
Following from 1.2, even with caching it is not trivial to absorb a short burst of high-concurrency traffic. Carrying such a huge volume of visits while keeping the service stable and low-latency is a major challenge. Let's do some quick math: with Redis caching, a single Redis server can sustain roughly 40K QPS. If the seckill attracts enough users, peak QPS can reach several hundred thousand, so a single Redis instance still cannot support the request volume: the cache would be penetrated, traffic would hit the DB directly, MySQL would be crushed, and the backend would throw a flood of errors.
Two: Seckill system design and technical solutions
2.1: Design of the seckill database
For the database problem raised in 1.5, a separate seckill database should be designed so that the highly concurrent seckill traffic cannot drag down the entire website. Only two tables are needed here: a seckill order table and a seckill goods table.
In practice there would be a few more tables. A goods table, joined on goods_id, holds the full product information: image, name, regular price, seckill price, and so on. A user table, keyed by user_id, holds the user's nickname, mobile number, delivery address, and other details. Concrete examples of these are omitted here.
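As a rough illustration, the two seckill tables might look like the following sketch (the table and column names are assumptions for this post, not a prescribed schema; `miaosha_goods` matches the table name used in the SQL example in 2.6):

```sql
-- seckill goods table: tracks the discounted price and remaining stock
CREATE TABLE miaosha_goods (
    id            BIGINT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
    goods_id      BIGINT         NOT NULL COMMENT 'references the main goods table',
    miaosha_price DECIMAL(10, 2) NOT NULL,
    stock         INT            NOT NULL,
    version       INT            NOT NULL DEFAULT 0 COMMENT 'optimistic-lock version',
    start_time    DATETIME       NOT NULL,
    end_time      DATETIME       NOT NULL
);

-- seckill order table: one row per successful seckill order
CREATE TABLE miaosha_order (
    id       BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id  BIGINT NOT NULL,
    goods_id BIGINT NOT NULL,
    order_id BIGINT NOT NULL COMMENT 'references the main order table',
    -- at most one seckill order per user per item
    UNIQUE KEY uk_user_goods (user_id, goods_id)
);
```

The unique key on (user_id, goods_id) is a common extra safety net against one user placing duplicate orders, on top of the application-level checks described below.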
2.2: Design of the seckill URL
To prevent people with scripting experience from hitting the backend order interface directly via its URL, we make the seckill URL dynamic: even the developers of the system cannot know the URL before the seckill starts. The specific approach is to hash a random string with MD5 and use the digest as part of the seckill URL; the front end first asks the backend for the concrete URL, and the seckill proceeds only after the backend verifies it.
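A minimal sketch of this idea (class and method names are assumptions, not from the original post): the server generates a random MD5 path segment per user and item, remembers it, and verifies it when the order request arrives. In production the token would typically live in Redis with an expiry rather than in a local map.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SeckillUrl {
    // stand-in for Redis: "userId:goodsId" -> expected path token
    private final Map<String, String> tokens = new ConcurrentHashMap<>();

    // step 1: the front end calls this at seckill time to obtain the dynamic path
    public String createPath(long userId, long goodsId) {
        String token = md5(UUID.randomUUID().toString());
        tokens.put(userId + ":" + goodsId, token);
        return token;
    }

    // step 2: the order endpoint (e.g. /seckill/{path}/order) verifies the path first
    public boolean checkPath(long userId, long goodsId, String path) {
        return path != null && path.equals(tokens.get(userId + ":" + goodsId));
    }

    private static String md5(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the token is random and issued only when the sale opens, knowing last week's URL (or reading the code) gives an attacker nothing.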
2.3: Static seckill page
All product descriptions, parameters, transaction records, images, reviews, and so on are written into a static page, so user requests need not touch the backend server or the database; the page is served directly to the client, which reduces server pressure as much as possible. Concretely, the FreeMarker template engine can be used to create a page template, fill in the data, and render the page.
2.4: Upgrade single Redis to a Redis cluster
A seckill is a read-heavy, write-light scenario, which suits Redis caching well. But considering the risk of cache breakdown, we should build a Redis cluster and adopt sentinel mode to improve Redis's performance and availability.
2.5: Use nginx
nginx is a high-performance web server whose concurrency capacity reaches tens of thousands of connections, while a single Tomcat handles only a few hundred. Having nginx accept client requests and distribute them across a cluster of backend Tomcat servers greatly improves concurrency.
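A minimal sketch of such a setup (the server addresses are placeholders): nginx terminates client connections and load-balances across several Tomcat instances.

```nginx
upstream tomcat_cluster {
    # placeholder backend addresses; add weights for heterogeneous machines
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
    server 192.168.1.13:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://tomcat_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```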
2.6: Streamline SQL
A typical scenario is deducting inventory. The traditional approach queries the stock first and then updates it, which takes two SQL statements; in fact one statement is enough:
update miaosha_goods set stock = stock - 1, version = version + 1 where goods_id = #{goods_id} and version = #{version} and stock > 0;
This guarantees the stock cannot be oversold while updating it in a single statement. Note that an optimistic lock with a version number is used here, which performs better than a pessimistic lock.
2.7: Redis pre-deduction of inventory
Many incoming requests need to query the stock, a read-heavy operation, so we can pre-deduct the inventory in Redis. Before the seckill starts, set the value in Redis, e.g. redis.set(goodsId, 100), where 100 is the pre-released stock (the value can be a constant). For each order, read Integer stock = (Integer) redis.get(goodsId); if the value is greater than zero, decrement it by 1. Note that when an order is cancelled the stock must be added back, and when adding it back you must not exceed the preset total. Checking the stock and deducting it must be an atomic operation, which a Lua script can provide. Subsequent stock lookups when placing orders can then go straight to Redis instead of the database.
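The check-and-decrement must be atomic. Below is a minimal in-memory sketch of that logic (an AtomicInteger stands in for Redis, and the compare-and-set loop plays the role of the Lua script; in production the same check-and-decrement would run inside Redis via EVAL):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PreStock {
    // goodsId -> remaining pre-released stock (stand-in for Redis keys)
    private final Map<Long, AtomicInteger> stock = new ConcurrentHashMap<>();

    // before the seckill starts: the equivalent of redis.set(goodsId, 100)
    public void preload(long goodsId, int amount) {
        stock.put(goodsId, new AtomicInteger(amount));
    }

    // atomic check-and-decrement; in Redis this would be a Lua script like:
    //   if tonumber(redis.call('GET', KEYS[1])) > 0
    //   then return redis.call('DECR', KEYS[1]) else return -1 end
    public boolean tryDecrement(long goodsId) {
        AtomicInteger s = stock.get(goodsId);
        if (s == null) {
            return false;
        }
        while (true) {
            int current = s.get();
            if (current <= 0) {
                return false; // sold out: reject before touching the DB
            }
            if (s.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }

    // order cancelled: add the stock back, but never above the preset total
    public void restore(long goodsId, int total) {
        AtomicInteger s = stock.get(goodsId);
        if (s != null) {
            s.updateAndGet(v -> Math.min(v + 1, total));
        }
    }
}
```

Requests that fail tryDecrement can be rejected immediately, so only at most `stock` requests ever reach the database.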
2.8: Interface rate limiting
The ultimate essence of a seckill is a database update, but a large share of requests are invalid. What we really need to do is filter those invalid requests out so they never penetrate to the database. Rate limiting has to be applied at several levels:
2.8.1: Front-end rate limiting
The first layer is the front end: after the user clicks the seckill button a request is sent, and the button is then disabled for the next 5 seconds. This small measure costs little to implement but is very effective.
2.8.2: Reject repeated requests from the same user within N seconds
The concrete number of seconds depends on the business and the seckill volume; 10 seconds is a common limit. The technique is Redis's key-expiry mechanism. For each request, read String value = redis.get(userId); if the value is empty or null, the request is valid and is let through, and we then call redis.setex(userId, 10, value) so that the key expires after 10 seconds (after which its value automatically becomes null again). The value can be anything, though it is usually best to store some business attribute. If the value is not empty, the request is a repeat and is simply discarded.
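A minimal in-memory sketch of this dedup rule (a timestamp map stands in for a Redis key with a TTL; the class name is illustrative). Note this sketch's check-then-put is not race-free across threads, which is exactly why the real thing uses a single Redis command:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RequestDedup {
    private final long windowMillis;
    // userId -> time of the last accepted request (stand-in for a Redis key with TTL)
    private final Map<Long, Long> lastAccepted = new ConcurrentHashMap<>();

    public RequestDedup(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // returns true if the request should be let through, false if it is a repeat
    public boolean allow(long userId, long nowMillis) {
        Long last = lastAccepted.get(userId);
        if (last != null && nowMillis - last < windowMillis) {
            return false; // same user inside the window: discard
        }
        lastAccepted.put(userId, nowMillis); // the redis.setex(userId, 10, value) step
        return true;
    }
}
```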
2.8.3: Token-bucket rate limiting
There are many strategies for interface rate limiting; here we use the token-bucket algorithm. Its basic idea is that every request tries to obtain a token, and the backend only processes requests that hold one; we can limit how fast tokens are produced. Guava provides a RateLimiter API for this. Below is a simple example; note that the guava dependency must be added.
import com.google.common.util.concurrent.RateLimiter;

public class TestRateLimiter {
    public static void main(String[] args) {
        // produce 1 token per second
        final RateLimiter rateLimiter = RateLimiter.create(1);
        for (int i = 0; i < 10; i++) {
            // acquire() blocks the thread until a token is available
            double waitTime = rateLimiter.acquire();
            System.out.println("Task " + i + " waited " + waitTime + "s");
        }
        System.out.println("End of execution");
    }
}
The idea of the code above: the RateLimiter restricts our token bucket to producing 1 token per second (a deliberately low rate), and the task loop runs 10 times. acquire() blocks the current thread until a token is obtained, i.e. a task that cannot get a token waits indefinitely, so each request is held for the configured interval before it can proceed. The method returns how long the thread actually waited. The output looks as follows:
You can see that the first task does not wait, because a token was already produced in the first second; each subsequent task must wait until the bucket generates another token before it can continue, blocking in the meantime (the output pauses visibly). But this behavior is not ideal for a seckill: if many client requests pile up, they all stall waiting for token production (a poor user experience), and no task is ever abandoned. We need a better strategy: if a token is not obtained within a certain time, reject the task outright. Here is another example:
import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.RateLimiter;

public class TestRateLimiter2 {
    public static void main(String[] args) {
        final RateLimiter rateLimiter = RateLimiter.create(1);
        for (int i = 0; i < 10; i++) {
            // wait at most 500 ms for a token, then give up
            boolean isValid = rateLimiter.tryAcquire(500, TimeUnit.MILLISECONDS);
            System.out.println("Task " + i + " valid: " + isValid);
            if (!isValid) {
                continue;
            }
            System.out.println("Task " + i + " executing");
        }
        System.out.println("End");
    }
}
Here the tryAcquire method is used. It takes a timeout: if the limiter estimates that a token can be obtained within that time (note: it estimates rather than actually waiting), it returns true, otherwise false, and we let the invalid tasks skip straight past. We produce 1 token per second and give each task up to 0.5 seconds to obtain one; if it cannot, the task is skipped (in a seckill setting, the request is simply discarded). The program runs as follows:
Only the first task obtained a token and executed successfully; the rest were essentially all discarded, because within their 0.5-second window the bucket (which produces 1 token per second) had not yet generated a new token, so tryAcquire returned false.
2.9: Asynchronous order placement
To improve ordering throughput and keep a failure in the order service from cascading, the order operation should be processed asynchronously. The most common approach is a queue, whose three most notable advantages are asynchrony, peak shaving, and decoupling. RabbitMQ can be used here. After rate limiting and the stock check, valid requests flow into this step and are sent to the queue; a consumer receives each message and places the order asynchronously. Once the order is persisted successfully, the user can be notified of the seckill result by SMS; if it fails, a compensation mechanism can retry.
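A minimal in-memory sketch of this producer/consumer shape (a BlockingQueue stands in for RabbitMQ, and the class and message format are illustrative; a real consumer would insert the order row and send the SMS):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncOrderSketch {
    // stands in for a RabbitMQ queue
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // called after rate limiting and the stock check pass:
    // just enqueue the order message and return to the user immediately
    public boolean submitOrder(long userId, long goodsId) {
        return queue.offer(userId + ":" + goodsId);
    }

    // consumer side: take the next message, or null if none arrives in time;
    // the real consumer would persist the order here and notify the user
    public String processNext() {
        try {
            return queue.poll(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

The user-facing request returns as soon as the message is enqueued, so the slow database write happens off the critical path, and a burst of orders is absorbed by the queue instead of by MySQL.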
2.10: Service degradation
If a server goes down during the seckill, or the service becomes unavailable, backup measures are needed. An earlier post introduced service circuit breaking and degradation with Hystrix; a fallback service can be developed so that if the servers really do fail, the user receives a friendly prompt rather than a frozen page or a blunt server error.
Three: Summary
Seckill flow chart:
This is the seckill flow I designed. Of course, different seckill scales call for different technology choices. This design can support traffic in the hundreds of thousands; at tens of millions or more it would have to be redesigned, for example by sharding the database into multiple databases and tables, switching the queue to Kafka, and enlarging the Redis cluster. The main purpose of this design is to show how to approach high concurrency and start solving it. Thinking and practicing more at work raises our skill level, so keep at it! If there are any errors in this post, please point them out; it is greatly appreciated.