Dubbo Series (07-4) Cluster Fault Tolerance - Cluster



1. Background introduction

Recommended reading:

Dubbo cluster fault tolerance in practice
Dubbo official website source code analysis: Cluster

In Dubbo's cluster fault tolerance process, the full Invoker list is first obtained from the Directory and then filtered by the Routers according to the routing rules. The surviving Invokers then pass through LoadBalance, which selects the Invoker that is actually called. The previous articles analyzed the basic principles of the service directory, service routing, and load balancing; this article continues with the overall cluster fault tolerance process.

When there are multiple Invokers, which one should the consumer call? And if the call fails, should it retry, throw an exception, or just log the error? To deal with these questions, Dubbo defines the Cluster interface and the Cluster Invoker. The purpose of Cluster is to merge multiple service providers into one Cluster Invoker and expose that Invoker to the service consumer. The consumer then only needs to make remote calls through this Invoker; which provider is actually called, and how a failed call is handled, are left to the cluster module. The cluster module is the middle layer between service providers and service consumers: it hides the providers from the consumers, so that consumers can concentrate on the remote invocation itself, such as sending requests and receiving the data returned by the provider. This is the role of the cluster.

In essence, Cluster wraps multiple Invokers into a single Invoker, hiding internal load balancing and exception handling from consumers, so that consumers do not need to be aware of the cluster's internals at all.

1.1 Cluster fault tolerance

Before analyzing the cluster-related code, it is worth introducing all the components involved in cluster fault tolerance: Cluster, Cluster Invoker, Directory, Router, LoadBalance, and so on.

Figure 1 Dubbo cluster fault tolerance components

The cluster workflow can be divided into two stages:

The first stage happens while the service consumer is being initialized: the Cluster implementation class creates a Cluster Invoker instance for the consumer, which is the merge operation shown in the figure above.
The second stage happens when the service consumer makes a remote call. Take FailoverClusterInvoker as an example. This Cluster Invoker first calls the list method of Directory to enumerate the Invoker list (an Invoker can simply be understood as a service provider). Directory stores Invokers and can loosely be compared to a List. Its implementation class RegistryDirectory is a dynamic service directory that perceives changes in the registry, so the Invoker list it holds changes along with the registry contents. After each change, RegistryDirectory dynamically adds or removes Invokers and calls the route method of the Router to filter out Invokers that do not match the routing rules. Once FailoverClusterInvoker has the Invoker list returned by Directory, it uses LoadBalance to select one Invoker from it. Finally, FailoverClusterInvoker passes the parameters to the invoke method of the selected Invoker instance to make the actual remote call.
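The second stage can be condensed into a minimal, self-contained Java sketch of the list -> route -> load balance -> invoke pipeline. It deliberately does not use Dubbo's API: `ClusterPipeline`, `SimpleInvoker`, the `router` predicate and the random pick are hypothetical stand-ins for Directory, Invoker, Router and LoadBalance, chosen only to show the order of the steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical stand-ins for Directory / Router / LoadBalance / Invoker; NOT Dubbo's real API.
final class ClusterPipeline {

    /** A provider address plus a fake "remote call". */
    static final class SimpleInvoker {
        final String address;

        SimpleInvoker(String address) {
            this.address = address;
        }

        String invoke(String request) {
            return address + " handled " + request;
        }
    }

    private final List<SimpleInvoker> directory = new ArrayList<>(); // plays the role of Directory's invoker list
    private final Predicate<SimpleInvoker> router;                    // plays the role of the routing rules
    private final Random random;                                      // random pick stands in for LoadBalance

    ClusterPipeline(List<SimpleInvoker> providers, Predicate<SimpleInvoker> router, long seed) {
        this.directory.addAll(providers);
        this.router = router;
        this.random = new Random(seed);
    }

    /** Stage 2: list -> route -> load balance -> invoke. */
    String call(String request) {
        List<SimpleInvoker> routed = new ArrayList<>();
        for (SimpleInvoker invoker : directory) { // "Directory.list"
            if (router.test(invoker)) {           // "Router.route": drop providers that fail the rules
                routed.add(invoker);
            }
        }
        if (routed.isEmpty()) {
            throw new IllegalStateException("No provider available for " + request);
        }
        SimpleInvoker chosen = routed.get(random.nextInt(routed.size())); // "LoadBalance.select"
        return chosen.invoke(request);                                      // the actual remote call
    }
}
```

In Dubbo the routing step happens inside `directory.list(...)`, so a Cluster Invoker only ever sees routed Invokers; the sketch keeps the two steps visible side by side for clarity.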

1.2 Fault tolerance strategy

Dubbo provides 9 cluster fault tolerance implementations.

Table 1 Dubbo's 9 cluster fault tolerance strategies

Fault tolerance mechanism	Description
Failover Cluster	Automatic failover. This is Dubbo's default fault tolerance mechanism: when a call fails, it load-balances again and switches to another server. By default it retries 2 times (retries=2), i.e. up to 3 calls in total. Use scenario: reads or idempotent writes. Drawback: retries increase the pressure on downstream service providers.
Failback Cluster	Automatic recovery after failure. A failed request is recorded in a queue and retried periodically by a timer; load balancing is performed. Use scenario: asynchronous or eventually consistent requests.
Failfast Cluster	Fail fast. Load balancing is performed and a single call is made; on failure an exception is thrown immediately, without retrying. Use scenario: non-idempotent operations.
Failsafe Cluster	Fail safe. Load balancing is performed and a single call is made; on failure the exception is ignored and no retry is performed. Use scenario: calls whose outcome does not matter, e.g. writing logs.
Forking Cluster	Calls multiple providers in parallel and returns as soon as one of them succeeds. Use scenario: requests with high real-time requirements.
Broadcast Cluster	Calls every provider; the call fails if any one of them fails. No load balancing is needed. Use scenario: usually for broadcasting after user state has been updated.
Available Cluster	The simplest strategy, with no load balancing: it traverses the provider list and directly calls the first available provider. If no provider is available, an exception is thrown.
Mock Cluster	Wraps the other Cluster Invokers to add mock behavior (service degradation) when the mock parameter is configured.
Mergeable Cluster	Merges the results returned by multiple nodes.
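To make the differences between several of these strategies concrete, here is a minimal Java sketch of the Failfast, Failsafe, Available and Broadcast behaviors. It is a simplification under stated assumptions: `ToleranceStrategies` and `Provider` are hypothetical types, a provider is just an availability flag plus a `Supplier` that may throw, load balancing is omitted, and Failsafe's "empty result" is modeled as `null`.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical simplification of four fault tolerance strategies; not Dubbo's classes.
final class ToleranceStrategies {

    /** Hypothetical provider: an availability flag plus a call that may throw. */
    static final class Provider {
        final boolean available;
        final Supplier<String> call;

        Provider(boolean available, Supplier<String> call) {
            this.available = available;
            this.call = call;
        }
    }

    /** Failfast: one attempt; any exception propagates immediately. */
    static String failfast(Provider p) {
        return p.call.get();
    }

    /** Failsafe: one attempt; the exception is swallowed and an empty result (null here) is returned. */
    static String failsafe(Provider p) {
        try {
            return p.call.get();
        } catch (RuntimeException e) {
            return null;
        }
    }

    /** Available: no load balancing, call the first provider that reports itself available. */
    static String available(List<Provider> providers) {
        for (Provider p : providers) {
            if (p.available) {
                return p.call.get();
            }
        }
        throw new IllegalStateException("No provider available");
    }

    /** Broadcast: call every provider; if any call failed, rethrow the last failure at the end. */
    static void broadcast(List<Provider> providers) {
        RuntimeException last = null;
        for (Provider p : providers) {
            try {
                p.call.get();
            } catch (RuntimeException e) {
                last = e; // keep calling the remaining providers
            }
        }
        if (last != null) {
            throw last;
        }
    }
}
```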

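For usage, see Dubbo cluster fault tolerance in practice. The strategy is chosen by setting the cluster attribute on the <dubbo:service> or <dubbo:reference> tag, for example: <dubbo:service cluster="failsafe" /> or <dubbo:reference cluster="failsafe" />.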

2. Cluster structure

2.1 Cluster inheritance system

Figure 1 Dubbo Cluster inheritance hierarchy

Summary: like the service routing interface, Cluster is an SPI interface. It acts as a factory whose job is to create a concrete ClusterInvoker. The class diagram above shows only some of the Cluster implementations. The Cluster interface is defined as follows:

@SPI(FailoverCluster.NAME)
public interface Cluster {
    // Merge the Invokers of a Directory into a single Cluster Invoker
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;
}

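Dubbo's default cluster fault tolerance strategy is FailoverCluster, a failover strategy: when a call to one server fails, it switches to another server and retries. By default Dubbo retries 2 times (retries=2), i.e. up to 3 calls in total. Each Cluster implementation simply wraps the Directory in its own Cluster Invoker, as FailbackCluster shows:

public class FailbackCluster implements Cluster {
    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        // The Cluster is only a factory; the fault tolerance logic lives in the Invoker
        return new FailbackClusterInvoker<T>(directory);
    }
}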

2.2 AbstractClusterInvoker

As the figure above shows, every concrete cluster fault tolerance implementation extends the abstract class AbstractClusterInvoker, which mainly does two things:

It implements the Invoker interface and provides a common, abstract implementation of the Invoker#invoke method.
It implements common load balancing selection logic on top of LoadBalance.

2.2.1 invoke execution

Let's first guess how the Invoker executes. Before execution, it must obtain the list of all service invokers, then use the load balancing algorithm to pick the specific Invoker, and finally execute it. How a failed call is handled is left to the concrete cluster fault tolerance subclass.

@Override
public Result invoke(final Invocation invocation) throws RpcException {
    checkWhetherDestroyed();

    // 1. Bind the attachments in RpcContext to the invocation
    Map<String, String> contextAttachments = RpcContext.getContext().getAttachments();
    if (contextAttachments != null && contextAttachments.size() != 0) {
        ((RpcInvocation) invocation).addAttachments(contextAttachments);
    }

    // 2. List all Invokers through the Directory
    List<Invoker<T>> invokers = list(invocation);
    // 3. Load the LoadBalance extension
    LoadBalance loadbalance = initLoadBalance(invokers, invocation);
    RpcUtils.attachInvocationIdIfAsync(getUrl(), invocation);
    // 4. Call doInvoke (implemented by subclasses) for the follow-up work
    return doInvoke(invocation, invokers, loadbalance);
}

Summary: Dubbo's implementation is almost exactly as guessed, except that it also binds the attachments parameter. It then obtains all invokers through the Directory, initializes the LoadBalance, and delegates the concrete execution logic to the subclass.

Note: when list(invocation) calls directory.list(invocation), the result has already been filtered by the routing rules, so only the load balancing step remains.
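The structure described above is the template method pattern: invoke fixes the skeleton and doInvoke is the hook each strategy fills in. The following is a minimal, hypothetical Java sketch of that shape only; `MiniClusterInvoker` and `MiniFailfastInvoker` are invented names, invokers are modeled as `Function<String, String>`, and attachments and LoadBalance are omitted (the Failfast-like subclass simply uses the first invoker).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, heavily simplified model of the AbstractClusterInvoker template method.
abstract class MiniClusterInvoker {

    /** Stand-in for the Directory; assumed to be already filtered by routing rules. */
    private final List<Function<String, String>> directory;

    MiniClusterInvoker(List<Function<String, String>> directory) {
        this.directory = new ArrayList<>(directory);
    }

    /** Fixed skeleton: list the invokers, then delegate the fault tolerance policy to doInvoke. */
    final String invoke(String invocation) {
        List<Function<String, String>> invokers = list(invocation);
        return doInvoke(invocation, invokers);
    }

    /** Stand-in for directory.list(invocation). */
    List<Function<String, String>> list(String invocation) {
        return new ArrayList<>(directory);
    }

    /** Each subclass supplies its own failure handling here. */
    abstract String doInvoke(String invocation, List<Function<String, String>> invokers);
}

/** Failfast-like subclass: one call, any exception propagates (load balancing omitted). */
final class MiniFailfastInvoker extends MiniClusterInvoker {

    MiniFailfastInvoker(List<Function<String, String>> directory) {
        super(directory);
    }

    @Override
    String doInvoke(String invocation, List<Function<String, String>> invokers) {
        if (invokers.isEmpty()) {
            throw new IllegalStateException("No invoker available");
        }
        return invokers.get(0).apply(invocation);
    }
}
```

Making invoke final keeps the common steps identical for every strategy, while each subclass only decides what happens when the call fails.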

2.2.2 Load balancing

AbstractClusterInvoker does not perform load balancing by calling Invoker invoker = loadbalance.select(invokers, getUrl(), invocation) directly; instead it wraps this call further:

If sticky connections are enabled, the last used Invoker is cached and called directly as long as it is available, so load balancing is no longer needed.
If the selected service has already been tried or is unavailable, load balancing must be performed again, excluding the services that have already been tried.

Figure 2 Dubbo cluster load balancing call process

AbstractClusterInvoker --[sticky connection]--> select --[loadbalance.select]--> doSelect --[retry]--> reselect

Summary: when AbstractClusterInvoker performs load balancing, three methods are involved:

select: handles sticky connections; if no usable sticky Invoker is cached, it calls doSelect.
doSelect: calls the load balancing algorithm loadbalance.select.
reselect: when the service selected by doSelect has already been tried or is unavailable, performs load balancing again.

(1) Sticky connections

The select method mainly handles sticky connections. It takes four parameters: the first is the load balancing algorithm, the second is the invocation, the third is the list of all available services, and the fourth is the list of services already selected (tried) in earlier attempts.

protected Invoker<T> select(LoadBalance loadbalance, Invocation invocation,
                            List<Invoker<T>> invokers, List<Invoker<T>> selected) throws RpcException {
    // stickyInvoker and availablecheck are fields of AbstractClusterInvoker
    if (CollectionUtils.isEmpty(invokers)) {
        return null;
    }
    // 1. Get the name of the called method
    String methodName = invocation == null ? StringUtils.EMPTY : invocation.getMethodName();

    // 2. Read the sticky configuration. A sticky connection makes the consumer call
    //    the same provider as much as possible, switching only when that provider goes down
    boolean sticky = invokers.get(0).getUrl()
            .getMethodParameter(methodName, CLUSTER_STICKY_KEY, DEFAULT_CLUSTER_STICKY);

    // 3. If the invokers list no longer contains stickyInvoker, the provider it represents
    //    has gone down, so clear the cached value
    if (stickyInvoker != null && !invokers.contains(stickyInvoker)) {
        stickyInvoker = null;
    }

    // 4. For a sticky connection, check whether the service has already been tried in this call.
    //    sticky && stickyInvoker != null means a sticky connection is in use;
    //    (selected == null || !selected.contains(stickyInvoker)) means it has not been tried yet
    if (sticky && stickyInvoker != null && (selected == null || !selected.contains(stickyInvoker))) {
        // availablecheck=true means the availability of the service is checked every time
        if (availablecheck && stickyInvoker.isAvailable()) {
            return stickyInvoker;
        }
    }

    // 5. Reaching this point means stickyInvoker is null or unusable,
    //    so call doSelect to choose an Invoker
    Invoker<T> invoker = doSelect(loadbalance, invocation, invokers, selected);

    // 6. If sticky=true, cache the Invoker chosen by the load balancing component
    if (sticky) {
        stickyInvoker = invoker;
    }
    return invoker;
}

Summary: select mainly handles sticky connections. If sticky connections are enabled and the cached stickyInvoker has not been tried yet and passes the availability check, it is returned directly. Otherwise, doSelect is called for load balancing.

< p>Summary: You can see that select mainly deals with sticky connections. If sticky connections are turned on and the service is available, the stickyInvoker will be returned directly. Otherwise, doSelect is called for load balancing.
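The sticky logic of select can be isolated in a small, self-contained Java sketch. This is an illustration under stated assumptions, not Dubbo code: `StickySelector` is a hypothetical class, availability is a `Predicate`, and a plain `Function<List<I>, I>` stands in for doSelect/LoadBalance.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the sticky-connection part of AbstractClusterInvoker#select.
final class StickySelector<I> {

    private final boolean sticky;                   // like the "sticky" URL parameter
    private final boolean availableCheck;           // like the availablecheck field
    private final Predicate<I> isAvailable;         // like Invoker#isAvailable
    private final Function<List<I>, I> loadBalance; // stands in for doSelect / LoadBalance
    private I stickyInvoker;                        // last invoker chosen while sticky is on

    StickySelector(boolean sticky, boolean availableCheck,
                   Predicate<I> isAvailable, Function<List<I>, I> loadBalance) {
        this.sticky = sticky;
        this.availableCheck = availableCheck;
        this.isAvailable = isAvailable;
        this.loadBalance = loadBalance;
    }

    I select(List<I> invokers, List<I> selected) {
        if (invokers == null || invokers.isEmpty()) {
            return null;
        }
        // The cached provider is no longer in the list (e.g. it went offline): forget it.
        if (stickyInvoker != null && !invokers.contains(stickyInvoker)) {
            stickyInvoker = null;
        }
        // Reuse the cached provider if it has not been tried in this call and passes the check.
        if (sticky && stickyInvoker != null && (selected == null || !selected.contains(stickyInvoker))) {
            if (availableCheck && isAvailable.test(stickyInvoker)) {
                return stickyInvoker;
            }
        }
        // Otherwise load-balance, and remember the result when sticky is on.
        I invoker = loadBalance.apply(invokers);
        if (sticky) {
            stickyInvoker = invoker;
        }
        return invoker;
    }
}
```

With sticky enabled and a round-robin style selector, repeated calls keep returning the first provider chosen; only when that provider appears in `selected` (it was already tried) does load balancing run again.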

(2) Load balancing

private Invoker<T> doSelect(LoadBalance loadbalance, Invocation invocation,
                            List<Invoker<T>> invokers, List<Invoker<T>> selected) throws RpcException {
    // 1. Decide whether load balancing is needed at all
    if (CollectionUtils.isEmpty(invokers)) {
        return null;
    }
    if (invokers.size() == 1) {
        return invokers.get(0);
    }

    // 2. Select an Invoker through the load balancing component
    Invoker<T> invoker = loadbalance.select(invokers, getUrl(), invocation);

    // 3. If the selected Invoker has already been tried or is unavailable, reselect
    if ((selected != null && selected.contains(invoker))
            || (!invoker.isAvailable() && getUrl() != null && availablecheck)) {
        try {
            // 3.1 Reselect
            Invoker<T> rInvoker = reselect(loadbalance, invocation, invokers, selected, availablecheck);
            if (rInvoker != null) {
                // 3.2 The reselected rInvoker is not null: use it
                invoker = rInvoker;
            } else {
                // 3.3 rInvoker is null: take the next invoker after the one chosen by load balancing.
                //     This can also be seen as part of the reselection logic
                int index = invokers.indexOf(invoker);
                try {
                    invoker = invokers.get((index + 1) % invokers.size());
                } catch (Exception e) {
                    logger.warn(e.getMessage() + " may because invokers list dynamic change, ignore.", e);
                }
            }
        } catch (Throwable t) {
            // Reselection failed: keep the invoker chosen by load balancing (the Dubbo source logs the error here)
        }
    }
    return invoker;
}

Summary: doSelect mainly does two things. First, it selects an Invoker through the load balancing component. Second, if the selected Invoker has already been tried or is unavailable, it calls reselect to choose again. If reselect returns null, doSelect locates the index of the load-balanced invoker in the invokers list and takes the invoker at (index + 1) % invokers.size(); this can also be regarded as part of the reselection logic. Next, let's look at the reselect method.
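The decision flow of doSelect, including the (index + 1) % size fallback, can be checked with the minimal Java sketch below. It is a simplified model under stated assumptions: `DoSelectSketch` is hypothetical, LoadBalance and reselect are passed in as plain functions, and the `getUrl() != null` check from the real code is omitted.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of AbstractClusterInvoker#doSelect with simplified stand-in types.
final class DoSelectSketch {

    static <I> I doSelect(List<I> invokers, List<I> selected, boolean availableCheck,
                          Predicate<I> isAvailable,
                          Function<List<I>, I> loadBalance,
                          BiFunction<List<I>, List<I>, I> reselect) {
        if (invokers == null || invokers.isEmpty()) {
            return null;
        }
        if (invokers.size() == 1) {
            return invokers.get(0); // nothing to balance
        }
        I invoker = loadBalance.apply(invokers);
        boolean alreadyTried = selected != null && selected.contains(invoker);
        boolean unavailable = availableCheck && !isAvailable.test(invoker);
        if (alreadyTried || unavailable) {
            I reselected = reselect.apply(invokers, selected);
            if (reselected != null) {
                invoker = reselected;
            } else {
                // Fallback: the neighbour of the load-balanced choice, wrapping around the list.
                int index = invokers.indexOf(invoker);
                invoker = invokers.get((index + 1) % invokers.size());
            }
        }
        return invoker;
    }
}
```

For example, with invokers [A, B, C], a load balancer that keeps returning C, C already tried and reselect returning null, the fallback picks index (2 + 1) % 3 = 0, i.e. A.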

(3) Reselect

reselect performs load balancing again. It first load-balances over the available invokers that have not been tried yet; if every service has already been tried, it picks out the services that are still available among the tried ones and load-balances over those.

private Invoker<T> reselect(LoadBalance loadbalance, Invocation invocation,
                            List<Invoker<T>> invokers, List<Invoker<T>> selected,
                            boolean availablecheck) throws RpcException {

    List<Invoker<T>> reselectInvokers = new ArrayList<>(
            invokers.size() > 1 ? (invokers.size() - 1) : invokers.size());

    // 1. Collect the available invokers that are not in the selected list and load-balance over them
    for (Invoker<T> invoker : invokers) {
        if (availablecheck && !invoker.isAvailable()) {
            continue;
        }

        if (selected == null || !selected.contains(invoker)) {
            reselectInvokers.add(invoker);
        }
    }

    // reselectInvokers is not empty: select through the load balancing component
    if (!reselectInvokers.isEmpty()) {
        return loadbalance.select(reselectInvokers, getUrl(), invocation);
    }

    // 2. Load-balance again, this time only over the already selected invokers that are still available
    if (selected != null) {
        for (Invoker<T> invoker : selected) {
            if (invoker.isAvailable() && !reselectInvokers.contains(invoker)) {
                reselectInvokers.add(invoker);
            }
        }
    }
    if (!reselectInvokers.isEmpty()) {
        return loadbalance.select(reselectInvokers, getUrl(), invocation);
    }

    return null;
}

Summary: reselect also performs fault tolerance handling. The code can be divided into two parts:

Part 1: load-balance again among the services that have not been tried yet.
Part 2: if every service has already been tried, filter the still-available services among the tried ones and load-balance again.
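The two parts of reselect can be reproduced in a short, self-contained Java sketch. As before, this is a hypothetical model rather than Dubbo code: `ReselectSketch` is an invented name, availability is a `Predicate`, and a `Function<List<I>, I>` stands in for LoadBalance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of AbstractClusterInvoker#reselect with simplified stand-in types.
final class ReselectSketch {

    static <I> I reselect(List<I> invokers, List<I> selected, boolean availableCheck,
                          Predicate<I> isAvailable, Function<List<I>, I> loadBalance) {
        List<I> candidates = new ArrayList<>(invokers.size() > 1 ? invokers.size() - 1 : invokers.size());

        // Part 1: available invokers that have not been tried yet.
        for (I invoker : invokers) {
            if (availableCheck && !isAvailable.test(invoker)) {
                continue;
            }
            if (selected == null || !selected.contains(invoker)) {
                candidates.add(invoker);
            }
        }
        if (!candidates.isEmpty()) {
            return loadBalance.apply(candidates);
        }

        // Part 2: everything was tried; fall back to tried invokers that are still available.
        if (selected != null) {
            for (I invoker : selected) {
                if (isAvailable.test(invoker) && !candidates.contains(invoker)) {
                    candidates.add(invoker);
                }
            }
        }
        if (!candidates.isEmpty()) {
            return loadBalance.apply(candidates);
        }
        return null; // nothing usable: doSelect falls back to the (index + 1) neighbour
    }
}
```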

3. Cluster fault tolerance

3.1 FailoverClusterInvoker

FailoverClusterInvoker implements failover: when a call fails, it automatically switches to another Invoker and retries. Under the default configuration, Dubbo uses this class as the Cluster Invoker. Let's look at its logic.

public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers,
                       LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyInvokers = invokers;
    checkInvokers(copyInvokers, invocation);
    // 1. Get parameters such as the number of retries (total attempts = retries + 1)
    String methodName = RpcUtils.getMethodName(invocation);
    int len = getUrl().getMethodParameter(methodName, RETRIES_KEY, DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }
    RpcException le = null; // last exception
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyInvokers.size());
    Set<String> providers = new HashSet<String>(len);
    // 2. Call in a loop and retry on failure; by default retries = 2, i.e. up to 3 attempts
    for (int i = 0; i < len; i++) {
        // 3. The list for the first attempt was already checked above. On a retry,
        //    fetch the latest service list again and check that it is not empty
        if (i > 0) {
            checkWhetherDestroyed();
            copyInvokers = list(invocation);
            checkInvokers(copyInvokers, invocation);
        }
        // 4. Core code: select an Invoker through load balancing
        Invoker<T> invoker = select(loadbalance, invocation, copyInvokers, invoked);
        // 5. Add the chosen Invoker to the invoked list so it is filtered out on the next retry
        invoked.add(invoker);
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            // 6. Core code: call the invoke method of the target Invoker
            Result result = invoker.invoke(invocation);
            return result;
        } catch (RpcException e) {
            if (e.isBiz()) { // business exception: rethrow without retrying
                throw e;
            }
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    }
    // 7. All attempts failed: throw an exception
    throw new RpcException(le);
}

Summary: with the groundwork above, the code of FailoverClusterInvoker is easy to follow. Whenever a call fails with a non-business exception, it calls select(loadbalance, invocation, copyInvokers, invoked) again to pick another Invoker and retries. By default Dubbo retries 2 times, i.e. up to 3 calls in total; business exceptions (e.isBiz()) are rethrown immediately without retrying.
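The retry loop can be exercised without Dubbo using the minimal Java sketch below. It models only the control flow under stated assumptions: `FailoverSketch` and `BizException` are hypothetical (the latter stands in for an RpcException whose isBiz() returns true), invokers are `Function<String, String>`, and an in-order "pick one not tried yet" replaces select/LoadBalance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the FailoverClusterInvoker#doInvoke retry loop; not Dubbo's API.
final class FailoverSketch {

    /** Marks a business exception, which is rethrown without retrying (like RpcException#isBiz). */
    static final class BizException extends RuntimeException {
        BizException(String message) {
            super(message);
        }
    }

    static String invoke(List<Function<String, String>> invokers, int retries, String request) {
        if (invokers.isEmpty()) {
            throw new IllegalStateException("No provider available");
        }
        int len = retries + 1; // retries = 2 (Dubbo's default) means up to 3 attempts
        if (len <= 0) {
            len = 1;
        }
        RuntimeException last = null;
        List<Function<String, String>> invoked = new ArrayList<>();
        for (int i = 0; i < len; i++) {
            Function<String, String> invoker = pick(invokers, invoked);
            invoked.add(invoker);
            try {
                return invoker.apply(request);
            } catch (BizException e) {
                throw e;  // business errors are not retried
            } catch (RuntimeException e) {
                last = e; // remember the failure and try another provider
            }
        }
        throw new IllegalStateException("Failed after " + len + " attempts", last);
    }

    /** Prefers an invoker not tried yet, loosely mirroring select(..., invoked). */
    private static Function<String, String> pick(List<Function<String, String>> invokers,
                                                 List<Function<String, String>> invoked) {
        for (Function<String, String> f : invokers) {
            if (!invoked.contains(f)) {
                return f;
            }
        }
        return invokers.get(0); // everything tried: reuse one, as reselect's second part would
    }
}
```

With two failing providers followed by a healthy one and retries = 2, the third attempt succeeds; with retries = 1 only two attempts are made and the call fails.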

