Hands-on: Timeout and Retry Mechanisms of the Spring Cloud Load-Balancing Components LoadBalancer and Ribbon in Microservices

Author: 筋斗云

1. Overview

1.1 Goals

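Service A calls services B1 and B2 (B1 and B2 provide the same service). While B1/B2 are being stopped and redeployed, or when one of them fails:

  • Service A must still be able to call service B normally, so that releases go unnoticed by callers (high availability of service B)
  • Requests from service A must be load-balanced, so that no single B node is overloaded (load balancing of service B)
  • The main purpose is to verify the timeout and retry mechanisms of service calls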

Note: Nacos is used as the service registration and discovery component.

1.2 Environment

        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <spring.boot.version>2.2.2.RELEASE</spring.boot.version>
        <spring.cloud.version>Hoxton.SR1</spring.cloud.version>
        <spring.alibaba.version>2.1.0.RELEASE</spring.alibaba.version>

Service consumer: Ribbon has been excluded, and the officially recommended LoadBalancer is used instead.

2. Timeout and Retry Case Study

2.1 Service provider: provider-user

Service details as registered in Nacos

Note: two provider-user instances are started: provider-user--3015 and provider-user--4015

Provider code

2.2 Service consumer: provider-order

The retry endpoint uses the default configuration: PoolingHttpClientConnectionManager

The retry2 endpoint uses a custom configuration: RestTemplate

Configuration

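2.3 Load-balancing test

Start one consumer instance, provider-order--3017.

Call retry and retry2 on provider-order--3017 several times. The logs confirm that, by default, a round-robin load-balancing strategy is used to call provider-user--3015 and provider-user--4015.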
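To make the round-robin behaviour observed above concrete, here is a minimal sketch in plain Java. It is not the Spring Cloud or Ribbon implementation; the class name RoundRobinChooser and its method are hypothetical and only illustrate the idea of cycling through a fixed instance list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin chooser; an illustration, not the Spring Cloud implementation. */
class RoundRobinChooser {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger(0);

    RoundRobinChooser(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = instances;
    }

    /** Returns the next instance, cycling through the list in order. */
    String choose() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        RoundRobinChooser chooser = new RoundRobinChooser(
                Arrays.asList("provider-user--3015", "provider-user--4015"));
        // Successive calls alternate between the two instances.
        for (int i = 0; i < 4; i++) {
            System.out.println(chooser.choose());
        }
    }
}
```

With two instances, successive calls simply alternate, which is the pattern to look for in the consumer logs.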

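2.4 High-availability test

Stop one of the provider-user instances (provider-user-4015) and confirm that, when round-robin selects the stopped instance, the request is automatically and successfully retried on the instance that is still running.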
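The failover behaviour checked in this test can be sketched in a few lines of plain Java. This is only an illustration of "try the next instance when one fails", under the assumption that a failed call surfaces as a RuntimeException; FailoverCaller and callWithFailover are hypothetical names, and the real load balancer also applies retry limits that are omitted here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical failover helper; an illustration, not the framework's retry logic. */
class FailoverCaller {

    /** Tries each instance in order and returns the first successful result. */
    static <T> T callWithFailover(List<String> instances, Function<String, T> call) {
        RuntimeException last = null;
        for (String instance : instances) {
            try {
                return call.apply(instance);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next instance
            }
        }
        throw new IllegalStateException("all instances failed", last);
    }

    public static void main(String[] args) {
        String result = callWithFailover(
                Arrays.asList("provider-user--4015", "provider-user--3015"),
                instance -> {
                    // Simulate the stopped instance from the test above.
                    if (instance.endsWith("4015")) {
                        throw new IllegalStateException("connection refused (simulated)");
                    }
                    return "ok from " + instance;
                });
        System.out.println(result);
    }
}
```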

2.5 ribbon.restclient.enabled

1. Without ribbon.restclient.enabled=true

provider-order--3017, /retry endpoint: the call fails with a timeout error straight away, and no retry is attempted.

The call fails with a timeout after 5 seconds; this path uses the shared PoolingHttpClientConnectionManager:

    2024-08-05 20:40:53.150[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
    2024-08-05 20:40:53.155[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
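The java.net.SocketTimeoutException: Read timed out in the /retry log is what any HTTP client reports when the server takes longer than the client's read timeout. The sketch below reproduces that situation with JDK classes only: a local com.sun.net.httpserver.HttpServer that sleeps before answering, and an HttpURLConnection with a short read timeout. The class name ReadTimeoutDemo, the /retry path on the local server and the delay values are assumptions made for illustration; this is not the client stack used by provider-order.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: a slow local server versus a client read timeout. */
class ReadTimeoutDemo {

    /** Returns true if the GET fails with a read timeout, false if it completes. */
    static boolean timesOut(int serverDelayMillis, int readTimeoutMillis) {
        HttpServer server;
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/retry", exchange -> {
            try {
                Thread.sleep(serverDelayMillis); // simulate slow business logic
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/retry");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(readTimeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // drain the response body
                }
            }
            return false;
        } catch (SocketTimeoutException e) {
            return true; // the server was slower than the read timeout
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println("slow server, short timeout: " + timesOut(1000, 200));
        System.out.println("fast server, long timeout:  " + timesOut(0, 2000));
    }
}
```

Whether a timeout like this is followed by a retry is decided by the layer above the HTTP client, which is what the rest of this section tests.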

provider-order--3017, /retry2 endpoint: the call does not fail even when the provider takes 7 seconds, and no retry is attempted.

The /retry2 handler (@GetMapping("/retry2")) uses the separately configured RestTemplate. Consumer log:

    2024-08-05 20:43:57.405[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String

Provider log:

    2024-08-05 20:29:41.267[] user [http-nio-0.0.0.0-3015-exec-9] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
    2024-08-05 20:29:51.358[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
    2024-08-05 20:30:23.498[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
    2024-08-05 20:30:31.393[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
    2024-08-05 20:30:38.764[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
    2024-08-05 20:31:00.140[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
    2024-08-05 20:31:07.552[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
    2024-08-05 20:31:15.993[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
    2024-08-05 20:31:24.517[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 6s

2. With ribbon.restclient.enabled=true, three cases were observed

Case 1: only one provider-user instance running

After setting ribbon.restclient.enabled=true, retry still fails with a timeout straight away and is not retried, whereas retry2 sends the request 6 times in total (the timeout shown in the log is MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000).

The consumer log contains 6 occurrences of “RestClient sending new Request(GET”, each of the form: com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String

Consumer log for the first request:

    2024-08-05 21:04:15.198[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.servlet.DispatcherServlet-91- GET "/order/api/v1/retry2?name=String", parameters={masked}
    2024-08-05 21:04:15.201[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG o.s.w.s.m.m.a.RequestMappingHandlerMapping-412- Mapped to com.zxx.study.cloud.order.controller.RestfulApiController#retry2(String)
    2024-08-05 21:04:15.206[] order [http-nio-0.0.0.0-3017-exec-7] INFO  c.z.s.cloud.order.controller.RestfulApiController-255- name=String
    2024-08-05 21:04:15.207[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String
    2024-08-05 21:04:15.209[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- Accept=[application/json, application/*+json]
    2024-08-05 21:04:15.210[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.ZoneAwareLoadBalancer-112- Zone aware logic disabled or there is only one zone
    2024-08-05 21:04:15.211[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.LoadBalancerContext-551- using LB returned Server: 192.168.1.4:3015 for request: http://provider-user/user/api/v1/retry?name=String
    2024-08-05 21:04:15.212[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
    2024-08-05 21:04:15.213[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.http4.MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000

With only one provider-user instance running, the first call (provider sleep of 5 s) timed out and was followed by 5 retries, 6 calls in total; MaxAutoRetries: 3 + MaxAutoRetriesNextServer: 2. Provider log:

    2024-08-05 21:04:15.227[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= eaa55510-140d-4f5d-bf23-8adf9a620646
    2024-08-05 21:04:15.228[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
    2024-08-05 21:04:18.264[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ba0e6cc8-4cf6-41fe-91eb-42ec3d2e60d2
    2024-08-05 21:04:18.265[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
    2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ff3fba7e-4798-44b8-a25e-b84e75fb828a
    2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
    2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= f261909b-d752-42ac-b1f9-47a1747481cc
    2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
    2024-08-05 21:04:27.386[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= fcb7d6f7-a997-4629-9901-ebf894758a02
    2024-08-05 21:04:27.387[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
    2024-08-05 21:04:30.409[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= e6f13159-5111-4f0a-babe-3f9b6d8eff61
    2024-08-05 21:04:30.410[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
Case 2: only one provider-user instance running

After setting ribbon.restclient.enabled=true, retry still fails with a timeout straight away and is not retried, whereas retry2 sends the request 3 times in total (the timeout shown in the log is MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 5000).

The consumer log contains 3 occurrences of “RestClient sending new Request(GET”, each of the form: com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String

retry still times out straight away without retrying:

    2024-08-05 21:24:05.636[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
    2024-08-05 21:24:05.637[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
    2024-08-05 21:24:05.643[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] <--- ERROR SocketTimeoutException: Read timed out (5085ms)
    2024-08-05 21:24:05.648[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] java.net.SocketTimeoutException: Read timed out

For retry2, the first call (provider sleep of 6 s) timed out and was followed by 2 retries, 3 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1. Provider log:

    2024-08-05 21:18:22.255[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= dc2edb32-25a6-465c-9577-07b59388670f
    2024-08-05 21:18:22.256[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
    2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 05c593da-6b35-4f7b-8df2-15e9dab7b391
    2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
    2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 07fc8c11-5d77-4ffe-a81c-9851f68a647e
    2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
Case 3: two provider-user instances running

The consumer log contains 5 occurrences of com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ), so 5 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1.

In other words, the first call to provider-user-4015 (provider sleep of 5 s) timed out, it was then retried 2 times on provider-user-4015, and 2 more attempts were made on provider-user-3015, 5 calls in total.

provider-user-3015, 2 calls:

    2024-08-05 21:34:14.407[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 38c1f932-e99f-49c1-889d-aa79af316089
    2024-08-05 21:34:14.409[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
    2024-08-05 21:34:19.456[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 410f7b33-46ac-4102-a0ff-3c19c18d2b52
    2024-08-05 21:34:19.457[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s

provider-user-4015, 3 calls:

    2024-08-05 21:33:59.263[] user [http-nio-0.0.0.0-4015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 97936999-48f0-4257-9fce-7a78081afa4b
    2024-08-05 21:33:59.264[] user [http-nio-0.0.0.0-4015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
    2024-08-05 21:34:04.308[] user [http-nio-0.0.0.0-4015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ac766122-6a7a-4535-b1fb-928e3a9a5f7f
    2024-08-05 21:34:04.309[] user [http-nio-0.0.0.0-4015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 7s
    2024-08-05 21:34:09.338[] user [http-nio-0.0.0.0-4015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 02283f9e-fddb-480a-a9d2-68eb3da988ac
    2024-08-05 21:34:09.339[] user [http-nio-0.0.0.0-4015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s

2.6 Summary

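1. Use the retry mechanism with caution, even for GET requests; for other methods it is best not to retry at all. With OkToRetryOnAllOperations: false, retries on any error apply only to GET (non-GET requests are retried only on connection errors); with true, they also apply to POST, PUT, DELETE, and so on.
2. If retries are unavoidable, configure them per service and make sure the endpoints are idempotent.
3. In these tests, ribbon.restclient.enabled=true acted as the switch that turned retries on.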
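Point 2 asks for idempotent endpoints. One common way to get there, sketched below in plain Java, is to have the caller send a request ID and let the provider cache the result per ID, so that a retried request does not run the business logic a second time. This is an assumption for illustration only: the source does not say how idempotency is achieved, and IdempotentExecutor, requestId and the in-memory map are hypothetical (a real service would need shared storage and expiry).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical in-memory idempotency guard keyed by a caller-supplied request ID. */
class IdempotentExecutor {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    /** Runs the action at most once per requestId; repeated calls return the cached result. */
    String execute(String requestId, Supplier<String> action) {
        return results.computeIfAbsent(requestId, id -> action.get());
    }

    public static void main(String[] args) {
        IdempotentExecutor executor = new IdempotentExecutor();
        // The second call simulates a retry carrying the same request ID.
        System.out.println(executor.execute("req-1", () -> "order created"));
        System.out.println(executor.execute("req-1", () -> "order created again"));
    }
}
```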

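3. FeignLoadBalancer Analysis

Tracing the source code shows that the retry policy is configured in FeignLoadBalancer. If ribbon.OkToRetryOnAllOperations is set to true, requests of any method are retried. If it is set to false, GET requests are still retried, while non-GET requests are retried only on connection exceptions.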

    @Override
    public RequestSpecificRetryHandler getRequestSpecificRetryHandler(
            RibbonRequest request, IClientConfig requestConfig) {
        // If OkToRetryOnAllOperations is true, retry for any request method and any exception
        if (this.ribbon.isOkToRetryOnAllOperations()) {
            return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                    requestConfig);
        }
        // OkToRetryOnAllOperations is false (the default)
        // Non-GET requests: retry only on connection exceptions
        if (!request.toRequest().method().equals("GET")) {
            return new RequestSpecificRetryHandler(true, false, this.getRetryHandler(),
                    requestConfig);
        } else {
            // GET requests: retry in any situation, on any exception
            return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                    requestConfig);
        }
    }

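The analysis above shows that setting ribbon.OkToRetryOnAllOperations=false does not disable retries: Ribbon still retries GET requests. Our system had no special configuration for Ribbon's retry mechanism, so the default values were in effect.

Ribbon's default retry configuration is as follows: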
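The branching in getRequestSpecificRetryHandler can be condensed into a small decision function. The sketch below is a simplification written for this article, not Ribbon code: RetryDecision and shouldRetry are hypothetical names, the two booleans passed to RequestSpecificRetryHandler are modelled as "retry on connection errors" and "retry on all errors", and the retry budget (MaxAutoRetries, MaxAutoRetriesNextServer) is ignored.

```java
/** Hypothetical condensation of the retry decision; not part of Ribbon. */
class RetryDecision {

    /**
     * @param okToRetryOnAllOperations value of ribbon.OkToRetryOnAllOperations
     * @param httpMethod               request method, for example "GET" or "POST"
     * @param connectionError          true if the failure happened while connecting
     */
    static boolean shouldRetry(boolean okToRetryOnAllOperations,
                               String httpMethod,
                               boolean connectionError) {
        // Both the "all operations" branch and the GET branch pass (true, true).
        boolean retryOnAllErrors = okToRetryOnAllOperations || "GET".equals(httpMethod);
        // The non-GET branch passes (true, false): only connection errors qualify.
        return retryOnAllErrors || connectionError;
    }

    public static void main(String[] args) {
        System.out.println("GET,  read timeout:  " + shouldRetry(false, "GET", false));
        System.out.println("POST, read timeout:  " + shouldRetry(false, "POST", false));
        System.out.println("POST, connect error: " + shouldRetry(false, "POST", true));
    }
}
```

Under these assumptions, a GET that hits a read timeout remains eligible for retry even with OkToRetryOnAllOperations=false, which is the behaviour the analysis points out.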

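    # Maximum number of retries on the same instance, not counting the first call. Default: 0
    ribbon.MaxAutoRetries = 0
    # Maximum number of retries on other instances of the same service, not counting the first instance called. Default: 1
    ribbon.MaxAutoRetriesNextServer = 1
    # Whether all operations may be retried. Default: false
    ribbon.OkToRetryOnAllOperations = false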

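Because MaxAutoRetriesNextServer defaults to 1 and our import endpoint happens to be a GET request, Ribbon automatically retries once when the business service times out while processing the data.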
