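1. Overview
1.1 Goals
Service A calls services B1 and B2 (both provide the same service). While B1/B2 is being stopped and redeployed, or when one of them has failed:
- Service A must still be able to call service B normally, so that a release goes unnoticed by callers (high availability for service B).
- Requests from service A must be load balanced so that no single B node is overloaded (load balancing for service B).
- The main focus is verifying the call timeout and retry mechanisms.
Note: Nacos is used as the service registration and discovery component.
1.2 Environment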
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<spring.boot.version>2.2.2.RELEASE</spring.boot.version>
<spring.cloud.version>Hoxton.SR1</spring.cloud.version>
<spring.alibaba.version>2.1.0.RELEASE</spring.alibaba.version>
Service consumer: Ribbon has been excluded, and the officially recommended loadbalancer is used instead.
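2. Service Call Timeout and Retry Cases
2.1 Service provider: provider-user
Service details in Nacos
Note: two provider-user instances are started: provider-user--3015 and provider-user--4015
Server-side code
2.2 Service consumer: provider-order
The retry endpoint uses the default configuration: PoolingHttpClientConnectionManager
The retry2 endpoint uses a custom configuration: RestTemplate
Configuration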
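2.3 Load-balancing test
Start one consumer instance, provider-order--3017.
Send multiple requests to retry and retry2 on provider-order--3017. The logs confirm that provider-user--3015 and provider-user--4015 are called using the default round-robin load-balancing strategy.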
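The round-robin behavior seen in the logs can be illustrated with a minimal sketch. This is not the actual Ribbon or Spring Cloud LoadBalancer implementation; the class name `RoundRobinChooser` and its `choose` method are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative round-robin chooser (hypothetical class, not a library API). */
final class RoundRobinChooser {
    // Shared counter so concurrent callers still rotate through the instances.
    private final AtomicInteger position = new AtomicInteger();

    /** Returns the next instance in rotation, or null when the list is empty. */
    String choose(List<String> instances) {
        if (instances.isEmpty()) {
            return null;
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

With the two instances from this test, successive calls to `choose` would alternate between provider-user--3015 and provider-user--4015.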
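2.4 High-availability test
Stop one of the provider-user instances (provider-user-4015) and confirm that when the round-robin reaches the stopped instance, the request is automatically and successfully retried on the instance that is still running.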
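The failover described here can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not how Ribbon implements it: `FailoverCaller` and `callWithFailover` are hypothetical names, each instance is tried at most once, and any `RuntimeException` counts as a failed call.

```java
import java.util.List;
import java.util.function.Function;

/** Illustrative "retry on the next instance" helper (hypothetical, not a library API). */
final class FailoverCaller {

    /**
     * Tries each instance once, starting at startIndex, and returns the first
     * successful result. Rethrows the last failure if every instance fails.
     */
    static <T> T callWithFailover(List<String> instances, int startIndex,
                                  Function<String, T> call) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        RuntimeException last = null;
        for (int i = 0; i < instances.size(); i++) {
            String instance = instances.get(Math.floorMod(startIndex + i, instances.size()));
            try {
                return call.apply(instance);
            } catch (RuntimeException e) {
                // Remember the failure and move on to the next instance.
                last = e;
            }
        }
        throw last;
    }
}
```

If the rotation lands on the stopped instance first, the call fails there and the next instance serves the request, which is the behavior this test confirms.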
2.5 ribbon.restclient.enabled
1. Without ribbon.restclient.enabled=true
provider-order--3017: the /retry endpoint times out and fails immediately, with no retry.
/**
 * todo Times out with an error at 5 seconds; uses the shared PoolingHttpClientConnectionManager
 * 2024-08-05 20:40:53.150[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
 * 2024-08-05 20:40:53.155[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
 */
provider-order--3017: the /retry2 endpoint does not fail even at 7 seconds, and no retry occurs.
@GetMapping("/retry2")
// todo retry2 does not fail even at 7 seconds; uses a separately configured RestTemplate
 * 2024-08-05 20:43:57.405[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String
 * 2024-08-05 20:29:41.267[] user [http-nio-0.0.0.0-3015-exec-9] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
 * 2024-08-05 20:29:51.358[] user [http-nio-0.0.0.0-3015-exec-8] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
 * 2024-08-05 20:30:23.498[] user [http-nio-0.0.0.0-3015-exec-7] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
 * 2024-08-05 20:30:31.393[] user [http-nio-0.0.0.0-3015-exec-6] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
 * 2024-08-05 20:30:38.764[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
 * 2024-08-05 20:31:00.140[] user [http-nio-0.0.0.0-3015-exec-1] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
 * 2024-08-05 20:31:07.552[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
 * 2024-08-05 20:31:15.993[] user [http-nio-0.0.0.0-3015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
 * 2024-08-05 20:31:24.517[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 6s
2. With ribbon.restclient.enabled=true, there are three cases
 * Case 1: only one provider-user instance is running
 * After setting ribbon.restclient.enabled=true, retry still times out immediately without retrying, while retry2 made 6 attempts (timeout as logged: -240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000)
 * todo The log contains 6 occurrences of "RestClient sending new Request(GET": com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
 *
 * 2024-08-05 21:04:15.198[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.servlet.DispatcherServlet-91- GET "/order/api/v1/retry2?name=String", parameters={masked}
 * 2024-08-05 21:04:15.201[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG o.s.w.s.m.m.a.RequestMappingHandlerMapping-412- Mapped to com.zxx.study.cloud.order.controller.RestfulApiController#retry2(String)
 * 2024-08-05 21:04:15.206[] order [http-nio-0.0.0.0-3017-exec-7] INFO c.z.s.cloud.order.controller.RestfulApiController-255- name=String
 * 2024-08-05 21:04:15.207[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String
 * 2024-08-05 21:04:15.209[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- Accept=[application/json, application/*+json]
 * 2024-08-05 21:04:15.210[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.ZoneAwareLoadBalancer-112- Zone aware logic disabled or there is only one zone
 * 2024-08-05 21:04:15.211[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.LoadBalancerContext-551- using LB returned Server: 192.168.1.4:3015 for request: http://provider-user/user/api/v1/retry?name=String
 * 2024-08-05 21:04:15.212[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
 * 2024-08-05 21:04:15.213[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.http4.MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000
 *
 * todo only one provider-user instance is running
 * todo The first call (5 seconds) timed out, followed by 5 retries, 6 calls in total; MaxAutoRetries:3 + MaxAutoRetriesNextServer: 2
 * 2024-08-05 21:04:15.227[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= eaa55510-140d-4f5d-bf23-8adf9a620646
 * 2024-08-05 21:04:15.228[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
 * 2024-08-05 21:04:18.264[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ba0e6cc8-4cf6-41fe-91eb-42ec3d2e60d2
 * 2024-08-05 21:04:18.265[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
 * 2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ff3fba7e-4798-44b8-a25e-b84e75fb828a
 * 2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
 * 2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= f261909b-d752-42ac-b1f9-47a1747481cc
 * 2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
 * 2024-08-05 21:04:27.386[] user [http-nio-0.0.0.0-3015-exec-7] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= fcb7d6f7-a997-4629-9901-ebf894758a02
 * 2024-08-05 21:04:27.387[] user [http-nio-0.0.0.0-3015-exec-7] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
 * 2024-08-05 21:04:30.409[] user [http-nio-0.0.0.0-3015-exec-8] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= e6f13159-5111-4f0a-babe-3f9b6d8eff61
 * 2024-08-05 21:04:30.410[] user [http-nio-0.0.0.0-3015-exec-8] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
 *
 * Case 2: only one provider-user instance is running
 * After setting ribbon.restclient.enabled=true, retry still times out immediately without retrying, while retry2 made 3 attempts (timeout as logged: MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 5000)
 * todo The log contains 3 occurrences of "RestClient sending new Request(GET": com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
 *
 * todo retry still times out immediately without retrying.
 * 2024-08-05 21:24:05.636[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
 * 2024-08-05 21:24:05.637[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
 * 2024-08-05 21:24:05.643[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] <--- ERROR SocketTimeoutException: Read timed out (5085ms)
 * 2024-08-05 21:24:05.648[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] java.net.SocketTimeoutException: Read timed out
 *
 * todo The first call (6 seconds) timed out, followed by 2 retries, 3 calls in total; MaxAutoRetries:2 + MaxAutoRetriesNextServer: 1
 * 2024-08-05 21:18:22.255[] user [http-nio-0.0.0.0-3015-exec-1] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= dc2edb32-25a6-465c-9577-07b59388670f
 * 2024-08-05 21:18:22.256[] user [http-nio-0.0.0.0-3015-exec-1] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
 * 2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 05c593da-6b35-4f7b-8df2-15e9dab7b391
 * 2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
 * 2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 07fc8c11-5d77-4ffe-a81c-9851f68a647e
 * 2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
 *
 * Case 3: two provider-user instances are running
 *
 * 5 occurrences in total of com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: )
 * 5 calls in total; MaxAutoRetries:2 + MaxAutoRetriesNextServer: 1
 * That is, the first call to provider-user-4015 (5 seconds) timed out, followed by 2 retries on provider-user-4015 and then 2 more calls on provider-user-3015; 5 calls in total.
 * provider-user-3015: 2 calls
 * 2024-08-05 21:34:14.407[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 38c1f932-e99f-49c1-889d-aa79af316089
 * 2024-08-05 21:34:14.409[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
 * 2024-08-05 21:34:19.456[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 410f7b33-46ac-4102-a0ff-3c19c18d2b52
 * 2024-08-05 21:34:19.457[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
 *
 * provider-user-4015: 3 calls
 * 2024-08-05 21:33:59.263[] user [http-nio-0.0.0.0-4015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 97936999-48f0-4257-9fce-7a78081afa4b
 * 2024-08-05 21:33:59.264[] user [http-nio-0.0.0.0-4015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
 * 2024-08-05 21:34:04.308[] user [http-nio-0.0.0.0-4015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ac766122-6a7a-4535-b1fb-928e3a9a5f7f
 * 2024-08-05 21:34:04.309[] user [http-nio-0.0.0.0-4015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 7s
 * 2024-08-05 21:34:09.338[] user [http-nio-0.0.0.0-4015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 02283f9e-fddb-480a-a9d2-68eb3da988ac
 * 2024-08-05 21:34:09.339[] user [http-nio-0.0.0.0-4015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
 *
 */
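2.6 Summary
1. Use the retry mechanism with caution, even for GET requests; for other methods, retries are best avoided. With OkToRetryOnAllOperations: false, timeouts are retried only for GET; with true, POST, PUT, DELETE and other methods are retried as well.
2. If retries are required, configure them per service and make sure the endpoints are idempotent.
3. In these tests, ribbon.restclient.enabled=true acted as the switch that turned retries on for the RestTemplate-based retry2 endpoint.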
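Point 2 calls for idempotent endpoints. One common way to get there is to deduplicate by a request ID, so that a retried call returns the stored result instead of repeating the side effect. The sketch below is only an illustration under assumptions: `IdempotentExecutor` is a hypothetical class, the request ID is assumed to be supplied by the caller, and results live in process memory (a shared store with expiry would be needed across instances).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative in-memory idempotency guard (hypothetical, not part of the project). */
final class IdempotentExecutor {
    // Maps a request ID to the result produced by its first execution.
    private final Map<String, String> results = new ConcurrentHashMap<>();

    /**
     * Runs the action only for the first call with a given request ID;
     * later calls with the same ID get the stored result back.
     * The action must return a non-null value, otherwise nothing is stored.
     */
    String execute(String requestId, Supplier<String> action) {
        return results.computeIfAbsent(requestId, id -> action.get());
    }
}
```

A retried request that carries the same ID then has no additional effect on the provider side, which is what makes automatic retries safe.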
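3. FeignLoadBalancer Analysis
Tracing the source code shows that FeignLoadBalancer configures the retry policy. If ribbon.OkToRetryOnAllOperations is true, every request method is retried. If it is false, GET requests are still retried, while non-GET requests are retried only on connection exceptions.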
@Override
public RequestSpecificRetryHandler getRequestSpecificRetryHandler(
        RibbonRequest request, IClientConfig requestConfig) {
    // If OkToRetryOnAllOperations is true, retry every request method on any exception
    if (this.ribbon.isOkToRetryOnAllOperations()) {
        return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(), requestConfig);
    }
    // When OkToRetryOnAllOperations is false (the default):
    // non-GET requests are retried only on connection exceptions
    if (!request.toRequest().method().equals("GET")) {
        return new RequestSpecificRetryHandler(true, false, this.getRetryHandler(), requestConfig);
    // GET requests are retried in all cases, on any exception
    } else {
        return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(), requestConfig);
    }
}
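The analysis above shows that setting ribbon.OkToRetryOnAllOperations=false does not disable retries: Ribbon still retries GET requests. Our system had no special configuration for the Ribbon retry mechanism, so the defaults applied.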
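The branching in getRequestSpecificRetryHandler can be restated as a small decision function. This is a simplified model of the snippet only: `RetryDecision`, its `Failure` enum, and `isRetryable` are hypothetical names, failures are reduced to two kinds, and the function says whether a failure is eligible for retry, not how many retries happen (that depends on MaxAutoRetries and MaxAutoRetriesNextServer).

```java
/** Illustrative model of the retry eligibility rules in the snippet (hypothetical class). */
final class RetryDecision {

    /** Simplified failure kinds: could not connect, or connected but the read timed out. */
    enum Failure { CONNECT_ERROR, READ_TIMEOUT }

    static boolean isRetryable(String httpMethod, boolean okToRetryOnAllOperations,
                               Failure failure) {
        // Second constructor argument in the snippet: true for GET,
        // or for every method when OkToRetryOnAllOperations is enabled.
        boolean okToRetryOnAllErrors = okToRetryOnAllOperations || "GET".equals(httpMethod);
        // First constructor argument in the snippet is always true,
        // so connection errors are eligible regardless of the method.
        return okToRetryOnAllErrors || failure == Failure.CONNECT_ERROR;
    }
}
```

Under this model, a POST that hits a read timeout is not retried with the default setting, while a GET in the same situation is.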
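The default Ribbon retry configuration is:
# Maximum retries on the same instance, excluding the first call. Default: 0
ribbon.MaxAutoRetries = 0
# Maximum number of other instances of the same service to retry on, excluding the first instance called. Default: 1
ribbon.MaxAutoRetriesNextServer = 1
# Whether all operations may be retried. Default: false
ribbon.OkToRetryOnAllOperations = false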
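Because MaxAutoRetriesNextServer defaults to 1 and our import endpoint happens to be a GET request, Ribbon automatically retries once when the business service times out while processing the data.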
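To estimate how far retries can multiply a slow call, the sketch below computes an upper bound on attempts. It assumes the commonly described Ribbon semantics, where each server gets 1 + MaxAutoRetries attempts and up to 1 + MaxAutoRetriesNextServer servers are tried; this is an assumption, not something verified in this article. `RetryBudget` and its methods are hypothetical names, and the estimate ignores connect time and early successes.

```java
/** Illustrative upper-bound calculator for retry attempts (hypothetical class). */
final class RetryBudget {

    /** Upper bound on total calls: attempts per server times number of servers tried. */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    /** Worst-case wait if every attempt runs into the read timeout. */
    static long worstCaseMillis(int maxAutoRetries, int maxAutoRetriesNextServer,
                                long readTimeoutMillis) {
        return (long) maxAttempts(maxAutoRetries, maxAutoRetriesNextServer) * readTimeoutMillis;
    }
}
```

With the defaults (0 and 1) the bound is 2 calls, that is, the original call plus one retry, which matches the behavior described above. The actual number can be lower whenever an attempt succeeds before the limit is reached.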