Spring Boot random SSLException: Connection reset in Kubernetes with JDK 11

2022-08-21 00:00:00 ssl spring java resttemplate

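Context:

  • We have a Spring Boot (2.3.1.RELEASE) web application.
  • It is written in Java 8 but runs in a container with Java 11 (openjdk:11.0.6-jre-stretch).
  • It has a database connection and an upstream service that is called over HTTPS (plain RestTemplate#exchange method) (this is important!)
  • It is deployed inside a Kubernetes cluster (not sure if this is relevant).

Problem:

  • Every day, I see a small percentage of requests to the upstream service fail with the error: I/O error on GET request for "https://upstream.xyz/path": Connection reset; nested exception is javax.net.ssl.SSLException: Connection reset
  • The errors are completely random and occur intermittently.
  • We previously had a similar error related to JRE 11 and a TLS 1.3 negotiation issue (javax.net.ssl.SSLProtocolException: Connection reset). We updated our Docker image to the one mentioned above, and that fixed it.
  • This is the stack trace from the error: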
java.net.SocketException: Connection reset
    at java.base/java.net.SocketInputStream.read(Unknown Source)
    at java.base/java.net.SocketInputStream.read(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketInputRecord.read(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:739)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:674)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:583)
....

Configuration:

public static RestTemplate create(final int maxTotal, final int defaultMaxPerRoute,
                                  final int connectTimeout, final int readTimeout,
                                  final String userAgent) {
    final Registry<ConnectionSocketFactory> schemeRegistry = RegistryBuilder.<ConnectionSocketFactory>create()
            .register("http", PlainConnectionSocketFactory.getSocketFactory())
            .register("https", SSLConnectionSocketFactory.getSocketFactory())
            .build();

    final PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager(schemeRegistry);
    connManager.setMaxTotal(maxTotal);
    connManager.setDefaultMaxPerRoute(defaultMaxPerRoute);

    final CloseableHttpClient httpClient = HttpClients.custom()
            .setConnectionManager(connManager)
            .setUserAgent(userAgent)
            .setDefaultRequestConfig(RequestConfig.custom()
                                             .setConnectTimeout(connectTimeout)
                                             .setSocketTimeout(readTimeout)
                                             .setExpectContinueEnabled(false).build())
            .build();

    return new RestTemplateBuilder()
            .requestFactory(() -> new HttpComponentsClientHttpRequestFactory(httpClient))
            .build();
}

Has anyone run into this problem? When I turn on debug logging for the HTTP client, it is so noisy that I cannot pick out anything useful...


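Solution

We ran into a similar problem when migrating to AWS/Kubernetes, and I have found the cause.

You are using a connection pool. The default behaviour of PoolingHttpClientConnectionManager is to reuse connections, so a connection is not closed immediately when your request completes. This saves resources because the client does not have to reconnect every time.

A Kubernetes cluster uses NAT (network address translation) for outgoing connections. When a connection has not been used for some time, it is removed from the NAT table and the connection is broken. The next request that reuses that pooled connection then fails, which produces the seemingly random SSLExceptions.

On AWS, a connection is removed from the NAT table once it has been idle for 350 seconds. Other Kubernetes installations may have different settings.

See https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html

Solutions:

Disable connection reuse: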

final CloseableHttpClient closeableHttpClient = HttpClients.custom()
    .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
    .setConnectionManager(poolingHttpClientConnectionManager)
    .build();

Or let the httpClient evict connections that have been idle for too long:

return HttpClients.custom()
            .evictIdleConnections(300, TimeUnit.SECONDS)  //Read the javadocs, may not be used when the instance of HttpClient is created inside an EJB container.
            .setConnectionManager(poolingHttpClientConnectionManager)
            .build();
        

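Or call setKeepAliveStrategy(....) on the builder with a custom ConnectionKeepAliveStrategy that never returns -1 (keep alive indefinitely) and never returns a timeout of more than 300 seconds. Note that the strategy's return value is in milliseconds.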
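
The capping rule that such a keep-alive strategy needs can be sketched as follows. This is only a minimal illustration using the JDK alone: the class name KeepAliveCap, the method cap, and the 300-second ceiling are assumptions made for the example, not part of Apache HttpClient or of the original answer's code.

```java
import java.util.concurrent.TimeUnit;

/** Caps a keep-alive duration so pooled connections expire before the NAT idle timeout. */
final class KeepAliveCap {

    /** Assumed ceiling: 300 s, chosen to stay below the 350 s AWS NAT idle timeout. */
    static final long MAX_KEEP_ALIVE_MILLIS = TimeUnit.SECONDS.toMillis(300);

    private KeepAliveCap() {
        // static helper, not meant to be instantiated
    }

    /**
     * @param serverMillis keep-alive duration suggested by the server, in milliseconds;
     *                     zero or negative means the server gave no limit
     * @return a positive duration that never exceeds MAX_KEEP_ALIVE_MILLIS
     */
    static long cap(long serverMillis) {
        if (serverMillis <= 0) {
            // "keep alive indefinitely" is exactly the case that must be avoided behind NAT
            return MAX_KEEP_ALIVE_MILLIS;
        }
        return Math.min(serverMillis, MAX_KEEP_ALIVE_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(cap(-1));       // prints 300000
        System.out.println(cap(60_000));   // prints 60000
        System.out.println(cap(600_000));  // prints 300000
    }
}
```

With Apache HttpClient 4.x, one possible wiring (not compiled as part of this sketch) would be to pass a lambda to setKeepAliveStrategy that takes the value from DefaultConnectionKeepAliveStrategy.INSTANCE.getKeepAliveDuration(response, context) and returns KeepAliveCap.cap(...) of it. That way a server-provided Keep-Alive timeout is still honoured when it is shorter than the ceiling.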
