配置 GlassFish JDBC 连接池以处理 Amazon RDS 多可用区故障转移
我有一个在 EC2 上的 GlassFish 中运行的 Java EE 应用程序,在 Amazon RDS 上有一个 MySQL 数据库.我正在尝试将 JDBC 连接池配置为以最大限度地减少数据库故障转移时的停机时间.
I have a Java EE application running in GlassFish on EC2, with a MySQL database on Amazon RDS. I am trying to configure the JDBC connection pool to in order to minimize downtime in case of database failover.
在多可用区故障转移期间,我的当前配置无法正常工作,因为备用数据库实例似乎在几分钟后可用(根据 AWS 控制台),而我的 GlassFish 实例仍然卡住了很长时间(大约 15 分钟),然后再开始工作.
My current configuration isn't working correctly during a Multi-AZ failover, as the standby database instance appears to be available in a couple of minutes (according to the AWS console) while my GlassFish instance remains stuck for a long time (about 15 minutes) before resuming work.
连接池是这样配置的:
asadmin create-jdbc-connection-pool --restype javax.sql.ConnectionPoolDataSource
--datasourceclassname com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource
--isconnectvalidatereq=true --validateatmostonceperiod=60 --validationmethod=auto-commit
--property user=$DBUSER:password=$DBPASS:databaseName=$DBNAME:serverName=$DBHOST:port=$DBPORT
MyPool
如果我使用 Single-AZ db.m1.small 实例并从控制台重新启动数据库,GlassFish 将使断开的连接无效,抛出一些异常,然后数据库可用后立即重新连接.在此设置中,我的停机时间不到 1 分钟.
If I use a Single-AZ db.m1.small instance and reboot the database from the console, GlassFish will invalidate the broken connections, throw some exceptions and then reconnect as soon the database is available. In this setup I get less than 1 minute of downtime.
如果我使用 多可用区 db.m1.small 实例并从 AWS 控制台通过故障转移重新启动,我看不到任何异常.服务器完全停止,所有传入请求都超时.15 分钟后,我终于明白了:
If I use a Multi-AZ db.m1.small instance and reboot with failover from the AWS console, I see no exception at all. The server halts completely, with all incoming requests timing out. After 15 minutes I finally get this:
Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.3.2.v20111125-r10461): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 940,715 milliseconds ago. The last packet sent successfully to the server was 935,598 milliseconds ago.
看起来好像每个 HTTP 线程都在无效连接上被阻塞而没有出现异常,因此没有机会执行连接验证.
It appears as if each HTTP thread gets blocked on an invalid connection without getting an exception and so there's no chance to perform connection validation.
多可用区情况下的停机时间始终在 15-16 分钟之间,因此看起来像是某种超时,但我无法更改它.
Downtime in the Multi-AZ case is always between 15-16 minutes, so it looks like a timeout of some sort but I was unable to change it.
我尝试过但没有成功的事情:
Things I have tried without success:
- 连接泄漏超时/回收
- 语句泄漏超时/回收
- 语句超时
- 使用不同的验证方法
- 使用
MysqlDataSource
而不是MysqlConnectionPoolDataSource
如何为卡住的查询设置超时,以便重用、验证和替换池中的连接?或者如何让 GlassFish 检测到数据库故障转移?
How can I set a timeout on stuck queries so that connections in the pool are reused, validated and replaced? Or how can I let GlassFish detect a database failover?
推荐答案
正如我之前评论过的,这是因为打开并连接到数据库的套接字没有意识到连接已经丢失,所以它们保持连接直到操作系统套接字超时被触发,我读到它通常在大约 30 分钟内.
As I commented before, it is because the sockets that are open and connected to the database don't realize the connection has been lost, so they stayed connected until the OS socket timeout is triggered, which I read might be usually in about 30 minutes.
要解决此问题,您需要覆盖 JDBC 连接字符串或 JDNI 连接配置/属性中的套接字超时,以将 socketTimeout 参数定义为更短的时间.
To solve the issue you need to override the socket Timeout in your JDBC Connection String or in the JDNI COnnection Configuration/Properties to define the socketTimeout param to a smaller time.
请记住,任何比定义的值长的连接都将被终止,即使它正在被使用(我无法确认这一点,这是我读到的).
Keep in mind that any connection longer than the value defined will be killed, even if it is being used (I haven't been able to confirm this, is what I read).
我在评论中提到的另外两个参数是 connectTimeout 和 autoReconnect.
The other two parameters I mention in my comment are connectTimeout and autoReconnect.
这是我的 JDBC 连接字符串:
Here's my JDBC Connection String:
jdbc:(...)&connectTimeout=15000&socketTimeout=60000&autoReconnect=true
我还通过执行禁用了 Java 的 DNS 缓存
I also disabled Java's DNS cache by doing
java.security.Security.setProperty("networkaddress.cache.ttl" , "0");
java.security.Security.setProperty("networkaddress.cache.negative.ttl" , "0");
我这样做是因为 Java 不遵守 TTL,并且当发生故障转移时,DNS 是相同的,但 IP 会发生变化.
I do this because Java doesn't honor the TTL's, and when the failover takes place, the DNS is the same but the IP changes.
由于您使用的是应用服务器,因此在使用 -Dnet 启动 glassfish 时必须将禁用 DNS 缓存的参数传递给 JVM,而不是应用程序本身.
Since you are using an Application Server, the parameters to disable DNS cache must be passed to the JVM when starting the glassfish with -Dnet and not the application itself.
相关文章