SQL Server DbCommand timeout with .NET Core container under load

I'm running a .NET Core container on OpenShift Enterprise v3 that points to a SQL Server database.

I have a .NET Core REST API with a PUT method that adds or updates a record in the database.

The table I am adding to/updating has 3000 records and has indexes.

This works fine locally using the same database and the same container. However, when I start to put load through the container (approximately 50 concurrent HTTP connections with JMeter), I get random timeouts with the error message below.

The local machine is a lot more powerful than the container. I have increased the CPU allocation on the container, but this doesn't appear to have made any difference.

Any suggestions on things to try would be appreciated.

[10:45:36 ERR] Failed executing DbCommand (35,001ms) [Parameters=[@__get_Item_0='?' (Size = 255) (DbType = AnsiString)], CommandType='Text', CommandTimeout='30']
SELECT [e].[host_name], [e].[data_centre], [e].[Environment], [e].[is_physical_machine], [e].[mac_address], [e].[number_of_cores], [e].[number_of_sockets], [e].[number_of_v_cores], [e].[number_ofcpus], [e].[operating_system], [e].[operating_system_version], [e].[processor], [e].[uuid]
FROM [EntitlementServer].[host] AS [e]
WHERE [e].[host_name] = @__get_Item_0
System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at System.Data.Common.DbCommand.ExecuteReader()
   at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.Execute(IRelationalConnection connection, DbCommandMethod executeMethod, IReadOnlyDictionary`2 parameterValues)
ClientConnectionId:6ca037fc-9671-4d43-bebe-35879203c682
Error Number:-2,State:0,Class:11

Update 1

Scaling up the CPU and memory on the container did not help with performance. However, when I increased the number of running containers, throughput increased and the timeout issue went away. I am not sure why this would be the case.

Recommended answer

The problem actually turned out to be a threading issue; the database timeout was only a symptom. This is why increasing the number of containers stopped the problem: the number of HTTP threads increased as well.

By making the code asynchronous, so that threads are released instead of blocking on the database call, I was able to fix the problem without having to increase the number of containers.
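The actual controller code isn't shown in the question, so the following is only a minimal sketch of the effect rather than the real fix. It assumes a hypothetical 200 ms "query", simulated with `Thread.Sleep` for the blocking version and `Task.Delay` for the asynchronous one, and 50 concurrent requests to mirror the JMeter load. All names here (`ThreadStarvationDemo`, `QueryBlocking`, `QueryAsync`) are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadStarvationDemo
{
    // Hypothetical stand-in for a synchronous database call: the calling
    // pool thread is blocked for the full duration of the "query".
    private static void QueryBlocking(int delayMs) => Thread.Sleep(delayMs);

    // Hypothetical stand-in for the asynchronous equivalent: the thread is
    // returned to the pool while the "query" is in flight.
    private static Task QueryAsync(int delayMs) => Task.Delay(delayMs);

    // Runs `requests` simulated requests, each blocking one pool thread.
    public static async Task<TimeSpan> RunBlocking(int requests, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests)
            .Select(_ => Task.Run(() => QueryBlocking(delayMs))));
        return stopwatch.Elapsed;
    }

    // Runs `requests` simulated requests that await instead of blocking.
    public static async Task<TimeSpan> RunAsync(int requests, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests)
            .Select(_ => Task.Run(() => QueryAsync(delayMs))));
        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        // The pool's minimum worker count defaults to the number of
        // processors the runtime sees, which is small in a CPU-limited container.
        ThreadPool.GetMinThreads(out int workers, out int ioThreads);
        Console.WriteLine($"Min worker threads: {workers}, min IO threads: {ioThreads}");

        const int requests = 50;   // mirrors the ~50 concurrent connections
        const int delayMs = 200;   // assumed query latency, for illustration only

        TimeSpan blocking = await RunBlocking(requests, delayMs);
        Console.WriteLine($"Blocking calls: {blocking.TotalMilliseconds:F0} ms");

        TimeSpan nonBlocking = await RunAsync(requests, delayMs);
        Console.WriteLine($"Async calls:    {nonBlocking.TotalMilliseconds:F0} ms");
    }
}
```

With only a few pool threads available, the blocking run has to queue requests behind busy threads, while the async run can keep all 50 "queries" in flight at once. In an EF Core application the same idea would usually mean awaiting methods such as `FirstOrDefaultAsync` and `SaveChangesAsync` from an `async` controller action instead of calling their synchronous counterparts, though whether that matches the original code is an assumption.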
