如何准确检测 SQL Server 作业是否正在运行并处理已运行的作业?

2021-09-10 00:00:00 tsql sql-server sql-agent-job sql-server-agent

我目前正在使用这样的代码来检测 SQL 服务器作业是否正在运行.(这是 SQL Server 2005，所有 SP)

I'm currently using code like this to detect if a SQL server job is running. (this is SQL Server 2005, all SP's)

return (select isnull( (select top 1 CASE WHEN current_execution_status = 4 THEN 0 ELSE 1 END from openquery(devtestvm, 'EXEC msdb.dbo.sp_help_job') where current_execution_status = 4 and name = 'WQCheckQueueJob' + cast(@Index as varchar(10)) ), 1) )

没有问题，一般来说，它工作得很好.

No problems there, and generally speaking, it works just fine.

但是....(总是一个但是)

But.... (always a but)

有时，我会调用它，返回作业未运行"结果，此时我将尝试通过

On occasion, I'll invoke this, get back a "job is not running" result, at which point I'll try and start the job, via

exec msdb.dbo.sp_start_job @JobName

并且 SQL 将返回SQLAgent 拒绝启动作业，因为它已经有一个挂起的请求".

and SQL will return that "SQLAgent has refused to start the job because it already has a pending request".

好的.也不是问题.可以想象，在这段代码可以启动目标作业之前，有一个小窗口可以启动它，但是在检查它是否启动之后.但是，我可以将其包含在 try catch 中并忽略错误，对吗?

Ok. Also not a problem. It's conceivable that there's a slight window where the target job could get started before this code can start it, but after checking if it's started. However, I can just wrap that up in a try catch and just ignore the error, right?

begin try if dbo.WQIsQueueJobActive(@index) = 0 begin exec msdb.dbo.sp_start_job @JobName break end end try begin catch -- nothing here end catch

问题来了.

10 次中有 9 次，这很好用.SQL 代理将引发错误，它被捕获，并且处理继续进行，因为作业已经在运行，没有伤害没有犯规.

9 times out of 10, this works just fine. SQL agent will raise the error, it's caught, and processing just continues on, since the job is already running, no harm no foul.

但偶尔，我会在作业历史记录"视图中收到一条消息(请记住上面的代码来检测特定作业是否正在运行，如果不是从另一个作业实际运行，则启动它)说作业失败，因为SQLAgent 拒绝启动作业，因为它已经有一个待处理的请求".

But occasionally, I'll get a message in the Job History view (keep in mind the above code to detect if a specific job is running and start it if not is actually running from another job) saying that the job failed because "SQLAgent has refused to start the job because it already has a pending request".

当然，这正是 TRY CATCH 应该处理的错误！

Of course, this is the exact error that TRY CATCH is supposed to be handling!

当发生这种情况时，正在执行的作业就会终止，但据我所知不会立即终止，只是非常接近.我把日志记录到处都是，没有一致性.一次失败，它会在 a 点，下一次在 b 点.在某些情况下，A 地和 B 地只有一个

When this happens, the executing job just dies, but not immediately from what I can tell, just pretty close. I've put logging all over the place and there's no consistency. One time it fails, it'll be at place a, the next time at place b. In some cases, Place A and place B have nothing but a

select @var = 'message'

在他们之间.很奇怪.基本上，该作业似乎被毫不客气地转储，并且该作业中剩下的任何要执行的内容都完全 + 不 + 执行.

in between them. Very strange. Basically, the job appears to be unceremoniously dumped and anything left to execute in the job is +not+ executed at all.

但是，如果我删除exec StartJob"(或者当我知道目标作业无法运行时，它只被调用一次)，一切正常，我在作业中的所有处理都会运行.

However, if I remove the "exec StartJob" (or have it invoked exactly one time, when I KNOW that the target job can't already be running), everything works perfectly and all my processing in the job runs through.

这一切背后的目的是让作业作为触发器的结果(除其他外)启动，如果作业已经启动，则真的没有必要重新启动".

The purpose behind all this is to have a job started as a result of a trigger (among other things), and, if the job is already started, there's really no need to "start it again".

有人在使用 SQL Agent 的作业处理时遇到过这样的行为吗?

Anyone ever run into behavior like this with SQL Agent's Job handling?

目前的控制流程是这样的:

Current flow of control is like so:

更改为表(更新或插入)...
触发调用...的触发器
一个调用...的存储过程
sp_Start_Job 其中...
开始一个特定的工作......
调用另一个存储过程(称为 CheckQueue)...
执行一些处理并...
检查几个表格，根据它们的内容可能...
在另一个作业上调用 sp_start_job 以启动第二个同步作业处理额外的工作(第二个作业也调用 CheckQueue sproc但是这两个调用对完全独立的数据集进行操作)

推荐答案

首先，你有没有机会看一下 service broker?从你的描述来看，这似乎是你真正想要的.

First of all, have you had a chance to look at service broker? From your description, it sounds like that's what you actually want.

不同之处在于您将数据放入 SB 队列而不是开始工作，SB 将异步调用您的处理过程，并且完全回避已经运行的工作等问题.它会自动生成/终止其他线程和需求决定，它负责秩序等.

The difference would be instead of starting a job, you put your data into a SB queue and SB will call your processing proc asynchronously and completely side-step issues with already-running jobs etc. It will auto spawn/terminate additional threads and demand dictates, it takes care of order etc.

这是一个很好的(并且相关的)教程.http://www.sqlteam.com/article/centralized-asynchronous-审计与服务经纪人

Here's a good (and vaguely related) tutorial. http://www.sqlteam.com/article/centralized-asynchronous-auditing-with-service-broker

让我们假设无论出于何种原因您都不能使用 SB(但说真的，一定要这样做！).

Let's assume that you can't use SB for whatever reason (but seriously, do!).

如何使用作业 spid 的 context_info.

What about using the job spid's context_info.

您的工作调用了一个单独执行每个步骤的包装程序.
包装过程中的第一条语句是

Your job calls a wrapper proc that execs each step individually.

The first statement inside the wrapper proc is

DECLARE @context_info VARBINARY(30) SET @context_info = CAST('MyJob1' AS VARBINARY) SET CONTEXT_INFO @context_info

当您的 proc 完成时(或在您的 catch 块中)

When your proc finishes (or in your catch block)

SET CONTEXT_INFO 0x0

当你打算打电话给你的工作时，这样做:

When you are looking at calling your job, do this:

IF NOT EXISTS (SELECT * FROM master..sysprocesses WITH (NOLOCK) WHERE context_info=CAST('MyJob1' AS VARBINARY)) EXEC StartJob

当您的包装过程终止或连接关闭时，您的 context_info 将消失.

When your wrapper proc terminates or the connection is closed, your context_info goes away.

您还可以使用全局临时表(即##JobStatus)，当所有引用它的 spid 断开连接或显式删除它时，它们将消失.

You could also use a global temp table (i.e. ##JobStatus) They will disappear when all spids that reference it disconnect or if it's explicitly dropped.

只是一些想法.

相关文章