Catching Heroku SIGTERM in a Celery worker to shut the worker down gracefully

2022-01-11 00:00:00 python heroku celery sigterm rabbitmq

Question

I've done a ton of research on this, and I'm surprised I haven't found a good answer to this yet anywhere.

I'm running a large application on Heroku, and I have certain Celery tasks that run for a very long time and save a result only when they finish. Every time I redeploy on Heroku, it sends SIGTERM (and eventually SIGKILL) and kills my running worker. I'm trying to find a way for the worker to shut itself down gracefully and re-queue the task for later processing, so that the required result is eventually saved instead of the task being lost.

I cannot find a way that works to have the worker listen for SIGTERM properly. The closest I've gotten, which works when running python manage.py celeryd directly but NOT when emulating Heroku using foreman, is the following:

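import logging
import time

from celery import exceptions

logger = logging.getLogger(__name__)

# `app` is the project's Celery application instance, defined elsewhere.
@app.task(bind=True, max_retries=1)
def slow(self, x):
    try:
        # Simulate a long-running job; the loop variable is named i so it
        # does not shadow the task argument x.
        for i in range(100):
            print('i: ' + str(i))
            time.sleep(10)
    except exceptions.MaxRetriesExceededError:
        logger.error('whoa')
    except (exceptions.WorkerShutdown, exceptions.WorkerTerminate) as exc:
        logger.error('retrying, ' + str(exc))
        raise self.retry(exc=exc, countdown=10)
    except (KeyboardInterrupt, SystemExit) as exc:
        print('retrying')
        raise self.retry(exc=exc, countdown=10)
    else:
        return x
    finally:
        logger.info('task ended!')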

When I start this celery task running within foreman and hit Ctrl+C, the following happens:

^CSIGINT received
22:20:59 system   | sending SIGTERM to all processes
22:20:59 web.1    | exited with code 0
22:21:04 system   | sending SIGKILL to all processes
Killed: 9

So it's clear that none of the celery exceptions, nor the KeyboardInterrupt or SystemExit exceptions I've seen in other posts, properly catch SIGTERM and shut down the worker.

What is the right way to do this?


Solution

Starting with version 4, Celery comes with a special feature, aimed at Heroku, that supports this out of the box. Setting the REMAP_SIGTERM environment variable makes the worker treat SIGTERM as SIGQUIT:

$ REMAP_SIGTERM=SIGQUIT celery -A proj worker -l info

Source: https://devcenter.heroku.com/articles/celery-heroku#using-remap_sigterm
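
As background to the question's observation that the KeyboardInterrupt and SystemExit handlers never fired: in plain Python only SIGINT is turned into an exception (KeyboardInterrupt) by default, while SIGTERM raises nothing unless a handler is installed for it. The sketch below illustrates that mechanism in a standalone script. It is not how Celery handles signals internally (the worker installs its own handlers), and the names GracefulExit and run_job are hypothetical, introduced only for this illustration.

```python
import os
import signal
import time


class GracefulExit(Exception):
    """Hypothetical exception used to surface SIGTERM as a Python exception."""


def _raise_graceful_exit(signum, frame):
    # Convert the signal into an ordinary exception so try/except can see it.
    raise GracefulExit(f"received signal {signum}")


def run_job(steps, on_step=None):
    """Run `steps` units of work and return how many finished before SIGTERM."""
    # Install the handler for the duration of the job (main thread only).
    previous = signal.signal(signal.SIGTERM, _raise_graceful_exit)
    done = 0
    try:
        for i in range(steps):
            if on_step is not None:
                on_step(i)
            time.sleep(0.01)  # stand-in for real work
            done += 1
    except GracefulExit:
        # A real job would save its progress or re-queue itself here.
        pass
    finally:
        # Restore whatever handler was installed before the job started.
        signal.signal(signal.SIGTERM, previous)
    return done


if __name__ == "__main__":
    # Simulate a redeploy by sending SIGTERM to this process during step 2.
    def interrupt_at_third_step(i):
        if i == 2:
            os.kill(os.getpid(), signal.SIGTERM)

    print(run_job(10, interrupt_at_third_step))
```

Raising an exception from the handler keeps the cleanup logic in one ordinary except block; setting a flag and checking it between steps is a common alternative when the work must not be interrupted mid-step.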