Architecture design for data consistency in a distributed analytics system

I am refactoring an analytics system that will do a lot of calculation, and I need some ideas on possible architectural designs to solve a data consistency issue I am facing.

Current architecture

I have a queue-based system in which different requesting applications create messages that are eventually consumed by workers.

Each "Requesting App" breaks down a large calculation into smaller pieces that will be sent to the queue and processed by the workers.

When all the pieces are finished, the originating "Requesting App" consolidates the results.
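
The following is a minimal Python sketch of this fan-out/fan-in flow. The real system is written in C#, so this is only an illustration, and it makes these assumptions: an in-process `queue.Queue` stands in for the message queue, threads stand in for workers that would really run on separate servers, and the "large calculation" is a hypothetical sum of squares.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Message:
    piece: int    # index of this piece within the large calculation
    values: list  # the slice of input data this piece works on


def split(data, piece_size):
    """Requesting App: break the large calculation into smaller messages."""
    return [Message(i, data[start:start + piece_size])
            for i, start in enumerate(range(0, len(data), piece_size))]


def worker(work_q, results_q):
    """Worker: process messages until it receives the None sentinel."""
    while True:
        msg = work_q.get()
        if msg is None:
            break
        # Hypothetical per-piece work: sum of squares of the slice.
        results_q.put((msg.piece, sum(v * v for v in msg.values)))


def run_calculation(data, piece_size=3, n_workers=2):
    work_q, results_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(work_q, results_q))
               for _ in range(n_workers)]
    for t in workers:
        t.start()
    messages = split(data, piece_size)
    for msg in messages:
        work_q.put(msg)
    for _ in workers:
        work_q.put(None)  # one stop sentinel per worker
    for t in workers:
        t.join()
    # Requesting App: consolidate only after every piece has come back.
    partials = dict(results_q.get() for _ in messages)
    return sum(partials.values())


print(run_calculation(list(range(10))))  # 285
```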

The workers also read information from a central database (SQL Server) in order to process the requests. (Important: the workers do not change any data in the database; they only read it.)

The problem

OK, so far, so good. The problem arises when we add a web service that updates the information in the database. An update can happen at any time, but it is critical that every piece of a "large calculation" originating from a "Requesting App" sees the same data in the database.

For example:

  1. App A generates messages A1 and A2 and sends them to the queue.
  2. Worker W1 picks up message A1 for processing.
  3. The web service updates the database, changing it from state S0 to S1.
  4. Worker W2 picks up message A2 for processing.

I can't have worker W2 use state S1 of the database. For the whole calculation to be consistent, W2 must use the previous state, S0.

Ideas

  1. A lock pattern to prevent the web server from changing the database while there is a worker consuming information from it.

  • cons: The lock might be held for a long time, since the calculations from different "Requesting Apps" may overlap (A1, B1, A2, B2, C1, B3, etc.).

  2. Create a new layer between the database and the workers (a server that controls database caching per Requesting App).

  • cons: Adding another layer might impose significant overhead (maybe?), and it is a lot of work, since I will have to rewrite the persistence of the workers (a lot of code).
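
The first idea is a shared/exclusive (reader-writer) lock. Each job would hold it in shared mode from its first piece until consolidation, and the web service would need it in exclusive mode to apply an update. The following in-process Python sketch uses a hypothetical `ReadWriteLock` class. Because the workers may run on different servers, a real version would need a distributed lock instead. SQL Server's `sp_getapplock`, with its `Shared` and `Exclusive` lock modes, is one possible candidate, offered here only as an assumption.

```python
import threading


class ReadWriteLock:
    """Shared/exclusive lock: many jobs may read at once, but the web
    service's update needs exclusive access. Readers are preferred."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        # Taken once when a job starts (not once per message), so an update
        # cannot slip in between two pieces of the same job.
        with self._cond:
            self._cond.wait_for(lambda: not self._writing)
            self._readers += 1

    def release_read(self):
        # Released after the Requesting App has consolidated the job.
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, timeout=None):
        # Web service: wait until no job holds the lock; False on timeout.
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writing and self._readers == 0, timeout)
            if ok:
                self._writing = True
            return ok

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


lock = ReadWriteLock()
lock.acquire_read()                       # job A starts
print(lock.acquire_write(timeout=0.05))   # False: the update must wait for A
lock.release_read()                       # job A consolidated
print(lock.acquire_write(timeout=0.05))   # True
lock.release_write()
```

Because this lock prefers readers, overlapping jobs (A1, B1, A2, B2, ...) can keep the reader count above zero indefinitely. The update could then wait a long time, which is the drawback listed for this idea.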

I am leaning toward the second solution, but I am not very confident about it.

Any brilliant ideas? Am I designing this wrong, or missing something?

Notes:

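  • This is a huge two-tier legacy system (in C#) that we are trying to evolve into a more scalable solution with as little effort as possible.
  • Each worker may run on a different server.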

Recommended answer

Thanks, everyone, for your help.

Since I believe this problem may be common in other scenarios, I would like to share the solution we chose.

Thinking about the problem more thoroughly, I understood what it really was:

  • I needed some kind of session control for each job.
  • There was an in-process cache serving as the session control for each job.

Now that the calculation has become distributed, I just needed to make my cache distributed as well.

To do that, we chose an in-memory key-value database (with hash values), deployed as a separate server (in this case, Redis).

Now, every time I start a job, I create an ID for it and pass that ID along in each of the job's messages.
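
The following is a minimal Python sketch of this step. It assumes a random UUID is an acceptable job ID, since the original does not say how IDs are generated. `WorkMessage` and `start_job` are hypothetical names.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkMessage:
    job_id: str    # shared by every piece of one large calculation
    piece: int     # index of the piece within the job
    payload: dict  # hypothetical piece-specific parameters


def start_job(pieces):
    """Create a fresh ID for the job and stamp it on each of its messages."""
    job_id = uuid.uuid4().hex
    return job_id, [WorkMessage(job_id, i, p) for i, p in enumerate(pieces)]


job_id, messages = start_job([{"from": 0, "to": 100}, {"from": 100, "to": 200}])
print(len(messages), all(m.job_id == job_id for m in messages))  # 2 True
```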

When a worker needs some information from the database, it will:

  1. Look up the data in Redis (using the job ID).
  2. If the data is in Redis, use it.
  3. If not, load it from SQL Server and save it in Redis (under the job ID).
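
The lookup above is a cache-aside read scoped to the job. The following minimal Python sketch makes these assumptions:

- `FakeRedis` is an in-memory stand-in so the code runs without a server. Its `hget`/`hset` share their names with redis-py's client methods.
- `load_from_sql` is a hypothetical placeholder for the workers' existing SQL Server query.
- Cached data lives in hashes named `job:<job-id>:<table>`, which is a hypothetical key layout.

The usage lines replay the A1/A2 example from the question.

```python
import json


class FakeRedis:
    """In-memory stand-in for a Redis client (assumption). A real redis-py
    client returns bytes, which json.loads also accepts."""

    def __init__(self):
        self._hashes = {}

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        fields = self._hashes.setdefault(name, {})
        added = key not in fields
        fields[key] = value
        return int(added)


def load_from_sql(db, table, row_id):
    """Hypothetical stand-in for the workers' existing SQL Server query."""
    return db[table][row_id]


def get_for_job(cache, db, job_id, table, row_id):
    name = f"job:{job_id}:{table}"          # hypothetical layout: one hash per job and table
    cached = cache.hget(name, str(row_id))  # 1. look the data up in Redis by job ID
    if cached is not None:
        return json.loads(cached)           # 2. found: use the job's copy
    value = load_from_sql(db, table, row_id)          # 3. not found: load from SQL...
    cache.hset(name, str(row_id), json.dumps(value))  # ...and save it under the job ID
    return value


db = {"prices": {1: 10.0}}                        # state S0
cache = FakeRedis()
print(get_for_job(cache, db, "A", "prices", 1))  # W1 handles A1: 10.0
db["prices"][1] = 12.5                            # web service moves the DB to S1
print(get_for_job(cache, db, "A", "prices", 1))  # W2 handles A2: still 10.0
print(get_for_job(cache, db, "B", "prices", 1))  # a new job B sees S1: 12.5
```

As written, a row is pinned only from the first time some worker of the job reads it. A row that the job first touches after the update will still come from S1. Also, two workers that miss on the same row at the same moment can both load and store it. Writing with `HSETNX` (redis-py: `hsetnx`) and then re-reading would make them agree on the first value stored.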

At the end of the job, I clear all hashes associated with the job ID.
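
The following is a minimal Python sketch of the cleanup. It assumes a hypothetical key layout in which each job's cached data lives in hashes named `job:<job-id>:<table>`. `FakeRedis` is an in-memory stand-in whose `scan_iter` and `delete` share their names with redis-py's client methods. `clear_job` is a hypothetical helper.

```python
import fnmatch


class FakeRedis:
    """In-memory stand-in for a Redis client (assumption). fnmatch only
    approximates Redis glob patterns, which is enough for 'job:<id>:*'."""

    def __init__(self, hashes):
        self._hashes = dict(hashes)

    def scan_iter(self, match="*"):
        return iter([k for k in self._hashes if fnmatch.fnmatchcase(k, match)])

    def delete(self, *names):
        return sum(1 for n in names if self._hashes.pop(n, None) is not None)


def clear_job(cache, job_id):
    """Delete every hash cached for this job; return how many were removed."""
    # Collect the names first, then delete them in one call.
    names = list(cache.scan_iter(match=f"job:{job_id}:*"))
    # Redis rejects DEL with no keys, so skip the call when nothing matched.
    return cache.delete(*names) if names else 0


cache = FakeRedis({"job:A:prices": {"1": "10.0"},
                   "job:A:rates": {"1": "0.5"},
                   "job:B:prices": {"1": "12.5"}})
print(clear_job(cache, "A"))           # 2
print(list(cache.scan_iter("job:*")))  # ['job:B:prices']
```

If a job crashes before reaching this step, its hashes would linger. Setting a TTL on each job hash with Redis's `EXPIRE` command is a common safety net, assuming jobs have a known maximum duration.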
