Database for reporting and day-to-day transactions
I have a system that holds a large amount of data. The database is SQL Server. One of the tables has around 300,000 rows, and there are quite a few tables of this size. This table is updated regularly; we call this the "transactional database", where the transactions happen.
Now we need to implement reporting functionality. Some of our architects are proposing a separate database that is a copy of this one plus some additional tables for reporting. They propose this because they do not want to disrupt the transactional database. For this, data has to be moved to the reporting database frequently. My question is: is a second database really required for this purpose? Can we use the transactional database itself for reporting? Since the data has to be moved to a different database, there will be latency, which would not be the case if the transactional database itself were used for reporting. I would appreciate some expert advice.
Recommended answer
You need to do some research into ETL, data warehousing and reporting databases, as I think your architects may be addressing this in a good way. Since you don't give details of the actual reports, I'll try to answer the general case.
(Disclaimer: I work in this field and we have products geared to this.)
Transactional databases are optimised for a good balance between reads, updates and inserts, and their indexes and table normalisation are geared to that end.
Reporting databases are optimised for read access above all else. This means that the 'normal' normalisation rules one would apply to a transactional database don't necessarily apply. In fact, a high degree of denormalisation may be in place to make report queries more efficient and simpler to manage.
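To make the denormalisation point concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the schema (`customers`, `orders`, `report_orders`) is hypothetical, invented purely for illustration. The reporting table copies customer attributes onto every order row so that the report query needs no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised transactional tables (hypothetical schema)
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_date TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
    INSERT INTO orders VALUES
        (10, 1, '2024-01-05', 100.0),
        (11, 1, '2024-01-20', 50.0),
        (12, 2, '2024-02-03', 75.0);

    -- Denormalised reporting table: customer attributes are repeated per order
    CREATE TABLE report_orders AS
    SELECT o.id AS order_id, o.order_date, o.amount,
           c.name AS customer_name, c.region
    FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

def sales_by_region(connection):
    """Aggregate directly from the flat reporting table, with no join."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM report_orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)

region_totals = sales_by_region(conn)
```

The trade-off is the usual one: the flat table duplicates data and has to be rebuilt or refreshed, but report queries stay simple and never touch the transactional tables.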
Running complex queries on a transactional database (especially aggregations over extended data ranges such as historical time frames) may degrade performance to the point that the key users of the database, the transaction generators, are negatively affected.
Though a reporting database may not be required in your situation, you may find that it's simpler to keep the two use cases separate.
Your concern about data latency is a real one. How much it matters can only be answered by the business users who will consume the reports. People often say "We want real-time info" when in fact many, if not all, of their requirements can be met with non-real-time info. The acceptable degree of data staleness is something only they can decide.
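As a rough illustration of where the staleness comes from, the sketch below runs a periodic incremental load between two in-memory `sqlite3` databases standing in for the transactional and reporting servers. The table names (`txn`, `report_txn`) and the "highest id already loaded" watermark are assumptions made for this example. It only picks up newly inserted rows; updates to existing rows would need something more, such as a last-modified column or change tracking.

```python
import sqlite3

# Two in-memory databases stand in for the transactional and reporting servers.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, amount REAL)")
target.execute("CREATE TABLE report_txn (id INTEGER PRIMARY KEY, amount REAL)")

def incremental_load(src, dst):
    """Copy rows with an id above the highest id already in the report table."""
    last_id = dst.execute(
        "SELECT COALESCE(MAX(id), 0) FROM report_txn"
    ).fetchone()[0]
    new_rows = src.execute(
        "SELECT id, amount FROM txn WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    dst.executemany("INSERT INTO report_txn VALUES (?, ?)", new_rows)
    dst.commit()
    return len(new_rows)

def report_row_count(dst):
    return dst.execute("SELECT COUNT(*) FROM report_txn").fetchone()[0]

# First scheduled load picks up the two existing transactions.
source.executemany("INSERT INTO txn VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
first_batch = incremental_load(source, target)

# A new transaction arrives; the report stays stale until the next load runs.
source.execute("INSERT INTO txn VALUES (3, 30.0)")
rows_before_refresh = report_row_count(target)
second_batch = incremental_load(source, target)
rows_after_refresh = report_row_count(target)
```

The gap between the two loads is the staleness window. How often the load is scheduled is exactly the number the business users need to sign off on.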
In fact, I'd suggest that you take your research slightly further and look at multidimensional cubes for your reporting concerns, as opposed to just reporting databases. They are designed to abstract your reporting concerns to a whole new level.
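For a feel of what a cube does, here is a toy sketch in plain Python. It is not how any particular OLAP product works internally. The fact rows and dimension names are made up, and the idea shown is only that totals are pre-computed for every combination of dimensions so that a report becomes a lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: each has three dimensions and one measure (amount).
facts = [
    {"region": "North", "product": "A", "month": "2024-01", "amount": 100},
    {"region": "North", "product": "B", "month": "2024-01", "amount": 50},
    {"region": "South", "product": "A", "month": "2024-02", "amount": 75},
]
DIMENSIONS = ("region", "product", "month")

def build_cube(rows, dimensions):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row["amount"]
    return dict(cube)

cube = build_cube(facts, DIMENSIONS)

# Reports are now dictionary lookups rather than scans over the facts.
grand_total = cube[((), ())]
north_total = cube[(("region",), ("North",))]
product_a_feb = cube[(("product", "month"), ("A", "2024-02"))]
```

Real cube engines add hierarchies, storage optimisations and a query language on top, but the pre-aggregation across dimensions is the core of why report queries against them tend to be fast.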