与时间属性相关的设计数据库

我想设计一个数据库,描述如下:每个产品在一个时间点只有一种状态.但是,产品的状态在其生命周期内可能会发生变化.如何设计产品和状态之间的关系,以便可以轻松查询当前特定状态的所有产品?另外,谁能给我一些关于设计数据库的深入细节,这些细节与上述问题的持续时间有关?感谢您的帮助

I want to design a database which is described as follows: Each product has only one status at one time point. However, the status of a product can change during its life time. How could I design the relationship between product and status which can easily be queried all product of a specific status at current time? In addition, could anyone please give me some in-depth details about design database which related to time duration as problem above? Thanks for any help

推荐答案

这里有一个模型可以实现您的既定要求.

Here is a model to achieve your stated requirement.

时间序列数据模型链接

IDEF1X 符号链接适用于不熟悉关系建模标准的人.

Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

  • 标准化为 5NF;没有重复的列;没有更新异常,没有空值.

  • Normalised to 5NF; no duplicate columns; no Update Anomalies, no Nulls.

当产品的状态发生变化时,只需在 ProductStatus 中插入一行,其中包含当前的日期时间.无需触摸前几行(这是真的,并且仍然是真的).报告工具(您的应用除外)无需解释任何虚拟值.

When the Status of a Product changes, simply insert a row into ProductStatus, with the current DateTime. No need to touch previous rows (which were true, and remain true). No dummy values which report tools (other than your app) have to interpret.

DateTime 是产品处于该状态的实际日期时间;从",如果你愿意的话.To"很容易推导出:它是产品的下一行(DateTime > From")的 DateTime;如果不存在,则值为当前 DateTime(使用 ISNULL).

The DateTime is the actual DateTime that the Product was placed in that Status; the "From", if you will. The "To" is easily derived: it is the DateTime of the next (DateTime > "From") row for the Product; where it does not exist, the value is the current DateTime (use ISNULL).

第一个模型完成;(ProductId, DateTime) 足以为主键提供唯一性.但是,由于您对某些查询条件要求速度,我们可以在物理层面增强模型,并提供:

The first model is complete; (ProductId, DateTime) is enough to provide uniqueness, for the Primary Key. However, since you request speed for certain query conditions, we can enhance the model at the physical level, and provide:

  • 一个索引(我们已经有了 PK 索引,所以我们将在添加第二个索引之前先增强它)以支持涵盖的查询(可以提供基于 { ProductId | DateTime | Status } 的任何安排的查询)通过索引,而不必转到数据行).这将 Status::ProductStatus 关系从非识别(虚线)更改为识别类型(实线).

  • An Index (we already have the PK Index, so we will enhance that first, before adding a second index) to support covered queries (those based on any arrangement of { ProductId | DateTime | Status } can be supplied by the Index, without having to go to the data rows). Which changes the Status::ProductStatus relation from Non-Identifying (broken line) to Identifying type (solid line).

PK 安排的选择是基于大多数查询将是时间序列,基于 Product⇢DateTime⇢Status.

The PK arrangement is chosen on the basis that most queries will be Time Series, based on Product⇢DateTime⇢Status.

提供第二个索引以提高基于状态的查询速度.

The second index is supplied to enhance the speed of queries based on Status.

在替代安排中,情况相反;即,我们最想要的是所有产品的当前状态.

In the Alternate Arrangement, that is reversed; ie, we mostly want the current status of all Products.

在 ProductStatus 的所有再现中,二级索引(不是 PK)中的 DateTime 列是 DESCending;最新的排在第一位.

In all renditions of ProductStatus, the DateTime column in the secondary Index (not the PK) is DESCending; the most recent is first up.

我已经提供了您要求的讨论.当然,您需要对合理大小的数据集进行试验,并做出自己的决定.如果这里有什么不明白的,请追问,我会补充.

I have provided the discussion you requested. Of course, you need to experiment with a data set of reasonable size, and make your own decisions. If there is anything here that you do not understand, please ask, and I will expand.

报告当前状态为 2 的所有产品

Report all Products with Current State of 2

SELECT  ProductId,
        Description
    FROM  Product       p,
          ProductStatus ps
    WHERE p.ProductId = ps.ProductId  -- Join
    AND   StatusCode  = 2             -- Request
    AND   DateTime    = (             -- Current Status on the left ...
        SELECT MAX(DateTime)          -- Current Status row for outer Product
            FROM  ProductStatus ps_inner
            WHERE p.ProductId = ps_inner.ProductId
            )

  • ProductId 被索引,前导列,两边

  • ProductId is Indexed, leading col, both sides

DateTime 在索引中,在覆盖查询选项中的第二列

DateTime in Indexed, 2nd col in Covered Query Option

StatusCode 被索引,覆盖查询选项中的第三列

StatusCode is Indexed, 3rd col in Covered Query Option

由于Index中的StatusCode是DESCending,所以只需要一次fetch就可以满足内部查询

Since StatusCode in the Index is DESCending, only one fetch is required to satisfy the inner query

对于一个查询,同时需要行;它们靠得很近(由于 Clstered 索引);由于行数较短,几乎总是在同一页面上.

the rows are required at the same time, for the one query; they are close together (due to Clstered Index); almost always on the same page due to the short row size.

这是普通的SQL,一个子查询,利用SQL引擎的强大,关系集处理.这是一种正确的方法,没有什么比这更快,任何其他方法都会更慢.任何报告工具只需点击几下即可生成此代码,无需输入.

This is ordinary SQL, a subquery, using the power of the SQL engine, Relational set processing. It is the one correct method, there is nothing faster, and any other method would be slower. Any report tool will produce this code with a few clicks, no typing.

ProductStatus 中的两个日期

诸如 DateTimeFrom 和 DateTimeTo 之类的列是严重错误.让我们按重要性排序.

Columns such as DateTimeFrom and DateTimeTo are gross errors. Let's take it in order of importance.

  1. 这是一个严重的标准化错误.DateTimeTo"很容易从下一行的单个 DateTime 派生出来;因此它是多余的,重复的列.

  1. It is a gross Normalisation error. "DateTimeTo" is easily derived from the single DateTime of the next row; it is therefore redundant, a duplicate column.

  • 精度不存在:这可以通过数据类型(日期、日期时间、SMALLDATETIME)轻松解决.是否显示少一秒、微秒或纳秒,是一项业务决策;它与存储的数据无关.

实现 DateTo 列是 100% 重复(下一行的 DateTime).这需要两倍的磁盘空间.对于大表来说,这将是大量不必要的浪费.

Implementing a DateTo column is a 100% duplicate (of DateTime of the next row). This takes twice the disk space. For a large table, that would be significant unnecessary waste.

鉴于它是一个短行,每次访问时,您将需要两倍的逻辑和物理 I/O 来读取表.

Given that it is a short row, you will need twice as many logical and physical I/Os to read the table, on every access.

并且两倍的缓存空间(或者换句话说,任何给定的缓存空间中只有一半的行可以容纳).

And twice as much cache space (or put another way, only half as many rows would fit into any given cache space).

通过引入重复列,您引入了出错的可能性(现在可以通过两种方式导出值:从重复的 DateTimeTo 列或下一行的 DateTimeFrom).

By introducing a duplicate column, you have introduced the possibility of error (the value can now be derived two ways: from the duplicate DateTimeTo column or the DateTimeFrom of the next row).

这也是一个更新异常.当您更新任何 DateTimeFrom 被更新时,前一行的 DateTimeTo 必须被获取(没什么大不了,因为它已经关闭)和更新(大不了,因为它是一个可以避免的附加动词).

This is also an Update Anomaly. When you update any DateTimeFrom is Updated, the DateTimeTo of the previous row has to be fetched (no big deal as it is close) and Updated (big deal as it is an additional verb that can be avoided).

更短"和编码快捷方式"无关紧要,SQL 是一种繁琐的数据操作语言,但SQL 就是我们所拥有的(Just Deal With It).任何不能编码子查询的人都不应该编码.任何复制列以减轻次要编码困难"的人都不应该对数据库进行建模.

"Shorter" and "coding shortcuts" are irrelevant, SQL is a cumbersome data manipulation language, but SQL is all we have (Just Deal With It). Anyone who cannot code a subquery really should not be coding. Anyone who duplicates a column to ease minor coding "difficulty" really should not be modelling databases.

请注意,如果维持最高阶规则(归一化),则整个低阶问题都将被消除.

Note well, that if the highest order rule (Normalisation) was maintained, the entire set of lower order problems are eliminated.

从集合的角度考虑

  • 任何在编写简单 SQL 时遇到困难"或痛苦"的人都无法执行其工作职能.通常,开发人员不考虑集合,而关系数据库是面向集合的模型.

  • Anyone having "difficulty" or experiencing "pain" when writing simple SQL is crippled in performing their job function. Typically the developer is not thinking in terms of sets and the Relational Database is set-oriented model.

对于上面的查询,我们需要当前日期时间;由于 ProductStatus 是按时间顺序排列的产品状态的集合,我们只需要属于产品的集合的最新或 MAX(DateTime).

For the query above, we need the Current DateTime; since ProductStatus is a set of Product States in chronological order, we simply need the latest, or MAX(DateTime) of the set belonging to the Product.

现在让我们看一些所谓的困难",就集合而言.对于每个产品处于特定状态的持续时间的报告:DateTimeFrom 是一个可用列,并定义了水平截止值,一个子 set(我们可以排除较早的行);DateTimeTo 是产品状态子集合中最早的.

Now let's look at something allegedly "difficult", in terms of sets. For a report of the duration that each Product has been in a particular State: the DateTimeFrom is an available column, and defines the horizontal cut-off, a sub set (we can exclude earlier rows); the DateTimeTo is the earliest of the sub set of Product States.

SELECT               ProductId,
                     Description,
        [DateFrom] = DateTime,
        [DateTo]   = (
        SELECT MIN(DateTime)                        -- earliest in subset
            FROM  ProductStatus ps_inner
            WHERE p.ProductId = ps_inner.ProductId  -- our Product
            AND   ps_inner.DateTime > ps.DateTime   -- defines subset, cutoff
            )
    FROM  Product       p,
          ProductStatus ps
    WHERE p.ProductId = ps.ProductId 
    AND   StatusCode  = 2             -- Request

  • 考虑获取下一行是面向行的,不是面向集合的处理.使用面向集合的数据库时会造成严重后果.让优化器为您做所有的思考.检查您的 SHOWPLAN,这优化得很好.

  • Thinking in terms of getting the next row is row-oriented, not set-oriented processing. Crippling, when working with a set-oriented database. Let the Optimiser do all that thinking for you. Check your SHOWPLAN, this optimises beautifully.

无法在集合中思考,因此仅限于编写单级查询,这不是合理的理由:在数据库中实施大量重复和更新异常;浪费在线资源和磁盘空间;保证一半的性能.学习如何编写简单的 SQL 子查询以获取轻松派生的数据要便宜得多.

Inability to think in sets, thus being limited to writing only single-level queries, is not a reasonable justification for: implementing massive duplication and Update Anomalies in the database; wasting online resources and disk space; guaranteeing half the performance. Much cheaper to learn how to write simple SQL subqueries to obtain easily derived data.

相关文章