Should I normalize my database?

2021-12-20 00:00:00 optimization database mysql database-normalization rdbms

When designing a schema for a database (e.g. MySQL), the question arises of whether or not to fully normalize the tables.

On one hand, joins (and foreign key constraints, etc.) are very slow; on the other hand, without normalization you get redundant data and the potential for inconsistency.

Is "optimize last" the correct approach here? That is, create a by-the-book normalized DB and then see what can be denormalized to achieve the best speed gain.

My fear with this approach is that I will settle on a DB design that might not be fast enough, and at that stage refactoring the schema (while supporting existing data) would be very painful. This is why I'm tempted to temporarily forget everything I learned about "proper" RDBMS practices and try the "flat table" approach for once.

Should the fact that this DB is going to be insert-heavy affect the decision?

Recommended answer

A philosophical answer: Sub-optimal (relational) databases are rife with insert, update, and delete anomalies. These all lead to inconsistent data and therefore poor data quality. If you can't trust the accuracy of your data, what good is it? Ask yourself this: do you want the right answers slower, or the wrong answers faster?
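To make "update anomaly" concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL purely so that it runs without a server, and the `orders_flat` table and its columns are hypothetical. The customer's email is repeated on every order row, so an update that misses one row leaves the data contradicting itself.

```python
import sqlite3


def conflicting_emails():
    """Return the distinct emails stored for one customer after a partial update."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical flat table: customer details are copied onto every order.
    conn.execute(
        "CREATE TABLE orders_flat ("
        "id INTEGER PRIMARY KEY, customer TEXT, customer_email TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders_flat VALUES (?, ?, ?)",
        [
            (1, "alice", "alice@old.example"),
            (2, "alice", "alice@old.example"),
            (3, "bob", "bob@example.com"),
        ],
    )
    # The update reaches only one of Alice's two rows.
    conn.execute(
        "UPDATE orders_flat SET customer_email = 'alice@new.example' WHERE id = 1"
    )
    rows = conn.execute(
        "SELECT DISTINCT customer_email FROM orders_flat "
        "WHERE customer = 'alice' ORDER BY customer_email"
    ).fetchall()
    conn.close()
    return [email for (email,) in rows]
```

In a normalized design the email would live in a single row of a customers table, so there would be only one place to update.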

As a practical matter: get it right before you get it fast. We humans are very bad at predicting where bottlenecks will occur. Make the database great, measure the performance over a decent period of time, then decide whether you need to make it faster. Before you denormalize and sacrifice accuracy, try other techniques: can you get a faster server, connection, DB driver, etc.? Might stored procedures speed things up? How are the indexes and their fill factors? Only if those and other performance-tuning techniques do not do the trick should you consider denormalization. Then measure the performance again to verify that you got the increase in speed that you "paid for". Make sure that you are performing optimization, not pessimization.
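As one illustration of "check the indexes first", the sketch below compares the query plan for the same query before and after adding an index. It is written against SQLite through Python's sqlite3 module as a stand-in; MySQL has its own `EXPLAIN` statement with different output. The `orders` table, the `idx_orders_status` index, and the helper names are all hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def plans_before_and_after_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, status_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (item, status_id) VALUES (?, ?)",
        [(f"item{i}", i % 5) for i in range(1000)],
    )
    sql = "SELECT item FROM orders WHERE status_id = ?"

    # Without an index, the filter requires reading the whole table.
    before = query_plan(conn, sql, (2,))

    # Hypothetical index on the filtered column.
    conn.execute("CREATE INDEX idx_orders_status ON orders (status_id)")
    after = query_plan(conn, sql, (2,))
    conn.close()
    return before, after
```

Printing the two plans shows whether the database actually picked up the index. A plan is only a hint, though; timing the real workload is still what decides whether a change paid off.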

Q: So if I optimize last, can you recommend a reasonable way to migrate data after the schema is changed? If, for example, I decide to get rid of a lookup table, how can I migrate the existing database to this new design?

A: Sure.

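1. Make a backup.
2. Make another backup to a different device.
3. Create the new tables with "select into newtable from oldtable..." type commands. You will need to do some joins to combine previously distinct tables.
4. Drop the old tables.
5. Rename the new tables.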
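The steps above can be sketched as follows. This is an illustration under assumptions, not a production migration: it uses SQLite through Python's sqlite3 module so that it runs anywhere, the `orders` and `status` tables are hypothetical, and the backup goes to a second in-memory database instead of a separate device. It also uses `CREATE TABLE ... AS SELECT` in place of "select into"; as far as I know MySQL does not accept `SELECT ... INTO` a new table and offers `CREATE TABLE ... SELECT` instead.

```python
import sqlite3


def build_normalized_db():
    """Create a tiny normalized schema (hypothetical names) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            item TEXT NOT NULL,
            status_id INTEGER NOT NULL REFERENCES status(id)
        );
        INSERT INTO status VALUES (1, 'open'), (2, 'shipped');
        INSERT INTO orders VALUES (1, 'book', 1), (2, 'lamp', 2), (3, 'desk', 2);
        """
    )
    return conn


def fold_lookup_into_orders(conn):
    """Replace orders + status with one flat orders table; return the backup."""
    # Steps 1-2: back up before touching anything (here: an in-memory copy).
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)

    # Step 3: build the new table from a join of the old ones.
    # Note: this copies data only; keys and indexes must be recreated separately.
    conn.execute(
        """
        CREATE TABLE orders_new AS
        SELECT o.id, o.item, s.name AS status
        FROM orders AS o
        JOIN status AS s ON s.id = o.status_id
        """
    )

    # Step 4: drop the old tables.
    conn.execute("DROP TABLE orders")
    conn.execute("DROP TABLE status")

    # Step 5: give the new table the old name.
    conn.execute("ALTER TABLE orders_new RENAME TO orders")
    conn.commit()
    return backup
```

Calling `fold_lookup_into_orders(build_normalized_db())` leaves a single `orders` table with a text `status` column, while the returned backup connection still holds the original two tables.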

BUT... consider a more robust approach:

Create some views on your fully normalized tables right now. Those views (virtual tables, "windows" on the data... ask me if you want to know more about this topic) would have the same defining query as step three above. When you write your application or DB-layer logic, use the views (at least for read access; updatable views are... well, interesting). Then, if you denormalize later, create a new table as above, drop the view, and rename the new base table to whatever the view was called. Your application/DB layer won't know the difference.
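A minimal sketch of that idea, again using SQLite through Python's sqlite3 module as a stand-in for MySQL. The view name `order_report`, the underlying tables, and the helper functions are hypothetical. The point to notice is that `read_report` is identical before and after the swap.

```python
import sqlite3

# Defining query shared by the view and, later, by the flat table.
VIEW_QUERY = """
    SELECT o.id, o.item, s.name AS status
    FROM orders AS o
    JOIN status AS s ON s.id = o.status_id
"""


def build_db_with_view():
    """Normalized tables plus a read-only view over them (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        f"""
        CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            item TEXT NOT NULL,
            status_id INTEGER NOT NULL
        );
        INSERT INTO status VALUES (1, 'open'), (2, 'shipped');
        INSERT INTO orders VALUES (1, 'book', 1), (2, 'lamp', 2);
        CREATE VIEW order_report AS {VIEW_QUERY};
        """
    )
    return conn


def read_report(conn):
    """Application-side read; it only ever refers to the name order_report."""
    return conn.execute(
        "SELECT id, item, status FROM order_report ORDER BY id"
    ).fetchall()


def swap_view_for_table(conn):
    """Denormalize later: materialize the view, drop it, reuse its name."""
    conn.execute("CREATE TABLE order_report_flat AS SELECT * FROM order_report")
    conn.execute("DROP VIEW order_report")
    conn.execute("ALTER TABLE order_report_flat RENAME TO order_report")
    conn.commit()
```

After the swap, writes have to target the flat table (or something has to keep it in sync with the normalized tables), which is part of the "more to this in practice".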

There's actually more to this in practice, but this should get you started.
