Using a NoSQL database over MySQL
I have a web application running on a Java stack (Struts 2 + Spring + Hibernate) and persisted in MySQL. I looked at NoSQL databases, and they are certainly easier to reason about and work with than an RDBMS. It's a music streaming app which stores artist information and allows users to save playlists.
I am wondering whether there are any advantages (performance? hardware cost? simpler code? scalability?) to switching to a NoSQL DB (CouchDB? MongoDB? Cassandra?). What would I lose or gain by switching to a NoSQL database?
Please advise.
Recommended answer
The polite interpretation of "NoSQL" has become Not Only SQL. If you have data that is indeed truly relational, or if your functionality depends on things like joins and ACIDity, then you should store that data in a relational way. In this post, I'll explain how I use MySQL alongside two NoSQL data stores. Modern, web-scale data storage is all about understanding how to pick the best tool(s) for the job(s).
That said, NoSQL is really a reaction to the fact that the relational method and way of thinking have been applied to problems where they are not actually a very good fit (typically huge tables with tens of millions of rows or more). Once tables get that large, the typical SQL "best practice" has been to manually shard the data: that is, putting records 1 through 10,000,000 in table A, 10,000,001 through 20,000,000 in table B, and so on. Then, typically in the application model layer, the lookups are performed according to this scheme. This is what's called application-aware scaling. It's time-intensive and error-prone, but it has become a more or less standard MO for scaling something up while keeping MySQL as the store for very long tables. NoSQL represents, to me, the application-unaware alternative.
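The following is a minimal Java sketch of the routing logic that the model layer ends up owning under application-aware scaling. It makes several assumptions:

* numeric IDs start at 1;
* each table holds a fixed range of 10,000,000 IDs;
* `RangeShardRouter`, `tableFor`, `selectById`, and the table names `tracks_a`/`tracks_b` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical application-aware router for manually range-sharded tables.
// Assumes IDs start at 1 and every table holds exactly `rangeSize` IDs.
class RangeShardRouter {
    // Key: first ID stored in a table; value: that table's name.
    private final TreeMap<Long, String> rangeStarts = new TreeMap<>();
    private final long rangeSize;

    RangeShardRouter(long rangeSize, String... tables) {
        if (rangeSize < 1 || tables.length == 0) {
            throw new IllegalArgumentException("need a positive range size and at least one table");
        }
        this.rangeSize = rangeSize;
        for (int i = 0; i < tables.length; i++) {
            rangeStarts.put(1 + i * rangeSize, tables[i]);
        }
    }

    // Returns the table holding `id`, e.g. 1..10,000,000 -> first table.
    String tableFor(long id) {
        Map.Entry<Long, String> range = rangeStarts.floorEntry(id);
        if (range == null || id >= range.getKey() + rangeSize) {
            throw new IllegalArgumentException("no table configured for ID " + id);
        }
        return range.getValue();
    }

    // Every lookup must be built against the routed table. Table names come
    // from configuration, never from user input, so concatenation is safe here.
    String selectById(long id) {
        return "SELECT * FROM " + tableFor(id) + " WHERE id = ?";
    }
}

class RangeShardDemo {
    public static void main(String[] args) {
        RangeShardRouter router = new RangeShardRouter(10_000_000L, "tracks_a", "tracks_b");
        System.out.println(router.selectById(42L));          // routed to tracks_a
        System.out.println(router.selectById(10_000_001L));  // routed to tracks_b
    }
}
```

Every query path has to go through a router like this. Adding a table or rebalancing ranges means changing application code and migrating rows by hand, which is where the time and the errors come from.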
Key-Value
When I had a MySQL prototype start getting too big for its own good, I personally moved as much data as possible to the lightning-fast Membase, which outperforms Memcached and adds persistence. Membase is a distributed key-value store that scales more or less linearly (Zynga uses it to handle a half-million ops per second, for instance) by adding more commodity servers into a cluster -- it's therefore a great fit for the cloud age of Amazon EC2, Joyent, etc.
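In an application-unaware store, the cluster (or its client library) decides which server holds a key, not the application.

The sketch below is a generic illustration of hash partitioning. It is not Membase's actual placement algorithm or client API. It works as follows:

* each key hashes (here with CRC32) into a fixed number of partitions;
* a partition map records which server owns each partition.

`PartitionMap`, `addServer`, and the server names are hypothetical. The sketch also ignores data copying and replication.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Generic hash-partitioning sketch (hypothetical; not Membase's client API).
// Keys hash to a fixed set of partitions; the map says which server owns each one.
class PartitionMap {
    private final String[] owner;                    // partition index -> server
    private final List<String> servers = new ArrayList<>();

    PartitionMap(int numPartitions, List<String> initialServers) {
        if (numPartitions < 1 || initialServers.isEmpty()) {
            throw new IllegalArgumentException("need partitions and at least one server");
        }
        owner = new String[numPartitions];
        servers.addAll(initialServers);
        for (int p = 0; p < numPartitions; p++) {
            owner[p] = servers.get(p % servers.size()); // round-robin initial layout
        }
    }

    // A key's partition never changes, no matter how many servers there are.
    int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % owner.length);
    }

    String serverFor(String key) {
        return owner[partitionFor(key)];
    }

    String ownerOf(int partition) {
        return owner[partition];
    }

    // Scale out: give the new server its fair share of partitions, taken only
    // from servers holding more than that share. All other partitions stay put.
    void addServer(String server) {
        servers.add(server);
        int target = owner.length / servers.size();
        Map<String, Integer> load = new HashMap<>();
        for (String s : owner) {
            load.merge(s, 1, Integer::sum);
        }
        int moved = 0;
        for (int p = 0; p < owner.length && moved < target; p++) {
            String current = owner[p];
            if (load.get(current) > target) {
                load.merge(current, -1, Integer::sum);
                owner[p] = server;
                moved++;
            }
        }
    }
}

class PartitionDemo {
    public static void main(String[] args) {
        PartitionMap map = new PartitionMap(1024, List.of("server-1", "server-2"));
        System.out.println("playlist:123 -> " + map.serverFor("playlist:123"));
        map.addServer("server-3"); // about a third of the partitions move to server-3
        System.out.println("playlist:123 -> " + map.serverFor("playlist:123"));
    }
}
```

Because a key's partition is fixed, adding a machine only reassigns some partitions (about a third when going from two servers to three). Every other key stays where it was. This is one common way a clustered store can grow by adding commodity servers.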
It's well known that distributed key-value stores are the best way to get enormous, linear scale. The weakness of key-value is queryability and indexing. But even in the relational world, the best practice for scalability is to offload as much effort as possible onto the application servers, doing joins in memory on commodity app servers instead of asking the central RDB cluster to handle all of that logic. Since simple selects plus application logic are really the best way to achieve massive scale even on MySQL, the transition to something like Membase (or competitors like Riak) isn't too bad.
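Here is one way the in-memory join described above can look. The app server fetches rows with simple key-based lookups and then combines them with a hash join, so the database never executes a JOIN. `Track`, `Artist`, and `tracksWithArtists` are hypothetical names, and the sketch uses Java 16+ records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rows, as returned by simple key-based selects (or a key-value multi-get).
record Artist(long id, String name) {}
record Track(long id, String title, long artistId) {}

class InMemoryJoin {
    // Hash join on the app server: index one side by key, then probe with the other.
    // Inner-join semantics: tracks whose artist was not fetched are dropped.
    static List<String> tracksWithArtists(List<Track> tracks, List<Artist> artists) {
        Map<Long, Artist> artistsById = new HashMap<>();
        for (Artist a : artists) {
            artistsById.put(a.id(), a);
        }
        List<String> rows = new ArrayList<>();
        for (Track t : tracks) {
            Artist a = artistsById.get(t.artistId());
            if (a != null) {
                rows.add(t.title() + " by " + a.name());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Track> playlist = List.of(
                new Track(1, "First Song", 10),
                new Track(2, "Second Song", 20));
        List<Artist> artists = List.of(new Artist(10, "Artist X"), new Artist(20, "Artist Y"));
        tracksWithArtists(playlist, artists).forEach(System.out::println);
    }
}
```

The join code does not care whether the two lists came from MySQL primary-key selects or from a key-value store's multi-get. That is why moving this style of workload to a key-value store is a modest change.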
Document Store
Sometimes (though I would argue less often than many think), an application's design inherently requires secondary indices, range queryability, etc. The NoSQL approach to this is a document store like MongoDB. Like Membase, Mongo is very good in some areas where relational databases are particularly weak, like application-unaware scaling, auto-sharding, and maintaining flat response times even as dataset size balloons. It's significantly slower than Membase, and pure horizontal scaling is a bit trickier, but the benefit is that it's highly queryable. You can query on parameters and ranges in real time, or you can use Map/Reduce to perform complex batch operations on truly enormous data sets.
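A secondary index adds something pure key-value access lacks: it can answer a range query without scanning every record. The in-memory model below keeps documents by primary key, plus a sorted index on one field.

MongoDB maintains indexes like this itself (as B-trees) and exposes ranges through query operators such as `$gte` and `$lt`. This model illustrates the idea only; it is not MongoDB's implementation or API. `Album`, `AlbumStore`, and `releasedBetween` are hypothetical names, and records require Java 16+.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical document type; the fields are illustrative.
record Album(String id, String title, int year) {}

// In-memory model of a primary-key store plus one sorted secondary index.
class AlbumStore {
    private final Map<String, Album> byId = new HashMap<>();              // key-value access
    private final TreeMap<Integer, Set<String>> byYear = new TreeMap<>(); // index on "year"

    void put(Album album) {
        Album old = byId.put(album.id(), album);
        if (old != null) {
            // Keep the index consistent when a document is replaced.
            Set<String> ids = byYear.get(old.year());
            ids.remove(old.id());
            if (ids.isEmpty()) {
                byYear.remove(old.year());
            }
        }
        byYear.computeIfAbsent(album.year(), y -> new TreeSet<>()).add(album.id());
    }

    Album get(String id) {
        return byId.get(id);
    }

    // Range query served by the index: fromYear inclusive, toYear exclusive.
    // Only matching index entries are visited; byId is never scanned.
    List<Album> releasedBetween(int fromYear, int toYear) {
        List<Album> result = new ArrayList<>();
        for (Set<String> ids : byYear.subMap(fromYear, true, toYear, false).values()) {
            for (String id : ids) {
                result.add(byId.get(id));
            }
        }
        return result;
    }
}

class AlbumStoreDemo {
    public static void main(String[] args) {
        AlbumStore store = new AlbumStore();
        store.put(new Album("a1", "Album One", 1991));
        store.put(new Album("a2", "Album Two", 1997));
        store.put(new Album("a3", "Album Three", 2003));
        store.releasedBetween(1990, 2000).forEach(a -> System.out.println(a.title()));
    }
}
```

The cost of this queryability is visible in `put`: every write must also update the index. That extra bookkeeping is part of why a document store is slower than a plain key-value store.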
On the same project I mentioned above, which uses Membase to serve tons of live player data, we use MongoDB to store analytics/metrics data, which is really where MongoDB shines.
Why Keep Things in SQL
I touched briefly on the fact that 'truly relational' information should stay in relational databases. As commenter Dan K. points out, I missed the part where I discuss the disadvantages of leaving RDBMS, or at least of leaving it entirely.
First, there's SQL itself. SQL is well known and has been an industry standard for a long time. Some "NoSQL" databases, like Google's App Engine Datastore (built on Bigtable), implement their own SQL-like language (Google's is called, cutely, GQL, for Google Query Language). MongoDB takes a fresh approach to the querying problem with its delightful JSON query objects. Still, SQL itself is a powerful tool for getting information out of data, which is often the whole point of databases to begin with.
The most important reason to stay with an RDBMS is ACID, or Atomicity, Consistency, Isolation, Durability. I won't rehash the state of ACID in NoSQL, as it's well addressed in this post on SO. Suffice it to say, there's a rational reason Oracle's RDBMS has such a huge market that isn't going anywhere: some data needs pure ACID compliance. If your data does (and if it does, you're probably well aware of that fact), then so does your database. Keep that pH low!
Check out Aaronaught's post here. He represents the business-to-business perspective far better than I could, in part because I've spent my entire career in the consumer space.