Differences between Hadoop distributions

2022-01-14 00:00:00 hadoop mapreduce java

Can somebody outline the differences between the available Hadoop distributions:

  • Cloudera - http://www.cloudera.com/hadoop
  • Yahoo - http://developer.yahoo.net/blogs/hadoop/

using the Apache Hadoop distro as a baseline.

Is there a good reason to use one of these distributions over the standard Apache Hadoop distro?

Recommended answer

Disclaimer: I interned at Cloudera this summer (but some of my best friends are at Yahoo! :-))

The Yahoo distribution is a version of Hadoop 20 that they run (ran?) on some subset of their clusters. It includes a set of patches for stability, bug fixes, etc. It is a source release; it does not have admin-friendly features like rpm or debian packages, etc.

The Cloudera distribution is packaged as rpms and debs (the source is also available). This means you can get updates via standard methods, etc. It also includes stability and bug fix patches. It is constantly maintained (not to say Yahoo's isn't -- I suppose one could just go on GitHub and check when they last updated it). It also packages Pig and Hive.

Cloudera's distribution of Hadoop 20 is in beta, and 18 is considered stable (more on this on the Cloudera blog). The 18 version also includes packages for Hive and Pig; for 20, you have to build them yourself (there aren't official releases of Pig or Hive that support 20 yet, although patches exist). There may well be significant overlap between the Cloudera and Yahoo versions of 20; both provide manifests, so you can check. The latest documentation of Cloudera's distros is at http://archive.cloudera.com
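If you want to check that overlap yourself, here is a minimal sketch of how the two manifests could be compared. It assumes each manifest is plain text that names applied patches by Apache JIRA ID (for example `HADOOP-1234`); the actual manifest formats may differ. The function names and the sample data are made up for illustration.

```python
import re

# Assumption: each manifest is plain text that mentions applied patches by
# Apache JIRA ID (e.g. HADOOP-1234). The real manifest format may differ.
JIRA_ID = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def patch_ids(manifest_text):
    """Return the set of JIRA IDs mentioned in a manifest."""
    return set(JIRA_ID.findall(manifest_text))


def compare_manifests(text_a, text_b):
    """Split patch IDs into shared, only-in-A, and only-in-B sets."""
    a, b = patch_ids(text_a), patch_ids(text_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    # Made-up sample data, not real manifest contents.
    cloudera = "HADOOP-1111 fix A\nHDFS-2222 fix B\nMAPREDUCE-3333 fix C\n"
    yahoo = "HADOOP-1111 fix A\nMAPREDUCE-4444 fix D\n"
    result = compare_manifests(cloudera, yahoo)
    for key in ("shared", "only_a", "only_b"):
        print(key, sorted(result[key]))
```

In practice you would read the two manifest files from disk and pass their contents to `compare_manifests`; a patch that was backported under a different ID would not be caught by this simple match.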

Yahoo does not provide support for their distribution; they provide their patched version as a service to the community, so the folks who are interested can build what Yahoo runs internally. Given the size of Yahoo clusters, that's a significant contribution, especially if you aren't a Hadoop developer who follows the JIRAs all the time. Cloudera supports their distribution commercially, as well as providing some community support via the Hadoop mailing lists and, for distro-specific issues, on their GetSatisfaction page.

Both are pretty different from the vanilla Apache distro since they patch it in between releases (the Cloudera version of 20 has 60+ patches!).
