Making full use of all cores in Hadoop pseudo-distributed mode

2022-01-13 00:00:00 hadoop mapreduce java mahout

I am running a task in pseudo-distributed mode on my 4-core laptop. How can I ensure that all cores are used effectively? Currently, my job tracker shows that only one job is executing at a time. Does that mean only one core is being used?

The following are my configuration files.

conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

As per the answer, I need to add the following properties to mapred-site.xml:

<property>
  <name>mapred.map.tasks</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>

Recommended answer

mapred.map.tasks and mapred.reduce.tasks will control this, and (I believe) would be set in mapred-site.xml. However, this establishes them as cluster-wide defaults; more commonly you would configure them on a per-job basis. You can set the same parameters on the java command line with -D.
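
As a rough sketch of the per-job approach, the command below passes both properties with -D when submitting a job. The jar name, driver class, and HDFS paths are hypothetical placeholders, not taken from the question. It also assumes the driver is launched through ToolRunner, since that is what parses -D generic options; the options have to come before the job's own arguments.

```shell
# Hypothetical placeholders: my-job.jar, com.example.MyJob, and both paths.
# Assumes the driver runs via ToolRunner so that -D generic options are parsed.
hadoop jar my-job.jar com.example.MyJob \
  -D mapred.map.tasks=4 \
  -D mapred.reduce.tasks=4 \
  /user/me/input /user/me/output
```

One caveat worth checking on your own setup: in Hadoop 1.x, mapred.map.tasks is only a hint to the input format, and the number of tasks that run at once on a node is also capped by mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which default to 2 each.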
