Why can't Hadoop recognize my Map class?

2022-01-13 00:00:00 hadoop runtimeexception mapreduce java

I am trying to run my PDFWordCount map-reduce program on Hadoop 2.2.0, but I get this error:

13/12/25 23:37:26 INFO mapreduce.Job: Task Id : attempt_1388041362368_0003_m_000009_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more

It says that my map class cannot be found. I have a cluster with one namenode and two datanodes, running on 3 VMs.

My main function is this:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);

    job.setInputFormatClass(PDFInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(PDFWordCount.class);
    job.waitForCompletion(true);
}

If I run my jar using this command:

yarn jar myjar.jar PDFWordCount /in /out

it treats /in as the output path and fails with the error above, even though my main function calls job.setJarByClass(PDFWordCount.class), as you can see.
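A possible explanation for /in being treated as the output path: when the jar's manifest already declares a Main-Class, `yarn jar` does not consume a class name from the command line, so every token after the jar name is handed to main. In that case args[0] would be "PDFWordCount" and args[1] would be "/in". The sketch below is a simplified model of that argument handling, not Hadoop's actual code; the class name `RunJarArgsSketch` and the method `argsSeenByMain` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Simplified model (NOT Hadoop's real implementation) of how `yarn jar`
// decides which command-line tokens reach the program's main(String[]).
class RunJarArgsSketch {

    // cliArgs: everything typed after `yarn jar`, starting with the jar file name.
    // manifestHasMainClass: whether the jar's MANIFEST.MF declares a Main-Class.
    static String[] argsSeenByMain(boolean manifestHasMainClass, String[] cliArgs) {
        int first = 1; // skip the jar file name
        if (!manifestHasMainClass) {
            first++; // the next token is consumed as the main class name
        }
        first = Math.min(first, cliArgs.length); // guard against short input
        return Arrays.copyOfRange(cliArgs, first, cliArgs.length);
    }

    public static void main(String[] ignored) {
        String[] cli = {"myjar.jar", "PDFWordCount", "/in", "/out"};
        // Manifest declares Main-Class: the class name leaks into args[0].
        System.out.println(Arrays.toString(argsSeenByMain(true, cli)));
        // No Main-Class in the manifest: the class name is consumed first.
        System.out.println(Arrays.toString(argsSeenByMain(false, cli)));
    }
}
```

If this is what is happening, the input path would be read from "PDFWordCount" and the output path from "/in", which matches the symptom described.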

I have run a simple WordCount project whose main function is exactly like this one. To run it, I used yarn jar wc.jar MyWordCount /in2 /out2, and it ran flawlessly.

I can't understand what the problem is!

UPDATE: I tried to move my work from this project into the wordcount project that I had used successfully. I created a package, copied the related files from the pdfwordcount project into it, and exported the project (my main was not changed to use PDFInputFormat, so I did nothing except move the java files to the new package). It didn't work. I deleted the files from the other project, but it still didn't work. I moved the java files back to the default package, but it still didn't work!

What is wrong?!

Recommended answer

I found a way to overcome this problem, even though I couldn't work out what the problem actually was.

When I export my Java project as a jar file in Eclipse, I have two options:

Extract required libraries into generated JAR
Package required libraries into generated JAR

I don't know exactly what the difference is, or whether it matters. I used to choose the second option, but if I choose the first option, I can run my job using this command:

yarn jar pdf.jar /in /out
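One way to see how the two exports differ is to compare the Main-Class entry in each jar's manifest. As far as I know, the "Package required libraries" option keeps dependencies as nested jars and points Main-Class at an Eclipse jar-in-jar loader instead of your own class, which could plausibly interfere with how Hadoop locates the job jar. Treat that as an assumption to verify, not a confirmed diagnosis. The sketch below uses only the standard java.util.jar API; the class name `ManifestMainClass` and the method `mainClassOf` are hypothetical names for illustration.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Prints the Main-Class declared in a jar's manifest, if any.
class ManifestMainClass {

    // Returns the Main-Class attribute, or null when the manifest is
    // missing or does not declare one.
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ManifestMainClass <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            String mainClass = mainClassOf(jar.getManifest());
            System.out.println(mainClass == null
                    ? "No Main-Class declared in the manifest"
                    : "Main-Class: " + mainClass);
        }
    }
}
```

Running it against both exported jars (for example pdf.jar from each export option) shows whether the entry point is your own class or a wrapper. The fact that the working command above omits the class name suggests the manifest of that jar already declares one.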
