How to serialize an object in Hadoop (in HDFS)

2022-01-13 00:00:00 bigdata serialization hadoop mapreduce java

I have a HashMap<String, ArrayList<Integer>>. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in my Mapper and Reducers so they can use it.

To serialize my HashMap object to HDFS, I used ordinary Java object serialization code as follows, but got an error (permission denied):

try {
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}

I got the following exception:

java.io.FileNotFoundException: hashmap.ser (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at KMerIndex.createIndex(KMerIndex.java:121)
    at MyDriverClass.formRefIndex(MyDriverClass.java:717)
    at MyDriverClass.main(MyDriverClass.java:768)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Can someone suggest or share sample code showing how to serialize an object to HDFS in Hadoop?

Recommended answer

Try using SerializationUtils from Apache Commons Lang.

Its methods are listed below:

static Object clone(Serializable object)  // Deep-clones an Object using serialization.
static Object deserialize(byte[] objectData)  // Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream)  // Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj)  // Serializes an Object to a byte array for storage/serialization.
static void serialize(Serializable obj, OutputStream outputStream)  // Serializes an Object to the specified stream.

When storing to HDFS, you can write the byte[] returned by serialize. When reading the object back, deserialize it and cast the result to the corresponding type (for example, a File object).
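The round trip described above can be sketched with plain JDK streams. This is a minimal sketch, not the answerer's exact code: the class name HashMapSerde and its two methods are hypothetical, and it uses java.io object streams rather than SerializationUtils so that it runs without extra jars. On a cluster, the OutputStream and InputStream would presumably come from Hadoop's FileSystem.create(Path) and FileSystem.open(Path) rather than from an in-memory buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

// Hypothetical helper class; the name and methods are illustrative only.
class HashMapSerde {

    // Serializes the map to any OutputStream and closes it.
    // Assumption: on HDFS the stream would be obtained from FileSystem.create(Path)
    // instead of a local FileOutputStream.
    static void write(HashMap<String, ArrayList<Integer>> map, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(map);
        }
    }

    // Deserializes the map from any InputStream and casts it back to its type.
    // Assumption: in a Mapper or Reducer the stream would come from FileSystem.open(Path).
    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<Integer>> read(InputStream in)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (HashMap<String, ArrayList<Integer>>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, ArrayList<Integer>> hm = new HashMap<>();
        hm.put("ACGT", new ArrayList<>(Arrays.asList(3, 17, 42)));

        // An in-memory buffer stands in for the HDFS file in this sketch.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(hm, buffer);

        HashMap<String, ArrayList<Integer>> copy =
                read(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(copy.equals(hm)); // prints true
    }
}
```

Because both methods accept generic streams, the same code can be exercised locally and then pointed at HDFS streams without changes to the serialization logic.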

In my case, I stored a HashMap in an HBase column and retrieved it in my mapper method as a HashMap, and it worked.

You can certainly do the same thing.

Alternatively, you can use Apache Commons IO (org.apache.commons.io.FileUtils) to write the bytes to a local file, but you will then need to copy that file to HDFS, since you want HDFS as the data store.

FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
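The local-file route can be sketched as follows. The stack trace in the question suggests that the relative path "hashmap.ser" resolved to a working directory the job user could not write to, so this sketch stages the bytes in the system temp directory instead. The class name LocalStaging is hypothetical, and java.nio.file.Files.write is used here as a JDK stand-in for FileUtils.writeByteArrayToFile. The copy to HDFS is not shown; it would presumably use Hadoop's FileSystem.copyFromLocalFile(Path src, Path dst).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class; the name and method are illustrative only.
class LocalStaging {

    // Writes the serialized bytes to a fresh file in the system temp directory,
    // which is normally writable even when the working directory is not.
    static Path stage(byte[] data) throws IOException {
        Path tmp = Files.createTempFile("hashmap", ".ser");
        Files.write(tmp, data);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] myByteArray = {1, 2, 3}; // stands in for the serialized HashMap bytes
        Path staged = stage(myByteArray);

        // Next step (assumption, not shown): copy 'staged' to HDFS,
        // e.g. with FileSystem.copyFromLocalFile(src, dst).
        System.out.println(Arrays.equals(Files.readAllBytes(staged), myByteArray)); // prints true
        Files.delete(staged);
    }
}
```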

Note: The Apache Commons IO and Apache Commons Lang jars are normally already available on a Hadoop cluster.
