Hadoop – How to Create a Spark DataFrame from a SequenceFile

I am using Spark 1.5. I want to create a DataFrame from a file in HDFS. The file is in SequenceFile format and contains JSON data with a large number of fields.

Is there a way to do this gracefully in Java? I don't know the structure or fields of the JSON in advance.

I can read the sequence file into an RDD as follows:

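JavaPairRDD<LongWritable, BytesWritable> inputRDD = jsc.sequenceFile("s3n://key_id:secret_key@file/path", LongWritable.class, BytesWritable.class);
JavaRDD<String> events = inputRDD.map(
    new Function<Tuple2<LongWritable, BytesWritable>, String>() {
        public String call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
            // Decode only the valid bytes; getBytes() can return a padded buffer.
            return Text.decode(tuple._2().getBytes(), 0, tuple._2().getLength());
        }
    }
);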
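
A detail worth checking when decoding BytesWritable values: getBytes() returns the backing array, and only the first getLength() bytes belong to the current record. The rest can be padding or leftovers from an earlier, longer record, which would corrupt the JSON text. The sketch below reproduces the effect in plain Java with no Hadoop dependency; the class PaddedBufferDemo and its methods are hypothetical stand-ins, not part of any library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a reused BytesWritable: a backing array that
// still holds bytes from an earlier, longer record.
final class PaddedBufferDemo {

    // Simulates buffer reuse: write `first`, then overwrite the start with
    // `second` without clearing the remaining bytes.
    static byte[] reusedBuffer(String first, String second) {
        byte[] a = first.getBytes(StandardCharsets.UTF_8);
        byte[] b = second.getBytes(StandardCharsets.UTF_8);
        byte[] backing = new byte[Math.max(a.length, b.length)];
        System.arraycopy(a, 0, backing, 0, a.length);
        System.arraycopy(b, 0, backing, 0, b.length);
        return backing;
    }

    // Decodes the whole backing array, stale bytes included.
    static String decodeAll(byte[] backing) {
        return new String(backing, StandardCharsets.UTF_8);
    }

    // Decodes only the first validLength bytes, the analogue of passing
    // getLength() along with getBytes().
    static String decodeValid(byte[] backing, int validLength) {
        return new String(backing, 0, validLength, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String current = "{\"a\":1}";
        byte[] backing = reusedBuffer("{\"id\":12345}", current);
        int validLength = current.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(decodeAll(backing));                // {"a":1}2345}
        System.out.println(decodeValid(backing, validLength)); // {"a":1}
    }
}
```

With a real BytesWritable the same idea amounts to passing an offset of 0 and getLength() to whichever decoder is used.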

How do I create a DataFrame from this RDD?

I did the following for the JSON data in my sequence file:

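JavaRDD<String> events = inputRDD.map(
    new Function<Tuple2<LongWritable, BytesWritable>, String>() {
        public String call(Tuple2<LongWritable, BytesWritable> tuple) throws JSONException, UnsupportedEncodingException {
            // Decode only the valid bytes of the value as UTF-8.
            String valueAsString = new String(tuple._2().getBytes(), 0, tuple._2().getLength(), "UTF-8");
            JSONObject data = new JSONObject(valueAsString);
            // The "payload" field holds a JSON document encoded as a string.
            JSONObject payload = new JSONObject(data.getString("payload"));
            return payload.toString();
        }
    }
);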
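
Once the RDD holds one JSON document per element, one way to get a DataFrame without declaring the fields up front is to let Spark infer the schema from the data. The following is a minimal sketch, assuming Spark 1.5 on the classpath; the class name JsonRddToDataFrame, the sqlContext variable, the local master setting and the sample records are illustrative and not from the original post. In a real job, the events RDD built from the sequence file would take the place of the parallelized list.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

final class JsonRddToDataFrame {
    public static void main(String[] args) {
        // Local master for illustration only; a real job supplies its own config.
        SparkConf conf = new SparkConf().setAppName("json-rdd-to-df").setMaster("local[1]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Stand-in for the events RDD: one JSON document per element,
        // with fields that differ between records.
        JavaRDD<String> events = jsc.parallelize(Arrays.asList(
            "{\"user\":\"a\",\"clicks\":3}",
            "{\"user\":\"b\",\"country\":\"DE\"}"));

        // Spark scans the records and infers a schema covering the fields it sees.
        DataFrame df = sqlContext.read().json(events);
        df.printSchema();
        df.show();

        jsc.stop();
    }
}
```

Schema inference needs a pass over the data before the DataFrame is usable, so on a large input it may be worth caching the events RDD first.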
