Hadoop – How to Create a Spark DataFrame from a SequenceFile

I am using Spark 1.5. I want to create a DataFrame from a file in HDFS. The file is a SequenceFile whose values are JSON documents with a large number of fields.

Is there a way to do this gracefully in Java? I don’t know the structure or fields of the JSON in advance.

I can read the sequence file into an RDD as follows:

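JavaPairRDD<LongWritable, BytesWritable> inputRDD = jsc.sequenceFile("s3n://key_id:secret_key@file/path", LongWritable.class, BytesWritable.class);
JavaRDD<String> events = inputRDD.map(
    new Function<Tuple2<LongWritable, BytesWritable>, String>() {
        public String call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
            // getBytes() returns the backing buffer, so decode only the valid bytes
            BytesWritable value = tuple._2();
            return Text.decode(value.getBytes(), 0, value.getLength());
        }
    }
);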

How do I create a DataFrame from this RDD?
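
A detail worth knowing when decoding `BytesWritable` values: `getBytes()` returns the backing buffer, which can be longer than the valid data reported by `getLength()`, so decoding the whole array may append stray characters to the JSON text. The stand-alone sketch below reproduces the effect with a plain byte array and no Hadoop dependency; the class and method names are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-alone illustration; PaddedBufferDemo and its methods are hypothetical names.
class PaddedBufferDemo {

    // Simulates a reused buffer whose capacity exceeds the valid payload.
    // Assumes capacity >= the UTF-8 length of json.
    static byte[] paddedBuffer(String json, int capacity) {
        byte[] valid = json.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(valid, capacity); // extra slots are zero bytes
    }

    // Decodes the whole buffer, padding included (the pitfall).
    static String decodeAll(byte[] buffer) {
        return new String(buffer, StandardCharsets.UTF_8);
    }

    // Decodes only the first `length` bytes (the safe variant).
    static String decodeValid(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"id\":1}";
        int length = json.getBytes(StandardCharsets.UTF_8).length;
        byte[] buffer = paddedBuffer(json, length + 4);

        // The padded decode is 4 characters longer than the real document.
        System.out.println(decodeAll(buffer).length() - length);
        // The length-limited decode returns the original JSON text.
        System.out.println(decodeValid(buffer, length));
    }
}
```

With Hadoop types, the equivalent of `decodeValid` is to pass `value.getLength()` along with `value.getBytes()`, or to call `value.copyBytes()` to get an array of exactly the valid length.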

I did the following for the JSON data in my sequence file:

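JavaRDD<String> events = inputRDD.map(
    new Function<Tuple2<LongWritable, BytesWritable>, String>() {
        public String call(Tuple2<LongWritable, BytesWritable> tuple) throws JSONException, UnsupportedEncodingException {
            // Decode only the valid bytes; getBytes() may return a padded buffer
            BytesWritable value = tuple._2();
            String valueAsString = new String(value.getBytes(), 0, value.getLength(), "UTF-8");
            JSONObject data = new JSONObject(valueAsString);
            // The "payload" field is itself a JSON string; parse it and return it as the record
            JSONObject payload = new JSONObject(data.getString("payload"));
            return payload.toString();
        }
    }
);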
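
Once there is an RDD with one JSON document per element, the remaining step to a DataFrame can be sketched as follows. This assumes Spark 1.5 with the spark-sql module on the classpath and uses `DataFrameReader.json(JavaRDD<String>)`, which infers the schema from the data, so the fields do not have to be known in advance. The helper class name is hypothetical, and this is one possible approach rather than necessarily what the poster used.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Hypothetical helper class; assumes Spark 1.5 (spark-sql) on the classpath.
class JsonEventsToDataFrame {

    // `events` is expected to hold one JSON document per element.
    static DataFrame toDataFrame(SQLContext sqlContext, JavaRDD<String> events) {
        // Spark scans the JSON strings and infers the schema from them.
        return sqlContext.read().json(events);
    }
}
```

A call such as `DataFrame df = JsonEventsToDataFrame.toDataFrame(new SQLContext(jsc), events);` followed by `df.printSchema();` would then show the inferred fields. Schema inference needs a pass over the data, so on a large input it may be worth caching `events` first.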

