I am using Spark 1.5. I want to create a DataFrame from a file in HDFS. The HDFS file is a sequence file containing JSON data with a large number of fields.
Is there a way to do this gracefully in Java? I don't know the structure/fields of the JSON in advance.
I can use the input in the sequence file as an RDD as follows:
JavaPairRDD<LongWritable, BytesWritable> inputRDD = jsc.sequenceFile("s3n://key_id:secret_key@file/path", LongWritable.class, BytesWritable.class);
JavaRDD<String> events = inputRDD.map(
    new Function<Tuple2<LongWritable, BytesWritable>, String>() {
        public String call(Tuple2<LongWritable, BytesWritable> tuple) {
            return Text.decode(tuple._2.getBytes());
        }
    }
);
How can I create a DataFrame from this RDD?
I performed the following operations on the JSON data in my sequence file:
JavaRDD<String> events = inputRDD.map(
    new Function<Tuple2<LongWritable, BytesWritable>, String>() {
        public String call(Tuple2<LongWritable, BytesWritable> tuple) throws JSONException, UnsupportedEncodingException {
            // BytesWritable.getBytes() returns the backing array, which may be
            // padded beyond the valid data, so bound the decode by getLength().
            String valueAsString = new String(tuple._2.getBytes(), 0, tuple._2.getLength(), "UTF-8");
            JSONObject data = new JSONObject(valueAsString);
            JSONObject payload = new JSONObject(data.getString("payload"));
            return payload.toString();
        }
    }
);
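To answer the question itself: once the RDD holds one JSON document per string, Spark 1.5 can infer the schema at runtime, so the fields do not need to be known in advance. A minimal sketch, assuming `jsc` is the `JavaSparkContext` used above and `events` is the `JavaRDD<String>` of JSON payloads produced by the map:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Build a SQLContext from the existing JavaSparkContext.
SQLContext sqlContext = new SQLContext(jsc);

// read().json(rdd) scans the RDD to infer the schema, so the
// JSON structure does not have to be declared ahead of time.
DataFrame df = sqlContext.read().json(events);

df.printSchema();                 // inspect the inferred fields
df.registerTempTable("events");   // optional: query it with Spark SQL
```

Schema inference makes an extra pass over the data; if the job is large and the schema is stable, it can be cheaper to capture the inferred `df.schema()` once and reuse it with `read().schema(...).json(events)` on later runs.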