I am using Spark 1.5. I want to create a DataFrame from a file in HDFS. The HDFS file contains JSON data with a large number of fields, stored in SequenceFile input format.
Is there a way to do this?
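One possible approach is a two-step read: load the SequenceFile as (key, value) pairs, then let Spark SQL infer the JSON schema from the values. This is only a minimal sketch, assuming the JSON documents are stored as the values of the SequenceFile and using the Spark 1.5 SQLContext API; the HDFS path is hypothetical.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="json-from-sequencefile")
sqlContext = SQLContext(sc)

# Read the SequenceFile as (key, value) pairs; the path is a placeholder.
pairs = sc.sequenceFile("hdfs:///data/events.seq")

# Keep only the values, which are assumed to be JSON strings.
json_lines = pairs.map(lambda kv: kv[1])

# In Spark 1.5, read.json also accepts an RDD of JSON strings and infers the schema.
df = sqlContext.read.json(json_lines)
df.printSchema()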
1. Cluster time synchronization
Designate one machine as the time server; all other machines in the cluster synchronize their clocks with it on a regular schedule, for example every ten minutes.
1.1 Steps
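As a rough sketch of one common setup: a small sync job is run from cron every ten minutes on every machine except the time server. The script below assumes ntpdate is installed and that hadoop101 is the hostname of the designated time server (both names are assumptions).

#!/usr/bin/env python
import subprocess

TIME_SERVER = "hadoop101"  # hypothetical hostname of the designated time server

def sync_clock():
    # ntpdate queries the server once and steps the local clock;
    # it normally needs to run as root.
    subprocess.check_call(["ntpdate", TIME_SERVER])

if __name__ == "__main__":
    sync_clock()

A matching crontab entry on each client could look like */10 * * * * python /path/to/sync_clock.py to trigger the sync every ten minutes.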
“The reducer is different from the reduce task. A reducer can run multiple reduce tasks.” Can someone explain this with the following example?
foo.txt: Very good, this is the foo file
/google/gmail/inbox
/google/drive/map
/google/apps
/yahoo/news/cricket
/yahoo/mail/
/yahoo/sports
/wiki/ind/jack
/wiki/us/jil
I need to get the required page group. If I use a Hive query to se
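For reference, one way this grouping is often done with Spark SQL (a sketch only: the input path is hypothetical, and the "page group" is taken here to be the first path component, e.g. google, yahoo or wiki):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="page-groups")
sqlContext = SQLContext(sc)

# Hypothetical input: one page path per line.
df = sc.textFile("hdfs:///data/paths.txt").map(lambda p: (p,)).toDF(["path"])
df.registerTempTable("pages")

# regexp_extract pulls out the first path component, e.g. 'google' from '/google/gmail/inbox'.
groups = sqlContext.sql("""
    SELECT regexp_extract(path, '^/([^/]+)', 1) AS page_group,
           COUNT(*) AS page_count
    FROM pages
    GROUP BY regexp_extract(path, '^/([^/]+)', 1)
""")
groups.show()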
Using PySpark, I am reading a DataFrame from a Parquet file on Amazon S3:

dataS3 = sql.read.parquet("s3a://" + s3_bucket_in)

This works fine, but then I try to write the data:
dataS3.writ
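A minimal sketch of what the write might look like (s3_bucket_out is a hypothetical output bucket, and overwrite mode is just one possible choice):

# Write the DataFrame back to S3 as Parquet.
dataS3.write.mode("overwrite").parquet("s3a://" + s3_bucket_out)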
There is not enough space in our small Hadoop cluster, so I was checking the disk usage on HDFS and found that most of the space is occupied by the /hbase/oldWALs folder.
I have checked the
I downloaded spark-2.1.0-bin-hadoop2.7.tgz from http://spark.apache.org/downloads.html. I have started Hadoop HDFS and YARN with $ start-dfs.sh and $ start-yarn.sh. But running $ spark-shell --master ya
I am a bit confused about the location of tasktracker in Hadoop-2.x.
The daemons in Hadoop-1.x are namenode, datanode, jobtracker, tasktracker and secondarynamenode.
The daemons in Hadoop
I want to see if Hadoop’s HDFS file system is working properly. I know that jps lists the running daemons, but I don’t actually know which daemons to look for.
I ran the following comman
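For HDFS itself, the daemons to look for in the jps output are NameNode, DataNode and SecondaryNameNode (ResourceManager and NodeManager belong to YARN, not HDFS). A small sketch of that check, assuming jps is on the PATH:

import subprocess

# Daemons that should appear in jps output when HDFS is up.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode"}

def running_daemons():
    # jps prints one "<pid> <ClassName>" pair per line.
    out = subprocess.check_output(["jps"]).decode()
    return {parts[1] for parts in (line.split() for line in out.splitlines()) if len(parts) > 1}

missing = EXPECTED - running_daemons()
print("missing HDFS daemons:", missing or "none")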
I can’t connect to the datanode, and I don’t know why.
2019-07-19 16:10:00,156 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102