Simon Technology Blog

Tag: Hadoop

Hadoop – How to Create a Spark DataFrame from a SequenceFile

I am using Spark 1.5 and want to create a DataFrame from a file in HDFS. The file is in SequenceFile format and contains JSON data with a large number of fields.

Is there a way to do t…
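
Here is a minimal PySpark sketch for Spark 1.5. It assumes that each SequenceFile record stores one JSON object as a Text value and that the keys can be ignored.

- `sc.sequenceFile()` hands Text values to Python as strings.
- `SQLContext.jsonRDD()` is the Spark 1.x way to load an RDD of JSON strings. It infers the schema by scanning the records, so the many fields never have to be declared by hand.
- The helper names `is_json_object` and `sequence_file_to_dataframe` are hypothetical.

```python
import json


def is_json_object(text):
    """Return True if `text` parses as a JSON object (a dict), else False."""
    try:
        return isinstance(json.loads(text), dict)
    except (TypeError, ValueError):
        # ValueError covers malformed JSON (json.JSONDecodeError is a subclass);
        # TypeError covers non-string input such as None.
        return False


def sequence_file_to_dataframe(sc, sql_context, path):
    """Hypothetical helper: build a DataFrame from a SequenceFile of JSON strings.

    `sc` is a SparkContext, `sql_context` a SQLContext (Spark 1.x).
    Assumes every record's value is a Text holding one JSON object; keys are ignored.
    """
    # sequenceFile() yields (key, value) pairs; Text values arrive as Python strings.
    json_strings = sc.sequenceFile(path).values().filter(is_json_object)
    # jsonRDD() scans the records to infer the schema, so the many fields
    # do not have to be declared by hand.
    return sql_context.jsonRDD(json_strings)
```

In the `pyspark` shell this would be called as `df = sequence_file_to_dataframe(sc, sqlContext, path)`, with `path` pointing at the SequenceFile in HDFS. `df.printSchema()` then shows the inferred fields.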

October 12, 2021 • By Simo • Hadoop • Tags: create, DataFrame, Hadoop, How, Sequencefile, SPARK

Hadoop cluster time synchronization

1. Cluster time synchronization
Pick one machine to act as the time server. All other machines synchronize their clocks with it regularly, for example every ten minutes.
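
The post does not say which tool performs the periodic sync; ntpd, ntpdate and chrony are common choices. Whichever is used, an NTP-style client estimates its clock error from four timestamps per exchange with the time server. The sketch below implements the standard NTP offset and round-trip-delay formulas. The function name and the millisecond units are illustrative assumptions.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP-style exchange.

    t1: request sent (client clock)    t2: request received (server clock)
    t3: reply sent (server clock)      t4: reply received (client clock)
    A positive offset means the client clock is behind the server.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)  # network time only, excludes server processing
    return offset, delay


if __name__ == "__main__":
    # Client is 5 s behind; each network leg takes 100 ms; server needs 50 ms.
    print(ntp_offset_and_delay(100_000, 105_100, 105_150, 100_250))  # (5000.0, 200)
```

The client then adjusts its clock by the estimated offset. Repeating this on a schedule, such as every ten minutes, keeps the cluster clocks aligned with the time server.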
1.1 Steps

October 12, 2021 • By Simo • Hadoop • Tags: cluster, Hadoop, Synchronization, time

Hadoop – The difference between a reduce task and a reducer

“The reducer is different from the reduce task. The reducer can run multiple reduce tasks.” Can someone explain this with the following example?

foo.txt: Very good, this is the foo file
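
To make the terms concrete with foo.txt, here is a toy, single-process simulation of a word-count job in Hadoop's vocabulary:

- The job is configured with a number of reduce tasks.
- A hash partitioner assigns every key to exactly one of those tasks.
- Inside each task, the same reduce logic runs once per key.

It only illustrates this relationship and is not Hadoop code. `zlib.crc32` stands in for Java's `hashCode()`; Hadoop's `HashPartitioner` uses `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. As a result, which word lands in which task will differ from a real cluster.

```python
import re
import zlib
from collections import defaultdict


def map_words(line):
    """Map phase: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in re.findall(r"\w+", line.lower())]


def partition(key, num_reduce_tasks):
    """Choose the reduce task that owns `key` (stable hash, unlike Python's hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def run_job(lines, num_reduce_tasks):
    # Shuffle: group each mapped value under its key, inside the task that owns the key.
    tasks = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for line in lines:
        for key, value in map_words(line):
            tasks[partition(key, num_reduce_tasks)][key].append(value)

    # Each reduce task applies the same reduce logic (here: sum) once per key it owns.
    return [{key: sum(values) for key, values in sorted(grouped.items())} for grouped in tasks]


if __name__ == "__main__":
    for task_id, output in enumerate(run_job(["Very good, this is the foo file"], 2)):
        print(f"reduce task {task_id}: {output}")
```

With two reduce tasks, the seven words of foo.txt are split between the two tasks. Each task calls the reduce logic once per word it owns.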

October 12, 2021 • By Simo • Hadoop • Tags: between, Hadoop, Reduce, REDUCER, Task

Hadoop – Search Specific Text in String – Hive

/google/gmail/inbox
/google/drive/map
/google/apps
/yahoo/news/cricket
/yahoo/mail/
/yahoo/sports
/wiki/ind/jack
/wiki/us/jil

I need to get the required page group. If I use a Hive query to se…
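
The excerpt stops before defining "page group". This sketch assumes it means the first path segment (google, yahoo, wiki).

- In HiveQL that would typically be `regexp_extract(url, '^/([^/]+)', 1)`, where `url` stands in for the real column name.
- The Python sketch below applies the same regular expression to the sample paths, so the pattern can be checked outside Hive.

```python
import re

# Assumption: the "page group" is the first path segment.
FIRST_SEGMENT = re.compile(r"^/([^/]+)")


def page_group(path):
    """Return the first segment of a URL path, or None if there is none."""
    match = FIRST_SEGMENT.match(path)
    return match.group(1) if match else None


paths = [
    "/google/gmail/inbox",
    "/google/drive/map",
    "/google/apps",
    "/yahoo/news/cricket",
    "/yahoo/mail/",
    "/yahoo/sports",
    "/wiki/ind/jack",
    "/wiki/us/jil",
]

if __name__ == "__main__":
    for p in paths:
        print(p, "->", page_group(p))
```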

October 12, 2021 • By Simo • Hadoop • Tags: Hadoop, hive, search, specific, string, Text

Hadoop – Write a Spark DataFrame as Parquet directly to S3 instead of creating a _temporary folder

Using PySpark, I am reading a DataFrame from a Parquet file on Amazon S3:

dataS3 = sql.read.parquet("s3a://" + s3_bucket_in)

This works fine. But then I try to write the data:

dataS3.writ…
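
The `_temporary` folder comes from Hadoop's `FileOutputCommitter`, which Spark's Parquet writer goes through (`ParquetOutputCommitter` extends it).

- Task attempts write under `_temporary`.
- Commits then rename the files into the destination.
- On S3 a rename is a copy followed by a delete, which is why this step is slow there.

The pure-Python model below reproduces that flow on the local file system to show where `_temporary` comes from. The directory names only loosely follow Hadoop's layout, and `write_partitions` is a hypothetical helper, not a Spark API.

```python
import os
import shutil
import tempfile


def write_partitions(output_dir, partitions):
    """Toy model of a v1-style FileOutputCommitter writing one file per partition."""
    job_tmp = os.path.join(output_dir, "_temporary", "0")
    os.makedirs(job_tmp, exist_ok=True)
    committed = []

    for task_id, rows in enumerate(partitions):
        # Task attempt: write the part file into a private attempt directory.
        attempt_dir = os.path.join(job_tmp, "_temporary", f"attempt_{task_id}")
        os.makedirs(attempt_dir)
        name = f"part-{task_id:05d}"
        with open(os.path.join(attempt_dir, name), "w") as f:
            f.write("\n".join(rows))

        # Task commit: rename the attempt directory to a committed-task directory.
        task_dir = os.path.join(job_tmp, f"task_{task_id}")
        os.rename(attempt_dir, task_dir)
        committed.append((task_dir, name))

    # Job commit: move each committed file into the destination (one rename per
    # file, which on S3 means a copy plus a delete) ...
    for task_dir, name in committed:
        os.rename(os.path.join(task_dir, name), os.path.join(output_dir, name))

    # ... then drop _temporary and write the _SUCCESS marker.
    shutil.rmtree(os.path.join(output_dir, "_temporary"))
    open(os.path.join(output_dir, "_SUCCESS"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        out = os.path.join(root, "output")
        write_partitions(out, [["a", "b"], ["c"]])
        print(sorted(os.listdir(out)))  # ['_SUCCESS', 'part-00000', 'part-00001']
```

With Hadoop 2.7 or later, you can set `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` to `2`. Task commit then moves files straight into the destination, which removes the serial renames at job commit. In-progress attempts still go through `_temporary`.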

October 12, 2021 • By Simo • Hadoop • Tags: Created, data, folder, frame, Hadoop, Mosa, S3, SPARK, temporary, write

Hadoop – HBase oldWALs: what is it and how can I clean it up?

Our small Hadoop cluster is running out of space, so I checked the disk usage on HDFS and found that most of it is taken up by the /hbase/oldWALs folder.

I have checked the…
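
In stock HBase, a cleaner chore on the HMaster periodically deletes files in `/hbase/oldWALs`. A file is deleted only when two conditions hold:

- It is older than `hbase.master.logcleaner.ttl`.
- No cleaner plugin still needs it.

With replication enabled, WALs that a peer has not shipped yet are kept. A disabled or stale peer is therefore a common reason the folder keeps growing. The sketch below is a simplified model of that decision in plain Python; `deletable_old_wals` and its parameters are hypothetical, not HBase code.

```python
import time


def deletable_old_wals(wal_mtimes_ms, ttl_ms, still_needed=frozenset(), now_ms=None):
    """Toy model of which files in /hbase/oldWALs the master would delete.

    wal_mtimes_ms: WAL file name -> last-modified time (epoch milliseconds)
    ttl_ms: minimum age before deletion, in the spirit of hbase.master.logcleaner.ttl
    still_needed: names another plugin wants to keep, e.g. WALs a replication
        peer has not shipped yet
    A file goes only if it is old enough AND nothing still needs it.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return sorted(
        name
        for name, mtime in wal_mtimes_ms.items()
        if now_ms - mtime > ttl_ms and name not in still_needed
    )


if __name__ == "__main__":
    wals = {"wal.1": 0, "wal.2": 100, "wal.3": 999_000}
    # wal.3 is too young; wal.2 is old enough but still queued for replication.
    print(deletable_old_wals(wals, ttl_ms=600_000, still_needed={"wal.2"}, now_ms=1_000_000))  # ['wal.1']
```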

October 12, 2021 • By Simo • Hadoop • Tags: Hadoop, HBase, how to clean up, Oldwals, what

Hadoop – Apache Spark: error running spark-shell on YARN

I downloaded spark-2.1.0-bin-hadoop2.7.tgz from http://spark.apache.org/downloads.html and started Hadoop HDFS and YARN with $ start-dfs.sh and $ start-yarn.sh. But running $ spark-shell --master ya…
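
The error message is cut off, so this check covers only one documented prerequisite. Spark's "Running on YARN" guide requires `HADOOP_CONF_DIR` or `YARN_CONF_DIR` to point at the directory holding the client-side Hadoop configuration, and Spark refuses to start in YARN mode when neither is set. Below is a minimal pre-flight check. It assumes that `yarn-site.xml` in that directory is the file that matters; `yarn_client_config_problems` is a hypothetical helper.

```python
import os


def yarn_client_config_problems(env=None):
    """Return a list of problems with the client-side YARN setup (empty if none found).

    Checks only that HADOOP_CONF_DIR or YARN_CONF_DIR is set and that at least
    one of those directories contains yarn-site.xml.
    """
    env = os.environ if env is None else env
    conf_dirs = [env[name] for name in ("HADOOP_CONF_DIR", "YARN_CONF_DIR") if env.get(name)]
    if not conf_dirs:
        return ["neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set"]
    if any(os.path.isfile(os.path.join(d, "yarn-site.xml")) for d in conf_dirs):
        return []
    return ["no yarn-site.xml in " + ", ".join(conf_dirs)]


if __name__ == "__main__":
    problems = yarn_client_config_problems()
    print("\n".join(problems) if problems else "YARN client configuration looks present")
```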

October 12, 2021 • By Simo • Hadoop • Tags: apache, error, Hadoop, running, shell, SPARK, Yarn

Hadoop gen1 vs Hadoop gen2

I am a bit confused about where the tasktracker fits in Hadoop-2.x.

The daemons in Hadoop-1.x are namenode, datanode, jobtracker, tasktracker and secondarynamenode.

The daemons in Hadoop…
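
For reference, here is the usual answer to "where did the tasktracker go". In Hadoop 2.x, YARN replaced the MapReduce 1 daemons, while the HDFS daemons stayed. The summary below is written as a Python dict; `HADOOP1_TO_HADOOP2` is just an illustrative name.

```python
# Hadoop 1.x daemon -> what covers its role in Hadoop 2.x (HDFS + YARN).
HADOOP1_TO_HADOOP2 = {
    # HDFS daemons carry over unchanged.
    "NameNode": ["NameNode"],
    "DataNode": ["DataNode"],
    "SecondaryNameNode": ["SecondaryNameNode"],
    # The JobTracker's duties were split: cluster-wide resource management went to
    # the ResourceManager, per-job scheduling and monitoring to a per-application
    # ApplicationMaster (a container, not a long-running daemon), and job history
    # to the MapReduce JobHistoryServer.
    "JobTracker": ["ResourceManager", "ApplicationMaster", "JobHistoryServer"],
    # The TaskTracker's per-node role is taken over by the NodeManager,
    # which launches and monitors containers on its host.
    "TaskTracker": ["NodeManager"],
}

if __name__ == "__main__":
    for old, new in HADOOP1_TO_HADOOP2.items():
        print(f"{old:18} -> {', '.join(new)}")
```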

October 12, 2021 • By Simo • Hadoop • Tags: Gen, gen1, gen2, Hadoop, VS

Hadoop – How do I check if HDFS is running?

I want to check whether Hadoop’s HDFS file system is working properly. I know that jps lists the running daemons, but I don’t know which daemons to look for.

I ran the following comman…
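
This small sketch parses `jps` output and reports which HDFS daemons are absent. It assumes a single-node (pseudo-distributed) setup where NameNode, DataNode and SecondaryNameNode all run on the same host; on a multi-node cluster each host runs only its own share. A running process is not proof of health. `hdfs dfsadmin -report`, which asks the NameNode for capacity and live DataNodes, is a useful follow-up. `missing_hdfs_daemons` is a hypothetical helper.

```python
import subprocess

# Assumption: single-node (pseudo-distributed) HDFS, where all three run on one host.
HDFS_DAEMONS = frozenset({"NameNode", "DataNode", "SecondaryNameNode"})


def missing_hdfs_daemons(jps_output, expected=HDFS_DAEMONS):
    """Return the expected daemons that do not appear in `jps` output, sorted.

    Each jps line has the form "<pid> <main class name>", e.g. "4242 NameNode".
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(expected - running)


if __name__ == "__main__":
    # Requires a JDK's jps on PATH.
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    missing = missing_hdfs_daemons(out)
    print("missing: " + ", ".join(missing) if missing else "all expected HDFS daemons are running")
```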

October 12, 2021 • By Simo • Hadoop • Tags: check, Hadoop, HDFS, How, is it, run

Hadoop CDH pitfalls, part 2

Fuck... I can’t connect to the datanode, and I don’t know why.

2019-07-19 16:10:00,156 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102
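
`Retrying connect to server` is logged by Hadoop's IPC client when it cannot establish a TCP connection to the target. A sensible first check is whether the port is reachable at all from the failing node. Below is a minimal sketch using only the standard library. The log line is cut off before the port, so `8020` is a placeholder (a common NameNode RPC port), not a value from the post.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures (socket.gaierror).
        return False


if __name__ == "__main__":
    # Hypothetical port: the log line is cut off before it names one.
    print(can_connect("hadoop102", 8020))
```

If this returns False, common suspects are these:

- The remote daemon is not running.
- A firewall is blocking the port.
- The hostname resolves to the wrong address, for example a loopback entry in /etc/hosts.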

October 12, 2021 • By Simo • Hadoop • Tags: bombs, CDH, Hadoop, pits, Second, those

© Simon Technology Blog 2025