Category: Hadoop

Hadoop is a distributed systems framework developed by the Apache Software Foundation. It lets users write distributed programs without understanding the underlying details of distribution, and makes full use of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to run on low-cost hardware. It provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.
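To make the MapReduce half of that description concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It is plain Python and does not use Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hdfs stores data", "mapreduce processes data"]))
```

On a real cluster the map and reduce steps run in parallel on many nodes and read their input from HDFS; this sketch only shows the data flow between the three steps.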

Hadoop – Search Specific Text in String – Hive

/google/gmail/inbox
/google/drive/map
/google/apps
/yahoo/news/cricket
/yahoo/mail/
/yahoo/sports
/wiki/ind/jack
/wiki/us/jil

I need to get the required page group. If I use a Hive query to se

October 12, 2021 | By Simo | Category: Hadoop | Tags: Hadoop, hive, search, specific, string, Text
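The question in the excerpt above is cut off, so the following is only a sketch under one assumption: that the "page group" means the first segment of each path (`google`, `yahoo`, `wiki`). It uses plain Python rather than Hive, and the helper names `page_group` and `group_paths` are hypothetical. Hive's built-in `split` or `regexp_extract` functions could probably express the same idea in a query.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Sample paths copied from the excerpt.
PATHS = [
    "/google/gmail/inbox",
    "/google/drive/map",
    "/google/apps",
    "/yahoo/news/cricket",
    "/yahoo/mail/",
    "/yahoo/sports",
    "/wiki/ind/jack",
    "/wiki/us/jil",
]


def page_group(path: str) -> str:
    """Return the first non-empty path segment, or '' if there is none."""
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else ""


def group_paths(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collect paths under their first segment (the assumed page group)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[page_group(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    for group, members in group_paths(PATHS).items():
        print(group, members)
```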

Hive and HBase difference

Hive

Data warehouse. In essence, Hive sets up a one-to-one mapping in MySQL for files already stored in HDFS, so that they can be managed and queried with HQL.

October 12, 2021 | By Simo | Category: Hadoop | Tags: Difference, HBase, hive

What is the difference between observer.throw and observer.error in TypeScript – Observable?

What is the difference between observer.throw(error) and observer.error(error)?

I am using RxJS version “5.0.0-beta.12”

var innerObservable = new Observable(observer => {
console.log

October 12, 2021 | By Simo | Category: Hadoop | Tags: error, observable, observer, observer.error, observer.throw, Throw, TypeScript, what, difference

Hadoop – Write a Spark data frame as Parquet to S3 without creating a _temporary folder

Using pyspark I am reading a data frame from a parquet file on Amazon S3

dataS3 = sql.read.parquet("s3a://" + s3_bucket_in)

This works without any problem. But then I try to write the data

dataS3.writ

October 12, 2021 | By Simo | Category: Hadoop | Tags: Created, data, folder, frame, Hadoop, Mosa, S3, SPARK, temporary, write

Hadoop – How do I get all table definitions in the database in Hive?

I want to get all table definitions in Hive. I know that I can use something like the following for a single table definition:

describe <>
describe extended <>

However, I can’t find a way to get all ta

October 12, 2021 | By Simo | Category: Hadoop | Tags: acquisition, all, database, table definition
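One possible approach to the table-definition question above, sketched here and not taken from the original post, is to list the tables first (for example with Hive's `SHOW TABLES`) and then issue one `SHOW CREATE TABLE` statement per table. The Python helper below only builds those statements as strings; the function name `show_create_statements` and the database and table names are hypothetical.

```python
from typing import Iterable, List


def show_create_statements(database: str, tables: Iterable[str]) -> List[str]:
    """Build one SHOW CREATE TABLE statement per table name.

    The table names are assumed to come from running SHOW TABLES in the
    given database; this function does not connect to Hive itself.
    """
    return [f"SHOW CREATE TABLE {database}.{table};" for table in tables]


if __name__ == "__main__":
    # Hypothetical database and table names, for illustration only.
    for statement in show_create_statements("sales", ["orders", "customers"]):
        print(statement)
```

The generated statements could then be saved to a file and passed to the Hive CLI with `hive -f`, assuming that client is available on the cluster.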

Hadoop – HBase oldWALs: what is it, and how can I clean it up?

There is not enough space on our small Hadoop cluster, so I checked the disk usage on HDFS and found that most of the space is occupied by the /hbase/oldWALs folder.

I have checked the

October 12, 2021 | By Simo | Category: Hadoop | Tags: Hadoop, HBase, how to clean up, Oldwals, what

October 12, 2021 | By Simo | Category: Hadoop | Tags: API, HDFS, Java, troubleshooting

Fully distributed HBase setup

Note to readers: the following is my personal understanding, based on online materials and my own work. If anything is wrong, please correct me. Thank you.

1. HBase installation modes

October 12, 2021 | By Simo | Category: Hadoop | Tags: built, complete, distributed, HBase