Category: Hadoop

Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without understanding the underlying details of the distributed system, and make full use of a cluster's power for high-speed computing and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed in a streaming fashion. The core of the Hadoop framework's design is HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.
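To make the MapReduce half of that design concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word counting. It only illustrates the programming model in plain Python and is not Hadoop's API; the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the lines would come from files stored in HDFS, and the map and reduce calls would run in parallel on different nodes.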

Fixing garbled Chinese comments on Hive fields

Chinese comments on Hive fields are garbled. For example, when show create table xxx is executed, both the table-level and field-level comments appear garbled (as ????), which are g

October 12, 2021 | By Simo | Category: Hadoop | Tags: Chinese, Field, garbled, hive, Notes, solution

HBase 2.1.3 Cluster Web InvalidProtocolBufferException Solution

After the HBase cluster was built, all of the background processes were running normally. For the setup, refer to the construction manual:

HBase 2.1.3 Cluster Construction Manual
https://www.cndba.cn/dave/article/3322

October 12, 2021 | By Simo | Category: Hadoop | Tags: 2.1.3, cluster, HBase, INVALIDPROTOCOLBUFFEREXCEPTION, method, Solving, WEB

Hadoop – ‘SparkContext’ object has no attribute ‘textfile’

I tried to load the file using the following code:

textdata = sc.textfile('hdfs://localhost:9000 /file.txt')

Error message:

AttributeError: 'SparkContext' object has no attribute 'textfil

October 12, 2021 | By Simo | Category: Hadoop | Tags: Hadoop, no, object, Property, Sparkcontext, TextFile
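A note on the SparkContext question above: Python attribute names are case-sensitive, and the PySpark method is spelled textFile, so sc.textfile raises the AttributeError shown. The sketch below uses the camelCase name. It makes some assumptions: the helper name load_lines is hypothetical, the local[*] master and application name are placeholders, and the HDFS URI is written without the space that appears in the excerpt, on the guess that the space is an extraction artifact.

```python
def load_lines(sc, uri):
    """Return an RDD of lines for the given URI.

    The method name is textFile (capital F); sc.textfile does not exist.
    """
    return sc.textFile(uri)


def main():
    # Assumes pyspark is installed and a NameNode is listening on localhost:9000.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "load-file-example")  # placeholder master and app name
    try:
        textdata = load_lines(sc, "hdfs://localhost:9000/file.txt")
        print(textdata.take(5))  # show the first few lines
    finally:
        sc.stop()
```

Calling main() runs the example against a live cluster; load_lines works with any object that exposes a textFile method.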

GRMS_README

A product recommendation system based on Hadoop

Based on characteristics:
Based on behavior: Has certain historical characteristics.
Based on user:
Based on product:

Recommendation

October 12, 2021 | By Simo | Category: Hadoop | Tags: grms, readme

HDFS basic principles

1. NameNode overview

a. NameNode is the core of HDFS.

b. NameNode is also called Master.

c. NameNode only stores HDFS metadata: the directory tree of all files in the file system

October 12, 2021 | By Simo | Category: Hadoop | Tags: basic principle, HDFS

Hadoop: one or more files per mapper?

Does a mapper process multiple files at the same time, or can it only process one file at a time? I want to know the default behavior. By default, a typical MapReduce job follows one inp

October 12, 2021 | By Simo | Category: Hadoop | Tags: each, file, Hadoop, Mapping, middle, multiple, one
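For the mapper question above, a rough way to reason about the default is to count input splits. The sketch assumes the truncated answer refers to the usual rule that each input split is handled by one map task and that, with the default file-based input format, a split never spans two files. It is a simplification: the function name is hypothetical, the 128 MB split size is only the common default block size, and details such as split-size tolerances and non-splittable compressed files are ignored.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(file_sizes, split_size=128 * MB):
    """Estimate the number of map tasks for a list of input file sizes in bytes.

    Each file is cut into splits of at most split_size bytes, and every
    file contributes at least one split, so small files are not merged.
    """
    tasks = 0
    for size in file_sizes:
        tasks += max(1, math.ceil(size / split_size))
    return tasks
```

For example, estimate_map_tasks([10 * MB, 300 * MB]) counts one split for the small file and three for the large one. This is why a job over many small files tends to launch many short-lived mappers.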

Hadoop – What is the difference between FirstInFirstOutPrioritizer and OldestFlowFileFirstPrioritizer?

The user guide https://nifi.apache.org/docs/nifi-docs/html/user-guide.html has the following detailed information about the prioritizers; please help me understand the differences and provide a

October 12, 2021 | By Simo | Category: Hadoop | Tags: between, Difference, FirstInfirstoutprioritizer, Hadoop, NiFi, OldestflowFilefirstPrioritizer, what
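As the NiFi user guide describes them, FirstInFirstOutPrioritizer processes first the FlowFile that reached the connection first, while OldestFlowFileFirstPrioritizer processes first the FlowFile that has been in the dataflow longest. The sketch below models that difference in plain Python. It is not NiFi code, and the FlowFile class and its field names are hypothetical stand-ins for the two timestamps involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowFile:
    name: str
    lineage_start: float  # hypothetical field: when the FlowFile entered the dataflow
    queued_at: float      # hypothetical field: when it reached this connection


def first_in_first_out(flowfiles):
    """Order by arrival time at the connection (earliest arrival first)."""
    return sorted(flowfiles, key=lambda f: f.queued_at)


def oldest_flowfile_first(flowfiles):
    """Order by age in the whole dataflow (earliest lineage start first)."""
    return sorted(flowfiles, key=lambda f: f.lineage_start)
```

The two orderings differ whenever an older FlowFile takes a slower path and arrives at the connection after a newer one.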

Writing Multi-Insert Statements in Hive

Multiple inserts:

with tmp_a as (
select name from tmp_test3
)
from tmp_a
insert overwrite table tmp_test1
select name where name = 'test123'
insert overwrite table tmp_test2
select na

October 12, 2021 | By Simo | Category: Hadoop | Tags: hive, INSERT, Medium, multiple, write

Hadoop – What is the difference between AWS Elastic MapReduce and AWS Redshift?

I see that both AWS Elastic MapReduce and AWS Redshift use a cluster structure and can be used for data analysis. What are their different use cases?

Amazon Redshift supports client connec

October 12, 2021 | By Simo | Category: Hadoop | Tags: AWS, Difference between Hadoop, Elastic, MapReduce, Redshift, what

Hue installation and use

Hue is an open source UI system for Apache Hadoop. It was originally developed by Cloudera and later contributed to the open source community. It is implemented based on the Python Web frame

October 12, 2021 | By Simo | Category: Hadoop | Tags: Hue, installation, use