Category: Hadoop

Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without understanding the underlying details of distribution, and can make full use of a cluster's power for high-speed computing and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data, which makes it suitable for applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.
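To make the MapReduce half of that design concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real job would run these steps in parallel across the cluster with its input read from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an in-memory list of lines."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

Calling word_count(["big data", "Big cluster"]) counts "big" twice and the other two words once each.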

Hive data type

Basic types

Type name   Size      Minimum value   Maximum value   Example
TINYINT     1 byte    -128            127             100Y
SMALLINT    2 bytes
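The minimum and maximum columns follow from the storage size. A small sketch, assuming these integer types are signed two's-complement values (which matches the -128 to 127 range given for TINYINT); the helper name signed_range is hypothetical:

```python
def signed_range(size_bytes):
    """Return (minimum, maximum) for a signed two's-complement integer."""
    bits = 8 * size_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# 1 byte corresponds to TINYINT, 2 bytes to SMALLINT.
for name, size in [("TINYINT", 1), ("SMALLINT", 2)]:
    low, high = signed_range(size)
    print(f"{name}: {low} to {high}")
```

The SMALLINT range printed here is computed from the formula, not taken from the table above.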

October 12, 2021 | By Simo | Category: Hadoop | Tags: data, hive, type

Hadoop – How to extract the first tuple from a bag generated in Pig (whose size may vary)?

I’m generating a bag whose size (the number of tuples in the bag) may vary. From it, I want to dynamically extract the first element. How should I do this? Accordi

October 12, 2021 | By Simo | Category: Hadoop | Tags: Different, Group, Medium, Of a group, possible, Size

Hadoop-HDFS-Storage Model – Architecture Model – Role Introduction

October 12, 2021 | By Simo | Category: Hadoop | Tags: architecture, Hadoop, HDFS, Introduction, model, role, storage

How Couchbase achieves strong consistency

I searched for an explanation of how Couchbase achieves strong consistency within a cluster. Is all of this a result of using membase? Couchbase IS membase, btw. Couchba

October 12, 2021 | By Simo | Category: Hadoop | Tags: Consistency, Couchbase, How, implement, powerful

Hive Data Warehouse Client GUI Tools

1. Hive’s official website lists three graphical tools that can connect to HiveServer2 over JDBC on Windows: SQuirreL SQL Client, Oracle SQL Developer, and DbVisualizer.
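Each of these tools needs a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database. The sketch below only assembles that string; the helper name hiveserver2_jdbc_url is hypothetical, and the defaults (port 10000, database "default") are the usual HiveServer2 defaults, which a given cluster may override.

```python
def hiveserver2_jdbc_url(host, port=10000, database="default"):
    """Build a basic HiveServer2 JDBC URL (no authentication options)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# "hive.example.com" is a placeholder host, not a real server.
print(hiveserver2_jdbc_url("hive.example.com"))
```

The resulting string goes into the connection URL field of the client, together with the Hive JDBC driver.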

October 12, 2021 | By Simo | Category: Hadoop | Tags: client, hive, Interface, Number, tool, warehouse

Why does spilling happen in Hadoop?

I am very new to Hadoop and still in the learning phase.

I noticed that a spill occurs whenever the MapOutputBuffer reaches 80% full during the shuffle and sort phases (I think this can also be conf
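The 80% behaviour described in the question can be pictured with a toy model. This is a simplified sketch, not Hadoop's implementation: the real map-side buffer spills in a background thread while the mapper keeps writing, and the threshold and buffer size are governed by the mapreduce.map.sort.spill.percent and mapreduce.task.io.sort.mb properties. The function name count_spills and its parameters are assumptions made for illustration.

```python
def count_spills(record_sizes, buffer_bytes, spill_percent=0.80):
    """Count how often a buffer crosses its spill threshold.

    Each time the bytes in use reach spill_percent of the buffer,
    the contents are treated as written to disk and the buffer is emptied.
    """
    threshold = buffer_bytes * spill_percent
    used = 0
    spills = 0
    for size in record_sizes:
        used += size
        if used >= threshold:
            spills += 1  # simulate writing a spill file
            used = 0     # buffer space is reclaimed
    return spills
```

With a 100-byte buffer and five 30-byte records, the threshold of 80 bytes is crossed once, on the third record, so the model reports a single spill.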

October 12, 2021 | By Simo | Category: Hadoop | Tags: Hadoop, spilling, why, will occur

Hadoop – Excluding partition fields from a SELECT query in Hive

Suppose I have the following table definition in Hive (the actual table has about 65 columns):

CREATE EXTERNAL TABLE S.TEST (
COL1 STRING,
COL2 STRING
)
PARTITIONED BY (extract_date STRING

October 12, 2021 | By Simo | Category: Hadoop | Tags: fields, Subza