Hadoop Archives - Page 4 of 4 - Simon Technology Blog

hadoop – Apache Hive regexp_extract UDF

I came across a piece of code in Apache Hive, such as regexp_extract(input, '[0-9] *', 0). Can someone explain to me what this code does? Thank you. According to the Hive manual, it returns the

October 12, 2021 | By Simo | Hadoop | apache, extract, Hadoop, hive, regexp, UDF
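For the regexp_extract question above, the behavior can be approximated outside Hive. The following is only a rough Python sketch, not Hive's implementation: it assumes the intended pattern is '[0-9]*' (the space in the printed pattern may be an extraction artifact), that index 0 means the whole match, and that a non-matching input yields an empty string. Hive uses Java regular expressions, so edge cases may differ from Python's re module.

```python
import re


def regexp_extract(subject, pattern, index):
    """Rough Python stand-in for Hive's regexp_extract(subject, pattern, index).

    Assumption: returns the capture group `index` of the FIRST match found
    anywhere in `subject` (index 0 = the whole match), or "" if nothing matches.
    """
    match = re.search(pattern, subject)
    if match is None:
        return ""
    return match.group(index)


# '[0-9]*' matches zero or more digits, so the first match can be empty.
print(repr(regexp_extract("123abc", "[0-9]*", 0)))  # '123'
print(repr(regexp_extract("abc123", "[0-9]*", 0)))  # '' (empty match at position 0)

# '[0-9]+' requires at least one digit, so the search skips ahead to "123".
print(repr(regexp_extract("abc123", "[0-9]+", 0)))  # '123'

# A non-zero index selects a capture group instead of the whole match.
print(repr(regexp_extract("foothebar", "foo(.*?)(bar)", 1)))  # 'the'
```

The second call shows why a `*` quantifier can be surprising here: an empty match at the start of the string counts as the first match, so nothing is extracted even though digits appear later.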

MapReduce on Hadoop says “output file already exists”

I ran a wordcount example using MapReduce for the first time, and it worked. Then I stopped the cluster, started it again, and followed the same steps. This error was displayed:

10P:/$

October 12, 2021 | By Simo | Hadoop | already existed, file, Hadoop, MapReduce, Output, said

hadoop fs -cp says that the file does not exist?

The file new.txt definitely exists; I don’t know why, when I try to copy it into the HDFS directory, it says the file does not exist.

deepak@deepak:/$ cd $HOME/fs
deepak@deepak:~/fs$ ls
new.txt
deepak@deepak:

October 12, 2021 | By Simo | Hadoop | cp, existence, file, fs, Hadoop, say

Hadoop sequential data access

According to Hadoop: The Definitive Guide:

HDFS is a filesystem designed for storing very large files with
streaming or sequential data access patterns

What is streaming or sequenti

October 12, 2021 | By Simo | Hadoop | Access, data, Hadoop, order

Hadoop pseudo-distributed mode

This post skips virtual machine creation and basic Linux configuration and records only the key configuration for building a pseudo-distributed Hadoop cluster on a single node.

Get the hadoop

October 12, 2021 | By Simo | Hadoop | distributed, Hadoop, Pseudo, set

Deploying Hadoop 3 on a Mac (pseudo-distributed)

Environment information: Operating system: macOS Mojave 10.14.6; JDK: 1.8.0_211 (installation location: /Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home); Hadoop: 3.2.1

In “S

September 24, 2021 | By Simo | Macbook | deployment, distributed, Hadoop, Hadoop3, Mac, Pseudo

(Big news) The fastest way to run fully distributed Hadoop

1. Prepare the virtual machines: clone 3 Linux virtual machines, each with only a minimal CentOS installation.

Network allocation table

Host name | IP address
hadoop1

August 22, 2021 | By Simo | Hadoop | complete, distributed, fastest, Hadoop, Heavy pound, run

9. Hadoop HDFS Overview

1. Background and definition of HDFS. Background:

As the amount of data grows larger and larger, a single system can no longer store all of it, so you need to all

August 22, 2021 | By Simo | Hadoop | Hadoop, HDFS, overview

6. Hadoop operating mode (fully distributed) (Part 1)

Note: Fully distributed mode is what is used in actual production and development.

1) Prepare 3 client machines (firewall disabled, static IP, host name set)

2) Install JDK

3) Configure environment variables

August 22, 2021 | By Simo | Hadoop | complete, distributed, Hadoop, mode, run