(Must-read) The fastest fully distributed Hadoop setup

1. Prepare the virtual machine

Clone 3 Linux virtual machines from a base machine with CentOS installed in minimal mode.

2. Configure the cluster network

Network allocation table

Host name      IP address
hadoop1        192.168.178.101
hadoop2        192.168.178.102
hadoop3        192.168.178.103

vi /etc/hosts

Add the following to /etc/hosts on every node:

192.168.178.101 hadoop1
192.168.178.102 hadoop2
192.168.178.103 hadoop3
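Each clone also needs its own hostname and a static IP matching the table above. A minimal sketch for hadoop1 (the interface name ens33 is an assumption; adjust it, the gateway and DNS to your environment):

hostnamectl set-hostname hadoop1

# /etc/sysconfig/network-scripts/ifcfg-ens33 (relevant lines; ens33 is an assumption)
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.178.101
NETMASK=255.255.255.0

systemctl restart network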

3. SSH passwordless login configuration

(0) Enter the .ssh directory

cd ~/.ssh/

Note: If this directory does not exist, it is because ssh has not been used yet; it is created automatically the first time ssh is used.

(1) Generate an SSH key pair

ssh-keygen -t rsa

Note: Press Enter three times to accept the defaults.

(2) Copy the public key

ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3

Note: Steps (1) and (2) must be repeated on every server that needs to issue its public key, namely the NameNode host and the ResourceManager host.
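To confirm that passwordless login works, each of the following should print the remote hostname without prompting for a password:

ssh hadoop1 hostname
ssh hadoop2 hostname
ssh hadoop3 hostname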


4. Write cluster distribution script xsync

#!/bin/bash
#1 Get the number of input arguments; if there are none, exit
pcount=$#
if ((pcount==0)); then
    echo no args
    exit
fi

#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

#3 Get the absolute path of the parent directory
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

#4 Get the current user name
user=`whoami`

#5 Loop over the hosts and distribute the file
for ((host=1; host<4; host++)); do
    echo ------------------- hadoop$host --------------------
    rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
done
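One way to make the script usable everywhere is to mark it executable and put it on the PATH; the /usr/local/bin location below is only a suggestion:

chmod +x xsync
cp xsync /usr/local/bin/
# distribute the script itself to the other nodes
xsync /usr/local/bin/xsync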

5. Cluster configuration

        hadoop1               hadoop2                         hadoop3
HDFS    NameNode, DataNode    DataNode                        SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager    NodeManager

Principle: NameNode, ResourceManager, and SecondaryNameNode are placed on different servers.

In each of the *-env.sh files, the only change needed is to set JAVA_HOME.
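For example, the line to set in hadoop-env.sh, yarn-env.sh and mapred-env.sh (the JDK path matches the directory distributed later with xsync):

export JAVA_HOME=/opt/module/jdk1.8.0_144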

(1) core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

(2) hdfs-site.xml, hadoop-env.sh

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop3:50090</value>
</property>

(3) yarn-site.xml, yarn-env.sh

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop2</value>
</property>

(4) mapred-site.xml, mapred-env.sh


<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
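Note: mapred-site.xml does not exist by default in Hadoop 2.7.2; it is normally created from the bundled template first (run from /opt/module/hadoop-2.7.2):

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml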

(5) Configure slaves

hadoop1
hadoop2
hadoop3
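The slaves file lives under etc/hadoop/ in the Hadoop installation; one way to write it (it must not contain extra blank lines or trailing spaces):

cat > /opt/module/hadoop-2.7.2/etc/hadoop/slaves << EOF
hadoop1
hadoop2
hadoop3
EOF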

(6) Distribute the configuration

xsync /opt/module/hadoop-2.7.2/
xsync /opt/module/jdk1.8.0_144
xsync /etc/profile

(7) Update system environment variables across the cluster

Used to refresh JAVA_HOME, HADOOP_HOME, and PATH on every node:

source /etc/profile
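For reference, the environment section of the /etc/profile distributed above would look roughly like this (paths match the directories used earlier):

export JAVA_HOME=/opt/module/jdk1.8.0_144
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin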

6. Cluster startup

(1) First startup only: format the NameNode (run on hadoop1)

hdfs namenode -format
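If the NameNode ever needs to be re-formatted, stop all daemons and delete the old data and logs on every node first; a sketch, assuming the data directory configured in core-site.xml above and the default logs directory:

rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs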

(2) Start HDFS; run this on the NameNode host (hadoop1)

sbin/start-dfs.sh

(3) Start YARN; run this on the ResourceManager host (hadoop2)

sbin/start-yarn.sh
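After both scripts have run, every daemon from the cluster plan in section 5 should show up in jps. A quick way to check all three nodes from one shell (if jps is not on the non-interactive PATH, use its full path under $JAVA_HOME/bin):

for h in hadoop1 hadoop2 hadoop3; do
    echo "===== $h ====="
    ssh $h jps
done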

7. Cluster test

(1) View NameNode information in the web UI

Enter the address hadoop1:50070 in a browser.


(2) Upload a file

hdfs dfs -put hadoop-2.7.2.tar.gz /


(3) View file block information

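Besides the web UI, the block placement of the uploaded file can also be inspected from the command line:

hdfs fsck /hadoop-2.7.2.tar.gz -files -blocks -locations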
