(Must-read) The fastest fully distributed Hadoop setup

1. Prepare the virtual machine

Clone 3 Linux virtual machines from a base machine with CentOS installed in minimal mode.

2. Configure the cluster network

Network allocation table

Host name      IP address
hadoop1        192.168.178.101
hadoop2        192.168.178.102
hadoop3        192.168.178.103

vi /etc/hosts

Add the following to /etc/hosts on every node:

192.168.178.101 hadoop1
192.168.178.102 hadoop2
192.168.178.103 hadoop3
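Each clone also needs its own hostname and a static IP matching the table above. A minimal sketch for hadoop1 (the interface name ens33 is an assumption; adjust it, the gateway and DNS to your environment):

hostnamectl set-hostname hadoop1

# /etc/sysconfig/network-scripts/ifcfg-ens33 (relevant lines; ens33 is an assumption)
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.178.101
NETMASK=255.255.255.0

systemctl restart network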

3. SSH passwordless login configuration

(0) Enter the .ssh directory

cd ~/.ssh/

Note: If this directory does not exist, it is because ssh has not been used yet; it is created automatically the first time ssh is used.

(1) Generate an SSH key pair

ssh-keygen -t rsa

Note: Press Enter three times to accept the defaults.

(2) Copy the public key

ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3

Note: Steps (1) and (2) must be repeated on every server that needs to issue its public key, namely the NameNode host and the ResourceManager host.
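To confirm that passwordless login works, each of the following should print the remote hostname without prompting for a password:

ssh hadoop1 hostname
ssh hadoop2 hostname
ssh hadoop3 hostname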


4. Write cluster distribution script xsync

#!/bin/bash
#1 Get the number of input arguments; if there are none, exit
pcount=$#
if ((pcount==0)); then
    echo no args
    exit
fi

#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

#3 Get the absolute path of the parent directory
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

#4 Get the current user name
user=`whoami`

#5 Loop over the hosts and distribute the file
for ((host=1; host<4; host++)); do
    echo ------------------- hadoop$host --------------------
    rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
done
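One way to make the script usable everywhere is to mark it executable and put it on the PATH; the /usr/local/bin location below is only a suggestion:

chmod +x xsync
cp xsync /usr/local/bin/
# distribute the script itself to the other nodes
xsync /usr/local/bin/xsync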

5. Cluster configuration

        hadoop1               hadoop2                         hadoop3
HDFS    NameNode, DataNode    DataNode                        SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager    NodeManager

Principle: NameNode, ResourceManager, and SecondaryNameNode are placed on different servers.

In each of the *-env.sh files, the only change needed is to set JAVA_HOME.
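For example, the line to set in hadoop-env.sh, yarn-env.sh and mapred-env.sh (the JDK path matches the directory distributed later with xsync):

export JAVA_HOME=/opt/module/jdk1.8.0_144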

(1) core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

(2) hdfs-site.xml, hadoop-env.sh

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop3:50090</value>
</property>

(3) yarn-site.xml, yarn-env.sh

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop2</value>
</property>

(4) mapred-site.xml, mapred-env.sh


<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
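Note: mapred-site.xml does not exist by default in Hadoop 2.7.2; it is normally created from the bundled template first (run from /opt/module/hadoop-2.7.2):

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml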

(5) Configure slaves

hadoop1
hadoop2
hadoop3
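The slaves file lives under etc/hadoop/ in the Hadoop installation; one way to write it (it must not contain extra blank lines or trailing spaces):

cat > /opt/module/hadoop-2.7.2/etc/hadoop/slaves << EOF
hadoop1
hadoop2
hadoop3
EOF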

(6) Distribute the configuration

xsync /opt/module/hadoop-2.7.2/
xsync /opt/module/jdk1.8.0_144
xsync /etc/profile

(7) Update system environment variables across the cluster

Used to refresh JAVA_HOME, HADOOP_HOME, and PATH on every node:

source /etc/profile
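For reference, the environment section of the /etc/profile distributed above would look roughly like this (paths match the directories used earlier):

export JAVA_HOME=/opt/module/jdk1.8.0_144
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin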

6. Cluster startup

(1) First startup only: format the NameNode (run on hadoop1)

hdfs namenode -format
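If the NameNode ever needs to be re-formatted, stop all daemons and delete the old data and logs on every node first; a sketch, assuming the data directory configured in core-site.xml above and the default logs directory:

rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs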

(2) Start HDFS; run this on the NameNode host (hadoop1)

sbin/start-dfs.sh

(3) Start YARN; run this on the ResourceManager host (hadoop2)

sbin/start-yarn.sh
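After both scripts have run, every daemon from the cluster plan in section 5 should show up in jps. A quick way to check all three nodes from one shell (if jps is not on the non-interactive PATH, use its full path under $JAVA_HOME/bin):

for h in hadoop1 hadoop2 hadoop3; do
    echo "===== $h ====="
    ssh $h jps
done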

7. Cluster test

(1) View NameNode information in the web UI

Enter the address hadoop1:50070 in a browser.


(2) Upload a file

hdfs dfs -put hadoop-2.7.2.tar.gz /


(3) View file block information

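Besides the web UI, the block placement of the uploaded file can also be inspected from the command line:

hdfs fsck /hadoop-2.7.2.tar.gz -files -blocks -locations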
