1. Introduction
HUE is an open source Apache Hadoop UI system that was originally developed by Cloudera and later contributed to the open source community. It is implemented on top of the Python web framework Django. With Hue we can operate a Hadoop cluster from the browser, for example uploading and downloading HDFS files or executing MapReduce jobs.
2. Install
2.1 Install the third-party packages that Hue depends on
# Install the xml package
$>sudo yum install -y libxml2-devel.x86_64
# Install other packages
$>sudo yum install -y libxslt-devel.x86_64 python-devel openldap-devel asciidoc cyrus-sasl-gssapi
3. Configure Hue
Hue can connect to Hadoop, that is, access files on HDFS, in two ways:
- WebHDFS
  Provides high-speed data transfer; the client communicates directly with the DataNodes.
- HttpFS
  A proxy service that makes it easier for systems outside the cluster to integrate. Note: only this method can be used in HA mode. (See the configuration sketch after this list.)
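Both options are wired into Hue through the same webhdfs_url setting in hue.ini; only the endpoint differs. A rough sketch of the two forms (the s101 host comes from later in this article, and port 50070 is only the Hadoop 2.x default NameNode HTTP port, so treat both as assumptions to adapt):
# WebHDFS: point at the NameNode's HTTP port directly (non-HA setups)
webhdfs_url=http://<namenode-host>:50070/webhdfs/v1
# HttpFS: point at the httpfs proxy daemon (required with NameNode HA)
webhdfs_url=http://s101:14000/webhdfs/v1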
3.1 Configure the Hue proxy user in Hadoop
- [/soft/hadoop/etc/hadoop/core-site.xml]
Note: Hadoop's proxy user properties follow the pattern hadoop.proxyuser.${superuser}.hosts and hadoop.proxyuser.${superuser}.groups; here my superuser is centos.
<property>
  <name>hadoop.proxyuser.centos.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.centos.groups</name>
  <value>*</value>
</property>
- [/soft/hadoop/etc/hadoop/hdfs-site.xml]
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
- [/soft/hadoop/etc/hadoop/httpfs-site.xml]
<property>
  <name>httpfs.proxyuser.centos.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.centos.groups</name>
  <value>*</value>
</property>
- Distribute the configuration files
$>cd /soft/hadoop/etc/hadoop
$>xsync.sh core-site.xml
$>xsync.sh hdfs-site.xml
$>xsync.sh httpfs-site.xml
3.2 Restart the HDFS and YARN processes
$>stop-dfs.sh
$>stop-yarn.sh
$>start-dfs.sh
$>start-yarn.sh
3.3 Start the httpfs process
3.3.1 Start the process
$>/soft/hadoop/sbin/httpfs.sh start
3.3.2 Check port 14000
$>netstat -anop | grep 14000
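Optionally, verify the service functionally as well. The request below is a sketch that assumes the proxy user centos configured earlier; it lists the HDFS root directory through HttpFS using the standard WebHDFS REST API:
$>curl "http://s101:14000/webhdfs/v1/?op=LISTSTATUS&user.name=centos"
If httpfs and the proxy user are set up correctly, this returns a JSON FileStatuses listing.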
3.4 Configure the hue.ini file
Here we are using Hadoop's NameNode HA mode, so only httpfs can be used to access HDFS files. Note that webhdfs_url points to port 14000, as shown below.
[/home/centos/hue-3.12.0/desktop/conf/hue.ini]
...
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://mycluster:8020
# NameNode logical name.
logical_name=mycluster
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://s101:14000/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# Directory of the Hadoop configuration
hadoop_conf_dir=/soft/hadoop/etc/hadoop
3.5 Configure MySQL as Hue's database
...
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, ‘name’, below is a path to the filename. For other backends, it is the database name
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
engine=mysql
host=192.168.231.1
port=3306
user=root
password=root
# Execute this script to produce the database password. This will be used when 'password' is not set.
## password_script=/path/script
name=hue
## options={}
# Database schema, to be used only when public schema is revoked in postgres
## schema=
4. Initialize the MySQL library and generate the tables
4.1 Create the hue database
Because the database we specified in the hue.ini file is named hue, we need to create the hue database first.
mysql>create database hue;
4.2 Initialize the data table
This step creates the tables and inserts some initial data. The Hue tables are initialized with the hue syncdb command. During the process you are asked for a user name and password. As shown below:
# Sync the database
$>~/hue-3.12.0/build/env/bin/hue syncdb
# Import data, including the tables needed for oozie, pig, and desktop
$>~/hue-3.12.0/build/env/bin/hue migrate
4.3 Check whether the tables were generated in MySQL
Check whether the required tables have been generated in MySQL:
mysql>show tables;
5. Start the Hue process
$>~/hue-3.12.0/build/env/bin/supervisor
The startup process is shown in the following figure:
6. Check the web UI
http://s101:8888/
Open the login interface and enter the account created above.
7. Visit HDFS
Click the HDFS link in the upper right corner to enter the HDFS file browser.
8. Configure ResourceManager
8.1 Modify the hue.ini configuration file
[[yarn_clusters]]
...
# [[[ha]]]
# Resource Manager logical name (required for HA)
logical_name=cluster1
# Un-comment to enable
## submit_to=True
# URL of the ResourceManager API
resourcemanager_api_url=http://s101:8088
8.2 View job execution status
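The Job Browser only shows something once a job has actually run. A quick way to produce one is to submit one of the bundled MapReduce examples from the command line; the jar path below is an assumption based on the /soft/hadoop layout used in this article, so adjust it to your installation:
$>hadoop jar /soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10
The running and finished job should then appear in Hue's Job Browser.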
9. Configure Hive
9.1 Edit the hue.ini file
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=s101
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/soft/hive/conf
9.2 Install dependent packages
If the following dependency packages are not installed, Hue reports SASL errors and claims that HiveServer2 has not been started.
$>sudo yum install -y cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi
9.3 Start the HiveServer2 server
$>/soft/hive/bin/hiveserver2
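Before testing through Hue, it can help to confirm that HiveServer2 actually accepts Thrift connections. A minimal check with Beeline, assuming the host and port configured above:
$>/soft/hive/bin/beeline -u jdbc:hive2://s101:10000
If the connection succeeds you get a beeline prompt and can run show databases; against the same server that Hue will use.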
9.4 View the web UI
10. Configure HBase
10.1 Modify the hue.ini configuration file
The hbase setting is the address of the HBase Thrift server, not the master address, and it needs to be wrapped in parentheses. The Thrift server needs to be started separately.
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
hbase_clusters=(s101:9090)
# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/soft/hbase/conf
10.2 Start the Thrift server
Note: the service name used when starting the Thrift server is thrift. Some documents say thrift2, but here it is thrift.
$>hbase-daemon.sh start thrift
10.3 View port 9090
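As with the httpfs check earlier, confirm that the Thrift server is listening:
$>netstat -anop | grep 9090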
10.4 View HBase in Hue
11. Configure Spark
11.1 Introduction
The integration of Hue and Spark goes through the Livy server, which plays a role similar to HiveServer2: it provides a set of REST-based services, accepts HTTP requests submitted by clients, and forwards them to the Spark cluster. The Livy server is not included in the Spark distribution and needs to be downloaded separately.
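As an illustration of that REST interface (a sketch only, assuming Livy is already running on s101 with its default port 8998, both of which are configured later in this article), a Spark session and a statement can be created with plain HTTP requests:
# create a new interactive Spark session
$>curl -X POST -H "Content-Type: application/json" -d '{"kind":"spark"}' http://s101:8998/sessions
# submit a statement to that session (the session id 0 is just the first one created)
$>curl -X POST -H "Content-Type: application/json" -d '{"code":"1 + 1"}' http://s101:8998/sessions/0/statements
Hue's notebook issues this same kind of call on your behalf.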
Note: Hue uses notebooks to write Scala or Python programs. To make sure the notebook can be used, the Hadoop httpfs process must be running.
Make sure to download a newer version, otherwise some classes will not be found. The download address is as follows:
http://mirrors.tuna.tsinghua.edu.cn/apache/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip
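Assuming the machine has direct internet access, the archive can be fetched with wget:
$>wget http://mirrors.tuna.tsinghua.edu.cn/apache/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip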
11.2 Unzip
$>unzip livy-0.5.0-incubating-bin.zip -d /soft/
11.3 Start the Livy server
$>/soft/livy-0.5.0-incubating-bin/bin/livy-server
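Like the other services, Livy can be checked by looking at its default port 8998:
$>netstat -anop | grep 8998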
11.4 Configure Hue
It is recommended to start the job in local or yarn mode. Here we configure it as spark://s101:7077.
[spark]
# Host address of the Livy Server.
livy_server_host=s101
# Port of the Livy Server.
livy_server_port=8998
# Configure Livy to start in local 'process' mode, or 'yarn' workers.
livy_server_session_kind=spark://s101:7077
11.5 Use a notebook to write Scala programs
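Once a Spark notebook session has started, Livy exposes the SparkContext as sc, so a minimal smoke test (an illustrative snippet, not from the original article) can be pasted into a Scala cell:
// run a trivial job on the cluster to confirm the session works
val nums = sc.parallelize(1 to 100)
println(nums.sum())   // prints 5050.0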