Hue installation and use

1. Introduction

Hue is an open source Apache Hadoop UI system. It was originally developed by Cloudera and later contributed to the open source community, and it is implemented on top of the Python web framework Django. With Hue we can operate a Hadoop cluster through a browser, for example to put and get files or to execute MapReduce jobs.

2. Install

2.1 Install the third-party packages that hue depends on

#Install the xml package
$>sudo yum install -y libxml2-devel.x86_64

#Install other packages
$>sudo yum install -y libxslt-devel.x86_64 python-devel openldap-devel asciidoc cyrus-sasl-gssapi


3. Configure hue

Hue can connect to hadoop, that is, access hadoop files, in one of two ways; a quick connectivity check follows the list.

  • WebHDFS

    Provides high-speed data transfer; the client communicates directly with the DataNodes.

  • HttpFS

    A proxy service that makes it easier for systems outside the cluster to integrate with it. Note: only this method can be used in HA mode.
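
Both WebHDFS and HttpFS expose the same REST API under the /webhdfs/v1 path, so either can be sanity-checked with curl. A minimal check, assuming the HttpFS service configured later in this article (host s101, port 14000, superuser centos):

$>curl "http://s101:14000/webhdfs/v1/?op=LISTSTATUS&user.name=centos"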

3.1 Configure the hue proxy user of Hadoop

  1. [/soft/hadoop/etc/hadoop/core-site.xml]

    Note: hadoop's proxy user properties take the form hadoop.proxyuser.${superuser}.hosts; here my superuser is centos.

    <property>
      <name>hadoop.proxyuser.centos.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.centos.groups</name>
      <value>*</value>
    </property>
  2. [/soft/hadoop/etc/hadoop/hdfs-site.xml]

    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
  3. [/soft/hadoop/etc/hadoop/httpfs-site.xml]

    <property>
      <name>httpfs.proxyuser.centos.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>httpfs.proxyuser.centos.groups</name>
      <value>*</value>
    </property>
  4. Distribute the configuration files

    $>cd /soft/hadoop/etc/hadoop
    $>xsync.sh core-site.xml
    $>xsync.sh hdfs-site.xml
    $>xsync.sh httpfs-site.xml

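After the restart in section 3.2 and starting httpfs in section 3.3, the proxy-user setup can be spot-checked with the doas parameter, which makes the superuser act on behalf of another user. A sketch, where hue as the impersonated user is a hypothetical choice:

$>curl "http://s101:14000/webhdfs/v1/?op=GETHOMEDIRECTORY&user.name=centos&doas=hue"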

3.2 Restart hadoop and yarn processes

$>stop-dfs.sh
$>stop-yarn.sh
$>start-dfs.sh
$>start-yarn.sh

3.3 Start the httpfs process

3.3.1 Start the process
$>/soft/hadoop/sbin/httpfs.sh start
3.3.2 Check port 14000
$>netstat -anop | grep 14000


3.4 Configure hue file

Here we are using hadoop's namenode HA mode, so we can only configure httpfs to access hdfs files. It should be noted that webhdfs_url specifies a port of 14000, as shown below.

[/home/centos/hue-3.12.0/desktop/conf/hue.ini]

...
    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://mycluster:8020

      # NameNode logical name.
      logical_name=mycluster

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://s101:14000/webhdfs/v1

      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

      # Directory of the Hadoop configuration
      hadoop_conf_dir=/soft/hadoop/etc/hadoop

3.5 Configure hue's database as mysql

...
    [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, sqlite3 or oracle.
    #
    # Note that for sqlite3, ‘name’, below is a path to the filename. For other backends, it is the database name
    # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
    # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
    # Note for MariaDB use the 'mysql' engine.
    engine=mysql
    host=192.168.231.1
    port=3306
    user=root
    password=root
    # Execute this script to produce the database password. This will be used when 'password' is not set.
    ## password_script=/path/script
    name=hue
    ## options={}
    # Database schema, to be used only when public schema is revoked in postgres
    ## schema=

4. Initialize the mysql library and generate tables

4.1 Create the hue library

Because the database we specified in the hue.ini file is named hue, we need to create the hue database first.

mysql>create database hue;
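
If hue should connect as a dedicated account rather than root (the user name hue and the password below are hypothetical), grant it access to the new database and mirror those credentials in hue.ini:

mysql>grant all privileges on hue.* to 'hue'@'%' identified by 'huepass';
mysql>flush privileges;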

4.2 Initialize the data table

This step creates the tables and inserts some initial data. Hue initializes its data tables with the hue syncdb command. During initialization you are asked for a user name and password, as shown below:

#Sync database
$>~/hue-3.12.0/build/env/bin/hue syncdb

#Import data, including the tables needed by oozie, pig and desktop
$>~/hue-3.12.0/build/env/bin/hue migrate
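
If you skip the account prompt or need another admin later, hue exposes the usual Django management commands through the same binary; a sketch, assuming the same install path:

$>~/hue-3.12.0/build/env/bin/hue createsuperuser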


4.3 Check whether the tables are generated in mysql

Check whether the required tables have been generated in mysql:

mysql>show tables;


5. Start the hue process

$>~/hue-3.12.0/build/env/bin/supervisor
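
The supervisor runs in the foreground. If hue should keep running after the terminal closes, one simple option is to wrap it in nohup:

$>nohup ~/hue-3.12.0/build/env/bin/supervisor > /dev/null 2>&1 &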


6. Check the webui

http://s101:8888/ 

Open the login interface and enter the account created above.


7. Visit hdfs

Click on the hdfs link in the upper right corner to enter the hdfs system screen.


8. Configure ResourceManager

8.1 Modify hue.ini configuration file

[[yarn_clusters]]
    ...
    # [[[ha]]]
      # Resource Manager logical name (required for HA)
      logical_name=cluster1

      # Un-comment to enable
      ## submit_to=True

      # URL of the ResourceManager API
      resourcemanager_api_url=http://s101:8088
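
Hue talks to the ResourceManager through its REST API, so the URL above can be probed directly; YARN's cluster-info endpoint is a convenient check:

$>curl http://s101:8088/ws/v1/cluster/info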

8.2 View job execution status


9. Configure hive

9.1 Edit the hue.ini file

[beeswax]
  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=s101

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/soft/hive/conf

9.2 Install dependent packages

If the following dependency packages are not installed, hue will raise SASL errors claiming that hiveserver2 is not started.

$>sudo yum install -y cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi

9.3 Start hiveserver2 server

$>/soft/hive/bin/hiveserver2
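
Before going back to hue, it is worth confirming that hiveserver2 accepts Thrift connections. A quick probe with beeline, assuming beeline is on the PATH and connecting as the centos user:

$>beeline -u jdbc:hive2://s101:10000 -n centos -e "show databases;"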

9.4 View webui


10. Configure hbase

10.1 Modify hue.ini configuration file

The hbase configuration points at the thrift server address, not the master address, and the value must be wrapped in parentheses. The thrift server needs to be started separately.

[hbase]
  # Comma-separated list of HBase Thrift servers for clusters in the format of ‘(name|host:port)’.
  # Use full hostname with security.
  # If using Kerberos we assume GSSAPI SASL, not PLAIN.
  hbase_clusters=(s101:9090)

  # HBase configuration directory, where hbase-site.xml is located.
  hbase_conf_dir=/soft/hbase/conf

10.2 Start the thrift server

Note: the service name used to start the thrift server is thrift. Remember: some documents say thrift2, but here it is thrift.

$>hbase-daemon.sh start thrift

10.3 View port 9090
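
As with httpfs in section 3.3.2, the listening port can be confirmed with netstat:

$>netstat -anop | grep 9090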


10.4 View hbase in hue


11. Configure spark

11.1 Introduction

Hue integrates with spark through the livy server, which plays a role similar to hiveserver2: it provides a set of RESTful services, accepts HTTP requests submitted by clients, and forwards them to the spark cluster. The livy server is not included in the spark distribution and needs to be downloaded separately.

Note: Hue uses the notebook to write scala or python programs. To make sure the notebook can be used, you need to start the hadoop httpfs process. Remember this!

Be careful to download and use a newer version, otherwise some classes will not be found. The download address is as follows:

http://mirrors.tuna.tsinghua.edu.cn/apache/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip

11.2 Unzip

$>unzip livy-server-0.2.0.zip -d /soft/

11.3 Start the livy server

$>/soft/livy-server-0.2.0/bin/livy-server
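
Once livy is up, its REST API can be exercised directly; a minimal sketch that creates an interactive spark session and then lists sessions, assuming the default port 8998:

$>curl -X POST -H "Content-Type: application/json" -d '{"kind":"spark"}' http://s101:8998/sessions
$>curl http://s101:8998/sessions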


11.4 Configure hue

It is recommended to start the job in local or yarn mode. Here we configure it as spark://s101:7077.

[spark]
  # Host address of the Livy Server.
  livy_server_host=s101

  # Port of the Livy Server.
  livy_server_port=8998

  # Configure Livy to start in local ‘process’ mode, or ‘yarn’ workers.
  livy_server_session_kind=spark://s101:7077

11.5 Use notebook to write scala programs

