Presto: A Big Data Query Engine 10 Times Faster Than Hive

Hive is currently the most popular big data query engine. It is a SQL-like query tool built on MapReduce (MR): it translates each SQL query into MapReduce jobs, which greatly lowers the barrier to querying big data and lets ordinary business users run queries directly. Because it is based on MR, however, speed is its weak point, and a query usually runs for a long time before it returns a result. Facebook, which created Hive, responded with a new tool: Presto. Its queries run on average 10 times faster than Hive's. Let's deploy it and try it out.

1. Preparations

Operating system: CentOS 7

Java: JDK 8 (update 155 or later); I am using jdk1.8.0_191

Presto server: presto-server-0.221.tar.gz

Presto client: presto-cli-0.221-executable.jar

Note:

a) This deployment is based on Hive, so Hadoop and Hive are already deployed on the relevant nodes;

b) The official Presto website is https://prestodb.github.io. The Presto server, client, and JDBC JARs can all be downloaded from there.

2. Deployment stage

1. Upload the JDK, the Presto server, and the Presto client to each server

I upload the JDK package to the /usr/local directory, unpack it, create a symbolic link, and configure the environment variables. If you do not configure the environment variables, you can set them in the launcher script instead.

Upload the Presto server and client packages to /opt/presto, and unpack the server package there.

2. Node information

The cluster contains one coordinator node and eight worker nodes.

IP              Node role     Node name
192.168.11.22   Coordinator   node22
192.168.11.50   Worker        node50
192.168.11.51   Worker        node51
192.168.11.52   Worker        node52
192.168.11.53   Worker        node53
192.168.11.54   Worker        node54
192.168.11.55   Worker        node55
192.168.11.56   Worker        node56
192.168.11.57   Worker        node57

3. Create the Presto data and log directory

The following operations are the same on every node; only the configuration files need to be adjusted for each node.

mkdir -p /data/presto
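Because this step is identical on every node, it can be scripted. The following is only a sketch: it assumes the node names from the node list resolve as host names and that passwordless ssh is set up, neither of which the steps above establish. DRY_RUN and run_on_all are hypothetical names introduced here; with the default DRY_RUN=1 the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run the same command on every node of the cluster.
set -eu

# Node names taken from the node list; assumed to be resolvable host names.
NODES="node22 node50 node51 node52 node53 node54 node55 node56 node57"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints the commands, 0 runs them over ssh

run_on_all() {
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "ssh $node $*"
    else
      ssh "$node" "$@"
    fi
  done
}

# Create the data directory everywhere (printed only, unless DRY_RUN=0).
run_on_all mkdir -p /data/presto
```

Running it with DRY_RUN=0 executes the command on each node in turn; any other per-node command can be passed to run_on_all in the same way.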

4. Create etc directory

cd /opt/presto/presto-server-0.221

mkdir etc

5. Create the required configuration files

1) Create and configure config.properties

All of the files below go in the etc directory created in the previous step. On the coordinator node, the following configuration is recommended (adjust the memory sizes to your environment):

vim config.properties
## Add the following content
coordinator=true
datasources=hive
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=80GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
discovery-server.enabled=true
discovery.uri=http://192.168.11.22:8080

On a worker node:

vim config.properties
## Add the following content
coordinator=false
#datasources=hive
#node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=80GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
#discovery-server.enabled=true
discovery.uri=http://192.168.11.22:8080

Parameter description:

coordinator: Whether this instance runs as a coordinator (accepts queries from clients and manages query execution).
node-scheduler.include-coordinator: Whether the coordinator also schedules work on itself, that is, acts as a worker. On large clusters, processing work on the coordinator hurts query performance.
http-server.http.port: The HTTP port. Presto uses HTTP for all internal and external communication.
query.max-memory: The maximum total (distributed) memory a single query may use across the cluster.
query.max-memory-per-node: The maximum user memory a query may use on any one node.
query.max-total-memory-per-node: The maximum user and system memory a query may use on any one node.
discovery-server.enabled: Presto uses the Discovery service to find all nodes in the cluster. Every Presto instance registers with the Discovery service when it starts. The coordinator can run an embedded Discovery service, which simplifies deployment because no additional service is required. It shares the HTTP port with Presto.
discovery.uri: The URI of the Discovery service. Set it to the host and port of the coordinator (192.168.11.22:8080 in this cluster). The URI must not end with a slash; pay special attention to this, otherwise a 404 error is reported.
In addition, the following properties are available:
jmx.rmiregistry.port: The port of the JMX RMI registry. JMX clients connect to this port.
jmx.rmiserver.port: The port of the JMX RMI server. Presto can be monitored through JMX.
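As a rough sketch, the coordinator and worker listings above can be generated from one script. The variables ROLE, ETC_DIR and COORDINATOR_URI and the function write_config are hypothetical names introduced for illustration, not Presto settings. The script writes only the properties explained in the parameter description, and by default it writes into a temporary directory rather than the real etc directory.

```shell
#!/bin/sh
# Sketch: write config.properties for a coordinator or a worker.
set -eu

ROLE="${ROLE:-worker}"                  # hypothetical: "coordinator" or "worker"
ETC_DIR="${ETC_DIR:-$(mktemp -d)}"      # on a real node: /opt/presto/presto-server-0.221/etc
COORDINATOR_URI="${COORDINATOR_URI:-http://192.168.11.22:8080}"

# write_config ROLE DIR: writes DIR/config.properties and prints its path.
write_config() {
  role="$1"
  out="$2/config.properties"
  {
    if [ "$role" = "coordinator" ]; then
      echo "coordinator=true"
      echo "node-scheduler.include-coordinator=false"
      echo "discovery-server.enabled=true"
    else
      echo "coordinator=false"
    fi
    echo "http-server.http.port=8080"
    echo "query.max-memory=80GB"
    echo "query.max-memory-per-node=10GB"
    echo "query.max-total-memory-per-node=10GB"
    # Strip one trailing slash, since the discovery URI must not end with one.
    echo "discovery.uri=${COORDINATOR_URI%/}"
  } > "$out"
  echo "$out"
}

CONFIG_FILE="$(write_config "$ROLE" "$ETC_DIR")"
cat "$CONFIG_FILE"
```

On the coordinator it would be run as ROLE=coordinator with ETC_DIR pointing at the etc directory; the memory values are the ones from the listings above and still need adjusting to the actual hardware.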

2) Configure jvm.config

vim jvm.config 
# Add the following content
-server
-Xmx20G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

The JVM configuration file contains the command-line options used to start the Java virtual machine, one option per line. These options are not interpreted by the shell, so options that contain spaces or other special characters must not be quoted (as with the OnOutOfMemoryError option above).

3) Configure log.properties

vim log.properties
# Add the following content
com.facebook.presto=INFO

There are four log levels: DEBUG, INFO, WARN, and ERROR.

4) Configure node.properties

vim node.properties
## Add the following content
node.environment=presto_ocean
node.id=node22
node.data-dir=/data/presto

Parameter description:

node.environment: The environment name. All nodes in a Presto cluster must use the same environment name.
node.id: The unique identifier of this node. Every node must have its own ID (node22 is the coordinator's; use a different value on each worker), and the ID must stay the same across restarts and upgrades of Presto.
node.data-dir: The data directory. Presto stores logs and other data here.
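Since node.id is the one value that differs on every node, it can be derived instead of typed by hand. The sketch below assumes each machine's short host name (node22, node50, and so on) is unique and is acceptable as the ID; ETC_DIR and NODE_ID are hypothetical variables introduced here, and by default the file is written to a temporary directory.

```shell
#!/bin/sh
# Sketch: generate node.properties with a node.id taken from the host name.
set -eu

ETC_DIR="${ETC_DIR:-$(mktemp -d)}"                 # on a real node: the etc directory
NODE_ID="${NODE_ID:-$(uname -n | cut -d. -f1)}"    # short host name, assumed unique per node

cat > "$ETC_DIR/node.properties" <<EOF
node.environment=presto_ocean
node.id=$NODE_ID
node.data-dir=/data/presto
EOF

cat "$ETC_DIR/node.properties"
```

Because the host name does not change on restart or upgrade, the generated ID stays stable as long as the machine is not renamed.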

5) Configure catalog and hive.properties

Create a catalog directory under etc. Because Hive is used here, create hive.properties in that directory and set the corresponding parameters:

mkdir catalog
vim catalog/hive.properties
# Add the following content
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.11.22:9083
hive.config.resources=/opt/hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml,/opt/hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml
hive.allow-drop-table=true

The configuration files are now complete.

3. Start presto-server and connect

Go to /opt/presto/presto-server-0.221/bin, which contains the launcher command.

If you need to set environment variables such as JAVA_HOME, you can also do so in this launcher file. The advantage of setting them here is that Presto can coexist with other JDK versions on the machine without affecting existing services.

1. Start presto-server

./launcher start

If logs are generated under /data/presto/var/log and contain no error messages, the server started normally.
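The check described above can be sketched as a small script. It assumes node.data-dir=/data/presto as configured earlier, so the server log is at /data/presto/var/log/server.log, and it treats any line containing ERROR as a failure, which is a deliberately crude rule. check_presto_log and LOG_FILE are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: decide whether Presto started cleanly by inspecting server.log.
set -eu

# Returns 0 when the log exists, has no ERROR lines, and shows the startup banner.
check_presto_log() {
  log_file="$1"
  if [ ! -f "$log_file" ]; then
    echo "no log at $log_file"
    return 1
  fi
  if grep -q 'ERROR' "$log_file"; then
    echo "errors found in $log_file"
    return 1
  fi
  if ! grep -q 'SERVER STARTED' "$log_file"; then
    echo "server has not finished starting"
    return 1
  fi
  echo "startup looks normal"
}

LOG_FILE="${LOG_FILE:-/data/presto/var/log/server.log}"
check_presto_log "$LOG_FILE" || true
```

Startup takes a little while, so the check may need to be repeated a few seconds after ./launcher start before the banner appears.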

2. Connect with presto-cli

Create a symbolic link named presto that points to the downloaded JAR, presto-cli-0.221-executable.jar, and make it executable:

ln -s presto-cli-0.221-executable.jar presto

chmod +x presto
./presto --server localhost:8080 --catalog hive --schema default

You can now view the databases and tables in Hive.

3. View the web interface

Open http://192.168.11.22:8080/ui/ in a browser to view the overall status of the cluster.

The Presto deployment is now complete. A performance comparison with Hive and some usage suggestions will follow in a later post.

Geng Xiaochu has opened a personal WeChat public account; readers who want to discuss this further or read other articles are welcome to follow it.
