Big Data Query Engine Presto, 10 Times Faster Than Hive

At present, the most popular big data query engine is Hive, a SQL-like query tool built on MapReduce (MR). It translates the input SQL into MapReduce jobs, which greatly lowers the barrier to querying big data, so that ordinary business staff can query it directly. However, because it is based on MR, running speed is its weak point: a query usually takes a long time to return a result. To address this, Facebook, which created Hive, lived up to expectations and built a new tool, Presto. Its queries run on average 10 times faster than Hive's. Let's deploy it and try it out.

1. Preparations

Operating system: CentOS 7

Java: JDK 8 (update 155 or later); I am using jdk1.8.0_191

presto server: presto-server-0.221.tar.gz

presto client: presto-cli-0.221-executable.jar

Note:

a) This deployment is based on Hive, so Hadoop and Hive are already deployed on the relevant nodes.

b) The official Presto website is https://prestodb.github.io. The Presto server, the client, and the JDBC jars can all be downloaded there.
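A quick way to confirm that a node's JDK meets the requirement listed above is to compare the update number in its version string. The sketch below assumes the 1.8.0_NNN version format; java8_update and meets_minimum are hypothetical helper names, and the default minimum of 155 is simply the number quoted in this article.

```shell
# Print the update number of a Java 8 version string such as 1.8.0_191.
# Fails for strings that are not in the 1.8.0_NNN format.
java8_update() {
    case "$1" in
        1.8.0_*) echo "${1#1.8.0_}" ;;
        *) return 1 ;;
    esac
}

# Succeed if the update number is at least the minimum
# (second argument; defaults to the 155 quoted in this article).
meets_minimum() {
    local update
    update="$(java8_update "$1")" || return 1
    update="${update%%-*}"   # drop a build suffix such as -b12, if present
    [ "$update" -ge "${2:-155}" ]
}

if meets_minimum "1.8.0_191"; then
    echo "1.8.0_191 is new enough"
fi
```

On a real node, the version string can be read with `java -version 2>&1 | awk -F'"' '/version/ {print $2}'` and passed to meets_minimum.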

2. Deployment stage

1. Upload the JDK, the Presto server, and the Presto client to each server

I upload the JDK package to /usr/local, unpack it, create a soft link, and configure the environment variables. If you do not configure environment variables, you can set them in the launcher script instead.

Upload the Presto server and client to /opt/presto and unpack the server package there.

2. Node information

The cluster contains one coordinator node and eight worker nodes.

IP	Node role	Node name
192.168.11.22	Coordinator	node22
192.168.11.50	Worker	node50
192.168.11.51	Worker	node51
192.168.11.52	Worker	node52
192.168.11.53	Worker	node53
192.168.11.54	Worker	node54
192.168.11.55	Worker	node55
192.168.11.56	Worker	node56
192.168.11.57	Worker	node57

3. Create the Presto data and log directory

The following operations are the same on all nodes; only the configuration files need to be adjusted for each node.

mkdir -p /data/presto

4. Create etc directory

cd /opt/presto/presto-server-0.221

mkdir etc
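Since the two directory steps are identical on every node, they can be driven from one machine. The sketch below only prints the commands rather than running them, so it can be reviewed first. It assumes passwordless ssh as root and that the node names from the table resolve; print_prepare_commands is a hypothetical helper name.

```shell
# Node names taken from the node table in this article.
NODES="node22 node50 node51 node52 node53 node54 node55 node56 node57"

# Print one ssh command per node; nothing is executed remotely.
print_prepare_commands() {
    local node
    for node in $NODES; do
        # mkdir -p creates parents and is a no-op if the directory exists
        echo "ssh root@$node 'mkdir -p /data/presto /opt/presto/presto-server-0.221/etc'"
    done
}

print_prepare_commands
```

Piping the output to `sh` would run the commands once they look right.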

5. Create the required configuration files in the etc directory

1) Create and configure config.properties

On the coordinator node, the following configuration is recommended (adjust the memory sizes to your actual situation):

vim config.properties
## Add the following content
coordinator=true
datasources=hive
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=80GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
discovery-server.enabled=true
discovery.uri=http://192.168.11.22:8080

On a worker node:

vim config.properties
## Add the following content
coordinator=false
#datasources=hive
#node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=80GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
#discovery-server.enabled=true
discovery.uri=http://192.168.11.22:8080

Parameter description:

coordinator: Whether this instance runs as a coordinator (accepts client queries and manages query execution).

node-scheduler.include-coordinator: Whether the coordinator also acts as a worker. On large clusters, running worker tasks on the coordinator hurts query performance.

http-server.http.port: The HTTP port. Presto uses HTTP for both external and internal communication.

query.max-memory: The maximum total memory a query may use across the cluster.

query.max-memory-per-node: The maximum user memory a query may use on a single node.

query.max-total-memory-per-node: The maximum user and system memory a query may use on a single node.

discovery-server.enabled: Presto uses the Discovery service to find all nodes in the cluster. Each Presto instance registers with the Discovery service when it starts. The coordinator has a built-in Discovery service that shares the HTTP port, which simplifies deployment because no additional service is required.

discovery.uri: The URI of the Discovery service. Set it to the host and port of the coordinator (192.168.11.22:8080 in this cluster). The URI must not end with a slash; pay special attention to this, because a trailing slash causes a 404 error.

In addition, there are the following properties:

jmx.rmiregistry.port: The port of the JMX RMI registry. JMX clients connect to this port.

jmx.rmiserver.port: The port of the JMX RMI server, used for monitoring through JMX.
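Because the coordinator and worker files differ in only a few lines, they can be generated from one template. The following is a minimal sketch, not an official tool: write_config_properties and its arguments are hypothetical names, and the memory values are copied from this article and should be adjusted per cluster. It also strips a trailing slash from the discovery URI, the mistake described above.

```shell
# Write config.properties for one node.
#   $1: role, "coordinator" or anything else for a worker
#   $2: target etc directory
#   $3: discovery URI (defaults to this article's coordinator)
write_config_properties() {
    local role="$1"
    local etc_dir="$2"
    local uri="${3:-http://192.168.11.22:8080}"
    uri="${uri%/}"   # the URI must not end with a slash

    mkdir -p "$etc_dir"
    {
        if [ "$role" = "coordinator" ]; then
            echo "coordinator=true"
            echo "datasources=hive"
            echo "node-scheduler.include-coordinator=false"
            echo "discovery-server.enabled=true"
        else
            echo "coordinator=false"
        fi
        echo "http-server.http.port=8080"
        echo "query.max-memory=80GB"
        echo "query.max-memory-per-node=10GB"
        echo "query.max-total-memory-per-node=10GB"
        echo "discovery.uri=$uri"
    } > "$etc_dir/config.properties"
}

# Generate both variants into a scratch directory for inspection.
demo_dir="$(mktemp -d)"
write_config_properties coordinator "$demo_dir/coordinator"
write_config_properties worker "$demo_dir/worker" "http://192.168.11.22:8080/"
cat "$demo_dir/worker/config.properties"
```

Writing to a scratch directory first makes it easy to diff the result against a hand-written file before copying it into /opt/presto/presto-server-0.221/etc.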

2) Configure jvm.config

vim jvm.config
# Add the following content
-server
-Xmx20G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

The JVM configuration file contains the command-line options used to start the Java virtual machine, one option per line. These options are not interpreted by the shell, so options that contain spaces or other special characters should not be quoted.
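The options in jvm.config are passed to the JVM without going through a shell, so quotation marks copied from a shell command line would become part of the option. A small check like the one below can catch that; check_jvm_config is a hypothetical helper, and the sample file is a scratch copy of two options from this article.

```shell
# Fail if any line of the given jvm.config contains a quote character.
check_jvm_config() {
    local file="$1"
    # grep -n prints each offending line with its line number
    if grep -n "[\"']" "$file" >&2; then
        echo "$file: remove the quotes on the lines above" >&2
        return 1
    fi
}

# Demo on a scratch file.
sample="$(mktemp)"
printf '%s\n' '-Xmx20G' '-XX:OnOutOfMemoryError=kill -9 %p' > "$sample"
check_jvm_config "$sample" && echo "jvm.config looks fine"
```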

3) Configure log.properties

vim log.properties
# Add the following content
com.facebook.presto=INFO

There are four log levels: DEBUG, INFO, WARN, and ERROR.

4) Configure node.properties

vim node.properties
## Add the following content
node.environment=presto_ocean
node.id=node22
node.data-dir=/data/presto

Parameter description:

node.environment: The environment name. All nodes in a Presto cluster must use the same environment name.

node.id: The unique identifier of the node. Every node must have a different ID (here the node name is used, so node22 is the coordinator's), and the ID must stay the same across restarts and upgrades of Presto.

node.data-dir: The data directory, where Presto stores logs and other data.
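Only node.id changes from node to node, so node.properties is easy to generate. The sketch below is one way to do it under the assumption that the short host name matches the node names used in this article; write_node_properties is a hypothetical helper name.

```shell
# Write node.properties into the given etc directory.
#   $1: target etc directory
#   $2: node ID (defaults to the short host name)
write_node_properties() {
    local etc_dir="$1"
    local node_id="${2:-$(hostname -s)}"
    mkdir -p "$etc_dir"
    cat > "$etc_dir/node.properties" <<EOF
node.environment=presto_ocean
node.id=$node_id
node.data-dir=/data/presto
EOF
}

# Demo: generate the file for worker node51 in a scratch directory.
demo_etc="$(mktemp -d)"
write_node_properties "$demo_etc" node51
cat "$demo_etc/node.properties"
```

Passing the ID explicitly, as in the demo, avoids surprises on hosts whose host name differs from the name in the node table.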

5) Configure the catalog directory and hive.properties

Create a catalog directory. Because Hive is used this time, create hive.properties in this directory and configure the corresponding parameters:

mkdir catalog
vim catalog/hive.properties
# Add the following content
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.11.22:9083
hive.config.resources=/opt/hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml,/opt/hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml
hive.allow-drop-table=true
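A stray space or typo in hive.config.resources is easy to miss, so it can help to verify that every listed file exists on the node. The following sketch does that for a given properties file; check_hive_resources is a hypothetical helper, not part of Presto.

```shell
# Check that every file listed in hive.config.resources exists.
#   $1: path to hive.properties
check_hive_resources() {
    local props="$1" value path missing=0
    value="$(grep '^hive.config.resources=' "$props" | cut -d= -f2-)"
    local IFS=','            # split the value on commas only
    for path in $value; do
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on a node (path as used in this article):
#   check_hive_resources /opt/presto/presto-server-0.221/etc/catalog/hive.properties
```

Because the value is split on commas only, a path that accidentally contains a space is reported as missing instead of being silently split in two.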

The configuration files are now complete.

3. Start presto-server and connect

Go to /opt/presto/presto-server-0.221/bin, which contains the launcher command.

If you need to set environment variables such as those for Java, you can also do so in this file. The advantage of setting them here is that different JDK versions can coexist without affecting existing services.

1. Start presto-server

./launcher start

If logs are generated under /data/presto/var/log and contain no error messages, the server has started normally.
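The log check can be scripted. Presto writes a "SERVER STARTED" banner to server.log once startup succeeds, so a minimal sketch is to look for that line; presto_started is a hypothetical helper, and the default path assumes node.data-dir=/data/presto as configured in this article.

```shell
# Succeed if the given server.log shows that Presto finished starting.
#   $1: log file (defaults to the location used in this article)
presto_started() {
    local log_file="${1:-/data/presto/var/log/server.log}"
    [ -f "$log_file" ] && grep -q "SERVER STARTED" "$log_file"
}

# Usage on a node, after ./launcher start:
#   presto_started && echo "Presto is up" || echo "not started yet, check the log"
```

Running `./launcher status` in the bin directory additionally reports whether the server process is running.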

2. Presto-cli connection

Create a soft link named presto that points to the downloaded jar, presto-cli-0.221-executable.jar, and make it executable:

ln -s presto-cli-0.221-executable.jar presto
chmod +x presto
./presto --server localhost:8080 --catalog hive --schema default

You can now view the databases and tables in Hive.
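Besides the interactive prompt, the CLI can run a single statement with its --execute option, which is handy for a first smoke test. The wrapper below is a small sketch: presto_sql and the PRESTO_CLI variable are hypothetical names, and the connection settings are the ones used in this article.

```shell
# Path to the CLI; override it if the presto link lives elsewhere.
PRESTO_CLI="${PRESTO_CLI:-./presto}"

# Run one SQL statement against the hive catalog and print the result.
presto_sql() {
    "$PRESTO_CLI" --server localhost:8080 --catalog hive --schema default --execute "$1"
}

# Usage, once the server is running:
#   presto_sql "SHOW SCHEMAS"
#   presto_sql "SHOW TABLES"
```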


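3. View the web interface

Open the coordinator's HTTP address (http://192.168.11.22:8080 in this deployment) in a browser to view the Presto web interface.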

The Presto deployment is now complete. A performance comparison with Hive and usage suggestions will follow in a later article when there is a chance.

Geng Xiaochu has opened a personal WeChat public account; readers who want to discuss further or read other articles are welcome to follow it.


