Hive is currently one of the most popular big data query engines. It is a SQL-like query tool built on MapReduce (MR): it translates the input SQL into MapReduce jobs, which greatly lowers the barrier to querying big data, so that ordinary business users can query it directly. However, because Hive is based on MR, speed is its weak point, and a query usually runs for a long time before it returns a result. In response, Facebook, which created Hive, built a new tool: Presto. Its queries are on average about 10 times faster than Hive's. Let's deploy it and try it out.
1. Preparations
Operating system: CentOS 7
Java: JDK 8 (update 155 or later); I am using jdk1.8.0_191
Presto server: presto-server-0.221.tar.gz
Presto client: presto-cli-0.221-executable.jar
Note:
a) This deployment is based on Hive, so Hadoop and Hive are already deployed on the relevant nodes.
b) The official Presto website is https://prestodb.github.io; the Presto server, client, and JDBC JARs can all be downloaded from there.
2. Deployment stage
1. Upload the JDK, Presto server, and Presto client to each server
I uploaded the JDK package to /usr/local, unpacked it, created a symbolic link, and configured the environment variables. If you do not want to configure environment variables, you can set them in the launcher script instead.
Upload the Presto server and client to /opt/presto, and unpack the server package there.
2. The information of each node is as follows
The cluster consists of one coordinator node and 8 worker nodes.
ip | node role | node name |
192.168.11.22 | Coordinator | node22 |
192.168.11.50 | Worker | node50 |
192.168.11.51 | Worker | node51 |
192.168.11.52 | Worker | node52 |
192.168.11.53 | Worker | node53 |
192.168.11.54 | Worker | node54 |
192.168.11.55 | Worker | node55 |
192.168.11.56 | Worker | node56 |
192.168.11.57 | Worker | node57 |
3. Create presto data and log directories
The following steps are the same on every node; only the configuration files need to be adjusted for each node.
mkdir -p /data/presto
4. Create etc directory
cd /opt/presto/presto-server-0.221
mkdir etc
cd etc
5. Create the required configuration file
1) Create and configure config.properties
If the node is the coordinator, the following configuration is recommended (adjust the memory sizes to your actual environment):
vim config.properties
## Add the following content
coordinator=true
datasources=hive
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=80GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
discovery-server.enabled=true
discovery.uri=http://192.168.11.22:8080
If it is a worker node:
vim config.properties
## Add the following content
coordinator=false
#datasources=hive
#node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=80GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
#discovery-server.enabled=true
discovery.uri=http://192.168.11.22:8080
Parameter description:
coordinator: Whether this instance runs as a coordinator (accepts queries from clients and manages query execution).
node-scheduler.include-coordinator: Whether the coordinator also does the work of a worker. On large clusters, processing queries on the coordinator can hurt query performance, because its resources are then not fully available for scheduling and managing queries.
http-server.http.port: The HTTP server port. Presto uses HTTP for all internal and external communication.
query.max-memory: The maximum total memory a single query may use across the cluster.
query.max-memory-per-node: The maximum user memory a single query may use on any one node.
query.max-total-memory-per-node: The maximum user and system memory a single query may use on any one node.
discovery-server.enabled: Presto uses the Discovery service to find all nodes in the cluster. Every Presto instance registers with the Discovery service when it starts. To simplify deployment and avoid running an extra service, the coordinator can run an embedded Discovery service, which shares the HTTP port with Presto.
discovery.uri: The URI of the Discovery service. Set it to the host and port of the coordinator (192.168.11.22:8080 in this cluster). The URI must not end with a slash; pay special attention to this, because otherwise a 404 error is reported.
In addition, the following properties are available:
jmx.rmiregistry.port: The port of the JMX RMI registry. JMX clients connect to this port.
jmx.rmiserver.port: The port of the JMX RMI server, which allows monitoring through JMX.
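The trailing-slash rule for discovery.uri is easy to get wrong when copying a URL from a browser, so it can be worth checking before starting a node. Below is a minimal sketch of such a check. Both function names are hypothetical helpers of my own and are not part of Presto; the sketch only assumes that config.properties uses plain key=value lines as shown above.

```shell
# Hypothetical helper (not part of Presto): print a URI with any trailing
# slashes removed, so it is safe to use as the discovery.uri value.
normalize_discovery_uri() {
  local uri
  uri="$1"
  while :; do
    case "$uri" in
      */) uri="${uri%/}" ;;   # drop one trailing slash and look again
      *) break ;;
    esac
  done
  printf '%s\n' "$uri"
}

# Hypothetical helper: read discovery.uri from a config.properties file and
# return non-zero if it is missing or ends with a slash.
check_discovery_uri() {
  local file
  local value
  file="$1"
  # Commented-out lines ("#discovery.uri=...") do not match this pattern.
  value="$(sed -n 's/^discovery\.uri=//p' "$file" | tail -n 1)"
  if [ -z "$value" ]; then
    echo "discovery.uri is not set in $file" >&2
    return 2
  fi
  case "$value" in
    */)
      echo "discovery.uri must not end with a slash: $value" >&2
      return 1
      ;;
  esac
  echo "discovery.uri looks fine: $value"
}
```

After sourcing the functions, running check_discovery_uri against etc/config.properties on each node would flag the mistake before it shows up as a 404 at runtime.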
2) Configure jvm.config
vim jvm.config
# Add the following content
-server
-Xmx20G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
The JVM configuration file contains the command-line options used to start the Java virtual machine, one option per line. The file is not interpreted by the shell, so options that contain spaces or other special characters (such as the kill -9 %p option above) must not be quoted.
3) Configure log.properties
vim log.properties
# Add the following content
com.facebook.presto=INFO
There are four log levels: DEBUG, INFO, WARN, and ERROR.
4) Configure node.properties
vim node.properties
## Add the following content
node.environment=presto_ocean
node.id=node22
node.data-dir=/data/presto
Parameter description:
node.environment: The environment name. All nodes in a Presto cluster must use the same environment name.
node.id: The unique identifier of the node. It must be different on every node (node22 above is the coordinator's; each worker uses its own name), and it must stay the same across restarts and upgrades of Presto.
node.data-dir: The data directory, where Presto stores logs and other data.
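Since node.properties is the one file that differs on every machine, generating it can save some typing across nine nodes. The following is a minimal sketch; write_node_properties is a hypothetical helper, and the environment name and data directory are simply the values used in this article.

```shell
# Hypothetical helper: write a node.properties file for one node.
# Arguments: <node name> <output directory>
write_node_properties() {
  local node_name
  local out_dir
  node_name="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  # node.environment and node.data-dir are identical on all nodes;
  # only node.id changes.
  cat > "$out_dir/node.properties" <<EOF
node.environment=presto_ocean
node.id=$node_name
node.data-dir=/data/presto
EOF
}

# Example (commented out so that sourcing this file has no side effects):
# for n in node22 node51 node52; do
#   write_node_properties "$n" "./generated/$n"
# done
```

Each generated file would then be copied to the etc directory of the matching node.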
5) Configure catalog and hive.properties
Create a catalog directory. Because this deployment uses Hive, create hive.properties inside that directory and configure the corresponding parameters:
mkdir catalog
vim catalog/hive.properties
# Add the following content
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.11.22:9083
hive.config.resources=/opt/hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml,/opt/hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml
hive.allow-drop-table=true
The configuration files are now complete.
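A wrong path in hive.config.resources is a common copy-and-paste slip, so a quick existence check on every node can help. This is only a sketch: check_hive_resources is a hypothetical helper, and it assumes the property is a comma-separated list of local file paths, as in the example above.

```shell
# Hypothetical helper: verify that every file listed in hive.config.resources
# exists on this machine. Argument: path to hive.properties
check_hive_resources() {
  local props
  local resources
  local old_ifs
  local path
  local missing
  props="$1"
  missing=0
  resources="$(sed -n 's/^hive\.config\.resources=//p' "$props" | tail -n 1)"
  if [ -z "$resources" ]; then
    echo "hive.config.resources is not set in $props" >&2
    return 2
  fi
  # Split the comma-separated list by temporarily changing IFS.
  old_ifs="$IFS"
  IFS=','
  for path in $resources; do
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  IFS="$old_ifs"
  return "$missing"
}
```

The function returns 0 only when all listed files are present, so it can be used directly in an if statement or a deployment script.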
3. Start presto-server and connect
Go to /opt/presto/presto-server-0.221/bin, which contains the launcher script.
If you need to set environment variables such as the Java path, you can also set them in this file. The advantage of setting them here is that Presto can coexist with other JDK versions on the machine without affecting existing services.
1. Start presto-server
./launcher start
If logs are generated under /data/presto/var/log and contain no error messages, the startup succeeded.
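Reading the log by eye works, but with nine nodes a small check is handy. The sketch below is a hypothetical helper, not a Presto tool. It assumes the main log is /data/presto/var/log/server.log, that error lines carry the word ERROR, and that the server writes a SERVER STARTED line once startup has finished; verify both against your own logs before relying on it.

```shell
# Hypothetical helper: report whether a Presto server.log shows a clean start.
# Argument: path to server.log (here: /data/presto/var/log/server.log)
check_presto_log() {
  local log_file
  log_file="$1"
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 2
  fi
  # Simple heuristic: any line containing the word ERROR counts as a failure.
  if grep -qw 'ERROR' "$log_file"; then
    echo "errors found in $log_file:" >&2
    grep -w 'ERROR' "$log_file" | tail -n 5 >&2
    return 1
  fi
  # Assumption: the server logs "SERVER STARTED" when startup is complete.
  if grep -q 'SERVER STARTED' "$log_file"; then
    echo "server started"
    return 0
  fi
  echo "no errors, but no SERVER STARTED line yet" >&2
  return 3
}
```

A return code of 3 usually just means the server is still starting, so it makes sense to wait a few seconds and run the check again.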
2. Presto-cli connection
Create a symbolic link named presto that points to the downloaded presto-cli-0.221-executable.jar (renaming the file works too), then make it executable:
ln -s presto-cli-0.221-executable.jar presto
chmod +x presto
./presto --server localhost:8080 --catalog hive --schema default
You can now browse the databases and tables in Hive.
3. View web interface
Open http://192.168.11.22:8080/ui/ in a browser to view the overall cluster status.
The Presto deployment is now complete. I will cover its performance compared with Hive, along with usage recommendations, in a later post when I get the chance.
Geng Xiaochu has opened a personal WeChat public account; readers who want to discuss this further or read other articles are welcome to follow it.