(1) Basic introduction to hive
Hive is a data warehouse tool based on Hadoop, which can map structured data files to a database table and provide SQL-like query functions
Other knowledge:
DML (data manipulation language) data manipulation language
Commonly used select, update, insert, delete; mainly used to perform some operations on database data
DDL( data definition language) database definition language
commonly used create, alter, drop; mainly used to define or change the structure of the table, data types, links and constraints between tables and other initialization work
DCL (data control language)数据库控制语言
用来设置或更改数据库用户或角色权限的语句
######################## ###################
############hive common commands############# #####
1. Create a table
1.1 Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[( col_name data_type [COMMENT col_comment], …)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], …)]
[CLUSTERED BY (col_name, col_name, …)
[SORTED BY (col_name [ASC|DESC], …)] INTO num_buckets BUCKETS]
[ ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
1.2 Parameter description:
① The EXTERNAL keyword allows users Create an external table, specify a path to the actual data (LOCATION) while creating the table, when Hive creates the internal table , The data will be moved to the path pointed to by the data warehouse; if an external table is created, only the path where the data is located will be recorded, and no changes will be made to the location of the data. When deleting a table, the metadata and data of the internal table will be deleted together, while the external table only deletes the metadata, not the data
②ROW FORMAT users can customize the SerDe or use the built-in SerDe when creating the table . If ROW FORMAT or ROW FORMAT DELIMITED is not specified, the built-in SerDe will be used. When creating the table, the user also needs to specify the columns for the table. The user will also specify a custom SerDe when specifying the columns of the table. Hive determines the data of the specific columns of the table through SerDe
③STORED AS specifies the user source file The storage formats are commonly used as follows: TEXTFILE, the default format, this format is not specified when the table is built; SEQUENCEFILE, a binary file provided by the Hadoop API, is easy to use, can be divided, and can be compressed; RCFILE, one种行列存储相结合的存储方式; ORC,hive给出的新格式,属于RCFILE的升级版,性能有大幅度提升.
1.3样例:
CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_user
(
userid STRING COMMENT’user id’
,username STRING COMMENT’user name’
,sex STRING COMMENT’user gender’
,address STRING COMMENT’home address’
)
COMMENT’user table’
PARTITIONED BY (pt_d VARCHAR(8) COMMENT’day partition’, pt_h VARCHAR(2) COMMENT’hour partition’)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY’