Hive Common Statements

1. Hive Introduction

To me, Hive is a data warehouse built on HDFS that provides an SQL-like language (largely the same as standard SQL, with some notable differences). It lets engineers who are not proficient in Java but are familiar with SQL quickly analyze data on HDFS or on other storage systems such as Amazon S3, and it is a very important tool in the Hadoop ecosystem. For big data analysts, HiveQL is a must-have skill.

2. Common Hive Statements

2.1 Beginner methods for creating tables

1. Create the table directly, specifying the delimiters. It is stored as text by default, but you can also specify another storage format.

create table if not exists employees(
  nam STRING,
  salary FLOAT comment "income",
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
row format delimited
fields terminated by ","
collection items terminated by "-"
map keys terminated by "_"
lines terminated by "\n"
stored as textfile;
--comment adds a column comment; collection items (the ARRAY, MAP entries, and STRUCT fields in this statement) are separated by "-", map keys are separated from values by "_", and lines by "\n". The table is stored in text format; you can also choose sequencefile etc., text being the default.
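To see what those delimiters mean concretely, here is a small Python sketch (illustration only; the sample values are made up) of how one row of the employees table above would be laid out on disk:

```python
# Illustration only: how one row of the employees table above would be
# laid out on disk, given the delimiters in the CREATE TABLE statement.
subordinates = "-".join(["Amy", "Bob"])                        # ARRAY items joined by "-"
deductions = "-".join(f"{k}_{v}" for k, v in                   # MAP: key/value joined by "_",
                      {"tax": "0.1", "insurance": "0.05"}.items())  # entries joined by "-"
address = "-".join(["Main St", "Springfield", "IL", "62704"])  # STRUCT fields joined by "-"
line = ",".join(["John", "8000.0", subordinates, deductions, address])  # fields joined by ","
print(line)
```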
--View table structure:
hive> desc employees;
OK
nam             string
salary          float                   income
subordinates    array<string>
deductions      map<string,float>
address         struct<street:string,city:string,state:string,zip:int>
Time taken: 0.153 seconds, Fetched: 5 row(s)

--desc extended employees; or desc formatted employees; displays more detailed table information.

2. Create a table from a query statement

-- Original table

hive> select * from tb151;
OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
5       Bai Weidong     27
--Create a table through a query statement

create table tb151_sample as select * from tb151 tablesample(5 rows);

hive> select * from tb151_sample;
OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28

--Note: tablesample is a table sampling function; 5 rows extracts the first 5 rows. It can also take a percentage (50 percent) or a data size (10M).
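Conceptually, tablesample(5 rows) just takes the first rows of the table; a minimal Python model of the session above:

```python
# Conceptual model only: tablesample(5 rows) takes the first 5 rows.
# The rows below mirror the tb151 output shown above.
rows = [(0, "Ma Wenkai", 29), (1, "Wang Junpeng", 28), (2, "Li Yulin", 28),
        (3, "Chen Penghui", 28), (4, "Li Chunting", 28), (5, "Bai Weidong", 27)]
sample = rows[:5]   # analogous to: select * from tb151 tablesample(5 rows)
print(sample)
```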

3. Create a table based on a known table structure

create table test_like like tb151;

hive> desc tb151;
OK
id      string
name    string
age     int
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive> desc test_like;
OK
id      string
name    string
age     int
Time taken: 0.164 seconds, Fetched: 3 row(s)

2.2 Several ways for beginners to import and export Hive data

1. Load data

--1. Import a local file directly into a table

load data local inpath "/path/to/data" overwrite into table tbname;
--Note: local points at a local directory; without it, the path defaults to HDFS. The overwrite keyword replaces the existing table data.
--2. Import a query result into a table
insert overwrite table tbname partition(partname="ptname")
select * from tbname2 where ...;
--Note: the overwrite keyword overwrites the existing data; use into to append instead.
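For the load data path, the local file's layout must match the table's row format. A hypothetical helper (the two-column table and file name are made up) that writes such a file, with fields terminated by "," and lines by "\n":

```python
# Hypothetical helper: write a local text file whose layout matches a
# simple two-column table (fields terminated by ",", lines by "\n"),
# ready for: load data local inpath '<path>' overwrite into table tbname;
import os
import tempfile

rows = [["0", "Ma Wenkai"], ["1", "Wang Junpeng"]]
path = os.path.join(tempfile.gettempdir(), "t.txt")
with open(path, "w") as f:
    for row in rows:
        f.write(",".join(row) + "\n")   # one table row per line
print(open(path).read())
```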

2. Export data

--1. If the data file is already in the required format, just download it from the HDFS directory

hadoop fs -get /path
--2. Via a query statement
insert overwrite local directory "/path/to/data"
select ... from tbname where ...;
--3. Via the shell command line
/path/to/hive -e "select ... from tbname where ...." >> /path/to/data.txt

2.3 About Join

General rules for joins:

1. Order the tables from smallest to largest, left to right.

2. Or explicitly mark the streaming (large) table:

select /*+STREAMTABLE(t)*/ t.a, t1.b from t join t1 where ...;

--Note: t is the large table

3. When a small table joins a large table, mapjoin can put the small table in memory:

select /*+MAPJOIN(d)*/ s.ymd, s.tmd from s join d on s.ymd=d.ymd;

--Note: d is the small table
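What MAPJOIN does can be sketched in Python: the small table is loaded into an in-memory hash map, and the large table is streamed against it with no shuffle. The table contents below are illustrative, not from a real cluster:

```python
# Conceptual sketch of MAPJOIN: load the small table into an in-memory
# hash map, then stream the large table against it row by row.
small = {"2019-02-15": 1.0, "2019-02-16": 1.1}   # d: ymd -> some value
large = [("2019-02-15", 10.0), ("2019-02-16", 20.0),
         ("2019-02-17", 30.0)]                   # s: (ymd, tmd) rows

# One hash lookup per large-table row; unmatched rows are dropped (inner join).
result = [(ymd, tmd, small[ymd]) for ymd, tmd in large if ymd in small]
print(result)
```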

3. Hive Set Union, Intersection, and Difference

First look at the original table:

hive> select * from tb151;

OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
5       Bai Weidong     27

1. Union with deduplication can be done with union

hive> select age from tb151 union
    > select age from tb151;
Total MapReduce CPU Time Spent: 3 seconds 520 msec
OK
27
28
29
Time taken: 36.328 seconds, Fetched: 3 row(s)

2. Hive's union all merges two sets without deduplication

hive> select age from tb151 union all
    > select age from tb151;
Total MapReduce CPU Time Spent: 1 seconds 290 msec
OK
29
29
28
28
28
28
28
28
28
28
27
27
Time taken: 23.941 seconds, Fetched: 12 row(s)
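The two queries above can be modeled on the age column of tb151: union all keeps duplicates, union removes them.

```python
# union all keeps duplicates; union deduplicates (and Hive sorts the result).
ages = [29, 28, 28, 28, 28, 27]   # the age column of tb151
union_all = ages + ages           # like UNION ALL: 12 rows
union = sorted(set(ages + ages))  # like UNION: 3 distinct values
print(len(union_all), union)
```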

3. Intersection: left join A with B first; the left-column values whose right column is not NULL, deduplicated, form the intersection.

4. Difference (left table A minus right table B): left join A with B first; the left-column values whose right column is NULL, deduplicated, form the difference.

Note: for 3 and 4 you can first store the join result as a table, then select and deduplicate as required. Just look at the join result:
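The left-join trick in 3 and 4 can be modeled roughly in Python (illustration only; each "table" is just a list of ages, and the model is simplified to one output row per left row, whereas a real left join emits one row per matching pair):

```python
# Rough model of the left-join trick: intersection = left values whose
# right column is not NULL; difference = left values whose right column
# is NULL; both deduplicated.
a = [29, 28, 28, 28, 28, 27]   # left table tb151 (ages)
b = [29, 28, 28, 28, 28]       # right table tb151_sample (ages)

# Simplified left join on equality: unmatched left rows get None on the right.
joined = [(x, x) if x in b else (x, None) for x in a]

intersection = sorted({l for l, r in joined if r is not None})
difference = sorted({l for l, r in joined if r is None})
print(intersection, difference)
```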

hive> select * from tb151;

OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
5       Bai Weidong     27
Time taken: 0.141 seconds, Fetched: 6 row(s)
hive> select * from tb151_sample;
OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
hive> select t.age,t1.age from tb151 t left join tb151_sample t1
    > on t.age=t1.age;
Total MapReduce CPU Time Spent: 1 seconds 550 msec
OK
29      29
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
27      NULL
--Note: the result makes both sets obvious: the intersection is the left-column values whose right column is not NULL, deduplicated; the difference (left table minus right table) is the left-column values whose right column is NULL, deduplicated.

4. Ways to Execute Hive Statements

--1. Execute a single statement

hive -e "select * from tbname"
--2. Execute an SQL file
hive -f /path/to/sqlfile
--3. On the hive command line you can execute Hadoop commands via dfs
hive> dfs -ls /;
Found 3 items
drwxr-xr-x   - root supergroup          0 2019-02-15 19:18 /test
drwx-wx-wx   - root supergroup          0 2019-02-12 18:23 /tmp
drwxr-xr-x   - root supergroup          0 2019-01-27 09:19 /user
--4. On the hive command line you can also run simple shell commands with ! followed by the command
hive> ! ls /wjp;
data
javacode
pycode
readDoc.py
sparkcode
udf

