Hive Common Statements

1. Hive Introduction

To me, Hive is a data warehouse built on HDFS that provides an SQL-like language (largely the same as standard SQL, with some notable differences). It lets engineers who are not proficient in Java but are familiar with SQL quickly analyze data on HDFS or on other storage systems such as Amazon S3, and it is a very important tool in the Hadoop ecosystem. For big data analysts, HiveQL is a must-have skill.

2. Common Hive Statements

2.1 Beginner methods for creating tables

1. Create the table directly, specifying the delimiters. It is stored as text by default, but you can also specify another storage format.

create table if not exists employees(
  nam STRING,
  salary FLOAT comment "income",
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
row format delimited
fields terminated by ","
collection items terminated by "-"
map keys terminated by "_"
lines terminated by "\n"
stored as textfile;
--comment adds a column comment; collection items (the ARRAY, MAP entries, and STRUCT fields in this statement) are separated by "-", map keys are separated from values by "_", and lines by "\n". The table is stored in text format; you can also choose sequencefile etc., text being the default.
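To see what those delimiters mean concretely, here is a small Python sketch (illustration only; the sample values are made up) of how one row of the employees table above would be laid out on disk:

```python
# Illustration only: how one row of the employees table above would be
# laid out on disk, given the delimiters in the CREATE TABLE statement.
subordinates = "-".join(["Amy", "Bob"])                        # ARRAY items joined by "-"
deductions = "-".join(f"{k}_{v}" for k, v in                   # MAP: key/value joined by "_",
                      {"tax": "0.1", "insurance": "0.05"}.items())  # entries joined by "-"
address = "-".join(["Main St", "Springfield", "IL", "62704"])  # STRUCT fields joined by "-"
line = ",".join(["John", "8000.0", subordinates, deductions, address])  # fields joined by ","
print(line)
```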
--View table structure:
hive> desc employees;
OK
nam             string
salary          float                   income
subordinates    array<string>
deductions      map<string,float>
address         struct<street:string,city:string,state:string,zip:int>
Time taken: 0.153 seconds, Fetched: 5 row(s)

--desc extended employees; or desc formatted employees; displays more detailed table information.

2. Create a table from a query statement

-- Original table

hive> select * from tb151;
OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
5       Bai Weidong     27
--Create a table through a query statement

create table tb151_sample as select * from tb151 tablesample(5 rows);

hive> select * from tb151_sample;
OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28

--Note: tablesample is a table sampling function; 5 rows extracts the first 5 rows. It can also take a percentage (50 percent) or a data size (10M).
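Conceptually, tablesample(5 rows) just takes the first rows of the table; a minimal Python model of the session above:

```python
# Conceptual model only: tablesample(5 rows) takes the first 5 rows.
# The rows below mirror the tb151 output shown above.
rows = [(0, "Ma Wenkai", 29), (1, "Wang Junpeng", 28), (2, "Li Yulin", 28),
        (3, "Chen Penghui", 28), (4, "Li Chunting", 28), (5, "Bai Weidong", 27)]
sample = rows[:5]   # analogous to: select * from tb151 tablesample(5 rows)
print(sample)
```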

3. Create a table based on a known table structure

create table test_like like tb151;

hive> desc tb151;
OK
id      string
name    string
age     int
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive> desc test_like;
OK
id      string
name    string
age     int
Time taken: 0.164 seconds, Fetched: 3 row(s)

2.2 Several ways for beginners to import and export Hive data

1. Load data

--1. Import a local file directly into a table

load data local inpath "/path/to/data" overwrite into table tbname;
--Note: local points at a local directory; without it, the path defaults to HDFS. The overwrite keyword replaces the existing table data.
--2. Import a query result into a table
insert overwrite table tbname partition(partname="ptname")
select * from tbname2 where ...;
--Note: the overwrite keyword overwrites the existing data; use into to append instead.
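For the load data path, the local file's layout must match the table's row format. A hypothetical helper (the two-column table and file name are made up) that writes such a file, with fields terminated by "," and lines by "\n":

```python
# Hypothetical helper: write a local text file whose layout matches a
# simple two-column table (fields terminated by ",", lines by "\n"),
# ready for: load data local inpath '<path>' overwrite into table tbname;
import os
import tempfile

rows = [["0", "Ma Wenkai"], ["1", "Wang Junpeng"]]
path = os.path.join(tempfile.gettempdir(), "t.txt")
with open(path, "w") as f:
    for row in rows:
        f.write(",".join(row) + "\n")   # one table row per line
print(open(path).read())
```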

2. Export data

--1. If the data file is already in the required format, just download it from the HDFS directory

hadoop fs -get /path
--2. Via a query statement
insert overwrite local directory "/path/to/data"
select ... from tbname where ...;
--3. Via the shell command line
/path/to/hive -e "select ... from tbname where ...." >> /path/to/data.txt

2.3 About Join

General rules for joins:

1. Order the tables from smallest to largest, left to right.

2. Or explicitly mark the streaming (large) table:

select /*+STREAMTABLE(t)*/ t.a, t1.b from t join t1 where ...;

--Note: t is the large table

3. When a small table joins a large table, mapjoin can put the small table in memory:

select /*+MAPJOIN(d)*/ s.ymd, s.tmd from s join d on s.ymd=d.ymd;

--Note: d is the small table
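What MAPJOIN does can be sketched in Python: the small table is loaded into an in-memory hash map, and the large table is streamed against it with no shuffle. The table contents below are illustrative, not from a real cluster:

```python
# Conceptual sketch of MAPJOIN: load the small table into an in-memory
# hash map, then stream the large table against it row by row.
small = {"2019-02-15": 1.0, "2019-02-16": 1.1}   # d: ymd -> some value
large = [("2019-02-15", 10.0), ("2019-02-16", 20.0),
         ("2019-02-17", 30.0)]                   # s: (ymd, tmd) rows

# One hash lookup per large-table row; unmatched rows are dropped (inner join).
result = [(ymd, tmd, small[ymd]) for ymd, tmd in large if ymd in small]
print(result)
```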

3. Hive Set Union, Intersection, and Difference

First look at the original table:

hive> select * from tb151;

OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
5       Bai Weidong     27

1. Union with deduplication can be done with union

hive> select age from tb151 union
    > select age from tb151;
Total MapReduce CPU Time Spent: 3 seconds 520 msec
OK
27
28
29
Time taken: 36.328 seconds, Fetched: 3 row(s)

2. Hive's union all merges two sets without deduplication

hive> select age from tb151 union all
    > select age from tb151;
Total MapReduce CPU Time Spent: 1 seconds 290 msec
OK
29
29
28
28
28
28
28
28
28
28
27
27
Time taken: 23.941 seconds, Fetched: 12 row(s)
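The two queries above can be modeled on the age column of tb151: union all keeps duplicates, union removes them.

```python
# union all keeps duplicates; union deduplicates (and Hive sorts the result).
ages = [29, 28, 28, 28, 28, 27]   # the age column of tb151
union_all = ages + ages           # like UNION ALL: 12 rows
union = sorted(set(ages + ages))  # like UNION: 3 distinct values
print(len(union_all), union)
```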

3. Intersection: left join A with B first; the left-column values whose right column is not NULL, deduplicated, form the intersection.

4. Difference (left table A minus right table B): left join A with B first; the left-column values whose right column is NULL, deduplicated, form the difference.

Note: for 3 and 4 you can first store the join result as a table, then select and deduplicate as required. Just look at the join result:
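The left-join trick in 3 and 4 can be modeled roughly in Python (illustration only; each "table" is just a list of ages, and the model is simplified to one output row per left row, whereas a real left join emits one row per matching pair):

```python
# Rough model of the left-join trick: intersection = left values whose
# right column is not NULL; difference = left values whose right column
# is NULL; both deduplicated.
a = [29, 28, 28, 28, 28, 27]   # left table tb151 (ages)
b = [29, 28, 28, 28, 28]       # right table tb151_sample (ages)

# Simplified left join on equality: unmatched left rows get None on the right.
joined = [(x, x) if x in b else (x, None) for x in a]

intersection = sorted({l for l, r in joined if r is not None})
difference = sorted({l for l, r in joined if r is None})
print(intersection, difference)
```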

hive> select * from tb151;

OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
5       Bai Weidong     27
Time taken: 0.141 seconds, Fetched: 6 row(s)
hive> select * from tb151_sample;
OK
0       Ma Wenkai       29
1       Wang Junpeng    28
2       Li Yulin        28
3       Chen Penghui    28
4       Li Chunting     28
hive> select t.age,t1.age from tb151 t left join tb151_sample t1
    > on t.age=t1.age;
Total MapReduce CPU Time Spent: 1 seconds 550 msec
OK
29      29
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
28      28
27      NULL
--Note: the result makes both sets obvious: the intersection is the left-column values whose right column is not NULL, deduplicated; the difference (left table minus right table) is the left-column values whose right column is NULL, deduplicated.

4. Ways to Execute Hive Statements

--1. Execute a single statement

hive -e "select * from tbname"
--2. Execute an SQL file
hive -f /path/to/sqlfile
--3. On the hive command line you can execute Hadoop commands via dfs
hive> dfs -ls /;
Found 3 items
drwxr-xr-x   - root supergroup          0 2019-02-15 19:18 /test
drwx-wx-wx   - root supergroup          0 2019-02-12 18:23 /tmp
drwxr-xr-x   - root supergroup          0 2019-01-27 09:19 /user
--4. On the hive command line you can also run simple shell commands with ! followed by the command
hive> ! ls /wjp;
data
javacode
pycode
readDoc.py
sparkcode
udf

