Hadoop Hive
1) 2007: Hive was created.
2) 2014: Hive 0.13.0 became very popular (the first relatively stable release).
3) 2015: Hive 1.2.0 (a comparatively minor upgrade).
4) 2016: Hive 2.1.0 (many features updated).
1.1 Hive metadata management
1) Hive models and processes data through metadata, presenting it in table form so that it can be managed as a data warehouse.
2) Hive presents a structure similar to a relational database, but it is not a storage engine: the metadata is kept in a relational database and is not stored on Hadoop.
3) MySQL provides the storage for Hive's metadata.
1.2 Hive’s architecture
Hive processes large data sets and is suited to petabyte-scale (PB) data.
Metastore: a relational database that holds the metadata
Driver: the core driver
Server: accepts JDBC connections
1.3 Commonly used commands:
1) Beeline mode
2) Command line mode
formatted
1.4 Data types
string (general purpose), decimal (numeric calculation)
Complex data types
array: ['apple', 'orange', 'mango']
map (mapping): ['a':'apple', 'o':'orange'], a set of key-value pairs
For example, a map can record how many points each skill has reached.
struct: can hold multiple fields (columns)
1.5 Metadata table structure
database: a subfolder; table: a subfolder inside the database folder
partition: corresponds to a folder, used to divide the data for analysis
buckets: an optimization for query joins. Bucketing determines how the data is distributed; each bucket is one part of the table's data and corresponds to one data file (the data is split across files)
row: corresponds to one line of a data file (the file viewed horizontally)
views: a stored query that presents the data; no data is stored
index: corresponds to folders and files
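To make the bucket idea concrete, here is a minimal Python sketch of how rows could be spread over a fixed number of bucket files. It assumes an integer bucketing column and a "hash, then modulo the bucket count" rule; the function name bucket_for and the sample rows are hypothetical, and the real hashing in Hive depends on the column type and Hive version.

```python
def bucket_for(key: int, num_buckets: int) -> int:
    """Return the bucket index for an integer key (simplified model)."""
    # Mask to a non-negative 31-bit value, then take the remainder.
    return (key & 0x7FFFFFFF) % num_buckets


# Hypothetical rows: (user_id, fruit). user_id is the bucketing column.
rows = [(1, "apple"), (2, "orange"), (5, "mango"), (8, "pear")]

# Each bucket index stands for one data file of the table.
buckets = {}
for user_id, fruit in rows:
    buckets.setdefault(bucket_for(user_id, 4), []).append((user_id, fruit))

for index in sorted(buckets):
    print(index, buckets[index])
```

Rows with the same key always land in the same bucket, which is what lets a join on the bucketing column compare matching buckets instead of whole tables.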
1.6 Hive database
Create a database: create database database_name
Switch database: use database_name
Default path: /user/hive/warehouse
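The folder layout under the default path can be sketched in Python as follows. This is an illustration only: it assumes the usual convention that the default database sits at the warehouse root, other databases get a folder named database_name.db, and each partition is a folder named column=value. The function table_path and the names sales, orders and dt are hypothetical.

```python
from posixpath import join

WAREHOUSE = "/user/hive/warehouse"


def table_path(database, table, partition=None):
    """Build the HDFS folder for a table, optionally down to one partition."""
    # The 'default' database uses the warehouse root; others use '<name>.db'.
    base = WAREHOUSE if database == "default" else join(WAREHOUSE, database + ".db")
    path = join(base, table)
    # Each partition column adds one 'column=value' subfolder.
    for column, value in (partition or {}).items():
        path = join(path, f"{column}={value}")
    return path


print(table_path("default", "orders"))
print(table_path("sales", "orders", {"dt": "2016-06-01"}))
```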
1.7 Hive tables (external and internal tables)
1) External tables: created with the external keyword, adding location 'address' to the create statement. Deleting the table does not delete the data.
2) Internal tables (managed tables): the data is completely managed by Hive. Deleting the table (metadata) also deletes the data.
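The difference between the two table kinds when a table is deleted can be modelled with a small Python toy. This is not how Hive is implemented; the dictionaries metastore and storage, the table names and the paths are all hypothetical stand-ins for the metadata database and the files on HDFS.

```python
# Toy stand-ins: table metadata, and the set of data folders that exist.
metastore = {
    "logs_ext": {"external": True, "location": "/data/logs"},
    "logs_managed": {"external": False, "location": "/user/hive/warehouse/logs_managed"},
}
storage = {"/data/logs", "/user/hive/warehouse/logs_managed"}


def drop_table(name):
    """Remove a table: metadata always goes, data only for managed tables."""
    table = metastore.pop(name)
    if not table["external"]:
        storage.discard(table["location"])


drop_table("logs_ext")
drop_table("logs_managed")
print(sorted(storage))  # only the external table's data folder remains
```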
Hive table creation statement
Required: row format delimited (delimited means the row is split by delimiters)
Columns are separated by |: fields terminated by '|'
Array elements are separated by a comma: collection items terminated by ','
Map keys and values are separated by a colon: map keys terminated by ':'
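To show what these three delimiters mean for one line of a data file, here is a small Python sketch that splits a row the same way. The row layout (a name, an array of fruits, a map of skill scores) and the function parse_row are hypothetical examples, not part of Hive.

```python
def parse_row(line):
    """Split one text row into (string, array, map) using the delimiters above."""
    # fields terminated by '|': three columns
    name, fruits, scores = line.rstrip("\n").split("|")
    # collection items terminated by ',': array elements
    fruit_list = fruits.split(",")
    # map entries are also separated by ',', and each key from its value by ':'
    score_map = {}
    for item in scores.split(","):
        key, value = item.split(":", 1)
        score_map[key] = int(value)
    return name, fruit_list, score_map


row = parse_row("tom|apple,orange,mango|sql:80,java:75")
print(row)
```

Note that the entries of a map use the same delimiter as array elements; only the key and value inside each entry use the map keys delimiter.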