This tutorials will explain the Data load commands for Hive and various method to load data. Sample data can be downloaded here.
Hive Load Command Attributes: Some of the attributes that can be used in load statement in Hive.
Attribute |
Description |
LOCAL |
This is mandatory if data is being loaded into Hive table from local system and not from HDFS. |
inpath |
This attribute is used to specify the path of file to be loaded into Hive table. |
OVERWRITE |
This attribute is used to truncate the table before loading new file. If this attribute is not specified then load command will append new data to the existing data. |
TABLE |
This attribute is used specify the table where LOAD command will load data. |
PARTITION |
This attribute is used to specify partition column during data load. User cannot load data into Partitioned table without specifying partition column/columns. |
LOAD DATA LOCAL inpath 'full_file_path_and_file_name' OVERWRITE INTO TABLE table_name;
LOAD DATA inpath 'full_file_path_and_file_name' OVERWRITE INTO TABLE table_name;
LOAD DATA local inpath '/Users/username/data_files/order.txt' OVERWRITE INTO TABLE hiveTempTable;
INSERT OVERWRITE TABLE hiveFirstTable SELECT order_id,order_data,cust_id,order_status FROM hiveTempTable;
LOAD DATA local inpath '/Users/username/data_files/order.txt' OVERWRITE INTO TABLE hiveTempTable;
INSERT OVERWRITE TABLE hiveFirstPartitionedClusteredTable PARTITION(order_status) SELECT order_id,order_date,cust_id,order_status FROM hiveTempTable;
LOAD DATA local inpath '/Users/username/data_files/order.txt' OVERWRITE INTO TABLE hiveTempTable;
FROM hiveTempTable INSERT OVERWRITE TABLE hiveFirstPartitionedClusteredTable PARTITION(order_status) SELECT order_id,order_date,cust_id,order_status;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
INSERT OVERWRITE LOCAL DIRECTORY '/Users/username/Desktop/hive' (SELECT * FROM order_json);
INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/user/username/hive' (SELECT * FROM order_json);