Delimiter problem with Sqoop import and export

The following Sqoop 1 command was used for the import:

sqoop import --connect jdbc:oracle:thin:@ip:port/ORCL --username user --password pwd --table db.table --target-dir /path --delete-target-dir -m 1 --null-string "\N" --null-non-string "\N" --as-textfile --fields-terminated-by "\t" --hive-drop-import-delims

The following command was used for the export:

sqoop export --connect jdbc:mysql://ip:3306/db?characterEncoding=utf8 --username user --password pwd --table table --export-dir /path* --update-mode allowinsert --update-key id --input-null-string "\N" --input-null-non-string "\N" --fields-terminated-by "\t"

With these settings the export often fails with errors.

Investigation showed that the field separator was set to "\t" during import, and some column values themselves contained a tab character, so rows were split into the wrong number of fields during export. The --hive-drop-import-delims option does not help here: it appears to strip only Hive's default delimiter characters (\n, \r, and \01) from string fields, not a custom separator such as the tab. It is therefore recommended to import with Hive's default field delimiter ("\001") together with --hive-drop-import-delims, which avoids the problem on export.
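The splitting problem can be reproduced without Sqoop. The following is a minimal sketch, assuming a POSIX shell with awk; the row contents and variable names are made up for illustration. It builds one three-column row whose middle value contains a tab, then counts the fields a delimiter-based parser would see under each separator.

```shell
# Hypothetical three-column row; the middle value itself contains a tab.
tab=$(printf '\t')
soh=$(printf '\001')   # Ctrl+A, Hive's default field delimiter
value="foo${tab}bar"

row_tab="1${tab}${value}${tab}2020-01-01"
row_soh="1${soh}${value}${soh}2020-01-01"

# Count the fields a delimiter-based parser would see in each row.
tab_fields=$(printf '%s\n' "$row_tab" | awk -F "$tab" '{ print NF }')
soh_fields=$(printf '%s\n' "$row_soh" | awk -F "$soh" '{ print NF }')

echo "tab-delimited fields: $tab_fields"
echo "ctrl-A-delimited fields: $soh_fields"
```

With the tab separator the row splits into 4 fields instead of 3, which is the kind of mismatch that breaks the export. With Ctrl+A the embedded tab stays inside its column and the count is 3.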

As follows:

Import:

sqoop import --connect jdbc:oracle:thin:@ip:port/ORCL --username user --password pwd --table db.table --target-dir /path --delete-target-dir -m 1 --null-string "\N" --null-non-string "\N" --as-textfile --fields-terminated-by "\001" --hive-drop-import-delims

Export:

sqoop export --connect jdbc:mysql://ip:3306/db?characterEncoding=utf8 --username user --password pwd --table table --export-dir /path* --update-mode allowinsert --update-key id --input-null-string "\N" --input-null-non-string "\N" --fields-terminated-by "\001"

NOTE:

Separator      Description
\n             For a text file, each line is a record, so the newline character separates records.
^A (Ctrl+A)    Separates fields (columns). In a CREATE TABLE statement it can be written as the octal code \001.
^B (Ctrl+B)    Separates the elements of an ARRAY or STRUCT, or the key-value pairs of a MAP. In a CREATE TABLE statement it can be written as the octal code \002.
^C (Ctrl+C)    Separates the key from the value within a MAP entry. In a CREATE TABLE statement it can be written as the octal code \003.
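Because ^A is a non-printing character, files that use it look as if they had no separators at all when printed to a terminal. A small sketch, assuming a POSIX shell with tr (the sample line is made up), that swaps the byte for a visible character for inspection only:

```shell
# Build a hypothetical Ctrl+A-delimited line.
soh=$(printf '\001')
line="1${soh}alice${soh}2020-01-01"

# Replace the invisible \001 byte with a pipe so the columns can be seen.
visible=$(printf '%s\n' "$line" | tr '\001' '|')
echo "$visible"
```

This prints 1|alice|2020-01-01. Where it is available, cat -v shows the same byte as ^A without altering the data.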

Hive does not define a special data format. The format is specified by the user, who must define three attributes: the column separator (usually a space, "\t", or "\001"), the row separator ("\n"), and the method of reading the file data. Because of this, loading data does not require converting from the user's format to a Hive-defined format. Hive does not modify the data during loading; it simply copies or moves the data into the corresponding HDFS directory.

Recommendation

It is better to use "\001" as the column separator. The "\t" character appears easily in text data, which leads to errors when exporting.
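Before running an export, it can help to check that every row splits into the expected number of columns. The following is a rough sketch, assuming a POSIX shell with awk and mktemp; count_bad_rows is a hypothetical helper written for this post, not part of Sqoop, and the sample rows are invented.

```shell
# count_bad_rows FILE DELIM EXPECTED
# Prints the number of lines in FILE that do not split into EXPECTED fields.
count_bad_rows() {
  awk -F "$2" -v expected="$3" 'NF != expected { bad++ } END { print bad + 0 }' "$1"
}

soh=$(printf '\001')
sample=$(mktemp)
{
  # One well-formed row and one row with an extra field.
  printf '%s\n' "1${soh}alice${soh}2020-01-01"
  printf '%s\n' "2${soh}bob${soh}extra${soh}2020-01-02"
} > "$sample"

bad=$(count_bad_rows "$sample" "$soh" 3)
echo "rows with unexpected field count: $bad"
rm -f "$sample"
```

Here the count is 1, for the second row. Against real data the same helper could probably read from a pipe by passing - as the file name, for example hdfs dfs -cat /path/* | count_bad_rows - "$soh" 3, with the column count adjusted to the target table.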
