Hive – merging output files on Hadoop

I ran a map-only job with 674 mappers, and Hive generated 674 .gz files. I want to merge these into 30-35 files. I tried the hive.merge.mapfiles property but did not get merged output.

Try the Tez execution engine together with hive.merge.tezfiles. You may also want to specify the target size:

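-- use the Tez execution engine
set hive.execution.engine=tez;
-- request a merge step at the end of the Tez job
set hive.merge.tezfiles=true;
-- trigger the merge when the average output file is smaller than ~128 MB
set hive.merge.smallfiles.avgsize=128000000;
-- target size of each merged file, ~128 MB
set hive.merge.size.per.task=128000000;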

If you want to stay on the MR engine, use the following settings instead (I haven't tried this myself):

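-- request a merge step at the end of the map-reduce job
set hive.merge.mapredfiles=true;
-- trigger the merge when the average output file is smaller than ~128 MB
set hive.merge.smallfiles.avgsize=128000000;
-- target size of each merged file, ~128 MB
set hive.merge.size.per.task=128000000;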

The settings above add a file-merge step to the job; each resulting part file should be about 128 MB.
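Since the merge step runs at the end of the job that writes the output, files that already exist are not merged until the data is written again. A minimal sketch of how this might look on Tez, assuming a source table `events_raw` and an already created target table `events_merged` with the same schema (both names are hypothetical, not from the question):

```sql
-- Sketch only: events_raw and events_merged are hypothetical table names.
set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;

-- Rewriting the data lets the merge step run at the end of this job.
INSERT OVERWRITE TABLE events_merged
SELECT * FROM events_raw;
```

To end up near 30-35 files, hive.merge.size.per.task would need to be roughly the total output size divided by the desired file count; 128 MB is only a starting point, and the merged file sizes are approximate.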

Reference:

> Settings description

