Arrays – Array intersects Hive

I have two string arrays in Hive:

{'value1','value2','value3'}
{'value1','value2'}

I want to merge the two arrays without duplicates, so that the result is:

{'value1','value2','value3'}

How can I do this in Hive?

You need a UDF. Klout has open-sourced a collection of Hive UDFs under the package
brickhouse, available on GitHub, and several of them fit this purpose.
Download the project, build it, and add the resulting JAR to your Hive session. Here is an example:

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
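If adding a JAR is not an option, the same merge can probably be done with Hive built-ins only. The following is a minimal sketch, not taken from brickhouse: it explodes both arrays into rows, stacks them with UNION ALL, and lets collect_set drop the duplicates. The aliases item, merged and exploded are made up for illustration, and the FROM-less inner SELECTs assume Hive 0.13 or later.

```sql
-- Sketch using only built-in functions (explode, collect_set).
-- The literal arrays stand in for your own array values.
SELECT collect_set(item) AS merged
FROM (
  -- one row per element of the first array
  SELECT explode(array('a', 'b', 'c')) AS item
  UNION ALL
  -- one row per element of the second array
  SELECT explode(array('b', 'c', 'd')) AS item
) exploded;
```

As with combine_unique, the order of elements in the resulting array is not guaranteed. For arrays stored in table columns you would likely use LATERAL VIEW explode(...) instead of the literal arrays.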

