I have two string arrays in Hive:
{'value1','value2','value3'}
{'value1','value2'}
I want to merge the arrays without duplicates, so that the result is:
{'value1','value2','value3'}
How can I do this in Hive?
You need a UDF. Klout has a bunch of open-source Hive UDFs under the package
brickhouse, which is hosted on GitHub. Several of those UDFs fit your purpose exactly.
Download and build the project, then add the JAR. Here is an example:
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';
SELECT combine_unique(combine(array('a','b','c'), array('b','c','d'))) FROM reqtable;
OK
["d","b","c","a"]
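The example skips the "add the JAR" step itself. A minimal sketch of how it might look in a Hive session follows. The path and file name are placeholders, not taken from the answer, so substitute whatever JAR your build actually produced:

```sql
-- Hypothetical location and file name; replace with the JAR from your own build.
ADD JAR /path/to/brickhouse.jar;

-- Optional: confirm the JAR is registered for the current session.
LIST JARS;
```

ADD JAR and CREATE TEMPORARY FUNCTION only apply to the current session, so these statements have to be run again in each new session before calling combine or combine_unique.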