Arrays – Array intersects Hive

I have two string arrays in Hive

{'value1','value2','value3'}
{'value1','value2'}

I want to merge arrays without duplicates, the result:

{'value1','value2','value3'}

How can I do this in Hive?

You need a UDF. Klout has released a collection of open source Hive UDFs under the package
brickhouse, available on GitHub. Several of them fit this purpose exactly.
Download the project, build it, and add the resulting JAR to your Hive session. Here is an example:

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
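
If adding an external JAR is not an option, a similar result can probably be reached with Hive built-ins alone: explode both arrays into rows, then re-aggregate with collect_set, which drops duplicates. This is a different technique from the brickhouse functions above, and only a minimal sketch. It assumes that reqtable has a key column id and two array<string> columns arr1 and arr2; those column names are hypothetical and do not come from the question.

```sql
-- Sketch only: id, arr1 and arr2 are assumed column names on reqtable.
-- Step 1: turn each array into one row per element (LATERAL VIEW explode).
-- Step 2: stack the rows from both arrays (UNION ALL).
-- Step 3: rebuild one array per id, with duplicates removed (collect_set).
SELECT t.id,
       collect_set(t.v) AS merged
FROM (
  SELECT id, v
  FROM reqtable LATERAL VIEW explode(arr1) e1 AS v
  UNION ALL
  SELECT id, v
  FROM reqtable LATERAL VIEW explode(arr2) e2 AS v
) t
GROUP BY t.id;
```

The order of elements in the merged array is not guaranteed. A row whose two arrays are both empty or NULL produces no exploded rows, so it would not appear in the result.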
