Hadoop – How to extract the first tuple from a bag generated in Pig (whose size may vary)?

I’m generating a bag whose size (the number of tuples it contains) may vary. I want to dynamically extract the first element from it. How should I do this?

According to the docs, a bag is a collection of tuples, and:

Bag dereferencing can be done by name (bag.field_name) or position (bag.$0). If a set of fields are dereferenced (bag.(name1, name2) or bag.($0, $1)), the expression represents a bag composed of the specified fields.

But be careful: b.$0 does not give you the first tuple, because a bag is not ordered! It gives you the first field of the tuples that make up the bag.
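
To make the projection behaviour concrete, here is a minimal sketch. The file name `data.txt` and the schema `(k, v)` are assumptions made up for illustration, not something from the question.

```pig
-- Hypothetical input: file name and schema are assumptions.
A = LOAD 'data.txt' AS (k:chararray, v:int);

-- After grouping, each row is (group, A) where A is a bag of (k, v) tuples.
B = GROUP A BY k;

-- A.$0 does NOT pick the first tuple of the bag.
-- It projects field 0 of every tuple, giving a bag of one-field tuples.
C = FOREACH B GENERATE group, A.$0;

DUMP C;
```

Each row of `C` still holds a bag with as many tuples as the original group, only narrower.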

You need to convert the bag to an ordered structure or, better, use a UDF. You should also unaccept this answer (so I can delete it) and accept Guarev's, which has a UDF link.
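
One way to get an ordered structure without a UDF is a nested FOREACH that sorts the bag and keeps a single tuple. This is only a sketch: the file name, the schema, and the choice of `ts` as the sort key are assumptions, since the question does not say what "first" should mean.

```pig
-- Hypothetical input: file name, schema and sort key are assumptions.
A = LOAD 'events.txt' AS (user:chararray, ts:long);
B = GROUP A BY user;

C = FOREACH B {
    -- Impose an explicit order on the bag, then keep one tuple.
    sorted = ORDER A BY ts ASC;
    first  = LIMIT sorted 1;
    -- FLATTEN turns the one-tuple bag into plain fields.
    GENERATE group, FLATTEN(first);
};

DUMP C;
```

This works whatever the size of the bag, because `LIMIT` is applied per group after sorting. If any tuple will do and no ordering matters, the `ORDER` step can be dropped.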

