Hadoop – How to extract the first tuple from a bag generated in Pig (whose size may vary)?

I'm generating a bag whose size (the number of tuples in the bag) may vary. From it, I want to dynamically extract the first element. How should I do this?
According to the docs, a bag is a collection of tuples, and:

Bag dereferencing can be done by name (bag.field_name) or position (bag.$0). If a set of fields are dereferenced (bag.(name1, name2) or bag.($0, $1)), the expression represents a bag composed of the specified fields.

But be careful: b.$0 does not give you the first tuple, because a bag is not ordered! It gives you the first field of each tuple in the bag.

You need to convert the bag to an ordered structure or, better, use a UDF. You should also unaccept this answer (so I can delete it) and accept Guarev's instead, which includes a link to a UDF.
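
One way to get an ordered structure without writing a UDF is a nested FOREACH that sorts each bag and keeps a single tuple. This is only a minimal sketch: the file name, the relation names, and the fields key, ts and value are all hypothetical (none of them come from the question), and it assumes "first" means the tuple with the smallest ts.

```pig
-- Hypothetical input: file name and schema are assumptions for illustration.
data = LOAD 'input.tsv' AS (key:chararray, ts:long, value:chararray);

-- GROUP yields one bag of tuples per key; the bag itself has no order.
grouped = GROUP data BY key;

-- Inside the nested block, sort the bag and keep only one tuple.
first_per_key = FOREACH grouped {
    sorted = ORDER data BY ts ASC;  -- impose the order that defines "first"
    top = LIMIT sorted 1;           -- a bag holding at most one tuple
    GENERATE FLATTEN(top);          -- unwrap it into (key, ts, value)
};

DUMP first_per_key;
```

This works whatever the size of each bag, since LIMIT just keeps one tuple from however many there are. If any tuple will do, the ORDER line can be dropped, but then which tuple comes back is arbitrary.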
