I encountered a piece of code in Apache Hive, regexp_extract(input, '[0-9]*', 0). Can someone explain to me what this code does? Thank you.
According to the Hive manual (DDL), it returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns bar.
The index parameter selects the capturing group. It is an integer that can take the following values:
> 0: the entire match; in my example, foothebar
> 1: the first group; in my example, the
> 2: the second group; in my example, bar
> n: the nth group. If n is greater than the number of groups defined in the regular expression, the Hive query fails.
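To make the index values concrete, here is a rough Python emulation of regexp_extract. It is only a sketch: Hive evaluates the pattern with Java's regex engine, while this uses Python's re module, and the helper name regexp_extract_sketch is made up for illustration. The empty-string result for a non-matching input is an assumption about Hive's behavior. For a simple pattern like the one in the example, the two engines should agree.

```python
import re


def regexp_extract_sketch(subject, pattern, index):
    """Approximate Hive's regexp_extract with Python's re module (illustrative only)."""
    match = re.search(pattern, subject)  # first match anywhere in the string
    if match is None:
        return ""  # assumed to mirror Hive: no match gives an empty string
    # Raises IndexError when index exceeds the number of groups,
    # which stands in for the failing Hive query.
    return match.group(index)


print(regexp_extract_sketch("foothebar", "foo(.*?)(bar)", 0))  # foothebar
print(regexp_extract_sketch("foothebar", "foo(.*?)(bar)", 1))  # the
print(regexp_extract_sketch("foothebar", "foo(.*?)(bar)", 2))  # bar
```

Asking for index 3 here raises an error, because the pattern defines only two groups.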
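In your example, regexp_extract(input, '[0-9]*', 0), you are asking for the entire match (index 0) against the column named input. The pattern [0-9]* matches a run of zero or more digits, so the result is the digits at the start of the value, or an empty string if the value does not start with a digit.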
Here are some examples:
> regexp_extract('9eleven', '[0-9]*', 0) -> returns 9
> regexp_extract('9eleven', '[0-9]*', 1) -> query fails
> regexp_extract('911test', '[0-9]*', 0) -> returns 911
> regexp_extract('911test', '[0-9]*', 1) -> query fails
> regexp_extract('eleven', '[0-9]*', 0) -> returns an empty string
> regexp_extract('test911', '[0-9]*', 0) -> returns an empty string
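The empty results for eleven and test911 can look surprising. The Python sketch below shows where the first match of [0-9]* starts and what it contains. It uses Python's re module as a stand-in for Hive's Java regex engine, and the helper first_match is hypothetical. Because * also allows zero digits, the pattern matches successfully at position 0 even when there is no digit there.

```python
import re

DIGITS_STAR = re.compile(r"[0-9]*")


def first_match(subject):
    """Return (start offset, matched text) of the first match of [0-9]*."""
    match = DIGITS_STAR.search(subject)  # always succeeds: zero digits is a valid match
    return match.start(), match.group(0)


for value in ["9eleven", "911test", "eleven", "test911"]:
    print(value, first_match(value))
# 9eleven (0, '9')
# 911test (0, '911')
# eleven (0, '')
# test911 (0, '')
```

If the goal is to pull digits from anywhere in the value, a pattern that requires at least one digit, such as [0-9]+, skips ahead to the first digit. In this sketch, re.search(r"[0-9]+", "test911").group(0) gives 911.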