Hadoop – Search Specific Text in String – Hive

/google/gmail/inbox
/google/drive/map
/google/apps
/yahoo/news/cricket
/yahoo/mail/
/yahoo/sports
/wiki/ind/jack
/wiki/us/jil

I need to fetch the rows for a given page group. For example, if I use a Hive query to search for page groups starting with 'google', I need to get the first 3 rows:

/google/gmail/inbox
/google/drive/map
/google/apps

I need to fetch data by page group in this way.

I am using LIKE to search the strings:

select * from table where field like '%/google/%';

It sounds like you want to filter by page group. In your example that is google, but it could just as well be yahoo. If you want to pull page groups by search engine, you can use a regular expression. You can list multiple sites in an alternation: (page1|page2|…|pageN).

select column from table
where column rlike '.*(google|yahoo).*'

Output:

/google/gmail/inbox
/google/drive/map
/google/apps
/yahoo/news/cricket
/yahoo/mail/
/yahoo/sports
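
Because the pattern above is unanchored, it also matches paths where google or yahoo appears later in the string rather than as the first segment. A minimal sketch of a prefix-only match follows, assuming a hypothetical table `pages` with a string column `page_path` (neither name comes from the question):

```sql
-- Hypothetical table `pages` with a string column `page_path`.

-- Prefix match with LIKE: only paths whose first segment is google.
SELECT page_path
FROM pages
WHERE page_path LIKE '/google/%';

-- Same idea with RLIKE for several groups; ^ anchors the pattern
-- at the start of the string, so only the first segment is tested.
SELECT page_path
FROM pages
WHERE page_path RLIKE '^/(google|yahoo)/';
```

The forward slash has no special meaning in Java regular expressions, so it is left unescaped here.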

You may need to create a new column that contains the search engine name or landing page. It seems that the first position in the path is the landing page. You can access the landing page this way:

select * from
(select column
, regexp_extract(column, '^(/[a-zA-Z]*/)', 1) as landing_page
from table) a
where landing_page in ('/google/', '/yahoo/', ..., '/bing/')
;

Output:

column                      landing_page
/google/gmail/inbox         /google/
/google/drive/map           /google/
/google/apps                /google/
/yahoo/news/cricket         /yahoo/
/yahoo/mail/                /yahoo/
/yahoo/sports               /yahoo/
/bing/meats/delisandwich    /bing/
/bing/maps/delis            /bing/

If you don't want /google/ but just google, move the slashes outside the capture group:

regexp_extract(column, '^/([a-zA-Z]*)/', 1) as landing_page

Note that all of this assumes the landing page is always the first segment of the path, as in the examples you describe.
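
Putting the pieces together, the bare-name extraction can be combined with the filter in one query. This is only a sketch under the same assumption about path layout; the table `pages` and column `page_path` are hypothetical names chosen for illustration:

```sql
-- Hypothetical table `pages` with a string column `page_path`.
SELECT a.page_path, a.landing_page
FROM (
  SELECT page_path,
         -- Capture group 1 is the first path segment, without slashes.
         regexp_extract(page_path, '^/([a-zA-Z]+)/', 1) AS landing_page
  FROM pages
) a
-- The alias is only visible outside the subquery, hence the wrapper.
WHERE a.landing_page IN ('google', 'yahoo');
```

On the sample paths from the question, this should keep the google and yahoo rows and drop the wiki ones. A path with no slash after the first segment would not match this pattern, so adjust the regular expression if such paths occur in your data.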

