Multiple matches in regex in Hive - arrays

When I run
select regexp_extract("hosts: 192.168.1.1 192.168.1.2 host",'((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)',0);
I get 192.168.1.1.
But what I want is 192.168.1.1,192.168.1.2 or ["192.168.1.1","192.168.1.2"].
What should I do: change the regex or create a UDF?

Split the string, explode it, check each part against the regex, and collect the matching parts into an array. If you need a string instead of an array, use concat_ws() to concatenate the array:
with your_data as(
select stack(1, 'hosts: 192.168.1.1 192.168.1.2 host' ) as hosts
)
select collect_set(case when part rlike '((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)'
then part
else null end )
from your_data d
lateral view explode(split(hosts, ' +')) s as part;
Result:
["192.168.1.1","192.168.1.2"]
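To check the split / filter / collect logic outside Hive, the same idea can be sketched in Python. This is only a model: extract_ips and join_ips are hypothetical helper names, re.search stands in for rlike (both match anywhere inside the part), and an order-preserving de-duplication stands in for collect_set, whose output order Hive does not guarantee.

```python
import re

# Same IPv4 pattern as in the Hive query; a Python raw string needs
# single backslashes where the Hive string literal needs doubled ones.
IP_PATTERN = re.compile(
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)

def extract_ips(text):
    # Split on runs of spaces (like split(hosts, ' +')), keep the parts
    # that contain an IP, and drop duplicates (like collect_set).
    seen = []
    for part in re.split(r" +", text):
        if IP_PATTERN.search(part) and part not in seen:
            seen.append(part)
    return seen

def join_ips(text, sep=","):
    # Counterpart of concat_ws(',', <array>) for the string form.
    return sep.join(extract_ips(text))

print(extract_ips("hosts: 192.168.1.1 192.168.1.2 host"))
print(join_ips("hosts: 192.168.1.1 192.168.1.2 host"))
```

Because the pattern is not anchored, a part such as "ip=192.168.1.1" would also be kept whole; anchor the pattern with ^ and $ if only bare addresses should pass.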

Related

Extract string in snowflake

Is there a way in Snowflake to do the following?
I want to provide input like this:
'ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com'
The output should be
ab.cd#test.com
That is, the output should be everything from the start of the string up to the first comma.
I am not sure whether the code below works in all scenarios, or whether there is a better way to do this in Snowflake:
SELECT
SUBSTRING ('ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com', 1,
CHARINDEX (',', 'ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com')-1
)
Use the SPLIT_PART function:
SELECT
SPLIT_PART('ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com', ',',1)
Output:
ab.cd#test.com
Alternatively, use SPLIT_TO_TABLE:
SELECT *
FROM TABLE(SPLIT_TO_TABLE('ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com', ',')) s
WHERE s.Index = 1;
You can use SPLIT_PART.
Reference: SPLIT_PART
https://docs.snowflake.com/en/sql-reference/functions/split_part.html#split-part
select split_part('ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com' ,
',',1) from dual;
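On the "all scenarios" concern: the two approaches give the same result when a comma is present but differ when the input contains no comma at all. The Python sketch below models that difference. It assumes CHARINDEX returns 0 when the delimiter is absent and that SUBSTRING returns an empty string for a negative length, which is worth confirming against the Snowflake documentation; both function names are hypothetical.

```python
def first_by_charindex(s, delim=","):
    # Model of SUBSTRING(s, 1, CHARINDEX(delim, s) - 1).
    pos = s.find(delim) + 1   # 1-based position, 0 when absent (like CHARINDEX)
    length = pos - 1          # becomes -1 when there is no delimiter
    if length < 0:
        return ""             # assumed SUBSTRING behavior for a negative length
    return s[:length]

def first_by_split_part(s, delim=","):
    # Model of SPLIT_PART(s, delim, 1): the whole string when delim is absent.
    return s.split(delim)[0]

many = "ab.cd#test.com,ef.gh#test.com,ij.kl.mn#test.com,op.qr#test.com"
single = "ab.cd#test.com"

print(first_by_charindex(many), first_by_split_part(many))
print(repr(first_by_charindex(single)), repr(first_by_split_part(single)))
```

Under those assumptions, SPLIT_PART is the safer choice when some rows hold a single address with no comma.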

How to extract and combine specific data tables in hive database. Found some errors

SELECT
ps.hotel_id,
ps.name,
ps.country,
ps.city,
ps.chain_name,
ps.account_manager,
ps.adf_2_0,
t.booker_cc1,
sum(kpi.as_booked_roomnights) as ABRN,
sum(kpi.booked_roomnights) as BRN,
sum(kpi.cancelled_roomnights) as CRN,
sum(kpi.booked_price_euro) as booked_rev,
sum(kpi.stayed_roomnights) as SRN,
sum(kpi.stayed_price_euro) as stayed_rev,
sum(kpi.as_booked_commission_euro) as abc,
FROM reporting.kpi_booked_cancelled_stayed kpi
JOIN reporting.property_splits ps ON kpi.hotel_id=ps.hotel_id
JOIN fpa.transactions_detail t ON ps.hotel_id = t.hotel_id
where
to_date(kpi.yyyy_mm_dd) BETWEEN '2020-01-01' AND '2020-01-05'
AND ps.region ='APAC'
AND ps.is_global_chain ='1'
AND ps.is_open_bookable ='1'
GROUP BY 1,2,3,4,5,6,7
I want to extract data from several Hive tables and combine them, but I get an error:
Error while compiling statement: FAILED: ParseException line 17:0 cannot recognize input near 'FROM' 'reporting' '.' in selection target
There is an extra comma after the last select expression, right before FROM:
sum(kpi.as_booked_commission_euro) as abc, --remove comma

Listagg for large data and included values in quotes

I would like to get all the type names of a user, separated by commas and enclosed in single quotes. The problem is that the &apos; entity appears in the output instead of the ' character.
Trial 1
SELECT LISTAGG(TYPE_NAME, ''',''') WITHIN GROUP (ORDER BY TYPE_NAME)
FROM ALL_TYPES
WHERE OWNER = 'USER1';
ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
*Cause: String concatenation result IS more THAN THE maximum SIZE.
*Action: Make sure that the result is less than the maximum size.
Trial 2
SELECT '''' || RTRIM(XMLAGG(XMLELEMENT(E,TYPE_NAME,q'$','$' ).EXTRACT('//text()')
ORDER BY TYPE_NAME).GetClobVal(),q'$','$') AS LIST
FROM ALL_TYPES
WHERE OWNER = 'USER1';
&apos;TYPE1&apos;,&apos;TYPE2&apos;, ............... ,'TYPE3&apos;,&apos;
Trial 3
SELECT
dbms_xmlgen.CONVERT(XMLAGG(XMLELEMENT(E,TYPE_NAME,''',''').EXTRACT('//text()')
ORDER BY TYPE_NAME).GetClobVal())
AS LIST
FROM ALL_TYPES
WHERE OWNER = 'USER1';
TYPE1&amp;apos;,&amp;apos;TYPE2&amp;apos;, ......... ,&amp;apos;TYPE3&amp;apos;,&amp;apos;
I don't want to call REPLACE and then take a substring, as follows:
With tbla as (
SELECT REPLACE('''' || RTRIM(XMLAGG(XMLELEMENT(E,TYPE_NAME,q'$','$' ).EXTRACT('//text()')
ORDER BY TYPE_NAME).GetClobVal(),q'$','$'),''',''') AS LIST
FROM ALL_TYPES
WHERE OWNER = 'USER1')
select SUBSTR(list, 1, LENGTH(list) - 2)
from tbla;
Is there any other way?
Use dbms_xmlgen.convert(col, 1) to decode the escaped entities back into plain characters.
According to the official docs, the second parameter, flag, is:
flag
The flag setting; ENTITY_ENCODE (default) for encode, and
ENTITY_DECODE for decode.
ENTITY_DECODE - 1
ENTITY_ENCODE - 0 default
Try this:
select
''''||substr(s, 1, length(s) - 2) list
from (
select
dbms_xmlgen.convert(xmlagg(xmlelement(e,type_name,''',''')
order by type_name).extract('//text()').getclobval(), 1) s
from all_types
where owner = 'USER1'
);
Tested similar code with just under 100,000 rows:
with t (s) as (
select level
from dual
connect by level < 100000
)
select
''''||substr(s, 1, length(s) - 2)
from (
select
dbms_xmlgen.convert(xmlagg(XMLELEMENT(E,s,''',''') order by s desc).extract('//text()').getClobVal(), 1) s
from t);
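The escaping seen in Trial 2 and Trial 3 can be reproduced outside Oracle. The Python sketch below is only an analogy: xml.sax.saxutils.escape and unescape stand in for the encode (flag 0) and decode (flag 1) modes of dbms_xmlgen.convert, and the encode and decode helper names are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

APOS = {"'": "&apos;"}        # extra entity to apply when encoding
APOS_BACK = {"&apos;": "'"}   # and the reverse mapping for decoding

def encode(text):
    # Analogous to dbms_xmlgen.convert(text, 0), the default ENTITY_ENCODE.
    return escape(text, APOS)

def decode(text):
    # Analogous to dbms_xmlgen.convert(text, 1), ENTITY_DECODE.
    return unescape(text, APOS_BACK)

separator = "','"
once = encode(separator)    # what XML serialization does to the separator (Trial 2)
twice = encode(once)        # encoding the already encoded text again (Trial 3)
restored = decode(once)     # decoding instead, as in the answer

print(once)      # &apos;,&apos;
print(twice)     # &amp;apos;,&amp;apos;
print(restored)  # ','
```

Trial 3 called convert with the default flag, so the already escaped text was escaped a second time; passing 1 decodes it instead.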

Convert Statement to Crystal Reports SQL Expression

I have a SQL command that works great in SQL Server. Here's the query:
SELECT TOP 1000
(
SELECT COUNT(LINENUM)
FROM OEORDD D1
WHERE D1.ORDUNIQ = OEORDD.ORDUNIQ
)
- (SELECT COUNT(LINENUM)
FROM OEORDD D1
WHERE D1.ORDUNIQ = OEORDD.ORDUNIQ
AND D1.LINENUM > OEORDD.LINENUM)
FROM OEORDD
ORDER BY ORDUNIQ, LINENUM
The query counts the total lines on an order, then counts how many lines on that order have a LINENUM greater than the current row's, and subtracts the second count from the first to get the sequential line number.
When I try to add it as a SQL expression in version 14.0.2.364 as follows:
(
(
SELECT COUNT("OEORDD"."LINENUM")
FROM "OEORDD" "D1"
WHERE "D1"."ORDUNIQ" = "OEORDD"."ORDUNIQ"
)
- (SELECT COUNT("OEORDD"."LINENUM")
FROM "OEORDD" "D1"
WHERE "D1"."ORDUNIQ" = "OEORDD"."ORDUNIQ"
AND "D1"."LINENUM" > "OEORDD"."LINENUM"
)
)
I get the error "Column 'SAMDB.dbo.OEORDD.ORDUNIQ' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
If I try to add GROUP BY "OEORDD"."ORDUNIQ" at the end, I get "Incorrect syntax near the keyword 'GROUP'." I've tried adding "FROM OEORDD" at the end of the query, and it errors out on the word "FROM". I have the correct tables linked in the Database Expert.
EDIT --------------
I was able to get the first query working by getting rid of the alias:
(
SELECT COUNT(LINENUM)
FROM OEORDD
WHERE OEORDH.ORDUNIQ=OEORDD.ORDUNIQ
)
However, I believe I need to use the alias in the second query to compare line numbers. I'm still stuck on that one.
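The arithmetic in the original query can be checked outside the report. The Python sketch below uses made-up sample rows and hypothetical function names, and assumes LINENUM is never null. It shows that "total lines minus lines with a greater LINENUM" equals "lines with a LINENUM less than or equal to the current one", so a single count may be enough. Whether Crystal Reports accepts the corresponding single subquery as a SQL expression is not tested here.

```python
def line_number(rows, orduniq, linenum):
    # rows: list of (ORDUNIQ, LINENUM) tuples, standing in for OEORDD.
    same_order = [ln for (ou, ln) in rows if ou == orduniq]
    total = len(same_order)                                 # first subquery
    greater = sum(1 for ln in same_order if ln > linenum)   # second subquery
    return total - greater

def line_number_single_count(rows, orduniq, linenum):
    # One count: lines on the same order at or before the current line.
    return sum(1 for (ou, ln) in rows if ou == orduniq and ln <= linenum)

# Made-up sample data: order 1 has three lines, order 2 has two.
rows = [(1, 32), (1, 64), (1, 96), (2, 10), (2, 500)]

for ou, ln in rows:
    print(ou, ln, line_number(rows, ou, ln), line_number_single_count(rows, ou, ln))
```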

Oracle split text into multiple rows

Inside a varchar2 column I have text values like:
aaaaaa. fgdfg.
bbbbbbbbbbbbbb ccccccccc
dddddd ddd dddddddddddd,
asdasdasdll
sssss
If I do select column from table where id=..., I get the whole text in a single row, as expected.
But I would like to get the result in multiple rows, 5 for the example above.
I have to use just one select statement, and the delimiters will be newline or carriage return (chr(10), chr(13) in Oracle).
Thank you!
Something like this, maybe (it depends on the Oracle version you are using):
WITH yourtable AS (SELECT REPLACE('aaaaaa. fgdfg.' ||chr(10)||
'bbbbbbbbbbbbbb ccccccccc ' ||chr(13)||
'dddddd ddd dddddddddddd,' ||chr(10)||
'asdasdasdll ' ||chr(13)||
'sssss '||chr(10),chr(13),chr(10)) AS astr FROM DUAL)
SELECT REGEXP_SUBSTR ( astr, '[^' ||chr(10)||']+', 1, LEVEL) data FROM yourtable
CONNECT BY LEVEL <= LENGTH(astr) - LENGTH(REPLACE(astr, chr(10))) + 1
see: Comma Separated values in Oracle
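The answer by Kevin Burton returns wrong results if your data contains empty lines: the pattern [^...]+ needs at least one character, so an empty line is never matched and the lines after it shift up by one.
The adaptation below, based on the solution invented here, works. Check that post for an explanation of the issue and the solution.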
WITH yourtable AS (SELECT REPLACE('aaaaaa. fgdfg.' ||chr(10)||
'bbbbbbbbbbbbbb ccccccccc ' ||chr(13)||
chr(13)||
'dddddd ddd dddddddddddd,' ||chr(10)||
'asdasdasdll ' ||chr(13)||
'sssss '||chr(10),chr(13),chr(10)) AS astr FROM DUAL)
SELECT REGEXP_SUBSTR ( astr, '([^' ||chr(10)||']*)('||chr(10)||'|$)', 1, LEVEL, null, 1) data FROM yourtable
CONNECT BY LEVEL <= LENGTH(astr) - LENGTH(REPLACE(astr, chr(10))) + 1;
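The difference between the two patterns can be seen with a small Python model. This is a sketch, not Oracle semantics: the function names are hypothetical, re.findall stands in for calling REGEXP_SUBSTR once per LEVEL, and an empty Python string corresponds to what Oracle would return as NULL.

```python
import re

def split_skipping_empty(text):
    # Model of the first pattern, [^<LF>]+ : it needs at least one
    # character, so an empty line is silently skipped.
    return re.findall(r"[^\n]+", text)

def split_keeping_empty(text):
    # Model of the second pattern, ([^<LF>]*)(<LF>|$) with the first
    # group returned. The slice mirrors the CONNECT BY bound
    # (number of newlines + 1).
    levels = text.count("\n") + 1
    parts = re.findall(r"([^\n]*)(?:\n|$)", text)
    return parts[:levels]

sample = "first\n\nthird"   # the second line is empty

print(split_skipping_empty(sample))  # ['first', 'third']
print(split_keeping_empty(sample))   # ['first', '', 'third']
```

With the first pattern, the third LEVEL has nothing left to match while 'third' has moved into the second position; the second pattern keeps every line in its own position.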

Resources