BigQuery or SQL Server SPLIT query - sql-server

I have searched around and cannot find much on this topic. I have a table that receives logging information. As a result, the column I am interested in contains multiple values that I need to search against. The column is formatted like a PHP-style URL query string, e.g.
/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32
As a result, every search needs a really long regex to get the data, followed by join statements to combine it.
Is there a way in BigQuery, or SQL Server that I can pull the information from that column and put it into new columns?
Example:
The information I would like to extract begins after the ? and ends at the &. The string can sometimes be longer, and contains additional headers.
Thanks,

The query below is for BigQuery Standard SQL and addresses this part of your question:
Is there a way in BigQuery, ... that I can pull the information from that column and put it into new columns?
#standardSQL
CREATE TEMP FUNCTION parseColumn(kv STRING, column_name STRING) AS (
IF(SPLIT(kv, '=')[OFFSET(0)]= column_name, SPLIT(kv, '=')[OFFSET(1)], NULL)
);
WITH `project.dataset.table` AS (
SELECT '/test/test.aspx?extra=abc&DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url UNION ALL
SELECT '/test/test.aspx?DS_Vendor=55192&DS_ProdVer=4.30.100.0&more=123&DS_ProdLang=DE&DS_Product=MTE&DS_OfficeBits=64'
)
SELECT
MIN(parseColumn(kv, 'DS_Vendor')) AS DS_Vendor,
MIN(parseColumn(kv, 'DS_ProdVer')) AS DS_ProdVer,
MIN(parseColumn(kv, 'DS_ProdLang')) AS DS_ProdLang,
MIN(parseColumn(kv, 'DS_Product')) AS DS_Product,
MIN(parseColumn(kv, 'DS_OfficeBits')) AS DS_OfficeBits
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS kv
GROUP BY url
with the result as below
Row DS_Vendor DS_ProdVer DS_ProdLang DS_Product DS_OfficeBits
1 55039 7.90.100.0 EN MTT 32
2 55192 4.30.100.0 DE MTE 64
The query above also handles this case:
The string can sometimes be longer, and contains additional headers.

One example using BigQuery (with standard SQL):
SELECT REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')
FROM (
SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
This returns the parts of the URL as an ARRAY<STRING>. To go one step further, you can get back an ARRAY<STRUCT<key STRING, value STRING>> with a query of this form:
SELECT
ARRAY(
SELECT AS STRUCT
SPLIT(part, '=')[OFFSET(0)] AS key,
SPLIT(part, '=')[OFFSET(1)] AS value
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS part
) AS keys_and_values
FROM (
SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
...or with the keys and values as top-level columns:
SELECT
SPLIT(part, '=')[OFFSET(0)] AS key,
SPLIT(part, '=')[OFFSET(1)] AS value
FROM (
SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS part
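For comparison, the same split-on-& then split-on-= logic can be sketched outside the database. This is a minimal Python illustration, not part of either answer; the function name `extract_columns` and the `WANTED` tuple are hypothetical. It relies only on the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit, parse_qsl

# Keys to promote to "columns" (taken from the example URL in the question).
WANTED = ("DS_Vendor", "DS_ProdVer", "DS_ProdLang", "DS_Product", "DS_OfficeBits")

def extract_columns(url, wanted=WANTED):
    """Return a dict with one entry per wanted key; absent keys map to None."""
    # urlsplit isolates everything after '?'; parse_qsl splits on '&' and '='.
    pairs = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    return {name: pairs.get(name) for name in wanted}

row = extract_columns(
    "/test/test.aspx?extra=abc&DS_Vendor=55039&DS_ProdVer=7.90.100.0"
    "&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32"
)
print(row)
```

Extra keys such as `extra` are simply ignored, which mirrors how the SQL version copes with longer strings. One difference: if a key is repeated, the dict keeps the last value, whereas the `MIN(...)` in the SQL version keeps the smallest.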

Related

extracting certain size data from column

I have a table in MS SQL Server 2016. The table has a column called notes, defined as varchar(255).
The notes column contains free-text notes entered by end users.
Select ServiceDate, notes from my_table
ServiceDate, notes
--------------------------------------
9/1/2022 The order was called in AB13456736
9/1/2022 AB45876453 not setup
9/2/2022 Signature for AB764538334
9/2/2022 Contact for A0943847432
9/3/2022 Hold off on AB73645298
9/5/2022 ** Confirmed AB88988476
9/6/2022 /AB9847654 completed
I would like to be able to extract the word AB% from the notes column. I know the ABxxxxxxxx value is always 10 characters. Because it is entered at a different position each time, it is difficult to tell an extract exactly where to look. I have tried the substring() and left() functions, but because the AB% value is always in a different position, I can't get them to extract it. Is there a method I can use to do this?
thanks in advance.
Assuming there is only ONE AB{string} in notes, otherwise you would need a Table-Valued Function.
Note the nullif(). This is essentially a Fail-Safe if the string does not exist.
Example
Declare @YourTable Table ([ServiceDate] varchar(50),[notes] varchar(50))
Insert Into @YourTable Values
('9/1/2022','The order was called in AB13456736')
,('9/1/2022','AB45876453 not setup')
,('9/2/2022','Signature for AB764538334')
,('9/2/2022','Contact for A0943847432')
,('9/3/2022','Hold off on AB73645298')
,('9/5/2022','** Confirmed AB88988476')
,('9/6/2022','/AB9847654 completed')
Select *
,ABValue = substring(notes,nullif(patindex('%AB[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%',notes),0),10)
from @YourTable
Results
ServiceDate notes ABValue
9/1/2022 The order was called in AB13456736 AB13456736
9/1/2022 AB45876453 not setup AB45876453
9/2/2022 Signature for AB764538334 AB76453833
9/2/2022 Contact for A0943847432 NULL
9/3/2022 Hold off on AB73645298 AB73645298
9/5/2022 ** Confirmed AB88988476 AB88988476
9/6/2022 /AB9847654 completed NULL
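To make the matching rule explicit, here is a rough Python equivalent of the patindex() approach: find the first occurrence of AB followed by eight digits and return those 10 characters, or nothing if there is no match. It is only a sketch for checking the logic; `extract_ab` is a hypothetical name. Unlike patindex() under a case-insensitive collation, this regex is case-sensitive.

```python
import re

# "AB" followed by exactly eight digits: the same shape as the PATINDEX pattern.
AB_PATTERN = re.compile(r"AB[0-9]{8}")

def extract_ab(note):
    """Return the first 10-character AB value in the note, or None if absent."""
    match = AB_PATTERN.search(note)
    return match.group(0) if match else None

notes = [
    "The order was called in AB13456736",
    "Signature for AB764538334",
    "Contact for A0943847432",
    "/AB9847654 completed",
]
for note in notes:
    print(note, "->", extract_ab(note))
```

As in the SQL results, a longer value such as AB764538334 is cut to its first 10 characters, and a value with fewer than eight digits after AB yields no match.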

How to write user-defined functions in snowflake?

I am very new to Snowflake. Till now I had used Teradata for writing complex SQL queries.
In snowflake, I need to create and call macros (similar to Teradata), where I have to pass date as parameters, and within the function I have to append records in a table. Something along these lines:
CREATE TABLE SFAAP.WS_DIRBNK_DPST.PV_HIGH_RISK_FI_LIST
(
APP_DT DATE
,FI_NAME VARCHAR(50)
);
CREATE OR REPLACE FUNCTION SFAAP.INSERT_FI (DT DATE, CRED CHAR(5))
--RETURNS NULL
--COMMENT='Create list of high risk FI by date'
AS
'
INSERT INTO SFAAP.WS_DIRBNK_DPST.PV_HIGH_RISK_FI_LIST
TO_DATE(DD) --------------Passed Parameter
,FI_NAME
FROM
(
SELECT
FINANCIAL_INSTITUTION AS FI_NAME
,COUNT(DISTINCT CASE WHEN IND_FPFA_FRAUD = 1 THEN APP_ID ELSE NULL END) AS TOT_FPFA_APPS
,COUNT(DISTINCT APP_ID) AS TOT_APPS
,CAST(TOT_FPFA_APPS AS DECIMAL(38,2))/TOT_APPS AS FRAUD_RATE
FROM
(
SELECT
A.*
,C.FINANCIAL_INSTITUTION
FROM BASE_05 A
LEFT JOIN
(
SELECT
BNK_ACCT_NBR_TOK
,BNK_TRAN_TYP_CDE
,ALT_DR_CR_CDE
,TRAN_1_DSC_TOK
,TRAN_DT
,TRAN_AMT
FROM "SFAAP"."V_SOT_DIRBNK_CLB_FRD_CRD"."BNK_DPS_TRAN_RLT_INFO"
WHERE TRAN_DT BETWEEN DATEADD(Day,-90,TO_DATE(DD)) AND TO_DATE(DD) --------------Passed Parameter, does calculation in the 90 days window from the passed date
AND ALT_DR_CR_CDE = TO_CHAR(CRED) --------------Passed Parameter
AND BNK_TRAN_TYP_CDE IN (22901,56003,56002,56302,56303,56102,70302)
AND TRAN_AMT>=5
QUALIFY ROW_NUMBER() OVER(PARTITION BY BNK_ACCT_NBR_TOK, TRAN_DT, TRAN_AMT, BNK_TRAN_TYP_CDE ORDER BY TRAN_DT ASC, TRAN_AMT DESC)=1
) B
ON A.BNK_ACCT_NBR = B.BNK_ACCT_NBR_TOK
LEFT JOIN SFAAP.WS_DIRBNK_DPST.PV_FRAUD_METRICS_03 C
ON B.TRAN_1_DSC_TOK = C.TOKEN_NAME
)SUB_A
GROUP BY 1
)SUB_B
WHERE FINANCIAL_INSTITUTION IS NOT NULL
AND TOT_APPS>=3
AND FRAUD_RATE>=0.20
'
;
I took some guidance from this answer here, but I am still not there yet. Here's the error which I am getting:
Due to lack of experience writing snowflake user-defined functions, I think I am messing up syntax somewhere (could be the way I am passing those two parameters). Comments/suggestions are most welcome.
Thanks in advance.
It looks like SFAAP is your database name, please include your schema name if you are going to use "Fully Qualified Names", or change your session context to use a database and schema and then create the function without the database and schema name.
example:
CREATE OR REPLACE FUNCTION SFAAP.WS_DIRBNK_DPST.INSERT_FI (

Oracle ROWTOCOL Function oddities

I have a requirement to pull data in a specific format and I'm struggling slightly with the ROWTOCOL function and was hoping a fresh pair of eyes might be able to help.
I'm using 10g Oracle DB (10.2) so LISTAGG which appears to do what I need to achieve is not an option.
I need to aggregate a number of usernames into a string delimited with a '$', but I also need to concatenate another column to build up email addresses.
select
rowtocol('select username_id from username where user_id = '||s.user_id|| 'order by USERNAME_ID asc','@'||d.domain_name||'$')
from username s, domain d
where s.user_id = d.user_id
(I've simplified the query specific to just this function as the actual query is quite large and all works except for this particular function.)
In the DOMAIN table I have a number of domains such as 'hotmail.com', 'gmail.com', etc.
I need to concatenate the username, an '@' symbol and the domain, all delimited with a '$',
such as:
joe.bloggs@gmail.com$joeblogs@gmail.com$joe_bloggs@gmail.com
I've battled with this and I've got close, but in reverse:
gmail.com$joe.bloggs@gmail.com$joeblogs@gmail.com$joe_bloggs
I've also noticed that if I play around with the delimiter (,'@'||d.domain_name||'$') it has a tendency to drop off the first character. As can be seen above, the preceding '@' has been dropped from the first email address.
Can anyone offer any suggestions as to how to get this working?
Many Thanks in advance!
Assuming you're using the rowtocol function from OTN, and have tables something like:
create table username (user_id number, username_id varchar2(20));
create table domain (user_id number, domain_name varchar2(20));
insert into username values (1, 'joe.bloggs');
insert into username values (1, 'joebloggs');
insert into username values (1, 'joe_bloggs');
insert into domain values (1, 'gmail.com');
Then your original query gets three rows back:
gmail.com$joe.bloggs
gmail.com$joe_bloggs@gmail.com$joebloggs
gmail.com$joe_bloggs@gmail.com$joebloggs
You're passing the data from each of your user IDs to a separate call to rowtocol, which isn't really what you want. You can get the result I think you're after by reversing it; pass the main query that joins the two tables as the select argument to the function, and have that passed query do the username/domain concatenation - that is a separate step to the string aggregation:
select
rowtocol('select s.username_id || ''@'' || d.domain_name from username s join domain d on d.user_id = s.user_id', '$')
from dual;
which gets a single result:
joe.bloggs@gmail.com$joe_bloggs@gmail.com$joebloggs@gmail.com
Whether that fits into your larger query, which you haven't shown, is a separate question. You might need to correlate it with the rest of your query.
There are other ways to do string aggregation in Oracle, but this function is one way, and you already have it installed. I'd look at alternatives though, such as ThomasG's answer, which I think make it a bit clearer what's going on.
As Alex told you in comments, this ROWTOCOL isn't a standard function so if you don't show its code, there's nothing we can do to fix it.
However you can accomplish what you want in Oracle 10 using the XMLAGG built-in function.
try this :
SELECT
rtrim (xmlagg (xmlelement (e, s.username_id || '@' || d.domain_name || '$')).extract ('//text()'), '$') whatever
FROM username s
INNER JOIN domain d ON s.user_id = d.user_id
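The two steps described in these answers (first build each address, then aggregate with a delimiter) can be sketched in a few lines of Python. This is only an illustration of the order of operations, not Oracle code; `aggregate_emails` and its parameters are hypothetical names, and '@' is assumed to be the intended separator between username and domain.

```python
def aggregate_emails(usernames, domain, sep="@", delim="$"):
    """Concatenate username + sep + domain per row, then join rows with delim."""
    # Step 1: per-row concatenation. Step 2: aggregation with the delimiter.
    return delim.join(f"{name}{sep}{domain}" for name in sorted(usernames))

print(aggregate_emails(["joe.bloggs", "joebloggs", "joe_bloggs"], "gmail.com"))
```

Because the delimiter is only ever placed between complete addresses, the domain cannot end up in front of the first username, which is the symptom described in the question.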

SQL - How can I return a value from a different table based on a parameter?

First time poster, long time reader:
I am using a custom Excel function that allows me to pass parameters and build a SQL string that returns a value. This is working fine. However, I would like to choose among various tables based on the parameters that are passed.
At the moment I have two working functions whose SQL statements look like this:
_______FUNCTION ONE________
<SQLText>
SELECT PRODDTA.TABLE1.T1DESC as DESCRIPTION
FROM PRODDTA.TABLE1
WHERE PRODDTA.TABLE1.T1KEY = '&PARM02'</SQLText>
_______FUNCTION TWO________
<SQLText>
SELECT PRODDTA.TABLE2.T2DESC as DESCRIPTION
FROM PRODDTA.TABLE2
WHERE PRODDTA.TABLE2.T2KEY = '&PARM02'</SQLText>
So I am using IF logic in Excel to check the first parameter and decide which function to use.
It would be much better if I could do a single SQL statement that could pick the right table based on the 1st parameter. Logically something like this:
_______FUNCTIONS COMBINED________
IF '&PARM02' = “A” THEN
SELECT PRODDTA.TABLE1.T1DESC as DESCRIPTION
FROM PRODDTA.TABLE1
WHERE PRODDTA.TABLE1.T1KEY = '&PARM02'
ELSE IF '&PARM02' = “B” THEN
SELECT PRODDTA.TABLE2.T2DESC as DESCRIPTION
FROM PRODDTA.TABLE2
WHERE PRODDTA.TABLE2.T2KEY = '&PARM02'
ELSE
DESCRIPTION = “”
Based on another post Querying different table based on a parameter I tried this exact syntax with no success
<SQLText>
IF'&PARM02'= "A"
BEGIN
SELECT PRODDTA.F0101.ABALPH as DESCRIPTION
FROM PRODDTA.F0101
WHERE PRODDTA.F0101.ABAN8 = '&PARM02'
END ELSE
BEGIN
SELECT PRODDTA.F4801.WADL01 as DESCRIPTION
FROM PRODDTA.F4801
WHERE PRODDTA.F4801.WADOCO = '&PARM02'
END</SQLText>
You could try using a JOIN statement.
http://www.sqlfiddle.com/#!9/23461d/1
Here is a fiddle showing two tables.
The following code snip will give you the values from both tables, using the Key as the matching logic.
SELECT Table1.description, Table1.key, Table2.description
from Table1
Join Table2 on Table1.key = Table2.key
Here's one way to do it. If PARM03='Use Table1' then the top half of the union will return records, and vice versa. This won't necessarily produce good performance though. You should consider why you are storing data in this way. It looks like you are partitioning data across different tables, which is a bad idea.
SELECT PRODDTA.TABLE1.T1DESC as DESCRIPTION
FROM PRODDTA.TABLE1
WHERE PRODDTA.TABLE1.T1KEY = '&PARM02'
AND '&PARM03'='Use Table1'
UNION ALL
SELECT PRODDTA.TABLE2.T2DESC as DESCRIPTION
FROM PRODDTA.TABLE2
WHERE PRODDTA.TABLE2.T2KEY = '&PARM02'
AND '&PARM03'='Use Table2'
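The UNION ALL trick can be tried out end to end with an in-memory SQLite database from Python. This is a minimal sketch under assumptions: the table and column names are simplified stand-ins for the PRODDTA tables, the sample rows are invented, and bound parameters replace the &PARM text substitution.

```python
import sqlite3

def lookup_description(conn, key, selector):
    """Query both tables; the selector lets only one half of the union match."""
    sql = """
        SELECT t1desc AS description FROM table1
        WHERE t1key = ? AND ? = 'Use Table1'
        UNION ALL
        SELECT t2desc AS description FROM table2
        WHERE t2key = ? AND ? = 'Use Table2'
    """
    return [row[0] for row in conn.execute(sql, (key, selector, key, selector))]

# Hypothetical sample data, one matching key in each table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (t1key TEXT, t1desc TEXT);
    CREATE TABLE table2 (t2key TEXT, t2desc TEXT);
    INSERT INTO table1 VALUES ('100', 'Address book entry');
    INSERT INTO table2 VALUES ('100', 'Work order');
""")
print(lookup_description(conn, "100", "Use Table1"))
print(lookup_description(conn, "100", "Use Table2"))
```

Any selector value other than the two expected strings returns no rows, which plays the role of the empty DESCRIPTION branch in the question's pseudocode.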

Parse json arrays using HIVE

I have many json arrays stored in a table (jt) that looks like this:
[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]
[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]
Each array is a record.
I would like to parse this table in order to get a new table (logs) with 3 fields: ts, id, log.
I tried to use the get_json_object method, but it seems that method is not compatible with json arrays because I only get null values.
This is the code I have tested:
CREATE TABLE logs AS
SELECT get_json_object(jt.value, '$.ts') AS ts,
get_json_object(jt.value, '$.id') AS id,
get_json_object(jt.value, '$.log') AS log
FROM jt;
I tried to use other functions but they seem really complicated.
Thank you! :)
Update!
I solved my issue by performing a regexp:
CREATE TABLE jt_reg AS
select regexp_replace(regexp_replace(value,'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') as valuereg from jt;
CREATE TABLE logs AS
SELECT get_json_object(jt_reg.valuereg, '$.ts') AS ts,
get_json_object(jt_reg.valuereg, '$.id') AS id,
get_json_object(jt_reg.valuereg, '$.log') AS log
FROM jt_reg;
I just ran into this problem, with the JSON array stored as a string in the hive table.
The solution is a bit hacky and ugly, but it works and doesn't require serdes or external UDFs
SELECT
get_json_object(single_json_table.single_json, '$.ts') AS ts,
get_json_object(single_json_table.single_json, '$.id') AS id,
get_json_object(single_json_table.single_json, '$.log') AS log
FROM ( SELECT explode (
split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
'"}","', '"}",,,,"'), ',,,,')
) AS single_json FROM src_table) single_json_table;
I broke the lines up so that it would be a little easier to read.
I'm using substr() to strip the first and last characters, removing [ and ]. I'm then using regexp_replace() to match the separator between records in the JSON array and change it to something unique. That unique separator can then be used with split() to turn the string into a Hive array of JSON objects, which can in turn be passed to explode() as described in the previous solution.
Note that the separator regex used here ( "}"," ) wouldn't work with the original data set. The regex would have to be ( "},\{" ) and the replacement would then need to be "},,,,{", e.g.:
split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
'"},\\{"', '"},,,,{"'), ',,,,')
Use explode() function
hive (default)> CREATE TABLE logs AS
> SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
> get_json_object(single_json_table.single_json, '$.id') AS id,
> get_json_object(single_json_table.single_json, '$.log') AS log
> FROM
> (SELECT explode(json_array_col) as single_json FROM jt) single_json_table ;
Automatically selecting local only mode for query
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
hive (default)> select * from logs;
OK
ts id log
1403781896 14 show
1403781896 14 start
1403781911 14 press
1403781911 14 press
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive (default)>
where json_array_col is column in jt which holds your array of jsons.
hive (default)> select json_array_col from jt;
json_array_col
["{"ts":1403781896,"id":14,"log":"show"}","{"ts":1403781896,"id":14,"log":"start"}"]
["{"ts":1403781911,"id":14,"log":"press"}","{"ts":1403781911,"id":14,"log":"press"}"]
get_json_object doesn't support a JSON array string, so you can concatenate it into a JSON object, like this:
SELECT
get_json_object(concat(concat('{"root":', jt.value), '}'), '$.root')
FROM jt;
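As a reference for what the Hive queries in this thread are meant to produce, the explode-then-extract logic can be sketched in Python with the standard `json` module: each stored string is one JSON array, and each object in it becomes one (ts, id, log) row. The function name `flatten_logs` is hypothetical.

```python
import json

def flatten_logs(records):
    """Turn a list of JSON-array strings into a flat list of (ts, id, log) rows."""
    rows = []
    for raw in records:
        # One record holds a whole array; emit one row per object in it.
        for obj in json.loads(raw):
            rows.append((obj["ts"], obj["id"], obj["log"]))
    return rows

jt = [
    '[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]',
    '[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]',
]
for row in flatten_logs(jt):
    print(row)
```

Running it on the two sample records from the question gives four rows, the same shape as the logs table shown in the explode() answer.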
