Create 1 array with 2 fields from 2 csv fields in BigQuery - arrays

I am currently trying to work through this and I'm unsure as to how to proceed. I have the below data
| ID | name | value |
|:---- |:------: | -----: |
| One | a,b,c | 10,20,30 |
I would like to turn it into
| ID | properties.name | properties.value |
|:---- |:------: | -----: |
| One | a | 10 |
| | b | 20 |
| | c | 30 |
The below query looked like it was working but instead of having an array it created a nested record with 2 array fields.
SELECT ID
, name
, value
, array (
select as struct
split(name, ',') as name
, split(value, ',') as value
) as properties
FROM `orders`

Consider below approach
select id, array(
select as struct name, value
from unnest(split(name)) name with offset
join unnest(split(value)) value with offset
using(offset)
) as properties
from `orders`
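Applied to the sample data in your question, this returns one row for ID One whose properties array holds three structs, pairing a with 10, b with 20, and c with 30 (the values stay strings, because split returns strings).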
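The join on offset can also be pictured outside SQL. Here is a minimal Python sketch (not BigQuery code) that pairs the two comma-separated fields position by position; the function name `zip_properties` and the dict layout are assumptions made for illustration only.

```python
def zip_properties(name_csv, value_csv):
    """Pair the i-th name with the i-th value, like joining two UNNESTs on offset."""
    names = name_csv.split(",")
    values = value_csv.split(",")
    # zip stops at the shorter list, which mirrors an inner join on offset
    return [{"name": n, "value": v} for n, v in zip(names, values)]


row = {"ID": "One", "name": "a,b,c", "value": "10,20,30"}
properties = zip_properties(row["name"], row["value"])
print(row["ID"], properties)
```

The result is a single list of name/value pairs rather than two parallel lists, which is the difference between an array of structs and a struct of arrays.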

Related

Postgres select like any of array of text

I have two tables: table1 contains wildcard paths, and table2 contains files with their full paths.
I want to select all files that match a wildcard path.
Example:
table1:
| type | path |
| sys | /etc/* |
| protected | /etc/* |
| sys | /sys/* |
| log | /log/* |
table2:
| file | path |
| f1.cmd | /etc/folder/name |
| f2.cmd | /log/folder/name |
| f3.cmd | /etc/folder/name |
| f4.cmd | /sys/folder/name |
My ultimate goal is to create a VIEW that has all the data from table2 plus one more column, type, that tells me which type each file belongs to,
so that I can, for example, select all files with type = sys.
What I tried:
Step 1: get the list of all paths of the wanted type.
select array_agg(replace(path,'*','%')) from
table1 where type = 'sys'
group by type
This returns something like {"etc\\%","sys\\%"}.
Step 2: select the files using LIKE ANY.
select * from symbols where path like any ( array['etc\\%', 'sys\\%'] )
This successfully returned all files whose paths match the ones I need.
Now the question is: how can I combine both queries into one?
Or is there an easier way, for example using a JOIN?
Thanks.
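You could get table1.type for each row in table2 by checking whether table1.path (with the * removed) is a substring of table2.path. Note that position(...) > 0 matches the pattern anywhere in the path, not only at the start, and that limit 1 without an order by picks an arbitrary type when several patterns match: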
with table1(type, path) as (
values ('sys', '/etc/*'),
('sys', '/sys/*'),
('log', '/log/*'),
('etc', '/etc/*')
),
table2(file, path) as (
values ('f1.cmd', '/etc/folder/name'),
('f2.cmd', '/log/folder/name'),
('f3.cmd', '/etc/folder/name'),
('f4.cmd', '/sys/folder/name')
)
select *,
(select type
from table1
where position(replace(path, '*', '') in table2.path) > 0
limit 1) as type
from table2;
file | path | type
--------+------------------+------
f1.cmd | /etc/folder/name | sys
f2.cmd | /log/folder/name | log
f3.cmd | /etc/folder/name | sys
f4.cmd | /sys/folder/name | sys
(4 rows)
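The question also asks whether a JOIN would work. Below is a hedged sketch of that idea, using Python's built-in sqlite3 as a stand-in for Postgres (the `LIKE` and `replace` calls used here exist in both). The view name `files_with_type` is an assumption; the data is the sample from the question. A file that matches several patterns appears once per matching type, and `_` in a path would also act as a `LIKE` wildcard unless it is escaped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (type TEXT, path TEXT);
    CREATE TABLE table2 (file TEXT, path TEXT);
    INSERT INTO table1 VALUES
        ('sys', '/etc/*'), ('protected', '/etc/*'),
        ('sys', '/sys/*'), ('log', '/log/*');
    INSERT INTO table2 VALUES
        ('f1.cmd', '/etc/folder/name'), ('f2.cmd', '/log/folder/name'),
        ('f3.cmd', '/etc/folder/name'), ('f4.cmd', '/sys/folder/name');
    -- One output row per (file, matching type) pair.
    CREATE VIEW files_with_type AS
    SELECT t2.file, t2.path, t1.type
    FROM table2 AS t2
    JOIN table1 AS t1
      ON t2.path LIKE replace(t1.path, '*', '%');
""")

# Filter the view by type, as the question describes.
sys_files = [row[0] for row in conn.execute(
    "SELECT file FROM files_with_type WHERE type = 'sys' ORDER BY file")]
print(sys_files)
```

Because the pattern only has a trailing wildcard, this matches on the path prefix rather than on a substring anywhere in the path.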

DELETE TOP variable records with variable from grouping of another table

Say I have two tables: A and B
Table A
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 0 |
+----+-------+
Table B
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 10 |
| 3 | 30 |
| 4 | 20 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
If I do SELECT value, COUNT(*) AS occurrence FROM A GROUP BY value, I'll get:
+-------+------------+
| value | occurrence |
+-------+------------+
| 20 | 2 |
| 10 | 1 |
| 0 | 1 |
+-------+------------+
Based on this grouping of table A, I want to delete occurrence records from table B with the same values. In other words, I want to delete from B 2 records with value 20, 1 record with value 10, and 1 record with value 0. (Other conditions include 'do nothing if no record exists' and 'smallest id first', but I think these conditions are pretty trivial compared to the bulk of this question.)
Table B after deleting should be:
+----+-------+
| id | value |
+----+-------+
| 3 | 30 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
From the official TOP documentation, it doesn't seem like I can use a JOIN to supply the TOP expression.
We could use ROW_NUMBER with CTEs here:
WITH cteA AS (
SELECT value, COUNT(*) cnt
FROM A
GROUP BY value
),
cteB AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) rn
FROM B
)
DELETE b
FROM cteB b
INNER JOIN cteA a
ON b.value = a.value
WHERE
b.rn <= a.cnt;
The logic here is that we use ROW_NUMBER to number the rows for each value in the B table, ordered by id. Then, we join to bring in the counts of each value in the A table, and we only delete B records for which the row number is less than or equal to the A count.
See the demo link below to verify that the logic is correct. Note that I use a select there, not a delete, but the correct rows are being targeted for deletion.
Demo
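To check the row-number logic against the sample tables, here is a small sketch using Python's built-in sqlite3 rather than SQL Server. SQLite cannot delete through a CTE join, so the delete is rewritten with `WHERE id IN (...)`, which is an adaptation and not the T-SQL above. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO A VALUES (1, 20), (2, 20), (3, 10), (4, 0);
    INSERT INTO B VALUES (1, 20), (2, 10), (3, 30), (4, 20), (5, 20), (6, 10);
""")

# Number B rows per value (smallest id first) and delete those whose
# row number does not exceed the count of that value in A.
conn.execute("""
    DELETE FROM B
    WHERE id IN (
        SELECT b.id
        FROM (SELECT id, value,
                     ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS rn
              FROM B) AS b
        JOIN (SELECT value, COUNT(*) AS cnt FROM A GROUP BY value) AS a
          ON a.value = b.value
        WHERE b.rn <= a.cnt
    )
""")

remaining = conn.execute("SELECT id, value FROM B ORDER BY id").fetchall()
print(remaining)
```

Values that appear in A but not in B (0 in the sample) simply find no join partner, which covers the "do nothing if no record exists" condition.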

TSQL Multiple column unpivot with named rows possible?

I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, eg. one named Bonus which I also want to put in Perc10, Perc25 and Median Rows.
As an alternative, I also made a query with cross apply, but here, it seems as if you can not "force" sort the rows like you can with unpivot. In other words, I can not have a custom sort, but only a sort that is according to a number within the table, if I am correct? At least, here I do get the result like I wish to have, but the rows are in a wrong order and I do not have the rows names like Perc10 etc. which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Am I missing something, or did I do something wrong? I'm using MSSQL 2014, and the output is going into SSRS. Thanks a lot in advance for any hint!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns of the table. In the table, I've got Columns like Salary and Bonus for each JobID. The View then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits I understand that your solution with CROSS APPLY comes back with the right data, but not in the right shape. You can add constant values to your VALUES and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
wrapped.Salary,
wrapped.Bonus
FROM
(
SELECT crossapplied.*
FROM vCalculatedView v
CROSS APPLY (
VALUES
(1,'Perc10',Salary_10, Bonus_10)
, (2,'Perc25',Salary_25, Bonus_25)
, (3,'Median',Salary_Median, Bonus_Median)
) crossapplied (SortOrder,Calculation,Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.SortOrder,
crossapplied.Calculation,
crossapplied.Salary,
crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder
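The idea of carrying a constant label and an explicit sort key on each unpivoted row is not specific to T-SQL. A rough Python sketch, assuming a single hypothetical view row whose values are taken from the intended output in the question:

```python
# One hypothetical view row for JobID 1; values come from the intended output.
view_row = {
    "Salary_10": 25, "Bonus_10": 5,
    "Salary_25": 35, "Bonus_25": 10,
    "Salary_Median": 27, "Bonus_Median": 8,
}

# (sort order, label, salary column, bonus column), mirroring the VALUES list.
# The entries are deliberately out of order to show that the key decides.
layout = [
    (3, "Median", "Salary_Median", "Bonus_Median"),
    (1, "Perc10", "Salary_10", "Bonus_10"),
    (2, "Perc25", "Salary_25", "Bonus_25"),
]

unpivoted = [(order, label, view_row[s], view_row[b])
             for order, label, s, b in layout]
# Sort on the explicit key, then drop it, like the wrapper SELECT with ORDER BY.
result = [(label, salary, bonus) for _, label, salary, bonus in sorted(unpivoted)]
print(result)
```

The sort key travels with the row until the final projection, so the output order no longer depends on the data values themselves.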

Rearranging and deduplicating SQL columns based on column data

Sorry I know that's a rubbish Title but I couldn't think of a more concise way of describing the issue.
I have a (MSSQL 2008) table that contains telephone numbers:
| CustomerID | Tel1 | Tel2 | Tel3 | Tel4 | Tel5 | Tel6 |
| Cust001 | 01222222 | 012333333 | 07111111 | 07222222 | 01222222 | NULL |
| Cust002 | 07444444 | 015333333 | 07555555 | 07555555 | NULL | NULL |
| Cust003 | 01333333 | 017777777 | 07888888 | 07011111 | 016666666 | 013333 |
I'd like to:
Remove any duplicate phone numbers
Rearrange the telephone numbers so that anything beginning with "07" is the first phone number. If there are multiple 07's, they should be in the first fields. The order of the numbers apart from that doesn't really matter.
So, for example, after processing, the table would look like:
| CustomerID | Tel1 | Tel2 | Tel3 | Tel4 | Tel5 | Tel6 |
| Cust001 | 07111111 | 07222222 | 01222222 | 012333333 | NULL | NULL |
| Cust002 | 07444444 | 07555555 | 015333333 | NULL | NULL | NULL |
| Cust003 | 07888888 | 07011111 | 016666666 | 013333 | 01333333 | 017777777 |
I'm struggling to figure out how to efficiently achieve my goal (there are 600,000+ records in the table). Can anyone help?
I've created a fiddle if it'll help anyone play around with the scenario.
You can break up the numbers into individual rows using UNPIVOT, then reorder them based on the occurrence of the '07' prefix using ROW_NUMBER(), and finally recombine them using PIVOT to end up with the 6 Tel columns again.
select *
FROM
(
select CustomerID, Col, Tel
FROM
(
select *, Col='Tel' + RIGHT(
row_number() over (partition by CustomerID
order by case
when Tel like '07%' then 1
else 2
end),10)
from phonenumbers
UNPIVOT (Tel for Seq in (Tel1,Tel2,Tel3,Tel4,Tel5,Tel6)) seqs
) U
) P
PIVOT (MAX(TEL) for Col IN (Tel1,Tel2,Tel3,Tel4,Tel5,Tel6)) V;
SQL Fiddle
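As written, the query above appears to reorder the numbers but not to remove duplicates, since UNPIVOT keeps every non-NULL value. The per-customer logic the question asks for (deduplicate, '07' numbers first, pad with NULLs) can be sketched in Python as follows; `tidy_numbers` is a hypothetical helper, not part of the answer.

```python
def tidy_numbers(tels, width=6):
    """Drop NULLs and duplicates, put '07' numbers first, pad back to `width`."""
    # dict.fromkeys keeps the first occurrence of each number, in order
    unique = list(dict.fromkeys(t for t in tels if t is not None))
    # sorted() is stable, so numbers keep their original order within each group
    ordered = sorted(unique, key=lambda t: not t.startswith("07"))
    return ordered + [None] * (width - len(ordered))


cust001 = ["01222222", "012333333", "07111111", "07222222", "01222222", None]
cust002 = ["07444444", "015333333", "07555555", "07555555", None, None]
print(tidy_numbers(cust001))
print(tidy_numbers(cust002))
```

In SQL, the same effect would need a deduplication step between the UNPIVOT and the ROW_NUMBER(), for example by selecting distinct CustomerID and Tel pairs first.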
Another option is a cursor: iterate over the customer IDs and sort each row's fields with a traditional sorting routine, the way we did in school C++. I'd like to know whether any other method is possible.
If you don't find one, the cursor is the last resort. It will certainly take a long time to execute.

Select column based on whether a specific row in another table exists

Question is similar to this one How to write a MySQL query that returns a temporary column containing flags for whether or not an item related to that row exists in another table
Except that I need to be more specific about which rows exist.
I have two tables: 'competitions' and 'competition_entries'
Competitions:
ID | NAME | TYPE
--------------------------------
1 | Example | example type
2 | Another | example type
Competition Entries
ID | USERID | COMPETITIONID
---------------------------------
1 | 100 | 1
2 | 110 | 1
3 | 110 | 2
4 | 120 | 1
I want to select the competitions but add an additional column which specifies whether the user has entered the competition or not. This is my current SELECT statement
SELECT
c.[ID],
c.[NAME],
c.[TYPE],
(CASE
WHEN e.ID IS NOT NULL AND e.USERID = @userid THEN 1
ELSE 0
END
) AS 'ENTERED'
FROM competitions AS c
LEFT OUTER JOIN competition_entries AS e
ON e.COMPETITIONID = c.ID
My desired result set from setting the @userid parameter to 110 is this
ID | NAME | TYPE | ENTERED
-------------------------------------
1 | Example | example type | 1
2 | Another | example type | 1
But instead I get this
ID | NAME | TYPE | ENTERED
-------------------------------------
1 | Example | example type | 0
1 | Example | example type | 1
1 | Example | example type | 0
2 | Another | example type | 1
This is because the join returns a row for every entry, regardless of user ID.
Fixing your query
SELECT
c.[ID],
c.[NAME],
c.[TYPE],
MAX(CASE
WHEN e.ID IS NOT NULL AND e.USERID = @userid THEN 1
ELSE 0
END
) AS 'ENTERED'
FROM competitions AS c
LEFT OUTER JOIN competition_entries AS e ON e.COMPETITIONID = c.ID
GROUP BY
c.[ID],
c.[NAME],
c.[TYPE]
An alternative is to rewrite it using EXISTS which is pretty much the same but may be easier to understand.
BTW, using single quotes on the column name is deprecated. Use square brackets.
SELECT
c.[ID],
c.[NAME],
c.[TYPE],
CASE WHEN EXISTS (
SELECT *
FROM competition_entries AS e
WHERE e.COMPETITIONID = c.ID
AND e.USERID = @userid) THEN 1 ELSE 0 END AS [ENTERED]
FROM competitions AS c
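A quick way to convince yourself that the EXISTS form returns exactly one row per competition is to run it against the sample data. The sketch below uses Python's built-in sqlite3 instead of SQL Server, so the bracket quoting is dropped and the parameter is passed as `?`; the helper name `entered_flags` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE competitions (id INTEGER, name TEXT, type TEXT);
    CREATE TABLE competition_entries (id INTEGER, userid INTEGER, competitionid INTEGER);
    INSERT INTO competitions VALUES
        (1, 'Example', 'example type'), (2, 'Another', 'example type');
    INSERT INTO competition_entries VALUES
        (1, 100, 1), (2, 110, 1), (3, 110, 2), (4, 120, 1);
""")


def entered_flags(userid):
    """Return (id, name, entered) with exactly one row per competition."""
    return conn.execute("""
        SELECT c.id, c.name,
               CASE WHEN EXISTS (
                   SELECT 1 FROM competition_entries AS e
                   WHERE e.competitionid = c.id AND e.userid = ?
               ) THEN 1 ELSE 0 END AS entered
        FROM competitions AS c
        ORDER BY c.id
    """, (userid,)).fetchall()


print(entered_flags(110))
print(entered_flags(120))
```

Since the subquery is evaluated per competition and never joined, other users' entries cannot multiply the rows.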
