Is there a trick I'm missing when using a WHERE...ANY style query in Postgres to order the results by the given array? ie:
SELECT *
FROM table
WHERE id = ANY (<idsInDesiredOrder>)
Perhaps array_position can help you:
WITH sample (id, description) AS (
VALUES
(1, 'lorem ipsum'),
(2, 'lorem ipsum'),
(3, 'lorem ipsum'),
(4, 'lorem ipsum')
)
SELECT
*
FROM
sample
WHERE
id = ANY (ARRAY[3,2])
ORDER BY
array_position(ARRAY[3,2], id);
As Marc mentioned, SQL will never guarantee ordering without an ORDER BY clause. Historically, cases like this were problematic in Postgres, because you couldn't write an ORDER BY which referenced the array index.
They addressed this in 9.4 by adding the WITH ORDINALITY clause for set-returning function calls, allowing you to unpack an array into a "values" column and an "index" column:
SELECT * FROM table
JOIN UNNEST(<preSortedIds>) WITH ORDINALITY u(id,pos) USING (id)
ORDER BY pos
In SQL, when you want to order a result use, you must use an "ORDER BY" clause. Else the results are in a inpredictable order. So rewrite you query as
SELECT *
FROM table
WHERE id = ANY (<preSortedIds>)
ORDER BY id
Related
I am relatively new to Snowflake and struggle a bit with setting up a transformation for a semi-structured dataset. I have several log data batches, where each batch (table row in Snowflake) has the following columns: LOG_ID, COLUMN_NAMES, and LOG_ENTRIES .
COLUMN_NAMES contains a semicolon-separated list of columns names, e.g.:
“TIMESTAMP;Sensor A;Sensor B”, “TIMESTAMP;Sensor B;Sensor C”
LOG_ENTRIES:entry contains a semicolon separated list of values, e.g.
“2020-02-11 09:08:19; 99.24;12.25”
The COLUMN_NAMES string can be different between log batches (Snowflake rows), but the names in the order they appear describe the content of the LOG_ENTRIES column values of the same row. My goal is to transform the data into a table that has column names for all unique values present in the COLUMN_NAMES column, e.g.:
LOG_ID
TIMESTAMP
Sensor A
Sensor B
Sensor C
1
2020-02-11 09:08:19
99.24
12.25
NaN
2
2020-02-11 09:10:44
NaN
13.32
0.947
Can this be achieved with a snowflake script, and if so, how? :)
Best regards,
Johan
You should use the SPLIT_TO_TABLE function, split the two values, and join them by index.
After that, all you have to do is use PIVOT to invert the table.
Sample data:
create or replace table splittable (LOG_ID int, COLUMN_NAMES varchar, LOG_ENTRIES varchar);
insert into splittable (LOG_ID, COLUMN_NAMES, LOG_ENTRIES)
values (1, 'TIMESTAMP;Sensor A;Sensor B', '2020-02-11 09:08:19;99.24;12.25'),
(2, 'TIMESTAMP;Sensor B;Sensor C', '2020-02-11 09:10:44;13.32;0.947');
Solution proposal:
WITH src AS (
select LOG_ID, cn.VALUE as COLUMN_NAMES, le.VALUE as LOG_ENTRIES
from splittable as st,
lateral split_to_table(st.COLUMN_NAMES, ';') as cn,
lateral split_to_table(st.LOG_ENTRIES, ';') as le
where cn.INDEX = le.INDEX
)
select * from src
pivot (min(LOG_ENTRIES) for COLUMN_NAMES in ('TIMESTAMP','Sensor A','Sensor B','Sensor C'))
order by LOG_ID;
Reference: SPLIT_TO_TABLE, PIVOT
If the column list is variable and you can't define it then you have to write some generator, maybe it will help: CREATE A DYNAMIC PIVOT IN SNOWFLAKE
You could transform the data into an ACTUAL semi-structured data type that you can then natively query using Snowflake SQL.
WITH x AS (
SELECT column_names, log_entries
FROM (VALUES ('TIMESTAMP_;SENSOR1','2021-02-01'||';1.2')) x (column_names, log_entries)
),
y AS (
SELECT *
FROM x,
LATERAL FLATTEN(input => split(column_names,';')) f
),
z AS (
SELECT *
FROM x,
LATERAL FLATTEN(input => split(log_entries,';')) f
)
SELECT listagg(('"'||y.value||'":"'||z.value||'"'),',') as cnt
, parse_json('{'||cnt||'}') as var
FROM y
JOIN z
ON y.seq = z.seq
AND y.index = z.index
GROUP BY y.seq;
I am trying to order values that are going to be inserted into another table based on their order value from a secondary table. We are migrating old data to a new table with a slightly different structure.
I thought I would be able to accomplish this by using the string_split function but I'm not very skilled with SQL and so I am running into some issues.
Here is what I have:
UPDATE lse
SET Options = a.Options
FROM
dbo.LessonStepElement as lse
CROSS JOIN
(
SELECT
tbl1.*
tbl2.Options,
tbl2.QuestionId
FROM
dbo.TrainingQuestionAnswer as tbl1
JOIN (
SELECT
string_agg((CASE
WHEN tqa.CorrectAnswer = 1 THEN REPLACE(tqa.AnswerText, tqa.AnswerText, '*' + tqa.AnswerText)
ELSE tqa.AnswerText
END),
char(10)) as Options,
tq.Id as QuestionId
FROM
dbo.TrainingQuestionAnswer as tqa
INNER JOIN
dbo.TrainingQuestion as tq
on tq.Id = tqa.TrainingQuestionId
INNER JOIN
dbo.Training as t
on t.Id = tq.TrainingId
WHERE
t.IsDeleted = 0
and tq.IsDeleted = 0
and tqa.IsDeleted = 0
GROUP BY
tq.Id,
tqa.AnswerDisplayOrder
ORDER BY
(SELECT [Value] FROM STRING_SPLIT((SELECT AnswerDisplayOrder FROM dbo.TrainingQuestion WHERE Id = tmq.Id), ','))
) as tbl2
on tbl1.TrainingQuestionId = tbl2.QuestionId
) a
WHERE
a.TrainingQuestionId = lse.TrainingQuestionId
The AnswerDisplayOrder that I am using is just a nvarchar comma separated list of the ids for the answers to the question.
Here is an example:
I have 3 rows in the TrainingQuestionAnswer table that look like the following.
ID TrainingQuestionId AnswerText
-------------------------------------------
215 100 No
218 100 Yes
220 100 I'm not sure
I have 1 row in the TrainingQuestion table that looks like the following.
ID AnswerDisplayOrder
--------------------------
100 "218,215,220"
Now what I am trying to do is when I update the row in the new table with all of the answers combined, the answers will need to be in the correct order which is dependent on the AnswerDisplayOrder in the TrainingQuestion table. So in essence, the new table would have a row that would look similar to the following.
ID Options
--------------
193 "Yes No I'm not sure"
I'm aware that the way I'm trying to do it might not be even possible at all. I am still learning and would just love some advice or guidance on how to make this work. I also know that string_split does not guarantee order. I'm open to other suggestions that do guarantee the order as well.
I simplified the issue in the question to the following approach, that is a possible solution to your problem. If you want to get the results from the question, you need a splitter, that returns the substrings and the positions of the substrings. STRING_SPLIT() is available from SQL Server 2016, but is not an option here, because (as is mentioned in the documentation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try to use a JSON based approach, with a small string manipulation, that transforms answers IDs into a valid JSON array (218,215,220 into [218,215,220]). After that you can easily parse this JSON array with OPENJSON() and default schema. The result is a table, with columns key, value and type and the key column (again from the documentation) holds the index of the element in the specified array.
Tables:
CREATE TABLE TrainingQuestionId (
ID int,
TrainingQuestionId int,
AnswerText varchar(1000)
)
INSERT INTO TrainingQuestionId
(ID, TrainingQuestionId, AnswerText)
VALUES
(215, 100, 'No'),
(218, 100, 'Yes'),
(220, 100, 'I''m not sure')
CREATE TABLE TrainingQuestion (
ID int,
AnswerDisplayOrder varchar(1000)
)
INSERT INTO TrainingQuestion
(ID, AnswerDisplayOrder)
VALUES
(100, '218,215,220')
Statement:
SELECT tq.ID, oa.Options
FROM TrainingQuestion tq
OUTER APPLY (
SELECT STRING_AGG(tqi.AnswerText, ' ') WITHIN GROUP (ORDER BY CONVERT(int, j.[key])) AS Options
FROM OPENJSON(CONCAT('[', tq.AnswerDisplayOrder, ']')) j
LEFT JOIN TrainingQuestionId tqi ON TRY_CONVERT(int, j.[value]) = tqi.ID
) oa
Result:
ID Options
100 Yes No I'm not sure
Notes: You need SQL Server 2017+ to use STRING_AGG(). For SQL Server 2016 you need a FOR XML to aggregate strings.
declare #TrainingQuestionAnswer table
(
ID int,
TrainingQuestionId int,
AnswerText varchar(20)
);
insert into #TrainingQuestionAnswer(ID, TrainingQuestionId, AnswerText)
values(215, 100, 'No'), (218, 100, 'Yes'), (220, 100, 'I''m not sure');
declare #TrainingQuestiontest table
(
testid int identity,
QuestionId int,
AnswerDisplayOrder varchar(200)
);
insert into #TrainingQuestiontest(QuestionId, AnswerDisplayOrder)
values(100, '218,215,220'), (100, '220,218,215'), (100, '215,218');
select *,
(
select string_agg(pci.AnswerText, '==') WITHIN GROUP ( ORDER BY pci.pos)
from
(
select a.AnswerText,
pos = charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder,','))
from #TrainingQuestionAnswer as a
where a.TrainingQuestionId = q.QuestionId
and charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder,',')) >= 1
) as pci
) as TestAnswerText
from #TrainingQuestiontest as q;
I have the following values in the SQL Server table:
But I need to build query from which output look like this:
I know that I should probably use combination of substring and charindex but I have no idea how to do it.
Could you please help me how the query should like?
Thank you!
Try the following, it may work.
SELECT
offerId,
cTypes
FROM yourTable AS mt
CROSS APPLY
EXPLODE(mt.contractTypes) AS dp(cTypes);
You can use string_split function :
select t.offerid, trim(translate(tt.value, '[]"', ' ')) as contractTypes
from table t cross apply
string_split(t.contractTypes, ',') tt(value);
The data in each row in the contractTypes column is a valid JSON array, so you may use OPENJSON() with explicit schema (result is a table with columns defined in the WITH clause) to parse this array and get the expected results:
Table:
CREATE TABLE Data (
offerId int,
contractTypes varchar(1000)
)
INSERT INTO Data
(offerId, contractTypes)
VALUES
(1, '[ "Hlavni pracovni pomer" ]'),
(2, '[ "ÖCVS", "Staz", "Prahovne" ]')
Table:
SELECT d.offerId, j.contractTypes
FROM Data d
OUTER APPLY OPENJSON(d.contractTypes) WITH (contractTypes varchar(100) '$') j
Result:
offerId contractTypes
1 Hlavni pracovni pomer
2 ÖCVS
2 Staz
2 Prahovne
As an additional option, if you want to return the position of the contract type in the contractTypes array, you may use OPENJSON() with default schema (result is a table with columns key, value and type and the value in the key column is the 0-based index of the element in the array):
SELECT
d.offerId,
CONVERT(int, j.[key]) + 1 AS contractId,
j.[value] AS contractType
FROM Data d
OUTER APPLY OPENJSON(d.contractTypes) j
ORDER BY CONVERT(int, j.[key])
Result:
offerId contractId contractType
1 1 Hlavni pracovni pomer
2 1 ÖCVS
2 2 Staz
2 3 Prahovne
I'm using SQL Server 2014.
I have a table that contains several millions of events. The primary key is composed of three columns:
Time DateTime
user (bigint)
context (varchar(50))
I have another column with a value (nvarchar(max))
I need to count rows restricted on
context = 'somecontext' and value = 'value2'
that follows in time rows restricted on
context = 'somecontext' and value = 'value1'
for the same user.
For Example with the following records:
Time user context value
2019-02-22 14:56:57.710 359586015014836 somecontext value1
2019-02-22 15:13:42.887 359586015014836 somecontext value2 <------ Need to count this rows only.
It is "recorded" 15 min after the first one and the user and context are the same.
I have seen other similar questions like this one or that one.
Should I make a JOIN on the same table? Use subqueries? may be a CTE? I'm concerned about performance that should be optimal.
The idea would be to use query features available in this version of the DB engine.
If the example that I made in comment is what you want than you can use the following code
assuming that you want to select all the rows where context = 'c1', current value = 'v1', next value = 'v3' if ordered by time:
declare #t table
(
Time_ DateTime,
user_ bigint,
context varchar(50),
value_ varchar(50)
);
insert into #t values
('20000101', 1, 'c1', 'v1'),
('20000102', 1, 'c2', 'v3'),
('20000103', 1, 'c1', 'v3'),
('20000104', 2, 'c1', 'v1'),
('20000105', 2, 'c1', 'v4'),
('20000106', 2, 'c1', 'v2');
with cte as
(
select *,
lead(value_) over(partition by user_ order by time_) as next_value
from #t
where context = 'c1'
)
select *
from cte
where next_value = 'v3';
I have a JSONB object in PostgreSQL:
'{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}'
I am trying to use values from the "cars" array inside it in the WHERE clause of a SELECT:
SELECT car_id FROM cars WHERE car_type IN ('bmw', 'mercedes', 'pinto');
This will correctly return the values 1, 2, and 3 - see table setup at the bottom of this post.
Currently, in my function I do this:
(1) Extract the "cars" array into a variable `v_car_results`.
(2) Use that variable in the `WHERE` clause.
Pseudo code:
DECLARE v_car_results TEXT
BEGIN
v_car_results = '{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}'::json#>>'{cars}';
-- this returns 'bmw', 'mercedes', 'pinto'
SELECT car_id FROM cars WHERE car_type IN ( v_car_results );
END
However, the SELECT statement is not returning any rows. I know it's reading those 3 car types as a single type. (If I only include one car_type in the "cars" element, the query works fine.)
How would I treat these values as an array inside the WHERE clause?
I've tried a few other things:
The ANY clause.
Various attempts at casting.
These queries:
SELECT car_id FROM cars
WHERE car_type IN (json_array_elements_text('["bmw", "mercedes", "pinto"]'));
...
WHERE car_type IN ('{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}':json->>'cars');
It feels like it's something simple I'm missing. But I've fallen down the rabbit hole on this one. (Maybe I shouldn't even be using the ::json#>> operator?)
TABLE SETUP
CREATE TABLE cars (
car_id SMALLINT
, car_type VARCHAR(255)
);
INSERT INTO cars (car_id, car_type)
VALUES
(1, 'bmw')
, (2, 'mercedes')
, (3, 'pinto')
, (4, 'corolla');
SELECT car_id FROM cars
WHERE car_type IN ('bmw', 'mercedes', 'pinto'); -- Returns Values : 1, 2, 3
Assuming at least the current Postgres 9.5.
Use the set-returning function jsonb_array_elements_text() (as table function!) and join to the result:
SELECT c.car_id
FROM jsonb_array_elements_text('{"cars": ["bmw", "mercedes", "pinto"]
, "user_name": "ed"}'::jsonb->'cars') t(car_type)
JOIN cars c USING (car_type);
Extract the JSON array from the object with jsonb->'cars' and pass the resulting JSON array (still data type jsonb) to the function. (The operator #> would do the job as well.)
Aside: ::json#>> isn't just an operator. It's a cast to json (::json), followed by the operator #>>. You don't need either.
The resulting type text conveniently matches your column type varchar(255), so we don't need type-casting. And assign the column name car_type to allow for the syntax shorthand with USING in the join condition.
This form is shorter, more elegant and typically a bit faster than alternatives with IN () or = ANY() - which would work too. Your attempts were pretty close, but you need a variant with a subquery. This would work:
SELECT car_id FROM cars
WHERE car_type IN (SELECT json_array_elements_text('["bmw", "mercedes", "pinto"]'));
Or, cleaner:
SELECT car_id FROM cars
WHERE car_type IN (SELECT * FROM json_array_elements_text('["bmw", "mercedes", "pinto"]'));
Detailed explanation:
How to use ANY instead of IN in a WHERE clause?
Related:
How to turn JSON array into Postgres array?