Use Values from JSONB Array inside a WHERE IN Clause

I have a JSONB object in PostgreSQL:
'{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}'
I am trying to use values from the "cars" array inside it in the WHERE clause of a SELECT:
SELECT car_id FROM cars WHERE car_type IN ('bmw', 'mercedes', 'pinto');
This will correctly return the values 1, 2, and 3 - see table setup at the bottom of this post.
Currently, in my function I do this:
(1) Extract the "cars" array into a variable `v_car_results`.
(2) Use that variable in the `WHERE` clause.
Pseudo code:
DECLARE v_car_results TEXT;
BEGIN
v_car_results := '{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}'::json#>>'{cars}';
-- this returns 'bmw', 'mercedes', 'pinto'
SELECT car_id FROM cars WHERE car_type IN ( v_car_results );
END
However, the SELECT statement is not returning any rows. I know it's reading those 3 car types as a single type. (If I only include one car_type in the "cars" element, the query works fine.)
How would I treat these values as an array inside the WHERE clause?
I've tried a few other things:
The ANY clause.
Various attempts at casting.
These queries:
SELECT car_id FROM cars
WHERE car_type IN (json_array_elements_text('["bmw", "mercedes", "pinto"]'));
...
WHERE car_type IN ('{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}':json->>'cars');
It feels like it's something simple I'm missing. But I've fallen down the rabbit hole on this one. (Maybe I shouldn't even be using the ::json#>> operator?)
TABLE SETUP
CREATE TABLE cars (
car_id SMALLINT
, car_type VARCHAR(255)
);
INSERT INTO cars (car_id, car_type)
VALUES
(1, 'bmw')
, (2, 'mercedes')
, (3, 'pinto')
, (4, 'corolla');
SELECT car_id FROM cars
WHERE car_type IN ('bmw', 'mercedes', 'pinto'); -- Returns Values : 1, 2, 3

Assuming at least the current Postgres 9.5.
Use the set-returning function jsonb_array_elements_text() (as a table function!) and join to the result:
SELECT c.car_id
FROM jsonb_array_elements_text('{"cars": ["bmw", "mercedes", "pinto"]
, "user_name": "ed"}'::jsonb->'cars') t(car_type)
JOIN cars c USING (car_type);
Extract the JSON array from the object with jsonb->'cars' and pass the resulting JSON array (still data type jsonb) to the function. (The operator #> would do the job as well.)
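To illustrate the #> alternative, a minimal sketch against the literal from the question:
SELECT '{"cars": ["bmw", "mercedes", "pinto"], "user_name": "ed"}'::jsonb #> '{cars}';
-- returns the jsonb array: ["bmw", "mercedes", "pinto"]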
Aside: ::json#>> isn't just an operator. It's a cast to json (::json), followed by the operator #>>. You don't need either.
The resulting type text conveniently matches your column type varchar(255), so no type-casting is needed. Assigning the column name car_type allows the syntax shorthand with USING in the join condition.
This form is shorter, more elegant and typically a bit faster than alternatives with IN () or = ANY() - which would work too. Your attempts were pretty close, but you need a variant with a subquery. This would work:
SELECT car_id FROM cars
WHERE car_type IN (SELECT json_array_elements_text('["bmw", "mercedes", "pinto"]'));
Or, cleaner:
SELECT car_id FROM cars
WHERE car_type IN (SELECT * FROM json_array_elements_text('["bmw", "mercedes", "pinto"]'));
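And the = ANY() variant mentioned above would look like this (a sketch using the same sample array):
SELECT car_id FROM cars
WHERE car_type = ANY (SELECT json_array_elements_text('["bmw", "mercedes", "pinto"]'));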
Detailed explanation:
How to use ANY instead of IN in a WHERE clause?
Related:
How to turn JSON array into Postgres array?


PostgreSQL aggregate over json arrays

I have seen a lot of references to using json_array_elements for extracting the elements of a JSON array. However, this appears to only work on exactly one array. If I use it in a generic query, I get the error
ERROR: cannot call json_array_elements on a scalar
Given something like this:
orders
------
{"order_id": "2", "items": [{"name": "apple", "price": 1.10}]}
{"order_id": "3", "items": [{"name": "apple", "price": 1.10}, {"name": "banana", "price": 0.99}]}
I would like to extract
item   | count
-------|------
apple  | 2
banana | 1
Or
item   | total_value_sold
-------|-----------------
apple  | 2.20
banana | 0.99
Is it possible to aggregate over json arrays like this using json_array_elements?
Use jsonb_array_elements() on orders->'items' to flatten the data:
select elem->>'name' as name, (elem->>'price')::numeric as price
from my_table
cross join jsonb_array_elements(orders->'items') as elem;
It is easy to get the aggregates you want from the flattened data:
select name, count(*), sum(price) as total_value_sold
from (
select elem->>'name' as name, (elem->>'price')::numeric as price
from my_table
cross join jsonb_array_elements(orders->'items') as elem
) s
group by name;
Db<>fiddle.
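If your column is json rather than jsonb, the json_array_elements() counterpart works the same way (a sketch under that assumption):
select elem->>'name' as name, (elem->>'price')::numeric as price
from my_table
cross join json_array_elements(orders->'items') as elem;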

Count rows that follows other rows in a single table both restricted with a where clause

I'm using SQL Server 2014.
I have a table that contains several millions of events. The primary key is composed of three columns:
Time (datetime)
user (bigint)
context (varchar(50))
I have another column, value (nvarchar(max)).
I need to count rows restricted on
context = 'somecontext' and value = 'value2'
that follow in time rows restricted on
context = 'somecontext' and value = 'value1'
for the same user.
For Example with the following records:
Time user context value
2019-02-22 14:56:57.710 359586015014836 somecontext value1
2019-02-22 15:13:42.887 359586015014836 somecontext value2 <------ Need to count rows like this one only.
It is "recorded" 15 min after the first one and the user and context are the same.
I have seen other similar questions like this one or that one.
Should I make a JOIN on the same table? Use subqueries? Maybe a CTE? I'm concerned about performance, which should be optimal.
The idea would be to use query features available in this version of the DB engine.
If the example that I made in a comment is what you want, then you can use the following code,
assuming that you want to select all the rows where context = 'c1', the current value = 'v1', and the next value = 'v3' when ordered by time:
declare @t table
(
Time_ DateTime,
user_ bigint,
context varchar(50),
value_ varchar(50)
);
insert into @t values
('20000101', 1, 'c1', 'v1'),
('20000102', 1, 'c2', 'v3'),
('20000103', 1, 'c1', 'v3'),
('20000104', 2, 'c1', 'v1'),
('20000105', 2, 'c1', 'v4'),
('20000106', 2, 'c1', 'v2');
with cte as
(
select *,
lead(value_) over(partition by user_ order by time_) as next_value
from @t
where context = 'c1'
)
select *
from cte
where next_value = 'v3';
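Adapted to the names in the original question (the table name events is an assumption), the same windowing idea counts each 'value2' row whose immediately preceding 'value1'/'value2' event for the same user and context was 'value1':
with ordered as
(
select *,
lag([value]) over(partition by [user], context order by [Time]) as prev_value
from events
where context = 'somecontext'
and [value] in ('value1', 'value2')
)
select count(*) as cnt
from ordered
where [value] = 'value2'
and prev_value = 'value1';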

PostgreSQL: WHERE id = ANY (idsArray) - sort results by idsArray

Is there a trick I'm missing when using a WHERE...ANY style query in Postgres to order the results by the given array? I.e.:
SELECT *
FROM table
WHERE id = ANY (<idsInDesiredOrder>)
Perhaps array_position can help you:
WITH sample (id, description) AS (
VALUES
(1, 'lorem ipsum'),
(2, 'lorem ipsum'),
(3, 'lorem ipsum'),
(4, 'lorem ipsum')
)
SELECT
*
FROM
sample
WHERE
id = ANY (ARRAY[3,2])
ORDER BY
array_position(ARRAY[3,2], id);
As Marc mentioned, SQL will never guarantee ordering without an ORDER BY clause. Historically, cases like this were problematic in Postgres, because you couldn't write an ORDER BY which referenced the array index.
They addressed this in 9.4 by adding the WITH ORDINALITY clause for set-returning function calls, allowing you to unpack an array into a "values" column and an "index" column:
SELECT * FROM table
JOIN UNNEST(<preSortedIds>) WITH ORDINALITY u(id,pos) USING (id)
ORDER BY pos
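A concrete sketch of the same pattern (the table name and ids are assumptions):
SELECT t.*
FROM unnest(ARRAY[3,1,2]) WITH ORDINALITY AS u(id, pos)
JOIN my_table t USING (id)
ORDER BY u.pos;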
In SQL, when you want an ordered result set, you must use an ORDER BY clause. Otherwise the results come back in an unpredictable order. So rewrite your query as
SELECT *
FROM table
WHERE id = ANY (<preSortedIds>)
ORDER BY id

Any reason why I shouldn't use "between X and Y" on a varchar field in SQL to return a number?

I've got an indexed (but not unique) varchar field of Employee IDs in a table, and in a query I need to return rows that are exactly 4 numerical characters but also over 1000.
I've found various questions on here about using validation methods to check that the field contains 0-9 characters, or doesn't contain a-z characters etc, but these are unrelated to this question.
Background:
I've got a table with various values, sample set as follows:
EmployeeID
----------
6745
EMP1
EMP2
1874
LTST
5694
0014
What I would like to do is return all values except EMP1, EMP2, LTST and 0014.
My question is: are there any reasons why I shouldn't use a WHERE clause like WHERE EmployeeID BETWEEN '1000' AND '9999', given that EmployeeID is a varchar column?
If I can do this, should I also Order By employee ID, or does this not matter?
I believe "0014" would be left out of the where clause between '1000' and '9999', so that's a reason. Perhaps between '0000' and '9999' would suit your purposes better. Just remember that you're still sorting based on text. If you have any entries like "1_99", this would also show up in your query results with your given between clause.
If you're looking to only return 4-character numbers excluding leading zeroes, then the following addition should suffice:
WHERE EmployeeID BETWEEN '1000' AND '9999' AND TRY_CAST(EmployeeID As int) IS NOT NULL
...or, more intuitively:
WHERE TRY_CAST(EmployeeID As int) BETWEEN 1000 AND 9999
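Against the sample data this behaves as desired, since TRY_CAST returns NULL for the non-numeric values (the table name Employees is an assumption):
SELECT EmployeeID
FROM Employees
WHERE TRY_CAST(EmployeeID AS int) BETWEEN 1000 AND 9999;
-- returns 6745, 1874, 5694; EMP1, EMP2, LTST and 0014 are excluded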
Run the following code as an example and you'll see that SQL Server doesn't treat INT the same as integers stored as VARCHAR:
WITH IntsAsVars
AS (
SELECT var = '1000',
int = 1000
UNION ALL
SELECT var = '100',
int = 100
UNION ALL
SELECT var = '9999',
int = 9999
UNION ALL
SELECT var = '99',
int = 99
UNION ALL
SELECT var = '750',
int = 750
UNION ALL
SELECT var = '10',
int = 10
UNION ALL
SELECT var = '2',
int = 2
)
SELECT *
FROM IntsAsVars
--WHERE var BETWEEN '2' AND '750'
/* should return 2, 10, 99, 100 & 750 if it works like INT
but does it? */
ORDER BY
--var ASC,
int ASC;
Running it (and then uncommenting the WHERE clause) shows that SQL Server doesn't consider the other records to be between '2' and '750' when they are stored as varchar.
If your real data is exactly like the sample data in that all non-numeric values begin with a letter, you could use your query to achieve the desired result.
However, be aware of the sort order of the data. If you have an EmployeeID of 1ABC, it will be included in the data returned by WHERE EmployeeID BETWEEN '1000' AND '9999'!
Your approach is not suitable to filter out non-numeric values!
An additional ORDER BY affects the order of the results only, it has no effect on the evaluation of the WHERE condition.
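To make that concrete, a small sketch (table name assumed): the ORDER BY below changes presentation only, and exactly the same rows qualify with or without it:
SELECT EmployeeID
FROM Employees
WHERE EmployeeID BETWEEN '1000' AND '9999'
ORDER BY EmployeeID;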
I'd say the simplest way is to use LIKE:
select * from yourtable
where EmployeeID like '[1-9][0-9][0-9][0-9]'
Let's say you have this input:
IF OBJECT_ID('tempdb..#test') IS NOT NULL
DROP TABLE #test
CREATE TABLE #test
(
EmployeeID VARCHAR(255)
)
CREATE CLUSTERED INDEX CIX_test_EmployeeID ON #test(EmployeeID)
INSERT INTO #test
VALUES
('6745'),
('EMP1'),
('EMP2'),
('1874'),
('LTST'),
('5694'),
('1000'),
('9999'),
('10L'),
('187'),
('9X9'),
('7est'),
('1ok'),
('0_o'),
('0014');
Your statement would also return '1ok', '187', '10L' and so on.
Since you mentioned that your employeeID has a fixed length, you could use something like this:
SELECT *
FROM #test
WHERE EmployeeID LIKE '[1-9][0-9][0-9][0-9]'

SQLite UPSERT / UPDATE OR INSERT

I need to perform UPSERT / INSERT OR UPDATE against a SQLite Database.
There is the command INSERT OR REPLACE, which in many cases can be useful. But if you want to keep your IDs with autoincrement in place because of foreign keys, it does not work, since it deletes the row and creates a new one, and consequently this new row has a new ID.
This would be the table:
players - (primary key on id, user_name unique)
| id | user_name | age |
------------------------------
| 1982 | johnny | 23 |
| 1983 | steven | 29 |
| 1984 | pepee | 40 |
Q&A Style
Well, after researching and fighting with the problem for hours, I found out that there are two ways to accomplish this, depending on the structure of your table and whether you have foreign key restrictions activated to maintain integrity. I'd like to share this in a clean format to save some time for people who may be in my situation.
Option 1: You can afford deleting the row
In other words, you don't have foreign keys, or if you have them, your SQLite engine is configured so that there are no integrity exceptions. The way to go is INSERT OR REPLACE. If you are trying to insert/update a player whose ID already exists, the SQLite engine will delete that row and insert the data you are providing. Now the question comes: what to do to keep the old ID associated?
Let's say we want to UPSERT with the data user_name='steven' and age=32.
Look at this code:
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
coalesce((select id from players where user_name='steven'),
(select max(id) from players) + 1),
'steven',
32)
The trick is in coalesce. It returns the id of the user 'steven' if any, and otherwise, it returns a new fresh id.
Option 2: You cannot afford deleting the row
After monkeying around with the previous solution, I realized that in my case that could end up destroying data, since this ID works as a foreign key for another table. Besides, I created the table with the clause ON DELETE CASCADE, which would mean that it'd delete data silently. Dangerous.
So, I first thought of an IF clause, but SQLite only has CASE. And this CASE can't be used (or at least I did not manage to) to perform one UPDATE query if EXISTS(select id from players where user_name='steven'), and an INSERT if it didn't. No go.
And then, finally, I used brute force, with success. The logic is, for each UPSERT that you want to perform, first execute an INSERT OR IGNORE to make sure there is a row with our user, and then execute an UPDATE query with exactly the same data you tried to insert.
Same data as before: user_name='steven' and age=32.
-- make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
-- make sure it has the right data
UPDATE players SET user_name='steven', age=32 WHERE user_name='steven';
And that's all!
EDIT
As Andy has commented, trying to insert first and then update may lead to firing triggers more often than expected. This is not, in my opinion, a data safety issue, but it is true that firing unnecessary events makes little sense. Therefore, an improved solution would be:
-- Try to update any existing row
UPDATE players SET age=32 WHERE user_name='steven';
-- Make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
This is a late answer. Starting from SQLite 3.24.0, released on June 4, 2018, there is finally support for the UPSERT clause following PostgreSQL syntax.
INSERT INTO players (user_name, age)
VALUES('steven', 32)
ON CONFLICT(user_name)
DO UPDATE SET age=excluded.age;
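The same statement also covers multi-row upserts; a minimal sketch (still 3.24.0+, same table assumed):
INSERT INTO players (user_name, age)
VALUES ('steven', 32), ('johnny', 20)
ON CONFLICT(user_name)
DO UPDATE SET age=excluded.age;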
Note: For those having to use a version of SQLite earlier than 3.24.0, please reference this answer below (posted by me, @MarqueIV).
However if you do have the option to upgrade, you are strongly encouraged to do so as unlike my solution, the one posted here achieves the desired behavior in a single statement. Plus you get all the other features, improvements and bug fixes that usually come with a more recent release.
Here's an approach that doesn't require the brute-force 'ignore' which would only work if there was a key violation. This way works based on any conditions you specify in the update.
Try this...
-- Try to update any existing row
UPDATE players
SET age=32
WHERE user_name='steven';
-- If no update happened (i.e. the row didn't exist) then insert one
INSERT INTO players (user_name, age)
SELECT 'steven', 32
WHERE (Select Changes() = 0);
How It Works
The 'magic sauce' here is using Changes() in the Where clause. Changes() represents the number of rows affected by the last operation, which in this case is the update.
In the above example, if there are no changes from the update (i.e. the record doesn't exist) then Changes() = 0 so the Where clause in the Insert statement evaluates to true and a new row is inserted with the specified data.
If the Update did update an existing row, then Changes() = 1 (or more accurately, not zero if more than one row was updated), so the 'Where' clause in the Insert now evaluates to false and thus no insert will take place.
The beauty of this is there's no brute-force needed, nor unnecessarily deleting, then re-inserting data which may result in messing up downstream keys in foreign-key relationships.
Additionally, since it's just a standard Where clause, it can be based on anything you define, not just key violations. Likewise, you can use Changes() in combination with anything else you want/need anywhere expressions are allowed.
The problem with all the presented answers is a complete lack of taking triggers (and probably other side effects) into account.
Solution like
INSERT OR IGNORE ...
UPDATE ...
leads to both triggers being executed (for the insert and then for the update) when the row does not exist.
The proper solution is
UPDATE OR IGNORE ...
INSERT OR IGNORE ...
in that case only one of the statements takes effect, whether the row exists or not.
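A concrete sketch of that ordering, using the sample data from this thread:
-- Try to update any existing row (affects no rows, and fires no row triggers, if none matches)
UPDATE OR IGNORE players SET age=32 WHERE user_name='steven';
-- Insert only if the row was missing; ignored on a unique-key conflict
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);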
To have a pure UPSERT with no holes (for programmers) that doesn't rely on unique and other keys:
UPDATE players SET user_name="gil", age=32 WHERE user_name='george';
SELECT changes();
SELECT changes() will return the number of updates done by the last query.
Then check if the return value from changes() is 0; if so, execute:
INSERT INTO players (user_name, age) VALUES ('gil', 32);
Option 1: Insert -> Update
If you'd like to avoid both changes()=0 and INSERT OR IGNORE, even if you cannot afford to delete the row, you can use this logic:
First, insert (if not exists) and then update by filtering with the unique key.
Example
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Insert if NOT exists (check the unique key only; checking age too would
-- cause a UNIQUE violation when 'johnny' exists with a different age)
INSERT INTO players (user_name, age)
SELECT 'johnny', 20
WHERE NOT EXISTS (SELECT 1 FROM players WHERE user_name='johnny');
-- Update (will affect row, only if found)
-- no point to update user_name to 'johnny' since it's unique, and we filter by it as well
UPDATE players
SET age=20
WHERE user_name='johnny';
Regarding Triggers
Notice: I haven't tested to see which triggers are being called, but I assume the following:
if the row does not exist
BEFORE INSERT
INSERT using INSTEAD OF
AFTER INSERT
BEFORE UPDATE
UPDATE using INSTEAD OF
AFTER UPDATE
if the row does exist
BEFORE UPDATE
UPDATE using INSTEAD OF
AFTER UPDATE
Option 2: Insert or replace - keep your own ID
In this way you can have a single SQL command:
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Single command to insert or update
INSERT OR REPLACE INTO players
(id, user_name, age)
VALUES ((SELECT id from players WHERE user_name='johnny' AND age=20),
'johnny',
20);
Edit: added option 2.
You can also just add an ON CONFLICT REPLACE clause to your user_name unique constraint and then just INSERT away, leaving it to SQLite to figure out what to do in case of a conflict. See: https://sqlite.org/lang_conflict.html.
Also note the sentence regarding delete triggers: When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, delete triggers fire if and only if recursive triggers are enabled.
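A minimal sketch of what that could look like (schema adapted from the question):
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name TEXT NOT NULL UNIQUE ON CONFLICT REPLACE,
age INTEGER NOT NULL
);
-- A plain INSERT now replaces a conflicting row. Note the caveat above:
-- the replace deletes the old row, so its id is not preserved.
INSERT INTO players (user_name, age) VALUES ('steven', 32);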
For those who don't have the latest version of SQLite available, you can still do it in a single statement using INSERT OR REPLACE, but beware you need to set all the values. This "clever" SQL works by use of a left join on the table into which you are inserting/updating, and ifnull:
import sqlite3
con = sqlite3.connect( ":memory:" )
cur = con.cursor()
cur.execute("create table test( id varchar(20) PRIMARY KEY, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
[ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())
con.commit()
cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test
select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
{ 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
cur.executemany( upsert, upsert_data )
cur.execute('select * from test')
print( cur.fetchall())
The first few lines of that code set up the table, with a single ID primary key column and two values. It then enters data with IDs 'A' and 'B'.
The second section creates the 'upsert' text, and calls it for 2 rows of data, one with the ID of 'B' which is found and one with 'C' which is not found.
When you run it, you'll see that the output is:
$python3 main.py
[('A', 1, 8), ('B', 3, 10)]
[('A', 1, 8), ('B', 4, 10), ('C', 3, 12)]
B 'updated' its value to 4, but value2 (5) was ignored; C was inserted.
Note: this does not work if your table has an auto-incremented primary key as INSERT OR REPLACE will replace the number with a new one.
A slight modification adds such a column:
import sqlite3
con = sqlite3.connect( ":memory:" )
cur = con.cursor()
cur.execute("create table test( pkey integer primary key autoincrement not null, id varchar(20) UNIQUE not null, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
[ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())
con.commit()
cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test (id, value, value2)
select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
{ 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
cur.executemany( upsert, upsert_data )
cur.execute('select * from test')
print( cur.fetchall())
output is now:
$python3 main.py
[(1, 'A', 1, 8), (2, 'B', 3, 10)]
[(1, 'A', 1, 8), (3, 'B', 4, 10), (4, 'C', 3, 12)]
Note that pkey 2 is replaced with 3 for id 'B'.
This is therefore not ideal but is a good solution when:
You don't have an auto-generated primary key
You want to create an 'upsert' query with bound parameters
You want to use executemany() to merge in multiple rows of data in one go.
