The use case is this: each user can create their own games and keep track of the countries in which they played each game.
I would like to create one query that gives me a list of all games for that user and the countries in which each game was played. I am only interested in the country id.
I have 4 tables: users, games, countries and a games_countries_xref table.
CREATE SEQUENCE countries_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 2147483647 START 1 CACHE 1;
CREATE TABLE "public"."countries" (
"id" integer DEFAULT nextval('countries_id_seq') NOT NULL,
"name" character varying(200) NOT NULL,
CONSTRAINT "countries_pkey" PRIMARY KEY ("id")
) WITH (oids = false);
INSERT INTO "countries" ("id", "name") VALUES
(1, 'USA'),
(2, 'Japan'),
(3, 'Australia');
CREATE SEQUENCE games_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 2147483647 START 3 CACHE 1;
CREATE TABLE "public"."games" (
"id" integer DEFAULT nextval('games_id_seq') NOT NULL,
"user_id" integer NOT NULL,
"name" character varying(200) NOT NULL,
CONSTRAINT "games_pkey" PRIMARY KEY ("id")
) WITH (oids = false);
INSERT INTO "games" ("id", "user_id", "name") VALUES
(1, 1, 'Monopoly'),
(2, 1, 'Zelda'),
(3, 2, 'Hide & Seek');
CREATE TABLE "public"."games_countries_xref" (
"game_id" integer NOT NULL,
"country_id" integer NOT NULL
) WITH (oids = false);
INSERT INTO "games_countries_xref" ("game_id", "country_id") VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 2),
(3, 1);
CREATE SEQUENCE users_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 2147483647 START 2 CACHE 1;
CREATE TABLE "public"."users" (
"id" integer DEFAULT nextval('users_id_seq') NOT NULL,
"name" character varying(200) NOT NULL,
CONSTRAINT "users_pkey" PRIMARY KEY ("id")
) WITH (oids = false);
INSERT INTO "users" ("id", "name") VALUES
(1, 'Jack'),
(2, 'Jason');
When querying the data, I tried using ARRAY_AGG:
WITH country_ids AS (
SELECT g.user_id, ARRAY_AGG(gcx.country_id) AS country_ids
FROM games AS g
LEFT JOIN games_countries_xref AS gcx ON g.id = gcx.game_id
GROUP BY g.user_id
)
SELECT g.name, country_ids
FROM games AS g
NATURAL LEFT JOIN country_ids
WHERE g.user_id = 1
but that gives me this output:
name | country_ids
------------------
Monopoly | {1,2,3,2}
Zelda | {1,2,3,2}
while I am looking for this:
name | country_ids
------------------
Monopoly | {1,2,3}
Zelda | {2}
I know I am likely doing something wrong in the subquery, but I can't figure out what.
Any ideas?
You are on the right track with ARRAY_AGG, but just a little overaggressive with the joins. You just need a simple join (one left, one inner) on the three tables:
select g.name, array_agg(gcx.country_id) as country_ids
from games g
join users u on u.id = g.user_id
left join games_countries_xref gcx on gcx.game_id = g.id
where u.id = 1
group by g.name;
+----------+-------------+
| name | country_ids |
+----------+-------------+
| Monopoly | {1,2,3} |
| Zelda | {2} |
+----------+-------------+
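One caveat worth noting (an addition beyond the original answer): if a game has no rows in games_countries_xref, the LEFT JOIN makes ARRAY_AGG return {NULL} rather than an empty array. A FILTER clause plus COALESCE avoids that; this is a sketch, assuming PostgreSQL 9.4 or later:

```sql
-- Variant: returns an empty array instead of {NULL} for games
-- with no countries (FILTER requires PostgreSQL 9.4+)
select g.name,
       coalesce(array_agg(gcx.country_id)
                  filter (where gcx.country_id is not null),
                '{}') as country_ids
from games g
left join games_countries_xref gcx on gcx.game_id = g.id
where g.user_id = 1
group by g.name;
```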
Related
I have an Item table:
Id | Title | Active
====================
1 | Item 1 | 1
2 | Item 2 | 1
A Location table:
Id | Name
=========
1 | A1
2 | B1
and a link table, where EventId specifies a cycle count event:
Id | EventId | ItemId | LocationId
===|=========|========|===========
1 | 1 | 1 | 2
2 | 1 | 2 | 1
3 | 2 | 1 | 1
4 | 2 | 2 | 2
5 | 3 | 1 | 1
I need to determine which items haven't been cycle-counted for a specified EventId (which in this example would be ItemId 2 for EventId 3). We're using a code generation tool that only supports tables and views with a simple filter, so I can't use a sproc or table-valued function. Ideally we'd like to be able to do this:
SELECT [EventId], [ItemId] FROM [SomeView] WHERE [EventId] = 3
and get a result like
EventId | ItemId
================
3 | 2
I've tried to wrap my head around this -- unsuccessfully -- because I know it's difficult to query a negative. Is this even possible?
Is something like the following what you're after?
select l.eventId, x.Id ItemId
from Link l
cross apply (
select *
from Items i
where i.Id != l.ItemId
) x
where l.EventId = 3;
-- data to work with
DECLARE @items TABLE (ID int, Title nvarchar(100), Active int)
INSERT INTO @items VALUES (1, 'Item 1', 1)
INSERT INTO @items VALUES (2, 'Item 2', 1)
DECLARE @location TABLE (ID int, Name nvarchar(100))
INSERT INTO @location VALUES (1, 'A1')
INSERT INTO @location VALUES (2, 'B1')
DECLARE @linkTable TABLE (ID int, EventId int, ItemId int, LocationId int)
INSERT INTO @linkTable VALUES (1, 1, 1, 2)
INSERT INTO @linkTable VALUES (2, 1, 2, 1)
INSERT INTO @linkTable VALUES (3, 2, 1, 1)
INSERT INTO @linkTable VALUES (4, 2, 2, 2)
INSERT INTO @linkTable VALUES (5, 3, 1, 1)
INSERT INTO @linkTable VALUES (6, 4, 2, 1)
-- query you want
SELECT 3 as EventID, ID as ItemID
FROM @items i
WHERE ID not in (SELECT ItemId
                 FROM @linkTable
                 WHERE EventId = 3)
Get all the ItemIDs from the LinkTable and then get all the items from the Items table that don't have the sync event. You can replace the 3 in the WHERE and SELECT clauses with whatever event you are looking for. And if you want all such event + item pairs, this should do it:
SELECT subData.EventId, subData.ItemID
FROM (SELECT i.ID as ItemID, cj.EventId
      FROM @items i CROSS JOIN (SELECT DISTINCT EventId
                                FROM @linkTable) cj) subData
LEFT JOIN @linkTable lt ON lt.EventId = subData.EventId AND lt.ItemId = subData.ItemID
WHERE lt.ID IS NULL
This could be heavy on performance because of the CROSS JOIN, DISTINCT, and subqueries, but it gets the job done. First you build the set of all possible item + event pairs, then left join the link table to it; if the link table's ID is null, there is no row for that event + item pair, which means the item is not synced for that event.
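If NOT IN makes you nervous (it misbehaves when the subquery can return NULLs), a NOT EXISTS version of the all-pairs query behaves the same but is NULL-safe. This is a sketch wrapped as a view, assuming permanent tables named Items and LinkTable (both names hypothetical) rather than the table variables above, since a view cannot reference table variables:

```sql
-- Hypothetical view over permanent tables Items and LinkTable:
-- every (EventId, ItemId) pair that is missing from the link table.
CREATE VIEW MissingItemsPerEvent AS
SELECT e.EventId, i.Id AS ItemId
FROM Items i
CROSS JOIN (SELECT DISTINCT EventId FROM LinkTable) e
WHERE NOT EXISTS (SELECT 1
                  FROM LinkTable lt
                  WHERE lt.EventId = e.EventId
                    AND lt.ItemId = i.Id);
```

The code generation tool could then issue a simple filtered select such as SELECT EventId, ItemId FROM MissingItemsPerEvent WHERE EventId = 3.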
I have the tables below (to simplify, I only show a piece of each table as an example, not all of its content):
CREATE TABLE InstArtRel
(
id INT IDENTITY(1, 1) NOT NULL,
idIns INT,
ExpDateRev DATE,
codArticle NVARCHAR(4),
PRIMARY KEY (id)
);
INSERT INTO InstArtRel (idIns, ExpDateRev, codArticle)
VALUES (17400, datefromparts(2018, 10, 1), 'X509'),
(17400, datefromparts(2020, 12, 2), 'X529'),
(17400, datefromparts(2016, 9, 10), 'T579'),
(17400, datefromparts(2017, 6, 7), 'Z669'),
(10100, datefromparts(2019, 8, 17), 'TG09'),
(10100, datefromparts(2018, 3, 28), 'TG09'),
(10100, datefromparts(2018, 4, 24), 'TG09'),
(10100, datefromparts(2016, 7, 12), 'TG09');
CREATE TABLE Installations
(
idIns INT NOT NULL,
DateIns DATETIME,
PRIMARY KEY (idIns)
);
INSERT INTO Installations (idIns, DateIns)
VALUES (17400, '2020-12-01'),
(10100, '2022-05-07');
For each idIns in the Installations table I need to update its DateIns column from the ExpDateRev column in the InstArtRel table based on the following rules:
If all codArticle values for an idIns in InstArtRel are the same, then the DateIns column in Installations will be updated for the corresponding idIns with the maximum value of ExpDateRev.
Otherwise, if the codArticle values are NOT all the same for an idIns in InstArtRel, then the DateIns column in Installations will be updated for the corresponding idIns with the minimum value of ExpDateRev.
An example is better... taking into account the rules above, the result in this case will be:
idIns | DateIns
------+-----------
17400 | 2016-09-10
10100 | 2019-08-17
An aggregate with CASE will help you.
Query:
SELECT idIns,
       CASE WHEN COUNT(DISTINCT codArticle) = 1 THEN MAX(ExpDateRev)
            ELSE MIN(ExpDateRev)
       END AS DateIns
FROM InstArtRel
GROUP BY idIns
Output:
| idIns | DateIns |
|-------|------------|
| 10100 | 2019-08-17 |
| 17400 | 2016-09-10 |
UPDATE Query:
UPDATE I
SET I.DateIns = R.DateIns
FROM Installations I
JOIN (
    SELECT idIns,
           CASE WHEN COUNT(DISTINCT codArticle) = 1 THEN MAX(ExpDateRev)
                ELSE MIN(ExpDateRev)
           END AS DateIns
    FROM InstArtRel
    GROUP BY idIns
) R ON R.idIns = I.idIns
SQL Fiddle link
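An equivalent formulation (my sketch, not part of the original answer) avoids COUNT(DISTINCT ...) entirely: if MIN(codArticle) equals MAX(codArticle), every value in the group is the same. This assumes codArticle is never NULL, since MIN and MAX ignore NULLs:

```sql
-- Same result as the COUNT(DISTINCT ...) version, assuming no NULL codArticle
SELECT idIns,
       CASE WHEN MIN(codArticle) = MAX(codArticle)
            THEN MAX(ExpDateRev)
            ELSE MIN(ExpDateRev)
       END AS DateIns
FROM InstArtRel
GROUP BY idIns
```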
Actually I'm a noob and have been stuck on this problem for a week. I will try to explain it.
I have a table for users,
and a table for products.
I want to store data for every user for every product, like if_product_bought, num_of_items, and so on.
The only solution I can think of is a database within a database, that is, creating a copy of the products inside a per-user database and storing the data there.
If this is possible, how? Or is there a better solution?
Thanks in advance
You actually don't create a database within a database (or a table within a table) when you use PostgreSQL or any other SQL RDBMS.
You use tables, and JOIN them. You normally would have an orders table, together with an items_x_orders table, on top of your users and items.
This is a very simplified scenario:
CREATE TABLE users
(
user_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
user_name text
) ;
CREATE TABLE items
(
item_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
item_description text NOT NULL,
item_unit text NOT NULL,
item_standard_price decimal(10,2) NOT NULL
) ;
CREATE TABLE orders
(
order_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(user_id),
order_date DATE NOT NULL DEFAULT now(),
other_data TEXT
) ;
CREATE TABLE items_x_orders
(
order_id INTEGER NOT NULL REFERENCES orders(order_id),
item_id INTEGER NOT NULL REFERENCES items(item_id),
-- You're supposed not to have the item more than once in an order
-- This makes the following the "natural key" for this table
PRIMARY KEY (order_id, item_id),
item_quantity DECIMAL(10,2) NOT NULL CHECK(item_quantity <> /* > */ 0),
item_percent_discount DECIMAL(5,2) NOT NULL DEFAULT 0.0,
other_data TEXT
) ;
This is all based on the so-called relational model. What you were thinking about is something else, called the hierarchical model, or the document model used in some NoSQL databases (where you store your data as a JSON or XML hierarchical structure).
You would fill those tables with data like:
INSERT INTO users
(user_id, user_name)
VALUES
(1, 'Alice Cooper') ;
INSERT INTO items
(item_id, item_description, item_unit, item_standard_price)
VALUES
(1, 'Oranges', 'kg', 0.75),
(2, 'Cookies', 'box', 1.25),
(3, 'Milk', '1l carton', 0.90) ;
INSERT INTO orders
(order_id, user_id)
VALUES
(100, 1) ;
INSERT INTO items_x_orders
(order_id, item_id, item_quantity, item_percent_discount, other_data)
VALUES
(100, 1, 2.5, 0.00, NULL),
(100, 2, 3.0, 0.00, 'I don''t want Oreo'),
(100, 3, 1.0, 0.05, 'Make it promo milk') ;
And then you would produce queries like the following one, where you JOIN all relevant tables:
SELECT
user_name, item_description, item_quantity, item_unit,
item_standard_price, item_percent_discount,
CAST(item_quantity * (item_standard_price * (1-item_percent_discount/100.0)) AS DECIMAL(10,2)) AS items_price
FROM
items_x_orders
JOIN orders USING (order_id)
JOIN items USING (item_id)
JOIN users USING (user_id) ;
...and get these results:
user_name | item_description | item_quantity | item_unit | item_standard_price | item_percent_discount | items_price
:----------- | :--------------- | ------------: | :-------- | ------------------: | --------------------: | ----------:
Alice Cooper | Oranges | 2.50 | kg | 0.75 | 0.00 | 1.88
Alice Cooper | Cookies | 3.00 | box | 1.25 | 0.00 | 3.75
Alice Cooper | Milk | 1.00 | 1l carton | 0.90 | 5.00 | 0.86
You can get all the code and test at dbfiddle here
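Coming back to the original question (if_product_bought, num_of_items), a per-user, per-item summary falls out of the same joins. This is a sketch over the tables above; the output column names are taken from the question:

```sql
-- For every (user, item) pair: whether the user ever ordered the item
-- and the total quantity ordered across all their orders.
SELECT u.user_name,
       i.item_description,
       COUNT(x.order_id) > 0             AS if_product_bought,
       COALESCE(SUM(x.item_quantity), 0) AS num_of_items
FROM users u
CROSS JOIN items i
LEFT JOIN orders o ON o.user_id = u.user_id
LEFT JOIN items_x_orders x ON x.order_id = o.order_id
                          AND x.item_id = i.item_id
GROUP BY u.user_name, i.item_description ;
```

The CROSS JOIN guarantees one output row per (user, item) pair even for items the user never bought; the LEFT JOINs then leave those rows with zero counts instead of dropping them.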
I have a virtual folder structure saved in the database and I want to get the breadcrumbs from the current folder up to the root. The data can be unsorted (though sorted would be better) and I want only the parent folders of the current folder.
The table definition is:
CREATE TABLE Folders (
FOL_PK INT IDENTITY(1,1) NOT NULL,
FOL_Name VARCHAR(200) NOT NULL,
FOL_FOL_FK INT NULL -- Foreign key to parent
)
And this is my solution:
DECLARE @FOL_PK INT = 5 -- Current folder PK
DECLARE @breadcrumbs TABLE (
FOL_PK INT NOT NULL,
FOL_Name VARCHAR(200) NOT NULL,
FOL_FOL_FK INT NULL
)
DECLARE @isRoot BIT = 0
,@currentFolderPK INT
,@parentFK INT
-- Get current and parent folder PK
SELECT
@currentFolderPK = FOL_PK
FROM
Folders
WHERE
FOL_PK = @FOL_PK
-- Breadcrumb
WHILE (@isRoot = 0)
BEGIN
-- Save to breadcrumb
INSERT INTO @breadcrumbs
SELECT
FOL_PK,
FOL_Name,
FOL_FOL_FK
FROM
Folders
WHERE
FOL_PK = @currentFolderPK
-- Set parent as current
SET @currentFolderPK =
(
SELECT
FOL_FOL_FK
FROM
Folders
WHERE
FOL_PK = @currentFolderPK
)
-- Set flag for loop
SET @isRoot = CASE
WHEN ISNULL(@currentFolderPK, 0) = 0 THEN 1
ELSE 0
END
END
-- Return breadcrumbs
SELECT
FOL_PK AS PK,
FOL_Name AS Name,
FOL_FOL_FK AS ParentFK
FROM
@breadcrumbs
The problem is that I am not very comfortable with the loop. Is there a more sophisticated way to do this?
Try this using a recursive Common Table Expression (CTE):
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE [Folders](
[FOL_PK] [int] IDENTITY(1,1) NOT NULL,
[FOL_Name] [varchar](200) NOT NULL,
[FOL_FOL_FK] [int] NULL,
CONSTRAINT [PK__Folders__FOL_PK] PRIMARY KEY CLUSTERED
(
[FOL_PK] ASC
))
ALTER TABLE [dbo].[Folders]
WITH CHECK ADD CONSTRAINT [FK_Folders_Folders] FOREIGN KEY([FOL_FOL_FK])
REFERENCES [dbo].[Folders] ([FOL_PK])
ALTER TABLE [dbo].[Folders] CHECK CONSTRAINT [FK_Folders_Folders]
INSERT INTO Folders(FOL_Name, FOL_FOL_FK)
VALUES ('Level 1', NULL),
('Level 1.1', 1),
('Level 1.2', 1),
('Level 1.3', 1),
('Level 1.2.1', 3),
('Level 1.2.2', 3),
('Level 1.2.3', 3),
('Level 1.2.2.1', 6),
('Level 1.2.2.2', 6),
('Level 1.2.2.3', 6),
('Level 1.3.1', 4),
('Level 1.3.2', 4)
Query 1:
DECLARE #FolderId Int = 9
;WITH CTE
AS
(
SELECT FOL_PK AS PK, FOL_NAME As Name, FOL_FOL_FK AS ParentFK
FROM Folders
WHERE FOL_PK = #FolderId
UNION ALL
SELECT F.FOL_PK AS PK, F.FOL_NAME AS Name, F.FOL_FOL_FK AS ParentFK
FROM Folders F
INNER JOIN CTE C
ON C.ParentFK = F.FOL_PK
)
SELECT *
FROM CTE
Results:
| PK | Name | ParentFK |
|----|---------------|----------|
| 9 | Level 1.2.2.2 | 6 |
| 6 | Level 1.2.2 | 3 |
| 3 | Level 1.2 | 1 |
| 1 | Level 1 | (null) |
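If you want the whole breadcrumb as a single string, the same CTE can carry the path upward as it recurses (a sketch; the ' > ' separator is arbitrary). Note that SQL Server caps recursion at 100 levels by default; OPTION (MAXRECURSION n) raises that limit:

```sql
DECLARE @FolderId Int = 9
;WITH CTE
AS
(
    -- Anchor: start at the current folder
    SELECT FOL_PK, FOL_FOL_FK,
           CAST(FOL_Name AS VARCHAR(MAX)) AS Path
    FROM Folders
    WHERE FOL_PK = @FolderId
    UNION ALL
    -- Recursive step: prepend each parent's name to the path
    SELECT F.FOL_PK, F.FOL_FOL_FK,
           CAST(F.FOL_Name + ' > ' + C.Path AS VARCHAR(MAX))
    FROM Folders F
    INNER JOIN CTE C ON C.FOL_FOL_FK = F.FOL_PK
)
-- The row whose ParentFK is NULL holds the full root-to-leaf path
SELECT TOP 1 Path
FROM CTE
WHERE FOL_FOL_FK IS NULL
OPTION (MAXRECURSION 1000)
```

With the sample data above this builds the path from 'Level 1' down to 'Level 1.2.2.2'.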
I'm storing a last-touched time in a User table in Postgres, but there are many frequent updates and enough contention that I can see examples of 3 of the same updates deadlocking.
Cassandra seems a better fit for this - but should I devote a table to just this purpose? And I don't need old timestamps, just the latest. Should I use something other than Cassandra?
If I should use Cassandra, any tips on table properties?
The table I have in mind:
CREATE TABLE ksp1.user_last_job_activities (
user_id bigint,
touched_at timeuuid,
PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Update
Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.
Like so:
CREATE TABLE simple_user_last_activity (
user_id bigint,
touched_at timestamp,
PRIMARY KEY (user_id)
);
Then:
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT touched_at from simple_user_last_activity WHERE user_id = 6;
Since touched_at is no longer in the primary key, only one record per user is stored.
Update 2
There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:
CREATE TABLE final_user_last_job_activities (
user_id bigint,
touched_at timestamp,
job_id bigint,
PRIMARY KEY (user_id, touched_at)
)
WITH CLUSTERING ORDER BY (touched_at DESC)
AND default_time_to_live = 604800;
Adding the 1-week TTL takes care of expiring records - if there are none I return current time.
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
SELECT * FROM final_user_last_job_activities LIMIT 1;
Which gives me:
user_id | touched_at | job_id
---------+--------------------------+--------
5 | 2015-06-17 12:43:30+1200 | 6
Simple benchmarks show no significant performance difference in storing or reading from the bigger table.
Because C* is last-write-wins, you can simply keep the latest version of each row.
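A minimal illustration of last-write-wins (the table name is hypothetical): two inserts against the same partition key leave only the latest value, with no read-modify-write and therefore none of the contention seen in Postgres:

```sql
CREATE TABLE test.last_touch (
    user_id bigint PRIMARY KEY,
    touched_at timestamp
);
-- Both statements are upserts; the second overwrites the first.
INSERT INTO test.last_touch (user_id, touched_at) VALUES (1, '2015-06-17 12:00:00');
INSERT INTO test.last_touch (user_id, touched_at) VALUES (1, '2015-06-17 12:05:00');
-- Returns a single row holding the 12:05 timestamp.
SELECT * FROM test.last_touch WHERE user_id = 1;
```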
You could, as MSD suggests, use writetime to pull the time of the write. But be careful, because this is column-specific and you can't use writetime on your primary key columns. For example, in a table as follows:
cqlsh> create TABLE test.test ( a int, b int, c int, d int, primary key (a))
... ;
cqlsh> insert INTO test.test (a, b, c, d) VALUES ( 1,2,3,4)
... ;
cqlsh> select * from test.test
... ;
a | b | c | d
---+------+---+------
1 | 2 | 3 | 4
(1 rows)
cqlsh> insert into test.test (a,c) values (1, 6);
cqlsh> select * from test.test ;
a | b | c | d
---+------+---+------
1 | 2 | 6 | 4
(1 rows)
cqlsh> select writetime(a), writetime(b), writetime(c), writetime(d) from test.test
... ;
InvalidRequest: code=2200 [Invalid query] message="Cannot use selection function writeTime on PRIMARY KEY part a"
cqlsh> select writetime(b), writetime(c), writetime(d) from test.test ;
writetime(b) | writetime(c) | writetime(d)
------------------+------------------+------------------
1434424690700887 | 1434424690700887 | 1434424702420929
Otherwise you can add a cql column with the timestamp:
create TABLE test.test ( a int, b int, c int, d int, touched_at timeuuid, primary key (a)) ;
Some quick benchmarking would help you determine which is more performant.
Cassandra has implicit support for writetime on each column. See this; it looks like what you are looking for here.