I'm a noob and I've been stuck on this problem for a week. I'll try to explain it.
I have a table for users,
and a table for products.
I want to store data for every user for every product, like if_product_bought, num_of_items, and so on.
The only solution I can think of is a database within a database, that is, creating a copy of the products table inside a database named after each user and storing the data there.
Is this possible, and if so, how? Or is there a better solution?
Thanks in advance
You actually don't create a database within a database (or a table within a table) when you use PostgreSQL or any other SQL RDBMS.
You use tables, and JOIN them. You normally would have an orders table, together with an items_x_orders table, on top of your users and items.
This is a very simplified scenario:
CREATE TABLE users
(
user_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
user_name text
) ;
CREATE TABLE items
(
item_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
item_description text NOT NULL,
item_unit text NOT NULL,
item_standard_price decimal(10,2) NOT NULL
) ;
CREATE TABLE orders
(
order_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(user_id),
order_date DATE NOT NULL DEFAULT now(),
other_data TEXT
) ;
CREATE TABLE items_x_orders
(
order_id INTEGER NOT NULL REFERENCES orders(order_id),
item_id INTEGER NOT NULL REFERENCES items(item_id),
-- You're not supposed to have the same item more than once in an order
-- This makes the following the "natural key" for this table
PRIMARY KEY (order_id, item_id),
item_quantity DECIMAL(10,2) NOT NULL CHECK(item_quantity <> /* > */ 0),
item_percent_discount DECIMAL(5,2) NOT NULL DEFAULT 0.0,
other_data TEXT
) ;
This is all based on the so-called relational model. What you were thinking of is something else, called the hierarchical model, or the document model used in some NoSQL databases (where you store your data as a JSON or XML hierarchical structure).
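For contrast, the "copy of products inside each user" idea is roughly what the document model looks like. In PostgreSQL you could emulate it with a jsonb column; this is only a sketch of that alternative (the users_doc table is hypothetical), and for this use case the separate tables above are the better fit:
CREATE TABLE users_doc
(
user_id INTEGER NOT NULL PRIMARY KEY,
user_name text,
-- purchases stored as a JSON document, e.g. [{"item": "Oranges", "qty": 2.5}]
purchases jsonb
) ;
INSERT INTO users_doc
(user_id, user_name, purchases)
VALUES
(1, 'Alice Cooper', '[{"item": "Oranges", "qty": 2.5}]') ;
The rest of this answer sticks to the relational approach.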
You would fill those tables with data like:
INSERT INTO users
(user_id, user_name)
VALUES
(1, 'Alice Cooper') ;
INSERT INTO items
(item_id, item_description, item_unit, item_standard_price)
VALUES
(1, 'Oranges', 'kg', 0.75),
(2, 'Cookies', 'box', 1.25),
(3, 'Milk', '1l carton', 0.90) ;
INSERT INTO orders
(order_id, user_id)
VALUES
(100, 1) ;
INSERT INTO items_x_orders
(order_id, item_id, item_quantity, item_percent_discount, other_data)
VALUES
(100, 1, 2.5, 0.00, NULL),
(100, 2, 3.0, 0.00, 'I don''t want Oreo'),
(100, 3, 1.0, 5.00, 'Make it promo milk') ;
And then you would produce queries like the following one, where you JOIN all relevant tables:
SELECT
user_name, item_description, item_quantity, item_unit,
item_standard_price, item_percent_discount,
CAST(item_quantity * (item_standard_price * (1-item_percent_discount/100.0)) AS DECIMAL(10,2)) AS items_price
FROM
items_x_orders
JOIN orders USING (order_id)
JOIN items USING (item_id)
JOIN users USING (user_id) ;
...and get these results:
user_name | item_description | item_quantity | item_unit | item_standard_price | item_percent_discount | items_price
:----------- | :--------------- | ------------: | :-------- | ------------------: | --------------------: | ----------:
Alice Cooper | Oranges | 2.50 | kg | 0.75 | 0.00 | 1.88
Alice Cooper | Cookies | 3.00 | box | 1.25 | 0.00 | 3.75
Alice Cooper | Milk | 1.00 | 1l carton | 0.90 | 5.00 | 0.86
You can get all the code and test it at dbfiddle here
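To map this back to the original question ("how many items of each product has a given user bought?"), you can aggregate over the same joins; a small sketch using the tables above:
SELECT
user_name, item_description,
SUM(item_quantity) AS num_of_items
FROM
items_x_orders
JOIN orders USING (order_id)
JOIN items USING (item_id)
JOIN users USING (user_id)
WHERE user_id = 1
GROUP BY user_name, item_description ;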
I have a target table for which partial data arrives at different times from 2 departments. The keys they use are the same, but the fields they provide are different. Most of the rows they provide have common keys, but there are some rows that are unique to each department. My question is about the fields, not the rows:
Scenario
the target table has a key and 30 fields.
Dept. 1 provides fields 1-20
Dept. 2 provides fields 21-30
Suppose I loaded Q1 data from Dept. 1, and that created new rows 100-199 and populated fields 1-20. Later, I receive Q1 data from Dept. 2. Can I execute the same merge code I previously used for Dept. 1 to update rows 100-199 and populate fields 21-30 without unintentionally changing fields 1-20? Alternatively, would I have to tailor separate merge code for each Dept.?
In other words, does (or can) "Merge / Update" operate only on target fields that are present in the source table while ignoring target fields that are NOT present in the source table? In this way, Dept. 1 fields would NOT be modified when merging Dept. 2, or vice-versa, in the event I get subsequent corrections to this data from either Dept.
You can use a MERGE statement, where you define a source and a target, and specify what happens when a record is found in both, only in the source, or only in the target. You can even extend it with custom logic, such as "it is only in the source and it is older than X", or "it is from department Y".
-- I'm skipping the fields 2-20 and 22-30, just to make this shorter.
create table #target (
id int primary key,
field1 varchar(100), -- and so on until 20
field21 varchar(100) -- and so on until 30
)
create table #dept1 (
id int primary key,
field1 varchar(100)
)
create table #dept2 (
id int primary key,
field21 varchar(100)
)
/*
Creates some data to merge into the target.
The expected result is:
| id | field1 | field21 |
| - | - | - |
| 1 | dept1: 1 | dept2: 1 |
| 2 | | dept2: 2 |
| 3 | dept1: 3 | |
| 4 | dept1: 4 | dept2: 4 |
| 5 | | dept2: 5 |
*/
insert into #dept1 values
(1,'dept1: 1'),
--(2,'dept1: 2'),
(3,'dept1: 3'),
(4,'dept1: 4')
insert into #dept2 values
(1,'dept2: 1'),
(2,'dept2: 2'),
--(3,'dept2: 3'),
(4,'dept2: 4'),
(5,'dept2: 5')
-- Inserts the data from the first department. This could also be a merge, if necessary.
insert into #target(id, field1)
select id, field1 from #dept1
merge into #target t
using (select id, field21 from #dept2) as source_data(id, field21)
on (source_data.id = t.id)
when matched then update set field21=source_data.field21
when not matched by source and t.field21 is not null then delete -- you can even use merge to remove some records that match your criteria
when not matched by target then insert (id, field21) values (source_data.id, source_data.field21); -- Every merge statement should end with ;
select * from #target
You can see this code running on this DB Fiddle
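As the comment above notes, the initial Dept. 1 load could itself be a MERGE instead of a plain INSERT, which is useful if Dept. 1 may re-send corrected data later; a minimal sketch using the same temp tables:
merge into #target t
using (select id, field1 from #dept1) as source_data(id, field1)
on (source_data.id = t.id)
when matched then update set field1=source_data.field1
when not matched by target then insert (id, field1) values (source_data.id, source_data.field1);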
I have some tables and want to populate an attribute in one table based on interval values in another table.
The basic idea is to populate the eyeAge attribute with one of the values young, probyotic, or pre-probyotic, depending on the patient's age.
I have the patient's birthdate in the Patient table, and need to populate the last attribute with a value from BirthToEyeAge based on that birthdate, inferring the patient's age.
How can I do this, or what documentation should I read to learn how to do these types of things?
INSERT INTO BirthToEyeAge( bId, minAge, maxAge, eyeAge )
VALUES( 1, 0, 28, 'young' ),
( 2, 29, 59, 'probyotic' ),
( 3, 60, 120, 'pre-probyotic' );
INSERT INTO Patient( patId, firstName, lastName, birthDate )
VALUES( 1, 'Ark', 'May', '1991-7-22' );
INSERT INTO Diagnostic( diagId, date, tear_rate, consId_Consulta, eyeAge )
VALUES( 1, '2019-08-10', 'normal', 1, ??? );
You can join table Patient with BirthToEyeAge, taking advantage of the handy Postgres function age() to compute the age of the patient at the time of the diagnosis. Here is an insert query based on this logic:
insert into Diagnostic( diagId, date, tear_rate, consId_Consulta, eyeAge )
select d.*, b.bId
from
(select 1 diagId, '2018-08-10'::date date, 'normal' tear_rate, 1 consId_Consulta ) d
inner join patient p
on d.consId_Consulta = p.patId
inner join BirthToEyeAge b
on extract(year from age(d.date, p.birthDate)) between b.minAge and b.maxAge;
In this demo on DB Fiddle, after creating the tables, initializing their content, and running the above query, the content of Diagnostic is:
| diagid | date | tear_rate | consid_consulta | eyeage |
| ------ | ------------------------ | --------- | --------------- | ------ |
| 1 | 2018-08-10T00:00:00.000Z | normal | 1 | 1 |
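If rows already exist in Diagnostic with eyeAge left empty, the same join logic can back-fill them with an UPDATE; a small sketch under the same assumptions (consId_Consulta referencing patId):
update Diagnostic d
set eyeAge = b.bId
from patient p, BirthToEyeAge b
where d.consId_Consulta = p.patId
and extract(year from age(d.date, p.birthDate)) between b.minAge and b.maxAge
and d.eyeAge is null;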
Hi, just wondering if this scenario is possible.
I have two tables and a relationship table to create a many-to-many relationship between the two tables. See the below tables for a simple representation:
| Security ID | Security Group |
| ----------- | -------------- |
| 1 | Admin |
| 2 | Basic |

| Security ID | Access ID |
| ----------- | --------- |
| 1 | NULL |
| 2 | 1 |

| Function ID | Function Code |
| ----------- | ------------- |
| 1 | Search |
| 2 | Delete |
What I want to achieve is while checking the relationship table I want to return all functions a user on a security group has access to. If the user is assigned to a security group that contains a NULL value in the relationship table then grant them access to all functions.
For instance, a user on the "Basic" security group would have access to the search function while a user on the "Admin" security group should have access to both Search and Delete.
The reason it is set up this way is that a user can have 0 to many security groups, and the list of functions is very large, requiring a whitelist of functions you can access rather than a blacklist of functions you can't access.
Thank you for your time.
A sample of your tables:
CREATE TABLE #G
(
Security_ID INT,
Security_Group VARCHAR(32)
)
INSERT INTO #G
VALUES (1, 'Admin'), (2, 'Basic')
CREATE TABLE #A
(
Security_ID INT,
Access_ID INT
)
INSERT INTO #A
VALUES (1, NULL), (2, 1)
CREATE TABLE #F
(
Function_ID INT,
Function_CODE VARCHAR(32)
)
INSERT INTO #F
VALUES (1, 'Search'), (2, 'Delete')
Query:
SELECT #G.Security_Group, #F.Function_CODE
FROM #G
JOIN #A ON #G.Security_ID = #A.Security_ID
JOIN #F ON #F.Function_ID = #A.Access_ID OR #A.Access_ID IS NULL
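Since you mention a user can belong to zero or many security groups, you can join through a user-to-group mapping and take the distinct set of functions; the #U table below is hypothetical (not part of your post), and remember to drop it along with the other temp tables:
CREATE TABLE #U
(
User_ID INT,
Security_ID INT
)
INSERT INTO #U
VALUES (1, 1), (1, 2), (2, 2)
SELECT DISTINCT #U.User_ID, #F.Function_CODE
FROM #U
JOIN #A ON #U.Security_ID = #A.Security_ID
JOIN #F ON #F.Function_ID = #A.Access_ID OR #A.Access_ID IS NULL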
Dropping the sample tables:
DROP TABLE #G
DROP TABLE #A
DROP TABLE #F
I'm trying to add values in a junction table of a many to many relationship.
Tables look like these (all IDs are integers):
Table A
+------+----------+
| id_A | ext_id_A |
+------+----------+
| 1 | 100 |
| 2 | 101 |
| 3 | 102 |
+------+----------+
Table B is conceptually similar
+------+----------+
| id_B | ext_id_B |
+------+----------+
| 1 | 200 |
| 2 | 201 |
| 3 | 202 |
+------+----------+
The tables' PKs are id_A and id_B, and the columns in my junction table are FKs to those columns, but I have to insert values having only the external IDs (ext_id_A, ext_id_B).
The external IDs are unique columns (and therefore 1:1 with the table's own ID), so given an ext_id I can look up the exact row and get the id I need to insert into the junction table.
This is an example of what I've done so far, but it doesn't look like an optimized SQL statement:
-- Example table I receive with test values
declare #temp as table (
ext_id_a int not null,
ext_id_b int not null
);
insert into #temp values (100, 200), (101, 200), (101, 201);
--Insertion - code from my sp
declare #final as table (
id_a int not null,
id_b int not null
);
insert into #final
select a.id_a, b.id_b
from #temp as t
inner join table_a a on a.ext_id_a = t.ext_id_a
inner join table_b b on b.ext_id_b = t.ext_id_b
merge into junction_table as jt
using #final as f
on f.id_a = jt.id_a and f.id_b = jt.id_b
when not matched by target then
insert (id_a, id_b) values (f.id_a, f.id_b);
I was thinking about a MERGE statement, since my stored procedure receives the data in a table-valued parameter and I also have to check for already existing references.
Is anything I can do to improve insertion of these values?
No need to use the #final table variable:
; with cte as (
select tA.id_A, tB.id_B
from #temp t
join table_A tA on t.ext_id_a = tA.ext_id_A
join table_B tB on t.ext_id_B = tB.ext_id_B
)
merge into junction_table
using cte
on cte.id_A = junction_table.id_A and cte.id_B = junction_table.id_B
when not matched by target then
insert (id_A, id_B) values (cte.id_A, cte.id_B);
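Since your stored procedure receives the pairs in a table-valued parameter, the whole thing can be wrapped like this; the type and procedure names here are hypothetical, just to sketch the shape:
create type ExtIdPairs as table (
ext_id_a int not null,
ext_id_b int not null
);
go
create procedure AddJunctionRows @pairs ExtIdPairs readonly
as
begin
; with cte as (
select tA.id_A, tB.id_B
from @pairs t
join table_A tA on t.ext_id_a = tA.ext_id_A
join table_B tB on t.ext_id_B = tB.ext_id_B
)
merge into junction_table
using cte
on cte.id_A = junction_table.id_A and cte.id_B = junction_table.id_B
when not matched by target then
insert (id_A, id_B) values (cte.id_A, cte.id_B);
end
go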
I'm storing a last-touched time in a User table in Postgres, but there are many frequent updates and enough contention that I can see examples of 3 of the same updates deadlocking.
Cassandra seems a better fit for this - but should I devote a table to just this purpose? And I don't need old timestamps, just the latest. Should I use something other than Cassandra?
If I should use Cassandra, any tips on table properties?
The table I have in mind:
CREATE TABLE ksp1.user_last_job_activities (
user_id bigint,
touched_at timeuuid,
PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Update
Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.
Like so:
CREATE TABLE simple_user_last_activity (
user_id bigint,
touched_at timestamp,
PRIMARY KEY (user_id)
);
Then:
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT touched_at from simple_user_last_activity WHERE user_id = 6;
Since touched_at is no longer in the primary key, only one record per user is stored.
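That relies on Cassandra's upsert behaviour: a second INSERT with the same primary key simply overwrites the first, so there is never more than one row per user. A quick check (a small sketch against the table above):
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
-- Still a single row for user 6; the later write wins.
SELECT COUNT(*) FROM simple_user_last_activity WHERE user_id = 6;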
Update 2
There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:
CREATE TABLE final_user_last_job_activities (
user_id bigint,
touched_at timestamp,
job_id bigint,
PRIMARY KEY (user_id, touched_at)
)
WITH CLUSTERING ORDER BY (touched_at DESC)
AND default_time_to_live = 604800;
Adding the 1-week TTL takes care of expiring records - if there are none I return current time.
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
SELECT * FROM final_user_last_job_activities LIMIT 1;
Which gives me:
user_id | touched_at | job_id
---------+--------------------------+--------
5 | 2015-06-17 12:43:30+1200 | 6
Simple benchmarks show no significant performance difference in storing or reading from the bigger table.
Because C* (Cassandra) is last-write-wins, you can simply keep the latest version of each row.
You could, as MSD suggests, use writetime to pull the time of the write. But be careful, because it is column-specific and you can't use writetime on your primary key columns. For example, in a table like the following:
cqlsh> create TABLE test.test ( a int, b int, c int, d int, primary key (a))
... ;
cqlsh> insert INTO test.test (a, b, c, d) VALUES ( 1,2,3,4)
... ;
cqlsh> select * from test.test
... ;
a | b | c | d
---+------+---+------
1 | 2 | 3 | 4
(1 rows)
cqlsh> insert into test.test (a,c) values (1, 6);
cqlsh> select * from test.test ;
a | b | c | d
---+------+---+------
1 | 2 | 6 | 4
(1 rows)
cqlsh> select writetime(a), writetime(b), writetime(c), writetime(d) from test.test
... ;
InvalidRequest: code=2200 [Invalid query] message="Cannot use selection function writeTime on PRIMARY KEY part a"
cqlsh> select writetime(b), writetime(c), writetime(d) from test.test ;
writetime(b) | writetime(c) | writetime(d)
------------------+------------------+------------------
1434424690700887 | 1434424690700887 | 1434424702420929
Otherwise, you can add a CQL column with the timestamp:
create TABLE test.test ( a int, b int, c int, d int, touched_at timeuuid, primary key (a)) ;
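Using that extra column would look something like this (a small sketch; with a timeuuid column you write now() and read the time back with dateOf()):
INSERT INTO test.test (a, b, c, d, touched_at) VALUES (1, 2, 3, 4, now());
SELECT a, dateOf(touched_at) FROM test.test WHERE a = 1;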
Some quick benchmarking would help you determine which is more performant.
Cassandra has implicit support for writetime for each column. See this; it looks like that is what you are looking for here.