Best way to store last-touched time in Cassandra

I'm storing a last-touched time in a User table in Postgres, but there are many frequent updates and enough contention that I can see three instances of the same update deadlocking against each other.
Cassandra seems a better fit for this, but should I devote a table to just this purpose? I don't need old timestamps, just the latest. Or should I use something other than Cassandra?
If I should use Cassandra, any tips on table properties?
The table I have in mind:
CREATE TABLE ksp1.user_last_job_activities (
    user_id bigint,
    touched_at timeuuid,
    PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
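With the DESC clustering order, reading my most recent touch for a user would be a single-partition query; a sketch (42 is just an example user_id):
SELECT touched_at FROM ksp1.user_last_job_activities WHERE user_id = 42 LIMIT 1;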
Update
Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.
Like so:
CREATE TABLE simple_user_last_activity (
    user_id bigint,
    touched_at timestamp,
    PRIMARY KEY (user_id)
);
Then:
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT touched_at from simple_user_last_activity WHERE user_id = 6;
Since touched_at is no longer in the primary key, only one record per user is stored.
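As a sanity check, repeated inserts for the same user_id simply overwrite the row (Cassandra inserts are upserts), so something like this always leaves exactly one record:
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT COUNT(*) FROM simple_user_last_activity WHERE user_id = 6; -- returns 1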
Update 2
There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:
CREATE TABLE final_user_last_job_activities (
    user_id bigint,
    touched_at timestamp,
    job_id bigint,
    PRIMARY KEY (user_id, touched_at)
)
WITH CLUSTERING ORDER BY (touched_at DESC)
AND default_time_to_live = 604800;
Adding the one-week TTL takes care of expiring records; if there are none, I return the current time.
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
SELECT * FROM final_user_last_job_activities LIMIT 1;
Which gives me:
user_id | touched_at | job_id
---------+--------------------------+--------
5 | 2015-06-17 12:43:30+1200 | 6
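Note that without a WHERE clause that SELECT just returns the first row of an arbitrary partition; to read a specific user's latest activity I restrict by partition key:
SELECT * FROM final_user_last_job_activities WHERE user_id = 5 LIMIT 1;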
Simple benchmarks show no significant performance difference in storing or reading from the bigger table.

Because C* is last-write-wins, you can simply keep the latest version of each row.
You could, as MSD suggests, use writetime to pull the time of the write. But be careful: writetime is column-specific, and you can't use it on primary key columns. For example, in a table as follows:
cqlsh> create TABLE test.test ( a int, b int, c int, d int, primary key (a))
... ;
cqlsh> insert INTO test.test (a, b, c, d) VALUES ( 1,2,3,4)
... ;
cqlsh> select * from test.test
... ;
 a | b | c | d
---+---+---+---
 1 | 2 | 3 | 4
(1 rows)
cqlsh> insert into test.test (a,c) values (1, 6);
cqlsh> select * from test.test ;
 a | b | c | d
---+---+---+---
 1 | 2 | 6 | 4
(1 rows)
cqlsh> select writetime(a), writetime(b), writetime(c), writetime(d) from test.test
... ;
InvalidRequest: code=2200 [Invalid query] message="Cannot use selection function writeTime on PRIMARY KEY part a"
cqlsh> select writetime(b), writetime(c), writetime(d) from test.test ;
writetime(b) | writetime(c) | writetime(d)
------------------+------------------+------------------
1434424690700887 | 1434424690700887 | 1434424702420929
Otherwise you can add a CQL column with the timestamp:
create TABLE test.test ( a int, b int, c int, d int, touched_at timeuuid, primary key (a)) ;
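A minimal sketch of using that column, assuming you write now() on each touch and convert the timeuuid back with dateof() on read:
insert into test.test (a, b, c, d, touched_at) values (1, 2, 3, 4, now());
select a, dateof(touched_at) from test.test;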
Some quick benchmarking would help you determine which is more performant.

Cassandra has implicit support for writetime on each column; that looks like what you are looking for here.
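For example, against the simple_user_last_activity table above (writetime returns microseconds since the epoch):
SELECT user_id, writetime(touched_at) FROM simple_user_last_activity WHERE user_id = 6;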

Related

PostgreSQL ARRAY_AGG: return separate arrays

The use case is this: each user can create their own games, and keep track of which countries they played each game in.
I would like one query that returns, for a given user, a list of all their games and the countries each game was played in. I am only interested in the country IDs.
I have 4 tables: users, games, countries and a games_countries_xref table.
CREATE SEQUENCE countries_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 2147483647 START 1 CACHE 1;
CREATE TABLE "public"."countries" (
"id" integer DEFAULT nextval('countries_id_seq') NOT NULL,
"name" character varying(200) NOT NULL,
CONSTRAINT "countries_pkey" PRIMARY KEY ("id")
) WITH (oids = false);
INSERT INTO "countries" ("id", "name") VALUES
(1, 'USA'),
(2, 'Japan'),
(3, 'Australia');
CREATE SEQUENCE games_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 2147483647 START 3 CACHE 1;
CREATE TABLE "public"."games" (
"id" integer DEFAULT nextval('games_id_seq') NOT NULL,
"user_id" integer NOT NULL,
"name" character varying(200) NOT NULL,
CONSTRAINT "games_pkey" PRIMARY KEY ("id")
) WITH (oids = false);
INSERT INTO "games" ("id", "user_id", "name") VALUES
(1, 1, 'Monopoly'),
(2, 1, 'Zelda'),
(3, 2, 'Hide & Seek');
CREATE TABLE "public"."games_countries_xref" (
"game_id" integer NOT NULL,
"country_id" integer NOT NULL
) WITH (oids = false);
INSERT INTO "games_countries_xref" ("game_id", "country_id") VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 2),
(3, 1);
CREATE SEQUENCE users_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 2147483647 START 2 CACHE 1;
CREATE TABLE "public"."users" (
"id" integer DEFAULT nextval('users_id_seq') NOT NULL,
"name" character varying(200) NOT NULL,
CONSTRAINT "users_pkey" PRIMARY KEY ("id")
) WITH (oids = false);
INSERT INTO "users" ("id", "name") VALUES
(1, 'Jack'),
(2, 'Jason');
When querying the data, I tried using ARRAY_AGG:
WITH country_ids AS (
    SELECT g.user_id, ARRAY_AGG(gcx.country_id) AS country_ids
    FROM games AS g
    LEFT JOIN games_countries_xref AS gcx ON g.id = gcx.game_id
    GROUP BY g.user_id
)
SELECT g.name, country_ids
FROM games AS g
NATURAL LEFT JOIN country_ids
WHERE g.user_id = 1
but that gives me this output:
   name   | country_ids
----------+-------------
 Monopoly | {1,2,3,2}
 Zelda    | {1,2,3,2}
while I am looking for this:
   name   | country_ids
----------+-------------
 Monopoly | {1,2,3}
 Zelda    | {2}
I know I am likely doing something wrong in the subquery, but I can't figure out what.
Any ideas?
You are on the right track with ARRAY_AGG, just a little over-aggressive with the joins. Your CTE groups by user_id, so it aggregates the country IDs of all of that user's games into one array, and the NATURAL LEFT JOIN then attaches that same user-level array to every game row. You just need a simple join (one left, one inner) on the three tables, grouped by game:
select g.name,array_agg(gcx.country_id) as country_ids
from games g
join users u on u.id = g.user_id
left join games_countries_xref gcx on gcx.game_id = g.id
where u.id = 1
group by g.name;
+----------+-------------+
| name | country_ids |
+----------+-------------+
| Monopoly | {1,2,3} |
| Zelda | {2} |
+----------+-------------+
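If you also want the arrays deduplicated or in a stable order regardless of join order, array_agg accepts DISTINCT and ORDER BY; a sketch of the same query (the join to users is dropped here, since games already carries user_id):
select g.name, array_agg(distinct gcx.country_id order by gcx.country_id) as country_ids
from games g
left join games_countries_xref gcx on gcx.game_id = g.id
where g.user_id = 1
group by g.name;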

How to create a database within a database (Postgres)?

Actually I'm a noob and have been stuck on this problem for a week. I will try to explain it.
I have a table for users
and a table for products.
I want to store data for every user for every product, like if_product_bought, num_of_items, and so on.
So the only solution I can think of is a database within a database, that is, create a copy of products inside a per-user database and start storing.
If this is possible, how? Or is there a better solution?
Thanks in advance
You actually don't create a database within a database (or a table within a table) when you use PostgreSQL or any other SQL RDBMS.
You use tables, and JOIN them. You normally would have an orders table, together with an items_x_orders table, on top of your users and items.
This is a very simplified scenario:
CREATE TABLE users
(
    user_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
    user_name text
) ;
CREATE TABLE items
(
    item_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
    item_description text NOT NULL,
    item_unit text NOT NULL,
    item_standard_price decimal(10,2) NOT NULL
) ;
CREATE TABLE orders
(
    order_id INTEGER /* SERIAL */ NOT NULL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    order_date DATE NOT NULL DEFAULT now(),
    other_data TEXT
) ;
CREATE TABLE items_x_orders
(
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    item_id INTEGER NOT NULL REFERENCES items(item_id),
    -- You're supposed not to have the item more than once in an order
    -- This makes the following the "natural key" for this table
    PRIMARY KEY (order_id, item_id),
    item_quantity DECIMAL(10,2) NOT NULL CHECK(item_quantity <> /* > */ 0),
    item_percent_discount DECIMAL(5,2) NOT NULL DEFAULT 0.0,
    other_data TEXT
) ;
This is all based on the so-called relational model. What you were thinking about is something else called a hierarchical model, or a document model used in some NoSQL databases (where you store your data as a JSON or XML hierarchical structure).
You would fill those tables with data like:
INSERT INTO users
(user_id, user_name)
VALUES
(1, 'Alice Cooper') ;
INSERT INTO items
(item_id, item_description, item_unit, item_standard_price)
VALUES
(1, 'Oranges', 'kg', 0.75),
(2, 'Cookies', 'box', 1.25),
(3, 'Milk', '1l carton', 0.90) ;
INSERT INTO orders
(order_id, user_id)
VALUES
(100, 1) ;
INSERT INTO items_x_orders
(order_id, item_id, item_quantity, item_percent_discount, other_data)
VALUES
(100, 1, 2.5, 0.00, NULL),
(100, 2, 3.0, 0.00, 'I don''t want Oreo'),
(100, 3, 1.0, 0.05, 'Make it promo milk') ;
And then you would produce queries like the following one, where you JOIN all relevant tables:
SELECT
    user_name, item_description, item_quantity, item_unit,
    item_standard_price, item_percent_discount,
    CAST(item_quantity * (item_standard_price * (1-item_percent_discount/100.0)) AS DECIMAL(10,2)) AS items_price
FROM
    items_x_orders
    JOIN orders USING (order_id)
    JOIN items USING (item_id)
    JOIN users USING (user_id) ;
...and get these results:
 user_name    | item_description | item_quantity | item_unit | item_standard_price | item_percent_discount | items_price
--------------+------------------+---------------+-----------+---------------------+-----------------------+-------------
 Alice Cooper | Oranges          |          2.50 | kg        |                0.75 |                  0.00 |        1.88
 Alice Cooper | Cookies          |          3.00 | box       |                1.25 |                  0.00 |        3.75
 Alice Cooper | Milk             |          1.00 | 1l carton |                0.90 |                  5.00 |        0.86
You can get all the code and test it at dbfiddle.
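For the original per-user, per-product question (if_product_bought, num_of_items), the same joins aggregate naturally; a sketch against the tables above, where a row's mere presence answers if_product_bought and num_of_items is just an alias:
SELECT u.user_name, i.item_description,
    SUM(ixo.item_quantity) AS num_of_items
FROM users u
JOIN orders o USING (user_id)
JOIN items_x_orders ixo USING (order_id)
JOIN items i USING (item_id)
GROUP BY u.user_name, i.item_description ;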

Lookup primary ID from multiple tables having another (unique) field

I'm trying to add values to a junction table of a many-to-many relationship.
Tables look like these (all IDs are integers):
Table A
+------+----------+
| id_A | ext_id_A |
+------+----------+
| 1 | 100 |
| 2 | 101 |
| 3 | 102 |
+------+----------+
Table B is conceptually similar
+------+----------+
| id_B | ext_id_B |
+------+----------+
| 1 | 200 |
| 2 | 201 |
| 3 | 202 |
+------+----------+
The tables' PKs are id_A and id_B, and the columns in my junction table are FKs to those columns, but I have to insert values when I have only the external IDs (ext_id_A, ext_id_B).
The external ID columns are unique (and therefore 1:1 with the table's own ID), so given an ext_id I can look up the exact row and get the ID I need to insert into the junction table.
This is an example of what I've done so far, but it doesn't look like an optimized SQL statement:
-- Example table I receive with test values
declare @temp as table (
    ext_id_a int not null,
    ext_id_b int not null
);
insert into @temp values (100, 200), (101, 200), (101, 201);
-- Insertion - code from my sp
declare @final as table (
    id_a int not null,
    id_b int not null
);
insert into @final
select a.id_a, b.id_b
from @temp as t
inner join table_a a on a.ext_id_a = t.ext_id_a
inner join table_b b on b.ext_id_b = t.ext_id_b
merge into junction_table as jt
using @final as f
on f.id_a = jt.id_a and f.id_b = jt.id_b
when not matched by target then
insert (id_a, id_b) values (f.id_a, f.id_b);
I was thinking about a MERGE statement, since my stored procedure receives data in a table-valued parameter and I also have to check for already existing references.
Is there anything I can do to improve the insertion of these values?
No need to use the @final table variable:
; with cte as (
    select tA.id_A, tB.id_B
    from @temp t
    join table_A tA on t.ext_id_a = tA.ext_id_A
    join table_B tB on t.ext_id_B = tB.ext_id_B
)
merge into junction_table
using cte
on cte.id_A = junction_table.id_A and cte.id_B = junction_table.id_B
when not matched by target then
insert (id_A, id_B) values (cte.id_A, cte.id_B);
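If you'd rather avoid MERGE altogether, the same not-matched insert can be written as a plain INSERT ... SELECT with an existence check; a sketch against the same @temp (DISTINCT also guards against duplicate source rows):
insert into junction_table (id_A, id_B)
select distinct tA.id_A, tB.id_B
from @temp t
join table_A tA on t.ext_id_a = tA.ext_id_A
join table_B tB on t.ext_id_B = tB.ext_id_B
where not exists (
    select 1 from junction_table jt
    where jt.id_A = tA.id_A and jt.id_B = tB.id_B
);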

SQL Server 2005: ALTER COLUMN to AUTO_INCREMENT and set its starting ID

I've searched and tried for two hours, but I still can't solve this.
I want to migrate data from another database to my database. I created a table tbl_animal and inserted some values:
CREATE TABLE tbl_animal
(
    id INT NOT NULL,
    name VARCHAR (150) NOT NULL
) ;
INSERT INTO tbl_animal (id, name) VALUES (111, 'dog');
INSERT INTO tbl_animal (id, name) VALUES (222, 'bird');
and the data will look like this:
 id  | name
-----+------
 111 | dog
 222 | bird
Then I want to make id an AUTO_INCREMENT PRIMARY KEY and set the AUTO_INCREMENT to start from 223.
So if I run
INSERT INTO tbl_animal (name) VALUES ('fish')
the data will look like this:
 id  | name
-----+------
 111 | dog
 222 | bird
 223 | fish
I've tried many solutions, but still can't get this to work. The last query I tried is:
ALTER TABLE tbl_animal
ALTER COLUMN id int NOT NULL IDENTITY(1,1) PRIMARY KEY;
The above query throws error 156:
Incorrect syntax near the keyword 'IDENTITY'
Thanks for helping me.
Adding identity to an existing column is not possible. All is not lost, however: if you can create the table again, you can do this kind of thing pretty easily. Something like this:
CREATE TABLE tbl_animal
(
    id INT identity(1, 1) NOT NULL,
    name VARCHAR (150) NOT NULL
) ;
set identity_insert tbl_animal on
INSERT INTO tbl_animal (id, name) VALUES (111, 'dog');
INSERT INTO tbl_animal (id, name) VALUES (222, 'bird');
set identity_insert tbl_animal off
dbcc CHECKIDENT('tbl_animal', RESEED, 222) --you really do want 222 here so the next inserted identity will be 223
INSERT INTO tbl_animal (name) VALUES ('fish')
select * from tbl_animal
If, however, you cannot recreate the table, this is quite a bit more complicated. If that is the case, let me know and I can help you.

Unique values in two columns

I have two columns and I need their values to be unique across both columns (as if they were a single column).
My first attempt was to create a sequence and set default constraints.
create sequence seq1
as bigint
start with 1
increment by 1
cache;
go
create table product (
    pk uniqueidentifier
    , id_1 bigint not null default (next value for seq1)
    , id_2 bigint not null default (next value for seq1)
);
go
insert into product (pk) values (newid());
insert into product (pk) values (newid());
insert into product (pk) values (newid());
insert into product (pk) values (newid());
insert into product (pk) values (newid());
go
And it doesn't work: within a single statement, SQL Server returns the same value for every reference to NEXT VALUE FOR the same sequence in a given row, so both defaults get the same number. The result of the query above will be:
 id_1 | id_2
------+------
    1 |    1
    2 |    2
    3 |    3
    4 |    4
    5 |    5
For now I have stopped at using two sequences, one for odd and one for even numbers.
create sequence seq2
as bigint
start with 1
increment by 2
cache;
go
create sequence seq3
as bigint
start with 2
increment by 2
cache;
go
But if in the future I need to add another column that must also be unique, I will have a problem.
I also thought about stored procedures. Something like this works for me:
create procedure sp_insertProduct
as
begin
    declare @id1 as bigint = next value for seq1;
    declare @id2 as bigint = next value for seq1;
    insert into product (pk, id_1, id_2) values (newid(), @id1, @id2);
end
go
go
exec sp_insertProduct;
exec sp_insertProduct;
exec sp_insertProduct;
go
But due to my ORM framework's restrictions, I cannot use stored procedures for inserting.
So is there a better solution for this problem?
PS: for some reasons I cannot use uniqueidentifiers.
UPDATE
I think I need to explain the question a bit more clearly. I have a working solution for now (and both current answers will also work), but I wonder if there is an extensible solution that will:
provide uniqueness of values across multiple columns (with the ability to add more columns in the future);
avoid using uniqueidentifiers.
For a better understanding of the question, this is how I check uniqueness:
with src as (
    select id_1 as id from product
    union all
    select id_2 as id from product
)
select id, count(*) as equal_values
from src
group by id
having (count(*) > 1)
create sequence seq1
as bigint
start with 1
increment by 2
cache;
go
create table product (
    pk uniqueidentifier
    , id_1 bigint not null default (next value for seq1)
    , id_2 bigint not null default (next value for seq1 + 1)
);
go
Doing this...
insert into product (pk) values (newid());
insert into product (pk) values (newid());
insert into product (pk) values (newid());
insert into product (pk) values (newid());
insert into product (pk) values (newid());
this would generate:
 pk                                   | id_1 | id_2
--------------------------------------+------+------
 2A159914-8105-4DC1-9D7E-570CC5444172 |    1 |    2
 6DAFEF16-2B81-4A10-99EF-B3F1A74389C6 |    3 |    4
 8C6F6697-D993-4320-92BB-04CD56804C5A |    5 |    6
 AC97F37F-CAC3-4E83-BDD4-4B55D009C334 |    7 |    8
 3DDAADA0-D7DB-4350-8087-ABF02B539552 |    9 |   10
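If a third unique column is ever needed, the same trick extends by widening the stride; a sketch with hypothetical names seq_wide and product3 (increment by 3, offsets 0, 1, 2):
create sequence seq_wide
as bigint
start with 1
increment by 3
cache;
go
create table product3 (
    pk uniqueidentifier
    , id_1 bigint not null default (next value for seq_wide)
    , id_2 bigint not null default (next value for seq_wide + 1)
    , id_3 bigint not null default (next value for seq_wide + 2)
);
go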
Maybe this seems very naive, but it should work: the second column only ever receives negative values, so the two columns can never collide. I didn't check, but you may need to add some parentheses:
create table product (
    pk uniqueidentifier
    , id_1 bigint not null default (next value for seq1)
    , id_2 bigint not null default (-next value for seq1)
);
go
