Cassandra order by second clustering key

I have a table in Cassandra and I want to order by the second clustering column while leaving the first clustering column as it is. This is the table definition:
CREATE TABLE table1 (
    key int,
    value1 text,
    value2 text,
    value3 text,
    comments text,
    PRIMARY KEY (key, value1, value2, value3)
) WITH CLUSTERING ORDER BY (value2 DESC);
I know that the above script is wrong and that I should change it to the following:
CREATE TABLE table1 (
    key int,
    value1 text,
    value2 text,
    value3 text,
    comments text,
    PRIMARY KEY (key, value1, value2, value3)
) WITH CLUSTERING ORDER BY (value1 DESC, value2 DESC);
but I want to sort by value2 only (and not value1). Is this possible? Is there any way to achieve it?

It's not possible out of the box - data is sorted hierarchically inside the partition: rows are sorted first by the first clustering column, then, within every value of that "parent" column, by the next clustering column, and so on. Something like this (CL = clustering column):
partition key:
  CL1 value 1:
    CL2 value 1
    CL2 value 2
  CL1 value 2:
    CL2 value 1
    CL2 value 3
  ...
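If you really need rows ordered by value2 on its own, the usual Cassandra pattern is to maintain a second, query-specific table (or a materialized view) in which value2 is the first clustering column, and write to both tables from the application. A minimal sketch, with table1_by_value2 as an assumed name:

CREATE TABLE table1_by_value2 (
    key int,
    value1 text,
    value2 text,
    value3 text,
    comments text,
    PRIMARY KEY (key, value2, value1, value3)
) WITH CLUSTERING ORDER BY (value2 DESC, value1 ASC, value3 ASC);

A SELECT ... WHERE key = ? against this table then comes back ordered by value2 first.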

Related

In composite primary key, auto-increment first data for each second one in postgres?

Let's suppose I have a table with a composite primary key, as follows:
CREATE TABLE fruits(
id SERIAL NOT NULL,
name VARCHAR NOT NULL,
PRIMARY KEY(id , name)
);
Had name not been part of the primary key too, we could have expected this behaviour:
id name
1 banana
2 apple
3 peach
4 banana
5 apple
However, I want id, the first of the two primary key columns, to be incremented per name, i.e. a separate id increment for EACH value of name.
Is it possible, with a data type like SERIAL or any other feature provided by Postgres, to get the following behaviour with a composite key, without adding extra logic (like a TRIGGER) for each new row of fruits, and end up with an example such as the following?
id just has to be an incrementing integer.
id name
1 banana
1 apple
1 peach
2 banana
3 banana
2 peach
Auto-increment means that every insert operation advances the sequence's nextval by 1. If the sequence column ends up with duplicated values, you may want to reset the sequence's nextval.
CREATE TEMP TABLE fruits (
    id bigint GENERATED BY DEFAULT AS IDENTITY,
    name text NOT NULL,
    PRIMARY KEY (id, name)
);

INSERT INTO fruits (name)
VALUES ('banana');

SELECT nextval('fruits_id_seq');
-- returns the sequence's next value; here it returns 2.

INSERT INTO fruits (id, name) VALUES (1, 'apple');
INSERT INTO fruits (id, name) VALUES (1, 'peach');
INSERT INTO fruits (id, name) VALUES (2, 'banana');
INSERT INTO fruits (id, name) VALUES (3, 'banana');
INSERT INTO fruits (id, name) VALUES (2, 'peach');

SELECT nextval('fruits_id_seq');
-- returns 6.

ALTER SEQUENCE fruits_id_seq
RESTART WITH 4;

SELECT nextval('fruits_id_seq');
-- returns 4.
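If you want the sequence to follow the data instead of restarting it at a hard-coded value, a common idiom is to resynchronise it from the current maximum id (a small sketch, reusing the fruits_id_seq sequence from above):

SELECT setval('fruits_id_seq', (SELECT max(id) FROM fruits));
-- nextval('fruits_id_seq') will now return max(id) + 1.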

Self Join on large tables slowness issue

I have two tables like...
table1 (cid, duedate, currency, value)
main_table1 (cid)
My query is below. I am finding the correlation between each pair of cids; table1 contains 3 million records (the cid and duedate columns are compositely unique) and main_table1 contains 1500 records, all unique.
SELECT
    b.cid, c.cid,
    (COUNT(*) * SUM(b.value * c.value) -
     SUM(b.value) * SUM(c.value)) /
    (SQRT(COUNT(*) * SUM(b.value * b.value) -
          SUM(b.value) * SUM(b.value)) *
     SQRT(COUNT(*) * SUM(c.value * c.value) -
          SUM(c.value) * SUM(c.value))
    ) AS correl_ij
FROM
    main_table1 a
    JOIN table1 AS b ON a.cid = b.cid
    JOIN table1 AS c ON b.cid < c.cid
        AND b.duedate = c.duedate
        AND b.currency = c.currency
GROUP BY
    b.cid, c.cid
Please suggest how to optimize this query because it is running slow.
CREATE TABLE #table1(
id int identity,
cid int NOT NULL,
duedate date NOT NULL,
currency char(3) NOT NULL,
value float,
PRIMARY KEY(id,currency,cid,duedate)
);
CREATE TABLE #main_table1(
cid int NOT NULL PRIMARY KEY,
currency char(3)
);
-- #main_table1 contains 155000 cid records; there are no duplicate values
insert into #main_table1
values(19498,'ABC'),(19500,'ABC'),(19534,'ABC')
INSERT INTO #table1(CID,DUEDATE,currency,value)
VALUES(19498,'2016-12-08','USD',-0.0279702098021799) ,
(19498,'2016-12-12','USD',0.0151285161000268),
(19498,'2016-12-15','USD',-0.00965080868337728),
(19498,'2016-12-19','USD',0.00808331709091531)
There are 3 million records in this table for different dates and cids, and most of the cids are present in #main_table1.
I am using b.cid < c.cid to remove the duplicate relationship between the two cids, because I am deriving the correlation between each pair of cids:
once the 19498 --> 19500 correlation is calculated, I do not want 19500 --> 19498, because it would be the same value, just duplicated.
That PK is silly. Why would you include the identity column in a composite PK, let alone in the first position? Drop it unless you have to have it for some misguided reason.
PRIMARY KEY(cid, currency, duedate)
Or use the natural key, if it is different.
If you're commonly joining or sorting on the cid column, you probably want a clustered index on that column or a composite beginning with that column.
If cid, duedate is unique then you can consider removing the id altogether.
If you want to retain id for some reason, make it PRIMARY KEY NONCLUSTERED, and specify a clustered index on cid, duedate.
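A sketch of that layout, reusing the temp table from the question (the index name is an assumption):

CREATE TABLE #table1(
    id int IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,  -- keep only if you really need it
    cid int NOT NULL,
    duedate date NOT NULL,
    currency char(3) NOT NULL,
    value float
);
-- cid, duedate is compositely unique per the question, so it can back the clustered index.
CREATE UNIQUE CLUSTERED INDEX IX_table1_cid_duedate ON #table1 (cid, duedate);

The joins on cid and duedate can then be satisfied by the clustered index rather than by scans ordered on id.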

Auto-increment Id based on composite primary key

Note: Using Sql Azure & Entity Framework 6
Say I have the following table of a store's invoices (there are multiple stores in the DB)...
CREATE TABLE [dbo].[Invoice] (
[InvoiceId] INTEGER NOT NULL,
[StoreId] UNIQUEIDENTIFIER NOT NULL,
CONSTRAINT [PK_Invoice] PRIMARY KEY CLUSTERED ([InvoiceId] ASC, [StoreId] ASC)
);
Ideally, I would like the InvoiceId to increment consecutively for each StoreId rather than independently of the store...
InvoiceId | StoreId
-------------------
1 | 'A'
2 | 'A'
3 | 'A'
1 | 'B'
2 | 'B'
Question: What is the best way to get the [InvoiceId] to increment based on the [StoreId]?
Possible options:
a) Ideally a [InvoiceId] INTEGER NOT NULL IDENTITY_BASED_ON([StoreId]) parameter of some kind would be really helpful, but I doubt this exists...
b) A way to set the default from the return of a function based on another column? (AFAIK, you can't reference another column in a default)
CREATE FUNCTION [dbo].[NextInvoiceId]
(
    @storeId UNIQUEIDENTIFIER
)
RETURNS INT
AS
BEGIN
    DECLARE @nextId INT;
    SELECT @nextId = MAX([InvoiceId]) + 1 FROM [Invoice] WHERE [StoreId] = @storeId;
    IF (@nextId IS NULL)
        RETURN 1;
    RETURN @nextId;
END

CREATE TABLE [dbo].[Invoice] (
    [InvoiceId] INTEGER NOT NULL DEFAULT NextInvoiceId([StoreId]),
    [StoreId] UNIQUEIDENTIFIER NOT NULL,
    CONSTRAINT [PK_Invoice] PRIMARY KEY CLUSTERED ([InvoiceId] ASC, [StoreId] ASC)
);
c) A way to handle this in Entity Framework (code first w/o migration) using DbContext.SaveChangesAsync override or by setting a custom insert query?
Note: I realize I could do it with a stored procedure to insert the invoice, but I'd prefer to avoid that unless it's the only option.
You should stick to an auto-incrementing integer primary key; this is much simpler than dealing with a composite primary key, especially when relating things back to an Invoice.
To generate an InvoiceNumber for the user's benefit that increments per store, you can use the ROW_NUMBER function partitioned by StoreId and ordered by your auto-incrementing primary key.
This is demonstrated with the example below:
WITH TestData(InvoiceId, StoreId) AS
(
SELECT 1,'A'
UNION SELECT 2,'A'
UNION SELECT 3,'A'
UNION SELECT 4,'B'
UNION SELECT 5,'B'
)
Select InvoiceId,
StoreId,
ROW_NUMBER() OVER(PARTITION BY StoreId ORDER BY InvoiceId) AS InvoiceNumber
FROM TestData
Result:
InvoiceId | StoreId | InvoiceNumber
1 | A | 1
2 | A | 2
3 | A | 3
4 | B | 1
5 | B | 2
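To expose that per-store number from the real table rather than the test data, one option is to wrap the same ROW_NUMBER in a view (a sketch; the view name is an assumption):

CREATE VIEW [dbo].[InvoiceWithNumber] AS
SELECT [InvoiceId],
       [StoreId],
       ROW_NUMBER() OVER (PARTITION BY [StoreId] ORDER BY [InvoiceId]) AS [InvoiceNumber]
FROM [dbo].[Invoice];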
After playing around with the answer provided by @Jamiec, I instead decided to go the TRIGGER route in my solution, in order to persist the invoice number and work better with Entity Framework. Additionally, since ROW_NUMBER doesn't work in an INSERT (AFAIK), I am using MAX([InvoiceNumber]) + 1 instead.
CREATE TABLE [dbo].[Invoice] (
    [InvoiceId] INTEGER NOT NULL,
    [StoreId] UNIQUEIDENTIFIER NOT NULL,
    [InvoiceNumber] INTEGER NOT NULL,
    CONSTRAINT [PK_Invoice] PRIMARY KEY CLUSTERED ([InvoiceId] ASC)
);

CREATE TRIGGER TGR_InvoiceNumber
ON [Invoice]
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO [Invoice] ([InvoiceId], [StoreId], [InvoiceNumber])
    SELECT [InvoiceId],
           [StoreId],
           ISNULL((SELECT MAX([InvoiceNumber]) + 1
                   FROM [Invoice] AS inv2
                   WHERE inv2.[StoreId] = inv1.[StoreId]), 1)
    FROM inserted AS inv1;
END;
This allows me to set up my EF class like:
public class Invoice
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int InvoiceId { get; set; }

    public Guid StoreId { get; set; }

    [DatabaseGenerated(DatabaseGeneratedOption.Computed)]
    public int InvoiceNumber { get; set; }
}

Default Index on Primary Key

Does SQL Server build an index on primary keys by default?
If yes what kind of index? If no what kind of index is appropriate for selections by primary key?
I use SQL Server 2008 R2
Thank you.
You can easily determine the first part of this for yourself:
create table x
(
id int primary key
)
select * from sys.indexes where object_id = object_id('x')
Gives
object_id name index_id type type_desc is_unique data_space_id ignore_dup_key is_primary_key is_unique_constraint fill_factor is_padded is_disabled is_hypothetical allow_row_locks allow_page_locks
1653580929 PK__x__6383C8BA 1 1 CLUSTERED 1 1 0 1 0 0 0 0 0 1 1
Edit: There is one other case I should have mentioned
create table t2 (id int not null, cx int)
create clustered index ixc on dbo.t2 (cx asc)
alter table dbo.t2 add constraint pk_t2 primary key (id)
select * from sys.indexes where object_id = object_id('t2')
Gives
object_id name index_id type type_desc is_unique data_space_id ignore_dup_key is_primary_key is_unique_constraint fill_factor is_padded is_disabled is_hypothetical allow_row_locks allow_page_locks has_filter filter_definition
----------- ------------------------------ ----------- ---- ------------------------------ --------- ------------- -------------- -------------- -------------------- ----------- --------- ----------- --------------- --------------- ---------------- ---------- ------------------------------
34099162 ixc 1 1 CLUSTERED 0 1 0 0 0 0 0 0 0 1 1 0 NULL
34099162 pk_t2 2 2 NONCLUSTERED 1 1 0 1 0 0 0 0 0 1 1 0 NULL
With regard to the second part, there is no golden rule; it depends on your individual query workload and on what your PK is.
For satisfying individual lookups by primary key, a non-clustered index will be fine. If you are doing queries against ranges, these would be well served by a matching clustered index, but a covering non-clustered index could also suffice.
You also need to consider the width of the clustered index key in particular, as it impacts all your non-clustered indexes, and the effect of inserts on page splits.
I recommend the book SQL Server 2008 Query Performance Tuning Distilled to read more about the issues.
Yes. By default a unique clustered index is created on all primary keys, but you can create a unique nonclustered index instead if you like.
As to the appropriate choice, I'd say that for 80-90% of the tables you create, you generally want the clustered index to be the primary key, but that's not always the case.
You'd typically make the clustered index something else if you do heavy range scans on that "something else". For example, if you have a synthetic primary key*, but have a date column that you typically query in terms of a range, you'd often want that date column to be the most significant column in your clustered index.
*That's usually done by using an INT IDENTITY column as the PK on the table.
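A sketch of that arrangement (the table and column names are made up for illustration): the synthetic key remains the primary key, but the date column gets the clustered index.

CREATE TABLE dbo.Orders
(
    OrderId   INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,  -- synthetic PK
    OrderDate DATETIME NOT NULL,
    Amount    MONEY NOT NULL
);
-- Range queries such as WHERE OrderDate >= @from AND OrderDate < @to can scan this index in order.
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);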
Yes, it builds a clustered index on the primary key by default.
To be direct, SQL Server does create an index for the PRIMARY KEY (PK). That index is a unique, clustered index.
sqlvogel brings up an important point in his comment above: you can only have one CLUSTERED index. If you already have one prior to declaring a PK, then your key will be NONCLUSTERED. This is a little more detail than the default answer to this question. It should also be noted that PKs cannot have NULL values.
Note, however, that the index can vary depending on prior constraints or indexes on the table. Additionally, you can declare the details of this index upon creation, depending on how you write out the code:
< table_constraint > ::= [ CONSTRAINT constraint_name ]
{ [ { PRIMARY KEY | UNIQUE }
[ CLUSTERED | NONCLUSTERED ]
{ ( column [ ASC | DESC ] [ ,...n ] ) }
[ WITH FILLFACTOR = fillfactor ]
[ ON { filegroup | DEFAULT } ]
]
Example:
CREATE TABLE MyTable
(
Id INT NOT NULL,
ForeignKeyId INT REFERENCES OtherTable(Id) NOT NULL,
Name VARCHAR(50) NOT NULL,
Comments VARCHAR(500) NULL,
PRIMARY KEY NONCLUSTERED (Id, ForeignKeyId)
)
No, it doesn't build a clustered one in the case where there is already a clustered index defined on the table.
I just came across the same confusion, so I created the following script to test:
create database mytest
go
use mytest
go
create table x
(
id int primary key
)
go
create table y
(
id int
)
go
select * from sys.indexes where object_id = object_id(N'x') or object_id=object_Id(N'y')
go
The first table, x, which has a primary key, gets the clustered index, while table y didn't get any indexes, as it has no primary key.
This confirms the following points about clustered indexes:
When we create a table with a primary key, a clustered index is created by default.
When we create a table without a primary key, a clustered index is not created.
There can only be a single clustered index, since it determines the physical order of the data: the data rows in the table can be sorted in only one order, based on the index key value.

get max from table where sum required

Suppose I have a table with following data:
gameId difficultyLevel numberOfQuestions
--------------------------------------------
1 1 2
1 2 2
1 3 1
In this example the game is configured for 5 questions, but I'm looking for a SQL statement that will work for any number of questions.
What I need is a SQL statement that, given a question's displayOrder, will return that question's difficulty level. For example, given a displayOrder of 3 and the table data above, it will return 2.
Can anyone advise what the query should look like?
I'd recommend a game table with a 1:m relationship with a question table.
You shouldn't repeat columns in a table - it violates first normal form.
Something like this:
create table if not exists game
(
    game_id bigint not null auto_increment,
    name varchar(64),
    description varchar(64),
    primary key (game_id)
);

create table if not exists question
(
    question_id bigint not null auto_increment,
    text varchar(64),
    difficulty int default 1,
    game_id bigint,
    primary key (question_id),
    foreign key (game_id) references game(game_id)
);

select
    game.game_id, name, description, question_id, text, difficulty
from game
left join question on game.game_id = question.game_id
order by question_id;
Things might be easier for you if you change your design as duffymo suggests, but if you must do it that way, here's a query that should do the trick:
SELECT MIN(difficultyLevel) AS difficultyLevel
FROM
(
    SELECT difficultyLevel,
           (SELECT SUM(numberOfQuestions)
            FROM yourTable sub
            WHERE sub.difficultyLevel <= yt.difficultyLevel) AS questionTotal
    FROM yourTable yt
) AS innerSQL
WHERE innerSQL.questionTotal >= @displayOrder
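To see why this works, here is the inner query run against the sample data (a sketch; yourTable and the @displayOrder variable are placeholder names):

CREATE TABLE yourTable (
    gameId            INT,
    difficultyLevel   INT,
    numberOfQuestions INT
);
INSERT INTO yourTable VALUES (1, 1, 2), (1, 2, 2), (1, 3, 1);

-- Cumulative question counts per difficulty level: 2, 4, 5.
SELECT difficultyLevel,
       (SELECT SUM(numberOfQuestions)
        FROM yourTable sub
        WHERE sub.difficultyLevel <= yt.difficultyLevel) AS questionTotal
FROM yourTable yt;
-- With @displayOrder = 3, the smallest difficultyLevel whose cumulative total
-- reaches 3 is 2, which is what the outer MIN(...) above returns.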
