SQL Server clustered index storage (more than one column)? - sql-server
Let's say I have a table like this:
a | b | c | d
______________
1 | 2 | 4 | 5
6 | 2 | 5 | 5
3 | 5 | 2 | 5
The [a] column has the clustered index, so the physical order in which it's stored is:
a | b | c | d
______________
1 | 2 | 4 | 5
3 | 5 | 2 | 5
6 | 2 | 5 | 5
Now let's enhance the [a] index to be [a,c] (still clustered).
Now I can't understand how it can be stored, since the [a] column is already sorted and the [c] column can't be sorted (because sorting on [a] interferes with the sorting of [c]).
So how will SQL Server store it?
Second question: do I need to create another index on [c]?
I think you're missing something obvious. Consider what you would expect from the query
select * from myTable
order by [a], [c]
Your clustered index on columns [a,c] will give a physical layout with the same order.
Composite indexes produce lexicographical order: the records are additionally ordered on c when values of a are considered "equal".
a c
1 2
2 3 -- Within this block, records are sorted on [c]
2 5 --
2 7 --
3 7
4 1
5 6 -- Within this block, records are sorted on [c]
5 8 --
This is how dictionaries sort.
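If it helps to see this outside the database: a composite key compares exactly like a tuple, a first, then c as the tie-breaker. A quick Python sketch (values taken from the table above):

```python
# Lexicographic (dictionary) ordering: sort on a first, then on c within
# groups of equal a -- the same order a clustered index on [a, c] stores
# rows in.
rows = [(5, 8), (2, 5), (1, 2), (2, 3), (4, 1), (3, 7), (2, 7), (5, 6)]
rows.sort()  # Python tuples compare lexicographically by default
print(rows)
# [(1, 2), (2, 3), (2, 5), (2, 7), (3, 7), (4, 1), (5, 6), (5, 8)]
```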
You need an additional index on c if you want to speed up queries not involving a:
SELECT *
FROM mytable
WHERE c = #some_value
How can I take out each element of a string into a separate column? [duplicate]
This question already has answers here: How to split a comma-separated value to columns (38 answers) Closed 3 years ago.
I have a table like:

id   | tickets         | comb1                  | comb2
-----|-----------------|------------------------|------------------
1    | 3146000011086.. | ,13, ,31, ,50,66,77,.. | ,22,38,40, , ..
2..n | 314600001924... | 5,14,23, , ,50, , ,..  | 4,12,21, ,47, ,..

I need to take out each element of comb1 and comb2 into columns like:

val_of_comb1(1) | val_of_comb1(2) | ..val_of_comb2(1) | val_of_comb2(2)
----------------|-----------------|-------------------|----------------
                | 13              |                   | 22
5               | 14              | .. 4              | 12

Maybe take out each element with a loop? (But if I have a lot of records, how will that affect the database?) Welcome any ideas.
A. CROSS APPLY, PIVOT, and STRING_SPLIT

Here is a version if comb1 splits into 12 strings.

drop table X
create table X ( id int, comb1 nvarchar(max) );
insert into X values (1, ',13, ,31, ,50,66,77,..');
insert into X values (2, '5,14,23, , ,50, , ,..');

-- From https://stackoverflow.com/questions/12195504/splitting-a-string-then-pivoting-result by Kannan Kandasamy
select * from (
    select * from X x
    cross apply (select RowN = Row_Number() over (Order by (SELECT NULL)), value
                 from string_split(x.Comb1, ',')) d) src
pivot (max(value) for src.RowN in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) as p

id | comb1                  | 1 | 2  | 3  | 4  | 5 | 6  | 7  | 8  | 9  | 10   | 11   | 12
1  | ,13, ,31, ,50,66,77,.. |   | 13 |    | 31 |   | 50 | 66 | 77 | .. | NULL | NULL | NULL
2  | 5,14,23, , ,50, , ,..  | 5 | 14 | 23 |    |   | 50 |    |    | .. | NULL | NULL | NULL

B. Just STRING_SPLIT and code

One option is to use STRING_SPLIT, which returns rows.

select value from STRING_SPLIT(',13, ,31, ,50,66,77,..', ',');

value
13
31
50
66
77
..

You could then collect all the rows in your code as an array.
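Option B's "collect them in your code" step can be sketched in Python, with plain str.split standing in for STRING_SPLIT; the 12-column width is the same assumption as in option A, and the trailing ".." placeholder from the sample is omitted:

```python
# Split one comb1 value into a fixed number of columns in application code.
# Empty positions become None, mirroring the blank slots in the sample data.
comb1 = ',13, ,31, ,50,66,77'
parts = [p.strip() or None for p in comb1.split(',')]
parts += [None] * (12 - len(parts))  # pad to a uniform 12 columns
print(parts)
# [None, '13', None, '31', None, '50', '66', '77', None, None, None, None]
```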
Limit the rows if same id repeats
I have a table like below:

ID | s_id | mark
-----------------------
1  | 2    | 10
2  | 5    | 9
3  | 7    | 8
4  | 2    | 8
5  | 2    | 10
6  | 5    | 7
7  | 3    | 7
8  | 2    | 9
9  | 5    | 8

I need an SQL query for output like the following: the mark column needs to be in descending order, and the same s_id should not repeat more than 2 times. If the same s_id repeats more than 2 times, ignore the 3rd result.

ID | s_id | mark
-----------------------
1  | 2    | 10
2  | 2    | 9
3  | 3    | 7
4  | 5    | 9
5  | 5    | 8
6  | 7    | 8
Assuming you're using SQL Server, you can just use ROW_NUMBER() to assign a row number to each s_id group based on a descending order of the mark column. Then, retain only those records where this row number is 1 or 2.

SELECT t.ID, t.s_id, t.mark
FROM (
    SELECT ID, s_id, mark,
           ROW_NUMBER() OVER (PARTITION BY s_id ORDER BY mark DESC) rn
    FROM yourTable
) t
WHERE t.rn <= 2
ORDER BY t.s_id;

Note: You'll notice that the record (s_id, mark) = (2, 10) appears twice in my result set. Based on your input data, this is what is generated. If you really intended to also remove duplicate (s_id, mark) pairs, then let us know and a small correction can be added to the query.

Demo here: Rextester
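The same approach can be tried out locally with SQLite, whose window functions (3.25+) accept the identical ROW_NUMBER() expression; the data is the question's sample table, and only s_id and mark are selected so the result is deterministic despite the tied marks:

```python
import sqlite3

# Runnable check of the ROW_NUMBER() approach using SQLite (needs 3.25+
# for window functions); rows are the question's sample data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (ID INTEGER, s_id INTEGER, mark INTEGER)")
con.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 2, 10), (2, 5, 9), (3, 7, 8), (4, 2, 8), (5, 2, 10),
     (6, 5, 7), (7, 3, 7), (8, 2, 9), (9, 5, 8)],
)
rows = con.execute("""
    SELECT t.s_id, t.mark
    FROM (
        SELECT s_id, mark,
               ROW_NUMBER() OVER (PARTITION BY s_id ORDER BY mark DESC) rn
        FROM yourTable
    ) t
    WHERE t.rn <= 2
    ORDER BY t.s_id, t.mark DESC
""").fetchall()
print(rows)
# [(2, 10), (2, 10), (3, 7), (5, 9), (5, 8), (7, 8)] -- note (2, 10) twice
```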
Try this code:

;WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY s_id ORDER BY (SELECT 0)) RN,
           ID, s_id, mark
    FROM aaa
)
SELECT RN, ID, s_id, mark
FROM cte
WHERE RN <= 2
ORDER BY s_id, mark DESC;
Oracle database: omitting records from a table which are a subset of records in the same table, based on two columns
I have some customers which are clustered into groups by cluster_id, and the amount of data is huge (so performance matters here). The simplest form of what I have is the following table:

cust_id | cluster_id
--------|-----------
1       | 1
2       | 1
3       | 1
2       | 2
1       | 2
2       | 3
4       | 3
1       | 4

I want to keep the clusters with the greatest number of customers such that no customer is removed. In other words, I want to delete the records of any cluster that is a subset of another cluster. In the above example, the output table should look like this:

cust_id | cluster_id
--------|-----------
1       | 1
2       | 1
3       | 1
2       | 3
4       | 3
Maintain consistent PostgreSQL array column indices while concatenating array columns when dealing with empty column values
Given the following starting data:

CREATE TABLE t1 AS
    SELECT generate_series(1, 20) AS id,
           (SELECT array_agg(generate_series) FROM generate_series(1, 6)) AS array_1;
CREATE TABLE t2 AS
    SELECT generate_series(5, 10) AS id,
           (SELECT array_agg(generate_series) FROM generate_series(7, 10)) AS array_2;
CREATE TABLE t3 AS
    SELECT generate_series(8, 15) AS id,
           (SELECT array_agg(generate_series) FROM generate_series(11, 15)) AS array_3;

I would like to do an outer join between several tables, each with a fixed-length array column that is uniform within a given table but may differ from table to table (as in the examples above), concatenating the array columns in each table into one large array column. I was wondering if there is an efficient or straightforward way to maintain consistent indexing in the new combined column, replacing NULL column values (caused by the outer join) with an array of NULL values so that the final array column has a uniform length. Unlike in the above example, in my actual use case I won't know the length of each table's array column a priori, only that it will be uniform throughout that table.
In other words, instead of this query:

SELECT id, (array_1 || array_2 || array_3) AS combined_array
FROM t1
LEFT OUTER JOIN t2 USING (id)
LEFT OUTER JOIN t3 USING (id);

which produces:

 id | combined_array
----+---------------------------------------
  1 | {1,2,3,4,5,6}
  2 | {1,2,3,4,5,6}
  3 | {1,2,3,4,5,6}
  4 | {1,2,3,4,5,6}
  5 | {1,2,3,4,5,6,7,8,9,10}
  6 | {1,2,3,4,5,6,7,8,9,10}
  7 | {1,2,3,4,5,6,7,8,9,10}
  8 | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
  9 | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
 10 | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
 11 | {1,2,3,4,5,6,11,12,13,14,15}
 12 | {1,2,3,4,5,6,11,12,13,14,15}
 13 | {1,2,3,4,5,6,11,12,13,14,15}
 14 | {1,2,3,4,5,6,11,12,13,14,15}
 15 | {1,2,3,4,5,6,11,12,13,14,15}
 16 | {1,2,3,4,5,6}
 17 | {1,2,3,4,5,6}
 18 | {1,2,3,4,5,6}
 19 | {1,2,3,4,5,6}
 20 | {1,2,3,4,5,6}
(20 rows)

I would like the result to look like:

 id | combined_array
----+---------------------------------------
  1 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
  2 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
  3 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
  4 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
  5 | {1,2,3,4,5,6,7,8,9,10,NULL,NULL,NULL,NULL,NULL}
  6 | {1,2,3,4,5,6,7,8,9,10,NULL,NULL,NULL,NULL,NULL}
  7 | {1,2,3,4,5,6,7,8,9,10,NULL,NULL,NULL,NULL,NULL}
  8 | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
  9 | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
 10 | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
 11 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,11,12,13,14,15}
 12 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,11,12,13,14,15}
 13 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,11,12,13,14,15}
 14 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,11,12,13,14,15}
 15 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,11,12,13,14,15}
 16 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
 17 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
 18 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
 19 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
 20 | {1,2,3,4,5,6,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}
(20 rows)

So that each row contains an array of length 15.
To answer my own question, here is the query that I came up with that appears to do the job. It doesn't seem particularly elegant or efficient to me, so I'm definitely still open to other answers.

SELECT id, (
    coalesce(array_1, array_fill(NULL::INT, ARRAY[(SELECT max(array_length(array_1, 1)) FROM t1)])) ||
    coalesce(array_2, array_fill(NULL::INT, ARRAY[(SELECT max(array_length(array_2, 1)) FROM t2)])) ||
    coalesce(array_3, array_fill(NULL::INT, ARRAY[(SELECT max(array_length(array_3, 1)) FROM t3)]))
) AS combined_array
FROM t1
LEFT OUTER JOIN t2 USING (id)
LEFT OUTER JOIN t3 USING (id);
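The idea behind the COALESCE/array_fill pair is simply: pad each possibly-missing array out to its table's fixed width before concatenating. A few lines of Python make the arithmetic concrete (the widths 6, 4, and 5 are taken from t1, t2, and t3 above):

```python
# Pad each possibly-missing array to its table's fixed width, then
# concatenate, so every combined row has the same length (6 + 4 + 5 = 15).
def pad(arr, width):
    return arr if arr is not None else [None] * width

# e.g. a row with id in 11..15: present in t1 and t3, missing from t2
array_1, array_2, array_3 = [1, 2, 3, 4, 5, 6], None, [11, 12, 13, 14, 15]
combined = pad(array_1, 6) + pad(array_2, 4) + pad(array_3, 5)
print(combined)
# [1, 2, 3, 4, 5, 6, None, None, None, None, 11, 12, 13, 14, 15]
```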
SQL: update every time a column hits x number of rows
I have a table called question with two columns; it contains more than 160K rows. Example:

id | questionID
1  | 1
2  | 2
3  | 3
4  | 4
5  | 5
6  | 6
7  | 7
8  | 8
9  | 9
10 | 10
...

I would like to update the questionID column so it looks like the example below. For every x number of rows, it needs to start again from 1. The final result should be something like this:

id | questionID
1  | 1
2  | 2
3  | 3
4  | 4
5  | 1
6  | 2
7  | 3
8  | 4
9  | 1
10 | 2
...

The table contains so many rows that it's not an option to do it manually. What would be the easiest way to update the table? Any help will be appreciated. Thanks.
You can use the modulus operator; both SQL Server and MySQL support %:

UPDATE question
SET questionID = 1 + ((id - 1) % 4);

If the numbers have gaps, then you need to do something different. In that case, the solution is highly database-dependent.
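As a quick sanity check of the formula on gap-free ids, here is what 1 + ((id - 1) % 4) yields for the first ten rows (plain Python, same arithmetic):

```python
# The cycle produced by 1 + ((id - 1) % 4) for consecutive ids 1..10,
# matching the desired questionID column in the question.
ids = range(1, 11)
question_ids = [1 + (i - 1) % 4 for i in ids]
print(question_ids)
# [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
```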
Simply use the modulo operator:

UPDATE question
SET questionID = CASE WHEN id % 4 = 0 THEN 4 ELSE id % 4 END

Or, if id has gaps and you are using SQL Server, you can use this:

UPDATE q1
SET id = (CASE WHEN q2.rn % 4 = 0 THEN 4 ELSE q2.rn % 4 END)
FROM question q1
INNER JOIN (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM question
) q2 ON q1.ID = q2.ID
UPDATE question SET questionID = questionID % 4 + 1