SQL Index Identity Column Sort Order - sql-server

I have a transaction table:
|rowID|value1|value2|...|
|11111|12 |34 |...|
|11112|23 |123 |...|
|11113|99 |53 |...|
...
RowID is the Identity and increments by 1. Indexing is also no problem. Lots of new values get inserted, updates happen and sometimes some rows may get deleted.
But now I have a second table:
|rowID|flag1|flag2|...|
|11113|0 |1 |...|
|11111|1 |1 |...|
|11112|0 |1 |...|
...
It is a user operation, which inserts rows from the transaction table into this second table. RowID corresponds to RowID from the transaction table.
The inserts into the second table are not sorted by RowID. A higher RowID may be inserted before a lower one.
What is the best indexing strategy for such a table?
Is it wise to define RowID in the second table as the primary key, resulting in a clustered index? I think that is not ideal, because the RowIDs do not arrive in sorted order.
Is it better to have no primary key, but a suitable non-clustered index?
Is there any general advice for such a table (the second table)?

Tables are not ordered datasets. The fact that RowIDs are not inserted in order into the second table does not matter when you define the PK or an index.
You should make RowID the PK for Table2. When the index is created, a B-tree is built and the data is stored in it, ordered by key, for fast access.
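A minimal sketch of that advice, assuming a hypothetical name dbo.TransactionFlags for the second table (the flag columns come from the question; the types and constraint name are assumptions):

```sql
-- Hypothetical table name; RowID is copied from the transaction table,
-- so it is a plain int here, not an IDENTITY.
CREATE TABLE dbo.TransactionFlags
(
    RowID int NOT NULL,
    flag1 bit NOT NULL,
    flag2 bit NOT NULL,
    CONSTRAINT PK_TransactionFlags PRIMARY KEY CLUSTERED (RowID)
);

-- Rows may arrive in any RowID order; the B-tree keeps them ordered by key.
INSERT INTO dbo.TransactionFlags (RowID, flag1, flag2)
VALUES (11113, 0, 1),
       (11111, 1, 1),
       (11112, 0, 1);

-- A lookup by RowID is a seek on the clustered index.
SELECT RowID, flag1, flag2
FROM dbo.TransactionFlags
WHERE RowID = 11112;
```

Because keys arrive out of order, some inserts land in the middle of the index and can cause page splits. Whether that matters depends on the insert volume, so it is worth measuring fragmentation before tuning anything.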

Related

Bad design to compare to computed columns?

Using SQL Server I have a table with a computed column. That column concatenates 60 columns:
CREATE TABLE dbo.foo
(
    Id INT NOT NULL,
    PartNumber NVARCHAR(100),
    field_1 INT NULL,
    field_2 INT NULL,
    -- and so forth up to field_60
    field_60 INT NULL
);
ALTER TABLE dbo.foo
    ADD RecordKey AS CONCAT (field_1, '-', field_2, '-', -- and so on up to 60
    ) PERSISTED;
CREATE INDEX ix_foo_RecordKey ON dbo.foo (RecordKey);
Why I used a persisted column:
Not having the need to index 60 columns
To test to see if a current record exists by checking just one column
This table will contain no fewer than 20 million records. Adds/Inserts/updates happen a lot, and some binaries do tens of thousands of inserts/updates/deletes per run and we want these to be quick and live.
Currently we have C# code that manages records in table foo. It has a function which concatenates the same fields, in the same order, as the computed column. If a record with that same concatenated key already exists we might not insert, or we might insert but call other functions that we may not normally.
Is this a bad design? The big danger I see is if the code for any reason doesn't match the concatenation order of the computed column (if one is edited but not the other).
Rules/Requirements
We want to show records in JQGrid. We already have C# that can do so if the records come from a single table or view
We need the ability to check two records to verify if they both have the same values for all of the 60 columns
A better table design would be
parts table
-----------
id
partnumber
other_common_attributes_for_all_parts
attributes table
----------------
id
attribute_name
attribute_unit (if needed)
part_attributes table
---------------------
part_id (foreign key to parts)
attribute_id (foreign key to attributes)
attribute_value
It looks complicated, but with proper indexing lookups stay fast even if part_attributes contains billions of records.
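The design above could be declared roughly as follows. This is a sketch only: the column types, constraint names, and the extra index are assumptions, not part of the original answer.

```sql
-- Sketch only: types and constraint names are assumptions.
CREATE TABLE dbo.parts
(
    id         int           NOT NULL CONSTRAINT PK_parts PRIMARY KEY,
    partnumber nvarchar(100) NOT NULL
);

CREATE TABLE dbo.attributes
(
    id             int           NOT NULL CONSTRAINT PK_attributes PRIMARY KEY,
    attribute_name nvarchar(100) NOT NULL CONSTRAINT UQ_attributes_name UNIQUE,
    attribute_unit nvarchar(20)  NULL
);

CREATE TABLE dbo.part_attributes
(
    part_id         int NOT NULL
        CONSTRAINT FK_part_attributes_parts REFERENCES dbo.parts (id),
    attribute_id    int NOT NULL
        CONSTRAINT FK_part_attributes_attributes REFERENCES dbo.attributes (id),
    attribute_value int NULL,
    -- One value per part and attribute; also serves "all attributes of part X".
    CONSTRAINT PK_part_attributes PRIMARY KEY CLUSTERED (part_id, attribute_id)
);

-- Serves "which parts have value X for attribute Y".
CREATE INDEX IX_part_attributes_attribute_value
    ON dbo.part_attributes (attribute_id, attribute_value);

-- Do parts 1 and 2 carry exactly the same attribute values?
-- An empty result means yes (EXCEPT treats NULLs as equal).
(SELECT attribute_id, attribute_value FROM dbo.part_attributes WHERE part_id = 1
 EXCEPT
 SELECT attribute_id, attribute_value FROM dbo.part_attributes WHERE part_id = 2)
UNION ALL
(SELECT attribute_id, attribute_value FROM dbo.part_attributes WHERE part_id = 2
 EXCEPT
 SELECT attribute_id, attribute_value FROM dbo.part_attributes WHERE part_id = 1);
```

The last query is one possible way to cover the "compare two records on all values" requirement from the question without a concatenated key.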

SQL Server - Unique index vs Unique constraint - Re. Duplicate values

A unique index ensures that the values in the index key columns are unique.
A unique constraint guarantees that no duplicate values can be inserted into the column(s) on which the constraint is created. When a unique constraint is created a corresponding unique index is automatically created on the column(s).
Questions:
Can duplicate values be inserted if we have a unique index on a column and no unique constraint?
What about existing duplicates in the column: will it still be possible to create a unique index or unique constraint?
Can duplicate values be inserted if we have a unique index on a column
and no unique constraint?
Generally, duplicate values cannot be inserted and an error is raised when a unique index exists on the column. The exceptions are:
Index was created with the IGNORE_DUP_KEY option. No error is raised and the insert is ignored.
The non-clustered index is filtered such that the duplicate value does not satisfy the index WHERE clause. The row is inserted but not reflected in the non-clustered index.
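Both exceptions can be tried in a scratch database. The table and index names below are hypothetical, and the filter condition is just an example:

```sql
-- Hypothetical demo table.
CREATE TABLE dbo.DupDemo (SomeInt int NULL);

-- Exception 1: with IGNORE_DUP_KEY = ON a duplicate row is discarded
-- with a warning instead of raising an error.
CREATE UNIQUE INDEX UQ_DupDemo_SomeInt
    ON dbo.DupDemo (SomeInt)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO dbo.DupDemo (SomeInt) VALUES (1);
INSERT INTO dbo.DupDemo (SomeInt) VALUES (1);   -- no error, row is not stored
SELECT COUNT(*) AS row_count FROM dbo.DupDemo;  -- 1

-- Exception 2: a filtered unique index only enforces uniqueness
-- for rows that satisfy its WHERE clause.
DROP INDEX UQ_DupDemo_SomeInt ON dbo.DupDemo;
CREATE UNIQUE NONCLUSTERED INDEX UQ_DupDemo_SomeInt_Positive
    ON dbo.DupDemo (SomeInt)
    WHERE SomeInt > 0;

INSERT INTO dbo.DupDemo (SomeInt) VALUES (-5), (-5);  -- both stored, outside the filter
INSERT INTO dbo.DupDemo (SomeInt) VALUES (1);         -- fails, duplicate inside the filter
```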
What about existing duplicates in the column: will it still be possible to
create a unique index or unique constraint?
No, with the exception of the filtered index mentioned above.
Can duplicate values be inserted if we have a unique index on a column and no unique constraint?
No, the values of the columns within the index must create a unique set of data within that index.
What about existing duplicates in the column: will it still be possible to create a unique index or unique constraint?
No, you cannot create a Unique Index on a table that has duplicate values.
The easiest way to find this out is to try it (I suggest doing so for things like this; it's a great way of learning):
CREATE TABLE dbo.SomeTable (SomeInt int, AnotherInt int);
GO
INSERT INTO dbo.SomeTable (SomeInt, AnotherInt)
VALUES (1,1),
       (1,2),
       (2,1);
GO
--Create a unique index on a column with duplicate values
CREATE UNIQUE INDEX UQ_SomeInt ON dbo.SomeTable(SomeInt); --fails
GO
--Create a unique index on the 2 columns, as they are unique
CREATE UNIQUE INDEX UQ_Some_AnotherInt ON dbo.SomeTable(SomeInt, AnotherInt); --Succeeds
GO
--Try to insert a duplicate value
INSERT INTO dbo.SomeTable (SomeInt, AnotherInt)
VALUES (2,1); --fails
GO
SELECT *
FROM dbo.SomeTable;
GO
DROP TABLE dbo.SomeTable;
One potentially unintuitive scenario that confused me at first: Postgres does not treat NULL values as equal. If your table looked like this:
+-------+-------+-------+
|id |a |b |
+-------+-------+-------+
|1 |0 |NULL |
|2 |0 |NULL |
+-------+-------+-------+
You could still add a unique index on columns a and b. Postgres considers every NULL distinct from every other NULL, so although rows 1 and 2 have the same value for column a, their values for column b count as different.
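A short PostgreSQL sketch of that scenario; the table name t and the index name are made up for illustration:

```sql
-- PostgreSQL, hypothetical table matching the rows shown above.
CREATE TABLE t (id int PRIMARY KEY, a int, b int);

INSERT INTO t (id, a, b) VALUES (1, 0, NULL), (2, 0, NULL);

-- Succeeds: the two (0, NULL) pairs are not considered duplicates.
CREATE UNIQUE INDEX uq_t_a_b ON t (a, b);

-- Also succeeds, for the same reason.
INSERT INTO t (id, a, b) VALUES (3, 0, NULL);
```

SQL Server behaves differently here: its unique indexes treat NULLs as equal, so the same index would be rejected there. PostgreSQL 15 and later can opt into that behaviour with NULLS NOT DISTINCT on the index.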

How best to index columns in a table with dynamic relationships to other tables?

Some of our tables have dynamic relationships to other tables.
For example, we have an address table that stores all addresses. It has two 'linking' fields, entity_id and entity_key_id, that are used to link the addresses to other tables.
For instance, 'member' might be entity_id 1 and 'organization' might be entity_id 2. If we are storing a member address, the row would have entity_id = 1 and entity_key_id = mem_id (the PK of the mem table); if we are storing an organization address, the row would have entity_id = 2 and entity_key_id would store the PK of the org table.
How would I best index this? Should I have two indexes, one for entity_id and one for entity_key_id? Or would it be better to include both columns in a single index, and if so, in what order?
The DB is SQL Server 2008 R2.
It depends on the queries you are going to run against this database. The Database Engine Tuning Advisor (https://msdn.microsoft.com/en-us/library/ms166575(v=sql.100).aspx) can help.
But generally you should have an index whose columns follow the order you use in the predicate (WHERE), followed by the columns you select. Here are some examples.
SELECT ... FROM table1 WHERE table1.column1=.. AND table1.column2=...
Here you should have an index on (column1, column2), so the DB can locate the rows matching both conditions directly from the index. You may also have an index on column1 alone, but in that case the DB first finds the column1 matches in the index and then goes to the table itself to check column2, which is slower.
But an index on (column2, column42, column1) is a poor fit: the DB can seek on column2 only, because column42 sits between the two columns your WHERE condition uses.
It is also good to have the selected columns in the index.
Running SELECT column1 FROM table1 WHERE column2=... with an index on (column2, column1) lets the DB read both columns from the index and not even touch the table! It is fast. But if you change the column order in this index, the DB can no longer seek on it, because it needs column2 first (according to the WHERE).
You should always use the profiler to obtain the execution plan (https://msdn.microsoft.com/en-us/library/ms181091(v=sql.100).aspx): not only does it help you find bottlenecks, it also teaches you how the DB optimizer uses indexes.
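Applied to the address table from the question, the advice might look like the sketch below. The street and city columns, the key value 42, and all constraint and index names are assumptions made for illustration.

```sql
-- Hypothetical address table shaped like the one in the question.
CREATE TABLE dbo.address
(
    address_id    int IDENTITY(1,1) NOT NULL CONSTRAINT PK_address PRIMARY KEY,
    entity_id     int           NOT NULL,  -- 1 = member, 2 = organization
    entity_key_id int           NOT NULL,  -- PK of the mem or org table
    street        nvarchar(200) NULL,
    city          nvarchar(100) NULL
);

-- One composite index on both linking columns. The INCLUDE columns
-- let the query below be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_address_entity
    ON dbo.address (entity_id, entity_key_id)
    INCLUDE (street, city);

-- Typical lookup: all addresses of member 42.
SELECT street, city
FROM dbo.address
WHERE entity_id = 1
  AND entity_key_id = 42;
```

With equality predicates on both columns either key order supports a seek. Putting entity_id first additionally serves queries that filter on the entity type alone, which is why it is chosen here; the execution plan for the real workload should decide.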
A cleaner approach would be to have two distinct keys mem_key_id and org_key_id. This not only allows you to create an index on each of them but also to declare proper foreign key constraints.
One of the two keys would always be null.
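A sketch of that alternative, with stub mem and org tables so the block stands alone. The table name address_by_owner and all constraint names are hypothetical; the CHECK constraint is an addition that enforces "exactly one of the two keys is set".

```sql
-- Stub parent tables, reduced to their keys for the example.
CREATE TABLE dbo.mem (mem_id int NOT NULL CONSTRAINT PK_mem PRIMARY KEY);
CREATE TABLE dbo.org (org_id int NOT NULL CONSTRAINT PK_org PRIMARY KEY);

CREATE TABLE dbo.address_by_owner
(
    address_id int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_address_by_owner PRIMARY KEY,
    mem_key_id int NULL
        CONSTRAINT FK_address_by_owner_mem REFERENCES dbo.mem (mem_id),
    org_key_id int NULL
        CONSTRAINT FK_address_by_owner_org REFERENCES dbo.org (org_id),
    -- Exactly one owner per address.
    CONSTRAINT CK_address_by_owner_one_owner CHECK
    (
        (mem_key_id IS NOT NULL AND org_key_id IS NULL)
     OR (mem_key_id IS NULL AND org_key_id IS NOT NULL)
    )
);

-- One index per foreign key column.
CREATE INDEX IX_address_by_owner_mem ON dbo.address_by_owner (mem_key_id);
CREATE INDEX IX_address_by_owner_org ON dbo.address_by_owner (org_key_id);
```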

foreign key in databases and creating a join table

I have a question regarding the associative (join) table we create for the relationship between two entities.
I know that a foreign key can be NULL in the join table. But should the join table contain only the relationships? For example, in a bank there is a customer (key: id) and a loan (key: id) entity, and borrow is the relationship between them. Now suppose there are customers who haven't taken a loan.
Should I put those customers' ids in the borrow table with the corresponding foreign key for loan-id set to NULL? Or should I leave those customers out of the borrow table?
And what would be a good primary key for the join table? Is a primary key for the join table required at all?
You are right to have a join table between customer and loan.
But you do not need to put anything in this table until there is an actual borrow.
The primary key for the borrow table should be a composite primary key made of customer_id and loan_id.
Customer
customer_id | name | ...
1 | Jon | ...
2 | Harry | ...
Loan
loan_id | amount | ...
1 | 1000 | ...
2 | 2000 | ...
Borrow
customer_id | loan_id
1 | 1
1 | 2
In this example you can see that Jon has two loans, so there are two records for him in the borrow table. Harry is a customer, but he has no loan, so there is no record in the borrow table for him.
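The same example as table definitions, as a rough sketch. The column types and constraint names are assumptions; the data is the sample data from the tables above.

```sql
CREATE TABLE dbo.Customer
(
    customer_id int           NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    name        nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Loan
(
    loan_id int            NOT NULL CONSTRAINT PK_Loan PRIMARY KEY,
    amount  decimal(12, 2) NOT NULL
);

-- Join table: both columns are NOT NULL foreign keys,
-- and together they form the composite primary key.
CREATE TABLE dbo.Borrow
(
    customer_id int NOT NULL
        CONSTRAINT FK_Borrow_Customer REFERENCES dbo.Customer (customer_id),
    loan_id     int NOT NULL
        CONSTRAINT FK_Borrow_Loan REFERENCES dbo.Loan (loan_id),
    CONSTRAINT PK_Borrow PRIMARY KEY (customer_id, loan_id)
);

INSERT INTO dbo.Customer (customer_id, name) VALUES (1, 'Jon'), (2, 'Harry');
INSERT INTO dbo.Loan (loan_id, amount) VALUES (1, 1000), (2, 2000);
INSERT INTO dbo.Borrow (customer_id, loan_id) VALUES (1, 1), (1, 2);

-- Customers without a loan are found by their absence from Borrow.
SELECT c.customer_id, c.name
FROM dbo.Customer AS c
LEFT JOIN dbo.Borrow AS b
    ON b.customer_id = c.customer_id
WHERE b.customer_id IS NULL;   -- returns Harry
```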
Every table (base or query result) has a parameterized statement (aka predicate):
customer [customer_id] has taken out loan [loan_id]
Borrows(customer_id,loan_id)
When you plug in a row like VALUES (customer_id,loan_id) (8,3) you get a statement (aka proposition):
customer 8 has taken loan 3
The rows that make true statements go in the table. The rows that make false statements stay out of the table. So every row that fits in a table makes a statement whether it is in it or not!
The table predicate corresponds to an application relationship where parameters correspond to columns. A row says something about those values and about identified application entities via them.
You pick the application relationships ie table predicates. Then you look at an application situation and put every true row into the tables. Or you look at the tables and see what things are true (per present rows) and false (per absent rows).
Queries also have predicates per their conditions and their logical and relational operators. And their results hold the rows that make them true.
So when someone hasn't taken a loan their customer_id doesn't appear in any row in Borrows. And when a loan has not been taken by anyone then its loan_id doesn't appear in any row of Borrows.
If a column can be null then its table's predicate often looks like:
[name] IS NULL AND [customer_id] identifies a customer
OR [name] IS NOT NULL
AND [customer_id] identifies a customer
AND customer [customer_id] is named [name]
Customer(customer_id NOT NULL,name NULL)
(Using NULL in other ways gets even more complicated. And we try to remove NULLs in queries as near to when they're introduced as possible.)
We determine candidate keys as usual and pick one as the primary key as usual. E.g. the key for Borrows is (customer_id,loan_id) because that set's values are unique and there is no smaller unique subset. But determining keys involves columns that are UNIQUE NOT NULL (which PRIMARY KEY is just a synonym for as a constraint). But we don't ever need to use NULL in a column because instead of a predicate/table like the above we can have two:
[customer_id] identifies a customer
Customer(customer_id NOT NULL)
customer [customer_id] is named [name]
Customer(customer_id NOT NULL,name NOT NULL)
Just like always a row goes in a table if and only if it makes a true statement.

Resetting primary key without deleting/truncating table

I have a table with a primary key. For some reason I don't understand, when I insert data it ends up with gaps in the key, like this:
Pk_Col Some_Other_Col
1 A
2 B
3 C
1002 D
1003 E
1901 F
Is there any way I can reset my table like below, without deleting/ truncating the table?
Pk_Col Some_Other_Col
1 A
2 B
3 C
4 D
5 E
6 F
You can't update the IDENTITY column so DELETE/INSERT is the only way. You can reseed the IDENTITY column and recreate the data, like this:
DBCC CHECKIDENT ('dbo.tbl',RESEED,0);
INSERT INTO dbo.tbl (Some_Other_Col)
SELECT Some_Other_Col
FROM (DELETE FROM tbl OUTPUT deleted.*) d;
That assumes there are no foreign keys referencing this data.
If you really, really want to have neat identity values you can write a cursor (slow but maintainable) or investigate any number of "how can I find gaps in my sequence" questions on SO and perform an UPDATE accordingly (runs faster but tricky to get right). This becomes much harder when you start having foreign keys pointing back to this table. Be prepared to re-run this script any time data is put into, or removed from, this table.
Edit: IDENTITY columns cannot be updated per se. You can, however, SET IDENTITY_INSERT dbo.MyTable ON, INSERT a row with the desired IDENTITY value and the values from the other columns of an existing row, then DELETE the existing row. The net effect on the data is the same as an UPDATE.
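A sketch of that insert-then-delete approach for a single row, moving key 1002 to 4. The table definition is an assumption modelled on the question's columns, and it presumes no foreign keys reference the table.

```sql
-- Hypothetical table with the gap from the question (1, 2, 3, 1002).
CREATE TABLE dbo.MyTable
(
    Pk_Col         int IDENTITY(1,1) NOT NULL CONSTRAINT PK_MyTable PRIMARY KEY,
    Some_Other_Col char(1) NOT NULL
);

SET IDENTITY_INSERT dbo.MyTable ON;
INSERT INTO dbo.MyTable (Pk_Col, Some_Other_Col)
VALUES (1, 'A'), (2, 'B'), (3, 'C'), (1002, 'D');
SET IDENTITY_INSERT dbo.MyTable OFF;

-- "Update" key 1002 to 4: copy the row under the new key, then delete the old one.
BEGIN TRANSACTION;

SET IDENTITY_INSERT dbo.MyTable ON;   -- an explicit column list is required while ON
INSERT INTO dbo.MyTable (Pk_Col, Some_Other_Col)
SELECT 4, Some_Other_Col
FROM dbo.MyTable
WHERE Pk_Col = 1002;
SET IDENTITY_INSERT dbo.MyTable OFF;

DELETE FROM dbo.MyTable WHERE Pk_Col = 1002;

COMMIT TRANSACTION;

-- The identity counter still remembers 1002; reseed so the next insert gets 5.
DBCC CHECKIDENT ('dbo.MyTable', RESEED, 4);
```

Only one table per session can have IDENTITY_INSERT switched on at a time, so it is switched off again right after the copy.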
The only sensible reason to do this is if your table has about two billion rows and you're about to run out of integers for your identity column. If that's the case you have a whole world of other stuff to worry about, too.
But seriously, listen to @Damien: don't worry about it.
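-- Drop the primary key constraint so the old column can be removed
ALTER TABLE #temp1
    DROP CONSTRAINT PK_Id;
ALTER TABLE #temp1
    DROP COLUMN Id;
-- Re-adding the identity column renumbers the existing rows from 1
-- (in no guaranteed order); the primary key is not re-created here
ALTER TABLE #temp1
    ADD Id int identity(1,1);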
Try this one.

Resources