I want to start off by saying that I am not a database guru, but I'm decent with the basics.
I have a set of IO data, uniquely identified by 'machineId' and 'ioId', that I'm storing in two tables: IOConfig, which uniquely identifies points (all the identifying information plus a primary key, ConfigId), and a data table, IOData, that contains samples of those points.
My table layouts below are to test using a primary key + index versus using just an index, so I know there is duplicate data.
Think of the IOConfig table as:
configId (PK), machineId, ioId, ioType
Think of the IOData table as:
sampleTimestamp, configId, machineId, ioId, value
If I use the ConfigId primary key, with an index on (configId, sampleTimestamp), my query is like this:
select * from IOData
where sampleTimestamp>=1520306916007000000 and sampleTimestamp<=1520351489939000000
and configId in (1112)
"0" "0" "0" "SEARCH TABLE IOData USING INDEX cfgIndexAnalogInput (configId=? AND sampleTimestamp>? AND sampleTimestamp<?)"
If I avoid using ConfigID the query is like this:
select * from IOData
where sampleTimestamp>=1520306916007000000 and sampleTimestamp<=1520351489939000000
and ioId in (1)
and machineId=1111
"0" "0" "0" "SEARCH TABLE IOData USING INDEX tsIndexAnalogInput (sampleTimestamp>? AND sampleTimestamp<?)"
Why don't I get the improvement seen with the first query and its (configId, sampleTimestamp) index when the second query runs against an index of (sampleTimestamp, machineId, ioId)? I ask because machineId and ioId are exactly what make a point unique and define the configId primary key, so one would expect the two to behave equivalently.
schema:
CREATE TABLE 'IOData'(
'sampleTimestamp' INTEGER,
'configId' INTEGER,
'machineId' INTEGER,
'ioId' INTEGER,
'value' REAL);
CREATE TABLE 'IOConfig'(
'configId' INTEGER PRIMARY KEY,
'machineId' INTEGER,
'ioId' INTEGER,
'ioType' INTEGER);
CREATE INDEX `Something` ON `IOData` (`sampleTimestamp` ASC,`machineId` ASC,`ioId` ASC)
CREATE INDEX cfgIndexAnalogInput ON IOData(configId,sampleTimestamp)
CREATE INDEX tsIndexAnalogInput ON IOData(sampleTimestamp)
Read Query Planning to understand how indexes work, and The SQLite Query Optimizer Overview to see what specific optimization will be applied.
In this case, the filter on sampleTimestamp uses inequality comparisons, so, according to section 1.0, that must be the last column in the index (either in an explicit index, or in a three-column primary key):
CREATE INDEX SomethingBetter ON IOData(machineId, ioId, sampleTimestamp);
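With that index in place, SQLite can seek on the two equality columns and scan only the timestamp range. EXPLAIN QUERY PLAN should then report something like the following (the exact text is my assumption, extrapolated from the plan format shown above):
"0" "0" "0" "SEARCH TABLE IOData USING INDEX SomethingBetter (machineId=? AND ioId=? AND sampleTimestamp>? AND sampleTimestamp<?)"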
I am going to insert 2.3 billion rows (2,300,000,000) from table_a into table_b. The schemas of table_a and table_b are identical; the only difference is that table_a doesn't have a primary key, while table_b has a four-column compound primary key and 0 rows of data. I encountered this error message after 24 hours:
Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 422223771074560. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
This is my compound PK in table_b, along with sample query code; any help would be appreciated.
column1: varchar(10), not null
column2: nvarchar(50), not null
column3: nvarchar(100), not null
column4: int, not null
Sample code
insert into table_b
select *
from table_a
where date < '2017-01-01' -- some filters here
According to the SQL Server documentation, creating a primary key includes creating a unique index on that same table:
When you create a PRIMARY KEY constraint, a unique index on the column, or columns, is automatically created. By default, this index is clustered; however, you can specify a nonclustered index when you create the constraint.
When the clustered index is not unique, each row gets what the docs call a "uniqueifier", which is 4 bytes long (i.e. ~2.14 billion combinations):
If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.
From this information and your error message we can tell two things:
There is a clustered index on the table
There is not a primary key on the table
Given the volume of data you're dealing with, I'm betting you have a Clustered Columnstore Index on the table, which in SQL Server 2014 cannot have a primary key on it.
One possible solution is to partition table_b on a particular column, one that has fewer than 15,000 unique values, per the limitations specified in the documentation. As a side note, the same partitioning effort could significantly reduce the run time of any queries using table_b, depending on which column is used in the partition function.
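A minimal sketch of that idea, assuming (hypothetically) that column4 has a modest number of distinct values and is an acceptable partitioning key; every name and boundary value below is illustrative, not taken from the original schema:
-- Partition function: boundary values are made up; up to ~15,000 are allowed
CREATE PARTITION FUNCTION pf_column4 (int)
    AS RANGE RIGHT FOR VALUES (1000, 2000, 3000);
-- Map every partition to the PRIMARY filegroup for simplicity
CREATE PARTITION SCHEME ps_column4
    AS PARTITION pf_column4 ALL TO ([PRIMARY]);
-- Rebuild the (assumed) clustered columnstore index on the partition scheme;
-- DROP_EXISTING assumes the current index is also named cci_table_b
CREATE CLUSTERED COLUMNSTORE INDEX cci_table_b
    ON table_b
    WITH (DROP_EXISTING = ON)
    ON ps_column4 (column4);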
You know that:
If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.
While it's unlikely that you will face an issue related to uniqueifiers, we have seen rare cases where a customer reaches the uniqueifier limit of 2,147,483,648, generating error 666.
And from this topic about the issue we have:
As of February 2018, the design goal for the storage engine is to not reset uniqueifiers during REBUILDs. As such, a rebuild of the index ideally would not reset uniquifiers, and the issue would continue to occur while inserting new data with a key value for which the uniquifiers were exhausted. But current engine behavior is different for one specific case: if you use the statement ALTER INDEX ALL ON <table> REBUILD WITH (ONLINE = ON), it will reset the uniqueifiers (across all versions from SQL Server 2005 to SQL Server 2017).
So, if this is the cause of your issue, you can add an additional integer column and build the index over it.
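A hedged sketch of that workaround; the column and index names are hypothetical, and it assumes the old non-unique clustered index can be dropped and rebuilt:
-- Add an ever-increasing integer column
ALTER TABLE table_b ADD row_seq bigint IDENTITY(1,1) NOT NULL;
-- After dropping the old clustered index, recreate it as unique over the
-- new column, so no uniqueifier is ever needed
CREATE UNIQUE CLUSTERED INDEX cx_table_b ON table_b (row_seq);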
I have a table with a two-column primary key, like below:
create table table1(key1 int NOT NULL,key2 int NOT NULL,content NVARCHAR(MAX), primary key(key1,key2))
I created an index on the table with this query:
CREATE unique INDEX index1 ON table1 (key1,key2);
and with this query I create the full-text index:
create fulltext index on table1 (content) key index index1;
but I get this error, because the key index must be single-column:
'index1' is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key.
and with a single-column index, I get a duplicate-key error when I insert a new row.
What should I do? I am using SQL Server and the EF ORM.
Update
I solved this problem by creating a computed column that returns unique data:
ALTER TABLE Table1 ADD indexKey AS cast(key1 as float) + cast((cast(key2 as float)/power(10,len(key2))) as float) PERSISTED not null
and I created my index on this column, and it works pretty well.
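For reference, the float arithmetic above can collide (for example, key2 = 23 and key2 = 230 both yield a fraction of .23), so a string-based computed column is a commonly suggested alternative; the names below are hypothetical:
-- Concatenate both keys with a separator; two distinct (key1, key2)
-- pairs can never produce the same string
ALTER TABLE Table1 ADD indexKey2 AS
    (CAST(key1 AS varchar(11)) + '_' + CAST(key2 AS varchar(11))) PERSISTED NOT NULL;
CREATE UNIQUE INDEX index2 ON Table1 (indexKey2);
-- then: create fulltext index on table1 (content) key index index2;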
I am using SQL Server 2012 and am creating a table that will have 8 columns, with the types below:
datetime
varchar(12)
varchar(6)
varchar(100)
float
float
int
datetime
Once a day (normally) there will be an upload of approximately 10,000 rows of data. Going forward it's possible this could grow to 100,000.
The rows will be unique if I group on the first three columns listed above. I have read that I can put a unique constraint on multiple columns, which will guarantee the rows are unique.
I think I'm correct in saying that a unique constraint by default sets up a non-clustered index. Would a clustered index be better, and can I assume this won't cause any issues once the table contains millions of rows?
My last question: am I right to say that, with the unique constraint applied, querying the data will be quicker (because of the non-clustered or clustered index) and uploading the data will be slower (which is fine)?
A unique index can be non-clustered.
A primary key is unique and can be clustered.
A clustered index is not unique by default.
A unique clustered index is unique :)
More information is available in this guide.
So, we should separate uniqueness from index keys.
If you need to keep data unique by some columns, create a unique constraint (unique index). You'll protect your data.
Also, you can create a primary key (PK) on your columns; they will be unique as well. But there is a difference: all other indexes will use the PK for referencing, so the PK must be as short as possible. So, my advice: create an identity column (int or bigint) and create the PK on it. Then create a unique index on your unique columns; a sketch follows below.
Querying data may become faster if you query on your unique columns; to query on other columns, you need to create other, specific indexes.
So: unique keys are for data consistency, indexes are for queries.
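A minimal sketch of that advice, using hypothetical names for the eight columns described in the question:
CREATE TABLE dbo.DailyUpload (
    UploadId    bigint IDENTITY(1,1) NOT NULL,  -- short surrogate key
    SampleTime  datetime     NOT NULL,
    Code        varchar(12)  NOT NULL,
    SubCode     varchar(6)   NOT NULL,
    Notes       varchar(100) NULL,
    Value1      float        NULL,
    Value2      float        NULL,
    Quantity    int          NULL,
    LoadedAt    datetime     NOT NULL,
    CONSTRAINT PK_DailyUpload PRIMARY KEY CLUSTERED (UploadId),
    -- the business key: guarantees the uploaded rows are unique
    CONSTRAINT UQ_DailyUpload UNIQUE NONCLUSTERED (SampleTime, Code, SubCode)
);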
I think I'm correct in saying that the unique constraint by default sets up non-clustered index
TRUE
Would a clustered index be better & assuming when the table starts to contain millions of rows this won't cause any issues?
(1) If you need to make (datetime, varchar(12), varchar(6)) unique, and
(2) if you or your application will access rows using datetime, or datetime + varchar(12), or datetime + varchar(12) + varchar(6) in the WHERE condition all the time,
then put the primary key on (datetime, varchar(12), varchar(6)); by default this enforces uniqueness and creates a clustered index on all three columns.
but as you commented above:
the queries will vary to be honest. I imagine most queries will make use of the first datetime column
and since you will deal with huge data and might join this table with other tables, it is better to have a surrogate key (an ever-increasing unique identifier) in the table, and to satisfy your SELECTs, have non-clustered indexes; see the sketch below.
Surrogate Key vs Business Key
NON-CLUSTERED INDEX
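Using the hypothetical dbo.DailyUpload table sketched in the previous answer, a dedicated non-clustered index could serve the datetime-led queries; the INCLUDE list is a guess at what such queries might select:
-- Seeks on the leading datetime column; included columns avoid key lookups
CREATE NONCLUSTERED INDEX IX_DailyUpload_SampleTime
    ON dbo.DailyUpload (SampleTime)
    INCLUDE (Code, SubCode, Value1, Value2);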
I am reusing portions of a PostgreSQL schema in a SQL Server database.
This is a snippet of my SQL statements:
CREATE TABLE pac_region
(id INTEGER NOT NULL PRIMARY KEY,
country_id INTEGER REFERENCES Country(id) ON UPDATE CASCADE ON DELETE NO ACTION,
name VARCHAR(256) NOT NULL
);
CREATE UNIQUE INDEX idxu_pac_region_name ON pac_region(country_id, name(32));
I want to specify that only the first 32 characters of the name need to be unique (when combined with country_id).
SSMS barfs at the (32) specification. What is the correct way to restrict the length of text used in a compound index, in T-SQL?
I don't think you can create an index on just part of a column the way you are trying.
Rather, you can create a persisted computed column and add the index on that column, like below.
Taken from Create Index on partial CHAR Column
alter table pac_region
add Computed_Name as cast(name as varchar(32)) persisted;
CREATE UNIQUE INDEX idxu_pac_region_name
ON pac_region(country_id, Computed_Name);
Or, probably, by creating an indexed view, sketched below.
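A sketch of the indexed-view variant, under the assumption that the goal is only to enforce uniqueness of (country_id, first 32 characters of name); the view and index names are made up:
CREATE VIEW dbo.v_pac_region_name32
WITH SCHEMABINDING
AS
SELECT country_id,
       CAST(name AS varchar(32)) AS name32  -- deterministic, precise expression
FROM dbo.pac_region;
GO
-- A unique clustered index on the view enforces the rule on the base table
CREATE UNIQUE CLUSTERED INDEX idxu_pac_region_name32
    ON dbo.v_pac_region_name32 (country_id, name32);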
Question about SQLite.
In the CREATE TABLE SQL, we can add UNIQUE constraints in either of two ways: as a column constraint or as a table constraint. My question is simple: do they work differently?
The only difference I could find was that, as a table constraint, a single constraint can cover multiple indexed columns.
Here is an example:
CREATE TABLE Example (
_id INTEGER PRIMARY KEY,
name TEXT UNIQUE ON CONFLICT REPLACE,
score INTEGER
)
and
CREATE TABLE Example (
_id INTEGER PRIMARY KEY,
name TEXT,
score INTEGER,
UNIQUE (name) ON CONFLICT REPLACE
)
Are they different?
In this case there is no difference.
However, you could create a unique constraint on the table that spans two different columns, like this:
CREATE TABLE Example (
_id INTEGER PRIMARY KEY,
name TEXT,
"index" INTEGER,  -- INDEX is an SQLite keyword, so it must be quoted here
score INTEGER,
UNIQUE (name, "index") ON CONFLICT REPLACE
)
Consult this post for further details:
SQLite table constraint - unique on multiple columns
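To see what ON CONFLICT REPLACE does in practice, here is a small made-up demonstration against that two-column variant:
INSERT INTO Example (name, "index", score) VALUES ('alice', 1, 10);
INSERT INTO Example (name, "index", score) VALUES ('alice', 1, 20);
-- The second insert collides on UNIQUE (name, "index"), so it replaces
-- the first row; the table now holds only ('alice', 1, 20).
SELECT name, "index", score FROM Example;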