I have a table with time entries with the following columns:
Id (PK)
Date
EmployeeId
State (state of the entry: New, Approved, etc.)
Quantity
And I would like to create an indexed view which groups time entries by day and employee. So I used:
CREATE VIEW dbo.Test1
WITH SCHEMABINDING
AS
SELECT
Date, EmployeeId, SUM(Quantity) AS Quantity, SUM(CASE WHEN State = 1 THEN Quantity ELSE NULL END) AS QuantityApproved, COUNT_BIG(*) AS RecordCount
FROM
dbo.TimeEntries
GROUP BY
EmployeeId, Date
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON dbo.Test1 (EmployeeId, Date);
GO
But when I try to make it an indexed view an error occurs:
Cannot create the clustered index "IDX_V1" on view "dbo.Test1" because the view references an unknown value (SUM aggregate of nullable expression). Consider referencing only non-nullable values in SUM. ISNULL() may be useful for this.
Obviously, using ISNULL would help in the case of the QuantityApproved column. But this is not a solution for me, as 0 may also indicate there are two records (Quantity = -1 and Quantity = 1) on the same day.
I could also use an auxiliary column holding the ABS value for this case, but having NULL there is very convenient, as I do not need to handle anything else.
Is there any other way to overcome this?
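One commonly suggested pattern (a sketch only; Test1_Inner, Test1_WithNulls, and RecordCount are hypothetical names) keeps every aggregate in the indexed view non-nullable, then restores the NULL semantics in an ordinary, non-indexed wrapper view by also materializing a count of the approved rows:

```sql
-- Indexed view: only non-nullable SUMs, plus COUNT_BIG(*) (required with GROUP BY).
CREATE VIEW dbo.Test1_Inner
WITH SCHEMABINDING
AS
SELECT
    Date, EmployeeId,
    SUM(Quantity) AS Quantity,
    SUM(ISNULL(CASE WHEN State = 1 THEN Quantity ELSE 0 END, 0)) AS QuantityApprovedRaw,
    SUM(CASE WHEN State = 1 THEN 1 ELSE 0 END) AS ApprovedCount, -- how many approved rows contributed
    COUNT_BIG(*) AS RecordCount
FROM dbo.TimeEntries
GROUP BY EmployeeId, Date;
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V1_Inner ON dbo.Test1_Inner (EmployeeId, Date);
GO
-- Wrapper view: put the NULL back when no approved rows existed at all,
-- so a sum of 0 from offsetting rows stays distinguishable from "no rows".
CREATE VIEW dbo.Test1_WithNulls
AS
SELECT
    Date, EmployeeId, Quantity,
    CASE WHEN ApprovedCount = 0 THEN NULL ELSE QuantityApprovedRaw END AS QuantityApproved
FROM dbo.Test1_Inner WITH (NOEXPAND);
GO
```

The ApprovedCount column is what disambiguates the two cases the question raises: NULL means no approved rows, 0 means approved rows that cancel out.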
I recently migrated my SQL 2019 database from a VM into Azure SQL.
I used the MS Data Migration tool, but unfortunately, it wouldn't migrate data from Temporal Tables.
So I just used the tool to create the table schemas and then used SSIS to move the data.
Since my existing history table had data in it, I wanted to keep the SysStartDate and SysEndDate fields. In order to do this, I had to disable SYSTEM_VERSIONING in my Azure SQL database as well as DROP the PERIOD on the table.
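For reference, disabling system versioning and dropping the period (using the same placeholder names as the statements further down) looks like this:

```sql
-- Detach the history table so it can be loaded directly, then drop the PERIOD
-- so the SysStartTime/SysEndTime columns will accept explicit values.
ALTER TABLE xxx.xxx SET (SYSTEM_VERSIONING = OFF);
ALTER TABLE xxx.xxx DROP PERIOD FOR SYSTEM_TIME;
```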
The data migration was a success so I re-created my PERIOD on the table but when I tried to enable SYSTEM_VERSIONING with a specified history table, I get the following error:
Msg 13573, Level 16, State 0, Line 34
Setting SYSTEM_VERSIONING to ON failed because history table 'xxxxxHistory' contains overlapping records.
I find this odd because the existing tables were originally joined as a temporal table so I don't understand why there would be a conflict now.
ALTER TABLE xxx.xxx
ADD PERIOD FOR SYSTEM_TIME(SysStartTime, SysEndTime)
ALTER TABLE xxx.xxx
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=xxx.xxxHistory))
I expect to get a working temporal table. Instead, I get the same Msg 13573 overlapping-records error shown above.
I ran the following query to identify the overlaps but I don't get any:
SELECT
xxxxKeyNumeric
,SysStartTime
,SysEndTime
FROM
xxxx.xxxxhistory o
WHERE EXISTS
(
SELECT
1
FROM
xxxx.xxxxhistory o2
WHERE
o2.xxxxKeyNumeric = o.xxxxKeyNumeric
AND o2.SysStartTime <= o.SysEndTime
AND o.SysStartTime <= o2.SysEndTime
AND o2.xxxxPK != o.xxxxPK
)
ORDER BY
o.xxxxKeyNumeric,
o.SysStartTime
I found this explanation for the error on a DBA's blog:
"There are multiple records for the same record with overlapping start and end dates. The end date for the last row in the history table should match the start date for the active record in the parent table."
This happened to me after switching the history table, touching a few rows, then trying to go back to the old history table.
UPDATE: Happened again, and this time the table had millions of rows. I had to write a query, comparing the start date and end date of every row in the history table.
Possible causes:
For every PK, the start and end dates of the history rows must not overlap. The query below will find this specific issue.
The end date of the latest history row for a PK is later than the start date of that PK's current row in the main table. It is possible to modify that query to check this.
Two rows with the same PK cover the same time interval. If they overlap by even a single millisecond, and someone requests that exact millisecond, SQL Server won't know which of the two versions is the correct one.
For the first issue:
select ant.*, post.*, DATEDIFF(day, ant.end_date, post.start_date)
from
(SELECT
PK_column
, start_date
, end_date
, ROW_NUMBER() OVER(PARTITION BY PK_column ORDER BY end_date desc, start_date desc) AS [current]
, (ROW_NUMBER() OVER(PARTITION BY PK_column ORDER BY end_date desc, start_date desc)) - 1 AS [previous]
FROM huge_table_HIST
) ant
inner join
(SELECT
PK_column
, start_date
, end_date
, ROW_NUMBER() OVER(PARTITION BY PK_column ORDER BY end_date desc, start_date desc) AS [current]
FROM huge_table_HIST
) post
ON ant.PK_column = post.PK_column AND ant.[previous] = post.[current]
WHERE ant.end_date > post.start_date
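For the second issue, the same windowing idea can be adapted (a sketch; huge_table is the assumed name of the main table alongside huge_table_HIST, and the column names are the same placeholders as above):

```sql
-- Find PKs whose latest history row ends after the active row's period start.
SELECT h.PK_column, h.end_date AS history_end, m.start_date AS active_start
FROM (
    SELECT PK_column, end_date,
           ROW_NUMBER() OVER (PARTITION BY PK_column ORDER BY end_date DESC) AS rn
    FROM huge_table_HIST
) h
JOIN huge_table m ON m.PK_column = h.PK_column
WHERE h.rn = 1              -- latest history row per PK
  AND h.end_date > m.start_date;
```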
Surprisingly, it doesn't fail if:
you have multiple rows with exactly the same start date and end date for the same PK. SQL Server seems to consider them a single point in time, instead of an interval. They will only appear if you request the exact millisecond in which they exist.
there are gaps between the end date of a history row and the start date of the next one. SQL Server considers that the PK simply didn't exist in that time interval.
Temporal tables depend on the temporal table's primary key values combined with SysStartTime to determine uniqueness in the history table.
This can very easily happen if you make changes to the primary key definition. Also, if your history table's fields corresponding to the temporal table's PK are not populated, or many / all are populated with a default value, overlaps are detected and you get that error.
Check that your PK is defined on the system versioned temporal table, then check that the corresponding values in your history table's primary key fields are correct (i.e. unique for any given PK & SysStartTime value.)
You may have to update the history table accordingly before applying the system versioning relationship again.
This error can also occur when there are multiple records per primary key with the same GENERATED ALWAYS AS ROW START or GENERATED ALWAYS AS ROW END value.
The following queries will help identify those records.
select ID
from dbo.HistoryTable
group by ID, SysStartTime
having count(*) > 1
select ID
from dbo.HistoryTable
group by ID, SysEndTime
having count(*) > 1
I have a specific need for a computed column called ProductCode
ProductId | SellerId | ProductCode
----------|----------|------------
1         | 1        | 000001
2         | 1        | 000002
3         | 2        | 000001
4         | 1        | 000003
ProductId is identity, increments by 1.
SellerId is a foreign key.
So my computed column ProductCode must look at how many products the seller already has, and be in the format 000000. The problem here is: how do I know which seller's products to count?
I've written the following T-SQL, but it doesn't take into account how many products a seller has:
ALTER TABLE dbo.Product
ADD ProductCode AS RIGHT('000000' + CAST(ProductId AS VARCHAR(6)) , 6) PERSISTED
You cannot have a computed column based on data outside of the current row that is being updated. The best you can do to make this automatic is to create an after-trigger that queries the entire table to find the next value for the product code. But in order to make this work you'd have to use an exclusive table lock, which will utterly destroy concurrency, so it's not a good idea.
I also don't recommend using a view, because it would have to calculate the ProductCode every time you read the table. This would be a huge performance-killer as well. Furthermore, by not saving the value in the database once and never touching it again, your product codes would be subject to spurious changes (as in the case of, say, deleting an erroneously-entered and never-used product).
Here's what I recommend instead. Create a new table:
dbo.SellerProductCode
SellerID LastProductCode
-------- ---------------
1 3
2 1
This table reliably records the last-used product code for each seller. On INSERT to your Product table, a trigger will update the LastProductCode in this table appropriately for all affected SellerIDs, and then update all the newly-inserted rows in the Product table with appropriate values. It might look something like the below.
See this trigger working in a Sql Fiddle
CREATE TRIGGER TR_Product_I ON dbo.Product FOR INSERT
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
DECLARE @LastProductCode TABLE (
SellerID int NOT NULL PRIMARY KEY CLUSTERED,
LastProductCode int NOT NULL
);
WITH ItemCounts AS (
SELECT
I.SellerID,
ItemCount = Count(*)
FROM
Inserted I
GROUP BY
I.SellerID
)
MERGE dbo.SellerProductCode C
USING ItemCounts I
ON C.SellerID = I.SellerID
WHEN NOT MATCHED BY TARGET THEN
INSERT (SellerID, LastProductCode)
VALUES (I.SellerID, I.ItemCount)
WHEN MATCHED THEN
UPDATE SET C.LastProductCode = C.LastProductCode + I.ItemCount
OUTPUT
Inserted.SellerID,
Inserted.LastProductCode
INTO @LastProductCode;
WITH P AS (
SELECT
NewProductCode =
L.LastProductCode + 1
- Row_Number() OVER (PARTITION BY I.SellerID ORDER BY P.ProductID DESC),
P.*
FROM
Inserted I
INNER JOIN dbo.Product P
ON I.ProductID = P.ProductID
INNER JOIN @LastProductCode L
ON P.SellerID = L.SellerID
)
UPDATE P
SET P.ProductCode = Right('000000' + Convert(varchar(6), P.NewProductCode), 6);
Note that this trigger works even if multiple rows are inserted. There is no need to preload the SellerProductCode table, either--new sellers will automatically be added. This will handle concurrency with few problems. If concurrency problems are encountered, proper locking hints can be added without deleterious effect as the table will remain very small and ROWLOCK can be used (except for the INSERT which will require a range lock).
Please do see the Sql Fiddle for working, tested code demonstrating the technique. Now you have real product codes that have no reason to ever change and will be reliable.
I would normally recommend using a view to do this type of calculation. The view could even be indexed if select performance is the most important factor (I see you're using persisted).
You cannot have a subquery in a computed column, which essentially means that you can only access the data in the current row. The only ways to get this count would be to use a user-defined function in your computed column, or triggers to update a non-computed column.
A view might look like the following:
create view ProductCodes as
select p.ProductId, p.SellerId,
(
select right('000000' + cast(count(*) as varchar(6)), 6)
from Product
where SellerID = p.SellerID
and ProductID <= p.ProductID
) as ProductCode
from Product p
One big caveat to your product numbering scheme, and a downfall for both the view and UDF options, is that we're relying upon a count of rows with a lower ProductId. This means that if a Product is inserted in the middle of the sequence, it would actually change the ProductCodes of existing Products with a higher ProductId. At that point, you must either:
Guarantee the sequencing of ProductId (identity alone does not do this)
Rely upon a different column that has a guaranteed sequence (still dubious, but maybe CreateDate?)
Use a trigger to get a count at insert which is then never changed.
I want to alter a table and add a column that is the sum of two other columns, and this column is auto computed when I add new data.
The syntax for a computed column specification is as follows:
column-name AS formula
If the column values are to be stored within the database, the PERSISTED keyword should be added to the syntax, as follows:
column-name AS formula PERSISTED
You didn't give an example, but if you wanted to add the column sumOfAAndB to calculate the sum of A and B, your syntax would look like:
ALTER TABLE tblExample ADD sumOfAAndB AS A + B
Hope that helps.
Rather than adding this column to the table, I would recommend using a View to calculate the extra column, and read from that.
Here is a tutorial on how to create views here:
http://odetocode.com/Articles/299.aspx
Your view query would look something like:
SELECT
ColumnA, ColumnB, (ColumnA+ColumnB) as ColumnC
FROM
[TableName]
You can use a view, but you may want to use a PERSISTED computed value if you don't want to incur the cost of computing the value each time you access the view.
e.g.
CREATE TABLE T1 (
a INT,
b INT,
operator CHAR,
c AS CASE operator
WHEN '+' THEN a+b
WHEN '-' THEN a-b
ELSE a*b
END
PERSISTED
) ;
See the SQL docs assuming you're using SQL Server of course.
How to create computed column during creation of new table:
CREATE TABLE PRODUCT
(
WORKORDERID INT NULL,
ORDERQTY INT NULL,
[ORDERVOL] AS CAST
(
CASE WHEN ORDERQTY < 10 THEN 'SINGLE DIGIT'
WHEN ORDERQTY >= 10 AND ORDERQTY < 100 THEN 'DOUBLE DIGIT'
WHEN ORDERQTY >= 100 AND ORDERQTY < 1000 THEN 'THREE DIGIT'
ELSE 'SUPER LARGE' END AS NVARCHAR(100)
)
)
INSERT INTO PRODUCT (WORKORDERID, ORDERQTY) VALUES (1,1),(2,-1),(3,11)
SELECT * FROM PRODUCT
First, you can insert all the data into the table, and then you can update it like this:
UPDATE table_name SET total = mark1 + mark2 + mark3;
Hope it helps.
I am designing a database with a single table for a special scenario I need to implement a solution for. The table will have several hundred million rows after a short time, but each row will be fairly compact. Even when there are a lot of rows, I need insert, update and select speeds to be nice and fast, so I need to choose the best indexes for the job.
My table looks like this:
create table dbo.Domain
(
Name varchar(255) not null,
MetricType smallint not null, -- very small range of values, maybe 10-20 at most
Priority smallint not null, -- extremely small range of values, generally 1-4
DateToProcess datetime not null,
DateProcessed datetime null,
primary key(Name, MetricType)
);
A select query will look like this:
select Name from Domain
where MetricType = @metricType
and DateProcessed is null
and DateToProcess < GETUTCDATE()
order by Priority desc, DateToProcess asc
The first type of update will look like this:
merge into Domain as target
using @myTablePrm as source
on source.Name = target.Name
and source.MetricType = target.MetricType
when matched then
update set
DateToProcess = source.DateToProcess,
Priority = source.Priority,
DateProcessed = case -- set to null if DateToProcess is in the future
when DateToProcess < DateProcessed then DateProcessed
else null end
when not matched then
insert (Name, MetricType, Priority, DateToProcess)
values (source.Name, source.MetricType, source.Priority, source.DateToProcess);
The second type of update will look like this:
update Domain
set DateProcessed = source.DateProcessed
from @myTablePrm source
where Name = source.Name and MetricType = @metricType
Are these the best indexes for optimal insert, update and select speed?
-- for the order by clause in the select query
create index IX_Domain_PriorityQueue
on Domain(Priority desc, DateToProcess asc)
where DateProcessed is null;
-- for the where clause in the select query
create index IX_Domain_MetricType
on Domain(MetricType asc);
Observations:
Your updates should use the PK
Why not use tinyint (range 0-255) to make the rows even narrower?
Do you need datetime? Can you use smalldatetime?
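Putting those observations together, a narrower version of the table might look like this (a sketch; smalldatetime has 1-minute precision, so check that's acceptable for your workload first):

```sql
create table dbo.Domain
(
    Name varchar(255) not null,
    MetricType tinyint not null,          -- 0-255 easily covers 10-20 values
    Priority tinyint not null,            -- 0-255 easily covers 1-4 values
    DateToProcess smalldatetime not null, -- 4 bytes instead of 8
    DateProcessed smalldatetime null,
    primary key (Name, MetricType)
);
```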
Ideas:
Your SELECT query doesn't have an index to cover it. You need one on (DateToProcess, MetricType, Priority DESC) INCLUDE (Name) WHERE DateProcessed IS NULL. Note: you'll have to experiment with key column order to get the best one.
You could extend that index into filtered indexes per MetricType too (keeping the DateProcessed IS NULL filter). I'd do this after the other one, when I have millions of rows to test with.
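That covering-index suggestion, written out (a sketch, with a hypothetical index name; as noted, experiment with the key column order):

```sql
-- Filtered covering index for the select query: seeks on the WHERE columns,
-- carries Name so the base table is never touched.
create index IX_Domain_Queue
on Domain (DateToProcess asc, MetricType asc, Priority desc)
include (Name)
where DateProcessed is null;
```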
I suspect that your best performance will come from having no indexes on Priority and MetricType. The cardinality is likely too low for the indexes to do much good.
An index on DateToProcess will almost certainly help, as there is likely to be high cardinality in that column and it is used in both a WHERE and an ORDER BY clause. I would start with that first.
Whether an index on DateProcessed will help is up for debate. That depends on what percentage of NULL values you expect for this column. Your best bet, as usual, is to examine the query plan with some real data.
In the table schema section you have highlighted that MetricType is part of the composite primary key, so it should definitely be indexed along with the Name column. As for the Priority and DateToProcess fields: since these will be present in a where clause, it can't hurt to have them indexed also. However, I don't recommend the WHERE DateProcessed IS NULL filter you have on that index; indexing just a subset of the data is not a good idea here, so remove the filter and index the whole of both those columns.
Consider the following table, which specifies fruit that is forbidden on given days of the week. DayOfWeek is nullable, where a NULL signifies that this type of fruit is forbidden on all days of the week.
Fruit DayOfWeek
----------------------
Kiwi NULL
Apples Monday
Strawberries Monday
Oranges Tuesday
Bananas Wednesday
Pineapple Thursday
Is it possible to implement a constraint on this table that prevents me from inserting the value (Kiwi, Monday), since Kiwis are already banned on Mondays (and every other day) by the existing (Kiwi, NULL) row?
Preferably this should be implemented without the use of triggers.
Unless you have a really good justification, you shouldn't change the meaning of NULL. NULL reflects that a value is unknown, so I would read this as "We don't know what day of the week kiwis are banned." I would instead change your logic to store a record for each day of the week that kiwi is banned on.
What if you need to write a query which says give me all forbidden fruit for monday. You need to write your query as
select * from BadFruit where DayOfWeek = 'Monday' or DayOfWeek is null
A more efficient and easier-to-understand query would eliminate the OR clause.
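With a row stored per banned day, the Monday query reduces to a single equality predicate (using the BadFruit name from the example above):

```sql
-- One row per (fruit, day): no special NULL handling, and the predicate
-- is a plain seek on DayOfWeek.
select * from BadFruit where DayOfWeek = 'Monday'
```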
I agree with others that I probably would aim for a different model than the one you've shown, but if you're sold on it, then the following seems to do what you want:
create table dbo.Weekdays (
DayOfWeek varchar(10) not null,
constraint PK_Weekdays PRIMARY KEY (DayOfWeek)
)
go
create table dbo.Exclusions (
Fruit varchar(20) not null,
DayOfWeek varchar(10) null,
/* PK? */
constraint FK_Exclusions FOREIGN KEY (DayOfWeek) references Weekdays (DayOfWeek)
)
I'm not sure what the PK for the Exclusions table should be; it's not obvious from what you've shown, but it's intended to be your table. We need to introduce the Weekdays table to make the later view work*. Now populate it:
insert into dbo.Weekdays (DayOfWeek)
select 'Monday' union all
select 'Tuesday' union all
select 'Wednesday' union all
select 'Thursday' union all
select 'Friday' union all
select 'Saturday' union all
select 'Sunday'
And your sample data:
insert into dbo.Exclusions (Fruit,DayOfWeek)
select 'Kiwi',NULL union all
select 'Apples','Monday' union all
select 'Strawberries','Monday' union all
select 'Oranges','Tuesday' union all
select 'Bananas','Wednesday' union all
select 'Pineapple','Thursday'
Now we create the view to implement your constraint:
create view dbo.Exclusions_NullExpanded
with schemabinding
as
select
e.Fruit,
wd.DayOfWeek
from
dbo.Exclusions e
inner join
dbo.Weekdays wd
on
e.DayOfWeek = wd.DayOfWeek or
e.DayOfWeek is null
go
create unique clustered index IX_Exclusions_NoDups on dbo.Exclusions_NullExpanded (Fruit,DayOfWeek)
And, if we try to insert the row you don't want us to:
insert into dbo.Exclusions (Fruit,DayOfWeek)
select 'Kiwi','Monday'
We get:
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.Exclusions_NullExpanded' with unique index 'IX_Exclusions_NoDups'.
The statement has been terminated.
*I initially tried to do this without introducing the Weekdays table, and have it appear inline in the view definition, as a subselect of literal rows. But you can't create the index that enforces the constraint we want on such a view.
I am personally not fond of the idea of NULL meaning "all", but changing this to store a row for each day instead of NULL is probably out of scope.
If a trigger is not an option, I would look at a CHECK CONSTRAINT and setting up a function that tests the condition you are trying to avoid.
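That approach might look something like the sketch below (hypothetical function and constraint names; be aware that a CHECK constraint calling a function that reads other rows is only evaluated for the row being changed, so it can be bypassed by later changes to the other rows it depends on):

```sql
-- Hypothetical helper: counts rows that ban this fruit on the given day,
-- treating a NULL DayOfWeek (on either side) as matching every day.
CREATE FUNCTION dbo.fn_FruitBanCount (@Fruit varchar(20), @DayOfWeek varchar(10))
RETURNS int
AS
BEGIN
    RETURN (
        SELECT COUNT(*)
        FROM dbo.Exclusions
        WHERE Fruit = @Fruit
          AND (DayOfWeek = @DayOfWeek OR DayOfWeek IS NULL OR @DayOfWeek IS NULL)
    );
END
GO
-- The inserted row itself counts once, so any overlap pushes the count above 1.
ALTER TABLE dbo.Exclusions
ADD CONSTRAINT CK_Exclusions_NoOverlap
CHECK (dbo.fn_FruitBanCount(Fruit, DayOfWeek) = 1);
GO
```

With this in place, inserting (Kiwi, Monday) alongside the existing (Kiwi, NULL) row makes the count 2 and fails the constraint, as does inserting a NULL row for a fruit that already has day-specific bans.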
Try this link http://www.java2s.com/Code/SQL/Select-Clause/SettingaUniqueConstraint.htm
For MySQL you can set unique constraints.