I know that the value of a RowVersion column is not useful in and of itself, except that it changes each time the row is updated. However, I was wondering whether it is useful for relative (inequality) comparisons.
If I have a table with a RowVersion column, are either of the following true:
Will all updates that occur simultaneously (either same update statement or same transaction) have the same value in the RowVersion column?
If I do update "A", followed by update "B", will the rows involved in update "B" have a higher value than the rows involved in update "A"?
Thanks.
From MSDN:
Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database. This counter is the database rowversion. This tracks a relative time within a database, not an actual time that can be associated with a clock. Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column.
http://msdn.microsoft.com/en-us/library/ms182776.aspx
As far as I understand, nothing ACTUALLY happens simultaneously in the system, which means all rowversions should be unique. I venture to say that they would be effectively useless if duplicates were allowed within the same table. Also lending credence to rowversions not being duplicated is MSDN's advice against using them as primary keys: not because it would cause violations, but because it would cause foreign key issues.
According to MSDN, "The rowversion data type is just an incrementing number..." so yes, later is larger.
To the question of how much it increments: MSDN states that "[rowversion] tracks a relative time within a database", meaning it is a database-wide counter rather than anything tied to a clock. This "time" reveals nothing about when exactly a row was inserted or modified, only when it was inserted or modified relative to other rows.
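Incidentally, SQL Server exposes the counter directly, which can help when experimenting with this (both functions below are documented built-ins):

SELECT @@DBTS;                    -- the most recently used rowversion value in the current database
SELECT MIN_ACTIVE_ROWVERSION();   -- the lowest rowversion value still active in the current database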
Some additional information.
RowVersion converts nicely to bigint, which makes for more readable output when debugging:
CREATE TABLE [dbo].[T1](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Value] [nvarchar](50) NULL,
    [RowVer] [timestamp] NOT NULL
)

-- Three single-row inserts: each one gets its own rowversion
insert into t1 ([value]) values ('a')
insert into t1 ([value]) values ('b')
insert into t1 ([value]) values ('c')

select Id, Value, CONVERT(bigint, RowVer) as RowVer from t1

-- Updating one row bumps only that row's rowversion
update t1 set [value] = 'x' where id = 3

select Id, Value, CONVERT(bigint, RowVer) as RowVer from t1

-- A single update statement touching all rows still gives each row a distinct value
update t1 set [value] = 'y'

select Id, Value, CONVERT(bigint, RowVer) as RowVer from t1
Id Value RowVer
1 a 2037
2 b 2038
3 c 2039
Id Value RowVer
1 a 2037
2 b 2038
3 x 2040
Id Value RowVer
1 y 2041
2 y 2042
3 y 2043
I spent ages trying to sort something out with this - asking for rows updated after a particular sequence number. The timestamp is really just a sequence number - note that it is also big-endian, while C# functions like BitConverter.ToInt64 expect little-endian.
I ended up creating a DB view on the table I want data from, with an alias column 'SequenceNo':
CREATE VIEW dbo.UserV
AS
SELECT ID, CONVERT(bigint, Timestamp) AS SequenceNo
FROM dbo.[User]
C# Code First sees the view (i.e. UserV) exactly like a normal table.
Then in my LINQ I can join the view and the parent table and compare against a sequence number:
var users = (from u in context.GetTable<User>()
join uv in context.GetTable<UserV>() on u.ID equals uv.ID
where mysequenceNo < uv.SequenceNo
orderby uv.SequenceNo
select u).ToList();
to get what I want - all the entries changed since the last time I checked.
What makes you think Timestamp data types are evil? The data type is very useful for concurrency checking. Linq-To-SQL uses this data type for this very purpose.
The answers to your questions:
1) No. The value is updated each time the row is updated. If you update the row, say, five times, each update increments the Timestamp value. Of course, you realize that updates that "occur simultaneously" really don't - they still occur one at a time, in turn.
2) Yes.
Just as a note, timestamp is deprecated in SQL Server 2008 onwards. rowversion should be used instead.
From this page on MSDN:
The timestamp syntax is deprecated. This feature will be removed in a
future version of Microsoft SQL Server. Avoid using this feature in
new development work, and plan to modify applications that currently
use this feature.
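For new work, the only change needed is the type name in the column declaration. A minimal sketch (the table and column names here are made up, mirroring the T1 example above):

CREATE TABLE dbo.T2 (
    Id INT IDENTITY(1,1) NOT NULL,
    Value NVARCHAR(50) NULL,
    RowVer rowversion    -- same 8-byte auto-updating counter as timestamp, non-deprecated name
)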
Rowversion does break one of the "idealistic" approaches of SQL - that an UPDATE statement is a single, atomic action that acts as if all UPDATEs (both to all columns within a row and to all rows within the table) occur "at the same time". With rowversion, however, it is possible to determine that one row was updated at a slightly different "time" than another.
Note that the order in which rows are updated (by a single UPDATE statement) is not guaranteed - it may, by coincidence, follow the same order as the clustered key for the table, but I wouldn't count on that being true.
To answer part of your question: you can end up with duplicate values according to MSDN:
Duplicate rowversion values can be generated by using the SELECT INTO
statement in which a rowversion column is in the SELECT list. We do
not recommend using rowversion in this manner.
Source: rowversion (Transact-SQL)
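A minimal sketch of how such duplicates can arise, reusing the T1 table from the earlier answer (the name of the copy is made up):

-- The copied rows keep their original rowversion values,
-- so T1 and T1_Copy now share values.
SELECT Id, Value, RowVer
INTO dbo.T1_Copy
FROM dbo.T1;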
Every database has a counter that is incremented for every data modification made in the database. If the table containing the affected (updated or inserted) row has a timestamp/rowversion column, the current counter value of the database is stored in that column of the updated or inserted record.
I'm trying to insert records into a table in a certain (and simple) order, as the table has an IDENTITY column, e.g. MyTbl (ID INT IDENTITY(1,1), Sale_Date DATE, Product_ID INT, Sales INT).
The query is quite simple (this is just a simplified example):
INSERT INTO MyTbl (Sale_Date, Product_ID, Sales)
SELECT Sale_Date, Product_ID,COUNT(*) as sales
FROM Fact_tbl
GROUP BY Sale_Date,Product_ID
ORDER BY Sale_Date,Product_ID
The expected behavior is that when I select the highest values of the identity ID column, I should see the latest Sale_Date. However, this is not the case: the order of the ID column has nothing to do with the dates. To make things even worse, if I recreate the table and run the same INSERT statement again and again, I get a different insertion order each time for the same data.
I'm getting this behavior even if I wrap the query in a subquery and put the ORDER BY inside or outside the wrapper.
I never saw this behavior in any other SQL platform. Is this the expected behavior in Snowflake?
It's expected. Let me explain the reason:
AUTOINCREMENT and IDENTITY are synonymous. If either is specified for a column, Snowflake utilizes a sequence to generate the values for the column.
https://docs.snowflake.com/en/sql-reference/sql/create-table.html#optional-parameters
There is no guarantee that values from a sequence are contiguous (gap-free) or that the sequence values are assigned in a particular order. There is, in fact, no way to assign values from a sequence to rows in a specified order other than to use single-row statements (this still provides no guarantee about gaps).
https://docs.snowflake.com/en/user-guide/querying-sequences.html#sequence-semantics
With Snowflake each INSERT has completely different order than the
same INSERT that ran a couple of minutes ago
No, it should insert the data in the expected order, because you use an ORDER BY clause. The issue is that the sequence values are not assigned in any particular order!
It's not easy to verify whether the data is sorted when you use INSERT ... SELECT ... ORDER BY, unless you have access to the underlying metadata. For testing, you can define clustering keys on a table into which you ingested "sorted" data.
Anyway, if you want to assign IDs matching the order when inserting bulk data, you need to use ROW_NUMBER instead of using an IDENTITY column or any sequence values.
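A hedged sketch of that approach, assuming MyTbl is declared with a plain ID INT column rather than IDENTITY:

-- ROW_NUMBER is computed after the GROUP BY, so the IDs follow the sort order exactly
INSERT INTO MyTbl (ID, Sale_Date, Product_ID, Sales)
SELECT ROW_NUMBER() OVER (ORDER BY Sale_Date, Product_ID) AS ID,
       Sale_Date,
       Product_ID,
       COUNT(*) AS Sales
FROM Fact_tbl
GROUP BY Sale_Date, Product_ID;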
This is not expected behavior in Snowflake. However, the way you insert data into your table (with the ORDER BY) doesn't affect the order in which the data is stored inside the table. You can leave the ORDER BY out of the INSERT, but you should include it in your SELECT.
I'm trying to rank a series of transactions. However, my source data does not capture the time of a transaction, which can happen multiple times a day; the only other field I can use is a timestamp field. Will this be ranked correctly?
Here's the code
SELECT [LT].[StockCode]
, [LT].[Warehouse]
, [LT].[Lot]
, [LT].[Bin]
, [LT].[TrnDate]
, [LT].[TrnQuantity]
, [LT].[TimeStamp]
, LotRanking = Rank() Over (Partition By [LT].[Warehouse],[LT].[StockCode],[LT].[Lot] Order By [LT].[TrnDate] Desc, [LT].[TimeStamp] Desc)
From [LotTransactions] [LT]
Results being returned are as below
StockCode |Warehouse |Lot |Bin |TrnDate |TrnQuantity |TimeStamp |LotRanking
2090 |CB |3036 |CB |2016-02-16 00:00:00.000 |2.000000 |0x0000000000500AB9 |1
2090 |CB |3036 |CB |2016-02-16 00:00:00.000 |2.000000 |0x0000000000500A4E |2
First, you should be using rowversion rather than timestamp for keeping track of row-versioning information. I believe timestamp is deprecated; at the very least, the documentation explicitly suggests rowversion.
Second, I would strongly recommend that you add an identity column to the table. This will provide the information that you really need -- as well as a nice unique key for the table.
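A sketch of that recommendation against the table from the question (the new column's name is made up):

ALTER TABLE [LotTransactions]
ADD [LotTransactionId] INT IDENTITY(1,1) NOT NULL;

New rows then get ever-increasing IDs, so ORDER BY [LotTransactionId] DESC in the RANK() gives a reliable ordering going forward.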
In general, a timestamp or rowversion is used just to determine whether or not a row has changed -- not to determine the ordering. But, based on this description, what you are doing might be correct:
Each database has a counter that is incremented for each insert or
update operation that is performed on a table that contains a
timestamp column within the database. This counter is the database
timestamp. This tracks a relative time within a database, not an
actual time that can be associated with a clock. A table can have only
one timestamp column. Every time that a row with a timestamp column is
modified or inserted, the incremented database timestamp value is
inserted in the timestamp column.
I would caution that this might not be safe; the quote only gives a reason why such an approach might make sense. Let me repeat the recommendation: add an identity column so that you capture this ordering information correctly, at least for the future.
You can use something like this to get the datetime of a transaction:
SELECT LEFT(CONVERT(nvarchar(50), [LT].[TrnDate], 121), 10)
     + RIGHT(CONVERT(nvarchar(50), CAST([LT].[TimeStamp] AS datetime), 121), 13)
FROM [LotTransactions] [LT]
For the first row it will be:
2016-02-16 04:51:25.417
And use this for ranking.
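Putting that together with the RANK() from the question, a sketch that just swaps the derived value into the ORDER BY:

SELECT [LT].[StockCode]
     , [LT].[TrnDate]
     , [LT].[TimeStamp]
     , LotRanking = RANK() OVER (
           PARTITION BY [LT].[Warehouse], [LT].[StockCode], [LT].[Lot]
           -- order by the reconstructed date + time string, newest first
           ORDER BY LEFT(CONVERT(nvarchar(50), [LT].[TrnDate], 121), 10)
                  + RIGHT(CONVERT(nvarchar(50), CAST([LT].[TimeStamp] AS datetime), 121), 13) DESC)
FROM [LotTransactions] [LT]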
I'm using SQL Server 2014. My request I believe is rather simple. I have one table containing a field holding a date value that is stored as VARCHAR, and another table containing a field holding a date value that is stored as INT.
The date value in the VARCHAR field is stored like this: 2015M01
The data value in the INT field is stored like this: 201501
I need to compare these tables against each other using EXCEPT. My thought process was to somehow extract or TRIM the "M" out of the VARCHAR value and see if it would let me compare the two. If anyone has a better idea such as using CAST to change the date formats or something feel free to suggest that as well.
I am also concerned that even extracting the "M" out of the VARCHAR may still prevent the comparison, since one value would still be VARCHAR and the other INT. If it's possible to convert on the fly in a T-SQL query, that would be great advice as well. :)
REPLACE the string and then CONVERT to integer
SELECT A.*, B.*
FROM TableA A
INNER JOIN
(SELECT intField
FROM TableB
) as B
ON CONVERT(INT, REPLACE(A.varcharField, 'M', '')) = B.intField
Since you say you already have the query and are using EXCEPT, you can simply change the definition of that one "date" field in the query containing the VARCHAR value so that it matches the INT format of the other query. For example:
SELECT Field1, CONVERT(INT, REPLACE(VarcharDateField, 'M', '')) AS [DateField], Field3
FROM TableA
EXCEPT
SELECT Field1, IntDateField, Field3
FROM TableB
HOWEVER, while I realize that this might not be feasible, your best option, if you can make this happen, would be to change how the data in the table with the VARCHAR field is stored so that it is actually an INT in the same format as the table with the data already stored as an INT. Then you wouldn't have to worry about situations like this one.
Meaning:
Add an INT field to the table with the VARCHAR field.
Do an UPDATE of that table, setting the INT field to the string value with the M removed.
Update any INSERT and/or UPDATE stored procedures used by external services (app, ETL, etc) to do that same M removal logic on the way in. Then you don't have to change any app code that does INSERTs and UPDATEs. You don't even need to tell anyone you did this.
Update any "get" / SELECT stored procedures used by external services (app, ETL, etc) to do the opposite logic: convert the INT to VARCHAR and add the M on the way out. Then you don't have to change any app code that gets data from the DB. You don't even need to tell anyone you did this.
This is one of many reasons that having a stored procedure API to your DB is quite handy. I suppose an ORM can just be rebuilt, but you still need to recompile, even if all of the code references are automatically updated. But making a datatype change (or even moving a field to a different table, or replacing a field with a simple CASE statement) "behind the scenes", and masking it so that any code outside of your control doesn't know that a change happened, is not nearly as difficult as most people might think. I have done all of these operations (datatype change, moving a field to a different table, replacing a field with simple logic, etc.), and it buys you a lot of time until the app code can be updated. That might be another team who handles that. Maybe their schedule won't allow for making any changes in that area (plus testing) for 3 months. OK - it will be there waiting for them when they are ready. And if there are several areas to update, they can be done one at a time. You can even create new stored procedures to run in parallel, so any updated app code can pass the proper INT datatype as the input parameter. And once all references to the VARCHAR value are gone, delete the original versions of those stored procedures.
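As an illustration of the "add the M back on the way out" step, a hedged sketch using the column names from the EXCEPT example above:

-- STUFF inserts 'M' at position 5, deleting nothing: 201501 becomes '2015M01'
SELECT STUFF(CAST(IntDateField AS VARCHAR(6)), 5, 0, 'M') AS VarcharDateField
FROM TableB;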
If you want everything in the first table that is not in the second, you might consider something like this:
select t1.*
from t1
where not exists (select 1
from t2
where cast(replace(t1.varcharfield, 'M', '') as int) = t2.intfield
);
This should be close enough to EXCEPT for your purposes.
I should add that you might need to include other columns in the where statement. However, the question only mentions one column, so I don't know what those are.
You could create a persisted view on the table with the char column, with a calculated column where the M is removed. Then you could JOIN the view to the table containing the INT column.
CREATE VIEW dbo.PersistedView
WITH SCHEMABINDING
AS
SELECT ConvertedDateCol = CONVERT(INT, REPLACE(VarcharCol, 'M', ''))
--, other columns including the PK, etc
FROM dbo.TablewithCharColumn;

-- Indexing a view requires a UNIQUE clustered index, typically on the PK
CREATE UNIQUE CLUSTERED INDEX IX_PersistedView
ON dbo.PersistedView(<the PK column>);
SELECT *
FROM dbo.PersistedView pv
INNER JOIN dbo.TableWithIntColumn ic ON pv.ConvertedDateCol = ic.IntDateCol;
If you provide the actual details of both tables, I will edit my answer to make it clearer.
A persisted view with a computed column will perform far better on the SELECT statement where you join the two columns compared with doing the CONVERT and REPLACE every time you run the SELECT statement.
However, a persisted view will slightly slow down inserts into the underlying table(s), and will prevent you from making DDL changes to the underlying tables.
If you're looking to not persist the values via a schema-bound view, you could create a non-persisted computed column on the table itself, then create a non-clustered index on that column. If you are using the computed column in WHERE or JOIN clauses, you may see some benefit.
By way of example:
CREATE TABLE dbo.PCT
(
    PCT_ID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_PCT
        PRIMARY KEY CLUSTERED
    , SomeChar VARCHAR(50) NOT NULL
    , SomeCharToInt AS CONVERT(INT, REPLACE(SomeChar, 'M', ''))
);
CREATE INDEX IX_PCT_SomeCharToInt
ON dbo.PCT(SomeCharToInt);
INSERT INTO dbo.PCT(SomeChar)
VALUES ('2015M08');
SELECT SomeCharToInt
FROM dbo.PCT;
Results:
SomeCharToInt
-------------
201508
I have the following table:
tbl_ProductCatg
Id IDENTITY
Code
Description
a few more.
The Id field is auto-incremented, and I have to insert its value into the Code field as well, formatted to a length of four:
if the generated Id is 1, the value inserted into Code should be 0001; if the Id is 77, the Code should be 0077.
For this, I made the query like:
insert into tbl_ProductCatg(Code,Description)
values(RIGHT('000'+ltrim(Str(SCOPE_IDENTITY()+1,4)),4),'testing')
This query runs well in SQL Server Query Analyzer, but if I run it from C# it inserts NULL into Code, even though the Id field is populated correctly.
Thanks
You may want to look at Computed Columns (Definition)
From what it sounds like you are trying to do, this would work well for you.
CREATE TABLE tbl_ProductCatg
(
ID INT IDENTITY(1, 1)
, Code AS RIGHT('000' + CAST(ID AS VARCHAR(4)), 4)
, Description NVARCHAR(128)
)
or
ALTER TABLE tbl_ProductCatg
ADD Code AS RIGHT('000' + CAST(id AS VARCHAR(4)), 4)
You can also make the column be PERSISTED so it is not calculated every time it is referenced.
Marking a column as PERSISTED Specifies that the Database Engine will physically store the computed values in the table, and update the values when any other columns on which the computed column depends are updated.
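A sketch of the persisted variant (shown as if the column had not already been added; PERSISTED is allowed here because the expression is deterministic):

ALTER TABLE tbl_ProductCatg
ADD Code AS RIGHT('000' + CAST(ID AS VARCHAR(4)), 4) PERSISTED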
Unfortunately SCOPE_IDENTITY isn't designed to be used during an insert so the value will not be populated until after the insert happens.
I can see three ways of doing this. The first is a stored procedure that performs the insert and then updates the Code field:
insert into tbl_ProductCatg (Code, Description) values (NULL, 'testing')
update tbl_ProductCatg set Code = RIGHT('000' + LTRIM(STR(SCOPE_IDENTITY(), 4)), 4) where Id = SCOPE_IDENTITY()
The second option, is taking this a step further and making this into a trigger which runs on UPDATE and INSERT. I've always been taught to avoid triggers where possible and instead do things at the SP level, but triggers are justified in some cases.
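A minimal sketch of that trigger variant, assuming the identity column is named Id as in the question:

CREATE TRIGGER trg_ProductCatg_SetCode
ON tbl_ProductCatg
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- "inserted" holds the new rows, so the identity values are already known here
    UPDATE p
    SET Code = RIGHT('000' + CAST(i.Id AS VARCHAR(4)), 4)
    FROM tbl_ProductCatg p
    INNER JOIN inserted i ON p.Id = i.Id;
END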
The third option is computed fields, as described by @Adam Wenger.
I have a customer table, and my requirement is to add a new varchar column that automatically obtains a random unique value each time a new customer is created.
I thought of writing an SP that randomizes a string, then check and re-generate if the string already exists. But to integrate the SP into the customer record creation process would require transactional SQL stuff at code level, which I'd like to avoid.
Help please?
edit:
I should've emphasized: the varchar has to be 5 characters long, with numeric values between 1000 and 99999; if the number is less than 10000, pad with 0 on the left.
If it has to be varchar, you can cast a uniqueidentifier to varchar.
To get a random uniqueidentifier, call NEWID().
Here's how you cast it:
CAST(NewId() as varchar(36))
EDIT
As per your comment to @Brannon: are you saying you'll NEVER have over 99k records in the table? If so, just make your PK an identity column, seed it with 1000, and handle the left zero-padding in your business logic.
This question gives me the same feeling I get when users won't tell me what they want done, or why, they only want to tell me how to do it.
"Random" and "Unique" are conflicting requirements unless you create a serial list and then choose randomly from it, deleting the chosen value.
But what's the problem this is intended to solve?
With your edit/update, sounds like what you need is an auto-increment and some padding.
Below is an approach that uses a sample table, then adds an IDENTITY column (assuming that you don't have one) which starts at 1000, and then uses a computed column to provide the padding so everything works out as you requested.
CREATE TABLE Customers (
CustomerName varchar(20) NOT NULL
)
GO
INSERT INTO Customers
SELECT 'Bob Thomas' UNION
SELECT 'Dave Winchel' UNION
SELECT 'Nancy Davolio' UNION
SELECT 'Saded Khan'
GO
ALTER TABLE Customers
ADD CustomerId int IDENTITY(1000,1) NOT NULL
GO
ALTER TABLE Customers
ADD SuperId AS right(replicate('0',5)+ CAST(CustomerId as varchar(5)),5)
GO
SELECT * FROM Customers
GO
DROP TABLE Customers
GO
I think Michael's answer with the auto-increment should work well - your customer will get "01000" and then "01001" and then "01002" and so forth.
If you want to or have to make it more random, I'd suggest you create a table that contains all possible values, from "01000" through "99999". When you insert a new customer, pick one of the remaining rows from that table at random (your pool of still-available customer IDs), use it, and remove it from the pool.
Anything else will degrade badly over time. Imagine you've used up 90% or 95% of your available customer IDs - trying to randomly find one of the few remaining possibilities could lead to an almost endless retry loop of "is this one taken? Yes -> try the next one".
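A hedged sketch of drawing one ID from such a pool (the pool table's name is made up):

DECLARE @Picked TABLE (CustomerId CHAR(5));

-- Pick one random row from the pool and remove it in a single statement
WITH RandomPick AS (
    SELECT TOP (1) CustomerId
    FROM dbo.AvailableCustomerIds
    ORDER BY NEWID()
)
DELETE FROM RandomPick
OUTPUT deleted.CustomerId INTO @Picked;

SELECT CustomerId FROM @Picked;  -- assign this to the new customer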
Marc
Does the random string data need to be a certain format? If not, why not use a uniqueidentifier?
insert into Customer ([Name], [UniqueValue]) values (@Name, NEWID())
Or use NEWID() as the default value of the column.
EDIT:
I agree with @rm: use a numeric value in your database, and handle the conversion to string (with padding, etc.) in code.
Try this:
ALTER TABLE Customer ADD AVarcharColumn varchar(50)
CONSTRAINT DF_Customer_AVarcharColumn DEFAULT CONVERT(varchar(50), GETDATE(), 109)
It returns a date and time up to milliseconds, which would be enough in most cases.
Do you really need a unique value?