TSQL Ranking using timestamp and datetime - sql-server

I'm trying to rank a series of transactions, however my source data does not capture the time of a transaction which can happen multiple times a day, the only other field I can use is a timestamp field - will this be ranked correctly?
Here's the code
SELECT [LT].[StockCode]
, [LT].[Warehouse]
, [LT].[Lot]
, [LT].[Bin]
, [LT].[TrnDate]
, [LT].[TrnQuantity]
, [LT].[TimeStamp]
, LotRanking = Rank() Over (Partition By [LT].[Warehouse],[LT].[StockCode],[LT].[Lot] Order By [LT].[TrnDate] Desc, [LT].[TimeStamp] Desc)
From [LotTransactions] [LT]
Results being returned are as below
StockCode |Warehouse |Lot |Bin |TrnDate |TrnQuantity |TimeStamp |LotRanking
2090 |CB |3036 |CB |2016-02-16 00:00:00.000 |2.000000 |0x0000000000500AB9 |1
2090 |CB |3036 |CB |2016-02-16 00:00:00.000 |2.000000 |0x0000000000500A4E |2

First, you should be using rowversion rather than timestamp for keeping track of row versioning information. I believe timestamp is deprecated. At the very least, the documentation explicitly suggests [rowversion][1].
Second, I would strongly recommend that you add an identity column to the table. This will provide the information that you really need -- as well as a nice unique key for the table.
In general, a timestamp or rowversion is used just to determine whether or not a row has changed -- not to determine the ordering. But, based on this description, what you are doing might be correct:
Each database has a counter that is incremented for each insert or
update operation that is performed on a table that contains a
timestamp column within the database. This counter is the database
timestamp. This tracks a relative time within a database, not an
actual time that can be associated with a clock. A table can have only
one timestamp column. Every time that a row with a timestamp column is
modified or inserted, the incremented database timestamp value is
inserted in the timestamp column.
I would caution that this might not be safe. Instead, it gives a reason why such an approach might make sense. Let me repeat the recommendation: add an identity column, so you are correctly adding this information, at least for the future.

You can use something like this to get datetime of transaction:
SELECT LEFT(CONVERT(nvarchar(50),[LT].[TrnDate],121),10) + RIGHT(CONVERT(nvarchar(50),CAST([LT].[TimeStamp] as datetime),121),13)
For first string it will be:
2016-02-16 04:51:25.417
And use this for ranking.

Related

SQL Server: Slowly Changing Dimension Type 2 on historical records

I am trying to set up a SCD of Type 2 for historical records within my Customer table. Attached is how the Customer table is set up alongside the expected outcome. Note that the Customer table in practice has 2 million distinct Customer IDs. I tried to use the query below, but the Start_Date and End_Date are repeating for each row.
SELECT t.Customer_ID, t.Lifecyle_ID, t.Date As Start_Date,
LEAD(t.Date) OVER (ORDER BY t.Date) AS End_Date
FROM Customer AS t
I think a three step query is likely needed.
Use LEAD and LAG, partitioned by Customer and ordered by date, to peek at the next row's values for both Date and Lifecycle.
Use a CASE statement to emit a value for End Date when the current row's Lifecycle <> the next row's lifecycle (otherwise emit NULL). Now do the same using LAG for the Effective Date.
Group By or Distinct on the output from Step #2.
Hopefully that makes sense. I'll try to post a code example later today, but hopefully that's enough to get you started.

Add column to track percent/rate of change

I have a table which captures backup file sizes and creation dates. It is my hope that I can track down which backup files are growing at a faster rate than others. How can I setup a query to add a column to track the percentage growth or rate of growth?
Example data:
Filename CreationDate Size
DB1 2017-06-19 13:00:28.450 96480
DB1 2017-06-20 13:00:36.627 97568
DB2 2017-06-18 22:00:00.800 19672
DB2 2017-06-19 22:00:00.370 19672
DB2 2017-06-20 22:00:00.913 19672
DB3 2017-06-18 22:00:04.520 17872840
DB3 2017-06-19 22:00:05.183 17873864
DB3 2017-06-20 22:00:06.400 17878984
Edit: I would like to have 2 separate new columns which track the percentage change. Column1 will be the percentage change since the most recent (i.e. change between yesterday and today). Column2 will be the percentage change cine the oldest file on record (i.e. change between today and the oldest date for the specific filename). Hope that helps. Also not sure why I got downvoted for this question.
tSQL... assuming SQL server. You have window functions available that can look ahead at the next record in the partition(filename) allowing you to do the percentages in a query.
SELECT td.fileName
, td.creationdate
, td.size
, lag(td.size) over (partition by td.filename order by td.creationdate asc) / td.size as PercentChange
FROM TableData td
The "magic" is here: lag(td.size) over (partition by td.filename order by td.creationdate asc)
This basically says return the size of the record having the same filename as this one but a creationdate before the record being evaluated.
Using this we can get the prior value if one exists. and then do some basic math. I wasn't sure how you wanted to do the % if it's New size / old or something else... Note if size is an integer we may have integer math involved so you may have to cast size to a decimal value of sufficient size (if you need the decimals)

Creating a function to add a calculated column to a table in SQL Server based on a query

I have 2 database tables called Spend, and VendorSpend. The columns used in the Spend table are called VendorID, VendorName, RecordDate, and Charges. The VendorSpend table contains VendorID and VendorName but with distinct data (one record for each unique VendorID). I need a simple way to add a column to the VendorSpend table called Aug2015, this column will contain the SUM of each Vendor's charges within that month time period. It will be calculated based on this query:
Select Sum(Charges)
from Spend
where RecordDate >= '2015-08-01' and RecordDate <= '2015-08-31'
Keep in mind this will need to be called whenever new data is inserted into the Spend table and the VendorSpend table will need to update based on the new data. This will happen every month so actually a new column will need to be added and the data be calculated every month.
Any assistance is greatly appreciated.
Create a user-defined function that you pass a VendorID and Date to and which does your SELECT:
Select Sum(Charges)
from Spend
where VendorID=#VendorID
AND DATEDIFF(month, RecordDate, #Date) = 0
Now personally, I would stop right there and use the function to select your data at query time, rather than adding a new column to your table.
But treating your question as academic, you can create a computed column called [Aug2015] in VendorSpend that passes [VendorID] and '08/01/2015' to this function and it will contain the desired result.

SQL Server RowVersion/Timestamp - Comparisons

I know that the value itself for a RowVersion column is not in and of itself useful, except that it changes each time the row is updated. However, I was wondering if they are useful for relative (inequality) comparison.
If I have a table with a RowVersion column, are either of the following true:
Will all updates that occur simultaneously (either same update statement or same transaction) have the same value in the RowVersion column?
If I do update "A", followed by update "B", will the rows involved in update "B" have a higher value than the rows involved in update "A"?
Thanks.
From MSDN:
Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database. This counter is the database rowversion. This tracks a relative time within a database, not an actual time that can be associated with a clock. Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column.
http://msdn.microsoft.com/en-us/library/ms182776.aspx
As far as I understand, nothing ACTUALLY happens simultaneously in the system. This means that all rowversions should be unique. I venture to say that they would be effectively useless if duplicates were allowed within the same table. Also giving credance to rowversions not being duplicated is MSDN's stance on not using them as primary keys not because it would cause violations, but because it would cause foreign key issues.
According to MSDN, "The rowversion data type is just an incrementing number..." so yes, later is larger.
To the question of how much it increments, MSDN states, "[rowversion] tracks a relative time within a database" which indicates that it is not a fluid integer incrementing, but time based. However, this "time" reveals nothing of when exactly, but rather when in relation to other rows a row was inserted/modified.
Some additional information.
RowVersion converts nicely to bigint and thus one can display better readable output when debugging:
CREATE TABLE [dbo].[T1](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](50) NULL,
[RowVer] [timestamp] NOT NULL
)
insert into t1 ([value]) values ('a')
insert into t1 ([value]) values ('b')
insert into t1 ([value]) values ('c')
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'x' where id = 3
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'y'
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
Id Value RowVer
1 a 2037
2 b 2038
3 c 2039
Id Value RowVer
1 a 2037
2 b 2038
3 x 2040
Id Value RowVer
1 y 2041
2 y 2042
3 y 2043
I spent ages trying to sort something out with this - to ask for columns updated after a particular sequence number. The timestamp is really just a sequence number - it's also bigendian when c# functions like BitConverter.ToInt64 want littleendian.
I ended up creating a db view on the table i want data from with an alias column 'SequenceNo'
SELECT ID, CONVERT(bigint, Timestamp) AS SequenceNo
FROM dbo.[User]
c# Code first sees the view (ie UserV) identically to a normal table
then in my linq I can join the view and parent table and compare with a sequence number
var users = (from u in context.GetTable<User>()
join uv in context.GetTable<UserV>() on u.ID equals uv.ID
where mysequenceNo < uv.SequenceNo
orderby uv.SequenceNo
select u).ToList();
to get what I want - all the entries changed since the last time I checked.
What makes you think Timestamp data types are evil? The data type is very useful for concurrency checking. Linq-To-SQL uses this data type for this very purpose.
The answers to your questions:
1) No. This value is updated each time the row is updated. If you are updating the row say five times, each update will increment the Timestamp value. Of course, you realize that updates that "occur simultaneously" really don't. They still only occur one at a time, in turn.
2) Yes.
Just as a note, timestamp is deprecated in SQL Server 2008 onwards. rowversion should be used instead.
From this page on MSDN:
The timestamp syntax is deprecated. This feature will be removed in a
future version of Microsoft SQL Server. Avoid using this feature in
new development work, and plan to modify applications that currently
use this feature.
Rowversion does break one of the "idealistic" approaches of SQL - that an UPDATE statement is a single, atomic action, and acts as if all UPDATEs (both to all columns within a row, and all rows within the table) occur "at the same time". But in this case, with Rowversion, it is possible to determine that one row was updated at a slightly different time than another.
Note that the order in which rows are updated (by a single update statement) is not guaranteed - it may, by coincidence follow the same order as the clustered key for the table, but I wouldn't count on that being true.
To answer part of your question: you can end up with duplicate values according to MSDN:
Duplicate rowversion values can be generated by using the SELECT INTO
statement in which a rowversion column is in the SELECT list. We do
not recommend using rowversion in this manner.
Source: rowversion (Transact-SQL)
Every database has a counter that is incremented one by one on every data modification that is done in the database. If the table containing the affected (by update/insert) row contains a timestamp/rowversion column, the current counter value of the database is stored in that column of the updated/inserted record.

Ensuring index is used on Informix DATETIME column

Say I have a table on an Informix DB:
create table password_audit (
username CHAR(20),
old_password CHAR(20),
new_password CHAR(20),
update_date DATETIME YEAR TO FRACTION));
I need the update_date field to be in milliseconds (or seconds maybe - same question applies) because there will be multiple updates of the password on the same day.
Say, I have a nightly batch job that wants to retrieve all records from the password_audit table for today.
To increase performance, I want to put an index on the update_date column. If I do this:
CREATE INDEX pw_idx ON password_audit(update_date);
and run this SQL:
SELECT *
FROM password_audit
WHERE DATE(update_date) = mdy(?,?,?)
(where ?, ?, ? are the month, day and year passed in by my batch job)
then I don't think my index will be used - is that right?
I think I need to create an index something like this:
CREATE INDEX pw_idx ON password_audit(DATE(update_date));
- is that right?
Because you are forcing the server to convert two values to DATE, not DATETIME, then it probably won't use an index.
You would do best to generate the SQL as:
SELECT *
FROM password_audit
WHERE update_date
BETWEEN DATETIME(2010-08-02 00:00:00.00000) YEAR TO FRACTION(5)
AND DATETIME(2010-08-02 23:59:59.99999) YEAR TO FRACTION(5)
That's rather verbose. Alternatively, and maybe slightly more easily:
SELECT *
FROM password_audit
WHERE update_date >= DATETIME(2010-08-02 00:00:00.00000) YEAR TO FRACTION(5)
AND update_date < DATETIME(2010-08-03 00:00:00.00000) YEAR TO FRACTION(5)
Both of these should be able to use the index on the update_date column. You can experiment with dropping some of the trailing zeroes from the literals, but I don't think you'll be able to remove them all - but see what the SET EXPLAIN ON output tells you.
Depending on your server version, you might need to run UPDATE STATISTICS after creating the index before the optimizer uses it at all; that is more of a problem on older (say 10.00 and earlier) versions of Informix than on the current (11.10 and later) versions.
I Didn't see 'date_to_accounts_ni' defined in your password_audit table.
What datatype/length is it?
Your first index on password_audit.update_date is adequate, why would you want to index
(DATE(update_table))?

Resources