I have a table called MyHistory with about 1000 rows, and the performance is poor at best.
What I want to do is select rows with a value computed from the next row. This is probably a bad example.
This is the MyHistory structure: ID int, DateTimeColumn datetime, ValueResult decimal(4,2)
My table has the following data:
ID|DateTimeColumn|ValueResult
1|8/1/2005 1:01:29 PM|2
1|8/1/2006 1:01:29 PM|3
1|8/1/2007 1:01:29 PM|5
1|8/1/2008 1:01:29 PM|9
What I want to do is select out of this the following data
ID|DateTimeColumn|ValueResult|ChangeValue
1|8/1/2008 1:01:29 PM|9|4
1|8/1/2007 1:01:29 PM|5|2
1|8/1/2006 1:01:29 PM|3|1
1|8/1/2005 1:01:29 PM|2|
You'll notice that the ID is the same in every row and the datetime column is now in descending order. That's the easy part. But how do I join the table to itself (in order to calculate the difference in value) based on which datetime comes next?
Thanks!
So, the task is:
to order records by DateTimeColumn descending,
to set sequence number for each record to identify next record,
to calculate required difference in value.
This is one of many possible solutions:
-- Use CTE to make intermediate table with sequence numbers - ranks
;WITH a (rank, ID, DateTimeColumn, ValueResult) AS
(
select rank() OVER (ORDER BY m.DateTimeColumn DESC) as rank, m.ID, m.DateTimeColumn, m.ValueResult
from MyHistory m
)
-- Select all resulting columns
select a1.ID,
a1.DateTimeColumn,
a1.ValueResult,
a1.ValueResult - a2.ValueResult as ChangeValue -- Difference between current record and next one
from a a1
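left join a a2 -- left join keeps the oldest record, whose ChangeValue is NULL
on a2.rank = a1.rank + 1 -- linking next record to each one
order by a1.DateTimeColumn desc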
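As a way to try the numbering-and-self-join idea above without a SQL Server instance, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things here are assumptions for illustration and not part of the original answer: the use of SQLite (3.25 or later, for window functions), ROW_NUMBER() in place of RANK(), ISO-formatted date strings, and a LEFT JOIN so that the oldest row is kept with an empty ChangeValue.

```python
import sqlite3

# In-memory stand-in for the MyHistory table (assumption: SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyHistory (ID INTEGER, DateTimeColumn TEXT, ValueResult NUMERIC)"
)
conn.executemany(
    "INSERT INTO MyHistory VALUES (?, ?, ?)",
    [
        (1, "2005-08-01 13:01:29", 2),
        (1, "2006-08-01 13:01:29", 3),
        (1, "2007-08-01 13:01:29", 5),
        (1, "2008-08-01 13:01:29", 9),
    ],
)

query = """
WITH a AS (
    SELECT ROW_NUMBER() OVER (ORDER BY DateTimeColumn DESC) AS rn,
           ID, DateTimeColumn, ValueResult
    FROM MyHistory
)
SELECT a1.ID, a1.DateTimeColumn, a1.ValueResult,
       a1.ValueResult - a2.ValueResult AS ChangeValue
FROM a AS a1
LEFT JOIN a AS a2 ON a2.rn = a1.rn + 1   -- a2 is the next-older row
ORDER BY a1.rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data from the question, the ChangeValue column comes out as 4, 2, 1 and then NULL for the oldest row.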
I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that the "previous" variable references the most recently selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows; that's not a problem, it is actually exactly what I need.
Is that possible in some way?
Use the lag function (available in SQL Server 2012 and later):
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
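The point about skipped Ids can be seen in a small sketch. It uses Python's sqlite3 module with an in-memory database (an assumption for illustration; SQLite 3.25 or later is needed for LAG), and the table name readings and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; Ids 3 and 4 are missing, as can happen after rolled-back inserts.
conn.execute("CREATE TABLE readings (Id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 15), (5, 12), (6, 20)],
)

# LAG follows the ORDER BY, so gaps in Id do not matter.
lag_diffs = conn.execute(
    "SELECT Id, value - LAG(value) OVER (ORDER BY Id) AS diff "
    "FROM readings ORDER BY Id"
).fetchall()

# Joining on Id - 1 silently loses every row whose predecessor Id is missing.
join_diffs = conn.execute(
    "SELECT t1.Id, t1.value - t2.value AS diff "
    "FROM readings t1 JOIN readings t2 ON t2.Id = t1.Id - 1 "
    "ORDER BY t1.Id"
).fetchall()

print(lag_diffs)
print(join_diffs)
```

The LAG query returns one row per input row (with NULL for the first), while the Id - 1 join returns only the rows for Ids 2 and 6.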
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1 DESC, pk DESC
)
FROM mytable m1
ORDER BY
col1, pk
where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
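For readers who want to try the correlated-subquery approach, here is a minimal sketch using Python's sqlite3 module. It is an adaptation under stated assumptions, not the SQL Server query itself: SQLite uses LIMIT 1 where SQL Server uses TOP 1, and the sample rows are made up. Note that the subquery sorts in descending order so that the closest preceding row comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; col1 has a tie (10, 10) that pk breaks.
conn.execute(
    "CREATE TABLE mytable (pk INTEGER PRIMARY KEY, col1 INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 130), (3, 20, 125), (4, 30, 160)],
)
# An index on (col1, pk) supports both the outer ordering and the subquery.
conn.execute("CREATE INDEX ix_mytable_col1_pk ON mytable (col1, pk)")

query = """
SELECT m1.pk,
       m1.value - (
           SELECT m2.value
           FROM mytable m2
           WHERE m2.col1 < m1.col1
              OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
           ORDER BY m2.col1 DESC, m2.pk DESC   -- nearest earlier row first
           LIMIT 1
       ) AS diff
FROM mytable m1
ORDER BY m1.col1, m1.pk
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

The first row has no predecessor, so its difference is NULL; every other row is compared with the row immediately before it in (col1, pk) order.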
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
create table #temp (value int, primaryKey int, tempid int identity)
insert into #temp (value, primaryKey) select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter|previous|difference
1|0|1
2|1|1
3|2|1
4|3|1
5|4|1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following function to get the current row value and the previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1 preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer.
I want to retrieve the 2nd last row result and I have seen this question:
How can I retrieve second last row?
but it uses ORDER BY, which does not work in my case because the Emp_Number column also contains the number of rows and a date-time stamp, and that data gets mixed in if I use ORDER BY.
The rows 22 and 23 contain the total number of rows (excluding row 21 and 22) and the time and day it got entered respectively.
I used this query which returns the required result 21 but if this number increases it will cause an error.
SELECT TOP 1 *
FROM(
SELECT TOP 2 *
FROM DAT_History
ORDER BY Emp_Number ASC
) t
ORDER BY Emp_Number desc
Is there any way to get the 2nd last row value without using the Order By function?
There is no guarantee that the count will be returned in the one-but-last row, as there is no definite order defined. Even if those records were written in the correct order, the engine is free to return the records in any order, unless you specify an order by clause. But apparently you don't have a column to put in that clause to reproduce the intended order.
I propose these solutions:
1. Return the minimum of those values that represent positive integers
select min(Emp_Number * 1)
from DAT_history
where Emp_Number not regexp '[^0-9]'
See SQL Fiddle
This will obviously fail when the count is larger than the smallest employee number. But judging by the sample data, that would require a number of records that is probably not expected.
2. Count the records, ignoring the 2 aggregated records
select count(*)-2
from DAT_history
See SQL Fiddle
3. Relying on correct order without order by
As explained at the start, you cannot rely on the order, but if for some reason you still want to rely on this, you can use a variable to number the rows in a sub query, and then pick out the one that has been attributed the one-but-last number:
select Emp_Number * 1
from (select Emp_Number,
@rn := @rn + 1 rn
from DAT_history,
(select @rn := 0) init
) numbered
where rn = @rn - 1
See SQL Fiddle
The * 1 is added to convert the text to a number data type.
This is not a perfect solution. I am making some assumptions for this. Check if this could work for you.
;WITH cte
AS (SELECT emp_number,
Row_number()
OVER (
ORDER BY emp_number ASC) AS rn
FROM dat_history
WHERE Isdate(emp_number) = 0) --Omit date entries
SELECT emp_number
FROM cte
WHERE rn = 1 -- select the minimum entry, assuming it would be the count and assuming count might not exceed the emp number range of 9888000
I have a table with a name in each row. For each row I want to generate a random name. I wrote the following query to do that:
BEGIN transaction t1
Create table TestingName
(NameID int,
FirstName varchar(100),
LastName varchar(100)
)
INSERT INTO TestingName
SELECT 0,'SpongeBob','SquarePants'
UNION
SELECT 1, 'Bugs', 'Bunny'
UNION
SELECT 2, 'Homer', 'Simpson'
UNION
SELECT 3, 'Mickey', 'Mouse'
UNION
SELECT 4, 'Fred', 'Flintstone'
SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5
ROLLBACK Transaction t1
The problem is that the "ABS(CHECKSUM(NEWID())) % 5" portion of this query sometimes returns more than 1 row and sometimes returns 0 rows. I must be missing something but I can't see it.
If I change the query to
DECLARE @n int
set @n= ABS(CHECKSUM(NEWID())) % 5
SELECT FirstName from TestingName
WHERE NameID = @n
Then everything works and I get a random number per row.
If you take the query above and paste it into SQL management studio and run the first query a bunch of times you will see what I am attempting to describe.
The final update query will look like
Update TableWithABunchOfNames
set [FName] = (SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5)
This does not work because sometimes I get more than 1 row and sometimes I get no rows.
What am I missing?
The problem is that you are getting a different random value for each row. This query is probably doing a full table scan; the where clause is evaluated for each row, and a different random number is generated each time.
So, you might get a sequence of random numbers where none of the ids match. Or a sequence where more than one matches. On average, you'll have one match, but you don't want "on average", you want a guarantee.
This is when you want rand(), which produces only one random number per query:
SELECT FirstName
from TestingName
WHERE NameID = floor(rand() * 5);
This should get you one value.
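To see why the per-row draw misbehaves, here is a small simulation in Python rather than SQL. It only imitates the evaluation order described above: the function names are made up for illustration, and Python's random module stands in for NEWID() and RAND().

```python
import random

names = {0: "SpongeBob", 1: "Bugs", 2: "Homer", 3: "Mickey", 4: "Fred"}

def per_row_filter(rng):
    # Imitates WHERE NameID = ABS(CHECKSUM(NEWID())) % 5:
    # a fresh random number is drawn for every row that is tested.
    return [name for name_id, name in names.items() if name_id == rng.randrange(5)]

def per_query_filter(rng):
    # Imitates drawing the number once (a variable, or RAND())
    # and comparing every row against that same value.
    n = rng.randrange(5)
    return [name for name_id, name in names.items() if name_id == n]

rng = random.Random(0)  # fixed seed only so that repeated runs behave the same
per_row_counts = [len(per_row_filter(rng)) for _ in range(1000)]
per_query_counts = [len(per_query_filter(rng)) for _ in range(1000)]

print("per-row match counts seen:  ", sorted(set(per_row_counts)))
print("per-query match counts seen:", sorted(set(per_query_counts)))
```

The per-query version always matches exactly one name, while the per-row version matches a varying number of names, including none at all.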
Why not use top 1?
Select top 1 firstName
From testingName
Order by newId()
This worked for me:
WITH
CTE
AS
(
SELECT
ID
,FName
,CAST(5 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) AS int) AS rr
FROM
dbo.TableWithABunchOfNames
)
,CTE_ForUpdate
AS
(
SELECT
CTE.ID
, CTE.FName
, dbo.TestingName.FirstName AS RandomName
FROM
CTE
LEFT JOIN dbo.TestingName ON dbo.TestingName.NameID = CTE.rr
)
UPDATE CTE_ForUpdate
SET FName = RandomName
;
This solution depends on how smart the optimizer is.
For example, if I use INNER JOIN instead of LEFT JOIN (which is the correct choice for this query), the optimizer would move the calculation of random numbers outside the join loop and the end result would not be what we expect.
I created a table TestingName with 5 rows as in the question and a table TableWithABunchOfNames with 100 rows.
Here is the execution plan with LEFT JOIN. You can see the Compute scalar that calculates random numbers is done before the join loop. You can see that 100 rows were updated:
Here is the execution plan with INNER JOIN. You can see that the Compute scalar that calculates random numbers is done after the join loop and with an extra filter. This query may not update all rows in TableWithABunchOfNames, and some rows in TableWithABunchOfNames may be updated several times. You can see that the Filter left 102 rows and the Stream aggregate left only 69 rows. It means that only 69 rows were eventually updated and also that there were multiple matches for some rows (102 - 69 = 33).
To guarantee that the result is what you expect you should generate random number for each row in TableWithABunchOfNames and explicitly remember the result, i.e. materialize the CTE shown above. Then use this temporary result to join with the table TestingName.
You can add a column to TableWithABunchOfNames to store generated random numbers or save CTE to a temp table or table variable.
I request your help in achieving the following result from the data set below.
I have the below result set
CampaignName Matchfrom MatchTo
a 08-09-2013 07-11-2013
a 10-09-2013 10-11-2013
a 08-11-2013 07-01-2014
a 09-11-2013 08-01-2014
The set above is sorted on the Matchfrom date column. The first row is considered the master.
The query should filter out the rows in which Matchfrom lies in the date range of the master.
I achieved this using a self join. But the third row is completely outside the range of the master (1st row), so it should now be considered the new master, and it should filter out the 4th row.
The final result set should look like the one below, with rows marked as pass or fail:
CampaignName Matchfrom MatchTo
a 08-09-2013 07-11-2013 PASS
a 10-09-2013 10-11-2013 FAIL
a 08-11-2013 07-01-2014 PASS
a 09-11-2013 08-01-2014 FAIL
Can someone advise me on this?
With your data you'll have to do a bit more scrubbing, but the code below should get you headed in the right direction. You have to be careful because the MatchFrom and MatchTo in your "Master Record" go in the opposite direction from all of your other data.
CREATE TABLE #tmpCampaign(
CampaignName varchar(1),
Matchfrom Date,
MatchTo Date
)
INSERT INTO #tmpCampaign VALUES
('a','08-09-2013','07-11-2013'),
('a','10-09-2013','10-11-2013'),
('a','08-11-2015','07-01-2014'),
('a','09-11-2013','08-01-2014')
;WITH Campaign AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY campaignName ORDER BY MatchFrom) as CampRank
FROM #tmpCampaign)
SELECT c1.*, c2.MatchFrom as MasterFrom, c2.MatchTo as MasterTo,
CASE WHEN c1.Matchfrom >= c2.MatchFrom AND c1.Matchfrom <= c2.MatchTo THEN 'Pass'
ELSE 'Fail' END as PassFail
FROM Campaign as c1
JOIN Campaign as c2
ON c1.CampaignName = c2.CampaignName and c2.CampRank = 1
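The rolling-master rule described in the question (a row fails if its Matchfrom falls inside the current master's range; otherwise it passes and becomes the new master) is sequential, so it may help to see it as a single ordered pass. The following is a procedural sketch in Python, not a SQL solution. It reads the dates as DD-MM-YYYY, which is an assumption based on the sample being described as sorted by Matchfrom, and the function names are hypothetical.

```python
from datetime import datetime

# Sample rows from the question: (CampaignName, Matchfrom, MatchTo).
rows = [
    ("a", "08-09-2013", "07-11-2013"),
    ("a", "10-09-2013", "10-11-2013"),
    ("a", "08-11-2013", "07-01-2014"),
    ("a", "09-11-2013", "08-01-2014"),
]

def parse(text):
    # Assumption: dates are written day-month-year.
    return datetime.strptime(text, "%d-%m-%Y").date()

def mark_rows(rows):
    marked = []
    master = {}  # current master range per campaign
    for name, start, end in sorted(rows, key=lambda r: (r[0], parse(r[1]))):
        start_d, end_d = parse(start), parse(end)
        current = master.get(name)
        if current is not None and current[0] <= start_d <= current[1]:
            status = "FAIL"  # starts inside the current master's range
        else:
            status = "PASS"  # outside the range: this row becomes the master
            master[name] = (start_d, end_d)
        marked.append((name, start, end, status))
    return marked

marked = mark_rows(rows)
for row in marked:
    print(row)
```

On the sample data this marks the four rows PASS, FAIL, PASS, FAIL, matching the result set the question asks for.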
This may cause problems when duplicate dates occur, but for your result set I have picked the date key and partitioned on it to achieve the result:
;With Cte as
(select Campaignname,
matchfrom,
matchto,
ROW_number()OVER(PARTITION BY right(matchfrom, len(matchfrom) - charindex('-', matchfrom) - 3)ORDER BY Campaignname)RN
from #tmpCampaign )
select Campaignname,
matchfrom,
matchto,
Case when RN = 1 then 'Pass' ELSE 'Fail' END
from Cte
I have a table INDICATORS that stores details and current scores of performance indicators. I have another table IND_HISTORIES that stores historical values of the indicator scores. Data are stored from INDICATORS to IND_HISTORIES at set periods (ie quarterly), to establish score / rating trends.
IND_HISTORIES has a column structure similar to this-
pk_IndHistId fk_IndId Score DateSaved
Rating levels are also defined, meaning a score value of 1 to 3 is Low, 4 to 6 is Avg, and 7 to 9 is High.
I am trying to build an alert feature, whereby a record will be returned if its most recent rating level (based on the most recent score in IND_HISTORIES) is greater than its second-most recent rating level (based on the second-most recent score in IND_HISTORIES).
I am using code like below to build a temp table that translates score values to rating level thresholds...
-- opt_IND_ScoreValues = 1;2;3;4;5;6;7;8;9
DECLARE @tblScores TABLE (idx int identity, val int not null)
INSERT INTO @tblScores (val) SELECT IntValue FROM dbo.fn_getSettingList('opt_IND_ScoreValues')
-- opt_IND_RatingLevels = Low;Low;Low;Avg;Avg;Avg;High;High;High
DECLARE @tblRatings TABLE (idx int identity, txt nvarchar(128))
INSERT INTO @tblRatings (txt) SELECT TxtValue FROM dbo.fn_getSettingList('opt_IND_RatingLevels')
-- combine two tables above using a common index
DECLARE @tblRatingScores TABLE (val int, txt nvarchar(128))
INSERT INTO @tblRatingScores SELECT s.val, r.txt FROM @tblScores s JOIN @tblRatings r ON s.idx = r.idx
-- reduce table rows above to find score thresholds for each rating level
DECLARE @tblRatingBands TABLE (idx int identity, score int not null, rating nvarchar(128))
INSERT INTO @tblRatingBands
SELECT rs.val, rs.txt FROM @tblRatingScores rs
INNER JOIN (SELECT MIN(val) as val FROM @tblRatingScores GROUP BY txt) AS x ON rs.txt = x.txt AND rs.val = x.val
ORDER BY val
QUESTION: Is there an elegant query I can run against the IND_HISTORIES table that will return records where the most recent rating level for an INDICATOR is above the second-most recent rating level?
UPDATE: To clarify, INDICATORS is not used in the calculation - it's a parent table that holds general information of the performance measure and current 'volatile' scores. Scores are saved to IND_HISTORY periodically - this provides point-in-time 'snapshots' of data, helping to establish score trends.
I'm looking to query the IND_HISTORY table, to find where the most recent 'snapshot' value of an indicator is higher than its second-most recent 'snapshot' value. (It would be ideal to also join the Rating Levels table, as described above, in the determination, so that rows are only returned if the score increase results in a Rating Level increase.)
Any solution should be compatible with SQL Server 2005.
I've implemented the below, which seems to work. But I'd be interested to hear any recommendations to optimize or consolidate.
First, I realize that I do not need the last table variable @tblRatingBands constructed above. Instead, I simply select matching text ratings from @tblRatingScores in my first query set below.
Then in the final query, I check if the score value has increased and if the rating text has changed -- this indicates the trend score has increased and resulted in a change to the rating level.
DECLARE @tblTrendScores TABLE (indId int not null, ih_date datetime, rowNo int, ih_score int, rating nvarchar(128));
WITH LastTwoScores AS (
SELECT fk_IndId,
DateSaved,
ROW_NUMBER() OVER (PARTITION BY fk_IndId ORDER BY DateSaved DESC) AS RowNo,
Score
FROM Ind_History
)
INSERT INTO @tblTrendScores
SELECT *,
(SELECT txt FROM @tblRatingScores WHERE val = Score)
FROM LastTwoScores
WHERE RowNo BETWEEN 1 AND 2
ORDER BY fk_IndId, RowNo
SELECT a.indId,
a.ih_date,
CASE WHEN ((a.ih_score > IsNull(b.ih_score, 0)) AND (a.rating <> IsNull(b.rating, 'none'))) THEN 'Up'
WHEN ((a.ih_score < IsNull(b.ih_score, 0)) AND (a.rating <> IsNull(b.rating, 'none'))) THEN 'Down'
ELSE 'no-change'
END AS TrendRatingChange
FROM @tblTrendScores a
JOIN @tblTrendScores b ON a.indId = b.indId AND b.rowNo = 2
WHERE a.rowNo = 1
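The Up/Down rule in the final query can also be checked in isolation from the database. Here is a minimal sketch in Python that takes the last two snapshots per indicator and applies the same test (score moved and rating text changed). The sample history rows and the function name are hypothetical; only the 1-3 Low, 4-6 Avg, 7-9 High mapping comes from the question.

```python
from datetime import date

# Score-to-rating mapping from the question: 1-3 Low, 4-6 Avg, 7-9 High.
RATING_BY_SCORE = {s: "Low" for s in (1, 2, 3)}
RATING_BY_SCORE.update({s: "Avg" for s in (4, 5, 6)})
RATING_BY_SCORE.update({s: "High" for s in (7, 8, 9)})

# Hypothetical snapshot rows: (fk_IndId, DateSaved, Score).
history = [
    (1, date(2013, 1, 1), 3), (1, date(2013, 4, 1), 4),
    (2, date(2013, 1, 1), 5), (2, date(2013, 4, 1), 6),
    (3, date(2013, 1, 1), 8), (3, date(2013, 4, 1), 6),
]

def rating_trends(rows):
    by_indicator = {}
    for ind_id, saved, score in rows:
        by_indicator.setdefault(ind_id, []).append((saved, score))

    trends = {}
    for ind_id, snapshots in by_indicator.items():
        snapshots.sort(reverse=True)  # most recent snapshot first
        if len(snapshots) < 2:
            continue  # nothing to compare against
        latest, previous = snapshots[0][1], snapshots[1][1]
        changed = RATING_BY_SCORE[latest] != RATING_BY_SCORE[previous]
        if changed and latest > previous:
            trends[ind_id] = "Up"
        elif changed and latest < previous:
            trends[ind_id] = "Down"
        else:
            trends[ind_id] = "no-change"
    return trends

trends = rating_trends(history)
print(trends)
```

Indicator 2 moves from 5 to 6 but stays in the Avg band, so it is reported as no-change; only indicators whose rating band actually changes are flagged.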