Add computed column using subquery - sql-server

In SQL Server 2000, I want to add a computed column which is basically MAX(column1).
Of course I get an error, because subqueries are not allowed in computed columns.
What I am basically trying to do is get the MAX(dateandtime) from a number of tables in my database.
However, when I run my code it takes too long, because it's a very old, badly designed database with no keys or indexes.
So I believe that by adding a new computed column holding the MAX(dateandtime), my query will be much faster, because I will query
(SELECT TOP 1 newcomputedcolumn FROM Mytable)
and I will not have to do
(SELECT TOP 1 dateandtime FROM Mytable
ORDER BY dateandtime DESC)
or
(SELECT MAX(dateandtime) FROM Mytable)
which takes too long.
Any ideas? Thanks a lot.
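(One commonly suggested alternative, sketched below with the table and column names from the question: a computed column cannot contain a subquery, but an index on dateandtime lets SQL Server 2000 answer MAX() with a single seek at the end of the index instead of a full scan.)

-- Index the column the MAX() is computed over:
CREATE INDEX IX_Mytable_dateandtime ON Mytable (dateandtime)

-- With the index in place, this becomes a single index seek:
SELECT MAX(dateandtime) FROM Mytable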

Related

SQL Server: order of returned rows when using IN clause

Running the following query returns 4 rows. As I can see in SSMS, the order of the returned rows is the same as I specified in the IN clause.
SELECT * FROM Table WHERE ID IN (4,3,2,1)
Can I say that the order of the returned rows is ALWAYS the same as they appear in the IN clause?
If yes, is it true that the following two queries return the rows in the same order? (As far as I've tested, the orders are the same, but I don't know if I can trust this behavior.)
SELECT TOP 10 * FROM Table ORDER BY LastModification DESC
SELECT * FROM Table WHERE ID IN (SELECT TOP 10 ID FROM Table ORDER BY LastModification DESC)
I ask this question because I have a quite complex select query, and using this trick on it brings me about a 30% performance gain in my case.
You cannot guarantee the records will be in any particular order unless you use an ORDER BY clause. You may use tricks that happen to work some of the time, but they will not give you any guarantee of the order.
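(For illustration, one way to pin the output order to the IN list without relying on undefined behavior is an explicit ORDER BY, for example with a CASE expression:)

SELECT * FROM Table
WHERE ID IN (4, 3, 2, 1)
-- Map each ID to its position in the IN list and sort by that position:
ORDER BY CASE ID WHEN 4 THEN 1 WHEN 3 THEN 2 WHEN 2 THEN 3 WHEN 1 THEN 4 END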

Single condition slows down SQL query drastically

I have a SQL query looking something like this:
WITH Results_CTE AS
(SELECT
COLUMN1,
COLUMN2,
[MORE COLUMNS...],
ROW_NUMBER() OVER (ORDER BY R.RANKING DESC) AS RowNum
FROM TABLE1 As R, TABLE2 As A, TABLE3 As U, TABLE4 As S, TABLE5 As T
WHERE R.RID = A.LID
AND S.QRYID = R.QRYID
AND A.AID = U.AID
AND CONDITION1 = 'VALUE'
AND CONDITION2 = 'VALUE'
AND [MORE CONDITIONS...]
),
Results_Cnt AS
(SELECT COUNT(*) CNT FROM Results_CTE)
SELECT * FROM Results_CTE, Results_Cnt WHERE RowNum >= 1 AND RowNum <= 25
Now, this query typically runs under 1 sec and returns the 25 records out of 5000 based on CONDITION1.
Recently, though, I added a new column to TABLE1 and then used its values as CONDITION2 in the query above. The column is populated going forward, but all the values in the past are NULL.
I read something about joining tables that have NULLs being a reason for slow execution. The table has about 1,300,000 records; 90% of them are NULL in the problematic column. But that column is not being joined on. (The one that is being joined on has an INDEX.)
However, I wanted to try that anyway by creating a new column and simply copying the data like so:
ALTER TABLE TABLE1 ADD COL_NEW -- same data type as COL_OLD
UPDATE TABLE1 SET COL_NEW = COL_OLD
My next step was to replace the NULLs with an actual value, but first, just for kicks, I changed the query to use the new field COL_NEW as the condition, and the problem went away.
Although I'm happy the problem is gone, I can't explain it to myself. Why was the execution slow in the first place if it had nothing to do with the NULLs?
UPDATE: It appears the problem may have resulted from a cached query plan. So the question essentially becomes, how to force a query plan refresh?
UPDATE: Although doing ALTER TABLE may have refreshed the execution plan, the problem returned. How can I find out what is happening?
It sounds like your query plan got cached while the stats for the new column showed it completely full of NULLs, forcing a table scan. Following the ALTER TABLE, the query plan was refreshed, replacing the table scan with an index lookup again, and performance returned to normal.
The only way to know for sure if that is what happened would be to examine the query plans for both queries, but those are long gone now.
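(For reference, a few standard ways to force a statistics or plan refresh, from narrowest to broadest in scope:)

UPDATE STATISTICS TABLE1      -- rebuild the optimizer's statistics for the table
EXEC sp_recompile 'TABLE1'    -- invalidate cached plans that reference TABLE1
DBCC FREEPROCCACHE            -- heavy-handed: clear the entire plan cache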

ROW_NUMBER OVER (ORDER BY date_column)

I am wondering about something: I have a complex query which runs on SQL Server 2005 Express edition in around 3 seconds.
The main table has around 300k rows.
When I add
ROW_NUMBER() OVER (ORDER BY date_column)
it takes 123 seconds (date_column is a datetime column).
If I do
ROW_NUMBER() OVER (ORDER BY string_title)
it runs in 3 seconds again.
I added an index on the datetime column. No change. Still 123 seconds.
Then I tried:
ROW_NUMBER() OVER (ORDER BY CAST(date_column AS int))
and the query runs in 3 seconds again.
Since casting takes time, why does SQL Server behave like this?
UPDATE:
It seems like ROW_NUMBER ignores my WHERE clauses entirely and builds a row number list for all available entries. Can anyone confirm that?
Here is a more readable version (still tons of logic :)) as copied into SQL Server Management Studio:
SELECT ROW_NUMBER() OVER (ORDER BY xinfobase.lid) AS row_num, *
FROM xinfobase
LEFT OUTER JOIN [xinfobasetree] ON [xinfobasetree].[lid] = [xinfobase].[xlngfolder]
LEFT OUTER JOIN [xapptqadr] ON [xapptqadr].[lid] = [xinfobase].[xlngcontact]
LEFT OUTER JOIN [xinfobasepvaluesdyn] ON [xinfobasepvaluesdyn].[lparentid] = [xinfobase].[lid]
WHERE (xinfobase.xlngisdeleted=2
AND xinfobase.xlinvalid=2)
AND (xinfobase.xlngcurrent=1)
AND ( (xinfobase.lownerid = 1
OR (SELECT COUNT(lid)
FROM xinfobaseacl
WHERE xinfobaseacl.lparentid = xinfobase.lid
AND xlactor IN(1,-3,-4,-230,-243,-254,-255,-256,-257,-268,-589,-5,-6,-7,-8,-675,-676,-677,-9,-10,-864,-661,-671,-913))>0
OR xinfobasetree.xlresponsible = 1)
AND (xinfobase.lid IN (SELECT lparentid
FROM xinfobasealt a, xinfobasetree t
WHERE a.xlfolder IN(1369)
AND a.xlfolder = t.lid
AND dbo.sf_MatchRights(1, t.xtxtrights,'|')=1 )) )
AND ((SELECT COUNT(*) FROM dbo.fn_Split(cf_17,',')
WHERE [value] = 39)>0)
This query needs 2-3 seconds on 300k records.
When I change the ORDER BY to xinfobase.xstrtitle, it runs in around 2-3 seconds again.
If I switch to xinfobase.dtedit (a datetime column with an additional index I just added), it needs the time I mentioned above.
I also tried to "cheat" by turning my statement into a subselect to force it to retrieve the records first and apply ROW_NUMBER() outside in another SQL statement; same performance result.
UPDATE
Since I was still frustrated about needing a workaround, I investigated further.
I removed all my existing indexes and ran several SQL statements against the tables.
It turns out that by building new indexes with a different sort order of columns and different included columns, I fixed my issue, and the query is fast with the dtedit (datetime) column as well.
So lessons learned:
Take more care of your indexes and execution plans and recheck them with every update (new version) of the software you produce...
But I'm still wondering why CAST(date_column AS int) made it fast before...
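(As an illustration of the lesson learned, a hypothetical index for the query above: the equality predicates first in the key, the ROW_NUMBER() sort column last, and frequently selected columns as included columns. The exact column choice would have to come from the real execution plan.)

CREATE INDEX IX_xinfobase_dtedit
ON xinfobase (xlngisdeleted, xlinvalid, xlngcurrent, dtedit)
INCLUDE (xstrtitle, xlngfolder, xlngcontact)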

SQL Server Pagination w/o row_number() or nested subqueries?

I have been fighting with this all weekend and am out of ideas. In order to have pages in my search results on my website, I need to return a subset of rows from a SQL Server 2005 Express database (i.e. start at row 20 and give me the next 20 records). In MySQL you would use the "LIMIT" keyword to choose which row to start at and how many rows to return.
In SQL Server I found ROW_NUMBER()/OVER, but when I try to use it, it says "Over not supported". I am thinking this is because I am using SQL Server 2005 Express (the free version). Can anyone verify whether this is true, or whether there is some other reason an OVER clause would not be supported?
Then I found the old school version similar to:
SELECT TOP X * FROM TABLE
WHERE ID NOT IN (SELECT TOP Y ID FROM TABLE ORDER BY ID)
ORDER BY ID
where X = the number of rows per page and Y = the number of rows to skip.
However, my queries are a lot more complex with many outer joins and sometimes ordering by something other than what is in the main table. For example, if someone chooses to order by how many videos a user has posted, the query might need to look like this:
SELECT TOP 50 iUserID, iVideoCount
FROM MyTable
LEFT OUTER JOIN (SELECT COUNT(iVideoID) AS iVideoCount, iUserID
                 FROM VideoTable GROUP BY iUserID) AS TempVidTable
    ON MyTable.iUserID = TempVidTable.iUserID
WHERE iUserID NOT IN (SELECT TOP 100 iUserID, iVideoCount
                      FROM MyTable
                      LEFT OUTER JOIN (SELECT COUNT(iVideoID) AS iVideoCount, iUserID
                                       FROM VideoTable GROUP BY iUserID) AS TempVidTable
                          ON MyTable.iUserID = TempVidTable.iUserID
                      ORDER BY iVideoCount)
ORDER BY iVideoCount
The issue is in the subquery SELECT line: TOP 100 iUserID, iVideoCount
To use the "NOT IN" clause it seems I can only have one column in the subquery ("SELECT TOP 100 iUserID FROM ..."). But when I don't include iVideoCount in that subquery SELECT statement, the ORDER BY iVideoCount in the subquery doesn't order correctly, so my subquery is ordered differently than my parent query, making this whole thing useless. There are about 5 more tables linked in with outer joins that can play a part in the ordering.
I am at a loss! The two methods above are the only two ways I can find to get SQL Server to return a subset of rows. I am about ready to return the whole result and loop through each record in PHP, only displaying the ones I want. That is such an inefficient way to do things that it is really my last resort.
Any ideas on how I can make SQL Server mimic MySQL's LIMIT clause in the above scenario?
Although SQL Server 2005's ROW_NUMBER() can be used for paging, and SQL Server 2012 enhanced paging support with ORDER BY ... OFFSET ... FETCH NEXT, in case you cannot use either of these solutions you can:
create a temp table with an identity column,
then insert data into the temp table with an ORDER BY clause,
and use the temp table's identity column value just like the ROW_NUMBER() value, as in the sketch below.
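(A minimal sketch of that approach against the tables from the question; the #paged table and its column list are illustrative:)

CREATE TABLE #paged (RowNum INT IDENTITY(1,1), iUserID INT, iVideoCount INT)

-- The ORDER BY controls the sequence in which identity values are assigned.
INSERT INTO #paged (iUserID, iVideoCount)
SELECT iUserID, COUNT(iVideoID)
FROM VideoTable
GROUP BY iUserID
ORDER BY COUNT(iVideoID) DESC

-- Rows 21-40, i.e. page 2 with 20 rows per page:
SELECT * FROM #paged WHERE RowNum BETWEEN 21 AND 40

DROP TABLE #paged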
I hope it helps.

Optimizing ROW_NUMBER() in SQL Server

We have a number of machines which record data into a database at sporadic intervals. For each record, I'd like to obtain the time period between this recording and the previous recording.
I can do this using ROW_NUMBER as follows:
WITH TempTable AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Machine_ID ORDER BY Date_Time) AS Ordering
FROM dbo.DataTable
)
SELECT [Current].*, Previous.Date_Time AS PreviousDateTime
FROM TempTable AS [Current]
INNER JOIN TempTable AS Previous
ON [Current].Machine_ID = Previous.Machine_ID
AND Previous.Ordering = [Current].Ordering + 1
The problem is, it goes really slow (several minutes on a table with about 10k entries). I tried creating separate indexes on Machine_ID and Date_Time, and a single combined index, but nothing helps.
Is there any way to rewrite this query to make it faster?
The given ROW_NUMBER() partition and order require an index on (Machine_ID, Date_Time) to satisfy in one pass:
CREATE INDEX idxMachineIDDateTime ON DataTable (Machine_ID, Date_Time);
Separate indexes on Machine_ID and Date_Time will help little, if any.
How does it compare to this version?:
SELECT x.*
,(SELECT MAX(Date_Time)
FROM dbo.DataTable
WHERE Machine_ID = x.Machine_ID
AND Date_Time < x.Date_Time
) AS PreviousDateTime
FROM dbo.DataTable AS x
Or this version?:
SELECT x.*
,triang_join.PreviousDateTime
FROM dbo.DataTable AS x
INNER JOIN (
SELECT l.Machine_ID, l.Date_Time, MAX(r.Date_Time) AS PreviousDateTime
FROM dbo.DataTable AS l
LEFT JOIN dbo.DataTable AS r
ON l.Machine_ID = r.Machine_ID
AND l.Date_Time > r.Date_Time
GROUP BY l.Machine_ID, l.Date_Time
) AS triang_join
ON triang_join.Machine_ID = x.Machine_ID
AND triang_join.Date_Time = x.Date_Time
Both would perform best with an index on (Machine_ID, Date_Time), and for correct results I'm assuming that combination is unique.
You haven't mentioned what is hidden away in *, and that can sometimes mean a lot, since a (Machine_ID, Date_Time) index will not generally be covering; if you have a lot of columns there, or they have a lot of data, ...
If the number of rows in dbo.DataTable is large, then it is likely that you are experiencing the issue due to the CTE self-joining onto itself. There is a blog post explaining the issue in some detail here.
Occasionally in such cases I have resorted to creating a temporary table, inserting the result of the CTE query into it, and then doing the joins against that temporary table (although this has usually been for cases where a large number of joins against the temp table are required; in the case of a single join, the performance difference will be less noticeable).
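(A sketch of that temp-table variant of the original query; the #Ordered name is illustrative:)

-- Materialize the numbered rows once, instead of evaluating the CTE twice.
SELECT *, ROW_NUMBER() OVER (PARTITION BY Machine_ID ORDER BY Date_Time) AS Ordering
INTO #Ordered
FROM dbo.DataTable

CREATE CLUSTERED INDEX ix_Ordered ON #Ordered (Machine_ID, Ordering)

SELECT [Current].*, Previous.Date_Time AS PreviousDateTime
FROM #Ordered AS [Current]
INNER JOIN #Ordered AS Previous
    ON [Current].Machine_ID = Previous.Machine_ID
    AND Previous.Ordering = [Current].Ordering + 1

DROP TABLE #Ordered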
I have had some strange performance problems using CTEs in SQL Server 2005. In many cases, replacing the CTE with a real temp table solved the problem.
I would try this before going any further with using a CTE.
I never found any explanation for the performance problems I've seen, and really didn't have time to dig into the root causes. However, I always suspected that the engine couldn't optimize the CTE in the same way that it can optimize a temp table (which can be indexed if more optimization is needed).
Update
After your comment that this is a view, I would first test the query with a temp table to see if that performs better.
If it does, and using a stored proc is not an option, you might consider making the current CTE into an indexed/materialized view. You will want to read up on the subject before going down this road, as whether this is a good idea depends on a lot of factors, not the least of which is how often the data is updated.
What if you use a trigger to store the last timestamp and subtract each time to get the difference?
If you require this data often, rather than calculating it each time you pull the data, why not add a column and calculate/populate it whenever a row is added?
(Remus' compound index will make the query fast; running it only once should make it faster still.)
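(A rough sketch of that precomputed-column idea with an insert trigger; the column and trigger names are hypothetical, and it assumes, as above, that (Machine_ID, Date_Time) is unique:)

ALTER TABLE dbo.DataTable ADD PreviousDateTime DATETIME NULL
GO
CREATE TRIGGER trg_DataTable_PrevTime ON dbo.DataTable
AFTER INSERT
AS
-- For each newly inserted row, look up the latest earlier recording
-- for the same machine and store it alongside the row.
UPDATE d
SET PreviousDateTime = (SELECT MAX(Date_Time)
                        FROM dbo.DataTable
                        WHERE Machine_ID = d.Machine_ID
                          AND Date_Time < d.Date_Time)
FROM dbo.DataTable AS d
INNER JOIN inserted AS i
    ON d.Machine_ID = i.Machine_ID
    AND d.Date_Time = i.Date_Time
GO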
