SQL Server: pagination into Excel

I've got a large data set that I need to get into Excel for some pivot tables and analysis.
Normally I am able to do this, as the data never reaches the 1 million row mark: I just do a SQL Server data import and specify my SQL statement.
Here is my current SQL:
WITH n AS (
    SELECT A1.AccountID, A1.ParentAccountID, A1.Name
    FROM Account AS A1
    WHERE A1.ParentAccountID = 92
    UNION ALL
    SELECT A2.AccountID, A2.ParentAccountID, A2.Name
    FROM Account AS A2
    JOIN n ON A2.ParentAccountID = n.AccountID
)
SELECT n.*, D.DeviceID, A.*, P.*
FROM n
LEFT OUTER JOIN Device AS D ON D.AccountID = n.AccountID
LEFT OUTER JOIN Audit AS A ON A.AccountID = n.AccountID
RIGHT OUTER JOIN DeviceAudit AS P ON P.AuditID = A.AuditID
WHERE A.AuditDate > CAST('2013-03-11' AS DATETIME)
ORDER BY n.AccountID ASC, P.DeviceID ASC, A.AuditDate DESC
Right now this returns 100% of what I need: 18 million records for the past 30 days. I was hoping there would be a simple way to fetch the next 100,000 or 500,000 records.
I can use TOP 100000 to get my first chunk, though I do not seem to have an offset available to me.
At present this runs and completes in 20 minutes, and this is one of many account hierarchies I have to do this for. Hopefully the pagination will not be too expensive CPU-wise.
I did try exporting to CSV in the hope of importing that instead, but it just gives me a 12 GB CSV file that I do not have time to break apart.

Yes, you can do paginated subqueries on the row number since SQL Server 2005. Add a row number to the SELECT clause of your original query:
, ROW_NUMBER() OVER (ORDER BY {whatever id}) AS row
Then you can make your old query a subquery and query against the row:
SELECT TOP {results per page} *
FROM ({your previous sql statement}) AS numbered
WHERE row > {page# * results per page}
ORDER BY row
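Applied to the query above, a sketch might look like the following. The star selects are abbreviated to an explicit column list here, since a CTE or derived table cannot repeat column names, and the page number is zero-based:
DECLARE @PageSize INT
DECLARE @PageNum INT
SET @PageSize = 100000 -- rows per chunk
SET @PageNum = 0       -- zero-based page number
WITH n AS (
    SELECT A1.AccountID, A1.ParentAccountID, A1.Name
    FROM Account AS A1
    WHERE A1.ParentAccountID = 92
    UNION ALL
    SELECT A2.AccountID, A2.ParentAccountID, A2.Name
    FROM Account AS A2
    JOIN n ON A2.ParentAccountID = n.AccountID
),
numbered AS (
    -- column list abbreviated; disambiguate any duplicate names from n.*, A.*, P.*
    SELECT n.AccountID, n.ParentAccountID, n.Name,
           D.DeviceID, A.AuditID, A.AuditDate,
           ROW_NUMBER() OVER (ORDER BY n.AccountID ASC,
                                       P.DeviceID ASC,
                                       A.AuditDate DESC) AS row
    FROM n
    LEFT OUTER JOIN Device AS D ON D.AccountID = n.AccountID
    LEFT OUTER JOIN Audit AS A ON A.AccountID = n.AccountID
    RIGHT OUTER JOIN DeviceAudit AS P ON P.AuditID = A.AuditID
    WHERE A.AuditDate > CAST('2013-03-11' AS DATETIME)
)
SELECT *
FROM numbered
WHERE row > @PageNum * @PageSize
  AND row <= (@PageNum + 1) * @PageSize
ORDER BY row
Each run with the same ORDER BY inside the OVER clause returns a stable window, so you can export one chunk per run. Note that the whole row-numbered set is still evaluated each time, so it will not be free CPU-wise.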

Related

Why does SQL Server do a scan on joins when there are no records in the source table?

The idea of the query below is to use the CTE to get the primary key of all rows in [Archive].[tia_tia_object] that meet the filter.
The execution time for the query within the CTE is 0 seconds.
The second part is supposed to join other tables to filter the data some more, but only if the CTE returned any rows. This was the only way I could get SQL Server to use the correct indexes.
Why does it spend time (see execution plan) looking in TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT when the CTE returns 0 rows?
WITH cte_vehicle AS (
    SELECT O.[Seq_no],
           O.Object_No
    FROM [Archive].[tia_tia_object] O
    WHERE O.RECORD_TIMESTAMP >
          (SELECT LastLoadTimeStamp
           FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
      AND O.[Meta_iscurrent] = 1
      AND O.OBJECT_TYPE IN ('BIO01', 'CAO01', 'DKV', 'GFO01',
                            'KMA', 'KNO01', 'MCO01', 'VEO01',
                            'SVO01', 'AUO01')
)
SELECT O.[Seq_no] AS [Bkey_CoveredObject],
       CAST(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
       O.[Cover_Start_Date] AS [CoverageFrom],
       O.[Cover_End_Date] AS [CoverageTo],
       O.[Timestamp] AS [TIMESTAMP],
       O.[Record_Timestamp] AS [RECORD_TIMESTAMP],
       O.[Newest] AS [Newest],
       O.LOCATION_ID AS LocationNo,
       O.[Cust_no],
       O.[N01]
FROM cte_vehicle AS T
INNER JOIN [Archive].[tia_tia_object] O
    ON T.Object_No = O.Object_No
   AND T.Seq_No = O.Seq_No
INNER JOIN [Archive].[tia_tia_agreement_line] AL
    ON O.Agr_line_no = AL.Agr_line_no
INNER JOIN [Archive].[tia_tia_policy] P
    ON AL.Policy_no = P.Policy_no
WHERE P.[Transaction_type] <> 'D'
Execution plan: (screenshot omitted)
Because it still needs to check and look for records. Even if there are no records in that table, SQL Server doesn't know that until it actually checks.
Much like if someone gives you a sealed box, you don't know whether it's empty until you open it.
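A common workaround (not from the answer above, just a sketch) is to materialize the CTE into a temp table first; that gives the optimizer an exact row count, and lets you skip the expensive joins entirely when it is empty:
-- Materialize the CTE so the actual row count is known
SELECT O.[Seq_no],
       O.Object_No
INTO   #cte_vehicle
FROM   [Archive].[tia_tia_object] O
WHERE  O.RECORD_TIMESTAMP >
       (SELECT LastLoadTimeStamp
        FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
  AND  O.[Meta_iscurrent] = 1
  AND  O.OBJECT_TYPE IN ('BIO01', 'CAO01', 'DKV', 'GFO01',
                         'KMA', 'KNO01', 'MCO01', 'VEO01',
                         'SVO01', 'AUO01')

-- Only pay for the joins when the first step actually found rows
IF EXISTS (SELECT 1 FROM #cte_vehicle)
BEGIN
    SELECT O.[Seq_no] AS [Bkey_CoveredObject],
           CAST(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
           O.[Cover_Start_Date] AS [CoverageFrom],
           O.[Cover_End_Date] AS [CoverageTo],
           O.[Timestamp] AS [TIMESTAMP],
           O.[Record_Timestamp] AS [RECORD_TIMESTAMP],
           O.[Newest] AS [Newest],
           O.LOCATION_ID AS LocationNo,
           O.[Cust_no],
           O.[N01]
    FROM #cte_vehicle AS T
    INNER JOIN [Archive].[tia_tia_object] O
        ON T.Object_No = O.Object_No
       AND T.Seq_no = O.Seq_no
    INNER JOIN [Archive].[tia_tia_agreement_line] AL
        ON O.Agr_line_no = AL.Agr_line_no
    INNER JOIN [Archive].[tia_tia_policy] P
        ON AL.Policy_no = P.Policy_no
    WHERE P.[Transaction_type] <> 'D'
END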

Why does a LEFT JOIN increase query time so much?

I'm using SQL Server 2012 and have encountered a strange problem.
This is the original query I've been using:
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, NULL
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
WHERE H.[NUM] = (SELECT TOP 1 [NUM]
                 FROM [TABLE_Accounts_History]
                 WHERE [RSIN] = H.[RSIN]
                   AND [AccountSys] = H.[AccountSys]
                   AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
                   AND [DATE_DEAL] < @dte
                 ORDER BY [DATE_DEAL] DESC)
  AND H.[TYPE_DEAL] <> 'D'
Table TABLE_Accounts_History has 3,200,000 records.
Table TABLE_For_Filtering has around 1,500 records.
The insert took 2m 40s and inserted 1,600,000 records for further work.
But then I decided to attach a column from the pretty small table TABLE_Additional (only around 100 records):
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, P.[prof_type]
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
LEFT JOIN [TABLE_Additional] P ON H.[ACCOUNTSYS] = P.[AccountSys]
WHERE H.[NUM] = (SELECT TOP 1 [NUM]
                 FROM [TABLE_Accounts_History]
                 WHERE [RSIN] = H.[RSIN]
                   AND [AccountSys] = H.[AccountSys]
                   AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
                   AND [DATE_DEAL] < @dte
                 ORDER BY [DATE_DEAL] DESC)
  AND H.[TYPE_DEAL] <> 'D'
And now this query takes ages to complete. Why is that? How can such a small left join possibly hurt performance this badly? How can I improve it?
An update: no luck so far with the LEFT JOIN. Indexes, no indexes, hinted indexes... For now I've found a workaround: run my first query, then UPDATE after it:
UPDATE [TABLE_TEMP]
SET [PROF_TYPE] = P1.[prof_type]
FROM [TABLE_TEMP] A1
LEFT JOIN [TABLE_Additional] P1
    ON A1.[ACCOUNTSYS] = P1.[AccountSys]
It takes only 5s and does pretty much the same thing I was trying to achieve. Still, SQL Server performance is a mystery to me.
The 'small' left join is actually doing a lot of extra work for you. SQL Server has to go back to TABLE_Additional for each row produced by your inner join between TABLE_Accounts_History and TABLE_For_Filtering. You can help SQL Server speed this up in a few ways by trying some indexing (see the sketch after this list). You could:
1) Ensure TABLE_Accounts_History has an index on the foreign key H.[ACCOUNTSYS]
2) If you think that TABLE_Additional will always be accessed by AccountSys, i.e. you will be requesting AccountSys in ordered groups, you could create a clustered index on TABLE_Additional.AccountSys (in other words, physically order the table on disk by AccountSys)
3) You could also ensure there is a foreign key index on TABLE_Accounts_History.
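A sketch of what that indexing could look like (index names are illustrative; the composite index assumes the correlated TOP 1 subquery is the dominant cost, and the clustered index only applies if TABLE_Additional does not already have one):
-- 1) Index the join column on the history side of the LEFT JOIN
CREATE NONCLUSTERED INDEX IX_History_AccountSys
    ON [TABLE_Accounts_History] ([ACCOUNTSYS])

-- 2) Physically order the lookup table by its join key
CREATE CLUSTERED INDEX CIX_Additional_AccountSys
    ON [TABLE_Additional] ([AccountSys])

-- 3) Serve the TOP 1 ... ORDER BY [DATE_DEAL] DESC subquery with a seek
CREATE NONCLUSTERED INDEX IX_History_Rsin_AccSys_Typ_Date
    ON [TABLE_Accounts_History] ([RSIN], [AccountSys], [Cl_Acc_Typ], [DATE_DEAL] DESC)
    INCLUDE ([NUM])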
A LEFT OUTER JOIN selects all rows from the left table. In your case the left table has 3,200,000 rows, and each of those records is then compared against the right table. One solution is to use indexes, which will reduce retrieval time.

Prevent MORE records being returned by JOINing a lookup table?

I am having a problem: my lookup table is producing MORE records than my original query.
I feel I am missing something basic. How do I prevent ending up with more records by bringing in a column or two from the 2nd table?
-- 140930 rows without the join
SELECT COUNT(ID)
FROM dbo.USER_ACCOUNTS AS A
-- 143324 rows once the join is added
LEFT JOIN dbo.DOMAIN AS B
    ON A.Domain = B.DOMAIN
As you can see, my count grows to 143,324 after the join. I have tried outer joins as well. There are only 150 or so domains to join on, and some rows should not even be in the results because no domain match should be found!?
This is SQL Server 2008 R2.
The count grows because dbo.DOMAIN almost certainly contains duplicate DOMAIN values, and every duplicate multiplies the matching rows. To keep the original row count, test for existence instead of joining:
SELECT COUNT(ID)
FROM dbo.USER_ACCOUNTS AS A
WHERE EXISTS (
    SELECT 1
    FROM dbo.DOMAIN AS B
    WHERE A.Domain = B.DOMAIN
)
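If you want to see which lookup rows cause the inflation, a quick diagnostic (not part of the original answer) is to count duplicate join keys in the lookup table:
-- Each domain returned here matches multiple DOMAIN rows,
-- and each extra match duplicates the joined USER_ACCOUNTS row
SELECT B.DOMAIN, COUNT(*) AS copies
FROM dbo.DOMAIN AS B
GROUP BY B.DOMAIN
HAVING COUNT(*) > 1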

ROW_NUMBER OVER (ORDER BY date_column)

I have a complex query which runs on SQL Server 2005 Express Edition in around 3 seconds.
The main table has around 300k rows.
When I add
ROW_NUMBER() OVER (ORDER BY date_column)
it takes 123 seconds (date_column is a datetime column).
If I do
ROW_NUMBER() OVER (ORDER BY string_title)
it runs in 3 seconds again.
I added an index on the datetime column. No change. Still 123 seconds.
Then I tried:
ROW_NUMBER() OVER (ORDER BY CAST(date_column AS int))
and the query runs in 3 seconds again.
Since casting takes time, why does SQL Server behave like this?
UPDATE:
It seems like ROW_NUMBER ignores my WHERE clauses entirely and builds a row number list for all available entries? Can anyone confirm that?
Here is a more readable copy (still tons of logic :)) from SQL Server Management Studio:
SELECT ROW_NUMBER() OVER (ORDER BY xinfobase.lid) AS row_num, *
FROM xinfobase
LEFT OUTER JOIN [xinfobasetree] ON [xinfobasetree].[lid] = [xinfobase].[xlngfolder]
LEFT OUTER JOIN [xapptqadr] ON [xapptqadr].[lid] = [xinfobase].[xlngcontact]
LEFT OUTER JOIN [xinfobasepvaluesdyn] ON [xinfobasepvaluesdyn].[lparentid] = [xinfobase].[lid]
WHERE (xinfobase.xlngisdeleted = 2
       AND xinfobase.xlinvalid = 2)
  AND (xinfobase.xlngcurrent = 1)
  AND ((xinfobase.lownerid = 1
        OR (SELECT COUNT(lid)
            FROM xinfobaseacl
            WHERE xinfobaseacl.lparentid = xinfobase.lid
              AND xlactor IN (1,-3,-4,-230,-243,-254,-255,-256,-257,-268,-589,-5,-6,-7,-8,-675,-676,-677,-9,-10,-864,-661,-671,-913)) > 0
        OR xinfobasetree.xlresponsible = 1)
       AND (xinfobase.lid IN (SELECT lparentid
                              FROM xinfobasealt a, xinfobasetree t
                              WHERE a.xlfolder IN (1369)
                                AND a.xlfolder = t.lid
                                AND dbo.sf_MatchRights(1, t.xtxtrights, '|') = 1)))
  AND ((SELECT COUNT(*)
        FROM dbo.fn_Split(cf_17, ',')
        WHERE [value] = 39) > 0)
This query needs 2-3 seconds for 300k records.
Now I changed the ORDER BY to xinfobase.xstrtitle and it runs in around 2-3 seconds again.
If I switch to xinfobase.dtedit (a datetime column with an additional index I just added), it needs the time I mentioned above.
I also tried to "cheat" by making my statement a sub-select, to force it to retrieve the records first and apply ROW_NUMBER() outside in another SQL statement; same performance result.
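For what it's worth, ROW_NUMBER() is logically evaluated after the WHERE clause, so it only numbers the rows that survive the filter; the slowness comes from the plan the optimizer picks, not from numbering the whole table. A tiny self-contained demo (table and values made up):
CREATE TABLE #t (id INT, val CHAR(1))
INSERT INTO #t VALUES (1, 'a')
INSERT INTO #t VALUES (2, 'b')
INSERT INTO #t VALUES (3, 'c')
INSERT INTO #t VALUES (4, 'd')

-- rn restarts at 1 for the first row that survives the filter,
-- showing the numbering happens after WHERE
SELECT id, val, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM #t
WHERE id > 2

DROP TABLE #t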
UPDATE
Since I was still frustrated about needing a workaround, I investigated further.
I removed all my existing indexes and ran several SQL statements against the tables.
It turns out that by building new indexes with a different sort order of columns and different included columns, I fixed my issue, and the query is fast with the dtedit (datetime) column as well.
So, lessons learned:
Take more care of your indexes and execution plans, and recheck them with every update (new version) of the software you produce...
But I'm still wondering why CAST(datetime_column AS int) made it fast before...
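For reference, the kind of index that usually helps the ORDER BY dtedit case looks something like this (illustrative only; the INCLUDE list is a guess based on the filter columns in the query above):
-- Key the index on the window's ORDER BY column and INCLUDE the filter columns,
-- so the numbering can ride an already-ordered index instead of sorting 300k rows
CREATE NONCLUSTERED INDEX IX_xinfobase_dtedit
    ON xinfobase (dtedit)
    INCLUDE (xlngisdeleted, xlinvalid, xlngcurrent, lownerid, xlngfolder, xlngcontact)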

Optimising query

This view query takes 2 min to load 500,000 rows.
I'm using EF 4.0 and a thread to load this view into a DataGrid.
How can I optimize it so it loads in less time?
Update: I updated the query to the following, and now it takes 55 seconds, but that is still too long!
SELECT ROW_NUMBER() OVER (ORDER BY SS.IDStore_CardStore DESC) AS Id_PK,
       SC.IDStockCardIndex,
       SC.Designation,
       ISNULL(P.PMP, 0) AS PMP,
       ISNULL(SS.Quantity, 0) AS Quantity,
       SS.UnitePrice * SS.Quantity AS SubTotalStockCard,
       S.StoreName,
       SS.IDPurchaseInvoice
FROM dbo.Stores S
INNER JOIN dbo.StockCardsStores SS ON S.IDStore = SS.IDStore
RIGHT OUTER JOIN dbo.StockCard SC ON SS.IDStockCardIndex = SC.IDStockCardIndex
LEFT OUTER JOIN (SELECT SUM(UnitePrice * Quantity) / SUM(Quantity) AS PMP,
                        IDStockCardIndex
                 FROM dbo.StockCardsStores AS SCS
                 GROUP BY IDStockCardIndex) AS P
    ON P.IDStockCardIndex = SC.IDStockCardIndex
Use the Estimated Execution Plan in SSMS. If you're using 2008 R2, SSMS will suggest a "missing index" that may improve the time overall. 500,000 rows in 55 seconds suggests that one or more table scans are kicking in. The Estimated Execution Plan will identify these; it will also show which part of the query "costs" the most overall, helping you to zero in.
Highlight the inner sub-query and look at the plan for that first, then work your way outwards.
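One concrete way to follow that advice (a sketch; it assumes the PMP aggregate is the expensive part) is to materialize the sub-query into an indexed temp table and join to that instead:
-- Pre-compute the weighted average once, rather than as a derived table
SELECT IDStockCardIndex,
       SUM(UnitePrice * Quantity) / SUM(Quantity) AS PMP
INTO   #PMP
FROM   dbo.StockCardsStores
GROUP  BY IDStockCardIndex

-- Let the outer query seek rather than scan
CREATE UNIQUE CLUSTERED INDEX CIX_PMP ON #PMP (IDStockCardIndex)

-- ...then replace the derived table P in the query above with:
-- LEFT OUTER JOIN #PMP AS P ON P.IDStockCardIndex = SC.IDStockCardIndex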
