Update 13 million rows - SQL Server 2008

How can I update 13 million rows in stages, using a cursor or something similar?
The current update script has been running for days and still hasn't finished.
There is a row_id field with values from 1 to 13 million.
Only one field needs to be updated.
UPDATE
[CIPC].[dbo].[tbldirector]
SET
[CIPC].[dbo].[tbldirector].ENT_NUM = REG.Ent_Number
FROM
[CIPC].[dbo].[tbldirector] DIR
INNER JOIN
[Cipc].[dbo].[tblregister] REG
ON
DIR.ENT_LONGNAME = REG.ENT_NAME

In this case you don't need a cursor. You can do it in batches with a loop like this:
DECLARE @indx INT, @StepSize INT
SET @indx = 1
SET @StepSize = 100000
WHILE (EXISTS(SELECT 0 FROM [CIPC].[dbo].[tbldirector] WHERE row_id >= @indx))
BEGIN
PRINT 'Going to update indx ' + REPLICATE(CONVERT(VARCHAR, @indx) + ' -- ' + CONVERT(VARCHAR, @indx + @StepSize - 1) + ' | ', 200)
-- One transaction per batch, so locks and log space are released between batches
BEGIN TRAN
UPDATE DIR
SET DIR.ENT_NUM = REG.Ent_Number
FROM [CIPC].[dbo].[tbldirector] DIR
INNER JOIN [Cipc].[dbo].[tblregister] REG
ON DIR.ENT_LONGNAME = REG.ENT_NAME
-- Half-open range, so consecutive batches do not overlap at the boundary row
WHERE DIR.row_id >= @indx AND DIR.row_id < @indx + @StepSize
COMMIT
SELECT @indx = @indx + @StepSize
SELECT REPLICATE(LEFT(CONVERT(VARCHAR, @indx) + ' | ', 10), 200)
END

The link below describes three methods for doing this:
http://ksadba.wordpress.com/2008/06/16/updating-millions-of-rows-merge-vs-bulk-collect/

Related

Paginating a parent in SQL Server on a parent/child query

(SQL Server 2012 - Web Edition)
I have a parent/child (one to many) relationship in a query like so:
SELECT a.a, a.b, b.c
FROM tablea INNER JOIN
tableb ON b.pk = a.fk
I have a huge pagination query that encompasses this using the standard (pseudo-code):
WITH C as (SELECT top(@perpage*@pagenum) rowID = row_number() OVER (somefield)),
SELECT c.* FROM C (query) WHERE DT_RowId > (@pagenum-1)*@perpage
The question I have is in this scenario is it possible to paginate off the parent table (a), instead of the entire query? Can I modify my pagination query (not the sql that pulls the query itself) so that when I ask for 10 rows, it gives me 10 rows from the parent, with 'x' number of children attached?
I know I'm not giving the bigger picture here, but the bigger picture is ugly. If need be, we can go there, but it's out there. Here's a small taste of where we're going with this:
IF UPPER(LEFT(@rSQL, 6)) = 'SELECT'
BEGIN
SET @rSQL = 'SELECT * FROM (' + @rSQL + ')' + ' as rTBL';
SET @rSQL = RIGHT(@rSQL, LEN(@rSQL)-7);
IF (LEN(LTRIM(@search)) > 0)
BEGIN
SET @rPaging =
'IF (@schemaonly=1) SET FMTONLY ON;
SELECT @ttlrows = COUNT(*) FROM (SELECT ' + @rSQL + @rWhere + ') AS TBL;
WITH C as (select top(@perpage*@pagenum) DT_RowId = ROW_NUMBER() OVER (' + @rOrder + '), ';
SET @rPaging = @rPaging + @rSQL + @rWhere + ')
SELECT C.*' + @rcols + ', (@perpage-1) * @pagenum as pagenum, @ttlrows as ct, CEILING(@ttlrows / CAST(@perpage AS FLOAT)) as pages
FROM C '+ @query + ' WHERE DT_RowId > (@pagenum-1) * @perpage ';
END
ELSE
BEGIN
SET @rPaging =
'IF (@schemaonly=1) SET FMTONLY ON;
SELECT @ttlrows = COUNT(*) FROM (' + @oSQL + ') AS SUBQUERY;
WITH C as (select top(@perpage*@pagenum) DT_RowId = ROW_NUMBER() OVER (' + @rOrder + '), ';
SET @rPaging = @rPaging + @rSQL + ')
SELECT C.*' + @rcols + ',(@perpage-1) * @pagenum as pagenum, @ttlrows as ct, CEILING(@ttlrows / CAST(@perpage AS FLOAT)) as pages
FROM C '+ @query + ' WHERE DT_RowId > (@pagenum-1) * @perpage ';
END
PRINT @rPaging;
EXECUTE SP_EXECUTESQL @rPaging, @parms, @ttlrows out, @schemaonly, @perpage, @pagenum, @fksiteID, @filter1, @filter2, @filter3, @filter4, @intfilter1, @intfilter2, @intfilter3, @intfilter4, @datefilter1, @datefilter2, @search;
SET FMTONLY OFF;
END
ELSE
BEGIN
SET @rSQL = LTRIM(REPLACE(UPPER(@rSQL), 'EXEC',''));
EXECUTE SP_EXECUTESQL @rSQL, @parms, @ttlrows out, @schemaonly, @perpage, @pagenum, @fksiteID, @filter1, @filter2, @filter3, @filter4, @intfilter1, @intfilter2, @intfilter3, @intfilter4, @datefilter1, @datefilter2;
END
You could do the pagination in a CTE that only gets the parent rows, and then join the child rows in a subsequent CTE or in the main query.
Due to the dynamic way you are using this, this might have to involve building your pagination query from the same building blocks you use to build @query. Without seeing the code that builds @query I can't be much more specific than that.
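A minimal sketch of that idea, assuming tablea is the parent, a.a is a usable sort key, and the join columns are the ones shown in the question. The CTE name ParentPage and the sample variable values are made up for illustration:

```sql
DECLARE @perpage INT = 10, @pagenum INT = 1;

-- Number only the parent rows, so the page size counts parents, not joined rows
WITH ParentPage AS (
    SELECT a.a, a.b, a.fk,
           ROW_NUMBER() OVER (ORDER BY a.a) AS DT_RowId
    FROM tablea AS a
)
SELECT p.a, p.b, b.c, p.DT_RowId
FROM ParentPage AS p
-- INNER JOIN drops parents without children; use LEFT JOIN to keep them on the page
INNER JOIN tableb AS b ON b.pk = p.fk
WHERE p.DT_RowId >  (@pagenum - 1) * @perpage
  AND p.DT_RowId <= @pagenum * @perpage
ORDER BY p.DT_RowId;
```

Because the row number is assigned before the join, asking for 10 rows returns up to 10 parents with however many children each of them has.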
You could add
,DENSE_RANK() OVER (ORDER BY table_a.primary_key)
to the joined query. This indirectly provides the same numbering as
,ROW_NUMBER() OVER(ORDER BY table_a.primary_key)
computed on table a alone: DENSE_RANK gives every child row of the same parent the same number, so it works on the final result set, whereas the ROW_NUMBER version requires going back to table a.
But please be aware of the disadvantage: any additional ranking function will force an additional sort operation on the result set! This might significantly affect query performance. If that is the case in your scenario, I'd recommend following Tab Alleman's solution and using a CTE.

Get name from variable using index in T-SQL

Using the following two queries
Query 1:
DECLARE @ContentColumnNamesSRC NVARCHAR(4000) = NULL;
SELECT
@ContentColumnNamesSRC = COALESCE(@ContentColumnNamesSRC + ', ', '') + '[' + name + ']'
FROM
tempdb.sys.columns
WHERE
1 = 1
AND object_id = OBJECT_ID('tempdb..#tempTable')
AND column_id < 9 -- First 8 columns are ID data, which is what I am after
Query 2:
DECLARE @ContentColumnNamesDST NVARCHAR(4000) = NULL;
SELECT
@ContentColumnNamesDST = COALESCE(@ContentColumnNamesDST + ', ', '') + '[' + name + ']'
FROM
tempdb.sys.columns
WHERE
1 = 1
AND object_id = OBJECT_ID('Import.dbo.ContentTable')
AND column_id < 9 -- First 8 columns are ID data, which is what I am after
I can get the first 8 columns from each table into a variable.
What I would like to do is find a way to get the values out of the variable, such that I can match the column names.
They should be identical in each table, and I need this to create a dynamic merge statement in which the column names from each variable
@ContentColumnNamesSRC
and
@ContentColumnNamesDST
line up, so I can use them in a merge statement.
The point of this is to be able to use it in a loop, so that all I would have to do is change which tables it looks at and the merge statements would still work.
Ideally, I'd like to end up with something like the following:
SELECT @StageSQLCore = N'USE Staging;
BEGIN TRANSACTION
MERGE '+@StageTableCore+' AS DST
USING '+@ImportTableCore+' AS SRC
ON (SRC.[Key] = DST.[Key])
WHEN NOT MATCHED THEN
INSERT ('+@StageTableCoreColumns+')
VALUES (
'+@ImportTableCoreColumns+',GETDATE())
WHEN MATCHED
THEN UPDATE
SET
DST.'+@ContentColumnNamesDST[i]' = SRC.'+@ContentColumnNamesSRC[i] +'
,DST.'+@ContentColumnNamesDST[i]' = SRC.'+@ContentColumnNamesSRC[i] +'
,DST.'+@ContentColumnNamesDST[i]' = SRC.'+@ContentColumnNamesSRC[i] +'
,DST.'+@ContentColumnNamesDST[i]' = SRC.'+@ContentColumnNamesSRC[i] +'
,DST.[ETLDate] = GETDATE()
;
COMMIT'
EXEC (@StageSQLCore)
You can generate the SET list for the MERGE like this, provided the ordinal positions match in both tables:
DECLARE @MergeSQL NVARCHAR(4000) = NULL
SELECT
-- Builds "DST.[col1] = SRC.[col1], DST.[col2] = SRC.[col2], ..."
-- tc is treated as the destination table and bc as the source table
@MergeSQL = COALESCE(@MergeSQL + ', ', '') + 'DST.' + QUOTENAME(tc.COLUMN_NAME) + ' = SRC.' + QUOTENAME(bc.COLUMN_NAME) + CHAR(13)
FROM
test.INFORMATION_SCHEMA.COLUMNS tc
inner join testb.INFORMATION_SCHEMA.COLUMNS bc
on tc.TABLE_NAME = bc.TABLE_NAME
and tc.ORDINAL_POSITION = bc.ORDINAL_POSITION
and tc.TABLE_NAME = 'History'
WHERE
tc.ORDINAL_POSITION < 5 -- First 4 columns here; use < 9 for the first 8 columns in the question
and bc.ORDINAL_POSITION < 5
select @MergeSQL
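As a rough sketch of how such a generated SET list could be plugged into the dynamic MERGE from the question: the table names and the sample SET list below are hypothetical stand-ins, and the statement is only printed so it can be inspected before it is run.

```sql
-- Hypothetical stand-ins; in practice these come from the loop and the query above
DECLARE @StageTableCore  NVARCHAR(300) = N'Staging.dbo.ContentTable';
DECLARE @ImportTableCore NVARCHAR(300) = N'Import.dbo.ContentTable';
DECLARE @MergeSQL NVARCHAR(4000) = N'DST.[ColA] = SRC.[ColA], DST.[ColB] = SRC.[ColB]';

-- Splice the generated column assignments into the WHEN MATCHED branch
DECLARE @StageSQLCore NVARCHAR(MAX) = N'
MERGE ' + @StageTableCore + N' AS DST
USING ' + @ImportTableCore + N' AS SRC
ON (SRC.[Key] = DST.[Key])
WHEN MATCHED THEN
    UPDATE SET ' + @MergeSQL + N', DST.[ETLDate] = GETDATE();';

PRINT @StageSQLCore;      -- review the generated statement first
-- EXEC (@StageSQLCore);  -- then execute it once it looks right
```

The WHEN NOT MATCHED branch from the question can be appended in the same way from the column-list variables.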

T-SQL Merge Two Comma-Separated Columns

I'm trying to merge one table into another (we'll call them Stage and Prod) that controls users and their permissions. My end result should be a single Prod table that has combined each userid's permissions from Stage into Prod. The issue I'm having though is that the tables were designed by an outside vendor and contain multiple pieces of information in one comma-delimited column.
Stage might look like below:
Userid | Permissions
----------------------------------------------------------------
1 | schedule,upload,test,download,admin
2 | test,upload
3 | download
Prod:
Userid | Permissions
----------------------------------------------------------------
1 | test,admin,schedule,download,upload
2 | admin
3 | download,upload
When they're merged, the userids should have their permissions from Stage, combined with those in Prod. However, tackling this when the permissions are a comma-delimited string has me at wit's end.
In the final result below, userid 1's permissions remain unchanged because they are the same in Stage as they are in Prod, merely in a different order.
Userid 2 had his Stage permissions added to his Prod since he did not have those permissions yet.
Userid 3 had his Prod permissions unchanged since his Stage permissions are already included.
Result:
Userid | Permissions
----------------------------------------------------------------
1 | test,admin,schedule,download,upload
2 | admin,test,upload
3 | download,upload
Is there any way to do this? Hopefully this makes some sense, but if there's any more info that might help I'm happy to try to provide it. Thank you for any help at all.
Interestingly enough, this was a topic of discussion on a MSSQLTips blog by Aaron Bertrand. Borrowing his code you can create the Numbers table and string splitting/reassembling functions required to make the following work. If you are planning on doing this often and are stuck with the schema you've shown, this is the way to go.
/*Create Test Data
create table StagePermissions (UserID int, [Permissions] nvarchar(max));
create table ProdPermissions (UserID int, [Permissions] nvarchar(max));
insert StagePermissions values
(1,'schedule,upload,test,download,admin'),
(2,'test,upload'),
(3,'download')
insert ProdPermissions values
(1,'test,admin,schedule,download,upload'),
(2,'admin'),
(3,'download,upload')
*/
select sp.UserID, dbo.ReassembleString(sp.Permissions+','+pp.Permissions,',',N'OriginalOrder') MergedPermissions
from StagePermissions sp
join ProdPermissions pp on pp.UserID=sp.UserID
Taking Steve's test data, but adding:
create table BothPermissions (UserID int, [Permissions] nvarchar(max));
This code will work with a fixed number of possible permissions.
DECLARE @XPermissions TABLE (
UserID int
,XSchedule BIT
,XUpload BIT
,XTest BIT
,XDownload BIT
,XAdmin BIT
)
INSERT INTO @XPermissions
SELECT
ISNULL(sp.UserID,pp.UserID),
-- ISNULL guards against users present in only one table (the FULL JOIN yields NULLs for them)
CHARINDEX('schedule',ISNULL(sp.[Permissions],'')) + CHARINDEX('schedule',ISNULL(pp.[Permissions],'')),
CHARINDEX('upload',ISNULL(sp.[Permissions],'')) + CHARINDEX('upload',ISNULL(pp.[Permissions],'')),
CHARINDEX('test',ISNULL(sp.[Permissions],'')) + CHARINDEX('test',ISNULL(pp.[Permissions],'')),
CHARINDEX('download',ISNULL(sp.[Permissions],'')) + CHARINDEX('download',ISNULL(pp.[Permissions],'')),
CHARINDEX('admin',ISNULL(sp.[Permissions],'')) + CHARINDEX('admin',ISNULL(pp.[Permissions],''))
FROM StagePermissions sp
FULL JOIN ProdPermissions pp
ON sp.UserID = pp.UserID
INSERT INTO BothPermissions
SELECT
UserID,
CASE XSchedule WHEN 0 THEN '' ELSE 'schedule ' END +
CASE XUpload WHEN 0 THEN '' ELSE 'upload ' END +
CASE XTest WHEN 0 THEN '' ELSE 'test ' END +
CASE XDownload WHEN 0 THEN '' ELSE 'download ' END +
CASE XAdmin WHEN 0 THEN '' ELSE 'admin' END
FROM @XPermissions
UPDATE BothPermissions
SET [Permissions] = REPLACE(RTRIM([Permissions]),' ',', ')
Now, I was further curious about Steve's answer. I think it is the most robust solution here. However, I wondered how it would perform with a large dataset. I still don't know the answer because I haven't set up the tools necessary to use it. But here's a query that includes some random number generation to populate 10,000 records of each:
SELECT GETDATE()
DECLARE @StagePerms TABLE (
UserID INT IDENTITY
,Perms NVARCHAR(MAX)
)
DECLARE @ProdPerms TABLE (
UserID INT IDENTITY
,Perms NVARCHAR(MAX)
)
DECLARE @Counter INT = 0
DECLARE @XString NVARCHAR(MAX)
WHILE @Counter < 10000
BEGIN
SET @Counter += 1
SET @XString = REPLACE(RTRIM(
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'test ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'admin ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'schedule ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'download ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'upload ' END)
,' ',', ')
INSERT INTO @StagePerms SELECT @XString
SET @XString = REPLACE(RTRIM(
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'test ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'admin ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'schedule ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'download ' END +
CASE ROUND(RAND()-.2,0) WHEN 0 THEN '' ELSE 'upload ' END)
,' ',', ')
INSERT INTO @ProdPerms SELECT @XString
END
SELECT GETDATE()
DECLARE @BothPerms TABLE (
UserID INT
,Perms NVARCHAR(MAX)
)
DECLARE @XPerms TABLE (
UserID int
,XSchedule BIT
,XUpload BIT
,XTest BIT
,XDownload BIT
,XAdmin BIT
)
INSERT INTO @XPerms
SELECT
ISNULL(sp.UserID,pp.UserID),
CHARINDEX('schedule',sp.Perms) + CHARINDEX('schedule',pp.Perms),
CHARINDEX('upload',sp.Perms) + CHARINDEX('upload',pp.Perms),
CHARINDEX('test',sp.Perms) + CHARINDEX('test',pp.Perms),
CHARINDEX('download',sp.Perms) + CHARINDEX('download',pp.Perms),
CHARINDEX('admin',sp.Perms) + CHARINDEX('admin',pp.Perms)
FROM @StagePerms sp
FULL JOIN @ProdPerms pp
ON sp.UserID = pp.UserID
INSERT INTO @BothPerms
SELECT
UserID,
CASE XTest WHEN 0 THEN '' ELSE 'test ' END +
CASE XAdmin WHEN 0 THEN '' ELSE 'admin ' END +
CASE XSchedule WHEN 0 THEN '' ELSE 'schedule ' END +
CASE XDownload WHEN 0 THEN '' ELSE 'download ' END +
CASE XUpload WHEN 0 THEN '' ELSE 'upload ' END
FROM @XPerms
UPDATE @BothPerms
SET Perms = REPLACE(RTRIM(Perms),' ',', ')
SELECT * FROM @BothPerms
SELECT GETDATE()
The random number generation took less than a second; the rest took about 31 seconds. Steve, I'd be interested to see a comparison. Doesn't matter, obviously, if the data doesn't allow for my solution. And I'm sure there's a sweet spot somewhere.
You can use the query below. It works in SQL Server 2012.
DECLARE @Stage TABLE (Userid int, Permission Varchar (8000))
DECLARE @Prod TABLE (Userid int, Permission Varchar (8000))
DECLARE @temp TABLE (Userid int, Permission Varchar (8000))
INSERT @Stage
(Userid,Permission)
VALUES
(1,'schedule,upload,test,download,admin'),
(2,'test,upload'),
(3,'download')
INSERT @Prod
(Userid,Permission)
VALUES
(1,'test,admin,schedule,download,upload'),
(2,'admin'),
(3,'download,upload')
-- Execution Part
-- Split each comma-separated list into rows via XML; UNION removes duplicates
INSERT INTO @temp
(Userid,Permission)
(
SELECT A.Userid AS Userid,Split.a.value('.', 'VARCHAR(100)') AS Permission FROM
(SELECT Userid,CAST ('<M>' + REPLACE(Permission, ',', '</M><M>') + '</M>' AS XML) AS Permission FROM @Stage A) AS A
CROSS APPLY Permission.nodes ('/M') AS Split(a)
UNION
SELECT A.Userid AS Userid,Split.a.value('.', 'VARCHAR(100)') AS Permission FROM
(SELECT Userid,CAST ('<M>' + REPLACE(Permission, ',', '</M><M>') + '</M>' AS XML) AS Permission FROM @Prod A) AS A
CROSS APPLY Permission.nodes ('/M') AS Split(a)
)
-- Reassemble one comma-separated list per user
SELECT Userid, Permission =
STUFF((SELECT ', ' + Permission
FROM @temp b
WHERE b.Userid = a.Userid
FOR XML PATH('')), 1, 2, '')
FROM @temp a
GROUP BY Userid
Output:
Userid | Permission
----------------------------------------------------------------
1 | admin, download, schedule, test, upload
2 | admin, test, upload
3 | download, upload
You can also use the built-in string splitting introduced in SQL Server 2016 (if you are already using that engine version, of course :) )
STRING_SPLIT returns a single-column table...
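A minimal sketch of that approach, assuming SQL Server 2016 or later with database compatibility level 130 or higher (STRING_SPLIT is unavailable below that). It reuses the sample data from the question in table variables; the CTE name Split is made up for illustration:

```sql
DECLARE @Stage TABLE (Userid INT, Permission VARCHAR(8000));
DECLARE @Prod  TABLE (Userid INT, Permission VARCHAR(8000));
INSERT @Stage VALUES (1,'schedule,upload,test,download,admin'),(2,'test,upload'),(3,'download');
INSERT @Prod  VALUES (1,'test,admin,schedule,download,upload'),(2,'admin'),(3,'download,upload');

-- Split both lists into one row per permission; UNION removes duplicates
WITH Split AS (
    SELECT s.Userid, LTRIM(RTRIM(x.value)) AS Permission
    FROM @Stage AS s CROSS APPLY STRING_SPLIT(s.Permission, ',') AS x
    UNION
    SELECT p.Userid, LTRIM(RTRIM(x.value))
    FROM @Prod AS p CROSS APPLY STRING_SPLIT(p.Permission, ',') AS x
)
-- Reassemble one comma-separated list per user
SELECT u.Userid,
       STUFF((SELECT ',' + s.Permission
              FROM Split AS s
              WHERE s.Userid = u.Userid
              ORDER BY s.Permission
              FOR XML PATH('')), 1, 1, '') AS Permission
FROM (SELECT DISTINCT Userid FROM Split) AS u
ORDER BY u.Userid;
```

Note that this returns each user's permissions sorted alphabetically rather than in Prod's original order, so it matches the expected result as a set, not character for character.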

AFTER UPDATE TRIGGER with Conditional Replies

I've written an AFTER UPDATE trigger on tblVisitLog: when a specific field, TimeOUT, gets updated, a new record is inserted into tblClinicalDoc. There is a possibility that a record with the identifying ID (Encounter_code) is already in the target table, so I added an IF EXISTS and an IF NOT EXISTS for the UPDATE and INSERT statements.
I'm sure there must be a more elegant and efficient way of writing this trigger than what I have. Here is what I have:
USE [test_db1]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER TRIGGER [dbo].[CreateRowInClinicalDoc]
ON [dbo].[tblVisitLog]
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON;
IF UPDATE(TimeOUT)
IF EXISTS (SELECT c.Encounter_code FROM tblClinicalDoc c
INNER JOIN INSERTED i ON c.Encounter_code = i.Encounter_code
WHERE I.TimeOut IS NOT NULL)
UPDATE tblClinicalDoc
SET EditDate = GetDate()
FROM INSERTED i
INNER JOIN tblClinicalDoc c ON c.Encounter_code = i.Encounter_code
END
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS (SELECT c.Encounter_code FROM tblClinicalDoc c
INNER JOIN INSERTED i ON c.Encounter_code = i.Encounter_code
WHERE I.TimeOut IS NOT NULL)
INSERT INTO tblClinicalDoc
SELECT DISTINCT v.CaseNumber AS CaseNumber, v.Episode AS Episode, v.Encounter_code AS Encounter_code, v.Provider AS Provider,
(r.Discipline + '_'+ r.ReportType) AS Note_Type, (r.Discipline + '_' +r.ReportType + ' - ' + v.Case_Name + ' ' + convert(nvarchar(10),
v.TimeIN, 112)) AS Note_Synopsis,
(UPPER(w.ClientCode) + '_' + v.Discipline + '_' + v.VisitType + '_' + RIGHT('0000000' + CONVERT(VARCHAR,v.Encounter_code), 7)+ '_' + convert(nvarchar(10), v.TimeIN, 112) + '.pdf') AS Document_Name,
GetDate() AS EditDate, v.Provider AS EditBy, '' AS InterfaceDate, '' AS InterfaceAgent, '-2' AS Sent, 0 AS Ack, '' AS Document_ID, '' AS Confired
FROM inserted v
INNER JOIN [WebLoginSelector].[dbo].LookupDatabaseCode w ON DB_NAME() = w.ClientDatabaseCode
INNER JOIN LookupReportTypes r ON v.Discipline = r.Discipline AND v.VisitType = r.VT
WHERE v.TimeOUT IS NOT NULL
END
GO
Thanks in advance for all replies,
JackW9653

Change a table name in SQL Server procedure

I want this procedure to change the table name when I execute it.
The table name that I want to change is Recargas_@mes
Is there some way to do that?
@MES DATETIME
AS
BEGIN
SELECT CUENTA, SUM(COSTO_REC) COSTO_REC
INTO E09040_DEV.BI_PRO_COSTO_RECARGAS
FROM (
SELECT a.*,(CASE
WHEN COD_AJUSTE IN ('ELEC_TEXT','TFREPPVV_C') THEN (A.VALOR)*(R.COSTO) ELSE 0 END)
FROM Recargas_@MES AS A, BI_PRO_LISTA_COSTOS_RECARGAS AS R
WHERE R.ANO_MES = @MES
) D
GROUP BY CUENTA
END
Sample code:
-- Declare variables (in the procedure, @MES is the input parameter)
DECLARE @MES DATETIME;
DECLARE @TSQL NVARCHAR(MAX);
-- Set the variable to a valid statement
SET @TSQL = N'
SELECT CUENTA, SUM(COSTO_REC) AS COSTO_REC
INTO E09040_DEV.BI_PRO_COSTO_RECARGAS
FROM (
SELECT A.*,
(CASE
WHEN COD_AJUSTE IN (''ELEC_TEXT'',''TFREPPVV_C'') THEN
(A.VALOR)*(R.COSTO)
ELSE 0
END)
FROM
Recargas_' + REPLACE(CONVERT(CHAR(10), @MES, 101), '/', '') + ' AS A,
BI_PRO_LISTA_COSTOS_RECARGAS AS R
WHERE R.ANO_MES = ''' + CONVERT(CHAR(10), @MES, 101) + '''
) D
GROUP BY CUENTA'
-- Execute the statement
EXECUTE (@TSQL)
Some things to note:
1 - I assume the table name has some type of suffix that is a date? I used MM/DD/YYYY with the slashes removed as the format for the suffix.
2 - The WHERE clause will only work if you are not using the time part of the variable.
For instance, 03/15/2016 00:00:00 would be a date without a time entry. If not, you will have to use >= and < to grab all hours for a particular day.
3 - You are creating a table on the fly with this code. On the second execution, you will get an error unless you drop the table.
4 - You are not using the ON clause when joining table A to table R. To be ANSI compliant, move the WHERE clause to an ON clause.
5 - The actual calculation created by the CASE statement is not given a column name.
Issues 3 to 5 have to be solved on your end since I do not have the detailed business requirements.
Have Fun.
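It should work using dynamic SQL, which allows a dynamic table name. Since @MES is a DATETIME, it has to be converted to a string before it can be concatenated, and the date literal in the WHERE clause needs quotes:
-- Style 112 (YYYYMMDD) is an assumed suffix format; adjust it to the real table names
DECLARE @SQL NVARCHAR(MAX) = N'
SELECT CUENTA, SUM(COSTO_REC) COSTO_REC
INTO E09040_DEV.BI_PRO_COSTO_RECARGAS
FROM (
SELECT a.*,(CASE
WHEN COD_AJUSTE IN (''ELEC_TEXT'',''TFREPPVV_C'') THEN (A.VALOR)*(R.COSTO) ELSE 0 END)
FROM Recargas_' + CONVERT(VARCHAR(8), @MES, 112) + ' AS A, BI_PRO_LISTA_COSTOS_RECARGAS AS R
WHERE R.ANO_MES = ''' + CONVERT(VARCHAR(8), @MES, 112) + '''
) D
GROUP BY CUENTA'
EXECUTE (@SQL)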
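A hedged variant of the same idea: only the table name is concatenated (wrapped in QUOTENAME), while the date is passed as a real parameter through sp_executesql. The YYYYMMDD suffix, the sample date, the CROSS JOIN spelling of the comma join, and the COSTO_REC alias placed directly on the CASE expression are assumptions for illustration; the INTO clause is left out so the sketch can be re-run.

```sql
DECLARE @MES DATETIME = '20160315';                     -- sample value, an assumption
DECLARE @Suffix CHAR(8) = CONVERT(CHAR(8), @MES, 112);  -- YYYYMMDD, assumed naming scheme

DECLARE @SQL NVARCHAR(MAX) = N'
SELECT A.CUENTA,
       SUM(CASE WHEN COD_AJUSTE IN (''ELEC_TEXT'', ''TFREPPVV_C'')
                THEN A.VALOR * R.COSTO ELSE 0 END) AS COSTO_REC
FROM ' + QUOTENAME('Recargas_' + @Suffix) + N' AS A
CROSS JOIN BI_PRO_LISTA_COSTOS_RECARGAS AS R
WHERE R.ANO_MES = @MES
GROUP BY A.CUENTA;';

-- The table name cannot be a parameter, but the filter value can
EXEC sp_executesql @SQL, N'@MES DATETIME', @MES = @MES;
```

Passing @MES as a parameter avoids date-format and quoting problems in the generated string, and QUOTENAME keeps an unexpected suffix from being interpreted as SQL.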
