I need to partition a SQL Server table on a monthly basis. So far I have been able to create 12 partitions for 2015. But once 2016 started, all the data began piling up in the last partition (December in my case). I need the January 2016 data to go into the first partition (January in my case). I cannot create partitions for every year. Any suggestions?
Below is an example of how to create an incremental monthly partition for a RANGE RIGHT function, including test data.
CREATE DATABASE Test;
GO
USE Test
GO
--main table partition function (before start of next month)
CREATE PARTITION FUNCTION PF_Monthly(datetime2(0))
AS RANGE RIGHT FOR VALUES (
'2015-01-01T00:00:00'
, '2015-02-01T00:00:00'
, '2015-03-01T00:00:00'
, '2015-04-01T00:00:00'
, '2015-05-01T00:00:00'
, '2015-06-01T00:00:00'
, '2015-07-01T00:00:00'
, '2015-08-01T00:00:00'
, '2015-09-01T00:00:00'
, '2015-10-01T00:00:00'
, '2015-11-01T00:00:00'
, '2015-12-01T00:00:00'
, '2016-01-01T00:00:00' --future empty partition
)
GO
--main table partition scheme
CREATE PARTITION SCHEME PS_Monthly
AS PARTITION PF_Monthly
ALL TO ( [PRIMARY] );
GO
--main partitioned table
CREATE TABLE dbo.MontylyPartitionedTable(
PartitioningColumn datetime2(0)
, OtherKeyColumn int NOT NULL
, OtherData int NULL
, CONSTRAINT PK_MontylyPartitionedTable PRIMARY KEY
CLUSTERED (PartitioningColumn, OtherKeyColumn)
ON PS_Monthly(PartitioningColumn)
) ON PS_Monthly(PartitioningColumn);
GO
---load 12M rows test data
WITH
t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
,t16M AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) - 1 AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t256 AS c)
INSERT INTO dbo.MontylyPartitionedTable WITH (TABLOCKX) (PartitioningColumn, OtherKeyColumn, OtherData)
SELECT DATEADD(month, num/1000000, '20150101'), num, num
FROM t16M
WHERE num < 12000000;
GO
CREATE PROCEDURE dbo.CreateMonthlyPartition
@NewMonthStartDate datetime2(0) --partition boundary to create
AS
SET XACT_ABORT ON;
BEGIN TRY
BEGIN TRAN;
--acquire exclusive lock on main table to prevent deadlocking during partition maintenance
DECLARE @result int = (SELECT TOP (0) 1 FROM dbo.MontylyPartitionedTable WITH (TABLOCKX));
--add new partition for future data
ALTER PARTITION SCHEME PS_Monthly
NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION PF_Monthly()
SPLIT RANGE (@NewMonthStartDate);
--committing releases the exclusive table lock taken above
COMMIT;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
THROW;
END CATCH;
GO
--schedule this before the start of each new month to create the partition for the month after next
--EOMONTH(GETDATE(), 1) is the last day of next month; adding one day gives the first day of the month after that
SELECT DATEADD(day, 1, EOMONTH(GETDATE(), 1));
DECLARE @NewMonthStartDate datetime2(0) = DATEADD(day, 1, EOMONTH(GETDATE(), 1));
EXEC dbo.CreateMonthlyPartition @NewMonthStartDate;
GO
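To make the RANGE RIGHT behaviour easier to reason about, here is a small Python model (not T-SQL, and only a sketch). The helper names `partition_number` and `first_day_of_month_after_next` are hypothetical; the boundary list mirrors the partition function in this answer. It shows why anything dated 2016-01-01 or later falls into the last partition until a new boundary is split off, and what boundary a job scheduled before month end would create.

```python
from bisect import bisect_right
from datetime import date


def partition_number(boundaries, value):
    """1-based partition number under RANGE RIGHT semantics.

    Each boundary value belongs to the partition on its right, so a value
    equal to a boundary lands in the higher-numbered partition.
    """
    return bisect_right(boundaries, value) + 1


def first_day_of_month_after_next(today):
    """First day of the month that follows next month."""
    months = today.year * 12 + (today.month - 1) + 2
    return date(months // 12, months % 12 + 1, 1)


# Boundaries 2015-01-01 .. 2015-12-01 plus the empty future boundary 2016-01-01.
boundaries = [date(2015, m, 1) for m in range(1, 13)] + [date(2016, 1, 1)]

print(partition_number(boundaries, date(2014, 12, 31)))  # 1  (before the first boundary)
print(partition_number(boundaries, date(2015, 12, 31)))  # 13 (December 2015)
print(partition_number(boundaries, date(2016, 1, 1)))    # 14 (last partition)
print(first_day_of_month_after_next(date(2015, 12, 15)))  # 2016-02-01
```

Thirteen boundaries give fourteen partitions; without the monthly split, every later date keeps landing in partition 14.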
Partition the table on a monthly basis using a computed column.
Step 1: Create a filegroup for each of the 12 months
ALTER DATABASE yourDataBase ADD FILEGROUP January
ALTER DATABASE yourDataBase ADD FILE (
NAME = N'January',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\January.ndf'
) TO FILEGROUP January
ALTER DATABASE yourDataBase ADD FILEGROUP February
ALTER DATABASE yourDataBase ADD FILE (
NAME = N'February',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\February.ndf',
SIZE = 3072KB , FILEGROWTH = 1024KB
) TO FILEGROUP February
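And so on for the remaining months, through December.
Step 2: Create the partition function
CREATE PARTITION FUNCTION partition_ByMonth (int) AS RANGE RIGHT FOR VALUES (2,3,4,5,6,7,8,9,10,11,12);
Step 3: Create the partition scheme
CREATE PARTITION SCHEME partition_scheme_ByMonth
AS PARTITION partition_ByMonth
TO (January, February, March, April, May, June, July, August, September, October, November, December);
Step 4: Add a persisted computed column to partition on
ALTER TABLE PartitionTableByMonth ADD PartitionColumn as MONTH(OrderDate) PERSISTED
Step 5: Create an index on the partition scheme
CREATE NONCLUSTERED INDEX IX_PartitionedTable_Pd ON PartitionTableByMonth (PartitionColumn )
ON partition_scheme_ByMonth(PartitionColumn )
The index is now partitioned by month. Note that creating a nonclustered index on the partition scheme partitions only that index; to partition the table's data as well, the clustered index (or the heap) must also be placed on the partition scheme.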
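A quick way to see what this scheme does with data from different years is to model the month-number boundaries in Python. This is only an illustrative sketch; `month_partition` is a hypothetical helper, and the boundaries are the values 2 through 12 from the partition function above.

```python
from bisect import bisect_right
from datetime import date

# Boundaries of the month-number partition function: RANGE RIGHT FOR VALUES (2..12).
MONTH_BOUNDARIES = list(range(2, 13))


def month_partition(order_date):
    """1-based partition for MONTH(order_date) under RANGE RIGHT semantics."""
    return bisect_right(MONTH_BOUNDARIES, order_date.month) + 1


for d in (date(2015, 1, 15), date(2015, 12, 31), date(2016, 1, 1)):
    print(d, "-> partition", month_partition(d))
```

Because only the month number is used, January 2015 and January 2016 share partition 1. That avoids adding boundaries every year, at the cost of no longer being able to switch out or truncate a single year's month on its own.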
I'm new to working with databases. The problem is described below.
Problem: I have X tables with Y partitions each. I need a way to find which tables are partitioned, and which partitions are almost full or were created more than 30 days ago (before D-30) and need to be deleted, because the application is raising errors caused by them.
Please share guidance or queries for solving this.
Artifacts: I tried several queries as SYSDBA to list all the partitions, but multiple dependencies popped up. I'm still trying to figure out a concrete way to build a scheduler job.
Step 1 is to create your partitions. In my test case I am working with one partitioned table, t1 (weekly), but I'm including the syntax for the other partitioning intervals:
t0 = daily, t2 = monthly, t3 = quarterly and t4 = yearly.
Then I populate t1 with sample data. I'm using one date per day, but during your testing add as many dates per day as you like. I would suggest data that emulates your application.
In addition, I am using a global index for table t1 and local indexes for the rest.
When a partition is dropped from a table that has a global index, the entire index must be rebuilt or it will be left unusable.
Based on your application, you need to decide for yourself whether to use global or local indexes.
Note: I am creating all the partitions on DATE columns, but you can also use TIMESTAMP columns if need be.
/* weekly PARTITION */
CREATE TABLE t1 (
seq_num NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
dt DATE)
PARTITION BY RANGE (dt)
INTERVAL ( NUMTODSINTERVAL (7, 'DAY') ) (
PARTITION OLD_DATA VALUES LESS THAN (DATE '2022-04-01')
);
/
INSERT into t1 (dt)
with dt (dt, interv) as (
select date '2022-04-01', numtodsinterval(1,'DAY') from dual
union all
select dt.dt + interv, interv from dt
where dt.dt + interv < date '2022-08-31')
select dt from dt;
/
create index t1_global_ix on T1 (dt);
/
/* daily PARTITION */
CREATE TABLE t0 (
seq_num NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
dt DATE)
PARTITION BY RANGE (dt)
INTERVAL ( NUMTODSINTERVAL (1, 'DAY') ) (
PARTITION OLD_DATA VALUES LESS THAN (DATE '2022-01-01')
);
/
INSERT into t0 (dt)
with dt (dt, interv) as (
select date '2022-01-01', numtodsinterval(1,'DAY') from dual
union all
select dt.dt + interv, interv from dt
where dt.dt + interv < date '2022-12-31')
select dt from dt;
/
create index t0_global_ix on T0 (dt);
/
/* MONTHLY PARTITION */
CREATE TABLE t2 (
seq_num NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
dt DATE)
PARTITION BY RANGE (dt)
INTERVAL ( NUMTOYMINTERVAL (1, 'MONTH') ) (
PARTITION OLD_DATA VALUES LESS THAN (DATE '2022-01-01')
);
/
INSERT into t2 (dt)
with dt (dt, interv) as (
select date '2021-01-01', numtodsinterval(1,'DAY') from dual
union all
select dt.dt + interv, interv from dt
where dt.dt + interv < date '2022-12-31')
select dt from dt;
/
create index t2_local_ix on T2 (dt) LOCAL;
/
/* QUARTERLY PARTITION */
CREATE TABLE t3 (
seq_num NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
dt DATE
)
PARTITION BY RANGE (dt)
INTERVAL
(NUMTOYMINTERVAL(3,'MONTH'))
(
PARTITION OLD_DATA values LESS THAN (TO_DATE('2021-01-01','YYYY-MM-DD'))
);
/
INSERT into t3 (dt)
with dt (dt, interv) as (
select date '2021-01-01', numtodsinterval(1,'DAY') from dual
union all
select dt.dt + interv, interv from dt
where dt.dt + interv < date '2022-12-31')
select dt from dt;
/
create index t3_local_ix on T3 (dt) LOCAL;
/
/* yearly PARTITION */
CREATE TABLE t4 (
seq_num NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
dt DATE
)
PARTITION BY RANGE (dt)
INTERVAL
(NUMTOYMINTERVAL(1,'YEAR'))
(
PARTITION OLD_DATA values LESS THAN (TO_DATE('2020-01-01','YYYY-MM-DD'))
);
/
INSERT into t4 (dt)
with dt (dt, interv) as (
select date '2021-01-01', numtodsinterval(90,'DAY') from dual
union all
select dt.dt + interv, interv from dt
where dt.dt + interv < date '2023-12-31')
select dt from dt;
/
create index t4_local_ix on T4 (dt) LOCAL;
/
Every time a new partition is created, Oracle generates a system name for it, which is quite cryptic.
Here is a list of the partition names that Oracle generated when I loaded the data above:
SELECT PARTITION_NAME
FROM USER_TAB_PARTITIONS
WHERE TABLE_NAME = 'T1'
PARTITION_NAME
OLD_DATA
SYS_P458773
SYS_P458774
SYS_P458775
SYS_P458776
SYS_P458777
SYS_P458778
SYS_P458779
SYS_P458780
SYS_P458781
SYS_P458782
SYS_P458783
SYS_P458784
SYS_P458785
SYS_P458786
SYS_P458787
SYS_P458788
SYS_P458789
SYS_P458790
SYS_P458791
SYS_P458792
SYS_P458793
SYS_P458794
Although partition management works with system-generated partition names, I use the procedure below to rename them to something more meaningful.
Let's create and run the procedure and take a look at the names again. Since we are working with weekly partitions, each name consists of P_ (for partition), the 4-digit year the partition is in, W (for week of the year), and the 2-digit week number within that year.
I would suggest using the scheduler to run this process at least once a day. You can run it as many times as you want; it will not cause any harm.
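The weekly naming rule can be checked outside the database with a few lines of Python. This is a sketch under the assumption that the name is derived from the day before the partition's exclusive upper bound, formatted with the ISO year and ISO week; `weekly_partition_name` is a hypothetical helper, not part of the procedure.

```python
from datetime import date, timedelta


def weekly_partition_name(high_value):
    """Name a weekly partition after the last day it can contain.

    high_value is the partition's exclusive upper bound, so one day is
    subtracted before the ISO year and ISO week number are formatted.
    """
    last_day = high_value - timedelta(days=1)
    iso_year, iso_week, _ = last_day.isocalendar()
    return f"P_{iso_year}W{iso_week:02d}"


# First weekly partition of t1 covers 2022-04-01 up to (not including) 2022-04-08.
print(weekly_partition_name(date(2022, 4, 8)))   # P_2022W14
print(weekly_partition_name(date(2022, 4, 15)))  # P_2022W15
```

Using the ISO year rather than the calendar year matters around New Year, where the last days of December or the first days of January can belong to a week of the neighbouring year.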
CREATE OR REPLACE PROCEDURE MaintainPartitions IS
EXPRESSION_IS_OF_WRONG_TYPE EXCEPTION;
PRAGMA EXCEPTION_INIT(EXPRESSION_IS_OF_WRONG_TYPE, -6550);
CURSOR PartTables IS
SELECT TABLE_NAME, INTERVAL
FROM USER_PART_TABLES
WHERE PARTITIONING_TYPE = 'RANGE'
ORDER BY TABLE_NAME;
CURSOR TabParts(aTableName VARCHAR2) IS
SELECT PARTITION_NAME, HIGH_VALUE
FROM USER_TAB_PARTITIONS
WHERE regexp_like(partition_name,'^SYS_P[[:digit:]]{1,10}') AND
TABLE_NAME = aTableName AND
table_name not like 'BIN$%'
and interval is not null
ORDER BY PARTITION_POSITION;
ym INTERVAL YEAR TO MONTH;
ds INTERVAL DAY TO SECOND;
newPartName VARCHAR2(30);
PERIOD TIMESTAMP;
BEGIN
FOR aTab IN PartTables LOOP
BEGIN
EXECUTE IMMEDIATE 'BEGIN :ret := '||aTab.INTERVAL||'; END;' USING OUT ds;
ym := NULL;
EXCEPTION
WHEN EXPRESSION_IS_OF_WRONG_TYPE THEN
EXECUTE IMMEDIATE 'BEGIN :ret := '||aTab.INTERVAL||'; END;' USING OUT ym;
ds := NULL;
END;
FOR aPart IN TabParts(aTab.TABLE_NAME) LOOP
EXECUTE IMMEDIATE 'BEGIN :ret := '||aPart.HIGH_VALUE||'; END;' USING OUT PERIOD;
IF ds IS NOT NULL THEN
IF ds >= INTERVAL '7' DAY THEN
-- Weekly partition
EXECUTE IMMEDIATE 'BEGIN :ret := TO_CHAR('||aPart.HIGH_VALUE||' - :int, :fmt); END;' USING OUT newPartName, INTERVAL '1' DAY, '"P_"IYYY"W"IW';
ELSE
-- Daily partition
EXECUTE IMMEDIATE 'BEGIN :ret := TO_CHAR('||aPart.HIGH_VALUE||' - :int, :fmt); END;' USING OUT newPartName, INTERVAL '1' DAY, '"P_"YYYYMMDD';
END IF;
ELSE
IF ym = INTERVAL '3' MONTH THEN
-- Quarterly partition
EXECUTE IMMEDIATE 'BEGIN :ret := TO_CHAR('||aPart.HIGH_VALUE||' - :int, :fmt); END;' USING OUT newPartName, INTERVAL '1' DAY, '"P_"YYYY"Q"Q';
ELSE
-- Monthly partition
EXECUTE IMMEDIATE 'BEGIN :ret := TO_CHAR('||aPart.HIGH_VALUE||' - :int, :fmt); END;' USING OUT newPartName, INTERVAL '1' DAY, '"P_"YYYYMM';
END IF;
END IF;
IF newPartName <> aPart.PARTITION_NAME THEN
EXECUTE IMMEDIATE 'ALTER TABLE '||aTab.TABLE_NAME||' RENAME PARTITION '||aPart.PARTITION_NAME||' TO '||newPartName;
END IF;
END LOOP;
END LOOP;
END MaintainPartitions;
/
EXEC MaintainPartitions
SELECT PARTITION_NAME
FROM USER_TAB_PARTITIONS
WHERE TABLE_NAME = 'T1'
PARTITION_NAME
OLD_DATA
P_2022W14
P_2022W15
P_2022W16
P_2022W17
P_2022W18
P_2022W19
P_2022W20
P_2022W21
P_2022W22
P_2022W23
P_2022W24
P_2022W25
P_2022W26
P_2022W27
P_2022W28
P_2022W29
P_2022W30
P_2022W31
P_2022W32
P_2022W33
P_2022W34
SELECT COUNT(*) FROM USER_TAB_PARTITIONS
COUNT(*)
31
The next step is setting up your retention table. There should be an entry for each interval-range-partitioned table.
The retention value is for you to decide. In my example, I chose 30 days for table T1. This means that when the high value of a partition is more than 30 days old, the partition is eligible to be dropped. So choose wisely when setting up these values.
Note: I listed the names of other tables to show how each table has its own value.
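The eligibility rule itself is a single comparison. The Python sketch below restates it with made-up dates so the boundary case is easy to see; `is_droppable` is a hypothetical name, and the rule it encodes is "high value older than now minus retention".

```python
from datetime import datetime, timedelta


def is_droppable(high_value, retention, now):
    """True when the partition's upper bound is older than the retention window."""
    return high_value < now - retention


now = datetime(2022, 8, 31)
retention = timedelta(days=30)  # the value chosen for T1 in this example

print(is_droppable(datetime(2022, 7, 29), retention, now))  # True
print(is_droppable(datetime(2022, 8, 1), retention, now))   # False (exactly 30 days old)
print(is_droppable(datetime(2022, 8, 5), retention, now))   # False
```

Since the comparison uses the partition's upper bound, every row in a dropped partition is older than the retention period.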
CREATE TABLE PARTITION_RETENTION (
seq_num NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
TABLE_NAME VARCHAR2(30),
RETENTION INTERVAL DAY(3) TO SECOND(0),
CONSTRAINT
partition_retention_pk primary key (table_name),
CONSTRAINT CHK_NON_ZERO_DAYS CHECK (
RETENTION > INTERVAL '0' DAY
),
CONSTRAINT CHK_WHOLE_DAYS CHECK (
EXTRACT(HOUR FROM RETENTION) = 0
AND EXTRACT(MINUTE FROM RETENTION) = 0
AND EXTRACT(SECOND FROM RETENTION) = 0
)
);
insert into PARTITION_RETENTION (TABLE_NAME, RETENTION)
select 'T0', interval '10' day from dual union all
select 'T1', interval '30' day from dual union all
select 'T2', interval '15' day from dual union all
select 'T3', interval '30' day from dual union all
select 'T4', 15 * interval '1' day from dual union all
select 'T5', 5 * interval '1 00:00:00' day to second from dual;
Below are three pieces of code that need to be created.
The ddl procedure is a wrapper that shows you what is being processed and how long it takes.
The rebuild_index procedure rebuilds any unusable indexes. As mentioned above, if you are using a global index and a partition is dropped, the index needs to be rebuilt. I hardcoded parallel 4 in this example, but if you have plenty of CPU power you may want to increase the number to fit your needs.
In addition, there are other ways indexes can be marked unusable, so you may want to consider scheduling that task on its own.
Lastly, there is the anonymous block, which actually drops the partitions whose retention period has passed. This needs to be scheduled once a day.
If you look carefully at the anonymous block, the last step is a call to the rebuild_index procedure, so any unusable index will be rebuilt.
Now let's run the process and see what happens.
CREATE OR REPLACE PROCEDURE ddl(p_cmd varchar2)
authid current_user
is
t1 pls_integer;
BEGIN
t1 := dbms_utility.get_time;
dbms_output.put_line(p_cmd);
execute immediate p_cmd;
dbms_output.put_line((dbms_utility.get_time - t1)/100 || ' seconds');
END;
/
CREATE OR REPLACE PROCEDURE rebuild_index
authid current_user
is
BEGIN
for i in (
select index_owner, index_name, partition_name, 'partition' ddl_type
from all_ind_partitions
where status = 'UNUSABLE'
union all
select owner, index_name, null, null
from all_indexes
where status = 'UNUSABLE'
)
loop
if i.ddl_type is null then
ddl('alter index '||i.index_owner||'.'||i.index_name||' rebuild parallel 4 online');
else
ddl('alter index '||i.index_owner||'.'||i.index_name||' rebuild '||i.ddl_type||' '||i.partition_name||' parallel 4 online');
end if;
end loop;
END;
/
DECLARE
CANNOT_DROP_LAST_PARTITION EXCEPTION;
PRAGMA EXCEPTION_INIT(CANNOT_DROP_LAST_PARTITION, -14758);
CANNOT_DROP_ONLY_ONE_PARTITION EXCEPTION;
PRAGMA EXCEPTION_INIT(CANNOT_DROP_ONLY_ONE_PARTITION, -14083);
ts TIMESTAMP;
CURSOR TablePartitions IS
SELECT TABLE_NAME, PARTITION_NAME, p.HIGH_VALUE, t.INTERVAL, RETENTION, DATA_TYPE
FROM USER_PART_TABLES t
JOIN USER_TAB_PARTITIONS p USING (TABLE_NAME)
JOIN USER_PART_KEY_COLUMNS pk ON pk.NAME = TABLE_NAME
JOIN USER_TAB_COLS tc USING (TABLE_NAME, COLUMN_NAME)
JOIN PARTITION_RETENTION r USING (TABLE_NAME)
WHERE pk.object_type = 'TABLE' AND
t.partitioning_type = 'RANGE' AND
REGEXP_LIKE (tc.data_type, '^DATE$|^TIMESTAMP.*');
BEGIN
FOR aPart IN TablePartitions LOOP
EXECUTE IMMEDIATE 'BEGIN :ret := '||aPart.HIGH_VALUE||'; END;' USING OUT ts;
IF ts < SYSTIMESTAMP - aPart.RETENTION THEN
BEGIN
ddl('alter table '||aPart.TABLE_NAME||' drop partition '||aPart.partition_name);
EXCEPTION
WHEN CANNOT_DROP_ONLY_ONE_PARTITION THEN
DBMS_OUTPUT.PUT_LINE('Cant drop the only partition '||aPart.PARTITION_NAME ||' from table '||aPart.TABLE_NAME);
ddl('ALTER TABLE '||aPart.TABLE_NAME||' TRUNCATE PARTITION '||aPart.PARTITION_NAME);
WHEN CANNOT_DROP_LAST_PARTITION THEN
BEGIN
DBMS_OUTPUT.PUT_LINE('Drop last partition '||aPart.PARTITION_NAME ||' from table '||aPart.TABLE_NAME);
EXECUTE IMMEDIATE 'ALTER TABLE '||aPart.TABLE_NAME||' SET INTERVAL ()';
ddl('alter table '||aPart.TABLE_NAME||' drop partition '||aPart.partition_name);
EXECUTE IMMEDIATE 'ALTER TABLE '||aPart.TABLE_NAME||' SET INTERVAL( '||aPart.INTERVAL||' )';
EXCEPTION
WHEN CANNOT_DROP_ONLY_ONE_PARTITION THEN
-- Depending on the order the "last" partition can be also the "only" partition at the same time
EXECUTE IMMEDIATE 'ALTER TABLE '||aPart.TABLE_NAME||' SET INTERVAL( '||aPart.INTERVAL||' )';
DBMS_OUTPUT.PUT_LINE('Cant drop the only partition '||aPart.PARTITION_NAME ||' from table '||aPart.TABLE_NAME);
ddl('ALTER TABLE '||aPart.TABLE_NAME||' TRUNCATE PARTITION '||aPart.PARTITION_NAME);
END;
END;
END IF;
END LOOP;
rebuild_index();
END;
/
alter table T1 drop partition OLD_DATA
.02 seconds
alter table T1 drop partition P_2022W14
.01 seconds
alter table T1 drop partition P_2022W15
.02 seconds
alter table T1 drop partition P_2022W16
.01 seconds
alter table T1 drop partition P_2022W17
.02 seconds
alter table T1 drop partition P_2022W18
.01 seconds
alter table T1 drop partition P_2022W19
.02 seconds
alter table T1 drop partition P_2022W20
.01 seconds
alter table T1 drop partition P_2022W21
.01 seconds
alter table T1 drop partition P_2022W22
.02 seconds
alter table T1 drop partition P_2022W23
.01 seconds
alter table T1 drop partition P_2022W24
.01 seconds
alter table T1 drop partition P_2022W25
.01 seconds
alter table T1 drop partition P_2022W26
.01 seconds
alter table T1 drop partition P_2022W27
.02 seconds
alter index SQL_WUKYPRGVPTOUVLCAEKUDCRCQI.T1_GLOBAL_IX rebuild parallel 4 online
.1 seconds
…
…
…
alter index SQL_WUKYPRGVPTOUVLCAEKUDCRCQI.T1_GLOBAL_IX rebuild parallel 4 online
.1 seconds
SELECT count(*) from USER_TAB_PARTITIONS
Where
table_name not like 'BIN$%'
8
SELECT PARTITION_NAME
FROM USER_TAB_PARTITIONS
WHERE TABLE_NAME = 'T1'
AND
table_name not like 'BIN$%'
P_2022W28
P_2022W29
P_2022W30
P_2022W31
P_2022W32
P_2022W33
P_2022W34
P_2022W35
I have a table that contains a date column. I want to add four more columns to the same table, DAY, MONTH, YEAR and QUARTER, based on the value of the date column for all records in the table. How can this be done?
Option 1 - computed columns via datepart
ALTER TABLE dbo.MyTable
ADD TheYear AS DATEPART(YEAR, DateColumn);
ALTER TABLE dbo.MyTable
ADD TheQuarter AS DATEPART(QUARTER, DateColumn);
ALTER TABLE dbo.MyTable
ADD TheMonth AS DATEPART(MONTH, DateColumn);
ALTER TABLE dbo.MyTable
ADD TheDay AS DATEPART(DAY, DateColumn);
This adds virtual, read-only columns to your table based on the value of your source column. This is likely the easiest approach, unless there are relevant requirements you have not mentioned.
Option 2 - Add columns
ALTER TABLE dbo.MyTable
ADD TheYear int;
ALTER TABLE dbo.MyTable
ADD TheQuarter tinyint;
ALTER TABLE dbo.MyTable
ADD TheMonth tinyint;
ALTER TABLE dbo.MyTable
ADD TheDay tinyint;
This is likely going to cause you pain, as nothing ensures that the values stored in those 4 columns have any bearing on the value in your original date column, but it's an approach.
And since you're tagging SSIS: if you are trying to do this in a data flow, don't. Write the query to do it and bring this data into your pipeline:
SELECT T.*
, TheYear = DATEPART(YEAR, T.DateColumn)
, TheQuarter = DATEPART(QUARTER, T.DateColumn)
, TheMonth = DATEPART(MONTH, T.DateColumn)
, TheDay = DATEPART(DAY, T.DateColumn)
FROM dbo.MyTable AS T;
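If you want to sanity-check the derived values outside the database, the same four parts are easy to compute by hand. The Python sketch below is only an illustration; `date_parts` is a hypothetical helper, and the quarter is derived from the month the usual way (months 1 to 3 are quarter 1, and so on).

```python
from datetime import date


def date_parts(d):
    """Year, quarter, month and day derived from a single date value."""
    quarter = (d.month - 1) // 3 + 1
    return {"year": d.year, "quarter": quarter, "month": d.month, "day": d.day}


print(date_parts(date(2023, 11, 5)))
# {'year': 2023, 'quarter': 4, 'month': 11, 'day': 5}
```

All four values are pure functions of the date, which is why deriving them on the fly is safer than storing them in separate columns.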
I am facing a problem when I query a master table (~700 million records, highly transactional) to look for newly inserted records. My aim is to collect all newly created IDs in the #IDS temp table (between the min and max IDs) and dump them into a child table. But random IDs are missing from the child table.
Setup:
We have a primary and secondary server (SQL Server 2016) and they are in sync mode.
Tables:
CREATE TABLE tblMaster
(
ID BIGINT IDENTITY(1,1) NOT NULL,
EmployeeID INT NOT NULL
)
CREATE TABLE tblChild
(
ChildID IDENTITY(1,1),
ID BIGINT NOT NULL,
TransactionDate Datetime NOT NULL
)
tblChild.ID references tblMaster.ID.
Stored procedure:
DECLARE @MaxID BIGINT
SELECT @MaxID = MAX(ID) FROM tblChild WITH(NOLOCK)
SET @MaxID = ISNULL(@MaxID, 0)
DROP TABLE IF EXISTS #IDS
SELECT ID
INTO #IDS
FROM tblMaster WITH(NOLOCK)
WHERE ID > @MaxID
--25k RECORDS BATCH INSERT INTO tblChild - MAINLY TAKE CARE NEWLY inserted records
STARTIDS:
IF EXISTS (SELECT TOP 1 * FROM #IDS)
BEGIN
DROP TABLE IF EXISTS #TOPIDS
SELECT TOP 25000 ID INTO #TOPIDS
FROM #IDS
ORDER BY ID ASC
INSERT INTO tblChild (ID, CreatedBy, CreatedDate)
SELECT ID, SYSTEM_USER, GETDATE()
FROM #TOPIDS
DELETE AA
FROM #IDS AA
INNER JOIN #TOPIDS BB ON AA.ID = BB.ID
GOTO STARTIDS
END
Please help me find where it's going wrong.
I have a SQL Server database in which I store millions of records. I now have to delete a large number of records continuously, so I run thousands of queries like this within the same execution:
delete TWEET
from TWEET
where REQUEST_ID >= x and TWEET_ID = y and ID < z
The single query is immediate, but putting them all together is extremely slow.
What would you suggest to me?
You can use this to delete 100,000 rows per iteration. This runs faster than deleting all the rows at once.
DECLARE @RC INT = 1
WHILE @RC > 0
BEGIN
delete TOP(100000)
from TWEET
where REQUEST_ID >= x and TWEET_ID = y and ID < z
SET @RC = @@ROWCOUNT
END
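The batching pattern can be tried end to end with a throwaway SQLite database from Python. This is a sketch, not SQL Server code: SQLite stands in for the real server, the `tweet` table and its columns are simplified stand-ins, and because SQLite has no `DELETE TOP`, each chunk is selected with a `rowid` subquery and `LIMIT` instead.

```python
import sqlite3


def delete_in_batches(conn, min_request_id, batch_size):
    """Delete matching rows in chunks, committing after each chunk."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM tweet WHERE rowid IN ("
            "  SELECT rowid FROM tweet WHERE request_id >= ? LIMIT ?)",
            (min_request_id, batch_size),
        )
        conn.commit()  # keep each transaction small
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, request_id INTEGER)")
conn.executemany(
    "INSERT INTO tweet (id, request_id) VALUES (?, ?)",
    [(i, i) for i in range(1, 1001)],
)
conn.commit()

deleted = delete_in_batches(conn, min_request_id=100, batch_size=250)
remaining = conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0]
print(deleted, remaining)  # 901 99
```

Committing per chunk keeps each transaction short, which is the point of the loop: locks are held briefly and the log does not have to carry one huge delete.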
Using a JOINed table is probably your best bet (see below). Another option is to use Service Broker: you send a delete request into SB and some asynchronous process deletes it when it has the CPU to do so.
------------------------------------------------------------------------
-- Create a table to hold the delete requests:
CREATE TABLE DeleteTweets (
ID INT IDENTITY(1,1) PRIMARY KEY
, REQUEST_ID INT
, TWEET_ID INT
)
GO
------------------------------------------------------------------------
-- Create an index to keep the deletions fast:
CREATE INDEX IX_DeleteTweets ON dbo.DeleteTweets (REQUEST_ID, TWEET_ID)
GO
------------------------------------------------------------------------
-- Put your delete requests into that table however you like. This might
-- be part of your front-end application or via some batch process:
INSERT INTO dbo.DeleteTweets
( REQUEST_ID, TWEET_ID )
VALUES ( 0 /*REQUEST_ID*/, 0 /*TWEET_ID*/)
, ( 0 /*REQUEST_ID*/, 0 /*TWEET_ID*/)
, ( 0 /*REQUEST_ID*/, 0 /*TWEET_ID*/)
, ( 0 /*REQUEST_ID*/, 0 /*TWEET_ID*/)
, ( 0 /*REQUEST_ID*/, 0 /*TWEET_ID*/)
GO
------------------------------------------------------------------------
-- Delete from the main table via JOIN:
DELETE t
FROM TWEET t
INNER JOIN dbo.DeleteTweets dt
ON t.REQUEST_ID = dt.REQUEST_ID
AND t.TWEET_ID = dt.TWEET_ID
GO
------------------------------------------------------------------------
-- Once they're done, empty the table so you can re-fill with new deletion requests:
TRUNCATE TABLE dbo.DeleteTweets
GO
--IN QUERY WINDOW 1
DROP TABLE TWEETDEL
GO
DROP TABLE TWEET
GO
CREATE TABLE TWEET
(ID int NOT NULL
,REQUEST_ID AS ID*2
,TWEET_ID AS ID*10
)
GO
INSERT TWEET WITH (TABLOCKX) (ID )
SELECT TOP 20000000 id
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY a.id) AS id
FROM sys.sysobjects AS a, sys.syscolumns AS b, sys.syscolumns AS c) x
GO
ALTER TABLE TWEET ADD PRIMARY KEY (ID)
GO
SELECT IDENTITY(INT,1,1) AS TWEETDELPK, *
INTO TWEETDEL
FROM TWEET
WHERE REQUEST_ID%14=0
GO
ALTER TABLE TWEETDEL ADD PRIMARY KEY (TWEETDELPK)
GO
ALTER TABLE TWEETDEL
ADD CONSTRAINT fk101
FOREIGN KEY (ID) REFERENCES TWEET(ID) ON DELETE CASCADE
GO
-- IN THE SAME WINDOW
SET NOCOUNT OFF
DECLARE @ID INT
CR:
DELETE TOP (SELECT CNT FROM ##) t WITH (PAGLOCK)
FROM TWEET AS t
WHERE EXISTS
(SELECT 1 FROM TWEET AS x WHERE T.ID = X.ID)
IF @@ROWCOUNT >0 GOTO CR
-- IN ANOTHER WINDOW
CREATE TABLE ## (CNT INT)
INSERT ## SELECT 1000
select COUNT(1) from tweetdel
select COUNT(1) from tweet
-- CHECK FOR LOCKING - IF LOCKS ARE STAYING OPEN TOO LONG, RAISE OR LOWER CHUNKSIZE ACCORDINGLY
UPDATE ## SET CNT =5000
SELECT resource_type, resource_associated_entity_id,
request_status, request_mode,request_session_id,
resource_description
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
AND request_mode in ('x')
I need to build a datetime in a SELECT statement based on two other datetime columns.
I cannot seem to get the conversion right. Can you spot what I am doing wrong?
It seems to me that DATEPART omits the leading "0" of the day.
The script below should create all the data necessary.
IF EXISTS (SELECT * FROM sys.databases WHERE name='TestDB')
BEGIN
ALTER DATABASE TestDB
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE TestDB
END
CREATE DATABASE TestDB
GO
IF OBJECT_ID(N'[dbo].[TestTable]','U') IS NOT NULL
DROP TABLE [dbo].[TestTable]
GO
CREATE TABLE [dbo].[TestTable]
(
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[DateSample1] datetime NOT NULL,
[DateSample2] datetime NOT NULL
)
GO
INSERT dbo.TestTable (DateSample1, DateSample2)
VALUES('2006-10-06 00:00:00.000', '2007-01-17 00:00:00.000')
/*
In your select statement you should return another column "DateSample3"
and this should be year from DateSample1 and month and day from dateSample2
*/
--my try1
SELECT
tt.DateSample1, tt.DateSample2,
DateSample3 = CAST(DATEPART(YYYY, tt.DateSample1) AS CHAR(4))
+ CAST(DATEPART(MM, tt.DateSample2) AS CHAR(2))
+ CAST(DATEPART(dd, tt.DateSample2) AS CHAR(2))
,WantedResultForDateSample3='2006-01-17 00:00:00.000'
FROM
dbo.TestTable tt
--mytry2 THROWS AN ERROR
--Conversion failed when converting date and/or time from character string
/*
SELECT
tt.DateSample1, tt.DateSample2,
DateSample3 = CONVERT(DATETIME,CAST(DATEPART(YYYY, tt.DateSample1) AS CHAR(4))
+ CAST(DATEPART(MM, tt.DateSample2) AS CHAR(2))
+ CAST(DATEPART(dd, tt.DateSample2) AS CHAR(2)),120),
WantedResult='2006-01-17 00:00:00.000'
FROM
dbo.TestTable tt
*/
You can use DATETIMEFROMPARTS:
DATETIMEFROMPARTS(YEAR(tt.DateSample1),
MONTH(tt.DateSample2),
DAY(tt.DateSample2),
0,0,0,0)
This is a lot cleaner than constructing a string, IMO.
Whichever method you use, you might have to deal with impossible dates with this requirement (for example, February 29 combined with a non-leap year). One approach is below:
SELECT CASE WHEN month=2
AND day = 29
AND (yr % 4 != 0 OR (yr % 100 = 0 AND yr % 400 != 0))
THEN NULL
ELSE
DATETIMEFROMPARTS(yr,
month,
day,
0,0,0,0)
END
FROM [dbo].[TestTable] tt
CROSS APPLY (VALUES (YEAR(tt.DateSample1),
MONTH(tt.DateSample2),
DAY(tt.DateSample2))) V(yr, month, day)
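The same guard is easy to test in isolation. Here is a Python sketch of the rule, assuming (as in the query above) that the only impossible combination is February 29 paired with a non-leap year; `combine` is a hypothetical helper name.

```python
import calendar
from datetime import datetime


def combine(year_source, month_day_source):
    """Year from the first date, month and day from the second.

    Returns None when the result would be February 29 in a non-leap year.
    """
    year = year_source.year
    month, day = month_day_source.month, month_day_source.day
    if month == 2 and day == 29 and not calendar.isleap(year):
        return None
    return datetime(year, month, day)


# The sample row from the question: year 2006, month and day from 2007-01-17.
print(combine(datetime(2006, 10, 6), datetime(2007, 1, 17)))  # 2006-01-17 00:00:00
print(combine(datetime(2007, 5, 1), datetime(2008, 2, 29)))   # None
```

Returning NULL (here `None`) is one choice; depending on the requirement you might instead fall back to February 28 or March 1.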