Oracle Partition by ID and subpartition by DATE with interval - database

The schema I'm working on has a small number of customers, with lots of data per customer.
In determining a partitioning strategy, my first thought was to partition by customer_id and then subpartition by range with a day interval. However, you cannot use INTERVAL on subpartitions.
Ultimately I would like a way to automatically create partitions for new customers as they are created, and also have automatic daily subpartitions created for the customers' data. All application queries are at the customer_id level with various date ranges specified.
This post is nearly identical, but the answer involves reversing the partitioning strategy, and I would still like to find a way to accomplish range-range interval partitioning. One option would be a monthly database job that creates subpartitions for the days/months ahead, but that doesn't feel right.
Perhaps I'm wrong in my assumption that the current data structure would benefit more from a range-range interval partitioning strategy. We have a few customers whose data dwarfs that of other customers, so I was thinking of ways to isolate customer data.
Any thoughts/suggestions on a better approach?
Thank you again!
UPDATE
Here is an example of what I was proposing:
CREATE TABLE PART_TEST(
CUSTOMER_ID NUMBER,
LAST_MODIFIED_DATE DATE
)
PARTITION BY RANGE (CUSTOMER_ID)
INTERVAL (1)
SUBPARTITION BY RANGE (LAST_MODIFIED_DATE)
SUBPARTITION TEMPLATE
(
SUBPARTITION subpart_1206_min values LESS THAN (TO_DATE('12/2006','MM/YYYY')),
SUBPARTITION subpart_0107 values LESS THAN (TO_DATE('01/2007','MM/YYYY')),
SUBPARTITION subpart_0207 values LESS THAN (TO_DATE('02/2007','MM/YYYY')),
...
...
...
SUBPARTITION subpart_max values LESS THAN (MAXVALUE)
)
(
PARTITION part_1 VALUES LESS THAN (1)
)
I currently have 290 subpartitions in the template. This appears to be working except for one snag: in my tests, any record with a CUSTOMER_ID greater than 3615 fails with ORA-14400: inserted partition key does not map to any partition.
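A possible explanation for the 3615 threshold (an assumption, not something confirmed in this thread): Oracle documents a ceiling of 1024K - 1 (1,048,575) partitions per table, and in a composite-partitioned table the subpartitions appear to count toward it. A rough check in Python, where the helper name is made up for illustration:

```python
# Oracle's documented ceiling on partitions per table: 1024K - 1.
# Assumption: in a composite-partitioned table every subpartition counts toward it.
MAX_PARTITIONS = 1024 * 1024 - 1


def max_top_level_partitions(subpartitions_per_partition: int) -> int:
    """Hypothetical helper: how many partitions fit under the ceiling
    when each one carries the full subpartition template."""
    return MAX_PARTITIONS // subpartitions_per_partition


# 290 is the template size mentioned in the question.
print(max_top_level_partitions(290))  # 3615
```

If that is indeed the cause, the ceiling scales with the template size: a template with 100 entries would leave room for 10,485 partitions instead of 3,615.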

You can create a RANGE INTERVAL partition on the date and then subpartition it by LIST or RANGE. It would look like this:
CREATE TABLE MY_PART_TABLE
(
CUSTOMER_ID NUMBER NOT NULL,
THE_DATE TIMESTAMP(0) NOT NULL,
OTHER_COLUMNS NUMBER
)
PARTITION BY RANGE (THE_DATE) INTERVAL (INTERVAL '1' MONTH)
SUBPARTITION BY RANGE (CUSTOMER_ID)
SUBPARTITION TEMPLATE (
SUBPARTITION CUSTOMER_GROUP_1 VALUES LESS THAN (10),
SUBPARTITION CUSTOMER_GROUP_2 VALUES LESS THAN (20),
SUBPARTITION CUSTOMER_GROUP_3 VALUES LESS THAN (30),
SUBPARTITION CUSTOMER_GROUP_4 VALUES LESS THAN (40),
SUBPARTITION CUSTOMER_GROUP_5 VALUES LESS THAN (MAXVALUE)
)
(PARTITION VALUES LESS THAN ( TIMESTAMP '2015-01-01 00:00:00') );
Or, using LIST subpartitions:
CREATE TABLE MY_PART_TABLE
(
CUSTOMER_ID NUMBER NOT NULL,
THE_DATE TIMESTAMP(0) NOT NULL,
OTHER_COLUMNS NUMBER
)
PARTITION BY RANGE (THE_DATE) INTERVAL (INTERVAL '1' MONTH)
SUBPARTITION BY LIST (CUSTOMER_ID)
SUBPARTITION TEMPLATE (
SUBPARTITION CUSTOMER_1 VALUES (1),
SUBPARTITION CUSTOMER_2 VALUES (2),
SUBPARTITION CUSTOMER_3_to_6 VALUES (3,4,5,6),
SUBPARTITION CUSTOMER_7 VALUES (7)
)
(PARTITION VALUES LESS THAN ( TIMESTAMP '2015-01-01 00:00:00') );
Note that in the second solution the set of customer IDs is fixed. If you get new customers, you have to alter the table and modify the SUBPARTITION TEMPLATE accordingly.
Monthly partitions will be created automatically by Oracle whenever new values are inserted or updated.
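To make the first (RANGE subpartition) layout concrete, here is a small Python sketch of where a row would land and how many subpartitions a single-customer query over a date range would touch. It only mirrors the template bounds shown above; the function names are hypothetical and it assumes the monthly partitions stay aligned to the first of the month, as the initial 2015-01-01 bound implies.

```python
from bisect import bisect_right
from datetime import date

# Upper bounds from the RANGE subpartition template (VALUES LESS THAN is exclusive).
UPPER_BOUNDS = [10, 20, 30, 40]  # the fifth group is bounded by MAXVALUE
GROUP_NAMES = [
    "CUSTOMER_GROUP_1",
    "CUSTOMER_GROUP_2",
    "CUSTOMER_GROUP_3",
    "CUSTOMER_GROUP_4",
    "CUSTOMER_GROUP_5",
]


def subpartition_for(customer_id: int) -> str:
    """Name of the template subpartition that would hold this customer."""
    return GROUP_NAMES[bisect_right(UPPER_BOUNDS, customer_id)]


def months_touched(start: date, end: date) -> int:
    """Number of monthly partitions overlapped by the inclusive range [start, end]."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


print(subpartition_for(9))   # CUSTOMER_GROUP_1
print(subpartition_for(10))  # CUSTOMER_GROUP_2
print(months_touched(date(2015, 1, 15), date(2015, 3, 2)))  # 3
```

For a query on one customer, each month in the range contributes exactly one subpartition, so the example range would read three subpartitions.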

Related

Range Partition in Azure SQL Database

I want to create a range-partitioned table in an Azure SQL Database with rolling monthly partitions, so that everything from a given January (no matter what year) ends up in one partition.
The table contains logging information for ETL processes and to ease the housekeeping I'd like to be able to truncate partitions from time to time.
In Oracle I would do it like this:
CREATE TABLE my_log (
log_id NUMBER PRIMARY KEY,
log_txt VARCHAR2(1000),
insert_date DATE
)
PARTITION BY RANGE(TO_CHAR(insert_date, 'MM')) (
partition m1 values less than ('02'),
partition m2 values less than ('03'),
partition m3 values less than ('04'),
partition m4 values less than ('05'),
partition m5 values less than ('06'),
partition m6 values less than ('07'),
partition m7 values less than ('08'),
partition m8 values less than ('09'),
partition m9 values less than ('10'),
partition m10 values less than ('11'),
partition m11 values less than ('12'),
partition m12 values less than ('13'),
partition mmax values less than (MAXVALUE)
);
And use ALTER TABLE ... TRUNCATE PARTITION for housekeeping to get rid of everything older than, let's say, 4 months.
What I found out so far: if I create a partition function for the ranges, the column that contains the range must be part of the primary key. Is there any way to circumvent that?
This does not work:
CREATE PARTITION FUNCTION logRangePF1 (int)
AS RANGE RIGHT FOR VALUES (1,2,3,4,5,6,7,8,9,10,11,12) ;
GO
CREATE PARTITION SCHEME logRangePS1
AS PARTITION logRangePF1
ALL TO ('PRIMARY') ;
GO
CREATE TABLE dbo.logPartitionTable (
log_id INT PRIMARY KEY ,
log_text nvarchar(1000),
insert_date datetime,
partition_column as datepart(month, insert_date) PERSISTED
)
ON logRangePS1 ( partition_column ) ;
GO
I appreciate any hint on how to achieve this in an Azure SQL Database.
Thanks
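As a side note on the partition function above: RANGE RIGHT places each boundary value in the partition to its right, so the twelve boundaries 1..12 yield thirteen partitions, and the first one stays empty for month numbers. A minimal Python sketch of that mapping (the function name is hypothetical; in SQL Server itself the $PARTITION function reports the actual partition number):

```python
from bisect import bisect_right

# Boundary values from CREATE PARTITION FUNCTION ... RANGE RIGHT FOR VALUES (1,...,12)
BOUNDARIES = list(range(1, 13))


def partition_number(value: int) -> int:
    """1-based partition number under RANGE RIGHT semantics:
    a value equal to a boundary belongs to the partition on the right."""
    return bisect_right(BOUNDARIES, value) + 1


for month in (1, 6, 12):
    print(month, "->", partition_number(month))
# 1 -> 2
# 6 -> 7
# 12 -> 13
```

Using boundaries 2..12 instead would give exactly twelve partitions, one per month.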

Can we partition an oracle database based on two columns

I am working on a new project that requires partitioning a table based on two columns (city and area). Does Oracle Database support that?
I have worked on projects before where I partitioned a table on one column when creating it, but I have no idea how to partition using two columns. Do we use the same syntax or a different one?
CREATE TABLE TEST (....)
PARTITION BY RANGE (date1) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION TEST_INITIAL VALUES less than (DATE '2000-01-01')
);
If you have Oracle 12.2 or later, this is a snap. Use AUTOMATIC partitioning. E.g.,
CREATE TABLE my_auto_partitioned_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
other_data VARCHAR2(400) )
PARTITION BY LIST ( city_name, area_name) AUTOMATIC
( PARTITION p_dummy VALUES (null, null) )
;
Pre 12.2, it is possible, with LIST-LIST partitioning, but it is a real pain because you have to pre-create all your partitions and subpartitions. E.g.,
CREATE TABLE my_partitioned_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
other_data VARCHAR2(400) )
PARTITION BY LIST ( city_name )
SUBPARTITION BY LIST ( area_name )
-- if your area names are generic (e.g., "north"/"south" or "downtown"/"suburbs"),
-- you can use a SUBPARTITION TEMPLATE clause right here...
( PARTITION p_philadelpha VALUES ( 'PHILADELPHIA')
( SUBPARTITION p_philly1 VALUES ('SOUTH PHILLY','WEST PHILLY'),
SUBPARTITION p_philly2 VALUES ('NORTH PHILLY','OLD CITY')
),
PARTITION p_new_york VALUES ( 'NEW YORK')
( SUBPARTITION p_nyc1 VALUES ('SOHO'),
SUBPARTITION p_nyc2 VALUES ('HELL''S KITCHEN')
)
);
I would not do that. It is like asking to partition a table by both the month and the day of a date column: just partition by day and you are done.
Anyway, in general you can use several columns if you define a virtual column and partition on that:
CREATE TABLE my_auto_partitioned_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
partition_key VARCHAR2(200) GENERATED ALWAYS AS (city_name||'-'||area_name) VIRTUAL)
PARTITION BY LIST ( partition_key ) ...
But I don't think this would make so much sense in your case.

Running total with purchases and sales in query column

I have a table like below:
How can I have a column like below using Transact-SQL (Order By Date)?
I'm using SQL Server 2016.
The thing you need is called an aggregate windowing function, specifically SUM ... OVER.
The problem is that a 'running total' like this only makes sense if you can specify the order of the rows deterministically. The sample data does not include an attribute that could be used to provide this required ordering. Tables, by themselves, do not have an explicit order.
If you have something like an entry date column, a solution like the following would work:
-- Table variable standing in for the real table
DECLARE @T table
(
EntryDate datetime2(0) NOT NULL,
Purchase money NULL,
Sale money NULL
);
INSERT @T
(EntryDate, Purchase, Sale)
VALUES
('20180801 13:00:00', $1000, NULL),
('20180801 14:00:00', NULL, $400),
('20180801 15:00:00', NULL, $400),
('20180801 16:00:00', $5000, NULL);
-- Running balance: purchases add, sales subtract, accumulated in EntryDate order
SELECT
T.Purchase,
T.Sale,
Remaining =
SUM(ISNULL(T.Purchase, $0) - ISNULL(T.Sale, $0)) OVER (
ORDER BY T.EntryDate
ROWS UNBOUNDED PRECEDING)
FROM @T AS T;
Demo: db<>fiddle
Using ROWS UNBOUNDED PRECEDING in the window frame is shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The behaviour of ROWS is different w.r.t duplicates (and generally better-performing) than the default RANGE. There are strong arguments to say that ROWS ought to have been the default, but that is not what we were given 🙂.
For more information see How to Use Microsoft SQL Server 2012's Window Functions by Itzik Ben-Gan, and his excellent book on the topic.
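The ROWS versus RANGE difference with duplicates can be seen in a small experiment. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions), with two rows deliberately sharing the same timestamp:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (entry_date TEXT NOT NULL, purchase INTEGER, sale INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2018-08-01 13:00:00", 1000, None),
        ("2018-08-01 14:00:00", None, 400),
        ("2018-08-01 14:00:00", None, 400),  # same timestamp as the previous row
        ("2018-08-01 16:00:00", 5000, None),
    ],
)

query = """
SELECT
    entry_date,
    SUM(COALESCE(purchase, 0) - COALESCE(sale, 0)) OVER (
        ORDER BY entry_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
    SUM(COALESCE(purchase, 0) - COALESCE(sale, 0)) OVER (
        ORDER BY entry_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
FROM t
"""
# Sort in Python so the printed order is stable even for the tied rows.
results = sorted(conn.execute(query).fetchall(), key=lambda r: (r[0], -r[1]))
for row in results:
    print(row)
```

With ROWS the two tied rows get distinct running totals (600, then 200), while RANGE treats them as peers and reports 200 for both. Which of the tied rows comes first under ROWS is arbitrary, which is another reason to order by a unique key.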

Why does Oracle Database require at least one partition when creating a table partitioned by interval

I wonder why Oracle Database requires at least one partition to be defined when creating a table with PARTITION BY RANGE ... INTERVAL.
This is correct:
CREATE TABLE FOO (
bar VARCHAR2(10),
creation_date timestamp(6) not null
)
PARTITION BY RANGE (creation_date) INTERVAL (NUMTODSINTERVAL(1,'DAY')) (
PARTITION part_01 values LESS THAN (TO_DATE('01-03-2018','DD-MM-YYYY'))
)
This, however, is not:
CREATE TABLE FOO (
bar VARCHAR2(10),
creation_date timestamp(6) not null
)
PARTITION BY RANGE (creation_date) INTERVAL (NUMTODSINTERVAL(1,'DAY'))
I would expect that the first partition would be required in some migration case but not when creating a new table.
Oracle documentation about that:
The INTERVAL clause of the CREATE TABLE statement establishes interval partitioning for the table. You must specify at least one range partition using the PARTITION clause.
https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_admin001.htm#BAJHFFBE
Without an initial partition Oracle does not know where to start the intervals. For daily partitions it is not so obvious, but imagine you have one partition per week, i.e. 7 days.
Should it run Monday to Monday, Sunday to Sunday, or something else?
What does an interval of "1 DAY" mean? From 00:00:00 to 23:59:59 (as implicitly given in your example) or something else, for example 12:00:00 to 11:59:59 (which would be PARTITION part_01 VALUES LESS THAN (TO_DATE('01-03-2018 12:00','DD-MM-YYYY HH24:MI')))?
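The point about the starting boundary can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Oracle's implementation: the function name is hypothetical, and transition_point stands for the upper bound of the one explicitly defined range partition.

```python
from datetime import datetime, timedelta


def interval_partition_bounds(ts, transition_point, interval):
    """Return (lower, upper) bounds of the interval partition holding ts.

    lower is inclusive, upper is exclusive. Every boundary is the
    transition point plus a whole number of intervals.
    """
    if ts < transition_point:
        raise ValueError("ts belongs to an explicitly defined range partition")
    steps = (ts - transition_point) // interval
    lower = transition_point + steps * interval
    return lower, lower + interval


ts = datetime(2018, 3, 5, 8, 0)
one_day = timedelta(days=1)

# Initial partition ends at midnight: daily partitions run midnight to midnight.
print(interval_partition_bounds(ts, datetime(2018, 3, 1, 0, 0), one_day))

# Initial partition ends at noon: the same row falls into a noon-to-noon partition.
print(interval_partition_bounds(ts, datetime(2018, 3, 1, 12, 0), one_day))
```

The same timestamp lands in 2018-03-05 00:00 to 2018-03-06 00:00 in the first case and in 2018-03-04 12:00 to 2018-03-05 12:00 in the second, which is why the first boundary has to be stated explicitly.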

SQL Server index - very large table with where clause against a very small range of values - do I need an index for the where clause?

I am designing a database with a single table for a special scenario I need to implement a solution for. The table will have several hundred million rows after a short time, but each row will be fairly compact. Even when there are a lot of rows, I need insert, update and select speeds to be nice and fast, so I need to choose the best indexes for the job.
My table looks like this:
create table dbo.Domain
(
Name varchar(255) not null,
MetricType smallint not null, -- very small range of values, maybe 10-20 at most
Priority smallint not null, -- extremely small range of values, generally 1-4
DateToProcess datetime not null,
DateProcessed datetime null,
primary key(Name, MetricType)
);
A select query will look like this:
select Name from Domain
where MetricType = @metricType
and DateProcessed is null
and DateToProcess < GETUTCDATE()
order by Priority desc, DateToProcess asc
The first type of update will look like this:
merge into Domain as target
using #myTablePrm as source
on source.Name = target.Name
and source.MetricType = target.MetricType
when matched then
update set
DateToProcess = source.DateToProcess,
Priority = source.Priority,
DateProcessed = case -- set to null if DateToProcess is in the future
when DateToProcess < DateProcessed then DateProcessed
else null end
when not matched then
insert (Name, MetricType, Priority, DateToProcess)
values (source.Name, source.MetricType, source.Priority, source.DateToProcess);
The second type of update will look like this:
update Domain
set DateProcessed = source.DateProcessed
from #myTablePrm source
where Name = source.Name and MetricType = @metricType
Are these the best indexes for optimal insert, update and select speed?
-- for the order by clause in the select query
create index IX_Domain_PriorityQueue
on Domain(Priority desc, DateToProcess asc)
where DateProcessed is null;
-- for the where clause in the select query
create index IX_Domain_MetricType
on Domain(MetricType asc);
Observations:
Your updates should use the PK
Why not use tinyint (range 0-255) to make the rows even narrower?
Do you need datetime? Can you use smalldatetime?
Ideas:
Your SELECT query doesn't have an index to cover it. You need one on (DateToProcess, MetricType, Priority DESC) INCLUDE (Name) WHERE DateProcessed IS NULL. You'll have to experiment with key column order to get the best one.
You could extend that to a filtered index per MetricType too (keeping the DateProcessed IS NULL filter). I'd do this after the other one, once I have millions of rows to test with.
I suspect that your best performance will come from having no indexes on Priority and MetricType. The cardinality is likely too low for the indexes to do much good.
An index on DateToProcess will almost certainly help, as there is likely to be high cardinality in that column and it is used in both the WHERE and ORDER BY clauses. I would start with that first.
Whether an index on DateProcessed will help is up for debate. That depends on what percentage of NULL values you expect for this column. Your best bet, as usual, is to examine the query plan with some real data.
In the table schema you show that MetricType is part of the composite primary key, so it should definitely be indexed along with the Name column. As for the Priority and DateToProcess fields, since these will be present in a where clause it can't hurt to have them indexed as well. However, I don't recommend the WHERE DateProcessed IS NULL filter you have on that index; indexing just a subset of the data is not a good idea. Remove it and index the whole of both columns.
