I am working on a new project that requires partitioning a table on two columns (city and area). Does Oracle Database support that?
I have worked on projects before where I partitioned a table on one column when creating it, but I have no idea how to partition on two columns. Do we use the same syntax or a different one?
CREATE TABLE TEST (....)
PARTITION BY RANGE (date1) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION TEST_INITIAL VALUES LESS THAN (DATE '2000-01-01')
);
If you have Oracle 12.2 or later, this is a snap: use AUTOMATIC list partitioning on both columns. E.g.,
CREATE TABLE my_auto_partitioned_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
other_data VARCHAR2(400) )
PARTITION BY LIST ( city_name, area_name) AUTOMATIC
( PARTITION p_dummy VALUES (null, null) )
;
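With AUTOMATIC, Oracle creates a new list partition on the fly for every distinct (city_name, area_name) combination you insert. A quick sanity check might look like this (partition names are system-generated, so the SYS_P... names will vary):
INSERT INTO my_auto_partitioned_table (id, city_name, area_name, other_data)
VALUES (1, 'PHILADELPHIA', 'OLD CITY', 'test row');
-- each new (city, area) pair gets its own system-named partition
SELECT partition_name, high_value
FROM user_tab_partitions
WHERE table_name = 'MY_AUTO_PARTITIONED_TABLE';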
Pre-12.2 it is possible with LIST-LIST composite partitioning, but it is a real pain because you have to pre-create all your partitions and subpartitions. E.g.,
CREATE TABLE my_partitioned_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
other_data VARCHAR2(400) )
PARTITION BY LIST ( city_name )
SUBPARTITION BY LIST ( area_name )
-- if your area names are generic (e.g., "north"/"south" or "downtown"/"suburbs"),
-- you can use a SUBPARTITION TEMPLATE clause right here...
( PARTITION p_philadelphia VALUES ( 'PHILADELPHIA')
( SUBPARTITION p_philly1 VALUES ('SOUTH PHILLY','WEST PHILLY'),
SUBPARTITION p_philly2 VALUES ('NORTH PHILLY','OLD CITY')
),
PARTITION p_new_york VALUES ( 'NEW YORK')
( SUBPARTITION p_nyc1 VALUES ('SOHO'),
SUBPARTITION p_nyc2 VALUES ('HELL''S KITCHEN')
)
);
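For reference, the SUBPARTITION TEMPLATE variant mentioned in the comment above could look something like this sketch, assuming every city shares the same generic area names (table and partition names are illustrative):
CREATE TABLE my_templated_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
other_data VARCHAR2(400) )
PARTITION BY LIST ( city_name )
SUBPARTITION BY LIST ( area_name )
SUBPARTITION TEMPLATE
( SUBPARTITION sp_north VALUES ('NORTH'),
SUBPARTITION sp_south VALUES ('SOUTH'),
SUBPARTITION sp_other VALUES (DEFAULT) )
( PARTITION p_philadelphia VALUES ('PHILADELPHIA'),
PARTITION p_new_york VALUES ('NEW YORK')
);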
I would not do that. It is the same as if somebody asked: "I would like to partition my table by month and by day of a date column." Just define the partitioning by day and you are done.
Anyway, in general you can use several columns if you define a virtual column and partition on that:
CREATE TABLE my_auto_partitioned_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
partition_key VARCHAR2(200) GENERATED ALWAYS AS (city_name||'-'||area_name) VIRTUAL)
PARTITION BY LIST ( partition_key ) ...
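For completeness, a runnable version of that sketch might be (the partition values are just examples):
CREATE TABLE my_virtual_key_table
( id NUMBER,
city_name VARCHAR2(80),
area_name VARCHAR2(80),
partition_key VARCHAR2(200) GENERATED ALWAYS AS (city_name||'-'||area_name) VIRTUAL )
PARTITION BY LIST ( partition_key )
( PARTITION p_philly_south VALUES ('PHILADELPHIA-SOUTH PHILLY'),
PARTITION p_other VALUES (DEFAULT)
);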
But I don't think this would make so much sense in your case.
I want to create a range-partitioned table in an Azure SQL Database with rolling monthly partitions, so everything from a January (no matter what year) should be within one partition.
The table contains logging information for ETL processes and to ease the housekeeping I'd like to be able to truncate partitions from time to time.
In Oracle I would do it like this:
CREATE TABLE my_log (
log_id NUMBER PRIMARY KEY,
log_txt VARCHAR2(1000),
insert_date DATE,
-- Oracle's partition key must be a column, so derive the month as a virtual column
insert_month VARCHAR2(2) GENERATED ALWAYS AS (TO_CHAR(insert_date, 'MM')) VIRTUAL
)
PARTITION BY RANGE (insert_month) (
PARTITION m1 VALUES LESS THAN ('02'),
PARTITION m2 VALUES LESS THAN ('03'),
PARTITION m3 VALUES LESS THAN ('04'),
PARTITION m4 VALUES LESS THAN ('05'),
PARTITION m5 VALUES LESS THAN ('06'),
PARTITION m6 VALUES LESS THAN ('07'),
PARTITION m7 VALUES LESS THAN ('08'),
PARTITION m8 VALUES LESS THAN ('09'),
PARTITION m9 VALUES LESS THAN ('10'),
PARTITION m10 VALUES LESS THAN ('11'),
PARTITION m11 VALUES LESS THAN ('12'),
PARTITION m12 VALUES LESS THAN ('13'),
PARTITION mmax VALUES LESS THAN (MAXVALUE)
);
And use ALTER TABLE ... TRUNCATE PARTITION for housekeeping to get rid of everything older than, let's say, 4 months.
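In Oracle the housekeeping step would be something like this (partition name per the DDL above):
-- e.g. once May data starts arriving, clear out January
ALTER TABLE my_log TRUNCATE PARTITION m1;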
What I found out so far: if I create a partition function for the ranges, the column that contains the range must be part of the primary key. Is there any way to circumvent that?
This does not work:
CREATE PARTITION FUNCTION logRangePF1 (int)
AS RANGE RIGHT FOR VALUES (1,2,3,4,5,6,7,8,9,10,11,12) ;
GO
CREATE PARTITION SCHEME logRangePS1
AS PARTITION logRangePF1
ALL TO ('PRIMARY') ;
GO
CREATE TABLE dbo.logPartitionTable (
log_id INT PRIMARY KEY ,
log_text nvarchar(1000),
insert_date datetime,
partition_column as datepart(month, insert_date) PERSISTED
)
ON logRangePS1 ( partition_column ) ;
GO
I appreciate any hint on how to achieve this in an Azure SQL Database.
Thanks
I am pretty new to the table partitioning technique supported by MS SQL Server. I have a huge table with more than 40 million records and want to apply table partitioning to it. Most of the examples I find define the partition function as RANGE LEFT|RIGHT FOR VALUES (...), but what I need is exactly something like the following example I found on an Oracle web page:
CREATE TABLE q1_sales_by_region
(...,
...,
...,
state varchar2(2))
PARTITION BY LIST (state)
(PARTITION q1_northwest VALUES ('OR', 'WA'),
PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'),
PARTITION q1_northeast VALUES ('NY', 'VM', 'NJ'),
PARTITION q1_southeast VALUES ('FL', 'GA'),
PARTITION q1_northcentral VALUES ('SD', 'WI'),
PARTITION q1_southcentral VALUES ('OK', 'TX'));
The example shows that we can specify a PARTITION BY LIST clause in the CREATE TABLE statement, and the PARTITION clauses specify lists of discrete values that qualify rows to be included in the partition.
My question is: does MS SQL Server support table partitioning by list as well?
It does not. SQL Server's partitioned tables only support range partitioning.
In this circumstance, you may wish instead to consider using a Partitioned View.
There are a number of restrictions (scroll down slightly from the link anchor) that apply to partitioned views, but the key here is that the partitioning is based on CHECK constraints within the underlying tables, and one form the CHECK can take is <col> IN (value_list).
However, setting up partitioned views is considerably more "manual" than creating a partitioned table - each table that holds some of the view data has to be individually and explicitly created.
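A minimal sketch of the partitioned-view approach, with hypothetical per-region tables (all names and the column list are made up for illustration):
CREATE TABLE dbo.sales_northwest (
sale_id int NOT NULL,
state char(2) NOT NULL CHECK (state IN ('OR', 'WA')),
amount money,
-- the partitioning column must be part of the key for an updatable partitioned view
CONSTRAINT pk_sales_northwest PRIMARY KEY (sale_id, state)
);
CREATE TABLE dbo.sales_southwest (
sale_id int NOT NULL,
state char(2) NOT NULL CHECK (state IN ('AZ', 'UT', 'NM')),
amount money,
CONSTRAINT pk_sales_southwest PRIMARY KEY (sale_id, state)
);
-- ...one table per region, then union them behind a view:
GO
CREATE VIEW dbo.sales_by_region
AS
SELECT sale_id, state, amount FROM dbo.sales_northwest
UNION ALL
SELECT sale_id, state, amount FROM dbo.sales_southwest;
GO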
You can achieve this by using an auxiliary computed persisted column.
Here you can find a complete example:
LIST Partitioning in SQL Server
The idea is to create a computed column based on your list, like this:
ALTER TABLE q1_sales_by_region ADD calc_field AS (CASE WHEN state IN ('OR', 'WA') THEN 1 ... END) PERSISTED
And then partition on this calc_field using a standard range partition function.
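Put together, the whole setup might look like this sketch (the region numbering, object names, and single-filegroup mapping are assumptions):
CREATE PARTITION FUNCTION pf_region (int)
AS RANGE RIGHT FOR VALUES (2, 3, 4, 5, 6); -- six partitions for regions 1..6
GO
CREATE PARTITION SCHEME ps_region
AS PARTITION pf_region ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.q1_sales_by_region (
sale_id int NOT NULL,
state varchar(2) NOT NULL,
-- persisted computed column mapping the state lists to a region number
region AS (CASE WHEN state IN ('OR', 'WA') THEN 1
WHEN state IN ('AZ', 'UT', 'NM') THEN 2
WHEN state IN ('NY', 'VM', 'NJ') THEN 3
WHEN state IN ('FL', 'GA') THEN 4
WHEN state IN ('SD', 'WI') THEN 5
WHEN state IN ('OK', 'TX') THEN 6
END) PERSISTED NOT NULL,
CONSTRAINT pk_q1_sales_by_region PRIMARY KEY (sale_id, region)
) ON ps_region (region);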
What are you trying to accomplish with partitioning? 40M rows was huge 20 years ago but is commonplace nowadays. Index and query tuning is especially important for the performance of large tables, although partitioning can improve the performance of large scans when the partitioning column is not the leftmost clustered index key column and partitions can be eliminated during query processing.
For improved manageability and control over physical placement on different filegroups, you can use range partitioning with a filegroup per region. For example:
CREATE TABLE q1_sales_by_region
(
--
state char(2)
);
CREATE PARTITION FUNCTION PF_State(char(2)) AS RANGE RIGHT FOR VALUES(
'AZ'
, 'FL'
, 'GA'
, 'NJ'
, 'NM'
, 'NY'
, 'OK'
, 'OR'
, 'SD'
, 'TX'
, 'UT'
, 'VM'
, 'WA'
, 'WI'
);
CREATE PARTITION SCHEME PS_State AS PARTITION PF_State TO(
[PRIMARY] --unused
, q1_southwest --'AZ'
, q1_southeast --'FL'
, q1_southeast --'GA'
, q1_northeast --'NJ'
, q1_southwest --'NM'
, q1_northeast --'NY'
, q1_southcentral --'OK'
, q1_northwest --'OR'
, q1_northcentral --'SD'
, q1_southcentral --'TX'
, q1_southwest --'UT'
, q1_northeast --'VM'
, q1_northwest --'WA'
, q1_northcentral --'WI'
);
You can also add a check constraint if you don't already have a related table to enforce only valid state values:
ALTER TABLE q1_sales_by_region
ADD CONSTRAINT ck_q1_sales_by_region_state
CHECK (state IN('OR', 'WA', 'AZ', 'UT', 'NM','NY', 'VM', 'NJ','FL', 'GA','SD', 'WI','OK', 'TX'));
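You can verify which partition number a given state maps to with the $PARTITION function:
-- returns the 1-based partition number PF_State assigns to 'TX'
SELECT $PARTITION.PF_State('TX') AS partition_number;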
This is my table:
DocumentTypeId DocumentType UserId CreatedDtm
--------------------------------------------------------------------------
2d47e2f8-4 PDF 443f-4baa 2015-12-03 17:56:59.4170000
b4b-4803-a Images a99f-1fd 1997-02-11 22:16:51.7000000
600-0e32 XL e60e07a6b 2015-08-19 15:26:11.4730000
40f8ff9f Word 79b399715 1994-04-23 10:33:44.2300000
8230a07c email 750e-4c3d 2015-01-10 09:56:08.1700000
How can I shift the last entire row (DocumentType = email) to the 3rd position (before DocumentType = XL) without changing table values?
Without wishing to deny the truth of what others have said here, SQL Server does have CLUSTERED indices. For full details on these and the difference between a clustered table and a non-clustered one, please see here. In effect, a clustered table does have data written to disk in index order. However, due to subsequent insertions and deletions, you should never rely on any given record being in a fixed ordinal position.
To get your data showing email third and XL fourth, you simply need to order by CreatedDtm. Thus:
declare @test table
(
DocumentTypeID varchar(20),
DocumentType varchar(10),
UserID varchar(20),
CreatedDtm datetime
)
INSERT INTO @test VALUES
('2d47e2f8-4','PDF','443f-4baa','2015-12-03 17:56:59'),
('b4b-4803-a','Images','a99f-1fd','1997-02-11 22:16:51'),
('600-0e32','XL','e60e07a6b','2015-08-19 15:26:11'),
('40f8ff9f','Word','79b399715','1994-04-23 10:33:44'),
('8230a07c','email','750e-4c3d','2015-01-10 09:56:08')
SELECT * FROM @test order by CreatedDtm
This gives a result set of:
40f8ff9f Word 79b399715 1994-04-23 10:33:44.000
b4b-4803-a Images a99f-1fd 1997-02-11 22:16:51.000
8230a07c email 750e-4c3d 2015-01-10 09:56:08.000
600-0e32 XL e60e07a6b 2015-08-19 15:26:11.000
2d47e2f8-4 PDF 443f-4baa 2015-12-03 17:56:59.000
This may be what you are looking for, but I cannot stress enough that it only gives email 3rd and XL 4th in this particular case. If the dates were different, it would not be so. But perhaps this was all that you needed?
I assumed that you need to sort by the DocumentType column.
By joining with a derived table that maps each DocumentType to a desired sort order, you can achieve the result you want.
declare @tbl table(
DocumentTypeID varchar(50),
DocumentType varchar(50)
)
insert into @tbl(DocumentTypeID, DocumentType)
values
('2d47e2f8-4','PDF'),
('b4b-4803-a','Images'),
('600-0e32','XL'),
('40f8ff9f','Word'),
('8230a07c','email')
;
--this will give you the original output
select * from @tbl;
--this will output rows with the new sort order
select t.* from @tbl t
inner join
(
select *
from
(values
('PDF',1, 1),
('Images',2, 2),
('XL',3, 4),
('Word',4, 5),
('email',5, 3) --here I put new sort order '3'
) as dt(TypeName, SortOrder, NewSortOrder)
) dt
on dt.TypeName = t.DocumentType
order by dt.NewSortOrder
The row positions don't really matter in SQL tables, since they are unordered sets of data, but if you really want to switch the rows, I'd suggest you send all your data to a temp table, e.g.:
SELECT * INTO #temptable FROM [tablename]
then delete/truncate the data from the original table (if it won't mess up the other tables it's connected to) and insert back from the temp table in whatever order you like, since it'll have the same fields with the same data as the original.
I have a view which displays test data from multiple sources for a GPS spot.
The view displays the "GPS Point ID" and some geological test results associated with this GPS Point.
The GPS-POINT-ID looks like this: XYZ-00XX-CCCC
XYZ: area
00XX: ID
CCCC: coordinates
The GPS point name changes over time: the first portion of the point name (XYZ-00XX) stays the same, but the coordinate part (CCCC) changes according to the new GPS point location.
I want to design a table that will use the previously mentioned view as a data source. I need to decide on the following:
Primary key: if I use the full GPS-POINT-ID, I won't be able to keep track of the changes because it changes frequently over time; I can't keep track of the point or link it to its historical records.
If I use the fixed part of the GPS-POINT-ID (XYZ-00XX) as a computed column, I can't use it as a primary key, because the same point has many historical records sharing that (XYZ-00XX) part, which would violate the primary key uniqueness constraint.
If I create an identity column that increases for each new record, how can I keep track of each point name change and get the latest test data as well as the historical data for each point (XYZ-00XX)?
Sample rows from the view are attached in a snapshot.
Thanks
I would recommend using an identity column with no business value as the primary key. I would store the data in two columns: one with the static part and another with the changing part. Then you can have a computed column that puts them together as one field, if that is necessary. You can also add a date field so that you can follow the history. The static-data column is the identifier that ties the records together.
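A sketch of the table layout described above (all names are illustrative):
CREATE TABLE GpsPointHistory (
GpsPointKey int IDENTITY(1,1) PRIMARY KEY, -- surrogate key, no business value
GpsStaticData varchar(10) NOT NULL, -- fixed part, e.g. 'RF-0014'
GpsCoordinates varchar(10) NOT NULL, -- changing part, e.g. '9876'
FullPointId AS (GpsStaticData + '-' + GpsCoordinates), -- recombined point name
UpdateDate datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);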
I am assuming you do not want to use auditing to track historical records for some reason. That is the approach I would normally take.
http://weblogs.asp.net/jongalloway/adding-simple-trigger-based-auditing-to-your-sql-server-database
EDIT:
The sample query works if only one update can happen on a given date. If more than one update can occur, the ROW_NUMBER function can be used instead of GROUP BY.
Select *
From Table T1
Join (Select Max(UpdateDate) MatchDate, GpsStaticData
From Table Group By GpsStaticData) T2
On T1.GpsStaticData = T2.GpsStaticData And T1.UpdateDate = T2.MatchDate
EDIT:
Using Row_Number()
With cteGetLatest As
(
Select UpdateDate MatchDate, GpsStaticData,
Row_Number() Over (Partition By GpsStaticData Order By UpdateDate Desc) SortOrder
From Table
)
Select *
From Table T1
Join (Select MatchDate, GpsStaticData
From cteGetLatest Where SortOrder = 1) T2
On T1.GpsStaticData = T2.GpsStaticData And T1.UpdateDate = T2.MatchDate
You can add more fields after Order By UpdateDate in the row_number function to determine which record is selected.
-- To avoid the overhead cost of artificial key columns, a compound primary key can be used:
-- Simulate the Source View
CREATE TABLE ybSourceView (
[GPS-POINT-ID] VARCHAR(20),
[Status] NVARCHAR(MAX),
UpdateDate [datetime2],
Reason NVARCHAR(MAX),
OpId VARCHAR(15)
);
-- Source View sample data
INSERT INTO ybSourceView ([GPS-POINT-ID], [Status], UpdateDate, Reason, OpId)
VALUES ('RF-0014-9876', 'Reachable' , '2015-01-27 13:36', 'New Updated Coordinate' , 'AFERNANDO'),
('RF-0014-9876', 'Reachable' , '2014-02-27 09:37', 'New Updated Coordinate' , 'AFERNANDO'),
('RF-0014-3465', 'Reachable' , '2015-04-27 09:42', 'New Updated Coordinate' , 'HRONAULD' ),
('RF-0014-2432', 'Reachable' , '2013-06-27 12:00', 'New Updated Coordinate' , 'AFERNANDO'),
('RF-0015-9876', 'OUT_OF_Range', '2014-04-14 12:00', 'Point Abandoned, getting new coordinate', 'AFERNANDO');
-- Historic Data Table
CREATE TABLE ybGPSPointHistory (
Area VARCHAR(5) NOT NULL DEFAULT '',
ID VARCHAR(10) NOT NULL DEFAULT '',
Coordinates VARCHAR(20) NOT NULL DEFAULT '',
[GPS-POINT-ID] VARCHAR(20),
[Status] NVARCHAR(MAX),
UpdateDate [datetime2] NOT NULL DEFAULT SYSUTCDATETIME(),
Reason NVARCHAR(MAX),
OpId VARCHAR(15),
CONSTRAINT ybGPSPointHistoryPK PRIMARY KEY (Area, ID, UpdateDate) --< Compound Primary Key
);
GO
-- Update Historic Data Table from the Source View
INSERT INTO ybGPSPointHistory (Area, ID, Coordinates, [GPS-POINT-ID], [Status], UpdateDate, Reason, OpId)
SELECT LEFT(Src.[GPS-POINT-ID], LEN(Src.[GPS-POINT-ID]) - 10), -- Area, e.g. 'RF'
RIGHT(LEFT(Src.[GPS-POINT-ID], LEN(Src.[GPS-POINT-ID]) - 5), 4), -- ID, e.g. '0014'
RIGHT(Src.[GPS-POINT-ID], 4), -- Coordinates, e.g. '9876'
Src.[GPS-POINT-ID], Src.[Status], Src.UpdateDate, Src.Reason, Src.OpId
FROM ybSourceView Src
LEFT JOIN ybGPSPointHistory Tgt ON Tgt.[GPS-POINT-ID] = Src.[GPS-POINT-ID] AND Tgt.UpdateDate = Src.UpdateDate
WHERE Tgt.[GPS-POINT-ID] Is NULL;
--Tests (check Actual Execution Plan to see PK use):
-- Full history
SELECT * FROM ybGPSPointHistory;
-- Up-to-date only
SELECT *
FROM (
SELECT *, RANK () OVER (PARTITION BY Area, ID ORDER BY UpdateDate DESC) As HistoricOrder
FROM ybGPSPointHistory
) a
WHERE HistoricOrder = 1;
-- Latest record for a particular ID
SELECT TOP 1 *
FROM ybGPSPointHistory a
WHERE [GPS-POINT-ID] = 'RF-0014-9876'
ORDER BY UpdateDate DESC;
-- Latest record for a particular ID in detail (more efficient)
SELECT TOP 1 *
FROM ybGPSPointHistory a
WHERE Area = 'RF' AND ID = '0014' AND Coordinates = '9876'
ORDER BY UpdateDate DESC;
-- Latest record for a particular point
SELECT TOP 1 *
FROM ybGPSPointHistory a
WHERE Area = 'RF' AND ID = '0014'
ORDER BY UpdateDate DESC;
--Clean-up:
DROP TABLE ybGPSPointHistory;
DROP TABLE ybSourceView;
The schema I'm working on has a small amount of customers, with lots of data per customer.
In determining a partitioning strategy, my first thought was to partition by customer_id and then subpartition by range with a day interval. However, you cannot use INTERVAL in subpartitions.
Ultimately I would like a way to automatically create partitions for new customers as they are created, and also have automatic daily subpartitions created for the customers' data. All application queries are at the customer_id level with various date ranges specified.
This post is nearly identical, but the answer involves reversing the partitioning strategy, and I would still like to find a way to accomplish range-range interval partitioning. One way could potentially be to have a monthly database job to create subpartitions for the days/months ahead, but that doesn't feel right.
Perhaps I'm wrong on my assumptions that the current data structure would benefit more from a range-range interval partitioning strategy. We have a few customers whose data dwarfs other customers, so I was thinking of ways to isolate customer data.
Any thoughts/suggestions on a better approach?
Thank you again!
UPDATE
Here is an example of what I was proposing:
CREATE TABLE PART_TEST(
CUSTOMER_ID NUMBER,
LAST_MODIFIED_DATE DATE
)
PARTITION BY RANGE (CUSTOMER_ID)
INTERVAL (1)
SUBPARTITION BY RANGE (LAST_MODIFIED_DATE)
SUBPARTITION TEMPLATE
(
SUBPARTITION subpart_1206_min values LESS THAN (TO_DATE('12/2006','MM/YYYY')),
SUBPARTITION subpart_0107 values LESS THAN (TO_DATE('01/2007','MM/YYYY')),
SUBPARTITION subpart_0207 values LESS THAN (TO_DATE('02/2007','MM/YYYY')),
...
...
...
SUBPARTITION subpart_max values LESS THAN (MAXVALUE)
)
(
PARTITION part_1 VALUES LESS THAN (1)
)
I currently have 290 subpartitions in the template. This appears to be working except for one snag. In my tests I'm finding that any record with a CUSTOMER_ID greater than 3615 fails with ORA-14400: inserted partition key does not map to any partition
You can make a RANGE INTERVAL partition on the date and then a LIST or RANGE subpartition under it. (Your ORA-14400 error is likely due to Oracle's limit of 1048575 partitions and subpartitions per table: with 290 subpartitions per partition, only about 3616 interval partitions can ever be addressed, which is roughly where CUSTOMER_ID 3615 lands.) It would be like this:
CREATE TABLE MY_PART_TABLE
(
CUSTOMER_ID NUMBER NOT NULL,
THE_DATE TIMESTAMP(0) NOT NULL,
OTHER_COLUMNS NUMBER
)
PARTITION BY RANGE (THE_DATE) INTERVAL (INTERVAL '1' MONTH)
SUBPARTITION BY RANGE (CUSTOMER_ID)
SUBPARTITION TEMPLATE (
SUBPARTITION CUSTOMER_GROUP_1 VALUES LESS THAN (10),
SUBPARTITION CUSTOMER_GROUP_2 VALUES LESS THAN (20),
SUBPARTITION CUSTOMER_GROUP_3 VALUES LESS THAN (30),
SUBPARTITION CUSTOMER_GROUP_4 VALUES LESS THAN (40),
SUBPARTITION CUSTOMER_GROUP_5 VALUES LESS THAN (MAXVALUE)
)
(PARTITION VALUES LESS THAN ( TIMESTAMP '2015-01-01 00:00:00') );
CREATE TABLE MY_PART_TABLE
(
CUSTOMER_ID NUMBER NOT NULL,
THE_DATE TIMESTAMP(0) NOT NULL,
OTHER_COLUMNS NUMBER
)
PARTITION BY RANGE (THE_DATE) INTERVAL (INTERVAL '1' MONTH)
SUBPARTITION BY LIST (CUSTOMER_ID)
SUBPARTITION TEMPLATE (
SUBPARTITION CUSTOMER_1 VALUES (1),
SUBPARTITION CUSTOMER_2 VALUES (2),
SUBPARTITION CUSTOMER_3_to_6 VALUES (3,4,5,6),
SUBPARTITION CUSTOMER_7 VALUES (7)
)
(PARTITION VALUES LESS THAN ( TIMESTAMP '2015-01-01 00:00:00') );
Note that for the second solution the set of customer IDs is fixed. If you get new customers, you have to alter the table and modify the SUBPARTITION TEMPLATE accordingly.
Monthly partitions will be created automatically by Oracle whenever new values are inserted or updated.
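For example, onboarding a new customer in the second variant could look like this (a sketch; note that a changed template only applies to partitions created afterwards, existing partitions are untouched):
ALTER TABLE MY_PART_TABLE
SET SUBPARTITION TEMPLATE (
SUBPARTITION CUSTOMER_1 VALUES (1),
SUBPARTITION CUSTOMER_2 VALUES (2),
SUBPARTITION CUSTOMER_3_to_6 VALUES (3,4,5,6),
SUBPARTITION CUSTOMER_7 VALUES (7),
SUBPARTITION CUSTOMER_8 VALUES (8) -- new customer
);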