Oracle 11 Index only for part of the data - database

I'm using a table called T that has a column called C_I (Index) and a column called C_D (Data).
Now I would like a row to appear in the index only if C_D is NULL.
CREATE INDEX T_INDEX ON T(C_I) STORAGE(BUFFER_POOL KEEP);
How do I get some WHERE C_D IS NULL clause into this create statement?

Let me first make sure I understand the question correctly:
You want to speed up SELECT .. WHERE C_D IS NULL but you do not want to speed up any of the queries that search for a non-NULL C_D.
You also want to make sure no "unnecessary" non-NULL values are in the index, to save space.
If that understanding is correct, then what you need is a function-based index, i.e. an index on a function of a field, not on the field itself...
CREATE INDEX T_IE1 ON T (CASE WHEN C_D IS NULL THEN 1 ELSE NULL END) COMPRESS
...which you would then query as...
SELECT * FROM T WHERE (CASE WHEN C_D IS NULL THEN 1 ELSE NULL END) = 1
...which is equivalent to...
SELECT * FROM T WHERE C_D IS NULL
...but faster, since it uses the index.
This saves space because single-column indexes do not store NULLs. Also, use COMPRESS: the index will only ever contain one key, so there is no need to waste space repeating the same key over and over in the index structure.
NOTE: Under Oracle 11, you could also create a function-based virtual column (based on the CASE expression above), then index and query on that column directly, to save some repetitive typing.
--- EDIT ---
If you are also interested in querying on C_I together with C_D IS NULL, you could...
CREATE UNIQUE INDEX T_IE2 ON T (C_I, CASE WHEN C_D IS NULL THEN 1 ELSE NULL END)
...and query it with (for example)...
SELECT * FROM T WHERE C_I > 'some value' AND (CASE WHEN C_D IS NULL THEN 1 ELSE NULL END) = 1
...which is equivalent to...
SELECT * FROM T WHERE C_I > 'some value' AND C_D IS NULL
...but faster, since it uses the index T_IE2.
This is in fact the only index that you need on your table (it "covers" the primary key, so you no longer need a separate index just on C_I). It also means the same ROWID is never stored in more than one index, which saves space.
NOTE: COMPRESS no longer makes sense for index T_IE2.
--- EDIT 2 ---
If you care about simplicity more than space, you can just create a composite index on {C_I, C_D}. Oracle stores NULL values in a composite index as long as there is at least one non-NULL value in the same tuple:
CREATE UNIQUE INDEX T_IE3 ON T (C_I, C_D)
This uses the index:
SELECT * FROM T WHERE C_I > 1 AND C_D IS NULL
As in previous EDIT, this is the only index that you need on your table.

CREATE INDEX T_INDEX ON T ( CASE WHEN C_D IS NULL THEN C_I ELSE NULL END);
This works because Oracle does not index rows for which every indexed expression is NULL, so the NULLs returned by the CASE expression never enter the index.

Let me "repackage" my original answer.
Create the table like this:
CREATE TABLE T ...;
CREATE INDEX T_PK_IDX ON T (C_I, CASE WHEN C_D IS NULL THEN 1 ELSE NULL END);
ALTER TABLE T ADD CONSTRAINT T_PK PRIMARY KEY (C_I) USING INDEX T_PK_IDX;
And query like this:
SELECT * FROM T
WHERE
C_I > 'some value'
AND (CASE WHEN C_D IS NULL THEN 1 ELSE NULL END) = 1
Query plan:

Related

sql server multi columns index & query filter has the first and the last columns in the index keys

If I have multi-columns index with column keys: col_1, col_2, and col_3
Would a query use this index if its WHERE clause contains these conditions:
col_1 = any_value AND col_3 = any_value
(the second column of the index key does not appear in the WHERE clause)
and here is another example:
Suppose the index has 10 columns and the key columns are in this order:
col_1, col_2, ...., col_10
and then I run this query:
Select col_1,col_2, ..., col_10 from X
WHERE col_1 = any_value AND col_5 = any_value AND col_10 = any_value
My question: would the index be used in this case or not?
new answer as your question is now more clear to me
No, the index will not be used. The index will (or may) be used only when querying on col_1, on col_1/col_2, or on col_1/col_2/col_3. Check this with the execution plan of your query. The order of the columns in a multi-column index does matter; see the question "Multiple Indexes vs Multi-Column Indexes" for some discussion of this topic.
If you consider it more likely that you will query on col_1 and col_3, why not create a multi-column index on just those 2 columns?
It might be used. It depends on many factors, mostly your data (and the statistics about your data) and your queries.
TL;DR: you need to test this on your own data and your own queries. The index might be used.
You should try it out on the data that you have or expect to have. It is very easy to create some test data on which you can test your queries and try different indexes. You might also need to reconsider the order of the columns in the index: is col_1 really the best column to put first?
Below is a very specific scenario from which we can only conclude that the index can be used, sometimes, in similar scenarios as yours.
Consider the scenario below: the table contains 1M rows and has only a single nonclustered index on (a, b, c). Note that the values in column D are very large.
The first 4 queries below used the index, only the fifth query did not.
Why?
SQL Server needs to figure out how to complete the query while reading the least amount of data. Sometimes it is cheaper for SQL Server to read the index instead of the table, even when the query filter does not completely match the index.
In Query 1 and 2 the query will actually do a Seek on the index which is quite good. It knows that column A is a good candidate to perform the Seek on.
In query 3 and 4 it needs to scan the entirety of the index to find the matching rows. It still used the index.
In query 5 SQL Server realizes that it is easier to scan the actual table instead of the index.
IF OBJECT_ID('tempdb..#peter') IS NOT NULL DROP TABLE #peter;
CREATE TABLE #peter(a INT, b INT, c VARCHAR(100), d VARCHAR(MAX));
WITH baserows AS (
SELECT * FROM master..spt_values WHERE type = 'P'
),
numbered AS (
SELECT TOP 1000000
a.*, rn = ROW_NUMBER() OVER(ORDER BY (SELECT null))
FROM baserows a, baserows b, baserows c
)
INSERT #peter
( a, b, c, d )
SELECT
rn % 100, rn % 10, CHAR(65 + (rn % 60)), REPLICATE(CHAR(65 + (rn % 60)), rn)
FROM numbered
CREATE INDEX ix_peter ON #peter(a, b, c);
-- First query does Seek on the index + RID Lookup.
SELECT * FROM #peter WHERE a = 55 AND c = 'P'
-- Second Query does Seek on the index.
SELECT a, b, c FROM #peter WHERE a = 55 AND c = 'P'
-- Third query does Scan on the index because the index is smaller to scan than the full table.
SELECT a, b, c FROM #peter WHERE c = 'P'
-- Fourth query does a scan on the index
SELECT a, b, c FROM #peter WHERE b = 22
-- Fifth query scans the table and not the index
SELECT MAX(d) FROM #peter
Tested on SQL Server 2014.
The index will definitely be used but not effectively.
I did an experiment (SQL Server) and here is how it looks [IX_AB is an index on a, b] and I can correlate your problem with it.
These are the conclusions
If you create an index on col1, col2, and col3 and filter on col1 and col3 only, the index is used only to find the rows matching col1. Those rows are then filtered on col3 one by one, which costs O(N), where N is the number of rows the index returned.
Passing the mid-value as "not null" or "null" does not help.
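The conclusion above can be reproduced on a small scale. The following is only a sketch: it uses SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server, so the plan text and the optimizer's choices will differ from a real SQL Server plan. The table x, the index ix and the extra column d are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption); x, ix and d are made-up names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (col_1 INT, col_2 INT, col_3 INT, d TEXT)")
conn.execute("CREATE INDEX ix ON x (col_1, col_2, col_3)")

def plan(sql):
    """Return the plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th field holds the description

# col_2 is skipped, so only the leading column can drive the index search;
# col_3 is checked afterwards on the rows that search returns.
skip_middle = plan("SELECT * FROM x WHERE col_1 = 1 AND col_3 = 3")

# With all three key columns present, the whole key is used for the search.
full_key = plan("SELECT * FROM x WHERE col_1 = 1 AND col_2 = 2 AND col_3 = 3")

print(skip_middle)
print(full_key)
```

In the first plan only col_1 appears as a search term of the index; in the second, all three columns do.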

index on partitioned table doesn't enhance performance with a function in where clause

I have this query
select col1,col2, x.id pk
/*+ INDEX (some_index_on_col4)*/
from tbl1 y
,tbl2 x
where col2 = 'some_value' and col3 = 'U'
and x.col4 = dbms_lob.substr( REPLACE(y.PK_DATA,'"',''), 100, 1 )
;
The query is very slow, and the explain plan shows that the index is not used; a full table scan is used instead. If I remove
dbms_lob.substr( REPLACE(y.PK_DATA,'"',''), 100, 1 )
and say instead
x.col4 = 3456
it works fine. How can I improve this?
N.B. : tbl2 is partitioned
One obvious difference (and often a cause of an index not being used) is that the result of dbms_lob.substr( REPLACE(y.PK_DATA,'"',''), 100, 1 ) is a VARCHAR, not a NUMBER like 3456.
So, if possible, convert it with TO_NUMBER.
But you will not get the same plan as for 3456, because that is a constant; the original query uses y.PK_DATA.
Actually there was no match, and that is why the index wasn't used: a full scan was performed anyway. When there is a match, the index is used.

TSQL query optimizer view on non-nullable ISNULL()

As part of some dynamic SQL (ick), I've implemented the 'sort NULLs last' solution described here: Sorting null-data last in database query
ORDER BY CASE column WHEN NULL THEN 1 ELSE 0 END, column
My question is: On non-nullable columns that have ISNULL() applied to them, will the query optimizer strip this out when it realises that it will never apply?
It's not clear why your question mentions the ISNULL function when that isn't in your code.
ORDER BY CASE column WHEN NULL THEN 1 ELSE 0 END, column
First of all, this code doesn't work. It is equivalent to CASE WHEN column = NULL, a comparison that is never true, so every row gets 0, which is not what you need.
It would need to be
ORDER BY CASE WHEN column IS NULL THEN 1 ELSE 0 END, column
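The difference between the two CASE forms is easy to see on three rows. This is a minimal sketch that uses SQLite through Python's sqlite3 module instead of SQL Server (an assumption made so the example is self-contained); both engines never treat a comparison with NULL as true and both sort NULLs first in ascending order. The table t and column x are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INT)")  # hypothetical table with a nullable column
conn.executemany("INSERT INTO t (x) VALUES (?)", [(2,), (None,), (1,)])

def ordered(order_by):
    """Return column x in the order produced by the given ORDER BY list."""
    sql = "SELECT x FROM t ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]

# Simple CASE: compares x = NULL, which is never true, so every row gets 0
# and the NULL row stays first.
simple_case = ordered("CASE x WHEN NULL THEN 1 ELSE 0 END, x")

# Searched CASE with IS NULL: the NULL row gets 1 and is sorted last.
searched_case = ordered("CASE WHEN x IS NULL THEN 1 ELSE 0 END, x")

print(simple_case)    # [None, 1, 2]
print(searched_case)  # [1, 2, None]
```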
The optimisation question is easy to test.
CREATE TABLE #T
(
X INT NOT NULL PRIMARY KEY
)
SELECT *
FROM #T
ORDER BY X
SELECT *
FROM #T
ORDER BY CASE WHEN X IS NULL THEN 1 ELSE 0 END, X
DROP TABLE #T
The plan shows a sort operation in the second plan indicating that this was not optimised out as you hoped and the pattern is less efficient than ORDER BY X.

conditional "next value for sequence"

scenario:
SQL Server 2012. A table named "Test" has two fields, "CounterNo" and "Value", both integers.
There are 4 sequence objects defined named sq1, sq2, sq3, sq4
I want to do these on inserts:
if CounterNo = 1 then Value = next value for sq1
if CounterNo = 2 then Value = next value for sq2
if CounterNo = 3 then Value = next value for sq3
I thought of creating a custom function and assigning it as the default value of the Value field, but when I tried, I found that custom functions do not support "next value for" on sequence objects.
Another way is to use a trigger. That table already has a trigger.
Using a stored procedure for inserts would be the best way, but Entity Framework 5 Code First does not support it.
Can you suggest a way to achieve this?
(If you can show me how to do it with custom functions, please post that too. That is another question of mine.)
Update:
In reality there are 23 fields in that table, and primary keys are set as well. I currently generate this counter value on the software side, using a "counter table". It is not good to generate counter values on the client side.
I'm using 4 sequence objects as counters because they represent different types of records.
If I use all 4 counters on the same record at the same time, all of them generate next values. I want only the related counter to generate its next value while the others remain the same.
I'm not sure if I fully understand your use case, but maybe the following sample illustrates what you need.
Create Table Vouchers (
Id uniqueidentifier Not Null Default NewId()
, Discriminator varchar(100) Not Null
, VoucherNumber int Null
-- ...
, MoreData nvarchar(100) Null
);
go
Create Sequence InvoiceSequence AS int Start With 1 Increment By 1;
Create Sequence OrderSequence AS int Start With 1 Increment By 1;
go
Create Trigger TR_Voucher_Insert_VoucherNumer On Vouchers After Insert As
If Exists (Select 1 From inserted Where Discriminator = 'Invoice')
Update v
Set VoucherNumber = Next Value For InvoiceSequence
From Vouchers v Inner Join inserted i On (v.Id = i.Id)
Where i.Discriminator = 'Invoice';
If Exists (Select 1 From inserted Where Discriminator = 'Order')
Update v
Set VoucherNumber = Next Value For OrderSequence
From Vouchers v Inner Join inserted i On (v.Id = i.Id)
Where i.Discriminator = 'Order';
go
Insert Into Vouchers (Discriminator, MoreData)
Values ('Invoice', 'Much')
, ('Invoice', 'More')
, ('Order', 'Data')
, ('Invoice', 'And')
, ('Order', 'Again')
;
go
Select * From Vouchers;
Now Invoice- and Order-Numbers will be incremented independently. And as you can have multiple insert triggers on the same table, that shouldn't be an issue.
I think you're thinking about this in the wrong way. You have 3 values and these values are determined by another column. Switch it around, create 3 columns and remove the Counter column.
If you have a table with value1, value2 and value3 then the Counter value is implied by the column in which the value resides. Create a unique index on these three columns and add an identity column for a primary key and you're sorted; you can do it all in a stored procedure easily.
If you have four different types of records, use four different tables, with a separate identity column in each one.
If you need to see all the data together, then use a view to combine them:
create view v_AllTypes as
select * from type1 union all
select * from type2 union all
select * from type3 union all
select * from type4;
Alternatively, do the calculation of the sequence number on output:
select t.*,
row_number() over (partition by CounterNo order by t.id) as TypeSeqNum
from AllTypes t;
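To make the effect of that query concrete, here is a small sketch run against SQLite via Python's sqlite3 module rather than SQL Server (an assumption for the sake of a self-contained example; window functions need SQLite 3.25 or later). The sample rows are invented, and the table is reduced to the two columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of AllTypes: just an id and the record type.
conn.execute("CREATE TABLE AllTypes (id INTEGER PRIMARY KEY, CounterNo INT)")
conn.executemany(
    "INSERT INTO AllTypes (id, CounterNo) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 3), (5, 2), (6, 1)],
)

# Each CounterNo gets its own numbering, restarted at 1 and ordered by id.
rows = conn.execute("""
    SELECT t.id, t.CounterNo,
           row_number() OVER (PARTITION BY CounterNo ORDER BY t.id) AS TypeSeqNum
    FROM AllTypes t
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

The numbers are computed at read time, so nothing has to be stored or kept in sync; the trade-off is that a row's number changes if an earlier row of the same type is deleted.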
Something seems amiss with your data model if it requires conditional updates to four identity columns.

SQL-Server: Define columns as mutually exclusive

Joking with a colleague, I came up with an interesting scenario: Is it possible in SQL Server to define a table so that through "standard means" (constraints, etc.) I can ensure that two or more columns are mutually exclusive?
By that I mean: Can I make sure that only one of the columns contains a value?
Yes you can, using a CHECK constraint:
ALTER TABLE YourTable
ADD CONSTRAINT ConstraintName CHECK (col1 is null or col2 is null)
Per your comment, if many columns are exclusive, you could check them like this:
case when col1 is null then 0 else 1 end +
case when col2 is null then 0 else 1 end +
case when col3 is null then 0 else 1 end +
case when col4 is null then 0 else 1 end
= 1
This says that exactly one of the four columns must contain a value. If they can all be NULL, just check for <= 1.
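The four-column check can be tried out end to end. The sketch below runs it in SQLite through Python's sqlite3 module instead of SQL Server (an assumption made to keep the example self-contained; the CHECK expression itself is the same). The constraint name ExactlyOne and the helper try_insert are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE YourTable (
        col1 INT, col2 INT, col3 INT, col4 INT,
        -- ExactlyOne is a made-up constraint name
        CONSTRAINT ExactlyOne CHECK (
            CASE WHEN col1 IS NULL THEN 0 ELSE 1 END +
            CASE WHEN col2 IS NULL THEN 0 ELSE 1 END +
            CASE WHEN col3 IS NULL THEN 0 ELSE 1 END +
            CASE WHEN col4 IS NULL THEN 0 ELSE 1 END
            = 1
        )
    )
""")

def try_insert(values):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO YourTable VALUES (?, ?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

one_value = try_insert((None, 7, None, None))      # exactly one value: accepted
two_values = try_insert((1, 7, None, None))        # two values: rejected
no_values = try_insert((None, None, None, None))   # no value: rejected by "= 1"

print(one_value, two_values, no_values)  # True False False
```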
