How many characters can a SQL query have? Oracle and SQL Server

I'm working on a feature in a Java application that builds a query based on a LOG table.
This LOG table basically has the history of created entities, storing all PK columns and their values.
This feature basically adds an intermediary filter where the user can tell me something like:
"Here are the entity X and date Y. Go to the log table, find all PKs of entity X since the date Y, then go on the Entity X's table and bring me all the data based on the PKs that you've get on the last step."
This is why I cannot paginate :(. I need to find all PKs since date Y.
I'm facing the following situation:
Imagine that my log table has 5,000 creation logs of the TB_WINDOWS table. The table TB_WINDOWS has a composite key with 3 columns ID(NUMBER), CREATED_BY(NUMBER), CREATED_AT(DATE).
So the query will have at least:
At least 10,000 AND clauses (AND.size() * 10,000 = 30,000 characters)
At least 5,000 OR clauses (OR.size() * 5,000 = 10,000 characters)
At least 30,000 () braces
At least 110,000 characters (( ID.size() + CREATED_BY.size() + CREATED_AT.size() ) * 5,000 = 110,000).
So, imagine the query looking like this:
-- In this example I'm using Oracle's TO_DATE.
-- But as I said, I need to know about Oracle and SQL Server
-- So, just take this as an example.
SELECT
*
FROM
TB_WINDOWS
WHERE
(ID = 1
AND CREATED_BY = 2
AND CREATED_AT = TO_DATE('a date', 'a format') )
OR (ID = 2
AND CREATED_BY = 3
AND CREATED_AT = TO_DATE('a date', 'a format') )
OR (.....)
OR (.....)
OR (.....)
-- And many other OR clauses
We can say that this query will have more than 180,000 characters.
My question is: do Oracle and SQL Server have a limit on the number of characters a SQL query can have?

You risk serious trouble if you run this code in production. Oracle shares SQL statements and their execution plans, so every hard parse requires an exclusive lock on shared memory resources. This becomes a problem when the chunk of memory you want to allocate is huge. I have seen database crashes caused by a heavily loaded system trying to allocate shared memory while parsing a 15 MB SQL statement.
Generally this approach (used by Hibernate, for example) is discouraged.
Various articles have been written about the "dynamic WHERE clause problem" in Oracle. It is much safer to store your keys in a temporary table, for example.
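As a concrete illustration of that temporary-table approach on the Oracle side (a sketch only: the global temporary table and the bind-variable loading step are assumptions, while TB_WINDOWS and its key columns come from the question):
-- One-time DDL: a global temporary table to hold the PKs harvested from the log table.
CREATE GLOBAL TEMPORARY TABLE TMP_WINDOWS_KEYS (
    ID         NUMBER,
    CREATED_BY NUMBER,
    CREATED_AT DATE
) ON COMMIT DELETE ROWS;

-- Per request: batch-insert the 5,000 keys from Java using bind variables
-- instead of building a 180,000-character literal WHERE clause.
INSERT INTO TMP_WINDOWS_KEYS (ID, CREATED_BY, CREATED_AT)
VALUES (:id, :created_by, :created_at);

-- Then a single short, shareable statement does the lookup.
SELECT w.*
FROM TB_WINDOWS w
JOIN TMP_WINDOWS_KEYS k
    ON  k.ID         = w.ID
    AND k.CREATED_BY = w.CREATED_BY
    AND k.CREATED_AT = w.CREATED_AT;
On SQL Server the same idea maps to a local #temp table or a table-valued parameter.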

Related

Database design for IoT application

Our application shows near-real-time IoT data (up to 5 minute intervals) for our customers' remote equipment.
The original pilot project stores every device reading for all time, in a simple "Measurements" table on a SQL Server 2008 database.
The table looks something like this:
Measurements: (DeviceId, Property, Value, DateTime).
Within a year or two, there will be maybe 100,000 records in the table per device, with the queries typically falling into two categories:
"Device latest value" (95% of queries): looking at the latest value only
"Device daily snapshot" (5% of queries): looking at a single representative value for each day
We are now expanding to 5000 devices. The Measurements table is small now, but will quickly get to half a billion records or so, for just those 5000 devices.
The application is very read-intensive, with frequently-run queries looking at the "Device latest values" in particular.
[EDIT #1: To make it less opinion-based]
What database design techniques can we use to optimise for fast reads of the "latest" IoT values, given a big table with years worth of "historic" IoT values?
One suggestion from our team was to store MeasurementLatest and MeasurementHistory as two separate tables.
[EDIT #2: In response to feedback]
In our test database, seeded with 50 million records, and with the following index applied:
CREATE NONCLUSTERED INDEX [IX_Measurement_DeviceId_DateTime] ON Measurement (DeviceId ASC, DateTime DESC)
a typical "get device latest values" query (e.g. below) still takes more than 4,000 ms to execute, which is way too slow for our needs:
SELECT DeviceId, Property, Value, DateTime
FROM Measurements m
WHERE m.DateTime = (
SELECT MAX(DateTime)
FROM Measurements m2
WHERE m2.DeviceId = m.DeviceId)
This is a very broad question - and as such, it's unlikely you'll get a definitive answer.
However, I have been in a similar situation, and I'll run through my thinking and eventual approach. In summary though: rather than physically splitting the data into two tables, I used a filtered index to 'mimic' the separate smaller table.
My original thinking was to have two tables: one with the 'latest data only' for most reporting, and another with all historical values. An alternative was one table with all records plus a second with just the latest.
Either way, inserting a new reading would typically need to touch at least two rows, if not more (depending on how it's stored).
Instead, I went for a slightly different route:
Put all the data into one table
On that one table, add a new column 'Latest_Flag' (bit, NOT NULL, DEFAULT 1). If it's 1 then it's the latest value; otherwise it's historical
Have a filtered index on the table that has all columns (with appropriate column order) and filter of Latest_Flag = 1
This filtered index is similar to a second copy of the table with just the latest rows only
The insert process therefore has two steps in a transaction:
'Unflag' the last Latest_Flag for that device, etc
Insert the new row
It still makes the writes a bit slower (as it needs to do several row updates as well as index updates) but fundamentally it does the pre-calculation for later reads.
When reading from the table, however, you need to then specify WHERE Latest_Flag = 1. Alternatively, you may want to put it into a view or similar.
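A minimal sketch of that two-step insert, using the column names from the question (the parameter values are hypothetical):
DECLARE @DeviceId INT = 42,
        @Property NVARCHAR(50) = N'Temperature',
        @Value    NVARCHAR(50) = N'21.5',
        @Now      DATETIME = GETDATE();

BEGIN TRANSACTION;

    -- Step 1: unflag the previous 'latest' row for this device/property.
    UPDATE Measurements
    SET Latest_Flag = 0
    WHERE DeviceId = @DeviceId
      AND Property = @Property
      AND Latest_Flag = 1;

    -- Step 2: insert the new reading; Latest_Flag defaults to 1.
    INSERT INTO Measurements (DeviceId, Property, Value, DateTime)
    VALUES (@DeviceId, @Property, @Value, @Now);

COMMIT TRANSACTION;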
For the filtered index, it may be something like
CREATE INDEX ix_measurements_deviceproperty_latest
ON Measurements (DeviceId, Property)
INCLUDE (Value, DateTime, Latest_Flag)
WHERE (Latest_Flag = 1)
Note: another version of this can be done with a trigger, e.g. when inserting a new row, the trigger invalidates (sets Latest_Flag = 0) any previous rows. That means you don't need the two-step inserts, but you then rely on business/processing logic living inside triggers.
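A sketch of that trigger variant, assuming the Measurements columns used above (the trigger name is illustrative):
CREATE TRIGGER trg_Measurements_SetLatest
ON Measurements
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Unflag any older rows for the device/property pairs that were just inserted.
    UPDATE m
    SET m.Latest_Flag = 0
    FROM Measurements m
    JOIN inserted i
        ON i.DeviceId = m.DeviceId
       AND i.Property = m.Property
    WHERE m.Latest_Flag = 1
      AND m.DateTime < i.DateTime;
END;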

How to store 300M records in Postgresql to run efficiency queries

I have following table:
CREATE TABLE public.shop_prices
(
shop_name text COLLATE pg_catalog."default",
product text COLLATE pg_catalog."default",
product_category text COLLATE pg_catalog."default",
price text COLLATE pg_catalog."default"
)
and for this table I have a dataset covering 18 months. Each monthly file contains about 15M records. I have to do some analysis, like finding the months in which a shop increased or decreased its prices. I imported two months into a table and ran the following query just to test:
select shop, product from shop_prices group by shop, product limit 10
I waited more than 5 minutes but got no result or response; it was still running. What is the best way to store these datasets and run efficient queries? Is it a good idea to create a separate table for each dataset?
Using explain analyze select shop_name, product from shop_prices group by shop_name, product limit 10 you can see how Postgres plans and executes the query and how long the execution takes. You'll see it needs to read the whole table (with time-consuming disk reads) and then sort it in memory, which will probably spill to disk before returning the results. On the next run you might discover the same query is very snappy, if the number of shop_name+product combinations is very limited and thus cached after that explain analyze. The point being that a simple query like this can be deceiving.
You will get faster execution by creating an index on the columns you are using (create index shop_prices_shop_prod_idx on public.shop_prices (shop_name, product)).
You should definitely change the price column type to numeric (or float/float8) if you plan to do any numerical calculations on it.
Having said all that, I suspect this table is not what you will be using as it does not have any timestamp to compare prices between months to begin with.
I suggest you complete the table design first and then think about indexes to improve performance. You might even want to consider table partitioning: https://www.postgresql.org/docs/current/ddl-partitioning.html
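For example, a possible redesign along those lines could look like the sketch below (PostgreSQL 11 or later assumed; the price_date column, the partition boundaries and the numeric price type are suggestions, not part of your original table):
CREATE TABLE public.shop_prices (
    shop_name        text,
    product          text,
    product_category text,
    price            numeric,
    price_date       date NOT NULL   -- the timestamp column the analysis needs
) PARTITION BY RANGE (price_date);

-- One partition per month of the 18-month dataset, for example:
CREATE TABLE shop_prices_2020_01 PARTITION OF shop_prices
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');

CREATE INDEX shop_prices_shop_prod_idx
    ON public.shop_prices (shop_name, product);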
You will probably be doing all sorts of queries on this data so there is no simple solution to them all.
By all means return with perhaps more specific questions with complete table description and the output from the explain analyze statement for queries you are trying out and get some good advice.
Best regards,
Bjarni
What is your PostgreSQL version?
First, there is a typo: the column shop should be shop_name.
Second, your query looks strange because it has only a LIMIT clause without any ORDER BY or WHERE clause: do you really want to get "random" rows for this query?
Can you try to post EXPLAIN output for the SQL statement:
explain select shop_name, product from shop_prices group by shop_name, product limit 10;
Can you also check if any statistics have been computed for this table with:
select * from pg_stats where tablename='shop_prices';

SSIS - How to improve a work flow

Previously I asked for a possible solution to a situation I had to face in order to implement a SQL query (which was originally implemented in Access). I have reached a solution (after asking a lot), but I would like to know if anyone has another way to execute this query.
I have two different tables, one in SQL Server and another in Oracle (S and O):
O(A, B, C) => PK=(A,B) and S(D,E,F) => PK = (D,E)
The query looks like this
SELECT A,B,C,E,F
FROM S INNER JOIN O ON
S.D = O.A (Only one attribute of the PK in O)
S has over 10,000 records and O more than 700 million. Given this, it does not make sense to implement a merge join or a lookup, because I only need the first match between D and A.
So I thought it would be better to assemble the query on the Oracle side. To do this I have implemented a scheme like this.
On the SQL Server side I have executed this query:
with tmp(A) as (
    select distinct D as A from S
)
select cast((select concat(' or A = ', A)
    from tmp
    for xml path('')) as nvarchar(max)) as ID
This gives me a string with the values that I am going to search for in Oracle.
Finally in the data flow, I am creating an expression like this:
select A, B, C
from O
where A= '' + #ID
I download these values to SQL Server and then I am able to manipulate them as I wish.
The use of the Foreach Loop was necessary because I am storing the SQL string inside an Object variable. I found that SSIS has some trouble with nvarchar(max) variables.
Some considerations:
1) The Oracle database is administered by another area of the company and they only give read permissions on the tables.
2) The DBA of the SQL Server does not allow the O table to be downloaded to a staging area. There is no possibility of negotiating with him; besides, this table is updated every day with more records. He only manages this server and has no authority over Oracle.
3) The solution suggested by some members of my team was to create a query in Oracle across different tables that could give me the attributes of O that I need. As a result I could get more than 3 million records, and not all values of attribute A are present in S. Even more, some of the values of D have been manipulated, so possibly they will not be present in O.
With this implementation I am getting more than 150,000 records from Oracle. But I would like to know if another solution can be implemented, or if there are other components that I can use to reach the same results. Believe me when I say that I have read, asked and searched a lot before implementing this flow.
EDITED:
Option 1 (You say that you cannot use this solution – but it would be the first one – the best)
Use a DBLink to let Oracle access S table (you must use Oracle Database Gateway). Create a view in Oracle joining O and S. And finally use linked server to let SQL Server access the Oracle Joining view and get the results.
The process is as follow:
You must convince your Oracle DBA to configure the Oracle Database Gateway for SQL Server (see http://docs.oracle.com/cd/B28359_01/gateways.111/b31043/conf_sql.htm#CIHGADGB). When it is properly configured, you can create a DB link in Oracle pointing to SQL Server. With the DB link, Oracle will have direct access to the S table.
Now create a view V in Oracle that simply joins the O and S tables.
As you want the result back in your SQL Server, and you cannot use SSIS for this step, you can proceed as described in: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/111df59c-b309-4d59-b56c-9cd5574ee181/how-to-access-oracle-table-from-sql-server-?forum=transactsql
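A rough sketch of that last step from the SQL Server side (V, ORACLE_OWNER and ORACLE_LINKED_SERVER are placeholder names, not objects from the question):
-- Pull the pre-joined rows from the Oracle view through the linked server.
SELECT *
FROM OPENQUERY(ORACLE_LINKED_SERVER, 'SELECT A, B, C, E, F FROM ORACLE_OWNER.V');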
Option 2 (You say that you cannot use this solution – but it would be the second one)
As your Oracle admins seem to be monsters that will kill you if they get their paws on you, you can try the following (if they let you create a table in Oracle):
Create a linked server in SQL Server (to access Oracle from SQL Server), as I mentioned in the "normal case".
Create a (temporary) table in the Oracle schema with only one column (it will store the D values from SQL Server).
Every time you need to evaluate your query, execute this in SQL Server:
INSERT INTO ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE
SELECT DISTINCT D FROM S;
SELECT * FROM OPENQUERY(ORACLE_LINKED_SERVER, 'SELECT * FROM ORACLE_OWNER.O WHERE A IN (SELECT D FROM ORACLE_OWNER.TEMP_TABLE)');
And finally don't forget to delete the Oracle's temp table:
DELETE FROM ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE;
Option 3 (If you have an Oracle license and one available host)
You can install your own Oracle server in your host and use Option 2.
Option 4
If your solution is really the only way out, then let's try to improve it a little bit.
As you know, your solution works but it is a little bit aggressive (you are transforming a relational algebra semijoin into a selection with a monster condition). You say that the Oracle table is updated every day with more records, but if the update rate of your tables is lower than your query rate, you can create a result cache that you can use while the tables S and O have not changed.
Proceed as follows:
Create a table in your SQL Server to store the Oracle result of your monster query. Before building and launching your query, execute this:
SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID( 'YourDatabaseName')
AND OBJECT_ID=OBJECT_ID('S')
This returns the most recent time your table S was updated. Store this value in a table (create a new table or use a typical parameter table).
Build your monster query. But before launching it, send this query to Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
This returns the last SCN (System Change Number) that caused a change in the table. Store this value as well (in a new table or the same parameter table).
Launch the big query and store its result into the cache table.
Finally, when you need to repeat the big query, first execute in your SQL Server:
SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID( 'YourDatabaseName')
AND OBJECT_ID=OBJECT_ID('S')
And execute in Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
If one or both values have changed with respect to the ones stored in your parameter table, then you must store the new values (updating the old ones) and launch the big query again. But if neither value has changed, your cache is up to date and you can use it.
Note that:
SCN is not absolutely precise, but it is a good approximation (see: http://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns007.htm).
The greater your query rate relative to your update rate, the better this solution works.
If you can tolerate working with old values, you can also give the cache an expiration time.
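A minimal T-SQL sketch of that freshness check (the CACHE_PARAMS and O_RESULT_CACHE tables, and the linked-server name, are hypothetical placeholders):
DECLARE @s_last_update DATETIME,
        @o_last_scn    BIGINT;

-- Current change markers on both sides.
SELECT @s_last_update = last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
  AND object_id = OBJECT_ID('S');

SELECT @o_last_scn = MAX_SCN
FROM OPENQUERY(ORACLE_LINKED_SERVER,
               'SELECT MAX(ORA_ROWSCN) AS MAX_SCN FROM ORACLE_OWNER.O');

IF EXISTS (SELECT 1 FROM CACHE_PARAMS
           WHERE s_last_update = @s_last_update
             AND o_last_scn    = @o_last_scn)
    SELECT * FROM O_RESULT_CACHE;   -- nothing changed: the cache is still valid
ELSE
    PRINT 'Cache is stale: rerun the big query, reload O_RESULT_CACHE and update CACHE_PARAMS';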

When should I use a table variable vs temporary table in sql server?

I'm learning more details about table variables. It is said that temp tables are always on disk while table variables are in memory; that is to say, the performance of a table variable is better than a temp table because a table variable uses fewer IO operations than a temp table.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable will be put on disk like a temp table.
But I don't know what "too many records" means. 100,000 records? 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 that measures the size of a table variable or tells me when it is moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title though, as to when to use a table variable vs a local temporary table, you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow read-only access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependant however. For SQL Server 2012 and below the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX. This has been extended since to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes are still not possible to create on table variables however.
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
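To make the rollback point above concrete, here is a small sketch (the @LogTV and #LogTT names are just for illustration): the table variable keeps its row after the outer transaction is rolled back, while the #temp table loses it.
DECLARE @LogTV TABLE (StepName SYSNAME, LoggedAt DATETIME2 DEFAULT SYSDATETIME());
CREATE TABLE #LogTT (StepName SYSNAME, LoggedAt DATETIME2 DEFAULT SYSDATETIME());

BEGIN TRANSACTION;
    INSERT INTO @LogTV (StepName) VALUES (N'step 1 done');
    INSERT INTO #LogTT (StepName) VALUES (N'step 1 done');
ROLLBACK TRANSACTION;

SELECT COUNT(*) AS rows_in_table_variable FROM @LogTV; -- 1: survives the rollback
SELECT COUNT(*) AS rows_in_temp_table     FROM #LogTT; -- 0: rolled back
DROP TABLE #LogTT;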
Effects of rowset sharing
DECLARE @T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO #T
OUTPUT inserted.* INTO @T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/*CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE @T SET Flag=1;
/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM @T
/* CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes).
Use a temporary table for a lot of data
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a working set and modify them as a group before using them to UPDATE / INSERT a permanent table, so a table variable is perfect.
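For instance, a sketch of that kind of usage (the dbo.Accounts table and its columns here are hypothetical, just to illustrate the pattern): stage a handful of rows in a table variable, adjust them as a group, then apply them to the permanent table.
DECLARE @Batch TABLE (AccountId INT PRIMARY KEY, NewStatus VARCHAR(20));

-- Collect the ~20 rows to work on.
INSERT INTO @Batch (AccountId, NewStatus)
SELECT AccountId, 'REVIEW'
FROM dbo.Accounts
WHERE LastLogin < DATEADD(YEAR, -1, GETDATE());

-- Modify the staged rows as a group.
UPDATE @Batch SET NewStatus = 'ARCHIVE' WHERE AccountId % 2 = 0;

-- Apply the staged changes to the permanent table.
UPDATE a
SET a.Status = b.NewStatus
FROM dbo.Accounts a
JOIN @Batch b ON b.AccountId = a.AccountId;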
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTEs are a concern for a similar size-related reason: if the data in the CTE is very small, I find a CTE performs as well as or better than what the optimizer comes up with, but if it is quite large then it hurts you badly.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here:
Table variables do not have distribution statistics, so they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
A table variable is available only to the current session; for example, if you need to EXEC another stored procedure within the current one, you will have to pass the table as a table-valued parameter, and of course this will affect performance. With temporary tables you can do this by just passing the temporary table name.
To test a Temporary table:
Open management studio query editor
Create a temporary table
Open another query editor window
Select from this table "Available"
To test a Variable table:
Open management studio query editor
Create a Variable table
Open another query editor window
Select from this table "Not Available"
Something else I have experienced: if your schema doesn't have permission to create tables, then use table variables.
Writing data to tables declared with DECLARE @tb and then joining them with other tables, I realized that the response time is much higher compared to temporary tables (tempdb..#tb).
When I join them with @tb the result takes much longer to return; with #tb, the return is almost instantaneous.
I did tests with a join of 10,000 rows and joins with 5 other tables.

Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.
In my database, there is a table that is structured as followed:
Table tab (
rowid NUMBER(11),
unique_id_string VARCHAR2(2000),
year NUMBER(4),
dynamic_col_1 NUMBER(11),
dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query needs around 1 hour to execute, although I created both indexes described above.
The explain plan shows that Oracle uses a "Table Full Access", i.e. a full table scan. Why is the index not used?
As an experiment, I tested the following SQL command:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text".
The whole index file has a size of about 50 GB.
A few more pieces of information:
The database is Oracle 11g installed on my local computer.
I use Windows 7 Enterprise 64bit.
The whole index is split over 3 dbf files with about 50GB size.
I would really be glad, if someone could tell me how to make Oracle use the index in the first query.
Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.
Thanks in advance.
[01.10.2011: UPDATE]
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan.
The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab has more than 5% of its data in 2011, it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, with year as the second column this makes the index even slower.
If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
Run an explain plan with and without an index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
If the cardinality is off, make sure there are up-to-date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
If the time is off, check your workload statistics.
Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
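Regarding the stale-statistics point above, refreshing the statistics on the table and its indexes could look like this (a sketch; the schema is assumed to be the current user, and the DBMS_STATS call itself is standard):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,                        -- or the owning schema
    tabname          => 'TAB',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -- recommended on 11g
    cascade          => TRUE);                       -- also refresh index statistics
END;
/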
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
Your index should be:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query though, you could create it including the two fetched columns as well; then the optimiser would not have to go to the table for the query data, it could retrieve it directly from the index, making your query more efficient again.
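For example, that covering index could look something like this (a sketch; the index name is just illustrative and the column order follows the suggestion above):
CREATE INDEX Index_year_cover
ON tab (year, dynamic_col_1, dynamic_col_1_text)
TABLESPACE tabspace_index;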
Hope it helps...
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.
Your second test query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
I don't know if it's relevant, but I tested the following query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query show that Oracle uses an index scan in this scenario.
The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?
[01.10.2011: UPDATE]
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.
Try this:
1) Create an index on the year field (see Ollie's answer).
2) And then use this query:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year=2011)
or
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.
