Simple select query taking too long - sql-server

I have a table with around 18k rows for a certain week, while another week has 22k rows.
I'm using a view and indexes to retrieve the data, like so:
SELECT TOP 100 * FROM my_view
WHERE timestamp BETWEEN #date1 AND #date2
But somehow, the week with 22k rows retrieves data faster (around 3-5 sec) while the other takes at least a minute. This causes my WCF service to time out. What am I missing?

Apply an index on the timestamp field.
If you already have an index on timestamp, check which index the query actually uses in the execution plan.
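For instance, a minimal sketch of such an index, assuming the view reads from a base table named dbo.my_table (a hypothetical name):
-- Index the base table's timestamp column so the range predicate can seek instead of scan
CREATE NONCLUSTERED INDEX IX_my_table_timestamp ON dbo.my_table ([timestamp]);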
The index hint only comes into play where your query involves joining tables, and where the columns used to join to the other table match more than one index. In that case the database engine may choose one index to make the join, and from investigation you may know that the query performs better with another index. In that case you provide the index hint telling the database engine which index to use.
Sample code using an index hint:
select [Order].[OrgId], [OrderDetail].[ProductId]
from [Order]
inner join [OrderDetail] with (index (IX_OrderDetail_OrderId)) on [Order].[OrderId] = [OrderDetail].[OrderId]

Related

Why does Snowflake create table as select (CTAS) ignore the order by clause?

The command is:
drop table if exists metrics_done;
create table metrics_done as select * from metrics where end_morning='2022-03-31' order by LOG_INFO desc;
The expected behaviour is creation of a table with sorted entries. But this does not happen. Why?
Snowflake does apply the ORDER BY on a CTAS. You can see that by using the system$clustering_information function, subject to some limitations around high-cardinality columns and the fact that the function reports clustering state before the automatic clustering service has run with the new key at least once.
However, just because Snowflake uses the ORDER BY in a CTAS, it doesn't mean the rows will return in order without using an ORDER BY clause. Snowflake is an MPP system and will scan multiple micropartitions during a query. Without specifying an ORDER BY, there is no reason the optimizer should generate a plan that guarantees order. The plan it generates can and will return rows in the order they're ready for the result.
Here's an oversimplified example: on a CTAS you order by date, and all rows in micropartition 1 have date 2022-01-01 while all rows in micropartition 2 have date 2022-01-02. When you select rows from that table, the scan for micropartition 2 is just as likely to finish first as the scan for micropartition 1. If #2 finishes first, its rows will be first in the result set.
Also, when the table becomes large and it has more micropartitions assigned to scan than there are available CPUs in the warehouse, one or more CPUs will need to scan multiple micropartitions. In this case, there's no reason to prefer to scan one micropartition before another.
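A hedged sketch using the names from the question: inspect how well the CTAS ordering clustered the table, and repeat the ORDER BY at query time whenever result order matters.
-- Check clustering on the ordered column (metrics_done and LOG_INFO come from the question)
select system$clustering_information('metrics_done', '(LOG_INFO)');
-- Row order in a result set is only guaranteed by an explicit ORDER BY
select * from metrics_done order by LOG_INFO desc;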

Why does an indexed Oracle table still do a full table scan?

I have a table 'MSATTRIBUTE' with 3000K rows. I used the following query to retrieve data. The query has different execution plans with the same DB data in different environments: in one environment it does a full table scan, so the query is very slow, but in the other it uses index scans and performs well. Why does it do a full table scan in one environment when I built indexes for these columns? How do I make it use index scans like in the other environment, and how can I improve this query?
Without understanding way more than I care to know about your data model and your business it's hard to give concrete positive advice. But here are some notes about your indexing strategy and why I would guess the optimizer is not using the indexes you have.
In the sub-query the access path to REDLINE_MSATTRIBUTE drives from three columns:
CLASS
OBJECT_ID
CHANGE_RELEASE_DATE
CLASS is not indexed, but it is presumably not very selective anyway. OBJECT_ID
is the leading column of a compound index, but the other columns in that index are irrelevant to the sub-query.
The biggest problem, though, is CHANGE_RELEASE_DATE. It is not indexed at all, which is bad news, as your one primary key lookup produces a date which is then compared with CHANGE_RELEASE_DATE. If a column is not indexed, the database has to read the table to get its values.
The main query drives off
ATTID
CHANGE_ID
OBJECT_ID (again)
CHANGE_RELEASE_DATE (again)
CLASS (again)
OLD_VALUE
ATTID is indexed, but how selective is that index? The optimizer probably doesn't think it's very selective. ATTID is also in a compound index with CHANGE_ID and OLD_VALUE, but none of them is the leading column, so that's not very useful. And we've discussed CLASS, CHANGE_RELEASE_DATE and OBJECT_ID already.
The optimizer will only choose to use an index if it is cheaper (fewer reads) than a table scan. This usually means WHERE clause criteria need to map to the leading (i.e. leftmost) columns of an index. This could be the case with OBJECT_ID and ATTID in the sub-query except that
The execution plan would have to do an INDEX SKIP SCAN because REDLINE_MSATTRIBUTE_INDEX1 has CHANGE_ID between the two columns
The database has to go to the table anyway to get the CLASS and the CHANGE_RELEASE_DATE.
So, you might get some improvement by building an index on (CHANGE_RELEASE_DATE, CLASS, OBJECT_ID, ATTID). But as I said upfront, without knowing more about your situation these are just ill-informed guesses.
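For illustration, that suggestion as DDL (the index name is mine, not from the original schema):
-- Compound index covering the predicates discussed above
CREATE INDEX redline_msattribute_ix_guess ON REDLINE_MSATTRIBUTE (CHANGE_RELEASE_DATE, CLASS, OBJECT_ID, ATTID);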
If the rows are in a different order in the two tables then the indexes in the two systems can have different clustering factors, and hence different estimated costs for index access. Check the table and index statistics, including the clustering factor, to see if there are significant differences.
Also, do either of the systems' explain plans mention dynamic sampling?
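A minimal sketch of the clustering factor check mentioned above, run in each environment (using the table name from the other answers):
-- Compare these numbers between the two environments
select index_name, clustering_factor, num_rows, last_analyzed
from user_indexes
where table_name = 'REDLINE_MSATTRIBUTE';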
When Oracle has an index and decides to use or not use it, it might be because:
1) You may have a different setting for OPTIMIZER_MODE - make sure it's not set to RBO (rule-based optimization).
2) The data is different - in this case Oracle might evaluate the query stats differently.
3) The data is the same but the stats are not up to date. In this case, gather stats:
exec dbms_stats.gather_table_stats(user, 'your3000Ktable', cascade => true);
4) There are a lot more reasons why Oracle will not use the index in one environment; I'd suggest comparing parameters (such as OPTIMIZER_INDEX_COST_ADJ etc...)
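A hedged sketch of that parameter comparison, run in both environments (requires access to v$parameter):
-- Optimizer-related parameters that commonly differ between environments
select name, value
from v$parameter
where name in ('optimizer_mode', 'optimizer_index_cost_adj', 'optimizer_index_caching');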
One immediate issue is this piece: SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355. This piece of code will run for every row coming back, and it doesn't need to. A better way of doing this is using a single Cartesian-joined sub-query, so it only runs once and returns a static value to compare against.
Example 1:
Select * From Table1, (Select Sysdate As Compare From Dual) Table2 Where Table1.Date > Table2.Compare
is always faster than
Select * from Table1 Where Date > Sysdate -- Sysdate gets called for each row, as it is a dynamic, function-based value
The earlier example resolves once to a literal and is drastically faster, and I believe this is definitely one piece hurting your query and forcing a table scan.
I also believe this is a more efficient way to execute the query.
Select
    REDLINE_MSATTRIBUTE.ATTID,
    REDLINE_MSATTRIBUTE.VALUE
From
    REDLINE_MSATTRIBUTE,
    (SELECT ATTID,
            CHANGE_ID,
            MIN(CHANGE_RELEASE_DATE) RELEASE_DATE
     FROM REDLINE_MSATTRIBUTE,
          (SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355) T_COMPARE
     WHERE CLASS = 9000
       AND OBJECT_ID = 32718015
       AND CHANGE_RELEASE_DATE > T_COMPARE.RELEASE_DATE
       AND ATTID IN (1564, 1565)
     GROUP BY ATTID,
              CHANGE_ID) T_DYNAMIC
Where
    REDLINE_MSATTRIBUTE.ATTID = T_DYNAMIC.ATTID
    And REDLINE_MSATTRIBUTE.CHANGE_ID = T_DYNAMIC.CHANGE_ID
    And REDLINE_MSATTRIBUTE.CHANGE_RELEASE_DATE = T_DYNAMIC.RELEASE_DATE
    And CLASS = 9000
    And OBJECT_ID = 32718015
    And OLD_VALUE = 'Y'
Order By
    REDLINE_MSATTRIBUTE.ATTID,
    REDLINE_MSATTRIBUTE.VALUE;

Best way to get distinct values from large table

I have a db table with about 10 or so columns, two of which are month and year. The table has about 250k rows now, and we expect it to grow by about 100-150k records a month. A lot of queries involve the month and year columns (e.g., all records from March 2010), so we frequently need to get the available month and year combinations (i.e., do we have records for April 2010?).
A coworker thinks that we should have a separate table from our main one that only contains the months and years we have data for. We only add records to our main table once a month, so it would just be a small update on the end of our scripts to add the new entry to this second table. This second table would be queried whenever we need to find the available month/year entries on the first table. This solution feels kludgy to me and a violation of DRY.
What do you think is the correct way of solving this problem? Is there a better way than having two tables?
Using a simple index on the required columns (Year and Month) should greatly improve either a DISTINCT or a GROUP BY query.
I would not go with a secondary table, as this adds extra overhead to maintenance (inserts/updates/deletes will require that you validate the secondary table).
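For example, a minimal sketch, assuming the main table is named dbo.Records (a hypothetical name):
CREATE NONCLUSTERED INDEX IX_Records_Year_Month ON dbo.Records ([Year], [Month]);
-- Either of these can now be answered from the small index instead of the big table:
SELECT DISTINCT [Year], [Month] FROM dbo.Records;
SELECT [Year], [Month] FROM dbo.Records GROUP BY [Year], [Month];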
EDIT:
You might even want to consider indexed views; see Improving Performance with SQL Server 2005 Indexed Views.
Make sure to have a clustered index on those columns, partition your table on these date columns, and place the data files on different disk drives.
I believe keeping your index fragmentation low is your best shot.
I also believe having a physical view with the desired select is not a good idea, because it adds insert/update overhead.
At 100-150k records a month, that is on average about 3.5 inserts per minute, or about 17 seconds between each insert (on average; please correct me if I'm wrong).
The question is: are you selecting more often than every 17 seconds?
That's the key thought.
Hope it helped.
Use a 'Materialized View', also called an 'Indexed View with Schema Binding', and then index this view. When you do this SQL server will essentially create and maintain the data in a secondary table behind the scenes and choose to use the index on this table when appropriate.
This is similar to what your co-worker suggested; the advantage is that you won't need to add logic to your query to take advantage of it. SQL Server will do this when it creates a query plan, and it will also automatically maintain the data in the indexed view.
Here is how you would accomplish this: create a view that returns the distinct [month] [year] values and then index [year] [month] on the view. Again SQL Server will use the tiny index on the view and avoid the table scan on the big table.
Because SQL Server will not let you index a view that uses the DISTINCT keyword, use GROUP BY [year],[month] instead and include COUNT_BIG(*) in the SELECT. It will look something like this:
CREATE VIEW dbo.vwMonthYear WITH SCHEMABINDING
AS
SELECT
[year],
[month],
COUNT_BIG(*) [MonthCount]
FROM [dbo].[YourBigTable]
GROUP BY [year],[month]
GO
CREATE UNIQUE CLUSTERED INDEX ICU_vwMonthYear_Year_Month
ON [dbo].[vwMonthYear](Year,Month)
Now when you SELECT DISTINCT [Year],[Month] on the big table, the query optimizer will scan the tiny index on the view instead of scanning millions of records on the big table.
SELECT DISTINCT
[year],
[month]
FROM YourBigTable
This technique took me from 5 million reads with an estimated I/O of 10.9 to 36 reads with an estimated I/O of 0.003. The overhead on this will be that of maintaining an additional index, so each time the large table is updated the index on the view will also be updated.
If you find this index substantially slows down your load times, drop the index, perform your data load, and then recreate it.
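A sketch of that load pattern, reusing the names from the full working example below:
DROP INDEX ICU_vwMonthYear_Year_Month ON dbo.vwMonthYear;
-- ... perform the data load here ...
CREATE UNIQUE CLUSTERED INDEX ICU_vwMonthYear_Year_Month ON [dbo].[vwMonthYear](Year,Month);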
Full working example:
CREATE TABLE YourBigTable(
YourBigTableID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_YourBigTable_YourBigTableID PRIMARY KEY,
[Year] INT,
[Month] INT)
GO
CREATE VIEW dbo.vwMonthYear WITH SCHEMABINDING
AS
SELECT
[year],
[month],
COUNT_BIG(*) [MonthCount]
FROM [dbo].[YourBigTable]
GROUP BY [year],[month]
GO
CREATE UNIQUE CLUSTERED INDEX ICU_vwMonthYear_Year_Month ON [dbo].[vwMonthYear](Year,Month)
SELECT DISTINCT
[year],
[month]
FROM YourBigTable
-- Actual execution plan shows SQL Server scanning ICU_vwMonthYear_Year_Month
Create a materialized indexed view of:
SELECT DISTINCT
MonthCol, YearCol
FROM YourTable
You will now get access to the pre-computed distinct values without going through the work every time.
Make the date the first column in the table's clustered index key. This is very typical for historic data, because most, if not all, queries are interested in specific ranges, and a clustered index on time can address this. All queries like 'month of May' need to be addressed as ranges, e.g.: WHERE DATECOLKEY BETWEEN '05/01/2010' AND '06/01/2010'. Answering a question like 'are there any records in May' will involve a simple seek into the clustered index.
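A minimal sketch of this design (the table and column names are hypothetical):
CREATE CLUSTERED INDEX IX_Records_DateColKey ON dbo.Records (DateColKey);
-- 'Are there any records in May 2010?' becomes a simple range seek:
SELECT TOP 1 1 FROM dbo.Records WHERE DateColKey >= '20100501' AND DateColKey < '20100601';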
While this may seem complicated to a programmer's mind, it is the optimal way to approach a database design problem.

What is the fastest way of getting a table record count with a condition on SQL Server?

As per the subject, I am looking for a fast way to count records in a table with a WHERE condition, without a table scan.
There are different methods; the most reliable one is
Select count(*) from table_name
But other than that you can also use one of the followings
select sum(1) from table_name
select count(1) from table_name
select rows from sysindexes where object_name(id)='table_name' and indid<2
exec sp_spaceused 'table_name'
DBCC CHECKTABLE('table_name')
The last two need sysindexes to be up to date; run the following to achieve this. If you don't update it, it is highly likely you'll get wrong results, but as an approximation they might actually work.
DBCC UPDATEUSAGE ('database_name','table_name') WITH COUNT_ROWS
EDIT: Sorry, I did not read the part about counting with a certain clause. I agree with Cruachan: the solution for your problem is proper indexes.
The following page list 4 methods of getting the number of rows in a table with commentary on accuracy and speed.
http://blogs.msdn.com/b/martijnh/archive/2010/07/15/sql-server-how-to-quickly-retrieve-accurate-row-count-for-table.aspx
This is the one Management Studio uses:
SELECT CAST(p.rows AS float)
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS idx ON idx.object_id = tbl.object_id and idx.index_id < 2
INNER JOIN sys.partitions AS p ON p.object_id=CAST(tbl.object_id AS int)
AND p.index_id=idx.index_id
WHERE ((tbl.name=N'Transactions'
AND SCHEMA_NAME(tbl.schema_id)='dbo'))
Simply, ensure that your table is correctly indexed for the where condition.
If you're concerned about this sort of performance, the approach is to create indexes which incorporate the field in question. For example, if your table contains a primary key of foo, plus fields bar, parrot and shrubbery, and you know that you're going to need to regularly pull back records using a condition on shrubbery that only needs data from this field, you should set up a compound index of [shrubbery, foo]. This way the rdbms only has to query the index and not the table. Indexes, being tree structures, are far faster to query against than the table itself.
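A minimal sketch of that compound index (the table name wood is hypothetical, since the original isn't given):
CREATE INDEX ix_wood_shrubbery_foo ON wood (shrubbery, foo);
-- This count can now be satisfied from the index alone, without touching the table:
SELECT COUNT(*) FROM wood WHERE shrubbery = 'knee-high';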
How much actual activity the rdbms needs depends on the rdbms itself and precisely what information it puts into the index. For example, a SELECT COUNT(*) on an unindexed table without a WHERE condition will, on most rdbms's, return instantly, as the record count is held at the table level and a table scan is not required. Analogous considerations may hold for index access.
Be aware that indexes do carry a maintenance overhead in that if you update a field the rdbms has to update all indexes containing that field too. This may or may not be a critical consideration, but it's not uncommon to see tables where most activity is read and insert/update/delete activity is of lesser importance which are heavily indexed on various combinations of table fields such that most queries will just use the indexes and not touch the actual table data itself.
ADDED: If you are using indexed access on a table that does have significant insert/update/delete activity then just make sure you are scheduling regular maintenance. Tree structures, i.e. indexes, are most efficient when balanced, and with significant insert/update/delete activity periodic maintenance is needed to keep them this way.
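A hedged sketch of such maintenance on SQL Server 2005 and later (the object name is hypothetical):
ALTER INDEX ALL ON dbo.wood REORGANIZE;  -- light-touch defragmentation
-- or, when fragmentation is heavy:
ALTER INDEX ALL ON dbo.wood REBUILD;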

SQL Server Indexes Aren't Helping

I have a table (SQL 2000) with over 10,000,000 records. Records get added at a rate of approximately 80,000-100,000 per week. Once a week a few reports get generated from the data. The reports are typically fairly slow to run because there are few indexes (presumably to speed up the INSERTs). One new report could really benefit from an additional index on a particular "char(3)" column.
I've added the index using Enterprise Manager (Manage Indexes -> New -> select column, OK), and even rebuilt the indexes on the table, but the SELECT query has not sped up at all. Any ideas?
Update:
Table definition:
ID, int, PK
Source, char(3) <--- column I want indexed
...
About 20 different varchar fields
...
CreatedDate, datetime
Status, tinyint
ExternalID, uniqueidentifier
My test query is just:
select top 10000 [field list] from [table] where Source = 'abc'
You need to look at the query plan and see if it is using the new index. If it isn't, there are a couple of things to check. One: it could be using a cached query plan that has not been invalidated since the new index was created. If that is not the case, you can also try an index hint [ With (Index (yourindexname)) ].
10,000,000 rows is not unheard of; it should read that out pretty fast.
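A minimal sketch of that hint against this table (the index name IX_BigTable_Source is hypothetical):
select top 10000 [field list]
from BigTable with (index (IX_BigTable_Source))
where Source = 'abc'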
Use the Show Execution Plan in SQL Query Analyzer to see if the index is used.
You could also try making it a clustered index if it isn't already.
For a table of that size your best bet is probably going to be partitioning your table and indexes.
select top 10000
How unique are your sources? Indexes on fields that have very few distinct values are usually ignored by the SQL engine, and they can make queries slower. You might want to remove that index and see if the query is faster if your Source field only has a handful of values.
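A quick sketch of checking that selectivity (the table name is hypothetical):
-- If only a handful of values come back, the Source index is unlikely to help
select Source, count(*) as RowsPerSource
from BigTable
group by Source
order by count(*) desc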
