Indexing a datetime column on a very wide table - SQL Server

I have a table with about 400 columns and 4 million rows in SQL Server 2012.
The only purpose of this table is to be used by a reporting tool. The table is refreshed (dropped and recreated) every night via a scheduled job, so there are no updates/inserts/deletes.
There is a Date column with datetime as its datatype. I have created a clustered index on this date column, but it only seemed to help a little. (There won't be any other conditions in the WHERE clause, so I haven't included any other columns in the index.)
The query sent by the reporting tool is like:
select *  -- all columns listed
from mytable
where date >= '01/01/2010' and date <= '12/01/2010'
It takes about 10 minutes to retrieve everything that falls in the above date range, which is about a million rows.
I need to get this under a minute if I can, or as close to that as possible.
If I can get some ideas that might help me achieve this, I would greatly appreciate it.
I have tried the following, but saw no significant performance gain:
-changing the datatype from 'Datetime' to 'Date'/'varchar'/'int'
-creating a nonclustered index on the same column
-creating a clustered/nonclustered index including other columns to make it unique
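For reference, a minimal sketch of the clustered index described above (the index name is illustrative; the table and column names are from the question):
-- Clustered index on the datetime column used in the WHERE clause
CREATE CLUSTERED INDEX CIX_mytable_date ON mytable ([date]);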

A million rows may just plain be a data volume thing.
Try:
select count(*) from mytable where date>='01/01/2010' and date <='12/01/2010'
If that is fast, then it is not an index issue; the time is going into returning a million wide rows to the client.

Related

Report is Slow in SSMS and timing out in SSRS

I have a table that stores about 250K rows of data. The select * query takes about 15 seconds to run in SSMS. Rendering the report in SSRS is not working: it just keeps loading for about 5 minutes, and sometimes works while other times it times out. Here are my two questions:
Is there a way to speed up the select query in SSMS?
Is there a fix for the issue I'm having in SSRS?
I have tried converting the query into a stored procedure; still the same problem in SSRS.
I have tried adding a new column to the table which assigns the same id to a group of rows based on the date column, i.e. if the year of the date column is 2017 then all rows with that year have id column = 1, and if it's 2018 then the id column = 2 (a sketch of this follows below).
I have tried limiting the number of rows on each page in SSRS, but still no luck.
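A minimal sketch of that grouping-column idea (the table name is hypothetical, and the 2016 base is inferred from the 2017 -> 1, 2018 -> 2 mapping described above):
-- Hypothetical persisted computed column grouping rows by year of [Date]
ALTER TABLE dbo.MyReportTable
    ADD YearGroupId AS (YEAR([Date]) - 2016) PERSISTED;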

Save result of select statement into wide table SQL Server

I have read about the possibility of creating wide tables (30,000 columns) in SQL Server (1).
But how do I actually save the result of a select statement (one that has 1024+ columns) into a wide table?
Because if I do:
Select *
Into wide_table
From (
**Select statement with 1024+ columns**
) b
I get: CREATE TABLE failed because column 'c157' in table 'wide_table' exceeds the maximum of 1024 columns.
And will I be able to query that table and all its columns in a regular manner?
Thank you for your help!
You are right that you are allowed to create a table with 30,000 columns, but you can SELECT or INSERT 'only' 4096 columns in one statement.
So, in the case of SELECT, you will need to get the columns in parts or concatenate the results. None of this seems practical, easy, or performance-efficient.
If you are going to have so many columns, maybe it would be better to UNPIVOT the data and normalize it further, as sketched below.
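A minimal sketch of the UNPIVOT idea (table and column names are illustrative; the unpivoted columns must share a compatible datatype):
-- Turn wide columns c1..c3 into narrow (id, col_name, col_value) rows
SELECT id, col_name, col_value
FROM wide_source
UNPIVOT (col_value FOR col_name IN (c1, c2, c3)) AS u;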

Tuning Select statement to obtain faster results

I have benefited from this website for a long time now. This is my first question on the site. It is regarding performance tuning a reporting query. Here it goes.
SELECT Count(b1.primkey)
from tableA b1 --WITH (NOLOCK)
join tableA b2 --WITH (NOLOCK)
on b1.email = b2.email
and DateDiff(day, b2.BookedDate, b1.BookedDate) > 1
tableA has around 7 million rows. Email is a varchar(100) field. BookedDate is a datetime field. primkey is a primary key column that is an int.
My purpose in writing this query is to count the entries that have the same email ids but came in more than one day apart. This query takes about 45 minutes to run, and I really want to reduce its execution time.
Since this is for reporting, I tried in vain to use the --WITH (NOLOCK) option to improve the read time. I have a columnstore index on tableA and I know that it is being used by the SQL optimizer - I can see it in the execution plan. I am using SQL Server 2012.
Can someone tell me, in such a case, what would be better: a nonclustered index on email or a nonclustered columnstore index on tableA?
Please help me.
Your query is relatively complex. You are essentially joining a table that has 7 million records to itself, on a column that is not unique.
How about the following query instead:
select Email
from TableA
group by Email
having MAX(BookedDate) > DATEADD(day, 1, MIN(BookedDate))
Also make sure you have an index on Email that includes BookedDate.
Hope this helps.
You have 3 options here:
1. Create a clustered index on the email field, at least for the larger table. But I suppose there are other queries running on these tables, and the clustered index is needed on other fields.
2. Move emails to another table, and store email ids in TableA and TableB; a join on an int field would be much faster than on varchar fields (see the sketch after this list).
3. Create an index on the email field with BookedDate as an included column (no need to include primkey; you can count on another field, or use count(*)). Code: create index idx_email on TableA (Email) include (BookedDate)
I think the third option is the one you should go with. There's not much work to be done, and there will be a great performance gain. The only problem is that an index on a varchar field will take a lot of space and impact insert/update operations; but you said that this is a reporting db, so I think you can allow that.
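A rough sketch of option 2 (the lookup table name and key column are illustrative):
-- Normalize emails into a lookup table and join on the int key instead
CREATE TABLE EmailLookup (
    EmailId INT IDENTITY(1, 1) PRIMARY KEY,
    Email VARCHAR(100) NOT NULL UNIQUE
);
INSERT INTO EmailLookup (Email)
SELECT DISTINCT Email FROM TableA;
-- TableA would then store EmailId instead of the varchar Email column.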

Indexing on DateTime and VARCHAR fields in SQL Server 2000, which one is more efficient?

We have a CallLog table in Microsoft SQL Server 2000. The table contains a CallEndTime field whose type is DATETIME, and it's an indexed column.
We usually delete free-of-charge calls and generate a monthly fee statistics report and a call detail record report; all the SQL uses CallEndTime as the query condition in the WHERE clause. Because a lot of records exist in the CallLog table, the queries are slow, so we want to optimize, starting with indexing.
Question
Would it be more efficient to query an extra indexed VARCHAR column CallEndDate?
Such as
-- DATETIME based query
SELECT COUNT(*) FROM CallLog WHERE CallEndTime BETWEEN '2011-06-01 00:00:00' AND '2011-06-30 23:59:59'
-- VARCHAR based queries
SELECT COUNT(*) FROM CallLog WHERE CallEndDate BETWEEN '2011-06-01' AND '2011-06-30'
SELECT COUNT(*) FROM CallLog WHERE CallEndDate LIKE '2011-06%'
SELECT COUNT(*) FROM CallLog WHERE CallEndMonth = '2011-06'
It has to be the datetime. Dates are essentially stored as a number in the database, so it is relatively quick to see if the value is between two numbers.
If I were you, I'd consider splitting the data over multiple tables (by month, year, or whatever) and creating a view to combine the data from all those tables. That way, any functionality which needs the entire data set can use the view, and anything which only needs a month's worth of data can access the specific table, which will be a lot quicker as it will contain much less data. A sketch of this follows.
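A minimal sketch of that approach on SQL Server 2000 (table names, the key column, and the CHECK ranges are illustrative; this is the classic partitioned-view pattern):
-- One table per month, constrained to that month's CallEndTime range
CREATE TABLE CallLog_2011_06 (
    CallId INT NOT NULL PRIMARY KEY,  -- illustrative key column
    CallEndTime DATETIME NOT NULL,
    CONSTRAINT CK_CallLog_2011_06
        CHECK (CallEndTime >= '20110601' AND CallEndTime < '20110701')
);
CREATE TABLE CallLog_2011_07 (
    CallId INT NOT NULL PRIMARY KEY,
    CallEndTime DATETIME NOT NULL,
    CONSTRAINT CK_CallLog_2011_07
        CHECK (CallEndTime >= '20110701' AND CallEndTime < '20110801')
);
GO
-- A view combining the monthly tables
CREATE VIEW CallLogAll AS
    SELECT CallId, CallEndTime FROM CallLog_2011_06
    UNION ALL
    SELECT CallId, CallEndTime FROM CallLog_2011_07;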
I think comparing DATETIME values is much faster than using the LIKE operator.
I agree with DoctorMick on splitting your DATETIME into persisted columns Year, Month, Day.
For your query which selects COUNT(*), check whether there is a table lookup node in the execution plan. If so, this might be because your CallEndTime column is nullable (you said that you have a [nonclustered] index on the CallEndTime column). If you make the column NOT NULL and rebuild that index, the count can be satisfied by an index scan, which is not so slow, and I think you will get much faster results.
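A minimal sketch of that change (the index name is hypothetical; DBCC DBREINDEX is the SQL Server 2000 way to rebuild an index):
-- Disallow NULLs (any existing NULLs must be cleaned up first), then rebuild
ALTER TABLE CallLog ALTER COLUMN CallEndTime DATETIME NOT NULL;
DBCC DBREINDEX ('CallLog', 'IX_CallLog_CallEndTime');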

Ensuring index is used on Informix DATETIME column

Say I have a table on an Informix DB:
create table password_audit (
    username CHAR(20),
    old_password CHAR(20),
    new_password CHAR(20),
    update_date DATETIME YEAR TO FRACTION
);
I need the update_date field to be in milliseconds (or seconds maybe; the same question applies) because there will be multiple updates of the password on the same day.
Say I have a nightly batch job that wants to retrieve all records from the password_audit table for today.
To increase performance, I want to put an index on the update_date column. If I do this:
CREATE INDEX pw_idx ON password_audit(update_date);
and run this SQL:
SELECT *
FROM password_audit
WHERE DATE(update_date) = mdy(?,?,?)
(where ?, ?, ? are the month, day and year passed in by my batch job)
then I don't think my index will be used - is that right?
I think I need to create an index something like this:
CREATE INDEX pw_idx ON password_audit(DATE(update_date));
- is that right?
Because you are forcing the server to convert two values to DATE, not DATETIME, it probably won't use an index.
You would do best to generate the SQL as:
SELECT *
FROM password_audit
WHERE update_date
BETWEEN DATETIME(2010-08-02 00:00:00.00000) YEAR TO FRACTION(5)
AND DATETIME(2010-08-02 23:59:59.99999) YEAR TO FRACTION(5)
That's rather verbose. Alternatively, and maybe slightly more easily:
SELECT *
FROM password_audit
WHERE update_date >= DATETIME(2010-08-02 00:00:00.00000) YEAR TO FRACTION(5)
AND update_date < DATETIME(2010-08-03 00:00:00.00000) YEAR TO FRACTION(5)
Both of these should be able to use the index on the update_date column. You can experiment with dropping some of the trailing zeroes from the literals, but I don't think you'll be able to remove them all - but see what the SET EXPLAIN ON output tells you.
Depending on your server version, you might need to run UPDATE STATISTICS after creating the index before the optimizer uses it at all; that is more of a problem on older (say 10.00 and earlier) versions of Informix than on the current (11.10 and later) versions.
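A minimal sketch of both checks in Informix (using the second query form above):
-- Write the chosen query plan to sqexplain.out for inspection
SET EXPLAIN ON;
SELECT *
FROM password_audit
WHERE update_date >= DATETIME(2010-08-02 00:00:00.00000) YEAR TO FRACTION(5)
  AND update_date < DATETIME(2010-08-03 00:00:00.00000) YEAR TO FRACTION(5);
SET EXPLAIN OFF;
-- Refresh optimizer statistics after creating the index
UPDATE STATISTICS FOR TABLE password_audit;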
I didn't see 'date_to_accounts_ni' defined in your password_audit table. What datatype/length is it?
Your first index on password_audit.update_date is adequate; why would you want to index (DATE(update_date))?
