Find and get data from a very large SQL table - sql-server

I have a simple DB table with ONLY 5 columns and no primary key, holding 75 million+ (7,50,01,771) rows. Yes, you read that correctly. It has one clustered index.
[Image: DB table columns]
[Image: clustered index]
If I write a simple SELECT query against it, it takes 7-8 minutes to return data. That leads to my question: what techniques can I apply to this table so that I get the data back in reasonable time?
In the actual scenario, this table is joined with 2 temp tables that hold data already filtered by WHERE clauses. Please find my query below for reference.
SELECT dt.ZipFrom, dt.ZipTo, dt.Total_time, sz.storelocation, sz.AcctShip, sz.Licensee, sz.Entity
FROM #Zips z
INNER JOIN DriveTime_ZIPtoZIP dt ON dt.ZipFrom = z.zip
INNER JOIN #storeZips sz ON dt.ZipTo = sz.zip
ORDER BY z.zip DESC, dt.Total_time ASC
Thanks

You can add indexes that match the join and WHERE conditions in the query. However, this comes at a cost: storage.
The ORDER BY clause also matters. If your query has to sort, you can choose the index key order to support it.
But do not forget the cost of indexing ...
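As a rough sketch of the indexing advice above: the question's index screenshot is not available, so this assumes the existing clustered index does not already lead on ZipFrom. Under that assumption, a nonclustered index keyed on the two join columns and carrying Total_time might look like the following. The index name is made up for illustration.

```sql
-- Hypothetical index name; table and column names come from the query in the question.
-- ZipFrom and ZipTo are the join columns. Total_time is included so the
-- query can be answered from the index without going back to the base table.
CREATE NONCLUSTERED INDEX IX_DriveTime_ZipFrom_ZipTo
    ON DriveTime_ZIPtoZIP (ZipFrom, ZipTo)
    INCLUDE (Total_time);
```

On a table of this size the index itself takes noticeable space and has to be maintained on every insert or update, which is the cost mentioned above.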

Related

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a datatable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the datatable. So I take the Guids I want to check (they come in chunks of 1000), and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could nudge it in the right direction?
I already tried adding a nonclustered index on the [Group] column, but it didn't help.
I'm not sure that WHERE IN will make full use of an index on [Group], or use it at all. However, if you had a second table containing the GUID values, and that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
    Guid varchar(255)
);
-- One parenthesized row per GUID, so each value becomes its own row
INSERT INTO #Guids (Guid)
VALUES
    (@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...;
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.

Improving performance by using XML fields - is it possible?

For a personal project I'm trying, for reasons of performance and security, to add display information in an XML field on the main table.
In this case Orders and Orderlines.
The current setup is:
tblOrders has 1 Index: Clustered on UID
tblOrderItems has 1 Index: Clustered on UID
tblOrder.Orderlines (XML) has 2 indexes: a primary and a secondary on PATH.
Now I'm trying the following 2 queries:
SELECT Ord.UID
, Item.DomainName
, Item.BasicInfo
, Item.Base
, Item.Period
FROM tblOrder Ord
INNER JOIN tblOrderItem Item
ON Item.OrderID = Ord.UID
WHERE Item.DomainName = 'domainname.com'
and
SELECT
UID
, c.value('(DomainName)[1]','nvarchar(150)') AS DomainName
, c.value('(BasicInfo)[1]','nvarchar(150)') AS [Basic Info]
, c.value('(Base)[1]','float') AS [Base Price]
, c.value('(Period)[1]','smallint') AS Period
FROM tblOrder
CROSS APPLY tblOrder.OrderLines.nodes('/OrderItem/line') as t(c)
WHERE c.value('(DomainName)[1]','nvarchar(150)') = 'domainname.com'
The first one has an average time of 4 ms while the second has an average time of 38 ms.
Both tests were done with the same data, which is not a lot since I'm still trying to decide what data model to use.
My question, at last: is it possible to rewrite the XML or the XML query to make it more performant than the regular inner join?
Thanks.
First of all, SQL Server is a relational database.
The whole point of relational databases is normalization.
First Normal Form:
A database is in first normal form if it satisfies the following conditions:
Contains only atomic values
There are no repeating groups
An atomic value is a value that cannot be divided.
By using an XML column you store non-atomic data. Then, when retrieving data, you need to parse it to get specific values. Parsing is almost always more expensive than a simple JOIN. So the first approach is better.
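If the XML column has to stay, one variation worth measuring is to move the filter into the XPath predicate of nodes(), so that value() is only evaluated for matching lines. This is a sketch against the tables and XML shape from the question, not a claim that it will beat the relational join. It also matches a line if any of its DomainName children equals the literal, whereas the original compared only the first one.

```sql
-- Sketch only: same table, column and XML paths as in the question.
-- The DomainName filter sits inside the nodes() path, so only matching
-- <line> elements are shredded into rows.
SELECT
    UID
    , c.value('(DomainName)[1]', 'nvarchar(150)') AS DomainName
    , c.value('(BasicInfo)[1]', 'nvarchar(150)') AS [Basic Info]
    , c.value('(Base)[1]', 'float') AS [Base Price]
    , c.value('(Period)[1]', 'smallint') AS Period
FROM tblOrder
CROSS APPLY tblOrder.OrderLines.nodes('/OrderItem/line[DomainName = "domainname.com"]') AS t(c);
```

Comparing the execution plans of both XML variants against the join version on realistic data volumes would show whether the predicate makes any difference.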

SQL query tune up while fetching data for counting the records by joining 2 tables

I am having a problem fetching a count of records while joining tables. Please see the query below:
SELECT
    H.EIN,
    H.OUC,
    (
        SELECT
            COUNT(1)
        FROM
            tbl_Checks C
            INNER JOIN INFM_People_OR.dbo.tblHierarchy P
                ON P.EIN = C.EIN_Checked
        WHERE
            (H.EIN IN (P.L1, P.L2)
            OR H.EIN = C.EIN_Checked)
            AND C.[Read] = 1
    ) AS [Read]
FROM
    INFM_People_OR.dbo.tblHierarchy H
    LEFT JOIN tbl_Checks C
        ON H.EIN = C.EIN_Checked
WHERE
    H.L1 = @EIN
GROUP BY
    H.EIN,
    H.OUC,
    C.Check_Date
Even when there are just 100 records, this query takes far too long (around 1 minute).
Please suggest a way to tune this query, as it is throwing an error in the front end.
Given just the query, a few things stick out as non-optimal:
Any use of OR tends to be slower, because it often keeps the optimizer from using a single index seek:
WHERE
(H.EIN IN (P.L1, P.L2)
OR H.EIN = C.EIN_Checked)
AND C.[Read] = 1
If there's any way to rework this based off of your data set so that both the IN and the OR are replaced with ANDs that would help.
Also, a local variable in the WHERE clause does not work well with the optimizer, because its value is not known when the plan is compiled and the row estimate falls back to an average:
WHERE
H.L1 = @EIN
Finally, make sure you have indexes (and hopefully these are integer fields) on the columns used in your joins and GROUP BY (H.EIN, H.OUC, C.Check_Date).
The size of the result set (100 records) doesn't matter as much as the size of the joined tables and whether or not they have appropriate indexes.
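One way to test whether the local variable is hurting the estimate is to ask for a fresh plan on each run with OPTION (RECOMPILE), which lets the optimizer see the current value of the variable. The sketch below uses only the outer part of the query. The int type and the value 12345 are placeholders, since the question does not show how the variable is declared.

```sql
-- Sketch only: the data type and the value are assumptions for illustration.
DECLARE @EIN int = 12345;

SELECT
    H.EIN,
    H.OUC
FROM
    INFM_People_OR.dbo.tblHierarchy H
WHERE
    H.L1 = @EIN
-- Build the plan for the current @EIN value instead of an average estimate
OPTION (RECOMPILE);
```

If the estimated row count in the plan changes noticeably with the hint, the variable was part of the problem. The trade-off is a compile on every execution.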
The estimated number of rows affected is 1196880, which is very high and results in a long execution time. I have also tried joining the tables only once, but that gives different output.
Please suggest a solution other than creating indexes, as I have already created a non-clustered index on tbl_Checks and it doesn't make any difference.
Below is the SQL execution plan.

Looping in an oracle table based on partitions

I am creating a query that takes data from multiple other databases through DB links.
One of the tables in the query, "ord", is immensely large, say more than 50 million rows.
Now, I want to write the query so that it traverses the data and retrieves the required rows based on the partitions defined in t1.
That is, if ord has 50 partitions with 1 million records each, I want to run my whole query on the first partition, get the result, then move to the 2nd, the 3rd and so on, up to the last partition.
How can I do that?
Please consider the Sample Query where from local DB I am accessing all the remote DBs using DB links.
This Query list all the orders which are active.
Select ord.order_no,
       ord.customer_id,
       ord.order_date,
       cust.customer_id,
       cust.cust_name,
       adr.street,
       adr.city,
       adr.state,
       ship.ship_addr_street,
       ship.ship_addr_city,
       ship.ship_addr_state,
       ship.ship_date
from order@ordDB ord
inner join customer@custDB cust on cust.customer_id = ord.customer_id
inner join address@adrDB adr on adr.address_id = cust.address_id
inner join shipment@shipDB ship on ship.shipment_id = ord.shipment_id
where ord.active = 'true';
Now there is a field "partition_key" defined in this table, and each key is associated with, say, 1 million rows. I want to restructure the query so that it takes one partition from Order at a time, runs the whole query on that partition, and moves on to the next partition until the whole table has been processed.
Please help me to create sample query.

Tuning Select statement to obtain faster results

I have benefited from this website for a long time now. This is my first question on the site. It is regarding performance tuning a reporting query. Here it goes.
SELECT Count(b1.primkey)
from tableA b1 --WITH (NOLOCK)
join tableA b2 --WITH (NOLOCK)
on b1.email = b2.email
and DateDiff(day, b2.BookedDate , b1.BookedDate) > 1
tableA has around 7 million rows. Email is a varchar(100) field. Bookeddate is a datetime field. primkey is a primary key column that is an int.
My purpose in writing this query is to count the entries that have the same email id but came in more than one day later. This query takes about 45 minutes to run. I really want to reduce the time it takes to execute.
Since this is for reporting, I tried in vain to use the WITH (NOLOCK) option to improve the read time. I have a columnstore index on tableA and I know that it is being used by the SQL optimizer, as I can see it in the execution plan. I am using SQL Server 2012.
Can someone tell me in such a case, what would be better? Using a nonclustered index on email or a nonclustered columnstore index on tableA?
Please help me.
Your query is relatively complex. You are essentially joining two tables that have 7 million records each on a column that is not unique.
How about the following query instead:
select Email
from TableA
group by Email
having MAX(BookedDate) > MIN(BookedDate) + 1
Also make sure you have an index on Email and BookedDate.
Hope this helps.
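Since the question asks for a count, the grouped query can be wrapped in a derived table, as in the sketch below. Note that this counts emails, not pairs of rows, and compares exact timestamps, so the number can differ from what the self-join with DATEDIFF in the question returns. The alias names are only for illustration.

```sql
-- Sketch only: counts emails whose earliest and latest bookings are
-- more than one day apart (BookedDate is a datetime, so + 1 adds one day).
SELECT COUNT(*) AS EmailCount
FROM (
    SELECT Email
    FROM TableA
    GROUP BY Email
    HAVING MAX(BookedDate) > MIN(BookedDate) + 1
) AS e;
```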
You have 3 options here:
1. Create a clustered index on the email field, at least for the larger table. But I suppose there are other queries running on these tables, and the clustered index is needed on other fields.
2. Move emails to another table, and store email ids in TableA and TableB; a join on an int field would be much faster than on varchar fields.
3. Create indexes on the email fields with BookedDate as an included column (no need to include primkey; you can count on another field, or use count(*)). Code: create index idx_email on TableA (Email) include (BookedDate)
I think that third option is the one you should go with. There's not much work to be done, and there will be great performance gain. The only problem is that index on varchar field will take a lot of space and impact insert/update operations; but you said that this is a reporting db, so I think you can allow that.
