I am creating a query which takes data from multiple other databases through DB links.
One of the tables in the query, "ord", is immensely large, say more than 50 million rows.
Now I want to build the query so that it traverses the data and retrieves the required rows based on the partitions defined on the ord table.
i.e. if ord has 50 partitions with 1 million records each, I want to run my whole query on the first partition and get the result, then move to the 2nd, 3rd, and so on, up to the last partition.
How can I do that?
Please consider the sample query below, where from the local DB I am accessing all the remote DBs using DB links.
This query lists all the orders which are active.
Select ord.order_no,
ord.customer_id,
ord.order_date,
cust.customer_id,
cust.cust_name,
adr.street,
adr.city,
adr.state,
ship.ship_addr_street,
ship.ship_addr_city,
ship.ship_addr_state,
ship.ship_date
from order@ordDB ord
inner join customer@custDB cust on cust.customer_id = ord.customer_id
inner join address@adrDB adr on adr.address_id = cust.address_id
inner join shipment@shipDB ship on ship.shipment_id = ord.shipment_id
where ord.active = 'true';
Now there is a field "partition_key" defined in this table, and each key is associated with, say, 1 million rows. I want to restructure the query so that at one time we take one partition from order, run the whole query on that partition, and then move to the next partition until the whole table has been covered.
Please help me to create a sample query.
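One way to approach this (a hedged PL/SQL sketch, not a tested solution: it assumes the partition_key column can be read through the DB link, and active_orders_stage is a hypothetical local staging table whose columns match the SELECT list) is to loop over the distinct partition keys and run the query once per key:

-- Sketch only: table and link names are taken from the question (note that
-- ORDER is a reserved word in Oracle, so the real table name may differ).
BEGIN
  FOR p IN (SELECT DISTINCT partition_key FROM order@ordDB) LOOP
    INSERT INTO active_orders_stage  -- hypothetical local table
    SELECT ord.order_no, ord.customer_id, ord.order_date,
           cust.customer_id, cust.cust_name,
           adr.street, adr.city, adr.state,
           ship.ship_addr_street, ship.ship_addr_city,
           ship.ship_addr_state, ship.ship_date
    FROM order@ordDB ord
    INNER JOIN customer@custDB cust ON cust.customer_id = ord.customer_id
    INNER JOIN address@adrDB adr ON adr.address_id = cust.address_id
    INNER JOIN shipment@shipDB ship ON ship.shipment_id = ord.shipment_id
    WHERE ord.active = 'true'
      AND ord.partition_key = p.partition_key;  -- one partition per pass
    COMMIT;  -- release undo and locks between partitions
  END LOOP;
END;
/

If the list of partition keys is known in advance, hard-coding it avoids the DISTINCT scan over the remote table.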
I have 13 rows in a json.gz file. I am running this MERGE statement.
MERGE INTO order_lines
USING (
SELECT
$1:tenant_id AS tenant_id,
$1:data:id AS id,
$1:data AS data,
$1:data_hash AS data_hash
FROM @s3_some_stage/dump/order_lines/2022-02-13_21-24-20_518.json.gz
) AS new_batch
ON
order_lines.tenant_id = new_batch.tenant_id
AND order_lines.id = new_batch.id
WHEN MATCHED AND order_lines.data_hash != new_batch.data_hash THEN
UPDATE SET
id = new_batch.id,
data = new_batch.data,
data_hash = new_batch.data_hash
WHEN NOT MATCHED THEN
INSERT (tenant_id, id, data, data_hash)
VALUES (new_batch.tenant_id, new_batch.id, new_batch.data, new_batch.data_hash);
It takes 15 seconds to run. When I initially ran, 3 rows updated and it took 15 seconds. When I ran it again, no rows changed but it still took 15 seconds on an S (small) warehouse. order_lines has 9.3M rows.
[{"number of rows inserted":0,"number of rows updated":0}]
SELECT
$1:tenant_id AS tenant_id,
$1:data:id AS id,
$1:data AS data,
$1:data_hash AS data_hash
FROM @s3_some_stage/dump/order_lines/2022-02-13_21-24-20_518.json.gz
Takes 600ms to run and has 13 rows. Pretty small file.
Going to the query profiler, it does show an execution time of 15 seconds, but looking at the nodes, the most expensive node is 129ms. Snowflake spent 14s in "Processing"; what does that mean?
The MERGE statement doesn't update any rows since the data_hash values are the same. So the MERGE statement is a no-op and I'd expect it to be very fast.
If I do a join between the staged file and the actual table, the filter returns in 400ms (13 rows). So why is the MERGE so slow?
WITH tmp as (
SELECT $1:tenant_id as tenant_id, $1:data:id::varchar AS id
from @s3_some_stage/dump/order_lines/2022-02-13_21-24-20_518.json.gz
)
select order_lines.id
from order_lines
right join tmp on
order_lines.tenant_id = tmp.tenant_id and order_lines.id = tmp.id;
MERGE is one of the most expensive operations in any database, if not the most expensive. The cost is compounded in this case by reading directly from a staged file rather than from a table.
Logically, every row must be examined and compared, although partition pruning eliminates some of this.
Also, I suggest you try loading the file into Snowflake first and then running the merge; a larger warehouse may help as well.
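A minimal sketch of that load-then-merge approach (order_lines_stage is a hypothetical table name, and the column types are assumptions based on the MERGE above; the COPY relies on the stage's default JSON file format, as the question's own SELECT does):

-- Hypothetical staging table; adjust types to the real schema.
CREATE OR REPLACE TRANSIENT TABLE order_lines_stage (
    tenant_id VARCHAR,
    id        VARCHAR,
    data      VARIANT,
    data_hash VARCHAR
);

-- Load the staged file once, with a COPY transformation.
COPY INTO order_lines_stage
FROM (
    SELECT $1:tenant_id, $1:data:id, $1:data, $1:data_hash
    FROM @s3_some_stage/dump/order_lines/2022-02-13_21-24-20_518.json.gz
);

-- Then MERGE from the local table instead of the stage.
MERGE INTO order_lines
USING order_lines_stage AS new_batch
ON order_lines.tenant_id = new_batch.tenant_id
   AND order_lines.id = new_batch.id
WHEN MATCHED AND order_lines.data_hash != new_batch.data_hash THEN
    UPDATE SET data = new_batch.data, data_hash = new_batch.data_hash
WHEN NOT MATCHED THEN
    INSERT (tenant_id, id, data, data_hash)
    VALUES (new_batch.tenant_id, new_batch.id, new_batch.data, new_batch.data_hash);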
I have a simple DB table with ONLY 5 columns and no primary key, holding 7,50,01,771 (roughly 75 million) rows. Yes, you read it correctly. It has one clustered index.
[screenshot: DB table columns]
[screenshot: clustered index]
If I write a simple select query to get data, it takes 7-8 minutes to return. Now you get my next question: what techniques can I apply to this DB table so that I can get the data in a reasonable time?
In the actual scenario, this table is joined with 2 temp tables that have WHERE clauses and filtered data. Please find my query below for reference.
SELECT dt.ZipFrom, dt.ZipTo, dt.Total_time, sz.storelocation, sz.AcctShip, sz.Licensee, sz.Entity
from #Zips z
INNER join DriveTime_ZIPtoZIP dt on zipFrom = z.zip
INNER join #storeZips sz on ZipTo = sz.zip
order by z.zip desc, total_time asc
Thanks
You can index according to the WHERE conditions in the query. However, this comes at a cost: storage.
The ORDER BY clause is also important. If you have to use ORDER BY in your query, you can index accordingly as well.
But do not forget the cost of indexing: every additional index consumes storage and slows down writes.
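For the query above, for example (a sketch assuming SQL Server, which the #Zips temp-table syntax suggests, and assuming ZipFrom and ZipTo are the join columns; the index name is made up):

-- Covering index for the join columns, with the sort column included
-- so the query does not have to touch the base table at all.
CREATE NONCLUSTERED INDEX IX_DriveTime_ZipFrom_ZipTo
    ON DriveTime_ZIPtoZIP (ZipFrom, ZipTo)
    INCLUDE (Total_time);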
I am trying to build a query that will generate a list of records based on the results of a very similar query.
Here are the details and examples
Query 1: Generate a list of part #'s in a specific location of the warehouse.
Query 2: Use the list of part #'s generated in #1 to show all locations for the list of part #'s, assuming they will be in both the location specified in #1 and other locations.
Query 1 looks like this:
Select
ItemMaster.ItemNo, BinInfo.BIN, ItemDetail.Qty, ItemDetail.Whouse_ID
From
((ItemDetail
Left Join
ItemMaster on ItemMaster.ID=ItemDetail.Item_ID)
Left Join
BinInfo on BinInfo.ID = ItemDetail.Bin_ID)
Where
ItemDetail.Whouse_ID = '1'
And BinInfo.Bin = 'VLM';
Query 2 needs to be almost identical except the ItemMaster.ItemNo list will come from query #1.
Any help here would be great. I don't know if I need to learn Unions, Nested Queries, or what.
Make sure that your first query returns the list of IDs that you need.
Then write the second query with the WHERE id IN (...) syntax:
SELECT * FROM table1 WHERE id IN
(SELECT id FROM table2 WHERE...) -- first query
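Applied to your tables (an untested sketch reusing the names from Query 1), Query 2 could look like this:

Select
ItemMaster.ItemNo, BinInfo.BIN, ItemDetail.Qty, ItemDetail.Whouse_ID
From
((ItemDetail
Left Join
ItemMaster on ItemMaster.ID = ItemDetail.Item_ID)
Left Join
BinInfo on BinInfo.ID = ItemDetail.Bin_ID)
Where
ItemMaster.ItemNo In
    (Select ItemMaster.ItemNo
    From
    ((ItemDetail
    Left Join
    ItemMaster on ItemMaster.ID = ItemDetail.Item_ID)
    Left Join
    BinInfo on BinInfo.ID = ItemDetail.Bin_ID)
    Where
    ItemDetail.Whouse_ID = '1'
    And BinInfo.Bin = 'VLM');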
I am working on a groups project. I have these tables:
I can get the number of members for each group by using count function :
SELECT COUNT(1) AS Counts FROM [Groups].[GroupMembers]
WHERE GroupId=Id;
Or I can add another column to the Groups table for counting, so that every time a new member joins the group, this field increases by one. Is it better to use the count function or to add another column for counting? In other words, what are the advantages and disadvantages of each method?
Creating a column to store the counts is not recommended at all.
When you want the count of each group, you can use a simple SELECT query:
SELECT G.groupid,
Count(userid)
FROM groups G
LEFT OUTER JOIN groupmembers GM
ON G.groupid = GM.groupid
GROUP BY G.groupid
In case you want to add a new column, you will require a trigger on the GroupMembers table to update the count column in the Groups table whenever a new user is added to any group.
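A minimal sketch of such a trigger, written in MySQL syntax since the next answer mentions MyISAM/InnoDB (member_count is a hypothetical column that would have to exist on the groups table):

-- Keeps the hypothetical member_count column in sync on inserts.
CREATE TRIGGER trg_groupmembers_insert
AFTER INSERT ON groupmembers
FOR EACH ROW
    UPDATE groups
    SET member_count = member_count + 1
    WHERE groupid = NEW.groupid;

A matching AFTER DELETE trigger would be needed to decrement the count when a member leaves.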
It depends on your table engine. If your table engine is MyISAM it would be much faster, because MyISAM simply reads the number of rows in the table from a stored value; the InnoDB engine, however, needs to do a full table scan. (Note that this stored-count shortcut only applies to COUNT(*) with no WHERE clause.)
It is not recommended to store a count inside the table itself, so if this is something you're worried about, use the MyISAM engine if possible.
Storing a value in the table would needlessly require an extra UPDATE query on each new/lost membership.
I am having a problem fetching a number of records while joining tables. Please see the query below:
SELECT
H.EIN,
H.OUC,
(
SELECT
COUNT(1)
FROM
tbl_Checks C
INNER JOIN INFM_People_OR.dbo.tblHierarchy P
ON P.EIN = C.EIN_Checked
WHERE
(H.EIN IN (P.L1, P.L2)
OR H.EIN = C.EIN_Checked)
AND C.[Read] = 1
) AS [Read]
FROM
INFM_People_OR.dbo.tblHierarchy H
LEFT JOIN tbl_Checks C
ON H.EIN = C.EIN_Checked
WHERE
H.L1 = @EIN
GROUP BY
H.EIN,
H.OUC,
C.Check_Date
Even though there are just 100 records, this query takes much more time (around 1 minute).
Please suggest a solution to tune this query, as it is throwing an error in the front end.
Given just the query, there are a few things that stick out as non-optimal:
Any use of OR will be slower:
WHERE
(H.EIN IN (P.L1, P.L2)
OR H.EIN = C.EIN_Checked)
AND C.[Read] = 1
If there's any way to rework this based on your data set so that both the IN and the OR are replaced with ANDs, that would help.
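Another common rework is to split the OR into a UNION so that each branch can use its own index seek. A generic sketch with a hypothetical table t (note that UNION also removes duplicate rows, so verify the semantics match your query):

-- Instead of: SELECT val FROM t WHERE a = 1 OR b = 2
SELECT val FROM t WHERE a = 1
UNION
SELECT val FROM t WHERE b = 2;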
Also, use of a local variable in the WHERE clause will not work well with the optimizer:
WHERE
H.L1 = @EIN
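One common mitigation, assuming this is SQL Server and @EIN is a local variable rather than a stored-procedure parameter, is OPTION (RECOMPILE), which lets the optimizer compile the plan using the variable's actual value. A hypothetical minimal example:

DECLARE @EIN INT = 12345;  -- made-up value and type
SELECT H.EIN, H.OUC
FROM INFM_People_OR.dbo.tblHierarchy H
WHERE H.L1 = @EIN
OPTION (RECOMPILE);  -- plan is compiled with the runtime value of @EIN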
Finally, make sure you have indexes (and hopefully these are integer fields) on the columns used in your joins and group bys (H.EIN, H.OUC, C.Check_Date).
The size of the result set (100 records) doesn't matter as much as the size of the joined tables and whether or not they have appropriate indexes.
The estimated number of rows affected is 1,196,880, which is very high, resulting in a long execution time for the query. I have also tried joining the tables only once, but that gives different output.
Please suggest a solution other than creating indexes, as I have already created a non-clustered index on the table tbl_Checks, but it doesn't make any difference.
Below is the SQL execution plan.