I have a table with a unique string column and a department description. The length of the unique string encodes the department hierarchy: 2 characters is the highest level and 4 characters is the lowest.
My goal is to create new columns that show the hierarchy levels and the corresponding department descriptions for each row, and to use these new columns as filters.
My SQL code is working; however, it takes more than 20 minutes to generate results for a 1300 row table.
Is there a better way to optimize this query? Note that I'm only using one table and joining multiple copies of it to build the final result.
SELECT
m.UniqueDescription as "Department Code",
m.DepartmentDescription as "Department",
Left(m.UniqueDescription,2) as "Level 2 Hierarchy",
Left(m.UniqueDescription,3) as "Level 3 Hierarchy",
Left(m.UniqueDescription,4) as "Level 4 Hierarchy",
l2.DepartmentDescription as "L2 Department",
l3.DepartmentDescription as "L3 Department",
l4.DepartmentDescription as "L4 Department"
FROM department_table m
LEFT JOIN department_table l2
ON Left(m.UniqueDescription,2) = l2.UniqueDescription
LEFT JOIN department_table l3
ON Left(m.UniqueDescription,3) = l3.UniqueDescription
LEFT JOIN department_table l4
ON Left(m.UniqueDescription,4) = l4.UniqueDescription
Below is the output that I would like to achieve:
Table Format
First, the table structure and the lack of numeric IDs are not good practice.
Check that the columns you join on are indexed.
Do not apply functions to the columns in your ON or WHERE clauses; doing so prevents the execution planner from using an index on those columns.
Instead of FUNCTION(LeftTable.Column) = value use LeftTable.Column = INVERSE_FUNCTION(value)
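To make the point about functions and indexes concrete, here is a minimal sketch. It uses SQLite through Python only because that is easy to run anywhere; the asker's database is not stated, so treat the planner output as illustrative. The index name idx_code and the sample rows are made up. A prefix test written with substr() on the column is compared with an equivalent range test on the bare column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department_table (UniqueDescription TEXT, DepartmentDescription TEXT)"
)
# Hypothetical index on the code column.
conn.execute("CREATE INDEX idx_code ON department_table (UniqueDescription)")
conn.executemany(
    "INSERT INTO department_table VALUES (?, ?)",
    [("AB", "Finance"), ("ABC", "Accounting"), ("ABCD", "Payroll"), ("XY", "Sales")],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Function wrapped around the column: the plain index cannot be used.
wrapped = (
    "SELECT DepartmentDescription FROM department_table "
    "WHERE substr(UniqueDescription, 1, 2) = 'AB'"
)
# Same prefix filter expressed as a range on the bare column.
bare = (
    "SELECT DepartmentDescription FROM department_table "
    "WHERE UniqueDescription >= 'AB' AND UniqueDescription < 'AC'"
)

rows_wrapped = sorted(conn.execute(wrapped).fetchall())
rows_bare = sorted(conn.execute(bare).fetchall())
plan_wrapped = plan(wrapped)
plan_bare = plan(bare)

print(plan_wrapped)  # a full SCAN step
print(plan_bare)     # a SEARCH step that uses idx_code
```

Both queries return the same rows; only the second form lets the planner seek into the index.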
Related
The Snowflake docs say:
First, prune micro-partitions that are not needed for the query.
Then, prune by column within the remaining micro-partitions.
What is meant by the second step?
Let's take the example table t1 shown in the link. In this example table I use the following query:
SELECT * FROM t1
WHERE
Date = '11/3' AND
Name = 'C'
Because of Date = '11/3', it would only scan micro-partitions 2, 3 and 4. Because of Name = 'C', it can prune even more and scan only micro-partitions 2 and 4.
So in the end only micro-partitions 2 and 4 would be scanned.
But where does the second step come into play? What is meant by "prune by column within the remaining micro-partitions"?
Does it mean that only rows 4, 5 and 6 in micro-partition 2 and row 1 in micro-partition 4 are scanned, because Date is my clustering key and is sorted, so you can prune even further with the date?
So in the end only 4 rows would be scanned?
Benefits of Micro-partitioning:
Columns are stored independently within micro-partitions, often referred to as columnar storage.
This enables efficient scanning of individual columns; only the columns referenced by a query are scanned.
It is recommended to avoid SELECT * and specify required columns explicitly.
It simply means to only select the columns that are required for the query. So in your example it would be:
SELECT col_1, col_2 FROM t1
WHERE
Date = '11/3' AND
Name = 'C'
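The two pruning steps can be pictured with a toy model. This is only a sketch of the idea, not Snowflake's storage format or API: each "micro-partition" is a Python dict holding one list per column, the data is invented (it does not reproduce the table in the docs), and the scan function is hypothetical. Step 1 skips partitions whose min/max range cannot contain the filter value; step 2 reads only the columns the query references.

```python
# Toy model only: a "micro-partition" is a dict of column name -> list of values.
PARTITIONS = [
    {"Date": ["11/2", "11/2"], "Name": ["A", "B"], "Country": ["UK", "SP"], "Amount": [10, 20]},
    {"Date": ["11/2", "11/3"], "Name": ["C", "C"], "Country": ["DE", "FR"], "Amount": [30, 40]},
    {"Date": ["11/3", "11/3"], "Name": ["A", "B"], "Country": ["NL", "UK"], "Amount": [50, 60]},
    {"Date": ["11/3", "11/4"], "Name": ["C", "Z"], "Country": ["SP", "DE"], "Amount": [70, 80]},
]

def scan(partitions, equals, select):
    """Return matching rows plus the (partition index, column) pairs that were read."""
    needed = sorted(set(select) | set(equals))
    touched = []
    rows = []
    for index, part in enumerate(partitions):
        # Step 1: partition pruning using per-column min/max ranges.
        # (A real system keeps this metadata precomputed; the toy recomputes it.)
        if any(not (min(part[c]) <= v <= max(part[c])) for c, v in equals.items()):
            continue
        # Step 2: column pruning. Only the referenced columns are read.
        touched.extend((index, c) for c in needed)
        for r in range(len(part[needed[0]])):
            if all(part[c][r] == v for c, v in equals.items()):
                rows.append(tuple(part[c][r] for c in select))
    return rows, touched

rows, touched = scan(PARTITIONS, {"Date": "11/3", "Name": "C"}, ["Name", "Country"])
print(rows)     # [('C', 'FR'), ('C', 'SP')]
print(touched)  # the Amount column never appears here
```

With SELECT * every column would land in the "needed" set, which is why listing only the required columns matters.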
I am trying to get the total of both the ItemDetail.Quantity column and the ItemDetail.NetPrice column. For the sake of example, let's say the quantities listed for the individual items are 5, 2, and 4 respectively. I am wondering if there is a way to display the quantity as 11 for a single ItemGroup.ItemGroupName.
The query I am using is listed below
select Location.LocationName, ItemDetail.DOB, SUM (ItemDetail.Quantity) as "Quantity",
ItemGroup.ItemGroupName, SUM (ItemDetail.NetPrice)
from ItemDetail
Join ItemGroupMember
on ItemDetail.ItemID = ItemGroupMember.ItemID
Join ItemGroup
on ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
Join Location
on ItemDetail.LocationID = Location.LocationID
Inner Join Item
on ItemDetail.ItemID = Item.ItemID
where ItemGroup.ItemGroupID = '78' and DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, Item.ItemName,
ItemDetail.NetPrice, ItemGroup.ItemGroupName
If you are using SQL Server 2012 or later, you can use SUM with an OVER clause to display the
details and the aggregates in the same query.
SUM(SalesYTD) OVER (ORDER BY DATEPART(yy, ModifiedDate))
Link :
https://learn.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15
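As a rough illustration of the windowed SUM idea, here is a sketch run against SQLite (3.25 or later) from Python rather than SQL Server. The schema is flattened into a single hypothetical ItemDetail table, and the item names and prices are made up; only the quantities 5, 2 and 4 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened, hypothetical schema: the real query joins several tables.
conn.execute(
    "CREATE TABLE ItemDetail (ItemName TEXT, ItemGroupName TEXT, Quantity INTEGER, NetPrice REAL)"
)
conn.executemany(
    "INSERT INTO ItemDetail VALUES (?, ?, ?, ?)",
    [("Latte", "Coffee", 5, 15.0), ("Mocha", "Coffee", 2, 8.0), ("Espresso", "Coffee", 4, 10.0)],
)

# Each detail row is kept, and the group totals are repeated next to it.
window_rows = conn.execute(
    """
    SELECT ItemName,
           Quantity,
           SUM(Quantity) OVER (PARTITION BY ItemGroupName) AS GroupQuantity,
           SUM(NetPrice) OVER (PARTITION BY ItemGroupName) AS GroupNetPrice
    FROM ItemDetail
    ORDER BY ItemName
    """
).fetchall()

for row in window_rows:
    print(row)
```

Every row carries the group quantity of 11 alongside its own quantity, so no GROUP BY is needed.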
We can't be certain without seeing sample data. But I suspect you need to remove some fields from your GROUP BY clause -- probably Item.ItemName and ItemDetail.NetPrice.
Generally, you won't GROUP BY a column that you are applying an aggregate function to in the SELECT -- as in SUM(ItemDetail.NetPrice). And it is not very common, in my experience, to GROUP BY columns that aren't included in the SELECT list - as you are doing with Item.ItemName.
I think you need to go back to basics and read about what GROUP BY does.
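A small sketch of why the extra GROUP BY columns matter, using SQLite from Python. The single flattened ItemDetail table, the item names and the prices are assumptions for illustration; the quantities 5, 2 and 4 are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the joined tables in the question.
conn.execute(
    "CREATE TABLE ItemDetail (ItemName TEXT, ItemGroupName TEXT, Quantity INTEGER, NetPrice REAL)"
)
conn.executemany(
    "INSERT INTO ItemDetail VALUES (?, ?, ?, ?)",
    [("Latte", "Coffee", 5, 15.0), ("Mocha", "Coffee", 2, 8.0), ("Espresso", "Coffee", 4, 10.0)],
)

# Grouping by ItemName and NetPrice as well yields one row per item.
too_fine = conn.execute(
    """
    SELECT ItemGroupName, SUM(Quantity)
    FROM ItemDetail
    GROUP BY ItemGroupName, ItemName, NetPrice
    """
).fetchall()

# Grouping only by the group name collapses the items into one row.
by_group = conn.execute(
    """
    SELECT ItemGroupName, SUM(Quantity), SUM(NetPrice)
    FROM ItemDetail
    GROUP BY ItemGroupName
    """
).fetchall()

print(len(too_fine))  # 3
print(by_group)       # [('Coffee', 11, 33.0)]
```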
First of all welcome to the overflow...
Second: The answer is going to be "It depends"
Any time you aggregate data you will need to Group by the other fields in the query, and you have that in the query. The gotcha is what happens when data is spread across multiple locations.
My suggestion is to rethink your problem and see if you really need these other fields in the query. This will depend on what the person using the data really wants to know.
Do they need to know how many of item X there are, or do they really need to know that item X is spread out over three sites?
You might find you are better off with two smaller queries.
I have two SQL queries doing the same thing; the first takes 13 seconds to execute while the second takes 1 second. Is there any reason why?
Not all the IDs in ProcessMessages necessarily have data in ProcessMessageDetails.
-- takes 13 sec to execute
Select * from dbo.ProcessMessages t1
join dbo.ProcessMessageDetails t2 on t1.ProcessMessageId = t2.ProcessMessageId
Where Id = 4 and Isdone = 0
--takes under a sec to execute
Select * from dbo.ProcessMessageDetails
where ProcessMessageId in ( Select distinct ProcessMessageId from dbo.ProcessMessages t1
Where Id = 4 and Isdone = 0 )
I have a clustered index on t1.ProcessMessageId (PK) and a nonclustered index on t2.ProcessMessageId (FK).
I would need the actual execution plans to tell you exactly what SQL Server is doing behind the scenes. I can tell you that these queries aren't doing exactly the same thing.
The first query is going through and finding all of the items that meet the conditions for t1 and finding all of the items for t2 and then finding which ones match and joining them together.
The second one is saying: first find all of the items from t1 that meet my criteria, and then find the items in t2 that have one of these IDs.
Depending on your statistics, available indexes, hardware and table sizes, SQL Server may decide to do different types of scans or seeks to pick data for each part of the query, and it may also decide to join the data together in a certain way.
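One visible difference between the two queries is the shape of the result, which this sketch shows with SQLite from Python. It says nothing about SQL Server's timings or plans. The column lists (DetailId, Payload) and the sample rows are assumptions; only the table names, ProcessMessageId, Id and Isdone come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProcessMessages (ProcessMessageId INTEGER PRIMARY KEY, Id INTEGER, Isdone INTEGER);
    CREATE TABLE ProcessMessageDetails (DetailId INTEGER PRIMARY KEY, ProcessMessageId INTEGER, Payload TEXT);
    INSERT INTO ProcessMessages VALUES (1, 4, 0), (2, 4, 1), (3, 4, 0);
    INSERT INTO ProcessMessageDetails VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 2, 'c');
    """
)

# JOIN form: SELECT * returns the columns of both tables.
join_rows = conn.execute(
    """
    SELECT * FROM ProcessMessages t1
    JOIN ProcessMessageDetails t2 ON t1.ProcessMessageId = t2.ProcessMessageId
    WHERE Id = 4 AND Isdone = 0
    ORDER BY t2.DetailId
    """
).fetchall()

# IN form: SELECT * returns only the columns of ProcessMessageDetails.
in_rows = conn.execute(
    """
    SELECT * FROM ProcessMessageDetails
    WHERE ProcessMessageId IN (
        SELECT ProcessMessageId FROM ProcessMessages WHERE Id = 4 AND Isdone = 0
    )
    ORDER BY DetailId
    """
).fetchall()

print(join_rows)  # [(1, 4, 0, 10, 1, 'a'), (1, 4, 0, 11, 1, 'b')]
print(in_rows)    # [(10, 1, 'a'), (11, 1, 'b')]
```

The same detail rows qualify in both cases, but the JOIN version carries the extra ProcessMessages columns on every row.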
The answer is fairly simple: the first query generates more rows than the second, so it takes longer to search through them. That is why the first query took 13 seconds and the second only one second.
So the general advice is to apply your conditions before the join; otherwise the number of rows grows and the search over the joined rows takes longer.
This might be a nightmare.
Let's say I have two rows of data in each of two different tables, each row containing a single character. In Table1, A is Row1 and B is Row2; in Table2 the order is reversed: B is Row1 and A is Row2.
I also have a third table that contains three columns. The first two are columns to be joined on, and the third is the resulting value, depending on what was joined in the first two columns.
A,A= 1
A,B=.8
A,C=.2
B,A=.8
B,B= 1
B,C=.6
C,A=.2
C,B=.6
C,C= 1
What I'm trying to do, in essence, is find the highest-rated pairs from Table1 and Table2 by using the associated values in Table3. I expected to get:
A,A= 1
B,B= 1
That is because of the matching A's in Table1 and Table2 and the matching B's in Table1 and Table2. Instead, by just aimlessly joining the tables, I get this:
A,A= 1
A,B=.8
B,A=.8
B,B= 1
However, I'm getting ALL possible pairs, and that won't work. And the problem here is that I cannot do a direct JOIN between Table1+2, because a value within Table1 might not match up with Table2, for instance...
Row1 and Row2 in Table1 is A,B and Row1 and Row2 in Table2 is B,C. If I do a direct JOIN, values A and C won't line up with each other, leaving me only with the pairs of B.
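The "all possible pairs" behaviour is easy to reproduce. The sketch below uses SQLite from Python instead of Access, and the column names val, left_val, right_val and score are made up since the question does not name them. The helper all_pairs is hypothetical.

```python
import sqlite3

# Scores from the question's third table.
SCORES = [
    ("A", "A", 1.0), ("A", "B", 0.8), ("A", "C", 0.2),
    ("B", "A", 0.8), ("B", "B", 1.0), ("B", "C", 0.6),
    ("C", "A", 0.2), ("C", "B", 0.6), ("C", "C", 1.0),
]

def all_pairs(table1_values, table2_values):
    """Join Table1 and Table2 through Table3 and return every scored pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (val TEXT)")
    conn.execute("CREATE TABLE Table2 (val TEXT)")
    conn.execute("CREATE TABLE Table3 (left_val TEXT, right_val TEXT, score REAL)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(v,) for v in table1_values])
    conn.executemany("INSERT INTO Table2 VALUES (?)", [(v,) for v in table2_values])
    conn.executemany("INSERT INTO Table3 VALUES (?, ?, ?)", SCORES)
    return conn.execute(
        """
        SELECT t1.val, t2.val, t3.score
        FROM Table1 t1
        JOIN Table3 t3 ON t3.left_val = t1.val
        JOIN Table2 t2 ON t2.val = t3.right_val
        ORDER BY t1.val, t2.val
        """
    ).fetchall()

same_values = all_pairs(["A", "B"], ["B", "A"])
mixed_values = all_pairs(["A", "B"], ["B", "C"])
print(same_values)   # four pairs, not just A,A and B,B
print(mixed_values)  # again every combination of a Table1 row with a Table2 row
```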
I thought of one more problem with this, though! In trying to use a subquery, the subquery would be constantly re-run... meaning that previously selected rows would then be up for grabs again the next time, leading to incorrect values.
For instance, with A,B and B,C... I would expect to get this returned via subqueries:
A,B=.8
B,B= 1
Unless, of course, there's a way of disqualifying a row from being used again.
Any suggestions or ideas? I'm using Access but I'm sure the concepts apply to any database solution.
I have three tables:
documents
attributes
attributevalues
Documents can have many attributes,
and these attributes have values in the attributevalues table.
What I want is a single query that returns all documents together with their assigned attribute values, one row per document.
(I assume every document has the same attributes assigned; I don't need the complexity of different attributes for now.)
For example:
docid attvalue1 attvalue2
1 2 2
2 2 2
3 1 1
How can I do that in a single query?
Off the top of my head, I don't think you can do this without dynamic SQL.
The crux of the Entity-Attribute-Value (EAV) technique (which is what you are using) is to store columns as rows. What you want to do is convert those rows back to columns for the purpose of this query. Using PIVOT makes this possible. However, PIVOT requires knowing the number of rows that need to be converted to columns at the time the query is written. So assuming you are using EAV because you need flexible attributes/values, you won't know this information when you write the query.
So the solution would be to use dynamic SQL in conjunction with PIVOT. Did a quick search and this looks promising (didn't really read the whole thing):
http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx
For the record, I am not a fan of dynamic SQL and would recommend finding another approach to the larger problem (e.g. pivoting in application code).
If you know all the attributes (and their IDs) at design-time:
SELECT d.docid,
a1.attvalue AS attvalue1,
a2.attvalue AS attvalue2
FROM documents d
JOIN attributevalues a1 ON d.docid = a1.docid
JOIN attributevalues a2 ON d.docid = a2.docid
WHERE a1.attrid = 1
AND a2.attrid = 2
If you don't, things get quite a bit messier and difficult to answer without knowing your schema.
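For what it's worth, here is the fixed-attribute query above run against a tiny SQLite database from Python. The table layout is a guess (docid, attrid, attvalue, with attrid spelled as in the query), and the values are taken from the example result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real tables will have more columns.
conn.executescript(
    """
    CREATE TABLE documents (docid INTEGER PRIMARY KEY);
    CREATE TABLE attributevalues (docid INTEGER, attrid INTEGER, attvalue INTEGER);
    INSERT INTO documents VALUES (1), (2), (3);
    INSERT INTO attributevalues VALUES
        (1, 1, 2), (1, 2, 2),
        (2, 1, 2), (2, 2, 2),
        (3, 1, 1), (3, 2, 1);
    """
)

# One self-join of attributevalues per attribute that becomes a column.
static_rows = conn.execute(
    """
    SELECT d.docid,
           a1.attvalue AS attvalue1,
           a2.attvalue AS attvalue2
    FROM documents d
    JOIN attributevalues a1 ON d.docid = a1.docid
    JOIN attributevalues a2 ON d.docid = a2.docid
    WHERE a1.attrid = 1
      AND a2.attrid = 2
    ORDER BY d.docid
    """
).fetchall()

print(static_rows)  # [(1, 2, 2), (2, 2, 2), (3, 1, 1)]
```

Because these are inner joins, a document that lacks either attribute would drop out of the result.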
Let's take an example.
documents table's columns
docid,docname,createddate,createduser
and values
1 account.doc 10.10.2010 aeon
2 hr.doc 10.11.2010 aeon
attributes table's columns
attid,name,type
and values
1 subject string
2 recursive int
attributevalues table's columns
attvalueid,docid,attid,attvalue(sql_variant)
and values
1 1 1 "accounting doc"
1 1 2 0
1 2 1 "humen r doc"
1 2 2 1
and I want query result
docid,name,atribvalue1,atribvalue2,...,atribvalueN
1 account.doc "accounting doc" 0
2 hr.doc "humen r doc" 1
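One way to get that shape when the attribute list is only known at run time is to build the query text from the attributes table. The sketch below does this with SQLite from Python and uses conditional aggregation (MAX over CASE) instead of SQL Server's PIVOT, so it is a swapped-in technique rather than the PIVOT approach itself. The attvalueid values 1 to 4 are an assumption, and the generated column names follow the atribvalueN pattern from the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (docid INTEGER, docname TEXT, createddate TEXT, createduser TEXT);
    CREATE TABLE attributes (attid INTEGER, name TEXT, type TEXT);
    CREATE TABLE attributevalues (attvalueid INTEGER, docid INTEGER, attid INTEGER, attvalue);
    INSERT INTO documents VALUES
        (1, 'account.doc', '10.10.2010', 'aeon'),
        (2, 'hr.doc', '10.11.2010', 'aeon');
    INSERT INTO attributes VALUES (1, 'subject', 'string'), (2, 'recursive', 'int');
    INSERT INTO attributevalues VALUES
        (1, 1, 1, 'accounting doc'), (2, 1, 2, 0),
        (3, 2, 1, 'humen r doc'), (4, 2, 2, 1);
    """
)

# Read the attribute ids first, then generate one output column per attribute.
att_ids = [row[0] for row in conn.execute("SELECT attid FROM attributes ORDER BY attid")]
# int() guards the generated SQL: only integer literals are ever interpolated.
columns = ", ".join(
    f"MAX(CASE WHEN v.attid = {int(a)} THEN v.attvalue END) AS atribvalue{int(a)}"
    for a in att_ids
)
sql = f"""
    SELECT d.docid, d.docname, {columns}
    FROM documents d
    LEFT JOIN attributevalues v ON v.docid = d.docid
    GROUP BY d.docid, d.docname
    ORDER BY d.docid
"""
pivot_rows = conn.execute(sql).fetchall()

for row in pivot_rows:
    print(row)
```

The LEFT JOIN keeps documents that have no attribute values at all; their generated columns come back as NULL.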