Group By with Where clause in SQL - sql-server

I get different results when I include a DateTime column in the GROUP BY clause. Can someone explain why?
Query:
Select Name, Source, Description, CreatedDate
From testTable
Where Source like '%Validating err%'
And CreatedDate >='2016-12-01'
Group By Name , Source, Description, CreatedDate
Result : 15 rows
The above query returns 15 rows. But when I remove the CreatedDate column from the GROUP BY clause, it returns only 4 rows.
Query:
Select Name, Source, Description
From testTable
Where Source like '%Validating err%'
And CreatedDate >='2016-12-01'
Group By Name , Source, Description
Result : 4 rows

I am adding this answer just for the benefit of @John so he can visually understand why his two result sets have differing numbers of records.
Imagine a table called shirts, which has only two columns, size and color. Here is some sample data:
size | color
S | red
S | green
S | blue
M | red
M | green
M | blue
L | red
L | green
L | blue
In other words, there are three sizes of shirts, and each size has three possible colors.
Now, if you execute the following query:
SELECT size
FROM shirts
GROUP BY size
you will get three records back, containing only the three sizes. However, if you do the following:
SELECT size, color
FROM shirts
GROUP BY size, color
Then you would get back nine records, or groups. All that is happening here is that the addition of another column creates new possible group combinations, and hence more groups. And the same concept applies to what you are seeing in your two queries.
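The shirts example can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module and an in-memory database as a stand-in for SQL Server, which is an assumption made so the example is self-contained; the grouping behaviour shown is the same.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption, not the asker's server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shirts (size TEXT, color TEXT)")

# Three sizes, each available in three colors: nine rows in total.
conn.executemany(
    "INSERT INTO shirts VALUES (?, ?)",
    [(s, c) for s in ("S", "M", "L") for c in ("red", "green", "blue")],
)

# Grouping by size alone collapses the colors into one group per size.
by_size = conn.execute("SELECT size FROM shirts GROUP BY size").fetchall()

# Adding color to the GROUP BY makes every (size, color) pair its own group.
by_size_color = conn.execute(
    "SELECT size, color FROM shirts GROUP BY size, color"
).fetchall()

print(len(by_size))        # 3
print(len(by_size_color))  # 9
```

A DateTime column such as CreatedDate behaves like color here, only with far more distinct values, so it splits each group into many smaller ones.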

Related

SQLite delete rows based on multiple columns

I'm pretty new to SQLite, hence this question.
I need to remove rows from a table so that, for each unique value in column X (colour), only the earliest occurrence according to column Y (time) remains.
Basically i have this:
test | colour | time(s)
one | Yellow | 8
one | Red | 6
one | Yellow | 10
two | Red | 4
I want to remove rows so that it looks like this:
test | colour | time(s)
one | Yellow | 8
two | Red | 4
Thanks in advance!
EDIT: To be clearer, I need to retain the earliest occurrence in time of each colour, regardless of the test.
EDIT: I can select the rows I want to keep by doing this:
select * from ( select * from COL_TABLE order by time desc) x group by colour;
which produces the desired result, but I want to delete the rows that do not appear in the result of that select.
EDIT: The following worked, thanks to @JimmyB:
DELETE FROM COL_TABLE WHERE EXISTS ( SELECT * FROM COL_TABLE t2 WHERE COL_TABLE.colour = t2.colour AND COL_TABLE.test = t2.test AND COL_TABLE.time > t2.time )
You can include subqueries (EXISTS/NOT EXISTS) in the WHERE clause of a DELETE statement.
Like subqueries in SELECTs, these can refer to the table in the outer statement to create matches.
In your case, try this:
DELETE FROM my_table
WHERE EXISTS (
    SELECT *
    FROM my_table t2
    WHERE my_table.colour = t2.colour
    AND my_table.test = t2.test
    AND my_table.time > t2.time -- t2 is an earlier row, so this row is not the minimum
)
This statement uses three noteworthy constructs:
Subquery in DELETE
Self-join
Emulation of a MIN(...), via self-join
The subquery with EXISTS is mentioned above.
The self-join is required whenever one row of a table must be compared against other rows of the same table. Finding the minimum value of some column is exactly that.
Normally, you'd use the MIN(...) function to find the minimum. The minimum can also be defined as the single value for which no lower value exists, and that's the definition we're using here: we're not interested in the value itself, only in identifying the record that contains it.
(Since we're deleting, our SELECT yields all the non-minimum rows, which we want to delete to keep only the minimums.)
So, what the statement says is:
Delete all records from my_table for which there is at least one record in my_table with the same colour and the same test but a lower time.
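To see the statement in action, here is a minimal sketch that runs it against the sample rows from the question using Python's sqlite3 module. The helper name keep_earliest and its match_on_test flag are hypothetical, introduced only for illustration; the flag toggles the "same test" condition so that the colour-only variant the asker described ("regardless of the test") can be compared with the statement as worded above.

```python
import sqlite3

# Sample rows from the question: (test, colour, time).
ROWS = [("one", "Yellow", 8), ("one", "Red", 6), ("one", "Yellow", 10), ("two", "Red", 4)]

def keep_earliest(match_on_test):
    """Hypothetical helper: delete every row for which an earlier matching row exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (test TEXT, colour TEXT, time INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    # Optional extra predicate: only compare rows belonging to the same test.
    test_predicate = "AND my_table.test = t2.test" if match_on_test else ""
    conn.execute(f"""
        DELETE FROM my_table
        WHERE EXISTS (
            SELECT 1
            FROM my_table t2
            WHERE my_table.colour = t2.colour
              {test_predicate}
              AND t2.time < my_table.time   -- an earlier row exists
        )
    """)
    return conn.execute(
        "SELECT test, colour, time FROM my_table ORDER BY time"
    ).fetchall()

print(keep_earliest(match_on_test=True))
# [('two', 'Red', 4), ('one', 'Red', 6), ('one', 'Yellow', 8)]
print(keep_earliest(match_on_test=False))
# [('two', 'Red', 4), ('one', 'Yellow', 8)]
```

With the test condition, the earliest row per (test, colour) pair survives; without it, only the earliest row per colour survives, which matches the two-row result shown in the question. Rows that tie on the minimum time are all kept in either variant.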

output the column name with the highest value

I have a lot of columns with numbers in them in a SQL Server table.
I need to check the specific columns for the highest number and output the name of the column for each row in the table.
For example:
RED | BLUE | GREEN | BLACK | Highest Column
0 | 2 | 1 | 4 | BLACK <-- Outputted result of an expression
I have a dataset that pulls all the columns from the database table.
I need an expression that will evaluate the data and return the highest valued column name.
I'm not sure of the logic behind this type of situation. Any help would be appreciated.
This is an SSRS report.
Use CROSS APPLY. This will give you the highest-valued color (BLACK for the sample row):
DECLARE @t table(red int, blue int, green int, black int)
INSERT @t values(0,2,1,4)
SELECT red, blue, green, black, highest
FROM @t -- replace @t with your own table
CROSS APPLY
(SELECT top 1 color highest
 FROM
 (VALUES('RED', red), ('BLUE', blue),
 ('GREEN', green), ('BLACK', black)) x(color, value)
 ORDER BY value desc) x
It should be quite easy to replace @t with your own table.
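For engines without CROSS APPLY, the same idea (turn the columns into rows, then pick the largest value per original row) can be sketched with UNION ALL and a correlated MAX. This is a different technique from the answer above, not a translation of it. The example runs on SQLite through Python's sqlite3 module; the table name t, the id column, and the second sample row are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: an id column is added so each row can be identified after unpivoting.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, red INT, blue INT, green INT, black INT)"
)
conn.executemany(
    "INSERT INTO t (red, blue, green, black) VALUES (?, ?, ?, ?)",
    [(0, 2, 1, 4), (7, 3, 5, 1)],  # first row is from the question, second is made up
)

highest = conn.execute("""
    WITH unpivoted AS (
        -- One output row per (original row, color column).
        SELECT id, 'RED' AS color, red AS value FROM t
        UNION ALL SELECT id, 'BLUE', blue FROM t
        UNION ALL SELECT id, 'GREEN', green FROM t
        UNION ALL SELECT id, 'BLACK', black FROM t
    )
    SELECT u.id, u.color
    FROM unpivoted u
    -- Keep only the color whose value equals the row's maximum.
    WHERE u.value = (SELECT MAX(x.value) FROM unpivoted x WHERE x.id = u.id)
    ORDER BY u.id
""").fetchall()

print(highest)  # [(1, 'BLACK'), (2, 'RED')]
```

Unlike TOP 1, this variant returns every tied color when two columns share the maximum, so a tie-breaking rule would have to be added if exactly one name per row is required.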

return a list of unmatched records between queries

I apologize if this has been asked already; I couldn't find anything that was quite what I wanted.
Is there a way to return a list of records that do not match between two queries? For example, given TableA:
ID | Name | Color
1 | crayon | blue
2 | marker | red
3 | paint | green
"Select Id, Name, color from TableA" =
ID | Name | Color
1 | crayon | blue
2 | marker | red
3 | paint | green
"Select Id, Name, color from TableA where color = 'blue'" =
ID | Name | Color
1 | crayon | blue
I was hoping there was some functionality that would take the two queries above and provide a result set like:
ID | Name | Color
2 | marker | red
3 | paint | green
That is, the records returned by the first query but not by the second.
Thanks in advance!
I am going to assume that your queries are really more complicated and this is just an example. One way is by using left join:
with q1 as (<query1 here>),
q2 as (<query2 here>)
select q1.*
from q1 left join
q2
on q1.id = q2.id
where q2.id is null;
This assumes the match is on id. If more columns need to be the same, add them to the on clause.
What about this:
<query 1 here>
EXCEPT
<query 2 here>
Of course, use of EXCEPT assumes that:
The number and the order of the columns are the same in both queries.
The data types are compatible.
You can try the following query to get your desired result set:
Select Id, Name, color from TableA
EXCEPT
Select Id, Name, color from TableA where color = 'blue'
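Both approaches from the answers above (EXCEPT and the LEFT JOIN ... IS NULL anti-join) can be tried side by side. The sketch below uses Python's sqlite3 module with an in-memory copy of TableA; SQLite as the engine is an assumption made for the sake of a runnable example, and both constructs are also available in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Id INTEGER, Name TEXT, Color TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [(1, "crayon", "blue"), (2, "marker", "red"), (3, "paint", "green")],
)

# Set difference: rows of the first query that the second query does not return.
except_rows = conn.execute("""
    SELECT Id, Name, Color FROM TableA
    EXCEPT
    SELECT Id, Name, Color FROM TableA WHERE Color = 'blue'
    ORDER BY Id
""").fetchall()

# Anti-join: keep q1 rows that have no partner in q2 (matching on Id only).
anti_join_rows = conn.execute("""
    WITH q1 AS (SELECT Id, Name, Color FROM TableA),
         q2 AS (SELECT Id, Name, Color FROM TableA WHERE Color = 'blue')
    SELECT q1.Id, q1.Name, q1.Color
    FROM q1 LEFT JOIN q2 ON q1.Id = q2.Id
    WHERE q2.Id IS NULL
    ORDER BY q1.Id
""").fetchall()

print(except_rows)     # [(2, 'marker', 'red'), (3, 'paint', 'green')]
print(anti_join_rows)  # [(2, 'marker', 'red'), (3, 'paint', 'green')]
```

The two differ in one respect worth knowing: EXCEPT compares whole rows and removes duplicates, while the anti-join compares only the columns in the ON clause and keeps duplicates from the first query.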

how to design a database table with similar entries

I currently have a table that has the majority (like 99%) of the data of a field dependent on a single field but the other 1% is dependent on other fields.
For example, I have the following price table
product PK
color PK
cost
The following are some entries in that table
product|color|cost
pen|red|$1.00
pen|blue|$1.00
pen|green|$1.00
etc....
pen|black|$0.90
pen|white|$0.85
pencil|red|$0.50
pencil|blue|$0.50
pencil|green|$0.50
etc...
pencil|black|$0.35
pencil|gray|$0.40
The problem I'm having with this table is that whenever I have to add a single product or color I have to add hundreds of similar entries to this table.
I'm currently thinking of storing the data in the following way
pen|all_other_colors|$1.00
pen|black|$0.90
pen|white|$0.85
pencil|all_other_colors|$0.50
pencil|black|$0.35
pencil|gray|$0.40
Am I on the right track or is there a better database design that handles this problem? Any help or links would be appreciated. I can't get the right wording to google for this problem.
You need to normalize your database tables.
Break the table into three tables as below:
Products
id | product
colors
id | color
product_cost
id | Product_id | color_id | Cost
BaseProduct has all product names and their prices
Product has all available valid product-color combinations
ColorPrice is the difference (offset) from the base price.
ColorCharge has only rows with exception pricing (no rows for ColorPrice = 0)
To get info for a specific base product (specific_product_id):
select
    b.ProductName
  , c.ColorName
  , b.ProductPrice + coalesce(x.ColorPrice, 0.0) as ProductPrice
from Product as p
join BaseProduct as b on b.BaseProductID = p.BaseProductID
join Color as c on c.ColorId = p.ColorId
left join ColorCharge as x on x.BaseProductID = p.BaseProductID and x.ColorID = p.ColorID
where p.BaseProductID = specific_product_id;
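A minimal sketch of this base-price-plus-offset design, run on SQLite through Python's sqlite3 module, is shown below. The table and column names follow the query above; the column types, the choice to store prices as integer cents, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Prices are stored as integer cents (an assumption, to avoid float rounding).
    CREATE TABLE BaseProduct (BaseProductID INTEGER PRIMARY KEY, ProductName TEXT, ProductPrice INTEGER);
    CREATE TABLE Color (ColorID INTEGER PRIMARY KEY, ColorName TEXT);
    CREATE TABLE Product (BaseProductID INTEGER, ColorID INTEGER, PRIMARY KEY (BaseProductID, ColorID));
    CREATE TABLE ColorCharge (BaseProductID INTEGER, ColorID INTEGER, ColorPrice INTEGER);

    INSERT INTO BaseProduct VALUES (1, 'pen', 100);
    INSERT INTO Color VALUES (1, 'red'), (2, 'blue'), (3, 'black'), (4, 'white');
    INSERT INTO Product VALUES (1, 1), (1, 2), (1, 3), (1, 4);
    -- Only the exceptions get a row: black is 10 cents cheaper, white 15 cents cheaper.
    INSERT INTO ColorCharge VALUES (1, 3, -10), (1, 4, -15);
""")

prices = conn.execute("""
    SELECT b.ProductName,
           c.ColorName,
           b.ProductPrice + COALESCE(x.ColorPrice, 0) AS ProductPrice
    FROM Product AS p
    JOIN BaseProduct AS b ON b.BaseProductID = p.BaseProductID
    JOIN Color AS c ON c.ColorID = p.ColorID
    LEFT JOIN ColorCharge AS x
           ON x.BaseProductID = p.BaseProductID AND x.ColorID = p.ColorID
    WHERE p.BaseProductID = ?
    ORDER BY c.ColorID
""", (1,)).fetchall()

print(prices)
# [('pen', 'red', 100), ('pen', 'blue', 100), ('pen', 'black', 90), ('pen', 'white', 85)]
```

Adding a new color at the standard price then needs one row in Color and one row per product in Product, with no price rows at all.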
You could group colors together and then insert the price for the whole group:
And your example could be represented similarly to this...
PRICE:
PRODUCT_ID GROUP_ID PRICE
pen 1 $1.00
pen 2 $0.90
pen 3 $0.85
pencil 1 $0.50
pencil 2 $0.35
pencil 4 $0.40
GROUP:
GROUP_ID
1
2
3
4
COLOR_GROUP:
GROUP_ID COLOR_ID
1 red
1 blue
1 green
2 black
3 white
4 gray
COLOR:
COLOR_ID
red
blue
green
black
white
gray
Whether the increased complexity is worth it is up to you...

SQL make rows into columns, PIVOT maybe

I have an MS SQL Server with a database for an E-commerce storefront.
This is some of the tables I have:
Products:
Id | Name | Price
ProductAttributeTypes: --Color, Size, Format
Id | Name
ProductAttributes: --Red, Green, 12x20 cm, Mirrored
Id | ProductAttributeTypeId | Name
Orders:
Id | DateCreated
OrderItems:
Id | OrderId | ProductId
OrderItemsToProductAttributes: --Relates an OrderItem to its product and selected attributes
OrderItemId | ProductAttributeId | ProductAttributeTypeId | ProductId
I want to select from the OrderItems table, to see which items have been purchased.
To see which variants (ProductAttributes) were selected, I want those as "dynamic" columns in the result set.
So the resultset should look like this:
OrderItemId | ProductId | ProductName | Color | Size | Format
1234 123 Mount. Bike Red 2x20 Mirror
I don't know if PIVOT is the thing to use. I'm not using any aggregate functions, so I guess not...
Are there any SQL ninjas who can help me out?
If you are using SQL Server 2005 or 2008, you can use the PIVOT operator.
In the example below the OrderAttributes set will look like:
OrderItemId AttName AttValue
----- ------ -----
100 Color Red
100 Size Small
101 Color Blue
101 Size Small
102 Color Red
102 Size Small
103 Color Blue
103 Size Large
The final results after the PIVOT will be:
OrderItemId Size Color
----- ------ -----
100 Small Red
101 Small Blue
102 Small Red
103 Large Blue
WITH OrderAttributes(OrderItemId, AttName, AttValue)
AS (
    SELECT
        OrderItemId,
        pat.Name AS AttName,
        pa.Name AS AttValue
    FROM OrderItemsToProductAttributes x
    INNER JOIN ProductAttributes pa
        ON x.ProductAttributeId = pa.id
    INNER JOIN ProductAttributeTypes pat
        ON pa.ProductAttributeTypeId = pat.Id
)
SELECT AttrPivot.OrderItemId,
    [Size] AS [Size],
    [Color] AS Color
FROM OrderAttributes
PIVOT (
    MAX([AttValue])
    FOR [AttName] IN ([Color],[Size])
) AS AttrPivot
ORDER BY AttrPivot.OrderItemId
There is a way to dynamically build the columns (i.e. the Color and Size columns). Make sure the compatibility level of your database is set to 90 (SQL Server 2005) or higher, or PIVOT will fail with strange errors.
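The reason an aggregate such as MAX is needed even though nothing is really being aggregated becomes clearer with the equivalent conditional-aggregation form: each group has at most one value per attribute, and MAX simply picks it out. The sketch below uses that form (MAX over CASE) rather than PIVOT, because it runs on SQLite through Python's sqlite3 module; the OrderAttributes table here is a plain table loaded with the sample rows shown above, which is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the OrderAttributes CTE: one row per (order item, attribute).
conn.execute(
    "CREATE TABLE OrderAttributes (OrderItemId INTEGER, AttName TEXT, AttValue TEXT)"
)
conn.executemany(
    "INSERT INTO OrderAttributes VALUES (?, ?, ?)",
    [
        (100, "Color", "Red"), (100, "Size", "Small"),
        (101, "Color", "Blue"), (101, "Size", "Small"),
        (102, "Color", "Red"), (102, "Size", "Small"),
        (103, "Color", "Blue"), (103, "Size", "Large"),
    ],
)

pivoted = conn.execute("""
    SELECT OrderItemId,
           -- CASE yields NULL for non-matching rows; MAX ignores NULLs.
           MAX(CASE WHEN AttName = 'Size'  THEN AttValue END) AS Size,
           MAX(CASE WHEN AttName = 'Color' THEN AttValue END) AS Color
    FROM OrderAttributes
    GROUP BY OrderItemId
    ORDER BY OrderItemId
""").fetchall()

for row in pivoted:
    print(row)
# (100, 'Small', 'Red')
# (101, 'Small', 'Blue')
# (102, 'Small', 'Red')
# (103, 'Large', 'Blue')
```

As with PIVOT, the list of output columns is fixed in the query text, so a new attribute type still requires generating the statement dynamically.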
In the past, I've created physical tables for read purposes only. The structure you have above is GREAT for storage, but terrible for reporting.
So you could do the following:
Write a script (that is scheduled nightly) or a trigger (on data change) that does the following tasks:
First, you would dynamically go through each Product and build a static table "Product_[ProductName]"
Then go through each ProductAttributeTypes for each product and create/update/delete a physical column on the corresponding Product table.
Then, fill that table with the proper values based on OrderItemsToProductAttributes and ProductAttributes
This is just a rough idea. Make sure you are storing OrderID in the "Static"/"Flattened" tables. And make sure you do everything else you need to do. But after that, you should be able to start pulling from those flattened tables to get the data you need.
PIVOT is your best bet. For reporting purposes, and to make it work well with SSIS, I created a view containing this query:
SELECT [InputSetID], [InputSetName],
       CAST([470] AS int) AS [Created By],
       CAST([480] AS datetime) AS [Created],
       CAST([479] AS int) AS [Updated By],
       CAST([460] AS datetime) AS [Updated]
FROM (SELECT st.InputSetID, st.InputSetName, avt.InputSetID AS avtID, avt.AttributeID, avt.Value
      FROM app.InputSetAttributeValue avt
      JOIN app.InputSets st ON avt.InputSetID = st.InputSetID) AS p
PIVOT (MAX(Value) FOR AttributeID IN ([470], [480], [479], [460])) AS pvt
Then I can just interact with the view. I also have a trigger on the table to which new dynamic attributes are added; it recreates this view, so I can assume the view is always correct.
