Join two rows together if they share the same value? - database

I've sifted through views and other options and have gotten this far. Take the example below:
Name      | Quantity | Billed
PC Tablet | 0        | 100
PC Tablet | 100      | -2345
Monitor   | 9873     | 0
Keyboard  | 200      | -300
Here is the select I would run against this view. I want it to bring in the data ordered by Name, so it is in alphabetical order. For a few reasons, some of the records appear more than once (I think the most is 4 times). If you add up the rows that share a name, the true 'quantity' and 'billed' values come out correct.
NOTE: The actual query is very long; I cut it down to a simple example to explain the problem. The idea is the same, but there are A LOT MORE columns that need to be added together. So I'm looking for a query that combines rows when they have the same name. I've tried a bunch of different queries with no success: either it rolls ALL the rows into one, or it doesn't work and I get a bunch of null errors, or 'name column is invalid in the select list/group by because it's not an aggregate function'.
Is this even possible?

Try:
SELECT A.Name, A.TotalQty, B.TotalBilled
FROM (
SELECT Name, SUM(Quantity) as TotalQty
FROM YourTableHere
GROUP BY Name
) A
INNER JOIN
(
SELECT Name, SUM(Billed) as TotalBilled
FROM YourTableHere
GROUP BY Name
) B
ON A.Name = B.Name
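Because both totals are grouped by the same column, the two derived tables can probably be collapsed into a single GROUP BY with one SUM() per column, which also scales to the many columns mentioned in the question. A minimal sketch follows. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; that choice is an assumption, not what the asker uses, but the SQL itself is standard and should behave the same on SQL Server.

```python
import sqlite3

# In-memory stand-in for the view in the question (SQLite is an assumption
# made only so the example runs anywhere; the asker is on SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTableHere (Name TEXT, Quantity INTEGER, Billed INTEGER)")
conn.executemany(
    "INSERT INTO YourTableHere VALUES (?, ?, ?)",
    [("PC Tablet", 0, 100), ("PC Tablet", 100, -2345),
     ("Monitor", 9873, 0), ("Keyboard", 200, -300)],
)

# One pass over the table: every non-aggregated column goes in GROUP BY,
# every column to be added up is wrapped in SUM().
totals = conn.execute(
    """
    SELECT Name, SUM(Quantity) AS TotalQty, SUM(Billed) AS TotalBilled
    FROM YourTableHere
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()

for row in totals:
    print(row)
```

The "invalid in the select list" error from the question typically appears when a column is selected without being either listed in GROUP BY or wrapped in an aggregate, so each extra column needs one or the other.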

Related

SQL Server 2008 - False error for "Msg 8120"?

I am writing a query in SQL Server 2008 (Express I believe?). I am currently getting this error:
Msg 8120, Level 16, State 1, Line 16
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I am trying to do a historical analysis of our production WIP (Work In Process).
I have created a standalone calendar table (actually located in a separate database called BAS on the same server to not interfere with the ERP that operates the AIM database). I've been overwhelmed for days with some of the examples for creating running total queries/views/tables, so for now I'll just plan on taking care of that part inside of Crystal Reports 2016. My thinking was that I wanted to return records for each order each day of my calendar table (to be narrowed down in the future to only days that match records in the AIM database). The values I think I will need are:
Record Date (not unique)
Order Number (unique for each day)
Estimated hours for the job
The total number of hours worked on the job current as of today's date (in case the estimated hours were drastically underbudgeted)
The SUM of the direct labor hours charged to the job on said record date
The COUNT of the number of employees in attendance on said record date.
The SUM of the hours attended by employees on said record date.
The tables I use are as follows:
BAS Database:
dbo.DateDimension - Used for complete calendar of dates from 1/1/1987 to 12/31/2036
AIM Database:
dbo.AggAttend - Contains one or more records for each employee's attendance duration on a given date (i.e. One record for each punch-in / punch-out. Should be equal to indirect + direct labor)
dbo.AggTicket - Contains one or more records for each employee's direct labor duration charged to a particular order number
dbo.ModOrders - Contains one record for each order including the estimated hours, start date, and end date (I will worry about using the start and end dates later for figuring out how many available hours there were on each date)
Here is the code I'm using in my query:
;WITH OrderTots AS
(
SELECT
AggTicket.OrderNo,
SUM(AggTicket.TotDirectHrs) AS TotActHrs
FROM
AIM.dbo.AggTicket
GROUP BY
AggTicket.OrderNo
)
SELECT
d.Date,
t.OrderNo,
o.EstHrs,
OrderTots.TotActHrs,
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs,
COUNT(a.EmplCode) AS NumEmployees,
SUM(a.TotHrs) AS DaysAttendHrs
FROM
BAS.dbo.DateDimension d
INNER JOIN
AIM.dbo.AggAttend a ON d.Date = a.TicketDate
LEFT OUTER JOIN
AIM.dbo.AggTicket t ON d.Date = t.TicketDate
LEFT OUTER JOIN
AIM.dbo.ModOrders o ON t.OrderNo = o.OrderNo
LEFT OUTER JOIN
OrderTots ON t.OrderNo = OrderTots.OrderNo
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs
ORDER BY
d.Date
When I run that query in SQL Server Management Studio 2017, I get the above error.
These are my questions for the community:
Does this error message correctly describe an error in my code?
If so, why is that error an error? (To the best of my knowledge, everything is already contained in either an aggregate function or in the GROUP BY clause...smh)
What is a better way to write this query so that it will function?
Much appreciation to everyone in advance!
I am writing a query in SQL Server 2008 (Express I believe?).
SELECT @@VERSION will tell you which version you are on.
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.
The problem is with your SUM OVER() statement:
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs
Window functions are evaluated after the GROUP BY has been applied, so any column referenced inside one must either appear in the GROUP BY or be wrapped in an aggregate. Here t.TotDirectHrs is neither. The OVER clause determines the partitioning and ordering of the row set for a window function, so although SUM is an aggregate, in this form it runs as a window function. Window functions belong to a type of function known as a 'set function', which means a function that applies to a set of rows. The word 'window' refers to the set of rows that the function works on.
Thus, add t.TotDirectHrs to the GROUP BY:
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs, t.TotDirectHrs
If this narrows your results into a grouping that you don't want, then you can wrap it in another CTE or use a correlated sub-query. Potentially like the below, in place of the windowed SUM in the select list. The WHERE clause already restricts the sub-query to one date, so a plain SUM returns the single value a scalar sub-query needs:
(SELECT SUM(t2.TotDirectHrs) FROM AIM.dbo.AggTicket t2 WHERE t2.TicketDate = t.TicketDate) AS DaysDirectHrs,
EXAMPLE
if object_id('tempdb..#test') is not null
drop table #test
create table #test(id int identity(1,1), letter char(1))
insert into #test
values
('a'),
('b'),
('b'),
('c'),
('c'),
('c')
Given the data set above, suppose we wanted to get a count of all rows. That's simple right?
select
TheCount = count(*)
from
#test
+----------+
| TheCount |
+----------+
| 6 |
+----------+
Here, no GROUP BY is needed: when a query has aggregates and no GROUP BY, the whole result set is treated as a single group. Remember, GROUP BY groups the SELECT statement results according to the values in a list of one or more column expressions. If aggregate functions are included in the SELECT list, GROUP BY calculates a summary value for each group. These are known as vector aggregates (MSDN).
Now, suppose we wanted to count each letter in the table. We could do that at least two ways: using COUNT(*) or using COUNT(letter), in both cases with the letter column in the select list. Either way, in order to attribute the count to the letter, we need to return the letter column. Thus, we must include letter in the GROUP BY to tell SQL Server what to apply the summary value to.
select
letter
,TheCount = count(*)
from
#test
group by
letter
+--------+----------+
| letter | TheCount |
+--------+----------+
| a | 1 |
| b | 2 |
| c | 3 |
+--------+----------+
Now, what if we wanted to return this same count, but we wanted to return all rows as well? This is where window functions come in. The window function works similarly to GROUP BY in this case by telling SQL Server the set of rows to apply the aggregate to. Then, its value is returned for every row in this window / partition. Thus, it returns a column which is applied to every row, making it just like any column or calculated column which is returned from the select list.
select
letter
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
+--------+---------------------+
| letter | TheCountOfTheLetter |
+--------+---------------------+
| a | 1 |
| b | 2 |
| b | 2 |
| c | 3 |
| c | 3 |
| c | 3 |
+--------+---------------------+
Now we get to your case, where you want to use an aggregate and an aggregate in a window function in the same query. Remember that the result of the window function is treated like any other column, so it would have to appear in the GROUP BY. The pseudo-code would look something like this, but window functions aren't allowed in the GROUP BY clause, so it fails:
select
letter
,TheCount = count(*)
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
group by
letter
,count(*) over (partition by letter)
--returns an error
Thus, we must use a correlated sub-query, a CTE, or some other method.
select
t.letter
,TheCount = count(*)
,TheCountOfTheLetter = (select distinct count(*) over (partition by letter) from #test t2 where t2.letter = t.letter)
from
#test t
group by
t.letter
+--------+----------+---------------------+
| letter | TheCount | TheCountOfTheLetter |
+--------+----------+---------------------+
| a | 1 | 1 |
| b | 2 | 2 |
| c | 3 | 3 |
+--------+----------+---------------------+
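The CTE route mentioned earlier can be sketched the same way: aggregate first inside the CTE, then apply the window function to the already-grouped rows, so the two never compete inside one GROUP BY. This is only an illustration under some assumptions: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), the temp table is recreated as a plain table named test, and the TotalRows column is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain-table stand-in for the #test temp table used above.
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, letter TEXT)")
conn.executemany(
    "INSERT INTO test (letter) VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("c",), ("c",)],
)

rows = conn.execute(
    """
    WITH per_letter AS (
        -- Step 1: the ordinary aggregate, one row per letter.
        SELECT letter, COUNT(*) AS TheCount
        FROM test
        GROUP BY letter
    )
    -- Step 2: the window function sees only the grouped rows,
    -- so no GROUP BY is needed at this level.
    SELECT letter,
           TheCount,
           SUM(TheCount) OVER () AS TotalRows
    FROM per_letter
    ORDER BY letter
    """
).fetchall()

for row in rows:
    print(row)
```

Each letter keeps its own count while TotalRows repeats the grand total on every row, which is the kind of mixed result the single-level query could not produce.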

select all rows that have more than one ID in a specific date

My table:
Items | Price | UpdateAt
1 | 2000 | 02/02/2015
2 | 4000 | 06/04/2015
3 | 2150 | 07/05/2015
4 | 1800 | 07/05/2015
5 | 5540 | 08/16/2015
4 | 1700 | 12/24/2015
5 | 5200 | 12/26/2015
2 | 3900 | 01/01/2016
4 | 2000 | 06/14/2016
As you can see, this is a table that keeps items' price as well as their old price before the last update.
Now I need to find the rows which :
UpdateAt is more than 1 year ago from now
Must have updated price at least once ever since
Aren't the most up-to-date price
So with those conditions, the result from the above table should be :
Items | Price | UpdateAt
2 | 4000 | 06/04/2015
4 | 1800 | 07/05/2015
I can achieve what I need with this
Declare @LastUpdate date set @LastUpdate = DATEADD(YEAR, -1, GETDATE())
select Items, UpdateAt from ITEM_PRICE where Items in (
select Items from (
select Items, count(Items) as C from ITEM_PRICE group by Items) T
where T.C > 1)
and UpdateAt < @LastUpdate
But since I am still a newbie in SQL Server, and this needs to be done in VB.NET, passing along a query with that many nested selects seems sloppy and hard to maintain.
So, I would like to ask if anyone can give me a simpler solution ?
Sorry, I edited my question because I need one more condition to be met after trying @Tim Biegeleisen's answer, which is indeed the correct one for the question before the edit. And I can't figure this out anymore.
The reason I need all those conditions is that I have to clean up the table: clearing out the data that's older than 1 year, while still keeping the most up-to-date item price.
In my answer below, I use a subquery to identify all items which appear in the table during the last year. This is the requirement of having an updated price "at least once ever since." In the outer query, I restrict to only records which are older than one year from now, which is the other part of the requirement. An INNER JOIN is used, because we want to filter off records which do not meet both criteria.
SELECT t1.Items, t1.Price, t1.UpdateAt
FROM ITEM_PRICE t1
INNER JOIN
(
SELECT DISTINCT Items
FROM ITEM_PRICE
WHERE UpdateAt > DATEADD(year, -1, GETDATE())
) t2
ON t1.Items = t2.Items
WHERE t1.UpdateAt <= DATEADD(year, -1, GETDATE())
Once again, SQL Fiddle is having problems simulating SQL Server. But I went ahead and created a Fiddle in MySQL, which looks nearly identical to my SQL Server answer. You can verify that the logic and output are correct.
SQLFiddle
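The logic of the join can also be checked against the sample table with a fixed reference date instead of GETDATE(). The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module, stores the dates as ISO strings, and assumes "today" is 2016-07-15, a date picked because it falls in the range for which the question's expected output holds. The function name stale_prices is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEM_PRICE (Items INTEGER, Price INTEGER, UpdateAt TEXT)")
conn.executemany(
    "INSERT INTO ITEM_PRICE VALUES (?, ?, ?)",
    [
        (1, 2000, "2015-02-02"), (2, 4000, "2015-06-04"), (3, 2150, "2015-07-05"),
        (4, 1800, "2015-07-05"), (5, 5540, "2015-08-16"), (4, 1700, "2015-12-24"),
        (5, 5200, "2015-12-26"), (2, 3900, "2016-01-01"), (4, 2000, "2016-06-14"),
    ],
)


def stale_prices(conn, today):
    """Rows older than one year whose item was updated again within the last year."""
    return conn.execute(
        """
        SELECT t1.Items, t1.Price, t1.UpdateAt
        FROM ITEM_PRICE t1
        INNER JOIN (
            -- Items with at least one price update during the last year.
            SELECT DISTINCT Items
            FROM ITEM_PRICE
            WHERE UpdateAt > date(:today, '-1 year')
        ) t2 ON t1.Items = t2.Items
        -- Keep only the rows that are themselves older than one year.
        WHERE t1.UpdateAt <= date(:today, '-1 year')
        ORDER BY t1.Items
        """,
        {"today": today},
    ).fetchall()


result = stale_prices(conn, "2016-07-15")
print(result)
```

Items 1 and 3 drop out because they were never updated within the last year, which is what keeps their only (and therefore most recent) price from being selected for clean-up.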

SQL Query returns multiple rows of the same record when View includes one-to-many table

In MS-SQL, I have a View 'ListingResult' which contains rows from tables 'ListingCategoryXref' and 'Listing'. This is the View statement:
SELECT
dbo.Listing.ListingName,
dbo.Listing.ListingId,
dbo.ListingCategoryXref.CategoryId
FROM dbo.Listing INNER JOIN
dbo.ListingCategoryXref ON dbo.Listing.ListingId = dbo.ListingCategoryXref.ListingId
GROUP BY
dbo.Listing.ListingName,
dbo.Listing.ListingId,
dbo.ListingCategoryXref.CategoryId
Listings can have many rows in ListingCategoryXref, thus.
ListingResult (View)
Listing (table)
ListingId ListingName StateId
1 Toms bar 3
2 French place 5
ListingCategoryXref (table)
ListingId CategoryId
1 10
1 15
The query below returns a row per Listing per ListingCategoryXref.
SELECT TOP(26)
[ListingResult].[ListingId],
[ListingResult].[ListingName]
FROM [ListingResult]
WHERE [ListingResult].[StateId] = 3
So 'Tom's Bar' is returned twice as it has two categories. I figure I can either change the query above, or change the ListingResult View in SQL. I still need to return 26 results, which I can't guarantee if I use a wrapped select statement with ROW_NUMBER() OVER (PARTITION BY ListingId). (Is that true?) I'm using LLBLGen to access the DB, so I'd prefer to change the view, if that is possible. Apologies if my newness to SQL is that obvious.
From the query above, the following result will be returned...
ListingName | ListingId | CategoryId
Toms bar | 1 | 10
Toms bar | 1 | 15
If you only want Toms bar to be returned once, you'll need to either remove the CategoryId column from both the result set and the group by clause, or wrap CategoryId in an aggregate function and remove it from the group by clause, i.e.
SELECT
dbo.Listing.ListingName,
dbo.Listing.ListingId,
COUNT(dbo.ListingCategoryXref.CategoryId) as Categories
FROM dbo.Listing
INNER JOIN dbo.ListingCategoryXref ON dbo.Listing.ListingId = dbo.ListingCategoryXref.ListingId
GROUP BY dbo.Listing.ListingName, dbo.Listing.ListingId
Which will return...
ListingName | ListingId | Categories
Toms bar | 1 | 2
Can you give an example of what you would like to see?

Possible to query a database into excel on a cell by cell basis? Or another solution..?

I have various large views/stored procedures that basically churn out a lot of data into an Excel spreadsheet. There was a problem where not all of the company amounts were flowing through. I narrowed it down to a piece of code in a stored procedure (note this is cut down for simplicity):
LEFT OUTER JOIN view_creditrating internal_creditrating
ON creditparty.creditparty =
internalrating.company
LEFT OUTER JOIN (SELECT company, contract, SUM(amount) amount
FROM COMMON_OBJ.amount
WHERE status = 'Active'
GROUP BY company, contract) col
ON vd.contract = col.contract
Table with issue:
company | contract | amount
TVC     | NULL     | 1006
KS      | 10070    | -2345
NYC-G   | 10060    | 334000
NYC-G   | 100216   | 4000
UECR    | NULL     | 0
SP      | 10090    | 84356
Basically, some of the contracts are NULL. So when there is a LEFT OUTER JOIN on contract, the rows with a NULL contract drop out and don't flow through. So I decided to join on company instead.
This also causes problems, because a company appears in the table more than once in order to show different contracts. With this change the query becomes ambiguous, because it won't know whether I want contract 10060's amount or contract 100216's amount, and more often than not it gives me the incorrect amount.
I thought about leaving the final ON clause as company = company, since this causes the fewest issues, and then somehow querying directly for each cell value that comes out inconsistent, because it only affects a few cells. I've searched, though, and I don't think that is possible.
Is this possible?? OR is there another way to fix this on the database end?
As you've worked out, the problem is in the ON clause, and its use of NULL.
One way to alter the NULL to be a value you can match against is to use COALESCE, which would alter the clause to:
ON coalesce(vd.contract,'No Contract') = coalesce(col.contract,'No Contract')
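This will turn all NULLs into 'No Contract', which changes the NULL = NULL test (which evaluates to NULL, not true) into 'No Contract' = 'No Contract', which returns true. If contract is a numeric column, use a sentinel of the same type that cannot be a real contract (for example -1), because SQL Server would otherwise try to convert 'No Contract' to a number and fail.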
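The difference between the two ON clauses can be seen on a cut-down version of the data. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the vd table and its columns are hypothetical stand-ins for the driving view (the question does not show it), contract is treated as an integer, and -1 is an arbitrary sentinel assumed never to be a real contract number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# vd is a hypothetical stand-in for the driving view; col mirrors the grouped subquery.
conn.execute("CREATE TABLE vd (company TEXT, contract INTEGER)")
conn.execute("CREATE TABLE col (company TEXT, contract INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO vd VALUES (?, ?)", [("TVC", None), ("KS", 10070)])
conn.executemany(
    "INSERT INTO col VALUES (?, ?, ?)",
    [("TVC", None, 1006), ("KS", 10070, -2345)],
)

# Plain equality: NULL = NULL is not true, so TVC finds no matching row.
plain = conn.execute(
    """
    SELECT vd.company, col.amount
    FROM vd
    LEFT OUTER JOIN col ON vd.contract = col.contract
    ORDER BY vd.company
    """
).fetchall()

# COALESCE maps NULL to the sentinel on both sides, so the rows now match.
patched = conn.execute(
    """
    SELECT vd.company, col.amount
    FROM vd
    LEFT OUTER JOIN col
      ON COALESCE(vd.contract, -1) = COALESCE(col.contract, -1)
    ORDER BY vd.company
    """
).fetchall()

print(plain)
print(patched)
```

One thing to watch: once NULLs compare equal, every NULL-contract row on one side matches every NULL-contract row on the other, so with several such companies (TVC and UECR in the question's table) the ON clause would probably also need to include company.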

Detecting Correlated Columns in Data

Suppose I have the following data:
OrderNumber | CustomerName | CustomerAddress | CustomerCode
1 | Chris | 1234 Test Drive | 123
2 | Chris | 1234 Test Drive | 123
How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perfectly? I'm thinking that Sql Server data mining is probably the right tool for the job, but I don't have too much experience with that.
Thanks in advance.
UPDATE:
By "correlate", I mean in the statistics sense, that whenever column a is x, column b will be y. In the above data, The last three columns correlate with each other, and the first column does not.
The input of the operation would be the name of the table, and the output would be something like :
Column 1 | Column 2 | Certainty
CustomerName | CustomerAddress | 100%
CustomerAddress | CustomerCode | 100%
There is a 'functional dependency' test built in to the SQL Server Data Profiling component (which is an SSIS component that ships with SQL Server 2008). It is described pretty well on this blog post:
http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx
I have played a little bit with accessing the data profiler output via some (under-documented) .NET APIs and it seems doable. However, since my requirement dealt with distribution of column values, I ended up going with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and the output viewer.
What do you mean by correlate? Do you just want to see if they're equal? You can do that in T-SQL by joining the table to itself:
select distinct
case when a.OrderNumber < b.OrderNumber then a.OrderNumber
else b.OrderNumber
end as FirstOrderNumber,
case when a.OrderNumber < b.OrderNumber then b.OrderNumber
else a.OrderNumber
end as SecondOrderNumber
from
MyTable a
inner join MyTable b on
a.CustomerName = b.CustomerName
and a.CustomerAddress = b.CustomerAddress
and a.CustomerCode = b.CustomerCode
and a.OrderNumber <> b.OrderNumber
This would return you:
FirstOrderNumber | SecondOrderNumber
1 | 2
Correlation is defined on metric spaces, and your values are not metric.
This will give you the fraction of customers (between 0 and 1) that don't have customerAddress uniquely defined by customerName:
SELECT AVG(imperfect)
FROM (
SELECT
customerName,
CASE
WHEN COUNT(DISTINCT customerAddress) <= 1
THEN 0.0
ELSE 1.0
END AS imperfect
FROM orders
GROUP BY
customerName
) q
Substitute other columns instead of customerAddress and customerName into this query to find discrepancies between them.
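A minimal check of the idea on made-up rows: a name counts as a violation when it maps to more than one distinct address. The sketch runs on SQLite through Python's sqlite3 module; the Dana rows, the violates column, and the variable name violation_rate are all invented for illustration, while the two Chris rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderNumber INTEGER, customerName TEXT, customerAddress TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Chris", "1234 Test Drive"),  # same name, same address: dependency holds
        (2, "Chris", "1234 Test Drive"),
        (3, "Dana", "1 A Street"),        # same name, two addresses: dependency broken
        (4, "Dana", "2 B Street"),
    ],
)

(violation_rate,) = conn.execute(
    """
    SELECT AVG(violates)
    FROM (
        SELECT customerName,
               -- More than one distinct address for a name breaks the dependency.
               CASE WHEN COUNT(DISTINCT customerAddress) <= 1 THEN 0.0 ELSE 1.0 END
                   AS violates
        FROM orders
        GROUP BY customerName
    ) q
    """
).fetchone()

print(violation_rate)
```

Using 0.0 and 1.0 rather than 0 and 1 keeps the average fractional on engines such as SQL Server, where AVG over integers returns an integer.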
