Summing Distinct values - snowflake-cloud-data-platform

I have this code, and I want to sum only the distinct values from the "TARGET_QUIKPLANSur.PLAN" table. But I don't know what I am doing wrong.
SELECT Distinct TARGET_QUIKPLANSur.PLAN, TARGET_QUIKPLANSur.FORM, Sum(TARGET_NEWQUIKVALFSurDec.MANNLZD) AS PREMIUM
from TARGET_QUIKPLANSur Left JOIN TARGET_NEWQUIKVALFSurDec ON TARGET_QUIKPLANSur.PLAN=TARGET_NEWQUIKVALFSurDec.NPLAN
GROUP BY TARGET_QUIKPLANSur.FORM, TARGET_QUIKPLANSur.PLAN
HAVING(((TARGET_QUIKPLANSur.FORM)='LN-5350') and TARGET_QUIKPLANSur.PLAN = 'N06000')

How about this?
SELECT PLAN, FORM, SUM(PREMIUM) AS PREMIUM FROM
(SELECT Distinct TARGET_QUIKPLANSur.PLAN, TARGET_QUIKPLANSur.FORM, TARGET_NEWQUIKVALFSurDec.MANNLZD AS PREMIUM
from TARGET_QUIKPLANSur Left JOIN TARGET_NEWQUIKVALFSurDec ON TARGET_QUIKPLANSur.PLAN=TARGET_NEWQUIKVALFSurDec.NPLAN )
GROUP BY FORM, PLAN
HAVING((FORM='LN-5350') and PLAN = 'N06000')
Filtering before grouping would be better:
SELECT PLAN, FORM, SUM(PREMIUM) AS PREMIUM FROM
(SELECT Distinct TARGET_QUIKPLANSur.PLAN, TARGET_QUIKPLANSur.FORM, TARGET_NEWQUIKVALFSurDec.MANNLZD AS PREMIUM
from TARGET_QUIKPLANSur Left JOIN TARGET_NEWQUIKVALFSurDec ON TARGET_QUIKPLANSur.PLAN=TARGET_NEWQUIKVALFSurDec.NPLAN
WHERE TARGET_QUIKPLANSur.FORM='LN-5350' and TARGET_QUIKPLANSur.PLAN = 'N06000')
GROUP BY FORM, PLAN;

Reformatting your SQL so it's more readable:
SELECT DISTINCT q.plan
,q.form
,SUM(n.mannlzd) AS premium
FROM target_quikplansur AS q
LEFT JOIN target_newquikvalfsurdec AS n
ON q.plan = n.nplan
GROUP BY q.form, q.plan
HAVING q.form='LN-5350' and q.plan = 'N06000'
One guess at what
I want to sum only the distinct values from the "TARGET_QUIKPLANSur.PLAN" table
means is that you have many rows in TARGET_QUIKPLANSur and want each one to join only once to your target_newquikvalfsurdec table. So if you had two rows in target_newquikvalfsurdec with value 10, you want 20 as your answer, but because you have 2+ duplicate rows in TARGET_QUIKPLANSur you are getting something like 40.
In that case you should deduplicate your TARGET_QUIKPLANSur values before joining to them:
SELECT q.plan
,q.form
,SUM(n.mannlzd) AS premium
FROM (
SELECT DISTINCT plan, form
FROM target_quikplansur
) AS q
LEFT JOIN target_newquikvalfsurdec AS n
ON q.plan = n.nplan
WHERE q.form='LN-5350' and q.plan = 'N06000'
GROUP BY q.plan, q.form
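To see the 20-versus-40 effect concretely, here is a small sketch that runs both versions against an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for Snowflake here, and the rows are hypothetical: one (plan, form) pair stored twice and two premium rows of 10.

```python
import sqlite3

# Hypothetical sample data; table/column names mirror the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target_quikplansur (plan TEXT, form TEXT);
CREATE TABLE target_newquikvalfsurdec (nplan TEXT, mannlzd INTEGER);
-- the same (plan, form) pair appears twice
INSERT INTO target_quikplansur VALUES ('N06000', 'LN-5350'), ('N06000', 'LN-5350');
-- two premium rows of 10 each
INSERT INTO target_newquikvalfsurdec VALUES ('N06000', 10), ('N06000', 10);
""")

# Joining the raw table: 2 plan rows x 2 premium rows = 4 joined rows.
naive = conn.execute("""
SELECT q.plan, q.form, SUM(n.mannlzd)
FROM target_quikplansur AS q
LEFT JOIN target_newquikvalfsurdec AS n ON q.plan = n.nplan
WHERE q.form = 'LN-5350' AND q.plan = 'N06000'
GROUP BY q.plan, q.form
""").fetchone()

# Deduplicating the plan rows first: 1 plan row x 2 premium rows.
deduped = conn.execute("""
SELECT q.plan, q.form, SUM(n.mannlzd)
FROM (SELECT DISTINCT plan, form FROM target_quikplansur) AS q
LEFT JOIN target_newquikvalfsurdec AS n ON q.plan = n.nplan
WHERE q.form = 'LN-5350' AND q.plan = 'N06000'
GROUP BY q.plan, q.form
""").fetchone()

print(naive)    # ('N06000', 'LN-5350', 40)
print(deduped)  # ('N06000', 'LN-5350', 20)
```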
Another way your question
I want to sum only the distinct values from the "TARGET_QUIKPLANSur.PLAN" table
could be read is that you have some duplicated rows and want only the rows that have no duplicates. So for the data
form plan
'LN-5350', 'N06000'
'LN-5350', 'N06000'
'LN-5350', 'N06001'
'LN-5350', 'N06002'
you want only the 'N06001' and 'N06002' rows, as they don't have duplicates. I suspect this is not what you want, but if it is, you would COUNT the rows and filter with HAVING, like this:
SELECT q.plan
,q.form
,SUM(n.mannlzd) AS premium
FROM (
SELECT plan, form
FROM target_quikplansur
GROUP BY 1,2
HAVING COUNT(*) = 1
) AS q
LEFT JOIN target_newquikvalfsurdec AS n
ON q.plan = n.nplan
WHERE q.form='LN-5350' and q.plan = 'N06000'
GROUP BY q.plan, q.form
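A quick check of the inner GROUP BY / HAVING COUNT(*) = 1 subquery against the sample rows listed above, sketched with Python's sqlite3 module (SQLite substitutes for Snowflake). With this data, the outer WHERE q.plan = 'N06000' filter would return nothing, since N06000 is exactly the duplicated plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target_quikplansur (form TEXT, plan TEXT);
INSERT INTO target_quikplansur VALUES
  ('LN-5350', 'N06000'),
  ('LN-5350', 'N06000'),
  ('LN-5350', 'N06001'),
  ('LN-5350', 'N06002');
""")

# Keep only (plan, form) pairs that occur exactly once.
unique_rows = conn.execute("""
SELECT plan, form
FROM target_quikplansur
GROUP BY plan, form
HAVING COUNT(*) = 1
ORDER BY plan
""").fetchall()

print(unique_rows)  # [('N06001', 'LN-5350'), ('N06002', 'LN-5350')]
```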
Or another way to look at it: you want one row per plan and form, picked by some other value. So with:
form plan other
'LN-5350', 'N06000' 1
'LN-5350', 'N06000' 2
'LN-5350', 'N06001' 1
'LN-5350', 'N06002' 1
you might want the row with the highest other value, and with QUALIFY and ROW_NUMBER you can filter the rows like this:
SELECT q.plan
,q.form
,SUM(n.mannlzd) AS premium
FROM (
SELECT plan, form, other
FROM target_quikplansur
QUALIFY ROW_NUMBER() OVER (PARTITION BY plan, form ORDER BY other DESC) = 1
) AS q
LEFT JOIN target_newquikvalfsurdec AS n
ON q.plan = n.nplan
WHERE q.form='LN-5350' and q.plan = 'N06000'
GROUP BY q.plan, q.form
This limits q to these rows:
'LN-5350', 'N06000' 2
'LN-5350', 'N06001' 1
'LN-5350', 'N06002' 1
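The ROW_NUMBER filter can be tried out in Python's sqlite3 module, with one substitution: SQLite has no QUALIFY clause, so the sketch below filters on the row number in an outer query instead. It assumes an SQLite build with window-function support (3.25 or later) and uses the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target_quikplansur (form TEXT, plan TEXT, other INTEGER);
INSERT INTO target_quikplansur VALUES
  ('LN-5350', 'N06000', 1),
  ('LN-5350', 'N06000', 2),
  ('LN-5350', 'N06001', 1),
  ('LN-5350', 'N06002', 1);
""")

# Number the rows within each (plan, form) by descending "other",
# then keep row 1 (the outer WHERE replaces Snowflake's QUALIFY).
latest = conn.execute("""
SELECT form, plan, other
FROM (
  SELECT form, plan, other,
         ROW_NUMBER() OVER (PARTITION BY plan, form ORDER BY other DESC) AS rn
  FROM target_quikplansur
)
WHERE rn = 1
ORDER BY plan
""").fetchall()

print(latest)
# [('LN-5350', 'N06000', 2), ('LN-5350', 'N06001', 1), ('LN-5350', 'N06002', 1)]
```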

Related

Select top * from 2 tables order by some field

I have 2 tables like this:
[Info]
[Score]
I want to select the top 3 rows (*), ordered by score in May, descending.
The result should look like this:
Try a JOIN on both tables in a derived table with a SUM on Score, then order on that:
SELECT TOP 3 *
FROM(
SELECT
I.User_Id, I.[Name], I.Age, Score = SUM(S.Score)
FROM
Info I
INNER JOIN
Score S On S.User_Id = I.User_Id
WHERE MONTH(S.[DATE]) = 5 --May (however I suspect this might not be a DATE object)
GROUP BY
I.User_Id, I.[Name], I.Age
) X
ORDER BY X.Score DESC
Here you go, you can use a JOIN statement:
SELECT TOP(3) a.user_id, a.Name, a.Age, b.Score FROM Info a JOIN Score b On a.user_id=b.user_id Order By b.Score desc
The following should be helpful:
SELECT TOP 3 S.User_Id, SUM(S.Score) Score, U.Name, U.Age
FROM Info U
INNER JOIN Score S ON U.User_Id = S.User_Id
WHERE MONTH(S.Date) = 5 --Only May Month.
GROUP BY S.User_Id, U.Name, U.Age
ORDER BY 2 DESC
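The join-then-aggregate approach from these answers can be sketched in Python's sqlite3 module. SQLite has no TOP or MONTH(), so this version swaps in LIMIT 3 and strftime('%m', ...); the users, ages, scores and dates are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Info (User_Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE Score (User_Id INTEGER, Score INTEGER, Date TEXT);
INSERT INTO Info VALUES (1, 'A', 20), (2, 'B', 21), (3, 'C', 22), (4, 'D', 23);
INSERT INTO Score VALUES
  (1, 5, '2023-05-01'), (1, 7, '2023-05-20'),  -- A: 12 in May
  (2, 9, '2023-05-03'), (2, 50, '2023-06-01'), -- B: 9 in May (June ignored)
  (3, 4, '2023-05-10'),                        -- C: 4 in May
  (4, 11, '2023-05-15');                       -- D: 11 in May
""")

# Sum each user's May scores, then keep the 3 highest totals.
top3 = conn.execute("""
SELECT I.User_Id, I.Name, I.Age, SUM(S.Score) AS MayScore
FROM Info AS I
INNER JOIN Score AS S ON S.User_Id = I.User_Id
WHERE strftime('%m', S.Date) = '05'
GROUP BY I.User_Id, I.Name, I.Age
ORDER BY MayScore DESC
LIMIT 3
""").fetchall()

print(top3)  # [(1, 'A', 20, 12), (4, 'D', 23, 11), (2, 'B', 21, 9)]
```

Like MONTH(S.[DATE]) = 5, the month filter matches May of every year; add a year condition if the table spans several years.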

Why Inner Join worked as Cross Join in SQL Server?

I am trying to join several tables using INNER JOIN.
Here is the code:
IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL
DROP TABLE #tmpRecData
--STEP 1
SELECT DISTINCT
pr.ChainID, pr.StoreID, pr.SupplierID, pr.ProductID,
MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
--ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO
#tmpRecData
FROM
dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN
ProductIdentifiers iden ON pr.ProductID = iden.ProductID
AND iden.ProductIdentifierTypeID = 2
WHERE
pr.ChainID = '119121'
AND pr.ActiveLastDate > '12/01/2016'
GROUP BY
pr.ProductID, pr.ProductName, iden.IdentifierValue,
pr.ChainID, pr.StoreID, pr.SupplierID
--STEP 2
SELECT
rec.ChainID, rec.StoreID, rec.SupplierInvoiceNumber,
rec.TransactionTypeID, rec.SupplierID, rec.SaleDateTime,
rec.ProductID, rec.UPC, rec.ProductDescriptionReported,
rec.RawProductIdentifier
FROM
#tmpRecData t
INNER JOIN
dbo.StoreTransactions AS rec WITH (NOLOCK) ON rec.ChainID = T.ChainID
WHERE
rec.ChainID = '119121'
DROP TABLE #tmpRecData
Step 1 returns 4,096 rows, step 2 joins them to 145,979 rows, and I am getting 725,077,693 rows (about 725 million).
This is a huge number of records, but I have used INNER JOIN, so why did it behave like a CROSS JOIN?
CROSS JOIN is very different to INNER JOIN.
INNER JOIN returns only the rows that have a match in both of the joined tables.
CROSS JOIN produces a Cartesian product of the tables in the join. The number of rows in the result is the number of rows in the first table multiplied by the number of rows in the second table.
You need to join on StoreID as well in step 2 for this to work. Every row on both sides has ChainID '119121', so joining on ChainID alone matches every temp-table row to every transaction for that chain, which is effectively a cross join; hence the huge number of records. If products also need to match, then you need to join on ProductID in step 2 as well.
IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL DROP TABLE #tmpRecData
--STEP 1
SELECT DISTINCT pr.ChainID,pr.StoreID,pr.SupplierID,pr.ProductID, MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
--ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO #tmpRecData
FROM dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN ProductIdentifiers iden
ON pr.ProductID=iden.ProductID
AND iden.ProductIdentifierTypeID=2
WHERE pr.ChainID='119121'
AND pr.ActiveLastDate>'12/01/2016'
GROUP BY pr.ProductID,pr.ProductName,iden.IdentifierValue,pr.ChainID,pr.StoreID,pr.SupplierID
--STEP 2
SELECT rec.ChainID,rec.StoreID,rec.SupplierInvoiceNumber,rec.TransactionTypeID,rec.SupplierID,rec.SaleDateTime,
rec.ProductID,rec.UPC,rec.ProductDescriptionReported,rec.RawProductIdentifier
FROM #tmpRecData t
INNER JOIN dbo.StoreTransactions AS rec WITH (NOLOCK)
ON rec.ChainID=T.ChainID and rec.StoreID = T.storeID
WHERE rec.ChainID='119121'
DROP TABLE #tmpRecData
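A small sketch of why the ChainID-only join degenerates into a cross join, using Python's sqlite3 module with hypothetical rows (SQLite stands in for SQL Server, and the temp table is simplified to one row per store):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmpRecData (ChainID TEXT, StoreID INTEGER);
CREATE TABLE StoreTransactions (ChainID TEXT, StoreID INTEGER);
-- every row carries the same ChainID, as after the WHERE filter in the question
INSERT INTO tmpRecData VALUES ('119121', 1), ('119121', 2), ('119121', 3);
INSERT INTO StoreTransactions VALUES
  ('119121', 1), ('119121', 1), ('119121', 2), ('119121', 3), ('119121', 3);
""")

# The join key is identical on every row, so every pair matches.
chain_only = conn.execute("""
SELECT COUNT(*) FROM tmpRecData t
INNER JOIN StoreTransactions rec ON rec.ChainID = t.ChainID
""").fetchone()[0]

cross = conn.execute(
    "SELECT COUNT(*) FROM tmpRecData CROSS JOIN StoreTransactions"
).fetchone()[0]

# Adding StoreID pairs each transaction only with its own store.
chain_and_store = conn.execute("""
SELECT COUNT(*) FROM tmpRecData t
INNER JOIN StoreTransactions rec
  ON rec.ChainID = t.ChainID AND rec.StoreID = t.StoreID
""").fetchone()[0]

print(chain_only, cross, chain_and_store)  # 15 15 5
```

In the real temp table there are several product rows per store, so the ProductID condition mentioned above is what keeps those from multiplying as well.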

SQL View Outer Apply Speed

I am seeing some strange query speed results when using a view with an OUTER APPLY. I am doing a distinct count on 2 different columns in the view: one returns in less than 0.1 seconds, the other takes 4-6 seconds. Is the second query slower because its column comes from the OUTER APPLY? If so, how could I speed this query up?
The fast distinct count is -
SELECT DISTINCT ISNULL([ItemType], 'N/A') AS Items FROM vwCustomerItemDetailsFull
The slow distinct count is -
SELECT DISTINCT ISNULL([CustomerName], 'N/A') AS Items FROM vwCustomerItemDetailsFull
The view is -
SELECT I.ItemID,
IT.Name AS ItemType,
CASE
WHEN CustomerItemEndDate IS NULL
OR CustomerItemEndDate > GETDATE() THEN CustomerItems.CustomerName
ELSE NULL
END AS CustomerName,
CASE
WHEN CustomerItemEndDate IS NULL
OR CustomerItemEndDate > GETDATE() THEN CustomerItems.CustomerNumber
ELSE NULL
END AS CustomerNumber,
CASE
WHEN CustomerItemEndDate IS NULL
OR CustomerItemEndDate > GETDATE() THEN CustomerItems.CustomerItemStartDate
ELSE NULL
END AS CustomerItemStartDate
FROM tblItems I
INNER JOIN tblItemTypes IT
ON I.ItemTypeID = IT.ItemTypeID
OUTER APPLY (SELECT TOP 1 CustomerName,
CustomerNumber,
StartDate AS CustomerItemStartDate,
EndDate AS CustomerItemEndDate
FROM tblCustomerItems CI
INNER JOIN tblCustomers C
ON C.CustomerID = CI.CustomerID
WHERE CI.ItemID = I.ItemID
ORDER BY EndDate DESC) AS CustomerItems
Check the execution plan; this speed difference is not strange at all. Because it is an OUTER APPLY rather than a CROSS APPLY, and within it you limit the results to TOP 1, the apply can neither add nor remove rows, so it has no influence on the number of rows the query returns or on the ItemType column.
Therefore, when you select from the view and don't use any columns from the OUTER APPLY, the optimiser is smart enough to know it doesn't need to execute it. So in essence your first query is:
SELECT DISTINCT ISNULL([ItemType], 'N/A') AS Items
FROM ( SELECT IT.Name AS ItemType
FROM tblItems I
INNER JOIN tblItemTypes IT
ON I.ItemTypeID = IT.ItemTypeID
) vw
Whereas your second query has to execute the outer apply.
I have previously posted a longer answer which could also be helpful.
EDIT
If you wanted to change your query to use a JOIN, it could be rewritten like this:
SELECT I.ItemID,
IT.Name AS ItemType,
CustomerName,
CustomerNumber,
CustomerItemStartDate
FROM tblItems I
INNER JOIN tblItemTypes IT
ON I.ItemTypeID = IT.ItemTypeID
LEFT JOIN
( SELECT ci.ItemID,
CustomerName,
CustomerNumber,
StartDate AS CustomerItemStartDate,
EndDate AS CustomerItemEndDate,
RN = ROW_NUMBER() OVER (PARTITION BY ci.ItemID ORDER BY EndDate DESC)
FROM tblCustomerItems CI
INNER JOIN tblCustomers C
ON C.CustomerID = CI.CustomerID
) AS CustomerItems
ON CustomerItems.ItemID = I.ItemID
AND CustomerItems.rn = 1
AND (CustomerItems.CustomerItemEndDate IS NULL OR CustomerItems.CustomerItemEndDate > GETDATE());
However, I don't think this will improve performance much, since you said the most costly part is the sort on EndDate, and for your first query it will hurt performance because the optimiser can no longer optimise out the outer apply.
I expect the best way to improve performance will be adding indexes. Without knowing your data size or distribution I can't accurately guess the exact index you need, but if you run the query on its own with the actual execution plan shown, SSMS will suggest an index for you, which would be better than my best guess.
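The "latest row per item, and only if still active" logic behind the JOIN rewrite can be checked with Python's sqlite3 module. This sketch uses a simplified, hypothetical schema (tblItemTypes and tblCustomers are left out), date('now') in place of GETDATE(), and ISO-formatted date strings so text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblItems (ItemID INTEGER PRIMARY KEY);
CREATE TABLE tblCustomerItems (ItemID INTEGER, CustomerName TEXT, EndDate TEXT);
INSERT INTO tblItems VALUES (1), (2), (3);
INSERT INTO tblCustomerItems VALUES
  (1, 'Old Co',  '2000-01-01'),
  (1, 'New Co',  '2999-01-01'),  -- latest assignment for item 1, still active
  (2, 'Gone Co', '2000-06-30');  -- latest assignment for item 2, already ended
""")

# Rank each item's assignments by EndDate, keep rank 1, and only while it
# is still active (mirrors the CASE expressions in the original view).
rows = conn.execute("""
SELECT I.ItemID, CI.CustomerName
FROM tblItems AS I
LEFT JOIN (
  SELECT ItemID, CustomerName, EndDate,
         ROW_NUMBER() OVER (PARTITION BY ItemID ORDER BY EndDate DESC) AS rn
  FROM tblCustomerItems
) AS CI
  ON CI.ItemID = I.ItemID
 AND CI.rn = 1
 AND (CI.EndDate IS NULL OR CI.EndDate > date('now'))
ORDER BY I.ItemID
""").fetchall()

print(rows)  # [(1, 'New Co'), (2, None), (3, None)]
```

Because the filters sit in the ON clause of a LEFT JOIN, item 2 keeps its row with a NULL customer instead of falling back to an older assignment, which matches what the OUTER APPLY TOP 1 plus CASE version returns.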

SQL - Selecting counts from multiple tables

Here is my problem (I'm using SQL Server)
I have a table of Students (StudentId, Firstname, Lastname, etc).
I have a table that records StudentAttendance (StudentId, ClassDate, etc.)
I record other student activity (I'm generalizing here for simplicity) such as a Papers table (StudentId, PaperId, etc.). There may be anywhere from zero to 20 papers turned in. Similarly, there is a table called Projects (StudentId, ProjectId, etc.). Same deal as with Papers.
What I'm trying to do is create a list of counts for students who have attendance over a certain level (say 10 attendances). Something like this:
ID Name Att Paper Proj
123 Baker 23 0 2
234 Charlie 26 5 3
345 Delta 13 3 0
Here is what I have:
select
s.StudentId,
s.Lastname,
COUNT(sa.StudentId) as CountofAttendance,
COUNT(p.StudentId) as CountofPapers
from Student s
inner join StudentAttendance sa on (s.StudentId = sa.StudentId)
left outer join Paper p on (s.StudentId = p.StudentId)
group by s.StudentId, s.Lastname
Having COUNT(sa.StudentId) > 10
order by CountofAttendance
If the CountofPapers column and the join (either inner or left outer) to the Papers table are commented out, the query works fine. I get a nice count of students who have attended at least 10 classes.
However, if I put in the CountofPapers and the join, things get crazy. With a left outer join, any students with papers just show their attendance count in the paper column. With an inner join, the attendance and paper counts seem to multiply each other.
Guidance needed and appreciated.
Dave
Look at using Common Table Expressions to divide and conquer your problem. BTW, you are off by 1 in your original query: HAVING COUNT(sa.StudentId) > 10 means a minimum of 11 attendances.
;
WITH GOOD_STUDENTS AS
(
-- this query defines all students with 10+ attendance
SELECT
S.StudentID
, count(1) AS attendance_count
FROM
Student S
inner join
StudentAttendance sa
on (s.StudentId = sa.StudentId)
GROUP BY
S.StudentId
HAVING
COUNT(1) >= 10
)
, STUDIOUS_STUDENTS AS
(
-- lather, rinse, repeat for other metrics
SELECT
S.StudentID
, count(1) AS paper_count
FROM
Student S
inner join
Papers P
on (s.StudentId = P.StudentId)
GROUP BY
S.StudentId
)
, GREGARIOUS_STUDENTS AS
(
SELECT
S.StudentID
, count(1) AS project_count
FROM
Student S
inner join
Projects P
on (s.StudentId = P.StudentId)
GROUP BY
S.StudentId
)
-- And now we roll it all together
SELECT
S.*
, G.attendance_count
, SS.paper_count
, GS.project_count
-- ad nauseam
FROM
-- back to the well on this one, so students
-- with no papers or projects still appear
Student S
-- INNER JOIN keeps only students with 10+ attendances
INNER JOIN
GOOD_STUDENTS G
ON G.studentId = S.studentId
LEFT OUTER JOIN
STUDIOUS_STUDENTS SS
ON SS.studentId = S.studentId
LEFT OUTER JOIN
GREGARIOUS_STUDENTS GS
ON GS.studentId = S.studentId
I see plenty of other answers rolling in, but I typed for far too long to quit ;)
The problem is that there are multiple papers per student, so each StudentAttendance row is repeated for every Paper row that joins to it, and the counts get multiplied. Try this:
select
s.StudentId,
s.Lastname,
(select COUNT(*) from StudentAttendance sa where sa.StudentId = s.StudentId) as CountofAttendance,
(select COUNT(*) from Paper p where p.StudentId = s.StudentId) as CountofPapers
from Student s
where (select COUNT(*) from StudentAttendance sa where sa.StudentId = s.StudentId) > 10
order by CountofAttendance
EDITED to incorporate issue with reference to CountofAttendance
BTW, this isn't the fastest solution, but it is the easiest to understand, which was my intention. You can avoid the re-calculation by joining to an aliased subquery, but as I said, this is the simplest.
Try this:
select std.StudentId, std.Lastname, att.AttCount, pap.PaperCount, prj.ProjCount
from Students std
left join
(
select StudentId, count(*) AttCount
from StudentAttendance
group by StudentId
) att on
std.StudentId = att.StudentId
left join
(
select StudentId, count(*) PaperCount
from Papers
group by StudentId
) pap on
std.StudentId = pap.StudentId
left join
(
select StudentId, count(*) ProjCount
from Projects
group by StudentId
) prj on
std.StudentId = prj.StudentId
where att.AttCount > 10
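To make the multiplication visible, here is a sketch in Python's sqlite3 module comparing the original two-join query with the pre-aggregated derived tables. The data is hypothetical (one student, 3 attendances, 2 papers), and the attendance threshold is dropped so the tiny sample returns a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Lastname TEXT);
CREATE TABLE StudentAttendance (StudentId INTEGER, ClassDate TEXT);
CREATE TABLE Paper (StudentId INTEGER, PaperId INTEGER);
INSERT INTO Student VALUES (1, 'Baker');
-- 3 attendances and 2 papers for the same student
INSERT INTO StudentAttendance VALUES (1, 'd1'), (1, 'd2'), (1, 'd3');
INSERT INTO Paper VALUES (1, 101), (1, 102);
""")

# Both detail tables joined at once: each attendance row pairs with each
# paper row, so both counts become 3 * 2 = 6.
naive = conn.execute("""
SELECT s.StudentId, COUNT(sa.StudentId), COUNT(p.StudentId)
FROM Student s
INNER JOIN StudentAttendance sa ON s.StudentId = sa.StudentId
LEFT OUTER JOIN Paper p ON s.StudentId = p.StudentId
GROUP BY s.StudentId
""").fetchone()

# Aggregate each table on its own first, then join one row per student.
pre_aggregated = conn.execute("""
SELECT s.StudentId, att.AttCount, COALESCE(pap.PaperCount, 0)
FROM Student s
LEFT JOIN (SELECT StudentId, COUNT(*) AS AttCount
           FROM StudentAttendance GROUP BY StudentId) att
  ON s.StudentId = att.StudentId
LEFT JOIN (SELECT StudentId, COUNT(*) AS PaperCount
           FROM Paper GROUP BY StudentId) pap
  ON s.StudentId = pap.StudentId
""").fetchone()

print(naive)           # (1, 6, 6)
print(pre_aggregated)  # (1, 3, 2)
```

COALESCE turns a missing paper count into 0, which matches the "0" entries in the desired output table.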

how to match two table column data and count the unmatched rows of the first table

Tbl_cdr(ano,starttime)
Tbl_User(id,mobileno)
I want to count the rows from Tbl_cdr with condition omitting the rows (when ano = mobileno) and group by starttime.
Any help, please?
select c.starttime, count(*)
from Tbl_cdr c
where not exists (select 1 from Tbl_User u where u.mobileno = c.ano)
group by c.starttime
select count(*), c.StartTime
from Tbl_cdr c
left join Tbl_User u on c.ano = u.mobileno
where u.id is null
group by c.StartTime
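Both answers are anti-joins and should give the same counts. Here is a quick check with Python's sqlite3 module on hypothetical rows. It assumes Tbl_User.id is a non-nullable key, which the LEFT JOIN ... IS NULL version relies on to tell unmatched rows apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tbl_User (id INTEGER PRIMARY KEY, mobileno TEXT);
CREATE TABLE Tbl_cdr (ano TEXT, starttime TEXT);
INSERT INTO Tbl_User VALUES (1, '555-0001');
INSERT INTO Tbl_cdr VALUES
  ('555-0001', '09:00'),  -- matches a user, so it is omitted
  ('555-0002', '09:00'),
  ('555-0003', '09:00'),
  ('555-0002', '10:00');
""")

# Variant 1: keep cdr rows for which no user has that mobile number.
not_exists = conn.execute("""
SELECT c.starttime, COUNT(*)
FROM Tbl_cdr c
WHERE NOT EXISTS (SELECT 1 FROM Tbl_User u WHERE u.mobileno = c.ano)
GROUP BY c.starttime
ORDER BY c.starttime
""").fetchall()

# Variant 2: unmatched cdr rows come back from the LEFT JOIN with u.id NULL.
left_join = conn.execute("""
SELECT c.starttime, COUNT(*)
FROM Tbl_cdr c
LEFT JOIN Tbl_User u ON c.ano = u.mobileno
WHERE u.id IS NULL
GROUP BY c.starttime
ORDER BY c.starttime
""").fetchall()

print(not_exists)  # [('09:00', 2), ('10:00', 1)]
print(left_join)   # [('09:00', 2), ('10:00', 1)]
```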
