SQL Server AVG and Excel AVERAGE producing different results?

I'm trying to show averages in SQL Server, but when I check the data in Excel the results are not the same. There must be something obvious I am missing.
Here are the code and results from SQL Server:
SELECT DISTINCT
d.d_reference + ' - ' + d.d_name AS Faculty,
AVG(sub.GroupSize) AS FacultyAverage
FROM
unitesnapshot.dbo.capd_register r
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_activity a ON a.a_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_moduleactivity ma ON ma.ma_activity = a.a_id
INNER JOIN unitesnapshot.dbo.capd_module m ON m.m_id = ma.ma_activitymodule
INNER JOIN unitesnapshot.dbo.capd_department d ON d.d_id = m.m_moduledept
INNER JOIN unitesnapshot.dbo.capd_section sec ON sec.s_id = m.m_modulesection
INNER JOIN (SELECT
r.r_reference,
COUNT(DISTINCT s.s_studentreference) AS GroupSize
FROM
unitesnapshot.dbo.capd_student s
INNER JOIN unitesnapshot.dbo.capd_person p ON p.p_id = s.s_id
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_student = p.p_id
INNER JOIN unitesnapshot.dbo.capd_register r ON r.r_id = sr.sr_register
GROUP BY
r.r_reference) sub ON sub.r_reference = r.r_reference
WHERE
SUBSTRING(r.r_reference,4,2) = '12' AND
d.d_reference = '730'
GROUP BY
d.d_reference,
d.d_name
Here are the results in Excel:
Thanks

Try this for fun:
select avg(a)
from
(values(1),(2),(3),(4)) x(a);
avg(a)
-------
2
AVG() returns the same datatype as the base column. If your columns are of type int, then the result will be truncated to an int as well. The below returns the "correct" result.
select avg(cast(a as decimal(10,5)))
from
(values(1),(2),(3),(4)) x(a);
result
--------
2.500000
The discrepancy you are showing (24 vs 19.50484) will most likely involve another error in conjunction with this. For example, to check that you are summing up the same data in Excel as in SQL Server, dump this result into Excel and sum it up. If it doesn't match what you currently believe is the Excel equivalent of the SQL Server data, line the columns up and check they have the same number of rows. Then sort each column individually by value ASCENDING and compare again.
SELECT d.d_name, sub.GroupSize AS FacultyAverage
FROM unitesnapshot.dbo.capd_register r
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_activity a ON a.a_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_moduleactivity ma ON ma.ma_activity = a.a_id
INNER JOIN unitesnapshot.dbo.capd_module m ON m.m_id = ma.ma_activitymodule
INNER JOIN unitesnapshot.dbo.capd_department d ON d.d_id = m.m_moduledept
INNER JOIN unitesnapshot.dbo.capd_section sec ON sec.s_id = m.m_modulesection
INNER JOIN (SELECT r.r_reference,
COUNT(DISTINCT s.s_studentreference) AS GroupSize
FROM unitesnapshot.dbo.capd_student s
INNER JOIN unitesnapshot.dbo.capd_person p ON p.p_id = s.s_id
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_student = p.p_id
INNER JOIN unitesnapshot.dbo.capd_register r ON r.r_id = sr.sr_register
GROUP BY r.r_reference) sub ON sub.r_reference = r.r_reference
WHERE SUBSTRING(r.r_reference,4,2) = '12' AND d.d_reference = '730'
ORDER BY d.d_name
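One possible source of the remaining gap, beyond integer truncation, is row multiplication: the main query joins each register to its student registrations and activities, so a register may contribute its GroupSize to the average more than once. The sketch below uses made-up values (the register names 'A', 'B', 'C' and their sizes are assumptions, not data from the question) to show how duplicated rows shift AVG, and how collapsing to one row per register changes the result.

```sql
-- Hypothetical data: three registers with group sizes 10, 20 and 30.
-- Register 'C' appears twice, as it would after joining to a table
-- that holds two matching rows for it.
SELECT AVG(CAST(x.GroupSize AS decimal(10, 2))) AS AvgOverJoinedRows
FROM (VALUES ('A', 10), ('B', 20), ('C', 30), ('C', 30)) x(r_reference, GroupSize);
-- (10 + 20 + 30 + 30) / 4 = 22.5

-- Collapsing to one row per register first gives the per-register average.
SELECT AVG(CAST(d.GroupSize AS decimal(10, 2))) AS AvgPerRegister
FROM (
    SELECT DISTINCT x.r_reference, x.GroupSize
    FROM (VALUES ('A', 10), ('B', 20), ('C', 30), ('C', 30)) x(r_reference, GroupSize)
) d;
-- (10 + 20 + 30) / 3 = 20
```

If the Excel sheet holds one row per register while the SQL query averages over joined rows, the two figures would differ in the same way.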

Related

Snowflake Inner join with same table

I am trying to run this query in Snowflake, but I get huge row counts from the last three inner joins, which all join the same table under different conditions.
select count(*) from table2; --5
select count(*) from table_3;--2824134
select count(*) from table1;--478015
Original Query:
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 Zero on Zero.ID_I = r.id and Zero.time <= d_tbl.starttime and Zero.typeid in (7,19)
inner join table_3 first on first.ID_I = r.id and first.time <= Zero.time and first.typeid in (8,9)
inner join table_3 second on second.ID_I = r.id and second.time >= d_tbl.endtime and second.typeid in (8,9)
where d_tbl.mode = 0;
I tried breaking the queries into 3 parts.
create temp table tb1 as
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number ;
create temp table tb2 as
select ID_I, time as Zero_time, time as first_time, time as second_time
from table_3
where typeid in (8,9,7,19)
Note: saving the time column with different names for reference.
create temp table final_table as
select * from tb1 r
inner join tb2
on tb2.ID_I = r.id
where tb2.Zero_time <= r.starttime
and tb2.first_time <= tb2.Zero_time
and tb2.second_time >= r.endtime
Basically, I am trying to split the join conditions into separate steps.
The same logic has to be applied to several other tables, with a union all to build the final table.
Would this work, or is there a better approach that executes faster?
TIA.

Try converting the following:
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 Zero on Zero.ID_I = r.id
AND Zero.time <= d_tbl.starttime and Zero.typeid in (7,19)
inner join table_3 first on first.ID_I = r.id
AND first.time <= Zero.time and first.typeid in (8,9)
inner join table_3 second on second.ID_I = r.id
AND second.time >= d_tbl.endtime and second.typeid in (8,9)
where d_tbl.mode = 0;
To something like this:
select d_tbl.*, -- replace with the columns you need
case when t3.time <= d_tbl.starttime
AND t3.typeid in (7,19) then t3.time end as zero_time,
case when t3.time <= zero_time
AND t3.typeid in (8,9) then t3.time end as first_time, -- Snowflake allows referencing a previously defined column alias
case when t3.time >= d_tbl.endtime
AND t3.typeid in (8,9) then t3.time end as second_time
from
table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 t3 on t3.ID_I = r.id
where d_tbl.mode = 0;
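This reduces the data set being searched, because table_3, the table with the most records, is joined once instead of three times. Note that each output row then reflects a single table_3 row, so the three values may still need to be combined per ID_I (for example by aggregating) to reproduce the original result.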
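An alternative that keeps the original three-way join semantics is to filter table_3 by typeid once, up front, and join the smaller sets. This is only a sketch: the CTE names zero_events and close_events and the aliases z, f and s are made up for illustration, and whether it runs faster depends on how Snowflake already optimizes the original query. The range conditions on time can still produce many row combinations.

```sql
-- Pre-filter table_3 so each join scans only the relevant event types.
with zero_events as (
    select ID_I, time
    from table_3
    where typeid in (7, 19)
),
close_events as (
    select ID_I, time
    from table_3
    where typeid in (8, 9)
)
select d_tbl.*,
       z.time as zero_time,
       f.time as first_time,
       s.time as second_time
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join zero_events z on z.ID_I = r.id and z.time <= d_tbl.starttime
inner join close_events f on f.ID_I = r.id and f.time <= z.time
inner join close_events s on s.ID_I = r.id and s.time >= d_tbl.endtime
where d_tbl.mode = 0;
```

Comparing count(*) for this query against the original is a quick way to confirm that both return the same number of rows before measuring run time.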

Obtain Distinct top 1 columns in SQL Server

I am writing a stored procedure for a project in SQL Server 2014 and I have this code:
ALTER PROCEDURE FOF_MejorVendedor
AS
BEGIN
SELECT TOP 1
F.Nombre, Em.Nombre, (P.Precio * CA.Cantidad) as 'Ganancia'
FROM
dbo.FO_Carrito CA
JOIN
dbo.FO_Solicitud S on S.ID = CA.FK_SolicitudC
JOIN
dbo.FO_Recibo R ON R.FK_Solicitud = S.ID
JOIN
dbo.FO_Productos P ON P.ID = CA.FK_ProductosC
JOIN
dbo.FO_Cliente C ON C.ID = S.FK_Cliente
JOIN
dbo.FO_Estante E ON E.FK_Producto = P.ID
JOIN
dbo.FO_PasilloXDepartamento PD ON PD.FK_Estante = E.NumeroEstante
JOIN
dbo.FO_Encargado En ON En.ID = PD.FK_Encargado
JOIN
dbo.FO_Empleado Em ON Em.ID = En.FK_EmpleadoE
JOIN
dbo.FO_Departamento D ON D.ID = PD.FK_Departamento
JOIN
dbo.FO_Ferreteria F ON D.FK_Ferreteria = F.ID
JOIN
dbo.FO_EmpleadosXFerreteria EF ON EF.FK_Ferreterias = F.ID
GROUP BY
F.Nombre, Em.Nombre, (P.Precio * CA.Cantidad)
ORDER BY
Ganancia DESC
END
With this I only get the single top row by 'Ganancia', but I want the top row for each distinct value of F.Nombre. How can I modify the query?

You are getting only one row because of the TOP 1 clause. Remove it, and the GROUP BY will return one row for each distinct combination of the grouped columns.
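If the goal is exactly one top row per F.Nombre rather than every grouped row, a common pattern is to rank rows within each group with ROW_NUMBER() and keep rank 1. The sketch below is self-contained: the Ventas rows are made-up sample data standing in for the result of the joins in the question, and the names Ventas, Ranked and rn are assumptions for illustration.

```sql
-- Made-up rows standing in for the joined result in the question.
WITH Ventas AS (
    SELECT v.Ferreteria, v.Empleado, v.Ganancia
    FROM (VALUES
        ('Ferreteria Norte', 'Ana',   500),
        ('Ferreteria Norte', 'Luis',  750),
        ('Ferreteria Sur',   'Marta', 300),
        ('Ferreteria Sur',   'Pedro', 120)
    ) v(Ferreteria, Empleado, Ganancia)
),
Ranked AS (
    -- Number the rows within each store, highest Ganancia first.
    SELECT Ferreteria, Empleado, Ganancia,
           ROW_NUMBER() OVER (PARTITION BY Ferreteria
                              ORDER BY Ganancia DESC) AS rn
    FROM Ventas
)
SELECT Ferreteria, Empleado, Ganancia
FROM Ranked
WHERE rn = 1;
-- Returns Luis (750) for Ferreteria Norte and Marta (300) for Ferreteria Sur.
```

In the stored procedure, the body of Ventas would be replaced by the joined query, with PARTITION BY on F.Nombre. ROW_NUMBER() picks one row arbitrarily when two rows tie on Ganancia; RANK() would keep all tied rows instead.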

How to improve SQL Server performance issue with hash match right outer join

I am new to performance issues. So I am not sure of what my approach should be.
This is the query that is taking over 7 minutes to run.
INSERT INTO SubscriberToEncounterMapping(PatientEncounterID, InsuranceSubscriberID)
SELECT
PV.PatientVisitId AS PatientEncounterID,
InsSub.InsuranceSubscriberID
FROM
DB1.dbo.PatientVisit PV
JOIN
DB1.dbo.PatientVisitInsurance PVI ON PV.PatientVisitId = PVI.PatientVisitId
JOIN
DB1.dbo.PatientInsurance PatIns on PatIns.PatientInsuranceId = PVI.PatientInsuranceId
JOIN
DB1.dbo.PatientProfile PP On PP.PatientProfileId = PatIns.PatientProfileId
LEFT OUTER JOIN
DB1.dbo.Guarantor G ON PatIns.PatientProfileId = G.PatientProfileId
JOIN
Warehouse.dbo.InsuranceSubscriber InsSub ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId
AND InsSub.OrderForClaims = PatIns.OrderForClaims
AND ((InsSub.GuarantorID = G.GuarantorId) OR (InsSub.GuarantorID IS NULL AND G.GuarantorId IS NULL))
JOIN
Warehouse.dbo.Encounter E ON E.PatientEncounterID = PV.PatientVisitId
The execution plan states that there is a Hash Match Right Outer Join that costs 89% of the query.
There is not a right outer join in the query so I don't see where the problem is.
How can I make the query more efficient?
Here is the Hash Map Detail:

To elaborate on my comment you could try splitting it into two queries, the first to match on GuarantorID and the second to match when it is NULL in InsuranceSubscriber, and in Guarantor, or if the record is missing completely from Guarantor:
INSERT INTO SubscriberToEncounterMapping(PatientEncounterID, InsuranceSubscriberID)
SELECT PV.PatientVisitId AS PatientEncounterID, InsSub.InsuranceSubscriberID
FROM DB1.dbo.PatientVisit PV
JOIN DB1.dbo.PatientVisitInsurance PVI
ON PV.PatientVisitId = PVI.PatientVisitId
JOIN DB1.dbo.PatientInsurance PatIns
ON PatIns.PatientInsuranceId = PVI.PatientInsuranceId
JOIN DB1.dbo.PatientProfile PP
ON PP.PatientProfileId = PatIns.PatientProfileId
JOIN DB1.dbo.Guarantor G
ON PatIns.PatientProfileId = G.PatientProfileId
JOIN Warehouse.dbo.InsuranceSubscriber InsSub
ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId
AND InsSub.OrderForClaims = PatIns.OrderForClaims
AND InsSub.GuarantorID = G.GuarantorId
JOIN Warehouse.dbo.Encounter E
ON E.PatientEncounterID = PV.PatientVisitId
UNION ALL
SELECT PV.PatientVisitId AS PatientEncounterID, InsSub.InsuranceSubscriberID
FROM DB1.dbo.PatientVisit PV
JOIN DB1.dbo.PatientVisitInsurance PVI
ON PV.PatientVisitId = PVI.PatientVisitId
JOIN DB1.dbo.PatientInsurance PatIns
ON PatIns.PatientInsuranceId = PVI.PatientInsuranceId
JOIN DB1.dbo.PatientProfile PP
ON PP.PatientProfileId = PatIns.PatientProfileId
JOIN Warehouse.dbo.InsuranceSubscriber InsSub
ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId
AND InsSub.OrderForClaims = PatIns.OrderForClaims
AND InsSub.GuarantorID IS NULL
JOIN Warehouse.dbo.Encounter E
ON E.PatientEncounterID = PV.PatientVisitId
WHERE NOT EXISTS
( SELECT 1
FROM DB1.dbo.Guarantor G
WHERE PatIns.PatientProfileId = G.PatientProfileId
AND G.GuarantorId IS NOT NULL
);

I would re-order the joins so that the ones that eliminate the most rows come first; the earlier the row count is reduced, the less work the later joins have to do. Then perform the outer join last. Locking can also slow the query down. Adding WITH (NOLOCK) stops the query from waiting on locked rows, but it allows dirty reads, so use it only if reading uncommitted data is acceptable.
Perhaps something like this would work with a little tweaking.
INSERT INTO SubscriberToEncounterMapping (
PatientEncounterID
, InsuranceSubscriberID
)
SELECT PV.PatientVisitId AS PatientEncounterID
, InsSub.InsuranceSubscriberID
FROM DB1.dbo.PatientVisit PV WITH (NOLOCK)
INNER JOIN Warehouse.dbo.Encounter E WITH (NOLOCK)
ON E.PatientEncounterID = PV.PatientVisitId
INNER JOIN DB1.dbo.PatientVisitInsurance PVI WITH (NOLOCK)
ON PV.PatientVisitId = PVI.PatientVisitId
INNER JOIN DB1.dbo.PatientInsurance PatIns WITH (NOLOCK)
ON PatIns.PatientInsuranceId = PVI.PatientInsuranceId
INNER JOIN DB1.dbo.PatientProfile PP WITH (NOLOCK)
ON PP.PatientProfileId = PatIns.PatientProfileId
INNER JOIN Warehouse.dbo.InsuranceSubscriber InsSub WITH (NOLOCK)
ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId
AND InsSub.OrderForClaims = PatIns.OrderForClaims
LEFT JOIN DB1.dbo.Guarantor G WITH (NOLOCK)
ON PatIns.PatientProfileId = G.PatientProfileId
AND (
(InsSub.GuarantorID = G.GuarantorId)
OR (
InsSub.GuarantorID IS NULL
AND G.GuarantorId IS NULL
)
)

How to group row value using SQL Server?

I want to group rows that have the same yAxisTitle in SQL Server. The image below shows my data.
Expected result:
Query I used:
select
q.questionId, q.questionName,
p.perspectiveTitle, x.xAxisTitle, y.yAxisTitle, c.value
from
coaching_questionPerspectiveMap as c
inner join
Coaching_question as q on c.questionId = q.questionId
inner join
Coaching_perspective as p on c.perspectiveId = p.perspectiveId
inner join
coaching_xAxisData x on c.xAxisDataId = x.xAxisDataId
inner join
coaching_yAxisData y on c.yAxisDataId = y.yAxisDataId
where
q.questionId = 14
and p.perspectiveId = 1
order by
c.sort
Can anyone suggest a solution?
Thanks,

If you just want the data ordered so that it shows in groups of yAxisTitle, use this:
select
q.questionId, q.questionName,
p.perspectiveTitle, x.xAxisTitle, y.yAxisTitle, c.value
from
coaching_questionPerspectiveMap as c
inner join
Coaching_question as q on c.questionId = q.questionId
inner join
Coaching_perspective as p on c.perspectiveId = p.perspectiveId
inner join
coaching_xAxisData x on c.xAxisDataId = x.xAxisDataId
inner join
coaching_yAxisData y on c.yAxisDataId = y.yAxisDataId
where
q.questionId = 14
and p.perspectiveId = 1
order by
y.yAxisTitle, c.sort

Select only columns from joined tables from CTE

The following is my CTE:
;WITH CTE AS
(SELECT O.*, E.Num, E.Amount
FROM OData O
INNER JOIN Equip E
ON O.Name = E.Name)
SELECT * FROM CTE -- gives results I want to join to
The following is the query that I want to SELECT from (and I only want this SELECT statement's columns in my query results):
SELECT
MU.Type
,MU.Num
,MU.MTBUR
,MF.MTBF
,MU.Hours
,MF.Hours
FROM
MUType_Stage MU
INNER JOIN
MFType_Stage MF
ON
MU.Type = MF.Type
AND
MU.Num = MF.Num
-- Need to JOIN to the CTE right here
INNER JOIN
Status_STAGE S
ON
MU.Nu = S.Part
LEFT OUTER JOIN
RCN N
ON
N.Name = R.Part
LEFT OUTER JOIN
Repair RR
ON
R.ACSS_Name = RR.Name
So basically I need to JOIN to the CTE inside the SELECT query whose results I want.
Alternatively, I could use this SELECT statement to join to the CTE, but return only the columns selected in the second SELECT statement.

Try this syntax
WITH CTE
AS (SELECT O.*,
E.Num,
E.Amount
FROM OData O
INNER JOIN Equip E
ON O.Name = E.Name)
SELECT MU.Type,
MU.Num,
MU.MTBUR,
MF.MTBF,
MU.Hours,
MF.Hours
FROM MUType_Stage MU
INNER JOIN MFType_Stage MF
ON MU.Type = MF.Type
AND MU.Num = MF.Num
INNER JOIN CTE C --- JOIN HERE as like other tables
ON C.Num = MF.Num
INNER JOIN Status_STAGE S
ON MU.Nu = S.Part
LEFT OUTER JOIN RCN N
ON N.Name = R.Part
LEFT OUTER JOIN Repair RR
ON R.ACSS_Name = RR.Name
