Snowflake: inner join with the same table

I am trying to port this query to Snowflake, but the last three inner joins, which all join the same table (table_3) with different conditions, return a huge number of rows.
Row counts:
select count(*) from table2; --5
select count(*) from table_3;--2824134
select count(*) from table1;--478015
Original Query:
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 Zero on Zero.ID_I = r.id and Zero.time <= d_tbl.starttime and Zero.typeid in (7,19)
inner join table_3 first on first.ID_I = r.id and first.time <= Zero.time and first.typeid in (8,9)
inner join table_3 second on second.ID_I = r.id and second.time >= d_tbl.endtime and second.typeid in (8,9)
where d_tbl.mode = 0;
I tried breaking the query into three parts:
create temp table tb1 as
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number ;
create temp table tb2 as
select ID_I, time as Zero_time, time as first_time, time as second_time
from table_3
where typeid in (8,9,7,19);
Note: the time column is saved under different names for reference.
create temp table final_table as
select * from tb1 r
inner join tb2
on tb2.ID_I = r.id
where tb2.Zero_time <= r.starttime
and tb2.first_time <= tb2.Zero_time
and tb2.second_time >= r.endtime;
Basically, I am trying to split the join conditions into separate steps.
The same logic has to be applied to several other tables, with the results combined using UNION ALL for the final table.
Would this work, or is there a better approach that executes faster?
Thanks in advance.

Try converting the following:
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 Zero on Zero.ID_I = r.id
AND Zero.time <= d_tbl.starttime and Zero.typeid in (7,19)
inner join table_3 first on first.ID_I = r.id
AND first.time <= Zero.time and first.typeid in (8,9)
inner join table_3 second on second.ID_I = r.id
AND second.time >= d_tbl.endtime and second.typeid in (8,9)
where d_tbl.mode = 0;
into something like this:
select whatever-columns,
case when t3.time <= d_tbl.starttime
AND t3.typeid in (7,19) then t3.time end as zero_time,
case when t3.time <= zero_time
AND t3.typeid in (8,9) then t3.time end as first_time, -- Snowflake lets you reference a column alias defined earlier in the same SELECT list
case when t3.time >= d_tbl.endtime
AND t3.typeid in (8,9) then t3.time end as second_time
from
table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 t3 on t3.ID_I = r.id
where d_tbl.mode = 0;
This reduces the data set being scanned, because the largest table is joined once instead of three times.
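To see why the row counts explode, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for Snowflake (dialect details differ, and all data are invented). For one id, the self-join returns (rows matching the Zero conditions) × (rows matching the first conditions) × (rows matching the second conditions), while the single-scan form returns one row per table_3 row. The two shapes are not interchangeable: in the single-scan form each CASE column only sees its own row, so a comparison between two different table_3 rows (such as first.time <= Zero.time) would need an extra step, for example an aggregate or window function over t3.

```python
import sqlite3

# Hypothetical toy data, invented for illustration: one table1 row, one
# table2 row, and seven table_3 rows that all share the same id (10).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (number INTEGER, starttime INTEGER, endtime INTEGER, mode INTEGER);
CREATE TABLE table2 (number INTEGER, id INTEGER);
CREATE TABLE table_3 (ID_I INTEGER, time INTEGER, typeid INTEGER);
INSERT INTO table1 VALUES (1, 100, 200, 0);
INSERT INTO table2 VALUES (1, 10);
INSERT INTO table_3 VALUES
  (10, 50, 7), (10, 60, 19),              -- 2 rows matching the Zero conditions
  (10, 10, 8), (10, 20, 9), (10, 30, 8),  -- 3 rows matching the first conditions
  (10, 250, 9), (10, 300, 8);             -- 2 rows matching the second conditions
""")

# Original shape: table_3 joined three times (aliases shortened to z/f/s).
# Every combination of matching rows becomes one output row.
self_join_rows = conn.execute("""
SELECT COUNT(*)
FROM table1 d_tbl
JOIN table2 r ON r.number = d_tbl.number
JOIN table_3 z ON z.ID_I = r.id AND z.time <= d_tbl.starttime AND z.typeid IN (7, 19)
JOIN table_3 f ON f.ID_I = r.id AND f.time <= z.time AND f.typeid IN (8, 9)
JOIN table_3 s ON s.ID_I = r.id AND s.time >= d_tbl.endtime AND s.typeid IN (8, 9)
WHERE d_tbl.mode = 0
""").fetchone()[0]

# Single-scan shape: table_3 joined once, one output row per table_3 row.
single_scan_rows = conn.execute("""
SELECT COUNT(*)
FROM table1 d_tbl
JOIN table2 r ON r.number = d_tbl.number
JOIN table_3 t3 ON t3.ID_I = r.id
WHERE d_tbl.mode = 0
""").fetchone()[0]

print(self_join_rows, single_scan_rows)  # 12 7
```

Here 2 × 3 × 2 = 12 joined rows come from only 7 source rows; with millions of table_3 rows per table the same multiplication produces the huge counts.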

Related

SQL Server 2005 Select Data From Table1 and Table2 but if Table2 column1 value is null Select Data From Table3

My query is:
SELECT TblPharmacyBillingDetails.UPBNo, TblMasterBillingData.IPDNo, InPatRegistration.PatTitle+PatientName, TblPharmacyBillingDetails.InvoiceNo, TblPharmacyBillingDetails.InvoiceDateTime, TblPharmacyBillingDetails.BillingAmount
FROM TblPharmacyBillingDetails
INNER JOIN TblMasterBillingData ON TblPharmacyBillingDetails.UPBNo = TblMasterBillingData.UPBNo
INNER JOIN InPatRegistration ON TblMasterBillingData.IPDNo = InPatRegistration.IPDNo
But if TblMasterBillingData.IPDNo is NULL, I want to use TblMasterBillingData.OPDNo instead, with this join:
INNER JOIN OutPatRegistration ON TblMasterBillingData.OPDNo = OutPatRegistration.OPDNo
Method #1: Using UNION ALL
SELECT * FROM
(
SELECT TblPharmacyBillingDetails.UPBNo,
TblMasterBillingData.IPDNo,
InPatRegistration.PatTitle + PatientName AS PatientFullName,
TblPharmacyBillingDetails.InvoiceNo,
TblPharmacyBillingDetails.InvoiceDateTime,
TblPharmacyBillingDetails.BillingAmount
FROM TblPharmacyBillingDetails
INNER JOIN TblMasterBillingData ON TblPharmacyBillingDetails.UPBNo = TblMasterBillingData.UPBNo
INNER JOIN InPatRegistration ON TblMasterBillingData.IPDNo = InPatRegistration.IPDNo
WHERE TblMasterBillingData.IPDNo IS NOT NULL
UNION ALL
SELECT TblPharmacyBillingDetails.UPBNo,
TblMasterBillingData.OPDNo,
OutPatRegistration.PatTitle + PatientName,
TblPharmacyBillingDetails.InvoiceNo,
TblPharmacyBillingDetails.InvoiceDateTime,
TblPharmacyBillingDetails.BillingAmount
FROM TblPharmacyBillingDetails
INNER JOIN TblMasterBillingData ON TblPharmacyBillingDetails.UPBNo = TblMasterBillingData.UPBNo
INNER JOIN OutPatRegistration ON TblMasterBillingData.OPDNo = OutPatRegistration.OPDNo
WHERE TblMasterBillingData.IPDNo IS NULL -- use OPDNo only when there is no IPDNo
) Tmp
ORDER BY Tmp.UPBNo
Method #2: Using ISNULL and LEFT JOIN
SELECT TblPharmacyBillingDetails.UPBNo,
ISNULL(TblMasterBillingData.IPDNo,TblMasterBillingData.OPDNo),
ISNULL(IP.PatTitle + IP.PatientName, OP.PatTitle + OP.PatientName),
TblPharmacyBillingDetails.InvoiceNo,
TblPharmacyBillingDetails.InvoiceDateTime,
TblPharmacyBillingDetails.BillingAmount
FROM TblPharmacyBillingDetails
INNER JOIN TblMasterBillingData ON TblPharmacyBillingDetails.UPBNo = TblMasterBillingData.UPBNo
LEFT JOIN InPatRegistration IP ON TblMasterBillingData.IPDNo = IP.IPDNo
LEFT JOIN OutPatRegistration OP ON TblMasterBillingData.OPDNo = OP.OPDNo
ORDER BY TblPharmacyBillingDetails.UPBNo
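Both methods can be compared on a few invented rows with Python's sqlite3 module standing in for SQL Server. Assumptions: the tables are trimmed to the columns needed, SQLite concatenates strings with || rather than +, and SQLite spells ISNULL(x, y) as IFNULL(x, y), so COALESCE(), which both SQLite and SQL Server support, is used instead. The UNION ALL branch for out-patients is limited to rows where IPDNo IS NULL, so each bill appears once, following the question's "if IPDNo is NULL" rule.

```python
import sqlite3

# Hypothetical data: bill 1 belongs to an in-patient, bill 2 to an
# out-patient (its IPDNo is NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblPharmacyBillingDetails (UPBNo INTEGER, InvoiceNo TEXT);
CREATE TABLE TblMasterBillingData (UPBNo INTEGER, IPDNo TEXT, OPDNo TEXT);
CREATE TABLE InPatRegistration (IPDNo TEXT, PatTitle TEXT, PatientName TEXT);
CREATE TABLE OutPatRegistration (OPDNo TEXT, PatTitle TEXT, PatientName TEXT);
INSERT INTO TblPharmacyBillingDetails VALUES (1, 'INV-1'), (2, 'INV-2');
INSERT INTO TblMasterBillingData VALUES (1, 'IP1', NULL), (2, NULL, 'OP2');
INSERT INTO InPatRegistration VALUES ('IP1', 'Mr ', 'Smith');
INSERT INTO OutPatRegistration VALUES ('OP2', 'Ms ', 'Jones');
""")

# Method #1 shape: one branch per registration table, combined with UNION ALL.
union_rows = conn.execute("""
SELECT * FROM (
  SELECT b.UPBNo, m.IPDNo AS RegNo, ip.PatTitle || ip.PatientName AS Patient, b.InvoiceNo
  FROM TblPharmacyBillingDetails b
  JOIN TblMasterBillingData m ON b.UPBNo = m.UPBNo
  JOIN InPatRegistration ip ON m.IPDNo = ip.IPDNo
  WHERE m.IPDNo IS NOT NULL
  UNION ALL
  SELECT b.UPBNo, m.OPDNo, op.PatTitle || op.PatientName, b.InvoiceNo
  FROM TblPharmacyBillingDetails b
  JOIN TblMasterBillingData m ON b.UPBNo = m.UPBNo
  JOIN OutPatRegistration op ON m.OPDNo = op.OPDNo
  WHERE m.IPDNo IS NULL
) AS Tmp
ORDER BY Tmp.UPBNo
""").fetchall()

# Method #2 shape: LEFT JOIN both registration tables, then take the first
# non-NULL value (NULL || 'text' is NULL, so the fallback kicks in).
coalesce_rows = conn.execute("""
SELECT b.UPBNo,
       COALESCE(m.IPDNo, m.OPDNo),
       COALESCE(ip.PatTitle || ip.PatientName, op.PatTitle || op.PatientName),
       b.InvoiceNo
FROM TblPharmacyBillingDetails b
JOIN TblMasterBillingData m ON b.UPBNo = m.UPBNo
LEFT JOIN InPatRegistration ip ON m.IPDNo = ip.IPDNo
LEFT JOIN OutPatRegistration op ON m.OPDNo = op.OPDNo
ORDER BY b.UPBNo
""").fetchall()

print(union_rows)  # [(1, 'IP1', 'Mr Smith', 'INV-1'), (2, 'OP2', 'Ms Jones', 'INV-2')]
```

Method #2 reads each base table once, which is usually the simpler choice; Method #1 keeps each branch's joins as plain INNER JOINs.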
You can use either a CASE expression or the ISNULL() function, as in the demo query below.
SELECT
Orders.OrderID,
Case when Customers1.CustomerName is null then Customers2.CustomerName else Customers1.CustomerName
end as CustomerName, --way 1
ISNULL(Customers1.CustomerName, Customers2.CustomerName) as Customer, --way 2
Orders.OrderDate
FROM Orders
LEFT JOIN Customers1 ON Orders.CustomerID = Customers1.CustomerID
LEFT JOIN Customers2 ON Orders.CustomerID = Customers2.CustomerID -- LEFT JOINs, so a missing Customers1 row yields NULL and falls back to Customers2
-- where your condition here
-- order by your column name
You can also check whether matching data exists in a table, and join that table accordingly, using IF EXISTS:
if exists(select 1 from tablename where columnname = <your values>)

Multiple Nested Inner Joins: not all records are shown

I have difficulty joining tables that look like the following.
The main table, PMEOBJECT, has a unique key named OBJECTID and 12768 rows in total.
Then I want to join PMEOBJECTVALIDITY, which has an n:1 relationship with PMEOBJECT. It has more rows because it stores the changes to PMEOBJECT over time (i.e. when a certain object is no longer valid): 12789 rows, meaning only 21 objects changed over time. However, I only want the latest VALIDFROM date shown in the query. This all works fine.
The trouble starts when I join PMEOBJECTDIMENSION, which has an n:1 relationship with PMEOBJECTVALIDITY and has 36737 rows in total.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
LEFT JOIN PMEOBJECTDIMENSION
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
Results in query per step:
SELECT PMEOBJECT: 12768 rows
LEFT JOIN PMEVALIDITY: 12789 rows
INNER JOIN PMEVALIDITY: 12768 rows
LEFT JOIN PMEOBJECTDIMENSION: 36737 rows
INNER JOIN PMEOBJECTDIMENSION: 12729 rows
I want the end result to have the same 12768 rows again; no OBJECTID should be left out.
What am I missing here?
Kind regards,
Igor
The following might help. From PMEOBJECTDIMENSION onwards, replace the joins with:
LEFT JOIN (SELECT PMEOBJECTDIMENSION.OBJECTVALIDITYID, PMEOBJECTDIMENSION.DATAAREAID, PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECTDIMENSION
INNER JOIN(SELECT OBJECTVALIDITYID, MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
) X
ON X.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND X.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
In the outer SELECT, reference X.DIMENSION2_ instead of PMEOBJECTDIMENSION.DIMENSION2_, and select DISTINCT records if duplicates are present.
The INNER JOINs are filtering out records. What you want is for the LEFT JOINed tables (PMEOBJECTVALIDITY and PMEOBJECTDIMENSION) to include only records that have at least one match in the INNER JOIN subqueries (aliases B and C). You can accomplish this by nesting the INNER JOIN inside the LEFT JOIN, generally done as follows:
SELECT *
FROM A
LEFT JOIN B
INNER JOIN C
ON B.ID = C.BID
ON A.ID = B.AID
Now B is inner-joined to C, so only B records with a match in C take part, but the LEFT JOIN is preserved and no records are removed from A.
In your case, you can simply move the ON clause from the LEFT JOIN to the end of the following INNER JOIN.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID -- the LEFT JOIN's ON clause, now placed after the nested INNER JOIN
LEFT JOIN PMEOBJECTDIMENSION
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID -- PMEOBJECTDIMENSION's ON clause, now placed after its nested INNER JOIN
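The effect of nesting can be reproduced with the generic A/B/C tables from the explanation, using Python's sqlite3 module and invented rows. This sketch writes the nested B-C inner join as a derived table (the same idea as the X subquery in the first answer), which expresses the same evaluation order: B and C are matched first, and the result is then left-joined to A.

```python
import sqlite3

# Hypothetical data: every A row has a B row, but B.id = 30 has no C row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY);
CREATE TABLE B (id INTEGER PRIMARY KEY, aid INTEGER);
CREATE TABLE C (bid INTEGER);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO C VALUES (10), (20);
""")

# LEFT JOIN followed by a plain INNER JOIN: the inner join's condition
# rejects the row for A.id = 3, so it disappears from the result.
flat = conn.execute("""
SELECT A.id, C.bid
FROM A
LEFT JOIN B ON B.aid = A.id
INNER JOIN C ON C.bid = B.id
ORDER BY A.id
""").fetchall()

# B and C are inner-joined first, then left-joined to A: every A row survives.
nested = conn.execute("""
SELECT A.id, bc.bid
FROM A
LEFT JOIN (SELECT B.aid, C.bid FROM B INNER JOIN C ON C.bid = B.id) AS bc
  ON bc.aid = A.id
ORDER BY A.id
""").fetchall()

print(flat)    # [(1, 10), (2, 20)]
print(nested)  # [(1, 10), (2, 20), (3, None)]
```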

Is it possible to Replace CTE and SELECT with just single sql query (dependent results)

I have the following query. Given productids as input (2,4,5), I want to get the keyids associated with that list. Then I want to get all products associated with those keyids.
For example:
Suppose I pass productids 2,4,5 as input to my sproc; I will get keyids 22,34,35,38 (the CTE result). These keys are mapped to the input product list. Based on these keys (the CTE result), I want all the products associated with them. Say keyid = 22 has products with productids 2,4,5,89, and keyid = 34 has products 2,4,5,23,45, etc.
I came up with the solution below. I am wondering whether it can be improved, or done in a single query, since two tables are repeated.
WITH GetKey_CTE
AS
(
SELECT k.id, some other select statements
FROM KeyDim k
INNER JOIN KeyData kd on kd.id = k.id
INNER JOIN KeyProductMapping kpm on kpm.id = k.id and kpm.mkey = k.mkey
INNER JOIN Products p on p.productid = kpm.productid
and p.productid IN (2,4,5)
LEFT JOIN some more joins
WHERE clause conditions
)
SELECT cte.id as keyid, pn.productname, some other columns
FROM GetKey_CTE cte
INNER JOIN KeyProductMapping kpm on cte.id = kpm.id
INNER JOIN Products pn on pn.productid = kpm.productid
ORDER BY cte.id
Example data for the Products and ProductKeyMapping tables:
For Products table:
productid name
1 car
2 bike
3 plane
4 bus
5 train
45 cycle
ProductKeyMapping table
productid keyid
1 23
2 987
45 23
1 56
Say the input productid is 1; then the final result should be:
keyid productid name
23 1 car
23 45 cycle
56 1 car
Just looking at the data and that simple example:
select pm2.*, product.name
from productmapping pm1
join productmapping pm2
on pm2.keyid = pm1.keyid
and pm1.productid in (1)
join product
on product.id = pm2.productid
-- table variables are declared with @ (a # prefix denotes a temp table)
declare @product table(id int, name varchar(20));
declare @map table(productid int, keyid int);
insert into @product values
(1, 'car'),
(2, 'bike'),
(3, 'plane'),
(4, 'bus'),
(5, 'train'),
(45, 'cycle');
insert into @map values
(1, 23),
(2, 987),
(45, 23),
(1, 56);
select pm2.*, p.name
from @map pm1
join @map pm2
on pm2.keyid = pm1.keyid
and pm1.productid in (1)
join @product p
on p.id = pm2.productid
order by pm2.keyid;
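The same self-join can be checked end to end with Python's sqlite3 module, standing in for SQL Server (plain tables replace the table variables). The helper name products_sharing_keys is hypothetical, and SELECT DISTINCT is an added assumption: without it, two input products that share a key would return that key's products twice.

```python
import sqlite3

# The question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, name TEXT);
CREATE TABLE map (productid INTEGER, keyid INTEGER);
INSERT INTO product VALUES (1,'car'),(2,'bike'),(3,'plane'),(4,'bus'),(5,'train'),(45,'cycle');
INSERT INTO map VALUES (1,23),(2,987),(45,23),(1,56);
""")

def products_sharing_keys(product_ids):
    """Hypothetical helper: return (keyid, productid, name) for every product
    that shares a key with any of the given product ids."""
    placeholders = ", ".join("?" for _ in product_ids)
    sql = f"""
        SELECT DISTINCT pm2.keyid, pm2.productid, p.name
        FROM map pm1                                  -- keys of the input products
        JOIN map pm2 ON pm2.keyid = pm1.keyid         -- all products on those keys
        JOIN product p ON p.id = pm2.productid
        WHERE pm1.productid IN ({placeholders})
        ORDER BY pm2.keyid, pm2.productid
    """
    return conn.execute(sql, list(product_ids)).fetchall()

print(products_sharing_keys([1]))
# [(23, 1, 'car'), (23, 45, 'cycle'), (56, 1, 'car')]
```

Passing [1, 45] returns the same three rows, because both inputs map to key 23 and DISTINCT removes the repeats.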
You can also do this with a subquery:
SELECT * FROM KeyProductMapping km
INNER JOIN
(
SELECT k.id, some other select statements
FROM KeyDim k
INNER JOIN KeyData kd ON kd.id = k.id
INNER JOIN KeyProductMapping kpm ON kpm.id = k.id AND kpm.mkey = k.mkey
INNER JOIN Products p ON p.productid = kpm.productid
AND p.productid IN (2,4,5)
LEFT JOIN some more joins
WHERE clause conditions) AS p ON p.id = km.id
INNER JOIN Products pn ON pn.productid = km.productid
ORDER BY p.id
SELECT cte.id as keyid, pn.productname, some other columns
FROM ( SELECT k.id, some other select statements
FROM KeyDim k
JOIN KeyData kd
on kd.id = k.id
JOIN KeyProductMapping kpm
on kpm.id = k.id
and kpm.mkey = k.mkey
JOIN Products p
on p.productid = kpm.productid
and p.productid IN (2,4,5)
LEFT JOIN some more joins
WHERE clause conditions
) CTE
JOIN KeyProductMapping kpm
on cte.id = kpm.id
JOIN Products pn
on pn.productid = kpm.productid
ORDER BY cte.id
Above is your query with the CTE inlined (as a subquery).
A lot of this does not make sense to me:
JOIN KeyProductMapping kpm
on kpm.id = k.id
and kpm.mkey = k.mkey
JOIN Products p
on p.productid = kpm.productid
and p.productid IN (2,4,5)
is the same as
JOIN KeyProductMapping kpm
on kpm.id = k.id
and kpm.mkey = k.mkey
and kpm.productid IN (2,4,5)
unless Products does not contain those values.
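Why write
SELECT k.id, some other select statements
FROM KeyDim k
JOIN KeyData kd
on kd.id = k.id
when it is the same as the following (provided every KeyData.id has exactly one matching KeyDim row)?
SELECT kd.id -- move this to main select, some other select statements
FROM KeyData kd
And why is this inside the CTE?
LEFT JOIN some more joins
Move that to the main statement.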
You can replace the CTE with an inline view as below, but I'm not sure why you are duplicating the JOINs. Can you post some sample data along with your desired result so we can look further?
SELECT cte.id AS keyid, pn.productname, some other columns
FROM (
SELECT k.id, some other select statements
FROM KeyDim k
INNER JOIN KeyData kd ON kd.id = k.id
INNER JOIN KeyProductMapping kpm ON kpm.id = k.id AND kpm.mkey = k.mkey
INNER JOIN Products p ON p.productid = kpm.productid
AND p.productid IN (2,4,5)
LEFT JOIN some more joins
WHERE clause conditions ) cte
INNER JOIN KeyProductMapping kpm ON cte.id = kpm.id
INNER JOIN Products pn ON pn.productid = kpm.productid
ORDER BY cte.id
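Inlining a non-recursive CTE as a derived table does not change the query's result, which this sqlite3 sketch checks on the question's sample data. Assumptions: the column names (Products.productid, Products.name, ProductKeyMapping.productid, ProductKeyMapping.keyid) follow the sample tables, and the CTE body is reduced to a single hypothetical lookup with DISTINCT in place of the real KeyDim/KeyData joins.

```python
import sqlite3

# The question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (productid INTEGER, name TEXT);
CREATE TABLE ProductKeyMapping (productid INTEGER, keyid INTEGER);
INSERT INTO Products VALUES (1,'car'),(2,'bike'),(3,'plane'),(4,'bus'),(5,'train'),(45,'cycle');
INSERT INTO ProductKeyMapping VALUES (1,23),(2,987),(45,23),(1,56);
""")

# Step 1 (simplified stand-in for the CTE body): keys of the input products.
inner_sql = "SELECT DISTINCT keyid AS id FROM ProductKeyMapping WHERE productid IN (1)"

# Step 2: every product mapped to those keys; {source} is either the CTE
# name or the same query written inline as a derived table.
outer_sql = """
SELECT cte.id AS keyid, kpm.productid, pn.name
FROM {source}
JOIN ProductKeyMapping kpm ON kpm.keyid = cte.id
JOIN Products pn ON pn.productid = kpm.productid
ORDER BY cte.id, kpm.productid
"""

with_cte = conn.execute(
    f"WITH cte AS ({inner_sql})" + outer_sql.format(source="cte")
).fetchall()
inline_view = conn.execute(
    outer_sql.format(source=f"({inner_sql}) AS cte")
).fetchall()

print(with_cte == inline_view)  # True
```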

How to get values from 3 tables?

CREATE PROCEDURE spJoin3Tables
AS
BEGIN
SELECT
tbl_Jobs.JobTitle, tbl_Company.CompName
FROM
tbl_Jobs
INNER JOIN
tbl_Company ON tbl_Jobs.CompID = tbl_Company.ID
SELECT
tbl_Cities.CityName
FROM
tbl_Cities
INNER JOIN
tbl_JobCities ON tbl_Cities.ID = tbl_JobCities.CityID
INNER JOIN
tbl_Jobs ON tbl_JobCities.JobID = tbl_Jobs.ID
END
The result is two tables. I want to get all three columns in one table - what will be the query?
You just need to add the company table and the columns from the first query to the second query and make sure to join on the company id.
SELECT
tbl_Cities.CityName, tbl_Jobs.JobTitle, tbl_Company.CompName
FROM
tbl_Cities
INNER JOIN
tbl_JobCities ON tbl_Cities.ID = tbl_JobCities.CityID
INNER JOIN
tbl_Jobs ON tbl_JobCities.JobID = tbl_Jobs.ID
INNER JOIN
tbl_Company ON tbl_Jobs.CompID = tbl_Company.ID
Using INNER JOINs you can get all the data. If an ID column in any of the tables can be NULL (or have no matching row), use LEFT JOIN instead.
SELECT tbl_Jobs.JobTitle, tbl_Company.CompName , tbl_Cities.CityName
FROM tbl_Jobs
INNER JOIN tbl_Company ON tbl_Jobs.CompID = tbl_Company.ID
INNER JOIN tbl_JobCities ON tbl_JobCities.JobID = tbl_Jobs.ID
INNER JOIN tbl_Cities ON tbl_Cities.ID = tbl_JobCities.CityID
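A small sqlite3 sketch of the four-table join, and of the LEFT JOIN variant the last answer mentions. SQLite stands in for SQL Server and the rows are invented: a job with no tbl_JobCities row disappears from the INNER JOIN result but is kept, with a NULL city, by the LEFT JOIN.

```python
import sqlite3

# Hypothetical data: job 2 ('Tester') has no city assigned yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Company (ID INTEGER PRIMARY KEY, CompName TEXT);
CREATE TABLE tbl_Jobs (ID INTEGER PRIMARY KEY, JobTitle TEXT, CompID INTEGER);
CREATE TABLE tbl_Cities (ID INTEGER PRIMARY KEY, CityName TEXT);
CREATE TABLE tbl_JobCities (JobID INTEGER, CityID INTEGER);
INSERT INTO tbl_Company VALUES (1, 'Acme');
INSERT INTO tbl_Jobs VALUES (1, 'Developer', 1), (2, 'Tester', 1);
INSERT INTO tbl_Cities VALUES (1, 'Paris'), (2, 'Rome');
INSERT INTO tbl_JobCities VALUES (1, 1), (1, 2);
""")

# All INNER JOINs: only jobs that have at least one city are returned.
inner = conn.execute("""
SELECT j.JobTitle, c.CompName, ci.CityName
FROM tbl_Jobs j
INNER JOIN tbl_Company c ON j.CompID = c.ID
INNER JOIN tbl_JobCities jc ON jc.JobID = j.ID
INNER JOIN tbl_Cities ci ON ci.ID = jc.CityID
ORDER BY j.ID, ci.ID
""").fetchall()

# LEFT JOIN the city side: jobs without a city are kept with a NULL CityName.
left = conn.execute("""
SELECT j.JobTitle, c.CompName, ci.CityName
FROM tbl_Jobs j
INNER JOIN tbl_Company c ON j.CompID = c.ID
LEFT JOIN tbl_JobCities jc ON jc.JobID = j.ID
LEFT JOIN tbl_Cities ci ON ci.ID = jc.CityID
ORDER BY j.ID, ci.ID
""").fetchall()

print(inner)
print(left)
```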

SQL Server AVG and Excel AVERAGE producing different results?

I'm trying to show averages in SQL Server, but when I check the data in Excel the results are not the same. There must be something obvious I am missing.
Here is the code and the results from SQL Server:
SELECT DISTINCT
d.d_reference + ' - ' + d.d_name AS Faculty,
AVG(sub.GroupSize) AS FacultyAverage
FROM
unitesnapshot.dbo.capd_register r
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_activity a ON a.a_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_moduleactivity ma ON ma.ma_activity = a.a_id
INNER JOIN unitesnapshot.dbo.capd_module m ON m.m_id = ma.ma_activitymodule
INNER JOIN unitesnapshot.dbo.capd_department d ON d.d_id = m.m_moduledept
INNER JOIN unitesnapshot.dbo.capd_section sec ON sec.s_id = m.m_modulesection
INNER JOIN (SELECT
r.r_reference,
COUNT(DISTINCT s.s_studentreference) AS GroupSize
FROM
unitesnapshot.dbo.capd_student s
INNER JOIN unitesnapshot.dbo.capd_person p ON p.p_id = s.s_id
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_student = p.p_id
INNER JOIN unitesnapshot.dbo.capd_register r ON r.r_id = sr.sr_register
GROUP BY
r.r_reference) sub ON sub.r_reference = r.r_reference
WHERE
SUBSTRING(r.r_reference,4,2) = '12' AND
d.d_reference = '730'
GROUP BY
d.d_reference,
d.d_name
Here are the results in Excel:
Thanks
Try this for fun:
select avg(a)
from
(values(1),(2),(3),(4)) x(a);
avg(a)
-------
2
For an int column, SQL Server's AVG() also returns int, so the fractional part of the result is truncated. The query below returns the "correct" result.
select avg(cast(a as decimal(10,5)))
from
(values(1),(2),(3),(4)) x(a);
result
--------
2.500000
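The truncation can be illustrated with Python's sqlite3 module, with one caveat: SQLite's AVG() always returns a floating-point value, so it cannot reproduce SQL Server's integer AVG directly. This sketch emulates it with SUM(a) / COUNT(a), which SQLite evaluates as integer division when both operands are integers, and then casts first, as the answer does with DECIMAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same four integers as the answer's example.
values = "WITH x(a) AS (VALUES (1), (2), (3), (4)) "

# SQLite's AVG() always returns a floating-point value.
sqlite_avg = conn.execute(values + "SELECT AVG(a) FROM x").fetchone()[0]
# Integer SUM divided by integer COUNT: SQLite truncates, like SQL Server's AVG(int).
int_avg = conn.execute(values + "SELECT SUM(a) / COUNT(a) FROM x").fetchone()[0]
# Casting before aggregating keeps the fraction.
cast_avg = conn.execute(values + "SELECT SUM(CAST(a AS REAL)) / COUNT(a) FROM x").fetchone()[0]

print(sqlite_avg, int_avg, cast_avg)  # 2.5 2 2.5
```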
The discrepancy you are showing (24 vs 19.50484) most likely involves another error as well. For example, to check that Excel and SQL Server are working on the same data, run the query below, paste its result into Excel and sum it up. If the total doesn't match what you currently believe is the Excel equivalent of the SQL Server data, line the columns up and check that they have the same number of rows. Then sort each column individually in ascending order and compare again.
SELECT d.d_name, sub.GroupSize AS FacultyAverage
FROM unitesnapshot.dbo.capd_register r
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_activity a ON a.a_register = r.r_id
INNER JOIN unitesnapshot.dbo.capd_moduleactivity ma ON ma.ma_activity = a.a_id
INNER JOIN unitesnapshot.dbo.capd_module m ON m.m_id = ma.ma_activitymodule
INNER JOIN unitesnapshot.dbo.capd_department d ON d.d_id = m.m_moduledept
INNER JOIN unitesnapshot.dbo.capd_section sec ON sec.s_id = m.m_modulesection
INNER JOIN (SELECT r.r_reference,
COUNT(DISTINCT s.s_studentreference) AS GroupSize
FROM unitesnapshot.dbo.capd_student s
INNER JOIN unitesnapshot.dbo.capd_person p ON p.p_id = s.s_id
INNER JOIN unitesnapshot.dbo.capd_studentregister sr ON sr.sr_student = p.p_id
INNER JOIN unitesnapshot.dbo.capd_register r ON r.r_id = sr.sr_register
GROUP BY r.r_reference) sub ON sub.r_reference = r.r_reference
WHERE SUBSTRING(r.r_reference,4,2) = '12' AND d.d_reference = '730'
ORDER BY d.d_name
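One way the two row sets can differ (an assumption about this case, not something the question confirms) is join fan-out: the query joins capd_studentregister and capd_activity, which can return several rows per register, and AVG() then counts each register's GroupSize once per duplicated row. This sqlite3 sketch reproduces the effect with hypothetical table names and invented data, and shows that reducing to one row per register before averaging restores the expected figure.

```python
import sqlite3

# Hypothetical data: two registers with group sizes 10 and 30;
# register 2 has three activity rows, register 1 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE register (r_id INTEGER, group_size INTEGER);
CREATE TABLE activity (a_register INTEGER);
INSERT INTO register VALUES (1, 10), (2, 30);
INSERT INTO activity VALUES (1), (2), (2), (2);
""")

# One row per register, as a spreadsheet would hold it: (10 + 30) / 2.
per_register = conn.execute("SELECT AVG(group_size) FROM register").fetchone()[0]

# After the join, register 2 appears three times: (10 + 30 * 3) / 4.
after_join = conn.execute("""
SELECT AVG(r.group_size)
FROM register r
JOIN activity a ON a.a_register = r.r_id
""").fetchone()[0]

# Back to one row per register before averaging.
deduplicated = conn.execute("""
SELECT AVG(group_size)
FROM (SELECT DISTINCT r.r_id, r.group_size
      FROM register r
      JOIN activity a ON a.a_register = r.r_id)
""").fetchone()[0]

print(per_register, after_join, deduplicated)  # 20.0 25.0 20.0
```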
