I've been doing some searching on Stack Overflow as well as Google and haven't quite found the answer to my question, so here we go:
It's been a minute since I've done a 'from the ground up' data warehouse project, so I'm dusting off some of my past knowledge, but am blanking on a solution to one of my data load scenarios.
I am creating a fact table (factOrderLines) with, of course, many dimensions joined to it. One of the dimensions I would like to link to factOrderLines is dimItem. The problem is that an item is unique based on any one of: the item's vendor and vendor part number, its manufacturer and manufacturer part number, or an identifier from a subset of items called ManagedItems (MngItemID).
Source example:
Vendor  VendorPartNo  Manufacturer  ManufacturerPartNo  MngItemID
100     3456          NULL          NULL                67
100     3254          03            1234                23
NULL    NULL          03            1235                24
NULL    NULL          15            5120                NULL
Problem is when I do my join to the dimItem table from my source table to populate the factOrderLines table I have three lookup scenarios. This is causing the numbers to inflate and performance to be horrible.
LEFT OUTER JOIN dimItem AS i ON
(i.Vendor = src.Vendor AND i.VendorPartNo = src.VndrItemID) OR
(i.Manufacturer = src.Manufacturer AND i.ManufacturerPartNo = src.MfgItemID) OR
(i.MngItemID = src.MngItemID)
Is there a more efficient/better approach to this scenario than what I have started to implement?
edit: Full INSERT query (for better understanding)
INSERT INTO fctOrderLine
(PurchaseOrderKey
,DateKey
,PurchaseOrderLineNo
,VendorKey
,ManufacturerKey
,ItemKey
,UnitPrice
,Qty
,UnitOfMeasure
,LineTotal)
SELECT PurchaseOrderKey = po.PurchaseOrderKey
,DateKey = ISNULL(c.DateKey, 19000101)
,PurchaseOrderLineNo = ISNULL(p.POLineNbr, -1)
,VendorKey = ISNULL(v.VendorKey, -1)
,ManufacturerKey = ISNULL(m.ManufacturerKey, -1)
,ItemKey = ISNULL(i.ItemKey, -1)
,UnitPrice = ISNULL(p.UnitPrice, -1.00)
,Qty = ISNULL(p.POQty, -1.00)
,UnitOfMeasure = ISNULL(p.ANSI_UOM, N'UNKNOWN')
,LineTotal = ISNULL(p.LineTotalCost, -1)
FROM stgOrders AS p
INNER JOIN dimPurchaseOrder AS po ON po.OrderNo = p.PONumber
LEFT OUTER JOIN dimCalendar AS c ON c.Date = (CASE WHEN p.DT_PO IS NULL OR ISDATE(REPLACE(p.DT_PO, '''', '')) = 0 THEN CAST('19000101' AS DATETIME) ELSE REPLACE(p.DT_PO, '''', '') END)
LEFT OUTER JOIN dimVendor AS v ON v.VendorID = p.VendorID
LEFT OUTER JOIN dimManufacturer AS m ON m.ManufacturerID = p.MfgID
LEFT OUTER JOIN dimItem AS i ON (i.VendorKey = v.VendorKey AND i.VendorPartNo = p.VndrItemID) OR (i.ManufacturerKey = m.ManufacturerKey AND i.ManufacturerPartNo = p.MfgItemID) OR (i.MngItemID = p.MngItemID)
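One restructuring that is often suggested for a multi-way OR lookup like the one above is to split it into one LEFT JOIN per matching rule and take the first hit with COALESCE. The following is only a sketch and is untested against the real schema: the aliases iv, im and ig and the priority order (vendor part first, then manufacturer part, then MngItemID) are assumptions, and it only avoids inflated row counts if each rule matches at most one dimItem row.

```sql
-- Sketch only: one equality join per lookup rule instead of a single OR join.
-- Aliases iv/im/ig and the COALESCE priority order are assumptions.
SELECT p.PONumber
      ,p.POLineNbr
      ,ItemKey = COALESCE(iv.ItemKey, im.ItemKey, ig.ItemKey, -1)
FROM stgOrders AS p
LEFT OUTER JOIN dimVendor AS v ON v.VendorID = p.VendorID
LEFT OUTER JOIN dimManufacturer AS m ON m.ManufacturerID = p.MfgID
-- Rule 1: vendor + vendor part number
LEFT OUTER JOIN dimItem AS iv
       ON iv.VendorKey = v.VendorKey
      AND iv.VendorPartNo = p.VndrItemID
-- Rule 2: manufacturer + manufacturer part number
LEFT OUTER JOIN dimItem AS im
       ON im.ManufacturerKey = m.ManufacturerKey
      AND im.ManufacturerPartNo = p.MfgItemID
-- Rule 3: managed item identifier
LEFT OUTER JOIN dimItem AS ig
       ON ig.MngItemID = p.MngItemID;
```

Because each join is a plain equality predicate, the optimizer may be able to use an index per rule, which it usually cannot do for a single join with OR conditions across different columns.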
I am trying to run a query in T-SQL to pull back a data set based on a column being null.
This is a simplified version of the code:
SELECT
T1.Col1, T1.Col2,
T1.Col3, T1.Col4
FROM
table1 AS T1
INNER JOIN
table2 AS T2 ON T1.Col2 = T2.Col3
WHERE
T2.Col4 IS NULL
The problem is that the result includes rows where T2.Col4 is NULL and also rows where it is not NULL; it's as if the WHERE clause doesn't exist.
Any ideas would be greatly appreciated.
UPDATE - full version of code:
SELECT
M.ref
,C.cname
,CL.clname
,C.ccity
,M.productLine
,M.code
,CL.date
,M.dept
,DPT.group
,TK2.tkname
,TK2.tkdept
FROM DB.dbo.manage AS M
OUTER JOIN DB.dbo.ClientManageRelationship AS CMR
ON CMR.RelatedEntityID = M.EntityID
OUTER JOIN DB.dbo.Client AS C
ON C.EntityID = CMR.EntityID
INNER JOIN DB.dbo.ManageCustomerRelationship AS MCR
ON MCR.EntityID = M.EntityID
INNER JOIN DB.dbo.Customer AS CL
ON CL.EntityID = MCR.RelatedID
INNER JOIN DB.dbo.timek AS TK
ON TK.tki = M.tkid
LEFT JOIN (SELECT Group = division, [Department] = newdesc, deptcode FROM DB.csrt.vw_rep_p_l_dept) AS DPT
ON tkdept = DPT.dept
LEFT JOIN (SELECT Name = TK2.tkfirst + ' ' + TK2.tklast, TK2.tki, TK2.dept, TK2.loc FROM DB.dbo.timek as TK2 WITH(NOLOCK)) AS TK2
ON TK2.tki = M.tkid
WHERE DPT.Department = 'Casualty'
AND UPPER (C.ClientName) LIKE '%LIMITED%'
AND CL.date > '31/12/2014'
AND CL.Date IS NULL
AND TK.tkloc = 'loc1' OR TK.tkloc = 'loc2'
ORDER BY M.ref
My first thought is that it's because you're using INNER JOIN, which only returns rows that match in both tables. Try FULL OUTER JOIN, which returns all rows from both tables regardless of matches, with NULLs where there is no match.
If you want all rows from only one of the tables, matched or not, use LEFT or RIGHT JOIN.
Say I had two tables ('Person' and 'Figure'). Not every person will have entered a figure on a given day, but I want to return all people regardless of whether they entered one.
My approach would be a LEFT JOIN, because I want all of the people (left table) regardless of whether there are any matches in the Figure table (right table):
SELECT P.Name, F.Figure
FROM Person P
LEFT JOIN Figure F
ON P.ID = F.ID
This would produce a result such as
Name   Figure
Sam    20
Ben    30
Matt   NULL
Simon  NULL
whereas an inner join would return only the matching rows, with no NULLs:
Name   Figure
Sam    20
Ben    30
A RIGHT JOIN works the same way as a LEFT JOIN but in the opposite direction. This is most likely the problem you were facing; I hope this helps.
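The Person/Figure example can be reproduced with a small self-contained script. This is a sketch using table variables with made-up IDs (the ID values are assumptions; only the names and figures come from the example above). The second query shows how a WHERE filter on the right-hand table of a LEFT JOIN keeps only the people with no figure.

```sql
-- Hypothetical sample data matching the Person/Figure example.
DECLARE @Person TABLE (ID int, Name varchar(20));
DECLARE @Figure TABLE (ID int, Figure int);

INSERT INTO @Person (ID, Name)
VALUES (1, 'Sam'), (2, 'Ben'), (3, 'Matt'), (4, 'Simon');
INSERT INTO @Figure (ID, Figure)
VALUES (1, 20), (2, 30);

-- LEFT JOIN: every person, with NULL in Figure where no match exists.
SELECT P.Name, F.Figure
FROM @Person AS P
LEFT JOIN @Figure AS F ON P.ID = F.ID;

-- Same join, filtered to the people who have no row in @Figure.
SELECT P.Name
FROM @Person AS P
LEFT JOIN @Figure AS F ON P.ID = F.ID
WHERE F.ID IS NULL;
```

With this data the first query returns all four people and the second returns only Matt and Simon.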
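I think the problem is in the last part of the WHERE condition. AND binds more tightly than OR, so without brackets the final OR TK.tkloc = 'loc2' lets a row through even when none of the other conditions hold.
You should use brackets: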
`WHERE DPT.Department = 'Casualty'
AND UPPER (C.ClientName) LIKE '%LIMITED%'
AND CL.date > '31/12/2014'
AND CL.Date IS NULL
AND (TK.tkloc = 'loc1' OR TK.tkloc = 'loc2')`
or
`WHERE DPT.Department = 'Casualty'
AND UPPER (C.ClientName) LIKE '%LIMITED%'
AND CL.date > '31/12/2014'
AND CL.Date IS NULL
AND TK.tkloc IN ('loc1', 'loc2')`
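The precedence effect can be seen in isolation with a two-row example. This is a minimal sketch; the table variable @T and its dept/loc columns are hypothetical stand-ins for the real tables.

```sql
-- Hypothetical two-row table to show AND/OR precedence.
DECLARE @T TABLE (dept varchar(20), loc varchar(10));
INSERT INTO @T (dept, loc)
VALUES ('Casualty', 'loc1'), ('Property', 'loc2');

-- Without brackets this is parsed as
-- (dept = 'Casualty' AND loc = 'loc1') OR loc = 'loc2',
-- so the 'Property' row is returned as well.
SELECT dept, loc
FROM @T
WHERE dept = 'Casualty' AND loc = 'loc1' OR loc = 'loc2';

-- With brackets the dept filter applies to both locations,
-- so only the 'Casualty' row is returned.
SELECT dept, loc
FROM @T
WHERE dept = 'Casualty' AND (loc = 'loc1' OR loc = 'loc2');
```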
I have three tables (say A, B and C) with the same column names and data types. The tables can be joined on four columns that together are unique: name, company, Seq_Number and role. I want to select the records of a particular role from table A and cross-check them against the records in tables B and C. If a record does not exist in either table, we need to deactivate/remove it from table A. The problem is that a record which does not exist in table B might exist in table C, so I need to remove records of that role only if they exist in neither B nor C. I tried the query below, but it is not returning the expected result. Kindly help me with this.
SELECT DISTINCT FAT_Cust.name
, FAT_Cust.Company
, FAT_Cust.role
, FAT_Cust.Seq_Number
, Cust.name
, Cust.Company
, Cust.role
, Cust.Seq_Number
FROM (
SELECT DISTINCT ALC.NAME, ALC.Company, ALC.ROLE, ALC.Seq_Number
FROM AL_Customer ALC
INNER JOIN BL_Customer LPC ON ALC.NAME = LPC.NAME
AND ALC.Company = LPC.Company
AND ALC.ROLE = LPC.ROLE
AND ALC.Seq_Number = LPC.Seq_Number
AND ALC.Record_Active = 1
UNION SELECT DISTINCT ALC.NAME, ALC.Company, ALC.ROLE, ALC.Seq_Number
FROM AL_Customer ALC
INNER JOIN CL_Customer CLC ON ALC.NAME = CLC.NAME
AND ALC.Company = CLC.Company
AND ALC.ROLE = CLC.ROLE AND ALC.Seq_Number = CLC.Seq_Number
AND ALC.Record_Active = 1
) Cust
RIGHT OUTER JOIN AL_Customer FAT_Cust ON FAT_Cust.NAME = Cust.NAME
AND FAT_Cust.Company = Cust.Company
AND FAT_Cust.ROLE = Cust.ROLE
AND FAT_Cust.Seq_Number = Cust.Seq_Number
AND FAT_Cust.Record_Active = 1
WHERE Cust.NAME IS NULL
AND Cust.Company IS NULL
AND Cust.ROLE IS NULL
AND Cust.Seq_Number IS NULL
AND Cust.ROLE <> 'OWN'
Please try the query given below
SELECT ALC.* FROM AL_Customer ALC
LEFT JOIN BL_Customer BPC ON ALC.NAME = BPC.NAME
AND ALC.Company = BPC.Company
AND ALC.ROLE = BPC.ROLE
AND ALC.Seq_Number = BPC.Seq_Number
AND ALC.Record_Active = 1
AND BPC.Record_Active = 1
LEFT JOIN CL_Customer CPC ON ALC.NAME = CPC.NAME
AND ALC.Company = CPC.Company
AND ALC.ROLE = CPC.ROLE
AND ALC.Seq_Number = CPC.Seq_Number
AND ALC.Record_Active = 1
AND CPC.Record_Active = 1
WHERE ALC.Record_Active = 1
AND (BPC.NAME IS NULL)
AND (CPC.NAME IS NULL)
You can add more conditions in the WHERE clause to narrow down the matching criteria. The above query assumes that NAME is present (not NULL) for all records in the tables. I hope this resolves your issue.
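The same "in neither table" check can also be written with NOT EXISTS, which does not rely on NAME being non-NULL in B and C. This is a sketch only: the @Role variable, its data type and its value are hypothetical placeholders for the role being checked, and the Record_Active filter is applied only to AL_Customer here.

```sql
-- Sketch: active AL_Customer rows of one role with no match in B or C.
-- @Role, its type and its value are hypothetical placeholders.
DECLARE @Role varchar(50) = 'SOME_ROLE';

SELECT ALC.NAME, ALC.Company, ALC.ROLE, ALC.Seq_Number
FROM AL_Customer AS ALC
WHERE ALC.Record_Active = 1
  AND ALC.ROLE = @Role
  -- no matching row in BL_Customer
  AND NOT EXISTS (SELECT 1
                  FROM BL_Customer AS B
                  WHERE B.NAME = ALC.NAME
                    AND B.Company = ALC.Company
                    AND B.ROLE = ALC.ROLE
                    AND B.Seq_Number = ALC.Seq_Number)
  -- and no matching row in CL_Customer either
  AND NOT EXISTS (SELECT 1
                  FROM CL_Customer AS C
                  WHERE C.NAME = ALC.NAME
                    AND C.Company = ALC.Company
                    AND C.ROLE = ALC.ROLE
                    AND C.Seq_Number = ALC.Seq_Number);
```

Once the SELECT returns the expected rows, the same WHERE clause could drive the UPDATE or DELETE against AL_Customer.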
I have not used SQL Server at a large, complex scale in years, and I'm looking for help with the proper syntax for an intersect-type query that joins these two data sets without creating duplicate names. Some patients will have both an order and a clinical event entry, and some will only have a clinical event.
Data Set 1
SELECT
distinct
ea.alias as FIN,
per.NAME_Last + ', ' + per.NAME_FIRST + ' ' + Isnull(per.NAME_MIDDLE, '') as PatientName,
oa.action_dt_tm as CirOrder,
od.ORIG_ORDER_DT_TM as DischOrder,
e.disch_dt_tm as ActualDisch,
prs.NAME_FULL_FORMATTED as OrderedBy
from pathway py
join encounter e on e.CERNER_ENCOUNTER_ID = py.encntr_id
join encntr_alias ea on ea.CERNER_ENCNTR_ID = e.CERNER_ENCOUNTER_ID and ea.ENCNTR_ALIAS_TYPE_WCD = 1049
join person per on per.CERNER_PERSON_ID = e.cerner_PERSON_ID
join orders o on o.CERNER_ENCNTR_ID= e.CERNER_ENCOUNTER_ID and o.CATALOG_wCD = '82111' -- communication order
and o.pathway_catalog_id = '43809296' ---Circumcision Order
join order_action oa on oa.[CERNER_ORDER_ID] = o.CERNER_ORDER_ID and oa.ACTION_TYPE_WCD = '2494'--ordered
join orders od on od.CERNER_ENCNTR_ID= e.CERNER_ENCOUNTER_ID and od.CATALOG_WCD = '203520' --- Discharge Patient
join prsnl prs on prs.CERNER_PERSON_ID = oa.order_provider_id
where py.pathway_catalog_id = '43809296' and ---Circumcision Order
oa.action_dt_tm > '2016-01-01 00:00:00'
and oa.ACTION_DT_TM < '2016-01-19 23:59:59'
--use the report prompts as parameters for the action_dt_tm
Data Set 2
SELECT
distinct e.[CERNER_ENCOUNTER_ID],
ea.alias as FIN,
per.NAME_Last + ', ' + per.NAME_FIRST + ' ' + Isnull(per.NAME_MIDDLE, '') as PatientName,
ce.EVENT_END_DT_TM as CircTime,
od.ORIG_ORDER_DT_TM as DischOrder,
e.disch_dt_tm as ActualDisch,
'' OrderedBy, -- should be blank for this set
cv.DISPLAY
from encounter e
join clinical_event ce on e.CERNER_ENCOUNTER_ID = ce.CERNER_ENCNTR_ID
join encntr_alias ea on ea.CERNER_ENCNTR_ID = e.CERNER_ENCOUNTER_ID and ea.ENCNTR_ALIAS_TYPE_WCD = 1049
join person per on per.CERNER_PERSON_ID = e.cerner_PERSON_ID
join orders od on od.CERNER_ENCNTR_ID= e.CERNER_ENCOUNTER_ID and od.CATALOG_WCD = '203520' --- Discharge Patient
left outer join ENCNTR_LOC_HIST elh on elh.CERNER_ENCNTR_ID = e.CERNER_ENCOUNTER_ID
left outer join CODE_VALUE cv on cv.CODE_VALUE_WK = elh.LOC_NURSE_UNIT_WCD
where ce.event_wcd = '201148' ---Newborn Circumcision
and ce.[RESULT_VAL] = 'Newborn Circumcision'
and ce.EVENT_END_DT_TM > '2016-01-01 00:00:00'
and ce.event_end_dt_tm < '2016-01-19 23:59:59'
and ce.RESULT_STATUS_WCD = '25'
and elh.ACTIVE_STATUS_DT_TM < ce.event_end_dt_tm -- Circ time between the location's active time and end time.
and elh.END_EFFECTIVE_DT_TM > ce.[EVENT_END_DT_TM]
--use the report prompts as parameters for the ce.[EVENT_END_DT_TM]
The structure of an intersect query is as simple as:
select statement 1
intersect
select statement 2
intersect
select statement 3
...
This returns the distinct rows that appear in the result of every select statement. The select statements must return the same number of columns, with matching types (or at least types convertible to a common type).
You can also do an intersect type of query just using inner joins to filter out records in the one query that are not in the other. So for a simple example let's say you have two tables of colors.
Select distinct ColorTable1.Color
from ColorTable1
join ColorTable2
on ColorTable1.Color = ColorTable2.Color
This will return all the distinct colors in ColorTable1 that are also in ColorTable2. Using joins to filter could help your query perform better, but it does take more thought.
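Both forms can be compared side by side on the color example. This is a small sketch with made-up rows; the table variables stand in for ColorTable1 and ColorTable2.

```sql
-- Hypothetical color data; 'Red' is duplicated on purpose.
DECLARE @ColorTable1 TABLE (Color varchar(20));
DECLARE @ColorTable2 TABLE (Color varchar(20));

INSERT INTO @ColorTable1 (Color)
VALUES ('Red'), ('Green'), ('Blue'), ('Red');
INSERT INTO @ColorTable2 (Color)
VALUES ('Red'), ('Blue'), ('Yellow');

-- Set operator: distinct rows present in both result sets.
SELECT Color FROM @ColorTable1
INTERSECT
SELECT Color FROM @ColorTable2;

-- Join form: DISTINCT is needed to remove the duplicate 'Red'.
SELECT DISTINCT c1.Color
FROM @ColorTable1 AS c1
JOIN @ColorTable2 AS c2 ON c1.Color = c2.Color;
```

With this data both queries return Red and Blue. One difference to keep in mind is that INTERSECT treats two NULLs as equal when comparing rows, while the equality join does not match NULL to NULL.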
Also see: Set Operators (Transact-SQL)
I'm getting an issue during an ETL of client data where SQL Server chews up 100% CPU on my dev server. This only happens on occasion, and I have found the particular part of the SP that's causing it, but I'm not sure why it's using so much CPU.
The LoadId and ClientId are both input variables for the SP. Basically, I am trying to find whether any of the object IDs in the Staging table (newly loaded data) match existing objects (for a particular client), and also check the Validation table (data gets a validation check before it gets processed) for any errors.
SELECT src.Id ,
o.Id ,
CASE WHEN o.Id IS NULL THEN 0
ELSE 1
END
FROM ObjectsStaging src
LEFT OUTER JOIN client.Objects o ON src.Id = o.UniqueId
WHERE src.LoadId = 22
AND ( o.ClientId IS NULL
OR o.ClientId = 3
)
AND NOT EXISTS ( SELECT 1
FROM dbo.ValidationLog v
WHERE v.LoadId = 22
AND v.RowId = src.RowId )
Maybe try this, replacing v.PK with a non-nullable column in the ValidationLog table (v).
SELECT src.Id ,
o.Id ,
CASE WHEN o.Id IS NULL THEN 0
ELSE 1
END
FROM ObjectsStaging src
LEFT OUTER JOIN client.Objects o ON src.Id = o.UniqueId
LEFT OUTER JOIN dbo.ValidationLog v on v.LoadId = 22 AND v.RowId = src.RowId
WHERE src.LoadId = 22
AND ( o.ClientId IS NULL
OR o.ClientId = 3
)
AND v.PK is null -- V.loadid is null ? --(same as not exists)
I wrote two queries below that produce one row of data each.
What is the best way to combine them such that I am left with only a single row of data?
These are coming from two distinct databases named [ASN01] and [dsi_ASN_dsicx].
I have 70 pairs of databases like this but am showing only one for simplicity.
The fact that the three letter acronym ASN is common to both database names is no mistake and if needed can be a part of the solution.
Current Results:
Site, Elligence (header)
ASN, 100.00
Site, GP_Total (header)
ASN, 120.00
Desired results:
Site, GP_Total, Elligence (header)
ASN, 120.00, 100.00
SELECT 'ASN' AS Site ,
CASE SUM(perdblnc)
WHEN NULL THEN 0
ELSE -1 * SUM(PERDBLNC)
END AS GP_Total
FROM [ASN01].[dbo].[GL10110] T1
LEFT OUTER JOIN [ASN01].[dbo].[GL00105] T2 ON [T1].[ACTINDX] = [T2].[ACTINDX]
WHERE YEAR1 = 2012
AND PERIODID IN ( '2' )
AND ACTNUMST IN ( '4200-0000-C', '6940-0000-C', '6945-0000-C',
'6950-0000-C' )
SELECT 'ASN' AS [Site] ,
SUM(pi.amount) AS [Elligence]
FROM [dsi_ASN_dsicx].dbo.charge c
LEFT JOIN [dsi_ASN_dsicx].dbo.paymentitem pi ON c.idcharge = pi.chargeid
LEFT JOIN [dsi_ASN_dsicx].dbo.payment p ON pi.paymentid = p.idpayment
LEFT JOIN [dsi_ASN_dsicx].dbo.paymenttype pt ON p.paymenttypeid = pt.idpaymenttype
WHERE pi.amount != 0
AND pt.paymentmethod NOT IN ( '5', '7' )
AND pt.paymentmethod IS NOT NULL
AND p.sdate >= '20120201'
AND p.sdate <= '20120229'
Without going through and changing any of your queries, the easiest way would be to wrap each one in a common table expression (the WITH clause). Table1 and Table2 below are named result sets built from your two select statements, not actual temp tables, so we select from Table1 and join Table2 on Site.
Let me know if there are any syntax problems, I don't have anything to test this on presently.
;With Table1 as (SELECT 'ASN' as Site, Case sum(perdblnc)
WHEN NULL THEN 0
ELSE -1*sum(PERDBLNC) END as GP_Total
FROM [ASN01].[dbo].[GL10110] T1
Left Outer Join [ASN01].[dbo].[GL00105] T2
ON [T1]. [ACTINDX]= [T2]. [ACTINDX]
WHERE YEAR1 = 2012
AND PERIODID in ('2')
AND ACTNUMST in ('4200-0000-C', '6940-0000-C', '6945-0000-C', '6950-0000-C'))
, Table2 as (SELECT
'ASN' as [Site],
SUM(pi.amount) as [Elligence]
FROM [dsi_ASN_dsicx].dbo.charge c
LEFT JOIN [dsi_ASN_dsicx].dbo.paymentitem pi on c.idcharge = pi.chargeid
LEFT JOIN [dsi_ASN_dsicx].dbo.payment p on pi.paymentid = p.idpayment
LEFT JOIN [dsi_ASN_dsicx].dbo.paymenttype pt on p.paymenttypeid = pt.idpaymenttype
WHERE pi.amount != 0
AND pt.paymentmethod not in ('5','7')
AND pt.paymentmethod is not null
AND p.sdate >='20120201' and p.sdate <= '20120229')
SELECT Table1.Site, Table1.GP_Total, Table2.Elligence FROM Table1
LEFT JOIN Table2 ON Table1.site = Table2.site
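One thing worth checking in the GP_Total expression carried over from the original query: a simple CASE compares with equality, and nothing is ever equal to NULL, so the WHEN NULL branch never fires and a NULL sum stays NULL. A minimal sketch of the difference, using a hypothetical @Total variable in place of SUM(PERDBLNC):

```sql
-- @Total is a hypothetical stand-in for SUM(PERDBLNC) when no rows qualify.
DECLARE @Total decimal(19, 5) = NULL;

SELECT
    -- Simple CASE tests @Total = NULL, which is never true,
    -- so the ELSE branch runs and the result is NULL.
    CASE @Total WHEN NULL THEN 0 ELSE -1 * @Total END AS SimpleCase,
    -- A searched CASE with IS NULL behaves as intended and returns zero.
    CASE WHEN @Total IS NULL THEN 0 ELSE -1 * @Total END AS SearchedCase,
    -- ISNULL is a shorter way to get the same zero.
    ISNULL(-1 * @Total, 0) AS WithIsNull;
```

If a zero is wanted when nothing matches, ISNULL(-1 * SUM(PERDBLNC), 0) could replace the CASE inside Table1.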
Hope this helps! Mark as answer if it does =)