Importing relational data into SQL Server using XML - sql-server

I've been using XML to import large amounts of data into SQL for a while now, but I am wondering if it's possible to import data across multiple tables from a single XML file that has child nodes?
Given this example:
DECLARE @tbl_makes TABLE (ID int IDENTITY(1,1), makeName nvarchar(100))
INSERT INTO @tbl_makes (makeName) VALUES ('Ford')
INSERT INTO @tbl_makes (makeName) VALUES ('Jaguar')
DECLARE @tbl_models TABLE (ID int IDENTITY(1,1), makeID int, modelName nvarchar(100))
INSERT INTO @tbl_models (makeID, modelName) VALUES (1, 'Escort')
INSERT INTO @tbl_models (makeID, modelName) VALUES (1, 'Sierra')
INSERT INTO @tbl_models (makeID, modelName) VALUES (2, 'XK')
INSERT INTO @tbl_models (makeID, modelName) VALUES (2, 'XJS')
SELECT * FROM @tbl_makes m INNER JOIN @tbl_models md ON m.ID = md.makeID
DECLARE @xml XML = '
<cars>
<make name="Ford">
<model name="Mustang" />
<model name="Taurus" />
<model name="F350" />
</make>
<make name="Aston Martin">
<model name="Vanquish" />
<model name="DB7" />
<model name="Lagonda" />
</make>
</cars>'
I appreciate that the make names would need to be inserted/looked-up first before related data could be inserted. I've searched online for answers to this, but examples only use a single table. I'm guessing it's not possible without using various temporary tables, but here goes...

What about a solution like this?
INSERT INTO @tbl_makes (makeName)
SELECT i.i.value('@name', 'nvarchar(100)')
FROM @xml.nodes('/cars[1]/make')i(i)
LEFT JOIN @tbl_makes MA on i.i.value('@name', 'nvarchar(100)') = MA.makeName
WHERE MA.ID IS NULL;
INSERT INTO @tbl_models (makeID, modelName)
SELECT MA.ID, j.j.value('@name', 'nvarchar(100)')
FROM @xml.nodes('/cars[1]/make')i(i)
INNER JOIN @tbl_makes MA ON i.i.value('@name', 'nvarchar(100)') = MA.makeName
CROSS APPLY i.i.nodes('model')j(j)
LEFT JOIN @tbl_models MO on j.j.value('@name', 'nvarchar(100)') = MO.modelName
WHERE MO.ID IS NULL;
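A variation on the same idea, as a minimal sketch rather than a tested implementation: shred the XML once into a flat staging set, insert the parents, then insert the children. The @staging table variable is a hypothetical helper that does not appear in the question, and the sample XML is shortened. Note that CROSS APPLY drops makes that have no model children; OUTER APPLY would keep them.

```sql
DECLARE @tbl_makes TABLE (ID int IDENTITY(1,1), makeName nvarchar(100));
DECLARE @tbl_models TABLE (ID int IDENTITY(1,1), makeID int, modelName nvarchar(100));
INSERT INTO @tbl_makes (makeName) VALUES (N'Ford'), (N'Jaguar');

DECLARE @xml XML = N'<cars>
  <make name="Ford"><model name="Mustang" /><model name="Taurus" /></make>
  <make name="Aston Martin"><model name="Vanquish" /><model name="DB7" /></make>
</cars>';

-- Hypothetical staging set: one row per (make, model) pair found in the XML.
DECLARE @staging TABLE (makeName nvarchar(100), modelName nvarchar(100));

INSERT INTO @staging (makeName, modelName)
SELECT mk.n.value('@name', 'nvarchar(100)'),
       md.n.value('@name', 'nvarchar(100)')
FROM @xml.nodes('/cars/make') AS mk(n)
CROSS APPLY mk.n.nodes('model') AS md(n);

-- Parents first: add only the makes that are not stored yet.
INSERT INTO @tbl_makes (makeName)
SELECT DISTINCT s.makeName
FROM @staging AS s
WHERE NOT EXISTS (SELECT 1 FROM @tbl_makes AS m WHERE m.makeName = s.makeName);

-- Children next: resolve makeID by name and skip models
-- that already exist for that particular make.
INSERT INTO @tbl_models (makeID, modelName)
SELECT m.ID, s.modelName
FROM @staging AS s
INNER JOIN @tbl_makes AS m ON m.makeName = s.makeName
WHERE NOT EXISTS (SELECT 1
                  FROM @tbl_models AS x
                  WHERE x.makeID = m.ID
                    AND x.modelName = s.modelName);

SELECT m.makeName, md.modelName
FROM @tbl_makes AS m
INNER JOIN @tbl_models AS md ON md.makeID = m.ID;
```

Checking existing models on both makeID and modelName means a model name shared by two makes is still inserted for each of them.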

Related

TSQL xml output with namespaces [duplicate]

UPDATE: I've discovered there is a Microsoft Connect item raised for this issue here
When using FOR XML PATH and WITH XMLNAMESPACES to declare a default namespace, the namespace declaration is duplicated in the top-level nodes of any nested queries that use FOR XML. I've stumbled across a few solutions online, but I'm not totally convinced...
Here's a complete example:
/*
drop table t1
drop table t2
*/
create table t1 ( c1 int, c2 varchar(50))
create table t2 ( c1 int, c2 int, c3 varchar(50))
insert t1 values
(1, 'Mouse'),
(2, 'Chicken'),
(3, 'Snake');
insert t2 values
(1, 1, 'Front Right'),
(2, 1, 'Front Left'),
(3, 1, 'Back Right'),
(4, 1, 'Back Left'),
(5, 2, 'Right'),
(6, 2, 'Left')
;with XmlNamespaces( default 'uri:animal')
select
a.c2 as "@species"
, (select l.c3 as "text()"
from t2 l where l.c2 = a.c1
for xml path('leg'), type) as "legs"
from t1 a
for xml path('animal'), root('zoo')
What's the best solution?
After hours of desperation and hundreds of trials & errors, I've come up with the solution below.
I had the same issue: I wanted just one xmlns attribute, on the root node only. But I also had a very complex query with lots of subqueries, and the FOR XML EXPLICIT method alone was just too cumbersome. So yes, I wanted the convenience of FOR XML PATH in the subqueries and also to set my own xmlns.
I kindly borrowed the code of 8kb's answer, because it was so nice. I tweaked it a bit for better understanding. Here is the code:
DECLARE @Order TABLE (OrderID INT, OrderDate DATETIME)
DECLARE @OrderDetail TABLE (OrderID INT, ItemID VARCHAR(1), Name VARCHAR(50), Qty INT)
INSERT @Order VALUES (1, '2010-01-01'), (2, '2010-01-02')
INSERT @OrderDetail VALUES (1, 'A', 'Drink', 5),
(1, 'B', 'Cup', 2),
(2, 'A', 'Drink', 2),
(2, 'C', 'Straw', 1),
(2, 'D', 'Napkin', 1)
-- Your ordinary FOR XML PATH query
DECLARE @xml XML = (SELECT OrderID AS "@OrderID",
(SELECT ItemID AS "@ItemID",
Name AS "data()"
FROM @OrderDetail
WHERE OrderID = o.OrderID
FOR XML PATH ('Item'), TYPE)
FROM @Order o
FOR XML PATH ('Order'), ROOT('dummyTag'), TYPE)
-- Magic happens here!
SELECT 1 AS Tag
,NULL AS Parent
,@xml AS [xml!1!!xmltext]
,'http://test.com/order' AS [xml!1!xmlns]
FOR XML EXPLICIT
Result:
<xml xmlns="http://test.com/order">
<Order OrderID="1">
<Item ItemID="A">Drink</Item>
<Item ItemID="B">Cup</Item>
</Order>
<Order OrderID="2">
<Item ItemID="A">Drink</Item>
<Item ItemID="C">Straw</Item>
<Item ItemID="D">Napkin</Item>
</Order>
</xml>
If you selected @xml alone, you would see that it contains the root node dummyTag. We don't need it, so we remove it by using the xmltext directive in the FOR XML EXPLICIT query:
,@xml AS [xml!1!!xmltext]
The explanation on MSDN sounds more sophisticated, but in practice it tells the parser to select the contents of the XML root node.
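To see what "the contents of the root node" means in isolation, here is a small sketch that uses the xml type's query() method instead of xmltext. The literal XML is a shortened, assumed stand-in for the variable built by the FOR XML PATH query, and the column aliases are only illustrative.

```sql
-- Shortened stand-in for the XML produced by the FOR XML PATH query,
-- including the throwaway dummyTag wrapper.
DECLARE @xml XML = N'<dummyTag>
  <Order OrderID="1"><Item ItemID="A">Drink</Item></Order>
</dummyTag>';

-- The variable as-is still carries the wrapper element.
SELECT @xml AS WithWrapper;

-- Every child of dummyTag, without dummyTag itself.
SELECT @xml.query('/dummyTag/*') AS WithoutWrapper;
```

The second result is the part that ends up inside the new root element in the answer's output.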
Not sure how fast the query is, yet currently I am relaxing and drinking Scotch like a gent while peacefully looking at the code...
If I have understood correctly, you are referring to the behavior that you might see in a query like this:
DECLARE @Order TABLE (
OrderID INT,
OrderDate DATETIME)
DECLARE @OrderDetail TABLE (
OrderID INT,
ItemID VARCHAR(1),
ItemName VARCHAR(50),
Qty INT)
INSERT @Order
VALUES
(1, '2010-01-01'),
(2, '2010-01-02')
INSERT @OrderDetail
VALUES
(1, 'A', 'Drink', 5),
(1, 'B', 'Cup', 2),
(2, 'A', 'Drink', 2),
(2, 'C', 'Straw', 1),
(2, 'D', 'Napkin', 1)
;WITH XMLNAMESPACES('http://test.com/order' AS od)
SELECT
OrderID AS "@OrderID",
(SELECT
ItemID AS "@od:ItemID",
ItemName AS "data()"
FROM @OrderDetail
WHERE OrderID = o.OrderID
FOR XML PATH ('od.Item'), TYPE)
FROM @Order o
FOR XML PATH ('od.Order'), TYPE, ROOT('xml')
Which gives the following results:
<xml xmlns:od="http://test.com/order">
<od.Order OrderID="1">
<od.Item xmlns:od="http://test.com/order" od:ItemID="A">Drink</od.Item>
<od.Item xmlns:od="http://test.com/order" od:ItemID="B">Cup</od.Item>
</od.Order>
<od.Order OrderID="2">
<od.Item xmlns:od="http://test.com/order" od:ItemID="A">Drink</od.Item>
<od.Item xmlns:od="http://test.com/order" od:ItemID="C">Straw</od.Item>
<od.Item xmlns:od="http://test.com/order" od:ItemID="D">Napkin</od.Item>
</od.Order>
</xml>
As you said, the namespace is repeated in the results of the subqueries.
This behavior is a feature according to a conversation on devnetnewsgroup (website now defunct) although there is the option to vote on changing it.
My proposed solution is to revert back to FOR XML EXPLICIT:
SELECT
1 AS Tag,
NULL AS Parent,
'http://test.com/order' AS [xml!1!xmlns:od],
NULL AS [od:Order!2],
NULL AS [od:Order!2!OrderID],
NULL AS [od:Item!3],
NULL AS [od:Item!3!ItemID]
UNION ALL
SELECT
2 AS Tag,
1 AS Parent,
'http://test.com/order' AS [xml!1!xmlns:od],
NULL AS [od:Order!2],
OrderID AS [od:Order!2!OrderID],
NULL AS [od:Item!3],
NULL AS [od:Item!3!ItemID]
FROM @Order
UNION ALL
SELECT
3 AS Tag,
2 AS Parent,
'http://test.com/order' AS [xml!1!xmlns:od],
NULL AS [od:Order!2],
o.OrderID AS [od:Order!2!OrderID],
d.ItemName AS [od:Item!3],
d.ItemID AS [od:Item!3!ItemID]
FROM @Order o INNER JOIN @OrderDetail d ON o.OrderID = d.OrderID
ORDER BY [od:Order!2!OrderID], [od:Item!3!ItemID]
FOR XML EXPLICIT
And see these results:
<xml xmlns:od="http://test.com/order">
<od:Order OrderID="1">
<od:Item ItemID="A">Drink</od:Item>
<od:Item ItemID="B">Cup</od:Item>
</od:Order>
<od:Order OrderID="2">
<od:Item ItemID="A">Drink</od:Item>
<od:Item ItemID="C">Straw</od:Item>
<od:Item ItemID="D">Napkin</od:Item>
</od:Order>
</xml>
An alternative solution I've seen is to add the XMLNAMESPACES declaration after building the xml into a temporary variable:
declare @xml as xml;
select @xml = (
select
a.c2 as "@species"
, (select l.c3 as "text()"
from t2 l where l.c2 = a.c1
for xml path('leg'), type) as "legs"
from t1 a
for xml path('animal'))
;with XmlNamespaces( 'uri:animal' as an)
select @xml for xml path('') , root('zoo');
The problem here is compounded by the fact that you cannot directly declare the namespaces manually when using XML PATH. SQL Server will disallow any attribute names beginning with 'xmlns' and any tag names with colons in them.
Rather than having to resort to using the relatively unfriendly XML EXPLICIT, I got around the problem by first generating XML with 'cloaked' namespace definitions and references, then doing string replaces as follows ...
DECLARE @Order TABLE (
OrderID INT,
OrderDate DATETIME)
DECLARE @OrderDetail TABLE (
OrderID INT,
ItemID VARCHAR(1),
ItemName VARCHAR(50),
Qty INT)
INSERT @Order
VALUES
(1, '2010-01-01'),
(2, '2010-01-02')
INSERT @OrderDetail
VALUES
(1, 'A', 'Drink', 5),
(1, 'B', 'Cup', 2),
(2, 'A', 'Drink', 2),
(2, 'C', 'Straw', 1),
(2, 'D', 'Napkin', 1)
declare @xml xml
set @xml = (SELECT
'http://test.com/order' as "@xxmlns..od", -- 'Cloaked' namespace def
(SELECT OrderID AS "@OrderID",
(SELECT
ItemID AS "@od..ItemID",
ItemName AS "data()"
FROM @OrderDetail
WHERE OrderID = o.OrderID
FOR XML PATH ('od..Item'), TYPE)
FROM @Order o
FOR XML PATH ('od..Order'), TYPE)
FOR XML PATH('xml'))
set @xml = cast(replace(replace(cast(@xml as nvarchar(max)), 'xxmlns', 'xmlns'),'..',':') as xml)
select @xml
A few things to point out:
I'm using 'xxmlns' as my cloaked version of 'xmlns' and '..' to stand in for ':'. This might not work for you if you're likely to have '..' as part of text values - you can substitute this with something else as long as you pick something that makes a valid XML identifier.
Since we want the xmlns definition at the top level, we cannot use the 'ROOT' option to XML PATH; instead I needed to add another outer level to the subselect structure to achieve this.
I'm a bit confused by all these explanations, because declaring "xmlns:animals" manually does the job:
Here is an example I wrote to generate Open Graph metadata:
DECLARE @l_xml as XML;
SELECT @l_xml =
(
SELECT 'http://ogp.me/ns# fb: http://ogp.me/ns/fb# scanilike: http://ogp.me/ns/fb/scanilike#' as 'xmlns:og',
(SELECT
(SELECT 'og:title' as 'property', title as 'content' for xml raw('meta'), TYPE),
(SELECT 'og:type' as 'property', OpenGraphWebMetadataTypes.name as 'content' for xml raw('meta'), TYPE),
(SELECT 'og:image' as 'property', image as 'content' for xml raw('meta'), TYPE),
(SELECT 'og:url' as 'property', url as 'content' for xml raw('meta'), TYPE),
(SELECT 'og:description' as 'property', description as 'content' for xml raw('meta'), TYPE),
(SELECT 'og:site_name' as 'property', siteName as 'content' for xml raw('meta'), TYPE),
(SELECT 'og:appId' as 'property', appId as 'content' for xml raw('meta'), TYPE)
FROM OpenGraphWebMetaDatas INNER JOIN OpenGraphWebMetadataTypes ON OpenGraphWebMetaDatas.type = OpenGraphWebMetadataTypes.id WHERE THING_KEY = @p_index
for xml path('header'), TYPE),
(SELECT '' as 'body' for xml path(''), TYPE)
for xml raw('html'), TYPE
)
RETURN @l_xml
returning the expected result
<html xmlns:og="http://ogp.me/ns# fb: http://ogp.me/ns/fb# scanilike: http://ogp.me/ns/fb/scanilike#">
<header>
<meta property="og:title" content="The First object"/>
<meta property="og:type" content="scanilike:tag"/>
<meta property="og:image" content="http://www.mygeolive.com/images/facebook/facebook-logo.jpg"/>
<meta property="og:url" content="http://www.scanilike.com/opengraph?id=1"/>
<meta property="og:description" content="This is the very first object created using the IOThing & ScanILike software. We keep it in file for history purpose. "/>
<meta property="og:site_name" content="http://www.scanilike.com"/>
<meta property="og:appId" content="200270673369521"/>
</header>
<body/>
</html>
Hope this helps people who are searching the web for a similar issue. ;-)
It would be really nice if FOR XML PATH actually worked more cleanly. Reworking your original example with @table variables:
declare @t1 table (c1 int, c2 varchar(50));
declare @t2 table (c1 int, c2 int, c3 varchar(50));
insert @t1 values
(1, 'Mouse'),
(2, 'Chicken'),
(3, 'Snake');
insert @t2 values
(1, 1, 'Front Right'),
(2, 1, 'Front Left'),
(3, 1, 'Back Right'),
(4, 1, 'Back Left'),
(5, 2, 'Right'),
(6, 2, 'Left');
;with xmlnamespaces( default 'uri:animal')
select a.c2 as "@species",
(
select l.c3 as "text()"
from @t2 l
where l.c2 = a.c1
for xml path('leg'), type
) as "legs"
from @t1 a
for xml path('animal'), root('zoo');
Yields the problem XML with repeated namespace declarations:
<zoo xmlns="uri:animal">
<animal species="Mouse">
<legs>
<leg xmlns="uri:animal">Front Right</leg>
<leg xmlns="uri:animal">Front Left</leg>
<leg xmlns="uri:animal">Back Right</leg>
<leg xmlns="uri:animal">Back Left</leg>
</legs>
</animal>
<animal species="Chicken">
<legs>
<leg xmlns="uri:animal">Right</leg>
<leg xmlns="uri:animal">Left</leg>
</legs>
</animal>
<animal species="Snake" />
</zoo>
You can migrate elements between namespaces using XQuery with wildcard namespace matching (that is, *:elementName), as below, but it can be quite cumbersome for complex XML:
;with xmlnamespaces( default 'http://tempuri.org/this/namespace/is/meaningless' )
select (
select a.c2 as "@species",
(
select l.c3 as "text()"
from @t2 l
where l.c2 = a.c1
for xml path('leg'), type
) as "legs"
from @t1 a
for xml path('animal'), root('zoo'), type
).query('declare default element namespace "uri:animal";
<zoo>
{ for $a in *:zoo/*:animal return
<animal>
{attribute species {$a/@species}}
{ for $l in $a/*:legs return
<legs>
{ for $m in $l/*:leg return
<leg>{ $m/text() }</leg>
}</legs>
}</animal>
}</zoo>');
Which yields your desired result:
<zoo xmlns="uri:animal">
<animal species="Mouse">
<legs>
<leg>Front Right</leg>
<leg>Front Left</leg>
<leg>Back Right</leg>
<leg>Back Left</leg>
</legs>
</animal>
<animal species="Chicken">
<legs>
<leg>Right</leg>
<leg>Left</leg>
</legs>
</animal>
<animal species="Snake" />
</zoo>

How do I remove redundant namespace in nested query when using FOR XML in SQL Function

There is a solution for removing the namespace here, but I need it to work in a SQL function. I'm getting this error:
The FOR XML clause is invalid in views, inline functions, derived tables, and subqueries when they contain a set operator. To work around, wrap the SELECT containing a set operator using derived table syntax and apply FOR XML on top of it.
Can somebody help me?
Thanks Xtyan
From comment: This is the needed XML-output
<zoo xmlns:an="uri:animal">
<an:animal species="Mouse">
<an:legs>
<an:leg>Front Right</an:leg>
<an:leg>Front Left</an:leg>
<an:leg>Back Right</an:leg>
<an:leg>Back Left</an:leg>
</an:legs>
</an:animal>
<an:animal species="Chicken">
<an:legs>
<an:leg>Right</an:leg>
<an:leg>Left</an:leg>
</an:legs>
</an:animal>
<an:animal species="Snake" />
</zoo>
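The workaround named in the error message (wrap the SELECT that contains the set operator in a derived table and apply FOR XML on top of it) could look roughly like the following. This is an untested sketch: the function name dbo.ZooAnimalsXml is hypothetical, t1 is assumed to be the table from the original example, and only two levels (zoo and an:animal) are built, not the full legs structure.

```sql
CREATE FUNCTION dbo.ZooAnimalsXml()   -- hypothetical name
RETURNS XML
AS
BEGIN
    DECLARE @xml XML;

    SET @xml = (
        SELECT u.Tag,
               u.Parent,
               u.[zoo!1!xmlns:an],
               u.[an:animal!2!species]
        FROM (
            -- Level 1: the root element carrying the namespace declaration.
            SELECT 1 AS Tag,
                   NULL AS Parent,
                   'uri:animal' AS [zoo!1!xmlns:an],
                   NULL AS [an:animal!2!species]
            UNION ALL
            -- Level 2: one an:animal element per row of t1.
            SELECT 2, 1, NULL, a.c2
            FROM t1 AS a
        ) AS u                         -- the set operator now lives inside a derived table
        ORDER BY u.Tag, u.[an:animal!2!species]
        FOR XML EXPLICIT               -- FOR XML is applied on top of the derived table
    );

    RETURN @xml;
END
```

Deeper levels would need further UNION ALL branches inside the derived table, plus sort columns that keep each parent row ahead of its children.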
You probably will not like this...
I did not find a generic solution for this... It is no problem to create this with repeated namespaces. This is not wrong, but annoying:
declare @xml as xml;
WITH XMLNAMESPACES('uri:animal' as an)
select @xml = (
select
a.c2 as "@an:species"
, (
select l.c3 as "text()"
from t2 l where l.c2 = a.c1
for xml path('an:leg'), type
) as "an:legs"
from t1 a
for xml path('an:animal'),root('zoo'));
One solution: build this via string-concatenation
But this is super-ugly...
An (almost) working solution...
The following solution uses FLWOR to re-create the XML after its creation, but this is not generic. It is necessary to add something from this namespace to the <zoo> element, otherwise the namespace is declared repeatedly at a deeper level. I added the attribute an:Dummy.
create table t1 ( c1 int, c2 varchar(50))
create table t2 ( c1 int, c2 int, c3 varchar(50))
insert t1 values
(1, 'Mouse'),
(2, 'Chicken'),
(3, 'Snake');
insert t2 values
(1, 1, 'Front Right'),
(2, 1, 'Front Left'),
(3, 1, 'Back Right'),
(4, 1, 'Back Left'),
(5, 2, 'Right'),
(6, 2, 'Left');
GO
--the function
CREATE FUNCTION dbo.TestFunc()
RETURNS XML
AS
BEGIN
declare @xml as xml;
WITH XMLNAMESPACES('uri:animal' as an)
select @xml = (
select
a.c2 as "@an:species"
, (
select l.c3 as "text()"
from t2 l where l.c2 = a.c1
for xml path('an:leg'), type
) as "an:legs"
from t1 a
for xml path('an:animal'));
set @xml=@xml.query('declare namespace an="uri:animal";
<zoo an:Dummy="dummy">
{
for $a in /an:animal
return <an:animal an:species="{$a/@an:species}"><an:legs>
{
for $l in $a/an:legs/an:leg
return <an:leg>{$l/text()}</an:leg>
}
</an:legs></an:animal>
}
</zoo>');
return @xml;
END
GO
--the call
SELECT dbo.TestFunc();
GO
--clean up
drop function dbo.TestFunc;
drop table t1
drop table t2
The result
<zoo xmlns:an="uri:animal" an:Dummy="dummy">
<an:animal an:species="Mouse">
<an:legs>
<an:leg>Front Right</an:leg>
<an:leg>Front Left</an:leg>
<an:leg>Back Right</an:leg>
<an:leg>Back Left</an:leg>
</an:legs>
</an:animal>
<an:animal an:species="Chicken">
<an:legs>
<an:leg>Right</an:leg>
<an:leg>Left</an:leg>
</an:legs>
</an:animal>
<an:animal an:species="Snake">
<an:legs />
</an:animal>
</zoo>
Previous answer
Okay, I think I got this completely wrong in my first attempt. The following is an example, where a function returns the XML without added namespaces.
I use one of the later answers which builds the inner XML in advance without a namespace and creates the full XML as a second call with a namespace. Please check this out:
create table t1 ( c1 int, c2 varchar(50))
create table t2 ( c1 int, c2 int, c3 varchar(50))
insert t1 values
(1, 'Mouse'),
(2, 'Chicken'),
(3, 'Snake');
insert t2 values
(1, 1, 'Front Right'),
(2, 1, 'Front Left'),
(3, 1, 'Back Right'),
(4, 1, 'Back Left'),
(5, 2, 'Right'),
(6, 2, 'Left');
GO
--the function
CREATE FUNCTION dbo.TestFunc()
RETURNS XML
AS
BEGIN
declare @xml as xml;
select @xml = (
select
a.c2 as "@species"
, (select l.c3 as "text()"
from t2 l where l.c2 = a.c1
for xml path('leg'), type) as "legs"
from t1 a
for xml path('animal'));
declare @resultXML XML;
;with XmlNamespaces( 'uri:animal' as an)
select @resultXML= (SELECT @xml for xml path('') , root('zoo'),TYPE);
return @resultXML;
END
GO
--the call
SELECT dbo.TestFunc();
GO
--clean up
drop function dbo.TestFunc;
drop table t1
drop table t2
The result
<zoo xmlns:an="uri:animal">
<animal species="Mouse">
<legs>
<leg>Front Right</leg>
<leg>Front Left</leg>
<leg>Back Right</leg>
<leg>Back Left</leg>
</legs>
</animal>
<animal species="Chicken">
<legs>
<leg>Right</leg>
<leg>Left</leg>
</legs>
</animal>
<animal species="Snake" />
</zoo>

SQL Server: bulk insert node into xml joined to another table

I've googled but can't find a good example of this.
I have a #temp table with a PK ID and a decimal column:
ID decimal
2 0.34
3 0.1
I have another table called Master, with the same PK and an xml column, like:
Master
ID xml
2 <Form ....
3 <Form.....
I need to insert a new node into the xml whose value is the decimal value. The new nodes all have the same element name and sit at the same level.
The xml on a basic level looks like:
<Form formCode="123">
<Node1>234</Node1>
<Node2>234</Node2>
</Form>
And I want the final xml to look like:
<Form formCode="123">
<Node1>234</Node1>
<Node2>234</Node2>
<NewNode>0.34</NewNode>
</Form>
I think it should be something like:
UPDATE Master
SET
xml.modify('insert /Form/'...followed by some kind of join.
Try something like this
DECLARE @tbl TABLE(ID INT, decimalColumn DECIMAL(4,2));
INSERT INTO @tbl VALUES
(2,0.34)
,(3,0.1);
DECLARE @master TABLE(ID INT, xmlColumn XML);
INSERT INTO @master VALUES
(2,
'<Form formCode="123">
<Node1>234</Node1>
<Node2>234</Node2>
</Form>')
,(3,
'<Form formCode="456">
<Node1>234</Node1>
<Node2>234</Node2>
</Form>')
UPDATE @master SET xmlColumn.modify('insert sql:column("NewNode.AsXml") as last into /Form[1]')
FROM @master AS m
INNER JOIN @tbl AS tbl ON tbl.ID=m.ID
CROSS APPLY(SELECT CAST(tbl.decimalColumn AS VARCHAR(MAX)) FOR XML PATH('NewNode'),TYPE) AS NewNode(AsXml);
SELECT * FROM @master
The result
2 <Form formCode="123"><Node1>234</Node1><Node2>234</Node2><NewNode>0.34</NewNode></Form>
3 <Form formCode="456"><Node1>234</Node1><Node2>234</Node2><NewNode>0.10</NewNode></Form>
Use UPDATE ... FROM and the sql:column function:
DECLARE @temp TABLE (id int, d decimal(10,2));
DECLARE @master TABLE (id int, x xml);
INSERT @temp
VALUES (2, 0.34),(3,.1);
INSERT @master
VALUES (2, '<Form><Test /></Form>'), (3, '<Form><Test /></Form>')
UPDATE m
SET x.modify('insert <NewNode>{sql:column("d.d")}</NewNode> after (/Form/Test)[1]')
FROM @master m
INNER JOIN @temp d
ON m.id = d.id
SELECT * FROM @master

Create a view in sql server using dynamic sql inside

I know this sounds weird, but is it possible to have a view that uses dynamic SQL to build its result? I know that views are compiled, so most probably this is not possible. I could certainly do it with a stored procedure instead, but I just want to make sure it is not possible.
Here I have an example:
declare @Table1 as table (
Id int,
Name nvarchar(50),
Provider nvarchar(50)
)
insert @Table1 values (1, 'John', 'Provider1')
insert @Table1 values (2, 'Peter', 'Provider1')
insert @Table1 values (3, 'Marcus', 'Provider2')
declare @Table2 as table (
Id int,
Info nvarchar(50),
AnotherInfo nvarchar(50)
)
insert @Table2 values (1, 'Expense', '480140')
insert @Table2 values (1, 'Maintenance', '480130')
insert @Table2 values (2, 'Set Up Cost', '480150')
insert @Table2 values (2, 'Something', '480160')
--No columns from Table2
select a.Id, a.Name, a.Provider from @Table1 a left join @Table2 b on a.Id = b.Id
--With columns from Table2
select a.Id, a.Name, a.Provider, b.Info, b.AnotherInfo from @Table1 a left join @Table2 b on a.Id = b.Id
The first select looks like it returns repeated data, which is expected because of the left join. To avoid that I would need a DISTINCT, which is what I don't want to do. My example is short, but the real query has many more columns and the table is quite big.
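A view is defined by a single SELECT statement, so it cannot execute dynamic SQL. The stored-procedure alternative mentioned above could be sketched as follows. Everything named here is an assumption made for illustration: the permanent tables dbo.Table1 and dbo.Table2 stand in for the table variables (which are not visible inside dynamic SQL), and the procedure name and parameter are hypothetical.

```sql
-- Hypothetical permanent tables standing in for @Table1 / @Table2.
CREATE TABLE dbo.Table1 (Id int, Name nvarchar(50), Provider nvarchar(50));
CREATE TABLE dbo.Table2 (Id int, Info nvarchar(50), AnotherInfo nvarchar(50));
GO
CREATE PROCEDURE dbo.GetProviders      -- hypothetical name
    @IncludeDetails bit = 0            -- hypothetical switch
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max) = N'SELECT a.Id, a.Name, a.Provider';

    -- Only join Table2 when its columns are actually requested,
    -- so the plain case never multiplies the Table1 rows.
    IF @IncludeDetails = 1
        SET @sql += N', b.Info, b.AnotherInfo
                      FROM dbo.Table1 AS a
                      LEFT JOIN dbo.Table2 AS b ON a.Id = b.Id';
    ELSE
        SET @sql += N' FROM dbo.Table1 AS a';

    EXEC sys.sp_executesql @sql;
END
GO
EXEC dbo.GetProviders @IncludeDetails = 0;  -- Table1 columns only, no join
EXEC dbo.GetProviders @IncludeDetails = 1;  -- adds the Table2 columns via the left join
```

With only two shapes, a plain IF/ELSE around two static SELECT statements would work as well; the dynamic string matters mainly when the column list varies more freely.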

Left join results in extra records

This is a basic left join problem and I have read many articles explaining what is going on but somehow the resolution is not clicking in my head. My left table has unique records. My right table has several records for each record in the left.
In the articles I have been reading this is often explained as left table has customers and right table has orders. That is very similar but not exactly what I am facing.
In my situation the left table has unique records and the right has repetitive data to be migrated into db the left table is in. So I am trying to write a query that will join on the key shared by both but I only need one record from the right. The results I am getting of course have multiple records since the single left matches multiple times on the right.
I am thinking I need to add some sort of filtering such as Top(1) but still reading / learning and wanted to get feedback / direction from the brainiacs on this list.
Here is a simple schema of what I am working with:
DECLARE @Customer TABLE
(
Id int,
Name varchar(50),
email varchar(50)
)
INSERT @Customer VALUES(1, 'Frodo', 'frodo@middleearth.org')
INSERT @Customer VALUES(2, 'Bilbo', 'Bilbo@middleearth.org')
INSERT @Customer VALUES(3, 'Galadriel', 'Galadriel@middleearth.org')
INSERT @Customer VALUES(4, 'Arwen', 'Arwen@middleearth.org')
INSERT @Customer VALUES(5, 'Gandalf', 'Gandalf@middleearth.org')
DECLARE @CustomerJobs TABLE
(
Id int,
email varchar(50),
jobname varchar(50)
)
INSERT @CustomerJobs VALUES(1, 'frodo@middleearth.org', 'RingBearer')
INSERT @CustomerJobs VALUES(2, 'frodo@middleearth.org', 'RingBearer')
INSERT @CustomerJobs VALUES(3, 'frodo@middleearth.org', 'RingBearer')
INSERT @CustomerJobs VALUES(4, 'frodo@middleearth.org', 'RingBearer')
INSERT @CustomerJobs VALUES(5, 'frodo@middleearth.org', 'RingBearer')
INSERT @CustomerJobs VALUES(6, 'Bilbo@middleearth.org', 'Burglar')
INSERT @CustomerJobs VALUES(7, 'Bilbo@middleearth.org', 'Burglar')
INSERT @CustomerJobs VALUES(8, 'Bilbo@middleearth.org', 'Burglar')
INSERT @CustomerJobs VALUES(9, 'Galadriel@middleearth.org', 'MindReader')
INSERT @CustomerJobs VALUES(10, 'Arwen@middleearth.org', 'Evenstar')
INSERT @CustomerJobs VALUES(10, 'Arwen@middleearth.org', 'Evenstar')
INSERT @CustomerJobs VALUES(11, 'Gandalf@middleearth.org', 'WhiteWizard')
INSERT @CustomerJobs VALUES(12, 'Gandalf@middleearth.org', 'WhiteWizard')
SELECT
Cust.Name,
Cust.email,
CJobs.jobname
FROM
@Customer Cust
LEFT JOIN @CustomerJobs CJobs ON
Cjobs.email = Cust.email
I'm toying with ROW_NUMBER() OVER (PARTITION BY ...): maybe I should be joining to a CTE that uses it instead of joining to the table itself?
One other constraint I have is I can't delete the duplicates from the right table.
So again my apologies for the simplistic question and thank you for the help.
Instead of using a left join, use an outer apply... you can then use the top clause to limit the rows returned...
select
Cust.Name
, Cust.email
, CJobs.jobname
from @Customer Cust
outer apply (
select top 1 *
from @CustomerJobs CJobs
where Cjobs.email = Cust.email
) cjobs;
You have to come up with some artificial method of reducing the second table to one row per email. For example:
SELECT
Cust.Name,
Cust.ID,
Cust.email,
CJobs.jobname
FROM
@Customer Cust
LEFT JOIN
(select min(id) as id,email, jobname
from
@CustomerJobs
group by email, jobname) as CJobs ON
Cjobs.email = Cust.email
But that's pretty much random. Is there a way to determine which row from your CustomerJobs table is the "right" one?
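If "the row with the lowest Id" is an acceptable definition of the right one (an assumption, since the question does not say), the ROW_NUMBER idea mentioned in the question can be sketched like this. The data is a shortened version of the question's tables, the Sam row is a hypothetical addition to show a customer with no jobs, and the CTE name RankedJobs is made up for the example.

```sql
DECLARE @Customer TABLE (Id int, Name varchar(50), email varchar(50));
DECLARE @CustomerJobs TABLE (Id int, email varchar(50), jobname varchar(50));

INSERT @Customer VALUES
    (1, 'Frodo', 'frodo@middleearth.org'),
    (2, 'Bilbo', 'bilbo@middleearth.org'),
    (3, 'Sam',   'sam@middleearth.org');      -- hypothetical customer with no jobs

INSERT @CustomerJobs VALUES
    (1, 'frodo@middleearth.org', 'RingBearer'),
    (2, 'frodo@middleearth.org', 'RingBearer'),
    (6, 'bilbo@middleearth.org', 'Burglar'),
    (7, 'bilbo@middleearth.org', 'Burglar');

-- Number the job rows per email; rn = 1 is the row with the lowest Id.
WITH RankedJobs AS (
    SELECT email,
           jobname,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY Id) AS rn
    FROM @CustomerJobs
)
SELECT Cust.Name,
       Cust.email,
       RJ.jobname
FROM @Customer AS Cust
LEFT JOIN RankedJobs AS RJ
       ON RJ.email = Cust.email
      AND RJ.rn = 1;   -- filter in the ON clause so customers without jobs are kept
```

Nothing is deleted from the right-hand table, and changing the ORDER BY inside the window is the only thing needed to pick a different "right" row.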
SELECT DISTINCT
Cust.Name,
Cust.email,
CJobs.jobname
FROM
@Customer Cust
LEFT JOIN @CustomerJobs CJobs ON
Cjobs.email = Cust.email
The addition of the DISTINCT keyword should get you what you want.
This will work:
SELECT
Cust.Name,
Cust.ID,
Cust.email,
C2.jobname
FROM @Customer Cust
LEFT JOIN
(SELECT DISTINCT email, jobname
FROM @CustomerJobs) C2 ON C2.email = Cust.email
