SQL Server - update column where <set of columns> not in <other table>

Suppose I have two tables Main and Other like the following:
Main
+--------+---------------------------+
| Field1 | <set of other columns...> |
+--------+---------------------------+
| NULL   | ...                       |
| NULL   | ...                       |
| NULL   | ...                       |
+--------+---------------------------+
Other
+--------------------------------+
| <same set of other columns...> |
+--------------------------------+
| ...                            |
| ...                            |
| ...                            |
+--------------------------------+
Is there a concise way to update Main.Field1 where the rest of the columns, taken together, are not in a row of Other?
In other words, I want to update Field1 for each row in
SELECT <set of other columns...> FROM Main
EXCEPT
SELECT <same set of other columns...> FROM Other
Dynamic SQL is an option, but I'm trying to figure out the most efficient way to do something like this.

If you really wanted to, you could use the EXCEPT clause:
UPDATE m
SET field = 'AValue'
FROM MAIN m
INNER JOIN
(
    SELECT * FROM MAIN
    EXCEPT
    SELECT * FROM OTHER
) t
    ON m.PK = t.PK
Note that the use of * here is fragile; you should list the columns explicitly. This also assumes that the joining field (PK) exists in both Main and Other and holds the same values.
Otherwise, you're much better off using NOT EXISTS or an anti-join (LEFT JOIN with an IS NULL filter):
UPDATE m
SET Field1 = 'foo'
FROM Main m
LEFT JOIN Other o
    ON m.FIELD2 = o.FIELD2
    AND m.FIELD3 = o.FIELD3
WHERE o.PK IS NULL

UPDATE m
SET ...
FROM Main m
WHERE NOT EXISTS (SELECT * FROM Other WHERE <columns-equal>)
This translates to a left anti-semi-join.
You could also use a LEFT JOIN, but that translates to an ordinary left join plus a filter, which is slightly less efficient. This looks like an optimizer weakness.


ON CONFLICT DO UPDATE/DO NOTHING not working on FOREIGN TABLE

The ON CONFLICT DO UPDATE/DO NOTHING feature arrived in PostgreSQL 9.5, and support for CREATE SERVER and FOREIGN TABLE arrived in PostgreSQL 9.2.
When I use ON CONFLICT DO UPDATE on a FOREIGN TABLE it does not work, but the same query works on a normal table. The queries are given below.
-- For the normal table
INSERT INTO app (app_id, app_name, app_date)
SELECT p.app_id, p.app_name, p.app_date
FROM app p
WHERE p.app_id = 2422
ON CONFLICT (app_id) DO
UPDATE SET app_date = excluded.app_date;
Output: Query returned successfully: one row affected, 5 msec execution time.
-- For the foreign table (foreign_app is a foreign table, app is a normal table)
INSERT INTO foreign_app (app_id, app_name, app_date)
SELECT p.app_id, p.app_name, p.app_date
FROM app p
WHERE p.app_id = 2422
ON CONFLICT (app_id) DO
UPDATE SET app_date = excluded.app_date;
Output: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
Can anyone explain why this is happening?
There are no constraints on foreign tables, because PostgreSQL cannot enforce data integrity on the foreign server – that is done by constraints defined on the foreign server.
To achieve what you want to do, you'll have to stick with the “traditional” way of doing this (e.g. this code sample).
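The "traditional" way usually means an UPDATE followed by an INSERT of whatever the UPDATE missed. A minimal sketch of that pattern, using Python's sqlite3 module for illustration (table and column names taken from the question; note that on a real foreign table this two-step approach is not atomic, so concurrent writers need locking or retry logic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app(app_id INTEGER PRIMARY KEY, app_date TEXT);
    INSERT INTO app VALUES (2422, '2020-01-01');
""")

def upsert(app_id, app_date):
    # Step 1: try to update an existing row.
    cur = conn.execute(
        "UPDATE app SET app_date = ? WHERE app_id = ?", (app_date, app_id))
    # Step 2: insert only if the update touched nothing.
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO app(app_id, app_date) VALUES (?, ?)",
            (app_id, app_date))

upsert(2422, '2021-06-30')   # existing key: updated in place
upsert(9999, '2021-07-01')   # new key: inserted
rows = conn.execute("SELECT app_id, app_date FROM app ORDER BY app_id").fetchall()
print(rows)
```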
I know this is an old question, but in some cases there is a way to do it with ROW_NUMBER() OVER (PARTITION BY ...). In my case, my first take was to try ON CONFLICT ... DO UPDATE, but that doesn't work on foreign tables (as stated above; hence my finding this question). My problem was very specific: I had a foreign table (f_zips) to be populated with the best zip code (postal code) information possible. I also had a local table, postcodes, with very good data and another local table, zips, with lower-quality zip code information but much more of it. For every record in postcodes, there is a corresponding record in zips, but the postal codes may not match. I wanted f_zips to hold the best data.
I solved this with a union, with a value of ind = 0 as the indicator that a record came from the better data set. A value of ind = 1 indicates lesser-quality data. Then I used row_number() over a partition to get the answer (where get_valid_zip5() is a local function that returns either a five-digit zip code or a null value):
insert into f_zips (recnum, postcode)
select s2.recnum, s2.zip5
from (
    select s1.recnum, s1.zip5, s1.ind,
           row_number() over (partition by recnum order by s1.ind) as rn
    from (
        select recnum, get_valid_zip5(postcode) as zip5, 0 as ind
        from postcodes
        where get_valid_zip5(postcode) is not null
        union
        select recnum, get_valid_zip5(zip9) as zip5, 1 as ind
        from zips
        where get_valid_zip5(zip9) is not null
        order by 1, 3
    ) s1
) s2
where s2.rn = 1;
I haven't run any performance tests, but for me this runs in cron and doesn't directly affect the users.
Verified on more than 900,000 records (SQL formatting omitted for brevity) :
/* yes, the preferred data was entered when it existed in both tables */
select t1.recnum, t1.postcode, t2.zip9
from postcodes t1
join zips t2 on t1.recnum = t2.recnum
where t1.postcode is not null
  and t2.zip9 is not null
  and t2.zip9 not in ('0')
  and length(t1.postcode) = 5
  and length(t2.zip9) = 5
  and t1.postcode <> t2.zip9
order by 1 limit 5;
  recnum  | postcode | zip9
----------+----------+-------
 12022783 | 98409    | 98984
 12022965 | 98226    | 98225
 12023113 | 98023    | 98003
select * from f_zips where recnum in (12022783, 12022965, 12023113) order by 1;
  recnum  | postcode
----------+----------
 12022783 | 98409
 12022965 | 98226
 12023113 | 98023
/* yes, entries came from the less-preferred dataset when they didn't exist in the better one */
select t1.recnum, t1.postcode, t2.zip9
from postcodes t1
right join zips t2 on t1.recnum = t2.recnum
where t1.postcode is null
  and t2.zip9 is not null
  and t2.zip9 not in ('0')
  and length(t2.zip9) = 5
order by 1 limit 3;
  recnum  | postcode | zip9
----------+----------+-------
 12021451 |          | 98370
 12022341 |          | 98501
 12022695 |          | 98597
select * from f_zips where recnum in (12021451, 12022341, 12022695) order by 1;
  recnum  | postcode
----------+----------
 12021451 | 98370
 12022341 | 98501
 12022695 | 98597
/* yes, entries came from the preferred dataset when the less-preferred one had invalid values */
select t1.recnum, t1.postcode, t2.zip9
from postcodes t1
left join zips t2 on t1.recnum = t2.recnum
where t1.postcode is not null
  and t2.zip9 is null
order by 1 limit 3;
  recnum  | postcode | zip9
----------+----------+------
 12393585 | 98118    |
 12393757 | 98101    |
 12393835 | 98101    |
select * from f_zips where recnum in (12393585, 12393757, 12393835) order by 1;
  recnum  | postcode
----------+----------
 12393585 | 98118
 12393757 | 98101
 12393835 | 98101
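The row_number()-over-partition pattern above boils down to: for each recnum, keep the row with the lowest ind (the best source). A small sketch using Python's sqlite3 module (SQLite 3.25+ for window functions; table name, recnum values, and zip codes are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates(recnum INTEGER, zip5 TEXT, ind INTEGER);
    INSERT INTO candidates VALUES
        (1, '98409', 0),  -- preferred source
        (1, '98984', 1),  -- lesser source, same recnum
        (2, '98501', 1);  -- only the lesser source has recnum 2
""")

# rn = 1 picks exactly one row per recnum, preferring the lowest ind.
rows = conn.execute("""
    SELECT recnum, zip5 FROM (
        SELECT recnum, zip5,
               ROW_NUMBER() OVER (PARTITION BY recnum ORDER BY ind) AS rn
        FROM candidates
    ) WHERE rn = 1
    ORDER BY recnum
""").fetchall()
print(rows)
```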

Join TSQL Tables in same DB

I want to do a simple join of two tables in the same DB.
The expected result:
Get all Node_ID values from the table T_Tree that are the same as the TREE_CATEGORY values in the table T_Documents.
My T_Documents table:
+--------+---------------+---------------------+
| Doc_ID | TREE_CATEGORY | Desc                |
+--------+---------------+---------------------+
| 89893  | 1363          | Test                |
| 89894  | 1364          | with a tab or 4 spa |
+--------+---------------+---------------------+
T_Tree table:
+---------+-------+
| Node_ID | Name  |
+---------+-------+
| 89893   | Hallo |
| 89894   | BB    |
+---------+-------+
Doc_ID is the primary key in the T_Documents table, and TREE_CATEGORY is the foreign key.
Node_ID is the primary key in the T_Tree table.
SELECT DBName.dbo.T_Tree.NODE_ID
FROM DBName.dbo.T_Documents
INNER JOIN TREE_CATEGORY ON T_Documents.TREE_CATEGORY = DBName.dbo.T_Tree.NODE_ID
I cannot figure out how to do it correctly. Is this even the right approach?
You were close. Try this:
SELECT t2.NODE_ID
FROM DBName.dbo.T_Documents t1
INNER JOIN DBName.dbo.T_Tree t2
ON t1.Doc_ID = t2.NODE_ID
Comments:
I used aliases in the query, which are a sort of shorthand for the table names. Aliases can make a query easier to read because they remove the need to repeat full table names.
You need to specify table names in the JOIN clause, and the columns used for joining in the ON clause.
Your SQL should be:
SELECT DBName.dbo.T_Tree.NODE_ID
FROM DBName.dbo.T_Documents d
inner join T_Tree t on d.Doc_ID = t.Node_ID
Remember: you join relations (tables), not fields.
Also, for it to work, you need to have common values on Node_ID and Doc_ID fields. That is, for each value in Doc_ID of T_Documents table there must be an equal value in Node_ID field of T_Tree table.
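As a quick sanity check, the corrected join can be run against an in-memory database. The sketch below uses Python's sqlite3 module for illustration (sample rows mirror the question; the Desc column is renamed descr because DESC is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_documents(doc_id INTEGER PRIMARY KEY, descr TEXT);
    CREATE TABLE t_tree(node_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO t_documents VALUES (89893, 'Test'), (89894, 'BB doc');
    INSERT INTO t_tree VALUES (89893, 'Hallo'), (89894, 'BB');
""")

# Join the two tables on their matching key columns, as in the answers above.
rows = conn.execute("""
    SELECT t.node_id
    FROM t_documents d
    INNER JOIN t_tree t ON d.doc_id = t.node_id
    ORDER BY t.node_id
""").fetchall()
print(rows)
```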

Table names in front of field names when using SELECT *

I was wondering if there is a possibility to show full column names when using SELECT * in combination with JOIN.
As an example I have this query (which selects data that gets imported from another application):
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.SomeKey = t2.SomeKey
LEFT JOIN Table3 t3 ON t2.SomeOtherKey = t3.SomeOtherKey
This gives me results like this:
+---------+------+-------+------------+---------+--------------+-----------+------------+--------------+-------+---------------+------------+
| SomeKey | Name | Value | WhateverId | SomeKey | SomeOtherKey | ValueType | CategoryId | SomeOtherKey | Value | ActualValueId | SomeTypeId |
+---------+------+-------+------------+---------+--------------+-----------+------------+--------------+-------+---------------+------------+
| bla | bla | bla | bla | bla | bla | bla | bla | bla | bla | bla | bla |
+---------+------+-------+------------+---------+--------------+-----------+------------+--------------+-------+---------------+------------+
What I'd like to have is the table name in front of each field. The results would be something like this:
+----------------+-------------+-------+------------------+
| Table1.SomeKey | Table1.Name | ..... | Table2.ValueType | .....
+----------------+-------------+-------+------------------+
| bla | bla | ..... | bla | .....
+----------------+-------------+-------+------------------+
I want to do this because the query is already given (without SELECT *) and I now have to find a column in one of the tables with values that match a given identity from an additional table. I know I could analyze each of the tables. However, I'd like to know if there is any simple statement I could add to get the table names in front of the field names.
Well, the answer to that question is no: you can't alias columns through *.
If you want control over the columns, list them yourself and alias them as you like. In general, I avoid * so I don't end up with ambiguous columns.
Think about it this way: two extra minutes of work avoids a class of errors that would cost you 100x that time.
You can easily build yourself the SELECT clause using AS for each column. Let's say you have the following tables:
IF OBJECT_ID('dbo.Table01') IS NOT NULL
BEGIN
DROP TABLE Table01;
END;
IF OBJECT_ID('dbo.Table02') IS NOT NULL
BEGIN
DROP TABLE Table02;
END;
CREATE TABLE Table01
(
[ID] INT IDENTITY(1,1)
,[Name] VARCHAR(12)
,[Age] TINYINT
);
CREATE TABLE Table02
(
[ID] INT IDENTITY(1,1)
,[Name] VARCHAR(12)
,[Age] TINYINT
);
and your query is:
SELECT *
FROM Table01
INNER JOIN Table02
ON Table01.[ID] = Table02.[ID];
Just execute the following statement:
SELECT ',' + [source_table] + '.' + [source_column] + ' AS [' + [source_table] + '.' + [source_column] + ']'
FROM sys.dm_exec_describe_first_result_set
(N'SELECT *
FROM Table01
INNER JOIN Table02
ON Table01.[ID] = Table02.[ID]', null,1) ;
The output is a list of aliased column expressions. Just copy and paste the result into your query (and remove the first comma):
SELECT Table01.ID AS [Table01.ID]
,Table01.Name AS [Table01.Name]
,Table01.Age AS [Table01.Age]
,Table02.ID AS [Table02.ID]
,Table02.Name AS [Table02.Name]
,Table02.Age AS [Table02.Age]
FROM Table01
INNER JOIN Table02
ON Table01.[ID] = Table02.[ID];
and the result set will now show each column prefixed with its table name.
Of course you can play with the output of the function and build the SELECT columns in a way you like (excluding columns, formatting it, etc).
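sys.dm_exec_describe_first_result_set is SQL Server specific, but the same idea (enumerate the columns, then generate the aliased SELECT list) works wherever the catalog is queryable. A sketch using Python's sqlite3 module and PRAGMA table_info, with the same hypothetical Table01/Table02 schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table01(ID INTEGER, Name TEXT, Age INTEGER);
    CREATE TABLE Table02(ID INTEGER, Name TEXT, Age INTEGER);
""")

# Build 'table.col AS "table.col"' for every column of every joined table.
# (SQLite uses double quotes where SQL Server would use [brackets].)
select_list = []
for table in ("Table01", "Table02"):
    for _, col, *_ in conn.execute(f"PRAGMA table_info({table})"):
        select_list.append(f'{table}.{col} AS "{table}.{col}"')

query = ("SELECT " + ", ".join(select_list) +
         " FROM Table01 INNER JOIN Table02 ON Table01.ID = Table02.ID")
print(query)
```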

Set-based approach to updating multiple tables, rather than a WHILE loop?

Apparently I'm far too used to procedural programming, and I don't know how to handle this with a set-based approach.
I have several temporary tables in SQL Server, each with thousands of records. Some of them have tens of thousands of records each, but they're all part of a record set. I'm basically loading a bunch of xml data that looks like this:
<root>
<entry>
<id-number>12345678</id-number>
<col1>blah</col1>
<col2>heh</col2>
<more-information>
<col1>werr</col1>
<col2>pop</col2>
<col3>test</col3>
</more-information>
<even-more-information>
<col1>czxn</col1>
<col2>asd</col2>
<col3>yyuy</col3>
<col4>moat</col4>
</even-more-information>
<even-more-information>
<col1>uioi</col1>
<col2>qwe</col2>
<col3>rtyu</col3>
<col4>poiu</col4>
</even-more-information>
</entry>
<entry>
<id-number>12345679</id-number>
<col1>bleh</col1>
<col2>sup</col2>
<more-information>
<col1>rrew</col1>
<col2>top</col2>
<col3>nest</col3>
</more-information>
<more-information>
<col1>234k</col1>
<col2>fftw</col2>
<col3>west</col3>
</more-information>
<even-more-information>
<col1>asdj</col1>
<col2>dsa</col2>
<col3>mnbb</col3>
<col4>boat</col4>
</even-more-information>
</entry>
</root>
Here's a brief display of what the temporary tables look like:
Temporary Table 1 (entry):
+------------+--------+--------+
| UniqueID | col1 | col2 |
+------------+--------+--------+
| 732013 | blah | heh |
| 732014 | bleh | sup |
+------------+--------+--------+
Temporary Table 2 (more-information):
+------------+--------+--------+--------+
| UniqueID | col1 | col2 | col3 |
+------------+--------+--------+--------+
| 732013 | werr | pop | test |
| 732014 | rrew | top | nest |
| 732014 | 234k | ffw | west |
+------------+--------+--------+--------+
Temporary Table 3 (even-more-information):
+------------+--------+--------+--------+--------+
| UniqueID | col1 | col2 | col3 | col4 |
+------------+--------+--------+--------+--------+
| 732013 | czxn | asd | yyuy | moat |
| 732013 | uioi | qwe | rtyu | poiu |
| 732014 | asdj | dsa | mnbb | boat |
+------------+--------+--------+--------+--------+
I am loading this data from an XML file, and have found that this is the only way I can tell which information belongs to which record, so every single temporary table has the following inserted at the top:
T.value('../../id-number[1]', 'VARCHAR(8)') UniqueID,
As you can see, each temporary table has a UniqueID assigned to its particular record to indicate that it belongs to the main record. I have a large set of items in the database, and I want to update every single column in each non-temporary table using a set-based approach, but it must be restricted by UniqueID.
In tables other than the first one, there is a Foreign_ID based on the PrimaryKey_ID of the main table, and the UniqueID will not be inserted... it's just to help tell what goes where.
Here's the exact logic that I'm trying to figure out:
If id-number currently exists in the main table, update tables based on the PrimaryKey_ID number of the main table, which is the same exact number in every table's Foreign_ID. The foreign-key'd tables will have a totally different number than the id-number -- they are not the same.
If id-number does not exist, insert the record. I have done this part.
However, I'm currently stuck in the mindset that I have to set temporary variables, such as @IDNumber and @ForeignID, and then loop through them. Not only am I getting multiple results instead of the correct result, but everyone says WHILE shouldn't be used, especially for such a large volume of data.
How do I update these tables using a set-based approach?
Assuming you already have this XML extracted, you could do something similar to:
UPDATE ent
SET ent.col1 = tmp1.col1,
    ent.col2 = tmp1.col2
FROM dbo.[Entry] ent
INNER JOIN #TempEntry tmp1
    ON tmp1.UniqueID = ent.UniqueID;

UPDATE mi
SET mi.col1 = tmp2.col1,
    mi.col2 = tmp2.col2,
    mi.col3 = tmp2.col3
FROM dbo.[MoreInformation] mi
INNER JOIN dbo.[Entry] ent -- mapping of Foreign_ID -> UniqueID
    ON ent.PrimaryKey_ID = mi.Foreign_ID
INNER JOIN #TempMoreInfo tmp2
    ON tmp2.UniqueID = ent.UniqueID
    AND tmp2.SomeOtherField = mi.SomeOtherField; -- need 1 more field
UPDATE emi
SET emi.col1 = tmp3.col1,
    emi.col2 = tmp3.col2,
    emi.col3 = tmp3.col3,
    emi.col4 = tmp3.col4
FROM dbo.[EvenMoreInformation] emi
INNER JOIN dbo.[Entry] ent -- mapping of Foreign_ID -> UniqueID
    ON ent.PrimaryKey_ID = emi.Foreign_ID
INNER JOIN #TempEvenMoreInfo tmp3
    ON tmp3.UniqueID = ent.UniqueID
    AND tmp3.SomeOtherField = emi.SomeOtherField; -- need 1 more field
Now, I should point out that if the goal is truly to
update every single column in each non-temporary table
then there is a conceptual issue for any sub-tables that have multiple records. If there is no record in that table that will remain the same outside of the Foreign_ID field (and I guess the PK of that table?), then how do you know which row is which for the update? Sure, you can find the correct Foreign_ID based on the UniqueID mapping already in the non-temporary Entry table, but there needs to be at least one field that is not an IDENTITY (or UNIQUEIDENTIFIER populated via NEWID or NEWSEQUENTIALID) that will be used to find the exact row.
If it is not possible to find a stable, matching field, then you have no choice but to do a wipe-and-replace method instead.
P.S. I used to recommend the MERGE command but have since stopped due to learning of all of the bugs and issues with it. The "nicer" syntax is just not worth the potential problems. For more info, please see Use Caution with SQL Server's MERGE Statement.
You can use MERGE, which does an upsert (update and insert) in a single statement.
First, merge the entries into the main table.
For the other tables, you can join with the main table to get the foreign-ID mapping:
MERGE Table2 AS Dest
USING (
    SELECT t2.*, m.PrimaryKey_ID AS Foreign_ID
    FROM #TempTable2 t2
    JOIN MainTable m
        ON t2.[id-number] = m.[id-number]
) AS Source
    ON Dest.Foreign_ID = Source.Foreign_ID
WHEN MATCHED THEN
    UPDATE SET Dest.col1 = Source.col1
WHEN NOT MATCHED THEN
    INSERT (Foreign_ID, col1, col2, ...)
    VALUES (Source.Foreign_ID, Source.col1, Source.col2, ...);
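SQLite has no MERGE, but the same single-statement upsert can be sketched with INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24+). The snippet below uses Python's sqlite3 module for illustration; the table t2 and its columns are hypothetical stand-ins for Table2 above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2(foreign_id INTEGER PRIMARY KEY, col1 TEXT);
    INSERT INTO t2 VALUES (1, 'old');
""")

# One statement per source row: update on key collision, insert otherwise.
conn.executemany("""
    INSERT INTO t2(foreign_id, col1) VALUES (?, ?)
    ON CONFLICT(foreign_id) DO UPDATE SET col1 = excluded.col1
""", [(1, 'updated'), (2, 'inserted')])

rows = conn.execute("SELECT * FROM t2 ORDER BY foreign_id").fetchall()
print(rows)
```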

How to perform statistical computations in a query?

I have a table which is filled with float values. I need to calculate the number of results grouped by their distribution around the mean value (Gaussian Distribution). Basically, it is calculated like this:
SELECT COUNT(*), FloatColumn - AVG(FloatColumn) - STDEV(FloatColumn)
FROM Data
GROUP BY FloatColumn - AVG(FloatColumn) - STDEV(FloatColumn)
But for obvious reasons, SQL Server gives this error: Cannot use an aggregate or a subquery in an expression used for the group by list of a GROUP BY clause.
My question is, can I somehow leave this computation to SQL Server? Or do I have to do it the old fashioned way? Retrieve all the data, and do the calculation myself?
To get the aggregate of the whole set, you can use an empty OVER clause:
WITH T(Result) AS (
    SELECT FloatColumn - Avg(FloatColumn) OVER () - Stdev(FloatColumn) OVER ()
    FROM Data
)
SELECT Count(*),
       Result
FROM T
GROUP BY Result
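The empty OVER () trick can be sketched with Python's sqlite3 module (SQLite 3.25+). SQLite has no built-in STDEV, so only the mean is subtracted here; the shape of the query, with the whole-set aggregate appearing inside a per-row expression, is the point. Table name and values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data(floatcolumn REAL);
    INSERT INTO data VALUES (1.0), (2.0), (3.0), (3.0);
""")

# AVG(...) OVER () computes the mean of the whole set for every row,
# so each row can be expressed as its deviation from that mean.
rows = conn.execute("""
    WITH t(result) AS (
        SELECT floatcolumn - AVG(floatcolumn) OVER ()
        FROM data
    )
    SELECT COUNT(*), result FROM t GROUP BY result ORDER BY result
""").fetchall()
print(rows)
```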
You can perform a pre-aggregation of the data, and join back to the table.
Schema Setup:
create table data(floatcolumn float);
insert data values
(1234.56),
(134.56),
(134.56),
(234.56),
(1349),
(900);
Query 1:
SELECT COUNT(*) C, D.FloatColumn - A
FROM
(
SELECT AVG(FloatColumn) + STDEV(FloatColumn) A
FROM Data
) preagg
CROSS JOIN Data D
GROUP BY FloatColumn - A;
Results:
+---+--------------------+
| C | COLUMN_1           |
+---+--------------------+
| 2 | -1196.876067819572 |
| 1 | -1096.876067819572 |
| 1 | -431.436067819572  |
| 1 | -96.876067819572   |
| 1 | 17.563932180428    |
+---+--------------------+
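The pre-aggregate-and-join approach can be verified the same way with Python's sqlite3 module (again omitting STDEV, which SQLite lacks; the cross join against a one-row pre-aggregate is the pattern being shown, and the table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data(floatcolumn REAL);
    INSERT INTO data VALUES (1.0), (2.0), (3.0), (3.0);
""")

# The subquery computes the aggregate once; the cross join attaches that
# single row to every data row, so the GROUP BY expression can use it.
rows = conn.execute("""
    SELECT COUNT(*) AS c, d.floatcolumn - preagg.a AS result
    FROM (SELECT AVG(floatcolumn) AS a FROM data) AS preagg
    CROSS JOIN data AS d
    GROUP BY result
    ORDER BY result
""").fetchall()
print(rows)
```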
