Let's say I have two tables, A and B, each with unique ID columns A_id and B_id respectively. Then let's say I wake up one day and decide that the two tables have a relationship. So, I create a table AB that contains A_id, B_id pairs. Then I go to write a SQL Server script that inserts these pairs based on other data in the tables, say A_name and B_name. I'd expect the actual insertion to work something like this (though with more advanced WHERE clauses typed in by the user through a PowerShell script or something):
INSERT INTO AB (A_id, B_id)
VALUES
((SELECT (A_id) FROM A WHERE A_name = 'bob'),
(SELECT (B_id) FROM B WHERE B_name = 'john'))
I'm not sure of the correct syntax for such an operation. Can anyone point me in the right direction?
Instead of selecting from two sub-selects, you should select from a join of the two tables, using whatever logic you are going to use:
INSERT INTO AB (A_id, B_id)
SELECT a.A_id, b.B_id
FROM A
INNER JOIN B
ON A.SomeColumn=B.SomeColumn
Or, to replicate your example more precisely, it would look like this:
INSERT INTO AB (A_id, B_id)
SELECT a.A_id, b.B_id
FROM A
INNER JOIN B
ON A.A_name='bob'
AND B.B_name='john'
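The INSERT ... SELECT pattern above can be tried end to end with a small script. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample rows and the helper name link_by_name are assumptions made for illustration. It binds the names as parameters instead of pasting user input into the SQL text, which matters if the WHERE values are typed in by a user.

```python
import sqlite3

def link_by_name(conn, a_name, b_name):
    """Insert one AB row for every (A, B) pair matching the given names."""
    cur = conn.execute(
        """
        INSERT INTO AB (A_id, B_id)
        SELECT A.A_id, B.B_id
        FROM A
        INNER JOIN B
            ON A.A_name = ? AND B.B_name = ?
        """,
        (a_name, b_name),
    )
    return cur.rowcount  # number of rows inserted

# Hypothetical sample data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (A_id INTEGER PRIMARY KEY, A_name TEXT);
    CREATE TABLE B (B_id INTEGER PRIMARY KEY, B_name TEXT);
    CREATE TABLE AB (A_id INTEGER, B_id INTEGER);
    INSERT INTO A VALUES (1, 'bob'), (2, 'alice');
    INSERT INTO B VALUES (10, 'john'), (20, 'mary');
    """
)
inserted = link_by_name(conn, "bob", "john")
pairs = conn.execute("SELECT A_id, B_id FROM AB").fetchall()
print(inserted, pairs)
```

One difference from the VALUES form in the question: when no row matches either name, the join produces no rows and nothing is inserted, whereas a scalar sub-select that finds nothing yields NULL.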
Hello SQL Server experts,
I want to take data from two different tables and insert them into a separate new (third) table. I have created the shell of the third table with the needed columns and datatypes. However, there is a common/linking column in the two tables, let's call this column identifier varchar(10).
To properly insert the data from the two tables into the third without repeat, would my code need to look something like this:
insert into Third_Table
select (identifier, column2, column3, column4)
from First_Table
select (identifer, column5, column6)
from Second_Table
full join First_Table.identifier = Second_Table.identifier;
Thanks for any counsel!
You seem to be looking to full join both tables and insert the results in the target table.
Consider:
insert into Third_Table
select
coalesce(t1.identifier, t2.identifier),
case when t1.identifier is not null then t1.column2 else t2.column5 end,
case when t1.identifier is not null then t1.column3 else t2.column6 end,
case when t1.identifier is not null then t1.column4 else t2.column7 end
from First_Table t1
full outer join Second_Table t2 on t1.identifier = t2.identifier;
NB: your original query seemingly had a missing column in Second_Table: it listed only three columns for insert instead of the four on First_Table, so I added one (column7).
If it's 1 to 1, I think you are after something like:
INSERT INTO Third_Table
SELECT First_Table.identifier, First_Table.column2, First_Table.column3, First_Table.column4, Second_Table.identifier, Second_Table.column5, Second_Table.column6
FROM First_Table
INNER JOIN Second_Table ON First_Table.identifier = Second_Table.identifier
If not, just make sure you use the correct join type.
I have the same table in two different databases. It has the same columns, primary keys, etc. but data in this table may differ from one database to another. So I am trying to get the differences. For example:
Database A    Database B
Table_A       Table_B
Table_A and Table_B have Id1 and Id2 fields as primary key.
Table_A and Table_B are structurally identical but may contain different data. I would like to obtain the differences, that is, the rows that are in Table_A but not in Table_B, and insert them into Table_B. If they cannot be inserted automatically, I would like to generate a list of INSERT statements instead.
To obtain the data that is in Table_A and not in Table_B and vice versa, I do the following:
SELECT a.*, b.*
FROM Table_A a
FULL JOIN Table_B b ON (a.Id1=b.Id1 and a.Id2=b.Id2)
WHERE a.Id1 IS NULL OR b.Id1 IS NULL or a.Id2 IS NULL OR b.Id2 IS NULL
Then I use Excel to generate the INSERT statements to run against Table_B.
Is that correct, or is there a better way to do this?
For your scenario I would go with the MERGE statement:
MERGE INTO Table_B AS Trg
USING (SELECT ID1, ID2, YourDataColumn FROM Table_A) AS Src
ON Trg.ID1 = Src.ID1 AND Trg.ID2 = Src.ID2
WHEN NOT MATCHED BY TARGET THEN
INSERT (ID1, ID2, YourDataColumn )
VALUES (Src.ID1, Src.ID2, Src.YourDataColumn );
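On an engine without MERGE, the same "insert only the missing rows" effect can be had with INSERT ... SELECT ... WHERE NOT EXISTS. The sketch below shows that substitute technique, not MERGE itself, using Python's built-in sqlite3 module. The table contents and the helper name copy_missing are illustrative assumptions, YourDataColumn is the placeholder name from the statement above, and both tables sit in one in-memory database here rather than in two separate databases.

```python
import sqlite3

def copy_missing(conn):
    """Copy rows from Table_A whose (Id1, Id2) key is absent from Table_B."""
    cur = conn.execute(
        """
        INSERT INTO Table_B (Id1, Id2, YourDataColumn)
        SELECT a.Id1, a.Id2, a.YourDataColumn
        FROM Table_A AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM Table_B AS b
            WHERE b.Id1 = a.Id1 AND b.Id2 = a.Id2
        )
        """
    )
    return cur.rowcount  # number of rows inserted

# Hypothetical sample data: two keys exist only in Table_A.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table_A (Id1 INTEGER, Id2 INTEGER, YourDataColumn TEXT,
                          PRIMARY KEY (Id1, Id2));
    CREATE TABLE Table_B (Id1 INTEGER, Id2 INTEGER, YourDataColumn TEXT,
                          PRIMARY KEY (Id1, Id2));
    INSERT INTO Table_A VALUES (1, 1, 'x'), (1, 2, 'y'), (2, 1, 'z');
    INSERT INTO Table_B VALUES (1, 1, 'x'), (3, 3, 'only in B');
    """
)
added = copy_missing(conn)
rows_b = conn.execute(
    "SELECT Id1, Id2, YourDataColumn FROM Table_B ORDER BY Id1, Id2"
).fetchall()
print(added, rows_b)
```

Running the helper a second time inserts nothing, because every key from Table_A is then present in Table_B.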
Would be great to know what database you're on.
Oracle SQL Developer has tooling for this. http://www.thatjeffsmith.com/archive/2012/09/sql-developer-database-diff-compare-objects-from-multiple-schemas/
Stupidly simple question, but I just don't know what to google!
If I create a query like this:
Select id, data
from table1
Now I want to join with table2. I can immediately see that the id column is no longer unique and I have to change it to
table1.id
Is there any smart way (like a keyboard shortcut) to do this, instead of manually adding table1 to every column? Either before I add the join, to ensure that all column references will be unambiguous, or afterwards, with suggestions based on the different possible tables.
No, there is no helper.
But note that you can alias the table name:
select x.Col1, y.Col2
from ALongTableName x
inner join AReallyReallyLongTableName y on x.Id = y.OtherId
which can also make queries clearer, and is very much necessary when doing self joins.
First of all, you should start using aliases:
SQL aliases are used to give a database table, or a column in a table,
a temporary name.
Basically aliases are created to make column names more readable.
This will narrow down your problem and make your code maintenance easier. If that's not enough, I guess you could start using auto-completion tools, such as these:
SQL Complete
SQL Prompt
ApexSQL Complete
These have your desired functionality, however, they do not always work as expected (at least for me).
Oh! You can use a table alias, like this:
SELECT A.ID, A.data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
You only need the A. or B. prefix when both tables have a column with the selected name. If the column exists in only one of the tables, you don't need the prefix, like this:
SELECT A.ID, data -- works if TableB has no column named data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
Or:
Select A.*, B.ID
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
Ok, basically what is needed is a way to have row numbers while using a lot of joins and having where clauses using these rownumbers.
such as something like
select ADDRESS.ADDRESS FROM ADDRESS
INNER JOIN WORKHISTORY ON WORKHISTORY.ADDRESSRID=ADDRESS.ADDRESSRID
INNER JOIN PERSON ON PERSON.PERSONRID=WORKHISTORY.PERSONRID
WHERE PERSONRID=<some number> AND WORKHISTORY.ROWNUMBER=1
ROWNUMBER needs to be generated for this query on that one table, though, so that if we want to access the second WORKHISTORY record's address, we could just use WORKHISTORY.ROWNUMBER=2. And if, say, we had two addresses that matched, we could cycle through the addresses for one WORKHISTORY record using ADDRESS.ROWNUMBER=1 and ADDRESS.ROWNUMBER=2.
This should be capable of being an automatically generated query. Thus, there could be more than 10 inner joins in order to get to the relevant table, and we need to be able to cycle through each table's record independently of the rest of the tables..
I'm aware there are the RANK and ROW_NUMBER functions, but I'm not seeing how they will work for me because of all the inner joins.
note: in this example query, ROWNUMBER should be automatically generated! It should never be stored in the actual table
Can you use a temp table?
I ask because you can write the code like this:
select a.field1, b.field2, c.field3, identity (int, 1,1) as TableRownumber into #temp
from table1 a
join table2 b on a.table1id = b.table1id
join table3 c on b.table2id = c.table2id
select * from #temp where ...
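An alternative to the temp table is to number the rows inside the query with the ROW_NUMBER() window function and filter on that number in an outer query. This is a different technique from the identity column above, shown here as a rough sketch with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The sample data, the WORKHISTORYRID column used for ordering, and the helper name nth_address are assumptions, since the question does not say how the records should be ordered.

```python
import sqlite3

def nth_address(conn, person_rid, n):
    """Return the address of a person's n-th work history record, or None."""
    row = conn.execute(
        """
        SELECT numbered.ADDRESS
        FROM (
            SELECT a.ADDRESS AS ADDRESS,
                   ROW_NUMBER() OVER (
                       PARTITION BY w.PERSONRID
                       ORDER BY w.WORKHISTORYRID  -- assumed ordering column
                   ) AS rn
            FROM WORKHISTORY AS w
            INNER JOIN ADDRESS AS a ON a.ADDRESSRID = w.ADDRESSRID
            WHERE w.PERSONRID = ?
        ) AS numbered
        WHERE numbered.rn = ?
        """,
        (person_rid, n),
    ).fetchone()
    return row[0] if row else None

# Hypothetical sample data: person 7 has two work history records.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ADDRESS (ADDRESSRID INTEGER PRIMARY KEY, ADDRESS TEXT);
    CREATE TABLE WORKHISTORY (WORKHISTORYRID INTEGER PRIMARY KEY,
                              PERSONRID INTEGER, ADDRESSRID INTEGER);
    INSERT INTO ADDRESS VALUES (1, '1 Main St'), (2, '2 Oak Ave'), (3, '3 Elm Rd');
    INSERT INTO WORKHISTORY VALUES (100, 7, 2), (101, 7, 1), (102, 8, 3);
    """
)
first = nth_address(conn, 7, 1)
second = nth_address(conn, 7, 2)
print(first, second)
```

The row number is computed at query time and never stored, and each joined table could get its own ROW_NUMBER() column with its own PARTITION BY if the tables need to be cycled through independently.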
I have a master table A, with ~9 million rows. Another table B (same structure) has ~28K rows from table A. What would be the best way to remove all contents of B from table A?
The combination of all columns (~10) is unique. There is nothing more in the form of a unique key.
If you have sufficient rights you can create a new table and rename that one to A. To create the new table you can use the following script:
CREATE TABLE TEMP_A AS
SELECT *
FROM A
MINUS
SELECT *
FROM B
This should perform pretty well.
DELETE FROM TableA WHERE ID IN(SELECT ID FROM TableB)
Should work. Might take a while though.
One way is to just list out all the columns:
delete from TableA
where exists (select 1 from TableB b where b.Col1 = TableA.Col1
AND b.Col2 = TableA.Col2
AND b.Col3 = TableA.Col3
AND b.Col4 = TableA.Col4)
Delete t2
from t1
inner join t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
and t1.col4 = t2.col4
and t1.col5 = t2.col5
and t1.col6 = t2.col6
and t1.col7 = t2.col7
and t1.col8 = t2.col8
and t1.col9 = t2.col9
and t1.col10 = t2.col10
This is likely to be very slow, as you would need every column indexed, which is highly unlikely in an environment where a table this size has no primary key, so do it during off-peak hours. What possessed you to have a table with 9 million records and no primary key?
If this is something you'll have to do on a regular basis, the first choice should be to try to improve the database design (looking for primary keys, trying to get the "join" condition to be on as few columns as possible).
If that is not possible, the second option is to figure out the "selectivity" of each of the columns, i.e. how many different values each column has. For example, 'name' would be more selective than 'address country', which in turn would be more selective than 'male/female'.
The general type of statement I'd suggest would be like this:
Delete from tableA
where exists (select * from tableB
where tableA.colx1 = tableB.colx1
and tableA.colx2 = tableB.colx2
-- ... and so on for colx3 through colx9 ...
and tableA.colx10 = tableB.colx10);
The idea is to list the columns in order of selectivity and build an index on colx1, colx2, etc. on tableB. The number of columns to include in that index would be a matter of trial and measurement. (Weigh the time for building the index on tableB against the improved time of the delete statement.)
If this is just a one time operation, I'd just pick one of the slow methods outlined above. It's probably not worth the effort to think too much about this when you can just start a statement before going home ...
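The delete-with-EXISTS approach, including the index on tableB, can be sketched as follows. This uses Python's built-in sqlite3 module and three columns instead of ten to keep it short; the sample data, the index name idx_b_lookup, and the choice of which columns go into the index are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (colx1 TEXT, colx2 TEXT, colx3 TEXT);
    CREATE TABLE tableB (colx1 TEXT, colx2 TEXT, colx3 TEXT);
    INSERT INTO tableA VALUES
        ('ann', 'US', 'F'), ('bob', 'US', 'M'), ('cat', 'UK', 'F');
    INSERT INTO tableB VALUES ('bob', 'US', 'M'), ('cat', 'UK', 'F');
    -- Index on the smaller table, most selective column first (assumed order).
    CREATE INDEX idx_b_lookup ON tableB (colx1, colx2);
    """
)

# Delete every tableA row that has an exact match on all columns in tableB.
cur = conn.execute(
    """
    DELETE FROM tableA
    WHERE EXISTS (
        SELECT 1 FROM tableB
        WHERE tableB.colx1 = tableA.colx1
          AND tableB.colx2 = tableA.colx2
          AND tableB.colx3 = tableA.colx3
    )
    """
)
conn.commit()
deleted = cur.rowcount
remaining = conn.execute("SELECT colx1, colx2, colx3 FROM tableA").fetchall()
print(deleted, remaining)
```

Note that a plain = comparison never matches NULL against NULL, so rows with NULL in any compared column would be left in tableA by this statement.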
Is there a key value (or values) that can be used?
something like
DELETE a
FROM tableA a
INNER JOIN tableB b
on b.id = a.id