I need to compare three tables from three different databases in SQL Server. Is this even possible?
I have 3 different databases: prod, test1, test2. Each database has a table of definitions called DEFINITIONS, and its values differ depending on the database. My job is to compare all 3 of these tables and point out the differences.
I was thinking about using the EXCEPT or INTERSECT operators to show the differences or similarities between these 3 tables, but I cannot find any information on how to combine tables from these 3 databases in one query.
Thanks for any tips!
You can do it by using EXCEPT / INTERSECT.
Main idea:
-- Parentheses are needed: EXCEPT and UNION have the same precedence and are evaluated left to right
-- Rows that exist in db1 but not in db2
(select * from db1.dbo.table1
except
select * from db2.dbo.table2)
union
-- Rows that exist in db2 but not in db1
(select * from db2.dbo.table2
except
select * from db1.dbo.table1)
-- Etc...
To get the similarities, change EXCEPT to INTERSECT.
The problem with this solution is that a difference in a single column produces two rows: one reported as missing from db1 and one reported as missing from db2.
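To see both the precedence issue and the "two rows for one difference" problem concretely, here is a minimal sketch in Python. It assumes SQLite (through the standard sqlite3 module) as a stand-in for SQL Server: ATTACH DATABASE plays the role of the separate databases, and the definitions table and its rows are made up for illustration. SQLite does not accept parenthesized compound SELECTs, so each EXCEPT is wrapped in a subquery instead.

```python
import sqlite3


def make_demo():
    """Two attached in-memory databases, each with a hypothetical definitions table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit, so ATTACH never runs inside a transaction
    data = {"db1": [(1, "a"), (2, "b"), (3, "c")],
            "db2": [(1, "a"), (2, "B"), (4, "d")]}
    for db, rows in data.items():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.definitions (id INTEGER PRIMARY KEY, value TEXT)")
        conn.executemany(f"INSERT INTO {db}.definitions VALUES (?, ?)", rows)
    return conn


def symmetric_difference(conn):
    # Each EXCEPT sits in its own subquery so it is evaluated before the
    # UNION ALL; a bare "A EXCEPT B UNION B EXCEPT A" is read left to right.
    sql = """
        SELECT 'only in db1' AS side, d.* FROM (
            SELECT * FROM db1.definitions
            EXCEPT
            SELECT * FROM db2.definitions) AS d
        UNION ALL
        SELECT 'only in db2', d.* FROM (
            SELECT * FROM db2.definitions
            EXCEPT
            SELECT * FROM db1.definitions) AS d
        ORDER BY 1, 2
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in symmetric_difference(make_demo()):
        print(row)
```

Row id 2 differs only in its value, yet it shows up once per side, which is the duplication the key-based FULL OUTER JOIN avoids.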
This can be solved by using a FULL OUTER JOIN on the primary keys of both tables and displaying the row values side by side.
Something like:
select CASE WHEN t.ID IS NULL THEN 'Missing in 1' WHEN t2.ID IS NULL THEN 'Missing in 2' ELSE 'Exists in both' END AS status
, t.*, t2.*
from db1.dbo.table1 t
FULL OUTER JOIN db2.dbo.table2 t2
ON t2.ID = t.ID
Then you just need to format the data for your needs.
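A sketch of the key-based comparison under the same assumptions (SQLite through Python's sqlite3, a hypothetical definitions table keyed on id). SQLite only gained FULL OUTER JOIN in version 3.39, so it is emulated here with a LEFT JOIN plus the unmatched rows of the second table. The "Different" status is an addition that flags keys present on both sides with different values.

```python
import sqlite3


def make_demo():
    """Two attached in-memory databases, each with a hypothetical definitions table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    data = {"db1": [(1, "a"), (2, "b"), (3, "c")],
            "db2": [(1, "a"), (2, "B"), (4, "d")]}
    for db, rows in data.items():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.definitions (id INTEGER PRIMARY KEY, value TEXT)")
        conn.executemany(f"INSERT INTO {db}.definitions VALUES (?, ?)", rows)
    return conn


def compare_by_key(conn):
    # Emulated FULL OUTER JOIN: every db1 row with its db2 match (or NULL),
    # plus the db2 rows whose id does not exist in db1.
    # "IS NOT" is SQLite's NULL-safe inequality.
    sql = """
        SELECT CASE WHEN t2.id IS NULL THEN 'Missing in 2'
                    WHEN t1.value IS NOT t2.value THEN 'Different'
                    ELSE 'Same' END AS status,
               t1.id AS id, t1.value AS value_1, t2.value AS value_2
        FROM db1.definitions AS t1
        LEFT JOIN db2.definitions AS t2 ON t2.id = t1.id
        UNION ALL
        SELECT 'Missing in 1', t2.id, NULL, t2.value
        FROM db2.definitions AS t2
        WHERE NOT EXISTS (SELECT 1 FROM db1.definitions AS t1 WHERE t1.id = t2.id)
        ORDER BY 2
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in compare_by_key(make_demo()):
        print(row)
```

Each key now appears exactly once, with both versions of the row side by side.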
A couple of caveats with these approaches:
All tables must have the same number of columns, with compatible types, for SELECT * ... EXCEPT to work. Otherwise you need to choose which columns to match.
Collations of varchar columns should match between the two database tables; otherwise EXCEPT / INTERSECT fails with a collation conflict error. You can solve this by "re-collating" the columns: SELECT ..., somevarcharcolumn COLLATE DATABASE_DEFAULT
There are also tools for this in Visual Studio (schema compare and data compare) and probably in other clients.
Excel has some nice functions for this too: if you load the data with matching rows from each table, you can color the differing fields by using VLOOKUP, etc.
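Since the question involves three databases, one way to extend this is to tag each row with its source database and group identical rows: any row seen in fewer than all three databases is a difference, and the flags show where it exists. A minimal sketch, again assuming SQLite through Python's sqlite3 with hypothetical prod/test1/test2 databases attached, and rows unique within each table (guaranteed here by the id primary key):

```python
import sqlite3

DBS = ("prod", "test1", "test2")


def make_demo():
    """Three attached in-memory databases with slightly different definitions."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    data = {"prod": [(1, "a"), (2, "b"), (3, "c")],
            "test1": [(1, "a"), (2, "B"), (3, "c")],
            "test2": [(1, "a"), (2, "b")]}
    for db, rows in data.items():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.definitions (id INTEGER PRIMARY KEY, value TEXT)")
        conn.executemany(f"INSERT INTO {db}.definitions VALUES (?, ?)", rows)
    return conn


def rows_not_in_all(conn, dbs=DBS):
    # Tag every row with its source database, then group identical rows.
    tagged = " UNION ALL ".join(
        f"SELECT '{db}' AS src, id, value FROM {db}.definitions" for db in dbs)
    # MAX(src = 'x') is 1 if any copy of the row came from database x, else 0.
    flags = ", ".join(f"MAX(src = '{db}') AS in_{db}" for db in dbs)
    sql = (f"SELECT id, value, {flags} FROM ({tagged}) "
           f"GROUP BY id, value HAVING COUNT(*) < {len(dbs)} ORDER BY id, value")
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in rows_not_in_all(make_demo()):
        print(row)
```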
Say I have Table_1 in Database_1 with 25 columns, and Table_2 in Database_2 with 19 columns.
I want to compare the columns in Table_1 and Table_2 and output the columns that exist in Table_1 but not in Table_2.
I tried
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='Table_1'
EXCEPT
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='Table_2'
The problem is: if I am in Database_1, it only finds the columns of Table_1 and returns an empty list for Table_2; if I am in Database_2, it only finds the columns of Table_2 and returns an empty list for Table_1. If I am in master, it returns an empty list for both Table_1 and Table_2. How do I properly locate each table and its columns from one database?
You can access any database object from any database context by fully qualifying the object name in the form of database.schema.object.
With SQL Server you are better off using the sys catalog views, which (if performance matters) are better than the INFORMATION_SCHEMA views.
So you can do
select name
from database_1.sys.columns
where object_id=object_id(N'database_1.dbo.table_1')
except
select name
from database_2.sys.columns
where object_id=object_id(N'database_2.dbo.table_2')
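The same idea can be sketched in SQLite through Python's sqlite3, used here as a stand-in assumption for SQL Server: the pragma_table_info table-valued function accepts the attached database name as its last argument, so both column lists can be read from one connection no matter which database is "current". Table and column names are hypothetical.

```python
import sqlite3


def make_demo():
    """Two attached databases; table_1 has columns a, b, c that table_2 partly lacks."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    conn.execute("ATTACH DATABASE ':memory:' AS database_1")
    conn.execute("ATTACH DATABASE ':memory:' AS database_2")
    conn.execute("CREATE TABLE database_1.table_1 (id INTEGER, a TEXT, b TEXT, c TEXT)")
    conn.execute("CREATE TABLE database_2.table_2 (id INTEGER, b TEXT)")
    return conn


def columns_only_in_table_1(conn):
    # pragma_table_info(table, schema) plays the role of <db>.sys.columns:
    # the schema argument selects which attached database to read from.
    sql = """
        SELECT name FROM pragma_table_info('table_1', 'database_1')
        EXCEPT
        SELECT name FROM pragma_table_info('table_2', 'database_2')
        ORDER BY name
    """
    return [name for (name,) in conn.execute(sql)]


if __name__ == "__main__":
    print(columns_only_in_table_1(make_demo()))
```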
I have a SQL Server 2014 Express with multiple databases. One of them has general tables with information common to the remaining databases (let's call this database UniversalData).
The other databases have information that is pertinent to a specific site (let's call one of these databases Site01Data). The universal data may change and I don't want to replicate it regularly to the other site-specific databases, so I want to include the UniversalData table in some queries, some of which involve CTEs.
What I am trying to accomplish:
WITH CTE1 AS
(
SELECT *
FROM UniversalData.dbo.someTable
),
CTE2 AS
(
SELECT *
FROM Site01Data.dbo.anotherTable
),
CTE3 AS
(
SELECT CTE1.field1, CTE2.field2
FROM CTE1
JOIN CTE2 ON CTE1.idx = CTE2.idx
)
SELECT *
FROM CTE3;
This doesn't generate an error, but I seem to get no data from CTE1 in my final query (empty result set). Intuitively, does this mean it is saving a temp table in the UniversalData database that is not accessible from the Site01Data database?
How can I use a CTE with tables from different databases on the same server?
There are lots of ways to do this.
You could copy the table from one database into a temp table (temp tables live in tempdb, so they can be joined from any database on the instance) and then join to it, or just join both tables on the fly.
But first: refrain from using SELECT *; specify the columns.
For example:
select t1.column1,t2.column2
from UniversalData.dbo.someTable t1
inner join Site01Data.dbo.anotherTable t2
on t2.idx = t1.idx
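and so on. It depends on how you want to specify the join and which type of join you need.
Note that a CTE is not saved as a temp table in either database; it is just a named subquery that exists only for that statement, so referencing tables from different databases inside CTEs works the same way as the plain join above. An empty result more likely means the join condition matched no rows.
This assumes that both databases are on the same instance; otherwise you will need linked servers.
In that case, use four-part names such as servername.Site01Data.dbo.anotherTable, with a linked server set up for each remote server.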
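To illustrate that CTEs can read from different databases within one statement, here is a sketch using SQLite through Python's sqlite3 as a stand-in for two databases on the same SQL Server instance. The table contents are made up; only the names mirror the question.

```python
import sqlite3


def make_demo():
    """Two attached databases mirroring UniversalData and Site01Data."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    conn.execute("ATTACH DATABASE ':memory:' AS universaldata")
    conn.execute("ATTACH DATABASE ':memory:' AS site01data")
    conn.execute("CREATE TABLE universaldata.sometable (idx INTEGER, field1 TEXT)")
    conn.execute("CREATE TABLE site01data.anothertable (idx INTEGER, field2 TEXT)")
    conn.executemany("INSERT INTO universaldata.sometable VALUES (?, ?)",
                     [(1, "u1"), (2, "u2"), (3, "u3")])
    conn.executemany("INSERT INTO site01data.anothertable VALUES (?, ?)",
                     [(2, "s2"), (3, "s3"), (5, "s5")])
    return conn


def cte_join(conn):
    # The CTEs are named subqueries scoped to this one statement, so each
    # one can read from a different attached database.
    sql = """
        WITH cte1 AS (SELECT idx, field1 FROM universaldata.sometable),
             cte2 AS (SELECT idx, field2 FROM site01data.anothertable),
             cte3 AS (SELECT cte1.field1, cte2.field2
                      FROM cte1 JOIN cte2 ON cte1.idx = cte2.idx)
        SELECT field1, field2 FROM cte3 ORDER BY field1
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(cte_join(make_demo()))
```

Only idx values present in both tables (2 and 3) survive the inner join; with no overlapping keys the same query would return an empty result without any error.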
I am joining two tables that are located in two separate Oracle databases.
I am currently doing this in SAS by creating a libname connection to each database and then simply using something like the following:
libname dbase_a oracle user= etc... ;
libname dbase_b oracle user= etc... ;
proc sql;
create table t1 as
select a.*, b.*
from dbase_a.table1 a inner join dbase_b.table2 b
on a.id = b.id;
quit;
However, the query is painfully slow. Can you suggest any better options to speed up such a query (short of going down the path of creating a database link)?
Many thanks for looking at this.
If those two databases are on the same server and you are able to execute cross-database queries in Oracle, you could try using SQL pass-through:
proc sql;
connect to oracle (user= password= <...>);
create table t1 as
select * from connection to oracle (
select a.*, b.*
from dbase_a.schema_a.table1 a
inner join dbase_b.schema_b.table2 b
on a.id = b.id
);
disconnect from oracle;
quit;
I think that, in most cases, SAS attempts as much as possible to have the query executed on the database server, even if pass-through was not explicitly specified. However, when the query references tables that are on different servers, or in different databases on a system that does not allow cross-database queries, or when it contains SAS-specific functions that SAS cannot translate into something valid on the DBMS, SAS will indeed resort to 'downloading' the complete tables and processing the query locally, which can obviously be painfully inefficient.
The select is for all columns from each table, and the inner join is on the id values only. Because the join criteria are evaluated on data coming from disparate sources, carrying all columns along could be a big factor in the timing: even non-matching rows must be downloaded (by the libname engine, within the SQL execution context) during the ON evaluation.
One approach would be to:
Select only the id from each table
Find the intersection
Upload the intersection to each server (as a scratch table)
Utilize the intersection on each server as pass through selection criteria within the final join in SAS
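The four steps above can be sketched in Python, assuming two independent SQLite connections stand in for the two Oracle servers and plain Python stands in for SAS. The scratch table is a TEMP table here; the table names and the payload column are hypothetical.

```python
import sqlite3


def make_source(table, rows):
    """One in-memory connection standing in for one remote server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


def join_via_intersection(src1, table1, src2, table2):
    # 1. Select only the id from each table (a small transfer).
    ids1 = {i for (i,) in src1.execute(f"SELECT id FROM {table1}")}
    ids2 = {i for (i,) in src2.execute(f"SELECT id FROM {table2}")}
    # 2. Find the intersection locally.
    common = sorted(ids1 & ids2)
    fetched = []
    for conn, table in ((src1, table1), (src2, table2)):
        # 3. Upload the intersection to each server as a scratch table.
        conn.execute("CREATE TEMP TABLE scratch_ids (id INTEGER PRIMARY KEY)")
        conn.executemany("INSERT INTO scratch_ids VALUES (?)", [(i,) for i in common])
        # 4. Each server filters its own rows before anything is transferred.
        fetched.append(dict(conn.execute(
            f"SELECT id, payload FROM {table} "
            "WHERE id IN (SELECT id FROM scratch_ids)")))
        conn.execute("DROP TABLE scratch_ids")
    left, right = fetched
    # Final inner join, done locally on the already-filtered rows.
    return [(i, left[i], right[i]) for i in common]


if __name__ == "__main__":
    a = make_source("table1", [(1, "a1"), (2, "a2"), (3, "a3")])
    b = make_source("table2", [(2, "b2"), (3, "b3"), (4, "b4")])
    print(join_via_intersection(a, "table1", b, "table2"))
```

Only the id columns cross the wire in full; the wide rows are transferred only for ids known to match.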
There are a couple of variations depending on the expected number of id matches, the number of distinct ids in each table, or whether table-1 and table-2 are known to be SMALL and BIG. For a large number of id matches that need to be transferred back to a server, you will probably want to use some form of bulk copy. For a relatively small number of ids in the intersection, you might get away with enumerating them directly in a SQL statement using the construct IN (). The size of a SQL statement could be limited by the database, the SAS/ACCESS to ORACLE engine, or the SAS macro system (Oracle, for example, allows at most 1000 expressions in an IN list).
Consider a data scenario in which it has been determined that the potential number of matching ids would be too large for an IN (id-1, ..., id-n) construct. In such a case, the list of matching ids is handled as a table:
libname SOURCE1 ORACLE ....;
libname SOURCE2 ORACLE ....;
libname SCRATCH1 ORACLE ... must specify a scratch schema ...;
libname SCRATCH2 ORACLE ... must specify a scratch schema ...;
proc sql;
connect using SOURCE1 as PASS1;
connect using SOURCE2 as PASS2;
* compute intersection from only id data sent to SAS;
create table INTERSECTION as
(select id from connection to PASS1 (select id from table1))
intersect
(select id from connection to PASS2 (select id from table2))
;
* upload intersection to each server;
create table SCRATCH1.ids as select id from INTERSECTION;
create table SCRATCH2.ids as select id from INTERSECTION;
* compute inner join from only data that matches intersection;
create table INNERJOIN as select ONE.*, TWO.* from
(select * from connection to PASS1 (
select * from oracle-path-to-schema.table1
where id in (select id from oracle-path-to-scratch.ids)
)) as ONE
inner join
(select * from connection to PASS2 (
select * from oracle-path-to-schema.table2
where id in (select id from oracle-path-to-scratch.ids)
)) as TWO
on ONE.id = TWO.id;
...
If both table-1 and table-2 have very large numbers of ids that exceed the resource capacity of your SAS platform, you will also have to iterate the approach over ranges of ids. Techniques for determining the range criteria for each iteration are a tale for another day.
I'm trying to create a little SQL script (in SQL Server Management Studio) to get a list of all tables in two different databases. The goal is to find out which tables exist in both databases and which ones only exist in one of them.
I have found various scripts on SO to list all the tables of one database, but so far I haven't been able to get a list of tables from multiple databases.
So: is there a way to query SQL Server for all tables in a specific database, e.g. SELECT * FROM ... WHERE databaseName='first_db' so that I can join this with the result for another database?
SELECT * FROM database1.INFORMATION_SCHEMA.TABLES
UNION ALL
SELECT * FROM database2.INFORMATION_SCHEMA.TABLES
UPDATE
In order to compare the two lists, you can use FULL OUTER JOIN, which will show you the tables that are present in both databases as well as those that are only present in one of them:
SELECT *
FROM database1.INFORMATION_SCHEMA.TABLES db1
FULL JOIN database2.INFORMATION_SCHEMA.TABLES db2
ON db1.TABLE_NAME = db2.TABLE_NAME
ORDER BY COALESCE(db1.TABLE_NAME, db2.TABLE_NAME)
You can also add WHERE db1.TABLE_NAME IS NULL OR db2.TABLE_NAME IS NULL to see only the differences between the databases.
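Assuming SQLite (through Python's sqlite3) as a stand-in, the same comparison can be sketched against each attached database's sqlite_master catalog, which plays the role of INFORMATION_SCHEMA.TABLES. FULL JOIN is emulated with a LEFT JOIN plus the unmatched rows, since only newer SQLite versions support it; the table names are hypothetical.

```python
import sqlite3


def make_demo():
    """Two attached in-memory databases with overlapping (hypothetical) tables."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    for db, tables in (("db1", ("customers", "orders", "legacy")),
                       ("db2", ("customers", "orders", "audit"))):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        for table in tables:
            conn.execute(f"CREATE TABLE {db}.{table} (id INTEGER)")
    return conn


def compare_table_lists(conn, only_differences=False):
    # Emulated FULL JOIN: every db1 table with its db2 match (or NULL),
    # plus the db2 tables that have no counterpart in db1.
    sql = """
        SELECT in_db1, in_db2 FROM (
            SELECT a.name AS in_db1, b.name AS in_db2
            FROM db1.sqlite_master AS a
            LEFT JOIN db2.sqlite_master AS b
                   ON b.name = a.name AND b.type = 'table'
            WHERE a.type = 'table'
            UNION ALL
            SELECT NULL, b.name
            FROM db2.sqlite_master AS b
            WHERE b.type = 'table'
              AND b.name NOT IN (SELECT name FROM db1.sqlite_master
                                 WHERE type = 'table')
        ) AS lists
    """
    if only_differences:
        # Keep only tables that exist in just one of the databases.
        sql += " WHERE in_db1 IS NULL OR in_db2 IS NULL"
    sql += " ORDER BY COALESCE(in_db1, in_db2)"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_demo()
    print(compare_table_lists(conn))
    print(compare_table_lists(conn, only_differences=True))
```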
Alternatively, you can switch to each database in turn, store its table list in a temporary table, and join the results:
use db1
select name into #TableList from sys.tables
use db2
select name into #TableList2 from sys.tables
select * from #TableList tl1 full join #TableList2 tl2 on tl1.name = tl2.name
Just for completeness, this is the query I finally used (based on Andriy M's answer):
SELECT * FROM DB1.INFORMATION_SCHEMA.Tables db1
LEFT OUTER JOIN DB2.INFORMATION_SCHEMA.Tables db2
ON db1.TABLE_NAME = db2.TABLE_NAME
ORDER BY db1.TABLE_NAME
To find out which tables exist in db2, but not in db1, replace the LEFT OUTER JOIN with a RIGHT OUTER JOIN.
I need to encapsulate a set of table JOINs that we frequently use on a vendor's database server. We reuse the same JOIN logic in many places (extracts, etc.), and it seemed a VIEW would allow the JOINs to be defined and maintained in one place.
CREATE VIEW MasterView
AS
SELECT *
FROM entity_1 e1
INNER JOIN entity_2 e2 ON e2.parent_id = e1.id
INNER JOIN entity_3 e3 ON e3.parent_id = e2.id
/* other joins including business logic, etc. */
The trouble is that the vendor makes regular changes to the DB (column additions, name changes) and I want that to be reflected in the "MasterView" automatically.
SELECT * would allow this, but the underlying tables all have ID columns so I get the "Column names in each view must be unique" error.
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
Is there any way to achieve the dynamism of SELECT * but effectively exclude certain columns (i.e. the ID ones)?
Thanks
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
In this case, you can't avoid it. You must specify column names and for those columns with duplicate names use an alias. Code generation can help with these many columns.
SELECT * is bad practice regardless - if someone adds a 2GB binary column to one of these tables and populates it, do you really want it to be returned?
One simple method to generate the columns you want is
select column_name+',' from information_schema.columns
where table_name='tt'
and column_name not in('ID')
In addition to Oded's answer (which I agree with 100%):
If someone changes the underlying tables, you need view maintenance anyway (with sp_refreshview). The column changes will not appear in the view automatically. See "select * from table" vs "select colA, colB, etc. from table" interesting behaviour in SQL Server 2005
So your requirement that changes be "reflected in the MasterView automatically" can't be satisfied anyway.
If you want to ensure the view stays consistent, use WITH SCHEMABINDING, which prevents changes to the underlying tables that would affect the view (until the binding is removed or the view is dropped). Then make the column changes, then re-create the view. Note that a schema-bound view cannot use SELECT *.
I had the same issue, see example below:
ALTER VIEW Summary AS
SELECT * FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
and I got the error you mentioned. The easiest solution is to put the alias before the *, like this (note that this returns only the columns of Table1):
SELECT t1.* FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
You shouldn't see that error anymore.
I went with this in the end, building on Madhivanan's suggestion. It's similar to what t-clausen.dk later suggested (thanks for your efforts), though I find the XML PATH style more elegant than cursors / rank partitions.
The following recreates the MasterView definition when run. All columns in the underlying tables are prefixed with the table name, so two similarly named columns can be included in the view by default. This alone solves my original problem, but I also included the "WHERE column_name NOT IN" clause to explicitly exclude certain columns that will never be used in the MasterView.
create procedure Utility_RefreshMasterView
as
begin
declare @entity_columns varchar(max)
declare @drop_view_sql varchar(max)
declare @alter_view_definition_sql varchar(max)
/* create comma separated string of columns from underlying tables aliased to avoid name collisions */
select @entity_columns = stuff((
select ','+table_name+'.['+column_name+'] AS ['+table_name+'_'+column_name+']'
from information_schema.columns
where table_name IN ('entity_1', 'entity_2')
and column_name not in ('column to exclude 1', 'column to exclude 2')
for xml path('')), 1, 1, '')
set @drop_view_sql = 'if exists (select * from sys.views where object_id = object_id(N''[dbo].[MasterView]'')) drop view MasterView'
set @alter_view_definition_sql =
'create view MasterView as select ' + @entity_columns + '
from entity_1
inner join entity_2 on entity_2.id = entity_1.id
/* other joins follow */'
exec (@drop_view_sql)
exec (@alter_view_definition_sql)
end
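The same generate-then-recreate idea can be sketched in Python with SQLite (an assumption; the procedure above is T-SQL). Column names come from pragma_table_info instead of information_schema.columns, every column is aliased with its table name, excluded columns are skipped, and the view is dropped and recreated. The tables, the excluded notes column and master_view are hypothetical; the join is hard-coded as in the procedure.

```python
import sqlite3


def make_demo():
    """Two hypothetical tables that share the column names id and name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity_1 (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
    conn.execute("CREATE TABLE entity_2 (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    return conn


def refresh_master_view(conn, tables=("entity_1", "entity_2"), exclude=("notes",)):
    # Prefix every column with its table name so duplicates such as "id"
    # and "name" get unique aliases, skipping the excluded columns.
    parts = [
        f'{table}."{name}" AS "{table}_{name}"'
        for table in tables
        for (name,) in conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
        if name not in exclude
    ]
    conn.execute("DROP VIEW IF EXISTS master_view")
    conn.execute(
        f"CREATE VIEW master_view AS SELECT {', '.join(parts)} "
        "FROM entity_1 INNER JOIN entity_2 ON entity_2.id = entity_1.id")
    # Return the view's column names so the caller can see what was generated.
    return [row[1] for row in conn.execute("PRAGMA table_info(master_view)")]


if __name__ == "__main__":
    conn = make_demo()
    print(refresh_master_view(conn))
    conn.execute("ALTER TABLE entity_2 ADD COLUMN status TEXT")  # simulated vendor change
    print(refresh_master_view(conn))
```

Rerunning the refresh after a schema change picks up added columns, which is the behavior the procedure aims for; the view itself still does not update on its own.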
If you use SELECT * with a JOIN, the result might include columns with the same name, which is not allowed in a view. The query runs fine by itself, but not when you create the view.
For example:
**Table A**
ID, CatalogName, CatalogDescription
**Table B**
ID, CatalogName, CatalogDescription
**After the JOIN query**
ID, CatalogName, CatalogDescription, ID, CatalogName, CatalogDescription
That's not possible in a View.
Specify a unique name for each column in the view. Using just * is not a very good practice.