I need to consolidate 20 databases that have the same structure into 1 database. I saw this post:
Consolidate data from many different databases into one with minimum latency
I didn't understand all of it, so let me ask this way: there are some tables that have primary keys but don't have a source ID. For example:
Database 1
AgencyID   Name
1          Apple
2          Microsoft

Database 2
AgencyID   Name
1          HP
2          Microsoft
It's obvious that these two tables cannot be merged like this; an additional column is needed:
Database 1
Source   AgencyID   Name
DB1      1          Apple
DB1      2          Microsoft

Database 2
Source   AgencyID   Name
DB2      1          HP
DB2      2          Microsoft
If this is the right way of doing this, can these two tables be merged in one database like this:
Source   AgencyID   Name
DB1      1          Apple
DB1      2          Microsoft
DB2      1          HP
DB2      2          Microsoft
...and is it possible to do it with Transactional replication?
Thanks in advance for the answer, it would be really helpful if I would get the right answer for this.
Ilija
If I understand you correctly, you can do that by
creating a DTS/SSIS package (here is a basic SSIS tutorial)
or by running SQL directly, like:
INSERT INTO [TargetDatabase].dbo.[MergedAgency]([Source], [AgencyID], [Name])
SELECT CAST('DB1' AS nvarchar(16)), [AgencyID], [Name]
FROM [SourceDatabase1].dbo.[Agency]
INSERT INTO [TargetDatabase].dbo.[MergedAgency]([Source], [AgencyID], [Name])
SELECT CAST('DB2' AS nvarchar(16)), [AgencyID], [Name]
FROM [SourceDatabase2].dbo.[Agency]
Then run either of those on a schedule, e.g. via a recurring SQL Server Agent Job with one Job Step and a Schedule.
Don't forget to think about how you will detect which rows have already been copied to the target database.
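One way to handle that (a sketch, assuming the MergedAgency table from the statements above and that the pair (Source, AgencyID) uniquely identifies a row) is to insert only rows that are not already present in the target:

```sql
-- Copy only the rows from DB1 that are not yet in the target.
-- Assumes (Source, AgencyID) uniquely identifies a row in MergedAgency.
INSERT INTO [TargetDatabase].dbo.[MergedAgency] ([Source], [AgencyID], [Name])
SELECT CAST('DB1' AS nvarchar(16)), s.[AgencyID], s.[Name]
FROM [SourceDatabase1].dbo.[Agency] AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM [TargetDatabase].dbo.[MergedAgency] AS t
    WHERE t.[Source] = N'DB1'
      AND t.[AgencyID] = s.[AgencyID]
);
```

This makes the job safe to re-run, though it will not pick up updates to rows that were already copied.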
I solved the problem. Now I am using transactional replication. In "Publication Properties > Article Properties" I had to set the "Action if name is in use" option to "Keep existing object unchanged"; the default is "Drop existing object and create a new one".
In SQL Server 2008, even when I change the table schema, the changes are applied to the consolidation database.
SQL-Hub (http://sql-hub.com) will let you merge multiple databases with the same schema into a single database. There is a free license that will let you do this from the UI, though you might need to pay for a license if you want to schedule the process to run automatically. It's much easier to use than replication, though not quite as efficient.
I'm trying to work out a way to copy all the data from a particular table (let's call it opportunities) into a new table, with a timestamp of the date each row was copied, for the sole purpose of generating historic data in a database hosted in Azure SQL Data Warehouse.
What's the best way to do this? So far I've created a duplicate table in the data warehouse with an additional column called datecopied.
The query I've started using is:
SELECT OppName, Oppvalue
INTO Hst_Opportunities
FROM dbo.opportunities
I am not really sure where to go from here!
SELECT INTO is not supported in Azure SQL Data Warehouse at this time. You should familiarise yourself with the CREATE TABLE AS or CTAS syntax, which is the equivalent in Azure DW.
If you want to fix the copy date, simply assign it to a variable prior to the CTAS, something like this:
DECLARE @copyDate DATETIME2 = CURRENT_TIMESTAMP;

CREATE TABLE dbo.Hst_Opportunities
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT OppName, Oppvalue, @copyDate AS copyDate
FROM dbo.opportunities;
I should also mention that the use case for Azure SQL Data Warehouse is millions and billions of rows with terabytes of data. It doesn't tend to do well at low volume, so consider whether you really need this product, or whether a traditional SQL Server 2016 install or Azure SQL Database would serve you better.
You can write an INSERT INTO ... SELECT query like the one below, which will work with SQL Server 2008+ and Azure SQL Data Warehouse:
INSERT INTO Hst_Opportunities
SELECT OppName, Oppvalue, DATEDIFF(SECOND,{d '1970-01-01'},current_timestamp)
FROM dbo.opportunities
In Impala, is it possible to list all tables in a given database with the date each table is created? Something like:
In my_database:
TABLE      CREATED_DATE
-------    ------------
table_1    2016-01-01
table_2    2016-02-12
table_3    2016-05-03
Thanks a lot!
I don't think there is a specific command to do what you are asking for.
What we usually do is list all the tables in a given DB:
show tables in db_name
then for each table we run:
show create table table_name
Look for the property 'transient_lastDdlTime' = '1479222568', which shows the timestamp of the creation time. You'll have to convert that to a readable date.
You can do this easily in a Python script, by installing the pyodbc package and the Cloudera Impala ODBC driver.
This is actually trivially easy, with one huge caveat: with direct read-only access to the Hive metastore (which may cause additional load that could affect Impala and Hive performance) you can query the table properties directly.
The metastore may be built on a number of different databases; MySQL and PostgreSQL are both options, I believe. Here is an example from my PostgreSQL-based metastore:
select "TBL_NAME", "CREATE_TIME" from "TBLS" limit 10;
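CREATE_TIME in the TBLS table is stored as Unix epoch seconds, so on a PostgreSQL-backed metastore you can convert it to a readable date in the same query (a sketch against the TBLS table shown above):

```sql
-- PostgreSQL: convert the epoch-seconds CREATE_TIME to a readable timestamp.
SELECT "TBL_NAME",
       to_timestamp("CREATE_TIME") AS created_date
FROM "TBLS"
ORDER BY "CREATE_TIME"
LIMIT 10;
```

On a MySQL-backed metastore the equivalent function is FROM_UNIXTIME, and the identifiers are not quoted.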
I have data in 2 SQL Server 2012 database servers. I need to create a view containing data from both servers.
My first step was to import the join-table from Server2 into Server1 and create the view. The problem is though, that I need to keep the exported table up-to-date and thus a static "export" of the table is not ideal.
What methods could I use in order to create a dynamic join between 2 tables on 2 different servers?
You could establish a linked server and use four-part names:
CREATE VIEW dbo.my_view
AS
SELECT * -- cols list
FROM dbo.table_name t
JOIN server_name.database_name.schema_name.table_name c
ON t.id = c.id;
Note:
If the view will be part of a transaction, MS DTC (Distributed Transaction Coordinator) must be enabled.
Depending on how you build your query, performance may be degraded.
Not every data type can be used (XML, for example).
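Setting up the linked server itself can be done with sp_addlinkedserver and sp_addlinkedsrvlogin (a sketch; the server name, host, and login details are placeholders for your environment):

```sql
-- Register the remote server as a linked server (run on Server1).
EXEC sp_addlinkedserver
    @server = N'Server2',          -- name used in four-part queries
    @srvproduct = N'',
    @provider = N'SQLNCLI',        -- SQL Server Native Client OLE DB provider
    @datasrc = N'Server2Host';     -- network name/instance of the remote server

-- Map local logins to a remote login (adjust to your security policy).
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'Server2',
    @useself = N'FALSE',
    @rmtuser = N'remote_login',
    @rmtpassword = N'remote_password';
```

After this, the four-part name in the view above would start with Server2, e.g. Server2.database_name.schema_name.table_name.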
I'm starting out with Linq To SQL, fiddling around with Linqpad and I'm trying to duplicate a SQL script which joins on tables in separate databases on the same server (SQL Server 2008).
The TSQL query looks approximately like this:
use MainDatabase
go
insert Event_Type(code, description)
select distinct t1.code_id, t2.desc
from OtherDatabase..codes t1
left join OtherDatabase..lookup t2 on t1.key_id = t2.key_id and t2.category = 'Action 7'
where t2.desc is not null
I'm basically trying to figure out how to do a cross-database insertion. Is this possible with Linq To SQL (and is it possible in Linqpad?)
This is possible in LINQ to SQL if you create a (single) typed DataContext that contains table classes for objects in both databases. The designer won't help you here, so you have to create some of the table classes manually. In other words, use the VS designer to create a typed DataContext for your primary database, then manually add classes for the tables in the other database that you wish to access:
[Table (Name = "OtherDatabase.dbo.lookup")]
public class Lookup
{
...
}
Edit: In LINQPad Premium edition, you can now do cross-database queries with SQL Server - in one of two ways.
The simplest is the drag-and-drop approach: hold down the Ctrl key while dragging additional databases from the Schema Explorer to the query editor. To access those additional databases in your queries, use database.table notation, e.g., Northwind.Regions.Take(100). The databases that you query must reside on the same server.
The second approach is to list the extra database(s) that you want to query in the connection properties dialog. This dialog also lets you choose databases from linked servers. Here's how to proceed:
Add a new LINQ to SQL connection.
Choose Specify New or Existing Database and choose the primary database that you want to query.
Click the Include Additional Databases checkbox and pick the extra database(s) you want to include. You can also choose databases from linked servers in this dialog.
You can now do cross-database queries. These are properly optimized insofar as joins will occur on the server rather than the client.
Use linked servers with fully qualified names to query another database from the current DB. That should work.
use MainDatabase
go
insert Event_Type(code, description)
select distinct t1.code_id, t2.desc
from <Linked_Server>.OtherDatabase..codes t1
left join <Linked_Server>.OtherDatabase..lookup t2 on t1.key_id = t2.key_id and t2.category = 'Action 7'
where t2.desc is not null
I want to update a static table on my local development database with current values from our server (accessed on a different network/domain via VPN). Using the Data Import/Export wizard would be my method of choice, however I typically run into one of two issues:
I get primary key violation errors and the whole thing quits. This is because it's trying to insert rows that I already have.
If I set the "delete from target" option in the wizard, I get foreign key violation errors because there are rows in other tables that are referencing the values.
What I want is the correct set of options that means the Import/Export wizard will update rows that exist and insert rows that do not (based on primary key or by asking me which columns to use as the key).
How can I make this work? This is on SQL Server 2005 and 2008 (I'm sure it used to work okay on the SQL Server 2000 DTS wizard, too).
I'm not sure you can do this in Management Studio. I have had some good experiences with RedGate SQL Data Compare for synchronising databases, but you do have to pay for it.
The SQL Server Database Publishing Wizard can export a set of SQL insert scripts for the table that you are interested in. Just tell it to export data only, not schema. It'll also create the necessary drop statements.
One option is to download the data to a new table, then use commands similar to the following to update the target:
update t set
    col1 = d.col1,
    col2 = d.col2
from downloaded d
inner join target t on d.pk = t.pk;

insert into target (col1, col2, ...)
select d.col1, d.col2, ...
from downloaded d
where d.pk not in (select pk from target);
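On SQL Server 2008+, the update-then-insert pair can also be expressed as a single MERGE statement (a sketch, assuming the same downloaded staging table and target table, with pk as the key column and col1/col2 as placeholder columns):

```sql
-- Upsert from the staging table into the target in one statement.
MERGE target AS t
USING downloaded AS d
    ON t.pk = d.pk
WHEN MATCHED THEN
    UPDATE SET t.col1 = d.col1,
               t.col2 = d.col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (pk, col1, col2)
    VALUES (d.pk, d.col1, d.col2);
```

This avoids scanning the target twice and handles the matched/unmatched cases atomically.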
If you disable the FK constraints during the second option, and re-enable them after it finishes, it will work.
But if you are using identity columns to generate the PKs that are involved in the FKs, it will cause a problem, so this works only if the PK values remain the same.