SSIS Cross-DB "WHERE IN" Clause (or Equivalent) in Azure - sql-server

I'm currently trying to build a data flow in SSIS to select all records from a mapping table where an ID column exists in the related Item table. There are two complications:
The two tables are currently in different databases on different servers.
The databases are in Azure, for which I've read Linked Servers are not supported.
To be clearer, the job is to migrate data from the Staging environment to Production. I only want to push lookup records into prod if the associated Item IDs are already there. Here's some pseudo-T-SQL to give a clear goal of what I'm trying to achieve:
SELECT *
FROM [Staging_Server].[SourceDB].[dbo].[Lookup] L
WHERE L.[ID] IN (
SELECT P.[Item]
FROM [Production_Server].[TargetDB].[dbo].[Item] P
)
I haven't found a good way to create this in SSIS. I think I've created a work-around that involves sorting both tables and performing a merge join, but sorting both sides is an unnecessary hit on performance. I'm looking for a more direct and intuitive design for this seemingly simple data flow.

Doing this in a data flow, you'd feed your Source query, sans filter, into a Lookup Component that plays the role of the subquery.
The challenge is that SSIS is likely running on-premises, which means you will pull all of your data out of the Stage Azure instance to the server running SSIS and then push it back to the Prod Azure instance.
That's a lot of network activity, though as I read the Azure pricing guide, you'd be fine as long as you have the appropriate DTUs. Back in the day, you were charged for reads and not writes, so the idiom was to push all your data to the target server and do the comparison there, much as ElendaDBA mentions. The only suggestion I'd make on the implementation is to avoid temporary tables or ad-hoc creation/destruction of them. Just implement it as a physical table, and truncate and reload it prior to transmission to production.
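A minimal sketch of the physical-table approach, assuming the helper table lives in the staging database and holds the production Item IDs. The table name dbo.ProductionItemIds and the INT key type are assumptions for illustration, not part of the original schema.

```sql
-- One-time setup in the staging database.
-- Table name and INT data type are assumptions.
CREATE TABLE dbo.ProductionItemIds (
    Item INT NOT NULL PRIMARY KEY
);

-- Step 1 (Execute SQL Task, every run): clear out the previous load.
TRUNCATE TABLE dbo.ProductionItemIds;

-- Step 2 (Data Flow Task): SSIS copies the Item column from
-- [TargetDB].[dbo].[Item] on production into dbo.ProductionItemIds.

-- Step 3 (source query of the final data flow): the filter now runs
-- entirely inside the staging database, so no cross-server query is needed.
SELECT L.*
FROM dbo.[Lookup] AS L
WHERE EXISTS (
    SELECT 1
    FROM dbo.ProductionItemIds AS P
    WHERE P.Item = L.[ID]
);
```

Only the Item IDs cross the network in step 2, and only the matching Lookup rows cross it on the way to production.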

You could create a temp table on the staging server to copy the production data into. Then you could create a query joining those two tables. After the SSIS package runs, you could delete the temp table on the staging server.

Related

Copy records from a table on one SQL instance to an identical table on a different SQL instance

We had an intern who was given written instructions for deleting old data from a database based on dates (from within our ERP system). They were fascinated by the results and just kept deleting instead of stopping at the required date. There are now 4 years of missing records in the production database. I have these records in my development database, which is in a different instance on a different server. Is there a way to transfer just those 4 years' worth of data from my development database to my production database, checking, of course, to make sure there are no duplicates (there is a unique index on transaction number)?
I haven't tried anything yet because I'm not sure where to start. I do have a test database on the same instance as the production database that I could use to test the transfer with.
There are several ways to do this. Assuming that this is on a different machine, you will want to create a Linked Server on your dev machine to link to the target server (Or, technically, a link from the production server to your dev machine could be used as well). Then, perform an insert of the selected records from the source to the target.
More efficiently, you can use the Export Data functionality. Right click on the database (Not the server / instance, but the database) and select Tasks / Export Data from the popup menu. This will pop up the SQL Server Import and Export Wizard. Use your query above to select the data for export.
If security considerations interfere with this, create a duplicate of the table(s) with alternate names (e.g. MyInvRecords) in a new database, and export the data into those tables. Back up that DB, transfer it to someplace accessible from the target server, restore that DB, then transfer the rows back into the original DB.
I haven't had to use anything but these methods before, so one of them should work for you.
A basic insert will work just fine.
INSERT INTO ProdDB.[schema].YourTable
    ([Columns])
SELECT [Columns]
FROM TestDB.[schema].YourTable
WHERE /* your date-range predicates here */
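Since the question mentions a unique index on the transaction number, the insert can skip rows that already exist in production. The following is only a sketch: the column names (TransactionNumber, TransactionDate, Amount), the dbo schema, and the date range are hypothetical placeholders, and it assumes both databases are reachable from the same instance (for example, after restoring a copy of the development data there).

```sql
-- Hypothetical column names and date range; adjust to the real table.
INSERT INTO ProdDB.dbo.YourTable (TransactionNumber, TransactionDate, Amount)
SELECT s.TransactionNumber, s.TransactionDate, s.Amount
FROM TestDB.dbo.YourTable AS s
WHERE s.TransactionDate >= '20100101'
  AND s.TransactionDate <  '20140101'
  -- Skip any transaction that is already present in production,
  -- so the unique index is never violated.
  AND NOT EXISTS (
        SELECT 1
        FROM ProdDB.dbo.YourTable AS p
        WHERE p.TransactionNumber = s.TransactionNumber
      );
```

Running this first against the test database on the production instance, as the question suggests, would show how many rows it inserts before touching production.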

Most efficient and easiest way to back up a portion of specific tables hourly

I need to create an hourly .SQB backup file of some specific tables, each filtered with a WHERE clause, from a SQL Server database. As an example, I need this data:
SELECT * FROM table1 WHERE pk_id IN (2,5,7)
SELECT * FROM table2 WHERE pk_id IN (2,5,7)
SELECT * FROM table3 WHERE pk_id IN (2,5,7)
SELECT * FROM table4 WHERE pk_id IN (2,5,7)
The structure of the tables on the source database may change over time, e.g. columns may be added or removed, indexes added, etc.
One option is to do some kind of export, script generation, etc. into a staging database on the same instance of SQL Server. Efficiency aside, I have no problem dropping or truncating the tables on the destination database each time. In short, I'm looking to have both the schema and data of the tables duplicated to the destination database. That's completely acceptable.
Another is to just create a .SQB backup from the source database. Being that the .SQB file is all that I really need (it's going to be sent SFTP) - that would be fine, too.
What's the recommended approach in this scenario?
Well, if I understand your requirement correctly, you want data from some tables in your database to be shipped somewhere else periodically.
SQL Server cannot take a backup of a subset of tables from a database, so that is not an option.
Since you mentioned you will be using SFTP to send the data, using the BCP command to extract the data is one option, but BCP may or may not perform very well, and it definitely will not scale out very well.
Instead of BCP, I would prefer an SSIS package; you will be able to do everything (extract files, add WHERE clauses, drop files on SFTP, tune your queries, logging, monitoring, etc.) in the package.
Finally, SQL Server Replication can be used to create a subscriber: publish only the articles (tables) you are interested in, and you can also add WHERE clauses to your publications.
Again, there are a few options with the replication subscriber database:
Give your data clients access to the subscriber database, so there is no need for extracts.
Use BCP on the subscriber database to extract data, without putting load on your production server.
Use an SSIS package to extract data from the subscriber database.
Finally, create a backup of this subscriber database and ship the whole backup (.bak) file over SFTP.
In short, there is more than one way to skin the cat; now you have to decide which one suits your requirements best.

Design for importing definition data from Excel into SQL Server

We have a Restaurant Inventory Control system that uses SQL Server 2008 R2.
It takes a very long time to add all the definition data: stock items, yields, packsizes, recipes, categories etc. So, our clients have asked if they can upload it from Excel.
Before I just jump in and start, I want to find out if there is a best practice way to do this.
I know all the tools: SSIS, stored procedures, etc. But I'm looking for advice/resources that can help with the design process: how best to set up the spreadsheet, validate the data, create the child/parent relationships, etc.
This must be a fairly common project -- so it must have a standard design/approach and that's what I'm looking for.
I think the design will depend on the technologies you're most comfortable with. If you're comfortable with SSIS and stored procedures, this is the general pattern I would use:
Excel Template - I wouldn't spend too much time on this; add the headers and sheets necessary for the tables. You can lock down certain things and/or implement rules, but most of your validation would be done in stored procs.
SSIS - Have a package that loads the Excel data into staging tables. Rows with errors get added to an error log to be presented to the user along with the validation issues from the stored procedures.
Staging Tables - Have one staging table per sheet/production table, with an ExecutionId column in each staging table to allow parallel processing. Either allow all columns to be NULL so you can always get the data into the staging tables, or set the proper NULL conditions and have SSIS redirect the failing rows on error. Don't have any primary key / foreign key relationships in the staging tables; these can be validated in the stored procedure.
Stored Procedures - Validate the staging data; any issues found would be added to the error log to be presented to the user or person performing the import. If there are no issues, import the data into the production tables. If there is existing data in the production tables, you could do a comparison and update if applicable.
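The staging-table and validation pattern above could look roughly like the following. This is a sketch under assumptions: every object name here (Staging_StockItem, ImportErrorLog, Category, ValidateStockItems) and every column is hypothetical, and only one validation rule is shown.

```sql
-- Staging table: all data columns nullable, no PK/FK constraints.
CREATE TABLE dbo.Staging_StockItem (
    ExecutionId  UNIQUEIDENTIFIER NOT NULL, -- identifies one import run
    RowNumber    INT              NULL,     -- row in the Excel sheet
    ItemName     NVARCHAR(200)    NULL,
    CategoryName NVARCHAR(200)    NULL,
    PackSize     NVARCHAR(50)     NULL
);

-- Error log shared by the SSIS package and the stored procedures.
CREATE TABLE dbo.ImportErrorLog (
    ExecutionId UNIQUEIDENTIFIER NOT NULL,
    SheetName   NVARCHAR(100)    NOT NULL,
    RowNumber   INT              NULL,
    Message     NVARCHAR(400)    NOT NULL
);
GO  -- batch separator for SSMS/sqlcmd; CREATE PROCEDURE must start a batch

CREATE PROCEDURE dbo.ValidateStockItems
    @ExecutionId UNIQUEIDENTIFIER
AS
BEGIN
    SET NOCOUNT ON;

    -- Parent/child check done here instead of with a foreign key:
    -- log every staged row whose category does not exist yet.
    INSERT INTO dbo.ImportErrorLog (ExecutionId, SheetName, RowNumber, Message)
    SELECT s.ExecutionId, N'StockItem', s.RowNumber,
           N'Unknown category: ' + ISNULL(s.CategoryName, N'(blank)')
    FROM dbo.Staging_StockItem AS s
    WHERE s.ExecutionId = @ExecutionId
      AND NOT EXISTS (
            SELECT 1
            FROM dbo.Category AS c   -- hypothetical production table
            WHERE c.CategoryName = s.CategoryName
          );
END;
```

Filtering every statement by @ExecutionId is what lets two uploads be staged and validated at the same time without seeing each other's rows.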

Migrate SQL Server data from one database to another

We have 2 servers with a database containing the same tables and structures but different data. One is the testing environment, the other one contains productive data.
We want to copy the data from the productive database to the testing database. What is the best approach to achieve this?
If I delete the data first, will I be able to insert the data? Or will the primary keys continue counting from where they were? What about inserting the primary keys for tables that have autonumbering?
It will depend on your specific needs and data structure, but here are some options to consider (in the order I would recommend them):
A simple backup and restore will be the easiest and quickest solution;
Using a data comparison tool (like Red-Gate's Data Compare) could solve your needs;
An SSIS package could be developed to pump data back and forth between the two instances; or
Write your own script using the SET IDENTITY_INSERT ... ON / OFF command for the tables with identity columns.
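For the last option, a sketch of what such a script might look like for a single table follows. The names dbo.Customer, CustomerId, Name and the source database ProdCopy are hypothetical. Note that DELETE leaves the identity counter where it was, whereas TRUNCATE TABLE resets it to the seed (but is not allowed on tables referenced by a foreign key).

```sql
-- Empty the test table. DELETE does not reset the identity counter.
DELETE FROM dbo.Customer;

-- Allow explicit values to be written to the identity column.
-- Only one table per session can have IDENTITY_INSERT ON at a time.
SET IDENTITY_INSERT dbo.Customer ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.Customer (CustomerId, Name)
SELECT CustomerId, Name
FROM ProdCopy.dbo.Customer;   -- hypothetical copy of the production data

SET IDENTITY_INSERT dbo.Customer OFF;

-- Check the current identity value, and correct it if it is lower
-- than the highest CustomerId now in the table.
DBCC CHECKIDENT ('dbo.Customer');
```

When several tables are related, the deletes would need to run child-first and the inserts parent-first so the foreign keys stay satisfied.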

Copy Database Data from Many DBs to One. Data Replication (sort of)

This involves data replication, kind of:
We have many sites with SQL Express installed, there is an 'audit' database on each site that has one table in 1st normal form (to make life simple :)
Now I need to get this table from each site and copy the contents (say, with a Date Time Value > 1/1/200 00:00, but this will change obviously) to a big 'super table' in SQL Server proper, whose primary key is the Site Name (which needs injecting in) plus the current primary key from the SQL Express table.
e.g. Many SQL Express DBs with the following table columns
ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
And the big super table needs to have:
SiteName, ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
Where SiteName and ID together form the primary key.
Is there a Microsoft (or non-MS, I suppose) app/tool/thing to manage copying all this data across already, or do we need to write our own?
Many thanks.
You can use SSIS (which comes with SQL Server) to populate it; a package can be set up with variables to change the connection string to the various databases. I have one that loops through a whole list and runs the same process using three different files from three different vendors. You could do something similar to loop through the different site databases. Put the whole list of databases you want to copy the audit data from in a table and loop through it, changing the connection string each time.
However, why on earth would you want one mega audit table per site? If every table in the database populates the audit table as changes happen, then the audit table eventually becomes a huge problem for performance. Every insert, update and delete has to hit this table and then you are proposing to add an export on top of that. This seems to me to be a guaranteed structure for locking and deadlocks and all sorts of nastiness. Do yourself a favor and limit each audit table to the table it is auditing.
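The looping approach described above needs a central table keyed on site plus original ID, a list of sites to iterate over, and a per-site source query. A rough sketch follows. All names and data types are assumptions (the question's column names contain spaces, which are dropped here), and in a real package @SiteName and @Since would be supplied from SSIS variables rather than declared inline.

```sql
-- Central "super table" on the full SQL Server instance.
CREATE TABLE dbo.AuditAllSites (
    SiteName       NVARCHAR(100) NOT NULL,
    ID             INT           NOT NULL,
    DefinitionName NVARCHAR(200) NULL,
    DefinitionType NVARCHAR(100) NULL,
    [DateTime]     DATETIME      NULL,
    Success        BIT           NULL,
    NvarChar1      NVARCHAR(MAX) NULL,
    NvarChar2      NVARCHAR(MAX) NULL,
    CONSTRAINT PK_AuditAllSites PRIMARY KEY (SiteName, ID)
);

-- The list the SSIS Foreach Loop iterates over.
CREATE TABLE dbo.SiteList (
    SiteName         NVARCHAR(100) NOT NULL PRIMARY KEY,
    ConnectionString NVARCHAR(500) NOT NULL
);

-- Source query run against each site's SQL Express database.
-- The site name is injected as a constant column so the rows
-- arrive already shaped for dbo.AuditAllSites.
DECLARE @SiteName NVARCHAR(100) = N'SiteA';   -- example value only
DECLARE @Since    DATETIME      = '20000101'; -- example cut-off only

SELECT @SiteName AS SiteName,
       ID, DefinitionName, DefinitionType, [DateTime],
       Success, NvarChar1, NvarChar2
FROM dbo.Audit                                -- hypothetical site table
WHERE [DateTime] > @Since;
```

Storing the last copied DateTime per site would let each run pick up only the new rows.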
Things to consider:
Linked servers and sp_msforeachdb as part of a do-it-yourself solution.
SQL Server Replication (by Microsoft) (which I believe can pull data from SQL Server Express)
SQL Server Integration Services which can pull data from SQL Server Express instances.
Personally, I would investigate Integration Services first.
Good luck.
You could do this with SymmetricDS. SymmetricDS is open source, web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.
As of right now, however, you would need to implement a custom IDataLoaderFilter extension point (in Java) to add the extra column. The metadata would be available though because your SiteName would be the external_id.
