How to Query Remote Database with Huge Where Clause from Local Database - sql-server

I am populating a table in a local database (localdb, SQL Server) with data pulled from a remote database (remotedb, Oracle).
The remote database is across the internet, and of course, the local database is local to my DB server.
I need to pull data from remotedb, but only rows matching a specific set of key values (e.g. the user_id field on a table in remotedb).
The set of keys is large, approx. 10,000.
Oh, by the way, remotedb is read-only, and I don't have access to things like imp/exp, batch jobs, or the server itself (it is hosted by an outside vendor).
Currently, I am using a query like this:
SELECT
    <my data>
FROM
    <remotedb>.<remote_table> JOIN <remotedb>.<remote_table> ...
WHERE
    <remotedb>.<remote_table>.<remote_field> IN
    ( <select that returns my 20,000 IDs> )
This seems like a brute force solution to me, but I can't think of any other way to do it.
Has anyone out there solved this problem?

If I'm reading your situation correctly: we had a similar issue at a past company, and it created a performance hit. Unless the vendor can pull all the information for you beforehand so you can query against that extract, I don't see another way around it.
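One pattern that can soften the brute force, assuming remotedb is reachable through a linked server: stage the keys in a local table and send them to Oracle in batches through OPENQUERY, so the IN filter is evaluated on the remote side and only matching rows cross the wire. A minimal T-SQL sketch; the linked server REMOTEDB, the tables dbo.KeyList and dbo.LocalTarget, and the remote table name are all hypothetical, STRING_AGG needs SQL Server 2017+ (FOR XML PATH can build the list on older versions), and the batch size of 1,000 matches Oracle's cap on IN-list literals:
DECLARE @ids   NVARCHAR(MAX),
        @sql   NVARCHAR(MAX),
        @batch INT = 0;

WHILE 1 = 1
BEGIN
    -- Next batch of up to 1,000 IDs as a comma-separated list
    -- (Oracle rejects IN lists longer than 1,000 literals).
    SELECT @ids = STRING_AGG(CONVERT(NVARCHAR(20), user_id), ',')
    FROM (SELECT user_id,
                 ROW_NUMBER() OVER (ORDER BY user_id) AS rn
          FROM dbo.KeyList) AS k
    WHERE rn >  @batch * 1000
      AND rn <= (@batch + 1) * 1000;

    IF @ids IS NULL BREAK;  -- no keys left

    -- OPENQUERY sends the text to Oracle verbatim, so the filter
    -- runs remotely and only matching rows come back.
    SET @sql = N'INSERT INTO dbo.LocalTarget '
             + N'SELECT * FROM OPENQUERY(REMOTEDB, '
             + N'''SELECT * FROM remote_table WHERE user_id IN ('
             + @ids + N')'')';
    EXEC (@sql);

    SET @batch += 1;
END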

Related

SSIS Cross-DB "WHERE IN" Clause (or Equivalent) in Azure

I'm currently trying to build a data flow in SSIS to select all records from a mapping table where an ID column exists in the related Item table. There are two complications:
The two tables are currently in different databases on different servers.
The databases are in Azure, for which I've read Linked Servers are not supported.
To be more clear, the job is to migrate data from the Staging environment to Production. I only want to push lookup records into prod if the associated Item IDs are in there. Here's some pseudo-T-SQL to give a clear picture of what I'm trying to achieve:
SELECT *
FROM [Staging_Server].[SourceDB].[dbo].[Lookup] L
WHERE L.[ID] IN (
    SELECT P.[Item]
    FROM [Production_Server].[TargetDB].[dbo].[Item] P
)
I haven't found a good way to create this in SSIS. I think I've created a work-around that involves sorting both tables and performing a merge join, but sorting both sides is an unnecessary hit on performance. I'm looking for a more direct and intuitive design for this seemingly simple data flow.
Doing this in a data flow, you'd have your Source query, sans filter, feed into a Lookup component that stands in for the subquery, as sketched below.
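Concretely, the two components would carry queries roughly like these (a sketch against the tables from the question; in the Lookup, match L.[ID] to P.[Item] and let only matched rows flow on, or redirect the rest):
-- OLE DB Source query, no filter, on the staging connection:
SELECT L.* FROM [dbo].[Lookup] L;

-- Lookup component query (the former subquery), on the production connection:
SELECT P.[Item] FROM [dbo].[Item] P;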
The challenge with this is that SSIS is likely on-premises, which means you are going to pull all of your data out of the Stage Azure database to the server running SSIS and then push it back to the Prod Azure instance.
That's a lot of network activity, and as I read the Azure pricing guide, as long as you have the appropriate DTUs you'd be fine. Back in the day, you were charged for Reads and not Writes, so the idiom was to just push all your data to the target server and then do the comparison there, much as ElendaDBA mentions. The only suggestion I'd make on the implementation is to avoid temporary tables or the ad-hoc creation/destruction of them. Just implement it as a physical table, and truncate and reload prior to transmission to production, as in the sketch below.
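A sketch of that truncate-and-reload idea, assuming a hypothetical persistent staging table dbo.ItemKeys on the staging server:
-- Run on the staging server before each load; the table persists,
-- only its contents are replaced.
TRUNCATE TABLE dbo.ItemKeys;

-- dbo.ItemKeys is then refilled by a simple data flow that reads
-- SELECT [Item] FROM [dbo].[Item] on the production connection.

-- The filter then becomes a plain local join on the staging side:
SELECT L.*
FROM [dbo].[Lookup] AS L
JOIN dbo.ItemKeys AS K
    ON K.Item = L.[ID];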
You could create a temp table on the staging server to copy production data into. Then you could create a query joining those two tables. After the SSIS package runs, you could delete the temp table on the staging server.

How to have data upload from Access to SQL Server upon opening file

My form is housing the data I need it to, and I have gotten all the functions working. I am trying to figure out a way to get the data from the local Access table to auto-upload to a SQL Server DB as long as there is an established connection; I was told recordsets may be a good way to go. I also tried just linking the table, but that is not what my boss is looking for. Any ideas or direction would be greatly appreciated.
The data housed is employee data. Each time it uploads, I just need it to push all the data, regardless of whether it's new or not, onto the same table, where it will be overwritten. The data is currently saved to a local table whenever the form is filled out, so if it can connect it needs to upload, and if it cannot, the data just sits and continues to sit on the local table.
You need to create the destination table on SQL Server and link it into your Access solution as an ODBC-connected table. Also create an insert query that writes from your local table to the ODBC table.
Now you simply need to trigger that insert query (for example with CurrentDb.Execute) during a fitting event in your form (Open, Close, Before_Update, etc.).
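For the overwrite behaviour described above, two saved queries along these lines would do (all names are hypothetical: EmployeeData_SQL is the ODBC-linked table, LocalEmployeeData the local one; the comments are annotations, and each statement would be fired with CurrentDb.Execute from the form event):
-- Clear the server copy so every upload overwrites it:
DELETE FROM EmployeeData_SQL;

-- Push everything currently held in the local table:
INSERT INTO EmployeeData_SQL
SELECT *
FROM LocalEmployeeData;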

Tools to update tables in SQL Server 2000/2005

Is there any handy tool that can make updating tables easier? Usually I get an Excel file with the original value in one column and the new value in another. Then I write a formula in Excel to generate the UPDATE statements. Is there any way to simplify this updating task?
I believe the approach in SQL Server 2000 and 2005 would be different, so could we discuss them both? Thanks.
In addition, these updates are usually requested by "non-programmers" (meaning they don't understand SQL, so it may not be feasible to let them write queries). Is there any tool that lets them update the table directly without having DBAs do this task? Also, that tool needs to limit the privilege to modifying only certain tables, and it had better have a way to roll back changes.
Create a DTS package that will import a CSV file, make the updates, and then archive the file. The user can drop the file in a specific folder designated for the task, or this can be done by an ops person. Schedule the DTS package to run every hour, day, etc.
In case your users insist on keeping Excel, you've got several different possibilities for getting the data transferred to SQL Server. My preferred one would be to use DTS/SSIS, as mentioned by buckbova.
However, another method is by using OPENROWSET(), which makes it possible to query your Excel file as if it were a table. I wrote a small article about it here: http://blog.hoegaerden.be/2010/03/29/retrieving-data-from-excel/
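For reference, on SQL Server 2000/2005 the OPENROWSET() route looks roughly like this, using the era-appropriate Jet provider (ACE on newer/64-bit installs); it assumes "Ad Hoc Distributed Queries" is enabled, and the file path, sheet name, and column/table names are placeholders:
-- Read the spreadsheet as if it were a table; HDR=YES treats the
-- first row as column headers.
SELECT x.old_value, x.new_value
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                'Excel 8.0;Database=C:\Data\Updates.xls;HDR=YES',
                'SELECT * FROM [Sheet1$]') AS x;

-- The same rowset can drive the update directly:
UPDATE t
SET    t.SomeColumn = x.new_value
FROM   dbo.TargetTable AS t
JOIN   OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                  'Excel 8.0;Database=C:\Data\Updates.xls;HDR=YES',
                  'SELECT * FROM [Sheet1$]') AS x
       ON x.old_value = t.SomeColumn;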
Another approach that hasn't been mentioned yet (I'm not a big fan of letting regular users edit data directly in the DB): is there any possibility of creating a small custom application for them?
There you go, a couple more possible solutions :-)
Valentino.
I think the best approach is to expose a view on your data, accessible to the users who are allowed to do updates, and set up INSTEAD OF triggers on the view to perform the actual updates on the underlying data. Restrict changes to only the columns they should be changing.
This technique can work on SQL Server 2000 and 2005.
I would add audit triggers on the underlying tables so you can always track changes.
You'll have complete control, and they can connect to it with Access or whatever and perform their maintenance.
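A skeleton of that setup (names hypothetical; the INSTEAD OF trigger is what makes an otherwise restricted view accept updates, and it works on both SQL Server 2000 and 2005):
-- Expose only the rows and columns the users may touch.
CREATE VIEW dbo.CustomerMaintenance
AS
SELECT CustomerID, Phone, Email
FROM dbo.Customer;
GO

-- Route updates through controlled logic instead of the base table.
CREATE TRIGGER dbo.trg_CustomerMaintenance_Upd
ON dbo.CustomerMaintenance
INSTEAD OF UPDATE
AS
BEGIN
    UPDATE c
    SET    c.Phone = i.Phone,
           c.Email = i.Email   -- CustomerID deliberately not updatable
    FROM   dbo.Customer AS c
    JOIN   inserted AS i
           ON i.CustomerID = c.CustomerID;
END;
GO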
You could create some accounts in SQL Server for these users and limit their access to only certain tables and columns, along with only SELECT / UPDATE / INSERT privileges. Then you could create an Access database with linked tables to these.
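The permission side of that is just standard grants, and column-level grants work too; a sketch with hypothetical names (CREATE ROLE is 2005 syntax; on SQL Server 2000 use sp_addrole instead):
CREATE ROLE DataMaintainers;

-- Limit the role to one table, and updates to two columns of it.
GRANT SELECT ON dbo.Customer TO DataMaintainers;
GRANT INSERT ON dbo.Customer TO DataMaintainers;
GRANT UPDATE (Phone, Email) ON dbo.Customer TO DataMaintainers;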

Data retrieval and Join operations with cluster db server

If a database spreads across multiple servers (e.g. Microsoft SQL Server), how can we do join or filter operations? In my scenario, suppose:
A single table spreads across multiple servers: how can we filter rows based on user input?
The master table is on one DB server and the transaction table is on another DB server: how can we do join operations?
Please let me know how we can achieve this and where I can get more details about it.
I think you're confused about SQL Server clustering - it doesn't allow you to split tables across multiple servers; spreading data over servers works no differently than putting different tables in different databases on the same server. Clustering is used for hot failover and redundancy.
However, I think I see what you're asking - if you want to split a database up between different physical servers, the easiest thing to do might be to have a VIEW that unifies those tables together in one place, and then you can query and filter that. SQL Server is smart enough (as long as there are indexes and statistics in place to make the decision from) to send the query where it needs to go if you select something from the unifying view.
For example, say you have two servers - SERVER1 and SERVER2 that both have a database - DATABASE - and each server has a table - TABLE - that has half the data in it (between the two servers, you have every row). Just create a view somewhere - either server, or somewhere else entirely - that looks like this, and then add Linked Servers for SERVER1 and SERVER2 that allow SQL Server to get the data from the remote location:
CREATE VIEW SomeView
AS
SELECT *
FROM SERVER1.DATABASE..TABLE
UNION ALL
SELECT *
FROM SERVER2.DATABASE..TABLE
That way, you have one place to query, and you'll always grab the data from whatever server it's on instead of querying each server by itself. You can do this even if you don't want to split up individual tables - just create a view for each table you want to move, and have the view point at whatever server the table is actually located on.
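One caveat on the pruning: for SQL Server to skip the remote branch instead of querying both servers, each member table needs a CHECK constraint on the partitioning column, which is what turns the plain UNION ALL view above into a distributed partitioned view. A sketch with a hypothetical Region column and the placeholder names from above:
-- On SERVER1: declare which slice of the data lives here.
ALTER TABLE [DATABASE]..[TABLE]
    ADD CONSTRAINT CK_Region CHECK (Region = 1);

-- On SERVER2: the complementary slice.
ALTER TABLE [DATABASE]..[TABLE]
    ADD CONSTRAINT CK_Region CHECK (Region = 2);

-- With the constraints in place, this touches only SERVER2:
SELECT * FROM SomeView WHERE Region = 2;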
If I've missed your actual question, please leave a comment and some clarification and I'll be happy to add some more detail.

Copy Database Data from Many DBs to One. Data Replication (sort of)

This involves data replication, kind of:
We have many sites with SQL Express installed; there is an 'audit' database on each site that has one table in 1st normal form (to make life simple :)
Now I need to get this table from each site and copy the contents (say, rows with a DateTime value > 1/1/200 00:00, but this will change, obviously) to a big 'super table' on the main SQL Server, whose primary key is the Site Name (that needs injecting in) plus the current primary key from the SQL Express table.
e.g. Many SQL Express DBs with the following table columns
ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
And the big super table needs to have:
SiteName, ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
where SiteName and ID together form the primary key.
Is there a Microsoft (or non-MS, I suppose) app/tool/thing to manage copying all this data across already, or do we need to write our own?
Many thanks.
You can use SSIS (which comes with SQL Server) to populate it; the package can be set up with variables to change the connection string for the various databases. I have one that loops through a whole list and does the same process using three different files from three different vendors. You could do something similar to loop through the different site databases: put the whole list of databases you want to copy the audit data from in a table and loop through it, changing the connection string each time, as in the sketch below.
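The statement inside that loop could be as simple as the following sketch, run once per site with the connection pointed at that site's SQL Express instance; @SiteName and @Since would be package variables mapped from the control table, and all table and column names here are placeholders adapted from the question:
-- Inject the site name alongside the site's own audit rows.
INSERT INTO dbo.SuperAudit
    (SiteName, ID, DefinitionName, DefinitionType,
     AuditDateTime, Success, NvarChar1, NvarChar2)
SELECT @SiteName, ID, DefinitionName, DefinitionType,
       AuditDateTime, Success, NvarChar1, NvarChar2
FROM dbo.Audit
WHERE AuditDateTime > @Since;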
However, why on earth would you want one mega audit table per site? If every table in the database populates the audit table as changes happen, then the audit table eventually becomes a huge performance problem. Every insert, update and delete has to hit this table, and then you are proposing to add an export on top of that. This seems to me to be a guaranteed recipe for locking, deadlocks and all sorts of nastiness. Do yourself a favor and limit each audit table to the table it is auditing.
Things to consider:
Linked servers and sp_msforeachdb as part of a do-it-yourself solution.
SQL Server Replication (by Microsoft) (which I believe can pull data from SQL Server Express)
SQL Server Integration Services which can pull data from SQL Server Express instances.
Personally, I would investigate Integration Services first.
Good luck.
You could do this with SymmetricDS. SymmetricDS is open source, web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.
As of right now, however, you would need to implement a custom IDataLoaderFilter extension point (in Java) to add the extra column. The metadata would be available, though, because your SiteName would be the external_id.
