Data retrieval and Join operations with cluster db server - sql-server

If any database spreads across multiple servers (ex. Microsoft Sql Server), how can we do join or filter operations. In my scenario, if suppose:
A single table spreads across multiple servers how can we filter rows based on user input?
If master table is there on one db server and transaction table is at another db server, how can we do join operations?
Please let me know how can we achieve this and where can I get more details about this?

I think you're confused about SQL clustering - it doesn't allow you to split tables across multiple servers any differently than you would put different tables in different databases on the same server. Clustering is used for hot failover and redundancy.
However, I think I see what you're asking - if you want to split a database up between different physical servers, the easiest thing to do might be to have a VIEW that unifies those tables together in one place, and then you can query and filter that. SQL Server is smart enough (as long as there are indexes and statistics in place to make the decision from) to send the query where it needs to go if you select something from the unifying view.
For example, say you have two servers - SERVER1 and SERVER2 that both have a database - DATABASE - and each server has a table - TABLE - that has half the data in it (between the two servers, you have every row). Just create a view somewhere - either server, or somewhere else entirely - that looks like this, and then add Linked Servers for SERVER1 and SERVER2 that allow SQL Server to get the data from the remote location:
CREATE VIEW SomeView
AS
SELECT *
FROM SERVER1.DATABASE..TABLE
UNION
ALL
SELECT *
FROM SERVER2.DATABASE..TABLE
That way, you have one place to query and you'll always grab the data from whatever server it's on, instead of querying each server by itself. You can do this even if you don't want to split up individual tables - just create a view for each table you want to move, and have the view check whatever server the table is actually located on.
If I've missed your actual question, please leave a comment and some clarification and I'll be happy to add some more detail.

Related

Alias a database name within the query as to avoid dynamic SQL?

We have 4 distinct environments for one of our applications through our Dev to Prod pipeline, each having two databases. Within these databases there are several queries referencing the other database. So for example we may have this:
select a.name, b.description
FROM Database1.dbo.Names a
INNER join Database2.dbo.Descriptions b on a.ID = b.ID;
Just an example, not an actual query from our system... but hopefully you get the point. All four environments live on their own SQL servers so I can gracefully move code from bottom to top without any changes.
Now we're looking at possibly combining the bottom three environments on to the same SQL Server Instance (Prod will keep its own server), so all six databases will be on the same server together. In doing this the database names will have a suffix of their type, so Database1_Dev and Database2_Dev for example.
Well there lies the issue, all queries using just Database1 and Database2 will now break, and if I do change them to use Database1_Dev and Database2_Dev at the lowest level I can't just move them up to the next level Database1_Sandbox and Database2_Sandbox without modifying the queries.
The options I'm looking at are requesting three SQL instances on this server - which I'm sure will get shot down - or changing the queries to use dynamic SQL where I can pass in the DB name as a variable. I don't like either option.
Synonyms are the only thing I can think of that would allow a different name to be used, but this is going the wrong direction. I don't need one DB to have multiple names, rather multiple databases to have the same name within the same context. So like if in a procedure I could pass in the environment and I could alias Database1 to Database1_Sandbox just within the transaction or query. I don't think there is any way to do this, but I thought I'd ask.
Thanks for any suggestions.

SSIS Cross-DB "WHERE IN" Clause (or Equivalent) in Azure

I'm currently trying to build a data flow in SSIS to select all records from a mapping table where an ID column exists in the related Item table. There are two complications:
The two tables are currently in different databases on different servers.
The databases are in Azure, for which I've read Linked Servers are not supported.
To be more clear, the job to migrate data from Staging environment to Production. I only want to push lookup records into prod if the associated Item IDs are in there. Here's some psudo-TSQL to give a clear goal of what I'm trying to achieve:
SELECT *
FROM [Staging_Server].[SourceDB].[dbo].[Lookup] L
WHERE L.[ID] IN (
SELECT P.[Item]
FROM [Production_Server].[TargetDB].[dbo].[Item] P
)
I haven't found a good way to create this in SSIS. I think I've created a work-around that involves sorting both tables and performing a merge join, but sorting both sides is an unnecessary hit on performance. I'm looking for a more direct and intuitive design for this seemingly simple data flow.
Doing this in a data flow, you'd have your Source query, sans filter, fed into a Lookup Component which is the subquery.
The challenge with this is SSIS is likely on-premises so that means you are going to pull all of your data out of Stage Azure to the server running SSIS and push it back to the Prod Azure instance.
That's a lot of network activity and as I'm reading the Azure pricing guide, I guess as long as you have the appropriate DTUs, you'd be fine. Back in the day, you were charged for Reads and not Writes so the idiom was to just push all your data to target server and then do the comparison there, much as ElendaDBA mentions. Only suggestion I'd make on the implementation is to avoid temporary tables or ad-hoc creation/destruction of them. Just implement as a physical table and truncate and reload prior to transmission to production.
You could create a temp table on staging server to copy production data into. Then you could create a query joining those two tables. After SSIS package runs, you could delete the temp table on staging server

Linking tables between databases

I’m after a bit of advice on the best way to go about this is SQL server 2008R2 express. I have a number of applications that are in separate databases on the same server. They are all “plugins” that use a central staff/structure list that will be in a separate database. The application is in the process of being migrated from JET.
What I’m looking for is the best way of all the “plugin” databases being able to see the central database and use those tables in standard queries and views etc.
As I’m using express that rules out any replication solution and so far the only option I can think of is to use triggers or a stored procedure to “push” out all the changes to the plugins. The information needs to be populated on a near enough real time basis however the number of changes will be very small maybe up to 100 a day and the biggest table only has about 1000 rows at the moment (the staff names table).
Hopefully that will cover all everything but if anyone needs any more details then just ask
Thanks
Apologies if I've misunderstood, but from your description it sounds like all these databases are hosted on the same instance of SQL Server - it's your mention of replication that makes me uncertain.
Assuming that's the case, you should be able to replace any copies of tables from the central database which are held in the "plugin" databases with views or synonyms which reference the central tables directly, since SQL server allows you to make references between databases on the same server using three-part naming (database_name.schema_name.object_name)
For example, if each plugin db has a table StaffNames, you could replace this with a view by dropping the table, then creating a view:
drop table StaffNames
go
create view StaffNames
as
select * from <centraldbname>.<schema - probably dbo>.StaffNames
go
and your code should continue to work seamlessly, as long as permissions are set up.
Alternatively, you could replace all the references to the shared tables in the plugin databases with three-part name references to the central database, but the view method requires less work.

Copy Database Data from Many DBs to One. Data Replication (sort of)

This involves data replication, kind of:
We have many sites with SQL Express installed, there is an 'audit' database on each site that has one table in 1st normal form (to make life simple :)
Now I need to get this table from each site, and copy the contents (say, with a Date Time Value > 1/1/200 00:00, but this will change obviously) and copy it to a big 'super table' in sql server proper, that also has the primary key as the Site Name (That needs injecting in) and the current primary key from the SQL Express table)
e.g. Many SQL Express DBs with the following table columns
ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
And the big super table needs to have:
SiteName, ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
Where items in bold are the primary key(s)
Is there a Microsoft (or non MS I suppose) app/tool/thing to manager copying all this data accross already, or do we need to write our own?
Many thanks.
You can use SSIS (which comes with SQL Server) to populate, it can be set up with variables to change the connection string to the various databases. I have one that loops through the whole list and does the same process using three differnt files from three differnt vendors. You could so something simliar to loop through the different site databases. Put the whole list of database you want to copy the audit data from in a table and loop through it changing the connection string each time.
However, why on earth would you want one mega audit table per site? If every table in the database populates the audit table as changes happen, then the audit table eventually becomes a huge problem for performance. Every insert, update and delete has to hit this table and then you are proposing to add an export on top of that. This seems to me to be a guaranteed structure for locking and deadlocks and all sorts of nastiness. Do yourself a favor and limit each audit table to the table it is auditing.
Things to consider:
Linked servers and sp_msforeachdb as part of a do-it-yourself solution.
SQL Server Replication (by Microsoft) (which I believe can pull data from SQL Server Express)
SQL Server Integration Services which can pull data from SQL Server Express instances.
Personally, I would investigate Integration Services first.
Good luck.
You could do this with SymmetricDS. SymmetricDS is open source, web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.
As of right now, however, you would need to implement a custom IDataLoaderFilter extension point (in Java) to add the extra column. The metadata would be available though because your SiteName would be the external_id.

Use SSIS to migrate and normalize database

We have an MS Access database that we want to migrate to a SQL Server Database with a new DB design. A part of the application that uses the SQL Server DB is already written.
I looked around to find out how to do the migration step most easily and started with Microsofts SQL Server Integration Services (SSIS). Now I have gotten to the point that I want to split a table vertically for normalization reasons.
A made up example looks like this
MS Access table person
ID
Name
Street
SQL Server table person
id
name
SQL Server table address
id
person_id
street
How can I complete this task best with SSIS? The id columns are identity (autoincrement) columns, so I cannot insert the old ID. How can I put the correct person_id foreign key in the address table?
There might even be a table which has to be broken up into three tables, where a row in table2 belongs to table1 and a row in table3 belongs to a row table2.
Is SSIS the appropriate means for this?
EDIT
Although this is a one-time migration, we need to have an automated and repeatable process, because the production database is under heavy usage and we are working on the migration in our development environment with recent, but not up-to-date data. We plan for one test run of the migration and have the customer review the behaviour. If everything is fine, we will go for the real migration.
Most of the given solutions include lots of manual steps and are thus not appropriate.
Use the execute SQL Task and write the statement yourself.
For the parent table do Select into table from table... then do the same for the rest as you progress. Make sure you set identity insert to ON for the parent table and reuse your old ID's. That will help you keep your data integrity.
For migrating your Access tables into SQL Server, use SSMA, not the Upsizing Wizard from Access.
You'll get a lot more tools at your disposal.
You can then break up your tables one by one from within SQL Server.
I'm not sure if there are any tools that can help you split your tables automatically, at least I couldn't find any, but it's not too difficult to do manually although how much work is required depends on how you used the original tables in your VBA code and forms in the first place.
A side note
Regarding normalization, don't go overboard with it: I know your example was just that but normalizing customer addresses is not always (rarely?) needed.
How many addresses can a person have?
If you count a home address, business address, delivery address, billing address, that's probably the most you'll ever need.
In that case, it's better to just keep them in the same table. Normalizing that data will just require more work to recombine and offers no benefit.
Of course, there are cases where it would make sense to normalise but I've seen people going overboard with the notion (I've been guilty of it as well) and then find themselves struggling to build more complex queries to join all that split data, making development and maintenance harder and often suffering a performance penalty in the process.
Access is so user-friendly, why not normalize your tables in Access, and then upsize the finished structure from there?
I found a different solution which was not mentioned yet and allows us to use all the comfort and options of the dataflow task:
If the destination database is on a local SQL Server, you can use a dataflow task with SQL Server destination instead of an OLE DB destination.
For a SQL Server destination you can mark the "keep identities" option. (I do not know if the english names are correct, because we have a german version.) With this you can write into identity columns
We found that we cannot use the old primary keys everywhere, because we have some tables that take a union of records from multiple tables.
We start the process by building a temporary mapping table with columns
new_id (identity)
old_id (int)
old_tablename (string)
We first fill in all the old_id s for every table that is referenced by a foreign key in the new schema. The new_id values are generated automatically by SQL Server.
So we can use a join to translate from old_id to new_id where needed. We use the new_id values to fill the identity (primary key) columns in the new tables with the "keep identities" option and can simply look them up in our mapping table for the foreign keys by a join.
You might also look at Jamie Thomson's SSIS Normalizer component. I just found out about it today (haven't actually tried it yet). The example he posts looks a lot like the one in your question.

Resources