How to compare Access SQL results to SQL Server results efficiently? - sql-server

I am trying to convert my Access queries to SQL views. I can see how many records are returned in both cases easily. But it is getting difficult to make sure the records match, as I have to check them manually. Also, I don't check every single record, so there might be some conversion errors that I did not notice.
Is there a way to check at least some segment of the result automatically?

In a way, you are asking the wrong question.
When you migrate the data to sql server, then at that point you assume all the data made it to sql server. I suppose you could do a one-time check of the record counts.
At that point, you re-link the tables (which likely pointed to an access back end) so they now point to sql server.
At this point, all of the existing queries in Access should and will work fine. ZERO need exists to test and check this after a migration.
For existing forms that are bound to a table (a linked table), then you DO NOT NEED to create views on sql server and this is simply not done nor required.
The ONLY time you will convert an access query to a view is if the access query runs slow.
For a simple query say on a table of 1 million invoices, you can continue to use the Access query (that is running against the linked tables). If you do a query such as
Select * from tblInvoices where invoiceNum = 12345
Then Access will ONLY pull the one record down the network pipe. So there is ZERO need for, and ZERO performance gain from, converting this query to a view (do not do this!!! You are simply wasting valuable time and money).
However, for a complex query with multiple joins and multiple tables (which are NOT in general used for a form), then you do want to convert such queries to a view, and then give the view the SAME name as what the access (client side) query had.
So you are not “out of the blue” going to wind up with a large number of views on sql server after a migration.
You migrate the data, link the tables. At that point 99% of your application will work just fine as before. After a migration you DO NOT NEED to create any views.
Creating views is an “after the fact” exercise and occurs AFTER you have the application running. So views are NOT created until such time as you have the application up and running (and deployed).
To get your access application to work with sql server you DO NOT create ANY views. You do not even create ONE view.
You migrate the data, link the tables, and now your application should work.
You ONLY start creating views for access queries that run slow.
So for example, you might have a report that now runs slow. The reason is that while access does a good job for most existing queries (that you don’t have to change), for some, you find that the access client does a “less than ideal” job of running that query.
So you flip that query in access into sql view mode, cut + paste that sql into the sql manager (creating a view). You then modify the syntax of the query. For some sql, it might be simply less effort to bring up the query in the access designer, and then bring up the sql designer in sql server, and simply re-create the sql from scratch in place of the cut + paste idea. Both approaches work fine; it depends on how “crazy” the sql is you are working with as to which approach works better (try both, and see what suits you best). Some like to cut + paste the sql, and others like to simply look at the graphical designer in both cases (on two monitors) and work that way.
At this point, after you have created the sql query (the view), you simply run that view and look at the record count. You then run the query that you have on the access client, and if it returns the same number of records, then in 99.9% of cases you can be rather confident that the sql query (the view) produces the same results as the client query. You now delete (or re-name) the access query, link to the view (giving it the same name as the access query), and you are now done.
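If you want to go one step beyond the simple record count, a quick spot check can be run on the sql server side as well. The sketch below uses hypothetical names throughout (vwInvoiceTotals, InvoiceNum, InvoiceDate, InvoiceTotal); the idea is to count the view's rows (compare against the count the access client query returns), and then summarize one segment (say, one month) that you can eyeball against the same segment filtered in the access query:

-- 1) Row count of the new view, to compare against the Access query's record count
SELECT COUNT(*) AS ViewRowCount
FROM dbo.vwInvoiceTotals;

-- 2) Spot check one segment: summarize a month of data and eyeball it against
--    the same month filtered in the Access client query
SELECT COUNT(*)          AS RowsInSegment,
       SUM(InvoiceTotal) AS TotalAmount,
       MIN(InvoiceNum)   AS FirstInvoice,
       MAX(InvoiceNum)   AS LastInvoice
FROM dbo.vwInvoiceTotals
WHERE InvoiceDate >= '20100101' AND InvoiceDate < '20100201';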
The 2, or 20 places that used that access client query will now be using the view. You should not have to worry or modify one thing in Access after having done this.
Now that the view is working, you then re-deploy your access front end to all workstations (I assume you have some means to update the access application part – just like you did even when not using sql server).
At this point, users are all happy and working.
If in the next day or so, you now find another report that runs slow (most will not), then that access query becomes a possible choice by you to convert to a view.
So out of an application of say 200 queries in access, you will in general likely only have to convert 2-10 of them to views. The VAST majority of the access saved queries DO NOT AND SHOULD NOT be converted to views. It is a waste of time to convert an access query that runs perfectly well and performs well to a view.
So your question suggests that you have to convert a “lot” of Access queries to views. This is NEVER the case nor is it a requirement.
So the number of times that you need to convert an access query to a view is going to be quite rare. Most forms are based on a table, or now a linked table to sql server. In these cases you do NOT want to convert to a view, and you don’t need to.
So the answer here is simply run the access query, and then run the view. They are both working on the same data (you are using linked tables, and the access queries will HIT those linked tables and work just fine). If the access client using the linked tables to sql server returns the same number of records as your view on sql server, then as noted, you are 99.8% done, and you are now safe to re-name the access query and link to the view with the same name.
However, the number of times you do this after a migration is VERY low and VERY rare.
You don’t convert any existing access saved query to a view unless running that query in access is taking excessive time.
As noted, for “not too complex” sql queries (ones that don’t involve joins between too many tables), you will not see nor find a benefit by converting that access query to a view. You KEEP and use the existing Access queries as they are.
You only want to spend time on the “few” queries that run slow, the rest do NOT need to be modified.
For an existing form that is bound to a table (or now a linked table), such forms are NOT to be changed to views. They are editing tables directly (or linked tables), and REALLY REALLY need to remain as such. Attempting to convert such forms from a linked table to a view is a HUGE mess, something that will not help performance, but worse, only introduces potentially 100’s of issues and bugs into your existing application, and you would be doing something that is NOT required.
Because you will create so VERY FEW views, an automated approach is not required. You simply make the one query, run it, and then run the existing access query (client side). If they both return the same number of records, then it is a cold day in hell before you need further testing. You are going to be creating one view at a time, and creating these views over a "longer" period of time. You NEVER migrate access and then migrate a large number of queries to sql server. That is NOT how a migration works.
You migrate, link the tables. At that point your application should work fine. You get it working (and not yet have created any views). You get the application deployed. ONLY after you have everything working just fine do you THEN start to introduce views into the mix and picture. As noted, because you are only ever going to introduce ONE view at a time into the application, and this will occur "over time" while users are using the existing and working application, there is little need for some automated approach and test. As I stated, with 200+ access client queries, you should not need to convert more than about 5, maybe 10 to views - the rest are to remain as they are now, untouched and unmodified by you.

Related

Effect of stored procedures on network traffic in Access/SQL setup

I am currently administering/developing an Access 2010 frontend/SQL backend database. We are trying to improve frontend performance, and one solution that has been suggested is pushing a lot of the VBA that is running the front end down into stored procedures on the server. I'm fairly proficient in VBA, but very new to SQL and network architecture. Everything I've turned up on google has been information about splitting the database, which is already done, rather than information about network loads resulting from running stored procedures vs running VBA.
What is the difference in network traffic between the current setup and pushing this action down to a stored procedure?
As a specific example, if I'm populating a form in the current setup, there are a few queries run to provide data to different elements on the form. With the current architecture, does Access retrieve the queried tables from the backend, query them client-side and then populate the data? How would that be different in terms of network traffic from, say, executing a SP when the form loads, and only transferring the data necessary for displaying the form?
The end goal is to reduce the chattiness between Access and SQL, and I'm mostly trying to figure out exactly what is happening where.
As a general rule, if you open a form with a where clause to restrict the form to one record, then using a bound form, or adopting a stored procedure, will NOT result in any difference or reduction in network traffic.
Any local access query based on a table will simply request the one record. There is no “local” concept of processing in this regard EVEN with a linked table. Note the word “table” (singular) here.
Access does not and will not pull down a whole table unless you have such forms and queries without any “where” clause to restrict the data pulled.
In other words, if you have a poorly designed form, and you dump and change that design to something in which you now ONLY pull down the one record, then of course the new setup will result in reduced network traffic.
However the above reduction is NOT DUE to adopting the stored procedure but ONLY that of adopting a design in which you restrict the records requested into the form.
So doing something poorly and then improving that process is NOT a justification to adopt stored procedures.
Thus in the case of pulling records into a form, using a stored procedure will NOT improve performance. Worse, binding a form to a stored procedure results in a form that is READ ONLY anyway!
So stored procedures don’t necessarily increase performance or reduce network traffic when talking about loading a record into a form in terms of response time or performance.
If you have to do large amounts of recordset processing, then of course adopting a stored procedure can save network traffic. So in place of some VBA code to process 100,000 payroll records, then yes, moving such code server side will help. However, processing 100,000 payroll records is NOT a common task and is NOT a user interface issue in most cases anyway. In other words, you don’t have a slow loading form or slow response time to load such forms. Such types of processing are NOT done interactively by users waiting for a form to load.
SQL server is indeed a high performance system, and also a system that can scale to many users.
If you write your application in c++, or VB or in your case with ms-access, in GENERAL the performance of all of these tools will BE THE SAME.
In other words...sql server is rather nice, and is a standard system used in the IT industry.
However, sql server will NOT solve your performance issues without efforts on your part. And, it turns out that MOST of those same efforts also make your non sql server Access applications run better.
In fact, we see many posts that mention moving the back end data to sql server actually slowed things down. (And in fact, on a single machine, Access JET (now called ACE) is actually FASTER THAN SQL server, so with a single user on the same machine, Access is faster than SQL server in most cases.)
A few things:
Having a table with 75k records is quite small. Let’s assume you have 12 users. With just a 100% file-based system (JET), and no sql server, the performance of that system should really have screamed.
I have some applications out there with 50, or 60 HIGHLY related tables. With 5 to 10 users on a network, response time is instant. I don't think any form load takes more than one second. Many of those 60+ tables are highly relational and in the 50 to 75k records range.
So, with my 5 users I see no reason why I can’t scale to 15 users with such small tables in the 75,000 record range. And this is without SQL server.
If the application did not perform with such small tables of only 75k records, then upsizing to sql server will do absolutely nothing to fix performance issues. In fact, in the sql server newsgroups you see weekly posts by people who find that upgrading to sql actually slowed things down.
I have even seen some very cool numbers showing that some queries were actually MORE EFFICIENT in terms of network use with JET than with sql server.
My point here is that technology will NOT solve performance problems. However, good designs that make careful use of limited bandwidth resources are the key here. So, if the application was not written with good performance in mind, then you are kind of stuck with a poor design!
I mean, when using a JET file share, if you grab an invoice from the 75k record table, only the one record is transferred down the network with a file share (and sql server will also only transfer one record). So, at this point, you really will NOT notice any performance difference by upgrading to SQL Server. There is no magic here. And adopting a SQL stored procedure will be an even GREATER waste of time!
And adopting a stored procedure in place of above will NOT gain you performance either!
Sql server is a robust and more scalable product than is JET. And security, backup, and a host of other reasons make sql server a good choice. However, sql server will NOT solve a performance problem when dealing with such small tables as 75k records.
Of course, when efforts are made to utilize sql server, then significant advances in performance can be realized.
I will give a few tips...these apply when using ms-access as a file share (without a server), or even odbc to sql server:
** Ask the user what they need before you load a form!
The above is so simple, but so often I see the above concept ignored. For example, when you walk up to an instant teller machine, does it download every account number and THEN ASK YOU what you want to do?
In access, it is downright silly to open up a form attached to a table WITHOUT FIRST asking the user what they want! So, if it is a customer invoice, get the invoice number, and then load up the form with the ONE record. How can one record be slow? When done editing the record, the form is closed, and you are back to the prompt ready to do battle with the next customer.
You can read up on how this "flow" of a good user interface works here (and this applies to both JET, and sql server applications):
http://www.kallal.ca/Search/index.html
My only point here is to restrict the form to only the ONE record the user needs. You don't need nor gain by using a stored procedure to accomplish this task. I am always dismayed how often a developer builds a nice form, attaches it to a large table, then opens it, throws this form attached to some huge table at the users, and tells them to go have at it and have fun. Don't we have any kind of concern for those poor users? Often, the user will not even know how to search for something!
So prompting and asking the user also makes a HUGE leap forward in usability. And the big bonus is reduced network traffic too! Gosh, better and faster, and less network traffic! What more do we want!
** USE CAUTION with queries that require more than one linked table
JET has a real difficult time joining odbc tables together. Often the Access data engine (jet/Ace) does a good job, but often such joins are slow. However most forms for editing data are NOT based on a multi-table query. (so again, a stored procedure will not speed up form load for editing of data).
The simple solution for such multiple joins (for both forms and reports) is to build the join on the sql server side as a view, and then link to that view.
This view approach is MUCH less work than a stored procedure and results in the joins occurring server side. And the resulting view is updatable, as opposed to READ ONLY when you adopt stored procedures. And performance of such views will again equal that of a stored procedure in THIS context.
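As a minimal sketch of that view idea (the table and column names here are made up; your tables will differ), the joins run server side and you then link to the view from Access, giving it the same name the old client query had:

-- Hypothetical multi-table join pushed server side as a view
CREATE VIEW dbo.vwInvoiceDetails
AS
SELECT i.InvoiceNum,
       i.InvoiceDate,
       c.CustomerName,
       d.ItemCode,
       d.Qty,
       d.Price
FROM dbo.tblInvoices       AS i
JOIN dbo.tblCustomers      AS c ON c.CustomerID = i.CustomerID
JOIN dbo.tblInvoiceDetails AS d ON d.InvoiceNum = i.InvoiceNum;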
So once again, adopting stored procedures DOES NOT help here and is more expensive in developer cost than simply using a view. Really this just amounts to people suggesting that you rack up bills and use developer time to create something that yields nothing over that of a view except more billable hours.
I don't think it needs pointing out that if the query in question already runs well, then the above can be ignored, but just keep in mind that local queries with more than one table based on links to sql server can often run slow. So, just be aware of the above.
This view trick also applies well to combo boxes.
So one can continue to use bound forms to a linked table but one simply needs to restrict the form to the ONE RECORD you need.
You can safely open up to a single invoice form etc. but simply ENSURE you open such forms (openForm) by restricting records via the "where" clause. No view, or stored procedure is required here.
Bound forms are way less work than un-bound forms, and performance is generally just as good anyway when done right.
Avoid large loading of combo boxes. A combo box is good for about 100 entries. After that you are torturing the user (what, they've got to look through 100s of entries?). So, keep things like combo boxes down to a minimum size. This is both faster and, MORE importantly, it is kinder to your users.
After all, at the end of the day what we really want is to treat users well. It seems that treating the users well, and reducing the bandwidth (amount of data) goes hand in hand.
So, better applications treat the users well and run faster! (this is good news!)
So, #1 tip is to reduce the data that you transfer into a form.
Using stored procedures is not required in the vast majority of cases and will not reduce bandwidth requirements any more than adopting where clauses and views.

How to copy entire SQL Server 2008 database, applying WHERE clause to restrict copied data

To allow more realistic conditions during development and testing, we want to automate a process to copy our SQL Server 2008 databases down from production to developer workstations. Because these databases range in size from several GB up to 1-2 TB, it will take forever and not fit onto some machines (I'm talking to you, SSDs). I want to be able to press a button or run a script that can clone a database - structure and data - except be able to specify WHERE clauses during the data copy to reduce the size of the database.
I've found several partial solutions but nothing that is able to copy schema objects and a custom restricted data without requiring lots of manual labor to ensure objects/data are copied in correct order to satisfy dependencies, FK constraints, etc. I fully expect to write the WHERE clause for each table manually, but am hoping the rest can be automated so we can use this easily, quickly, and frequently. Bonus points if it automatically picks up new database objects as they are added.
Any help is greatly appreciated.
Use snapshot replication with conditions (filters) on the tables. That way you will get your schema and data replicated whenever needed.
This article describes how to create a merge replication, but when you choose snapshot replication the steps are the same. The most interesting part is Step 8: Filter Table Rows, of course, because with this you can filter out all the unnecessary data so it does not get replicated. But this step needs to be done for every entity, and if you've got hundreds of them, then you'd better analyze how to do that programmatically instead of going through the wizard windows.
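If you do end up scripting it, a rough sketch of adding one filtered article to an existing snapshot publication might look like the following (the publication name, table, and filter here are all made up; you would repeat this per table, or drive it from your own table list instead of the wizard):

-- Hypothetical: one horizontally filtered article on an existing snapshot publication
EXEC sp_addarticle
     @publication   = N'DevCopyPublication',
     @article       = N'Orders',
     @source_owner  = N'dbo',
     @source_object = N'Orders',
     @filter_clause = N'OrderDate >= DATEADD(month, -3, GETDATE())';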

Database locked (or slow) with Linq To Entities and Stored Procedures

I'm using Linq To Entities (L2E) ONLY to map all my Stored Procedures in my database for easy translation into objects. My data is not really sensitive (so I'm considering isolation level "READ UNCOMMITTED" everywhere). I have several tables with millions of rows. I have the website and a bunch of scripts utilizing the same data model created using Entity Framework. I have indexed all tables (max 3 indexes for each table) so that every filter I use is directly covered by an index. My scripts mainly consist of:
1) Get data from the DB (~5 seconds)
2) Make API calls (1-3 seconds)
3) Add the results to the database
I have READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION set to ON.
Using this strategy, most of my queries are fixed and very fast (usually). I still have some queries used by my scripts that can run up to 20 seconds, but they are not called that often.
The problem is that suddenly the whole database gets slow and all my queries are returned slowly (can be over 10 seconds). Using Sql Profiler I have tried to find the issue.
As mentioned, I'm considering NOLOCK / "READ UNCOMMITTED"... Right now I'm going through each possible database call and adding indexes and/or caching tables to make the calls faster.
I have also considered removing L2E and accessing the data the "old way" to be sure that's not the issue. Is my data context locking my tables? It sure looks that way. I have experimented with having the context live across the API call to minimize the number of contexts created, but right now I create a new context for each call since I thought it was locking the database.
The problem is that I cannot ensure that every single call is fast, for all eternity; otherwise the whole system gets slowed down.
When I restart sql server and rebuild indexes it's really fast for a short period of time before everything gets slow again. Any pointers would be appreciated.
Have you considered checking whether there are any major waits on the server?
Review the following page on sys.dm_os_wait_stats (Transact-SQL) and see if you can get any insight into why the server is slow....
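For example, something along these lines will list the biggest waits accumulated since the last restart (or since the wait stats were last cleared):

-- Top waits by total wait time
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE N'SLEEP%'
ORDER BY wait_time_ms DESC;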

Copy Multiple Tables into ONE Table (From Multiple Databases)

I've got multiple identical databases (distributed on several servers) and need to gather them to one single point to do data mining, etc.
The idea is to take Table1, Table2, ..., TableN from each database and merge them and put the result into one single big database.
To be able to write queries, and to know from which database each row came, we will add a single column DatabaseID to the target table, describing where the row came from.
Editing the source tables is not an option; they belong to some proprietary software.
We've got ~40 servers, ~170 databases and need to copy ~40 tables.
Now, how should we implement this given that it should be:
Easy to setup
Easy to maintain
Preferably easy to adjust if database schema changes
Reliable, logging/alarm if something fails
Not too hard to add more tables to copy
We've looked into SSIS, but it seemed that we would have to add each table as a source/transformation/destination. I'm guessing it would also be quite tied to the database schema. Right?
Another option would be to use SQL Server Replication, but I don't see how to add the DatabaseID column to each table. It seems it's only possible to copy data, not modify it.
Maybe we could copy all the data into separate databases, and then to run a local job on the target server to merge the tables?
It also seems like a lot of work if we'd need to add more tables to copy, as we'd have to redistribute new publications for each database (manual work?).
Last option (?) is to write a custom application to our needs. Bigger time investment, but it'd at least do precisely what we'd like.
To make it worse... we're using Microsoft SQL Server 2000.
We will upgrade to SQL Server 2008 R2 within 6 months, but we'd like the project to be usable sooner.
Let me know what you guys think!
UPDATE 20110721
We ended up with an F# program opening a connection to the SQL Server where we would like the aggregated databases. From there we query the 40 linked SQL Servers to fetch all rows (but not all columns) from some tables, and add an extra column to each row to say which DatabaseID the row came from.
Configuration of servers to fetch from, which tables and which columns, is a combination of text file configuration and hard coded values (heh :D).
It's not super fast (sequential fetching so far) but it's absolutely manageable, and the data processing we do afterwards takes far longer time.
Future improvements could be to:
improve error handling if it turns out to be a problem (if a server isn't online, etc).
implement parallel fetching, to reduce the total amount of time to finish fetching.
figure out if it's enough to fetch only some of the rows, like only what's been added/updated.
All in all it turned out to be quite simple, no dependencies to other products, and it works well in practice.
Nothing fancy but couldn't you do something like
-- Assumes dbo.Merged already exists and ServerA/ServerB/ServerX are linked servers;
-- with linked servers you need full four-part names (Server.Database.Schema.Table),
-- so the Database1/Database2/DatabaseX database names below are placeholders
TRUNCATE TABLE dbo.Merged

INSERT INTO dbo.Merged
SELECT [DatabaseID] = 'Database1', * FROM ServerA.Database1.dbo.[Table]
UNION ALL SELECT [DatabaseID] = 'Database2', * FROM ServerB.Database2.dbo.[Table]
...
UNION ALL SELECT [DatabaseID] = 'DatabaseX', * FROM ServerX.DatabaseX.dbo.[Table]
Advantages
Easy to setup
Easy to maintain
Easy to adjust
Easy to add more tables
Disadvantages
Performance
No reliable logging/alerting if something fails
We had a similar requirement where we took a different approach. We first created a central database to collect the data. Then we created an inventory table to store the list of target servers / databases. Then we wrote a small VB.NET based CLR procedure which takes the SQL query, the target SQL instance name, and the target table which will store the data (this eliminates the need to set up a linked server when new targets are added). It also adds two additional columns to the result set: the target server name and the timestamp when the data was captured.
Then we set up a Service Broker queue/service and pushed the list of target servers to interrogate.
The above CLR procedure is wrapped in another procedure which dequeues the message, executes the SQL on the target server provided. The wrapper procedure is then configured as the activated procedure for the queue.
With this we are able to achieve a bit of parallelism to capture the data.
Advantages :
Easy to set up
Easy to manage (add / remove targets)
Same framework works for multiple queries
Logging tables to check for failed queries.
Works independently of each target, so if one of the targets fails to respond, the others still continue.
The workflow can be paused gracefully by disabling the queue (for maintenance on the central server) and collection resumed by re-enabling it.
Disadvantage:
Requires a good understanding of Service Broker.
Poison messages should be handled properly.
Please Let me know if it helps

Migrate and Merge several databases into one

In an update project I have to do the following:
Move 3 databases from SQL2000 to SQL2005 and merge them at the same time. There are already quite a few cross database queries used in SP's and Views.
The current plan is to move each of the old databases into a separate schema in 1 database.
That means we will also have to change our current SP's and Views, we now have:
SELECT OrderId, OrderDate FROM Sales.dbo.Orders
and expect we will have to change that into
SELECT OrderId, OrderDate FROM Sales.Orders
The question is: how do we do that as automated as possible?
I know about SED and similar for changing the scripts. I would welcome tips about how to be 'smart' about this, like strategies for partitioning the scripts, performance (tons of INSERT INTO lines) etc.
Note: I did look at the Import/Export Wizard but apparently I would have to set the Schema manually on each output table and fix the SP's through ALTER scripts anyway.
I did this a couple of years ago, and I ran into a few problems that you want to be aware of.
Assumptions:
You've got a single SQL 2000 database server with 3 databases, A/B/C
You want all of the objects to end up in SQL 2005 in database A (we'll refer to that as the Target)
You want to get rid of databases B and C eventually (the old Sources)
You don't have a full-blown test environment where you can automatically restore your production databases every day, and script this again and again until it's right. (That's the best way, and I've taken that approach too, but it's labor-intensive.)
Here's my hard lessons learned:
Don't do the merge and the SQL 2005 change the same day. Either do the merge before you go to 2005, or after, but don't try to accomplish it all in a single outage. It'll be a finger-pointing mess. If it was me, I'd go to 2005 first just to get it out of the way. That way, I know anything that breaks isn't because of a schema change, and those types of breaks are easier to fix. You want at least a week of end user activity on the 2005 box before you declare victory and move on to the merge.
Build the new objects in Target ahead of time. Even if they're not being queried in your live production apps, go ahead and build 'em now. That way you can populate fake test data in there to test your applications ahead of time. Yes, this means mixing live and test data, but frankly, you're already out there working without a net. Be wary of identity fields, though, since you can end up with conflicting records with the same identity number but different data in the Target and Source databases.
Create views in Target ahead of time. You mentioned that you've got views that already do cross-database queries. Copy those from Source to Target now, and tell any other developers (report guys, power users) to start referring to the Target views instead. This isn't going to speed up your own work, but it speeds up THEIR work. If you can get to the point where you can verify that they're only hitting Target (even though the Target views still point to tables in Source) then it'll make troubleshooting easier on migration day. Then you can start denying permissions on the Source views ahead of time.
Sync tables ahead of time. Make a list of all of the tables that need to be moved out of the Sources, and for each one, analyze how it's being updated. If it's only being inserted into (not updated or deleted), like a log table, then write a T-SQL script to start keeping it in sync in Target. Run that script via a SQL Agent job during periods of low activity on your server, like nightly. This way, when it's go-live day, you won't have to push as many records around, meaning your go-live window will be smaller and your Target transaction logs can stay smaller. Tables that are being constantly updated or deleted aren't as easy, and it's up to you whether you decide to sync those as well. We did it for any tables over a million lines.
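As a rough sketch of that kind of nightly sync for an insert-only log table (the table, column, and database names here are hypothetical, and it assumes the key only ever grows):

-- Copy only the rows the Target hasn't seen yet; safe to re-run nightly from a SQL Agent job
-- (if LogID is an IDENTITY column in the Target, wrap this in SET IDENTITY_INSERT ON/OFF)
INSERT INTO TargetDB.dbo.AppLog (LogID, LoggedAt, LogMessage)
SELECT s.LogID, s.LoggedAt, s.LogMessage
FROM SourceB.dbo.AppLog AS s
WHERE s.LogID > (SELECT ISNULL(MAX(t.LogID), 0) FROM TargetDB.dbo.AppLog AS t);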
Check for record conflicts between the Source databases. It sounds like this one doesn't apply to you specifically, but I'm noting it here in case anybody else does a merge and is reading this for tips. If you have more than one Source database, dump out the list of objects. If you've got two objects with the same name, check their schema. I've worked with instances where they had a State or Region table in each database, and they were supposed to be identical, but they had identity fields for their primary keys. Each child table (like Customers, which linked to a Region table) referred to the parent table (Region) by the primary key (identity field) - which didn't match from one database to the other. In that case, the smart thing to do is take an outage window ahead of time, before the migration day, to clean those records up with manual update scripts, along the lines of the list (and the sketch after it) below:
Disable any constraints or foreign key relationships
Change the identity fields (if they're lookup tables, you may be able to turn off the identity stuff and just run with manually specified pk numbers)
Modify the Region table to add a NewID field, matching to what it's going to become, and an OldID field, showing what it used to be
Update all of the child tables (Customers) to use the NewID number instead of the original
Update the Region table so that the real ID field now has the NewID value, and the OldID field has what the Region used to be. (You're probably going to screw something up like miss a child table you didn't know about, and you're going to wonder what it used to be.)
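As a sketch of that NewID/OldID shuffle on one hypothetical conflicting lookup table (Region) with one child table (Customers); it assumes constraints are already disabled and the IDENTITY property has been removed from Region.RegionID, as described above:

-- Add the bookkeeping columns (GO so the next batch sees the new columns)
ALTER TABLE dbo.Region ADD NewID int NULL, OldID int NULL;
GO

-- Remember what each ID used to be, and decide what it will become
-- (here: offset this Source's IDs by 1000, assuming that avoids collisions)
UPDATE dbo.Region SET OldID = RegionID, NewID = RegionID + 1000;

-- Repoint the child table at the new numbers
UPDATE c
SET    c.RegionID = r.NewID
FROM   dbo.Customers AS c
JOIN   dbo.Region    AS r ON r.RegionID = c.RegionID;

-- Finally move the parent table itself onto the new numbers
UPDATE dbo.Region SET RegionID = NewID;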
Break the migration into pieces. List every stored proc in all of the databases. If any of them can be moved without moving data, do that first. For example, if you've got Source.dbo.usp_RunReport, and it only refers to tables in the Target database, then do that in a first phase. If you've got small system lookup tables that are only used internally in your app, not visible to customers or reports, then put that in the first phase too. It sounds like it's too small to bother with, but the idea is to reduce the amount of panic on migration day. The less you wonder about, the better you can troubleshoot. We moved every static lookup table (State, Region, Calendar, etc) over ahead of time. The amount of work required in Phase 1 - just moving those small, static tables - got management to understand how huge it was going to be to move the rest, and it bought us resources and time we wouldn't have gotten otherwise.
Pre-grow the data files for Target. If you're not using SQL 2005's new Instant File Initialization, data file growths take quite a while. Enable Instant File Initialization if you've got a choice, then grow the data files to make sure they're not fragmented. If they just grow naturally during your migration day, they can be fragmented. If you can't use Instant File Initialization, you still need to pre-grow the files, but you want to do that ahead of time during periods of low activity to speed up the maintenance window.
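A pre-grow might look something like this (the database and logical file names are made up; check sys.master_files for your real logical names):

-- Grow the Target data file ahead of time, during a quiet period
ALTER DATABASE TargetDB
MODIFY FILE (NAME = N'TargetDB_Data', SIZE = 500GB, FILEGROWTH = 10GB);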
On migration day, run your inserts one table at a time, or in smaller batches. You want to keep your insert transactions as tight as possible: the smaller your insert transactions, the less space you'll need in the transaction log. Remember that the transaction log will grow with insert statements even in simple mode. After every round of inserts, do a sanity check to make sure that they worked, and that you're not going to run out of drive space for data files or t-log files.
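For example, a migration-day copy of one table might be batched something like this (all names hypothetical, SQL 2005 syntax; handle SET IDENTITY_INSERT if the key is an identity column):

-- Copy dbo.Orders from the Source database into the Target in 50,000-row chunks
DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO TargetDB.dbo.Orders (OrderID, OrderDate, CustomerID)
    SELECT TOP (50000) s.OrderID, s.OrderDate, s.CustomerID
    FROM SourceB.dbo.Orders AS s
    WHERE NOT EXISTS (SELECT 1 FROM TargetDB.dbo.Orders AS t
                      WHERE t.OrderID = s.OrderID);

    SET @rows = @@ROWCOUNT;

    -- sanity check point: verify counts and watch data / log file free space here
END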
After the updates finish, change security on the Source databases. Put every non-SA login into the db_denydatareader and db_denydatawriter roles in the Source databases. That way they can still log in if they've hard-coded the database name in the connection string, but they won't be able to do anything. This makes your troubleshooting easier too: if an app or a query runs into problems, you could consider taking their login out of the deny roles and see if it works - if it does, it's borked. The risk with that is that they might run a transaction that uses the Source database data to update the Target database (get customers from Source, update them in Target) and it might cause issues.
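Something like this, per non-SA user, in each Source database (the user name is hypothetical; sp_addrolemember is the SQL 2005-era way to do it):

USE SourceB;
EXEC sp_addrolemember N'db_denydatareader', N'AppUser';
EXEC sp_addrolemember N'db_denydatawriter', N'AppUser';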
Other options for the Source databases are:
Rename them, so you can still query 'em but the apps won't touch 'em
Detach them, but keep the files available in case you need to troubleshoot
Strip out all logins, and use new logins to access the existing databases just in case. Then if somebody's read-only report is totally borked, you can let it work temporarily by issuing them a new login and telling them it's referring to the wrong database.
After the updates finish, rebuild indexes & statistics on Target. If you're just doing continuous inserts, this isn't a big deal, but if you're merging multiple databases (like two Sales databases that had been broken up into regions of the country) then you'll want to clean things up.
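For example (SQL 2005 syntax, hypothetical table name):

-- Rebuild all indexes on one of the big merged tables, then refresh statistics
ALTER INDEX ALL ON dbo.Customers REBUILD;
EXEC sp_updatestats;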
IMHO, use one schema unless you can justify a gain from multiple schemas. This last one is just my two cents, but it sounds like you're going through an awful lot of work to go from 3 databases 1 schema each, to 1 database with 3 schemas. If you're not really sure about the 3 schema thing, you might consider using 1 schema - or else you'll be in another messy rework later on down the road. 3 schemas does make sense if you have specific security needs, but otherwise, just make sure you're getting the bang for the buck that you want. Now would be a great time to go to one schema.
You could give Redgate SQL Compare and Data Compare a shot. They have a schema mapping feature that should let you map the dbo schema in one database to the sales schema in another, and then move the tables and procs. It would mean you don't have to mess with the SQL export wizard. You still would have to refactor your other objects, though.
I love these two tools.
edit:
I think you can get a fully functional demo too.
edit:
Additionally, they offer SQL Refactor, which does a 'smart' rename. Score!
Could you have a dummy database called SALES that has a VIEW called [Orders]:
-- CREATE VIEW does not allow a database prefix, so create it from inside the Sales database
USE Sales
GO
CREATE VIEW dbo.Orders
AS
SELECT OrderId, OrderDate, ...
FROM CombinedDatabase.Sales.Orders
and then
SELECT ... FROM Sales.dbo.Orders
will still work.
You won't be able to INSERT / UPDATE that table without some further jiggery-pokery though.
If you could have such VIEWs log that they were used that would enable you to fix the code that called them!! but I can't think of a way to do that; however you could disable each in turn, run some tests, fix whatever is broken, then move on to next one ... and thus eradicate them by refactoring, but have a largely working application during the process.
I've used SED for this type of thing, but we have unique names for all our tables and all our columns, and we use variable names within our application that match the database column names - so I would have high confidence that changing xxx_yyy_ID to aaa_bbb_ID in our application would work well, and not have accidental side effects.
If you have actual column/table names like "Sales" and "Orders" I think that something like SED would be risky
Ok, so my basic understanding of your problem is something like this:
You have three different databases (i.e. Sales, Manu, Inventory)
They have distinct table & procedure names (no table/proc names in Sales exist in Manu or Inventory)
You want all the tables/procs from all three databases in a single database (i.e. SaleManInv)
Some stored procedures in each database explicitly refer to tables in the other databases (i.e. Sales.dbo.lookupItem() explicitly refers to Inventory.dbo.Items table)
Exporting and importing the tables doesn't seem like it will be a problem, what I would do for the procs:
Export one proc from the SQL Server 2000 db to the SQL Server 2005 DB to determine if you need to get rid of the ".dbo." portion of the cross references.
Export all the procs to text files (same folder for all procs)
Use a text editor with a "Search and Replace in Files" feature (I use PSPad) and replace all the "Sales.dbo." with "SaleManInv.dbo.", then all the "Inventory.dbo." with "SaleManInv.dbo.", etc. to convert all the references to the new db.
Then run the exported and modified procs into your new db.
Is that making any sense? :-)
I was in a similar position where I had several SQL Server 2008 databases that were merged into 1. My solution was to use Integration Services' Transfer Server Objects task into a new target database. All data was copied over along with tables. Afterwards - in what was a very complex query, I scripted out all stored procedures/functions/views/etc. to a file and changed all cross-database references and re-created the stored procedures and other objects.
The trick with the stored procedures was to script them out in the order of sysconstraints, in order to ensure that stored procedures or functions that referenced other stored procedures/functions internally were created last.
If there was a tool that I felt could have handled this task in an automated fashion, I would have purchased it immediately.
I would like to know if it's the same kind of data. Anyway, I would create a new column with the name 'SourceSystem', so when the boss comes running and asks:
"What was the sales difference between database system 1 and db2 in 2004?"
then you can answer that. Then in a year or two, if those questions don't pop up, you can delete that column. Merging data removes the origin of the data.
