I maintain a very old data acquisition system which uses a legacy FlashFiler2 database. One of my customers would like the database tables to be mirrored into a SQL Server database for easier post-processing.
There are two types of tables:
Measured values: these tables get new timestamped data every day and my strategy would be to record the timestamp of the last data record that I have already mirrored for every measuring point and only add the new ones to the target database.
Mostly static tables: these tables rarely change and the records don't bear a timestamp. Maybe think of a customers table that rarely gets new entries and the existing records are changed very rarely.
To handle case 2 by brute force, I would have to either clear the target table and recreate it every day or compare each and every record for changes, detect deleted records and add new ones.
What is an efficient way to accomplish this task?
In a related post I found the idea to create MD5 hashes of the target records. The hash could then be used to compare records for changes. Of course I would still have to check for added and deleted records. Would this be worth the effort or should I go with one of the brute force methods?
My tools are: Visual Studio 2017, C# with the ADO.NET provider for SQL Server and FlashFiler2 Delphi components.
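Something like this is what I have in mind for the hash idea - a rough sketch against a hypothetical Customers table in the target, using HASHBYTES (which supports MD5); the C# side would have to build exactly the same concatenation from the FlashFiler2 rows so the hashes are comparable:

SELECT CustomerId,
       HASHBYTES('MD5',
           ISNULL(Name, '') + '|' + ISNULL(City, '') + '|' + ISNULL(Phone, '')) AS RowHash
FROM dbo.Customers;

Changed records would show up as key matches with different hashes; keys present on only one side would be the inserts and deletes.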
I ended up with the following:
1) for all tables that contain rarely changing data I chose to recreate them completely. The performance is indeed better than I had thought.
2) for the measured values tables I chose a configurable lookback period from which onward the data are copied chronologically to the target database. This is mostly redundant, because the lookback is intentionally chosen to include data that have already been transferred. It is necessary, however, because at times data that have already been transferred need to be corrected in the source. In my case those corrections usually occur within one or two months, so a lookback of 3 months is usually enough.
The main performance bottleneck now seems to be my record-by-record insertion into the target database. I guess there are batch routines for this, but that would be a topic for another post.
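As a pointer for that follow-up: one route would be a table-valued parameter that the C# side fills and sends in a single round trip (SqlBulkCopy would be another). A minimal sketch with hypothetical table and column names:

CREATE TYPE dbo.MeasurementRow AS TABLE
(
    PointId    int      NOT NULL,
    MeasuredAt datetime NOT NULL,
    Value      float    NOT NULL
);
GO
CREATE PROCEDURE dbo.InsertMeasurements
    @Rows dbo.MeasurementRow READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Measurements (PointId, MeasuredAt, Value)
    SELECT PointId, MeasuredAt, Value
    FROM @Rows;
END;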
I have inherited a custom built tool that is poorly designed and unstable, and I have a great opportunity to rebuild it from scratch. This is an internal tool only that works almost entirely in Access, and its purpose is to provide higher detail on parts that cost the company over a certain dollar amount.
How it works:
1) The raw data (new part numbers) gets pulled nightly from the EDW via macros in Access.
2) The same macros then join two tables (part numbers from one, names from another). Any part under a certain dollar amount is removed, and the new data is appended to the existing Access database.
3) During the day employees can then open a custom Access form to add more details about the part. Different questions are asked depending on the part category.
4) The completed form is forwarded to management, and the information entered is retained in the Access database – it does not write back to the EDW.
5) Managers can also pull some basic reports from the database, based on overall costs.
The problems:
1) Currently everyone has to have Access installed on their work stations, and whenever there is an update the new database gets pushed to their stations. This is not considered an ideal situation by management or IT.
2) If anyone has accidentally left the tool open at the end of the day, the database is locked, so the macros cannot run and the tool cannot be updated with new part numbers.
3) If the tool cannot update for a few days in a row, the database can become corrupted. We can restore from the last good backup, but in the past this has resulted in the loss of multiple days of work.
Ideally we want to take the tool completely out of Access. I am building a SharePoint site that can host the tool, which (if I can get it right) will eliminate the need for Access on the end-user stations and the database push. However, the SharePoint form would need read/write ability.
The big question is: How do I build this?
I have a completely open path of possibilities – I can design it to work any way I want, using any tools or platform I want, as long as it works. It does not have to update automatically, as I already run a number of SQL scripts at the start of my day and adding one more is inconsequential.
The resources I have at my immediate disposal are: SharePoint (with designer), Access, Toad, and SQL Server. The database can be hosted on a shared network drive.
I am a recent college graduate with basic SQL knowledge. I have about a year to produce a final product, but would like to get it up and running far sooner if possible.
Any advice on what direction to pursue would be very helpful, thank you.
Caveat: I've never worked with SQL Server, so I don't know all of its capabilities (I'm an Oracle developer).
What I'd do in your situation is something like the following (although not necessarily in this exact order):
Get a SQL Server database set up to host your tables.
Create the tables, etc.
Migrate test data across (I'm assuming you have a dev/uat/test environment for your current system! If you haven't, make sure you set up at least a test environment separate from prod for your new db!)
Write stored procs to do the work for adding new parts, updating existing data, etc etc
Set up an automated job on the db (I'm assuming SQL Server can do this!) to do the overnight processing.
Create a separate db user with the necessary permissions to call the stored procedures
Get your frontend to call the stored procs with the relevant parameters using the db user you created in step 6 to connect to the db.
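For steps 4 and 6 I'd expect something along these lines - a sketch only, with made-up login, database and object names (and remember I'm coming from Oracle, so check the details):

CREATE LOGIN PartsToolApp WITH PASSWORD = 'choose-a-strong-password';
GO
USE PartsToolDB;
GO
CREATE USER PartsToolApp FOR LOGIN PartsToolApp;
GO
CREATE PROCEDURE dbo.AddPartDetail
    @PartNumber varchar(50),
    @Detail     varchar(500)
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.PartDetails
    SET    Detail = @Detail
    WHERE  PartNumber = @PartNumber;
END;
GO
GRANT EXECUTE ON dbo.AddPartDetail TO PartsToolApp;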
You'd also have to think about transaction control to try and mitigate the case where users go home at the end of the day without committing their work - does the db handle the commits/rollbacks or does SharePoint?
Once you've worked out everything in your test environment, it's then a case of creating the prod db, users and objects, and then working out the best way of migrating the prod data across.
Good luck.
Don't forget to get backups for the new db set up as well.
I did read posts about transactional and reporting databases.
We have a single table which is used for both reporting (historical) and transactional purposes,
e.g. an order table with the fields:
orderid, ordername, orderdesc, datereceived, dateupdated, confirmOrder
Is it a good idea to split this table into neworder and orderhistory?
The neworder table would record the current day's transactions (select, insert and update activity every millisecond) for the orders received on that day. Later we would merge this table into the order history.
Is this a recommended approach?
Do you think this would minimize the load and processing time on the database?
PostgreSQL supports basic table partitioning, which allows splitting what is logically one large table into smaller physical pieces. More info is provided in the PostgreSQL documentation.
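As a rough illustration, declarative range partitioning (available in newer PostgreSQL versions) on the order table from the question might look like this - the column types are guesses:

CREATE TABLE orders (
    orderid      bigint,
    ordername    text,
    orderdesc    text,
    datereceived date NOT NULL,
    dateupdated  date,
    confirmorder boolean
) PARTITION BY RANGE (datereceived);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');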
To answer your second question: No. Moving data from one place to another is an extra load that you otherwise wouldn't have if you used the transactional table for reporting. But there are some other questions you need to ask before you make this decision.
How often are these reports run?
If you are running these reports once an hour, it may make sense to keep them in the same table. However, if this report takes a while to run, you'll need to take care not to tie up resources for the other clients using it as a transactional table.
How up-to-date do these reports need to be?
If the reports are run less than daily or weekly, it may not be critical to have up to the minute data in the reports.
And this is where the reporting table comes in. The approaches I've seen typically involve having a "data warehouse," whether that be implemented as a single table or an entire database. This warehouse is filled on a schedule with the data from the transactional table, which subsequently triggers the generation of a report. This seems to be the approach you are suggesting, and is a completely valid one. Ultimately, the one question you need to answer is when you want your server to handle the load. If this can be done on a schedule during non-peak hours, I'd say go for it. If it needs to be run at any given time, than you may want to keep the single-table approach.
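As a sketch of such a scheduled fill, using the order table from the question (the history table and the daily window are hypothetical):

INSERT INTO orderhistory (orderid, ordername, orderdesc, datereceived, dateupdated, confirmorder)
SELECT orderid, ordername, orderdesc, datereceived, dateupdated, confirmorder
FROM   neworder
WHERE  datereceived >= CURRENT_DATE - 1
  AND  datereceived <  CURRENT_DATE;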
Of course there is nothing saying you can't do both. I've seen a few systems that have small on-demand reports run on transactional tables, scheduled warehousing of historical data, and then long-running reports against that historical data. It's really just a matter of how real-time you want the data to be.
I was looking at godaddy.com, which says they offer up to 10 MySQL DBs, but I don't know why you would ever need more than one, since a DB can have multiple tables. Can't multiple DBs be integrated into a single DB? Is there an example case where it's better, or even necessary, to have multiple ones? And how do you differentiate between them when you want to call them - by their directory or by name?
I guess separation of concerns would be the most obvious answer. Just as you wouldn't put all of your functionality into one humongous class in object-oriented programming, it's a good idea to keep unrelated information separate. It's easier to wrap your head around smaller chunks of data, and future developers might start to think tables are related and aggregate data in ways that were never intended.
Imagine that you're doing two different projects with two different teams. Maybe you don't want one team to access the other team's tables.
There can also be a space limit on each database, and each one can be configured with specific parameters to optimize performance.
On the other hand, a different end user can be assigned to back up each entire database, and you wouldn't want one user to back up the other DB, because he would then be able to restore it somewhere else and access the other database's data.
I'm sure there are some pretty good DBAs on the forum who can answer this in detail.
Storing tables in different databases makes sense because you are able to back them up individually. Furthermore, you will be able to control access to each database under different NT groups (e.g. admins vs. users). Although this can be done at the individual table level, sometimes it makes sense to grant or deny access to an entire database to a particular group.
When you need to call them in SQL Server you need to append the database name to the query, like this: SELECT * FROM [MyDatabase].[dbo].[MyTable].
One other reason to use separate databases relates to whether you need full transactional recovery or not. For instance, if I have a bunch of tables that are populated on a schedule through import processes and never by the users, putting them in a separate database allows me to set the recovery model to simple, which reduces the logging (a good thing when you are loading millions of records at once). I also don't have to do transaction log backups every fifteen minutes like I do for the database with the user-inserted data. It can also make recovery a faster process when needed, as the databases would be smaller and thus individually take less time to recover. That won't help much when the whole server crashes, but it could help a lot if only one database gets corrupted for some reason. If the data relates to different applications, having it in separate databases simplifies security as well. And of course sometimes we have commercial databases and we can't add tables to those, so we may need a separate database to handle the things we want to add to that data (we do this, for instance, with our Project Management software: we have a separate database where we extract and summarize data from the PM system for reporting, and then write all our custom reports off that).
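For example, with a hypothetical ImportStaging database that only receives scheduled loads, and a UserData database that keeps full logging:

ALTER DATABASE ImportStaging SET RECOVERY SIMPLE;
ALTER DATABASE UserData SET RECOVERY FULL;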
In an update project I have to do the following:
Move 3 databases from SQL2000 to SQL2005 and merge them at the same time. There are already quite a few cross database queries used in SP's and Views.
The current plan is to move each of the old databases into a separate schema in 1 database.
That means we will also have to change our current SP's and Views, we now have:
SELECT OrderId, OrderDate FROM Sales.dbo.Orders
and expect we will have to change that into
SELECT OrderId, OrderDate FROM Sales.Orders
The question is: how do we do that as automated as possible?
I know about SED and similar for changing the scripts. I would welcome tips about how to be 'smart' about this, like strategies for partitioning the scripts, performance (tons of INSERT INTO lines) etc.
Note: I did look at the Import/Export Wizard but apparently I would have to set the Schema manually on each output table and fix the SP's through ALTER scripts anyway.
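For reference, the per-table schema work in the merged database would presumably be something like this (names are from our Sales example; the merged database name is made up), once a table has been imported into dbo:

USE MergedDB
GO
CREATE SCHEMA Sales AUTHORIZATION dbo
GO
ALTER SCHEMA Sales TRANSFER dbo.Orders
GO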
I did this a couple of years ago, and I ran into a few problems that you want to be aware of.
Assumptions:
You've got a single SQL 2000 database server with 3 databases, A/B/C
You want all of the objects to end up in SQL 2005 in database A (we'll refer to that as the Target)
You want to get rid of databases B and C eventually (the old Sources)
You don't have a full-blown test environment where you can automatically restore your production databases every day, and script this again and again until it's right. (That's the best way, and I've taken that approach too, but it's labor-intensive.)
Here are my hard lessons learned:
Don't do the merge and the SQL 2005 change the same day. Either do the merge before you go to 2005, or after, but don't try to accomplish it all in a single outage. It'll be a finger-pointing mess. If it was me, I'd go to 2005 first just to get it out of the way. That way, I know anything that breaks isn't because of a schema change, and those types of breaks are easier to fix. You want at least a week of end user activity on the 2005 box before you declare victory and move on to the merge.
Build the new objects in Target ahead of time. Even if they're not being queried in your live production apps, go ahead and build 'em now. That way you can populate fake test data in there to test your applications ahead of time. Yes, this means mixing live and test data, but frankly, you're already out there working without a net. Be wary of identity fields, though, since you can end up with conflicting records with the same identity number but different data in the Target and Source databases.
Create views in Target ahead of time. You mentioned that you've got views that already do cross-database queries. Copy those from Source to Target now, and tell any other developers (report guys, power users) to start referring to the Target views instead. This isn't going to speed up your own work, but it speeds up THEIR work. If you can get to the point where you can verify that they're only hitting Target (even though the Target views still point to tables in Source) then it'll make troubleshooting easier on migration day. Then you can start denying permissions on the Source views ahead of time.
Sync tables ahead of time. Make a list of all of the tables that need to be moved out of the Sources, and for each one, analyze how it's being updated. If it's only being inserted into (not updated or deleted), like a log table, then write a T-SQL script to start keeping it in sync in Target. Run that script via a SQL Agent job during periods of low activity on your server, like nightly. This way, when it's go-live day, you won't have to push as many records around, meaning your go-live window will be smaller and your Target transaction logs can stay smaller. Tables that are being constantly updated or deleted aren't as easy, and it's up to you whether you decide to sync those as well. We did it for any tables over a million lines.
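A sketch of what one of those sync scripts might look like for an insert-only table with an ascending key (all names hypothetical; if the key is an identity column in Target, wrap this in SET IDENTITY_INSERT):

INSERT INTO TargetDB.dbo.AuditLog (AuditLogId, LoggedAt, Message)
SELECT s.AuditLogId, s.LoggedAt, s.Message
FROM   SourceB.dbo.AuditLog AS s
WHERE  s.AuditLogId > (SELECT ISNULL(MAX(t.AuditLogId), 0)
                       FROM   TargetDB.dbo.AuditLog AS t);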
Check for record conflicts between the Source databases. It sounds like this one doesn't apply to you specifically, but I'm noting it here in case anybody else does a merge and they're reading it for tips. If you have more than one Source database, dump out the list of objects. If you've got two objects with the same name, check their schema. I've worked with instances where they had a State or Region table in each database, and they were supposed to be identical, but they had identity fields for their primary keys. Each child table (like Customers, which linked to a Region table) referred to the parent table (Region) by the primary key (identity field) - which didn't match from one database to the other. In that case, the smart thing to do is take an outage window ahead of time, before the migration day, to clean those records up with manual update scripts.
Disable any constraints or foreign key relationships
Change the identity fields (if they're lookup tables, you may be able to turn off the identity stuff and just run with manually specified pk numbers)
Modify the Region table to add a NewID field, matching to what it's going to become, and an OldID field, showing what it used to be
Update all of the child tables (Customers) to use the NewID number instead of the original
Update the Region table so that the real ID field now has the NewID value, and the OldID field has what the Region used to be. (You're probably going to screw something up like miss a child table you didn't know about, and you're going to wonder what it used to be.)
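A very rough sketch of that remapping, assuming the identity property is already off, constraints are disabled, and a hypothetical RegionMap table holds the agreed-on new numbers:

ALTER TABLE dbo.Region ADD NewID int NULL, OldID int NULL;
GO
UPDATE r
SET    r.NewID = m.NewRegionID,
       r.OldID = r.RegionID
FROM   dbo.Region AS r
JOIN   dbo.RegionMap AS m ON m.OldRegionID = r.RegionID;

UPDATE c
SET    c.RegionID = r.NewID
FROM   dbo.Customers AS c
JOIN   dbo.Region AS r ON r.RegionID = c.RegionID;

UPDATE dbo.Region SET RegionID = NewID;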
Break the migration into pieces. List every stored proc in all of the databases. If any of them can be moved without moving data, do that first. For example, if you've got Source.dbo.usp_RunReport, and it only refers to tables in the Target database, then do that in a first phase. If you've got small system lookup tables that are only used internally in your app, not visible to customers or reports, then put that in the first phase too. It sounds like it's too small to bother with, but the idea is to reduce the amount of panic on migration day. The less you wonder about, the better you can troubleshoot. We moved every static lookup table (State, Region, Calendar, etc) over ahead of time. The amount of work required in Phase 1 - just moving those small, static tables - got management to understand how huge it was going to be to move the rest, and it bought us resources and time we wouldn't have gotten otherwise.
Pre-grow the data files for Target. If you're not using SQL 2005's new Instant File Initialization, data file growths take quite a while. Enable Instant File Initialization if you've got a choice, then grow the data files to make sure they're not fragmented. If they just grow naturally during your migration day, they can be fragmented. If you can't use Instant File Initialization, you still need to pre-grow the files, but you want to do that ahead of time during periods of low activity to speed up the maintenance window.
On migration day, run your inserts one table at a time, or smaller. You want to keep your insert transactions as tight as possible. The smaller your insert transactions, the less space you'll need in the transaction log. Remember that the transaction log will grow with insert statements even in simple mode. After every round of inserts, do a sanity check to make sure that they worked, and that you're not going to run out of drive space for data files or t-log files.
After the updates finish, change security on the Source databases. Put every non-SA login into the db_denydatareader and db_denydatawriter roles in the Source databases. That way they can still log in if they've hard-coded the database name in the connection string, but they won't be able to do anything. This makes your troubleshooting easier too: if an app or a query runs into problems, you could consider taking their login out of the deny roles and see if it works - if it does, it's borked. The risk with that is that they might run a transaction that uses the Source database data to update the Target database (get customers from Source, update them in Target) and it might cause issues.
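In SQL 2005 that's one call per role, e.g. for a hypothetical SalesApp database user in Source database B:

USE SourceB
GO
EXEC sp_addrolemember 'db_denydatareader', 'SalesApp';
EXEC sp_addrolemember 'db_denydatawriter', 'SalesApp';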
Other options for the Source databases are:
Rename them, so you can still query 'em but the apps won't touch 'em
Detach them, but keep the files available in case you need to troubleshoot
Strip out all logins, and use new logins to access the existing databases just in case. Then if somebody's read-only report is totally borked, you can let it work temporarily by issuing them a new login and telling them it's referring to the wrong database.
After the updates finish, rebuild indexes & statistics on Target. If you're just doing continuous inserts, this isn't a big deal, but if you're merging multiple databases (like two Sales databases that had been broken up into regions of the country) then you'll want to clean things up.
IMHO, use one schema unless you can justify a gain from multiple schemas. This last one is just my two cents, but it sounds like you're going through an awful lot of work to go from 3 databases 1 schema each, to 1 database with 3 schemas. If you're not really sure about the 3 schema thing, you might consider using 1 schema - or else you'll be in another messy rework later on down the road. 3 schemas does make sense if you have specific security needs, but otherwise, just make sure you're getting the bang for the buck that you want. Now would be a great time to go to one schema.
You could give Redgate SQL Compare and Data Compare a shot. They have a schema mapping feature that should let you map the dbo schema in one database to the Sales schema in another and then move the tables and procs. It would make it so you don't have to mess with the SQL export wizard. You would still have to refactor your other objects, though.
I love these two tools.
edit:
I think you can get a fully functional demo too.
edit:
Additionally, they offer SQL Refactor, which does a 'smart' rename. Score!
Could you have a dummy database called SALES that has a VIEW called [Orders]:
USE Sales
GO
CREATE VIEW dbo.Orders
AS
SELECT OrderId, OrderDate, ...
FROM CombinedDatabase.Sales.Orders
and then
SELECT ... FROM Sales.dbo.Orders
will still work.
You won't be able to INSERT / UPDATE that table without some further jiggery-pokery though.
If you could have such VIEWs log that they were used, that would enable you to fix the code that called them, but I can't think of a way to do that. However, you could disable each one in turn, run some tests, fix whatever is broken, then move on to the next one - and thus eradicate them by refactoring while keeping a largely working application during the process.
I've used SED for this type of thing, but we have unique names for all our tables and all our columns, and we use variable names within our application that match the database column names - so I would have high confidence that changing xxx_yyy_ID to aaa_bbb_ID in our application would work well, and not have accidental side effects.
If you have actual column/table names like "Sales" and "Orders", I think that something like SED would be risky.
Ok, so my basic understanding of your problem is something like this:
You have three different databases (i.e. Sales, Manu, Inventory)
They have distinct table & procedure names (no table/proc names in Sales exist in Manu or Inventory)
You want all the tables/procs from all three databases in a single database (i.e. SaleManInv)
Some stored procedures in each database explicitly refer to tables in the other databases (i.e. Sales.dbo.lookupItem() explicitly refers to Inventory.dbo.Items table)
Exporting and importing the tables doesn't seem like it will be a problem, what I would do for the procs:
Export one proc from the SQL Server 2000 db to the SQL Server 2005 DB to determine if you need to get rid of the ".dbo." portion of the cross references.
Export all the procs to text files (same folder for all procs)
Use a text editor with a "Search and Replace in Files" feature (I use PSPAD) and replace all the "Sales.dbo." with "SaleManInv.dbo.", then all the "Inventory.dbo." with "SaleManInv.dbo.", etc., to convert all the references to the new db.
Then run the exported and modified procs into your new db.
Is that making any sense? :-)
I was in a similar position where I had several SQL Server 2008 databases that were merged into 1. My solution was to use Integration Services' Transfer Server Objects task into a new target database. All data was copied over along with tables. Afterwards - in what was a very complex query, I scripted out all stored procedures/functions/views/etc. to a file and changed all cross-database references and re-created the stored procedures and other objects.
The trick with the stored procedures was to script them out in the order of sysconstraints in order to ensure that stored procedures or functions that were referencing other stored procedures/functions internally were created last.
If there was a tool that I felt could have handled this task in an automated fashion, I would have purchased it immediately.
I would like to know if it's the same kind of data. Anyway, I would create a new column with the name 'SourceSystem', so when the boss comes running after:
"What was the sales diff between database system 1 and db2 in 2004?"
then you can answer that. In a year or two, if those questions don't pop up, you can delete that column. Merging data removes the origin of the data.
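A sketch of that, with hypothetical amount and date columns:

ALTER TABLE dbo.Orders ADD SourceSystem varchar(30) NULL;  -- filled in per source while loading
GO
SELECT   SourceSystem, SUM(OrderAmount) AS TotalSales
FROM     dbo.Orders
WHERE    OrderDate >= '2004-01-01' AND OrderDate < '2005-01-01'
GROUP BY SourceSystem;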