I am researching archiving options for our database application (highly normalized schema) and would appreciate expert feedback. We are using SQL Server 2005, but if something works only in 2008 R2, that may be an option for us.
The primary reason for archiving is to remove old data on an annual basis. The criteria for determining which objects can be archived will not be straightforward (i.e., not just filtering by a date; there are many more considerations involved).
Archiving needs to be basically a push button in the application (i.e., not performed by a DBA on the database server).
Data should be retrievable, but perhaps by special request. Perhaps an object and all its related pieces could be searched for and brought back into the current database? (Again, via the application interface.)
Another important requirement is to maintain integrity of related data. If an archived object is related to a non-archived object, I want to ensure the non-archived object can't be deleted through the interface. Currently we have many checks in place to ensure you can't delete items if they're in use, and I hesitate to alter all of those checks to join an _archive table or use a new view. Is there another way?
I have read about table/index partitioning, and although it is interesting, it sounds like a LOT of work considering how many stored procedures, views, indexes, etc. we use.
What is your motivation for archiving?
You mention you want to "remove old data", but since you need it to remain available, that doesn't quite make sense.
The easiest thing to do in your situation will be a "soft" archive, where you add an Archived bit column to all your tables indicating whether a row is active. Then all your existing referential checks stay in place, but you need to add a filter on that bit in your views or queries, and add it to most of your indexes.
You don't really need to do an offload, since you can't move the data off the server anyway.
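A minimal sketch of the soft-archive idea in T-SQL, assuming a hypothetical Orders table (the column, view, and index names are invented for illustration):

    -- Add the flag; existing rows default to "active"
    ALTER TABLE dbo.Orders
        ADD Archived bit NOT NULL
            CONSTRAINT DF_Orders_Archived DEFAULT (0);
    GO

    -- A view that hides archived rows, so existing queries can be repointed
    CREATE VIEW dbo.ActiveOrders
    AS
        SELECT OrderID, CustomerID, OrderDate
        FROM dbo.Orders
        WHERE Archived = 0;
    GO

    -- On 2008/2008 R2 a filtered index keeps the active set cheap;
    -- on 2005 you would add Archived to the index key instead
    CREATE INDEX IX_Orders_Active
        ON dbo.Orders (OrderDate)
        INCLUDE (CustomerID)
        WHERE Archived = 0;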
My team is looking into db migration tools (e.g., Flyway, Liquibase), so I'm thinking about how to incorporate changes I make to the db contents using my Groovy+Grails service method. I'm not referring to changes to columns and/or tables (i.e., domain classes); I'm referring to inserts/updates of rows which represent configuration values for the associated webapp.
My service method is written to be used somewhat interactively. That is, when I'm adding or updating rows in various tables (i.e., newInstance or save), it helps me navigate various db constraints and make sure all the foreign keys and my own business logic are set correctly. I run it repeatedly (rolling back each time afterwards using setRollbackOnly()) until I've found something I'm happy with. The method is written in Groovy, and I don't want to rewrite it in SQL.
Is there a way to get groovy/grails to emit the sql it would execute instead of executing the sql? That is, give me something I could copy/paste into a Flyway migration or Liquibase changeset?
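For concreteness, the end result I'm after is just a plain SQL file I could drop into a versioned migration, something like this (the table and values here are invented for illustration):

    -- e.g. V2__update_app_config.sql (hypothetical Flyway migration)
    INSERT INTO app_config (config_key, config_value)
    VALUES ('feature.reports.enabled', 'true');

    UPDATE app_config
    SET config_value = '50'
    WHERE config_key = 'max.concurrent.jobs';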
I looked into logging, but I'd have to somehow process that output to substitute the values in and get the proper column names, and even then I'd need a way to distinguish the lines that actually change the db (maybe I could just extract the inserts and updates). I also looked into the Grails database migration scripts, but they appear to either look at domain classes (which isn't where my changes are happening) or at the entire database (which would sweep up a lot of user data too).
Thanks!
I've got a reasonably large / complicated DB which I need to upgrade in the field from version 1 to version 2. There are a lot of changes to the schema and, importantly, the data between the two.
Yes, I know this should have been version controlled, à la:
http://www.codinghorror.com/blog/2008/02/get-your-database-under-version-control.html
but it wasn't - it will be when I am done.
So, the current problem: I'm faced with the choice of either going through all the commits or trying to diff between the two versions of the db. So far I've tried:
http://opendbiff.codeplex.com/
http://www.sqldelta.com/
http://www.red-gate.com/
However, none of them seem to be able to successfully generate schema upgrade scripts, because they don't also do the data at the same time. This results in foreign key violations when adding new keys to tables: the referenced table is new, and while the schema for that table has been created, the data it contains has not. Well, it could be, but that requires me to use a different part of the tool and then mix the two scripts together.
I know this may look like a duplicate of:
What is best tool to compare two SQL Server databases (schema and data)?
which is where I found most of the existing tools I've tried, but so far I've not managed to get any of them to produce a working schema migration script. (I'm really not too fussed about the data, but I do need the data that's required for the foreign keys - which, to be honest, is all the difference, since I've deployed both the old and new versions.)
Am I expecting too much?
Should I give up and start manually stitching together what I do have?
Or do I go through all the commits and manually create upgrade scripts?
I can't think of more powerful tools available than the ones you seem to have tried. If those fail, my homegrown versioning system probably won't help you much either.
However, you should be able to generate an update script and then manually edit it to add the data transformations to it.
And/or you could disable the foreign key constraints while the update script runs.
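In T-SQL that could be as simple as the following sketch, assuming a hypothetical OrderLines table (sp_MSforeachtable is an undocumented but commonly used procedure if you need to do it database-wide):

    -- Disable FK/check constraints on the affected table while the data loads
    ALTER TABLE dbo.OrderLines NOCHECK CONSTRAINT ALL;

    -- ... run the data portion of the upgrade script here ...

    -- Re-enable and re-validate the constraints afterwards
    ALTER TABLE dbo.OrderLines WITH CHECK CHECK CONSTRAINT ALL;

    -- Database-wide variant (undocumented procedure):
    -- EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';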
There is no such thing as doing schema and data "at the same time". Even if you have them in one big script, you would still be doing the schema first and then the data. If the schema script creates a new table and adds a constraint to it, there is no reason you should get a referential integrity violation error, as there are no rows in those tables yet.
In any case, you should give our xSQL Schema Compare and Data Compare tools a try; you will be impressed with the performance and the level of control you get.
I'm working on a web-based business application where each customer will need to have their own data (think basecamphq.com type model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in code seems like a disaster waiting to happen. I know Oracle has a way to append a where clause to every query based on a login id, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use (like creating a view of each table for each customer that filters)?
Worst case scenario, what is the performance/memory overhead of having 1,000 databases of 100 MB each vs. having a single 1 TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple when each customer has their own database but quite a bit trickier if they share a database with other customers.
You might want to look into adding Veil to your PostgreSQL installation.
Schemas plus inherited tables might work for this: create your master table, then inherit per-customer tables into per-customer schemas, each providing a default for a company ID or name field.
Set the permissions per schema for each customer and set the schema search path per user. Use the same table names in each schema so that the queries remain the same.
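A rough sketch of that in PostgreSQL, with invented names ("acme" as the customer and "projects" as the shared table):

    -- Shared parent table
    CREATE SCHEMA shared;
    CREATE TABLE shared.projects (
        id       serial PRIMARY KEY,
        customer text   NOT NULL,
        name     text   NOT NULL
    );

    -- Per-customer schema with an inherited child table that defaults
    -- (and enforces) the customer column
    CREATE SCHEMA acme;
    CREATE TABLE acme.projects (
        customer text NOT NULL DEFAULT 'acme' CHECK (customer = 'acme')
    ) INHERITS (shared.projects);

    -- Confine the customer's role to its own schema and make unqualified
    -- table names resolve there
    CREATE ROLE acme_user LOGIN PASSWORD 'change_me';   -- placeholder password
    GRANT USAGE ON SCHEMA acme TO acme_user;
    GRANT SELECT, INSERT, UPDATE, DELETE ON acme.projects TO acme_user;
    ALTER ROLE acme_user SET search_path = acme;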
I've been tasked with revisiting a database schema we designed and use internally for various ticketing and reporting systems. Currently there are about 40 tables in one Oracle database schema supporting perhaps six webapps.
However, there's one unifying relationship amongst them all: a rooms table describing the rooms. Room name, purpose, and other data are thrown into a shared table for each app. My initial idea was to pull each of these applications into a separate database and perform joins between a given database and the rooms database. But I've discovered this solution prevents foreign key constraints in SQL Server 2005. It seems silly to duplicate one table for each app and keep those multiple copies synchronized.
Should I just leave everything in one large DB, or is there something else I can do to separate the tables without losing FK constraints?
The only way to achieve built-in referential integrity is to have the table inside the database in which it is referenced. You might be able to achieve the equivalent of referential integrity using triggers but it would likely be deathly slow.
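If you did go the trigger route anyway, a cross-database check might look roughly like this sketch (all names are hypothetical, and you would also need a matching delete trigger on the rooms side):

    -- In the app database, reject rows that point at a room that doesn't
    -- exist in the shared rooms database
    CREATE TRIGGER trg_Tickets_CheckRoom
    ON dbo.Tickets
    AFTER INSERT, UPDATE
    AS
    BEGIN
        IF EXISTS (SELECT 1
                   FROM inserted i
                   WHERE NOT EXISTS (SELECT 1
                                     FROM RoomsDb.dbo.Rooms r
                                     WHERE r.RoomID = i.RoomID))
        BEGIN
            RAISERROR ('RoomID not found in RoomsDb.dbo.Rooms', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;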
You might be able to use SQL Server replication, in its "Transactional replication" mode/form. http://msdn.microsoft.com/en-us/library/ms151176.aspx
If all the apps truly use and depend on the rooms, then keep them all in one DB.
You can still set privileges on the tables properly and manage the data sets in the non-overlapping areas normally.
Is there any task you imagine you will not be able to perform when things are together?
We have an application that has 1000+ databases and 600+ sprocs. Each database represents a different client.
Problem: We need to move this to a single database while having as little effect on the UI as possible, meaning we don't want to change all the sproc signatures at one time.
The connection string currently sets the database attribute; a proposal is to move that to the user attribute. This attribute (via SYSTEM_USER) could be used to determine the site identifier, which would then be used in the where clause.
The above would not be the final solution, but it allows us to make changes to the sproc signatures at a slow, controlled pace. Once all are done, we can correct the connection string and get some connection pooling.
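To make that concrete, the sproc-side lookup I have in mind is roughly this (SiteLogins is a hypothetical mapping table and Orders a stand-in for our real tables):

    -- Map each SQL login to the client/site it belongs to
    CREATE TABLE dbo.SiteLogins (
        LoginName sysname NOT NULL PRIMARY KEY,
        SiteId    int     NOT NULL
    );

    -- Inside each existing sproc, resolve the caller's site once
    -- and add it to every WHERE clause
    DECLARE @SiteId int;
    SELECT @SiteId = SiteId
    FROM dbo.SiteLogins
    WHERE LoginName = SYSTEM_USER;

    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE SiteId = @SiteId;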
Are there any limitations on the number of logins/users that we can have on SQL Server 2005/8? Or has anyone been down this path who could shed some light on a better option?
See my answer here
Ideas for Combining Thousand Databases into One Database
Sounds like you two are working on the same project. You will need to change every proc before you can move to one database, or each client will see the others' data.
As for the number of logins on SQL Server 2005 / 08 - I don't think anyone has ever run into a hard limit here. A few thousand will NOT be any problem at all.
What you could consider for this scenario is one schema per customer inside your single DB, e.g. customer "Miller" gets a "miller" schema with its objects inside, and customer "Brown" gets a "brown" schema.
And contrary to what HLGEM just responded - no, customers won't see each other's data if you specify proper permissions: confine each customer (and its users) to its own schema only, and it should work just fine.
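A minimal sketch of that, with made-up names and a placeholder password:

    -- One schema per customer, with the same table names inside each
    CREATE SCHEMA miller AUTHORIZATION dbo;
    GO
    CREATE TABLE miller.Orders (OrderId int PRIMARY KEY, OrderDate datetime);
    GO

    -- The customer's login/user is confined to its own schema
    CREATE LOGIN MillerLogin WITH PASSWORD = 'Pl@ceholder1';  -- placeholder
    CREATE USER MillerUser FOR LOGIN MillerLogin
        WITH DEFAULT_SCHEMA = miller;
    GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::miller TO MillerUser;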
Marc
You might also consider setting a distinctive application name in the connection string rather than using a distinctive user, which you can get into your where clause using APP_NAME(). I'm sure that SQL Server won't have a problem with thousands of logins, but you may prefer not to have to create them.
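A quick sketch of that idea (the mapping table and names are invented; the connection string just carries a distinct Application Name per client):

    -- Connection string carries e.g. "Application Name=Client042"
    -- Inside the sprocs, translate APP_NAME() to the site identifier
    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.SiteId = (SELECT sa.SiteId
                      FROM dbo.SiteApplications AS sa
                      WHERE sa.ApplicationName = APP_NAME());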