I have a really odd user requirement. I have tried to explain to them that there are much better ways of supporting their business process, but they don't want to hear it. I am tempted to walk away, but first I want to see if maybe there is another way.
Is there any way that I can lock a whole database, as opposed to a row lock or table lock? I know I can perhaps put the database into single-user mode, but that means only one person can use it at a time. I would like many people to be able to read at a time but only one person to be able to write to it at a time.
They are trying to do some really odd data migration.
What do you want to achieve?
Do you want to make the whole database read-only? You can definitely do that.
Do you want to prevent any new clients from connecting to the database? You can definitely do that too.
But there's really no concept of a "database lock" in terms of only ever allowing one person to use the database. At least not in SQL Server, not that I'm aware of. What good would that do you, anyway?
If you want to do data migration out of this database, then setting the database into read-only mode (or creating a snapshot copy of it) will probably be sufficient and the easiest way to go.
UPDATE: for the scenario you mention (grab the data for people with laptops, and then re-synchronize), you should definitely check out ADO.NET Sync Services - that's exactly what it's made for!
Even if you can't use ADO.NET Sync Services, you should still be able to selectively and intelligently update your central database with the changes from laptops without locking the entire database. SQL Server has several methods to update rows even while the database is in use - there's really no need to completely lock the whole database just to update a few rows!
For instance: you should have a TIMESTAMP (or ROWVERSION) column on each of your data tables, which would easily allow you to see if any changes have occurred at all. If the TIMESTAMP field (which is really just a counter - it has nothing to do with date or time) has not changed, the row has not changed and thus doesn't need to be considered for an update.
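As a rough illustration of that idea, here is a minimal Python sketch of how a sync step might pick out only the rows whose version counter has moved. The function name, the dictionaries, and the integer values are all hypothetical stand-ins; a real ROWVERSION is an 8-byte binary value read from SQL Server, not a Python integer.

```python
from typing import Dict, List


def changed_rows(last_seen: Dict[int, int], current: Dict[int, int]) -> List[int]:
    """Return ids that are new or whose version differs from the recorded one."""
    return sorted(
        row_id
        for row_id, version in current.items()
        if last_seen.get(row_id) != version
    )


# Versions recorded at the last synchronization (row id -> version counter).
last_seen = {1: 100, 2: 101, 3: 102}
# Versions found now: row 2 was edited and row 4 is new (made-up values).
current = {1: 100, 2: 205, 3: 102, 4: 206}

print(changed_rows(last_seen, current))  # [2, 4]
```

Only the rows returned here would need to be considered for an update, so nothing has to be locked beyond those rows.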
We are running a pretty uncommon ERP system from a small IT business, and it doesn't allow us to modify data extensively. We thought about doing a data update by exporting the data we wanted to change directly from the DB and using Excel VBA to update a bunch of data from different tables. Now we have the updated data in Excel, and it is supposed to be written into the Oracle DB.
The IT business's support told us not to do so, because of all the triggers that run in the background during a regular data update in their program. We are pretty afraid of damaging the DB, so we are looking for the best way to do the data update without bypassing any trigger. To be more specific, we have made some thousands of changes in different columns and tables, all merged together in one Excel file. Now we have to be sure to insert the modified data into the DB while firing all the triggers the ERP software fires during a data update.
Is there anyone who knows a good way to do so?
I don't know what ERP system you are using, but I can relate some experiences from Oracle's E-Business Suite.
Nowadays, Oracle's ERP includes a robust set of APIs that will allow your custom programs to safely maintain ERP data. For example, if you want to modify a sales order, you use Oracle's API for that purpose and it makes sure all the necessary, related validations and logic are applied.
So, step #1 -- find out if your ERP system offers any APIs to allow you to safely update your data.
Back in the early days of Oracle's ERP, there were not so many APIs. In those days, when we needed to update a lot of tables and had no API available, the next approach would be to use some sort of data loader tool. The most popular was, in fact, called "Data Loader". What this would do is read your data from an Excel spreadsheet and send it to the ERP's user interface -- exactly as though it were being typed in by a user. Since the data went through the ERP's UI, all the necessary validations and logic would automatically be applied.
In really extreme cases, when there was no API and DataLoader was, for whatever reason, not practical, it was still sometimes deemed necessary and worth the risk to attempt our own direct update of the ERP tables. This is, in general, risky and a bad practice, but sometimes we do what we must.
In these cases, we would start a database trace going on a user's session as they keyed in a few updates via the ERP's user interface. Then, we would use the trace to figure out what validations and related logic we needed to apply during our custom direct updates. We would also analyze the source code of the ERP system (since we had it available in the case of Oracle's ERP). Then, we would test it extensively. And, after all that, it was still risky and also prone to break after upgrades. But, in general, it worked as a last resort.
No, my problem is that I need to get the work done quickly by automating some of my processes. It's true that the work has already been done in Excel, but the data needed to be modified anyway. The only question is whether I enter it manually with copy and paste into the DB through our ERP, or all at once through some other means I don't know of yet.
But I guess Mathew is right. There are validation processes in the ERP, so we can't write it directly into the DB.
I don't know; maybe you could contact me if you have a clue how to bypass the ERP in a non-risky manner.
A client has one of my company's applications which points to a specific database and tables within the database on their server. We need to update the data several times a day. We don't want to update the tables that the users are looking at in live sessions. We want to refresh the data on the side and then flip which database/tables the users are accessing.
What is the accepted way of doing this? Do we have two databases and rename the databases? Do we put the data into separate tables, then rename the tables? Are there other approaches that we can take?
Based on the information you have provided, I believe your best bet would be partition switching. I've included a couple of links for you to check out because it's much easier to direct you to a source that already explains it well. There are several approaches you can take with partition switching.
Links: Microsoft and Cathrine Wilhelmsen's blog
Hope this helps!
I think I understand what you're saying: if the user is on a screen, you don't want the screen updating with new information while they're viewing it; it should only update when they pull up a new screen after the new data has been loaded? Correct me if I'm wrong. And Mike's question is also a good one: how is this data being fed to the users? Possibly there's a way to pause that while the new data is being loaded. There are more elegant ways to load data, such as partitioning the table, using a staging table, replication, or having the users view snapshots. But we need to know what you mean by 'live sessions'.
Edit: with the additional information you've given me, partition switching could be the answer. The process takes virtually no time; it just changes the pointers from the old records to the new ones. The only issue is that you have to partition on something partitionable, like a date or timestamp, to differentiate old and new data. It's also an Enterprise Edition feature, and I'm not sure what version you're running.
Possibly a better thing to look at is Read Committed Snapshot Isolation. It will ensure that your users only look at the new data after it's committed; it provides a transaction-level consistent view of the data and has minimal concurrency issues, though there is more overhead in TempDB. Here are some resources for more research:
http://www.databasejournal.com/features/mssql/snapshot-isolation-level-in-sql-server-what-why-and-how-part-1.html
https://msdn.microsoft.com/en-us/library/tcbchxcb(v=vs.110).aspx
Hope this helps and good luck!
The question details are a little vague so to clarify:
What is a live session? Is it a session in the application itself (with app code managing its own connections to the database), or is it a low-level connection per user/session situation? Are users just running reports, or actively reading from and writing to the database during the session? When is a session over, and how do you know?
Some options:
1) Pull all the data into the client for the entire session.
2) Use read committed snapshot isolation or partition switching, as mentioned in other answers (however, this requires careful setup for your queries and increases the requirements for the database)
3) Use replica database for all queries, pause/resume replication when necessary (updating data should be faster than your process but it still might take a while depending on volume and complexity)
4) Use replica database and automate a backup/restore from the master (this might be the quickest depending on the overall size of your database)
5) Use multiple replica databases with replication or backup/restore and then switch the connection string (this allows you to update the master constantly and then update a replica and switch over at a certain predictable time)
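The "load on the side, then flip" idea behind several of these options can be sketched at the table level. The following is only an illustration using Python's built-in sqlite3 module as a stand-in; the `prices` tables and the `refresh` function are hypothetical, and on SQL Server the equivalent would more likely be partition switching, `sp_rename`, or a connection-string switch, each with its own locking behavior.

```python
import sqlite3

# Autocommit mode, so the transaction around the swap is managed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE prices (sku TEXT PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO prices VALUES ('A1', 9.99)")


def refresh(conn, new_rows):
    """Load new_rows into a staging table, then swap it in for the live table."""
    # Slow part: readers keep using the live table while staging is filled.
    conn.execute("DROP TABLE IF EXISTS prices_staging")
    conn.execute("CREATE TABLE prices_staging (sku TEXT PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO prices_staging VALUES (?, ?)", new_rows)

    # Fast part: both renames happen inside one transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE prices RENAME TO prices_old")
        conn.execute("ALTER TABLE prices_staging RENAME TO prices")
        conn.execute("DROP TABLE prices_old")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


refresh(conn, [("A1", 10.49), ("B2", 4.25)])
rows = conn.execute("SELECT sku, price FROM prices ORDER BY sku").fetchall()
print(rows)  # [('A1', 10.49), ('B2', 4.25)]
```

The point of the design is that the expensive load never touches the table users are reading; only the brief rename step does.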
I have this problem where I need audit trails (typically stored in a DB) to be non-editable and non-deletable, even for DBAs and system admins.
One way is to apply encryption and checksums, but this only allows detection of changes or prevention of snooping. It does not prevent a DBA from simply deleting a row.
Any discussion on this matter is appreciated.
If you want the audit trails to be non editable even by the DBAs and system admins, you would need to store them outside of equipment that is in their control.
However that would lead to the same problem - the DBAs and system admins of this system would be able to edit them.
The best bet is to have a system where you store these in two disparate locations that do not share an admin and run periodic comparison checks.
Alternatively you can have triggers on update/delete when they are made by a specific user or from a particular client. These triggers could be programmed to send email or text messages if such a non-application update or delete is made.
It should be known, and very well known, in the admin/DBA community that such triggers exist. You will not be able to prevent the updates or deletes, but you will definitely get people to stay away from that table.
There is still a catch however, which is the ability to remove or modify the trigger code.
There exist "write-once" archival storage systems, such as Venti from Plan 9. That doesn't stop anyone with physical access taking a magnet to the hard disk or similar of course ;)
A sufficiently savvy sysadmin could create a slightly modified version of the data and replace the reference to the venti score though... and an equally savvy sysadmin could still recover the original data.
Anyway, I think you could learn a lot from studying append-only storage systems. They make a lot of sense for storing audit trails, compared to a DB.
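To give a feel for how append-only logs can be made tamper-evident, here is a small Python sketch of a hash chain, where each entry's hash covers the previous entry's hash. This is a generic technique, not how Venti itself is implemented, and the function names and log entries are made up for the example. It still only detects tampering; it does not prevent it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log, payload):
    """Append payload to the log, chaining it to the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify(log):
    """Recompute every hash; False if an entry was edited or removed mid-chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"user": "alice", "action": "update", "row": 17})
append(log, {"user": "bob", "action": "delete", "row": 42})
append(log, {"user": "carol", "action": "insert", "row": 43})

intact = verify(log)          # chain is consistent
del log[1]                    # simulate an admin deleting one audit row
after_deletion = verify(log)  # the gap breaks the chain
print(intact, after_deletion)  # True False
```

Note that removing entries from the end of the log is not caught by this check alone; for that, the most recent hash would have to be recorded somewhere the same admin cannot reach, which is the "two disparate locations" point made earlier.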
There exist appliances that act as sniffers and are able to record every single command executed on the database. IBM Guardium is an example.
I have a process that reads raw data and writes this to a database every few seconds.
What is the best way to tell if the database has been written to? I know that Oracle and MS-SQL can use triggers or something to communicate with other services, but I was hoping there would be a technique that would work with more types of SQL databases (SQLite, MySQL, PostgreSQL).
Your question is lacking specifics needed for a good answer but I'll give it a try. Triggers are good for targeting tables but if you are interested in system-wide writes then you'll need a better method that is easier to maintain. For system-wide writes I'd investigate methods that detect changes in the transaction log. Unfortunately, each vendor implements this part differently, so one method that works for all vendors is not likely. That is, a method that works within the database server is unlikely. But there may be more elegant ways outside of the server at the OS level. For instance, if the transaction log is a file on disk then a simple script of some sort that detects changes in the file would indicate the DB was written to.
Keep in mind you have asked only to detect a db write. If you need to know what type of write it was then you'll need to get into the transaction log to see what is there. And that will definitely be vendor specific.
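The file-watching idea mentioned above could look roughly like this in Python. It is a sketch under a big assumption: that the engine's transaction log is an ordinary file whose size or modification time changes on every write. Many engines preallocate, recycle, or buffer their logs, so treat this as a starting point to test against your own database. The helper names and the temporary file are hypothetical.

```python
import os
import tempfile


def file_signature(path):
    """Return (modification time in ns, size in bytes) for the file at path."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def has_changed(path, previous):
    """Compare the file's current signature with a previously recorded one."""
    current = file_signature(path)
    return current != previous, current


# Demo on a temporary file standing in for a transaction log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "fake_txn.log")
    with open(log_path, "wb") as f:
        f.write(b"begin;")
    baseline = file_signature(log_path)

    unchanged, _ = has_changed(log_path, baseline)  # nothing written yet

    with open(log_path, "ab") as f:
        f.write(b"insert;commit;")
    changed, _ = has_changed(log_path, baseline)    # size has grown

    print(unchanged, changed)  # False True
```

A scheduled script would store the returned signature between runs and act only when `has_changed` reports a difference.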
It depends on what you wish to do. If it is something external to the database that needs to be kicked off then a simple poll of the database would do the trick, otherwise a db specific trigger is probably best.
If you want to be database independent, polling can work. It's not very efficient or elegant. It also works if you are cursed with a database that doesn't support triggers. A workaround that we've used in the past is a timed script (say, via cron) that does a select MAX(primary_key_id) from saidTable. I am assuming that your primary key is a sequential integer and is indexed.
And then compare that to the value you obtained the last time you ran the script. If they match, tell the script to exit or sleep. If not, do your thing.
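A minimal version of that polling check might look like the sketch below. It uses Python's built-in sqlite3 module only so that it runs anywhere; the `readings` table, its `id` column, and the function names are hypothetical, and the same query should work through any DB-API driver.

```python
import sqlite3


def max_id(conn, table="readings", key="id"):
    """Return the highest key in the table, or None if the table is empty."""
    # Table and column names are interpolated, so never take them from user input.
    row = conn.execute(f"SELECT MAX({key}) FROM {table}").fetchone()
    return row[0]


def has_new_rows(conn, last_seen):
    """Return (changed, value to remember for the next poll)."""
    current = max_id(conn)
    if current is None:
        return False, last_seen
    changed = last_seen is None or current > last_seen
    return changed, current


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY AUTOINCREMENT, value REAL)")

changed_empty, last = has_new_rows(conn, None)    # empty table, nothing to do
conn.execute("INSERT INTO readings (value) VALUES (1.5)")
changed_after, last = has_new_rows(conn, last)    # a row has appeared
changed_again, last = has_new_rows(conn, last)    # no further writes since
print(changed_empty, changed_after, changed_again)  # False True False
```

In a cron-driven script, `last` would be persisted between runs, for example in a small state file. Note that this only notices inserts; updates and deletes to existing rows leave the maximum key unchanged.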
There are other issues with this approach (e.g., backlogs if the script takes too long, or concurrency issues). And of course performance can become an issue too!
I am working on a new web app, and I need to store any changes in the database to audit table(s). The purpose of such audit tables is that later on, in a real physical audit, we can ascertain what happened in a situation, who edited what, and what the state of the DB was at the time of, e.g., a complex calculation.
So the audit tables will mostly be written to and not read, though a report may sometimes be generated.
I have looked at the available solutions:
AuditTrail - simple, and that is why I am leaning towards it; I can understand its single-file code.
Reversion - looks simple enough to use, but I'm not sure how easy it would be to modify if needed.
rcsField - seems to be very complex and too much for my needs.
I haven't tried any one of these, so I wanted to hear about some real experiences and which one I should be using, e.g. which one is faster, uses less space, and is easy to extend and maintain?
Personally, I prefer to create audit tables in the database and populate them through triggers, so that any change, even from ad hoc queries in the query window, is stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface, but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than from outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than the stored procedure level where they belong. Therefore it is even more important that you capture any possible change to the data, not just what came from the GUI. We have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables store only the changes and not the whole record, we do not need to change them every time a field is added.
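To show roughly what a trigger-populated audit table that stores only the changed value looks like, here is a small sketch using Python's built-in sqlite3 module. The `account` and `account_audit` tables and the trigger name are invented for the example, and the trigger syntax is SQLite's; other databases use different trigger dialects, and this is not the dynamic proc described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);

-- One audit row per changed column value, not a copy of the whole record.
CREATE TABLE account_audit (
    audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id  INTEGER,
    column_name TEXT,
    old_value   TEXT,
    new_value   TEXT,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Fires for any UPDATE of balance, whichever client or tool issued it.
CREATE TRIGGER account_balance_audit
AFTER UPDATE OF balance ON account
WHEN OLD.balance IS NOT NEW.balance
BEGIN
    INSERT INTO account_audit (account_id, column_name, old_value, new_value)
    VALUES (OLD.id, 'balance', OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = 75.0 WHERE id = 1")   # audited
conn.execute("UPDATE account SET balance = 75.0 WHERE id = 1")   # no change, skipped

audit_rows = conn.execute(
    "SELECT account_id, column_name, old_value, new_value FROM account_audit"
).fetchall()
print(audit_rows)
```

Because the trigger lives in the database, an update issued from a query window is recorded the same way as one coming from the application.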
Also, when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do with them. Also consider how hard it will be to maintain the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand is not generally a good idea. That should be the lowest of your selection criteria, after meeting the requirements, security, etc.
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords developed by the same author (Marty Alchin aka #gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As I stated in my question, rcsField seems to be too much for my needs, which are simple: I want to store any changes to my tables, and maybe come back to those changes later to generate some reports.
So I tested AuditTrail and Reversion.
Reversion seems to be a better, full-blown application with many features (which I do not need). Also, as far as I know, it saves data in a single table in XML or YAML format, which I think has two drawbacks:
it will generate too much data in a single table
I may not be able to use the existing DB tools to read that data.
AuditTrail wins in that regard: for each table it generates a corresponding audit table, so changes can be tracked easily, there is less data per table, and the data can be easily manipulated and used for report generation.
So I am going with AuditTrail.