I am working on an new web app I need to store any changes in database to audit table(s). Purpose of such audit tables is that later on in a real physical audit we can asecertain what happened in a situation, who edited what and what was the state of db at the time of e.g. a complex calculation.
So mostly audit table will be written and not read. Report may be generated though sometimes.
I have looked for available solution
AuditTrail - simple and that is why I am inclining towards it, I can understand it single file code.
Reversion - looks simple enough to use but not sure how easy it would be to modify it if needed.
rcsField seems to be very complex and too much for my needs
I haven't tried anyone of these, so I wanted to know some real experiences and which one I should be using. e.g. which one is faster uses less space, easy to extend and maintain?
Personally I prefer to create audit tables in the database and populate through triggers so that any change even ad hoc queries from the query window are stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than the sp level where they belong. Therefore it is even more important that you capture any possible change to the dat not just what was from the GUI. WE have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables populate only the changes and not the whole record, we do not need to change them every time a field is added.
Also when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do from them. Also consider how hard it will be to maintian the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand, is not generally a good idea. That should be lowest of your selction criteria after meeting the requirements, security, etc.
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords developed by the same author (Marty Alchin aka #gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As i stated in my question rcField seems to be to much for my needs, which is simple that i want store any changes to my table, and may be come back later to those changes to generate some reports.
So I tested AuditTrail and Reversion
Reversion seems to be a better full blown application with many features(which i do not need), Also as far as i know it saves data in a single table in XML or YAML format, which i think
will generate too much data in a single table
to read that data I may not be able to use already present db tools.
AuditTrail wins in that regard that for each table it generates a corresponding audit table and hence changes can be tracked easily, per table data is less and can be easily manipulated and user for report generation.
So i am going with AuditTrail.
Related
We are running a pretty uncommon erp-system of a small it-business which doesn't allow us to modify data in an extensive way. We thought about doing a data update by exporting the data we wanted to change directly from the db and by using Excel VBA to update a bunch of data of different tables. Now we got the data updated in excel which is supposed to be written into the Oracle DB.
The it-business support told us not to do so, because of all the triggers running in the background during a regular data update in their program. We are pretty afraid of damaging the db so we are looking for the best way to do the data update without bypassing any trigger. To be more specific there are some thousands of changes we've done in different columns and tables merged all together in one Excel-file. Now we have to be sure to insert the modified data into the db and firing all the triggers the erp-software does during data update.
Is there anyone who knows a good way to do so?
I don't know what ERP system you are using, but I can relate some experiences from Oracle's E-Business Suite.
Nowadays, Oracle's ERP includes a robust set of APIs that will allow your custom programs to safely maintain ERP data. For example, if you want to modify a sales order, you use Oracle's API for that purpose and it makes sure all the necessary, related validations and logic are applied.
So, step #1 -- find out if your ERP system offers any APIs to allow you to safely update your data.
Back in the early days of Oracle's ERP, there were not so many APIs. In those days, when we needed to update a lot of table and had no API available, the next approach would be to use some sort of data loader tool. The most popular was, in fact, called "Data Loader". What this would do is read your data from an Excel spreadsheet and send it to the ERP's user interface -- exactly as though it were being typed in by a user. Since the data went through the ERP's UI, all the necessary validations and logic would automatically be applied.
In really extreme cases, when there was no API and DataLoader was, for whatever reason, not practical, it was still sometimes deemed necessary and worth the risk to attempt our own direct update of the ERP tables. This is, in general, risky and a bad practice, but sometimes we do what we must.
In these cases, we would start a database trace going on a user's session as they keyed in a few updates via the ERP's user interface. Then, we would use the trace to figure out what validations and related logic we needed to apply during our custom direct updates. We would also analyze the source code of the ERP system (since we had it available in the case of Oracle's ERP). Then, we would test it extensively. And, after all that, it was still risky and also prone to break after upgrades. But, in general, it worked as a last resort.
No my problem is that I need to do the work fast by make some automation in my processes. The work is already done on excel that's true but it needed the modification anyway. It's only if I put it manually with c&p into the db over our ERP or all at once over I don't know what.
But I guess Mathew is right. There are validation processes in the ERP so we can't write it directly into the db.
I don't know maybe you could contact me if you have a clue to bypass the ERP in a non risky manner.
This seems to be an issue that keeps coming back in every web application; you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, all come with their benefits and own problems. Roughly, I've come to the following two possibilities;
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order to apply the changes. Also I would note often times you need two sets of scripts, one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first, and then run a script to put in a default value.
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).
What is the best way or recommended best practice in the flow of database driven asp.net web application? I mean the database first or coding first or side by side?
Your data access code won't compile without an existing database - unless you stub (or Mock) it. So probably the database comes first.
But it is a bad idea to do whole chunks of the application in isolation. Ideally you should design and build slivers of the system - database and application - hand-in-hand. These slivers should be cohesive sub-sets of functionality, probably smaller than sub-systems. Inevitably, the act of coding screens and business rules will throw up problems in the data model. So it is good to have a data modeller or DBA who is happy to work incrementally alongside the developers.
edit
Stephanie makes an extremely pertinent point:
"the core tables which are persisting
your app's data really can't be
piecemealed. Most of the data is known
at project start. It has a form, you
need to find it."
I agree that the core entities are knowable at project start, and the physical data model can be derived from that logical data model. But I don't think it is ever possible to nail down completely the structure of any table, even a core table, at the start. This is because at the start of the design/build phase all we have to go on are the Requirements, and if there's one thing history tells us about the Requirements it is that they will change.
So, new tables will be needed and some existing tables will become obsolete. There will be columns which need to be added, columns which need to be modified, columns which need to be dropped. This is why Nature gave us the ALTER TABLE statement.
I am not suggesting that we don't design our tables, or assemble them piecemeal. I am merely suggesting that when we start designing the HR sub-system we need to worry about the EMPLOYEES table and the SALARIES table. We don't need to concern ourselves with INVENTORY or ORDERS until we commence work on Sales.
We personally start with the Domain and do things side-by-side. The important part is that we implement vertical slices of the application (fully working end-to-end features), not horizontal slices (e.g. first the whole database layer, then the data access, then the services, then the presentation): we build the application incrementally and demonstrate progress with working code after each iteration.
Applications are all about features.
You don't build apps to store data,
but to provide functionality. If we
can't agree on that, the discussion is
moot of course. Software should be
developed to satisfy the needs of its
users and not of its developers.
Well I have really no understanding of the second sentence. If you think my company pays me a good salary to write code that satisfies me and not my users you're crazy. So that argument is a strawman. Back to the first.
This is a common view point of application centric people (they), vs. database centric people (We). They see the entire point of the exercise to "provide features". Those are things the clients know they want and ask for them. To them, the database is just persistence required for these features. And when they are done, that's it, features delivered, database is sufficient for those features. Could be an entire Rube Goldberg inside the database with redundant data, severe violations of normal forms, constraints enforced by the application, what have you.
think overall usability alone outweighs database design
If the design of your database is affecting your usability than the design was bad. I have no doubt that one who strives for features will leave the database in such a state that it severely hampers usability.
Data Centric people, don't look at a system as a place to provide only what's been asked for, but a repository of Intellectual Capital that can be exploited by more than whatever the Application-du-jour is. I can't begin to describe the number of cases where one team has used the database of some other team's app to enhance their apps value. Just look at all the medical research that is nothing more that the meta-analysis of existing studies. None of that is possible if you believe that only the features of your app matter and subsequent uses of your apps data do not.
A good data model isn't inviolate. Sure you'll add to it, change it when requirements change. But if you don't completely understand your data, I don't know how anyone can begin to write code.
I guess you need first to define datamodel and only then going coding. You should plan everything carefully before actually writting the code.
First is a feature list.
Then, detailed spec.
Then test plan and design of all, including databases.
Then, it wouldn't matter which to implement first.
You'll probably end up doing it "side by side".
You need some data to be able to test the application, but you need the application to be able to verify that you're storing the correct data.
Do some modelling first and then build the minimum you can for one or two features. Then when these are working add the next feature and so on.
You'll need to write some database update procedures (both the code and the rules about what and when to update) as you will have to extend your tables, but you'll need those for the final system anyway as it will have to change as new requirements come along.
Having done it quite a few times, I find myself invariably doing it like so:
Define the problem I'm trying to solve.
Write out some use-cases.
Have my significant other or a friend tell me if this is even a problem.
Sketch out a few sample screens.
Write flow diagrams for the use cases.
Ask my Rubber-duck questions.
Use questions to refine 1-6.
Write out the 'nouns'. Those become my data Model.
Write out the actions. Those become application logic.
Code data Model.
Code Application Logic.
Realize I've gotten it a little wrong.
Repeat 10-12 as many times as needed.
Ask, "Have I solved the problem"?
If not, rinse, lather and repeat 1-15.
This is a trick question. IMO, they both come in parallel during your planning and design phase. They are so closely related that it make sense to do them together. Just keep in mind that your database design will be almost fully developed while your code is still in its infancy (though your application logic should be almost fully mapped out in you head or on paper)
The idea is that you're designing your solution in the context of the problem. When you're planning out your solution you will be (or should be) defining your application as a set of things and actions (nouns and verbs).
For example, a very basic helpdesk program has people and tickets. People need to create tickets, update tickets, and close tickets. The nouns that require persistent storage will comprise your database, and the nouns + actions will be contained in your application.
Sometimes your table mappings and the relationship between tables will be obvious (IE people create tickets, ticket.creatorID = people.personID) and other times the relationship doesn't really click in your head until you start working through use cases or until you start writing your code (IE different ppl have different access levels defining what they can do. At a glance this would seem like a simple field in a table, but in practice it is better as a separate table).
I have seen many database designs having following audit columns on all the tables...
Created By
Create DateTime
Updated By
Upldated DateTime
From one perspective I see tables from the following view...
Entity Tables:
Good candidate for Audit columns)
Reference Tables:
Audit columns may or may not required. In some case last update information is not at all required because record is never going to be modified.)
Reference Data Tables
Like Country Names, Entity State etc... Audit columns may not required because these information is created only during system installation time, and never going to be changed.
I have seen many designers blindly put all audit columns to all tables, is this practice good, if yes what could be the reason...
I just want to know because to me it seems illogical. It is difficult for me to figure out why do they design their db this way? I am not saying they are wrong or right, just want to know the WHY?
You can also suggest me, if there is an alternative auditing patter or solution available...
Thanks and Regards
Data auditing is a required internal control for many business systems (see Sarbanes Oxley for reasons why). It must be at the database level to assure that all changes are captured especially unauthorized ones.
Even with lookup tables an unauthorized change could wreak havok in your system and thus it is important to know who made the change and when. When is especially important because it helps the dbas know how far back to grab a backup to restore information accidentally or maliciously changed.
We like to think all our employees are trustworthy, but many of the thefts of personal data and the malicious changes to destroy company data come from internal sources (this is why is is dangerous to have many disgruntled employees) as does almost all of the fraud. Yet most programmers seem to think that they only have to protect against outside threats.
Of course you are still going to have a few people who can make unauthorized changes, you can't prevent system admins from doing this. But with auditing at least you can limit the potential for data damage (and be especially careful when hiring dbas and allow no one else admin rights on your database servers).
These columns are for the benefit of the DBA and the database developers. They just provide a quick mechanism to answer questions like "When did this record last change?" "who changed it?" They are not robust enough or fine-grained enough to satisfy compliance with SOX, HIPAA or whatever.
It is simply easier to have these columns on every table. All data can change, so it is useful to know when changes happened, especially if that data isn't supposed to change. It is possible to automate the process of adding them, by using the data dictionary to generate scripts.
It is good practice for these columns to be populated independently of the application, by triggers or some similar mechanism. These columns are metadata, the application shouldn't really be aware of them.
Relying on a full-blown audit trail to provide this functionality is usually not an option. Audit data which is collected for compliance purposes usually has restricted access, and indeed may be stored in a separate physical location.
Many applications are developed using some OOP language in which there is generally a class like BusinessObject that contains what is perceived generally helpful information like such auditing fields. Not all subclassing entities may need it, but it's there if they do. Since the overhead of the db is small and the chances that the client may request another odd statistic based on the audit fields it's better to have them around than not to have them at all. If something represents a static list of information such as country names I generally wouldn't put it in the db at all - enumerated data type are created just for such purposes.
I come across this thread by chance, as the same question popped up in my mind this morning. Every answer has got the point and I definitely agree with all of you. It is undeniable to safeguard business data and transaction data. Instead of that, the author feels doubtful about audit fields for some configuration or static data.
This kind of configuration data are not updatable by users. Usually they can be placed in other places as well, like properties, config files or even hard-coded constants. Of course putting configuration data in these places might be bad designs or styles, but from the perspective of auditing, do they matter? In addition, if these data are updatable by users, then the only ones who can update it are either dba or hackers. Truly malicious dba or hackers will already know laws before they break the laws and they do find ways circumvent the laws.
To me, the question is more related to the environment in your company. Does your company have a culture of keeping track of every little bit of tiny information? Does your company constantly enforce strict discipline, monitoring or auditing? Having these auditing fields for non-user data are simply for their satisfaction, more than any other purposes.
In the past I've never been a fan of using triggers on database tables. To me they always represented some "magic" that was going to happen on the database side, far far away from the control of my application code. I also wanted to limit the amount of work the DB had to do, as it's generally a shared resource and I always assumed triggers could get to be expensive in high load scenarios.
That said, I have found a couple of instances where triggers have made sense to use (at least in my opinion they made sense). Recently though, I found myself in a situation where I sometimes might need to "bypass" the trigger. I felt really guilty about having to look for ways to do this, and I still think that a better database design would alleviate the need for this bypassing. Unfortunately this DB is used by mulitple applications, some of which are maintained by a very uncooperative development team who would scream about schema changes, so I was stuck.
What's the general consesus out there about triggers? Love em? Hate em? Think they serve a purpose in some scenarios?
Do think that having a need to bypass a trigger means that you're "doing it wrong"?
Triggers are generally used incorrectly, introduce bugs and therefore should be avoided. Never design a trigger to do integrity constraint checking that crosses rows in a table (e.g "the average salary by dept cannot exceed X).
Tom Kyte, VP of Oracle has indicated that he would prefer to remove triggers as a feature of the Oracle database because of their frequent role in bugs. He knows it is just a dream, and triggers are here to stay, but if he could he would remove triggers from Oracle, he would (along with the WHEN OTHERS clause and autonomous transactions).
Can triggers be used correctly? Absolutely.
The problem is - they are not used correctly in so
many cases that I'd be willing to give
up any perceived benefit just to get
rid of the abuses (and bugs) caused by
them. - Tom Kyte
Think of a database as a great big object - after each call to it, it ought to be in a logically consistent state.
Databases expose themselves via tables, and keeping tables and rows consistent can be done with triggers. Another way to keep them consistent is to disallow direct access to the tables, and only allowing it through stored procedures and views.
The downside of triggers is that any action can invoke them; this is also a strength - no-one is going to screw up the integrity of the system through incompetence.
As a counterpoint, allowing access to a database only through stored procedures and views still allows the backdoor access of permissions. Users with sufficient permissions are trusted not to break database integrity, all others use stored procedures.
As to reducing the amount of work: databases are stunningly efficient when they don't have to deal with the outside world; you'd be really surprised how much even process switching hurts performance. That's another upside of stored procedures: rather than a dozen calls to the database (and all the associated round trips), there's one.
Bunching stuff up in a single stored proc is fine, but what happens when something goes wrong? Say you have 5 steps and the first step fails, what happens to the other steps? You need to add a whole bunch of logic in there to cater for that situation. Once you start doing that you lose the benefits of the stored procedure in that scenario.
Business logic has to go somewhere, and there's a lot of implied domain rules embedded in the design of a database - relations, constraints and so on are an attempt to codify business rules by saying, for example, a user can only have one password. Given you've started shoving business rules onto the database server by having these relations and so on, where do you draw the line? When does the database give up responsibility for the integrity of the data, and start trusting the calling apps and database users to get it right? Stored procedures with these rules embedded in them can push a lot of political power into the hands of the DBAs. It comes down to how many tiers are going to exist in your n-tier architecture; if there's a presentation, business and data layer, where does the separation between business and data lie? What value-add does the business layer add? Will you run the business layer on the database server as stored procedures?
Yes, I think that having to bypass a trigger means that you're "doing it wrong"; in this case a trigger isn't for you.
I work with web and winforms apps in c# and I HATE triggers with a passion. I have never come across a situation where I could justify using a trigger over moving that logic into the business layer of the application and replicating the trigger logic there.
I don't do any DTS type work or anything like that, so there might be some use cases for using trigger there, but if anyone in any of my teams says that they might want to use a trigger they better have prepared their arguments well because I refuse to stand by and let triggers be added to any database I'm working on.
Some reasons why I don't like triggers:
They move logic into the database. Once you start doing that, you're asking for a world of pain because you lose your debugging, your compile time safety, your logic flow. It's all downhill.
The logic they implement is not easily visible to anyone.
Not all database engines support triggers so your solution creates dependencies on database engines
I'm sure I could think of more reasons off the top of my head but those alone are enough for me not to use triggers.
"Never design a trigger to do integrity constraint checking that crosses rows in a table" -- I can't agree. The question is tagged 'SQL Server' and CHECK constraints' clauses in SQL Server cannot contain a subquery; worse, the implementation seems to have a 'hard coded' assumption that a CHECK will involve only a single row so using a function is not reliable. So if I need a constraint which does legitimately involve more than one row -- and a good example here is the sequenced primary key in a classic 'valid time' temporal table where I need to prevent overlapping periods for the same entity -- how can I do that without a trigger? Remember this is a primary key, something to ensure I have data integrity, so enforcing it anywhere other than the DBMS is out of the question. Until CHECK constraints get subqueries, I don't see an alternative to using triggers for certain kinds of integrity constraints.
Triggers can be very helpful. They can also be very dangerous. I think they're fine for house cleaning tasks like populating audit data (created by, modified date, etc) and in some databases can be used for referential integrity.
But I'm not a big fan of putting lots of business logic into them. This can make support problematic because:
it's an extra layer of code to research
sometimes, as the OP learned, when you need to do a data fix the trigger might be doing things with the assumption that the data change is always via an application directive and not from a developer or DBA fixing a problem, or even from a different app
As for having to bypass a trigger to do something, it could mean you are doing something wrong, or it could mean that the trigger is doing something wrong.
The general rule I like to use with triggers is to keep them light, fast, simple, and as non-invasive as possible.
I find myself bypassing triggers when doing bulk data imports. I think it's justified in such circumstances.
If you end up bypassing the triggers very often though, you probably need to take another look at what you put them there for in the first place.
In general, I'd vote for "they serve a purpose in some scenarios". I'm always nervous about performance implications.
I'm not a fan, personally. I'll use them, but only when I uncover a bottleneck in the code that can be cleared by moving actions into a trigger. Generally, I prefer simplicity and one way to keep things simple is to keep logic in one place - the application. I've also worked on jobs where access is very compartmentalized. In those environments, the more code I pack into triggers the more people I have to engage for even the simplest fixes.
I first used triggers a couple of weeks ago. We changed over a production server from SQL 2000 to SQL 2005 and we found that the drivers were behaving differently with NText fields (storing a large XML document), dropping off the last byte. I used a trigger as a temporary fix to add an extra dummy byte (a space) to the end of the data, solving our problem until a proper solution could be rolled out.
Other than this special, temporary case, I would say that I would avoid them since they do hide what is going on, and the function they provide should be handled explictly by the developer rather then as some hidden magic.
Honestly the only time I use triggers to simulate a unique index that is allowed to have NULL that don't count for the uniqueness.
As to reducing the amount of work: databases are stunningly efficient when they don't have to deal with the outside world; you'd be really surprised how much even process switching hurts performance. That's another upside of stored procedures: rather than a dozen calls to the database (and all the associated round trips), there's one.
this is a little off topic, but you should also be aware that you're only looking at this from one potential positive.
Bunching stuff up in a single stored proc is fine, but what happens when something goes wrong? Say you have 5 steps and the first step fails, what happens to the other steps? You need to add a whole bunch of logic in there to cater for that situation. Once you start doing that you lose the benefits of the stored procedure in that scenario.
Total fan,
but really have to use it sparingly when,
Need to maintain consistency (especially when dimension tables are used in a warehouse and we need to relate the data in the fact table with their proper dimension . Sometime, the proper row in the dimension table can be very expensive to compute so you want the key to be written straight to the fact table, one good way to maintain that "relation" is with trigger.
Need to log changes (in a audit table for instance, it's useful to know what ##user did the change and when it occurred)
Some RDBMS like sql server 2005 also provide you with triggers on CREATE/ALTER/DROP statements (so you can know who created what table, when, dropped what column, when, etc..)
Honestly, using triggers in those 3 scenarios, I don't see why would you ever need to "disable" them.
The general rule of thumb is: do not use triggers. As mentioned before, they add overhead and complexity that can easily be avoided by moving logic out of the DB layer.
Also, in MS SQL Server, triggers are fired once per sql command, and not per row. For example, the following sql statement will execute the trigger only once.
UPDATE tblUsers
SET Age = 11
WHERE State = 'NY'
Many people, including myself, were under the impression that the triggers are fired on every row, but this isn't the case. If you have a sql statement like the one above that may change data in more than one row, you might want to include a cursor to update all records affected by the trigger. You can see how this can get convoluted very quickly.