Are unique constraints on the DB necessary?

I've been wondering lately. Let's say we're doing a webapp with JSF+Spring+JPA/Hibernate (or any other technologies) and let's say we have a "User" entity. We want the User to have a unique login. If we want that, we can put a @UniqueConstraint on the "Login" column, but our application still has to check during user registration whether the user's input is valid (unique) or not; without that check, with only the DB constraint in place, we will simply get an error. This made me think: are DB constraints really necessary/helpful? The only two times I can think of when they would give us any benefit would be when someone tries to hack us (but I guess our app should be SQL-injection proof anyway) or when we try to change the DB content manually (which shouldn't really happen). Actually, now that I think about it, are DB constraints in general necessary/good practice? Like the length of a string, etc.
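For reference, the mapping in question looks roughly like this; a minimal sketch using standard JPA annotations (the table and column names here are illustrative, not taken from the question):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

@Entity
@Table(name = "users",
       uniqueConstraints = @UniqueConstraint(columnNames = "login"))
public class User {

    @Id
    @GeneratedValue
    private Long id;

    // unique = true on @Column would declare the same single-column
    // constraint; @UniqueConstraint also covers multi-column cases.
    @Column(name = "login", nullable = false, length = 50)
    private String login;

    // getters/setters omitted
}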

For me, categorically yes; see Database as a Fortress by Dan Chak, from 97 Things Every Software Architect Should Know. He says it much better than I could.

Yes, they are. They enforce data integrity at the lowest level.
You might want to change DB content manually (e.g. upgrades to a new version).
You might forget to check some constraint in your code.
You can look at this like client/server validation. Your program is the client, the DB is the server. Mostly client-side validation is enough, but you must have server-side validation just in case something goes wrong.

I think a data person would say both are absolutely necessary. Your question assumes that your middle tier application code will be in front of that database now and forever.
The truth is that middle tier applications come and go, but data lives forever.
There's no getting away from column lengths in schema design. I think you're asking whether it's good practice for the middle tier to enforce them. Maybe not, but they're key for the database.

Often when you declare a set of columns to be unique, it's something that you will want to query by - so it should most likely be indexed anyway.
Yes, your application should do the appropriate checking, but what if a mistake slips through? If your database knows something is meant to be unique, at least you know you won't store invalid data (or at least not "badly" invalid data, like duplicates of data intended to be unique). At any rate, you could ask the opposite question: what does it cost you?

If you want to have bad data, then take off the unique constraints in the database. I have worked around databases since the 1970s and have queried or imported data stored in hundreds of databases. I have never seen good data in a database where the constraints were (improperly) enforced only at the application level. Many things other than the application hit the database (imports from other systems, a quick update to prod data to fix a data issue run from the query window, other applications, etc.). Many times the application is replaced and the constraints are lost.

Constraints, both unique and foreign, not only enforce data integrity; they also have performance implications. Without knowledge of these constraints (if they remain 'informal', enforced only in application code), database optimizers are going to make unpredictable decisions about how to execute your statements.

but still our application has to check during user registration whether the user input is valid (unique) or not, without that and only with the DB constraint we will simply get an error.
That's the funniest thing I've ever read. You simply get an error? That's really all you need, right? Ever heard of error trapping? Does try/catch ring any bells?
It's actually very counterproductive for your app to "check". The database is going to check anyway because of the constraint, so why do it twice? The app should just insert the row as if it will be fine. The DB will check for uniqueness and raise an error on failure... Your app should CATCH that error and do whatever you were doing in your existing check.

Yes. Just catch the key violation error and the key constraint will have done the work for you, without the overhead of an additional check first.
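To make the catch-the-error approach concrete, here is a minimal sketch in plain JDBC. The table and column names are illustrative, and note that SQLIntegrityConstraintViolationException is only raised by drivers that map to the categorized JDBC exceptions; with other drivers you may need to inspect SQLException.getSQLState() instead.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class UserDao {

    private final Connection connection;

    public UserDao(Connection connection) {
        this.connection = connection;
    }

    // Returns true if the user was created, false if the login was taken.
    public boolean tryCreateUser(String login) throws SQLException {
        String sql = "INSERT INTO users (login) VALUES (?)";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setString(1, login);
            stmt.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException e) {
            // The unique constraint already did the check for us; treat the
            // violation as a normal "login already taken" outcome.
            return false;
        }
    }
}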

Related

How to handle NOT NULL SQL Server columns in Access forms elegantly?

I have an MS Access front-end linked to a SQL Server database.
If some column is required, then the natural thing to do is to include NOT NULL in that column's definition (at the database level). But that seems to create problems on the Access side. When you bind a form to that table, the field bound to that column ends up being pretty un-user-friendly. If the user erases the text from that field, they will not be able to leave the field until they enter something. Each time they try to leave the field while it's blank, they will get this error:
You tried to assign the Null value to a variable that is not a Variant data type.
That's a really terrible error message - even for a developer, let alone the poor user. Luckily, I can silence it or replace it with a better message with some code like this:
Private Sub Form_Error(DataErr As Integer, Response As Integer)
    If DataErr = 3162 Then
        Response = acDataErrContinue
        '<check which field is blank>
        MsgBox "<some useful message>"
    End If
End Sub
But that's only a partial fix. Why shouldn't the user be able to leave the field? No decent modern UI restricts focus like that (think web sites, phone apps, desktop programs - anything, really). How can we get around this behavior of Access with regard to required fields?
I will post the two workarounds I have found as an answer, but I am hoping there are better ways that I have overlooked.
Rather than changing backend table definitions or trying to "trick" Access with out-of-sync linked table definitions, just change the control(s) for any NOT NULL column from bound to unbound (i.e. clear the ControlSource property and rename the control, for example by adding a prefix, to avoid annoying collisions with the underlying field name).
This solution will definitely be less "brittle", but it will require you to manually add binding code to a number of other form events. To provide an experience consistent with other Access controls and forms, I would at least implement Form_AfterInsert(), Form_AfterUpdate(), Form_BeforeInsert(), Form_BeforeUpdate(), Form_Current(), Form_Error(), and Form_Undo().
P.S. Although I do not recall seeing such a poorly-worded error message before, the overall behavior described is identical for an Access table column with Required = True, which is the Access UI equivalent of NOT NULL column criteria.
I would suggest, if you can, simply changing all tables on SQL Server to allow nulls for those text columns. For bit and number columns, default them to 0 on the SQL Server side. Our industry tends to suggest avoiding nulls, and many a developer ALSO wants to avoid nulls, so they un-check "allow nulls" on the SQL Server side. The problem is that you can never run away from and avoid tons of nulls anyway. Take a simple query of, say, customers and their last invoice number + invoice total. It would of course be VERY common to include customers that have not bought anything in that list (customers without invoices yet, or customers in any of a gazillion possible cases where the child record(s) don't yet exist). I find about 80% or MORE of my queries in a typical application are LEFT joins. So that means any parent record without child records will return ALL OF those child columns as null. You are going to work with, see, and HAVE to deal with tons and tons of nulls in an application EVEN if your table designs NEVER allow nulls. You cannot avoid them - you simply cannot run away from those nasty nulls.
Since one will see lots of nulls in code and in any SQL query (those VERY common LEFT joins), by far and away the best solution is to simply set all text columns to allow nulls. I can also state that if an application designer does not put their foot down and make a strong choice to ALWAYS use nulls, then the creeping in of both NULL and ZLS (zero-length string) data is a much worse issue to deal with.
The problem and issue becomes very nasty and painful if one does not have control or one cannot make this choice.
At the end of the day, Access simply does not work well with SQL Server when ZLS columns are allowed.
For a migration to SQL Server (and I have been doing them for 10+ years), it is without question that going with nulls for all text columns is by far and away the easiest choice here.
So I recommend that you not attempt to code around this issue but simply change all your sql tables to default to and allow nulls for empty columns.
The result of the above may require some minor modifications to the application, but the pain and effort are going to be far less than attempting to fix or code around Access's poor support (actually non-support) of ZLS columns when working with SQL Server.
I will also note that this suggestion is not a great suggestion; it is simply the best suggestion given the limitations of how Access works with SQL Server. Some database systems (Oracle) have an overall behavior in which every ZLS is converted to null, and thus you don't have to care about, say, this:
select * from tblCustomers where (City is null) or (City = "")
As the above shows, the instant you allow both ZLS and nulls into your application is the same instant you have created a huge monster mess. And the scholarly debate about nulls being un-defined is simply a debate for another day.
If you are developing with Access + SQL Server, then one needs to adopt a standard approach. I recommend that approach simply be that all text and date columns are set to allow nulls. For number and bit columns, default them to 0.
This comes down to which is less pain and work.
Either attempt some MAJOR modifications to the application, such as un-binding text columns (that can be a huge amount of work).
Or
Simply assume and set all text columns to allow nulls. It is the lesser of two evils in this case, and one has to conform to the bag of tools that has been handed to you.
So I don't have a workaround, only a path and course to take that will result in the least amount of work and pain. That least-pain road is to go with allowing nulls. This suggestion will only work, of course, if one can make that choice.
The two workarounds I have come up with are:
Don't make the database column NOT NULL and rely exclusively on Access forms for data integrity rather than the database. Readers of that table will be burdened with an ambiguous column that will not contain nulls in practice (as long as the form-validation code is sound) but could contain nulls in theory due to the way the column is defined within the database. Not having that 100% guarantee is bothersome but may be good enough in reality.
Verdict: easy but sloppy - proceed with caution
Abuse the fact that Access' links to external tables have to be refreshed manually. Make the column NULL in SQL Server, refresh the link in Access, and then make the column NOT NULL again in SQL Server - but this time, don't refresh the link in Access.
The result is that Access won't realize the field is NOT NULL and, therefore, will leave the user alone. They can move about the form as desired without getting cryptic error 3162 or having their focus restricted. If they try to save the form while the field is still blank, they will get an ODBC error stemming from the underlying database. Although that's not desirable, it can be avoided by checking for blank fields in Form_BeforeUpdate() and providing the user with an intelligible error message instead.
Verdict: better for data integrity but also more of a pain to maintain, sort of hacky/astonishing, and brittle in that if someone refreshes the table link, the dreaded error / focus restriction will return - then again, that worst-case scenario isn't catastrophic because the consequence is merely user annoyance, not data-integrity problems or the application breaking

GUID vs. auto-increment (comfort-wise)

A while ago, my sysadmin restored my database by mistake to a much earlier point.
After 3 hours we noticed this, and during this time 80 new rows (auto-increment, with foreign-key dependencies) were created.
So at this point we had 80 different customers with the same IDs in two tables that needed to be merged.
I don't remember how, but we resolved it; it took a long time.
Now I am designing a new database, and my first thought is to use a GUID index, even though this use case is rare.
My question: how do you get along with such a long string as your ID?
I mean, when 2 programmers are talking about a customer, it is possible to say:
"Hey. We have a problem with client 874454".
But how do you keep it that simple with a GUID? This is really a problem that can cause some trouble and miscommunication.
Thanks
GUIDs can create more problems than they solve if you are not using replication. First, you need to make sure they aren't the clustered index (which is the default for the PK, in SQL Server at least) because you can really slow down insert performance. Second, they are longer than ints and thus not only take up more space but also make joins slower. Every join in every query.
You are going to create a bigger problem trying to solve a rare occurrence. Instead, think of ways to set things up so that you don't take hours to recover from a mistake.
You could create an auditing solution. That way you can easily recover from all sorts of missteps. And write the code in advance to do the recovering. Then it is relatively easy to fix when things go wrong. Frankly I would never allow a database that contains company critical data to be set up without some form of auditing. It's just too dangerous not to.
Or you could even have a script ready to go that moves records to a temporary place and then reinserts them with a new identity (and updates the identities on the child records to the new one). You did this once; the DBA should have created a script (and put it in source control) so it is available the next time you need to do a similar fix. If your DBA is so incompetent that he doesn't create and save these sorts of scripts, then get rid of him and hire someone who knows what he is doing.
Just show a prefix in most views. That's what DVCSs do, since most of them identify objects by a hex-encoded hash.
(OTOH, I know it's fashionable in many circles to use UUIDs for primary keys; but it would take a lot more than a few scary stories to convince me)
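The prefix idea can be as simple as this Java sketch, analogous to git's abbreviated commit hashes (the 8-character cutoff is an arbitrary choice; lengthen it if prefixes start to collide):

import java.util.UUID;

public class IdDisplay {

    // Shorten a GUID for human-facing views, like git's short hashes.
    static String shortId(UUID id, int length) {
        String hex = id.toString().replace("-", "");
        return hex.substring(0, Math.min(length, hex.length()));
    }

    public static void main(String[] args) {
        UUID customerId = UUID.randomUUID();
        // Programmers can now say "we have a problem with client 38400000"
        // and look up the full GUID by prefix search when needed.
        System.out.println("client " + shortId(customerId, 8));
    }
}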

Best practices for handling unique constraint violation at UI level

While working on my application I came across a situation in which there is a likely chance of a unique constraint violation. I have the following options:
Catch the exception and throw it back to the UI
At the UI, check for the exception and show an appropriate error message
A different idea is to check in advance for the existence of the given unique value before starting the whole operation
My question is: what might be the best practice to handle such a situation? Currently we are using a combo of Struts2 + Spring 3.x + Hibernate 3.x.
Thanks in advance
edit
In case we decide to let the database give the final verdict, we will handle the exception, propagate it to the UI, and show a message according to the exception. What do you suggest: should we propagate the same exception (org.hibernate.exception.ConstraintViolationException) to the UI layer, or should we create a separate exception class for this, since propagating the Hibernate exception to the UI means polluting the UI classes with Hibernate-specific imports and other things?
The best way to answer this question is to split it into two ideas.
1) Where is the unique constraint ultimately enforced? In this case (from your question) the answer is the database.
2) How can we make the user experience better by checking the constraint in other places?
Because the database will ultimately make the decision within a transaction, there is no useful check you can make ahead of time. Even if you check before inserting, it is possible (though usually highly unlikely) that another user inserts that value in time between the check and the actual insert.
So let the database decide and bubble the error back up to the UI.
Note that this is not always true for all constraints. When checking foreign keys for small tables (such as a table of US States or Countries or Provinces), the UI provides the user with a selection list, which forces the user to pick an allowed value. In this case the UI really is enforcing the constraint. Though of course even in that case the database must make the final enforcement, to protect against malicious hand-crafted requests to the web layer that are trying deliberately to put in invalid values.
So, for some constraints, yes, let the UI help. But for unique constraints, the UI really cannot help because the database is the final authority, and there is no useful check you can make that you can guarantee will still be true when you make the insert.
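On the question in the edit: a common way to keep Hibernate out of the UI layer is to translate the exception at the service boundary into a domain-specific one. A minimal sketch; the User, UserDao, and DuplicateEntryException names here are made up for illustration:

import org.hibernate.exception.ConstraintViolationException;

// Assumed collaborators, sketched only as far as needed for the example.
class User {
    private final String login;
    User(String login) { this.login = login; }
    String getLogin() { return login; }
}

interface UserDao {
    void save(User user); // assumed to flush and surface constraint errors
}

// A domain-level exception the UI can depend on without importing Hibernate.
class DuplicateEntryException extends RuntimeException {
    DuplicateEntryException(String message, Throwable cause) {
        super(message, cause);
    }
}

public class UserService {

    private final UserDao userDao;

    public UserService(UserDao userDao) {
        this.userDao = userDao;
    }

    public void register(User user) {
        try {
            userDao.save(user);
        } catch (ConstraintViolationException e) {
            // Translate at the service boundary so the Struts2/UI layer only
            // ever sees the domain exception, never Hibernate-specific types.
            throw new DuplicateEntryException(
                    "Login '" + user.getLogin() + "' is already taken", e);
        }
    }
}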
Depends on the UI and if the user can do anything about it, as well as what else is going on in the system. I usually check before attempting an insert, especially if there's any sort of transactional logic, or other inserts happening after this one. If there's nothing like that, and the user can just pick a different number to put in, then catching the exception and displaying an error message might be just fine.

Synchronizing one or more databases with a master database - Foreign keys

I'm using Google Gears to be able to use an application offline (I know Gears is deprecated). The problem I am facing is the synchronization with the database on the server.
The specific problem is the primary keys or, more exactly, the foreign keys. When sending the information to the server, I could easily ignore the primary keys and generate new ones. But then how would I know what the relations are?
I had one solution in mind, but then I would need to save all the PKs for every client. What is the best way to synchronize multiple clients with one server DB?
Edit:
I've been thinking about it, and I guess sequential primary keys are not the best solution, but what other possibilities are there? Time-based doesn't seem right because of the collisions that could happen.
A GUID comes to mind; is that an option? It looks like generating a GUID in JavaScript is not that easy.
I could do something with natural keys or composite keys. As I'm thinking about it, that looks like the best solution. Can I expect any problems with that?
This is not quite a full answer, but might at least provide you with some ideas...
The question you're asking (and the problem you're trying to address) is not specific to Google Gears, and will remain valid with other solutions, like HTML5 or systems based on Flash/AIR.
There was a presentation on that subject given during the last ZendCon a few months ago -- and the slides are available on SlideShare: Planning for Synchronization with Browser-Local Databases
Going through those slides, you'll see notes about a couple of possibilities that might come to mind (some did actually come to your mind, or appear in other answers):
Using GUID
Composite Keys
Primary key pool (i.e. reserve a range of keys beforehand)
Of course, for each one of those, there are advantages... and drawbacks -- I will not copy-paste them : take a look at the slides ;-)
Now, in your situation, which solution will be best? Hard to say, actually -- and the sooner you think about synchronisation, the better/easier it'll probably be: adding stuff into an application is so much simpler when that application is still in its design stage ^^
First, it might be interesting to determine whether :
Your application is generally connected, and being dis-connected only rarely happens
Or if your application is generally dis-connected, and only connects once in a while.
Then, what are you going to synchronise ?
Data ?
Like "This is the list of all commands made by that user"
With that data replicated on each dis-connected device, of course -- which can each modify it
In this case, if one user deletes a line, and another one adds a line, how to know which one has the "true" data ?
Or actions made on those data ?
Like "I am adding an entry in the list of commands made by that user"
In this case, if one user deletes a line, and another one adds a line, it's easy to synchronize, as you just have to synchronise those two actions to your central DB
But this is not quite easy to implement, especially for a big application/system: each time an action is made, you have to kind of log it!
There is also a specific problem we don't generally think about -- until it happens: especially if your synchronisation process can take some time (if you have a lot of data, or if you don't synchronise often, ...), what if the synchronisation is stopped before it's finished?
For instance, what if :
A user, in a train, has access to the network, with some 3G card
The synchronisation starts
There is a tunnel -- and the connection is lost.
Having half-synchronised data might not be that good, in most situations...
So, you have to find a solution to that problem, too : in most cases, the synchronisation has to be atomic !
I've come up with the following solution:
Every client gets a unique id from the server. Everywhere a primary key is referenced, I use a composite key with the client id and an auto increment field.
This way, the combination is unique, and it's easy to implement. The only thing left is making sure every client does get a unique id.
I just found out one drawback: SQLite doesn't support autoincrement on composite primary keys, so I would have to handle the IDs myself.
I would use a similar setup to your latest answer. However, to get around your auto-increment issue, I would use a single auto-increment surrogate key in your master database and then store the client primary key and your client id as well. That way you are not losing or changing any data in the process and you are also tracking which client the data was originally sourced from.
Be sure to also set up a unique index on (client PK, client ID) to enable referential integrity from any child tables.
Is there a reasonable limit to how many objects the client can create while disconnected?
One possibility I can see is to create a sort of "local sequence".
When your client connects to the central server, it gets a numeric ID, say a 7 digit number (the server generates it as a sequence).
The actual PKs are created as strings like this: 895051|000094 or 895051|005694 where the first part is the 7 digit number sent from the server, and the second part is a "local" sequence managed by the client.
As soon as you synch with the central server, you can get a new 7-digit number and restart your local sequence. This is not too different from what you were proposing, all in all. It just makes the actual PK completely independent of the client identity.
Another bonus is that if you have a scenario where the client has never connected to the server, it can use 000000|000094 locally, request a new number from the server, and update the keys on its side before sending back to the server for synch (this is tricky if you have lots of FK constraints, though, and might not be feasible).
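Something like this Java sketch captures the scheme from these last two answers (the widths and "|" separator mirror the 895051|000094 example; persisting the counter across restarts is omitted and would be needed in practice):

public class LocalKeyGenerator {

    private final int clientId;  // number issued by the server, e.g. 895051
    private long localSequence;  // "local sequence" managed by the client

    public LocalKeyGenerator(int clientId, long lastUsedSequence) {
        this.clientId = clientId;
        this.localSequence = lastUsedSequence;
    }

    // Next key, e.g. "895051|000095"; unique across clients because the
    // server hands out distinct client ids.
    public synchronized String nextKey() {
        localSequence++;
        return String.format("%d|%06d", clientId, localSequence);
    }
}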

Should I check for DB constraints in code or should I catch exceptions thrown by DB

I have an application that saves data into a table called Jobs. The Jobs table has a column called Name which has a UNIQUE constraint. The Name column is not the PRIMARY KEY. I wonder if I should check for duplicate entries myself before I try to save/update an entry, or if it's better to wait for an exception thrown by the data access layer. I'm using NHibernate for this app, if that's of any importance.
Thanks to everybody for the great input.
I have found one more reason why I should validate in code and not just wait for an exception to be thrown (and caught by my code). It seems that NHibernate will only throw an NHibernate.Exceptions.GenericADOException, which is not very informative regarding the cause of the exception in this case. Or am I missing an aspect of NHibernate here?
The answer is: both.
If your database has constraints it can guarantee certain invariants about the data, such as uniqueness. This helps in several ways:
If you have a bug in your application, violating the constraint will flag something that might otherwise not be noticed.
Other users of the database can assume more about the behaviour of the data, as the DBMS enforces invariants.
The database protects itself from incorrect updates that violate the constraints.
If you find you have some other system or interface populating the database down the track, the constraints enforced by the database mean that anything caught by the constraints won't (or at least is less likely to) break your system.
Applications and databases live in an M:M relationship in any but the most trivial cases. The application should still have the appropriate data and business-rule validations, but you should not plan for your application being the only customer of the data. Work in data warehousing for a few years and you'll see the effects of applications designed by people with this mindset.
If your design is good (both database and BL), the database shouldn't have any constraints that wouldn't be dealt with in the BL - i.e. you shouldn't be presenting the database with inconsistent data. But nothing is perfect.
I've found that confining the database to data consistency constraints lets me handle all BL validation in procedural code, and the only cases where I experience database exceptions are design and coding errors which can (and should be) fixed.
In your case, checking the name for uniqueness is data content validation, properly handled in code. Which presumably catches the error nearest the point of commission, where you hopefully have friendlier UI resources to call on without introducing undesirable coupling between abstractions.
I would leave that work entirely to the database; your code should focus on catching and properly handling the exception.
Reasons:
Performance - the database will be highly optimized to enforce constraints in a fast and efficient way. You won't have time to optimize your code as well.
Maintainability - if the constraints change in the future, you won't have to modify your code, or perhaps you will just have to add a new catch{}. If a constraint is dropped, you won't have to touch your code at all.
If you are going to check the constraints yourself, do it in the data access layer. Nothing above that layer should know anything about your database or its constraints.
In most cases I'd say leave it to the DAL to catch DB-originated exceptions. But in your specific case, I think we're talking about basic input validation. I'd opt for a name availability check call to the database, before submitting the whole form.
You should definitely check for any exception thrown by the data access layer. The problem with checking whether there is a record with the same value is that it requires you to lock the table for modifications until you insert the new record, to prevent race conditions.
It is generally advisable to check for exceptions/errors, even if you have checked everything yourself before. There is almost always something that can go wrong or which you haven't considered in your code but is enforced by the database.
Edit: If I understand the question right, it is not about whether the constraint should be enforced by the database or not, but about how to deal with it in the application code. Of course you should always set up all constraints in the database to prevent bad data from entering it.
The question that you need to answer is:
"Do I need to present the user with nice messages". Example: There is already a Job with the name TestJob1.
If the answer is No, just catch the error and present a common message
If the answer is Yes, keep reading
If you catch the error after the insert, there isn't enough information to present the right message (at least in a DB-agnostic way).
On the other hand, there can be race conditions: you can have simultaneous transactions trying to insert the same data. Therefore you need the DB constraint.
An approach that works well (sketched in code below) is:
check beforehand, to present a nice message
catch the exception anyway, and present a common error message (assuming this won't happen very frequently)
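A condensed sketch of this two-step approach. Since the earlier examples on this page are in Java, it is written against Java Hibernate's ConstraintViolationException, though the poster uses NHibernate and the same shape applies there; the JobDao collaborator is made up for illustration:

import org.hibernate.exception.ConstraintViolationException;

public class JobService {

    // Assumed DAO, sketched only as far as needed here.
    interface JobDao {
        boolean existsByName(String name);
        void save(String name);
    }

    private final JobDao jobDao;

    public JobService(JobDao jobDao) {
        this.jobDao = jobDao;
    }

    // Returns null on success, or a user-facing error message.
    public String createJob(String name) {
        // Step 1: pre-check, purely to present a nice message in the common case.
        if (jobDao.existsByName(name)) {
            return "There is already a Job with the name " + name + ".";
        }
        try {
            jobDao.save(name);
            return null;
        } catch (ConstraintViolationException e) {
            // Step 2: the DB constraint backstops the race where another
            // transaction inserted the same name after our check.
            return "That Job name was just taken; please choose another.";
        }
    }
}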
Personally I'd catch the exception. It's much simpler and requires much less code.
The inner exception of the GenericADOException will tell you why the database action failed. You can catch the OracleException / MSSQLException / [InsertCustomExceptionHere] and handle the error from that message. If you want to pass this back up to the front end (assuming the user is the one who entered duplicate data), you might want to wrap it in a custom exception first so you don't couple your front end to your database. You don't really want to be passing RDBMS-specific exceptions around.
I disagree with checking the DB for uniqueness before doing an insert; round-tripping to the database twice isn't very efficient and certainly isn't scalable if you have a high volume of user traffic.
