Storing enum values in database - sql-server

FYI: I explicitly mean SQL Server 2000-8 and C#, so DBMSs with enum support, like MySQL, are not the subject of my question.
I know this question has been asked multiple times on SO. But still, I see in the answers that different approaches are taken to store enum values in the db.
1. Save the enum as an int in the db and extract the enum value (or the enum description attribute, using reflection) in code:
This is the approach I usually use. The problem is that when I query the database in SSMS, the retrieved data is hard to understand.
2. Save the enum as a string (varchar) in the db and cast it back to int in code.
Actually, this might be the best solution. But (don't laugh!) it doesn't feel right. I'm not sure about the cons, except for more space in the db, which is usually acceptable. So is there anything else against this approach?
3. Have a separate table in the db which is synchronized with the code's enum definition, and make a foreign key relationship between your main table and the enum table.
The problem is that when another enum value has to be added later, both the code and the db need to be updated. Also, there might be typos, which can be a pain!
So in general, when we can accept the db overhead of the 2nd solution, what would be the best way to store enum values in the db? Is there a general, definite design pattern rule about this?
Thanks.

There is no definite design rule (that I know of), but I prefer approach #1.
1. This is the approach I prefer. It's simple, and enums are usually compact enough that I start to remember what the numbers mean.
2. It's more readable, but it can get in the way of refactoring or renaming your enumeration values when you want to. You lose some freedom in your code. All of a sudden you need to get a DBA involved (depending on where/how you work) just to change an enumeration value, or suffer with it. Parsing an enum has some performance impact as well, since things like Locale come into play, but it is probably negligible.
3. What problem does that solve? You still have unreadable numbers in a table somewhere, unless you want to add the overhead of a join. But sometimes this is the correct answer too, depending on how the data is used.
EDIT:
Chris in the comments had a good point: If you do go down the numeric approach, you should explicitly assign values so you can re-order them as well. For example:
public enum Foo
{
    Bar = 1,
    Baz = 2,
    Cat = 9,
    //Etc...
}
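A minimal sketch of how approach #1 might look on the C# side, assuming the column is a plain int. The FooStorage class and its method names are hypothetical and not from the answer above; the point is only that a stored number can be checked before it is cast back.

```csharp
using System;

public enum Foo
{
    Bar = 1,
    Baz = 2,
    Cat = 9,
}

// Hypothetical helper, named here for illustration only.
public static class FooStorage
{
    // The value to write to the int column.
    public static int ToDb(Foo value) => (int)value;

    // Converts a stored int back to Foo and rejects numbers that
    // do not correspond to any declared member.
    public static Foo FromDb(int stored)
    {
        if (!Enum.IsDefined(typeof(Foo), stored))
        {
            throw new ArgumentOutOfRangeException(
                nameof(stored), stored, "Unknown Foo value in database.");
        }
        return (Foo)stored;
    }
}
```

Because the members carry explicit values, reordering or inserting members in the source does not change what ToDb writes.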

One idea I've seen before, which is more or less your option 3:
A table in the database (for foreign keys etc)
A matching Enum in the client code
A startup check (via database call) to ensure they match
The database table can have a trigger or check constraint to reduce the risk of changes. It shouldn't grant any write permissions, because the data is tied to a client code release, but the constraint adds a safety factor in case the DBA bollixes it up.
If you have other clients reading the code (which is very common) then the database has complete data.
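The startup check could be sketched roughly as follows. This assumes the lookup table's rows have already been loaded into a dictionary of id to name (the database call itself is left out), and the OrderStatus enum, the EnumTableCheck class and the FindMismatches method are all hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum standing in for the one that mirrors the table.
public enum OrderStatus
{
    Pending = 1,
    Shipped = 2,
    Cancelled = 3,
}

public static class EnumTableCheck
{
    // rows: id -> name as read from the lookup table.
    // Returns one message per difference; an empty list means code and table agree.
    public static List<string> FindMismatches<TEnum>(IDictionary<int, string> rows)
        where TEnum : struct
    {
        var problems = new List<string>();
        var seen = new HashSet<int>();

        foreach (object member in Enum.GetValues(typeof(TEnum)))
        {
            int id = Convert.ToInt32(member);
            string name = member.ToString();
            seen.Add(id);

            if (!rows.TryGetValue(id, out string dbName))
                problems.Add($"{name} ({id}) is missing from the table");
            else if (!string.Equals(dbName, name, StringComparison.Ordinal))
                problems.Add($"id {id} is '{dbName}' in the table but '{name}' in code");
        }

        // Rows that exist in the table but have no matching enum member.
        foreach (var row in rows)
        {
            if (!seen.Contains(row.Key))
                problems.Add($"table row {row.Key} ('{row.Value}') has no enum member");
        }

        return problems;
    }
}
```

An application might call FindMismatches<OrderStatus>(rows) once at startup and refuse to run, or at least log loudly, if the list is not empty.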

Related

Bad practice to have IDs that are not defined in the database?

I am working on an application that someone else wrote, and it appears that they are using IDs throughout the application that are not defined in the database. For a simplified example, let's say there is a table called Question:
Question
------------
Id
Text
TypeId
SubTypeId
Currently the SubTypeId column is populated with a set of IDs that do not reference another table in the database. In the code these SubTypeIds are mapped to a specific string in a configuration file.
In the past when I have had these types of values I would create a lookup table and insert the appropriate values, but in this application there is a mapping between the IDs and their corresponding text values in a configuration file.
Is it bad practice to define a lookup table in a configuration file rather than in the database itself?
Absolutely, yes. It brings in a heavy dependence on the code to manage and maintain references, fetch necessary values, etc. In a situation where you now need to create additional functionality, you would rely on copy-pasting the mapping (or importing them, etc.) which is more likely to cause an issue.
It's similar to why DB constraints should be in the DB rather than in the program/application that's accessing it: any maintenance or new application needs to replicate all the behaviour and rules. Having things this way has side effects similar to the ones I've mentioned in another answer.
Good reasons to have a lookup table:
Since DBs can generally naturally have these kinds of relations, it would be obvious to use them.
Without one, queries first need to be constructed in code to translate the Type and SubType text to IDs, instead of having them as part of the where/having clause of the query that is actually executed.
Speed/Performance - with the right indexes and table structures, you'd benefit from this (and reduce code complexity that manages it)
You don't need to update your code to add a new Type or SubType, or to edit/delete them.
Possible reasons it was done that way, which I don't think are valid reasons:
The TypeID and SubTypeID are related and the original designer did not know how to create a complex foreign key. (Not a good reason though.)
Another could be 'translation' but that could also be handled using foreign key relations.
In some pieces of code, there may not be a strict TypeID-to-SubTypeID relation and that logic was handled in code rather than in the DB. Again, can be managed using 'flag' values or NULLs if possible. Those specific cases could be handled by designing the DB right and then working around a unique/odd situation in code instead of putting all the dependence on the code.
NoSQL: Original designer may be under the impression that such foreign keys or relations cannot be done in a NoSQL db.
And the obvious 'people' problem vs technical challenge: The original designer may not have had a proper understanding of databases and may have been a programmer who did that application (or was made to do it) without the right knowledge or assistance.
Just to put it out there: If the previous designer was an external contractor, he may have used the code maintenance complexity or 'support' clause as a means to get more business/money.
As a general rule of thumb, I'd say that keeping all the related data in a DB is a better practice since it removes a tacit dependency between the DB and your app, and because it makes the DB more "comprehensible." If the definitions of the SubTypeIDs are in a lookup table it becomes possible to create queries that return human-readable results, etc.
That said, the right answer probably depends a bit on the specifics of the application. If there's very tight coupling between the DB and app to begin with (eg, if the DB isn't going to be accessed by other clients) this is probably a minor concern particularly if the set of SubTypeIDs is small and seldom changes.

Alternatives to isActive

Marginally related to Should I delete or disable a row in a relational database?
Given that I am going to go with the strategy of warehousing changes to my tables in a history table, I am faced with the following options for implementing a status for a given row in MySQL:
An isActive boolean
An activeStatus enum
An activeStatus INT referencing a small ActiveStatus lookup table
An activeStatus INT not referencing another table
The first approach is rather inflexible in my opinion, since I might need more booleans in the future to support other types of active statuses (I'm not sure what they would be, but maybe something like "being phased out" or "active for a random group of users", etc).
I'm told that MySQL enum is bad, so the second approach probably won't fly.
I like the third approach, but I'm wondering if it is a heavy handed solution to a relatively small problem.
The fourth approach requires that we know in advance what each status INT means and seems like an outdated way to do things.
Is there a canonical right answer? Am I ignoring another approach?
Personally I would go with your third option.
Boolean values often turn out to be more complex in reality, as you suggested. ENUMs can be nice, but they have the downside that as soon as you want to store additional information about each value - who added it, when, is it only valid for a certain time period or source system, comments etc. - it becomes difficult, whereas with a lookup table those data can easily be maintained in additional columns. ENUMs are a good tool to constrain data to certain values (like a CHECK constraint), but not such a good tool if those values have significant meaning and need to be exposed to users.
It's not entirely clear from your question if you plan to treat your history table like a fact table and use it in reports, but if so then you could consider the ActiveStatus lookup table as a dimension. In this case a table is much easier, because your reporting tool can read the possible values from the dimension table in order to let the user choose his query conditions; such tools generally don't know anything about ENUMs.
From my point of view, your 2nd approach is better if you have more than 2 statuses, because ENUM is great for data that you know will fall within a static set. But if you have only two statuses, active and inactive, then it's always better to use a boolean.
EDIT:
If you are sure that you are not going to change the values of your ENUM in the future, then it's great to use ENUM for such a field.

Enums in the DB or NO Enums in the DB

For me, the classic wisdom is to store enum values (OrderStatus, UserTypes, etc) as Lookup tables in your db. This lets me enforce data integrity in the database, preventing false or null values, etc.
However, more and more, this feels like unnecessary duplication to me. Not only do I have to create tables for these values (or have an unwieldy central lookup table), but if I want to add a value, I have to remember to add it in 2 places (or more, counting production, testing, live db's), and things can get out of sync easily.
Still I have a hard time letting go of lookup tables.
I know there are probably certain scenarios where one had an advantage over the other, but what are your general thoughts?
I've done both, but I now much prefer defining them as classes in code.
New files cost nothing, and the benefits that you seek by having it in the database should be handled as business rules.
Also, I have an aversion to holding data in a database that really doesn't change. And it seems an enum fits this description. It doesn't make sense for me to have a States lookup table, but a States enum class makes sense to me.
If it has to be maintained I would leave them in a lookup table in the DB. Even if I think they won't need to be maintained I would still go towards a lookup table so that if I am wrong it's not a big deal.
EDIT:
I want to clarify that if the Enum is not part of the DB model then I leave it in code.
I put them in the database, but I really can't defend why I do that. It just "seems right". I guess I justify it by saying there's always a "right" version of what the enums can be by checking the database.
Schema dependencies should be stored in the database itself to ensure any changes to your architecture can easily be performed transparently to the app.
I prefer enums, as they enforce early binding of values in code, so that exceptions aren't caused by missing values.
It's also helpful if you can use code generation that can bring in the associations of the integer columns to an enumeration type, so that in business logic you only have to deal with easily memorable enumeration values.
Consider it a form of documentation.
If you've already documented the enum constants properly in the code that uses the DB, do you really need a duplicate set of documentation (to use and maintain)?

How to Handle Unknown Data Type in one Table

I have a situation where I need to store a general piece of data (could be an int, float, or string) in my database, but I don't know ahead of time which it will be. I need a table (or less preferably tables) to store this unknown typed data.
What I think I am going to do is have a column for each data type, only use one for each record and leave the others NULL. This requires some logic above the database, but this is not too much of a problem because I will be representing these records in models anyway.
Basically, is there a best practice way to do something like this? I have not come up with anything that is less of a hack than this, but it seems like this is a somewhat common problem. Thanks in advance.
EDIT: Also, is this considered 3NF?
You could easily do that if you used SQLite as a database backend:
Any column in a version 3 database, except an INTEGER PRIMARY KEY column, may be used to store any type of value.
For other RDBMS systems, I would go with Philip's solution.
Note that in my line of software (business applications), I cannot think of any situation where this kind of requirement would be needed (a value with an unknown datatype). Unless the domain model was flawed, of course... I can imagine that other lines of software may incur different practices, but I suggest that you consider rethinking your overall design.
If your application can reliably convert datatypes, you might consider a single column solution based on a variable-length binary column, with a second column to track original data type. (I did a very small routine based on this once before, and it worked well enough.) Testing would show if conversion is more efficiently handled on the application or database side.
If I were to do this I would choose either your method, or I would cast everything to string and use only one column. Of course there would be another column with the type (which would probably be useful for the first method too).
For faster code I would probably go with your method.
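The single-column variant (everything cast to string, plus a second column recording the type) could be sketched like this on the application side. The StoredType enum and the TypedValue class are hypothetical names introduced for illustration, and the sketch assumes only int, double and string need to be supported.

```csharp
using System;
using System.Globalization;

// Hypothetical tag stored in the "type" column.
public enum StoredType
{
    Int = 1,
    Float = 2,
    String = 3,
}

public static class TypedValue
{
    // Produces the (type, text) pair to store in the two columns.
    public static (StoredType Type, string Text) Encode(object value)
    {
        switch (value)
        {
            case int i:
                return (StoredType.Int, i.ToString(CultureInfo.InvariantCulture));
            case double d:
                // "R" asks for a round-trippable representation.
                return (StoredType.Float, d.ToString("R", CultureInfo.InvariantCulture));
            case string s:
                return (StoredType.String, s);
            default:
                throw new ArgumentException("Unsupported type: " + value?.GetType(), nameof(value));
        }
    }

    // Rebuilds the original value from the two columns.
    public static object Decode(StoredType type, string text)
    {
        switch (type)
        {
            case StoredType.Int:
                return int.Parse(text, CultureInfo.InvariantCulture);
            case StoredType.Float:
                return double.Parse(text, CultureInfo.InvariantCulture);
            case StoredType.String:
                return text;
            default:
                throw new ArgumentOutOfRangeException(nameof(type));
        }
    }
}
```

Using the invariant culture on both sides keeps the stored text independent of the server's or client's regional settings.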

Enum storage in Database field

Is it best to store the enum value or the enum name in a database table field?
For example, should I store 'TJLeft' as a string or as its equivalent value in the database?
Public Enum TextJustification
    TJLeft
    TJCenter
    TJRight
End Enum
I'm currently leaning towards the name, as someone could come along later and explicitly assign a different value.
Edit -
Some of the enums are under my control but some are from third parties.
Another reason to store the numeric value is if you're using the [Flags] attribute on your enumeration in cases where you may want to allow for multiple enumeration values. Say, for example you want to let someone pick what days of the week that they're available for something...
[Flags]
public enum WeekDays
{
    Monday = 1,
    Tuesday = 2,
    Wednesday = 4,
    Thursday = 8,
    Friday = 16
}
In this case, you can store the numeric value in the db for any combination of the values (for example, 3 == Monday and Tuesday).
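As a rough illustration of the point above, the combined value can be written to an int column and tested with a bitwise AND when it is read back. The Availability class and its method names are hypothetical; the enum is repeated from the answer so the sketch is self-contained.

```csharp
using System;

[Flags]
public enum WeekDays
{
    Monday = 1,
    Tuesday = 2,
    Wednesday = 4,
    Thursday = 8,
    Friday = 16
}

// Hypothetical helper, named here for illustration only.
public static class Availability
{
    // The number to store in the int column for any combination of days.
    public static int ToDb(WeekDays days) => (int)days;

    // True when the stored combination includes the given day.
    public static bool IsAvailable(int stored, WeekDays day) =>
        ((WeekDays)stored & day) == day;
}
```

For example, Availability.ToDb(WeekDays.Monday | WeekDays.Tuesday) yields 3, and Availability.IsAvailable(3, WeekDays.Friday) is false.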
I always use lookup tables consisting of the fields
OID int (pk) as the numeric value
ProgID varchar (unique) as the value's identifier in C# (i.e. const name, or enum symbol)
ID nvarchar as the display value (UI)
dbscript lets me generate C# code from my lookup tables, so my code is always in sync with the database.
For your own enums, use the numeric values, for one simple reason: it allows for every part of enum functionality, out of the box, with no hassle. The only caveat is that in the enum definition, every member must be explicitly given a numeric value, which can never change (or, at least, not after you've made the first release). I always add a prominent comment to enums that get persisted to the database, so people don't go changing the constants.
Here are some reasons why numeric values are better than string identifiers:
It is the simplest way to represent the value
Database searching/sorting is faster
Lower database storage cost (which could be a serious issue for some applications)
You can add [Flags] to your enum and not break your code and/or existing data
For [Flags] stored in a string field:
Poorly normalized data
Could generate false-positive anomalies when doing matching (i.e., if you have members "Sales" and "RetailSales", merely doing a substring search for "Sales" will match on either type). This has to be constrained either by using a regex on word boundaries (finicky using databases, and slow), or constraining in the enum itself, which is nonstandard, error-prone, and very difficult to debug.
For string fields (either [Flags] or not), if the database is obfuscated, this field has to be handled, which greatly affects the ability and efficiency when doing searching/sorting code, as mentioned in the previous point
You can rename any of the members without breaking the database code and/or existing client data.
Less over-the-wire data transfer space/time needed
There are only two situations where using the member names in the database may be an advantage:
If you're doing a lot of data editing manually... but who does that? And if you are, there's a good chance you're not going to be using an enum anyway.
Third-party enums where they may not be so diligent as to maintain the numeric value constants. But I have to say, anyone releasing a decently-written API is overwhelmingly likely to be smart enough to keep the enum values constant. (The identifiers have to stay the same since changing them would break existing code.)
On lookup tables, which I strongly discourage because they are a one-way bullet train to a maintenance nightmare:
Adding [Flags] functionality requires the use of a junction table, which means more complicated queries (existing ones need to be rewritten), and added complexity. What about existing client data?
If the identifier is stored in the data table, what's the point of having a lookup table in the first place?
If the numeric value is stored in the data table, you gain nothing since you still have to look up the identifier from the lookup table. To make it easier, you could create a view... for every table that has an enum value in it. And then let's not even think about [Flags] enums.
Introducing any kind of synchronization between database and code is just asking for trouble. What about existing client data?
Store an ID (value) and a varchar name; this lets you query on either way. Searching on the name is reasonable if your IDs (values) may get out of sync later.
It is better to use the integer representation... If you have to change the Enum later (add more values etc) you can explicitly assign the integer value to the Enum value so that your Enum representation in code still matches what you have in the database.
It depends on how important performance is versus readability. Databases can index numeric values a lot easier than strings, which means you can get better performance without using as much memory. It would also reduce the amount of data going across the wire somewhat. On the other hand, when you look at a numeric value in your database which you then have to refer to a code file to translate, that can be annoying.
In most cases, I'd suggest using the value, but you will need to make sure you're explicitly setting those values so that if you add a value in the future it doesn't shift the references around.
As is often the case, it depends on many things:
Do you want to sort by the natural order of the enums? Use the numeric values.
Do you work directly in the database using a low-level tool? Use the name.
Do you have huge amounts of data and performance is an issue? Use the number.
For me, the most important issue is usually maintainability:
If your enums change in the future, names will either match correctly or fail hard and loud. With numbers, someone can add an enum instance, changing all the numbers of all enums, so you have to update all the tables where the enum is used, with almost no way to know if you missed a table.
If you are trying to get the values of an enum stored in the database back, then try this:
EnumValue = DirectCast([Enum].Parse(GetType(TextJustification), reader.Item("put_field_name_here").ToString), TextJustification)
Tell me if it works for you.
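For comparison, a C# version of reading a stored name back might look like the sketch below. It uses Enum.TryParse rather than Enum.Parse, so an unknown name yields null instead of an exception; the JustificationReader class and FromName method are hypothetical names, and the enum is restated in C# from the question.

```csharp
using System;

public enum TextJustification
{
    TJLeft,
    TJCenter,
    TJRight
}

// Hypothetical helper, named here for illustration only.
public static class JustificationReader
{
    // Returns the matching member, or null when the stored text is not a member name.
    public static TextJustification? FromName(string stored)
    {
        // TryParse also accepts numeric strings such as "7", so the
        // IsDefined check filters out numbers with no declared member.
        if (Enum.TryParse(stored, ignoreCase: false, out TextJustification parsed)
            && Enum.IsDefined(typeof(TextJustification), parsed))
        {
            return parsed;
        }
        return null;
    }
}
```

Whether to return null or throw on an unknown name is a design choice; throwing matches the "fail hard and loud" behaviour described earlier in this thread.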
