I often write code that has a dependency on a database entity. I want to minimize the coupling between those different systems (or make it explicit and robust).
Example: I have a dropdown list of error categories. Users can define new categories, but one category is special in that errors belonging to it get an extra input field. So the system needs to know when the user has selected the special category, and this special category must not be allowed to just disappear.
How would you handle that special category? Would you match on the category name or id? Would you put the entity in the migration or have your code regenerate it as needed? Do you omit it from the database and have it exist only in your code? I find myself picking a new solution every time this problem presents itself, but I'm never quite satisfied with any of them.
Has anyone found a satisfactory solution? What drawbacks have you found and how have you mitigated them?
I dislike special-case code, so I would design it so that everything lives in the data model. The database would get a "can delete" field and a "has special entry" field, along with some way to describe what that special input is. I would also try not to over-design the special input handling, since there is only this one case so far.
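To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name error_category, the column names and the sample rows are all my own assumptions for illustration. I have also folded "has special entry" and its description into one nullable column (NULL means no extra field), which is only one of several ways to model it.

```python
import sqlite3

# Hypothetical schema: table, column names and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE error_category (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        can_delete INTEGER NOT NULL DEFAULT 1,  -- 0 marks a protected row
        extra_input_label TEXT                  -- NULL means no extra input field
    )
""")
conn.executemany(
    "INSERT INTO error_category (name, can_delete, extra_input_label) VALUES (?, ?, ?)",
    [("Typo", 1, None), ("Hardware fault", 0, "Serial number")],
)


def delete_category(conn, category_id):
    """Delete a category only if its row allows it; return True on success."""
    cur = conn.execute(
        "DELETE FROM error_category WHERE id = ? AND can_delete = 1",
        (category_id,),
    )
    return cur.rowcount == 1


def extra_input_label(conn, category_id):
    """Return the label of the extra input field, or None if there is none."""
    row = conn.execute(
        "SELECT extra_input_label FROM error_category WHERE id = ?",
        (category_id,),
    ).fetchone()
    return row[0] if row else None
```

The application code never compares against a hard-coded name or id: it asks the row whether it may be deleted and whether it needs an extra field.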
I was trying to implement a system where a user can save custom configurations.
My query to the teacher was "Why should I allow the user to have multiple custom configurations that are 100% the same but have different names?" My teacher responded with the example of a file system, where I can save multiple duplicate files.
I am not very convinced by this response although it is true.
I want to know why we allow the user to save duplicate files or, in my case, duplicate configurations. I believe it is just redundancy and a waste of available space, which can be avoided.
Two configurations may be the same today, but next week one of them will be changed to do something different. Until then, it is a good idea to get used to loading ConfigA for JobA, and ConfigB for JobB. They are the same now, but next week ConfigB will change.
I have an application design question concerning handling data sets in certain situations.
Let's say I have an application where I use some entities. We have an Order, containing information about the client, deadline, etc. Then we have a Service entity, which has a one-to-many relation with an Order. A Service contains its name. Besides that, we have a Rule entity, which sets rules for what to deduct from the material stock. It has a one-to-many relation with the Service entity.
Now, my question is: how do I handle the situation where I create an Order and persist it to the database with its relations, but I don't want later changes to the related entities to be visible on the generated order? I need to treat the Order and the data associated with it as a kind of log, so that removing a service from the table, or changing a set of rules, does not change the already generated orders or the services and rules that were used during the process.
Normally I would handle that by duplicating the Services and Rules and inserting them into a new table, so that the data would be independent from the data used during Order generation. The Order would simply point to the duplicated data instead of the original, which would fix my problem. But that's data duplication, and I think it's not the best way to do it.
So, if you understood my question, do you know any better idea for solving that kind of a problem? I'm sorry if what I wrote doesn't make any sense. Just tell me, and I'll try to express myself in a better way.
I've been looking into the same case recently, so I'd like to share some thoughts.
The idea is to treat each entity that requires versioning as an object and to store the object's instances in the database. For the service entity, say, this could look like:
- service table, that contains only service_id column, Primary Key;
- service_state (or ..._instance) table, that contains:
  - service_id, Foreign Key to the service.service_id;
  - state_start_dt, a moment in time when this state becomes active, NOT NULL;
  - state_end_dt, a moment in time when this state is obsoleted, NULLable;
  - all the real attributes of the service;
  - Primary Key is service_id + state_start_dt.
- for sure, state_start_dt::state_end_dt ranges cannot overlap, should be constrained.
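As a rough sketch of the layout above, here it is with Python's sqlite3 module, including a "state at a given moment" query. The single name attribute, the sample rows and the helper service_as_of are assumptions of mine. Timestamps are stored as ISO 8601 strings so that plain string comparison orders them correctly, and each state is treated as the half-open range from state_start_dt up to but not including state_end_dt. The non-overlap constraint is not enforced in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service (service_id INTEGER PRIMARY KEY);
    CREATE TABLE service_state (
        service_id     INTEGER NOT NULL REFERENCES service(service_id),
        state_start_dt TEXT NOT NULL,   -- ISO 8601 timestamp
        state_end_dt   TEXT,            -- NULL while the state is current
        name           TEXT NOT NULL,   -- stands in for the real attributes
        PRIMARY KEY (service_id, state_start_dt)
    );
""")
conn.execute("INSERT INTO service VALUES (1)")
conn.executemany(
    "INSERT INTO service_state VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01T00:00:00", "2024-06-01T00:00:00", "Basic cleaning"),
        (1, "2024-06-01T00:00:00", None, "Premium cleaning"),
    ],
)


def service_as_of(conn, service_id, moment):
    """Return the service name active at `moment` (ISO 8601 string), or None."""
    row = conn.execute(
        """
        SELECT name FROM service_state
        WHERE service_id = ?
          AND state_start_dt <= ?
          AND (state_end_dt IS NULL OR state_end_dt > ?)
        """,
        (service_id, moment, moment),
    ).fetchone()
    return row[0] if row else None
```

An order would then store only service_id plus its own creation timestamp and look up the matching state when it is displayed.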
What's good in such approach?
- You have a full history of state transitions of your essential objects;
- You can query the system as it was at a given point in time;
- New configuration can be delivered in advance by inserting the appropriate record(s) with the desired state_start_dt stamps;
- Change auditing is integrated into the design (well, a couple of extra columns are required for complete tracing).
What's wrong?
1. There will be data duplication. To reduce it, make sure to split up the instantiating relations. For example, do not create a single table for customer data; create a bunch of them for credentials, addresses, contacts, financial information, etc.
2. The real Primary Key is service.service_id, while the information is kept in a subordinate table, service_state. This can lead to a situation where your service exists although somebody has (intentionally or by mistake) removed all of its service_state records.
3. It's difficult to decide at which point in time it is safe to move state records into an offline archive. For as long as there are entities in the system that reference the service, one should check their effective dates before removing any state records.
4. Due to #3, one cannot just delete records from service_state. In fact, it is also wrong to rely on the state_end_dt column, because a service may have been active for a while and then suppressed, and querying the service for a moment when it was active should report it as active. Therefore, a status column is required.
I think that, even keeping these downsides in mind, the approach is quite nice.
Though I'd like to hear some comments from the Relational Model perspective, especially on the drawbacks of such a design.
I would recommend just duplicating the data in separate snapshot table(s). You could certainly use versioning schemes on the main table(s), but I would question how much additional complexity results in the effort to reduce duplicate data. I find that extra complexity in the data model results in a system that is much harder to extend. I would consider duplicate data to be the lesser of 2 evils here.
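A minimal sketch of the snapshot approach, again with sqlite3. The table names orders and order_service_snapshot, the columns and the helper create_order are assumptions for illustration. Only services are copied here; rules would be copied the same way into their own snapshot table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client TEXT NOT NULL);
    CREATE TABLE order_service_snapshot (
        order_id   INTEGER NOT NULL REFERENCES orders(id),
        service_id INTEGER NOT NULL,  -- informational only, deliberately no foreign key
        name       TEXT NOT NULL      -- copied value, frozen at order time
    );
""")
conn.execute("INSERT INTO service (id, name) VALUES (1, 'Engraving')")


def create_order(conn, client, service_ids):
    """Insert an order and copy the current state of its services."""
    with conn:  # one transaction: the order and its snapshot are saved together
        order_id = conn.execute(
            "INSERT INTO orders (client) VALUES (?)", (client,)
        ).lastrowid
        for service_id in service_ids:
            conn.execute(
                """
                INSERT INTO order_service_snapshot (order_id, service_id, name)
                SELECT ?, id, name FROM service WHERE id = ?
                """,
                (order_id, service_id),
            )
    return order_id


order_id = create_order(conn, "ACME", [1])
# Later edits or deletions in the live table do not touch the snapshot.
conn.execute("UPDATE service SET name = 'Laser engraving' WHERE id = 1")
conn.execute("DELETE FROM service WHERE id = 1")
snapshot = conn.execute(
    "SELECT name FROM order_service_snapshot WHERE order_id = ?", (order_id,)
).fetchall()
```

The order keeps the name the service had when the order was created, even after the live row is renamed and then deleted.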
I want to store "Tweets" and "Facebook Status" in my app as part of "Status collection" so every status collection will have a bunch of Tweets or a bunch of Facebook Statuses. For Facebook I'm only interested in text so I won't store videos/photos for now.
I was wondering about best practice for DB design. Is it better to have one table (with the maximum status length set to 420 to cover both the Facebook and Twitter limits) and a "Type" column that determines what kind of status it is, or is it better to have two separate tables? And why?
Strictly speaking, a tweet is not the same thing as a FB update. You may be ignoring non-text for now, but you may change your mind later and be stuck with a model that doesn't work. As a general rule, objects should not be treated as interchangeable unless they really are. If they are merely similar, you should either use 2 separate tables or use additional columns as necessary.
All that said, if it's really just text, you can probably get away with a single table. But this is a matter of opinion and you'll probably get lots of answers.
I would put the messages into one table and have another that defines the type:
SocialMediaMessage
------------------
id
SocialMediaTypeId
Message
SocialMediaType
---------------
Id
Name
They seem similar enough that there is no point to separate them. It will also make your life easier if you want to query across both Social Networking sites.
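For what it's worth, here is a small sqlite3 sketch of the two tables above with one query that covers both sites. The table and column names come from the layout above; the sample rows and the helper messages are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SocialMediaType (
        Id   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE SocialMediaMessage (
        id                INTEGER PRIMARY KEY,
        SocialMediaTypeId INTEGER NOT NULL REFERENCES SocialMediaType(Id),
        Message           TEXT NOT NULL
    );
    INSERT INTO SocialMediaType (Id, Name) VALUES (1, 'Twitter'), (2, 'Facebook');
""")
conn.executemany(
    "INSERT INTO SocialMediaMessage (SocialMediaTypeId, Message) VALUES (?, ?)",
    [(1, "a tweet"), (2, "a status update"), (1, "another tweet")],
)


def messages(conn, type_name=None):
    """Return (type name, message) pairs, optionally filtered by type name."""
    sql = """
        SELECT t.Name, m.Message
        FROM SocialMediaMessage AS m
        JOIN SocialMediaType AS t ON t.Id = m.SocialMediaTypeId
    """
    params = ()
    if type_name is not None:
        sql += " WHERE t.Name = ?"
        params = (type_name,)
    sql += " ORDER BY m.id"
    return conn.execute(sql, params).fetchall()
```

Calling messages(conn) returns statuses from both sites in one pass, while messages(conn, "Twitter") narrows it to one type without needing a second table or query.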
It's probably easier to use one table and use the type to tell them apart. You will only need one query/stored procedure to access the data, instead of one query for each type when you have multiple tables.
I'm building a website that lets people create vocabulary lessons. When a lesson is created, a news item is created that references the lesson. When another user practices the lesson, a reference to it is also stored for that user, together with the practice result.
My question is what to do when a user decides to remove the lesson?
The options I've considered are:
1. Actually delete the lesson from the database and remove all referencing news items, practise results etc.
2. Just flag it as deleted and exclude the link from referencing news items, results etc.
What are your thoughts? Should data never be removed, à la Facebook? Should references be avoided altogether?
By the way, I'm using Google App Engine (Python/datastore). As far as I can see, a db.ReferenceProperty is not set to None when the referenced object is deleted?
Thanks!
Where changes to data need to be audited, marking data as deleted (aka "soft deletes") helps greatly, particularly if you record the user who performed the delete and the time when it occurred. It also allows data to be "un-deleted" very easily.
Having said that, there is no reason to prevent "hard deletes" (where data is actually deleted) as an administrative function to help tidy up mistakes.
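Here is a rough sketch of that combination. Note that it uses Python's sqlite3 module as a stand-in rather than the App Engine datastore from the question, and the lesson table, the deleted_at/deleted_by columns and the helper functions are all assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lesson (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        deleted_at TEXT,   -- NULL while the lesson is live
        deleted_by TEXT    -- who performed the soft delete
    )
""")
conn.execute("INSERT INTO lesson (id, title) VALUES (1, 'French verbs')")


def soft_delete(conn, lesson_id, user):
    """Mark a live lesson as deleted, recording who did it and when."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE lesson SET deleted_at = ?, deleted_by = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (now, user, lesson_id),
    )


def undelete(conn, lesson_id):
    """Bring a soft-deleted lesson back."""
    conn.execute(
        "UPDATE lesson SET deleted_at = NULL, deleted_by = NULL WHERE id = ?",
        (lesson_id,),
    )


def hard_delete(conn, lesson_id):
    """Administrative clean-up: remove the row for good."""
    conn.execute("DELETE FROM lesson WHERE id = ?", (lesson_id,))


def live_lessons(conn):
    """Titles of lessons that have not been soft-deleted."""
    rows = conn.execute(
        "SELECT title FROM lesson WHERE deleted_at IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```

Every normal read goes through a filter like the one in live_lessons, which is the ongoing cost of soft deletes: each query has to remember the deleted_at IS NULL condition.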
Marking the data as "deleted" is simplest. If you currently have no use for it, this keeps everything in your database very tidy and makes it easy to add new functionality.
On the other hand, if you're doing something like showing the user where their "vocabulary points" came from, or how many lessons they've completed, then the reference to soft deleted items might be necessary.
I'd start with the first one and change it later if you need to. Here's why:
If you're not using soft deletes, assume they won't work in the way that future requests actually want them to. You'll have to rewrite them anyway.
If you are using them, assume that nobody is using the feature which uses them. Now you've done a lot of work and tied yourself into maintenance of something nobody cares about.
If you create them, you'll find yourself creating a feature to use them. See the above.
If you don't create them, you can always create them later, once you have better knowledge about what the users of your system really want.
Not creating soft deletes gives you more options going forward. Options have value. Options expire. Never commit early unless you know why.
I'm building an Event Registration site. For any given event, we'll have a handful of items to choose from, and I have a table for these items. For each event we might have special options for users. For example, for one of the events, new users get to buy an item which is not available to other users. This may not apply to all events; for other events we might have some other restriction on items. I will obviously be checking this programmatically on the application side. I would like, though, to set up a flag column in the items table, but I don't find it feasible because the condition may only apply to one particular event, and I don't want all future items to carry this column. What is a good approach to take in such a situation? Should I create a special "restrictions" table and just do a join? How would I handle this on the application side?
Yes, you are going to need an additional table with the list of items that have special rules.
It sounds like the 'special options' idea is still evolving, so it's probably too early to know whether to think of it as containing 'restrictions' or 'bonuses'.
And of course you'll probably need another table which maps items to particular groups of users.
General advice in this sort of situation: you should do something simple until the spec gets at least semi-frozen. I've just gone through it myself: the marketing guys had all kinds of ideas about special deals and discounts. If I had taken the time to build the perfect engine, it would have gotten tossed a month later when they changed direction.
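In the spirit of keeping it simple, the additional table could start out as small as the sketch below, using Python's sqlite3 module. Everything here is an assumption for illustration: the item and item_restriction tables, the user_group values, and the rule that an item with no restriction rows is open to everyone. It also folds the "items with special rules" list and the item-to-user-group mapping into a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id       INTEGER PRIMARY KEY,
        event_id INTEGER NOT NULL,
        name     TEXT NOT NULL
    );
    -- One row per (item, group allowed to buy it). No rows means no restriction.
    CREATE TABLE item_restriction (
        item_id    INTEGER NOT NULL REFERENCES item(id),
        user_group TEXT NOT NULL,
        PRIMARY KEY (item_id, user_group)
    );
    INSERT INTO item (id, event_id, name)
        VALUES (1, 10, 'T-shirt'), (2, 10, 'Welcome pack');
    INSERT INTO item_restriction (item_id, user_group) VALUES (2, 'new_user');
""")


def items_for(conn, event_id, user_group):
    """Names of the items a user in `user_group` may buy at an event."""
    rows = conn.execute(
        """
        SELECT i.name FROM item AS i
        WHERE i.event_id = ?
          AND (
            NOT EXISTS (SELECT 1 FROM item_restriction AS r
                        WHERE r.item_id = i.id)
            OR EXISTS (SELECT 1 FROM item_restriction AS r
                       WHERE r.item_id = i.id AND r.user_group = ?)
          )
        ORDER BY i.id
        """,
        (event_id, user_group),
    )
    return [row[0] for row in rows]
```

The items table itself gets no new column, so events without special options are unaffected, and the table can be replaced cheaply if the spec changes direction.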