Help determining Maintenance Item table structure best practice - sql-server

I have two different existing database designs in the legacy apps I inherited. On one side they created two tables: tblAdminMaintCategory and tblAdminMaintItems. On the other side there are individual tables for everything, e.g. tblAdminPersonTitle and tblAdminRace.
Method #1
In the first example there would be an entry in tblAdminMaintCategory for Race (say its ID is 2), and each individual race would have an entry in tblAdminMaintItems with the corresponding CategoryID. A query for the race options would then be --> SELECT * FROM tblAdminMaintItems WHERE CategoryID = 2
Method #2
In the second example there would just be an entry in tblAdminRace for each individual race option, and the query would simply be --> SELECT * FROM tblAdminRace
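To make the two layouts concrete, roughly the structures look like this (the column names beyond the IDs are approximations for illustration, not the exact legacy schema):

-- Method #1: one shared category/items pair
CREATE TABLE tblAdminMaintCategory (
    CategoryID   INT IDENTITY PRIMARY KEY,
    CategoryName VARCHAR(50) NOT NULL      -- e.g. 'Race', 'PersonTitle'
);

CREATE TABLE tblAdminMaintItems (
    ItemID     INT IDENTITY PRIMARY KEY,
    CategoryID INT NOT NULL REFERENCES tblAdminMaintCategory (CategoryID),
    ItemText   VARCHAR(100) NOT NULL
);

-- Method #2: one dedicated table per lookup
CREATE TABLE tblAdminRace (
    RaceID   INT IDENTITY PRIMARY KEY,
    RaceText VARCHAR(100) NOT NULL
);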
I am trying to figure out which of these paths I want to follow in my own apps going forward. I don't like that the first method seemingly introduces magic numbers. I don't like that the second method introduces many, many small tables, but I am not sure that is the END OF THE WORLD!
I am curious as to others opinions on how they would proceed or how they have proceeded. What worked for you? What are some reasons I shouldn't use one or the other?
Thanks!

It is a reasonable design to have separate entities in different tables, e.g. Race, Car, Person, Location, Maintenance task, Maintenance schedule. If you don't like joins in queries, simply design a few views. I may have misunderstood the example, but here is one suggestion.
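For example, assuming Method #1 stores a CategoryName on the category and an ItemText on the item (names invented for illustration), a view per category hides both the join and the magic number:

CREATE VIEW vwAdminRace AS
SELECT i.ItemID, i.ItemText
FROM tblAdminMaintItems AS i
JOIN tblAdminMaintCategory AS c ON c.CategoryID = i.CategoryID
WHERE c.CategoryName = 'Race';

-- callers then just run: SELECT * FROM vwAdminRace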

Magic numbers always seem like a bad idea, especially where you could use more descriptive codes in tables that do not have many entries (Titles, Races, etc.).
On the other hand, having a gazillion small tables to link to not only makes the queries more difficult to maintain, but also harder to read, with more joins to parse.
EDIT:
Smaller reference tables will make maintenance easier: change it in one place, change it everywhere. But they will definitely clutter up your table structure diagrams, and forgetting to populate these metadata tables will give you a lot of issues at new install sites.
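A small sketch of the descriptive-code idea (table and column names are illustrative only):

CREATE TABLE tblAdminTitle (
    TitleCode VARCHAR(10) NOT NULL PRIMARY KEY,  -- 'MR', 'MRS', 'DR', ...
    TitleText VARCHAR(50) NOT NULL
);

-- a query like SELECT * FROM tblPerson WHERE TitleCode = 'DR'
-- reads much better than WHERE TitleID = 7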

Related

System design: whether to normalize the departments or not

I'm working with two consultants on one project. We have reached a point where the two of them cannot come to an agreement, and each offers a different approach.
We have a store with four departments, and we want to find the best approach for handling all of them in the same database.
Each department sells different products: Cars, Boats, Jet Skis and Motorbikes.
When data is inserted or updated in each department, triggers have to fire so that different workflows can begin. When adding a new car, certain requirements need to be checked, and the details of a car are completely different from those of a boat. Regarding the data, there are not many fields in common; I would say so far only the brand, color, model and year. Everything else is specific to each department due to the different products and how they work with them.
Consultant one says:
Create one table for all the departments and use a column to identify which department the row belongs to. This way you will have only one trigger, and inside the trigger you call the function/method you need for each record type.
Reason: you only have one table (with over 200 fields) and one trigger, which is easier to maintain. Also, if you need to report, you just query one table and filter on the record type. If you need to report across all the items you don't need multiple joins.
Consultant two says:
Create one table for each department and a trigger for each table.
Reason: you will have smaller tables (approx. 50 fields each), it is more flexible, and everything is kept separate. If you want to report, you join the tables whenever you need data from different places.
I see the advantages of having everything in one place, but if I want to expand or change anything I have the feeling I will be creating a beast of a table as the data grows.
On the other side, keeping things separated looks more appealing, but I will need to set everything up for each different table.
What would you say is the best approach?
You should probably listen to consultant number two.
The thing is, all design is trade-offs. You need to assess the pros and cons of each approach and you need to think about the risks that each design entails.
What happens when your design grows? (department 5, more details per product type,...)
What happens when the system scales up to higher transaction volumes?
What happens when your business rules change?
I've been doing this for a long time and I've seen some pendulums swing back and forth when it comes to what is "in fashion" as far as database and software best practices.
I'd say right now the prevailing wisdom is that separation of concerns is innately good. This means you should keep your program logic (trigger code) separate for each department. This makes sense because your logic will vary from one product type to the next since they mostly have distinct columns.
This second point is also important, because your stake in the ground for a transactional system should always be to start with third normal form (or higher, if necessary). Sometimes you can get away without it, but four different types of objects with 40 or more distinct attributes each doesn't sound like a good candidate for jamming everything into one table. How do you keep track of which columns belong to which type of product, for example? A separate table for each product type keeps this clean and simple - and, importantly, easy for your support programmers to understand.
Contrary to what consultant one is saying, having one trigger instead of four is not likely to be easier to maintain if that one trigger is a big bowl of spaghetti, or even four tidy, well written subroutines joined together with a switch type statement.
These days, programmers favour short, atomic, single-purpose functions (triggers, in your case).
If there is enough common data and common business logic that doing it four times seems awkward, then maybe you have a good candidate for a super-type / sub-type design.
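A minimal sketch of that super-type / sub-type shape, using the few common fields from the question (the sub-type columns here are assumed purely for illustration):

CREATE TABLE Product (
    ProductID INT IDENTITY PRIMARY KEY,
    Brand     VARCHAR(50),
    Color     VARCHAR(30),
    Model     VARCHAR(50),
    ModelYear INT
);

CREATE TABLE Car (
    ProductID  INT PRIMARY KEY REFERENCES Product (ProductID),
    EngineSize DECIMAL(4,1),
    DoorCount  TINYINT
    -- ...only car-specific columns here
);

CREATE TABLE Boat (
    ProductID INT PRIMARY KEY REFERENCES Product (ProductID),
    HullType  VARCHAR(30),
    LengthFt  DECIMAL(5,1)
    -- ...only boat-specific columns here
);

The common columns live once in the super-type, while each department keeps its own sub-type table and its own trigger.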
I'll say one
These are all Products; it doesn't matter that it's a Bike or a Car. You can control the fields and the object with Record Types and Page Layouts, and that will save you from having 4 objects, which means potentially 8 new classes (if it follows my pattern it could be up to 20+), plus all of the workflow rules and validation rules across these new objects. It will be very hard to maintain a structure that has 4 objects that are all doing the same thing: tracking Products.
Down the road, if you decide to add a new product such as planes, it will be very easy to add a plane to this object and the code will be able to pick up from there if needed. You will definitely need Record Types to manage each Product. The trigger code shouldn't be an issue if the consultants are building it properly: a trigger should never contain any business logic, so as long as that is followed, all of the code will be maintainable.
I will go with one.
I assume you have a large number of products and that this list will grow in the future. These are all Products in the end. They will have some common fields and common logic.
If you use Process Builder with Invocable classes instead of Triggers, you may be able to get away with just configuration changes when adding a new object, if its fields and functionality are the same as or similar to an existing object.
There may also be a limitation on the number of different objects a profile has access to, depending on your license types.
Salesforce has a standard object called Product. It's a single object that is classified based on record type.
I would have gone with approach two if this were not Salesforce. Based on how Salesforce works and the limitations it imposes, one seems like the better and cleaner solution.
I would say option 2.
Why?
(1) I would find one table with 200+ columns harder to maintain. You're also then going to have to expose fields for an object that doesn't need said fields.
(2) You are also going to have to "hide" logic inside the trigger which then decides to do different actions based on the type of department etc...
(3) Option 2 involves more "scaffolding" and separate objects, but those objects are inherently smaller and easier to maintain, and they don't hide logic or cause any sort of ambiguity.
(4) Option 2 abides by the single responsibility principle. Not everyone follows this, I understand, but I find it a good guiding principle: the responsibility for the data lies with the individual table, and the responsibility for triggering the action lies with the individual trigger, as opposed to one mammoth entity/trigger.
** I would state that I am simply looking at this from a software development perspective; I am not sure whether or not Salesforce would handle this setup, but it is the way I would personally prefer to design it. :)
Option 2 for me.
You've said that there is little common data and the trigger logic is completely different. Here are some additional technical considerations.
Option 1 Warnings
The trigger would be a single point of failure and errors will be trickier to debug. I have worked with large triggers where broken logic near the top has stopped logic near the bottom from running, sometimes silently! You also have to maintain conditional guards to control the flow of logic based on the data which is another opportunity for error.
I'm not red hot on indexes but I believe performance will suffer due to no natural order of the multi-purpose data. More specific tables will yield better indexing strategies. Also, large rows can lead to fragmented indexes.
https://blogs.msdn.microsoft.com/pamitt/2010/12/23/notes-sql-server-index-fragmentation-types-and-solutions/
You would need extra consideration when setting nullable/default constraints on each surplus field not relevant to the product in question. These subtleties can introduce bugs and might make it harder if/when you decide to work with a data layer technology such as Entity Framework. E.g. the logical difference between NULL, 0 and 'None', especially on shared columns.

Is this a "correct" database design?

I'm working with the new version of a third-party application. In this version the database structure has changed, they say "to improve performance".
The old version of the DB had a general structure like this:
TABLE ENTITY
(
ENTITY_ID,
STANDARD_PROPERTY_1,
STANDARD_PROPERTY_2,
STANDARD_PROPERTY_3,
...
)
TABLE ENTITY_PROPERTIES
(
ENTITY_ID,
PROPERTY_KEY,
PROPERTY_VALUE
)
So we had a main table with fields for the basic properties, and a separate table to manage custom properties added by the user.
The new version of the DB instead has a structure like this:
TABLE ENTITY
(
ENTITY_ID,
STANDARD_PROPERTY_1,
STANDARD_PROPERTY_2,
STANDARD_PROPERTY_3,
...
)
TABLE ENTITY_PROPERTIES_n
(
ENTITY_ID_n,
CUSTOM_PROPERTY_1,
CUSTOM_PROPERTY_2,
CUSTOM_PROPERTY_3,
...
)
So now when the user adds a custom property, a new column is added to the current ENTITY_PROPERTIES_n table until the max number of columns (managed by the application) is reached, at which point a new table is created.
So, my question is: is this a correct way to design a DB structure? Is this the only way to "increase performance"? The old structure required many joins or sub-selects, but this structure doesn't seem very smart (or even correct) to me...
I have seen this done before, based on the assumed (often unproven) "expense" of joining - it is basically turning a row-heavy data table into a column-heavy table. They ran into their own limitation, as you imply, by creating new tables when they run out of columns.
I completely disagree with it.
Personally, I would stick with the old structure and re-evaluate the performance issues. That isn't to say the old way is the correct way, it is just marginally better than the "improvement" in my opinion, and removes the need to do large scale re-engineering of database tables and DAL code.
These tables strike me as largely static... caching would be an even better performance improvement without mutilating the database and one I would look at doing first. Do the "expensive" fetch once and stick it in memory somewhere, then forget about your troubles (note, I am making light of the need to manage the Cache, but static data is one of the easiest to manage).
Or, wait for the day you run into the maximum number of tables per database :-)
Others have suggested completely different stores. This is a perfectly viable possibility and if I didn't have an existing database structure I would be considering it too. That said, I see no reason why this structure can't fit into an RDBMS. I have seen it done on almost all large scale apps I have worked on. Interestingly enough, they all went down a similar route and all were mostly "successful" implementations.
No, it's not. It's terrible.
until the max number of columns (managed by the application) is reached, then a new table is created.
This sentence says it all. Under no circumstances should an application dynamically create tables. The "old" approach isn't ideal either, but since you have the requirement to let users add custom properties, it has to be something like this.
Consider this:
You lose all type-safety as you have to store all values in the column "PROPERTY_VALUE"
Depending on your users, you could have them change the schema beforehand and then let them run some kind of database update batch job, so at least all the properties would be declared in the right datatype. Also, you could lose the entity_id/key thing.
Check out this: http://en.wikipedia.org/wiki/Inner-platform_effect. This certainly reeks of it
Maybe a RDBMS isn't the right thing for your app. Consider using a key/value based store like MongoDB or another NoSQL database. (http://nosql-database.org/)
From what I know of databases (but I'm certainly not the most experienced), it seems quite a bad idea to do that in your database. If you already know the maximum number of custom properties a user might have, I'd say you'd better set the table's number of columns to that value.
Then again, I'm not an expert, but making new columns on the fly isn't the kind of operation databases like. It's going to bring you more trouble than anything.
If I were you, I'd either fix the number of custom properties, or stick with the old system.
I believe creating a new table for each entity to store properties is a bad design, as you could end up bloating the database with tables. The only pro of the second method would be that you are not traversing all of the redundant rows that do not apply to the entity selected. However, using indexes on the original ENTITY_PROPERTIES table could help greatly with performance.
I would personally stick with your initial design, apply indexes and let the database engine determine the best methods for selecting the data rather than separating each entity property into a new table.
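For example, an index along these lines (SQL Server syntax; adjust to the queries you actually run) usually makes the per-entity property lookups cheap without any schema gymnastics:

CREATE NONCLUSTERED INDEX IX_EntityProperties_Entity_Key
    ON ENTITY_PROPERTIES (ENTITY_ID, PROPERTY_KEY)
    INCLUDE (PROPERTY_VALUE);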
There is no "correct" way to design a database - I'm not aware of a universally recognized set of standards other than the famous "normal form" theory; many database designs ignore this standard for performance reasons.
There are ways of evaluating database designs though - performance, maintainability, intelligibility, etc. Quite often, you have to trade these against each other; that's what your change seems to be doing - trading maintainability and intelligibility against performance.
So, the best way to find out if that was a good trade off is to see if the performance gains have materialized. The best way to find that out is to create the proposed schema, load it with a representative dataset, and write queries you will need to run in production.
I'm guessing that the new design will not be perceivably faster for queries like "find STANDARD_PROPERTY_1 from entity where STANDARD_PROPERTY_1 = 'banana'".
I'm guessing it will not be perceivably faster when retrieving all properties for a given entity; in fact it might be slightly slower, because instead of a single join to ENTITY_PROPERTIES, the new design requires joins to several tables. You will be returning "sparse" results - presumably, not all entities will have values in the property_n columns in all ENTITY_PROPERTIES_n tables.
Where the new design may be significantly faster is when you need a compound where clause on custom properties. For instance, finding an entity where custom property 1 is true, custom property 2 is banana, and custom property 3 is not in ('kylie', 'pussycat dolls', 'giraffe') is (probably) faster when you can specify columns in the ENTITY_PROPERTIES_n tables instead of rows in the ENTITY_PROPERTIES table. Probably.
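To make that concrete, the compound filter looks roughly like this in each design (property names are invented for the example):

-- Old EAV design: one join (or EXISTS) per custom property
SELECT e.ENTITY_ID
FROM ENTITY e
JOIN ENTITY_PROPERTIES p1
  ON p1.ENTITY_ID = e.ENTITY_ID AND p1.PROPERTY_KEY = 'is_ripe' AND p1.PROPERTY_VALUE = 'true'
JOIN ENTITY_PROPERTIES p2
  ON p2.ENTITY_ID = e.ENTITY_ID AND p2.PROPERTY_KEY = 'fruit' AND p2.PROPERTY_VALUE = 'banana';

-- New column-per-property design: plain column predicates on one (wide) table
SELECT e.ENTITY_ID
FROM ENTITY e
JOIN ENTITY_PROPERTIES_1 p ON p.ENTITY_ID_1 = e.ENTITY_ID
WHERE p.CUSTOM_PROPERTY_1 = 'true' AND p.CUSTOM_PROPERTY_2 = 'banana';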
As for maintainability - yuck. Your database access code now needs to be far smarter, knowing which table holds which property and how many columns are too many. The likelihood of introducing bugs is high - there are more moving parts, and I can't think of any obvious unit tests to make sure that the database access logic is working.
Intelligibility is another concern - this solution is not in most developers' toolbox, it's not an industry-standard pattern. The old solution is pretty widely known - commonly referred to as "entity-attribute-value". This becomes a major issue on long-lived projects where you can't guarantee that the original development team will hang around.

drawbacks of storing all 'things' in a central table

I am not sure if there is a term to describe this, but I have observed that content management systems store all kinds of data in a single table with their bare minimum properties, while the metadata is stored in another table in the form of key/value pairs.
For example, everything (blog posts, pages, images, events, etc.) is stored in one table and considered a post.
I understand that this allows for abstraction and easy extensibility.
We are considering designing our new project this way. It is not exactly a CMS, but we plan to keep adding modules to it in stages. Let's say initially there will be only posts and images, on which comments can be posted. Later on we might add videos, which will also have the commenting feature.
What are the drawbacks of this approach, and will it work for a requirement like ours?
Thanks
The drawback is that the main table will get zillions of reads (and plenty of writes, too).
This means that there will be lots of lock contentions, heavy reindexing etc.
In order to mitigate this a bit, you may consider splitting the "main table" into a series of not-so-main tables.
Say, you will have one main table for "Posts" (possibly refined through metadata or subtables for specific types of posts, like Sticky, Announcement, Shoutbox, Private...)
One main table for Images (possibly refined for gifs, jpegs etc.)
One main table for Videos...
If this is a custom application (and not intended to be something that has to be "infinitely tweakable" like a CMS or a Portal framework) I think this kind of split is acceptable, and may provide some better performance (if you expect to have large amounts of data).
Regarding your "examples" comment... first of all, if you again keep all the comments in a single gigantic table, you may run into similar problems as keeping all types of items in one table.
Assuming this is not a problem, you can obviously put a sort of reference key (you can't use the normal foreign keys, of course) that links comments to their original item.
This works fine when you go from an item to its comments, a bit less well when you have to move from a comment back to the originating item. So the trade-off is about which kind of operation will be more frequent for your problem.
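A rough sketch of that reference-key idea (names assumed; the type column stands in for a real foreign key):

CREATE TABLE Comments (
    CommentID   INT IDENTITY PRIMARY KEY,
    ItemType    VARCHAR(20) NOT NULL,   -- 'post', 'image', 'video', ...
    ItemID      INT NOT NULL,           -- id in the matching item table
    CommentText VARCHAR(MAX) NOT NULL
);

Item-to-comments is then a simple WHERE ItemType = 'post' AND ItemID = @id; comments-to-item requires branching on ItemType, which is the trade-off described above.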
Simplicity and extensibility are indeed often attractive aspects of attribute-value and (as you say) "single table of things" approaches.
There's no 100% right answer here -- depending on your performance/throughput goals and extensibility needs, this approach might work for you too.
In most cases, however, where you know what kinds of data you will store, it's usually in your interest to model distinct entities into their own tables and relate the data accordingly. RDBMSes have been architected and refined over decades to cater to this use case and to simply use tables as generic dumping grounds doesn't typically buy you any distinct advantages, except the act of delaying the inevitable need to model your data properly. Furthermore, when you boil everything into one table, you then force users outside your app itself (if you have any, for example report writers) to have to struggle with your "model within a model", which can just make folks frustrated when they write queries, etc. And you will sink to your lowest common denominator -- if you want to optimize queries about type X and you have types Y and Z in that same table in droves, they will impact performance on querying X.
Again, to be clear, there is distinct benefit to the "all things in one table" name/value style metadata approaches. I have used them myself and turned against modeling for similar reasons. However, my advice is to limit yourself to times when you really need to do that (i.e., you need to implement something before you can correctly model the space of things you will need). Most typically, I find myself doing that when I'm prototyping complex systems and I need to get something going sooner than later.

Can you have 2 tables with identical structure in a good DB schema?

2 tables:
- views
- downloads
Identical structure:
item_id, user_id, time
Should I be worried?
I don't think that there is a problem, per se.
When designing a DB there are lots of different parameters, and some (e.g.: performance) may take precedence.
Case in point: even if the structures (and I suppose indexing) are identical, maybe "views" has more records and will be accessed more often.
This alone could be a good reason not to burden it with records from the downloads.
Also, the fact that they are identical now does not mean they will be in the future: views and downloads are different, after all, so sooner or later one or both could grow an extra field or two.
These tables are the same NOW, but the schema may change in the future. If they represent 2 different concepts it is good to keep them separate. What if you wanted to have a foreign key from another table to the downloads table but not the views table? If they were the same table you could not do this.
I think the answer has to be "it depends". As someone else pointed out, if the schema of one or both tables is likely to evolve then no. I can think of other cases as well (simplifying the security model by allowing apps/users access to one or the other).
Having said this, I work with a legacy DB where this is a problem. We have multiple identical tables for customer invoices. Data is actually moved between them at different stages in the processing life-cycle. It makes for a complicated mess when trying to access data. It would have been easily solved by a state flag in the original schema, but we now have 20+ years of code written against the multi-table version.
Short answer: it depends on why they have the same schema. :)
From a E/R modelling point of view I don't see a problem with that, as long as they represent two semantically different entities.
From an implementation point of view, it really depends on how you plan to query that data:
If you plan to query those tables independently from each other, keeping them separate is a good choice
If you plan to query those tables together (maybe with a UNION or a JOIN operation), you should consider storing the data in a single table with a discriminator column to distinguish the type (see the sketch after this list)
When considering whether to consolidate them into a single table you should also take into account other factors like:
The amount of data stored in each table
The rate at which data grows in each table
The ratio of read/write operations executed on each table
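As a sketch of the two query patterns mentioned in the list above (using the columns from the question):

-- Queried independently: the separate tables are natural
SELECT item_id, user_id, time FROM views WHERE user_id = @user_id;

-- Queried together: either a UNION over the two tables...
SELECT 'view' AS event_type, item_id, user_id, time FROM views
UNION ALL
SELECT 'download', item_id, user_id, time FROM downloads;

-- ...or a single events table with a discriminator column, e.g.
-- events(event_type, item_id, user_id, time)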
Chris Date and Dave McGoveran formalised the "Principle of Orthogonal Design". Roughly speaking, it means that in database design you should avoid the possibility of allowing the same tuple in two different relvars. The aim is to avoid certain types of redundancy and ambiguity that could otherwise result.
Arguably it isn't always totally practical to do that and it isn't necessarily clear cut exactly when the principle is being broken. However, I do think it's a good guiding rule, if only because it avoids the problem of duplicate logic in data access code or constraints, i.e. it's a good DRY principle. Avoid having tables with potentially overlapping meanings unless there is some database constraint that prevents duplication between them.
It depends on the context - what is a View and what is a Download? Does a Download imply a View (how else would it be downloaded)?
It's possible that you have well-defined, separate concepts there - but it is a smell I'd want to investigate further. It seems likely that a View and a Download are related somehow, but your model doesn't show anything.
Are you saying that both tables have an 'item_id' Primary Key? In this case, the fields have the same name, but do not have the same meaning. One is a 'view_id', and the other is a 'download_id'. You should rename your fields accordingly to avoid this kind of misunderstanding.

Database Design

This is a general database question, not related to any particular database or programming language.
I've done some database work before, but it's generally just been whatever works. This time I want to plan for the future.
I have one table that stores a list of spare parts: Name, Part Number, Location, etc. I also need to store which device(s) they are applicable to.
One way to do it is to create a column for each device in my spare parts table. This is how it's done in the current database. One concern is that if I want to add a new device in the future I have to create a new column, but it does make the programming easier.
My idea is to create a separate Applicability table. It would store the Part ID and Device ID; if a part is applicable to more than one device it would have more than one row.
Parts
-------
ID
Name
Description
Etc...
PartsApplicability
-------
ID
PartID
DeviceID
Devices
------
ID
Name
My questions are whether this is a valid way to do it, whether it provides an advantage over the original way, and whether there are any better ways to do it.
Thanks for any answers.
I agree with Rex M's answer; this is a standard approach. One thing you could do on the PartsApplicability table is remove the ID column and make PartID/DeviceID a composite primary key. This will ensure that a Part cannot be associated with the same Device more than once, and vice versa.
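A minimal sketch of the junction table with that composite key (data types assumed):

CREATE TABLE PartsApplicability (
    PartID   INT NOT NULL REFERENCES Parts (ID),
    DeviceID INT NOT NULL REFERENCES Devices (ID),
    PRIMARY KEY (PartID, DeviceID)   -- also blocks duplicate part/device pairs
);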
You're describing the standard setup of a many-to-many relationship in an RDBMS, using an intermediate join table. Definitely the way to go if that's how your model will end up working.
Using a separate table to hold many-to-many relationships is the right way to go.
Some of the benefits for join tables are
Parts may be applicable to any device and creating new devices or parts will not lead to modifications to the database schema
You don't have to save NULLs or other sentinel values for each part-device mapping that doesn't exist, i.e. things will be cleaner
Your tables remain narrow which makes them easier to understand
You seem to be on your way to discovering the database normal forms. Third normal form or BCNF is a good goal to aim for, although sometimes it's a good idea to break the rules.
Your second design is a very good design, and similar to what I've done (at work and on my own projects) many times in terms of describing relationships between things. Lookup tables and their equivalent are often far simpler to use than trying to stuff everything in one table.
On the point about making the programming easier: ultimately, you'll find that learning more makes programming far easier than trying to push things into what you already know even when they really don't fit. Knowing how to properly join tables and the like will make your programming with databases far easier than continually modifying columns would.
