First post, please be kind.
NOTE: I have reviewed entry #20856 (how to implement tagging) but feel this is different due to the fact that the tags method I'm considering is organization specific in my app. I’m hoping someone can confirm the direction I’m going or point out some other options.
(background) We are building a web application that gives different organizations visibility to their inventory in different locations. The database stores users, organizations, sites, and items and there are links from sites and items to organizations that allow us to determine which items / sites to show to which users (based on their organization).
It is common for two (or more) organizations to want to use the portal to check on the stock status of (for example) Widget A in the Los Angeles Warehouse. That part is fine. However, the different organizations also track unique information about Widget A. For example, Org 1 wants to track the color, volume, and primary vendor for each item. Org 2 wants to track Color, Stock Type, Inventory Cycle, Buyer Code for each item. I want to avoid a situation where I have to have a table loaded with all these possible fields and then figure out which organizations use which fields.
I’m considering using something along the lines of tags, but adding a tag category, and having the tag category be defined at the Org level. So, the basic table structure would be something like:
Table: OrgTagCategory
Fields: OrgId, TagCategoryId, CategoryTitle
Table: OrgTag
Fields: OrgId, TagCategoryId, TagId, TagTitle
Table: OrgItemTag
Fields: OrgId, ItemId, TagId
Then, when a user logs in, the grid on the main dashboard would include their organization's item fields as columns. So, from the example above, Org 1 would see Item#, Description (shown for all), Color, Volume, and Primary Vendor. Org 2 would see Item#, Description, Color, Stock Type, Inventory Cycle, and Buyer Code.
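To make the idea concrete, here is a minimal sketch of the three tables and the per-organization grid, using Python's built-in sqlite3 module. The Item table, the sample rows, and the helper names build_demo_db and dashboard_rows are assumptions made up for illustration; only the three Org* tables come from the structure above.

```python
import sqlite3

def build_demo_db():
    """Create the three Org* tables plus a hypothetical Item table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Item (ItemId INTEGER PRIMARY KEY, ItemNo TEXT, Description TEXT);
        CREATE TABLE OrgTagCategory (
            OrgId INTEGER, TagCategoryId INTEGER PRIMARY KEY, CategoryTitle TEXT);
        CREATE TABLE OrgTag (
            OrgId INTEGER, TagCategoryId INTEGER, TagId INTEGER PRIMARY KEY, TagTitle TEXT);
        CREATE TABLE OrgItemTag (OrgId INTEGER, ItemId INTEGER, TagId INTEGER);

        INSERT INTO Item VALUES (1, 'W-A', 'Widget A');
        INSERT INTO OrgTagCategory VALUES
            (1, 10, 'Color'), (1, 11, 'Primary Vendor'),
            (2, 20, 'Color'), (2, 21, 'Buyer Code');
        INSERT INTO OrgTag VALUES
            (1, 10, 100, 'Red'), (1, 11, 101, 'Acme'),
            (2, 20, 200, 'Crimson'), (2, 21, 201, 'B42');
        INSERT INTO OrgItemTag VALUES (1, 1, 100), (1, 1, 101), (2, 1, 200), (2, 1, 201);
    """)
    return conn

def dashboard_rows(conn, org_id):
    """Return (column titles, rows) for one organization's grid."""
    # The organization's categories become the extra grid columns.
    categories = [r[0] for r in conn.execute(
        "SELECT CategoryTitle FROM OrgTagCategory WHERE OrgId = ? ORDER BY TagCategoryId",
        (org_id,))]
    rows = []
    for item_id, item_no, desc in conn.execute(
            "SELECT ItemId, ItemNo, Description FROM Item ORDER BY ItemId"):
        # Map category title -> tag title for this item, as seen by this organization.
        values = dict(conn.execute("""
            SELECT c.CategoryTitle, t.TagTitle
            FROM OrgItemTag it
            JOIN OrgTag t ON t.TagId = it.TagId
            JOIN OrgTagCategory c ON c.TagCategoryId = t.TagCategoryId
            WHERE it.OrgId = ? AND it.ItemId = ?""", (org_id, item_id)))
        rows.append([item_no, desc] + [values.get(c) for c in categories])
    return ["Item#", "Description"] + categories, rows
```

Calling dashboard_rows(conn, 1) and dashboard_rows(conn, 2) on the same database gives each organization its own columns for the same Widget A.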
Am I overthinking this or is there a simpler method of doing this that I’m missing? All thoughts / feedback sincerely appreciated.
That should be no problem, but you're storing the OrgId redundantly. Also it seems like there could be some overlap (probably a lot of overlap, realistically) between tags and orgs.
Here's how I'd do it:
Table: OrgTag
Fields: OrgId, TagId
Table: Tag
Fields: TagId, TagTitle
Table: ItemTag
Fields: ItemId, TagId
This way each org is associated with the tags it's interested in, but you don't have redundant tags. A given tag that's used by multiple orgs just gets a bunch of rows in OrgTag, instead of multiple rows in Tag with the same TagTitle.
You'd only need a table OrgTagCategory if there were multiple tag categories per org. But you haven't described this extra association as a requirement.
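As a rough sketch of how these three tables fit together (Python's built-in sqlite3; the sample rows and the helper name tags_for are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tag (TagId INTEGER PRIMARY KEY, TagTitle TEXT UNIQUE);
    CREATE TABLE OrgTag (
        OrgId INTEGER, TagId INTEGER REFERENCES Tag(TagId),
        PRIMARY KEY (OrgId, TagId));
    CREATE TABLE ItemTag (
        ItemId INTEGER, TagId INTEGER REFERENCES Tag(TagId),
        PRIMARY KEY (ItemId, TagId));

    -- 'Red' exists once in Tag even though both orgs use it.
    INSERT INTO Tag VALUES (1, 'Red'), (2, 'Fragile'), (3, 'Seasonal');
    INSERT INTO OrgTag VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    INSERT INTO ItemTag VALUES (500, 1), (500, 2), (500, 3);
""")

def tags_for(org_id, item_id):
    """Tags on an item, restricted to the tags this organization is interested in."""
    rows = conn.execute("""
        SELECT t.TagTitle
        FROM ItemTag it
        JOIN OrgTag ot ON ot.TagId = it.TagId
        JOIN Tag t ON t.TagId = it.TagId
        WHERE ot.OrgId = ? AND it.ItemId = ?
        ORDER BY t.TagTitle""", (org_id, item_id))
    return [r[0] for r in rows]
```

Note that in this layout an item's tags are shared: both orgs see the same 'Red' on item 500, and each org only filters which tags it cares about.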
Based on your description I sketched a simplified model and combined it with the observation pattern. This should enable you to track various item properties and user preferences for viewing them. Admittedly, the Preference table may grow large, but the data has to be stored somewhere anyway, and you can retrieve it using SQL, which simplifies the business layer.
- Organization and person are types of users. User table has columns common to all users, while Organization and Person tables have columns specific to each one.
- A stock item (widget class) can be found at several sites (warehouses); a site stores many items.
- One item belongs to one user; a user can own many items.
- Measurement and trait are types of observations. Measurement is a numeric observation, like height. Trait is a descriptive observation, like color.
- An observation is of a specific type (height, weight, color), there can be many observations of the same type.
- One item (widget class) can have many observations, an observation relates to one item only.
- A user can prefer to display many observations; an observation may be preferred (to display) by many users.
UPDATE
We could simplify user's subscription to item details (observations) by tagging observation type, for example height, weight, width would be tagged with: all, dimensions, physical. Some other tags would be: accounting_interest, tracking_specific, etc. A user would then subscribe to tags only. Tags (could) form a hierarchy with ALL at the top.
- One observation type (height, weight, color) can have many tags, one tag belongs to many observation types.
- Each tag may have a parent tag forming a hierarchy.
- A user stores preferences for a set of tags that she usually monitors.
UPDATE 2
Now we can sort out who is who and who owns what. In this modification a user (now a person) can work for more than one organization (having several part-time jobs or contracts). An item now belongs to an organization. A logged-in user can see all items from all organizations that she works for.
My first quick thought on this is that if this is limited to showing particular fields to particular orgs on the dashboard, then it is better to handle it on the app side. If there is any other use for the tagging, please clarify.
Here's a simple approach -
You can store a field [OrgDashboardFields] in the Org master table or the OrgItem table. This will be a comma (',') separated list of fields to be shown on the dashboard. At run-time fetch the [OrgDashboardFields] field and parse the comma separated list in the app and make the Dashboard Grid behave accordingly.
Or, if there's a dynamic-query framework then based on the [OrgDashboardFields] field you can create a dynamic SQL-query and get the desired result which is purely Org specific.
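A minimal sketch of the parsing step in Python. The field names in ALLOWED_FIELDS, the base columns, and the OrgItem table and column names in the generated query are assumptions for illustration. Checking the stored list against a whitelist matters if the list is ever used to build SQL, since column names cannot be passed as bound parameters.

```python
# Hypothetical set of columns that may appear on a dashboard.
ALLOWED_FIELDS = {"Color", "Volume", "PrimaryVendor",
                  "StockType", "InventoryCycle", "BuyerCode"}
# Columns shown to every organization.
BASE_FIELDS = ["ItemNo", "Description"]

def dashboard_columns(org_dashboard_fields):
    """Parse the comma-separated [OrgDashboardFields] value into a column list."""
    requested = [f.strip() for f in org_dashboard_fields.split(",") if f.strip()]
    unknown = [f for f in requested if f not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError("unknown dashboard fields: " + ", ".join(unknown))
    return BASE_FIELDS + requested

def build_query(org_dashboard_fields):
    """Build an org-specific SELECT; only whitelisted names reach the SQL text."""
    cols = dashboard_columns(org_dashboard_fields)
    return "SELECT " + ", ".join(cols) + " FROM OrgItem WHERE OrgId = ?"
```

For example, build_query("Color, Volume") selects ItemNo, Description, Color, and Volume, and the OrgId is still passed as a bound parameter at execution time.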
I am building a tool to allow people to create customized reports. My question revolves around getting the right database schema and design to support some custom report settings.
In terms of design, I have various Slides and each Slide has a bunch of settings (like date range, etc). A Report would basically be an ordered list of slides.
The requirements are:
A user can create a report by putting together a list of "Slides" in any order they wish
A user can include the same slide twice in a report with different settings
So I was thinking of having the following tables:
Report Table: Id, Name, Description
Slide Table: Id, Description
ReportSlide Table: ReportId, SlideId, Order, SlideSettings
my 2 main questions are:
Order: Is this the best way to manage the fact that a user can order their slides on any given report
SlideSettings: since every slide has a different set of settings (inputs), I was thinking of storing this as just a JSON blob and then parsing it out on the front end. Does anyone think this is the wrong design? Is there a better way to store this information (again, each slide has different inputs, and you could have the same slide listed twice in a report, each with different settings)?
Order: Is this the best way to manage
It is the correct way.
SlideSettings: ... storing this as just a json blob
If you never intend to query these values, then that's fine.
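For illustration, a small sketch of that layout with Python's built-in sqlite3 and json modules. The sample report, the slide, the settings keys, and the helper names add_slide and slides_in_order are assumptions. The key is (ReportId, Order) rather than (ReportId, SlideId), so the same slide can appear twice in one report.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Report (Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
    CREATE TABLE Slide (Id INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE ReportSlide (
        ReportId INTEGER REFERENCES Report(Id),
        SlideId INTEGER REFERENCES Slide(Id),
        "Order" INTEGER,          -- ORDER is a reserved word, so it must be quoted
        SlideSettings TEXT,       -- opaque JSON text, never queried by the database
        PRIMARY KEY (ReportId, "Order")
    );
    INSERT INTO Report VALUES (1, 'Quarterly review', '');
    INSERT INTO Slide VALUES (7, 'Sales by date range');
""")

def add_slide(report_id, slide_id, order, settings):
    """Attach a slide to a report at the given position with its own settings."""
    conn.execute("INSERT INTO ReportSlide VALUES (?, ?, ?, ?)",
                 (report_id, slide_id, order, json.dumps(settings)))

def slides_in_order(report_id):
    """Return (SlideId, settings dict) pairs in display order."""
    rows = conn.execute(
        'SELECT SlideId, SlideSettings FROM ReportSlide WHERE ReportId = ? ORDER BY "Order"',
        (report_id,))
    return [(slide_id, json.loads(blob)) for slide_id, blob in rows]

# The same slide twice, inserted out of order, with different settings.
add_slide(1, 7, 2, {"start": "2024-02-01", "end": "2024-02-29"})
add_slide(1, 7, 1, {"start": "2024-01-01", "end": "2024-01-31"})
```

Having to quote "Order" everywhere is one practical argument for a name like SlideOrder.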
You may want to rename ReportSlide to SlideInReport. A relationship should not just list the referenced tables, but the nature of the relationship.
Some (me) prefer to give PK-columns and FK-columns the same name. Then you cannot get away with just Id, but you need to call them sld_id, rep_id.
Maybe you should have a Settings table. You may also need a ValueTypes table to define which setting can take what kind of values (such as a date range). Then store the list of setting IDs against each slide.
Needless to say, the "best way" will depend on the type and amount of data being stored, among other things. I am a novice with JSON, but as far as I have read, keeping JSON strings in database fields is generally discouraged, though it is not a hard rule.
I think, from a high level view, your schema will work. However, you might consider revising some of the table structure. For example:
Settings
Rather than a JSON blob, it may be best to add a column for each setting to the ReportSlide table. Depending on what inputs you allow, give each one its own column. For example, a date range will need StartDate and EndDate columns, and other inputs will need integer or text columns.
What purpose does the Slide Table serve? If your database allows a many-to-many relationship between Slides and Reports, then the ReportSlide table will hold all your settings. Will your Slide Table have attributes? If not, then perhaps Report Slides are all you need. For example:
Report Table: ReportID | DateCreated | UserID | Description
ReportSlides Table: ReportSlideID | ReportID | SlideOrder | StartDate | EndDate | Description...
Unless your Slide table is going to hold specific attributes that will be common across every report, you don't need the extra joins or space.
Depending on the tool, you may also want attributes such as DateCreated, UserID, and FolderID that allow people to organize their reports.
If the Slides are dependent on each other, you will want to add constraints so Slide 2 cannot be deleted if Slide 3 depends on it.
Order
Regarding order, having a SlideOrder column will work. Because each ReportSlideID will have a corresponding Report, the SlideOrder can still be changed. That way, if ReportSlideID = 1 belongs to ReportID = 1 and has specific settings, it can be ordered 7th or 3rd and still work.
Be aware of your naming convention. If the Order column is directly referencing Slide Order, then go ahead and name it SlideOrder.
I'm sure there are a million other ways to make it efficient. That's my initial idea based on what you've provided.
Report Table: ID (Primary Key), Name, Description,....
Slide Table: ID (PK), Name, Description,...
Slide_x_report Table: ID(PK), ReportID (FK), SlideID (FK), order
Slide_settings Table: ID(PK), NameSetting, DescriptionSettings, SlideXReportID (FK),...
I think that you should have a structure like this; the Slide_settings table will hold the settings of each slide in each report.
The Slide_settings table may describe dynamic forms, and each row should relate to a specific slide of a report. That way everything is stored properly, and the Slide_settings table has only the columns needed to define one element of a slide.
We've built an algorithm that helps us deliver relevant articles to our users. In the background during certain intervals, the algorithm will calculate metadata, such as average age, age spread, and gender coefficient from a slew of data related to views, comments, and votes.
With that said, are there any downsides to storing this metadata as fields on the Articles table? Or, should I create a separate table, such as Article_Data, to store the information? I am just not sure how much the updating of this metadata will interfere with selecting the articles.
For the most part, we will be SELECTing articles and their metadata and JOINing them with user data (age, gender, etc) to show users relevant content. The only time we don't need the metadata is when we show a particular article to a user.
If the fields are clearly defined, and there are a limited number of them, put them in the Articles table.
If you are going to store more than one record of metadata fields per article, you need another table, in a one-to-many relationship with the Articles table.
If the fields are not clearly defined, user-defined, or there are many of them, you probably need a new table with one row per metadata item. But this is more difficult to work with in the long run.
See Also
http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
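To show what the "one row per metadata item" option looks like in practice, here is a rough sketch using Python's built-in sqlite3. The column names, the metadata names, and the helper articles_for_age are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Articles (ArticleId INTEGER PRIMARY KEY, Title TEXT);
    -- One row per (article, metadata name) instead of one column per metric.
    CREATE TABLE Article_Data (
        ArticleId INTEGER REFERENCES Articles(ArticleId),
        Name TEXT,
        Value REAL,
        PRIMARY KEY (ArticleId, Name)
    );
    INSERT INTO Articles VALUES (1, 'First'), (2, 'Second');
    INSERT INTO Article_Data VALUES
        (1, 'average_age', 31.5), (1, 'gender_coefficient', 0.4),
        (2, 'average_age', 52.0), (2, 'gender_coefficient', 0.7);
""")

def articles_for_age(age, tolerance):
    """Titles of articles whose stored average_age is within tolerance of age."""
    rows = conn.execute("""
        SELECT a.Title
        FROM Articles a
        JOIN Article_Data d
          ON d.ArticleId = a.ArticleId AND d.Name = 'average_age'
        WHERE ABS(d.Value - ?) <= ?
        ORDER BY a.ArticleId""", (age, tolerance))
    return [r[0] for r in rows]
```

Every additional metric used in a filter needs another join against Article_Data, which is the "more difficult to work with" part. With a fixed, small set of metrics, plain columns on Articles avoid that.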
I am working on a homework project where we design a website for a store, and I have been assigned the database. This is my first database attempt. I am using LibreOffice Base for the design, and cannot find any guides on how to make subtypes. For example, for every shirt in the inventory, there'd be a different group of colors it comes in and for every different color a list of individual sizes and how many of each size is in stock. However, I can't find aggregation anywhere in "Table Relations."
So I make a table for shirts with the base information (brand, price, etc), and then a separate table with just 2 columns (size and number of units in stock --- we're letting the possibility of multiple colors wait for now). I then make a form for the shirt with the base information and a subform with 2 columns: size and number available. Both of the forms are tables rather than labeled text boxes. However, the subform for shirt size does not maintain separate information for each row in the main form (ie the one with the base information for the shirts). How the heck do I do this?
Lastly, since this is my first crack at databases, I would not be at all surprised if I'm going at it all wrong, and if so would gladly appreciate a push in the right direction or a webpage explaining how to do this that I didn't find due to not entering the correct search terms.
You need to create linked fields in the master table. The shirt table has a primary key; refer to that in the subordinate table. Alternatively, create a primary key in the subordinate table and refer to it in the master table. Then in the subform --> properties, designate the appropriate link between the master and slave fields. The functionality is described in the LibreOffice Base handbook (p.105).
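Outside of Base, the same master/detail link is just a foreign key. A minimal sketch with Python's built-in sqlite3 (the table names, column names, and sample rows are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE shirt (shirt_id INTEGER PRIMARY KEY, brand TEXT, price REAL);
    -- Subordinate table: every row points back at exactly one shirt.
    CREATE TABLE shirt_size (
        shirt_id INTEGER NOT NULL REFERENCES shirt(shirt_id),
        size TEXT NOT NULL,
        units_in_stock INTEGER NOT NULL,
        PRIMARY KEY (shirt_id, size)
    );
    INSERT INTO shirt VALUES (1, 'Acme', 19.99), (2, 'Other', 24.50);
    INSERT INTO shirt_size VALUES (1, 'M', 5), (1, 'L', 3), (2, 'S', 7);
""")

def stock_for(shirt_id):
    """Sizes and stock counts for one shirt; this is what the subform should show."""
    rows = conn.execute(
        "SELECT size, units_in_stock FROM shirt_size WHERE shirt_id = ? ORDER BY size",
        (shirt_id,))
    return rows.fetchall()
```

The subform link in Base does the equivalent of the WHERE shirt_id = ? filter: it shows only the size rows whose shirt_id matches the row selected in the main form.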
I’m working on a project where you work with all kinds of items. What it is is of no importance, it’s the database design I’m worried about. If someone could give me some insight in how I should create the layout of my database for this, or just point me in the right direction, I would be most thankful.
All kinds of items in one list
Imagine you have lists of items. You could have a list of CDs, a list of DVDs and a list of books. This translates to 1 list has many items in database terms, with the id of the list in the item row.
But what if you wanted to have a list with all Super Mario related stuff, containing soundtrack DVDs, that horrible live action film and some fanfiction novels based on the plumber’s life.
I suddenly realized, when drawing out my database, that those items, which belong to the same list, couldn’t be in the same table, as they would all need different columns to support artist/album title, director/movie title, author/novel title, etc., which I couldn’t possibly put all in one giant table.
On top of that, I want to have the track titles of the soundtrack albums and the actors of the film in my database. If I had only CDs, I could easily attach an album_track table to my item table, but I can’t just attach all kinds of different tables to my item table, as that wouldn’t be too good for performance if I wanted to get all items with all their details for a certain list. The procedure would have to search all attached tables for references of the list, even if the list doesn’t contain any books, vinyls, manga, tv-series, plants, furniture, etc…
What I have right now is the following layout (but I can’t imagine this is the best way to do this):
t_list (id) --> t_item (id, id_list, image)
t_item --> t_cd (id, id_item, artist, title)
t_item --> t_dvd (id, id_item, director, title)
t_item --> …
t_cd --> t_cd_track (id, id_cd, track_title, length)
t_dvd --> t_dvd_actors (id, id_dvd, actor_name, image)
…
Custom columns
Now, imagine that to add these items to a cd list, you’d have a form with input fields, according to the columns in the table t_cd (artist, album title, genre, …). I want to be able to add a custom input field for example for the average price of albums.
This is set for a certain user for a certain list. This is not set on an item level, because that would mean it would be added to everyone’s form. I just want to add that field to my own CD list.
But it still needs to be related to items, because that value needs to be filled in in the database.
I’m thinking about something like this:
t_list (id) --> t_extra_field (id, description, id_list)
t_extra_field --> t_field_value (id, id_extra_field, value)
But I’m not entirely sure where to attach this in my database scheme.
Could this kind of structure also be an answer to my previous question? (t_field --> t_field_value) If so, I also don’t know where to attach that. Perhaps to list, like I suggested in the above example?
That would mean that all details for a certain item, are in one table, but value by value, not on 1 single record, according to a category id of some sort, coming from another table, attached to item. That would be a table with a lot of records, which again raises my question : isn’t this bad for performance..?
I sincerely hope someone could give me some insight in the matter..
A completely generic database is probably a bad idea - it usually means you have to enforce the data consistency completely at the application level. This might be justified for highly "untyped" or "volatile" data when you want to avoid DDL at run-time, but the data you describe here looks "typed" enough for a more conventional database design.
Judging on your description, you'd need something similar to this:
The symbol denotes the "category" (aka. inheritance, sub-type, generalization hierarchy etc.).
For the specific cases where we know exactly how the items should be connected, we can model that directly through a link (aka. junction) table between specific sub-types, as in case of the TRACK table.
Also, we can group items of different kinds through GROUP and GROUP_ITEM (so, say, a Mario soundtrack(s), movie(s) and book(s) can be grouped together, under the same GROUP_ID).
Artists are also handled in a fairly general way, so we can easily represent a situation where (for example) the same person writes both a song and a book.
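A stripped-down sketch of the sub-type and grouping part, using Python's built-in sqlite3. Only the ITEM, GROUP, and GROUP_ITEM names come from the description above; the CD and DVD tables, every column, and the sample rows are assumptions, and TRACK and the artist tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM (ITEM_ID INTEGER PRIMARY KEY, TITLE TEXT NOT NULL);
    -- Each sub-type row shares its primary key with the ITEM row it specializes.
    CREATE TABLE CD  (ITEM_ID INTEGER PRIMARY KEY REFERENCES ITEM(ITEM_ID), GENRE TEXT);
    CREATE TABLE DVD (ITEM_ID INTEGER PRIMARY KEY REFERENCES ITEM(ITEM_ID), RUNTIME_MIN INTEGER);
    -- GROUP is a reserved word in SQL, so the table name is quoted.
    CREATE TABLE "GROUP" (GROUP_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
    CREATE TABLE GROUP_ITEM (
        GROUP_ID INTEGER REFERENCES "GROUP"(GROUP_ID),
        ITEM_ID INTEGER REFERENCES ITEM(ITEM_ID),
        PRIMARY KEY (GROUP_ID, ITEM_ID));

    INSERT INTO ITEM VALUES (1, 'Mario soundtrack'), (2, 'Mario movie');
    INSERT INTO CD VALUES (1, 'Game music');
    INSERT INTO DVD VALUES (2, 100);
    INSERT INTO "GROUP" VALUES (1, 'Super Mario');
    INSERT INTO GROUP_ITEM VALUES (1, 1), (1, 2);
""")

def items_in_group(group_id):
    """Titles in a group with their kind, derived from which sub-type row exists."""
    rows = conn.execute("""
        SELECT i.TITLE,
               CASE WHEN cd.ITEM_ID IS NOT NULL THEN 'CD'
                    WHEN dvd.ITEM_ID IS NOT NULL THEN 'DVD' END AS KIND
        FROM GROUP_ITEM gi
        JOIN ITEM i ON i.ITEM_ID = gi.ITEM_ID
        LEFT JOIN CD cd ON cd.ITEM_ID = i.ITEM_ID
        LEFT JOIN DVD dvd ON dvd.ITEM_ID = i.ITEM_ID
        WHERE gi.GROUP_ID = ?
        ORDER BY i.ITEM_ID""", (group_id,))
    return rows.fetchall()
```

Each LEFT JOIN is a primary-key lookup, so listing a mixed group does not require scanning every sub-type table.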
As for things such as "average price of albums", ideally you shouldn't store them at all - you should calculate them when needed, based on the existing data, so the possibility of an out-of-date result is eliminated.
If this becomes problematic performance-wise, either:
do it periodically, cache the result and live with the somewhat out-of-date result.
or cache the result whenever the data is modified (through triggers), but do it very carefully to avoid anomalies in the concurrent environment.
For example...
SELECT AVG(PRICE) FROM TABLE1;
INSERT INTO TABLE2 (AVERAGE_PRICE) VALUES (result_of_the_previous_query);
...is almost certainly unsafe, but depending on the DBMS even...
INSERT INTO TABLE2 (AVERAGE_PRICE) SELECT AVG(PRICE) FROM TABLE1;
...might not be completely safe without proper locking. You'll need to learn about your DBMS'es transaction isolation and locking.
In the specific case of calculating an average, there are other tricks that you might consider, such as separately incrementing/decrementing the COUNT and adding/subtracting the SUM of the price through triggers with each INSERT/UPDATE/DELETE, and then calculating the AVG on the fly. SQL guarantees that things such as UPDATE ... SET MY_COUNT = MY_COUNT + 1 will be "atomic".
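A sketch of that COUNT/SUM trick using SQLite triggers through Python's built-in sqlite3. The table names album and album_stats, the column names, and the sample prices are assumptions, and other DBMSes use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, price REAL NOT NULL);
    -- Single-row table holding the running count and sum.
    CREATE TABLE album_stats (
        id INTEGER PRIMARY KEY CHECK (id = 1),
        my_count INTEGER NOT NULL,
        my_sum REAL NOT NULL);
    INSERT INTO album_stats VALUES (1, 0, 0);

    CREATE TRIGGER album_ai AFTER INSERT ON album BEGIN
        UPDATE album_stats SET my_count = my_count + 1, my_sum = my_sum + NEW.price;
    END;
    CREATE TRIGGER album_ad AFTER DELETE ON album BEGIN
        UPDATE album_stats SET my_count = my_count - 1, my_sum = my_sum - OLD.price;
    END;
    CREATE TRIGGER album_au AFTER UPDATE OF price ON album BEGIN
        UPDATE album_stats SET my_sum = my_sum - OLD.price + NEW.price;
    END;
""")

def average_price():
    """Compute the average on the fly from the maintained count and sum."""
    count, total = conn.execute(
        "SELECT my_count, my_sum FROM album_stats").fetchone()
    return total / count if count else None

for price in (10.0, 20.0, 30.0):
    conn.execute("INSERT INTO album (price) VALUES (?)", (price,))
```

Reading the average is then a single-row lookup plus a division, no matter how many albums there are.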
Short question: How should product categories that appear under multiple categories be managed? Is it a bad practice to do so at all?
Background info:
We have a product database with categories like this:
Products
  -Arts and Crafts Supplies
    -Glue
    -Paper Clips
    -Construction Paper
  -Office Supplies
    -Glue
    -Paper Clips
Note that glue and paper clips are assigned to both categories. And although they appear in two different spots in this category tree, they have the same category ID in the database. Why? Two reasons:
Categories are assigned attributes - for example, a paper clip could have a weight, a material, a color, etc.
Products assigned to the glue category are displayed under arts and crafts and Office Supplies. Which is to be expected - they're the same actual category ID in the database.
This allows us to manage a single category and its attributes and assigned products, but place it at multiple places within the category tree.
We are using the nested set model, so the db structure we use to support this is:
Category
----------
CategoryID
CategoryName
CategoryTree
------------
CategoryTreeID
CategoryID
Lft
Rgt
So there's a 1:M between Category and CategoryTree because there can be multiple instances of a given category within the category tree.
Is there a simpler way to model this that would allow a product category to display under multiple categories?
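For reference, the structure above can be sketched with Python's built-in sqlite3 as follows. The Lft/Rgt numbers are one possible numbering of the example tree, and the helper names subcategories and paths are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
    CREATE TABLE CategoryTree (
        CategoryTreeID INTEGER PRIMARY KEY,
        CategoryID INTEGER REFERENCES Category(CategoryID),
        Lft INTEGER, Rgt INTEGER);

    INSERT INTO Category VALUES
        (1, 'Products'), (2, 'Arts and Crafts Supplies'), (3, 'Office Supplies'),
        (4, 'Glue'), (5, 'Paper Clips'), (6, 'Construction Paper');
    -- Glue (4) and Paper Clips (5) each appear twice in the tree.
    INSERT INTO CategoryTree VALUES
        (1, 1, 1, 16),
        (2, 2, 2, 9), (3, 4, 3, 4), (4, 5, 5, 6), (5, 6, 7, 8),
        (6, 3, 10, 15), (7, 4, 11, 12), (8, 5, 13, 14);
""")

def subcategories(category_name):
    """Distinct categories nested anywhere below the named category."""
    rows = conn.execute("""
        SELECT DISTINCT c.CategoryName
        FROM CategoryTree parent
        JOIN Category pc ON pc.CategoryID = parent.CategoryID
        JOIN CategoryTree node ON node.Lft > parent.Lft AND node.Rgt < parent.Rgt
        JOIN Category c ON c.CategoryID = node.CategoryID
        WHERE pc.CategoryName = ?
        ORDER BY c.CategoryName""", (category_name,))
    return [r[0] for r in rows]

def paths(category_name):
    """One breadcrumb per place the category appears in the tree."""
    rows = conn.execute("""
        SELECT node.CategoryTreeID, ac.CategoryName
        FROM CategoryTree node
        JOIN Category nc ON nc.CategoryID = node.CategoryID
        JOIN CategoryTree anc ON anc.Lft <= node.Lft AND anc.Rgt >= node.Rgt
        JOIN Category ac ON ac.CategoryID = anc.CategoryID
        WHERE nc.CategoryName = ?
        ORDER BY node.Lft, anc.Lft""", (category_name,))
    crumbs = {}
    for tree_id, name in rows:
        crumbs.setdefault(tree_id, []).append(name)
    return [" > ".join(names) for names in crumbs.values()]
```

Here paths("Glue") yields two breadcrumbs, one under each root category, while Glue itself is still a single Category row.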
I don't see anything wrong with this as long as it is true that all Glue is appropriate for both Office Supplies and craft supplies.
What you have is a good way, though why not simplify the 2nd table like so:
Category
ID
Name
SubCategory
ID
CategoryID
SubCategoryID
Though for the future I would beware of sharing child categories between the two root categories. Sometimes it is better to create a unique categorization of products for consistency, which is easier to manage for you and potentially easier to navigate for the customer. Otherwise, you have the issue that if you're on the Glue page coming from office supplies, then do you show the other path as well? If not, you will have two identical pages, except for the path, which is an issue for SEO. If you do, then the user may get confused.
The most famous example of this is Google Mail, where the classification is done this way. Google is famous for the usability of their products ...
I believe other words are preferable to "parent", which suggests only an X-to-one relationship...
Maybe you could say that a Product has many Categories, so the relationship would be many-to-many, and only the display would start from the Categories to reach the Products...
This would highlight a problem : if you don't limit the number of categories, and you display the categories with sub-categories and so on, you could end up with:
a huge categories and product list, with many many duplications
a big depth (probably unreadable)
The interesting part is highlighting the problem, then to imagine a solution that is fine for the end-user.
It may well be necessary for a category to have multiple parents. However, no matter what parent you found a category under, its subcategories should remain the same.
I've seen real systems that implemented precisely this logic and worked fine.
edit
To answer your question, I don't think the model I'm suggesting is as restrictive as you imagine. Basically, a given branch of the tree may be found under more than one parent branch, but wherever it is found, it has the same children. Nothing about this prevents you from cherry-picking some children of one branch and also making them children of another.
So, for example, you could include the glues category under both office supplies and hobby supplies, and if you added "Crazy Glue (Suppository Edition)" under glues, it would show up in both. If you have items that might be grouped together logically but need to be separated by their use, you can still do that. You might put mucilage and paste under the category of hobby adhesives, which goes under the hobby root, but not under the office root. Or you could do that and simultaneously have a combined category that's used internally by your buyers. What you can't do is forget to include that new type of glue in all of the relevant categories once you've added it wherever it belongs in your business model ontology.
In short, you lose very little with this restriction, but gain a bit of structure to help avoid the problem of having to manage each item individually.
edit
Assuming I've made a convincing case for the model itself, there's still the issue of implementation. There are lots of options, but here's one way to go:
There is a CatalogItem table containing a synthetic primary key, the label, optional description/detail text, and an optional SKU (or equivalent). You then have a many-to-many CatalogItemJoin table with child and parent IDs, both constrained to reference CatalogItem.
An item that appears as a parent is a category, so it should not have a SKU. An item that appears only as a child is a product, so it should have a SKU. It's fine for any item to have more than one parent; that just means that it's in multiple categories. Likewise, there's no problem with multiple children per parent; that would be the typical case of a category with a few products in it. However, given a category's ID, its children will be the same regardless of what parent category led you there. The other constraint is that you'll want to avoid loops.
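Here is one possible sketch of those two tables with Python's built-in sqlite3, including a simple guard against loops. The column names, the sample rows, and the helpers descendants and add_child are assumptions for illustration. The loop check here is done in application code and is not enforced by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CatalogItem (
        Id INTEGER PRIMARY KEY, Label TEXT NOT NULL, Description TEXT, Sku TEXT);
    CREATE TABLE CatalogItemJoin (
        ParentId INTEGER NOT NULL REFERENCES CatalogItem(Id),
        ChildId INTEGER NOT NULL REFERENCES CatalogItem(Id),
        PRIMARY KEY (ParentId, ChildId));
    INSERT INTO CatalogItem (Id, Label, Sku) VALUES
        (1, 'Office Supplies', NULL), (2, 'Hobby Supplies', NULL),
        (3, 'Glues', NULL), (4, 'Crazy Glue', 'SKU-001');
""")

def descendants(item_id):
    """IDs of everything reachable below item_id, via a recursive query."""
    rows = conn.execute("""
        WITH RECURSIVE sub(Id) AS (
            SELECT ChildId FROM CatalogItemJoin WHERE ParentId = ?
            UNION
            SELECT j.ChildId FROM CatalogItemJoin j JOIN sub ON j.ParentId = sub.Id
        )
        SELECT Id FROM sub""", (item_id,))
    return {r[0] for r in rows}

def add_child(parent_id, child_id):
    """Link child under parent, refusing links that would create a loop."""
    if parent_id == child_id or parent_id in descendants(child_id):
        raise ValueError("link would create a loop")
    conn.execute("INSERT INTO CatalogItemJoin VALUES (?, ?)", (parent_id, child_id))

# Glues sits under both roots; Crazy Glue is added once, under Glues.
add_child(1, 3)
add_child(2, 3)
add_child(3, 4)
```

Because Crazy Glue was added once under Glues, it shows up under both Office Supplies and Hobby Supplies without any extra bookkeeping.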