I'm creating a prototype group list application. I want the following objects:
User
List
Item
Comment
I think that I should structure this as follows:
http://myapp.firebase.io/user/
http://myapp.firebase.io/user/uid/lists/
http://myapp.firebase.io/list/
http://myapp.firebase.io/item/listid/
http://myapp.firebase.io/comment/itemid
where http://myapp.firebase.io/user/uid/lists/ points to the unique IDs of that user's lists, http://myapp.firebase.io/item/listid/ points to all item objects for a given list, and http://myapp.firebase.io/comment/itemid points to all comments for a given item.
Does this structure make sense? The reason I did it this way instead of nesting further (i.e. http://myapp.firebase.io/list/listid/item/ for items and http://myapp.firebase.io/list/listid/item/itemid/comment for comments) is because it says in the documentation that whenever you fetch an object you fetch all children. Sometimes (perhaps even most of the time) I want to fetch a list's items, but not each item's comments. I might only want to do that when a user clicks on the item.
In a NoSQL database you should model your data for how you intend to use it. I highly recommend reading this article on NoSQL data modeling.
The top-level structure seems fine and does not violate Firebase's recommendation to limit nesting of data. But there are many other places where you might still make mistakes (which is one of the reasons this question is a bit too broad for Stack Overflow, but I'll try to answer some of it anyway).
I'd separate out the user's lists into a separate top-level node:
/userlists/$uid/$listid
That way the /users/$uid nodes would just contain the user's profile information and you could cheaply show a list of users. You might even consider splitting the most visible aspect of the user profile into another top-level node, to make the showing of such a list even cheaper.
/usernames/$uid
You'll be duplicating data in this case. But storage is (relatively) cheap, and optimizing for the more common reading of data is one of the reasons NoSQL databases can scale so well.
As you may notice, I focus on showing a list of user names, retrieving the lists for a user and accessing the profile for a specific user. These are use-cases and we're modeling the data to fit them.
In a NoSQL database you should model your data for how your app accesses it. I highly recommend reading this article on NoSQL data modeling.
After that, write out your list of use-cases and see how you can most easily access the data for it. Liberally denormalize and occasionally duplicate the data, to fit the use-cases. Use multi-location updates to keep denormalized and duplicated data in sync with its main entity.
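As a rough illustration of the multi-location update idea, the sketch below builds a single path-to-value map that touches both the profile node and the duplicated name node. The /users/$uid and /usernames/$uid paths follow the structure above; the name field inside the profile and the helper function are assumptions, and actually sending the map to Firebase through your SDK's update call is left out.

```python
def build_rename_update(uid, new_name):
    """Return one path -> value map that renames a user everywhere the name is stored.

    Assumes the profile keeps the name under /users/$uid/name (hypothetical field)
    and that /usernames/$uid holds the duplicated display name.
    """
    return {
        f"/users/{uid}/name": new_name,
        f"/usernames/{uid}": new_name,
    }


# Both locations end up in one map, so they can be written in a single update.
update = build_rename_update("uid123", "Alice")
for path, value in sorted(update.items()):
    print(path, "=", value)
```

Keeping every duplicated location in one helper like this makes it harder to forget one of the copies when the main entity changes.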
I have an ASP.NET MVC website with a SQL Server backend. I am simplifying my situation to highlight and isolate the issue. I have 3 tables in the DB:
Article table (id, name, content)
Location table (id, name)
ArticleLocation table (id, article Id, location Id)
On my website, when you create an article, you select from a multiselect listbox the locations where you want that article sent.
There are about 25 locations so I was debating adding a new location called "Global" as a shortcut instead of having the person select 25 different items from a listbox. I could still do this as a shortcut on the front end but now I am debating if there is benefit for this to flow through to the backend.
So if I have an article that goes global, instead of having 25 records in the ArticleLocation table, I would only have one and then I would do some tricks on the front end to select all of the items. I am trying to figure out if this is a very bad idea.
Things I can think about that are making me nervous:
What if I create an article and choose global, but later on 3 new locations are added? Without this global setting, these 3 locations would not get the article, but in the new way they would. I am not sure which is better, as the second behavior might actually be what you want, but it's a little less explicit.
I have a requirement on a report: I want to filter by all articles that are global. I imagine I would need an article.IsGlobal() method. Right now I guess I could say that if an article has the same count of locations as there are records in the Location table, it is deemed global, but again, since people can add new locations, I feel like this approach is somewhat flaky.
Does anyone have any suggestions for this dilemma around creating records in a reference data table that really reflect "all records"? I appreciate any advice.
By request, here is my comment promoted to an answer. It's an opportunity to expand on it, too.
I'll limit my answer to a system with a single list of locations. I've done the corporate hierarchy thing: Companies, Divisions, Regions, States, Counties, Offices and employees or some such. It gets ugly.
In the case of the OP's question, it seems that adding an AllLocations bit to the Articles table makes the intention clear. Any article with the flag set to 1 appears in all locations, regardless of when they were created, and need not have any entries in the ArticleLocation table. An article can still be explicitly added to all existing locations if the author does not want it to automatically appear in future locations.
Implementation involves a little more work. I would add INSERT and UPDATE triggers to the Article and ArticleLocation tables to enforce the rule that either the AllLocations bit is set and there are no corresponding rows in ArticleLocation, or the bit is clear and locations may be explicitly set. (It's a personal preference to have the database defend itself against "bad data" whenever it's practical to do so.)
Depending on your needs, a table-valued function is a good way to hide some of the dirty work, e.g. dbo.GetArticleIdsForLocation( LocationId ) can handle the AllLocations flag internally. You can use it in stored procedures and ad-hoc queries to JOIN with Article. In other cases a view may be appropriate.
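Here is a minimal sketch of what a lookup like dbo.GetArticleIdsForLocation might do internally, written in Python against an in-memory SQLite database rather than as a T-SQL table-valued function. The column names all_locations, article_id and location_id, and the sample rows, are assumptions for illustration only.

```python
import sqlite3


def get_article_ids_for_location(conn, location_id):
    """Articles flagged for all locations, plus those explicitly linked to this one."""
    rows = conn.execute(
        """
        SELECT id FROM Article WHERE all_locations = 1
        UNION
        SELECT article_id FROM ArticleLocation WHERE location_id = ?
        """,
        (location_id,),
    ).fetchall()
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Article (
        id INTEGER PRIMARY KEY,
        name TEXT,
        all_locations INTEGER NOT NULL DEFAULT 0  -- hypothetical AllLocations bit
    );
    CREATE TABLE ArticleLocation (article_id INTEGER, location_id INTEGER);
    INSERT INTO Article (id, name, all_locations)
        VALUES (1, 'Everywhere', 1), (2, 'Local', 0), (3, 'Elsewhere', 0);
    INSERT INTO ArticleLocation (article_id, location_id) VALUES (2, 5), (3, 7);
    """
)

print(get_article_ids_for_location(conn, 5))
```

Callers never see the flag; a location added later picks up the flagged article without any new ArticleLocation rows.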
Another feature that you are welcome to borrow ("Steal from your friends!") is to have the administrator's landing page be an "exceptions" page. It's a place where I display things that vary from massive flaming disasters to mere peccadillos. In this case, articles that are associated with zero locations would qualify as something non-critical, but worth checking up on.
Articles that are explicitly shown in every location might be of interest to someone adding a new location, so I would probably have a web page for that. It may be that some of the articles should be updated to account for the new location explicitly or reconsidered for being changed to all locations.
Is it ever a good idea ... that represent “all other records”?
Is it ever a good idea to represent a tree in a table? The root of a tree represents “all other records”.
Trees and hierarchies are not simple to work with, but there are many examples, articles and books that tackle the problem, such as Celko's Trees and Hierarchies in SQL and Karwin's SQL Antipatterns.
So what you actually have here is a hierarchy (maybe just a tree) -- it may help to approach the problem that way from the start. The Global from your example is just another Location (root of a tree), so when a new location is added, you may decide if it will be a child of the Global or not.
Facts
Location(LocationID) exists.
Location(LocationID) is contained in Parent Location(LocationID).
Article(ArticleID) exists.
Article(ArticleID) is available at Location(LocationID).
Constraints
Each Location is contained in at most one Parent Location. It is possible that for some Parent Location, more than one Location is contained in that Parent Location.
It is possible that some Article is available at more than one Location
and that for some Location, more than one Article is available at that Location.
Logical
This way you can assign any location to an article -- but have to resolve it to the leaf level when needed.
The hierarchy (tree) is here represented in the "naive way"; use closure table, nested sets or enumerated path instead -- or, if you like recursion...
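To show what "resolving to the leaf level" could look like with the naive parent-pointer representation, here is a small recursive sketch in Python. The parent_of mapping stands in for the "Location is contained in Parent Location" fact, and the sample location names are made up.

```python
def leaf_locations(parent_of, location_id):
    """Return the leaf locations at or below location_id.

    parent_of maps each LocationID to its parent LocationID (None for the root).
    """
    # Invert the child -> parent mapping into parent -> children.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    def walk(node):
        kids = children.get(node, [])
        if not kids:
            return [node]  # no children: this location is a leaf
        leaves = []
        for kid in kids:
            leaves.extend(walk(kid))
        return leaves

    return sorted(walk(location_id))


# Hypothetical sample tree: Global -> Europe -> {Paris, Berlin}, Global -> Tokyo
parent_of = {
    "Global": None,
    "Europe": "Global",
    "Paris": "Europe",
    "Berlin": "Europe",
    "Tokyo": "Global",
}

print(leaf_locations(parent_of, "Global"))
print(leaf_locations(parent_of, "Europe"))
```

An article assigned to Global resolves to every leaf, including any leaf added under Global later on.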
tl;dr
In this case as I understand it, I think it is a good idea to create a "global" location in the Location table. I definitely find it preferable to creating a "global" flag in the Article table.
"Is it ever a good idea...?" is not a question we like to answer on SO. It's mostly a debate question, not a Q&A question, and besides, we have enough creativity in our community to come up with some example where "it" would be a good idea, regardless.
To your more specific question, how do I represent "all locations" in the database? that is a judgement call based on your business requirements.
Do you want "all locations" to include future locations?
If not, then probably you should only implement "all locations" as a helper that selects all current locations in the database.
Do you anticipate having a hierarchy of locations?
Real-world locations have significant hierarchy:
Global
Multi-national (continent, trading bloc)
Country
Administrative region (state, province, canton, etc.)
City
Neighborhood
If you think you are going to want to have the option to choose, say, a Country, instead of Global, then implementing a hierarchical representation such as Damir suggests is the best way to go. However, if you are not sure if you are ever going to have any other grouping of locations besides Global, a hierarchical data structure is too much work for now. All you need to do is make sure your current implementation has a migration path to a possible future hierarchical representation.
Global as a pseudo-location
If you do want future locations included in Global and do not need a hierarchical location structure, then my instinct based on years of experience would be to create "Global" as a pseudo-location. That is, Global would be one of the locations in the Location table, but it would have a special meaning. This is definitely a trade-off, but has the benefit of not altering the data structure to support Global which means that all the special cases that "Global" creates are handled by excluding or including some Locations in queries rather than by checking some flags somewhere. (Or if you like flags, you can add a 'pseudo-location' flag to the Location table.)
With Global as a location, additions or deletions to the Location table are handled automatically. The query for all Global articles is straightforward: the same as the query for all articles for any other Location. Reporting on articles by location is also straightforward, with Global articles appearing in reports just like any other location. You can also represent the difference between a "Global" article (all current and future locations) and an "all locations" article (all current locations but no future locations).
Selecting all articles that should be visible at a specific location is slightly harder: it's now a check against "Global" as well as that location. But at least it is checking for 2 values in the same table rather than checking two different tables.
SELECT article_id FROM ArticleLocation WHERE location_id in (1, 5);
vs
SELECT article_id FROM ArticleLocation WHERE location_id = 5
UNION
SELECT id FROM Article WHERE is_global = 1;
From the logic, as you described it, GLOBAL should be actually global and stay global, even if you add new locations (problem 1 solved). But this also implies that GLOBAL is not the same thing as "all locations" (as there might also exist some other locations we don't have defined yet). I think this logic is needed especially by your requirement 2 - otherwise it would completely fail on adding new locations.
Analysis done! From the above we see that GLOBAL is something above all those locations. There's no sense in trying to define it as a Location. Go for the easiest solution!
Article table (id, name, content, global)
i.e. boolean flag - article is global or not. In the UI, do it simply as a checkbox - if checked, the multiselect box will be disabled. Simple, easy, requirements met. Done!
Is there a need to automatically add some articles to new locations when new locations are added? If yes, then in that case I’d consider adding a new ‘global’ property in the backend.
Otherwise it probably isn’t worth the effort. Even if you had 10000 articles and 20 different locations selected for each article that would be about 200k records which is not that bad when you set indexes.
Check your existing data and see how people are already choosing locations. If most users select only several locations and not all then it’s really an edge case and you shouldn’t be working on it unless it really creates problems.
I agree with @HABO's comment (he should have posted it--if he does, upvote him). Adding an attribute to table Article to identify those items that are to be associated with all Locations, present and future, presumably for the lifetime of the article, should save you time and effort over the long run. Sure, triggers and counts-against-all will do the trick, but they're awkward and would be a pain to support if/when subsequent system changes come along. The UI would be simpler to use, as the user just has to click a checkbox (or whatever) and not multi-click everything in a dropdown of unforeseeable length.
(@Damir's hierarchy idea would work as well, but--speaking from a bit too much experience--hierarchies are a hassle to work with, and I wouldn't introduce one here unless there was significantly more system and/or business use to get out of it.)
I have a custom object 'Subject__c' with 3 fields and I have created those objects by uploading a CSV file. Subject__c has a lookup relationship with Leads (it's general for the same user regardless of what lead he is viewing). I am able to insert a related list and I can see that the objects are created under Data Management/Storage Usage. But the related list shows up blank.
You're saying that the custom object has lookup to Lead but then you say Subjects are generic and somehow should be displayed on every Lead page? I don't think it'll work.
Stuff appears on the related list only when the field Subject__c.Lead__c is populated with "this" Lead's Id (please note I've made a best guess at the field name). So you'd need to insert separate data for each Lead, which can quickly blow up your storage usage and will be a pain in the a$$ to maintain later. Is it only for displaying? Or do you plan to later capture some kind of survey results for each Lead?
If it's just for display, I think you'll need to embed a Visualforce page in the Lead page layout to achieve that in a saner way. Are the subjects specific to the currently viewing user? Or is it more like a general list, just 3 subjects for the whole organisation?
P.S. An "object" is like a table in a normal database. I think you've mixed up the table with the records / rows of data stored in it.
This is more of a question for experienced people who've worked a lot with multilingual websites and e-shops. This is NOT a database structure question or anything like that. This is a question on how to store a multilingual website: NOT how to store translations. A multilingual website can not only be translated into multiple languages, but can also have language-specific content. For instance, an English version of the website can have a completely different structure than the same website in Russian or any other language. I've thought up 2 storage schemas for such cases:
// NUMBER ONE
table contents // to store some HYPOTHETICAL content
id // content id
table contents_loc // to translate the content
content, // ID of content to translate
lang, // language to translate to
value, // translated content
online // availability flag, VERY IMPORTANT
ADVANTAGES:
- Content can be stored in multiple languages. This schema is pretty common, except maybe for the "online" flag in the "_loc" tables. About that below.
- Every content can not only be translated into multiple languages, but also you could mark online=false for a single language and disable the content from appearing in that language. Alternatively, that record could be removed from "_loc" table to achieve the same functionality as online=false, but this time it would be permanent and couldn't be easily undone. For instance we could create some sort of a menu, but we don't want one or more items to appear in english - so we use online=false on those "translations".
DISADVANTAGES:
- Quickly gets pretty ugly with more complex table relations.
- More difficult queries.
// NUMBER 2
table contents // to store some HYPOTHETICAL content
id, // content id
online // content availability (not the same as in first example)
lang, // language of the content
value, // translated content
ADVANTAGES:
1. Less painful to implement
2. Shorter queries
DISADVANTAGES:
1. Every multilingual record would now have 3 different IDs. It would be bad for e.g. products in an e-shop, since the first version would allow us to store different languages under the same ID and this one would require 3 separate records to represent the same product.
First storage option would seem like a great solution, since you could easily use it instead of the second one as well, but you couldn't easily do it the other way around.
The only problem is ... the first structure seems a bit like overkill (except in cases like product storage).
So my question to you is:
Is it logical to implement the first storage option? In your experience, would anyone ever need such a solution?
The question we ask ourselves is always:
Is the content the same for multiple languages and do they need a relation?
Translatable models
If the answer is yes, you need a translatable model: a model with multiple versions of the same record, which means you need a language flag for each record.
PROS: It gives you a structure in which you can see for example which content has not yet been translated.
Separate records per language
But many times we see a different solution as the better one: just separate both languages totally. We mostly see this in CMS solutions. The story is not only translated but also different. For example, in country 1 they have a different menu structure, other news items, other products and other pages.
PROS: Total flexibility and no unexpected records from other languages.
Example
We see it like writing a magazine: you can write one, then translate it to another language. Yes, that's possible, but in the real world we see more and more that the content is structurally different. People don't like to be surprised, so you need lots of steps to make sure content is not visible in the wrong languages, pages don't get created in duplicate, etc.
Sharing logic
So what we do most of the time is: share the views, make the buttons, inputs etc. translatable, but keep the content separated, so that every admin can just work in his own area. If we need to confirm that some records are available in all languages we can always handle that by creating a link (nicely relational) between them, but it is not the standard we use most of the time.
Really translatable records like products
Because we are flexible in creating models etc., we can just decide how to work with them based on the requirements. I would not try to look for a general solution which works for all cases, because there is none. You need a solution based on your data.
Assuming that you need a translatable model, as it is described by Luc, I would suggest coming up with some sort of special-character-delimited key-value pair format for the value column of the content table. Example:
#en=English Term#de=German Term
You may use UDFs (User Defined Functions in T-SQL) to set/get the appropriate term based on the specified language.
For selecting :
select id, dbo.GetContentInLang(value, @lang)
from content
For updating:
update content
set value = dbo.SetContentInLang(value, @lang, @new_content)
where id = @id
The UDFs:
a. do have a performance hit, but this is also the case for the join that you would have to do between the content and content_loc tables
and
b. are somewhat difficult to implement, but are reusable practically throughout your database.
You can also do the above on the application/UI layer.
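For the application-layer variant, a minimal sketch in Python might look like the following. The function names mirror the UDF names above but are otherwise hypothetical, and the sketch assumes that neither language codes nor terms ever contain the '#' delimiter.

```python
def parse_terms(value):
    """Split '#en=English Term#de=German Term' into a {lang: term} dict."""
    terms = {}
    for part in value.split("#"):
        if part:  # skip the empty piece before the leading '#'
            lang, _, text = part.partition("=")
            terms[lang] = text
    return terms


def get_content_in_lang(value, lang):
    """Return the term stored for lang, or None if there is no such entry."""
    return parse_terms(value).get(lang)


def set_content_in_lang(value, lang, new_content):
    """Return a new delimited string with the term for lang added or replaced."""
    terms = parse_terms(value)
    terms[lang] = new_content
    return "".join(f"#{code}={text}" for code, text in terms.items())


stored = "#en=English Term#de=German Term"
print(get_content_in_lang(stored, "de"))
print(set_content_in_lang(stored, "fr", "Terme français"))
```

A real implementation would also need an escaping rule for the delimiter, which this sketch leaves out.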
At the moment, my project at work has a very inefficient loop which is suffering the n + 1 problem to a great degree. (6n + 1, I think.) Currently, a number of web services instantiate an object whose constructor builds a canonical representation of one of our ORM objects -- call them Foo and FooView(). There are a number of places where a collection of Foo is built; each instance of Foo is passed to FooView and has its (pseudo-)foreign key fields queried in another database to build a textual representation, so that, for example, we can return <fooColor>Blue</fooColor> rather than <fooColor>5</fooColor>. The sets of these properties--Colors, Shapes, and other similarly general properties--are relatively small, and obviously should be pulled into memory.
There is also another, more complex query, which is contributing to the 6n + 1 problem. This is a set of metadata fields. Each Foo has a Source. Each Source can have one, none, or many metadata fields defined for their subset of Foos. Empty XML tags are required for metadata fields which apply to a given Foo's Source. Currently, the four(!) ORM queries(!) used to build this XML are located inside the FooView constructor, meaning they get executed for each and every Foo.
My goal is as follows:
1. Query for general properties, like Color, Shapes, etc. before anything else.
2. Run the query to generate the collection of Foo. Store the primary keys in a list.
3. Using the list of primary keys, run the heinous multi-join, raw SQL query to generate Foo.Metadata.
4. Call FooView, providing the collection of Foo along with a context object containing the items built in steps 1 and 3. FooView will provide the interleaving logic, using the context object rather than database lookups.
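Roughly, the plan above could be sketched like this in Python. Everything here is a stand-in: the dictionaries play the role of the results of the up-front queries, and FooContext and build_views are hypothetical names, not existing code.

```python
from dataclasses import dataclass, field


@dataclass
class FooContext:
    """Lookup data loaded once, before any view is built (hypothetical name)."""
    colors: dict  # color id -> color name, from the general-properties query
    metadata_by_foo: dict = field(default_factory=dict)  # foo id -> {field: value}


def build_views(foos, context):
    """Build one view per Foo using only in-memory lookups, no per-Foo queries."""
    views = []
    for foo in foos:
        views.append({
            "id": foo["id"],
            "fooColor": context.colors[foo["color_id"]],
            "metadata": context.metadata_by_foo.get(foo["id"], {}),
        })
    return views


# Stand-ins for the three queries that run once, up front.
colors = {5: "Blue", 6: "Red"}
foos = [{"id": 1, "color_id": 5}, {"id": 2, "color_id": 6}]
metadata = {1: {"origin": ""}}  # empty value stands in for an empty XML tag

context = FooContext(colors=colors, metadata_by_foo=metadata)
views = build_views(foos, context)
print(views[0]["fooColor"])
```

The number of queries stays fixed however many Foo instances come back, which is the point of the exercise.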
Is this a sound practice? It will certainly solve some of the performance problems in generating the FooView, but where should this thing live? Should I call it FooHelper? FooContext? FooService? Is this a design pattern, or is there one I should be using to make this more logical?
Thanks!