NoSQL - entity holds an owner ID field vs owner holds a list of child IDs - database

I am currently exploring MongoDB.
I built a notes web app and for now the DB has 2 collections: notes and users.
The user can create, read and update his notes.
I want to create a page called /my-notes that will display all the notes that belong to the connected user.
My question is:
Should the notes model have an ownerId field, or the opposite: should the user model have a noteIds field of type list?
Points I found relevant for the decision making:
noteIds approach:
There is no need to query the notes collection for the desired ownerId (with a lot of notes, that would need an index, or a search across the whole notes collection). We just need to find the user by user ID and then get all the notes by their IDs.
In this case there are 2 calls to DB.
The data is ordered by insertion order into the noteIds field of the user document.
ownerId approach:
We do need to find the notes by their ownerId field across the notes collection, which might be more computationally intensive.
We can paginate / sort the data as we want - more control over the data.
Are there any more points you can think of?
As far as I can tell, this is a question of whether you want less computationally intensive DB calls or more control over the data.
What are the "best practices"?
Thanks,

A similar use case is explained in the documentation. If there is no limit on the number of notes a user can have, it might be better to store a userId reference field in each note document.
As you've figured out already, pagination would be easier in the second approach. Also, when updating a note, you can simply pass a filter such as { _id: "note_id", userId: 1 } to updateOne instead of checking the user's document to see whether the note actually belongs to the user.
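As a rough illustration of the reference-field approach, here is a minimal Python sketch that only builds the query arguments, so it runs without a database. The field names ownerId (from the question; the answer calls it userId) and createdAt, and both helper names, are assumptions made for the example.

```python
from typing import Any


def my_notes_query(owner_id: Any, page: int, page_size: int = 20) -> dict:
    """Build the arguments for one page of a user's notes (ownerId approach)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "filter": {"ownerId": owner_id},
        # Newest first; _id breaks ties so that page boundaries stay stable.
        # createdAt is a hypothetical field, not something the question defines.
        "sort": [("createdAt", -1), ("_id", -1)],
        "skip": (page - 1) * page_size,
        "limit": page_size,
    }


def owned_note_filter(note_id: Any, owner_id: Any) -> dict:
    """Filter that matches a note only if it belongs to owner_id."""
    return {"_id": note_id, "ownerId": owner_id}
```

With PyMongo, the first result could be passed as notes.find(**my_notes_query(user_id, page=1)), and the second as the filter argument of notes.update_one(...). If the update result reports matched_count == 0, the note either does not exist or belongs to someone else. An index on ownerId would probably be needed to keep the find cheap once the collection grows.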

Related

Is this a valid DynamoDB access pattern (having "detailed" and "relational" items)?

I am building an application using DynamoDB. High level details are: there are users, there are communities (which users can join), and there are posts (essentially the same use case as Reddit).
My question is how to construct the data in DynamoDB. I am currently using the pattern of having main items (these items are users, posts, communities) which have the exact same partition key and sort key, and these items will always have all details. I'll call these items "detailed" items.
For example, a "detailed" user item would look like this:
Partition Key: USER#<id>
Sort Key: USER#<id>
It would be similar with posts and communities:
Partition Key: POST#<id>
Sort Key: POST#<id>
Partition Key: COMMUNITY#<id>
Sort Key: COMMUNITY#<id>
Now, in order to have relations between these entities, other items will be created which I am going to call "relational" items. So, if a user posts something, a relational item will be created like this:
Partition Key: USER#<id>
Sort Key: POST#<id>
The whole purpose of this "relational" item is just to make it apparent the user has created this post, and it allows for a simple query to get all the posts a user has created.
Now the problem: these "relational" items do not have any of the data of the detailed item, meaning that after doing a query to get all of a user's posts, a batch get would then have to be used to get the "detailed" items (costing more RCUs).
To be clear, the data is not replicated in the "relational" item because posts can be edited, so duplicating the details could lead to inconsistencies.
Is this an appropriate way to access data, or are there better ways? Is the cost of doing a batch get negligible enough? Should the data just be duplicated and, if something is edited, both items updated? Just looking for outside opinions.
I have tried having no "detailed" items and having the "relational" items hold all the details. However, this complicates the requests, since I need both the PK and SK to delete or update an item (compared to a single key, since PK and SK would be the same). Additionally, this pattern seems more streamlined to implement: if it's an object/model in the code, then it is a "detailed" item in the database.
You can avoid the "link entity" by placing the user id in the PK of the post.
Partition Key: POST_USER_ID#<user_id>
Sort Key: POST_ID#<post_id>
This way you can do two types of queries:
Query with PK==POST_USER_ID#123, which will give you all posts of a user
Query with PK==POST_USER_ID#123 and SK==POST_ID#<post_id>, which will give you a specific post by its id
As for "should data be duplicated and updated when needed", this is very common with NoSQL so don't worry about it.
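To make the two access patterns concrete, here is a small in-memory sketch in Python. FakeTable is a made-up stand-in for a table with a composite primary key, not a DynamoDB client, and post_key is a hypothetical helper that follows the key layout suggested above.

```python
from __future__ import annotations


def post_key(user_id: str, post_id: str) -> dict:
    """Composite key for a post: the owner goes in the PK, the post in the SK."""
    return {"PK": f"POST_USER_ID#{user_id}", "SK": f"POST_ID#{post_id}"}


class FakeTable:
    """Tiny in-memory stand-in for a table keyed by (PK, SK)."""

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], dict] = {}

    def put(self, item: dict) -> None:
        """Insert or overwrite the item stored under its (PK, SK) pair."""
        self._items[(item["PK"], item["SK"])] = item

    def query(self, pk: str, sk_prefix: str = "") -> list[dict]:
        """Return the items of one partition, sorted by SK, optionally by SK prefix."""
        hits = [
            item
            for (item_pk, item_sk), item in self._items.items()
            if item_pk == pk and item_sk.startswith(sk_prefix)
        ]
        return sorted(hits, key=lambda item: item["SK"])

    def get(self, pk: str, sk: str) -> dict | None:
        """Return the single item with this exact key, or None."""
        return self._items.get((pk, sk))


table = FakeTable()
table.put({**post_key("123", "a1"), "title": "first"})
table.put({**post_key("123", "a2"), "title": "second"})
table.put({**post_key("456", "b1"), "title": "other user"})

all_posts_of_123 = table.query("POST_USER_ID#123")         # access pattern 1
one_post = table.get("POST_USER_ID#123", "POST_ID#a2")      # access pattern 2
```

One consequence of this layout is that reading a single post requires both the user id and the post id, because both are part of the key.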

How to store feedback like stars or votes of users with efficiency?

I am making a system similar to the Play Store's star rating system, where a product or entity is given ratings and reviews by multiple users, and the average rating is displayed for each entity.
The problem is whether I should store the ratings in each entity's database record, as a list of the users who rated it and the rating each gave. That would make it hard for a user to check which entities he has rated, as we would need to check every entity for the user's presence.
Or should I store each rated entity, with its rating, in the user database? That would make rendering an entity harder.
So, is there a simple and efficient way in which this can be done?
Or is storing the same data in both databases efficient? I also found an example of this system on Stack Overflow: they store the up and down votes of a question, and give +5 for an up vote and - for a down vote to the asking user, which means they definitely need to store each up and down vote in the question database. But when a user opens the question, he can see his own vote, so it is also stored in the user's database.
Thanks for the help.
I would indeed store the 'raw' version at least, so have a big table that stores the productid/entityid, userid and rating. You can query from that table directly to get any kind of result you want. Based on that you can also calculate (or re-calculate) projections if you want, so it's a safe bet to store this as the source of truth.
You can start out with a simple aggregate query, as long as that is fast enough, but to optimize it, you can make projections of the data in a different format, for instance the average review score per product. This can be achieved using (materialized) views, or you can just store the aggregated rating separately whenever a vote is cast.
Updating that projected aggregate can be very lightweight as well, because you can store the average rating for an entity, together with the number of votes. So when you update the rating, you can do:
NewAverage = (AverageRating * NumberOfRatings + NewRating) / (NumberOfRatings + 1)
After that, you store the new average and increment the number of ratings. So there is no need to do a full aggregation again whenever somebody casts a vote, and you get the additional benefit of tracking the number of votes too, which is often displayed on websites as well.
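The incremental update above can be sketched in a few lines of Python. The function name add_rating is made up for the example, and the sketch only covers a new vote being cast, not a user changing an earlier vote.

```python
from __future__ import annotations


def add_rating(average: float, count: int, new_rating: float) -> tuple[float, int]:
    """Fold one new rating into a stored (average, count) pair.

    Implements NewAverage = (AverageRating * NumberOfRatings + NewRating)
                            / (NumberOfRatings + 1)
    """
    new_average = (average * count + new_rating) / (count + 1)
    return new_average, count + 1


# Start with no ratings, then record three votes one at a time.
average, count = 0.0, 0
for vote in (4, 5, 3):
    average, count = add_rating(average, count, vote)
```

After the three votes, the stored pair matches what a full aggregation over 4, 5 and 3 would give, without ever rereading the earlier votes.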
The easiest way to achieve this is by creating a review table that holds the user and the product, so your database should look like this:
product
--id
--name
--price
user
--id
--firstname
--lastname
review
--id
--userId
--productId
--vote
Then, if you want to get all reviews for a product or by a user, you can just query the review table. Hope this solves your problem.
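Here is one way the schema above could look in practice, sketched with Python's built-in sqlite3 module so that it runs without a server. The column types, the UNIQUE constraint (one review per user per product) and the sample rows are assumptions added for the example.

```python
from __future__ import annotations

import sqlite3


def make_db() -> sqlite3.Connection:
    """Create the three tables in memory and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE user (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
        CREATE TABLE review (
            id INTEGER PRIMARY KEY,
            userId INTEGER NOT NULL REFERENCES user(id),
            productId INTEGER NOT NULL REFERENCES product(id),
            vote INTEGER NOT NULL,
            UNIQUE (userId, productId)  -- assumption: one review per user per product
        );
        INSERT INTO product (id, name, price) VALUES (1, 'Lamp', 20.0), (2, 'Desk', 150.0);
        INSERT INTO user (id, firstname, lastname) VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
        INSERT INTO review (userId, productId, vote) VALUES (1, 1, 5), (2, 1, 3), (1, 2, 4);
        """
    )
    return conn


def product_rating(conn: sqlite3.Connection, product_id: int) -> tuple:
    """Return (average vote, number of votes) for one product."""
    return conn.execute(
        "SELECT AVG(vote), COUNT(*) FROM review WHERE productId = ?", (product_id,)
    ).fetchone()


def ratings_by_user(conn: sqlite3.Connection, user_id: int) -> list:
    """Return (productId, vote) pairs for everything one user has rated."""
    return conn.execute(
        "SELECT productId, vote FROM review WHERE userId = ? ORDER BY productId",
        (user_id,),
    ).fetchall()
```

Both directions of the question (the ratings of an entity, and the entities a user has rated) are answered from the same review table, so nothing has to be stored twice.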

CakePHP2 Get Associated Model from DataSource

I have a User Model which uses a standard MySQL database and users table, and a Movie Model which uses a Rotten Tomatoes datasource. They have a hasAndBelongsToMany relationship and I'm successfully able to write to the join table users_movies, which holds the user_id and movie_id (the movie_id is the Rotten Tomatoes id). Works great.
The trouble is retrieving a User's movies. The standard find:
$movies = $this->User->find('all', array('conditions' => array('id' => $user_id)));
only returns the User, not the associated Movie(s). I put a die statement in the read method of my DataSource and it's not even reaching that method. How can I go about retrieving a User's movies?
So Rotten Tomatoes is holding your movies? In that case, of course Rotten Tomatoes wouldn't allow direct SQL access to their database - you'd be accessing it via an API. So Cake definitely won't just be able to join Users to Movies the way it normally would with two tables in the same database.
What you'll have to do is 1) get a list of the user's movie IDs from Cake, and then 2) call the Rotten Tomatoes API to get a list of movies whose ID is in your list of movie IDs. (That's assuming Rotten Tomatoes allows such an API call.)
Having a quick look at the API, it looks like their 'movies search' (http://developer.rottentomatoes.com/docs/json/v10/Movies_Search) only allows you to specify plain text as the search criteria (i.e., you can't search based on movie IDs). And their 'movie info' method (http://developer.rottentomatoes.com/docs/json/v10/Movie_Info), which does allow you to retrieve a movie by ID, only allows you to retrieve one movie at a time.
You could of course loop through your list of movie IDs for a given user and make a separate API call to Rotten Tomatoes for each one, though I'd imagine that would get VERY slow.
Someone has put in a feature request for retrieving based on a list of multiple IDs (http://developer.rottentomatoes.com/forum/read/123940), but until that request gets implemented, you will probably have a tough time getting anything decent working.
I got it working by making a Model out of my join table between users and movies and getting a user's movies that way. Then I did as you suggested and looped through the user's movie ids, making a call to the API and getting each movie. Not sure how elegant it is, but it is working.
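The shape of that loop can be sketched independently of CakePHP. The Python below is only an illustration: movies_for_user and fetch_movie are hypothetical names, fetch_movie stands in for whatever performs the one-movie-at-a-time API call, and the cache is an addition meant to soften the slowness concern raised above, not something either post describes.

```python
from __future__ import annotations

from typing import Callable, Iterable


def movies_for_user(
    movie_ids: Iterable[str],
    fetch_movie: Callable[[str], dict],
    cache: dict[str, dict],
) -> list[dict]:
    """Resolve a user's movie ids (read from the join table) into movie records."""
    movies = []
    for movie_id in movie_ids:
        if movie_id not in cache:
            # One remote call per id that has not been seen before.
            cache[movie_id] = fetch_movie(movie_id)
        movies.append(cache[movie_id])
    return movies


# Stand-in fetcher that records which ids triggered a "remote" call.
calls: list[str] = []


def fake_fetch(movie_id: str) -> dict:
    calls.append(movie_id)
    return {"id": movie_id}


cache: dict[str, dict] = {}
first = movies_for_user(["770", "771"], fake_fetch, cache)
second = movies_for_user(["770"], fake_fetch, cache)  # served from the cache
```

Passing the fetcher in as a parameter keeps the loop testable without network access.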

Create multiselect lookup in salesforce using apex

I want to create a multi-select Contact Lookup.
What I want:
When the user clicks on the lookup, he should be able to select multiple contacts from it.
What I have done:
I have created an object and a field inside that object using both
"Lookup" and
"MasterDetail Relationship" and
"Junction Object"
When I try to use this field for any input text/field, it always lets me select only one value from the lookup, but I want an option to select multiple.
Even on the junction object, where I have created 2 master-detail relationships, the lookup still allows only one value to be selected. Moreover, it makes the field mandatory, which I don't want.
Links that i followed:
http://success.salesforce.com/questionDetail?qId=a1X30000000Hl5dEAC
https://ap1.salesforce.com/help/doc/user_ed.jsp?loc=help&section=help&hash=topic-title&target=relationships_manytomany.htm
Can anybody suggest how to do this?
It's the same as the Email CC/BCC fields under the Send Email option for any Lead.
Even if you use a junction object, a lookup is just that: it references (looks up to) one other record. When you create a record on the junction object you still have to set each lookup individually, and you're still creating only one record.
Master Detail relationships are essentially lookups on steroids, one object becomes the child of the other and will be deleted if the parent object is deleted, they're not going to provide an interface to lookup to many records at once.
If you're not a developer then your best bet is to either just create one junction object record at a time, or look into using Data Loader. You could prepare your data in Excel or similar and then upload all the records into Salesforce in one go.
If you are a developer, or have developers at your disposal, then what we've done in the past is create a Visualforce page to do the job. So if, for example, you wanted to link a bunch of contacts up to an Account, we'd have a single account lookup field on the page, then some search fields relating to fields on the contact. Using a SOQL query you can then find all contacts matching the search parameters and display them in a list, where you may want to provide checkboxes to allow the user to select the contacts they want. Then it's just a case of looping through the selected contacts, setting their Account field to be the chosen account.
There are areas in Salesforce (such as the send Email functionality you mentioned) where it's clear to see that bespoke work has been done to fulfil a specific task — another instance of what you want is in the area where you can manage campaign members. This is the model I've copied in the past when implementing a Visualforce page as described.
Good luck!
For adding multiple junction objects at one time, the only solution we have found is a custom Visualforce page, as described by LaceySnr.
For a slightly different problem, where we need to assign many of object B to object A, we have trained our users to do this with a view on object B. We are assigning Billing Accounts (B) to Payment Offices (A). The view on Billing Account has check boxes on the left side. The user checks the Billing Accounts to be assigned, then double-clicks on the Payment Office field on any of the checked rows. A pop-up asks if you want to update only the single row or all checked rows. By selecting 'all checked rows', the update is done to all of them.
The view is created by the user, who enters the selection criteria (name, address, state, etc.). All user-created views are visible only to them.

Looking for Denormalization Advice for Google App Engine

I am working on a system, which will run on GAE, which will have several related entities and I am not sure of the best way to store the data. This post is a request for advice from others who may have similar experience....
The system will have users, with profile data and an image. Those users will be able to create "events" and add journal entries to them. For the purpose of the system, the "events" will likely have 1 or 2 journal entries in them, and anything over 10 would likely never happen. Other users will be able to add comments to users' entries as well, where popular ones may have hundreds or even thousands of comments. When a random visitor uses the system, they should be able to see the latest events (latest being defined by those with the latest journal entries in them), search by tag, and perform a very basic text search. Then, upon selecting an event to view, it should be displayed with all journal entries and all user comments, with user images alongside comments. A user should also have a kind of self-admin page, to view/modify/delete their events and to view/modify/delete comments they have made on other events. So, doing all this on a normal RDBMS would just be queries with some big joins across several tables. On GAE it would obviously need to work differently. Here are my initial thoughts on the design of the entities:
Event entity - id, name, timestamp, list property of tags, view count, creator's username, creator's profile image id, number of journal entries it contains, number of total comments it contains, timestamp of last update to contained journal entries, list property of index words for search (built/updated from text from contained journal entries)
JournalEntry entity - timestamp, journal text, name of event, creator's username, creator's profile image id, list property of comments (containing commenter username and image id)
User entity - username, password hash, email, list property of subscribed events, timestamp of create date, image id, number of comments posted, number of events created, number of journal entries created, timestamp of last journal activity
UserComment entity - username, id of event commented on, title of event commented on
TagData entity - tag name, count of events with tag on them
So, I'd like to hear what people here think about the design and what changes should be made to help it scale well. Thanks!
Rather than store Event.id as a property, use the id automatically embedded in each entity's key, or set unique key names on entities as you create them.
You have lots of options for modeling the relationship between Event and JournalEntry: you could use a ReferenceProperty, you could parent JournalEntries to Events and retrieve them with ancestor queries, or you could store a list of JournalEntry key ids or names on Event and retrieve them in batch with a key query. Try some things out with realistically-distributed dummy data, and use appstats to see what works best.
UserComment references an Event, while JournalEntry references a list of UserComments, which is a little confusing. Is there a relationship between UserComment and JournalEntry, or just between UserComment and Event?
Persisting so many counts is expensive. When I post a comment, you're going to write a new UserComment entity and also update my User entity and a JournalEntry entity and an Event entity. The number of UserComments you expect per Event makes it unwise to include everything in the same entity group, which means you can't do these writes transactionally, so you'll do them serially, and the entities might be stored across different network nodes, making the whole operation slow; and you'll also be open to consistency problems. Can you do without some of these counts and consider storing others in memcache?
When you fetch an Event from the datastore, you don't actually care about its list of search index words, and retrieving and deserializing them from protocol buffers has a cost. You can get around this by splitting each Event's search index words into a separate child EventIndex entity. Then you can query EventIndex on your search term, fetch just the EventIndex keys for EventIndexes that match your search, derive the corresponding Events' keys with key.parent(), and fetch the Events by key, never paying for the retrieval or deserialization of your search index word lists. Brett Slatkin explains this strategy here at 14:35.
Updating Event.viewCount will fail if you have a lot of views for any Event in rapid succession, so you should try out counter sharding.
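The idea behind counter sharding can be shown with a small in-memory Python sketch. This is not App Engine code: in the datastore each shard would be its own entity, whereas here a shard is just a list slot, and the class name and default shard count are made up for the example.

```python
import random


class ShardedCounter:
    """In-memory illustration of a counter split across several shards."""

    def __init__(self, num_shards: int = 20) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # In the datastore, each slot would be a separate shard entity.
        self._shards = [0] * num_shards

    def increment(self) -> None:
        """Add one to a randomly chosen shard.

        Spreading writes over many shards means that rapid successive updates
        rarely contend for the same entity.
        """
        index = random.randrange(len(self._shards))
        self._shards[index] += 1

    def total(self) -> int:
        """Read the counter by summing every shard."""
        return sum(self._shards)


view_count = ShardedCounter(num_shards=10)
for _ in range(1000):
    view_count.increment()
```

The trade-off is that reads become more expensive, since every shard has to be fetched and summed, which is one reason the total is often cached.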
Good luck, and tell us what you learn by trying stuff out.

Resources