Working with nested single queries in Firestore - database

Recently I moved my data model from Firebase to Firestore. All my code is working, but I'm having some ugly trouble with the nested queries I need to retrieve some data. Here is the point:
Right now my data model for this part looks like this (yes, another followers/feed example):
{
  "Users": { //Collection
    "UserId1" : { //Document
      "Feed" : { //Subcollection with the IDs of posts from users this user follows
        "PostId1" : { //Document
          "timeStamp" : "SomeDate"
        },
        "PostId2" : {
          "timeStamp" : "SomeDate"
        },
        "PostId3" : {
          "timeStamp" : "SomeDate"
        }
      }
      //Some data
    }
  },
  "Posts":{ //Collection
    "PostId1":{ //Document
      "Comments" :{ //Subcollection
        "commentId" : { //Document
          "authorId": "UserId1"
          //commentsData
        }
      },
      "Likes" : { //Subcollection
        "UserId1" : { //Document
          "liked" : true
        }
      }
    }
  }
}
My problem is that to retrieve the posts in a user's feed I have to query in the following way:
Get the last X documents, ordered by timeStamp, from my Feed:
feedCol(userId).orderBy(CREATION_DATE, Query.Direction.DESCENDING).limit(limit)
After that I have to do a single query for each post retrieved from the list: workoutPostCol.document(postId)
Now I have the data of each post, but I want to show the username, picture, points, etc. of the author, which is in a different document, so again I have to do another single query for each authorId retrieved in the list of posts: userSocial(userId).document(toId)
Finally, and no less important, I need to know whether my current user already liked that post, so I need to do a single query for each post (again) and check whether my userId is inside posts/likes/{userId}
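To make the cost of the four steps above concrete, here is a rough back-of-the-envelope sketch in Python. It only counts document reads; it does not talk to Firestore. The names FeedLoadCost and estimate_feed_reads are made up for illustration, and it assumes one billed read per document fetched and that author profiles are fetched once per distinct author (a small optimization over "one query per authorId").

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeedLoadCost:
    feed_reads: int    # step 1: one query returning the feed documents
    post_reads: int    # step 2: one get() per post ID
    author_reads: int  # step 3: one get() per distinct author (assumption)
    like_reads: int    # step 4: one get() per post on Likes/{userId}

    @property
    def total(self) -> int:
        return self.feed_reads + self.post_reads + self.author_reads + self.like_reads


def estimate_feed_reads(author_ids: list[str]) -> FeedLoadCost:
    """Estimate document reads for one page of the feed.

    `author_ids` holds the author of each post on the page, in feed order.
    """
    posts = len(author_ids)
    return FeedLoadCost(
        feed_reads=posts,
        post_reads=posts,
        author_reads=len(set(author_ids)),
        like_reads=posts,
    )


# Five posts written by three distinct authors.
cost = estimate_feed_reads(["u1", "u2", "u1", "u3", "u2"])
print(cost.total)  # 18
```

So under these assumptions a page of X posts costs roughly 3X reads plus one read per distinct author, which is the number to compare against any denormalized alternative.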
Right now everything is working, but given that Firestore pricing depends on the number of database calls, and that this doesn't make my queries any simpler, I don't know whether my data model is just not good for this kind of database and I should move to regular SQL or just go back to Firebase.
Note: I know that EVERYTHING would be a lot easier if I moved these subcollections of likes, feed, etc. to array lists inside my user or post documents, but the limit of a document is 1MB and if this grows too much, it will crash in the future. On the other hand, Firestore doesn't allow subdocument queries (yet) or an OR clause using multiple whereEqualTo.
I have read a lot of posts from users who have problems looking for a simple way to store this kind of ID relationship in order to make joins and queries in their collections. Using array lists would be awesome, but the 1MB limit restricts it too much.
I hope that someone will be able to clarify this, or at least teach me something new; maybe my model is just crap and there is a simpler and easier way to do this? Or maybe my model is not possible for a non-SQL database.

I'm not 100% sure this solves the problem entirely, since there may be edge cases for your usage. But after five minutes of quick thinking, I feel like the following could solve your problem:
You can consider using a model similar to Instagram's. If my memory serves me well, what they use is an events-based collection. By events in this specific context I mean all actions the user takes. So a comment is an event, a like is an event, etc.
This means you'll need three main collections in total.
users
-- userID1
---- userdata (profile pic, bio etc.)
---- postsByUser : [postID1, postID2]
---- followedBy : [userID2, ... ]
---- following : [userID2, ... ]
-- userID2
---- userdata (profile pic, bio etc.)
posts
-- postID1 (timestamp, so it's sortable)
---- contents
---- author : userID1
---- authorPic : authorPicUrl
---- authorPoints : 12345
---- taggedUsers : []
---- comments
------ comment1 : { copy of comment event }
---- likes : [userID1, userID2]
-- postID2 (timestamp)
---- contents
...
events
-- eventID1
---- type : comment
---- timestamp
---- byWhom : userID
---- toWhichPost : postID
---- contents : comment-text
-- eventID2
---- type : like
---- timestamp
---- byWhom : userID
---- toWhichPost : postID
For your user-bio page, you would query users.
For the news feed, you would query posts for all posts from the last day (or any given timespan) by the userIDs your user is following.
For the activity feed page (comments, likes, etc.), you would query events that are relevant to your userID, limited to the last day (or any given timespan).
Finally, query the preceding days for posts / events as the user scrolls (or if there's no new activity in the most recent days).
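As a rough illustration of the news feed query described above, here is a minimal in-memory sketch in Python. It is not Firestore code: the Post class and the news_feed function are hypothetical names, and a plain list stands in for the posts collection. It only shows the shape of the filter (author is followed, timestamp within the window, newest first).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Post:
    post_id: str
    author: str
    timestamp: datetime


def news_feed(posts, following, now, window=timedelta(days=1), limit=20):
    """Return posts by followed users inside the time window, newest first."""
    cutoff = now - window
    recent = [p for p in posts if p.author in following and cutoff <= p.timestamp <= now]
    recent.sort(key=lambda p: p.timestamp, reverse=True)
    return recent[:limit]


now = datetime(2024, 1, 10, 12, 0)
posts = [
    Post("postID1", "userID2", now - timedelta(hours=2)),
    Post("postID2", "userID3", now - timedelta(hours=1)),    # author not followed
    Post("postID3", "userID2", now - timedelta(days=3)),     # outside the window
    Post("postID4", "userID4", now - timedelta(minutes=30)),
]
feed = news_feed(posts, following={"userID2", "userID4"}, now=now)
print([p.post_id for p in feed])  # ['postID4', 'postID1']
```

Scrolling further back would then amount to calling the same function again with an earlier `now` (for example the cutoff of the previous page).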
Again, this is merely a quick thought. I know the elders of SOF have a habit of crucifying these, so forgive me, fellow members of SOF, if this answer has flaws :)
Hope it helps, Francisco.
Good luck!

Related

Database design: Send the ID or the Value in a relationship

This is a design question.
Imagine I have two tables.
|user|
------
|id|
|username|
|team_id|
|team|
------
|id|
|name|
So when receiving a POST /users, should I send
{
"username": "newUser",
"name": "myTeam" /
}
(the server gets the team id first, or uses includes if an ORM is used)
or
{
"username": "newUser",
"team_id": 1 // references the "myTeam"
}
(insert it directly and fail if the team_id doesn't exist)?
Which one is best, and why?
This is just an example with only one relationship; the user table could well have a lot of relationships.
It depends on which data is important and which isn't. If your front end wants to show the name of the team, then send the name of the team; if it just wants to show the id, then just send the id.
In my opinion, you should send the name because it will be much clearer to the user than just the id.
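For what it's worth, the two payload shapes from the question can be handled by the same endpoint. Below is a minimal Python sketch under stated assumptions: a plain dict stands in for the team table, and resolve_team_id, create_user and UnknownTeamError are hypothetical names, not part of any framework.

```python
class UnknownTeamError(ValueError):
    """Raised when the payload references a team that does not exist."""


def resolve_team_id(payload, teams):
    """Return the team id for a POST /users payload.

    `teams` maps team id -> team name (a stand-in for the team table).
    """
    if "team_id" in payload:
        # Option 2: the client sent the id; fail if it does not exist.
        team_id = payload["team_id"]
        if team_id not in teams:
            raise UnknownTeamError(f"no team with id {team_id!r}")
        return team_id
    # Option 1: the client sent the name; look the id up first.
    name = payload.get("name")
    matches = [team_id for team_id, team_name in teams.items() if team_name == name]
    if len(matches) != 1:
        raise UnknownTeamError(f"expected one team named {name!r}, found {len(matches)}")
    return matches[0]


def create_user(payload, teams):
    """Build the row that would be inserted into the user table."""
    return {"username": payload["username"], "team_id": resolve_team_id(payload, teams)}


teams = {1: "myTeam", 2: "otherTeam"}
by_id = create_user({"username": "newUser", "team_id": 1}, teams)
by_name = create_user({"username": "newUser", "name": "myTeam"}, teams)
print(by_id == by_name)  # True
```

Note that looking up by name only works if team names are unique, which is why the sketch rejects a name that matches zero or several teams.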

Parse Server, MongoDB - get "liked" state of an object

I am using Parse Server, which runs on MongoDB.
Let's say I have collections User and Comment and a join table of user and comment.
A user can like a comment, which creates a new record in the join table.
Specifically in Parse Server, a join table can be defined using a 'relation' field in the collection.
Now when I want to retrieve all comments, I also need to know whether each of them is liked by the current user. How can I do this without doing additional queries?
You might say I could create an array field likers in the Comment table and use $elemMatch, but it doesn't seem like a good idea, because there can potentially be thousands of likes on a comment.
My idea, though I hope there is a better solution:
I could create an array field someLikers, a relation (join table) field allLikers and a number field likesCount in the Comment table. Then I would put the first 100 likers in both someLikers and allLikers, and additional likers only in allLikers. I would always increment likesCount.
Then when querying a list of comments, I would implement the call with $elemMatch, which would tell me whether the current user is inside someLikers. Once I got the comments, I would check whether any of them have likesCount > 100 AND $elemMatch returned null. If so, I would have to run another query on the join table, looking for those comments and checking whether they are liked by the current user.
Is there a better option?
Thanks!
I'd advise against directly accessing MongoDB unless you absolutely have to; after all, the way collections and relations are built is an implementation detail of Parse and in theory could change in the future, breaking your code.
Even though you want to avoid multiple queries, I suggest doing just that (depending on your platform, you might even be able to run the two Parse queries in parallel):
The first one is the query on Comment for getting all comments you want to display; assuming you have some kind of Post for which comments can be written, the query would find all comments referencing the current post.
The second query is again on Comment, but this time
constrained to the comments retrieved in the first query, e.g.: containedIn("objectId", arrayOfCommentIDs)
and constrained to the comments having the current user in their likers relation, e.g.: equalTo("likers", currentUser)
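Once both queries have returned, combining them is a simple set lookup. Here is a minimal Python sketch of that merge step only; it does not call Parse. The function name mark_liked and the likedByCurrentUser key are made up for illustration, and plain dicts stand in for the objects returned by the two queries.

```python
def mark_liked(comments, liked_comments):
    """Annotate each comment with whether the current user liked it.

    `comments` is the result of the first query (all comments to display).
    `liked_comments` is the result of the second query (the subset of those
    comments that have the current user in their likers relation).
    """
    liked_ids = {c["objectId"] for c in liked_comments}
    return [{**c, "likedByCurrentUser": c["objectId"] in liked_ids} for c in comments]


# Stand-ins for the two query results.
all_comments = [
    {"objectId": "c1", "text": "first"},
    {"objectId": "c2", "text": "second"},
    {"objectId": "c3", "text": "third"},
]
liked_by_me = [{"objectId": "c2", "text": "second"}]

annotated = mark_liked(all_comments, liked_by_me)
print([c["likedByCurrentUser"] for c in annotated])  # [False, True, False]
```

The merge is linear in the number of comments displayed, regardless of how many likes each comment has.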
Well, a join collection is not really a NoSQL way of thinking ;-)
I don't know Parse Server, so the below is based on pure MongoDB.
What I would do is, in the Comment document, use an array of ObjectIds, one for each user who likes the comment.
Sample document layout
{
"_id" : ObjectId(""),
"name" : "Comment X",
"liked" : [
ObjectId(""),
....
]
}
Then use an aggregation to get the data. I assume you have the _id of the comment and you know the _id of the user.
The following aggregation returns the comment with a like count and a boolean which indicates whether the user liked the comment.
db.Comment.aggregate(
  [
    {
      $match: {
        _id : ObjectId("your commentId")
      }
    },
    {
      $project: {
        _id : 1,
        name : 1,
        number_of_likes : {$size : "$liked"},
        user_liked: {
          $gt: [{
            $size: {
              $filter: {
                input: "$liked",
                as: "like",
                cond: {
                  $eq: ["$$like", ObjectId("your userId")]
                }
              }
            }
          }, 0]
        },
      }
    },
  ]
);
This returns
{
"_id" : ObjectId(""),
"name" : "Comment X",
"number_of_likes" : NumberInt(7),
"user_liked" : true
}
Hope this is what you're after.
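Since the question is about a list of comments rather than a single one, here is a hedged Python sketch that builds a similar pipeline for several comment ids at once. It only constructs the pipeline as plain dicts; it does not connect to a database. Two things differ from the aggregation above and are my own substitutions: the $match uses $in over a list of ids, and user_liked uses the $in aggregation operator instead of $filter plus $size. The function name liked_state_pipeline is hypothetical, the ids are plain strings here (with a real driver they would be ObjectId values), and the sketch assumes every comment has a liked array.

```python
def liked_state_pipeline(comment_ids, user_id):
    """Build an aggregation pipeline returning like count and liked flag.

    Assumes each Comment document has a `liked` array of user ids.
    """
    return [
        # Restrict to the comments we want to display.
        {"$match": {"_id": {"$in": list(comment_ids)}}},
        {
            "$project": {
                "_id": 1,
                "name": 1,
                "number_of_likes": {"$size": "$liked"},
                # True when user_id appears in the liked array.
                "user_liked": {"$in": [user_id, "$liked"]},
            }
        },
    ]


pipeline = liked_state_pipeline(["commentId1", "commentId2"], "userId1")
print(len(pipeline))  # 2
```

With a driver such as PyMongo the list would be passed to the collection's aggregate method. Keep in mind this still carries the whole liked array inside each document, so it does not address the concern about thousands of likes per comment.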

Trigger Duplicate CSV

I am trying to upload a CSV file / insert a bulk of records using the import wizard. In short, I would like to keep the latest record in case duplicates are found. Duplicate records are identified by the combination of first name, last name and title.
For example if my CSV file looks like the following:
James,Wistler,34,New York,Married
James,Wistler,34,London,Married
....
....
James,Wistler,34,New York,Divorced
This should keep only the following in my org: James,Wistler,34,New York,Divorced
I have been trying to write a before insert / update trigger, but so far no success. Here is my trigger code (it is not finished yet: it only filters on first name, and I am having a problem deleting the duplicates found in my CSV). Any hints? Thanks for reading!
trigger CheckDuplicateInsert on Customer__c(before insert,before update){
    Map <String, Customer__c> customerFirstName = new Map<String,Customer__c>();
    list <Customer__c> CustomerList = Trigger.new;
    for (Customer__c newCustomer : CustomerList)
    {
        if ((newCustomer.First_Name__c != null) && System.Trigger.isInsert )
        {
            if (customerFirstName.containsKey(newCustomer.First_Name__c) )
                //remove the duplicate from the map
                customerFirstName.remove(newCustomer.First_Name__c);
            //end of the if clause
            // at this stage we don't have any duplicate, so let's add a new customer
            customerFirstName.put(newCustomer.First_Name__c , newCustomer);
        }
        else if ((System.Trigger.oldMap.get(newCustomer.id)!= null)&&newCustomer.First_Name__c !=System.Trigger.oldMap.get(newCustomer.id).First_Name__c )
        {//field is being updated, let's mark it with UPDATED for tracking
            newCustomer.First_Name__c=newCustomer.First_Name__c+'UPDATED';
            customerFirstName.put(newCustomer.First_Name__c , newCustomer);
        }
    }
    for (Customer__c customer : [SELECT First_Name__c FROM Customer__c WHERE First_Name__c IN :customerFirstName.KeySet()])
    {
        if (customer.First_Name__c!=null)
        {
            Customer__c newCustomer=customerFirstName.get(customer.First_Name__c);
            newCustomer.First_Name__c=Customer.First_Name__c+'EXIST_DB';
        }
    }
}
A purely non-SF solution would be to sort them and deduplicate in Excel, for example ;)
Good news - you don't need a trigger. Bad news - you might have to ditch the import wizard and start using Data Loader. The solution is pretty long and looks scary, but once you get the hang of it, it should start to make more sense and be easier to maintain in future than writing code.
You can download the Data Loader in setup area of your Production org and here's some basic info about the tool.
Anyway.
I'd make a new text field on your Contact, call it "unique key" or something and mark it as External Id. If you have never used ext. ids - Jeff Douglas has a good post about them.
You might have to populate the field on your existing data before proceeding. Easiest would be to export all Contacts where it's blank (from a report for example), fill it in with some Excel formulas and import back.
If you want, you can even write a workflow rule to handle the generation of the unique key. This might help you when Mrs. Jane Doe gets married and becomes Jane Bloggs, and it will also make the previous point easier (you'd just import Contacts without changes, just "touching" them, and the workflow will fire). Something like
condition: ISBLANK(Unique_key__c) || ISCHANGED(FirstName) || ISCHANGED(LastName) || ISCHANGED(Title)
new value: Title + FirstName + ' ' + LastName
Almost there. Fire up Data Loader and prepare an upsert job (because we want to insert some records and, when a duplicate is found, update it instead).
My only concern is what happens when what's effectively the same row appears more than once in one "batch" of records sent to SF, like in your example. Upsert will not know which value is valid (it's like setting x = 7; and x = 5; in the same save to the DB) and will decide to fail these rows. So you might have to tweak the number of records per batch in Data Loader's settings.
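One way around the same-batch concern is to collapse duplicates in the file before it ever reaches Data Loader. The following is a minimal Python sketch of that pre-processing step, not a Salesforce feature. The function name keep_latest is made up, and treating the first three columns as the duplicate key is an assumption (the question does not say which column is the title), so adjust key_columns to your file.

```python
import csv
import io


def keep_latest(rows, key_columns=(0, 1, 2)):
    """Keep only the last row seen for each key, in order of last appearance.

    `key_columns` are the column indexes that identify a duplicate
    (assumed here to be the first three columns).
    """
    latest = {}
    for row in rows:
        key = tuple(row[i].strip() for i in key_columns)
        latest.pop(key, None)  # drop the earlier occurrence so the newest wins
        latest[key] = row
    return list(latest.values())


raw = """James,Wistler,34,New York,Married
James,Wistler,34,London,Married
James,Wistler,34,New York,Divorced
"""
rows = list(csv.reader(io.StringIO(raw)))
deduped = keep_latest(rows)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(deduped)
print(out.getvalue(), end="")  # James,Wistler,34,New York,Divorced
```

With the file reduced to one row per key, the upsert no longer sees conflicting values for the same record inside a batch.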

Soccer and CouchDb (noob pining for sql and joins)

This has kept me awake until these wee hours.
I want a db to keep track of a soccer tournament.
Each match has two teams, home and away.
Each team can be the home or the away of many matches.
I've got one db, and two document types, "match" (contains: home: teamId and away: teamId) and team (contains: teamId, teamName, etc).
I've managed to write a working view, but it would imply adding to each team the id of every match it is involved in, which doesn't make much logical sense - it's such a hack.
Any idea on how this view should be written? I am nearly tempted to just throw in the sponge and use postgres instead.
EDIT: what I want is to have the team info for both the home and away teams, given the id of a match. Pretty easy to do with two calls, but I don't want to make two calls.
Just emit two values in the map function for each match, like this:
function (doc) {
  if (!doc.home || !doc.away) return;
  emit([doc._id, "home"], { _id: doc.home });
  emit([doc._id, "away"], { _id: doc.away });
}
After querying the view for the match id MATCHID with:
curl 'http://localhost:5984/yourdb/_design/yourpp/_view/yourview?startkey=\["MATCHID"\]&endkey=\["MATCHID",\{\}\]&include_docs=true'
you should be able to get both teams' documents in the doc fields of the result rows, possibly like below:
{"total_rows":2,"offset":0,"rows":[
{"id":"MATCHID","key":["MATCHID","away"],"value":{"_id":"second_team_id"},"doc":{...full doc of the second team...}},
{"id":"MATCHID","key":["MATCHID","home"],"value":{"_id":"first_team_id"},"doc":{...full doc of the first team...}}
]}
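If it helps to see why this works, here is a small in-memory simulation in Python. It is not CouchDB code: map_match mirrors the map function, query_view is a hypothetical stand-in for querying the view with the startkey/endkey range and include_docs=true, and the sample documents are invented. Because each emitted value contains an _id, the "doc" that comes back is the linked team document rather than the match itself. View rows are ordered by key, so in this sketch the away row sorts before the home row.

```python
def map_match(doc):
    """Mirror of the map function: emit one row per side of a match."""
    if not doc.get("home") or not doc.get("away"):
        return
    yield [doc["_id"], "home"], {"_id": doc["home"]}
    yield [doc["_id"], "away"], {"_id": doc["away"]}


def query_view(db, match_id):
    """Simulate the key range [match_id] .. [match_id, {}] with include_docs."""
    rows = []
    for doc in db.values():
        for key, value in map_match(doc):
            if key[0] == match_id:
                rows.append({
                    "id": doc["_id"],
                    "key": key,
                    "value": value,
                    # include_docs follows value._id to the linked document
                    "doc": db.get(value["_id"]),
                })
    rows.sort(key=lambda row: row["key"])
    return rows


# Invented sample data: one match and its two teams.
db = {
    "match1": {"_id": "match1", "type": "match", "home": "team_a", "away": "team_b"},
    "team_a": {"_id": "team_a", "type": "team", "teamName": "Lions"},
    "team_b": {"_id": "team_b", "type": "team", "teamName": "Tigers"},
}
rows = query_view(db, "match1")
print([(row["key"][1], row["doc"]["teamName"]) for row in rows])
# [('away', 'Tigers'), ('home', 'Lions')]
```

So a single request for the match id yields both team documents, which is what the EDIT in the question asks for.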
Check the CouchDB Book: http://guide.couchdb.org/editions/1/en/why.html
It's free and includes answers for beginners :)
If you like it, consider buying it. Even though Chris and Jan are so awesome that they put their book out there for free, you should still support the great work they did on it.

Simple search by value?

I would like to store some information as follows (note, I'm not wedded to this data structure at all, but this shows you the underlying information I want to store):
{ user_id: 12345, page_id: 2, country: 'DE' }
In these records, user_id is a unique field, but the page_id is not.
I would like to translate this into a Redis data structure, and I would like to be able to run efficient searches as follows:
For user_id 12345, find the related country.
For page_id 2, find all related user_ids and their countries.
Is it actually possible to do this in Redis? If so, what data structures should I use, and how should I avoid the possibility of duplicating records when I insert them?
It sounds like you need two key types: a HASH key to store your user's data, and a LIST for each page that contains a list of related users. Below is an example of how this could work.
Load Data:
> RPUSH page:2:users 12345
> HMSET user:12345 country DE key2 value2
Pull Data:
# All users for page 2
> LRANGE page:2:users 0 -1
# All users for page 2 and their countries
> SORT page:2:users BY nosort GET # GET user:*->country GET user:*->key2
Remove User From Page:
> LREM page:2:users 0 12345
Repeat GETs in the SORT to retrieve additional values for the user.
I hope this helps, let me know if there's anything you'd like clarified or if you need further assistance. I also recommend reading the commands list and documentation available at the redis web site, especially concerning the SORT operation.
Since user_id is unique, and so is its country, keep them in a simple key-value pair. Querying for a user is O(1) in such a case. Then keep some Redis sets, with the page_id as key and all the user_ids as members.
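To show how the two structures fit together, and how using sets avoids duplicate entries on insert, here is a small in-memory model in Python. It does not talk to Redis: the class UserPageIndex and its method names are hypothetical, with a dict per user standing in for the hash (HSET) and a Python set per page standing in for the Redis set (SADD / SREM / SMEMBERS).

```python
class UserPageIndex:
    """In-memory model of one hash per user plus one set per page."""

    def __init__(self):
        self.users = {}       # user_id -> {"page_id": ..., "country": ...}
        self.page_users = {}  # page_id -> set of user_ids

    def upsert(self, user_id, page_id, country):
        """Insert or update a record without creating duplicates."""
        previous = self.users.get(user_id)
        if previous is not None and previous["page_id"] != page_id:
            # The user moved pages: remove them from the old page's set.
            self.page_users[previous["page_id"]].discard(user_id)
        self.users[user_id] = {"page_id": page_id, "country": country}
        self.page_users.setdefault(page_id, set()).add(user_id)

    def country_of(self, user_id):
        """For a user_id, find the related country."""
        return self.users[user_id]["country"]

    def users_on_page(self, page_id):
        """For a page_id, find all related user_ids and their countries."""
        members = sorted(self.page_users.get(page_id, ()))
        return {user_id: self.users[user_id]["country"] for user_id in members}


index = UserPageIndex()
index.upsert(12345, 2, "DE")
index.upsert(67890, 2, "FR")
index.upsert(12345, 2, "DE")  # repeated insert, no duplicate is created

print(index.country_of(12345))  # DE
print(index.users_on_page(2))   # {12345: 'DE', 67890: 'FR'}
```

Storing the page_id in the user's record is an extra assumption of this sketch; it is what lets an update clean up the old page's set when a user changes page.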

Resources