Database architecture for a chat app that must handle a large number of users

I found that all applications have a single Messages collection, but it seems inefficient to search through every message in the web app each time a request comes in.
So I thought about making a collection for each person. Is this good practice?

Well, that would mean a couple of things:
If you had 100 users, you would have 100 collections. If you then grew to 1000 users, you would have to create an additional 900 collections. That is not practical, as you would have to keep creating new collections as the number of users increases.
You would have to somehow keep track of which collection belongs to which user. Most DBs have nothing like that out of the box, so you would have to write a program from scratch just to be able to delete, update, etc. the correct collections. This is not a small task. Your time is better spent developing the main functionality of your app.
DBs specialize in data lookup in collections. As long as your collection is properly indexed, you can put millions of messages in it and find the ones you need in almost no time at all.
And that is just the tip of the iceberg. As such, making a collection per user is not only bad practice but also very impractical, unfortunately.
Having said all that, I encourage you to keep thinking outside the box. Not all ideas will work out (like this one), but many great innovations have come from trying something new.
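To make the "one indexed collection" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for whatever database the app actually uses, and the table, column and function names (messages, recipient_id, sent_at, inbox) are made up for illustration. The same idea applies to a document store: one collection, one index on the fields you filter and sort by.

```python
import sqlite3

# In-memory database for illustration only; a real app would use its own store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL,
        recipient_id INTEGER NOT NULL,
        sent_at INTEGER NOT NULL,   -- Unix timestamp
        body TEXT NOT NULL
    )
    """
)
# A single index serves "messages for this recipient, newest first" for every user,
# so no per-user collection is needed.
conn.execute("CREATE INDEX idx_recipient_time ON messages (recipient_id, sent_at)")

# Made-up sample rows: (sender_id, recipient_id, sent_at, body).
conn.executemany(
    "INSERT INTO messages (sender_id, recipient_id, sent_at, body) VALUES (?, ?, ?, ?)",
    [
        (1, 2, 1000, "hi"),
        (2, 1, 1010, "hello"),
        (3, 2, 1020, "lunch?"),
        (1, 3, 1030, "ping"),
    ],
)


def inbox(conn, user_id, limit=50):
    """Return (sender_id, body) for the newest messages addressed to user_id."""
    cur = conn.execute(
        "SELECT sender_id, body FROM messages "
        "WHERE recipient_id = ? ORDER BY sent_at DESC LIMIT ?",
        (user_id, limit),
    )
    return cur.fetchall()
```

Adding the 1000th user is just more rows in the same table; nothing has to be created or tracked per user.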

Related

MongoDB: Single collection per user with all interactions, or multiple collections per topic?

Good evening.
I'm pretty new to MongoDB and I'm planning to make an app that will work with NoSQL (MongoDB).
The scope of the app is pretty simple:
Register a profile
Request an item from a shopper
Fulfill the request and send a payment notice.
If I were to build this with SQL, I would create a User table, a Request Item table and a Payment table.
In order to learn something, I would also like to build it with NoSQL, and I chose MongoDB.
I could create 3 collections, put each kind of document in its own collection, and run a search every time I need something.
OR, and this is the question, COULD I create a collection for EVERY user, and put every interaction of that user inside it?
So if I need to find User10's orders and payments, I would look only inside the User10 collection and search for every item he/she requested.
But on the other hand, how much would this affect me if I need to search all orders in a specific timeframe? It should be slower than SQL, I suppose.
Is this an acceptable way to do it? Are there drawbacks I have not yet seen, or is it discouraged in favor of another approach?
The backend would be written in Java, while the app (for... reasons) would be written in Xamarin.Forms.
While this is possible, I would personally recommend against it, as it is considered an anti-pattern; you should read this article about this very topic.
I would ask myself: what advantages am I hoping to gain from this approach? If quick queries at the user level are what you seek, this should not be a problem with sufficient indexes (on user_id and on timeframe).
There are other standard solutions built to deal with scale, like collection sharding. From my personal experience, MongoDB deals with scale very well. It sounds like this is a personal project to learn from, which probably means you'll never really reach hyper scale. The first barrier you'll probably encounter is hardware.
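As a rough illustration of the "indexes on user_id and on timeframe" suggestion, here is a sketch that again uses Python's sqlite3 module rather than MongoDB, so that it runs without a server. The table name interactions, its columns and the helper functions are all assumptions made up for this example. A compound index covers the per-user lookup, and a second index on the timestamp covers the "all orders in a timeframe" query that a collection-per-user layout would make painful.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " kind TEXT NOT NULL,"
    " created_at INTEGER NOT NULL)"
)
# Compound index: per-user queries, optionally narrowed to a time window.
conn.execute("CREATE INDEX idx_user_time ON interactions (user_id, created_at)")
# Plain time index: queries across all users in a time window.
conn.execute("CREATE INDEX idx_time ON interactions (created_at)")

# Made-up sample rows: (user_id, kind, created_at).
conn.executemany(
    "INSERT INTO interactions (user_id, kind, created_at) VALUES (?, ?, ?)",
    [(10, "order", 100), (10, "payment", 150), (11, "order", 120), (10, "order", 300)],
)


def user_interactions(conn, user_id, start, end):
    """Everything one user did between start and end (inclusive), oldest first."""
    return conn.execute(
        "SELECT kind, created_at FROM interactions "
        "WHERE user_id = ? AND created_at BETWEEN ? AND ? ORDER BY created_at",
        (user_id, start, end),
    ).fetchall()


def orders_in_window(conn, start, end):
    """All orders from all users between start and end (inclusive), oldest first."""
    return conn.execute(
        "SELECT user_id, created_at FROM interactions "
        "WHERE kind = 'order' AND created_at BETWEEN ? AND ? ORDER BY created_at",
        (start, end),
    ).fetchall()


def plan(conn, sql, params):
    """Return SQLite's query-plan detail strings, to check which index is used."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

Calling plan(...) on the per-user query is a quick way to confirm that the lookup goes through idx_user_time instead of scanning the whole table. MongoDB offers a comparable check through its explain facility.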

Building personalized feed on App Engine

I have been working on a social app. I'll first explain the problems, and then summarize in the questions below.
In the network, there would be channels, and users. Users can subscribe to these channels, and to other users. This way, we have two sources from which posts can be generated.
Now, we can simply keep one Activity model where we record all the actions, their kind, and what they affect. Be it from channels, or from the users. And refer these while creating a feed for each user.
I found a solution given in a talk by Brett Slatkin which basically suggests using ListProperty to link each post with each subscriber. But Guido suggests not using lists if there are going to be more than 1000 elements. So if there are going to be more than 1000 subscribers to a channel, this will probably run into problems. Even if this were to work --
I want to rank the posts based on popularity (number of votes and comments) and apply some time decay function, more like Reddit. To do so, I will have to keep the Activity in memory, and filter and order it by rank for each user. I'll also need to do it periodically, since new activities will keep occurring and old activities will gain or lose value.
The challenge is to keep the data in memory (for processing the feed as well as to keep things fast). I will have to store a copy of each user's feed in persistent storage, but if the order of posts is going to keep changing, how do I keep track of that in the database?
Also: I have kept my options open -- I will move to AWS if I have to.
To summarize:
Is there a better solution to keep track of subscribers without using Lists? Using something like PostID > SubscriberID in one entity would be very, very expensive and inefficient.
If there's any cost-effective and fast solution to the problem above, how do I deal with the next challenge -- which is to generate a personalized feed? (memory issues - unknown size of memcache)
If I can generate a personalized feed (which will be dynamic and keep changing), how do I keep it in the database?
I have gone through several articles and I can probably solve first two problems with AWS, but I am trying to stay away from the manual scaling work. If there is no way, I am willing to move to AWS. Even if I move to AWS, I can't think of a solution to the third problem.
Any thoughts, directions, resources would be helpful! Thanks!
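For reference, the ranking step described in the question (popularity from votes and comments, with a time decay) could be sketched roughly as below. This is not Reddit's actual algorithm; it is a simple "gravity" style decay, and the Post fields, the comment weight, the +2 hour offset and the gravity exponent are all arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    votes: int
    comments: int
    created_at: float  # Unix time in seconds


def score(post, now, gravity=1.8, comment_weight=2.0):
    """Popularity divided by a power of the post's age, so older posts sink."""
    age_hours = max(0.0, now - post.created_at) / 3600.0
    popularity = post.votes + comment_weight * post.comments
    # The +2 offset keeps brand-new posts from getting an unbounded score.
    return popularity / (age_hours + 2.0) ** gravity


def rank_feed(posts, now, limit=20):
    """Return the top posts for a feed, highest score first."""
    return sorted(posts, key=lambda p: score(p, now), reverse=True)[:limit]
```

Because the score depends only on the stored counts and the current time, one option is to persist just votes, comments and the creation time, and compute the order when the feed is read (or cache it briefly), instead of trying to store a constantly changing order.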

Strong consistency in Datastore (HRD)... my idea

I'm hoping that this isn't flagged as "not helpful" because I think that many people are attempting to figure out a way to keep strong consistency in the HRD.
Here is the idea I'm using for my app. I'd like to get your opinions.
I have a fitness app. This is of course made up of Workouts and Exercises.
The HRD contains about 400 exercises to pick from, or the User can create their own Exercise (a UExercise).
When the User logs in, I load all of the Workout keys into a "workoutKeys" List on the User. At the same time, I load all the User exercise keys (UExercise) into an "exerciseKeys" List, also on the User.
If the user wants to add/delete exercises from a specific workout, the Workout is loaded and all its Exercise keys are loaded into an "exerciseKeys" List on the Workout.
See a pattern here?
So whenever I want to view Exercises created by the user (UExercise), or the user's Workouts, or the Exercises in a Workout, I do a get() using those keys.
Since a user would probably not have 1000's of Workouts, or create 1000's of Exercises, I think this is a safe and quick way to achieve strong consistency.
Not saying that this is the best way for EVERY app. But for mine I believe it will work well.
I would greatly appreciate all of your input as to if there is something I may be missing here, or not properly taking into consideration.
Ok... After some careful consideration of how my app will work, and how users actually use it, I have decided to ditch the idea above and go with Ancestor Queries.
So for the above models, I have come up with the following...
For a Workout, I make the User the parent
For an Exercise created by a user (UExercise), I make the User the parent
This allows me to use Ancestor Queries (which are strongly consistent) to pull the most recently added or modified Entities.
Because the user will not be modifying these Entities en masse, I think the limitations on writes will not be a factor.
This also rids me of properties on Model objects that should not really be there in the first place.
By the way, I also tried Memcache. I found this to be the ultimate pain. Having to keep the Memcache and the Datastore in sync seemed to inject much more complexity than was really needed.
But your site and results may differ. This idea works well for my app.
Thanks!

Environmental database design

I've never designed a database before, but I've had experience programming in a few languages and assembler throughout college, as well as some web design, so I'm able to at least pick up what I need to know if I can be pointed in the right direction. One of the tasks of my job is to sort through some data that we've been collecting in the field, using a "sonde" which measures temperature, pH, conductivity, and other parameters. The device sits in a stream 24/7 (except for when we take it out and switch it with our other sonde every couple weeks, so that we can put in a newly calibrated one in the stream and retrieve the data from the one that was in the field). It collects data every 15 minutes or so, and has done so since 2007. Currently, all of our data is spread across multiple excel spreadsheets, and we have additional data from a weather station and another instrument that all gets compiled into quarterly documents. My goal is to design as simple of a database as possible with most of the functionality of a database like this: http://hudson.dl.stevens-tech.edu/hrecos/d/index.shtml. Ours would be significantly simpler as it is not live data (but would instead retrieve data from files that we upload once we'd finished handling the formatting and compilation of all our data). I would very much like the graphing ability on the site that the above database has, but I at least need to be able to select a range of data and select as many variables as I want within that time range and then be able to download a spreadsheet with the generated data (or at least a CSV file).
I realize this is a tough task, and as I have not designed a database before, I suspect it will be very much an uphill battle. However, if I am able to learn the things necessary to do this and make it web-accessible, that would be a huge accomplishment and would very much impress my boss. Any advice or tips to send me off in the right direction would be very much appreciated.
Thanks for your help!
There are actually 2 parts to the solution you're looking for:
The database, which will store your data in a single organized place, and
The application, which is the interface used by people to interact with the database.
Basically, a database by itself is just a container. You need some kind of application that accepts criteria from a user, pulls the data meeting those criteria from the database, and displays it to the user in a meaningful fashion; in this case, a graph or a spreadsheet.
Normally for web-based apps the database and the application are two separate components. However, for a small app with a fairly small number of users, and especially for someone just starting out, you may want to consider an all-in-one solution like InfoDome, sort of like MS Access for the web.
Either way, you're still going to need to learn about database design. There are many good tutorials out there; just do some searching. DatabaseAnswers.org has been useful for me. They have a set of tutorials as well as a large collection of sample database schemas.
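To give a feel for how the two parts fit together, here is a minimal sketch using only Python's standard library (sqlite3 for the database, csv for the export). The readings table, its column names and the sample values are invented for illustration; a real schema would follow whatever parameters the sonde actually records.

```python
import csv
import io
import sqlite3

# Hypothetical set of measured parameters the user may pick from.
ALLOWED_VARIABLES = ("temperature", "ph", "conductivity")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " measured_at TEXT PRIMARY KEY,"  # ISO 8601 text sorts chronologically
    " temperature REAL, ph REAL, conductivity REAL)"
)
# Made-up sample readings, 15 minutes apart.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("2007-06-01T00:00", 18.2, 7.1, 250.0),
        ("2007-06-01T00:15", 18.1, 7.2, 251.0),
        ("2007-06-01T00:30", 18.0, 7.2, 249.0),
    ],
)


def export_csv(conn, start, end, variables):
    """Return CSV text with the chosen variables for start <= measured_at <= end."""
    # Column names cannot be bound as SQL parameters, so check them against a whitelist.
    unknown = [v for v in variables if v not in ALLOWED_VARIABLES]
    if unknown:
        raise ValueError(f"unknown variables: {unknown}")
    columns = ", ".join(["measured_at", *variables])
    rows = conn.execute(
        f"SELECT {columns} FROM readings "
        "WHERE measured_at BETWEEN ? AND ? ORDER BY measured_at",
        (start, end),
    )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["measured_at", *variables])
    writer.writerows(rows)
    return out.getvalue()
```

A web front end would collect the date range and the list of variables from a form, call something like export_csv, and send the result back as a download. Graphing can be layered on top of the same query.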

App Engine - Import data

I'm unsure of a good way to import data from an old SQL-based application into App Engine (Bigtable). I'm very confused, though I'm sure I'm missing something simple.
The data is not just a simple spreadsheet. It consists of customers, appointments, and a few other things. They're all tied together by keys, which adds a little to the complexity.
I realize there is a bulk uploader, but that seemed to be meant for someone with administrative access, and I was hoping to come up with a solution that would work for a user.
It seems that uploading a file and importing from it would work, but there is a 30 second limit on processes, and adding a few thousand records would likely exceed it. Maybe I could use the task queue? I think this may allow processes that take more than 30 seconds, but then I think I'd have issues synchronizing with the development server?
It's not that I don't know how to do this at all; it's that I really have no clue which way will involve the least amount of headache.
From what I understand (and I am a beginner as well), App Engine uses 'denormalized' data. This means there are really no such things as 'joins'. There are some things that can be done to connect tables (property settings I believe) but I have no idea how they work for certain - I haven't tried.
I believe your only option would be to build scripts and rules to convert your SQL data to a denormalized state and then store that in App Engine. If you have to have two way sync, then this could get messy real quick!
See this article:
http://blog.notdot.net/2010/10/Modeling-relationships-in-App-Engine
or maybe this post
https://dba.stackexchange.com/questions/52/in-google-app-engine-what-is-the-most-effective-many-to-many-join-model
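As a rough sketch of the "convert to a denormalized state" step, combined with the idea of splitting the import into small pieces that each fit inside a request time limit: the code below is plain Python and does not call any App Engine API. The field names (customer_id, when, note) and both helper functions are assumptions made up for illustration; each batch would be handed to whatever write or task-queue call the app actually uses.

```python
from itertools import islice


def denormalize(customers, appointments):
    """Fold each customer's appointments into the customer record itself."""
    by_customer = {}
    for appt in appointments:
        by_customer.setdefault(appt["customer_id"], []).append(
            {"when": appt["when"], "note": appt["note"]}
        )
    for customer in customers:
        yield {
            "id": customer["id"],
            "name": customer["name"],
            # Customers without appointments get an empty list.
            "appointments": by_customer.get(customer["id"], []),
        }


def batches(items, size):
    """Yield lists of at most `size` items, so each one can be written separately."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

For example, `for batch in batches(denormalize(customers, appointments), 100): ...` would process the old data one hundred records at a time, which keeps each unit of work short.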

Resources