Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am building a social network app and I am concerned about the following. What happens in MongoDB when a lot of users (let's assume millions) try to modify the same document at the same time? Will there be any mismatches, ignored queries, or any other kind of unexpected behaviour?
Practical Example:
2 collections: 'posts' and 'likes'
posts will have fields id | name | info | numberOfLikes
likes will have fields id | post | fromUser
Assume millions of users like the post: for each like, a like object is inserted into the 'likes' collection and business logic automatically increments numberOfLikes for the post. I wonder whether there could be a conflict when tons of users try to modify that post's like count at the same time.
Databases have mechanisms in place to prevent this kind of situation. You can 'lock' on various logical structures, so you can be assured your data is intact - regardless of your transaction count.
See more below:
http://docs.mongodb.org/manual/faq/concurrency/
In MongoDB, operations are atomic at the document level.
See http://docs.mongodb.org/manual/core/data-modeling-introduction/#atomicity-of-write-operations
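As a minimal sketch of what document-level atomicity means for the like counter in the question: an update with the `$inc` operator is applied by the server as one atomic operation on the post document, so concurrent likes do not overwrite each other the way a read-modify-write in application code could. The function name `like_post` is hypothetical, `posts` and `likes` are assumed to be PyMongo `Collection` objects (for example `db["posts"]`), and the field names follow the question.

```python
def like_post(posts, likes, post_id, user_id):
    """Record a like and bump the counter on the post.

    `posts` and `likes` are assumed to be pymongo Collection objects.
    """
    # One document per like, using the field names from the question.
    likes.insert_one({"post": post_id, "fromUser": user_id})
    # $inc is applied atomically on the server to this single document,
    # so the application never reads the old count and writes it back.
    return posts.update_one({"_id": post_id}, {"$inc": {"numberOfLikes": 1}})
```

Note that the insert and the increment are still two separate operations; each is atomic only on its own document.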
A couple of things.
You're saying you're building a social app and expect millions of likes and "tons" of them at the same time. Of course it's good to consider performance and scaling at the start of a project, but you're not going to build the next Facebook right now.
Furthermore, you seem to want to use MongoDB as the primary database for this app, and you seem to want to use it as a relational database. Read the article with the somewhat biased title, Why You Should Never Use MongoDB.
I'd suggest backing your site with a relational database (which may also be better for queries like "What posts did this user like" and "Did this user already like this post") and denormalizing that into MongoDB at regular intervals.
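To illustrate why those two queries are easy in a relational schema, here is a rough sketch using SQLite from Python's standard library. SQLite is only a stand-in for whatever relational database you pick, and the table layout and function names (`like`, `has_liked`, `posts_liked_by`, `like_count`) are assumptions made for the example.

```python
import sqlite3

likes_db = sqlite3.connect(":memory:")
likes_db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE likes (
        post      INTEGER NOT NULL REFERENCES posts(id),
        from_user INTEGER NOT NULL,
        PRIMARY KEY (post, from_user)  -- a user can like a post only once
    );
""")
likes_db.execute("INSERT INTO posts (id, name) VALUES (1, 'first post'), (2, 'second post')")

def like(post_id, user_id):
    """Insert a like; a repeated like by the same user is ignored."""
    with likes_db:  # commits on success, rolls back on error
        likes_db.execute(
            "INSERT OR IGNORE INTO likes (post, from_user) VALUES (?, ?)",
            (post_id, user_id),
        )

def has_liked(post_id, user_id):
    """Did this user already like this post?"""
    row = likes_db.execute(
        "SELECT 1 FROM likes WHERE post = ? AND from_user = ?", (post_id, user_id)
    ).fetchone()
    return row is not None

def posts_liked_by(user_id):
    """What posts did this user like?"""
    rows = likes_db.execute(
        "SELECT post FROM likes WHERE from_user = ? ORDER BY post", (user_id,)
    )
    return [r[0] for r in rows]

def like_count(post_id):
    """Derive the count from the likes table instead of storing it."""
    return likes_db.execute(
        "SELECT COUNT(*) FROM likes WHERE post = ?", (post_id,)
    ).fetchone()[0]

like(1, 42)
like(1, 42)  # duplicate, ignored by the primary key
like(2, 42)
like(1, 7)
```

The composite primary key is what prevents double likes, and the count can then be denormalized into the document store at whatever interval suits you.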
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Firebase says the Spark plan supports only 100k simultaneous users. It also states that this is per database. What does that mean? How can I store data in multiple databases and connect them to each other? It also states 1 GB of data stored. Approximately how much is that? Say one user's data has 10 children; how many users' data can be stored in that space? Someone please help me out, as Google isn't very clear about it.
I'm going to assume you're talking about Realtime Databases and not Cloud Firestore.
The Firebase Spark ("Free") plan includes 100 simultaneous users, not 100k. (100k+ users are supported on the Flame and Blaze plans.)
You can store 1 GB of data in the Realtime Database and download 10 GB per month. This plan supports only one database per project, so connecting multiple databases isn't possible.
It's hard to determine how much "storage" that would take up, due to varying factors. But, a good rule of thumb is that most JSON data doesn't take up a lot of space so you should be good.
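For a back-of-the-envelope figure, you can serialize a representative record to JSON and divide the quota by its size. This is only a rough sketch: the sample record (10 children of 20 characters each), the 20 bytes allowed for the user's key, and treating 1 GB as 10^9 bytes are all assumptions, and Firebase's own accounting may differ.

```python
import json

# Hypothetical user record with 10 child fields; real data will differ.
sample_user = {f"field{i}": "x" * 20 for i in range(10)}

def estimate_user_capacity(sample_record, quota_bytes=10**9, key_bytes=20):
    """Rough number of records that fit in the quota, by serialized JSON size."""
    payload = json.dumps(sample_record, separators=(",", ":")).encode("utf-8")
    record_bytes = len(payload) + key_bytes  # key_bytes: assumed size of the user's key
    return quota_bytes // record_bytes, record_bytes

capacity, record_bytes = estimate_user_capacity(sample_user)
print(f"~{record_bytes} bytes per user, roughly {capacity:,} users in 1 GB")
```

Swap in a record that looks like your real data to get a more meaningful estimate.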
To clarify, simultaneous users is the number of users that can be connected to a single database at the same time, via any interface or platform.
There's great documentation on the features and pricing of Firebase here, and I would also recommend reading some of their documentation on Realtime Databases.
I hope this helps, if you need any more help please let me know.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
We have an application to manage companies, teams, branches, employees, etc., with different tables for each. We now have a requirement to give our technology partners access to the same system so that they can do the same things we do. At the same time, we need to supervise these partners in our system.
So in terms of DB schema, what is the best way to manage them?
1) Duplicate the entire schema for partners; for that we would have to duplicate around 50-60 tables, and more in the future as the system grows.
2) Add a flag to each table that indicates whether a row is an internal or external entity.
Please suggest if anyone has any experience.
Consider the following points before finalizing any of the approaches.
Do you want a holistic view of the data?
By this I mean: do you want to view the data your partner creates and the data you create in a single report / form? If the answer is yes, then it makes sense to store the data in the same set of tables and differentiate the rows based on some set of columns.
Is your application functionality going to vary significantly?
If the answer to this question is no, then it makes sense to keep the data in the same set of tables. This way any change you make to your system is automatically available to all users, and you won't have to replicate your code across schemas / databases.
Are you and your partner going to use the same master / reference data?
If the answer to this question is yes, then again it makes sense to use the same set of tables, since you avoid unnecessary redundant data.
Implementation
Rather than creating a flag, I would recommend creating a master table called user_master. The key of this table should be present in every transaction table. This way, if you want to include a second partner down the line, you can make a new entry in your user_master table and make the necessary modifications to your application code. Your application code should manage the security. Needless to say, you should also implement as much security as possible at the database level.
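A minimal sketch of that layout, using SQLite from Python only because it runs anywhere. The `employee` table, the `owner_id` and `kind` columns, and the helper functions are hypothetical names for illustration; only `user_master` comes from the suggestion above.

```python
import sqlite3

tenant_db = sqlite3.connect(":memory:")
tenant_db.executescript("""
    CREATE TABLE user_master (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL CHECK (kind IN ('internal', 'partner'))
    );
    -- Every transaction table carries the user_master key.
    CREATE TABLE employee (
        id        INTEGER PRIMARY KEY,
        owner_id  INTEGER NOT NULL REFERENCES user_master(id),
        full_name TEXT NOT NULL
    );
    CREATE INDEX idx_employee_owner ON employee(owner_id);
""")
tenant_db.executemany(
    "INSERT INTO user_master (id, name, kind) VALUES (?, ?, ?)",
    [(1, "Our company", "internal"), (2, "Partner A", "partner")],
)
tenant_db.executemany(
    "INSERT INTO employee (owner_id, full_name) VALUES (?, ?)",
    [(1, "Alice"), (1, "Bob"), (2, "Carol")],
)

def employees_for(owner_id):
    """What a single owner may see: only rows carrying its own key."""
    rows = tenant_db.execute(
        "SELECT full_name FROM employee WHERE owner_id = ? ORDER BY full_name",
        (owner_id,),
    )
    return [r[0] for r in rows]

def employees_for_supervisor():
    """Holistic view across all owners for the supervising company."""
    rows = tenant_db.execute("""
        SELECT m.name, e.full_name
        FROM employee e JOIN user_master m ON m.id = e.owner_id
        ORDER BY m.id, e.full_name
    """)
    return rows.fetchall()
```

Adding a second partner is then one more row in `user_master`, and every partner-facing query filters on the owner key.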
Other Suggestions
To physically separate the data of these entities, you can implement either partitioning or sharding, depending on the DB you are using.
Perform thorough regression testing and check that your data is not visible in partner reports or forms. Also check that the partner is not able to update or insert your data.
Since the data in your system will increase significantly, it would make sense to performance test your reports, forms and programs.
If you are using indexes, you will need to revisit them, since your WHERE conditions will change.
Also, revisit your keys and relationships.
Neither of the approaches you suggest is advisable. Follow the guidelines below to secure your whole system and audit your technology partner as well.
[1] Create a module on the admin side that shows you the existing tables as well as tables added in the future.
[2] Create a user for your technology partner and grant it permissions on those objects.
[3] Keep an audit-trail table and insert an entry with the user name, IP, etc. for each action, so that you have complete tracking of the activity carried out by your technology partner.
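Point [3] could look roughly like the sketch below. The column names and the `record_action` / `actions_by` helpers are assumptions for illustration, and SQLite stands in for your actual database; in a real system the insert would happen wherever partner requests are handled.

```python
import sqlite3
from datetime import datetime, timezone

audit_db = sqlite3.connect(":memory:")
audit_db.execute("""
    CREATE TABLE audit_trail (
        id         INTEGER PRIMARY KEY,
        at         TEXT NOT NULL,  -- UTC timestamp, ISO 8601
        user_name  TEXT NOT NULL,
        ip         TEXT NOT NULL,
        action     TEXT NOT NULL,  -- e.g. INSERT / UPDATE / DELETE
        table_name TEXT NOT NULL
    )
""")

def record_action(user_name, ip, action, table_name):
    """Append one audit entry; entries are never updated or deleted here."""
    with audit_db:
        audit_db.execute(
            "INSERT INTO audit_trail (at, user_name, ip, action, table_name) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), user_name, ip, action, table_name),
        )

def actions_by(user_name):
    """Everything a given user did, oldest first."""
    rows = audit_db.execute(
        "SELECT action, table_name FROM audit_trail WHERE user_name = ? ORDER BY id",
        (user_name,),
    )
    return rows.fetchall()

record_action("partner_a", "203.0.113.7", "UPDATE", "employee")
record_action("partner_a", "203.0.113.7", "INSERT", "branch")
```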
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I have around 10 tables containing millions of rows. I want to archive 40% of the data because of size and performance problems.
What would be the best way to archive the old data while the web application keeps running? In the near future I may also need to show the old data along with the existing data.
Thanks in advance.
There is no single solution that fits every case. It depends heavily on your data structure and application requirements. The most common cases seem to be as follows:
If your application can't be redesigned and instant access is required to all your data, you need a more powerful hardware/software solution.
If your application can't be redesigned but some of your data can be considered obsolete because it is requested relatively rarely, you can split the data and configure two applications to access the different data sets.
If your application can't be redesigned but some of your data can be considered insensitive and can be minimized (consolidated, packed, etc.), you can perform some data transformation while keeping the full data in another place for special requests.
If it's possible to redesign your application, there are many ways to solve the problem. In general you will implement some kind of archive subsystem, which is a complex problem, especially if not only your data but also your data structure changes over time.
If it's possible to redesign your application, you can optimize your data structure using new supporting tables, indexes and other database objects and algorithms.
Create an archive database, if possible on a separate archive server: this data is rarely needed but must still be kept for future purposes, so moving it reduces load and space on the live server.
Move all of the old table data to that location. Later you can retrieve it in a number of ways:
Changing the application's database path to point at the archive
or updating the live table from the archive table
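A small sketch of the move itself, assuming a hypothetical `orders` table with a `created` date column. A second in-memory SQLite database attached as `archive` stands in for the separate archive server; with a real server you would copy across connections instead.

```python
import sqlite3

live = sqlite3.connect(":memory:")
# A second in-memory database stands in for the separate archive server.
live.execute("ATTACH DATABASE ':memory:' AS archive")
live.executescript("""
    CREATE TABLE main.orders    (id INTEGER PRIMARY KEY, created TEXT NOT NULL, total REAL);
    CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, created TEXT NOT NULL, total REAL);
""")
live.executemany(
    "INSERT INTO main.orders (id, created, total) VALUES (?, ?, ?)",
    [(1, "2019-03-01", 10.0), (2, "2021-06-15", 25.0), (3, "2023-01-20", 40.0)],
)

def archive_before(cutoff):
    """Copy rows older than `cutoff` to the archive, then delete them from live."""
    with live:  # copy and delete are committed together
        live.execute(
            "INSERT INTO archive.orders SELECT * FROM main.orders WHERE created < ?",
            (cutoff,),
        )
        cur = live.execute("DELETE FROM main.orders WHERE created < ?", (cutoff,))
    return cur.rowcount

def all_order_ids():
    """Show old data along with existing data when it is needed again."""
    rows = live.execute(
        "SELECT id FROM main.orders UNION ALL SELECT id FROM archive.orders ORDER BY id"
    )
    return [r[0] for r in rows]

moved = archive_before("2022-01-01")
```

On tables with millions of rows you would normally run the move in batches so the web application is not blocked for long.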
Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I am looking for a non-SQL database.
My requirements are as follows:
Should be able to store >10 billion records.
Should consume at most 1 GB of memory.
User requests should take less than 10 ms (including processing time).
Java based would be great (I need to access it from Java, and I may need to modify the database code at some point).
The database will hold e-commerce search records such as number of searches, sales, product bucket, product filters and many more. The database is currently a flat file, and I show some specific data to users. I configure the data to be shown in advance, and users can then send HTTP requests to view data according to that configuration. I want to make things more dynamic so that people can view data without prior configuration.
In other words, I want to build a fast analyzer that can show users whatever they request.
The best place to find names of non-relational databases is the NoSQL site. Their home page has a pretty comprehensive list, split into various categories: Wide Column Store, Key-value Pair, Object, XML, etc. Find out more.
You don't really give enough information about your requirements. But it sounds like kdb+ meets all of the requirements that you've stated. But only if you want to get to grips with the rather exotic (and very powerful) Q language.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
What should the data model for a workflow application be? Currently we are using an Entity-Attribute-Value (EAV) based model in SQL Server 2000, with users able to create dynamic forms (in ASP.NET). As the data grows, performance is degrading, reports are hard to generate, and it gets worse when too many users query the data concurrently.
As you have probably realized, the problem with an EAV model is that tables grow very large and queries grow very complex very quickly. For example, EAV-based queries typically require lots of subqueries just to get at the same data that would be trivial to select if you were using more traditionally-structured tables.
Unfortunately, it is quite difficult to move to a traditionally-structured relational model while simultaneously leaving old forms open to modification.
Thus, my suggestion: consider closing changes on well-established forms and moving their data to standard, normalized tables. For example, if you have a set of shipping forms that are not likely to change (or whose change you could manage by changing the app because it happens so rarely), then you could create a fixed table and then copy the existing data out of your EAV table(s). This would A) improve your ability to do reporting, B) reduce the amount of data in your existing EAV table(s) and C) improve your ability to support concurrent users / improve performance because you could build more appropriate indices into your data.
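Here is a rough sketch of that copy-out step. It uses SQLite from Python rather than SQL Server 2000 so that it runs as-is, and the table and column names (`form_values`, `shipping_form`, `recipient`, `city`, `weight_kg`) are made up for the example; the pivot with `MAX(CASE ...)` per attribute is plain SQL that should carry over.

```python
import sqlite3

eav_db = sqlite3.connect(":memory:")
eav_db.executescript("""
    -- The existing EAV storage: one row per attribute value.
    CREATE TABLE form_values (form_type TEXT, entity_id INTEGER, attribute TEXT, value TEXT);
    -- The new fixed table for a form that no longer changes.
    CREATE TABLE shipping_form (
        entity_id INTEGER PRIMARY KEY,
        recipient TEXT,
        city      TEXT,
        weight_kg REAL
    );
""")
eav_db.executemany(
    "INSERT INTO form_values VALUES (?, ?, ?, ?)",
    [
        ("shipping", 1, "recipient", "Alice"),
        ("shipping", 1, "city", "Oslo"),
        ("shipping", 1, "weight_kg", "2.5"),
        ("shipping", 2, "recipient", "Bob"),
        ("shipping", 2, "city", "Lima"),
        ("shipping", 2, "weight_kg", "7"),
        ("personnel", 3, "name", "Carol"),
    ],
)

def migrate_shipping_forms():
    """Pivot shipping rows out of the EAV table into the fixed table."""
    with eav_db:  # copy and cleanup are committed together
        eav_db.execute("""
            INSERT INTO shipping_form (entity_id, recipient, city, weight_kg)
            SELECT entity_id,
                   MAX(CASE WHEN attribute = 'recipient' THEN value END),
                   MAX(CASE WHEN attribute = 'city'      THEN value END),
                   CAST(MAX(CASE WHEN attribute = 'weight_kg' THEN value END) AS REAL)
            FROM form_values
            WHERE form_type = 'shipping'
            GROUP BY entity_id
        """)
        # Shrink the EAV table once the data lives in the fixed table.
        eav_db.execute("DELETE FROM form_values WHERE form_type = 'shipping'")

migrate_shipping_forms()
```

After the migration, a report on shipping forms is a plain `SELECT` on `shipping_form`, and the columns can be indexed like any other table.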
In short, think of the dynamic EAV-based system as a way to collect users' needs (they tell you by building their forms) and NOT as the permanent storage. As the forms evolve into their final form, you transition to fixed tables in order to gain the benefits discussed above.
One last thing. If all of this isn't possible, have you considered segmenting your EAV table into multiple, category-specific tables? For example, have all of your shipping forms in one table, personnel forms in a second, etc. It won't solve the querying structure problem (needing subqueries) but it will help shrink your tables and improve performance.
I hope this helps - I do sympathize with your plight as I've been in a similar situation myself!
Typically, when your database schema becomes very large and multiple users are trying to access the same information in many different ways, data warehousing is applied in order to reduce the load on the database server. Unlike a traditional schema, where you are most likely using normalization to preserve data integrity, a data warehouse is optimized for query speed and stores multiple copies of your data.
Try using the relational model of data. It works.