I am a self-taught developer, currently working on a finance app, and I would like to ask which way of storing my data in Firestore is more efficient.
In general terms: I have many users, each of whom has many transactions, and I query those transactions with a real-time stream. My question is how this data is best stored, and whether one structure also has advantages in terms of cost.
The structures I am considering for this data are as follows:
Which structure would you use, or is there even a better way to structure this data?
Thanks in advance for your answers😄
Since you are using Firestore, I assume you want to optimize reads and writes.
I would prefer the first one, because each user then carries its own transactions; with the other structure you make one read for the user and another for its transactions. You can do even better by keeping both: if you sometimes need only the transactions, you can read them straight from a transactions collection and avoid the additional queries needed to pull each transaction out from under the Users collection.
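The two structures were not included in the post itself, but going by the answers they appear to be a per-user subcollection versus a single top-level collection. A minimal sketch in TypeScript with the Firebase Web SDK, assuming those two layouts and invented field names:

```ts
import { initializeApp } from "firebase/app";
import {
  getFirestore, collection, query, where, orderBy, onSnapshot,
} from "firebase/firestore";

// Hypothetical project config, user id, and field names.
const app = initializeApp({ projectId: "my-finance-app" });
const db = getFirestore(app);
const someUserId = "user-123"; // placeholder

// Structure 1 (assumed): transactions as a subcollection of each user document.
//   users/{userId}/transactions/{transactionId}
const subcollectionQuery = query(
  collection(db, "users", someUserId, "transactions"),
  orderBy("createdAt", "desc")
);

// Structure 2 (assumed): one top-level collection with a userId field per transaction.
//   transactions/{transactionId} -> { userId, amount, createdAt, ... }
const topLevelQuery = query(
  collection(db, "transactions"),
  where("userId", "==", someUserId),
  orderBy("createdAt", "desc")
);

// Real-time stream. Regarding cost: in both layouts Firestore bills one document
// read per transaction returned, so the listener itself costs about the same.
const unsubscribe = onSnapshot(topLevelQuery, (snapshot) => {
  snapshot.docs.forEach((d) => console.log(d.id, d.data()));
});
```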
For simplicity, what I usually do is "STRUCTURE 2"; that is the layout I reach for in my own projects.
If you had to make provision for 80 million records (one for each page on the internet) and store the relationships between those records (potentially many billions of rows), which database would be best for this?
I started this project thinking we would only map a portion of the internet, but unfortunately it has gone far beyond the limits of MySQL. I need a better way to keep track of this data. The frontend is PHP, but I suppose the backend can be anything, as long as it can handle that amount of data?
I won't say there is one holy database for your needs; it might be better to split your database into logical parts so you can handle that amount of data more sensibly. You could also move some data out into the file system, since you won't need everything in the database all the time.
If you scan the interwebs, you probably want to save the HTML, CSS, or any other big data you crawl into your filesystem, while you keep the connections and everything meta-related in your database. But I suspect you are doing that already.
The best advice I can give here is to make sure the structure of your database fits your processes before you think about switching databases. If you really do need to switch (because MySQL will not give you more performance), there are MongoDB and/or WebScaleSQL; WebScaleSQL is reportedly what Facebook uses to handle the amount of data they have.
A big question is whether you could improve performance simply by improving your hardware. You should check that too, AFTER you have checked your structure and processes!
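A minimal sketch of the filesystem split suggested above, in TypeScript for Node.js; the directory layout and what you record in the database are assumptions, not something from the original post:

```ts
import { createHash } from "node:crypto";
import { mkdir, writeFile } from "node:fs/promises";
import * as path from "node:path";

// Big crawled payloads go to the filesystem; only metadata and link
// relationships go into the database.
async function storePage(url: string, html: string): Promise<string> {
  const hash = createHash("sha256").update(url).digest("hex");
  const dir = path.join("crawl-store", hash.slice(0, 2), hash.slice(2, 4));
  await mkdir(dir, { recursive: true });
  const filePath = path.join(dir, `${hash}.html`);
  await writeFile(filePath, html);
  // The database row would then hold: url, filePath, fetch time, outgoing links.
  return filePath;
}
```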
I am working on a group project and we are having a discussion about whether to calculate data from an existing database and store it in a new database to query later, or to calculate it from the existing database every time we need it. I was wondering what the pros and cons of either approach might be. Is there any advice you could give?
Edit: Here is a more elaborate explanation. We have a large database that has a lot of information submitted to it daily. We are building a system to track certain data points. For example, we want the count of how many times a user does something that is recorded in the database. Using this example (our actual idea is a bit more complex), we are weighing two methods of getting the count of actions per user. The first is to create a database that stores the users and their action counts, and query that database every time we need a count. The second is to query the large database and count the actions per user every time we need the number. I hope this helps explain things. Thoughts?
Edit 2: Two more things that may be useful to point out: 1) I only have read access to the large database, and 2) my ultimate goal is to display this information on a web page for end users.
This is a generic question about optimization by caching. The following was my answer to essentially the same question. Even though that question provided a bunch of different details, none of them were specific enough to merit a non-generic answer either:
The more you want to calculate at query time, the more you want views, calculated columns, and stored or user-defined routines. The more you want to calculate at normalized base update time, the more you want cascades and triggers. The more you want to calculate at some other (scheduled or ad hoc) time, the more you use snapshots, aka materialized views, and updated denormalized bases. You can combine these. Any time the database is accessed, that access can be enabled by and restricted by stored routines or another API.
Until you can show that they are inadequate, views and calculated columns are the simplest.
The whole idea of a DBMS is to store a representation of your application state as the database (whose redundancy normalization reduces) and then query and let the DBMS implement and optimize calculation of the answer. You haven't presented a reason for not doing that in the most straightforward way possible.
Always make sure an application is reading its own personal ("external") database that is a view of "the" ("conceptual") database, so that when you change the implementation of the former (plus the rest of some combined interface) by the latter (plus the rest of some combined mechanisms), your applications do not have to change ("logical independence"). Here the applications are your users' and your trackers'.
Ultimately you must instrument and guesstimate. When it is worth it, you start caching: preferably as much as possible in terms of high-level notions like views and snapshots, and as little as possible in non-DBMS code. One of the benefits of the relational model is that it is easy to describe a straightforward relational interface in terms of another straightforward relational interface. You protect your applications from change by offering an interface that hides the secrets of the implementation, or which of a family of interfaces is the current one.
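The answer above argues for doing this inside the DBMS (views, snapshots) rather than in application code. Purely as an application-level illustration of the query-time versus update-time trade-off the question is asking about, with invented types and names:

```ts
// Hypothetical event and summary shapes; the original post names neither.
interface ActionEvent { userId: string; action: string; at: Date; }

const events: ActionEvent[] = [];                 // stands in for the large read-only database
const actionCounts = new Map<string, number>();   // stands in for the derived summary store

// Method 2 from the question: calculate at query time by scanning the source data.
function countActionsOnDemand(userId: string): number {
  return events.filter((e) => e.userId === userId).length;
}

// Method 1 from the question: keep a denormalized count per user. With read-only
// access to the source database, this would run as a scheduled refresh job
// (a snapshot / materialized view kept outside the DBMS).
function refreshActionCounts(): void {
  actionCounts.clear();
  for (const e of events) {
    actionCounts.set(e.userId, (actionCounts.get(e.userId) ?? 0) + 1);
  }
}
```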
Good day!
I have 350GB of unstructured data spread across 50-80 columns.
I need to store this data in a NoSQL database and run a variety of selection and map/reduce queries filtered on about 40 of those columns.
I would like to use MongoDB, so my question is: can this database cope with this task, and what do I need in order to implement its architecture with my existing provider, hetzner.de?
Yes, large datasets are easy.
Perhaps Apache Hadoop is also worth looking at. It is aimed at handling/analyzing large/huge amounts of data.
MongoDB is a very scalable and flexible database if used properly. It can store as much data as you need, but the bottom line is whether you can query your data efficiently.
Comments:
You will need to make sure you have the proper indexes in place and that a fair amount of them can fit in RAM.
To achieve that, you may need to use sharding to split the working set.
The current mapReduce is easy to use and can iterate over all your data, but it is rather slow to process. It should become faster in the next MongoDB release, and there will also be a new aggregation framework to complement mapReduce.
The bottom line is that you should not take MongoDB as a magical store that will be perfect out of the box; make sure you read the good docs and materials :)
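A minimal sketch with the current Node.js MongoDB driver (which postdates this answer), assuming invented collection and field names, showing the two points above: index the fields you actually filter by, and use the aggregation framework for map/reduce-style rollups:

```ts
import { MongoClient } from "mongodb";

// Hypothetical connection string, collection, and field names.
const client = new MongoClient("mongodb://localhost:27017");

async function main(): Promise<void> {
  await client.connect();
  const records = client.db("warehouse").collection("records");

  // Indexes on the fields you actually filter by; they should largely fit in RAM.
  await records.createIndex({ region: 1, category: 1 });

  // The aggregation framework handles most map/reduce-style rollups and is much
  // faster than mapReduce for simple grouping like this.
  const totals = await records
    .aggregate([
      { $match: { region: "EU" } },
      { $group: { _id: "$category", count: { $sum: 1 } } },
    ])
    .toArray();

  console.log(totals);
  await client.close();
}

main().catch(console.error);
```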
I am designing an app that would involve users 'following' each other's activity, in the Twitter sense, but I am not very experienced with database/query design and efficiency. Are there best practices for managing this, pitfalls to avoid, etc.? I gather this can put a very large load on the database if not done properly (or maybe even then?).
If it makes a difference it is likely that people will 'follow' only a relatively small number of people (but a person may have many followers). However this is not certain, and I wouldn't want to count on it.
Any advice gratefully received. Thanks.
Pretty simple and easy to do with full normalisation. If you have a table of users, each with a unique ID, you would have a TABLE_FOLLOWERS table with the columns USERID and FOLLOWERID, which describes all the followers for each user as a one-to-many relationship.
Even with millions of associations, this will perform well and fast on a half-decent database server, as long as you are using a good database (i.e. not MS Access).
The model is fairly simple. The problem is the size of the Subscription table: if there are 1 million users and each subscribes to 1,000 others, the Subscription table has 1 billion rows.
That depends on how many users you expect to need to support; how many followers you expect users to have; and what sort of funding/development-effort you expect to have access to should your answers to the previous questions prove optimistic.
For a small-scale project I would likely ignore the database, design the application as a simple object model with User objects that maintain a List[followers], keep it all in RAM for normal operation, and use an ORM to persist it to a database periodically (probably PostgreSQL or MySQL).
For a larger project I would not be using a relational database at all; but exactly what I would use would depend on the specific details of the project.
If you are only trying to spike the concept, go with the ORM approach; but, keep in mind it won't scale.
You should probably read http://highscalability.com/ and its articles on how this is managed by the big sites.
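A minimal sketch of the small-scale object-model approach described above, in TypeScript; all class and table names here are hypothetical, since the answer names none:

```ts
// Keep the follow graph in memory as plain objects and persist it periodically.
class User {
  readonly followers = new Set<string>();   // IDs of users following this user
  readonly following = new Set<string>();   // IDs of users this user follows
  constructor(readonly id: string) {}
}

const users = new Map<string, User>();

function follow(followerId: string, followeeId: string): void {
  const follower = users.get(followerId);
  const followee = users.get(followeeId);
  if (!follower || !followee) throw new Error("unknown user");
  follower.following.add(followeeId);
  followee.followers.add(followerId);
}

// A periodic job would flatten these sets into a FOLLOWERS(USERID, FOLLOWERID)
// table, as in the normalized design above, and hand it to the ORM to persist.
```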
I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:
CRUD of entities with some fields that have a unique index on them
an ecommerce web app like eBay (better description here).
And I'm beginning to think that a document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a document-based database (maybe my imagination is too limited).
Can you explain to me whether I am asking pears from an elm when I try to use a document-oriented database for these requirements?
You need to think about how to approach the application in a document-oriented way. If you simply try to replicate how you would model the problem in an RDBMS, you will fail. There are also different trade-offs that you might want to make. Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?
One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?
Another angle to think about is eventual consistency, where you will get into a consistent state eventually, but you may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is transferring money between bank accounts. How does this actually happen in the real world: through a single atomic transaction, or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
So let's look at your examples:
CRUD of entities with some fields that have a unique index on them.
If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.
So we need to look at the real-world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is a unique sequential identifier, which is tough to solve in a replicated environment. In fact, if the unique id is required to be strictly sequential with respect to creation time, it is impossible if you need the id straight away. You need to relax at least one of those constraints.
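One way to answer the "can you do that deterministically?" question is to fold the value that must be unique into the document _id, which CouchDB keeps unique within a single database (though, as the answer says, not across replicas). A sketch using the nano client, with hypothetical names and URL:

```ts
import nano from "nano";

const couch = nano("http://localhost:5984");
const users = couch.db.use("users");

async function createUser(email: string, name: string): Promise<void> {
  try {
    // The email becomes part of the document id, so a second create with the
    // same email is rejected with a 409 conflict on this node.
    await users.insert({ _id: `user:${email.toLowerCase()}`, email, name });
  } catch (err: any) {
    if (err.statusCode === 409) throw new Error(`email already taken: ${email}`);
    throw err;
  }
}

// Caveat from the answer above: separate replicas can still create the same _id
// independently, so replication can surface conflicts that must be resolved later.
```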
an ecommerce web app like eBay
I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.
Is there a need to normalize the data?
Yes: Use relational.
No: Use document.
I am in the same boat; I am loving CouchDB at the moment, and I think the whole functional style is great. But when exactly do we start to use it in earnest for applications? I mean, yes, we can all start developing applications extremely quickly and cruft-free, with all those nasty hang-ups about normal form left by the wayside and no schemas to worry about. But, to coin a phrase, "we are standing on the shoulders of giants". There are good reasons to use an RDBMS, to normalise, and to use schemas. My old Oracle head is reeling at the thought of data without form.
My main wow factor with CouchDB is the replication and the versioning system working in tandem.
I have been racking my brain for the last month trying to grok the storage mechanisms of CouchDB; apparently it uses B-trees, but it doesn't store data based on normal form. Does this mean it is really, really smart and realises that bits of data are replicated, so it just makes a pointer to that B-tree entry?
So far I am thinking of XML documents, config files, and resource files streamed to base64 strings.
But would I use CouchDB for structured data? I don't know; any help on this is greatly appreciated.
It might be useful for storing RDF data or even free-form text.
A possibility is to have a main relational database that stores definitions of items that can be retrieved by their IDs, and a document database for the descriptions and/or specifications of those items. For example, you could have a relational database with a Products table with the following fields:
ProductID
Description
UnitPrice
LotSize
Specifications
And that Specifications field would actually contain a reference to a document with the technical specifications of the product. This way, you have the best of both worlds.
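A rough sketch of how that hybrid might look from application code, with illustrative type and field names (the document-store client is left abstract, since the answer does not name one):

```ts
// The row lives in the relational database; Specifications holds only a document
// ID, resolved against the document database when the full spec is needed.
interface ProductRow {
  productId: number;
  description: string;
  unitPrice: number;
  lotSize: number;
  specificationsDocId: string;   // reference into the document database
}

interface SpecificationsDoc {
  _id: string;
  attributes: Record<string, string | number>;  // free-form technical specs
}

async function loadProductWithSpecs(
  row: ProductRow,
  docStore: { get(id: string): Promise<SpecificationsDoc> }
): Promise<{ row: ProductRow; specs: SpecificationsDoc }> {
  const specs = await docStore.get(row.specificationsDocId);
  return { row, specs };
}
```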
Document-based DBs are best suited for storing, well, documents. Lotus Notes is a common implementation, and Notes email is an example. For what you are describing (eCommerce, CRUD, etc.), relational DBs are better designed for the storage and retrieval of data items/elements that are indexed (as opposed to documents).
Re CRUD: the whole REST paradigm maps directly to CRUD (or vice versa). So if you know you can model your requirements with resources (identifiable via URIs) and a basic set of operations (namely CRUD), you may be very near to a REST-based system, which quite a few document-oriented systems provide out of the box.
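As one illustration, CouchDB's own HTTP API already exposes documents this way. A small sketch using fetch, with a hypothetical database name:

```ts
const base = "http://localhost:5984/products";  // {server}/{database}

// Create or Update -> PUT (updating an existing doc also needs its current _rev
// inside the JSON body).
async function createOrUpdate(id: string, doc: object): Promise<Response> {
  return fetch(`${base}/${id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  });
}

// Read -> GET
async function read(id: string): Promise<any> {
  const res = await fetch(`${base}/${id}`);
  return res.json();
}

// Delete -> DELETE, which requires the current revision as a query parameter.
async function remove(id: string, rev: string): Promise<Response> {
  return fetch(`${base}/${id}?rev=${rev}`, { method: "DELETE" });
}
```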