How can I know where reads and writes happen in a Firebase database

I am a new Firebase subscriber, and I am seeing unexpected reads in my database. I'd like to know where the reads are happening, specifically in which document or collection, so I can identify the problem.
Is there a way to access this information?

Related

How do real-time collaborative applications save their data?

I have previously built some very basic real-time applications with the help of sockets, and have been reading more about the topic out of curiosity. One very interesting article I read was about Operational Transformation, and I learned several new things from it. After reading it, I kept wondering when and how this data actually gets saved to the database if I want to keep it. I have two assumptions/theories about what might be going on, but I'm not sure whether they are correct and/or the best solutions to this issue. They are as follows:
(For this example, let's assume it's a real-time collaborative whiteboard.)
For every edit that happens (e.g. drawing a line), the socket will send a message to everyone collaborating, but at the same time I will store the data in my database. The problem I see with this solution is the number of times I would need to access the database: for every line a user draws, I would have to hit the database to store it.
Use polling. For this theory, I think of saving all data in temporary storage on the server, and then, after 'x' amount of time, fetching everything from the temporary storage and saving it in the database. The issue with this theory is the possibility of a failure in the temporary storage (e.g. a power failure). If the temporary storage loses its data before it is saved in the database, I would never be able to recover it.
How do real-time collaborative applications like Google Docs, Slides, etc. store the data in their databases? Do they follow one of the theories I mentioned, or do they have a completely different way of storing the data?
They probably rely on a log of changes + the latest document version + periodic snapshots (if they allow time-traveling through the document history).
It is similar to how most databases' transaction systems work. After validating that the change is legitimate, the database writes the change to a very fast data structure on disk, aka the log, which only appends the changed values. This log is mirrored in memory in a dedicated data structure to speed up reads.
When a read comes in, the database checks the in-memory data structure and merges the changes with what is stored in the cache or on disk.
Periodically, the changes that are present in memory and in the log are merged into the on-disk data structure.
So to summarize, in your case:
When an Operational Transformation arrives at the server, two things happen:
It is stored in the database as-is, to avoid any loss (the equivalent of the log).
It updates an in-memory data structure so the change can be replayed quickly when a user requests the latest version (the equivalent of the in-memory data structure).
When a user requests the latest document, the server checks the in-memory data structure and replays the changes against the last stored consolidated document, which might be lagging behind because of the following point.
Periodically, the log is applied to the "last stored consolidated document" to reduce the number of OTs that must be replayed to produce the latest document (this flow is sketched just below).
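A minimal sketch of that flow in TypeScript, assuming placeholder helpers (appendToDurableLog, persistSnapshot, applyOpToDoc) that you would swap for your real persistence and OT-apply code:

// Illustrative only: a durable log of ops, an in-memory replay buffer, and periodic compaction.
type Op = { seq: number; userId: string; payload: unknown };

// Placeholder persistence helpers; substitute real database calls.
async function appendToDurableLog(op: Op): Promise<void> { /* e.g. INSERT into an ops table */ }
async function persistSnapshot(doc: unknown, seq: number): Promise<void> { /* e.g. UPSERT the consolidated doc */ }
function applyOpToDoc(doc: unknown, op: Op): unknown { /* real OT apply goes here */ return doc; }

class DocumentStore {
  private pendingOps: Op[] = [];   // in-memory structure mirroring the log
  private snapshot: unknown = {};  // last stored consolidated document
  private snapshotSeq = 0;

  // 1. Incoming OT: append to the durable log, keep in memory for fast replay.
  async applyOp(op: Op): Promise<void> {
    await appendToDurableLog(op);
    this.pendingOps.push(op);
  }

  // 2. Read: replay pending ops on top of the last consolidated document.
  latest(): unknown {
    return this.pendingOps.reduce(applyOpToDoc, this.snapshot);
  }

  // 3. Periodic compaction: fold the pending ops into the consolidated document.
  async compact(): Promise<void> {
    if (this.pendingOps.length === 0) return;
    this.snapshot = this.latest();
    this.snapshotSeq = this.pendingOps[this.pendingOps.length - 1].seq;
    await persistSnapshot(this.snapshot, this.snapshotSeq);
    this.pendingOps = [];
  }
}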
Anyway, the best way to have a definitive answer is to look at open-source code that does what you are looking for, e.g. etherpad.

Flutter: save data on device from Firebase

I want to understand how to optimize access to the Firebase database.
I have a database containing mountain refuges and routes.
Practically on every launch of the application, all the necessary data is downloaded from the database.
My idea would be to save the relevant data in a JSON file, and only when the user wants more information about an item would a read access to the database take place.
How can I reduce read accesses to the database? Is it possible to save the most "relevant" data in a JSON file on the device, so that it is not downloaded every time the user opens the app?
If you're asking about pricing, Firestore is billed by interactions with the database (reads, writes, deletes), not by the quantity of data (ignoring stored-data charges for this use case).
Additionally, queries are shallow: they only return documents in a particular collection or collection group and do not return subcollection data.
So as long as your structure supports it, showing the higher-level list of places should be the only reads you're doing until the user actually selects a place to see more details.
If you have a million places, leverage pagination to only load enough to support the UI - say 100 at a time. That will limit the number of reads needed as well.
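As an illustration, a paginated query with the Firebase web SDK could look like the sketch below; the "places" collection and the "name" field are assumptions for this example, and the Flutter/Dart API mirrors these calls:

import { initializeApp } from 'firebase/app';
import {
  getFirestore, collection, query, orderBy, limit, startAfter, getDocs,
  QueryDocumentSnapshot,
} from 'firebase/firestore';

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);

// First page: roughly 100 document reads are billed, not the whole collection.
async function loadFirstPage() {
  const snap = await getDocs(query(collection(db, 'places'), orderBy('name'), limit(100)));
  const lastDoc = snap.docs[snap.docs.length - 1];
  return { places: snap.docs.map((d) => d.data()), lastDoc };
}

// Next page: continue after the last document of the previous page.
async function loadNextPage(lastDoc: QueryDocumentSnapshot) {
  const snap = await getDocs(
    query(collection(db, 'places'), orderBy('name'), startAfter(lastDoc), limit(100)),
  );
  return snap.docs.map((d) => d.data());
}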

Data consistency across multiple microservices, which duplicate data

I am currently trying to get into microservices architecture, and I have come across a data consistency issue. I've read that duplicating data between several microservices is considered a good idea, because it makes each service more independent.
However, I can't figure out what to do in the following case to provide consistency:
I have a Customer service which has a RegisterCustomer method.
When I register a customer, I want to send a message via RabbitMQ so that other services can pick up this information and store it in their own DBs.
My code looks something like this:
...
_dbContext.Add(customer);
CustomerRegistered e = Mapper.Map<CustomerRegistered>(customer);
await _messagePublisher.PublishMessageAsync(e.MessageType, e, "");
//!!app crashes
_dbContext.SaveChanges();
...
So I would like to know: how can I handle the case where the application sends the message but is unable to save the data itself? Of course, I could swap the SaveChanges and PublishMessageAsync calls, but the problem would still be there. Is there something wrong with my approach to storing data?
Yes. You are doing dual persistence: persistence in the DB and in a durable queue. If one succeeds and the other fails, you'll always be in trouble. There are a few ways to handle this:
Persist in the DB and then do Change Data Capture (CDC), so that the data from the DB's Write-Ahead Log (WAL) is used to create a materialized view in the second service's DB via real-time streaming.
Persist in a durable queue and a cache. Using real-time streaming, persist the data in both services. Read from the cache if the data is available there, otherwise read from the DB. This allows read-after-write. Even if the write to the cache fails, in the worst case the data will be in the DB within seconds through streaming.
NServiceBus supports durable distributed transactions in many scenarios, unlike RabbitMQ. If you can use NServiceBus instead of RabbitMQ, you could look into that feature to ensure that both contexts are saved, or rolled back together, in case of failure.
I think the solution you're looking for is the outbox pattern: an event table lives in the same database as your business data, which allows the event and the business data to be committed in the same database transaction; a background worker loop then pushes the events to the MQ.
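A rough sketch of that idea, where the Db/Tx interfaces, table names, and publish callback are placeholders rather than any specific ORM or broker API:

// Minimal shape of a transactional database client; substitute your real ORM/driver.
interface Tx { insert(table: string, row: Record<string, unknown>): Promise<void>; }
interface Db {
  transaction(work: (tx: Tx) => Promise<void>): Promise<void>;
  query(sql: string, params?: unknown[]): Promise<any[]>;
  execute(sql: string, params?: unknown[]): Promise<void>;
}
interface Customer { id: string; name: string; }

// Business data and the outgoing event are committed atomically in ONE transaction,
// so either both are persisted or neither is.
async function registerCustomer(db: Db, customer: Customer): Promise<void> {
  await db.transaction(async (tx) => {
    await tx.insert('customers', customer);
    await tx.insert('outbox', {
      messageType: 'CustomerRegistered',
      payload: JSON.stringify(customer),
      publishedAt: null, // id assumed to be auto-generated by the table
    });
  });
}

// A background worker drains the outbox and publishes to the broker (e.g. RabbitMQ),
// marking rows as published only after the broker has accepted them.
async function pumpOutbox(db: Db, publish: (type: string, body: string) => Promise<void>) {
  const pending = await db.query('SELECT * FROM outbox WHERE publishedAt IS NULL LIMIT 50');
  for (const row of pending) {
    await publish(row.messageType, row.payload);
    await db.execute('UPDATE outbox SET publishedAt = NOW() WHERE id = ?', [row.id]);
  }
}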

Handling user activity on a web portal with good performance

The users on my website perform operations like login, logout, updating their profile, changing passwords, etc. I am trying to come up with something that can store these user activities and also return the matching records when some system asks for them based on userIds.
From what I can tell, the workload looks write-intensive (since users log into the website very often and I have to record that). Once in a while (say, when a user clicks on their history or a reporting team needs it), the records are read and returned.
So my application is write-intensive.
There are various approaches I can think of.
Approach 1: The system that receives those user activities keeps writing them into a queue, and another component keeps fetching from that queue (periodically, or when it fills up) and writes them into the database (which is sharded, assume based on a hash of userId). A rough sketch of this buffering idea appears after the list of approaches below.
The problem with this approach is that if my activity manager runs on multiple nodes, each node has to send records to various shards, which means a lot of data movement over the network.
Approach 2: The system that receives those user activities keeps writing them into a queue, and another component keeps fetching from that queue (periodically, or when it fills up) and writes them into a read-through/write-through cache, which would take care of writing into the database.
The problem with this approach is that I do not know if I can control where those records would be written (I mean, to which shard). Basically, I do not know whether a write-through cache works here (does it map to a local DB, or can it manage to send data to the shards?).
Approach 3: The login operation is the most common user activity in my system. I can have a separate queue just for logins, which is periodically flushed to disk.
Approach 4: Use some cloud-based storage which acts as an in-memory queue where data coming from all nodes is stored. This would be a reliable cache that guarantees no data loss. Periodically read from this cache and store the data into the database shards.
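As an aside, a rough sketch of the buffering idea behind approach 1 (writeBatchToShard is a placeholder for a real database client, and the durability concerns listed below are deliberately not addressed here):

import { createHash } from 'node:crypto';

type ActivityEvent = { userId: string; action: string; at: number };

const NUM_SHARDS = 8;
const buffers: ActivityEvent[][] = Array.from({ length: NUM_SHARDS }, () => []);

// Pick a shard by hashing the userId, so one user's history lands on one shard.
function shardFor(userId: string): number {
  return createHash('md5').update(userId).digest().readUInt32BE(0) % NUM_SHARDS;
}

// Hot path: a cheap in-memory append per activity event.
function recordActivity(e: ActivityEvent): void {
  buffers[shardFor(e.userId)].push(e);
}

// Periodic flush: one batched write per shard instead of one write per event.
// Until the flush completes, buffered events can be lost (see problem 1 below).
async function flushAll(writeBatchToShard: (shard: number, rows: ActivityEvent[]) => Promise<void>) {
  for (let shard = 0; shard < NUM_SHARDS; shard++) {
    if (buffers[shard].length === 0) continue;
    const rows = buffers[shard];
    buffers[shard] = [];
    await writeBatchToShard(shard, rows);
  }
}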
There are many problems to solve:
1. Ensuring I do not lose the data (what kind of data replication to use, i.e. a queue that ensures reliability).
2. Ensuring my frequent writes do not hurt performance.
3. Avoiding a single point of failure.
4. Achieving infinite scale.
I need suggestions on the above, based on existing solutions.

Using Task Queues in GAE to insert bulk data

I am using Google App Engine to create a web application. The app has an entity whose records will be inserted through an upload facility by the user. The user may select up to 5K rows (objects) of data. I am using the DataNucleus project as the JDO implementation. Here is the approach I am taking for inserting the data into the datastore.
Data is read from the CSV and converted to entity objects and stored in a list.
The list is divided into smaller groups of objects, say around 300 per group.
Each group is serialized and stored in memcache, using a unique id as the key.
For each group, a task is created and added to the queue along with the key. Each task calls a servlet which takes this key as an input parameter, reads the data from memcache, inserts it into the datastore, and then deletes the data from memcache.
The queue has a maximum rate of 2/min and a bucket size of 1. The problem I am facing is that the task is not able to insert all 300 records into the datastore. Out of 300, the maximum that gets inserted is around 50. I have validated the data once it is read from memcache, and I am able to get all the stored data back from memory. I am using the makePersistent method of the PersistenceManager to save data to the datastore. Can someone please tell me what the issue could be?
Also, I want to know whether there is a better way of handling bulk inserts/updates of records. I have used the BulkInsert tool, but in cases like these it will not satisfy the requirement.
This is a perfect use-case for App Engine mapreduce. Mapreduce can read lines of text from a blob as input, and it will shard your input for you and execute it on the taskqueue.
When you say that the bulkloader "will not satisfy the requirement", it would help if you said what requirement you have that it doesn't satisfy; I presume in this case the issue is that you need non-admin users to be able to upload data.
