So here is the issue i am facing on deciding, Would like to know what is the best practice and which one to choose.
Suppose we have a list of items we use GET to get all the lists and we do POST for editing and DELETE for deleting a list. After successful completion of either of these request whats the best practices to make sure client state is in sync with server state.
GET to fetch all the lists
let server return all the lists after edit or delete operations.
or client should trust 200 OK and update the client state with modifying lists
This is very opinion-based, as there are lots of variables in play here. Ideally I'd always say, drop the client-state, and always re-get the data after updates. This of course completely depends on the cost/complexity of re-retrieving that state. If it is cheap/small, go ahead and re-GET the lists... it's far simpler than trying to maintain state synchronization. If you DO update a local cache, make sure it's a CACHE pattern, and not a REPOSITORY pattern. This means you shouldn't necessarily trust the cache for important operations, and that you should do things like verify the entities before operating on them.
I would not return updated lists from a POST/DELETE, updated entities is fine, but the whole list should be a subsequent GET. Anything else and you are not only breaking REST, but you are also defining the behavior of the client application in your service (which you don't want to do).
Related
I have a large database, so I decided to create a copy of this database on the front-end side in my Store (I am using Vuex with Vuejs), i.e. each table in the database has a corresponding array in the Store, and every time I have an update I persist it to the real database and to my copy (which exists in the Store) at the same time to avoid fatch the data again after each update.
Is it a good idea or not (in terms of performance) ?
I think its totally fine to create a cache of your data in the client side, and it is even recommended. But you do need to be aware of some things:
Make sure you update the client side only when the data has been successfully saved in the database. So the client will not see false information.
If you have multiple users who can change the data, and you need to use the data in real time - make sure that you send the updated data to other users as well.
Check which tables from the database you actually need at any time - maybe you need different tables at different views of the application, so you do not have to save all the database all the time. It can help to reduce memory usage
Consider using lazy loading, which means that you load the tables only when you need them, and then save it in the cache. The next time you need to use this table you won`t load it from the server, but instead use the cached data.
When you put the data in vuex store, vue will consider this data reactive, which can cause performance issues - especially if you have a lot of data. If you have data that you know will not be changed, or data that is rarely changed - consider using Object.freeze() which basically tells vue not to put any watchers on this object. This could help improve performance issues by much.
EDIT:
if you are concerned about performance issues I would implement the cache using lazy loading, and Object.freeze() which means you will not be able to change the data in the client side - so for every change you should send the update to the server and receive the full updated table in - so you will assign the new value to your cache with Object.freeze(). That way you don't have to request the table from the server for every usage, only for updates. This will help to keep good performance.
I have a list of 20k employees to display in a React table. When the admin user changes one, I want the change reflected in the table - even if she does a reload - but I don't want to re-fetch all 20k including the unchanged 19 999.
(The table is of course paged and shows max N at once but I still need all 20k to support search and filtering, which is impractical to do server side for various reasons)
The solution I can think of is to set caching headers for /api/employees so that it is cached for e.g. one hour and have another endpoint, /api/employees?changedSince= and somehow ensure that server knows which employees have been changed. But I am sure somebody has already implemented a solution(s) for this...
Thank you!
A timestamp solution would be the best, and simplest, way to implement it. It would only require a small amount of extra data to be stored and would provide the most maintainable and expandable solution.
All you would need to do is update the timestamp when an item in the list is updated. Then, when the page loads for the first time, access /api/employees, then periodically request /api/employees?changedSince to return all of the changed rows in the table, for React to then update.
In terms of caching the main /api/employees endpoint, I’m not sure how much benefit you would gain from doing that, but it depends on how often the data is updated.
As you are saying your a in control of the frontends backend, imho this backend should cache all of the upstream data in its own (SQL or whatever) database. The backend then can expose a proper api (with pagination and search).
The backend can also implement some logic to identify which rows have changed.
If the frontend needs live updates about changes you can use some technology that allows bi-directional communication (SignalR if your backend is .NET based, or something like socket.io if you have a node backend, or even plain websockets)
I have a Flink project that receives an events streams, and executes some logic to add a flag of this event, then it saves the flag and the eventID for a while to be reused or to be queried by other system.
in this case, the volume of data is not too many, and need to be good reliability, of course, better to be updated in time before being used.
Traditionally, we can use an external database to save this kind of data.
But after I learned the state, I saw it seems to be very useful, and has a good backends mechanism, and can be queryable.
So I am asking question to listen more to your arguments and evidence.
I am moving my last two comments to here as an answer since I realized I am essentially doing that.
Ok, It might have been the Uber keynote then. But the bottom line is that there are companies that are using extremely large state to hold data that you need to perform calculations against effectively.
For example, I made a program that took in messages that with an unique ID and a value field(int). I then had a stateful function that was keyed by the ID of the received message and every message I received for that ID would be added to a stateful value object, updating the the total for that ID. You could make a stateful list object to hold all the messages you received if you needed that. An alternative to that is to use a "new age" database that is designed for quick read/writes, like Cassandra, to store that. But that approach comes with its own limitations because of the I/O (long story short, Flink and Cassandra could handle lots of dat fast, the network bandwidth could not).
So keeping all that data in state in flink can be done and used well and has many benefits.
The one thing that I have to caveat this with is that I do not know if Flink's state has the same sort of failsafes like that of Cassandra or Kafka. Whereas they replicate their data across nodes so that if one goes down, then the others can handle everything and repopulate the other node when it is restarted. Flink's state can be stored on a remote backend like an s3 bucket or hdfs
(see: https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/state_backends.html), but I do not know if there is replication of the state. So if the state is stored all on one node that goes down, if it is gone for good or is backed up on another node. That is something to look into more since that should be a big decision in your choice.
Hope that at least gave you some info and a brief idea of what questions to ask.
We need to keep our Firebase data in sync with other databases for full-text search (in ElasticSearch) and other kinds of queries that Firebase doesn't easily support.
This needs to be as close to real-time as possible, we can't just export a nightly dump of the Firebase JSON or anything like that, aside from the fact that this will get rather large.
My initial thought was to run a Node.js client which listens to child_changed, child_added, child_removed etc... events of all the main lists, but this could get a bit unweildy and would it be a reliable way of syncing if the client re-connects after a period of time?
My next thought was to maintain a list of "items changed" events and write to that every time an item is created/updated, similar to the Firebase work queue example. The queue could contain the full path to the data which has changed and the worker just consumes that and updates the local database accordingly.
The problem here is every bit of code which makes updates has to remember to write to this queue otherwise the two systems will get out of sync. Some proxy code shouldn't be too hard to write though.
Has anyone else done anything similar with any success?
For search queries, you can integrate directly with ElasticSearch; there is no need to sync with a secondary database. Firebase has a blog post about integrating and a lib, Flashlight, to make this quick and painless.
Another option is to use the logstash-input-firebase Logstash plugin in order to listen to changes in your Firebase real-time database(s) and forward the data in real-time to Elasticsearch using an elasticsearch output.
What is the best way to program an immediate reaction to an update to data in a database?
The simplest method I could think of offhand is a thread that checks the database for a particular change to some data and continually waits to check it again for some predefined length of time. This solution seems to be wasteful and suboptimal to me, so I was wondering if there is a better way.
I figure there must be some way, after all, a web application like gmail seems to be able to update my inbox almost immediately after a new email was sent to me. Surely my client isn't continually checking for updates all the time. I think the way they do this is with AJAX, but how AJAX can behave like a remote function call I don't know. I'd be curious to know how gmail does this, but what I'd most like to know is how to do this in the general case with a database.
Edit:
Please note I want to immediately react to the update in the client code, not in the database itself, so as far as I know triggers can't do this. Basically I want the USER to get a notification or have his screen updated once the change in the database has been made.
You basically have two issues here:
You want a browser to be able to receive asynchronous events from the web application server without polling in a tight loop.
You want the web application to be able to receive asynchronous events from the database without polling in a tight loop.
For Problem #1
See these wikipedia links for the type of techniques I think you are looking for:
Comet
Reverse AJAX
HTTP Server Push
EDIT: 19 Mar 2009 - Just came across ReverseHTTP which might be of interest for Problem #1.
For Problem #2
The solution is going to be specific to which database you are using and probably the database driver your server uses too. For instance, with PostgreSQL you would use LISTEN and NOTIFY. (And at the risk of being down-voted, you'd probably use database triggers to call the NOTIFY command upon changes to the table's data.)
Another possible way to do this is if the database has an interface to create stored procedures or triggers that link to a dynamic library (i.e., a DLL or .so file). Then you could write the server signalling code in C or whatever.
On the same theme, some databases allow you to write stored procedures in languages such as Java, Ruby, Python and others. You might be able to use one of these (instead of something that compiles to a machine code DLL like C does) for the signalling mechanism.
Hope that gives you enough ideas to get started.
I figure there must be some way, after
all, web application like gmail seem
to update my inbox almost immediately
after a new email was sent to me.
Surely my client isn't continually
checking for updates all the time. I
think the way they do this is with
AJAX, but how AJAX can behave like a
remote function call I don't know. I'd
be curious to know how gmail does
this, but what I'd most like to know
is how to do this in the general case
with a database.
Take a peek with wireshark sometime... there's some google traffic going on there quite regularly, it appears.
Depending on your DB, triggers might help. An app I wrote relies on triggers but I use a polling mechanism to actually 'know' that something has changed. Unless you can communicate the change out of the DB, some polling mechanism is necessary, I would say.
Just my two cents.
Well, the best way is a database trigger. Depends on the ability of your DBMS, which you haven't specified, to support them.
Re your edit: The way applications like Gmail do it is, in fact, with AJAX polling. Install the Tamper Data Firefox extension to see it in action. The trick there is to keep your polling query blindingly fast in the "no news" case.
Unfortunately there's no way to push data to a web browser - you can only ever send data as a response to a request - that's just the way HTTP works.
AJAX is what you want to use though: calling a web service once a second isn't excessive, provided you design the web service to ensure it receives a small amount of data, sends a small amount back, and can run very quickly to generate that response.