I want to update my Hibernate cache when someone updates the database manually. I have lots of data coming through, so I cannot afford to clear the cached data all the time or rely on "timeToLiveSeconds" to refresh it. I want something that updates the cached data as soon as anything changes in the database.
There is no built-in way to detect this: how would the cache know that someone modified a value directly in the database? You have to identify those sets of values yourself and update the cache at regular intervals.
If you are willing to code against Hibernate's SPIs, and possibly also against your cache provider's API, you can implement this. You just need access to the cache regions so you can invalidate them either per query space (table) or per individual entry, keyed by primary key.
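A minimal sketch of that invalidation logic, written in Python for brevity rather than against Hibernate's actual SPI. It assumes some change feed (a database trigger, change data capture, or similar) reports which table, and ideally which primary key, was modified outside Hibernate. `RegionCache` and `on_row_changed` are hypothetical names; in a JPA application, single-entity eviction would go through `EntityManagerFactory.getCache().evict(entityClass, primaryKey)`.

```python
from collections import defaultdict


class RegionCache:
    """Toy second-level cache: one region per table plus a query cache.

    Entity entries are keyed by primary key; cached query results remember
    which tables (query spaces) they read from.
    """

    def __init__(self):
        self.regions = defaultdict(dict)   # table -> {pk: row}
        self.queries = {}                  # query key -> (tables, result)

    def put_entity(self, table, pk, row):
        self.regions[table][pk] = row

    def get_entity(self, table, pk):
        return self.regions[table].get(pk)

    def put_query(self, key, tables, result):
        self.queries[key] = (frozenset(tables), result)

    def get_query(self, key):
        entry = self.queries.get(key)
        return None if entry is None else entry[1]

    def evict_entity(self, table, pk):
        self.regions[table].pop(pk, None)

    def evict_query_space(self, table):
        # Drop every cached query result that read from this table.
        stale = [k for k, (tables, _) in self.queries.items() if table in tables]
        for k in stale:
            del self.queries[k]


def on_row_changed(cache, table, pk=None):
    """Hypothetical hook called by a change feed when a row is modified outside Hibernate."""
    if pk is None:
        # Unknown row: invalidate the whole region for that table.
        cache.regions.pop(table, None)
    else:
        cache.evict_entity(table, pk)
    # Any cached query that touched this table may now be wrong.
    cache.evict_query_space(table)
```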
I have a large database, so I decided to keep a copy of it on the front end in my Store (I am using Vuex with Vue.js): each table in the database has a corresponding array in the Store. Every time I make an update, I persist it to the real database and to my copy in the Store at the same time, to avoid fetching the data again after each update.
Is this a good idea in terms of performance?
I think it's totally fine to cache your data on the client side, and it is even recommended. But you do need to be aware of a few things:
Make sure you update the client-side copy only after the data has been successfully saved in the database, so the client never sees information that was not actually stored.
If multiple users can change the data and you need it in real time, make sure you send the updated data to the other users as well.
Check which tables you actually need at any given time. Different views of the application may need different tables, so you do not have to keep the whole database in the Store all the time. This can help reduce memory usage.
Consider lazy loading: load a table only when you need it, then keep it in the cache. The next time you need that table, you won't load it from the server but use the cached data instead.
When you put data in the Vuex store, Vue makes it reactive, which can cause performance issues, especially with a lot of data. If you have data that you know will not change, or that rarely changes, consider using Object.freeze(), which basically tells Vue not to set up watchers on that object. This can improve performance significantly.
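The first and fourth points (save before touching the local copy, load tables lazily) can be sketched as follows. This is Python rather than Vuex for brevity; in a Vuex store, `get` would roughly correspond to an action that loads a table on first use, and `update` to an action that commits a mutation only after the server call succeeds. `fetch_table` and `save_row` are hypothetical server calls, and rows are assumed to have an `id` field.

```python
class ClientTableCache:
    """Client-side copy of server tables, loaded lazily and updated only after a successful save."""

    def __init__(self, fetch_table, save_row):
        self._fetch_table = fetch_table   # hypothetical: table name -> list of row dicts
        self._save_row = save_row         # hypothetical: (table, row) -> saved row, raises on failure
        self._tables = {}

    def get(self, table):
        # Lazy loading: hit the server only the first time a table is needed.
        if table not in self._tables:
            self._tables[table] = {row["id"]: row for row in self._fetch_table(table)}
        return self._tables[table]

    def update(self, table, row):
        # Persist first; if the server rejects the change, the local copy is untouched.
        saved = self._save_row(table, row)
        if table in self._tables:
            self._tables[table][saved["id"]] = saved
        return saved
```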
EDIT:
If you are concerned about performance, I would implement the cache using lazy loading and Object.freeze(). Freezing means you cannot change the data on the client side, so for every change you send the update to the server, receive the full updated table in the response, and assign that new value to your cache with Object.freeze(). That way you don't have to request a table from the server every time you use it, only when it is updated. This will help keep performance good.
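A rough Python analogue of the approach in the edit, assuming a hypothetical `update_and_return_table` server call that applies a change and returns the complete updated table. A tuple of `types.MappingProxyType` rows plays the role of Object.freeze(); like Object.freeze(), it is shallow, so nested objects inside a row remain mutable.

```python
from types import MappingProxyType


def freeze_table(rows):
    """Rough analogue of Object.freeze(): an immutable tuple of read-only row views."""
    return tuple(MappingProxyType(dict(row)) for row in rows)


class FrozenTableCache:
    def __init__(self, fetch_table, update_and_return_table):
        self._fetch_table = fetch_table                          # hypothetical server call
        self._update_and_return_table = update_and_return_table  # hypothetical: applies change, returns full table
        self._tables = {}

    def get(self, table):
        if table not in self._tables:   # lazy loading
            self._tables[table] = freeze_table(self._fetch_table(table))
        return self._tables[table]

    def update(self, table, change):
        # Never mutate the frozen snapshot; replace it with the server's full, updated table.
        self._tables[table] = freeze_table(self._update_and_return_table(table, change))
        return self._tables[table]
```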
One parameter for my Flink job is dynamic, and I have an API to fetch its current value. Can I call the API in the source every time to fetch data based on the parameter? Is that the correct way? Will it cause any trouble in the Flink job?
So, if I understand correctly, the idea is that you first get some key from DynamoDB and then use it to query an external service from the source.
I think that should be possible in general, but there are a few things to keep in mind.
I'm not sure about the performance of such a solution. Are you going to query the database constantly, or somehow just get the changes? There are several things to consider here to get good performance from the source.
It may be hard to provide any strong guarantees for such a setup, but that depends on the characteristics of the setup itself: how are you going to handle failures? How often will the key in the database change? Will the data still be accessible via the URL after the key in the DB changes? You can probably keep the last read key in state, so that if the job fails and the key in the DB changes, you can try to read the data for the previous key (the one for which the job failed), but that depends on the questions above.
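A Python sketch of the "keep the last read key in state" idea, under the assumption that the key is read from the database and then used for one external call. `read_key` and `fetch_for_key` are hypothetical stand-ins, and `state` is a plain dict here; in Flink it would have to be checkpointed operator state for the retry to survive a restart.

```python
class DynamicParamReader:
    """Remembers the key it is working on so a failed attempt can be retried with the same key."""

    def __init__(self, read_key, fetch_for_key, state=None):
        self.read_key = read_key            # hypothetical: returns the current key from the database
        self.fetch_for_key = fetch_for_key  # hypothetical: calls the external API for a key
        self.state = state if state is not None else {"pending_key": None}

    def poll_once(self):
        key = self.state["pending_key"]
        if key is None:
            # No unfinished work: look up the current key in the database.
            key = self.read_key()
            self.state["pending_key"] = key
        data = self.fetch_for_key(key)      # may raise; pending_key then survives for the retry
        self.state["pending_key"] = None    # completed: the next poll reads a fresh key
        return key, data
```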
Finally, depending on the characteristics of the setup, it may be possible to use existing Flink operators to achieve this. For example, you can stream changes from the database (using one of the existing connectors, depending on the DB) and then use that data in Flink's Async I/O operator to query the external URL, so that you end up with a stream of data from the URL without writing your own source.
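In Flink, that enrichment step would be `AsyncDataStream.orderedWait` (or `unorderedWait`) with an `AsyncFunction`. The Python sketch below only illustrates the same idea with asyncio: every change event triggers a non-blocking call, at most `capacity` calls are in flight, and results come back in input order. `fetch` is a hypothetical async client for the external URL.

```python
import asyncio


async def enrich_ordered(events, fetch, capacity=10):
    """Call `fetch` for every event concurrently (at most `capacity` in flight)
    and return (event, result) pairs in input order."""
    semaphore = asyncio.Semaphore(capacity)

    async def one(event):
        async with semaphore:
            return event, await fetch(event)

    # gather() preserves the order of its arguments regardless of completion order.
    return await asyncio.gather(*(one(e) for e in events))
```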
I'm wondering what the preferred way is to cache elements from a database in an in-memory cache like Redis or Memcached. The context is that I have a table of items which is accessed very frequently (millions of times per second) through an API, as real-time stats. In general, the API just looks up items in a given time range with a certain secondary id. The same data is likely to be hit many times. It seems like you could do it in a few ways:
1. Cache the entire query.
That is, the entire result of the real database query is stored in the cache, with a minimal representation of the query as the key. The advantage is that a frequently used query needs just a single cache access to get the entire result set back. But any slightly different query has to be run again and cached separately.
2. Cache the items in the query.
That is, each item returned from the real query is stored individually in the cache, with a searchable id as the key. The advantage is that for slightly different queries you don't need to run the full query against the DB again, only fetch the elements that are not already cached.
3. Mirror the entire database.
That is, each item is put into the cache as soon as it is created or updated in the DB. The cache is always assumed to be up to date, so all queries can run directly on the cache.
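Options 1 and 2 can be contrasted in a short Python sketch (the question names no language). Dicts stand in for Redis or Memcached, and `run_query` / `fetch_rows` are hypothetical database calls. Option 2 as sketched assumes the ids matching a query can be determined without running the full query, for example from an index; here the ids are simply passed in.

```python
import json


class QueryResultCache:
    """Option 1: cache the full result of each distinct query, keyed by its parameters."""

    def __init__(self, run_query):
        self.run_query = run_query   # hypothetical DB call: (start, end, secondary_id) -> list of rows
        self.store = {}              # stand-in for Redis / Memcached

    def get(self, start, end, secondary_id):
        # A deterministic key built from the query parameters.
        key = "q:" + json.dumps([start, end, secondary_id])
        if key not in self.store:
            self.store[key] = self.run_query(start, end, secondary_id)
        return self.store[key]


class ItemCache:
    """Option 2: cache rows individually by id and fetch only the ids that are missing."""

    def __init__(self, fetch_rows):
        self.fetch_rows = fetch_rows   # hypothetical DB call: list of ids -> {id: row}
        self.store = {}                # stand-in for Redis / Memcached

    def get_many(self, ids):
        missing = [i for i in ids if i not in self.store]
        if missing:
            # One database round trip for everything not yet cached.
            self.store.update(self.fetch_rows(missing))
        return [self.store[i] for i in ids if i in self.store]
```

With option 1, `get(0, 3, "x")` and `get(0, 4, "x")` are two separate cache entries and two database queries; with option 2, overlapping requests only fetch the rows that have not been seen before.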
It seems like these approaches might each be better or worse in certain circumstances, but are there pitfalls that make some of them completely undesirable? Or is one clearly better for this use case?
Thanks for any advice!
#3, i.e. mirroring the database, is not a good option. Also keep in mind that most in-memory systems like Redis don't have a query language; retrieval is based on keys. So it is not a good idea to replicate the data, especially if it is relational.
You should use a combination of #1 and #2. Redis is key-based, so you will have to design the keys around your query criteria. I would suggest building a library that works on the concept of an ETag. In Redis, save the ETag along with the query response. The library passes the ETag to the backend logic, which re-runs the query only if the ETag doesn't match. If it matches, the backend does not re-run the query, and the library takes the cached response from Redis and sends it back to the client.
Refer to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag for the concept.
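A minimal Python sketch of the ETag idea described in this answer, with a dict standing in for Redis. It assumes the backend has some cheap way to tell whether the data behind a query has changed (the hypothetical `data_version` call, e.g. a version counter or latest update timestamp), so it can compute the current ETag without running the expensive query (`run_query`, also hypothetical).

```python
import hashlib


class EtagCache:
    """Store (ETag, response) per query and re-run the query only when the ETag changes."""

    def __init__(self, data_version, run_query):
        self.data_version = data_version   # hypothetical cheap call: query key -> version
        self.run_query = run_query         # hypothetical expensive call: query key -> response
        self.store = {}                    # stand-in for Redis: query key -> (etag, response)

    @staticmethod
    def make_etag(query_key, version):
        return hashlib.sha1(f"{query_key}:{version}".encode()).hexdigest()

    def get(self, query_key):
        current = self.make_etag(query_key, self.data_version(query_key))
        cached = self.store.get(query_key)
        if cached is not None and cached[0] == current:
            return cached[1]                    # ETag matches: serve the cached response
        response = self.run_query(query_key)    # no entry or ETag differs: re-run the query
        self.store[query_key] = (current, response)
        return response
```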
I have a list of 20k employees to display in a React table. When the admin user changes one, I want the change reflected in the table - even if she does a reload - but I don't want to re-fetch all 20k including the unchanged 19 999.
(The table is of course paged and shows max N at once but I still need all 20k to support search and filtering, which is impractical to do server side for various reasons)
The solution I can think of is to set caching headers for /api/employees so that it is cached for, e.g., one hour, and to have another endpoint, /api/employees?changedSince=, while somehow making sure the server knows which employees have changed. But I am sure somebody has already implemented a solution for this...
Thank you!
A timestamp solution would be the best, and simplest, way to implement it. It would only require a small amount of extra data to be stored and would provide the most maintainable and expandable solution.
All you would need to do is update the timestamp whenever an item in the list is updated. Then, when the page first loads, request /api/employees, and afterwards periodically request /api/employees?changedSince= with the time of the last sync, which returns all rows changed since then for React to update.
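A Python sketch of the timestamp approach. `EmployeeStore` stands in for the server (its integer `clock` for a real timestamp or sequence number) and `EmployeeTable` for the React side; both are hypothetical. The client stores the server's clock value from its last response as the next `changedSince` cursor, so differences between client and server clocks don't matter. Deleted employees are not covered; they would need, for example, a soft-delete flag that is also stamped.

```python
class EmployeeStore:
    """Server side: every row carries an `updated_at` stamp that is bumped on each change."""

    def __init__(self):
        self.rows = {}
        self.clock = 0   # stand-in for a server-side timestamp or sequence number

    def save(self, emp_id, **fields):
        self.clock += 1
        self.rows[emp_id] = {**self.rows.get(emp_id, {"id": emp_id}), **fields, "updated_at": self.clock}

    def all(self):                      # GET /api/employees
        return list(self.rows.values()), self.clock

    def changed_since(self, since):     # GET /api/employees?changedSince=...
        # >= may re-send rows stamped exactly at `since`; merging by id makes that harmless.
        return [r for r in self.rows.values() if r["updated_at"] >= since], self.clock


class EmployeeTable:
    """Client side: one full load, then merge only the changed rows by id."""

    def __init__(self, server):
        self.server = server
        rows, self.since = server.all()
        self.rows = {r["id"]: r for r in rows}

    def refresh(self):
        changed, self.since = self.server.changed_since(self.since)
        for r in changed:
            self.rows[r["id"]] = r
        return len(changed)
```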
In terms of caching the main /api/employees endpoint, I’m not sure how much benefit you would gain from doing that, but it depends on how often the data is updated.
As you say you are in control of the frontend's backend, IMHO this backend should cache all of the upstream data in its own database (SQL or whatever). The backend can then expose a proper API (with pagination and search).
The backend can also implement some logic to identify which rows have changed.
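If the upstream source does not report change timestamps itself, one way for the backend to detect changed rows is to fingerprint each row when it copies the upstream data. A Python sketch with hypothetical names; the recorded `changed_at` value could then back a changedSince-style endpoint. Rows that disappear upstream are not detected here.

```python
import hashlib
import json


def row_hash(row):
    # Stable fingerprint of a row: serialize with sorted keys, then hash.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def sync_from_upstream(local, upstream_rows, now):
    """Copy upstream rows into the backend's own store and stamp only the rows that changed.

    `local` maps id -> {"row": ..., "hash": ..., "changed_at": ...}; `now` is whatever
    timestamp the backend uses. Returns the ids that were new or modified.
    """
    changed = []
    for row in upstream_rows:
        h = row_hash(row)
        entry = local.get(row["id"])
        if entry is None or entry["hash"] != h:
            local[row["id"]] = {"row": row, "hash": h, "changed_at": now}
            changed.append(row["id"])
    return changed
```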
If the frontend needs live updates about changes, you can use a technology that allows bi-directional communication (SignalR if your backend is .NET-based, something like socket.io if you have a Node backend, or even plain WebSockets).
I am writing a server-side application, say app1, using Spring Boot. app1 accesses a database db1 on a dedicated DB server. To speed up DB access, I have marked some of my JpaRepository methods as @Cacheable(<some_cache_key>), with an expiration time of, say, 1 hour.
The problem is: db1 is shared among several applications, each of which may update entries in it.
Question: will I get a performance gain in app1 by using caches inside my application (@Cacheable)? (Note: the cache lives inside my application, not in front of the database; I am not masking the entire DB with a cache manager like Redis.)
Here are my thoughts:
1. If another application, app2, modifies a DB entry, how would the cache inside app1 know that the entry was updated? Then app1's cache goes stale, doesn't it? (until it refreshes after the fixed 1-hour expiry)
2. If #1 is correct, does that mean the correct way to set up the cache is to mask the entire DB with some sort of cache manager? Is Redis meant for this kind of usage?
So, many questions there.
Will I get a performance gain in app1 by using caches inside my application (@Cacheable)?
You should always benchmark it, but in theory accessing the cache will be faster than accessing the database.
If another application, app2, modifies a DB entry, how would the cache inside app1 know that the entry was updated? Then app1's cache goes stale, doesn't it? (until it refreshes after the fixed 1-hour expiry)
It won't be updated unless you are using a clustered cache. Ehcache with a Terracotta cluster is such a cache. But yes, if you stick with a basic application cache, it will go stale.
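To make the staleness concrete, a tiny Python TTL cache with an injectable clock. `TtlCache` is hypothetical; note that Spring's @Cacheable itself does not define expiry, which is configured in the underlying cache provider.

```python
class TtlCache:
    """Minimal in-process cache with a fixed time-to-live, similar in effect to an expiring @Cacheable."""

    def __init__(self, load, ttl_seconds, clock):
        self.load = load            # loads a value from the shared database
        self.ttl = ttl_seconds
        self.clock = clock          # injectable time source, e.g. time.monotonic
        self.entries = {}           # key -> (value, expires_at)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is not None and self.clock() < hit[1]:
            return hit[0]           # served from memory, even if the database has changed
        value = self.load(key)
        self.entries[key] = (value, self.clock() + self.ttl)
        return value
```

Between a write by app2 and the expiry of the entry, every `get` in app1 returns the old value; nothing in the cache itself can notice the change.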
If #1 is correct, does that mean the correct way to set up the cache is to mask the entire DB with some sort of cache manager? Is Redis meant for this kind of usage?
Now it gets subtle. I'm not a Redis expert, but as far as I know, Redis is frequently used as a cache although it's actually a NoSQL database. And (again, as far as I know) it won't sit in front of your database but beside it. So you will first query Redis to see if your data is there, and only then your database. If your database is much slower to access and you have a really good cache hit rate, this will improve your performance. But please do a benchmark.
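A Python sketch of the cache-aside read and write paths described above, with dicts standing in for Redis and the database (`CacheAside` is a hypothetical name). The write path only helps if every application that writes to db1 also invalidates the cache; otherwise you are back to relying on expiry.

```python
class CacheAside:
    """Cache-aside: the application checks the cache first and falls back to the database."""

    def __init__(self, cache, db):
        self.cache = cache      # stand-in for a Redis client (dict-like here)
        self.db = db            # stand-in for the real database (dict-like here)
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.db[key]          # slow path
        self.cache[key] = value       # populate for the next reader
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)     # invalidate instead of overwriting; the next read reloads
```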
Real caches (like Ehcache) are a bit more efficient. They add the concept of near caching: your app keeps cache entries in its own memory but also on the cache server. If an entry is updated, the near cache is updated too. So you get application-cache performance but also coherence between servers.
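A simplified, in-process Python sketch of the near-cache idea: each application keeps a local copy, and the shared cache server pushes updated values to the other clients that already hold the entry. `CacheServer` and `NearCacheClient` are hypothetical; this is not how Ehcache or any other product actually implements it, which happens over the network.

```python
class CacheServer:
    """Shared cache tier that pushes every update to the other connected clients."""

    def __init__(self):
        self.data = {}
        self.clients = []

    def put(self, key, value, origin):
        self.data[key] = value
        for client in self.clients:
            if client is not origin:
                client.on_remote_update(key, value)


class NearCacheClient:
    """Per-application near cache: local memory in front of the shared cache server."""

    def __init__(self, server):
        self.server = server
        self.local = {}
        server.clients.append(self)

    def get(self, key):
        # Serve from local memory when possible; otherwise pull from the server and keep a copy.
        if key not in self.local and key in self.server.data:
            self.local[key] = self.server.data[key]
        return self.local.get(key)

    def put(self, key, value):
        self.local[key] = value
        self.server.put(key, value, origin=self)

    def on_remote_update(self, key, value):
        # Refresh the local copy only if this application was already caching the key.
        if key in self.local:
            self.local[key] = value
```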