I am having some trouble understanding where an HQL query gets its information from. My project uses multiple threads, and each thread reads from and writes to the database. Threads do not share Session objects; instead I am using a HibernateUtil class which creates sessions for me.
Until recently, I would only close a session after writing but not after reading. Changes to objects were immediately visible in the database, but when reading on other threads (a different Session object than the one used for writing) I would get stale information. Reading and writing always happened on different threads, which means different Session objects and different session caches.
I always thought that by using HQL instead of Criteria I would always target the database (or the second-level cache) and not the session cache, but while debugging my code it became clear that the HQL query was looking for the object in the session cache and retrieved an old, outdated object.
Was I wrong in assuming that HQL always targets the database? Or at least the second level cache?
PS: I am using only one SessionFactory object.
Hibernate has two distinct caching concepts: entity caches and query caches. Entity caching is what the session cache (and the second-level cache, if enabled) does.
Assuming query caching is not enabled (which it's not, by default), then your HQL would have been executed against the database. This would have returned the IDs of the entities that match the query. If those entities were already in the session cache, then Hibernate would have returned those, rather than rebuilding them from the database. If your session has stale copies of them (because another session has updated the database), then that's the problem you have.
I would advise against using long-lived sessions, mainly for that reason. You should limit the lifespan of the session to the specific unit of work that you're trying to do, and then close it. There's little or no performance penalty to doing this (assuming you use a database connection pool). Alternatively, to make sure you don't get stale entities, you can call Session.clear(), but you may end up with unexpected performance side-effects.
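For illustration, a minimal sketch of that session-per-unit-of-work pattern, assuming a HibernateUtil that exposes your single SessionFactory (getSessionFactory() here is a stand-in for however your utility class hands out sessions) and a hypothetical Account entity:

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.Transaction;

    public class AccountReader {

        // One short-lived Session per unit of work: the session cache starts
        // empty, so the HQL result is built from the current database state.
        public List<?> loadActiveAccounts() {
            Session session = HibernateUtil.getSessionFactory().openSession();
            Transaction tx = null;
            try {
                tx = session.beginTransaction();
                List<?> accounts = session
                        .createQuery("from Account a where a.active = true")
                        .list();
                tx.commit();
                return accounts;
            } catch (RuntimeException e) {
                if (tx != null) tx.rollback();
                throw e;
            } finally {
                session.close();   // no long-lived first-level cache to go stale
            }
        }
    }

If you do keep a longer-lived session, calling session.clear() (or session.refresh(entity) for a single object) before re-reading has a similar effect, with the caveats mentioned above.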
Related
I have been trying to investigate what behaviour may result from getting a DeadlineExceededException when using Objectify with caching turned on.
My experiments so far have gone like this: 1) store an entity or entities, 2) sleep for most of the remaining execution time, 3) make some updates in an infinite loop until the request is aborted, and 4) check in a separate request whether the cache is in sync with the writes that succeeded in the datastore.
"Some updates" means changing a large number (50) of strings in an object and then writing it back. I have also tried updating several objects in a transaction to test whether I can get inconsistent results when loading the entities again. So far, after thousands of tests, I have not gotten a single inconsistent entity from the cache.
So can I somehow provoke a load of a presumably cached entity to be inconsistent with the entity in the datastore?
There are a lot of possible reasons for this. If you're making changes in a single request, you're probably seeing the session cache in operation:
https://github.com/objectify/objectify/wiki/Caching
If you're making queries in many requests, you may be seeing the results of eventual consistency:
https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/
Perhaps you are seeing session cache contamination because you don't have the ObjectifyFilter installed? Recent versions of Objectify give you a nasty warning if you don't, but maybe you're running an old version?
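If you suspect a contaminated session cache and cannot change the filter setup right away, a rough stopgap sketch (not a substitute for installing ObjectifyFilter): ofy().clear() drops the request-local session cache in reasonably recent Objectify versions, and the Group entity and groupId below are made-up placeholders.

    import static com.googlecode.objectify.ObjectifyService.ofy;

    public class GroupLoader {

        // Re-load after dropping the session cache, so the load goes to the
        // datastore/memcache instead of returning a possibly stale cached copy.
        public Group reloadFresh(long groupId) {
            ofy().clear();
            return ofy().load().type(Group.class).id(groupId).now();
        }
    }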
I cache some data in Redis and read it from Redis if it exists; otherwise I read the data from the database and write it to Redis.
I find that there are several ways to update Redis after updating the database. For example:
1. Set the affected keys in Redis to expire.
2. Update Redis immediately after updating the database.
3. Put the data in a message queue (MQ) and use a consumer to update Redis.
I'm a little confused and don't know how to choose.
Could you tell me the advantages and disadvantages of each approach? It would also help if you could suggest other ways to update Redis, or recommend some blog posts about this problem.
The actual data store and the cache should be synchronized using the third approach you've already described in your question.
As you add data to your definitive store (i.e. your SQL database), you need to enqueue this data to some service bus or message queue, and let some asynchronous service do the whole synchronization using some kind of background process.
You don't want to get into these situations (which arise when not using a service bus and an asynchronous service):
Making your requests or processes slower, because the user needs to wait until the data is stored in both your database and your cache.
Risking a failure during the caching step with no retry policy (retries are usually a built-in feature of a service bus or of many message queues). Such a failure can also end up in partial or complete cache corruption, and you won't be able to automatically and easily schedule a task to fix the situation.
Using Redis key expiration is a good idea. Since Redis can expire keys using its built-in mechanism, you shouldn't implement key expiration in the background process itself. If a key exists, it's because it's still valid.
By the way, you won't always be in the case where an unexpired key means the value shouldn't be overwritten; that may depend on your actual domain.
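To make the queue-driven approach more concrete, a hedged Java sketch using Jedis; the message type, key naming, host, and TTL below are placeholders for whatever bus/queue and domain model you actually use:

    import redis.clients.jedis.Jedis;

    public class RedisCacheUpdater {

        private static final int TTL_SECONDS = 3600;   // expiration as a safety net
        private final Jedis jedis = new Jedis("localhost", 6379);

        // Hypothetical message describing a change already committed to the database.
        public static class UserChangedMessage {
            public long userId;
            public String serializedUser;   // e.g. JSON produced by the writer
        }

        // Called by your MQ / service-bus consumer for each committed change.
        public void onMessage(UserChangedMessage msg) {
            // SETEX writes the value and (re)sets the TTL in one call, so even a
            // missed update eventually ages out of the cache.
            jedis.setex("user:" + msg.userId, TTL_SECONDS, msg.serializedUser);
        }
    }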
You can create an API to interact with your Redis server, then use SQL CLR to call that API.
So I have been reading a lot of documentation on HRD and NDB lately, yet I still have some doubts regarding how NDB caches things.
Example case:
Imagine a case where a user writes data and the app needs to fetch it immediately after the write. E.g. a user creates a "Group" (similar to a Facebook/LinkedIn group) and is redirected to the group immediately after creating it. (For now, I'm creating a group without assigning it an ancestor.)
Result:
When I test this sort of functionality locally (with high replication enabled), the immediate fetch of the newly created group fails: None is returned.
Question:
Having gone through the High Replication docs and Google I/O videos, I understand that there is a higher write latency; however, shouldn't NDB caching take care of this? I.e. a write is cached and then asynchronously written to disk, so an immediate read should be served from the cache and there should be no problem. Do I need to enforce some other settings?
Pretty sure you are running into the HRD feature where queries are "eventually consistent". NDB's caching has nothing to do with this behavior.
I suspect it might be because of the redirect that the NoneType is returned.
https://developers.google.com/appengine/docs/python/ndb/cache#incontext
The in-context cache persists only for the duration of a single incoming HTTP request and is "visible" only to the code that handles that request. It's fast; this cache lives in memory. When an NDB function writes to the Datastore, it also writes to the in-context cache. When an NDB function reads an entity, it checks the in-context cache first. If the entity is found there, no Datastore interaction takes place.
Queries do not look up values in any cache. However, query results are written back to the in-context cache if the cache policy says so (but never to Memcache).
So you are writing the value to the in-context cache and then redirecting; the read then fails because the redirect is a different HTTP request, and so it gets a different (empty) in-context cache.
I'm reaching the limit of my knowledge here, but I'd initially suggest that you try performing the create in a transaction and redirect only when it completes successfully.
https://developers.google.com/appengine/docs/python/ndb/transactions
Also, when you put the group model into the datastore you'll get a key back. Can you pass that key (via urlsafe, for example) to the redirect? Then you'll be guaranteed to retrieve the data, since you have its explicit key. You can't have its key if it's not in the datastore, after all.
I'd also suggest trying it as-is on the production server; behaviour can be very different locally and in production.
I am having the following problem. I am now using the low-level Google Datastore API rather than JDO; that way I should be in a better position to see exactly what is happening in my code. I am writing an entity to the datastore and shortly thereafter reading it from the datastore, using Jetty and Eclipse. Sometimes the written entity is not being read. This would be a real problem if it were to happen in production code. I am using the 2.0 RC2 API.
I have tried this several times; sometimes the entity is retrieved from the datastore and sometimes it is not. I am doing a simple query on the datastore just after committing a write transaction. (If I run the code through the debugger, things run slowly enough that the entity has a chance of being read back on the second pass.)
Any help with this issue would be greatly appreciated.
Regards,
The development server has the same consistency guarantees as the High Replication datastore on the live server. A "global" query uses an index that is only guaranteed to be eventually consistent with writes. To perform a query with strongly consistent guarantees, the query must be limited to an entity group, using an "ancestor" key.
A typical technique is to group data specific to a single user in a group, so the user can see changes to queries limited to the user's group with strong consistency guarantees. Another technique is to use fancier client logic to update the client's local view as soon as the change is submitted, so the user sees the change in the UI immediately while the update to the global index is in progress.
See the docs on queries and transactions.
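Since the question uses the low-level Java API, here is a minimal sketch of the ancestor-query technique described above; the "Board" and "Message" kinds and the property name are made up for illustration:

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.datastore.Query;

    public class AncestorQueryExample {

        public void writeAndReadBack() {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

            // Parent key defining the entity group.
            Key boardKey = KeyFactory.createKey("Board", "general");

            // Write the child entity into that group.
            Entity message = new Entity("Message", boardKey);
            message.setProperty("text", "hello");
            ds.put(message);

            // An ancestor query is strongly consistent, so it sees the put above;
            // a global query (no ancestor) is only eventually consistent.
            Query q = new Query("Message").setAncestor(boardKey);
            for (Entity e : ds.prepare(q).asIterable()) {
                System.out.println(e.getProperty("text"));
            }
        }
    }

A plain get() by key is also strongly consistent, so fetching the entity back by the key returned from put() is another way to reliably read your own write.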
I am using Hibernate in an Eclipse RAP application. I have database tables mapped to classes with Hibernate, and these classes have properties that are fetched lazily (if they weren't fetched lazily, I would probably end up loading the whole database into memory on my first query). I do not synchronize database access, so there are multiple Hibernate Sessions for the users, and I let the DBMS handle transaction isolation. This means different instances of fetched data will belong to different users. There are certain pieces of data that, when a user changes them, I would like to have updated across multiple users. Currently I am thinking about using Hibernate's session.refresh(object) in these cases to refresh the data, but I'm unsure how this will impact performance when refreshing multiple objects, or whether it's the right way to go.
I hope my problem is clear. Is my approach to the problem OK, is it fundamentally flawed, or am I missing something? Is there a general solution for this kind of problem?
I would appreciate any comments on this.
The general solution is
to have transactions as short as possible
to link the session lifecycle to the transaction lifecycle (this is the default: the session is closed when the transaction is committed or rolled back)
to use optimistic locking (versioning) to avoid two transactions updating the same object at the same time.
If each transaction is very short and transaction A updates some object from O to O', then a concurrent transaction B will only see O until it commits or rolls back, and any transaction started after A commits will see O', because a new session starts with each transaction.
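A minimal sketch of the optimistic-locking part, assuming JPA annotations on a hypothetical Document entity: Hibernate will then fail a conflicting update with a version-conflict exception (StaleObjectStateException / OptimisticLockException) instead of silently overwriting another user's change.

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Document {

        @Id
        private Long id;

        private String title;

        // Hibernate increments this column on every update and adds
        // "where version = ?" to the UPDATE; if another transaction changed
        // the row in the meantime, the update matches zero rows and a
        // version-conflict exception is raised.
        @Version
        private int version;

        // getters and setters omitted
    }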
We maintain an application that does exactly what you are trying to accomplish. Yes, every session.refresh() will hit the database, but since all sessions will refresh the same row at the same time, the DB server will answer all of these queries from memory.
The only thing that you still need to solve is how to propagate the information that something has changed and needs reloading to all the other sessions, possibly even to sessions on a different host.
For our application, we have about 30 users on RCP and 10-100 users on RAP instances that all connect to the very same DB backend (though through pgpool). We use a small network service that every runtime connects to; when a transaction commits, the application tells this change service that "row id X of table T" has changed and this is then propagated to all other "change subscribers", even across JVMs.
But: make sure that session.refresh() is called within the Thread that belongs to that session, possibly the RAP-Display thread. Do not call refresh() from Jobs or other unrelated threads.
As long as you don't have a large number of users updating large numbers of rows in a short time, I guess you won't have to worry about performance.
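A rough sketch of that refresh step, assuming a hypothetical callback wired to the change-notification service described above; Display.asyncExec keeps session.refresh() on the UI thread that owns the session, as recommended:

    import org.eclipse.swt.widgets.Display;
    import org.hibernate.Session;

    public class EntityRefresher {

        // Hypothetical callback invoked when the change service reports that a
        // row this UI is displaying was updated by another user/session.
        public void onRowChanged(final Display display, final Session session, final Object entity) {
            display.asyncExec(new Runnable() {
                public void run() {
                    // Re-read the row's current state from the database into this
                    // session's managed copy, on the thread that owns the session.
                    session.refresh(entity);
                    // ... update widgets from the refreshed entity here ...
                }
            });
        }
    }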