I'm running into a strange issue with Hibernate 4 caching in a Grails 2.5.0 application that serves as a platform for data migrated from a legacy system. The migration involves direct inserts and removals of database records (while testing migration SQL). These operations are causing page-load errors because cached data differs from the actual state of the database. Stack traces from a particular failed page load indicate missing records whose IDs are not currently referenced by any foreign key in the database. For example, one page fails to render with the following error:
2018-02-27 10:16:32,495 http-bio-8080-exec-8 | ERROR StackTrace | superAdmin | Full Stack Trace:
org.hibernate.UnresolvableObjectException: No row with the given identifier exists: [com.tlc.worx.company.CompanyQuestion#48466]
at org.hibernate.UnresolvableObjectException.throwIfNull(UnresolvableObjectException.java:68)
at org.hibernate.event.internal.DefaultRefreshEventListener.onRefresh(DefaultRefreshEventListener.java:179)
at org.hibernate.event.internal.DefaultRefreshEventListener.onRefresh(DefaultRefreshEventListener.java:61)
at org.hibernate.internal.SessionImpl.fireRefresh(SessionImpl.java:1121)
at org.hibernate.internal.SessionImpl.refresh(SessionImpl.java:1094)
at org.hibernate.internal.SessionImpl.refresh(SessionImpl.java:1089)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate$10.doInHibernate(GrailsHibernateTemplate.java:342)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate.doExecute(GrailsHibernateTemplate.java:188)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate.refresh(GrailsHibernateTemplate.java:339)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate.refresh(GrailsHibernateTemplate.java:335)
at org.codehaus.groovy.grails.orm.hibernate.HibernateGormInstanceApi.refresh(HibernateGormInstanceApi.groovy:150)
at com.tlc.worx.company.CompanyQuestion.refresh(CompanyQuestion.groovy)
at com.tlc.worx.company.CompanyQuestion$refresh.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:114)
at com.tlc.worx.checklist.CompanyQuestionController$_index_closure1$_closure2$_closure3.doCall(CompanyQuestionController.groovy:49)
at com.tlc.worx.checklist.CompanyQuestionController$_index_closure1$_closure2$_closure3.doCall(CompanyQuestionController.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
--
A search of the database for the referenced record ID reveals it only in a completely unrelated field and in a Tally table. (Note that the page hitting the error has nothing to do with the missing record itself, only with an object that could be related to a CompanyQuestion but is currently NOT related to it.)
I suspected a Hibernate caching issue, especially since this coincided with the removal of records. Furthermore, migrating the same database to another environment for testing does not reproduce the error there--corroborating my theory that this is environment-specific caching. But oddly, restarting Tomcat 7 (the app runs on Tomcat) in the original environment does not make the problem go away. The Hibernate configuration is as follows:
hibernate {
    cache.use_second_level_cache = true
    cache.use_query_cache = false
    cache.region.factory_class = 'org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory' // Hibernate 4
    singleSession = true // configure OSIV singleSession mode
    flush.mode = 'auto' // pre-Hibernate4 default behavior was auto, so we'll stick with that for now. See https://grails.org/2.4.3+Release+Notes
}
It's the restart not making the issue disappear that has me scratching my head--is it normal Hibernate behavior to cache things eternally, even across Tomcat restarts? Am I missing the mark entirely here? My next step is running the application in the first environment with the second-level cache disabled, but I would also like community feedback that I am at least on the right track with this theory--it seems crazy. Any recommendations/feedback appreciated!
Wanted to close this up as I eventually found the issue.
Our application utilises ElasticSearch to compile the data set queried by searches. We have always reindexed ElasticSearch on app restart and had not run into this issue before; however, I learned that the reindex operation does not always do exactly what we want and can actually index old data as new records, leading to duplicates or a mixed bag of good/bad records.
The error in my question occurred when hitting one of the "bad", stale records from a previous reindex. Purging all ElasticSearch indices prior to reindexing resolved the issue.
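For anyone hitting something similar: the purge itself is just index deletion through the standard Elasticsearch REST API before triggering the application's reindex (the index name here is illustrative, not our actual mapping):

curl -XDELETE 'http://localhost:9200/companyquestion'
# ...then restart/reindex the app as usual so the index is rebuilt from the database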
I downloaded Neo4j Desktop (version 1.2.4) on my Mac last week.
Neo4j Browser version: 4.0.3
Neo4j Server version: 3.5.14 (enterprise)
Last week I was using the USING PERIODIC COMMIT command to load in a CSV, as seen below, and it set up my relationships fine. However, as of a couple of days ago, the exact same command gives me the error Executing queries that use periodic commit in an open transaction is not possible. Can someone please explain what has gone wrong?
query:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
"file:/Volumes/Twitter_Dataset/tweets.csv" AS csvLine
MATCH (tweet:Tweet {tweetID: csvLine.tweetID})
MATCH (user:User {username: csvLine.username})
MERGE (user)-[:POSTS]->(tweet);
The short answer:
Prefix your USING PERIODIC COMMIT queries with :auto
Changes were pushed out to provide more context here, so the error message now includes a link for more info about what's going on, as well as the :auto workaround above.
The long answer:
This is related to a recent feature improvement in the Neo4j Browser that has a side effect on USING PERIODIC COMMIT operations. There is a way around it, and a browser update has already been pushed to provide more context and a clear workaround.
The latest round of Neo4j Browser updates includes this change, which uses transactional functions instead of auto-committed transactions, giving queries run through the browser automatic retry capability and a better ability to cope with membership changes when hitting a causal cluster.
The problem is that USING PERIODIC COMMIT needs to be run in an auto-committed transaction. This requires a means to switch whether we're using an auto-committed transaction or not.
You said you're using browser version 4.0.3. I believe that went out yesterday and includes changes that explain what's going on and how to get this working as normal. When encountering the error, you should now see a link with info on the :auto command, mentioning auto-committing transactions. Following the link shows an info card with:
The :auto command will send the Cypher query following it, in an auto committing transaction. In general this is not recommended because of the lack of support for auto retrying on leader switch errors in clusters.
Some query types do however need to be sent in auto-committing transactions, USING PERIODIC COMMIT is the most notable one.
An example is provided on the card for prefixing a USING PERIODIC COMMIT query with :auto to let it execute in an auto-committing transaction.
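Applied to the query from your question, that looks like:

:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
"file:/Volumes/Twitter_Dataset/tweets.csv" AS csvLine
MATCH (tweet:Tweet {tweetID: csvLine.tweetID})
MATCH (user:User {username: csvLine.username})
MERGE (user)-[:POSTS]->(tweet);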
We are experiencing a sudden and strange issue with our Azure Search indexer. We had an index (2015-02-28-preview API version) with a corresponding datasource and indexer based on a table of a SQL Azure v12 database. Change tracking was enabled and changes were properly forwarded to the index. A couple of days ago we noticed that the latest changes in the database were no longer being replicated to the index. Being in a development phase, this index was frequently rebuilt by developers, and nobody noticed exactly when things started to go wrong.
In the Azure portal, the index is displayed in red with an error message stating that we have a duplicate column in the datasource ("Datasource contains multiple columns with the same name 'ProductId'"), which is false. We cleaned the database and tried several things but could not find any duplicate column. As of today, the situation is the following:
1/ After deleting and recreating everything (index, indexer and datasource), the index is filled with the 2000 documents present in the SQL table
2/ The index is full and can be queried without any issue, though it still shows up in red with the "duplicate column" error message
3/ Due to this error, we cannot manually force a new indexing run from the Azure portal
4/ To reflect changes to the indexed table, we have to re-run the script that deletes the index, indexer and datasource and re-creates everything. After running this script we're back at step 1 above (index queryable, but in an error state and impossible to update without drop/recreate).
This problem seems to have occurred all of a sudden, without any change on our side, as if there had been a server-side version change. Is there any newer release of the Azure Search REST API available? Has anyone encountered the same issue, or any hints on things we could check?
Thanks for your help shedding some light on what may be broken here.
Problem fixed, thanks to Eugene's investigation. He discovered a bug in the C# code used to generate the datasource: a casing difference between a "ProductId" column in the database and a "ProductID" field in the index.
We fixed the casing and the issue is gone. Microsoft support said they'll "fix the issue in the coming weeks". The same code used to work properly (and still works properly on the first run), so it looks like the indexing process has somehow become more case-sensitive than before.
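Schematically, the mismatch was of this shape (names as in our setup, shown only for illustration):

Database column (datasource side): ProductId
Index field (index definition):    ProductID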
Over the last few weeks we have repeatedly failed to complete a backup of the datastore using the Datastore Admin tool. We thought the issues had to do with quota errors we were running into, so we switched our application from a free to a paid app, but we still have problems.
Each time, we attempt to back up to the Blobstore and the process never finishes. We see the backup in our Pending Backups list, but it never actually completes. We only have 43MB of data in total right now, so we don't see it as a data-transfer problem. Our default task queue shows two pending tasks: one is a call to /_ah/mapreduce/controller_callback and the other is a call to /_ah/mapreduce/worker_callback.
The worker_callback racks up its retry count, and the only error clue we have is the Previous Run tab, which shows the last HTTP response code to be 500. There is no error message and nothing shows up in our error logs; it just keeps retrying over and over again.
We've been able to narrow the backup problems down to a specific entity kind in a particular namespace, but we can't figure out why that entity kind fails where the others do not. The major difference is that this entity kind has a large number of embedded entities, but if App Engine is able to read/put those entities, we can't understand why it has problems backing them up. The namespace where the error occurs has the largest amount of data stored for that entity kind compared to our other namespaces.
We think that if we could see what error is occurring in the worker_callback, we might be able to figure out why the backup is failing, or what is wrong with our data that's preventing the backup. Is there something we need to set up or enable through settings/configuration files to get more detailed information on the backup? Or is there some other avenue we should explore to investigate and fix this problem?
I should mention we are using the Java SDK as well as Objectify V3 to work with the data store. We are also backing up data to the Blobstore.
Thank you.
Well, with the App Engine team's help we figured out what the problem was and worked around the issue. I want to give details in case anyone else runs into this problem.
From issue 8363, the App Engine team indicated that from their logs they could see the MapReduce job failed because of the large number of properties our entity kind had. The specific entity kind causing the failure had a large number of variable properties, which generated errors when MapReduce tried to write out a schema. They indicated that the solution on their end was to have the backup ignore entities like this so that it completes successfully.
What we did to work around the issue and make the backup work was to change how we told Objectify to store our data. The large number of properties was being created by our use of the @Embedded annotation on a HashMap class member field. Since @Embedded breaks classes down into individual components, it was generating a large number of properties. We switched the member field to @Serialized and then ran a conversion process to move the data to the new serialized property. This made backup/restore work again.
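For illustration, the change was of this shape (a minimal sketch assuming Objectify V3's annotations; the entity and field names are made up, not our real model):

import com.googlecode.objectify.annotation.Serialized;
import javax.persistence.Id;
import java.util.HashMap;

public class ChecklistEntry {
    @Id Long id;

    // Before: @Embedded flattened every map entry into its own datastore
    // property, which is what broke the backup's schema generation.
    // @Embedded HashMap<String, String> attributes = new HashMap<String, String>();

    // After: @Serialized stores the whole map as a single Blob property.
    @Serialized HashMap<String, String> attributes = new HashMap<String, String>();
}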
You can read more about the differences between embedded and serialized on Objectify's website.
snielson, would you mind opening an issue on our public issue tracker here? Remember to add your Application ID so we can further debug this specific scenario.
Thanks!
Using a StreamingUpdateSolrServer, I used the following algorithm to re-index my huge dataset into SOLR:
StreamingUpdateSolrServer server =
    new StreamingUpdateSolrServer(solrServerUrl, numDocsToAddInBatch, numOfThreads);

for (Item item : items) {                  // for each item in the dataset
    SolrInputDocument document = new SolrInputDocument();
    // ...populate the document's fields from the item...
    server.add(document);
}

// when all items have been queued:
server.commit();
server.optimize();
The problem:
Some of my items are not making it into the SOLR index, but no logs are being generated to tell me what happened.
I was able to find most of the documents, but some were missing. No errors appear in any logs, and I have substantial try/catch blocks with logging around all SolrJ exceptions on the client side.
Verify logging is not being hidden for the SOLR WAR
You will definitely want to verify that the SOLR server log settings are not hiding the fact that documents are failing to be added to the index.
Because SOLR uses the SLF4J API, your SOLR server could be overriding the log settings that would otherwise let you see an error message when a document fails to be indexed.
If you have a custom {solr-war}/WEB-INF/classes/logging.properties, you will need to make sure that its settings are not hiding the error messages.
By default, errors in adding an item should be shown automatically, so if you did not change your SOLR log settings at any point... you should be seeing any indexing errors in your server log file.
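As a hypothetical example of what to look for, a java.util.logging logging.properties along these lines would swallow add failures (the logger names are illustrative):

# root logger only shows SEVERE, and Solr's loggers are silenced entirely
.level = SEVERE
org.apache.solr.level = OFF

If you find anything like that, raise the levels (e.g. .level = INFO) and run the indexing again.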
Troubleshoot why Documents are failing to be indexed
In order to investigate this, it is helpful to run the following verification step any time after the indexing is complete:
Initialize new log log_fromsolr
Initialize new log log_notfound
For each Item…
-->Search SOLR for the item. If SOLR has the document, log each of the item's fields on a single line in log_fromsolr. This should include the uniqueKey for your document if you have one.
-->If the document cannot be found in SOLR for this item, write a line to log_notfound with all the fields of the object from the database, again supplying the uniqueKey as the first field.
Once the verification step has completed, the log_notfound log contains a list of all documents that failed to be added to the index.
You can use the log_fromsolr log to compare the document fields for an item that made it into the index against one that did not. A rough sketch of this verification pass is below.
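This sketch uses the same era of SolrJ as StreamingUpdateSolrServer; the Item type, loadAllItems() and the uniqueKey field name "id" are placeholders for your own code, and exception handling is omitted for brevity:

SolrServer solr = new CommonsHttpSolrServer(solrServerUrl);
PrintWriter logFromSolr = new PrintWriter(new FileWriter("log_fromsolr.txt"));
PrintWriter logNotFound = new PrintWriter(new FileWriter("log_notfound.txt"));

for (Item item : loadAllItems()) {                        // every row in the database
    QueryResponse rsp = solr.query(new SolrQuery("id:" + item.getId()));
    if (rsp.getResults().getNumFound() > 0) {
        // found in SOLR: log the indexed fields on one line, uniqueKey first
        logFromSolr.println(item.getId() + "|" + rsp.getResults().get(0));
    } else {
        // missing from SOLR: log the database fields on one line, uniqueKey first
        logNotFound.println(item.getId() + "|" + item);
    }
}
logFromSolr.close();
logNotFound.close();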
Verify it is not an intermittent issue
Sometimes it is not the same items that fail to be added to the index on each indexing run.
If you find objects in the log_notfound log, back up the current notfound log and run the indexing process again from scratch. Then use a diff tool to see the differences between the first and second notfound logs.
An intermittent problem is evident when you see large numbers of differences in these files (Note: some differences are to be expected if new objects are being created in the database in between the first and second re-indexing).
If your problem is intermittent, it almost certainly points at application code that is not committing its SOLR transactions correctly.
The same documents consistently come up missing each time it indexes
At this point we have to compare the documents that made it into the SOLR index against the documents that did not. Usually a field-by-field comparison of the objects will start turning up suspicious values that may be causing issues when the document is added to the index.
Try eliminating all the suspicious fields and re-indexing the entire thing again, then see if the documents still fail to be indexed. If this works, start re-introducing the fields you removed until you can pinpoint the one causing the issue.
We are using ASP.NET MVC with LINQ to SQL, on Windows Server 2003 and SQL Server 2005. We added some features and tested them all to perfection on our QA box. When we pushed the changes out to the live web server, we also used Red Gate SQL Compare to push the new database changes to the LIVE database. We tested again between the few of us: no problems. Time for bed.
The morning comes, users start hitting the app, and BOOM. We have no idea why this would happen, as we have not been doing any new types of things in the code that we were not doing before. However, we did notice during the SQL Compare sync that the names of all the foreign keys differed between the two databases (not the IDs in the tables): FK_AssetAsset_A0EB67 versus FK_AssetAsset_B67EF8, for example (I don't remember the exact trailing characters from the SQL Compare). We are not sure why, but that is another variable in this problem.
Strangely, once this was all pushed out, we could then replicate the errors on QA--but not before everything was pushed to LIVE.
QA and LIVE databases are on the same SQL Server, but the apps are on different instances of Windows Server 2003.
Errors generated:
Index was outside the bounds of the array.
Invalid attempt to call FieldCount when reader is closed.
Server failed to resume the transaction.
There is already an open DataReader associated with this Command which must be closed first.
A transport-level error has occurred when sending the request to the server.
A transport-level error has occurred when receiving results from the server.
Invalid attempt to call Read when reader is closed.
Invalid attempt to call MetaData when reader is closed.
Count must be positive and count must refer to a location within the string/array/collection. Parameter name: count
ExecuteReader requires an open and available Connection. The connection's current state is connecting.
Anyone have any idea what the heck could have happened?
EDIT: Since we were suddenly able to replicate the errors on QA, it might not be a user-load issue... Needless to say, we all feel really screwed here.
Concurrency always brings bugs out of the woodwork. I'd recommend you check for objects that could be shared among requests (such as static members and singletons) and refactor your code so that as little as possible is shared.
As far as specifics go, for the error "There is already an open DataReader associated with this Command which must be closed first," you may want to try adding MultipleActiveResultSets=True to your connection strings.
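For example (a schematic Web.config entry; the server and database names are placeholders):

<connectionStrings>
  <add name="MyDataContext"
       connectionString="Data Source=myServer;Initial Catalog=myDb;Integrated Security=True;MultipleActiveResultSets=True"
       providerName="System.Data.SqlClient" />
</connectionStrings>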
It sounds like you're crossing the streams a bit and trying to share DataContexts across requests. My suggestion would be to wire in a dependency injection framework that creates a new instance of the dependency for each request.
I use Castle's IoC and wire it into the controller factory so that when it sees a dependency on a repository, it creates a new instance of that repository for each request; a rough sketch of that wiring is below. If you go this route, let me know and I can shoot you a few more resources.
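This is only a sketch, assuming a recent Castle Windsor: the repository types are illustrative, and WindsorControllerFactory is the usual small hand-rolled factory that resolves controllers from the container, not something built in.

// at application startup
IWindsorContainer container = new WindsorContainer();
container.Register(
    Component.For<IAssetRepository>()
             .ImplementedBy<AssetRepository>()
             .LifestylePerWebRequest());   // fresh instance per HTTP request
// (PerWebRequest also needs Windsor's PerWebRequestLifestyleModule in web.config)

// route controller creation through the container
ControllerBuilder.Current.SetControllerFactory(
    new WindsorControllerFactory(container.Kernel));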