CouchDB - document removed by PUT _deleted attribute is still available

I delete a document in CouchDB by setting the _deleted attribute to true (via a PUT request). The latest revision of the document is deleted, but the previous revision is still available.
And when I pull documents of a specific type from the database, this document still shows up.
How should I delete a document so that it is no longer available?
I use synchronization between CouchDB on the server and PouchDB instances in mobile applications (Ionic).

You need to compact your database. Compaction is the process of removing unused and old data from database or view index files, not unlike VACUUM in an RDBMS. It can be triggered by calling the _compact end-point of a database, e.g. curl -X POST http://192.168.99.100:5984/koi/_compact -H 'Content-Type: application/json'. After that, attempts to access previous revisions of a deleted document should return error 404 with the reason missing.
Note that the document itself is not going to completely disappear; something called a "tombstone" will be left behind. The reason is that CouchDB needs to track deleted documents during replication to prevent accidental document recovery.
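For completeness, here is the same compaction trigger expressed as a small Java sketch. The host, port and database name koi are just the values from the curl example above; substitute your own, and add admin credentials if your server requires them.

import java.net.HttpURLConnection;
import java.net.URL;

public class CompactDatabase {
    public static void main(String[] args) throws Exception {
        // Placeholder host and database name -- same as the curl example above.
        URL url = new URL("http://192.168.99.100:5984/koi/_compact");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        conn.getOutputStream().close();          // empty request body
        // CouchDB should answer 202 Accepted and run compaction in the background.
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}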

Related

Posting large directory of files to SOLR using post tool, how to commit after every file

I am using the Java post tool for Solr to upload and index a directory of documents. There are several thousand documents. Solr only does a commit at the very end of the process, and sometimes things stop before it completes, so I lose all the work.
Does anyone have a technique to fetch the name of each doc and call post on it, so that each document gets its own commit, rather than one large commit of all the docs at the end?
From the help page for the post tool:
Other options:
..
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
This should allow you to use -params "commitWithin=1000" to make sure each document shows up within one second of being added to the index.
Committing after each document is overkill for performance; in any case, it's quite strange that you have to resubmit everything from the start if something goes wrong. I suggest seriously rethinking the indexing strategy you're using instead of looking for a different way to commit.
That said, if you have no other way to change the commit configuration, I suggest configuring autoCommit in your Solr collection/index or using the commitWithin parameter, as suggested by MatsLindh. Just check whether the tool you're using lets you add this parameter.
autoCommit
These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
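If you end up indexing from your own Java code instead of the post tool, SolrJ lets you pass the same commitWithin value with each add. A minimal sketch, assuming a reasonably recent SolrJ (6.x or later for HttpSolrClient.Builder); the URL, collection name and fields are placeholders:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example document");
        // Ask Solr to make the document searchable within 1000 ms,
        // without issuing an explicit commit for every document.
        client.add(doc, 1000);
        client.close();
    }
}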

Solr change schema

I am running Solr 4.1, and I need to change my schema. I modified the schema file, but Solr doesn't seem to pick up the changed schema. I even rebooted my server.
Also, how do I see my schema? I tried curl http://localhost:1080/Mark/names/schema?wt=json and I get
The origin server did not find a current representation
for the target resource or is not willing to disclose that one exists.
I can curl the same endpoint and get a document just fine.
How do I get Solr to pick up the new schema?
I had an error in my code that was indexing the data. I was setting the values for the new fields, but I did not realize that there was another method buried deep in the bowels :) that stripped them off.
Turns out Solr had picked up the changes; I just wasn't sending the data for the new fields.

Solr 4 - Delete fields from FieldsInfo file

I am running Solr 4. I have created a million fields in Solr using a script. I saw GC activity go very high after adding these fields, since every time a searcher is opened these fields are loaded.
Now I want to go back to the state my Solr cluster was in before adding those fields. Even though I delete the documents that have those fields, the cluster is not coming back to what it was, because the fields are not getting deleted from the FieldsInfo file.
Is there a way to explicitly tell Solr to delete the fields from the FieldsInfo file?
There is a documented Schema API that can delete a field. However, I don't know whether it is already available for Solr 4; you should try it and see if it works.
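If your Solr version (or one you can upgrade to) does ship the Schema API, the delete-field operation looks roughly like this with SolrJ. It requires a managed schema, and the URL and field name below are placeholders:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class DeleteFieldExample {
    public static void main(String[] args) throws Exception {
        SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        // Remove a single field definition from the managed schema.
        SchemaRequest.DeleteField delete = new SchemaRequest.DeleteField("obsolete_field");
        SchemaResponse.UpdateResponse response = delete.process(client);
        System.out.println("Schema API status: " + response.getStatus());
        client.close();
    }
}

Even then, be aware that deleting a field from the schema does not rewrite existing segments; the entries only disappear from the index files once the affected documents are reindexed or deleted and the old segments are merged away.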

Hibernate4, Grails 2.5 -- cached data persists between restarts?

I'm running into a strange issue with Hibernate4 caching in a Grails 2.5.0 application that is serving as a platform for data migrated from a legacy system. The migration involves direct database inserts and removals (while testing migration SQL) of database records. These operations are causing pageload errors in the system because cached data is different from the actual state of the database. Stacktrace errors on a particular failed page load indicate missing records whose IDs are not currently referenced by anything in the database via foreign key. For example, one page fails to render with the following error:
2018-02-27 10:16:32,495 http-bio-8080-exec-8 | ERROR StackTrace | superAdmin | Full Stack Trace:
org.hibernate.UnresolvableObjectException: No row with the given identifier exists: [com.tlc.worx.company.CompanyQuestion#48466]
at org.hibernate.UnresolvableObjectException.throwIfNull(UnresolvableObjectException.java:68)
at org.hibernate.event.internal.DefaultRefreshEventListener.onRefresh(DefaultRefreshEventListener.java:179)
at org.hibernate.event.internal.DefaultRefreshEventListener.onRefresh(DefaultRefreshEventListener.java:61)
at org.hibernate.internal.SessionImpl.fireRefresh(SessionImpl.java:1121)
at org.hibernate.internal.SessionImpl.refresh(SessionImpl.java:1094)
at org.hibernate.internal.SessionImpl.refresh(SessionImpl.java:1089)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate$10.doInHibernate(GrailsHibernateTemplate.java:342)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate.doExecute(GrailsHibernateTemplate.java:188)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate.refresh(GrailsHibernateTemplate.java:339)
at org.codehaus.groovy.grails.orm.hibernate.GrailsHibernateTemplate.refresh(GrailsHibernateTemplate.java:335)
at org.codehaus.groovy.grails.orm.hibernate.HibernateGormInstanceApi.refresh(HibernateGormInstanceApi.groovy:150)
at com.tlc.worx.company.CompanyQuestion.refresh(CompanyQuestion.groovy)
at com.tlc.worx.company.CompanyQuestion$refresh.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:114)
at com.tlc.worx.checklist.CompanyQuestionController$_index_closure1$_closure2$_closure3.doCall(CompanyQuestionController.groovy:49)
at com.tlc.worx.checklist.CompanyQuestionController$_index_closure1$_closure2$_closure3.doCall(CompanyQuestionController.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
--
A search of the database for the record ID being referenced finds it only in a completely unrelated field and a Tally table.
(Note that the page hitting the error has nothing to do with the missing record itself, only with an object that could be related to a CompanyQuestion but is currently NOT related to it.)
I suspected a Hibernate caching issue, especially since this coincided with the removal of records. Furthermore, migrating the same database to another environment for testing does not give rise to the same error in the new environment, corroborating my theory that this is related to environment-specific caching. But oddly, a Tomcat7 restart (the app runs on Tomcat) within the original environment does not make the problem go away. The Hibernate configuration is as follows:
hibernate {
    cache.use_second_level_cache = true
    cache.use_query_cache = false
    cache.region.factory_class = 'org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory' // Hibernate 4
    singleSession = true // configure OSIV singleSession mode
    flush.mode = 'auto' // pre-Hibernate4 default behavior was auto, so we'll stick with that for now. See https://grails.org/2.4.3+Release+Notes
}
It's the restart not making the issue disappear that has me scratching my head: is it normal Hibernate behavior to cache things for eternity, even across Tomcat restarts? Am I missing the mark entirely here? My next step is to run the application in the first environment with the second-level cache disabled, but I would also like community feedback that I am at least on the right track with my theory, because it seems crazy. Any recommendations/feedback appreciated!
Wanted to close this up as I eventually found the issue.
Our application uses ElasticSearch to compile the set of data that searches query against. We have always reindexed ElasticSearch on app restart and had not run into this issue before. However, I learned that the reindex operation does not always do exactly what we want and can actually index old data as new records, leading to either duplicates or a mixed bag of good/bad records.
The error in my question occurred when hitting one of the "bad", stale records from a previous reindex. Purging all ElasticSearch indices prior to reindexing resolved the issue.
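For anyone hitting the same thing: "purging" here simply means dropping the stale indices before kicking off the reindex. A rough Java sketch of doing that over Elasticsearch's REST API; the host and index name are placeholders, and your own setup (e.g. a Grails plugin) may offer another way to do the same thing:

import java.net.HttpURLConnection;
import java.net.URL;

public class PurgeElasticsearchIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder host and index name -- point this at each stale index.
        URL url = new URL("http://localhost:9200/worx");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("DELETE");
        // Elasticsearch answers 200 with {"acknowledged":true} once the index is gone;
        // reindex from the database afterwards so only current records come back.
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}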

SOLR 3.6.0, After a full re-index of a bunch of entities, some of my items are not making it into the SOLR index, but no logs are being generated

Using a StreamingUpdateSolrServer, I followed the algorithm below to re-index my huge dataset into SOLR.
Initialize StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(solrServerUrl, numDocsToAddInBatch, numOfThreads);
For each Item…
-->Create document
-->server.add(document)
When all finished,
server.commit();
server.optimize();
The problem:
Some of my items are not making it into the SOLR index, but no logs are being generated to tell me what happened.
I was able to find most of the documents, but some were missing. There were no errors in any logs, and I have substantial try/catch blocks with logging around all SolrJ exceptions on the client side.
Verify logging is not being hidden for the SOLR WAR
You will definitely want to verify that the SOLR server log settings are not hiding the fact that documents are failing to be added to the index.
Because SOLR uses the SLF4J API, your SOLR server could be overriding the log settings that would otherwise let you see an error message when a document fails to be indexed.
If you have a custom {solr-war}/WEB-INF/classes/logging.properties, you will need to make sure that the settings are not hiding the error messages.
By default, errors in adding an item should be shown automatically. So if you did not change your SOLR log settings at any point... you should be seeing any indexing errors in your server log file.
Troubleshoot why Documents are failing to be indexed
To investigate this, it is helpful to run the following verification step any time after the indexing is complete:
Initialize new log log_fromsolr
Initialize new log log_notfound
For each Item…
-->Search SOLR for the item. If SOLR has the object, log the item's fields into log_fromsolr on a single line. This should include the uniqueKey for your document if you have one.
-->If the document cannot be found in SOLR for this item, write a line to log_notfound with all the fields from the database object, again supplying the uniqueKey first.
Once the verification step has completed, log_notfound contains a list of all documents that failed to be added to the index.
You can use log_fromsolr to compare the document fields of an item that made it into the index with those of one that did not.
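A rough SolrJ 3.6 sketch of that verification pass; the id list, the uniqueKey name "id", and the use of System.out in place of the two log files are all placeholders for your own code:

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class VerifyIndex {
    // "ids" holds the uniqueKey of every item you expected to index,
    // pulled from your database.
    public static void verify(String solrServerUrl, List<String> ids) throws Exception {
        SolrServer solr = new HttpSolrServer(solrServerUrl);
        for (String id : ids) {
            SolrQuery query = new SolrQuery("id:" + id);          // assumes "id" is the uniqueKey
            SolrDocumentList results = solr.query(query).getResults();
            if (results.getNumFound() > 0) {
                System.out.println("FOUND   " + results.get(0));  // stands in for log_fromsolr
            } else {
                System.out.println("MISSING " + id);              // stands in for log_notfound
            }
        }
    }
}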
Verify it is not an intermittent issue
Sometimes it might be the case that it is not the same Items failing to be added to the index each time you try to index.
If you find objects in the log_notfound log, you will want to back up the current notfound log and run the indexing process again from scratch. Use a diff tool to see the differences between the first notfound log and the second notfound log.
An intermittent problem is evident when you see large numbers of differences in these files (Note: some differences are to be expected if new objects are being created in the database in between the first and second re-indexing).
If your problem is intermittent, it most certainly points at the application code with respect to your SOLR transactions not being committed correctly.
The same documents consistently come up missing each time it indexes
At this point we have to compare documents that are found in the SOLR index with documents that are not getting into the Lucene index. Usually a field-by-field comparison of the objects will start turning up suspicious values that may be causing issues when the document is added to the index.
Try eliminating all the suspicious fields and then re-indexing the entire thing again. See if the documents are still failing to be indexed. If this works, start re-introducing the fields you removed and see if you can pinpoint the one that is causing the issue.
