appengine search index reclaim space of deleted documents - google-app-engine

After creating a search index in App Engine and adding a lot of documents, we found that the data had an issue. We deleted all the documents and re-indexed them, but the index's storage usage keeps growing and never decreases when documents are deleted. We are approaching the 10 GB limit and are not sure how to reclaim the space used by the deleted documents.

Have you run the "gcloud datastore cleanup-indexes index.yaml" command after removing the unwanted index entries from the index.yaml configuration file? You can find more detail on the "Cloud Datastore Indexes" documentation page.

Related

GAE datastore auto built indexes don't contain imported data

When I was merging two GAE instances, I exported the Datastore data from the first and imported it into the second.
All predefined indexes worked fine and I could find the imported entities, but queries that rely on auto-built indexes returned no results.
The first time I did the import using the Admin UI; then I tried "gcloud datastore import" and got the same result.
Reading an imported entity by key and writing it back refreshed the indexes for that entity, but doing this for all imported entities would cost a lot.
Any advice on how to refresh the auto-built indexes? According to the documentation, they should be refreshed automatically.
I tried to replicate your scenario, and the indexes were built automatically. If you still observe that built-in indexes are not built automatically after an import, you might want to file a bug at Issue Tracker so this can be investigated further.
You will be charged the same amount for your built-in indexes, irrespective of the index rebuilds being triggered manually or automatically as they will have the same reads and writes and take up the same space.

Segments in Solr

Can someone explain what segments are in Solr?
I have not found a good description online.
I have also seen various segment files in Solr. What are they for?
What happens if I delete one segment file? Will that corrupt the index?
I am using Solr 5.3 (if that makes any difference).
Also, what are tlogs and what is their role?
The segment files in Solr are parts of the underlying Lucene index. You can read about the index format in the Lucene index docs.
In principle, each segment contains a part of the index. New files get created when you add documents, and you can completely ignore them. Only if you have problems with too many open file handles might you want to merge some of them together with the index OPTIMIZE command.
And yes, deleting one of the files will corrupt the index.
The tlog files are transaction logs in which every index-changing transaction (ADD, UPDATE, DELETE) is written down. If anything happens to your Solr server while an open segment is undergoing transactions, the segment file will be corrupt. Solr then uses the tlog to replay the already transmitted transactions and restore the failed segment to its best guess. You can read more on this in this nice post on the Lucidworks blog.
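The recovery idea described above can be sketched as a toy model. This is not Solr's implementation: the names `apply_op` and `recover`, the tuple layout of a log entry, and the use of a plain dict as the "index" are all assumptions made for illustration. It only shows why logging every ADD, UPDATE and DELETE makes it possible to rebuild state that was lost before it reached a segment on disk.

```python
# Toy model of transaction-log recovery (illustrative only, not Solr code).
# A log entry is a hypothetical (operation, doc_id, document) tuple.

def apply_op(state, op, doc_id, doc=None):
    """Apply one logged operation to an in-memory index state."""
    if op in ("ADD", "UPDATE"):
        state[doc_id] = doc
    elif op == "DELETE":
        state.pop(doc_id, None)
    else:
        raise ValueError("unknown operation: %r" % (op,))


def recover(last_committed, tlog):
    """Rebuild the index state from the last durable commit plus the log.

    last_committed: dict of doc_id -> document as of the last commit.
    tlog: operations received after that commit, in arrival order.
    """
    state = dict(last_committed)  # never mutate the committed snapshot
    for op, doc_id, doc in tlog:
        apply_op(state, op, doc_id, doc)
    return state


committed = {"1": "first version"}
log = [
    ("ADD", "2", "second doc"),
    ("UPDATE", "1", "edited version"),
    ("DELETE", "2", None),
]
restored = recover(committed, log)
```

Replaying the log in order matters: the DELETE for document 2 only makes sense after its ADD has been applied.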

How do I check the status of Datastore indexes with a script?

I am writing a script to vacuum/update indexes in Google Datastore using the appcfg script. However, I need a way to know when the indexes are finished being deleted/built.
How do I do this programmatically?
The docs say "You can check the status of the Cloud Datastore instance's indexes from the Indexes page in the Cloud Platform Console." But that's in a GUI -- how do I do it from Python (or bash or Java)?
If I run update_indexes while indexes are being vacuumed, I get "Index being deleted cannot be (re)built until it is completely deleted." So, this is a way to determine index status in a script. Needless to say, it is too dangerous to try to overwrite indexes just to determine the status of indexes. But that indicates that in principle there is some way for scripts to do this.
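One possible approach, sketched here under assumptions rather than taken from the appcfg tooling mentioned in the question, is to poll the index listing from the gcloud CLI and wait until nothing is still building or being deleted. The sketch assumes that "gcloud datastore indexes list --format=json" is available in your SDK version and that each entry carries a "state" field whose value is "READY" once the index is usable; check both against your own output before relying on it. The function names are hypothetical.

```python
import json
import subprocess
import time


def pending_indexes(listing_json):
    """Return the index entries whose state is not READY.

    Assumes the listing is a JSON array of objects with a "state" key.
    """
    return [ix for ix in json.loads(listing_json) if ix.get("state") != "READY"]


def list_indexes_json(project):
    """Fetch the index listing via the gcloud CLI (assumed to be on PATH)."""
    result = subprocess.run(
        ["gcloud", "datastore", "indexes", "list",
         "--project", project, "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def wait_until_ready(fetch, poll_seconds=30, timeout_seconds=3600):
    """Poll fetch() until no index is pending; return False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if not pending_indexes(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A script could then call wait_until_ready(lambda: list_indexes_json("my-project")), where "my-project" is a placeholder project ID, before running the next index update.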

Solr indexes are not visible

We're using a training server to create Solr indexes and uploading them to another Solr server via rsync.
Until now, everything has been fine. Now the index size on one core has increased drastically, and our Solr instances refuse to read the indexes on that core. They also ignore those indexes without raising any exceptions. (We do reload the cores or restart Tomcat after each rsync.)
That is, in the Solr stats numDocs is 0, and /select?q=*:* returns no results.
To rule out index corruption, we have regenerated the indexes a couple of times, but nothing changed. When we try smaller indexes, they are read fine.
Our solrconfig.xml for this core is here: https://gist.github.com/983ebb13c895c9cccbfb
Copying your index using rsync is a bad idea. Your Solr server may not have completed writing files to disc when you initiate the copy operation, and you could end up with corruption. The only safe way to do this is to shut down the master (source index), shut down the slave (destination index), remove the entire content of the slave's index directory, copy the master's index across, and then restart everything.
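The shutdown-and-copy sequence described above could be scripted roughly as follows. This is a sketch under assumptions: `stop_all` and `start_all` are hypothetical callbacks that you would supply to stop and restart both Solr servers (for example by stopping Tomcat), and the directories are assumed to be reachable from the machine running the script.

```python
import shutil
from pathlib import Path


def replace_index(source_dir, dest_dir, stop_all, start_all):
    """Copy an index directory only while both servers are stopped.

    stop_all / start_all are caller-supplied (hypothetical) callables
    that shut down and restart the source and destination servers.
    """
    stop_all()  # nothing may be writing index files during the copy
    try:
        dest = Path(dest_dir)
        if dest.exists():
            shutil.rmtree(dest)  # remove the entire old destination index
        shutil.copytree(source_dir, dest)  # copy the source index across
    finally:
        start_all()  # restart even if the copy failed
```

Removing the destination directory first avoids mixing stale segment files with the newly copied ones.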
A better approach is what was suggested by Peer Allan above - use Solr's built-in replication support. See http://wiki.apache.org/solr/SolrReplication.

How do you reset the Google AppEngine Datastore Quota Exceeded condition?

If you have an application that exceeds the Datastore quota of 1 GB (Master/Slave configuration), how do you clear the condition?
I have an application called "parking-helper" and it shows a "Total Stored Data 100% used" message.
But I deleted 99% of the data a few days ago, cleared the indexes, vacuumed the indexes, cleared the memcache, and waited for at least two reset cycles, yet the datastore still says 100% used.
The Datastore Admin shows that the datastore does not add up to more than a meg.
I know that memcache and indexes also count. It would be nice to see their sizes in the Datastore Admin view (currently we cannot see these values), but I believe I cleared them as well.
How can I reset this application without the drastic measure of deleting and re-creating it, which would mean picking another application ID and re-creating all the other data?
Thanks,
Ralph
It's usually just a matter of waiting. And it looks like your quota is under 100% now.
