GAE datastore auto-built indexes don't contain imported data - google-app-engine

When I was merging two GAE instances, I took an export of the Datastore data from the first and imported it into the second.
All predefined indexes worked fine and I could find the imported entities, but when I searched using auto-built indexes, no values were returned.
The first time I did the import using the Admin UI, then I tried "gcloud datastore import", yet I got the same result.
Reading an imported entity by key and writing it back refreshed that entity's indexes, but doing this for every imported entity would cost a lot of money.
Any advice on how to refresh the auto-built indexes? According to the documentation, they should be refreshed automatically.
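To illustrate, a minimal sketch of that read-and-rewrite workaround with the low-level Java Datastore API (the kind name "Item" is a placeholder; as noted, this costs a read and a write per entity):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

public class TouchEntities {
    // Re-save every entity of a kind unchanged so its index rows are rebuilt.
    public static void touchAll(String kind) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        for (Entity e : ds.prepare(new Query(kind)).asIterable()) {
            ds.put(e); // rewriting the unchanged entity refreshes its index entries
        }
    }
}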

I have tried to replicate your scenario and the indexes are built automatically. If you are still observing that built-in indexes are not automatically built after an import, you might want to file a bug at Issue Tracker so this can be investigated further.
You will be charged the same amount for your built-in indexes regardless of whether the rebuild is triggered manually or automatically, as both involve the same reads and writes and take up the same space.

Related

Google App Engine index inconsistent hours after put

I'm having a consistency problem on App Engine that I'm not sure how to address. I performed a query, selected an entity from the result, edited it and saved it, all using the low-level Java API.
Performing an entity key query on the edited item returns the updated version. However, performing an indexed query by a certain property of this entity returns the old version.
I originally considered this an occasionally expected "eventually consistent" HRD issue. However, it is now hours after the original save, and this is still the case.
Ridiculously enough (and perhaps not so surprisingly), this occurs even in the Datastore Viewer in the App Engine console: a GQL query returns a row with stale data, while selecting the entity opens the updated data.
What can I do? This sounds like an index bug. Is there a way to "refresh" the Index?

GAE Golang - queries not returning data right after it is saved

I am writing a Google App Engine Go application and I'm having a problem testing some functionality. Here is some sample code. The problem is as follows:
I create a new item and save it to datastore
I make a search query for that item immediately after (for example, getting all items in the namespace)
The item is not there
If I query for the item later (say, on subsequent page loads), the item is found normally. I understand this behaviour might be intentional for apps deployed on GAE, but I would expect the local datastore to be immediately ready for queries.
Is there any way I can force the datastore to apply all pending writes so that such queries return fresh results?
This is called eventual consistency, and it's a feature of App Engine's Datastore.
You can use a get method instead of a query to test if an entity has been saved.
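As a rough sketch of that suggestion (shown with the low-level Java API, since the workaround below is Java anyway; the "Item" kind and "name" property are made up):

import java.util.List;
import com.google.appengine.api.datastore.*;

public class GetVsQuery {
    static void demo() throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        Entity item = new Entity("Item");
        item.setProperty("name", "example");
        Key key = ds.put(item);

        // A get by key is strongly consistent: it always sees the write.
        Entity fetched = ds.get(key);

        // A global query may not return the new entity until the index catches up.
        List<Entity> results = ds.prepare(new Query("Item"))
                .asList(FetchOptions.Builder.withDefaults());
    }
}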
In Java we can set the desired behavior of a local datastore by changing a run parameter:
By default, the local datastore is configured to simulate the consistency model of the High Replication Datastore, with the percentage of datastore writes that are not immediately visible in global queries set to 10%. To adjust this level of consistency, set the datastore.default_high_rep_job_policy_unapplied_job_pct system property with a value corresponding to the amount of eventual consistency you want your application to see.
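For Java unit tests specifically, the same setting is exposed programmatically through the App Engine testing utilities. A sketch, assuming the appengine-testing jar is on the classpath:

import com.google.appengine.tools.development.testing.LocalDatastoreServiceTestConfig;
import com.google.appengine.tools.development.testing.LocalServiceTestHelper;

// 100% unapplied jobs makes every global query maximally stale, which is
// useful for flushing out eventual-consistency bugs in tests.
LocalServiceTestHelper helper = new LocalServiceTestHelper(
        new LocalDatastoreServiceTestConfig()
                .setDefaultHighRepJobPolicyUnappliedJobPercentage(100));
helper.setUp();
// ... exercise the code under test ...
helper.tearDown();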
I could not find something similar for Go.

Preventing duplicates with MapReduce to BigQuery pipeline

I was reading the answer by Michael to this post here, which suggests using a pipeline to move data from Datastore to Cloud Storage and on to BigQuery.
Google App Engine: Using Big Query on datastore?
I want to use this technique to append data to a BigQuery table. That means I have to have some way of knowing whether the entities have already been processed, so they don't get repeatedly submitted to BigQuery during MapReduce runs. I don't want to rebuild my table each time.
The way I see it, I have two options. I can put a flag on the entities, update it when each entity is processed, and filter on it in subsequent runs - or - I can save each entity to a new table and delete it from the source table. The second way seems superior, but I wanted to ask for opinions or see if there are any gotchas.
Assuming you have some stream of activity represented as entities, you can use query cursors to start one query where a prior one left off. Query cursors are perfect for the type of incremental situation you've described, because they avoid the overhead of marking entities as having been processed.
I'd have to poke around a bit to see if App Engine MapReduce supports cursors (I suspect that it doesn't, yet).
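A rough sketch of that cursor-based incremental pass with the low-level Java API; the "Activity" kind, the "created" sort property, and the checkpoint helpers are made-up names for illustration:

import com.google.appengine.api.datastore.*;

public class IncrementalScan {
    static void runOnce() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Query q = new Query("Activity").addSort("created");

        FetchOptions opts = FetchOptions.Builder.withLimit(500);
        String saved = loadCheckpoint(); // wherever you persist the cursor
        if (saved != null) {
            opts = opts.startCursor(Cursor.fromWebSafeString(saved));
        }

        QueryResultList<Entity> batch = ds.prepare(q).asQueryResultList(opts);
        for (Entity e : batch) {
            process(e); // e.g. hand the row to the BigQuery load step
        }
        // Persist the end cursor so the next run resumes where this one stopped.
        saveCheckpoint(batch.getCursor().toWebSafeString());
    }

    static String loadCheckpoint() { return null; } // stub for the sketch
    static void saveCheckpoint(String cursor) {}    // stub for the sketch
    static void process(Entity e) {}                // stub for the sketch
}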

Sunspot with Solr 3.5. Manually updating indexes for real time search

I'm working with Rails 3 and Sunspot Solr 3.5. My application uses Solr to index user-generated content and makes it searchable by other users. The goal is to allow users to search this data as soon as possible after it is uploaded. I don't know if this qualifies as real-time search.
My application has two models
Posts
PostItems
I index posts by including data from post items, so that when a user searches based on a certain description provided in a post_item record, the corresponding post object is made available in the search.
Users frequently update post_items so every time a new post_item is added I need to reindex the corresponding post object so that the new post_item will be available during search.
So at the moment, whenever I receive a new post_item object I run
post_item.post.solr_index!
which, according to this documentation, instantly updates the index and commits. This works, but is this the right way to handle indexing in this scenario? I read here that calling index while searching may break Solr. Also, frequent manual index calls are not the way to go.
Any suggestions on the right way to do this? Are there alternatives other than switching to ElasticSearch?
Try this gem: https://github.com/bdurand/sunspot_index_queue
You will then be able to batch reindex, say, every minute, and it definitely will not break the index.
If you are just starting out and have the luxury to choose between Solr and ElasticSearch, go with ElasticSearch.
We use Solr in production and have run into many weird issues as the index and search volume grew. Our conclusion was that Solr was built/optimized for indexing huge documents (Word/PDF content) in large numbers (billions?), with the index updated once a day or every couple of days while nobody is searching.
It was the wrong choice for a consumer Rails application where documents are small and modest in number (millions), updates are random and continuous, and search needs to be somewhat real time (a delay of 5-10 seconds is fine).
Some of the tricks we applied to tune the server:
removed all commits (i.e., !) from the Rails code,
used Solr auto-commit every 5/20 seconds,
set up a master/slave configuration,
ran index optimization (on the master) every hour,
and more.
And we still see high CPU usage on the slaves when a commit triggers. As a result, some searches take a long time (over 60 seconds at times).
I also doubt that the batched indexing of the sunspot_index_queue gem can remedy the high CPU issue.

deleting an entity through google app engine's datastore viewer does not remove the entity from the memcache

I've noticed this behavior, which results in subsequent gets still succeeding.
Has anybody else seen this?
I've found a way to remove a single entity from memcache. It's painful, but it works.
Now, I use Java and Objectify, but I hope you'll find this useful whatever environment and language you use.
Go to the page https://console.cloud.google.com/appengine/memcache for your project.
Enter under Namespace the value "ObjectifyCache", or whatever namespace you use.
Under Key Type, select Java String
This is the tricky bit. Under Key you have to enter the "URL-safe key" you'll find on the Datastore Edit page for your entity (https://console.cloud.google.com/datastore/entities/edit).
Click on Find, and hopefully an entity will appear.
Check the box, and click on DELETE
If you now click on Find again, nothing will come up.
If you're using the High Replication Datastore, gets immediately after deletes may succeed and pull up stale results. It takes up to a few seconds for the results of each operation to appear in the results of other operations.
Memcache operates independently of the datastore; some libraries, like Objectify, connect the two. If you're using Objectify to cache entities and you delete something from outside of Objectify (e.g. the Datastore Viewer), you'll have to update your cache yourself. This happens to me occasionally, and I just wipe the whole memcache, as sketched below.
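For the record, that whole-cache wipe is a one-liner with the App Engine Java memcache API; note that clearAll() flushes every namespace, so all other cached data goes too:

import com.google.appengine.api.memcache.MemcacheServiceFactory;

// Clears everything in memcache for the app, including Objectify's cached entities.
MemcacheServiceFactory.getMemcacheService().clearAll();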
You have to find a way to work with this behavior. The simplest (but expensive and really slow) method, for example, would be to wait ten seconds after every datastore operation. Better methods might use a cache to return freshly stored or deleted entities.
