How can you delete individual indexes on properties in an Entity? - google-app-engine

I know you can delete composite indexes with gcloud datastore indexes cleanup but what about deleting individual indexes on properties in an Entity?
For example, let's say you have been indexing a property on an entity for a while, but then you decide not to anymore and upload a new version of your app that excludes it. I presume the index is still somewhere there in a table somewhere. Is there a way to clear these out?

The index is updated for an entity when you put the entity. You could put all of your entities to clear that index for all of them.
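To make the "index rows are only rewritten on put" behaviour concrete, here is a minimal sketch using a toy in-memory store. ToyIndexedStore and all of its methods are hypothetical names made up for illustration; this is not the Datastore API and says nothing about how Datastore stores indexes internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (an assumption for illustration, NOT the Datastore API):
// index rows for an entity are only rewritten when that entity is put.
class ToyIndexedStore {
    private final Map<String, Map<String, String>> entities = new HashMap<>();
    // Index rows: "property=value" -> keys of the entities that wrote that row.
    private final Map<String, Set<String>> indexRows = new TreeMap<>();

    void put(String key, Map<String, String> properties, Set<String> indexedProperties) {
        // Drop the index rows written by the previous put of this entity.
        for (Set<String> keys : indexRows.values()) {
            keys.remove(key);
        }
        indexRows.values().removeIf(Set::isEmpty);

        entities.put(key, new HashMap<>(properties));
        // Write fresh index rows for the properties that are indexed right now.
        for (Map.Entry<String, String> p : properties.entrySet()) {
            if (indexedProperties.contains(p.getKey())) {
                indexRows.computeIfAbsent(p.getKey() + "=" + p.getValue(), k -> new HashSet<>())
                         .add(key);
            }
        }
    }

    int indexRowCount() {
        return indexRows.size();
    }

    public static void main(String[] args) {
        ToyIndexedStore store = new ToyIndexedStore();
        Map<String, String> props = new HashMap<>();
        props.put("userName", "alice");

        Set<String> indexed = new HashSet<>();
        indexed.add("userName");
        store.put("e1", props, indexed);
        System.out.println(store.indexRowCount()); // 1: the row was written at put time

        // Deciding to stop indexing does nothing by itself; the old row stays
        // until the entity is put again with the property no longer indexed.
        store.put("e1", props, new HashSet<String>());
        System.out.println(store.indexRowCount()); // 0: the re-put cleared the row
    }
}
```

In this sketch the stale index row only goes away on the second put, which mirrors the advice above to put all of your entities again.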

Related

How do I control which shard/split a record goes to in Google Cloud Spanner?

When adding new records to a Cloud Spanner Table, how do I determine which shard/split this row is going to?
This would be especially relevant in adding records with a foreign key value, and I would like to ensure it lives in the same place as the parent row which it references.
You can't. Google Cloud Spanner does all that for you in the background.
Instead of Foreign Keys, Cloud Spanner calls them "Interleaves". When inserting a new record into a table with a foreign key, Spanner will do its best to make sure the new record lives in the same Split as its Interleave parent record. There are corner cases in which this does not happen, but Spanner is constantly re-organizing its splits, so even if a new record does not live in the same Split as its Interleave parent record, it eventually will.
This regular re-organization of splits also implies that even if you could decide which Split a parent record and all its child records live in, Spanner might decide at any time that they are better placed in a completely different Split.

Is query by key faster than query by indexed property in Google Datastore?

Consider the below datastore entity:
public class Employee {
    @Id String id;
    @Index String userName;
}
My understanding is that only those properties which are part of the filter criteria in the queries need to be annotated with @Index. Indexing in datastore is not for performance but for fetching the data.
Should id also be annotated with @Index to query by id? If no, does datastore automatically create indexes for keys?
The @Id annotation makes sure to manage uniqueness, but it has no performance advantage over indexed properties. Is that right?
Will query by id be faster than query by userName in the above example?
1:
No, you don't need to explicitly index it. Datastore uses your key as a primary key for your entities (in the Entities table).
2 & 3:
Querying by primary key is more efficient: you only need a single scan on the primary table, instead of a scan on the index followed by a lookup in the primary table. It also allows you to do a Lookup instead of a query:
Employee e = ofy().load().type(Employee.class).id("<id>").now();
Besides avoiding the query planning and index scan to lookup this Employee, this is Strongly Consistent. If you don't do this, you may write a new Employee but then not actually see them when you query for them.
While Strong Consistency is important from an application correctness point-of-view, it will be slower. In particular, when you do a strongly consistent lookup, Datastore may need to talk to the other replicas (in other data centers) to catch up your entity group.
If you are ok with eventual consistency, you can perform a Lookup with eventual consistency to avoid the index scans and the replica catch up using a read policy. In objectify, this looks like:
Employee e = ofy().consistency(Consistency.EVENTUAL).load()
    .type(Employee.class).id("<id>").now();
Note: This answer talks a lot about indexes and tables. In general I recommend not thinking about Datastore in terms of indexes and tables (since it is not a relational storage system). However, it is implemented on a relational DB, so the terms are useful for answering your questions. This page has a lot of good background.
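As a rough illustration of the "single lookup vs. index scan plus lookup" point above, here is a toy in-memory sketch. ToyEmployeeStore and its reads counter are hypothetical, made up for this example; it only counts steps in the toy model and does not measure Datastore latency or billing.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (an assumption for illustration, NOT Datastore or Objectify):
// a lookup by key touches the primary table once, while a query on an
// indexed property touches the index first and then the primary table.
class ToyEmployeeStore {
    // "Primary table": entity id -> userName.
    private final Map<String, String> userNameById = new HashMap<>();
    // "Index" on userName: userName -> entity id.
    private final Map<String, String> idByUserName = new HashMap<>();
    // Number of storage reads performed so far in this toy model.
    int reads = 0;

    void put(String id, String userName) {
        userNameById.put(id, userName);
        idByUserName.put(userName, id);
    }

    // Lookup by key: one read on the primary table.
    String lookupById(String id) {
        reads++;
        return userNameById.get(id);
    }

    // Query by indexed property: one read on the index,
    // then one read on the primary table to fetch the entity.
    String queryByUserName(String userName) {
        reads++;
        String id = idByUserName.get(userName);
        if (id == null) {
            return null;
        }
        reads++;
        return userNameById.containsKey(id) ? id : null;
    }

    public static void main(String[] args) {
        ToyEmployeeStore store = new ToyEmployeeStore();
        store.put("e1", "alice");

        store.lookupById("e1");
        System.out.println(store.reads); // 1

        store.reads = 0;
        store.queryByUserName("alice");
        System.out.println(store.reads); // 2
    }
}
```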
No, it will be created automatically.
@Id makes sure it's the Key.
I can't find confirmation, but it must be faster. It's also cheaper than a query: 1 read for a get vs 2 reads for a query. See https://cloud.google.com/datastore/docs/pricing
Also, keep in mind that if you decide to add the @Index annotation later, index entries will be created only for new entities; all existing entities will remain unindexed. That means you need to reindex the db (re-save the existing entities), or only new records will be returned from a Query with a filter on this field.
Objectify always does a get by key - if you run a query, it does a keys-only query, then fetches results by id. This works well because it has cache integration and it also means that you get accurate results (as in the data is strongly consistent, even though the query results aren't). You can control this using the .hybrid(boolean) method on a query.
You cannot query by id - you can only get by key. If you want to do that, you need a duplicate indexed field, and to query on that. This is an artifact of how keys work in the datastore.

Why do entities need keys in Appengine datastore

What is the usage of keys in the appengine datastore: I am new to Appengine, any info on it would be great.
Comparison
To keep things simple, let's assume MySQL stores all the rows of a table in a single file. That way, it can find all the rows by scanning that file.
App Engine's datastore (BigTable) does not have a concept of tables. Each entity (~row in MySQL) is stored separately. [It can also have an individual structure (~columns).] Because entities are not connected in any way, there is no "default" method to go through all of them. Each entity needs an ID and must be indexed.
Key Structure
A key consists of:
App ID (the closest thing in MySQL is a database).
Kind (the closest thing in MySQL is a table).
ID or name (the closest thing in MySQL is a primary key).
(Optionally) Parent key (all the above of another entity). (Details omitted for the sake of simplicity.)
Please note that what is meant by the closest thing is conceptual similarity. Technically, these things are not related. In MySQL, databases and tables represent actual storage structures. In BigTable they are just IDs, and the storage is actually flat, i.e. every entity is essentially a file.
In other words, identity-wise, a key is to an entity as the database + table + primary key are to a row in a MySQL table.
Key's Responsibilities
An entity's key:
States what application the entity belongs to.
What kind (class, table) it is of.
By the means of the above and either a numeric key ID or a textual key name, identifies the entity uniquely.
(Optionally) What the parent entity of the entity is. (Details omitted for the sake of simplicity.)
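A minimal sketch of that structure as a plain Java class may help. ToyKey is a hypothetical class written for this explanation, not the App Engine Key class; it only mirrors the components listed above (app ID, kind, numeric ID or textual name, optional parent).

```java
import java.util.Objects;

// Toy model of a key (an assumption for illustration, NOT the App Engine Key class).
final class ToyKey {
    final String appId;   // which application the entity belongs to
    final String kind;    // what kind of entity it is
    final Long id;        // numeric key ID, or null if a name is used
    final String name;    // textual key name, or null if an ID is used
    final ToyKey parent;  // optional parent key, or null

    ToyKey(String appId, String kind, Long id, String name, ToyKey parent) {
        if ((id == null) == (name == null)) {
            throw new IllegalArgumentException("exactly one of id or name must be set");
        }
        this.appId = appId;
        this.kind = kind;
        this.id = id;
        this.name = name;
        this.parent = parent;
    }

    // Path from the topmost ancestor down to this key, e.g. "Company:acme/Employee:42".
    String path() {
        String self = kind + ":" + (id != null ? id.toString() : name);
        return parent == null ? self : parent.path() + "/" + self;
    }

    // Two keys identify the same entity only if every component matches.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ToyKey)) {
            return false;
        }
        ToyKey other = (ToyKey) o;
        return appId.equals(other.appId)
                && kind.equals(other.kind)
                && Objects.equals(id, other.id)
                && Objects.equals(name, other.name)
                && Objects.equals(parent, other.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(appId, kind, id, name, parent);
    }

    public static void main(String[] args) {
        ToyKey company = new ToyKey("my-app", "Company", null, "acme", null);
        ToyKey employee = new ToyKey("my-app", "Employee", 42L, null, company);
        System.out.println(employee.path()); // Company:acme/Employee:42
    }
}
```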
Usage
So that you can retrieve all entities of a kind, App Engine automatically builds indexes. That means App Engine maintains a list of all your entities. More specifically, it maintains a list of your entities' keys.
Complex indexes may be defined to run queries on multiple properties (~columns).
In contrast to MySQL, every BigTable query requires an index. Whenever a query is run, the corresponding index is scanned to find the entities that meet the query's conditions, and then the individual entities are retrieved by key.
A common high-level use is to identify an entity in a URL, as every key can be represented as a URL-safe string. When an entity's key is passed in the URL, the entity can be retrieved unambiguously, as the key identifies it uniquely.
Moreover, retrieving an entity by its key is strongly consistent, as opposed to queries on indexes, which means that when an entity is retrieved by its key, it's guaranteed to be the latest version.
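To illustrate the idea of passing a key in a URL, here is a small sketch that turns a textual key path into a URL-safe token and back using the JDK's Base64 URL encoder. The class ToyUrlSafeKey and the "Kind:id" path format are assumptions made for this example; the real datastore uses its own encoding for URL-safe key strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Toy illustration (an assumption, NOT the datastore's actual key encoding):
// represent a key path as a token that is safe to embed in a URL.
class ToyUrlSafeKey {
    // Encode a key path such as "Company:acme/Employee:42" without '+', '/' or '='.
    static String encode(String keyPath) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(keyPath.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the token back into the original key path.
    static String decode(String token) {
        return new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = encode("Company:acme/Employee:42");
        // The token can be placed in a URL path or query parameter as-is.
        System.out.println("/employees/" + token);
        System.out.println(decode(token)); // Company:acme/Employee:42
    }
}
```

Because the token round-trips to exactly one key path, the handler receiving the URL can identify the entity unambiguously.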
Tips
Every entity stored in BigTable has a key. Such a key may be programmatically created in your application and given an arbitrary key name. If it's not, a numeric ID will be allocated transparently, as the entity is being stored.
Once an entity is stored, its key may not be changed.
The optional parent component might be used to define a hierarchy of entities, but what it's really important for is transactions and strong consistency.
Entities that share a parent are said to belong to the same entity group.
Queries within a group are strongly consistent.
Just to reiterate, retrieving an entity by its key or querying an index by a parent key are strongly consistent. Retrieving entities in other ways (e.g. by a query on a property) is eventually consistent.
Glossary
Entity - a single key-value document.
Eventual consistency - retrieving an entity (often a number of them) without the guarantee that the replication has completed, which may result in some entities being an old version and some being missing, as they have not yet been brought from the server they were stored on.
Key - an entity's ID.
Kind - arbitrary textual name of a class of entities, such as User or Article.
Key ID - a numeric identifier of a key. Usually automatically allocated.
Key name - a textual identifier of a key.
Strong consistency - retrieving an entity in such a way that its latest version is retrieved.
(I intentionally used MySQL in the examples, as I'm much more familiar with it than with any other relational database.)
Please read https://developers.google.com/appengine/docs/java/datastore/#Java_Entities ... you may want to delete your question and ask again after you have studied this documentation section.
(This is meant to help you, not complain.)

Solrcloud duplicate documents with id field

I am using solrcloud-4.3.0 and zookeeper-3.4.5 on a Windows machine. I have a collection whose index has the unique field "id". I observed that there were duplicate documents in the index with the same unique id value. As per my understanding this should not happen, because the purpose of the unique field is to avoid such situations. Can anyone help me figure out what causes this problem?
In the "/conf/schema.xml" file there is an XML element called "<uniqueKey>", which seems to be "id" by default... that is supposed to be your "key".
However, according to the Solr documentation (http://wiki.apache.org/solr/UniqueKey#Use_cases_which_do_not_require_a_unique_key) you do not always need to have a "unique key", if you do not need to incrementally add new documents to an existing index... maybe that is what is happening in your situation. But I also had the impression you always needed a unique ID.
Probably too late to add an answer to this question, but it is also possible to duplicate documents with unique keys/fields by merging indexes with duplicate documents/fields.
Apparently when indexes are merged either via the lucene IndexMergeTool or the solr CoreAdminHandler, any duplicate documents will be happily appended to the index. (as of lucene and solr 4.6.0)
de-duplication seems to happen at retrieval time.
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes

Does Datastore give out deleted keys again?

When I delete an entity, is it possible that Datastore will give out its key again?
The documentation says that Datastore never gives out the same key within the same parent group.
Thanks.
No. Autogenerated keys will never be reused.
