I am looking into using Google Datastore for a project of mine. I would like to retrieve all the keys of the entities, but I would like them sorted newest to oldest:
query.select('__key__').order('DateCreated', { descending: true });
query.run(function(err, entities) {
  var keys = entities.map(function(entity) {
    return entity[datastore.KEY];
  });
});
If my entity has a "Date Created" field, can I use that field to sort the results while the query is still considered a "free" operation, meaning it costs me nothing to get the results? Or, if I sort using my own attribute inside the entity, does that then cost me something?
Yes, they're still keys-only queries. From the Pricing and Quota:
Small operations: Unlimited, free.
Small operations include calls to allocate Cloud Datastore IDs or keys-only queries.
I need to find a document in MongoDB using its ID. This operation should be as fast as possible, and it needs to return exactly the document that has the given ID. Is there any way to do that? I am a beginner here, so I would be very thankful for an in-depth answer.
Okay, so you really are a beginner.
The first thing you should know is that getting any kind of record
from a database is done by querying the database, and that is called a
search.
It simply means that when you want any data from your database, the database engine has to search for it using the query you provided.
So whenever you ask the database (using a query) to give you some records, it will perform a search based on the conditions you provide. It doesn't matter whether
you provide a condition on a single unique key or a complex combination of columns or joins across multiple tables, or whether
your database contains no records or billions of records;
it still has to search the database.
As far as I know, the above explanation holds true for pretty much every database.
Now coming to MongoDB.
As explained above, the MongoDB engine also queries the database to get a result.
Now the main question is: how do you get that result fast?
And I think that's what your main concern should be.
So query speed (search speed) mainly depends on two things:
1. The query.
2. The number of records in your database.
1. Query
Here the factors affecting speed are:
a. The nature of the parameters used in the query (indexed or unindexed)
If you use indexed fields in your query, the search will always be a faster operation for the database.
For example, the _id field is indexed by default in MongoDB, so searching a collection by _id alone will always be a fast search (a small sketch of using and creating indexes follows this list of factors).
b. The combination of parameters and operators
This refers to the number of parameters used in the query (the more parameters, the slower the search) and the kind of query operators used (simple query operators return results faster than aggregation pipelines).
c. Read preference
A read preference describes how MongoDB routes read operations to the members of a replica set. It effectively describes how much confidence you want in the data you are getting.
Those are the main factors, but there are other things that matter as well, such as:
the schema of your collection,
your understanding of that schema (specifically the data types of the documents),
your understanding of the query operators you use, for example when to use the $or and $and operators and when to use the $in and $nin operators.
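As a small sketch of the indexing point above, here is what an indexed lookup and a manually created index could look like with PyMongo (the connection string, the database and collection names, and the extra "type" field are only assumptions for illustration):

from pymongo import MongoClient, ASCENDING
from bson.objectid import ObjectId

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
collection = client["mydb"]["items"]  # assumed database and collection names

# _id is indexed by default, so a lookup by _id alone uses that index.
doc = collection.find_one({"_id": ObjectId("89e6dd2eb4494ed008d595bd")})

# For other fields you filter on frequently, create your own index so those
# searches can also use an index instead of scanning the whole collection.
collection.create_index([("type", ASCENDING)])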
2. Number of records in your database.
This matters when you have an enormous amount of data in the database: with a single database server, the more records there are, the slower the search.
In such cases, sharding (clustering) your data across multiple database servers will give you faster search performance.
MongoDB has the mongos component, which routes your query to the right database server in the cluster. To perform such routing it uses the config servers, which store metadata about your collections, their indexes and the shard key.
Hence, in a sharded environment, choosing a proper shard key plays an important role in getting fast query responses.
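Purely as a hedged illustration of what enabling sharding can look like from PyMongo (the mongos address, database name, collection name and shard key below are all invented; in practice you run this against the mongos of an already configured cluster):

from pymongo import MongoClient

client = MongoClient("mongodb://my-mongos-host:27017")  # assumed mongos address

# Enable sharding for the database, then shard the collection on a chosen key.
client.admin.command("enableSharding", "mydb")
client.admin.command("shardCollection", "mydb.items", key={"user_id": "hashed"})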
I hope this gives you a decent idea of how a search is actually affected by these various parameters.
I will improve this answer in the future.
It's pretty straightforward; you can try the following:
var id = "89e6dd2eb4494ed008d595bd";
Model.findById(id, function (err, user) { ... } );
with mongoose:
router.get("/:id", (req, res) => {
if (!mongoose.Types.ObjectId.isValid(req.params.id)) { // checking if the id is valid
return res.send("Please provide valid id");
}
var id = mongoose.Types.ObjectId(req.params.id);
Item.findById({ _id: id })
.then(item=> {
res.json(item);
})
.catch(err => res.status(404).json({ success: false }));
});
We have an application running in Google App Engine and storing data in Google Datastore.
For a given Datastore kind, all our entities have a property type.
We are interested in running a query with an IN query filter to fetch multiple types at once, something like:
type in ['event', 'comment', 'custom']
As there are thousands of entities within this kind, pagination is needed.
The problem we are having is that it is a known limitation of the Datastore that queries with "IN" filters do not support cursors.
Are there sensible ways to get around this limitation?
Using offset would be costly and not performant. We also can't fetch all the entities and filter on the client side, since we are building an API and don't develop the client ourselves.
Any hint would be really appreciated, thanks!
An IN filter results in individual EQUAL queries, one for each item in the list. This is why such queries do not support cursors: in your case there would be 3 distinct positions in the index after you run the IN query.
Consider instead adding another property to your entity that serves as a flag for this type of API call: its value would be true if the type is in ['event', 'comment', 'custom'], and false otherwise. This flag might even allow you to make the "type" property unindexed, which would be an additional benefit.
With this new indexed property you can use a regular EQUAL filter. It will be faster (1 query instead of 3), and you can use cursors for pagination.
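For illustration, here is a rough sketch of that flag-plus-cursor approach in Python NDB; the Item model, the is_activity flag name and the page size are assumptions, not something from your actual setup:

from google.appengine.ext import ndb

class Item(ndb.Model):
    type = ndb.StringProperty(indexed=False)  # can become unindexed once the flag exists
    is_activity = ndb.BooleanProperty()  # True when type is 'event', 'comment' or 'custom'

def fetch_activity_page(start_cursor=None, page_size=50):
    # A single EQUAL filter, so cursors work for pagination.
    query = Item.query(Item.is_activity == True)
    items, next_cursor, more = query.fetch_page(page_size, start_cursor=start_cursor)
    return items, next_cursor, more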
I am trying to do my reads and writes for GAE as efficiently as possible, and I was wondering which of the following two options is better.
I have a website where users are able to post different things, and right now whenever I want to show all posts by a user I run a query for all posts with that user's user ID and then display them. Would it be better to store all of the post IDs in the user entity and do a get_by_id(post_ID_list) to return all of the posts? Or would the extra space used up not be worth it?
Is there anywhere I can find more information like this to optimize my web app?
Thanks!
The main reason you would want to store the list of IDs is so that you can get each entity separately, for better consistency: gets by ID are consistent with the latest version in the datastore, while queries are eventually consistent.
Check datastore costs and optimize for cost:
https://developers.google.com/appengine/docs/billing
Getting entities by key wouldn't be any cheaper than querying all the posts. The query makes use of an index.
If you use projection queries, you can reduce your costs quite a bit.
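If you only need a few properties of each post, a projection query along these lines (a Python NDB sketch; the Post model and the user_key filter are assumptions) returns just those properties from the index instead of full entities:

from google.appengine.ext import ndb

class Post(ndb.Model):
    author = ndb.KeyProperty()
    title = ndb.StringProperty()
    created = ndb.DateTimeProperty()

user_key = ndb.Key('User', 'some-user-id')  # hypothetical key of the user whose posts we want

# Only the projected properties are returned, read straight from the index.
results = Post.query(Post.author == user_key).fetch(projection=[Post.title, Post.created])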
There are several cases.
First, if you keep track of all the ids of a user's posts, you must use an entity group for consistency. That means the write speed to the datastore would be roughly 1 entity per second, and the cost is 1 read for the object holding the ids plus 1 read per entity.
Second, if you just use a query. This does not need consistency. The cost is 1 read for the query plus 1 read per entity retrieved.
Third, if you query for keys only and fetch the entities afterwards. The cost is 1 read for the query plus 1 small operation per key retrieved; see Keys-Only Queries. This is equal in cost to a projection query.
And if you have many results and use pagination, you will also need Query Cursors. That prevents wasteful usage of the datastore.
The most economical solution is the third case; see Batch Operations.
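A rough illustration of that third case with NDB (the Post model and the user_key filter are assumptions): a keys-only query followed by a batch get.

from google.appengine.ext import ndb

user_key = ndb.Key('User', 'some-user-id')  # hypothetical key of the user whose posts we want

# Keys-only query: the results are billed as small operations.
keys = Post.query(Post.author == user_key).fetch(keys_only=True)

# Batch get of the actual entities in a single call.
posts = ndb.get_multi(keys)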
In case you have a list of ids because they are stored with your entity, a call to ndb.get_multi (in case you are using NDB, but it would be similar with any other framework that uses the memcache to cache single entities) would save you further datastore calls if all (or most) of the entities corresponding to the keys are already in the memcache.
So in the best possible case (everything is in the memcache), the datastore wouldn't be touched at all, while using a query would.
See this issue for a discussion and caveats: http://code.google.com/p/appengine-ndb-experiment/issues/detail?id=118.
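A minimal sketch of that pattern (the user.post_ids list and the Post kind are assumptions):

from google.appengine.ext import ndb

# Build keys from the stored ids and fetch them in one batch; NDB checks its
# in-context cache and memcache before touching the datastore.
keys = [ndb.Key('Post', post_id) for post_id in user.post_ids]
posts = ndb.get_multi(keys)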
I'm putting some data in the datastore via entity.put(), then soon thereafter reading from the datastore (getting data that includes the just put entity) via a .get().
The .get() data is correct, but often the order of it doesn't make sense:
SELECT * FROM entityName
WHERE someThing = 'value'
ORDER BY votes DESC, lastTouchedTimestamp DESC
will return the correct entities (updated to include new data from the aforementioned .put()), but in an order that is incorrect (i.e. the votes and/or lastTouchedTimestamp aren't actually in order).
Pretty new to GAE so sorry if there is some simple thing I'm overlooking.
EDIT/ADDITION:
Each entity has a votes integer. The SELECT should return entities in order of votes, like 10, 8, 7, 7, 1, but instead it sometimes returns 10, 7, 8, 7, 1, for example.
What you're describing is in App Engine terms not a .get() call but a query. Proper .get() calls specify a key and are not subject to this race. (Nor are ancestor queries.) For more background info about this topic, read https://developers.google.com/appengine/docs/python/datastore/overview#Datastore_Writes_and_Data_Visibility
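As a hedged sketch of the strongly consistent alternatives in Python NDB (the MyEntity model, the key id and the parent key are invented for illustration; for the ancestor query to work, the entities must also be written with that parent):

from google.appengine.ext import ndb

class MyEntity(ndb.Model):
    someThing = ndb.StringProperty()
    votes = ndb.IntegerProperty()
    lastTouchedTimestamp = ndb.DateTimeProperty()

# A get by key is strongly consistent.
entity = ndb.Key(MyEntity, 1234).get()  # hypothetical key

# An ancestor query is also strongly consistent.
parent = ndb.Key('Board', 'main')  # hypothetical parent key
results = (MyEntity.query(MyEntity.someThing == 'value', ancestor=parent)
           .order(-MyEntity.votes, -MyEntity.lastTouchedTimestamp)
           .fetch(20))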
You're lucky that you're getting the updated entity in your query results at all -- that's because the entity as it existed before your .put() call still matched the query. You're getting the correct value in the entity because query results (except for projection queries as #tesdal mentioned) are accessed by key; but you're getting the wrong ordering because the ordering is taken from the index.
App Engine has no guarantee concerning index update timing.
In your example it means that index data is 10,7,7,7,1 but the returned results are actual objects (which are updated) so you notice that ordering is off because you expect 8 for one of the entries.
If you use a projection query, you'll see 10,7,7,7,1.
What is the proper way to perform mass updates on entities in a Google App Engine Datastore? Can it be done without having to retrieve the entities?
For example, what would be the GAE equivalent of something like this in SQL:
UPDATE dbo.authors
SET city = replace(city, 'Salt', 'Olympic')
WHERE city LIKE 'Salt%';
There isn't a direct translation. The datastore really has no concept of updates; all you can do is overwrite old entities with a new entity at the same address (key). To change an entity, you must fetch it from the datastore, modify it locally, and then save it back.
There's also no equivalent to the LIKE operator. While wildcard suffix matching is possible with some tricks, if you wanted to match '%Salt%' you'd have to read every single entity into memory and do the string comparison locally.
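As an aside, the usual trick for prefix matching (the equivalent of LIKE 'Salt%') is a pair of inequality filters on the same property; a sketch with the old db API, assuming an Author model:

from google.appengine.ext import db

# Everything whose city starts with 'Salt': >= the prefix, and < the prefix
# followed by the highest possible character.
query = Author.all().filter('city >=', 'Salt').filter('city <', 'Salt' + u'\ufffd')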
So it's not going to be quite as clean or efficient as SQL. This is a tradeoff with most distributed object stores, and the datastore is no exception.
That said, the mapper library is available to facilitate such batch updates. Follow the example and use something like this for your process function:
from mapreduce import operation as op

def process(entity):
    if entity.city.startswith('Salt'):
        entity.city = entity.city.replace('Salt', 'Olympic')
        yield op.db.Put(entity)
There are other alternatives besides the mapper. The most important optimization tip is to batch your updates; don't save back each updated entity individually. If you use the mapper and yield puts, this is handled automatically.
No, it can't be done without retrieving the entities.
There's no such thing as a '1000 max record limit', but there is of course a timeout on any single request - and if you have large amounts of entities to modify, a simple iteration will probably fall foul of that. You could manage this by splitting it up into multiple operations and keeping track with a query cursor, or potentially by using the MapReduce framework.
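A rough sketch of the cursor-based approach, using NDB rather than the older db API (the Author model is assumed); each call processes one batch and hands back a cursor for the next:

from google.appengine.ext import ndb

class Author(ndb.Model):
    city = ndb.StringProperty()

def update_batch(start_cursor=None, batch_size=100):
    # Fetch one page, fix it up locally, write it back in a single batch put.
    authors, next_cursor, more = Author.query().fetch_page(batch_size, start_cursor=start_cursor)
    for author in authors:
        if author.city and author.city.startswith('Salt'):
            author.city = author.city.replace('Salt', 'Olympic')
    ndb.put_multi(authors)
    return next_cursor, more  # if more is True, call again with next_cursor (e.g. from a task)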
You could use the Query class: http://code.google.com/appengine/docs/python/datastore/queryclass.html
query = authors.all().filter('city >=', 'Salt').filter('city <', 'Salt' + u'\ufffd')
updated = []
for record in query.run():
    record.city = record.city.replace('Salt', 'Olympic')
    updated.append(record)
db.put(updated)  # write the changes back in one batch