Google App Engine Datastore Queries and Getting Relative Rank - google-app-engine

Say I have the following model in Google App Engine's datastore:
class Post(ndb.Model):
    name = ndb.StringProperty()
    votes = ndb.IntegerProperty()
I want to first query Post to get all posts ordered by number of votes (highest to lowest) and then find the ranking/place of the Post with name "John".
So if the Post by John has the 2nd most votes, he would be in 2nd place.
Is there an easy way to do this?
Right now I'm doing something like:
posts = Post.query().order(-Post.votes)
for index, post in enumerate(posts):
    if post.name == "John":
        print(index + 1)  # enumerate is 0-based, so the place is index + 1
I imagine this is not a very good/efficient way to do this.

If the total number of posts is very high and only a few of them are named John, it might be more efficient to do something like this:
total_posts = len(Post.query().fetch(keys_only=True))
john_posts = Post.query(Post.name == 'John')
for post in john_posts:
    # find out how many posts have more votes than this post
    rank = len(Post.query(Post.votes > post.votes).fetch(keys_only=True))
    print('%d out of %d' % (rank, total_posts))
It will also be cheaper, since the only entities you actually read from the datastore (paid DB ops) are the posts named John; keys_only queries are free.
Note that rank here is the number of posts with strictly more votes, so it starts at 0 for the top post. It is also not an actual index into the full list: posts with the same number of votes get the same rank.
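The counting idea can be illustrated without the datastore. The following is a minimal plain-Python sketch, not ndb code: the sample data and the helper name `place_of` are assumptions made for illustration. It converts "number of posts with strictly more votes" into a 1-based place and shows that tied posts share a place.

```python
# Plain-Python illustration of the counting approach; no datastore involved.
# The (name, votes) tuples are made-up sample data.
posts = [("Ann", 50), ("John", 40), ("Bob", 40), ("Eve", 10)]


def place_of(name, posts):
    """Return the 1-based place of `name`: 1 + number of posts with strictly more votes."""
    votes = dict(posts)[name]
    higher = sum(1 for _, v in posts if v > votes)
    return higher + 1


print("John is in place %d of %d" % (place_of("John", posts), len(posts)))
```

With this scheme John and Bob, who have the same number of votes, end up in the same place, and the place after a tie is skipped.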

Related

Get value of unknown keys

This is my Firebase database structure (shown in the image):
I'll explain the structure as well.
I have a forum in which people can post trades.
Every post is stored under a random user key, as you can see in the picture.
Every post also has a list of items that the user wants to sell or buy ('have' or 'want' in the image).
The number of items in each trade can vary from 1 to 10.
I want to fetch all forum posts by users who are selling (or buying) an item with a given item ID.
For example: get the forum posts of users selling items with an 'Id' of 'Some Item Name'.
How do I do this? I can't seem to get a reference to each item in the inventory,
since you can't use multiple orderByChild calls.
If you think it's impossible with this DB structure, please offer me an alternative.
I want to know whether I'm wasting my time with Firebase and this is impossible to do :(
Please note that I have a lot of data in my DB, so I can't just filter the posts on the client side.
Either change your database structure, or create a separate node that holds metadata for every "have" or "want" item: the item ID, the user ID, the type ("have" or "want"), and the item's child number within its "have" or "want" node (0-9 in this case, since each type holds up to 10 items). Whenever you save or delete data in a "have" or "want" section, you must apply the same operation to this metadata node as well.
You can then run your query on this node to find the desired item and, using the result, fetch that particular item directly by building a reference at runtime, since you have the user ID, the "have" or "want" type, and the item ID.
Your metadata should look something like this:
metadata
|
+-- {randomId1}
|   |-- type: "have" or "want"
|   |-- userId: random ID of the user (Kt0QclmY3. as shown in the picture)
|   |-- Id: "Breakout Type-S"
|   |-- childOnNode: 0 (0-9)
|
+-- {randomId2}
    |-- type: "have" or "want"
    |-- userId: random ID of the user (Kt0R48Cp.. as shown in the picture)
    |-- Id: "Breakout"
    |-- childOnNode: 0 (0-9)
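To make the metadata idea concrete, here is a small plain-Python sketch that flattens nested posts into records of the shape shown above and then looks up users by type and item ID. It is not Firebase client code: the `posts` dictionary, the user keys, and the helpers `build_metadata` and `users_with` are hypothetical stand-ins for illustration.

```python
# In-memory stand-in for the forum posts; keys and item names are sample data.
posts = {
    "userKeyA": {"have": [{"Id": "Breakout Type-S"}], "want": [{"Id": "Breakout"}]},
    "userKeyB": {"have": [{"Id": "Breakout"}, {"Id": "Breakout Type-S"}], "want": []},
}


def build_metadata(posts):
    """Flatten every have/want item into one flat record per item."""
    records = []
    for user_id, post in posts.items():
        for kind in ("have", "want"):
            for child, item in enumerate(post.get(kind, [])):
                records.append({
                    "type": kind,
                    "userId": user_id,
                    "Id": item["Id"],
                    "childOnNode": child,  # position inside the have/want list
                })
    return records


def users_with(records, kind, item_id):
    """Return the sorted user IDs that list `item_id` under `kind`."""
    return sorted({r["userId"] for r in records
                   if r["type"] == kind and r["Id"] == item_id})


metadata = build_metadata(posts)
print(users_with(metadata, "have", "Breakout Type-S"))
```

Because the records are flat, a lookup needs only the fields stored on each record. In a real database the filter on both type and Id would still have to fit that database's query rules; the sketch does it in plain Python.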

Structure data in app engine ndb and speed up query

I am looking for some help as to the best way to structure data in app engine ndb using python, process it and query it later. I want to store temperature data at hourly intervals for different geographical regions.
I can think of two entity options, but there may be something much better. The first would be to store the hourly temperatures in individual properties:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    # one property per hour (00:00 to 23:00); names like 00:00 are not valid identifiers
    h00 = ndb.FloatProperty()
    h01 = ndb.FloatProperty()
    ...
    h23 = ndb.FloatProperty()
Or I could store each hourly reading as its own entity:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    time = ndb.TimeProperty()
    temp = ndb.FloatProperty()
(It might be better to store date and time as one property?)
I want to be able to query the datastore to calculate the total, max, min, and average temperature for any given date range. In the first option I could potentially create four more properties to pre-process and store the total, max, etc. for each day, so that if I wanted the total temperature for a year I would only have to sum 365 values as opposed to 8760. I'm not sure how I would do this in the second option.
I am relatively new to App Engine and the datastore, and I think I am still thinking in terms of relational databases, so any help would really be appreciated. Later on it might be necessary to store data in different time zones.
Thanks
Paul
Personally, I'd go with a variant of the first approach:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    temp = ndb.FloatProperty(repeated=True)
using the temp list to store temperatures by hour in order as you learn about them. I don't think the preprocessing per-date will add anything much: to compute whatever for a year, you'd still need to fetch 365 entities, and the delay for that will swamp the tiny amount of time required to sum up a few thousand numbers anyway.
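As a rough illustration of why the summing itself is cheap, here is a plain-Python sketch that computes total, max, min, and average over a date range from per-day lists. The `DayTemps` dataclass is only a stand-in for the `TempData` entity, and `range_stats` and the sample values are assumptions for illustration; in a real app the `days` list would come from a datastore query.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DayTemps:  # plain stand-in for one TempData entity
    region: str
    day: date
    temp: list = field(default_factory=list)  # hourly readings, in order


def range_stats(days, start, end):
    """Total, max, min and average over all readings with start <= day <= end."""
    readings = [t for d in days if start <= d.day <= end for t in d.temp]
    if not readings:
        return None
    total = sum(readings)
    return {"total": total, "max": max(readings),
            "min": min(readings), "avg": total / len(readings)}


days = [
    DayTemps("north", date(2015, 1, 1), [1.0, 3.0]),
    DayTemps("north", date(2015, 1, 2), [5.0, 7.0]),
    DayTemps("north", date(2015, 1, 3), [100.0]),
]
stats = range_stats(days, date(2015, 1, 1), date(2015, 1, 2))
print(stats)
```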
In general, preprocessing is useful if you want to query by the new fields that such processing creates (e.g. to rapidly answer the question "which dates in locale X had average temperatures greater than 20 Celsius"). That does not seem to be your use case.
If anything, if it's common for you to have to compute many-month values, preprocessing to aggregate things per-month (into simpler TempDataMonth entities) may be more useful. Or, any other several-days period you find useful, of course (weeks, ten-day-groups, whatever). Those could be computed in a background task periodically checking which such periods have become complete since the last check. But, this is a bit beyond your question, so I'm not getting into fine-grained details.
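A per-month rollup of the kind described could look roughly like the sketch below. It is plain Python over sample data, not datastore code, and `monthly_rollup` and the aggregate field names are hypothetical. Storing the count next to the total is a design choice made here so that averages over several months can still be computed from the aggregates.

```python
from collections import defaultdict
from datetime import date

# (day, hourly temperatures) pairs; sample values only
daily = [
    (date(2015, 1, 30), [2.0, 4.0]),
    (date(2015, 1, 31), [6.0]),
    (date(2015, 2, 1), [10.0, 20.0]),
]


def monthly_rollup(daily):
    """Aggregate readings per (year, month) into total, count, max and min."""
    months = defaultdict(lambda: {"total": 0.0, "count": 0,
                                  "max": float("-inf"), "min": float("inf")})
    for day, temps in daily:
        m = months[(day.year, day.month)]
        for t in temps:
            m["total"] += t
            m["count"] += 1
            m["max"] = max(m["max"], t)
            m["min"] = min(m["min"], t)
    return dict(months)


rollup = monthly_rollup(daily)
print(rollup[(2015, 1)])
```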
The general idea is that minimizing the number of entities to fetch tends to be the single most important optimization; other optimizations are of course also possible, but, they tend to play second fiddle to that:-).

What's the most effective way of storing this data?

I need help figuring out a good way to store data effectively and efficiently.
I'm using Parse (JavaScript SDK). Here's an example of what I'm trying to store:
predictions of football (soccer) matches. An example of one match would be:
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 predicts the score will be Team A 2-0 Team B -> so 2-0
User456 predicts the score will be Team A 1-3 Team B -> so 1-3
Each event has information attached to it, such as an eventId, several categories, a start time, an end time, a result and more.
I need to record a score prediction per user for each event (usually 10 events at a time, so a lot of predictions will be coming in).
I need to store these so I can compare each user's prediction against the correct result and award points based on the prediction, the teams in the match and the categories of the event. Instead of adding to a single total, I need the awarded points stored separately per category and per user, so I can then filter by predictions between set dates and by certain categories, e.g.
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 prediction = 2-0
Actual result = 2-0
So now I need to award X points to User123 for Team A, Team B, "League-1", and "Sunday-League" and record it to the event date too.
I would suggest you create a table for games and a table for users and then an associative table to handle the many to many relationship. This is a pretty standard many to many relationship.
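The many-to-many layout can be sketched in plain Python as follows. This is not Parse SDK code: the `Prediction` class, the `award_points` helper, the `POINTS_EXACT` scoring rule, and the event date are all hypothetical, chosen only to show one row per user and event plus one awarded-points row per user, event, and label.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Prediction:  # one row of the associative table: user x event
    user_id: str
    event_id: str
    home_goals: int
    away_goals: int


# Events table, keyed by event ID (sample data; the date is made up)
events = {
    "abc": {"teams": ["Team A", "Team B"],
            "categories": ["League-1", "Sunday-League"],
            "date": date(2015, 5, 3),
            "result": (2, 0)},
}
predictions = [Prediction("User123", "abc", 2, 0),
               Prediction("User456", "abc", 1, 3)]

POINTS_EXACT = 3  # hypothetical scoring rule: points for an exact score


def award_points(predictions, events):
    """Return one (user, event, date, label, points) row per team and category."""
    rows = []
    for p in predictions:
        ev = events[p.event_id]
        if (p.home_goals, p.away_goals) == ev["result"]:
            for label in ev["teams"] + ev["categories"]:
                rows.append((p.user_id, p.event_id, ev["date"], label, POINTS_EXACT))
    return rows


awards = award_points(predictions, events)
for row in awards:
    print(row)
```

Because each award row carries the label and the event date, filtering by category or by date range becomes a simple filter over the rows rather than a recomputation of totals.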

How to store 10 numbers (updated weekly) with GAE?

My GAE app will request weekly data from Google Analytics like
number of visitors during last week
number of visitors of particular page during last week
etc.
Then I would like to show this data on my GAE web-page with Google Charts. The data will be shown for last X weeks (let's say, 10 weeks).
What is the best approach to store this data (number of metrics multiplied by number of weeks)? Old data could be deleted.
I don't think I should use datastore like:
class Visitors(ndb.Model):
    week1 = ndb.IntegerProperty(default=0)  # should store week start and end dates also
    week2 = ndb.IntegerProperty(default=0)
    ...
Probably, it would be better to store data like:
class Analytics(ndb.Model):
    visitors = ndb.StringProperty(default='')  # comma-separated values like '1000,1001,1002'; last value is previous week
    page_visitors = ndb.IntegerProperty(repeated=True)  # [1000,1001,1002]; a repeated property cannot take a default
    ...
What are you trying to optimize?
With this amount of data, you will pay pennies, or less, for data storage. You are well within the free quota on datastore reads and writes. Performance-wise, the difference is negligible.
I would recommend going with the most straightforward solution: each week is a new entity, each data point is in its own property.
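A minimal sketch of the "one entity per week" layout, written in plain Python rather than ndb so it can run anywhere. The `WeeklyStats` dataclass stands in for the datastore entity, and `keep_latest`, the field names, and the sample numbers are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WeeklyStats:  # plain stand-in for one datastore entity per week
    week_start: date
    visitors: int = 0
    page_visitors: int = 0


def keep_latest(weeks, limit=10):
    """Return the newest `limit` weeks, oldest first, ready for charting."""
    return sorted(weeks, key=lambda w: w.week_start)[-limit:]


start = date(2015, 1, 5)
weeks = [WeeklyStats(start + timedelta(weeks=i), visitors=1000 + i)
         for i in range(12)]
latest = keep_latest(weeks)
print([w.visitors for w in latest])
```

Weeks that fall outside the returned window are the ones that could be deleted.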

Social feed system design

http://twitter.com/#!/ladygaga
When ladygaga tweets one message, does that mean one data record is inserted for EACH of her followers (12,221,751 in total)? So 12,221,751 records are inserted altogether?
Any clues on designing such a social feed system?
------------------------------- Edit line -------------------------------
Real issue:
Performing SELECT tweet FROM Tweets IN ([FollowingIDs]) is not possible in Google App Engine, which limits the IN clause to a maximum of 30 items.
In App Engine this actually means performing 30 queries in parallel, which I guess is not very wise.
Even if I were allowed to exceed the limit of 30,
what if I am following 10,000 people? I am not sure whether using the IN clause causes performance issues in MySQL or any other kind of database infrastructure
(App Engine's Bigtable is different from MySQL).
So is it better to query with the IN clause,
or to set up a UserFeed table that stores the feed relationship?
Or is there a third method?
Database/SQL gurus, please help.
Please see this talk from Google I/O 2009 to see how to handle these sort of cases on App Engine with a 'fan out' data structure.
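For orientation, one common shape of a fan-out structure is sketched below in plain Python. This is an assumption about the general technique, not a transcription of the talk: each message is written once, together with an index entry listing its receivers, and a user's feed is found by membership in that list. The names `post`, `feed`, `messages`, and `receivers_index` are hypothetical.

```python
from collections import defaultdict

# author -> followers (sample data)
followers = {"ladygaga": ["genesis", "user"], "someone": ["genesis"]}

messages = {}                       # message_id -> (author, text)
receivers_index = defaultdict(set)  # message_id -> set of receiver IDs


def post(message_id, author, text):
    """Store the message once and record who should receive it."""
    messages[message_id] = (author, text)
    receivers_index[message_id] = set(followers.get(author, []))


def feed(user):
    """Return the messages whose receiver list contains `user`, oldest first."""
    ids = sorted(mid for mid, rcv in receivers_index.items() if user in rcv)
    return [messages[mid] for mid in ids]


post(1, "ladygaga", "hi")
post(2, "someone", "yo")
print(feed("genesis"))
```

The point of the sketch is that a post costs one message write plus one index write, rather than one record per follower, and that reading a feed does not need an IN clause over the followed accounts. For authors with very many followers, the receiver list would presumably need to be split across several index entries.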
Can you imagine it?
No.
Every user has a list of followers:
ID | FOLLOWER__ID
ladygaga | genesis
ladygaga | user
// php
$result = mysql_query("SELECT ID FROM followers WHERE FOLLOWER__ID = 'genesis'");
$select = array();
while ($row = mysql_fetch_assoc($result)) {
    // quote and escape each ID so the IN list is valid SQL
    $select[] = "'" . mysql_real_escape_string($row['ID']) . "'";
}
$tweets = mysql_query("SELECT * FROM tweets WHERE owner IN (" . implode(",", $select) . ")");
