Best way to design app engine datastore and text search modelling - google-app-engine

We have a Java application running on Google App Engine. It has a kind called Contact; the following is a sample schema:
Contact
{
    long id
    String firstName
    String lastName
    ...
}
The above is the existing model. To support a few requirements, we store this object in both the datastore and text search.
Now we want to integrate contacts with their page view data.
Each contact can have thousands of page view records, or even millions for some contacts.
The following is a sample page visit object [Note: we don't have this object as of now; this is just to give information about a page visit]:
PageVisit
{
    long id
    String url
    String refUrl
    int country
    String city
    ....
}
We have a requirement that needs a query on both a contact's core properties and its page visit data,
for example:
select * from Contact where firstName = 'abc' and url = 'cccccc.com';
select * from Contact where firstName = 'abc' or url = 'cccccc.com';
To write this kind of query, both the contact's core properties and its page visits need to be available in the Contact object itself. But a contact can have a huge number of page views, so this would exceed the maximum entity size limit.
So how should we design the contact model in this situation, both in the datastore and in text search?
Thanks

Cloud Datastore doesn't support joins, so you will need to handle this in client code.
Two possible ways to handle this are:
Denormalize the Contact properties you need to search on into PageVisit:
PageVisit
{
    long id
    String firstName // Denormalized from Contact
    String url
    String refUrl
    int country
    String city
    ....
}
This requires you to create a composite index:
- kind: PageVisit
  ancestor: no
  properties:
  - name: firstName
  - name: url
Or run multiple queries:
select id from Contact where firstName = 'abc'
select * from PageVisit where contactId={id} and url = 'cccccc.com';
select * from PageVisit where contactId={id} or url = 'cccccc.com';
This requires you to create a composite index:
- kind: PageVisit
  ancestor: no
  properties:
  - name: contactId
  - name: url
Final aside: depending on how large your site is, it might be worth looking into Cloud Bigtable for the page view data. It's a better fit for high-write, OLAP-style workloads.
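The multiple-queries approach amounts to a join done on the client. A minimal sketch in plain Python follows, with in-memory lists standing in for the two kinds. The sample data and the helper names (`contact_ids_by_first_name`, `contact_ids_by_url`, `contacts_and`, `contacts_or`) are made up for illustration, and a `contactId` field on PageVisit is assumed, as in the queries above. A real application would issue Datastore queries where this sketch filters lists.

```python
# In-memory stand-ins for the Contact and PageVisit kinds (illustrative data).
CONTACTS = [
    {"id": 1, "firstName": "abc"},
    {"id": 2, "firstName": "xyz"},
    {"id": 3, "firstName": "abc"},
]
PAGE_VISITS = [
    {"id": 10, "contactId": 1, "url": "cccccc.com"},
    {"id": 11, "contactId": 2, "url": "cccccc.com"},
    {"id": 12, "contactId": 3, "url": "other.com"},
]


def contact_ids_by_first_name(first_name):
    """Stands in for: select id from Contact where firstName = ..."""
    return {c["id"] for c in CONTACTS if c["firstName"] == first_name}


def contact_ids_by_url(url):
    """Stands in for a PageVisit query filtered on url, keeping only contactId."""
    return {v["contactId"] for v in PAGE_VISITS if v["url"] == url}


def contacts_and(first_name, url):
    # AND: a contact must satisfy both conditions, so intersect the id sets.
    return sorted(contact_ids_by_first_name(first_name) & contact_ids_by_url(url))


def contacts_or(first_name, url):
    # OR: either condition is enough, so take the union of the id sets.
    return sorted(contact_ids_by_first_name(first_name) | contact_ids_by_url(url))
```

The answer scopes its second query to one contact at a time; intersecting two id sets is an equivalent formulation that is simply shorter to show. With the sample data, `contacts_and("abc", "cccccc.com")` yields only contact 1, while `contacts_or` yields all three.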

Related

Wagtail - Search Page Owner Full Name

I am setting up a blog style documentation site. I was using a user input field for author when a child page was created. I found out that Wagtail houses owner in the page model. In the interest of not duplicating data, I removed my author field so I can use the default wagtail field. However, I have set up an LDAP module for authentication so the owner is housed as an Employee ID and not a user name. This Employee ID does map to a full name though and I am able to access that on a template via the owner.get_full_name.
So the question is, how do I set up the default search to check the owner full name when performing searches? How to get this into the search index? I am still a bit new to Wagtail so this may be a case of creating an author field with a foreign key mapping back to the user table or should I be modifying the search view to include a run through the user table?
def search(request):
    search_query = request.GET.get('query', None)
    page = request.GET.get('page', 1)
    # Search
    if search_query:
        search_results = Page.objects.live().search(search_query)
        query = Query.get(search_query)
        # Record hit
        query.add_hit()
    else:
        search_results = Page.objects.none()
    # Pagination
    paginator = Paginator(search_results, 10)
    try:
        search_results = paginator.page(page)
    except PageNotAnInteger:
        search_results = paginator.page(1)
    except EmptyPage:
        search_results = paginator.page(paginator.num_pages)
    return TemplateResponse(request, 'search/search.html', {
        'search_query': search_query,
        'search_results': search_results,
    })
If you check the Wagtail search documentation, it describes the process for indexing callables and extra attributes:
https://docs.wagtail.io/en/stable/topics/search/indexing.html#indexing-callables-and-other-attributes
So what I would do is:
Create a get_owner_full_name() method in your page class
Add an index.SearchField('get_owner_full_name') to the search_fields
One note though, this will only work if you are using either the PostgreSQL backend, or the Elasticsearch backend. The default database backend does not support the indexing of extra fields.

JPA, Hibernate : Database Schema

This is my first post, so hi everybody! :)
I have a question regarding the schema of my database. I'm writing a RESTful application using Spring. The idea is to allow a user to create their own diet based on products stored in the DB.
So I came to creating an entity Meal, which should consist of Products and the amounts of those products. The natural way to model this seems to be a Map. The problem is that, as I have read, mapping such a class to a JSON object, which I would like to send to the client's browser, is problematic. My other idea was to store a List of objects like ProductWithQuantity instead of such a map, but I'm a little worried that the DB would quickly be flooded by entries like 1 glass of milk, 2 glasses of milk, 1.1243 glasses of milk and so on.
So my question is: do you have any better idea for a schema for this purpose? ;)
I would define an entity Meal which has a one-to-many relation to an entity Product. The product has properties like 'name', 'amount', 'unit' and 'price' or something like that. Unit can be "gram", "liter" and so on.
I might suggest a Meal with many servings, each serving being of a single product. Products like Milk or Hamburger are likely to have nutritional information, while a Meal will have many servings of different products. Serving would essentially be a join table between Meal and Product, but with additional information like serving size.
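import java.util.List;
import javax.persistence.*; // jakarta.persistence.* on newer stacks
@Entity
class Meal {
    @Id
    Integer id;
    @OneToMany(mappedBy = "meal")
    List<Serving> servings;
}
@Entity
class Serving {
    @Id
    Integer id;
    // Many servings belong to one meal; this is the owning side of Meal.servings.
    @ManyToOne
    Meal meal;
    // Many servings can refer to the same product.
    @ManyToOne
    Product product;
    @Basic
    Long servingCount;
}
@Entity
class Product {
    @Id
    Integer id;
    @Basic
    String simpleName;
    @Basic
    Integer caloriesPerServing;
    ..
}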
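To see why this layout avoids the "1 glass of milk, 2 glasses of milk" flood, here is a small sketch of the same three-way relationship in plain Python (not JPA). The calorie figure, the `total_calories` helper and the use of a fractional `serving_count` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Product:
    simple_name: str
    calories_per_serving: int


@dataclass
class Serving:
    product: Product      # many servings may point at the same product
    serving_count: float  # the quantity lives here, not on the product


@dataclass
class Meal:
    servings: List[Serving] = field(default_factory=list)

    def total_calories(self) -> float:
        # Hypothetical helper: sum calories over every serving in the meal.
        return sum(s.product.calories_per_serving * s.serving_count
                   for s in self.servings)


milk = Product("milk", 100)  # one Product record, whatever the quantity
breakfast = Meal([Serving(milk, 1), Serving(milk, 1.5)])
```

Both servings share the single `milk` product, so a new quantity adds a Serving row and never a new Product row.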

NDB Datastore: Data Modeling for a Job website

What is the best way to model data for a job website which has the following elements:
Two types of user accounts: JobSeekers and Employers
Employers can create JobPost entities.
Each JobSeeker can create a Resume entity, and many JobApplication entities.
JobSeekers can create a JobApplication entity which is related to a JobPost entity.
A JobPost entity may receive many JobApplication entities.
A JobSeeker may only create one JobApplication entity per JobPost entity.
A Resume contains one or more instances of Education, Experience, using ndb.StructuredProperty(repeated = True).
Each Education contains the following ndb.StringProperty fields: institution, certification, area_of_study
While each Experience contains: workplace, job_title.
Here is a skeleton model that meets your requirements:
from google.appengine.ext import ndb
class Employer(ndb.Model):
    user = ndb.UserProperty()
class JobPost(ndb.Model):
    employer = ndb.KeyProperty(kind=Employer)
class JobSeeker(ndb.Model):
    user = ndb.UserProperty()
    def apply(self, job_post):
        # Refuse a second application by the same seeker to the same post.
        if JobApplication.query(JobApplication.job_seeker == self.key,
                                JobApplication.job_post == job_post).count(1) == 1:
            raise Exception("Already applied for this position")
        ...
class Resume(ndb.Model):
    job_seeker = ndb.KeyProperty(JobSeeker)
    education = ndb.JsonProperty()
    experience = ndb.JsonProperty()
class JobApplication(ndb.Model):
    job_seeker = ndb.KeyProperty(JobSeeker)
    job_post = ndb.KeyProperty(JobPost)
Notes:
Employer and JobSeeker have the built-in UserProperty to identify and allow them to login.
Resume uses JsonProperty for education and experience to allow for more fields in the future. You can assign a Python dictionary to this field, for example
resume.education = {'institution': 'name', 'certification': 'certificate', 'area_of_study': 'major', 'year_graduated': 2013, ...}
(I have personally found StructuredProperty to be more pain than gain, and I avoid it now.)
Limiting a JobSeeker to only one JobApplication can be done with the method apply() which checks the JobApplication table for existing applications.
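The check that apply() performs can be sketched without the datastore. In this minimal in-memory version, `ApplicationStore` and `AlreadyApplied` are hypothetical names, and a set of (job seeker, job post) pairs stands in for the JobApplication kind.

```python
class AlreadyApplied(Exception):
    """Raised when a job seeker applies to the same post twice."""


class ApplicationStore:
    """In-memory stand-in for the JobApplication kind."""

    def __init__(self):
        self._applications = set()  # (job_seeker_id, job_post_id) pairs

    def apply(self, job_seeker_id, job_post_id):
        pair = (job_seeker_id, job_post_id)
        # Same rule as JobSeeker.apply(): look for an existing application first.
        if pair in self._applications:
            raise AlreadyApplied("Already applied for this position")
        self._applications.add(pair)
        return pair
```

A seeker can still apply to any number of different posts; only a repeat of the same pair is rejected.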

mongo db indexes on embedded documents

I have a domain object model as below...
@Document
Profile
{
    social profile list:
    SocialProfile
    {
        interest list:
        {
            Interest
            {
                id
                type
                value
            }
            ...
        }
        ...
    }
}
Each profile can have many social profiles. Each social profile (representing a social network like Facebook) has many interests related to the profile, and each interest is itself an embedded document with the fields id, type and value.
So I have two questions:
Can I index a few fields separately in the embedded document Interest?
Can I create a compound index on the embedded document Interest?
I guess the complexity in my model is that the embedded document is nested two levels deep, and that the path to it goes through arrays.
Can it be done the Spring way, via metadata annotations? If you think my model is wrong, please let me know; I am a newbie on Mongo.
Thanks
You can index separately on the fields in an embedded document.
You can also create a compound index on the fields, so long as no more than one field is an array.
These might offer more answers:
http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys
http://www.mongodb.org/display/DOCS/Multikeys
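Fields inside nested embedded documents are addressed with dot notation, including when the path passes through arrays. The sketch below only builds the index key specifications as plain data. The field names `socialProfiles` and `interests` are assumptions, since the question does not give the actual names, and the helper functions are hypothetical.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING

# Assumed layout: Profile.socialProfiles[].interests[].{id, type, value}
INTEREST_PATH = "socialProfiles.interests"


def single_field_index(field_name):
    """Key spec for an index on one field of the embedded Interest."""
    return [(f"{INTEREST_PATH}.{field_name}", ASCENDING)]


def compound_index(*field_names):
    """Key spec for a compound index over several Interest fields."""
    return [(f"{INTEREST_PATH}.{name}", ASCENDING) for name in field_names]


type_index = single_field_index("type")
type_value_index = compound_index("type", "value")
# With pymongo, each spec would be passed to collection.create_index(spec).
```

Each spec is a list of (key, direction) pairs, which is the form pymongo's `create_index` accepts.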

Twitter-ish DB structure in Google App Engine

I'm trying to create a site which is quite similar to Twitter. Users will be able to post messages. And users will be able to 'follow' each other. On the homepage, they see the messages from the users they follow, sorted by time.
How do I go about creating the appengine models for this?
In a traditional relational DB, I guess it would be something like this:
Database 'user':
    id
    username
Database 'follows':
    user_id
    follow_id
Database 'messages':
    user_id
    message
And the query will be something like:
SELECT * FROM messages m, follows f WHERE m.user_id = f.follow_id AND f.user_id = current_user_id
I hope the example above is clear. How do I replicate this in Google App Engine?
There is a useful presentation at Google I/O a while back by Brett Slatkin which describes building a scalable twitter-like microblog app, and deals with this very question at length: http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
REVISED:
class AppUser(db.Model):
    user_id = db.UserProperty()
    username = db.StringProperty()
    following = db.ListProperty(db.Key)  # list of AppUser keys
class Message(db.Model):
    sender = db.ReferenceProperty(AppUser)
    body = db.TextProperty()
You would then query the results in two steps:
message_list = []
for followed_user in current_user.following:
    subresult = db.GqlQuery("SELECT __key__ FROM Message WHERE sender = :1", followed_user)
    message_list.extend(subresult)
results = Message.get(message_list)
(with 'current_user' being the 'AppUser' entity corresponding with your active user)
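The question also asks for the messages sorted by time, which the two-step query does not do by itself. One way to get there is to merge the per-user results by timestamp. The sketch below does this in plain Python and rests on two loudly stated assumptions: each message carries a timestamp (the Message model above has no such property), and each per-user list is already ordered newest first. The data and the `home_timeline` helper are invented for illustration.

```python
import heapq
import itertools
from datetime import datetime

# Illustrative data: (timestamp, sender, body) tuples per sender, newest first,
# as a per-sender query ordered by timestamp would return them.
MESSAGES_BY_SENDER = {
    "alice": [(datetime(2024, 1, 3), "alice", "third"),
              (datetime(2024, 1, 1), "alice", "first")],
    "bob": [(datetime(2024, 1, 2), "bob", "second")],
    "carol": [(datetime(2024, 1, 4), "carol", "not followed")],
}


def home_timeline(following, limit=20):
    """Merge the followed users' messages into one newest-first list."""
    per_user = [MESSAGES_BY_SENDER.get(user, []) for user in following]
    # heapq.merge combines already-sorted inputs without re-sorting everything.
    merged = heapq.merge(*per_user, key=lambda m: m[0], reverse=True)
    return [body for _, _, body in itertools.islice(merged, limit)]


timeline = home_timeline(["alice", "bob"])
```

Messages from users who are not followed ("carol" here) never enter the merge, and `limit` caps how many messages are returned.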
