What is the best way (least read operations) to do autocomplete on Google App Engine using Objectify?

I am currently using AJAX to autocomplete email addresses and would like to find out the best way to do this without too many read operations. Thanks!

The best way to do this kind of operation is the following approach.
Use full-text search:
https://cloud.google.com/appengine/docs/java/search/
When creating a document to search on, you can tokenize the email address into prefixes. For example, for foobar@baz.com you would tokenize it to f, fo, foo, ... and save the tokens into a text field.
Then use index.search to query for the results.
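A minimal sketch of the tokenize-and-index step with the Search API (the index name "emails" and the field names are illustrative, not from the original answer):

import com.google.appengine.api.search.Document;
import com.google.appengine.api.search.Field;
import com.google.appengine.api.search.Index;
import com.google.appengine.api.search.IndexSpec;
import com.google.appengine.api.search.Results;
import com.google.appengine.api.search.ScoredDocument;
import com.google.appengine.api.search.SearchServiceFactory;

public class EmailAutocomplete {

    private static Index index() {
        return SearchServiceFactory.getSearchService()
                .getIndex(IndexSpec.newBuilder().setName("emails").build());
    }

    // "foo@baz.com" -> "f fo foo foo@ ..." so a plain term query matches any prefix.
    static String tokenize(String email) {
        StringBuilder tokens = new StringBuilder();
        for (int i = 1; i <= email.length(); i++) {
            tokens.append(email, 0, i).append(' ');
        }
        return tokens.toString().trim();
    }

    static void indexEmail(String docId, String email) {
        Document doc = Document.newBuilder()
                .setId(docId)
                .addField(Field.newBuilder().setName("tokens").setText(tokenize(email)))
                .addField(Field.newBuilder().setName("email").setText(email))
                .build();
        index().put(doc);
    }

    static Results<ScoredDocument> autocomplete(String prefix) {
        // Each prefix was saved as its own token, so this is a simple term match.
        return index().search("tokens:" + prefix.toLowerCase());
    }
}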
Every successful lookup can then be cached for, say, 2 hours (you can change this as per your requirements).
Any time you add, update, or remove entries in the model, delete the affected memcache entries or flush memcache, preferably using datastore callbacks:
https://cloud.google.com/appengine/docs/java/datastore/callbacks
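A sketch of the invalidation side using datastore callbacks plus memcache (the "Contact" kind and the "autocomplete" namespace are assumptions for illustration):

import com.google.appengine.api.datastore.DeleteContext;
import com.google.appengine.api.datastore.PostDelete;
import com.google.appengine.api.datastore.PostPut;
import com.google.appengine.api.datastore.PutContext;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class ContactCallbacks {

    // Hypothetical kind holding the email entities; use your own kind name.
    @PostPut(kinds = {"Contact"})
    void onPut(PutContext context) {
        flushAutocompleteCache();
    }

    @PostDelete(kinds = {"Contact"})
    void onDelete(DeleteContext context) {
        flushAutocompleteCache();
    }

    private void flushAutocompleteCache() {
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService("autocomplete");
        // Coarse but safe: clearAll() empties the whole cache (it ignores
        // namespaces), so prefer deleting specific keys if you track them.
        cache.clearAll();
    }
}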
Please note that the tokenizing and adding of a document could be processed in a task queue to fit the "GAE way of doing things".
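For example, the write path could enqueue the indexing work rather than doing it inline (the worker URL is hypothetical and would map to a servlet that runs the indexing code above):

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class IndexTaskEnqueuer {

    static void enqueueIndexTask(String docId, String email) {
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
                .withUrl("/tasks/index-email")   // hypothetical worker endpoint
                .param("docId", docId)
                .param("email", email));
    }
}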
Also, as a footnote, you could try implementing a client-side caching mechanism using HTTP cache control plus ETags. I have not implemented such a solution myself, so others can pitch in with their experience implementing one.
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en
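For illustration, a hedged sketch of cache headers plus an ETag check in a plain servlet; lookupSuggestions is a placeholder for the search/memcache lookup, and the 2-hour max-age mirrors the cache lifetime suggested above:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AutocompleteServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String json = lookupSuggestions(req.getParameter("q"));

        // Any stable validator works; a hash of the payload is the simplest.
        String etag = "\"" + Integer.toHexString(json.hashCode()) + "\"";
        resp.setHeader("Cache-Control", "private, max-age=7200");
        resp.setHeader("ETag", etag);

        // If the client already holds this version, skip sending the body.
        if (etag.equals(req.getHeader("If-None-Match"))) {
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return;
        }
        resp.setContentType("application/json");
        resp.getWriter().write(json);
    }

    private String lookupSuggestions(String prefix) {
        return "[]"; // placeholder for the actual lookup
    }
}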

Related

Any examples of using a WandSearcher in Vespa? (after a weighted set query)

Currently I am using the REST interface to query Vespa, which seems to work great, but something tells me that I should be using searchers in the application to make the client (server-side code) a bit lighter (bundle the JAR file in the application package) and make it all a bit smoother. I have managed to build some simple searcher/processor applications, but this is a bit overwhelming.
So are there any readily available examples ?
Basically I want to:
Send to /search?query=someId
Do an ordinary search for the weighted set on this document ID (I guess this one can be handy: https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html)
Take the items in the response, add them to WandItem(s), and query for a wand with a WandSearcher on a given field, similar to the YQL:
"select * from sources * where wand(interest, some weightedsets));","ranking":"combined_score" and return the matches.
Also, just curious: apart from the trouble of the string building I am doing for the HTTP request at the moment, are there any performance gains in using a searcher or going the Java route versus REST?
Thanks for any insight or code I can start with.
There is an example of using the WandItem (YQL wand) here: https://docs.vespa.ai/documentation/advanced-ranking.html. See also https://docs.vespa.ai/documentation/using-wand-with-vespa.html, as there are two wand implementations available in Vespa; from your description it sounds like wand() is what you want for this use case. For the first call you probably want a dedicated document summary to reduce the amount of data fetched for your first query, which also gives you the option of serving it out of memory only (see https://docs.vespa.ai/documentation/document-summaries.html).
Also see https://docs.vespa.ai/documentation/searcher-development.html as a general resource on writing searchers.
For your use case it makes a lot of sense to write a searcher to perform these two queries, since your second query depends on the first and you avoid the cost of rendering/HTTP/YQL parsing, which can matter if your client is remote with high network latency.
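A minimal sketch of such a searcher, assuming the weighted set lives in a field called "interest" (per the question's YQL), that it comes back as a Map<String, Integer> (structured fields may instead need the Inspector API from the link above), that the seed document is looked up in a hypothetical "id" field, and that "combined_score" is your rank profile:

import java.util.Map;
import com.yahoo.prelude.query.WandItem;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;

public class InterestWandSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Phase 1: fetch the seed document carrying the weighted set.
        Query lookup = query.clone();
        lookup.getModel().getQueryTree().setRoot(
                new WordItem(query.properties().getString("query"), "id"));
        Result seedResult = execution.search(lookup);
        execution.fill(seedResult); // make summary fields available
        if (seedResult.hits().size() == 0) return seedResult;

        // Phase 2: turn each (token, weight) pair into a WandItem token.
        Hit seed = seedResult.hits().get(0);
        @SuppressWarnings("unchecked")
        Map<String, Integer> interests = (Map<String, Integer>) seed.getField("interest");

        WandItem wand = new WandItem("interest", 100); // target number of hits
        for (Map.Entry<String, Integer> entry : interests.entrySet())
            wand.addToken(entry.getKey(), entry.getValue());

        Query wandQuery = query.clone();
        wandQuery.getModel().getQueryTree().setRoot(wand);
        wandQuery.getRanking().setProfile("combined_score");
        return execution.search(wandQuery);
    }
}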

Custom Searcher - Blending of hits from different sources

We have a need for "blending of hits from different sources", and per your documentation it is recommended to write a custom searcher in Java. Is there a demo of this written somewhere on GitHub? I wouldn't even know where to start :( I understand I can create search "chains", preferably asynchronous, and then blend results in Java before returning them... but then how would I handle pagination, limits, etc.? This all seems very complicated for someone who doesn't know Java that well. So I am hoping someone has already written a demo for this? Please? Anyone?
Thank you so much
EDIT to make my question clearer:
We are writing a search engine that fetches data from various websites. Some websites have 10 million indexable items, other websites only 100,000. When we present the results to the end user, we want to include results from all our sources (when the match applies): say, 10 results from each of the websites we crawl, so that they all get an equal amount of attention on the page. If we don't do custom blending, what happens is that the largest website with the most items wins all our traffic.
I understand that we can send 10 separate queries to Vespa and blend the results in our front end, but that seems very inefficient. Hence the question about a "Custom Searcher". Thank you so much!
That documentation covers some very advanced use cases which you do not have. Are your sources different Vespa schemas or content clusters? If so, Vespa will by default blend the hits returned from each according to their relevance scores, so there is nothing you need to do.
The two other most common use-cases are:
Some (or all) of the data sources are external, so you need to write a Searcher component to fetch the external data and turn it into a Result.
You want the data to be blended in some custom way (rather than by relevance score). If so, you need to exclude the default blending Searcher (com.yahoo.prelude.searcher.BlendingSearcher) and write your own; see the sketch below.
If you provide some more information about your use cases I can give you some code examples.
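In the meantime, here is a hedged sketch of use case 2: a custom blender that, instead of relevance ordering, interleaves hits from each source round-robin (the chain wiring in services.xml, including excluding BlendingSearcher, is left out):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;

public class RoundRobinBlender extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        execution.fill(result);

        // Bucket the hits by the source they came from.
        Map<String, List<Hit>> bySource = new LinkedHashMap<>();
        for (Hit hit : result.hits().asList())
            bySource.computeIfAbsent(hit.getSource(), s -> new ArrayList<>()).add(hit);

        // Interleave: take one hit from each source per round.
        Result blended = new Result(query);
        boolean added = true;
        for (int round = 0; added; round++) {
            added = false;
            for (List<Hit> hits : bySource.values()) {
                if (round < hits.size()) {
                    blended.hits().add(hits.get(round));
                    added = true;
                }
            }
        }
        return blended;
    }
}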
EDIT: Use grouping to solve the need explained under "EDIT" in the question:
Create a "siteid" field when feeding (e.g. in document processing).
Use the grouping expression all(group(siteid) each(max(10) output(summary())))
See http://docs.vespa.ai/documentation/grouping.html
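Putting it together, the full query could look like this (userQuery() stands in for whatever selection you already have):

select * from sources * where userQuery() | all(group(siteid) each(max(10) output(summary())));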

Seamless Integration with REST API

Many examples on the net show how to use ng-repeat with in-memory data, but in my case I have a long table with infinite scroll that gets its data by sending requests to a REST API (scroll down: fetch some data; scroll down again: fetch some more data; and so on). It does work, but I'm wondering how I can integrate that with filters?
Right now I have to call a specific method of the API service that makes a request based on the text in the "search" input box, and then the controller updates $scope.data.
Is it possible to build a custom filter that would do that? Then my view would be utterly decoupled from the service, and I could declaratively tell it how to group, order, and filter data, regardless of whether it's in memory or comes from a remote server that can serve only a limited number of records at a time.
Also, later I'm going to need grouping and ordering as well. I'm so tempted to download the entire dataset and lock the parts of the app responsible for grouping, searching, and ordering (until all data is on the client), but:
a) that dataset is huge (hundreds of thousands of records)
b) nobody wants to deal with cache-invalidation headaches
c) doing so feels so damn wrong; you don't really expect me to keep all that data in memory, right?
Can you guys point me to some open-source examples I can steal ideas from?
Basically I need to build a service and filters that let me work with my "pageable" data that comes from the API as if it were in-memory data.
Regardless of how you choose to solve it (there are many ways to do infinite scroll with Angular; here is one: http://binarymuse.github.io/ngInfiniteScroll/), as of its latest beta version ng-repeat performs really badly with large amounts of data, and so do filters. The reason is obvious: watching so much data for changes is a tough job. Moreover, ng-repeat will by default redraw your complete list every time something changes.
There are many solutions you can explore in this area; here are the ones I found productive:
http://kamilkp.github.io/angular-vs-repeat/#?tab=8
http://www.williambrownstreet.net/blog/2013/07/angularjs-my-solution-to-the-ng-repeat-performance-problem/
https://github.com/allaud/quick-ng-repeat
You should also consider the following, which really helps with large amounts of data:
https://github.com/Pasvaz/bindonce
Updated
I guess you can't really control your server output, and filtering and ordering large amounts of data are better done on the server side anyway.
I pointed out the links above because, even though writing your own filters (and order-bys) is quite simple to do (http://jsfiddle.net/gdefpfqL/ : filter by some company name, then click the "Add More" button to add more items), ordering is virtually impossible if you can't control the data coming from the server; the only option is fetching it all, ordering it, and then lazy-loading from the client's memory. So if each of your list items doesn't have many bindings by itself (as in the example I added), and the list item is a fairly simple one (for instance, you simply present the result as plain text in <li>{{item.name}}</li>), then Angular's ng-repeat might work for you. In that case, filters will work as expected; say you filter by the searched text:
<li ng-repeat="item in items | filter:searchedText"></li>
Even for new items added after the user has searched for a text, it will still work, thanks to the magic of binding.

Achieve good paging using Objectify

I'm using Objectify cursors to achieve basic paging, basically creating a "more" button. How best do you achieve paging with Objectify for building links that let users go forward and backwards? Something more like a page list:
1, 2, 3, 4, more
Your best bet is probably to fetch the keys for the entire result set and stash them in a session or in JavaScript. Each next/previous link can then load the item in your list by id, and loading by id is very cheap. You can cache the full query results in memcache as long as they're not too large, but that will depend on what kind of objects you're fetching.
You can use cursors to create paging one page forward or backward, via FetchOptions.startCursor(..) and FetchOptions.endCursor(..).
To create more direct paging links you will have to use FetchOptions.limit(..) and FetchOptions.offset(..).
Note that offset(..) can be very costly, as it fetches all entities that come before the given page. So, depending on usage and the size of the whole set, you might be better off preloading and caching a set of keys. Or better, replace paging with search.
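For the cursor-based "more" button, a minimal page loader with Objectify could look like this (the Article entity, the page size, and the Page holder are illustrative, not from the original answer):

import static com.googlecode.objectify.ObjectifyService.ofy;

import java.util.ArrayList;
import java.util.List;
import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.QueryResultIterator;
import com.googlecode.objectify.cmd.Query;

public class ArticlePager {

    public static class Page {
        public final List<Article> items = new ArrayList<>();
        public String nextCursor; // web-safe string to embed in the "more" link
    }

    public static Page loadPage(String webSafeCursor) {
        Query<Article> query = ofy().load().type(Article.class).limit(20);
        if (webSafeCursor != null)
            query = query.startAt(Cursor.fromWebSafeString(webSafeCursor));

        Page page = new Page();
        QueryResultIterator<Article> iterator = query.iterator();
        while (iterator.hasNext())
            page.items.add(iterator.next());

        // Hand back where the next page starts.
        page.nextCursor = iterator.getCursor().toWebSafeString();
        return page;
    }
}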

Best way to implement supplemental analytics

I want to allow my writers to see how much traffic their articles are getting. I can do this in Google Analytics, but I can't figure out how to share that data with them without giving them access to all the data, so I was thinking of adding another analytics service that would insert a unique code for each author on their articles. I already have the GA code and the Quantcast code, so I don't want to bog down my site much more. Should I use a pixel tracker or a JavaScript tracker?
UPDATE: Here is the code I use in Analytics to track my authors.
try {
  var pageTracker = _gat._getTracker("UA-xxxxxxx-x");
  pageTracker._trackPageview();
<?php if ( is_singular()) { ?>
  // Record which author's article was viewed, on single-post pages only.
  pageTracker._trackEvent('Authors', 'viewed', '<?php the_author_meta('ID'); ?>');
<?php } ?>
} catch(err) {}
You could use a custom field to track the writers by a unique id that they probably have. Then you could use GA's API to pull data where the custom field value equals that unique id, and display it in their profile or wherever you want them to see it.
One option would be to use a server-local Redis instance and the PHP Redis library to increment a local counter using the author ID and article IDs.
For example, in Redis you could use a sorted set with the author ID as the Redis key, and use the article ID (or however you identify an article) as a member that you increment with zincrby on each page load; the data is then readily available and under your control. You could then have a PHP page that pulls the author's data from Redis and displays it in whatever format you need, for example a table showing the traffic for each of their articles, or pretty graphs. You could extend the above to per-day traffic by using a key structure of "AUTHORID:YYYY-MM-DD" instead of just the author ID.
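The answer describes this with the PHP Redis client; the same pattern in Java using the Jedis client looks like this (the "AUTHORID:YYYY-MM-DD" key layout follows the answer; everything else is illustrative):

import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Tuple;

public class ArticleViewCounter {

    // Bump the view count for one article in the author's per-day sorted set.
    public static void recordView(Jedis redis, String authorId, String day, String articleId) {
        redis.zincrby(authorId + ":" + day, 1, articleId);
    }

    // Articles with the highest traffic first, each with its view count.
    public static Set<Tuple> topArticles(Jedis redis, String authorId, String day) {
        return redis.zrevrangeWithScores(authorId + ":" + day, 0, -1);
    }
}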
The hit penalty for tracking this is much lower than reaching out to an external site; it should be on the order of single-digit milliseconds. Even if your Redis instance were elsewhere, the response times should still be lower than external tracking. I know you are using GA, but this is a simple-to-implement method you could consider.
This depends slightly on how many authors you have and your level of involvement. The main approaches I would use are:
Create a separate view per author and filter in his or her traffic
Use a Google Docs plugin to pull down the authors' data and share it
Use the API to pull down the relevant information
Happy to give more specifics if you can guide me in more detail on what you want.
