Vespa.ai exploit multiple instances to answer queries - vespa

I have a Vespa.ai cluster with multiple container and content nodes. After Vespa is loaded with data, my app sends queries and retrieves the data from Vespa. I want to be sure that I make good use of all the nodes and get the data as fast as possible. My app builds an HTTP request and sends it to one of the nodes.
Which node/nodes should I direct my request to?
How can I be sure that all instances participate in answering queries?
What should I do to utilize all the cluster nodes?
Does Vespa load balance these requests across the other instances for better performance?

Vespa is a two-tier system: container nodes receive the queries and content nodes hold the data.
The containers load balance over the content nodes (if you have multiple groups), but since you send the requests to the containers, you need to load balance over those yourself.
This can be done by code you write in your client, by a VIP, by another tier of nodes you host yourself (for example Nginx), or by a hosted load balancer such as AWS ELB.
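
As a rough illustration of the "code you write in your client" option, here is a minimal round-robin sketch. The class name, the host names, and port 8080 are assumptions made for the example and are not part of Vespa.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hands out container endpoints in rotation so no single node takes every query."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(list(endpoints))
        self._lock = threading.Lock()  # keeps the rotation safe across threads

    def next_endpoint(self):
        with self._lock:
            return next(self._cycle)


# Hypothetical container nodes; replace with your own hosts.
balancer = RoundRobinBalancer([
    "http://container0.example.com:8080",
    "http://container1.example.com:8080",
    "http://container2.example.com:8080",
])

# Four consecutive picks: the fourth wraps around to the first node.
picked = [balancer.next_endpoint() for _ in range(4)]
```

A VIP or a hosted load balancer also handles node failures; a client-side rotation like this one would need its own health checks to do the same.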

You can debug the distributed query execution by adding &presentation.timing=true&trace.timestamps&tracelevel=5 to the search request. The response then contains a trace showing how the query was dispatched and how long each node took to match it. See also Scaling Vespa: https://docs.vespa.ai/en/performance/sizing-search.html
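
A small sketch of how a client might attach those parameters to a query URL. The host name, the /search/ path, and the YQL string are placeholders, and `trace.timestamps` is written here with an explicit `true` value, which is an assumption on top of the bare flag shown above.

```python
from urllib.parse import urlencode


def build_traced_query_url(container_url, yql):
    """Return a search URL with the timing and trace parameters added."""
    params = {
        "yql": yql,
        "presentation.timing": "true",
        "trace.timestamps": "true",  # assumption: explicit value for the flag
        "tracelevel": "5",
    }
    return container_url.rstrip("/") + "/search/?" + urlencode(params)


# Hypothetical container node and a placeholder query.
url = build_traced_query_url(
    "http://container0.example.com:8080",
    "select * from sources * where true",
)
```

Tracing adds overhead to the response, so it is probably best kept to debugging sessions rather than production traffic.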

Related

Multiple parallel data rest call in angular

This is more a question about the right approach:
We have a single-page web application in AngularJS that loads a view containing multiple diagrams. Each diagram fetches the data it needs to display through a REST service. Chrome limits the number of simultaneous connections to 6. As we have views with more than 10 diagrams, the data fetches are queued until the previous ones are resolved. To the user this looks as if the data fetch is slow.
Is there a way to execute all calls in parallel (same server, different REST endpoints)?
What would be a single-page solution that is not limited by the browser and provides faster throughput?
Caching in frontend is only partially applicable, due to the active filtering of data by the user.
One solution is to combine multiple requests into one request, which removes the overhead of establishing multiple connections.
You can build a proxy API that takes care of this.
The problem with combining endpoints is that if any one endpoint has a higher processing time, the responses of the other combined endpoints have to wait for it.
The best solution is to make the endpoints fast enough that 6 connections are sufficient.
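
A minimal sketch of the combining idea on the server side, assuming a hypothetical `fetch_batch` helper that a proxy endpoint could call. The lambdas stand in for the real per-diagram data sources.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_batch(fetchers):
    """Run every fetcher concurrently and return {name: result}.

    The call returns only once the slowest fetcher has finished, which is
    the drawback of combining endpoints described above.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical data sources standing in for the real REST endpoints.
diagram_data = fetch_batch({
    "sales": lambda: [1, 2, 3],
    "visits": lambda: [10, 20],
    "errors": lambda: [],
})
```

The browser then issues a single request for the whole view, and the parallelism moves to the server, where the 6-connection limit does not apply.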

Programmatically listing and sending requests to dynamic App Engine instances

I want to send a particular HTTP request (or otherwise communicate a message) to every (dynamic/autoscaled) instance which is currently running for a particular App Engine application.
My goal is to trigger each instance to discard some locally cached data (because I have just modified the underlying data and want them to reload it).
One possible solution is to store a value in Memcache, and have instances check this each time they handle a request to see if they should flush their cache. But this adds latency to every request.
Another possible solution would be to somehow stop all running instances. No fixed overhead, but some impact while instances are restarted.
An even less desirable solution would be to redeploy the application code in order to cause all instances to be stopped. This now adds additional delay on my end as a deployment takes some time.
You could use the management API to list instances for a given version, but I'd suggest that you'd probably want to use something like the PubSub API to create a subscription on each of your App Engine instances. Since each instance has its own subscription, any messages sent to the monitored queue will be received by all instances.
You can create the subscription at startup (the /_ah/start endpoint may be useful), and then delete it at shutdown (using the /_ah/stop endpoint).
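
To make the one-subscription-per-instance idea concrete, here is an in-memory model of the fan-out. It deliberately does not use the Pub/Sub client library; `Topic`, `Instance`, and the "flush-cache" message are hypothetical names chosen for the sketch.

```python
class Topic:
    """In-memory stand-in for a topic with one subscription per instance."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, instance_id, callback):
        self._subscriptions[instance_id] = callback

    def unsubscribe(self, instance_id):
        self._subscriptions.pop(instance_id, None)

    def publish(self, message):
        # Every current subscription receives its own copy of the message.
        for callback in list(self._subscriptions.values()):
            callback(message)


class Instance:
    def __init__(self, instance_id, topic):
        self.instance_id = instance_id
        self.cache = {"config": "stale"}
        self._topic = topic

    def start(self):  # would be triggered by the /_ah/start request
        self._topic.subscribe(self.instance_id, self._on_message)

    def stop(self):  # would be triggered by the /_ah/stop request
        self._topic.unsubscribe(self.instance_id)

    def _on_message(self, message):
        if message == "flush-cache":
            self.cache.clear()


topic = Topic()
instances = [Instance(f"instance-{i}", topic) for i in range(3)]
for instance in instances:
    instance.start()
instances[2].stop()           # a stopped instance no longer receives messages
topic.publish("flush-cache")  # the two running instances drop their caches
```

In a real deployment the subscribe and unsubscribe steps would create and delete actual subscriptions, and message delivery would be asynchronous.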

Camel and load balancer

I am using Camel to implement a route that loads data from a DB and applies some processing to it before producing results that are saved back to the DB.
This is part of a web application.
My problem is that this WAR is going to be deployed behind a load balancer on two servers. There will then be two Camel contexts with two routes performing the same processing on the same DB.
I will have cases where the same record is processed by both routes. How can I prevent the routes from performing the same job twice?
If you need that setup, so that each server might receive the same record, then you need an idempotent route, and you need to make sure the idempotent repository is shared between your machines. Using a database as the repository is an easy option. If you do not have a database, a Hazelcast repository might be an option.
The difficult part can be determining what is unique in your records, such as an order number, customer plus date/time, or an increasing transaction ID.
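
The idea behind a shared idempotent repository can be sketched as follows. This is not Camel's API: `IdempotentRepository`, `process_once`, and the table name are hypothetical, and the in-memory SQLite connection stands in for the database both servers would share.

```python
import sqlite3


class IdempotentRepository:
    """Records which keys have been claimed, using a primary-key constraint."""

    def __init__(self, connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS processed (record_key TEXT PRIMARY KEY)"
        )

    def claim(self, record_key):
        """Return True only for the first caller to claim this key."""
        try:
            with self._conn:  # commits on success, rolls back on error
                self._conn.execute(
                    "INSERT INTO processed (record_key) VALUES (?)",
                    (record_key,),
                )
            return True
        except sqlite3.IntegrityError:
            return False  # another route already claimed the record


def process_once(repo, record_key, handler):
    """Run handler for record_key unless it was already claimed."""
    if repo.claim(record_key):
        handler(record_key)
        return True
    return False


conn = sqlite3.connect(":memory:")  # stand-in for the shared database
repo = IdempotentRepository(conn)
handled = []
first = process_once(repo, "order-1001", handled.append)   # processed
second = process_once(repo, "order-1001", handled.append)  # skipped as duplicate
```

The uniqueness constraint does the real work here: whichever server inserts the key first wins, and the other one skips the record.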

How to fan out URL Fetch requests in a timely fashion?

Every minute or so my app creates some data and needs to send it out to more than 1000 remote servers via URL Fetch callbacks. The callback URL for each server is stored on separate entities. The time lag between creating the data and sending it to the remote servers should be roughly less than 5 seconds.
My initial thought is to use the Pipeline API to fan out URL Fetch requests to different task queues.
Unfortunately, tasks in a task queue are not guaranteed to be executed in a timely fashion. The delay between enqueueing a task and its actual execution can range from minutes to hours. In my experience this gap is regularly over a minute, so task queues are not necessarily appropriate.
Is there any way from within App Engine to achieve what I want? Maybe you know of an outside service that can do the fan out in a timely fashion?
Well, there's probably no good solution for GAE here.
You could keep a backend running that hammers the datastore/memcache every second for new data to send out, and then spawns dozens of async URL fetches.
But that's really inefficient...
If you want a third-party service, pubnub.com is capable of doing fan-out; however, I don't know if it would fit your setup.
How about using the async API? You could then do a large number of simultaneous URL calls, all from a single location.
If the performance is particularly sensitive, you could do them from a backend and use a B8 instance.
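
The general shape of an asynchronous fan-out from a single location might look like the sketch below. It uses Python's asyncio rather than the App Engine async URL Fetch API, and `fan_out`, `fake_send`, the concurrency limit, and the callback URLs are all assumptions made for illustration.

```python
import asyncio


async def fan_out(urls, send, max_concurrency=100):
    """Call send(url) for every URL, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            try:
                return url, await send(url)
            except Exception as exc:  # one failing server should not stop the rest
                return url, exc

    return dict(await asyncio.gather(*(guarded(url) for url in urls)))


async def fake_send(url):
    # Hypothetical stand-in for an HTTP POST to the callback URL.
    await asyncio.sleep(0)
    return 200


callback_urls = [f"https://server{i}.example.com/callback" for i in range(1000)]
results = asyncio.run(fan_out(callback_urls, fake_send))
```

With real network calls, the total time is roughly the number of URLs divided by the concurrency limit, times the typical response time, so the limit is the main knob for staying under the 5-second target.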

Turn off AppEngine (Java) sessions for certain requests

We're using sessions in our GAE/J application. Over the weekend, we had a large spike in our datastore writes that appears to have been caused by a large number of _ah_SESSION entities being created (about 100-200 every minute). Near as we can tell, there was a rogue task queue creating them because they stopped when we purged the queue. The task was part of a mapper process we run hourly.
We don't need sessions in that hourly mapper (or indeed in any of our task queues or cron jobs or many other requests). Is there a way to disable creating a session for selected URLs?
Unfortunately, that cannot be done.
This is particularly nasty when you have non-browser clients (devices via REST, or mapreduce jobs), where every request generates a new _ah_SESSION entity in the database.
The only way to avoid this is to write your own session handler, e.g. a servlet filter that sets and checks cookies and is configured to ignore certain paths.
EDIT:
I just realized that there could be another way: make sure your client (the mapreduce job) sets a dummy cookie with the proper name. GAE uses cookies named ACSID in production and dev_appserver_login on the dev server. Always use the same cookie value, so that all requests are treated as one user/session.
There will still be the overhead of looking up and saving session objects, but at least it will not create countless _ah_SESSION entities.
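
The path-ignoring session handler from the first suggestion boils down to logic like the following. It is written in Python rather than as a Java servlet filter, and the path prefixes, the `session_id` cookie name, and the function names are hypothetical.

```python
import uuid

# Hypothetical URL prefixes that never need a session.
SESSIONLESS_PREFIXES = ("/tasks/", "/cron/", "/mapreduce/")


def needs_session(path, sessionless_prefixes=SESSIONLESS_PREFIXES):
    """Return False for paths that should never create or load a session."""
    return not path.startswith(tuple(sessionless_prefixes))


def handle_request(path, cookies, session_store):
    """Return (session_id, session), or None when the path opts out.

    A new session is created only when the request has no known session cookie.
    """
    if not needs_session(path):
        return None
    session_id = cookies.get("session_id")
    if session_id not in session_store:
        session_id = uuid.uuid4().hex
        session_store[session_id] = {}
    return session_id, session_store[session_id]
```

Requests to the excluded prefixes never touch the session store, so task queue and cron traffic cannot create session entities.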
