Delays of 127s between google app engine and azure cognitive search - google-app-engine

Most of our services are on Google cloud, but we connect to Azure Cognitive search for full text searches.
Since around a month ago, random delays started appearing between App Engine and Cognitive Search, with affected requests consistently taking 127.2-127.3 seconds, i.e. exactly 127 seconds more than normal requests. This is specifically on the request to Cognitive Search, and not any other part of the infra or code.
Strangely, these delays do not appear when testing locally, from VMs, from k8s, or anywhere else. Still, after taking 3 weeks to answer, Google insists this is 'an Azure problem' caused by 'packet loss'. They do point out that some requests connect to an IPv4 address and others to an IPv6 address, but I have found no correlation there either.
What kind of problem could cause such an extremely specific delay?

It appears this is due to IPv6 problems: the requests library (or the underlying Python networking stack) switches to IPv4 after a number of exponentially backed-off retries, leading to a delay of roughly 2^7 - 1 = 127 seconds.
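As a sanity check on that number, a seven-step exponential backoff of 1, 2, 4, ... 64 seconds (the retry schedule itself is an assumption, not something confirmed by the libraries involved) adds up to exactly 127 seconds:

delays = [2 ** n for n in range(7)]  # assumed backoff schedule: 1, 2, 4, ..., 64 seconds
print(sum(delays))                   # 127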
Forcing an IPv4 connection can be done using this ugly hack:
import socket
import requests.packages.urllib3.util.connection as urllib3_cn

# https://pythonadventures.wordpress.com/2019/06/28/force-requests-to-use-ipv4/
# Monkey-patch urllib3's address-family lookup so name resolution only returns
# IPv4 addresses, preventing the connection from ever attempting IPv6.
urllib3_cn.allowed_gai_family = lambda: socket.AF_INET  # force IPv4

Related

Matomo Setup highly Unstable

I’m having trouble with my matomo in my openshift. This matomo is rather unstable.
When I start the pod, matomo runs fine for a (very) short time. Then matomo starts to respond with HTTP 504 regularly, eventually becoming unable to process any request successfully and responding with 504 only.
My guess is that matomo tries (lots of) communication with the internet. My openshift is not allowed to communicate with the internet. Could this be the cause of the trouble?
What is the recommended setup for matomo in general and matomo in openshift in particular?
I recently updated to matomo 4. It looks a tiny little bit more stable, but there is still a way to go before production use.
Best Regards
Sebastian
Not sure if you're still having the issue, but if your container has limited internet connectivity, then there might be something to it. AFAIK Matomo has the Provider plugin enabled by default, which performs an external DNS lookup. That takes place as part of tracking request processing, and could fail in your case. Check this: Matomo optimisation how-to
Tracker API: if the ‘Provider’ plugin is activated in your Matomo, the Internet provider is detected by doing a reverse DNS lookup, which adds a few milliseconds of overhead.
Other than that, I suppose you're speaking about timeouts on tracking requests - you could enable tracker debug mode (in config.ini.php) to see the output of request processing.
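For reference, a minimal sketch of what that looks like in config.ini.php (section and setting names as documented for Matomo's tracker debug mode; remember to turn it off again afterwards). If the Provider plugin's DNS lookups turn out to be the culprit, it can also be deactivated, e.g. with ./console plugin:deactivate Provider.

; config.ini.php - enable tracker debug output
[Tracker]
debug = 1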
If this is about reporting queries - then this may be a broader subject, as it may be an issue with archiving timeouts.
If this reaches you and you're able to respond, please specify whether this is about tracking or reporting requests :)

Concurrent requests handling on Google App Engine

I was experimenting with concurrent request handling on a few platforms.
The aim of the experiment was to have a broad measure of the capacity bounds of some selected technologies.
I set up a Linux VM on my machine with a basic Go http server (the vanilla http.HandleFunc of the http default package).
The server would then compute a modified version of the fasta algorithm that restricted threads and processes to 1, and return the result. N was set to 100000.
The algorithm runs in roughly 2 seconds.
I used the same algorithm and logic on a Google App Engine project.
The algorithm is written using the same code; just the handler setup is done in init() instead of main(), as per GAE requirements.
On the other end, an Android client spawns 500 threads, each one issuing a GET request in parallel to the fasta-computing server, with a request timeout of 5000 ms.
I was expecting the GAE application to scale and answer each request, and the local Go server to fail on some of the 500 requests, but the results were the opposite:
the local server correctly replied to each request within the timeout bounds, while the GAE application was able to handle just 160 requests out of 500. The remaining requests timed out.
I checked on the Cloud Console and I verified that 18 GAE instances were spawned, but still the vast majority of requests failed.
I thought that most of them failed because of the start-up time of each GAE instance, so I repeated the experiment right after but I had the same results: most of the requests timed out.
I was expecting GAE to scale to accommodate ALL the requests, believing that if a single local VM could successfully reply to 500 concurrent requests, GAE would have done the same, but this is not what happened.
The GAE console doesn't show any error and correctly reports the number of incoming requests.
What could be the cause of this?
Also, if a single instance could handle all the incoming requests on my machine using only goroutines, how come GAE needed to scale so much at all?
To make optimal usage of App Engine in terms of minimizing costs, you need to configure a few things in app.yaml:
Enable threadsafe: true - actually it's from the Python config and not applicable to Go, but I would set it just in case.
Adjust the scaling section (a minimal sketch follows the list below):
max_concurrent_requests - set to maximum 80
max_idle_instances - set to minimum 0
max_pending_latency - set it to automatic or greater than min_pending_latency
min_idle_instances - set it to 0
min_pending_latency - set it to a higher number. If you are OK with 1 second of latency and your handlers take on average 100 ms to process, set it to 900ms.
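A minimal app.yaml sketch pulling these settings together (the values simply mirror the list above and are illustrative, not recommendations; the section is called automatic_scaling in the App Engine standard environment):

# app.yaml (other settings such as runtime omitted)
automatic_scaling:
  max_concurrent_requests: 80
  min_idle_instances: 0
  max_idle_instances: 0
  min_pending_latency: 900ms
  max_pending_latency: automatic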
Then you should be able to process a lot of requests on a single instance.
If you're OK with burning cash for the sake of responsiveness & scalability - increase min_idle_instances & max_idle_instances.
Also, do you use similar instance types for the VM and GAE? The GAE F1 instance is not too fast and is better suited to async tasks like working with IO (datastore, http, etc.). You can configure a more powerful instance class to scale better for computation-intensive tasks.
Also, do you test on a paid account? Free accounts have quotas, and App Engine will refuse a percentage of requests if it believes the load would exceed the daily quota if it continued with the same pattern.
Extending on Alexander's answer.
The GAE scaling logic is based on incoming traffic trend analysis.
The key to handling your case - sudden spikes in traffic (which can't be taken into account in the trend analysis because of how quickly they vary) - is to have sufficient resident (idle) instances configured for your application to handle such traffic until GAE spins up additional dynamic instances. It can handle peaks as high as you want (if your pockets are deep enough).
See Scaling dynamic instances for more details.
Thanks everyone for their help.
Many interesting points and insights were made in the answers I received on this topic.
The fact that the Cloud Console was reporting no errors led me to believe that the bottleneck was happening after the real request processing.
I found the reason why the results were not as expected: bandwidth.
Each response had a payload of roughly 1 MB, so responding to 500 simultaneous connections from the same client would clog the line, resulting in timeouts.
This was obviously not happening when requesting against the VM, where the bandwidth is much larger.
Now GAE scaling is in line with what I expected: it successfully scales to accommodate each incoming request.
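A rough back-of-the-envelope check of that explanation (the client's link speed is an assumption, not a measured value):

payload_megabits = 1 * 8           # ~1 MB response per request
concurrent_requests = 500
link_mbps = 100                    # assumed downlink of the client machine
drain_seconds = payload_megabits * concurrent_requests / link_mbps
print(drain_seconds)               # 40.0 - far beyond the 5 second request timeout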

Google Cloud Bigtable Python Client Performance Issue

I'm running into a performance issue with the Google Cloud Bigtable Python client. I'm working on a Flask API that writes to and reads from a GCP Bigtable instance. The API uses the Python client to communicate with Bigtable, and was deployed to the GCP App Engine flexible environment.
Under low traffic, the API works fine. However, during a load test, the endpoints that read and write to Bigtable suffer a huge performance decrease compared to a similar endpoint that doesn't communicate with Bigtable. Also, a large percentage of requests sent to those endpoints received a 502 Bad Gateway, even when health checks were turned off in App Engine.
I'm aware that the client is currently in Alpha. I wonder if the performance issue is known, or if anyone else has run into the same issue.
Update
I found documentation from Google stating:
There are issues with the network connection. Network issues can reduce throughput and cause reads and writes to take longer than usual. In particular, you'll see issues if your clients are not running in the same zone as your Cloud Bigtable cluster.
In my case, my client was in a different region; moving it to the same region gave a huge increase in performance. However, the performance issue still exists, and the recommendation from the documentation is to put the client in the same zone as Bigtable.
I also considered using Container Engine or Compute Engine, where it is easier to specify the zone, but I want to stay with App Engine for its autoscaling functionality and managed services.
The Bigtable client takes somewhere between 3 ms and 20 ms to complete each request, and because Python is single-threaded, during that period it will just wait until the response comes back. The best solution we found was, for any writes, to publish the request to Pub/Sub and then use Dataflow to write to Bigtable. It is significantly faster because publishing a message from Python takes well under 1 ms, and because Dataflow can be deployed in exactly the same region as Bigtable and is easy to parallelize, it can write much faster.
This doesn't solve the scenario where you need frequent reads or where writes need to be instantaneous, though.
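A minimal sketch of that write path from the Flask side, assuming the google-cloud-pubsub client and hypothetical project/topic names (the Dataflow pipeline that drains the topic into Bigtable is not shown):

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic; the Dataflow job subscribes to this topic
topic_path = publisher.topic_path("my-project", "bigtable-writes")

def enqueue_write(row_key, values):
    # Publishing returns almost immediately; the actual Bigtable write
    # happens asynchronously in the Dataflow pipeline.
    payload = json.dumps({"row_key": row_key, "values": values}).encode("utf-8")
    publisher.publish(topic_path, data=payload)  # returns a future; no need to block here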

Platform as a Service to handle tens of thousands of simultaneous long term network connections

Is there a Platform as a Service (PaaS, e.g. Google App Engine or Windows Azure) that for a reasonable cost can be used to run a server for relaying peer to peer "real time" communication between clients?
In my case, this system will be used to relay (small amounts of) network traffic between small, resource-constrained home automation gadgets programmed in embedded C and Android/iOS apps.
The reason I am looking for a PaaS solution and not IaaS is that I would like to minimize the time and expertise needed for virtual computer, OS and server application maintenance.
Because of the resource constraints of the home automation gadget, a solution like PubNub is not possible. I have a few thousand bytes of available program flash for my embedded C code, so the protocol used would have to be pretty basic (e.g. raw TCP or UDP, HTTP or WebSockets).
Using "long polling" with Google App Engine (GAE) would be too expensive, as they bill for the whole duration of the connection even if almost no traffic is transferred. GAE supports Sockets, but only outgoing sockets, not listening sockets on the server. Is it possible to get around this limitation somehow, e.g. by sending a UDP packet to GAE first (to punch a hole in the user's firewall) and having GAE then initiate an outgoing socket back to the home automation gadget or Android/iOS app?
Or do you see any other possible solutions using the PaaS aspects of Windows Azure or other PaaS providers?
Any tips or possible solutions are greatly appreciated!
AMQP seems like it would fit your protocol needs, and the Apache Qpid/Proton project has some client libraries; their C code might meet your needs. On the service side you could test things out using Azure Service Bus, since it speaks AMQP. If that didn't meet your needs, you could host a worker role and run one of the AMQP clients in there.
Another option to consider is ZeroMQ. They have a lot of very simple client APIs, and building a relay service that runs in a Worker role would be a trivial amount of code. Java Sample, C# Sample. Those samples use an "inproc" transport, and I'm guessing you want to switch that to TCP.
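To give a feel for how little code such a relay needs, here is a minimal sketch using pyzmq (the answer's samples are Java and C#; the ports, socket types, and TCP transport here are illustrative choices):

import zmq

context = zmq.Context()

# Gadgets connect here and send their messages over plain TCP
frontend = context.socket(zmq.ROUTER)
frontend.bind("tcp://*:5555")

# Mobile apps connect here to receive the relayed traffic
backend = context.socket(zmq.DEALER)
backend.bind("tcp://*:5556")

# Built-in blocking proxy that shuttles messages between the two sockets
zmq.proxy(frontend, backend)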

Realtime game on Google Cloud : Channel API or Compute Engine?

We need to develop a multi-player game with real-time performance.
This needs to work worldwide (servers in America, Europe, Asia) and support heavy traffic. We are using Google Cloud services for the hosting.
We're thinking of references like Jam with Chrome, Chrome Maze or Cube Slam.
The game:
2 players compete in a race
We need to simultaneously display the progression of the 2 players
Each match could last around 30 to 45 seconds
The hosting:
We will obviously host the website on AppEngine, automagically scaling,
but are thinking about 2 solutions for the real-time servers :
Using websocket servers with Compute Engine
Like they did for Jam with Chrome, Maze, etc.
Developing our own websocket servers (technology TBD), deploying on datacenters in Europe, US, Asia, handling scaling, syncing between them, computing latency issues on servers and clients, etc.
But it's technically challenging, as we are very short on time and are missing a sysadmin and network guy for now.
Or using Channel API
We understand that it's not a websocket platform, and that real-time performance is lower.
But it would be way more simple and secure for us and the time we have.
So, we would also like to know more about that.
In any case, we think we could use some graphical tricks on front ends, to make it look like real-time, but it really depends if we have a 100~500ms or a 500ms~10s latency.
Some questions:
What would the latency range values look like for the different solutions?
(Jam w/ Chrome got 100ms with GCE; could the Channel API reach several seconds?)
How would Channel API servers handle high traffic, how does scaling work, and could the latency go very high? (there's no info about that in the Channel docs?)
What if someone in France plays with someone in the US, connecting to different servers and waiting for them to sync - how do we deal with that?
Any advice or experience to share?
Any interesting reading or viewing? (I've seen some, but nothing very precise)
Any other solution?
Thank you for any helpful comment!
EDIT:
Only 2 players connected together, potentially from different world zones, no broadcasting needed.
We could find some front-side tricks to avoid server-side processing. This is a race between 2 players, so we actually just need to compare their progression, and the real winner resolution is not that important, as there is nothing real to win; this is more for fun.
If you need a server for processing the data:
I would definitely go with websockets on Compute Engine!
The Channels API is much slower, and also quite unpredictable (latency differs from message to message)! Data has to go to the Channels server, which sends it to the App Engine instance, which has to do a request back to the Channels server, which will push the message to the client. There is too much going on there if you want to keep latency down!
Here is a Channels API stress test:
http://channelapistresstest.appspot.com/
Try clicking the "send 5" button a lot, and you will see latency numbers going up to several seconds.
The Channels API is also quite expensive under heavy load (it probably does not scale well, even if Google of course can solve that with more instances).
When keeping latency down, geolocation is quite important. With a websocket server on Compute Engine, you can send your European visitors to Google's European datacenter and your American visitors to the US datacenter (using the geolocation headers that App Engine will provide). You have no such control with the Channels API (or App Engine, which all your messages are relayed through). Maybe Google has edge servers for the Channels API (I don't know), but if your App Engine instance is on the other side of the planet, that does not matter.
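A minimal sketch of that routing decision, assuming a Flask handler on App Engine and hypothetical relay hostnames (X-AppEngine-Country is a header App Engine adds to incoming requests):

from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical websocket relays running on Compute Engine in each region
RELAYS = {"EU": "wss://eu.relay.example.com", "US": "wss://us.relay.example.com"}
EU_COUNTRIES = {"FR", "DE", "GB", "ES", "IT", "NL", "SE"}

@app.route("/relay")
def pick_relay():
    # App Engine sets this header to the caller's ISO country code
    country = request.headers.get("X-AppEngine-Country", "US")
    region = "EU" if country in EU_COUNTRIES else "US"
    return jsonify(endpoint=RELAYS[region])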
If you do NOT need a server for processing the data:
You should establish a peer-to-peer connection with WebRTC, sending stuff directly between the users' browsers. That is what Cube Slam does. (WebRTC requires some initial handshaking ("signaling") so the two peers can find each other, and the Channels API would work fine for that handshaking; it's just a couple of messages to establish the peer-to-peer connection.)
WebRTC DataChannels API will give you a nice websocket-like interface like channel.onmessage = function(e) { yadayada()... }; and channel.send("yadayada"); to send your data between the peers.
Occasionally, WebRTC is not able to make a peer-to-peer connection. Then it will fall back to a TURN server, which relays traffic between the peers. Cube Slam uses TURN servers running on Compute Engine (in both Europe and America, to keep latency down), but that is just the fallback when true peer-to-peer is not possible.
It also depends on other things like scalability.
Ingress is built on App Engine and, apart from the occasional cache glitch, it is pretty impressive.
Remember that the Channel API uses talk.google, which is the service that Hangouts is built on. Scalable and real time.
Personally, if your traffic levels are going to be erratic and unpredictable, go with App Engine. If you think they can be controlled and predicted, use Compute Engine or something else.
Alfred's answer is the best in the frame of the question I asked.
Thank you very much!
However, I forgot to mention a few important points, and the scope has changed a bit:
We have very little development time (about 1 week only)
This is for a campaign that will last only 3 weeks (we'll need to keep it online for a few months afterwards, but it's not as if we need a long-lasting architecture)
We need to make it work for as broad a browser audience as possible (WebRTC only runs on Chrome & Firefox for now)
Given these points, we eventually came up with a 3rd solution:
Using a real-time PaaS.
It's way easier and faster to develop, way cheaper as we don't need a solid backend developer and system/network admin, and we can concentrate more on the project than on the infrastructure and platform.
There are a couple of services out there that seem good, already hosting MMO RPGs and the like, worldwide, with low latency and good scaling systems.
Here is a list of providers :
https://github.com/leggetter/realtime-web-technologies-guide/blob/master/guide.md
