Is there any way for the Google App Engine's urlfetch to open and keep open a Twitter Streaming API connection? - google-app-engine

The Twitter Streaming API says that we should open an HTTP request and parse updates as they come in. I was under the impression that Google's urlfetch cannot keep the HTTP request open past 10 seconds.
I considered having a cron job that polled my Twitter account every few seconds, but I think Google App Engine only allows cron jobs once a minute. However, my application needs near-realtime access to my Twitter #replies (preferably with a lag of 10 seconds or less).
Is there any method for receiving real-time updates from Twitter?
Thanks!

Unfortunately, you can't use the urlfetch API for 'hanging gets'. All the data will be returned when the request terminates, so even if you could hold it open arbitrarily long, it wouldn't do you much good.
Have you considered using Gnip? They provide a push-based 'web hooks' notification system for many public feeds, including Twitter's public timeline.

I'm curious.
Wouldn't you want this to be polling twitter on the client side? Are you polling your public feed? If so, I would decentralize the work to the clients rather than the server...

It may be possible to use Google Compute Engine https://developers.google.com/compute/ to maintain unrestricted hanging GET connections, then call a webhook in your AppEngine app to deliver the data from your compute engine VM to where it needs to be in AppEngine.

Related

Google App Engine streaming data into Bigquery: GCP architecture

I'm working with a Django web app deployed on Google App Engine flexible environment.
I'm streaming my data while processing requests in my views using bigquery.Client(), but I don't think this is the best way to do it. Do I need to delegate this process outside of the view (using Pub/Sub, tasks, Cloud Functions, etc.)? If so, please suggest a suitable architecture: which GCP product should I use, how do I connect it, and what should I read?
Based on your comment, I would recommend Cloud Run.
Cloud Run is a serverless, container-based product. You write a web server (that handles your POST request), wrap it in a container, and deploy it on Cloud Run.
With a brand new feature named "always on", the CPU is not throttled after the response is sent (throttling is the normal behavior). With always on, you keep the full CPU until the Cloud Run instance is shut down (usually after 15 minutes, but it can be sooner).
The benefit of the feature is that you can return the response to the client immediately and then continue to process your data asynchronously and store it in BigQuery (in streaming mode).
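As a rough illustration of "respond first, write later", here is a minimal sketch in Python. It is not Cloud Run or BigQuery specific: BackgroundWriter and write_rows are hypothetical names, and write_rows stands in for whatever actually sends the rows (for example a small wrapper around the BigQuery client's streaming insert). It also assumes the instance keeps its CPU after the response is sent, which is exactly what the feature above is for.

```python
import queue
import threading


class BackgroundWriter:
    """Buffer rows from request handlers and flush them from a worker thread."""

    def __init__(self, write_rows, batch_size=100):
        self.write_rows = write_rows  # hypothetical sink, e.g. a BigQuery wrapper
        self.batch_size = batch_size
        self.rows = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, row):
        """Called from the view: enqueue the row and return immediately."""
        self.rows.put(row)

    def close(self):
        """Flush whatever is left and stop the worker."""
        self.rows.put(None)  # sentinel telling the worker to exit
        self.worker.join()

    def _run(self):
        batch = []
        while True:
            row = self.rows.get()
            if row is not None:
                batch.append(row)
            # Flush when the batch is full, the queue is drained, or on shutdown.
            if batch and (row is None
                          or len(batch) >= self.batch_size
                          or self.rows.empty()):
                self.write_rows(batch)
                batch = []
            if row is None:
                return
```

In a view you would call submit(row) and return the HTTP response right away; the worker thread does the slow write in the background.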

what google cloud component to use for my client-server pub/sub?

I'm building an app on Google Cloud that takes users' audio recordings. Users record one or more audio clips and upload them to the backend, which processes the clips, runs a machine learning prediction model I built, and returns an integer to the user for each uploaded clip. Processing and predicting one clip takes about 10 seconds. Users can upload 20 clips at a time.
What I have so far is:
HTML, JavaScript, and CSS on the client side. The upload functionality is async, using fetch, and returns a promise.
The backend runs on Google App Engine (python3.7), with Firebase Authentication, Google Cloud Storage, and Cloud Logging.
The processing and the prediction run on Google Cloud Functions.
My question is as follows:
Since it might take up to 200-300 seconds for the processing to complete, how should I handle the task once users hit the upload button? Is a simple request-response enough?
I have investigated the following:
Google Cloud tasks. This seems inappropriate, because client actually needs to know when the processing is done. There is really no call back when the task is done
Google Cloud PubSub. There is a call back for when the job is done (subscribe), but it's server side. This seems more appropriate for server to server communication, instead of client-server.
What is the appropriate piece of tech to use in this case?
There are two ways to improve the user experience.
Firstly, you can parallelize the processing. All the predictions should be handled by the same Cloud Function. In App Engine, use multi-threaded processing that invokes your Cloud Function for a single audio clip, and do that 20 times in parallel. I don't know how to achieve that with async in Python, but I know that you can.
Then, if you implement that, you will wait until all the audio clips have been processed before sending a response to your users. The total should be between 15 and 20 seconds.
If you use Cloud Run, you can use streaming (a partial HTTP response). You could then send a partial response each time you get a result from your Cloud Function, whichever audio clip it belongs to (the 3rd can finish before the 1st).
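The parallel fan-out described above could look roughly like the following sketch, which uses threads rather than async. predict_clip is a hypothetical stand-in for the HTTP call to the Cloud Function (here it just sleeps briefly and returns a placeholder integer), and predict_all is an illustrative name, not part of any Google library.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def predict_clip(clip_name):
    """Hypothetical stand-in for the HTTP call to the Cloud Function."""
    time.sleep(0.1)  # simulates the ~10 s of processing, scaled down
    return len(clip_name)  # placeholder "prediction"


def predict_all(clip_names, call=predict_clip, max_workers=20):
    """Invoke `call` once per clip in parallel; yield results as they finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(call, name): name for name in clip_names}
        for future in as_completed(futures):
            # Results arrive in completion order, not submission order.
            yield futures[future], future.result()
```

Collecting everything with dict(predict_all(clips)) gives the "wait for all clips" behavior, while iterating over the generator and writing each result as it arrives matches the partial-response idea.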
As you note, Cloud Pub/Sub is more appropriate for server-to-server communication since there is currently a limit of 10,000 subscriptions per-topic/per-project (this may change in the future).
Firebase Cloud Messaging may be a more appropriate solution for client-server messaging.

Firebase: How to awake App Engine when client changes db?

I'm running a backend app on App Engine (still on the free plan), and it supports client mobile apps in a Firebase Realtime Database setup. When a client makes a change to the database, I need my backend to review that change, and potentially calculate some output.
I could have my App Engine instance sit awake and listen on Firebase ports all the time, waiting for a change anywhere in the database, but that would keep my instance awake 24/7 and wouldn't support load balancing.
Before I switched to Firebase, my clients would manually wake up the backend by sending a REST request describing the change they wanted to perform. Now that Firebase allows the clients to make changes directly, I was hoping they wouldn't need to issue a manual request. I could continue to send a request from the client, but that solution wouldn't be robust: it would fail to inform the server if the request didn't go through for some reason and the user switched off the client before it succeeded in sending it. Firebase has its own mechanism to retain changes, but my request would need a similar mechanism. I'm hoping there's an easier solution than that.
Is there a way to have Firebase produce a request automatically and wake up my App Engine when the db is changed?
Look at the new (beta) Firebase Cloud Functions. With them, you can have Node.js code run on database events, pre-process the change, and call your App Engine app.
https://firebase.google.com/docs/functions/
Firebase currently does not have support for webhooks.
Have a look at https://github.com/holic/firebase-webhooks
From Listening to real-time events from a web browser:
Posting events back to App Engine
App Engine does not currently support bidirectional streaming HTTP
connections. If a client needs to update the server, it must send an
explicit HTTP request.
The alternative doesn't quite help you as it would not fit in the free quota. But here it is anyways. From Configuring the App Engine backend to use manual scaling:
To use Firebase with App Engine standard environment, you must use
manual scaling. This is because Firebase uses background threads to
listen for changes and App Engine standard environment allows
long-lived background threads only on manually scaled backend
instances.

Google App Engine long process that needs to return to consumer

I am trying to use Google App Engine as a mediator between a mobile platform and a popular cloud storage service. The mobile app tells App Engine which parts of a particular file it wants from the cloud storage; App Engine should then fetch that file, process it, and extract the requested parts to send back to the mobile app. Yes, it has to be set up this way: the mobile OS is unable to read files of this particular format, but App Engine can, and this particular cloud storage is integrated with a required piece of desktop software.
The issue: processing the file and extracting the data exceeds the 60-second response limit, and the Task Queue cannot return data to the mobile app that made the original request. In most cases, the data would be ready to return in 1-3 minutes. I realize that the Channel API could let me receive real-time messages via a web view when the data is ready, but this API is very expensive, since I would need to allow for thousands of connections a day and each user has to have their own channel per the docs. Should I look into polling (outside the Channel API)? What design models, methods, or even other services should I look into? (I have been using GAE because of its ease of use, automatic scaling, and security; I'm a one-man show.)
The product relies on a capability that only exists in Java to process the data. Thanks.
You could return a transaction id to the client, and then let the client periodically ping your server with that id to see if the long process is complete.
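A minimal sketch of that pattern follows. It is written in Python only to show the flow (the question's backend is Java), and it keeps job state in an in-memory dict and runs the work in a plain thread. Both are assumptions made for illustration; on App Engine the state would more likely live in the datastore and the work in a task queue or backend. start_job, poll_job, and wait_for_job are hypothetical names.

```python
import threading
import time
import uuid

_jobs = {}  # job_id -> {"status": ..., "result": ...}; in-memory for illustration
_lock = threading.Lock()


def start_job(work, *args):
    """Start `work(*args)` in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "pending", "result": None}

    def run():
        try:
            outcome = {"status": "done", "result": work(*args)}
        except Exception as exc:
            outcome = {"status": "failed", "result": str(exc)}
        with _lock:
            _jobs[job_id] = outcome

    threading.Thread(target=run, daemon=True).start()
    return job_id


def poll_job(job_id):
    """What a status endpoint would return when the client pings with the id."""
    with _lock:
        return dict(_jobs.get(job_id, {"status": "unknown", "result": None}))


def wait_for_job(job_id, timeout=5.0, interval=0.05):
    """Client-side loop: poll until the job leaves 'pending' or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = poll_job(job_id)
        if state["status"] != "pending":
            return state
        time.sleep(interval)
    return poll_job(job_id)
```

The first request returns only the id, so it finishes well inside the response limit; the mobile app then calls the status endpoint every few seconds until the status is no longer pending.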
Appengine 'Backend' instances do not have the 60 seconds limit. You can see the comparison between normal frontend instance and backend instance here: https://developers.google.com/appengine/docs/java/backends/

Stack Exchange API compliant request throttle implementation on Google App Engine Cloud infrastructure

I have been writing a Google Chrome extension for Stack Exchange. It's a simple extension that allows you to keep track of your reputation and get notified of comments on Stack Exchange sites.
I've run into some issues that I can't solve on my own.
My extension uses Google App Engine as its back end to make external requests to the Stack Exchange API. A single client request from the extension for new comments on a single site can trigger plenty of requests to the API endpoint to prepare the response, even for a non-skeetish user. The average user has accounts on at least 3 sites in the Stack Exchange network, and some have more than 10!
Stack Exchange API has request limits:
A single IP address can only make a certain number of API requests per day (10,000).
The API will cut my requests off if I make more than 30 requests over 5 seconds from single IP address.
It's clear that all requests should be throttled to 30 per 5 seconds and currently I've implemented request throttle logic based on a distributed lock with memcached. I'm using memcached as a simple lock manager to coordinate the activity of GAE instances and throttle UrlFetch requests.
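For reference, the 30-requests-per-5-seconds rule itself can be sketched as a sliding-window limiter. This is not the memcached-based distributed lock described above: it is a single-process illustration in Python with a hypothetical class name, and a version shared across GAE instances would need to keep the timestamps in shared storage instead of a local deque.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds (single process only)."""

    def __init__(self, limit=30, window=5.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        """Return True and record the call if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```

A caller would check try_acquire() before each UrlFetch request and delay or queue the request when it returns False.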
But I think it's a big failure to limit such powerful infrastructure to issue no more than 30 requests per 5 sec. Such api request rate does not allow me to continue development of new interesting and useful features and one day it will stop working properly at all.
My app now has 90 users and is growing, and I need to come up with a way to maximize the request rate.
As is known, App Engine makes external UrlFetch requests through a shared pool of different IPs.
My goal is to write request-throttling functionality that ensures compliance with the API terms of usage while making use of GAE's distributed capabilities.
So my question is how to provide the maximum practical API throughput while complying with the API terms of usage and utilizing GAE's distributed capabilities.
Advice to use another platform/host/proxy is just useless, in my mind.
If you are searching for a way to programmatically manage Google App Engine's shared pool of IPs, I firmly believe that you are out of luck.
Anyway, quoting this advice that is part of the faq, I think you have more than a chance to keep on running your awesome app:
What should I do if I need more
requests per day?
Certain types of applications -
services and websites to name two -
can legitimately have much higher
per-day request requirements than
typical applications. If you can
demonstrate a need for a higher
request quota, contact us.
EDIT:
I was wrong, actually you don't have any chance.
Google App Engine [app]s are doomed.
First off: I'm using your extension and it rocks!
Have you considered using memcached and caching the results?
Instead of taking the results from the API directly, first try to find them in the cache. If they are there, use them; if they are not, retrieve them, cache them, and let them expire after X minutes.
Second, try to batch up users' requests: instead of asking for the reputation of a single user, ask for the reputation of several users together.
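The cache-then-expire idea can be sketched as follows. This is a plain in-memory Python illustration with hypothetical names (TTLCache, get_or_fetch); on App Engine, memcache would take the place of the dict, and fetch would be whatever function actually calls the Stack Exchange API.

```python
import time


class TTLCache:
    """Cache values for `ttl_seconds`, calling `fetch` only on a miss or expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = fetch(key)  # miss or expired: go to the API
        self.entries[key] = (now + self.ttl, value)
        return value
```

With a TTL of a few minutes, repeated polls from the same user within that window cost no API requests at all.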
