Best way to implement Google Pub/Sub subscribe + HTTP REST API + database

I want to ask about the best way to implement Pub/Sub with a REST API.
I want to subscribe to data from Pub/Sub, then send that data via an HTTP REST API, hitting the endpoint for every message received.
Then, after the endpoint is hit, save the data into a database.
So there are three jobs in the service:
Subscribe to Pub/Sub
Send the payload from step 1 via the HTTP REST API
Save the data into the database
My first approach would be to create two different services.
The first service subscribes to data from Pub/Sub.
The second service sends the payload from step 1 to the HTTP REST API and, in the same service, saves the payload into the database.
What is the best way to implement it?
What if the volume of data from Pub/Sub becomes too large? Could the service go down or become slow?
I don't have much experience with Pub/Sub.
Thank you.

Because your architecture is asynchronous, I propose this approach:
1. Pub/Sub push subscription (initial topic)
2. Cloud Run service that calls the HTTP service. If the call succeeds, it publishes a message to a second Pub/Sub topic.
3. Pub/Sub push subscription (topic for successful HTTP REST API calls)
4. Cloud Run service that saves the data into the database.
That architecture is idempotent and scalable. You can tune Cloud Run's concurrency parameter to save money or to speed up processing.
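To make step 2 concrete, here is a minimal sketch of the first Cloud Run service, assuming a Flask app; the environment variables, topic name, and downstream endpoint are placeholders, not anything from the question:

```python
# Sketch of the first Cloud Run service: receive a Pub/Sub push message,
# call the REST API, and on success publish to a second topic.
import base64
import os

import requests
from flask import Flask, request
from google.cloud import pubsub_v1

app = Flask(__name__)
publisher = pubsub_v1.PublisherClient()
# Placeholders: set these for your own project.
TOPIC_PATH = publisher.topic_path(os.environ["GCP_PROJECT"], "http-call-succeeded")
REST_ENDPOINT = os.environ["REST_ENDPOINT"]  # e.g. https://api.example.com/ingest

@app.route("/", methods=["POST"])
def handle_push():
    # Pub/Sub push wraps the payload: {"message": {"data": "<base64>", ...}}
    envelope = request.get_json()
    payload = base64.b64decode(envelope["message"]["data"])

    # Job 2: forward the payload to the HTTP REST API.
    resp = requests.post(REST_ENDPOINT, data=payload, timeout=10)
    if resp.status_code >= 400:
        # A non-2xx response makes Pub/Sub redeliver the message later,
        # which is why the handler must be idempotent.
        return "upstream error", 500

    # Hand-off to job 3: the second Cloud Run service subscribes to this
    # topic and performs the database write.
    publisher.publish(TOPIC_PATH, payload).result()
    return "", 204
```

The second service mirrors this handler: it receives the push message from the success topic and writes the payload to the database.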

Related

How to call streaming API from Azure Logic Apps?

Is there any way that I can use Azure Logic Apps to call streaming APIs? The API doesn't support a chunking parameter, so the Range header is not an option.
I am looking for logic similar to the Python Requests library's Response.iter_content(), where I can iterate over the response data in chunks and write them back to the database.
My desired workflow: the Logic App gets triggered once a day at a specific time. It calls the streaming API using the HTTP connector and streams in all the data (the data is much larger than the HTTP connector's internal buffer size). The API supports streaming requests, and I am able to iterate through the response in chunks using Python's Response.iter_content(chunk_size=1048576) to fetch it all.
PS: I know I can use an Azure Function and have it triggered within my Logic App workflow, but I'd prefer to use Logic Apps' native connectors.
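For reference, the Python pattern the question alludes to looks like this; it is only a sketch, and the stream URL and database write are placeholders:

```python
# Stream a large response in fixed-size chunks instead of buffering it all.
import requests

STREAM_URL = "https://example.com/streaming-api"  # placeholder

def write_chunk_to_db(chunk: bytes) -> None:
    # Placeholder: replace with your actual database write.
    ...

with requests.get(STREAM_URL, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    # Iterate over the body in 1 MiB chunks as they arrive.
    for chunk in resp.iter_content(chunk_size=1048576):
        if chunk:  # skip keep-alive chunks
            write_chunk_to_db(chunk)
```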

How to perform database queries in async microservices?

I have a problem with one concept in async microservices.
Assume that all my services are subscribed to an event bus, and I have an exposed API Gateway which takes HTTP requests and translates them to the AMQP protocol.
How do I handle GET requests to my API Gateway? Should I use RPC? For a single entity it's OK, but what about search or filtering (e.g. getting games by genre from a Games service)?
I'm thinking about using RPC for getting single entities by ID, and creating a separate Search service with Elasticsearch which will expose some GET endpoints to the API Gateway. But maybe there's a simpler solution to my problem. Any ideas?
By the way, is it correct to translate HTTP requests from the API Gateway to AMQP messages?
Please take a look at Event Sourcing and CQRS. You can also take a look at my personal coding project:
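To make the CQRS read side concrete, here is a minimal sketch under the question's own assumptions: a hypothetical FastAPI gateway, an Elasticsearch read model in a games index kept up to date from the event bus, and a stubbed RPC helper for single entities:

```python
# Sketch of a CQRS read path in the API Gateway: RPC for single
# entities, Elasticsearch for search/filtering.
import requests
from fastapi import FastAPI

app = FastAPI()
ES_URL = "http://localhost:9200"  # placeholder Elasticsearch address

def rpc_call(method: str, params: dict):
    # Placeholder: request/reply over AMQP (e.g. with pika) would go here.
    ...

@app.get("/games/{game_id}")
def get_game(game_id: str):
    # Single entity by ID: RPC to the Games service.
    return rpc_call("games.get_by_id", {"id": game_id})

@app.get("/games")
def search_games(genre: str):
    # Search/filtering: query the Elasticsearch read model directly
    # instead of round-tripping through the Games service.
    resp = requests.get(
        f"{ES_URL}/games/_search",
        json={"query": {"match": {"genre": genre}}},
        timeout=5,
    )
    return [hit["_source"] for hit in resp.json()["hits"]["hits"]]
```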

Firebase: How to awake App Engine when client changes db?

I'm running a backend app on App Engine (still on the free plan), and it supports client mobile apps in a Firebase Realtime Database setup. When a client makes a change to the database, I need my backend to review that change, and potentially calculate some output.
I could have my App Engine instance sit awake and listen on Firebase ports all the time, waiting for changes anywhere in the database, but that would keep my instance awake 24/7 and wouldn't support load balancing.
Before I switched to Firebase, my clients would manually wake up the backend by sending a REST request describing the change they wanted to perform. Now that Firebase allows the clients to make changes directly, I was hoping they wouldn't need to issue a manual request. I could continue to produce a request from the client, but that solution wouldn't be robust: it would fail to inform the server if, for some reason, the request didn't come through and the user switched off the client before it was sent successfully. Firebase has its own mechanism for retaining changes, but my request would need a similar mechanism. I'm hoping there's an easier solution than that.
Is there a way to have Firebase produce a request automatically and wake up my App Engine when the db is changed?
Look at the new (beta) Firebase Cloud Functions. With those, you can have Node.js code run on database events, pre-process the change, and call your App Engine backend.
https://firebase.google.com/docs/functions/
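To illustrate the App Engine side, here is a minimal sketch of the webhook endpoint such a Cloud Function could call; the route, payload shape, and processing hook are assumptions:

```python
# Sketch of an App Engine (Flask) endpoint woken by a Cloud Function
# whenever the Firebase database changes. Route and payload are assumed.
from flask import Flask, request

app = Flask(__name__)

def process_change(change: dict) -> None:
    # Placeholder for the backend's review/calculation logic.
    ...

@app.route("/firebase-change", methods=["POST"])
def firebase_change():
    # e.g. {"path": "/games/42", "before": {...}, "after": {...}}
    change = request.get_json()
    # The instance only wakes up when a change actually happens,
    # so it no longer needs to sit listening 24/7.
    process_change(change)
    return "", 204
```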
Firebase currently does not have support for webhooks.
Have a look at https://github.com/holic/firebase-webhooks
From Listening to real-time events from a web browser:
Posting events back to App Engine
App Engine does not currently support bidirectional streaming HTTP connections. If a client needs to update the server, it must send an explicit HTTP request.
The alternative doesn't quite help you, as it would not fit in the free quota. But here it is anyway. From Configuring the App Engine backend to use manual scaling:
To use Firebase with the App Engine standard environment, you must use manual scaling. This is because Firebase uses background threads to listen for changes, and the App Engine standard environment allows long-lived background threads only on manually scaled backend instances.

Collecting data from a public mobile application

I'd like to collect information from a mobile application I created. The app allows users to use it without authentication, and I'd like to collect the data into a highly available service such as AWS SQS so that I won't miss any data.
The application is always connected to the internet, so there is no need for offline collection of the data.
What bothers me is how I can send the data in a secure manner, so that users will not be able to send fake data to the same endpoint I'm using.
Google Analytics is not a fit here because I need access to the raw data, not only aggregates of it.
You should look into STS for getting temporary access credentials instead of hard-coding AWS credentials into your app.
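As a minimal server-side sketch of what STS issues, assuming a hypothetical IAM role; a mobile app would typically obtain such credentials via Amazon Cognito rather than calling STS directly:

```python
# Mint short-lived credentials with STS instead of shipping long-lived
# AWS keys inside the app binary. The role ARN is a placeholder.
import boto3

sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/MobileDataWriter",  # placeholder
    RoleSessionName="mobile-ingest",
    DurationSeconds=900,  # shortest allowed lifetime
)
# Temporary AccessKeyId, SecretAccessKey, and SessionToken, plus Expiration.
creds = resp["Credentials"]
```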
The fact that your application does not require authentication does not necessarily mean you are at an increased likelihood of having a malicious actor send bad data to your service. If your app had authentication it would still be possible for a malicious actor to reverse engineer the requests and send bad data using the authenticated credentials.
While sending data directly to SQS is a valid option, you could also send the data to SNS if you want the ability to fan out to multiple systems, such as multiple SQS queues.
You could also look into using API Gateway + Lambda as the service called from your app, even if the Lambda function only sends the data to SQS, as this would allow additional processing flexibility in the future, such as validating input with extra logic before it is sent to SQS. However, this type of logic could just as easily be performed when the messages are pulled off the queue.
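A minimal sketch of that API Gateway + Lambda option; the queue URL and the validation rule are placeholders, not a prescription:

```python
# Lambda behind API Gateway (proxy integration): validate the payload,
# then forward it to SQS. Queue URL and validation are placeholders.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/app-events"  # placeholder

def handler(event, context):
    # With proxy integration, the request body arrives as a string.
    payload = json.loads(event["body"])

    # Hypothetical validation: reject malformed data before it reaches
    # the queue; the same check could instead run on the consumer side.
    if "device_id" not in payload:
        return {"statusCode": 400, "body": "missing device_id"}

    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))
    return {"statusCode": 202, "body": "accepted"}
```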

Is there any way for Google App Engine's urlfetch to open and keep open a Twitter Streaming API connection?

The Twitter streaming API says that we should open an HTTP request and parse updates as they come in. I was under the impression that Google's urlfetch cannot keep an HTTP request open past 10 seconds.
I considered having a cron job that polled my Twitter account every few seconds, but I think Google App Engine only allows cron jobs once a minute. However, my application needs near-real-time access to my Twitter #replies (preferably with a lag of 10 seconds or less).
Is there any method for receiving real-time updates from Twitter?
Thanks!
Unfortunately, you can't use the urlfetch API for 'hanging gets'. All the data will be returned when the request terminates, so even if you could hold it open arbitrarily long, it wouldn't do you much good.
Have you considered using Gnip? They provide a push-based 'web hooks' notification system for many public feeds, including Twitter's public timeline.
I'm curious.
Wouldn't you want this to be polling twitter on the client side? Are you polling your public feed? If so, I would decentralize the work to the clients rather than the server...
It may be possible to use Google Compute Engine (https://developers.google.com/compute/) to maintain unrestricted hanging GET connections, then call a webhook in your App Engine app to deliver the data from your Compute Engine VM to where it needs to be in App Engine.
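As a sketch of that relay, with both URLs as placeholders (Twitter's real streaming endpoints also require authentication):

```python
# Compute Engine relay: hold a streaming HTTP connection open and
# forward each newline-delimited event to an App Engine webhook.
import requests

STREAM_URL = "https://stream.example.com/updates"     # placeholder stream
WEBHOOK_URL = "https://your-app.appspot.com/webhook"  # placeholder webhook

with requests.get(STREAM_URL, stream=True, timeout=90) as resp:
    resp.raise_for_status()
    # iter_lines yields events as they arrive, without waiting for the
    # request to terminate the way urlfetch does.
    for line in resp.iter_lines():
        if line:  # skip keep-alive newlines
            requests.post(WEBHOOK_URL, data=line, timeout=10)
```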
