App Engine, Excel files and the 30-second request limit - google-app-engine

How can I upload, parse, and download Excel files in Google App Engine when the processing takes more than 30 seconds? I use Apache POI (Java) and backend tasks, but once the backend finishes the job I cannot notify the client, and I cannot download the Excel file that the backend task creates. Any suggestions would be much appreciated.

The best approach here is not to fight HTTP and a web service architecture but rather to work with it.
Introduce a notion of a job id. When your client uploads a file, immediately return a token that represents that job. For extra credit, include an estimated duration of the job. For starters, let's say it's 2 minutes.
The client is then responsible for querying the server for the state of that job id using the token. The server either returns the answer, or it returns the token back with an updated ETA.
For starters, you could just always tell the client to check back in 2 minutes (or whatever constant makes most sense for your workload). As your server processing becomes smarter, you could give more accurate estimates, and decrease the busy-waiting the client does.
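As a rough illustration of the token flow described above, here is a minimal sketch in Python. It is not App Engine code: the function names (`submit_job`, `complete_job`, `poll_job`) and the in-memory `JOBS` dict are assumptions made up for this example, and a real app would keep job state in the datastore and have the backend task call something like `complete_job` when the Excel file is ready.

```python
import uuid

# In-memory job table (assumption for the sketch; use a datastore in practice).
JOBS = {}

# The constant "check back in 2 minutes" estimate from the answer.
DEFAULT_ETA_SECONDS = 120


def submit_job(payload):
    """Register an uploaded file as a job and return a token right away."""
    token = uuid.uuid4().hex
    JOBS[token] = {"state": "pending", "result": None, "payload": payload}
    return {"token": token, "eta_seconds": DEFAULT_ETA_SECONDS}


def complete_job(token, result):
    """Called by the background worker once processing has finished."""
    JOBS[token]["state"] = "done"
    JOBS[token]["result"] = result


def poll_job(token):
    """Return the answer if it is ready, otherwise the token with a new ETA."""
    job = JOBS.get(token)
    if job is None:
        return {"error": "unknown token"}
    if job["state"] == "done":
        return {"token": token, "state": "done", "result": job["result"]}
    return {"token": token, "state": "pending", "eta_seconds": DEFAULT_ETA_SECONDS}
```

The client calls `poll_job` with its token after the suggested delay and keeps doing so until the state is `done`. At that point `result` can be whatever the client needs to fetch the file, for example a download URL.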

Related

Display realtime data in reactjs

I'm sending data from my backend every 10 seconds and I want to display that data in reactjs. From searching online, socket.io seems to be the usual way to display real-time data. Is there a better way to do it?
If you're dead set on updating your data every 10 seconds, it would make more sense to make a request from the client to the server, as HTTP requests can only be opened from client to server. By using HTTP requests, you won't need to use socket.io, but socket.io is an easy alternative if you need much more frequent updates.
Depending on how you are generating the data being sent from your backend, specifically if you are using a database, there is most likely a way to subscribe to changes in the database. This would actually update the data in realtime, without a 10 second delay.
If you want a more detailed answer, you'll have to provide more detail regarding your question: what data are you sending? where is it coming from or how are you generating it?
I'm working on an autodialer feature in which an agent gets a call when I trigger a button on the frontend (built with React), and then all the leads in the agent's assigned portal automatically get back-to-back calls from the agent's number. However, because this process is automatic, the agent won't know who has been called, so I want to establish a real-time connection so that I can show a popup on the frontend that contains information about the lead who was called.

Do JMeter scripts actually create records in the database?

Let's say I run a recorded script for 'New User Registration' function of a web site to evaluate the response time for entire scenario. When I run the recorded script from JMeter, for each registration script, is there a new user record getting created in the application database ?
Yes, if you record registration and correlate it (meaning you create a valid unique name for every request) you will create a real user in your environment.
JMeter simulates a real scenario, which affects your environment.
That is part of the reason JMeter is usually run against an environment other than production (such as staging).
A well-behaved JMeter script must represent a real user using a real browser as closely as possible.
Browsers execute HTTP requests and render the response.
JMeter executes the same HTTP requests but doesn't render the response; instead it records performance metrics like response time, connect time, latency, throughput, etc.
HTTP is a stateless protocol, therefore if you execute the same request you will get the same response. So if there are no mistakes in your script, it should either create a new user or fail with a non-unique username error.
Yes, if your script accurately represents the full set of data flows associated with the business process, "New User Registration," then the end state of that process should be identical to that of the user behavior so modeled.
A record will be created in the database. If not, then your simulated user is not accurate in its behavior.

Is Amazon SQS a good tool for handling analytics logging data to a database?

We have a few nodejs servers where the details and payload of each request needs to be logged to SQL Server for reporting and other business analytics.
The number of requests and the similarity of needs between servers has me wanting to approach this with a centralized logging service. My first instinct is to use something like Amazon SQS and let it act as a buffer, either with SQL Server directly or with a small logging server that would make database calls directed by SQS.
Does this sound like a good use for SQS or am I missing a widely used tool for this task?
The solution will really depend on how much data you're working with, as each service has limitations. To name a few:
SQS
First off, since you're dealing with logs, you don't want duplication. With this in mind you'll need a FIFO (first in, first out) queue.
SQS by itself doesn't really invoke anything. What you'll want to do here is set up the queue, then make a call to submit a message via the AWS JS SDK. Then when you get the message back in your callback, get the message ID and pass that data to an invoked Lambda function (you can write those in NodeJS as well) which stores the info you need in your database.
That said it's important to know that messages in an SQS queue have a size limit:
The minimum message size is 1 byte (1 character). The maximum is
262,144 bytes (256 KB).
To send messages larger than 256 KB, you can use the Amazon SQS
Extended Client Library for Java. This library allows you to send an
Amazon SQS message that contains a reference to a message payload in
Amazon S3. The maximum payload size is 2 GB.
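To make the size limit concrete, here is a small Python sketch of the decision a logging client would have to make before sending. It only mirrors the idea behind the Extended Client Library (send a pointer when the payload is too large). It is not that library's API, it does not call AWS at all, and `build_message` and the `s3_key` argument are hypothetical names invented for this example.

```python
import json

# The 256 KB limit quoted above, in bytes.
SQS_MAX_BYTES = 262_144


def build_message(record, s3_key):
    """Return the record inline if it fits, otherwise a pointer to S3.

    Uploading the oversized payload to S3 under s3_key is left to the
    caller; this function only decides which shape the message takes.
    """
    body = json.dumps(record)
    # The limit applies to bytes, not characters, so measure the encoded form.
    if len(body.encode("utf-8")) <= SQS_MAX_BYTES:
        return {"kind": "inline", "body": body}
    return {"kind": "s3_pointer", "s3_key": s3_key}
```

For typical request logs the inline branch is the common case. The pointer branch only matters when request payloads are logged in full.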
CloudWatch Logs
(not to be confused with the higher-level CloudWatch service itself, which is more about sending metrics)
The idea here is that you submit event data to CloudWatch Logs.
It also has a size limit:
Event size: 256 KB (maximum). This limit cannot be changed
Unlike SQS, CloudWatch Logs can be automated to pass log data to Lambda, which can then write it to your SQL Server. The AWS docs explain how to set that up.
S3
Simply set up a bucket and have your servers write out data to it. The nice thing here is that since S3 is meant for storing large files, you really don't have to worry about the previously mentioned size limitations. S3 buckets also have events which can trigger Lambda functions. Then you can happily go on your way sending out log data.
If your log data gets big enough, you can scale out to something like AWS Batch, which gets you a cluster of containers that can be used to process log data. Finally, you also get a data backup. If your DB goes down, you've got the log data stored in S3 and can throw together a script to load everything back up. You can also use Lifecycle Policies to migrate old data to lower-cost storage, or remove it altogether.

Using 1 instance of google-app-engine to monitor external service

I'm planning to create a NodeJS program that works 24/7, pinging and making requests to an external server (outside of Google Cloud) every minute, just to see that the external services are live.
If there is any error it will notify me by SMS & Email.
I don't need any front-end for this app, and no one needs to connect to it. Just simple NodeJS program.
The monitoring and configuration will be done through text files.
Now the questions:
It looks like it will cost me just $1.64. It sounds very cheap. Am I missing something?
It needs to work around the clock. I will start it once, and it needs to keep working (by using setInterval). Will it be aborted?
What exactly is meant by 1 instance? What can an instance do? Can it only respond to one request, or what?
I tried searching Google for "appengine timeout", but didn't find anything that helps.
Free Quota
If you write your application in Python, PHP, Go or Java, it can fit within the free usage quota:
https://cloud.google.com/appengine/docs/quotas
So there will be absolutely no costs to run it on Google App Engine platform.
There is a limit of 657,000 UrlFetch API calls per day (more than 450 calls per minute in 24/7 mode) for free apps. 4 GB of traffic may also be sufficient for this kind of work.
Keep in mind that Google App Engine provides no SMS sending service, so you will need to spend additional UrlFetch API calls to use an external SMS service.
Email sending is also limited to 100 emails per day (or 5000 emails to the admin address), so try not to send repeated notifications about the same monitored server every minute, or you'll deplete your email quota in under two hours.
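The quota arithmetic above, plus one way to avoid burning through the email quota, can be sketched in Python as follows. The `AlertThrottle` class is a hypothetical helper written for this example, not something App Engine provides. It keeps its state in memory, so on App Engine the last-sent timestamps would need to live in memcache or the datastore instead.

```python
MINUTES_PER_DAY = 24 * 60

# 657,000 UrlFetch calls per day spread evenly over a day.
URLFETCH_CALLS_PER_MINUTE = 657_000 / MINUTES_PER_DAY  # 456.25

# 100 emails per day at one email per minute lasts 100 minutes.
MINUTES_UNTIL_EMAIL_QUOTA_GONE = 100 / 1


class AlertThrottle:
    """Allow at most one notification per server per interval."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self.last_sent = {}  # server name -> timestamp of last alert

    def should_send(self, server, now):
        """Return True and record the send if the interval has elapsed."""
        last = self.last_sent.get(server)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_sent[server] = now
        return True
```

With an interval of one hour, a server that stays down all day produces at most 24 emails, which leaves room in the 100-per-day quota for other servers.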
Scheduled Tasks
There is no way to run a single process indefinitely without interruption on App Engine. But you don't have to!
You'll need to encapsulate all the work you're planning to execute in every iteration into a single task and then schedule it to run every minute with Cron. See this documentation for Python: https://cloud.google.com/appengine/docs/python/config/cron
It is recommended to have a configuration page where you can change internal settings or see monitoring statistics, or at least toggle a flag to temporarily pause task execution without redeploying your app.
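Here is a minimal Python sketch of what "one iteration as a single task" could look like, including the pause flag. The cron entry and the request handler that would call this function are left out. `run_check`, `fetch` and `notify` are names assumed for the example: `fetch(url)` is expected to return an HTTP status code or raise, and `notify(url, reason)` stands in for whatever sends the SMS or email.

```python
def run_check(targets, fetch, notify, paused=False):
    """Run one monitoring pass and return the list of (url, reason) failures."""
    if paused:
        # The pause flag lets you stop checks without redeploying.
        return []
    failures = []
    for url in targets:
        try:
            status = fetch(url)
            if status >= 400:
                failures.append((url, "HTTP %d" % status))
        except Exception as exc:
            # Timeouts, DNS errors and refused connections all count as down.
            failures.append((url, str(exc)))
    for url, reason in failures:
        notify(url, reason)
    return failures
```

Because each pass is short and keeps no state between runs, it does not matter which instance picks up the cron request or whether the previous instance was shut down in the meantime.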

How to fan out URL Fetch requests in a timely fashion?

Every minute or so my app creates some data and needs to send it out to more than 1000 remote servers via URL Fetch callbacks. The callback URL for each server is stored on separate entities. The time lag between creating the data and sending it to the remote servers should be roughly less than 5 seconds.
My initial thought is to use the Pipeline API to fan out URL Fetch requests to different task queues.
Unfortunately, task queues are not guaranteed to execute in a timely fashion, so the time from enqueuing a task to it actually executing could be minutes to hours. From previous experience this gap is regularly over a minute, so task queues are not necessarily appropriate.
Is there any way from within App Engine to achieve what I want? Maybe you know of an outside service that can do the fan out in a timely fashion?
Well, there's probably no good solution on GAE here.
You could keep a backend running, hammering the datastore/memcache every second for new data to send out, and then spawn dozens of async URL fetches.
But that's really inefficient...
If you want a 3rd party service, pubnub.com is capable of doing fan-out; however, I don't know if it could fit in your setup.
How about using the async API? You could then do a large number of simultaneous URL calls, all from a single location.
If the performance is particularly sensitive, you could do them from a backend and use a B8 instance.
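To show the shape of the "many simultaneous calls from a single location" idea, here is a sketch using Python's standard `asyncio` rather than the App Engine async URL Fetch API itself, so treat it as an analogy. `fan_out` and its `send` parameter are names assumed for this example; `send(url)` is any coroutine that performs the callback and returns something like a status code.

```python
import asyncio


async def fan_out(urls, send, max_concurrency=100):
    """Call send(url) for every URL, at most max_concurrency at a time.

    Returns a list of (url, outcome) pairs in the same order as urls,
    where outcome is either send's return value or the exception raised.
    """
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with gate:
            try:
                return url, await send(url)
            except Exception as exc:
                # One slow or broken server must not abort the whole batch.
                return url, exc

    return await asyncio.gather(*(guarded(u) for u in urls))
```

With 1000 callbacks and a per-request timeout enforced inside `send`, the total time is governed by the slowest requests and the concurrency cap rather than by the sum of all request times. That property is what makes a target of around 5 seconds plausible.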
