Outgoing bandwidth 2x that of total responses - google-app-engine

I have a very simple app engine application serving 1.8Kb - 3.6Kb gzipped files stored in blobstore. A map from numeric file ID to blobkey is stored in the datastore and cached in memcache. The servlet implementation is trivial: a numeric file ID is received in the request; the blobkey is retrieved from memcache/datastore and the standard BlobstoreService.serve(blobKey, resp) is invoked to serve the response. As expected the app logs show that response sizes always match the blobstore file size that was served.
I've been doing some focused volume testing and this has revealed that the outgoing bandwidth quota utilization is consistently reported to be roughly 2x what I expect given requests received. I've been doing runs of 100k requests at a time summing bytes received at the client, comparing this with the app logs and everything balances except for the outgoing bandwith quota utilization.
Any help in understanding how the outgoing bandwidth quota utilization is determined for the simple application I describe above? What am I missing or not accounting for? Why would it not tally with the totals shown for response sizes in the app logs?
[Update 2013.03.04: I abandoned the use of the blobstore and reverted back to storing the blobs directly in the datastore. The outgoing bandwidth utilization is now as exactly as expected. It appears that the 2x multiplier is somehow related to the use of the blobstore (but it remains inexplicable). I encountered several other problems with using the blobstore service; most problematic were the additional datastore reads and writes (which are related to the blobinfo and blobindex meta data managed in the datastore - and which is what I was originally trying to reduce by migrating my data to the blobstore). A particularly serious issue for me was this: https://code.google.com/p/googleappengine/issues/detail?id=6849. I consider this a blobstore service memory leak; once you create a blob you can never delete the blob meta data in the datastore. I will be paying for this in perpetuity since I was foolish enough to run a 24 hr volume and performance test and now am unable to free the storage used during the test. It appears that the blobstore is currently only suitable for use in very specific scenarios (i.e. permanently static data). Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore.]

The blobstore data can be deleted (i don't recommend it since it can lead to unexpected behavior), but only if you know the table that it is saved in __BlobInfo__ and __BlobFileIndex__. This is done, so your uploaded files don't have the same name, and to accidentally replace an old file.
For a full list of tables that are stored in datastore you can run SELECT * FROM __kind__.
I am not sure why your app engine app consumes 2x you outgoing bandwidth, but i will test it myself.
An alternative is to use Google Cloud Storage. If you use the default bucket for your app engine app, you get 5GB free storage
Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore.
That's true, you can either use cloud storage or datastore (cloud storage is an immutable object storing service). Blobstore was more for uploading files via <input type='file' /> forms. (since recently, writing files from inside app, has been deprecated in favor to cloud storage)

Related

Datastore Issue in Google App Engine making the application not allowing the inputs to work

App engine application is deployed in server and is connected to the database. All the data is loaded in the list and it is showing properly, but if we give some inputs in input box or any other form fields it is not reflecting in the application. The error i got from the console of app engine was this
com.google.apphosting.api.ApiProxy$RequestTooLargeException: The
request to API call datastore_v3.Put() was too large
In the App engine server I cleared the data store entries but it didn't work out well. I disabled the write access of the data store file but for few minutes it worked i was able to give the inputs and it was reflecting but after sometime the URL was not working and then I enabled the write access then the application was accessible.
This error message was caused due the Datastore default limits, most likely the limit related to the limit size for a Datastore entity, since the exception is thrown by datastore_v3.Put():
Maximum size for an entity: 1,048,572 bytes (1 MiB - 4 bytes)
However, also take in mind the limits regarding other Datastore metrics, in the linked documentation.
I would recommend you to double check that the size of the entities you are trying to insert into Datastore falls under this value, as well as under the values of the rest of the limits.

Is Amazon SQS a good tool for handling analytics logging data to a database?

We have a few nodejs servers where the details and payload of each request needs to be logged to SQL Server for reporting and other business analytics.
The amount of requests and similarity of needs between servers has me wanting to approach this with an centralized logging service. My first instinct is to use something like Amazon SQS and let it act as a buffer with either SQL Server directly or build a small logging server which would make database calls directed by SQS.
Does this sound like a good use for SQS or am I missing a widely used tool for this task?
The solution will really depend on how much data you're working with, as each service has limitations. To name a few:
SQS
First off since you're dealing with logs, you don't want duplication. With this in mind you'll need a FIFO (first in first out) queue.
SQS by itself doesn't really invoke anything. What you'll want to do here is setup the queue, then make a call to submit a message via the AWS JS SDK. Then when you get the message back in your callback, get the message ID and pass that data to an invoked Lambda function (you can write those in NodeJS as well) which stores the info you need in your database.
That said it's important to know that messages in an SQS queue have a size limit:
The minimum message size is 1 byte (1 character). The maximum is
262,144 bytes (256 KB).
To send messages larger than 256 KB, you can use the Amazon SQS
Extended Client Library for Java. This library allows you to send an
Amazon SQS message that contains a reference to a message payload in
Amazon S3. The maximum payload size is 2 GB.
CloudWatch Logs
(not to be confused with the high level cloud watch service itself, which is more sending metrics)
The idea here is that you submit event data to CloudWatch logs
It also has a limit here:
Event size: 256 KB (maximum). This limit cannot be changed
Unlike SQS, CloudWatch logs can be automated to pass log data to Lambda, which then can be written to your SQL server. The AWS docs explain how to set that up.
S3
Simply setup a bucket and have your servers write out data to it. The nice thing here is that since S3 is meant for storing large files, you really don't have to worry about the previously mentioned size limitations. S3 buckets also have events which can trigger lambda functions. Then you can happily go on your way sending out logo data.
If your log data gets big enough, you can scale out to something like AWS Batch which gets you a cluster of containers that can be used to process log data. Finally you also get a data backup. If your DB goes down, you've got the log data stored in S3 and can throw together a script to load everything back up. You can also use Lifecycle Policies to migrate old data to lower cost storage, or straight remove it all together.

Limit upload size for appengine interface to cloud store

Consider an image (avatar) uploader to Google Cloud Storage which will start from the user's web browser, and then pass through a Go appengine instance which will handle standard compression/cropping etc. and then set the resulting image as an object in Cloud Storage
How can I ensure that the appengine instance isn't overloaded by too much or bad data? In other words, I think I'm asking two questions (or possibly not):
How can I limit the amount of data allowed to be sent to an appengine instance in a single request, or is there already a default safe limit?
How can I validate the data to make sure it's proper jpg/png/gif before attempting to process it with standard go image libraries?
All App Engine requests are limited to 32MB.
You can check the size of the file being uploaded before the upload starts.
You can verify the file's mime-type and only allow correct files to be uploaded.

Google app engine and application scope data

I'm developing a spring based web application that is deployed on Google app Engine.
I have a manager that store data in application scope. I'm using a spring bean (singleton) that hold a map and simply perform get and remove from the map,
however GAE is a distributed environment and this design has a problem since each application instance will have its own manager and request are not guaranteed to be made to the same application instance.
So I've looked around and found 2 possible solution:
use data store
use memcache
storing the data will cause a lot of read and writes and eventually I do not need the data to be saved.
second looked promising but Google mentioned that:
In general, an application should not expect a cached value to always be available.
and I need a guarantee that when I ask my manger to get a value it will return it.
is there any other solution?
did I miss anything?
A common solution is storing the values in a memcache, backed by the datastore.
First fetch the application-scope value from the memcache, and if the memcache returns zero-result (a cache-miss event), fetch the value from the datastore and put the fetched value on the memcache.
In this way, the next fetch to the memcache would return your application-scope data, reducing the need for a (relatively) costly read to the datastore.
App Engine memcache is a least-recently used cache, so values that are frequently read would suffer from few cache-miss event.
A more complex solution to have a application-scope value is to store the values in-memory at a resident backend, and have all your other instances request/update the value from/to this particular backend via a servlet handler.

Uploading file on Google App Engine using Datastore and 30 sec response time limitation

Will the response timer on google app engine start upon submitting the web page's form?
If I'm going to upload a file that is greater than 1MB, I could split the files to 1MB to fit in the limitation of the Google App Engine Datastore. Now, my concern is if the client's internet connection is slow, it would eat up the 30 seconds timer right? If this is the case, it is impossible to upload large files with slow connection?
The 30 second response time limit only applies to code execution. So the uploading of the actual file as part of the request body is excluded from that. The timer will only start once the request is fully sent to the server by the client, and your code starts handling the submitted request. Hence it doesn't matter how slow your client's connection is.
As an side note, Instead of splitting your file into multiple parts, try using the blobstore. I am using it for images and it raises the storage limit to 50MB. (Remember to enable billing to get access to the blobstore)

Resources