I'm having trouble sifting through all the threads about uploading from a form, and can't seem to find anything related to downloading from a URL and sending the result on.
I have a large gzipped JSON file that I need to download from an external server and process on App Engine. I have it working right now: it downloads the file into memory, unzips it, and then processes it into a task queue as small tasks. However, before making some small memory optimizations to the code, I was hitting the 128MB limit on App Engine, and I'm worried this is eventually going to happen again.
Here is my code just in case it is helpful to anyone else:
import urllib2
import zlib

READ_BLOCK_SIZE = 1024 * 8

def download_and_decompress(url):
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    # 16 + MAX_WBITS tells zlib to expect a gzip wrapper
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    result = ""
    while True:
        data = response.read(READ_BLOCK_SIZE)
        if not data:
            break
        result += decompressor.decompress(data)
    return result
Does anyone have any thoughts? Is there a good way to handle large files on App Engine and send them straight to GCS so I don't have to hold everything in memory? Can I stream the download right into GCS somehow?
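A minimal sketch of one way to do this, assuming the GoogleAppEngineCloudStorageClient library (the cloudstorage module) is available and that the function name and bucket/object path are placeholders, not from the question: decompress each block as it arrives and write it straight into a GCS object, so only one block is ever held in memory.

import urllib2
import zlib

import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient

READ_BLOCK_SIZE = 1024 * 8

def stream_url_to_gcs(url, gcs_path='/my-bucket/feed.json'):
    # gcs_path ('/bucket/object') is a placeholder
    response = urllib2.urlopen(urllib2.Request(url))
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    gcs_file = gcs.open(gcs_path, 'w', content_type='application/json')
    try:
        while True:
            data = response.read(READ_BLOCK_SIZE)
            if not data:
                break
            gcs_file.write(decompressor.decompress(data))
        gcs_file.write(decompressor.flush())  # any buffered tail bytes
    finally:
        gcs_file.close()

The task queue tasks could then read the object back from GCS in ranges instead of passing the whole payload around in memory.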
I have a PDF uploaded to Azure Blob Storage and I'm facing some problems with the download routine.
My application runs on Spring Boot, and I use the OutputStream provided by HttpServletResponse in a @RestController method to stream the bytes from Blob Storage back to the caller.
On the frontend I have a React application receiving the data and triggering the download in the browser.
Every time I stream the bytes to the frontend I get a corrupted file. It works just fine when I make the request through Insomnia or Postman.
I compared the files as text and could see some differences between them.
Differences:
The size of the corrupted file is almost double the size of the original.
When I open the files in Notepad++ they appear to be in different encodings (screenshots: corrupted vs. consistent).
Some characters look as if they were badly interpreted (screenshots: corrupted vs. consistent).
My frontend uses FileSaver 2.0.2 to persist the file to disk:
const blobParts = [];
const blobOptions = {
  type: axiosResponse.headers['content-type'],
};
blobParts.push(axiosResponse.data);
const file = new File(
  blobParts,
  axiosResponse.headers['content-disposition'].split('=')[1],
  blobOptions,
);
return FileSaver.saveAs(file);
I'm wondering if there's a way to keep the ANSI encoding through the persistence process, or if there is another way to download the file without corrupting it.
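One likely cause (an assumption, not something confirmed in the thread) is that the binary response body is being decoded as text by the HTTP client and re-encoded before it reaches FileSaver. A small Python 3 demonstration of the effect, using made-up stand-in data:

# Hypothetical demonstration: when raw bytes are decoded as text and
# re-encoded as UTF-8, every byte above 0x7F becomes two bytes, which
# inflates the file and garbles the characters.
raw = bytes(range(256)) * 4                # stand-in for PDF bytes
as_text = raw.decode('latin-1')            # body parsed as text by the client
reencoded = as_text.encode('utf-8')        # what ends up being saved
print(len(raw), len(reencoded))            # 1024 1536

If that is what is happening, the fix is to make the client keep the body as binary (for example, request it as a blob/arraybuffer) rather than trying to preserve a particular text encoding.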
I am trying to create a site for e-learning courses (zips of HTML/CSS/JS/media) to be uploaded to.
I am using Go on Google App Engine with Google Cloud Storage to store the zips and the extracted courses.
I will explain the development dead ends I have encountered.
My first thought was to use the resumable upload functionality of Cloud Storage to send the zip file, then read it with Go on App Engine, unzip the files and write them back to Cloud Storage.
It took a while to read and understand the documentation, but this worked perfectly for my 2MB test zip. It failed when I tried it with a modest 67MB zip: I had run into a hidden limitation when accessing Cloud Storage from App Engine. No matter which client I used, there was a 10MB/32MB limit.
I tried both the old and new libraries as well as the Blobstore.
I also looked into creating a custom OAuth2-supporting client library using sockets, but hit too many dead ends.
Giving up on that approach, I thought that even though it would mean more uploading, perhaps extracting on the client (browser) side and then uploading each file with its own resumable upload would make the most sense. After exploring a few libraries I had in-browser extraction working and ready to upload.
I wrote my handler that created the datastore entry for the upload, selected a location for the upload and created all the upload URLs.
When testing this I found that it would take a while to generate the long lists of files (anything over 100). Since I was using Go, I decided it made sense to make the requests concurrently, and I spent a day or two getting that working. After dealing with some CORS issues that weirdly had not shown up earlier, I had everything working.
Then I started getting errors when stress testing my approach with a large (500MB) zip/course. The uploads would fail, and I discovered that when trying to send 300+ files to generate upload URLs I was getting the following error:
Post http://localhost:62394: dial tcp [::1]:62394: connectex: No connection could be made because the target machine actively refused it.
Now I have no idea how to diagnose this. I don't know if I am hitting a rate limit, and if I am, I don't know how to avoid it.
It seems like this should be simple, but it is anything but.
I have a few options I can pursue:
Try to create the resumable uploads with a batch operation (https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch), though batch operations to /upload are not supported.
Make requesting each URL a one-by-one API call (see the sketch below).
Make requesting the URL happen over a channel (https://cloud.google.com/appengine/docs/go/channel/reference).
Spend the next week or more adding layers of retries and fallback error handling.
Try another solution.
This should be simple. How should this be done?
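For the one-URL-at-a-time option, here is a minimal sketch (in Python rather than the Go the question uses, purely to illustrate the flow; the function name, bucket, object_name and access_token are all placeholders) of starting one resumable upload session per file on the server and returning the session URI for the browser to PUT the bytes to:

import json
import urllib2

def create_resumable_session(bucket, object_name, content_type, access_token):
    # Initiate a resumable upload session via the JSON API; the response's
    # Location header is the session URI the browser then uploads the bytes to.
    url = ('https://www.googleapis.com/upload/storage/v1/b/%s/o'
           '?uploadType=resumable' % bucket)
    body = json.dumps({'name': object_name})
    request = urllib2.Request(url, data=body)
    request.add_header('Authorization', 'Bearer ' + access_token)
    request.add_header('Content-Type', 'application/json; charset=UTF-8')
    request.add_header('X-Upload-Content-Type', content_type)
    response = urllib2.urlopen(request)
    return response.info().getheader('Location')

Requesting these one at a time, or in small batches from a task queue, avoids firing hundreds of concurrent requests at the dev server, which is one plausible reading of the connection-refused error above.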
I'm trying to upload files to Google Drive with a ProgressListener and ChunkSize enabled (thus with DirectUploadEnabled disabled). This way I have a more reliable upload and the possibility of showing a progress indication to the user.
I transfer the files from the GWT website to GAE with a FormPanel and a FileUploadField, which POSTs the file to GAE on submit(). On GAE I receive the file with an UploadServlet, which uses org.apache.commons.fileupload to receive the documents as a stream. I don't want to receive the complete documents on GAE because the documents are too big. Therefore I start the upload (insert) to Google Drive with the received stream from the incoming request.
Now there's a problem: for the insert I need to know the size of the stream:
int lContentLength = getRequest().getContentLength();
FileItemStream lFileItemStream = getFileItemStream();
InputStream lInputStream = lFileItemStream.openStream();
BufferedInputStream lBufferedInputStream = new BufferedInputStream(lInputStream);
InputStreamContent lInputStreamContent = new InputStreamContent(pContentType, lBufferedInputStream);
lInputStreamContent.setLength(lContentLength);
My first guess was the ContentLength from the incoming servlet request. But this is not correct, because it covers the complete request (which also contains other fields used as parameters). Without the Drive option DirectUploadEnabled I need the exact stream size of the uploaded document, otherwise the upload stalls at the end...
How do I grab this document size? The Google example is no help here because it uses a local file:
https://code.google.com/p/google-api-java-client/wiki/MediaUpload
Yes, from a local file it is easy to get the file size (mediaFile.length()), but from a website... Several sites say it is not possible to grab the file size before submit() on the website, and it also seems impossible to determine the stream size on GAE without loading the complete file...
How do I determine this stream size? Is there another solution to this problem?
Recently I experienced two problems uploading files to my Java GAE app.
I'm using the technique described in the Blobstore docs.
With regular files, occasionally (let's say 15% of the time) the client receives a "503 Service Unavailable".
With high resolution images (e.g. 7000x10000) the client always receives a "400 Bad Request".
In both cases no error messages are logged on the server and the blobs are written correctly, but the successPath URL (the callback of createUploadUrl) is never called. It seems that the GAE endpoint handling the upload crashes for some reason.
My client is a JS XMLHttpRequest, wrapped in GWT:
public native void uploadWithXMLHttpRequest(UploadForm uploadForm) /*-{
  var fd = new FormData();
  var files = uploadForm.@mypackage.UploadForm::getFiles()();
  for (var i = 0; i < files.length; i++) {
    fd.append("uploadFile" + i, files[i]);
  }
  var xhr = new XMLHttpRequest();
  // xhr.upload.addEventListeners... omitted
  xhr.open("POST", uploadForm.@mypackage.UploadForm::getUploadUrl()());
  xhr.send(fd);
}-*/;
Any ideas for possible causes and solutions/workarounds?
Thx.
This issue is being discussed in a GAE ticket opened by another user with the same problem: https://code.google.com/p/googleappengine/issues/detail?id=7619 (by the way, the bug tracker has a "star" feature, which lets you vote for the ticket and receive notifications).
Possible reasons:
1. You are uploading a large file (> 1MB) and writing it all at once. You should write it in portions: 1 write = 1MB (a sketch follows below).
2. Your request takes longer than 30 sec - use a backend.
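For illustration of point 1 (in Python with the since-deprecated Files API; the app in the question is Java, so treat this purely as a sketch of the pattern, with a made-up function name), writing a large payload to the Blobstore in 1MB portions rather than in one call:

from google.appengine.api import files

MB = 1024 * 1024

def write_blob_in_portions(data, mime_type='application/octet-stream'):
    # 'data' is assumed to already be in memory; each write sends at most 1MB
    file_name = files.blobstore.create(mime_type=mime_type)
    with files.open(file_name, 'a') as f:
        for offset in range(0, len(data), MB):
            f.write(data[offset:offset + MB])
    files.finalize(file_name)
    return files.blobstore.get_blob_key(file_name)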
In this case the 503 is caused by errors when we write the upload info into your datastore. As you are using the M/S datastore, transient errors are expected from time to time. I suggest you convert your app to HRD to minimize the chances of errors related to writing the upload information to your datastore.
The 400 error was generated by your app & is in your application logs.
Try using Google Cloud Storage, since the Blobstore service has a lot of problems and Google is trying to migrate users from the Blobstore to GCS.
I guess the image resolution can't exceed 8000 in the App Engine Blobstore, and that is the reason it fails.
Will the response timer on Google App Engine start upon submitting the web page's form?
If I'm going to upload a file that is greater than 1MB, I could split it into 1MB pieces to fit within the Google App Engine Datastore limit. Now, my concern is: if the client's internet connection is slow, will that eat up the 30-second timer? If that's the case, is it impossible to upload large files over a slow connection?
The 30 second response time limit only applies to code execution. So the uploading of the actual file as part of the request body is excluded from that. The timer will only start once the request is fully sent to the server by the client, and your code starts handling the submitted request. Hence it doesn't matter how slow your client's connection is.
As a side note, instead of splitting your file into multiple parts, try using the Blobstore. I am using it for images and it raises the storage limit to 50MB. (Remember to enable billing to get access to the Blobstore.)
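A minimal sketch of that Blobstore flow (the question doesn't name a language, so this is Python/webapp2 with made-up handler names and routes): the browser POSTs the file to a URL obtained from create_upload_url(), App Engine stores the blob itself, and only afterwards calls your handler, so the response timer never sees the slow upload.

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2

class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # The form posts directly to App Engine's upload endpoint
        upload_url = blobstore.create_upload_url('/upload_done')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>'
            % upload_url)

class UploadDoneHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # Called only after the blob has been fully stored
        blob_info = self.get_uploads('file')[0]
        self.redirect('/serve/%s' % blob_info.key())

app = webapp2.WSGIApplication([
    ('/', UploadFormHandler),
    ('/upload_done', UploadDoneHandler),
])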