Upload file bigger than 40MB to Google App Engine? - google-app-engine

I am creating a Google App Engine web app to "transform" files of 10K~50M
Scenario:
1. User opens http://fixdeck.appspot.com in web browser
2. User clicks on "Browse", selects file, submits
3. Servlet loads file as an InputStream
4. Servlet transforms file
5. Servlet saves file as an OutputStream
6. The user's browser receives the transformed file and asks where to save it, directly as a response to the request in step 2
(For now I did not implement step 4, the servlet sends the file back without transforming it.)
Problem: It works for 15MB files but not for a 40MB file, saying: "Error: Request Entity Too Large. Your client issued a request that was too large."
Is there any workaround against this?
Source code: https://github.com/nicolas-raoul/transdeck
Rationale: http://code.google.com/p/ankidroid/issues/detail?id=697

GAE has a hard limit of 32MB for HTTP requests and HTTP responses. That limits the size of uploads/downloads made directly to/from a GAE app.
Revised Answer (Using Blobstore API.)
Google provides the Blobstore API for handling larger files in GAE (up to 2GB). The overview documentation provides complete sample code. Your web form uploads the file to the blobstore. The Blobstore API then rewrites the POST back to your servlet, where you can do your transformation and save the transformed data back into the blobstore (as a new blob).
Original Answer (Didn't Consider Blobstore as an option.)
For downloading, I think the only workaround on GAE would be to break the file up into multiple parts on the server, and then reassemble them after downloading. That's probably not doable using a straight browser implementation though.
(As an alternative design, perhaps you could send the transformed file from GAE to an external download location (such as S3) where it could be downloaded by the browser without the GAE limit restrictions. I don't believe GAE-initiated connections have the same request/response size limitations, but I'm not positive. Regardless, you would still be restricted by the 30 second maximum request time. To get around that, you'd have to look into GAE Backend instances and come up with some sort of asynchronous download strategy.)
For uploading larger files, I've read about the possibility of using HTML5 File APIs to slice the file into multiple chunks for uploading, and then reconstructing it on the server. Example: http://www.html5rocks.com/en/tutorials/file/dndfiles/#toc-slicing-files . However, I don't know how practical a solution that really is due to changing specifications and browser capabilities.
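To make the slicing idea concrete, here is a minimal sketch of the split/reassemble logic in Python. It is only an illustration under assumptions: the 1 MiB headroom below the 32MB limit and the function names are made up for this example, and the actual transport (one HTTP request per chunk) is left out.

```python
import hashlib

# Limit taken from the answer above: 32 MB per request. The 1 MiB of
# headroom for form fields and headers is an arbitrary, illustrative margin.
MAX_REQUEST_BYTES = 32 * 1024 * 1024
CHUNK_BYTES = MAX_REQUEST_BYTES - 1024 * 1024


def split_into_chunks(data, chunk_bytes=CHUNK_BYTES):
    """Return a list of (index, chunk) pairs that cover data in order."""
    return [
        (index, data[offset:offset + chunk_bytes])
        for index, offset in enumerate(range(0, len(data), chunk_bytes))
    ]


def reassemble(chunks):
    """Rebuild the payload from (index, chunk) pairs in any arrival order."""
    ordered = sorted(chunks, key=lambda pair: pair[0])
    return b"".join(chunk for _, chunk in ordered)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    parts = split_into_chunks(payload, chunk_bytes=300_000)
    restored = reassemble(reversed(parts))  # simulate out-of-order arrival
    assert hashlib.sha256(restored).digest() == hashlib.sha256(payload).digest()
    print(len(parts), "chunks reassembled")
```

Sending the index with every chunk is what lets the server rebuild the file even if the requests arrive out of order.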

You can use the blobstore to upload files as large as 2 gigabytes.

When uploading larger files, you can split the file into chunks and send each chunk as its own request, keeping every request below the 32MB limit that Google App Engine currently supports.
Check this package with examples - https://github.com/pionl/laravel-chunk-upload
The following is working code that uses the above package.
View
<div id="resumable-drop" style="display: none">
<p><button id="resumable-browse" class="btn btn-outline-primary" data-url="{{route('AddAttachments', Crypt::encrypt($rpt->DRAFT_ID))}}" style="width: 100%;
height: 91px;">Browse Report File..</button></p>
</div>
Javascript
<script>
var $fileUpload = $('#resumable-browse');
var $fileUploadDrop = $('#resumable-drop');
var $uploadList = $("#file-upload-list");
if ($fileUpload.length > 0 && $fileUploadDrop.length > 0) {
var resumable = new Resumable({
// Use chunk size that is smaller than your maximum limit due a resumable issue
// https://github.com/23/resumable.js/issues/51
chunkSize: 1 * 1024 * 1024, // 1MB
simultaneousUploads: 3,
testChunks: false,
throttleProgressCallbacks: 1,
// Get the url from data-url tag
target: $fileUpload.data('url'),
// Append token to the request - required for web routes
query:{_token : $('input[name=_token]').val()}
});
// Resumable.js isn't supported, fall back on a different method
if (!resumable.support) {
$('#resumable-error').show();
} else {
// Show a place for dropping/selecting files
$fileUploadDrop.show();
resumable.assignDrop($fileUpload[0]);
resumable.assignBrowse($fileUploadDrop[0]);
// Handle file add event
resumable.on('fileAdded', function (file) {
$("#resumable-browse").hide();
// Show progress bar
$uploadList.show();
// Show pause, hide resume
$('.resumable-progress .progress-resume-link').hide();
$('.resumable-progress .progress-pause-link').show();
// Add the file to the list
$uploadList.append('<li class="resumable-file-' + file.uniqueIdentifier + '">Uploading <span class="resumable-file-name"></span> <span class="resumable-file-progress"></span></li>');
$('.resumable-file-' + file.uniqueIdentifier + ' .resumable-file-name').html(file.fileName);
// Actually start the upload
resumable.upload();
});
resumable.on('fileSuccess', function (file, message) {
// Reflect that the file upload has completed
location.reload();
});
resumable.on('fileError', function (file, message) {
$("#resumable-browse").show();
// Reflect that the file upload has resulted in error
$('.resumable-file-' + file.uniqueIdentifier + ' .resumable-file-progress').html('(file could not be uploaded: ' + message + ')');
});
resumable.on('fileProgress', function (file) {
// Handle progress for both the file and the overall upload
$('.resumable-file-' + file.uniqueIdentifier + ' .resumable-file-progress').html(Math.floor(file.progress() * 100) + '%');
$('.progress-bar').css({width: Math.floor(resumable.progress() * 100) + '%'});
});
}
}
</script>
Controller
public function uploadAttachmentAsChunck(Request $request, $id) {
// create the file receiver
$receiver = new FileReceiver("file", $request, HandlerFactory::classFromRequest($request));
// check if the upload is success, throw exception or return response you need
if ($receiver->isUploaded() === false) {
throw new UploadMissingFileException();
}
// receive the file
$save = $receiver->receive();
// check if the upload has finished (in chunk mode it will send smaller files)
if ($save->isFinished()) {
// save the file and return any response you need, current example uses `move` function. If you are
// not using move, you need to manually delete the file by unlink($save->getFile()->getPathname())
$file = $save->getFile();
$fileName = $this->createFilename($file);
// Group files by mime type
$mime = str_replace('/', '-', $file->getMimeType());
// Group files by the date (week)
$dateFolder = date("Y-m-W");
$disk = Storage::disk('gcs');
$gurl = $disk->put($fileName, $file);
$draft = DB::table('draft')->where('DRAFT_ID','=', Crypt::decrypt($id))->get()->first();
$prvAttachments = DB::table('attachments')->where('ATTACHMENT_ID','=', $draft->ATT_ID)->get();
$seqId = sizeof($prvAttachments) + 1;
//Save Submission Info
DB::table('attachments')->insert(
[ 'ATTACHMENT_ID' => $draft->ATT_ID,
'SEQ_ID' => $seqId,
'ATT_TITLE' => $fileName,
'ATT_DESCRIPTION' => $fileName,
'ATT_FILE' => $gurl
]
);
return response()->json([
'path' => 'gc',
'name' => $fileName,
'mime_type' => $mime,
'ff' => $gurl
]);
}
// we are in chunk mode, lets send the current progress
/** @var AbstractHandler $handler */
$handler = $save->handler();
return response()->json([
"done" => $handler->getPercentageDone(),
]);
}
/**
* Create unique filename for uploaded file
* @param UploadedFile $file
* @return string
*/
protected function createFilename(UploadedFile $file)
{
$extension = $file->getClientOriginalExtension();
$filename = str_replace(".".$extension, "", $file->getClientOriginalName()); // Filename without extension
// Add timestamp hash to name of the file
$filename .= "_" . md5(time()) . "." . $extension;
return $filename;
}
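For readers who want to see what the receiver does conceptually, here is a rough Python sketch of the server-side bookkeeping. It is not the laravel-chunk-upload implementation. The class and method names are invented for illustration, and it assumes each request carries a 1-based chunk number and the total chunk count (Resumable.js sends these as resumableChunkNumber and resumableTotalChunks).

```python
class ChunkAssembler:
    """Collects the chunks of one upload and reports progress (illustrative)."""

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self.chunks = {}  # chunk number -> bytes

    def receive(self, chunk_number, data):
        """Store one chunk; a retried chunk simply overwrites itself."""
        if not 1 <= chunk_number <= self.total_chunks:
            raise ValueError("chunk number out of range")
        self.chunks[chunk_number] = data

    def percentage_done(self):
        """Whole-number percentage of chunks received so far."""
        return 100 * len(self.chunks) // self.total_chunks

    def is_finished(self):
        return len(self.chunks) == self.total_chunks

    def assemble(self):
        """Join the chunks in numeric order once all have arrived."""
        if not self.is_finished():
            raise RuntimeError("upload is not complete")
        return b"".join(self.chunks[n] for n in range(1, self.total_chunks + 1))


if __name__ == "__main__":
    upload = ChunkAssembler(total_chunks=3)
    for number, data in [(2, b"lo, "), (1, b"hel"), (3, b"world")]:
        upload.receive(number, data)
        print(upload.percentage_done(), "% done")
    print(upload.assemble())
```

With simultaneousUploads set to 3 as in the JavaScript above, chunks can finish in any order, which is why the sketch keys them by number instead of appending them as they arrive.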

You can also use the Blobstore API to upload directly to Cloud Storage. Below is the link:
https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
upload_url = blobstore.create_upload_url(
    '/upload_handler',
    gs_bucket_name=YOUR.BUCKET_NAME)
template_values = {'upload_url': upload_url}
_jinjaEnvironment = jinjaEnvironment.JinjaClass.getJinjaEnvironemtVariable()
if _jinjaEnvironment:
    template = _jinjaEnvironment.get_template('import.html')
Then in index.html:
<form action="{{ upload_url }}"
method="POST"
enctype="multipart/form-data">
Upload File:
<input type="file" name="file">
<input type="submit" value="Upload">
</form>

Related

Corrupt video uploads when chunking MediaRecorder to Google Cloud platform

I am currently using a React hook-powered component to record my screen and subsequently upload the recording to Google Cloud Storage. However, when it finishes, the file created inside Google Cloud appears to be corrupt.
This is the gist of the code within my React component, where useMediaRecorder is from here: https://github.com/wmik/use-media-recorder -
let {
error,
status,
mediaBlob,
stopRecording,
getMediaStream,
startRecording,
liveStream,
} = useMediaRecorder({
onCancelScreenShare: () => {
stopRecording();
},
onDataAvailable: (chunk) => {
// do the uploading here:
onChunk(chunk);
},
recordScreen: true,
blobOptions: { type: "video/webm;codecs=vp8,opus" },
mediaStreamConstraints: { audio: audioEnabled, video: true },
});
As data becomes available through this hook, it calls onChunk(chunk), passing a binary Blob to that method. I tie in with this section of code to perform the upload:
const onChunk = (binaryData) => {
var formData = new FormData();
formData.append("data", binaryData);
let customerApi = new CustomerVideoApi();
customerApi.uploadRecording(
videoUUID,
formData,
(res) => {},
(err) => {}
);
};
customerApi.uploadRecording looks like this (using axios).
const uploadRecording = (uuid, data, fn, fnErr) => {
axios
.post(endpoint + "/stream/upload", data, {
headers: {
"Content-Type": "multipart/form-data",
},
})
.then(function (response) {
fn(response);
})
.catch(function (error) {
fnErr(error.response);
});
};
The HTTP request succeeds, and all is well with the world: the server side code to upload is based on laravel:
// this is inside the controller.
public function index( Request $request )
{
// Set file attributes.
$filepath = '/public/chunks/';
$file = $request->file('data');
$filename = $uuid . ".webm";
// streamupload
File::streamUpload($filepath, $filename, $file, true);
return response()->json(['uploaded' => true,'uuid'=>$uuid]);
}
// there's a service provider used to create a new macro on the File:: object, providing the facility for appropriate handling the stream:
public function boot()
{
File::macro('streamUpload', function($path, $fileName, $file, $overWrite = true) {
$resource = fopen($file->getRealPath(), 'r+');
$storageClient = new StorageClient([
'projectId' => 'myprjectid',
'keyFilePath' => '/my/path/to/servicejson.json',
]);
$bucket = $storageClient->bucket('mybucket');
$adapter = new GoogleStorageAdapter($storageClient, $bucket);
$filesystem = new Filesystem($adapter);
return $overWrite
? $filesystem->putStream($fileName, $resource)
: $filesystem->writeStream($fileName, $resource);
});
}
So to reiterate:
1) React app chunks out blobs,
2) server side determines if it should create or append in Google Cloud Storage
3) server side succeeds.
4) Video inside Google Cloud platform is corrupted.
However, the video file, inside the Google Cloud container is corrupted and won't play. I'm unsure exactly why it is corrupted, but my guesses so far:
Some sort of Dodgy Mime type problem.. - different browsers seem to handle the codec / filetype differently from the mediarecorder: e.g. Chrome seems to be x-matroska (.mkv?) - firefox different again.. Ideally I would have a container of .webm - notice how I set the file name server side, and it isn't coming from the client. Should it? I'm unsure how to force the MediaRecorder to be a specific mimeType - I thought the blobOptions option should do it, but changing the extension and mime type seems to have little to no impact on the corruption occurring.
Some sort of problem during upload where an HTTP request doesn't execute and finish in order - e.g.
1 onDataAvailable completes second
2 onDataAvailable completes first
3 onDataAvailable completes third
I've sort of ruled this out because I think the chunks should be small enough.
Some sort of problem with Google Cloud Storage APIs that I'm using, perhaps in the wrong way? Does the cloud platform support streaming, and does this library send the correct params to do so?
Some sort of problem with how I'm uploading - should the axios headers be multipart formdata, or something else?
This is the package I'm using for the Server side: https://github.com/Superbalist/flysystem-google-cloud-storage
Can anyone shed any light on how to achieve this goal of streaming up into Google Cloud without the video from the MediaRecorder being corrupted? Hopefully there's enough detail here in the question to help figure it out. The problem as illustrated isn't getting the file as far as Google Cloud, but rather the resulting file being unplayable in any video format.
Update
I've ordered my chunks client side now, and queued them properly before letting them reach the server. No difference to the output. As some have suggested - a single blob upload request works fine.
Tried using the resumable config param (from reading the source code, it seems chunks need to be a certain size before Google recognises them as a resumable upload):
$filesystem = new Filesystem($adapter, [
'resumable'=>true
]);
Not sure how: https://cloud.google.com/storage/docs/performing-resumable-uploads - is implemented within the libraries I'm using, (or within the Google Cloud APIs themselves if at all?). Do I need to implement that myself? Documentation is light on Google's part.
Short version:
The first thing you should do is buffer the whole video locally and send a single payload to the server and on to Google Cloud Storage. This will validate that your code is actually correct for a small video. Once you have verified this, you can move on to handling multi-chunk uploads.
Longer version:
For starters, you aren't passing the uuid in the request; the function accepts it but never sends it:
const uploadRecording = (uuid, data, fn, fnErr) => {
axios
.post(endpoint + "/stream/upload", data, {
headers: {
"Content-Type": "multipart/form-data",
},
})
.then(function (response) {
fn(response);
})
.catch(function (error) {
fnErr(error.response);
});
};
Next, you can't trust how chunking will work, I think you verified this behavior with the out of order result of chunk logging. You need to assume on your server you will get chunks out of order and handle them correctly.
Each chunk you get on the server needs to be put in the right place. You can't just "writeStream"; you need to write to the explicit byte range. Specifically, specify the byte range on every request. From the Google docs:
curl -i -X PUT --data-binary @CHUNK_LOCATION \
-H "Content-Length: CHUNK_SIZE" \
-H "Content-Range: bytes CHUNK_FIRST_BYTE-CHUNK_LAST_BYTE/TOTAL_OBJECT_SIZE" \
"SESSION_URI"
CHUNK_LOCATION is the local path to the chunk that you're currently uploading.
CHUNK_SIZE is the number of bytes you're uploading in the current request. For example, 524288.
CHUNK_FIRST_BYTE is the starting byte in the overall object that the chunk you're uploading contains.
CHUNK_LAST_BYTE is the ending byte in the overall object that the chunk you're uploading contains.
TOTAL_OBJECT_SIZE is the total size of the object you are uploading.
SESSION_URI is the value returned in the Location header when you initiated the resumable upload.
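As a rough illustration of the byte-range arithmetic described above, the following Python sketch computes the Content-Length and Content-Range values for each chunk. The function names are made up for this example. Two points come from my reading of the Cloud Storage resumable upload docs and should be double-checked there: every chunk except the last should be a multiple of 256 KiB, and `*` can stand in for the total size while it is still unknown (as it would be for a live recording).

```python
# Per the Cloud Storage docs as I understand them: every chunk except the
# final one should be a multiple of 256 KiB.
GCS_CHUNK_MULTIPLE = 256 * 1024


def content_range(first_byte, chunk_len, total_size=None):
    """Build a Content-Range value; '*' means the total is not yet known."""
    last_byte = first_byte + chunk_len - 1
    total = "*" if total_size is None else str(total_size)
    return f"bytes {first_byte}-{last_byte}/{total}"


def chunk_headers(total_size, chunk_size):
    """Return the headers for each chunk of an object of known size."""
    if chunk_size <= 0 or chunk_size % GCS_CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    headers = []
    for first_byte in range(0, total_size, chunk_size):
        chunk_len = min(chunk_size, total_size - first_byte)
        headers.append({
            "Content-Length": str(chunk_len),
            "Content-Range": content_range(first_byte, chunk_len, total_size),
        })
    return headers


if __name__ == "__main__":
    for header in chunk_headers(total_size=600_000, chunk_size=262_144):
        print(header)
```

The takeaway for the question is that each chunk has to say where it belongs in the final object, so the server (or the client) needs to track the running byte offset across requests.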
Try to eliminate as many variables as possible and pinpoint where exactly the file is getting corrupted.
Since you are using a React(JS) -> Laravel(PHP) -> GoogleCloud path,
first thing I would suggest is to test each step separately:
React -> Laravel - save the file on your server and check if it's corrupted at this point
Laravel -> GoogleCloud - Load a file from the server filesystem and upload to cloud and see if it gets corrupted
I don't have experience with Google cloud, but I did something very similar with AWS and found that their video uploading service was extremely picky about the requests (including order of headers that were sent).
Try to compare the specs on the service you are using with your input, make the smallest possible thing that works and start adding variables until you get to the final state.
Also I don't see any kind of data ordering in your code.
If your chunks are sent close together in time, which is very likely with streaming, there is a chance that they will arrive in a different order than they were sent. If you just append them to a file without any control over the ordering, the file will indeed get corrupted. I'm not sure whether, for webm, that would break just parts of the video or kill the entire thing.
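One way to add that ordering on the server is sketched below in Python. This is only an illustration: it assumes the client sends a 0-based sequence number with every chunk (the code in the question does not do this yet), and the class name is invented. Chunks that arrive early are held back until everything before them has been written.

```python
import io


class OrderedAppender:
    """Appends chunks to a sink strictly in sequence-number order."""

    def __init__(self, sink):
        self.sink = sink       # any object with a write(bytes) method
        self.next_seq = 0      # sequence number expected next
        self.pending = {}      # early arrivals, keyed by sequence number

    def add(self, seq, data):
        """Buffer the chunk, then flush every chunk that is now contiguous."""
        self.pending[seq] = data
        while self.next_seq in self.pending:
            self.sink.write(self.pending.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    sink = io.BytesIO()
    appender = OrderedAppender(sink)
    appender.add(1, b"second ")   # arrives first, held back
    appender.add(0, b"first ")    # releases both 0 and 1
    appender.add(2, b"third")
    print(sink.getvalue())
```

In a real deployment the pending chunks would have to live somewhere shared (disk, cache, database), since consecutive requests may be handled by different workers.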

How to stream mp3 from server-side keeping the file hidden?

I'm trying to play mp3 files served from the server side on the client side. The client accesses the server, passing some ID, and the server returns the file.
How does this work right now?
Well, using Laravel (server-side) and AngularJS (client-side) on distinct URLs, I'm able to play the song.
But if I grab the response to the request, I'm able to download the song.
So, what would be a good way to set this up so that the file isn't exposed to the user?
I would write some sort of file proxy.
You have to move your files out of the publicly accessible area, for example one level above the page root, so that it is not possible to get the data directly.
Then you need a server side script, that gets the data and returns it with the headers you need.
Here is an example (plain PHP):
/**
* @param string $file_name
* @param string $mime
* @param bool $download
*/
public function fileProxyAction($file_name, $mime, $download = false) {
if(basename($file_name) != $file_name) return 'Filename not valid!';
$path = '... your path goes here';
$file = $path.$file_name;
if (!(file_exists($file) && is_readable($file))) return 'The file "'.$file_name.'" could not be found!';
ob_clean();
if($download === false) {
header('Content-type: '.$mime);
header('Content-length: '.filesize($file));
$open = @fopen($file, 'rb');
if ($open) {
fpassthru($open);
exit;
}
} else {
// download
$path_parts = pathinfo($file);
header("Content-Disposition: attachment; filename=\"".$path_parts["basename"]."\"");
header("Content-type: application/octet-stream");
header("Content-length: " . filesize($file));
header("Cache-control: private"); // open files directly
readfile($file);
die;
}
}
Laravel has an excellent Built-In-Filesystem. Check it out. I'm sure you can optimize my method with it.
EDIT
If you need to check a token or something, you shouldn't call the fileProxyAction directly by the router. Instead let your router call a Method which checks the token or what ever you're using ;)
Example (pseudo code):
Route::get('/mp3/{id}/{token}', function($id, $token) {
if($token !== Session::get('token')) return App::abort(401);
// Look the record up once and reuse it
$mp3 = Mp3::findOrFail($id);
return $this->fileProxyAction($mp3->name, $mp3->mime);
});
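The two parts of the proxy that matter most are the filename check and the choice of response headers. Here is a small language-neutral sketch of both in Python, mirroring the PHP above. The function names and the `/srv/media` directory are hypothetical, and actually streaming the bytes is left to whatever framework you use.

```python
import os


def resolve_media_path(base_dir, file_name):
    """Reject anything that is not a bare filename, then join it to base_dir."""
    if (
        not file_name
        or file_name in (".", "..")
        or os.path.basename(file_name) != file_name
    ):
        raise ValueError("Filename not valid")
    return os.path.join(base_dir, file_name)


def response_headers(file_name, mime, size, download=False):
    """Headers for inline playback, or for a forced download."""
    headers = {"Content-Length": str(size)}
    if download:
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = f'attachment; filename="{file_name}"'
        headers["Cache-Control"] = "private"
    else:
        headers["Content-Type"] = mime
    return headers


if __name__ == "__main__":
    # "/srv/media" is a made-up directory outside the web root.
    print(resolve_media_path("/srv/media", "song.mp3"))
    print(response_headers("song.mp3", "audio/mpeg", 4_200_000))
```

Note that this only keeps the storage path hidden. Anyone who can play the stream can still save the bytes they receive, so the token check is what actually limits access.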

AWS file upload more than 5mb - nodejs

I am using this module to upload files to Amazon S3:
https://www.npmjs.com/package/streaming-s3. It works fine if the file is 5 MB or smaller.
I tried to upload a 6 MB PDF file. It reports that the upload succeeded, but when I try to open the file through AWS,
it shows "Failed to load PDF document".
When I checked the size on AWS, it shows 5 MB.
I am using following code to upload on AWS
var options = {
concurrentParts: 2,
waitTime: 20000,
retries: 2,
maxPartSize: 10 * 1024 * 1024
};
//call stream function to upload the file to s3
var uploader = new streamingS3(fileReadStream, config.aws.accessKey, config.aws.secretKey, awsHeader, options);
//start uploading
uploader.begin();// important if callback not provided.
// handle these functions
uploader.on('data', function (bytesRead) {
console.log(bytesRead, ' bytes read.');
});
uploader.on('part', function (number) {
console.log('Part ', number, ' uploaded.');
});
// All parts uploaded, but upload not yet acknowledged.
uploader.on('uploaded', function (stats) {
console.log('Upload stats: ', stats);
});
uploader.on('finished', function (response, stats) {
console.log(response);
logger.log('info', "UPLOAD ", response);
cb(null, response);
});
uploader.on('error', function (err) {
console.log('Upload error: ', err);
logger.log('error', "UPLOAD Error: ", err);
cb(err);
});
This works fine for files smaller than 5 MB.
Any idea? Are there any settings I need to change on AWS?
Thanks
This is the intended behavior when piping content to S3 via the multipart upload API: memory usage stays low even when operating on a stream that is gigabytes in size. The stream avoids high memory usage by flushing to S3 in 5 MB parts, so it should only ever hold 5 MB of stream data at a time.
The problem we are facing here is that the next part is not being added to the upload.
Refer to this link for end-to-end details:
https://www.npmjs.com/package/s3-upload-stream
You can also track the upload progress to debug the issue:
/* Handle progress. Example details object:
{ ETag: '"f9ef956c83756a80ad62f54ae5e7d34b"',
PartNumber: 5,
receivedSize: 29671068,
uploadedSize: 29671068 }
*/
upload.on('part', function (details) {
console.log(details);
});
You can also log when the whole file has finished uploading:
upload.on('uploaded', function (details) {
console.log(details);
});
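To check where the missing megabyte goes, it can help to reason about the parts independently of any library. The Python sketch below splits a stream into multipart-style parts and is purely illustrative: the function name is invented, and it assumes a buffered stream whose read(n) returns n bytes until the end of the data. The one hard fact it relies on is that S3 requires every part except the last to be at least 5 MiB.

```python
import io

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def iter_parts(stream, part_size=MIN_PART_SIZE):
    """Yield (part_number, data) pairs; part numbers start at 1."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    part_number = 1
    while True:
        # Assumes read(n) returns n bytes until the stream is exhausted.
        data = stream.read(part_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


if __name__ == "__main__":
    six_mib = io.BytesIO(b"\0" * (6 * 1024 * 1024))
    sizes = [len(data) for _, data in iter_parts(six_mib)]
    print(sizes, "total:", sum(sizes))
```

For a 6 MB file this gives one full 5 MB part plus a 1 MB remainder. An object that ends up at exactly 5 MB on S3 suggests the final, smaller part was never uploaded, so comparing the sum of the logged part sizes with the file size is a quick check.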

PhoneGap photo to Google App Engine

I am trying to upload a photo jpg from a PhoneGap app (javascript) to Google App Engine (php), store parameters in a db, and photo to Google Cloud Storage. All works except the photo file transfer.
The upload function I'm using is typical of PhoneGap's file transfer example http://docs.phonegap.com/en/edge/cordova_file_file.md.html#FileTransfer.
function uploadPhoto(imageURI) {
// imageURI is photo local url file:///Users/me/Library/...etc ... .jpg
// from navigator.camera.getPicture function
// prepare post variables
var options = new FileUploadOptions();
options.fileKey="file";
options.fileName=imageURI.substr(imageURI.lastIndexOf('/')+1);
options.mimeType="image/jpeg";
var params = new Object();
params.foo = "foo";
options.params = params;
options.chunkedMode = false;
// upload image and option
var ft = new FileTransfer();
ft.upload(imageURI, 'http://myapphere.appspot.com/php/myphoto.php', function(r){
// do stuff with r.response
},function(error){
// error
},options,true);
}
On the Google App Engine server, the parameter variables pass fine - what doesn't seem to transfer is the $_FILES file.
/* php/myphoto.php */
// option variables are passed - this works
$foo = $_POST["foo"];
// but it seems the $_FILES is empty?
$gs_name = $_FILES["file"]["tmp_name"]; // <-- not transferring?
$fileName = 'test.jpg';
$moveResult = move_uploaded_file($gs_name, "gs://mybucket/".$fileName);
A file test.jpg is stored in mybucket (a small blank binary/octet). As a test, I created an HTML form on GAE to upload an image file and move_uploaded_file the $_FILES entry to mybucket, and that works. It's the transfer of $_FILES from the JavaScript app that I can't figure out. (I'm aware of the Google Cloud Storage JSON API objects.insert, but here I'd like to go from PhoneGap HTML/JavaScript to a Google App Engine PHP page to process the passed data.)

Unable to upload multiple files to Google drive using PHP SDK

I am trying to upload multiple files to Google drive using the PHP SDK. For this I am calling the function below iteratively passing the required parameters:
function insertFile($driveService, $title, $description, $parentId, $fileUrl) {
global $header;
$file = new Google_DriveFile();
$file->setTitle($title);
$file->setDescription($description);
$mimeType= "application/vnd.google-apps.folder";
if ($fileUrl != null) {
$fileUrl = replaceSpaceWithHtmlCode($fileUrl);
$header = getUrlHeader($fileUrl);
$mimeType = $header['content-type'];
}
$file->setMimeType($mimeType);
$parent = new Google_ParentReference();
// Set the parent folder.
if ($parentId != null) {
$parent->setId($parentId);
$file->setParents(array($parent));
}
try {
$data = null;
if ($fileUrl != null) {
if (hasErrors($driveService, $fileUrl) == True) {
return null;
}
$data = file_get_contents($fileUrl);
}
$createdFile = $driveService->files->insert($file, array(
'data' => $data,
'mimeType' => $mimeType,
));
return $createdFile;
} catch (Exception $e) {
echo "Error: 12";
return null;
}
}
I am running this app on the Google App Engine.
However, I am unable to upload all the files I pass to it. For example, if I pass about 12-15 files, only 10-11 get uploaded, and sometimes all get uploaded, even though all parameters are correct. I have caught the exception when it fails to create a file and this says it is unable to create a file, for the files that are not uploaded. I don't see any warnings or errors in the logs on the app engine.
Am I missing something? Can someone please point me where I should be looking to correct this and make it reliable enough to upload all files given to it?
The HTTP response that I get when I try to upload 30 files is this:
PHP Fatal error: The request was aborted because it exceeded the maximum execution time
Check the http response to see the detailed reason. It might be that you are hitting the throttle limit and getting a 403 rate limit response.
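If the failures do turn out to be rate-limit responses, the usual remedy is to retry with exponential backoff. Here is a minimal Python sketch of that pattern. Everything in it is illustrative: `RateLimited` is a made-up exception standing in for whatever your client raises on a 403 rate-limit response, and the delays are arbitrary.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for the error your client raises on a rate-limit response."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_upload():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimited()
        return "uploaded"

    # Pass a no-op sleep so the demo finishes instantly.
    print(call_with_backoff(flaky_upload, sleep=lambda seconds: None))
```

Keep in mind that time spent sleeping still counts toward the request's maximum execution time, which is the fatal error shown above. For larger batches it may be better to handle each file in its own request or background task.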
