Is it possible to save a file directly from a web worker?

I have an entirely browser-based (i.e. no backend) application which analyzes XML data in files which average about 250MB each. The actual parsing and analysis happens in a web worker, which is fed data in 64KB chunks by a FileReader instance, and this is all quite performant.
I have a request from the client to expand this application so that it can generate a .zip file containing the original input file and the results of the analysis, and allow the user to save that file to her local machine. Generating a .zip file in memory with those contents isn't a problem. The problem lies in transferring that much data from the web worker which generates it back to the main browser thread, so that it can be saved; attempting to do this invariably provokes a crash or out-of-memory exception. (I've tried transferring strings all at once and a chunk at a time, and I've tried using an ArrayBuffer as a transferable object to avoid copying. All fail in the same fashion.)
Unfortunately, I don't know any way to invoke a file save operation directly from a worker thread. I know several methods of doing so from the main browser thread, but all of them require either the ability to create DOM nodes (which worker threads of course can't do), or the use of interfaces (i.e. msSaveBlob, saveAs) which no browser seems to expose to a worker thread. I've spent a while looking for possibilities on the web, but found nothing usable; FileWriterSync looked good, but only Chrome supports it, and I need to target IE and Firefox as well.
Is there a method I've overlooked for saving files directly from a web worker? If so, what is it? Or am I just out of luck here?

tl;dr demo
You don't need to copy the entire file back to the main thread at all. You don't even need to transfer it, in fact. First, a recap.
This is how to create a Blob from a typed array:
// Some arbitrary binary data
const mydata = new Uint16Array([1,2,3,4,5]);
// mydata vs. mydata.buffer does not seem to make any difference
const blob = new Blob([mydata], {type: "application/octet-stream"});
You can then create an object URL, which is a browser-managed reference to the Blob, accessible as a URL. I have done this with huge files without seeing any performance impact:
const url = URL.createObjectURL(blob);
This is how I typically download URLs:
const link = document.createElement("a");
link.download = "data.bin";
link.href = url;
link.appendChild(new Text("Download data"));
link.addEventListener("click", function() {
    this.parentNode.removeChild(this);
    // remember to free the object URL, but wait until the download has been handled
    setTimeout(() => { URL.revokeObjectURL(url); }, 500);
});
document.body.appendChild(link);
You can trigger the download automatically by calling click() on that link; I prefer to let the user decide when to download.
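For the automatic variant, a minimal sketch (assuming the link has been appended to the document as above):

// Trigger the download programmatically instead of waiting for the user
link.click();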
So, all together:
worker.js
// Some arbitrary binary data
const mydata = new Uint16Array([1,2,3,4,5]);

self.onmessage = function(e) {
    console.log("Message:", e.data);
    switch(e.data.name) {
        case "make-download":
            const blob = new Blob([mydata.buffer], {type: "application/octet-stream"});
            const url = URL.createObjectURL(blob);
            self.postMessage({name: "download-link", link: url});
            break;
        default:
            console.error("Unknown message:", e.data.name);
    }
}
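Note that main.js below checks e.data.error, which the worker above never sets. A defensive guard (my addition, not part of the original demo) could feed that path in environments where URL is not exposed to workers:

// At the top of onmessage in worker.js: guard for environments
// where URL.createObjectURL is not available (my assumption)
self.onmessage = function(e) {
    if (typeof URL === "undefined" || !URL.createObjectURL) {
        self.postMessage({name: "download-link", error: "createObjectURL unavailable in this worker"});
        return;
    }
    // ... handle e.data.name as above ...
}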
main.js
var worker = new Worker("worker.js");

worker.addEventListener("message", function(e) {
    switch(e.data.name) {
        case "download-link": {
            if(e.data.error) {
                console.error("Download error:", e.data.error);
            }
            else {
                const link = document.createElement("a");
                link.download = "data.bin";
                link.href = e.data.link;
                link.appendChild(new Text("Download data"));
                link.addEventListener("click", function() {
                    this.parentNode.removeChild(this);
                    // remember to free the object URL, but wait until the download has been handled
                    setTimeout(() => { URL.revokeObjectURL(e.data.link); }, 500);
                });
                document.body.appendChild(link);
            }
            break;
        }
        default:
            console.error("Unknown message:", e.data.name);
    }
});

function requestDownload() {
    worker.postMessage({name: "make-download"});
}
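To wire this up, a hypothetical button could call requestDownload() (the button id here is my assumption, not part of the original demo):

// Hypothetical wiring: <button id="download-btn">Prepare download</button>
document.getElementById("download-btn")
    .addEventListener("click", requestDownload);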
When I click Download in my demo and inspect the result in a hex editor, the bytes match the original data. Looks just fine :)

Related

Corrupt video uploads when chunking MediaRecorder to Google Cloud platform

I'm currently using a React-hook-powered component to record my screen and subsequently upload the recording to Google Cloud Storage. However, when it finishes, the file created inside Google Cloud Storage appears to be corrupt.
This is the gist of the code within my React component, where useMediaRecorder is from here: https://github.com/wmik/use-media-recorder -
let {
    error,
    status,
    mediaBlob,
    stopRecording,
    getMediaStream,
    startRecording,
    liveStream,
} = useMediaRecorder({
    onCancelScreenShare: () => {
        stopRecording();
    },
    onDataAvailable: (chunk) => {
        // do the uploading here:
        onChunk(chunk);
    },
    recordScreen: true,
    blobOptions: { type: "video/webm;codecs=vp8,opus" },
    mediaStreamConstraints: { audio: audioEnabled, video: true },
});
As data becomes available, the hook calls onChunk(chunk), passing a binary Blob through to the following section of code, which performs the upload:
const onChunk = (binaryData) => {
    var formData = new FormData();
    formData.append("data", binaryData);
    let customerApi = new CustomerVideoApi();
    customerApi.uploadRecording(
        videoUUID,
        formData,
        (res) => {},
        (err) => {}
    );
};
customerApi.uploadRecording looks like this (using axios).
const uploadRecording = (uuid, data, fn, fnErr) => {
    axios
        .post(endpoint + "/stream/upload", data, {
            headers: {
                "Content-Type": "multipart/form-data",
            },
        })
        .then(function (response) {
            fn(response);
        })
        .catch(function (error) {
            fnErr(error.response);
        });
};
The HTTP request succeeds, and all is well with the world. The server-side upload code is based on Laravel:
// this is inside the controller.
public function index( Request $request )
{
    // Set file attributes.
    $filepath = '/public/chunks/';
    $file = $request->file('data');
    $filename = $uuid . ".webm";
    // streamupload
    File::streamUpload($filepath, $filename, $file, true);
    return response()->json(['uploaded' => true, 'uuid' => $uuid]);
}
// There's a service provider used to create a new macro on the File:: object,
// providing the facility for appropriately handling the stream:
public function boot()
{
    File::macro('streamUpload', function($path, $fileName, $file, $overWrite = true) {
        $resource = fopen($file->getRealPath(), 'r+');
        $storageClient = new StorageClient([
            'projectId' => 'myprjectid',
            'keyFilePath' => '/my/path/to/servicejson.json',
        ]);
        $bucket = $storageClient->bucket('mybucket');
        $adapter = new GoogleStorageAdapter($storageClient, $bucket);
        $filesystem = new Filesystem($adapter);
        return $overWrite
            ? $filesystem->putStream($fileName, $resource)
            : $filesystem->writeStream($fileName, $resource);
    });
}
So to reiterate:
1) the React app chunks out blobs,
2) the server side determines if it should create or append in Google Cloud Storage,
3) the server side succeeds,
4) the video inside Google Cloud platform is corrupted.
However, the video file inside the Google Cloud container is corrupted and won't play. I'm unsure exactly why, but my guesses so far:
Some sort of dodgy MIME type problem - different browsers seem to handle the codec/filetype from the MediaRecorder differently: e.g. Chrome seems to produce x-matroska (.mkv?), Firefox something different again. Ideally I would have a .webm container - notice how I set the file name server-side; it isn't coming from the client. Should it? I'm unsure how to force the MediaRecorder to use a specific mimeType - I thought the blobOptions option should do it, but changing the extension and MIME type seems to have little to no impact on the corruption occurring.
Some sort of problem during upload, where the HTTP requests don't execute and finish in order - e.g.
1) the first onDataAvailable upload completes second,
2) the second onDataAvailable upload completes first,
3) the third onDataAvailable upload completes third.
I've sort of ruled this out because I think the chunks should be small enough.
Some sort of problem with Google Cloud Storage APIs that I'm using, perhaps in the wrong way? Does the cloud platform support streaming, and does this library send the correct params to do so?
Some sort of problem with how I'm uploading - should the axios headers be multipart formdata, or something else?
This is the package I'm using for the Server side: https://github.com/Superbalist/flysystem-google-cloud-storage
Can anyone shed any light on how to achieve this goal of streaming up into Google Cloud without the video from the MediaRecorder being corrupted? Hopefully there's enough detail here in the question to help figure it out. The problem isn't getting the file as far as Google Cloud, but rather that the resulting file is unplayable by any video player.
Update
I've ordered my chunks client side now, and queued them properly before letting them reach the server. No difference to the output. As some have suggested - a single blob upload request works fine.
Tried using the resumable config param (from reading the source code it seems like chunks need to be a certain size before Google recognises them as a resumable upload):
$filesystem = new Filesystem($adapter, [
    'resumable' => true
]);
I'm not sure how https://cloud.google.com/storage/docs/performing-resumable-uploads is implemented within the libraries I'm using (or within the Google Cloud APIs themselves, if at all). Do I need to implement that myself? Documentation is light on Google's part.
Short version:
The first thing you should do is buffer the whole video locally and send a single payload to the server and on to Google Cloud. This will validate that your code for a small video is actually correct. Once you can verify this, you can move on to handling multi-chunk uploads.
Longer version:
For starters, you aren't passing the uuid with the request, even though the server uses it:
const uploadRecording = (uuid, data, fn, fnErr) => {
    axios
        .post(endpoint + "/stream/upload", data, {
            headers: {
                "Content-Type": "multipart/form-data",
            },
        })
        .then(function (response) {
            fn(response);
        })
        .catch(function (error) {
            fnErr(error.response);
        });
};
Next, you can't trust how chunking will work; I think you verified this behavior with the out-of-order chunk logging. You need to assume your server will receive chunks out of order and handle them correctly.
Each chunk you get on the server needs to be put in the right place; you can't just "writeStream", you need to write to an explicit byte range. Specifically, specify the byte range on every request (Google docs):
curl -i -X PUT --data-binary @CHUNK_LOCATION \
    -H "Content-Length: CHUNK_SIZE" \
    -H "Content-Range: bytes CHUNK_FIRST_BYTE-CHUNK_LAST_BYTE/TOTAL_OBJECT_SIZE" \
    "SESSION_URI"
Where:
CHUNK_LOCATION is the local path to the chunk that you're currently uploading.
CHUNK_SIZE is the number of bytes you're uploading in the current request, for example 524288.
CHUNK_FIRST_BYTE is the starting byte in the overall object that the chunk you're uploading contains.
CHUNK_LAST_BYTE is the ending byte in the overall object that the chunk you're uploading contains.
TOTAL_OBJECT_SIZE is the total size of the object you are uploading.
SESSION_URI is the value returned in the Location header when you initiated the resumable upload.
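As a browser-side illustration of the same idea - a sketch only, not the poster's code; sessionUri, the byte bookkeeping, and the 308 handling are assumptions based on the resumable-upload docs:

// Sketch: upload one chunk of a resumable upload with an explicit byte range.
// The caller is assumed to track firstByte and totalSize across chunks.
async function uploadChunk(sessionUri, chunk, firstByte, totalSize) {
    const lastByte = firstByte + chunk.size - 1;
    const response = await fetch(sessionUri, {
        method: "PUT",
        headers: {
            "Content-Range": `bytes ${firstByte}-${lastByte}/${totalSize}`,
        },
        body: chunk, // a Blob; the browser sets Content-Length itself
    });
    // GCS answers 308 while the upload is incomplete and more chunks are expected.
    return response.status === 308 || response.ok;
}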
Try to eliminate as many variables as possible and pinpoint where exactly the file is getting corrupted.
Since you are using a React(JS) -> Laravel(PHP) -> GoogleCloud path,
the first thing I would suggest is to test each step separately:
React -> Laravel: save the file on your server and check whether it's corrupted at that point.
Laravel -> GoogleCloud: load a file from the server filesystem, upload it to the cloud, and see if it gets corrupted.
I don't have experience with Google cloud, but I did something very similar with AWS and found that their video uploading service was extremely picky about the requests (including order of headers that were sent).
Try to compare the specs on the service you are using with your input, make the smallest possible thing that works and start adding variables until you get to the final state.
Also, I don't see any kind of data ordering in your code. If your chunks are close to each other (and with streaming that is highly likely), there is a chance they will arrive in a different order than they were sent. If you just append them to a file without any control over the sorting, the file will indeed get corrupted. I'm not sure whether, for webm, that would cause just parts of the video to be broken or the entire thing to die.
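One way to enforce ordering on the client - a sketch, not the poster's code; uploadFn is assumed to return a Promise (e.g. an axios call like the one above):

// Sketch: serialize chunk uploads so they reach the server in order.
function createOrderedUploader(uploadFn) {
    let seq = 0;
    let queue = Promise.resolve();
    return function enqueue(chunk) {
        const index = seq++;
        // Chain each upload onto the previous one so order is guaranteed.
        queue = queue.then(() => uploadFn(chunk, index));
        return queue;
    };
}

// Hypothetical usage with the hook above:
// const enqueue = createOrderedUploader(onChunk);
// onDataAvailable: (chunk) => enqueue(chunk),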

Download files from firebase storage using ReactJs

I have successfully managed to upload files to Firebase Storage, but now I want to display all the files in a table and have an option to download each file. I've read the Firebase documentation but can't get it to work. When I click the button, I want to fetch all the files and visualize them in a table that users can see.
Show file function:
showFileUrl(){
    storageRef.child('UploadedFiles/').listAll().then(function(res) {
        res.items.forEach(function(folderRef) {
            console.log("folderRef", folderRef.toString());
            var blob = null;
            var xhr = new XMLHttpRequest();
            xhr.open("GET", "downloadURL");
            xhr.responseType = "blob";
            xhr.onload = function() {
                blob = xhr.response; // xhr.response is now a blob object
                console.log(blob);
            }
            xhr.send();
        });
    }).catch(function(error) {
    });
}
This is the network log I found when debugging. What do I need to do to get all the data, visualize it in a table, and have a download button that downloads the file when pressed?
(Screenshots: the network log, the storage in Firebase, and the Blob objects of the files.)
Your code gets a list of all the files, but it doesn't actually do anything to read the data for each file.
When using the Web client SDK, the only way to get the data for a file is through a download URL. So you'll need to:
1) Loop through all the files you get back from listAll() (you're already doing this).
2) Call getDownloadURL() on each file to get its download URL.
3) Use another library/function (such as fetch() or XMLHttpRequest) to read the data for each file.
Alternatively, if your files are images, you can stuff the download URL in an img tag as the preview.
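Putting those steps together, a rough sketch (table rendering and error handling left out):

// Sketch: list all files, resolve a download URL for each, then fetch the bytes.
storageRef.child('UploadedFiles/').listAll().then(function(res) {
    res.items.forEach(function(itemRef) {
        itemRef.getDownloadURL().then(function(url) {
            fetch(url)
                .then(function(response) { return response.blob(); })
                .then(function(blob) {
                    // blob now holds the file data; itemRef.name is the file name
                    console.log(itemRef.name, blob);
                });
        });
    });
});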

I'm concerned about memory leak from RecordRTC url object

Using the RecordRTC library, I'm adding webcam video recording, replay, and saving functionality to my React web application. Coming from native application development, I'm always wary of potential memory leaks, which there can often be diagnosed by checking system memory or noticing a lagging UI. In web applications, what diagnostics can you perform to see whether a JS object is being created and deleted properly, without leaks?
My concern appeared when I began integrating the replay functionality shown below. The requestUserMedia method instantiates the webcam stream when the React component mounts, and the src state gets assigned the URL of the video stream. Afterwards, any time the stop button is clicked, a new URL, representing a webm file of the recorded video, is created and assigned to the same src state. The streaming and replay functionality works as planned. But I'm concerned that repeatedly recording and replaying video, each time creating a new URL wrapping a webm file, will leak memory until the browser is refreshed.
Are there any browser-level checks I could run to diagnose this? Or is this something I shouldn't be concerned about at all in the web application world?
requestUserMedia() {
    captureUserMedia((stream) => {
        this.setState({ src: window.URL.createObjectURL(stream) });
    });
}

handleRecord(){
    if (!this.state.record) {
        captureUserMedia((stream) => {
            var recorder = RecordRTC(stream, {
                type: 'video'
            });
            recorder.startRecording();
            this.state.recordVideo = recorder;
        });
    } else {
        var recorder = this.state.recordVideo;
        recorder.stopRecording(() => {
            var blob = recorder.getBlob();
            var url = window.URL.createObjectURL(blob);
            this.setState({ src: url });
        });
    }
    let newRecordState = !this.state.record;
    this.setState({
        record: newRecordState
    });
}
Setting the video's src to a string created with URL.createObjectURL(stream) has been deprecated for exactly that reason. Set video.srcObject = stream instead.
For the second createObjectURL (the recorded blob), use URL.revokeObjectURL to revoke the previous URL before creating a new one.
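A sketch of both fixes applied to the code above (videoRef is my assumption; in React, srcObject is a DOM property rather than an attribute, so it has to be set through a ref):

requestUserMedia() {
    captureUserMedia((stream) => {
        // Attach the live stream directly; no object URL to leak.
        this.videoRef.current.srcObject = stream;
    });
}

// And when a recording finishes, revoke the previous blob URL before replacing it:
recorder.stopRecording(() => {
    if (this.state.src) {
        window.URL.revokeObjectURL(this.state.src);
    }
    this.setState({ src: window.URL.createObjectURL(recorder.getBlob()) });
});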

AngularJS GET receives empty reply in Chrome but not in Fiddler

I'm implementing file download using AngularJS and WCF. My back-end is a .NET project hosted in IIS. The file is serialized as an array of bytes and then on the client side I utilize the File API to save the content.
To simplify the problem, back-end is like:
[WebInvoke(Method = "GET", UriTemplate = "FileService?path={path}")]
[OperationContract]
public byte[] DownloadFileBaseOnPath(string path)
{
    using (var memoryStream = new MemoryStream())
    {
        var fileStream = File.OpenRead(path);
        fileStream.CopyTo(memoryStream);
        fileStream.Close();
        WebOperationContext.Current.OutgoingResponse.Headers["Content-Disposition"] = "attachment; filename=\"Whatever\"";
        WebOperationContext.Current.OutgoingResponse.ContentType = "application/octet-stream"; // treat all files as binary
        return memoryStream.ToArray();
    }
}
And on the client side, it just sends a GET request for those bytes, converts them into a blob, and saves it:
function sendGetReq(url, config) {
    return $http.get(url, config).then(function(response) {
        return response.data;
    });
}
Then save the file:
function SaveFile(url) {
    var downloadRequest = sendGetReq(url);
    downloadRequest.then(function(data){
        var aLink = document.createElement('a');
        var byteArray = new Uint8Array(data);
        var blob = new Blob([byteArray], { type: 'application/octet-stream' });
        var downloadUrl = URL.createObjectURL(blob);
        aLink.setAttribute('href', downloadUrl);
        aLink.setAttribute('download', fileNameDoesNotMatter);
        if (document.createEvent) {
            var event = document.createEvent('MouseEvents');
            event.initEvent('click', false, false);
            aLink.dispatchEvent(event);
        }
        else {
            aLink.click();
        }
        setTimeout(function () {
            URL.revokeObjectURL(downloadUrl);
        }, 1000); // cleanup
    });
}
This approach works fine with small files; I could successfully download files up to 64MB. But when I try to download a file larger than 64MB, response.data is empty in Chrome. I also used Fiddler to capture the traffic, and according to Fiddler, the back-end successfully serialized the byte array and returned it (in this example I was trying to download a 70MB file), yet response.data came back empty.
Any idea why this is empty for files that large? The response itself is more than 200MB, but I do have enough memory for that.
As for the WCF back-end, I know I should use streaming mode when it comes to large files. But the typical use of my application is downloading files smaller than 10MB, so I hope to figure this out first.
Thanks
Answering my own question.
Honestly, I don't know what was going wrong; the issue persisted as long as I transferred the data as a byte array. I eventually gave up on that approach and returned a stream instead. Then on the client side I added the following configuration
{ responseType: 'blob' }
and saved the response as a blob.
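In AngularJS terms that looks roughly like this - a sketch under the assumptions above; the file name is a placeholder:

$http.get(url, { responseType: 'blob' }).then(function (response) {
    // response.data is already a Blob; no byte-array conversion needed
    var downloadUrl = URL.createObjectURL(response.data);
    var aLink = document.createElement('a');
    aLink.setAttribute('href', downloadUrl);
    aLink.setAttribute('download', 'file.bin'); // placeholder file name
    aLink.click();
    setTimeout(function () {
        URL.revokeObjectURL(downloadUrl); // cleanup
    }, 1000);
});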

Angular js way to download file and show loading screen using the $resource

I am using AngularJS to show a loading screen. It works for all REST service calls except the REST service that downloads a file. I understand why: for the download I am not making the call through $resource, but through a plain iframe, so the AngularJS code has no hook into the start/finish of the request. I tried hitting this REST service with $resource, and I do get the data back (the loading screen works fine in that case), but I am not sure how to offer that data to the user as a download, the Angular way. Details of both approaches follow. Please help.
Approach 1, using an iframe:
/* Download file */
scope.downloadFile = function (fileId) {
    // Show loading screen. (Somehow it is not working.)
    scope.loadingProjectFiles = true;
    var fileDownloadURL = "/api/files/" + fileId + "/download";
    downloadURL(fileDownloadURL);
    // Hide loading screen
    scope.loadingProjectFiles = false;
};

var $idown; // Keep it outside of the function, so it's initialized once.
var downloadURL = function (url) {
    if ($idown) {
        $idown.attr('src', url);
    } else {
        $idown = $('<iframe>', { id: 'idown', src: url }).hide().appendTo('body');
    }
};
Approach 2, using $resource (not sure how to present the data to the user for download):
/* Download file */
scope.downloadFile = function (fileId) {
    // Show loading screen (here loading screen works).
    scope.loadingProjectFiles = true;
    // File download object
    var fileDownloadObj = new DownloadFile();
    // Make server call to create new File
    fileDownloadObj.$get({ fileid: fileid }, function (response) {
        // Q? How to use the response data to display on UI as download popup
        // Hide loading screen
        scope.loadingProjectFiles = false;
    });
};
This is the correct pattern with the $resource service:
scope.downloadFile = function (fileId) {
    // Show loading screen (here the loading screen works).
    scope.loadingProjectFiles = true;
    var FileResource = $resource('/api/files/:idParam', { idParam: '@id' });
    // Make server call to retrieve the file
    var yourFile = FileResource.get({ id: fileId }, function () {
        // Now (inside this callback) the response data is loaded inside the yourFile variable
        // I know it's an ugly pattern but that's what $resource is about...
        DoSomethingWithYourFile(yourFile);
        // Hide loading screen
        scope.loadingProjectFiles = false;
    });
};
I agree with you that this is a weird pattern, different from other APIs where the downloaded data is handed to a callback parameter, hence your confusion.
Pay attention to the names and cases of the parameters: there are two mappings involved here, one between the caller and the $resource object, and another between that object and the URL it constructs for actually downloading the data.
Here are some ideas for the second approach; you could present the user with a link after the download has happened:
With a "data:" URL. Probably not a good idea for large files.
With a URL like "filesystem:mydownload.zip". You'd first have to save the file with the FileSystem API; you can find some inspiration on html5rocks.
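For the data-URL idea, a minimal sketch (yourFileBlob and the file name are assumptions):

// Sketch: turn a downloaded Blob into a data: URL (only sensible for small files)
var reader = new FileReader();
reader.onload = function () {
    var aLink = document.createElement('a');
    aLink.setAttribute('href', reader.result); // a "data:...;base64,..." URL
    aLink.setAttribute('download', 'mydownload.zip'); // placeholder name
    aLink.appendChild(document.createTextNode('Download'));
    document.body.appendChild(aLink); // present the link to the user
};
reader.readAsDataURL(yourFileBlob);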
