Create a zip file on the fly for download through Node.js

I need to achieve the setup below with a Node.js script: generate the zip on the fly, without ever touching disk, and send it to the client as a download. Can someone guide me and post a working script? From searching, it seems this can be done with zipstream, but I didn't find any example or working script.
Grabs the files matching *.xml from the root folder.
Immediately writes to the client’s http response the http headers to say it’s a download and the file name is .zip.
zipstream writes the header bytes of zip container.
Creates an http request to the first image in S3.
Pipes that into zipstream (we don’t actually need to run deflate as the images are already compressed).
Pipes that into the client’s http response.
Repeats for each image, with zipstream correctly writing envelopes for each file.
zipstream writes the footer bytes for the zip container
Ends the http response.
Thanks,
Srinivas

I had the same requirement ... stream files from Amazon S3, zip them on the fly (in memory) and deliver to the browser through node.js. My solution involved using the knox and archiver packages and piping the archive's bytes to the result stream.
Since this is on the fly, you won't know the resulting archive size, so you cannot send a "Content-Length" HTTP header. Instead you'll have to use the "Transfer-Encoding: chunked" header.
The downside to "chunked" is you won't get a progress bar for the download. I've tried setting the Content-Length header to an approximate value, but this only works for Chrome and Firefox; IE corrupts the file; haven't tested with Safari.
var http = require("http");
var knox = require("knox");
var archiver = require('archiver');
// S3 client; the credentials and bucket name are placeholders
var client = knox.createClient({ key: '<api-key>', secret: '<secret>', bucket: '<bucket>' });
http.createServer(function(req, res) {
var zippedFilename = 'test.zip';
var archive = archiver('zip', { store: true }); // store only: don't compress the archive
var header = {
"Content-Type": "application/x-zip",
"Pragma": "public",
"Expires": "0",
"Cache-Control": "private, must-revalidate, post-check=0, pre-check=0",
"Content-disposition": 'attachment; filename="' + zippedFilename + '"',
"Transfer-Encoding": "chunked",
"Content-Transfer-Encoding": "binary"
};
res.writeHead(200, header);
archive.pipe(res);
archive.pipe(res);
client.list({ prefix: 'myfiles' }, function(err, data) {
if (data.Contents) {
var fileCounter = 0;
data.Contents.forEach(function(element) {
var fileName = element.Key;
fileCounter++;
client.get(element.Key).on('response', function(awsData) {
archive.append(awsData, {name: fileName});
awsData.on('end', function () {
fileCounter--;
if (fileCounter < 1) {
archive.finalize();
}
});
}).end();
});
archive.on('error', function (err) {
throw err;
});
archive.on('finish', function (err) {
return res.end();
});
}
}).end();
}).listen(80, '127.0.0.1');


Downloading an Excel file causes it to corrupt

I have a simple service on Angular 2 and TypeScript that requests Excel files from a server and then opens a download file dialogue for the user. However, as it currently stands, the file becomes corrupt when downloaded.
When downloaded, it opens fine in OpenOffice and derivatives, but throws a "File is Corrupt" error in Microsoft Excel, which asks if the user wants to recover as much as it can.
When Excel is prompted to recover the file, it does so successfully, and the recovered file has all the rows and data expected. Comparing the recovered file against the file opened in OpenOffice and derivatives shows no notable differences.
The concrete Excel I am trying to download is generated with Apache POI in a microservice, then passed to the main backend and finally served to the frontend for the user to download. Both the backend and microservice are written in Java, through Spark Framework.
I made some tests on the backends, and concluded the problem is not the report generation nor the data transfer:
Asking the microservice to save the generated Excel in a file within the server and then opening such file (hereby file A) in Excel shows that file A is not corrupted.
Asking the main backend server to save the Excel file that it receives from the microservice in a file within itself and then opening such file in Excel (hereby file B) shows that file B is not corrupted.
Downloading both file A and file B through FileZilla from their respective servers yields completely uncorrupted files.
As such, I believe it is safe to assume the Excel file becomes corrupted somewhere between the time it is received on the frontend and the time the user downloads it. Additionally, the Catalina logs do not show any errors.
I have read several posts that deal with the issue, including a bug report (https://github.com/angular/angular/issues/14083) that included a workaround via XMLHttpRequest. However, none of the workarounds detailed there solved my issue.
Attached is the code I am using to both obtain the Excel file from the backend and serve it to the user. I am including both an XMLHttpRequest and an Angular http call (within comments), since those are the two main ways I have been trying to make this work. Please take into account that the code has been altered to remove information I do not wish to make public.
download(body) {
let reply = Observable.create(observer => {
let xhr = new XMLHttpRequest();
xhr.open('POST', 'URL', true);
xhr.setRequestHeader('Content-type', 'application/json;charset=UTF-8');
xhr.setRequestHeader('Accept', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
xhr.setRequestHeader('Authorization', 'REDACTED');
xhr.responseType = 'blob';
xhr.onreadystatechange = function () {
if(xhr.readyState === 4) {
if(xhr.status === 200) {
var contentType = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';
var blob = new Blob([xhr.response], { type: contentType });
observer.next(blob);
observer.complete();
}
else {
observer.error(xhr.response);
}
}
}
xhr.send(JSON.stringify(body));
});
return reply;
/*let headers = new Headers();
headers.set("Authorization", 'REDACTED');
headers.set("Accept", 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
let requestOptions :RequestOptions = new RequestOptions({headers: headers, responseType: ResponseContentType.Blob});
return this.http.post('URL', body, requestOptions);*/
}
Here is the code that prompts the user to download the Excel file. It is currently made to work with the XMLHttpRequest version. Please note that I have also attempted to download without resorting to FileSaver, with no luck.
downloadExcel(data) {
let body = {
/*REDACTED*/
}
this.service.download(body)
.subscribe(data => {
FileSaver.saveAs(data, "Excel.xlsx");
});
}
Here are the versions of the tools I am using:
NPM: 5.6.0
NodeJs: 8.11.3
Angular: ^6.1.0
Browsers used: Chrome, Firefox, Edge.
Any help on this issue would be appreciated. Any additional information you may need I will be happy to provide.
I think what you want is the CSV format, which opens in Excel. Update your service as follows:
You should tell Angular you are expecting a response of type blob (Binary Large Object), which is your Excel/CSV file.
Also make sure the URL/API on your server is set to accept content-type='text/csv'.
Here's an example with Angular 2:
@Injectable()
export class YourService {
constructor(private http: Http) {}
download() { //get file from the server
this.http.get("http://localhost/..", {
responseType: ResponseContentType.Blob,
headers: new Headers({'Content-Type': 'text/csv'})
}).subscribe(
response => {
var blob = new Blob([response.blob()], {type: 'text/csv'});
FileSaver.saveAs(blob, 'yourFileName.csv');
},
error => {
console.error('something went wrong');
}
);
}
}
Have you tried uploading/downloading your xls file as base64?
var encodedXLSToUpload = 'data:application/xls;base64,' + btoa(file);
Check this for more details: Creating a Blob from a base64 string in JavaScript

Read from external file into variable for this.response.speak

I have the following intent for an Alexa Skill and I need to read a .txt file from an external URL into a variable for Alexa to say it. This is what I have so far...
'PlayVoice': function() {
var url = "https://example.com/myfile.txt";
var output = 'Okay, here is the text file' + url;
this.response.speak(output);
this.emit(':responseReady');
},
Obviously, the only thing it does now is to read the actual URL.
I have tried using fs.readFile but I just get an error in the Alexa Skill. This is the code I tried:
'PlayVoice': function() {
var content;
fs.readFile('https://example.com/myfile.txt', function read(err, data) {
content = data;
this.response.speak(content);
}
this.emit(':responseReady');
},
Any help on how to simply read a text file into a variable I can get Alexa to speak via this.response.speak?
You can use the request package.
Something like this should help:
var request = require('request');
request('url/of/the/file', function (error, response, body) {
console.log('error:', error); // Print the error if one occurred
console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
console.log('body:', body); // contents of your file.
});
source : https://www.npmjs.com/package/request#super-simple-to-use
Also you'll need to add the package request to your skill's lambda.
To do that install the request package in the folder where your code is (lambda_function.js and all other files). Then create a zip of all the files (not the folder in which your files are) and upload it to your aws lambda.
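The snippet above only logs the file contents. A minimal sketch of wiring the download into the intent is shown below, assuming the alexa-sdk v1 handler style from the question. The helper name fetchText is hypothetical, and Node's built-in http/https modules are swapped in for the request package so that nothing extra has to be bundled. The key point is that speak and emit are called inside the callback, once the text has arrived, rather than right after starting the download.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: download a small text resource and resolve with its body.
function fetchText(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https:') ? https : http;
    client.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status ' + res.statusCode));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Handler in the style used in the question. Arrow functions keep `this`
// pointing at the handler context inside the promise callbacks.
const handlers = {
  'PlayVoice': function () {
    fetchText('https://example.com/myfile.txt')
      .then((text) => {
        this.response.speak('Okay, here is the text file. ' + text);
        this.emit(':responseReady');
      })
      .catch(() => {
        this.response.speak('Sorry, I could not read the file.');
        this.emit(':responseReady');
      });
  }
};
```

Emitting ':responseReady' only after the download finishes is what the fs.readFile attempt in the question was missing, independently of the fact that fs.readFile cannot read from a URL.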

Uploading file to openstack object storage from JavaScript

I have an OpenStack object storage container to which I'm trying to upload files directly from the browser.
As per the documentation here, I can upload the file using a PUT request, and I'm doing this using the $http.put method provided by AngularJS, as shown below.
$http.put(temporaryUploadUrl,
formData,
{
headers: {
'Content-Type': undefined
}
}
).then(function (res) {
console.log("Success");
});
The file uploads successfully, has no authentication problems, and gives me a 201 Created response. However, the file now contains junk lines at the top and bottom because it's a multipart request sent using FormData().
Sample file content before upload:
Some sample text
here is more text
here is some other text
File content after downloading it back from the OpenStack container:
------WebKitFormBoundaryTEeQZVW5hNSWtIqS
Content-Disposition: form-data; name="file"; filename="c.txt"
Content-Type: text/plain
Some sample text
here is more text
here is some other text
------WebKitFormBoundaryTEeQZVW5hNSWtIqS--
I tried using FileReader to read the selected file as a binary string and wrote the content to the request body instead of FormData. This works fine for text files but not for binary files like XLSX or PDF; the data is entirely corrupted this way.
EDIT:
The following answer is now considered a less performant workaround, as it encodes the entire file as base64 multipart form data. I would suggest going with @georgeawg's answer if you are not looking for a FormData + POST solution.
Openstack also provides a different approach using FormData for uploading one or more files in a single go as mentioned in this documentation. Funny this was never visible in google search.
Here is a brief of it.
First you need to generate a signature similar to tempUrl signature using the following python procedure.
import hmac
from hashlib import sha1
from time import time
path = '/v1/my_account/container/object_prefix'
redirect = 'https://myserver.com/some-page'
max_file_size = 104857600
max_file_count = 1
expires = 1503124957
key = 'mySecretKey'
hmac_body = '%s\n%s\n%s\n%s\n%s' % (path, redirect,
max_file_size, max_file_count, expires)
# encode() makes this run on Python 3 as well, where hmac.new requires bytes
signature = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
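If the signing service is written in Node.js rather than Python, the same computation can be sketched with the built-in crypto module. This is only an illustration under that assumption: the function name formPostSignature is hypothetical, and the values are the sample ones from the Python snippet. The secret key should stay on the server; only the resulting signature is handed to the browser.

```javascript
const crypto = require('crypto');

// Builds the newline-separated body and signs it with HMAC-SHA1,
// mirroring the Python procedure above.
function formPostSignature(opts) {
  const hmacBody = [
    opts.path,
    opts.redirect,
    opts.maxFileSize,
    opts.maxFileCount,
    opts.expires
  ].join('\n');
  return crypto.createHmac('sha1', opts.key).update(hmacBody).digest('hex');
}

const signature = formPostSignature({
  path: '/v1/my_account/container/object_prefix',
  redirect: 'https://myserver.com/some-page',
  maxFileSize: 104857600,
  maxFileCount: 1,
  expires: 1503124957,
  key: 'mySecretKey'
});

console.log(signature); // 40 hexadecimal characters
```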
Then in your javascript call post to the container like this.
var formData = new FormData();
formData.append("max_file_size", '104857600');
formData.append("max_file_count", '1');
formData.append("expires", '1503124957');
formData.append("signature", signature);
formData.append("redirect", redirect);
formData.append("file",fileObject);
$http.post(
"https://www.example.com/v1/my_account/container/object_prefix",
formData,
{
headers: {'Content-Type': undefined},
transformRequest: angular.identity
}
).then(function (res) {
console.log(res);
});
Points to note:
The formData in the POST request should contain only these parameters.
The file entry in the formData should be the last one. (Not sure why it doesn't work the other way around.)
The formData content, such as the path with prefix, epoch time, max file size, max file count and the redirect URL, should be the same as the values used to generate the signature. Otherwise you will get a 401 Unauthorized.
I tried using FileReader to read the selected file as a binary string and wrote the content to the request body instead of FormData. This works fine for text files but not for binary files like XLSX or PDF; the data is entirely corrupted this way.
The default operation for the $http service is to use Content-Type: application/json and to transform objects to JSON strings. For files from a FileList, the defaults need to be overridden:
var config = { headers: {'Content-Type': undefined} };
$http.put(url, fileList[0], config)
.then(function(response) {
console.log("Success");
}).catch(function(response) {
console.log("Error: ", response.status);
throw response;
});
By setting Content-Type: undefined, the XHR send method will automatically set the content type header appropriately.
Be aware that the base64 encoding of 'Content-Type': multipart/form-data adds 33% extra overhead. It is more efficient to send Blobs and File objects directly.
Sending binary data as binary strings will corrupt the data, because the XHR API converts strings from DOMString (UTF-16) to UTF-8. Avoid binary strings, as they are non-standard and obsolete.
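The corruption described above can be reproduced without a browser. The sketch below is only an illustration: the six sample bytes are made up, and TextEncoder stands in for the UTF-8 conversion that XHR applies to string bodies. Bytes below 0x80 survive, while every byte from 0x80 upwards turns into two bytes.

```javascript
// Six sample bytes; the last two are outside the ASCII range.
const original = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0x80]);

// What FileReader.readAsBinaryString would hand back: one character per byte.
const binaryString = String.fromCharCode(...original);

// What ends up on the wire when that string is sent as the request body.
const sentAsString = new TextEncoder().encode(binaryString);

console.log(original.length);     // 6
console.log(sentAsString.length); // 8, because 0xff and 0x80 became two bytes each
```

Passing the File or Blob object itself to $http.put, as in the snippet above, avoids the string conversion entirely.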

AngularJS GET receives empty reply in Chrome but not in Fiddler

I'm implementing file download using AngularJS and WCF. My back-end is a .NET project hosted in IIS. The file is serialized as an array of bytes and then on the client side I utilize the File API to save the content.
To simplify the problem, back-end is like:
[WebInvoke(Method = "GET", UriTemplate = "FileService?path={path}")]
[OperationContract]
public byte[] DownloadFileBaseOnPath(string path)
{
using (var memoryStream = new MemoryStream())
{
var fileStream = File.OpenRead(path);
fileStream.CopyTo(memoryStream);
fileStream.Close();
WebOperationContext.Current.OutgoingResponse.Headers["Content-Disposition"] = "attachment; filename=\"Whatever\"";
WebOperationContext.Current.OutgoingResponse.ContentType = "application/octet-stream"; // treat all files as binary file
return memoryStream.ToArray();
}
}
And on the client side, it just sends a GET request to get those bytes, converts them into a blob and saves it.
function sendGetReq(url, config) {
return $http.get(url, config).then(function(response) {
return response.data;
});
}
Save the file then:
function SaveFile(url) {
var downloadRequest = sendGetReq(url);
downloadRequest.then(function(data){
var aLink = document.createElement('a');
var byteArray = new Uint8Array(data);
var blob = new Blob([byteArray], { type: 'application/octet-stream'});
var downloadUrl = URL.createObjectURL(blob);
aLink.setAttribute('href', downloadUrl);
aLink.setAttribute('download', fileNameDoesNotMatter);
if (document.createEvent) {
var event = document.createEvent('MouseEvents');
event.initEvent('click', false, false);
aLink.dispatchEvent(event);
}
else {
aLink.click();
}
setTimeout(function () {
URL.revokeObjectURL(downloadUrl);
}, 1000); // cleanup
});
}
This approach works fine with small files; I could successfully download files up to 64MB. But when I try to download a file larger than 64MB, response.data is empty in Chrome. I also used Fiddler to capture the traffic. According to Fiddler, the back-end successfully serialized the byte array and returned it.
[Screenshot: Fiddler capture of the attempt to download a 70MB file]
[Screenshot: the empty response.data in Chrome]
Any idea why this is empty for files over 70MB? Though the response itself is more than 200MB, I do have enough memory for that.
Regarding the WCF back-end, I know I should use stream mode when it comes to large files. But the typical use of my application is to download files smaller than 10MB, so I hope to figure this out first.
Thanks
Answer my own question.
Honestly, I don't know what's going wrong. The issue still persists if I transfer the file as a byte array. I eventually gave up on that approach and returned a stream instead. Then, on the client side, I added the following configuration
{responseType: 'blob'}
and saved the response as a blob.
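One thing the numbers in the question hint at is the cost of the byte-array representation: a 70MB file produced a response of more than 200MB. The sketch below is a rough illustration of that inflation, assuming the byte[] is sent as a JSON array of numbers (the question does not state the wire format explicitly). It does not explain the empty response in Chrome, but it shows why returning a raw stream and reading it as a blob is much lighter.

```javascript
// 1 MiB of sample data cycling through all byte values.
const bytes = new Uint8Array(1024 * 1024);
for (let i = 0; i < bytes.length; i++) {
  bytes[i] = i % 256;
}

// The same data written out as a JSON array: "[0,1,2,...]".
const asJson = JSON.stringify(Array.from(bytes));

// Each byte costs up to three digits plus a comma.
const ratio = asJson.length / bytes.length;
console.log(ratio.toFixed(2)); // roughly 3.5 times the raw size
```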

Downloaded .pdf files are corrupted when using expressjs

I am working on meanjs application generated using https://github.com/DaftMonk/generator-angular-fullstack. I am trying to generate a .pdf file using phantomjs and download it to the browser.
The issue is that the downloaded .pdf file always shows blank pages, regardless of the number of pages. The original file on the server is not corrupt. When I investigated further, I found that the downloaded file is always much larger than the original file on disk. This issue happens only with .pdf files; other file types work fine.
I've tried several methods like res.redirect('http://localhost:9000/assets/exports/receipt.pdf');, res.download('client\\assets\\exports\\receipt.pdf'),
var fileSystem = require('fs');
var stat = fileSystem.statSync('client\\assets\\exports\\receipt.pdf');
res.writeHead(200, {
'Content-Type': 'application/pdf',
'Content-Length': stat.size
});
var readStream = fileSystem.createReadStream('client\\assets\\exports\\receipt.pdf');
return readStream.pipe(res);
and I've even tried https://github.com/expressjs/serve-static, with no change in the result.
I am new to nodejs. What is the best way to download a .pdf file to the browser?
Update:
I am running this on a Windows 8.1 64bit Computer
I had corruption when serving static pdfs too. I tried everything suggested above. Then I found this:
https://github.com/intesso/connect-livereload/issues/39
In essence the usually excellent connect-livereload (package ~0.4.0) was corrupting the pdf.
So just get it to ignore pdfs via:
app.use(require('connect-livereload')({ignore: ['.pdf']}));
now this works:
app.use('/pdf', express.static(path.join(config.root, 'content/files')));
...great relief.
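The question mentions that the downloaded file is always much larger than the original. One plausible mechanism, stated here as an assumption rather than as what connect-livereload actually does internally, is that a middleware treats the response body as text: bytes that are not valid UTF-8 are replaced on decoding, and each replacement character takes three bytes when written back out. A small sketch with made-up sample bytes:

```javascript
// "%PDF" followed by two bytes that can never appear in valid UTF-8.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xff, 0xfe]);

// Decoding as text replaces each invalid byte with U+FFFD.
const asText = original.toString('utf8');

// Encoding the text again does not bring the original bytes back.
const roundTripped = Buffer.from(asText, 'utf8');

console.log(original.length);               // 6
console.log(roundTripped.length);           // 10
console.log(roundTripped.equals(original)); // false
```

This would be consistent with blank pages too, since the compressed page streams in a PDF are binary while the surrounding structure is mostly ASCII.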
Here is a clean way to serve a file from Express, using an attachment header to make sure the file is downloaded:
var fs = require('fs');
var path = require('path');
var mime = require('mime');
app.get('/download', function(req, res){
// Here do whatever you need to get your file; `file` is assumed to hold its path
var filename = path.basename(file);
var mimetype = mime.lookup(file); // mime 1.x; in mime 2.x and later this call is mime.getType(file)
res.setHeader('Content-disposition', 'attachment; filename=' + filename);
res.setHeader('Content-type', mimetype);
var filestream = fs.createReadStream(file);
filestream.pipe(res);
});
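The header in the snippet above concatenates the file name unquoted, which can break for names containing spaces or non-ASCII characters. A minimal sketch of a more defensive header value follows; the function name attachmentHeader is hypothetical, and it combines a quoted ASCII fallback with the extended filename* form. Express also offers res.attachment(filename) and res.download(path), which may be simpler if they fit your case.

```javascript
// Builds a Content-Disposition value for a download.
function attachmentHeader(filename) {
  // Quoted fallback: non-ASCII characters become "_", quotes and backslashes are escaped.
  const fallback = filename
    .replace(/[^\x20-\x7e]/g, '_')
    .replace(/["\\]/g, '\\$&');
  // Extended form: percent-encoded UTF-8, including the few characters
  // that encodeURIComponent leaves untouched.
  const encoded = encodeURIComponent(filename)
    .replace(/['()*]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
  return 'attachment; filename="' + fallback + '"; filename*=UTF-8\'\'' + encoded;
}

console.log(attachmentHeader('my report.pdf'));
```

It would be used as res.setHeader('Content-disposition', attachmentHeader(filename)) in the handler above.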
There are a couple of ways to do this:
If the file is a static one like brochure, readme etc, then you can tell express that my folder has static files (and should be available directly) and keep the file there. This is done using static middleware:
app.use(express.static(pathtofile));
Here is the link: http://expressjs.com/starter/static-files.html
Now you can directly open the file using the url from the browser like:
window.open('http://localhost:9000/assets/exports/receipt.pdf');
or
res.redirect('http://localhost:9000/assets/exports/receipt.pdf');
should be working.
The second way is to read the file yourself; the data arrives as a Buffer. It should be recognised if you send it directly, but you can try converting it to base64 using:
var base64String = buf.toString('base64');
then set the content type :
res.writeHead(200, {
'Content-Type': 'application/pdf',
'Content-Length': stat.size
});
and send the data as response.
I will try to put an example of this.
EDIT: You don't even need to encode it. You may still try that, but I was able to make it work without any encoding.
You also do not need to set the headers; Express does it for you. The following is a snippet of API code written to serve the pdf in case it is not public/static. You need an API to serve the pdf:
router.get('/viz.pdf', function(req, res){
require('fs').readFile('viz.pdf', function(err, data){
res.send(data);
})
});
Lastly, note that the url for getting the pdf has extension pdf to it, this is for browser to recognise that the incoming file is pdf. Otherwise it will save the file without any extension.
Usually if you are using phantom to generate a pdf then the file will be written to disc and you have to supply the path and a callback to the render function.
router.get('/pdf', function(req, res){
// phantom initialization and generation logic
// supposing you have the generation code above
page.render(filePath, function (err) {
var filename = 'myFile.pdf';
res.setHeader('Content-type', "application/pdf");
fs.readFile(filePath, function (err, data) {
// if the file was read into the buffer without errors you can delete it to save space
if (err) throw err;
fs.unlink(filePath, function () {}); // recent Node versions require a callback here
// send the file contents
res.send(data);
});
});
});
I don't have experience of the frameworks that you have mentioned but I would recommend using a tool like Fiddler to see what is going on. For example you may not need to add a content-length header since you are streaming and your framework does chunked transfer encoding etc.
