Non UTF-8 characters in PDF Javascript Blob - angularjs

I have a PDF file that I serve from a WebApi 2 application to an AngularJS client. I use file-saver to then save the file on the client as follows (in TypeScript):
this.$http.get(`${webUrl}api/pdf?id=${fileDto.id}`)
.then((response: ng.IHttpPromiseCallbackArg<any>) => {
var file = new Blob([response.data], { type: 'application/pdf' });
saveAs(file, 'my.pdf');
});
The reason I do this is so that I can use a bearer token to authorize access to the PDF (this is added via an interceptor). This works except for when the PDF file contains non UTF8 characters. In this latter case the file still downloads, but when I open it, it appears blank. Opening the file I can see that the non UTF8 characters are replaced with a □ character. In the JavaScript when I inspect the string value of response.data in the debugger I see that those characters are represented by �. Am I right in assuming that, since the file has been written from a string in JavaScript, no matter what I do I will not be able to correctly save a file with non UTF8 characters from JavaScript?

The � character is the Unicode Replacement Character \uFFFD which is inserted by the UTF-8 parser when it tries to parse illegal UTF-8.
PDF files are not UTF-8 strings; they are binary files.
To avoid the conversion from UTF-8 to DOMstring (UTF-16), set the config to responseType: 'blob':
var config = {responseType: 'blob'};
this.$http.get(`${webUrl}api/pdf?id=${fileDto.id}`, config)
.then((response: ng.IHttpPromiseCallbackArg<any>) => {
̶v̶a̶r̶ ̶f̶i̶l̶e̶ ̶=̶ ̶n̶e̶w̶ ̶B̶l̶o̶b̶(̶[̶r̶e̶s̶p̶o̶n̶s̶e̶.̶d̶a̶t̶a̶]̶,̶ ̶{̶ ̶t̶y̶p̶e̶:̶ ̶'̶a̶p̶p̶l̶i̶c̶a̶t̶i̶o̶n̶/̶p̶d̶f̶'̶ ̶}̶)̶;
var file = response.data;
saveAs(file, 'my.pdf');
});
For more information, see
MDN Web API Reference - XHR ResponseType
AngularJS $http Service API Reference

Related

Send .mat file through Django Rest Framework

I have an issue to send the contents of a .mat file to my frontend. My end goal is to allow clients to download the content of this .mat file at the click of a button so that they end up with the same file in their possession. I use Next.js + Django Rest Framework.
My first try was as follow:
class Download(APIView):
def get(self, request):
with open('file_path.mat', 'rb') as FID:
fileInstance = FID.read()
return Response(
fileInstance,
status=200,
content_type="application/octet-stream",
)
If I print out the fileInstance element I get some binary results:
z\xe1\xfe\xc6\xc6\xd2\x1e_\xda~\xda|\xbf\xb6\x10_\x84\xb5~\xfe\x98\x1e\xdc\x0f\x1a\xee\xe7Y\x9e\xb5\xf5\x83\x9cS\xb3\xb5\xd4\xb7~XK\xaa\xe3\x9c\xed\x07v\xf59Kbn(\x91\x0e\xdb\xbb\xe8\xf5\xc3\xaa\x94Q\x9euQ\x1fx\x08\xf7\x15\x17\xac\xf4\x82\x19\x8e\xc9...
But I can't send it back to my frontend because of a
"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 137: invalid start byte"
This error is always the same regardless of which .mat file I try to send in my response.
Next I tried to use the scipy.io.loadmat() method. In this case, fileInstance gives me a much more readable dictionary object, but I still can't get it to transfer to the frontend because of the presence of NaN in my dict:
ValueError: Out of range float values are not JSON compliant
Finally, some suggested to use h5py to send back the data as such:
with h5py.File('file_path.mat', 'r') as fileInstance:
print(fileInstance)
But in that case the error I get is
Unable to open file (file signature not found)
I know my files are not corrupted because I can open them in Matlab with no problem.
With all this trouble I'm wondering if I'm using the right approach to this problem. I could technically send the dictionary obtained through 'scipy.io.loadmat()' as a str element instead of binary, but I'll have to figure out a way to convert this text back to binary inside a Javascript function. Would anybody have some ideas as to how I should proceed?
The problem was in my frontend after all. Still, here's the correct way to go about it:
class Download(APIView):
parser_classes = [FormParser, MultiPartParser]
def get(self, request):
try:
file_path = "xyz.mat"
response = FileResponse(file_path.open("rb"), content_type="application/octet-stream")
response["Content-Disposition"] = f"attachment; filename=file_name"
return response
except Exception as e:
return Response(status=500)
This should send to the frontend the right file in the right format. No need to worry about encoding and such.
Meanwhile, on the frontend you should receive the file as follows:
onClick={() => {
const url = '/url_to_your_api/';
axios({ method: 'get', url: url, responseType: 'blob' })
.then((response) => {
const { data } = response;
const fileName = 'file_name';
const blob = new Blob([data], { type: 'application/octet-stream' });
const href = URL.createObjectURL(blob);
const link = document.createElement('a');
link.href = href;
link.download = fileName + '.mat';
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
URL.revokeObjectURL(href);
})
.catch((response) => {
console.error(response);
});
}}
Long story short, the part I was missing was to specify to receive the data as blob inside the 'onClick()' function. By default, responseType from Axios is set to Json/String. For that reason, my file was modified at reception and would not be usable in matlab afterwards. If you face a similar problem in the future, try to use the 'shasum' BASH function to observe the hashed value of the file. It is with the help of that function that I could deduce that my API function would return the correct value and that therefore the problem was happenign on the frontend.

Uploading file to openstack object storage from JavaScript

I have a openstack object storage container to which I'm trying to upload files directly from browser.
As per the documentation here, I can upload the file using a PUT request and I'm doing this using Angularjs provided $http.put method as shown below.
$http.put(temporaryUploadUrl,
formData,
{
headers: {
'Content-Type': undefined
}
}
).then(function (res) {
console.log("Success");
});
The file uploads successfully and it has no problems in authentication and gives me a 201 Created response. However the file is now containing junk lines on the top and bottom of it because its a multipart request sent using FormData().
Sample file content before upload:
Some sample text
here is more text
here is some other text
File content after downloadiong back from openstack container :
------WebKitFormBoundaryTEeQZVW5hNSWtIqS
Content-Disposition: form-data; name="file"; filename="c.txt"
Content-Type: text/plain
Some sample text
here is more text
here is some other text
------WebKitFormBoundaryTEeQZVW5hNSWtIqS--
I tried the FileReader to read the selected file as a binary string and wrote the content to the request body instead of FormData and the request which works fine for text files but not the binary files like XLSX or PDF The data is entirely corrupted this way.
EDIT:
The following answer is now considered a less performing workaround As
it will encode the entire file to base64 multipart form data. I would
suggest go ahead with #georgeawg's Answer if you are not Looking for a
formData + POST solution
Openstack also provides a different approach using FormData for uploading one or more files in a single go as mentioned in this documentation. Funny this was never visible in google search.
Here is a brief of it.
First you need to generate a signature similar to tempUrl signature using the following python procedure.
import hmac
from hashlib import sha1
from time import time
path = '/v1/my_account/container/object_prefix'
redirect = 'https://myserver.com/some-page'
max_file_size = 104857600
max_file_count = 1
expires = 1503124957
key = 'mySecretKey'
hmac_body = '%s\n%s\n%s\n%s\n%s' % (path, redirect,
max_file_size, max_file_count, expires)
signature = hmac.new(key, hmac_body, sha1).hexdigest()
Then in your javascript call post to the container like this.
var formData = new FormData();
formData.append("max_file_size", '104857600');
formData.append("max_file_count", '1');
formData.append("expires", '1503124957');
formData.append("signature", signature);
formData.append("redirect", redirect);
formData.append("file",fileObject);
$http.post(
"https://www.example.com/v1/my_account/container/object_prefix",
formData,
{
headers: {'Content-Type': undefined},
transformRequest: angular.identity
}
).then(function (res) {
console.log(response);
});
Points to note.
The formData in POST request should contain only these
parameters.
The file entry in the formData should be the last one.(Not sure why
it doesnt work the other way around).
The formData content like path with prefix, epoch time, max file
size, max file count and the redirection urls should be the same as
the one which were used to generate the signature. Otherwise you will
get a 401 Unauthorized.
I tried the FileReader to read the selected file as a binary string and wrote the content to the request body instead of FormData and the request which works fine for text files but not the binary files like XLSX or PDF The data is entirely corrupted this way.
The default operation for the $http service is to use Content-Type: application/json and to transform objects to JSON strings. For files from a FileList, the defaults need to be overridden:
var config = { headers: {'Content-Type': undefined} };
$http.put(url, fileList[0], config)
.then(function(response) {
console.log("Success");
}).catch(function(response) {
console.log("Error: ", response.status);
throw response;
});
By setting Content-Type: undefined, the XHR send method will automatically set the content type header appropriately.
Be aware that the base64 encoding of 'Content-Type': multipart/form-data adds 33% extra overhead. It is more efficient to send Blobs and File objects directly.
Sending binary data as binary strings, will corrupt the data because the XHR API converts strings from DOMSTRING (UTF-16) to UTF-8. Avoid binary strings as they are non-standard and obsolete.

Angular http.get charset

I'm having some trouble getting my angular application to parse my json-data correctly.
When my json-feed contains e.g. { "title": "Halldórsson Pourié" }
my application shows Halld�rsson Pouri�
I figure the problem is the charset, but i can't find where to change it.
Currently i am using ng-bind-html and using $sce.trustAsHtml(), and I'm very sure the problem occurs when $http.get(url) parses my json.
So how do i tell $http.get(url) to parse the data with a specific charset?
I had a similar issue and resolved it by using:
encodeURIComponent(JSON.stringify(p_Query))
where p_Query is a JSON containing the details of the request (i.e. your { "title": "Halldórsson Pourié" }).
EDIT:
You may also need to add to the header of your GET request the following:
'Content-Type': 'application/x-www-form-urlencoded ; charset=UTF-8'
Had same problem with accented characters and some scientific notation in text/JSON fields and found that AngularJS (or whatever native JavaScript XHR/fetch functions it is using) was flattening everything to UTF-8 no matter what else we tried.
The other respondent here claimed that UTF-8 should somehow still be accommodating your extended character set, but we found otherwise: taking samples of the source data directly, we loaded it into text editor with different encodings and UTF-8 would still squash extended characters to that � placeholder.
In our case, the encoding was ISO-8859-15 but you may be sufficient with ISO-8859-1.
Try adding this $http configuration to your $http.get() call, accordingly:
$http.get(url, {
responseType: 'arraybuffer',
transformResponse: function(data) {
let textDecoder = new TextDecoder('ISO-8859-15'); // your encoding may vary!
return JSON.parse(textDecoder.decode(data));
}
});
This should intercept AngularJS default response transformation and instead apply the function here. You will, of course, want to consider adding some error handling, etc.
If there are better ways to do this without a custom transformResponse function, I have yet to find anything in the AngularJS documentation.
$http.get(url, {responseType: 'arraybuffer', }).then(function (response) { var textDecoder = new TextDecoder('ISO-8859-1'); console.log(textDecoder.decode(response.data));});
This code will replace all your � placeholders into normal characters.

How to set content disposition character encoding in ng-file-upload

I'm trying to upload a file with an em/en dash '–' character in its name (encoded in utf8). The character is rendered correctly in the browser request (looking at chromes network monitor). But no encoding information is sent with the request to the server, the character is then interpreted as 'â' (code page 28591, iso-8859-1) by the server. I've come to understand that using
filename*=UTF-8''myFileWith–In.pdf
may work but can't seem to manipulate the filename attribute in a suitable way, for example not using double quotes. Source
My angular code looks something like this, is it possible from here to add encoding information to the content disposition?
function uploadFile(file, url)
{
$upload.upload({
url: url,
method: 'POST',
file: file,
});
}
On the server I can use the following code to correct the encoding but it assumes that all incoming traffic was encoded in utf8, there needs to be a way of identifying the encoding in the request.
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] isoBytes = iso.GetBytes(file.FileName);
string msg = utf8.GetString(isoBytes);
You can use fileName option of upload to set the content-disposition's filename. See the readme reference for explanation.
function uploadFile(file, url)
{
$upload.upload({
url: url,
method: 'POST',
file: file,
fileName: 'UTF-8\'\'' + file.name
});
}

Downloaded .pdf files are corrupted when using expressjs

I am working on meanjs application generated using https://github.com/DaftMonk/generator-angular-fullstack. I am trying to generate a .pdf file using phantomjs and download it to the browser.
The issue is that the downloaded .pdf file always shows the blank pages regardless of the number of pages. The original file on server is not corrupt. When I investigated further, found that the downloaded file is always much larger than the original file on the disk. Also this issue happens only with .pdf files. Other file types are working fine.
I've tried several methods like res.redirect('http://localhost:9000/assets/exports/receipt.pdf');, res.download('client\\assets\\exports\\receipt.pdf'),
var fileSystem = require('fs');
var stat = fileSystem.statSync('client\\assets\\exports\\receipt.pdf');
res.writeHead(200, {
'Content-Type': 'application/pdf',
'Content-Length': stat.size
});
var readStream = fileSystem.createReadStream('client\\assets\\exports\\receipt.pdf');
return readStream.pipe(res);
and even I've tried with https://github.com/expressjs/serve-static with no changes in the result.
I am new to nodejs. What is the best way to download a .pdf file to the browser?
Update:
I am running this on a Windows 8.1 64bit Computer
I had corruption when serving static pdfs too. I tried everything suggested above. Then I found this:
https://github.com/intesso/connect-livereload/issues/39
In essence the usually excellent connect-livereload (package ~0.4.0) was corrupting the pdf.
So just get it to ignore pdfs via:
app.use(require('connect-livereload')({ignore: ['.pdf']}));
now this works:
app.use('/pdf', express.static(path.join(config.root, 'content/files')));
...great relief.
Here is a clean way to serve a file from express, and uses an attachment header to make sure the file is downloaded :
var path = require('path');
var mime = require('mime');
app.get('/download', function(req, res){
//Here do whatever you need to get your file
var filename = path.basename(file);
var mimetype = mime.lookup(file);
res.setHeader('Content-disposition', 'attachment; filename=' + filename);
res.setHeader('Content-type', mimetype);
var filestream = fs.createReadStream(file);
filestream.pipe(res);
});
There are a couple of ways to do this:
If the file is a static one like brochure, readme etc, then you can tell express that my folder has static files (and should be available directly) and keep the file there. This is done using static middleware:
app.use(express.static(pathtofile));
Here is the link: http://expressjs.com/starter/static-files.html
Now you can directly open the file using the url from the browser like:
window.open('http://localhost:9000/assets/exports/receipt.pdf');
or
res.redirect('http://localhost:9000/assets/exports/receipt.pdf');
should be working.
Second way is to read the file, the data must be coming as a buffer. Actually, it should be recognised if you send it directly, but you can try converting it to base64 encoding using:
var base64String = buf.toString('base64');
then set the content type :
res.writeHead(200, {
'Content-Type': 'application/pdf',
'Content-Length': stat.size
});
and send the data as response.
I will try to put an example of this.
EDIT: You dont even need to encode it. You may try that still. But I was able to make it work without even encoding it.
Plus you also do not need to set the headers. Express does it for you. Following is the Snippet of API code written to get the pdf in case it is not public/static. You need API to serve the pdf:
router.get('/viz.pdf', function(req, res){
require('fs').readFile('viz.pdf', function(err, data){
res.send(data);
})
});
Lastly, note that the url for getting the pdf has extension pdf to it, this is for browser to recognise that the incoming file is pdf. Otherwise it will save the file without any extension.
Usually if you are using phantom to generate a pdf then the file will be written to disc and you have to supply the path and a callback to the render function.
router.get('/pdf', function(req, res){
// phantom initialization and generation logic
// supposing you have the generation code above
page.render(filePath, function (err) {
var filename = 'myFile.pdf';
res.setHeader('Content-type', "application/pdf");
fs.readFile(filePath, function (err, data) {
// if the file was readed to buffer without errors you can delete it to save space
if (err) throw err;
fs.unlink(filePath);
// send the file contents
res.send(data);
});
});
});
I don't have experience of the frameworks that you have mentioned but I would recommend using a tool like Fiddler to see what is going on. For example you may not need to add a content-length header since you are streaming and your framework does chunked transfer encoding etc.

Resources