Decoding and decompressing AI9_DataStream within .eps files - zlib

Context: I am attempting to automate the inspection of EPS files to detect a list of attributes, such as whether the file contains locked layers, embedded bitmap images, etc.
So far we have found that some of these things can be detected by inspecting the raw EPS file data and its accompanying metadata (similar to the information returned by ImageMagick). However, it seems that in files created by Illustrator 9 and above, the vast majority of this information is encoded within the "AI9_DataStream" portion of the file. This data is ASCII85 encoded and compressed. We have had some success in getting at this data by using https://github.com/huandu/node-ascii85 to decode and Node's zlib library to decompress/unzip. (Our project is written in Node/JavaScript.) However, in roughly half of our test cases/files the unzipping portion fails, throwing Z_DATA_ERROR / "incorrect data check".
Our method responsible for trying to decode:
import zlib from 'zlib';
import ascii85 from 'ascii85';

export const decode = eps =>
  new Promise((resolve, reject) => {
    const lineDelimiters = /\r\n%|\r%|\n%/g;
    const internal = eps.match(
      /(%AI9_DataStream)([\s\S]*?)(AI9_PrivateDataEnd)/
    );
    const hasDataStream = internal && internal.length >= 2;
    if (!hasDataStream) return resolve('');
    // Strip the leading % from continuation lines, then decode the ASCII85 block
    const encoded = internal[2].replace(lineDelimiters, '');
    const decoded = ascii85.decode(encoded);
    try {
      zlib.unzip(decoded, (err, buffer) => {
        // files can crash this process, for now we need to allow it
        if (err) resolve('');
        else resolve(buffer.toString('utf8'));
      });
    } catch (err) {
      reject(err);
    }
  });
I am wondering if anyone out there has had any experience with this issue and has some insight into what might be causing it, and whether there is an alternative avenue to explore for reliably decoding this data. Information on this topic seems a bit sparse, so anything that could get us going in the right direction would be very much appreciated.
Note: The buffers produced by the ASCII85 decoding all have the same 78 9c header, which should indicate standard zlib compression (and the data does in fact decompress into parseable output, without error, about half the time).
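(As an aside, a tiny helper to verify that header before inflating could look like the sketch below; it is illustrative only, not part of the method above.)

// Sketch: sanity-check the two-byte zlib header (0x78 0x9c) on a decoded buffer
// before handing it to zlib.unzip. `decoded` would be the Buffer returned by
// ascii85.decode in the method above.
const looksLikeZlib = decoded =>
  Buffer.isBuffer(decoded) &&
  decoded.length >= 2 &&
  decoded[0] === 0x78 &&
  decoded[1] === 0x9c;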

Apparently we were misreading something about the ASCII85 encoding. There is a ~> delimiter at the end of the encoded block that needs to be omitted from the string before decoding and subsequently unzipping.
So instead of:
/(%AI9_DataStream)([\s\S]*?)(AI9_PrivateDataEnd)/
Use:
/(%AI9_DataStream)([\s\S]*?)(~>)/
And you can get to the correct encoded/compressed data. So far this has produced human-readable/regexable data in all of our current test cases, so unless we are thrown another curve, that seems to be the answer.
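For reference, a sketch of the decode method above with the corrected end delimiter (not necessarily our exact final code) looks like this:

import zlib from 'zlib';
import ascii85 from 'ascii85';

export const decode = eps =>
  new Promise((resolve, reject) => {
    const lineDelimiters = /\r\n%|\r%|\n%/g;
    // Stop the match at the ~> terminator so the ASCII85 end-of-data marker
    // never reaches the decoder.
    const internal = eps.match(/(%AI9_DataStream)([\s\S]*?)(~>)/);
    if (!internal || !internal[2]) return resolve('');
    const encoded = internal[2].replace(lineDelimiters, '');
    const decoded = ascii85.decode(encoded);
    try {
      zlib.unzip(decoded, (err, buffer) => {
        // Treat files that still fail to inflate as "no data" for now.
        if (err) resolve('');
        else resolve(buffer.toString('utf8'));
      });
    } catch (err) {
      reject(err);
    }
  });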

The only reliable method for getting content from PostScript is to run it through a PostScript interpreter, because PostScript is a programming language.
If you stick to a specific workflow with well-understood input, then you may have some success with simple parsing, but that's about the only likely scenario that will work.
Note that EPS files don't have 'layers' and certainly don't have 'locked' layers.
You haven't actually pointed to a working example, but I suspect the content of the AI9_DataStream is not relevant to the EPS. It's probably a means for Illustrator to include its own native file format inside the EPS file without it affecting a PostScript interpreter. This is how it works with AI-produced PDF files.
This means that when you reopen the EPS file with Adobe Illustrator, it ignores the EPS and uses the embedded native file, which magically grants you the ability to edit the file, including features like layers which cannot be represented in the EPS.
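For illustration, running the EPS through Ghostscript (one common PostScript interpreter) from Node might look roughly like the sketch below; it assumes gs is installed and on the PATH, and only checks that the interpreter consumes the file cleanly:

const { execFile } = require('child_process');

// Sketch: hand the EPS to Ghostscript with a null output device, purely to let
// a real PostScript interpreter consume it. Assumes `gs` is available on PATH.
const interpretEps = filePath =>
  new Promise((resolve, reject) => {
    execFile(
      'gs',
      ['-dSAFER', '-dNOPAUSE', '-dBATCH', '-sDEVICE=nullpage', filePath],
      (err, stdout, stderr) => {
        if (err) reject(err); // PostScript errors surface here
        else resolve({ stdout, stderr });
      }
    );
  });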

Related

How To Upload A Large File (>6MB) To Salesforce Through A Lightning Component Using Apex Aura Methods

I am aiming to take a file a user attaches through a Lightning Component and create a document object containing the data.
So far I have overcome the request size limits by chunking the data being uploaded into 1MB chunks. When the Apex Aura method receives these chunks of data it will either create a new document (if it is the first chunk), or will retrieve the existing document and add the new chunk to the end.
Data is received Base64 encoded, and then decoded server-side.
As the document data is stored as a Blob, the original file contents will be read as a String, and then appended with the chunk received. The new contents are then converted back into a Blob to be stored within the ContentVersion object.
The problem I'm having is that strings in Apex have a maximum length of around 6,000,000 characters. Whenever the file size exceeds 6MB, this limit is hit during the concatenation, which causes the file upload to halt.
I have attempted to avoid this limit by converting the Blob to a String only when necessary for the concatenation (as suggested here: https://developer.salesforce.com/forums/?id=906F00000008w9hIAA), but this hasn't worked. I'm guessing it was patched, because it's still technically allocating a string larger than the limit.
The appending code is really simple so far:
// Fetch the existing document and append the newly received chunk
ContentVersion originalDocument = [SELECT Id, VersionData FROM ContentVersion WHERE Id =: <existing_file_id> LIMIT 1];
Blob originalData = originalDocument.VersionData;
Blob appendedData = EncodingUtil.base64Decode(<base_64_data_input>);
// The toString() concatenation is where the ~6,000,000-character limit is hit
Blob newData = Blob.valueOf(originalData.toString() + appendedData.toString());
originalDocument.VersionData = newData;
You will have a hard time with it.
You could try offloading the concatenation to an asynchronous process (@future/Queueable/Schedulable/Batchable); they get a 12 MB heap instead of 6. That could buy you some time.
You could try cheating by embedding an iframe (Visualforce or a lightning:container tag? Or maybe a "canvas app") that would grab your file and do some manual JavaScript magic, calling the normal REST API for document upload: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_sobject_insert_update_blob.htm (the last code snippet is about multiple documents). Maybe jsforce? A rough JavaScript sketch of that REST call follows below.
Can you upload it somewhere else (SharePoint? Heroku?) and have that system call into SF to push the files (no Apex = no heap size limit)? Or even look up "Files Connect".
Can you send an email with attachments? Crude, but if you write a custom Email-to-Case handler class you'll have a 36 MB heap.
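For illustration, a rough JavaScript sketch of that REST call (the instance URL, access token and API version are placeholders/assumptions, not values from the question):

// Sketch only: create a ContentVersion via the Salesforce REST API from
// JavaScript, avoiding Apex heap limits entirely.
async function uploadFile(instanceUrl, accessToken, fileName, base64Data) {
  const response = await fetch(
    `${instanceUrl}/services/data/v58.0/sobjects/ContentVersion`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        Title: fileName,
        PathOnClient: fileName,
        VersionData: base64Data // whole file as base64, no Apex concatenation
      })
    }
  );
  return response.json(); // { id, success, errors } on success
}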
You wrote "we needed multiple files to be uploaded and the multi-file-upload component provided doesn't support all extensions". That may be caused by these documented limitations:
In Experience Builder sites, the file size limits and types allowed follow the settings determined by site file moderation.
lightning-file-upload doesn't support uploading multiple files at once on Android devices.
If the "Don't allow HTML uploads as attachments or document records" security setting is enabled for your organization, the file uploader cannot be used to upload files with the following file extensions: .htm, .html, .htt, .htx, .mhtm, .mhtml, .shtm, .shtml, .acgi, .svg.

In TensorFlow.js, using Node (tfjs-node), is there any way to load the Universal Sentence Encoder (USE) from a local file?

I have a tensorflow.js script/app that runs in Node.js using tfjs-node and Universal Sentence Encoder (USE).
Each time the script runs, it downloads a 525 MB file (the USE model file).
Is there any way to load the Universal Sentence Encoder Model File from the local file system to avoid downloading such a large file every time I need to run the node.js tensorflow script?
I've noted several similar model-loading examples, but none that work with the Universal Sentence Encoder, as it does not appear to have the same type of functionality. Below is a stripped-down example of a functioning script that downloads the 525 MB file every time it executes.
Any help or recommendations would be appreciated.
const tf = require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');

// No form of Universal Sentence Encoder loader appears to be present
let model = tf.loadGraphModel('file:///Users/ray/Documents/tf_js_model_save_load2/models/model.json');

use.load().then(model => {
  const sentences = [
    'Hello.',
    'How are you?'
  ];
  model.embed(sentences).then(embeddings => {
    embeddings.print(true /* verbose */);
  });
});
I've tried several recommendations that appear to work for other models, but not for the Universal Sentence Encoder, such as:
const tf = require('@tensorflow/tfjs');
const tfnode = require('@tensorflow/tfjs-node');

async function loadModel() {
  const handler = tfnode.io.fileSystem('tfjs_model/model.json');
  const model = await tf.loadLayersModel(handler);
  console.log("Model loaded");
}
loadModel();
It's not a model issue per se, it's a module issue.
The model can be loaded any way you want, but the module @tensorflow-models/universal-sentence-encoder implements only one specific internal way of loading the actual model data.
Specifically, it internally uses tf.util.fetch.
The solution? Use some library (or write your own) to register a global fetch handler that knows how to handle file:// prefixes; if a global fetch handler exists, tf.util.fetch will simply use it.
hint: https://gist.github.com/joshua-gould/58e1b114a67127273eef239ec0af8989
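For example, a minimal hand-rolled handler could look like the sketch below (an assumption-laden sketch, not the gist above; it assumes Node 18+, where fetch and Response exist as globals):

// Sketch: register a global fetch that understands file:// URLs, so model
// loading that goes through tf.util.fetch can read files from local disk.
const fs = require('fs/promises');
const realFetch = global.fetch;

global.fetch = async (url, init) => {
  const href = typeof url === 'string' ? url : url.toString();
  if (href.startsWith('file://')) {
    const data = await fs.readFile(new URL(href)); // fs accepts file:// URLs
    return new Response(data);                     // serve the bytes as-is
  }
  return realFetch(url, init); // everything else goes over the network
};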

How to read in files with a specific file ending at compile time in Nim?

I am working on a desktop application using Nim's webgui package, which works somewhat like Electron in that it renders a GUI using HTML + CSS + JS. However, instead of bundling its own browser and having a backend in Node, it uses the browser supplied by the OS (Epiphany under Linux/GNOME, Edge under Windows, Safari under iOS) and allows writing the backend in Nim.
In that context I am basically writing an SPA in Angular and need to load the HTML, JS and CSS files at compile time into my binary.
Reading from a known absolute filepath is not an issue; you can use Nim's staticRead for that.
However, I would like to avoid having to adjust the filenames in my application code all the time, e.g. when a new build of the SPA changes a file name from main.a72efbfe86fbcbc6.js to main.b72efbfe86fbcbc6.js.
There are iterators in std/os, walkFiles and walkPattern, that you can use at runtime, but these fail when used at compile time!
import std/[os, sequtils, strformat, strutils]
const resourceFolder = "/home/philipp/dev/imagestable/html" # Put into config file
const applicationFiles = toSeq(walkFiles(fmt"{resourceFolder}/*"))
/home/philipp/.choosenim/toolchains/nim-#devel/lib/pure/os.nim(2121, 11) Error: cannot 'importc' variable at compile time; glob
How do I get around this?
Thanks to enthus1ast from Nim's Discord server I arrived at an answer: use the collect macro with the walkDir iterator.
The walkDir iterator does not make use of things that are only available at runtime and thus can be safely used at compile time. With the collect macro you can iterate over all the files in a specific directory and collect their paths into a compile-time seq!
Basically you write a collect block, which contains a simple for loop whose body evaluates to some value at the end of each iteration; the collect macro gathers all of those values into a seq.
The end result looks pretty much like this:
import std/[sequtils, sugar, strutils, strformat, os]
import webgui

const resourceFolder = "/home/philipp/dev/imagestable/html"

proc getFilesWithEnding(folder: string, fileEnding: string): seq[string] {.compileTime.} =
  result = collect:
    for path in walkDir(folder):
      if path.path.endsWith(fmt".{fileEnding}"): path.path

proc readFilesWithEnding(folder: string, fileEnding: string): seq[string] {.compileTime.} =
  result = getFilesWithEnding(folder, fileEnding).mapIt(staticRead(it))

React Load Binary File / URL scheme "file" is not supported

Background
I built an app which converts files from type A to type B (a binary file). I want to import and use a dummy file of type B to fill in the data of a file of type A. The dummy always stays the same. The app has no backend. I want to share the HTML, so anything that requires turning off browser security, etc., isn't an option.
Problem
At the moment, I load the files as I found here, but this works only with a backend server:
Requesting blob images and transforming to base64 with fetch API
import dummy from '../templates/Grid2.shp';

let hex = await fetch(dummy)
  .then(response => response.blob())
  .then(blob => new Promise(callback => {
    let reader = new FileReader();
    reader.onload = function () {
      const serumShp = atob(this.result.substring(37)); // 37 strips the base64 info data:...
      callback(binaryToHex(serumShp));
    };
    reader.readAsDataURL(blob);
  }));
It works in development but not in the built version, because the browser then requests the file from the filesystem.
I found a solution using a file loader, but that solution also throws an error:
Using file-loader to load binary file in react
import/no-webpack-loader-syntax
Also, I don't see any configuration files for webpack. As far as I have seen, I would need to eject to get them, which is also not recommended.
Question:
How can I import binary files into my app without a backend server/any changes, etc.?
Sorry, I cannot help beyond pointing out that there is a general discussion in CRA about supporting a more elegant way of importing binary/raw data. Sadly there doesn't seem to be much progress; the proposal is from 2018.

Safety/sanitization when storing images in DB with PHP

I'm looking to store images for an application in an MSSQL database. (I understand that there is some debate about whether this or file system storage is better; that's another thread though.) I'm looking at doing something similar to http://forum.codecall.net/topic/40286-tutorial-storing-images-in-mysql-with-php/ but in CodeIgniter, something along the lines of:
foreach ($_FILES as $upload_name => $info) {
    if ($info['name']) {
        // Temporary file name stored on the server
        $tmpName = $info['tmp_name'];

        // Read the file
        $fp = fopen($tmpName, 'r');
        $data = fread($fp, filesize($tmpName));
        fclose($fp);

        // model code consolidated here for ease of question-asking
        $db = $this->load->database();
        $stmt = $db->insert('my_table', array('image' => $data));
    }
}
My question is mostly along the lines of security. Basically, are there any particular concerns I should have about sanitizing image binary data inserts versus other sorts of string data? I took out the addslashes() in the code from the site linked above because I know CI's Active Record does some sanitization on its own, but I don't know if it is better to keep it (or do some other prep work altogether).
If I understand your question correctly, you should not have to worry about it as long as you store the file_type (the file's MIME type) alongside the binary data. Then, whenever you handle the data, make sure to serve it with the proper MIME type, so that even if someone uploads a script or a virus it is only ever rendered as an image instead of being executed by your server or the browser.
Other than this I do not think you will need to pull the upload into memory and try to scrub it.
