Converting a Blob object to HTML in Google App Engine

I have stored user-uploaded documents (.doc, .pdf) as Blob objects in the datastore.
Instead of letting the user download a document, I would like to present it as an HTML page
for viewing. How do I convert a Blob into HTML? Does Google App Engine provide a ready-made API for this?

There is no ready-made API in App Engine to convert .doc or .pdf (or other types of) files to HTML. You would need to find a library for your preferred language that parses the blob into its parts, structured as an object model (like a DOM). Then you would need to write code to convert the individual parts of the object model to HTML, unless you are lucky enough to find another library that does it. And no, StackOverflow is not a good place to ask "what library is there...".

No. App Engine itself does not provide any file format conversion tools. You might want to look into the Google Drive API, which might, to some extent, do the format conversion for you.

You can embed a PDF viewer in a web page by using pdf.js.

Most browsers already have a built-in PDF viewer. If you provide a link to a PDF file, many browsers will display the document automatically when users click it. Browsers that do not support this will offer to download the file instead.
This is the easiest solution: you don't have to do anything at all.
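For the built-in viewer to open the document instead of downloading it, the response typically needs a PDF content type and an inline disposition. Here is a minimal sketch in Python, assuming the Blob's bytes have already been read from the datastore. The function name `pdf_inline_response` and the default file name are made up for illustration and are not part of any App Engine API.

```python
def pdf_inline_response(blob_bytes, filename="document.pdf"):
    """Build a (status, headers, body) triple that asks the browser
    to display the PDF in its built-in viewer instead of saving it."""
    # Strip double quotes so the file name cannot break the header value.
    safe_name = filename.replace('"', "")
    headers = [
        ("Content-Type", "application/pdf"),
        # "inline" asks for in-browser display; "attachment" would force a download.
        ("Content-Disposition", 'inline; filename="%s"' % safe_name),
        ("Content-Length", str(len(blob_bytes))),
    ]
    return "200 OK", headers, blob_bytes


def make_wsgi_app(blob_bytes):
    """Wrap the response in a tiny WSGI application (illustration only)."""
    def app(environ, start_response):
        status, headers, body = pdf_inline_response(blob_bytes)
        start_response(status, headers)
        return [body]
    return app
```

Whether the PDF is actually rendered still depends on the browser, as noted above. The headers only express the preference.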

Related

Has anyone worked with fillable PDFs in Salesforce?

Hi, I am working on a fillable PDF in Salesforce. I have uploaded the PDF as a static resource, and now I need to fill it with data and let the user download the file. Any suggestions on how to achieve this?
Thanks in advance.
There is no functionality in Apex to interact with fillable PDFs. While it's possible to interact with binary formats such as PDF in Apex, it's difficult, slow, and subject to the extensive limitations of the Salesforce governor limits. You'd have to implement this from scratch based on your knowledge of the PDF format.
You will likely have much more success either building PDF manipulation functionality in JavaScript on the front end, or calling out to an external service on Heroku or elsewhere that uses PDF libraries available in some other stack to do this work.
Salesforce does not support input fields in PDFs generated via Visualforce pages.
To achieve this functionality, you can create a form with all the inputs that are required in the PDF.
Once the user fills in the form and submits it, generate the PDF with the submitted information.
Bonus: you can also save the submitted information by creating a record under the object for future reference (data is everything).

LinkedIn share links to PDF documents

I am trying to create buttons on a web page that allow users to share links to PDF documents on LinkedIn. LinkedIn loads a window without any errors but offers no link or preview of the PDF or any indication of what is being shared.
Here are the two methods I have tried. First the plugin method.
<script type="in/share" data-url="http://example.net/DocumentDownload.aspx?Command=Core_Download&entryID=114"></script>
And, secondly with a custom url.
TEST
Encoding the url makes no difference.
The above links are direct document links from a DNN web site using Document Exchange. If I change the urls to any html page it works fine and LinkedIn seems to be able to extract the useful information right from the page and use that for the share details.
Can LinkedIn handle this kind of thing? There is nothing to guide me on the type of links that can be shared. I can't find any information about it. There are no errors in the web console.
Not sure, but you should try giving LinkedIn a link that has .pdf at the end, like http://example.com/documents/file1.pdf. I guess LinkedIn just checks whether the URL ends in .pdf to decide if it points to a PDF document.
I have no problem sharing PDFs on LinkedIn. Check it out...
https://www.linkedin.com/sharing/share-offsite/?url=https://www.revoltlib.com/anarchism/the-conquest-of-bread/view.pdf
This works perfectly fine. Note that view.pdf is a script, not a file, so LinkedIn is not looking for a PDF file to analyze so much as for headers indicating that a PDF is available. So, in the script that serves the document (DocumentDownload.aspx in your case; shown here in PHP), we would do...
header('Content-type: application/pdf; charset=utf-8');
This header lets the sharing app know that it can analyze the document as a PDF file and extract useful information from it, as you can see from the screenshot.
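As a side note on building the share link itself, here is a small Python sketch that percent-encodes the document URL before appending it to the share-offsite address used above. Encoding keeps the `&entryID=...` part of the document URL from being read as a separate parameter of the LinkedIn URL. It does not by itself produce a preview (the question reports that encoding made no difference), so the Content-type header remains the important part. The function name `linkedin_share_url` is made up for illustration.

```python
from urllib.parse import quote

# Share endpoint taken from the example link above.
SHARE_BASE = "https://www.linkedin.com/sharing/share-offsite/?url="


def linkedin_share_url(document_url):
    """Return a share link whose url parameter is fully percent-encoded."""
    # safe="" also encodes "/", ":", "?", "&" and "=" inside the document URL.
    return SHARE_BASE + quote(document_url, safe="")


print(linkedin_share_url(
    "http://example.net/DocumentDownload.aspx?Command=Core_Download&entryID=114"
))
```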

Translate PDF file using Google Translate API

I want to use Google Translate in my project. I have completed all the formalities with Google and have an API key. With this key I can easily translate any word with JavaScript. But how do I translate a PDF file, as the Google Translate site can? I found something like this:
http://translate.google.com/translate?hl=fr&sl=auto&tl=en&u=http://www.example.com/PDF.pdf
But here I cannot use my key, and as a result it takes a long time to translate. So I want to use my key to translate a PDF file. Please help me out.
My approach is like this:
1. I have one HTML page.
2. It has a browse button for the PDF.
3. The file is uploaded.
4. The PDF is translated with the Google API and shown in the HTML page.
I searched for how to translate a PDF this way but did not find anything. Please help me out.
TL;DR: Use a headless browser to render a PDF from Google's PDF translation service.
PDF is a complex format and can include many components that contain text. I will describe the ways to translate it, from the easiest to the more advanced.
Translate raw text
If you only need the translation without the visual output, you can extract the text and give it to Google Translate.
Since you did not provide information on your project (language, environment, ...), I will point you to this thread on how to extract text.
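Once the text is extracted, handing it to a translation call could look like the Python sketch below. It assumes the text has to be sent in limited-size pieces; the 4000-character limit is an arbitrary placeholder, not a documented quota. `translate_chunk` is a hypothetical callable that you would supply to wrap whichever translation API you use, and no real Google client call is shown here.

```python
def split_into_chunks(text, max_chars=4000):
    """Split text on blank lines into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is cut hard at max_chars.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


def translate_document(text, translate_chunk, max_chars=4000):
    """Translate each chunk with the supplied callable and rejoin the results."""
    pieces = split_into_chunks(text, max_chars)
    return "\n\n".join(translate_chunk(piece) for piece in pieces)
```

Splitting on paragraph boundaries keeps sentences together, which usually matters more for translation quality than filling every request to the limit.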
Translate all text
If you need to get the text from everything in your PDF, well, that's pretty hard. To (partially) avoid the headache, you can convert the PDF to an image (using ImageMagick tools or similar), and then you have three options:
1. OCR the text from the image, then give it to Google. Again, you lose the original layout.
2. OCR the text while saving its position (some libraries can do that; again, since you did not specify your project information, see these links: #1, #2, #3, #4). Then translate it with the Google API and write the result back onto the image. For good results you need to take the text font, color, and background color into account. Pretty difficult, but feasible.
3. Translate the image using Google Translate's image service. Unfortunately this feature is not available in the public API, so short of doing some reverse engineering, this is not possible.
Translate using Google's PDF translation service
The approach you mention, using the translate site, can be automated quite easily. It takes a long time because it is a heavy process, and you probably won't beat Google at it.
Using a headless browser, you can load the translation page for your PDF, observe that the translated content sits in an iframe, open that iframe, and finally print it to PDF.
Here is a short example using SlimerJS (it should be compatible with PhantomJS):
var page = require("webpage").create();
// here you may want to set up page size and options

// get the translation page
page.open('https://translate.google.fr/translate?hl=fr&sl=en&u=http://example.com/pdf-sample.pdf', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        // exit explicitly so the script does not hang on failure
        phantom.exit(1);
    } else {
        // find the iframe with querySelector
        var iframe_src = page.evaluate(function() {
            return document.querySelector('#contentframe').querySelector('iframe').src;
        });
        console.log('Found iframe: ' + iframe_src);
        // render the iframe
        page.open(iframe_src, function(status) {
            // wait a bit for javascript to translate
            // this can be optimized to be triggered in javascript when translation is done
            setTimeout(function() {
                // print the page into PDF
                page.render('/tmp/test.pdf', { format: 'pdf' });
                phantom.exit(0);
            }, 2000);
        });
    }
});
Given this file: http://www.cbu.edu.zm/downloads/pdf-sample.pdf
It produces this result (translated into French): (I posted a screenshot since I cannot embed a PDF ;) )
Use Apache Tika to extract the text content of the PDF file (you will have to write the necessary Java code), then use whatever API you want to translate it. But, as has been mentioned above, Google Translate is a paid service.

How to convert the uploaded file in Box.com?

https://salesforce.stackexchange.com/questions/45823/how-to-download-uploaded-file-in-box-com/46515#46515
The code in the link above returns a URL (in the response) for downloading the file. But I want the file to be downloaded in HTML format. According to the Box.com website (https://developers.box.com/view/), files can be converted to HTML. How can I do this from Salesforce?
If you want to download the file as HTML, you'll have to send the file to the Box View service/API for conversion. The Box Content API does not by itself let you convert documents to HTML for preview and download. That means you'll need two separate tokens if you use both API endpoints. If you are in fact planning to use both, there's some documentation here on how:
https://developers.box.com/using-the-view-api-with-the-content-api/

Getting images with HTTP Request in C

I am writing a program in C that acts like a proxy server on a Linux system: a client asks it for a web page,
it sends an HTTP GET request to a remote server, and it receives the server's response (the web page), which is saved in an .html file.
Here is my problem: most web sites contain references to images, so when I try to view the .html file the proxy created, the images don't appear.
I have searched a lot but found nothing. Is there a way to write some code to GET the images too?
Thank you in advance.
You're going to have to write code that parses the HTML file you get back and looks for image references (img tags), then queries the server for those image files. This is what web browsers are doing under the hood.
You have an additional problem, though: the image references in the HTML file point to the original server. I'm assuming that, since they don't load for you, the server that returned the original HTML isn't available. In that case, after you get each image file you will need to give it a name on the local filesystem and then (programmatically) alter the reference in the HTML to point to your new local image name.
So for example:
<img src='http://example.com/image1.png'>
would become
<img src='localImage1.png'>
If you're querying arbitrary websites, you'll also find various other files that need the same treatment, such as CSS and JavaScript files. In general it's hard to mirror arbitrary web pages accurately: browsers use complex object models to interpret web pages because they have to deal with things like CSS and JavaScript, and you may need to be able to 'run' all that dynamic code just to know which files to download from the server (e.g. JavaScript including other JavaScript).
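The image-rewriting step described above can be sketched as follows. This is written in Python for brevity rather than C, and a C proxy would follow the same steps. It is a rough illustration under stated assumptions: it only handles `src` attributes written with quotes on `img` tags, it uses a regular expression instead of a real HTML parser, and the function name `localize_images` and the `localImageN` naming scheme are made up here. The actual download of each image is left out.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Matches <img ... src="..."> or <img ... src='...'> and captures the URL.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def localize_images(html, base_url):
    """Return (rewritten_html, downloads).

    downloads maps each absolute image URL to the local file name
    that now appears in the rewritten HTML.
    """
    downloads = {}

    def replace(match):
        # Resolve relative references against the page's own URL.
        absolute = urljoin(base_url, match.group(3))
        if absolute not in downloads:
            ext = posixpath.splitext(urlparse(absolute).path)[1] or ".img"
            downloads[absolute] = "localImage%d%s" % (len(downloads) + 1, ext)
        quote_char = match.group(2)
        return match.group(1) + quote_char + downloads[absolute] + quote_char

    return IMG_SRC.sub(replace, html), downloads
```

The proxy would then issue one GET per key in `downloads` and write each response body, as binary data, to the mapped local file name.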

Resources