Form Recognizer supported document types other than image and PDF - azure-form-recognizer

I am able to process PDF and image invoice documents using Microsoft Form Recognizer. When I try to process invoices as Microsoft Word or Excel documents, it throws an "Unsupported document type" error.
According to the official documentation, the supported file formats are JPEG, PNG, PDF, and TIFF. Is there a way to process documents in Excel or Word format, apart from the formats mentioned?
Thanks.

Form Recognizer does not yet support Word or Excel formats. Please convert these files to PDF and then send them to Form Recognizer for extraction.
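The convert-then-submit step could be sketched as follows. This is a minimal sketch, assuming LibreOffice is installed and its `soffice` binary is on the PATH (an assumption, not something Form Recognizer requires). The helper names are hypothetical. The code only decides whether a file needs conversion and builds the conversion command; the upload to Form Recognizer itself is left out.

```python
from pathlib import Path

# Extensions for the formats the question lists as supported
# (JPEG, PNG, PDF, TIFF); .jpg and .tif are assumed aliases.
SUPPORTED = {".jpeg", ".jpg", ".png", ".pdf", ".tiff", ".tif"}


def needs_conversion(path: str) -> bool:
    """Return True if the file must be converted to PDF before submission."""
    return Path(path).suffix.lower() not in SUPPORTED


def conversion_command(path: str, out_dir: str) -> list[str]:
    """Build a LibreOffice headless command that writes a PDF into out_dir."""
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, path]


def pdf_path_after_conversion(path: str, out_dir: str) -> Path:
    """LibreOffice keeps the base name and swaps the extension for .pdf."""
    return Path(out_dir) / (Path(path).stem + ".pdf")
```

To run the conversion, the command list can be passed to `subprocess.run(..., check=True)`; the resulting PDF is then the file to send for extraction.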

Related

How do I export data with attachments from a Lotus Notes Database into an Excel Spreadsheet or into a Microsoft Access Database?

I am not a Lotus Notes developer, but I have to get data from a Lotus Notes database into SharePoint. All of the LN entries have attachments. I tried to export to a CSV file, but that doesn't include the attachments. I then created a new view with the Attachments field, but that only returns the number of attachments. How can I extract the attachments associated with each LN form? Thanks in advance.
Your question is pretty broad. Attachments are (sometimes) treated as embedded objects in a Rich Text Field. This URL has some sample code:
https://www.ibm.com/support/knowledgecenter/en/SSVRGU_9.0.1/basic/H_EXAMPLES_EMBEDDEDOBJECTS_PROPERTY_RTITEM.html
Copy/paste may not work for you because the attachments may not be in a field called "Body", or there may be multiple "Body" fields on the document (which requires other considerations beyond the scope of this question), or the attachments may be embedded objects in the document. Or all of the above. That code will give you a sense of what you need to do.
Also, see this:
How to retrieve Lotus Notes attachments?
I have done this by writing LotusScript code to detach all the attachments from all docs into a single folder, using the document's UNID plus the attachment name for the filename in the folder. Adding the UNID covers cases where attachments with the same name exist in multiple documents and might actually have different content. I do not attempt to de-duplicate.
The agent adds a NotesItem to each document giving the filename(s) of the detached attachment(s).
I then create a view containing all the fields that I want to export, including the new field with the filenames. I export that view to CSV. I hand the CSV and a zip file containing the attachments over to the SharePoint team.
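The naming and manifest steps above can be sketched like this, in Python rather than LotusScript. It is only an illustration under assumptions: the detaching itself needs the Notes API and is not shown, the underscore separator and the record layout (`unid`, `subject`, `attachments`) are hypothetical, and the real export would come from a Notes view.

```python
import csv
import io


def detached_filename(unid: str, attachment_name: str) -> str:
    """Prefix the attachment name with the document UNID so same-named
    attachments from different documents do not overwrite each other."""
    return f"{unid}_{attachment_name}"


def build_manifest(docs: list[dict]) -> str:
    """Return CSV text with one row per document and a column listing
    the detached filenames, separated by semicolons."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["unid", "subject", "attachment_files"])
    for doc in docs:
        files = [detached_filename(doc["unid"], name) for name in doc["attachments"]]
        writer.writerow([doc["unid"], doc["subject"], ";".join(files)])
    return buf.getvalue()
```

The CSV text plays the role of the exported view: each row carries the document's fields plus the names of the files that sit in the accompanying zip.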
Maybe a bit late but... I do have extensive experience (approx. 15 years) with data extraction from IBM Notes applications/databases - independent of the type of application - and have supported migrations of quite a few large IBM Notes applications to various targets for companies around the world.
You can access IBM Notes databases using, for example, the native C-API, LotusScript, COM or Java, or you can make a document available for further processing by exporting it to Domino XML (DXL) format.
The C-API is the foundation of IBM Notes, meaning that COM and Java APIs only offer a subset of the C-API's functionality. Any of the APIs should give you the ability to extract a document's metadata and attachments. However:
A document, including its attachments, can be encrypted using an IBM Notes ID. If you do not have access to the ID that was used to encrypt the document, you will be able to extract neither the document nor the attachments.
Attachments can be "real attachments" or so-called "embedded objects". Depending on the type of attachment, it needs to be handled differently when it comes to the API calls required to do the export.
Attachments can be compressed. In most cases, the API should handle the decompression transparently. However, there is at least one proprietary compression algorithm (based on Huffman) that is widely used. If you extract documents in DXL format, you will not be able to read those attachments, as they are embedded into the DXL in compressed form.
Objects embedded into a document using Object Linking and Embedding (OLE) cannot be extracted using the COM or Java API. I.e. even if you gain access to the documents, you will not be able to transform them into a readable format.
If the information you are trying to transfer from IBM Notes to SharePoint is important to the company you work for, I would recommend relying on a proven solution for the export/migration rather than developing this on your own, as the details can really be tricky.
Should you have any further questions, don't hesitate to get in touch.

Convert FDF file content into PDF readable format as attachment using APEX

I have an FDF format attachment which needs to be converted into PDF format attachment. I am facing issues while reading the FDF file content.
I believe FDF-formatted files are not text files. While you can technically read any file in Apex, you will not be able to parse the file, since it's in a format designed to be read by an Adobe product.
The only way to work with this from Apex would be to run the Acrobat Forms Data Format Toolkit on another server and then perform a callout from Apex to the other server. Apex itself will not be able to work with the format.
This concept of running a web service as a form of middleware is commonly used and Apex does make it very easy to perform callouts.

IBM Watson, how to input data of entire books

I'm using the IBM Watson Analytics trial. It says it only takes data as CSV, Excel and a few other formats. How can I convert books or bodies of text into an acceptable format? Thank you.
It seems that the architecture of WCA (Watson Content Analytics) does not support PDF directly. Please refer to the following images from the IBM link.
I think it would be better to convert the PDF to text with a converter such as CONVERTER and push it into a database or similar store.
Then you can crawl the text data from it.
FYI, the document has to have a KEY column (i.e. name of the book).
Even if you do convert your book into an acceptable format (.csv, .xls, .xlsx, .sav), Watson Analytics isn't optimized for text analytics. It sounds like Watson Explorer is the offering that'd best suit your needs.
Hope this helps.
Even though CSV or XLS is an acceptable file format, datasets need to be in a specific structure. You need headers for all the tables, with the data following them. I am not sure how the content of a book can fit into that format.
I have recently published this blog post on how to structure and refine data before importing into Watson Analytics to get the best results.
For your specific requirement, you can look into Watson Explorer as suggested by Brennan above, or even better you can learn to use IBM Content Analytics here.
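For readers who still want to try the header-plus-rows structure mentioned above, here is a minimal sketch. It assumes a book is available as plain text with blank lines between paragraphs; the column names (`book`, `paragraph_no`, `text`) are hypothetical, with `book` serving as the key column. Whether Watson Analytics gives useful results on such data is a separate question, as the answers above point out.

```python
import csv
import io


def book_to_csv(title: str, text: str) -> str:
    """Split a book's text into paragraphs and emit one CSV row per paragraph."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["book", "paragraph_no", "text"])  # header row
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Collapse internal line breaks so each paragraph stays on one row.
        writer.writerow([title, number, " ".join(paragraph.split())])
    return buf.getvalue()
```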

Does Angular provide any library or plugin to convert server-side binary data to PDF or Excel format?

I have a requirement where my server sends data in binary form, but I have to show the data in PDF or Excel format.
Does Angular provide any way to do so?
Please help.
https://stackoverflow.com/a/21732039/1618775
Try this, it can be your solution.
I advise you to convert the binary data to PDF on the server side.

FileMaker Scripting - Sending Different Attachment

Is there a way to send an email with a different PDF file to each contact using FileMaker?
I am aware of sending batch emails with one attachment, but I would like to send a personalized PDF to each contact, which seems not so simple.
Also
Can I add PDF files to the table itself or would I have to use the path to the file?
Example:
Table 1
**Name** [James Brown] [James Blue]
**Email** [brown.j#gmail.com] [blue.j#gmail.com]
**PDFfileAttchamnet** [folder/PDF/JamesBrown.pdf] [folder/PDF/JamesBlue.pdf]
So an Email for James Brown would look like:
Dear James Brown, please see the attached file.
Attachment [JamesBrown.pdf] {actual file}
and
Dear James Blue, please see the attached file.
Attachment [JamesBlue.pdf] {actual file}
I think you can solve this by creating a container field in your database and importing the PDFs into it.
Then you can use Export Field Contents[] to export each file and send it by email.
Hope it's useful.
I would like to send a personalize PDF for each contact which seems
not so simple.
Find the records of contacts you want to include and loop among them, sending mail to each one individually (i.e. without selecting the 'Collect addresses across found set' option).
Can I add PDF files to the table itself or would I have to use the
path to the file?
You can do either, it's up to you. If the path to the file can be calculated (as in your example), you can calculate it right there in the Send Mail script step.
Note that you can also generate the PDF files during the process itself.
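The per-contact loop and the calculated path can be illustrated outside FileMaker with a short Python sketch. The function names and the dictionary layout are hypothetical, and the example addresses are placeholders; in FileMaker the same logic would live in a looping script around the Send Mail step.

```python
def attachment_path(name: str, folder: str = "folder/PDF") -> str:
    """Derive the PDF path from the contact name by removing the spaces."""
    return f"{folder}/{''.join(name.split())}.pdf"


def personalized_mails(contacts: list[dict]) -> list[dict]:
    """Build one mail per contact instead of one mail for the whole found set."""
    mails = []
    for contact in contacts:
        mails.append({
            "to": contact["email"],
            "body": f"Dear {contact['name']}, please see the attached file.",
            "attachment": attachment_path(contact["name"]),
        })
    return mails
```

Each resulting entry corresponds to one individual send, with its own recipient, greeting and attachment path.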
Do I understand correctly that you would actually like to personalize the PDF document(s)?
This is possible; maybe not trivial, but fairly simple. The trick is to prepare the PDF as a form, and then fill in the form fields to personalize it.
PDF has a native forms data format (called FDF), which is described in ISO 32000 (as well as the older PDF specification documents provided by Adobe, as you can find in the Acrobat SDK, downloadable from the Adobe website).
FDF is a simple structured text file, which can easily be assembled using FileMaker (I have done that routinely for several catalog projects). The easiest way to get going is to open the form in Acrobat, fill in the fields, and then export the data as FDF. This gives you the pattern to "fill in the blanks".
So, you create the FDF files using FileMaker. With them you can fill the blank form and feed the saved document to the eMail system.
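As a rough illustration of the "fill in the blanks" idea, the following Python sketch assembles FDF text from field names and values. It is a sketch under assumptions: the layout follows the general shape of a simple FDF file, the function names are hypothetical, only ASCII values are handled, and the output should be compared against an FDF exported from your own form in Acrobat before relying on it.

```python
def escape_pdf_string(value: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields: dict[str, str]) -> str:
    """Return FDF text that assigns each value (/V) to its field name (/T)."""
    entries = "\n".join(
        f"<< /T ({escape_pdf_string(name)}) /V ({escape_pdf_string(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
```

The field names passed in must match the field names defined in the PDF form; the same template could be produced with a FileMaker calculation that concatenates the fixed text with the record's values.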
Which tool to use to fill the blank form depends on the volume you have to process. Acrobat is not very powerful (and you may end up in a bit of a legal gray zone, because Acrobat is not set up for being used as a service). There are applications made specifically for filling out forms on a server (such as FDFMerge by Appligent), and there are also several libraries with tools to fill out forms (iText or pdflib come to mind). These applications also allow you to flatten the PDF, which means that there are no longer form fields; their contents become part of the base.
The resulting file can now either be attached to an eMail, or you can make it available on a server and send an eMail with the link to the file (which method you use may depend on security and privacy regulations).

Resources