IBM Watson, how to input data of entire books - ibm-watson

I'm using the IBM Watson Analytics trial, and it says it only takes data as CSV, Excel and a few other formats. How can I convert books or bodies of text into an acceptable format? Thank you.

It seems that the architecture of WCA (Watson Content Analytics) does not support PDF directly. See the architecture diagrams in IBM's documentation.
I think it would be better to convert the PDF to text with a converter and push the text into a database or another data store.
Then you can crawl the text data from there.
FYI, the document has to have a KEY column (i.e. name of the book).
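As a rough illustration of the conversion described above, here is a minimal Python sketch that turns already-extracted book text into CSV rows with a header and a key column. The column names (`book`, `paragraph_no`, `text`), the one-row-per-paragraph layout and the sample text are assumptions made for this example, not a format required by Watson.

```python
import csv
import io


def book_to_rows(title, text):
    """Split plain text into paragraphs and return one row per paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        # " ".join(p.split()) collapses line wraps inside a paragraph.
        {"book": title, "paragraph_no": i, "text": " ".join(p.split())}
        for i, p in enumerate(paragraphs, start=1)
    ]


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["book", "paragraph_no", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder text standing in for the output of a PDF-to-text converter.
sample = "First paragraph, line one.\nLine two.\n\nSecond paragraph."
rows = book_to_rows("Example Book", sample)
print(rows_to_csv(rows))
```

The book title serves as the key column here; any stable identifier would do.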

Even if you do convert your book into an acceptable format (.csv, .xls, .xlsx, .sav), Watson Analytics isn't optimized for text analytics. It sounds like Watson Explorer is the offering that would best suit your needs.
Hope this helps.

Even though CSV and XLS are acceptable file formats, datasets need to follow a specific structure: every table needs a header row, with the data following it. I am not sure how the text of a book can fit into that format.
I have recently published this blog post on how to structure and refine data before importing into Watson Analytics to get the best results.
For your specific requirement, you can look into Watson Explorer as suggested by Brennan above, or even better you can learn to use IBM Content Analytics here.

Related

How do I export data with attachments from a Lotus Notes Database into an Excel Spreadsheet or into a Microsoft Access Database?

I'm not a Lotus Notes developer, but I have to get data from a Lotus Notes database into SharePoint. All of the LN entries have attachments. I tried to export to a CSV file, but that doesn't include the attachments. I then created a new view with the Attachments field, but that only returns the number of attachments. How can I extract the attachments associated with each LN form? Thanks in advance.
Your question is pretty broad. Attachments are (sometimes) treated as embedded objects in a Rich Text Field. This URL has some sample code:
https://www.ibm.com/support/knowledgecenter/en/SSVRGU_9.0.1/basic/H_EXAMPLES_EMBEDDEDOBJECTS_PROPERTY_RTITEM.html
Copy/paste may not work for you because the attachments may not be in a field called "Body", or there may be multiple "Body" fields on the document (which requires other considerations beyond the scope of this question), or the attachments may be embedded objects in the document. Or all of the above. Still, that code will give you a sense of what you need to do.
Also, see this:
How to retrieve Lotus Notes attachments?
I have done this by writing LotusScript code to detach all the attachments from all docs into a single folder, using the document's UNID plus the attachment name for the filename in the folder. Adding the UNID covers cases where attachments with the same name exist in multiple documents and might actually have different content. I do not attempt to de-duplicate.
The agent adds a NotesItem to each document giving the filename(s) of the detached attachment(s).
I then create a view containing all the fields that I want to export, including the new field with the filenames. I export that view to CSV. I hand the CSV and a zip file containing the attachments over to the SharePoint team.
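The naming and manifest steps of this approach can be sketched as follows. This is Python rather than LotusScript, and it covers only the bookkeeping: the detachment itself is assumed to happen elsewhere. The function names, the `_` and `;` separators, the column names and the sample UNIDs are all hypothetical.

```python
import csv
import io


def detached_name(unid, attachment_name):
    """Prefix the attachment name with the document UNID to keep names unique."""
    return f"{unid}_{attachment_name}"


def build_manifest(docs):
    """Build CSV text from (unid, subject, attachment_names) tuples.

    The tuple shape is an assumption for this sketch; in Notes, the values
    would come from the documents and the view columns being exported.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["unid", "subject", "attachment_files"])
    for unid, subject, names in docs:
        files = ";".join(detached_name(unid, name) for name in names)
        writer.writerow([unid, subject, files])
    return buffer.getvalue()


# Two documents carrying an attachment with the same name.
docs = [
    ("UNID0001", "Contract A", ["scan.pdf"]),
    ("UNID0002", "Contract B", ["scan.pdf", "notes.txt"]),
]
print(build_manifest(docs))
```

Because the UNID is part of every filename, the two `scan.pdf` files do not overwrite each other in the shared folder.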
Maybe a bit late but... I do have extensive experience (approx. 15 years) with data extraction from IBM Notes applications/databases - independent of the type of application - and have supported migrations of quite a few large IBM Notes applications to various targets for companies around the world.
You can access IBM Notes databases using, for example, the native C-API, LotusScript, COM or Java, or you can make a document available for further processing by exporting it to Domino XML (DXL) format.
The C-API is the foundation of IBM Notes, meaning that COM and Java APIs only offer a subset of the C-API's functionality. Any of the APIs should give you the ability to extract a document's metadata and attachments. However:
A document, including its attachments, can be encrypted using an IBM Notes ID. If you do not have access to the ID that was used to encrypt the document, you will not be able to extract either the document or its attachments.
Attachments can be "real attachments" or so-called "embedded objects". Depending on the type, an attachment needs to be handled differently when it comes to the API calls required for the export.
Attachments can be compressed. In most cases, the API should handle the decompression transparently. However, there is at least one proprietary compression algorithm (based on Huffman) that is widely used. If you extract documents in DXL format, you will not be able to read those attachments, as they are embedded into the DXL in compressed form.
Objects embedded into a document using Object Linking and Embedding (OLE) cannot be extracted using the COM or Java API. That is, even if you gain access to the documents, you will not be able to transform those objects into a readable format.
If the information you are trying to transfer from IBM Notes to SharePoint is important to the company you work for, I would recommend relying on a proven solution for the export/migration rather than developing this on your own, as the details can be really tricky.
Should you have any further questions, don't hesitate to get in touch.

How to instruct IBM Watson Discovery about the format of my documents?

I am trying to use the Watson Discovery service to build a virtual customer support agent. We have many documents with tons of Q and A in various formats. In the simplest case, we just have a doc, with an array of:
Q:..
A:...
Q:...
A:...
etc. When we upload these PDF files and then query them, Discovery returns the full document that includes the relevant answer. Is there a way to instruct the Discovery service to return only the relevant question and answer pair instead of the full document?
To have Discovery return the individual relevant QA pairs, they should be split up and passed to the service as separate documents. Discovery does not have a method to split a single document on its own.
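A minimal Python sketch of that splitting step, assuming the text has already been extracted from the PDF and that every pair uses the literal `Q:` and `A:` prefixes at the start of a line, as in the question. The function name, the dictionary keys and the sample text are illustrative; uploading each resulting document to Discovery is left out.

```python
import json
import re

# One match per pair: "Q:" at a line start, then "A:" at a line start,
# ending just before the next "Q:" or at the end of the text.
QA_PATTERN = re.compile(
    r"^Q:\s*(?P<q>.*?)\s*^A:\s*(?P<a>.*?)\s*(?=^Q:|\Z)",
    re.MULTILINE | re.DOTALL,
)


def split_qa(text):
    """Return one small document (a dict) per question/answer pair."""
    return [
        {"question": m.group("q"), "answer": m.group("a")}
        for m in QA_PATTERN.finditer(text)
    ]


sample = (
    "Q: How do I reset my password?\n"
    "A: Use the reset link on the login page.\n"
    "Q: Where can I see my invoices?\n"
    "A: Open the billing page.\n"
    "They are listed by month.\n"
)
pairs = split_qa(sample)
for pair in pairs:
    # Each pair would be stored or uploaded as its own document.
    print(json.dumps(pair))
```

Multi-line answers stay attached to their question because a pair only ends at the next line that starts with `Q:`.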
If your primary requirement is Q&A, you might look into Retrieve and Rank.
Discovery is meant for complex unstructured data; in your case, the data is in a consistent format.
Have a look at this sample app here

Generate a series of documents based on SQL table

I am trying to formulate a proposal for an application that allows a user to print a batch of documents based on data stored in a SQL table. The SQL table indicates which documents are due and also contains all demographic information. This is outside of what I normally do, and I am trying to see if there is a platform/application that already exists to do such a task.
For example
List of all documents: Document #1 - Document #10
Person 1 is due for document #: 1,5,7,8
Person 2 is due for document #: 2,6
Person 3 is due for document #: 7,8,10
etc
Ideally, what I would like is for the user to be able to push a button and get a printed stack of documents that have been customized for each user including basic demographic info like name, DOB, etc
Like I said at the top, I already have all of the needed information in a database; I am just trying to figure out the best approach to move that information onto a document.
I have done some research and found some people have used mail merge in Word or using Access as a front end but I don't know if this is the best way. I've also found this document. Any advice would be greatly appreciated
If I understand correctly, your problem is two-fold: first, you need to find a way to generate documents based on data (mail merge), and second, you might need to print them too.
For document generation you have two basic approaches: template-based, and programmatic generation from scratch. I suppose that you will opt for the template-based approach, which basically means that you design (in MS Word) a template document (Word, RTF, ...) that contains placeholders and other tags designating the »dynamic« parts of the document. Then, at document generation time, you need a .NET library/processor to which you pass this template document and the data; the processor populates the template with the data and returns the resulting document.
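The template-based idea can be illustrated independently of Word and .NET. The Python sketch below uses the standard library's `string.Template` as a stand-in for a real document processor; the template text, the field names and the dates of birth are placeholder assumptions, and the due lists mirror the example in the question. In practice the `people` list would come from the SQL table.

```python
from string import Template

# Plain-text stand-in for a Word template with placeholders.
TEMPLATE = Template("Document #$doc_no\nName: $name\nDOB: $dob\n")

# Placeholder rows; a real application would query these from the database.
people = [
    {"name": "Person 1", "dob": "1980-01-01", "due": [1, 5, 7, 8]},
    {"name": "Person 2", "dob": "1975-06-15", "due": [2, 6]},
    {"name": "Person 3", "dob": "1990-11-30", "due": [7, 8, 10]},
]


def generate_batch(people, template):
    """Fill the template once per (person, due document) combination."""
    batch = []
    for person in people:
        for doc_no in person["due"]:
            batch.append(
                template.substitute(
                    doc_no=doc_no, name=person["name"], dob=person["dob"]
                )
            )
    return batch


batch = generate_batch(people, TEMPLATE)
print(len(batch), "documents generated")
print(batch[0])
```

The resulting batch is ordered by person, so the printed stack keeps each person's documents together.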
One way to achieve this functionality would be to employ MS Word's native mail merge, but you should know that this involves Office COM and Word application automation, which should almost always be avoided.
Another option is to build such a system on top of the Open XML SDK. This is a valid option, but it will be a pretty demanding task and will most probably cost you much more than buying a commercial .NET library that does mail merge out of the box (been there, done that). Of course, the good side is that you will be able to tailor the solution to your needs. If you go down this road, I recommend that you use Content Controls for tagging documents/templates. A solution with CCs will be much easier to implement than one with bookmarks.
I'm not very familiar with the open source solutions, and I'm not sure how many of them can do mail merge. One I know is FlexDoc (on CodePlex), but its problem is that it uses a construct (XmlControl) for tagging that is deprecated in Word 2010+.
Then there are commercial solutions. Again, I don't know them in detail, but I know that the majority of them are general-purpose document processing libraries. Our company has been using this document generation toolkit for some time now, and I can say it covers all our »template-based document generation« needs. It doesn't require MS Word at document generation time, it has a really helpful add-in for MS Word, and you only need several lines of code to integrate it into your project. Templating is very powerful, and you can set up a template in a very short time. While templates are Word documents, you can generate PDF or XPS documents as well. XPS is useful because you can use the .NET/WPF printing framework, which works with XPS documents, to print them. This is a very high-end solution, but the downside is that it is not free.

Export content from an ecommerce site without using the Backend

I have a site that I'm looking to transfer to Volusion. Importing tabled content into Volusion is a breeze; it's getting the content tabled that's the issue. The old site has no real ability to export, nor do I know how to get at its database. I'm thinking there must be some sort of script I can write to take the content from the frontend and download it as some sort of list that I can put into a CSV and import into Volusion.
www.twincitygreetings.com
Any suggestions? I'm hoping to get into the image directory as well and download all the images for upload to the new site.
You are going to need at the very least a file with product code, product name, weight and price.
Looking at the URL you provided, it doesn't appear that the products there follow any type of orderly structure that would let you target the images folder or the products based on a known piece of information such as a product's code. Unless the back-end has some type of product export function, you may have no choice but to recreate it from scratch.
I don't know if you have solved this yet, but I would suggest scraping the data, provided the information is still available on the old site. This can be done easily using VBScript and Excel, or, if you aren't very savvy at coding, you could look at a piece of software called Mozenda. There is a whole variety of methods for scraping data, all of them pretty easy to learn with a bit of research. Basically, you write a script that crawls the DOM of each page and extracts the data (exporting to XML works best in my experience).
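As a rough sketch of such a script, here is a Python version using only the standard library's `html.parser`. The markup it expects (a `div` with class `product` containing `span` elements with classes `name` and `price`) is purely hypothetical; the real selectors depend on the old site's HTML, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect name and price from an assumed product markup."""

    FIELDS = {"name", "price"}

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "product" in classes:
            self.products.append({})  # a new product block starts
        elif tag == "span" and self.products:
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        if self._field and data.strip():
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


# Made-up HTML standing in for a downloaded product listing page.
SAMPLE_HTML = """
<div class="product"><span class="name">Birthday Card</span><span class="price">2.50</span></div>
<div class="product"><span class="name">Thank You Card</span><span class="price">3.00</span></div>
"""

parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.products)
```

The collected dictionaries can then be written out with `csv.DictWriter` or serialized to XML for the import.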
Hope this helps.

What is the data format required for input in the IBM-Watson cloud product?

I'm having a hard time figuring out what type of data Watson accepts: RDF triples, relational, delimited text, etc.
There's really no documentation anywhere.
Does anyone know?
Watson currently accepts unstructured English prose in HTML, Word documents, plain text, and certain formats of PDF.
Some API documentation can be found here: https://www.ibmdw.net/watson/wp-content/uploads/sites/19/2013/11/An-Ecosystem-Of-Innovation-Creating-Cognitive-Applications-PoweredByWatson.pdf
You can also get a bit more if you go to the bottom of the mobile developer challenge page that's here: ibmwatson.com (see 'Helpful Hints about Watson')
If there's other documentation you're looking for, specific feedback would be helpful to pass on.
