Preserving HTML with Watson Document Conversion - ibm-watson

We have Microsoft Word documents structured so that they generate good-quality Watson Retrieve and Rank (RaR) JSON answer units using the Watson Document Conversion service. However, the Doc Con service removes any working links in the Word documents, so the resulting JSON answer units contain only plain text.
Is there a way to configure the Doc Con service to preserve these links so that the link HTML appears in the resulting JSON answer units? If not, how do you suggest we proceed in getting Word documents with working links into our RaR corpus?

Currently, Doc Con (specifically the Microsoft .doc and .docx conversion) removes external links. Internal links are preserved.
Unfortunately, there is no configuration setting to preserve the external links.
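If you need the external links anyway, one possible workaround (not a Doc Con feature) is to read them straight from the .docx file and re-attach them to the matching answer units yourself after conversion. A .docx file is a ZIP package: each external hyperlink appears in word/document.xml as a w:hyperlink element whose r:id attribute points to a relationship in word/_rels/document.xml.rels, and that relationship holds the URL. The sketch below uses only the Python standard library to list (link text, URL) pairs; external_links is a hypothetical helper name, and how you match links back to answer units is left open.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Office Open XML namespaces used by .docx packages
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def external_links(docx) -> List[Tuple[str, str]]:
    """Return (link text, URL) pairs for external hyperlinks in a .docx.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        doc = ET.fromstring(zf.read("word/document.xml"))

    # Relationship Id -> URL, keeping only external hyperlink targets
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")
        if rel.get("TargetMode") == "External"
        and rel.get("Type", "").endswith("/hyperlink")
    }

    links = []
    for link in doc.iter(f"{{{W_NS}}}hyperlink"):
        rid = link.get(f"{{{R_NS}}}id")  # internal links use w:anchor instead
        if rid in targets:
            # The visible link text is spread across one or more w:t runs
            text = "".join(t.text or "" for t in link.iter(f"{{{W_NS}}}t"))
            links.append((text, targets[rid]))
    return links
```

This only covers .docx (legacy .doc files are not ZIP packages), and links that Word stored as HYPERLINK field codes rather than w:hyperlink elements are not picked up.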

Related

How to import Alexa skill into API.AI / Dialogflow?

I'm trying to export my Alexa Skill / import it into Dialogflow (used to be called API.AI), but I'm getting the following error message:
Invalid Alexa schema json file.
My zip file contains the index.js file and the node_modules folder. I then added the Alexa Skill JSON, named schema.json, to the zip too, but I still get the same error.
I cannot find instructions on how to export the correct Alexa .zip for import, nor how to format the zip to build it myself. I've been searching for a while -- does anyone know how to do this? (I emailed their support already, but no response yet.)
There were some updates to the Alexa Interaction Model, so the Dialogflow Alexa Importer doesn't seem to work anymore.
There are a few things to consider when porting an Alexa Model into a Dialogflow Agent:
Built-in Intents: You need to create custom Dialogflow intents for built-in Alexa intents like AMAZON.HelpIntent.
Built-in Slots: Amazon offers a large variety of built-in slot types (e.g. AMAZON.NUMBER) that need to be converted for Dialogflow. For this, Dialogflow offers System Entities; see the Dialogflow documentation for the full list.
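To get an overview of that conversion work, a small script can read an IntentSchema.json and flag which intents and slots need attention. This is a sketch under several assumptions: BUILTIN_SLOTS is a partial, illustrative mapping; the output is a plain summary, not Dialogflow's agent import format; and naming custom slot types "@" + type as developer entities, or dropping the "AMAZON." prefix for the custom intent name, are hypothetical conventions.

```python
import json
from typing import Dict, List

# Partial, illustrative mapping of Alexa built-in slot types to Dialogflow
# system entities; extend it for the slot types your skill actually uses.
BUILTIN_SLOTS: Dict[str, str] = {
    "AMAZON.NUMBER": "@sys.number",
    "AMAZON.DATE": "@sys.date",
    "AMAZON.TIME": "@sys.time",
    "AMAZON.DURATION": "@sys.duration",
}


def plan_dialogflow_intents(intent_schema_json: str) -> List[dict]:
    """Summarize how each Alexa intent and slot might map onto Dialogflow."""
    schema = json.loads(intent_schema_json)
    plan = []
    for intent in schema.get("intents", []):
        name = intent["intent"]
        params = {}
        for slot in intent.get("slots", []):
            slot_type = slot["type"]
            if slot_type in BUILTIN_SLOTS:
                params[slot["name"]] = BUILTIN_SLOTS[slot_type]
            elif slot_type.startswith("AMAZON."):
                params[slot["name"]] = None  # unmapped built-in: handle manually
            else:
                params[slot["name"]] = "@" + slot_type  # custom slot -> developer entity
        plan.append({
            # Built-in intents such as AMAZON.HelpIntent need a custom Dialogflow intent
            "dialogflow_intent": name.replace("AMAZON.", "", 1),
            "needs_custom_intent": name.startswith("AMAZON."),
            "parameters": params,
        })
    return plan
```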
I created a complete step-by-step guide and video that uses the Jovo Language Model to translate an Alexa Model into a Dialogflow Agent. You can find it here: Tutorial: Turn an Alexa Interaction Model into a Dialogflow Agent.
Here is an example of the format for the zip: https://github.com/dialogflow/fulfillment-webhook-importer-nodejs/tree/master/skill/speechAssets
The zip should contain two files: IntentSchema.json and SampleUtterances.txt.
Here is how to get IntentSchema.json and SampleUtterances.txt:
Go to https://developer.amazon.com/edw/home.html#/skills to view all your skills.
Select the skill you'd like to export by clicking its name.
On the left, select "Interaction Model" from the list.
Copy the contents of the intent schema editor, paste them into your IntentSchema.json file, and save it.
Next, copy the contents of the "Sample Utterances" editor, paste them into your SampleUtterances.txt file, and save it.
Lastly, zip up your IntentSchema.json and SampleUtterances.txt files and upload the zip to Dialogflow.
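The zipping step can be scripted so the entry names always come out exactly right, whatever the local files are called. This is a minimal sketch, assuming the importer expects both files at the root of the archive (as in the linked example); build_alexa_import_zip is a hypothetical helper name.

```python
import json
import zipfile


def build_alexa_import_zip(schema_path, utterances_path, out_path):
    """Zip an Alexa intent schema and sample utterances for Dialogflow import."""
    # Fail early if the pasted intent schema is not valid JSON.
    with open(schema_path, encoding="utf-8") as f:
        json.load(f)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname sets the entry name inside the archive, so the two files
        # are stored at the root under exactly these names.
        zf.write(schema_path, arcname="IntentSchema.json")
        zf.write(utterances_path, arcname="SampleUtterances.txt")
    return out_path
```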
I'm not sure if you're still working on this, but if anyone else is stuck: the files you zip must be named exactly IntentSchema.json and SampleUtterances.txt.

Can I bulk load documents with inline attachments to Cloudant?

I want to bulk load a set of documents into a Cloudant DB. Cloudant provides a REST endpoint for this, _bulk_docs. But some of the documents I want to load contain attachments. If I were creating these documents individually, I could create the attachment along with the document by including it as an inline attachment. But it's not clear whether the _bulk_docs endpoint supports documents with inline attachments. The documentation does not say one way or the other, and my own attempts have so far been unsuccessful.
Can someone please give an authoritative answer on whether the _bulk_docs endpoint of Cloudant supports docs with inline attachments?
The _bulk_docs endpoint does support docs with inline attachments.
I finally got this to work. My earlier attempts failed due to an unrelated problem. I was able to successfully bulk load 51 documents, 14 of which have inline attachments.
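For reference, here is a minimal sketch of such a request body, assuming Cloudant accepts the same inline attachment shape that CouchDB documents: an "_attachments" map on each document whose entries carry a "content_type" and base64-encoded "data". The database URL, document IDs and helper names are hypothetical, and the request is only prepared, not sent.

```python
import base64
import json
import urllib.request
from typing import List, Optional


def inline_attachment(data: bytes, content_type: str) -> dict:
    """Build one entry for a document's _attachments map (base64-encoded body)."""
    return {
        "content_type": content_type,
        "data": base64.b64encode(data).decode("ascii"),
    }


def bulk_docs_request(db_url: str, docs: List[dict],
                      authorization: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST to <db_url>/_bulk_docs."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if authorization:
        headers["Authorization"] = authorization  # e.g. a Basic auth header value
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers=headers,
        method="POST",
    )
```

Documents with and without attachments can be mixed in the same "docs" array. The response is a JSON array with one result per document, so it is worth checking each entry for an error rather than relying on the HTTP status alone.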

Can I use Google App Engine to implement this project?

I'm taking a web application course this semester and want to use Google App Engine (GAE) to implement my course project, but I'm wondering whether GAE can satisfy the project's requirements.
The project is a homework submission system that lets users (students) upload homework to the server and lets teachers check the homework online.
Assume the homework students upload is some HTML and CSS. What confuses me is how to implement the teacher's online checking function. For example:
Student A uploads an HTML file, hello.html, and the teacher wants to use http://xxx.xx/xx/xx/hello.html to check this homework.
Can GAE satisfy this requirement? As far as I know, GAE uses app.yaml to map URLs to different files or HTML pages, but when students upload their homework, they can't change app.yaml, right?
I'm stuck here. Please help me. Thank you!
Yes, you can use GAE to create this application, but you'll have to move away from the idea that you are uploading and serving an HTML file as if it were living directly on the filesystem. You can't do that.
What you can do -- relatively easily -- is store the submitted file or files as datastore objects and provide a URL which takes the desired filename as a parameter and serves it out of the datastore.
You could store the submitted files in a model like this:
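```python
from google.appengine.ext import db

class HomeworkItem(db.Model):
    author = db.UserProperty()
    filename = db.StringProperty()
    # TextProperty already holds unindexed text of any length, newlines
    # included; it does not take a multiline argument.
    content = db.TextProperty()
    submitted_on = db.DateProperty()
```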
The content field is declared as a TextProperty on the assumption that you are dealing with HTML and CSS files; if you ever need to handle binary data, you'd want to use a BlobProperty instead.
You'd need to have two URLs to handle upload and download of assets. You can use a web framework or write some code to handle parameterized URLs, allowing you to encode things like the filename into the URL itself, like this:
http://homeworkapp.edu/review/hello.html
And then the method that handles /review/* URLs would retrieve the data from the datastore and send it back as the reply.
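To make the /review/* idea concrete without tying it to a particular framework, here is a plain-Python sketch. The dict stands in for the HomeworkItem entities, and save_submission, review and REVIEW_PREFIX are hypothetical names; in a real GAE handler the lookup would be a datastore query, something like HomeworkItem.all().filter('filename =', filename).get().

```python
import mimetypes
from typing import Dict, Tuple

# Hypothetical stand-in for the datastore: filename -> submitted content.
# In GAE this would be HomeworkItem entities instead of a dict.
_SUBMISSIONS: Dict[str, str] = {}

REVIEW_PREFIX = "/review/"


def save_submission(filename: str, content: str) -> None:
    """Upload side: store the submitted file under its filename."""
    _SUBMISSIONS[filename] = content


def review(path: str) -> Tuple[int, str, str]:
    """Download side: map /review/<filename> to (status, content type, body)."""
    if not path.startswith(REVIEW_PREFIX):
        return 404, "text/plain", "Not found"
    filename = path[len(REVIEW_PREFIX):]  # the filename is encoded in the URL
    content = _SUBMISSIONS.get(filename)
    if content is None:
        return 404, "text/plain", "Not found"
    # Guess the Content-Type from the extension so the browser renders
    # hello.html as HTML and style.css as CSS.
    content_type, _ = mimetypes.guess_type(filename)
    return 200, content_type or "application/octet-stream", content
```

A real app would probably key submissions by student as well as filename (HomeworkItem has an author field for that), so two students' hello.html files don't collide.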
GAE would satisfy your requirement, but you would need to save each “hello.html” file in either the Blobstore or the Datastore and build some system to retrieve and serve the uploaded files.

Full Text Search in GAE

I'm developing a test app using GAE/J + Objectify and am now trying to query my data with Full Text Search (assuming Full Text Search is the same as querying my data with GQL).
When I go to http://localhost:8888/_ah/admin/search it shows me following error:
There are no Full Text Search indexes in the Empty namespace. You need to add data programatically before you can use this tool to view and edit it.
I do have some data in my database to test the search.
What should I do to enable searching my data using GQL?
I believe FTS is still in a trusted tester program (I haven't seen any official docs, even for the dev environment).
You might want to try signing up here:
https://docs.google.com/a/google.com/spreadsheet/viewform?formkey=dEdWcnRJUXZ2VGR3YmVsT1Q1WVB2Smc6MQ

In Drupal, is there a way to index files (pdf, doc) that were submitted via a Webform?

I'm trying to figure out how to index and search PDF, DOC, and maybe TXT files that were uploaded via a webform. I've found a module (Search API attachments) that indexes files, but it appears to index only files that are attached to nodes. :(
Our client wants to be able to search the contents of resumés that are submitted from a webform.
If your clients are expecting hundreds of nodes, it might be worthwhile to set up an Apache Solr server. Then you can use Tika to index all kinds of files: http://tika.apache.org/
If that's not an option, you can write a custom module that uses the Webform API that saves the attached file as a node... and then use your Search API attachments module.
