How do we get the document file url using the Watson Discovery Service? - ibm-watson

I don't see a solution to this using the available api documentation.
It is also not available on the web console.
Is it possible to get the file url using the Watson Discovery Service?

If you need to store the original source/file URL, you can include it as a field within your documents in the Discovery service, then you will be able to query that field back out when needed.

I also struggled with this request but ultimately got it working using Python bindings into Watson Discovery. The online documentation and API reference is very poor; here's what I used to get it working:
(Assume you have a Watson Discovery service and have a created collection):
# Programmatic upload and retrieval of documents and metadata with Watson Discovery
from watson_developer_cloud import DiscoveryV1
import os
import json
discovery = DiscoveryV1(
version='2017-11-07',
iam_apikey='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
url='https://gateway-syd.watsonplatform.net/discovery/api'
)
environments = discovery.list_environments().get_result()
print(json.dumps(environments, indent=2))
This gives you your environment ID. Now append to your code:
collections = discovery.list_collections('{environment-id}').get_result()
print(json.dumps(collections, indent=2))
This will show you the collection ID for uploading documents into programmatically. You should have a document to upload (in my case, an MS Word document), and its accompanying URL from your own source document system. I'll use a trivial fictitious example.
NOTE: the documentation DOES NOT tell you to append , 'rb' to the end of the open statement, but it is required when uploading a Word document, as in my example below. Raw text / HTML documents can be uploaded without the 'rb' parameter.
url = {"source_url":"http://mysite/dis030.docx"}
with open(os.path.join(os.getcwd(), '{path to your document folder with trailing / }', 'dis030.docx'), 'rb') as fileinfo:
add_doc = discovery.add_document('{environment-id}', '{collections-id}', metadata=json.dumps(url), file=fileinfo).get_result()
print(json.dumps(add_doc, indent=2))
print(add_doc["document_id"])
Note the setting up of the metadata as a JSON dictionary, and then encoding it using json.dumps within the parameters. So far I've only wanted to store the original source URL but you could extend this with other parameters as your own use case requires.
This call to Discovery gives you the document ID.
You can now query the collection and extract the metadata using something like a Discovery query:
my_query = discovery.query('{environment-id}', '{collection-id}', natural_language_query="chlorine safety")
print(json.dumps(my_query.result["results"][0]["metadata"], indent=2))
Note - I'm extracting just the stored metadata here from within the overall returned results - if you instead just had:
print(my_query) you'll get the full response from Discovery ... but ... there's a lot to go through to identify just your own custom metadata.

Related

Flask Restplus mutlipart/form-data model

Is it possible with Flask Restplus to create a model for a multipart/form-data request so that I can use it to validate the input with #api.expect?
I have this complex data structure for which I've created a api.namespace().model that has to be received together with a file. However when I tried to document the endpoint I noticed that this doesn't seem to be supported by Flask Restplus.
I've tried to find something along the lines of
parser = ns.parser()
parser.add_argument("jsonModel", type=Model, location="form")
parser.add_argument("file", type=FileStorage, location="files")
and
formModel = ns.model("myForm", {"jsonModel": fields.Nested(myModel), "file": fields.File})
But neither methods seem to support this kind of behavior.

Make a solr query from Geotools through geoserver

I come here because I am searching (like the title mentionned) to do a query from geotools (through geoserver) to get feature from a solr index.
To be more precise :
I saw on geoserver user manual that i can do query on solr like this in http :
http://localhost:8080/geoserver/wfs?service=WFS&version=1.1.0&request=GetFeature
&typeName=mySolrLayer
&format="xxx"
&viewparams=q:"mySolrQuery"
The important part on this URL is the viewparams that I want to use directly from geotools.
I have already test this case (this is a part of my code):
url = new URL(
"http://localhost:8080/geoserver/wfs?request=GetCapabilities&VERSION=1.1.0";
);
Map<String, String> param = new HashMap();
params.put(WFSDataStoreFactory.URL.key, url);
param.put("viewparams","q:myquery");
Hints hints = new Hints();
hints.put(Hints.VIRTUAL_TABLE_PARAMETERS, viewParams);
query.setHints(hints);
...
featureSource.getFeatures(query);
But here, it seems to doesn't work, the url send to geoserver is a normal "GET FEATURE" request without the viewparams parameter.
I tried this with geotools-12.2 ; geotools-13.2 and geotools-15-SNAPSHOT but I didn't succeed to pass the query, geoserver send me all the feature in my database and doesn't take "viewparams" as a param.
I need to do it like this because actually the query come from another program and I would easily communicate this query to another part of the project...
If someone can help me ?
There doesn't currently seem to be a way to do this in the GeoTool's WFSDatastore implementations as the GetFeature request is constructed from the URL provided by the getCapabilities document. This is as the standard requires but it may be worth making a feature enhancement request to allow clients to override this string (as QGIS does for example) which would let you specify the additional parameter in your base URL which would then be passed to the server as you need.
Unfortunately the WFS module lives in Unsupported land at present so unless you have resources to work on this issue yourself and can provide a PR to implement it there is not a great chance of it being implemented.

Parsing Swagger JSON data and storing it in .net class

I want to parse Swagger data from the JSON I get from {service}/swagger/docs/v1 into dynamically generated .NET class.
The problem I am facing is that different APIs can have different number of parameters and operations. How do I dynamically parse Swagger JSON data for different services?
My end result should be list of all APIs and it's operations in a variable on which I can perform search easily.
Did you ever find an answer for this? Today I wanted to do the same thing, so I used the AutoRest open source project from MSFT, https://github.com/Azure/autorest. While it looks like it's designed for generating client code (code to consume the API documented by your swagger document), at some point on the way producing this code it had to of done exactly what you asked in your question - parse the Swagger file and understand the operations, inputs and outputs the API supports.
In fact we can get at this information - AutoRest publically exposes this information.
So use nuget to install AutoRest. Then add a reference to AutoRest.core and AutoRest.Model.Swagger. So far I've just simply gone for:
using Microsoft.Rest.Generator;
using Microsoft.Rest.Generator.Utilities;
using System.IO;
...
var settings = new Settings();
settings.Modeler = "Swagger";
var mfs = new MemoryFileSystem();
mfs.WriteFile("AutoRest.json", File.ReadAllText("AutoRest.json"));
mfs.WriteFile("Swagger.json", File.ReadAllText("Swagger.json"));
settings.FileSystem = mfs;
var b = System.IO.File.Exists("AutoRest.json");
settings.Input = "Swagger.json";
Modeler modeler = Microsoft.Rest.Generator.Extensibility.ExtensionsLoader.GetModeler(settings);
Microsoft.Rest.Generator.ClientModel.ServiceClient serviceClient;
try
{
serviceClient = modeler.Build();
}
catch (Exception exception)
{
throw new Exception(String.Format("Something nasty hit the fan: {0}", exception.Message));
}
The swagger document you want to parse is called Swagger.json and is in your bin directory. The AutoRest.json file you can grab from their GitHub (https://github.com/Azure/autorest/tree/master/AutoRest/AutoRest.Core.Tests/Resource). I'm not 100% sure how it's used, but it seems it's needed to inform the tool about what is supports. Both JSON files need to be in your bin.
The serviceClient object is what you want. It will contain information about the methods, model types, method groups
Let me know if this works. You can try it with their resource files. I used their ExtensionLoaderTests for reference when I was playing around(https://github.com/Azure/autorest/blob/master/AutoRest/AutoRest.Core.Tests/ExtensionsLoaderTests.cs).
(Also thank you to the Denis, an author of AutoRest)
If still a question you can use Swagger Parser library:
https://github.com/swagger-api/swagger-parser
as simple as:
// parse a swagger description from the petstore and get the result
SwaggerParseResult result = new OpenAPIParser().readLocation("https://petstore3.swagger.io/api/v3/openapi.json", null, null);

Google Doc Download URL is null

I am trying to read the google doc using drive api. When I print the file metadata it prints as below:
[s~sakshumweb-hrd/3.370043974717039698].<stdout>: invite_friends_email:{"displayName":"Vivek Kumar","isAuthenticatedUser":true,"kind":"drive#user","permissionId":"13178633125197568962","picture":{"url":"https://lh5.googleusercontent.com/-4ElLv3j4-eI/AAAAAAAAAAI/AAAAAAAAAfQ/3b6TZenyTyA/s64/photo.jpg"}}
I 2013-09-06 19:35:41.489
[s~sakshumweb-hrd/3.370043974717039698].<stdout>: Download url is:null
The code printing it is as below:
System.out.println(file.getTitle() + ":" + file.getOwners().get(0) );
System.out.println("Download url is:" + file.getDownloadUrl());
Any idea why it comes null? Ultimately I want to read the file contents in my GAE for java code. So, if there is any other way to read then that would be fine too.
Look at the complete item metadata. If there is no download URL, it is usually because the document is a native Google Doc, in which case you should use exportLinks in place of downloadUrl. Another possibility is that you only have metadata scope, so don't have permission to access the content.

Using bottle.py and blobstore GAE

I recently started using bottle and GAE blobstore and while I can upload the files to the blobstore I cannot seem to find a way to download them from the store.
I followed the examples from the documentation but was only successful on the uploading part. I cannot integrate the example in my app since I'm using a different framework from webapp/2.
How would I go about creating an upload handler and download handler so that I can get the key of the uploaded blob and store it in my data model and use it later in the download handler?
I tried using the BlobInfo.all() to create a query the blobstore but I'm not able to get the key name field value of the entity.
This is my first interaction with the blobstore so I wouldn't mind advice on a better approach to the problem.
For serving a blob I would recommend you to look at the source code of the BlobstoreDownloadHandler. It should be easy to port it to bottle, since there's nothing very specific about the framework.
Here is an example on how to use BlobInfo.all():
for info in blobstore.BlobInfo.all():
self.response.out.write('Name:%s Key: %s Size:%s Creation:%s ContentType:%s<br>' % (info.filename, info.key(), info.size, info.creation, info.content_type))
for downloads you only really need to generate a response that includes the header "X-AppEngine-BlobKey:[your blob_key]" along with everything else you need like a Content-Disposition header if desired. or if it's an image you should probably just use the high performance image serving api, generate a url and redirect to it.... done
for uploads, besides writing a handler for appengine to call once the upload is safely in blobstore (that's in the docs)
You need a way to find the blob info in the incoming request. I have no idea what the request looks like in bottle. The Blobstoreuploadhandler has a get_uploads method and there's really no reason it needs to be an instance method as far as I can tell. So here's an example generic implementation of it that expects a webob request. For bottle you would need to write something similar that is compatible with bottles request object.
def get_uploads(request, field_name=None):
"""Get uploads for this request.
Args:
field_name: Only select uploads that were sent as a specific field.
populate_post: Add the non blob fields to request.POST
Returns:
A list of BlobInfo records corresponding to each upload.
Empty list if there are no blob-info records for field_name.
stolen from the SDK since they only provide a way to get to this
crap through their crappy webapp framework
"""
if not getattr(request, "__uploads", None):
request.__uploads = {}
for key, value in request.params.items():
if isinstance(value, cgi.FieldStorage):
if 'blob-key' in value.type_options:
request.__uploads.setdefault(key, []).append(
blobstore.parse_blob_info(value))
if field_name:
try:
return list(request.__uploads[field_name])
except KeyError:
return []
else:
results = []
for uploads in request.__uploads.itervalues():
results += uploads
return results
For anyone looking for this answer in future, to do this you need bottle (d'oh!) and defnull's multipart module.
Since creating upload URLs is generally simple enough and as per GAE docs, I'll just cover the upload handler.
from bottle import request
from multipart import parse_options_header
from google.appengine.ext.blobstore import BlobInfo
def get_blob_info(field_name):
try:
field = request.files[field_name]
except KeyError:
# Maybe form isn't multipart or file wasn't uploaded, or some such error
return None
blob_data = parse_options_header(field.content_type)[1]
try:
return BlobInfo.get(blob_data['blob-key'])
except KeyError:
# Malformed request? Wrong field name?
return None
Sorry if there are any errors in the code, it's off the top of my head.

Resources