Microsoft Azure Search - azure-cognitive-search

I have the following use case:
I want to store images in Microsoft Blob storage,
and search them by text input such as 'water', so that every image that contains water in any way appears in the search results.
I followed the lab at the link below:
https://github.com/Azure/LearnAI-Cognitive-Search/blob/master/05-Lab-2-Image-Skills.md
However, I learned there that there are only two predefined image skills, ImageAnalysisSkill and OcrSkill, and neither returns the full images as search results.
Please help...

Programmatically, you can use the Image Analysis Cognitive Skill to automatically extract tags from the images.
See https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-image-analysis for more information on this skill.
Your skill would look something like this:
{ "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
"context": "/document/normalized_images/*",
"visualFeatures": [
"Tags",
"Description"
],
"defaultLanguageCode": "en",
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "tags",
"targetName": "myTags"
},
{
"name": "description",
"targetName": "myDescription"
}
]
}
Then in the index, make sure to create a field of type Collection(Edm.String) to contain the list of tags. Let's call that the imageTags field. Make sure that field is searchable.
In the output field mappings (a property of the indexer) you will need to map the list of tags to the newly created imageTags field as follows:
"outputFieldMappings": [
{
"sourceFieldName": "/document/normalized_images/*/myDescription/tags/*",
"targetFieldName": "imageTags"
}
]
This ensures that each tag found in the images is inserted into the imageTags array.
If you are not already familiar with it, please also read this document, which explains how to extract normalized_images: https://learn.microsoft.com/en-us/azure/search/cognitive-search-concept-image-scenarios
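Once the tags are in the index, a text search such as 'water' can be restricted to the imageTags field. The following is a minimal sketch of building such a query URL, not a definitive client: the service name, index name, and api-version value are assumptions for illustration, and the request would still need a valid api-key header.

```javascript
// Sketch: build a search URL that looks for a word in the imageTags field.
// "my-service", "images-index" and the api-version are hypothetical values;
// imageTags is the field name suggested in the answer above.
function buildImageSearchUrl(serviceName, indexName, searchText, apiVersion = "2020-06-30") {
  const params = new URLSearchParams({
    "api-version": apiVersion,
    search: searchText,
    searchFields: "imageTags", // only match against the extracted tags
  });
  const index = encodeURIComponent(indexName);
  return `https://${serviceName}.search.windows.net/indexes/${index}/docs?${params.toString()}`;
}

// Usage: send this URL with a GET request and an "api-key" header.
const waterUrl = buildImageSearchUrl("my-service", "images-index", "water");
console.log(waterUrl);
```

Each hit is an index document, so the index also needs a field that identifies the blob if the image itself has to be shown.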

Related

Why are my dataset features IDs undefined in Mapbox GL while I have set them?

I struggle to set feature IDs using mapbox GL.
I've read that you can auto-generate IDs using generateId:true in your source:
Whether to generate ids for the geojson features. When enabled, the
feature.id property will be auto assigned based on its index in the
features array, over-writing any previous values.
Except that I want to use my data in places other than the Mapbox map (a list of markers next to it), so I would like to set the IDs manually in order to target a feature on the map from that list. So I don't want to use generateId:true here.
In the doc, their dataset example is like
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"id": "marker-iv1qi3x10",//an ID here
"title": "Burnham Park",
"description": "A lakefront park on Chicago's south side.",
"marker-size": "medium",
"marker-color": "#1087bf",
"marker-symbol": "marker-blue"
},
"geometry": {
"coordinates": [
-87.603735,
41.829985
],
"type": "Point"
},
"id": "0de616c939ce2f31676ff0294c78321b"//another ID here
}
]
}
So they have an ID in the feature object "id": "0de616c939ce2f31676ff0294c78321b", and another ID in the properties of that feature "id": "marker-iv1qi3x10".
I guess that the ID that mapbox uses internally for features (and auto-generated when generateId is set to true in your source) is the first one.
Let's say I set the IDs manually:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"id": "customPropId01"
},
"geometry": {
"coordinates": [
-87.603735,
41.829985
],
"type": "Point"
},
"id": "customID01"
}
]
}
When inspecting the data when the source has loaded, my custom IDs are still in place (using this code).
//when a specific source has been loaded
map.on('sourcedata', (e) => {
if (e.sourceId !== 'markers') return;
if (!e.isSourceLoaded) return;
console.log("SOURCE DATA LOADED",e.source);
});
But when I click on a marker on the map and log it, the ID property of my feature has been removed and is now undefined.
Rather than using my input source data to list my markers, I also had a look at querySourceFeatures, but this doesn't help since it returns only the features within the map bounding box. I want my listing to display all the features, which is why I need to use the "raw" source data there.
This is driving me crazy.
Does anyone know why the IDs are unset and how I could fix this?
Thanks!
Your id is a string. If you want to preserve it, the string must be castable to an integer, as the docs explain:
Features are identified by their id attribute, which must be an integer or a string that can be cast to an integer.
I found out that you can set promoteId to the feature property you want to use as IDs :
map.addSource('markers',{
type:"geojson",
data:"http://www.example.com/markers.geojson",
promoteId:'unique_id'
})
With promoteId:'unique_id', the property unique_id of my features will be used to set each feature's ID.
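If the data only carries the string ID at the top level of each feature, it can be copied into the properties before the source is added, so that promoteId has something to promote and the list can look features up by the same value. This is a minimal sketch under that assumption; the helper names and the unique_id property name are hypothetical.

```javascript
// Sketch: copy each feature's top-level id into properties[propName] so that
// promoteId: propName can use it. Returns a new collection; the input is not mutated.
function copyIdsToProperties(collection, propName = "unique_id") {
  return {
    ...collection,
    features: collection.features.map((feature) => ({
      ...feature,
      properties: { ...feature.properties, [propName]: feature.id },
    })),
  };
}

// Look up a feature in the raw data by the same id that a click on the map reports.
function findFeatureById(collection, id, propName = "unique_id") {
  return collection.features.find(
    (feature) => feature.properties[propName] === id
  );
}

const rawMarkers = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      properties: { id: "customPropId01" },
      geometry: { type: "Point", coordinates: [-87.603735, 41.829985] },
      id: "customID01",
    },
  ],
};

const preparedMarkers = copyIdsToProperties(rawMarkers);
console.log(findFeatureById(preparedMarkers, "customID01"));
```

The prepared collection can then be passed as the data of the source, while the list keeps using the same object to find a feature from a clicked ID.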

JSON schema deeper object uniqueness

I'm trying to get into JSON schema definitions and wanted to find out, how to achieve a deeper object uniqueness in the schema definition. Please look at the following example definition, in this case a simple IO of a module.
{
"$schema": "http://json-schema.org/draft-06/schema#",
"type": "object",
"required": ["modulIOs"],
"properties": {
"modulIOs": {
"type": "array",
"uniqueItems": true,
"items": {
"allOf": [
{
"type": "object",
"required": ["ioPosition","ioType","ioFunction"],
"additionalProperties": false,
"properties": {
"ioPosition": {
"type": "integer"
},
"ioType": {
"type":"string",
"enum": ["in","out"]
},
"ioFunction": {
"type":"string"
}
}
}
]
}
}
}
}
When I validate the following against, e.g., draft-06, I get a positive validation.
{"modulIOs":
[
{
"ioPosition":1,
"ioType":"in",
"ioFunction":"240 V AC in"
},
{
"ioPosition":1,
"ioType":"in",
"ioFunction":"24 V DC in"
}
]
}
I'm aware that the validation is successful because the validator does what it is intended to do: it checks the structure of a JSON object. But is there a way to validate object value data in deeper objects, or do I need to perform the check elsewhere?
This is not currently possible with JSON Schema (at draft-7).
There is an issue raised on the official spec repo github for this: https://github.com/json-schema-org/json-schema-spec/issues/538
If you (or anyone reading this) really wants this, please thumbsup the first issue comment.
It's currently unlikely to make it into the next draft, and even if it did, implementations may be slow to pick it up.
You'll need to do this validation after your JSON Schema validation process.
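Such a post-validation step can be small. The sketch below assumes that two entries count as duplicates when they share ioPosition and ioType (the question does not state the exact rule), and the function name is hypothetical.

```javascript
// Sketch: find entries in modulIOs that repeat the same combination of key fields.
// The default key fields are an assumption about what "unique" means here.
function findDuplicateIOs(modulIOs, keyFields = ["ioPosition", "ioType"]) {
  const firstSeenAt = new Map(); // composite key -> index of first occurrence
  const duplicates = [];
  modulIOs.forEach((io, index) => {
    const key = JSON.stringify(keyFields.map((field) => io[field]));
    if (firstSeenAt.has(key)) {
      duplicates.push({ firstIndex: firstSeenAt.get(key), duplicateIndex: index });
    } else {
      firstSeenAt.set(key, index);
    }
  });
  return duplicates;
}

const moduleData = {
  modulIOs: [
    { ioPosition: 1, ioType: "in", ioFunction: "240 V AC in" },
    { ioPosition: 1, ioType: "in", ioFunction: "24 V DC in" },
  ],
};

// Run this after the JSON Schema validation has passed.
console.log(findDuplicateIOs(moduleData.modulIOs));
```

An empty result means no two entries share the chosen key fields; passing a different keyFields array changes the uniqueness rule.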
You can validate the values of your object's fields with JSON Schema validation keywords.
For example, if you need to check if ioPosition is between 0 and 100 you can use:
"ioPosition": {
"type": "integer",
"minimum": 0,
"maximum": 100
}
If you need to validate the ioFunction field, you can use a regular expression such as:
"ioFunction": {
"type": "string",
"pattern": "^[0-9]+ V [AD]C"
}
Take a look at json-schema-validation.

Prioritize search results based on certain parameter in AzureSearch.

I have an index on AzureSearch similar to this one:
"fields": [
{
"name": "key",
"type": "Edm.String",
"filterable": true
},
{
"name": "title",
"type": "Edm.String",
"searchable": true
},
{
"name": "followers",
"type": "Collection(Edm.String)",
"filterable": true
}
]
Here, title is the title of a Post and it is text searchable. followers contains the user IDs of the users who are following that particular Post.
I get the userId of the currently logged-in user from the session. When a user runs a text search, I want the Posts that the current user is following to appear at the top.
Is this achievable in AzureSearch using ScoringProfiles or anything else?
Tag boosting in a ScoringProfile does exactly that. All you need to do is add a scoring profile like the one below:
{
"scoringProfiles": [
  {
    "name": "personalized",
    "functions": [
    {
      "type": "tag",
      "boost": 2,
      "fieldName": "followers",
      "tag": { "tagsParameter": "follower" }
    }
    ]
  }
  ]
}
Then, at query time, issue a search query that names the scoring profile and passes the parameter used to customize the ranking:
docs?search=some%20post&scoringProfile=personalized&scoringParameter=follower:user_abc
Hope this helps. You can read more about it here.
https://azure.microsoft.com/en-us/blog/personalizing-search-results-announcing-tag-boosting-in-azure-search/
Nate

Watson conversation service validation

Is there any way of validating the user input which uses context variable?
My context variable stores the email address, so I would like the validation to check for the "@" sign.
Is there any way of doing this?
You can use a regex on the context variable to extract the e-mail address, and then validate the information in your own code (for example, compare variableEmail with context.mail and act on the result). I can't help you with the code because you did not say which programming language you use.
But here is how to save the mail address in a context variable.
I made a conversation example so you know how to do it. Here are the steps:
Part I, Part II, Part III: [screenshots not available]
The JSON files
Name example:
{
"context": {
"name": "<? input.text?>"
},
"output": {
"text": {
"values": [
"Hi $name, please report your e-mail address."
],
"selection_policy": "sequential"
}
}
}
Mail example:
{
"context": {
"mail": "<? input.text.extract('[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\\.[a-zA-Z]+){1,}',0) ?>"
},
"output": {
"text": {
"values": [
"Thanks very much, your name is $name and your mail is $mail."
],
"selection_policy": "sequential"
}
}
}
And finally, the result is:
If you want to know how to validate the mail address, search for how to do it in the programming language your application is written in, and don't forget: the information is saved in context.name or context.mail, according to my example.
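As an illustration of the application-side check, here is a minimal sketch in JavaScript. The language is an assumption, since the question does not name one, and the function names are hypothetical. It reuses the pattern from the extract() call above, anchored so that the whole value has to match.

```javascript
// Sketch: validate the value stored in context.mail on the application side.
// Same character classes as the extract() pattern, anchored with ^ and $.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]+)+$/;

function isValidEmail(value) {
  return typeof value === "string" && EMAIL_PATTERN.test(value.trim());
}

// Returns a short status the application can act on before replying to the user.
function checkMailInContext(context) {
  if (!context || context.mail === undefined || context.mail === null) {
    return "missing";
  }
  return isValidEmail(context.mail) ? "valid" : "invalid";
}

console.log(checkMailInContext({ name: "Ana", mail: "ana@example.com" }));
```

This is a pragmatic format check only; it does not prove that the address exists.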

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed settings are all at their default configuration. I have a very small data set of two items for experimentation purposes. The items have several fields but I query only on one, which is a multi-valued field (an array of strings). The doc looks like this:
{
"_index": "index_profile",
"_type": "items",
"_id": "ega",
"_version": 1,
"found": true,
"_source": {
"clicked": [
"ega"
],
"profile_topics": [
"Twitter",
"Entertainment",
"ESPN",
"Comedy",
"University of Rhode Island",
"Humor",
"Basketball",
"Sports",
"Movies",
"SnapChat",
"Celebrities",
"Rite Aid",
"Education",
"Television",
"Country Music",
"Seattle",
"Beer",
"Hip Hop",
"Actors",
"David Cameron",
... // other topics
],
"id": "ega"
}
}
A sample query is:
GET /index_profile/items/_search
{
"size": 10,
"query": {
"bool": {
"should": [{
"terms": {
"profile_topics": [
"Basketball"
]
}
}]
}
}
}
Again there are only two items and the one listed should match the query because the profile_topics field matches with the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should.
With Solr I would probably specify that the fields are multi-valued string arrays and are to have no norms and no analyzer so profile_topics are not stemmed or tokenized since all values should be treated as tokens (even the spaces). Not sure this would solve the problem but it is how I treat similar data on Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue. If so, how do I solve this so that even with two items the query will return ega? If possible, I'd like to solve this index- or type-wide rather than per field.
The value Basketball (with a capital B) in a terms query is not analyzed, so it is looked up in the Elasticsearch index exactly as written.
You say you have the defaults. If so, indexing Basketball under the profile_topics field means that the actual term in the index will be basketball (with a lowercase b), which is the output of the standard analyzer. So either you set profile_topics to not_analyzed, or you search for basketball and not Basketball.
Read this about terms.
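The mismatch can be reproduced outside Elasticsearch with a toy model. The sketch below is not the real standard analyzer, only a simplified stand-in (lowercase, split on anything that is not a letter or digit), and all function names are hypothetical.

```javascript
// Simplified stand-in for the standard analyzer: lowercase the text and
// split it on anything that is not an ASCII letter or digit.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Terms stored for an analyzed multi-valued field.
function indexAnalyzed(values) {
  return new Set(values.flatMap(analyze));
}

// Terms stored for a not_analyzed field: each value is kept verbatim.
function indexNotAnalyzed(values) {
  return new Set(values);
}

// A terms query does not analyze its input: it looks for exact terms.
function termsQueryMatches(indexedTerms, queryTerms) {
  return queryTerms.some((term) => indexedTerms.has(term));
}

const topics = ["Basketball", "University of Rhode Island", "Hip Hop"];
const analyzedIndex = indexAnalyzed(topics);
const rawIndex = indexNotAnalyzed(topics);

console.log(termsQueryMatches(analyzedIndex, ["Basketball"])); // no exact term with capital B
console.log(termsQueryMatches(analyzedIndex, ["basketball"]));
console.log(termsQueryMatches(rawIndex, ["University of Rhode Island"]));
```

In this model, multi-word values such as "University of Rhode Island" only match as a whole against the not_analyzed variant, which is the behavior the question asks for.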
Regarding setting all the fields to not_analyzed: you could do that with a dynamic template. Alternatively, with a template you can do what Logstash does: define a .raw subfield for each string field and make only this subfield not_analyzed. The original (parent) field still holds the analyzed version of the same text, in case you want to use the analyzed field in the future.
Take a look at this dynamic template. It's the one Logstash is using.
More specifically:
{
"template": "your_indices_name-*",
"mappings": {
"_default_": {
"_all": {
"enabled": true,
"omit_norms": true
},
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
]
}
}
}
