How to decode encoded data in Azure Search? - azure-cognitive-search

I am using Azure Search for search in my Web API.
Indexer Source used: SQL Azure Database
The column that is being indexed has special characters. Hence I am encoding the data when pushing into Azure Search.
Is there a way to decode the data as part of retrieval from Azure Search?
Only documentation I could find online on decoding in Azure Search is for decoding metadata of blobs.

The question involves using a table column that may contain key-invalid characters as the document key.
To make this work, define a field mapping and apply base64Encode function. See https://azure.microsoft.com/en-us/documentation/articles/search-indexer-field-mappings/#base64EncodeFunction
Note that Azure Search will store the encoded value as the key of your document. If you actually want to search / filter on the original value, fork the original value in your field mappings, like this:
"fieldMappings" : [
{ "sourceFieldName" : "Id", "targetFieldName" : "Id", "mappingFunction" : { "name" : "base64Encode" } },
{ "sourceFieldName" : "Id", "targetFieldName" : "OriginalId" }
]
HTH!

Related

How to filter an array in Azure Search

I have following Data in my Index,
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents where the lists has Pike in it.
Though a full search query works with Any I could't get the contains work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However i am not sure how to search only with Pike.
$filter=lists/any(t: t eq 'Pike')
I guess the eq looks for a full text search, is there any way with the given data structure I should make this query work.
Currently the field lists has no searchable property only the filterable property.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
The documentation for complex types is here.

Cloiudant using $nin There is no index available for this selector

I created a JSON index in cloudant on _id like so:
{
"index": {
"fields": [ "_id"]
},
"ddoc": "mydesigndoc",
"type": "json",
"name": "myindex"
}
First off, unless I specified the index name, somehow cloudant could not differentiate between the index I created and the default text based index for _id (if that is truly the case, then this is a bug I believe)
I ran the following query against the _find endpoint of my db:
{
"selector": {
"_id": {
"$nin":["v1","v2"]
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
The result was this error:
{"error":"no_usable_index","reason":"There is no index available for this selector."}
if I change "$nin":["v1","v2"] to "$eq":"v1" then it works fine, but that is not the query I am after.
So in order to get what I want, I had to this to my selector "_id": {"$gt":null}, which now looks like:
{
"selector": {
"_id": {
"$nin":["v1","v2"],
"$gt":null
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
Why is this behavior? This seems to be only happening if I use the _id field in the selector.
What are the ramifications of adding "_id": {"$gt":null} to my selector? Is this going to scan the entire table rather than use the index?
I would appreciate any help, thank you
Cloudant Query can use Cloudant's pre-existing primary index for selection and range querying without you having to create your own index in the _id field.
Unfortunately, the index doesn't really help when using the $nin operator - Cloudant would have to scan the entire database to check for documents which are not in your list - the index doesn't really get it any further forward.
By changing the operator to $eq you are playing to the strengths of the index which can be used to locate the record you need quickly and efficiently.
In short, the query you are attempting is inefficient. If your query was more complex e.g. the equivalent of WHERE colour='red' AND _id NOT IN ['a','b'] then a Cloudant index on colour could be used to reduce the data set to a reasonable level before doing the $nin operation on the remaining data.

Cloudant search documents that appear after certain id

There is a cloudant database that stores some documents.
There is also mobile app that takes those documents by using search indexes.
Question is:
Is it possible to make query "get me all indexes that appear after this one"?
For example:
I start app, and get from database documents with id 'aaa','aab' and 'aac'.
I want to store last id - 'aac' - in memory of my app.
Then, when I start the app, I want to get from database documents that appeared after 'aac'.
I think the main problem will be, that _ids are assigned as random strings, but I want to be sure.
when searching the index, try including the selector field in JSON object of the request body:
{
"selector": {
"_id": {
"$gt": "the_previous_id"
}
},
"sort": [
{
"_id": "asc"
}
]
}
in addition, from https://docs.cloudant.com/document.html:
"The _id field is either created by you, or generated automatically as a UUID by Cloudant."
therefore, it is possible to provide your own _ids when creating a document if the Cloudant generated _ids are not working for you.
condition operators:
https://docs.cloudant.com/cloudant_query.html#condition-operators

Matching elasticsearch data indexed by Titan

I have indexed titan data in elasticsearch, it worked fine and indexed but when i see the data in elasticsearch using REST API. the column/property name looks different than from Titan.
For example i have indexed age while inserting data to Titan
final PropertyKey age = mgmt.makePropertyKey("age").dataType(Integer.class).make();
mgmt.buildIndex("vertices",Vertex.class).addKey(age).buildMixedIndex(INDEX_NAME);
and if i see same in elasticsearch
{
"_index" : "titan",
"_type" : "vertices",
"_id" : "sg",
"_score" : 1.0,
"_source":{"6bp":30}
},
Looking at the data i can understand "6bp" is age. how this conversion is done? How can i decode it.?
My goal is to insert data to Titan index on ElasticSearch. The user query should search on ElasticSearch using ElasticSearch client becuase we need more search functionality that ElasticSearch supports, if data is searched then get the related result using Titan query.
The field names are Long encoded. You can reverse encode using this class
com.thinkaurelius.titan.util.encoding.LongEncoding
or, an even better option if you can use it, would be to simply specify the search field names explicitly using the field mapping:
By default, Titan will encode property keys to generate a unique field name for the property key in the mixed index. If one wants to query the mixed index directly in the external index backend can be difficult to deal with and are illegible. For this use case, the field name can be explicitly specified through a parameter.
mgmt = g.getManagementSystem()
name = mgmt.makePropertyKey('bookname').dataType(String.class).make()
mgmt.buildIndex('booksBySummary',Vertex.class).addKey(name,com.thinkaurelius.titan.core.schema.Parameter.of('mapped-name','bookname')).buildMixedIndex("search")
mgmt.commit()
http://s3.thinkaurelius.com/docs/titan/0.5.1/index-parameters.html#_field_mapping

Update a new field to existing document

is there possibility to update a new field to an existing document?
For example:
There is an document with several fields, e.g.
ID=99999
Field1:text
Field2:text
This document is already in the index, now I want to insert a new field to this document WITHOUT the old data:
ID=99999
Field3:text
For now, the old document will be deleted and a new document with the ID will be created. So if I now search for the ID 99999 the result will be:
ID=99999
Field3:text
I read this at the Solr Wiki
How can I update a specific field of an existing document?
I want update a specific field in a document, is that possible? I only need to index one field for >a specific document. Do I have to index all the document for this?
No, just the one document. Let's say you have a CMS and you edit one document. You will need to re-index this document only by using the the add solr statement for the whole document (not one field only).
In Lucene to update a document the operation is really a delete followed by an add. You will need >to add the complete document as there is no such "update only a field" semantics in Lucene.
So is there any solution for this? Will this function be implemented in a further version (I currently use 3.6.0). As a workaround, I thought about writing a script or an application, which will collect the existing fields, add the new field and update the whole document. But I think this will suffer performance. Do you have any other ideas?
Best regards
I have 2 answers for you (both more or less bad):
To update filed with in document in Solr you have to reindex whole document (to update Field3 within document ID:99999 you have to reindex that document with values for all fields)
In Solr 4 they implemented feature like that, but they have a condition: all fields have to be stored, not just indexed. What is happening that is they are using stored values and reindexing document in the background. If you are interested, there is nice article about it: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ This solution have obvious flaw and that is size of index when you are storing all fields.
I hope that this will help you with your problem. If you have some more questions, please ask
It is possible to do this in Solr 4. E.g. Consider the following document
{
"id": "book123",
"name" : "Solr Rocks"
}
In order to add an author field to the document the field value would be a json object with "set" attribute and the field value
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
{"id" : "book123",
"author" : {"set":"The Community"}
}
]'
Your new document
$ curl http://localhost:8983/solr/get?id=book123
will be
{
"doc" : {
"id" : "book123",
"name" : "Solr Rocks"
"author": "The Community"
}
}
Set will add or replace the author field. Along with set you also have the option to increment(inc) and adding(add)
From Solr 4 onwards you can update a field in solr ....no need to reindex the entire indexes .... various modifiers are supported like ....
set – set or replace a particular value, or remove the value if null is specified as the new value
add – adds an additional value to a list
remove – removes a value (or a list of values) from a list
removeregex – removes from a list that match the given Java regular expression
inc – increments a numeric value by a specific amount (use a negative value to decrement)
example :
document
{
"id": "1",
"name" : "Solr"
"views" : "2"
}
now update with
$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "1",
"author" : {"set":"Neal Stephenson"},
"views" : {"inc":3},
}
]'
will result into
{
"id": "1",
"name" : "Solr"
"views" : "5"
"author" : "Neal Stephenson"
}

Resources