I have indexed Titan data in Elasticsearch. Indexing worked fine, but when I view the data in Elasticsearch through the REST API, the column/property names look different from the ones in Titan.
For example, I indexed age while inserting data into Titan:
final PropertyKey age = mgmt.makePropertyKey("age").dataType(Integer.class).make();
mgmt.buildIndex("vertices",Vertex.class).addKey(age).buildMixedIndex(INDEX_NAME);
and if I look at the same data in Elasticsearch:
{
"_index" : "titan",
"_type" : "vertices",
"_id" : "sg",
"_score" : 1.0,
"_source":{"6bp":30}
},
Looking at the data, I can tell that "6bp" is age. How is this conversion done, and how can I decode it?
My goal is to insert data into Titan and have it indexed in Elasticsearch. User queries should run against Elasticsearch directly through an Elasticsearch client, because we need the richer search functionality that Elasticsearch supports; once a document matches, we fetch the related result with a Titan query.
The field names are long-encoded. You can reverse the encoding using this class:
com.thinkaurelius.titan.util.encoding.LongEncoding
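As a minimal sketch (assuming Titan 0.5.x, and that matching keys by id fits your setup), decoding looks like this:

import com.thinkaurelius.titan.util.encoding.LongEncoding;

// "6bp" is the long-encoded id of the property key backing the index field.
long keyId = LongEncoding.decode("6bp"); // reverse of LongEncoding.encode(keyId)
// The id can then be matched against your property keys, for example by
// comparing it with propertyKey.getLongId() (assumed accessor, worth verifying).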
Or, an even better option if you can use it: specify the search field names explicitly using the field mapping.
By default, Titan encodes property keys to generate a unique field name for each key in the mixed index. For anyone who wants to query the mixed index directly in the external index backend, these encoded names are illegible and difficult to deal with. For this use case, the field name can be specified explicitly through a parameter:
mgmt = g.getManagementSystem()
name = mgmt.makePropertyKey('bookname').dataType(String.class).make()
mgmt.buildIndex('booksBySummary',Vertex.class).addKey(name,com.thinkaurelius.titan.core.schema.Parameter.of('mapped-name','bookname')).buildMixedIndex("search")
mgmt.commit()
http://s3.thinkaurelius.com/docs/titan/0.5.1/index-parameters.html#_field_mapping
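With the mapped name in place, the document in Elasticsearch carries the readable field name instead of the encoded one, so the _source would look something like this (illustrative value):

"_source" : { "bookname" : "Moby Dick" }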
I want to create abbreviations of words using Solr. I am using Solr 7.1. In the schema, I have a field named "author" with data type string. I want to add a copy field that stores an abbreviation of the string being stored in the "author" field. For example, when "William Shakespeare" is stored in the "author" field, "W. Shakespeare" should be added to the copy field. I am very new to Solr and have been unable to configure it for this purpose. Please help.
I am using the Microsoft.Azure.Search .NET SDK v5.0.1. I am attempting to perform a search against my Azure Search index as follows: Documents.SearchAsync("fieldname:val* AND timeStamp:2018-05-03T13\:23\:59Z"). The results are incorrect. There are exactly 2 documents in my index with that timestamp. There are 121 documents in my index where the fieldname starts with val. When I run the above query using the SDK, it always returns 121 documents. Is there some special way to query timestamp that I am missing?
There are a few points to make here:
In your index definition, I believe you have timeStamp set to be a String; otherwise you wouldn't have been able to issue that search query, since DateTime fields are not searchable. Firstly, I'd advise against treating timeStamp as a string, because searchable fields go through a series of analysis steps at query time (tokenization being one of them; see the documentation on query parsing). In your case, the timestamp query (say 2018-05-03) will be tokenized into smaller constituents (2018, 05, 03), and documents containing any of those terms will be returned, which is why you see the behavior you observe.
Your scenario seems to be a classic case of filtering results on a criterion, followed by searching over the filtered documents. To accomplish this, you need to do the following:
Use a filter on the timestamp, so that it doesn't go through the analysis
On the filtered results, apply your search query.
However, I strongly recommend that, if possible, you make your timeStamp column a datetime for more reasonable semantics.
As an example, here's how you'd go about achieving a filter + search combo:
var parameters = new SearchParameters()
{
    // Field-scoped queries such as "fieldname:val*" require the full Lucene syntax.
    QueryType = QueryType.Full,
    // Filters are matched verbatim, with no analysis, so the timestamp is compared as-is.
    Filter = "timeStamp eq '2018-05-03'"
};
var results = await Documents.SearchAsync("fieldname:val*", parameters);
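If you do convert timeStamp to a datetime field (Edm.DateTimeOffset) as recommended above, the filter compares actual points in time and the literal is written without quotes, along these lines:

$filter=timeStamp eq 2018-05-03T13:23:59Z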
I have the following data in my index:
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents where lists has "Pike" in it.
A full-string match works with any, but I couldn't get a contains-style match to work:
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with only "Pike":
$filter=lists/any(t: t eq 'Pike')
I guess eq looks for an exact match rather than doing a full-text search. Is there any way, with the given data structure, to make this query work?
Currently the lists field has only the filterable property, not the searchable one.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position. A sketch of this option follows after the next point.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
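For the first option, the reshaped document and the matching filter might look like this (the listIds/listNames field names are only placeholders):

{
  "name" : "The 100",
  "listIds" : [
    "2c8540ee-85df-4f1a-b35f-00124e1d3c4a",
    "2c8540ee-85df-4f1a-b35f-00155c40f11c",
    "2c8540ee-85df-4f1a-b35f-00155c02e581"
  ],
  "listNames" : [ "Bellamy", "Pike", "Clark" ]
}

$filter=listNames/any(t: t eq 'Pike')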
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
See the Azure Search documentation on complex types for details.
I'd like to perform a partial text / phrase search against a Datastore record field using Ruby.
I've figured out how to do it with a pair of inequality constraints (field >= "term" AND field < "term" + "\ufffd"), but that only matches from the beginning of the field.
This works for prefixes; querying for "Ener" returns "Energizer AA Batteries", but querying for "AA" does not return the same document.
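Spelled out, the trick looks like this (sketched with the Java Cloud Datastore client for concreteness; the kind and property names are placeholders, and the Ruby google-cloud-datastore gem exposes equivalent where clauses):

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.CompositeFilter;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

String prefix = "Ener";
// Matches values in ["Ener", "Ener\ufffd"), i.e. any string starting with "Ener".
Query<Entity> query = Query.newEntityQueryBuilder()
        .setKind("Product")
        .setFilter(CompositeFilter.and(
                PropertyFilter.ge("name", prefix),
                PropertyFilter.lt("name", prefix + "\ufffd")))
        .build();
QueryResults<Entity> results = datastore.run(query);
while (results.hasNext()) {
    System.out.println(results.next().getString("name"));
}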
The docs for the Python Google client's Search API describe the ability to manually create indexes that allow both atomic and partial word searches.
https://cloud.google.com/appengine/docs/standard/python/search/ says:
Tokenizing string fields: When an HTML or text field is indexed, its contents are tokenized. The string is split into tokens wherever whitespace or special characters (punctuation marks, hash sign, etc.) appear. The index will include an entry for each token. This enables you to search for keywords and phrases comprising only part of a field's value. For instance, a search for "dark" will match a document with a text field containing the string "it was a dark and stormy night", and a search for "time" will match a document with a text field containing the string "this is a real-time system".
In the docs for Ruby and PHP, I cannot find such an API reference to enable me to do the same thing. Is it possible to perform this type of query in Ruby / PHP with Cloud Datastore?
If so, can you point me to the docs? And if not, is there a workaround, for example creating indexes with the Python Search API and then configuring the PHP/Ruby client to execute its queries against those indexes?
I am using Azure Search for search in my Web API.
Indexer Source used: SQL Azure Database
The column that is being indexed has special characters. Hence I am encoding the data when pushing into Azure Search.
Is there a way to decode the data as part of retrieval from Azure Search?
The only documentation I could find online about decoding in Azure Search is for decoding blob metadata.
The question involves using a table column that may contain key-invalid characters as the document key.
To make this work, define a field mapping and apply the base64Encode function. See https://azure.microsoft.com/en-us/documentation/articles/search-indexer-field-mappings/#base64EncodeFunction
Note that Azure Search will store the encoded value as the key of your document. If you actually want to search / filter on the original value, fork the original value in your field mappings, like this:
"fieldMappings" : [
{ "sourceFieldName" : "Id", "targetFieldName" : "Id", "mappingFunction" : { "name" : "base64Encode" } },
{ "sourceFieldName" : "Id", "targetFieldName" : "OriginalId" }
]
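Separately, if the values you push are base64-encoded by your application, the base64Decode mapping function can undo that at indexing time, so the index stores the original text and no decoding is needed at retrieval. This only applies if your encoding is base64, and the field names below are hypothetical:

"fieldMappings" : [
  { "sourceFieldName" : "EncodedDescription", "targetFieldName" : "Description", "mappingFunction" : { "name" : "base64Decode" } }
]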
HTH!