Elasticsearch build SQL Server index and create search query fails - sql-server

After reading and trying several articles and getting no results, I want to create an Elasticsearch query that returns database results.
Example:
[Step 1]:
My database is [my_db] and my table name is [my_table].
To build a new index on localhost:9200:
POST /my_index/my_type/_meta
{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://[my_db_ip];databaseName=[my_db]",
    "user": "sa",
    "password": "xxxxxx",
    "sql": "SELECT * FROM [my_table]",
    "poll": "5s",
    "index": "my_index",
    "type": "my_type"
  }
}
The index creation result:
{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "_meta",
  "_version": 1,
  "created": true
}
[Step 2]:
The search query:
POST /my_index/_search
{
  "query_string": {
    "query": "FreeText"
  }
}
The search result:
{
  "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures....
}
What is wrong with my search query?
How can I create a query that returns results from the rows of [my_table]?

Try the match_all query (see the official documentation). This will bring back all the results of my_type.
Example:
POST /my_index/my_type/_search
{
  "query": { "match_all": {} }
}
If you need to search for a specific term, then you must pay attention to the mappings of your type and to the query type that you'll use.
Update
Mappings:
From the schema of your table I understand that the below mappings for my_type would suit you well.
{
  "my_table": {
    "properties": {
      "orderid": { "type": "integer", "index": "not_analyzed" },
      "ordername": { "type": "string" }
    }
  }
}
Keep in mind that if the data are already indexed you cannot change the mappings. You must reindex your data after defining the proper mapping.
Generally I'd propose that you follow the methodology below:
Create your index with the index settings that you need
Define the mappings of your type
Index your data
Do not mingle all of these steps into one, and avoid leaving things to luck (like relying on default mappings).
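For instance, a minimal sketch of these three steps, assuming the index name my_index, the type my_type and the mapping from above (the string/not_analyzed syntax shown here belongs to the older 1.x/2.x Elasticsearch versions implied by the question):
Create the index with its settings:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
Define the mapping of the type:
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "orderid": { "type": "integer", "index": "not_analyzed" },
      "ordername": { "type": "string" }
    }
  }
}
Index a document:
PUT /my_index/my_type/1
{
  "orderid": 1,
  "ordername": "test order"
}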
You can use the match query in order to search within a single field of a document.
Example
POST /my_index/my_type/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}
You can use the multi_match query in order to search across multiple fields of a document.
Example
POST /my_index/my_type/_search
{
  "query": {
    "multi_match": {
      "query": "TEXT",
      "fields": [ "field1", "field2" ]
    }
  }
}
For more querying options check the official documentation on Query DSL.

Related

Search for exact field in an array of strings in elasticsearch

Elasticsearch version: 7.1.1
Hi, I have tried a lot but could not find any solution.
In my index, I have a field containing an array of strings.
So, for example, I have two documents containing different values in the locations array.
Document 1:
"doc" : {
"locations" : [
"Cloppenburg",
"Berlin"
]
}
Document 2:
"doc" : {
"locations" : [
"Landkreis Cloppenburg",
"Berlin"
]
}
A user searches for the term Cloppenburg, and I want to return only those documents which contain the term Cloppenburg and not Landkreis Cloppenburg.
The results should contain only Document 1, but my query is returning both documents.
I am using the following query and getting both documents back.
Can someone please help me out with this?
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "doc.locations": {
              "query": "cloppenburg",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
The issue is that you are using a text field with a match query.
Match queries are analyzed: the search terms go through the same analyzer that was used at index time, which for text fields is the standard analyzer. It breaks text on whitespace, so in your case Landkreis Cloppenburg is indexed as two tokens, landkreis and cloppenburg, and a search for cloppenburg therefore matches that document as well.
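You can see this with the _analyze API; for example, the standard analyzer splits the value into two tokens:
POST _analyze
{
  "analyzer": "standard",
  "text": "Landkreis Cloppenburg"
}
This returns the tokens landkreis and cloppenburg, which is why a match query for cloppenburg hits Document 2 as well.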
Solution: Use the keyword field.
Index def:
{
  "mappings": {
    "properties": {
      "location": {
        "type": "keyword"
      }
    }
  }
}
Index both docs and then use the same search query.
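A minimal indexing sketch (assuming the index name location and single-value documents, consistent with the result shown below):
PUT /location/_doc/1
{
  "location": "Landkreis Cloppenburg"
}
PUT /location/_doc/2
{
  "location": "Cloppenburg"
}
The search query then becomes: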
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "location": {
              "query": "Cloppenburg"
            }
          }
        }
      ]
    }
  }
}
Result:
"hits": [
  {
    "_index": "location",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.6931471,
    "_source": {
      "location": "Cloppenburg"
    }
  }
]
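Since location is now a keyword field, a term query (which skips analysis entirely) would also work and is the more idiomatic choice for exact matches:
{
  "query": {
    "term": {
      "location": {
        "value": "Cloppenburg"
      }
    }
  }
}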

Filter All using Elasticsearch

Let's say I have a User table with fields like name, address, age, etc. There are more than 1000 records in this table, so I used Elasticsearch to retrieve this data one page at a time, 20 records per page.
And let's say I just want to search for some text, "Alexia", and display whether any record contains Alexia. The special thing is that I want to search this text across all the fields of the table.
Does the search text match the name field, or age, or address, or any other? If it does, those records should be returned. We are not going to pass any specific field to the Elastic query. If more than 20 records match my text, the pagination should still work.
Any idea how to write such a query, or how to connect to Elasticsearch?
Yes, you can do that with a query_string query:
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "Alexia"
          }
        },
        {
          "range": {
            "dateField": {
              "gte": **currentTime**   <-- this could be the current time, or age, or any property you'd like to do a range query on
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
To get only 20 records you can pass size as 20, and for pagination you can use a range query to get the next set of messages:
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "Alexia"
          }
        },
        {
          "range": {
            "dateField": {
              "gt": 1589570610732   <-- from the previous response
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
You can do the same using a match query as well. If you specify _all in the match query, it will search in all the fields.
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "Alexia"
          }
        },
        {
          "range": {
            "dateField": {
              "gte": **currentTime**
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
When you are using Elasticsearch to provide search functionality for search boxes, you should avoid query_string because it throws an error in case of invalid syntax, whereas other queries return an empty result. You can read more about this in the query_string documentation.
_all is deprecated from ES 6.0, so if you are using ES 6.x onwards you can use copy_to to copy the values of several fields into a single field and then search on that single field. You can read more in the copy_to documentation.
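A minimal mapping sketch of this approach, assuming the User fields from the question and an illustrative catch-all field named all_fields:
PUT /users
{
  "mappings": {
    "properties": {
      "name": { "type": "text", "copy_to": "all_fields" },
      "address": { "type": "text", "copy_to": "all_fields" },
      "age": { "type": "integer" },
      "all_fields": { "type": "text" }
    }
  }
}
A match query on all_fields then behaves much like the old _all search.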
For pagination you can make use of the from and size parameters. The size parameter tells Elasticsearch how many documents you want to retrieve, and from tells it at which hit to start.
Query:
{
  "from": <current-count>,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "Alexia"
          }
        },
        {
          "range": {
            "dateField": {
              "gte": **currentTime**
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
You increase the from value on each iteration by the number of documents you have already received. For example, in the first iteration you set from to 0; in the next iteration you set it to 20 (since the first iteration returned the first 20 hits and in the second iteration you want the documents after those), and so on. You can refer to the documentation on from and size for details.

Find similar documents/records in database

I have quite a large number of records currently stored in MongoDB; each looks somewhat like this:
{
  "_id" : ObjectId("5c38d267b87d0a05d8cd4dc2"),
  "tech" : "NodeJs",
  "packagename" : "package-name",
  "packageversion" : "0.0.1",
  "total_loc" : 474,
  "total_files" : 7,
  "tecloc" : {
    "JavaScript" : 316,
    "Markdown" : 116,
    "JSON" : 42
  }
}
What I want to do is find similar records, e.g. records which have approximately the same total_loc (+/-10%) or use some of the same technologies (tecloc).
Can I somehow do this with a query against MongoDB, or is there a technology that fits better for what I want to do? I am fine with regenerating the data and storing it, e.g., in Elasticsearch or some graph DB.
Thank you
One possibility to solve this problem is to use Elasticsearch. I'm not claiming that it's the only solution you have.
At a high level, you would need to set up Elasticsearch and index your data. There are various ways to achieve this: mongo-connector, Logstash with the JDBC input plugin, or even just dumping the data from MongoDB and loading it manually. There are no limits on how to do this job.
The change I would propose initially is to make tecloc a multi-valued field, replacing the object ({) with an array ([) and adding explicit fields for the lines of code, e.g.:
{
  "tech": "NodeJs",
  "packagename": "package-name",
  "packageversion": "0.0.1",
  "total_loc": 474,
  "total_files": 7,
  "tecloc": [
    {
      "name": "JavaScript",
      "loc": 316
    },
    {
      "name": "Markdown",
      "loc": 116
    },
    {
      "name": "JSON",
      "loc": 42
    }
  ]
}
This data model is very trivial and obviously has some limitations, but it's already something for you to start with and see how well it fits your other use cases. Later you should look into the nested type as one possibility to model your data more accurately (a sketch is given at the end of this answer).
Regarding your exact search scenario, you could search those kinds of documents with a query like this:
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "tecloc.name.keyword": {
              "value": "Java"
            }
          }
        },
        {
          "term": {
            "tecloc.name.keyword": {
              "value": "Markdown"
            }
          }
        }
      ],
      "must": [
        {
          "range": {
            "total_loc": {
              "gte": 426,
              "lte": 521
            }
          }
        }
      ]
    }
  }
}
Unfortunately, there is no support for a +/-10% syntax, so this range (here 426-521, i.e. 474 +/-10%) is something that should be calculated on the client.
On the other hand, I specified that we are searching for documents which should have Java or Markdown, which returns the example document as well. If a document had both Java and Markdown, its score would be higher.
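If you later switch tecloc to the nested type mentioned above, the mapping might look roughly like this (a sketch in typeless 7.x syntax; the index name packages is illustrative, and nested fields then have to be queried with nested queries instead of the plain term queries shown above):
PUT /packages
{
  "mappings": {
    "properties": {
      "tech": { "type": "keyword" },
      "packagename": { "type": "keyword" },
      "total_loc": { "type": "integer" },
      "tecloc": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "loc": { "type": "integer" }
        }
      }
    }
  }
}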

Elastic Search query to find documents where nested field "contains" X objects

This is an example of what my data looks like for an Elasticsearch index called video_service_inventory:
{
  "video_service": "netflix",
  "movies": [
    { "title": "Mission Impossible", "genre": "action" },
    { "title": "The Hangover", "genre": "comedy" },
    { "title": "Zoolander", "genre": "comedy" },
    { "title": "The Ring", "genre": "horror" }
  ]
}
I have established in my index that the "movies" field is of type "nested".
I want to write a query that says "get me all video_services that contain both of these movies":
{ "title": "Mission Impossible", "genre": "action" }
AND
{ "title": "The Ring", "genre": "horror" }
where the title and genre must match. If one movie exists but not the other, I don't want the query to return that video service.
Ideally, I would like to do this in one query. So far, I haven't been able to find a solution.
Does anyone have suggestions for writing this search query?
The syntax may vary depending on the Elasticsearch version, but in general you should combine multiple nested queries within a bool must query. For nested queries you need to specify the path to "navigate" to the nested documents, and you need to qualify the properties with the path plus the field name:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "movies",
            "query": {
              "bool": {
                "must": [
                  { "term": { "movies.title": "Mission Impossible" } },
                  { "term": { "movies.genre": "action" } }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "movies",
            "query": {
              "bool": {
                "must": [
                  { "term": { "movies.title": "The Ring" } },
                  { "term": { "movies.genre": "horror" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
This example assumes that the title and genre fields are not-analyzed properties. In newer versions of Elasticsearch you may find them as a .keyword sub-field, and you would then use "movies.genre.keyword" to query the not-analyzed version of the data.
For details on bool queries you can have a look at the documentation on the ES website:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
For nested queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
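For reference, a mapping along these lines would support the query above (a sketch in typeless 7.x syntax; the keyword sub-fields simply make the default dynamic-mapping behaviour explicit):
PUT /video_service_inventory
{
  "mappings": {
    "properties": {
      "video_service": { "type": "keyword" },
      "movies": {
        "type": "nested",
        "properties": {
          "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } },
          "genre": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
        }
      }
    }
  }
}
With this mapping you would use "movies.title.keyword" and "movies.genre.keyword" in the term queries above.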

Cloudant find Query with $and and $or elements

I'm using the following JSON to find results in a Cloudant database:
{
  "selector": {
    "$and": [
      { "type": { "$eq": "sensor" } },
      { "v": { "$eq": 2355 } },
      {
        "$or": [
          { "p": "#401000103" },
          { "p": "#401000114" }
        ]
      },
      { "t_max": { "$gte": 1459554894 } },
      { "t_min": { "$lte": 1459509591 } }
    ]
  },
  "fields": [
    "_id",
    "p"
  ],
  "limit": 200
}
If I run this against my Cloudant database I get the following error:
{
  "error": "unknown_error",
  "reason": "function_clause",
  "ref": 3379914628
}
If I remove one of the $or elements (e.g. {"p":"#401000114"}), I get results for the query. Either of the two values works on its own, but when I want to use both elements together I get the error above.
Can anybody tell me what this error reason function_clause means?
The error reason function_clause means there was a problem on the server; you should probably reach out to Cloudant Support and see if they can help you with your issue.
I had contact with Cloudant support. This is their answer:
The issue affects Cloudant generally.
It affects both multi-tenant and dedicated clusters.
They are working on a solution.
A workaround: if the array to which the $or operator applies has only two elements, you can get the correct result by repeating one of the items in the array.
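Applied to the selector above, the workaround only changes the $or array, repeating one of its items:
"$or": [
  { "p": "#401000103" },
  { "p": "#401000114" },
  { "p": "#401000114" }
]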
