How to search a term with a middle dash in azure search?

I'm learning to use Azure Search and I can't find a way to search for a term containing a middle dash in the ItemId field, regardless of whether the term appears at the beginning or in the middle of the value.
I have these fields with data in my index:
+-----+--------------------+-------------+
| Cat | ItemId             | Description |
+-----+--------------------+-------------+
| 100 | 400800-1100103U    | desc item 1 |
| 100 | 400800-11001066    | desc item 2 |
| 100 | 400800-11001068    | desc item 3 |
| 101 | 400800-110010F6    | desc item 4 |
+-----+--------------------+-------------+
This is my index field configuration:
+-------------+-------------+------------+-----------+-----------+------------+
| Field Name  | Retrievable | Filterable | Sortable  | Facetable | Searchable |
+-------------+-------------+------------+-----------+-----------+------------+
| Cat         | OK          | OK         | OK        | OK        | X          |
| ItemId      | OK          | OK         | OK        | OK        | OK         |
| Description | OK          |            |           |           |            |
+-------------+-------------+------------+-----------+-----------+------------+
And this is the custom analyzer I assigned to the ItemId field so that it generates a single token even when the value contains a middle dash:
{
  "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
  "name": "keyword_lowercase",
  "tokenizer": "keyword_v2",
  "tokenFilters": [
    "lowercase"
  ],
  "charFilters": []
}
If I search with this query: $select=RowKey&search=400800-1100*
I get these results:
400800-1100103U
400800-11001066
400800-11001068
400800-110010F6
But if I try to search with a middle term like this: $select=RowKey&search=RowKey:(00800-1100*)~
I get 0 results.
So how can I search for a term containing a middle dash in ItemId, regardless of whether it appears at the beginning or in the middle of the value?
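To see why the first query matches and the second does not, here is a rough Python model. It assumes (as the keyword_lowercase analyzer intends) that each ItemId is indexed as one lowercase term, that a trailing-wildcard query matches only terms that start with the text before the `*`, and that a regex query must match a whole term. The helper names are hypothetical; this simulates term matching and is not the Azure Search engine.

```python
import re

# Values from the example index. With a keyword tokenizer plus a lowercase
# filter, each ItemId becomes exactly one lowercase term (assumption).
ITEM_IDS = ["400800-1100103U", "400800-11001066", "400800-11001068", "400800-110010F6"]
TERMS = [item_id.lower() for item_id in ITEM_IDS]


def prefix_match(terms, pattern):
    """Model of a trailing-wildcard query such as 400800-1100*:
    a term matches only if it starts with the text before the '*'."""
    prefix = pattern.rstrip("*").lower()
    return [t for t in terms if t.startswith(prefix)]


def regex_match(terms, regex):
    """Model of a regex query: the expression must match the whole term,
    so a leading and trailing .* is needed to find text in the middle."""
    compiled = re.compile(regex)
    return [t for t in terms if compiled.fullmatch(t)]


print(prefix_match(TERMS, "400800-1100*"))  # all four terms
print(prefix_match(TERMS, "00800-1100*"))   # [] because no term starts with "00800"
print(regex_match(TERMS, r".*00-11.*"))     # all four terms
```

Under these assumptions a prefix query can only anchor at the start of the single token, which is why a middle fragment needs either a regex or a differently analyzed field.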

I removed the analyzer and switched from a GET to a POST request, with this request body:
{
  "queryType": "full",
  "search": "/.*00-11.*/",
  "searchFields": "ItemId",
  "select": "ItemId",
  "count": true,
  "top": 10
}
With queryType set to full (the full Lucene query syntax) and a regular expression, the search works as expected.
Note that if you try this regex in the Search explorer in the Azure portal, it doesn't return any results. I think that's because the Search explorer uses a GET request.
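A small Python helper that builds the same POST body for any fragment, as a sketch. The escaping rule is an assumption: it backslash-escapes the characters that Lucene regular expressions treat as special, plus `/`, which delimits the regex in the full query syntax; check the Azure query syntax reference for your API version. The function names `escape_lucene_regex` and `build_substring_query` are hypothetical, and the fragment is passed through unchanged (no case folding is assumed).

```python
import json

# Characters with special meaning in Lucene regular expressions, plus "/",
# which delimits the regex in the full Lucene query syntax (assumed list).
_REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~/')


def escape_lucene_regex(text):
    """Backslash-escape every special character so it matches literally."""
    return "".join("\\" + ch if ch in _REGEX_SPECIAL else ch for ch in text)


def build_substring_query(fragment, field="ItemId", top=10):
    """Build a POST body that looks for `fragment` anywhere in `field`
    (hypothetical helper mirroring the request body above)."""
    return {
        "queryType": "full",  # full Lucene syntax, which supports /regex/ queries
        "search": "/.*" + escape_lucene_regex(fragment) + ".*/",
        "searchFields": field,
        "select": field,
        "count": True,
        "top": top,
    }


print(json.dumps(build_substring_query("00-11"), indent=2))
```

The dash is left unescaped because, in Lucene regular expressions, it only has a special meaning inside a character class.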
Thanks for the answer, Corom - MSFT. It works; I just wanted to add an answer with more clarification.

I believe this post answers your question: it uses regular expression search, although that approach comes with some caveats. Alternatively, you could use fuzzy search, or the Edge N-gram tokenizer with a reverse token filter, depending on your specific scenario.
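One way to read the Edge N-gram plus reverse suggestion, sketched in Python as a model of the idea rather than Azure's analyzer pipeline: reverse each term, take its edge n-grams (its prefixes), and reverse each gram back. That yields every suffix of the value as its own term, so an ordinary trailing-wildcard query over those terms finds the fragment anywhere in the value. The gram sizes and all function names are assumptions for illustration; in Azure this would presumably be a custom analyzer chaining reverse and edge n-gram token filters, whose exact names and limits should be checked in the analyzer reference.

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Prefixes of `term` whose lengths are between min_gram and max_gram."""
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def suffix_terms(term, min_gram=1, max_gram=25):
    """Reverse the term, take its edge n-grams, and reverse each gram back.
    The result is every suffix of the term (within the gram size limits)."""
    return {gram[::-1] for gram in edge_ngrams(term[::-1], min_gram, max_gram)}


def build_index(values):
    """Map each generated suffix term to the original values that produced it."""
    index = {}
    for value in values:
        for term in suffix_terms(value.lower()):
            index.setdefault(term, set()).add(value)
    return index


def prefix_query(index, prefix):
    """A trailing-wildcard query over suffix terms finds `prefix` anywhere in a value."""
    prefix = prefix.lower()
    hits = set()
    for term, values in index.items():
        if term.startswith(prefix):
            hits |= values
    return hits


ITEM_IDS = ["400800-1100103U", "400800-11001066", "400800-11001068", "400800-110010F6"]
index = build_index(ITEM_IDS)
print(sorted(prefix_query(index, "00800-1100")))  # all four ItemIds
print(sorted(prefix_query(index, "103u")))        # ['400800-1100103U']
```

The trade-off is index size: every value contributes one term per suffix, which is the kind of consideration that decides between this and a regex query.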

Related

How does Edge Index differ from Relation Index in janusGraph or are they the same?

I am quite new to JanusGraph and I am in the process of indexing my graph.
In the process I discovered that edges can be indexed just like vertices.
FYI: I am indexing vertices using mixed indexes backed by Elasticsearch.
This is the script I followed from the documentation:
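graph.tx().rollback()  // close any open transaction before changing the schema
mgmt = graph.openManagement()
type = mgmt.getPropertyKey('type')
has_reported = mgmt.getEdgeLabel('has_reported')
// vertex-centric (relation) index on 'has_reported' edges, sorted by 'type' in descending order
mgmt.buildEdgeIndex(has_reported, 'reportedByType', Direction.BOTH, Order.desc, type)
mgmt.commit()
// wait until the new index is REGISTERED
ManagementSystem.awaitRelationIndexStatus(graph, 'reportedByType', 'has_reported').call()
// reindex existing data; look the edge label up again inside the new management transaction
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getRelationIndex(mgmt.getEdgeLabel('has_reported'), 'reportedByType'), SchemaAction.REINDEX).get()
mgmt.commit()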
In the script above, the edge label 'has_reported' has a property 'type'. After executing the script, the index reached the REGISTERED status and, after a successful reindex, the final status ENABLED, as shown below:
gremlin> mgmt.printIndexes()
==>------------------------------------------------------------------------------------------------
Vertex Index Name | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
byUserNameMixed | Mixed | false | search | username: ENABLED |
byPasswordMixed | Mixed | false | search | password: ENABLED |
byStatusIdComposite | Composite | false | internalindex | status_id: ENABLED |
---------------------------------------------------------------------------------------------------
Edge Index (VCI) Name | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Relation Index | Type | Direction | Sort Key | Order | Status |
---------------------------------------------------------------------------------------------------
reportedByType | has_reported | BOTH | type | desc | ENABLED |
---------------------------------------------------------------------------------------------------
This is where I'm confused, because three types of indexes are shown:
Vertex Index Name
Edge Index (VCI) Name
Relation Index
I need help with the following:
What is the difference between Edge Index (VCI) Name and Relation Index?
How do I create an index that appears under 'Edge Index (VCI) Name'? I don't see that in the documentation.
Documentation referred to: https://docs.janusgraph.org/schema/index-management/index-performance/#vertex-centric-indexes
Thanks!

Edited (because the previous answer was not correct for more recent versions of JanusGraph):
I tend to agree that the naming of the JanusGraph indices used to be confusing (as in your question). In JanusGraph 0.6.2 the output of printIndexes() looks different:
gremlin> mgmt.printIndexes()
==>------------------------------------------------------------------------------------------------
Graph Index (Vertex) | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
name | Composite | true | internalindex | name: ENABLED |
vertices | Mixed | false | search | age: ENABLED |
byTestVal | Composite | false | internalindex | testval: ENABLED |
someName | Mixed | false | search | p: ENABLED |
| | | | q: ENABLED |
---------------------------------------------------------------------------------------------------
Graph Index (Edge) | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
edges | Mixed | false | search | reason: ENABLED |
| | | | place: ENABLED |
---------------------------------------------------------------------------------------------------
Relation Index (VCI) | Type | Direction | Sort Key | Order | Status |
---------------------------------------------------------------------------------------------------
battlesByTime | battled | BOTH | time | desc | ENABLED |
---------------------------------------------------------------------------------------------------
The Graph Index (Edge) enables a lookup among all edges in the graph database that have the label and key(s) needed for that index. This index is typically triggered by a query such as g.E().has(key, value).
A Relation Index (VCI, vertex-centric index) enables a lookup among the matching edges bound to a single vertex. This index can only be triggered after first traversing to the associated vertex, as in g.V().has(keyV, valueV).bothE(labelE).has(keyE, valueE). Vertex-centric indexes enable you to traverse so-called supernodes, which have very many connections.
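The difference between the two lookups can be pictured with a toy Python model. This is purely conceptual and does not reflect JanusGraph's actual storage layout: the graph index is one map from property value to edges across the whole graph, while the vertex-centric index keeps, per vertex and edge label, a list of that vertex's edges sorted by the sort key. The class and method names are hypothetical; the `battled`/`time` names come from the `battlesByTime` example in the printIndexes() output.

```python
from collections import defaultdict


class ToyEdgeIndexes:
    """Conceptual model only, not JanusGraph's storage layout."""

    def __init__(self, sort_key):
        self.sort_key = sort_key
        # Graph index (Edge): (property key, value) -> matching edges anywhere in the graph.
        self.graph_index = defaultdict(list)
        # Relation index (VCI): (vertex, edge label) -> that vertex's edges, sorted by sort_key.
        self.vci = defaultdict(list)

    def add_edge(self, edge_id, out_v, in_v, label, props):
        edge = {"id": edge_id, "label": label, **props}
        for key, value in props.items():
            self.graph_index[(key, value)].append(edge)
        for vertex in (out_v, in_v):  # Direction.BOTH: store the edge with both endpoints
            bucket = self.vci[(vertex, label)]
            bucket.append(edge)
            bucket.sort(key=lambda e: e[self.sort_key], reverse=True)  # Order.desc

    def global_lookup(self, key, value):
        """Roughly g.E().has(key, value): one lookup across all edges."""
        return [e["id"] for e in self.graph_index[(key, value)]]

    def vertex_lookup(self, vertex, label, key, value):
        """Roughly g.V(vertex).bothE(label).has(key, value): only this vertex's
        edges with that label are examined, in descending sort-key order."""
        return [e["id"] for e in self.vci[(vertex, label)] if e[key] == value]


idx = ToyEdgeIndexes(sort_key="time")
idx.add_edge("e1", "hercules", "nemean", "battled", {"time": 1})
idx.add_edge("e2", "hercules", "hydra", "battled", {"time": 2})
idx.add_edge("e3", "theseus", "minotaur", "battled", {"time": 2})
print(idx.global_lookup("time", 2))                         # ['e2', 'e3']
print(idx.vertex_lookup("hercules", "battled", "time", 2))  # ['e2']
```

The per-vertex lists are what make supernodes tractable: a query that starts at one vertex never has to look at edges belonging to other vertices.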
You want the Graph Index (Edge), so you need to instantiate an index builder for that:
mgmt.buildIndex('reportedByType', Edge.class)
See the reference docs for the further build steps.

Replacing placeholder with another table's data (without knowing in advance the substitutions)

I need to replace placeholders in a text with values read by the query that is matched to that specific message.
Table Template_Messages
| ID         | String                                                         | Query                               |
| PICKUP_MSG | Your {vehicle_name} will be ready for pick-up on {pickup_date} | SELECT * FROM vehicles WHERE ID = ? |
Running the query found in the 'Query' column returns data from the following table:
Table Vehicles
| ID   | vehicle_name | plate   | pickup_date | ... |
| P981 | BMW X5       | AA014CC | 2022-09-20  | ... |
| Z323 | Ford Focus   | HH000JJ | 2022-10-21  | ... |
Then with the following query:
SELECT * FROM vehicles WHERE ID = 'Z323'
By making the appropriate substitutions I should obtain this output:
Your Ford Focus will be ready for pick-up on 2022-10-21
How can I achieve this?
And since the 'Query' column of the first table doesn't always refer to the 'vehicles' table, can this work dynamically for any placeholder/query?
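A minimal sketch of one dynamic approach, assuming that every stored query takes exactly one `?` parameter (the record ID) and that each placeholder is named after a column in that query's result set. It uses Python's built-in `sqlite3` in memory as a stand-in for the real database (parameter style and quoting differ between drivers), and `render_message` is a hypothetical helper. Because the column names are read from the result set, nothing about the substitutions has to be known in advance. Executing SQL stored in a table is only safe if that table's contents are trusted.

```python
import re
import sqlite3

PLACEHOLDER = re.compile(r"\{(\w+)\}")  # matches {column_name}


def render_message(conn, message_id, record_id):
    """Fetch a template and its query, run the query for record_id, and
    replace each {column} placeholder with that column's value."""
    found = conn.execute(
        'SELECT "String", "Query" FROM Template_Messages WHERE ID = ?', (message_id,)
    ).fetchone()
    if found is None:
        raise LookupError(f"no template {message_id!r}")
    template, query = found
    cursor = conn.execute(query, (record_id,))  # stored query has one ? parameter
    row = cursor.fetchone()
    if row is None:
        raise LookupError(f"no row for {record_id!r}")
    # Column names come from the result set, so any query/placeholder pair works.
    values = {col[0]: val for col, val in zip(cursor.description, row)}
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Template_Messages (ID TEXT PRIMARY KEY, "String" TEXT, "Query" TEXT);
    CREATE TABLE vehicles (ID TEXT PRIMARY KEY, vehicle_name TEXT, plate TEXT, pickup_date TEXT);
    INSERT INTO vehicles VALUES ('P981', 'BMW X5', 'AA014CC', '2022-09-20'),
                                ('Z323', 'Ford Focus', 'HH000JJ', '2022-10-21');
""")
conn.execute(
    "INSERT INTO Template_Messages VALUES (?, ?, ?)",
    ("PICKUP_MSG",
     "Your {vehicle_name} will be ready for pick-up on {pickup_date}",
     "SELECT * FROM vehicles WHERE ID = ?"),
)
print(render_message(conn, "PICKUP_MSG", "Z323"))
# Your Ford Focus will be ready for pick-up on 2022-10-21
```

A placeholder with no matching column raises `KeyError`, which surfaces template/query mismatches instead of silently leaving braces in the output.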

Why does Solr change a record's position after updating a field?

I am new to Solr and encountered some strange behavior when I update a field and then perform a search.
Here's the scenario:
I have 300 records in my core and a search query in which I filter the results with
fq=IsSoldHidden:false AND IsDeleted:false AND StoreId:60
and I sort by DateInStock asc.
This returns exactly the results I expect.
Here are the top 3 results of my query:
--------------------------------------------------------------------------------------
id | Price | IsSoldHidden | IsDeleted | StoreId | StockNo | DateInStock
--------------------------------------------------------------------------------------
27236 | 15000.0 | false | false | 60 | A00059 | 2021-06-07T00:00:00Z
--------------------------------------------------------------------------------------
37580 | 0.0 | false | false | 60 | M9202 | 2021-06-08T00:00:00Z
--------------------------------------------------------------------------------------
37581 | 12000 | false | false | 60 | M9173 | 2021-06-08T00:00:00Z
But when I update the Price field of the 2nd row (an atomic update, to be specific) and run the search again with the same filters, the results change to this:
--------------------------------------------------------------------------------------
id | Price | IsSoldHidden | IsDeleted | StoreId | StockNo | DateInStock
--------------------------------------------------------------------------------------
27236 | 15000.0 | false | false | 60 | A00059 | 2021-06-07T00:00:00Z
--------------------------------------------------------------------------------------
37581 | 0.0 | false | false | 60 | M9173 | 2021-06-08T00:00:00
--------------------------------------------------------------------------------------
37582 | 0.0 | false | false | 60 | M1236 | 2021-06-08T00:00:00Z
and the 2nd row (37580) of the first result set is moved to the last position (document #300).
I researched this online, and here's what I found:
Solr changes document's score when its random field value altered
but I think that situation is different from mine, since I did not sort by score.
I'm not sure why it behaves like this.
Am I missing something?
Or can anyone explain it?
Thanks in advance.

Since the dates are identical, their internal sort order depends on their position in the index.
Updating the document marks the original document as deleted and adds a new document at the end of the index, so its position in the index changes.
If you want the order to be stable, sort by date and then by id instead (for example sort=DateInStock asc, id asc). That way the lower id always comes first when the dates are identical.
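The behavior described above can be reproduced with a toy Python model. It is not Solr code: it assumes, as the answer explains, that ties in a date-only sort fall back to the documents' position in the index, and it models an atomic update as deleting the old document and appending the new version at the end. Python's `sorted` is stable, so equal dates keep list order, which plays the role of index order here. Document 37582's date is taken from the second result table; `atomic_update` and `search` are hypothetical names.

```python
from operator import itemgetter

# Documents in index order (the position used to break sort ties in this model).
docs = [
    {"id": "27236", "DateInStock": "2021-06-07T00:00:00Z"},
    {"id": "37580", "DateInStock": "2021-06-08T00:00:00Z"},
    {"id": "37581", "DateInStock": "2021-06-08T00:00:00Z"},
    {"id": "37582", "DateInStock": "2021-06-08T00:00:00Z"},
]


def atomic_update(index, doc_id, **fields):
    """Delete the old version and append the updated document at the end."""
    pos = next(i for i, d in enumerate(index) if d["id"] == doc_id)
    index.append({**index.pop(pos), **fields})


def search(index, sort_keys):
    """Sort by the given fields; the stable sort keeps index order for ties."""
    return [d["id"] for d in sorted(index, key=itemgetter(*sort_keys))]


print(search(docs, ["DateInStock"]))        # ['27236', '37580', '37581', '37582']
atomic_update(docs, "37580", Price=0.0)
print(search(docs, ["DateInStock"]))        # ['27236', '37581', '37582', '37580']
print(search(docs, ["DateInStock", "id"]))  # ['27236', '37580', '37581', '37582']
```

Adding the id as a secondary key removes the dependence on index position, so updates no longer reshuffle documents that share a date.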

Solr edismax not using Logical Operators AND/OR

We just ran into an issue where our Solr 6.6 doesn't seem to treat logical operators (AND, OR, NOT) as operators, but actually searches for the words. So queries that should return only a few hundred hits now return thousands. We are using the edismax parser.
Solr Query: Apple AND Google
"debug":{
"rawquerystring":"Apple AND Google",
"querystring":"Apple AND Google",
"parsedquery":"(+(DisjunctionMaxQuery((cm_city_t:apple | cm_credit_t:apple | cm_notes_t:apple | cm_state_t:apple | cm_country_t:apple | cm_description_t:apple | cm_caption_writer_s:Apple | cm_photographer_t:apple)) DisjunctionMaxQuery((cm_city_t:and | cm_credit_t:and | cm_notes_t:and | cm_state_t:and | cm_country_t:and | cm_description_t:and | cm_caption_writer_s:AND | cm_photographer_t:and)) DisjunctionMaxQuery((cm_city_t:google | cm_credit_t:google | cm_notes_t:google | cm_state_t:google | cm_country_t:google | cm_description_t:google | cm_caption_writer_s:Google | cm_photographer_t:google))))/no_coord",
"parsedquery_toString":"+((cm_city_t:apple | cm_credit_t:apple | cm_notes_t:apple | cm_state_t:apple | cm_country_t:apple | cm_description_t:apple | cm_caption_writer_s:Apple | cm_photographer_t:apple) (cm_city_t:and | cm_credit_t:and | cm_notes_t:and | cm_state_t:and | cm_country_t:and | cm_description_t:and | cm_caption_writer_s:AND | cm_photographer_t:and) (cm_city_t:google | cm_credit_t:google | cm_notes_t:google | cm_state_t:google | cm_country_t:google | cm_description_t:google | cm_caption_writer_s:Google | cm_photographer_t:google))",
"QParser":"ExtendedDismaxQParser",
"altquerystring":null,
"boost_queries":null,
"parsed_boost_queries":[],
"boostfuncs":null,
"filter_queries":["an_sas_s:\"Photo System \\- P1\" AND an_security_group_s:\"P1.Leaders\""],
"parsed_filter_queries":["+an_sas_s:Photo System - P1 +an_security_group_s:P1.Leaders"],
You can see that "and" is being included as a search term in our fields. I'm not sure why it's doing this. I have found that if I switch to the dismax parser, it works fine. I'm new to developing and working with Solr, but my understanding is that edismax is the better parser to use and is considerably more advanced. Could this be a configuration issue with the request handler, or something else?

PostgreSQL JSONB versus keeping a separate table

I'm thinking through a database design and was wondering if anyone could chime in. I have some structured data that I will occasionally filter against some somewhat unstructured data. I'm thinking a lot about performance, so I'm trying to keep it as denormalized as possible. Do people have opinions about an indexed JSONB column versus a separate table? For example:
| (smurfs) id | name | filters (GIN index) |
|-------------+--------+-------------------------------------|
| 1 | Papa | { "color": "blue" } |
| 2 | Brainy | { "brain": "big", "color": "blue" } |
And I'd query against the indexed JSONB data.
Or:
| (smurfs) id | name |
|-------------+--------|
| 1 | Papa |
| 2 | Brainy |
| (filters) id | smurf_id | filter_type | filter_value |
|--------------+----------+-------------+--------------|
| 1 | 1 | color | blue |
| 2 | 2 | brain | big |
| 3 | 2 | color | blue |
and I'd JOIN the filters with the data for my query.
There's a lot of talk about misuse of JSON in relational databases. Does this fit into that category? Would one be preferable over the other from a good design standpoint? Is one more performant? I'm trying to optimize for reads on a large table. It seems like in the second case I'd have 2 large tables instead of one.
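To make the two designs concrete, here is a small Python sketch of the same filter ("every smurf matching all of these key/value pairs") written against each shape. It is an illustration under stated assumptions, not a performance comparison: the JSONB side is modeled with plain dicts (in PostgreSQL it would be a containment query such as `WHERE filters @> '{"color": "blue"}'`, which a GIN index on the jsonb column can serve), and the separate-table side uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The function names are hypothetical.

```python
import sqlite3

# Design 1: one JSON document per smurf, modeled with dicts (flat objects only).
smurfs = {1: ("Papa", {"color": "blue"}), 2: ("Brainy", {"brain": "big", "color": "blue"})}


def jsonb_style(wanted):
    """Names whose filters contain every wanted key/value pair."""
    return sorted(name for name, filters in smurfs.values()
                  if all(filters.get(k) == v for k, v in wanted.items()))


# Design 2: a separate key/value table joined back to smurfs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smurfs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE filters (id INTEGER PRIMARY KEY, smurf_id INTEGER REFERENCES smurfs(id),
                          filter_type TEXT, filter_value TEXT);
    INSERT INTO smurfs VALUES (1, 'Papa'), (2, 'Brainy');
    INSERT INTO filters VALUES (1, 1, 'color', 'blue'), (2, 2, 'brain', 'big'),
                               (3, 2, 'color', 'blue');
""")


def table_style(wanted):
    """AND-ing several conditions: count matching rows per smurf and require all of them.
    Assumes (smurf_id, filter_type) is unique and `wanted` is non-empty."""
    conditions = " OR ".join("(f.filter_type = ? AND f.filter_value = ?)" for _ in wanted)
    params = [x for pair in wanted.items() for x in pair]
    sql = ("SELECT s.name FROM smurfs s JOIN filters f ON f.smurf_id = s.id "
           f"WHERE {conditions} GROUP BY s.id, s.name HAVING COUNT(*) = ? ORDER BY s.name")
    return [name for (name,) in conn.execute(sql, params + [len(wanted)])]


print(jsonb_style({"color": "blue"}), table_style({"color": "blue"}))
print(jsonb_style({"color": "blue", "brain": "big"}), table_style({"color": "blue", "brain": "big"}))
```

The sketch shows a structural difference rather than a speed difference: with the separate table, each extra AND condition changes the query shape (a GROUP BY/HAVING or another join), whereas the containment form stays a single predicate.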
