How can I calculate sub-aggregates with azure search?


In Azure Search I have documents in the following format, which come from the IoT realm:
device_identifier: string
event_level: int
I'd like to facet (group) the count by device_identifier and event_level together. I can do a basic facet on each individual field, but I can't figure out how to facet on multiple fields. Basically, I'm looking for a count of the different event_level values per device_identifier. So the result would be
device_identifier 1 - event_level 2 - count 100
device_identifier 1 - event_level 1 - count 00
...
Any help would be appreciated.

You can combine device_identifier and event_level into a single field and facet on that. However, be aware that faceting on fields with high cardinality can be slow.
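A minimal sketch of the combined-field idea, assuming the composite value is built at indexing time and the facet results come back as a list of value/count pairs. The separator and the helper names are hypothetical, not part of any Azure SDK:

```python
from collections import defaultdict

# Assumption: this separator never occurs inside an event level.
SEPARATOR = "|"


def composite_key(device_identifier: str, event_level: int) -> str:
    """Build the value stored in the combined (hypothetical) facet field."""
    return f"{device_identifier}{SEPARATOR}{event_level}"


def split_facet_counts(facets):
    """Turn facet entries like {"value": "dev|2", "count": 100}
    into a nested mapping {device_identifier: {event_level: count}}."""
    counts = defaultdict(dict)
    for facet in facets:
        # rpartition splits on the last separator, so a device identifier
        # that itself contains the separator is still recovered intact.
        device, _, level = facet["value"].rpartition(SEPARATOR)
        counts[device][int(level)] = facet["count"]
    return dict(counts)
```

Each document would be indexed with the combined value in an extra facetable field, and the facet counts for that field can then be split back into per-device, per-level counts on the client.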

Related

Dynamic facet limits using Solr

How can I group my Solr query results using a numeric field into x buckets, where the bucket start and end values are determined when the query is run?
For example, if I want to count and group documents into 5 buckets by a wordCount field, the results should be:
250-500 words: 3438 results
500-750 words: 4554 results
750-1000 words: 9854 results
1000-1250 words: 3439 results
1250-1500 words: 38 results
Solr's faceting API docs assume that the facet buckets are known in advance, but this isn't possible for numeric fields because the lower and upper buckets depend on the search results.
My current query (which doesn't work) is:
curl http://localhost:8983/solr/pages/query -d '
q=*:*&
rows=0&
json.facet={
  wordCount : {
    type : range,
    field : wordCount,
    start : max(wordCount),
    end : min(wordCount),
    gap : 1000
  }
}'
I have read this question, which suggests calculating the buckets in the application code before sending them to Solr for counting. This is not ideal because it involves querying the database multiple times. The answer is also several years out of date; since then Solr has added the JSON faceting API, which allows more complicated faceting settings.
In SQL, this type of dynamic bucketing is possible with union queries, in which each query in the union calculates a specific bucket's lower and upper bounds and counts the results in that bucket. So it seems odd that in Solr, where a lot of effort has gone into making faceting easy, this kind of query is not possible.
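For comparison, the application-side approach mentioned above can be sketched as follows. It assumes the minimum and maximum of wordCount were already fetched by an earlier request (not shown), and the helper name is hypothetical. Only the `type`, `field`, `start`, `end` and `gap` keys of the JSON range facet are used:

```python
import json
import math


def range_facet(field, lowest, highest, buckets):
    """Build a json.facet range definition that splits [lowest, highest]
    into `buckets` equal-width buckets."""
    # Round the gap up so the buckets always cover the highest value.
    gap = max(1, math.ceil((highest - lowest) / buckets))
    return {
        field: {
            "type": "range",
            "field": field,
            "start": lowest,
            "end": lowest + gap * buckets,
            "gap": gap,
        }
    }


# Value that would be sent as the json.facet parameter.
facet_param = json.dumps(range_facet("wordCount", 250, 1500, 5))
```

Whether a document lying exactly on the upper bound is counted depends on the range facet's include setting, so that is worth checking against real data.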

Can Solr be used to calculate the matching Percentage between documents?

I have a database of articles. Each text is between 500 and 2000 characters long, and I receive the data from a third party.
For new data, I have to check what percentage of it duplicates the data we already have. If the duplicate percentage is more than 50%, we have to reject the data; otherwise we insert it into the database.
Is it possible to calculate the duplicate percentage in Solr? If yes, how can we achieve this?
Thanks.

Solr doesn't work with a percentage of similarity but with the concept of a score. Until version 6, Solr computed the score using TF-IDF; if you're interested in how the score is calculated, you can refer to this document. Starting from version 6, the score is calculated using BM25, as described here.
So if you want to use Solr you'll need to follow one of the approaches below:
Adopt an approach based on score instead of percentage;
Build your own similarity class to work on percentage.

Can Solr or ElasticSearch return same results in different orders to different visitors for the same search criteria?

I am developing a Spring-based website and I need to use a search engine to provide "customized" search results. I am considering Solr or Elastic.
Here is what I mean by "customized".
Suppose I have two fields A and B to search against.
Suppose that there are two visitors and I am able to profile them by tracking their activities. Suppose visitor 1 constantly uses or searches for value a (of A) and visitor 2 value b (of B). Now both visitors search for records that satisfy A=a OR B=b.
Can Solr or Elastic return results in different order for visitor 1 and 2? I mean that for example, results with A=a are ahead of only B=b results for visitor 1? And the opposite for visitor 2?
I understand that I need to pass some signal to a search engine to ask the engine to give more "weight" to one of the fields.
Thanks and Regards.

It looks like you just need to give a different weight to the fields you're querying on depending on the user that's executing the query.
You could for example use a multi_match query with elasticsearch, which allows you to search on multiple fields giving them different weights as well. Here is an example that makes the fieldA more important:
{
  "multi_match" : {
    "query" : "this is a test",
    "fields" : [ "fieldA^3", "fieldB" ]
  }
}
That way the score is influenced by the weights that you put on the query, and if you sort by score (default) you get the results in the expected order. The weights assigned to the fields need some fine-tuning though depending on your documents and the query you execute.
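As a rough sketch of how the per-visitor weighting could be wired up, the query body can be generated from a mapping of field names to weights. The function name and the idea of storing weights per visitor profile are assumptions for illustration; only the multi_match structure comes from the example above:

```python
def weighted_multi_match(text, field_weights):
    """Build a multi_match query body, boosting each field by its weight.

    field_weights maps field name -> boost; a boost of 1 is left implicit.
    """
    fields = [
        name if weight == 1 else f"{name}^{weight}"
        for name, weight in field_weights.items()
    ]
    return {"multi_match": {"query": text, "fields": fields}}


# Hypothetical profiles: visitor 1 favours fieldA, visitor 2 favours fieldB.
visitor_1_query = weighted_multi_match("this is a test", {"fieldA": 3, "fieldB": 1})
visitor_2_query = weighted_multi_match("this is a test", {"fieldA": 1, "fieldB": 3})
```

The same search text then produces different boosts, and therefore potentially a different ordering, for each visitor.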

How to perform these SOLR queries?

I have indexed data (indexed using SolrJ from an RDBMS) with banking-related fields such as (sample): customerid, cust_name, accountno, amount, positions, pos_value, EOD_value, etc.
Now I want to run some searches on the data. The search queries are:
top 10 stocks/positions(based on stock value)
top 5 customers in decreasing order of amount in bank
which stock gained the max in a day (and the stock details)
lowest value of a stock in a particular time frame
How can I query for the above in Solr?
I did read about Function Query and Solr plugins but could not find much useful information.
Can we perform faceting on fields (amount, stock value, etc.) using math operations like average, sum, etc.?
I want to use the Velocity UI for these searches. What customization of its search box would be required?
Any ideas?

Solr is a high-performance text search engine, based on Lucene, an excellent token matching and scoring library. That said, the kind of queries you want to run will certainly work in one way or another with Solr, but you will have to provide Solr with all the data you want to search on. Solr will not compute min, max, or average values for you. Its job is to find, rank and sort as fast as possible over previously computed values.
The fields you have listed might not give you all the details you are looking for. You will need to index some more.
If the data you are looking for is in your index, the following queries might get you the answers, or should at least give you a hint on how to state them.
top 10 stocks/positions
q=*:* sort=stock_value DESC rows=10
This requires that stock_value is numeric and has the latest stock price in the index.
top 5 customers
This is pretty similar.
q=*:* sort=account_value DESC rows=5
which stock gained the max in a day
You will need to index the gain per day
q=date:1995-12-31T23:59:59.999Z sort=stock_gain DESC rows=1
lowest value of a stock in a particular time frame
q=symbol:abc123 AND date:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z] sort=stock_value ASC rows=1
See Solr Query Syntax for details on date queries.
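The queries above all follow the same pattern (filter, sort, limit), so the request parameters can be sketched with one small helper. This is only an illustration: the helper name is hypothetical, and the field names are the ones assumed in the answer (stock_value, account_value), which must exist in the index:

```python
from urllib.parse import urlencode


def sorted_query(query, sort_field, rows, descending=True):
    """Return Solr request parameters for a 'top N by field' search."""
    direction = "desc" if descending else "asc"
    return {"q": query, "sort": f"{sort_field} {direction}", "rows": rows}


top_stocks = sorted_query("*:*", "stock_value", 10)
top_customers = sorted_query("*:*", "account_value", 5)

# URL-encoded form, ready to append to a select URL.
top_stocks_query_string = urlencode(top_stocks)
```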

We have implemented the same thing in one of our applications.
In Browse.vm, under the navigators "div", we created a custom facet. When we click on that facet, it recreates the URL with the parameters mentioned by "phisch" in his answer.
Example:
We created a link called "Top 10 Stocks" in the facet section of the UI. When we click it, we create a URL with the following parameters added:
q=*:*&sort=stock_value DESC&rows=10
Please try this out at your end, as it is working fine at mine. Sorry, I cannot share the code as it is client confidential.

How to implement a complex token-matching algorithm in SOLR

Problem Description
I'm trying to implement a custom algorithm to match user provided free-text input, a company name such as "Ford Motor", against a reference data source consisting of 1.4 million company names.
The algorithm executes the following steps:
Step 1) Performs an "Exact Match", followed by a "Begins Match" and finally a "Contains Match" of the user-provided search input. Results from this step are also sorted in the same order.
Step 2) Performs a token-by-token match of the search input against the reference company name.
Every token is matched in the following order: Exact, Begins, Contains, Levenshtein Distance (< 0.2) and Refined Soundex.
E.g. if the user input is "Foord Motur Holding" and it's being matched against "The Ford Motor Holdings Company", then the first token "Foord" will match "Ford" based on the Soundex match, the second token "Motur" will match "Motor" based on the edit distance algorithm, and the last token "Holding" will match "Holdings" via the Begins match.
Scoring:
Every token match is first scored on a scale that rates the matching technique, with Exact match being the best and Soundex being the worst.
The overall score is calculated, on a scale of 0-100%, by calculating a weighted average of individual token-match scores. Weights are assigned based on index-order of token i.e. the first token has highest weight and last token has lowest.
My Partial Solution
I have implemented a simple schema in Solr to store reference company names: a String field (called companyName), a simple text field (called companyText) copied from the string, and another text field (called companySoundex) copied from the string and using PhoneticFilterFactory for Refined Soundex based matching.
I have been able to replicate step 1) in a single solr query.
For step 2) I plan to fire 3 parallel queries to solr server. First query performing a simple text search on companyText field, second query performing fuzzy match using ~ operator on companyText field and third query performing soundex match on companySoundex field. I plan to somehow combine the results from these 3 parallel queries to get desired final result.
Questions:
1) Is there a better way to replicate Step 2) of the original algorithm?
2) Even if I go with my "three-parallel-queries" approach, how do I get the "right" sorting order that the original algorithm produces?
I guess the main problem is how to compare the Solr scores from these 3 entirely different queries in order to combine the results.
Thanks for reading this long question. Any help/pointers would be greatly appreciated.

Look at the DisMax query parser. http://wiki.apache.org/solr/DisMaxRequestHandler
For each separate query, you'll actually build up separate fields in the index for matching. Then use DisMax to combine the queries in a weighted fashion.
I suggest giving up on your 3 parallel queries approach now. Last time I looked into this it was impossible to relate scores from 2 separate queries. It just doesn't work. If you want a single set of results sorted by score, you have to figure out how to do this in a single query.
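A minimal sketch of what a single weighted DisMax request could look like, assuming the three fields from the question's schema (companyName, companyText, companySoundex). The boost values and the helper name are purely illustrative and would need tuning against real data:

```python
def dismax_params(user_input, field_boosts):
    """Build request parameters for one DisMax query over several fields.

    field_boosts maps field name -> boost; higher boosts make matches in
    that field contribute more to the score.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "dismax", "q": user_input, "qf": qf}


# Illustrative boosts: exact-ish matches rank above phonetic ones.
params = dismax_params(
    "Ford Motor",
    {"companyName": 10, "companyText": 5, "companySoundex": 1},
)
```

Because all fields are scored inside one query, the results come back in a single ranked list and no cross-query score comparison is needed.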

IMHO, this functionality cannot be achieved with the out-of-the-box handlers that Solr provides. You would be better off writing a custom query handler that handles and scores the results in this manner.
