Find relevance of search results in Azure Search - azure-cognitive-search

I have an Azure Search index containing address information of users, with fields and corresponding weights as follows:
weights = @{
HouseNumber = '40'
StreetName = '36'
City = '30'
PostalCode = '29'
Province = '25'
Country = '21'
FSA = '20'
Plus4 = '16'
SuiteName = '12'
SuiteRange = '11'
StreetPost = '10'
StreetPre = '8'
StreetSuffix = '6'
}
I am using searchMode=any for querying. How can I decide whether the record with the maximum score is the most relevant one? If the user doesn't enter all the keywords of the address, the relevance of the records may vary. E.g., a keyword like '1A1' can be part of the postal code 'A1A 1A1' or it may be a house number. The query will return both records, but with different scores. How should I handle this?

If a query's terms can match multiple fields (e.g. if '1A1' can match results in both the PostalCode and the HouseNumber fields), then the scoring profile is working as expected by boosting each respective match.
You should instrument the application so that the query is field-scoped. That way, each part of the query searches against the proper field and matches are boosted accordingly.
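For example, with the full Lucene query syntax (queryType=full), each part of the query can be scoped to the field it belongs to. A sketch of a REST query string, assuming your application has already split the user's input into address parts (the field names are the ones from the index above):
search=HouseNumber:40 AND PostalCode:"A1A 1A1"&queryType=full&searchMode=any
An ambiguous fragment like '1A1' is then matched only against the field the user actually filled in, instead of being scored wherever it happens to occur.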

Related

union request and pagination in cakephp4

I made two requests. The first one gives me 2,419 results, which I store in $requestFirst. The second gives 1 result, which I store in $requestTwo.
I make a union:
$requestTot = $requestFirst->union($requestTwo);
The total of $requestTot is 2,420 results, so all is well so far.
Then:
$request = $this->paginate($requestTot);
$this->set(compact('request'));
And here is what I don't understand: on each page of the pagination I find the result of $requestTwo. Moreover, the pagination displays:
Page 121 of 121, showing 20 record(s) out of 2,420 total
This is the right number of results, except that when I multiply the number of results per page by the number of pages I get 2,540, which is the total number of results plus one per page.
Can anyone explain?
Check the generated SQL in Debug Kit's SQL panel; you should see that the LIMIT and OFFSET clauses are being set on the first query, not appended as global clauses that would affect the whole unionized query.
It will look something like this:
(SELECT id, title FROM a LIMIT 20 OFFSET 0)
UNION
(SELECT id, title FROM b)
So what happens is that pagination is only applied to the $requestFirst query, and the $requestTwo query is unionized on top of it each and every time, hence you see its result on every single page.
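For pagination to work correctly, the limit would have to apply to the whole union instead; roughly speaking, the generated SQL would need to look like this (a sketch, using the same a and b tables as above):
SELECT id, title FROM (
    (SELECT id, title FROM a)
    UNION
    (SELECT id, title FROM b)
) wrapper
LIMIT 20 OFFSET 0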
A workaround for this current limitation is to use the union query as a subquery, or as a common table expression, from which to fetch the results. For this to work you need to make sure that the fields of the queries used for the union are selected without aliasing! This can be achieved either by using Table::subquery():
$requestFirst = $this->TableA
    ->subquery()
    ->select(['a', 'b'])
    // ...
$requestTwo = $this->TableB
    ->subquery()
    ->select(['c', 'd'])
    // ...
or by explicitly selecting the fields with aliases equal to the column names:
$requestFirst = $this->TableA
    ->find()
    ->select(['a' => 'a', 'b' => 'b'])
    // ...
$requestTwo = $this->TableB
    ->find()
    ->select(['c' => 'c', 'd' => 'd'])
    // ...
Then you can safely use those queries for a union as a subquery:
$union = $requestFirst->union($requestTwo);
$wrapper = $this->TableA
    ->find()
    ->from([$this->TableA->getAlias() => $union]);
$request = $this->paginate($wrapper);
or as a common table expression (in case your DBMS supports them):
$union = $requestFirst->union($requestTwo);
$wrapper = $this->TableA
    ->find()
    ->with(function (\Cake\Database\Expression\CommonTableExpression $cte) use ($union) {
        return $cte
            ->name('union_source')
            ->field(['a', 'b'])
            ->query($union);
    })
    ->select(['a', 'b'])
    ->from([$this->TableA->getAlias() => 'union_source']);
$request = $this->paginate($wrapper);

Cypher statement with distinct match conditions is returning the same result

I am using Neo4j as a database to store voting information related to another database object.
I have a Vote object which has fields:
type:String with values of UP or DOWN.
argId:String, a string ID value linking to a unique argument object.
I am trying to query the number of votes assigned to a given argId using the following queries:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='DOWN'
RETURN {downvotes: COUNT(v)} AS votes
UNION
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
RETURN {upvotes: COUNT(v)} AS votes
Note that the above Cypher works and returns the expected result, like so:
[
  {
    "downvotes": 1
  },
  {
    "upvotes": 10
  }
]
But I feel like the query could be a bit neater and want to write something like this:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN {upvotes: COUNT(v), downvotes: COUNT(b)}
Just reading it through, I think it makes sense: b and v are declared as separate variables, so all should be good (or so I thought).
But running it gives me this:
{
  "upvotes": 10,
  "downvotes": 10
}
But it should match the result above.
Why is this?
I'm kinda new to Neo4j and Cypher, so I've probably not fully understood how Cypher works.
Can anyone shine any light?
Thank you!
p.s. I'm using Neo4j 3.5.6 and running the queries via the Desktop web browser app.
I think if you run this query you will get a clearer picture of what is happening. Your query produces a Cartesian product of the upvotes (10) and the downvotes (1). The product is a result set of 10 rows. When they are subsequently counted, there are ten of each.
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN v.type, b.type
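With the sample counts from the question (10 UP votes and 1 DOWN vote), that diagnostic query returns one row per (v, b) combination, i.e. ten rows along these lines:
v.type | b.type
-------+--------
"UP"   | "DOWN"
"UP"   | "DOWN"
...    (10 rows in total)
Counting v and b over those rows therefore yields 10 for each, which is exactly the result you saw.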
In order to get the result you want, you need to filter the values and count them individually.
Rather than two MATCH statements, use a single MATCH statement that retrieves all of the values of interest, and then use a conditional expression to filter them into upvotes and downvotes buckets.
Something like this may suit you.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
RETURN {
  upvotes: count(CASE WHEN v.type = 'UP' THEN 1 END),
  downvotes: count(CASE WHEN v.type = 'DOWN' THEN 1 END)
} AS vote_result
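With the vote counts from the question (10 UP, 1 DOWN), this should return:
{
  "upvotes": 10,
  "downvotes": 1
}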
Using APOC, you could do something like this, whereby the type values themselves are used as aggregation keys and apoc.map.fromPairs then converts the collected pairs into a map with the types as its keys.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
WITH [v.type, count(*)] AS vote_pair
RETURN apoc.map.fromPairs(collect(vote_pair)) AS votes
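Note that the keys of the resulting map are the raw type values here, so the same data would come back as:
{
  "UP": 10,
  "DOWN": 1
}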

Filter SQL datatable according to different parameters, without a WHERE clause

I'm building an application that needs to allow the user to filter a data table according to different filters. The user will have three different filter possibilities, but they might use only one, or two, or all three of them at the same time.
So, let's say I have the following columns on the table:
ID (int) PK
Sede (int)
Programa (int)
Estado (int)
All of those columns store integers. The "ID" column is the primary key, "Sede" stores 1 or 2, "Programa" is any number between 1 and 15, and "Estado" stores numbers between 1 and 13.
The user may filter the data stored in the table using any of those filters (Sede, Programa or Estado). But they might, as well, use two filters, or all three of them at the same time.
The idea is that this application works like the data filters in Excel. I created a simulated table in Excel to show what I want to achieve:
This first image shows the whole table, without applying any filter.
Here, the user selected a filter for "Sede" and "Programa" but left the "Estado" filter empty. So the query returns the values that are equal to the filters, leaves the "Estado" filter open, and brings back all the records, filtering only by "Sede" (1) and "Programa" (6).
In this image, the user only selected the "Estado" filter (5), so it brings back all the records that match this criterion; it doesn't matter what "Sede" or "Programa" contain.
If I use a SELECT clause with a WHERE on it, it works, but only if all three filters have a value:
DECLARE @sede int
DECLARE @programa int
DECLARE @estado int

SET @sede = 1
SET @programa = 5
SET @estado = 12

SELECT * FROM [dbo].[Inscripciones]
WHERE
    ([dbo].[Inscripciones].[Sede] = @sede)
    AND
    ([dbo].[Inscripciones].[Programa] = @programa)
    AND
    ([dbo].[Inscripciones].[Estado] = @estado)
I also tried changing the "AND" to "OR", but I can't get the desired result.
Any help will be highly appreciated! Thanks!
This is a common problem: try using COALESCE on the variable, and for the second value use the field name you're comparing to. Be careful though: make sure it's NULL and not an empty string being passed!
What this does is take the first non-null value out of the variable passed in and the value you're comparing to. Thus, if the value passed in is null, the comparison will always return true.
WHERE
    ([dbo].[Inscripciones].[Sede] = COALESCE(@sede, [dbo].[Inscripciones].[Sede]))
    AND
    ([dbo].[Inscripciones].[Programa] = COALESCE(@programa, [dbo].[Inscripciones].[Programa]))
    AND
    ([dbo].[Inscripciones].[Estado] = COALESCE(@estado, [dbo].[Inscripciones].[Estado]))
If @sede is null while @programa and @estado are populated, the comparisons would effectively be:
Sede = Sede (i.e. 1=1, always true)
Programa = the @programa value passed in
Estado = the @estado value passed in
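Putting it together, a minimal sketch of filtering only by Estado (the other parameters left as NULL) would be:
DECLARE @sede int = NULL
DECLARE @programa int = NULL
DECLARE @estado int = 5

SELECT * FROM [dbo].[Inscripciones]
WHERE
    ([dbo].[Inscripciones].[Sede] = COALESCE(@sede, [dbo].[Inscripciones].[Sede]))
    AND
    ([dbo].[Inscripciones].[Programa] = COALESCE(@programa, [dbo].[Inscripciones].[Programa]))
    AND
    ([dbo].[Inscripciones].[Estado] = COALESCE(@estado, [dbo].[Inscripciones].[Estado]))
One caveat: rows where the filtered column itself is NULL will never match this pattern, since NULL = NULL does not evaluate to true in SQL.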
Good luck!
Thank you all for your answers. After reading the article posted in the comments by @SeanLange, I was finally able to achieve what was needed. Using a CASE expression in the WHERE clause does the trick. Here's the code:
SELECT *
FROM [dbo].[Inscripciones]
WHERE
    ([dbo].[Inscripciones].[Sede] = (CASE WHEN @sede = '' THEN [dbo].[Inscripciones].[Sede] ELSE @sede END))
    AND
    ([dbo].[Inscripciones].[Programa] = (CASE WHEN @programa = '' THEN [dbo].[Inscripciones].[Programa] ELSE @programa END))
    AND
    ([dbo].[Inscripciones].[Estado] = (CASE WHEN @estado = '' THEN [dbo].[Inscripciones].[Estado] ELSE @estado END))
    AND
    ([dbo].[Inscripciones].[TipoIngreso] = (CASE WHEN @tipoingreso = '' THEN [dbo].[Inscripciones].[TipoIngreso] ELSE @tipoingreso END))
Thanks again!!

In ArangoDB, will querying, with filters, from the neighbor(s) be done in O(n)?

I've been reading AQL Graph Operations and Graphs, and have found no concrete example or performance explanation for the SQL-Traverse use case.
E.g.:
If I have a collection Users, which has a company relation to the collection Company;
the collection Company has a location relation to the collection Location;
and the collection Location is either a city, country, or region, and has relations city, country, region to itself.
Now, I would like to query all users who belong to companies in Germany or in the EU.
SELECT from Users where Users.company.location.city.country.name="Germany";
SELECT from Users where Users.company.location.city.parent.name="Germany";
or
SELECT from Users where Users.company.location.city.country.region.name="europe";
SELECT from Users where Users.company.location.city.parent.parent.name="europe";
Assuming that Location.name is indexed, can I have the two queries above executed with O(n), with n being the number of documents in Location (O(1) for graph traversal, O(n) for index scanning)?
Of course, I could just save regionName or countryName directly in company, as these cities and countries are in the EU and, unlike in... other places, probably won't change, but what if... you know what I mean (kidding; what if I have other use cases which require constant updates).
I'm going to explain this using the ArangoDB 2.8 Traversals.
We create these collections to match your schema using arangosh:
db._create("countries")
db.countries.save({_key:"Germany", name: "Germany"})
db.countries.save({_key:"France", name: "France"})
db.countries.ensureHashIndex("name")
db._create("cities")
db.cities.save({_key: "Munich"})
db.cities.save({_key: "Toulouse")
db._create("company")
db.company.save({_key: "Siemens"})
db.company.save({_key: "Airbus"})
db._create("employees")
db.employees.save({lname: "Kraxlhuber", cname: "Xaver", _key: "user1"})
db.employees.save({lname: "Heilmann", cname: "Vroni", _key: "user2"})
db.employees.save({lname: "Leroy", cname: "Marcel", _key: "user3"})
db._createEdgeCollection("CityInCountry")
db._createEdgeCollection("CompanyIsInCity")
db._createEdgeCollection("WorksAtCompany")
db.CityInCountry.save("cities/Munich", "countries/Germany", {label: "beautiful South near the mountains"})
db.CityInCountry.save("cities/Toulouse", "countries/France", {label: "crowded city at the mediteranian Sea"})
db.CompanyIsInCity.save("company/Siemens", "cities/Munich", {label: "darfs ebbes gscheits sein? Oder..."})
db.CompanyIsInCity.save("company/Airbus", "cities/Toulouse", {label: "Big planes Ltd."})
db.WorksAtCompany.save("employees/user1", "company/Siemens", {employeeOfMonth: true})
db.WorksAtCompany.save("employees/user2", "company/Siemens", {veryDiligent: true})
db.WorksAtCompany.save("employees/user3", "company/Eurocopter", {veryDiligent: true})
In AQL we would write this query the other way around.
We start with the constant-time FILTER on the indexed attribute name, and run our traversals from there on.
Therefore we filter for the country "Germany":
db._explain("FOR country IN countries FILTER country.name == 'Germany' RETURN country ")
Query string:
FOR country IN countries FILTER country.name == 'Germany' RETURN country
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR country IN countries /* hash index scan */
5 ReturnNode 1 - RETURN country
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash countries false false 66.67 % [ `name` ] country.`name` == "Germany"
Optimization rules applied:
Id RuleName
1 use-indexes
2 remove-filter-covered-by-index
Now that we have our well-filtered start node, we do a graph traversal in the reverse direction. Since we know that employees are exactly 3 steps away from the start vertex, and we're not interested in the path, we only return the third layer:
db._query("FOR country IN countries FILTER country.name == 'Germany' FOR v IN 3 INBOUND country CityInCountry, CompanyIsInCity, WorksAtCompany RETURN v")
[
  {
    "cname" : "Xaver",
    "lname" : "Kraxlhuber",
    "_id" : "employees/user1",
    "_rev" : "1286703864570",
    "_key" : "user1"
  },
  {
    "cname" : "Vroni",
    "lname" : "Heilmann",
    "_id" : "employees/user2",
    "_rev" : "1286729095930",
    "_key" : "user2"
  }
]
Some words about this query's performance:
We locate Germany using the hash index in constant time -> O(1)
Based on that, we traverse m paths, where m is the number of employees in Germany; each of them can be traversed in constant time -> O(m) at this step.
We return the result in constant time -> O(1)
All combined, we need O(m), where we expect m to be less than n (the total number of employees) as used in your SQL-Traverse.
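For the region case from the question, the same pattern would just go one step deeper. A sketch, assuming a hypothetical regions collection (with a hash index on name) and a CountryInRegion edge collection, neither of which is created above:
FOR region IN regions FILTER region.name == 'europe'
    FOR v IN 4 INBOUND region CountryInRegion, CityInCountry, CompanyIsInCity, WorksAtCompany
        RETURN v
As before, the cost is dominated by the number of matching paths, not by the total size of the location data.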

Dapper: How to get value from DapperRow if column name is "count(*)"?

I have a dynamic result from Dapper query that contains records like this:
{DapperRow, billing_currency_code = 'USD', count(*) = '6'}
I'm able to access 'USD' by using rowVariable.billing_currency_code.
To get the value '6' I tried rowVariable["count(*)"] and rowVariable.kv["count(*)"], and unfortunately nothing works...
I can't change the count(*) column name in my case.
How do I get the value '6' from the rowVariable of type DapperRow in such a case?
If the column name genuinely is "count(*)", then you can cast the row to a dictionary:
var data = (IDictionary<string,object>)row;
object value = data["count(*)"];
For that to work (at least, in SQL Server), your query would need to be something like:
select count(*) as [count(*)]
However, in most cases the column doesn't have a name, in which case: fix your query ;p
Actually, I'd probably say fix your query anyway; the following would be much easier to work with:
select count(*) as [Count]
Suppose your data is as below (the debugger view of a DapperRow):
var details = {DapperRow, billing_currency_code = 'USD', count(*) = '6'}
Since the columns come back dynamically, take the first row:
var firstRow = details.FirstOrDefault();
To get the column headings of the data:
var headings = ((IDictionary<string, object>)firstRow).Keys.ToArray();
To get a value by its key:
var row = (IDictionary<string, object>)firstRow;
var value = row[headings[0]];
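Putting the dictionary approach together, a minimal self-contained sketch (the connection string, table, and column names are placeholders, not from the question):
using System;
using System.Collections.Generic;
using Dapper;
using Microsoft.Data.SqlClient;

class Program
{
    static void Main()
    {
        // hypothetical connection string and table
        using var conn = new SqlConnection("Server=.;Database=Billing;Trusted_Connection=True;");

        // the alias [count(*)] reproduces the awkward column name from the question
        var rows = conn.Query(
            "select billing_currency_code, count(*) as [count(*)] " +
            "from Invoices group by billing_currency_code");

        foreach (IDictionary<string, object> row in rows)
        {
            // DapperRow implements IDictionary<string, object>,
            // so the indexer reaches columns whose names aren't valid identifiers
            Console.WriteLine($"{row["billing_currency_code"]}: {row["count(*)"]}");
        }
    }
}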
