Solr field query: skip records based on a condition

I have a lot of products in Solr, all of them with a field_code and a field_stock, and I want to exclude records like the following with a field query:
-field_code:(A OR B) OR (-(field_code:C AND field_stock:false) AND -(field_code:D AND field_stock:false))
So all products with field_code either A or B,
OR
all products with field_code C or D that are not in stock, should be excluded.
Everything else should be returned.
Update:
I have added another field to the query, and it does not work.
I have field_code, field_stock and the new field_type.
I need to remove all products from the query that:
do not have field_code = A OR B OR C OR D
and have
field_stock = 0 (-field_stock:1)
AND
field_type = 'joe'
So something like:
SELECT * FROM table WHERE (field_code NOT IN (A,B,C,D) AND field_type = 'joe' AND field_stock = 0)
So that all records are returned except the above. Does that make sense?

When using a negative query you have to subtract the query from something - just having a negative query won't do anything by itself (if you ONLY have a single, negative query without any boolean operators Solr helpfully prefixes the set of all documents in front, since that's probably what you meant).
Something like this would probably match what you're describing:
(*:* -field_code:(A OR B)) OR
(*:* -(field_code:(C OR D) AND field_stock:false))
I'm assuming that field_code is multi-valued, since the second term after OR won't make sense otherwise.
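For the updated requirement (exclude products that have none of the codes A-D, have field_type 'joe' and are out of stock), the same subtract-from-*:* technique should work. A sketch, assuming the field names and values from the update, and noting that the pure-negative sub-clause also needs *:* prepended:
*:* -((*:* -field_code:(A OR B OR C OR D)) AND field_type:joe AND field_stock:0)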

Related

MSSQL select query with prioritized OR

I need to build one MSSQL query that selects one row that is the best match.
Ideally, we have a match on street, zip code and house number.
Only if that does not deliver any results is a match on just street and zip code sufficient.
I have this query so far:
SELECT TOP 1 * FROM realestates
WHERE
(Address_Street = '[Street]'
AND Address_ZipCode = '1200'
AND Address_Number = '160')
OR
(Address_Street = '[Street]'
AND Address_ZipCode = '1200')
MSSQL currently gives me the result where the Address_Number is NOT 160, so it seems like the 2nd clause (where only street and zipcode have to match) is taking precedence over the 1st. If I switch around the two OR clauses, same result :)
How could I prioritize the first OR clause, so that MSSQL stops looking for other results if we found a match where the three fields are present?
The problem here isn't the WHERE (though it is a "problem"), it's the lack of an ORDER BY. You have a TOP (1), but you have nothing that tells the data engine which row is the "top" row, so an arbitrary row is returned. You need to provide logic in the ORDER BY to tell the data engine which is the "first" row. With the rudimentary logic you have in your question, this would likely be:
SELECT TOP (1)
       {Explicit Column List}
FROM realestates
WHERE Address_Street = '[Street]'
  AND Address_ZipCode = '1200'
ORDER BY CASE Address_Number WHEN '160' THEN 1 ELSE 2 END;
You can't prioritize anything in the WHERE clause. It always results in ALL the matching rows. What you can do is use TOP or FETCH to limit how many results you will see.
However, in order for this to be effective, you MUST have an ORDER BY clause. SQL tables are unordered sets by definition. This means without an ORDER BY clause the database is free to return rows in any order it finds convenient. Mostly this will be the order of the primary key, but there are plenty of things that can change this.
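If you prefer the FETCH form mentioned above, a sketch using the same table and columns from the question (SQL Server 2012 and later):
SELECT *
FROM realestates
WHERE Address_Street = '[Street]'
  AND Address_ZipCode = '1200'
ORDER BY CASE Address_Number WHEN '160' THEN 1 ELSE 2 END
OFFSET 0 ROWS FETCH FIRST 1 ROW ONLY;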

Creating multicolumn indexes in PostgreSQL

I'm researching now on creating indexes for our tables.
I found out about multicolumn indexes but I'm not sure on the impact.
Example:
We have SQL queries for findById, findByIdAndStatus, and findByResult.
I read that the column used most often in WHERE clauses should be listed first in the column list. But I was wondering if it will have a huge impact if I create indexes for the different combinations of WHERE clauses.
This: (creating one index for all)
CREATE INDEX CONCURRENTLY ON Students (id, status, result)
vs.
This: (creating different indexes on different queries)
CREATE INDEX CONCURRENTLY ON Students (id)
CREATE INDEX CONCURRENTLY ON Students (id, status)
CREATE INDEX CONCURRENTLY ON Students (result)
Thank you so much in advance!
Creating one index for all queries and creating different indexes will have completely different impacts on the queries.
You can use EXPLAIN to see if indexes are getting used for the queries.
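For example, a quick check against the Students table from the question (with hypothetical values) looks like this; the plan shows whether an index scan or a sequential scan was chosen:
EXPLAIN ANALYZE SELECT * FROM Students WHERE id = 42 AND status = 'active';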
This video is really good for learning about DB indexes.
The index CREATE INDEX CONCURRENTLY ON Students (id, status, result) will be used only if the query uses id, (id, status) or (id, status, result) in the WHERE clause. A query with only status in the WHERE clause will not use this index at all.
Indexes are basically balanced trees (B-trees). A multicolumn index orders rows by id first; rows with the same id are then further ordered by status, and then by result, and so on.
You can see that in this index, an ordering by status alone is not present at all; it is only available within rows already ordered by id.
Do have a look at the video; it explains all this pretty well.
The rule of thumb you read is wrong.
A better rule is: create such an index only if it is useful and gets used often enough that it is worth the performance hit on data modification that comes with every index.
A multi-column B-tree index on (a, b, c) is useful in several cases:
if the query looks like this:
SELECT ... FROM tab
WHERE a = $1 AND b = $2 AND c <operator> $3
where <operator> is an operator supported by the index and $1, $2 and $3 are constants.
if the query looks like this:
SELECT ... FROM tab
WHERE a = $1 AND b = $2
ORDER BY c;
or like this
SELECT ... FROM tab
WHERE a = $1
ORDER BY b, c;
Any ASC/DESC modifiers in the ORDER BY clause must be reflected in the CREATE INDEX statement. For example, for ORDER BY b, c DESC the index must be created on (a, b, c DESC) or (a, b DESC, c) (indexes can be read in both directions).
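As a sketch with the same placeholder names, the first of those two choices would be:
CREATE INDEX ON tab (a, b, c DESC);
-- can serve: SELECT ... FROM tab WHERE a = $1 ORDER BY b, c DESC;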
if the query looks like this:
SELECT c
FROM tab
WHERE a = $1 AND b <operator> $2;
If the table is freshly VACUUMed, this can get you an index-only scan, because all required information is in the index.
In recent PostgreSQL versions, such an index is better created as
CREATE INDEX ON tab (a, b) INCLUDE (c);
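With that covering index and a recently vacuumed table, the plan for the query above should report an index-only scan; a sketch of the check, using the same placeholder names:
EXPLAIN SELECT c FROM tab WHERE a = 1 AND b > 10;
-- look for "Index Only Scan using ..." in the plan output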

Lucene Query syntax using Boolean Clauses

I have two fields in Lucene
type (can contain values like X, Y, Z)
date (contains values like 2015-18-10 etc)
I want to write the following query: (type = X and date = today's date) OR (type = anything except X).
How can I write this query using SHOULD, MUST and MUST_NOT? It looks like there is no clause for this type of query.
You can express the latter part using *:* -type:X, as this creates the set of all documents, and then subtracts the set of documents that has type:X. The *:* query is represented as MatchAllDocsQuery in code.
If I understand your problem correctly, the solution is just a combination of BooleanQuery clauses; the following is code written in Scala to address the issue.
According to the documentation (in BooleanClause.java), MUST_NOT should be used with caution:
Use this operator for clauses that must not appear in the matching documents. Note that it is not possible to search for queries that only consist of a MUST_NOT clause.
import org.apache.lucene.index.Term
import org.apache.lucene.search.{BooleanClause, BooleanQuery, MatchAllDocsQuery, TermQuery}

object LuceneTest extends App {
  val query = new BooleanQuery

  val subQuery1 = new BooleanQuery
  subQuery1.add(new TermQuery(new Term("type", "xx")), BooleanClause.Occur.MUST)
  subQuery1.add(new TermQuery(new Term("date", "yy")), BooleanClause.Occur.MUST)

  val subQuery2 = new BooleanQuery
  // As mentioned above, MatchAllDocsQuery is added so the clause doesn't consist only of MUST_NOT
  subQuery2.add(new MatchAllDocsQuery, BooleanClause.Occur.MUST)
  subQuery2.add(new TermQuery(new Term("type", "xx")), BooleanClause.Occur.MUST_NOT)

  // subQuery1 and subQuery2 are the two sub-queries;
  // OR (i.e. SHOULD in Lucene) combines them
  query.add(subQuery1, BooleanClause.Occur.SHOULD)
  query.add(subQuery2, BooleanClause.Occur.SHOULD)

  query
}
Anyway, hope it helps.

Equivalent of "IN" that uses AND instead of OR logic?

I know I'm breaking some rules here with dynamic SQL but I still need to ask. I've inherited a table that contains a series of tags for each ticket that I need to pull records from.
Simple example... I have an array that contains "'Apples','Oranges','Grapes'" and I am trying to retrieve all records that contain ALL items contained within the array.
My SQL looks like this:
SELECT * FROM table WHERE basket IN ( " + fruitArray + " )
Which of course would be the equivalent of:
SELECT * FROM table WHERE basket = 'Apples' OR basket = 'Oranges' OR basket = 'Grapes'
I'm curious if there is a function that works the same as IN ( array ) except that it uses AND instead of OR so that I can obtain the same results as:
SELECT * FROM table WHERE basket LIKE '%Apples%' AND basket LIKE '%Oranges%' AND basket LIKE '%Grapes%'
I could probably just generate the entire string manually, but would like a more elegant solution if at all possible. Any help would be appreciated.
This is a very common problem in SQL. There are basically two solutions:
Match all rows in your list, group by a column that has a common value on all those rows, and make sure the count of distinct values in the group is the number of elements in your array.
SELECT basket_id FROM baskets
WHERE basket IN ('Apples','Oranges','Grapes')
GROUP BY basket_id
HAVING COUNT(DISTINCT basket) = 3
Do a self-join for each distinct value in your array; only then can you compare values from multiple rows in one WHERE expression.
SELECT b1.basket_id
FROM baskets b1
INNER JOIN baskets b2 USING (basket_id)
INNER JOIN baskets b3 USING (basket_id)
WHERE (b1.basket, b2.basket, b3.basket) = ('Apples','Oranges','Grapes')
There may be something like that in full-text search, but in general I sincerely doubt such an operator would be very useful outside of combination with LIKE.
Consider:
SELECT * FROM table WHERE basket = 'Apples' AND basket = 'Oranges'
it would always match zero rows.
If basket is a string, as your example suggests, then the closest you could get would be to use LIKE '%apples%oranges%grapes%', which could be built easily with '%'.implode('%', $tags).'%'
The issue with this is that some of the 'tags' might be contained in other tags, e.g. 'supercalifragilisticexpialidocious' LIKE '%super%' will be true.
If you need to do LIKE comparisons, I think you're out of luck. If you are doing exact comparisons involving matching sets in arbitrary order, you should look into the INTERSECT and EXCEPT options of the SELECT statement. They're a bit confusing, but can be quite powerful. (You'd have to parse your delimited strings into tabular format, but of course you're doing that anyway, aren't you?)
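A sketch of the INTERSECT approach, assuming the tags have been parsed into a hypothetical ticket_tags table with one (ticket_id, tag) row per tag:
SELECT ticket_id FROM ticket_tags WHERE tag = 'Apples'
INTERSECT
SELECT ticket_id FROM ticket_tags WHERE tag = 'Oranges'
INTERSECT
SELECT ticket_id FROM ticket_tags WHERE tag = 'Grapes';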
Are the items you're searching for always in the same order within the basket? If yes, a single LIKE should suffice:
SELECT * FROM table WHERE basket LIKE '%Apples%Oranges%Grapes%';
And concatenating your array into a string with % separators should be trivial.

Datastore Query filtering on list

Select all records whose ID is not in the list.
How can I do something like:
query = Story.all()
query.filter('ID **NOT IN** =', [100,200,..,..])
There's no way to do this efficiently in App Engine. You should simply select everything without that filter, and filter out any matching entities in your code.
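A minimal sketch of that client-side filtering, assuming the db-style Story model from the question and a hypothetical excluded_ids list:
excluded_ids = set([100, 200])
stories = [story for story in Story.all() if story.key().id() not in excluded_ids]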
This is now supported via GQL query
The 'IN' and '!=' operators in the Python runtime are actually implemented in the SDK and translate to multiple queries 'under the hood'.
For example, the query "SELECT * FROM People WHERE name IN ('Bob', 'Jane')" gets translated into two queries, equivalent to running "SELECT * FROM People WHERE name = 'Bob'" and "SELECT * FROM People WHERE name = 'Jane'" and merging the results. Combining multiple disjunctions multiplies the number of queries needed, so the query "SELECT * FROM People WHERE name IN ('Bob', 'Jane') AND age != 25" generates a total of four queries, for each of the possible conditions (age less than or greater than 25, and name is 'Bob' or 'Jane'), then merges them together into a single result set.
source: appengine blog
This is an old question, so I'm not sure if the ID is a non-key property. But in order to answer this:
query = Story.all()
query.filter('ID **NOT IN** =', [100,200,..,..])
With ndb models, you can definitely query for items that are in a list. For example, see the docs here for IN and !=. Here's how to filter as the OP requested:
query = Story.query().filter(Story.id.IN([100,200,..,..]))
We can even query for items that are in a list of repeated keys:
def all(user_id):
    # See if my user_id is associated with any Group.
    groups_belonged_to = Group.query().filter(Group.members == user_id)
    print [group.to_dict() for group in groups_belonged_to]
Some caveats:
There are docs out there that mention that in order to perform these types of queries, Datastore performs multiple queries behind the scenes, which (1) might take a while to execute, (2) take longer if you are searching in repeated properties, and (3) will increase your costs with more operations.
