Lucene Query syntax using Boolean Clauses - solr

I have two fields in Lucene
type (can contain values like X, Y, Z)
date (contains values like 2015-18-10 etc)
I want to write following query: (type = X and date=today's data) OR (type = anything except X).
How can I write this query using SHOULD, MUST, MUST_NOT? looks like there is no clause for these type of query.

You can express the latter part using *:* -type:X, as this creates the set of all documents, and then subtracts the set of documents that has type:X. The *:* query is represented as MatchAllDocsQuery in code.

If I got your problem, I think the solution is just some combination of BooleanQuery, following is the code written in Scala to address the issue.
According to the documentation(in BooleanClause.java), MUST_NOT should be used with caution.
Use this operator for clauses that must not appear in the matching documents.Note that it is not possible to search for queries that only consist of a MUST_NOT clause.
object LuceneTest extends App {
val query = new BooleanQuery
val subQuery1 = new BooleanQuery
subQuery1.add(new TermQuery(new Term("type", "xx")), BooleanClause.Occur.MUST)
subQuery1.add(new TermQuery(new Term("date", "yy")), BooleanClause.Occur.MUST)
val subQuery2 = new BooleanQuery
// As mentioned above, so I put MatchAllDocsQuery here to avoid only containing MUST_NOT
subQuery2.add(new MatchAllDocsQuery, BooleanClause.Occur.MUST)
subQuery2.add(new TermQuery(new Term("type", "xx")),BooleanClause.Occur.MUST_NOT)
// subQuery1 and subQuery2 construct two subQueries respectively
// then use OR(i.e SHOULD in Lucene) to combine them
query.add(subQuery1, BooleanClause.Occur.SHOULD)
query.add(subQuery2, BooleanClause.Occur.SHOULD)
query
}
Anyway, hope it helps.

Related

Django query filter using large array of ids in Postgres DB

I want to pass a query in Django to my PostgreSQL database. When I filter my query using a large array of ids, the query is very slow and goes up to 70s.
After looking for an answer I saw this post which gives a solution to my problem, simply change the ARRAY [ids] in IN statement by VALUES (id1), (id2), ....
I tested the solution with a raw query in pgadmin, the query goes from 70s to 300ms...
How can I do the same command (i.e. not using an array of ids but a query with VALUES) in Django?
I found a solution building on #erwin-brandstetter answer using a custom lookup
from django.db.models import Lookup
from django.db.models.fields import Field
#Field.register_lookup
class EfficientInLookup(Lookup):
lookup_name = "ineff"
def as_sql(self, compiler, connection):
lhs, lhs_params = self.process_lhs(compiler, connection)
rhs, rhs_params = self.process_rhs(compiler, connection)
params = lhs_params + rhs_params
return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params
This allows to filter like this:
MyModel.objects.filter(id__ineff=<list-of-values>)
The trick is to transform the array to a set somehow.
Instead of (this form is only good for a short array):
SELECT *
FROM tbl t
WHERE t.tbl_id = ANY($1);
-- WHERE t.tbl_id IN($1); -- equivalent
$1 being the array parameter.
You can still pass an array like you had it, but unnest and join. Like:
SELECT *
FROM tbl t
JOIN unnest($1) arr(id) ON arr.id = t.tbl_id;
Or you can keep your query, too, but replace the array with a subquery unnesting it:
SELECT * FROM tbl t
WHERE t.tbl_id = ANY (SELECT unnest($1));
Or:
SELECT * FROM tbl t
WHERE t.tbl_id IN (SELECT unnest($1));
Same effect for performance as passing a set with a VALUES expression. But passing the array is typically much simpler.
Detailed explanation:
IN vs ANY operator in PostgreSQL
How to use ANY instead of IN in a WHERE clause with Rails?
Optimizing a Postgres query with a large IN
Is this an example of the first thing you're asking?
relation_list = list(ModelA.objects.filter(id__gt=100))
obj_query = ModelB.objects.filter(a_relation__in=relation_list)
That would be an "IN" command because you're first evaluating relation_list by casting it to a list, and then using it in your second query.
If instead you do the exact same thing, Django will only make one query, and do SQL optimization for you. So it should be more efficient that way.
You can always see the SQL command you'll be executing with obj_query.query if you're curious what's happening under the hood.
Hope that answers the question, sorry if it doesn't.
I had lots of trouble to make the custom lookup 'ineff' work.
I may have solved it, but would love some validation from Django and Postgres experts.
1) Using it 'directly' on a ForeignKey field (ModelB)
ModelA.objects.filter(ModelB__ineff=queryset_ModelB)
Throws the following exception:
"Related Field got invalid lookup: ineff"
ForeignKey fields cannot be used with custom lookups.
A similar issue is reported here:
Custom lookup is not being registered in Django
2) Using it 'indirectly' on the pk field of related model (ModelB.id)
ModelA.objects.filter(ModelB__id__ineff=queryset_ModelB.values_list('id', flat=True))
Throws the following exception:
"can only concatenate list (not "tuple") to list"
Looking at Django Traceback, I noticed that rhs_params is a tuple.
Yet we try to add it to lhs_params (a list) in our custom lookup.
Hence I changed:
params = lhs_params + rhs_params
into:
params = lhs_params + list(rhs_params)
3) I then got a Postgres error (at least I had passed Django ORM)
"function unnest(uuid) does not exist"
"HINT: No function matches the given name and argument types. You might need to add explicit type casts."
I apparently solved it by changing the sql:
from:
return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params
to:
return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params
Hence my final as_sql method looks like this:
def as_sql(self, compiler, connection):
lhs, lhs_params = self.process_lhs(compiler, connection)
rhs, rhs_params = self.process_rhs(compiler, connection)
params = lhs_params + list(rhs_params)
return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params
It seems to work, and is indeed faster than in__ (tested with EXPLAIN ANALYZE in Postgres).
But I would love to have some validation from experts, perhaps Erwin Brandstetter?
Thanks for your input.

Solr: how to get total number of results for grouped query using the Java API

I have the following:
43 documents indexed in Solr
If I use the Java API to do a query without any grouping, such as:
SolrQuery query = new SolrQuery("*:*");
query.setRows(10);
I can then obtain the total number of matching elements like this:
solrServer.query(query).getResults().getNumFound(); //43 in this case
The getResults() method returns a SolrDocumentList instance which contains this value.
If however I use grouping, something like:
query.set("group", "true");
query.set("group.format", "simple");
query.set("group.field", "someField");
Then the above code for retrieving query results no loger works (throws NPE), and I have to instead use:
List<GroupCommand> groupCommands = solrServer.query(query).getGroupResponse().getValues();
List<Group> groups = groupCommand.getValues();
Group group = groups.get(groupIndex);
I don't understand how to use this part of the API to get the overall number of matching documents (the 43 from the non-grouping query above). First I thought that with grouping is no longer possible to get that, but I've noticed that if I do a similar query in the Solr admin console, with the same grouping and everything, it returns the exact same results as the Java API and also numFound=43. So obviously the code used for the console has some way to retrieve that value even when grouping is used:
My question is, how can I get that overall number of matching documents for a query using grouping executed via Solr Java API?
In looking at the source for Group that is returned from your groups.get(groupIndex) call, it has a getResults() method that returns a SolrDocumentList. The SolrDocumentList has a getNumFound() method that should return the overall number, I believe...
So you should be able to get this as the following:
int numFound = group.getResults().getNumFound();
Hope this helps.
Update: I believe as OP stated, group.getResults().getNumFound() will only return the number of items in the group. However, on the GroupCommand there is a getMatches() method that may be the corresponding count that is desired.
int matches = groupCommands.getMatches();
If you set the ngroups parameter to true (default false) this will return the number of groups.
eg:
solrQuery.set("group.ngroups", true);
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
this can then be retrieved from your responding GroupCommand with:
int numGroups = tempGroup.getNGroups();
At least that was my understanding?

SOLR (3.1+) - Multiple Spatial Queries with OR in Same Request

Is it possible to conduct multiple spatial queries within the same SOLR (3.1+) request?
We currently have a need to allow user to search for inventory with a location of their choice via a frontend search form. But we want to also add another spatial search behind the scenes so it will include more inventory. The resulting search would result in a venn diagram type of search.
Edit 10.4.2011
Example construct: q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20_query_:(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
The above construct does not work, but hopefully demonstrates what I am trying to accomplish.
This is old, but it doesn't seem like it ever got a full answer. I had the same issue and found that this syntax works:
q =*:*& fq = (({
!geofilt sfield = Location pt = 40.68063802521456,
-74.00390625 d = 80.4672
}
AND ClientId : "client1")OR({
!geofilt sfield = Location pt = 36.1146460,
-115.1728160 d = 80.4672
}
AND ClientId : "client2"))
It looks like, you like to run N querys in one request in order to get one result set per query?!
So Field Collapsing ( http://wiki.apache.org/solr/FieldCollapsing ) is what you are looking for. Unfortunately FieldCollapsing is only available from 3.3.
Depending on your needs, maybe counted results from different faceted searches could be also useful?!
What if you moved your second location query into an additional filter query, like below:
q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)&fq={!geofilt}&sfield=Location&(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
Will that provide the results that you are looking for? It might end up being too limiting, but thought it was worth trying.
You might also try:
q=*:*&fq={!geofilt}&sfield=Location&((ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672))

GQL query with "like" operator [duplicate]

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.

Google App Engine: Is it possible to do a Gql LIKE query?

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.

Resources