How to utilize a Lucene Query class (CommonTermsQuery) with SolrJ

How to utilize a Lucene Query class (CommonTermsQuery) with SolrJ - solr

I want to use Lucene's CommonTermsQuery class for a query executed with SolrJ, so how do I utilize Lucene's Query classes? And what are the differences between those classes and what appears to be Solr's query parsers?

Solr currently doesn't include a query parser that uses CommonTermsQuery, but you can add your own query parsers to Solr by compiling a .jar by yourself and then adding that jar in a <lib .. directive in solrconfig.xml.
There's an existing example on how to make a QParserPlugin for Solr with CommonTermsQuery available as a gist, so that's probably a good place to start for a custom plugin. You'll select the custom QueryParser through the standard {!syntax} in start of a query. Since SolrJ is just the client talking to a Solr server, the plugin itself has to be implemented and loaded on the server (or if you're running in SolrCloud / Cluster mode, on all servers).
A Query Parser takes free form text (which is what Solr is great at) and converts it into a set of Query classes for Lucene to execute (which represents the query, in the way that the query parser thought that the user wanted to express herself).
The differences between Solr's query parser and Lucene's query parser are several, but most people use the edismax or dismax query parser these days (these may have evolved into the Lucene QP over time as well unknown to me):
Differences in the Solr Query Parser include (these are from an older page on the Solr Wiki - I'm not sure if there's a more recent version available, but since Solr and Lucene's code merged into a single tree and got synchronized, I guess there are less new differences introduced compared to when they were separate projects):
Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal score). The scoring factors tf, idf, index boost, and coord are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
Lucene 2.1 has also switched to use ConstantScoreRangeQuery for its range queries.
A * may be used for either or both endpoints to specify an open-ended range query.
field:[* TO 100] finds all field values less than or equal to 100
field:[100 TO *] finds all field values greater than or equal to 100
field:[* TO *] matches all documents with the field
Pure negative queries (all clauses prohibited) are allowed.
-inStock:false finds all field values where inStock is not false
-field:[* TO *] finds all documents without a value for field
A hook into FunctionQuery syntax. Quotes will be necessary to encapsulate the function when it includes parentheses.
Example: _val_:myfield
Example: _val_:"recip(rord(myfield),1,2,3)"
Nested query support for any type of query parser (via QParserPlugin).
Quotes will often be necessary to encapsulate the nested query if it contains reserved characters.
Example: query:"{!dismax qf=myfield}how now brown cow"

Related

In Solr, how can we use terms external to the search query to bias result ordering?

We're working on a plan to identify content tags our users are interested in. So, for instance, we may determine that User X consumes content tagged with "kermit" and "piggy" more often than other tags. These are their "favored tags."
When the users search, we'd like to favor/bias documents that contain these terms.
This means we can't boost the documents at index time, because every user will have different favored tags. Additionally, they may not be searching for the favored tags themselves. They may search for "gonzo," and so we absolutely want to give them documents with "gonzo," but we want to boost documents that also contain "kermit" or "piggy."
These favored tags are not used to actually query the index, but rather are used to bias the result ordering. The favored tags become something of a tie-breaker -- all else being equal, documents containing these terms will rank higher.
This is new/planned development, so we can use whatever version and parser stack is optimal to solve this problem.
Solution in SolrNet
The question was correctly answered below, but here's the code for SolrNet just in case someone else is using it.
var localParams = new LocalParams();
localParams.Add("bq", "kermit^10000); //numeric value is the degree of boost
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<MySolrDocumentClass>>();
solr.Query(new SolrQuery("whatever") + localParams);

You didn't specify which query parser you're using, but if you are using the Dismax or Extended Dismax query parser, the bq argument should do exactly what you're looking for. bq adds search criteria to a search solely for the purpose of affecting the relevancy, but not to limit the result set.
From the Dismax documentation:
The bq (Boost Query) Parameter
The bq parameter specifies an additional, optional, query clause that
will be added to the user's main query to influence the score. For
example, if you wanted to add a relevancy boost for recent documents:
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
You can specify multiple bq parameters. If you want your query to be
parsed as separate clauses with separate boosts, use multiple bq
parameters.
In this case, you may want to add &bq=kermit&bq=piggy to the end of your Solr query. If you aren't using one of these query parsers, this need may be exactly the motivation you need to switch.

Solr 4.2.1 and SOLR-1604 : ComplexPhrase AND date range queries do not work together

I recently patched my Solr 4.2.1 with the ComplexPhrase query addon (https://issues.apache.org/jira/browse/SOLR-1604). When I issue a query such as :
my_text_field:"testin* compl*"~1 AND my_date_field:2013-12-12T04:58:53.732Z
I get results that contain the text query I issued and the date I issued in the my_date_field.
But when I do this:
my_text_field:"testin* compl*"~1 AND my_date_field:[2013-01-01T02:58:53.732Z TO 2013-12-12T04:58:53.732Z]
I get no results.
If I remove the complexphrase parser things go back to normal ( but I have no support for complex phrase queries ).

Ok after some time reading the lucene and solr code I figured it out.
This patch creates a Query Parser that extends the Lucene QueryParser. The Lucene QueryParser does not handle range queries other than Term Ranges ( simple strings in a way ). If one wants to specialize the behavior of the QueryParser, he must extract the field type and create the appropriate range query ( eg NumericRangeQuery for numbers, etc).

Solr/Lucene - partial fuzzy match

How do you set up partial (substring) fuzzy match in Solr 4.2.1?
For example, if you have a list of US cities indexed, I would like a search term "Alber" to match "Alburquerque".
I have tried using the NGramFilterFactory on the <fieldType> and rebuilt the index but queries do not return results as expected - they still work as if I had just done the standard text_general defaults. Exact matches work, and explicit fuzzy searches would work given sufficient similarity (for example "Alberquerque~" with one misspelling would work.)
I did go to the analyzer tool in the Solr admin and saw that my ngrams were indeed being generated.
Is there something i'm missing from the query side?
Or should I take a different approach altogether?
And can this work with dismax? (Multiple fields indexed like this with different weights)
Thanks!

How to boost fields in solr

I already have the boost determined before hand. I have a field in the solr index called boost1 . This boost field will have a value from 1 to 10 similar to google PR rank. This is the boost that should be applied to every query ran in solr. here are the fields in my index
Id
Title
Text
Boost1
The boost field should be apply to every query. I am trying to implement functionality similar to Google PR rank. Is there a way to do this using solr?

you can add the boost during query e.g.
q={!boost b=boost1}
How_can_I_boost_the_score_of_newer_documents
However, this may need to be added explicitly by you.
If you are using dismax or edismax with the request handler, The bf (Boost Functions) parameter could be used to boost the documents.
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
bf=boost1^0.5
This can be added to defaults with the request handler definition, so that they are applied to all the search queries.
you can use function queries to vary the amount of boost FunctionQuery

I think you need to use index time document boosts. See this if you are indexing XML or this if using DataImportHandler.

How to do a constant score query in Solr

I'm using SolrNet to access a Solr index where I have a multivalue field called "tags". I want to perform the following pseudo-code query:
(tags:stack)^10 OR (tags:over)^5 OR (tags:flow)^2
where the term "stack" is being boosted by 10, "over" is being boosted by 5 and "flow" is being boosted by 2. The result I'm after is that results with "stack" will appear higher than those with "flow", etc.
The problem I'm having is that say "flow" only appears in a couple of documents, but "stack" appears in loads, then due to a high idf value, documents with "flow" appear above those with "stack".
When this was project was implemented straight in Lucene, I used ConstantScoreQuery and these eliminated the idf based the score solely on the boost value.
How can this be achieved with Solr and SolrNet, where I'm effectivly just passing Solr a query string? If it can't, is there an alternative way I can approach this problem?
Thanks in advance!

Solr 5.1 and later has this built into the query parser syntax via the ^= operator.
So just take your original query:
(tags:stack)^10 OR (tags:over)^5 OR (tags:flow)^2
And replace the ^ with ^= to change from boosted to constant:
(tags:stack)^=10 OR (tags:over)^=5 OR (tags:flow)^=2

I don't think there any way to directly express a ConstantScoreQuery in Solr, but it seems that range and prefix queries use ConstantScoreQuery under the hood, so you could try faking a range query, e.g. tags:[flow TO flow]
Alternatively, you could implement your own Solr QueryParser.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight