SolrJ API use in Java code - solr

Can anybody please help? I am new to Solr. My project uses the SolrJ API to access Solr from Java code. I don't understand the different steps involved in querying with Solr and SolrJ. I got the following code from the net. Can anyone please describe the importance of these statements?
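import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class SolrJSearcher {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery();
        query.setQuery("sony digital camera");
        query.addFilterQuery("cat:electronics", "store:amazon.com");
        query.setFields("id", "price", "merchant", "cat", "store");
        query.setStart(0);
        query.set("defType", "edismax");
        QueryResponse response = solr.query(query);
        SolrDocumentList results = response.getResults();
        for (int i = 0; i < results.size(); ++i) {
            System.out.println(results.get(i));
        }
    }
}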

You'll have to read up on Solr concepts to actually use SolrJ for anything useful, so that you're able to tell what the different parts of the API are. I'm not going to go into any detail here, and you should really research a concept more before posting a very broad and basic question. I'll answer it anyway, for future reference for anyone who stumbles across this post or needs to reference it from another post.
setQuery - The actual query to send to Solr. This is what usually goes in the q parameter when reading the Solr documentation. The format of the query depends on which query parser you're using (which is edismax here, I'll get back to that). Lucene query syntax in general is field:value.
addFilterQuery - Filter the search result by the values supplied. This is what you'll see in the fq parameter in the Solr docs. A filter query doesn't affect scoring; it just filters the search result returned by Solr by removing any documents that don't match the filter query.
setFields - Which fields to return from the index. If you don't need all the fields, you can cut down the size of the response from Solr by just requesting the fields you need.
setStart - The offset of the query result, which document hit to start retrieving data from. Useful for pagination.
set - Set any parameter that isn't available through dedicated methods. Here the parameter defType is set, which tells Solr which query parser to use. edismax is one such query parser, that accepts queries in a natural format like you'd expect most people to be familiar with from general search engines.
query - Performs the actual query on the Solr server, and retrieves the result. The response is returned, and then used to get the list of documents in the result (getResults).
The results are then printed out one by one.
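To make the mapping between the SolrJ calls and the Solr request parameters more concrete, here is a minimal sketch that uses only the Java standard library, not SolrJ itself. It assumes the parameter names mentioned above (q, fq, fl, start, defType); the class SolrParamSketch and its methods are hypothetical and only illustrate roughly what ends up in the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of SolrJ): shows which request parameters
// the SolrQuery calls in the question roughly correspond to.
final class SolrParamSketch {
    private final List<String[]> params = new ArrayList<>();

    SolrParamSketch add(String name, String value) {
        params.add(new String[] {name, value});
        return this;
    }

    // Joins all parameters as name=value pairs, URL-encoding each value.
    String toQueryString() {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(p[0]).append('=')
              .append(URLEncoder.encode(p[1], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    static String example() {
        return new SolrParamSketch()
            .add("q", "sony digital camera")          // setQuery
            .add("fq", "cat:electronics")             // addFilterQuery, one fq per filter
            .add("fq", "store:amazon.com")
            .add("fl", "id,price,merchant,cat,store") // setFields
            .add("start", "0")                        // setStart
            .add("defType", "edismax")                // set("defType", ...)
            .toQueryString();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Comparing the printed string with the request line in the Solr log is a quick way to check what SolrJ actually sends.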

Related

Why dismax q.alt doesn't return any result

I'm new to solr.
After following tutorial exercise 1 (https://solr.apache.org/guide/8_9/solr-tutorial.html), I'm able to run some Solr queries on my local machine.
If I want to get results without any condition, I run a query like
http://127.0.0.1:8983/solr/#/techproducts/query?q=*:*&q.op=OR
This works pretty fine.
But when I switch to "dismax" and try to get a similar result, I need to use "q.alt".
The query is like
http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*
However, this query returned no results, which is pretty weird.
Even when I specify the row parameter, it still doesn't work.
http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*&row=0
Has anyone faced the same problem before?
These parameters are not meant to be used with the user interface URLs; they're for sending directly to Solr. The user interface is a JavaScript application that talks to the Solr API behind the scenes. You can see that your URLs have a local anchor in them (#); these are just references that the JavaScript-based user interface uses to load the correct page.
The rows parameter is also named rows, not row. When it is set to 0, no documents are returned (in the example it's given as a way of using facets with complete counts; you have to ask for facets for that to make sense).
The actual URL to query Solr for matching documents would be:
http://127.0.0.1:8983/solr/techproducts/select?defType=edismax&q.alt=*:*
This URL is shown in the user interface over the query results when using the query page.
There is also usually no reason to use dismax and not edismax these days, as edismax does everything that the old dismax handler did and with more functionality.
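The difference between the two kinds of URL can be checked with the standard java.net.URI class. This is only a sketch: the class name FragmentSketch and the describe helper are made up for illustration. It shows that everything after the # is a fragment, which a browser keeps to itself and does not send to the server as query parameters.

```java
import java.net.URI;

// Shows why parameters placed after '#' never reach Solr: they belong to
// the URI fragment, not to the query string of the request.
final class FragmentSketch {
    static String describe(String url) {
        URI uri = URI.create(url);
        return "path=" + uri.getPath()
            + " query=" + uri.getQuery()
            + " fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Admin UI URL from the question: the parameters sit in the fragment.
        System.out.println(describe(
            "http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*"));
        // Request URL for the select handler: the parameters sit in the query.
        System.out.println(describe(
            "http://127.0.0.1:8983/solr/techproducts/select?defType=edismax&q.alt=*:*"));
    }
}
```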

SOLR Solarium can we use filter-queries with dismax-queries?

I just built a search form backed by Solr; we are using the Solarium library to construct our requests.
We built a "huge" collection of filter queries like this one:
$query = $client->createQuery($client::QUERY_SELECT);
$query->setStart(0)->setRows(1000);
$query->addFilterQuery($query->createFilterQuery("foo")->setQuery("bar:true"));
$query->addFilterQuery($query->createFilterQuery("fo")->setQuery("ba:false"));
....
But we realized that each search only hits the single field we specify in its filter query, whereas we actually have to query multiple fields. While reading the docs I realized we may have taken the wrong approach. Would the correct approach be to use DisMax queries (in combination with facets)? I'm wondering: can we use DisMax in combination with filter queries to "expand" our search to multiple fields (with boosts), or do we have to rework everything?
I'm kind of missing the big picture needed to decide what the best working solution would be.
Help is much appreciated.
edit:
solr:
solr-spec 7.6.0
solarium:
solarium/solarium 6.0.1 PHP Solr client
You can give a query parser when giving the fq argument:
fq={!dismax qf="firstfield secondfield^5"}this is my query
The syntax is known as Local Parameters. Since dismax (or edismax, which you should normally use now) doesn't have an identifier in front of it, it is implicitly parsed as the type.
If a local parameter value appears without a name, it is given the implicit name of "type". This allows short-form representation for the type of query parser to use when parsing a query string.
You'll have to make sure that Solarium doesn't escape the value you give to setQuery, but seeing as you're already giving a field:value combination, it doesn't seem to get escaped. Double check the Solr log to see exactly what query is being sent to Solr (or ask Solarium to give you the exact query string being sent if possible).
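As a rough illustration of how such an fq value is put together, here is a small sketch in plain Java. The class LocalParamsSketch and its filterQuery method are hypothetical, and the sketch assumes the qf value contains no double quotes; the resulting string is what would be handed to the client library as the filter query text.

```java
// Hypothetical helper: assembles an fq value that uses Local Parameters to
// pick a query parser and its qf fields for that one filter query.
final class LocalParamsSketch {
    static String filterQuery(String parser, String qf, String userQuery) {
        // The unnamed first local parameter is implicitly the "type".
        return "{!" + parser + " qf=\"" + qf + "\"}" + userQuery;
    }

    public static void main(String[] args) {
        System.out.println(
            filterQuery("edismax", "firstfield secondfield^5", "this is my query"));
    }
}
```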

How to get a Query object from solr query string

Solr query strings are available from the log, and the intent is to analyze each query to find out the number of fqs, terms, etc. Is there any API/parser available in Solr/Lucene to parse the entire query string and get the terms used, filters used, languages used, fields used, etc.? I looked at the QueryParser provided by Lucene, but it doesn't seem to help.
Example simple query string:
q=*:*&facet.field=Language&facet=true&f.Language.facet.limit=101&rows=0&sort=score desc,DefaultRelevance desc&fl=xxNonexx&bmf=50&wt=xml
You can use the SolrRequestParsers.parseQueryString() method to convert the string into Solr Params. Here's a link to documentation for it.
Below is an example.
String queryString = "q=*:*&facet.field=Language&facet=true&f.Language.facet.limit=101&rows=0&sort=score desc,DefaultRelevance desc&fl=xxNonexx&bmf=50&wt=xml";
MultiMapSolrParams solrParams = SolrRequestParsers.parseQueryString(queryString);
The code resides in the solr-core library, so you may need to add it.
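If pulling in solr-core is not an option, the same kind of parameter breakdown can be sketched with the Java standard library alone. This is not the Solr implementation, only a simplified assumption of what such parsing involves; the class QueryStringSketch is hypothetical and the fq values in the example are made up to show the counting.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryStringSketch {
    // Splits "a=1&b=2&a=3" into {a=[1, 3], b=[2]}, URL-decoding names and values.
    static Map<String, List<String>> parse(String queryString) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(decode(name), k -> new ArrayList<>())
                  .add(decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The fq values here are invented for the example.
        Map<String, List<String>> p = parse(
            "q=*:*&facet.field=Language&facet=true&rows=0&fq=a:1&fq=b:2");
        System.out.println("fq count: " + p.getOrDefault("fq", List.of()).size());
        System.out.println("facet fields: " + p.get("facet.field"));
    }
}
```

This only separates the request parameters; it does not parse the query syntax inside q or fq.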
I think you're not really looking for a parser but for a way to debug your query. Fortunately, Solr has a debug parameter that you can use for that purpose, as explained here. For instance, you can add it to your query:
q=*:*&facet.field=Language&facet=true&f.Language.facet.limit=101&rows=0&sort=score desc,DefaultRelevance desc&fl=xxNonexx&bmf=50&debug=true&wt=xml

How Can I perform Distributed search inside solr

I am doing a distributed search inside Solr across 4 different Solr servers on different machines. My class extends Query, and I would like to perform a distributed search from it. I have created a Solr query using SolrJ, but when I send it to Solr it sometimes gives me a correct result and sometimes an incorrect one. It gives an incorrect result only when some shards throw a query parsing exception. So my question is: can I perform a distributed search inside Solr? An outline of the class from which I make the distributed Solr search is given below.
public class CutomClass extends Query {
    // some other code....
    public Weight createWeight(IndexSearcher searcher1) throws IOException {
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        query.add(ShardParams.SHARDS, getShards);
        query.setStart(0);
        query.setRows(0);
        query.set("sort", "score desc");
        query.setFacet(true);
        query.addFacetField("CLIENT");
        query.setFacetMinCount(1);
        QueryResponse queryResponse = solrServer.query(query, SolrRequest.METHOD.POST);
    }
    // some other code....
}
Sometimes it gives the following parsing exception on some shards, and the result is incorrect.
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.search.SyntaxError: org.apache.lucene.queryParser.ParseException: Cannot parse ':': Encountered "" at line 1, column 0.
Yes, you can perform a distributed search in Solr. If you are using a single collection (say collection1) and using the following Solr URL to create the solrServer object, then you are doing a distributed search by default.
http://localhost:8983/solr/collection1/select?
This URL lets you query across all the shards in collection1, whether they are on the same machine or on different machines. But if you have separate collections and you want to search across those collections, please follow http://wiki.apache.org/solr/SolrCloud.
To know more about distributed search, please look into https://wiki.apache.org/solr/DistributedSearch.
You can simply comment out the following line.
query.add(ShardParams.SHARDS, getShards);
And yes, *:* is not the problematic part of your program, as stated by D. Kasipovic. A query can contain ":". You need to provide the query in the form field:text. The problem occurs when the text part itself contains ":"; in that case you need to escape : as \:.
Please modify the URL to do a distributed search and then let us know what happens.
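The escaping rule mentioned above can be sketched in a few lines of Java. The class EscapeColonSketch, its methods and the field name title are hypothetical and handle only the colon; SolrJ also ships ClientUtils.escapeQueryChars, which covers more special characters and is probably the better choice in real code.

```java
final class EscapeColonSketch {
    // Escapes ':' in the text part so the query parser does not read it as a
    // field separator. Only ':' is handled here.
    static String escapeColons(String text) {
        return text.replace(":", "\\:");
    }

    // Builds a field:text query with the text part escaped.
    static String fieldQuery(String field, String text) {
        return field + ":" + escapeColons(text);
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("title", "10:30"));
    }
}
```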

Adding date boosting to complex SOLR queries

I currently have a SOLR query which uses the query (q), query fields (qf) and phrase fields (pf) to retrieve the results I want. An example is:
/solr/select
?q=superbowl
&qf=title^3+headline^2+intro+fulltext
&pf=title^3+headline^2+intro+fulltext
&fl=id,title,ts_modified,score
&debugQuery=true
The idea is that the title and headline of the "main item" give the best indication of what the result is "about", but the intro and fulltext provide some input too. I.e., imagine a collection of links, where the collection itself has metadata (what it's a collection of), but each link has its own data (title of the link, synopsis, etc). If we search for "superbowl", the most relevant results are the ones with "superbowl" in the collection metadata, the least relevant results are those with "superbowl" in just the synopsis of one of the links... but they're all valid results.
What I'm trying to do is add a boost to the relevancy score so that the most recent results float towards the top, but retaining title,headline,intro,fulltext as part of the formula. A recent result with the search string in the collection metadata would be more relevant than one with it only in the links metadata... but that "links only" recent result might be more relevant than a very old result with the search string in the collection metadata. (I hope that's somewhat clear).
The problem is that I can't figure out how to combine the boost function documented on the SOLR site with the use of the qf/pf fields. Specifically...
From the SOLR site, something like the following works to boost the results by date:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq}
&dateboost=ord(ts_modified)
&qq=superbowl
&fl=ts_modified,score
&debugQuery=true
However, I can't figure out how to combine that query with the use of qf and pf. Any suggestions would be more than welcome.
Thanks to danben's response, I was able to come up with the following:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq%20defType=dismax}
&dateboost=ord(ts_modified)
&qq=superbowl
&qf=title^3+headline^2+intro^2+fulltext
&pf=title^3+headline^2+intro^2+fulltext
&fl=ts_modified,score
&debugQuery=true
It looks like the actual problems I was having were:
I left spaces in the q param instead of escaping them (%20) when copy/pasting
I didn't include the defType=dismax in my q param, so that it would pay attention to the qf/pf parameters
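When such a request is built from Java rather than pasted into a browser, the escaping can be left to the standard library. The following is a minimal sketch; the class EncodeBoostQuerySketch is hypothetical, and it swaps the + that URLEncoder produces for spaces to %20 so the result matches the style of the URLs above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EncodeBoostQuerySketch {
    static String encode(String value) {
        // URLEncoder emits form encoding, where a space becomes '+'. A literal
        // '+' is already encoded as %2B, so this replace only affects spaces.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String q = "{!boost b=$dateboost v=$qq defType=dismax}";
        System.out.println("q=" + encode(q));
    }
}
```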
Check out http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
This is based on the ms function, which returns the difference in milliseconds between two timestamps / dates, and ReciprocalFloatFunction which increases as the value passed decreases.
Since you are using the DisMaxRequestHandler, you may need to specify your query using the bq/bf parameters. From http://lucene.apache.org/solr/api/org/apache/solr/handler/DisMaxRequestHandler.html:
bq - (Boost Query) a raw Lucene query that will be included in the user's query to influence the score. If this is a BooleanQuery with a default boost (1.0f), then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is. This param can be specified multiple times, and the boosts are additive. NOTE: the behaviour listed above is only in effect if a single bq parameter is specified. Hence you can disable it by specifying an additional, blank, bq parameter.
bf - (Boost Functions) functions (with optional boosts) that will be included in the user's query to influence the score. Format is: "funcA(arg1,arg2)^1.2 funcB(arg3,arg4)^2.2". NOTE: Whitespace is not allowed in the function arguments. This param can be specified multiple times, and the functions are additive.
Here is a nice article about Date-boosting Solr search results:
http://www.metaltoad.com/blog/date-boosting-solr-drupal-search-results
In Drupal, using the Apachesolr module, this can be achieved with the following code:
/**
 * Implements hook_apachesolr_query_alter().
 */
function hook_search_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  $query->addParam('bf', array('freshness' =>
    'recip(abs(ms(NOW/HOUR,dm_field_date)),3.16e-11,1,.1)'
  ));
}
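To get a feel for what the recip(...) expression in that bf value does, the arithmetic can be reproduced outside Solr. The sketch below relies on the Solr function query definition recip(x,m,a,b) = a / (m*x + b) and plugs in the constants from the snippet above; the class RecipBoostSketch and its freshness method are hypothetical, and the document age is passed in directly instead of being computed from NOW and a date field.

```java
final class RecipBoostSketch {
    // recip(x, m, a, b) = a / (m * x + b)
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    // Boost for a document whose date is ageMillis away from now, using the
    // constants from the bf value above (m = 3.16e-11, a = 1, b = 0.1).
    static double freshness(long ageMillis) {
        return recip(Math.abs((double) ageMillis), 3.16e-11, 1.0, 0.1);
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        System.out.println("today:    " + freshness(0));
        System.out.println("1 year:   " + freshness(365 * day));
        System.out.println("10 years: " + freshness(3650 * day));
    }
}
```

Since 3.16e-11 is roughly one divided by the number of milliseconds in a year, m*x grows by about 1 for each year of age, so the boost falls off quickly over the first few years and then flattens out.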

Resources