From reading the docs, the search +term +another_term should return the same documents as term AND another_term, but I'm getting different results. Someone suggested that one of the terms is actually acting as an OR, but I thought the way queries are parsed was baked into Solr.
Where in the Solr config would I check for this?
If you enable the debug flag in the admin UI when you run those two queries, it will show you what they get translated to at the lowest level, after the query parser, etc. You can compare the two and see if something is different.
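For example, assuming a core called mycore (adjust host, core and field names to your setup), you can compare the two parses from the command line rather than the admin UI:

curl 'http://localhost:8983/solr/mycore/select?q=%2Bterm+%2Banother_term&debugQuery=on&wt=json'
curl 'http://localhost:8983/solr/mycore/select?q=term+AND+another_term&debugQuery=on&wt=json'

Look at the parsedquery entry in the debug section of each response. If one of the terms shows up as optional in only one of the parses, the usual suspect is the default operator, i.e. the q.op parameter (possibly set as a default on your request handler in solrconfig.xml) or, in older schemas, the defaultOperator attribute of the solrQueryParser element in schema.xml.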
Related
I have a set of keywords, defined by client requirements, stored in a Solr field. I also have a never-ending stream of sentences entering the system.
By using the sentence as the query against the keywords, I am able to find the sentences that match the keywords. This is working well and I am pleased. What I have essentially done is reverse the way Solr is normally used, by storing the query in Solr and passing the text in as the query.
Now I would like to extend the idea from having just a keyword in a field to having a more fully formed Solr query in a field. Doing so would allow proximity searching, etc. But, of course, this is where life becomes awkward: placing Solr query operators into a field will not work, as they need to be escaped.
Does anyone know whether it might be possible to use the Solr "query" function, or perhaps to write a Java class that would enable such functionality? Or is the idea blowing just a bit too much against the Solr winds?
Thanks in advance.
ES has percolate for this. For Solr you'll usually index the document as a single document in a memory-based core/index and then run the queries against that (which is what ES at least used to do internally, IIRC).
I would check out the percolate API in Elasticsearch. It would surely be easier to use this API than to write your own equivalent in Solr.
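A rough sketch of the percolator approach against a recent Elasticsearch version (index, field and query names here are made up; older versions exposed the same idea through the _percolate API instead of the percolator field type):

curl -X PUT 'http://localhost:9200/queries' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "sentence": { "type": "text" },
      "query":    { "type": "percolator" }
    }
  }
}'

# store each client keyword/query as a document
curl -X PUT 'http://localhost:9200/queries/_doc/1' -H 'Content-Type: application/json' -d '{
  "query": { "match_phrase": { "sentence": "client keyword phrase" } }
}'

# percolate an incoming sentence: the hits are the stored queries that match it
curl -X GET 'http://localhost:9200/queries/_search' -H 'Content-Type: application/json' -d '{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "sentence": "an incoming sentence containing the client keyword phrase" }
    }
  }
}'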
What is the simplest way to query Solr for the documents that contain text similar to a (longish) passage? This is similar to what Elasticsearch match queries do, or what probabilistic search engines like Indri do by default. It is something between an AND and an OR query: none of the terms is required, but you get documents that contain many of the terms. You can also just pass a passage of raw text to the engine, and it returns documents with high term overlap with the passage, without having to parse or tokenize the text in the client. The best option I can see in the Solr query reference is to tokenize the query text myself, insert an OR between each pair of terms, and return the top N results. Is there a more concise way of doing it with Solr?
The answer above is correct. You can choose to find documents similar to another document in the index, to a given external URL, or to some given text. You can choose which field(s) to target and various other parameters. Here's the official Solr Reference Guide documentation page for MLT: https://cwiki.apache.org/confluence/display/solr/MoreLikeThis
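As a sketch, assuming the MoreLikeThisHandler is registered at /mlt in solrconfig.xml, stream.body is enabled, and your main text lives in a field called text, you can feed a raw passage straight to Solr and get back the most similar documents without tokenizing anything on the client:

curl 'http://localhost:8983/solr/mycore/mlt' \
  --data-urlencode 'stream.body=the longish passage of raw text you want to match against' \
  --data-urlencode 'mlt.fl=text' \
  --data-urlencode 'mlt.interestingTerms=details' \
  --data-urlencode 'mlt.mintf=1' \
  --data-urlencode 'mlt.mindf=1' \
  --data-urlencode 'fl=id,score' \
  --data-urlencode 'rows=10'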
I am working with two different search tools: DtSearch and Solr. I do a FULL_TEXT search on one indexed search term ("2008/12/02") and unfortunately the two give different hits even though the data are the same. Another strange thing I noticed is that Solr gives three DOC_IDs as hits while DtSearch gives me five for the same search term.
I am confused about date searching now. How can the results differ when the data are the same?
Do I need to apply some extra settings in the config files? Is there any way to get consistent output?
Thank you,
I have a Solr solution working that requires two queries, but I'm looking for a way to do it in a single query. My idea is that if I can figure out a way to do this, I won't have to incur the overhead of twice the load on the Solr cluster.
The details: I'm running a simple query like "q=camera" with a query filter of, say, "fq=type:digital". The second query is identical to the first, but the filter is the inverse, like "fq=-type:digital". I'm imagining that if there's a way to run a single query while applying the first filter to get the first set of topDocs, then generate a second set with the second filter, the results could be merged and returned (it doesn't matter if sorting re-sorts and mixes the two sets).
I experimented with partitioning the data into two different groups by marking a specific field during indexing and then using Solr "grouping" queries, but the response time for these wasn't acceptable in my setup.
I'm looking for suggestions on the most Solr-congruent approach to experiment with: tuning to improve the performance of the two-query solution, or investigating some kind of custom Solr post-filter (I read Yonik's 2/2012 blog post).
I have to implement this in Solr 3.5, although if there's a slam dunk solution in 4.0 I'll eventually be able to move to that.
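For reference, the two queries as they run today look roughly like this (hypothetical core name, real parameters):

curl 'http://localhost:8983/solr/products/select?q=camera&fq=type:digital&rows=10'
curl 'http://localhost:8983/solr/products/select?q=camera&fq=-type:digital&rows=10'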
I can think of two alternative approaches:
Instead of filtering the results, use a variable boost so that all the results for type:digital come out on top and the rest of the documents follow; there is no need for separate queries. The boost can be varied according to the type value.
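A minimal sketch of that first approach, assuming an edismax-style handler and the same hypothetical core and field names as in the question; the ^100 factor is arbitrary and would need tuning against your real scores:

curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=camera' \
  --data-urlencode 'defType=edismax' \
  --data-urlencode 'bq=type:digital^100' \
  --data-urlencode 'rows=20'

With no fq at all, every matching document is returned, but the additive bq boost pushes the type:digital results to the top.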
The other approach is not to display the results for types other than digital at all, but to display facets for the other types, with their counts, so that users know whether other types exist for the search term. You can look into tagging and excluding filters.
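That second approach might look something like this (again with hypothetical names); the filter is tagged so that the facet counts for the type field ignore it and still report the other types:

curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=camera' \
  --data-urlencode 'fq={!tag=typeTag}type:digital' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.field={!ex=typeTag}type'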
Result grouping might give you what you want. Just group by that field and request a sufficiently large number of top documents in each group.
But I would test whether its performance is actually any better than two queries, since the documentation mentions performance in its limitations section.
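A sketch of the grouping variant, with the same hypothetical names; group.limit defaults to 1, so raise it to however many documents you need per group:

curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=camera' \
  --data-urlencode 'group=true' \
  --data-urlencode 'group.field=type' \
  --data-urlencode 'group.limit=50'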
I'm using Apache Solr and querying an index with a schema that has a text field PostBody, an integer Userid field, and a trie-based datetime field MostRecentActivityDate.
I'm attempting to apply query-time boosting to my select query so that more recent posts are boosted by some factor to assist in scoring. My values are an attempt to work on a timescale of days rather than the years used in many online date-boosting examples.
The following two queries produce different results; the only difference between them is where the "code" for the boosting is actually placed (i.e. before or after the field conditionals themselves). In my testing I've also noticed that both produce different results from a query with no {} boosting code at all, so it's not as if the boost is simply being ignored in one case.
Is anyone able to explain why they would produce different results? Thanks!
{!boost%20b=recip(ms(NOW,MostRecentActivityDate),1.16e-7,1,1)} (PostBody:"timmy is great and that is a fact") AND !Userid=2
Vs.
(PostBody:"timmy is great and that is a fact") AND !Userid=2 {!boost%20b=recip(ms(NOW,MostRecentActivityDate),1.16e-7,1,1)}
Since this will be very specific to your data, the best way to figure out what is happening is to turn on query debugging via the debugQuery=on parameter of your search. Here are two links that help explain the debug output:
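For example (keeping your first query form as-is and assuming a core named posts), the comparison would be something like:

curl 'http://localhost:8983/solr/posts/select' \
  --data-urlencode 'q={!boost b=recip(ms(NOW,MostRecentActivityDate),1.16e-7,1,1)}(PostBody:"timmy is great and that is a fact") AND !Userid=2' \
  --data-urlencode 'debugQuery=on' \
  --data-urlencode 'wt=json'

Run the same request with the second form of the query and compare the parsedquery and explain entries in the two debug sections; that will show how each placement of the {!boost} local params is actually being interpreted.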
Debugging Search Applications Relevance - Explanations
Why does id:archangel come before id:hawkgirl when querying for "wings"