Apache Solr: use an entire string to search within a collection

I have managed to create a dataset using Apache Solr. I have also managed to make queries, such as in this example:
content:(test1 OR test2) OR title: test2
I would now like to search the dataset using an entire string, in a similar fashion to searching on Google. Is the correct approach to keep using OR clauses on the title and content for each word in the query, or is there a better way to achieve this? (I am not looking for exact matches, just the most relevant ones.)

You can use the DisMax or eDisMax query parser for this, and pass phrases with boosts if you have any.
The DisMax query parser is designed to process simple phrases (without
complex syntax) entered by users and to search for individual terms
across several fields using different weighting (boosts) based on the
significance of each field. Additional options enable users to
influence the score based on rules specific to each use case
(independent of user input).
The parameters are documented in detail on the Solr DisMax page.
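A minimal sketch of that approach in Python, assuming a local Solr with a hypothetical collection named "mycollection" and the title/content fields from the question. The whole search string goes into q unchanged, and eDisMax spreads it across the fields listed in qf. The boost values and the pf/mm settings are illustrative assumptions, not values taken from the answer.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and collection name; replace with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_edismax_url(user_text: str) -> str:
    """Build a /select URL that sends the user's whole string to eDisMax."""
    params = {
        "q": user_text,           # the raw search string, no field prefixes or ORs
        "defType": "edismax",     # use the Extended DisMax query parser
        "qf": "title^2 content",  # search both fields, weighting title matches higher
        "pf": "title^3 content",  # extra boost when the terms appear together as a phrase
        "mm": "1",                # at least one term must match; better matches rank higher
        "rows": "10",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

For example, build_edismax_url("test1 test2") produces a URL whose q is simply "test1 test2"; there is no need to write content:(test1 OR test2) OR title:test2 by hand.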

Related

Is it possible to exclude specific values from being included in Solr facets?

I'm using Solr facets to get the most common values for specific fields. It has occurred to me that (for business logic purposes) it would be preferable to exclude certain values. I cannot seem to find a way to do this, however.
I'm not looking to exclude the filter query, as seems to be commonly discussed.
If I'm getting the top 3 facets for a field and seeing "ValueA", "ValueB", and "ValueC", I'd like to say, essentially, "Get facets that aren't ValueB", so that my facet instead returns data for "ValueA", "ValueC", and "ValueD".
Use the facet.excludeTerms parameter. According to the source, the format seems to be "term1,term2" to exclude those two terms.
The feature was introduced with Solr 6.5.
If you need the same feature before Solr 6.5 and have to supply the terms to exclude separately for each query, you're going to have to do it in your controller / Solr interfacing code. If you want to exclude one or more terms across the whole index for all queries, add a separate field and filter out those terms while indexing.
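A minimal Python sketch of both options. The field name "category" is hypothetical. The first function builds the Solr 6.5+ request parameters with facet.excludeTerms; the second is a client-side fallback for older versions that filters Solr's default flat facet_fields list ([term1, count1, term2, count2, ...]). For the fallback, ask for a facet.limit of at least the desired count plus the number of excluded terms, so enough terms remain after filtering.

```python
def facet_params(field: str, exclude: list[str], limit: int = 3) -> dict:
    """Solr 6.5+: let Solr drop the excluded terms from the facet counts."""
    return {
        "q": "*:*",
        "rows": "0",                             # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.limit": str(limit),
        "facet.excludeTerms": ",".join(exclude), # comma-separated terms to leave out
    }


def filter_facet_counts(flat: list, exclude: set[str], limit: int) -> list[tuple[str, int]]:
    """Pre-6.5 fallback: remove excluded terms in client code.

    'flat' is the facet_fields entry for one field in Solr's default JSON
    response, i.e. alternating term and count values.
    """
    pairs = zip(flat[0::2], flat[1::2])
    kept = [(term, count) for term, count in pairs if term not in exclude]
    return kept[:limit]
```

With a response of ["ValueA", 10, "ValueB", 8, "ValueC", 5, "ValueD", 2] and ValueB excluded, the fallback keeps ValueA, ValueC, and ValueD, matching the behaviour asked for in the question.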

Extract query terms from text for querying Solr server

I am using SolrJ to build queries for a Solr server.
I have some fairly short free-form texts that can contain various special characters, like Mr. John's New-Wall, "Hotels & Food".
A phrase query for text like this would not produce enough matches. So from this text I would like to extract terms for building a simple query, something like content:Mr OR content:John's OR content:Hotels OR content:Food. (It would probably be good to take term proximity into account somehow, but I have to start with something.)
The field that I am searching is the default text_general field. I started by replacing some special characters with spaces and splitting the text up to extract the terms. But it feels kind of redundant.
Isn't there an easier way to extract terms from text using SolrJ and Solr? Basically I would like to extract terms from text similarly to how it is done by Solr when it creates its index.
I am not sure exactly what your question is, however here is a bit of info that you may find helpful:
Basically I would like to extract terms from text similarly to how it is done by Solr when it creates its index.
You can configure index-time and query-time field processing in your schema. I would suggest you take a look here. This gives you some flexibility to normalize your data.
So from this text I would like to extract terms for building a simple query, something like content:Mr OR content:John's OR content:Hotels OR content:Food.
This is essentially what Solr does by default under the hood. I would suggest you look up the eDisMax query parser and its qf and tie parameters.
Hope it helps.
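A minimal Python sketch of that suggestion, assuming a hypothetical collection "mycollection": the free-form text is sent as q without manual splitting, and Solr's query analyzer for the field does the tokenizing. The q.op and tie values are assumptions for illustration. Note that eDisMax still interprets some syntax; for example, the double quotes around "Hotels & Food" turn it into a phrase query, so strip them first if you want plain term matching.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and collection name.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_raw_text_url(text: str) -> str:
    """Send free-form text as-is and let Solr's query analysis split it into terms."""
    params = {
        "q": text,              # no manual splitting of terms
        "defType": "edismax",
        "qf": "content",        # terms are produced by this field's query analyzer
        "q.op": "OR",           # any term may match; documents matching more rank higher
        "tie": "0.1",           # only has an effect when qf lists more than one field
    }
    # urlencode percent-encodes characters such as & and " so they
    # reach Solr as part of q instead of breaking the URL.
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

In SolrJ the same parameters can be set on a SolrQuery, which also takes care of the URL encoding.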

Similarity/approximate queries in Solr

What is the simplest way to query Solr for documents that contain text similar to a (longish) passage? This is similar to what ElasticSearch match queries do, or what probabilistic search engines like Indri do by default. It is something between an AND and an OR query: none of the terms is required, but you get documents that contain many of the terms. You can also just pass a passage of raw text to the engine, and it returns documents with high term overlap with the passage without having to parse or tokenize the text in the client. The best option I can see in the Solr query reference is to tokenize the query text myself, insert an OR between each pair of terms, and return the top N results. Is there a more concise way of doing this with Solr?
The answer above is correct: use MoreLikeThis (MLT). You can choose to find documents similar to another document in the index, similar to a given external URL, or similar to some given text. You can choose which field(s) to target, along with various other parameters. Here's the official Solr Reference Guide documentation page for MLT: https://cwiki.apache.org/confluence/display/solr/MoreLikeThis
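A minimal Python sketch of the "similar to some given text" case, under stated assumptions: a /mlt request handler (solr.MoreLikeThisHandler) is configured in solrconfig.xml for a hypothetical "mycollection", stream bodies are enabled (recent Solr versions require this in solrconfig.xml), and "content" is the field to compare on. The mlt.mintf/mlt.mindf values of 1 are illustrative; their defaults are higher and would drop many terms from a short passage.

```python
from urllib.parse import urlencode

# Hypothetical location; assumes a /mlt handler is configured for this collection.
SOLR_MLT_URL = "http://localhost:8983/solr/mycollection/mlt"


def build_mlt_params(passage: str) -> dict:
    """Ask MoreLikeThis for documents similar to a raw passage of text."""
    return {
        "stream.body": passage,  # the external text to compare against
        "mlt.fl": "content",     # field(s) whose terms are used for similarity
        "mlt.mintf": "1",        # keep terms that occur only once in the passage
        "mlt.mindf": "1",        # keep terms that appear in at least one document
        "rows": "10",
    }


def encode_mlt_form(passage: str) -> bytes:
    """Form-encode the parameters for a POST, since passages can be long."""
    return urlencode(build_mlt_params(passage)).encode("utf-8")
```

Passing encode_mlt_form(text) as the data argument of urllib.request.Request(SOLR_MLT_URL, ...) would send it as a form POST; no client-side tokenizing is involved.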

Solr does not accept unparsed query

I have added some documents to my Solr index using a requestHandler, and now I am trying to query them from the web UI. I get the correct result when my query parameter is in the format
[id]:[search-item]
but I want to search without using this format. For example, to search for cat, I want to just type "cat" and get the result, rather than typing "animal:cat".
I am new to Solr, so I am not sure where I am going wrong.
Use the DisMax query parsers/handlers.
Extract from the DisMax documentation:
The DisMax query parser is designed to process simple phrases (without
complex syntax) entered by users and to search for individual terms
across several fields using different weighting (boosts) based on the
significance of each field. Additional options enable users to
influence the score based on rules specific to each use case
(independent of user input).
In general, the DisMax query parser's interface is more like that of
Google than the interface of the 'standard' Solr request handler. This
similarity makes DisMax the appropriate query parser for many consumer
applications. It accepts a simple syntax, and it rarely produces error
messages.
Also see DisMax and the full documentation of the DisMax query parser here.
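A minimal Python sketch of the request this implies, assuming a hypothetical collection "mycollection" and the "animal" field from the question. The first function uses DisMax with qf; the second shows an alternative not mentioned in the answer, keeping the standard parser but setting the standard df (default field) parameter. Either set of parameters can also be configured as request handler defaults in solrconfig.xml so users only type "cat".

```python
from urllib.parse import urlencode

# Hypothetical collection; "animal" is the field from the question.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def dismax_url(user_input: str) -> str:
    """Plain user input such as 'cat', searched across the fields in qf."""
    params = {"q": user_input, "defType": "dismax", "qf": "animal"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def default_field_url(user_input: str) -> str:
    """Alternative: keep the standard parser but name a default field (df)."""
    params = {"q": user_input, "df": "animal"}
    return SOLR_SELECT_URL + "?" + urlencode(params)
```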

Solr faceted search - how do I specify multiple fields on the Solr Query UI?

I'm new to Solr and trying my hand at it.
Can someone here please explain how to specify multiple facet fields for a given search?
I'm using the Solr Admin UI query link, and it allows me to specify only one field.
I would, however, like to facet on multiple fields like region, industry, stock-exchange, etc. on my company search.
I have gone through the Solr wiki and relevant doc links like the one below
http://docs.lucidworks.com/display/solr/Query+Screen
but none of them seem to explain how to specify multiple fields.
I want to build something like the usual Amazon/Walmart-style search UI that provides multiple facets and counts when searching for a product on my planned company search page.
You can query multiple facet fields. Just write with the syntax:
.../select?q=*:*&facet=true&facet.field=<field1>&facet.field=<field2>
When you execute the search in the Solr Query UI, it shows the actual URL being sent to Solr above the results pane. Click on that URL and it will open in a new browser window. From there you can get faceting on multiple fields by adding additional &facet.field=<your field> entries to the URL.
Please see the Solr Faceting Parameters reference for more details and other options.
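When building the request in code rather than in the browser, the only trick is that facet.field is repeated, so a dict cannot hold it. A minimal Python sketch using a list of pairs; the field names are hypothetical stand-ins for the ones in the question.

```python
from urllib.parse import urlencode


def facet_query_string(fields: list[str]) -> str:
    """Build a query string with one facet.field parameter per field."""
    params = [("q", "*:*"), ("facet", "true")]
    # A list of (key, value) pairs allows the same key to appear several times.
    params += [("facet.field", f) for f in fields]
    return urlencode(params)
```

For example, facet_query_string(["region", "industry", "stock_exchange"]) yields three facet.field entries, which can be appended to .../select? as in the answer above.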
You are looking for json.facet.
It's available from Solr 5 (some advanced features are available from Solr 6).
Basically, that means you can pass your facet parameters as JSON in the URL.
It looks like this (live example):
&facet=true&json.facet={"field1":{"type":"terms","field":"field1","limit":2000},"field2":{"type":"terms","field":"field2","limit":2000}}
There is also a shorter version:
&facet=true&json.facet={"field1":{"terms":"field1"},"field2":{"terms":"field2"}}
You can find more information here
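Hand-writing that JSON inside a URL is error-prone, so a minimal Python sketch can generate it instead. The field names and the 2000 limit mirror the example above; the function name is hypothetical. The compact separators reproduce the example string exactly, and urlencode takes care of escaping the braces and quotes.

```python
import json
from urllib.parse import urlencode


def json_facet_params(fields: list[str], limit: int = 2000) -> dict:
    """Build facet=true plus a json.facet terms facet for each field."""
    facets = {f: {"type": "terms", "field": f, "limit": limit} for f in fields}
    return {
        "q": "*:*",
        "facet": "true",
        # Compact separators give the same JSON text as the hand-written example.
        "json.facet": json.dumps(facets, separators=(",", ":")),
    }


query_string = urlencode(json_facet_params(["field1", "field2"]))
```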
For facet queries, it's not done up to 4.3; it is resolved for versions 4.4/5.0.
The Solr Admin UI allows you to specify multiple facet fields, i.e. a comma-separated list of fields in the facet.field parameter. You need to check the facet checkbox, and then you will get more options.
If you are querying Solr using a link then the link should look like - facet=true&facet.field=field1&facet.field=field2.
