Is it possible to use multiple words in a filter query in SOLRJ / SOLR? - solr

I am using SOLRJ (with SOLR 7) and my index features some fields for the document contents named content_eng, content_ita, ...
It also features a field with the full path to the document (processed by a StandardTokenizer and a WordDelimiterGraphFilter).
The user can search the content_xyz fields thanks to these lines:
final SolrQuery query = new SolrQuery();
query.setQuery(searchedText);
query.set("qf",searchFields); // searchFields is a generated String which looks like "content_eng content_ita" (field names separated by space)
Now the user needs to be able to specify some words contained in the path (namely some subdirectories). So I added a filterQuery :
query.addFilterQuery(
"full_path_split:" + searchedPath);
If searchedPath contains a single word from the document path, the document is returned correctly. However, if searchedPath contains several words from the path, the document is not returned. In short, the fq only works when searchedPath is a single word.
For example doc1 is in /home/user/dir1/doc1.txt
If I search for all (* in searchedText) documents that are in user dir (fq=full_path_split%3Adir) doc1.txt is returned.
If I do the same search but for documents that are in user and dir1 (fq=full_path_split%3Auser+dir1), doc1.txt is not returned. I think this is because the fq is parsed as "+full_path_split:user +_text_:dir1", as debug=query shows. I don't know where _text_ comes from; it may be a default field.
So is it possible to use a filter query with several words to fulfill my needs?
Any help appreciated,

Your suspicion is correct - the _text_:dir1 part comes from you not providing a field name, and the default field name being used instead.
You can work around this by using the more general edismax (or the older dismax) parser as you're doing in your main query with qf:
fq={!type=edismax qf='full_path_split'}user dir1
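On the SolrJ side, the filter string could be assembled roughly as follows before it is handed to query.addFilterQuery(...). This is only a sketch: the class and method names (PathFilter, edismaxFilter, groupedFilter) are hypothetical, and no escaping of query special characters is attempted. The second helper is an alternative for the standard parser, assuming every word must match: wrapping the words in parentheses keeps all of them on the named field instead of letting the later ones fall through to the default field.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SolrJ: it only builds fq strings.
final class PathFilter {

    // Local-params form from the answer: each word is looked up in the qf field
    // rather than in the default field.
    static String edismaxFilter(String field, String words) {
        return "{!type=edismax qf='" + field + "'}" + words.trim();
    }

    // Standard-parser alternative: field:(w1 AND w2) binds every word to the field.
    // Assumes the words contain no characters that need escaping.
    static String groupedFilter(String field, String words) {
        String joined = Arrays.stream(words.trim().split("\\s+"))
                .collect(Collectors.joining(" AND "));
        return field + ":(" + joined + ")";
    }

    public static void main(String[] args) {
        System.out.println(edismaxFilter("full_path_split", "user dir1"));
        System.out.println(groupedFilter("full_path_split", "user dir1"));
    }
}
```

With the question's code this would be used as query.addFilterQuery(PathFilter.edismaxFilter("full_path_split", searchedPath)).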

Related

How can I query Solr to get a list with all field-names prefixed by a string?

I would like to create an output based on the field-names of my Solr index objects.
What I have are objects like this e.g.:
{
"Id":"ID12345678",
"GroupKey":"Beta",
"PricePackage":5796.0,
"PriceCoupon":5316.0,
"PriceMin":5316.0
}
Whereby the Price* fields may vary from object to object, some might have more than three of those, some less, however they would be always prefixed with Price.
How can I query Solr to get a list with all field-names prefixed by Price?
I've looked into filters, facets but could not find any clue on how to do this, as all examples - e.g. regex facet - are in regard to the field-value, not the field-name itself. Or at least I could not adapt it to that.
You can get a comma separated list of all existing field names if you query for 0 documents and use the csv response writer (wt parameter) to generate the field name list.
For example if you request /solr/collection/select?q=*:*&wt=csv you get a list of all fields. If you only want fields prefixed with Price you could also add the field list parameter (fl) to limit the fields.
So the request to /solr/collection/select?q=*:*&wt=csv&fl=Price* should return the following response:
PricePackage,PriceCoupon,PriceMin
With this solution you get all fields existing including dynamic fields.
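A small sketch of how a client might build that request and read the field names back. The class name FieldNameList, the base URL and the method names are assumptions for illustration. The rows=0 parameter is added here to follow the "query for 0 documents" remark, so that only the CSV header line should come back; the HTTP call itself is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the CSV request URL and splits the header line.
final class FieldNameList {

    // rows=0 requests no documents; wt=csv selects the CSV response writer.
    static String buildUrl(String collectionUrl, String fieldPattern) {
        return collectionUrl + "/select?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
                + "&rows=0&wt=csv&fl=" + URLEncoder.encode(fieldPattern, StandardCharsets.UTF_8);
    }

    // The first line of the CSV response is the comma-separated list of field names.
    static List<String> parseHeader(String csvHeaderLine) {
        return Arrays.asList(csvHeaderLine.trim().split(","));
    }

    public static void main(String[] args) {
        // The host and collection name below are placeholders.
        System.out.println(buildUrl("http://localhost:8983/solr/collection", "Price*"));
        System.out.println(parseHeader("PricePackage,PriceCoupon,PriceMin"));
    }
}
```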

Solr query string not working for full text searches

I'm following this tutorial on how to perform indexing on sample documents using Solr. The default collection is "gettingstarted" as shown. Now I'm trying to query it. There are 52 entries as shown:
When I replace the q argument with, say, electronics, I expect 14 results, but I get nothing.
When I replace the query string q with cat:electronics, I actually get the 14 results. But why is this the case? Isn't q=word supposed to search for word wherever it appears?
No, it's not. Your assumption that:
isn't q=word supposed to search for word wherever it appears?
is wrong. If you're using word as your only query, and nothing more - you're searching for word in the default search field. It does not search all available fields in all available documents.
Also be aware that the default query parser assumes that your query is in the Lucene Query Syntax. To handle more "natural" querying, you can use the edismax query parser (selected with defType=edismax). This query parser supports the qf parameter that tells Solr which fields to search, instead of having to use the cat:electronics syntax. Your example would then be q=electronics&defType=edismax&qf=cat.
In the example documents you've given, qf=series_t author name cat is probably a decent value to search all these fields for the given query. You can also append ^<weight> to a field name to give hits in the different fields different weights. qf=name^10 cat would give a hit in name ten times the weight of a hit in the cat field.
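The qf value with weights can be generated rather than typed by hand. A minimal sketch, assuming the boosts are kept in an ordered map; the class WeightedQf and its method are hypothetical names, and integer boosts are a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: renders field boosts as an edismax qf value.
final class WeightedQf {

    // A boost of 1 is left implicit, any other boost is appended as field^boost.
    static String buildQf(Map<String, Integer> boosts) {
        return boosts.entrySet().stream()
                .map(e -> e.getValue() == 1 ? e.getKey() : e.getKey() + "^" + e.getValue())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>(); // keeps insertion order
        boosts.put("name", 10);
        boosts.put("cat", 1);
        System.out.println(buildQf(boosts));
    }
}
```

In SolrJ the result would typically be passed along with the parser selection, for example query.set("defType", "edismax") followed by query.set("qf", WeightedQf.buildQf(boosts)).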

Elements getting added in Solr index but not able to search elements as desired

I'm working with solr to store web crawling search results to be used in a search engine. The structure of my documents in solr is the following:
{
word: The word received after tokenizing the body obtained from the html.
url: The url where this word was found.
frequency: The no. of times the word was found in the url.
}
When I go to the Solr dashboard on my system, which is http://localhost:8983/solr/#/CrawlerSearchResults/query, I'm able to find a word, say "Amazon", with the query "word: Amazon", but when I search directly for Amazon I get no results. Could you please help me out with this issue?
Image links below.
First case
Second case (No results)
Thanks,
Nilesh.
In your second example, the value is searched against the default search field (since you haven't provided a field name). This is by default a field named _text_.
To support just typing a query into the q parameter without field names, you can either set the default field name to search in with df=word in your URL, or use the edismax query parser (defType=edismax) and the qf parameter (query fields). qf allows multiple fields and giving them a weight, but in your case it'd just be qf=word.
Second - what you're doing seems to replicate what Lucene is doing internally, so I'm not sure why you'd do it this way (each word is what's called a "token", and each count is what's called a term frequency). You can write a custom similarity to add custom scoring based on these parameters.
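The two options can be compared side by side as request URLs. This is a sketch only: the select endpoint path under the CrawlerSearchResults core is an assumption derived from the admin URL in the question, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the two request variants described in the answer.
final class DefaultFieldQuery {

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Option 1: keep the default parser and point the default field at the given field.
    static String withDefaultField(String coreUrl, String term, String field) {
        return coreUrl + "/select?q=" + enc(term) + "&df=" + enc(field);
    }

    // Option 2: switch to edismax and list the field in qf.
    static String withEdismax(String coreUrl, String term, String field) {
        return coreUrl + "/select?q=" + enc(term) + "&defType=edismax&qf=" + enc(field);
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/CrawlerSearchResults"; // assumed endpoint
        System.out.println(withDefaultField(core, "Amazon", "word"));
        System.out.println(withEdismax(core, "Amazon", "word"));
    }
}
```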

Solr query looking for IN, IT, or IS in field

When performing these queries (separately):
country:IN
-or-
country:IT
-or-
country:IS
... I get all items in the index returned. I want to get only the items whose country field matches those params. I've tried every combination of escaping with single/double quotes and single/double slashes. When doing so, no items are returned at all.
I've verified that items exist in the index for these params by dumping the whole index (with a loose query) and identifying them. I'm on django-haystack in case that matters, but the issue is there for both the Django python shell and the Solr web admin interfaces.
Thanks for any help!
Filter queries return a subset of documents that match them.
fq=country:(IN OR IT OR IS)
fq=country:IN
Those are standard noise words (stop words). You can either remove the terms from the stopwords file (stopwords_en.txt) and reindex your documents, or set the field type to string and use fq as aitchnyu mentions above.
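If the list of country codes comes from user input, the grouped fq can be built programmatically. A minimal sketch with hypothetical names (CountryFilter, anyOf); it assumes the values need no escaping and that the country field no longer drops stop words, for example because it is a string field as suggested above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds field:(A OR B OR C) for use as an fq value.
final class CountryFilter {

    static String anyOf(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    public static void main(String[] args) {
        System.out.println(anyOf("country", Arrays.asList("IN", "IT", "IS")));
    }
}
```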

Solr doesn't index document's content

I have a small problem with Solr.
I've indexed about 1400 documents from an XML file with the post.jar command. In the XML file I placed some information such as the ID, TITLE and URL of the documents.
When I search for a document, it finds nothing, but if I specify an attribute, e.g. TITLE: IEEE, it finds the documents.
So I changed the default search field in schema.xml from text to title. This way it finds documents without specifying the attribute.
Why doesn't it find the content? Did I mess up the indexing by changing the xml file?
Do a q=*:*. This fetches 10 (implicit default value for rows) documents with all fields and their values. Is all your data indexed properly?
Then do a q=fieldx:val with some known field and value. Do they show up in the results? Can you do more than string matches? If not, you need to choose data types (and storage/indexing options) in schema. Example: string allows only equality and prefix matches and text allows full text search.
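The difference between the two field types can be illustrated with a toy model. This is not how Solr is implemented; it only mimics the observable behaviour under simplifying assumptions: a string-like field keeps the whole value as a single term, while a text-like field is split into lowercased tokens. The class name and the sample title are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Toy model only: imitates exact matching versus tokenized matching.
final class FieldTypeToy {

    // string-like: the whole value is one term, so only the exact value matches.
    static boolean stringMatches(String stored, String query) {
        return stored.equals(query);
    }

    // text-like: split on non-alphanumeric characters and lowercase both sides,
    // so any single word of the value can match.
    static boolean textMatches(String stored, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        return Arrays.stream(stored.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .anyMatch(q::equals);
    }

    public static void main(String[] args) {
        String title = "IEEE Transactions on Computers"; // made-up sample value
        System.out.println(stringMatches(title, "IEEE"));
        System.out.println(textMatches(title, "ieee"));
    }
}
```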

Resources