How to search multiple words in one field on solr? - solr

I have a field in solr of type list of texts.
field1:{"key1:val1,key2:val2,key3:val3", "key1:val1,key2:val2"}
I want to form a query such that when I search for key1:val1 and key3:val3 I get the result who has both the strings i.e key1:val1 and key3:val3.
How shall I form the query?

If these are values in a multivalued field, you can't - directly. You'll have to use something like highlighting to tell you where Solr matched it.
There is no way to tell Solr "I only want the value that matched inside this set of values".
If this is a necessary way to query your index, index the values as separate documents instead in a separate collection. In that case you'd have to documents instead, one with field1:"key1:val1,key2:val2,key3:val3" and one with key1:val1,key2:val2.

You can use AND with fq.
Like:
fq=key1:val1 AND key3:val3
With this filter query you will get only records where key1 = val1 AND key3 = val3.

Related

CFSearch solr using a list in a custom field

I'm attempting to index around 30,000 database records in a single collection and per my requirements I need to be able to include a list of items in a single custom field - and use that in my search.
Here's an example of my index:
<cfindex collection = "myCollection"
action = "refresh"
type = "custom"
query = "Local.myQuery"
key = "ID"
title="Title"
applications_s="A_Comma_Separated_List"
body = "a_field,a_nother_field">
In this example, applications_s is a dynamic custom field (introduced in CF10) containing a list of application IDs.
An example of content for this field would be:
T1,T2,B4,G1
This all indexes splendidly, however I've been unable to figure out how to search, using a single item in the applications list as criteria.
So, I'd like to be able to do this:
<cfsearch name="Local.qSearch"
collection="myCollection"
criteria="test AND applications_s:T1">
This should return all records that contain the word 'test' in the body, and also contain 'T1' in the applications field. However, I can't find a criteria syntax that will treat the contents of the custom field as a comma separated list... it seems to only work as a string. Therefore my example record wouldn't be returned unless I include a wildcard - which could cause problems with extra records being returned by mistake.
Is there any way to explicitly specify that my custom field is a list and should contain my specified value?
I managed to get the following to work on CF9.0.1. Although the MYCUSTOMNAME_TYPE (e.g. applications_s) fields are CF10-only, I was able to use the custom1 field and specify it as a "string" type by editing the collection's schema.xml and restarting Solr. You shouldn't have to on CF10.
1) In the query you're indexing, add TWO commas to the beginning of the application list column, and ONE at the end, so an example row would look like:
,,T1,T1B,T2,B4,G1,
You could do this either in your SQL using concatenation (preferable), or by post-processing the query result with Query-of-Queries, or QueryNew() and looping over the query to build a copy.
2) Index the query with cfindex as in your question, using applications_s to ensure the field is a string type, not text. We don't want the list to be "tokenised" as words. The commas are critical and we don't want them to be ignored.
3) In your cfsearch pad the criteria as follows:
<cfset searchString= "test">
<cfset applicationFilter = "T1">
<cfsearch name="Local.qSearch"
collection="myCollection"
criteria="#searchString# AND applications_s:,*,#applicationFilter#,*">
Note there are 3 commas and 2 wildcard asterisks altogether. The first comma is there because you cannot start a Solr query with a wildcard. The second and third commas ensure that the wildcard search for T1 does not match T1B.

Behavior of the OR clause in Solr

My solr index contains documents which have a field named department. This field is a multivalue non-required int field. I want to construct a query whose result must be union of
All the documents that do not contain the field department
All the documents that contain the field department, but the values of the field are restricted to a selected few.
I tried constructing the query that looks like so:
-department:* OR (department:* AND department:(100 OR 200))
This doesn't return any results. Whereas if I just just use
-department:*
or
department:* AND department:(100 OR 200)
, the query seems to work well. In short I'm having trouble understanding the behavior of OR clause in this context. Any pointers?
Checkout SolrQuerySyntax
Pure Negative Queries :-
-field:[* TO *] finds all documents without a value for field
You can try :-
q=-department:[* TO *] OR department:(100 OR 200)
To achieve what you want, I think you can use Solr grouping.
You can give something like,
&q=*&group=true&group.query=department:[100]&group.query=department:[200]&group.query=-department:[*]

to get the count of multi valued field in solr

In solr i have a multiValued field called animal and it has the values {cat,dog} is it possible to get the number of values inside the multiValued field in solr(in my example 2)?
If you want to count the items in a multivalued field use CountFieldValuesUpdateProcessorFactory
There is no direct way to get the count of items in a multivalued field.
You can always maintain the field count during indexing and use it.
If not using during query time, you can always count the list size.

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test Apache Solr search engine. The difference between Facet fields and Filter queries is not clear to me. SolrMeter tutorial lists this as an exapmle of Facet fields :
content
category
fileExtension
and this as an example of Filter queries :
category:animal
category:vegetable
categoty:vegetable price:[0 TO 10]
categoty:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents - specifically, for each value of that field, how many returned documents have that value for that field. So for example, if you have 10 products matching a query for "soft rug" if you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
A facet is an field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.

Using multivalued field in map function

I'm working on implementing Solr in a project and right now I'm stuck on a specific search including an arr field. The thing is:
I'd like to search sub-id's on an object, these sub-id's are stored in a multivalue field, e.g.:
<arr name="SubIds">
<int>12272</int>
<int>12304</int>
<int>12306</int>
</arr>
The query (or part of the query) that I want to use is as follows:
map(SubIds,i,i,1,0)
When I, for example, fill 12304 on the 'i' space in the map function above, I would expect my function to return 1. If I would enter 12345 it should return 0. The thing is that when I run this query it returns 0, or "There's no number 12304 in this field, I return 0".
When removing the 0 from my map function I can see the actual value returned to me (when 12304 return 1, when different return value), in this case that's 12306! I've tried this with some different multivalued fields but the result is the same; it looks like the function is checking the last value in the multivalue field against my filled in ID.
Is this true? And when it does, is there any way in looking through the whole arr and only return 0 when the value doesn't exist in the whole multivalued field?
** Edit: It's just a hunch, but could it be that the map() function automatically orders the arr list when it sees that all the items are of type int (for example). That could mean that the map returns the first number (the highest) which would (in my example) be 12306, not 12304...*
Thanks!
... It looks like function queries don't work with multivalued fields ...
http://lucene.472066.n3.nabble.com/Using-multivalued-field-in-map-function-td3318843.html#a3322023:
Function queries don't work with multivalued field.
http://wiki.apache.org/solr/FunctionQuery#Vector_Functions
Given the following case, is there anybody who has a better idea on how I can query the wanted data?
I've got a website full of blogposts and every blogpost has an owner,
this owner is refererred to through his/her id. For example: BloggerId
= 123. It's also possible that the blog has multiple co-writers, which
are also referred to by there BloggerId but these id's are stored in
the multivalue field, in my previous example SubIds.
When searching for a specific blogger one searches the BloggerId.
Searchresults are influenced by a number of variables, the
country/state/more specific geological data, the blogcategory, etc.
For this I use a facetted query. Next I want to make some results more
important, depending on the BloggerId, I tried to do this with the
following query:
?q={!func}map(sum(map(BloggerId,12304,12304,2,0),map(BloggerId,12304,12304,1,0)),3,3,2)&fl=*,score&facet.field=Country&f.Country.facet.limit=6&facet.field=State&fq=(BlogCategory:internet%20OR%20BlogCategory:sports&sort=score%20desc,Top%20desc,%20SortPriority%20asc&start=0&omitHeader=true
In the resulting list, blogs written by BloggerId 12304 should be on
top of the list, followed by the blogs where BloggerId 12304 was
co-writer. After that, all other blogs that follow the criteria but
aren't written (or co-written) by BloggerId 12304.
Maybe I could make this multivalued field a string field (where id's are seperated by ";") and query my value, but if one has a better idea your always welcome!
In the end I chose to add a string valued field with whitespaces to seperate the different values. After that I used the solr.WhitespaceTokenizerFactory class to quickly scan the string for occurences of a specific ID.

Resources