CFSearch solr using a list in a custom field - solr

I'm attempting to index around 30,000 database records in a single collection and per my requirements I need to be able to include a list of items in a single custom field - and use that in my search.
Here's an example of my index:
<cfindex collection = "myCollection"
action = "refresh"
type = "custom"
query = "Local.myQuery"
key = "ID"
title="Title"
applications_s="A_Comma_Separated_List"
body = "a_field,a_nother_field">
In this example, applications_s is a dynamic custom field (introduced in CF10) containing a list of application IDs.
An example of content for this field would be:
T1,T2,B4,G1
This all indexes splendidly, however I've been unable to figure out how to search, using a single item in the applications list as criteria.
So, I'd like to be able to do this:
<cfsearch name="Local.qSearch"
collection="myCollection"
criteria="test AND applications_s:T1">
This should return all records that contain the word 'test' in the body, and also contain 'T1' in the applications field. However, I can't find a criteria syntax that will treat the contents of the custom field as a comma separated list... it seems to only work as a string. Therefore my example record wouldn't be returned unless I include a wildcard - which could cause problems with extra records being returned by mistake.
Is there any way to explicitly specify that my custom field is a list and should contain my specified value?

I managed to get the following to work on CF9.0.1. Although the MYCUSTOMNAME_TYPE (e.g. applications_s) fields are CF10-only, I was able to use the custom1 field and specify it as a "string" type by editing the collection's schema.xml and restarting Solr. You shouldn't need that schema change on CF10.
1) In the query you're indexing, add TWO commas to the beginning of the application list column, and ONE at the end, so an example row would look like:
,,T1,T1B,T2,B4,G1,
You could do this either in your SQL using concatenation (preferable), or by post-processing the query result with Query-of-Queries, or QueryNew() and looping over the query to build a copy.
2) Index the query with cfindex as in your question, using applications_s to ensure the field is a string type, not text. We don't want the list to be "tokenised" as words. The commas are critical and we don't want them to be ignored.
3) In your cfsearch pad the criteria as follows:
<cfset searchString= "test">
<cfset applicationFilter = "T1">
<cfsearch name="Local.qSearch"
collection="myCollection"
criteria="#searchString# AND applications_s:,*,#applicationFilter#,*">
Note there are 3 commas and 2 wildcard asterisks altogether. The first comma is there because you cannot start a Solr query with a wildcard. The second and third commas ensure that the wildcard search for T1 does not match T1B.
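To see why the padding needs two leading commas, the matching can be simulated outside Solr. This is only a rough sketch in Python: it assumes that fnmatch-style globbing behaves like the Solr wildcard for these simple values, and the helper names pad_list, wildcard_pattern and matches are made up for illustration.

```python
from fnmatch import fnmatchcase

def pad_list(csv_list):
    # Step 1: two commas in front of the list, one at the end.
    return ",," + csv_list + ","

def wildcard_pattern(item):
    # Step 3: literal comma, wildcard, the item wrapped in commas, wildcard.
    return ",*," + item + ",*"

def matches(csv_list, item):
    # Whole-string glob match of the padded list against the padded criteria.
    return fnmatchcase(pad_list(csv_list), wildcard_pattern(item))

print(matches("T1,T1B,T2,B4,G1", "T1"))  # first item in the list
print(matches("T1B,T2,B4,G1", "T1"))     # only T1B is present, not T1
```

With a single leading comma the first item in the list could not match, because the pattern's opening comma would consume the only comma in front of it.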

Related

Is it possible to use multiple words in a filter query in SOLRJ / SOLR?

I am using SOLRJ (with SOLR 7) and my index features some fields for the document contents named content_eng, content_ita, ...
It also features a field with the full path to the document (processed by a StandardTokenizer and a WordDelimiterGraphFilter).
The user is able to search in the content_xyz fields thanks to the lines :
final SolrQuery query = new SolrQuery();
query.setQuery(searchedText);
query.set("qf",searchFields); // searchFields is a generated String which looks like "content_eng content_ita" (field names separated by space)
Now the user needs to be able to specify some words contained in the path (namely some subdirectories). So I added a filterQuery :
query.addFilterQuery(
"full_path_split:" + searchedPath);
If searchedPath contains a single word from the document path, the document is correctly returned. However, if searchedPath contains several words from the path, the document is not returned. In short, the fq only works when searchedPath is a single word.
For example doc1 is in /home/user/dir1/doc1.txt
If I search for all (* in searchedText) documents that are in user dir (fq=full_path_split%3Adir) doc1.txt is returned.
If I do the same search but for documents that are in user and dir1 (fq=full_path_split%3Auser+dir1), doc1.txt is not returned. I think this is because the fq is parsed as "+full_path_split:user +text:dir1", as debug=query shows. I don't know where text comes from; it may be a default field.
So is it possible to use a filter query with several words to fulfill my needs ?
Any help appreciated,
Your suspicion is correct - the _text_:dir1 part comes from you not providing a field name, and the default field name being used instead.
You can work around this by using the more general edismax (or the older dismax) parser as you're doing in your main query with qf:
fq={!type=edismax qf='full_path_split'}user dir1
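As a rough illustration of how that filter string could be assembled before handing it to a client, here is a small Python sketch. The helper name path_filter is hypothetical, and whether every word must match depends on the q.op / mm settings in effect, which this sketch does not set.

```python
from urllib.parse import urlencode

def path_filter(words, field="full_path_split"):
    # Local-params prefix selects edismax and pins qf to the path field,
    # so no word falls through to the default field.
    return "{!type=edismax qf='%s'}%s" % (field, " ".join(words))

fq = path_filter(["user", "dir1"])
print(fq)

# URL-encoded form, as it would appear in a raw HTTP request.
print(urlencode({"q": "*", "fq": fq}))
```

In SolrJ the same string would simply be passed to query.addFilterQuery(...) in place of the concatenated "full_path_split:" + searchedPath.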

How can I query Solr to get a list with all field-names prefixed by a string?

I would like to create an output based on the field-names of my Solr index objects.
What I have are objects like this e.g.:
{
"Id":"ID12345678",
"GroupKey":"Beta",
"PricePackage":5796.0,
"PriceCoupon":5316.0,
"PriceMin":5316.0
}
Whereby the Price* fields may vary from object to object, some might have more than three of those, some less, however they would be always prefixed with Price.
How can I query Solr to get a list with all field-names prefixed by Price?
I've looked into filters, facets but could not find any clue on how to do this, as all examples - e.g. regex facet - are in regard to the field-value, not the field-name itself. Or at least I could not adapt it to that.
You can get a comma separated list of all existing field names if you query for 0 documents and use the csv response writer (wt parameter) to generate the field name list.
For example if you request /solr/collection/select?q=*:*&wt=csv you get a list of all fields. If you only want fields prefixed with Price you could also add the field list parameter (fl) to limit the fields.
So the request to /solr/collection/select?q=*:*&wt=csv&fl=Price* should return the following response:
PricePackage,PriceCoupon,PriceMin
With this solution you get all fields existing including dynamic fields.
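A minimal sketch of consuming that response from a script, in Python. The base URL is a placeholder and the function names are made up for illustration; only the q, wt and fl parameters come from the answer above.

```python
import csv
import io
from urllib.parse import urlencode

def field_names_url(base, prefix):
    # base is a placeholder such as "http://localhost:8983/solr/collection".
    params = {"q": "*:*", "wt": "csv", "fl": prefix + "*"}
    return base + "/select?" + urlencode(params)

def parse_header(csv_text):
    # The first CSV row carries the field names; later rows are documents.
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])

print(field_names_url("http://localhost:8983/solr/collection", "Price"))
print(parse_header("PricePackage,PriceCoupon,PriceMin\n"))
```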

How to search multiple words in one field on solr?

I have a field in solr of type list of texts.
field1:{"key1:val1,key2:val2,key3:val3", "key1:val1,key2:val2"}
I want to form a query such that when I search for key1:val1 and key3:val3, I get only the result that contains both strings, i.e. key1:val1 and key3:val3.
How shall I form the query?
If these are values in a multivalued field, you can't - directly. You'll have to use something like highlighting to tell you where Solr matched it.
There is no way to tell Solr "I only want the value that matched inside this set of values".
If this is a necessary way to query your index, index the values as separate documents in a separate collection instead. In that case you'd have two documents, one with field1:"key1:val1,key2:val2,key3:val3" and one with field1:"key1:val1,key2:val2".
You can use AND with fq.
Like:
fq=key1:val1 AND key3:val3
With this filter query you will get only records where key1 = val1 AND key3 = val3.

Solr copyField mixed with RegexTransformer

Scenario:
In the database I have a field called Categories which is of type string and contains a number of pipe-delimited integers, such as 1|8|90|130|
What I want:
In Solr index, I want to have 2 fields:
Field Categories_pipe which would contain the exact string as in the DB i.e. 1|8|90|130|
Field Categories which would be a multi-valued field of type INT containing values 1, 8, 90 and 130
For the latter, in the entity specification I can use a regexTransformer then I specify the following field in data-config.xml:
<field column="Categories" name="Navigation" splitBy="\|"/> and then specify the field as multi-valued in schema.xml
What I do not know is how can I 'copy' the same field twice and perform regex splitting only on one. I know there is the copyField facility that can be defined in schema.xml however I can't find a way to transform the copied field because from what I know (and I maybe wrong here), transformers are only available in the entity specification.
As a workaround I can also send the same field twice from the entity query but in reality, the field Categories is a computed field (selects nested) which is somewhat expensive so I would like to avoid it.
Any help is appreciated, thanks.
Instead of splitting it in data-config.xml, you could do the split in your schema.xml. Here is what you could do:
Create a fieldType with the tokenizer PatternTokenizerFactory, which uses a regex to split on |.
FieldSplit: Create a multivalued field using this new fieldType; it will eventually hold 1, 8, 90, 130.
FieldOriginal: Create a String field (if you need no analysis on it) that preserves the original value 1|8|90|130|
Now you can use copyField to copy values between FieldSplit and FieldOriginal, depending on your needs.
Check this Question, it is similar.
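To picture what the two fields end up holding, here is a small Python sketch of the split. It only mimics the idea of a pattern tokenizer splitting on the pipe; the function name is made up, and dropping the empty token after the trailing pipe is an assumption about the desired result rather than a statement about Solr internals.

```python
import re

def split_categories(raw):
    # Split on the literal pipe and drop the empty piece left by the
    # trailing delimiter.
    return [token for token in re.split(r"\|", raw) if token]

original = "1|8|90|130|"
print(original)                    # what FieldOriginal would preserve
print(split_categories(original))  # the tokens FieldSplit would be searched on
```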
You can create two columns from the same data and treat them separately.
SELECT categories, categories as categories_pipe FROM category_table
Then you can split the "categories" column, but index the other one as-is.

Using multivalued field in map function

I'm working on implementing Solr in a project and right now I'm stuck on a specific search including an arr field. The thing is:
I'd like to search sub-id's on an object, these sub-id's are stored in a multivalue field, e.g.:
<arr name="SubIds">
<int>12272</int>
<int>12304</int>
<int>12306</int>
</arr>
The query (or part of the query) that I want to use is as follows:
map(SubIds,i,i,1,0)
When I, for example, fill 12304 on the 'i' space in the map function above, I would expect my function to return 1. If I would enter 12345 it should return 0. The thing is that when I run this query it returns 0, or "There's no number 12304 in this field, I return 0".
When removing the 0 from my map function I can see the actual value returned to me (when 12304 return 1, when different return value), in this case that's 12306! I've tried this with some different multivalued fields but the result is the same; it looks like the function is checking the last value in the multivalue field against my filled in ID.
Is this true? And if so, is there any way to look through the whole arr and only return 0 when the value doesn't exist anywhere in the multivalued field?
Edit: It's just a hunch, but could it be that the map() function automatically orders the arr list when it sees that all the items are of type int (for example)? That could mean that map() works on the first number after sorting (the highest), which in my example would be 12306, not 12304...
Thanks!
... It looks like function queries don't work with multivalued fields ...
http://lucene.472066.n3.nabble.com/Using-multivalued-field-in-map-function-td3318843.html#a3322023:
Function queries don't work with multivalued field.
http://wiki.apache.org/solr/FunctionQuery#Vector_Functions
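For reference, the behaviour described in the question is consistent with map() receiving a single value rather than the whole list. A rough Python model of map(x,min,max,target,default), written only to illustrate the semantics (solr_map is not a real API):

```python
def solr_map(x, lo, hi, target, default=None):
    # Values inside [lo, hi] become target; anything else becomes default,
    # or stays unchanged when no default is given.
    if lo <= x <= hi:
        return target
    return x if default is None else default

# If the function only ever sees one value of the field, say 12306:
print(solr_map(12306, 12304, 12304, 1, 0))  # falls outside the range
print(solr_map(12306, 12304, 12304, 1))     # no default, value passes through
print(solr_map(12304, 12304, 12304, 1, 0))  # what a single-valued field would give
```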
Given the following case, is there anybody who has a better idea on how I can query the wanted data?
I've got a website full of blogposts and every blogpost has an owner,
this owner is referred to through his/her id. For example: BloggerId
= 123. It's also possible that the blog has multiple co-writers, which
are also referred to by their BloggerId but these ids are stored in
the multivalue field, in my previous example SubIds.
When searching for a specific blogger one searches the BloggerId.
Search results are influenced by a number of variables, the
country/state/more specific geographical data, the blog category, etc.
For this I use a faceted query. Next I want to make some results more
important, depending on the BloggerId, I tried to do this with the
following query:
?q={!func}map(sum(map(BloggerId,12304,12304,2,0),map(BloggerId,12304,12304,1,0)),3,3,2)&fl=*,score&facet.field=Country&f.Country.facet.limit=6&facet.field=State&fq=(BlogCategory:internet%20OR%20BlogCategory:sports&sort=score%20desc,Top%20desc,%20SortPriority%20asc&start=0&omitHeader=true
In the resulting list, blogs written by BloggerId 12304 should be on
top of the list, followed by the blogs where BloggerId 12304 was
co-writer. After that, all other blogs that follow the criteria but
aren't written (or co-written) by BloggerId 12304.
Maybe I could make this multivalued field a string field (where ids are separated by ";") and query my value, but if anyone has a better idea, you're always welcome!
In the end I chose to add a string-valued field that uses whitespace to separate the different values. After that I used the solr.WhitespaceTokenizerFactory class to quickly scan the string for occurrences of a specific ID.
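The idea behind that workaround can be sketched in Python: once the string is split on whitespace, an ID is matched as a whole token rather than as a substring. The function name has_id is made up, and this only models the tokenising step, not the Solr scoring.

```python
def has_id(ids_field, wanted):
    # Split on whitespace and compare whole tokens.
    return str(wanted) in ids_field.split()

sub_ids = "12272 12304 12306"
print(has_id(sub_ids, 12304))  # whole-token match
print(has_id(sub_ids, 1230))   # a prefix of an ID is not a token
print("1230" in sub_ids)       # plain substring search would match here
```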
