Using logic AND in a text field - solr

I'm using a schema that has a text field containing ids separated by spaces. The field definition in schema is below:
<field name="aux_identifiers" type="text" indexed="true" stored="true"/>
A query that fetches a single document returns the field as shown below, for example:
<str name="aux_identifiers">1 2 3 4</str>
Is there any way to apply a logical AND operator to this field? I need to find the documents that have, for example, both ids 2 and 3 in the field.
FYI, we can't change the field to multivalued or an array and reindex right now. That's why I'm looking for an alternative solution.

It would depend on what kind of processing you have on that field, but this should work:
q=aux_identifiers:2 AND aux_identifiers:3
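As a rough illustration of why this works, here is a minimal Python sketch. It assumes the text type splits the value on whitespace at index time (the real analyzer chain depends on your schema). The helper names matches_all and build_and_query are made up for this example and are not part of Solr.

```python
from urllib.parse import urlencode


def matches_all(field_value, wanted):
    """Mimic an AND of term queries against a whitespace-tokenized field."""
    tokens = set(field_value.split())
    return all(str(w) in tokens for w in wanted)


def build_and_query(field, wanted):
    """Build the URL-encoded q parameter: field:a AND field:b ..."""
    clauses = " AND ".join(f"{field}:{w}" for w in wanted)
    return urlencode({"q": clauses})


if __name__ == "__main__":
    print(matches_all("1 2 3 4", [2, 3]))   # document holding ids 1 2 3 4
    print(matches_all("1 2 4", [2, 3]))     # document missing id 3
    print(build_and_query("aux_identifiers", [2, 3]))
```

Each id becomes its own term in the index, so the AND only requires both terms to be present somewhere in the field, in any order.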

Related

Solr Search not working after dataimport successful

I am new to Solr. I have tried DataImport using an Oracle database. The data gets imported successfully. When I try to search with the query:
qt=standard
q=*
I get good results. But when I do a specific search, the results are empty showing no documents. The logger is empty and there are NO errors displayed.
Ok! I got it.
I observed that when I use some of the pre-defined fields of schema.xml, searches on those fields work fine. But when I defined some fields of my own, the result was still NOTHING.
Then I looked into the "/select" request handler in solrconfig.xml. There is a line
<str name="df">text</str>
which says that "text" is the only field which is searchable. But then how does it search the other fields?
The answer lies in schema.xml's
<copyField>
tag. The fields present by default are copied into "text", which makes them searchable. Hence, if you want your own field to be searchable, just define the field and add it in a copyField tag. ;)
TLDR Version: Define your fields as type="text" to start off. If you have a field called "product", add <field name="product" type="text" indexed="true" stored="true" /> to the default schema.xml inside the <fields> tag and you should be done. To search using the select request-handler, use q=<field_name>:<text_to_look_for> or q=*:* to show all documents.
There are a few mistakes you're making here. I'll be explaining using the 'select' request handler.
The format for a query is ?q=<field_name>:<text_to_look_for>. So if you want to return all the values matching all the fields, you'd say q=*:*
And if you were to look for the word "iPod" in the field "product" your query would be q=product:iPod
Another thing to keep in mind: if you declare the field product in schema.xml as type="string", which maps to class="solr.StrField", the query (<text_to_look_for>) must match the indexed value exactly, since Solr doesn't tokenize or otherwise analyze a StrField, i.e., ipod will not return results if your index holds it as iPod. If you still need it to match, use type="text" in schema.xml (the fieldType definition is already present in the default schema.xml). The "text" fieldType has an analyzer chain made up of a tokenizer, which splits the field into words and indexes them separately, and several filters, one of which lowercases the terms, so a search for a particular word, say "ipod", would match the value "iPod 16GB White".
Regarding your own answer, the <str name="df">text</str> setting specifies the default field to search in, i.e., if you just said q=iPod, it would look in this field. The purpose of the field called text is to hold a copy of all the other fields in the document, so that you can search this one field and know that a match in any field of the document will be found, without having to name a specific field when you don't know which one holds the value.
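The difference between the two field types can be sketched in a few lines of Python. This is only a toy model: it assumes the "text" type does nothing more than split on whitespace and lowercase, whereas the real fieldType in your schema.xml may do more. The function names are made up for this example.

```python
def string_field_matches(indexed_value, query):
    """Toy model of a string field: the whole value is one term, compared exactly."""
    return indexed_value == query


def text_field_matches(indexed_value, query):
    """Toy model of a text field: split on whitespace, lowercase both sides."""
    tokens = {token.lower() for token in indexed_value.split()}
    return query.lower() in tokens


if __name__ == "__main__":
    print(string_field_matches("iPod", "ipod"))            # case differs
    print(string_field_matches("iPod", "iPod"))            # exact value
    print(text_field_matches("iPod 16GB White", "ipod"))   # one word, any case
```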

Searching multi-valued fields in the same position

Let's say we have a document like this:
<arr name="pvt_rate_type">
<str>CORPORATE</str>
<str>AGENCY</str>
</arr>
<arr name="pvt_rate_set_id">
<str>1</str>
<str>2</str>
</arr>
Now I do a search where I want to return the document only if it contains pvt_rate_set_id = 1 AND pvt_rate_type = AGENCY in the same position in their multi-valued fields, so the above document should NOT be returned (because pvt_rate_set_id 1 has a pvt_rate_type of CORPORATE).
Is this possible at all in Solr, or is my schema badly designed? How else would you design that schema to allow for the searching I want?
This may not be available Out of the Box.
You would need to modify the schema to have fields with pvt rate type as field name and id as its value
e.g.
CORPORATE=1
AGENCY=2
This can be achieved by having dynamic fields defined.
e.g.
<dynamicField name="*_pvt_rate_type" type="string" indexed="true" stored="true"/>
So you can input data as corporate_pvt_rate_type or agency_pvt_rate_type with the respective values.
The filter queries will be able to match the exact mappings fq=corporate_pvt_rate_type:1
Unfortunately Solr does not seem to support this.
Another way to do this in Solr would be to store a concatenated string field type_and_id with a delimiter (say comma) separating the type and the id and query like:
q=type_and_id:AGENCY%2C1
(where %2C is the URL encoding for comma).
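A minimal Python sketch of the concatenation idea, assuming type_and_id is indexed as a multi-valued string field and that the two source lists are aligned by position. The helper names pair_values and build_pair_query are hypothetical.

```python
from urllib.parse import quote


def pair_values(rate_types, set_ids, delimiter=","):
    """Join the values that share a position into one 'TYPE,ID' string each."""
    return [f"{t}{delimiter}{i}" for t, i in zip(rate_types, set_ids)]


def build_pair_query(field, rate_type, set_id, delimiter=","):
    """Build the q parameter with the delimiter URL-encoded."""
    value = quote(f"{rate_type}{delimiter}{set_id}", safe="")
    return f"q={field}:{value}"


if __name__ == "__main__":
    indexed = pair_values(["CORPORATE", "AGENCY"], ["1", "2"])
    print(indexed)
    print("AGENCY,1" in indexed)   # the pair asked about in the question
    print(build_pair_query("type_and_id", "AGENCY", 1))
```

Because each type is stored together with its own id, a query for AGENCY with id 1 does not match the example document, which only holds CORPORATE with 1 and AGENCY with 2.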

SOLR - Use single text field in schema for full text search

I am getting familiar with SOLR.
I would like to use SOLR for full text search across many kinds of entities. I don't want to create a Document for every different type of entity, and I don't need to be able to search specific fields. I am only interested in whether a specified string appears anywhere in any item.
In database terms for example I have a table News and a table Employee and I want to search for the word 'apple', I don't mind in which field it is, I only want to get back the database ID from the records which contain it.
Could it be a solution, that I use a SOLR schema something like this:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="false"/>
</fields>
So, I only need an ID and the contents. I put all the data that I want to be able to search into one 'content' field. When I search for some words, it looks for them in the 'id' and in the 'content'.
Is this a good idea? Any performance or design problem?
Thanks,
Tamas
See https://wiki.apache.org/solr/SchemaXml#Copy_Fields. It says:
A common requirement is to copy or merge all input fields into a single solr field. This can be done as follows:-
<copyField source="*" dest="text"/>
That's typically what is done to search across multiple fields.
But if you don't even want your original fields, just concatenate all your fields into one big field content and index in Solr. There should be no problems with that.
You can either copyField to text (see example in the distribution) and have that set as default field ("df" parameter in solrconfig.xml for the select handler).
Or, if you anticipate more complex requirements down the line and/or non-text searches, I would recommend looking at eDismax with qf parameter and it will handle searching all those fields itself.
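If you go with the single concatenated field, the flattening step on the client side could look like the Python sketch below. It is only an illustration: the function name to_solr_doc is made up, and prefixing the id with the entity name is my own assumption, added so that a News row and an Employee row with the same database ID don't overwrite each other.

```python
def to_solr_doc(entity, record_id, record):
    """Flatten one database row into an id plus a single searchable content field."""
    # Skip NULL columns and join everything else with spaces.
    text = " ".join(str(value) for value in record.values() if value is not None)
    # Assumption: the entity name is prefixed to keep ids unique across tables.
    return {"id": f"{entity}-{record_id}", "content": text}


if __name__ == "__main__":
    row = {"title": "Apple launch", "body": None, "author": "Tamas"}
    print(to_solr_doc("news", 7, row))
```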

How to create Solr schema for hierarchical facet by splitting data into multiple fields at index time

I want to implement Solr hierarchical faceting for my application, where there is a 2-level hierarchy between Category and SubCategory. I want to use the solution described at http://wiki.apache.org/solr/HierarchicalFaceting#Pivot_Facets.
The flattened data will be as below:
Doc#1: NonFic > Law
Doc#2: NonFic > Sci
Doc#3: NonFic > Sci > Phys
And this data should be split into a separate field for each level of the hierarchy at index time. Same as below.
Indexed Terms
Doc#1: category_level0: NonFic; category_level1: Law
Doc#2: category_level0: NonFic; category_level1: Sci
Doc#3: category_level0: NonFic; category_level1: Sci; category_level2: Phys
So can anyone please suggest ways to implement this? How do I define Solr schema to achieve this? I could not find any reference for splitting data as mentioned above at Index time.
Thanks,
Priyanka
Do you need to display those individual fields as part of the documents returned? In which case you need those split values in 'stored' version of the field. If you only need to have them during search or during faceting, you can ignore the 'stored' form and concentrate on 'indexed' form.
In either case, if you need to split one field into several, you can do that with copyField or with UpdateRequestProcessor.
With copyField, the 'stored' form will be the same for all fields, but you can have a different analysis chain for each field, each picking a different part of the hierarchy for the 'indexed' form.
With UpdateRequestProcessor, you can write a custom one that takes one field and emits several fields, each with only its part of the path. You can either write a custom processor or do a couple of field copies and then apply a different regex processor to each field.
To split the data, use a ScriptTransformer that allows you to transform the data using Javascript within your config files.
Add the following to your db-data-config at the same level as dataSource and document. This defines a function that splits the string within a field on the delimiter, >, and adds a field for each of the split values called category_level0, category_level1,...
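<script><![CDATA[
// Split the hierarchy column on '>' and add one field per level:
// category_level0, category_level1, ...
function CategoryPieces(row) {
  var pieces = row.get('ColumnToSplit').split('>');
  for (var i = 0; i < pieces.length; i++) {
    // trim() drops the spaces around each '>' (e.g. "NonFic > Law")
    row.put('category_level' + i, pieces[i].trim());
  }
  return row;
}
]]></script>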
Then in your main <entity> tag, add transformer="script:CategoryPieces", and add the columns to your field list.
<field column="category_level0" name="Category_Level0" />
<field column="category_level1" name="Category_Level1" />
Last, in your schema.xml, add the new fields.
<field name="Category_Level0" type="string" indexed="true" stored="true" multiValued="false" />
<field name="Category_Level1" type="string" indexed="true" stored="true" multiValued="false" />
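If you index from your own client code instead of the DataImportHandler, the same split can be done before the document is sent to Solr. Here is a minimal Python sketch of that idea. The function name split_category is hypothetical, and it assumes the same > delimiter and category_levelN naming used above.

```python
def split_category(path, delimiter=">", prefix="category_level"):
    """Turn 'NonFic > Sci > Phys' into one field per hierarchy level."""
    pieces = [piece.strip() for piece in path.split(delimiter)]
    return {f"{prefix}{level}": piece for level, piece in enumerate(pieces)}


if __name__ == "__main__":
    for path in ["NonFic > Law", "NonFic > Sci", "NonFic > Sci > Phys"]:
        print(split_category(path))
```

Note that the third example produces a category_level2 value, so the schema needs a matching field (or a dynamic field) for every level that can occur in the data.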

Solr schema.xml field confusion

I'm new to Solr, so I really need someone to help me understand the fields below. What is the meaning of a field with stored=false and indexed=false? See the two examples below: what are the differences? If the field is not stored, what's the use of it?
<field name="test1" type="text" indexed="false"
stored="false" required="false" />
How about this one?
<field name="test2" type="text" indexed="false"
stored="false" required="false" multiValued="true" />
Thanks a lot!
You can find the best explanation on the Solr wiki.
If you want a field to be searchable, set its indexed attribute to true.
indexed=true : True if this field should be "indexed". If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
If you want to retrieve the field in the search results, set its stored attribute to true.
stored=true : True if the value of the field should be retrievable during a search
If you want to store multiple values in a single field, set its multiValued attribute to true.
multiValued=true : True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
It's easier than it seems:
indexed: you can search on it
stored: you can show it within your search results
In fact, there might be fields that you don't use for search but just want to show in the results. On the other hand, there might be fields that you want to search on but don't need to show in the results. Setting stored=false matters when you don't need to show a certain field, since it improves performance. If you make all your fields stored and you have a lot of fields, Solr can become slow at returning results.
Of course, having both false doesn't make a lot of sense, since the field would become totally useless.
The only difference between your two fields is multiValued=true, which means that the second field can contain multiple values. In other words, the content of the field is not a single text entry but a list of text entries.
