Dynamic fieldType assignments in Solr

I have a scenario where I would like to dynamically assign incoming document fields to two different Solr schema fieldTypes. One fieldType will be an 'exact match' fieldType while the other will be a 'full text' fieldType. The fields will follow a predictable pattern, but the pattern cannot be captured with a dynamicField declaration and will not be known ahead of time.
So here is an example of the field names that I need to be able to process:
FOO_BAR_TEXT_1
FOO_BAR_TEXT_2
WIDGET_BAR_TEXT_3
WIDGET_BAR_TEXT_4
--
FOO_BAR_SELECT_1
FOO_BAR_SELECT_2
WIDGET_BAR_SELECT_1
The above fields will not be defined in advance. I need to map all fields with _BAR_SELECT_ in the name to a fieldType of 'exactMatch', and all fields with _BAR_TEXT_ in the name to a fieldType of 'fulltext'. I was hoping there might be a way to do this dynamically when the document is indexed.

Have you tried using solr dynamic fields?
https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields
Basically it would look something like this:
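<dynamicField name="*_BAR_TEXT_*" type="fulltext" />
<dynamicField name="*_BAR_SELECT_*" type="exactMatch" />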
Obviously you'd need to make your own definitions (or use an existing one) for the types.

It is currently not possible to create fields like *_BAR_SELECT_*.
In the old Solr wiki, as well as in the collection1 schema.xml file, there is a restriction mentioned for dynamic fields:
RESTRICTION: the glob-like pattern in the name attribute must have a "*" only at the start or the end.
However, if you change the names to, say, BAR_TEXT_* and BAR_SELECT_*, then it will be possible to dynamically create fields such as "BAR_TEXT_FOO_1", "BAR_TEXT_FOO_2", "BAR_TEXT_WIDGET_3", and so on.
Like this:
<dynamicField name="BAR_TEXT_*" type="fulltext" />
<dynamicField name="BAR_SELECT_*" type="exactMatch" />

Related

Solr: how do I use dismax instead of using copyField?

I've been trying to figure this out for a bit now. If I create a schema without the directive:
<copyField source="*" dest="text" />
I can't seem to pull anything up. But when I add that directive, things magically appear. I'm trying my query with ?defType=dismax, but that doesn't seem to help.
Am I missing something? Do I need something special in my schema? I'm indexing all the fields I need to search against.
Thoughts?
Thanks!
If you use defType=lucene you need to specify the field before your search query like this:
q=title:test
If you don't specify a field, Solr will use the default field specified in solrconfig.xml. This field is text by default. As all the fields are copied to text, the search works well.
If you decide to use dismax, the query structure changes. You need to put your search term like this:
q=test
and specify the fields to search in another parameter, for example in the request handler defaults:
<str name="qf">field1 field2</str>
where field1 and field2 are the fields in which you want to search the terms.
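As a rough single-request equivalent (field names here are only placeholders), the qf fields can also be passed directly on the URL:
/select?q=test&defType=dismax&qf=field1%20field2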

Solr dynamicField not searched in query without field name

I'm experimenting with the Example database in Solr 4.10 and not understanding how dynamicFields work. The schema defines
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
If I add a new item with a new field name (say "example_s":"goober" in JSON format), a query like
?q=goober
returns no matches, while
?q=example_s:goober
will find the match. What am I missing?
I would like to see the SearchHandler from the solrconfig.xml file that you are using to execute the above-mentioned query.
In the SearchHandler we generally have a default query field, i.e. the qf parameter.
Check that your dynamic field example_s is present in that query field list in solrconfig.xml; otherwise you can pass it while sending the query to the search handler.
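For instance, if that handler uses the (e)dismax parser, the field from the question can also be supplied per request (a sketch, assuming nothing else overrides qf):
/select?q=goober&defType=edismax&qf=example_s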
Hope this will help you in resolving your problem.
If you are using the default schema, here's what's happening:
You are probably using the default end-point (/select), so you get the definition of search type and parameters from that. Which means it is the default (lucene) search and the field searched is text.
The text field is an aggregate and is populated by copyField instructions from other fields.
Your dynamic field definition for *_s allows you to index text under any field name ending in _s, such as example_s. It's indexed (so you can search against it directly) and stored (so you can see it when you ask for all fields). It will not, however, be searched as general text. Notice that (differently from ElasticSearch) Solr strings have to be matched fully and completely. If you have some multi-word text in it, there is barely any point searching it. "goober" is one word, so it's not a very good example for understanding the difference here.
The easiest solution for you is to add another copyField instruction:
<copyField source="*_s" dest="text"/>, then all your *_s dynamic fields would also be searchable. But notice that the search analysis will not be the one from the *_s definition, but the one from the text field's definition, which is not string but text_general, defined elsewhere in the file.
As to Solr vs. ElasticSearch, they both err on the different sides of magic. Solr makes you configure the system and makes it very easy to see the exact current configuration. ElasticSearch hides all of the configuration, but you have to rediscover it the second you want to change away from the default behaviour. In the end, the result is probably similar and meets somewhere in the middle.

Index every word of a text file which is delimited by whitespace in solr?

I am implementing Solr 3.6 in my application. I have the below data in my text file:
date=2011-07-08 time=10:55:06 timezone="IST" device_name="CR1000i"
device_id=C010600504-TYGJD3 deployment_mode="Route"
log_id=031006209001 log_type="Anti Virus" log_component="FTP"
log_subtype="Clean" status="Denied" priority=Critical fw_rule_id=""
user_name="hemant" virus="codevirus" FTP_URL="ftp.myftp.com"
FTP_direction="download" filename="hemantresume.doc" file_size="550k"
file_path="deepti/Shortcut to virus.lnk" ftpcommand="RETR"
src_ip=10.103.6.100 dst_ip=10.103.6.66 protocol="TCP" src_port=2458
dst_port=21 dstdomain="myftp.cpm" sent_bytes=162 recv_bytes=45
message="An FTP download of File resume.doc of size 550k from server
ftp.myftp.com could not be completed as file was infected with virus
codevirus"
Now I want to split the above data into key-value pairs and have each value indexed under its key.
I want the changes to be in the configuration files. I have gone through the tokenizers, and the WhitespaceTokenizer might work, but I want the whole structure to be indexed. Can anyone please help me with this?
Thanks.
There is no tokenizer that I know of that does this.
Using static fields:
You have to define all your "keys" as fields in schema.xml. They should have the relevant types (dates, string, etc.).
Create a POJO with these fields, parse the key/value pairs, and populate the POJO. Add this POJO to Solr using SolrJ.
Using dynamic fields:
In this case you don't need to define the keys in the schema; use dynamic fields instead (based on the type of the data). You still need to parse the key/value pairs and add them to a Solr document. These fields need to be added using the SolrInputDocument.addField method.
As you add new key/value pairs, the querying client would still need to know that the new key exists, but your indexer does not need to.
This cannot be done with a tokenizer. Tokenizers are called for each field, but you need processing before handing the data to a field.
A Transformer could probably do this, or you might do some straightforward conversion before submitting it as XML. It should not be hard to write something that reads that format and generates the proper XML format for Solr submissions. It sure wouldn't be hard in Python.
For this input:
date=2011-07-08 time=10:55:06 timezone="IST" device_name="CR1000i"
You would need to create the matching fields in a schema, and generate:
<doc>
<field name="date">2011-07-08</field>
<field name="time">10:55:06</field>
<field name="timezone">IST</field>
<field name="device_name">CR1000i</field>
...
Also in this pre-processing, you almost certainly want to convert the first three fields into a single datetime in UTC.
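As a rough sketch of that pre-processing step in Python (an illustration only, not the actual converter; it assumes each log record sits on one line and that values are either bare tokens or double-quoted):

import re
import sys
from xml.sax.saxutils import escape

# Assumption: one record per line; values are either double-quoted or bare tokens.
PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')

def to_solr_xml(record):
    fields = []
    for key, raw, quoted in PAIR.findall(record):
        # Use the quoted content if the value was quoted, otherwise the bare token.
        value = quoted if raw.startswith('"') else raw
        fields.append('    <field name="%s">%s</field>' % (key, escape(value)))
    return "<add>\n  <doc>\n%s\n  </doc>\n</add>" % "\n".join(fields)

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            print(to_solr_xml(line))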
For details about the Solr XML update format, see: http://wiki.apache.org/solr/UpdateXmlMessages
The Apache wiki is down at this exact moment, so try again if there is an error page.

Solr copyField mixed with RegexTransformer

Scenario:
In the database I have a field called Categories which is of type string and contains a number of digits, pipe-delimited, such as 1|8|90|130|
What I want:
In Solr index, I want to have 2 fields:
Field Categories_pipe which would contain the exact string as in the DB, i.e. 1|8|90|130|
Field Categories which would be a multi-valued field of type INT containing values 1, 8, 90 and 130
For the latter, in the entity specification I can use a RegexTransformer; I then specify the following field in data-config.xml:
<field column="Categories" name="Navigation" splitBy="\|"/> and then specify the field as multi-valued in schema.xml
What I do not know is how I can 'copy' the same field twice and perform the regex splitting on only one of them. I know there is the copyField facility that can be defined in schema.xml, but I can't find a way to transform the copied field because, from what I know (and I may be wrong here), transformers are only available in the entity specification.
As a workaround I could also send the same field twice from the entity query, but in reality the field Categories is a computed field (nested selects) which is somewhat expensive, so I would like to avoid that.
Any help is appreciated, thanks.
Instead of splitting it in data-config.xml, you could do that in your schema.xml. Here is what you could do:
Create a fieldType with the tokenizer PatternTokenizerFactory that uses a regex to split on |.
FieldSplit: create a multi-valued field using this new fieldType; it will eventually hold 1, 8, 90 and 130.
FieldOriginal: create a string field (if you need no analysis on it) that preserves the original value 1|8|90|130|.
Now you can use copyField to populate FieldSplit and FieldOriginal based on your needs (a sketch of this is below).
Check this Question, it is similar.
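A rough sketch of that setup in schema.xml (the type name is made up here, and it assumes the DIH entity loads the raw pipe-delimited string into Categories_pipe):
<fieldType name="pipeDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
  </analyzer>
</fieldType>
<field name="Categories_pipe" type="string" indexed="true" stored="true"/>
<field name="Categories" type="pipeDelimited" indexed="true" stored="true" multiValued="true"/>
<copyField source="Categories_pipe" dest="Categories"/>
Note that with this approach the splitting happens at analysis time: Categories becomes searchable by the individual numbers, but its stored value stays the original string rather than separate integer values.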
You can create two columns from the same data and treat them separately.
SELECT categories, categories as categories_pipe FROM category_table
Then you can split the "categories" column, but index the other one as-is.

Querying Solr without specifying field names

I'm new to using Solr, and I must be missing something.
I didn't touch much in the example schema yet, and I imported some sample data. I also set up LocalSolr, and that seems to be working well.
My issue is just with querying Solr in general. I have a document where the name field is set to tom. I keep looking at the config files, and I just can't figure out where I'm going awry. A bunch of fields are indexed and stored, and I can see the values in the admin, but I can't get querying to work properly. I've tried various queries (http://server.com/solr/select/?q=value), and here are the results:
**Query:** ?q=tom
**Result:** No results
**Query:** ?q=*:*
**Result:** 10 docs returned
**Query:** ?q=*:tom
**Result:** No results
**Query:** ?q=name:tom
**Result:** 1 result (the doc with name : tom)
I want to get the first case (?q=tom) working. Any input on what might be going wrong, and how I can correct it, would be appreciated.
Set <defaultSearchField> to name in your schema.xml
The <defaultSearchField> is used by Solr when parsing queries to identify which field name should be searched in queries where an explicit field name has not been used.
You might also want to check out (e)dismax instead.
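For example, with the name field from the question the schema setting would be <defaultSearchField>name</defaultSearchField>; with edismax the field can instead be given per request, roughly like:
/select?q=tom&defType=edismax&qf=name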
I just came across a similar problem... Namely, I have defined multiple fields (that did not exist in the schema.xml) to describe my documents, and I want to search/query on multiple fields of the document, not only one of them (like the "name" field in the above-mentioned example).
In order to achieve this, I have created a new field ("compoundfield") into which I then copyField my defined fields (just like the "text" field in the schema.xml that comes with the Solr distribution). This results in something like this:
compoundfield definition:
<field name="compoundfield" type="text_general" indexed="true" stored="false" multiValued="true"/>
defaultSearchField:
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>compoundfield</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster searching. -->
<!-- ADDED Fields -->
<copyField source="field1" dest="compoundfield"/>
<copyField source="field2" dest="compoundfield"/>
<copyField source="field3" dest="compoundfield"/>
This works fine for me, but I am not sure if this is the best way to make such a "multiple field" search...
Cheers!
It seems that a DisMax parser is the right thing to use for this end. There is a related Stack Overflow thread here.
The current solution is deprecated in newer versions of Lucene/Solr. To change the default search field, either use the df parameter or change the field that is set in:
<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
  <lst name="defaults">
    <str name="df">default_field</str>
  </lst>
</initParams>
inside the solrconfig.xml
Note: I am using a non-managed schema and Solr 7.0.0 at the time of writing.
Going through the solr tutorial is definitely worth your time:
http://lucene.apache.org/solr/tutorial.html
My guess is that the "name" field is not indexed, so you can't search on it. You'd need to change your schema to make it indexed.
Also make sure that your XML actually lines up with the schema. So if you are adding a field named "name" in the xml, but the schema doesn't know about it, then Solr will just ignore that field (ie it won't be "stored" or "indexed").
Good luck
Well, even though setting a default search field is quite useful, I don't understand why you don't just use the solr query syntax:
......./?q=name:tom
or
......./?q=*:*&fq=name:tom
