Store abbreviation using Solr in-built feature - solr

I want to abbreviate words using Solr. I am using Solr 7.1. In the schema, I have a field named "author" of type string. I want to create a copy field from it that stores an abbreviation of the string being stored in the "author" field. For example, when "William Shakespeare" is stored in the "author" field, "W. Shakespeare" should be added to the copy field during indexing. I am very new to Solr and have not been able to configure it for this purpose. Please help.
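For illustration only, the transformation described above can be sketched in Python as something an indexing client might compute before sending the document to Solr. This is not a Solr built-in feature; the function name abbreviate_author and the field name author_abbr are hypothetical.

```python
def abbreviate_author(name: str) -> str:
    """Shorten every name part except the last one to its initial."""
    parts = name.split()
    if len(parts) < 2:
        # Single-word (or empty) names are returned unchanged.
        return name.strip()
    initials = [part[0].upper() + "." for part in parts[:-1]]
    return " ".join(initials + [parts[-1]])


# Hypothetical document: "author_abbr" is an assumed name for the second field.
doc = {"id": "1", "author": "William Shakespeare"}
doc["author_abbr"] = abbreviate_author(doc["author"])
```

Computing the value on the client keeps the "author" field untouched and stores the abbreviated form as an ordinary second field.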

Related

Copy Solr Field Values via Script

I would like to copy the data from one field to another for all documents in Solr.
A title field that is already populated needs to be copied into another field I just created. I'd like to do this for all documents at once, if possible via PuTTY or the Solr Admin console.
Thank you for any help.
If the data has already been ingested, the only option is to update the documents again after adding the second field. With Solr atomic updates you can set only the new field in each document instead of re-sending all the fields. https://solr.apache.org/guide/8_6/updating-parts-of-documents.html#atomic-updates
solr.add({'id':1, 'newField': {'set': 'sample value'}})
For future insertions, if you want the second field to be filled automatically, you can use a Solr copyField with the source set to the first field. https://solr.apache.org/guide/8_6/copying-fields.html
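As a rough sketch of the atomic-update approach, the snippet below builds one "set" update per existing document in the JSON shape that Solr's atomic updates use. The helper name build_atomic_updates, the field name title_copy, and the sample documents are assumptions for illustration; the resulting JSON would then be posted to the collection's update handler.

```python
import json


def build_atomic_updates(docs, source_field, target_field):
    """Build one atomic 'set' update per document that has the source field."""
    return [
        {"id": doc["id"], target_field: {"set": doc[source_field]}}
        for doc in docs
        if source_field in doc
    ]


# Hypothetical documents, e.g. fetched beforehand with a query returning id and title.
existing = [
    {"id": "1", "title": "Hamlet"},
    {"id": "2", "title": "Macbeth"},
    {"id": "3"},  # no title, so no update is generated
]

updates = build_atomic_updates(existing, "title", "title_copy")
payload = json.dumps(updates)  # request body for the update handler
```

Only the id and the new field travel over the wire, so the other fields of each document do not have to be re-sent.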

Matching elasticsearch data indexed by Titan

I have indexed Titan data in Elasticsearch. The indexing worked, but when I look at the data in Elasticsearch through the REST API, the column/property name looks different from the one in Titan.
For example, I indexed age while inserting data into Titan:
final PropertyKey age = mgmt.makePropertyKey("age").dataType(Integer.class).make();
mgmt.buildIndex("vertices",Vertex.class).addKey(age).buildMixedIndex(INDEX_NAME);
and this is what I see in Elasticsearch:
{
"_index" : "titan",
"_type" : "vertices",
"_id" : "sg",
"_score" : 1.0,
"_source":{"6bp":30}
},
Looking at the data, I can tell that "6bp" is age. How is this conversion done, and how can I decode it?
My goal is to insert data into the Titan index on Elasticsearch. User queries should run against Elasticsearch through the Elasticsearch client, because we need the additional search functionality that Elasticsearch supports; once matches are found, the related results are fetched with a Titan query.
The field names are Long encoded. You can reverse the encoding using this class:
com.thinkaurelius.titan.util.encoding.LongEncoding
Or, an even better option if you can use it, simply specify the search field names explicitly using the field mapping:
By default, Titan will encode property keys to generate a unique field name for the property key in the mixed index. If one wants to query the mixed index directly in the external index backend, these encoded names can be difficult to deal with and are illegible. For this use case, the field name can be explicitly specified through a parameter.
mgmt = g.getManagementSystem()
name = mgmt.makePropertyKey('bookname').dataType(String.class).make()
mgmt.buildIndex('booksBySummary',Vertex.class).addKey(name,com.thinkaurelius.titan.core.schema.Parameter.of('mapped-name','bookname')).buildMixedIndex("search")
mgmt.commit()
http://s3.thinkaurelius.com/docs/titan/0.5.1/index-parameters.html#_field_mapping
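To make the decoding route mentioned earlier in this answer more concrete, here is a minimal Python sketch of reversing such a field name. It assumes the encoding is plain base 36 over the digits 0-9 followed by a-z, which is what Titan's LongEncoding appears to use; verify this against the class in your Titan version. The function name decode_field_name is hypothetical.

```python
# Assumed symbol set: digits followed by lowercase letters (base 36).
BASE_SYMBOLS = "0123456789abcdefghijklmnopqrstuvwxyz"


def decode_field_name(encoded: str) -> int:
    """Turn an encoded index field name back into a numeric key ID."""
    base = len(BASE_SYMBOLS)
    value = 0
    for ch in encoded:
        pos = BASE_SYMBOLS.find(ch)
        if pos < 0:
            raise ValueError(f"symbol {ch!r} is not in the assumed alphabet")
        value = value * base + pos
    return value


key_id = decode_field_name("6bp")  # 6*36**2 + 11*36 + 25
```

Under that assumption, the decoded number would be the ID of the property key, which still has to be looked up in Titan to get the name back; the explicit mapped-name parameter avoids that extra step.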

Solr DataImportHandler (DIH): extract email addresses during indexing and put them in another field?

I have a field called main_text which contains large text entries.
I want to reindex my data by creating a new collection, and I want to extract all email addresses from this field into a new dedicated field called emails_fields.
What is the best way to do this?
Which handler should I use? DIH? Another one?
What type should this new field be?
To use the DataImportHandler, add something similar to the following to your data-config.xml file (the enclosing entity needs transformer="RegexTransformer" for the regex attribute to take effect):
<field column="email_fields" regex="(\S+@\S+)" sourceColName="main_text"/>
This looks for email addresses that match the regex \S+@\S+. This regular expression is deliberately simple and should be replaced with something stricter for real use.
The type of the field depends on how you want to search it, but it should probably be string or text_general, and if you expect more than one email address per document it should be multi-valued.
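If the extraction is done in the indexing client rather than in DIH, the same idea can be sketched in Python. The pattern below is only a slightly stricter illustration, not a complete email validator, and the function name extract_emails is hypothetical.

```python
import re

# Illustrative pattern: local part, "@", then a dotted domain.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(main_text: str) -> list:
    """Return the distinct email-like strings in order of first appearance."""
    found = []
    for match in EMAIL_PATTERN.findall(main_text):
        if match not in found:
            found.append(match)
    return found


text = "Contact alice@example.com or bob.smith@mail.example.org. Again: alice@example.com"
emails = extract_emails(text)
```

The resulting list maps naturally onto a multi-valued field, with one value per address.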

Solr 3.5 field type

I'm using Solr 3.5 in the application I'm currently working on. I have defined a few custom field types whose values carry a prefix.
Mostly they are prices, which differ for each prefix.
For example, 123_34.99 defines the price "34.99" in the store "123".
I need to know whether Solr 4.1.0 has an out-of-the-box field type, exact or similar, that handles such values.
I guess a better approach to store your data would be to use Solr dynamic fields. Instead of storing your data as 123_34.99, wouldn't you want to store it in a price_STOREID field like
price_123 = 34.99
Or is there a specific reason you want to store it as 123_34.99?
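A minimal sketch of that conversion, assuming the existing values always have the form STOREID_PRICE and that the schema defines a dynamic field matching price_*; the helper name to_dynamic_price_fields is hypothetical.

```python
def to_dynamic_price_fields(prefixed_values):
    """Map values like '123_34.99' to dynamic field names like 'price_123'."""
    fields = {}
    for value in prefixed_values:
        store_id, _, price = value.partition("_")
        if not store_id or not price:
            raise ValueError(f"expected STOREID_PRICE, got {value!r}")
        fields[f"price_{store_id}"] = float(price)
    return fields


# Hypothetical document combining an id with per-store price fields.
doc = {"id": "sku-1"}
doc.update(to_dynamic_price_fields(["123_34.99", "456_29.50"]))
```

With one numeric field per store, a price can be filtered or sorted per store without parsing a prefix at query time.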

Returning documents using multi-valued field

I'm quite new to Solr and I'm supporting an existing Solr search engine which was written by someone else. I've been reading up on Solr for the last couple of weeks, so I'd consider myself beyond the basics.
A particular field, let's say name, is multi-valued. For example, a document has a field "name" with the values "Alice, Trudy". We want the document to be returned when "Alice" or "Trudy" is entered, but not when "Alice Trudy" is entered. Currently the document is returned even for "Alice Trudy". How could this be done?
Thanks a lot!
Krt_Malta
If the field value is "Alice, Trudy", Solr/Lucene should normally match "alice" or "trudy". If it does not, special text analysis or stemming options may be active for this field.
Take a look at the part "text analysis" at the solr documentation: http://lucene.apache.org/solr/tutorial.html#Text+Analysis
and: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
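As a rough approximation of what text analysis does to such a value, the Python sketch below lowercases the text and splits it on non-alphanumeric characters. It is not Solr's actual analysis chain, and the function names are hypothetical; it only illustrates why a query such as "Alice Trudy" can still match when any single query term is allowed to match.

```python
import re


def analyze(text: str) -> list:
    """Very rough stand-in for a tokenizing, lowercasing analyzer."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def matches_any_term(query: str, field_value: str) -> bool:
    """True if at least one query token equals one indexed token (OR semantics)."""
    indexed = set(analyze(field_value))
    return any(token in indexed for token in analyze(query))


indexed_tokens = analyze("Alice, Trudy")
```

In this simplified model, "Alice", "Trudy", and "Alice Trudy" all match the value "Alice, Trudy", because each query contains at least one indexed token.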
