Solr migration: pdate vs. tdate

I'm migrating my Solr environment from 6.3 to 7.2 and walking through all the config files.
In 6.3 I have a lot of date fields using the tdate fieldType, which uses solr.TrieDateField.
<fieldType name="tdate" class="solr.TrieDateField" positionIncrementGap="0" docValues="true" precisionStep="6"/>
In Solr 7 the tdate field type is no longer part of the default schema file. Instead of tdate, Solr 7 seems to use pdate:
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
Looking at this "Solr 7 fieldTypes doc", it seems that tdate is no longer available in Solr 7.x.
Can I, and should I, change all the fields using tdate to pdate?

First, if you do not want to change anything, you can still use TrieDateField. It is deprecated, but not removed. If this declaration
<fieldType name="tdate" class="solr.TrieDateField" positionIncrementGap="0" docValues="true" precisionStep="6"/>
is missing in your schema, add it.
But can you change to pdate? Sure: if reindexing is easy for you, you can change the type and reindex. Should you? The newer type is generally more efficient, but for some use cases the new types were less performant than the older ones. If you have a good testbed that reflects your real-world usage, the best thing would be to benchmark both. If the newer types perform at least as well as the older ones, I would say upgrade.
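As a rough sketch of what the switch could look like in the schema, the fragment below keeps both field types declared (handy while benchmarking) and moves one field over. The field name `created_at` is hypothetical and only there for illustration; the two fieldType declarations are the ones quoted above.

```xml
<!-- Old Trie-based type: deprecated in Solr 7, but still usable -->
<fieldType name="tdate" class="solr.TrieDateField" positionIncrementGap="0" docValues="true" precisionStep="6"/>
<!-- Point-based type from the Solr 7 default schema -->
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

<!-- Hypothetical field, before the migration: -->
<!-- <field name="created_at" type="tdate" indexed="true" stored="true"/> -->
<!-- After the migration (assumes the index is rebuilt from scratch): -->
<field name="created_at" type="pdate" indexed="true" stored="true"/>
```

Changing the type of an existing field without reindexing is not expected to work, since the two types store their data differently; that is why the answer ties the switch to how easy a reindex is.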

Related

multivalued field sorting on SOLR 7.2.1

I use a server running Solr 7.3.0 for testing. My schema has some multivalued string fields such as:
<field name="rating" type="string" omitNorms="true" multiValued="true" indexed="true" stored="true"/>
On Solr 7.3.0, a URL query that sorts on the field "rating" works fine, for example:
server-name1:8983/sorl/core/search?q=*&sort=rating DESC
Recently I added a slave, with the Solr 7.3.0 server mentioned above acting as the master.
The slave runs an older version (7.2.1), because that server has existed longer and 7.2.1 was the latest version back then. I never bothered to update it.
But now the same query as above
server-name2:8983/sorl/core/search?q=*&sort=rating DESC
returns an error message:
"msg":"can not sort on multivalued field:rating"
My question is: is this just a recently implemented feature, or did I miss something? I could update the slave to 7.3.0 as well, but I want to be sure that this is only a version issue.
Yes, this was implemented for 7.3.0. See SOLR-11854 - multiValued PrimitiveFieldType should implicitly sort on min/max based on the asc/desc keyword.
To find out if something has changed between versions, refer to the changelog for the new version. This is listed under the "New features" section:
SOLR-11854: multivalued primitive fields can now be sorted by implicitly choosing the min/max value for asc/desc sort orders. (hossman)

Solr highlighting for external fields

I would like to use Solr highlighting, but our documents are only indexed and not stored. The field values are found in a separate database. Is there a way to pass in the text to be highlighted without Solr needing to pull that text from its own stored fields? Or is there an interface that would allow me to pass in a query, a field name, a field value and get back snippets?
I'm on Solr 5.1.
Lucene also supports highlighting (it returns offsets) for non-stored content by using docValues. This was introduced in Lucene 8.5 (SOLR-14194), so it is not available on Solr 5.1.
Enabling docValues for a field only requires adding docValues="true" to the field (or field type) definition, e.g.:
<field name="manu_exact" type="string" indexed="true" stored="false" docValues="true" />
You could reindex the result set (read from the database) in an embedded Solr instance, run the query with the same set of keywords and highlighting turned on, and get the highlighted text back.
You could read the schema and solrconfig as resources from a local jar and extract them to a temporary Solr core directory to get this setup working.

Solr reinterprets field during replication

I've got a Solr (version 4.10.3) cloud consisting of 3 Solr instances managed by Zookeeper. Each core is replicated from the current leader to the other 2 for redundancy.
Now to the problem. I need to index a datetime field from SQL as a TextField for wildcard queries (not the best solution, but a requirement nonetheless). On the core that does the import, everything looks as it should and the field contains values like 2008.10.18 17:16:31.0. However, the corresponding document on the other cores (synced by the replication handler) has values like Sat Oct 18 17:16:31 CEST 2008 for the same field. I've been trying for a while to get to the bottom of this without success. Aside from this, both the core and the cloud behave as intended.
Does anyone have an idea of what I'm doing wrong?
The fieldType looks like this:
<fieldType name="stringD" class="solr.TextField" sortMissingLast="true" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([-])" replacement="." replace="all" />
  </analyzer>
</fieldType>
Here is a link to a screenshot showing the behavior in all its glory, the top part is from the core that did the full-import.
So my first answer goes to my first question here ;)
When this core was initially set up, an import query like this was used:
SELECT * FROM [TABLE]
and the fields were then mapped like this in the data-import-handler:
<field column="ENDTIME" name="ENDTIME" />
When Solr started to convert the content of the [ENDTIME] (datetime2) column in SQL to a date, this was added to the import query:
CAST(CAST(ENDTIME as datetime2(0)) as varchar(100)) as ENDTIMESTR
to force the correct format from SQL: 2008-10-18 17:16:31.0.
The data-import-handler mapping was also changed to the following:
<field column="ENDTIMESTR" name="ENDTIME" />
Because of this, both [ENDTIME] and [ENDTIMESTR] came from SQL into the data-import-handler, and somehow Solr was only able to use the correct field/fieldType on the core that initiated the full-import. When replicating the field to the other cores, Solr seems to have looked at the original [ENDTIME] column (which only exists in the data-import-handler during a full/delta-import; remember SELECT * FROM [TABLE]). ENDTIME in the Solr schema was a TextField all along.
SOLUTION: Remove the * and instead explicitly define all fields in the full/delta queries, with [ENDTIME] looking like this: CAST(CAST(ENDTIME as datetime2(0)) as varchar(100)) as ENDTIME.
Everything now behaves as intended. I guess there's a bug in the data-import-handler mapping somewhere, but my configuration wasn't really the best either.
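A minimal sketch of how such an entity might look in the data-import-handler config, assuming a hypothetical entity name `item` and a hypothetical `ID` column alongside ENDTIME; [TABLE] is the same placeholder as in the queries above, and the real column list would name every field the schema needs:

```xml
<document>
  <!-- "item" and the ID column are hypothetical; list every needed column instead of using * -->
  <entity name="item"
          query="SELECT ID,
                        CAST(CAST(ENDTIME AS datetime2(0)) AS varchar(100)) AS ENDTIME
                 FROM [TABLE]">
    <field column="ID" name="ID"/>
    <!-- Only the string version of ENDTIME reaches the handler; the raw datetime2 column is never selected -->
    <field column="ENDTIME" name="ENDTIME"/>
  </entity>
</document>
```

Because the raw datetime2 column is no longer part of the result set, there is only one ENDTIME value for the handler to pick up.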
Hope this can help someone else out on a slippery-Solr-slope!

SOLR 4.2 - solr.LatLonType type vs solr.SpatialRecursivePrefixTreeFieldType

I am currently using SOLR 4.2 to index geospatial data (latitude and longitude data). I have configured my geospatial field as below.
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latlong" type="location" indexed="true" stored="false" multiValued="true"/>
I just want to make sure that I am using the correct SOLR class for geospatial search, since I am not sure which of the two classes (LatLonType vs. SpatialRecursivePrefixTreeFieldType) will be supported by future versions of SOLR.
I assume SpatialRecursivePrefixTreeFieldType is an upgraded version of LatLonType. Can someone please confirm whether I am right?
I generally recommend the SpatialRecursivePrefixTreeFieldType. It's better in many ways, but I wouldn't call it an "upgraded version of LatLonType", since that wording suggests it is a derivative, which is totally false. It's documented here: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 The main reason to use LatLonType (perhaps in conjunction with the new field type) is distance sorting/relevancy, which is still better implemented by LatLonType (as of Solr 4.3).
I don't see LatLonType going away any time soon; Solr takes backwards compatibility pretty seriously.
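For illustration, using both types side by side could be sketched roughly as follows. The `location_rpt` declaration follows the style of the Solr 4.x example schema; the field name `latlong_rpt` and the copyField are assumptions made for this sketch, not part of the original setup.

```xml
<!-- Existing LatLonType setup from the question, kept for distance sorting/relevancy -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latlong" type="location" indexed="true" stored="false" multiValued="true"/>

<!-- Additional prefix-tree type for spatial filtering -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<!-- Hypothetical second field, fed with the same "lat,lon" input values -->
<field name="latlong_rpt" type="location_rpt" indexed="true" stored="false" multiValued="true"/>
<copyField source="latlong" dest="latlong_rpt"/>
```

Filter queries would then target `latlong_rpt`, while distance sorting could stay on `latlong`.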

Solr british and american spelling

A search for 'globalization' only returns results for 'globalization' and none for 'globalisation', and vice versa.
I'm looking into the solr.HunspellStemFilterFactory filter (available in Solr 3.5).
<filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic,en_US.dic" affix="en_GB.aff,en_US.aff" ignoreCase="true" />
Before upgrading from Solr 3.4 to 3.6.1, I was wondering whether the Hunspell filter is the way to go.
Thanks
If stemming doesn't solve this for you, you could always use a SynonymFilterFactory to normalize both spellings into one. I guess a dictionary containing US/UK spelling variations wouldn't be hard to come by.
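A possible shape for such a field type is sketched below. The type name `text_en_spelling` and the file name `spelling_variants.txt` are made up for this example, and the tokenizer and lowercase filter are just a plausible surrounding chain, not a recommendation from the answer.

```xml
<fieldType name="text_en_spelling" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- spelling_variants.txt is a hypothetical synonyms file with one mapping per line, e.g.:
         globalisation => globalization -->
    <filter class="solr.SynonymFilterFactory" synonyms="spelling_variants.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

With the same analyzer applied at index and query time, both spellings end up as the same token, so a search for either variant should match documents containing the other.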

Resources