Sitecore 9 Indexing: Solr Pattern Tokenizer Not Working

I'm new to combining Sitecore and Solr. I have a small issue: the pattern tokenizer is not working. I'm following this documentation:
Solr:
https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-RegularExpressionPatternTokenizer
Sitecore 9 Solr:
https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/search_and_indexing/using_solr_field_name_resolution
When I index, my field value is a,b,c. I expected Solr to contain ["a","b","c"], but it contains ["a,b,c"].
This is my Sitecore config:
<fieldMap>
<typeMatches hint="raw:AddTypeMatch">
<typeMatch type="System.Collections.Generic.List`1[System.String]" typeName="commaDelimitedCollection" fieldNameFormat="{0}_cd"
multiValued="true" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider"/>
</typeMatches>
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="Keywords" returnType="commaDelimitedCollection"/>
</fieldNames>
</fieldMap>
This is my Solr schema:
<fieldType name="commaDelimited" class="solr.TextField" multiValued="true">
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
</analyzer>
</fieldType>
<dynamicField name="*_cd" type="commaDelimited" multiValued="true" indexed="true" stored="true"/>
Any idea what's wrong with my configuration above?
Thanks

Not sure if I get the full picture here. Maybe your approach is perfectly valid, but I don't think I've seen it before. Instead of defining a new type, you could reuse the *_sm (multiValued string) type and split the string at index time on the Sitecore side. Usually you don't need more field types than the ones Sitecore provides, and it's typically easier to maintain all the code in your Visual Studio solution than to depend on additional Solr config. (In Sitecore 9 you can deploy your Solr managed schema from the Control Panel, though.)
A simple computed field definition can look like this:
<fields hint="raw:AddComputedIndexField">
<field fieldName="keywords" returnType="stringCollection">
Your.Name.Space.YourComputedFieldClass, YourAssembly
</field>
</fields>
And a class implementation could look something like this:
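using System;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class YourComputedFieldClass : IComputedIndexField
{
    public object ComputeFieldValue(IIndexable indexable)
    {
        var item = indexable as SitecoreIndexableItem;
        var fieldValue = item?.Item?["Keywords"];
        if (string.IsNullOrWhiteSpace(fieldValue))
        {
            return null;
        }
        // Split on commas, trim surrounding whitespace and drop empty entries,
        // so "a, b ,c" is indexed as ["a", "b", "c"].
        return fieldValue
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToList();
    }

    public string FieldName { get; set; }
    public string ReturnType { get; set; }
}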

Related

Solr tokenizer does not do anything

I want to tokenize a Solr string field, "content", into another field, "tokenized".
For example:
{
"content":"Hello World this is a Test",
"tokenized":["hello", "world", "this", ...]
}
For that I use:
<field name="content" type="string" indexed="true" stored="true"/>
<field name="tokenized" type="customType" indexed="true" stored="true"/>
<copyField source="content" dest="tokenized"/>
and this custom field type:
<fieldType name="customType" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
My understanding was that, upon committing, all contents are tokenized with the specified tokenizer and then put into the tokenized field as a list of tokens. However, the tokenized field only contains the original content as a single-element list, e.g.:
{
"content":"Hello World this is a Test",
"tokenized":["Hello World this is a Test"]
}
Is there some global configuration I need to make to get tokenizers to work?
Tokens are only stored internally in Lucene and Solr. They do not change the stored text that gets returned to you in any way. The text is stored verbatim - i.e. the text you sent in is what gets returned to you.
The tokens generated in the background and stored in the index affect how you can search the content you've stored and how it's processed; they do not affect the display value of the field.
You can use the Analysis page under Solr's admin page to see exactly how text for a field gets processed into tokens before being stored in the index.
The reason is that you're usually interested in returning the actual text to the user. Making the tokenized and processed values visible doesn't really make sense for a document that gets returned to a human.
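To make the stored-versus-indexed distinction concrete, here is a toy Java model. It is not Solr's analysis chain: splitting on `\W+` and lowercasing is only a rough stand-in for StandardTokenizerFactory plus LowerCaseFilterFactory, and the class and method names are hypothetical. The same idea explains the Sitecore question at the top, where the `\s*,\s*` pattern changes only the indexed terms, never the stored "a,b,c".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Toy model (not Solr code): a stored + indexed text field keeps the original
 * string for retrieval and only uses the token list for matching.
 */
public class StoredVsIndexed {

    /** Splits text on the given pattern, drops empty pieces, optionally lowercases. */
    static List<String> analyze(String text, Pattern splitOn, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String piece : splitOn.split(text)) {
            if (piece.isEmpty()) {
                continue; // e.g. a leading delimiter yields an empty first piece
            }
            tokens.add(lowercase ? piece.toLowerCase(Locale.ROOT) : piece);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String content = "Hello World this is a Test";

        // Rough stand-in for StandardTokenizer + LowerCaseFilter: split on non-word chars.
        List<String> indexed = analyze(content, Pattern.compile("\\W+"), true);
        System.out.println("stored:  " + content); // what a query returns
        System.out.println("indexed: " + indexed); // indexed: [hello, world, this, is, a, test]

        // Same regex as the PatternTokenizerFactory in the Sitecore question.
        System.out.println(analyze("a, b ,c", Pattern.compile("\\s*,\\s*"), false)); // [a, b, c]
    }
}
```

A search for `world` matches because of the indexed terms, while the returned field value is still the original sentence.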

Solr Spatial - Indexing bean

I need to index a bean in Solr that contains a generic spatial field (generally, a polygon).
I configured my Solr core schema in this way (following the tutorial here):
<fieldType name="area" class="solr.RptWithGeometrySpatialField" spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
autoIndex="true"
validationRule="repairBuffer0"
distErrPct="0.025"
maxDistErr="0.001"
distanceUnits="kilometers" />
....
<field name="location" type="area" indexed="true" stored="true" required="true" multiValued="false" />
My bean class is as follows:
import com.vividsolutions.jts.geom.Geometry;
import org.apache.solr.client.solrj.beans.Field;

public class MySolrBean {
    @Field("id")
    private String id;

    @Field("location")
    private Geometry location;

    // getters and setters...
}
where Geometry refers to com.vividsolutions.jts.geom.Geometry (jts-1.13)
When I try to add a new bean to the index with SolrClient.addBean(Object) I get the following error:
Unable to parse shape given formats "lat,lon", "x y" or as WKT because java.text.ParseException: Unknown Shape definition [com.vividsolutions.jts.geom.Polygon:POLYGON ((1 0, 0.9980267284282716 0.0627905195293134, 0.9921147013144779 0.12533323356430...]
where the WKT representation of my polygon is prefixed by the class's fully qualified name. I remember seeing a similar problem some time ago, that time with ZonedDateTime: I changed my code to use java.util.Date and everything worked.
This time, though, I don't know which class to use instead of com.vividsolutions.jts.geom.Geometry, and searching the web I couldn't find any documentation about it.
Can anyone help me sort out this issue?
EDIT
I forgot to mention that I'm using the latest Solr and SolrJ release: 6.5.1.
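The error shows that Solr expects a WKT string but receives the geometry's class name followed by its WKT. One possible workaround, which is an assumption and not confirmed in this thread, is to declare the bean field as a `String` and fill it with WKT yourself (with JTS, `Geometry.toText()` returns WKT). The hypothetical helper below builds such a string from raw x/y pairs using only the JDK.

```java
import java.math.BigDecimal;
import java.util.StringJoiner;

/** Hypothetical helper: formats a closed ring of x/y pairs as a WKT POLYGON string. */
public class WktPolygon {

    static String polygon(double[][] ring) {
        if (ring.length < 4) {
            throw new IllegalArgumentException("a WKT polygon ring needs at least 4 points");
        }
        double[] first = ring[0];
        double[] last = ring[ring.length - 1];
        if (first[0] != last[0] || first[1] != last[1]) {
            throw new IllegalArgumentException("ring must be closed (first point == last point)");
        }
        StringJoiner points = new StringJoiner(", ", "POLYGON ((", "))");
        for (double[] p : ring) {
            points.add(num(p[0]) + " " + num(p[1])); // WKT order is "x y", i.e. "lon lat"
        }
        return points.toString();
    }

    /** Prints 1.0 as "1" and 0.5 as "0.5" instead of Java's default "1.0". */
    private static String num(double d) {
        return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0, 0}};
        System.out.println(polygon(square)); // POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
    }
}
```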

Sitecore _path field returns NULL in Solr index

I am using Solr index for Sitecore.
However, the search results always return null for the _path field.
It was working on Lucene. Does Solr need special treatment?
Below is the Glass Mapper property:
[IndexField("_path"), TypeConverter(typeof(IndexFieldEnumerableConverter))]
[SitecoreIgnore]
public virtual System.Collections.Generic.IEnumerable<ID> EntityPath { get; set; }
And the Solr schema has the entry below:
<field name="_path" type="string" indexed="true" stored="false" multiValued="true" />
Change your "stored" setting to true:
<field name="_path" type="string" indexed="true" stored="true" multiValued="true" />
The stored attribute will make sure that your original value is kept in the index for retrieval. Otherwise you can search in the field, but not fetch it.

Solr geographical search

I am testing a Solr query for geographical search. This is my query:
SolrQuery query =new SolrQuery();
query.setParam("q","*:*");
query.setParam("fq","geofilt");
query.setParam("d","100000");
query.setParam("pt","51.53750834,-0.19329616");
query.setParam("sfield","location_s");
I am getting no results, although there are points very close to pt and even one exactly at pt.
Any idea what the reason is?
Note: I'm using this field type for spatial search (the one that comes in schema.xml by default):
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
because when I try to use this one, as described on the Solr website, I get an error:
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
autoIndex="true"
distErrPct="0.025"
maxDistErr="0.000009"
units="degrees" />
and this is my field definition:
<field name="location_s" type="location_rpt" indexed="true" stored="true"/>
Thanks in advance!
Instead of this kind of field name, add a single field definition that serves both the x and y coordinates:
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
That lets you put data directly in Solr's geo format:
"env": "DEV",
"latlgn_0_coordinate": -2.6263,
"latlgn_1_coordinate": -44.1978,
Please take a look at both the Solr spatial search documentation and the Solr wiki's SpatialSearch page.
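The field names in the JSON fragment follow a pattern: one logical field is split into `<name>_0_coordinate` (latitude) and `<name>_1_coordinate` (longitude). As far as I can tell this matches how `solr.LatLonType` with `subFieldSuffix="_coordinate"` stores a `"lat,lon"` value, but treat that as an assumption; the sketch below (hypothetical class, not Solr code) only illustrates the mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration: maps a "lat,lon" string onto <name>_0_coordinate / <name>_1_coordinate. */
public class LatLonSubfields {

    static Map<String, Double> split(String fieldName, String latLon) {
        String[] parts = latLon.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"lat,lon\" but got: " + latLon);
        }
        double lat = Double.parseDouble(parts[0].trim());
        double lon = Double.parseDouble(parts[1].trim());
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("coordinates out of range: " + latLon);
        }
        Map<String, Double> subfields = new LinkedHashMap<>();
        subfields.put(fieldName + "_0_coordinate", lat); // first sub-field: latitude
        subfields.put(fieldName + "_1_coordinate", lon); // second sub-field: longitude
        return subfields;
    }

    public static void main(String[] args) {
        // Values taken from the JSON fragment above.
        System.out.println(split("latlgn", "-2.6263,-44.1978"));
    }
}
```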
As said above, make sure the JTS library is on your runtime classpath. You may need to download the JTS library and install it on your path. From the Solr manual's install documentation:
The JTS jar file must be on Solr's classpath as well. Due to a combination of things, JTS can't simply be referenced by a "" entry in solrconfig.xml; it needs to be in WEB-INF/lib in Solr's war file, basically.
enjoy :)
Thanks, jean. It worked for me without the JTS library; the Solr schema already comes with the field type I used:
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
I defined a field of that type and it worked. This is my query:
SolrQuery query =new SolrQuery();
query.set("q","*:*");
query.set("fq","{!geofilt}");
query.set("pt","32.014708,35.873725");
query.set("sfield","location");
query.set("d","100");
Thanks.
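For `{!geofilt}`, the `d` parameter is interpreted as a radius in kilometers around `pt`. The sketch below shows the kind of test the filter applies, using the haversine great-circle formula. It is an illustration, not Solr's implementation: the Earth radius constant and the class and method names are assumptions. Note that the original query's `d=100000` already exceeds half of Earth's circumference (about 20,000 km), which suggests the missing results came from `fq=geofilt` lacking the `{!geofilt}` local-params syntax rather than from the radius.

```java
/** Illustration (not Solr code): keep a point if it lies within d km of pt. */
public class GeofiltSketch {

    // Mean Earth radius in km (assumption; Solr uses a value close to this).
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine great-circle distance in km between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so floating-point error can never make asin return NaN.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if (lat, lon) is within dKm kilometers of (ptLat, ptLon). */
    static boolean withinRadius(double lat, double lon, double ptLat, double ptLon, double dKm) {
        return haversineKm(lat, lon, ptLat, ptLon) <= dKm;
    }

    public static void main(String[] args) {
        double ptLat = 32.014708, ptLon = 35.873725; // pt from the working query
        System.out.println(withinRadius(ptLat, ptLon, ptLat, ptLon, 100)); // true (same point)
        System.out.printf("%.1f km%n", haversineKm(ptLat, ptLon, 31.95, 35.93));
    }
}
```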

Updatable Fields In Solr

I am using Solr to search my corpus of web page data. My Solr indexer creates several fields and corresponding values. However, some of these fields I want to update more often, for example the number of clicks on a page. These fields need not be indexed, and I don't need to search on their values. However, I do want to fetch and update them often.
I am new to Solr, so a more descriptive answer, perhaps with a running example or code, would help me.
If you are on Solr 4+, yes, you can push a partial update to the Solr index.
For partial updates, all fields in your schema.xml need to be stored.
This is how your fields section should look:
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true" />
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="clicks" type="integer" indexed="true" stored="true" />
</fields>
Now, when you send a partial update to one of the fields (in your case, "clicks"), Solr will, in the background, fetch the stored values of all other fields for that document (such as title, description, and body), delete the old document, and index the new, updated document.
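The merge described above can be modelled in a few lines of Java. This toy sketch (hypothetical class, not Solr code) copies the existing document's stored fields, applies the `set`, `inc` and `add` operations, and returns the full replacement document. It also shows why every field must be stored: only stored values can be carried over into the new document.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Solr code) of how a partial update rebuilds the whole document. */
public class PartialUpdateSketch {

    static Map<String, Object> apply(Map<String, Object> stored,
                                     Map<String, Map<String, Object>> ops) {
        Map<String, Object> doc = new LinkedHashMap<>(stored); // copy all stored fields
        for (Map.Entry<String, Map<String, Object>> field : ops.entrySet()) {
            String name = field.getKey();
            for (Map.Entry<String, Object> op : field.getValue().entrySet()) {
                Object value = op.getValue();
                switch (op.getKey()) {
                    case "set": // replace the value
                        doc.put(name, value);
                        break;
                    case "inc": { // numeric increment; a missing field counts as 0
                        long current = ((Number) doc.getOrDefault(name, 0L)).longValue();
                        doc.put(name, current + ((Number) value).longValue());
                        break;
                    }
                    case "add": { // append to a multi-valued field
                        List<Object> values = new ArrayList<>();
                        Object existing = doc.get(name);
                        if (existing instanceof Collection) {
                            values.addAll((Collection<?>) existing);
                        } else if (existing != null) {
                            values.add(existing);
                        }
                        values.add(value);
                        doc.put(name, values);
                        break;
                    }
                    default:
                        throw new IllegalArgumentException("unknown operation: " + op.getKey());
                }
            }
        }
        return doc; // the full document that replaces the old one
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "1");
        stored.put("title", "Some page");
        stored.put("clicks", 99L);

        Map<String, Map<String, Object>> ops = new LinkedHashMap<>();
        ops.put("clicks", Map.<String, Object>of("inc", 1L));

        System.out.println(apply(stored, ops)); // {id=1, title=Some page, clicks=100}
    }
}
```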
curl 'http://localhost:8080/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","clicks":{"set":100}}]'
Here is a good documentation on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
Sample Solr partial update code:
Prerequisites: the fields need to be stored.
You also need to configure the update log path under the direct update handler (DirectUpdateHandler2):
<updateHandler class="solr.DirectUpdateHandler2">
<!-- Enables a transaction log, used for real-time get, durability,
and SolrCloud replica recovery. The log can grow as big as
uncommitted changes to the index, so use of a hard autoCommit
is recommended (see below).
"dir" - the target directory for transaction logs, defaults to the
solr data directory. -->
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>
</updateHandler>
Code:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PartialUpdate {
    public static void main(String[] args) throws SolrServerException, IOException {
        // SolrJ 4.x client; HttpSolrServer was later replaced by HttpSolrClient.
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
        SolrInputDocument doc = new SolrInputDocument();
        Map<String, Object> partialUpdate = new HashMap<String, Object>();
        // set - to set a field.
        // add - to add to a multi-valued field.
        // inc - to increment a field.
        partialUpdate.put("set", "peter"); // value that needs to be set
        doc.addField("id", "122344545"); // unique id
        doc.addField("fname", partialUpdate); // fname of document 122344545 will be set to 'peter'
        server.add(doc);
        server.commit(); // make the change visible to searches
        server.shutdown(); // release the underlying HTTP client
    }
}
