Solr 8.11 Field Types docs contradiction. Any guidance?

I'm setting up my first Solr server via Docker using solr:8.11.1-slim. I'm going to use the Schema API to set up the schema for my core, which is named 'products'.
While reading the docs, the following two pages seem to contradict each other on field types:
https://solr.apache.org/guide/8_11/field-types-included-with-solr.html
vs.
https://solr.apache.org/guide/8_11/schema-api.html
I followed the first guide to see which field types I can specify, and I'm trying to send requests based on the second doc, such as this:
{ "add-field": { "name": "latlong", "type": "LatLongPointSpatialField", "multiValued": False, "stored": True, "indexed": True } },
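Equivalently, the same command straight from the shell, as a minimal reproduction (this assumes Solr is exposed on localhost:8983 as in a standard Docker run):

# send the add-field command to the Schema API of the 'products' core
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"latlong", "type":"LatLongPointSpatialField", "multiValued":false, "stored":true, "indexed":true}}' http://localhost:8983/solr/products/schema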
but Solr gives me back errors such as:
org.apache.solr.api.ApiBag$ExceptionWithErrObject: error processing commands, errors: [{add-field={name=latlong, type=LatLongPointSpatialField, multiValued=false, stored=true, indexed=true}, errorMessages=[Field 'latlong': Field type 'LatLongPointSpatialField' not found
So what gives? Am I misreading the docs, are they wrong, or is something wrong with the solr:8.11.1 image on Docker Hub? Why doesn't it accept the field types I'm providing?
Thanks for your help ahead of time.

Related

DSpace 7 - No suggestions when adding a new metadata field to an existing item

In DSpace 7.4, when adding new metadata to an existing item, I'm not getting any suggestions in the input box. Is there any configuration step I'm missing?
I can see it working in the official demo site.
I see in dspace.log that there is this API call when I type something in the input box:
GET /server/api/core/metadatafields/search/byFieldName
so I tried this with no results:
/server/#api/core/metadatafields/search/byFieldName?query=author
The same API call in the DSpace 7 demo site returns 3 results: https://api7.dspace.org/server/#api/core/metadatafields/search/byFieldName?query=author
Also, in solr.log I see this call, which returns no hits:
2023-01-11 13:09:55.709 INFO (qtp359742806-22) [ x:search] o.a.s.c.S.Request [search] webapp=/solr path=/select params={q=*:*&fl=search.resourcetype,search.resourceid,search.uniqueid,database_status&start=0&fq=fieldName_keyword:author*&fq=&wt=javabin&version=2} hits=0 status=0 QTime=1
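To rule out the REST layer, the same filter query can be reproduced directly against the search core; a minimal check, assuming the DSpace Solr instance listens on localhost:8983:

# query the 'search' core with the same filter the UI sends
curl 'http://localhost:8983/solr/search/select?q=*:*&fq=fieldName_keyword:author*&rows=0'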
Maybe there is a problem with Solr?
Since it seemed to be a Solr problem, I tried to reindex like this: dspace index-discovery -b, which produced this error:
java.lang.NullPointerException: Cannot invoke "org.dspace.eperson.EPerson.getID()" because the return value of "org.dspace.authorize.ResourcePolicy.getEPerson()" is null
at org.dspace.discovery.SolrServiceResourceRestrictionPlugin.additionalIndex(SolrServiceResourceRestrictionPlugin.java:95)
at org.dspace.discovery.indexobject.IndexFactoryImpl.buildDocument(IndexFactoryImpl.java:67)
at org.dspace.discovery.indexobject.InprogressSubmissionIndexFactoryImpl.buildDocument(InprogressSubmissionIndexFactoryImpl.java:46)
at org.dspace.discovery.indexobject.WorkspaceItemIndexFactoryImpl.buildDocument(WorkspaceItemIndexFactoryImpl.java:63)
at org.dspace.discovery.indexobject.WorkspaceItemIndexFactoryImpl.buildDocument(WorkspaceItemIndexFactoryImpl.java:30)
at org.dspace.discovery.SolrServiceImpl.update(SolrServiceImpl.java:165)
at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:155)
at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:340)
at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:327)
at org.dspace.discovery.IndexClient.internalRun(IndexClient.java:130)
at org.dspace.scripts.DSpaceRunnable.run(DSpaceRunnable.java:104)
at org.dspace.app.launcher.ScriptLauncher.executeScript(ScriptLauncher.java:149)
at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:131)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)
I found the solution here: https://groups.google.com/g/dspace-tech/c/ioukme4el9o.
WARNING: It's recommended to test this in a development environment first.
The problem is that there were several rows with no eperson_id and no epersongroup_id in the resourcepolicy table. So I deleted those rows with delete from resourcepolicy where eperson_id is null and epersongroup_id is null; and reindexed with dspace index-discovery -b.
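A minimal sketch of the whole fix as shell commands; the dspace database name and user here are assumptions about a typical install, so adjust them to your setup:

# count the orphaned policies first (and back up the database before deleting anything)
psql -U dspace -d dspace -c "select count(*) from resourcepolicy where eperson_id is null and epersongroup_id is null;"
# remove them, then rebuild the Discovery index
psql -U dspace -d dspace -c "delete from resourcepolicy where eperson_id is null and epersongroup_id is null;"
dspace index-discovery -b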
Note: this problem seems to occur whenever the database comes from a dump of a previous DSpace version. In my case I restored a dump from a 5.6 version and migrated it to 7.4 before running into this issue.

Solr - How to fix "Error adding field ... msg=For input string" when posting data to a core

I am new to Solr.
I created a Solr (8.1.0) core using SolrCloud for testing, and I'm trying to post data as a JSON file.
When an object has a float value like "spalte412": "35.5", or special characters, it throws an error in the console:
SimplePostTool: WARNING: Response: {
  "responseHeader":{
    "rf":2,
    "status":400,
    "QTime":223},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=52] Error adding field 'spalte421'='156.6' msg=For input string: \"156.6\"",
    "code":400}}
I tried to edit the core schema by adding the field in the Admin UI, without success.
Thanks for your help!
If you're not pre-defining your fields, the type of each field is determined by the first submitted document that contains it. Solr uses that first value to guess the field type, and in this case the guessed type differs from the format you're sending in later documents.
Schemaless mode is neat for prototyping, but when moving to production you should always add the fields up front with the correct types, so you don't get surprises like the one above when documents arrive in a different order (or are different documents) than during development.
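As an illustration of how the guessing fails, a hedged sketch (mycore stands in for your core; this assumes a schemaless configset and Solr on localhost:8983). The first document makes Solr create spalte412 as a numeric field, so the second value no longer parses:

# "35" parses as an integer, so Solr guesses a long-based field type for spalte412
curl 'http://localhost:8983/solr/mycore/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","spalte412":"35"}]'
# "35.5" is not a valid long, so this document fails with a NumberFormatException
curl 'http://localhost:8983/solr/mycore/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"2","spalte412":"35.5"}]'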
You can define fields in schema.xml or through the Schema API.
You should post your schema.xml and a short description of what you did before.
"root-error-class","java.lang.NumberFormatException"
Sounds like Solr was unable to understand that number format while you were trying to index a document with a string (For input string: \"156.6\").
Sounds like you have a mismatch between the delivered and the expected format.
Thanks guys.
Indeed, I solved it by deleting the fields in the Admin UI and defining them with:
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' http://localhost:8983/solr/films/schema

Logstash - Solr Integration

Below is my actual log:
--2019-05-09 06:49:05.590 -TRACE 6293 --- [ntainer#0-0-C-1] c.s.s.service.MessageLogServiceImpl : [41a6811cbc1c66eda0e942712a12a003d6bf4654b3edb6d24bf159b592afc64f1557384545548] Event => Message Failure Identified : INVALID_STRUCTURE
My grok filter pattern:
match => {
  "message" => "--%{TIMESTAMP_ISO8601:logtime} -%{LOGLEVEL:level} (?<pid>\d+) --- \[(?<thread>[^\]]+)] (?<classname>[\w.]+)\s+: \[(?<token>[^\]]+)] Event \=> Message Failure Identified : (?<code>[\w]+)"
}
After some adding/removing of the desired fields, below is my tokenized form:
{
  "code" => "INVALID_STRUCTURE",
  "event" => "message_failure",
  "token" => "41a6811cbc1c66eda0e942712a12a003d6bf4654b3edb6d24bf159b592afc64f1557384545548",
  "logtime" => "2019-05-09 06:49:05.590"
}
Now I want to send it to Solr, but sending gives me this warning:
[WARN ][logstash.outputs.solrhttp] An error occurred while indexing: undefined method `iso8601' for nil:NilClass
I think it's related to the "logtime" field, since that is the only portion that deals with ISO8601. There is no extra information in the logs. What is the problem here?
Finally got the answer, thanks to this blog:
Logstash uses a Ruby library called Rsolr to connect to Solr. Logstash installs the latest version of Rsolr, but the latest Rsolr has a bug with the timestamp datatype that prevents indexing into Solr. To work around this, we have to manually install a working version of Rsolr.
I changed the default rsolr gem as described in the post and everything started working.

Error adding field 'field_name'='field_value' msg=For input string: \"field_value\"

We occasionally struggle to import certain files into Solr. It seems like certain documents have odd metadata values; we're not sure if they come from an eccentric word processor or something else. See two examples here:
Type: Solarium\Exception\HttpException
Message: Solr HTTP error: OK (400)
{"responseHeader":{"status":400,"QTime":49},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],"msg":"ERROR: [doc=3932487729] Error adding field 'brightness_value'='6.18' msg=For input string: \"6.18\"","code":400}}
And
Type: Solarium\Exception\HttpException
Severity: error --> Exception: Solr HTTP error: OK (400)
{"responseHeader":{"status":400,"QTime":72},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],"msg":"ERROR: [doc=16996] Error adding field 'version'='5.3.1' msg=For input string: \"5.3.1\"","code":400}}
How do we prevent these issues? We are not in control of the documents, so we need to fix this on the server side.
Define the field type explicitly in the schema instead of relying on Solr to create it for you - the first document that contains the field makes Solr guess its type, and if later documents don't match that same, expected format, you'll get an error like this.
Always define the schema for a collection when using it in production or in an actual application - schemaless mode is really neat for prototyping and experimenting, but in an actual application you want the types to be well defined, as in the sketch below.
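For example, a hedged sketch with the Schema API (mycollection is a placeholder for your collection; pfloat and string are stock types in the default configset, matching a decimal like 6.18 and a free-form value like 5.3.1):

# a float field for numeric metadata such as brightness_value
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"brightness_value", "type":"pfloat", "stored":true}}' http://localhost:8983/solr/mycollection/schema
# a plain string field for values like '5.3.1' that are not numbers
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"version", "type":"string", "stored":true}}' http://localhost:8983/solr/mycollection/schema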

How to query custom AEM metadata in Solr

I've integrated AEM with a remote Solr server; all the indexes were created and everything looked fine.
Problem: I am not getting search results when I try OOTB metadata (dc:title) on images through the DAM Console search and asset viewer. It only works when I search on the node name GeoCube_Datasheet.pdf (a node of type dam:Asset). I then added custom metadata for an asset under the jcr:content/metadata node, say dam:custom of type String, and added another node under oak:index with these properties:
jcr:primaryType=oak:QueryIndexDefinition, reindex=true (type Boolean),
propertyNames=dam:custom (type Name[]), type=property (type String).
Then I modified Solr's schema.xml as:
<field name="dam:custom" type="matchall" />
After the above configuration I can see the index is created in the Solr admin console, which returns this JSON output:
"path_exact": "/content/dam/geometrixx/portraits/scott_reynolds.jpg/
jcr:content/metadata", "jcr:primaryType": [ "nt:unstructured" ],
"dam:custom": [ "helloworld" ],
But when I do a fulltext search in the DAM console / asset viewer, QueryBuilder does not return anything, whereas the default search engine (Lucene) returns records based on the custom metadata.
For testing purposes I executed somehost:port/bin/querybuilder.json?fulltext=helloworld&type=dam:Asset, which returned nothing, but somehost:port/bin/querybuilder.json?fulltext=helloworld&type=nt:unstructured returned results.
Please let me know what I'm missing here and how to get search results based on metadata.
Thanks for your help!
You need to change the Apache Jackrabbit Oak Solr Query index provider configuration in the Felix console and enable query-time aggregation.
