Solr exception due to schema - solr

I have the following Solr schema:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="testthing" version="1.5">
<fields>
<field name="_version_" type="long" indexed="true" stored="true" required="true"/>
<field name="doc_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="title" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="doc_type" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
<field name="description" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="allText" type="fs_text" indexed="true" stored="false" required="true" multiValued="true"/>
</fields>
<uniqueKey>doc_id</uniqueKey>
<copyField source="title" dest="allText" />
<copyField source="description" dest="allText" />
<dynamicField name="*" type="ignored" multiValued="true" />
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="fs_text" class="solr.TextField" positionIncrementGap="100"/>
</types>
</schema>
Solr complains about an undefined field named text while resolving a dynamic field type:
1898 [main] INFO org.apache.solr.servlet.SolrDispatchFilter ? SolrDispatchFilter.init() done
1918 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: undefined field text at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1235)
However, my one and only dynamic field (a catch-all that ignores everything not matched) doesn't use a text type (it's type="ignored").
What am I missing here?
So far, renaming allText to text has pretty much fixed the issue, but I can't figure out why. Is there something special or predefined about text in Solr 4.1?

It is not about a field type called "text". It is about a field named "text".
<defaultSearchField>text</defaultSearchField>
You may have changed or removed the default field in the config. If renaming the field fixes the issue, then somewhere in the configuration you are referring to a field named "text", possibly in solrconfig.xml.
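As a sketch of where such a reference commonly lives: this assumes the solrconfig.xml was derived from the example configuration shipped with Solr 4.x, whose /select handler sets the default field (df) to text. The actual file may look different. Pointing df at a field the schema does define (here allText, from the question) would be an alternative to renaming the field.

```xml
<!-- solrconfig.xml (illustrative; assumes a handler modelled on the Solr 4.x example config) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- The example config uses df=text; use a field that exists in your schema instead -->
    <str name="df">allText</str>
  </lst>
</requestHandler>
```

It is also worth searching solrconfig.xml for other occurrences of text, since other handlers or warming queries may reference the same field.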

Related

Missing 2 fields after applying the schema file puzzler

I am using Solr 7.4 and creating core using the 3 files from the gist (one can download the files and save them in the directory <dir>/test/conf).
solr create -c test -d <dir>/test
The schema has 14 fields, while only 12 end up in the schema browser in the Admin UI.
The schema file looks like:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="collection" version="1.6"
xmlns:inc="http://www.w3.org/2001/XInclude">
<types>
<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" />
<fieldType name="int" class="solr.IntPointField" sortMissingLast="true"/>
<fieldType name="long" class="solr.LongPointField" sortMissingLast="true"/>
</types>
<fields>
<field name="childCode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="parentCode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="id" type="string" indexed="true" stored="true" multiValued="false" />
<filed name="sortOrder" type="int" indexed="true" stored="true" multiValued="false" />
<filed name="locked" type="boolean" indexed="true" stored="true" multiValued="false" />
<field name="status" type="string" indexed="true" stored="true" multiValued="false" />
<field name="filename" type="string" indexed="false" stored="true" multiValued="false" />
<field name="url" type="string" indexed="false" stored="true" multiValued="false" />
<field name="previewUrl" type="string" indexed="false" stored="true" multiValued="false" />
<field name="shape" type="string" indexed="true" stored="true" multiValued="false" />
<field name="originalHeight" type="int" indexed="true" stored="true" multiValued="false" />
<field name="originalWidth" type="int" indexed="true" stored="true" multiValued="false" />
<field name="sizes" type="string" indexed="true" stored="true" multiValued="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
</schema>
The missing fields are 'sortOrder' and 'locked'. Based on the documentation those are valid field names:
The name of the field. Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g., _version_) are reserved. Every field must have a name.
Other int fields with camel case are created such as 'originalHeight' and 'originalWidth'. I am able to go into Admin UI and add the fields manually with the name and the type from the file.
I am puzzled and would appreciate any clue to this disappearing fields mystery.
Your spelling is wrong:
<filed name="sortOrder" ..
<filed name="locked" ..
Change it to <field> and it'll work as the other fields.
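For reference, the two corrected declarations would look like this. The attributes are copied from the question and only the element name changes:

```xml
<field name="sortOrder" type="int" indexed="true" stored="true" multiValued="false" />
<field name="locked" type="boolean" indexed="true" stored="true" multiValued="false" />
```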

Access Denied trying to create Solr Config

I'm following the example at:
https://github.com/watson-developer-cloud/node-sdk/blob/master/examples/retrieve_and_rank_solr.v1.js
But every time I try to upload a config I get
"Error: Unauthorized: Access is denied due to invalid credentials."
I've made an API key for Retrieve and Rank, are there more things to do to manage the credentials for R&R?
Here's my code:
return retrieveInstance.uploadConfigAsync({
cluster_id: clusterId,
config_name: watsonConfig.config_name,
config_zip_path: (__dirname + "/../../" + watsonConfig.config_path)
});
I'm successfully creating a cluster with this API key.
Schema.zip has this schema.xml
<schema name="simple" version="1.5">
<fields>
<!-- required -->
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="question" type="string" indexed="true" stored="true" required="true" />
<field name="answer" type="string" indexed="true" stored="true" required="true" />
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_ms" type="string" indexed="true" stored="true" multiValued="true" />
<dynamicField name="*_t" type="string" indexed="true" stored="true" />
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_mi" type="int" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
</types>
</schema>
Details on how to access the credentials can be found here : https://www.ibm.com/watson/developercloud/doc/retrieve-rank/tutorial.shtml#credentials
To sum up, from the Bluemix web dashboard, if you click on your R&R service instance, the "Service Credentials" tab will show a username and password. These will not be your IBM ID username or password.
That said, if you've been able to create a cluster, that would suggest that you have got valid credentials. Are you sure that the cluster was created successfully? Can you confirm this by getting the cluster details using the curl command described at https://www.ibm.com/watson/developercloud/retrieve-and-rank/api/v1/?curl#list_solr_clusters ?
Dude, I ran into the same problem. Use the cranfield-solr-config.zip from the tutorial and replace its original config file (schema.xml...) with your config file. But do not uncompress the zip file and compress it again!!! I do not know why this happens, but it does...

Apache Solr Facet Search with Space

I am new to Solr facet search. I am searching some data using Apache Solr and use faceting on some columns to get counts, but if a field value contains a space or a special character, each part is counted separately. I tried the solution from the question "Apache Solr facet search exclude space" to avoid the split on spaces, but my problem persists.
My altered Schema.XML file after seeing the above link is
<schema name="solr_quickstart" version="1.1">
<types>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_not_tokenized" class="solr.TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="int" class="solr.TrieIntField"/>
<fieldType name="UUIDField" class="solr.UUIDField"/>
</types>
<fields>
<field name="id" type="UUIDField" indexed="true" stored="true"/>
<field name="caseid" type="int" indexed="true" stored="true"/>
<field name="casenumber" type="text" indexed="true" stored="true"/>
<field name="casestatus" type="text" indexed="true" stored="true"/>
<field name="casetype" type="text" indexed="true" stored="true"/>
<field name="closeddate" type="text" indexed="true" stored="true"/>
<field name="courtname" type="text" indexed="true" stored="true"/>
<field name="courtabbr" type="text" indexed="true" stored="true"/>
<field name="fileddate" type="text" indexed="true" stored="true"/>
<field name="judgename" type="text" indexed="true" stored="true"/>
<field name="lastupdated" type="text" indexed="true" stored="true"/>
<field name="maindefendant" type="text" indexed="true" stored="true"/>
<field name="mainplaintiff" type="string" indexed="true" stored="true"/>
<field name="all" type="string" docValues="true" indexed="true" stored="false" multiValued="true"/>
</fields>
<defaultSearchField>casenumber</defaultSearchField>
<uniqueKey>id</uniqueKey>
<copyField source="casenumber" dest="all"/>
<copyField source="casestatus" dest="all"/>
<copyField source="casetype" dest="all"/>
<copyField source="courtname" dest="all"/>
<copyField source="courtabbr" dest="all"/>
<copyField source="judgename" dest="all"/>
<copyField source="maindefendant" dest="all"/>
<copyField source="mainplaintiff" dest="all"/>
</schema>
Could anyone kindly guide me to the right way of configuring my schema.xml file?
Your problem is the tokenizer.
It splits the field value into separate terms, and every term gets its own count in facet queries. To avoid this, you could remove the tokenizer (or use another tokenizer). The result is that the whole field value becomes one term. This is a problem if you have more than one "subject" in your text field.
I had a similar problem and tried to use protected words, but they are not applied to the tokenizer. They are more (only?) for stemming: solr not tokenizing protected words
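One common way to keep tokenized search and still get whole-value facet counts is to copy the value into an untokenized field and facet on that copy. A minimal sketch using the question's casestatus field and its existing string type; the name casestatus_facet is hypothetical:

```xml
<!-- Untokenized copy used only for faceting; "casestatus_facet" is a hypothetical name -->
<field name="casestatus_facet" type="string" indexed="true" stored="false"/>
<copyField source="casestatus" dest="casestatus_facet"/>
```

After reindexing, faceting on the copy (facet.field=casestatus_facet) should return one count per complete value, while casestatus remains searchable by individual words.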

Unable to see data in a string field in apache solr 3.6 when importing data from mysql

All the other fields have the imported data, but I don't see the company_logo field when I search for all results. Here is my data config file:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/demandfire"
user="root" password=""/>
<document>
<entity name="core4"
query="select company.company_name,demand.id,demand.issue,
demand.suggestion,demand.title,demand.company_id,company.logo from company,demand
where demand.company_id = company.id;
">
<field column="demand.id" name="id"/>
<field column="demand.issue" name="issue"/>
<field column="demand.suggestion" name="suggestion"/>
<field column="demand.title" name="title"/>
<field column="demand.company_id" name="company_id"/>
<field column="company.company_name" name="company_name"/>
<field column="company.logo" name="company_logo"/>
</entity>
</document>
</dataConfig>
The following is my schema file. The problem is with the field company_logo: I have mapped it in the data-config file, and all the other fields get data, but this one does not. A sample entry in the logo column of the MySQL table looks like '6bf38f4e-a9af-40b8-af04-2b90d3c93f1f.jpg'.
<schema name="example core one" version="1.1">
<types>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="string_lowercase" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="issue" type="string_lowercase" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="suggestion" type="string_lowercase" indexed = "true" stored="true" multivalued="false" required = "true"/>
<field name="title" type="string_lowercase" indexed = "true" stored="true" multivalued="false" required = "true"/>
<field name="company_id" type="string" indexed = "true" stored="true" multivalued="false" required = "true"/>
<field name="company_name" type="string_lowercase" indexed = "true" stored="true" multivalued="false" required = "true"/>
<field name="company_logo" type="string" indexed = "true" stored="true" multivalued="false" required = "false"/>
</fields>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>title</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
Try removing the required="false" attribute from your field (it should be the default anyway), so that the definition becomes:
<field name="company_logo" type="string" indexed = "true" stored="true" multivalued="false"/>

SOLR 4.0 alphabetical sorting trouble

I'm having a hard time getting my head around an issue I have with my SOLR address database.
I built this one up from the example files. I'm basically running the example configuration with a modified schema.
schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true" required="false" multiValued="false" />
<field name="givenname_s" type="text_de" indexed="true" stored="true" required="true" multiValued="false" />
<field name="middleinitial_s" type="text_de" indexed="false" stored="true" required="false" multiValued="false" />
<field name="surname_s" type="text_de" indexed="true" stored="true" required="true" multiValued="false" />
<field name="gender_s" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="pictureuri_s" type="string" indexed="false" stored="true" required="false" multiValued="false" />
<field name="function_s" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />
<field name="organizationalunit_s" type="text_general" indexed="true" stored="true" required="false" multiValued="false" />
<field name="organizationalunitdescription_s" type="text_de" indexed="false" stored="true" required="false" multiValued="false" />
<field name="company_s" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />
<field name="street_s" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />
<field name="streetnumber_s" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="postcode_s" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="city_s" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />
<field name="building_s" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />
<field name="roomnumber_s" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="country_s" type="text_en" indexed="true" stored="true" required="true" multiValued="false" />
<field name="countrycode_s" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="emailaddress_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="phone1_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="phone2_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="mobile_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="fax_s" type="string" indexed="true" stored="true" required="false" multiValued="false" />
I am populating the database by pushing about 20,000 random test datasets like the following to post.jar:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<add>
<doc>
<field name="id">1352498443_1</field>
<field name="givenname_s">Aynur</field>
<field name="middleinitial_s"/>
<field name="surname_s">Lehnen</field>
<field name="gender_s">F</field>
<field name="pictureuri_s">dummy_assets/female.jpg</field>
<field name="function_s">Zugschaffner/in</field>
<field name="organizationalunit_s">P 07</field>
<field name="organizationalunitdescription_s">Lorem Ipsum sadipscing voluptua ipsum invidunt dolor et dolore invidunt sed consetetur accusam dolore Lorem tempor.</field>
<field name="company_s">Lorem Lagna Epsum Emet</field>
<field name="street_s">Erlenweg</field>
<field name="streetnumber_s">82</field>
<field name="postcode_s">76297</field>
<field name="city_s">Lübeck</field>
<field name="building_s"/>
<field name="roomnumber_s">242</field>
<field name="country_s">GERMANY</field>
<field name="countrycode_s">DE</field>
<field name="emailaddress_s">aynur.lehnen@lorem-lagna-epsum-emet.de</field>
<field name="phone1_s">0392984823</field>
<field name="phone2_s">0124111417</field>
<field name="mobile_s">0325117132</field>
<field name="fax_s">0171459177</field>
</doc>
</add>
However, when retrieving data I seem to have problems with alphabetical sorting. Consider the following query:
{
"responseHeader": {
"status": 0,
"QTime": 5,
"params": {
"sort": "surname_s asc",
"fl": "surname_s",
"indent": "true",
"wt": "json",
"q": "city_s:berlin"
}
},
"response": {
"numFound": 1094,
"start": 0,
"docs": [{
"surname_s": "Weil"
}, {
"surname_s": "Abel"
}, {
"surname_s": "Adam"
}, {
"surname_s": "Ade"
}, {
"surname_s": "Adrian"
}, {
"surname_s": "Aigner"
}, {
"surname_s": "Aigner"
}, {
"surname_s": "Alber"
}, {
"surname_s": "Alber"
}, {
"surname_s": "Albers"
}]
}
}
Why is "Weil" on position one, while the rest of the data appears to be sorted correctly?
I believe that some of the additional analyzers applied by the text_de field type are the cause of this sorting behavior. In my experience, the best results when sorting strings come from the alphaOnlySort fieldType that ships with the example schema.xml, shown below.
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<!-- KeywordTokenizer does no actual tokenizing, so the entire
input string is preserved as a single token
-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<!-- The LowerCase TokenFilter does what you expect, which can be
when you want your sorting to be case insensitive
-->
<filter class="solr.LowerCaseFilterFactory" />
<!-- The TrimFilter removes any leading or trailing whitespace -->
<filter class="solr.TrimFilterFactory" />
<!-- The PatternReplaceFilter gives you the flexibility to use
Java Regular expression to replace any sequence of characters
matching a pattern with an arbitrary replacement string,
which may include back references to portions of the original
string matched by the pattern.
See the Java Regular Expression documentation for more
information on pattern and replacement string syntax.
http://java.sun.com/j2se/1.6.0/docs/api/java/util/regex/package-summary.html
-->
<filter class="solr.PatternReplaceFilterFactory"
pattern="([^a-z])" replacement="" replace="all"
/>
</analyzer>
</fieldType>
I would recommend creating a new field and then copying the value from surname_s via copyField, something like the following:
<field name="surname_s_sort" type="alphaOnlySort" indexed="true" stored="false" required="false" multiValued="false" />
<copyField source="surname_s" dest="surname_s_sort"/>
Note: there is not any need to store the value in the surname_s_sort field, hence the stored="false" attribute, unless you expect to display that to the users.
Then you can just change your query to sort on the surname_s_sort instead.
Sorting doesn't work well on multivalued and tokenized fields.
From the documentation:
Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)
Use string as the field type and copy the surname_s field into the new field.
<field name="surname_s_sort" type="string" indexed="true" stored="false"/>
<copyField source="surname_s" dest="surname_s_sort" />
As @Paige answered, you can also use the keyword tokenizer with a lowercase filter, which does not split the field into multiple tokens.
I had similar issues and tried alphaOnlySort. It works in part, but it starts messing up the sort order when the field contains characters such as -, / or spaces.
So the result was something like
/ abc
aa
/ abc2
So I ended up using the field type lowercase. It was already in the schema, so I assume it is one of the default types. I used the copyField construction, so my final config was:
<schema>
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fields>
<field name="job_name_sort" type="lowercase" indexed="true" stored="false" required="false"/>
</fields>
<copyField source="job_name" dest="job_name_sort"/>
</schema>

Resources