How to use UUID in Solr schema

I know this:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
<uniqueKey>id</uniqueKey>
Yes, it works, but I can index the same PDF again, so it gets duplicated.
I want to use version 3 (MD5) UUIDs computed from the PDF files instead.
How can I do that?
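One possible direction, sketched in Python rather than in Solr config: with default="NEW" Solr generates a fresh random UUID on every add, so the same PDF never collides with its earlier copy. If the indexing client instead derives the id from the file's bytes and sends it explicitly, indexing the same file again should overwrite the earlier document, because the uniqueKey is identical. The helper name content_uuid is hypothetical, and it uses the MD5 digest of the content directly as the UUID bytes (not the RFC 4122 namespace-plus-name form).

```python
import hashlib
import uuid


def content_uuid(data: bytes) -> uuid.UUID:
    """Return a deterministic UUID derived from the MD5 digest of `data`.

    Hypothetical helper: the 16-byte MD5 digest is used as the UUID bytes,
    and version=3 makes the constructor set the version and variant bits.
    """
    digest = hashlib.md5(data).digest()
    return uuid.UUID(bytes=digest, version=3)


if __name__ == "__main__":
    # Stand-in for the real file content, e.g. open(path, "rb").read()
    pdf_bytes = b"%PDF-1.4 example content"
    # Send this value as the "id" field when indexing the document.
    print(content_uuid(pdf_bytes))
```

Identical bytes always give the same id, while any change to the file gives a different one.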

Related

SOLR: copy 2 fields into another field and add filters to that new field

While importing, I have the following fields in a CSV file:
<field name="Brand" type="string" indexed="true"/>
<field name="Colour" type="lowercaseExactMatch"/>
<field name="Keywords" type="text_general"/>
<field name="Name" type="text_general" indexed="true"/>
<field name="Price" type="string" indexed="true"/>
<field name="SKU" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
I want to create another field, NameKeywords, dynamically, in which I concatenate the Name and Keywords fields.
I also want to apply lowercase, EnglishPorterFilterFactory, EnglishPossessiveFilter, and HyphenatedWordsFilter to it.
I can apply the filters to that field by creating a custom field type, but how do I combine two fields into another field?
I saw copyField in my schema.xml:
<copyField source="Name" dest="Name_str" maxChars="256"/>
But I am not sure whether it is displayed anywhere, or how to combine fields with it.
Create a field named NameKeywords as below.
<field name="NameKeywords" type="customFieldType" indexed="true" stored="true" multiValued="true"/>
Then copy the source fields to the destination field as below.
<copyField source="Name" dest="NameKeywords"/>
<copyField source="Keywords" dest="NameKeywords"/>
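Note that copyField does not concatenate Name and Keywords into a single string. Each source value is added to NameKeywords as a separate value, which is why the destination is declared multiValued. The Python sketch below only models that behaviour for illustration; apply_copy_fields and the sample document are hypothetical, and Solr itself performs the copy internally on the raw values before analysis.

```python
from typing import Dict, List, Tuple

# (source, dest) pairs mirroring the two <copyField> rules above.
COPY_RULES: List[Tuple[str, str]] = [
    ("Name", "NameKeywords"),
    ("Keywords", "NameKeywords"),
]


def apply_copy_fields(doc: Dict[str, object],
                      rules: List[Tuple[str, str]]) -> Dict[str, object]:
    """Model copyField: append each source value to a multi-valued dest."""
    result = dict(doc)
    for source, dest in rules:
        if source not in doc:
            continue  # nothing to copy when the source field is absent
        values = list(result.get(dest, []))
        values.append(doc[source])
        result[dest] = values
    return result


if __name__ == "__main__":
    doc = {"SKU": "A1", "Name": "Red Shoe", "Keywords": "running trainers"}
    print(apply_copy_fields(doc, COPY_RULES)["NameKeywords"])
```

A search against NameKeywords then matches terms from either source field, since both values go through the destination field's analyzer.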

Missing 2 fields after applying the schema file puzzler

I am using Solr 7.4 and creating a core using the 3 files from the gist (one can download the files and save them in the directory <dir>/test/conf).
solr create -c test -d <dir>/test
The schema has 14 fields, while only 12 end up in the schema browser in the Admin UI.
The schema file looks like:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="collection" version="1.6"
xmlns:inc="http://www.w3.org/2001/XInclude">
<types>
<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" />
<fieldType name="int" class="solr.IntPointField" sortMissingLast="true"/>
<fieldType name="long" class="solr.LongPointField" sortMissingLast="true"/>
</types>
<fields>
<field name="childCode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="parentCode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="id" type="string" indexed="true" stored="true" multiValued="false" />
<filed name="sortOrder" type="int" indexed="true" stored="true" multiValued="false" />
<filed name="locked" type="boolean" indexed="true" stored="true" multiValued="false" />
<field name="status" type="string" indexed="true" stored="true" multiValued="false" />
<field name="filename" type="string" indexed="false" stored="true" multiValued="false" />
<field name="url" type="string" indexed="false" stored="true" multiValued="false" />
<field name="previewUrl" type="string" indexed="false" stored="true" multiValued="false" />
<field name="shape" type="string" indexed="true" stored="true" multiValued="false" />
<field name="originalHeight" type="int" indexed="true" stored="true" multiValued="false" />
<field name="originalWidth" type="int" indexed="true" stored="true" multiValued="false" />
<field name="sizes" type="string" indexed="true" stored="true" multiValued="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
</schema>
The missing fields are 'sortOrder' and 'locked'. Based on the documentation those are valid field names:
The name of the field. Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g., _version_) are reserved. Every field must have a name.
Other camel-case int fields, such as 'originalHeight' and 'originalWidth', are created. I am able to go into the Admin UI and add the missing fields manually with the name and the type from the file.
I am puzzled and would appreciate any clue to this disappearing fields mystery.
Your spelling is wrong:
<filed name="sortOrder" ..
<filed name="locked" ..
Change it to <field> and it'll work like the other fields.
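A typo like this is easy to miss by eye. As a rough check, a short Python script can list every child of <fields> whose tag is not field or dynamicField. The function name unexpected_field_tags is made up for this sketch, and it assumes the older schema layout with a <fields> wrapper, as in the question.

```python
import xml.etree.ElementTree as ET
from typing import List

ALLOWED_TAGS = {"field", "dynamicField"}


def unexpected_field_tags(schema_xml: str) -> List[str]:
    """Return 'tag:name' for every child of <fields> with an unexpected tag."""
    root = ET.fromstring(schema_xml)
    problems = []
    for fields in root.iter("fields"):
        for child in fields:
            if child.tag not in ALLOWED_TAGS:
                problems.append(f"{child.tag}:{child.get('name')}")
    return problems


# Shortened version of the schema from the question.
SAMPLE = """<schema name="collection" version="1.6">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" />
    <filed name="sortOrder" type="int" indexed="true" stored="true" />
    <filed name="locked" type="boolean" indexed="true" stored="true" />
  </fields>
</schema>"""

if __name__ == "__main__":
    print(unexpected_field_tags(SAMPLE))
```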

Solr 4.2.1 - Indexing pipe separated file using DataImportHandler - 2 Issues

I am new to Solr and working on a Solr POC.
I searched Stack Overflow for a similar issue but could not find one.
I am trying to use Solr 4.2.1 to index a text file containing pipe (|) separated data. The following is a snippet of sample data:
cust_id|name1|name2|name3|dob|address|city|pincode|phone|idenfication|salary
1001000003|John|D|Doe|31081962|H-904, Green Mandion, M G Rd, Santacruz(east)|mumbai|400056|9812030334|AMXPT7702P|50000.56
1001000005|Bob||Taylor|1041982|210, Greek Heights, Khar|mumbai|400057|976130321|AAXZZ2103P|20000.65
I am using the DataImportHandler to import the data into Solr.
I have two issues.
When I do a select query, I get the following response:
{
'responseHeader'=>{
'status'=>0,
'QTime'=>0},
'response'=>{'numFound'=>3,'start'=>0,'docs'=>[
{
'cust_id'=>'cust_id|name1|name2|name3|dob|address|city|pincode|phone|idenfication|salary'},
{
'cust_id'=>'1001000003|John|D|Doe|31081962|H-904, Green Mandion, M G Rd, Santacruz(east)|mumbai|400056|9812030334|AMXPT7702P|50000.56'},
{
'cust_id'=>'1001000005|Bob||Taylor|1041982|210, Greek Heights, Khar|mumbai|400057|976130321|AAXZZ2103P|20000.65'}]
}}
How do I get this as column: value pairs and not as a single string of data? I mean:
{
'responseHeader'=>{
'status'=>0,
'QTime'=>0},
'response'=>{'numFound'=>3,'start'=>0,'docs'=>[
{
'cust_id'=>'1001000003',
'name1' => 'John',
'name2' => 'D',
......
......
'salary' => 50000.56
}
,
{
'cust_id'=>'1001000005',
'name1' => 'Bob',
....
'salary' => 20000.65
}]
}}
My config file is as follows
<dataConfig>
<dataSource name="dfs" encoding="UTF-8" type="FileDataSource" />
<document>
<entity name="sourcefile"
processor="FileListEntityProcessor"
newerThan="${dataimporter.last_index_time}"
fileName="sample.txt"
rootEntity="false"
baseDir="C:/mfi_data/"
header=true
>
<entity name="entryline"
processor="LineEntityProcessor"
url="${sourcefile.fileAbsolutePath}"
rootEntity="true"
dataSource="dfs"
separator="|"
transformer="RegexTransformer"
>
<field column="rawLine"
regex="^(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)$"
groupNames="cust_id,name1,name2,name3,dob,address,city,pincode,phone,idenfication,salary"
/>
</entity>
</entity>
</document>
</dataConfig>
My schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema version="1.5">
<fields>
<field name="cust_id" type="string" indexed="true" stored="true" />
<field name="name1" type="string" indexed="true" stored="true" />
<field name="name2" type="string" indexed="true" stored="true" />
<field name="name3" type="string" indexed="true" stored="true" />
<field name="dob" type="string" indexed="true" stored="true" />
<field name="address" type="string" indexed="true" stored="true" />
<field name="city" type="string" indexed="true" stored="true" />
<field name="pincode" type="int" indexed="true" stored="true" />
<field name="phone" type="string" indexed="true" stored="true" />
<field name="identification" type="string" indexed="true" stored="true" />
<field name="salary" type="float" indexed="false" stored="true" />
<field name="rawLine" type="text" indexed="false" stored="false" multiValued="true" />
</fields>
<uniqueKey>cust_id</uniqueKey>
<types>
<fieldType name="string" class="solr.StrField" />
<fieldType name="int" class="solr.TrieIntField" />
<fieldType name="text" class="solr.TextField" />
<fieldType name="float" class="solr.FloatField" />
</types>
</schema>
How do I keep the header line from being indexed as data?
I tried Header="true" in the dataConfig, but that's not working.
Please guide me if you have found a way around this. Thanks in advance.
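One detail in the config above can be checked independently of DIH: in a regular expression an unescaped | means alternation, so ^(.*)|(.*)|... does not split on pipes. Its first alternative, ^(.*), already matches the whole line, so only the first group is filled, which would be consistent with cust_id holding the entire line in the response. The Python sketch below shows the difference using escaped pipes and [^|]* groups. Python's re module is used purely for illustration (the config is evaluated by Java's regex engine, which treats | the same way), and the sample line is shortened to four columns.

```python
import re

line = "1001000005|Bob||Taylor"

# Unescaped: "|" separates alternatives, and the first one, ^(.*), wins.
unescaped = re.compile(r"^(.*)|(.*)|(.*)|(.*)$")

# Escaped: "\|" matches a literal pipe; [^|]* keeps each group in one column.
escaped = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$")

if __name__ == "__main__":
    # Only group 1 is set, and it contains the whole line.
    print(unescaped.search(line).groups())
    # Four separate columns, including the empty one.
    print(escaped.search(line).groups())
```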
Solr accepts index updates in CSV (Comma Separated Values) format. Different separators and escape mechanisms are configurable, and multi-valued fields are supported. http://wiki.apache.org/solr/UpdateCSV
separator
Specifies the character to act as the field separator. Default is separator=,
header
true if the first line of the CSV input contains field or column names. The default is header=true. If the fieldnames parameter is absent, these field names will be used when adding documents to the index.
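As a rough illustration of the two parameters quoted above, the Python sketch below builds an update URL for a pipe-separated file. The host, port, core name (collection1) and the /update/csv path are assumptions based on a default Solr 4.x example setup and may differ on your installation; only separator and header come from the quoted documentation.

```python
from urllib.parse import urlencode

# Assumed location of the CSV update handler; adjust host, port and core name.
BASE_URL = "http://localhost:8983/solr/collection1/update/csv"


def csv_update_url(separator: str = "|", header: bool = True) -> str:
    """Build the update URL with URL-encoded CSV parameters."""
    params = {"separator": separator, "header": "true" if header else "false"}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The file itself would be sent as the POST body of this request.
    print(csv_update_url())
```

With header=true the first line supplies the field names instead of being indexed as a document, which covers both issues in the question.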

solr join function to query documents in multiple cores NullPointerException

I use a Solr join to query documents from two cores. My cores are defined as follows:
Post core:
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="creatorId" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
.
.
.
</fields>
User core:
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="username" type="string" indexed="true" stored="true" multiValued="false" />
<field name="email" type="string" indexed="true" stored="true" multiValued="false" />
<field name="userBrief" type="string" indexed="true" stored="true" multiValued="false" />
<field name="jobNumber" type="string" indexed="true" stored="true" multiValued="false" />
</fields>
Now I want to query all users who have created a post. I use the join function, and my URL looks like this:
http://localhost:9080/solr/user/select?q=*:*&fq={!join from=creatorId to=id fromIndex=post}
But it doesn't work, and it throws an exception:
null: java.lang.NullPointerException
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:559)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:646)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
.
.
.
I don't know why. Can you help me?
The fq parameter needs a valid query after the {!join ...} local params.
Try adding a match-all query to the end of the fq param, like this: http://localhost:9080/solr/user/select?q=*:*&fq={!join from=creatorId to=id fromIndex=post}*:*
In a realistic setting you would likely want to filter the joined results in some way, for example, "Find me all action movies rated by this user updated in the past two weeks," where the movies and user ratings are stored as separate documents.
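When this request is built from code, the local-params block and the trailing *:* both need URL encoding. A minimal Python sketch, reusing the host, core and field names from the question (the helper name join_filter_url is invented here):

```python
from urllib.parse import urlencode


def join_filter_url(base: str, from_field: str, to_field: str,
                    from_index: str, inner_query: str = "*:*") -> str:
    """Return a select URL whose fq joins from another core.

    The inner query after the {!join ...} block must not be empty.
    """
    if not inner_query:
        raise ValueError("the join needs a query after the local params")
    fq = (f"{{!join from={from_field} to={to_field} "
          f"fromIndex={from_index}}}{inner_query}")
    query = urlencode({"q": "*:*", "fq": fq})
    return f"{base}?{query}"


if __name__ == "__main__":
    print(join_filter_url("http://localhost:9080/solr/user/select",
                          "creatorId", "id", "post"))
```

Passing a narrower inner_query is how the joined side would be filtered in the more realistic case described above.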

Solr exception due to schema

I have the following solr schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="testthing" version="1.5">
<fields>
<field name="_version_" type="long" indexed="true" stored="true" required="true"/>
<field name="doc_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="title" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="doc_type" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
<field name="description" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="allText" type="fs_text" indexed="true" stored="false" required="true" multiValued="true"/>
</fields>
<uniqueKey>doc_id</uniqueKey>
<copyField source="title" dest="allText" />
<copyField source="description" dest="allText" />
<dynamicField name="*" type="ignored" multiValued="true" />
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="fs_text" class="solr.TextField" positionIncrementGap="100"/>
</types>
</schema>
Solr complains about a missing field text at the dynamic field type:
1898 [main] INFO org.apache.solr.servlet.SolrDispatchFilter ? SolrDispatchFilter.init() done
1918 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: undefined field text at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1235)
However, my one and only dynamic field (ignore everything not matched) doesn't use the text type (it's type="ignored").
What am I missing here?
Update: so far, renaming allText to text pretty much fixed the issue, but I can't figure out why! Is there something special or predefined about text in Solr 4.1?
It is not about a field type named "text". It is about a field named "text".
<defaultSearchField>text</defaultSearchField>
You may have changed or removed the default field in the config. If renaming the field fixes the issue, then somewhere in the configuration you are still referring to a "text" field, possibly in solrconfig.xml.
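To track down where "text" is still referenced, one rough approach is to scan solrconfig.xml for df (default field) parameters and compare them with the field names declared in the schema. The Python sketch below assumes handler defaults written as <str name="df">...</str>, which is how the Solr 4.x example solrconfig.xml lays them out. The function name and the inline sample documents are hypothetical, and dynamicField patterns are deliberately not considered.

```python
import xml.etree.ElementTree as ET
from typing import List


def undefined_default_fields(solrconfig_xml: str, schema_xml: str) -> List[str]:
    """Return df values in solrconfig.xml that name no explicit schema field."""
    schema = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in schema.iter("field")}
    config = ET.fromstring(solrconfig_xml)
    missing = []
    for elem in config.iter("str"):
        if elem.get("name") == "df" and elem.text not in declared:
            missing.append(elem.text)
    return missing


# Minimal stand-ins for the real solrconfig.xml and schema.xml.
SOLRCONFIG = """<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults"><str name="df">text</str></lst>
  </requestHandler>
</config>"""

SCHEMA = """<schema name="testthing" version="1.5">
  <fields>
    <field name="doc_id" type="string" indexed="true" stored="true"/>
    <field name="allText" type="fs_text" indexed="true" stored="false"/>
  </fields>
</schema>"""

if __name__ == "__main__":
    print(undefined_default_fields(SOLRCONFIG, SCHEMA))
```

Either renaming the schema field to text or pointing df at allText would make the list come back empty.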
