How to create a composite uniqueKey in schema.xml? - solr

Is it possible to create a composite uniqueKey in schema.xml? Or is it better to concatenate the unique fields into one unique string id field in the source data?
If it's possible, and if it's not that big of a difference, I would prefer to do the former because it would save me a bit of time.

As discussed in "How to set multiple fields as uniqueKey in solr?" and at http://lucene.472066.n3.nabble.com/Multiple-uniqueKey-fields-td472939.html, you cannot simply declare multiple fields as the uniqueKey. You would need to combine them into one field, and that field cannot be multivalued.
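To illustrate the "combine them into one field" approach, here is a minimal sketch of building the concatenated id in the source data before it is sent to Solr. The helper name composite_key, the "|" separator, and the post_id / location_id field names are assumptions made for the example, not anything Solr prescribes.

```python
def composite_key(*parts, sep="|"):
    """Join several key parts into one string for a single-valued id field."""
    values = [str(p) for p in parts]
    for value in values:
        # Refuse parts that contain the separator, otherwise two different
        # key tuples could collapse into the same string.
        if sep in value:
            raise ValueError(f"key part {value!r} contains the separator {sep!r}")
    return sep.join(values)


# Hypothetical source document with two fields that are unique together.
doc = {"post_id": 42, "location_id": 7, "title": "example"}
doc["id"] = composite_key(doc["post_id"], doc["location_id"])
```

Pick a separator that cannot occur in the key parts; the check above is there so that a collision is reported rather than silently indexed.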

Related

Unique key field in solr

There is a field named "id" which is used as the unique key in Solr. Although it's not directly used for faceting or sorting queries, it still shows up in the fieldcache and occupies a lot of memory.
Please help me understand how this id field ended up in the fieldcache, and whether there is a way to keep it out of the fieldcache.

Update specific field on SOLR index without storing other fields?

Is it possible in Solr to update a specific field on an indexed document without storing the other fields?
I am using Apache Lucene, in which updating a field internally deletes the original document and re-indexes all of its fields. This forces me to store every field value at index time, and storing all field values degrades indexing performance.
I found a thread that says it is possible to update documents without storing the other fields' values.

How to deal with compound keys using dih in solr

I am importing data from mysql db into solr documents. All is fine but I have one table which has a compound key (a pair of columns together as primary key) -> primary key for post_locations table is (post_id, location_id).
But post_id is the primary key for my Solr document, so when data is imported from the post_locations table the location_ids are overwritten. Is it possible to get the location_ids (which are of type int) as an array, since there can be more than one location_id for a post?
For MySQL you can use GROUP BY and GROUP_CONCAT to get all the values for a field grouped together in a single column, separated by ,. You can then use the RegexTransformer and splitBy for that field to index the field as multiValued (in practice indexing it as an array). I posted an example of this in a previous answer. You might also do this by having dependent entity entries in DIH, but it will require more SQL queries than doing a GROUP BY and GROUP_CONCAT.
If you want one row for each entry, you can build a custom uniqueKey instead, using CONCAT to build the aggregate / compound key on the MySQL side.
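Both options can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server): SQLite's GROUP_CONCAT behaves similarly here, and the || operator takes the place of MySQL's CONCAT. The split on "," mimics what splitBy would do in DIH, and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_locations ("
    " post_id INTEGER, location_id INTEGER,"
    " PRIMARY KEY (post_id, location_id))"
)
conn.executemany(
    "INSERT INTO post_locations VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 30)],  # invented sample rows
)

# Option 1: one row per post, with all location ids joined by ",".
rows = conn.execute(
    "SELECT post_id, GROUP_CONCAT(location_id)"
    " FROM post_locations GROUP BY post_id ORDER BY post_id"
).fetchall()

# Splitting on "," turns the joined column back into a multi-valued field.
# The order inside GROUP_CONCAT is not guaranteed, so sort after splitting.
docs = [
    {"id": post_id, "location_ids": sorted(int(x) for x in joined.split(","))}
    for post_id, joined in rows
]

# Option 2: one row per (post_id, location_id) pair with a compound key.
pair_keys = [
    row[0]
    for row in conn.execute(
        "SELECT post_id || '-' || location_id"
        " FROM post_locations ORDER BY post_id, location_id"
    )
]
```

The first option keeps one Solr document per post; the second produces one document per pair, so post-level fields would be repeated across documents.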

Solrcloud duplicate documents with id field

I am using solrcloud-4.3.0 and zookeeper-3.4.5 on a Windows machine. I have a collection with the unique field "id". I observed duplicate documents in the index with the same unique id value. As I understand it, this should not happen, because the purpose of the unique field is to prevent such situations. Can anyone help me figure out what causes this problem?
In the "/conf/schema.xml" file there is an XML element called "<uniqueKey>", which seems to be "id" by default... that is supposed to be your "key".
However, according to the Solr documentation (http://wiki.apache.org/solr/UniqueKey#Use_cases_which_do_not_require_a_unique_key) you do not always need a "unique key" if you do not need to incrementally add new documents to an existing index... maybe that is what is happening in your situation. But I was also under the impression that you always need a unique ID.
Probably too late to add an answer to this question, but it is also possible to duplicate documents with unique keys/fields by merging indexes with duplicate documents/fields.
Apparently, when indexes are merged either via the Lucene IndexMergeTool or the Solr CoreAdminHandler, any duplicate documents are happily appended to the index (as of Lucene and Solr 4.6.0).
De-duplication seems to happen at retrieval time.
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
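If the documents are available outside the index (for example as export files), duplicates can be caught before they reach a merge. The sketch below is plain Python over lists of dicts and does not use any Solr or Lucene API; the function names and the sample documents are made up for illustration.

```python
from collections import Counter


def duplicate_ids(docs, key="id"):
    """Return the sorted key values that occur more than once in docs."""
    counts = Counter(doc[key] for doc in docs)
    return sorted(k for k, n in counts.items() if n > 1)


def merge_without_duplicates(*indexes, key="id"):
    """Merge document lists, keeping only the last document seen per key."""
    merged = {}
    for docs in indexes:
        for doc in docs:
            merged[doc[key]] = doc  # a later document replaces an earlier one
    return list(merged.values())


# Two hypothetical exports that share the id "1".
index_a = [{"id": "1", "title": "old"}]
index_b = [{"id": "1", "title": "new"}, {"id": "2", "title": "other"}]

clashes = duplicate_ids(index_a + index_b)
merged = merge_without_duplicates(index_a, index_b)
```

Keeping the last document per key mirrors the overwrite behaviour one would expect from a normal add with a uniqueKey; whether that is the right policy depends on which source is authoritative.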

Best way to store tags in a sql server table?

What's the best way to store tags for a record? Just use a varchar field? What about when selecting rows that contain tag x? Use the LIKE operator?
thanks!
Depends on two things:
1) The amount of tags/tagged records
2) Whether or not you have a religious opinion on normalization :-)
Unless you are dealing with very large volumes of data, I'd suggest having a 'Tags' table mapping varchar values to integer identifiers, then a second table mapping tagged records to their tag ids. I'd suggest implementing this first, then checking whether it meets your performance needs. If it does not, keep a single table with an id for the tagged row and the actual text of the tag, but in that case I'd suggest you use a char column, as it will kill your query if the optimizer does a full table scan against a large table with a varchar column.
Use a tags table with the smallest allowable primary key. If there are fewer than 255 tags use a byte (tinyint), otherwise a word (smallint). The smaller the key, the smaller and faster the index on the foreign key in the main table.
No, it is generally a bad idea to put multiple pieces of data in a single field. Instead, use a separate Tags table (perhaps with just a TagID and TagName) and then, for each record, indicate the TagID associated with it. If a record is associated with multiple tags, you will have duplicate records with the only difference being TagID.
The advantage here is that you can easily query by tag, by record, and maintain the Tags table separately (i.e. what if a tag name changes?).
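A minimal sketch of the separate-tables layout follows. It follows the variant with a mapping table between records and tags, and it uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption so the example runs anywhere). The Records and RecordTags table names, the column names other than TagID and TagName, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for SQL Server
conn.executescript(
    """
    CREATE TABLE Records (RecordID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    CREATE TABLE Tags (TagID INTEGER PRIMARY KEY, TagName TEXT NOT NULL UNIQUE);
    -- One row per (record, tag) pair; the compound key prevents duplicates.
    CREATE TABLE RecordTags (
        RecordID INTEGER NOT NULL REFERENCES Records(RecordID),
        TagID INTEGER NOT NULL REFERENCES Tags(TagID),
        PRIMARY KEY (RecordID, TagID)
    );
    """
)
conn.executemany("INSERT INTO Records VALUES (?, ?)", [(1, "first"), (2, "second")])
conn.executemany("INSERT INTO Tags VALUES (?, ?)", [(1, "solr"), (2, "sql")])
conn.executemany("INSERT INTO RecordTags VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])


def records_with_tag(tag_name):
    """Return the titles of all records carrying the tag, using joins, not LIKE."""
    rows = conn.execute(
        "SELECT r.Title FROM Records r"
        " JOIN RecordTags rt ON rt.RecordID = r.RecordID"
        " JOIN Tags t ON t.TagID = rt.TagID"
        " WHERE t.TagName = ? ORDER BY r.RecordID",
        (tag_name,),
    )
    return [title for (title,) in rows]


sql_titles = records_with_tag("sql")

# Renaming a tag touches a single row in Tags; the mappings stay untouched.
conn.execute("UPDATE Tags SET TagName = 'search' WHERE TagID = 1")
search_titles = records_with_tag("search")
```

Selecting by tag becomes an equality match on an indexed column, and a tag rename is a one-row update.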
