I have two cores on a single Solr instance. The schemas of both cores share the same primary key.
I want to merge the results of a query from both cores. Is this possible using Solr?
I followed Solr: Distributed Search, but the example didn't work for me (I did get results, but they were not unified). I queried the Solr cores using:
localhost:8983/solr/core1/select/?shards=localhost:8983/solr/core1,localhost:8983/solr/core0&q=123_456
Has anyone tried this approach before?
What do you mean by unified results?
You should be able to get combined results from both cores in a single result set.
However, there are a few limitations.
The schemas need to be the same for both cores, or at least synced up, so that the search happens on the same fields and the results are returned accordingly.
What do you mean by the cores sharing the same primary key?
The id needs to be unique across the cores:
The unique key field must be unique across all shards. If docs with
duplicate unique keys are encountered, Solr will make an attempt to
return valid results, but the behavior may be non-deterministic.
There is a field named "id" which is used as the unique key in Solr. Although it is not directly used for faceting or sorting queries, it still comes up in the field cache and occupies a lot of memory.
Please help me understand how this id field ended up in the field cache, and whether there is a way to keep it out.
Is there an existing production-ready in-memory KV store that allows me to retrieve a single value via any of multiple keys?
Let's say I have millions of immutable entities, each with an associated primary key. Any of these entities can have multiple aliases, and the most common scenario is to retrieve the entity by such an alias (90% of all requests). The second scenario is to retrieve the entity via the primary key and then add a new alias record (the remaining 10%). One special thing about this step: it is always preceded by the alias search and happens only if that search was unsuccessful.
The entire dataset fits into RAM, but probably won't if the full record data is duplicated across all aliases.
I'm highly concerned about data-retrieval latency and less concerned about write speed.
This can be done with Redis in two sequential lookups, or via any SQL/MongoDB store. I think both ways are suboptimal: the first because of the two round trips for every search attempt, and the second because of the latency concerns.
Any suggestions?
Can you do two hashmaps, one that goes pk -> record data and the other that goes alias -> pk?
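A minimal in-process sketch of the two-map idea in Go (all names hypothetical), assuming the dataset fits in RAM; note that only the alias -> pk map duplicates anything, and it holds just the key, not the record data:

```go
package main

import "fmt"

// Entity stands in for the immutable record data.
type Entity struct {
	PK   int64
	Data string
}

// Store keeps a single copy of each entity plus a lightweight alias index.
type Store struct {
	byPK    map[int64]*Entity // pk -> record data (one copy per entity)
	byAlias map[string]int64  // alias -> pk (only the pk is duplicated)
}

func NewStore() *Store {
	return &Store{
		byPK:    make(map[int64]*Entity),
		byAlias: make(map[string]int64),
	}
}

// GetByAlias covers the common case (~90% of requests): two in-memory
// lookups, no network round trips.
func (s *Store) GetByAlias(alias string) (*Entity, bool) {
	pk, ok := s.byAlias[alias]
	if !ok {
		return nil, false
	}
	e, ok := s.byPK[pk]
	return e, ok
}

// AddAlias covers the fallback case: look up by pk, then register an alias.
func (s *Store) AddAlias(pk int64, alias string) bool {
	if _, exists := s.byPK[pk]; !exists {
		return false
	}
	s.byAlias[alias] = pk
	return true
}

func main() {
	s := NewStore()
	s.byPK[42] = &Entity{PK: 42, Data: "payload"}
	s.AddAlias(42, "answer")
	if e, ok := s.GetByAlias("answer"); ok {
		fmt.Println(e.Data) // payload
	}
}
```

The same two-map layout carries over to a networked store such as Redis, but kept in process it avoids the round-trip concern entirely.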
Another option is to have some sort of deterministic alias, so that you can go from the alias to the primary key directly in code, without doing a lookup in a datastore.
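If the aliases are generated by your own system, one way to make them deterministic is to embed the primary key in the alias itself. The format below is purely an assumption for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// makeAlias embeds the pk after the last '-' (hypothetical format),
// so resolving an alias is pure string parsing, no datastore lookup.
func makeAlias(slug string, pk int64) string {
	return fmt.Sprintf("%s-%d", slug, pk)
}

// pkFromAlias recovers the primary key directly from the alias.
func pkFromAlias(alias string) (int64, error) {
	i := strings.LastIndexByte(alias, '-')
	if i < 0 {
		return 0, fmt.Errorf("alias %q has no embedded pk", alias)
	}
	return strconv.ParseInt(alias[i+1:], 10, 64)
}

func main() {
	a := makeAlias("john", 42)
	pk, _ := pkFromAlias(a)
	fmt.Println(a, pk) // john-42 42
}
```

This obviously only works when you control alias generation; externally supplied aliases still need an index.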
I want to use this Go package https://github.com/bwmarrin/snowflake to generate int64 primary keys for my tables in PostgreSQL. If my application server is running on at least two machines, how can I prevent duplicate keys from being generated?
So snowflake provides a 63-bit integer stored in an int64. According to the documentation, you can generate 4096 unique IDs every millisecond, per node ID. In the default implementation there are 1024 possible node IDs, so that is 4096 * 1024 = 4,194,304 IDs per millisecond; over a full second you can generate billions of unique IDs across multiple nodes, and as long as every node uses a distinct node ID you will not get conflicts.
So I think if you pass a distinct node ID in an environment variable of each server and generate IDs based upon that, you should be safe; a sketch follows below.
It can also help to add a prefix to the ID based upon the entity or domain, which partitions the ID space and reduces the chance of conflicts even further.
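A minimal sketch with the bwmarrin/snowflake package, assuming each machine is started with a distinct value in a hypothetical SNOWFLAKE_NODE_ID environment variable (0-1023 with the default 10 node bits):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"

	"github.com/bwmarrin/snowflake"
)

func main() {
	// SNOWFLAKE_NODE_ID is a hypothetical name; what matters is that
	// every machine gets a different value in the range 0-1023.
	raw := os.Getenv("SNOWFLAKE_NODE_ID")
	nodeID, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		log.Fatalf("invalid SNOWFLAKE_NODE_ID %q: %v", raw, err)
	}

	node, err := snowflake.NewNode(nodeID)
	if err != nil {
		log.Fatal(err)
	}

	// Timestamp + node ID + sequence make the int64 unique across
	// machines, as long as the node IDs are distinct.
	id := node.Generate()
	fmt.Println(id.Int64())
}
```

How you assign the node IDs (config management, a deployment ordinal, a row in Postgres) is up to you; the only requirement is that no two live servers share one.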
Is there any way to find missing records in a Solr index?
I am running a crawl against a SQL DB. My primary key is "id".
There are a few records missing from the index. Is there any specific way to find them all?
Is it going to make any difference whether the primary key is a long value or a string, if we are using a range query?
Thanks in advance!
If you mean that those records went "missing" during indexation, you can write them down in a file during indexation, because you will know more or less which records did not make it through.
If you are talking about comparing the database with Solr, the only way is to crawl the whole database and search for each record in Solr.
If your ids are numeric, you can do it with a range query on a group of ids, and if the counts do not match you can narrow down the search; see the sketch after this answer.
The easiest way, though, is to just compare the ids one by one, but it is also the slowest. It depends on your database.
Primary keys in Solr are string only, but nobody says you can't have a numeric unique key alongside.
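A rough divide-and-conquer sketch of the range-count idea in Go. The countDB and countSolr callbacks are assumptions standing in for a SELECT count(*) ... WHERE id BETWEEN lo AND hi on the database side and a q=id:[lo TO hi]&rows=0 query (reading numFound) on the Solr side; the demo uses in-memory sets so it runs as-is:

```go
package main

import "fmt"

// findMissing compares counts over [lo, hi] on both sides and recurses
// into halves that disagree, assuming the DB is the superset.
func findMissing(lo, hi int64, countDB, countSolr func(lo, hi int64) int64, out *[]int64) {
	if countDB(lo, hi) == countSolr(lo, hi) {
		return // this range is fully indexed
	}
	if lo == hi {
		*out = append(*out, lo) // narrowed down to one missing id
		return
	}
	mid := lo + (hi-lo)/2
	findMissing(lo, mid, countDB, countSolr, out)
	findMissing(mid+1, hi, countDB, countSolr, out)
}

func main() {
	// Fake data: id 7 exists in the DB but never made it into the index.
	db := map[int64]bool{1: true, 2: true, 7: true, 9: true}
	solr := map[int64]bool{1: true, 2: true, 9: true}
	count := func(set map[int64]bool) func(lo, hi int64) int64 {
		return func(lo, hi int64) int64 {
			var n int64
			for id := range set {
				if id >= lo && id <= hi {
					n++
				}
			}
			return n
		}
	}
	var missing []int64
	findMissing(1, 10, count(db), count(solr), &missing)
	fmt.Println(missing) // [7]
}
```

Each mismatching range costs two count queries per level of recursion, so a handful of missing ids in a large id space is found in O(k log n) queries rather than one query per row.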
I am using SolrCloud 4.3.0 and ZooKeeper 3.4.5 on a Windows machine. I have a collection indexed with the unique field "id". I observed that there were duplicate documents in the index with the same unique id value. As per my understanding this should not happen, because the purpose of the unique field is to avoid exactly such situations. Can anyone help me out here with what causes this problem?
In the "/conf/schema.xml" file there is a XML element called "", which seems to be "id" by default... that is supposed to be your "key".
However, according to Solr documentation (http://wiki.apache.org/solr/UniqueKey#Use_cases_which_do_not_require_a_unique_key) you do not always need to have always to have a "unique key", if you do not require to incrementally add new documents to an existing index... maybe that is what is happening in your situation. But I also had the impression you always needed a unique ID.
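For reference, this is roughly what the relevant fragment of a stock schema.xml looks like; the field itself must be defined and then named as the unique key:

```xml
<!-- define the field, then declare it as the unique key -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```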
Probably too late to add an answer to this question, but it is also possible to end up with duplicate documents despite a unique key by merging indexes that already contain duplicate documents.
Apparently, when indexes are merged, either via the Lucene IndexMergeTool or the Solr CoreAdminHandler, any duplicate documents will be happily appended to the index (as of Lucene and Solr 4.6.0); example invocations are sketched after the link below.
De-duplication seems to happen at retrieval time.
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
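For reference, the two merge paths look roughly like this; the paths, core names, and jar versions are placeholders, and per the page above neither path de-duplicates for you:

```
# Lucene IndexMergeTool (lucene-core and lucene-misc on the classpath)
java -cp lucene-core-4.6.0.jar:lucene-misc-4.6.0.jar \
  org.apache.lucene.misc.IndexMergeTool /path/to/merged /path/to/index1 /path/to/index2

# Solr CoreAdminHandler
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2
```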