Disabling virtual nodes in an existing Solr DC

I have an existing cluster with the following topology:
DC Cassandra: 2 nodes
DC Solr: 5 nodes
All of the nodes currently use vnodes. I want to disable vnodes in the Solr DC for performance reasons.
According to this document, to disable vnodes:
1) In the cassandra.yaml file, set num_tokens to 1.
2) Uncomment the initial_token property and set it to 1, or to the value of a generated token for a multi-node cluster.
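So, if I understand correctly, each node's cassandra.yaml would end up with something like this (the token value here is illustrative, not one I have generated):
num_tokens: 1
initial_token: -9223372036854775808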
Is this all that I need to do? (No repair, no cleanup, nothing else?) It seems too good to be true to me.
As for token assignment, should I use the Python code found here (for Murmur3), or should I reuse one of the existing tokens from the vnodes that the node currently has?

The only way to disable vnodes is to follow http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configVnodesProduction_t.html in reverse: build a new Solr DC with vnodes off and switch over to it.
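For the token assignment itself, evenly spaced Murmur3 tokens are straightforward to compute. Here is a minimal sketch that mirrors the usual Python one-liner from the DataStax docs (the class name is mine; verify the output against nodetool ring before relying on it):

import java.math.BigInteger;

public class TokenGenerator {
    public static void main(String[] args) {
        int nodes = 5; // size of the Solr DC in the question
        // Murmur3Partitioner tokens span -2^63 .. 2^63-1; space them evenly.
        BigInteger span = BigInteger.valueOf(2).pow(64);
        BigInteger min = BigInteger.valueOf(2).pow(63).negate();
        for (int i = 0; i < nodes; i++) {
            BigInteger token = span.multiply(BigInteger.valueOf(i))
                                   .divide(BigInteger.valueOf(nodes))
                                   .add(min);
            System.out.println("node " + (i + 1) + ": initial_token: " + token);
        }
    }
}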

Related

using DistinguishedName in ldap query

I have a requirement where I need to run a query like the one below and fetch 2-3 attributes for all entities satisfying it. There would be around 100 distinguishedName values in a single query. Since the Microsoft documentation says that distinguishedName is not indexed, I suspect this might cause performance issues.
Does anybody know whether it actually would? Apart from the LDAP filter below, I would obviously also have to use SUBTREE scope.
(|(distinguishedName=<DN 1 goes here>)(distinguishedName=<DN 2 goes here>))
Edit 1:
I tried this in my test Active Directory which has ~6k entries.
Internal event: A client issued a search operation with the following options.
Starting node:
DC=example,DC=com
Filter:
( |
(distinguishedName=CN=user-1,CN=large-test,CN=Users,DC=example,DC=com)
(distinguishedName=CN=user-2,CN=large-test,CN=Users,DC=example,DC=com)
(distinguishedName=CN=user-3,CN=large-test,CN=Users,DC=example,DC=com)
(distinguishedName=CN=group1,CN=large-test,CN=Groups,DC=example,DC=com)
)
Search scope:
subtree
Attribute selection:
mail,objectClass
Server controls:
Visited entries:
4
Returned entries:
4
Used indexes:
idx_distinguishedName:4:N;idx_distinguishedName:1:N;idx_distinguishedName:1:N;idx_distinguishedName:1:N;
Pages referenced:
123
Pages read from disk:
0
From the results it looks like it visited only the 4 entries I searched for, using some indexes. I checked with the schema snap-in tool (just to be sure) and it doesn't show indexes on distinguishedName, so I'm not sure how it's using these indexes.
Microsoft Active Directory stores group memberships at the entry level, so you could use the memberOf attribute to fetch the mail attribute instead.
E.g.
ldapsearch .... -b SEARCH_BASE "(|(memberOf=GROUP_DN_1)(memberOf=GROUP_DN_2)...)" mail
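For a programmatic version, here is a rough JNDI sketch of the same memberOf search (the host, bind credentials, and group DN below are placeholders):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class MemberOfSearch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://dc.example.com:389");               // placeholder host
        env.put(Context.SECURITY_PRINCIPAL, "CN=svc,CN=Users,DC=example,DC=com"); // placeholder bind DN
        env.put(Context.SECURITY_CREDENTIALS, "secret");

        InitialDirContext ctx = new InitialDirContext(env);
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
        sc.setReturningAttributes(new String[] { "mail", "objectClass" });

        String filter = "(|(memberOf=CN=group1,CN=large-test,CN=Groups,DC=example,DC=com))";
        NamingEnumeration<SearchResult> results = ctx.search("DC=example,DC=com", filter, sc);
        while (results.hasMore()) {
            System.out.println(results.next().getAttributes().get("mail"));
        }
        ctx.close();
    }
}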

NDB Query hangs for one set of criteria

UPDATED:
I do not have an entry in index.yaml; I expected zig-zag merge to work fine on pure equality filters (which it appears to do in the Datastore UI). Here is the actual query I'm making:
UploadBlock.query(
    UploadBlock.time_slices == timeslice,  # timeslice is each value in the repeated property
    UploadBlock.company == block.company,
    UploadBlock.aggregator_serial == block.aggregator_serial,
    UploadBlock.instrument_serial == block.instrument_serial)\
    .fetch(10, keys_only=True)
I've cross-posted this because I didn't read the initial support page first, so this is also on the google forum at https://groups.google.com/forum/#!topic/google-appengine/-7n8Y_tzCgM
I have a datastore query I make as an integral part of my system. It's an equality filter on four properties. When I make this query for one particular set of criteria, it just hangs. It does not hang for that set of criteria in the Datastore viewer, where I can see that there are only two results to return. I can fetch both of these entities by id without a problem, but the query always hangs, both in the remote shell and in production.
The query runs fine for a multitude of other criteria; it appears to be just this one set. I have attempted replacing both entities so that the criteria still match, but the new entities have different ids and the query still hangs.
For some more specific information, the four properties are defined as follows:
company = ndb.KeyProperty(indexed=True)
aggregator_serial = ndb.StringProperty(indexed=True)
instrument_serial = ndb.StringProperty(indexed=True)
time_slices = ndb.IntegerProperty(indexed=True, repeated=True)
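For reference, if a composite index turns out to be needed after all, I assume the index.yaml entry would look roughly like this (a sketch, untested; note that composite indexes over a repeated property like time_slices can grow large):

indexes:
- kind: UploadBlock
  properties:
  - name: company
  - name: aggregator_serial
  - name: instrument_serial
  - name: time_slices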
I'm running the following gcloud versions:
Google Cloud SDK 207.0.0
app-engine-java 1.9.64
app-engine-python 1.9.71
app-engine-python-extras 1.9.69
beta 2018.06.22
bq 2.0.34
core 2018.06.22
gcloud
gsutil 4.32
Any help on what I should try for next steps would be greatly appreciated.
Thanks in advance!

Solr: To get all records

I am trying to upgrade my Solr 4.x installation to a Solr 5.2.1 SolrCloud implementation. I had written the following code to get all results from a Solr query, and it works well in single-instance mode.
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addSort("agent_status", ORDER.desc);
query.addFilterQuery("account_id:\"" + accountId + "\"");
query.set("rows", Integer.MAX_VALUE);
But the code does not work in the SolrCloud implementation. It throws the following exception:
2015-08-14 16:44:45,648 ERROR [solr.core.SolrCore] - [http-8080-8] : java.lang.NegativeArraySizeException
at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:58)
at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:39)
at org.apache.solr.handler.component.ShardFieldSortedHitQueue.<init>(ShardDoc.java:113)
at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:972)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:750)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:729)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
I found that it is failing because of the query.set("rows", Integer.MAX_VALUE) statement. People suggested I use pagination.
But I cannot afford to paginate, as it would require too many changes on the UI side.
There is one more option: first run the query with some small rows value, get the total number of documents via the response.getResults().getNumFound() method, and then pass that value to setRows. But this approach adds one more call to the server.
Is there any other way I can solve this problem?
You can always set rows to a large value that encompasses all of your results. Integer.MAX_VALUE will not work due to the size limits of Java arrays (see here) and the Lucene PriorityQueue (see lines 42 - 58).
SOLR-534 requested essentially what you're asking for; there is some good discussion there about why such a feature would or would not be a good idea.
A better question might be: how many documents can the UI hold without becoming unusable? However many documents that is would be a good value for your query to return.
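If the extra round trip turns out to be acceptable, the numFound approach from the question might look roughly like this (a sketch; error handling omitted, and the SolrClient is assumed to be set up elsewhere):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;

public class FetchAll {
    // Returns every document matching the account filter, at the cost of one
    // extra round trip to learn numFound first.
    static SolrDocumentList fetchAll(SolrClient client, String accountId) throws Exception {
        SolrQuery probe = new SolrQuery("*:*");
        probe.addFilterQuery("account_id:\"" + accountId + "\"");
        probe.setRows(0); // header only: numFound without any documents
        long total = client.query(probe).getResults().getNumFound();

        SolrQuery full = new SolrQuery("*:*");
        full.addFilterQuery("account_id:\"" + accountId + "\"");
        full.addSort("agent_status", SolrQuery.ORDER.desc);
        full.setRows((int) total); // safe while total stays far below Integer.MAX_VALUE
        return client.query(full).getResults();
    }
}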

Solr 3.5 indexing taking very long

We recently migrated from Solr 3.1 to Solr 3.5. We have one master and one slave configured. The master has two cores:
1) Core1 – 44555972 documents
2) Core2 – 29419244 documents
We commit every 5000 documents, but lately commits are taking very long, 15 minutes plus in some cases. What could have caused this? I have checked the logs, and the only warning I can see is:
“WARNING: Use of deprecated update request parameter update.processor detected. Please use the new parameter update.chain instead, as support for update.processor will be removed in a later version.”
Memory details:
export JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx36g -XX:MaxPermSize=5g"
Solr Config:
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>
Also, the top command shows almost 350 GB of virtual memory usage.
What could be causing this? Everything was running fine a few days back.
Do you have a large search-warming query? Our commits take up to 2 minutes because of the search warming we have in place; I wonder if that is the case here.
The large virtual memory usage would explain this.
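For reference, warming work typically lives in two places in solrconfig.xml, roughly like this (the query and cache sizes below are placeholders, not your actual config):

<!-- Queries replayed against every new searcher, i.e. after every commit -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some:warming-query</str></lst>
  </arr>
</listener>
<!-- Cache autowarming: entries copied from the old searcher on every commit -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>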

How can I modify the Solr Update Handler to not simply overwrite existing documents?

I'm working with Solr, indexing data from two sources: a real-time "pump" that inserts (and updates) documents in Solr, and a database which holds backups of those documents.
The problem we encountered looks like this: if we run a data import from the database while the pump is performing inserts, we may index a doc from the pump and later overwrite it with the doc extracted from the database, which is a backup and therefore probably a little outdated.
Stopping the pump, importing from the database, and starting the pump again would probably cause instability in our application.
What I'd like to do is tell Solr not to overwrite documents automatically, but to do so conditionally (for example, based on the value of a 'last_modified_date' field).
My question is: how can I do it? Do I have to modify the Solr source, write a new class overriding some update processor, or just add some magic lines to solrconfig?
Sorry, but there is no option or config setting to tell Solr not to update documents automatically and instead apply a conditional check. The current model in Solr is that if you insert a document with the same unique id as one already in the index, it will "update" that document via a delete/add operation. Solr also does not currently support updating only specific fields of an existing indexed document. Please see issue SOLR-139 for more details.
Based on the scenario you have described, I would suggest that you create a process outside of Solr that retrieves items from your data sources and then performs the conditional check against what is already in the index to determine whether an update is necessary.
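A rough SolrJ sketch of such an external process (the field names, query, and client setup are assumptions for illustration, not an existing Solr feature):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

import java.util.Date;

public class ConditionalIndexer {
    // Adds the document only if it is newer than what the index already holds.
    // Field names ("id", "last_modified_date") are illustrative.
    static void addIfNewer(SolrClient solr, SolrInputDocument doc) throws Exception {
        String id = (String) doc.getFieldValue("id");
        Date incoming = (Date) doc.getFieldValue("last_modified_date");

        SolrDocumentList hits = solr.query(new SolrQuery("id:\"" + id + "\"")).getResults();
        if (!hits.isEmpty()) {
            Date existing = (Date) hits.get(0).getFieldValue("last_modified_date");
            if (existing != null && incoming != null && !incoming.after(existing)) {
                return; // the indexed copy is newer or the same; skip the backup version
            }
        }
        solr.add(doc);
    }
}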
You can also use a Solr script update processor (StatelessScriptUpdateProcessorFactory) to check whether the document already exists and proceed accordingly.
The code below only works when Solr runs on Java 8 (it relies on the Nashorn JavaScript engine):
function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var previousDoc = null;
  try {
    // Create a Lucene Term for a field that uniquely identifies the document
    var Term = Java.type("org.apache.lucene.index.Term");
    var termObject = new Term("fieldForSearchTryUnique", "Value of field");
    // Retrieve the internal document id; getFirstMatch returns -1 if not present
    var previousDocId = req.getSearcher().getFirstMatch(termObject);
    if (previousDocId != -1) {
      // Get the complete stored document for that id
      previousDoc = req.getSearcher().doc(previousDocId);
      // Do the required comparison here (e.g. against last_modified_date)
    }
  } catch (err) {
    logger.error("error in update processor: " + err);
  }
}
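To wire the script in, something like this in solrconfig.xml should work (assuming the script above is saved as conditional-update.js in the conf/ directory):

<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">conditional-update.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>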
