SolrCloud Autoscaling Automatically Adding Replicas - solr

I'm using the 7.7 version of Solr cluster. I have three prescale AWS
EC2 servers with a Target group, an NLB attached, and autoscaling
turned on just for the Solr cluster.
Using the command below, I created the solr collection (collection Name: abc) with 1 shard and 3 replicas. Currently, the cluster now includes autoscale servers (4 replica for collection Name: abc)
There are currently 4 active nodes in that cluster.
From the application team, a collection (collection Name: xyz) will be created from code. They are unsure of the number of active nodes in the Solr cluster. In the zookeeper properties, the replicationFactor value is set to 3.
Three replication factors were used to create the collection from the application side (one of the replica is autoscale server). Now that the load is decreasing, the autoscale server will shut down.
There are only 2 replicationFactors available at that time (collection Name: xyz). Prescale server does not automatically add new collection xyz.
Please provide a solution for this.
Create collection command:
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test_50&numShards=1&replicationFactor=3&collection.configName=conf_product&autoAddReplicas=true"
###Below the autoscaling policy i applied for solr.
curl http://localhost:8983/solr/admin/autoscaling -H 'Content-type:application/json' -d '{ "set-cluster-policy" : [{ "replica" : "1", "shard" : "#EACH", "node" : "#ANY", }] }'
curl http://localhost:8983/solr/admin/autoscaling -H 'Content-type:application/json' -d '{ "set-trigger": {"name" : "node_added_trigger","event" : "nodeAdded","waitFor" : "5s", "preferredOperation": "ADDREPLICA", "enabled" : true, "actions" : [{ "name" : compute_plan", "class": "solr.ComputePlanAction" }, { "name" : "execute_plan", "class": "solr.ExecutePlanAction" } ] }}'
curl http://localhost:8983/solr/admin/autoscaling -H 'Content-type:application/json' -d '{ "set-trigger": { "name" : "node_lost_trigger", "event" : "nodeLost", "waitFor" : "5s", "preferredOperation": "DELETENODE", "enabled" : true, "actions" : [ { "name" : "compute_plan", "class": "solr.ComputePlanAction" }, { "name" : "execute_plan", "class": "solr.ExecutePlanAction" } ] }}'

Related

Solr edismax query syntax error "Query Field '_text_' is not a valid field name"

I had originally created in my solr schema 3 copy fields:
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field": {"source":"company_name","dest":"_text_"}}' http://my-instance/solr/listing/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field": {"source":"address","dest":"_text_"}}' http://my-instance/solr/listing/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field": {"source":"city","dest":"_text_"}}' http://my-instance/solr/listing/schema
However, I have recently removed these from the schema and are now composing queries in a slightly different format. More advanced queries we have the need for edismax.
However, even by turning on edismax I'm receiving an error from the solr query parser as per below. Did I break something by deleting the copy fields?
/solr/listing/select?debugQuery=on&defType=edismax&q=%3A&stopwords=true
{
"responseHeader": {
"zkConnected": true,
"status": 400,
"QTime": 1,
"params": {
"q": "*:*",
"defType": "edismax",
"debugQuery": "on",
"stopwords": "true"
}
},
"error": {
"metadata": [
"error-class",
"org.apache.solr.common.SolrException",
"root-error-class",
"org.apache.solr.common.SolrException"
],
"msg": "org.apache.solr.search.SyntaxError: Query Field '_text_' is not a valid field name",
"code": 400
}
}
As per the comments the 'text' field remains in 3 places in the config:
"/update/extract":{
"startup":"lazy",
"name":"/update/extract",
"class":"solr.extraction.ExtractingRequestHandler",
"defaults":{
"lowernames":"true",
"fmap.content":"_text_"}}
"spellchecker":{
"name":"default",
"field":"_text_",
"initParams":[{
"path":"/update/**,/query,/select,/tvrh,/elevate,/spell,/browse",
"defaults":{"df":"_text_"}}]
As per the comment on my question (I'm still on the learning path of solr):
Although they have been deprecated for quite some time, Solr still has
support for Schema based configuration of a <defaultSearchField/>
(which is superseded by the df parameter) and <solrQueryParser defaultOperator="OR"/> (which is superseded by the q.op parameter.
If you have these options specified in your Schema, you are strongly
encouraged to replace them with request parameters (or request
parameter defaults) as support for them may be removed from future
Solr release.
For our purposes and as we are using the edismax query parser we needed to specify the query fields that we wanted to use.
2+ year old post, not sure this will help.
Since you are using "defType": "edismax"
try "q.alt": "*:*" instead of "q": "*:*". This should fix the issue.

sh.isBalancerRunning(): false

I am trying to shard a mongodb database like this:
1- Start each member of the shard replica set
mongod --shardsvr --port 27100 --replSet r1 --dbpath <some_path>\shardsvr\shardsvr1
mongod --shardsvr --port 27200 --replSet r2 --dbpath <some_path>\shardsvr\shardsvr2
2- Start each member of the config server replica set
mongod --configsvr --port 27020 --replSet cfg1 --dbpath <some_path>\configsvr\configsvr1
3- Connect to config server replica set
mongo --port 27020
4- Initiate the replica set
conf = {
_id: "cfg1",
members: [
{
_id:0,
host: "localhost:27020"
}
]
}
rs.initiate(conf)
5- Start the mongos and specify the --configdb parameter
mongos --configdb cfg1/localhost:27020 --port 28000
6- Initiate the replica set of each shard
mongo --port 27100
var config = {_id: "r1", members: [{_id:0, host:"localhost:27100"}]}
rs.initiate(config)
exit
mongo --port 27200
var config = {_id: "r2", members: [{_id:0, host:"localhost:27200"}]}
rs.initiate(config)
exit
7- Connect to mongos to add shards
mongo --port 28000
sh.addShard("r1/localhost:27100")
sh.addShard("r2/localhost:27200")
8- Add some data
use sharddb
for (i = 10000; i < 30000; i++){
db.example.insert({
author: "author" + i,
post_title: "Blog Post by Author " + i,
date: new Date()
});
}
db.example.count()
9- Enable sharding
sh.enableSharding("sharddb")
10- Create the index as part of sh.shardCollection()
db.example.ensureIndex({author : 1}, true)
sh.shardCollection("sharddb.example", {author: 1})
11- Check if balancer is running
sh.isBalancerRunning()
However, in this step, I get a false as response, and I dont know what I did wrong to get this. I followed the steps of this tutorial
With only 20000 documents that are ~100 bytes each, there is probably only 1 chunk.
Check with
use sharddb
db.printShardingStatus()
I repeated the steps you listed above, and got the following result:
{ "_id" : "sharddb", "primary" : "shard02", "partitioned" : true }
sharddb.example
shard key: { "author" : 1 }
unique: false
balancing: true
chunks:
shard02 1
{ "author" : { "$minKey" : 1 } } -->> { "author" : { "$maxKey" : 1 } } on : shard02 Timestamp(1, 0)
The mongos will monitor what it has added to each chunk, and notify the config server to consider splitting when it has seen enough data added. Then the balancer will automatically be activated when one shard contains several more chunks than another.
If you insert enough documents to trigger automatic splitting, or manually split the chunk, the balancer will begin doing its thing.

SolrCloud in production - querying q=* gives numFound=0

So I have a three-node cluster deployed using a zookeeper. And successfully created test collection (3 shards). Then after I have
curl -X POST -H 'Content-Type: application/json' 'ec2FirstNodeIP:8983/solr/test/update' --data-binary ' [ { "f1" : "1", "f2" : "2", "f3" : "3" } ]'
I got
{"responseHeader":{"status":0,"QTime":38} ...
However when I have curl "sameIP:8983/solr/test/select?wt=json&indent=true&q=*:*"
I am getting
NumFound:0
But using the admin UI for updating the document, the query now returns the document
image for admin UI
What am I missing?
To make document searchable we should commit. use commit=true
ec2FirstNodeIP:8983/solr/test/update?commit=true this should work.

Mongodb shard balance not work properly, with a lot of moveChunk error reported

We have a mongoDb cluster with 3 shards, each shard is a replica set contains 3 nodes, the mongoDb version we use is 3.2.6. we have a big database with size about 230G, which contains about 5500 collections. we found that about 2300 collections are not balanced where other 3200 collections are evenly distributed to 3 shards.
below is the result of sh.status (the whole result is too big, i just post part of them):
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("57557345fa5a196a00b7c77a")
}
shards:
{ "_id" : "shard1", "host" : "shard1/10.25.8.151:27018,10.25.8.159:27018" }
{ "_id" : "shard2", "host" : "shard2/10.25.2.6:27018,10.25.8.178:27018" }
{ "_id" : "shard3", "host" : "shard3/10.25.2.19:27018,10.47.102.176:27018" }
active mongoses:
"3.2.6" : 1
balancer:
Currently enabled: yes
Currently running: yes
Balancer lock taken at Sat Sep 03 2016 09:58:58 GMT+0800 (CST) by iZ23vbzyrjiZ:27017:1467949335:-2109714153:Balancer
Collections with active migrations:
bdtt.normal_20131017 started at Sun Sep 18 2016 17:03:11 GMT+0800 (CST)
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
1490 : Failed with error 'aborted', from shard2 to shard3
1490 : Failed with error 'aborted', from shard2 to shard1
14 : Failed with error 'data transfer error', from shard2 to shard1
databases:
{ "_id" : "bdtt", "primary" : "shard2", "partitioned" : true }
bdtt.normal_20160908
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard2 142
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160909
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard1 36
shard2 42
shard3 46
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160910
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard1 34
shard2 32
shard3 32
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160911
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard1 30
shard2 32
shard3 32
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160912
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard2 126
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160913
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard2 118
too many chunks to print, use verbose if you want to force print
}
Collection "normal_20160913" is not balanced, I post the getShardDistribution() result of this collection below:
mongos> db.normal_20160913.getShardDistribution()
Shard shard2 at shard2/10.25.2.6:27018,10.25.8.178:27018
data : 4.77GiB docs : 203776 chunks : 118
estimated data per chunk : 41.43MiB
estimated docs per chunk : 1726
Totals
data : 4.77GiB docs : 203776 chunks : 118
Shard shard2 contains 100% data, 100% docs in cluster, avg obj size on shard : 24KiB
the balancer process is in running status, and the chunksize is default(64M):
mongos> sh.isBalancerRunning()
true
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "chunksize", "value" : NumberLong(64) }
{ "_id" : "balancer", "stopped" : false }
And I found a lot of moveChunk error from mogos log, which might be the reason why some of the collections not well balanced, here is the latest part of them:
2016-09-19T14:25:25.427+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:25:59.620+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:25:59.644+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:35:02.701+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:35:02.728+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:18.232+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:18.256+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:27.101+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:27.112+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:43:41.889+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
I tried use moveChunk command manually, it's returns same error:
mongos> sh.moveChunk("bdtt.normal_20160913", {_id:ObjectId("57d6d107edac9244b6048e65")}, "shard3")
{
"cause" : {
"ok" : 0,
"errmsg" : "Not starting chunk migration because another migration is already in progress",
"code" : 117
},
"code" : 117,
"ok" : 0,
"errmsg" : "move failed"
}
I am not sure if too many collections created which cause migration overwhelmed? each day about 60-80 new collections will created.
I need help here to answer below questions, any hints will be great:
Why some of the collections not balanced, is it related to the big number of newly created collections?
Is there any command can check the processing migration jobs details? I got a lot of error log which shows some migration jog is running, but I can not find which is running.
Answer my own question:
Finally we found the root cause, it's an exactly same issue with this one "MongoDB balancer timeout with delayed replica", caused by abnormal replica set config.
When this issue happens, our replica set configuration as below:
shard1:PRIMARY> rs.conf()
{
"_id" : "shard1",
"version" : 3,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "10.25.8.151:27018",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "10.25.8.159:27018",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "10.25.2.6:37018",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 3,
"host" : "10.47.114.174:27018",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(86400),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5755464f789c6cd79746ad62")
}
}
There are 4 nodes inside the replica set: one primary, one slave, one arbiter and one 24 hours delayed slave. that makes 3 nodes to be majority, since arbiter have no data present, balancer need to wait the delayed slave to satisfy the write concern(make sure the receiver shard have received the chunk).
There are several ways to solve the problem. We just removed the arbiter, the balancer works fine now.
I'm going to speculate but my guess is that your collections are very imbalanced and are currently being balanced by chunk migration (It might take a long time). Hence your manual chunk migration is queued but not executed right away.
Here are a few points that might clarify a bit more:
One chunk at a time: MongoDB chunk migration happens in a queue mechanism and only one chunk at a time are migrated.
Balancer lock: Balancer lock information might give you some more idea of what is being migrated. You should also be able to see log entries is chunk migration in your mongos log files.
One option you have is to do some pre-splitting in your collections. The pre-splitting process essentially configured an empty collection to start balanced and avoid being imbalanced in the first place. Because once they get imbalanced the chunk migration process might not be your friend.
Also, you might want to revisit your shard keys. You are probably doing something wrong with your shard keys that's causing a lot of imbalance.
Plus, your data size doesn't seem too large to me to warrant a sharded configuration. Remember, never do a sharded configuration unless you are forced by your data size/working set size attributes. Because sharding is not free (you are probably already feeling the pain).

Lily with Morphline and HBase

I'm trying to use an tutorial from Cloudera. (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/search_hbase_batch_indexer.html)
I have a code to insert objects in Avro format in HBase and I want to insert them to Solr but I don't get anything.
I have been taking a look to the logs:
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={0Name178721/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
15/06/12 00:45:00 DEBUG indexer.Indexer$RowBasedIndexer: Indexer _default_ will send to Solr 0 adds and 0 deletes
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={1Name134339/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
So, I'm reaing them but I don't know why it isn't indexed anything in Solr.
I guess that my morphline.conf is wrong.
morphlines : [
{
id : morphline1
importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.ngdata.**"]
commands : [
{
extractHBaseCells {
mappings : [
{
inputColumn : "data:avroUser"
outputField : "_attachment_body"
type : "byte[]"
source : value
}
]
}
}
#for avro use with type : "byte[]" in extractHBaseCells mapping above
{ readAvroContainer {} }
{
extractAvroPaths {
flatten : true
paths : {
name : /name
}
}
}
{ logTrace { format : "output record: {}", args : ["#{}"] } }
]
}
]
I wasn't sure if I had to have an "_attachment_body" field in Solr, but it seems that it isn't necessary, so I guess that readAvroContainer or extractAvroPaths are wrong.
I have a "name" field in Solr and my avroUser has a "name" field as well.
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
I have all this things working well here.
I did this steps:
1) Install hbase-solr-indexer as a service:
Fist of all you have to install hbase-solr-indexer.
installing hbase-solr-indexing as a service
Add cloudera repos to yum repos for this.
After that type:
sudo yum install hbase-solr-indexer
2) Criate morphline files:
ok, you did it.
2) Set the Replication scope for every column family and register a hbase-indexer configuration
Using the Lily HBase NRT Indexer Service
$ hbase shell
hbase shell> disable 'record'
hbase shell> alter 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
hbase shell> enable 'record'
Try to follow the others tutorials above. ;)
I was with problems with a NRT solution, but when I followed all that tutorial step by step It worked.
I hope this help someone.

Resources