MongoDB primary shard cluster down - database

I just want to understand a scenario where a primary shard's cluster is down.
I have a MongoDB setup with 4 shards, each running as a replica set:
shard-1 == Server 1 (Primary), Server 2 (Secondary), Server 3 (Secondary)
shard-2 == Server 4 (Primary), Server 5 (Secondary), Server 6 (Secondary)
shard-3 == Server 7 (Primary), Server 8 (Secondary), Server 9 (Secondary)
I have a single database and a single collection, so I assume it is distributed across all the shards as chunks and the balancer is doing its job, right?
In such a case, if shard-1 (the whole replica set) goes down, will traffic flow normally or will it be hampered?

I guess you are mixing up sharding and replication.
What do you mean by "if shard-1 (cluster) goes down"? That means losing 3 servers at once! Is this a probable situation you want to investigate? If so, I would say your cluster design is poor.
Sharding needs to be enabled at the database level and at the collection level. Based on the information you provided, no answer can be given.

Well, I was just describing a scenario where a complete cluster might go down. Say I have a cluster in a single DC and that DC goes down; then I might be in a situation where the complete cluster is unavailable, while the other 2 clusters are in different DCs and are up and running fine.
OK, let's try another way: if the primary replica set member of the primary shard goes down, an election will be held, a new primary will be announced, and everything is back to normal, right?
But only if the cluster has other available nodes; what if the complete cluster goes down?
What is the use of the other 2 sharded clusters?
Or do I have a wrong understanding of sharding?
Also, yes, I have enabled sharding on both my DB and the collection.
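To make the failover part concrete, here is a minimal client-side sketch (assuming hypothetical host names and a Python/PyMongo application, neither of which is mentioned in the question): as long as shard-1 still has nodes that can elect a primary, the driver simply rediscovers the new primary and carries on; if the whole replica set is unreachable, the operation fails rather than being redirected to another shard.
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Connect directly to shard-1's replica set (hypothetical hosts) just to
# illustrate failover; applications normally connect to mongos instead.
client = MongoClient(
    "mongodb://server1:27017,server2:27017,server3:27017/?replicaSet=shard-1",
    serverSelectionTimeoutMS=10000,  # how long to wait for a primary to appear
    retryWrites=True,                # retry once across a failover
)
try:
    # If only the old primary died, the driver waits out the election,
    # finds the new primary, and this write succeeds.
    client.mydb.mycoll.insert_one({"probe": 1})
except ServerSelectionTimeoutError:
    # If the entire replica set (e.g. the whole DC) is down, there is no
    # primary to elect, so the operation fails; it is not moved to another shard.
    print("shard-1 replica set unreachable")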

OK, here is why I was coming at it this way; let me tell you my actual problem. I have 4 shards, all running as replica sets.
When I check sh.status(), I see the output below:
autosplit:
        Currently enabled: yes
balancer:
        Currently enabled: yes
        Currently running: yes
        Failed balancer rounds in last 5 attempts: 0
        Migration Results for the last 24 hours:
                7641 : Failed with error 'aborted', from MCA2 to MCA4
databases:
        { "_id" : "MCA", "primary" : "MCA2", "partitioned" : true, "version" : { "uuid" : UUID("xxxxxxxxxxxx"), "lastMod" : 1 } }
                xxxxxxx
                        shard key: { "xxxxx" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                MCA   1658
                                MCA2  1692
                                MCA3  1675
                                MCA4  1670
So my simple question is: if this primary shard MCA2 goes down, what will happen? Will the collection (xxxxxx) become inaccessible to the application, or what?
Also, as per the terminology, I have a 3-node cluster, so any one node can go down and another becomes primary to serve traffic; so as long as any node in my primary shard is alive, it can serve traffic to the application, right?
If yes, then let's say the complete replica set of the primary shard MCA2 is down; what now?
If no, then what will happen?
(I changed the actual values of the collection and shard key to xxxxxxx for security reasons.)
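For context, here is a hedged PyMongo sketch (hypothetical mongos address; the collection and field names below stand in for the redacted ones) of what application queries see through mongos while the whole MCA2 replica set is unreachable: queries that the shard key routes only to healthy shards can still be served, while queries that need MCA2's chunks, or that are broadcast to every shard, will fail.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

client = MongoClient("mongodb://mongos-host:27017")   # hypothetical mongos
coll = client["MCA"]["mycoll"]                        # "mycoll" stands in for the real name

# Equality on the hashed shard key is routed to the single shard that owns
# that chunk; if that shard is healthy the query works even while MCA2 is down.
try:
    doc = coll.find_one({"shardKeyField": "some-value"})
except PyMongoError as exc:
    # Raised when the targeted chunk lives on the unreachable shard.
    print("targeted query could not be served:", exc)

# A filter without the shard key is scatter-gathered to every shard, so it
# errors (or times out) because MCA2 cannot answer.
try:
    n = coll.count_documents({"someOtherField": "x"})
except PyMongoError as exc:
    print("broadcast query failed:", exc)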

Related

How do I partition MongoDB datasets?

I'm stuck with MongoDB sharding and I need your help!
My first question is: how do I make my database show "partitioned" : true in sh.status()?
I've worked with the shard servers and mongos, but I need to partition my documents based on datetime, so I used tags and zone ranges, but I couldn't get this option to become true!
Here is the option I'm talking about (the "partitioned" field shown by sh.status()).
I tried sh.shardCollection("db.coll", partitioned:true) but it doesn't work.
Create the index on the field you would like to shard/partition on:
use <database>
db.<collection>.createIndex({ "<shard key field>": 1 })
Enable sharding (partitioning) on the database:
sh.enableSharding("<database>")
Shard the collection:
sh.shardCollection("<database>.<collection>", { "<shard key field>" : 1, ... } )
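For completeness, the same three steps can be issued from a driver as well; a minimal PyMongo sketch (hypothetical host, database, collection, and field names) could look like the following. Once enableSharding has run, sh.status() should report "partitioned" : true for that database.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")   # connect to mongos, not a shard

# 1. Create the index that will back the shard key.
client["mydb"]["mycoll"].create_index([("shardKeyField", 1)])

# 2. Enable sharding (partitioning) on the database.
client.admin.command("enableSharding", "mydb")

# 3. Shard the collection on that key.
client.admin.command("shardCollection", "mydb.mycoll", key={"shardKeyField": 1})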

Using distinguishedName in an LDAP query

I have a requirement where I need to run a query like the one below and fetch 2-3 attributes for all entities satisfying it. There would be around 100 distinguishedName values in a single query. As I see in the Microsoft documentation, distinguishedName is not indexed, so I suspect this might cause performance issues.
Does anybody know if this would indeed cause performance issues? Apart from the LDAP filter below, I would obviously also use SUBTREE scope.
(|(distinguishedName=<DN 1 goes here>)(distinguishedName=<DN 2 goes here>))
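If it helps, here is a sketch of how such a filter could be assembled and run with the Python ldap3 library (server, credentials, and DNs below are hypothetical); the one detail worth keeping is that each DN value should be escaped before being placed into the filter.
from ldap3 import Server, Connection, SUBTREE
from ldap3.utils.conv import escape_filter_chars

conn = Connection(Server("ldap://ad.example.com"),
                  user="EXAMPLE\\binduser", password="secret", auto_bind=True)

dns = [
    "CN=user-1,CN=large-test,CN=Users,DC=example,DC=com",
    "CN=user-2,CN=large-test,CN=Users,DC=example,DC=com",
]
# OR together one (distinguishedName=...) clause per DN, escaping filter
# metacharacters in each value.
ldap_filter = "(|" + "".join(
    "(distinguishedName=%s)" % escape_filter_chars(dn) for dn in dns) + ")"

conn.search("DC=example,DC=com", ldap_filter,
            search_scope=SUBTREE, attributes=["mail", "objectClass"])
for entry in conn.entries:
    print(entry.entry_dn, entry.mail)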
Edit 1:
I tried this in my test Active Directory which has ~6k entries.
Internal event: A client issued a search operation with the following options.
Starting node:
DC=example,DC=com
Filter:
( |
(distinguishedName=CN=user-1,CN=large-test,CN=Users,DC=example,DC=com)
(distinguishedName=CN=user-2,CN=large-test,CN=Users,DC=example,DC=com)
(distinguishedName=CN=user-3,CN=large-test,CN=Users,DC=example,DC=com)
(distinguishedName=CN=group1,CN=large-test,CN=Groups,DC=example,DC=com)
)
Search scope:
subtree
Attribute selection:
mail,objectClass
Server controls:
Visited entries:
4
Returned entries:
4
Used indexes:
idx_distinguishedName:4:N;idx_distinguishedName:1:N;idx_distinguishedName:1:N;idx_distinguishedName:1:N;
Pages referenced:
123
Pages read from disk:
0
From the results it looks like it only visited the 4 entries I searched for, using some indexes. I checked with the Schema snap-in tool (just to be sure) and it doesn't show an index on distinguishedName, so I'm not sure how it's using these indexes.
Microsoft Active Directory stores the group memberships at the entry level, so you could use this to fetch the email attribute.
E.g.
ldapsearch .... -b SEARCH_BASE "(|(memberOf=GROUP_DN_1)(memberOf=GROUP_DN_2)...)" mail
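The same memberOf-based search, sketched with the Python ldap3 library (server, credentials, and group DNs are hypothetical):
from ldap3 import Server, Connection, SUBTREE

conn = Connection(Server("ldap://ad.example.com"),
                  user="EXAMPLE\\binduser", password="secret", auto_bind=True)
conn.search(
    "DC=example,DC=com",
    "(|(memberOf=CN=group1,CN=Groups,DC=example,DC=com)"
    "(memberOf=CN=group2,CN=Groups,DC=example,DC=com))",
    search_scope=SUBTREE,
    attributes=["mail"],
)
for entry in conn.entries:
    print(entry.entry_dn, entry.mail)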

MongoDB hidden node still receiving connections

I'm not sure if this question has been asked before or if the following behavior of MongoDB is normal. Searching online turned up no results for this scenario.
Initially, we had a 3 node deployment, 1 Primary, 1 Secondary, and 1 Arbiter.
We wanted to add a ReadOnly replica to the cluster and remove the Arbiter node as well in the process. We added the following to the new node:
priority: 0
hidden: true
votes: 1
And removed the Arbiter in the same reconfiguration process so we always have 3 voting members, leaving us with 1 Primary, 1 Secondary, and 1 read-only node.
The complete process went through smoothly; however, we still see connections to the read-only replica.
But when checking via db.currentOp(), no queries show up.
Based on the documentation on MongoDB website,
Hidden members are part of a replica set but cannot become primary and are invisible to client applications.
Is there a way to investigate why there are connections coming in? And if this is normal behavior?
EDIT: (for further clarification)
Assuming the following:
MongoDB A (Primary): 192.168.1.50
MongoDB B (Secondary): 192.168.1.51
MongoDB C (Hidden): 192.168.1.52
Client A: 192.168.1.60
Client B: 192.168.1.61
In the logs, we see the following:
2018-03-12T07:19:11.607+0000 I ACCESS [conn119719] Successfully authenticated as principal SOMEUSER on SOMEDB
2018-03-12T07:19:11.607+0000 I NETWORK [conn119719] end connection 192.168.1.60 (2 connections now open)
2018-03-12T07:19:17.087+0000 I NETWORK [listener] connection accepted from 192.168.1.60:47806 #119720 (3 connections now open)
2018-03-12T07:19:17.371+0000 I ACCESS [conn119720] Successfully authenticated as principal SOMEUSER on SOMEDB
So if the other MongoDB instances were connecting, that would be fine, but my question is regarding why the clients are able to connect even when the hidden option is true and if that behavior is normal.
Thank You
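One way to investigate, sketched below with PyMongo (address and credentials are hypothetical; $currentOp needs MongoDB 3.6+ and suitable privileges), is to connect directly to the hidden member and list its open connections: the client address and driver metadata usually make it clear whether they come from other replica-set members, monitoring, or applications that were pointed at the host directly.
from pymongo import MongoClient

# Connect straight to the hidden member, bypassing replica-set discovery.
client = MongoClient("mongodb://admin:secret@192.168.1.52:27017/?directConnection=true")

# Total open connections as the server sees them.
print(client.admin.command("serverStatus")["connections"])

# List every connection, including idle ones, with its client address
# and driver/application metadata.
for op in client.admin.aggregate([
    {"$currentOp": {"allUsers": True, "idleConnections": True}},
    {"$project": {"client": 1, "appName": 1, "clientMetadata.driver": 1}},
]):
    print(op.get("client"), op.get("appName"))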

Merge vs Synchronize databases

I know, I know... I'm not the first to ask this question, but I have gone through numerous posts (on SO as well) and I'm still dissatisfied. I need to merge tables in two databases that have identical schema/structure; see the example below. I used a trial of Redgate's SQL Data Compare, but it seems that software only synchronizes Database B to look like A, and often clobbers the data in Database B. If you know any other software that can do a "true DB merge" effectively (note I DO have foreign key relationships set up), then fine. Otherwise, how can I do this quickly and reliably in SQL?
Database A:
PK   Rank
1    Private
5    Sergeant

ID   Name     RankID
54   Joe      1
60   Frank    1
63   Robert   5

Database B:
PK   Rank
2    Private
3    Corporal
4    Sergeant
6    Lieutenant

ID   Name     RankID
40   Moe      2
45   Steve    2
67   Max      3
78   Tom      4
80   Peter    6

Ideal Merged Database:
PK   Rank
1    Private
5    Sergeant
10   Corporal
11   Lieutenant

ID    Name     RankID
54    Joe      1
60    Frank    1
63    Robert   5
100   Moe      1
101   Steve    1
102   Max      10
103   Tom      5
104   Peter    11
Sorry for the formatting (had a rough time aligning columns). If it's still not clear what I'm looking for, please let me know
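As a sketch of the usual hand-rolled approach (shown here in Python with SQLite purely so the example is self-contained; the table and column names come from the question and the same idea carries over to SQL Server): insert the lookup rows from B that A does not already have, remember the old-key-to-new-key mapping, then insert B's child rows through that mapping so the foreign keys stay consistent.
import sqlite3

a = sqlite3.connect(":memory:")            # stands in for Database A
a.executescript("""
CREATE TABLE Rank (PK INTEGER PRIMARY KEY, Description TEXT UNIQUE);
CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT,
                     RankID INTEGER REFERENCES Rank(PK));
INSERT INTO Rank VALUES (1,'Private'),(5,'Sergeant');
INSERT INTO Person VALUES (54,'Joe',1),(60,'Frank',1),(63,'Robert',5);
""")

# Rows coming from Database B (normally read from the other server).
b_ranks  = [(2, 'Private'), (3, 'Corporal'), (4, 'Sergeant'), (6, 'Lieutenant')]
b_people = [(40, 'Moe', 2), (45, 'Steve', 2), (67, 'Max', 3),
            (78, 'Tom', 4), (80, 'Peter', 6)]

rank_map = {}                              # B.Rank.PK -> A.Rank.PK
for old_pk, desc in b_ranks:
    row = a.execute("SELECT PK FROM Rank WHERE Description = ?", (desc,)).fetchone()
    if row is None:                        # rank only exists in B: insert it, take the new key
        rank_map[old_pk] = a.execute(
            "INSERT INTO Rank (Description) VALUES (?)", (desc,)).lastrowid
    else:                                  # rank already in A: reuse A's key
        rank_map[old_pk] = row[0]

for _old_id, name, old_rank in b_people:
    # Let A hand out fresh IDs and translate the FK through the map.
    a.execute("INSERT INTO Person (Name, RankID) VALUES (?, ?)",
              (name, rank_map[old_rank]))

for row in a.execute("SELECT * FROM Person ORDER BY ID"):
    print(row)
The fresh surrogate IDs will not literally be 100-104 as in the example above, but every row from B ends up in A with its RankID resolved to A's keys.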

App Engine query in admin datastore viewer returning different results than programmatic query

I'm flummoxed.
I noticed today that some data I thought should be present in my production appengine app wasn't showing up. I connected to the app via the remote console and ran the queries manually. Sure enough it looked like I only had 15 of the 101 rows I was expecting to see.
Then I went to my admin console at appengine.google.com and fired up the datastore viewer with the following query:
SELECT * FROM Assignment where game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')
The result I see is the first page of 20 results. I page through those results and am able to see all 101 entities. HOORAY! My data is still there. BUT why then can't I access it via the db API? (NOTE: I've already tried clearing memcache via the memcache viewer, even though this particular query isn't manually memcached.)
From the remote console:
> from google.appengine.ext.db import GqlQuery
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')").count()
15
The remote console agrees with the app itself, which only seems to be able to see 15 of the expected 101 rows.
What gives?
UPDATE:
I suspect this might be an indexing issue. If I issue get_by_key_name for one of the missing rows, it subsequently shows up in db api queries.
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')").count()
15
> entities.Assignment.get_by_key_name('201212-assignment-135.9')
<entities.Assignment object at 0xa11eb6c>
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')").count()
16
So should I (or can I) rebuild my indexes to remedy this problem?
UPDATE #2:
I attempted to build a perfect index for this query, and have just verified that, even when the query does use the just-built index (via query.index_list()), the results are still limited to a small subset of the items available via the datastore viewer. Infuriatingly, it's actually a different subset than was available with the previous index (20 items vs. 15 items), so adding an additional filter term results in an additional 5 rows returned. So dumb.
All indexes claim to be "serving" so there shouldn't be any reason that the indexes are this far off.
UPDATE #3:
Sometimes, using my new index, I'll get the right answer:
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb') and user = 'zee'").count()
101
However if I issue this query 10 times, it comes back with the 'bad' results about half the time:
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb') and user = 'zee'").count()
16
So maybe it's an issue of a bad/behind Bigtable replica that I'm hitting half the time, or something else completely opaque that we won't get an answer to (the App Engine status page does list a service disruption today), but I have a feeling this will be fixed on its own. Will update again if it does.
FINAL UPDATE:
As I suspected, when I woke up this morning my app (and manual queries) now see a consistent, correct view of the data. Would still love an answer as to why this happened, but until I get that I'm going to chalk it up to internal Google bigtable weirdness.
I filed this issue against appengine to see if I can get an answer from someone in the know.
For HRD applications, this is working as intended. App Engine High Replication Datastore (HRD) stores your data synchronously in multiple datacenters. However, the delay from the time a write is committed until it becomes visible in all datacenters means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. [1]
In your specific case, the discrepancy between the results from your application and the Admin Console Datastore Viewer is just because they most likely are reading from different Datastore servers with different consistency.
If you require a consistent view of your data, I advise taking a closer look into the article "Structuring Data for Strong Consistency"
[1] https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency
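As a hedged illustration of what that article suggests (it assumes the data model can be changed so that each Assignment is created with its Game as parent): entities that share an entity group can be read back with an ancestor query, which is strongly consistent.
from google.appengine.ext import db

class Assignment(db.Model):
    player = db.ReferenceProperty()
    user = db.StringProperty()

game_key = db.Key.from_path('Game', '201212-foo')

# Write: put the assignment into the game's entity group.
Assignment(parent=game_key, user='zee').put()

# Read: an ancestor query is strongly consistent, so it sees the write above.
q = db.GqlQuery(
    "SELECT * FROM Assignment WHERE ANCESTOR IS :1 AND user = :2",
    game_key, 'zee')
print(q.count())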
