E11000 (DuplicateKey) error when using a partial multikey unique index

Consider a collection with the following documents:
{
name: "John Doe",
emails: [
{
value: "some#domain.com",
isValid: true,
isPreferred: true
}
]
},
{
name: "John Doe",
emails: [
{
value: "john.doe#gmail.com",
isValid: false,
isPreferred: false
},
{
value: "john.doe#domain.com",
isValid: true,
isPreferred: true
}
]
}
There should be no users with the same valid and preferred emails, so there is a unique index for that:
db.users.createIndex( { "emails.value": 1 }, { name: "loginEmail", unique: true, partialFilterExpression: { "emails.isValid": true, "emails.isPreferred": true } } )
Adding the following email to the first document triggers the unique constraint violation:
{
name: "John Doe",
emails: [
{
value: "john.doe#gmail.com",
isValid: false,
isPreferred: false
}
]
}
Caused by: com.mongodb.MongoCommandException: Command failed with
error 11000 (DuplicateKey): 'E11000 duplicate key error collection:
profiles.users index: loginEmail dup key: { emails.value:
"john.doe#gmail.com", emails.isValid: false, emails.isPreferred: false
}' on server profiles-db-mongodb.dev:27017. The full response is
{"ok": 0.0, "errmsg": "E11000 duplicate key error collection:
profiles.users index: loginEmail dup key: { emails.value:
"john.doe#gmail.com", emails.isValid: false, emails.isPreferred:
false }", "code": 11000, "codeName": "DuplicateKey", "keyPattern":
{"emails.value": 1, "emails.isValid": 1, "emails.isPreferred": 1},
"keyValue": {"emails.value": "john.doe#gmail.com", "emails.isValid":
false, "emails.isPreferred": false}}
As far as I understand, this happens because the filter expression is applied to each top-level document in the collection, not to the embedded documents. So, although it is somewhat counterintuitive and unexpected, the index behaves as documented.
My question is how can I ensure partial uniqueness without having false negatives?

TL;DR: You can't.
Let's first understand why it happens; then we can see what can be done. The problem arises from a combination of two Mongo features.
1. Dot notation. Dot notation lets you query subdocuments in arrays with ease ("emails.isPreferred": true). However, each dotted condition is evaluated on its own and is satisfied if any array element matches it. When several conditions must hold for the same subdocument, as in your case, you need something like $elemMatch; sadly, partialFilterExpression is quite restrictive and does not support it.
This means that even a document with emails such as:
{
"_id": ObjectId("5f106c0e823eea49427eea64"),
"name": "John Doe",
"emails": [
{
"value": "john.doe#gmail.com",
"isValid": true,
"isPreferred": false
},
{
"value": "john.doe#domain.com",
"isValid": false,
"isPreferred": true
}
]
}
will be indexed, because one element satisfies isValid: true and another satisfies isPreferred: true. So we end up with some extra indexed documents in the collection. Apart from needlessly increasing the index size, you might still hope this works, but it doesn't, because of point 2.
2. Multikey indexes:
MongoDB uses multikey indexes to index the content stored in arrays. ... , MongoDB creates separate index entries for every element of the array.
So when you create an index on an array, or on any field of a subdocument in an array, Mongo "flattens" the array and creates a separate index entry for each element. In this case, once a document passes the partial filter, every email in its array gets an index entry, including the invalid and non-preferred ones, and the unique constraint applies to all of those entries.
So, because of these "features" and the restrictions on partial filter syntax, we can't really achieve what you want.
So what can you do? You're probably already thinking of possible workarounds. A simple one is to maintain an extra field that contains only the emails that are both valid and preferred; a unique sparse index on that field will then do the trick.
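To make the two points concrete, here is a simplified in-memory model in plain JavaScript (not MongoDB internals, and not MongoDB code) of the index { "emails.value": 1 } with the partial filter from the question. The function names are hypothetical; the data is the question's second document and the first document after the new email was added.

```javascript
// Simplified model of a unique partial index on "emails.value" with
// partialFilterExpression { "emails.isValid": true, "emails.isPreferred": true }.

// Point 1, dot notation: each condition is checked on its own and is satisfied
// if ANY array element matches it, so two different elements can pass the filter.
function passesPartialFilter(doc) {
  return (
    doc.emails.some((e) => e.isValid === true) &&
    doc.emails.some((e) => e.isPreferred === true)
  );
}

// What $elemMatch would do (not accepted in partialFilterExpression):
// a single element must satisfy both conditions.
function passesElemMatch(doc) {
  return doc.emails.some((e) => e.isValid === true && e.isPreferred === true);
}

// Point 2, multikey: a document that passes the filter gets one entry per array
// element, including invalid/non-preferred emails. Repeats inside one document
// are collapsed, because uniqueness is only enforced across documents.
function indexKeys(doc) {
  if (!passesPartialFilter(doc)) return [];
  return [...new Set(doc.emails.map((e) => e.value))];
}

// Returns the first key that would raise E11000 for `doc`, or null.
function findDuplicateKey(otherDocs, doc) {
  const taken = new Set(otherDocs.flatMap((d) => indexKeys(d)));
  const dup = indexKeys(doc).find((key) => taken.has(key));
  return dup === undefined ? null : dup;
}

const second = {
  name: "John Doe",
  emails: [
    { value: "john.doe#gmail.com", isValid: false, isPreferred: false },
    { value: "john.doe#domain.com", isValid: true, isPreferred: true },
  ],
};
const firstWithNewEmail = {
  name: "John Doe",
  emails: [
    { value: "some#domain.com", isValid: true, isPreferred: true },
    { value: "john.doe#gmail.com", isValid: false, isPreferred: false },
  ],
};

console.log(findDuplicateKey([second], firstWithNewEmail)); // john.doe#gmail.com
```

Both documents pass the filter thanks to their valid and preferred email, so the invalid gmail address is indexed in both and collides.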
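A minimal sketch of that workaround, assuming at most one valid and preferred email per user and a hypothetical top-level field named preferredEmail (not from the original). The idea is to recompute the field whenever emails changes and send it in the same update, so the unique index is checked atomically with the change.

```javascript
// Hypothetical helper: derive the value of the extra field from the emails array.
// Returns undefined when no email is both valid and preferred.
function preferredEmailOf(user) {
  const match = (user.emails || []).find(
    (e) => e.isValid === true && e.isPreferred === true
  );
  return match ? match.value : undefined;
}

// Hypothetical helper: the part of the update that keeps the field in sync.
// Unsetting the field when there is no match keeps the document out of a
// sparse index, so many such users can coexist.
function preferredEmailUpdate(user) {
  const value = preferredEmailOf(user);
  return value === undefined
    ? { $unset: { preferredEmail: "" } }
    : { $set: { preferredEmail: value } };
}

// Matching index (mongo shell), assuming the field name above:
//   db.users.createIndex({ preferredEmail: 1 }, { unique: true, sparse: true })

const user = {
  name: "John Doe",
  emails: [
    { value: "some#domain.com", isValid: true, isPreferred: true },
    { value: "john.doe#gmail.com", isValid: false, isPreferred: false },
  ],
};
console.log(preferredEmailUpdate(user)); // { '$set': { preferredEmail: 'some#domain.com' } }
```

Because the gmail address never reaches preferredEmail, it no longer takes part in the uniqueness check.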

Related

Set Index Key to Output Field Mapping

In my index, I have a field called id. During my enrichment pipeline I compute a value called /document/documentId, which I'm trying to map to the id field. However, this mapping does not seem to work: the id is always some long value that looks like a hash. All my other output field mappings work as expected.
Portion of the Index:
{
"name": "id",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
Portion of the Indexer:
"outputFieldMappings": [
{
"sourceFieldName": "/document/documentId",
"targetFieldName": "id"
}
]
Expected Value: 4b160942-050f-42b3-bbbb-f4531eb4ad7c
Actual Value: aHR0cHM6Ly9zdGRvY3VtZW50c2Rldi5ibG9iLmNvcmUud2luZG93cy5uZXQvMDNiZTBmMzEtNGMyZC00NDRjLTkzOTQtODJkZDY2YTc4MjNmL29yaWdpbmFscy80YjE2MDk0Mi0wNTBmLTQyYjMtYmJiYi1mNDUzMWViNGFkN2MucGRm0
Any thoughts on how to fix this would be much appreciated!
TL;DR: You can't use output field mappings for keys; you can only use source fields.
According to Microsoft, it's not possible to set the document key through an output field mapping. Apparently this causes issues when documents are deleted, so the key has to come straight from the source document.
I ended up using a mapping function in the fieldMappings.
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_name",
"targetFieldName": "filename"
},
{
"sourceFieldName": "metadata_storage_name",
"targetFieldName": "id",
"mappingFunction": {
"name": "extractTokenAtPosition",
"parameters": {
"delimiter": ".",
"position": 0
}
}
}
]
Since my file names look like 4b160942-050f-42b3-bbbb-f4531eb4ad7c.pdf, this ends up mapping correctly to my id.
You can use a regular field mapping rather than an output field mapping. If you created your indexer in the Azure portal, your key (which is "id", since key is true in your index definition of "id" above) was probably base64-encoded (that option is checked by default). You will need to base64-decode it to get the original value, or you can store a second copy of the original value without encoding it (the key itself still needs to be encoded). Here's how to do the latter; this can replace your output field mapping:
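For illustration, a rough JavaScript stand-in (an assumption, not the indexer's code) for what the extractTokenAtPosition mapping with delimiter "." and position 0 does to metadata_storage_name: split on the delimiter and keep the token at the given position. extractToken is a hypothetical name.

```javascript
// Split `value` on `delimiter` and return the token at `position`
// (undefined if there is no such token).
function extractToken(value, delimiter, position) {
  return value.split(delimiter)[position];
}

const id = extractToken("4b160942-050f-42b3-bbbb-f4531eb4ad7c.pdf", ".", 0);
console.log(id); // 4b160942-050f-42b3-bbbb-f4531eb4ad7c
```

This relies on the GUID being everything before the first dot in the file name.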
"fieldMappings": [
{
"sourceFieldName": "documentId",
"targetFieldName": "documentId"
},
{
"sourceFieldName": "documentId",
"targetFieldName": "id",
"mappingFunction": {
"name": "base64Encode"
}
}
]
Note that you will also need to add a documentId field to your index, since you are also storing the value in its original format:
{
"name": "documentId",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
Alternatively, you could just base64-encode the id value when storing it and decode it when retrieving it. A base64-encoded key value is safe to use as an Azure Cognitive Search document key. Check out https://learn.microsoft.com/azure/search/search-indexer-field-mappings for more info.
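A minimal decoding sketch in Node.js, assuming the key was produced by the default base64Encode behaviour in the HttpServerUtility.UrlTokenEncode style: URL-safe alphabet ('-' and '_'), '=' padding stripped, and one trailing digit giving how many '=' were stripped. decodeUrlTokenKey is a hypothetical helper; the input is the "Actual Value" from the question.

```javascript
// Reverse the UrlTokenEncode-style encoding (assumed, see above).
function decodeUrlTokenKey(key) {
  const padCount = Number(key.slice(-1)); // trailing digit = number of '=' removed
  const body = key
    .slice(0, -1)
    .replace(/-/g, "+")
    .replace(/_/g, "/");
  return Buffer.from(body + "=".repeat(padCount), "base64").toString("utf8");
}

const key =
  "aHR0cHM6Ly9zdGRvY3VtZW50c2Rldi5ibG9iLmNvcmUud2luZG93cy5uZXQvMDNiZTBmMzEtNGMyZC00NDRjLTkzOTQtODJkZDY2YTc4MjNmL29yaWdpbmFscy80YjE2MDk0Mi0wNTBmLTQyYjMtYmJiYi1mNDUzMWViNGFkN2MucGRm0";
const url = decodeUrlTokenKey(key);
console.log(url);
// https://stdocumentsdev.blob.core.windows.net/03be0f31-4c2d-444c-9394-82dd66a7823f/originals/4b160942-050f-42b3-bbbb-f4531eb4ad7c.pdf

// The expected id is the blob's file name without its extension.
const fileName = url.split("/").pop();
console.log(fileName.split(".")[0]); // 4b160942-050f-42b3-bbbb-f4531eb4ad7c
```

The decoded value shows why the key looked like a hash: it is the encoded blob URL (metadata_storage_path), not documentId.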

MongoDB Unique Index issue in array of subdocuments

I have a document like this:
{
_id : ObjectId(),
title: "",
items: [
{
"itemId" : 1234678,
}
]
}
itemId is a unique index created like this:
db.allItems.createIndex( { "items.itemId" : 1 }, { unique: true});
Everything works fine until I set the items array (rather than pushing a single item); in that case the unique index does not work. The following data in an update operation (using $set) does not throw an error and succeeds, which it must not: it creates the subdocuments without any unique-key error.
items: [
{
itemId: 1234678
},
{
itemId: 1234678
}
]
I expect MongoDB to throw an error saying that itemId is not unique.
In MongoDB, index uniqueness applies across documents, not within a nested array of a single document.
If you try to insert a new document with:
items: [
{
"itemId" : 1234678,
},
...
]
MongoDB will throw an E11000 duplicate key error, because another document already contains that itemId.
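A simplified in-memory model (an illustration, not MongoDB internals) of a unique index on "items.itemId": each document contributes the set of its itemIds, so repeats inside one array are fine and only a clash with another document fails. Since the index cannot catch the in-array case, the sketch also includes a hypothetical application-side check to run before the $set.

```javascript
// Distinct index keys contributed by one document.
function keysOf(doc) {
  return new Set(doc.items.map((item) => item.itemId));
}

// True if `doc` shares an itemId with any OTHER document.
function violatesUniqueIndex(otherDocs, doc) {
  const taken = new Set(otherDocs.flatMap((d) => [...keysOf(d)]));
  return [...keysOf(doc)].some((k) => taken.has(k));
}

// Hypothetical guard for the case the index ignores: repeats within one array.
function hasDuplicateItemIds(items) {
  return new Set(items.map((item) => item.itemId)).size !== items.length;
}

const items = [{ itemId: 1234678 }, { itemId: 1234678 }];
console.log(violatesUniqueIndex([], { items })); // false
console.log(violatesUniqueIndex([{ items: [{ itemId: 1234678 }] }], { items })); // true
console.log(hasDuplicateItemIds(items)); // true
```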

Mongo - add field if object in array of sub docs has value

Details
I'm developing a survey application with Express and am struggling with retrieving some data.
The case:
You can get all surveys via "GET /surveys". If the user has voted for a survey, that survey document has to contain hasVoted (a Boolean) and optionsVote (a mongoose Map). (SurveySchema is below.)
You can vote for a survey via "POST /surveys/vote".
You can see the results of a survey only if you have voted for it.
const mongoose = require('mongoose');
const SurveySchema = new mongoose.Schema({
question: {
type: mongoose.Schema.Types.String,
required: true,
},
options: {
type: [{
type: mongoose.Schema.Types.String,
required: true,
}]
},
optionsVote: {
type: mongoose.Schema.Types.Map,
of: mongoose.Schema.Types.Number,
},
votesCount: {
type: mongoose.Schema.Types.Number,
},
votes: {
type: [{
user: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
option: mongoose.Schema.Types.Number,
}]
},
})
Target:
So the question is: how do I add the fields hasVoted and optionsVote only when the votes array contains a vote subdocument where user === req.user.id?
I hope the idea is clear. If you know how to change the schema to achieve the desired result, I'm open to it!
Example:
Data:
[{
id:"surveyId1",
question:"Question",
options:["op1","op2"],
votes:[{user:"userId1", option:0}],
votesCount:1,
optionsVote:{"0":1,"1":0}
},{
id:"surveyId2",
question:"Question",
options:["op1","op2"],
votes:[{user:"userId2", option:0}],
votesCount:1,
optionsVote:{"0":1,"1":0}
}]
Route handler: assume req.user.id = 'userId1'; the handler then runs the desired query.
The expected result:
[{ // Voted for this survey
id:"surveyId1",
question:"Question",
options:["op1","op2"],
votes:[{user:"userId1", option:0}],
votesCount:1,
optionsVote:{"0":1,"1":0},
hasVoted:true,
},{ // Not voted for this survey
id:"surveyId2",
question:"Question",
options:["op1","op2"],
votesCount:1,
}]
In MongoDB, you can search by a subdocument field as follows:
//Mongodb query to search for survey filled by a user
db.survey.find({ 'votes.user': myUserId })
Since this returns only the surveys the user has voted in, do you really need the hasVoted field?
For the optionsVote field, I would first change its schema to {option: "a", count: 1}. You can then choose either of the following approaches:
A. Update the optionsVote field at write time, incrementing the count of the voted option when you POST /survey/vote.
B. Calculate optionsVote from the votes entries at read time, when you GET /survey. You can do this with an aggregation:
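If you do still want the exact response shape shown in the question, a minimal sketch is to post-process the surveys in the GET /surveys route handler. annotateSurveys is a hypothetical helper; it assumes the surveys were loaded as plain objects (for example with Mongoose's .lean()) and compares vote.user as a string so that ObjectIds and string ids both work.

```javascript
// Voters get hasVoted plus the results; everyone else gets neither.
function annotateSurveys(surveys, userId) {
  return surveys.map((survey) => {
    const hasVoted = (survey.votes || []).some(
      (vote) => String(vote.user) === String(userId)
    );
    if (hasVoted) {
      return { ...survey, hasVoted: true };
    }
    // Hide the individual votes and the tallies from users who have not voted.
    const { votes, optionsVote, ...rest } = survey;
    return rest;
  });
}

const surveys = [
  { id: "surveyId1", question: "Question", options: ["op1", "op2"],
    votes: [{ user: "userId1", option: 0 }], votesCount: 1, optionsVote: { 0: 1, 1: 0 } },
  { id: "surveyId2", question: "Question", options: ["op1", "op2"],
    votes: [{ user: "userId2", option: 0 }], votesCount: 1, optionsVote: { 0: 1, 1: 0 } },
];
console.log(annotateSurveys(surveys, "userId1"));
```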
//Mongodb query to get optionsVote: [{option: "a", count: 1}] from votes: [{user: "x", option: "a"}]
db.survey.aggregate([
{ $unwind: "$votes" },
{ $group: {
"_id": { "id": "$_id", "option": "$votes.option" },
optionCount: { $sum: 1 },
votes: { $push: "$votes" }
}
},
{ $group: {
"_id": "$_id.id",
optionsVote: { $push: { option: "$_id.option", count: "$optionCount" } },
votes: { $push: "$votes" }
}
},
// votes is now an array of arrays (one per option), so flatten it
{ $project: {
optionsVote: 1,
votes: { $reduce: { input: "$votes", initialValue: [], in: { $concatArrays: ["$$value", "$$this"] } } }
}
}
])
//WARNING: I haven't tested this query; it only shows the approach: group by votes.option to count the votes for each option in each document, then build the optionsVote field by using $push to collect every option with its count. Note that $unwind drops surveys that have no votes.
I recommend approach A, because I assume POST operations are much less frequent than GET operations, and it's easier to implement. That said, keeping the query from B handy will help with sanity checks.
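A sketch of approach A that keeps the schema from the question (optionsVote as a Map keyed by option index) rather than the {option, count} shape suggested above. buildVoteUpdate, surveyId, option and the Survey model are hypothetical names, and the "votes.user": { $ne: userId } condition is an added assumption that makes a repeat vote match nothing instead of being counted twice.

```javascript
// Build the filter and update document for a single vote.
function buildVoteUpdate(surveyId, userId, option) {
  const filter = { _id: surveyId, "votes.user": { $ne: userId } };
  const update = {
    $push: { votes: { user: userId, option } },
    // Bump the total and the chosen option's counter in the same atomic update.
    $inc: { votesCount: 1, [`optionsVote.${option}`]: 1 },
  };
  return { filter, update };
}

// Possible use in the POST /surveys/vote handler (Mongoose):
//   const { filter, update } = buildVoteUpdate(surveyId, req.user.id, option);
//   const result = await Survey.updateOne(filter, update);
//   // result.modifiedCount === 0 (recent Mongoose): survey not found or already voted

console.log(JSON.stringify(buildVoteUpdate("surveyId1", "userId1", 0)));
```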

count number of rows in cloudant in response

I have the response below from my map/reduce view.
Now I want to count the number of rows in the response. Can anyone help me do this in Cloudant? I need the response to contain the total count of distinct correlationid values in a period.
{
rows: [
{
key: [
"201705",
"aws-60826346-"
],
value: null
},
{
key: [
"201705",
"aws-60826348802-"
],
value: null
},
{
key: [
"201705",
"aws-las97628elb"
],
value: null
},
{
key: [
"201705",
"aws-ve-test"
],
value: null
},
{
key: [
"201705",
"aws-6032dcbce"
],
value: null
},
{
key: [
"201705",
"aws-60826348831d"
],
value: null
},
{
key: [
"201705",
"aws-608263488833926e"
],
value: null
},
{
key: [
"201705",
"aws-608263488a74f"
],
value: null
}
]
}
You need to implement a slightly obscure concept called "chained map-reduce" to accomplish this. You can't do this in the Cloudant administrative GUI, so you'll have to write your design document by hand.
Have your map/reduce emit an array as the key: the first element is the month and the second is your correlationid. The value should be 1. Then specify the built-in _count as the reduce function.
Now you need to add the chaining part. Chaining means automatically copying the result of a map/reduce into a new database, on which you can then run another map/reduce, thereby creating a chain of map/reduces.
Here's a tiny sample database using your example:
https://rajsingh.cloudant.com/so44106569/_all_docs?include_docs=true&limit=200
Here's the design document containing the map/reduce, along with the dbcopy command that updates a new database (in this case called sob44106569) with the results of the view called view:
{
"_id": "_design/ddoc",
"_rev": "11-88ff7d977dfff81a05c50b13d854a78f",
"options": {
"epi": {
"dbcopy":
{
"view": "sob44106569"
}
}
},
"language": "javascript",
"views": {
"view": {
"reduce": "_count",
"map": "function (doc) {\n emit([doc.month, doc.machine], 1);\n}"
}
}
}
Here's the result of the map function (no reduce) showing 10 rows. Notice that there are two documents with month 201705 and machine aws-6032dcbce:
https://rajsingh.cloudant.com/so44106569/_design/ddoc/_view/view?limit=200&reduce=false
If you just do the built-in _count reduce on this view at group_level=1, you'll get a value of 9 for 201705, which is wrong for your purposes because you want to count that aws-6032dcbce only once, even though it shows up in the data twice:
https://rajsingh.cloudant.com/so44106569/_design/ddoc/_view/view?limit=200&reduce=true&group=true&group_level=1
So let's take a quick look at the map/reduce at group_level=2. This is what gets copied to the new database:
https://rajsingh.cloudant.com/so44106569/_design/ddoc/_view/view?limit=200&reduce=true&group=true&group_level=2
Here you see that aws-6032dcbce only shows up once (but with value=2), so this is a useful view. The dbcopy part of our map/reduce creates the database sob44106569 based on this view. Let's look at that:
https://rajsingh.cloudant.com/sob44106569/_all_docs?include_docs=true
Now we can run a very simple map/reduce on that database, emitting the month and machine again (they are now the two elements of doc.key, so they are accessed differently), but this time the repeated values for machine have already been "reduced" away.
function (doc) {
if (doc.key && doc.key.length == 2 )
emit(doc.key[0], doc.key[1]);
}
And finally, here's the count of distinct machines, showing the desired value of 8 for 201705:
https://rajsingh.cloudant.com/sob44106569/_design/views/_view/counted?limit=200&reduce=true&group=true&group_level=1
response:
{
"rows": [
{
"key": "201705",
"value": 8
},
{
"key": "201706",
"value": 1
}
]
}
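To make the two stages concrete, here is an in-memory JavaScript emulation (not Cloudant code) of the chain. Stage one mirrors the _count view at group_level=2 that dbcopy writes out; stage two counts those rows per month. The input uses the correlation ids from the question plus one hypothetical repeated row.

```javascript
function countDistinctPerMonth(docs) {
  // Stage 1: one row per [month, machine]; the value is what _count would give.
  const level2 = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.month, doc.machine]);
    level2.set(key, (level2.get(key) || 0) + 1);
  }
  // Stage 2: each distinct pair is now a single row, so counting rows
  // per month yields the number of distinct machines.
  const perMonth = {};
  for (const key of level2.keys()) {
    const [month] = JSON.parse(key);
    perMonth[month] = (perMonth[month] || 0) + 1;
  }
  return perMonth;
}

const machines = [
  "aws-60826346-", "aws-60826348802-", "aws-las97628elb", "aws-ve-test",
  "aws-6032dcbce", "aws-60826348831d", "aws-608263488833926e", "aws-608263488a74f",
];
const docs = machines.map((machine) => ({ month: "201705", machine }));
docs.push({ month: "201705", machine: "aws-6032dcbce" }); // the repeated machine

console.log(docs.length); // 9, like _count at group_level=1
console.log(countDistinctPerMonth(docs)["201705"]); // 8
```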
Emit 1 instead of null and use the built-in reducer _count.

Set criteria in query for fields and fields in nested objects

I have a document like this:
{
"InDate": "11.09.2015",
"Kst2Kst": true,
"OutDate": "11.09.2015",
"__v": 0,
"_id": ObjectId('55f2df2d7e12a9f1f52837e6'),
"accepted": true,
"inventar": [
{
"accepted": "1",
"name": "AAAA",
"isstammkost": true,
"stammkost": "IWXI"
},
{
"accepted": "1",
"name": "BBBB",
"isstammkost": false,
"stammkost": "null"
}
]
}
I want to select the data with "isstammkost": true in the inventar-array.
My query is:
Move.findOne({accepted : true, 'inventar.isstammkost' : true},
'OutDate InDate inventar.name', function(err, res) {
// ...
});
It doesn't work: it selects all array elements, even those with inventar.isstammkost: false.
The "normal" query (without criteria on the sub-array) works as I want. What's the right way to set criteria on a sub-array?
Of course it will return the "isstammkost": false part, because it is part of the same document as the "isstammkost": true part. Both are objects in the array "inventar", a top-level field of a single document. Without a projection that selects the matching array element, the query returns the whole inventar array, and Node.js passes all of it on to you.
I'm not terribly up-to-speed on nodejs, but if this were the mongo shell it would look like this:
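> db.MyDB.findOne({accepted : true, "inventar.isstammkost" : true}, {"inventar.$": 1});
In Mongoose, that extra parameter is the projection, which is the second argument of findOne (where you currently pass 'OutDate InDate inventar.name'), so something like 'OutDate InDate inventar.$' should work. Note that the positional $ projection returns only the first matching array element.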
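If you need every matching element rather than just the first, one option is to filter the array in Node after the query (the server-side counterpart would be an aggregation using $filter). A minimal sketch; pickStammkost is a hypothetical helper and the field names come from the document in the question.

```javascript
// Keep OutDate/InDate and only the names of inventory items with isstammkost === true.
function pickStammkost(move) {
  return {
    OutDate: move.OutDate,
    InDate: move.InDate,
    inventar: move.inventar
      .filter((item) => item.isstammkost === true)
      .map((item) => ({ name: item.name })),
  };
}

const move = {
  InDate: "11.09.2015",
  OutDate: "11.09.2015",
  accepted: true,
  inventar: [
    { accepted: "1", name: "AAAA", isstammkost: true, stammkost: "IWXI" },
    { accepted: "1", name: "BBBB", isstammkost: false, stammkost: "null" },
  ],
};
console.log(pickStammkost(move).inventar); // [ { name: 'AAAA' } ]
```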
