Groovy - FindAll - unique record - condition declared by field - arrays

I have a json as below
{
"Animals": [
{
"Name": "monkey",
"Age": 4
},
{
"Name": "lion",
"Age": 3
},
{
"Name": "lion",
"Age": 3,
"Misc": "001"
}
]
}
Two of the three elements in the JSON array have the same Name and Age. The only difference is that the third element has Misc and the second does not.
How do I keep only the record that has Misc when two records share the same Name and Age?
Below is what I tried
parsedJson?.Animals = parsedJson?.Animals?.unique().findAll{animal -> animal?.Misc?.trim() ? animal?.Misc?.trim() : site?.Name?.trim() };
Looks like I missed one more statement or I missed something inside unique()
I also tried
parsedJson?.Animals = parsedJson?.Animals?.unique{a1,a2 -> a1?.Misc <=> a2?.Misc}
but I still do not get what I want.
What I want is
{
"Animals": [
{
"Name": "monkey",
"Age": 4
},
{
"Name": "lion",
"Age": 3,
"Misc": "001"
}
]
}

One way to go about this is to group the elements and then merge
the maps.
groupBy is used to group the elements by their "primary key"; let's
assume that this is Name and Age. The resulting data structure is
a map with [it.Name, it.Age] tuples as keys and, as values, lists of the
elements that share that key.
Next, reduce over each list of maps and merge them. This assumes
that the information there does not contradict itself (i.e. each map only
adds to the result). Otherwise the last map would win.
def data = [["Name": "monkey", "Age": 4],
["Name": "lion", "Age": 3],
["Name": "lion", "Age": 3, "Misc": "001"]]
println data.groupBy{[it.Name, it.Age]}.collect{ _, xs -> xs.inject{ acc, x -> acc.putAll x; acc } }
// → [[Name:monkey, Age:4], [Name:lion, Age:3, Misc:001]]
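For comparison, the same group-and-merge idea can be sketched in Python. This is not the Groovy from the answer; merge_by_key and its key_fields parameter are names made up for this illustration, and Name plus Age is still only assumed to be the "primary key".

```python
def merge_by_key(records, key_fields=("Name", "Age")):
    """Group records by key_fields and merge each group into a single dict."""
    merged = {}
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        # setdefault keeps first-seen order; update() lets later records
        # add fields (or override them if they contradict earlier ones).
        merged.setdefault(key, {}).update(record)
    return list(merged.values())


animals = [
    {"Name": "monkey", "Age": 4},
    {"Name": "lion", "Age": 3},
    {"Name": "lion", "Age": 3, "Misc": "001"},
]

print(merge_by_key(animals))
# → [{'Name': 'monkey', 'Age': 4}, {'Name': 'lion', 'Age': 3, 'Misc': '001'}]
```

Unlike the inject/putAll version, this sketch merges into fresh dicts, so the input records are left unmodified.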

Related

How to set unique constraint for field in document nested in array?

I have a collection of documents in MongoDB that looks like:
{"_id": 1, "array": [{"id": 1, "content": "..."}, {"id": 2, "content": "..."}]}
{"_id": 2, "array": [{"id": 1, "content": "..."}, {"id": 2, "content": "..."}, {"a_id": 3, "content": "..."}]}
and I want to ensure that there is no duplicate array.id within each document. So the provided example is OK, but the following is not:
{"_id": 1, "array": [{"id": 1, "content": "..."}, {"id": 1, "content": "..."}]}
My question is how to do this (preferably in PyMongo).
EDIT
What I tried was the following code, which I thought would create a key on (_id, array.id), but when you run it this does not happen:
from pymongo import MongoClient, ASCENDING
client = MongoClient(host="localhost", port=27017)
database = client["test_db"]
collection = database["test_collection"]
collection.drop()
collection.create_index(keys=[("_id", ASCENDING),
("array.id", ASCENDING)],
unique=True,
name="new_key")
document = {"array": [{"id": 1}, {"id": 2}]}
collection.insert_one(document)
collection.find_one_and_update({"_id": document["_id"]},
{"$push": {"array": {"id": 1}}})
updated_document = collection.find_one({"_id": document["_id"]})
print(updated_document)
which outputs the following (note that there are two objects with id = 1 in the array). I would have expected an exception.
{'_id': ObjectId('5eb51270d6d70fbba739e3b2'), 'array': [{'id': 1}, {'id': 2}, {'id': 1}]}
So if I understand it correctly, there is no way to set an index (or
some condition) that would enforce the uniqueness within the document,
right? (Other than checking this explicitly when creating the document or
when inserting into it.)
Yes. Please see the following two scenarios about using the unique index on an array field with embedded documents.
Unique Multikey Index (index on an embedded document field within an array):
For unique indexes, the unique constraint applies across separate
documents in the collection rather than within a single document.
Because the unique constraint applies to separate documents, for a
unique multikey index, a document may have array elements that result
in repeating index key values as long as the index key values for that
document do not duplicate those of another document.
First Scenario:
db.arrays.createIndex( { _id: 1, "array.id": 1}, { unique: true } )
db.arrays.insertOne( { "_id": 1, "array": [ { "id": 1, "content": "11"}, { "id": 2, "content": "22"} ] } )
db.arrays.insertOne( { "_id": 2, "array": [ { "id": 1, "content": "1100"}, { "id": 5, "content": "55"} ] } )
db.arrays.insertOne( {"_id": 3, "array": [ {"id": 3, "content": "33"}, {"id": 3, "content": "3300"} ] } )
All three documents are inserted without any errors.
As per the note on Unique Multikey Index above, the document with _id: 3 has two embedded documents within the array with the same "array.id" value: 3.
Also, uniqueness is enforced on the combination of the two keys of the compound index { _id: 1, "array.id": 1 }, so the duplicate "array.id" values across documents (the documents with _id values 1 and 2 both contain "array.id": 1) are allowed as well.
Second Scenario:
db.arrays2.createIndex( { "array.id": 1 }, { unique: true } )
db.arrays2.insertOne( { "_id": 3, "array": [ { "id": 3, "content": "33" }, { "id": 3, "content": "330"} ] } )
db.arrays2.insertOne( { "_id": 4, "array": [ { "id": 3, "content": "331" }, { "id": 30, "content": "3300" } ] } )
The first document, with _id: 3, is inserted successfully. The second one fails with an error: "errmsg" : "E11000 duplicate key error collection: test.arrays2 index: array.id_1 dup key: { array.id: 3.0 } ". This behavior is expected, as per the note on Unique Multikey Index.
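The rule behind both scenarios can be illustrated with a toy in-memory model in Python. This is only a sketch of the documented behavior, not how MongoDB implements it; the UniqueMultikeyIndex class is hypothetical, and it assumes every array element carries the indexed field.

```python
class UniqueMultikeyIndex:
    """Toy model of a unique multikey index on one field of array elements."""

    def __init__(self, array_field, sub_field):
        self.array_field = array_field
        self.sub_field = sub_field
        self.owner_by_key = {}  # index key -> _id of the document that owns it

    def insert(self, doc):
        # A document contributes a *set* of keys, so repeats inside the
        # same document collapse and never conflict with each other.
        keys = {elem[self.sub_field] for elem in doc.get(self.array_field, [])}
        for key in keys:
            if key in self.owner_by_key:
                raise ValueError(f"duplicate key {key!r}")
        # Only register the keys once the whole document has been accepted.
        for key in keys:
            self.owner_by_key[key] = doc["_id"]


index = UniqueMultikeyIndex("array", "id")
index.insert({"_id": 3, "array": [{"id": 3}, {"id": 3}]})  # accepted

try:
    index.insert({"_id": 4, "array": [{"id": 3}, {"id": 30}]})
    rejected = False
except ValueError:
    rejected = True  # key 3 already belongs to another document

print(rejected)  # → True
```

The repeated id inside the first document is accepted, while the second document is rejected because it shares a key with a different document, which mirrors the second scenario.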
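You can do this check on update: the filter only matches when no array element already has the new id, so the $push is skipped for duplicates.
const doc = await Model.findOneAndUpdate(
  { _id, 'array.id': { $ne: newID } },
  {
    $push: {
      array: { id: newID } // push an embedded document, not the bare id
    }
  },
  { new: true }
);
// doc is null when nothing matched, e.g. because the id was already present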
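Since the question asked for PyMongo, here is a hedged sketch of the same guarded push expressed as plain Python dictionaries. The helper guarded_push and its parameter names are made up for this illustration; the block only builds the filter and update documents and does not talk to a database.

```python
def guarded_push(doc_id, new_id, array_field="array", key_field="id"):
    """Return (filter, update) that appends {key_field: new_id} only if absent."""
    # The $ne condition on the dotted path matches only documents in which
    # no array element already has key_field equal to new_id.
    flt = {"_id": doc_id, f"{array_field}.{key_field}": {"$ne": new_id}}
    update = {"$push": {array_field: {key_field: new_id}}}
    return flt, update


flt, update = guarded_push(1, 3)
print(flt)     # → {'_id': 1, 'array.id': {'$ne': 3}}
print(update)  # → {'$push': {'array': {'id': 3}}}
```

With PyMongo the pair would be passed as collection.update_one(flt, update); a matched_count of 0 on the result then means that either the document does not exist or the id was already in the array.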

Merge Multiple Array by keys - Angular 5, Typescript

My question is about TypeScript, not JavaScript. I want to merge multiple arrays by a key (id). For example, I have these arrays with a one-to-many relation:
Student Array 1 :
[
{
"Case ID":12,
"Student name":"john",
"address":"Ohio"
},
{
"Case ID":13,
"Student name":"David",
"address":"new york"
}
]
Courses Array 2 :
[
{
"id":34343,
"Case ID":12,
"course":"algorithm",
"Grade":"A"
},
{
"id":343434,
"Case ID":12,
"course":"advanced c++",
"Grade":"B"
}
]
I want to get this array, which has keys from both array 1 and array 2:
[
{
"Case ID":12,
"name":"john",
"Courses":[{"course":"algorithm",
"Grade":"A"},
{"course":"advanced c++",
"Grade":"B"}]
}
]
@JohnyAli, you don't want to get the object that you propose (it has a repeated key). You want to get
{ Sid:.., name:.., courses:[{Grade:.., course:..},{Grade:.., course:..}] }
So use map:
const data = this.students.map(x => {
  // for each element of students
  return { // build an element that has
    Sid: x.Sid, // property Sid equal to the Sid of the element
    name: x.name, // likewise for name
    courses: this.courses.filter(c => c.Sid == x.Sid) // the courses whose
    // Sid equals the Sid of the element
  };
});
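The same one-to-many join can be sketched in Python, purely as an illustration of the logic (the thread itself is about TypeScript). It uses the "Case ID" field from the question rather than Sid; attach_courses is a made-up helper name. It indexes the courses once instead of filtering the whole course list for every student.

```python
from collections import defaultdict

students = [
    {"Case ID": 12, "Student name": "john", "address": "Ohio"},
    {"Case ID": 13, "Student name": "David", "address": "new york"},
]
courses = [
    {"id": 34343, "Case ID": 12, "course": "algorithm", "Grade": "A"},
    {"id": 343434, "Case ID": 12, "course": "advanced c++", "Grade": "B"},
]


def attach_courses(students, courses, key="Case ID"):
    """Nest each student's courses under the student, joined on `key`."""
    by_key = defaultdict(list)
    for course in courses:
        # Keep only the course-specific fields, as in the desired output.
        by_key[course[key]].append(
            {"course": course["course"], "Grade": course["Grade"]}
        )
    return [
        {key: s[key], "name": s["Student name"], "Courses": by_key.get(s[key], [])}
        for s in students
    ]


result = attach_courses(students, courses)
print(result[0])
```

Note that this keeps students without any course (here David, with an empty Courses list); filter them out afterwards if only students with courses are wanted.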
lodash's _.groupBy() will do that job; you can group your array by any property you want.
var arr = [
{
"name": "xyz",
"age": 22,
"add": "street 5"
},
{
"name": "fjf",
"age": 22,
"add": "street 6"
}
];
console.log(_.groupBy(arr, 'name'));
/** result:
{
"xyz": [
{
"name": "xyz",
"age": 22,
"add": "street 5"
}
],
"fjf": [
{
"name": "fjf",
"age": 22,
"add": "street 6"
}
]
} **/
You can use lodash
and do the merge operation on both arrays into a destination array.

Obtaining keys and values from JSON nested array in nest

First time posting! I am converting JSON data (a dictionary) from a server into a CSV file. The keys and values are extracted fine, apart from the nested "Astronauts" entry, which is an array. Basically, every individual JSON string is a datum that may contain anywhere from zero to an unlimited number of astronauts, whose features I would like to extract as independent values. For instance, something like this:
Astronaut1_Spaceships_First: Katabom
Astronaut1_Spaceships_Second: The Kraken
Astronaut1_name: Jebeddia
(...)
Astronaut2_gender: Hopefully female
and so on. The problem here is that the nested entry is an array and not a dictionary, so I do not know what to do. I have tried the dpath library as well as flattening the nested entry, but nothing changed. Any ideas?
import json
import os
import csv
import datetime
import dpath.util #Dpath library needs to be installed first
datum = {"Mission": "Make Earth Greater Again", "Objective": "Prove Earth is flat", "Astronauts": [{"Spaceships": {"First": "Katabom", "Second": "The Kraken"}, "Name": "Jebeddiah", "Gender": "Hopefully male", "Age": 35, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}, {"Spaceships": {"First": "The Kraken", "Second": "Minnus I"}, "Name": "Bob", "Gender": "Hopefully female", "Age": 23, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}]}
#Parsing process
parsed = json.loads(datum) #datum is the JSON string retrieved from the server
def flattenjson(parsed, delim):
    val = {}
    for i in parsed.keys():
        if isinstance(parsed[i], dict):
            get = flattenjson(parsed[i], delim)
            for j in get.keys():
                val[i + delim + j] = get[j]
        else:
            val[i] = parsed[i]
    return val
flattened = flattenjson(parsed,"__")
#process of creating csv file
keys=['Astronaut1_Spaceship_First','Astronaut2_Spaceship_Second', 'Astronaut1_Name'] #reduced to 3 keys for this example
writer = csv.DictWriter(OD, keys ,restval='Null', delimiter=",", quotechar="\"", quoting=csv.QUOTE_ALL, dialect= "excel")
writer.writerow(flattened)
#JSON DATA FROM SERVER
{
"Mission": "Make Earth Greater Again",
"Objective": "Prove Earth is flat",
"Astronauts": [ {
"Spaceships": {
"First": "Katabom",
"Second": "The Kraken"
},
"Name": "Jebeddiah",
"Gender": "Hopefully male",
"Age": 35,
"Prefered colleages": [],
"Following missions": [
{
"Payment_status": "TO BE CONFIRMED"
}
]
},
{
"Spaceships": {
"First": "The Kraken",
"Second": "Minnus I"
},
"Name": "Bob",
"Gender": "Hopefully female",
"Age": 23,
"Prefered colleages": [],
"Following missions": [
{
"Payment_status": "TO BE CONFIRMED"
}
]
}
]
}
Firstly, the datum you have defined here is not the datum that would be extracted from the server. The datum from the server would be a string. The datum you have in this program is already processed. Now, assuming datum to be:
datum = '{"Mission": "Make Earth Greater Again", "Objective": "Prove Earth is flat", "Astronauts": [{"Spaceships": {"First": "Katabom", "Second": "The Kraken"}, "Name": "Jebeddiah", "Gender": "Hopefully male", "Age": 35, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}, {"Spaceships": {"First": "The Kraken", "Second": "Minnus I"}, "Name": "Bob", "Gender": "Hopefully female", "Age": 23, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}]}'
You don't need the dpath library. The problem here is that your JSON flattener doesn't handle embedded lists. Try using the one I've put below.
Assuming that you want a one line csv file,
import csv
import json
def flattenjson(data, delim, topname=''):
    """JSON flattener that can handle embedded lists and dictionaries"""
    flattened = {}
    def internalflat(int_data, name=topname):
        if isinstance(int_data, dict):
            for key in int_data:
                internalflat(int_data[key], name + key + delim)
        elif isinstance(int_data, list):
            # list elements are numbered from 1: Astronaut1, Astronaut2, ...
            for i, elem in enumerate(int_data, start=1):
                internalflat(elem, name + str(i) + delim)
        else:
            # leaf value: drop the trailing delimiter from the accumulated name
            flattened[name[:-len(delim)]] = int_data
    internalflat(data)
    return flattened
#If you don't want mission or objective in csv file
flattened_astronauts = flattenjson(json.loads(datum)["Astronauts"], "__", "Astronaut")
# sorted() returns a new list; list.sort() returns None and dict views have no sort()
keys = sorted(flattened_astronauts.keys())
# OD is the already-open output file object used in the question's code
writer = csv.DictWriter(OD, keys ,restval='Null', delimiter=",", quotechar="\"", quoting=csv.QUOTE_ALL, dialect= "excel")
writer.writerow(flattened_astronauts)
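To see the whole path from JSON string to a one-line CSV without needing a real file, here is a self-contained sketch that writes into an in-memory buffer. It is a compact generator variant, not the answer's function: flatten, its prefix parameter, and the shortened datum are assumptions for this example, and the list index is glued directly onto the prefix (Astronaut1, Astronaut2), which is closer to the column names asked for in the question.

```python
import csv
import io
import json


def flatten(data, delim="__", prefix=""):
    """Yield (flat_key, value) pairs for arbitrarily nested dicts and lists."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from flatten(value, delim, f"{prefix}{delim}{key}" if prefix else key)
    elif isinstance(data, list):
        # Number list elements from 1 and append the number to the prefix.
        for i, value in enumerate(data, start=1):
            yield from flatten(value, delim, f"{prefix}{i}")
    else:
        yield prefix, data  # leaf value


# Shortened stand-in for the JSON string received from the server.
datum = (
    '{"Astronauts": ['
    '{"Spaceships": {"First": "Katabom", "Second": "The Kraken"}, "Name": "Jebeddiah"},'
    '{"Spaceships": {"First": "The Kraken"}, "Name": "Bob"}]}'
)

row = dict(flatten(json.loads(datum)["Astronauts"], prefix="Astronaut"))

buffer = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(buffer, fieldnames=sorted(row), quoting=csv.QUOTE_ALL)
writer.writeheader()    # first line: the flattened column names
writer.writerow(row)    # second line: the values
print(buffer.getvalue())
```

Empty lists and empty dictionaries produce no columns at all in this sketch, so a column such as "Prefered colleages" would simply be absent when the list is empty.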

How to use jq to produce a cartesian product of two arrays present in the input JSON

I'd like to be able to use jq to output the 'product' of 2 arrays in the input JSON... for example, given the following input JSON:
{
"quantities": [
{
"product": "A",
"quantity": 30
},
{
"product": "B",
"quantity": 10
}
],
"portions": [
{
"customer": "C1",
"percentage": 0.6
},
{
"customer": "C2",
"percentage": 0.4
}
]
}
I'd like to produce the following output (or similar...):
[
{
"customer": "C1",
"quantities": [
{
"product": "A",
"quantity": 18
},
{
"product": "B",
"quantity": 6
}
]
},
{
"customer": "C2",
"quantities": [
{
"product": "A",
"quantity": 12
},
{
"product": "B",
"quantity": 4
}
]
}
]
So in other words, for each portion, take its percentage value and apply it to each product quantity. Two quantities and two portions should yield 4 results, three quantities and two portions should yield 6 results, and so on.
I've made some attempts using foreach filters, but to no avail...
I think this will do what you want.
[
.quantities as $q
| .portions[]
| .percentage as $p
| {
customer,
quantities: [
$q[] | .quantity = .quantity * $p
]
}
]
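For readers less familiar with jq, the same per-portion scaling can be sketched in Python. This is an illustrative equivalent rather than a translation tool; split_quantities is a made-up name, and the input is assumed to have the quantities/portions shape shown in the question.

```python
def split_quantities(data):
    """For each portion, scale every product quantity by its percentage."""
    return [
        {
            "customer": portion["customer"],
            "quantities": [
                # Copy the quantity record and overwrite only "quantity".
                {**q, "quantity": q["quantity"] * portion["percentage"]}
                for q in data["quantities"]
            ],
        }
        for portion in data["portions"]
    ]


data = {
    "quantities": [
        {"product": "A", "quantity": 30},
        {"product": "B", "quantity": 10},
    ],
    "portions": [
        {"customer": "C1", "percentage": 0.6},
        {"customer": "C2", "percentage": 0.4},
    ],
}

result = split_quantities(data)
for entry in result:
    print(entry["customer"], [q["product"] for q in entry["quantities"]])
```

The outer loop over portions and the inner loop over quantities together visit every (portion, quantity) pair, which is where the "2 x 2 = 4 results" in the question comes from. The scaled values are floating-point products, so compare them with a tolerance rather than for exact equality.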
Since you indicated you want the Cartesian product, and that you only gave the sample output as being indicative of what you're looking for, it may be worth mentioning that one can obtain the Cartesian product very simply:
.portions[] + .quantities[]
This produces objects such as:
{
"product": "B",
"quantity": 10,
"customer": "C2",
"percentage": 0.4
}
You could then use reduce (or, less efficiently, group_by) to obtain the data in whatever form you really want.
For example, assuming .customer is always a string, we could transform
the input into the requested format as follows:
def add_by(f;g): reduce .[] as $x ({}; .[$x|f] += [$x|g]);
[.quantities[] + .portions[]]
| map( {customer, quantities: {product, quantity: (.quantity * .percentage)}} )
| add_by(.customer; .quantities)
| to_entries
| map( {customer: .key, quantities: .value })

MongoDB - Select multiple sub-documents from array using $elemMatch

I have a collection like the following:
{
_id: 5,
"org_name": "abc",
"items": [
{
"item_id": "10",
"data": [
// Values goes here
]
},
{
"item_id": "11",
"data": [
// Values goes here
]
}
]
},
// Another sub document
{
_id: 6,
"org_name": "sony",
"items": [
{
"item_id": "10",
"data": [
// Values goes here
]
},
{
"item_id": "11",
"data": [
// Values goes here
]
}
]
}
Each sub-document corresponds to an individual organization, and each organization has an array of items.
What I need is to select individual elements from the items array by providing item_id.
I already tried this:
db.organizations.find({"_id": 5}, {items: {$elemMatch: {"item_id": {$in: ["10", "11"]}}}})
But it returns either the item with item_id "10" or the item with item_id "11", not both.
What I need is to get the values for both item_id 10 and 11 for the organization "abc". Please help.
update2:
db.organizations.aggregate([
// you can remove this to return all your data
{$match:{_id:5}},
// unwind array of items
{$unwind:"$items"},
// filter out all items not in 10, 11
{$match:{"items.item_id":{"$in":["10", "11"]}}},
// aggregate again into array
{$group:{_id:"$_id", "items":{$push:"$items"}}}
])
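What the four pipeline stages do to the data can be mimicked in plain Python on in-memory dictionaries. This is only a sketch of the data flow, not a PyMongo call; select_items and the sample documents (including the extra item "12") are made up for this illustration.

```python
def select_items(docs, org_id, wanted_ids):
    """In-memory analogue of $match -> $unwind -> $match -> $group/$push."""
    grouped = {}
    for doc in docs:
        if doc["_id"] != org_id:               # first $match on _id
            continue
        for item in doc["items"]:              # $unwind: one step per element
            if item["item_id"] in wanted_ids:  # second $match with $in
                # $group by _id, collecting the items with $push
                grouped.setdefault(doc["_id"], []).append(item)
    return [{"_id": _id, "items": items} for _id, items in grouped.items()]


organizations = [
    {"_id": 5, "org_name": "abc", "items": [
        {"item_id": "10", "data": []},
        {"item_id": "11", "data": []},
        {"item_id": "12", "data": []},
    ]},
    {"_id": 6, "org_name": "sony", "items": [
        {"item_id": "10", "data": []},
    ]},
]

print(select_items(organizations, 5, {"10", "11"}))
# → [{'_id': 5, 'items': [{'item_id': '10', 'data': []}, {'item_id': '11', 'data': []}]}]
```

As in the pipeline, fields other than _id and items (such as org_name) are not carried through the grouping, and an organization with no matching item yields no result at all.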
update:
db.organizations.find({
"_id": 5,
items: {$elemMatch: {"item_id": {$in: ["10", "11"]}}}
})
Old answer: it looks like you need the aggregation framework, particularly the $unwind operator:
db.organizations.aggregate([
{$match:{_id:5}}, // you can remove this to return all your data
{$unwind:"$items"}
])

Resources