How to use one collection to map another collection in MongoDB

I have a dataset like the following
[{id:1,month:1,count:1},{id:1,month:2,count:2},{id:1,month:3,count:3}......,
{id:2,month:1,count:1},{id:2,month:2,count:2},{id:2,month:3,count:3}.......,
........
........
{id:19,month:1,count:1},{id:19,month:2,count:2},{id:19,month:3,count:3}.......,]
and the table looks like this.
|id|month|count|
|1 | 1 | 1 |
.............
.........
|19| 12 | 4 |
There is another id, divisionId, which maps to the ids above as follows:
{1:[1,2,4,5],2:[3,6,8,9],3:[7,10,....19]}
and the mapping table looks like this.
|divisionId| id|
| 1 | 1 |
| 1 | 2 |
| 1 | 4 |
| 1 | 5 |
| 2 | 3 |
| 2 | 6 |
......
......
So now I need to aggregate the data: sum and regroup it according to the divisionIds.
Eventually the returned data should look like the following
[{divisionId:1,month:1,count:19},{divisionId:1,month:2,count:53},{divisionId:1,month:3,count:66}......,
{divisionId:2,month:1,count:21},{divisionId:2,month:2,count:82},{divisionId:2,month:3,count:63}.......,
{divisionId:3,month:1,count:1},{divisionId:3,month:2,count:2},{divisionId:3,month:3,count:3}.......,]
and the table should look like
| divisionId| month | count |
| 1 | 1 | 200 |
| 1 | 2 | 400 |
| 1 | 3 | 300 |
.....
.....
| 3 | 11 | 500 |
| 3 | 12 | 600 |
So basically, it just maps the ids to divisionIds, sums up the individual months across those ids, and aggregates a new collection to return the data.
I am not allowed to put divisionId into the original table, because ids may be assigned to different divisionIds in the future; otherwise it would have been much easier to just use the aggregate methods.
Currently, one way I can do this is to use JavaScript to fetch the data for each id separately according to the mapping, do the calculation, and push the result up to mongos to store as a new collection, so that when the UI queries the data in the future it just reads the stored result, saving the expensive calculation. But it would be awesome if I could solve this problem just by using some advanced MongoDB syntax. Please let me know if you have some tricks I could use. Thanks.

Please try this:
db.divisionIdCollName.aggregate([
  {
    // pull in every month/count document whose id is listed in this division's id array
    $lookup: {
      from: "idCollectionName",
      let: { ids: "$id" },
      pipeline: [
        { $match: { $expr: { $in: ["$id", "$$ids"] } } }
      ],
      as: "data"
    }
  },
  { $unwind: { path: "$data", preserveNullAndEmptyArrays: true } },
  // sum the counts per divisionId + month
  { $group: { _id: { divisionId: "$divisionId", month: "$data.month" }, month: { $first: "$data.month" }, count: { $sum: "$data.count" } } },
  { $addFields: { divisionId: "$_id.divisionId" } },
  { $project: { _id: 0 } }
])
Result: Mongo playground.
You can test the results over there. Once you feel the aggregation is returning correct results, add a $merge stage to write the result to another collection. You could use $out instead of $merge, but if the given name matches an existing collection in the database, $out replaces that entire collection with the aggregation result each time the query runs; that is destructive and should not be used if this query has to update existing records in a collection, which is why we're using $merge. Please read about both stages before you use them, then add the stage below as the last stage, after $project.
Note: $merge is new in v4.2; $out requires >= v2.6. Since the $merge here specifies two fields in on: [ "divisionId", "month" ], there must be a unique compound index on those fields in the divisionIdCollNameNew collection, so we need to manually create the collection and the unique index and then execute the query.
Create collection & index :
db.createCollection("divisionIdCollNameNew")
db.divisionIdCollNameNew.createIndex ( { divisionId: 1, month: 1 }, { unique: true } )
Final Stage :
{ $merge : { into: { coll: "divisionIdCollNameNew" }, on: [ "divisionId", "month" ], whenNotMatched: "insert" } }
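For comparison, the destructive alternative described above would be a single stage of the following form; it is shown only to illustrate what $out does, not as the recommended path here:
{ $out: "divisionIdCollNameNew" }
Every run with $out would rebuild divisionIdCollNameNew from scratch, which is exactly why $merge is preferred when existing records have to be updated.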

ASSUMPTION
Collection months has this structure: {id:1,month:1,count:1}
Collection divisions has this structure: {1:[1,2,4,5],2:[3,6,8,9],3:[7,10,19]}
You may perform a query like this:
db.divisions.aggregate([
{
$addFields: {
data: {
$filter: {
input: {
$objectToArray: "$$ROOT"
},
cond: {
$isArray: "$$this.v"
}
}
}
}
},
{
$unwind: "$data"
},
{
$lookup: {
from: "months",
let: {
ids: "$data.v"
},
pipeline: [
{
$match: {
$expr: {
$in: [
"$id",
"$$ids"
]
}
}
}
],
as: "months"
}
},
{
$unwind: "$months"
},
{
$group: {
_id: {
divisionId: "$data.k",
month: "$months.month"
},
count: {
$sum: "$months.count"
}
}
},
{
$project: {
_id: 0,
divisionId: "$_id.divisionId",
month: "$_id.month",
count: "$count"
}
},
{
$sort: {
divisionId: 1,
month: 1
}
}
])
MongoPlayground
EXPLANATION
Your divisions collection has non-normalized key:value pairs, so our first step is to convert the 1:[...], 2:[...] pairs into [{k:"1", v:[...]}, {k:"2", v:[...]}] pairs with the $objectToArray operator.
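For the sample divisions document above, that first stage would add a data field roughly like this (the _id pair is dropped by the $isArray filter):
data: [ { k: "1", v: [ 1, 2, 4, 5 ] }, { k: "2", v: [ 3, 6, 8, 9 ] }, { k: "3", v: [ 7, 10, 19 ] } ]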
Then we flatten the data array from the previous step with $unwind and apply a $lookup with a pipeline-style subquery to join against the months collection.
In the last steps, we $group by divisionId + month and sum the count values.
In order to store the result in another collection, you need to use the $out or $merge stage.
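A minimal sketch of such a final stage, assuming a hypothetical target collection named divisionSummary that already has a unique index on { divisionId: 1, month: 1 }:
{ $merge: { into: "divisionSummary", on: [ "divisionId", "month" ], whenMatched: "replace", whenNotMatched: "insert" } }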

Related

Reactive Programming with RxJS and separating data into

I am trying to work more reactively with Angular 15 and RxJS observables for a UI component. I only subscribe to the data in my component template (html). I have a service that receives data from an external system. The issue I have is the data may be received for multiple days and needs to be 'split' for the display usage.
In the display, there are individual components of data, that show the rows returned from the service call. The service makes an HTTP call to an external host.
this.Entries$ = this.Http_.get<Array<IEntry>>('http://host.com/api/entry');
This data is then an array of records with an EntryDate, and a structure of information (UserId, Description, TimeWorked, etc.). The external API sends all the records back as one flat array of data which is not guaranteed to be sorted, it comes back in a database order, which was the order records were entered. A sort might be needed for any processing, but I am not sure.
[
{ "EnterDate": 20221025, "UserId": "JohnDoe", "TimeWorked": 2.5, ... },
{ "EnterDate": 20221025, "UserId": "JohnDoe", "TimeWorked": 4.5, ... },
{ "EnterDate": 20221025, "UserId": "BSmith", "TimeWorked": 5, ... },
{ "EnterDate": 20221026, "UserId": "JohnDoe", "TimeWorked": 4, ... },
{ "EnterDate": 20221026, "UserId": "BSmith", "TimeWorked": 5, ... },
{ "EnterDate": 20221026, "UserId": "JohnDoe", "TimeWorked": 2, ... },
]
Currently, my HTML template loops through the Entries$ observable, which worked when it was for just one day.
<ng-container *ngFor="let OneEntry of (Entries$ | async)">
<one-entry-component [data]=OneEntry />
</ng-container>
I want to be able to split my array of records into different datasets by EntryDate (and apparently user, but just EntryDate would work for now), similar to the groupBy(), but I do not know how to get to the internal record references, as it would be a map within the groupBy() I believe.
With the data split, I would then be looking to have multiple one-day-components on the page, that then have the one-entry-component within them.
|---------------------------------------------------------------|
| |
| |-One Day 1-------------###-| |-One Day 2-------------###-| |
| | | | | |
| | [ One Line ] | | [ One Line ] | |
| | [ One Line ] | | [ One Line ] | |
| | [ One Line ] | | [ One Line ] | |
| | [ One Line ] | | [ One Line ] | |
| | | | | |
| |---------------------------| |---------------------------| |
| |
| |-One Day 3-------------###-| |-One Day 4-------------###-| |
| | | | | |
| | [ One Line ] | | [ One Line ] | |
| | [ One Line ] | | [ One Line ] | |
| | [ One Line ] | | [ One Line ] | |
| | [ One Line ] | | [ One Line ] | |
| | | | | |
| |---------------------------| |---------------------------| |
| |
|---------------------------------------------------------------|
The 4 boxes would be there if there were 4 separate days in the response. If there were 2 different dates, then just show 2 dates, but this could be 5 or 6 even.
I would need an Observable that had the dates for splitting (and even users) and then be able to pass this as data to the <one-day-component [data]=OneDateOneUser$ />. My component needs this so that I can count the time entries for the title, which I believe is a simple .pipe(map()) operation.
Within the one-day-component, I would then simply loop through the OneDateOneUser$ observable to extract individual records to send to the one-entry-component as I do currently.
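A minimal sketch of that counting step, assuming the child component receives its one-day slice as an Observable<IEntry[]>; the name entryCountFor and the trimmed IEntry shape are illustrative only:
import { map, Observable } from 'rxjs';

// minimal IEntry shape for this sketch (fields taken from the sample payload)
interface IEntry { EnterDate: number; UserId: string; TimeWorked: number; }

// count the entries in one day's slice, e.g. for the panel title
const entryCountFor = (day$: Observable<IEntry[]>): Observable<number> =>
  day$.pipe(map(entries => entries.length));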
I believe the RxJS groupBy is what I need. However, I am new to RxJS, and working with the inner array of data is not clear to me in the example.
If the data is individual records like the example, and not an array of data, then it does work using the example RxJS reference.
import { of, groupBy, mergeMap, reduce, map } from 'rxjs';
of(
{ id: 1, name: 'JavaScript' },
{ id: 2, name: 'Parcel' },
{ id: 2, name: 'webpack' },
{ id: 1, name: 'TypeScript' },
{ id: 3, name: 'TSLint' }
).pipe(
groupBy(p => p.id, { element: p => p.name }),
mergeMap(group$ => group$.pipe(reduce((acc, cur) => [...acc, cur], [`${ group$.key }`]))),
map(arr => ({ id: parseInt(arr[0], 10), values: arr.slice(1) }))
)
.subscribe(p => console.log(p));
// displays:
// { id: 1, values: [ 'JavaScript', 'TypeScript' ] }
// { id: 2, values: [ 'Parcel', 'webpack' ] }
// { id: 3, values: [ 'TSLint' ] }
However, simply changing the data in the of() to be an array (more like how my data comes back) breaks it, and I am not sure how to fix it:
import { of, groupBy, mergeMap, reduce, map } from 'rxjs';
of(
[
{ id: 1, name: 'JavaScript' },
{ id: 2, name: 'Parcel' },
{ id: 2, name: 'webpack' },
{ id: 1, name: 'TypeScript' },
{ id: 3, name: 'TSLint' }
]
).pipe(
groupBy(p => p.id, { element: p => p.name }),
mergeMap(group$ => group$.pipe(reduce((acc, cur) => [...acc, cur], [`${ group$.key }`]))),
map(arr => ({ id: parseInt(arr[0], 10), values: arr.slice(1) }))
)
.subscribe(p => console.log(p));
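One way to make this array-based variant emit individual records again, assuming the rest of the pipeline stays unchanged, is to flatten the single array emission before grouping; a sketch where only the added mergeMap differs from the snippet above:
import { of, groupBy, mergeMap, reduce, map } from 'rxjs';
of(
  [
    { id: 1, name: 'JavaScript' },
    { id: 2, name: 'Parcel' },
    { id: 2, name: 'webpack' },
    { id: 1, name: 'TypeScript' },
    { id: 3, name: 'TSLint' }
  ]
).pipe(
  // the array arrives as one emission; flatten it into one emission per element
  mergeMap(arr => arr),
  groupBy(p => p.id, { element: p => p.name }),
  mergeMap(group$ => group$.pipe(reduce((acc, cur) => [...acc, cur], [`${ group$.key }`]))),
  map(arr => ({ id: parseInt(arr[0], 10), values: arr.slice(1) }))
)
.subscribe(p => console.log(p));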
What if you just turned that Array<IEntry> into a Record<number, IEntry[]> with something like lodash's groupBy and a map RxJS operator?
Then you can get the desired outcome with some flex-wrap and flex-row functionality on the template and just loop over the entries of the record:
Check this little working CodePen
import { groupBy } from 'lodash'
import { of, map, Observable } from 'rxjs'
const fakeData = [
{ "EnterDate": 20221025, "UserId": "JohnDoe", "TimeWorked": 2.5, ... },
{ "EnterDate": 20221025, "UserId": "JohnDoe", "TimeWorked": 4.5, ... },
{ "EnterDate": 20221025, "UserId": "BSmith", "TimeWorked": 5, ... },
{ "EnterDate": 20221026, "UserId": "JohnDoe", "TimeWorked": 4, ... },
{ "EnterDate": 20221026, "UserId": "BSmith", "TimeWorked": 5, ... },
{ "EnterDate": 20221026, "UserId": "JohnDoe", "TimeWorked": 2, ... },
]
// Replace "of" with your API call
entriesByDate$: Observable<Record<number, IEntry[]>> = of(fakeData).pipe(
map(allEntries => groupBy(allEntries, 'EnterDate'))
)
<div *ngIf="entriesByDate$ | async as entries" class="flex flex-row flex-wrap">
<ng-container *ngFor="let [enterDate, entries] of Object.entries(entries)">
<entry-group-component [title]="enterDate" [data]="entries" />
</ng-container>
</div>
No need to import lodash if you care to write the grouping function yourself. Array#reduce should suffice:
function groupByEnterDate(entries: Array<IEntry>): Record<number, IEntry[]> {
  return entries.reduce<Record<number, IEntry[]>>(
    (acc, current) => {
      const key = current.EnterDate
      const groupedByKey = acc[key] ?? []
      return { ...acc, [key]: [...groupedByKey, current] }
    },
    {}
  )
}
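Wired into the pipeline above, it simply replaces the lodash call (same assumed names as before):
entriesByDate$: Observable<Record<number, IEntry[]>> = of(fakeData).pipe(
  map(allEntries => groupByEnterDate(allEntries))
)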

how to grep the values from mongodb

New to development. I am trying to grep values from a JSON file. Can someone help me with this?
[{
"State": "New York",
"City": "Queens",
"Cars": {
"gas": {
"USAMade": {
"Ford": ["Fordcars", "Fordtrucks", "Fordsuv"]
},
"OutsideUS": {
"Toyota": ["Tcars", "Ttrucks", "TSUV"]
}
},
"electric": {
"USAMade": {
"Tesla": ["model3", "modelS", "modelX"]
},
"OutsideUS": {
"Nissan": ["Ncars", "Ntrucks", "NSUV"]
}
}
}
},
{
"State": "Atlanta",
"City": "Roswell",
"Cars": {
"gas": {
"USAMade": {
"Ford": ["Fordcars", "Fordtrucks", "Fordsuv"]
},
"OutsideUS": {
"Toyota": ["Tcars", "Ttrucks", "TSUV"]
}
},
"electric": {
"USAMade": {
"Tesla": ["model3", "modelS", "modelX"]
},
"OutsideUS": {
"Nissan": ["Ncars", "Ntrucks", "NSUV"]
}
}
}
}
]
How do I list the types of cars (gas/electric)?
Once I get the type, I want to list the respective country of manufacture (USAMade/OutsideUS).
After that I want to list the models (Ford/Toyota).
Let's suppose you have the documents in the file test.json; here is how to grep them using the Linux shell tools cat, jq, sort and uniq:
1) cat test.json | jq '.[] | .Cars | keys[] ' | sort | uniq
"electric"
"gas"
2) cat test.json | jq '.[] | .Cars[] | keys[] ' | sort | uniq
"OutsideUS"
"USAMade"
3) cat test.json | jq '.[] | .Cars[][] | keys[] ' | sort | uniq
"Ford"
"Nissan"
"Tesla"
"Toyota"
If your data is in MongoDB, I suggest you keep these distinct values in a single document in a separate collection and populate the frontend page on load from that collection; the document can look something like this:
{
State:["Atlanta","Oregon"],
City:["New York" , "Tokio" , "Moskow"],
Location:["OutsideUS" ,"USAMade"],
Model:["Ford","Toyota","Nissan"]
}
You don't need to extract the distinct values from the database every time your front page loads, since that is not a scalable solution and at some point it will become a performance bottleneck.
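As a sketch of how such a lookup document could be kept current, an upsert with $addToSet would add new values without duplicating existing ones (the collection name lookups and the _id value are made up for illustration):
db.lookups.updateOne(
  { _id: "carFilters" },
  { $addToSet: { Model: { $each: [ "Ford", "Toyota" ] }, Location: "USAMade" } },
  { upsert: true }
)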
But if you want to get only the distinct keys from MongoDB based on a selection anyway, you can do as follows:
1)
mongos> db.test.aggregate([ {"$project":{"akv":{"$objectToArray":"$Cars"}}} ,{$unwind:"$akv"} ,{ $group:{_id:null , "allkeys":{$addToSet:"$akv.k"} } }] ).pretty()
{ "_id" : null, "allkeys" : [ "gas", "electric" ] }
2)
mongos> db.test.aggregate([ {"$project":{"akv":{"$objectToArray":"$Cars.gas"}}} ,{$unwind:"$akv"} ,{ $group:{_id:null , "allkeys":{$addToSet:"$akv.k"} } }] ).pretty()
{ "_id" : null, "allkeys" : [ "USAMade", "OutsideUS" ] }
3)
mongos> db.test.aggregate([ {"$project":{"akv":{"$objectToArray":"$Cars.gas.USAMade"}}} ,{$unwind:"$akv"} ,{ $group:{_id:null , "allkeys":{$addToSet:"$akv.k"} } }] ).pretty()
{ "_id" : null, "allkeys" : [ "Ford" ] }

How to use JQ to copy a single value from a nested array which has duplicate keys

I have an array of json objects, each with an array of tags. Specific tags can appear multiple times in the child array but I only want the first matching tag (key+value) copied up onto the parent object. I've come up with a filter-set but it gives me multiple outputs if the given tag appears more than once in the child array ... I only want the first one.
Sample Json Input:
[
{
"name":"integration1",
"accountid":111,
"tags":[
{ "key": "env",
"values":["prod"]
},
{ "key": "team",
"values":["cougar"]
}
]
},
{
"name":"integration2",
"accountid":222,
"tags":[
{ "key": "env",
"values":["prod"]
},
{ "key": "team",
"values":["bear"]
}
]
},
{
"name":"integration3",
"accountid":333,
"tags":[
{ "key": "env",
"values":["test"]
},
{ "key": "team",
"values":["lemur"]
},
{ "key": "Env",
"values":["qa"]
}
]
}
]
Filter-set that I came up with:
jq -r '.[] | .tags[].key |= ascii_downcase | .env = (.tags[] | select(.key == "env").values[0])|[.accountid,.name,.env] | #csv' test.json
Example output with undesirable extra line:
111,"integration1","prod"
222,"integration2","prod"
333,"integration3","test"
333,"integration3","qa" <<<
Try using first(<expression>) to get only the first matching value. In case there are no matching values at all, you can use first(<expression>, <default_value>).
jq -r '.[] | .tags[].key |= ascii_downcase | .env = first((.tags[] | select(.key == "env").values[0]),null)|[.accountid,.name,.env] | #csv' test.json
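With the sample input above, this should now produce exactly one row per object:
111,"integration1","prod"
222,"integration2","prod"
333,"integration3","test"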
Alternatively, if you are going to want to extract other tags similarly, you might prefer to extract them all into one object like this. I'm using reverse to meet your requirement of keeping the first match for any given key, otherwise the last match would win.
jq -r '.[] | .tags |= ( map({(.key|ascii_downcase): .values[0]}) | reverse | add ) | [.accountid, .name, .tags.env] | #csv'
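To see why the reverse matters: for integration3 the map step should yield [{"env":"test"},{"team":"lemur"},{"env":"qa"}]; add lets later objects overwrite earlier ones, so reversing first means the original first occurrence wins and .tags ends up as {"env":"test","team":"lemur"}.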

Trying to filter an array output with jq

I have the given input as such:
[{
"ciAttributes": {
"entries": "{\"hostname-cdc1.website.com\":[\"127.0.0.1\"],\"hostname-cdc1-extension.website.com\":[\"127.0.0.1\"]}"
},
"ciAttributes": {
"entries": "{\"hostname-dfw1.website.com\":[\"127.0.0.1\"],\"hostname-dfw1-extension.website.com\":[\"127.0.0.1\"]}"
},
"ciAttributes": {
"entries": "{\"hostname-cdc2.website.com\":[\"127.0.0.1\"],\"hostname-cdc2-extension.website.com\":[\"127.0.0.1\"]}"
},
"ciAttributes": {
"entries": "{\"hostname-dfw2.website.com\":[\"127.0.0.1\"],\"hostname-dfw2-extension.website.com\":[\"127.0.0.1\"]}"
},
}]
...and when I execute my jq with the following command (manipulating existing json):
jq '.[].ciAttributes.entries | fromjson | keys | [ { hostname: .[0] }] | add' | jq -s '{ instances: . }'
...I get this output:
{
"instances": [
{
"hostname": "hostname-cdc1.website.com"
},
{
"hostname": "hostname-dfw1.website.com"
},
{
"hostname": "hostname-cdc2.website.com"
},
{
"hostname": "hostname-dfw2.website.com"
}
]
}
My end goal is to only extract "hostnames" that contain "cdc." I've tried playing with the json select expression but I get a syntax error so I'm sure I'm doing something wrong.
First, there is no need to call jq more than once.
Second, because the main object does not have distinct key names, you would have to use the --stream command-line option.
Third, you could use test to select the hostnames of interest, especially if as seems to be the case, the criterion can most easily be expressed as a regex.
So here in a nutshell is a solution:
Invocation
jq -n --stream -c -f program.jq input.json
program.jq
{instances:
[inputs
| select(length==2 and (.[0][-2:] == ["ciAttributes", "entries"]))
| .[-1]
| fromjson
| keys_unsorted[]
| select(test("cdc.[.]"))]}
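For orientation: with --stream each duplicated ciAttributes.entries leaf arrives as its own [path, value] event, roughly [[0,"ciAttributes","entries"],"{\"hostname-cdc1.website.com\":...}"], so the duplicate keys are no longer collapsed; the program above picks out those leaves, parses each string with fromjson, and keeps only the key names matching the cdc pattern.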

MongoDB $lookup based on two schema

I would like to know how to query in MongoDB based on two other schema collections.
I have 3 schemas in MongoDB:
1. Site : {_id, name}
2. Components : { _id, siteId, details }
3. Maintenance : { _id, siteId }
I want to query and get all the components with their site information, while at the same time ensuring that they are not in maintenance.
I am able to fetch components with site information with following query:
componentCollection
.aggregate([
{
$lookup: {
from: 'sites',
localField: 'siteId',
foreignField: '_id',
as: 'sites',
},
},
])
How do I update this query so that I can ensure the selected components' sites are not in the Maintenance collection?
Update with sample data, expected and current output:
`Site`
---------------------------
_id | name
---------------------------
1 | site1
---------------------------
2 | site2
---------------------------
`Components`
---------------------------
_id | siteId | details
---------------------------
3 | 1 | help & support
---------------------------
4 | 2 | footer links
---------------------------
`Maintenance`
---------------------------
_id | siteId
---------------------------
5 | 1
---------------------------
With the above query and sample data I am getting the following result:
[
{
_id: 3,
siteId: 1,
details: 'help & support',
sites: [
{
_id: 1,
name: 'site1'
}
]
},
{
_id: 4,
siteId: 2,
details: 'footer links',
sites: [
{
_id: 2,
name: 'site2'
}
]
}
]
But I want only the result below, as site1 is in maintenance mode:
[
{
_id: 4,
siteId: 2,
details: 'footer links',
sites: [
{
_id: 2,
name: 'site2'
}
]
}
]
This might help you.
Join the Components collection with the Site collection.
Inside that join, do a nested $lookup from Site to Maintenance, because if Maintenance has the site we can easily eliminate the object.
Using $filter, keep an object in the sites array only if its joinMaintenance array is empty.
So a site stays in sites only when it has no Maintenance documents.
Here is the code:
[
{
"$lookup": {
"from": "Site",
"let": {
"sId": "$siteId"
},
"pipeline": [
{
$match: {
$expr: {
$eq: [
"$_id",
"$$sId"
]
}
}
},
{
$lookup: {
from: "Maintenance",
let: {
"smId": "$_id"
},
pipeline: [
{
"$match": {
$expr: {
$eq: [
"$siteId",
"$$smId"
]
}
}
}
],
as: "joinMaintenance"
}
}
],
as: "sites"
}
},
{
$project: {
details: 1,
siteId: 1,
sites: {
$filter: {
input: "$sites",
cond: {
$eq: [
"$$this.joinMaintenance",
[]
]
}
}
}
}
},
{
$match: {
$expr: {
$ne: [
"$sites",
[]
]
}
}
},
{
$project: {
"sites.joinMaintenance": 0
}
}
]
Working Mongo playground
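Note that this is only the pipeline array; it is meant to be passed to aggregate on the components collection, for example (using the variable name from the question's own code):
componentCollection.aggregate([ /* the stages above */ ])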
