MongoDB: find and delete records matching in another collection

I have two collections in my database.
First collection's name is init_records. A record document looks like,
if automatically created:
{
  "_id": 0,
  "email": "a#xxx.com",
  "source": "campaign"
}
else if manually created by the user:
{
  "_id": 1,
  "email": "a#xxx.com",
  "type": "typeA",
  "isActive": false,
  "process": "waiting"
}
As you can see, auto-created records have the field "source" indicating the record's origin, but they don't have the fields "type", "isActive" or "process"; "type" and "isActive" must be selected manually by the user, and after the user confirms the record, the "process" field is added and set to "waiting". A user may create an init_record themselves, or update an automatically created record (like _id: 0, where some required fields are missing and are added by the user) and submit it for processing.
The second collection's name is confirmed_records. If the user confirms an init record, a new confirmed record is created; for example, for the init_record with _id: 1 above:
{
  "_id": "aa1",
  "email": "a#xxx.com",
  "type": "typeA",
  "isActive": false
}
and the record in the init_records collection (_id: 1) is deleted. But if an error occurs during this process, no confirmed_record is created and the init_record is not deleted; instead, its "process" field is changed from "waiting" to "error".
Well, now I have decided to clean up some init_records, because users usually ignore automatically created records (init records with a "source" field) and manually create new records with the same email. That means that while they already have a record for an email,
{
  "_id": 10,
  "email": "count#xxx.com",
  "source": "commercial"
}
they ignore that record and create another record,
{
  "_id": 11,
  "email": "count#xxx.com",
  "type": "typeB",
  "isActive": false,
  "process": "waiting"
}
(As you can see, automatically created records have a "source" field but no "process" field. Manually created ones have "process" and "isActive" but no "source".)
Then, after they confirm the record they created, a confirmed_record is created:
{
  "_id": "aa85",
  "email": "count#xxx.com",
  "type": "typeB",
  "isActive": false
}
init_record _id: 11 is deleted but init_record _id: 10 still remains. When users check an email (count#xxx.com in this example) they see both a confirmed record and an init (not confirmed) record, and this is confusing.
Now I need to delete some init_records, because some users are very unhappy.
For an email:
If there is only one init_record for the email (automatically or manually created), it should remain.
If there is an automatically created init_record and the user has ignored it and created a second init_record (like _id: 10 and 11 in the example above), the automatically created init_record (the one with no "process" or "isActive" field) must be deleted.
If there is an automatically created init_record, the user has ignored it, created a second init_record (_id: 10 and 11) and then confirmed it (as in the example above: _id: 11 is deleted and confirmed_record _id: "aa85" is created), the remaining init_record (_id: 10) must be deleted.
If there is an automatically created init_record, the user has ignored it, created a second init_record (_id: 10 and 11), then ignored the newly created one and confirmed the automatically created one (this time _id: 10 is deleted and a confirmed_record is created), the remaining init_record (this time _id: 11) must be deleted.
I created an example on Mongo Playground with sample data:
https://mongoplayground.net/p/gdHsOPF-Z9F
In short:
init_record a#xxx.com also exists in confirmed_records, but since this record was created by a user (it has the "type" and "isActive" fields) it should remain.
init_record c#xxx.com (automatically created) also exists in confirmed_records; the init_record must be deleted.
init_record d#xxx.com (automatically created) also exists in confirmed_records (there are 2 confirmed_records with different types); the init_record must be deleted.
init_record f#xxx.com does not exist in confirmed_records; it should remain.
init_record g#xxx.com does not exist in confirmed_records; it should remain.
There are 2 init_records for h#xxx.com: one is automatically created, the other is manually created by a user. Only the automatically created one (the one with missing fields) must be deleted.
(_ids 3, 6 and 13 must be deleted; 10, 12, 14 and 17 should remain.)
So, if I check the records with the same email in both collections,
db.init_records.aggregate([
  {
    "$lookup": {
      "from": "confirmed_records",
      "localField": "email",
      "foreignField": "email",
      "as": "relation"
    }
  },
  {
    "$match": {
      "$and": [
        { "relation._id": { "$exists": true } },
        { "type": { "$exists": false } }
      ]
    }
  }
])
This aggregate lists 3 records: the emails that have both an init and a confirmed record, and whose init_record has no "type" field (and therefore no "process" field either), which means they were never confirmed by the user. But I don't know how to turn this query into a delete; as far as I know, the deleteMany method deletes records matching a filter, not an aggregation pipeline.
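One way around that limitation, sketched here in mongosh with the same collection and field names as above, is to run the aggregation, collect the matching _ids on the client, and then pass them to deleteMany as a plain filter:

// Run the aggregation above, keep only the _ids, then delete with a normal filter.
var ids = db.init_records.aggregate([
  {
    "$lookup": {
      "from": "confirmed_records",
      "localField": "email",
      "foreignField": "email",
      "as": "relation"
    }
  },
  {
    "$match": {
      "relation._id": { "$exists": true },   // email also present in confirmed_records
      "type": { "$exists": false }           // auto-created, never filled in by a user
    }
  },
  { "$project": { "_id": 1 } }
]).toArray().map(function (doc) { return doc._id; });

db.init_records.deleteMany({ "_id": { "$in": ids } });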
My real problem is that I also have to find and delete duplicate init_records (if there is more than one init_record with the same "email" and at least one of them is manually created, the automatically created records must be removed), and I have no clue what to do there. I can detect _ids 3 and 6 (those emails also exist in confirmed_records and the records have no "type" field), but I cannot detect _id: 13, which is an automatically created record with a manually created duplicate.
I have a feeling that I am missing something here. Is there an easy way to handle this, i.e. finding records that also exist in the other collection, finding duplicate records (under the conditions above), and deleting the unwanted records obtained from the aggregate? Any ideas are appreciated; I have already spent many hours but couldn't find a solution yet. If you think this cannot be done with a few simple queries, I will have to write some code to work on the live data (maybe a long batch process that checks all the records one by one, which is still doable for me). Thank you for reading this.
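For the duplicate case (_id: 13), one possible approach, sketched here and not tested against the playground data, is a $lookup from init_records into itself on email, keeping only the auto-created documents that have a manually created sibling:

// Auto-created records (no "type") whose email also appears on a manually created init_record.
var dupIds = db.init_records.aggregate([
  { "$match": { "type": { "$exists": false } } },           // auto-created candidates
  {
    "$lookup": {
      "from": "init_records",                               // self-lookup on the same collection
      "localField": "email",
      "foreignField": "email",
      "as": "sameEmail"
    }
  },
  { "$match": { "sameEmail.type": { "$exists": true } } },  // at least one manual duplicate exists
  { "$project": { "_id": 1 } }
]).toArray().map(function (doc) { return doc._id; });

db.init_records.deleteMany({ "_id": { "$in": dupIds } });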

Related

Update a subtotal field every time an item is added to / deleted from a different collection

I need your help and guidance on the best way to update a field every time an item is added to a table or deleted from that table.
In my main collection I have a field called site that is populated with one of these values => a, b or c.
Because I was not able to do this with GraphQL, the quick and easy fix was to have this subTotal collection that stores the subtotals of those 3 values. Now I need to update this collection with lifecycle hooks or with a cronjob, and I don't know how to do this with either of these 2 methods.
I am looking for something like this
Add a new item with value a in the site field, then do a +1 on the field "a" in my new collection.
The same applies when deleting an item with the value "c" in the site field: then do a -1 on the value of the field "c" in the subTotal collection.
POST Old.site.a ==> New.a +1 | DEL Old.site.b ==> New.b -1
This new collection stores the subtotals of each category I have. I did this just because I was not able to retrieve the subtotals with a GraphQL query. Please see here ==> GraphQL Subcategory Count Aggregated Query
What I was looking for was a query that would retrieve all the subtotals of the site field from the webs collection in a format like this:
{
  "data": {
    "webs": {                // for each value of the site field, retrieve the subtotal
      "meta": {
        "pagination": {
          "a": { "total": 498 },
          "b": { "total": 3198 },
          "c": { "total": 998 }
        }
      }
    }
  }
}
I know that it would be a big stress for Strapi to update these fields every time a POST or DEL request is made, so maybe a cronjob that runs every 5 minutes or so would suffice and would be a better idea.
Could you please help me?
I would owe you a lot!
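For what it's worth, here is a rough sketch of the lifecycle-hook approach, assuming Strapi v4; the file path, the content-type UIDs (api::web.web, api::subtotal.subtotal) and the field names are placeholders based on the question, not a verified setup:

// src/api/web/content-types/web/lifecycles.js  (path and UIDs are assumptions)
module.exports = {
  async afterCreate(event) {
    // event.result holds the entry that was just created
    await adjustSubtotal(event.result.site, +1);
  },
  async afterDelete(event) {
    // event.result holds the entry that was deleted
    await adjustSubtotal(event.result.site, -1);
  },
};

// Hypothetical helper: bump the matching counter in the single subTotal entry.
async function adjustSubtotal(site, delta) {
  const subtotal = await strapi.db.query('api::subtotal.subtotal').findOne({});
  await strapi.db.query('api::subtotal.subtotal').update({
    where: { id: subtotal.id },
    data: { [site]: subtotal[site] + delta },   // site is "a", "b" or "c"
  });
}

A cronjob that simply recounts the webs collection and rewrites the three totals every few minutes would also work, and is more forgiving if an event is ever missed.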

PostgreSQL: add a key to every JSON array element

In my database I have a table with a column called items; this column has the jsonb type and almost every record has data like this one:
{"items": [{"id": "item-id", "sku": "some-sku", "quantity": 1, "master_sku": "some-master-sku"}]}
I need to create a migration which adds a type to every item, so it should look like this:
{"items": [{"id": "item-id", "sku": "some-sku", "quantity": 1, "master_sku": "some-master-sku", "type": "product"}]}
I must add the type to every record, and every record looks the same.
The problem is that I have about a million records in my table, and I cannot iterate over every item and add the new type one by one because it could take too long and my deploy script may crash.
How can I do that in the simplest way?
As long as the format/content is always more or less the same, the fastest way to do it would probably be as a string operation, something like this:
UPDATE your_table
SET your_field = REGEXP_REPLACE(your_field::TEXT, '\}(.+?)', ', "type": "product"}\1', 'g')::JSONB
WHERE your_field IS NOT NULL;
Example: https://www.db-fiddle.com/f/fsHoFKz9szpmV5aF4dt7r/1
This just checks for any } character which is followed by something (so we know it's not the final one from the "items" object), and replaces it with the missing key/value (and whatever character was following).
Of course this performs practically no validation, such as whether that key/value already exists. Where you want to be between performance and correctness depends on what you know about your data.

Store sync: Many deletions, some failed

I have a store in which the user could delete multiple records with a single destroy operation.
Now, a few of these records are locked in the database (because someone else is working on them), and thus cannot be deleted. How can the server tell the frontend that the deletion of records with Id a, b, c was successful, but that records with Id x, y, z could not be deleted and should be moved back into the store and displayed in the grid?
The ExtJS store should know after the sync() which records were really deleted server-side, and which weren't.
I think there's no straightforward solution to this problem. I have opted for the following workaround:
The records now have an "IsDeleted" flag that is set to false by default:
fields: [{
    ...
}, {
    name: 'IsDeleted',
    type: 'bool',
    defaultValue: false
}]
The store has a filter that hides entries where the flag is set to true:
filters: [{
    property: 'IsDeleted',
    value: false
}]
When the user opts to delete, I don't remove entries from the store; instead I set the IsDeleted flag to true on those entries. The filter makes the user think that the entry has been deleted.
When the store syncs, it does an update operation, not a destroy operation. The update endpoint of the API then has to delete all entries where IsDeleted is transmitted as true. If it can't delete an entry from the database, the corresponding JSON returned to the client gets IsDeleted set to false, so the frontend knows that the deletion of that entry failed.
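For completeness, a minimal sketch of the delete handler under this workaround (the grid and store variables are illustrative, not part of the original answer):

// Instead of store.remove(record), flag the selected records; the IsDeleted filter hides them.
grid.getSelectionModel().getSelection().forEach(function (record) {
    record.set('IsDeleted', true);   // marks the record dirty, so sync() sends an update
});

// The server deletes the flagged rows it can, and returns IsDeleted: false for the
// locked ones, so those records reappear once the filter is re-evaluated.
store.sync();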

Getting the latest (timestamp-wise) value from a Cloudant query

I have a cloudant DB where each document looks like:
{
  "_id": "2015-11-20_attr_00",
  "key": "attr",
  "value": "00",
  "employeeCount": 12,
  "timestamp": "2015-11-20T18:16:05.366Z",
  "epocTimestampMillis": 1448043365366,
  "docType": "attrCounts"
}
For a given attribute there is an employee count, and as you can see I have a record for the same attribute every day. I am trying to create a view or index that will give me the latest record for this attribute: if I inserted a record on 2015-10-30 and another on 2015-11-10, then only the employee count of the record with timestamp 2015-11-10 should be returned.
I have tried a view, but I am getting all the entries for each attribute, not just the latest. I did not look at indexes because I thought they do not get pre-calculated. I will be querying this from the client side, so having it pre-calculated (like views are) is important.
Any guidance would be most appreciated. Thank you.
I created a test database you can see here. Just make sure that when you insert your JSON documents into Cloudant (or CouchDB), your timestamps are not strings but JavaScript date objects:
https://examples.cloudant.com/latestdocs/_all_docs?include_docs=true
I built a search index like this (name the design doc "summary" and the search index "latest"):
function (doc) {
  if (doc.docType == "totalEmployeeCounts" && doc.key == "div") {
    index("division", doc.value, {"store": true});
    index("timestamp", doc.timestamp, {"store": true});
  }
}
Then here's a query that will return only the latest record for each division. Note that the limit value will apply to each group, so with limit=1, if there are 4 groups you will get 4 documents not 1.
https://examples.cloudant.com/latestdocs/_design/summary/_search/latest?q=*:*&limit=1&group_field=division&include_docs=true&sort_field=-timestamp
Indexing TimeStamp as a string is not recommended.
Reference:
https://cloudant.com/blog/defensive-coding-in-mapindex-functions/#.VvRVxtIrJaT
I had the same problem. I converted the timestamp value to milliseconds (a number) and then indexed that value inside the index function:
var millis = Date.parse(doc.timestamp);
index("millis", millis, {"store": false});
You can use the same query as Raj suggested, but with the 'millis' field instead of the timestamp.
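Put together with the design document from the first answer, the whole index function would look roughly like this (a sketch; the docType/key filter and the division field are copied from the example above):

function (doc) {
  if (doc.docType == "totalEmployeeCounts" && doc.key == "div") {
    index("division", doc.value, {"store": true});
    // index the timestamp as a number so it sorts correctly
    var millis = Date.parse(doc.timestamp);
    index("millis", millis, {"store": false});
  }
}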

Check first value in array, insert new conditionally

I have an array of "states" in my documents:
{
  "_id" : ObjectId("53026de61e30e2525d000004"),
  "states" : [
    {
      "name" : "complete",
      "userId" : ObjectId("52f4576126cd0cbe2f000005"),
      "_id" : ObjectId("53026e16c054fc575d000004")
    },
    {
      "name" : "active",
      "userId" : ObjectId("52f4576126cd0cbe2f000005"),
      "_id" : ObjectId("53026de61e30e2525d000004")
    }
  ]
}
I just insert a new state onto the front of the array when there is a new state. The current workaround until mongo 2.6 is released is here: Can you have mongo $push prepend instead of append?
However, I do not want users to be able to save the same state twice in a row, i.e. if it is already complete you should not be able to add another 'complete' state. Is there a way to check the first element in the array and only insert the new state if it's not the same, in one query/update command to mongo?
I say one query/update because mongo does not support transactions, so I don't want to query for the first element in the array and then send another update statement, as that could cause problems if another state got inserted between my query and my update.
You can qualify your update statement with a query, for example:
db.mydb.states.update(
  { "states.name": { $nin: ["newstate"] } },
  { $addToSet: { "states": { "name": "newstate" } } }
)
This will prevent updates from a user if the query part of the update returns no document. You can additionally add more fields to filter on in the query part.
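Since the question specifically asks about checking only the first element, here is a hedged variant for once MongoDB 2.6 is available, using positional dot notation plus $push with $position to prepend (the collection name mydocs is illustrative):

// Only push "complete" if the FIRST element is not already "complete".
db.mydocs.update(
  {
    "_id": ObjectId("53026de61e30e2525d000004"),
    "states.0.name": { "$ne": "complete" }
  },
  {
    "$push": {
      "states": {
        "$each": [ { "name": "complete", "userId": ObjectId("52f4576126cd0cbe2f000005") } ],
        "$position": 0        // prepend instead of append (MongoDB 2.6+)
      }
    }
  }
)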
