Error creating an index in MongoDB using the mongo console

I am using this query to create the index:
db.CollectionName.createIndex({result: {$exists:true}, timestamp : {$gte: 1573890921898000}})
What I am trying to do here is create an index on timestamp > last month, and only for those documents where result exists, but I am getting this error:
{
    "ok" : 0,
    "errmsg" : "Values in v:2 index key pattern cannot be of type object. Only numbers > 0, numbers < 0, and strings are allowed.",
    "code" : 67,
    "codeName" : "CannotCreateIndex"
}
What am I doing wrong here?

Thanks to @prasad_ for the suggestion: I needed to use a partial index, so the query becomes:
db.CollectionName.createIndex({result: 1, timestamp: 1}, {partialFilterExpression :{result: {$exists:true}, timestamp : {$gte: 1573890921898000}}})
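A side note (my addition, not from the original answer): MongoDB only selects a partial index when it can prove the query filter matches a subset of the partialFilterExpression, so queries must constrain both fields accordingly. A minimal sketch, reusing the names from the question:
// This filter is a subset of the partialFilterExpression above, so the
// partial index on { result: 1, timestamp: 1 } is eligible for this query.
db.CollectionName.find({
    result: { $exists: true },
    timestamp: { $gte: 1573890921898000 }
})
A query that omits either predicate (for example, one filtering on timestamp alone) would fall back to a collection scan or another index, because documents outside the partial filter are not indexed at all.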

Related

MongoDB TTL index not expiring documents from collection

I have a TTL index on collection fct_in_ussd, created as follows:
db.fct_in_ussd.createIndex(
{"xdr_date":1},
{ "background": true, "expireAfterSeconds": 259200}
)
{
    "v" : 2,
    "key" : {
        "xdr_date" : 1
    },
    "name" : "xdr_date_1",
    "ns" : "appdb.fct_in_ussd",
    "background" : true,
    "expireAfterSeconds" : 259200
}
with an expiry of 3 days. A sample document in the collection looks like this:
{
    "_id" : ObjectId("5f4808c9b32ewa2f8escb16b"),
    "edr_seq_num" : "2043019_10405",
    "served_imsi" : "",
    "ussd_action_code" : "1",
    "event_start_time" : ISODate("2020-08-27T19:06:51Z"),
    "event_start_time_slot_key" : ISODate("2020-08-27T18:30:00Z"),
    "basic_service_key" : "TopSim",
    "rate_event_type" : "",
    "event_type_key" : "22",
    "event_dir_key" : "-99",
    "srv_type_key" : "2",
    "population_time" : ISODate("2020-08-27T19:26:00Z"),
    "xdr_date" : ISODate("2020-08-27T19:06:51Z"),
    "event_date" : "20200827"
}
Problem statement: documents are not getting removed from the collection. The collection still contains documents that are 15 days old.
MongoDB server version: 4.2.3
Block compression strategy is zstd
storage.wiredTiger.collectionConfig.blockCompressor: zstd
The field xdr_date is also part of another compound index.
Observations as of Sep 24:
I have 5 collections with TTL indexes.
It turns out that data is getting removed from one of the collections, while the rest remain unaffected.
The daily insertion rate is ~500M records (across all 5 collections).
This observation left me confused.
The TTL expiration monitor runs on a single thread. Is this too much data for TTL to expire?
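Not part of the original post, but one way to see what the TTL monitor is actually doing (both commands exist in MongoDB 4.2) is to check the TTL counters in serverStatus and confirm the monitor has not been disabled; a minimal sketch from the mongo shell:
// "passes" counts completed TTL sweep runs (one per minute by default);
// "deletedDocuments" counts documents removed by the TTL monitor.
db.serverStatus().metrics.ttl
// Confirm the TTL monitor itself is enabled (it defaults to true).
db.adminCommand({ getParameter: 1, ttlMonitorEnabled: 1 })
If passes keeps increasing but deletedDocuments cannot keep up with an insert rate of ~500M records/day, the single TTL thread may simply be falling behind.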

How does clustering help with query pruning in Snowflake?

I have a table clustered on S_NATIONKEY, as below.
create or replace table t1 (
    S_SUPPKEY string,
    S_NAME string,
    S_NATIONKEY string,
    S_ADDRESS string,
    S_ACCTBAL string
) cluster by (S_NATIONKEY);
Now I have added data to it:
INSERT INTO T1
SELECT S_SUPPKEY , S_NAME,S_NATIONKEY,S_ADDRESS,S_ACCTBAL
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."SUPPLIER"
WHERE S_NATIONKEY=7
limit 50000;
When I check the data distribution in the underlying micro-partitions, it looks good:
>select system$clustering_information('t1','S_NATIONKEY');
{ "cluster_by_keys" : "LINEAR(S_NATIONKEY)", "total_partition_count" : 1, "total_constant_partition_count" : 0, "average_overlaps" : 0.0, "average_depth" : 1.0, "partition_depth_histogram" : {
"00000" : 0,
"00001" : 1,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0 } }
I then loaded a few more records, one batch per S_NATIONKEY value, as below.
--batch load 2
INSERT INTO T1
SELECT S_SUPPKEY , S_NAME,S_NATIONKEY,S_ADDRESS,S_ACCTBAL
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."SUPPLIER"
WHERE S_NATIONKEY=3
LIMIT 50000;
--batch load 3
INSERT INTO T1
SELECT S_SUPPKEY , S_NAME,S_NATIONKEY,S_ADDRESS,S_ACCTBAL
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."SUPPLIER"
WHERE S_NATIONKEY=1
limit 50000;
--batch load 4
INSERT INTO T1
SELECT S_SUPPKEY , S_NAME,S_NATIONKEY,S_ADDRESS,S_ACCTBAL
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."SUPPLIER"
WHERE S_NATIONKEY=2
and S_ACCTBAL>0
limit 50000;
Now when I check the clustering information again, it also looks good. There are now 4 micro-partitions in total, and each distinct S_NATIONKEY value set is loaded into its own partition with no overlap in ranges, so every micro-partition has a clustering depth of 1.
>select system$clustering_information('t1','S_NATIONKEY');
{
    "cluster_by_keys" : "LINEAR(S_NATIONKEY)",
    "total_partition_count" : 4,
    "total_constant_partition_count" : 4,
    "average_overlaps" : 0.0,
    "average_depth" : 1.0,
    "partition_depth_histogram" : {
        "00000" : 0,
        "00001" : 4,
        "00002" : 0,
        "00003" : 0,
        "00004" : 0,
        "00005" : 0,
        "00006" : 0,
        "00007" : 0,
        "00008" : 0,
        "00009" : 0,
        "00010" : 0,
        "00011" : 0,
        "00012" : 0,
        "00013" : 0,
        "00014" : 0,
        "00015" : 0,
        "00016" : 0
    }
}
Now, as per the Snowflake documentation and the concept of query pruning, whenever we search for records belonging to one cluster key value, the query should scan only the micro-partition holding that value (based on the min/max value range of each micro-partition). But in my case it is scanning all the underlying micro-partitions:
[screenshot of the query profile, showing all 4 partitions scanned instead of 1]
Am I missing anything here? What is the logic behind this?
Please help me understand this scenario in Snowflake.
Thanks,
Himanshu
Autoclustering and clustering keys are not intended for all tables; they are usually suggested for very large tables that run into terabytes in size. We should not compare a cluster key to the index-like objects available in most RDBMS systems. Here we are grouping the data into micro-partitions in an orderly fashion, which helps to avoid scanning partitions that cannot contain the requested data. In the case of small tables, the engine prefers to scan all the partitions if it estimates that this is not a costly operation.
Refer to the Attention section of the documentation:
https://docs.snowflake.com/en/user-guide/tables-clustering-keys.html#clustering-keys-clustered-tables
Here the size of the table is not that big, which is why it is scanning all the partitions rather than one. Even if you check the total size scanned, it is just 7.96 MB, which is small, hence Snowflake scans all partitions.
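To make this concrete (my illustration, not part of the original answer): pruning shows up in the query profile as "Partitions scanned" versus "Partitions total". A minimal sketch of a point lookup on the clustering key against the table above:
-- Point lookup on the clustering key (S_NATIONKEY is a string column here).
-- On a small table like t1 the profile may still show all partitions scanned;
-- on a multi-terabyte table the per-partition min/max metadata lets Snowflake
-- skip every micro-partition whose range cannot contain '7'.
select * from t1 where S_NATIONKEY = '7';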

MongoDB aggregate: find total number of employees grouped by state

Find the total number of employees grouped by state, using aggregate.
I tried the following, but the result is 0.
db.research.aggregate([
    {$unwind: '$offices'},
    {"$match": {'offices.country_code': "USA"}},
    {"$project": {'offices.state_code': 1}},
    {"$group": {"_id": '$offices.state_code', "count": {"$sum": '$number_of_employees'}}}
])
You need to $project number_of_employees so it is still available to the next stage, or you can remove the $project stage entirely:
db.research.aggregate([
    {$unwind: '$offices'},
    {"$match": {'offices.country_code': "USA"}},
    {"$project": {'offices.state_code': 1, 'number_of_employees': 1}}, // project number_of_employees
    {"$group": {"_id": '$offices.state_code', "count": {"$sum": '$number_of_employees'}}}
])
or
db.research.aggregate([
    {$unwind: '$offices'},
    {"$match": {'offices.country_code': "USA"}},
    {"$group": {"_id": '$offices.state_code', "count": {"$sum": '$number_of_employees'}}}
])

JSONata sort / order by of an array

I want to sort an array. The JSONata expression below receives an incoming array like this:
[{"id":"Air-1a",
"Controller":"ESP62",
"Cntr-TaskNo":10,
"Cntr-GPIO":13,
"name":"Air",
"valueName":"Humidity",
"Sensor":"DHT22",
(and many other key pairs)},
{next object}, ...]
I then transform the array with the following JSONata expression:
payload.(
{ "Controller" : $.Controller,
"Cntr-TaskNo": $.CntrDef.TaskNo,
"Cntr-GPIO" : $.CntrDef.GPIO,
"name" : $.name,
"valueName" : $.valueName,
"Sensor" : $.Sensor,
"id" : $.id
}
)
But now I want to, in the same JSONata expression, sort first on the Controller and then on the GPIO. I tried with the Controller only first.
I tried:
payload.(
{ $sort("Controller",function($l, $r){$l.Controller > $r.Controller}) : $.Controller ,
"Cntr-TaskNo": $.CntrDef.TaskNo,
"Cntr-GPIO" : $.CntrDef.GPIO,
"name" : $.name,
"valueName" : $.valueName,
"Sensor" : $.Sensor,
"id" : $.id
}
)
I also tried adding the sort function at the end with the ~> chain operator, and I tried the order-by operator as well.
Could anyone point me in the right direction?
//----------
The new flow, with 'ESP62' changed to '-', that does not work:
[{"id":"874b0c77.f87418","type":"inject","z":"6f27a311.d135bc","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":200,"y":180,"wires":[["8c196590.c20638"]]},{"id":"8c196590.c20638","type":"change","z":"6f27a311.d135bc","name":"Dataset","rules":[{"t":"set","p":"payload","pt":"msg","to":"[{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":10,\"CntrGPIO\":13,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"111\",\"bbb\":\"222\",\"ccc\":\"333\"},{\"id\":\"Air-2a\",\"Controller\":\"ESP72\",\"CntrTaskNo\":11,\"CntrGPIO\":14,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"444\",\"bbb\":\"555\",\"ccc\":\"666\"},{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":2,\"CntrGPIO\":9,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"},{\"id\":\"Air-1a\",\"Controller\":\"-\",\"CntrTaskNo\":10,\"CntrGPIO\":12,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"}]","tot":"json"}],"action":"","property":"","from":"","to":"","reg":false,"x":360,"y":180,"wires":[["13981162.14e28f"]]},{"id":"c8a256a5.a170c8","type":"debug","z":"6f27a311.d135bc","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":690,"y":180,"wires":[]},{"id":"13981162.14e28f","type":"change","z":"6f27a311.d135bc","name":"Jsonata $sort","rules":[{"t":"set","p":"payload","pt":"msg","to":"($sort(payload,function($l , $r){$l.Controller > $r.Controller}) ; \t$sort(payload,function($l , $r){$l.CntrGPIO > $r.CntrGPIO}))","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":180,"wires":[["c8a256a5.a170c8"]]}]
I suggest first sorting the dataset and afterwards transforming the already sorted array of objects. The transformation is trivial, and since you want to know how to sort, I show one possible solution below. It uses an expression with two concatenated $sort functions.
Edited after a better understanding of the requirement.
I tested successfully a Node-RED flow using this expression in a change node:
($a := $sort(payload,function($l , $r){$l.Controller > $r.Controller}) ; $sort($a,function($l , $r){(($l.Controller = $r.Controller) and ($l.CntrGPIO > $r.CntrGPIO))}))
Flow (contains the dataset hardcoded):
[{"id":"a7814b7e.3adeb8","type":"tab","label":"Flow 4","disabled":false,"info":""},{"id":"8bf10833.c71748","type":"inject","z":"a7814b7e.3adeb8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":140,"wires":[["9e365564.edca08"]]},{"id":"9e365564.edca08","type":"change","z":"a7814b7e.3adeb8","name":"Dataset","rules":[{"t":"set","p":"payload","pt":"msg","to":"[{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":10,\"CntrGPIO\":13,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"111\",\"bbb\":\"222\",\"ccc\":\"333\"},{\"id\":\"Air-2a\",\"Controller\":\"ESP72\",\"CntrTaskNo\":11,\"CntrGPIO\":14,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"444\",\"bbb\":\"555\",\"ccc\":\"666\"},{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":2,\"CntrGPIO\":9,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"},{\"id\":\"Air-1a\",\"Controller\":\"-\",\"CntrTaskNo\":10,\"CntrGPIO\":12,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"}]","tot":"json"}],"action":"","property":"","from":"","to":"","reg":false,"x":300,"y":140,"wires":[["762f6421.074fec"]]},{"id":"f827bddb.c9acd","type":"debug","z":"a7814b7e.3adeb8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":630,"y":140,"wires":[]},{"id":"762f6421.074fec","type":"change","z":"a7814b7e.3adeb8","name":"Jsonata $sort","rules":[{"t":"set","p":"payload","pt":"msg","to":"($a := $sort(payload,function($l , $r){$l.Controller > $r.Controller}) ; $sort($a,function($l , $r){(($l.Controller = $r.Controller) and ($l.CntrGPIO > $r.CntrGPIO))}))","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":460,"y":140,"wires":[["f827bddb.c9acd"]]}]
Also tested in the JSONata Exerciser: http://try.jsonata.org/S1IlT3y-E
You can sort the array using the following expression:
payload^(Controller, CntrDef.GPIO)
The order-by operator ^ will sort the array, first by increasing value of Controller, then by increasing value of CntrDef.GPIO. You can then transform each object within that array:
payload^(Controller, CntrDef.GPIO).{
"Controller" : Controller,
"Cntr-TaskNo": CntrDef.TaskNo,
"Cntr-GPIO" : CntrDef.GPIO,
"name" : name,
"valueName" : valueName,
"Sensor" : Sensor,
"id" : id
}
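An aside (my addition, not from the original answer): JSONata's order-by operator also accepts a > prefix per term for descending order, so with the same field names the following would keep Controller ascending while sorting the GPIO from highest to lowest within each controller:
payload^(Controller, >CntrDef.GPIO).{
    "Controller" : Controller,
    "Cntr-GPIO" : CntrDef.GPIO,
    "id" : id
}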

Adding child node values

Below is the Firebase database child node of a particular user under the "users" node:
"L1Bczun2d5UTZC8g2LXchLJVXsh1" : {
"email" : "orabbz#yahoo.com",
"fullname" : "orabueze yea",
"teamname" : "orabbz team",
"total" : 0,
"userName" : "orabbz#yahoo.com",
"week1" : 0,
"week10" : 0,
"week11" : 0,
"week12" : 0,
"week2" : 0,
"week3" : 17,
"week4" : 0,
"week5" : 20,
"week6" : 0,
"week7" : 0,
"week8" : 0,
"week9" : 10
},
Is there a way to add up all the values of week1 through week12 and store the sum in the total key?
I am currently thinking of bringing all the values of week1 - week12 into the AngularJS scope, adding up the values, and then posting the total back to the Firebase database key total. But this sounds long-winded. Is there a shorter solution?
As far as I know, the Firebase database doesn't have server-side aggregation functions as you'd have in SQL. So the options you have are to get the data and calculate the total in Angular, as you say, or to update a counter whenever a week value is written to the user (at write time) and then just read the counter later.
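A minimal sketch of the read-sum-write approach, assuming the namespaced (v8-style) Firebase JS SDK and the data shape from the question; the exact API calls depend on your SDK version:
// Read the user's node once, sum week1..week12, write the result to "total".
var ref = firebase.database().ref('users/L1Bczun2d5UTZC8g2LXchLJVXsh1');
ref.once('value').then(function (snap) {
    var user = snap.val();
    var total = 0;
    for (var i = 1; i <= 12; i++) {
        total += user['week' + i] || 0; // treat missing weeks as 0
    }
    return ref.child('total').set(total);
});
The write-time counter alternative avoids the extra read: whenever you write a weekN value, update total in the same update() call so the two cannot drift apart.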
