I'm querying a database containing entries like those in the example below. All entries contain the following values:
_id: unique id of the overallitem and of each placed_item
name: the name of the overallitem
loc: location of the overallitem and placed_items
time_id: time the overallitem was stored
placed_items: array of placed_items (can range from zero, i.e. placed_items: [], to an unlimited amount)
category_id: the category of a placed_item
full_id: the full id of a placed_item
I want to extract the name, full_id and category_id on a per-placed_item level, given a time_id and loc constraint.
Example data:
{
"_id" : "5040",
"name" : "entry1",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [],
}
{
"_id" : "5041",
"name" : "entry2",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [
{
"_id" : "5043",
"category_id" : 101,
"full_id" : 901,
},
{
"_id" : "5044",
"category_id" : 102,
"full_id" : 902,
}
],
}
{
"_id" : "5042",
"name" : "entry3",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [
{
"_id" : "5045",
"category_id" : 101,
"full_id" : 903,
},
],
}
The expected outcome for this example would be:
"name" "full_id" "category_id"
"entry2" 901 101
"entry2" 902 102
"entry3" 903 101
So if placed_items is empty, don't put the entry in the dataframe, and if placed_items contains n entries, put n rows in the dataframe.
I tried to rework an RBlogger example to create the desired dataframe:
#Set up database
mongo <- mongo.create()
#Set up condition
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "loc", 1)
mongo.bson.buffer.start.object(buf, "time_id")
mongo.bson.buffer.append(buf, "$gte", 20120930)
mongo.bson.buffer.append(buf, "$lte", 20121002)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
#Count
count <- mongo.count(mongo, "items_test.overallitem", query)
#Note that these counts don't work, since the count should be based on
#the number of placed_items in the array, and not the number of entries.
#Setup Cursor
cursor <- mongo.find(mongo, "items_test.overallitem", query)
#Create vectors, which will be filled by the while loop
name <- vector("character", count)
full_id<- vector("character", count)
category_id<- vector("character", count)
i <- 1
#Fill vectors
while (mongo.cursor.next(cursor)) {
b <- mongo.cursor.value(cursor)
name[i] <- mongo.bson.value(b, "name")
full_id[i] <- mongo.bson.value(b, "placed_items.full_id")
category_id[i] <- mongo.bson.value(b, "placed_items.category_id")
i <- i + 1
}
#Convert to dataframe
results <- as.data.frame(list(name=name, full_id=full_id, category_id=category_id))
The conditions work, and the code works if I want to extract values on an overallitem level (i.e. _id or name), but it fails to gather the information on a placed_items level. Furthermore, the dotted calls for extracting full_id and category_id do not seem to work. Can anyone help?
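One way around the dotted-path limitation (a sketch, untested, using rmongodb's mongo.bson.to.list()): convert each returned document to an R list and flatten placed_items in R, growing the vectors per placed_item instead of pre-sizing them from mongo.count():
# Convert each document to an R list and loop over its placed_items,
# so no dotted-path lookups are needed. Entries with an empty
# placed_items array are skipped automatically, matching the expected
# outcome above.
name <- character(0)
full_id <- character(0)
category_id <- character(0)
cursor <- mongo.find(mongo, "items_test.overallitem", query)
while (mongo.cursor.next(cursor)) {
    doc <- mongo.bson.to.list(mongo.cursor.value(cursor))
    for (item in doc$placed_items) {
        name <- c(name, doc$name)
        full_id <- c(full_id, item$full_id)
        category_id <- c(category_id, item$category_id)
    }
}
results <- data.frame(name = name, full_id = full_id, category_id = category_id)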
I have a TTL index on the collection fct_in_ussd, created as follows:
db.fct_in_ussd.createIndex(
{"xdr_date":1},
{ "background": true, "expireAfterSeconds": 259200}
)
{
"v" : 2,
"key" : {
"xdr_date" : 1
},
"name" : "xdr_date_1",
"ns" : "appdb.fct_in_ussd",
"background" : true,
"expireAfterSeconds" : 259200
}
with an expiry of 3 days. A sample document in the collection looks like this:
{
"_id" : ObjectId("5f4808c9b32ewa2f8escb16b"),
"edr_seq_num" : "2043019_10405",
"served_imsi" : "",
"ussd_action_code" : "1",
"event_start_time" : ISODate("2020-08-27T19:06:51Z"),
"event_start_time_slot_key" : ISODate("2020-08-27T18:30:00Z"),
"basic_service_key" : "TopSim",
"rate_event_type" : "",
"event_type_key" : "22",
"event_dir_key" : "-99",
"srv_type_key" : "2",
"population_time" : ISODate("2020-08-27T19:26:00Z"),
"xdr_date" : ISODate("2020-08-27T19:06:51Z"),
"event_date" : "20200827"
}
Problem statement: documents are not getting removed from the collection. The collection still contains 15-day-old documents.
MongoDB server version: 4.2.3
Block compression strategy is zstd
storage.wiredTiger.collectionConfig.blockCompressor: zstd
The field xdr_date is also part of another compound index.
Observations as of Sep 24:
I have 5 collections with TTL indexes.
It turns out that data is getting removed from one of the collections, while the rest remain unaffected.
The daily insertion rate is ~500M records (across all 5 collections).
This observation left me confused.
The TTL expiration runs on a single thread. Is this too much data for it to expire? (Two diagnostic checks are sketched below.)
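Two checks worth running before digging deeper (a sketch using standard mongo shell commands; note also that the TTL monitor only deletes on the primary):
// Confirm the TTL monitor is enabled; it can be switched off via setParameter.
db.adminCommand({ getParameter: 1, ttlMonitorEnabled: 1 })
// serverStatus tracks how many passes the TTL monitor has made and how many
// documents it has deleted since startup. If "passes" keeps growing while
// "deletedDocuments" stays flat, the monitor is running but not removing
// (or not keeping up with) expired documents.
db.serverStatus().metrics.ttl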
I have a database with one table that holds hundreds of records. I need to write a loop in a Groovy script that compares the first record with the second, the second with the third, and so on. I need to compare length changes between records and print out all changes higher than 30. Example: first record 30m, second record 40m, third record 100m. It should print out the second-third pair.
I don't know the number of records in the table, so I don't know how to size a for loop. Any suggestions?
Records also have an ip. Each ip can appear multiple times, and I need to compare all records within each ip.
record 1:
port_nbr | 1
pair | pairA
length | 30.00
add_date | 2020-06-16 00:01:13.237164
record 2:
port_nbr | 1
pair | pairA
length | 65.00
add_date | 2020-06-16 00:02:13.237164
record 3:
port_nbr | 2
pair | pairc
length | 65.00
add_date | 2020-06-16 00:02:13.237164
I expect the loop to check whether the current record's port_nbr matches the next record's; if yes, it checks whether the pair is the same, and if it is, it compares whether the length changed by 30+m. In this case it would report a 30+m change between records 1 and 2. After outputting that, it would compare the second and third records. But they don't have the same port_nbr and pair, so I expect it to start comparing all records with port_nbr 2 against the following records.
There could be as many as 10 records with port_nbr 1 but with different pairs. I need to check the pairs as well and only then compare lengths.
My code at this moment:
import java.sql.*;
import groovy.sql.Sql
class Main{
static void main(String[] args) {
def dst_db1 = Sql.newInstance('connection.........')
dst_db1.getConnection().setAutoCommit(false)
def sql = (" select d.* from (select d.*, lead((case when length <> 'N/A' then length else length_to_fault end)::float) over (partition by port_nbr, pair order by port_nbr, pair, d.add_date) as lengthh from diags d)d limit 10")
def lastRow = [id:-1, port_nbr:-1, pair:'', lengthh:-1.0]
dst_db1.eachRow( sql ) {row ->
if( row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair){
BigDecimal lengthChange =
new BigDecimal(row.lengthh ? row.lengthh : 0 ) - new BigDecimal(lastRow.lengthh ? lastRow.lengthh :0 )
if( lengthChange > 30.0){
print "Port ${row.port_nbr}, ${row.pair} length change: $lengthChange"
println "/tbetween row ID ${lastRow.id} and ${row.id}"
}
lastRow = row
}else{
println "Key Changed"
lastRow = row
}
}
}
}
The following code will report length changes > 30 within the same port_nbr and pair.
def sql = 'Your SQL here.' // Should include "order by pair, port_nbr, date"
def lastRow = [id:-1, port_nbr:-1, pair:'', length:-1.0]
dst_db1.eachRow( sql ) { row ->
if ( row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair ) {
BigDecimal lengthChange =
new BigDecimal( row.length ) - new BigDecimal( lastRow.length )
if ( lengthChange > 30.0 ) {
print "Port ${row.port_nbr}, ${row.pair} length change: $lengthChange"
println "\tbetween row ID ${lastRow.id} and ${row.id}"
}
lastRow = row
} else {
println "Key changed"
lastRow = row
}
}
To run the above code without a database I prefixed it with this test code:
class DstDb1 {
def eachRow ( sql, closure ) {
rows.each( closure )
}
def rows = [
[id: 1, port_nbr: 1, pair: 'pairA', length: 30.00 ],
[id: 2, port_nbr: 1, pair: 'pairA', length: 65.00 ],
[id: 3, port_nbr: 1, pair: 'pairA', length: 70.00 ],
[id: 4, port_nbr: 1, pair: 'pairA', length: 75.00 ],
[id: 5, port_nbr: 1, pair: 'pairB', length: 130.00 ],
[id: 6, port_nbr: 1, pair: 'pairB', length: 165.00 ],
[id: 7, port_nbr: 1, pair: 'pairB', length: 170.00 ],
[id: 8, port_nbr: 1, pair: 'pairB', length: 175.00 ],
[id: 9, port_nbr: 2, pair: 'pairC', length: 230.00 ],
[id:10, port_nbr: 2, pair: 'pairC', length: 265.00 ],
[id:11, port_nbr: 2, pair: 'pairC', length: 270.00 ],
[id:12, port_nbr: 2, pair: 'pairC', length: 350.00 ]
]
}
DstDb1 dst_db1 = new DstDb1()
Running the test gives this result:
Key changed
Port 1, pairA length change: 35 between row ID 1 and 2
Key changed
Port 1, pairB length change: 35 between row ID 5 and 6
Key changed
Port 2, pairC length change: 35 between row ID 9 and 10
Port 2, pairC length change: 80 between row ID 11 and 12
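As a design note: since the question's SQL already uses lead(), the 30+ comparison can be pushed entirely into the database, leaving Groovy only the printing. A sketch, assuming the diags table from the question and a numeric length column (the case/'N/A' handling from the original query would need to be folded back in):
def sql = '''
    select * from (
        select d.*,
               lead(length) over (partition by port_nbr, pair
                                  order by add_date) as next_length
        from diags d
    ) t
    where next_length - length > 30
'''
// Each returned row is already a match, so no lastRow bookkeeping is needed.
dst_db1.eachRow( sql ) { row ->
    println "Port ${row.port_nbr}, ${row.pair}: ${row.length} -> ${row.next_length}"
}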
Find the total number of employees grouped by state, using aggregate.
I tried the following, but the result is 0.
db.research.aggregate([
    {$unwind: '$offices'},
    {"$match": {'offices.country_code': "USA"}},
    {"$project": {'offices.state_code': 1}},
    {"$group": {"_id": '$offices.state_code', "count": {"$sum": '$number_of_employees'}}}
])
You need to $project number_of_employees so it is still available to the next stage, or you can remove the $project stage entirely:
db.research.aggregate([
{$unwind:'$offices'},
{"$match": {'offices.country_code':"USA"}},
{"$project": {'offices.state_code' : 1, 'number_of_employees' : 1}}, //project number_of_employees
{"$group" : {"_id":'$offices.state_code',"count" : {"$sum":'$number_of_employees'}}}
])
or
db.research.aggregate([
{$unwind:'$offices'},
{"$match": {'offices.country_code':"USA"}},
{"$group" : {"_id":'$offices.state_code',"count" : {"$sum":'$number_of_employees'}}}
])
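For reference, the reason the original pipeline counts 0: $project with only 'offices.state_code': 1 drops every other field, so after that stage each document looks roughly like this hypothetical sample:
// Shape of a document after {"$project": {'offices.state_code': 1}}:
// _id is kept by default, everything else (including number_of_employees)
// is gone, so {"$sum": '$number_of_employees'} adds 0 per document.
{ "_id": ObjectId("..."), "offices": { "state_code": "CA" } }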
The following is an abridged listing of JSON data:
let json = JSON(data: response.result.value!).dictionary
self?.processData(json: json!)
...
...
func processData(json:[String:Any]) {
let myList = json["list"]
...
}
---------------------------------------------------------------------
(lldb) po myList
▿ Optional<Any>
▿ some : [
{
"author" : "Jimmy Kickarse",
"word" : "Turkey",
"defid" : 1925960,
"current_vote" : "",
"thumbs_down" : 1103,
"thumbs_up" : 1987,
"permalink" : "http:\/\/turkey.urbanup.com\/1925960",
"example" : "If through some crazy events Asia and Europe went to war, they'd both bomb Turkey.",
"definition" : "A country that's incredibly fun to be in because it's not quite European, but not quite Asian either."
},
…
{
"author" : "DildoBob",
"word" : "Turkey",
"defid" : 7671084,
"current_vote" : "",
"thumbs_down" : 27,
"thumbs_up" : 112,
"permalink" : "http:\/\/turkey.urbanup.com\/7671084",
"example" : "Turkey cannot tweet, because Prime Minister Recep Tayyip Erdogan (Or if you are dyslexic, Pro Gay Centipede Ray) banned it's usage and access.",
"definition" : "A bird that can't tweet."
}
]
▿ rawArray : 10 elements
▿ 0 : 9 elements
▿ 0 : 2 elements
- .0 : author
- .1 : Jimmy Kickarse
▿ 1 : 2 elements
- .0 : word
- .1 : Turkey
▿ 2 : 2 elements
- .0 : defid
- .1 : 1925960
▿ 3 : 2 elements
- .0 : current_vote
- .1 :
▿ 4 : 2 elements
- .0 : thumbs_down
- .1 : 1103
▿ 5 : 2 elements
- .0 : thumbs_up
- .1 : 1987
▿ 6 : 2 elements
- .0 : permalink
- .1 : http://turkey.urbanup.com/1925960
▿ 7 : 2 elements
- .0 : example
- .1 : If through some crazy events Asia and Europe went to war, they'd both bomb Turkey.
▿ 8 : 2 elements
- .0 : definition
- .1 : A country that's incredibly fun to be in because it's not quite European, but not quite Asian either.
…
…
- rawDictionary : 0 elements
- rawString : ""
- rawBool : false
- _type : SwiftyJSON.Type.array
- _error : nil
How can I decipher this?
The output says it's a 'SwiftyJSON.Type.array' (see above).
But the following says it's not:
(lldb) po type(of:myList)
Swift.Optional<Any>
It looks like an array of dictionaries.
So I attempted to cast it as such:
(lldb) po myList as [[String:String]]
error: Execution was interrupted, reason: signal SIGABRT.
The process has been returned to the state before expression evaluation.
How can I get the elements of this object?
...or correctly convert it into an array of dictionaries of strings to decipher?
Follow up:
if let myList = json["list"] as? [[String:Any]] {
print("Do Something")
}
The 'if' statement failed.
Your myList is indeed an array of dictionaries, but you need to cast it to [[String:Any]], not [[String:String]], because the dictionaries contain both numbers and strings as values. Simply casting to [[String:Any]] should work for you:
if let myList = json["list"] as? [[String:Any]] {
//access your myList array here
}
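Note that if myList is still a SwiftyJSON JSON value rather than a bridged Foundation object, a direct as? cast will fail, which would explain the failed if above. A sketch using SwiftyJSON's arrayObject accessor, which returns the underlying [Any]?:
// Sketch: json is the [String: JSON] dictionary from the question.
// arrayObject bridges the JSON array back to Foundation types, after
// which the usual cast to [[String: Any]] applies.
if let myList = json["list"]?.arrayObject as? [[String: Any]] {
    // work with myList as a plain array of dictionaries here
}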
Per a suggestion, I printed the data out:
func getData(str:String) {
Alamofire.request(MashapeRouter.getDefinition(str))
.responseData { response in
let json = JSON(data: response.result.value!).dictionary
print("\n-------------\n")
print(json!)
return;
}
}
I got the following:
-------------
["result_type": exact, "sounds": [
"http:\/\/media.urbandictionary.com\/sound\/turkey-7377.mp3",
"http:\/\/wav.urbandictionary.com\/turkey-21188.wav",
"http:\/\/wav.urbandictionary.com\/turkey-24905.wav",
"http:\/\/wav.urbandictionary.com\/turkey-40301.wav"
], "list": [
{
"example" : "If through some crazy events Asia and Europe went to war, they'd both bomb Turkey.",
"thumbs_down" : 1103,
"author" : "Jimmy Kickarse",
"defid" : 1925960,
"definition" : "A country that's incredibly fun to be in because it's not quite European, but not quite Asian either.",
"thumbs_up" : 1988,
"word" : "Turkey",
"permalink" : "http:\/\/turkey.urbanup.com\/1925960",
"current_vote" : ""
},
{
"example" : "That honkey is one straight-up jive turkey!",
"thumbs_down" : 686,
"author" : "Jam Master J",
"defid" : 1528701,
"definition" : "(n) a loser; an uncoordinated, inept, clumsy fool\r\nOR\r\na tool; a person who is not in with current culture and slang or is just generally uncool. \r\nThese slang usages of the word \"turkey\" were mostly used during the late 60's and 70's by urban-dwelling blacks.\r\nSee [jive turkey]",
"thumbs_up" : 1160,
"word" : "turkey",
"permalink" : "http:\/\/turkey.urbanup.com\/1528701",
"current_vote" : ""
},…
],
"tags": [
"thanksgiving",
"chicken",
"sex",
"turkish",
"turk",
"food",
"gobble",
"duck",
"ham",
"sandwich"
]]
Based on this output, I tried the following:
po json!["list"]?[0]["example"].string!
▿ Optional<String>
- some : "If through some crazy events Asia and Europe went to war, they\'d both bomb Turkey."
So I'm getting close:
if let myString = json!["list"]?[0]["example"].string {
print("myString= \(myString)")
}
...from which I got the following:
myString= If through some crazy events Asia and Europe went to war, they'd both bomb Turkey.
(lldb)
...so apparently all I need to do is clean this up, avoiding the 'pyramid of doom' of stacked optional unwrapping.
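One way to flatten the unwrapping (a sketch, assuming SwiftyJSON and the response shape printed above): keep the parsed value as a JSON and use the non-optional .arrayValue/.stringValue accessors, which return empty defaults instead of nil:
// Sketch: iterate the "list" array without nested if-lets.
// arrayValue yields [] and stringValue yields "" when a path is missing,
// so no optional chaining is required.
let json = JSON(data: response.result.value!)
for entry in json["list"].arrayValue {
    let word = entry["word"].stringValue
    let example = entry["example"].stringValue
    print("\(word): \(example)")
}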
Is there a Juttle program I can run to view all unique fields within a given query? I'm trying to comb through a list of events of the same type that have a ton of different fields.
I know I could just use the #table sink and scroll right, but I'd like to view the unique fields as a list if possible.
Hacky but works:
events -from :5 minutes ago: -to :now: | head 1 | #logger -display.style 'pretty'
You get:
{
"bytes" : 7745,
"status" : "200",
"user_agent" : "Mozilla/5.0 (iPhone; CPU iPhone OS 511 like Mac OS X) AppleWebKit/534.46 (KHTML like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3",
"version" : "1.1",
"ctry" : "US",
"ident" : "-",
"message" : "194.97.17.121 - - [2014-02-25T09:00:00-08:00] \"GET /black\" 200 7745 \"http://google.co.in/\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 511 like Mac OS X) AppleWebKit/534.46 (KHTML like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3\"",
"auth" : "-",
"verb" : "GET",
"url" : "/black",
"source_host" : "www.jut.io",
"referer" : "http://google.co.in/",
"space" : "default",
"type" : "event",
"time" : "2014-12-11T23:46:21.905Z",
"mock_type" : "demo",
"event_type" : "web"
}
You can use the split proc in combination with reduce by to get this list.
emit -limit 1
|(
put field1 = 1, field2 = 2;
put field2 = 2, field3 = 3;
)| split // break each point into one point for each field, assigning each field name into the point's name field
| reduce by name // get unique list of name field values
| sort name
| #logger
{"name":"field3"}
{"name":"field2"}
{"name":"field1"}