Viewing unique fields - juttle

Is there a Juttle program I can run to view all unique fields within a given query? I'm trying to comb through a list of events of the same type that have a ton of different fields.
I know I could just use the @table sink and scroll right, but I'd like to view the unique fields as a list if possible.

Hacky but works:
events -from :5 minutes ago: -to :now: | head 1 | @logger -display.style 'pretty'
You get:
{
"bytes" : 7745,
"status" : "200",
"user_agent" : "Mozilla/5.0 (iPhone; CPU iPhone OS 511 like Mac OS X) AppleWebKit/534.46 (KHTML like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3",
"version" : "1.1",
"ctry" : "US",
"ident" : "-",
"message" : "194.97.17.121 - - [2014-02-25T09:00:00-08:00] \"GET /black\" 200 7745 \"http://google.co.in/\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 511 like Mac OS X) AppleWebKit/534.46 (KHTML like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3\"",
"auth" : "-",
"verb" : "GET",
"url" : "/black",
"source_host" : "www.jut.io",
"referer" : "http://google.co.in/",
"space" : "default",
"type" : "event",
"time" : "2014-12-11T23:46:21.905Z",
"mock_type" : "demo",
"event_type" : "web"
}

You can use the split proc in combination with reduce by to get this list.
emit -limit 1
|(
put field1 = 1, field2 = 2;
put field2 = 2, field3 = 3;
)| split // break each point into one point for each field, assigning each field name into the point's name field
| reduce by name // get unique list of name field values
| sort name
| @logger
{"name":"field3"}
{"name":"field2"}
{"name":"field1"}
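The split / reduce by name / sort pipeline boils down to collecting every field name seen across all points into a de-duplicated, ordered list. A minimal Python sketch of the same idea, assuming the events are already available as a list of dicts (the sample points mirror the two `put` branches above; the function name `unique_field_names` is hypothetical and not part of Juttle):

```python
def unique_field_names(points):
    """Return the sorted, de-duplicated field names seen across all points."""
    names = set()
    for point in points:
        # Analogous to `split`: visit each field of the point once, by name.
        for field_name in point:
            names.add(field_name)
    # Analogous to `reduce by name` + `sort name`: one entry per distinct name.
    return sorted(names)


points = [
    {"field1": 1, "field2": 2},
    {"field2": 2, "field3": 3},
]
print(unique_field_names(points))  # ['field1', 'field2', 'field3']
```

Unlike the `head 1` trick, this looks at every event, so fields that only appear in later events are not missed.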
==============================================================

Related

New to NoSQL and a little confused with creating collections

I'm a student just starting out with NoSQL and it's just not clicking for me. I'm a little confused on a few points.
Any help would be greatly appreciated.
1. Can documents belong to multiple collections?
2. Have I the correct syntax here for creating the collection?
The pic is the collection ER diagram, and this is just a snippet of the full ER.
db.Animal.insert( {
"animal_ID" : "XXXXXXX ",
"common_name" : "Red Squirrel",
"IUCN" : "Least Concern (declining)",
"photo" : "qs451xkx6qf4j",
"extinct" : {
"when" : "null ",
"reason" : "null"
},
"invasive" : {
"threat_level" : "null",
"threat" : "null",
"how_to_help" : "null"
},
"native" : {
"endangerment" : "population declining",
"how_to_help" : "providing a little extra food, planting some red squirrel-friendly shrubs and reporting any red or grey squirrel activity"
},
"Fact_sheet" : {
"fact_id" : " ",
"animal_id" : " XXXXXXX ",
"order" : " Rodentia ",
"family" : "Sciuridae ",
"species" : "Sciurus vulgaris ",
"size" : "body length 19 to 23 cm, tail length 15 to 20 cm ",
"weight" : "250 to 340 g ",
"lifespan" : "3 years , 7 to 10 in captivity ",
"extra" : "In Norse mythology, Ratatoskr is a red squirrel who runs up and down with messages in the world tree, Yggdrasil, and spreads gossip ",
"habitat" : [ {
"name" : "woodland ",
"description" : "a low-density forest forming open habitats with plenty of sunlight and limited shade "
} ]
}
});
Can documents belong to multiple collections?
In MongoDB, no. In other databases, I don't know.
2. Have I the correct syntax here for creating the collection?
To create a collection explicitly, you would use db.createCollection() (https://docs.mongodb.com/manual/reference/method/db.createCollection/). This call also lets you pass various collection options.
You are inserting a document. In MongoDB when a document is inserted, if the destination collection doesn't exist, it is created automatically by the server.
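If a document literal like the one in the question is typed or pasted through a word processor, curly quotes (“ ”) instead of straight quotes, or unbalanced `{ [ ] }` brackets, will stop it from parsing. One quick, indirect check is to run a fragment through a strict JSON parser. A minimal sketch using Python's standard `json` module (JSON is stricter than mongo shell syntax, so this only catches a subset of problems; `is_valid_json` is a hypothetical helper name):

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


good = '{"common_name": "Red Squirrel", "habitat": [{"name": "woodland"}]}'
curly = '{\u201ccommon_name\u201d: \u201cRed Squirrel\u201d}'  # curly quotes
unbalanced = '{"habitat": { ["name": "woodland"] }}'          # { [ ... ] } nesting

print(is_valid_json(good))        # True
print(is_valid_json(curly))       # False
print(is_valid_json(unbalanced))  # False
```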

Mongodb TTL Index not expiring documents from collection

I have a TTL index on the collection fct_in_ussd, as follows:
db.fct_in_ussd.createIndex(
{"xdr_date":1},
{ "background": true, "expireAfterSeconds": 259200}
)
{
"v" : 2,
"key" : {
"xdr_date" : 1
},
"name" : "xdr_date_1",
"ns" : "appdb.fct_in_ussd",
"background" : true,
"expireAfterSeconds" : 259200
}
with an expiry of 3 days (259200 seconds). A sample document in the collection is as follows:
{
"_id" : ObjectId("5f4808c9b32ewa2f8escb16b"),
"edr_seq_num" : "2043019_10405",
"served_imsi" : "",
"ussd_action_code" : "1",
"event_start_time" : ISODate("2020-08-27T19:06:51Z"),
"event_start_time_slot_key" : ISODate("2020-08-27T18:30:00Z"),
"basic_service_key" : "TopSim",
"rate_event_type" : "",
"event_type_key" : "22",
"event_dir_key" : "-99",
"srv_type_key" : "2",
"population_time" : ISODate("2020-08-27T19:26:00Z"),
"xdr_date" : ISODate("2020-08-27T19:06:51Z"),
"event_date" : "20200827"
}
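With `expireAfterSeconds` set to 259200, a document becomes eligible for deletion once its `xdr_date` is more than 259200 seconds (3 days) in the past. The background TTL task runs periodically (every 60 seconds by default), so actual removal can lag behind that moment. A small Python sketch of the expiry arithmetic for the sample document (the helper name `expires_at` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EXPIRE_AFTER_SECONDS = 259200  # value from the TTL index definition above


def expires_at(indexed_date):
    """Earliest moment a document with this TTL-indexed date becomes eligible for removal."""
    return indexed_date + timedelta(seconds=EXPIRE_AFTER_SECONDS)


xdr_date = datetime(2020, 8, 27, 19, 6, 51, tzinfo=timezone.utc)
print(expires_at(xdr_date).isoformat())  # 2020-08-30T19:06:51+00:00
```

By the same arithmetic, a document that is 15 days old has been eligible for deletion for about 12 days.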
Problem statement: documents are not getting removed from the collection. The collection still contains 15-day-old documents.
MongoDB server version: 4.2.3
Block compression strategy is zstd
storage.wiredTiger.collectionConfig.blockCompressor: zstd
The field xdr_date is also part of another compound index.
Observations as of Sep 24:
I have 5 collections with a TTL index.
It turns out that data is getting removed from one of the collections, while the rest of the collections remain unaffected.
The daily insertion rate is ~500M records (across all 5 collections).
This observation left me confused.
The TTL expiration runs on a single thread. Is this too much data for TTL to expire?
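At steady state, TTL expiry has to delete roughly as many documents per day as are inserted. A back-of-the-envelope Python sketch using the ~500M/day figure from the question; it averages over the whole day and ignores bursts, index maintenance and replication, so treat it only as a lower bound on the work involved (`required_delete_rate` is a hypothetical name):

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_INSERTS = 500_000_000  # approximate figure from the question (all 5 collections)


def required_delete_rate(daily_inserts):
    """Average deletions per second needed for TTL removal to keep pace with inserts."""
    return daily_inserts / SECONDS_PER_DAY


rate = required_delete_rate(DAILY_INSERTS)
print(f"{rate:,.0f} deletes/second on average")  # 5,787 deletes/second on average
```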

Array insertion to SQL and retrieval formatting

I am working on sending a collection of custom objects to SQL. My objects look like this:
aa : True
phone_number :
manual_scan_time : 0
ip_addr : 10.50.0.147
last_logon_user_guid : 286c1410eb470e4ea31dd17e9c1eee31
urlfilter_violated : 0
last_connect_time : 1563379779
id : 0542c82cada09040be79bfb6e54fd119
spyware_detected : 0
arch : x64
sched_start_time : 1562922019
virus_detected : 0
platform : Win Server 2012 R2
version : 6.6.2457/14.1.1516
manual_aggressive_complete_time : 0
online : True
type : 1
status : 1
spam_detected : 0
sched_complete_time : 1562926028
sched_scan_time : 1559294629
pop3_scan : False
manual_aggressive_start_time : 0
last_logon_user_account : jdoe
name : server01
ss_service : True
scan_mode : 0
manual_complete_time : 0
mac_addr : 2B:41:38:B4:34:46
components : {@{version=0; type=1}, @{version=11.000.1006; type=3}, @{version=15.239.00; type=14}}
manual_start_time : 0
last_logon_user_domain : acme
created_at : 1510519578
last_connect_time_human : 7/17/2019 4:09:39 PM
CustomerName : Acme Corp
Looking at the "components" property, I see it is an array and is not being set in the database. So, I started looking into how I might convert that to a string, in such a way that I could extract it later and convert it back to this array.
I am looping through the objects like this:
$sqlValues = New-Object Collections.ArrayList
$allComputers.computers | ForEach-Object {
# Escape embedded single quotes by doubling them so values survive inside '...' SQL literals
$_ | ForEach-Object {
$_.PsObject.Properties | ForEach-Object {
If ($_.Value -match "'") {
$_.Value = $_.Value.Replace("'", "''")
}
}
}
# ArrayList.Add returns the new index; [void] discards it so it is not written to the output
[void]$sqlValues.Add("('$($_.aa)', '$($_.phone_number)', '$($_.manual_scan_time)', '$($_.ip_addr)', '$($_.last_logon_user_guid)', '$($_.urlfilter_violated)', '$($_.last_connect_time)', '$($_.id)', '$($_.spyware_detected)', '$($_.arch)', '$($_.sched_start_time)', '$($_.virus_detected)', '$($_.platform)', '$($_.version)', '$($_.manual_aggressive_complete_time)', '$($_.online)', '$($_.type)', '$($_.status)', '$($_.spam_detected)', '$($_.sched_complete_time)', '$($_.sched_scan_time)', '$($_.pop3_scan)', '$($_.manual_aggressive_start_time)', '$($_.last_logon_user_account)', '$($_.name)', '$($_.ss_service)', '$($_.scan_mode)', '$($_.manual_complete_time)', '$($_.mac_addr)', '$($_.components)', '$($_.manual_start_time)', '$($_.last_logon_user_domain)', '$($_.created_at)', '$($_.last_connect_time_human)', '$($_.CustomerName)', '$(Get-Date)')")
}
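The loop escapes each value by doubling single quotes, which is how a literal `'` is written inside a SQL string literal, and then interpolates every property into one `('...', '...')` tuple. A minimal Python sketch of those two steps, assuming a row is a dict of property names to values (`sql_quote`, `values_tuple` and the sample value `o'brien` are hypothetical; `None` becomes an empty string, much as `$null` does in PowerShell string interpolation):

```python
def sql_quote(value):
    """Render a value as a SQL string literal, doubling embedded single quotes."""
    text = "" if value is None else str(value)
    return "'" + text.replace("'", "''") + "'"


def values_tuple(row, columns):
    """Build a "('a', 'b', ...)" fragment from the given columns of one row."""
    return "(" + ", ".join(sql_quote(row.get(col)) for col in columns) + ")"


row = {"name": "server01", "last_logon_user_account": "o'brien", "online": True}
print(values_tuple(row, ["name", "last_logon_user_account", "online"]))
# ('server01', 'o''brien', 'True')
```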
My first thought was to use:
$sqlValues.Add("('$($_.components | Out-String)'")
That worked, but the field was a string that looked like a table. Then I tried:
$sqlValues.Add("('$($($_.components | ForEach-Object {$_ -join ','}).TrimEnd())')")
That output looked right, but when I queried SQL to get the objects back, there were extra lines after "components" and "QueryDate". Where are those coming from? Or rather, how do I get rid of them?
aa : True
phone_number :
manual_scan_time : 0
ip_addr : 10.50.0.147
last_logon_user_guid : 286c1410eb470e4ea31dd17e9c1eee31
urlfilter_violated : 0
last_connect_time : 1563379779
id : 0542c82cada09040be79bfb6e54fd119
spyware_detected : 0
arch : x64
sched_start_time : 1562922019
virus_detected : 0
platform : Win Server 2012 R2
version : 6.6.2457/14.1.1516
manual_aggressive_complete_time : 0
online : True
type : 1
status : 1
spam_detected : 0
sched_complete_time : 1562926028
sched_scan_time : 1559294629
pop3_scan : False
manual_aggressive_start_time : 0
last_logon_user_account : jdoe
name : server01
ss_service : True
scan_mode : 0
manual_complete_time : 0
mac_addr : 2B:41:38:B4:34:46
components : @{version=0; type=1} @{version=11.000.1006; type=3} @{version=15.241.00; type=14}
manual_start_time : 0
last_logon_user_domain : acme
created_at : 1510519578
last_connect_time_human : 7/17/2019 4:09:39 PM
CustomerName : Acme Corp
QueryDate : 7/18/2019 12:00:00 AM
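One way to make the components column round-trip cleanly (a different approach from the Out-String and -join attempts above, not something from the thread) is to store it as compact, single-line JSON text and parse it back when reading. In PowerShell that would be `ConvertTo-Json -Compress` and `ConvertFrom-Json`. The same idea as a Python sketch; the component values are taken from the sample object, and representing `version` as a string is an assumption (`to_column` and `from_column` are hypothetical names):

```python
import json

components = [
    {"version": "0", "type": 1},
    {"version": "11.000.1006", "type": 3},
    {"version": "15.239.00", "type": 14},
]


def to_column(items):
    """Serialize the array to a single-line string suitable for a text column."""
    return json.dumps(items, separators=(",", ":"))


def from_column(text):
    """Parse the stored text back into the original list of dicts."""
    return json.loads(text)


stored = to_column(components)
print(stored)
assert "\n" not in stored                # compact output: no embedded line breaks
assert from_column(stored) == components  # lossless round trip
```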

Powershell: How can I use an array or hashtable as an inline lookup

Using PowerShell, I am calling Win32_ComputerSystem. I want to list data about the machine, including $_.thermalstate. Here is my code.
The code looks as though it should work, but it returns an empty value. I want to create an inline array or hash table that the value $_.thermalstate references.
Get-WmiObject win32_computersystem | select Name, Model, Caption, @{n="Timezone"; e={$_.currenttimezone}}, Description, DNShostname,Domain,@{n='Domain Role'; E={$_.domainrole}},Roles,Status,@{n='System Type'; e={$_.systemtype}},@{n='Thermal State'; e={$_.thermalstate[@{'3'='safe'}]}}
Output
Name : MYPC
Model : Latitude E5470
Caption : MYPC
Timezone : 600
Description : AT/AT COMPATIBLE
DNShostname : MYPC
Domain : work.biz
Domain Role : 1
Roles : {LM_Workstation, LM_Server, NT}
Status : OK
System Type : x64-based PC
Thermal State : Safe
your lookup structure was ... wrong. [grin]
replace the last line of this reformatted version of your code ...
Get-WmiObject win32_computersystem |
Select-Object Name, Model, Caption,
@{n="Timezone"; e={$_.currenttimezone}},
Description, DNShostname,Domain,
@{n='Domain Role'; E={$_.domainrole}},
Roles,Status,
@{n='System Type'; e={$_.systemtype}},
@{n='Thermal State'; e={$_.thermalstate[@{'3'='safe'}]}}
... with this line ...
@{n='Thermal State'; e={@{'3'='Safe'}["$($_.ThermalState)"]}}
note that the lookup table is on the OUTSIDE of the [] and that the value is forced to a string.
however, i would NOT do it this way. it's too finicky. create the lookup table BEFORE your call and use that to perform the lookup.
Your code looks like you are trying to declare/initialize a hash table while also trying to use thermalstate as a hash array.
If you initialize the hash table first, the code looks like this:
$h = @{'3'='safe'}; Get-WmiObject win32_computersystem | select Name, Model, Caption, @{n="Timezone"; e={$_.currenttimezone}}, Description, DNShostname,Domain,@{n='Domain Role';E={$_.domainrole}},Roles,Status,@{n='System Type'; e={$_.systemtype}},@{n='Thermal State'; e={$h[$_.thermalstate.toString()]}}
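Both answers convert ThermalState to a string before the lookup because the table's key is the string '3' while the WMI property is a number (uint16), and a number does not match a string key. The same pitfall, sketched in Python with a plain dict built once before any records are processed. This is an analogy rather than PowerShell's exact hashing semantics, and `THERMAL_STATE_LOOKUP` / `thermal_state_label` are hypothetical names:

```python
# Lookup table built once, up front; keys are strings, as in @{'3'='safe'}.
THERMAL_STATE_LOOKUP = {"3": "Safe"}


def thermal_state_label(raw_value):
    """Map a numeric ThermalState to its label; None if it isn't in the table."""
    # Convert to str first: looking up the integer 3 would miss the key "3".
    return THERMAL_STATE_LOOKUP.get(str(raw_value))


print(thermal_state_label(3))       # Safe
print(THERMAL_STATE_LOOKUP.get(3))  # None: integer key does not match "3"
```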
According to https://wutils.com/wmi/root/cimv2/win32_computersystem/
ThermalState property
CIMTYPE 'uint16'
Description 'The ThermalState property identifies the enclosure's thermal state when last booted.'
MappingStrings ['SMBIOS|Type 3|System Enclosure or Chassis|Thermal State']
read True
ValueMap ['1', '2', '3', '4', '5', '6']
Values ['Other', 'Unknown', 'Safe', 'Warning', 'Critical', 'Non-recoverable']
ThermalState property is in 1 class (Win32_ComputerSystem) of ROOT\cimv2 and in 2 namespaces
You could create an enum
enum ThermalState {
Other = 1
Unknown = 2
Safe = 3
Warning = 4
Critical = 5
NonRecoverable = 6
}
And use that to get a verbose response from the property
Get-WmiObject win32_computersystem | Select-Object Name, Model, Caption,
@{n="Timezone"; e={$_.currenttimezone}}, Description, DNShostname,Domain,
@{n='Domain Role';E={$_.domainrole}},Roles,Status,
@{n='System Type'; e={$_.systemtype}},
@{n='Thermal State'; e={[ThermalState]$_.thermalstate}}
Sample output
Name : HP-G1610
Model : ProLiant MicroServer Gen8
Caption : HP-G1610
Timezone : 120
Description : AT/AT COMPATIBLE
DNShostname : HP-G1610
Domain : DOMAIN
Domain Role : 0
Roles : {...}
Status : OK
System Type : x64-based PC
Thermal State : Safe
In general, to list the values of an enum:
> $Enum ='System.DayOfWeek'
> [Enum]::GetValues($Enum) | ForEach-Object {'{0} {1}' -f [int]$_,$_ }
0 Sunday
1 Monday
2 Tuesday
3 Wednesday
4 Thursday
5 Friday
6 Saturday
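Python's `enum` module supports the same two steps: cast the raw number to the enum to get its name, and iterate the enum to list every value/name pair, much like `[Enum]::GetValues`. A sketch using the ValueMap/Values pairs quoted from wutils.com above, with `NonRecoverable` spelled without the hyphen as in the PowerShell enum (in Python, casting an unmapped number such as 7 raises `ValueError`):

```python
from enum import IntEnum


class ThermalState(IntEnum):
    Other = 1
    Unknown = 2
    Safe = 3
    Warning = 4
    Critical = 5
    NonRecoverable = 6


# Like [ThermalState]$_.thermalstate: cast the raw number to the enum.
print(ThermalState(3).name)  # Safe

# Like [Enum]::GetValues(...) | ForEach-Object { '{0} {1}' -f [int]$_, $_ }
for state in ThermalState:
    print(f"{int(state)} {state.name}")
```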

Extract values from an array in mongoDB to dataframe using rmongodb

I'm querying a database containing entries as displayed in the example. All entries contain the following values:
_id: unique id of overallitem and placed_items
name: the name of the overallitem
loc: location of the overallitem and placed_items
time_id: time the overallitem was stored
placed_items: array containing placed_items (can range from zero, placed_items : [], to an unlimited number)
category_id: the category of the placed_items
full_id: the full id of the placed_items
I want to extract the name, full_id and category_id on a per-placed_items level, given a time_id and loc constraint.
Example data:
{
"_id" : "5040",
"name" : "entry1",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : []
}
{
"_id" : "5041",
"name" : "entry2",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [
{
"_id" : "5043",
"category_id" : 101,
"full_id" : 901
},
{
"_id" : "5044",
"category_id" : 102,
"full_id" : 902
}
]
}
{
"_id" : "5042",
"name" : "entry3",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [
{
"_id" : "5045",
"category_id" : 101,
"full_id" : 903
}
]
}
The expected outcome for this example would be:
"name" "full_id" "category_id"
"entry2" 901 101
"entry2" 902 102
"entry3" 903 101
So if placed_items is empty, do not put the entry in the dataframe, and if placed_items contains n entries, put n entries in the dataframe.
I tried to adapt an R-bloggers example to create the desired dataframe.
#Set up database connection
library(rmongodb)
mongo <- mongo.create()
#Set up condition
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "loc", 1)
mongo.bson.buffer.start.object(buf, "time_id")
mongo.bson.buffer.append(buf, "$gte", 20120930)
mongo.bson.buffer.append(buf, "$lte", 20121002)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
#Count
count <- mongo.count(mongo, "items_test.overallitem", query)
#Note that these counts don't work, since the count should be based on
#the number of placed_items in the array, and not the number of entries.
#Setup Cursor
cursor <- mongo.find(mongo, "items_test.overallitem", query)
#Create vectors, which will be filled by the while loop
name <- vector("character", count)
full_id<- vector("character", count)
category_id<- vector("character", count)
i <- 1
#Fill vectors
while (mongo.cursor.next(cursor)) {
b <- mongo.cursor.value(cursor)
name[i] <- mongo.bson.value(b, "name")
full_id[i] <- mongo.bson.value(b, "placed_items.full_id")
category_id[i] <- mongo.bson.value(b, "placed_items.category_id")
i <- i + 1
}
#Convert to dataframe
results <- as.data.frame(list(name=name, full_id=full_id, category_id=category_id))
The conditions work, and the code works if I only want to extract values on an overallitem level (i.e. _id or name), but it fails to gather the information on a placed_items level. Furthermore, the dotted call for extracting full_id and category_id does not seem to work. Can anyone help?
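What the question describes is an unwind-style flattening: one output row per element of placed_items, with the parent's name repeated, and parents with an empty array contributing no rows. On the server side, MongoDB's `$unwind` aggregation stage performs this expansion (by default it drops documents whose array is empty). A Python sketch of the row-building logic over the example documents, assuming they have already been fetched as dicts; it illustrates the flattening only, not the rmongodb API, and `flatten_placed_items` is a hypothetical name:

```python
docs = [
    {"_id": "5040", "name": "entry1", "loc": 1, "time_id": 20121001, "placed_items": []},
    {"_id": "5041", "name": "entry2", "loc": 1, "time_id": 20121001,
     "placed_items": [
         {"_id": "5043", "category_id": 101, "full_id": 901},
         {"_id": "5044", "category_id": 102, "full_id": 902},
     ]},
    {"_id": "5042", "name": "entry3", "loc": 1, "time_id": 20121001,
     "placed_items": [
         {"_id": "5045", "category_id": 101, "full_id": 903},
     ]},
]


def flatten_placed_items(documents, loc, time_from, time_to):
    """Return one (name, full_id, category_id) row per placed item.

    Applies the same loc / time_id filter as the query in the question;
    documents with an empty placed_items array produce no rows.
    """
    rows = []
    for doc in documents:
        if doc["loc"] != loc or not (time_from <= doc["time_id"] <= time_to):
            continue
        for item in doc.get("placed_items", []):
            rows.append((doc["name"], item["full_id"], item["category_id"]))
    return rows


for row in flatten_placed_items(docs, loc=1, time_from=20120930, time_to=20121002):
    print(row)
```

The number of rows returned is the count the preallocated R vectors would actually need. In this example it happens to equal the number of top-level documents (3), but that is a coincidence.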
