I'm importing a very large CSV file into MongoDB. It has the following format:
"zzzàms#hotmail.com","12071988"
"zzzг ms#hotmail.com","12071988"
"zzпїѕпїѕmmbbii2#bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2#list.ru","MA15042002"
"zzпїѕпїѕmmbbii2#rambler.ru","MA15042002"
"zzпїѕпїѕmmbbii2#yandex.ru","MA15042002"
However, I don't know how many fields/columns will follow the email field.
I imported it using this command:
mongoimport -d emails -c second --file all.csv --type csv --fields email, number
However, any fields/columns after the number field are given default names such as 'field2', 'field3', and so on:
{ "_id" : ObjectId("5a5cd95e598f1e910d353e3b"), "email" : "00-amber-00#embarqmail.com", " number" : "number1", "field2" : "number2" }
How can I put everything after the number field into that same field, so that it is all stored under 'number'?
Sometimes a single entry has as many as 40 columns.
I'd rather not modify the CSV file unless it is really necessary.
Sorry, English is not my first language. Thanks.
You can use Unix tools such as awk to convert each line to JSON according to your logic, and pipe the result to mongoimport via stdin.
Sample file:
saravana#ubuntu:~$ cat sample-doc.txt
"zzzàms#hotmail.com","12071988"
"zzzг ms#hotmail.com","12071988"
"zzпїѕпїѕmmbbii2#bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2#list.ru","MA15042002"
"zzпїѕпїѕmmbbii2#rambler.ru","MA15042002","34534"
"zzпїѕпїѕmmbbii2#yandex.ru","MA15042002","1232434","3435435","53534"
Convert each line to JSON with awk: the email, followed by an array of the remaining values:
saravana#ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }'
{ email :"zzzàms#hotmail.com", numbers : [ "12071988" ] }
{ email :"zzzг ms#hotmail.com", numbers : [ "12071988" ] }
{ email :"zzпїѕпїѕmmbbii2#bk.ru", numbers : [ "MA15042002" ] }
{ email :"zzпїѕпїѕmmbbii2#list.ru", numbers : [ "MA15042002" ] }
{ email :"zzпїѕпїѕmmbbii2#rambler.ru", numbers : [ "MA15042002","34534" ] }
{ email :"zzпїѕпїѕmmbbii2#yandex.ru", numbers : [ "MA15042002","1232434","3435435","53534" ] }
saravana#ubuntu:~$
Pipe the result to mongoimport via stdin:
saravana#ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }' | mongoimport --type json --db test --collection emailnos -v
2018-01-17T09:58:11.559+0530 reading from stdin
2018-01-17T09:58:11.559+0530 using fields:
2018-01-17T09:58:11.561+0530 connected to: localhost
2018-01-17T09:58:11.561+0530 ns: test.emailnos
2018-01-17T09:58:11.561+0530 connected to node type: standalone
2018-01-17T09:58:11.561+0530 using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-01-17T09:58:11.561+0530 using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-01-17T09:58:11.726+0530 imported 6 documents
Resulting collection:
> db.emailnos.find()
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da90"), "email" : "zzzàms#hotmail.com", "numbers" : [ "12071988" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da91"), "email" : "zzпїѕпїѕmmbbii2#list.ru", "numbers" : [ "MA15042002" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da92"), "email" : "zzпїѕпїѕmmbbii2#rambler.ru", "numbers" : [ "MA15042002", "34534" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da93"), "email" : "zzпїѕпїѕmmbbii2#yandex.ru", "numbers" : [ "MA15042002", "1232434", "3435435", "53534" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da94"), "email" : "zzzг ms#hotmail.com", "numbers" : [ "12071988" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da95"), "email" : "zzпїѕпїѕmmbbii2#bk.ru", "numbers" : [ "MA15042002" ] }
>
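Because awk splits on FS=",", an email that ever contained a comma inside its quotes would be cut in the wrong place. As a sketch only, the same line-to-JSON conversion can be written in Node.js with a small quote-aware parser, assuming standard CSV double-quote escaping. parseCsvLine, lineToDoc and convert are hypothetical helper names; JSON.stringify emits strictly valid JSON with quoted keys.

```javascript
// Split one CSV line into fields. Assumes RFC 4180-style quoting:
// fields may be wrapped in double quotes, and "" inside quotes is a literal quote.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch; // commas inside quotes are kept as data
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // field separator
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// First column -> "email"; all remaining columns, however many, -> "numbers".
function lineToDoc(line) {
  const [email, ...numbers] = parseCsvLine(line);
  return { email, numbers };
}

// Read CSV lines from `input` and write one JSON document per line to `output`.
function convert(input, output) {
  const readline = require("readline");
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  rl.on("line", (line) => {
    if (line.trim() !== "") {
      output.write(JSON.stringify(lineToDoc(line)) + "\n");
    }
  });
}

console.log(JSON.stringify(lineToDoc('"zzzàms#hotmail.com","12071988"')));
// {"email":"zzzàms#hotmail.com","numbers":["12071988"]}
```

Saved as a script (the name csv2json.js is hypothetical) that ends with convert(process.stdin, process.stdout), it could take the place of awk in the pipeline: node csv2json.js < all.csv | mongoimport --type json --db emails --collection second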
I have a collection containing records in the following format:
{
"_id" : 22,
"title" : "3D User Interfaces with Java 3D",
"isbn" : "1884777902",
"pageCount" : 520,
"publishedDate" : ISODate("2000-08-01T07:00:00Z"),
"thumbnailUrl" : "https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/barrilleaux.jpg",
"longDescription" : "Description",
"status" : "PUBLISH",
"authors" : [
"Jon Barrilleaux"
],
"categories" : [
"Java",
"Computer Graphics"
]
},
{
"_id" : 23,
"title" : "Specification by Example",
"isbn" : "1617290084",
"pageCount" : 0,
"publishedDate" : ISODate("2011-06-03T07:00:00Z"),
"thumbnailUrl" : "https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/adzic.jpg",
"status" : "PUBLISH",
"authors" : [
"Gojko Adzic"
],
"categories" : [
"Software Engineering"
]
}
Note that 'categories' is an array.
I want to count the published books in each category. I tried the following, but it treated the entire array as one group:
db.books.aggregate([
{
$group:{_id:"$categories", total:{$sum:1}}
}
])
Instead, I want to count the number of records for each individual value inside the 'categories' array.
You should first use $unwind, which outputs one document for each element of the array, so that $group then counts each category separately:
db.books.aggregate([
{
$unwind : "$categories"
},
{
$group : { _id : "$categories", total: { $sum: 1 } }
}
])
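To see why $unwind fixes the grouping, here is a rough in-memory model of the two stages in plain JavaScript. It illustrates the semantics under simplifying assumptions and is not MongoDB code; unwind and groupCount are hypothetical names, and the documents are trimmed to _id and categories from the two samples above.

```javascript
// $unwind: one output document per array element. Missing/null fields are
// dropped (MongoDB's default), a non-array value counts as a one-element
// array, and an empty array produces no output.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue;
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

// $group with { _id: "$field", total: { $sum: 1 } }: count documents per value.
// (Results here come out in first-seen order; MongoDB does not guarantee an order.)
function groupCount(docs, field) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[field];
    totals.set(key, (totals.get(key) || 0) + 1);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

const books = [
  { _id: 22, categories: ["Java", "Computer Graphics"] },
  { _id: 23, categories: ["Software Engineering"] },
];

console.log(groupCount(unwind(books, "categories"), "categories"));
```

Without the unwind step, groupCount(books, "categories") would key on each whole array, which is the behaviour described in the question.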
How do I query deeply nested documents like the one shown below?
Here, columns is an array of unknown size. Each element of columns contains record, which is again an array. Each element of the record array contains an array called fields, and each entry in fields has two keys, name and value.
I'm querying on name in the innermost array (fields), but I couldn't get past the first level of nesting.
JSON of the document:
"data" : {
"columns" : [
{
"name" : "styleId",
"record" : [
{
"fname" : "column_mapping",
"_id" : ObjectId("5ba488c79dc6d62c90257752"),
"fields" : [
{
"name" : "column_mapping_form",
"value" : "styleId"
}
],
"rules" : [
[
]
]
}
]
},
{
"name" : "vendorArticleNumber",
"record" : [
{
"fname" : "column_mapping",
"_id" : ObjectId("5ba488c79dc6d62c90257753"),
"fields" : [
{
"name" : "column_mapping_form",
"value" : "vendorArticleNumber"
}
],
"rules" : [
[
]
]
}
]
},
{
"name" : "vendorArticleName",
"record" : [
{
"fname" : "column_mapping",
"_id" : ObjectId("5ba488c79dc6d62c90257754"),
"fields" : [
{
"name" : "column_mapping_form",
"value" : "vendorArticleName"
}
],
"rules" : [
[
]
]
}
]
}
]
}
What are the options when there is this kind of heavy nesting?
db.collection.find({"data.columns.record.fields.name" : "column_mapping_form"})
will match all documents in which at least one element of columns has at least one record with at least one field whose name is "column_mapping_form".
https://docs.mongodb.com/manual/tutorial/query-array-of-documents/ has a very good explanation, examples, and an interactive shell to play with.
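A single dotted path is enough because MongoDB traverses arrays implicitly at each step of the path. The sketch below is a simplified model of that rule in plain JavaScript, for equality on primitive values only; MongoDB's exact rules for numeric path indexes and arrays of arrays differ and are not modelled. pathMatches and matches are hypothetical helpers, and the document is a trimmed copy of the sample above.

```javascript
// Whenever the current value is an array, the rest of the path is tried
// against each element; the document matches if any branch ends at `wanted`.
function pathMatches(value, parts, wanted) {
  if (parts.length === 0) {
    // End of the path: match the value itself, or any element of an array.
    return Array.isArray(value) ? value.includes(wanted) : value === wanted;
  }
  if (Array.isArray(value)) {
    // Implicit traversal: apply the same remaining path to every element.
    return value.some((element) => pathMatches(element, parts, wanted));
  }
  if (value !== null && typeof value === "object") {
    const [head, ...rest] = parts;
    return pathMatches(value[head], rest, wanted);
  }
  return false; // the path runs into a scalar or a missing field
}

function matches(doc, path, wanted) {
  return pathMatches(doc, path.split("."), wanted);
}

// Trimmed copy of the sample document (one column only).
const doc = {
  data: {
    columns: [
      {
        name: "styleId",
        record: [
          {
            fname: "column_mapping",
            fields: [{ name: "column_mapping_form", value: "styleId" }],
          },
        ],
      },
    ],
  },
};

console.log(matches(doc, "data.columns.record.fields.name", "column_mapping_form")); // true
console.log(matches(doc, "data.columns.record.fields.name", "something_else")); // false
```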
I'm fairly new to MongoDB, but I've managed to archive a load of documents into a new collection called documents_archived, in the following format, using an aggregation pipeline:
{
"_id" : ObjectId("5a0046ef2039404645a42f52"),
"archive" : [
{
"_id" : ObjectId("54e60f49e4b097cc823afe8c"),
"_class" : "xxxxxxxxxxxxx",
"fields" : [
{
"key" : "Agreement Number",
"value" : "1002465507"
},
{
"key" : "Name",
"value" : "xxxxxxxx"
},
{
"key" : "Reg No",
"value" : "xxxxxxx"
},
{
"key" : "Surname",
"value" : "xxxxxxxx"
},
{
"key" : "Workflow Id",
"value" : "xxxxxxxx"
}
],
"fileName" : "Audit_C1002465507.txt",
"type" : "Workflow Audit",
"fileSize" : NumberLong(404),
"document" : BinData(0, "xxxxx"),
"creationDate" : ISODate("2009-09-25T00:00:00.000+0000"),
"lastModificationDate" : ISODate("2015-02-19T16:28:57.016+0000"),
"expiryDate" : ISODate("2018-09-25T00:00:00.000+0000")
}
]
}
Now I'm trying to extract just the value of Agreement Number. I have tried everything my limited knowledge, searching, and the documentation allow, without success. I'd be grateful if any MongoDB experts out there could help. Thank you.
Here's a solution that uses the aggregation framework. I am assuming that each doc can have more than one entry in the archive field but only one Agreement Number in the fields array, because your design appears to be key/value. If multiple Agreement Numbers show up in the fields array, we'll have to add an additional $unwind, but for the moment this should work:
db.documents_archived.aggregate([
{$unwind: "$archive"}
,{$project: {x: {$filter: {
input: "$archive.fields",
as: "z",
cond: {$eq: [ "$$z.key", "Agreement Number" ]}
}}
}}
,{$project: {_id:false, val: {$arrayElemAt: ["$x.value",0]} }}
]);
{ "val" : "1002465507" }
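To make the three stages concrete, here is an in-memory approximation of the same pipeline in plain JavaScript. It is a sketch, not MongoDB code: agreementNumbers is a hypothetical name, and the sample is trimmed to the archive and fields parts of the document above.

```javascript
function agreementNumbers(docs) {
  const results = [];
  for (const doc of docs) {
    // $unwind: "$archive" -> one pass per archived entry
    for (const entry of doc.archive || []) {
      // $filter: keep only the fields whose key is "Agreement Number"
      const x = (entry.fields || []).filter((z) => z.key === "Agreement Number");
      // $arrayElemAt: ["$x.value", 0] -> first matching value; no match leaves val out
      results.push(x.length > 0 ? { val: x[0].value } : {});
    }
  }
  return results;
}

const archivedDocs = [
  {
    _id: "5a0046ef2039404645a42f52", // ObjectId shown as a string in this sketch
    archive: [
      {
        fileName: "Audit_C1002465507.txt",
        fields: [
          { key: "Agreement Number", value: "1002465507" },
          { key: "Name", value: "xxxxxxxx" },
        ],
      },
    ],
  },
];

console.log(agreementNumbers(archivedDocs)); // [ { val: '1002465507' } ]
```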
You can use the following in the mongo shell to print only the values:
db.documents_archived.find().forEach(function(doc) {
  // Check every entry in archive, not just the first one
  doc.archive.forEach(function(entry) {
    entry.fields.forEach(function(field) {
      if (field.key === "Agreement Number") {
        print(field.value);
      }
    });
  });
});
I have a complex document (being new to MongoDB schemas, I think it's complex) that I'm trying to search for a specific value across several different arrays in the document.
Sample content of my document:
{
"_id" : ObjectId("541c0c9bdfecb53368e12ef0"),
"SRVIP" : "10.10.10.10",
"INSNME" : "myinstance",
"DBNAME" : "mydbname",
"DBGRPL" : [{
"GRPNME" : "grp1",
"GRPPRV" : "7",
"GRPAUT" : [ "AUTH1","AUTH2"],
"GRPUSR" : [ "USER1","USER2"]
}
],
"SAUTLV" : [ { "SAUNME" : "USER4",
"SAUPRV" : "0",
"SAUAUT" : [ "AUTH2","AUTH3"],
"SAUUSR" : [ "USER2" ]
}
],
"USRLVL" : [
{ "ULVNME" : "USER1",
"ULVPRV" : "0",
"ULVAUT" : [ "AUTH1","AUTH2","AUTH3"]
},
{
"ULVNME" : "USER2",
"ULVPRV" : "2411",
"ULVAUT" : [ "AUTH3"]
}
]
}
I'm trying to return only the sections of the document where, for example, USER1 exists.
At the moment I've created two different aggregation statements to retrieve the information, but I'm looking for a single statement that searches all the arrays in the document.
Retrieving USER1 at the DBGRPL array level:
var var1 = ["USER1"]
db.authinfo.aggregate({$unwind:"$DBGRPL"},{$match:{"DBGRPL.GRPUSR":{$in:var1}}},{$project:{SRVIP:1,DBNAME:1,"DBGRPL":1}})
Retrieving USER1 at the USRLVL array level:
var var1 = "USER1"
db.authinfo.aggregate({$unwind:"$USRLVL"},{$match:{"USRLVL.ULVNME":var1}},{$project:{SRVIP:1,DBNAME:1,"USRLVL":1}})
The obvious problem with this approach is that the two queries need var1 to have different types (an array for $in, a plain string for the equality match), which is another thing I can't resolve at the moment.
How can I combine the search into a single statement?
Expected output:
{
"_id" : ObjectId("541c0c9bdfecb53368e12ef0"),
"SRVIP" : "10.10.10.10",
"INSNME" : "myinstance",
"DBNAME" : "mydbname",
"DBGRPL" : [{
"GRPNME" : "grp1",
"GRPPRV" : "7",
"GRPAUT" : [ "AUTH1","AUTH2"],
"GRPUSR" : [ "USER1","USER2"]
}
],
"USRLVL" : [
{ "ULVNME" : "USER1",
"ULVPRV" : "0",
"ULVAUT" : [ "AUTH1","AUTH2","AUTH3"]
}
]
}
when searching for USER1.
I will also need to search across the GRPAUT, SAUAUT, and ULVAUT sections of the document for a value such as AUTH1.
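Whatever final query is used, the filtering it needs is the same: copy the scalar fields, and in each array keep only the elements where the searched value appears in particular fields. The plain-JavaScript sketch below shows that logic on the sample document; the helpers and the USER_FIELDS/AUTH_FIELDS mappings are hypothetical, chosen from the sample's field names. In MongoDB 3.4 or later, a single $project stage with one $filter per array (using the aggregation $in operator in each cond) could plausibly express the same idea on the server, but that is a suggestion rather than a verified query.

```javascript
// True if `value` equals el[f], or is contained in el[f] when that is an array,
// for at least one field name f in `fields`.
function elementMentions(el, fields, value) {
  return fields.some((f) => {
    const v = el[f];
    return Array.isArray(v) ? v.includes(value) : v === value;
  });
}

// Copy non-searched fields unchanged; for each array named in `spec`, keep only
// the elements that mention `value`, and drop arrays left with no match.
function sectionsFor(doc, value, spec) {
  const out = {};
  for (const [key, v] of Object.entries(doc)) {
    if (!(key in spec)) {
      out[key] = v;
      continue;
    }
    const kept = (Array.isArray(v) ? v : []).filter((el) => elementMentions(el, spec[key], value));
    if (kept.length > 0) out[key] = kept;
  }
  return out;
}

// Hypothetical mappings: which fields to search in each array.
const USER_FIELDS = { DBGRPL: ["GRPUSR"], SAUTLV: ["SAUNME", "SAUUSR"], USRLVL: ["ULVNME"] };
const AUTH_FIELDS = { DBGRPL: ["GRPAUT"], SAUTLV: ["SAUAUT"], USRLVL: ["ULVAUT"] };

const authDoc = {
  _id: "541c0c9bdfecb53368e12ef0", // ObjectId shown as a string in this sketch
  SRVIP: "10.10.10.10",
  INSNME: "myinstance",
  DBNAME: "mydbname",
  DBGRPL: [{ GRPNME: "grp1", GRPPRV: "7", GRPAUT: ["AUTH1", "AUTH2"], GRPUSR: ["USER1", "USER2"] }],
  SAUTLV: [{ SAUNME: "USER4", SAUPRV: "0", SAUAUT: ["AUTH2", "AUTH3"], SAUUSR: ["USER2"] }],
  USRLVL: [
    { ULVNME: "USER1", ULVPRV: "0", ULVAUT: ["AUTH1", "AUTH2", "AUTH3"] },
    { ULVNME: "USER2", ULVPRV: "2411", ULVAUT: ["AUTH3"] },
  ],
};

console.log(JSON.stringify(sectionsFor(authDoc, "USER1", USER_FIELDS), null, 2));
console.log(JSON.stringify(sectionsFor(authDoc, "AUTH1", AUTH_FIELDS), null, 2));
```

Because the searched value is passed to one function for every array, the string/array mismatch between the two original statements does not arise here.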
Mongoose/Mongo noob here:
My Data
Here is my simplified data; each user has their own document:
{ "__v" : 1,
"_id" : ObjectId( "53440e94c02b3cae81eb0065" ),
"email" : "test#test.com",
"firstName" : "testFirstName",
"inventories" : [
{ "_id" : "active",
"tags" : [
"inventory",
"active",
"vehicles" ],
"title" : "activeInventory",
"vehicles" : [
{ "_id" : ObjectId( "53440e94c02b3cae81eb0069" ),
"tags" : [
"vehicle" ],
"details" : [
{ "_id" : ObjectId( "53440e94c02b3cae81eb0066" ),
"year" : 2007,
"transmission" : "Manual",
"price" : 1000,
"model" : "Firecar",
"mileageReading" : 50000,
"make" : "Bentley",
"interiorColor" : "blue",
"history" : "CarProof",
"exteriorColor" : "blue",
"driveTrain" : "SWD",
"description" : "test vehicle",
"cylinders" : 4,
"mileageType" : "kms" } ] } ] },
{ "title" : "soldInventory",
"_id" : "sold",
"vehicles" : [],
"tags" : [
"inventory",
"sold",
"vehicles" ] },
{ "title" : "deletedInventory",
"_id" : "deleted",
"vehicles" : [],
"tags" : [
"inventory",
"sold",
"vehicles" ] } ] }
As you can see, each user has an inventories property, an array containing three inventories (activeInventory, soldInventory, and deletedInventory).
My Query
Given a user's email and a vehicle ID, I would like my query to find the user's activeInventory and return just the vehicle that matches the ID. Here is what I have so far:
user = api.mongodb.userModel;
ObjectId = require('mongoose').Types.ObjectId;
return user
.findOne({email : params.username})
.select('inventories')
.find({'title': 'activeInventory'})
//also tried
//.where('title')
//.equals('activeInventory')
.exec(function(err, result){
console.log(err);
console.log(result);
});
With this, result comes out as an empty array. I've also tried .find({'inventories.title': 'activeInventory'}), which strangely returns the entire inventories array. If possible, I'd like to keep the chained query format, as I find it much more readable.
My Ideal Query
return user
.findOne({email : params.username})
.select('inventories')
.where('title')
.equals('activeInventory')
.select('vehicles')
.id(vehicleID)
.exec(cb)
Obviously it doesn't work, but it gives you an idea of what I'm trying to do.
Using the $ positional operator, you can get the results. However, if there are multiple elements in the vehicles array, all of them will be returned, because only one positional operator can be used in a projection and you are working with two arrays (one nested inside the other).
I'd suggest taking a look at the aggregation framework, which gives you a lot more flexibility. Here's an example query for your question that runs in the mongo shell. I'm not familiar with Mongoose, but I expect this will still help and that you'll be able to translate it:
db.collection.aggregate([
// Get only the documents where "email" equals "test#test.com" -- REPLACE with params.username
{"$match" : {email : "test#test.com"}},
// Unwind the "inventories" array
{"$unwind" : "$inventories"},
// Get only elements where "inventories.title" equals "activeInventory"
{"$match" : {"inventories.title":"activeInventory"}},
// Unwind the "vehicles" array
{"$unwind" : "$inventories.vehicles"},
// Filter by vehicle ID -- REPLACE with vehicleID
{"$match" : {"inventories.vehicles._id":ObjectId("53440e94c02b3cae81eb0069")}},
// Tidy up the output
{"$project" : {_id:0, vehicle:"$inventories.vehicles"}}
])
This is the output you'll get:
{
"result" : [
{
"vehicle" : {
"_id" : ObjectId("53440e94c02b3cae81eb0069"),
"tags" : [
"vehicle"
],
"details" : [
{
"_id" : ObjectId("53440e94c02b3cae81eb0066"),
"year" : 2007,
"transmission" : "Manual",
"price" : 1000,
"model" : "Firecar",
"mileageReading" : 50000,
"make" : "Bentley",
"interiorColor" : "blue",
"history" : "CarProof",
"exteriorColor" : "blue",
"driveTrain" : "SWD",
"description" : "test vehicle",
"cylinders" : 4,
"mileageType" : "kms"
}
]
}
}
],
"ok" : 1
}
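For translating this into application code, here is the same sequence of stages modelled in plain JavaScript over an in-memory array. It is a sketch, not Mongoose API: findVehicle is a hypothetical name, and ObjectIds are represented as plain strings so that === comparison works.

```javascript
// $match email -> $unwind inventories -> $match title -> $unwind vehicles
// -> $match vehicle _id -> $project { vehicle }
function findVehicle(users, email, vehicleId) {
  const out = [];
  for (const user of users) {
    if (user.email !== email) continue; // $match on email
    for (const inv of user.inventories || []) { // $unwind inventories
      if (inv.title !== "activeInventory") continue; // $match on title
      for (const vehicle of inv.vehicles || []) { // $unwind vehicles
        if (vehicle._id === vehicleId) out.push({ vehicle }); // $match on _id, then $project
      }
    }
  }
  return out;
}

// Trimmed copy of the sample user document.
const users = [
  {
    email: "test#test.com",
    inventories: [
      {
        _id: "active",
        title: "activeInventory",
        vehicles: [
          { _id: "53440e94c02b3cae81eb0069", tags: ["vehicle"], details: [{ make: "Bentley", model: "Firecar", year: 2007 }] },
        ],
      },
      { _id: "sold", title: "soldInventory", vehicles: [] },
      { _id: "deleted", title: "deletedInventory", vehicles: [] },
    ],
  },
];

console.log(JSON.stringify(findVehicle(users, "test#test.com", "53440e94c02b3cae81eb0069")));
```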
I'm not sure how to express this in the chained query format, but what you are looking for is projection; take a look at http://docs.mongodb.org/manual/reference/operator/projection/
It would probably look something like this:
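user.findOne({email: params.username},
  // $elemMatch projection: return only the first inventories element with this title.
  // It does not filter the nested vehicles array, so params.vehicleId is not applied here.
  {inventories: {$elemMatch: {title: "activeInventory"}}},
  function(err, result) {
    console.log(err);
    console.log(result);
  });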
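Projection only narrows the outer array: an $elemMatch projection returns at most the first matching element, whole, so its nested vehicles array comes back unfiltered. The following plain-JavaScript model illustrates that; projectElemMatch is a hypothetical helper that ignores all other projection rules, and the two vehicle ids are placeholders.

```javascript
// Keep _id (included by default) plus only the FIRST element of `field`
// that satisfies `predicate`; the element is returned whole.
function projectElemMatch(doc, field, predicate) {
  const out = { _id: doc._id };
  const first = (Array.isArray(doc[field]) ? doc[field] : []).find(predicate);
  if (first !== undefined) out[field] = [first]; // no match: field left out
  return out;
}

const user = {
  _id: "53440e94c02b3cae81eb0065",
  email: "test#test.com",
  inventories: [
    { _id: "active", title: "activeInventory", vehicles: [{ _id: "v1" }, { _id: "v2" }] }, // placeholder vehicle ids
    { _id: "sold", title: "soldInventory", vehicles: [] },
  ],
};

const projected = projectElemMatch(user, "inventories", (inv) => inv.title === "activeInventory");
console.log(JSON.stringify(projected));
// {"_id":"53440e94c02b3cae81eb0065","inventories":[{"_id":"active","title":"activeInventory","vehicles":[{"_id":"v1"},{"_id":"v2"}]}]}
```

Picking a single vehicle out of that result still needs either client-side filtering or an aggregation pipeline like the one in the previous answer.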