Delete Elasticsearch index without deleting its mappings

How can I delete data from my Elasticsearch database without deleting my index mapping?
I am using the Tire gem, and the delete command deletes all my mappings, so I have to run the create command again. I want to avoid running the create command again and again.
Please help me out with this.

Found it at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
DELETE <index>/_query
{
  "query" : {
    "match_all": {}
  }
}
You can also delete just a specific type by changing it to DELETE <index>/<type>/_query.
This will delete the data and maintain the mappings, setting, etc.

You can use index templates, which will be applied to indices whose name matches a pattern.
That way you can simply delete an index using the delete index API (much better than deleting all the documents in it), and when you recreate the same index, the matching index templates are applied to it, so you don't need to recreate its mappings, settings, warmers and so on.
What happens is that the mappings will get deleted as they refer to the index that you deleted, but since they are stored in the index templates as well you won't need to resubmit them again when recreating the same index later on.
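To make the template idea concrete, here is a small JavaScript sketch that only builds the template request; it does not talk to a cluster. The template name `articles_template`, the pattern `articles*` and the `article` mapping are hypothetical. It assumes the legacy `/_template/<name>` endpoint, where Elasticsearch 1.x through 5.x read the name pattern from `template` and 6.x reads it from `index_patterns`.

```javascript
// Builds a legacy index-template request (hypothetical helper, no network I/O).
// Any index created later whose name matches `pattern` receives these
// settings and mappings automatically.
function indexTemplateRequest(name, pattern, settings, mappings, majorVersion) {
  const body = { settings: settings, mappings: mappings };
  if (majorVersion >= 6) {
    body.index_patterns = [pattern]; // 6.x: an array of patterns
  } else {
    body.template = pattern;         // 1.x-5.x: a single pattern string
  }
  return { method: "PUT", path: "/_template/" + name, body: body };
}

// Example: a 1.x-style typed mapping for a hypothetical "article" type
const tpl = indexTemplateRequest(
  "articles_template",
  "articles*",
  { number_of_shards: 1 },
  { article: { properties: { title: { type: "string" } } } },
  1
);
console.log(tpl.method, tpl.path);
console.log(JSON.stringify(tpl.body, null, 2));
```

Once a template like this is stored (a PUT of that body to that path), deleting the `articles` index and creating it again should pick up the settings and mappings from the template without resubmitting them.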

Due to the way Elasticsearch deletes its documents (by flagging each document for deletion in a bitset), it wouldn't be worthwhile to iterate through X documents and flag them for deletion. I believe the space is only reclaimed later, when segments are merged and the documents flagged in the delete bitset are dropped; that is an expensive operation and slows down the shards the index resides on.
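The bitset point can be pictured with a toy model. This is plain JavaScript written for illustration, not Elasticsearch or Lucene code, and it simplifies the real design: it assumes a segment is immutable, so a delete only clears a bit, and the stored documents go away only when a merge rewrites the segment without them.

```javascript
// Toy segment: stored documents are never edited in place; deletes only
// clear a flag in the "live docs" bitset.
class ToySegment {
  constructor(docs) {
    this.docs = docs;
    this.live = new Array(docs.length).fill(true); // the delete bitset
  }
  delete(id) {
    const i = this.docs.findIndex(d => d.id === id);
    if (i !== -1) this.live[i] = false; // mark only; nothing is removed yet
  }
  liveCount() {
    return this.live.filter(Boolean).length;
  }
  storedCount() {
    return this.docs.length;
  }
  merge() {
    // Rewrite a new segment holding only live docs (the expensive step)
    return new ToySegment(this.docs.filter((_, i) => this.live[i]));
  }
}

const seg = new ToySegment([{ id: 1 }, { id: 2 }, { id: 3 }]);
seg.delete(2);
console.log(seg.liveCount(), seg.storedCount()); // 2 3
const merged = seg.merge();
console.log(merged.liveCount(), merged.storedCount()); // 2 2
```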
Hope this helps.

Updating Yehosef's answer based on the latest docs (6.2 as of this post):
POST <index>/_delete_by_query
{
  "query" : {
    "match_all": {}
  }
}
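Because the endpoint changed between versions (the older DELETE <index>/_query versus POST <index>/_delete_by_query), a small JavaScript sketch can pick the right request. `deleteAllDocsRequest` is a hypothetical helper that only builds the method, path and body and sends nothing over the network; it assumes the pre-5.0 form is available (built in on 1.x, through the delete-by-query plugin on 2.x).

```javascript
// Builds the request that deletes every document in `index` but keeps the
// index itself, with its mappings and settings. `type` is optional and only
// used for the pre-5.0 form.
function deleteAllDocsRequest(index, majorVersion, type) {
  const body = { query: { match_all: {} } };
  if (majorVersion >= 5) {
    // 5.0 and later: the _delete_by_query API, sent as a POST
    return { method: "POST", path: "/" + index + "/_delete_by_query", body: body };
  }
  // before 5.0: DELETE <index>/_query or DELETE <index>/<type>/_query
  const path = type ? "/" + index + "/" + type + "/_query" : "/" + index + "/_query";
  return { method: "DELETE", path: path, body: body };
}

const req = deleteAllDocsRequest("myindex", 6);
console.log(req.method, req.path, JSON.stringify(req.body));
// prints: POST /myindex/_delete_by_query {"query":{"match_all":{}}}
```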

Delete by query is deprecated as of 1.5.3 (it was removed from core in 2.0 and came back as _delete_by_query in 5.0).
You should use the scroll/scan API to find all matching ids and then issue a bulk request to delete them.
As documented here
curl -XGET 'localhost:9200/realestate/houses/_search?scroll=1m' -d '
{
"query": {
"match_all" : { }
},
"fields": []
}
'
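Keep calling the scroll endpoint (_search/scroll) with the returned _scroll_id until no more hits come back, and then bulk delete the ids you collected (don't forget to put a newline after the last line):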
curl -XPOST 'localhost:9200/_bulk' -d '
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "1" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "2" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "3" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "4" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "5" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "6" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "7" } }
{ "delete" : { "_index" : "realestate", "_type" : "houses", "_id" : "8" } }
'
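Writing those bulk lines by hand does not scale, so here is a small JavaScript sketch that generates the body from the ids collected while scrolling. `bulkDeleteBody` is a hypothetical helper; it only builds the newline-delimited text (one action per line, ending with the required trailing newline) and leaves sending it to localhost:9200/_bulk to curl or your HTTP client. The `_type` field matches the 1.x-era example above.

```javascript
// Turns a list of document ids into a _bulk body of delete actions.
function bulkDeleteBody(index, type, ids) {
  if (ids.length === 0) return ""; // nothing to send
  return ids
    .map(id => JSON.stringify({ delete: { _index: index, _type: type, _id: String(id) } }))
    .join("\n") + "\n"; // the bulk API requires a newline after the last line
}

const body = bulkDeleteBody("realestate", "houses", [1, 2, 3]);
process.stdout.write(body);
// first line: {"delete":{"_index":"realestate","_type":"houses","_id":"1"}}
```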

Related

String from document meets value of array

I've got an array of project IDs, for example:
[ 'ExneN3NdwmGPgRj5o', 'hXoRA7moQhqjwtaiY' ]
And in my Questions collection, I've got a field called 'project', which has a string of a project Id. For example:
{
"_id" : "XPRbFupkJPmrmvcin",
"question" : "Vraag 13",
"answer" : "photo",
"project" : "ExneN3NdwmGPgRj5o",
"datetime_from" : ISODate("2017-01-10T08:01:00Z"),
"datetime_till" : ISODate("2017-01-10T19:00:00Z"),
"createdAt" : ISODate("2017-01-10T08:41:39.950Z"),
"notificationSent" : true
}
{
"_id" : "EdFH6bo2xBPht5kYW",
"question" : "sdfadsfasdf",
"answer" : "text",
"project" : "hXoRA7moQhqjwtaiY",
"datetime_from" : ISODate("2017-01-11T11:00:00Z"),
"datetime_till" : ISODate("2017-01-11T17:00:00Z"),
"createdAt" : ISODate("2017-01-10T10:21:42.147Z"),
"notificationSent" : false
}
Now I want to return all documents of the Questions collection where the project (id) is one of the values in the array.
To test if it's working, I'm first trying to return one document.
I'm console.logging it like this:
Questions.findOne({project: { $eq: projectArray }})['_id'];
but I have also tried this:
Questions.findOne({project: { $in: [projectArray] }})['_id'];
But I keep getting 'undefined'.
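Please try this. $in takes the array itself, so pass projectArray directly instead of comparing with $eq or wrapping it in another array:
Questions.find({project: { $in: projectArray }}); // cursor over all docs with those ids (call .fetch() for an array)
Questions.findOne({project: { $in: projectArray }}); // if you want just one doc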
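Why the first two attempts fail can be seen with a plain-JavaScript equivalent of the $in query (an in-memory illustration, not Meteor or MongoDB code; `findByProjects` is a hypothetical name, and the third sample question is made up to show a non-match). $in matches when the string field equals any element of the array, whereas $eq: projectArray asks for the field to equal the whole array and $in: [projectArray] asks for it to equal the nested array, and a string field equals neither.

```javascript
// Keep the questions whose `project` string is one of the given ids,
// which is what {project: {$in: projectArray}} selects.
function findByProjects(questions, projectArray) {
  const wanted = new Set(projectArray);
  return questions.filter(q => wanted.has(q.project));
}

const projectArray = ["ExneN3NdwmGPgRj5o", "hXoRA7moQhqjwtaiY"];
const questions = [
  { _id: "XPRbFupkJPmrmvcin", question: "Vraag 13", project: "ExneN3NdwmGPgRj5o" },
  { _id: "EdFH6bo2xBPht5kYW", question: "sdfadsfasdf", project: "hXoRA7moQhqjwtaiY" },
  { _id: "hypotheticalOther", question: "not in the list", project: "anotherProject" }
];
// logs the _ids of the first two questions
console.log(findByProjects(questions, projectArray).map(q => q._id));
```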

MongoDB: Check for missing documents using a model tree structures with an array of ancestors

I'm using a model tree structures with an array of ancestors and I need to check if any document is missing.
{
"_id" : "GbxvxMdQ9rv8p6b8M",
"type" : "article",
"ancestors" : [ ]
}
{
"_id" : "mtmTBW8nA4YoCevf4",
"parent" : "GbxvxMdQ9rv8p6b8M",
"ancestors" : [
"GbxvxMdQ9rv8p6b8M"
]
}
{
"_id" : "J5Dg4fB5Kmdbi8mwj",
"parent" : "mtmTBW8nA4YoCevf4",
"ancestors" : [
"GbxvxMdQ9rv8p6b8M",
"mtmTBW8nA4YoCevf4"
]
}
{
"_id" : "tYmH8fQeTLpe4wxi7",
"refType" : "reference",
"parent" : "J5Dg4fB5Kmdbi8mwj",
"ancestors" : [
"GbxvxMdQ9rv8p6b8M",
"mtmTBW8nA4YoCevf4",
"J5Dg4fB5Kmdbi8mwj"
]
}
My attempt would be to check whether each ancestor id exists. If a lookup fails, that document is missing and the data structure is corrupted.
const missing = []; // ids of documents that reference a non-existent ancestor
Collection.find().forEach(r => {
  if (r.ancestors) {
    r.ancestors.forEach(a => {
      // one findOne per ancestor id: this is what causes the many db calls
      if (!Collection.findOne(a))
        missing.push(r._id);
    });
  }
});
But doing it like this needs MANY db calls. Is it possible to optimize this?
Maybe I could first get an array of all unique ancestor ids and check whether these documents exist with a single db call?
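First get all distinct ancestor ids from your collection.
var allAncestorIds = db.<collectionName>.distinct("ancestors");
Then check which of the ancestor IDs are not in the collection. Querying with $nin would return the documents that are not ancestors rather than the missing ones, so fetch the ancestor ids that do exist and take the difference:
var existingIds = db.<collectionName>.distinct("_id", {_id : {$in : allAncestorIds}});
var missingIds = allAncestorIds.filter(function (id) {
    return existingIds.indexOf(id) === -1; // fine for string ids like these
});
Insert all missing ids into a separate collection.
missingIds.forEach(function (id) {
    db.missing.insert({_id : id});
});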
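The same set-difference idea can be checked on plain arrays before touching the database. This in-memory JavaScript sketch (the name `checkAncestors` is hypothetical) reads every document once, reports the ancestor ids that no document has, and also lists the documents that point at them, which is what the question's missing.push(r._id) was collecting. The sample leaves out the mtmTBW8nA4YoCevf4 document to simulate corruption.

```javascript
// Returns the ancestor ids that have no document, plus the ids of the
// documents that reference at least one of them.
function checkAncestors(docs) {
  const existing = new Set(docs.map(d => d._id));
  const missingIds = new Set();
  const brokenDocs = [];
  for (const d of docs) {
    const gaps = (d.ancestors || []).filter(a => !existing.has(a));
    gaps.forEach(a => missingIds.add(a));
    if (gaps.length > 0) brokenDocs.push(d._id);
  }
  return { missingIds: [...missingIds], brokenDocs: brokenDocs };
}

// Sample from the question with the mtmTBW8nA4YoCevf4 document removed
const docs = [
  { _id: "GbxvxMdQ9rv8p6b8M", ancestors: [] },
  { _id: "J5Dg4fB5Kmdbi8mwj", ancestors: ["GbxvxMdQ9rv8p6b8M", "mtmTBW8nA4YoCevf4"] },
  { _id: "tYmH8fQeTLpe4wxi7", ancestors: ["GbxvxMdQ9rv8p6b8M", "mtmTBW8nA4YoCevf4", "J5Dg4fB5Kmdbi8mwj"] }
];
console.log(checkAncestors(docs));
```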

MongoDB Update array in a document

I'm trying to update arrays in multiple documents with this query:
db.BusinessRequest.update({"DealTypes": { $exists: true }, "DealTypes.DisplayName": "Minority trade sale" }, {$set:{"DealTypes.$.DisplayName":"Minority"}}, false,true );
but when there is a match, it only updates the first element of my array, even though that element's DisplayName does not match.
I use the IntelliShell in MongoChef.
My document looks like this :
{
"_id" : BinData(4, "IKC6QJRGSIywmKTKKRfTHA=="),
"_t" : "InvestorBusinessRequest",
"Title" : "Business Request 000000002",
"DealTypes" : [
{
"_id" : "60284B76-1F45-49F3-87B5-5278FF49A304",
"DisplayName" : "Majority",
"Order" : "001"
},
{
"_id" : "64A52AFE-2FF5-426D-BEA7-8DAE2B0E59A6",
"DisplayName" : "Majority trade sale",
"Order" : "002"
},
{
"_id" : "C07AE70D-4F62-470D-BF65-06AF93CCEBFA",
"DisplayName" : "Minority trade sale",
"Order" : "003"
},
{
"_id" : "F5C4390A-CA7D-4AC8-873E-2DC43D7F4158",
"DisplayName" : "Equity fund raising",
"Order" : "004"
}
]
}
How can I achieve this? Thanks in advance.
EDIT :
This line works :
db.BusinessRequest.update({"DealTypes": { $exists: true }, "DealTypes": { $elemMatch: {"DisplayName": "Majority trade sale"}}}, {$set:{"DealTypes.$.DisplayName":"Majority"}}, false,true );
Please try this:
db.BusinessRequest.find({"DealTypes.DisplayName": "Minority trade sale"}).forEach(function (doc) {
    var res;
    do {
        // each pass renames one matching element of this document
        res = db.BusinessRequest.update(
            { _id: doc._id, "DealTypes.DisplayName": "Minority trade sale" },
            { $set: { "DealTypes.$.DisplayName": "Minority" } });
    } while (res.nModified > 0);
});
or
You cannot modify multiple array elements in a single update operation (this holds before MongoDB 3.6, which added the filtered positional operator $[<identifier>]). Thus, you'll have to repeat the update in order to migrate documents that need multiple array elements modified. You can do this by iterating through each document in the collection, repeatedly applying an update with $elemMatch until the document has all of its relevant elements replaced.
db.BusinessRequest.update({"DealTypes": { $exists: true }, "DealTypes": { $elemMatch: {"DisplayName": "Majority trade sale"}}}, {$set:{"DealTypes.$.DisplayName":"Majority"}}, false,true );
If you need efficient searches, I suggest normalising the schema so that each array element is kept in a separate document.
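On MongoDB 3.6 or later, the filtered positional operator removes the need to loop. The JavaScript below is a sketch that only assembles the three arguments; passing them as db.BusinessRequest.updateMany(spec.filter, spec.update, spec.options) should rename every matching element in one call. The identifier `d` and the helper name `renameAllSpec` are arbitrary, and this assumes a server and shell new enough to support arrayFilters.

```javascript
// Builds filter/update/options for a MongoDB 3.6+ updateMany call that renames
// every DealTypes element whose DisplayName equals `from`.
function renameAllSpec(from, to) {
  return {
    // only touch documents that contain at least one matching element
    filter: { "DealTypes.DisplayName": from },
    // $[d] stands for every array element that satisfies the "d" filter below
    update: { $set: { "DealTypes.$[d].DisplayName": to } },
    options: { arrayFilters: [{ "d.DisplayName": from }] }
  };
}

const spec = renameAllSpec("Minority trade sale", "Minority");
console.log(JSON.stringify(spec));
```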
Please execute the following script in your mongo shell:
db.BusinessRequest.find({"DealTypes": {$exists: true}}).forEach(function (item) {
    // rename every matching element, not just the first one
    for (var i = 0; i < item.DealTypes.length; i++) {
        if (item.DealTypes[i].DisplayName === 'Minority trade sale') {
            item.DealTypes[i].DisplayName = 'Minority';
        }
    }
    // write the whole modified document back
    db.BusinessRequest.save(item);
});
The last two arguments in your update use the older positional form (upsert, multi); the documented form of the update() method in MongoDB takes an options document:
db.collection.update(
<query>,
<update>,
{
upsert: <boolean>,
multi: <boolean>,
writeConcern: <document>
}
)
I believe your update should look like this:
db.BusinessRequest.update(
    {"DealTypes": { $exists: true }, "DealTypes.DisplayName": "Minority trade sale" },
    {$set: {"DealTypes.$.DisplayName": "Minority"}},
    { upsert : false, multi : true });

Mongodb get unique value from documents and add to array

I wanted to do a query to match documents in one collection with documents in another collection based upon a value which should be contained in both sets of documents but, as I have been informed that Mongo does not support a JOIN, I believe I can't do what I want in the way I want to.
My alternative method, then, is to insert a document into the collection (col1) that I want to query and update, containing an array of all the unique cycle numbers in the other collection (col2).
Collection 1 (Col 1)
{
"_id" : ObjectId("5670961f910e1f54662c11ag"),
"objectType" : "Account Balance",
"Customer" : "Thomas Brown",
"status" : "unprocessed",
"cycle" : "1234"
},
{
"_id" : ObjectId("5670961f910e1f54662c12fd"),
"objectType" : "Account Balance",
"Customer" : "Luke Underwood",
"status" : "unprocessed",
"cycle" : "1235"
}
Collection 2 (Col 2)
{
"_id" : ObjectId("5670961f910e1f54662c1d9d"),
"objectOrigin" : "Xero",
"Value" : "500.00",
"key" : "grossprofit",
"cycle" : "1234",
"company" : "e56e09ef-5c7c-423e-b699-21469bd2ea00"
},
{
"_id" : ObjectId("5670961f910e1f54662c1d9f"),
"objectOrigin" : "Xero",
"Value" : "500.00",
"key" : "grossprofit",
"cycle" : "1234",
"company" : "0a514db8-1428-4da6-9225-0286dc2662c1"
},
{
"_id" : ObjectId("5670961f910e1f54662c1da0"),
"objectOrigin" : "Xero",
"Value" : "-127.28",
"key" : "grossprofit",
"cycle" : "1234",
"company" : "c2d0561c-dc5d-44b9-beaf-d69a3472a2b8"
},
{
"_id" : ObjectId("5670961f910e1f54662c1da1"),
"objectOrigin" : "Xero",
"Value" : "-127.28",
"key" : "grossprofit",
"cycle" : "1235",
"company" : "c3fbe6e4-962a-45f6-9ce3-71e2a588438c"
}
So I want to create a document in collection 1 which looks like this:
{
"_id" : ObjectId("5670961f910e1f54662c1d9f"),
"objectType" : "Status Updater",
"cycles" : ["1234","1235"]
}
Now what I want to do is query ALL documents whose cycle is one of the values in cycles and update "status" to "processed". I believe I would do this with a findAndModify with multi : true, but I'm not entirely sure.
When finished, I will just simply delete any document in the Collection 1 where objectType is "Status Updater".
If I understand correctly, you want to
a) update all documents in collection #1 where the value of cycle exists in collection #2.
b) Furthermore, your document of type "objectType" : "Status Updater" is only a temporary document to keep track of all the cycle values.
I think you can skip b) and just use the following (this code needs to be executed in the mongo shell):
// get all unique cycle values from collection #2
// returns an array containing: ["1234", "1235", ...]
values = db.coll2.distinct("cycle", {})
// find all documents in collection #1 whose cycle value is in the values array
// and update their status
db.coll1.update({cycle: {$in: values}}, {$set: {status: "processed"}}, {multi: true})
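Here is the same two-step logic run on plain arrays, as an in-memory JavaScript illustration rather than shell code (`markProcessed` is a hypothetical name, and the third col1 document with cycle "9999" is made up to show a non-match). The Set plays the role of distinct("cycle") and the loop plays the role of the $in update with multi: true, so no temporary "Status Updater" document is needed.

```javascript
// Marks every col1 document whose cycle appears anywhere in col2 as
// processed and returns how many documents were changed.
function markProcessed(col1, col2) {
  const cycles = new Set(col2.map(d => d.cycle)); // like distinct("cycle")
  let updated = 0;
  for (const doc of col1) {
    if (cycles.has(doc.cycle) && doc.status !== "processed") {
      doc.status = "processed";
      updated += 1;
    }
  }
  return updated;
}

const col1 = [
  { Customer: "Thomas Brown", status: "unprocessed", cycle: "1234" },
  { Customer: "Luke Underwood", status: "unprocessed", cycle: "1235" },
  { Customer: "Hypothetical Customer", status: "unprocessed", cycle: "9999" }
];
const col2 = [{ cycle: "1234" }, { cycle: "1234" }, { cycle: "1234" }, { cycle: "1235" }];
console.log(markProcessed(col1, col2), col1.map(d => d.status));
```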

mongodb - mapreducing two collections where one collection has ids in an array of array

I'm very new to MongoDB and having some problems with joining two collections.
I've read some posts on using mapReduce to perform a NoSQL-style join, but I'm still having some difficulties here.
Collection 1: attraction
{
"_id" : "0001333b-e485-4fee-a0e2-9b7dc338d5a2",
"types" : "Shops",
"name" : "name",
"geo_location" : {
"lat" : 36.0567700000000002,
"lon" : -112.1354520000000008
},
"overall_rating" : 10.0000000000000000,
"num_of_review" : 6,
"review" : [
{
"review_ids" : [
"66ea1cd8-da34-40dc-8ad6-f30df5de9c2c",
"76f51c8d-d2a8-4609-8b7c-c2b0c386e35c",
"185c962a-fcfe-4d03-a3ac-86398be6312a",
"2212535b-28c6-423e-91f7-cc1dfb407d79",
"7e0f1d85-e79e-4bec-9e9c-7dfb03223816",
"f19a83a6-c6ef-4cbe-b90d-f6187bd50baa"
]
}
]
}
Collection 2: attraction_review
{
"_id" : "7e0f1d85-e79e-4bec-9e9c-7dfb03223816",
"user_id" : "somename",
"review_id" : "r122796525",
"unified_id" : "0001333b-e485-4fee-a0e2-9b7dc338d5a2",
"source_id" : "d1057961",
"review_url" : "someURL",
"title" : "some title",
"overall_rating" : 10,
"review_date" : "dates",
"content" : "some contents here",
"source" : "source",
"traval_date" : "dates",
"sort" : ""
}
Basically, I need to keep (or copy) the reviews in attraction_review whose _id appears in the review_ids array of the attraction collection.
In the example above, the matching review is the one with _id 7e0f1d85-e79e-4bec-9e9c-7dfb03223816.
It is guaranteed that the attraction_review collection contains every id in review_ids for all records in the attraction collection.
The difficulty here is that the review_ids array is nested within the review array, and I am not sure how to go about mapping many instances of ids.
I would be grateful for some suggestions.
Many thanks
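One possible approach, sketched on plain arrays in JavaScript rather than with mapReduce (so it swaps in a different technique from the one the question mentions): flatten the nested review[].review_ids arrays into one set of ids, then keep the attraction_review documents whose _id is in that set. The helper name is hypothetical, and the second sample review is made up to show a non-match. Inside MongoDB 3.2 or later, the comparable route would usually be an aggregation that unwinds the nested arrays with $unwind and then joins with $lookup.

```javascript
// Collect every id from the nested review[].review_ids arrays, then keep the
// reviews whose _id appears among them (an in-memory semi-join).
function reviewsReferencedByAttractions(attractions, reviews) {
  const ids = new Set(
    attractions.flatMap(a => (a.review || []).flatMap(r => r.review_ids || []))
  );
  return reviews.filter(rv => ids.has(rv._id));
}

const attractions = [{
  _id: "0001333b-e485-4fee-a0e2-9b7dc338d5a2",
  review: [{ review_ids: ["66ea1cd8-da34-40dc-8ad6-f30df5de9c2c", "7e0f1d85-e79e-4bec-9e9c-7dfb03223816"] }]
}];
const reviews = [
  { _id: "7e0f1d85-e79e-4bec-9e9c-7dfb03223816", title: "some title" },
  { _id: "hypothetical-unreferenced-id", title: "not referenced" }
];
console.log(reviewsReferencedByAttractions(attractions, reviews).map(r => r._id));
```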
