Migrating existing data in Cloudant

Say I have a data structure like this in cloudant where this is one record:
{
  "UserId": "0014807347",
  "Conq": {
    "reqs": "Testing",
    "tag": "ARRANGEMENT"
  },
  "Outcome": {
    "tag": "ARRANGEMENT",
    "rating": 0
  },
  "id": "cdc11dc55a0006bb544d235e7dc1540a"
}
How could I transform each record of a particular table to add new fields?

Do a PUT with the id and current revision, sending the updated JSON body:
curl "https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/$DATABASE/cdc11dc55a0006bb544d235e7dc1540a" \
-X PUT \
-H "Content-Type: application/json" \
-d "$JSON"
where $JSON is the updated document:
{
  "_id": "cdc11dc55a0006bb544d235e7dc1540a",
  "_rev": "1-THE_CURRENT_REV_ID_HERE",
  "UserId": "0014807347",
  "Conq": {
    "reqs": "Testing",
    "tag": "ARRANGEMENT"
  },
  "Outcome": {
    "tag": "ARRANGEMENT",
    "rating": 0
  },
  "my_new_data_field": "My New Content Goes Here"
}
You should get a response of the type:
{
  "ok": true,
  "id": "cdc11dc55a0006bb544d235e7dc1540a",
  "rev": "2-9176459034"
}
The current revision (indicated by 1-THE_CURRENT_REV_ID_HERE above) must be the revision returned when the document was last written; if it doesn't match, Cloudant rejects the update with a 409 conflict.
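To transform every document in the database the same way, one option is to read all documents with _all_docs?include_docs=true, add the new field to each one, and write them back in a single _bulk_docs request. A minimal sketch with curl and jq (the new field name and value are placeholders, and for a large database you would page through _all_docs rather than fetching it in one call):
# Fetch every document, skip design documents, and add the new field
curl -s "https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/$DATABASE/_all_docs?include_docs=true" \
| jq '{docs: [.rows[].doc
    | select(._id | startswith("_design") | not)
    | . + {"my_new_data_field": "My New Content Goes Here"}]}' \
> updated_docs.json

# Each doc still carries its current _rev, so Cloudant accepts the updates
curl -X POST "https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/$DATABASE/_bulk_docs" \
-H "Content-Type: application/json" \
-d @updated_docs.json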

Related

How to convert JSON array into JSON object and write it into file using shell script?

I have a JSON file in the format below, containing an issues[] array, which I tried to use with Kibana. But unfortunately Kibana doesn't support nested objects and arrays; there is a plugin for that, but using it would require downgrading, which I can't do right now because I would lose all my data.
Sample data:
{
  "expand": "schema,names",
  "startAt": 0,
  "maxResults": 50,
  "total": 4,
  "issues": [{
      "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
      "id": "1999875",
      "self": "https://amazon.kindle.com/jira/rest/api/2/issue/1999875",
      "key": "KINDLEAMZ-67578",
      "fields": {
        "summary": "contingency is displaying for confirmed card.",
        "priority": {
          "name": "P1",
          "id": "1"
        },
        "created": "2019-09-23T11:25:21.000+0000"
      }
    },
    {
      "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
      "id": "2019428",
      "self": "https://amazon.kindle.com/jira/rest/api/2/issue/2019428",
      "key": "KINDLEAMZ-68661",
      "fields": {
        "summary": "card",
        "priority": {
          "name": "P1",
          "id": "1"
        },
        "created": "2019-09-23T11:25:21.000+0000"
      }
    },
    {
      "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
      "id": "2010958",
      "self": "https://amazon.kindle.com/jira/rest/api/2/issue/2010958",
      "key": "KINDLEAMZ-68167",
      "fields": {
        "summary": "Test Card",
        "priority": {
          "name": "P1",
          "id": "1"
        },
        "created": "2019-09-23T11:25:21.000+0000"
      }
    }
  ]
}
So I planned to restructure this payload, turning each element of issues[] into a standalone object written to a separate file, so that I can avoid that issue.
Expected output:
For the above sample data, issues[].length is 4, so I just want to create 4 different files in the format below:
File1.json:
{
  "key": "KINDLEAMZ-67578",
  "summary": "contingency is displaying for confirmed card.",
  "name": "P1",
  "created": "2019-09-23T11:25:21.000+0000"
}
In the same way, I want to loop over the other array elements, get the values as above, and write them to File2.json, File3.json, and File4.json.
Since the data is dynamic, the file creation should happen based on the length of the issues[] array.
Is there any way to achieve this with a shell script, or with any CLI tool?
Please advise me.
Specify the -c/--compact-output flag to make jq put each entity on a single, separate line, then use awk to write each line to a separate file:
jq -c '.issues[] | {
  key,
  summary: .fields.summary,
  name: .fields.priority.name,
  created: .fields.created
}' file | awk '{
  f = ("file" NR ".json")
  print > f
  close(f)
}'
Using GNU awk and extension gawk-json:
awk '
@load "json"
{
  lines = lines $0
  if (json_fromJSON(lines, data) == 1) {
    for (i in data["issues"]) {
      out["key"] = data["issues"][i]["key"]
      out["summary"] = data["issues"][i]["fields"]["summary"]
      out["created"] = data["issues"][i]["fields"]["created"]
      out["name"] = data["issues"][i]["fields"]["priority"]["name"]
      file = "file" i ".json"
      print json_toJSON(out) > file
      close(file)
      delete out
    }
  }
}' file.json
Output:
$ cat file1.json | jq '.' # useless use of cat but used to emphasize
{
  "created": "2019-09-23T11:25:21.000+0000",
  "key": "KINDLEAMZ-67578",
  "summary": "contingency is displaying for confirmed card.",
  "name": "P1"
}

Exact string search in array in Elasticsearch

I want to search for an exact string in an array.
My data in ES is like below:
{
  "category": [
    "abc test"
  ],
  "es_flag": false,
  "bullet_points": [],
  "content": "",
  "description": false
}
I have multiple categories like "abc test", "new abc test", etc.
I am trying the query below, but I am getting results for multiple categories: I was searching for "abc test", but documents with the "new abc test" category are also coming back in the result.
{
  "from": 0,
  "size": 30,
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "category": "abc test" } }
      ]
    }
  },
  "sort": [ { "createdAt": { "order": "desc" } } ]
}
Help will be appreciated.
I'm assuming you are using the default analyzer. In that case a match_phrase query for "abc test" will match every document whose field contains the adjacent tokens abc test, in that order, including:
new abc test
abc test new
foo abc test bar
And it will not match:
abc new test - query tokens are not adjacent
test abc - query tokens are adjacent, but in the wrong order
What would actually help you is using the keyword analyzer for your field (you either need to build a new index from scratch or update your mappings). If you're building from scratch:
curl -XPUT http://localhost:9200/my_index -d '
{
  "mappings": {
    "categories": {
      "properties": {
        "category": {
          "type": "text",
          "analyzer": "keyword"
        }
      }
    }
  }
}'
Afterwards you just need a simple query, e.g. like this (either match or term will do):
curl -XGET http://localhost:9200/my_index/_search -d '
{
  "query": {
    "match": {
      "category": "abc test"
    }
  }
}'
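If you are on Elasticsearch 5.x or later and the index was created with default dynamic mappings, the category text field usually gets a keyword sub-field automatically, so you can get exact matching without reindexing by querying that sub-field (assuming the default sub-field name hasn't been changed):
curl -XGET http://localhost:9200/my_index/_search -H 'Content-Type: application/json' -d '
{
  "query": {
    "term": { "category.keyword": "abc test" }
  }
}'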
My version of elasticsearch is 6.0.1. I am using this approach:
GET <your index>/_search
{
  "query": {
    "bool": {
      "must": [{
        "query_string": {
          "query": "category:abc OR category:test"
        }
      }]
    }
  },
  "sort": [{
    "createdAt": {
      "order": "desc"
    }
  }]
}

How to insert array of document in mongodb using node.js?

I want to insert an array of documents into MongoDB using node.js, but only the first document is getting inserted.
[{
  "userid": "5664",
  "name": "Zero 2679",
  "number": "1234562679",
  "status": "contact",
  "currentUserid": "Abcd"
},
{
  "userid": "5665",
  "name": "Zero 3649",
  "number": "1234563649",
  "status": "contact",
  "currentUserid": "Xyz"
}]
Sample code
collection.insert([{"userid": userid, "name": name, "number": number, "status": status, "currentUserid": currentUserid}], function(err, docs) {
  if (err) {
    res.json({error: "database error"});
  } else {
    collection.find({currentUserid: currentUserid}).toArray(function(err, users) {
      res.send(users);
    });
  }
});
But it is still inserting only the first value. Can you please tell me how to insert all of these documents?
Please kindly go through my post and suggest a solution.
In your sample code you are adding only 1 user: the array you pass to insert() contains a single document.
db.collection('myCollection').insert([doc1, doc2]); inserts two documents using bulk write.
See documentation here: https://docs.mongodb.org/manual/reference/method/db.collection.insert/
From your sample, you can do:
var data = [{
  "userid": "5664",
  "name": "Zero 2679",
  "number": "1234562679",
  "status": "contact",
  "currentUserid": "Abcd"
},
{
  "userid": "5665",
  "name": "Zero 3649",
  "number": "1234563649",
  "status": "contact",
  "currentUserid": "Xyz"
}];

db.collection('myCollection').insert(data)
  .then(function() {
    // find() returns a cursor, so call toArray() to get the documents
    return db.collection('myCollection').find({number: {$in: ["1234563649", "1234562679"]}}).toArray();
  })
  .then(function(res) {
    console.log(res);
  });

Meaning of the text_index parameter in the Concept Insights annotateText call?

The example /annotateText Concept Insights call provides the following example output:
curl -H 'Content-Type: text/plain' -d 'IBM announces new Watson services.' \
'https://watson-api-explorer.mybluemix.net/concept-insights/api/v2/graphs/wikipedia/en-20120601/annotate_text'
{
  "annotations": [
    {
      "concept": {
        "id": "/graphs/wikipedia/en-20120601/concepts/Watson_(computer)",
        "label": "Watson (computer)"
      },
      "score": 0.99832845,
      "text_index": [
        18,
        24
      ]
    },
    {
      "concept": {
        "id": "/graphs/wikipedia/en-20120601/concepts/IBM",
        "label": "IBM"
      },
      "score": 0.9980473,
      "text_index": [
        0,
        3
      ]
    }
  ]
}
What is the meaning of the text_index parameter that is being returned?
text_index gives the character offsets of the text span in which the concept was identified: the first number is the zero-based start position, the second the (exclusive) end position.
In your example, [18, 24] covers the word Watson in "IBM announces new Watson services.", which is where the concept Watson (computer) was detected, and [0, 3] covers IBM.
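A quick way to sanity-check those offsets from the shell (cut counts characters starting at 1 and is end-inclusive, so the zero-based range [18, 24] becomes characters 19-24):
$ echo 'IBM announces new Watson services.' | cut -c19-24
Watson
$ echo 'IBM announces new Watson services.' | cut -c1-3
IBM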

Unique Filter to Elastic Search Column not working (duplicate items inserted)

I've modified my contactNumber field to have a unique filter
by updating the index settings as follows
curl -XPUT localhost:9200/test-index2/_settings -d '
{
  "index": {
    "analysis": {
      "analyzer": {
        "unique_keyword_analyzer": {
          "only_on_same_position": "true",
          "filter": "unique"
        }
      }
    }
  },
  "mappings": {
    "business": {
      "properties": {
        "contactNumber": {
          "analyzer": "unique_keyword_analyzer",
          "type": "string"
        }
      }
    }
  }
}'
A sample Item looks like this,
doc_type:"Business"
contactNumber:"(+12)415-3499"
name:"Sam's Pizza"
address:"Somewhere on earth"
The filter does not work: duplicate items are still inserted, and I'd like NO two documents to have the same contactNumber.
In the above, I've also set only_on_same_position -> true so that existing duplicate values would be truncated/deleted.
What am i doing wrong in the settings?
That's something Elasticsearch can't help you with out of the box: a uniqueness constraint like this has to be enforced by your application (the unique token filter only removes duplicate tokens within a single field value; it does not prevent duplicate documents). The only idea that comes to mind is to use the phone number as the _id of the document itself: whenever you insert or update something, ES will use the contactNumber as the _id and will either update the document that already exists or create a new one.
For example:
PUT /test-index2
{
  "mappings": {
    "business": {
      "_id": {
        "path": "contactNumber"
      },
      "properties": {
        "contactNumber": {
          "type": "string",
          "analyzer": "keyword"
        },
        "address": {
          "type": "string"
        }
      }
    }
  }
}
Then you index something:
POST /test-index2/business
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123"
}
Getting it back:
GET /test-index2/business/_search
{
  "query": {
    "match_all": {}
  }
}
It looks like this:
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test-index2",
"_type": "business",
"_id": "(+12)415-3499",
"_score": 1,
"_source": {
"contactNumber": "(+12)415-3499",
"address": "whatever 123"
}
}
]
}
You see there that the _id of the document is the phone number itself. If you want to change or insert another document (the address is different, there is a new field - whatever_field - but the contactNumber is the same):
POST /test-index2/business
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123 456",
  "whatever_field": "whatever value"
}
Elasticsearch "updates" the existing document and responds with:
{
  "_index": "test-index2",
  "_type": "business",
  "_id": "(+12)415-3499",
  "_version": 2,
  "created": false
}
created is false, which means the document has been updated, not created. _version is 2, which again says the document has been updated. And the _id is the phone number itself, which indicates this is the document that was updated.
Looking again in the index, ES stores this:
"hits": [
{
"_index": "test-index2",
"_type": "business",
"_id": "(+12)415-3499",
"_score": 1,
"_source": {
"contactNumber": "(+12)415-3499",
"address": "whatever 123 456",
"whatever_field": "whatever value"
}
}
]
So, the new field is there, the address has changed, the contactNumber and _id are exactly the same.
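One caveat: the _id path mapping used above was deprecated in Elasticsearch 1.5 and removed in 2.0. On newer versions you can get the same effect by supplying the phone number as the document id in the URL when you index (URL-encode it if your client requires that), for example:
PUT /test-index2/business/(+12)415-3499
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123"
}
If you want an insert to fail when a document with that contactNumber already exists, instead of silently updating it, index through the _create endpoint (PUT /test-index2/business/(+12)415-3499/_create).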
