NoSQL Structure for handling labeled tags - database

Currently I have a hundreds of thousands of files like so:
{
"_id": "1234567890",
"type": "file",
"name": "Demo File",
"file_type": "application/pdf",
"size": "1400",
"timestamp": "1491421149",
"folder_id": "root"
}
Currently, I index all the names, and a client can search for files based on the name of the file. These files also have tags that need to be associated with the file but they also have specific labels.
An example would be:
{
"tags": [
{ "client": "john doe" },
{ "office": "virginia" },
{ "ssn": "1234" }
]
}
Is adding the tags array to my above file object the ideal solution if I want to be able to search thousands of files with a client of John Doe?
The only other solution I can think of is having something an object per tag and having an array of file ID's associated with each tag like so:
{
"_id": "11111111",
"type": "tag",
"label": "client",
"items": [
"1234567890",
"1222222222",
"1333333333"
]
}
With this being a LOT of objects I need to add tags to, I'd rather do it the most efficient way possible FIRST so I don't have to backtrack in the near future when I start running into issues.
Any guidance would be greatly appreciated.

Your original design, with a tags array, works well with Cloudant Search: https://console.ng.bluemix.net/docs/services/Cloudant/api/search.html#search.
With this approach you would define a single design document that will index any tag in the tags array. You do not have to create different views for different tags and you can use the Lucene syntax for queries: http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview.
So, using your example, if you have a document that looks like this with tags:
{
"_id": "1234567890",
"type": "file",
"name": "Demo File",
"file_type": "application/pdf",
"size": "1400",
"timestamp": "1491421149",
"folder_id": "root",
"tags": [
{ "client": "john doe" },
{ "office": "virginia" },
{ "ssn": "1234" }
]
}
You can create a design document that indexes each tag like so:
{
"_id": "_design/searchFiles",
"views": {},
"language": "javascript",
"indexes": {
"byTag": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.type === \"file\" && doc.tags) {\n for (var i=0; i<doc.tags.length; i++) {\n for (var name in doc.tags[i]) {\n index(name, doc.tags[i][name]);\n }\n }\n }\n}"
}
}
}
The function looks like this:
function (doc) {
if (doc.type === "file" && doc.tags) {
for (var i=0; i<doc.tags.length; i++) {
for (var name in doc.tags[i]) {
index(name, doc.tags[i][name]);
}
}
}
}
Then you would search like this:
https://your_cloudant_account.cloudant.com/your_db/_design/searchFiles/_search/byTag
?q=client:jack+OR+office:virginia
&include_docs=true

The solution, that comes into my mind would be using map reduce functions.
To do that, you would add the tags to your original document:
{
"_id": "1234567890",
"type": "file",
"name": "Demo File",
"file_type": "application/pdf",
"size": "1400",
"timestamp": "1491421149",
"folder_id": "root",
"client": "john",
...
}
Afterwards, you can create a design document, that looks like this:
{
"_id": "_design/query",
"views": {
"byClient": {
"map": "function(doc) { if(doc.client) { emit(doc.client, doc._id) }}"
}
}
}
After the view is processed, you can open it with
GET /YOURDB/_design/query/_view/byClient?key="john"
By adding the query parameter include_docs=true, the whole document will be returned, instead of the id.
You can also write your tags into an tags attribute, but you have to update the map function to match the new design.
More information about views can be found here:
http://docs.couchdb.org/en/2.0.0/api/ddoc/views.html

Related

How to include imported fields in the search results?

I'm using document references to import parent fields into a child document. While searches against the parent fields work, the parent fields themselves do not seem to be included in the search results, only child fields.
To use the example in the documentation, salesperson_name does not appear in the fields entry for id:test:ad::1 when using query=John, or indeed when retrieving id:test:ad::1 via GET directly.
Here's a simplified configuration for my document model:
search definitions
person.sd - the parent
search person {
document person {
field name type string {
indexing: summary | attribute
}
}
fieldset default {
fields: name
}
}
event.sd - the child
search event {
document event {
field code type string {
indexing: summary | attribute
}
field speaker type reference<person> {
indexing: summary | attribute
}
}
import field speaker.name as name {}
fieldset default {
fields: code
}
}
documents
p1 - person
{
"fields": {
"name": "p1"
}
}
e1 - event
{
"fields": {
"code": "e1",
"speaker": "id:n1:person::1"
}
}
query result
curl -s "http://localhost:8080/search/?yql=select%20*%20from%20sources%20*where%20name%20contains%20%22p1%22%3B" | python -m json.tool
This returns both e1 and p1, as you would expect, given that name is present in both. But the fields of e1 do not include the name.
{
"root": {
"children": [
{
"fields": {
"documentid": "id:n1:person::1",
"name": "p1",
"sddocname": "person"
},
"id": "id:n1:person::1",
"relevance": 0.0017429193899782135,
"source": "music"
},
{
"fields": {
"code": "e1",
"documentid": "id:n1:event::1",
"sddocname": "event",
"speaker": "id:n1:person::1"
},
"id": "id:n1:event::1",
"relevance": 0.0017429193899782135,
"source": "music"
}
],
...
"fields": {
"totalCount": 2
},
}
}
Currently you'll need to add the imported 'name' into the default summary by
import field speaker.name as name {}
document-summary default {
summary name type string{}
}
More about explicit document summaries in http://docs.vespa.ai/documentation/document-summaries.html
The result of your query will then return
"children": [
{
"fields": {
"documentid": "id:n1:person::1",
"name": "p1",
"sddocname": "person"
},
"id": "id:n1:person::1",
"relevance": 0.0017429193899782135,
"source": "stuff"
},
{
"fields": {
"code": "e1",
"documentid": "id:n1:event::1",
"name": "p1",
"sddocname": "event",
"speaker": "id:n1:person::1"
},
"id": "id:n1:event::1",
"relevance": 0.0017429193899782135,
"source": "stuff"
}
],
We'll improve the documentation on this. Thanks for the very detailed write-up.
Add "summary" to the indexing statement of the imported field in the parent document type.
E.g in the documentation example change the "name" field in the "salesperson" document type to say "indexing: attribute | summary".

Add new object inside array of objects, inside array of objects in mongodb

Considering the below bad model, as I am totally new to this.
{
"uid": "some-id",
"database": {
"name": "nameOfDatabase",
"collection": [
{
"name": "nameOfCollection",
"fields": {
"0": "field_1",
"1": "field_2"
}
},
{
"name": "nameOfAnotherCollection",
"fields": {
"0": "field_1"
}
}
]
}
}
I have the collection name (i.e database.collection.name) and I have a few fields to add to it or delete from it (there are some already existing ones under database.collection.fields, I want to add new ones or delete exiting ones).
In short how do I update/delete "fields", when I have the database name and the collection name.
I cannot figure out how to use positional operator $ in this context.
Using mongoose update as
Model.update(conditions, updates, options, callback);
I don't know what are correct conditions and correct updates parameters.
So far I have unsuccessfully used the below for model.update
conditions = {
"uid": req.body.uid,
"database.name": "test",
"database.collection":{ $elemMatch:{"name":req.body.collection.name}}
};
updates = {
$set: {
"fields": req.body.collection.fields
}
};
---------------------------------------------------------
conditions = {
"uid": req.body.uid,
"database.name": "test",
"database.collection.$.name":req.body.collection.name
};
updates = {
$addToSet: {
"fields": req.body.collection.fields
}
};
I tried a lot more but none did work, as I am totally new.
I am getting confused between $push, $set, $addToSet, what to use what not to?, how to?
The original schema is supposed to be as show below, but running queries on it is getting harder n harder.
{
"uid": "some-id",
"database": [
{ //array of database objects
"name": "nameOfDatabase",
"collection": [ //array of collection objects inside respective databases
{
"name": "nameOfCollection",
"fields": { //fields inside a this particular collection
"0": "field_1",
"1": "field_2"
}
}
]
}
]
}

"There is no index available for this selector" despite the fact I made one

In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
"index": {
"fields": [
{
"name": "sensorid",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
}
]
},
"type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
"selector": {
"timestamp": {
"$gt": "2015-10-13 16:00:00"
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"_id": "asc"
}
]
}
What have I done wrong?
It seems to me like cloudant query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like:
"selector":{
"_id":{
"$gt":0
},
"timestamp":{
"$gt":"2015-10-13 16:00:00"
}
}
I hope this works for you!

Node.JS - How to access Values of Dictionary within an Array of a Key in a Dictionary?

I'm new in Node.JS and I'm able to parse the JSON data and do a console log to print out name and badges.
var details = JSON.parse(body);
console.log(details.name, details.badges.length);
But I don't know how I can get the data inside the arrays of the bagdes such as id, name, url.
I tried
console.log(details.badges.length.id);
But nothing shows up. How can I access that? Thank you.
{
"name": "Andrew Chalkley",
"badges": [
{
"id": 49,
"name": "Newbie",
"url": "http:\/\/teamtreehouse.com\/chalkers",
"icon_url": "https:\/\/achievement-images.teamtreehouse.com\/Generic_Newbie.png",
"earned_date": "2012-07-23T19:59:34.000Z",
"courses": [
]
},
{
"id": 26,
"name": "Introduction",
"url": "http:\/\/teamtreehouse.com\/library\/html\/introduction",
"icon_url": "https:\/\/achievement-images.teamtreehouse.com\/HTML_Basics.png",
"earned_date": "2012-07-23T21:57:24.000Z",
"courses": [
{
"title": "HTML",
"url": "http:\/\/teamtreehouse.com\/library\/html",
"badge_count": 1
},
{
"title": "Introduction",
"url": "http:\/\/teamtreehouse.com\/library\/html\/introduction",
"badge_count": 1
}
]
}
}
It is an array, so you need the index, for example: details.badges[0].id
This will return the first (index 0) element id.
.length only returns the length of the array, so it will not be useful to get the data in it.

CouchDB View With OR Condition

I have two kinds of documents in couchDB with following json type:
1.
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"Vehicle": {
"type": "STRING",
"name": "Vehicle",
"value": "12345"
},
"Start": {
"type": "DATE",
"name": "Start",
"value": "2014-09-10T11:19:00.000Z"
}
}
2.
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"Equipment": {
"type": "STRING",
"name": "Equipment",
"value": "12345"
},
"Start": {
"type": "DATE",
"name": "Start",
"value": "2014-09-10T11:19:00.000Z"
}
}
I want to make one view which search all these documents whose doc.Vehicle.value=12345 OR doc.Equipment.value=12345.
How can I make this view that will return all these kind of documents.
Thanks in advance.
Just emit both (yes, map functions may emits multiple times different key-values for the same doc) values with your view:
function(doc){
if (doc.Equipment) {
emit(doc.Equipment.value, null)
}
if (doc.Vehicle) {
emit(doc.Vehicle.value, null)
}
}
And request them by the same key:
http://localhost:5984/db/_design/ddoc/_view/by_equip_value?key="12345"
See also the Guide to Views for more info about CouchDB views.
With Kxepals Version, you cannot query the type of results ("12345" can be either Vehicle, OR Equipment). you can only see the result when you use "include_docs=true" and search inside the doc, or make a second query with the id of the results.
If you want to see the type (or Query by type) you need to extend the View :
..
if(doc.Equipment) {
emit (doc.Equipment.value,doc.Equipment.name);
}
if(doc.Vehicle) {
emit(doc.Vehicle.value,doc.Vehicle.name);
}
Here, the name is the value of the result rows.
But you can also define the results in the query, if you put the name as a first query item:
if(doc.Equipment) {
emit([doc.Equipment.name,doc.Equipment.value],null);
}
if(doc.Vehicle) {
emit ([doc.Vehicle.name,doc.Vehicle.value],null);
}
Here, the
Your Query for Vehicles:
/viewname?startkey=["Vehicle"]&Endkey=["Vehicle",{}]
Equipment:
/viewname?startkey=["Equipment"]&endkey=["Equipment,{}]
Here, the name is the first Item of the result rows key array.
Maybe this will help : http://de.slideshare.net/okurow/couchdb-mapreduce-13321353
BTW: Better solution would be :
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"type": "Vehicle",
"value":"12345",
"Start": {
"type": "DATE",
"name": "Start", // ? maybe also obsolete, because already inside "Start" Element
"value": "2014-09-10T11:19:00.000Z"
}
}
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"type": "Equipment",
"value":"12345",
"Start": {
"type": "DATE",
"name": "Start", // ? maybe also obsolete, because already inside "Start" Element
"value": "2014-09-10T11:19:00.000Z"
}
}
in this case you can use only one emit:
emit([doc.type,doc.value],null)

Resources