MongoDB schema design for a User database

I'm relatively new to MongoDB and I'm looking for advice on the database design for an upcoming project. Essentially, I will have Users, each of which is associated with multiple Clients. Each Client belongs to one User and has up to 700 Pages associated with it. Each Page will likely belong to only one Client. I'm wondering what the best way to build my schema is with this basic design.
In my research, I've gathered that one-to-many relationships whose data won't be updated often (like my relationship between User and Client, and probably Client to Page) should be embedded in a single document. In that case, I would write a User document something like this:
{
"_id": "12345",
"name" : "Joe Guy",
"clients" : [
{
"_id": "234234",
"name": "Sue Lady",
"pages" : [
{
"_id": "22342",
"title" : "roy's page",
"url" : "https://web.com"
},
{
"_id": "23929",
"title" : "jake's page",
"url" : "https://web1.com"
}
]
},
{
"_id": "98934",
"name": "bobby man",
"pages" : [
{
"_id": "159837",
"title" : "ted's page",
"url" : "https://web2.com"
}
]
}
]
}
Does this seem like a reasonable design? Or might it be smarter to break out Clients and/or Pages into their own documents? With the amount of Pages per Client being up to 700, I'm not sure I'd want to load all of those Pages each time a User is loaded. With that said, the number of Pages per client probably won't change much after they are first set, at most by a few up or down over a period of time.
Any help is greatly appreciated, thank you!
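For comparison, the alternative raised in the question (Clients and Pages broken out into their own documents) could look roughly like the sketch below. It is only an illustration in plain Python: the `user_id` and `client_id` reference fields and the idea of three separate collections are assumptions, not something the question specifies.

```python
def split_user(user_doc):
    """Split one embedded User document into user, client, and page documents.

    The "user_id" and "client_id" reference fields are hypothetical names.
    """
    user = {"_id": user_doc["_id"], "name": user_doc["name"]}
    clients, pages = [], []
    for client in user_doc.get("clients", []):
        # Each client keeps a reference back to its owning user.
        clients.append({"_id": client["_id"], "name": client["name"],
                        "user_id": user_doc["_id"]})
        for page in client.get("pages", []):
            # Each page keeps a reference back to its owning client.
            pages.append({**page, "client_id": client["_id"]})
    return user, clients, pages


embedded = {
    "_id": "12345",
    "name": "Joe Guy",
    "clients": [
        {"_id": "234234", "name": "Sue Lady", "pages": [
            {"_id": "22342", "title": "roy's page", "url": "https://web.com"},
            {"_id": "23929", "title": "jake's page", "url": "https://web1.com"},
        ]},
        {"_id": "98934", "name": "bobby man", "pages": [
            {"_id": "159837", "title": "ted's page", "url": "https://web2.com"},
        ]},
    ],
}

user, clients, pages = split_user(embedded)
print(len(clients), len(pages))
```

With this layout, loading a User no longer pulls in up to 700 Pages per Client; the Pages would be fetched separately by `client_id` when they are actually needed.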

Related

Solr query for child documents and return parents and filtered children

I'm having trouble creating a Solr query that pulls out the right documents, and am starting to wonder if what I am trying to do is even possible.
I'm currently on Solr 8.9 with a managed schema, and every field uses a wildcard field.
First, here is what the document looks like
(names changed to redact internal business language):
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates_s": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
},
{
"id": "COUNTY:1CITY:2",
"city_name_s": "Watford",
"size": {
"id": "COUNTY:1CITY:2SIZE:2",
"sq_ft_s": "150",
"sq_meters_s": "10000"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
And what I want to return:
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
My goal is to return more or less the entire document, but with some of the cities filtered out. For example, the condition for a city would be something like city_name_s:"St Albans". In other words, I want the parent and all of its children, except that a child in the cities array is kept only if the given field (city_name_s) equals my defined value.
Things I've tried:
I've basically tried two approaches here:
I've played around with {!child} and {!parent} to get the result I want. So far I can only get results at the city level, or the entire county as if the filter were not there.
I've tried different values for the childFilter option, such as:
city_name_s:"St Albans" OR (*:* NOT city_name_s:[* TO *]) to try to say 'if the field exists, it should have this value'.
I'm starting to run out of ideas; I've been hacking away at this for the past couple of days and haven't really got any closer.
Thanks in advance for any help; I'm bashing my head against the wall currently, so any suggestions are more than welcome :)
I had a similar issue in Solr 9.0.0 and this solved it for me: Apache Solr Filter on Child Documents
In your case, just add this to the request parameters: fl=*,[child childFilter=city_name_s:"St Albans"]
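To show where that parameter goes, here is a minimal sketch that only builds the request URL and does not contact Solr. The host, the core name `counties`, and the `q` value are assumptions for illustration; the `fl` value is copied from the answer above, and depending on your setup the childFilter value may need additional quoting, which is not verified here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical select endpoint; adjust host and core name to your installation.
BASE_URL = "http://localhost:8983/solr/counties/select"


def county_query_url(city_name, base_url=BASE_URL):
    """Build a select URL that returns the parent with filtered children."""
    params = {
        # Assumed parent query: fetch the county by its id.
        "q": 'id:"COUNTY:1"',
        # Field list from the answer: all fields plus the child transformer.
        "fl": f'*,[child childFilter=city_name_s:"{city_name}"]',
    }
    return base_url + "?" + urlencode(params)


url = county_query_url("St Albans")
decoded = parse_qs(urlsplit(url).query)
print(decoded["fl"][0])
```

`urlencode` takes care of percent-encoding the brackets, quotes, and spaces, which is easy to get wrong when pasting the parameter into a URL by hand.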

How do I restrict medication annotations to a specific document section via IBM Watson Annotator for Clinical Data (ACD)

I’m using the IBM Watson Annotator for Clinical Data (ACD) API hosted in IBM Cloud to detect medication mentions within discharge summary clinic notes. I’m using the out-of-the-box medication annotator provided with ACD.
I’m able to detect and extract medication mentions, but I ONLY want medications mentioned within “DISCHARGE MEDICATIONS” or “DISCHARGE INSTRUCTIONS” sections.
Is there a way I can restrict ACD to only return medication mentions that appear within those two sections? I’m only interested in discharge medications.
For example, given the following contrived (non-PHI) text:
“Patient was previously prescribed cisplatin. DISCHARGE MEDICATIONS: 1. Aspirin 81 mg orally once daily.”
I get two medication mentions: one over “cisplatin” and another over “aspirin” - I only want the latter, since it appears within the “DISCHARGE MEDICATIONS” section.
The ACD medication annotator captures the section heading as part of each mention annotation that appears within a section. You can therefore define an inclusive filter that checks (1) that the normalized section heading is one of the headings you want and (2) that the section heading field exists at all, since a mention that appears outside of any section will not include section header fields in its annotation. This filters out of the ACD response any medication mentions that don't appear within a "DISCHARGE MEDICATIONS" section. I added a couple of other related normalized section headings so you can see how that's done. Feel free to modify the sample below to meet your needs.
Here's a sample flow you can persist via POST /flows and then reference on the analyze call as POST /analyze/{flow_id} - e.g. POST /analyze/discharge_med_flow
{
"id": "discharge_med_flow",
"name": "Discharge Medications Flow",
"description": "Detect medication mentions within DISCHARGE MEDICATIONS sections",
"annotatorFlows": [
{
"flow": {
"elements": [
{
"annotator": {
"name": "medication",
"configurations": [
{
"filter": {
"target": "unstructured.data.MedicationInd",
"condition": {
"type": "all",
"conditions": [
{
"type": "all",
"conditions": [
{
"type": "match",
"field": "sectionNormalizedName",
"values": [
"Discharge medication",
"Discharge instructions",
"Medications on discharge"
],
"not": false,
"caseInsensitive": true,
"operator": "equals"
},
{
"type": "match",
"field": "sectionNormalizedName",
"operator": "fieldExists"
}
]
}
]
}
}
}
]
}
}
],
"async": false
}
}
]
}
See the IBM Watson Annotator for Clinical Data filtering docs for additional details.
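If it helps to see the filter's logic spelled out, the sketch below applies the same two conditions in plain Python to a list of mention dictionaries. It is not ACD code: the sample mentions are made up, and only the `sectionNormalizedName` field and the three section names come from the flow above.

```python
# Section names taken from the flow's "values" list, lowercased because the
# flow sets "caseInsensitive": true.
WANTED_SECTIONS = {
    "discharge medication",
    "discharge instructions",
    "medications on discharge",
}


def in_discharge_section(mention):
    """True if the mention has a section name and it is one of the wanted ones."""
    name = mention.get("sectionNormalizedName")
    # Condition 2 (field exists) and condition 1 (value matches) must both hold.
    return name is not None and name.lower() in WANTED_SECTIONS


def filter_mentions(mentions):
    return [m for m in mentions if in_discharge_section(m)]


# Hypothetical, simplified mentions for the example sentence in the question.
mentions = [
    {"coveredText": "cisplatin"},  # outside any section: no section field
    {"coveredText": "Aspirin", "sectionNormalizedName": "Discharge medication"},
]

kept = filter_mentions(mentions)
print([m["coveredText"] for m in kept])
```

The same function could also serve as a client-side fallback if you cannot persist a custom flow, at the cost of receiving the unwanted mentions first.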
Thanks

How do I choose between column-family and a document store database?

I'm working on a project, and I'm struggling to make a definitive decision on whether to use a column-family or a document store. My situation is as follows:
The project I am working on is a hass.io application that will visualize certain data for Tesla cars. My project will run on a raspberry pi (pi 3), so database size is an issue.
My data will look something like this:
{
"cars" : [
{
"car_id" : 3241123,
"model" : "Tesla S",
"data" : [
{
"timestamp": 23840923804982309,
"temperature": 24.5,
"battery_level" : 40,
"is_charging" : true,
"speed" : null
},
{
"timestamp": 23840923804982333,
"temperature": 26.0,
"battery_level" : 35,
"is_charging" : false,
"speed" : 30
}
]
},
{
"car_id" : 3241157,
"model" : "Renault Zoey",
"data" : [
{
"timestamp": 23840923804982309,
"temperature": 23.3,
"battery_level" : 90,
"is_charging" : true,
"speed" : null
},
{
"timestamp": 23840923804982350,
"temperature": 23.0,
"battery_level" : 92,
"is_charging" : true,
"speed" : null
}
]
}
]
}
My project HAS to use a NoSQL database.
This example is in JSON, but that's just to show the data. It doesn't have to be stored in the database as a JSON file per se.
The number of cars is expected to be low (2-4), while the amount of data will grow quite large (a couple of new entries per minute).
I want to be able to plot the data in a graph, so most likely my queries will have to return the timestamp for every data point of every car and some other value, like for example speed or battery level. My database will have a very low amount of clients, and real-time data visualization is not required. Therefore read speed is not very important.
As far as my research has shown, given these requirements the column-family and document store architectures don't differ too much, except in scalability. I don't believe my database will grow to the scale where I have to start thinking about sharding, and if it does I will probably have to think about vertical scaling first. Am I right in believing this, or is there an actual difference?
On a side note: I am asking this question comparing column-families to document store, but perhaps this comparison is futile at this level, and I have to start looking at specific column-stores and document stores. If so any advice in this direction is also appreciated.
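To make the query requirement concrete, here is a small sketch of the access pattern described above (timestamp plus one other value, per car), written against the sample JSON in plain Python rather than against any particular database. Skipping readings whose value is null is an assumption about what a plot would want.

```python
def series(dataset, field):
    """Return {car_id: [(timestamp, value), ...]} for one field, skipping nulls."""
    out = {}
    for car in dataset["cars"]:
        points = [
            (reading["timestamp"], reading[field])
            for reading in car["data"]
            if reading.get(field) is not None  # assumed: nulls are not plotted
        ]
        out[car["car_id"]] = sorted(points)  # order points by timestamp
    return out


# Subset of the sample data from the question.
dataset = {
    "cars": [
        {"car_id": 3241123, "model": "Tesla S", "data": [
            {"timestamp": 23840923804982309, "battery_level": 40, "speed": None},
            {"timestamp": 23840923804982333, "battery_level": 35, "speed": 30},
        ]},
        {"car_id": 3241157, "model": "Renault Zoey", "data": [
            {"timestamp": 23840923804982309, "battery_level": 90, "speed": None},
            {"timestamp": 23840923804982350, "battery_level": 92, "speed": None},
        ]},
    ]
}

print(series(dataset, "battery_level"))
```

Whichever store is chosen, this is the shape of result it needs to produce cheaply: a narrow slice (two fields) across many readings.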

Fetch partial documents from CouchDB

I'm using CouchDB to store large documents, which is causing some trouble when fetching them into memory. I do realize the database is not meant to be used this way. As a fallback solution, is it possible to fetch partial documents from the database without creating a view?
For example, if a document has the fields id, content and extra_content, I would like to retrieve only the first two.
Thank you in advance.
If you are using CouchDB 2.x, you can use the /db/_find endpoint to retrieve only part of a document.
POST /db/_find
{
"selector": {
"_id": "a-doc-id"
},
"fields": [
"_id",
"content"
]
}
The response contains only the fields you specified in the query.
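As a rough sketch of issuing that request from code, the snippet below builds the POST request with Python's standard library without sending it. The server URL `http://localhost:5984/db` is an assumption, and authentication is left out.

```python
import json
from urllib.request import Request


def build_find_request(db_url, doc_id, fields):
    """Build (but do not send) a POST /db/_find request for selected fields."""
    body = {"selector": {"_id": doc_id}, "fields": list(fields)}
    return Request(
        url=f"{db_url}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local database URL.
req = build_find_request("http://localhost:5984/db", "a-doc-id", ["_id", "content"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; the matching documents are returned under the `docs` key of the JSON response.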
This is not possible prior to CouchDB 2.x. For CouchDB 2.x or greater, see JuanjoRodriguez's answer.
One possible workaround for any version of CouchDB is to take advantage of file attachments, which by default are excluded from a fetch. If some of your data isn't always needed and doesn't need to be included in indexes, you could store it as (JSON) attachments rather than directly in the document:
{
"id": "foo",
"content": "stuff",
"extra_content": "other stuff"
}
becomes:
{
"id": "foo",
"content": "stuff",
"_attachments": {
"extra_content": {
"content_type": "application/json",
"data": "ZXh0cmEgc3R1ZmYK"
}
}
}
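The transformation above can be sketched in a few lines of Python. This is only an illustration of the document reshaping; the helper name `move_to_attachment` is made up, and uploading the result to CouchDB is not shown.

```python
import base64
import json


def move_to_attachment(doc, field):
    """Return a copy of doc with `field` moved into an inline JSON attachment."""
    slim = {key: value for key, value in doc.items() if key != field}
    # Inline attachments carry their content as base64-encoded bytes.
    payload = json.dumps(doc[field]).encode("utf-8")
    attachments = dict(slim.get("_attachments", {}))
    attachments[field] = {
        "content_type": "application/json",
        "data": base64.b64encode(payload).decode("ascii"),
    }
    slim["_attachments"] = attachments
    return slim


original = {"id": "foo", "content": "stuff", "extra_content": "other stuff"}
slim = move_to_attachment(original, "extra_content")
print(json.dumps(slim, indent=2))

# Decoding the attachment gives back the original value.
restored = json.loads(base64.b64decode(slim["_attachments"]["extra_content"]["data"]))
print(restored)
```

The trade-off is that attachment content is opaque to views and `_find`, so this only suits fields you never need to query on.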

Searching an array deep inside a Mongo document

In my Mongo collection called pixels, I have documents like the sample below.
I'm looking for a way to search in the actions.tags part of the documents. I tried:
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
Neither works. I'm also looking for the PHP equivalent.
I'm also asking myself whether I should make an "actions" collection instead of putting everything inside one document.
I'm new to Mongo, so any good tutorial on structuring the db would be great.
Thanks for the insight.
{
"_id": { "$oid": "51b98009e4b075a9690bbc71" },
"name": "open Atlas",
"manager": "Tib Kat",
"type": "Association",
"logo": "",
"description": "OPEN ATLAS",
"actions": [
{
"name": "Pixel Humain",
"tags": [ "Toutes thémathiques" ],
"description": "le PH agit localement",
"images": [],
"origine": "oui",
"website": "www.echolocal.org"
}
],
"email": "my#gmail.com",
"adress": "102 rue",
"cp": "97421",
"city": "Saint louis",
"country": "Réunion",
"phone": "06932"
}
You can try it like this (note that $in expects an array of values):
$collectionName->find(array("actions.tags" => array('$in' => array("Environnement"))));
I do not think you need to maintain the actions in a separate collection. NoSQL gives you the flexibility to embed documents, and embedded subdocuments can also be indexed. The real power of this model comes from merging related documents into each other for faster retrieval. The only shortcoming I can see here is that you cannot fetch just part of a subdocument: find always returns the complete parent document. If you want to show a single entry of the subdocument array, the query returns the whole array and you have to filter it on the client side. So if you are planning to show each action individually to end users, it is better to keep actions in a separate collection.
Read here: http://docs.mongodb.org/manual/use-cases/
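The client-side filtering mentioned above might look like the following sketch. It runs on a plain Python dictionary standing in for a document returned by find; the second action, tagged "Environnement", is made up so that there is something to match, and `matching_actions` is a hypothetical helper name.

```python
# The filter as it would be passed to find(); shown for reference only,
# no database call is made in this sketch.
query = {"actions.tags": {"$in": ["Environnement"]}}


def matching_actions(doc, tags):
    """Return only the embedded actions that carry at least one wanted tag."""
    wanted = set(tags)
    return [
        action for action in doc.get("actions", [])
        if wanted.intersection(action.get("tags", []))
    ]


# Stand-in for one document returned by the query above.
doc = {
    "name": "open Atlas",
    "actions": [
        {"name": "Pixel Humain", "tags": ["Toutes thémathiques"]},
        {"name": "Hypothetical action", "tags": ["Environnement", "Local"]},
    ],
}

hits = matching_actions(doc, query["actions.tags"]["$in"])
print([action["name"] for action in hits])
```

If this post-filtering ends up in every code path that reads actions, that is a hint that a separate actions collection would fit the access pattern better.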
