In my application, I chose the document model, but I still have some questions.
Here is my example document:
{
  "catalogs": {
    "cat-id1": {
      "name": "catalog-1",
      "createdAt": 123,
      "products": {
        "pro-id1": {
          "name": "product-1",
          "createdAt": 321,
          "ingredients": {}
        },
        "pro-id2": {
          "name": "product-2",
          "createdAt": 654,
          "ingredients": {}
        }
      }
    },
    "cat-id2": {
      "name": "catalog-2",
      "createdAt": 456,
      "products": {
        "pro-id3": {
          "name": "product-3",
          "createdAt": 322,
          "ingredients": {}
        },
        "pro-id4": {
          "name": "product-4",
          "createdAt": 655,
          "ingredients": {}
        }
      }
    }
  }
}
However, the ingredients field in each product refers to another document:
{
  "ingredients": {
    "ing-id1": {},
    "ing-id2": {}
  }
}
The document model has several benefits:
Easy schema changes, e.g. if (!user.first_name) user.first_name = user.name.split(' ')[0]
No joins needed; all the data can easily be fetched at once.
I also know that:
On updates to a document, the entire document usually needs to be rewritten.
For these reasons, it is generally recommended that you keep documents fairly small and avoid writes that increase the size of a document.
The main idea is: which data model leads to simpler application code?
My questions are:
What size of document should I keep?
My application already has a relational DB; should I combine the document model with the relational DB to reduce complexity?
Since you already have a relational database in use, I don't see a real benefit to using a document-based DB as well.
Your database schema seems simple enough to use a relational DB. Whereas, if catalog entries were very different from each other, you might consider a document-based model; but this does not seem to be the case.
Therefore, my advice is to stick with a relational model.
I would design the model like this:
A table for each entity (catalog, product, ingredient), where each entry has a unique ID
A relation table for each n:m relationship (catalogProduct, productIngredient) that contains only the IDs of the related entities.
An example:
The ingredients ing1, ing2 and ing3 are stored in the table ingredient.
The products prod1 and prod2 are stored in product.
ing1 and ing2 are needed for prod1;
ing2 and ing3 for prod2.
In productIngredient, each entry stores the ID of a product and the ID of an ingredient it uses:
prod1 : ing1
prod1 : ing2
prod2 : ing2
prod2 : ing3
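As a runnable sketch of this design (the ingredient names are illustrative, and SQLite stands in for whichever relational DB you actually use), the relation table lets you recover a product's ingredients with a single JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per entity, one relation table for the n:m relationship.
conn.executescript("""
CREATE TABLE ingredient (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE product    (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE productIngredient (
    productId    TEXT REFERENCES product(id),
    ingredientId TEXT REFERENCES ingredient(id),
    PRIMARY KEY (productId, ingredientId)
);
""")
conn.executemany("INSERT INTO ingredient VALUES (?, ?)",
                 [("ing1", "flour"), ("ing2", "sugar"), ("ing3", "eggs")])
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("prod1", "bread"), ("prod2", "cake")])
conn.executemany("INSERT INTO productIngredient VALUES (?, ?)",
                 [("prod1", "ing1"), ("prod1", "ing2"),
                  ("prod2", "ing2"), ("prod2", "ing3")])

# All ingredients of prod1, via the relation table:
rows = conn.execute("""
    SELECT i.id
    FROM ingredient i
    JOIN productIngredient pi ON pi.ingredientId = i.id
    WHERE pi.productId = 'prod1'
    ORDER BY i.id
""").fetchall()
```

Note that ing2 lives in exactly one row of ingredient even though both products use it; the relation table carries the sharing.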
I have a problem where I need to convert a JSON payload into SQL tables, while maintaining the relationships established in the payload. This is so that later I have the ability to query the tables and recreate the JSON payload structure in the future.
For example:
{
  "batchId": "batch1",
  "payees": [
    {
      "payeeId": "payee1",
      "payments": [
        {
          "paymentId": "paymentId1",
          "amount": 200,
          "currency": "USD"
        },
        {
          "paymentId": "paymentId2",
          "amount": 200,
          "currency": "YEN"
        },
        {
          "paymentId": "paymentId2",
          "amount": 200,
          "currency": "EURO"
        }
      ]
    }
  ]
}
For the above payload, I have a batch with payments grouped by payees. At its core, it all boils down to a batch and its payments, but within that you can have groupings; in the example above, it's grouped by payees.
One thing to note is that the payload may not necessarily always follow the above structure. Instead of grouping by payees, it could be by something else, like currency for example. Or even no grouping at all: just a root-level batch and an array of payments.
I want to know if there are conventions/rules I can follow to represent such data in relational tables. Thanks.
edit:
I am primarily looking to use Postgres and have looked into the jsonb feature that it provides for storing json data. However, I'm still struggling to figure out how/where (in terms of which table) to best store the grouping info.
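One possible convention, sketched under assumptions (all table and column names here are illustrative, and SQLite stands in for Postgres, whose jsonb type would play the role of the plain TEXT column below): normalize batch and payment into flat tables, and record how the batch was grouped as a small JSON value on the batch row, so the original payload shape can be rebuilt later regardless of which grouping was used.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batch (
    id       TEXT PRIMARY KEY,
    grouping TEXT               -- JSON, e.g. {"key": "payeeId"}, or NULL
);
CREATE TABLE payment (
    id       TEXT PRIMARY KEY,
    batchId  TEXT REFERENCES batch(id),
    payeeId  TEXT,              -- nullable: only set when the batch groups by payee
    amount   INTEGER,
    currency TEXT
);
""")
conn.execute("INSERT INTO batch VALUES (?, ?)",
             ("batch1", json.dumps({"key": "payeeId"})))
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)",
                 [("paymentId1", "batch1", "payee1", 200, "USD"),
                  ("paymentId2", "batch1", "payee1", 200, "YEN")])

# To rebuild the payload: read the grouping key from the batch row,
# then group the flat payment rows by that column in application code.
grouping = json.loads(conn.execute(
    "SELECT grouping FROM batch WHERE id = 'batch1'").fetchone()[0])
payments = conn.execute(
    "SELECT id, payeeId, amount, currency FROM payment "
    "WHERE batchId = 'batch1' ORDER BY id").fetchall()
```

The design choice here is that payments stay flat and queryable, while the grouping metadata (the part that varies per payload) is the only thing kept as JSON.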
I'm very new to IBM Cloud and, specifically, Cloudant DB. This database holds records of when cars pass two different speed cameras (either 20 or 21). I have a database of JavaScript objects with the following format:
{
  "_id": "006994989f0914a7fb1ca44fae00fe75",
  "_rev": "1-e9b9afcb45f6ff703825d4be6d331f73",
  "payload": {
    "license_plate": "GNX834",
    "camera_id": 20,
    "date_time_string": "2019-05-08T15:20:04.134Z",
    "date_time_UTC_milliseconds": 1557328804134
  },
  "qos": 2,
  "retain": false
}
I wish to create a search index function to build another database full of objects containing the average speed of cars as they travel between the two cameras (they're 3 miles apart). I know I need to sort them by number plate, but I'm struggling to understand how to do this in Cloudant DB.
Any help on this subject would be great!
If you want to extract the documents where the license_plate is equal to a value that you know, then you need to create an index on that field:
POST /db/_index HTTP/1.1
Content-Type: application/json
{
  "index": {
    "fields": ["payload.license_plate"]
  },
  "name": "plate-index",
  "type": "json"
}
Then you can query by licence plate:
POST /db/_find HTTP/1.1
Content-Type: application/json
{
  "selector": {
    "payload.license_plate": "NH67992"
  }
}
which would return you the matching documents.
Alternatively, if you want to extract the data en masse, ordered by the license plate field, you could create a MapReduce View with the license plate as the key (map function below):
function(doc) {
  emit(doc.payload.license_plate, doc.payload.date_time_string)
}
The index that this map function generates is ordered by license plate and can be used to extract data in license-plate order.
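Once the view gives you rows ordered by license plate, the average-speed computation itself happens in your application code. A minimal sketch in Python (the helper name and row shape are assumptions: each row is a (plate, timestamp) pair as emitted by the view above, and the cameras are 3 miles apart):

```python
from datetime import datetime
from itertools import groupby

DISTANCE_MILES = 3.0  # distance between camera 20 and camera 21

def average_speeds(rows):
    """rows: (license_plate, iso_timestamp) pairs, already sorted by plate
    (the order the view emits them in). Returns mph per plate."""
    speeds = {}
    for plate, group in groupby(rows, key=lambda r: r[0]):
        # Parse the ISO-8601 timestamps; fromisoformat needs an explicit
        # UTC offset rather than the trailing "Z".
        times = sorted(
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
            for _, ts in group
        )
        if len(times) >= 2:
            hours = (times[-1] - times[0]).total_seconds() / 3600.0
            if hours > 0:
                speeds[plate] = DISTANCE_MILES / hours
    return speeds

# A car that covers the 3 miles in 4 minutes averages 45 mph.
speeds = average_speeds([
    ("GNX834", "2019-05-08T15:00:00.000Z"),
    ("GNX834", "2019-05-08T15:04:00.000Z"),
])
```

The resulting per-plate averages could then be written as documents into your second database.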
We are trying to train our model to identify values from multiple types of tables that contain different numbers of rows and columns. A text row was extracted to begin the training: first we configured our system types, then marked the entities and also the relation "AllInOne". We are able to train 10 relations in a training set, but when the model is tested, we only see 8 relations, even after creating other document sets for training and testing the model multiple times. Is there another way to associate the column value with the row values in a single relation, considering there isn't a standard for the types of tables we are analyzing with the Discovery service?
We are expecting the discovery service response as the following:
"relations": [
{
"type": "AllInOne",
"sentence": "…",
"arguments": [
{
"entities": [
{
"“text": "””",
"type": "entity1"
}
]
},
{
"entities": [
{
"“text": "””",
"type": "entity2"
}
]
},
{
"entities": [
{
"“text": "””",
"type": "\"entity..n”,"
}
]
},
{ "..." }
]
}
The machine learning model that is trained in Watson Knowledge Studio targets unstructured natural-language text. It may not be suitable for (semi-)structured formats like tables, especially for relations.
I'm trying to wrap my head around DynamoDB's scans and queries, and how I should structure my tables.
Let's say I have buckets and marbles, and each bucket can contain many marbles. In a traditional relational database, I might set that up like this:
Buckets
id name
---------------
B1 Blue Bucket
B2 Red Bucket
Marbles
id name bucketId lots more fields...
------------------------------------------------
M1 Deep Swirls B1
M2 Fire Red B1
M3 Obsidian B2
As I understand it, if I structured my data this way in DynamoDB, it could be costly for RCUs because I'd have to do scans. If I wanted to get all the marbles in bucket B1, I'd have to do a scan of Marbles where bucketId = B1, which grabs the full list of marbles and then removes the ones that don't match (if I understand the inner workings of DynamoDB correctly).
This doesn't sound very performant or cost-effective. How should I structure this data?
IMPORTANT NOTE: Marbles should be able to exist on their own, i.e. part of no bucket. (bucketId = null)
You will want two tables to track this: bucket for the buckets and marble for the marbles. A bucket will contain a list of marbles with some basic information (name, color, etc.) that you would use for displaying a quick list of the collection; make sure to include the ID of each marble. Then on the actual marble representation, put all of the information for the marble, plus a bucket object that links back to its assigned bucket. It would look something like this:
Marble
{
  "id": 1,
  "name": "Deep Swirls",
  "color": "Red",
  "complexProp": {
    ...
  },
  "bucket": {
    "name": "Blue Bucket",
    "id": 1
  }
}
Bucket
{
  "id": 1,
  "name": "Blue Bucket",
  "marbles": [
    {
      "id": 1,
      "name": "Deep Swirls",
      "color": "Red"
    },
    {
      "id": 2,
      "name": "Fire Red",
      "color": "Red"
    }
  ]
}
The downside to this approach is that if the changing data lives in both places, you will need to update the marble in two places whenever it changes (though if a marble changed color, that would be rather impressive). You will also need to change data in two places if you move a marble to a different bucket. You can omit the bucket property on the marble representation if you don't care about quickly discovering which bucket a given marble is in.
I'm developing a system to store our translations using Couchbase.
I have about 15,000 entries in my bucket that look like this:
{
  "classifications": [
    {
      "documentPath": "Test Vendor/Test Project/Ordered",
      "position": 1
    }
  ],
  "id": "message-Test Vendor/Test Project:first",
  "key": "first",
  "projectId": "project-Test Vendor/Test Project",
  "translations": {
    "en-US": [
      {
        "default": {
          "owner": "414d6352-c26b-493e-835e-3f0cf37f1f3c",
          "text": "first"
        }
      }
    ]
  },
  "type": "message",
  "vendorId": "vendor-Test Vendor"
}
And I want, as an example, to find all messages that are classified with a "documentPath" of "Test Vendor/Test Project/Ordered".
I use this query:
SELECT message.*
FROM couchlate message UNNEST message.classifications classification
WHERE classification.documentPath = "Test Vendor/Test Project/Ordered"
AND message.type="message"
ORDER BY classification.position
But I'm very surprised that the query takes 2 seconds to execute!
Looking at the query execution plan, it seems that Couchbase is looping over all the messages and then filtering on "documentPath".
I'd like it to first filter on "documentPath" (because there are in reality only 2 documentPaths matching my query) and then find the messages.
I've tried to create an index on "classifications" but it did not change anything.
Is there something wrong with my index setup, or should I structure my data differently to get fast results?
I'm using couchbase 4.5 beta if that matters.
Your query filters on the documentPath field, so an index on classifications doesn't actually help. You need to create an array index on the documentPath field itself, using the new array index syntax in Couchbase 4.5:
CREATE INDEX ix_documentPath ON myBucket ( DISTINCT ARRAY c.documentPath FOR c IN classifications END ) ;
Then you can query on documentPath with a query like this:
SELECT * FROM myBucket WHERE ANY c IN classifications SATISFIES c.documentPath = "your path here" END ;
Add EXPLAIN to the start of the query to see the execution plan and confirm that it is indeed using the index ix_documentPath.
More details and examples here: http://developer.couchbase.com/documentation/server/4.5-dp/indexing-arrays.html