Database: flat vs nested data

I would like to store data in a database that can be laid out nested like
[
  {
    id: 'deadbeef',
    url: 'https://lol.cat/1234',
    revisions: [
      {
        id: '1',
        title: 'foo',
        authors: ['lol', 'cat'],
        content: 'yadda yadda',
        // ...
      },
      {
        id: '2',
        title: 'foo',
        authors: ['lol', 'cat'],
        content: 'yadda yadda bla',
        // ...
      },
      // ...
    ]
  },
  // ...
]
(One can imagine more levels here.)
Alternatively, the same data could be organized flat like
[
  {
    documentId: 'deadbeef',
    url: 'https://lol.cat/1234',
    id: '1',
    title: 'foo',
    authors: ['lol', 'cat'],
    content: 'yadda yadda',
    // ...
  },
  {
    documentId: 'deadbeef',
    url: 'https://lol.cat/1234',
    id: '2',
    title: 'foo',
    authors: ['lol', 'cat'],
    content: 'yadda yadda bla',
    // ...
  },
  // ...
]
with essentially only the leaves of the nested approach stored, each record carrying all the information that belongs to it.
Typical requests would be:
Give me all revisions of document deadbeef.
Give me revision 6 of document caffee.
Is either one of the approaches obviously better? What are advantages/disadvantages of either approach?

Your second schema is a denormalized version of the first. It might be useful to compare a more relational approach:
{
  documents: [
    {
      id: 'deadbeef',
      url: 'https://lol.cat/1234',
      // ...
    },
    // ...
  ],
  revisions: [
    {
      id: '1',
      documentId: 'deadbeef',
      title: 'foo',
      authors: ['lol', 'cat'],
      content: 'yadda yadda',
      // ...
    },
    {
      id: '2',
      documentId: 'deadbeef',
      title: 'foo',
      authors: ['lol', 'cat'],
      content: 'yadda yadda bla',
      // ...
    },
    // ...
  ]
}
The nested approach suffers from a problem called access path dependence: by baking a preferred hierarchy into the data, it makes queries that need a different hierarchy (say, all documents with a revision authored by 'cat') more difficult.
The denormalized version can suffer from update anomalies, which means partial updates can put the database into an inconsistent state.
The relational approach, on the other hand, doesn't favor any hierarchy, thereby supporting ad-hoc querying, and normalization helps to eliminate update anomalies. RDBMSs also incorporate numerous integrity checks and constraints to ensure the validity of data.
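For instance, the two typical requests from the question become single-table lookups against revisions. A minimal sketch, assuming a Node.js client such as node-postgres (pg) and documents/revisions tables shaped like the structures above, with documentId as a foreign key:

const { Pool } = require('pg');
const pool = new Pool();

// inside an async function:
// all revisions of document deadbeef
const all = await pool.query(
  'SELECT * FROM revisions WHERE "documentId" = $1',
  ['deadbeef']
);

// revision 6 of document caffee
const one = await pool.query(
  'SELECT * FROM revisions WHERE "documentId" = $1 AND id = $2',
  ['caffee', '6']
);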

Related

How to update the same field in all objects of the same type in a normalised cache?

I'm using Apollo Client 3 in a React project.
I have a data structure like this:
ROOT_QUERY
  getCollection {
    __typename: 'Collection',
    _id: '123',
    tagColorMap: [
      { __typename: 'Tag', name: 'tag1', color: '#673ab7', count: 3 },
      { __typename: 'Tag', name: 'tag2', color: '#f44336', count: 1 },
      ...
    ],
    entries: [
      { __typename: 'Entry', _id: 'asd', tags: ['tag1', 'tag2', 'tag3'] },
      { __typename: 'Entry', _id: 'qwe', tags: ['tag2', 'tag3'] },
      ...
    ]
  }
The data are normalised in the cache:
ROOT_QUERY
  getCollection {
    __ref: 'Collection:123'
  }

Collection:123 {
  _id: '123',
  tagColorMap: [
    { __typename: 'Tag', name: 'tag1', color: '#673ab7', count: 3 },
    { __typename: 'Tag', name: 'tag2', color: '#f44336', count: 1 },
    ...
  ],
  entries: [
    { __ref: 'Entry:asd' },
    { __ref: 'Entry:qwe' },
    ...
  ]
}

Entry:asd {
  _id: 'asd',
  tags: ['tag1', 'tag2', 'tag3']
}

Entry:qwe {
  _id: 'qwe',
  tags: ['tag2', 'tag3']
}
I performed a mutation that renames one of the tags, say 'tag1' -> 'tag11', and returns the new tagColorMap;
now I want to change all 'tag1' into 'tag11' in the cache.
I have gone through the official docs and googled for a while, but still can't find a way to do this.
Refetching won't work: in the time between the mutation finishing and the refetch completing, all the entries that still have the tag 'tag1' have no corresponding colour in the colour map, so they fall back to the default colour and then jump back to the right colour once the refetch is done.
Another way might be to have the server return the entire collection after the mutation, but that is quite a lot of data.
That's why I would like to rename 'tag1' to 'tag11' in all 'Entry' objects directly in the cache, but I couldn't find a way to do this. Could anyone help me with this?
Thank you very much in advance!
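One possible approach, sketched on the assumption that every affected object is an Entry record in the InMemoryCache: cache.extract() returns a snapshot of the normalized map, and cache.modify rewrites a single field of one cached object. (renameTagEverywhere is a hypothetical helper, not an Apollo API.)

// walk the normalized cache and rewrite the tag name in every Entry
const renameTagEverywhere = (cache, oldName, newName) => {
  Object.keys(cache.extract())                 // keys like 'Entry:asd'
    .filter((key) => key.startsWith('Entry:'))
    .forEach((id) => {
      cache.modify({
        id,
        fields: {
          tags: (existing) =>
            existing.map((t) => (t === oldName ? newName : t)),
        },
      });
    });
};

// e.g. in the mutation options:
// update: (cache) => renameTagEverywhere(cache, 'tag1', 'tag11')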

Is there another way to use `R.applySpec` without re-specifying the unchanged keys' values?

Is there another way to use R.applySpec without re-specifying the unchanged keys' values? (That is, without needing to type the id and name keys in the example, because the keys will change dynamically later.) Thank you.
Here is my input data
const data = [
  [
    { id: 'data1', name: 'it is data 1', itemId: 'item1' },
    { id: 'data1', name: 'it is data 1', itemId: 'item2' }
  ],
  [
    { id: 'data2', name: 'it is data 2', itemId: 'item1' }
  ],
  [
    { id: 'data3', name: 'it is data 3', itemId: 'item1' },
    { id: 'data3', name: 'it is data 3', itemId: 'item2' }
  ]
]
And the output
[
  {
    id: 'data1',          // this one doesn't change
    name: 'it is data 1', // this one doesn't change
    itemId: ['item1', 'item2']
  },
  {
    id: 'data2',          // this one doesn't change
    name: 'it is data 2', // this one doesn't change
    itemId: ['item1']
  },
  {
    id: 'data3',          // this one doesn't change
    name: 'it is data 3', // this one doesn't change
    itemId: ['item1', 'item2']
  }
]
The solution that gets this output using Ramda:
const result = R.map(
  R.applySpec({
    id: R.path([0, 'id']),
    name: R.path([0, 'name']), // don't want to have to type id or name again
    itemId: R.pluck('itemId')
  })
)(data)
We could certainly write something in Ramda like this (using the data from the question):
const { map, lift, mergeRight, head, pipe, pluck, objOf } = R // Ramda 0.27

const convert = map (lift (mergeRight) (head, pipe (pluck ('itemId'), objOf ('itemId'))))

console .log (convert (data))
I'm not sure whether I find that more or less readable than an ES6/Ramda hybrid, though:
const convert = map (
reduce ((a, {itemId, ...rest}) => ({...rest, itemId: [...(a .itemId || []), itemId]}), {})
)
or a plain ES6 version:
const convert = data => data .map (
ds => ds .reduce (
(a, {itemId, ...rest}) => ({...rest, itemId: [...(a .itemId || []), itemId]}),
{}
)
)
The question about applySpec is interesting. That function lets you build a new object out of the old one, but you have to describe the new object in full. There is another function, evolve, which keeps all the properties of the input object intact, replacing only those specifically mentioned by applying a function to their current values. But the transformation functions passed to evolve receive only the current value of their property, whereas the functions in applySpec have access to the entire original object.
I could see some rationale for a function combining these behaviors. But I don't have a clear API in my head for how it should work. If you have some thoughts on this, and want to make a proposal, the Ramda team is always looking for suggestions.
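To make that contrast concrete, here is a small illustration on made-up data (both functions are standard Ramda):

const tag = { name: 'tag1', count: 3 }

// evolve: unmentioned keys survive, but each function sees only its own value
R.evolve({ count: (n) => n + 1 }, tag)
//=> { name: 'tag1', count: 4 }

// applySpec: each function sees the whole object, but every output key must be listed
R.applySpec({ name: R.prop('name'), count: (o) => o.count + 1 })(tag)
//=> { name: 'tag1', count: 4 }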

Performance and scalability of one data structure vs. split data structures in MongoDB

Is there a performance difference between keeping an array (tags) inside each document, like this:
images collection:
{
  _id: 'imageID1',
  title: 'some title name',
  tags: ['cool', 'banana', 'animal']
},
{
  _id: 'imageID2',
  title: 'some other title name',
  tags: ['funny', 'creative', 'animal']
}
and splitting the tags out into a separate collection? (I'm using MongoDB as my database.)
images collection:
{
  _id: 'imageID1',
  title: 'some title name',
  tagsId: 'imageID1tags'
},
{
  _id: 'imageID2',
  title: 'some other title name',
  tagsId: 'imageID2tags'
}
tags collection:
{
  _id: '1',
  imagesId: 'imageID1',
  tag: 'cool'
},
{
  _id: '2',
  imagesId: 'imageID1',
  tag: 'banana'
},
{
  _id: '3',
  imagesId: 'imageID1',
  tag: 'animal'
},
{
  _id: '4',
  imagesId: 'imageID2',
  tag: 'funny'
},
{
  _id: '5',
  imagesId: 'imageID2',
  tag: 'creative'
},
{
  _id: '6',
  imagesId: 'imageID2',
  tag: 'animal'
}
Just a side note: you can see (for example purposes) that there are two 'animal' entries, one belonging to imageID1 and the other to imageID2. This collection will hold an endless number of tags; tag names can repeat, but each is unique to its imagesId.
So, for scalability and performance, does it make sense to keep things simple and store the tags inside the images collection (one collection), or to spread them out (two collections)? I eventually want to search on the tags and then show the images associated with the searched tag.
So if a user searches for animal, both of the images will show up.
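For what it's worth, here is a minimal sketch of what that tag search would look like with the embedded (single-collection) layout, in the mongo shell and assuming the images collection above:

// a multikey index on the array keeps tag lookups fast as the collection grows
db.images.createIndex({ tags: 1 })

// matches every document whose tags array contains 'animal'
db.images.find({ tags: 'animal' })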

Managing Redux state for a list of items

Prior to this, I was managing general Redux state by, for example, setting a flag like isRequestTags from a reducer.
But now I'm facing another challenge:
Suppose I have a list of tags, and for each tag there can be some state defined, like isPrimaryTag.
How can I define states for a list of items which have a common attribute?
If you have a list of tags, and each tag has, say, a name and a flag, then you can't "refactor" that out in any meaningful way, e.g.,
tags: [
  { name: 'foo', isPrimary: true },
  { name: 'bar', isPrimary: false }
]
If the common attributes are themselves an object, particularly a large one, you'd use normal state-shape practices as outlined in the Redux docs.
For example, if each tag had something like this:
tagInfo: {
  isPrimary: true,
  group: 'whatever',
  somethingElse: { etc: 'etc' }
}
and multiple tags had the same value, you'd provide an ID/index:
tagInfos: [
  {
    isPrimary: true,
    group: 'whatever',
    somethingElse: { etc: 'etc' }
  },
  {
    isPrimary: true,
    group: 'whatever',
    somethingElse: { etc: 'etc' }
  }
],
tags: [
  { name: 'foo', tagInfoIndex: 0 },
  { name: 'bar', tagInfoIndex: 1 }
  // etc
]
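Reading a tag back together with its shared info is then just an index lookup. A sketch (getTagWithInfo is a hypothetical selector, not part of Redux):

// join a tag with its shared tagInfo record by index
const getTagWithInfo = (state, i) => {
  const tag = state.tags[i];
  return { ...tag, info: state.tagInfos[tag.tagInfoIndex] };
};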
All that said, I'm not entirely sure if that's what you're asking.

What is an example of normalizing the state in a React Redux app?

I'm reading the Redux Reducers docs and don't get how normalizing the state would work. The current state in the example is this:
{
  visibilityFilter: 'SHOW_ALL',
  todos: [
    {
      text: 'Consider using Redux',
      completed: true
    },
    {
      text: 'Keep all state in a single tree',
      completed: false
    }
  ]
}
Can you provide an example of what the above would look like if we followed the below?
For example, keeping todosById: { id -> todo } and todos: array inside the state would be a better idea in a real app, but we’re keeping the example simple.
This example is straight from Normalizr.
[{
  id: 1,
  title: 'Some Article',
  author: {
    id: 1,
    name: 'Dan'
  }
}, {
  id: 2,
  title: 'Other Article',
  author: {
    id: 1,
    name: 'Dan'
  }
}]
It can be normalized this way:
{
  result: [1, 2],
  entities: {
    articles: {
      1: {
        id: 1,
        title: 'Some Article',
        author: 1
      },
      2: {
        id: 2,
        title: 'Other Article',
        author: 1
      }
    },
    users: {
      1: {
        id: 1,
        name: 'Dan'
      }
    }
  }
}
What's the advantage of normalization?
You can extract exactly the part of your state tree that you want.
For instance, suppose you have an array of objects containing information about articles. To select a particular object you have to iterate through the entire array, and in the worst case the desired object is not present at all. To overcome this, we normalize the data.
To normalize the data, store the unique identifier of each object in a separate array; let's call that array result.
result: [1, 2, 3, ...]
Then transform the array of objects into an object keyed by id (see the second snippet); call that object entities.
Ultimately, to access the object with id 1, simply do entities.articles["1"].
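Applying the same idea to the todos state from the question would give something like this (a sketch; the ids are invented, since the original todos have none):

{
  visibilityFilter: 'SHOW_ALL',
  todos: [1, 2],
  todosById: {
    1: { id: 1, text: 'Consider using Redux', completed: true },
    2: { id: 2, text: 'Keep all state in a single tree', completed: false }
  }
}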
You can use normalizr for this.
Normalizr takes JSON and a schema and replaces nested entities with their IDs, gathering all entities in dictionaries.
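A minimal sketch of that in code, assuming normalizr's entity schemas (v3 API):

import { normalize, schema } from 'normalizr';

// one entity type per dictionary we want under `entities`
const user = new schema.Entity('users');
const article = new schema.Entity('articles', { author: user });

// `data` is the articles array from the previous answer;
// `result` holds their ids, `entities` the flattened records
const { result, entities } = normalize(data, [article]);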
For example, running that over the articles array shown in the previous answer produces exactly the result/entities shape above.
