CouchDB relationship modelling - searching & sorting - ReactJS

I am building a site using Couchdb and ReactJS.
One of my pages displays a list of up to 10,000 financial transactions, each txn consisting of:
date
in amount
out amount
payee
category item
notes
I have a pagination strategy and only load and display 100 transactions at a time.
At any one time, I want to be able to search a single column - I use a drop-down to tell the search functionality which index to use for searching.
I also want to be able to sort each column.
So far I have used multiple views and I have all of the above functionality working.
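For context, a page fetch looks roughly like this (the view and variable names here are illustrative, not the exact code):

// Fetch one page of 100 txns from a PouchDB view, carrying the last
// key of the previous page forward as the start of the next one.
const page = await db.query('txns/by_date', {
  limit: 100,
  startkey: lastKey,      // last key shown on the previous page (if any)
  skip: lastKey ? 1 : 0,  // skip the row we already displayed
  include_docs: true
});
const txns = page.rows.map(row => row.doc);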
During development I used a string for the category item. Now that I have worked out how to get all of the above to work, I need to properly tackle the category item column entry.
A category item belongs to a category; a category can have one or more category items, so there is a one-to-many relationship between the category and its items.
Each txn can have one and only one category item.
A category is made up of a small number of fields.
A category item is made up of a small number of fields.
I am struggling to find the best approach to doing this.
I have considered each of the approaches described in https://docs.couchbase.com/server/5.0/data-modeling/modeling-relationships.html.
At this point, I am considering one of the following approaches, and I was wondering if anyone had any advice. I have included examples of the txns, cats and cat items at the end of this post.
Embed the cat item in the txn and hopefully suss out how to both search and sort on the cat item's name (sketched after the examples below)
Abandon pagination and load all the txns into the virtual dom, and sort and search the dom directly
Currently each distinct item is a separate document and I use referencing to maintain the relationship. I have considered using the id to store searching and sorting data but I don't see how this would work to give me all that I need.
Txn
{
  "_id": "1",
  "type": "txn",
  "date": "2020-01-20",
  "cat": "3",
  "notes": "xxxx",
  "out": 10,
  "in": 0
}
Category
{
  "_id": "2",
  "type": "cat",
  "name": "Everyday Expenses",
  "weight": 2
}
Category Item
{
  "_id": "3",
  "type": "catitem",
  "cat": "2",
  "name": "Groceries (£850)",
  "weight": 0,
  "notes": "blah, blah, blah"
}
I am running ReactJS on Node.js and I am using PouchDB.
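To make option 1 concrete, here is a minimal sketch of how the embedded variant could be searched and sorted with a single PouchDB view (the catItem field, the view names and the query values are hypothetical, not settled design):

// Option 1: the txn embeds a copy of its category item, e.g.
// { "_id": "1", "type": "txn", "date": "2020-01-20",
//   "catItem": { "id": "3", "name": "Groceries (£850)" },
//   "out": 10, "in": 0 }

// One view over the embedded name; rows come back ordered by key,
// so the same index serves both sorting and prefix searching.
await db.put({
  _id: '_design/txns',
  views: {
    by_catitem_name: {
      map: function (doc) {
        if (doc.type === 'txn' && doc.catItem) {
          emit(doc.catItem.name.toLowerCase(), null);
        }
      }.toString()
    }
  }
});

// Sorted page of 100 txns by category item name:
const sorted = await db.query('txns/by_catitem_name', {
  limit: 100,
  include_docs: true
});

// Search the same column by prefix:
const matches = await db.query('txns/by_catitem_name', {
  startkey: 'groc',
  endkey: 'groc\uffff',
  include_docs: true
});

The obvious cost of embedding is that renaming a category item means updating every txn that embeds it.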

Related

FireStore(NoSQL) fetching limited in nested data

Currently, I am working on a personal project. I want to build an online test.
I'm using Firestore (NoSQL) for storing Tests and Questions.
This is my current schema:
{
  "id": "Test ID",
  "name": "Test Name",
  "number_of_question": 20, // Number of questions to fetch from question_bank
  "question_bank": [
    {
      "id": "Question ID",
      "name": "Question Name 1 ?",
      "answer": ["A", "B", "C", "D"],
      "correct_answer": ["A", "B"]
    },
    {
      "id": "Question ID 2",
      "name": "Question Name 2 ?",
      "answer": ["A", "B", "C", "D"],
      "correct_answer": ["A"]
    }, ...
  ]
}
Because in the future there is a possibility that the question_bank becomes very large (1,000 questions or more).
Is there a way, or a better schema, to tell NoSQL to fetch questions from question_bank, randomly limited to number_of_question?
(I really want to hit the database only once for this action.)
Firestore will always return the whole document, so you cannot fetch just a few items from that array. The question_bank can instead be a sub-collection where each question is its own document. Then you can specify the number of documents to query from the sub-collection.
const snap = await db.collection('quizzes/{quizId}/questions').limit(20).get()
// can add more query clauses if required
If you want to fetch random documents from that sub-collection, checkout:
Firestore: How to get random documents in a collection
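The standard trick from that linked answer, sketched here with assumed names (quizId, numberOfQuestions, and a pre-generated random field in [0, 1) on every question document):

const questionsRef = db.collection(`quizzes/${quizId}/questions`);
const cutoff = Math.random();

// First (and usually only) hit: questions at or above a random cut-off.
let snap = await questionsRef
  .where('random', '>=', cutoff)
  .orderBy('random')
  .limit(numberOfQuestions)
  .get();

// If the cut-off landed too close to 1, wrap around with a second query.
if (snap.size < numberOfQuestions) {
  const rest = await questionsRef
    .where('random', '<', cutoff)
    .orderBy('random')
    .limit(numberOfQuestions - snap.size)
    .get();
  // merge snap.docs with rest.docs client-side
}

So the "hit the database only once" goal holds in the common case, with a second query only as a fallback.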
It sounds like you'll want to use a subcollection for the question_bank of each test. With a subcollection you can query the questions for a specific test, retrieving a subset of them.
I recommend checking out the Firebase documentation on the hierarchical data model of Firestore, and on performing queries.

Document with reference to many vs Many with reference to document(s) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I am setting up my database structure and I am unsure on what the right solution is for my use case:
I have millions of Item documents, all with static information that will never change.
I have Group Documents that hold a list of key value pairs (filters) by which the Items will be grouped.
Items can belong to more than one Group
The key value pairs of the Group can be changed which will change which Items fall within this Group
I have 2 solutions, but unsure which one is the right one:
Have the Group documents hold a list of ids referencing all the Item documents that belong to the group, which could become very large. In this solution, should the Group's filters be changed, I would have to traverse all Items, extract their ids and assign them to the Group. This would result in a single update, but also a very large list of Item references.
Group
{
  "_id": "af355",
  "_rev": "string",
  "filters": {
    "key1": "value",
    "key2": "value",
    ...
  },
  "itemIds": [
    "s5f6a",
    "afaf4",
    "12dr4",
    ... (could potentially be millions)
  ]
}
Item (Millions of these)
{
  "_id": "s5f6a",
  "_rev": "string",
  "field1": "value",
  "field2": "value"
}
Have the Items hold a list of ids referencing the Groups they belong to, which will never be very big. Should a Group's filters change, I would have to traverse all Items and update all those that now match or no longer match, which could mean potentially millions of updates.
Group
{
  "_id": "af355",
  "_rev": "string",
  "filters": {
    "key1": "value",
    "key2": "value",
    ...
  }
}
Item
{
  "_id": "s5f6a",
  "_rev": "string",
  "field1": "value",
  "field2": "value",
  ...
  "groups": [
    "af355",
    "46sdf",
    ... (Small list)
  ]
}
Which of these solutions would yield better performance and the least use of resources, or the best balance thereof? If there is a better solution, I am open to ideas.
In general terms, you'll work with the grain of CouchDB if your model is immutable-ish. Any model that relies on updating large lists or objects inside documents will be prone to update conflicts once the rate of change increases.
Documents with a changing list of millions of objects will be no fun at all: almost certainly you'll have to contend with (a) frequent update conflicts, (b) long AND wide revision trees (if you're replicating) and (c) poor performance as the documents increase in size.
Small (tens of kB max), unchanging docs are the ideal if the data set is large or the concurrent rate of change is large (for some definition of large).
Use views to stitch together the current state for docs, or do more work on the client side (multiple requests) instead.
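For example, a sketch of that pattern with hypothetical doc shapes: each group membership becomes its own tiny, write-once document, and a view reassembles the group on read.

// Membership doc, one per (group, item) pair; created or deleted, never edited:
// { "_id": "member:af355:s5f6a", "type": "membership",
//   "group": "af355", "item": "s5f6a" }

// Map function keyed by [group, item], so one group's items collate together:
function (doc) {
  if (doc.type === 'membership') {
    emit([doc.group, doc.item], null);
  }
}

// Page through one group's items with startkey=["af355"],
// endkey=["af355", {}] and include_docs=true.

Changing a Group's filters then means inserting and deleting small membership docs rather than rewriting one huge array under contention.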

How to "join" 2 indices and search in ElasticSearch?

Suppose I have an index called "posts" with the following properties:
{
  "uid": "<user id>",
  "date": "<some date>",
  "message": "<some message>"
}
And another index called "users" with the following properties:
{
  "uid": "<user id>",
  "gender": "Male"
}
Now, I'm searching for posts posted by people who are males. How can I do that?
I definitely don't want to have a "user" property in a post and store the gender of the user in there. Because when a user updates his/her gender, I'd have to go to every single post that he/she has ever posted to update the gender.
Elasticsearch doesn't support inter-index relations as of now. There is a 'join' datatype, but it only supports fields within the same index.
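Within one index, the join datatype looks roughly like this (a sketch using the official JavaScript client; the index, field and id values are assumed):

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// Users and posts live in ONE index, related through a join field.
await client.indices.create({
  index: 'users_posts',
  body: {
    mappings: {
      properties: {
        gender: { type: 'keyword' },
        relation: { type: 'join', relations: { user: 'post' } }
      }
    }
  }
});

// Parent (user) doc, then a child (post) doc routed to the parent's shard.
await client.index({
  index: 'users_posts', id: 'u1',
  body: { uid: 'u1', gender: 'Male', relation: 'user' }
});
await client.index({
  index: 'users_posts', id: 'p1', routing: 'u1',
  body: { uid: 'u1', message: 'some message', relation: { name: 'post', parent: 'u1' } }
});

// Posts whose parent user is male:
const result = await client.search({
  index: 'users_posts',
  body: {
    query: {
      has_parent: {
        parent_type: 'user',
        query: { term: { gender: 'Male' } }
      }
    }
  }
});

This avoids copying the gender onto every post, but parent and child must live on the same shard, and has_parent queries are comparatively expensive.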

Does it make sense to model a Solr document if search does not allow to specify the attributes?

I want to provide a search feature in my site where the user is able to search by text only, without specifying the attributes.
For example, instead of allowing the user to search by "author=George Martin" he will simply query "George Martin".
I would like to know if there is any advantage in a document model like this one:
{
  "id": 1,
  "title": "Game of Thrones",
  "author": "George R. R. Martin",
  "published": "August, 1996"
}
Compared to:
{
  "id": 1,
  "data": [
    "Game of Thrones",
    "George R. R. Martin",
    "August, 1996"
  ]
}
If I'm not going to use "author:value" in the Solr API, I should get the same results, right?
The first version will allow you to assign different weights to the different fields. I.e. a hit in the title might be more important than a hit in the author field - or vice versa.
Using the edismax handler (defType=edismax) and query fields (qf=title author published) will give you the same behavior as your second example, but will retain the structure of the document.
As the fields are put into the qf parameter, there is no need for the user to explicitly tell Solr which fields she wants to search.
To give the fields different weights, assign a weight to the field in the qf list: qf=title^5 author^2 published will give a hit in title five times the weight of a hit in published - i.e. "The Hunt for Red October" will be more important than something published in October.
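Putting that together, a request could look like this (the books core name is assumed):

/solr/books/select?defType=edismax&q=George%20Martin&qf=title%5E5%20author%5E2%20published

which is just the URL-encoded form of q=George Martin with defType=edismax and qf=title^5 author^2 published.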

Why is it possible to get duplicate results from Azure Search when paging?

Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:
GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc
Why is this possible? How can it happen? Are there any consistency guarantees when paging?
The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (i.e., there is no snapshotting or other consistency mechanism like you'd find in a general-purpose database).
Here is an example of how you might get duplicates. Assume an index with four documents:
{ "id": "1", "rating": 5 }
{ "id": "2", "rating": 3 }
{ "id": "3", "rating": 2 }
{ "id": "4", "rating": 1 }
Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:
$top=2&$skip=0&$orderby=rating desc
And get these results:
{ "id": "1", "rating": 5 }
{ "id": "2", "rating": 3 }
Now you insert a fifth document into the index:
{ "id": "5", "rating": 4 }
Shortly thereafter, you execute a query to fetch the second page of results:
$top=2&$skip=2&$orderby=rating desc
And get these results:
{ "id": "2", "rating": 3 }
{ "id": "3", "rating": 2 }
Notice that you’ve fetched document 2 twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page.
In situations where you're relying on document score (either you don't use $orderby or you're using $orderby=search.score()), paging can return duplicate results because each query might be handled by a different replica, and that replica may have different term and document frequency statistics -- enough to change the relative ordering of documents at page boundaries.
For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.
