Updating an RDBMS and a NoSQL DBMS together

We have a use case in which we need to use a NoSQL database as a key-value store and an RDBMS as a read view. We have an event-driven system that will process an event only if the event's ID is not found in the key-value store. Once the event is processed, we need to add the event ID to the key-value store and the result of the processing to the RDBMS in one transaction.
Which combination of DBMSs offers such a solution? In summary, I would like to update a NoSQL store and an RDBMS together in one transaction, with different data going to each.
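Cross-store ACID transactions generally require XA-style two-phase-commit support on both sides, which few NoSQL stores offer. One pragmatic workaround (an assumption about what would satisfy the requirement, not a product recommendation) is to let the RDBMS host the key-value side as well, so that a single ACID transaction covers both writes. A minimal PostgreSQL sketch, with all table and column names illustrative:

-- Minimal PostgreSQL sketch; table and column names are assumptions.
CREATE TABLE processed_events (
    event_id uuid PRIMARY KEY              -- plays the key-value-store role
);

CREATE TABLE event_results (
    event_id uuid REFERENCES processed_events (event_id),
    result   jsonb NOT NULL                -- the read-view payload
);

BEGIN;
-- The PRIMARY KEY makes a duplicate event_id fail, so each event is
-- recorded (and its result written) at most once, atomically.
INSERT INTO processed_events (event_id)
VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11');
INSERT INTO event_results (event_id, result)
VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', '{"status": "processed"}');
COMMIT;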

Related

Advice on database modeling, OneToOne with multiple related tables

I am looking for advice on the best way to go about modeling my database.
Let's say I have three entities: meetings, habits, and tasks. Each has its own unique schema; however, I would like all three to have several things in common.
They should all contain calendar information, such as a start_date, end_date, recurrence_pattern, etc.
There are a few ways I could go about this:
1. Add these fields to each of the entities.
2. Create an Event entity and have a foreign_key field on each of the other entities, pointing to the related Event.
3. Create an Event entity and have three foreign_key fields on the Event (one for each of the other entities). At any given time only one of those fields would have a value and the other two would be null.
4. Create an Event entity with two fields, related_type and related_id. The related_type value, for any given row, would be one of "meetings", "habits", or "tasks", and related_id would be the actual id of that entity type (sketched below).
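For concreteness, option 4 might look like this (an illustrative PostgreSQL sketch; column types are assumptions):

-- Illustrative DDL for option 4 (polymorphic association); names assumed.
CREATE TABLE event (
    id                 serial PRIMARY KEY,
    start_date         date,
    end_date           date,
    recurrence_pattern text,
    related_type       text    NOT NULL,  -- 'meetings', 'habits' or 'tasks'
    related_id         integer NOT NULL   -- id within the table named by related_type
    -- no foreign key can be declared on (related_type, related_id),
    -- which is exactly the trade-off discussed below
);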
I will have separate API endpoints that access meetings, habits, and tasks.
I will need to return the event data along with them.
I will also have an endpoint to return all events.
I will need to return the related entity data along with each event.
Option 4 seems to be the most flexible, but eliminates working with foreign keys.
I'm not sure whether that is a problem or hinders performance.
I say it's flexible in the sense that if I add a new entity, let's call it "games", the event schema will already be able to handle it.
When creating a new game, I would create a new event and set the related_type to "games".
I'm thinking the events endpoint can join on the related_type and would also require little to no updating.
Additionally, this seems better than option 3 in the case that I add many new entities that have event data: with option 3, each new entity would require a new column on the Event.
Options 1 and 2 could work fine; however, I could not just query for all events, I would have to query each of the other entities separately.
Are there any best practices around this scenario? Any other approaches?
In the end, performance is more important than flexibility. I would rather update code than sacrifice performance.
I am using Django, and maybe someone has some tips around this; however, I am really looking for best practices around the database itself and not the API implementation.
I would keep it simple and choose option 1. Splitting up data into more tables than necessary for proper normalization won't be a benefit.
Perhaps you will like the idea of using PostgreSQL table inheritance. You could have an empty table event, and your three tables inherit from that table. That way, they automatically have all columns from event, but they are still three independent tables.
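A minimal sketch of that idea (columns taken from the question; purely illustrative):

-- PostgreSQL table inheritance: children get all of event's columns.
CREATE TABLE event (
    start_date         date,
    end_date           date,
    recurrence_pattern text
);

CREATE TABLE meeting (
    location text
) INHERITS (event);

-- meeting now also has start_date, end_date and recurrence_pattern,
-- and SELECT * FROM event; scans event plus every inheriting table.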

Event Sourcing SQL Populate Parent and Child Table

Following up from question
CQRS Read Model Design when Event Sourcing with a Parent-Child-GrandChild… relationship:
We utilize event sourcing with SQL Server 2016. Example: a furniture company.
(1) We have a parent and a child table. Say a FurnitureDescription table (the parent table, describing all furniture items) and FurnitureOrders (the child, holding multiple customers' orders and referring back to the FurnitureDescription table). Should the join column between these be a Guid or an integer identity in SQL?
(2) If a Guid, who generates it, the API or SQL? And for what reason?
Choosing what kind of type you need for primary/foreign keys is a well-known problem in the RDBMS world. Simple googling will help. But still:
Guids are usually generated on the application side. This option is popular (since you are referring to CQRS) when command handlers can generate complete domain objects, including the identity. Otherwise, you need a unique identity generator, which might be non-trivial but is still feasible in some databases, for example by using Oracle sequences.
Numbers are usually chosen for database-generated ids. In that case the new id is only known once the row has been inserted into a table. For an event-sourcing scenario this is not an option, since you only insert on the read side, while objects are created on the write side.
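To make the first option concrete, here is an illustrative SQL Server sketch with application-generated Guids (table and column names beyond those in the question are assumptions):

-- Illustrative SQL Server DDL; names are assumptions.
CREATE TABLE FurnitureDescription (
    FurnitureId uniqueidentifier NOT NULL PRIMARY KEY  -- supplied by the command handler
);

CREATE TABLE FurnitureOrders (
    OrderId     uniqueidentifier NOT NULL PRIMARY KEY,
    FurnitureId uniqueidentifier NOT NULL
        REFERENCES FurnitureDescription (FurnitureId)
);

-- Because the application creates both Guids up front, parent and child
-- rows can be inserted in one transaction without reading an identity back.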

Cassandra, how to filter and update a big table dynamically?

I'm trying to find the best data model to adapt a very big MySQL table to Cassandra.
This table is structured like this:
CREATE TABLE big_table (
    social_id   text,       -- types are not in the original; these are illustrative
    remote_id   text,
    timestamp   timestamp,
    visibility  int,
    type        text,
    title       text,
    description text,
    other_field text,
    ...
    PRIMARY KEY (social_id, remote_id, timestamp)
);
A page (not shown here) can contain many socials, and each social can contain many remote_ids.
social_id is the partition key; remote_id and timestamp are the clustering columns: remote_id gives uniqueness, and timestamp is used to order the results. So far so good.
The problem is that users can also search their page contents, filtering by one or more socials, one or more types, visibility (0, 1 or 2), a range of dates, or even nothing at all.
Plus, based on the filters, users should be able to set visibility.
I tried to handle this case, but I really can't find a sustainable solution.
The best I've got is to create another table, which I would need to keep in sync with the original one.
This table will have:
page_id: partition key
timestamp, social_id, type, remote_id: clustering key
Plus, create a Materialized View for each combination of filters, which is madness.
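For concreteness, that second table might look like this in CQL (a sketch; types are assumptions):

CREATE TABLE big_table_by_page (
    page_id    text,
    timestamp  timestamp,
    social_id  text,
    type       text,
    remote_id  text,
    visibility int,
    PRIMARY KEY (page_id, timestamp, social_id, type, remote_id)
) WITH CLUSTERING ORDER BY (timestamp DESC);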
Can I avoid creating the second table? What would be the best Cassandra model in this case? Should I consider switching to other technologies?
I'll start from the last questions.
> What would be the best Cassandra model in this case?
As stated in Cassandra: The Definitive Guide, 2nd edition (which I highly recommend to read before choosing or using Cassandra),
In Cassandra you don’t start with the data model; you start with the query model.
You may want to read the chapter about data design, available at Safaribooksonline.com. Basically, Cassandra wants you to think about queries only and not to care about normalization.
So the answer to
> Can I avoid creating the second table?
is: you shouldn't avoid it.
> Should I consider switching to other technologies?
That depends on what you need in terms of replication and partitioning. You may end up building master-master synchronization on top of an RDBMS, or something else. In Cassandra, you'll end up with data duplicated between tables, and that's perfectly normal for it: you trade disk space for read/write speed.
> how to filter and update a big table dynamically?
If, after all of the above, you still want to use a normalized data model in Cassandra, I suggest you look at secondary indexes first and then move on to custom indexes such as a Lucene-based index.
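As a minimal illustration of the secondary-index route (index name assumed, and with the usual caveat that a low-cardinality column like visibility tends to index poorly):

CREATE INDEX big_table_visibility_idx ON big_table (visibility);

-- After which a filter like this becomes possible:
SELECT * FROM big_table WHERE visibility = 1;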

How do entity groups help transactions?

In Google App Engine, all datastore operations in a transaction must operate on entities within the same group.
I don't understand why this is a useful constraint for transactions. It seems unnecessary since the datastore could know which entities to lock based on what happens in the transaction.
How does grouping entities together improve datastore's operations during transactions?
It's useful to think of App Engine's datastore as a giant hashmap. All you can do is put, get & delete key-value pairs. Often the key is auto-created, and usually the value is a serialized object, but it's still all a humungous key-value pair store. A hashmap.
Now this big hashmap has one (and only one) option for transactions: you can atomically manipulate one key-value pair at a time. No choices, ifs or buts - a transaction is something that applies to a single pair.
Of course, your value can be anything. It doesn't have to be a single object. It could be a hierarchical tree of objects. That's an entity group. It's a trick that says, "I have to manipulate these objects in a transaction, so I'll have to make them look like a single value. I'll just stick them into a parent object and store that."
So entity groups aren't created as a useful way of doing transactions. Entities are grouped because it's the only way to put them in a transaction.

Maintaining audit log for entities split across multiple tables

We have an entity split across 5 different tables. Records in 3 of those tables are mandatory. Records in the other two tables are optional (based on the sub-type of the entity).
One of the tables is designated the entity master. Records in the other four tables are keyed by the unique id from master.
An AFTER UPDATE/DELETE trigger is present on each table, and a change to a record saves off its history (from the deleted pseudo-table inside the trigger) into a related history table. Each history table contains the related entity's fields plus a timestamp.
So, live records are always in the live tables and history/changes are in history tables. Historical records can be ordered based on the timestamp column. Obviously, timestamp columns are not related across history tables.
Now, for the more difficult part.
1. Records are initially inserted in a single transaction. Either 3 or 5 records will be written in a single transaction.
2. Individual updates can happen to any or all of the 5 tables.
3. All records are updated as part of a single transaction. Again, either 3 or 5 records will be updated in a single transaction.
4. Number 2 can be repeated multiple times.
5. Number 3 can be repeated multiple times.
The application is supposed to display a list of point-in-time history entries based on records written as single transactions only (points 1, 3 and 5 only).
I'm currently having problems with an algorithm that will retrieve historical records based on timestamp data alone.
Adding a HISTORYMASTER table to hold extra information about transactions seems to partially address the problem. A new record is added into HISTORYMASTER before every transaction, and the new HISTORYMASTER.ID is saved into each entity table during that transaction.
Point-in-time history can then be retrieved by selecting the first record for a particular HISTORYMASTER.ID (ordered by timestamp).
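A sketch of that idea in SQL Server terms (everything beyond the HISTORYMASTER name, such as EntityPartHistory and its columns, is an assumption):

CREATE TABLE HISTORYMASTER (
    ID        int IDENTITY(1,1) PRIMARY KEY,
    CreatedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Each history table then carries the transaction's HISTORYMASTER.ID,
-- and a point-in-time entry is the earliest row per ID:
SELECT HistoryMasterId, MIN(ChangeTimestamp) AS PointInTime
FROM EntityPartHistory
GROUP BY HistoryMasterId;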
Is there a more optimal way to manage audit tables based on AFTER (UPDATE, DELETE) triggers for entities spanning multiple tables?
Your HistoryMaster seems similar to how we have addressed the history of multiple related items in one of our systems. By having a single point to hang all the related changes from in the history table, it is easy to then create a view that uses the history master as the hub and attaches the related information. It also allows you to not create records in the history where an audit is not desired.
In our case the primary tables were called EntityAudit (where Entity was the "primary" item being retained) and all data was stored in EntityHistory tables related back to the audit. In our case we were using a data layer for business rules, so it was easy to insert the audit rules into the data layer itself. I feel that the data layer is an optimal point for such tracking if and only if all modifications go through that data layer. If you have multiple applications using distinct data layers (or none at all), then I suspect that a trigger that creates the master record is pretty much the only way to go.
If you don't have additional information to track in the audit (we track the user who made the change, for example, something not on the main tables), then I would contemplate putting the extra audit ID on the "primary" record itself. Your description does not seem to indicate you are interested in the minor changes to individual tables, only in changes that update the entire entity set (although I may be misreading that). I would only do so if you don't care about the minor edits, though. In our case, we needed to track all changes, even to the related records.
Note that using an Audit/Master table has the advantage that you make minimal changes to the history tables compared to the source tables: a single AuditID (in our case a Guid, although autonumbers would be fine in non-distributed databases).
Can you add a TimeStamp / RowVersion datatype column to the entity master table, and associate all the audit records with that?
But an update to any of the "child" tables would need to update the master entity table to force the TimeStamp / RowVersion to change :(
Or stick a GUID in there that you freshen whenever one of the associated records changes.
Thinking that through out loud, it may be better to have a table joined 1:1 to the master entity that only contains the master entity ID and the "version number" of the record - either TimeStamp / RowVersion, a GUID, an incremented number, or something else.
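That 1:1 version table might be sketched like this (illustrative names, SQL Server rowversion assumed):

CREATE TABLE EntityMasterVersion (
    EntityMasterId int PRIMARY KEY REFERENCES EntityMaster (Id),
    Version        rowversion  -- bumps automatically on every update to this row
);

-- Triggers on the child tables would still have to touch this row so the
-- version changes whenever any part of the entity changes.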
I think it's a symptom of trying to capture "abstract" audit events at the lowest level of your application stack - the database.
If it's possible, consider trapping the audit events in your business layer. This would allow you to capture the history per logical transaction rather than on a row-by-row basis. The date/time is unreliable for resolving things like this, as it can be different for different rows and the same for concurrent (or closely spaced) transactions.
I understand that you've asked how to do this in DB triggers, though. I don't know about SQL Server, but in Oracle you can overcome this by using the DBMS_TRANSACTION.LOCAL_TRANSACTION_ID function to return the ID of the current transaction. If you can retrieve an equivalent SQL Server value, then you can use it to tie the record updates for the current transaction together into a logical package.
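A sketch of how that could look in an Oracle row-level trigger (table and column names are assumptions):

CREATE OR REPLACE TRIGGER entity_part_audit
AFTER UPDATE OR DELETE ON entity_part
FOR EACH ROW
BEGIN
    -- All rows changed by the same transaction get the same txn_id,
    -- so they can later be grouped into one logical history entry.
    INSERT INTO entity_part_history (entity_id, changed_at, txn_id)
    VALUES (:OLD.entity_id, SYSTIMESTAMP,
            DBMS_TRANSACTION.LOCAL_TRANSACTION_ID);
END;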
