I have a single SQL Server database for both read and write operations, and I am implementing the CQRS pattern for code segregation and maintainability, so that I can assign read operations to some members of my team and write operations to others. Using CQRS seems to be a clean approach.
Now, whenever inserts/updates/deletes happen on tables in my database, I need to send messages to other systems that need to know about the changes, because my database holds the master data: any change happening here needs to be projected to downstream systems so that they get the latest data and maintain it on their side. For this purpose I may use MQ or Kafka, so that whenever there is a change I can generate a key message and put it on MQ, or use Kafka for messaging.
Until now I haven't used Event Sourcing, because I thought that since I don't have separate databases for read and write, I might not need it. Is my assumption right that if we have a single database we don't need Event Sourcing? Or can Event Sourcing play a role in using MQ or Kafka? I mean, if I use the Event Sourcing pattern, can I save the data first in the master database and then use Event Sourcing to write the changes to MQ or Kafka? I am clueless here about whether the Event Sourcing pattern applies to MQ or Kafka at all.
Do I need Event Sourcing to write messages to MQ or to use Kafka? Or is it not needed at all in my case, since I have only one database and don't need to know the series of updates that happened to the system of record? All I care about is the final state of the record in my master database, and then using MQ or Kafka to tell downstream systems about changes whenever there are CRUD operations, so that they always have the latest data.
Is my assumption right that if we have a single database we don't need Event Sourcing?
No, that assumption is wrong.
In the CQRS "tradition", event sourcing refers to maintaining information in a persistent data structure; when new information arrives, we append that information to what we already know. In other words, it describes the pattern we use to remember information. See Fowler 2005.
When you start talking about messaging solutions like Kafka or RabbitMQ, you are in a different problem space: how do you share information between systems? We put the information into a message and deliver that message from the producer to the consumer(s). And for historical reasons, that message is called an Event (see Hohpe et al).
Two different ideas, easily confused. When CQRS people talk about event sourcing, they don't mean the same thing as Kafka people talking about event sourcing.
Do I need Event Sourcing for writing messages to MQ or for using Kafka?
No: messaging still works even when you choose a remembering strategy other than event sourcing.
It's perfectly reasonable to modify your model "in place", as it were, and then to also publish a message announcing that things have changed.
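For example, the write side can look roughly like this (a minimal Java sketch; the repository, publisher, and event types here are hypothetical, not anything from the question):

// Minimal sketch (hypothetical types): current state is updated in place, and a
// notification message is published afterwards so downstream systems can react.
public class UpdateThenPublish {

    interface ProductRepository { void save(Product product); }              // hypothetical
    interface MessagePublisher  { void publish(String type, Object payload); } // hypothetical
    record Product(String id, String name, double price) {}
    record ProductChanged(String productId) {}

    private final ProductRepository repository;
    private final MessagePublisher publisher;

    UpdateThenPublish(ProductRepository repository, MessagePublisher publisher) {
        this.repository = repository;
        this.publisher = publisher;
    }

    void handle(Product product) {
        repository.save(product);  // only the final state is remembered -- no event history
        // Announce the change; making save + publish atomic (e.g. via an outbox table)
        // is a separate concern from event sourcing.
        publisher.publish("ProductChanged", new ProductChanged(product.id()));
    }
}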
Why do I need Event Sourcing when I have only one database and I don't care about the series of changes to the records, but only about the final state of the data?
You don't.
The two most common motivations for event sourcing are that either (a) an event history has a natural alignment with the domain you are working in (for instance, accounting) or (b) you need to support temporal queries.
If you don't have those problems, then you don't need it.
Related
The external database contains a set of rules for each key, and these rules should be applied to each stream element in the Flink job. Because it is very expensive to make a DB call for each element to retrieve the rules, I want to fetch the rules from the database at initialization and store them in a local cache.
When rules are updated in the external database, a status-change event is published to the Flink job, which should trigger fetching the rules again and refreshing this cache.
What is the best way to achieve what I've described? I looked into keyed state, but initializing all keys and refreshing them on update doesn't seem possible.
I think you can make use of BroadcastProcessFunction or KeyedBroadcastProcessFunction to achieve your use case. A detailed blog post is available here.
In short: you can define a source such as Kafka (or any other) and publish the rules that the actual stream should consume to it. Connect the actual data stream with the broadcast rules stream. processBroadcastElement will then receive the rules, and there you can update the state. Finally, the updated state (the rules) can be read in processElement, the method that handles the actual events.
Points to consider: broadcast state is always kept on the heap, not in the state backend (RocksDB), so it has to be small enough to fit in memory. Each slot copies all of the broadcast state into its checkpoints, so checkpoints and savepoints will contain n (parallelism) copies of the broadcast state.
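A minimal sketch of that wiring (the Event/Rule/Result POJOs and their field names are made up; the broadcast-state calls are Flink's documented API):

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class RulesEnrichmentJob {

    // Hypothetical POJOs: an event keyed by `key`, and a rule for that key.
    public static class Event  { public String key; public long value; }
    public static class Rule   { public String key; public String expression; }
    public static class Result { public String key; public String appliedRule; }

    public static DataStream<Result> apply(DataStream<Event> events, DataStream<Rule> rules) {
        // Descriptor for the broadcast (rules) state, shared by both sides.
        MapStateDescriptor<String, Rule> rulesDescriptor = new MapStateDescriptor<>(
                "rules", Types.STRING, TypeInformation.of(Rule.class));

        BroadcastStream<Rule> broadcastRules = rules.broadcast(rulesDescriptor);

        return events
                .keyBy(e -> e.key)
                .connect(broadcastRules)
                .process(new KeyedBroadcastProcessFunction<String, Event, Rule, Result>() {

                    @Override
                    public void processBroadcastElement(Rule rule, Context ctx, Collector<Result> out)
                            throws Exception {
                        // Rule updates arrive here; refresh the broadcast state.
                        ctx.getBroadcastState(rulesDescriptor).put(rule.key, rule);
                    }

                    @Override
                    public void processElement(Event event, ReadOnlyContext ctx, Collector<Result> out)
                            throws Exception {
                        // Events read the (read-only) broadcast state to find their rule.
                        Rule rule = ctx.getBroadcastState(rulesDescriptor).get(event.key);
                        if (rule != null) {
                            Result r = new Result();
                            r.key = event.key;
                            r.appliedRule = rule.expression;
                            out.collect(r);
                        }
                    }
                });
    }
}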
A few different mechanisms in Flink may be relevant to this use case, depending on your detailed requirements.
Broadcast State
Jaya Ananthram has already covered the idea of using broadcast state in his answer. This makes sense if the rules should be applied globally, for every key, and if you can find a way to collect and broadcast the updates.
Note that the Context passed to processBroadcastElement() of a KeyedBroadcastProcessFunction contains the method applyToKeyedState(StateDescriptor<S, VS> stateDescriptor, KeyedStateFunction<KS, S> function). This means you can register a KeyedStateFunction that will be applied to the state of every key associated with the provided stateDescriptor.
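A small sketch of what that looks like (the POJO types and state name are placeholders; the point is the applyToKeyedState call inside processBroadcastElement):

import org.apache.flink.api.common.state.KeyedStateFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder types for this sketch.
class SensorEvent { public String key; public long timestamp; }
class RuleUpdate  { public String name; }
class Output      { public String key; }

// Sketch: a broadcast element visits the keyed state of *every* key via applyToKeyedState.
public class ApplyToAllKeysExample
        extends KeyedBroadcastProcessFunction<String, SensorEvent, RuleUpdate, Output> {

    private final ValueStateDescriptor<Long> lastSeen =
            new ValueStateDescriptor<>("lastSeen", Long.class);

    @Override
    public void processElement(SensorEvent event, ReadOnlyContext ctx, Collector<Output> out)
            throws Exception {
        // Ordinary keyed state access for the current key.
        getRuntimeContext().getState(lastSeen).update(event.timestamp);
    }

    @Override
    public void processBroadcastElement(RuleUpdate update, Context ctx, Collector<Output> out)
            throws Exception {
        // applyToKeyedState runs the function once for each key that has state registered
        // under `lastSeen`, which is how a single broadcast element can refresh or
        // invalidate the state of all keys.
        ctx.applyToKeyedState(lastSeen, new KeyedStateFunction<String, ValueState<Long>>() {
            @Override
            public void process(String key, ValueState<Long> state) throws Exception {
                state.clear();
            }
        });
    }
}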
State Processor API
If you want to bootstrap state in a Flink savepoint from a database dump, you can do that with this library. You'll find a simple example of using the State Processor API to bootstrap state in this gist.
Change Data Capture
The Table/SQL API supports Debezium, Canal, and Maxwell CDC streams, and Kafka upsert streams. This may be a solution. There's also flink-cdc-connectors.
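As a rough illustration, a Debezium-encoded Kafka topic can be exposed as a changelog table through the Table/SQL API (topic, server, and column names are placeholders, and the Kafka connector and debezium-json format jars must be on the classpath):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class RulesCdcExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Placeholder topic/server/column names; the table mirrors the rules table in the
        // external database, kept up to date from a Debezium change stream.
        tableEnv.executeSql(
            "CREATE TABLE rules (" +
            "  rule_key STRING," +
            "  rule_expression STRING" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'dbserver.public.rules'," +
            "  'properties.bootstrap.servers' = 'kafka:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'debezium-json'" +
            ")");

        // The resulting changelog table can then be joined with the event stream in SQL.
    }
}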
Lookup Joins
Flink SQL can do temporal lookup joins against a JDBC database, with a configurable cache. Not sure this is relevant.
In essence, David's answer summarizes it well. If you are looking for more detail: not long ago, I gave a webinar [1] on this topic, including running code examples [2].
[1] https://www.youtube.com/watch?v=cJS18iKLUIY
[2] https://github.com/knaufk/enrichments-with-flink
I am currently trying to get into microservices architecture, and I came across the data consistency issue. I've read that duplicating data between several microservices is considered a good idea, because it makes each service more independent.
However, I can't figure out what to do in the following case to provide consistency:
I have a Customer service which has a RegisterCustomer method.
When I register a customer, I want to send a message via RabbitMQ, so other services can pick up this information and store it in their own DBs.
My code looks something like this:
...
_dbContext.Add(customer);                                            // entity tracked, not yet saved
CustomerRegistered e = Mapper.Map<CustomerRegistered>(customer);
await _messagePublisher.PublishMessageAsync(e.MessageType, e, "");   // message already sent
//!! app crashes here
_dbContext.SaveChanges();                                            // never runs, so the DB never sees the customer
...
So I would like to know how I can handle the case when the application sends the message but is unable to save the data itself. Of course, I could swap the SaveChanges and PublishMessage calls, but the trouble would still be there. Is there something wrong with my data-storing approach?
Yes. You are doing dual persistence: persistence in the DB and in a durable queue. If one succeeds and the other fails, you'd always be in trouble. There are a few ways to handle this:
Persist in the DB and then do Change Data Capture (CDC), so that the data from the DB write-ahead log (WAL) is used to create a materialized view in the second service's DB via real-time streaming.
Persist in a durable queue and a cache. Using real-time streaming, persist the data into both services. Read from the cache if the data is available there, otherwise read from the DB. This allows read-after-write. Even if the write to the cache fails, in the worst case the data will be in the DB within seconds through streaming.
NServiceBus does support durable distributed transactions in many scenarios, vs. RMQ. Maybe you can look into using that feature to ensure that both contexts are saved or rolled back together in case of failure, if you can use NServiceBus instead of RMQ.
I think the solution you're looking for is the outbox pattern: there is an event table in the same database as your business data, which allows both to be committed in the same database transaction, and then a background worker loop pushes the events to the MQ.
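A minimal sketch of the outbox idea in Java/JDBC (table, column, and publisher names are made up; a real implementation would also handle retries, ordering, and idempotent consumers):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Outbox sketch: the business row and the event row are written in ONE local transaction,
// and a background worker publishes unsent events afterwards.
public class OutboxExample {

    interface MessagePublisher { void publish(String type, String payload); }  // hypothetical MQ client wrapper

    private final DataSource dataSource;
    private final MessagePublisher publisher;

    public OutboxExample(DataSource dataSource, MessagePublisher publisher) {
        this.dataSource = dataSource;
        this.publisher = publisher;
    }

    // Write side: business data + outbox row committed atomically.
    public void registerCustomer(String id, String name) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement insertCustomer = con.prepareStatement(
                         "INSERT INTO customers (id, name) VALUES (?, ?)");
                 PreparedStatement insertEvent = con.prepareStatement(
                         "INSERT INTO outbox (event_type, payload, published) VALUES (?, ?, FALSE)")) {
                insertCustomer.setString(1, id);
                insertCustomer.setString(2, name);
                insertCustomer.executeUpdate();

                insertEvent.setString(1, "CustomerRegistered");
                insertEvent.setString(2, "{\"id\":\"" + id + "\",\"name\":\"" + name + "\"}");
                insertEvent.executeUpdate();

                con.commit();   // either both rows exist, or neither does
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }

    // Background worker loop: publish pending events, then mark them as sent.
    public void relayPendingEvents() throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement select = con.prepareStatement(
                     "SELECT id, event_type, payload FROM outbox WHERE published = FALSE");
             PreparedStatement markSent = con.prepareStatement(
                     "UPDATE outbox SET published = TRUE WHERE id = ?")) {
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    publisher.publish(rs.getString("event_type"), rs.getString("payload"));
                    markSent.setLong(1, rs.getLong("id"));
                    markSent.executeUpdate();
                }
            }
        }
    }
}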
I have a Flink project that receives an event stream and executes some logic to add a flag to each event; it then saves the flag and the event ID for a while, so they can be reused or queried by other systems.
In this case the volume of data is not too large, but it needs to be reliable and, of course, should be updated in time before it is used.
Traditionally, we would use an external database to save this kind of data.
But after I learned about Flink state, it seems to be very useful: it has a good backend mechanism and can be made queryable.
So I am asking this question to hear more of your arguments and evidence.
I am moving my last two comments here as an answer, since I realized that is essentially what I was writing.
OK, it might have been the Uber keynote then. But the bottom line is that there are companies using extremely large state to hold data that they need to perform calculations against efficiently.
For example, I made a program that took in messages with a unique ID and a value field (int). I then had a stateful function, keyed by the ID of the received message, and every message I received for that ID would be added to a stateful value object, updating the total for that ID. You could make a stateful list object to hold all the messages you received, if you needed that. An alternative is to use a "new age" database designed for quick reads and writes, like Cassandra, to store that. But that approach comes with its own limitations because of the I/O (long story short: Flink and Cassandra could each handle lots of data fast, but the network bandwidth could not).
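That keyed running-total function looks roughly like this (a sketch; Message and Total are hypothetical POJOs, and it would be wired up as messages.keyBy(m -> m.id).process(new RunningTotal())):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical POJOs for the incoming messages and the emitted totals.
class Message { public String id; public int value; }
class Total   { public String id; public long total; }

// Sketch of the "running total per ID" function described above.
public class RunningTotal extends KeyedProcessFunction<String, Message, Total> {

    private transient ValueState<Long> totalState;

    @Override
    public void open(Configuration parameters) {
        // One Long of state per key (per message ID), managed by the configured state backend.
        totalState = getRuntimeContext().getState(new ValueStateDescriptor<>("total", Long.class));
    }

    @Override
    public void processElement(Message msg, Context ctx, Collector<Total> out) throws Exception {
        // State is scoped to the current key, so each ID keeps its own total.
        Long current = totalState.value();
        long updated = (current == null ? 0L : current) + msg.value;
        totalState.update(updated);

        Total result = new Total();
        result.id = msg.id;
        result.total = updated;
        out.collect(result);
    }
}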
So keeping all that data in state in Flink can be done, works well, and has many benefits.
The one caveat is that I do not know whether Flink's state has the same sort of failsafes as Cassandra or Kafka, which replicate their data across nodes so that if one goes down the others can handle everything and repopulate the node when it restarts. Flink's state can be stored on a remote backend like an S3 bucket or HDFS (see: https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/state_backends.html), but I do not know whether the state itself is replicated. So if the state all lives on one node that goes down, is it gone for good, or is it backed up on another node? That is something to look into more, since it should be a big factor in your decision.
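Configuring checkpointing to such a remote backend is only a couple of lines (a sketch using the older RocksDBStateBackend API that matches the 1.4-era docs linked above; the URI and interval are placeholders):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: keep working state in RocksDB and checkpoint it to a remote filesystem
// (HDFS or S3), so a lost worker can be recovered from the last checkpoint.
public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);  // checkpoint every 60 seconds
        env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints"));

        // ... build the pipeline, then:
        // env.execute("checkpointed-job");
    }
}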
Hope that at least gave you some info and a brief idea of what questions to ask.
I was looking to implement the CQRS pattern. For the process of updating the read database, is it best to use a Windows service, or to update the view at the time a new record is created in the write database? Is it best to use triggers, or some other process? I've seen a couple of approaches and haven't made up my mind which is the best way to achieve this.
Thanks.
Personally I love to use messaging to solve these kind of problems.
Your commands result in events when they are processed, and if you use messaging to publish those events, one or more downstream read services can subscribe to them and process them to update the read models.
The reason why messaging is nice in this case is that it allows you to decouple the write and read side from each other. Also, it allows you to easily have several subscribers if you find a need for it. Additionally, messaging using a persistent queuing system like MSMQ enables retrying of failed messages. It also means that you can take a read model offline (for updates etc) and when it comes back up it can then process all the events in the queue.
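In code, a read-side subscriber is usually just an event handler that applies each event to a denormalized table. Sketched here in Java/JDBC (the event type, table, column names, and the bus wiring that calls the handler are all hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Sketch of a read-side denormalizer: it subscribes to events published by the write side
// and keeps a flat, query-friendly table up to date.
public class CustomerListDenormalizer {

    public static class CustomerRegistered {
        public String customerId;
        public String displayName;
    }

    private final DataSource readDb;

    public CustomerListDenormalizer(DataSource readDb) {
        this.readDb = readDb;
    }

    // Called by the messaging infrastructure for every CustomerRegistered event it delivers.
    // Real code would make this idempotent (an upsert), since queued messages can be redelivered.
    public void on(CustomerRegistered event) throws SQLException {
        try (Connection con = readDb.getConnection();
             PreparedStatement insert = con.prepareStatement(
                     "INSERT INTO customer_list (customer_id, display_name) VALUES (?, ?)")) {
            insert.setString(1, event.customerId);
            insert.setString(2, event.displayName);
            insert.executeUpdate();
        }
    }
}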
I'm no friend of triggers in relational databases, but I imagine they must be pretty hard to test. And triggers would introduce routing logic where it doesn't belong. Could it also be that if the trigger action fails, the entire write transaction rolls back? Triggers are probably the least beneficial solution.
It depends on how tolerant your application must be with regards to eventual consistency.
If your app has no problem with read data being 5 minutes old, there's no need to denormalize upon every write data change. In that case, a background service that kicks in every n minutes or that kicks in only when the CPU consumption is below a certain threshold, for instance, can be a good solution.
If, on the other hand, your app is time-sensitive, such as in the case of frequently changing statuses, machine monitoring, stock exchange data etc., then you will want to keep the lag as low as possible and denormalize on the spot -- that is, in-process or at least in real-time. So in this case you may choose to run the denormalizers in a constantly-running process or to add them to the chain of event handlers straight in your code.
Your call.
What's the best strategy to keep all the clients of a database server synchronized?
The scenario involves a database server and a dynamic number of clients that connect to it, viewing and modifying the data.
I need real-time synchronization of the data across all the clients: if data is added, deleted, or updated, I want all the clients to see the change in real time, without putting too much strain on the database engine by continuously polling tables with a couple of million rows.
Now I am using a Firebird database server, but I'm willing to adopt the best technology for the job, so I want to know whether there is any existing framework for this kind of scenario, which database engine it uses, and what it involves.
Firebird has a feature called EVENT that you may be able to use to notify clients of changes to the database. The idea is that when data in a table is changed, a trigger posts an event. Firebird takes care of notifying all clients who have registered an interest in the event by name. Once notified, each client is responsible for refreshing its own data by querying the database.
The client can't get information from the event about the new or old values. This is by design, because there's no way to resolve that with transaction isolation. Nor can your client register for events using wildcards. So you have to design your server-to-client notifications fairly broadly, and let the client re-query to see what exactly changed.
See http://www.firebirdsql.org/doc/whitepapers/events_paper.pdf
You don't mention what client platform or language you're using, so I can't advise on the specific API you would use. I suggest you google for instance "firebird event java" or "firebird event php" or similar, based on the language you're using.
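In Java, for instance, the Jaybird driver exposes this through its event API, roughly as below. Connection settings are placeholders, and the exact class and setter names should be checked against the Jaybird version you use:

import org.firebirdsql.event.DatabaseEvent;
import org.firebirdsql.event.EventListener;
import org.firebirdsql.event.FBEventManager;

// Rough sketch of the listening side with Jaybird (the Firebird JDBC driver).
// Host, credentials and database path are placeholders.
public class CustomerChangeListener {
    public static void main(String[] args) throws Exception {
        FBEventManager eventManager = new FBEventManager();
        eventManager.setHost("localhost");
        eventManager.setUser("SYSDBA");
        eventManager.setPassword("masterkey");
        eventManager.setDatabase("/data/mydb.fdb");
        eventManager.connect();

        // The server side is a trigger that does something like: POST_EVENT 'customers_changed';
        eventManager.addEventListener("customers_changed", new EventListener() {
            @Override
            public void eventOccurred(DatabaseEvent event) {
                // The event carries no row data by design -- re-query to see what changed.
                System.out.println("customers_changed fired " + event.getEventCount() + " time(s)");
            }
        });

        Thread.sleep(Long.MAX_VALUE);  // keep the listener alive
    }
}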
Since you say in a comment that you're using WPF, here's a link to a code sample of some .NET application code registering for notification of an event:
http://www.firebirdsql.org/index.php?op=devel&sub=netprovider&id=examples#3
Re your comment: Yes, the Firebird event mechanism is limited in its ability to carry information. This is necessary because any information it might carry could be canceled or rolled back. For instance if a trigger posts an event but then the operation that spawned the trigger violates a constraint, canceling the operation but not the event. So events can only be a kind of "hint" that something of interest may have happened. The other clients need to refresh their data at that time, but they aren't told what to look for. This is at least better than polling.
So you're basically describing a publish/subscribe mechanism -- a message queue. I'm not sure I'd use an RDBMS to implement a message queue. It can be done, but you're basically reinventing the wheel.
Here are a few message queue products that are well-regarded:
Microsoft MSMQ (seems to be part of Windows Professional and Server editions)
RabbitMQ (free open-source)
Apache ActiveMQ (free open-source)
IBM WebSphere MQ (probably overkill in your case)
This means that when one client modifies data in a way that others may need to know about, that client also has to post a message to the message queue. When consumer clients see the message they're interested in, they know to refresh their copy of some data.
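With RabbitMQ and its Java client, for example, that notify-and-refresh wiring is only a few lines (the exchange name and message payload are made up; the message is just a hint to re-query):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

// Sketch: a client that changes data publishes a small notification to a fanout exchange,
// and every connected client receives it and re-queries the database for the details.
public class ChangeNotifications {

    private static final String EXCHANGE = "data.changes";

    // Called by a client right after it commits a change.
    public static void publishChange(Channel channel, String tableName) throws Exception {
        channel.exchangeDeclare(EXCHANGE, "fanout", true);
        channel.basicPublish(EXCHANGE, "", null, tableName.getBytes(StandardCharsets.UTF_8));
    }

    // Run by every client that wants to stay in sync.
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");           // placeholder broker address
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        channel.exchangeDeclare(EXCHANGE, "fanout", true);
        String queue = channel.queueDeclare().getQueue();   // exclusive, auto-delete queue
        channel.queueBind(queue, EXCHANGE, "");

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String table = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // The message is only a hint: refresh the local copy of `table` by re-querying.
            System.out.println("Change notification for table: " + table);
        };
        channel.basicConsume(queue, true, onMessage, consumerTag -> { });
    }
}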
SQL Server 2005 and higher support notification-based data source cache expiry.