One database, multiple frontends, maintaining request ordering

Assuming I have one database keeping a simple history with multiple front ends talking to it (one front end per server), I wonder what the common solutions are for dealing with time. As soon as I have multiple servers, I cannot assume a global consistent clock, and I am interested in the possible solutions for maintaining some kind of ordering between requests.
For a concrete example, let's say I want to record histories of customers, where a history is defined as a time-ordered set of records. The record table would be as simple as (customer_id, time, data), and a history would be all the rows where customer_id == requested id. Each request sent by the user would contain one record sent to one customer. Ideally, the time should refer to the "actual" time the request was sent to the front end by the customer (as that's the time as seen from the user's point of view). To be exact, I only care about preserving the ordering between records for each customer, not about the absolute time.
I am aware of solutions such as vector clocks, but they seem rather complex, and I would expect this to be a rather common issue?
Solutions that are not acceptable in my case:
Changing the requests arriving at the front end: unfortunately, I have to work under the constraint that the requests are passed as is. I have complete control over whatever communication protocol is needed between the front ends and the database, though.
Assuming that the server clocks are synchronized.
Requiring that all requests which need to be ordered relative to each other are handled by the same front-end server.
[EDIT]: the question may sound a bit like a red herring, so here is my rationale for asking it: while this is not my issue right now, I am interested in the possibility of moving to a platform like Google App Engine, which explicitly says that its servers are not guaranteed to be time synchronized. The solution to that issue for request ordering does not sound obvious to me, but maybe something like vector clocks is actually the only "good" solution?

When you perform any action that records history data to the database, you could record two sets of datetime info:
the datetime set by the DB when the record was inserted
the datetime passed through with the data as a legitimate piece of metadata.
The former would give you a central view of the world if you ever needed it, and the latter would let you reconstruct the datetime from the customer's perspective.
If you were ultra-keen, you could also pass through the datetime from the user's browser by filling some sort of parameter/field using JavaScript.
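As a rough illustration of recording both values, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the column names and the extra `seq` column are assumptions made for the example, not something from the question; `seq` is added because two rows can share the same database timestamp, while a database-assigned counter cannot tie.

```python
import sqlite3


def open_history_db():
    """Create an in-memory history table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE history (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,            -- assigned by the database
            customer_id INTEGER NOT NULL,
            db_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- database clock
            frontend_time TEXT NOT NULL,                      -- front-end clock, metadata only
            data TEXT NOT NULL
        )
        """
    )
    return conn


def record(conn, customer_id, frontend_time, data):
    """Insert one record; the database fills in seq and db_time itself."""
    conn.execute(
        "INSERT INTO history (customer_id, frontend_time, data) VALUES (?, ?, ?)",
        (customer_id, frontend_time, data),
    )
    conn.commit()


def history(conn, customer_id):
    """Return a customer's records in the order the database received them."""
    rows = conn.execute(
        "SELECT data FROM history WHERE customer_id = ? ORDER BY seq",
        (customer_id,),
    )
    return [row[0] for row in rows]
```

Note that this orders records by arrival at the database, which is not necessarily the order in which the customer sent them; the front-end time is kept only as metadata.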

As soon as I have multiple servers, I cannot assume a global consistent clock
Well, you can configure servers to sync their clocks to a time server. You could also configure your database server to sync to a time server, and configure the other servers to sync to your database server as often as you need to. (I'm not saying that's a great idea, just saying it's possible. If you have access to all the servers.)
Anyway . . . so the front ends are the only pieces of software you have that actually know when a request arrives. Is that right?
If that's right, then it's the front end's job to record the time of the customer's request, possibly in UTC, and then forward that timestamp to the database.
If you can't synchronize the servers' clocks, then I think your only hope is to have every front end ask just one specific server (maybe your database server, but maybe not) what time it is when a customer request arrives. A front end can do that by asking for daytime on port 13 (DAYTIME protocol, RFC 867), asking for time on port 37 (TIME protocol, RFC 868), or asking a time server on port 123 (either NTP or SNTP protocol, RFC 1305 and RFC 2030).
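To make the "ask one server what time it is" idea concrete, here is a small sketch of the round-trip correction usually attributed to Cristian's algorithm. This is a technique I am swapping in for illustration, not one of the protocols named above. `ask_time_server` is a hypothetical callable standing in for whatever request the front end makes, and the assumption that the reply takes half of the round trip is only an approximation.

```python
import time


def estimate_offset(ask_time_server, local_clock=time.time):
    """Estimate (offset, round_trip), where offset is roughly server time minus local time.

    ask_time_server is a hypothetical callable returning the reference
    server's current time in seconds.
    """
    sent = local_clock()
    server_time = ask_time_server()
    received = local_clock()
    round_trip = received - sent
    # Assumption: the reply spent half of the round trip on the wire.
    estimated_server_now = server_time + round_trip / 2
    return estimated_server_now - received, round_trip


def corrected_timestamp(offset, local_clock=time.time):
    """Timestamp a request on the reference server's timeline."""
    return local_clock() + offset
```

A front end could refresh the offset periodically and stamp each incoming request with `corrected_timestamp(offset)`. The error is bounded by the round-trip time, so two requests arriving closer together than that can still be recorded in the wrong order.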
But after reading your edit, I think what you want is impossible. You seem to be saying that
what the front ends send doesn't contain enough information to reconstruct the "true" ordering
what the front ends send cannot be changed
If the front ends can't send you any other information, vector clocks and interval tree clocks won't help.

Related

Processing a million records as a batch in BizTalk

I am looking at suggestions on how to tackle this and whether I am using the right tool for the job. I work primarily on BizTalk and we are currently using BizTalk 2013 R2 with SQL 2014.
Problem:
We will be receiving positional flat files every day (around 50) from various partners, and the theoretical total number of records received will be over a million. Each record has some identifying information that needs to be sent to a web service, which essentially comes back with a YES or NO; based on that answer, the incoming file is split into two files.
Originally, the scope for daily expected records was 10k which later ballooned to 100k and now is at a million records.
Attempt 1: Scatter-Gather pattern
I am debatching the records in a custom pipeline using the file disassembler, adding a couple of port-configurable properties for the scatter part (following Richard Seroter's suggestion of implementing a round-robin assignment) that control the number of scatter/worker orchestrations I spin up. The workers call the web service and mark each record to be sent to 'Agency A' or 'Agency B'. Finally, I push a control message that spins up the Gather/Aggregator orchestration, which collects all the messages processed by the workers from the MessageBox via correlation and creates two files to be routed to Agency A and Agency B.
So every file that gets dropped has its own set of workers and an aggregator to process it.
This works well for files with fewer records, but if a file has over 100k records, I see throttling and the file takes a long time to process and generate the two files.
I have put the receive location, the worker and aggregator, and the send port on separate hosts.
It appears that the gatherer is dehydrated and does not really aggregate the records processed by the workers until all of them are processed, and I think that since the ratio of messages published to messages processed is very large, it is throttling.
Approach 2:
Assuming that the Aggregator orchestration is the bottleneck, instead of accumulating the records in an orchestration, I pushed the processed records to a SQL database and 'split' them into two XML files (basically concatenating the messages going to Agency A/B, wrapping them in an XML declaration, and using the correct message type based on some of the context properties written to the SQL table along with the record).
These aggregated XML records are polled and routed to the right agencies.
This seems to work okay with 100k records and completes in an acceptable amount of time. Now that the goalposts have moved again with regard to expected volume, I am trying to see if BizTalk is even a feasible choice anymore.
I have indicated that BT is not the right tool for the job to perform such a task but the client is suggesting we add more servers to make it work. I am looking at SSIS.
Meanwhile, while doing some testing, some observations:
Increasing the number of workers improved processing (duh):
It looks like when each worker had fewer records in its queue/subscription, the workers finished their queues quickly. When testing this 100k record file, using 100 workers completed in under 3 hours. This was with minimal activity on the server from other applications.
I am trying to get the web service hosting team to give me a theoretical maximum number of concurrent connections they can handle. I am leaning towards asking them whether they can handle 1000 calls; if so, based on my observations, the existing solution might scale.
I have adjusted a few settings for the host with regard to message count and physical memory threshold so it won't balk at the volume, but I am still unsure. I didn't have to touch these settings before and could use advice on which counters to monitor.
The post is a bit long, but I am hoping it gives an idea of what I have done so far. Any help or insight in tackling this problem is appreciated. If you are suggesting alternatives, I am restricted to .NET or MS-based tools/frameworks, but would love to hear about other options as well.
I will try to answer or give more detail if you want to clarify or understand something I didn't make clear.
First, 1 million records/messages is not the issue, but you can make it a problem by handling it poorly.
Here's the pattern I would lay out first.
Load the records into SQL Server with SSIS. This will be very fast.
Process/drain the records into your BizTalk app for... well, whatever needs to be done. Calling the service etc.
Update the SQL Record with the result.
When that process is complete, query out the Yes and No batches as one (large) message each, transform and send.
My guess is the Web Service will be the bottleneck unless it's specifically designed for such a load. You will probably have to tune BizTalk to throttle only when necessary but don't worry about that just yet. A good app pattern is more important.
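The load / drain / update / split pattern above can be sketched in a few lines. This is only an illustration in Python with sqlite3 standing in for SQL Server and SSIS; the table name `staging`, its columns, and the `lookup` callable (a stand-in for the web service call) are all assumptions made for the example.

```python
import sqlite3


def load_records(conn, record_keys):
    """Step 1: bulk-load the incoming records into a staging table."""
    conn.execute(
        "CREATE TABLE staging (id INTEGER PRIMARY KEY, record_key TEXT NOT NULL, result TEXT)"
    )
    conn.executemany(
        "INSERT INTO staging (record_key) VALUES (?)", [(k,) for k in record_keys]
    )
    conn.commit()


def drain(conn, lookup, batch_size=1000):
    """Steps 2 and 3: process unanswered rows in batches and store the result."""
    while True:
        batch = conn.execute(
            "SELECT id, record_key FROM staging WHERE result IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not batch:
            break
        updates = [("YES" if lookup(key) else "NO", row_id) for row_id, key in batch]
        conn.executemany("UPDATE staging SET result = ? WHERE id = ?", updates)
        conn.commit()


def split(conn):
    """Step 4: query out the YES and NO batches as two lists."""
    yes = [k for (k,) in conn.execute(
        "SELECT record_key FROM staging WHERE result = 'YES' ORDER BY id")]
    no = [k for (k,) in conn.execute(
        "SELECT record_key FROM staging WHERE result = 'NO' ORDER BY id")]
    return yes, no
```

Because the result is written back per batch, a restart simply continues with the rows whose result is still NULL.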
In such scenarios, you should consider the following approach:
De-batch the file and store the individual records in MSMQ. You can easily achieve this without any extra coding effort; all you need is a send port using the MSMQ adapter, or WCF-Custom with the netMsmq binding. If required, you can also create separate queues depending on different criteria you may have in your messages.
Receive the messages from MSMQ using receive location on a separate host.
Send them to web service on a different BizTalk host.
Try using messaging only scenarios, you can handle service response using a pipeline component if required. You can use Map on send port itself. In worst case if you need orchestration, it should only be to handle one message processing without any complex pattern.
You can then push the messages back to two MSMQ queues, one per agency, based on the web service response.
You can then receive those messages again and write them to a file. You can simply use a send port with the FileAppend option, or use a custom pipeline component to write the received messages to a file without aggregating them in an orchestration. You can gather them in an orchestration if you don't have more than a few thousand messages per file.
With this approach you won't have any bottleneck within BizTalk, and you don't need complex orchestration patterns, which usually end up having many persistence points.
If the web service becomes a bottleneck, you can control the rate at which messages are received from MSMQ by 1) using Ordered Delivery on the MSMQ receive location and, if required, 2) using BizTalk host throttling: change the Message Count in Db property to a very low number (e.g. 1000 instead of the 50K default) and increase the Spool and Tracking Data Multiplier accordingly (e.g. 500 instead of the default of 10), so that the product of the two numbers is large enough not to cause throttling due to messages within BizTalk. You can also reduce the number of worker threads on the BizTalk host to slow it down a little.
Please note that MSMQ is part of the Windows OS and does not require any additional setup. It is usually installed by default; if not, you can add it through Add/Remove Features. You can also use IBM MQ if your organization has the infrastructure. But for one million messages, MSMQ will be just fine.
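Outside BizTalk, the same idea of capping the number of concurrent calls to the web service can be sketched with a bounded thread pool. This is a generic Python illustration, not BizTalk configuration; `call_service` is a hypothetical callable returning True for YES, and mapping YES to Agency A is an assumption, since the question does not say which answer goes where.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_records(records, call_service, max_concurrent_calls=100):
    """Call the service for every record with at most max_concurrent_calls in flight.

    call_service is a hypothetical callable returning True (YES) or False (NO).
    Assumption: YES goes to Agency A and NO goes to Agency B.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent_calls) as pool:
        # Executor.map returns results in the same order as the input records.
        answers = list(pool.map(call_service, records))
    agency_a = [r for r, yes in zip(records, answers) if yes]
    agency_b = [r for r, yes in zip(records, answers) if not yes]
    return agency_a, agency_b
```

The pool size plays the role of the number of workers: it should not exceed the number of concurrent connections the web service team says they can handle.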
Apologies for the late update.
We've decided to use SSIS to bulk import the file into a table. Since the lookup web service belongs to the same organization and network (although it uses a different stack), its team has agreed to let us query the lookup table on which the web service is based. We use a 'merge' between those tables to identify 'Y' or 'N' and export the results via SSIS as well.
In short, we've skipped using BT. It now takes a couple of minutes to process a 1.5 million record file and send the split files.
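For readers wondering what such a set-based 'merge' might look like, here is a hedged sketch using a LEFT JOIN in sqlite3 from Python. The table names `incoming` and `lookup`, the `record_key` column, and the rule that a record is 'Y' when its key exists in the lookup table are all assumptions; the actual matching rule is not described above.

```python
import sqlite3


def split_by_lookup(conn):
    """Flag every incoming record 'Y' or 'N' with a single join (assumed rule:
    'Y' when the key exists in the lookup table, which is assumed to hold unique keys)."""
    rows = conn.execute(
        """
        SELECT i.record_key,
               CASE WHEN l.record_key IS NULL THEN 'N' ELSE 'Y' END AS flag
        FROM incoming AS i
        LEFT JOIN lookup AS l ON l.record_key = i.record_key
        ORDER BY i.id
        """
    ).fetchall()
    yes = [key for key, flag in rows if flag == "Y"]
    no = [key for key, flag in rows if flag == "N"]
    return yes, no
```

One join over the whole table replaces a million individual service calls, which is consistent with the drop in processing time reported above.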
Appreciate all the advice provided here.

Frequent Database Query for Instant Message

I am creating an Instant Messaging application for our department. The features of this application are:
The messages will be stored in a database
The messages may be sent to one, multiple, or all users/locations
The logged in user will be able to see a history of the messages they are included in.
My question: is it appropriate to constantly query the database from each client (there should be fewer than 20 clients running), say every 15 to 30 seconds or so? I have seen examples of a server/client messaging app using TcpClient, but I am not familiar with that subject. So I thought querying the database might be the approach I could go with. What are the ramifications of performing these queries so often? I'm also looking at SqlDependency. Should I really go back and try to learn TCP/IP?
Thanks
If you know that you will always have on the order of tens of clients, not thousands, then polling will work just fine. You do not have to poll only every 15 seconds (it would be unusable if you did); you can poll every 100 or 200 milliseconds, so chatting will appear instantaneous.
Just make sure that each polling operation is as simple as possible. The simplest operation you can do is this:
SELECT * FROM chat_log WHERE chat_log.id > ?
where id is your IDENTITY primary key, and ? is the last id that your client has seen so far from the server. Therefore, if there are no new chat messages, no rows are retrieved. With every row retrieved by a client, update the largest id that the client has seen so far, and you are good to go.
I have done it and it works like a charm.
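Here is a minimal sketch of that polling loop body, using sqlite3 in Python as a stand-in for SQL Server. The `chat_log` columns other than `id`, and the helper names, are assumptions made for the example.

```python
import sqlite3


def create_chat_log(conn):
    """Create a hypothetical chat_log table with an auto-incrementing id."""
    conn.execute(
        "CREATE TABLE chat_log (id INTEGER PRIMARY KEY AUTOINCREMENT, sender TEXT, body TEXT)"
    )


def post(conn, sender, body):
    """Store one chat message."""
    conn.execute("INSERT INTO chat_log (sender, body) VALUES (?, ?)", (sender, body))
    conn.commit()


def fetch_new_messages(conn, last_seen_id):
    """One polling operation: return rows newer than last_seen_id and the updated id."""
    rows = conn.execute(
        "SELECT id, sender, body FROM chat_log WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    if rows:
        last_seen_id = rows[-1][0]  # largest id the client has seen so far
    return rows, last_seen_id
```

Each client keeps its own `last_seen_id` and calls `fetch_new_messages` on a timer; when nothing is new, the query returns no rows and the id stays unchanged.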
From a technical point of view polling is a very ignoble technique, but in many situations it can be a practical compromise which may yield good enough results with very little development. (The alternative would be to create a proper chat server which sends push notifications to the clients, good luck with that.)
If it's fewer than 20 clients (20 select queries every 20 seconds plus some writes), SQL Server will have no trouble processing these messages.
The selection of tools and technology depends on your actual requirements (size of messages, allowing file transfers, deleting/editing messages...).
I can suggest a few options to improve performance:
Reading messages - You can use caching (e.g. Azure Redis Cache) for recent messages (the last 30 days). You can come up with a background cache update strategy to make sure it is continuously updated with new messages. Reads will go to the cache first and hit the database only on a cache miss.
You can also create a local message cache (client side), which will dramatically improve performance for the end user. You can create a SQLite database for this (like Skype does: Win + R -> %appdata%\skype -> folder -> main.db).
Or else you can simply have an Archive table in your database, where a scheduled background process (every 24 hours) archives messages older than 14/30 days, so that the main table holds only recent messages.
Writing - Writing messages will be chatty, so rather than updating the database directly, you can use a message queue (Azure Message Queue, RabbitMQ, etc.). Then you can have another process write the messages to the database.
Each technology selection will have its own cost, pros and cons, and learning time. Therefore start simple and leave room to scale later.

DB consistency with microservices

What is the best way to achieve DB consistency in microservice-based systems?
At the GOTO in Berlin, Martin Fowler was talking about microservices and one "rule" he mentioned was to keep "per-service" databases, which means that services cannot directly connect to a DB "owned" by another service.
This is super-nice and elegant but in practice it becomes a bit tricky. Suppose that you have a few services:
a frontend
an order-management service
a loyalty-program service
Now, a customer makes a purchase on your frontend, which will call the order-management service, which will save everything in the DB -- no problem. At this point, there will also be a call to the loyalty-program service so that it credits / debits points from your account.
Now, when everything is on the same DB / DB server it all becomes easy since you can run everything in one transaction: if the loyalty program service fails to write to the DB we can roll the whole thing back.
When we do DB operations throughout multiple services this isn't possible, as we don't rely on one connection / take advantage of running a single transaction.
What are the best patterns to keep things consistent and live a happy life?
I'm quite eager to hear your suggestions, and thanks in advance!
This is super-nice and elegant but in practice it becomes a bit tricky
What it means "in practice" is that you need to design your microservices in such a way that the necessary business consistency is fulfilled when following the rule:
that services cannot directly connect to a DB "owned" by another service.
In other words - don't make any assumptions about their responsibilities and change the boundaries as needed until you can find a way to make that work.
Now, to your question:
What are the best patterns to keep things consistent and live a happy life?
For things that don't require immediate consistency, and updating loyalty points seems to fall in that category, you could use a reliable pub/sub pattern to dispatch events from one microservice to be processed by others. The reliable bit is that you'd want good retries, rollback, and idempotence (or transactionality) for the event processing stuff.
If you're running on .NET some examples of infrastructure that support this kind of reliability include NServiceBus and MassTransit. Full disclosure - I'm the founder of NServiceBus.
Update: Following comments regarding concerns about the loyalty points: "if balance updates are processed with delay, a customer may actually be able to order more items than they have points for".
Many people struggle with these kinds of requirements for strong consistency. The thing is that these kinds of scenarios can usually be dealt with by introducing additional rules, like if a user ends up with negative loyalty points notify them. If T goes by without the loyalty points being sorted out, notify the user that they will be charged M based on some conversion rate. This policy should be visible to customers when they use points to purchase stuff.
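To illustrate what idempotent event processing with retries might look like on the loyalty side, here is a minimal in-memory sketch in Python. It does not use NServiceBus or MassTransit; the class, the event fields and the retry helper are all hypothetical, and a real service would update the balance and the processed-event record in one local database transaction.

```python
class LoyaltyService:
    """Hypothetical loyalty service that applies each event at most once."""

    def __init__(self):
        self.points = {}        # customer_id -> balance
        self.processed = set()  # ids of events already applied

    def handle(self, event):
        """Apply a points event; redelivered events are ignored (idempotence)."""
        if event["event_id"] in self.processed:
            return False
        customer = event["customer_id"]
        self.points[customer] = self.points.get(customer, 0) + event["points"]
        self.processed.add(event["event_id"])
        return True

    def customers_to_notify(self):
        """Customers whose balance went negative because of delayed processing."""
        return sorted(c for c, balance in self.points.items() if balance < 0)


def deliver_with_retries(handler, event, attempts=3):
    """Retry delivery a few times; safe only because the handler is idempotent."""
    for attempt in range(attempts):
        try:
            return handler(event)
        except Exception:
            if attempt == attempts - 1:
                raise
```

The negative-balance check corresponds to the compensating business rule described above: the inconsistency is allowed to happen and is then handled by policy instead of being prevented by a distributed transaction.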
I don’t usually deal with microservices, and this might not be a good way of doing things, but here’s an idea:
To restate the problem, the system consists of three independent-but-communicating parts: the frontend, the order-management backend, and the loyalty-program backend. The frontend wants to make sure some state is saved in both the order-management backend and the loyalty-program backend.
One possible solution would be to implement some type of two-phase commit:
First, the frontend places a record in its own database with all the data. Call this the frontend record.
The frontend asks the order-management backend for a transaction ID, and passes it whatever data it would need to complete the action. The order-management backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The order-management transaction ID is stored as part of the frontend record.
The frontend asks the loyalty-program backend for a transaction ID, and passes it whatever data it would need to complete the action. The loyalty-program backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The loyalty-program transaction ID is stored as part of the frontend record.
The frontend tells the order-management backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend tells the loyalty-program backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend deletes its frontend record.
If this is implemented, the changes will not necessarily be atomic, but they will be eventually consistent. Let’s think of the places it could fail:
If it fails in the first step, no data will change.
If it fails in the second, third, fourth, or fifth step, when the system comes back online it can scan through all frontend records, looking for records without an associated transaction ID (of either type). If it comes across any such record, it can replay beginning at step 2. (If there is a failure in step 3 or 5, there will be some abandoned records left in the backends, but they are never moved out of the staging area, so it is OK.)
If it fails in the sixth, seventh, or eighth step, when the system comes back online it can look for all frontend records with both transaction IDs filled in. It can then query the backends to see the state of these transactions—committed or uncommitted. Depending on which have been committed, it can resume from the appropriate step.
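The steps and recovery rules above can be sketched as a small in-memory model. Everything here (class names, method names, dictionaries standing in for databases) is hypothetical, and one simplification is made: instead of querying the backends for the commit state, `finalize` is idempotent, so recovery simply calls it again.

```python
import itertools


class Backend:
    """Hypothetical backend with a staging area and a committed area."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.staged = {}     # txn_id -> data waiting to be finalized
        self.committed = {}  # txn_id -> finalized data

    def stage(self, data):
        txn_id = next(self._ids)
        self.staged[txn_id] = data
        return txn_id

    def finalize(self, txn_id):
        # Idempotent: finalizing an already committed transaction does nothing.
        if txn_id in self.staged:
            self.committed[txn_id] = self.staged.pop(txn_id)


class Frontend:
    def __init__(self, orders, loyalty):
        self.orders, self.loyalty = orders, loyalty
        self._ids = itertools.count(1)
        self.records = {}  # record_id -> frontend record

    def begin(self, data):
        """Step 1: store the frontend record."""
        record_id = next(self._ids)
        self.records[record_id] = {"data": data, "order_txn": None, "loyalty_txn": None}
        return record_id

    def resume(self, record_id):
        """Run (or re-run) steps 2 to 8 for one frontend record."""
        rec = self.records[record_id]
        if rec["order_txn"] is None or rec["loyalty_txn"] is None:
            # Replay from step 2; anything staged earlier is simply abandoned.
            rec["order_txn"] = self.orders.stage(rec["data"])
            rec["loyalty_txn"] = self.loyalty.stage(rec["data"])
        self.orders.finalize(rec["order_txn"])
        self.loyalty.finalize(rec["loyalty_txn"])
        del self.records[record_id]

    def recover(self):
        """After a crash, finish every frontend record that is still present."""
        for record_id in list(self.records):
            self.resume(record_id)
```

In normal operation the frontend calls `begin` followed by `resume`; after a restart it calls `recover`, which covers both failure cases described above.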
I agree with what @Udi Dahan said. Just want to add to his answer.
I think you need to persist the request to the loyalty program so that if it fails it can be done at some other point. There are various ways to word/do this.
1) Make the loyalty program API failure recoverable. That is to say it can persist requests so that they do not get lost and can be recovered (re-executed) at some later point.
2) Execute the loyalty program requests asynchronously. That is to say, persist the request somewhere first then allow the service to read it from this persisted store. Only remove from the persisted store when successfully executed.
3) Do what Udi said, and place it on a good queue (pub/sub pattern to be exact). This usually requires that the subscriber do one of two things... either persist the request before removing from the queue (goto 1) --OR-- first borrow the request from the queue, then after successfully processing the request, have the request removed from the queue (this is my preference).
All three accomplish the same thing. They move the request to a persisted place where it can be worked on till successful completion. The request is never lost, and retried if necessary till a satisfactory state is reached.
I like to use the example of a relay race. Each service or piece of code must take hold and ownership of the request before allowing the previous piece of code to let go of it. Once it's handed off, the current owner must not lose the request till it gets processed or handed off to some other piece of code.
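The "borrow first, remove only after success" variant from option 3 can be illustrated with a tiny in-memory queue. This is a sketch in Python with hypothetical names; real brokers expose the same idea as acknowledgements or peek-lock, and they persist the requests, which this example does not.

```python
import collections


class LeaseQueue:
    """Hypothetical queue where a borrowed request stays owned by the queue until acked."""

    def __init__(self):
        self._ready = collections.deque()
        self._leased = {}
        self._next_id = 0

    def put(self, request):
        self._next_id += 1
        self._ready.append((self._next_id, request))

    def borrow(self):
        """Hand out the next request without removing it for good."""
        if not self._ready:
            return None
        lease_id, request = self._ready.popleft()
        self._leased[lease_id] = request
        return lease_id, request

    def ack(self, lease_id):
        """Processing succeeded: now the request may be dropped."""
        del self._leased[lease_id]

    def release(self, lease_id):
        """Processing failed: put the request back for a later retry."""
        self._ready.append((lease_id, self._leased.pop(lease_id)))

    def pending(self):
        return len(self._ready) + len(self._leased)


def drain(queue, handler, max_deliveries=100):
    """Process requests until the queue is empty or the delivery budget is used up."""
    for _ in range(max_deliveries):
        item = queue.borrow()
        if item is None:
            return
        lease_id, request = item
        try:
            handler(request)
        except Exception:
            queue.release(lease_id)
        else:
            queue.ack(lease_id)
```

This is the relay-race rule in code: the queue lets go of a request only after the handler has confirmed it finished.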
Even with distributed transactions you can get into a "transaction in doubt" status if one of the participants crashes in the midst of the transaction. If you design the services as idempotent operations, life becomes a bit easier. One can write programs to fulfill business conditions without XA. Pat Helland has written an excellent paper on this called "Life Beyond XA". Basically, the approach is to make as few assumptions about remote entities as possible. He also illustrated an approach called Open Nested Transactions (http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper142.pdf) to model business processes. In this specific case, the purchase transaction would be the top-level flow, and loyalty and order management would be next-level flows. The trick is to create granular services as idempotent services with compensation logic, so if anything fails anywhere in the flow, individual services can compensate for it. For example, if the order fails for some reason, loyalty can deduct the points accrued for that purchase.
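A compact way to picture the compensation logic is a runner that remembers an undo action for every step that succeeded. This is a generic sketch in Python with hypothetical names, not code from the paper mentioned above.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the steps that already
    succeeded are run in reverse order and False is returned.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

For the purchase flow, the loyalty step would be paired with "deduct the accrued points" and the order step with "cancel the order". The compensations themselves should be idempotent, since they may also be retried.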
Another approach is to model the system using eventual consistency with CALM or CRDTs. I've written a blog post to highlight using CALM in real life - http://shripad-agashe.github.io/2015/08/Art-Of-Disorderly-Programming Maybe it will help you.

Message Queue or DataBase insert and select

I am designing an application and I have two ideas in mind (below). I have a process that collects approximately 30 KB of data every 5 minutes, and this data needs to be updated on the client (web side, 100 users at any given time). The information collected does not need to be stored for future use.
Options:
I can get the data and insert it into the database every 5 minutes. The client then makes a call to the DB, retrieves the data and updates the UI.
Collect the data and put it into a topic or queue. Multiple clients (consumers) can then go to the queue and obtain the data.
I am leaning towards option 2 as the better solution because it is faster (no DB calls) and there is no redundant storage.
Can anyone suggest which would be the ideal solution, and why?
I don't really understand the difference. The data has to be temporarily stored somewhere until the next update, right?
But all users can see it, not just the first person to get there, right? So a queue is not really an appropriate data structure from my interpretation of your system.
Whether the data is written to something persistent like a database or something less persistent like part of the web server or application server may be relevant here.
Also, you have tagged this as real-time, but I don't see how the web clients are getting updates in real time without some kind of push/long-poll or whatever.
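Since every user needs to see the same latest data, one alternative to a queue is a single in-memory "latest snapshot" that each client reads. Here is a minimal sketch in Python, assuming the data can live in the web or application server's memory; the class and method names are made up for the example.

```python
import threading


class LatestSnapshot:
    """Hypothetical holder for the most recent data set, shared by all clients."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._data = None

    def publish(self, data):
        """Called by the collector every 5 minutes; replaces the previous data."""
        with self._lock:
            self._version += 1
            self._data = data
            return self._version

    def read_if_newer(self, seen_version):
        """Called by each client; returns the data only if it has not seen it yet."""
        with self._lock:
            if self._version > seen_version:
                return self._version, self._data
            return seen_version, None
```

Unlike a queue, reading does not consume anything, so the hundredth client gets the same data as the first.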
Seems to me that you need to use a queue and publisher/subscriber pattern.
This is an article about RabbitMQ and the Publish/Subscribe pattern.
I can get data and insert into database every 5 minutes. And then client call will be made to DB and retrieve data and update UI.
You can program your application to be event oriented. For example, raise domain events and publish your message to your subscribers.
When you use a queue, each subscriber dequeues the messages addressed to it and, of course, does so in order (FIFO). In addition, there is a delivery guarantee, unlike a database, where a record can be deleted before every 'subscriber' has received the message.
The pitfalls of using the database to accomplish this are:
Creating indexes makes querying faster, but inserts slower;
You will have to manage the delivery guarantee for every subscriber yourself;
You'll need a TTL (Time to Live) strategy for purging records (taking the delivery guarantee into account).

How do I sort events created by different clients based on a global clock?

I have a newsfeed program, and I've got many client applications (about 70) across a few timezones that generate events, for example when a secretary schedules a meeting it adds to the list on the server. This list is served to every client that wants to view it. Currently each record has the following metadata:
random unique ID
local timestamp (YYYY:MM:DD:H:M:S:ms)
How do I sort these events on the server so that they appear in the order they were submitted in? Currently they get mixed up since the local times don't match. I don't have any UTC timestamps (can I calculate these locally?), so I'm wondering if I can make do with the information I have, or whether I should be getting more information from each client. I noticed that even clients in the same timezone get events mixed up because their system time is not synchronized (is it possible to know the exact global time, or to synchronize the system time with a server on Windows?)
I'm not asking for code, I just need a pointer in the right direction.
When storing temporal values it is essential to always use UTC. Anything else and you're screwed. You really should also store the related timezone along with the UTC.
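As a pointer, here is a small sketch of converting the local timestamps to UTC before sorting, written in Python. It assumes each client also reports its UTC offset in hours, which is extra information the clients do not send today, and it assumes the timestamp format YYYY:MM:DD:H:M:S:ms from the question.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_stamp, utc_offset_hours):
    """Convert 'YYYY:MM:DD:H:M:S:ms' taken at the given UTC offset to an aware UTC datetime."""
    year, month, day, hour, minute, second, ms = (int(p) for p in local_stamp.split(":"))
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime(year, month, day, hour, minute, second, ms * 1000, tzinfo=local_zone)
    return local.astimezone(timezone.utc)


def sort_events(events):
    """Sort (event_id, local_stamp, utc_offset_hours) tuples by their UTC time."""
    return sorted(events, key=lambda e: to_utc(e[1], e[2]))
```

This removes the timezone differences only. It does not fix clients whose system clocks are wrong, so events from the same timezone can still be misordered unless the server also records its own receive time.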
