Frequent Database Query for Instant Message - sql-server

I am creating an Instant Messaging application for our department. The features of this application are:
The messages will be stored in a database
The messages may be sent to one, multiple, or all users/locations
The logged in user will be able to see a history of the messages they are included in.
My question: is it appropriate to constantly query the database from each client, say every 15-30 seconds or so? There should be fewer than 20 clients running. I have seen examples of a server/client messaging app using TcpClient, but I am not familiar with that subject, so I thought querying the database might be the approach to go with. What are the ramifications of performing these queries so often? I'm also looking at SqlDependency. Should I really go back and try to learn TCP technology?
Thanks

If you know that you will always have on the order of tens of clients, not thousands, then polling will work just fine. You also do not have to poll only every 15 seconds (chat would be unusable if you did); you can poll every 100 or 200 milliseconds, so chatting will appear instantaneous.
Just make sure that each polling operation is as simple as possible. The simplest operation you can do is this:
SELECT * FROM chat_log WHERE chat_log.id > ?
where id is your IDENTITY primary key, and ? is the last id that your client has seen so far from the server. Therefore, if there are no new chat messages, no rows are retrieved. With every row retrieved by a client, update the largest id that the client has seen so far, and you are good to go.
I have done it and it works like a charm.
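As a rough illustration of the polling query described above, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for SQL Server, and the column names (sender, body) and the helper fetch_new_messages are assumptions made for the example, not part of the original answer.

```python
import sqlite3

def fetch_new_messages(conn, last_seen_id):
    """Return (rows, new_last_seen_id) for messages newer than last_seen_id."""
    rows = conn.execute(
        "SELECT id, sender, body FROM chat_log WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    if rows:
        # Remember the largest id seen so the next poll skips these rows.
        last_seen_id = rows[-1][0]
    return rows, last_seen_id

# Stand-in schema; in SQL Server the id would be an IDENTITY primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, sender TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO chat_log (sender, body) VALUES (?, ?)",
    [("alice", "hi"), ("bob", "hello")],
)

first_rows, last_id = fetch_new_messages(conn, 0)          # two new rows
empty_rows, last_id = fetch_new_messages(conn, last_id)    # nothing new

conn.execute("INSERT INTO chat_log (sender, body) VALUES (?, ?)", ("alice", "bye"))
later_rows, last_id = fetch_new_messages(conn, last_id)    # only the new row
```

A real client would call fetch_new_messages from a timer at whatever polling interval it chooses and keep last_id in memory between calls.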
From a technical point of view polling is a very ignoble technique, but in many situations it can be a practical compromise which may yield good enough results with very little development. (The alternative would be to create a proper chat server which sends push notifications to the clients, good luck with that.)

If it's fewer than 20 clients (20 SELECT queries every 20 seconds plus some writes), SQL Server will have no trouble processing these messages.
Selection of tools and technology depends on your actual requirements. (size of messages, allow file transfers, delete/edit messages...)
I can suggest a few options to improve performance:
Reading Messages - You can use caching (e.g. Azure Redis Cache) for recent messages (last 30 days). You can come up with a background cache update strategy to make sure it's continuously updated with new messages. Reads will go to the cache first and hit the database only on a cache miss.
You can also create a local message cache (client side), which will dramatically improve performance for the end user. You can use a SQLite database for this (like Skype does: Win + R -> %appdata%\skype -> folder -> main.db).
Or else you can simply have an Archive table in your db, where a scheduled (every 24 hours) background process archives messages older than 14/30 days, so the main table holds only recent messages.
Writing - Writing messages will be chatty. Rather than directly updating the database, you can use a message queue (Azure Message Queue, RabbitMQ, etc.) and have another process write the messages to the database.
Each technology selection will have its own cost, pros and cons, and learning time. Therefore start simple and leave room to scale later.

Related

Processing a million records as a batch in BizTalk

I am looking at suggestions on how to tackle this and whether I am using the right tool for the job. I work primarily on BizTalk and we are currently using BizTalk 2013 R2 with SQL 2014.
Problem:
We would be receiving positional flat files every day (around 50) from various partners, and the theoretical total number of records received would be over a million. Each record has some identifying information that will need to be sent to a web service, which would come back essentially with a YES or NO, based on which the incoming file is split into two files.
Originally, the scope for daily expected records was 10k which later ballooned to 100k and now is at a million records.
Attempt 1: Scatter-Gather pattern
I am debatching the records in a custom pipeline using the file disassembler, adding a couple of port-configurable properties for the scatter part (following Richard Seroter's suggestion of implementing a round-robin assignment) that control the number of scatter/worker orchestrations I spin up. The workers call the web service and mark the records to be sent to 'Agency A' or 'Agency B'. Finally, I push a control message that spins up the Gather/Aggregator orchestration, which collects all the messages processed by the workers from the messagebox via correlation and creates two files to be routed to Agency A and Agency B.
So, every file that gets dropped will have its own set of workers and an aggregator that process the file.
This works well for files with fewer records, but if a file has over 100k records, I see throttling happen and the file takes a long time to process and generate the two files.
I have put the receive location/worker & aggregator/send port on separate hosts.
It appears that the gatherer is dehydrated and not really aggregating the records processed by the workers until all of them are processed, and I think that since the ratio of messages published to messages processed is very large, it is throttling.
Approach 2:
Assuming that the Aggregator orchestration is the bottleneck, instead of accumulating the records in an orchestration, I pushed the processed records to a SQL db and 'split' them into two XML files (basically concatenating the messages going to Agency A/B, wrapping them in an XML declaration, and using the correct message type, based on writing some of the context properties to the SQL table along with the record).
These aggregated XML records are polled and routed to the right agencies.
This seems to work okay with 100k records and completes in an acceptable amount of time. Now that the goalpost/requirement has again changed with regard to expected volume, I am trying to see if BizTalk is even a feasible choice anymore.
I have indicated that BT is not the right tool for such a task, but the client is suggesting we add more servers to make it work. I am looking at SSIS.
Meanwhile, while doing some testing, some observations:
Increasing the number of workers improved processing (duh):
It looks like if each worker had fewer records in its queue/subscription, it finished its queue quickly. When testing this 100k record file, using 100 workers completed in under 3 hrs. This is with minimal activity on the server from other applications.
I am trying to get the web service hosting team to give me a theoretical maximum number of concurrent connections they can handle. I am leaning towards asking them whether they can handle 1000 calls, and maybe the existing solution would scale, given my observations.
I have adjusted a few settings for the host with regard to message count and physical memory threshold so it won't balk at the volume, but I am still unsure. I didn't have to mess with these settings before and could use advice on which counters to monitor.
The post is a bit long, but I am hoping this gives an idea of what I did so far. Any help/insight in tackling this problem is appreciated. If you are suggesting alternatives, I am restricted to .NET or MS based tools/frameworks, but would love to hear about other options as well.
I will try to answer or give more detail if you want to clarify or understand something I didn't make clear.
First, 1 million records/messages is not the issue, but you can make it a problem by handling it poorly.
Here's the pattern I would lay out first.
Load the records into SQL Server with SSIS. This will be very fast.
Process/drain the records into your BizTalk app for...well, whatever needs to be done. Calling the service etc.
Update the SQL Record with the result.
When that process is complete, query out the Yes and No batches as one (large) message each, transform and send.
My guess is the Web Service will be the bottleneck unless it's specifically designed for such a load. You will probably have to tune BizTalk to throttle only when necessary but don't worry about that just yet. A good app pattern is more important.
In such scenarios, you should consider the following approach:
De-batch the file and store the individual records in MSMQ. You can easily achieve this without any extra coding effort; all you need is to create a send port using the MSMQ adapter or WCF-Custom with the netMsmqBinding. If required, you can also create separate queues depending on different criteria you may have in your messages.
Receive the messages from MSMQ using receive location on a separate host.
Send them to web service on a different BizTalk host.
Try using messaging-only scenarios; you can handle the service response using a pipeline component if required, and you can use a map on the send port itself. In the worst case, if you need an orchestration, it should only handle the processing of one message, without any complex pattern.
You can again push the messages back to two MSMQ queues for the two different agencies, based on the web service response.
You can then receive those messages again and write them to file. You can simply use a send port with the FileAppend option, or use a custom pipeline component to write the received messages to file without aggregating them in an orchestration. You can gather them in an orchestration if you don't have more than a few thousand messages per file.
With this approach you won't have any bottleneck within BizTalk, and you don't need complex orchestration patterns, which usually end up having many persistence points.
If the web service becomes a bottleneck, then you can control the rate at which messages are received from MSMQ using 1) Ordered Delivery on the MSMQ receive location and, if required, 2) BizTalk host throttling, by changing two properties: setting Message Count in Db to a very low number (e.g. 1000 instead of the 50K default) and increasing Spool and Tracking Data Multiplier accordingly (e.g. 500 instead of the 10 default), so that the product of the two numbers is high enough not to cause throttling due to messages within BizTalk. You can also reduce the number of worker threads on the BizTalk host to slow it down a little.
Please note that MSMQ is part of the Windows OS and does not require any additional setup. It is usually installed by default; if not, you can add it using add/remove features. You can also use IBM MQ if your organization has the infrastructure. But for one million messages, MSMQ will be just fine.
Apologies for the late update.
We've decided to use SSIS to bulk import the file into a table. Since the lookup web service is part of the same organization and network (although using a different stack), its team has agreed to let us query the lookup table that their web service is based on. We are using a 'merge' between those tables to identify 'Y' or 'N' and exporting the results via SSIS as well.
In short, we've skipped using BT. It now takes a couple of minutes to process a 1.5 million record file and send the split files.
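The set-based idea behind that update can be sketched roughly as follows. This is not the SSIS package itself: it is a Python illustration using in-memory SQLite, and the table names (incoming, lookup) and the column partner_key are hypothetical. A single join flags every record as 'Y' or 'N' instead of making one web service call per record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE incoming (record_id INTEGER PRIMARY KEY, partner_key TEXT);
    CREATE TABLE lookup (partner_key TEXT PRIMARY KEY);
    """
)
# Hypothetical sample data standing in for the bulk-imported file.
conn.executemany(
    "INSERT INTO incoming (partner_key) VALUES (?)",
    [("A1",), ("B2",), ("C3",)],
)
conn.executemany("INSERT INTO lookup (partner_key) VALUES (?)", [("A1",), ("C3",)])

# One set-based pass: a record is 'Y' when its key exists in the lookup table.
flags = conn.execute(
    """
    SELECT i.record_id,
           CASE WHEN l.partner_key IS NULL THEN 'N' ELSE 'Y' END AS flag
    FROM incoming AS i
    LEFT JOIN lookup AS l ON l.partner_key = i.partner_key
    ORDER BY i.record_id
    """
).fetchall()

# Split into the two output batches.
yes_batch = [record_id for record_id, flag in flags if flag == "Y"]
no_batch = [record_id for record_id, flag in flags if flag == "N"]
```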
Appreciate all the advice provided here.

SqlDependency vs SQLCLR call to WebService

I have a desktop application which should be notified on any table change. I found only two solutions which fit my case well: SqlDependency and SQLCLR. (I would like to know if there is something better in the .NET stack.) I have built both structures and made them work. I am only able to compare the duration of a single response from SQL Server to the client.
SqlDependency
Duration: from 100ms to 4 secs
SQLCLR
Duration: from 10ms to 150ms
I would like this structure to be able to deal with a high rate of notifications*. I have read a few SO and blog posts (eg: here) and was also warned by a colleague that under mass requests SqlDependency may go wrong. Here, MS offers something which I didn't get, and which may be another solution to my problem.
*:Not all the time but for a season; 50-200 requests per sec on 1-2 servers.
On the basis of a high rate of notifications and in parallel with performance, which of these two should I go on with, or is there another option?
Neither SqlDependency (i.e. Query Notifications) nor SQLCLR (i.e. call a Web Service via a Trigger) is going to work for that volume of traffic (50-200 req per sec). And in fact, both options are quite dangerous at those volumes.
The options given in both linked pages (the one on SoftwareEngineering.StackExchange.com and the TechNet article) are all much better. The advice on Best way to get push notifications to server from ms sql database (i.e. custom queue table that is polled every few seconds) is very similar to option #1 of the Planning for Notifications TechNet article (which uses Service Broker to handle the processing of the queue).
I like the queuing idea (fully custom or using Service Broker) the best and have used fully custom queues on highly transactional systems (easily the volume you are anticipating) with much success. The pros and cons between these two options (as I see them, of course) are:
Service Broker
Pro: Existing (and proven) framework (can scale and tied into Transactions)
Con: not always easy to configure or administer / debug, can't easily aggregate 200 individual events in 1 second into a single message (will still be 1 message per each Trigger event)
Fully custom queue
Pro: can aggregate many simultaneous trigger events into single "message" to client (i.e. polling service picks up whatever changes happened since last polling), can make use of Change Tracking / Change Data Capture as the source of "what changed" so you might not need to build a queue table.
Con: Is only as scalable as you are able to make it (might be as good, or better, than Service Broker, but highly dependent on your skill and experience to achieve this), needs thorough testing of edge cases to make sure the queue processing doesn't miss, or double-count, events.
You might be able to combine Service Broker with Change Tracking / Change Data Capture. If there is an easy-enough way to determine the last change processed (change as noted in Change Tracking / Change Data Capture table(s)), then you can set up a SQL Server Agent job to poll every few seconds, and if you find that new changes have come in, then grab all of those changes into a single message to send to Service Broker.
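To make the "aggregate many trigger events into a single message" idea concrete, here is a minimal sketch of a fully custom queue poller in Python. SQLite stands in for SQL Server, and the table change_queue, its columns, and the function drain_changes are all hypothetical names chosen for the example. It does not address the edge cases mentioned above (for instance, rows that commit out of order), which would need separate testing.

```python
import sqlite3

def drain_changes(conn, last_version):
    """Bundle every change after last_version into one message.

    Returns (message, new_last_version); message is None when nothing changed.
    """
    rows = conn.execute(
        "SELECT version, table_name, row_key FROM change_queue "
        "WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    if not rows:
        return None, last_version
    message = {
        "from_version": rows[0][0],
        "to_version": rows[-1][0],
        "changes": [(table_name, row_key) for _, table_name, row_key in rows],
    }
    return message, rows[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE change_queue ("
    "version INTEGER PRIMARY KEY AUTOINCREMENT, table_name TEXT, row_key INTEGER)"
)
# Simulate 200 trigger events arriving between two polls.
conn.executemany(
    "INSERT INTO change_queue (table_name, row_key) VALUES (?, ?)",
    [("orders", key) for key in range(200)],
)

message, last_version = drain_changes(conn, 0)             # one message, 200 changes
idle_message, last_version = drain_changes(conn, last_version)  # nothing new
```

The client then receives one notification per polling interval rather than one per changed row.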
Some documentation to get you started:
Track Data Changes (covers both Change Tracking and Change Data Capture)
SQL Server Service Broker

Service bus queue in e-commerce application

We have an e-commerce application running on MS SQL.
Every now and then we have a flash sale, and once we start inserting all the orders into the database, our site's performance drops. We have it at the point where we can insert about 1,500 orders in a minute, but the site hangs for a few minutes after that. The site only hangs once the inserts start happening.
I have been looking into using Azure Service Bus queues mixed with SignalR to manage the order process, as this was suggested to me a while back. The way I see it happening is (broad overview):
Client calls a procedure on the server which inserts an order into a queue.
Client gets notified that they are in a queue.
We have a worker process which processes the order from the queue and inserts it into the database.
Server then notifies the client that the order is processed and moves them onto the payment page.
I am new to SignalR and queues in general so my questions are:
Will queues actually have a performance benefit? If so, why?
Are queues even the correct thing to use in this instance?
The overview you mention makes sense. It seems like you should be able to do it without SignalR, since Service Bus will let you know once it has successfully inserted the message into the queue.
It is not that queues give you better performance for 1 request. Messages placed onto the queue will be stored until you are ready to process them. By doing this you will not suffer "peak" issues, and you will be able to receive from the queue at a speed that you know your system is able to sustain (maybe 500 orders/minute or whatever number works for you).
So they will give you a much more stable latency per request without bringing down your system.
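The load-levelling effect can be illustrated with a small simulation. This is only a sketch in plain Python with an in-memory deque; it does not use the Azure Service Bus SDK, and the figures (a burst of 1,500 orders, 500 orders per minute) are taken from the numbers mentioned in this thread as examples.

```python
from collections import deque

def drain(queue, per_tick):
    """Process at most per_tick orders and return the ones handled."""
    handled = []
    while queue and len(handled) < per_tick:
        handled.append(queue.popleft())
    return handled

# A flash-sale burst: 1,500 orders land on the queue almost at once.
orders = deque(range(1500))

# The worker only ever takes what the database can sustain per minute.
processed_per_minute = []
while orders:
    processed_per_minute.append(len(drain(orders, 500)))
```

The burst is absorbed by the queue, and the database sees a steady 500 inserts per minute over three minutes instead of 1,500 in one.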

Database time access in Heroku with Play Framework

I am having a problem and I need your help.
I am working with Play Framework v1.2.4 in Java, and my server is deployed on Heroku.
Everything works fine and I can access my databases, but I am experiencing trouble when I do a couple of saves to the database.
I have a method that stores data in the database many times and then sends a notification to a mobile phone. My problem is that the notification arrives before the database has finished saving the data: when it arrives, I request the updated data from the server, and it returns the data without the last update. If I try again after a few seconds, the data shows correctly, so I think there is a timing problem with database access.
The idea would be that the server sends the notification once the database has finished saving the data.
I don't know if this is caused by my using the free version of Heroku, but I want to be sure before purchasing a plan.
In general, requests to cloud databases are slower than the same requests on your local machine. Even a simple query that needs just 0.0001 sec on your computer can be as slow as 0.5 sec in the cloud. The reason is simple: cloud providers use shared databases plus (geo) replication, which just cannot be compared to a database accessed by only one program on the same machine.
Also keep in mind that free Heroku DB plans don't offer ANY database cache, which means that every query is fetched from the cloud directly.
As we don't know your application, it's hard to say what the bottleneck is. Anyway, you almost certainly have at least 3 ways to solve your problem. They are not alternatives; you will probably need to use (or at least check) all of them.
Risk a basic plan and see how things change with the paid version; maybe it will be good enough for you, maybe not.
Redesign your application to make fewer queries. For example, instead of sending 10 queries to select 10 different rows, send one query which selects all 10 records at once.
Use Play's cache API to avoid selecting the same set of data again and again. For example, if you have some categories which change rarely, but you need the category tree for each article, you don't need to fetch the categories from the DB every time. Instead you can store a List of categories in the cache, so you need only one request to fetch the article's content (which can be cached for some short time as well...).
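The "one query instead of ten" suggestion might look roughly like this. The sketch is in Python with in-memory SQLite rather than Play/Java, and the table article and the helper fetch_articles are hypothetical; the point is only the shape of the query.

```python
import sqlite3

def fetch_articles(conn, ids):
    """Fetch all requested rows in a single round trip using IN (...)."""
    ids = list(ids)
    if not ids:
        return []
    # One "?" placeholder per id keeps the query parameterized.
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, title FROM article WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO article (id, title) VALUES (?, ?)",
    [(i, f"Article {i}") for i in range(1, 21)],
)

wanted = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
rows = fetch_articles(conn, wanted)  # one query, ten rows
```

Against a remote database, where each round trip carries network latency, this replaces ten round trips with one.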

Message Queue or DataBase insert and select

I am designing an application and I have two ideas in mind (below). I have a process that collects approx. 30 KB of data every 5 minutes, and this data needs to be updated on the client (web side, 100 users at any given time). The information collected does not need to be stored for future use.
Options:
I can get data and insert into database every 5 minutes. And then client call will be made to DB and retrieve data and update UI.
Collect data and put it into Topic or Queue. Now multiple clients (consumers) can go to Queue and obtain data.
I am looking at option 2 as the better solution because it is faster (no DB calls) and has no redundancy of storage.
Can anyone suggest which would be the ideal solution and why?
I don't really understand the difference. The data has to be temporarily stored somewhere until the next update, right?
But all users can see it, not just the first person to get there, right? So a queue is not really an appropriate data structure from my interpretation of your system.
Whether the data is written to something persistent like a database or something less persistent like part of the web server or application server may be relevant here.
Also, you have tagged this as real-time, but I don't see how the web clients are getting updates in real time without some kind of push/long-poll or whatever.
Seems to me that you need to use a queue and publisher/subscriber pattern.
This is an article about RabbitMQ and the Publish/Subscribe pattern.
I can get data and insert into database every 5 minutes. And then client call will be made to DB and retrieve data and update UI.
You can program your application to be event oriented. For example, raise domain events and publish your message to your subscribers.
When you use a queue, each subscriber will dequeue the messages addressed to it and, of course, in order (FIFO). In addition, there is a guarantee of delivery, unlike a database, where a record can be deleted before every 'subscriber' has gotten the message.
The pitfalls of using the database to accomplish this are:
Creation of indexes makes querying faster, but inserts slower;
Will have to control the delivery guarantee for every subscriber;
You'll need TTL (Time to Live) strategy for the records purge (considering delivery guarantee);
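The difference between a plain queue and publish/subscribe can be shown with a tiny in-memory sketch. This is plain Python, not RabbitMQ or any real broker, and the Topic class with its methods is a hypothetical illustration: each subscriber gets its own FIFO queue, so every subscriber sees every published message in order.

```python
from collections import deque

class Topic:
    """Minimal in-memory publish/subscribe topic (illustrative only)."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name):
        # Each subscriber gets a private FIFO queue.
        self._queues[name] = deque()

    def publish(self, message):
        # Fan out: copy the message to every subscriber's queue.
        for queue in self._queues.values():
            queue.append(message)

    def receive(self, name):
        # Return the oldest undelivered message, or None if there is none.
        queue = self._queues[name]
        return queue.popleft() if queue else None

topic = Topic()
topic.subscribe("client-1")
topic.subscribe("client-2")

topic.publish("snapshot 12:00")
topic.publish("snapshot 12:05")

client_1 = [topic.receive("client-1"), topic.receive("client-1")]
client_2 = [topic.receive("client-2"), topic.receive("client-2")]
nothing_left = topic.receive("client-1")
```

With a single shared queue, by contrast, the first consumer to dequeue a snapshot would remove it for everyone else.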

Resources