As per title, assume I have a WCF web service that, at specific times of the day, will encounter huge amounts of traffic/calls. Each call to the web service will invoke a write to a database and a read from the same database.
Technically, what must I take into consideration (if anything) with regards to protecting the database from any unwanted effects from reading and writing to it very frequently?
Also, must the WCF web service be coded/structured any differently with this in mind?
There are 2 basic approaches:
Throttle the input. Use WCF's built-in throttling to reduce the load on your database (a configuration sketch follows below).
Put a layer of protection between the WCF calls and your database, for example a queue to absorb bursts of incoming writes and a cache to protect against excessive reads.
Which one you choose will depend on your situation. The first is very cheap to implement since it is just a configuration change, but users of the WCF service may notice that they are being throttled. The second is better for the users of the service, but much more expensive to implement.
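To illustrate the first approach, here is a minimal sketch of the relevant configuration, assuming the service is hosted with a standard web.config/app.config; the limits shown are placeholder values you would tune to what your database can sustain:

    <system.serviceModel>
      <behaviors>
        <serviceBehaviors>
          <behavior name="ThrottledBehavior">
            <!-- Placeholder limits: tune to what your database can handle at peak -->
            <serviceThrottling maxConcurrentCalls="16"
                               maxConcurrentSessions="100"
                               maxConcurrentInstances="116" />
          </behavior>
        </serviceBehaviors>
      </behaviors>
      <!-- Reference the behavior from your <service> element via
           behaviorConfiguration="ThrottledBehavior" -->
    </system.serviceModel>

Calls beyond maxConcurrentCalls are queued by WCF rather than rejected, which is why callers may notice the service responding more slowly at peak times.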
My colleague argues that opening a single database connection for an application is much better and faster than opening and closing connections through a pool.
He has an ApplicationStart method where he initializes Application('db') and keeps this connection alive across the app. The app mostly contains read-only data.
How can I persuade him?
That depends a lot on what the "application" here is. If this is a client application that works on a single thread and does things sequentially, then frankly there won't be any noticeable difference either way. In that scenario, if you use the pool it will basically be a pool of 1 item, and opening a connection from the pool will be virtually instantaneous (and certainly not noticeable compared to network IO). In that scenario I would still say use the inbuilt pooling, as it will avoid assumptions when you change scenario.
However, if your application uses more than one thread, or does more than one thing at a time via any other mechanism (async etc.), then using a single connection would be very bad: either it will outright fail, or you will need to synchronize around the connection, which would limit you severely. Note that any server-side application (any kind of web application, WCF service, SOAP service, or socket service) would react very badly to his idea.
Perhaps the main way to convince him is simply: ask him to prove it. Ask for a repeatable test / demonstration that shows this difference.
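If it helps that demonstration, here is a minimal sketch of the open-late/close-early pattern the pool is designed for, using plain ADO.NET; the repository class, connection string and query are placeholders for whatever the application actually does:

    using System.Data.SqlClient;

    public class ProductRepository
    {
        private readonly string _connectionString; // pooling is enabled by default

        public ProductRepository(string connectionString)
        {
            _connectionString = connectionString;
        }

        public int CountProducts()
        {
            // Open a connection per operation and dispose it immediately.
            // "Closing" only returns the physical connection to the pool,
            // so the next Open() reuses it instead of doing a new handshake.
            using (var connection = new SqlConnection(_connectionString))
            using (var command = new SqlCommand("SELECT COUNT(*) FROM Products", connection))
            {
                connection.Open();
                return (int)command.ExecuteScalar();
            }
        }
    }

Each thread gets its own pooled connection this way, which is exactly the scenario where a single shared connection falls over.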
I have a CRUD webservice, and have been tasked with trying to figure out a way to ensure that we don't lose data when the database goes down. Everyone is aware that if the database goes down we won't be able to get "reads" but for a specific subset of the operations we want to make sure that we don't lose data.
I've been given the impression that this is something that is covered by services like 0MQ, RabbitMQ, or one of the Microsoft MQ services. Although after a few days of reading and research, I'm not even certain that the messages we're talking about in MQ services include database operations. I am however 100% certain that I can queue up as many hello worlds as I could ever hope for.
If I can use a message queue for adding a layer of protection to the database, I'd lean towards Rabbit (because it appears to persist through crashes) but since the target is a Microsoft SQL Server database, perhaps one of their solutions (such as SQL Service Broker, or MSMQ) is more appropriate.
The real fundamental question that I'm not yet sure of though is whether I'm even playing with the right deck of cards (so to speak).
With the desire for a high-availability webservice that continues to function if the database goes down, does it make sense to put a RabbitMQ instance "between" the webservice and the database? Maybe the right link in the chain is to have RabbitMQ send messages to the webserver?
Or is there some other solution for achieving this? There are a number of loose ideas at the moment around finding a way to roll up weblogs in the event of a database outage or something... but we're still in early enough stages that (at least I) have no idea what I'm going to do.
Is message queue the right solution?
Introducing message queuing in between a service and its database operations is certainly one way of improving service availability. Writing to a local temporary queue in a store-and-forward scenario will always be more available than writing to a remote database server, simply by being a local operation.
Additionally by using queuing you gain greater control over the volume and nature of database traffic your database has to handle at peak. Database writes can be queued, routed, and even committed in a different order.
However, in order to do this you need to be aware that database writes are now processed off-line. Even when this happens almost instantaneously, you lose a benefit that the synchronous nature of your current service gives you: your service consumers can always know whether the database write succeeded.
I have written about this subject before here. The user posting the question had similar concerns to you. Whether you do this or not is a decision you have to make based on whether this is something your consumers care about or not.
As for the technology stacks you are thinking of, this off-line model can be implemented with pretty much any of them, with the possible exception of Service Broker, which doesn't integrate well with code (see my answer here: https://stackoverflow.com/a/45690344/569662).
If you're using Windows and unlikely to need to migrate, I would go for MSMQ (which supports durable messaging via transactional queues) as it's lightweight and part of Windows.
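As a rough sketch of what the MSMQ route looks like in code (System.Messaging, durable/transactional queue); the queue path and the OrderDto message type are assumptions for illustration, not something from your service:

    using System.Messaging;

    public static class WriteQueue
    {
        private const string QueuePath = @".\private$\pendingWrites"; // hypothetical local queue

        public static void Enqueue(OrderDto order)
        {
            // A transactional queue is durable: messages survive a crash or
            // reboot and are only removed when the receiving transaction commits.
            if (!MessageQueue.Exists(QueuePath))
                MessageQueue.Create(QueuePath, transactional: true);

            using (var queue = new MessageQueue(QueuePath))
            {
                queue.Send(order, MessageQueueTransactionType.Single);
            }
        }
    }

    public class OrderDto
    {
        public int OrderId { get; set; }
        public decimal Amount { get; set; }
    }

A background process would then receive from this queue inside a MessageQueueTransaction and perform the actual SQL write once the database is reachable again, which is what keeps the "write" path available during an outage.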
I have a simple web-based application scenario: send a request and get a response from the database. A response can contain a very large number of rows, say around 10,000 to 20,000 records at a time.
I have designed audit logging for all transactions, i.e. inserting into the database for all such responses, again 10,000 to 20,000 rows at a time.
Since inserting into the table is just for auditing purposes, can I have some way to separate auditing and logging from the normal response? Some way to differentiate them?
Any help on design would be highly appreciable.
Thanks in Advance.
In general, it's a bad idea for a web application to do too much work in a synchronous web request. Web servers (and web application servers) are designed to serve lots of concurrent requests, but on the assumption that each request will take just milliseconds to execute. Different servers have different threading strategies, but as soon as you have long-running requests, you're likely to encounter an overhead due to thread management, and you can then very quickly find your web server slowing down to the point of appearing broken.
Reading or writing tens of thousands of rows in a single web request is almost certainly a bad idea. You probably want to design your application to use asynchronous worker queues. There are several solutions for this; in the Java ecosystem, you could check out vert.x.
In these asynchronous models, auditing is straightforward - your auditor subscribes to the same message queue as the "write to database" listener.
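The shape of that is roughly the following; this is a deliberately simplified in-process sketch (shown in C# to match the rest of this page rather than vert.x, with hypothetical WriteToDatabase/WriteAudit handlers), whereas a real broker would make the audit consumer a separate subscriber:

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    public class ResponseProcessor
    {
        private readonly BlockingCollection<RowBatch> _queue = new BlockingCollection<RowBatch>();

        public ResponseProcessor()
        {
            // A background worker drains the queue; each batch goes to both
            // the normal write path and the audit path, so auditing never
            // adds latency to the web request itself.
            Task.Run(() =>
            {
                foreach (var batch in _queue.GetConsumingEnumerable())
                {
                    WriteToDatabase(batch); // hypothetical: the real response persistence
                    WriteAudit(batch);      // hypothetical: the audit insert
                }
            });
        }

        // Called from the web request: returns as soon as the batch is queued.
        public void Submit(RowBatch batch) => _queue.Add(batch);

        private void WriteToDatabase(RowBatch batch) { /* ... */ }
        private void WriteAudit(RowBatch batch) { /* ... */ }
    }

    public class RowBatch { /* the rows to persist and audit */ }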
Check out log4j2 for separating auditing and logging.
This is easily done by having two appenders in log4j2.xml itself.
For reference visit:
https://logging.apache.org/log4j/2.x/manual/appenders.html
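For example, a minimal log4j2.xml along these lines (the logger name "audit" and the file names are placeholders) sends everything logged through the audit logger to its own file, while the rest of the application logs normally:

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN">
      <Appenders>
        <File name="AppLog" fileName="logs/application.log">
          <PatternLayout pattern="%d{ISO8601} %-5level %logger{36} - %msg%n"/>
        </File>
        <File name="AuditLog" fileName="logs/audit.log">
          <PatternLayout pattern="%d{ISO8601} %msg%n"/>
        </File>
      </Appenders>
      <Loggers>
        <!-- additivity="false" keeps audit entries out of the application log -->
        <Logger name="audit" level="info" additivity="false">
          <AppenderRef ref="AuditLog"/>
        </Logger>
        <Root level="info">
          <AppenderRef ref="AppLog"/>
        </Root>
      </Loggers>
    </Configuration>

In code you would then obtain the audit logger by name (LogManager.getLogger("audit")) and keep using the usual per-class loggers everywhere else.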
We have an affiliate system which counts millions of banner Impressions/Clicks per day.
Currently it writes every Impression/Click to SQL in real time, as each request comes in.
Web application serves these requests.
We are facing two problems:
If we have a lot of concurrent requests per second, SQL starts to work very hard to insert the Impressions/Clicks data, which leads to problem #2.
If SQL is slow at that moment, requests accumulate and wait in a queue on the web server. As a result the web application slows down and requests are not being processed.
The design we are considering, at a high level:
We are thinking of changing the design by taking the write-to-SQL logic out of the web application (writing to some local storage instead) and making a standalone service which will read from local storage and eventually write the aggregated Impressions/Clicks data (not in real time) to SQL in the background.
Our constraints:
10 web servers (load balanced)
1 SQL server
What do you think of suggested design?
Would you use NoSQL as local storage for each web server?
Suggest your alternative.
Your problem seems to be that your front-end code is synchronously blocking while waiting for the back-end code to update the database.
Decouple front-end and back-end, e.g. by putting a queue in between, where the front-end can write to the queue with low latency and high throughput. The back-end can then take its time to process the queued data into their destinations.
It may or may not be necessary to make the queue restartable (i.e. not losing data after a crash). Depending on this, you have various options:
In-memory queue, speedy but not crash-proof (a sketch of this option follows the list below).
Database queue, makes sense if writing the raw request data to a simple data structure is faster than writing the final data into its target data structures.
Redundant queues, to cover for crashes.
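A minimal sketch of the in-memory option, assuming each web server buffers impression events locally and a background task flushes aggregated counts to SQL; names like ImpressionBuffer and FlushToSql are illustrative, not from the question:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public class ImpressionBuffer
    {
        private readonly BlockingCollection<ImpressionEvent> _queue = new BlockingCollection<ImpressionEvent>();

        public ImpressionBuffer()
        {
            Task.Run(() => FlushLoop());
        }

        // Called from the request path: cheap, non-blocking, never touches SQL.
        public void Record(ImpressionEvent e) => _queue.Add(e);

        private void FlushLoop()
        {
            var counts = new Dictionary<int, int>(); // bannerId -> impression count
            var pending = 0;
            foreach (var e in _queue.GetConsumingEnumerable())
            {
                counts.TryGetValue(e.BannerId, out var n);
                counts[e.BannerId] = n + 1;
                pending++;

                if (pending >= 1000) // arbitrary flush threshold
                {
                    FlushToSql(counts); // hypothetical: one aggregated batch write
                    counts.Clear();
                    pending = 0;
                }
            }
        }

        private void FlushToSql(Dictionary<int, int> counts) { /* batched INSERT/UPDATE */ }
    }

    public class ImpressionEvent
    {
        public int BannerId { get; set; }
        public DateTime Timestamp { get; set; }
    }

A real implementation would also flush on a timer, and would need one of the restartable options above if losing a few seconds of counts on a crash is unacceptable.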
I'm with Bernd, but I'm not sure about using a queue specifically.
All you need is something asynchronous that you can call; that way the act of logging the impression costs the request itself next to nothing.
I'm building a mobile application in VB.NET (compact framework), and I'm wondering about the best way to approach the potential offline interactions on the device. Basically, the devices have cellular and 802.11, but may still be offline (where there's poor reception, etc). A driver will scan boxes as they leave his truck, and I want to update the new location - immediately if there's network signal, or queued if it's offline and handled later. It made me think, though, about how to handle offline-ness in general.
Do I cache as much data to the device as I can so that I can use it when it's offline - essentially, each device would have a copy of the (relevant) production data on it? Or is it better to disable certain functionality when it's offline, so as to avoid the headache of synchronization later? I know this is a pretty specific question that depends on my app, but I'm curious to see if others have taken this route.
Do I build the application itself to act as though it's always offline, submitting everything to a local queue of sorts that's owned by a local class (essentially abstracting away the online/offline thing), and then have the class submit things to the server as it can? What about data lookups - how can those be handled in a "Semi-live" fashion?
Or should I have the application attempt to submit requests to the server directly, in real time, and handle it if the request itself fails? I can see a potential problem of making the user wait for the timeout, but is this the most reliable way to do it?
I'm not looking for a specific solution, but really just stories of how developers accomplish this with the smoothest user experience possible, with a link to a how-to or heres-what-to-consider or something like that. Thanks for your pointers on this!
We can't give you a definitive answer because there is no "right" answer that fits all usage scenarios. For example if you're using SQL Server on the back end and SQL CE locally, you could always set up merge replication and have the data engine handle all of this for you. That's pretty clean. Using the offline application block might solve it. Using store and forward might be an option.
You could store locally and then roll your own synchronization with a direct connection, web service or WCF service used when a network is detected. You could use MSMQ for delivery.
What you have to think about is not what the "right" way is, but how your implementation will affect application usability. If you disable features due to lack of connectivity, is the app still usable? If you have stale data, is that a problem? Maybe some critical data needs to be transferred when you have GSM/GPRS (which typically isn't free) and more would be done when you have 802.11. Maybe you can run all day with lookup tables pulled down in the morning and upload only transactions, with the device tracking what changes it's made.
Basically it really depends on how it's used, the nature of the data, the importance of data transactions between fielded devices, the effect of data latency, and probably other factors I can't think of offhand.
So the first step is to determine how the app needs to be used, then determine the infrastructure and architecture to provide the connectivity and data access required.
I haven't used it myself, but have you looked into the "store and forward" capabilities of the CF? It may suit your needs. I believe it uses an Exchange mailbox as a message queue to send SOAP packets to and from the device.
The best way to approach this is to always work offline, then use message queues to handle sending changes to and from the device. When the driver marks something as delivered, for example, update the item as delivered in your local store and also place a message in an outgoing queue to tell the server it's been delivered. When the connection is up, send any queued items back to the server and get any messages that have been queued up from the server.
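A rough sketch of that outgoing-queue idea, under the assumption of some local store on the device; the ILocalStore and IServerProxy types here are stand-ins for whatever storage and transport you pick (SQL CE, a web service, MSMQ, etc.), not Compact Framework APIs:

    using System.Collections.Generic;

    public class DeliveryTracker
    {
        private readonly ILocalStore _local;    // hypothetical on-device storage (e.g. SQL CE)
        private readonly IServerProxy _server;  // hypothetical web service / WCF proxy
        private readonly Queue<OutgoingMessage> _outbox = new Queue<OutgoingMessage>();

        public DeliveryTracker(ILocalStore local, IServerProxy server)
        {
            _local = local;
            _server = server;
        }

        // Always work offline first: update local state, then queue the change.
        public void MarkDelivered(string packageId)
        {
            _local.SetDelivered(packageId);
            _outbox.Enqueue(new OutgoingMessage { Type = "Delivered", PackageId = packageId });
        }

        // Called whenever connectivity is detected: drain the outbox, pull updates.
        public void Synchronize()
        {
            while (_outbox.Count > 0)
            {
                var message = _outbox.Peek();
                _server.Send(message);   // if this throws, the message stays queued
                _outbox.Dequeue();
            }
            foreach (var update in _server.GetPendingUpdates())
                _local.Apply(update);
        }
    }

    public interface ILocalStore
    {
        void SetDelivered(string packageId);
        void Apply(ServerUpdate update);
    }

    public interface IServerProxy
    {
        void Send(OutgoingMessage message);
        IEnumerable<ServerUpdate> GetPendingUpdates();
    }

    public class OutgoingMessage { public string Type; public string PackageId; }
    public class ServerUpdate { }

In practice the outbox itself would be persisted locally as well, so queued changes survive an application restart on the device.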