I working on a REST webservice that offers a method to insert data into a database.
The problem is, how can I make sure the webservice only commits the data if the client knows about it?
The client might lose connection after sending the data, but the webservice commits it to the database anyway. The client can never know if this was successful and will try again next time, basically trying to insert the same data again.
So I was thinking about having a second method that does the actual commit, but this seems to leave me with the same problem? Although it might be less likely to occur as there is not much data being send.
Is there even a way to be 100% sure the client knows it has been inserted into the database, or do I have to make sure it cannot be inserted twice?
Is there even a way to be 100% sure the client knows it has been inserted into the database,
No. Different network protocols can reduce the incidence of this. But it's a fundamental behavior of distributed systems. Even if the client connects directly to the database, a network failure after a commit is still possible.
So, yes you "have to make sure it cannot be inserted twice". Typically this is done with a unique index so that a subsequent insert will fail.
In my opinion, the easiest way to make sure the client knows is to send notifications to him. You might be sending all kinds of notifications depending on your use case - Emails, SMS, in app notifications, PWA notifications and etc. So, when you do the insert, whether is was successfull or not, you can inform the client via notifications about the result of the insert. Even in the case of the client being offline, the notifications will still be delivered when he comes online.
Related
The scenario is needing to write high volume data, like tracking clicks or mouse movements, from a web application to a SQL database. The data doesn't need to be written right away because the analysis on the data happens on some recurring basis, like daily or weekly.
I want some feedback on a solution that comes to mind:
The click and mouse data is published to a message queue. This stores the queue items in memory so it should be fast and faster than SQL. Then on some other server a job plugs away on retrieving the next queue item and writing the data to SQL.
Does anyone know of implementations like this? What pitfalls am I failing to see? If this solution is not a good one are there other alternatives?
Regards
RabbitMQ is meant for real time message exchange and not for temporary buffering data. If you are able to consume all data as soon as it arrives in your queues, then this solution will work for you. Otherwise RabbitMQ will grow in memory and eventually die. Then you will have to configure it to throw some data away (there are a lot of options to choose rules for this).
You could possibly store data in Redis cache, you can do it as fast as you publish your events to RabbitMQ. Then you can listen to the new changes in Redis from remote server and fill up whatever database storage you use, or even use it as your data storage.
To solve a very similar problem I was considering doing exactly this. In the end we decided not to go for it because we did need access to the data very quickly. However I still like the idea.
Ive also recently learnt that under the hood this is exactly the way that Microsofft Dynamics CRM does its database updates, using message passing.
Things I think you would need to pay careful attention to.
Make sure that if your RabbitMQ instance disappeared it wouldnt have any affect on your client. Rabbit dying is bad enough, your client erroring because Rabbit is down would be terrible.
If it's truly very high volume (and its good practice for reliability anyway) clustering is something worth looking at.
Obviously paying attention to your deadletter queues is a must. But the ability to play back messages which failed for some reason is awesome, in theory at least your data should eventually always get to you database. Even if it went down for a period of time.
Make sure you can keep up with the number of messages being passed in. Of course, this should be solvable by adding more consumer to a given queue. Which leads to...
Idempotency of messages. Given that your messages relate directly to a DB write, they HAVE to be idempotent.
I use PushSharp to send notifications for a few Apps.
PushSharp is great it really simplifies the work with push services, and I wonder what is the right way to work with it?
I haven't found examples/ explanations about that.
Now, when I have a message to send , I ...
create a PushSharp object
do a PushService.QueueNotification() for all devices
do a PushService.StopAllServices to send all queued messages
exits the method (and kill the PushService object).
Should I work this way, or keep this PushService object alive and call its methods when needed?
How should I use a PushService object to get the unregistered device ids? with a dedicated instance?
Any suggestion would be appreciated.
This is a question which frequently comes up.
The answer isn't necessarily one way or the other, but it depends on your situation. In most cases it would be absolutely fine to just create a PushBroker instance whenever you need it, since most platforms use HTTP based protocols for sending notifications. In the case of Apple, they state in their documentation that you should keep your connection to APNS open in order to minimize overhead of opening and closing secure connections.
However, in practice I think this means that they don't want you connecting and disconnecting VERY frequently (eg: they don't want you creating a new connection for every message you send). In reality, if you're sending batches of notifications every so often (let's say every 15 minutes or every hour) they probably won't have a problem with you opening a new connection for each batch and then closing it when done.
I've never heard of anyone being blocked from Apple's APNS servers for doing this. In fact in the very early days of working with push notifications, I had a bug that caused a new apns connection to be created for each notification. I sent thousands of notifications a day like this and never heard anything about it from Apple (eventually I identified it as a bug and fixed it of course).
As for collecting feedback, by default the ApplePushService will poll the feedback servers after 10 seconds of starting, and then every 10 minutes thereafter. If you want to disable this from happening you can simply set the ApplePushChannelSettings.FeedbackIntervalMinutes to <= 0. You can then use the FeedbackService class to poll for feedback whenever you need to, manually.
What is the best way to program an immediate reaction to an update to data in a database?
The simplest method I could think of offhand is a thread that checks the database for a particular change to some data and continually waits to check it again for some predefined length of time. This solution seems to be wasteful and suboptimal to me, so I was wondering if there is a better way.
I figure there must be some way, after all, a web application like gmail seems to be able to update my inbox almost immediately after a new email was sent to me. Surely my client isn't continually checking for updates all the time. I think the way they do this is with AJAX, but how AJAX can behave like a remote function call I don't know. I'd be curious to know how gmail does this, but what I'd most like to know is how to do this in the general case with a database.
Edit:
Please note I want to immediately react to the update in the client code, not in the database itself, so as far as I know triggers can't do this. Basically I want the USER to get a notification or have his screen updated once the change in the database has been made.
You basically have two issues here:
You want a browser to be able to receive asynchronous events from the web application server without polling in a tight loop.
You want the web application to be able to receive asynchronous events from the database without polling in a tight loop.
For Problem #1
See these wikipedia links for the type of techniques I think you are looking for:
Comet
Reverse AJAX
HTTP Server Push
EDIT: 19 Mar 2009 - Just came across ReverseHTTP which might be of interest for Problem #1.
For Problem #2
The solution is going to be specific to which database you are using and probably the database driver your server uses too. For instance, with PostgreSQL you would use LISTEN and NOTIFY. (And at the risk of being down-voted, you'd probably use database triggers to call the NOTIFY command upon changes to the table's data.)
Another possible way to do this is if the database has an interface to create stored procedures or triggers that link to a dynamic library (i.e., a DLL or .so file). Then you could write the server signalling code in C or whatever.
On the same theme, some databases allow you to write stored procedures in languages such as Java, Ruby, Python and others. You might be able to use one of these (instead of something that compiles to a machine code DLL like C does) for the signalling mechanism.
Hope that gives you enough ideas to get started.
I figure there must be some way, after
all, web application like gmail seem
to update my inbox almost immediately
after a new email was sent to me.
Surely my client isn't continually
checking for updates all the time. I
think the way they do this is with
AJAX, but how AJAX can behave like a
remote function call I don't know. I'd
be curious to know how gmail does
this, but what I'd most like to know
is how to do this in the general case
with a database.
Take a peek with wireshark sometime... there's some google traffic going on there quite regularly, it appears.
Depending on your DB, triggers might help. An app I wrote relies on triggers but I use a polling mechanism to actually 'know' that something has changed. Unless you can communicate the change out of the DB, some polling mechanism is necessary, I would say.
Just my two cents.
Well, the best way is a database trigger. Depends on the ability of your DBMS, which you haven't specified, to support them.
Re your edit: The way applications like Gmail do it is, in fact, with AJAX polling. Install the Tamper Data Firefox extension to see it in action. The trick there is to keep your polling query blindingly fast in the "no news" case.
Unfortunately there's no way to push data to a web browser - you can only ever send data as a response to a request - that's just the way HTTP works.
AJAX is what you want to use though: calling a web service once a second isn't excessive, provided you design the web service to ensure it receives a small amount of data, sends a small amount back, and can run very quickly to generate that response.
I'm building a mobile application in VB.NET (compact framework), and I'm wondering what the best way to approach the potential offline interactions on the device. Basically, the devices have cellular and 802.11, but may still be offline (where there's poor reception, etc). A driver will scan boxes as they leave his truck, and I want to update the new location - immediately if there's network signal, or queued if it's offline and handled later. It made me think, though, about how to handle offline-ness in general.
Do I cache as much data to the device as I can so that I use it if it's offline - Essentially, each device would have a copy of the (relevant) production data on it? Or is it better to disable certain functionality when it's offline, so as to avoid the headache of synchronization later? I know this is a pretty specific question that depends on my app, but I'm curious to see if others have taken this route.
Do I build the application itself to act as though it's always offline, submitting everything to a local queue of sorts that's owned by a local class (essentially abstracting away the online/offline thing), and then have the class submit things to the server as it can? What about data lookups - how can those be handled in a "Semi-live" fashion?
Or should I have the application attempt to submit requests to the server directly, in real-time, and handle it if it itself request fails? I can see a potential problem of making the user wait for the timeout, but is this the most reliable way to do it?
I'm not looking for a specific solution, but really just stories of how developers accomplish this with the smoothest user experience possible, with a link to a how-to or heres-what-to-consider or something like that. Thanks for your pointers on this!
We can't give you a definitive answer because there is no "right" answer that fits all usage scenarios. For example if you're using SQL Server on the back end and SQL CE locally, you could always set up merge replication and have the data engine handle all of this for you. That's pretty clean. Using the offline application block might solve it. Using store and forward might be an option.
You could store locally and then roll your own synchronization with a direct connection, web service of WCF service used when a network is detected. You could use MSMQ for delivery.
What you have to think about is not what the "right" way is, but how your implementation will affect application usability. If you disable features due to lack of connectivity, is the app still usable? If you have stale data, is that a problem? Maybe some critical data needs to be transferred when you have GSM/GPRS (which typically isn't free) and more would be done when you have 802.11. Maybe you can run all day with lookup tables pulled down in the morning and upload only transactions, with the device tracking what changes it's made.
Basically it really depends on how it's used, the nature of the data, the importance of data transactions between fielded devices, the effect of data latency, and probably other factors I can't think of offhand.
So the first step is to determine how the app needs to be used, then determine the infrastructure and architecture to provide the connectivity and data access required.
I haven't used it myself, but have you looked into the "store and forward" capabilities of the CF? It may suit your needs. I believe it uses an Exchange mailbox as a message queue to send SOAP packets to and from the device.
The best way to approach this is to always work offline, then use message queues to handle sending changes to and from the device. When the driver marks something as delivered, for example, update the item as delivered in your local store and also place a message in an outgoing queue to tell the server it's been delivered. When the connection is up, send any queued items back to the server and get any messages that have been queued up from the server.
I'm a beginning web developer sitting on an ambitious web application project.
So after having done some research, I found out about SQL Service Broker. It seems like something I could use, but I'm not sure. Since learning it requires someone to put in lots of time, I wanted to be sure that it would fit my needs.
I need to implement a system where website users can submit text to the website. This stream of messages has to be redundant and dealt with in a FIFO way, with on the other end of the stream another group of users dealing with the messages.
Now, a message that is being read by one of this last group of users, should be locked so that no-one else can read it at the same time. The user can then decide to handle the message or not. Only if he decides to deal with the message can it be deleted from the queue. If he decides that he doesn't want to deal with the message, the message should be put back in the queue (at the end of the queue, or at least with the highest priority), so that another user can read it and decide.
Is this something I would be able to implement with SQL Service Broker? Am I on the wrong track?
Thank you!
IMO, the best use of Service Broker is for connecting to independent Application in a loosely coupled way. What I mean by that is that systems tied in this way can communicate through a set of mutually agreed message types. This in contrast to one application manipulating directly the other's database, for example.
From what you've said, I would implement it as a simple table, for example: Create a message table with an identity PK, an Allocation flag and your custom columns. Whenever an operator wants to fetch the last message, get the lowest PK value for which Allocation = 'N' and update Allocation to 'Y'. This in a single transaction.
When/if the operation decides to return the message to queue, just set its AllocationFlag to 'N' and its back.
This is just an example. The database in this case is providing you consistency, heavy load performance, etc.
Behind the screens all data you submit to SSB is stored and manipulated as tables, so there is no reason for it to be necessarily faster than a database solution .