Implementing Comet on the database-side - database

This is more out of curiosity and "for future reference" than anything, but how is Comet implemented on the database-side? I know most implementations use long-lived HTTP requests to "wait" until data is available, but how is this done on the server-side? How does the web server know when new data is available? Does it constantly poll the database?

What DB are you using? If it supports triggers, which many RDBMSs do in some shape or form, then you could have the trigger fire an event that actually tells the HTTP request to send out the appropriate response.
Triggers remove the need to poll... polling is generally not the best idea.
PostgreSQL seems to have pretty good support (even PL/Python).

this is very much application dependent. The most likely implementation is some sort of messaging system.
Most likely, your server side code will consist of quite a few parts:
a few app servers that hansle incoming requests,
a (separate) comet server that deals with all the open connections to clients,
the database, and
some sort of messaging infrastructure
the last one, the messaging infrastructure is really the key. This provides a way for the app servers to talk to the comet server. So when a request comes in the app server will put a message into the message queue telling the comet server to notify the correct client(s)
How messaging is implemented is, again, very much application dependent. A very simple implementation would just use a database table called messages and poll that.
But depending on the stack you plan on using there should be more sphisticated tools available.
In Rails I'm using Juggernaut which simply listens on some network port. Whenever there is data to send the Rails Application server opens a connection to this juggernaut push server and tells it what to send to the clients.

Related

Is Socket.io Ideal for chat module

I am working on an Angularjs and Node.js based application. This is an organization based application. In this app, I have to implement chat functionality. So as we all know Socket.io is the best solution for instant messaging app and its reliability. But apart from this, I have few doubts regarding Socket.io. As of my understanding when we use socket programming (Socket.io in my case), for each and every connection it reserves a port. What if the size of an organization is too big? Will it work? At the server side, I am using Express js. Will Socket.io creates extra load on the server?
Should I go with Socket.io or HTTP?
Thanks.
HTTP polling for any sort of interactive timing is enormously inefficient. You will have tens of thousands of clients repeatedly asking your server, "do you have anything new for me?" and the server regularly responding "no, nothing yet".
webSockets (which socket.io uses as the transport) were invented precisely because they are more efficient for two way, interactive communication than HTTP polling.
Modern servers can be configured to handle hundreds of thousands of simultaneous webSocket connections. How many a single server of yours can actually handle in the real life working of your application depends upon dozens of factors, none of which you've disclosed in your question. But, selecting webSocket/socket.io is not a bad architectural choice for two-way chat - that's the kind of application is was invented for because it's generally better than HTTP polling at that sort of thing.
See these references:
What are the pitfalls of using Websockets in place of RESTful HTTP?
Ajax vs Socket.io
Can this technology stack scale?
Do HTML WebSockets maintain an open connection for each client? Does this scale?
600k concurrent websocket connections on AWS using Node.js
Node.js w/1M concurrent connections!
HTML5 WebSocket: A Quantum Leap in Scalability for the Web
For beginners, chat using socket.io is really simple to understand and integrate. However, the amount of bandwidth will depend heavily on the amount of data you're going to send from the server, and how much data the client will send. The bandwidth usage will also depend on which Socket.IO transport you're using, and the heartbeat interval of your application.
The performance impact of the application also varies on the type of application you're running and the performance capability of your machine and/or network. However, 5000+ clients will have a considerable impact on performance, regardless of your computer's capabilities unless you are scaling the application across multiple cores.
You can refer to this link for more details. Link
Go with Socket.io. It is incredibly relevant today for highly interactive applications like chat module. With web socket, there is no negotiation protocols and connection remain open as long as users concerned are registering for service with the web server. The payload is significantly less than http/https protocol.

Scaling WebSockets on Google Compute Engine

I would like to implement a chat system as part of a game I am developing on App Engine. To implement this, I would like to use WebSockets, and have clients connect to each other though a hub, in this case an instance of GCE. Assuming this game needed to scale to multiple instances on GCE, how would this work? If I had a client 1, and the load balancer directed that request of client 1 to instance A, and another client (2) came in and was directed to instance B, but those clients wanted to chat with each other, they would each be connected to different hubs, and would be unable to reach each other. How would this be set up to work with scale? Would I implement it using queues, where each instance listens on that queue, and if so, how would I do that?
Google Play Game Services offers exactly the functionality that you want but in regard to Android and ios clients. So this option may not be compatible with your game tech design.
In general you're reasoning correctly. Messages from client who want to talk to each other will most of the time hit different server instances. What you want to do is to make instances handle the communication between users. Pub/sub (publish-subscribe pattern) is very suitable pattern in this scenario. Roughly:
whenever there's a message directed to client X a message is published on the channel X,
whenever client X creates a session, instance handling it subscribes to channel X.
You can use one of many existing solutions for starters. It's very easy to set this up using redis. If you need something more low-level and more flexible check out zeromq.
You can expect single instance of either solution to be able to handle thousands of QPS.
Unfortunately I don't have any experience with scaling neither of these solutions so can't offer you any practical advice as to the limits of their scalability.
PS. There are also other topics you may want to explore such as: message persistence and failure recovery I didn't address here at all.
I didn't try to implement this yet but I'll probably have to soon, I think it should be fairly simple to handle it yourself.
You have: server 1 with list of clients and you have server 2 with another list of clients,
so if client wants to send data to another client which might be on server 2, you have to:
Lookup if the receiver is on current server - if it is, you just send it (standard)
Otherwise you send the same data to all other servers you have, so they would check their lists for particular client (or clients) and send data to them.

distributed system: sql server broker implementation

I have a distributed application consisting of 8 servers, all running .NET windows services. Each service polls the database for work packages that are available.
The polling mechanism is important for other reasons (too boring for right now).
I'm thinking that this polling mechanism would be best implemented in a queue as the .NET services will all be polling the database regularly and when under load I don't want deadlocks.
I'm thinking I would want each .NET service to put a message into an input queue. The database server would pop each message of the input queue one at a time, process it, and put a reply message on another queue.
The issue I am having is that most examples of SQL Server Broker (SSB) are between database services and not initiated from a .NET client. I'm wondering if SQL Server Broker is just the wrong tool for this job. I see that the broker T-SQL DML is available from .NET but the way I think this should work doesn't seem to fit with SSB.
I think that I would need a single SSB service with 2 queues (in and out) and a single activation stored procedure.
This doesn't seem to be the way SSB works, am I missing something ?
You got the picture pretty much right, but there are some missing puzzle pieces in that picture:
SSB is primarily a communication technology, designed to deliver messages across the network with exactly-once-in-order semantics (EOIO), in a fully transactional fashion. It handles network connectivity (authentication, traffic confidentiality and integrity) and acknowledgement and retry logic for transmission.
Internal Activation is an unique technology in that it eliminates the requirement for a resident service to poll the queue. Polling can never achieve the dynamic balance needed for low latency and low resource consumption under light load. Polling forces either high latency (infrequent polling to save resources) or high resource utilization (frequent polling required to provide low latency). Internal activation also has self-tunning capability to ramp up more processors to answer to spikes in load (via max_queue_readers) while at the same time still being capable of tuning down the processing under low load, by deactivating processors. One of the often overlooked advantages of the Internal Activation mechanism is the fact that is fully contained within a database, ie. it fails over with a cluster or database mirroring failover, and it travels with the database in backups and copy-attach operations. There is also an External Activation mechanism, but in general I much more favor the internal one for anything that fits an internal context (Eg. not and HTTP request, that must be handled outside the engine process...)
Conversation Group Locking is again unique and is a means to provide exclusive access to correlated message processing. Application can take advantage by using the conversation_group_id as a business logic key and this pretty much completely eliminates deadlocks, even under heavy multithreading.
There is also one issue which you got wrong about Service Broker: the need to put a response into a separate queue. Unlike most queueing products with which you may be familiar, SSB primitive is not the message but a 'conversation'. A conversation is a fully duplex, bidirectional communication channel, you can think of it much like a TCP socket. An SSB service does not need to 'put responses in a queue' but instead it can simply send a response on the conversation handle of the received message (much like how you would respond in a TCP socket server by issue a send on the same socket you got the request from, not by opening a new socket and sending a response). Furthermore SSB will take care of the inherent message correlation so that the sender will know exactly the response belong to which request it sent, since the response will come back on the same conversation handle the request was sent on (much like in a TCP socket case the client receives the response from the server on the same socket it sent the request on). This conversation handle is important again when it comes to the correlation locking of related conversation groups, see the link above.
You can embed .Net logic into Service Broker processing via SQLCLR. Almost all SSB applciations have a non-SSB service at at least one end, directly or indirectly (eg. via a trigger), a distributed application entirely contained in the database is of little use.

Is PollingDuplex right for Silverlight client notification?

I'm trying to figure out if PollingDuplex is the right way to go for my problem.
Here is my scenario:
1. 3rd party application sends a UDP packet with a client's IP address to a server app.
2. The server app needs to the notify the specified client and send along some data.
The client is a Silverlight application.
I've been looking at some guides and sample code (http://petermcg.wordpress.com/2008/09/03/silverlight-polling-duplex-part-1-architecture/) but I don't understand how clients are identified on the server using PollingDuplex. I understand that the clients register with the server and continually poll for messages. How would I make sure that only the right clients get the message designated for that client? In other words, the messages on the server should not be broadcasted to all polling clients but only sent to one specific client.
Any help is much appreciated.
Whether you're using Net.TCP or HttpDuplexBinding, clients can be identified using OperationContext.Current.Channel.SessionId. And more specifically, you can grab the actual channel that WCF uses to talk to them using OperationContext.Current.GetCallbackChannel<IMyCustomServiceInterface>(). You can store those in memory, perhaps associated with some other identifier passed up from the client, and when you need to communicate with the client in question (e.g., to pass them the data from the UDP packet), you call the appropriate method on that specific stored channel; and the client will get notified.
I should note that while I don't particularly recommend HttpDuplexBinding, apart from its quirks and stability and performance issues, it should work for what you're doing, and in exactly the same way as Net.TCP. Although the clients technically do "poll" the server, that's hidden from you. All you know on the server is that you're calling a method on a particular channel. The underlying binding code takes care of making sure that the right client gets notified.
Polling duplex is actually an entirely client side implementation that exists only for Silverlight (there's no regular .NET framework version of it, except a project on Codeplex Microsoft's own internal consulting services developed for a high profile client of theirs). There's nothing at all special about it on the server side.
It's not really meant to be used in production by Microsoft's own admission (we have a Microsoft contact at our company who admitted this to us candidly). It's not very robust or well implemented and can/will DoS your server under any kind of volume:
http://forums.silverlight.net/p/89970/239380.aspx
You're better off rolling your own client side polling mechanism - or (better and more scalable) using TCP with session in Silverlight 4, which provides true duplex support (because the connection is not stateless and thus supports true push notifications):
http://www.silverlightshow.net/items/WCF-NET.TCP-Protocol-in-Silverlight-4.aspx.

.NET CF mobile device application - best methodology to handle potential offline-ness?

I'm building a mobile application in VB.NET (compact framework), and I'm wondering what the best way to approach the potential offline interactions on the device. Basically, the devices have cellular and 802.11, but may still be offline (where there's poor reception, etc). A driver will scan boxes as they leave his truck, and I want to update the new location - immediately if there's network signal, or queued if it's offline and handled later. It made me think, though, about how to handle offline-ness in general.
Do I cache as much data to the device as I can so that I use it if it's offline - Essentially, each device would have a copy of the (relevant) production data on it? Or is it better to disable certain functionality when it's offline, so as to avoid the headache of synchronization later? I know this is a pretty specific question that depends on my app, but I'm curious to see if others have taken this route.
Do I build the application itself to act as though it's always offline, submitting everything to a local queue of sorts that's owned by a local class (essentially abstracting away the online/offline thing), and then have the class submit things to the server as it can? What about data lookups - how can those be handled in a "Semi-live" fashion?
Or should I have the application attempt to submit requests to the server directly, in real-time, and handle it if it itself request fails? I can see a potential problem of making the user wait for the timeout, but is this the most reliable way to do it?
I'm not looking for a specific solution, but really just stories of how developers accomplish this with the smoothest user experience possible, with a link to a how-to or heres-what-to-consider or something like that. Thanks for your pointers on this!
We can't give you a definitive answer because there is no "right" answer that fits all usage scenarios. For example if you're using SQL Server on the back end and SQL CE locally, you could always set up merge replication and have the data engine handle all of this for you. That's pretty clean. Using the offline application block might solve it. Using store and forward might be an option.
You could store locally and then roll your own synchronization with a direct connection, web service of WCF service used when a network is detected. You could use MSMQ for delivery.
What you have to think about is not what the "right" way is, but how your implementation will affect application usability. If you disable features due to lack of connectivity, is the app still usable? If you have stale data, is that a problem? Maybe some critical data needs to be transferred when you have GSM/GPRS (which typically isn't free) and more would be done when you have 802.11. Maybe you can run all day with lookup tables pulled down in the morning and upload only transactions, with the device tracking what changes it's made.
Basically it really depends on how it's used, the nature of the data, the importance of data transactions between fielded devices, the effect of data latency, and probably other factors I can't think of offhand.
So the first step is to determine how the app needs to be used, then determine the infrastructure and architecture to provide the connectivity and data access required.
I haven't used it myself, but have you looked into the "store and forward" capabilities of the CF? It may suit your needs. I believe it uses an Exchange mailbox as a message queue to send SOAP packets to and from the device.
The best way to approach this is to always work offline, then use message queues to handle sending changes to and from the device. When the driver marks something as delivered, for example, update the item as delivered in your local store and also place a message in an outgoing queue to tell the server it's been delivered. When the connection is up, send any queued items back to the server and get any messages that have been queued up from the server.

Resources