As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
With event based programming you essentially just loop and poll, loop and poll...why is this preferred to just blocking? If you're not receiving any events why would you prefer to use select() over just blocking on an accept()?
Normally this question is posed as if event-based network programming makes better use of resources than thread-based network programming. The general answer is: theoretically no, but practically yes. I recall a paper written by one of the founders of Inktomi, whose product later became Apache Traffic Server (Traffic Server is an event-based HTTP proxy). Basically, the conclusions were that userspace threads could be as fast as an event-based model. They felt that context switching would always make OS-level threads slower than event models. There were at the time no production-ready userspace threading models that could compete with event-based models. Finally, they indicated that the conceptual overhead of using an event-based model over a thread-based model was significant in a large-scale application. You have already noticed this.
It is much simpler to just have a bunch of threads, each handling the whole connection lifetime, than to have an event loop dispatching work based on when some part of the process has to block, when a timer goes off, or who knows what other events. Sadly, at this time, the more complicated approach is the faster one.
Note: sorry for not posting a link to the paper, but I cannot seem to find an online source right now. I will try to edit this post with a link later.
"better" depends on what you need.
With event-based (select/poll/epoll/etc.) IO, you can listen for events from many (thousands of) sockets in one thread. This can vastly improve scalability compared with using one thread per socket doing blocking operations.
With blocking reads/writes/accepts, you can't service several clients concurrently in one thread; you'll have to use at least one thread/process per connection. The drawback here is that this does not scale as well as event-based IO. However, the programming model becomes much easier.
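As a rough illustration of the single-threaded, event-based style, here is a minimal Python sketch built on the standard selectors module. It uses socket.socketpair() in place of real network clients so it can run anywhere; the names serve_ready and demo, and the upper-casing "request handler", are made up for this example.

```python
import selectors
import socket


def serve_ready(sel, timeout=1.0):
    """Service every socket that is readable right now; return how many were handled."""
    handled = 0
    for key, _mask in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # toy request handler: echo in upper case
        else:
            sel.unregister(conn)        # peer closed the connection
            conn.close()
        handled += 1
    return handled


def demo(n_clients=3):
    """One thread, one selector, several connected sockets."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for _client, server_side in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)

    # Every "client" sends a request before the server looks at any of them.
    for i, (client, _server_side) in enumerate(pairs):
        client.sendall(f"hello {i}".encode())

    handled = 0
    while handled < n_clients:
        handled += serve_ready(sel)

    replies = [client.recv(4096) for client, _server_side in pairs]

    for client, server_side in pairs:
        sel.unregister(server_side)
        client.close()
        server_side.close()
    sel.close()
    return replies
```

Calling demo() services all three connections from the same thread. A thread-per-connection server would need three threads to do the same, each blocked in its own recv().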
Sometimes you'll need to call APIs (e.g. to query a backend database) that only provide a blocking interface. In such a case, you'll block every other client if you do this in an event-based IO loop, and you'll basically have to resort to using thread-per-client anyway. If you need scalability in such cases, it's common to couple an event loop with a worker thread pool, which might make the programming model even harder.
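One way to picture the "event loop plus worker thread pool" combination is the sketch below, which uses Python's asyncio with loop.run_in_executor. The blocking_query function is a hypothetical stand-in for a blocking-only API; the sleep simulates the time spent waiting on a backend.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(client_id):
    """Hypothetical blocking-only call, e.g. a database driver without an async API."""
    time.sleep(0.2)
    return f"result for client {client_id}"


async def handle_client(client_id, pool):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to a worker thread; the event loop stays free
    # to service other clients while this one waits.
    return await loop.run_in_executor(pool, blocking_query, client_id)


async def main(n_clients=4):
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        start = time.monotonic()
        results = await asyncio.gather(
            *(handle_client(i, pool) for i in range(n_clients))
        )
        elapsed = time.monotonic() - start
    return results, elapsed
```

Running asyncio.run(main()) should take roughly as long as one query rather than four, because the blocking calls overlap in the pool. The cost is exactly the extra complexity mentioned above: results now cross a thread boundary.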
You can use blocking sync IO when:
You have only one socket active
Your application doesn't do anything but react to events on the socket
You don't plan to extend the application
You don't mind that the user has to kill the application to exit
If any of these are false, you're likely better off with a polling loop.
It isn't a good idea to let the program block on an accept. It means that no other operations can be performed until your application receives some data.
Even if your network requirements are very simple and you don't need to send or receive any data other than what your blocked socket is waiting for, you can't even update your GUI (if you have one) or receive input from the user.
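A small sketch of the alternative, assuming Python's select module: the call below waits for data with a timeout, so the loop wakes up regularly and could repaint a GUI or check a quit flag in between. The function name wait_with_ticks and the tick length are illustrative choices, not anything from the posts above.

```python
import select
import socket


def wait_with_ticks(sock, max_ticks, tick_seconds=0.05):
    """Wait for data on sock, waking up every tick so other work can be done.

    Returns (data, idle_ticks); data is None if nothing arrived in time.
    """
    idle_ticks = 0
    while idle_ticks < max_ticks:
        readable, _, _ = select.select([sock], [], [], tick_seconds)
        if readable:
            return sock.recv(4096), idle_ticks
        # Nothing to read yet: this is where a GUI update, user-input check,
        # or shutdown-flag test would go.
        idle_ticks += 1
    return None, idle_ticks
```

With a plain blocking recv() or accept() there is no such gap: the thread is stuck until the peer does something.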
Generally, the question is whether to use select or threads.
Threads are hard to debug and can create problems with concurrent operations. So, unless you explicitly need threads, I'd suggest using select.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I want to write a server program which accepts incoming connections and processes received requests. The first idea that came to mind was to use non-blocking sockets with epoll() or select().
For instance, when epoll() returns, it gives me an array of sockets with available IO events. Then I have to loop over the socket array to send and receive data; once a buffer is entirely received or sent, a callback function is executed. This is also the technique most discussed on the internet.
However, if I use this technique, my program will keep all the other sockets waiting while it is dealing with one client connection. Isn't that inefficient if a client's request is time-consuming?
The documents I've found say that such a single thread/process design can easily handle hundreds of connections simultaneously, while multithreaded designs are always heavily criticized for their complexity, system overhead, etc.
Thus, my question is: How to design an efficient server program if it has to handle heavy workload?
Thanks.
Million dollar questions with a million different trade offs. For those that get Monty Python...
https://www.youtube.com/watch?v=pWS8Mg-JWSg
Back to reality... Apache2 can handle heavy workloads, nginx can handle heavy workloads, so can Node, Tomcat, Jetty, JBoss, Netty... In fact any of the well-known application servers in use today, and quite a few less well-known ones, can handle heavy workloads, and they all use various combinations of threads, events and processes to do it. Some languages, e.g. Erlang or Go, allow you to easily spin up high-performance application servers in a few hundred lines of code.
Although out of date now the following page has some great information on why this is not a simple problem...
http://www.kegel.com/c10k.html
Rather than worrying about performance now, get something working, benchmark it, then ask how to make it faster. If you've been smart and made sure that you have a modular design, swapping out parts of it will be relatively easy. For example, look at what Apache did with MPM: a pluggable engine whose modules have completely different performance characteristics.
Once you have your server outperforming any of the above in benchmarks your answer to this question would likely be accepted.
Heavy workload is a misleading term and, in the end, it doesn't really dictate how you should design your system. The main issue here is one of responsiveness and its requirements. If processing a single request takes a long time, and you do not want to starve the other clients (which you probably don't), then a single-threaded design will obviously not do. You should at least have a thread (or one per client) that handles responding to the request in some manner, even if only to notify the client that the request is being processed.
I am implementing a small database like MySQL. It's part of a larger project.
Right now I have designed the core database, by which I mean I have implemented a parser and can now execute some basic SQL queries on my database. It can store, update, delete and retrieve data from files. So far that's fine; however, I want to make this available over the network.
I want more than one user to be able to access my database server and execute queries on it at the same time. I am working under Linux, so portability is not an issue right now.
I know I need to use sockets, which is fine. I also know that I need to use something like a thread pool, where I create a maximum number of threads initially and then, for each client request, wake up a thread and assign it to the client.
What I am unable to figure out is how all this is actually going to be bundled together. Where should I implement multithreading: on the client side or the server side? How is my parser going to be configured to take input from each of the clients separately (mostly via files, I think)?
If anyone has an idea about how I can implement this, please tell me, because I am stuck here in this project.
Thanks.. :)
If you haven't already, take a look at Beej's Guide to Network Programming to get your hands dirty in some socket programming.
Next I would take his example of a stream client and server and just use that as a single-threaded query system. Once you have got this down, you'll need to choose whether you're going to actually use threads or use select(). My gut says your on-disk database doesn't yet support parallel writes (maybe reads), so a single server thread servicing requests is likely your best bet for starters!
In the multiple client model, you could use a simple per-socket hashtable of client information and return any results immediately when you process their query. Once you get into threading with the networking and db queries, it can get pretty complicated. So work up from the single client, add polling for multiple clients, and then start reading up on and tackling threaded (probably with pthreads) client-server models.
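A per-socket table of client information might look something like the following Python sketch. The state fields (name, buffer, queries), the newline-terminated query framing, and the run_query callback are all assumptions made for illustration; the real database's parser would take the place of run_query.

```python
import socket


def on_connect(clients, sock, name):
    """Register a new client; clients maps socket -> per-client state."""
    clients[sock] = {"name": name, "buffer": b"", "queries": 0}


def on_data(clients, sock, chunk, run_query):
    """Append received bytes and answer every complete (newline-terminated) query."""
    state = clients[sock]
    state["buffer"] += chunk
    while b"\n" in state["buffer"]:
        line, state["buffer"] = state["buffer"].split(b"\n", 1)
        state["queries"] += 1
        # Return the result immediately, as suggested above.
        sock.sendall(run_query(line.decode()).encode() + b"\n")


def on_disconnect(clients, sock):
    """Forget the client's state and close its socket."""
    clients.pop(sock, None)
    sock.close()
```

Because each client has its own buffer, a query that arrives split across several reads is reassembled without interfering with other clients. The same table works whether on_data is driven by select(), poll, or a thread per client.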
Server side, since the server is the only party that understands the data. You need to design locks or come up with your own model to make sure that modifications don't affect clients that are currently being served.
As an alternative to multithreading, you might consider event-based single threaded approach (e.g. using poll or epoll). An example of a very fast (non-SQL) database which uses exactly this approach is redis.
This design has two obvious disadvantages: you only ever use a single CPU core, and a lengthy query will block other clients for a noticeable time. However, if queries are reasonably fast, nobody will notice.
On the other hand, the single thread design has the advantage of automatically serializing requests. There are no ambiguities, no locking needs. No write can come in between a read (or another write), it just can't happen.
If you don't have something like a robust, working MVCC built into your database (or are at least working on it), knowing that you need not worry can be a huge advantage. Concurrent reads are not so much an issue, but concurrent reads and writes are.
Alternatively, you might consider doing the input/output and syntax checking in one thread, and running the actual queries in another (query passed via a queue). That, too, will remove the synchronisation woes, and it will at least offer some latency hiding and some multi-core.
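The two-thread split described above could be sketched in Python roughly as follows. The tiny SET/GET "query language" and the names submit and query_worker are invented for the example; the point is only that syntax checking happens in the caller's thread while all access to the data goes through one worker.

```python
import queue
import threading


def query_worker(jobs, store):
    """Single thread that executes queries one at a time, so no locks are needed."""
    while True:
        job = jobs.get()          # blocks until a query is available
        if job is None:           # sentinel: shut down
            break
        op, key, value, reply = job
        if op == "SET":
            store[key] = value
            reply.put("OK")
        else:                     # "GET"
            reply.put(store.get(key))


def submit(jobs, text):
    """Syntax-check a query in the caller's thread, then queue it for execution.

    Returns a one-slot queue on which the result will arrive.
    """
    parts = text.split()
    if len(parts) == 3 and parts[0] == "SET":
        op, key, value = parts
    elif len(parts) == 2 and parts[0] == "GET":
        op, key = parts
        value = None
    else:
        raise ValueError(f"syntax error: {text!r}")
    reply = queue.Queue(maxsize=1)
    jobs.put((op, key, value, reply))
    return reply
```

Malformed queries are rejected without ever reaching the worker, and because only query_worker touches store, reads and writes are serialized automatically.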
Scenario: I am working on an LOB application. In Silverlight every call to a service is async, so the UI is automatically not blocked while the request is processed on the server side.
Silverlight also supports threading. As I understand it, when developing an LOB application, threads are most useful for IO operations, but since I am not building an OOB application, it is not possible to access client resources, and all server requests are async by default.
In the above scenario, is there any use for threading? Can anyone provide a good example where threading improves performance?
I have searched a lot on this topic, but everywhere I have only found simple threading examples from which it is very difficult to understand the real benefit.
Thanks for help
Tomasz Janczuk has also pointed out that if the UI thread is fairly busy, you can significantly improve the performance even of async WCF calls by marshaling them onto a separate thread. And I should note that the UI thread can get awfully busy doing things that you wouldn't anticipate would chew up cycles, like calculating drop-shadows and what-not, so this might be worth investigating (and measuring) for your application.
That said, I've been writing LOB apps for the better part of two decades, and synchronous IO aside, I haven't found a lot of scenarios where adding multiple threads in an LOB application was worth the additional complexity.
Edit 4/2/10: I had lunch with Tomasz Janczuk and some other folks from the WCF team the other day, and they clarified a few issues for me about how WCF works with Silverlight background threads. There are two things to be concerned with: sending data, and receiving it (say, from duplex callbacks or async call completions). When you send data, the call will always be made from the thread that actually makes the call. So if you have a lot of data that needs to be serialized, you might get a small performance boost by marshaling the outgoing call onto a background thread (say, by using ThreadPool.QueueUserWorkItem). But it's not likely to be a substantial performance boost.
However, when you receive data, either through a duplex callback, or through an async xxxCompleted method, the data is always received on the thread on which the connection was originally opened. This means that if you're opening the connection explicitly, it will receive data on that thread; but if you're opening the connection implicitly, it will receive data on the thread on which you made your first outbound connection. This won't make a lot of difference if you need to update the UI on every callback, since you'd just have to marshal the call back onto the UI thread. But if there are times when you just need to store the data for future reference or processing, you can get yourself a significant performance boost by opening your connection on a separate thread, so that you can receive and process callbacks without waiting on the UI thread.
Hope this helps. Thought I'd write it down while I still have it reasonably fresh in my head.
The same advantages apply to Silverlight as to other applications. If you are doing a long-running calculation on the client and don't want to tie up the main/UI thread, then threading is an obvious choice.
Also, I haven't researched it, but I would imagine if you are running a multi-core machine, you could improve performance by splitting work into multiple separate threads.
I have a C# WinForms application. It has many forms with different functionality. These forms wrap a WCF service. For example:
form1 calls serviceMethod1 continuously and updates the results
form2 calls serviceMethod2 continuously and updates the results
The calls are made in a separate thread for each form, but this is ending up with too many threads, as we have many forms. Is this bad, and if so, why? Is there a way to avoid this given my scenario?
Regards
How many threads are you talking about? If you have a lot of threads, you'll lose a bit of performance due to context switching - but in practice I wouldn't expect this to become a significant problem until you have an awful lot of them.
One alternative would be to use a Timer though (it sounds like a System.Timers.Timer or System.Threading.Timer would be most appropriate) - schedule each service call to be made on a regular basis, and the timer will use the threadpool to fire the calls. I suspect that although you say you're calling the services "continuously" you actually mean you're doing it regularly - which is exactly the kind of situation a timer is good for.
To answer the question frankly: It depends entirely on the OS and app design, but this question may indicate a shortcoming in the program's design.
Detail:
You want to learn the allocation requirements of a thread on your target architecture/OS, as well as keep your threads relatively busy/avoid polling, and to configure priorities correctly if you really do have a lot of threads. 'Many' threads may be 8 (or fewer, if busy), or 100+ if they have relatively little work to do, it ultimately depends on your needs and design.
In tests of some objects/operations, I have used more than 100, and occasionally more than 1000, working threads. No explosions happened, though I have never had a true need for those operations to be that parallel in a shipping app (unless the aforementioned programs are being used in very unusual circumstances), and it made more sense to put the actual implementation into some centralized task manager. If you have time-critical/real-time applications, then these tasks may be best on another thread. If they are short-lived, consider a thread pool. There are many ways to attack many problem classes.
You can use a WCF asynchronous proxy.
In Visual Studio, when you add a Web Reference you can check "Generate Asynchronous operations" to generate an asynchronous proxy.
As long as the threads spend most of their time waiting for the server's response, even hundreds of threads are unlikely to degrade performance (CPU-wise). Otherwise, use a thread pool and queue a "request and update form once" task whenever the previous update completes.
A more important problem might be overloading the service with too many simultaneous requests.
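The thread-pool idea is not specific to C#. As a language-neutral illustration, here is a short Python sketch in which twenty hypothetical "forms" share a pool of four worker threads. refresh_form is a made-up stand-in for one "call the service and update the form once" task; it only reports which pool thread ran it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def refresh_form(form_id):
    """Hypothetical stand-in for one 'request and update form once' task."""
    return form_id, threading.current_thread().name


def refresh_all(form_ids, max_workers=4):
    """Run one refresh per form on a small shared pool instead of one thread per form."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(refresh_form, form_ids))
```

However many forms there are, at most max_workers threads exist, and the same cap also limits how many requests hit the service at once.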
As a general rule, you won't gain anything by having more threads than you have CPU cores. There are exceptions to the general rule, but I doubt they apply to your case.
From the OS's point of view, threads are no longer the lightweight things they used to be, but are almost as costly as full processes. Implementing thread synchronization correctly is not a simple task, and debugging a multi-threaded application is a lot harder than debugging a single-threaded one.
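With green threads, it is less of an issue. Green threads are a sort of virtual thread, scheduled by the runtime rather than by the OS. Note, however, that threads in Java and C# are normally backed by native OS threads, so this applies only if your runtime actually provides green threads.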
The benefit of threads in many apps is not to crunch more numbers but to allow lots of things to go on at once with good responsiveness, so having a lot of threads can be very useful for some things and will not always have any real cost.
I've been experiencing the good and the bad sides of messaging systems in real production environments, and I must admit that a well organized table or schema of tables simply beats every time any other form of messaging queue, because:
Data are permanently stored on a table. I've seen so many java (jms) applications that lose or vanish messages on their way for uncaught exceptions or other bugs.
Queues tend to fill up. Db storage is virtually infinite, instead.
Tables are easily accessible, while you have to use esoteric instruments to read from a queue.
What's your opinion on each approach?
The phrase "beats every time" totally depends on what your requirements were to begin with. Certainly it's not going to beat every time for everyone.
If you are building a single system which is already using a database, you don't have very high performance throughput requirements and you don't have to communicate with any other teams or systems then you're probably right.
For simple, low-throughput, mostly single-threaded stuff, databases are a totally fine alternative to message queues.
Where a message queue shines is when
you want a high performance, highly concurrent and scalable load balancer so you can process tens of thousands of messages per second concurrently across many servers/processes (using a database table you'd be lucky to process a few hundred a second and processing with multiple threads is pretty hard as one process will tend to lock the message queue table)
you need to communicate between different systems using different databases (so you don't have to hand out write access to your system's database to other folks in different teams etc.)
For simple systems with a single database, team and fairly modest performance requirements - sure use a database. Use the right tool for the job etc.
However where message queues shine is in large organisations where there are lots of systems that need to communicate with each other (and so you don't want a business database to be a central point of failure or place of version hell) or when you have high performance requirements.
In terms of performance a message queue will always beat a database table - as message queues are specifically designed for the job and don't rely on pessimistic table locks (which are required for a database implementation of a queue - to do the load balancing) and good message queues will perform eager loading of messages to queues to avoid the network overhead of a database.
Similarly - you'd never use a database to do load balancing of HTTP requests across your web servers - as it'd be too slow - if you have high performance requirements for your load balancer you'd not use a database either.
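To make the locking point concrete, here is a minimal sketch of a queue built on a database table, using SQLite from Python's standard library as a stand-in for whatever database is actually in use. The table layout and the function names are assumptions for the example. Note how claim_next has to take a write lock (BEGIN IMMEDIATE, which in SQLite locks the whole database) so that two workers cannot claim the same message.

```python
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " body TEXT NOT NULL,"
        " claimed_by TEXT)"
    )
    return conn


def enqueue(conn, body):
    conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))


def claim_next(conn, worker):
    """Atomically claim the oldest unclaimed message, or return None if there is none."""
    conn.execute("BEGIN IMMEDIATE")  # pessimistic lock: other writers must wait
    try:
        row = conn.execute(
            "SELECT id, body FROM messages"
            " WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE messages SET claimed_by = ? WHERE id = ?", (worker, row[0])
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[1]
```

This works well at modest volumes. Under many concurrent consumers, every claim serializes on that lock, which is the bottleneck described above.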
I've used tables first, then refactored to a full-fledged message queue when (and if) there's reason to, which is trivial if your design is reasonable.
The biggest benefits are: a.) it's easier, b.) it's a better audit trail because you have the other tables to join to, c.) if you know the database tools really well, they are easier to use than the message queue tools, d.) it's generally a bit easier to set up a test/dev environment in a context that already exists for your app (if the same familiarity applies).
Oh, and e.) for perhaps you and others, it's not another product to learn, install, configure, administer, and support.
IMPE, it's just as reliable and disconnectable, and you can convert if it needs to be more scalable.
Data are permanently stored on a table. I've seen so many java (jms) applications that lose or vanish messages on their way for uncaught exceptions or other bugs.
Which JMS implementation? Sun sells reliable queue which can't lose messages. Perhaps you just purchased a cheesy JMS-compliant product. IBM's MQ is extremely reliable, and there are JMS libraries to access it.
Queues tend to fill up. Db storage is virtually infinite, instead.
Ummm... If your queue fills up, it sounds like something is broken. If your apps crash, that's not a good thing, and queues have little to do with that. If you've purchased a really poor JMS implementation, I can see where you might be unhappy with it. It's a competitive market-place. Find a better queue manager. Sun's JCAPS has a really good queue manager, formerly the SeeBeyond message queue.
Tables are easily accessible, while you have to use esoteric instruments to read from a queue.
That doesn't fit with my experience. Tables are accessed through this peculiar "other language" (SQL), and require that I be aware of structure mappings from tables to objects and data type mappings from VARCHAR2 to String. Further, I have to use some kind of access layer (JDBC or an ORM which uses JDBC). That seems very, very complex. A queue is accessed through MessageConsumers and MessageProducers using simple sends and receives.
It sounds as though the problems you've experienced are not inherent to messaging, but rather are artifacts of poorly-implemented messaging systems. Is building messaging systems harder than building database systems? Yes, if all you ever do is build database systems.
Losing messages to uncaught exceptions? That's hardly the fault of the message queue. The applications you're using are poorly engineered. They're removing messages from the queue before processing completes. They're not using transactions, or journalling.
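The difference between "remove, then process" and "process, then remove" can be shown with a toy in-memory queue in Python. This is only a sketch of the idea: real message brokers expose it through acknowledgements or transactions rather than a deque, and consume_one is a name invented here.

```python
from collections import deque


def consume_one(pending, handler):
    """Process the oldest message and remove it only if the handler succeeds.

    Returns False if the queue was empty, True if a message was processed.
    """
    if not pending:
        return False
    message = pending[0]   # peek: the message stays in the queue for now
    handler(message)       # may raise; in that case nothing has been removed
    pending.popleft()      # "acknowledge" only after processing completed
    return True
```

If handler raises, the exception propagates but the message is still at the head of the queue and will be seen again on the next attempt. Popping first and processing afterwards is what loses messages.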
Message queues fill up while DB storage is "virtually infinite"? You talk as though managing disk space were something that databases didn't require. Message queue servers require administration, just like database servers do.
Esoteric instruments to read from a queue? Maybe if you find asynchronous methods esoteric. Maybe if you find serialization and deserialization esoteric. (At least, those are the things I found esoteric when I was learning messaging. Like many seemingly-esoteric technologies, they're actually quite mundane once you understand them, and understanding them is an important part of the seasoned developer's education.)
Aspects of messaging that make it superior to databases:
Asynchronous processing. Message queues notify waiting processes when new messages arrive. To accomplish this functionality in a database, the waiting processes have to poll the database.
Separation of concerns. The communications channel is decoupled from the implementation details of the message content. Only the sender and the receiver need to know anything about the format of the data stream within a given message.
Fault-tolerance. Messaging can function when connections between servers are intermittent. Message queues can store messages locally and only forward them to remote servers when the connection is live.
Systems integration. In the Windows world, at least, messaging is built into the operating system. It uses the OS's security model, it's managed through the OS's tools, etc.
If you don't need these things, you probably don't need messaging.
Here's a simple example of an application for messaging: I'm building a system right now where users, distributed across multiple networks, are entering fairly intricate sets of transactions that are used to produce printed output. Output generation is computationally expensive and not part of their workflow; i.e. the users don't care when the output gets generated, just that it does.
So we serialize the transactions into a message and drop it in a queue. A process running on a server grabs messages from the queue, produces the output, and stores the output in an imaging system.
If we used a database as our message store, we'd have to come up with a schema to store a transaction format that right now only the sender and receiver care about, we'd need to make sure every workstation on the network had permanent persistent connections to the database server, we'd have no capacity to distribute this transaction load across multiple servers, and our output server would have to query the database thousands of times a day waiting to see if there were new jobs to process.
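The queue-based flow described above (serialize the transactions, drop the message in a queue, let a server process pick it up) can be sketched in a few lines of Python. An in-process queue.Queue stands in for a real message queue here, JSON is an assumed wire format, and the "output" is just a summary string; none of these details come from the actual system.

```python
import json
import queue
import threading


def submit_transactions(jobs, transactions):
    """Sender side: serialize a set of transactions into one message and enqueue it."""
    jobs.put(json.dumps(transactions))


def output_worker(jobs, outputs):
    """Receiver side: block until a message arrives, then produce the output."""
    while True:
        message = jobs.get()   # waits for a message; no polling loop required
        if message is None:    # sentinel: shut down
            break
        transactions = json.loads(message)
        total = sum(t["amount"] for t in transactions)
        outputs.append(f"{len(transactions)} transactions, total {total}")
```

Only the sender and the worker need to agree on the message format, and the sender returns as soon as the message is queued, without waiting for the output to be generated.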
Queues provide reliable messaging. The store-and-forward, disconnected nature of queueing makes it much more scalable than databases, not to mention more robust.
And queues shouldn't really be used for permanent storage of information - it is best to think of them as temporary inboxes, unlike databases.