Efficiency of asynchronous non-blocking server socket [closed] - c

I want to write a server program that accepts incoming connections and processes the received requests. The first idea that came to mind was to use non-blocking sockets with epoll() or select().
For instance, when epoll() returns, it gives me an array of sockets with available I/O events. I then loop over that array to send and receive data; once a buffer has been entirely received or sent, a callback function is executed. This is also the technique most discussed on the internet.
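In sketch form, the technique looks something like this (a minimal outline assuming Linux epoll; handle_request is a hypothetical application callback, and error handling is mostly omitted):

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical application callback, invoked once data is available. */
void handle_request(int fd, const char *buf, ssize_t len);

void event_loop(int listen_fd)   /* listen_fd: bound, listening, non-blocking */
{
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event ready[64];
    for (;;) {
        int n = epoll_wait(epfd, ready, 64, -1);  /* blocks until sockets are ready */
        for (int i = 0; i < n; i++) {
            int fd = ready[i].data.fd;
            if (fd == listen_fd) {                /* new connection */
                int client = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {                              /* readable client socket */
                char buf[4096];
                ssize_t r = recv(fd, buf, sizeof buf, 0);
                if (r <= 0) { close(fd); continue; }
                handle_request(fd, buf, r);       /* if this is slow, every other
                                                     socket waits - the concern below */
            }
        }
    }
}
```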
However, if I use this technique, my program keeps all the other sockets waiting while it deals with one client connection. Isn't that inefficient if a client's request is time-consuming?
The documents I've found say that such a single-thread/single-process design can easily handle hundreds of connections simultaneously, while multithreaded designs are frequently criticized for their complexity, system overhead, etc.
Thus, my question is: how should I design an efficient server program if it has to handle a heavy workload?
Thanks.

A million-dollar question with a million different trade-offs. For those that get Monty Python...
https://www.youtube.com/watch?v=pWS8Mg-JWSg
Back to reality... Apache2 can handle heavy workloads, nginx can handle heavy workloads, and so can Node, Tomcat, Jetty, JBoss, Netty... In fact, any of the well-known application servers in use today, and quite a few less well-known ones, can handle heavy workloads, and they all use various combinations of threads, events, and processes to do it. Some languages, e.g. Erlang or Go, let you spin up a high-performance application server in a few hundred lines of code.
Although out of date now, the following page has some great information on why this is not a simple problem...
http://www.kegel.com/c10k.html
Rather than worrying about performance now, get something working, benchmark it, then ask how to make it faster... If you've been smart and made sure you have a modular design, swapping out parts of it will be relatively easy; e.g. look at what Apache did with MPM, a pluggable engine whose modules have completely different performance characteristics.
Once you have your server outperforming any of the above in benchmarks, your answer to this question would likely be accepted.

Heavy workload is a misleading term, and in the end it doesn't really dictate how you should design your system. The main issue here is one of responsiveness and its requirements. If processing a single request takes a long time and you do not want to starve the other clients (which you probably don't), then a single-threaded design obviously will not do. You should at least have a thread (or one per client) that handles responding to the request in some manner, even if only to notify the client that the request is being processed.
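A minimal sketch of that hand-off in C (pthreads assumed; struct request and the processing function are hypothetical): the main loop spawns a detached worker for a slow request, so the remaining clients keep being served.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical request object; a real server would carry the client
 * socket and the parsed request here. */
struct request { int client_fd; };

static void *process_slowly(void *arg)
{
    struct request *req = arg;
    /* ... time-consuming work runs here; the main loop keeps serving ... */
    /* write the final (or an interim "still working") reply to req->client_fd */
    free(req);
    return NULL;
}

/* Called from the main loop when a request is recognized as slow. */
void offload(struct request *req)
{
    pthread_t tid;
    pthread_create(&tid, NULL, process_slowly, req);
    pthread_detach(tid);   /* fire-and-forget: no join required */
}
```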

Related

Disadvantages of SQL Server Service Broker [closed]

I have been doing R&D to see whether SQL Server Service Broker could replace our current messaging solution, MSMQ. I want to know the disadvantages of SQL Server Service Broker compared to MSMQ for the following criteria:
Development
Troubleshooting
Performance (say we need to process 100,000 messages daily, with an average size of around 25 KB)
Scalability
I've used Service Broker in my current project, having previously used MSMQ (brokered by MassTransit) with great success. I was initially dubious about using Service Broker, but I have to admit it has performed very well.
If you're using the publish/subscribe model, then I would use message queueing every time (though I would use RabbitMQ over MSMQ if the project allowed), but when you just want to chew through a stack of data and persist it to SQL Server, Service Broker is a great solution: the fact that it's so "close to the metal" is a big advantage.
Development
Service Broker requires a lot of boilerplate, which is a pain, but unless you're planning on having lots of different queues it's manageable. SQL Server projects in Visual Studio take away a lot of the deployment pain.
Troubleshooting
Service Broker is a black box: messages go in, and they usually come out, but if they don't, troubleshooting can be problematic. All you can do is query the system views, and sometimes you simply can't find out what has gone wrong. This is annoying, but MSMQ has the same kind of issues.
Performance
Service Broker performance is excellent for our use case. We are processing well over 100,000 messages per day, more than 30,000 per hour at our SLA load, and our message sizes are large. I would estimate we process close to 100,000 messages per hour during heavy load testing.
For best performance, I would advise you to use a dialog pool like this one 1, as creating a Service Broker dialog can be an expensive operation.
You will also want to use the error-handling procedures detailed by Remus Rusanu. (If you do use Service Broker, you might as well read everything Remus has written on the subject before you start, as you'll end up reading it eventually!)
Scalability
You can certainly use more than one server to scale out if required, though we haven't had to, and from the load you mention I don't think you would need to either.
I don't think I have really managed to answer your question, as I haven't highlighted enough disadvantages of Service Broker queues. I would say the impenetrable nature of its internal workings is the thing that most annoys me: when it works, it works very well, but when it stops working it can be very hard to figure out why. Also, if you have a lot of messages in a queue, ALTER QUEUE takes a very long time to complete.
Not knowing how you are using MSMQ also makes it difficult to fairly compare the two technologies.
1 Recreated in a gist, as the original URL is now "disabled" and the page isn't in the Internet Archive. Eventually found a copy here.

Efficiently process data in real time and push in database in C [closed]

I solved one of the problems I came across and need help tuning it further to gain performance.
There will be an Apache module that receives requests over HTTP and responds within a few milliseconds. It processes the input parameters and puts them into a cache layer as a JSON record. (Currently, text files are used as the cache.)
Another asynchronous module reads the JSON records, cooks them, and pushes them into MongoDB. This avoids latency in the HTTP response that might otherwise be added by MongoDB performance degradation.
There will be multiple machines (currently 12) behind a load balancer, and we expect ~100 M requests per day, which comes to ~10-15 GB per day measured when the JSON records are written to text files.
Basically, I am searching for a better data-ingestion solution.
About using ZeroMQ or RabbitMQ in a producer-consumer architecture:
Message queues are in-memory queues, and they might not be able to hold this much data in memory.
The data consumed into MongoDB is important, and we cannot afford to lose it if the message queue goes down or crashes for some reason.
Please suggest.
It looks like in both cases each thread is coupled with a matching thread in the other module (either by a fixed file or offset). This means you still get bottlenecked if one of the modules is inherently slower than the other, as the threads of the faster module become underutilized.
Instead, you can use a task queue (or a similar solution to the multiple-producers/multiple-consumers problem) and let each thread pick up any available task once it becomes free. This gives you greater freedom in balancing the number of threads in each module; if, for example, the front-end HTTP module is 2x faster than the back-end processing, you can spawn 2x more back-end threads.
The "price" is that you'll need to maintain the shared queue safely (locking, etc.) and make sure it's done efficiently and without deadlocks.
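As a sketch, such a shared queue in C might look like this: a bounded ring buffer protected by a mutex and two condition variables (assuming the fields are set up with the usual pthread init calls):

```c
#include <pthread.h>

#define QCAP 1024

struct task {
    void *payload;                       /* whatever one unit of work carries */
};

struct task_queue {
    struct task items[QCAP];             /* bounded ring buffer */
    int head, tail, count;
    pthread_mutex_t mu;                  /* init with pthread_mutex_init() */
    pthread_cond_t  not_empty, not_full; /* init with pthread_cond_init() */
};

/* Any producer thread (e.g. an HTTP receiver) calls this. */
void queue_push(struct task_queue *q, struct task t)
{
    pthread_mutex_lock(&q->mu);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mu);  /* block while full */
    q->items[q->tail] = t;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Any consumer thread (e.g. a MongoDB writer) calls this. */
struct task queue_pop(struct task_queue *q)
{
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mu); /* block while empty */
    struct task t = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mu);
    return t;
}
```

With this in place, the number of producer and consumer threads can be tuned independently on each side of the queue.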
You can use something like RabbitMQ or ZeroMQ for this kind of thing, use bulk data inserts into your database, and scale out to other servers as needed.
You create HTTP receiver threads to receive the data to be recorded, pushing the incoming data onto a receive queue in memory.
You create a single database-writer thread that does nothing but take all the received data queued in memory (by the receiver threads), convert it into a single database INSERT with multiple rows, send it in a single transaction, and, after the transaction is done, go back to the queue for another batch of received data (the queue collects data while the previous transaction is in progress).
Sending multiple INSERT transactions in parallel to a single database table only yields higher throughput in special cases, which is why a single writer thread is usually a good choice. Batching multiple INSERTs into a single transaction makes much better use of the limited I/O capabilities of HDDs.
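A sketch of that writer thread, using SQLite purely as a stand-in for "the database" (drain_queue is a hypothetical blocking call that hands over whatever the receiver threads have queued so far):

```c
#include <sqlite3.h>

/* Hypothetical: a record drained from the in-memory receive queue. */
struct record { const char *json; };

/* Hypothetical: blocks until at least one record is available, then moves
 * up to max records from the queue into out; returns the count. */
int drain_queue(struct record *out, int max);

void *writer_thread(void *arg)
{
    sqlite3 *db = arg;                   /* SQLite as a stand-in for any SQL store */
    sqlite3_stmt *ins;
    sqlite3_prepare_v2(db, "INSERT INTO events(doc) VALUES (?)", -1, &ins, NULL);

    struct record batch[500];
    for (;;) {
        int n = drain_queue(batch, 500);               /* one batch per transaction */
        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        for (int i = 0; i < n; i++) {
            sqlite3_bind_text(ins, 1, batch[i].json, -1, SQLITE_STATIC);
            sqlite3_step(ins);
            sqlite3_reset(ins);
        }
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);  /* one fsync for the batch */
    }
    return NULL;                                       /* not reached */
}
```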

Is it possible for a server to get two requests at the exact same time? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 9 years ago.
Improve this question
I'm a bit confused about how a server handles thousands of requests. I'd imagine that if there are thousands of requests, eventually two of them would have to hit the server at the exact same time. If two messages hit it at the same time, it can only handle one, right?
I know that if multiple requests hit a server at very close, but not identical, times, it can distribute the work to other servers very quickly before it gets another job to do. What I don't get is what happens when a machine gets two requests at precisely the same moment.
Even if two CPUs are handling requests at exactly the same time and they conflict, at some stage each will say "I want to do something that can't be done while anything else is happening." If two CPUs do this at exactly the same time, there is a predefined order; I would imagine the lower-numbered CPU gets priority.
This blocking only takes a cycle or two - just long enough to say "I lay claim to this resource" (known as a lock). The lock is then released, and the other CPU can resume (as long as it doesn't want to access the same data).
This is hideously oversimplified.
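For a flavor of what "laying claim" looks like in code, here is a minimal C11 test-and-set spinlock (an illustration of the idea only, not how production servers should lock):

```c
#include <stdatomic.h>

static atomic_flag claim = ATOMIC_FLAG_INIT;

void lock(void)
{
    /* Atomic test-and-set: if several CPUs try at once, the hardware
     * serializes the attempts and exactly one wins; the rest spin here. */
    while (atomic_flag_test_and_set(&claim))
        ;                       /* spin until the holder calls unlock() */
}

void unlock(void)
{
    atomic_flag_clear(&claim);
}
```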
Also, it is not possible for two requests to "come in at the same time" over the network if there is only one network card - if they arrive at EXACTLY the same time (on a gigabit network), the two packets will collide; both requests will be retried after a slightly random delay until there is no collision.
Remember, this is all running at the clock speed of your computer (e.g. up to 3 GHz), so "thousands per second" is nothing. Even one million requests per second would still leave roughly 3,000 CPU cycles per request at 3 GHz.
A server can be iterative, i.e. it iterates through the clients and serves one request at a time. Alternatively, a server can handle multiple clients at the same time in parallel; this type of server is called a concurrent server. Servers also keep up with many simultaneous users through two other important mechanisms: web caching and load balancing.
Web cache: a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met. Google's cache link in its search results provides a way of retrieving information from websites that have recently gone down, and a way of retrieving data more quickly than by clicking the direct link. (From Wikipedia.)
Load balancing: a computer-networking method for distributing workloads across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources. Successful load balancing optimizes resource use, maximizes throughput, minimizes response time, and avoids overload. Using multiple components with load balancing instead of a single component may increase reliability through redundancy. Load balancing is usually provided by dedicated software or hardware, such as a multilayer switch or a Domain Name System server process. (From Wikipedia.)

Implementing multithreaded application under C

I am implementing a small database, like MySQL. It's part of a larger project.
Right now I have designed the core database, by which I mean I have implemented a parser and I can now execute some basic SQL queries on my database. It can store, update, delete, and retrieve data from files. As of now that's fine; however, I want to put it on the network.
I want more than one user to be able to access my database server and execute queries on it at the same time. I am working under Linux, so there is no issue of portability right now.
I know I need to use sockets, which is fine. I also know that I need to use a concept like a thread pool, where I create a maximum number of threads initially and then, for each client request, wake up a thread and assign it to the client.
What I am unable to figure out is how all this actually gets bundled together. Where should I implement multithreading: on the client side or the server side? How is my parser going to be configured to take input from each of the clients separately (mostly via files, I think)?
If anyone has an idea about how I can implement this, please do tell me, because I am stuck at this point in the project...
Thanks.. :)
If you haven't already, take a look at Beej's Guide to Network Programming to get your hands dirty with some socket programming.
Next I would take his example of a stream client and server and use that as a single-threaded query system. Once you've got this down, you'll need to choose whether to actually use threads or to use select(). My gut says your on-disk database doesn't yet support parallel writes (maybe parallel reads), so a single server thread servicing requests is likely your best bet for starters!
In the multiple-client model, you could keep a simple per-socket hashtable of client information and return results immediately as you process each query. Once you combine threading with the networking and DB queries, it can get pretty complicated, so work up from a single client, add polling for multiple clients, and then start reading up on and tackling threaded (probably with pthreads) client-server models.
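As a starting-point sketch, the single-threaded, multiple-client step with select() might look like this (run_query stands in for your parser/executor; error handling trimmed):

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical: hands the raw query text to your parser/executor. */
void run_query(int fd, const char *buf, size_t len);

void serve(int listen_fd)
{
    fd_set master;
    FD_ZERO(&master);
    FD_SET(listen_fd, &master);
    int maxfd = listen_fd;

    for (;;) {
        fd_set readable = master;          /* select() mutates its argument */
        select(maxfd + 1, &readable, NULL, NULL, NULL);

        for (int fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &readable)) continue;
            if (fd == listen_fd) {                     /* new client */
                int c = accept(listen_fd, NULL, NULL);
                FD_SET(c, &master);
                if (c > maxfd) maxfd = c;
            } else {                                   /* query from a client */
                char buf[4096];
                ssize_t r = recv(fd, buf, sizeof buf, 0);
                if (r <= 0) { close(fd); FD_CLR(fd, &master); }
                else run_query(fd, buf, (size_t)r);    /* one query at a time */
            }
        }
    }
}
```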
Server side, as the server is the only party that can understand the information. You need to design locks, or come up with your own model, to make sure that modifications don't affect the clients currently being served.
As an alternative to multithreading, you might consider an event-based, single-threaded approach (e.g. using poll or epoll). An example of a very fast (non-SQL) database that uses exactly this approach is Redis.
This design has two obvious disadvantages: you only ever use a single CPU core, and a lengthy query will block other clients for a noticeable time. However, if queries are reasonably fast, nobody will notice.
On the other hand, the single-threaded design has the advantage of automatically serializing requests. There are no ambiguities and no locking needed. No write can come in between a read (or another write); it just can't happen.
If you don't have something like a robust, working MVCC built into your database (or are at least working on one), knowing that you need not worry can be a huge advantage. Concurrent reads are not so much an issue, but concurrent reads and writes are.
Alternatively, you might consider doing the input/output and syntax checking in one thread and running the actual queries in another (queries passed via a queue). That, too, removes the synchronization woes, and it at least offers some latency hiding and some use of a second core.
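A sketch of that split (the queue functions and request helpers are hypothetical; the queue would be a thread-safe structure like the mutex/condvar one sketched earlier on this page):

```c
#include <pthread.h>

struct query { int client_fd; char *sql; };

/* Hypothetical thread-safe queue (mutex + condition variable). */
void          q_push(struct query *q);
struct query *q_pop(void);

/* Hypothetical helpers from the existing single-threaded server. */
struct query *read_and_parse_next_request(void);  /* I/O + syntax check */
void          execute_and_reply(struct query *q); /* runs the query */

/* Thread 1: network I/O and syntax checking only. */
static void *io_thread(void *arg)
{
    (void)arg;
    for (;;) {
        struct query *q = read_and_parse_next_request();
        if (q) q_push(q);      /* only well-formed queries reach the executor */
    }
    return NULL;
}

/* Thread 2: executes queries one at a time, so all reads and writes
 * against the data files stay naturally serialized. */
static void *query_thread(void *arg)
{
    (void)arg;
    for (;;)
        execute_and_reply(q_pop());
}
```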

Is network event based programming really better...? [closed]

With event-based programming you essentially just loop and poll, loop and poll... why is this preferred to just blocking? If you're not receiving any events, why would you prefer select() over just blocking on accept()?
Normally this question is posed as if event-based network programming makes better use of resources than thread-based network programming. The general answer is theoretically no, but practically yes. I recall a paper written by one of the founders of Inktomi, whose product later became Apache Traffic Server (Traffic Server is an event-based HTTP proxy). Basically, the conclusion was that userspace threads could be as fast as an event-based model. They felt that context switching would always make OS-level threads slower than event models, and at the time there were no production-ready userspace threading models that could compete with event-based models. Finally, they indicated that the conceptual overhead of an event-based model over a thread-based model was significant in a large-scale application. You have already noticed this.
It is much simpler to have a bunch of threads, each handling a whole connection's lifetime, than to have an event loop dispatching work based on when some part of the process has to block, when a timer goes off, or who knows what other events. Sadly, at this time, the more complicated approach is the faster one.
Note: sorry for not posting a link to the paper, but I cannot seem to find an online source right now. I will try to edit this post with a link later
"better" depends on what you need.
With event-based (select/poll/epoll/etc.) I/O, you can listen for events on many (thousands of) sockets in one thread. This can vastly improve scalability compared with using one thread per socket doing blocking operations.
With blocking reads/writes/accepts, you can't service several clients concurrently in one thread; you'll have to use at least one thread or process per connection. The drawback is that this does not scale as well as event-based I/O; however, the programming model becomes much easier.
Sometimes you'll need to call APIs (e.g. to query a backend database) that only provide a blocking interface. In such a case, you'll block every other client if you make the call inside an event-based I/O loop, and you'll basically have to resort to a thread per client anyway. If you need scalability in such cases, it's common to couple an event loop with a worker thread pool, which can make the programming model even harder (one way to wire this up is sketched below).
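On Linux, one way to wire that up is to let workers signal the event loop through an eventfd that sits in the same epoll/select set as the sockets. A sketch (assuming the worker pool and a completion queue already exist):

```c
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

/* Created once at startup, e.g. completion_fd = eventfd(0, EFD_NONBLOCK),
 * and registered in the epoll/select set like any client socket. */
int completion_fd;

/* Worker thread: after finishing a blocking call (e.g. a database query),
 * park the result on the completion queue and nudge the event loop awake. */
void worker_done(void)
{
    uint64_t one = 1;
    write(completion_fd, &one, sizeof one);   /* wakes epoll_wait()/select() */
}

/* Event loop: when completion_fd turns readable, drain the counter and
 * write out the finished responses - the loop itself never blocked. */
void on_completion_ready(void)
{
    uint64_t n;
    read(completion_fd, &n, sizeof n);        /* n = number of completions */
    /* pop n results from the completion queue and reply to their clients */
}
```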
You can use blocking sync IO when:
You have only one socket active
Your application doesn't do anything but react to events on the socket
You don't plan to extend the application
You don't mind that the user has to kill the application to exit
If any of these are false, you're likely better off with a polling loop.
It isn't a good idea to let the program block on accept. It means that no other operations can be performed until your application receives some data.
Even if your network requirements are very simple and you don't need to send or receive any data other than what the blocked socket is waiting for, you can't even update your GUI (if you have one) or receive input from the user.
Generally, the question comes down to: select or threads? Threads are hard to debug and can create problems with concurrent operations, so unless you explicitly need threads, I'd suggest using select.
