Which is the better way to change the RLIMIT_NPROC value - c

My application creates one thread per connection. It runs under a non-root user ID, and sometimes the number of threads exceeds the default limit of 1024. I want to raise this limit, and I see a few options:
Run as root (a very bad idea that compromises security, so I am dropping it).
Run as an unprivileged user, use setcap to give the binary the CAP_SYS_RESOURCE capability, and then add code like this to my program:
struct rlimit rlp; /* will initialize this later with the value of nprocs (maximum number of desired threads) */
setrlimit(RLIMIT_NPROC, &rlp);
/* RLIMIT_NPROC
 * The maximum number of processes (or, more precisely on Linux, threads) that can
 * be created for the real user ID of the
 * calling process. Upon encountering this limit, fork(2) fails with the error
 * EAGAIN. */
The other option is editing /etc/security/limits.conf, where I can simply add entries for the development user, e.g.:
#devuser hard nproc 20000
#devuser soft nproc 10000
where 10k is enough. Since I am a little reluctant to change the source code, should I proceed with the last option? I am also curious to know which approach is more robust and standard.
I am seeking your opinions, and thank you in advance :)
PS: What will happen if a single process runs more than 1k threads? Of course, I also have 32GB of RAM.

First, I believe you are wrong in having nearly a thousand threads. Threads are quite costly, and it is usually not reasonable to have so many of them. I would suggest having a few dozen threads at most (unless you run on a very costly supercomputer).
You could have some event loop around a multiplexing syscall like poll(2). Then a single thread can deal with many thousands of connections. Read about the C10K problem and epoll. Consider using some event libraries like libevent or libev etc...
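As a rough sketch of one iteration of such an event loop, assuming plain poll(2) and blocking reads: a single thread waits on a whole array of descriptors and services only the ones that are ready. The name service_ready and the 4096-byte buffer are illustrative choices, not something from a particular library.

```c
#include <poll.h>
#include <unistd.h>

/* One iteration of a poll(2)-based event loop.
 * Waits up to timeout_ms for any descriptor in fds to become readable,
 * reads once from each ready descriptor, and returns how many were serviced.
 * Returns 0 on timeout and -1 if poll() itself failed (errno is set). */
int service_ready(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int ready = poll(fds, nfds, timeout_ms);
    if (ready <= 0)
        return ready;

    int serviced = 0;
    for (nfds_t i = 0; i < nfds; i++) {
        if (fds[i].revents & (POLLIN | POLLHUP)) {
            char buf[4096];
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n <= 0) {
                /* Peer closed or error: drop the connection.
                 * poll() ignores entries whose fd is negative. */
                close(fds[i].fd);
                fds[i].fd = -1;
            }
            /* Otherwise buf[0..n) would be handed to the protocol handler. */
            serviced++;
        }
    }
    return serviced;
}
```

A server would call this in a loop, adding newly accepted sockets to the array with events set to POLLIN. For very large descriptor counts on Linux, epoll scales better than rescanning the whole array on every call.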
You could start your application as root (perhaps by using setuid techniques), set up the required resources (in particular, opening privileged TCP/IP ports), and then change the user with setreuid(2).
Read Advanced Linux Programming...
You could also wrap your application in a tiny setuid C program which increases the limits using setrlimit(2), changes the user with setreuid(2), and finally calls execve(2) on your real program.
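A minimal sketch of just the setrlimit(2) step (not the full setuid wrapper): the helper below raises the soft RLIMIT_NPROC toward a requested value, capped at the current hard limit. Raising the soft limit up to the hard limit needs no privilege; raising the hard limit itself requires CAP_SYS_RESOURCE or root. The function name raise_nproc_soft_limit is made up for illustration.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NPROC to `wanted`, capped at the hard limit.
 * Never lowers the limit. Returns 0 on success, -1 on failure (errno set). */
int raise_nproc_soft_limit(rlim_t wanted)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;

    /* An unprivileged process cannot exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max;

    /* Nothing to do if the soft limit is already high enough. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
        return 0;

    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

Called early in main(), for example as raise_nproc_soft_limit(10000), this gets the process as close to the desired value as the hard limit allows. The hard limit would still have to be set by an administrator or by a privileged wrapper.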

Related

How to choose for multithreading - c

I have to write a client-server program in C where the server can use n threads that work simultaneously to handle client requests.
To do this, I use a listening socket that puts the new FD (of each new connection request) in a list, and the threads take an FD from it when they are free.
I know that I can also use a pipe for communication between threads.
Is the socket the best way? Why or why not?
Sorry for my bad English
To communicate between threads you can use sockets as well as shared memory.
There are many multithreading libraries available on GitHub; one that I have used is:
https://github.com/snikulov/prog_posix_threads/blob/master/workq.c
I tried and tested it for the same use case you describe, and it works perfectly!
There's one very nice resource related to socket multiplexing which I think you should stop and read after reading this answer. That resource is entitled The C10K problem, and it details numerous solutions to the problem people faced in the year 2000, of handling 10000 clients.
Of those solutions, multithreading is not the primary one. Indeed, multithreading as an optimisation should be one of your last resorts, as that optimisation will interfere with the instruments you use to diagnose other optimisations.
In general, here is how you should perform optimisations, in order to provide guaranteed justifications:
Use a profiler to determine the most significant bottlenecks (in your single-threaded program).
Perform your optimisation upon one of the more significant bottlenecks.
Use the profiler again, with the same set of data, to verify that your optimisation worked correctly.
You can repeat these steps ad infinitum until you decide the improvements are no longer tangible (meaning, good luck observing the differences between before and after). Following these steps will provide you with data you can show your employer, if he/she asks you what you've been doing for the last hour, so make sure you save the output of your profiler at each iteration.
Optimisations are per-machine; what this means is that an optimisation for your machine might actually be slower on another machine. For example, you may use a buffer of 4096 bytes for your machine, while the cache lines for another machine might indicate that 512 bytes is a better idea.
Hence, ideally, we should design programs and modules in such a way that their resources are minimal and can easily be scaled up, substituted and/or otherwise adjusted for other machines. This can be difficult, as it means in the buffer example above you might start off with a buffer of one byte; you'd most likely need to study finite state machines to achieve that, and using buffers of one byte might not always be technically feasible (i.e. when dealing with fields that are guaranteed to be a certain width; you should use that width as your minimum limit, and scale up from there). The reward is code that is ultra-portable and ultra-optimisable in all situations.
Keep in mind that extra threads use extra resources; we tend to assume that the stack space reserved for a thread can grow to 1MB, so 10000 sockets occupying 10000 threads (in a thread-per-socket model) would occupy about 10GB of memory! Yikes! The minimal resources method suggests that we should start off with one thread, and scale up from there, using a multithreading profiler to measure performance like in the three steps above.
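The arithmetic behind that estimate, as a small sketch. The 1 MiB per-thread stack figure is the assumption used in the text; actual defaults vary by platform (on Linux, ulimit -s commonly reports 8 MiB), and this counts reserved address space rather than memory actually touched. Both function names are illustrative.

```c
#include <stdint.h>

/* Address space reserved for thread stacks in a thread-per-socket model. */
uint64_t reserved_stack_bytes(uint64_t threads, uint64_t stack_bytes_per_thread)
{
    return threads * stack_bytes_per_thread;
}

/* How many thread stacks of a given size fit into a memory budget. */
uint64_t max_threads_for_budget(uint64_t budget_bytes, uint64_t stack_bytes_per_thread)
{
    if (stack_bytes_per_thread == 0)
        return 0;
    return budget_bytes / stack_bytes_per_thread;
}
```

With 10000 threads and 1 MiB each, reserved_stack_bytes returns 10485760000 bytes, which is the "about 10GB" quoted above.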
I think you'll find, though, that for anything purely socket-driven, you likely won't need more than one thread, even for 10000 clients, if you study the C10K problem or use some library which has been engineered based on those findings (see your comments for one such suggestion). We're not talking about masses of number crunching, here; we're talking about socket operations, which the kernel likely processes using a single core, and so you can likely match that single core with a single thread, and avoid any context switching or thread synchronisation troubles/overheads incurred by multithreading.

How to get the fastest data processing way: fork or/and multithreading

Imagine that we have a client that keeps sending lots of double values.
Now we are trying to make a server, which can receive and process the data from the client.
Here is the fact:
The server can receive a double in a very short time.
There is a function to process a double at the server, which needs more than 3 min to process only one double.
We need the server to process 1000 doubles from the client as fast as possible.
My idea is as follows:
Use a thread pool with many threads, where each thread processes one double.
All of these are in Linux.
My question:
For now my server is just one process that contains multiple threads. I'm wondering whether using fork() would be faster.
I think using only fork() without multithreading would be a bad idea, but what if I create two processes and each of them contains multiple threads? Can this method be faster?
Btw I have read:
What is the difference between fork and thread?
Forking vs Threading
To a certain degree, this very much depends on the underlying hardware. It also depends on memory constraints, IO throughput, ...
Example: if your CPU has 4 cores, and each one is able to run two threads (and not much else is going on on that system); then you probably would prefer to have a solution with 4 processes; each one running two threads!
Or, when working with fork(), you would fork() 4 times; but within each of the forked processes, you should be distributing your work to two threads.
Long story short, what you really want to do is: to not lock yourself into some corner. You want to create a service (as said, you are building a server, not a client) that has a sound and reasonable design.
And given your requirements, you want to build that application in a way that allows you to configure how many processes or threads it will be using. Then you start profiling (meaning: you measure what is going on); maybe you do experiments to find the optimum for a given piece of hardware / OS stack.
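One way to keep the worker count configurable, sketched under the assumption that the setting arrives as a string (from a command-line argument or an environment variable, for instance). When it is missing or invalid, the helper falls back to the number of online cores reported by sysconf(_SC_NPROCESSORS_ONLN), which is available on Linux and most Unix-like systems. The name worker_count is illustrative.

```c
#include <stdlib.h>
#include <unistd.h>

/* Decide how many workers to start.
 * `configured` is an optional decimal string; a positive value wins.
 * Otherwise use the number of online cores, and never return less than 1. */
long worker_count(const char *configured)
{
    if (configured != NULL && *configured != '\0') {
        char *end;
        long n = strtol(configured, &end, 10);
        if (*end == '\0' && n > 0)
            return n;
    }
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return cores > 0 ? cores : 1;
}
```

The same binary can then be measured with different settings, for example with worker_count(getenv("WORKERS")) where WORKERS is a hypothetical variable name, without recompiling between experiments.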
EDIT: I feel tempted to say - welcome to the real world. You are facing the requirement to meet precise "performance goals" for your product. Without such goals, programmer life is pretty easy: most of the time, one just sits down, puts together a reasonable product and, given the power of today's hardware, "things are good enough".
But if things are not good enough, then there is only one way: you have to learn about all those things that play a role here, starting with questions like "which system calls in my OS can I use to get the correct number of cores/threads?"
In other words: the days in which you "got away" without knowing about the exact capacity of the hardware you are using ... are over. If you intend to "play this game"; then there are no detours: you will have to learn the rules!
Finally: the most important thing here is not about processes versus threads. You have to understand that you need to grasp the whole picture here. It doesn't help if you tune your client for maximum CPU performance ... to then find that network or IO issues cause 10x of "loss" compared to what you gained by looking at CPU only. In other words: you have to look at all the pieces in your system; and then you need to measure to understand where you have bottlenecks. And then you decide the actions to take!
One good read on that topic is "Release It!" by Michael Nygard. Of course his book is mainly about patterns in the Java world, but he does a great job of explaining what "performance" really means.
Forking as such is way slower than kicking off a thread. A thread is much more lightweight (traditionally, although processes have caught up in recent years) than a full OS process, not only in terms of CPU requirements, but also with regard to memory footprint and general OS overhead.
As you are thinking about a pre-arranged pool of threads or processes, setup time would not matter much during the runtime of your program, so you need to look into the cost of interprocess communication, which is (locally) generally cheaper between threads than between processes (threads do not need to go through the OS to exchange data, only for synchronisation, and in some cases you can even get away without that). Unfortunately, you do not state whether there is any need for IPC between worker threads.
Summed up: I cannot see any advantage of using fork(), at least not with regards to efficiency.

Fork()ing and running on a specific set of CPUs

I have a parent process, which I use to spawn a series of child processes that each run their own program sequentially. Each of these programs changes a file over time. I want to read the data from this file and see how it changes as each program runs.
I need two sets of data for this to work: the value of the file at some set interval (I haven't decided on the interval yet), and the time each program takes to run. There are other variables which can influence the execution times of these programs, which I also want to see.
So I figured that, to get more accurate timing of the child process while still reading from a file, I could run them on different cores. I have 8 cores, and I would like to run the parent process on cores 0-3, then fork the child to run on cores 4-7. I'm not sure whether this is possible within C, though, and a search around hasn't yielded any answers, which makes me think it isn't.
Within Linux, outside of a program, I can use taskset to do this.
I plan on setting aside 4 of the cores using the kernel parameter isolcpus. I want as little noise as possible while running the child programs.
Asking the kernel to associate CPU cores with threads or processes is also known as setting the "affinity" between the core and the process/thread.
Under Linux, there exists a set of functions that provide this capability. Take a look at the manual page for one of the functions...
man pthread_setaffinity_np
This family of API calls might be able to give you what you need.
That man page has a "see also" section that links to the other functions in this family.
Typically, with features such as these that deal with kernel process and thread scheduling, whether your requests are met or ignored depends entirely on what mood the kernel is in at the time. Your mileage may vary due to system load or the number of available cores. Even if a system has 16 cores, these features may be disabled in the kernel compilation settings (think virtual machines). Equally, you may find that there are additional options you can add to your kernel to get better results than the defaults.
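For whole processes rather than individual threads, the related Linux call is sched_setaffinity(2). A minimal sketch, assuming glibc on Linux (the CPU_* macros need _GNU_SOURCE); the helper names are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to the listed CPUs.
 * Returns 0 on success, -1 on failure (errno set). */
int pin_self_to_cpus(const int *cpus, int ncpus)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < ncpus; i++)
        CPU_SET(cpus[i], &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

For the setup in the question, the child would call pin_self_to_cpus with {4, 5, 6, 7} right after fork() and before exec. The affinity mask is inherited across fork(2) and preserved across execve(2), so the program being timed stays on those cores.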

Allocating a lot of file descriptors

I am interested in bringing a system down (for, say, 15 minutes) by allocating a lot of file descriptors and causing an out-of-file-descriptors failure. (Don't worry, I am not trying to hack into anything. This is for testing a service I am writing, to see how it behaves when other programs misbehave.) Are there any best practices for that? Should I just keep calling fopen() in an infinite for loop and kill the process after 15 minutes? Does anybody have experience with this?
Update: I am running Linux and the program I am writing will have super user privileges.
Thanks,
~yogi
Did you consider lowering the file descriptor limit (RLIMIT_NOFILE, via setrlimit) before running your program?
This can be done simply with the bash ulimit -n builtin, in the same shell where you test your application, e.g.:
ulimit -n 32
This won't much perturb the many other services already running. Lowering that limit will make your application (run in the same shell) hit it quickly, which is what you want for testing purposes.
At the system-wide level you might also write into /proc/sys/fs/file-max, e.g. with
echo 1024 > /proc/sys/fs/file-max
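The per-process variant can also be driven from inside a test program. A rough sketch, assuming Linux or another POSIX system with /dev/null: it lowers the soft RLIMIT_NOFILE, opens descriptors until open(2) fails, then cleans up and restores the old limit. The function name and the 1024-entry cap are illustrative choices.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Lower the soft RLIMIT_NOFILE to `limit`, open /dev/null until open() fails,
 * then close everything and restore the previous soft limit.
 * Returns the number of descriptors opened, or -1 if the limit could not be
 * changed. *err_out (must not be NULL) receives errno from the failing open(),
 * or 0 if the 1024-descriptor cap was reached first. */
int exhaust_descriptors(rlim_t limit, int *err_out)
{
    struct rlimit old, lowered;
    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -1;
    lowered = old;
    lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    int fds[1024];
    int opened = 0;
    *err_out = 0;
    while (opened < 1024) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            *err_out = errno;   /* EMFILE once the per-process limit is hit */
            break;
        }
        fds[opened++] = fd;
    }

    for (int i = 0; i < opened; i++)
        close(fds[i]);
    setrlimit(RLIMIT_NOFILE, &old);   /* raising soft back up to <= hard is allowed */
    return opened;
}
```

With a limit of 32 and the three standard streams open, this typically stops after 29 descriptors with EMFILE. That is the error the service under test should be prepared to handle.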
It depends on the OS implementation, but calling fopen on the same file from the same process may not allocate a new file descriptor and may instead just increment a reference counter.
I would recommend that you read something about stress testing.
Here is some usable software (you didn't tag any OS platform):
http://www.opensourcetesting.org/performance.php
I had this happen once in normal use. I believe you run out of inodes in Linux. I don't know a faster way than just opening files. Just be careful; we locked our system up. It was a while ago, so I don't remember what was trying to open a file, but things generally assume they can get a file handle and don't behave as well as they should when they can't. ~Ben
My 2 cents:
1. Write a program that creates a lot of file descriptors. You can achieve this by one of the following methods:
(a) Opening a lot of different files in your code
(b) Opening a lot of socket descriptors
(c) Creating a lot of threads
2. Now, keep spawning multiple instances of the program created in step 1 (i.e. create multiple processes) using a shell script or something similar.
Note:
In Linux, as in most other operating systems, there is a limit on the number of file descriptors per process (in Linux I believe it is 1024 by default; you can check it using ulimit -a). So your process will just fail when you do this. I am really not sure that you can bring the system down just by increasing file descriptor usage.
You can use mkstemp to get file descriptors of temporary files.
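A minimal sketch of that idea. The helper name and the /tmp/fdtest- prefix are made up for illustration; mkstemp(3) replaces the trailing XXXXXX with a unique suffix and returns an open descriptor.

```c
#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file and return its descriptor, or -1 on failure.
 * The name is unlinked immediately, so the file vanishes once the
 * descriptor is closed, while the descriptor itself stays usable. */
int open_unlinked_temp(void)
{
    char path[] = "/tmp/fdtest-XXXXXX";   /* template must end in XXXXXX */
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}
```

Calling this in a loop consumes one descriptor per call without leaving files behind on disk.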

Simulating file system access

I am designing a file system in user space and need to test it. I do not want to use the available benchmarking tools, as my requirements are different. So, to test the file system, I wish to simulate file access operations. To do this, I first use the ftw() function to walk through one of my existing (experimental) file systems and list all the files and directories in a file.
Then I invoke a simulator to simulate file access by a number of processes. The simulator randomly starts a process, i.e. it spawns a thread which does what a real process would have done. The thread randomly selects a file operation (read, write, rename, etc.) and selects arguments for this operation from the list generated by ftw(). The thread performs a number of such file operations and then exits, marking the end of a process. The simulator continues to spawn threads; thread execution can overlap just as real processes do. As operations are performed by threads, files get inserted, deleted, and renamed, and the list of files is updated accordingly.
I have not yet started coding. Does the plan seem sane? I am also not sure how to code the simulator: how will it spawn threads over a period of time? Should I be using some random delay to do this?
Thanks
Yep, that seems fairly reasonable to me. I would consider attempting to impose a statistical distribution over your file operations (and accesses to particular files) that is somehow matched to your expected workload. You might be able to find some statistics about typical filesystem workloads as a starting point.
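One simple way to impose such a distribution is a weighted pick over the operation types. The sketch below assumes four operations and takes the weights as a parameter; the enum, the function name, and any concrete weights are placeholders to be replaced with figures matched to the expected workload.

```c
/* Hypothetical set of simulated file operations. */
enum fs_op { OP_READ, OP_WRITE, OP_RENAME, OP_DELETE, OP_COUNT };

/* Pick an operation according to integer weights.
 * `r` must be a uniform random number in [0, sum of weights).
 * If r is out of range, OP_READ is returned as a fallback. */
enum fs_op pick_op(const unsigned weights[OP_COUNT], unsigned r)
{
    unsigned cumulative = 0;
    for (int op = 0; op < OP_COUNT; op++) {
        cumulative += weights[op];
        if (r < cumulative)
            return (enum fs_op)op;
    }
    return OP_READ;
}
```

With example weights of {70, 25, 4, 1} (made up here), a worker thread could call pick_op(weights, rand() % 100) to get reads about 70% of the time and deletes about 1% of the time. The same scheme works for choosing which file to touch.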
That sounds about right for a decent test case just to make sure it's working. You could use sleep() to wait between spawning threads, or just spawn them all at once and have each one do an operation, wait a bit, then do another operation, and so on. IMO, if you hit it hard with a lot of requests and it works, then there's a good chance your filesystem will do just fine. Take an example from PostMark, which does nothing but append like crazy to different files, and from other benchmarks that do random-access reads and writes in different locations to make sure that the page has to be read from disk.

Resources