The following program produces no output. It enters the for loop and reads one value (via scanf), but after that the program seems to stop executing. Ideone (an online compiler and debugging tool) says that a SIGXCPU signal is generated.
#include <stdio.h>
#include <stdlib.h>

long fact(int);
int z(int);

int main()
{
    int i, n;
    scanf("%d", &n);
    int a[10];
    long b[10];
    int c[10];
    for (i = 0; i < n; i++)
    {
        scanf("%d", &a[i]);
        b[i] = fact(a[i]);
        c[i] = z(b[i]);
    }
    printf("\n");
    for (i = 0; i < n; i++)
    {
        printf("%d", c[i]);
    }
    return 0;
}

long fact(int m)
{
    if (m == 1) return 1;
    else return (m * fact(m - 1));
}

int z(int s)
{
    int c = 0, temp;
    temp = s % 10;
    if (temp != 0) return c;
    else
    {
        c++;
        z(temp);
    }
}
What does the SIGXCPU signal mean?
The SIGXCPU signal is sent each second to a process after it exceeds its limit on consumed processor time (RLIMIT_CPU), or, for realtime processes, its limit on running without sleeping.
The problem here is with your recursive z function: whenever its argument ends in 0 it calls z(0), which calls z(0) again, and so on, so the recursion never terminates (it either overflows the stack or, if the compiler turns the call into a loop, burns CPU time until the limit is hit). It also falls off the end of the else branch without returning a value. Fix its stop condition.
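Judging from how it is used (c[i] = z(b[i]) on a factorial), z is presumably meant to count trailing zeros. A minimal terminating sketch of that idea (my guess at the intent, not the original code):

int z(int s)
{
    if (s % 10 != 0 || s == 0)   // stop: last digit is non-zero (or nothing left)
        return 0;
    return 1 + z(s / 10);        // strip one trailing zero and count the rest
}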
From the signal man page:
Signal | Default Action | Description
-------+----------------+-------------------------
SIGXCPU| A | CPU time limit exceeded.
The default actions are as follows:
A - Abnormal termination of the process.
Additionally, implementation-defined abnormal termination actions,
such as creation of a core file, may occur.
Your program is probably consuming more CPU time (and/or related resources) than ideone will allow; the limit is there so that your program doesn't overload the site.
Check your recursion and termination conditions.
If you find no problem in your recursion and you genuinely need more CPU time, you can raise the CPU time limit (RLIMIT_CPU) with setrlimit().
In this example we give the process 15 seconds of CPU time. Set whatever value you need, but keep it as low as possible: a process that is allowed to monopolise the CPU for a long time hurts the multitasking and responsiveness of the system.
#include <sys/resource.h>   // for struct rlimit, getrlimit(), setrlimit()

struct rlimit v;
getrlimit(RLIMIT_CPU, &v);  // keep the existing hard limit
v.rlim_cur = 15;            // set the soft limit to 15 sec
setrlimit(RLIMIT_CPU, &v);
After this call the process has a 15-second CPU-time budget.
Hope you understand.
I'm taking an operating system class and my professor gave us this homework.
"Place __asm mfence in a proper position."
This problem is about using multiple threads and their side effects.
The main thread is incrementing shared_var while thread_1 is doing the same thing at the same time.
As a result, shared_var ends up as 199048359.000 even though the code increments it 200000000 times.
The professor said __asm mfence will solve this issue. But, I do not know where to place it.
I've tried searching for the problem on Google, GitHub and here, but I cannot find a source.
I don't know whether this is a stupid question, because I'm not majoring in computer science.
Also, I would like to know why this code shows 199948358.0000 and not 200000000.00.
Any help would be greatly appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <conio.h>
int turn;
int interested[2];
void EnterRegion(int process);
void LeaveRegion(int process);
DWORD WINAPI thread_func_1(LPVOID lpParam);
volatile double shared_var = 0.0;
volatile int job_complete[2] = {0, 0};
int main(void)
{
DWORD dwThreadId_1, dwThrdParam_1 = 1;
HANDLE hThread_1;
int i, j;
// Create Thread 1
hThread_1 = CreateThread(
NULL, // default security attributes
0, // use default stack size
thread_func_1, // thread function
&dwThrdParam_1, // argument to thread function
0, // use default creation flags
&dwThreadId_1
); // returns the thread identifier
// Check the return value for success.
if (hThread_1 == NULL)
{
printf("Thread 1 creation error\n");
exit(0);
}
else
{
CloseHandle( hThread_1 );
}
/* I am main thread */
/* Now Main Thread and Thread 1 runs concurrently */
for (i = 0; i < 10000; i++)
{
for (j = 0; j < 10000; j++)
{
EnterRegion(0);
shared_var++;
LeaveRegion(0);
}
}
printf("Main Thread completed\n");
job_complete[0] = 1;
while (job_complete[1] == 0) ;
printf("%f\n", shared_var);
_getch();
ExitProcess(0);
}
DWORD WINAPI thread_func_1(LPVOID lpParam)
{
int i, j;
for (i = 0; i < 10000; i++) {
for (j = 0; j < 10000; j++)
{
EnterRegion(1);
shared_var++;
LeaveRegion(1);
}
}
printf("Thread_1 completed\n");
job_complete[1] = 1;
ExitThread(0);
}
void EnterRegion(int process)
{
_asm mfence;
int other;
other = 1 - process;
interested[process] = TRUE;
turn = process;
while (turn == process && interested[other] == TRUE) {}
_asm mfence;
}
void LeaveRegion(int process)
{
_asm mfence;
interested[process] = FALSE;
_asm mfence;
}
The EnterRegion() and LeaveRegion() functions are implementing a critical region using a thing called "Peterson's algorithm".
Now, the key to Peterson's algorithm is that when a thread reads turn it must get the latest (most recent) value written by any thread. That is, operations on turn must be Sequentially Consistent. Also, the write to interested[] in EnterRegion() must become visible to all threads before (or at the same time) as the write to turn.
So the place to put the mfence is after the turn = process ; -- so that the thread does not proceed until its write to turn is visible to all other threads.
It is also important to persuade the compiler to read turn and interested[] from memory every time they are used, so you should declare them volatile.
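A minimal sketch of that placement, reusing the question's MSVC-style _asm syntax (this is my illustration, not the original answer's code; on x64 MSVC, which has no inline assembly, an intrinsic such as _mm_mfence() would be needed instead):

volatile int turn;
volatile int interested[2];

void EnterRegion(int process)
{
    int other = 1 - process;
    interested[process] = TRUE;   // announce interest first
    turn = process;               // then give way to the other thread
    _asm mfence;                  // wait until both writes above are visible ...
    while (turn == process && interested[other] == TRUE) {}   // ... before spinning on the reads
}

The fence sits between the write to turn and the reads in the spin loop, which is exactly the ordering Peterson's algorithm needs.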
If you are writing this for x86 or x86_64, that is sufficient -- because they are generally "well behaved", so that:
all the writes to turn and interested[process] will occur in program order
all the reads of turn and interested[other] will also occur in program order
and setting those volatile ensures that the compiler doesn't fiddle with the order, either.
The reason for using the mfence on the x86 and x86_64 in this case is to flush the write queue to memory before proceeding to read the turn value. So, all memory writes go into a queue, and at some time in the future each write will reach actual memory, and the effect of the write will become visible to other threads -- the write has "completed". Writes "complete" in the same order the program did them, but delayed. If the thread reads something it has written recently, the processor will pick the (most recent) value out of the write queue. This means that the thread does not need to wait until the write "completes", which is generally a Good Thing. However, it does mean that the thread is not reading the same value that any other thread will read, at least until the write does "complete". What the mfence does is to stall the processor until all outstanding writes have "completed" -- so any following reads will read the same thing any other thread would read.
The write to interested[] in LeaveRegion() does not (on x86/x86_64) require an mfence, which is good because mfence is a costly operation. Each thread only ever writes to its own interested[] flag and only ever reads the other's. The only constraint on this write is that it must not "complete" after the write in EnterRegion() (!). Happily the x86/x86_64 does all writes in order. [Though, of course, after the write in LeaveRegion() the write in EnterRegion() may "complete" before the other thread reads the flag.]
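Per the paragraph above, on x86/x86_64 LeaveRegion() could then drop its fences entirely; a sketch of that (again only illustrating the reasoning, not advice for other architectures):

void LeaveRegion(int process)
{
    interested[process] = FALSE;  // plain store: x86 stores become visible in program order
}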
For other devices, you might want other fences to enforce the ordering of reads/writes of turn and interested[]. But I don't pretend to know enough to advise on ARM or POWERPC or anything else.
I have a simple program that installs user-defined handlers with signal().
#include <signal.h>
#include <stdio.h>
#include <zconf.h>
int x = 0;
int i = 3;
void catcher3(int signum) {
i = 1;
}
void catcher2(int signum) {
// Stuck in an infinite loop here.
// Happens even with i == 0
if (i != 0) {
x = 5;
}
}
void catcher1(int signum) {
printf("i = %d\n", i);
i--;
if (i == 0) {
signal(SIGFPE, catcher2);
signal(SIGTERM, catcher3);
}
}
int main() {
signal(SIGFPE, catcher1);
x = 10 / x;
printf("Goodbye");
}
While I expect it to print:
3
2
1
Goodbye
It actually prints:
3
2
1
# Infinite loop within catcher2
My questions are:
When a user handler like catcher1 runs, to which point does the code return after the handler finishes? I would expect it to continue execution, but instead it re-runs the signal handler.
What causes the infinite loop?
How to fix it?
Why doesn't sending SIGTERM (kill -s TERM <pid>) print "Goodbye"?
As pointed out by AProgrammer, the program doesn't necessarily re-read x after returning from the handler, even if x is marked volatile (which it should be anyway). This is because execution resumes at the offending instruction, and the read from memory and the actual division may be separate instructions.
To get around this you will have to continue the execution to a point before x was read from memory.
You can modify your program as follows -
#include <setjmp.h>

jmp_buf fpe;

volatile int x = 0; // Notice the volatile
volatile int i = 3;

void catcher2(int signum) {
    if (i != 0) {
        x = 5;
        longjmp(fpe, 1); // jump back to the setjmp() in main and retry the division
    }
}

int main() {
    signal(SIGFPE, catcher1);
    setjmp(fpe);
    x = 10 / x;
    printf("Goodbye");
}
The rest of the functions can remain the same.
You should also not be using printf from a signal handler (it is not async-signal-safe). Instead use write (from <unistd.h>) directly to print debug messages, e.g.:
write(1, "SIGNAL\n", sizeof("SIGNAL\n") - 1);
The handling of signals is complex and full of implementation-defined, unspecified and undefined behavior. If you want to be portable, there are in fact very few things you can do: mostly reading and writing volatile sig_atomic_t variables and calling _Exit. Depending on the signal number, it is often undefined behavior to leave the signal handler in any way other than by calling _Exit.
In your case, I think FPE is one of those signals for which leaving the signal handler normally is UB. The best I can see is restarting the machine instruction which triggered the signal. Few architectures, and last I looked x86 was not one of them, provide a way to do 10/x without first loading x into a register; that means that restarting the instruction will raise the signal again, even if you modify x and x is a volatile sig_atomic_t.
Usually longjmp is also able to leave a signal handler. @Bodo confirmed that by using setjmp and longjmp to restart the division you can get the behavior you want.
Note: on Unix there is another set of functions -- sigaction, sigsetjmp, siglongjmp and others -- which are better to use. In fact I don't recommend using anything else in any serious program.
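For illustration, here is a minimal sketch of that sigaction/sigsetjmp variant of the same fix (my own example, assuming Linux/x86, where integer division by zero raises SIGFPE):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf fpe;
static volatile int x = 0;

static void catcher(int signum)
{
    x = 5;                   // repair the operand ...
    siglongjmp(fpe, 1);      // ... and resume before the read of x
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = catcher;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGFPE, &sa, NULL);

    sigsetjmp(fpe, 1);       // after siglongjmp we land here with x already set to 5
    x = 10 / x;
    printf("Goodbye\n");     // printed once the division finally succeeds
    return 0;
}

The second argument 1 to sigsetjmp matters: SIGFPE is blocked while the handler runs, and siglongjmp then restores the saved signal mask, unblocking it again.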
I am using pthreads with gcc. The simple code example takes the number of threads "N" as a user-supplied input. It splits up a long array into N roughly equally sized subblocks. Each subblock is written into by individual threads.
The dummy processing for this example really involves sleeping for a fixed amount of time for each array index and then writing a number into that array location.
Here's the code:
/******************************************************************************
* FILE: threaded_subblocks_processing
* DESCRIPTION:
* We have a bunch of parallel processing to do and store the results in a
* large array. Let's try to use threads to speed it up.
******************************************************************************/
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#define BIG_ARR_LEN 10000
typedef struct thread_data{
int start_idx;
int end_idx;
int id;
} thread_data_t;
int big_result_array[BIG_ARR_LEN] = {0};
void* process_sub_block(void *td)
{
struct thread_data *current_thread_data = (struct thread_data*)td;
printf("[%d] Hello World! It's me, thread #%d!\n", current_thread_data->id, current_thread_data->id);
printf("[%d] I'm supposed to work on indexes %d through %d.\n", current_thread_data->id,
current_thread_data->start_idx,
current_thread_data->end_idx-1);
for(int i=current_thread_data->start_idx; i<current_thread_data->end_idx; i++)
{
int retval = usleep(1000.0*1000.0*10.0/BIG_ARR_LEN);
if(retval)
{
printf("sleep failed");
}
big_result_array[i] = i;
}
printf("[%d] Thread #%d done, over and out!\n", current_thread_data->id, current_thread_data->id);
pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
if (argc!=2)
{
printf("usage: ./a.out number_of_threads\n");
return(1);
}
int NUM_THREADS = atoi(argv[1]);
if (NUM_THREADS<1)
{
printf("usage: ./a.out number_of_threads (where number_of_threads is at least 1)\n");
return(1);
}
pthread_t *threads = malloc(sizeof(pthread_t)*NUM_THREADS);
thread_data_t *thread_data_array = malloc(sizeof(thread_data_t)*NUM_THREADS);
int block_size = BIG_ARR_LEN/NUM_THREADS;
for(int i=0; i<NUM_THREADS-1; i++)
{
thread_data_array[i].start_idx = i*block_size;
thread_data_array[i].end_idx = (i+1)*block_size;
thread_data_array[i].id = i;
}
thread_data_array[NUM_THREADS-1].start_idx = (NUM_THREADS-1)*block_size;
thread_data_array[NUM_THREADS-1].end_idx = BIG_ARR_LEN;
thread_data_array[NUM_THREADS-1].id = NUM_THREADS;
int ret_code;
long t;
for(t=0;t<NUM_THREADS;t++){
printf("[main] Creating thread %ld\n", t);
ret_code = pthread_create(&threads[t], NULL, process_sub_block, (void *)&thread_data_array[t]);
if (ret_code){
printf("[main] ERROR; return code from pthread_create() is %d\n", ret_code);
exit(-1);
}
}
printf("[main] Joining threads to wait for them.\n");
void* status;
for(int i=0; i<NUM_THREADS; i++)
{
pthread_join(threads[i], &status);
}
pthread_exit(NULL);
}
and I compile it with
gcc -pthread threaded_subblock_processing.c
and then I call it from command line like so:
$ time ./a.out 4
I see a speed up when I increase the number of threads. With 1 thread the process takes just a little over 10 seconds. This makes sense because I sleep for 1000 usec per array element, and there are 10,000 array elements. Next when I go to 2 threads, it goes down to a little over 5 seconds, and so on.
What I don't understand is that I get a speed-up even after my number of threads exceeds the number of cores on my computer! I have 4 cores, so I was expecting no speed-up for >4 threads. But, surprisingly, when I run
$ time ./a.out 100
I get a 100x speedup and the processing completes in ~0.1 seconds! How is this possible?
Some general background
A program's progress can be slowed by many things, but, in general, you can split slow spots (otherwise known as hot spots) into two categories:
CPU Bound: In this case, the processor is doing some heavy number crunching (like trigonometric functions). If all the CPU's cores are engaged in such tasks, other processes must wait.
Memory bound: In this case, the processor is waiting for information to be retrieved from the hard disk or RAM. Since these are typically orders of magnitude slower than the processor, from the CPU's perspective this takes forever.
But you can also imagine other situations in which a process must wait, such as for a network response.
In many of these memory-/network-bound situations, it is possible to put a thread "on hold" while the memory crawls towards the CPU and do other useful work in the meantime. If this is done well then a multi-threaded program can well out-perform its single-threaded equivalent. Node.js makes use of such asynchronous programming techniques to achieve good performance.
Your question
Now, getting back to your question: you have multiple threads going, but they are performing neither CPU-intensive nor memory-intensive work, so there's not much there to take up time. In fact, the sleep function essentially tells the operating system that no work is being done, so a sleeping thread occupies no core. All of your threads can therefore sleep at the same time; their waits overlap instead of adding up, and the apparent performance increases dramatically.
Note that for low-latency applications, such as MPI, busy waiting is sometimes used instead of a sleep function. In this case, the program goes into a tight loop and repeatedly checks a condition. Externally, the effect looks similar, but sleep uses no CPU while the busy wait uses ~100% of the CPU.
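To see the scaling you originally expected, the per-element usleep() would have to be replaced by real CPU work. A rough sketch of such a variant of the loop body in process_sub_block() (my own illustration; the amount of busy work is arbitrary):

for (int i = current_thread_data->start_idx; i < current_thread_data->end_idx; i++)
{
    /* CPU-bound busy work instead of sleeping: with a body like this the
       speed-up should flatten once the thread count exceeds the core count. */
    volatile double acc = 0.0;            // volatile so the loop isn't optimized away
    for (int k = 0; k < 100000; k++)
        acc = acc + (double)k * 0.5;
    big_result_array[i] = i;
}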
I have a thread pool with about 100 threads. During testing, when I introduce some anomalous conditions, the overall process becomes very slow. Once I make things normal again, the process becomes fast again; therefore all the threads are still running.
I want to detect which threads in particular get slow. For this, I want to write another thread whose responsibility will be to keep an eye on the other threads and report periodically which of them are waiting for a resource to be released. Is there a way (with Pthreads) to find out which threads are waiting for some resource to be released, i.e. which threads are "hung", if that is the right term to use?
System: C, Pthread, Linux
PS: Please mention in comments if you need any other details.
I'm probably really old-fashioned, but I say just instrument your code and measure it yourself. For example, add something like the following code (temporarily) to your program, and do a search-and-replace to change all your program's pthread_mutex_lock() calls to instrumented_pthread_mutex_lock().
Then run your program with stdout redirected to a file. Afterwards, you can look in the file and see which threads were waiting a long time for which mutexes.
(Note that the printf() calls will change the timing of your program somewhat, but for this purpose I don't think it will matter much)
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/times.h>
static unsigned long long GetCurrentClockTimeMicroseconds()
{
static clock_t _ticksPerSecond = 0;
if (_ticksPerSecond <= 0) _ticksPerSecond = sysconf(_SC_CLK_TCK);
struct tms junk; clock_t newTicks = (clock_t) times(&junk);
return ((((unsigned long long)newTicks)*(1000000))/_ticksPerSecond);
}
int instrumented_pthread_mutex_lock(pthread_mutex_t * mtx)
{
unsigned long long beforeTime = GetCurrentClockTimeMicroseconds();
int ret = pthread_mutex_lock(mtx);
unsigned long long afterTime = GetCurrentClockTimeMicroseconds();
unsigned long long elapsedTime = (afterTime-beforeTime);
if (elapsedTime > 1000) // or whatever threshold you like; I'm using 1 millisecond here
{
printf("Thread %li took %llu microseconds to acquire mutex %p\n", (long int) pthread_self(), elapsedTime, mtx);
}
return ret;
}
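Instead of a textual search-and-replace, a macro can route the call sites through the wrapper. This is just a convenience sketch on top of the code above; define it after the wrapper (and only in the files you want instrumented) so the wrapper itself still calls the real function:

// Redirect later calls in this translation unit to the instrumented wrapper.
#define pthread_mutex_lock(m) instrumented_pthread_mutex_lock(m)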
I am beginning to learn Linux C programming, but I have run into a problem that confuses me.
I use the times() function, but the values it gives back are all 0.
OK, I made a mistake; I have changed the code.
But this has little to do with printf: clock_t is defined as long on Linux, so I cast clock_t to long.
This is my code:
#include <sys/times.h>
#include <unistd.h>   // for sleep()
#include <stdio.h>
#include <stdlib.h>
int main()
{
long clock_times;
struct tms begintime;
sleep(5);
if((clock_times=times(&begintime))==-1)
perror("get times error");
else
{
printf("%ld\n",(long)begintime.tms_utime);
printf("%ld\n",(long)begintime.tms_stime);
printf("%ld\n",(long)begintime.tms_cutime);
printf("%ld\n",(long)begintime.tms_cstime);
}
return 0;
}
The output:
0
0
0
0
They are all 0.
I also debugged with gdb, and the fields of begintime are all zero there too,
so this is not related to the printf function.
Please help.
This is not unusual; the process simply hasn't used enough CPU time to measure. The time the process spends in sleep() doesn't count against its CPU time, because times() reports "the CPU time charged for the execution of user instructions" (among other related times), i.e. the time the process has actually spent executing user/kernel code.
Change your program to the following, which uses more CPU and can therefore be measured:
#include <sys/times.h>
#include <time.h>   // for time(NULL)
#include <stdio.h>
#include <stdlib.h>
int main()
{
long clock_times;
struct tms begintime;
unsigned i;
for (i = 0; i < 1000000; i++)
time(NULL); // An arbitrary library call
if((clock_times=times(&begintime))==-1)
perror("get times error");
else
{
printf("%ld %ld %ld %ld\n",
(long)begintime.tms_utime,
(long)begintime.tms_stime,
(long)begintime.tms_cutime,
(long)begintime.tms_cstime);
}
return 0;
}
Your code uses close to no CPU time, so the results are correct. Sleep suspends your program's execution; whatever happens during that time is not your execution time, so it isn't counted.
Add an empty loop and you'll see the difference (and disable compiler optimisations, of course, or the empty loop will be removed).
Take a look at the output of the 'time' program (time ./a.out): it prints the 'real' time (estimated with gettimeofday(), I suppose), the user time (time wasted by your userspace code) and the system time (time wasted within system calls, e.g. writing to a file, opening a network connection, etc.).
(Sure, by 'wasted' I mean 'used', but whatever.)
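For reference, the output of time looks roughly like this (hypothetical numbers, not a measured result):

$ time ./a.out
...program output...
real    0m0.31s
user    0m0.29s
sys     0m0.01s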