I'm having some trouble writing pseudocode for a homework assignment in my operating systems class, in which we are programming in C.
You will be implementing a Producer-Consumer program with a bounded buffer queue of N elements, P producer threads and C consumer threads
(N, P and C should be command line arguments to your program, along with three additional parameters, X, Ptime and Ctime, that are described below). Each
Producer thread should Enqueue X different numbers onto the queue (spin-waiting for Ptime*100,000 cycles in between each call to Enqueue). Each Consumer thread
should Dequeue P*X/C items from the queue (spin-waiting for Ctime*100,000 cycles
in between each call to Dequeue). The main program should create/initialize the
Bounded Buffer Queue, print a timestamp, spawn off C consumer threads & P
producer threads, wait for all of the threads to finish and then print off another
timestamp & the duration of execution.
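For reference, here is a minimal sketch of the main program the assignment describes; queue_init(), producer() and consumer() are hypothetical helpers that would live elsewhere:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

extern void queue_init(int capacity);   /* hypothetical bounded buffer setup */
extern void *producer(void *arg);
extern void *consumer(void *arg);
extern int X, Ptime, Ctime;              /* globals shared with the threads */

int main(int argc, char *argv[])
{
    if (argc != 7) {
        fprintf(stderr, "usage: %s N P C X Ptime Ctime\n", argv[0]);
        return 1;
    }
    int N = atoi(argv[1]), P = atoi(argv[2]), C = atoi(argv[3]);
    X = atoi(argv[4]); Ptime = atoi(argv[5]); Ctime = atoi(argv[6]);

    queue_init(N);                       /* create/initialize the bounded buffer */

    time_t start = time(NULL);
    printf("start: %s", ctime(&start));  /* first timestamp */

    pthread_t prod[P], cons[C];
    for (long i = 0; i < P; ++i) pthread_create(&prod[i], NULL, producer, (void *)i);
    for (long i = 0; i < C; ++i) pthread_create(&cons[i], NULL, consumer, (void *)i);
    for (int i = 0; i < P; ++i) pthread_join(prod[i], NULL);
    for (int i = 0; i < C; ++i) pthread_join(cons[i], NULL);

    time_t end = time(NULL);
    printf("end: %s", ctime(&end));      /* second timestamp */
    printf("duration: %ld s\n", (long)(end - start));
    return 0;
}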
My main difficulty is understanding what my professor means by "spin-waiting for Ptime*100,000 cycles"; that is the part that confuses me.
I understand timestamps will be used to print the duration of execution. We are using semaphores and implementing synchronization at the moment. Any suggestions on the above queries would be much appreciated.
I'm guessing it means busy-waiting; repeatedly checking the loop condition and consuming unnecessary CPU power in a tight loop:
while (current_time() <= wake_up_time);
One would ideally use something that suspends your thread until it's woken up externally, by the scheduler (so resources such as the CPU can be diverted elsewhere):
sleep(2 * 60); /* POSIX sleep() takes seconds; this is 2 minutes */
or at least give up some CPU (i.e. not be so tight):
while (current_time() <= wake_up_time)
    usleep(100 * 1000); /* sleep 100 ms per check instead of spinning flat out */
But I guess they don't want you to invoke the scheduler manually, i.e. to hint to the OS (or your threading library) that it's a good time for a context switch.
I'm not sure what cycles are; in assembly they might be CPU cycles, but given that your question is tagged C, I'll bet they're simply loop iterations:
volatile int i;                        // volatile keeps the compiler from deleting the empty loop
for (i = 0; i < Ptime * 100000; ++i);  // spin-wait for Ptime*100,000 cycles
Though it's always safest to ask whoever issued the homework.
Busy-waiting, or spinning, is a technique in which a process repeatedly checks whether a condition is true, such as whether keyboard input is available or a lock is free.
So the assignment says each producer thread should enqueue X different elements, spin-waiting for Ptime*100,000 cycles before producing the next element. Similarly, each consumer thread should dequeue P*X/C items from the queue, spin-waiting for Ctime*100,000 cycles after every item it consumes.
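For instance, a minimal sketch of one producer thread under that reading; enqueue() and the globals are hypothetical names, and the volatile keeps the compiler from deleting the empty loop:
extern void enqueue(int value);        /* hypothetical: blocks while the queue is full */
extern int X, Ptime;                   /* parsed from the command line */

static void spin_wait(long cycles)
{
    volatile long i;                   /* volatile so the loop isn't optimized away */
    for (i = 0; i < cycles; ++i)
        ;
}

void *producer(void *arg)
{
    long id = (long)arg;
    for (int n = 0; n < X; ++n) {
        enqueue((int)(id * X + n));        /* X distinct numbers per producer */
        spin_wait((long)Ptime * 100000L);  /* spin-wait between Enqueue calls */
    }
    return NULL;
}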
I suspect that your professor is being a complete putz - by actually ASKING for the worst "busy waiting" technique in existence:
int n = pTime * 100000;
for (int i = 0; i < n; ++i)
    ;   // waste some cycles
I also suspect that he still uses a pterosaur thigh-bone as a walking stick, has a very nice (dry) cave, and a partner with a large bald patch.... O/S guys tend to be that way. It goes with the cool beards.
No wonder his thoroughly modern students misunderstand him. He needs to (re)learn how to grunt IN TUNE.
Cheers. Keith.
Related
So I'm trying to reduce the CPU utilization in my mock operating system. I have a function to suspend execution of the calling thread until time has advanced by at least x timer ticks. Unless the system is idle, the thread doesn't have to wake up after exactly x ticks have passed; it is placed back in the ready-threads queue once it has waited for the right number of ticks. Ideally there's some semaphore implementation for this, but I don't know how to write one. That said, if you have a better solution than semaphores, it would be interesting to hear that too.
For some reference here's the initial code with descriptions:
void timer_sleep (int64_t ticks)
{
    int64_t start = timer_ticks ();
    ASSERT (intr_get_level () == INTR_ON);
    while (timer_elapsed (start) < ticks)
        thread_yield ();
}
timer_ticks - returns how many ticks have passed since the OS started.
timer_elapsed - returns the number of ticks that have passed since a given earlier tick count.
intr_get_level - returns whether interrupts are currently enabled or disabled.
thread_yield - yields the CPU. The current thread is not put to sleep and may be scheduled again immediately, at the scheduler's whim.
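A sketch of the usual approach, assuming Pintos-style kernel primitives (struct semaphore with sema_init/sema_down/sema_up, plus the kernel list API); the sleeper struct and sleep_list are names I made up:
struct sleeper {
    int64_t wake_tick;            /* absolute tick at which to wake */
    struct semaphore sema;        /* the sleeping thread blocks on this */
    struct list_elem elem;
};

static struct list sleep_list;    /* kept sorted by wake_tick */

void timer_sleep (int64_t ticks)
{
    struct sleeper s;
    s.wake_tick = timer_ticks () + ticks;
    sema_init (&s.sema, 0);
    /* ... insert s into sleep_list, sorted by wake_tick, with interrupts off ... */
    sema_down (&s.sema);          /* blocks; no spinning, no yield loop */
}

/* In the timer interrupt handler: while the sleeper at the front of
   sleep_list has wake_tick <= timer_ticks(), remove it and sema_up()
   its semaphore, which puts that thread back on the ready queue. */
That removes the thread_yield() polling entirely: the thread consumes no CPU until the interrupt handler wakes it.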
I'm looking to create a state of uninterruptible sleep for a program I'm writing. Any tips or ideas about how to create this state would be helpful.
So far I've looked into the wait_event() function defined in wait.h, but was having little luck implementing it. When trying to initialize my wait queue, the compiler complained:
warning: parameter names (without types) in function declaration
static DECLARE_WAIT_QUEUE_HEAD(wq);
Has anyone had any experience with the wait_event() function or creating an uninterruptible sleep?
The functions that you're looking at in include/linux/wait.h are internal to the Linux kernel. They are not available to userspace.
Generally speaking, uninterruptible sleep states are considered undesirable. Under normal circumstances, they cannot be triggered by user applications except by accident (e.g, by attempting to read from a storage device that is not responding correctly, or by causing the system to swap).
You can make sleep "signal-aware".
sleep can be interrupted by a signal, in which case the pause stops and sleep returns the amount of time still left. The application can choose to handle the signal it was notified of and, if needed, resume sleeping for the time left.
Actually, you should use synchronization objects provided by the operating system you're working on, or simply check the return value of the sleep function. If it returns a value bigger than zero, your pause was interrupted. Based on this return value, call sleep again, passing the returned remaining time as the argument (probably in a loop, in case further interrupts occur in that interval).
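A minimal sketch of that loop, relying on the POSIX guarantee that sleep() returns the number of seconds left when a signal interrupts it:
#include <unistd.h>

/* Keep sleeping until the full interval has elapsed, resuming after
   every signal interruption. */
static void sleep_fully(unsigned int seconds)
{
    unsigned int left = seconds;
    while (left > 0)
        left = sleep(left);   /* 0 when done, else seconds remaining */
}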
On the other hand, if you really want a real-uninterruptible custom sleep function, I may suggest something like the following:
void uninterruptible_sleep(long time, long factor)
{
    volatile long i, j;   /* volatile so the empty loops aren't optimized away */
    __asm__("cli");       /* disable interrupts */
    for (i = 0; i < time; ++i)
        for (j = 0; j < factor; ++j)
            ;             /* custom timer loop */
    __asm__("sti");       /* re-enable interrupts */
}
cli and sti are x86 assembly instructions which allow us to set the IF (interrupt flag) of the CPU, making it possible to clear (cli) or set (sti) all the interrupts. Note that they are privileged instructions, so this only works in kernel mode, not in an ordinary user process. Moreover, if you're working on a multi-processor system, additional synchronization precautions need to be taken, because these instructions are only valid for a single processor. A function of this type is also very system (CPU) dependent: the inner loop requires a clock-cycle count to measure an exact time interval (instructions executed per second), which depends on the CPU frequency. So if you really want to get rid of every possible interrupt, you may use a function like the one above. But be careful: if your program deadlocks while in the cli state, you will need to restart your system.
(The inline assembly syntax shown above is for the gcc compiler.)
Can someone please tell me how this function works? I'm using it in code and have an idea of how it works, but I'm not 100% sure exactly. I understand the concept of an input variable N counting down, but how the heck does it work? Also, if I am using it repeatedly in my main() for different delays (different inputs for N), do I have to "zero" the function if I used it somewhere else?
Reference: MILLISEC is a constant defined by Fcy/10000, or system clock/10000.
Thanks in advance.
// DelayNmSec() gives a 1 ms to 65.5 seconds delay
/* Note that FCY is used in the computation. Please make the necessary
   changes (PLLx4 or PLLx8 etc.) to compute the right FCY as in the define
   statement above. */
void DelayNmSec(unsigned int N)
{
    unsigned int j;
    while (N--)
        for (j = 0; j < MILLISEC; j++)
            ;
}
This is referred to as busy waiting, a technique that just burns CPU cycles, thus "waiting" by keeping the CPU "busy" with empty loops. You don't need to reset the function; it will do the same thing if called repeatedly.
If you call it with N=3, it will repeat the while loop 3 times, every time counting with j from 0 to MILLISEC, which is supposedly a constant that depends on the CPU clock.
The original author of the code has timed it and looked at the generated assembler to get the exact number of instructions executed per millisecond, and has configured the constant MILLISEC to match that for the busy-wait for-loop.
The input parameter N is then simply the number of milliseconds the caller wants to wait, i.e. the number of times the for-loop is executed.
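To make that concrete, a hypothetical setup for the DelayNmSec() above (the 16 MHz figure is only an example):
void DelayNmSec(unsigned int N);   /* the routine shown above */

#define FCY      16000000UL        /* hypothetical 16 MHz instruction clock */
#define MILLISEC (FCY / 10000)     /* per the question: Fcy/10000 iterations per ms */

int main(void)
{
    DelayNmSec(500);               /* busy-wait for roughly half a second */
    return 0;
}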
The code will break if:
- it is used on a different or faster microcontroller (depending on how Fcy is maintained), or
- the optimization level of the C compiler is changed, or
- the compiler version is changed (as it may generate different code),
so, if the guy who wrote it is clever, there may be a calibration program which defines and configures the MILLISEC constant.
This is what is known as a busy wait in which the time taken for a particular computation is used as a counter to cause a delay.
This approach does have problems, in that the computation needs to be adjusted for processors of different speeds. Old games used this approach; I remember a simulation that targeted an old 8086-type processor and used such a busy wait to make an animation move smoothly. When the game was run on a Pentium PC, instead of the rocket majestically rising up the screen over several seconds, the entire animation flashed before your eyes so fast that it was difficult to see what it was.
This sort of busy wait means that the running thread sits in a computation loop counting down the milliseconds. The result is that the thread does nothing other than count down.
If the operating system is not a preemptive multi-tasking OS, then nothing else will run until the count down completes which may cause problems in other threads and tasks.
If the operating system is preemptive multi-tasking the resulting delays will have a variability as control is switched to some other thread for some period of time before switching back.
This approach is normally used for small pieces of software on dedicated processors where a computation has a known amount of time and where having the processor dedicated to the countdown does not impact other parts of the software. An example might be a small sensor that performs a reading to collect a data sample then does this kind of busy loop before doing the next read to collect the next data sample.
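For example, a sampling loop on a dedicated microcontroller might look like this (read_sensor() and store_sample() are hypothetical, and DelayNmSec() is the routine above):
void sampling_loop(void)
{
    for (;;) {
        int sample = read_sensor();   /* collect one data sample */
        store_sample(sample);         /* record it */
        DelayNmSec(10);               /* busy-wait ~10 ms until the next sample is due */
    }
}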
I am writing a Gif animator in C.
I have two threads running in parallel. The first allows the user to alter the speed of the animation. The second draws the current frame and then calls Sleep(Constant * 100 / CurrentSpeed), where CurrentSpeed is a percentage, ranging from 1 to 200.
The problem is that if you quickly change the speed from 100% to 1% and then back again, the second thread will execute the following:
Sleep(Constant * 100)
This will draw frame A, wait many seconds (even though the speed was changed by the user), and only then draw frame B and the following frames at the default speed.
It seems to me that Sleep is a poor choice of mine in this case. What can I do to solve this problem?
EDIT:
The code I currently have (Simplified):
while (1) {
    InvalidateRect(Handle, &ImageRect, FALSE);
    if (shouldDispose) {
        break;
    }
    if (DelayTime)
        Sleep(DelayTime * 100 / CurrentSpeed);
    SelectNextImage();
}
Instead of calling Sleep() with the desired frame rate, why don't you call it with a constant interval of 1 ms, for example, and use a variable as a counter?
For example, let C be a global variable (a counter, declared volatile since both threads touch it) which is loaded with a number of 'ticks' of 1 ms. Then, write the loop:
while (1) {                    // main loop of the player thread
    if (C > 0) C--;
    if (C == 0) nextframe();   // counter hit 0: show next frame and reload C with the current delay
    Sleep(1);
}
The control thread would load C with a number of 1 ms ticks (i.e. the desired frame delay), and the player thread will never be stalled for more than 1 ms. The use of 1 ms as the base rate is arbitrary: use the largest tick that still permits your maximum frame rate, in order to load the CPU as little as possible.
EDIT
After some hot comments (arguing is good, after all), I'd like to point out that this solution is sub-optimal: it doesn't use any OS mechanism for signaling threads, or any other API for preventing the thread from wasting CPU time. The solution shown here is generic: it may be used on any system (even on embedded systems without any running OS). But above all, it is based on the original code posted by the user who asked the question: using Sleep(), how can I achieve my purpose? I give him my humble answer. Anyway, I encourage other people to write sample code using the appropriate API for achieving the same goal. With no hard feelings, special thanks to Martin James.
Find a synchro API on your OS that allows a wait with a timeout, e.g. WaitForSingleObject() on Windows. If you want to change the delay, change the timeout and signal the event upon which the WFSO is waiting, to make it return 'early' and restart the wait with the new timeout.
Polling with Sleep(1) loops is rarely justifiable.
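A sketch of that pattern on Windows; the wake and delay_ms names are mine:
#include <windows.h>

HANDLE wake;                     /* auto-reset event, created once at startup:
                                    wake = CreateEvent(NULL, FALSE, FALSE, NULL); */
volatile DWORD delay_ms = 100;   /* the control thread updates this, then calls SetEvent(wake) */

DWORD WINAPI player(LPVOID arg)
{
    for (;;) {
        if (WaitForSingleObject(wake, delay_ms) == WAIT_TIMEOUT) {
            /* full delay elapsed: draw the next frame here */
        }
        /* else: woken early; the loop re-reads delay_ms and restarts the wait */
    }
}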
Create a waitable timer. When you set the timer, you can specify a callback function that will run in the setting thread's context. This means you can do it with two threads, but it actually works just fine with only a single thread as well.
The main advantage of a waitable timer is, however, that it is more accurate and more reliable than Sleep. A timer is conceptually quite different from Sleep insofar as Sleep merely gives up control; the scheduler marks the thread as ready to run once the time is up, the next time the scheduler happens to run anyway. It does nothing beyond that, which means the thread will eventually be scheduled to run again, like any other thread that is ready.
A thread that is waiting on a timer (or other waitable object) causes the scheduler to run when the timer is up and has its priority temporarily boosted. It therefore runs not only more reliably and more closely to the desired time, but also earlier than all other threads with the same base priority. Which does not give a realtime guarantee but at least gives a sort of "soft guarantee".
If you still want to use Sleep, use SleepEx instead which you can alert, either by queueing an APC, or by calling the undocumented NtAlertThread function.
In any case, Sleep is troublesome not only because it is unreliable, but also because it is based on the granularity of the system-wide timer. You can, of course, set that as low as 1 ms (or less on some systems), but that will cause a lot of unnecessary interrupts.
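A sketch with a waitable timer and a completion routine (the 50 ms period is arbitrary):
#include <windows.h>

static VOID CALLBACK on_tick(LPVOID arg, DWORD lo, DWORD hi)
{
    /* draw the next frame here; runs in this thread via an APC */
}

int main(void)
{
    HANDLE t = CreateWaitableTimer(NULL, FALSE, NULL);
    LARGE_INTEGER due;
    due.QuadPart = -500000LL;                             /* first fire in 50 ms (negative = relative, 100 ns units) */
    SetWaitableTimer(t, &due, 50, on_tick, NULL, FALSE);  /* then every 50 ms */
    for (;;)
        SleepEx(INFINITE, TRUE);                          /* alertable wait so the routine runs */
}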
I've written a program that executes some calculations and then merges the results.
I've used multi-threading to calculate in parallel.
During the merge phase, each thread locks the global array, appends its individual part to it, and does some extra work to eliminate repetitions.
I tested it and found that the cost of merging increases with the number of threads, at an unexpected rate:
2 threads: 40,116,084 µs
6 threads: 511,791,532 µs
Why? What happens when the number of threads increases, and how do I change this?
------------------------------------------------------------
Actually, the code is very simple; here is the pseudo-code:
typedef struct my_object {
    long no;
    int count;
    double value;
    // something else
} my_object_t;

static my_object_t** global_result_array;   // about ten thousand entries
static pthread_mutex_t global_lock;

void* thread_function(void* arg) {
    my_object_t** local_result;
    int local_result_number;
    int i;
    my_object_t* ptr;
    for (;;) {
        if (exit_condition) { return NULL; }
        if (merge_condition) {
            // start time point to log
            pthread_mutex_lock(&global_lock);
            for (i = local_result_number - 1; i >= 0; i--) {
                ptr = local_result[i];
                if (NULL == global_result_array[ptr->no]) {
                    global_result_array[ptr->no] = ptr;                 // step 4
                } else {
                    global_result_array[ptr->no]->count += ptr->count;  // step 5
                    global_result_array[ptr->no]->value += ptr->value;  // step 6
                }
            }
            pthread_mutex_unlock(&global_lock);   // end time point to log
        } else {
            // do some calculation and produce the partial, thread-local result,
            // namely local_result and local_result_number
        }
    }
}
As above, the only difference between two threads and six threads is in steps 5 and 6; I counted roughly hundreds of millions of executions of those steps. Everything else is the same.
So, from my point of view, the merge operation is very light; whether using 2 threads or 6, each must take the lock and do the merge exclusively.
Another astonishing thing: when using six threads, the cost of step 4 exploded! It was the root cause of the total cost blowing up!
btw: The test server has two CPUs, each with four cores.
There are various reasons for the behaviour shown:
More threads means more locks and more blocking time among threads. As is apparent from your description, your implementation uses mutex locks or something similar. The speed-up with threads is better if the data sets are largely exclusive.
Unless your system has as many processors/cores as there are threads, not all of them can run concurrently. You can set the maximum concurrency using pthread_setconcurrency.
Context switching is an overhead, hence the difference. If your computer had 6 cores it would be faster; otherwise you need more context switches between the threads.
That is a huge performance difference between 2 and 6 threads. I'm sorry, but you have to try very hard indeed to produce such a huge discrepancy. You seem to have succeeded :((
As others have pointed out, using multiple threads on one data set only becomes worth it if the time spent on inter-thread communication, (locks etc.), is less than the time gained by the concurrent operations.
If, for example, you find that you are merging successively smaller data sections (e.g. with a merge sort), you are effectively maximizing the time wasted on inter-thread comms and cache-thrashing. This is why multi-threaded merge-sorts frequently switch to an in-place sort once the data has been divided up into chunks smaller than the L1 cache.
'Each thread will lock the global array' - try not to do this. Locking large data structures for extended periods, or continually locking them for successive short periods, is a very bad plan. Locking the global once serializes the threads, leaving you with effectively one thread plus too much inter-thread comms. Continually locking/releasing leaves you with one thread and far, far too much inter-thread comms.
Once the operations get so short that the returns are diminished to the point of uselessness, you would be better off queueing those operations to one thread that finishes off the job on its own.
Locking is often grossly over-used and/or misused. If I find myself locking anything for longer than the time taken to push/pop a pointer onto a queue or similar, I start to get jittery.
Without seeing/analysing the code and, more importantly, the data (I guess both are complex), it's difficult to give any direct advice :(
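For illustration, one way to cut the lock traffic in the posted code is to stripe the lock by slot, so threads only contend when they touch the same stripe. A sketch, reusing my_object_t and global_result_array from the question (STRIPES and the helper name are mine):
#include <pthread.h>

#define STRIPES 64
static pthread_mutex_t stripe_lock[STRIPES];   /* each initialized with pthread_mutex_init() */

static void merge_batch(my_object_t **local, int n)
{
    for (int i = 0; i < n; ++i) {
        my_object_t *ptr = local[i];
        pthread_mutex_t *m = &stripe_lock[ptr->no % STRIPES];
        pthread_mutex_lock(m);                 /* serialize only within this stripe */
        if (global_result_array[ptr->no] == NULL) {
            global_result_array[ptr->no] = ptr;
        } else {
            global_result_array[ptr->no]->count += ptr->count;
            global_result_array[ptr->no]->value += ptr->value;
        }
        pthread_mutex_unlock(m);
    }
}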