I'm working on a program with critical sections, so I am using semaphores. Specifically, the POSIX semaphores: http://www.kernel.org/doc/man-pages/online/pages/man3/sem_close.3.html
According to http://www.sbin.org/doc/glibc/libc_34.html (search for the macro SEM_VALUE_MAX), there is a maximum value that a general semaphore can be set to. On my system, this is about 32K.
Unfortunately, I'm dealing with some time-sensitive code (reading from an Arduino via a serial port at ~1 MBit/s), so I'd like to have larger semaphores because of some implementation details. Ideally, I'd like them to be able to reach at least 2^20, but I'm a little unclear on why there is an upper limit anyway.
Is there any way to exceed this SEM_VALUE_MAX, and get a semaphore with a larger value? I could only think of:
Redefining SEM_VALUE_MAX
probably a horrible idea; I think those POSIX folks know what they're doing
Having a semaphore refer to more than one 'chunk' of data
right now, each up() or down() only acquires/releases a single 'chunk' -- an unsigned short int.
I imagine dealing with multiple at a time could cause deadlock.
Implementing my own semaphores.
time consuming / redundant work
less portable
Asking you wonderful folks what you think!
Thanks a bunch in advance!
Couldn't you just use the semaphore to protect access to another counter that you use to track the allocations? That way you don't need any more semaphore values than you have accessors.
How portable do you need your program to be? _POSIX_SEM_VALUE_MAX is the minimum value of SEM_VALUE_MAX that is POSIX-conforming. Larger values are always allowed. glibc and other real-world implementations I'm familiar with have SEM_VALUE_MAX defined much larger, usually equal to INT_MAX. The only way you need to worry about this is if you want your program to be portable to other POSIX systems that have a very low SEM_VALUE_MAX.
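If you want to see what you actually have to work with, here is a minimal sketch (POSIX only; the semaphore itself is just a placeholder) that prints both the compile-time and the run-time limit before deciding on a strategy:

#include <limits.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("SEM_VALUE_MAX (compile time): %ld\n", (long)SEM_VALUE_MAX);
    printf("_SC_SEM_VALUE_MAX (run time): %ld\n", sysconf(_SC_SEM_VALUE_MAX));

    sem_t sem;
    if (sem_init(&sem, 0, 0) == -1) {   /* unnamed, process-private, initial value 0 */
        perror("sem_init");
        return 1;
    }
    /* ... sem_post() / sem_wait() as usual ... */
    sem_destroy(&sem);
    return 0;
}

On glibc both values are typically INT_MAX, so the 32K figure is worth double-checking on your actual target.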
Related
I have a multithreaded application where I have one producer thread (main) and multiple consumers.
Now, from main, I want to have some sort of percentage of how far into the work the consumers are. Implementing a counter is easy, since the work is done in a loop. However, this loop repeats a couple of thousand times, maybe even more than a million times, so I don't want to use a mutex on this part. So I went looking at the atomic options for writing to an int.
As far as I understand I can use the builtin atomic functions from gcc:
https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
However, it doesn't have a function for just reading the variable I want to work on.
So basically my question is:
Can I read the variable safely from my producer, as long as I use the atomic builtins for writing to that same variable in the consumers?
or
Do I need some sort of different function to read from the variable, and if so, which one?
Define "safely".
If you just use a regular read, on x86, for naturally aligned 32-bit or smaller data, the read is atomic, so you will always read a valid value rather than one containing some bytes written by one thread and some by another. If any of those things are not true (not x86, not naturally aligned, larger than 32 bits...) all bets are off.
That said, you have no guarantee whatsoever that the value read will be particularly fresh, or that the sequence of values seen over multiple reads will be in any particular order. I have seen naive code that used volatile (to stop the compiler optimising the read away entirely) but no other synchronisation mechanism literally never see an updated value, due to CPU caching.
If any of these things matter to you, and they really should, you should explicitly make the read atomic and use the appropriate memory barriers. The intrinsics you refer to take care of both of these things for you: you could call one of the atomic intrinsics in such a way that there is no side effect other than returning the value:
__sync_val_compare_and_swap(ptr, 0, 0)
or
__sync_add_and_fetch(ptr, 0)
or
__sync_sub_and_fetch(ptr, 0)
or whatever
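For example, here is a minimal sketch of that pattern with a made-up counter name: the consumers write with a builtin, and the producer "reads" by adding 0, which has no side effect but carries the same atomicity and barrier guarantees.

static unsigned long progress;           /* shared progress counter */

void consumer_step(void)                 /* called from consumer threads */
{
    __sync_add_and_fetch(&progress, 1);  /* atomic increment, full barrier */
}

unsigned long producer_read(void)        /* called from the producer */
{
    return __sync_add_and_fetch(&progress, 0);
}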
If your compiler supports it, you can use C11 atomic types. They are introduced in section 7.17 of the standard, but they are unfortunately optional, so you will have to check whether __STDC_NO_ATOMICS__ is defined to at least throw a meaningful error if they're not supported.
With gcc, you apparently need at least version 4.9, because otherwise the <stdatomic.h> header is missing (here is a SO question about this, but I can't verify because I don't have GCC 4.9).
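Assuming <stdatomic.h> is available, the same counter might look roughly like this (names are illustrative):

#ifdef __STDC_NO_ATOMICS__
#error "C11 atomics are not available on this implementation"
#else
#include <stdatomic.h>

static atomic_ulong progress;            /* shared progress counter */

void consumer_step(void)
{
    atomic_fetch_add(&progress, 1);      /* atomic read-modify-write */
}

unsigned long producer_read(void)
{
    return atomic_load(&progress);       /* atomic read, seq_cst by default */
}
#endif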
I'll answer your question, but you should know upfront that atomics aren't cheap. The CPU has to synchronize between cores every time you use atomics, and you won't like the performance results if you use atomics in a tight loop.
The page you linked to lists atomic operations for the writer, but says nothing about how such variables should be read. The answer is that your other CPU cores will "see" the updated values, but your compiler may "cache" the old value in a register or on the stack. To prevent this behavior, I suggest you declare the variable volatile to force your compiler not to cache the old value.
The only safety issue you will encounter is stale data, as described above.
If you try to do anything more complex with atomics, you may run into subtle and random issues with the order atomics are written to by one thread versus the order you see those changes in another thread. Unfortunately you're not using a built-in language feature, and the compiler builtins aren't designed perfectly. If you choose to use these builtins, I suggest you keep your logic very simple.
If I understood the problem correctly, I would not use any atomic variable for the counters. Each worker thread can have a separate counter that it updates locally; the master thread can read the whole array of counters for an approximate snapshot value, so this becomes a one-producer, one-consumer problem per counter. The memory can be made visible to the master thread, for example, every 5 seconds, by using __sync_synchronize() or similar.
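A rough sketch of that layout, with a made-up worker count and padding so the counters don't share a cache line:

#include <stddef.h>

#define NUM_WORKERS 8                     /* arbitrary for the sketch */

struct padded_counter {
    volatile unsigned long value;         /* volatile only discourages the compiler from
                                             caching reads; it is not a sync primitive */
    char pad[64 - sizeof(unsigned long)];
};
static struct padded_counter counters[NUM_WORKERS];

void worker_step(int id)                  /* only thread 'id' writes this slot */
{
    counters[id].value++;
}

unsigned long approximate_total(void)     /* called from the master thread */
{
    unsigned long sum = 0;
    for (size_t i = 0; i < NUM_WORKERS; i++)
        sum += counters[i].value;         /* may be slightly stale, which is fine */
    return sum;
}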
I am reviewing some source code and I was wondering if the following is thread safe. I have heard of compiler or CPU instruction/read reordering (would it have something to do with branch prediction?), and the Data->unsafe_variable variable below can be modified at any time by another thread.
My question is: depending on how the compiler/CPU reorder read/writes, would it be possible that the below code would allow the Data->unsafe_variable to be fetched twice? (see 2nd snippet)
Note: I do not worry about the first access; any data can be there as long as it does not pass the 'if'. I am just concerned by the possibility that the data would be fetched another time after the 'if'. I was also wondering if the cast to volatile here would help prevent a double fetch?
int function(void* Data) {
    // Data is allocated on the heap
    // What it contains at this point is not important
    size_t _varSize = ((volatile DATA *)Data)->unsafe_variable;
    if (_varSize > x * y)
    {
        return FALSE;
    }
    // I do not want Data->unsafe_variable to be fetched once this point is reached;
    // I want to use the value "supposedly" stored in _varSize.
    // Would any compiler/CPU reordering allow it to be double fetched?
    size_t size = _varSize - t * q;
    function_xy(size);
    return TRUE;
}
Basically I do not want the program to behave like this for security reasons:
_varSize = ((volatile DATA *)Data)->unsafe_variable;
if (_varSize > x * y)
{
    return FALSE;
}
size_t size = ((volatile DATA *)Data)->unsafe_variable - t * q;
function_xy(size);
I am simplifying here and they cannot use a mutex. However, would it be safer to use _ReadWriteBarrier() or MemoryBarrier() after the first line instead of a volatile cast? (VS compiler)
Edit: Giving slightly more context to the code.
The code is broken for many reasons. I'll just point out one of the more subtle ones as others have pointed out the more obvious ones. The object is not volatile. Casting a pointer to a pointer to a volatile object doesn't make the object volatile, it just lies to the compiler.
But there's a much bigger point -- you are going about this totally the wrong way. You are supposed to be checking whether the code is correct, that is, whether it is guaranteed to work. You aren't clever enough, nobody is, to think of every possible way the system might fail to do what you assume it will do. So instead, just don't make those assumptions.
Thinking about things like CPU read re-ordering is totally wrong. You should expect the CPU to do what, and only what, it is required to do. You should definitely not think about specific mechanisms by which it might fail, but only whether it is guaranteed to work.
What you are doing is like trying to figure out if an employee is guaranteed to show up for work by checking if he had his flu shot, checking if he is still alive, and so on. You can't check for, or even think of, every possible way he might fail to show up. So if you find that you have to check those kinds of things, then it's not guaranteed, and relying on it is broken. Period.
You cannot make reliable code by saying "the CPU doesn't do anything that can break this, so it's okay". You can make reliable code by saying "I make sure my code doesn't rely on anything that isn't guaranteed by the relevant standards."
You are provided with all the tools you need to do the job, including memory barriers, atomic operations, mutexes, and so on. Please use them.
You are not clever enough to think of every way something not guaranteed to work might fail. And you have a plethora of things that are guaranteed to work. Fix this code, and if possible, have a talk with the person who wrote it about using proper synchronization.
This sounds a bit ranty, and I apologize for that. But I've seen too much code that used "tricks" like this that worked perfectly on the test machines but then broke when a new CPU came out, a new compiler, or a new version of the OS. Fixing code like this can be an incredible pain because these hacks hide the actual synchronization requirements. The right answer is almost always to code clearly and precisely what you actually want, rather than to assume that you'll get it because you don't know of any reason you won't.
This is valuable advice from painful experience.
The standard(s) are clear. If any thread may be modifying the object, all accesses, in all threads, must be synchronized, or you have undefined behavior.
The only portable solution for C++ is C++11 atomics, which is available in upcoming VS 2012.
As for C, I do not know if recent C standards bring some portable facilities; I am not following that, but as you are using Visual Studio, it does not matter anyway, as Microsoft is not implementing recent C standards.
Still, if you know you are developing for Visual Studio, you can rely on guarantees provided by this compiler, which apply to both C and C++. Some of them are implicit (accessing volatile variables also implies some memory barriers), some are explicit, like using the _MemoryBarrier intrinsic.
The whole topic of the memory model is discussed in depth in Lockless Programming Considerations for Xbox 360 and Microsoft Windows, this should give you a good overview. Beware: the topic you are entering is full of hard topics and nasty surprises.
Note: Relying on volatile is not portable, but if you are using old C / C++ standards, there is no portable solution anyway; therefore be prepared to face the need to reimplement this for a different platform should the need ever arise. When writing portable threaded code, volatile is considered almost useless:
For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:
atomicity
memory consistency, i.e. the order of a thread's operations as seen by another thread.
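That said, purely for the single-fetch concern in the question (not for atomicity or ordering), a minimal MSVC-only sketch might look like this; DATA and unsafe_variable are stand-ins for the question's hypothetical names:

#include <intrin.h>
#include <stddef.h>

/* Hypothetical stand-in for the question's DATA type. */
typedef struct DATA { size_t unsafe_variable; } DATA;

size_t read_unsafe_variable_once(DATA *data)
{
    /* One volatile-qualified load; later code must use the local copy. */
    size_t snapshot = ((volatile DATA *)data)->unsafe_variable;
    /* Compiler-only barrier: keeps the optimiser from moving memory accesses
       across this point. It emits no CPU fence. */
    _ReadWriteBarrier();
    return snapshot;
}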
Do I understand the new Std right that shared_ptr is not required to use a reference count? Only that it is likely that it is implemented this way?
I could imagine an implementation that uses a hidden linked-list somehow. In N3291 "20.7.2.2.5.(8) shared_ptr observers [util.smartptr.shared.obs]" The note says
[ Note: use_count() is not necessarily efficient. — end note ]
which gave me that idea.
You're right, nothing in the spec requires the use of an explicit "counter", and other possibilities exist.
For example, a linked-list implementation was suggested for the implementation of boost's shared_ptr; however, the proposal was ultimately rejected because it introduced costs in other areas (size, copy operations, and thread safety).
Abstract description
Some people say that shared_ptr is a "reference counter smart pointer". I don't think it is the right way to look at it.
Actually shared_ptr is all about (non-exclusive) ownership: all the shared_ptr that are copies of a shared_ptr initialised with a pointer p are owners.
shared_ptr keeps track of the set of owners, to guarantee that:
while the set of owners is non-empty, delete p is not called
when the set of owners becomes empty, delete p (or a copy of D, the destruction functor) is called immediately;
Of course, to determine when the set of owners becomes empty, shared_ptr only needs a counter. The abstract description is just slightly easier to think about.
Possible implementations techniques
To keep track of the number of owners, a counter is not only the most obvious approach; it's also relatively obvious how to make it thread-safe using atomic compare-and-modify.
To keep track of all the owners, a linked list of owners is not only the obvious solution, but also an easy way to avoid the need to allocate any memory for each set of owners. The problem is that it isn't easy to make such an approach efficiently thread-safe (anything can be made thread-safe with a global lock, which is against the very idea of parallelism).
In the case of multi-thread implementation
On the one hand, we have a small, fixed-size (unless a custom destruction function is used) memory allocation that's very easy to optimise, plus simple integer atomic operations.
On the other hand, there is costly and complicated linked-list handling; and if a per-owner-set mutex is needed (as I think it is), the cost of memory allocation is back, at which point we can just replace the mutex with the counter!
About multiple possible implementations
How many times have I read that many implementations are possible for a "standard" class?
Who has never heard the fantasy that the complex class could be implemented using polar coordinates? This is idiotic, as we all know. complex must use Cartesian coordinates. If polar coordinates are preferred, another class must be created. There is no way a polar complex class is going to be used as a drop-in replacement for the usual complex class.
Same for a (non-standard) string class: there is no reason for a string class to be internally NUL terminated and not store the length as an integer, just for the fun and inefficiency of repeatedly calling strlen.
We now know that designing std::string to tolerate COW was a bad idea; it is the reason for the unusual invalidation semantics of const iterators.
std::vector is now guaranteed to be contiguous.
The end of the fantasy
At some point, the fantasy where standard classes have many significantly different reasonable implementations has to be dropped. Standard classes are primitive building blocks; not only should they be very efficient, they should have predictable efficiency.
A programmer should be able to make portable assumptions about the relative speed of basic operations. A complex class is useless for serious number crunching if even the simplest addition turns into a bunch of transcendental computations. If a string class is not guaranteed to have very fast copy via data sharing, the programmer will have to minimize string copies.
An implementer is free to choose a different implementation technique only when it doesn't make a common cheap operation extremely costly (by comparison).
For many classes, this means that there is exactly one viable implementation strategy, with sometimes a few degrees of liberty (like the size of a block in a std::deque).
In a current project I dared to do away with the old 0 rule, i.e. returning 0 on success of a function. How is this seen in the community? The logic that I am imposing on the code (and therefore on the co-workers and all subsequent maintenance programmers) is:
>0: for any kind of success/fulfillment, that is, a positive outcome
==0: for signalling no progress or busy or unfinished, which is zero information about the outcome
<0: for any kind of error/infeasibility, that is, a negative outcome
Sitting in between a lot of hardware units with unpredictable response times in a realtime system, many of the functions need to convey exactly this ternary logic, so I decided it was legitimate to throw the minimalistic standard return logic away, at the cost of a few WTFs on the programmers' side.
Opinions?
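A minimal sketch of what I mean (every name and error code here is made up for illustration):

#include <errno.h>
#include <stddef.h>

/* >0: number of bytes produced, ==0: unit busy / no progress yet, <0: -errno */
int hw_unit_read(int unit, unsigned char *buf, size_t len)
{
    int ready = 0;                 /* would come from the hardware unit */
    (void)unit;

    if (buf == NULL || len == 0)
        return -EINVAL;            /* negative outcome: error              */
    if (!ready)
        return 0;                  /* zero: nothing to report yet          */
    buf[0] = 0x42;                 /* pretend one byte arrived             */
    return 1;                      /* positive outcome: one byte produced  */
}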
PS: on a side note, the Roman empire collapsed because the Romans with their number system lacking the 0, never knew when their C functions succeeded!
"Your program should follow an existing convention if an existing convention makes sense for it."
Source: The GNU C Library
By deviating from such a widely known convention, you are creating a high level of technical debt. Every single programmer that works on the code will have to ask the same questions, every consumer of a function will need to be aware of the deviation from the standard.
http://en.wikipedia.org/wiki/Exit_status
I think you're overstating the status of this mythical "rule". Much more often, it's that a function returns a nonnegative value on success indicating a result of some sort (number of bytes written/read/converted, current position, size, next character value, etc.), and that negative values, which otherwise would make no sense for the interface, are reserved for signalling error conditions. On the other hand, some functions need to return unsigned results, but zero never makes sense as a valid result, and then zero is used to signal errors.
In short, do whatever makes sense in the application or library you are developing, but aim for consistency. And I mean consistency with external code too, not just your own code. If you're using third-party or library code that follows a particular convention and your code is designed to be closely coupled to that third-party code, it might make sense to follow that code's conventions so that other programmers working on the project don't get unwanted surprises.
And finally, as others have said, whatever your convention, document it!
It is fine as long as you document it well.
I think it ultimately depends on the customers of your code.
In my last system we used more or less the same coding system as yours, with "0" meaning "I did nothing at all" (e.g. calling Init() twice on an object). This worked perfectly well and everybody who worked on that system knew this was the convention.
However, if you are writing an API that can be sold to external customers, or writing a module that will be plugged into an existing, "standard-RC" system, I would advise you to stick to the 0-on-success rule, in order to avoid future confusion and possible pitfalls for other developers.
And as per your PS: when in Rome, do as the Romans do :-)
I think you should follow the Principle Of Least Astonishment
The POLA states that, when two elements of an interface conflict, or are ambiguous, the behaviour should be that which will least surprise the user; in particular a programmer should try to think of the behavior that will least surprise someone who uses the program, rather than that behavior that is natural from knowing the inner workings of the program.
If your code is for internal consumption only, you may get away with it, though. So it really depends on the people your code will impact :)
There is nothing wrong with doing it that way, assuming you document it in a way that ensures others know what you're doing.
However, as an alternative, it might be worth exploring the option of returning an enumerated type defining the codes. Something like:
enum returnCode {
    SUCCESS, FAILURE, NO_CHANGE
};
That way, it's much more obvious what your code is doing, self-documenting even. But might not be an option, depending on your code base.
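A hypothetical caller then reads naturally against the named codes (do_work() is only a placeholder):

enum returnCode do_work(void);               /* assumed to exist elsewhere */

void handle_result(void)
{
    switch (do_work()) {
    case SUCCESS:   /* carry on */            break;
    case NO_CHANGE: /* retry or wait */       break;
    case FAILURE:   /* report and bail out */ break;
    }
}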
It is a convention only. I have worked with many APIs that abandon the principle when they want to convey more information to the caller. As long as you're consistent with this approach, any experienced programmer will quickly pick up the convention. What is hard is when each function uses a different approach, e.g. the Win32 API.
In my opinion (and that's the opinion of someone who tends to do out-of-band error messaging thanks to working in Java), I'd say it is acceptable if your functions are of a kind that require strict return-value processing anyway.
So if the return value of your method has to be inspected at all points where it's called, then such a non-standard solution might be acceptable.
If, however, the return value might be ignored or just checked for success at some points, then the non-standard solution produces quite a few problems (for example, you can no longer use the if (!myFunction()) ohNoesError(); idiom).
What is your problem? It is just a convention, not a law. If your logic makes more sense for your application, then it is fine, as long as it is well documented and consistent.
On Unix, exit status is unsigned, so this approach won't work if you ever have to run your program there, and it will confuse all your Unix programmers to no end. (I looked it up just now to make sure, and discovered to my surprise that Windows uses a signed exit status.) So I guess it will probably only mostly confuse your Windows programmers. :-)
I'd find another method to pass status between processes. There are many to choose from, some quite simple. You say "at the cost of a few WTF's on the programmers side" as if that's a small cost, but it sounds like a huge cost to me. Re-using an int in C is a minuscule benefit to be gained from confusing other programmers.
You need to go on a case by case basis. Think about the API and what you need to return. If your function only needs to return success or failure, I'd say give it an explicit type of bool (C99 has a bool type now) and return true for success and false for failure. That way things like:
if (!doSomething())
{
    // failure processing
}
read naturally.
In many cases, however, you want to return some data value, in which case some specific unused or unlikely to be used value must be used as the failure case. For example the Unix system call open() has to return a file descriptor. 0 is a valid file descriptor as is theoretically any positive number (up to the maximum a process is allowed), so -1 is chosen as the failure case.
In other cases, you need to return a pointer. NULL is an obvious choice for failure of pointer returning functions. This is because it is highly unlikely to be valid and on most systems can't even be dereferenced.
One of the most important considerations is whether the caller and the called function or program will be updated by the same person at any given time. If you are maintaining an API where a function will return the value to a caller written by someone who may not even have access to your source code, or when it is the return code from a program that will be called from a script, only violate conventions for very strong reasons.
You are talking about passing information across a boundary between different layers of abstraction. Violating the convention ties both the caller and the callee to a different protocol increasing the coupling between them. If the different convention is fundamental to what you are communicating, you can do it. If, on the other hand, it is exposing the internals of the callee to the caller, consider whether you can hide the information.
I'm totally blown away by the quality of the Windows SRW lock implementation. It's faster than critical sections and has just a few bytes of memory overhead.
Unfortunately it is only available on Windows Vista / Windows 7.
As this is a pure user-land implementation, does anybody know if there is a cross-platform implementation of it? Has anybody reverse-engineered their solution?
And please, I don't want to add stuff like Boost just to pull in a less-than-100-LOC solution.
If you want something "portable" in the sense of conforming to some standard... If you are using POSIX threads there is pthread_rwlock_init() and friends. These are of course not typically used on Windows but rather Unix-type OSes.
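For reference, a minimal sketch of that API (the protected value is made up):

#include <pthread.h>

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value;                  /* the data being protected */

int read_value(void)
{
    pthread_rwlock_rdlock(&rwlock);       /* many readers may hold this at once */
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void write_value(int v)
{
    pthread_rwlock_wrlock(&rwlock);       /* exclusive access */
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}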
But if you mean "portable" in the sense of "portable to multiple versions of Windows..." There are some undocumented calls in ntdll which implement RW locks. RtlAcquireResourceShared() and RtlAcquireResourceExclusive().
Here are some prototypes from WINE's implementation:
void WINAPI RtlInitializeResource(LPRTL_RWLOCK rwl);
void WINAPI RtlDeleteResource(LPRTL_RWLOCK rwl);
BYTE WINAPI RtlAcquireResourceExclusive(LPRTL_RWLOCK rwl, BYTE fWait);
BYTE WINAPI RtlAcquireResourceShared(LPRTL_RWLOCK rwl, BYTE fWait);
void WINAPI RtlReleaseResource(LPRTL_RWLOCK rwl);
Note you may have to GetProcAddress() these from ntdll.dll yourself.
As for the structure referenced... Here's what WINE declares:
typedef struct _RTL_RWLOCK {
    RTL_CRITICAL_SECTION rtlCS;
    HANDLE hSharedReleaseSemaphore;
    UINT uSharedWaiters;
    HANDLE hExclusiveReleaseSemaphore;
    UINT uExclusiveWaiters;
    INT iNumberActive;
    HANDLE hOwningThreadId;
    DWORD dwTimeoutBoost;
    PVOID pDebugInfo;
} RTL_RWLOCK, *LPRTL_RWLOCK;
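A hypothetical sketch of resolving those entry points at run time, reusing the RTL_RWLOCK declaration above (error handling kept minimal):

#include <windows.h>

typedef void (WINAPI *RtlInitializeResource_t)(LPRTL_RWLOCK);
typedef BYTE (WINAPI *RtlAcquireResourceShared_t)(LPRTL_RWLOCK, BYTE);
typedef void (WINAPI *RtlReleaseResource_t)(LPRTL_RWLOCK);

static RtlInitializeResource_t    pRtlInitializeResource;
static RtlAcquireResourceShared_t pRtlAcquireResourceShared;
static RtlReleaseResource_t       pRtlReleaseResource;

static BOOL load_rtl_rwlock(void)
{
    /* ntdll.dll is already mapped into every Win32 process. */
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    if (!ntdll)
        return FALSE;
    pRtlInitializeResource    = (RtlInitializeResource_t)GetProcAddress(ntdll, "RtlInitializeResource");
    pRtlAcquireResourceShared = (RtlAcquireResourceShared_t)GetProcAddress(ntdll, "RtlAcquireResourceShared");
    pRtlReleaseResource       = (RtlReleaseResource_t)GetProcAddress(ntdll, "RtlReleaseResource");
    return pRtlInitializeResource && pRtlAcquireResourceShared && pRtlReleaseResource;
}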
If you don't want to use pthreads and you don't want to link to sketchy undocumented functionality... You can look up a rwlock implementation and implement it yourself in terms of other operations... Say InterlockedCompareExchange(), or perhaps higher level primitives such as semaphores and events.
You can certainly roll your own using the same ideas as slim rwlock (at least what I imagine they did, since this is fairly straightforward). I outlined the approach in some detail in this other question.
For your case, you can mostly ignore the "fair" aspect, but the implementation is essentially the same. In particular, if you are willing to let an indefinite stream of readers block writers, you always let readers in when the lock already has readers in it (i.e., state (2) and (3) more or less collapse together).
In your case, for the cross-platform angle, you would need to implement the blocking with either Windows events or pthread condvars, but the details are similar in either case. Or, if you really want to avoid blocking at all, your only choice is spinning (ideally using the pause instruction to be nice to the CPU), which makes things even easier by removing the whole fallback-to-blocking code.
A good implementation is probably a couple hundred LOC. I wrote one (closed source, I cannot share it) and it performs excellently (better than the slim lock, in fact).
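For illustration only, here is a tiny spinning-only reader/writer lock built on InterlockedCompareExchange(), in the spirit of the roll-your-own suggestion above. It is unfair (writers can starve) and is certainly not the real SRW algorithm, just a sketch:

#include <windows.h>

typedef struct { volatile LONG state; } spin_rwlock;   /* -1 = writer, >= 0 = reader count */

void rw_rdlock(spin_rwlock *l)
{
    for (;;) {
        LONG s = l->state;
        if (s >= 0 && InterlockedCompareExchange(&l->state, s + 1, s) == s)
            return;                       /* became one more reader */
        YieldProcessor();                 /* pause; be nice to the CPU */
    }
}

void rw_rdunlock(spin_rwlock *l)
{
    InterlockedDecrement(&l->state);
}

void rw_wrlock(spin_rwlock *l)
{
    while (InterlockedCompareExchange(&l->state, -1, 0) != 0)
        YieldProcessor();                 /* wait until no readers or writers remain */
}

void rw_wrunlock(spin_rwlock *l)
{
    InterlockedExchange(&l->state, 0);
}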