Overflows in size_t additions (C)

I like to have my code warning-free for VS.NET and GCC, and I like to have my code 64-bit ready.
Today I wrote a little module that deals with in-memory buffers and provides access to the data via a file-style interface (e.g. you can read bytes, write bytes, seek around, etc.).
As the data type for the current read position and size I used size_t, since that seems to be the most natural choice. It gets around the warnings and it ought to work on 64-bit as well.
Just in case: My structure looks like this:
typedef struct
{
    unsigned char *m_Data;
    size_t         m_CurrentReadPosition;
    size_t         m_DataSize;
} MyMemoryFile;
The signedness of size_t does not seem to be consistent in practice; a Google code search turned up both signed and unsigned definitions.
Now I'm in a dilemma: I want to check additions involving size_t for overflow, because I have to deal with user-supplied data and third-party libraries will use my code. However, for the overflow check I have to know the signedness. It makes a huge difference in the implementation.
So - how the heck should I write such a code in a platform and compiler independent way?
Can I check the signedness of size_t at run or compile-time? That would solve my problem. Or maybe size_t wasn't the best idea in the first place.
Any ideas?
EDIT: I'm looking for a solution for the C-language!

Regarding whether size_t is signed or unsigned under GCC (from an old GCC manual; I'm not sure if it's still there):
There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type. For compatibility with existing systems' header files, GCC defines size_t in stddef.h to be whatever type the system's sys/types.h defines it to be. Most Unix systems that define size_t in sys/types.h define it to be a signed type. Some code in the library depends on size_t being an unsigned type, and will not work correctly if it is signed.
The GNU C library code which expects size_t to be unsigned is correct. The definition of size_t as a signed type is incorrect. We plan that in version 2.4, GCC will always define size_t as an unsigned type, and the 'fixincludes' script will massage the system's sys/types.h so as not to conflict with this.
In the meantime, we work around this problem by telling GCC explicitly to use an unsigned type for size_t when compiling the GNU C library. 'configure' will automatically detect what type GCC uses for size_t and arrange to override it if necessary.
If you want a signed counterpart of size_t, use ptrdiff_t, or on some systems there is a typedef for ssize_t.

size_t is an unsigned integral type, according to the C and C++ standards. Any implementation that makes size_t signed is seriously nonconforming, and probably has other portability problems as well. Unsigned arithmetic is guaranteed to wrap around on overflow, meaning that you can write tests like if (a + b < a) to detect it.
size_t is an excellent type for anything involving memory. You're doing it right.

size_t should be unsigned.
It's typically defined as unsigned long.
I've never seen it be defined otherwise. ssize_t is its signed counterpart.
EDIT:
GCC defined it as signed in some circumstances. Compiling in ANSI C mode or with -std=c99 should force it to be unsigned.

For the C language, use IntSafe, also released by Microsoft (not to be confused with the C++ library SafeInt). IntSafe is a set of C-language function calls that perform math and conversions safely.
updated URL for intsafe functions

Use SafeInt. It is a class designed by Michael Howard and released as open source by Microsoft. It is designed to make working with integers safe where overflow is identified as a risk. All overflows are converted to exceptions and handled. The class is designed to make correct usage easy.
For example :
char CouldBlowUp(char a, char b, char c)
{
    SafeInt<char> sa(a), sb(b), sc(c);
    try
    {
        return (sa * sb + sc).Value();
    }
    catch (SafeIntException err)
    {
        ComplainLoudly(err.m_code);
    }
    return 0;
}
Also, SafeInt is used a lot internally at Microsoft in products like Office.

I am not sure if I understand the question exactly, but maybe you can do something like:
temp = value_to_be_added_to;
value_to_be_added_to += value_to_add;
if (temp > value_to_be_added_to)
{
    /* overflow... */
}
Since unsigned arithmetic wraps back to lower values, you can easily check whether it overflowed.


Dealing with long type from a 32-bit codebase on a 64-bit system (Linux)

I have a program written originally WAY back in 1995, maintained up to 2012.
It's obviously written for a 32-bit architecture. I've managed to get the damn thing running, but I'm getting stumped on how it saves data...
My issue is with sizeof(long) under 64-bit (a common problem, I know). I've tried doing a sed across the code and replacing long with int32_t, but then I get errors where it tries to define a variable like:
unsigned long int count;
I've also tried -m32 in the gcc options, but then it fails to link due to 64-bit libraries being required.
My main issue is where it tries to save player data (it's a MUD), at the following code lines:
if ((sizeof(char) != 1) || (int_size != long_size))
{
    logit(LOG_DEBUG,
          "sizeof(char) must be 1 and int_size must == long_size for player saves!\n");
    return 0;
}
Commenting this out allows the file to save, but because it's reading bytes from a buffer as it reloads the characters, the saved file is no longer readable by the load function.
Can anyone offer advice, maybe using a typedef?
I'm trying to avoid having to completely rewrite the save/load routines - this is my very last resort!
Thanks in advance for answers!
Instead of using types like int and long, you can use int32_t and int64_t, which are typedefs for types that have the correct size in your environment. They exist in signed and unsigned variants, as in int32_t and uint32_t.
In order to use them you need to include stdint.h. If you include inttypes.h you will also get macros useful when printing with printf, e.g. PRIu64.

What is the actual code definition of the typedef size_t? [duplicate]

I see variables defined with this type but I don't know where it comes from, nor what its purpose is. Why not use int or unsigned int? (What about other "similar" types, e.g. void_t?)
From Wikipedia
The stdlib.h and stddef.h header files define a datatype called size_t which is used to represent the size of an object. Library functions that take sizes expect them to be of type size_t, and the sizeof operator evaluates to size_t.
The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to programming errors, particularly as 64-bit architectures become more prevalent.
From C99 7.17.1/2
The following types and macros are defined in the standard header stddef.h
<snip>
size_t
which is the unsigned integer type of the result of the sizeof operator
According to the size_t description on en.cppreference.com, size_t is defined in the following headers:
std::size_t
...
Defined in header <cstddef>
Defined in header <cstdio>
Defined in header <cstring>
Defined in header <ctime>
Defined in header <cwchar>
size_t is the unsigned integer type of the result of the sizeof operator (ISO C99 Section 7.17.)
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t (ISO C99 Section 6.5.3.4.)
IEEE Std 1003.1-2017 (POSIX.1) specifies that size_t be defined in the header sys/types.h, whereas ISO C specifies the header stddef.h. In ISO C++, the type std::size_t is defined in the standard header cstddef.
Practically speaking, size_t represents the number of bytes you can address. On most modern architectures for the last 10-15 years that has been 32 bits, which has also been the size of unsigned int. However, we are moving to 64-bit addressing, while unsigned int will most likely stay at 32 bits (its size is not guaranteed in the C++ standard). To make code that depends on the memory size portable across architectures, you should use a size_t. For example, things like array sizes should always use size_t. If you look at the standard containers, ::size() always returns a size_t.
Also note that Visual Studio has a compile option called "Detect 64-bit Portability Issues" that can check for these types of errors.
This way you always know what the size is, because a specific type is dedicated to sizes. The very own question shows that it can be an issue: is it an int or an unsigned int? Also, what is the magnitude (short, int, long, etc.)?
Because there is a specific type assigned, you don't have to worry about the length or the signed-ness.
The actual definition can be found in the C++ Reference Library, which says:
Type: size_t (Unsigned integral type)
Header: <cstring>
size_t corresponds to the integral data type returned by the language operator sizeof and is defined in the <cstring> header file (among others) as an unsigned integral type.
In <cstring>, it is used as the type of the parameter num in the functions memchr, memcmp, memcpy, memmove, memset, strncat, strncmp, strncpy and strxfrm, in all cases to specify the maximum number of bytes or characters the function has to affect.
It is also used as the return type for strcspn, strlen, strspn and strxfrm to return sizes and lengths.
size_t should be defined in your standard library's headers. In my experience, it usually is simply a typedef to unsigned int. The point, though, is that it doesn't have to be.
Types like size_t allow the standard library vendor the freedom to change its underlying data types if appropriate for the platform. If you assume size_t is always unsigned int (via casting, etc), you could run into problems in the future if your vendor changes size_t to be e.g. a 64-bit type. It is dangerous to assume anything about this or any other library type for this reason.
I'm not familiar with void_t except as a result of a Google search (it's used in a vmalloc library by Kiem-Phong Vo at AT&T Research - I'm sure it's used in other libraries as well).
The various xxx_t typedefs are used to abstract a type from a particular definite implementation, since the concrete types used for certain things might differ from one platform to another. For example:
size_t abstracts the type used to hold the size of objects because on some systems this will be a 32-bit value, on others it might be 16-bit or 64-bit.
Void_t abstracts the type of pointer returned by the vmalloc library routines because it was written to work on systems that pre-date ANSI/ISO C where the void keyword might not exist. At least that's what I'd guess.
wchar_t abstracts the type used for wide characters since on some systems it will be a 16 bit type, on others it will be a 32 bit type.
So if you write your wide character handling code to use the wchar_t type instead of, say unsigned short, that code will presumably be more portable to various platforms.
In minimalistic programs where a size_t definition was not loaded "by chance" in some include but I still need it in some context (for example to access std::vector<double>), then I use that context to extract the correct type. For example typedef std::vector<double>::size_type size_t.
(Surround with namespace {...} if necessary to make the scope limited.)
As for "Why not use int or unsigned int?", simply because it's semantically more meaningful not to. There's the practical reason that it can be, say, typedefd as an int and then upgraded to a long later, without anyone having to change their code, of course, but more fundamentally than that a type is supposed to be meaningful. To vastly simplify, a variable of type size_t is suitable for, and used for, containing the sizes of things, just like time_t is suitable for containing time values. How these are actually implemented should quite properly be the implementation's job. Compared to just calling everything int, using meaningful typenames like this helps clarify the meaning and intent of your program, just like any rich set of types does.

Difference between size_t and mwSize when compiling C MEX-files for Matlab

I am currently working on porting some C MEX-files from 32-bit Matlab to 64-bit Matlab.
While doing so, I have encountered two types: one coming from the Matlab people, and one that is standard C.
This is what the Matlab documentation is saying about mwSize:
mwSize (C and Fortran)
Type for size values
Description
mwSize is a type that represents size values, such as array dimensions. Use this function for cross-platform flexibility. By default, mwSize is equivalent to int in C. When using the mex -largeArrayDims switch, mwSize is equivalent to size_t in C. In Fortran, mwSize is similarly equivalent to INTEGER*4 or INTEGER*8, based on platform and compilation flags.
This is what Wikipedia is saying about size_t:
size_t is an unsigned data type defined by several C/C++ standards (e.g., the C99 ISO/IEC 9899 standard) that is defined in stddef.h.[1] It can be further imported by inclusion of stdlib.h as this file internally sub includes stddef.h[2].
This type is used to represent the size of an object. Library functions that take or return sizes expect them to be of this type or have the return type of size_t. Further, the most frequently used compiler-based operator sizeof should evaluate to a value that is compatible with size_t.
The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to programming errors,[3][4] when moving from 32 to 64-bit architecture, for example.
As far as I can see, these types are actually the same. My questions are:
Are they?
If they are, which one would be considered better programming taste to use? Ideally we would like our code to be compatible with future Matlab releases as well. I am guessing that the answer is mwSize, but I am not sure.
Edit: I should add that the Matlab people are using both. For example:
size_t mxGetN(const mxArray *pm);
is a function that is retrieving the number of columns of an mxArray. However, when one creates a matrix, one uses,
mxArray *mxCreateDoubleMatrix(mwSize m, mwSize n, mxComplexity ComplexFlag);
where the input evidently should be mwSize.
mwSize is defined for backward compatibility and portability. As the documentation states, it maps to int when the -largeArrayDims switch is not used during compilation, and to size_t when it is. So in the first case mwSize is signed, but in the second it isn't.
Using mwSize in your code allows you to re-use the code on all platforms, irrespective of whether that flag is used or not.
As for the API inconsistencies you've pointed out, they are truly inconsistencies, but not ones of major concern. mxGetN() will never return a negative number, so having it return a size_t is OK. However, (I'm guessing) older versions of the mex API, or versions on certain platforms, expect an int to be passed to mxCreateDoubleMatrix(), so defining the function as taking an input of type mwSize makes it portable and/or backward compatible.
Short answer: use mwSize, and use -largeArrayDims to compile the MEX function.

Can we change the size of size_t in C?

Can we change the size of size_t in C?
No. But why would you even want to do it?
size_t is not a macro. It is a typedef for a suitable unsigned integer type.
size_t is defined in <stddef.h> (and other headers).
It probably is typedef unsigned long long size_t; and you really should not even think about changing it. The Standard Library uses it as defined by the Standard Library. If you change it, since you cannot change the Standard Library, you'll get all kinds of errors because your program uses a different size for size_t than the Standard Library does. You could no longer call malloc(), strncpy(), snprintf(), ...
If you want to fork Linux or NetBSD, then "Yes"
Although you can redefine macros, this one is probably a typedef.
If you are defining an environment then it's perfectly reasonable to specify size_t as you like. You will then be responsible for all the C99 standard functions for which conforming code expects size_t.
So, it depends on your situation. If you are developing an application for an existing platform, then the answer is no.
But if you are defining an original environment with one or more compilers, then the answer is yes, but you have your work cut out for you. You will need an implementation of all the library routines with an API element of size_t which can be compiled with the rest of your code with the new size_t typedef. So, if you fork NetBSD or Linux, perhaps for an embedded system, then go for it. Otherwise, you may well find it "not worth the effort".

C: Why isn't size_t a C keyword?

sizeof is a C keyword. It yields the size as a value of a type named size_t. However, size_t is not a keyword; it is defined primarily in stddef.h, and probably in other C standard header files too.
Consider a scenario where you want to create a C program which does not include any C standard headers or libraries. (Like for example, if you are creating an OS kernel.) Now, in such code, sizeof can be used (it is a C keyword, so it is a part of the language), but the type that it returns (size_t) is not available!
Does not this signify some kind of a problem in the C standard specification? Can you clarify this?
It does not literally return a value of type size_t, since size_t is not a concrete type in itself but a typedef for an unspecified built-in type. Typedef identifiers (such as size_t) are completely equivalent to their respective underlying types (and are converted thereto at compile time). If size_t is defined as unsigned int on your platform, then sizeof yields an unsigned int when compiled on your system. size_t is just a handy way of maintaining portability, and it only needs to be included from stddef.h if you are using it explicitly by name.
sizeof is a keyword because, despite its name and usage, it is an operator like + or = or < rather than a function like printf() or atoi() or fgets(). A lot of people forget (or just don't know) that sizeof is actually an operator, and it is always resolved at compile time rather than at runtime.
The C language doesn't need size_t to be a usable, consistent language. That's just part of the standard library. The C language needs all operators. If, instead of +, C used the keyword plus to add numbers, you would make it an operator.
Besides, I do semi-implicit recasting of size_ts to unsigned ints (and regular ints, but Kernighan and Ritchie will someday smite me for this) all the time. You can assign the result of sizeof to an int if you like, but in my work I'm usually just passing it straight on to a malloc() or something.
Some headers from the C standard are defined for a freestanding environment, i.e. fit for use, for example, in an operating system kernel. They do not define any functions, merely macros and typedefs.
They are float.h, iso646.h, limits.h, stdarg.h, stdbool.h, stddef.h and stdint.h.
When working on an operating system, it isn't a bad idea to start with these headers. Having them available makes many things easier in your kernel. Especially stdint.h will come in handy (uint32_t et al.).
Does not this signify some kind of a problem in the C standard specification?
Look up the difference between a hosted implementation of C and a freestanding C implementation. The freestanding (C99) implementation is required to provide headers:
<float.h>
<iso646.h>
<limits.h>
<stdarg.h>
<stdbool.h>
<stddef.h>
<stdint.h>
These headers do not define any functions at all. They define parts of the language that are somewhat compiler specific (for example, the offsetof macro in <stddef.h>, and the variable argument list macros and types in <stdarg.h>), but they can be handled without actually being built into the language as full keywords.
This means that even in your hypothetical kernel, you should expect the C compiler to provide these headers and any underlying support functions - even though you provide everything else.
I think that the main reasons that size_t is not a keyword are:
there's no compelling reason for it to be. The designers of the C and C++ languages have always preferred to have language features be implemented in the library if possible and reasonable
adding keywords to a language can create problems for an existing body of legacy code. This is another reason they are generally resistant to adding new keywords.
For example, in discussing the next major revision of the C++ standard, Stroustrup had this to say:
The C++0x improvements should be done in such a way that the resulting language is easier to learn and use. Among the rules of thumb for the committee are:
...
Prefer standard library facilities to language extensions
...
There is no reason not to include stddef.h, even if you are working on a kernel - it defines type sizes for your specific compiler that any code will need.
Note also that almost all C compilers are self-compiled. The code in the compiler that implements the sizeof operator will therefore itself use size_t and reference the same stddef.h file as user code does.
From MSDN:
When the sizeof operator is applied
to an object of type char, it yields 1
Even if you don't have stddef.h available/included and don't know about size_t, using sizeof you can get the size of objects relative to char.
size_t is actually a type - often unsigned int. sizeof is an operator that gives the size of a type. The underlying type of sizeof's result is implementation-specific, not fixed by the C standard. It's just an unsigned integer.
Edit:
To be very clear: you do not need the size_t type in order to use sizeof. I think the answer you're looking for is: yes, it is inconsistent. However, it doesn't matter. You can still use sizeof correctly without having a size_t definition from a header file.
size_t is not a keyword by necessity. Different architectures often have different sizes for integral types. For example, a 64-bit machine is likely to have unsigned long long as its size_t if it didn't make int a 64-bit datatype.
If you made the result of sizeof a type built into the compiler with a fixed definition, it would take away the power to do cross-compilation.
Also, sizeof is more like a magic compile-time macro (think C++ template), which explains why it is a keyword rather than a defined type.
The simple reason is because it is not a fundamental type. If you look up the C standard you will find that fundamental types include int, char etc but not size_t. Why so? As others have already pointed out, size_t is an implementation specific type (i.e. a type capable of holding the size in number of "C bytes" of any object).
On the other hand, sizeof is an (unary) operator. All operators are keywords.
