Does Cyclone perform static or dynamic checks on fat pointers?

I am working my way through Cyclone: A Safe Dialect of C for a PL class. The paper's authors explain that they've added a special 'fat' pointer that stores bounds information to prevent buffer overflows, but they don't specify whether the check on this pointer is static or dynamic. The example they give seems to imply that the programmer must remember to check the size of the array to avoid exceeding the buffer. That seems to open up the possibility of programming errors, just like in C, and I thought the whole idea of Cyclone was to make such errors impossible. Does the language perform a check, or does it just make programming mistakes harder to make?
int strlen(const char ?s) {
    int i, n;
    if (!s) return 0;
    n = s.size; // what if the programmer forgets to do this.. or accidentally adds an n++?
    for (i = 0; i < n; i++, s++)
        if (!*s) return i;
    return n;
}

"Fat" pointers support pointer arithmetic with run-time bounds
checking.
Obtained from Wikipedia by googling for “fat pointers”.
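So the check is dynamic: the Cyclone compiler inserts a bounds test at each dereference of a fat pointer, and a failed check raises a run-time exception rather than silently overflowing. The s.size read in the strlen example just supplies the loop bound; even if the programmer gets it wrong, the inserted check still catches the out-of-bounds access. Here is a minimal C sketch of the idea (the struct layout, field names, and failure handling are illustrative assumptions, not Cyclone's actual representation):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical representation of a fat (?) pointer to char. */
struct fat_char_ptr {
    char *base;   /* start of the underlying buffer   */
    size_t size;  /* number of elements in the buffer */
    char *cur;    /* current position                 */
};

/* What the compiler conceptually emits for *p on a fat pointer:
   a dynamic bounds check before the actual load. */
char fat_deref(struct fat_char_ptr *p)
{
    if (p->cur < p->base || p->cur >= p->base + p->size) {
        fprintf(stderr, "bounds check failed\n"); /* Cyclone would throw an exception here */
        exit(1);
    }
    return *p->cur;
}

int main(void)
{
    char buf[] = "abc";
    struct fat_char_ptr p = { buf, sizeof buf, buf + 1 };
    printf("%c\n", fat_deref(&p)); /* prints 'b'; cur = buf + 4 would trip the check */
    return 0;
}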

Related

Grammatical difficulties, unexpected output

Could you please tell me why these two pieces of code produce different output when run?
void UART_OutString(unsigned char buffer[]){
    int i;
    while(buffer[i]){
        UART_OutChar(buffer[i]);
        i++;
    }
}
and
void UART_OutString(unsigned char buffer[]){
    int i = 0;
    while(buffer[i]){
        UART_OutChar(buffer[i++]);
    }
}
regards, Genadi
You didn't initialize the i variable in the first case, so it's an uninteresting typo bug that your compiler ought to warn you about...
That being said, we can apply the KISS principle and rewrite the whole code in the most readable way possible, a for loop, which by its nature makes it very hard to forget to initialize the loop iterator:
void UART_OutString(const char* buf){
    for(int i=0; buf[i]!='\0'; i++){
        UART_OutChar(buf[i]);
    }
}
As it turns out, the most readable way is very often the fastest way possible too.
(However, int might be inefficient on certain low-end systems, so if you are fine with strings of length 255 or less, uint8_t i would be a better choice, as sketched below. Embedded systems should never use int; always use the stdint.h types.)
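For instance, a hedged sketch of the same loop with a uint8_t counter (assuming, as stated, strings of at most 255 characters):

#include <stdint.h>

void UART_OutChar(unsigned char c);  /* assumed provided elsewhere */

void UART_OutString(const char *buf)
{
    /* uint8_t index: fine as long as strings are at most 255 chars;
       otherwise the counter wraps and the loop never terminates */
    for (uint8_t i = 0; buf[i] != '\0'; i++) {
        UART_OutChar((unsigned char)buf[i]);
    }
}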
For what it's worth, I'd implement this as
void UART_OutChar(unsigned char c);

void UART_OutString(unsigned char buffer[]){
    for(unsigned char *p = buffer; *p; p++) {
        UART_OutChar(*p);
    }
}
to avoid the separate counter variable at all.
It is always a good idea to initialize local variables, especially in C, where you should assume that nothing is done for you (because that's usually the case). There is a reason why coding standards for regulated industries would not allow you to leave a variable uninitialized like this.
Reading the uninitialized variable results in undefined behaviour: the value of i is indeterminate, so C will just grab whatever happens to be in that memory, which makes the result completely unpredictable.
This can cause all kinds of further problems, because you then index an array with it. C will not stop you from indexing an array out of bounds, so if the garbage value of i happens to be larger than the size of the array, reading buffer[i] is undefined behaviour as well. This one can be particularly nasty: depending on what it decides to read, it could return garbage, trigger a segmentation fault, or crash your program outright.
Therefore an uninitialized i gives random behaviour, and you then get more random behaviour from using that value to index your array.
I believe that covers the reasons why this is a bad idea. In C it is particularly important to pay attention to things like this, because the compiler will often happily compile and run such code.
Both initializing i and using the solution in #AKX's answer are good fixes, but I thought this would better answer your question of why the two versions behave differently. The short answer: the first one behaves essentially at random.

What is the correct way to allocate memory for an array depending on a command line parameter?

When writing a program in which I ask the user to enter a number N, which I then have to use to allocate the memory for an int array, what is the correct way to handle this?
First approach:
int main() {
    int array[], n;
    scanf("%d\n", &n);
    array = malloc(n * sizeof(int));
}
or the second approach:
int main() {
    int n;
    scanf("%d\n", &n);
    int array[n];
}
Either one will work (though the first case needs to be changed from int array[] to int *array); the difference depends on where the array is stored.
In the first case, the array will be stored in the heap, while in the second case, it'll (most likely) be stored on the stack. When it's stored on the stack, the maximum size of the array will be much more limited based on the limit of the stack size. If it's stored in the heap, however, it can be much larger.
Your second approach is called a variable length array (VLA), and is supported only as of C99. This means that if you intend your code to be compatible with older compilers (or to be read and understood by older people..), you may have to fall back to the first option, which is more standard. Note that dynamically allocated memory requires proper maintenance, the most important part of which is freeing it when you're done (which you don't do in your program).
Assuming you meant to use int *array; instead of int array[]; (the latter wouldn't compile).
Always use the first approach unless you know the array size is going to be very small and you have intimate knowledge of the platforms you will be running on. Naturally, the question arises: how small is small enough?
The main problem with the second approach is that there's no portable way to verify whether the VLA (Variable Length Array) allocation succeeded. The advantage is that you don't have to manage the memory, but that's hardly an "advantage" considering the risk of undefined behaviour in case the allocation fails.
VLAs were introduced in C99 and made optional in C11, which suggests the committee found them not-so-useful. C11 compilers may not support them, so you have to perform an additional check on whether your compiler supports them, by testing whether __STDC_NO_VLA__ is defined.
Automatic storage allocation for an array as small as int my_arr[10]; could fail. This is an extreme and unrealistic example on modern operating systems, but possible in theory. So I suggest avoiding VLAs in any serious project.
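To make the first approach concrete, here is a minimal sketch of the heap version with the allocation check that a VLA cannot portably give you (input validation kept deliberately simple):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1 || n <= 0)
        return 1;                      /* bad or missing input */

    int *array = malloc(n * sizeof *array);
    if (array == NULL) {               /* allocation failure is detectable here */
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    /* ... use array[0] .. array[n-1] ... */

    free(array);                       /* release when done */
    return 0;
}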
You did say you wanted a COMMAND LINE parameter:
int main (int argc, char **argv)
{
    int *array ;
    int count ;

    if (argc < 2)
        return 1 ;

    count = atoi (argv[1]) ;
    array = malloc (sizeof(int) * count) ;
    . . . . .
    free (array) ;
    return 0 ;
}
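Note that atoi() gives no error indication for garbage input; if you want to reject it, strtol() from the standard library is an alternative. A sketch (not part of the answer above):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    char *end;
    errno = 0;
    long count = strtol(argv[1], &end, 10);   /* parse with error reporting */
    if (errno != 0 || end == argv[1] || *end != '\0' || count <= 0) {
        fprintf(stderr, "invalid count: %s\n", argv[1]);
        return 1;
    }

    int *array = malloc((size_t)count * sizeof *array);
    if (array == NULL)
        return 1;
    /* ... */
    free(array);
    return 0;
}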

C global unsized array?

We had a school project: an information system written in C. To keep a dynamically sized list of student records, I went for a linked list data structure. This morning my friend let me see his system. I was surprised by his list of records:
#include <stdio.h>
/* and the rest of the includes */

/* global unsized array */
int array[];

int main()
{
    int n;
    for (n = 0; n < 5; n++) {
        array[n] = n;
    }
    for (n = 0; n < 5; n++) {
        printf("array[%d] = %d\n", n, array[n]);
    }
    return 0;
}
As the code shows, he declared an unsized array that is global (in the BSS segment) to the whole program. He was able to add new entries to the array by overwriting subsequent blocks of memory with a value other than zero, so that he could traverse the array like this:
for (n = 0; array[n]; n++) {
    /* do something */
}
He used (and I also tested it with) Turbo C v1. I tried it on Linux and it also works.
As I have never encountered this technique before, I am presuming there is a problem with it. So, yeah, I want to know why this is a bad idea, and why a linked list should be preferred over it.
int array[];
is technically known as an array with incomplete type. Simply put, it is treated as equivalent to:
int array[1];
This is not good, simply because:
it produces undefined behavior. The primary legitimate use of an array with incomplete type is in the struct hack. Note that incomplete array types were standardized in C99; before that they were illegal.
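For contrast, the legitimate use mentioned above is the flexible array member, the C99-standardized form of the struct hack, sketched here with hypothetical names:

#include <stdlib.h>

struct record_list {
    size_t count;
    int data[];            /* flexible array member, C99 */
};

struct record_list *make_list(size_t n)
{
    /* one allocation covers the header and the n trailing elements */
    struct record_list *p = malloc(sizeof *p + n * sizeof p->data[0]);
    if (p != NULL)
        p->count = n;
    return p;
}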
This is undefined behaviour. You are writing to unallocated memory (beyond the array). In order to compile this, the compiler allocates at least one element, and you're then writing beyond that. Try a much bigger range of numbers: for example, if I run your code on Linux it works, but if I change the loop bound to 50,000, it crashes.
EDIT: The code may work for small values of n, but for larger values it will fail. To demonstrate this, I took your code and tested it with n = 1000.
Here is the CODEPAD link, where you can see that for n = 1000 a segmentation fault happens.
Whereas with the same code and the same compiler it works for n = 10; see this CODEPAD link. That is what undefined behavior means.
If you use linked lists you can check whether the memory is allocated properly or not.
int *ptr;

ptr = malloc(sizeof(int));
if (ptr == NULL)
{
    printf("No Memory!!!");
}
But with your code the program simply crashes if tested with an array having a large bound.

C memory issue with char*

I need help with my C code. I have a function that sets the value at a given spot in memory to the value you pass in.
The issue I am facing is that if the pointer moves past the allocated amount of memory, it should throw an error, and I am not sure how to check for this.
unsigned char *the_pool = malloc(1000);
char *num = /* a pointer to the start of the_pool, up to ten spots */;
num[i] = val;
num[11] = val; // this should throw an error in my function
So how can I check that I have moved into unauthorized memory space?
C will not catch this error for you. You must do it yourself.
For example, you could safely wrap access to your array in a function:
typedef struct
{
    char *data;
    int length;
} myArrayType;

void MakeArray( myArrayType *p, int length )
{
    p->data = (char *)malloc(length);
    p->length = length;
}

int WriteToArrayWithBoundsChecking( myArrayType *p, int index, char value )
{
    if ( index >= 0 && index < p->length )
    {
        p->data[index] = value;
        return 1; // return "success"
    }
    else
    {
        return 0; // return "failure"
    }
}
Then you can look at the return value of WriteToArrayWithBoundsChecking() to see if your write succeeded or not.
Of course you must remember to clean up the memory pointed at by the data member of your myArrayType when you are done. Otherwise you will cause a leak.
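For example, a caller might use it like this (a usage sketch building on the functions defined above):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    myArrayType arr;
    MakeArray(&arr, 10);

    if (!WriteToArrayWithBoundsChecking(&arr, 11, 'x')) {
        fprintf(stderr, "index 11 is out of bounds\n"); /* write rejected */
    }

    free(arr.data);  /* clean up to avoid the leak mentioned above */
    return 0;
}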
Don't you mean num[11] = val;?
Yes, there is no way to check that an access is out of bounds except by doing it yourself; C provides no way to do this. Also note that arrays start at zero, so num[10] is also out of bounds.
The standard defines this as undefined behavior: it might work, it might not, you never know. When coding in C/C++, make sure you check bounds before accessing your arrays.
Common C compilers will not perform array bounds checking for you.
Some compilers are available that claim to support array bounds checking, but their performance is usually poor enough compared to the normal compilers that they have not spread far and wide.
There are even dialects of C intended to provide memory safety, but again, these usually do not get very far. (Cyclone, discussed above, only supported 32-bit platforms last time I looked into it.)
You may build your own datastructures to provide bounds checking if you wish. If you maintain a structure that includes a pointer to the start of your data, a data member that includes the allocated size, and functions that work on the structure, you can implement all this. But the onus is entirely on you or your environment to provide these datastructures.
You could use sizeof to determine the bounds of a true array and avoid out-of-bounds indices. But C also allows you to access memory outside your array's bounds: that's fine as far as the C compiler is concerned, and the OS determines what actually happens when you do it.
C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
You could declare an array like this:
type name[size];
If you are using Visual Studio 2010 (or the 2011 Beta), it will tell you when you try to free the allocated memory, and there are advanced tools to check for leaked memory.
In your example, you have indeed moved into unauthorized memory space: your indexes should be between 0 and 999 inclusive.

Pass-by-reference in C - downsides?

Most high-level languages (Python, Ruby, even Java) use pass-by-reference. Obviously, we don't have references in C, but we can imitate them using pointers. There are a few benefits to doing this. For example:
int findChar(char ch, char* in)
{
    int i = 0;
    for(i = 0; in[i] != '\0'; i++)
        if(in[i] == ch)
            return i;
    return -1;
}
This is a common C paradigm: catch an abnormal or erroneous situation by returning some error value (in this case, return -1 if the character is not in the string).
The problem with this is: what if you want to support strings more than 2^31 - 1 characters long? The obvious solution is to return an unsigned int but that won't work with this error value.
The solution is something like this:
unsigned int* findChar(char ch, char* in)
{
    unsigned int i = 0;
    for(i = 0; in[i] != '\0'; i++)
        if(in[i] == ch)
        {
            unsigned int *index = (unsigned int*) malloc(sizeof(unsigned int));
            *index = i;
            return index;
        }
    return NULL;
}
There are some obvious optimizations which I didn't make for simplicity's sake, but you get the idea; return NULL as your error value.
If you do this with all your functions, you should also pass your arguments in as pointers, so that you can pass the results of one function to the arguments of another.
Are there any downsides to this approach (besides memory usage) that I'm missing?
EDIT: I'd like to add (if it isn't completely obvious by my question) that I've got some experience in C++, but I'm pretty much a complete beginner at C.
It is a bad idea because the caller is responsible for freeing the returned index; otherwise you are leaking memory. Alternatively, you can use a static int and return its address every time; there will be no leaks, but the function becomes non-reentrant, which is risky (but acceptable if you bear it in mind).
Much better would be to return a pointer to the char the function finds, or NULL if it is not present. That's the way strchr() works, BTW.
Edited to reflect changes in original post.
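As an illustration of the strchr()-style interface suggested above (a minimal sketch):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *s = "hello";
    const char *p = strchr(s, 'l');   /* points at the first 'l', or NULL if absent */
    if (p != NULL)
        printf("found at index %td\n", p - s);
    return 0;
}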
Without the malloc, the position can be still a stack variable and you can use it in an if statement:
int findChar(char ch, char* in, int* pos)
{
    int i = 0;
    for(i = 0; in[i] != '\0'; i++)
    {
        if(in[i] == ch)
        {
            *pos = i;
            return 1;
        }
    }
    return 0;
}
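A caller then checks the return value before touching pos, for example:

#include <stdio.h>

/* assumes the findChar() defined above is in scope */
int main(void)
{
    int pos;
    if (findChar('n', "banana", &pos))
        printf("found at index %d\n", pos);  /* prints 2 */
    else
        printf("not found\n");
    return 0;
}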
In the specific example, you should use size_t as the return type: this is the data type that adequately represents how large strings can get on any system. I.e. you can't possibly have a string that is longer than a size_t can represent. Then, you can fairly safely use (size_t)-1 as an error indicator: realistically, you also cannot put a string with that size into memory, since you also need some address space for the code you are executing; it becomes a limitation of your API that such long strings would not be supported if they existed.
Your approach not only has the disadvantage using more memory, but also the disadvantage of being slower: the callee needs to malloc, the caller needs to free. Those are fairly expensive operations.
There is one other standard approach relevant here: errno. In case of an error indicator, you don't know what the error is. So in C, rather than using an out parameter, we typically put the error details into a global or thread-local variable.
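A sketch of the size_t approach described above, using (size_t)-1 as the error indicator (the function name is mine, not from the question):

#include <stddef.h>

#define NOT_FOUND ((size_t)-1)  /* no real in-memory string can be this long */

size_t findCharIndex(char ch, const char *in)
{
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == ch)
            return i;
    }
    return NOT_FOUND;
}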
The function needs to dereference the parameters, which takes more time than accessing the stack.
The pointers can be uninitialized, causing unexpected results.
There is no standard way to specify which pointer is for input, which is for output, and which is for both (there are extensions and naming tricks, but it's still an issue).
I am not an expert, but I think a ton of small mallocs can cause problems. First, you have to take care of freeing the memory after the value has been used. Then you also have to deal with fragmentation of the free memory. Passing a pointer in is more suitable for complex structures.
I'd say the most severe downside to your code is that you use one return value to represent both a general failure and the result if successful.
While this is a common practice, it can lead to weird scenarios when requirements change, just like the one you described. An alternative practice would be to separate the return values, i.e. something like this:
#include <stdlib.h>  /* for EXIT_SUCCESS / EXIT_FAILURE */

int findChar(char ch, char const * const in, unsigned int * const index)
{
    if ( in != NULL && index != NULL)
    {
        unsigned int i;
        for(i = 0; in[i]; i++)
        {
            if(in[i] == ch)
            {
                *index = i;
                return EXIT_SUCCESS;
            }
        }
    }
    return EXIT_FAILURE;
}
...where the function return value tells you whether the function was successful or not, separately from the value of 'index'.
Then again, as fortran noted, there is no way to enforce whether the pointers are input values, output values, or both (i.e. modified inside the function).
The biggest downside is that it requires findChar()'s callers to free() the returned memory, or create a memory leak. You've reinvented the strchr() wheel poorly.
I also don't see why you're thinking that returning a pointer to unsigned int is such a big step forward. First, you could just return an unsigned int, if all you're after is the ability to return values up to 2^32 - 1 on a 32-bit machine instead of 2^31 - 1. Second, your stated goal is to avoid a problem with large strings. Well, what if you're on a 64-bit machine, where int and unsigned int remain 32 bits? What you really want here is a long, and returning pointers doesn't actually help with that.
ELIDED BOGUS CRITICISM
