Distinguish between string and byte array?


I have a lot of functions that expect a string as argument, for which I use char*, but all my functions that expect a byte-array, also use char*.
The problem is that I can easily make the mistake of passing a byte-array in a string-function, causing all kinds of overflows, because the null-terminator cannot be found.
How is this usually dealt with? I can imagine changing all my byte-array functions to take a uint8_t*, and then the compiler will warn about signedness when I pass a string. Or what is the right approach here?

I generally make an array something like the following
typedef struct {
unsigned char* data;
unsigned long length;
unsigned long max_length;
} array_t;
then pass array_t* around
and create array functions that take array_t*
void array_create(array_t* a, unsigned long length); // allocates memory, sets max_length, sets length to zero
void array_add(array_t* a, unsigned char byte); // adds a byte
etc
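A minimal sketch of how those two functions might be filled in. The doubling growth policy, the int return codes (the signatures above return void) and the array_destroy helper are my assumptions, not part of the original design.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned char *data;
    unsigned long length;      /* bytes currently stored */
    unsigned long max_length;  /* bytes allocated */
} array_t;

/* Allocates room for max_length bytes; returns 0 on success, -1 on failure. */
int array_create(array_t *a, unsigned long max_length)
{
    assert(a != NULL);
    if (max_length == 0)
        max_length = 1;        /* keeps the doubling in array_add meaningful */
    a->data = malloc(max_length);
    a->length = 0;
    a->max_length = a->data ? max_length : 0;
    return a->data ? 0 : -1;
}

/* Appends one byte, doubling the allocation when full (assumed policy). */
int array_add(array_t *a, unsigned char byte)
{
    assert(a != NULL && a->data != NULL);
    if (a->length == a->max_length) {
        unsigned long new_max = a->max_length * 2;
        unsigned char *p = realloc(a->data, new_max);
        if (p == NULL)
            return -1;         /* the old buffer is still valid */
        a->data = p;
        a->max_length = new_max;
    }
    a->data[a->length++] = byte;
    return 0;
}

/* Hypothetical cleanup helper. */
void array_destroy(array_t *a)
{
    assert(a != NULL);
    free(a->data);
    a->data = NULL;
    a->length = a->max_length = 0;
}
```

Because the length travels with the data, a zero byte is just another value, and a function taking array_t* cannot be handed a plain string by accident.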

The problem is more general in C than you are thinking. Since char* and char[] are equivalent for function parameters, such a parameter may refer to three different semantic concepts:
a pointer to a single char object (this is the "official" definition of pointer types)
a char array
a string
Wherever it is possible, the modern interfaces in the C standard use void* for an untyped byte array, and you should probably adhere to that convention, and use char* only for strings.
char[] by themselves are probably rarely used as such; I can't imagine a lot of use cases for these. If you think of them as numbers, you should use the signed or unsigned variant; if you see them just as bit patterns, unsigned char should be your choice.
If you really mean an array as function parameter (char or not) you can mark that fact for the casual reader of your code by clearly indicating it:
void toto(size_t n, char A[const n]);
This is equivalent to
void toto(size_t n, char *const A);
but makes your intention clearer. And in the future there might even be tools that do the bounds checking for you.

Write a common structure to handle both string and bytes.
struct str_or_byte
{
int type;
union
{
char *buf;
char *str;
}pointer;
int buf_length;
};
If type is not string, access pointer.buf only up to buf_length. Otherwise access pointer.str directly, without checking buf_length, and maintain it as a null-terminated string.
Alternatively, treat strings as byte arrays too: track only the length and don't store a null terminator for strings.
struct str_or_byte
{
char *buf;
int buf_length;
};
And don't use string manipulation functions that ignore the length. That means use strncpy, strncat, strncmp ... instead of strcpy, strcat, strcmp...
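A small sketch of the length-only variant. The names byte_str, byte_str_equal and byte_str_from_cstr are made up for illustration. I use memcmp here rather than strncmp, because strncmp still stops at the first zero byte, which matters if the buffer may hold binary data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct byte_str {
    const char *buf;
    size_t buf_length;   /* number of valid bytes; no terminator is stored */
};

/* Returns 1 when both hold the same bytes, 0 otherwise. */
int byte_str_equal(struct byte_str a, struct byte_str b)
{
    if (a.buf_length != b.buf_length)
        return 0;
    return a.buf_length == 0 || memcmp(a.buf, b.buf, a.buf_length) == 0;
}

/* Wraps a C string without copying; the terminator is left out of the count. */
struct byte_str byte_str_from_cstr(const char *s)
{
    struct byte_str r;
    assert(s != NULL);
    r.buf = s;
    r.buf_length = strlen(s);
    return r;
}
```

With this layout the only place a null terminator is ever trusted is byte_str_from_cstr, so there is a single spot to audit.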

Use a convention. Here are the rules I use (modeled on the standard library):
void foo(char* a_string);
void bar(void* a_byte_array, size_t number_of_bytes_in_the_array);
This is easy to remember. If you are passing a single char* ptr, then it MUST be a null-terminated char array.
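To make the convention concrete, here is a sketch with one function of each kind. count_char and count_byte are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* String function: a lone const char* means "null-terminated". */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

/* Byte-array function: void* plus an explicit length; zero bytes are data. */
size_t count_byte(const void *data, size_t len, unsigned char b)
{
    const unsigned char *p = data;
    size_t n = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        if (p[i] == b)
            ++n;
    return n;
}
```

The compiler will not stop every mix-up, since void* accepts any object pointer, but the mandatory length argument makes it hard to call the byte version without thinking about the size.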

Related

C dynamic array of strings

I'm trying to write a simple implementation of dynamic arrays of strings in C. Here's my code (minus the includes and main function etc ...):
typedef char* string;
typedef struct {
string* list;
size_t size;
size_t used;
} list;
void initList(list* l, size_t initSize) {
l->list = malloc(initSize * sizeof(string));
l->used = 0;
l->size = initSize;
}
void insertList(list* l, string elem) {
if (l->used == l->size) {
l->size *= 2;
l->list = realloc(l->list, l->size * sizeof(string));
}
l->list[l->used++] = elem;
}
My code seems to work as I expect. I'm asking my question because I read that you should use char[] instead of char*.
I read that using typedef char* string declares the string in read-only memory, so trying to modify it causes undefined behaviour.
If so, using the GCC C compiler I don't receive any errors or warnings and the code seems to work when compiled.
The functions for creating and growing the dynamic array were taken from another question on StackOverflow; the original question created a dynamic array of integers.
I'm just curious as to whether my code is good or bad practice.

I read that using typedef char* string declares the string in read-only memory, so trying to modify it causes undefined behaviour.
Well, that's nonsense. You might confuse this with something like
char *foo = "foo";
In that case, although foo has the type of a mutable pointer to mutable characters, the characters it points to must NOT be modified. This is because modifying a "string literal" in C is undefined behaviour, and compilers usually place literals in non-mutable memory. It doesn't help that foo isn't const here. A good compiler will warn you, though -- you should only assign string literals to const char *.
If you want a mutable string, initialize a char[] from the literal; this way it's an array, not a pointer, and the compiler knows to place it in mutable memory. But this really only concerns literals.
So there's nothing wrong with using char * as your string type. In fact, that's what C does implicitly for string literals. I'd still have a little objection: Seasoned C-programmers will know about char * et al and they will expect that "plain" strings are char * (or const char *). If you name something string, it should somehow provide more than that. If it doesn't, just don't confuse people and go by char *.
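A short sketch of the literal-versus-array distinction. capitalize and demo are hypothetical names used only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Uppercases the first character of a writable string in place. */
void capitalize(char *s)
{
    assert(s != NULL);
    s[0] = (char)toupper((unsigned char)s[0]);
}

/* Returns 1 if the writable copy changed and the literal did not. */
int demo(void)
{
    const char *literal = "foo";   /* points at the literal: must not be written to */
    char writable[] = "foo";       /* array holding its own copy: safe to modify */

    /* capitalize((char *)literal) would be undefined behaviour. */
    capitalize(writable);
    return strcmp(writable, "Foo") == 0 && strcmp(literal, "foo") == 0;
}
```

Declaring the pointer as const char * lets the compiler reject an accidental capitalize(literal) call instead of leaving it to fail at run time.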

Declaring array of voids

I'm writing a program that copies files. I've used a buffer in order to store the information that the read() function provides and then give this data to the write() function. I've used this declaration:
static void buffer[BUFFER_SIZE];
The problem is that I get the error: declaration of ‘buffer’ as array of voids.
I don't understand why declaring an array of void is an error. How can I declare a block of memory without a specific type?

I don't understand why declaring an array of void is an error.
The technical reason you cannot declare an array-of-void is that void is an incomplete type that cannot be completed. The size of array elements must be known but the void type has no size. Note that you are similarly unable to declare a void object, as in
void foo;
Note that a pointer-to-void has a specific size, which is why
void *foo;
is valid, as well as sizeof(foo), but sizeof(*foo) would be invalid.
How can I declare a block of memory without a specific type?
For generic memory buffers, use an array of plain char or unsigned char (or even uint8_t). I usually prefer plain char when I need to pass pointers to the str* family of functions where array of unsigned char would lead to diagnostics about incompatible pointers.
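As a sketch of the copy loop with such a buffer: the question uses read() and write(), but I am assuming the stdio equivalents fread() and fwrite() here so the example does not depend on POSIX. copy_stream and the BUFFER_SIZE value are illustrative choices.

```c
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 4096

/* Copies everything from in to out; returns 0 on success, -1 on I/O error. */
int copy_stream(FILE *in, FILE *out)
{
    static unsigned char buffer[BUFFER_SIZE];  /* raw bytes, no specific type */
    size_t n;

    assert(in != NULL && out != NULL);
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

The buffer is passed to fread and fwrite as a void*, so the element type only matters when you inspect the bytes yourself; unsigned char avoids surprises with negative values there.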

Use an array of char or unsigned char instead.

When declaring an array, the compiler must know the size of its elements.
The void type has no size, so the compiler reports an error.

This is how you do it. It is not an array of void, as you may have wanted, but an array of void pointers; each cell can hold a pointer to ANYTHING.
If you wish to store a small integer value directly, you can cast it to (void*), although the result of that conversion is implementation-defined. Here is an example:
void *XPoint[5] = { 0 };
XPoint[0] = (void*)5; // int
XPoint[1] = "Olololo!"; // char*
XPoint[2] = (void*)'x'; // char
XPoint[3] = (void*)structA.nNum; // struct int

You also can declare void pointer:
void* buffer;
// just pointers for simple work with allocated piece of memory
char* chBuf;
long* intBuf;
Then use malloc() for this pointer and work with this piece of memory as with any array (char, short, long, float, double or your own type). Don't forget to use free() to avoid a memory leak in this case.
buffer = malloc (BUFFER_SIZE);
if (buffer==NULL)
exit (1);
// init our buffers
chBuf=(char*)buffer;
intBuf=(long*)buffer;

Using pointers with functions and structs

Hello, I'm self-studying C and I'm a bit confused about the following code, since I don't know if I'm understanding it properly. I would be very thankful if someone could read my explanation and correct me if I'm wrong.
The code is from a header file. What the program does is not important here, since my comprehension problem is about the pointers and the values the functions give back. So first of all I'm declaring 3 arrays of char and an integer in my employee struct.
struct employee
{
char firstname[11];
char lastname[11];
char number[11];
int salary;
};
Five functions are declared in the header file. The first function takes 4 values (3 pointers and one int) and gives back a pointer to a struct. The second function gets a pointer to the "struct employee" and gives back a pointer to an element of the array "char firstname" in the struct employee. Functions 3 and 4 do the same for the other two arrays. Function 5 gets a pointer to the struct employee but gives back an int and not a pointer. So it is just using the declared variable in the struct.
struct employee* createEmployee(char*, char*, char*, int); //1
char* firstname (struct employee*); //2
char* lastname (struct employee*); //3
char* number (struct employee*); //4
int salary (struct employee*); //5

Your understanding is pretty much correct. To be a little more accurate and/or less abusive of the English language, functions 2-4 return a pointer to an element of the corresponding array.
The idea is that the contents of each array represent some kind of text, with each element corresponding to a character of text, up until the first appearance of a zero value (used to mark the end of the "string"). Keep in mind that there is no provision for Unicode here, or even for specifying an encoding: we assume ASCII, and for any bytes not in the range 0..127, all bets are off. A better name for the type char would be byte; but we didn't really know any better back then. (You should also be aware that char is a separate type from both signed char and unsigned char, and that char may or may not be signed.)
This way, the names and number can be any length up to ten (an 11th byte is reserved in order to have room for the "null terminator" with the zero value), and the returned pointer can be used to inspect - and modify; but that might not be a good idea - the data in the arrays that are part of the Employee structure.
The int returned as the Employee's salary is a copy of the one in the struct. So although we can see the whole value, this does not let us modify the value in the struct. Of course, as long as we have the struct definition and an Employee instance, there is no protection; we can access the member directly instead of going through the function.
In C, we get "encapsulation" and "data hiding" by not providing these definitions. Instead, we would just put struct Employee; and the function declarations in the header, and the struct definition in the implementation file. Now calling code doesn't know anything about what an Employee is; only about what it can do.
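A single-file sketch of that split, with comments marking what would go in the header and what in the implementation file. The employee_ prefixed names, the reduced field list and the NULL-on-failure policy are my assumptions for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- what would go in the header ---- */
struct Employee;                                   /* incomplete type: layout hidden */
struct Employee *employee_create(const char *firstname, int salary);
const char *employee_firstname(const struct Employee *e);
int employee_salary(const struct Employee *e);
void employee_destroy(struct Employee *e);

/* ---- what would go in the implementation file ---- */
struct Employee {
    char firstname[11];
    int salary;
};

struct Employee *employee_create(const char *firstname, int salary)
{
    struct Employee *e;
    assert(firstname != NULL);
    if (strlen(firstname) >= sizeof e->firstname)
        return NULL;                               /* name does not fit */
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    strcpy(e->firstname, firstname);               /* length checked above */
    e->salary = salary;
    return e;
}

const char *employee_firstname(const struct Employee *e) { return e->firstname; }
int employee_salary(const struct Employee *e) { return e->salary; }
void employee_destroy(struct Employee *e) { free(e); }
```

Code that sees only the header part can hold and pass a struct Employee*, but it cannot write e->salary, because the compiler does not know the members. Returning const char* from the accessor also keeps callers from modifying the name through the pointer.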

Almost right, except that when you read the functions, the input "char*" should be read as a string being passed in, though technically it is a pointer. For integers the actual value is passed, so you are right there.
So,
struct employee* createEmployee(char*, char*, char*, int);
would mean that the procedure creates an employee from the four inputs passed (the first three are strings (char*), which are probably first name, last name and number/id in that order, and the last one is the salary).

What you cannot be sure of just by reading the function signature is whether the values/pointers returned refer to the same values/pointers in the struct, or whether the function(s) create a copy and return a pointer to the new value. This is why it's a good idea to also write the documentation as comments next to the function signature in the header file.

Your explanations make sense, except #2-4: since they return char*, they cannot return an element of a char array (the correct type for the latter would be simply char). Each of the three functions returns a pointer to char. One presumes that they return the pointer to the first element of the corresponding array, which is basically how arrays are passed around in C.

There is a problem in 1. What you're doing I assume is:
struct employee* createEmployee(const char* f, const char* l, const char* n,
const int sal)
{
struct employee* e = malloc(sizeof *e);
if (e == NULL) return NULL;
strcpy(e->firstname, f);
strcpy(e->lastname, l);
/* ... */
return e;
}
The problem here is the arrays coming in. It is perfectly feasible for a char array of any size to be passed in; anything over the length of 11 would cause you to start overwriting data at &(e->firstname[0])+11;. Over what? Exactly. You've no idea (and nor have I, it'd be determined at run-time). This could cause some serious problems.
One way around that is to use functions from string.h, e.g. strlen(), to test the length of the data being passed in to ensure it fits your field size.
A better method might be to write:
int createEmployee(struct employee* e, const char* f, const char* l, const char* n,
const int sal)
{
int error = 0;
if ( strlen(f) < 11 )
{
strcpy(e->firstname, f);
}
else
{
error++;
}
/* ... */
return error;
}
See what I've done? Yes it will work - anything passed in as a pointer can be modified. It's not pass-by-reference, quite. Edit: as aix says, pointers are "how arrays are passed around in C".
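Another potential method is strncpy(), which copies at most the number of characters given by the last argument. Note that it does not add a terminator when the source is that long or longer, so copy at most 10 characters and set the last byte yourself: strncpy(e->firstname, f, 10); e->firstname[10] = '\0'; would be safe.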
You might well try dynamic memory allocation for the field sizes based on requirement, too. I'm guessing you'll be learning that later/as another challenge.
Also, another suggestion whilst we're at it is to define the pointer to a struct using typedef. It makes things a little more readable although the way you've done it is definitely clearer for someone learning.
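Putting the length check in one place, a sketch might look like this. copy_field and fill_employee are hypothetical names, and leaving a field untouched when the input does not fit is my assumption about the desired behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct employee {
    char firstname[11];
    char lastname[11];
    char number[11];
    int salary;
};

/* Copies src into dst only if it fits, terminator included. Returns 0 or -1. */
static int copy_field(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;                 /* too long: dst is left unchanged */
    memcpy(dst, src, len + 1);     /* +1 copies the terminator as well */
    return 0;
}

/* Returns the number of fields that did not fit (0 means success). */
int fill_employee(struct employee *e, const char *f, const char *l,
                  const char *n, int sal)
{
    int error = 0;
    assert(e != NULL && f != NULL && l != NULL && n != NULL);
    if (copy_field(e->firstname, sizeof e->firstname, f) != 0) error++;
    if (copy_field(e->lastname, sizeof e->lastname, l) != 0) error++;
    if (copy_field(e->number, sizeof e->number, n) != 0) error++;
    e->salary = sal;
    return error;
}
```

Using sizeof on the member instead of a literal 11 means the check keeps working if the field sizes change later.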

C Language - [copy in place] float* to char* and the reverse

I want to copy a float buffer into a char (byte) buffer without allocating memory for two separate buffers. In other words, I want to use one buffer and copy in place.
The problem is that if I have a float buffer, then in order to copy it to a char buffer I will need a char* pointer. If I were copying from float* to float*, it would be easy, as I would just pass in the same pointer for the target and the source.
eg.
void CopyInPlace(float* tar, float* src, int len) {
....
}
CopyInPlace(someBuffer, someBuffer, 2048);
void PackFloatToChar(char* tar, float* src, int len) {
}
????????
How would I do this?
Does memcpy copy in place if passed the same pointer?

If you want to convert a float pointer to a char pointer, a cast is sufficient.
float* someBuffer;
...
char* someBuffer2 = (char*)someBuffer;

Your question seems a bit confused.
Do you want to simply interpret an array of floats as a char array (for something like writing to a file)? If so, simply cast. Any object pointer in C can be converted to a char* to access the underlying bytes.
memcpy will copy from one memory location to another. But keep careful track of whether your "len" parameter is the number of floats or number of bytes. If "len" is the count of floats in the array, multiply it by sizeof(float) in the memcpy call.
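A sketch of both options, with the count-versus-bytes conversion spelled out. pack_floats and float_bytes are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies count floats into a byte buffer; returns the number of bytes written. */
size_t pack_floats(unsigned char *dst, const float *src, size_t count)
{
    size_t bytes = count * sizeof(float);   /* count is in floats, memcpy wants bytes */
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, bytes);                /* dst and src must not overlap */
    return bytes;
}

/* Views the same memory as bytes: no copy and no second buffer. */
unsigned char *float_bytes(float *buf)
{
    return (unsigned char *)buf;
}
```

On the "in place" part of the question: memcpy requires that source and destination do not overlap, so for overlapping regions memmove is the function to reach for. If the goal is only to read the floats as bytes, the cast in float_bytes already does that without copying anything.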

As an alternative to the casting that's already been recommended, you might want to consider using a union, something like:
union x {
float float_val;
char bytes[sizeof(float)];
};
There isn't likely to be a huge difference either way, but you may find one more convenient or readable than the other.

scanf("%d", char*) - char-as-int format string?

What is the format string modifier for char-as-number?
I want to read in a number never exceeding 255 (actually much less) into an unsigned char type variable using sscanf.
Using the typical
char source[] = "x32";
char separator;
unsigned char dest;
int len;
len = sscanf(source,"%c%d",&separator,&dest);
// validate and proceed...
I'm getting the expected warning: argument 4 of sscanf is type char*, int* expected.
As I understand the specs, there is no modifier for char (like %hd for short, or %lld for 64-bit long long)
is it dangerous? (will overflow just overflow (roll-over) the variable or will it write outside the allocated space?)
is there a prettier way to achieve that than allocating a temporary int variable?
...or would you suggest an entirely different approach altogether?

You can use %hhd (or %hhu for an unsigned char), which is part of C99 and supported by glibc's scanf(). MSVC does not appear to support integer storage into a char directly (see MSDN scanf Width Specification for more information on the supported conversions).
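A sketch of the question's parse using the hh length modifier, assuming a C99-conforming library. parse_small is a hypothetical wrapper name.

```c
#include <assert.h>
#include <stdio.h>

/* Parses "<separator><number>" such as "x32"; returns 0 on success, -1 otherwise. */
int parse_small(const char *source, char *separator, unsigned char *dest)
{
    assert(source != NULL && separator != NULL && dest != NULL);
    /* %hhu tells sscanf the target is an unsigned char, so only one byte is written. */
    return sscanf(source, "%c%hhu", separator, dest) == 2 ? 0 : -1;
}
```

This removes the out-of-bounds write, but it does not validate the range: if the input number does not fit in an unsigned char, the C standard leaves the result undefined, so reading into an int and checking it is still the safer route when the input is untrusted.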

It is dangerous to use that. Passing an unsigned char* where sscanf expects an int* means it writes a whole int through the pointer, so it is going to use the bytes (typically 3) next to the variable on the stack and corrupt their values.
The issue with %hhd is that it was only added in C99, so it is not available in every C library.
Where sscanf does not support storing numbers into a char, I suggest you use an int instead. If you want the value to roll over like a char, you can just cast the int to unsigned char afterward, like:
int dest;
int len;
len = sscanf(source,"%c%d",&separator,&dest);
dest = (unsigned char)dest;

I want to read in a number never exceeding 255 (actually much less) into an unsigned char type variable using sscanf.
In most contexts you save little to nothing by using char for an integer.
It generally depends on architecture and compiler, but most modern CPUs are not very good at handling data types whose size differs from the register size. (A notable exception is the 32-bit int on 64-bit archs.)
Add to that the penalties for memory access that is not aligned to the CPU word (do not ask me why CPUs do that), and the use of char should be limited to cases where one really needs a char or memory consumption is a concern.

It is not possible to do.
sscanf will never write single byte when reading an integer.
If you pass a pointer to a single allocated byte as a pointer to int, you will write out of bounds. This may be okay due to default alignment, but you should not rely on that.
Create a temporary. This way you will also be able to run-time check it.

Probably the easiest way would be to load the number into a temporary integer and store it only if it's within the required boundaries. (And you would probably need something like unsigned char result = tempInt & 0xFF; )

It's dangerous. sscanf will write an integer-sized value over the character-sized variable. In your example (most probably) nothing will happen unless you reorganize your stack variables (sscanf will partially overwrite len when trying to write an integer-sized "dest", but then it returns and len is overwritten with the correct value).
Instead, do the "correct thing" rather than relying on compiler mood:
char source[] = "x32";
char separator;
unsigned char dest;
int temp;
int len;
len = sscanf(source,"%c%d",&separator,&temp);
// validate and proceed...
if (temp>=YOUR_MIN_VALUE && temp<=YOUR_MAX_VALUE) {
dest = (unsigned char)temp;
} else {
// validation failed
}
