I have a thread which parses incomming characters/bytes one by one.
I would like to store the sequence of bytes in a byte pointer, and in the end when the sequence of "\r\n" is found it should print the full message out.
unsigned char byte;
unsigned char *bytes = NULL;
while (true){ // thread which is running on the side
byte = get(); // gets 1 byte from I/O
bytes = byte; //
*bytes++;
if (byte == 'x'){ // for now instead of "\r\n" i use the char 'x'
printf( "Your message: %s", bytes);
bytes = NULL; // or {0}?
}
}
You should define bytes as array with size of max message length not a pointer.
unsigned char byte, i;
unsigned char arr[10]; // 10 for example
i=0;
while (true){
byte = get();
arr[i] = byte;
i++;
if (byte == 'x'){
printf( "Your message: %s", arr);
}
}
When you define bytes as a pointer, it points to nothing and writing to it may erase other data in your program, you can make it array or allocate space for it in run time using malloc
Your Code
unsigned char byte;
unsigned char *bytes = NULL;
while (true){
Nothing wrong here, but some things must be cleared:
Did you alloc memory for your bytes buffer? That is, using malloc() family functions?
If so, did you check malloc() return and made sure the pointer is ok?
Did you include stdbool.h to use true and false?
Moving on...
byte = get();
bytes = byte;
*bytes++;
I'm assuming get() returns an unsigned char, since you didn't give the code.
Problem: bytes = byte. You're assigning an unsigned char to an unsigned char *. That's bad because unsigned char * is expecting a memory address (aka pointer) and you're giving it a character (which translates into a really bad memory address, cause you're giving addresses up to 255, which your program isn't allowed to access), and your compiler certainly complained about that assignment...
*byte++ has two "problems" (not being really problems): one, you don't need the * (dereferencing) operator to just increment the pointer reference, you could've done byte++; two, it was shorter and easier to understand if you switched this line and the previous one (bytes = byte) to *bytes++ = byte. If you don't know what this statement does, I suggest reading up on operator precedence and assignment operators.
Then we have...
if (byte == 'x'){
printf( "Your message: %s", bytes);
bytes = NULL;
}
if's alright.
printf() is messed up because you've been incrementing your bytes pointer the whole time while you were get()ting those characters. This means that the current location pointed by bytes is the end of your string (or message). To correct this, you can do one of two things: one, have a counter on the number of bytes read and then use that to decrement the bytes pointer and get the correct address; or two, use a secondary auxiliary pointer (which I prefer, cause it's easier to understand).
bytes = NULL. If you did malloc() for your bytes buffer, here you're destroying that reference, because you're making an assignment that effectively changes the address to which the pointer points to to NULL. Anyway, what you need to clear that buffer is memset(). Read more about it in the manual.
Another subtle (but serious) problem is the end of string character, which you forgot to put in that string altogether. Without it, printf() will start printing really weired things past your message until a Segmentation Fault or the like happens. To do that, you can use your already incremented bytes pointer and do *bytes = 0x0 or *bytes = '\0'. The NULL terminating byte is used in a string so that functions know where the string ends. Without it, it would be really hard to manipulate strings.
Code
unsigned char byte;
unsigned char *bytes = NULL;
unsigned char *bytes_aux;
bytes = malloc(500);
if (!bytes) return;
bytes_aux = bytes;
while (true) { /* could use while(1)... */
byte = get();
*bytes++ = byte;
if (byte == 'x') {
*(bytes - 1) = 0x0;
bytes = bytes_aux;
printf("Your message: %s\n", bytes);
memset(bytes, 0, 500);
}
}
if ((*bytes++ = get()) == 'x') is a compound version of the three byte = get(); *bytes++ = byte; if (byte == 'x'). Refer to that assignment link I told you about! This is a neat way of writing it and will make you look super cool at parties!
*(bytes - 1) = 0x0; The -1 bit is to exclude the x character which was saved in the string. With one step we exclude the x and set the NULL terminating byte.
bytes = bytes_aux; This restores bytes default state - now it correctly points to the beginning of the message.
memset(bytes, 0, 500) The function I told you about to reset your string.
Using memset is not necessary in this particular case. Every loop repetition we're saving characters from the beginning of the bytes buffer forward. Then, we set a NULL terminating byte and restore it's original position, effectively overwriting all other data. The NULL byte will take care of preventing printf() from printing whatever lies after the end of the current message. So the memset() part can be skipped and precious CPU time saved!
Somewhere when you get out of that loop (if you do), remember to free() the bytes pointer! You don't want that memory leaking...
Related
I'm required to save data, which happens to be a char array of 4096 characters tops, in a specified memory address.
So here i get the memory address from a uint32_t into the void* ptrAddress
uint32_t number = 1268;
void* ptrAddress = &number;
and here i try to copy the array of As, which works! but not really, because i get some garbage in the middle
char tryArray[4096];
for(int i = 0; i < 4095; i++ ){
tryArray[i] = 'A';
}
//EDIT: added the null terminator (forgot to do that)
tryArray[4095] = '\0';
char* copy = strcpy(ptrAddress, (char*)tryArray);
printf("lets see: %s\n", copy);
output (imagine 4096 As and 'garbage' as some garbage character):
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'garbage''garbage''garbage'AAAAAAAAAAAAAAAAA
And after that i get a seg fault
What am i doing wrong? If there's anything more you need to know from the code or my intentions with it, please tell me!
You have a couple of problems.
First, this code:
uint32_t number = 1268;
void* ptrAddress = &number;
Doesn't make a pointer to memory address 1268, like you seem to indicate you want it to. It makes a pointer to the integer where that 1268 is stored. You then overrun that storage by a lot, causing undefined behaviour, so after that it's game over.
If you want a pointer to a specific memory address:
void *ptrAddress = (void *)0x1268;
Make sure that address is legit in your environment/address space, though!
Second, strcpy works on null-terminated strings. You should probably use memcpy if you plan to work with a 4096 byte (non-null terminated) buffer. Note that means you can't print using printf (at least the way you're trying).
printf("%-4096s");
Should do it though.
I am working on a short program that reads a .txt file. Intially, I was playing around in main function, and I had gotten to my code to work just fine. Later, I decided to abstract it to a function. Now, I cannot seem to get my code to work, and I have been hung up on this problem for quite some time.
I think my biggest issue is that I don't really understand what is going on at a memory/hardware level. I understand that a pointer simply holds a memory address, and a pointer to a pointer simply holds a memory address to an another memory address, a short breadcrumb trail to what we really want.
Yet, now that I am introducing malloc() to expand the amount of memory allocated, I seem to lose sight of whats going on. In fact, I am not really sure how to think of memory at all anymore.
So, a char takes up a single byte, correct?
If I understand correctly, then by a char* takes up a single byte of memory?
If we were to have a:
char* str = "hello"
Would it be say safe to assume that it takes up 6 bytes of memory (including the null character)?
And, if we wanted to allocate memory for some "size" unknown at compile time, then we would need to dynamically allocate memory.
int size = determine_size();
char* str = NULL;
str = (char*)malloc(size * sizeof(char));
Is this syntactically correct so far?
Now, if you would judge my interpretation. We are telling the compiler that we need "size" number of contiguous memory reserved for chars. If size was equal to 10, then str* would point to the first address of 10 memory addresses, correct?
Now, if we could go one step further.
int size = determine_size();
char* str = NULL;
file_read("filename.txt", size, &str);
This is where my feet start to leave the ground. My interpretation is that file_read() looks something like this:
int file_read(char* filename, int size, char** buffer) {
// Set up FILE stream
// Allocate memory to buffer
buffer = malloc(size * sizeof(char));
// Add characters to buffer
int i = 0;
char c;
while((c=fgetc(file))!=EOF){
*(buffer + i) = (char)c;
i++;
}
Adding the characters to the buffer and allocating the memory is what is I cannot seem to wrap my head around.
If **buffer is pointing to *str which is equal to null, then how do I allocate memory to *str and add characters to it?
I understand that this is lengthy, but I appreciate the time you all are taking to read this! Let me know if I can clarify anything.
EDIT:
Whoa, my code is working now, thanks so much!
Although, I don't know why this works:
*((*buffer) + i) = (char)c;
So, a char takes up a single byte, correct?
Yes.
If I understand correctly, by default a char* takes up a single byte of memory.
Your wording is somewhat ambiguous. A char takes up a single byte of memory. A char * can point to one char, i.e. one byte of memory, or a char array, i.e. multiple bytes of memory.
The pointer itself takes up more than a single byte. The exact value is implementation-defined, usually 4 bytes (32bit) or 8 bytes (64bit). You can check the exact value with printf( "%zd\n", sizeof char * ).
If we were to have a char* str = "hello", would it be say safe to assume that it takes up 6 bytes of memory (including the null character)?
Yes.
And, if we wanted to allocate memory for some "size" unknown at compile time, then we would need to dynamically allocate memory.
int size = determine_size();
char* str = NULL;
str = (char*)malloc(size * sizeof(char));
Is this syntactically correct so far?
Do not cast the result of malloc. And sizeof char is by definition always 1.
If size was equal to 10, then str* would point to the first address of 10 memory addresses, correct?
Yes. Well, almost. str* makes no sense, and it's 10 chars, not 10 memory addresses. But str would point to the first of the 10 chars, yes.
Now, if we could go one step further.
int size = determine_size();
char* str = NULL;
file_read("filename.txt", size, &str);
This is where my feet start to leave the ground. My interpretation is that file_read() looks something like this:
int file_read(char* filename, int size, char** buffer) {
// Set up FILE stream
// Allocate memory to buffer
buffer = malloc(size * sizeof(char));
No. You would write *buffer = malloc( size );. The idea is that the memory you are allocating inside the function can be addressed by the caller of the function. So the pointer provided by the caller -- str, which is NULL at the point of the call -- needs to be changed. That is why the caller passes the address of str, so you can write the pointer returned by malloc() to that address. After your function returns, the caller's str will no longer be NULL, but contain the address returned by malloc().
buffer is the address of str, passed to the function by value. Allocating to buffer would only change that (local) pointer value.
Allocating to *buffer, on the other hand, is the same as allocating to str. The caller will "see" the change to str after your file_read() returns.
Although, I don't know why this works: *((*buffer) + i) = (char)c;
buffer is the address of str.
*buffer is, basically, the same as str -- a pointer to char (array).
(*buffer) + i) is pointer arithmetic -- the pointer *buffer plus i means a pointer to the ith element of the array.
*((*buffer) + i) is dereferencing that pointer to the ith element -- a single char.
to which you are then assigning (char)c.
A simpler expression doing the same thing would be:
(*buffer)[i] = (char)c;
with char **buffer, buffer stands for the pointer to the pointer to the char, *buffer accesses the pointer to a char, and **buffer accesses the char value itself.
To pass back a pointer to a new array of chars, write *buffer = malloc(size).
To write values into the char array, write *((*buffer) + i) = c, or (probably simpler) (*buffer)[i] = c
See the following snippet demonstrating what's going on:
void generate0to9(char** buffer) {
*buffer = malloc(11); // *buffer dereferences the pointer to the pointer buffer one time, i.e. it writes a (new) pointer value into the address passed in by `buffer`
for (int i=0;i<=9;i++) {
//*((*buffer)+i) = '0' + i;
(*buffer)[i] = '0' + i;
}
(*buffer)[10]='\0';
}
int main(void) {
char *b = NULL;
generate0to9(&b); // pass a pointer to the pointer b, such that the pointer`s value can be changed in the function
printf("b: %s\n", b);
free(b);
return 0;
}
Output:
0123456789
Fairly simple question regarding malloc. What is the max that I can set within the allocated area. For instance:
char *buffer;
buffer = malloc(20);
buffer[19] = 'a'; //Is this the highest spot I can set?
buffer[20] = 'a'; //Or is this the highest spot I can set?
free(buffer);
The phrasing of your question is a bit off. You mean "what is the maximum index I can use for an allocated block of memory". The answer is the same as for arrays.
If you are reading or writing the memory, you may safely use indices between (and including) 0 and one less than the size of the block (in your case, that means index 19). All up, that means you can access the 20 values that you asked for.
If you are simply obtaining the pointer for comparison with other pointers inside the same block (and you are not going to read or write to it), you may additionally obtain the pointer one-past-the-end (in your case that means index 20).
To clarify these things with examples:
Yes, buffer[19] = 'a'; is the very last value you may access in a read or write capacity. Don't forget that if you want to store a string in this memory, and hand it to functions that expect a null-terminated string, this slot is your last chance to put that value of '\0'.
You are allowed to access buffer[20] in the following manner:
char *p;
for( p = &buffer[0]; p != &buffer[20]; ++p )
{
putc( *p, stdout );
}
This is useful because of the way we tend to iterate over memory and store sizes. It would make our code quite less readable if we had to subtract 1 all over the place.
Oh, and it gives you the neat trick:
size_t buf_size = 20;
char *buffer = malloc(buf_size);
char *start = buffer;
char *end = buffer + buf_size;
size_t oops_i_forgot_the_size = end - start;
malloc(x) will allocate x bytes.
So by accessing buffer[0] you access the first byte, by accessing buffer[1] you access the second.
e.g
char * buffer = (char *) malloc(1);
buffer[0] = 0; // legal
buffer[1] = 0; // illegal
I'm trying to do some work with inotify and better understand C in general. I'm very novice. I was looking over the inotify man page and I saw an example of using inotify. I have some questions around how exactly they use buffers. The code is here:
http://man7.org/linux/man-pages/man7/inotify.7.html
The block I'm most interested is:
char buf[4096]
__attribute__ ((aligned(__alignof__(struct inotify_event))));
const struct inotify_event *event;
int i;
ssize_t len;
char *ptr;
/* Loop while events can be read from inotify file descriptor. */
for (;;) {
/* Read some events. */
len = read(fd, buf, sizeof buf);
if (len == -1 && errno != EAGAIN) {
perror("read");
exit(EXIT_FAILURE);
}
/* If the nonblocking read() found no events to read, then
it returns -1 with errno set to EAGAIN. In that case,
we exit the loop. */
if (len <= 0)
break;
/* Loop over all events in the buffer */
for (ptr = buf; ptr < buf + len;
ptr += sizeof(struct inotify_event) + event->len) {
event = (const struct inotify_event *) ptr;
What I'm trying to understand is is how exactly are the processing the bits in this buffer. This is what I know:
We define a char buf of 4096, which means we have a buffer just about 4kbs of size. When call read(fd, buf, sizeof buf) and len will be anywhere from 0 - 4096 (partial reads can occur).
We do some async checking, that's obvious.
Now we get to the for loop, here is where I'm a little confused. We set ptr equal to buf and then compare ptr's size to buff + len.
At this point does ptr equal the value '4096' ? And if so we are saying; is ptr:4096 < buf:4096 + len:[0-4096]. I'm using a colon here to signify what I think the variable's value is and [] meaning a range.
We then as the iterator expression, increase ptr+= the size of an inotify event.
I'm used to higher level OOP languages, in which I'd declare a buffer of 'inotify_event' objects. However I'm assuming since we are just getting back a byte array from 'read' we need to pull off the bites at the 'inotify_event' boundary and type cast those bits into an event object. Does this sounds correct?
Also I'm not exactly sure how comparison works with a buf[4096] values. We don't have concept of checking an array's current size (allocated indexes) so I'm assuming when used in comparison, we are comparing the size of it's allocated memory space '4096' in this case?
Thanks for the help, this is my first time really working with processing bits off a buffer. Trying to wrap my head around all this. Any further reading would be helpful! I've been finding a good amount of reading on C as a language, a good amount of reading on linux systems programming, but I can't seem to find topics such as 'working with buffers' or the grey area between the two.
When you do the assignment ptr = buf in C, you are assigning the address of the first element of buf to ptr. Thus, the comparison is checking whether ptr has gone beyond the end of the buffer.
The loop is jumping by the number of bytes needed to skip over one struct inotify_event, which is defined here, and the length of the name of the event.
ptr = buf
You are assigning the address of the first element of buf (i.e &buf[0]) to the pointer ptr. So you are starting looping through the buf using a pointer starting from the first element.
ptr < buf + len;
This is checking that your ptr pointer is "moving" through the array until the end of buf. It is made using pointer arithmetic. So the loop compare addresses of ptr pointed address with the address of buf + the len of buffer returned by read function.
ptr += sizeof(struct inotify_event) + event->len
Lastly the pointer is moved forward of size of the event struct struct inotify_event plus the event len, that I guess is variable based on the event type.
I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following the best practice regrading malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I have can't "pass by reference" like a C programmer might for other functions, I have to return the new pointer.
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high level languages, (perl, plpgsql, bash) so my instinct is proper encapsulation of such things, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
Compiles and runs with two warnings on unused argc and argv arguments, you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *net_uri_to_text(char *);
int main(int argc, char ** argv) {
char * chr_input = "testing123%5a%5b%5cabc";
char * chr_output = net_uri_to_text(chr_input);
printf("%s\n", chr_output);
free(chr_output);
return 0;
}
//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
//define variables
int int_length = strlen(chr_input);
int int_new_length = int_length;
char * chr_output = malloc(int_length);
char * chr_output_working = chr_output;
char * chr_input_working = chr_input;
int int_output_working = 0;
unsigned int uint_hex_working;
//while not a null byte
while(*chr_input_working != '\0') {
//if %
if (*chr_input_working == *"%") {
//then put correct char in
sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
*chr_output_working = (char)uint_hex_working;
//printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
//realloc
chr_input_working++;
chr_input_working++;
int_new_length -= 2;
chr_output = realloc(chr_output, int_new_length);
//output working must be the new pointer plys how many chars we've done
chr_output_working = chr_output + int_output_working;
} else {
//put char in
*chr_output_working = *chr_input_working;
}
//increment pointers and number of chars in output working
chr_input_working++;
chr_output_working++;
int_output_working++;
}
//last null byte
*chr_output_working = '\0';
return chr_output;
}
It's perfectly ok to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though no function in the standard library does.
If you can compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer cheaply, you can offer a function that does that and let the user call it.
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
/*
* Decodes uri-encoded string encoded into buf of length len (including NUL).
* Returns the number of characters written. If that number is less than len,
* nothing is written and you should try again with a larger buffer.
*/
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
size_t space_needed = 0;
while (decoding_needs_to_be_done()) {
// decode characters, but only write them to buf
// if it wouldn't overflow;
// increment space_needed regardless
}
return space_needed;
}
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);
len = net_uri_to_text(input, result, len);
if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
// try again
result = xrealloc(input, result, len);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return mallocated obejcts and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can ensure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
char *unescape(const char *s)
{
size_t l = strlen(s);
char *p = malloc(l + 1), *r = p;
while (*s) {
if (*s == '%') {
char buf[3] = { s[1], s[2], 0 };
*p++ = strtol(buf, NULL, 16); // yes, I prefer this over scanf()
s += 3;
} else {
*p++ = *s++;
}
}
*p = 0;
return r;
}
And call it as follows:
int main()
{
const char *in = "testing123%5a%5b%5cabc";
char *out = unescape(in);
printf("%s\n", out);
free(out);
return 0;
}
It's perfectly OK to return newly-malloc-ed (and possibly internally realloced) values from functions, you just need to document that you are doing so (as you do here).
Other obvious items:
Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavy-weight for converting the two hex bytes. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? you can of course decide that this is the caller's problem but in general you might want to handle that). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function to calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc, and will keep the complexity the same. A possible function to find the size of the new string is:
int getNewSize (char *string) {
char *i = string;
int size = 0, percent = 0;
for (i, size; *i != '\0'; i++, size++) {
if (*i == '%')
percent++;
}
return size - percent * 2;
}
However, as mentioned in other answers there is no problem in returning a malloc'ed buffer as long as you document it!
Additionally what was already mentioned in the other postings, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, you may not reallocate it.
I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any possible string it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.
Another trick is to store both the length and capacity of the string at its front, so that (int)*mystring == length of string and (int)*(mystring + 4) == capacity of string. Thus, the string itself only starts at the 8th position *(mystring+8). By doing this you can pass around a single pointer to a string and always know how long it is and how much memory capacity the string has. You can make macros that automatically generate these offsets and make "pretty code".
The value of using buffers this way is you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.