I am a new programmer and have just started working with C. I am trying to decode the ID3 tag of an MP3 file and have run into a variety of problems. While using fread() and strncpy(), I noticed that both seem to need the \n character as the end reference point. (Maybe I am wrong; this is only an observation.)
When I print the output, I get a non-readable character. As a workaround, I use fread() for 4 bytes instead of 3 in order to produce (8)\n characters (whole Byte), and in a second step I use strncpy() with 3 bytes into allocated memory, which I then use for printing. In theory, I should not encounter this problem when using fread().
A sample of code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strncpy */
typedef struct{
    unsigned char header_id[3]; /* Unsigned character 3 Bytes (24 bits) */
}mp3_Header;
int main (int argc, char *argv[]) {
    mp3_Header first;
    unsigned char memory[4];
    FILE *file = fopen( "name.mp3" , "rb" );
    if ( (size_t) fread( (void *) memory , (size_t) 4 , (size_t) 1 , (FILE *) file) !=1 ) {
        printf("Could not read the file\n");
        exit (0);
    } /* End of if condition */
    strncpy( (char *) first.header_id , (char *) memory , (size_t) 3);
    printf ("This is the header_ID: %s\n", first.header_id);
    fclose(file);
    return 0;
} /* End of main */
Your observation with '\n' terminating strings isn't correct. Strings, in C, need to be terminated by a 0 byte (\0). However, some functions like fgets(), which are supposed to read lines from a file, take the \n at the end of the line as a terminator.
The problem with your code is that fread() reads binary data and doesn't try to interpret that data as a string, which means it won't put a \0 at the end. But string functions, like strcpy, need this 0 byte to recognize the end of the string. strncpy stops after copying the \0 as well, but to prevent a buffer overflow it never writes more bytes into the receiving string than the length argument allows. So it will copy your 3 bytes, but it won't append a \0; it only adds terminating \0 bytes when the source string is shorter than the length argument.
So what you should do is declare header_id with one MORE element than you actually need, and after the strncpy, set this extra element to \0. Like this:
strncpy( first.header_id , memory , 3);
first.header_id[3] = '\0';
Remember the 3 header bytes will go to array elements 0..2, so element 3 needs the terminator. Of course, you need to declare header_id[4] to have space for the extra \0.
Also note I omitted the type casts - you don't need them if your types are correct anyway (declare the buffers as char rather than unsigned char if you pass them to string functions). Passing an array to a function will pass a pointer to the 1st element anyway, so there's no need to cast the array header_id to a pointer in strncpy( (char *) first.header_id , (char *) memory , (size_t) 3);.
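To see the strncpy behaviour described above for yourself, here is a minimal sketch. The helper name strncpy_terminates is made up for this illustration; it copies into a scratch buffer pre-filled with 'X' and reports whether strncpy wrote a \0 within the first n bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the question): copies up to n bytes of src
   with strncpy into a scratch buffer and reports whether a '\0' ended up
   inside the first n bytes of the destination. */
bool strncpy_terminates(const char *src, size_t n)
{
    char dst[16];

    assert(src != NULL && n <= sizeof dst);
    memset(dst, 'X', sizeof dst);          /* make untouched bytes visible */
    strncpy(dst, src, n);                  /* writes exactly n bytes */
    return memchr(dst, '\0', n) != NULL;   /* was a terminator written? */
}
```

With a 3-character source and n equal to 3 there is no room left for the terminator, so the result is false; with a shorter source, or with n equal to 4, strncpy does write a \0.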
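There are 2 correct ways of handling the header. I'm assuming the part you want to read is the ID3v1 marker "TAG" or "TAG+", so it has 4 bytes. (In a real file those markers sit near the end, while an ID3v2 tag at the start of the file begins with "ID3"; the string handling is the same either way.)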
a) You think of char *memory being a C "string", and first.header_id as well. Then do it this way (omitted everything else to show the important parts):
typedef struct{
    char header_id[5];
} mp3_Header;
char memory[5];
fread(memory, 4, 1, file);           /* reads 4 bytes, no terminator */
memory[4]='\0';                      /* terminate the source */
strncpy(first.header_id, memory, 5); /* copies 4 characters plus the \0 */
After the fread, your memory looks like this:
  0    1    2    3    4
+----+----+----+----+----+
| T  | A  | G  | +  | ?  |
+----+----+----+----+----+
The 5th byte, at index 4, is not defined, because you read only 4 bytes. If you use a string function on this string (for example printf("%s\n", memory);), the function doesn't know where to stop, because there is no terminating \0, and printf will continue to output garbage until the next \0 it finds somewhere in memory. That's why you do memory[4]='\0' next, so it looks like this:
  0    1    2    3    4
+----+----+----+----+----+
| T  | A  | G  | +  | \0 |
+----+----+----+----+----+
Now, you can use strncpy to copy these 5 bytes to first.header_id. Note you need to copy 5 bytes, not just 4, you want the \0 copied as well.
(In this case, you could use strcpy (without n) as well - it stops at the first \0 it encounters. But these days, to prevent buffer overflows, people seem to agree on not using strcpy at all; instead, always use strncpy and explicitly state the length of the receiving string).
b) You treat memory as binary data, copy the binary data to the header, and then turn the binary data into a string:
typedef struct{
unsigned char header_id[5];
} mp3_Header;
char memory[4];
fread(memory, 4, 1, file);
memcpy(first.header_id, memory, 4);
first.header_id[4]='\0';
In this case, there is never a \0 at the end of memory, so a 4-byte array is sufficient. Because you are copying binary data, you don't use strcpy; you use memcpy instead. This copies just the 4 bytes. But now first.header_id has no end marker, so you have to assign it explicitly. Try drawing pictures like I did above if it isn't 100% clear to you.
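Variant b) can be wrapped in a small function. This is only a sketch: the name read_tag_id and the return convention (0 on success, -1 on a short read) are my own choices, not anything the ID3 format prescribes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads a 4-byte identifier from fp into out and
   terminates it. Returns 0 on success, -1 if fewer than 4 bytes were read. */
int read_tag_id(FILE *fp, char out[5])
{
    unsigned char raw[4];                   /* binary data, no terminator */

    assert(fp != NULL && out != NULL);
    if (fread(raw, 1, sizeof raw, fp) != sizeof raw)
        return -1;
    memcpy(out, raw, sizeof raw);           /* copy exactly 4 bytes */
    out[4] = '\0';                          /* now it is a C string */
    return 0;
}
```

The caller passes a 5-element array and can then hand it to printf("%s") or strcmp safely.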
But always remember: if you use operators like '+', you do NOT work upon the string. You work on the single characters. The only way, in C, to work on a string as a whole, is using the str* functions.
Yes, C strings always end in the null (0x00) character. It's the programmer's responsibility to understand that, and code appropriately.
For example, if your header_id will be up to a 3-printable-character string, you need to allocate 4 characters in that array to allow for the trailing null. (And you need to make sure that null is actually present.) Otherwise, printf won't know when to stop, and will keep printing until it finds a 0 byte.
When you copy binary data between buffers, you should use the appropriate function for the job, like memcpy(). Because you are dealing with binary data, you must know the exact length of the buffer, as there are no null characters to indicate the end of the data.
To make it a string, simply allocate a buffer of length+1 and set the last byte to '\0', and voila, you have a string. However, it is possible that there is already a null character in the binary data you copied, so you should do some sanity checks before trusting it to really be the string you wanted. Something like \001 might be an invalid id for the mp3 format, but it might also be a broken file; you never know what you are dealing with.
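One possible sanity check along those lines, as a sketch: reject any identifier that contains a non-printable byte, which covers both embedded null bytes and values like \001. The function name is_printable_id is invented here, and "printable" (per isprint in the current locale) is just one reasonable rule, not something taken from the mp3 specification.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true only if every one of the len bytes is printable. */
bool is_printable_id(const unsigned char *data, size_t len)
{
    assert(data != NULL);
    for (size_t i = 0; i < len; i++) {
        if (!isprint(data[i]))   /* rejects '\0', '\001' and other control bytes */
            return false;
    }
    return true;
}
```

Run it on the raw bytes, with their known length, before you append the '\0' and treat them as a string.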
Related
Relatively new C programmer here. I am reviewing the following code for a tutorial for a side project I am working on to practice C. The point of the abuf struct is to create a string that can be appended to. Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct abuf {
char* str;
unsigned int size;
} abuf;
void abAppend(abuf *ab, const char *s, int len) {
char *new = realloc(ab->str, ab->size + len);
if (new == NULL) return;
memcpy(&new[ab->size], s, len);
ab->str = new;
ab->size += len;
}
int main(void) {
abuf ab = {
NULL,
0
};
char *s = "Hello";
abAppend(&ab, s, 5);
abAppend(&ab, ", world", 7);
return 0;
}
Everything compiles and my tests (redacted for simplicity) show that the string "Hello" is stored in ab's str pointer, and then "Hello, world" after the second call to abAppend. However, something about this code confuses me. On the initial call to abAppend, the str pointer is null, so realloc, according to its man page, should behave like malloc and allocate 5 bytes of space to store the string. But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string, if I understand this correctly. Isn't this null byte lost if we store "Hello\0" in a malloc-ed container large enough only to store "Hello"?
On the second call to abAppend, we concatenate ", world" to str. The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for. And yet, everything works, and if I test for the null byte with a loop like for (int i = 0; ab.str[i] != '\0'; i++), the loop works fine and increments i 12 times (0 thru 11), and stops, meaning it encountered the null byte on the 13th iteration. What I don't get is why does it encounter the null byte, if we don't allocate space for it?
I tried to break this code by doing weird combinations of strings, to no avail. I also tried to allocate an extra byte in each call to abAppend and changed the function a little to account for the extra space, and it performed the exact same as this version. How the null byte gets processed is eluding me.
How does realloc treat null bytes in strings?
The behavior of realloc is not affected by the contents of the memory it manages.
But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string,…
The characters are copied with memcpy(&new[ab->size], s, len);, where len is 5. memcpy copies characters without regard to whether there is a terminating null byte. Given length of 5, it copies 5 bytes. It does not append a terminating null character to those.
The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for.
On the second call to abAppend, 7 more bytes are copied with memcpy, after the first 5 bytes. memcpy is given a length of 7 and copies only 7 bytes.
… it encountered the null byte on the 13th iteration.
When you tested ab.str[12], you exceeded the rules for which the C standard defines the behavior. ab.str[12] is outside the allocated memory. It is possible it contained a null byte solely because nothing else in your process had used that memory for another purpose, and that is why your loop stopped. If you attempted this in the middle of a larger program that had done previous work, that byte might have contained a different value, and your test might have gone awry in a variety of ways.
You're correct that you only initially allocated space for the characters in the string "Hello" but not the terminating null byte, and that the second call only added enough bytes for the characters in the string ", world" with no terminating null byte.
So what you have is an array of characters but not a string since it's not null terminated. If you then attempt to read past the allocated bytes, you trigger undefined behavior, and one of the ways UB can manifest itself is that things appear to work properly.
So you got "lucky" that things happened to work as if you allocated space for the null byte and set it.
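If you do want ab.str to be usable as a C string, one way (a sketch, not the tutorial's code) is to allocate one extra byte on every append and write the terminator yourself. The name abAppendTerminated, the size_t fields, and the int return value are my own changes for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct abuf {
    char *str;
    size_t size;    /* number of characters, not counting the '\0' */
} abuf;

/* Returns 0 on success, -1 if realloc fails (ab is left unchanged). */
int abAppendTerminated(abuf *ab, const char *s, size_t len)
{
    assert(ab != NULL && s != NULL);
    char *grown = realloc(ab->str, ab->size + len + 1);  /* +1 for '\0' */
    if (grown == NULL)
        return -1;
    memcpy(grown + ab->size, s, len);   /* copies len bytes, no terminator */
    ab->str = grown;
    ab->size += len;
    ab->str[ab->size] = '\0';           /* this byte is inside the allocation */
    return 0;
}
```

With this version, a loop that scans for '\0' stays inside the allocated memory, so stopping at index 12 after the two appends is guaranteed rather than lucky.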
I have the following piece of code, which a colleague claims may contain an out-of-bounds read, which I do not agree with. Could you help settle this argument and explain why?
char *test_filename = malloc(Size + 1);
sprintf(test_filename, "");
if (Size > 0 && Data)
snprintf(test_filename, Size + 1, "%s", Data);
where Data is a non-null-terminated string of type const uint8_t *Data and Size is the size of Data, i.e., number of bytes in Data, of type size_t.
It may read out-of-bounds because the format string is %s, perhaps?
Your colleague is correct. Perhaps unintuitively, snprintf(test_filename, Size + 1, "%s", Data) is guaranteed to read bytes starting at Data until a 0 byte is encountered, in your case typically resulting in an out-of-bounds read.
It will only write Size of these bytes to test_filename and null terminate them, respecting the size limit of the destination; but it will continue to read on. The reason for that is a design choice which enables the caller to determine the needed destination size for dynamic allocation before anything is actually written: snprintf() returns the number of bytes which would be written if the destination had infinite space. This feature is supposed to be used with a destination size of 0 (and potentially a null pointer as the destination). This functionality is useful for arguments which are not strings: With numbers etc. the size of the output is difficult to predict (e.g. locale dependent) and best left to the function at run time.
At the same time the return value indicates whether the output was truncated: If it is greater than or equal to the size parameter, not all of the input was used in the output. In your case, what was left out were the bytes starting at Data[Size] and ending with the first 0 byte, or a segmentation fault ;-).
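The "measure first, then allocate" use of the return value looks roughly like this. It is a sketch with made-up names (format_pair, label, value); note that the arguments here are a genuine null-terminated string and a number, which is the situation the feature is meant for.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd "label=value" string, or NULL. */
char *format_pair(const char *label, int value)
{
    assert(label != NULL);
    /* Size 0 and a null destination: nothing is written, only measured. */
    int needed = snprintf(NULL, 0, "%s=%d", label, value);
    if (needed < 0)
        return NULL;

    size_t size = (size_t)needed + 1;        /* +1 for the terminating '\0' */
    char *out = malloc(size);
    if (out == NULL)
        return NULL;
    snprintf(out, size, "%s=%d", label, value);
    return out;
}
```

The caller owns the returned buffer and has to free() it.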
Suggestion for a fix: First of all, it is unclear why you would use the printf family to print a string; simply copy it. And then Andrew has a point in his comments that since Data is not null terminated it is not really a string (even if all bytes are printable); so don't start fiddling with strcpy and friends but simply memcpy() the bytes, and null terminate manually.
Oh, and the preceding sprintf(test_filename, ""); does not serve any discernible purpose. If you want to write a null byte to *test_filename, simply do so; but since you are not using strcat, which would rely on a terminated destination string to extend, it is quite unnecessary.
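A sketch of that suggestion, assuming you want a freshly allocated, terminated copy of the bytes. The function name bytes_to_string is mine; Data and Size from the question become the parameters data and size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies size raw bytes and appends a '\0'.
   Returns a malloc'd string, or NULL on failure. */
char *bytes_to_string(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (size == SIZE_MAX)              /* size + 1 would overflow */
        return NULL;

    char *out = malloc(size + 1);
    if (out == NULL)
        return NULL;
    if (size > 0)
        memcpy(out, data, size);       /* reads exactly size bytes, no more */
    out[size] = '\0';                  /* terminate manually */
    return out;
}
```

Nothing here ever looks at data[size], so there is no out-of-bounds read regardless of what follows the buffer in memory.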
From the man page for snprintf():
The functions snprintf() and vsnprintf() write at most size bytes (including the terminating null byte ('\0')) to str.
Note the at most size bytes
This means that snprintf() will stop transferring bytes after the parameter Size bytes are transferred.
this statement;
sprintf(test_filename, "");
is completely unneeded and has no effect on the operation of the second call to snprintf()
If you want to result in a 'proper' string, suggest:
char *test_filename = calloc( Size + 1, sizeof( char ) );
if (Size > 0 && Data)
    snprintf(test_filename, Size + 1, "%s", Data);
However, snprintf() keeps reading until a NUL byte is encountered. This can create problems, up to and including a segmentation fault.
The function: memcpy() is made for this kind of job. Suggest replacing the call to snprintf() with
memcpy( test_filename, Data, Size );
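If you would rather keep snprintf(), a precision on %s bounds how many bytes are read: with "%.*s" the argument does not need to be null terminated as long as the precision does not exceed the array size. A sketch, with the invented name bytes_to_string_printf:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: like the memcpy approach, but via "%.*s".
   Returns a calloc'd string, or NULL on failure. */
char *bytes_to_string_printf(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (size > INT_MAX)                /* the precision argument is an int */
        return NULL;

    char *out = calloc(size + 1, sizeof *out);   /* zero-filled, so terminated */
    if (out == NULL)
        return NULL;
    if (size > 0)
        snprintf(out, size + 1, "%.*s", (int)size, (const char *)data);
    return out;
}
```

One difference from memcpy(): this stops early if the data happens to contain a 0 byte before size bytes have been copied.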
In these two versions:
//VERSION 1
char *c=malloc(10);
c[0]='h';
c[1]='i';
c[2]='\0';
c[3]='l';
printf("%s\n",c);
I am getting the expected result i.e. hi is being printed.
Now in this one:
//VERSION 2
char *c;
size_t siz=8;
c=malloc(sizeof(char)*(siz+1)); //char size is 1 byte on system
getline(&c,&siz,stdin);
c[siz]='\0';
printf("%s\n",c);
On inputting the value 'hello world' the output is 'hello world' - I was expecting that it won't print anything after reading 9th byte (it is set to \0).
Why is there difference in the two?
Is it happening because pointer c in version 2 is made to point to stdin and the '\0' modification doesn't work that way in a stream? If yes, then why is the compiler not issuing any warning or error?
As you yourself noted in the comments, getline will check the pointer and size arguments to see if it needs to reallocate (or allocate) the buffer in the event the line from the stream exceeds the given buffer's size (a NULL buffer of size 0 being a plain allocation instead of a reallocation). When this happens, both the pointer and the size arguments are changed to match the new buffer (remember, you passed in pointers to the buffer pointer and size arguments, not just the arguments themselves, references not values).
So, in your example, after allocating a buffer of 9 chars (9 bytes in your case), your c pointer is set to some memory with at least 9 bytes available and siz is still 8. However, after typing a line longer than 8 characters (including the newline) like "hello world\n", the buffer is reallocated to fit the whole string "hello world\n\0", i.e. at least 13 bytes, AND the size argument is updated to the new buffer size (13 or more, depending on the implementation). So, when getline returns, c points to this new buffer and siz holds its size. You don't need to add a null terminator since getline does it for you (assuming it succeeds). What you are doing is then setting c[siz] to '\0', which is one byte past the end of the buffer; luckily for you it didn't trigger any fault, and since it lies beyond the terminator getline already wrote, it doesn't shorten the printed string.
For the results you're looking for, keep the original size aside, like in a macro:
#define SIZE 8
char* c;
size_t siz = SIZE;
c = malloc(sizeof(char) * (siz +1));
getline(&c, &siz, stdin); // if you type something longer than 8 bytes including new line, it will trigger the realloc and siz will be changed
c[SIZE] = '\0'; // prematurely end the string at 8 bytes
printf("%s\n", c); // now you'll get shorter strings; siz still holds the size of the (possibly reallocated) buffer
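A variation on the same idea, as a sketch: let getline() allocate the buffer itself and use its return value (the number of characters read) to decide whether to cut the line. getline() is POSIX rather than standard C, and the name read_line_truncated is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() and ssize_t */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads one line from fp and cuts it to at most
   max_chars characters. Returns a malloc'd string, or NULL on EOF/error. */
char *read_line_truncated(FILE *fp, size_t max_chars)
{
    char *line = NULL;   /* NULL with capacity 0: getline allocates for us */
    size_t cap = 0;

    assert(fp != NULL);
    ssize_t len = getline(&line, &cap, fp);
    if (len < 0) {
        free(line);      /* the buffer may have been allocated even on failure */
        return NULL;
    }
    if ((size_t)len > max_chars)
        line[max_chars] = '\0';   /* index is within the len + 1 bytes written */
    return line;
}
```

Because the truncation index is checked against the length getline() reported, the write can never land outside the buffer.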
I found this function on Stack Overflow which concatenates two strings. Here is the function:
char* concatstring(char *s1,char *s2)
{
char *result = malloc(strlen(s1)+strlen(s2)+1);
strcpy(result,s1);
strcat(result,s2);
return result;
}
My question is, why do we add 1 to the malloc call?
It's because in C "strings" are stored as arrays of chars followed by a null byte. This is by convention. Consequently, null bytes may not appear inside any C string.
However, the actual string itself does not contain the null byte (which is just part of the representation of the string), and so strlen reports the number of non-null bytes in the string. To create a C string that is the result of concatenating two strings, you thus need to leave room for the null terminator.
In fact, every string operation one way or another needs to deal with the null terminator. Unfortunately, the details vary from function to function (e.g. snprintf does it right, but strncpy is dangerously different), and you should read each function's manual very carefully to understand who takes care of the null terminator and how.
You need to allocate space for the '\0' (the null character, NUL), which is used to terminate strings in C.
i.e. the string "cat" is actually "cat\0".
If the string is "cat":
char * mystring = "cat";
Then strlen(mystring), would return 3.
But in reality it takes 4 bytes to store mystring, with one byte to store null character.
So if you have two strings, "dog" and "cat", their length will be 3 and 3 , although the number of bytes required to store them would be 4 each. The memory required to store their concatenation would be 3+3 +1 = 7.
So the 1 in malloc is to allocate extra byte to store the null character.
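The same byte counting, spelled out in code. This is a sketch of a variant of the function from the question (the name concat_strings and the NULL check on malloc are my additions); it uses memcpy so each copied length is visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of concatstring: returns a malloc'd s1+s2, or NULL. */
char *concat_strings(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    size_t len1 = strlen(s1);                 /* excludes the '\0' */
    size_t len2 = strlen(s2);                 /* excludes the '\0' */

    char *result = malloc(len1 + len2 + 1);   /* +1 for the single '\0' */
    if (result == NULL)
        return NULL;
    memcpy(result, s1, len1);                 /* s1 without its terminator */
    memcpy(result + len1, s2, len2 + 1);      /* s2 including its terminator */
    return result;
}
```

For "dog" and "cat" this allocates 3 + 3 + 1 = 7 bytes, and strlen() of the result is 6.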
I'm using a char[] of size 4, but when I use the memcpy() function it stores 8 characters in it and the character array length also becomes 8. What is happening?
I don't want to use malloc ok.
char strRoh[4]={'\0'};
and then
memcpy(strRoh,Dump+22,4);
Now tell me whats wrong with this
char strIP[]="hhhhhhhh";
char strRoh[4]={'\0'};
char strTheta[4]={'\0'};
char strTimeStamp[6]={'\0'};
char strNMDump[48]={'\0'};
Is there any problem with the declarations? When I change their order, the strings also change their size; now strRoh is getting 10 chars.
What the hell is going on with this?
C strings are 0-terminated. This means that if you want to have a string of length n in C, you need n+1 chars for it:
char hello[5] = "hello";
is not a string, because hello has space for 5 chars, and it doesn't end with 0.
char hello[6] = "hello";
is a string, and has 6 characters: h, e, l, l, o, 0.
To be able to use string related functions in C, you need the terminating 0.
So, change your code to have:
char strRoh[5]={'\0'};
char strTheta[5]={'\0'};
char strTimeStamp[7]={'\0'};
char strNMDump[49]={'\0'};
Note that in C, when you do:
char hello[] = "hello";
the compiler does the counting for you, and makes hello an array of size 6 (five letters plus the terminating 0):
printf("%zu\n", sizeof hello);
will print 6.
The underlying types of the objects pointed to by the source and destination pointers are irrelevant for memcpy; the result is a binary copy of the data.
The function does not check for any terminating null character in source - it always copies exactly num bytes. My guess is you are not adding a terminating null and trying to access it as a string.
C does not have any kind of boundary check on its data types.
So what you are probably "seeing" when debugging the code is that it shows you 8 bytes in the array. As someone else says, you might be trying to view it as a string and do not have a terminating zero byte. This is quite normal in C, and it is one of the aspects of the language that makes it very hard to understand.
I can recommend you read a good introduction to memory and pointer handling under C, or switch to a managed language like C#, VB.NET, Java, Perl, Python etc.
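I wondered whether a 2-byte char could explain getting 8 bytes for 4 characters, but in C sizeof(char) is 1 by definition, so memcpy with a length of 4 copies exactly 4 bytes; the extra characters must come from reading past the end of the unterminated array.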
I am however rusty at this C/C++ things. So hopefully somebody with more experience will give you a better answer.
The problem is that you have a char array of 4 bytes and you are writing all 4 bytes during memcpy without leaving any space for the terminating null character. Declare your array as 5 bytes and initialize it all to null (which you are already doing) and everything should be fine.
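A sketch of that fix as a reusable helper. The name copy_field is invented here; the assert only catches a too-small destination in debug builds, so treat it as documentation of the precondition rather than a runtime guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n raw bytes from src into dst and terminates
   the result. dst_size must be at least n + 1. */
void copy_field(char *dst, size_t dst_size, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    assert(n < dst_size);        /* room for n bytes plus the '\0' */
    memcpy(dst, src, n);         /* binary copy, no terminator added */
    dst[n] = '\0';               /* make it a C string */
}
```

With char strRoh[5]; the call from the question becomes copy_field(strRoh, sizeof strRoh, Dump + 22, 4); and strlen(strRoh) is then 4 no matter what the neighbouring arrays contain.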