Appending an Int to a char * in C

So I am looking to append the length of a ciphertext onto the end of the char array that I am storing the ciphertext in. I am new to C, and below is a test snippet of what I have devised that I think works.
...
int cipherTextLength = 0;
unsigned char *cipherText = NULL;
...
EVP_EncryptFinal_ex(&encryptCtx, cipherText + cipherTextLength, &finalBlockLength);
cipherTextLength += finalBlockLength;
EVP_CIPHER_CTX_cleanup(&encryptCtx);
// Append the length of the cipher text onto the end of the cipher text
// Note, the length stored will never be anywhere near 4294967295
char cipherLengthChar[1];
sprintf(cipherLengthChar, "%d", cipherTextLength);
strcat(cipherText, cipherLengthChar);
printf("ENC - cipherTextLength: %d\n", cipherTextLength);
...
The problem is I don't think using strcat when dealing with binary data is going to be trouble free. Could anyone suggest a better way to do this?
Thanks!
EDIT
Ok, so I'll add a little context as to why I was looking to append the length. In my encrypt function, EVP_EncryptUpdate requires the length of the plainText being encrypted. As this is much easier to obtain, this part isn't a problem. However, in my decrypt function, EVP_DecryptUpdate (which I call before EVP_DecryptFinal_ex) similarly requires the length of the cipherText being decrypted, so I need to store it somewhere.
In the application where I am implementing this, all I am doing is replacing some poor hashing with proper encryption. To add further hassle, the way the application works, I first need to decrypt information read in from XML, do something with it, then encrypt it and write it back to XML, so I need to have the ciphertext length stored with the ciphertext somehow. I also don't have scope to redesign this.

Instead of what you are doing now, it may be smarter to encode the ciphertext size at a location before the ciphertext itself. Once you start decrypting, it is not very useful to find the size at the end: you would need to know where the end is to read the size, and you need the size to find the end.
Furthermore, the ciphertext is binary, so you don't need to convert anything to a string. You want to encode the size as a fixed number of bytes (otherwise you don't know the size of the size :P ). So create a bigger buffer (4 bytes more than you require for the ciphertext), and start encrypting at offset 4. Then copy the size of the ciphertext into the first 4 bytes of the buffer.
If you don't know how to encode an integer, take a look at, for instance, this question/answer. Note that this only encodes 32 bits, for a maximum ciphertext size of 2^32 - 1 bytes, about 4 GiB. Furthermore, the linked answer uses Big Endian encoding. You should use either Big Endian (preferred for crypto code) or Little Endian encoding, but don't mix the two.
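A minimal sketch of that layout, assuming a 32-bit big-endian length prefix. put_be32, get_be32 and frame_ciphertext are hypothetical helper names, and the memcpy is only a stand-in for the real EVP_EncryptUpdate/EVP_EncryptFinal_ex calls, which would write their output starting at buf + 4.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 32-bit value as 4 big-endian bytes (most significant first). */
static void put_be32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v >> 24);
    dst[1] = (unsigned char)(v >> 16);
    dst[2] = (unsigned char)(v >> 8);
    dst[3] = (unsigned char)v;
}

/* Read 4 big-endian bytes back into a 32-bit value. */
static uint32_t get_be32(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Build [4-byte length][ciphertext] in a buffer of ct_len + 4 bytes.
   The memcpy stands in for the encryption calls writing at buf + 4.
   Returns a malloc'd buffer, or NULL if allocation fails. */
static unsigned char *frame_ciphertext(const unsigned char *ct, uint32_t ct_len)
{
    unsigned char *buf = malloc((size_t)ct_len + 4);
    if (buf == NULL)
        return NULL;
    memcpy(buf + 4, ct, ct_len);   /* ciphertext starts at offset 4 */
    put_be32(buf, ct_len);         /* its length goes in front */
    return buf;
}
```

On the decrypt side, get_be32(buf) gives the length of the ciphertext that starts at buf + 4, so nothing has to be found by scanning for an end marker.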
Neither the ciphertext nor the encoded size should be used as a character string. If you need a character string, my suggestion is to base 64 encode the buffer up to the end of the ciphertext.
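For the XML round trip, base64 is one way to turn the framed buffer into a character string. This is a hand-written sketch of standard base64 (RFC 4648 alphabet, '=' padding) under the hypothetical name base64_encode; in practice a library routine such as OpenSSL's EVP_EncodeBlock may be preferable. Decoding is the mirror image and is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Encode len bytes of binary data as a NUL-terminated base64 string.
   Every 3 input bytes become 4 output characters; a final group of
   1 or 2 bytes is padded with '='. Caller frees the result. */
static char *base64_encode(const unsigned char *data, size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t out_len = 4 * ((len + 2) / 3);
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    size_t i = 0, o = 0;
    /* Full 3-byte groups: 24 bits split into four 6-bit indexes. */
    for (; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)data[i] << 16) |
                          ((unsigned long)data[i + 1] << 8) | data[i + 2];
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = tbl[(v >> 6) & 0x3F];
        out[o++] = tbl[v & 0x3F];
    }
    /* Leftover 1 or 2 bytes. */
    if (i < len) {
        unsigned long v = (unsigned long)data[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)data[i + 1] << 8;
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return out;
}
```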

I hope you have big enough arrays for both cipherText and cipherLengthChar to store the required text. Hence, instead of
unsigned char *cipherText = NULL;
You can have
unsigned char cipherText[MAX_TEXT];
similarly for
char cipherLengthChar[MAX_INT];
Or you can have them dynamically allocated.
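where MAX_TEXT and MAX_INT are the maximum buffer sizes needed to store the text and the integer. Also, after the call to EVP_EncryptFinal_ex, NUL-terminate cipherText (cipherText[cipherTextLength] = '\0';) so that your strcat works. Note that strcat appends at the first zero byte it finds in cipherText, so if the ciphertext itself contains a zero byte, the appended text will overwrite ciphertext.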

The problem is I don't think using strcat when dealing with binary data is going to be trouble free.
Correct! That's not your only problem though:
// Note, the length stored will never be anywhere near 4294967295
char cipherLengthChar[1];
sprintf(cipherLengthChar, "%d", cipherTextLength);
Even if cipherTextLength is 0 here, you've gone out of bounds, since sprintf will add a null terminator, making a total of two chars -- but cipherLengthChar only has room for one.
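If you consider, e.g., 4294967295 as a string, that's 10 chars + '\0' = 11 chars. And since cipherTextLength is an int printed with %d, the longest possible output with a 32-bit int is "-2147483648": 11 chars + '\0' = 12.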
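Rather than guessing a buffer size for the decimal text, snprintf can measure it: called with a NULL buffer and size 0, it returns the number of characters the output needs, excluding the '\0'. A minimal sketch, with int_to_string as a hypothetical helper name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Return a malloc'd decimal string for n, sized exactly.
   The first snprintf only measures; the second writes.
   Caller frees the result. */
static char *int_to_string(int n)
{
    int needed = snprintf(NULL, 0, "%d", n);
    if (needed < 0)
        return NULL;
    char *s = malloc((size_t)needed + 1);   /* +1 for the '\0' */
    if (s == NULL)
        return NULL;
    snprintf(s, (size_t)needed + 1, "%d", n);
    return s;
}
```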
It would appear that finalBlockLength is the length of the data put into cipherText. However, the EVP_EncryptFinal_ex() call will probably fail in one way or another, or at least not do what you want, since cipherText == NULL. You then add 0 to that (== 0 aka. still NULL) and submit it as a parameter. Either you need a pointer to a pointer there (if EVP_EncryptFinal_ex is going to allocate space for you), or else you have to make sure there is enough room in cipherText to start with.
With regard to tacking text (or whatever) onto the end, you can just use sprintf directly:
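sprintf((char *)cipherText + cipherTextLength, "%d", cipherTextLength);
Presuming that cipherText is non-NULL and has enough extra room in it (see the first couple of paragraphs). After cipherTextLength += finalBlockLength, cipherTextLength is the offset just past the ciphertext, so that is where the text goes.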
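If you do want the decimal text appended, a bounded version of the sprintf call is safer. This sketch swaps in snprintf plus an explicit capacity check; append_decimal, used and cap are hypothetical names, and used has to be tracked by the caller because strlen means nothing on binary data.

```c
#include <stddef.h>
#include <stdio.h>

/* Append the decimal text of value at buf + used without writing past cap.
   Returns the new used length (not counting the '\0' snprintf writes),
   or -1 if the text plus its terminator does not fit. */
static long append_decimal(unsigned char *buf, size_t used, size_t cap, int value)
{
    if (used >= cap)
        return -1;
    int w = snprintf((char *)buf + used, cap - used, "%d", value);
    if (w < 0 || (size_t)w >= cap - used)
        return -1;                 /* output error, or the text was truncated */
    return (long)(used + (size_t)w);
}
```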
However, I doubt that doing that will be useful later on, but since I don't have any further context, I can't say more.

Related

GNU memmem vs C strstr

Is there any use case that can be solved by memmem but not by strstr?
I was thinking of being able to find a string of raw bytes (needle) inside a bigger string of raw bytes (haystack), like finding a particular raw byte pattern inside a blob of raw bytes obtained with the read function.
There is the case you mention: raw binary data.
This is because raw binary data may contain zeroes, which strstr interprets as string terminators, making it ignore the rest of the haystack or needle.
Additionally, if the raw binary data contains no zero bytes, and you don't have a valid (inside the same array or buffer allocation) extra zero after the binary data, then strstr will happily go beyond the data and cause Undefined Behavior via buffer overflow.
Or, to the point: strstr can't be used if the data is not strings. memmem doesn't have this limitation.
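To show the embedded-zero problem without depending on the GNU extension, here is a sketch using find_bytes, a hypothetical, portable stand-in that behaves like memmem (explicit lengths, naive search). On glibc you would normally just #define _GNU_SOURCE before the first #include and call memmem itself.

```c
#include <stddef.h>
#include <string.h>

/* Portable stand-in for memmem(): because both lengths are explicit,
   zero bytes in either buffer are ordinary data. Returns a pointer to
   the first match, or NULL. An empty needle matches at the start. */
static const void *find_bytes(const void *hay, size_t haylen,
                              const void *needle, size_t needlelen)
{
    const unsigned char *h = hay;
    const unsigned char *n = needle;
    if (needlelen == 0)
        return hay;
    if (needlelen > haylen)
        return NULL;
    for (size_t i = 0; i <= haylen - needlelen; i++) {
        /* cheap first-byte check before the full comparison */
        if (h[i] == n[0] && memcmp(h + i, n, needlelen) == 0)
            return h + i;
    }
    return NULL;
}
```

With a haystack of {0x01, 0x00, 0xAB, 0xCD, 0x02}, strstr sees only the one-byte string "\x01" and misses 0xAB 0xCD, while find_bytes (like memmem) finds it at offset 2.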
In addition to searching in non-string data, memmem() can be used to look for substrings in just a portion of a longer string, something strstr() can't do:
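#define _GNU_SOURCE   // memmem is a GNU extension; define this before any #include
#include <string.h>
char somestr[] = "a long string with the word apple";
// Look in just the 5th through 14th characters (offsets 4 to 13)
// (the haystack must have at least 14 characters, or memmem reads out of bounds)
char *loc = memmem(somestr + 4, 10, "pp", 2);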
and if you already know the lengths of the strings, it might be faster than strstr() when used on the entire haystack string, but that depends a lot on the implementation and should be benchmarked.

How to determine the size of a string in C, or at least ensuring that it doesn't exceed a maximum number of bytes?

Is it possible to determine the size in bytes of a string in C?
I'm trying to ensure that JSON strings built in C do not exceed a 1 MB size limit before passing them to the requesting application. I don't know the strings at compile time.
I've read that it is just strlen * sizeof( char ); but I don't understand that, because I read elsewhere that UTF-8 can have characters of size up to four bytes and sizeof( char ) is always one.
I am likely misunderstanding something basic.
If a character array is allocated as char JSON[1048576], does this allocate that many characters or bytes? If it is bytes, then as long as something like snprintf is used when writing to the JSON array, would this guarantee that it can never exceed 1 MB in size, even if there were characters in that array that take more than one byte?
Thank you.
Since you are after a size limit of 1 MB and not a string length limit per se, you can just use strlen(json_str), provided that your JSON string is null-terminated ('\0'). strlen counts bytes (chars), not UTF-8 characters.
If you allocate char JSON[1048576] that will give you an array with that many bytes. And snprintf(JSON, 1048576, "<json string>", ...) will guarantee that you never overfill your array.
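To know whether the JSON actually fit, check snprintf's return value: it is the length the complete output would have had, so a result greater than or equal to the buffer size means the text was truncated. A sketch with a hypothetical build_json helper; it assumes name needs no JSON escaping.

```c
#include <stdio.h>

#define JSON_MAX 1048576   /* the 1 MB limit from the question */

/* Format a tiny JSON object into out (capacity cap).
   Returns 1 if it fit, 0 if it was truncated, -1 on an output error. */
static int build_json(char *out, size_t cap, const char *name, int count)
{
    int n = snprintf(out, cap, "{\"name\":\"%s\",\"count\":%d}", name, count);
    if (n < 0)
        return -1;
    if ((size_t)n >= cap)
        return 0;             /* the full JSON needs n + 1 bytes */
    return 1;                 /* fits; strlen(out) == n */
}
```

For the real limit you would call it with a static char json[JSON_MAX] and sizeof json.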
It does not, however, guarantee that your string is valid UTF-8, since the last character may be a multi-byte character that gets split in the middle.
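If truncation is acceptable but the result must stay valid UTF-8, one approach is to back up over an incomplete trailing sequence. A minimal sketch (utf8_trim_incomplete is a hypothetical name) that inspects only the last few bytes and assumes everything before them is already valid UTF-8:

```c
#include <stddef.h>

/* Return a length <= len that does not end inside a multi-byte sequence. */
static size_t utf8_trim_incomplete(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = len;
    size_t cont = 0;

    /* Walk back over continuation bytes (10xxxxxx), at most 3 of them. */
    while (i > 0 && cont < 3 && (p[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0)
        return len;            /* only continuation bytes: leave as-is */

    unsigned char lead = p[i - 1];
    size_t need;               /* total length announced by the lead byte */
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len;            /* not a lead byte: leave as-is */

    /* A short final sequence is dropped entirely, lead byte included. */
    return (cont + 1 < need) ? i - 1 : len;
}
```

After a truncated snprintf into json, json[utf8_trim_incomplete(json, sizeof json - 1)] = '\0'; would drop a split final character.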
A C char is not the same as a UTF-8 character. In C a char is by definition 1 byte, but in UTF-8 the visual character that you want, like the heart in your comment, may be represented by several bytes of data.
One byte gives you 256 different values, and since there are far more than 256 Unicode "characters", more than one byte is needed to encode many of them. The designers of UTF-8 were clever, though: the first 128 characters (code points 0 to 127, i.e. ASCII) are encoded using just one byte, so text that uses only those characters is both valid UTF-8 and valid ASCII.
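The byte/character distinction is easy to see by counting both. In UTF-8 every byte of the form 10xxxxxx continues a character, so counting all the other bytes counts code points. A sketch assuming valid UTF-8 input, with utf8_codepoints as a hypothetical name:

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (10xxxxxx). Assumes the input is valid UTF-8. */
static size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

For "I \xE2\x99\xA5 C" ("I ♥ C", with the heart encoded as E2 99 A5), strlen reports 7 bytes while utf8_codepoints reports 5; a 1 MB limit is about the 7.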

Why does dynamically allocated array does not update with the new data coming?

I am trying to receive a message from a socket server that sends a large file of around 7 MB. In the following code, I try to concatenate all the data from buffer into one array, s. But when I try this, I see that the length of s does not change at all, although the total number of bytes received keeps increasing.
char buffer[300];
char* s = calloc(1, sizeof(char));
size_t n = 1;
while ((b_recv = recv(socket_fd,
buffer,
sizeof(buffer), 0)) > 0) {
char *temp = realloc(s, b_recv + n);
s = temp;
memcpy(s + n -1, buffer, b_recv);
n += b_recv;
s[n-1] = '\0';
printf("%s -- %zu",s, strlen(s));
}
free(s);
Is this not the correct way to accumulate received data of varying sizes? Also, when I try to print s, it gives some random question-mark characters. What is the mistake that I am making?
Why does dynamically allocated array does not update with the new data coming?
You have not presented any reason to believe that the behavior is as the question characterizes it. You are receiving binary data and storing it in memory, which is fine, but you cannot expect sensible results from treating such data as if it were a C string. Not even when you replace the last byte with a string terminator.
Binary data can and generally does contain bytes with value 0. C strings use such bytes as terminators marking the end of the string data, so, for example, strlen will measure only the number of bytes before the first zero byte, regardless of how many additional bytes have been stored after it. Writing your own terminator after the data on each pass, as your code does, cannot change that: strlen still stops at the first zero byte that arrived in the data.
You may attempt to print such data to the console as if it were text, but if in fact it does not consist of text encoded according to the runtime character encoding then there is no reason to expect the resulting display to convey useful information. Instead, examine it in memory via a debugger, or write the raw bytes to a file and examine the result with a hex editor, or write them (still raw) through a filter that converts to hexadecimal or some other text representation, or similar. And you have as many bytes to examine as you have copied to the allocated space. You're keeping track of that already, so you don't need strlen() to tell you how many that is.
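A sketch of the same accumulation that tracks the length explicitly, checks realloc, and dumps hex instead of printing with %s. struct bytebuf, bytebuf_append and hexdump are hypothetical names, not part of any socket API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A growable byte buffer; len is the only record of how much data it holds. */
struct bytebuf {
    unsigned char *data;
    size_t len;
};

/* Append n bytes. Returns 0 on success, -1 if realloc fails,
   in which case b->data is still valid and unchanged. */
static int bytebuf_append(struct bytebuf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    unsigned char *tmp = realloc(b->data, b->len + n);
    if (tmp == NULL)
        return -1;
    memcpy(tmp + b->len, src, n);
    b->data = tmp;
    b->len += n;
    return 0;
}

/* Print n bytes as hex, 16 per line, instead of treating them as text. */
static void hexdump(FILE *out, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%02x%c", p[i], (i % 16 == 15 || i + 1 == n) ? '\n' : ' ');
}
```

In the recv loop, starting from struct bytebuf s = {NULL, 0};, each positive return value r would be passed as bytebuf_append(&s, buffer, (size_t)r), and s.len is the total received.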

Appending a char w/ null terminator in C

Perhaps a little trivial, but I'm just learning C and I hate doing with two lines what can be done with one (as long as it does not confuse the code, of course).
Anyway, I'm building strings by appending one character at a time. I'm doing this by keeping track of the char index of the string being built, as well as the index into the input file's string (line).
str[strIndex] = inStr[index];
str[strIndex + 1] = '\0';
str is used to temporarily store one of the words from the input line.
I need to append the terminator every time I add a char.
I guess what I want to know is: is there a way to combine these into one statement, without using strcat() (or clearing str with memset() every time I start a new word) or creating other variables?
Simple solution: Zero out the string before you add anything to it. The NULs will already be at every location ahead of time.
// On the stack:
char str[STRLEN] = {0};
// On the heap
char *str = calloc(STRLEN, sizeof(*str));
In the calloc case, for large allocations, you may not even pay the cost of zeroing the memory explicitly: in bulk allocation mode, the allocator requests memory directly from the OS, which is either zeroed lazily (Linux) or zeroed in the background before you ask for it (Windows).
Obviously, you can avoid even this amount of work by deferring the NUL termination of the string until you're done building it, but if you might need to use it as a C-style string at any time, guaranteeing that it's always NUL-terminated up front isn't unreasonable.
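A sketch of the zero-it-first approach: once str starts out all zeros, each new character needs only one assignment, because the byte after it is already '\0'. STRLEN's value and copy_word_zeroed are hypothetical, and the buffer must be re-zeroed (or the next word must be at least as long as the previous one) before each new word.

```c
#include <stddef.h>

#define STRLEN 64   /* hypothetical capacity; the answer leaves STRLEN unspecified */

/* Copy the word starting at in[start] (up to the next space or the end of
   the string) into a freshly zeroed str, one assignment per character.
   Stops early so that at least one '\0' always remains. Returns the length. */
static size_t copy_word_zeroed(char str[STRLEN], const char *in, size_t start)
{
    size_t strIndex = 0;
    while (in[start + strIndex] != '\0' && in[start + strIndex] != ' ' &&
           strIndex < STRLEN - 1) {
        str[strIndex] = in[start + strIndex];   /* str[strIndex + 1] is already '\0' */
        strIndex++;
    }
    return strIndex;
}
```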
I believe the way you are doing it now is the neatest that satisfies your requirements of:
1) Not having the string all zero to start with
2) At every stage the string is valid (as in, it always has a terminator).
Basically you want to add two bytes each time, and really the neatest way to do that is the way you are doing it now.
If you want to make the code seem neater by having it on "one line" without calling a function, then perhaps a macro:
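/* do { ... } while (0) makes the macro act like one statement, e.g. inside if/else */
#define SetCharAndNull( String, Index, Character ) \
do { \
(String)[(Index)] = (Character); \
(String)[(Index)+1] = '\0'; \
} while (0)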
And use it like:
SetCharAndNull( str, strIndex, inStr[index]);
Otherwise the only other thing I can think of which would achieve the result is to write a "word" at a time (two bytes, so an unsigned short) in most cases. You could do this with some horrible typecasting and pointer arithmetic. I would strongly recommend against this though as it won't be very readable, also it won't be very portable. It would have to be written for a particular endianness, also it would have problems on systems that require alignment on word access.
[Edit: Added the following]
Just for completeness I'm putting that awful solution I mentioned here:
*((unsigned short*)&str[strIndex]) = (unsigned short)(inStr[index]);
This casts the pointer to str[strIndex] to an unsigned short pointer; an unsigned short on my system (OS X) is 16 bits (two bytes). It then sets the value to a 16-bit version of inStr[index] where the top 8 bits are zero. Because my system is little endian, the first byte will contain the least significant byte (which is the character), and the second byte will be the zero from the top of the word. But as I said, don't do this! It won't work on big endian systems (you would have to add a left shift by 8), and it will also cause alignment problems on processors that cannot access a 16-bit value at an address that is not 16-bit aligned (str[strIndex] may have only 8-bit alignment).
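If a single statement that writes both bytes is the goal, a memcpy from a two-element compound literal avoids the endianness, alignment and type-punning problems of the cast. This is a swapped-in alternative, not the answer's code; set_char_and_null is a hypothetical name, and whether the compiler turns it into a single 16-bit store is up to the compiler.

```c
#include <string.h>

/* Write c followed by '\0' at str[strIndex] in one statement.
   memcpy has no alignment or byte-order requirements. */
static void set_char_and_null(char *str, size_t strIndex, char c)
{
    memcpy(&str[strIndex], (const char[2]){ c, '\0' }, 2);
}
```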
Declare a char array:
char str[100];
or,
char *str = malloc(100 * sizeof(char));
Add all the characters one by one in a loop:
for(i = 0; i < length; i++){
str[i] = inStr[index + i]; // index is where the word starts in inStr
}
Finish it with a null character (outside the loop):
str[i] = '\0';

Scanning a file and allocating correct space to hold the file

I am currently using fscanf to get space delimited words. I establish a char[] with a fixed size to hold each of the extracted words. How would I create a char[] with the correct number of spaces to hold the correct number of characters from a word?
Thanks.
Edit: If I do a strdup on a char[1000] and the char[1000] actually only holds 3 characters, will the strdup reserve space on the heap for 1000 or 4 (for the terminating char)?
Here is a solution involving only two allocations and no realloc:
Determine the size of the file by seeking to the end and using ftell.
Allocate a block of memory this size and read the whole file into it using fread.
Count the number of words in this block.
Allocate an array of char * able to hold pointers to this many words.
Loop through the block of text again, assigning to each pointer the address of the beginning of a word, and replacing the word delimiter at the end of the word with 0 (the null character).
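The two-allocation steps above could look roughly like this. split_file_words is a hypothetical name; it treats any isspace() character as a delimiter, assumes the stream is seekable, and uses the fread result rather than the ftell value as the real length.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Read all of fp into one buffer and split it in place into words.
   On success returns the word array (each entry points into *text_out)
   and stores the count in *nwords. The caller frees *text_out and the array. */
static char **split_file_words(FILE *fp, char **text_out, size_t *nwords)
{
    /* 1. Size the file by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    /* 2. Read the whole file; +1 so the text is always terminated. */
    char *text = malloc((size_t)size + 1);
    if (text == NULL)
        return NULL;
    size_t got = fread(text, 1, (size_t)size, fp);
    text[got] = '\0';

    /* 3. Count words (runs of non-whitespace). */
    size_t count = 0;
    int in_word = 0;
    for (size_t i = 0; i < got; i++) {
        if (isspace((unsigned char)text[i]))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            count++;
        }
    }

    /* 4. One pointer per word (at least one slot, to avoid malloc(0)). */
    char **words = malloc((count ? count : 1) * sizeof *words);
    if (words == NULL) {
        free(text);
        return NULL;
    }

    /* 5. Same scan again: record word starts, turn delimiters into '\0'. */
    size_t w = 0;
    in_word = 0;
    for (size_t i = 0; i < got; i++) {
        if (isspace((unsigned char)text[i])) {
            text[i] = '\0';
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words[w++] = &text[i];
        }
    }

    *text_out = text;
    *nwords = count;
    return words;
}
```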
Also, a slightly philosophical matter: if you think this approach of inserting string terminators in place and breaking up one gigantic string to use it as many small strings is ugly, hackish, etc., then you should probably forget about programming in C and use Python or some other higher-level language. The ability to do radically more efficient data manipulation operations like this while minimizing the potential points of failure is pretty much the only reason anyone should be using C for this kind of computation. If you want to go and allocate each word separately, you're just making life a living hell for yourself by doing it in C; other languages will happily hide this inefficiency (and abundance of possible failure points) behind friendly string operators.
There's no one-and-only way. The idea is to just allocate a string large enough to hold the largest possible string. After you've read it, you can then allocate a buffer of exactly the right size and copy it if needed.
In addition, you can also specify a width in your fscanf format string to limit the number of characters read, to ensure your buffer will never overflow.
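The scratch-buffer approach with a field width might look like this. read_word is a hypothetical helper; the width 249 matches a 250-byte scratch buffer (one byte is reserved for '\0'), and a word longer than that would come back in pieces. The exact-size copy is strlen + 1 bytes, which is also what strdup allocates, so for a char[1000] holding 3 characters, strdup reserves 4 bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the next whitespace-delimited word from fp into a scratch buffer
   (the %249s width prevents overflow), then return a heap copy that is
   exactly strlen + 1 bytes. Returns NULL at end of input or if malloc fails. */
static char *read_word(FILE *fp)
{
    char scratch[250];
    if (fscanf(fp, "%249s", scratch) != 1)
        return NULL;
    size_t n = strlen(scratch) + 1;   /* +1 for the '\0' */
    char *word = malloc(n);
    if (word != NULL)
        memcpy(word, scratch, n);
    return word;
}
```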
But if you allocated a buffer of, say, 250 characters, it's hard to imagine a single word not fitting in that buffer.
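char *ptr;
ptr = malloc(size_of_string + 1);   /* +1 for the terminating '\0' */
if (ptr == NULL) {
/* handle the allocation failure */
}
/* copy the word into ptr before reading from it */
char first = ptr[0];
/* etc. */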
