strtol result mismatch while string conversion - c

After checking out this question I did not find the solution I needed, so I tried to use strtol in the following manner:
in = (unsigned char *)malloc(16);
for(size_t i = 0; i < (size_t)strlen(argv[2]) / 2; i+=2 )
{
long tmp = 0;
tmp = strtol((const char *)argv[2] + i, &argv[2] + 2, 16);
memcpy(&in[i], &tmp, 1);
}
This code produced several intermediate values:
Can someone please explain why the entire in array gets filled with 0xFF (255) bytes, and why tmp does not equal its expected value?
Tips on how to improve the above code so that it fills the in array with the correct hex values are also welcome.

Your code is wrong on multiple counts, and the casts hide the problems:
Casting the return value of malloc is not necessary in C and can potentially hide an unsafe conversion if you forget to include <stdlib.h>:
in = (unsigned char *)malloc(16);
Casting the return value of strlen to (size_t) is useless, even redundant as strlen is defined to return a size_t. Again you might have forgotten to include <string.h>...
for (size_t i = 0; i < (size_t)strlen(argv[2]) / 2; i += 2) {
long tmp = 0;
strtol takes a pointer to const char, to which argv[2] + i converts implicitly, so the (const char *) cast is useless. The second argument is the address of a char *, where strtol stores a pointer to the first character it did not convert. You pass &argv[2] + 2, which is the address of the fifth element of argv, in other words &argv[4]. That is most certainly not what you intend to do, although your loop's purpose is quite obscure...
tmp = strtol((const char *)argv[2] + i, &argv[2] + 2, 16);
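Copying the whole long value in tmp with memcpy would require copying sizeof(tmp) bytes. Copying only the first byte has an implementation-defined effect that depends on the representation of long, notably the endianness of the target system (the least significant byte on a little-endian machine, the most significant one on a big-endian machine):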
memcpy(&in[i], &tmp, 1);
}
You should post a complete, compilable example that illustrates your problem. A code fragment is missing important context, such as which header files are included, how the variables are defined, and what code is executed before the fragment.
As written, your code does not make much sense, trying to interpret its behavior is pointless.
Regarding the question in reference, your code does not even remotely provide a solution for converting a string of hexadecimal characters to an array of bytes.
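For what the question seems to be after, here is a minimal sketch of converting a string of hexadecimal digits to bytes without strtol. The names hex_digit_value and hex_to_bytes are hypothetical, and the sketch assumes an ASCII-compatible character set and an even number of digits:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one.
 * Assumes an ASCII-compatible character set (a-f and A-F contiguous). */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: converts each pair of hex digits in `hex` to one
 * byte in `out`. Returns the number of bytes written, or -1 if the length
 * is odd, a character is not a hex digit, or `out` is too small. */
static long hex_to_bytes(const char *hex, unsigned char *out, size_t out_cap)
{
    size_t len = strlen(hex);

    if (len % 2 != 0 || len / 2 > out_cap)
        return -1;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_digit_value(hex[2 * i]);
        int lo = hex_digit_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)(hi * 16 + lo);
    }
    return (long)(len / 2);
}
```

Called as hex_to_bytes(argv[2], in, 16), this would fill in with one byte per two input digits, with no casts on the arguments and no memcpy from a long.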

Related

Possible "Integer Overflow or Wraparound" flaw in this C code

Below I have extracted a piece of code from a project over which I ran a static analysis tool looking for security flaws. It flagged this code as being susceptible to an integer overflow/wraparound flaw as explained here:
https://cwe.mitre.org/data/definitions/190.html
Here is the relevant code:
#define STRMAX 16
void bar() {
// ... struct malloc'd elsewhere
mystruct->string = malloc(STRMAX * sizeof(char));
memset(mystruct->string, '\0', STRMAX);
strncpy(mystruct->string,"H3C19H1E4XAA9MQ",STRMAX); // 15 character string
mystruct->baz = malloc(3 * sizeof(char*));
memset(mystruct->baz, '\0', 3);
if (strlen(mystruct->string) > 0) {
strncpy(mystruct->baz,&(mystruct->string[0]),2);
}
mystruct->quux = malloc(19 * sizeof(char));
memset(mystruct->quux, '\0', 19);
for(int i = 0; i < 19; i++) {
mystruct->quux[i] = 'a';
}
foo(struct);
}
void foo(Mystruct *struct) {
// this was flagged
size_t input_len = strlen(mystruct->baz) + strlen(mystruct->quux);
// this one too but it flows from the first I think.
char *input = malloc((input_len + 1) * sizeof(char));
memset(input, '\0', input_len + 1);
strncpy(input, mystruct->baz, strlen(mystruct->baz));
strncat(input, mystruct->quux, strlen(mystruct->quux));
// ...
}
So in other words, there are three members of a struct that are explicitly bounded in one function and then used to create a variable in another one based on the size of the struct's members.
The analyzer flagged the first line of foo in particular. My question is, did it correctly flag this? If so, what would be a simple way to mitigate this? I'm on a platform that doesn't have a BSD-style reallocate function by default.
PS: I realize that strncpy(foo, bar, strlen(bar)) is somewhat frivolous in memory terms.
size_t input_len = strlen(struct->baz) + strlen(struct->quux); can theoretically wrap around, so the analyser is right. If strlen(struct->quux) is larger than SIZE_MAX - strlen(struct->baz), the sum wraps around.
If input_len == SIZE_MAX, then input_len + 1 wraps around to 0 as well.
When you add two string lengths, the result can exceed the maximum value of the size_t type. For instance, if SIZE_MAX were 100 (I know this is not realistic, it's just for example purposes), and your strings were 70 and 50 characters long, the total would wrap around to 20, which is not the correct result.
This is rarely a real concern, since the actual maximum is usually very large (the C standard allows it to be as low as 65535, but you're only likely to run into this on microcontrollers), and in practice strings are not even close to the limit, so most programmers would just ignore this problem. Protecting against this overflow gets complicated, although it's possible if you really need to.
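If you do want to guard against it, one common approach is to test against SIZE_MAX before adding. A minimal sketch, where size_add_checked is a hypothetical helper name and not a standard function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and returns true, or returns
 * false (leaving *sum untouched) if the addition would wrap around. */
static bool size_add_checked(size_t a, size_t b, size_t *sum)
{
    if (b > SIZE_MAX - a)   /* a + b would exceed SIZE_MAX */
        return false;
    *sum = a + b;
    return true;
}
```

In foo, the two string lengths would be added with one call and the extra 1 for the terminator with a second call, and the function would bail out before malloc if either call returns false.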

Removing duplicate characters from two argument strings in C

I'm trying to make my solution to a problem more readable while keeping the same speed. The problem is this:
Allowed function: write.c, nothing else.
Write a program that takes two strings and displays, without doubles, the
characters that appear in either one of the strings.
The display will be in the order characters appear in the command line, and
will be followed by a \n.
As you can see, main passes two argument strings (argv[1] and argv[2]) to our function, void remove_dup(char *str, char *str2). The function uses a temporary array, indexed by the ASCII value of each character, to record which characters have already been seen. For example, with str1 = "hello" and str2 = "laoblc", the expected output written with the write function is "heloabc".
However, GCC complained because I was using a char from my strings as the subscript into my temporary, zero-filled array. To stop the compiler from complaining, I had to cast the string element to an int before using it as an index into the temporary array. This array is our checker: it determines whether a character is a duplicate, depending on the character's value. Without the casts, compiling with warning flags (gcc -Wextra -Werror -Wall remove_dup.c) gives this error:
remove_dup:11 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
if (temp[str[i]] == 0)
^~~~~~~
remove_dup:13 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
temp[str[i]] = 1;
^~~~~~~
remove_dup:21 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
if (temp[str2[i]] == 0)
^~~~~~~~
remove_dup.c:23 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
temp[str2[i]] = 1;
^~~~~~~~
Now my real question is, how can I have the same time efficiency BUT without using any kind of casting into my array? This program is running as O(m + n) where m is our first string and n is our second string.
This is the code:
void remove_dup(char *str, char *str2)
{
int temp[10000] = {0};
int i;
i = 0;
while (str[i])
{
if (temp[(int)str[i]] == 0)
{
temp[(int)str[i]] = 1;
write(1, &str[i], 1);
}
i++;
}
i = 0;
while (str2[i])
{
if (temp[(int)str2[i]] == 0)
{
temp[(int)str2[i]] = 1;
write(1, &str2[i], 1);
}
i++;
}
}
int main(int argc, char *argv[])
{
if (argc == 3)
remove_dup(argv[1], argv[2]);
write(1, "\n", 1);
return (0);
}
I hope this is clear enough with the logic structure I explained. I might have grammar mistakes, so bear with me :).
Casting here will have no performance penalty.
However, as a rule of thumb, it is generally best to avoid explicit casts whenever possible. You can do this, for example, by changing:
temp[(int)str[i]]
to:
temp[+str[i]]
This works because the unary + operator applies the integer promotions, which convert the char to an int.
However, your code has another problem. You could ask: why would gcc bother to issue such an annoying warning message?
One answer is that they just like to be annoying. A better guess is that on most platforms char is signed (see Is char signed or unsigned by default?), so if your string happens to contain a character with a code greater than 127 (which is then stored as a value less than zero), you will have a bug: the index is negative.
One way to fix this is to replace:
temp[(int)str[i]]
with:
temp[str[i] + 128]
(and change int temp[10000] = {0} to int temp[256 + 128] = {0}). This will work regardless of the default sign of char.
Now my real question is, how can I have the same time efficiency BUT without using any kind of casting into my array?
I don't believe casting in C has a runtime penalty. Everything in C is a number anyway. I believe it's just telling the compiler that yes, you know you're using the wrong type and believe it's ok.
Note that char can be signed. It is possible for a negative number to sneak in there.
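One way to keep the index non-negative without an explicit cast is to assign the character to an unsigned char variable first, which converts it implicitly. A minimal sketch, assuming 8-bit chars; append_unseen is a hypothetical helper that collects the output in a buffer instead of calling write:

```c
#include <stddef.h>

/* Hypothetical helper: appends to `out` each character of `s` that has not
 * been seen before, recording it in `seen` (one entry per unsigned char
 * value). Returns the new length of `out`; the caller guarantees that
 * `out` has enough room. */
static size_t append_unseen(const char *s, unsigned char seen[256],
                            char *out, size_t out_len)
{
    for (; *s != '\0'; s++) {
        unsigned char idx = *s;   /* implicit conversion: always 0..255 */
        if (!seen[idx]) {
            seen[idx] = 1;
            out[out_len++] = *s;
        }
    }
    return out_len;
}
```

Calling it once per argument with the same seen array keeps the O(m + n) behaviour of the original two loops.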
This program is running as O(m * n) where m is our first string and n is our second string.
No, it's running as O(n). O(m*n) would be if you were iterating over one string for every character of the other.
for( int i = 0; i < strlen(str1); i++ ) {
for( int j = 0; j < strlen(str2); j++ ) {
...
}
}
But you're looping over each string one after the other in two independent loops. This is O(m + n) which is O(n).
On to improvements. First, temp only ever needs to hold the char range which is, at most, 256. Let's give it a variable name that describes what it does, chars_seen.
Next, there's no need to store a full integer per character. Normally we'd use bool from stdbool.h, which since C99 is a macro for the built-in _Bool type. If stdbool.h is not available, we can define our own using signed char. We wrap the typedef in an #ifndef bool so that the system-supplied one is used whenever stdbool.h has been included; it knows better than we do what type to use for a boolean.
#ifndef bool
typedef signed char bool;
#endif
bool chars_seen[256] = {0};
You might be able to get a bit more performance by eliminating i and incrementing the pointer directly. Besides the possible speedup, this makes many string and array operations simpler.
for( ; *str != '\0'; str++ ) {
if( !chars_seen[(unsigned char)*str] ) {
chars_seen[(unsigned char)*str] = 1;
write(1, str, 1);
}
}
Note that I'm casting to unsigned char, not int or size_t. Where char is signed, a negative character would become a negative index when cast to int and a huge one when cast to size_t; unsigned char keeps the index within the bounds of chars_seen.
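You might be able to shave a touch off by using post-increment; whether this helps is going to depend on your compiler. Be aware that with the signed char fallback the counter can overflow and come back to 0 once a character has occurred often enough, so the plain assignment above is the safer form.
if( !chars_seen[(unsigned char)*str]++ ) {
write(1, str, 1);
}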
Finally, to avoid repeating your code and to extend it to work with any number of strings, we can write a function which takes the set of characters seen and displays one string. We'll also give the compiler the hint to inline it, though that is of questionable use; it is declared static so that the inline definition does not need a separate external definition.
static inline void display_chars_no_dups( const char *str, bool chars_seen[]) {
for( ; *str != '\0'; str++ ) {
if( !chars_seen[(unsigned char)*str]++ ) {
write(1, str, 1);
}
}
}
Then main allocates the array of seen characters and calls the function as many times as necessary.
int main(int argc, char *argv[]) {
bool chars_seen[256] = {0};
for( int i = 1; i < argc; i++ ) {
display_chars_no_dups( argv[i], chars_seen );
}
write(1, "\n", 1);
}

MD4 hash with openssl, save result into char array

I wrote a simple example with OpenSSL in C. I want to compute the MD4 hash of my message and save the result into a char array. Here's my code, with comments that will help you understand what I want to achieve:
#include <string.h>
#include <openssl/md4.h>
#include <stdio.h>
int main()
{
unsigned char digest[MD4_DIGEST_LENGTH];
char string[] = "hello world";
// run md4 for my msg
MD4((unsigned char*)&string, strlen(string), (unsigned char*)&digest);
// save md4 result into char array - doesnt work
char test[MD4_DIGEST_LENGTH];
sprintf(test, "%02x", (unsigned int)digest);
for(int i = 0; i < MD4_DIGEST_LENGTH; i++)
printf("%02x", test[i]);
printf("\n\n");
// print out md4 result - works, but its not intochar array as I wanted it to be
for(int i = 0; i < MD4_DIGEST_LENGTH; i++)
printf("%02x", digest[i]);
printf("\n\n");
// works but i dont understand why 'mdString' is 33 size
char mdString[33];
for(int i = 0; i < MD4_DIGEST_LENGTH; i++)
// and I also dont get i*2 in this loop
sprintf(&mdString[i*2], "%02x", (unsigned int)digest[i]);
printf("md4 digest: %s\n", mdString);
return 0;
}
The question is: why doesn't the code below work? It shows a different MD4 value than it should:
char test[MD4_DIGEST_LENGTH];
sprintf(test, "%02x", (unsigned int)digest);
for(int i = 0; i < MD4_DIGEST_LENGTH; i++)
printf("%02x", test[i]);
printf("\n\n");
And how can I know what size mdString should be, and why is there i*2 in the last loop? Can anybody explain this?
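Firstly, your call to MD4() passes the wrong type of pointer for the string and digest arrays: by using &, you get a pointer to the whole array (for example char (*)[12] for string), not a pointer to its first character. Both point at the same address, so the call happens to work, but the type is not the one MD4() expects.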
Since you are explicitly casting &string and &digest to unsigned char*, the compiler won't warn you. Remove the casts, and you will receive this warning:
warning: passing argument 1 of 'MD4' from incompatible pointer type
So instead call MD4() this way:
MD4(string, strlen(string), digest);
I personally prefer to avoid explicitly casting pointers unless it is really necessary, that way you will catch incorrect type casting much more easily.
Next, you attempt to use sprintf() to convert digest to hexadecimal: sprintf(test, "%02x", (unsigned int)digest);.
Two things are wrong here: (a) in this expression digest decays to a pointer to its first element, i.e. a memory address, so you are converting that address to an unsigned integer and then printing that integer as hex; (b) you need to loop over the elements of digest and convert each one to its two-character representation, sprintf won't do this for you!
I see that you may be relatively new to C given the mistakes made, but don't despair, making mistakes is the way to learn! :)
If you can afford the book, I highly recommend "C Primer Plus" by Stephen Prata. It's a great intro for anyone starting out programming and it's a very complete reference for later use when you are already comfortable with the language.
Otherwise, there is plenty of material online, and googling "C pointer tutorial" will return several useful results.
Hope this helps!
EDIT:
Forgot to comment about the other snippet of code that does work, but uses 33 bytes to store the string-ized MD4 hash:
// works but i dont understand why 'mdString' is 33 size
char mdString[33];
for(int i = 0; i < MD4_DIGEST_LENGTH; i++)
// and I also dont get i*2 in this loop
sprintf(&mdString[i*2], "%02x", (unsigned int)digest[i]);
printf("md4 digest: %s\n", mdString);
The openssl manpage for MD4() states that the hash is 16 bytes long.
Knowing this, and the fact that each unsigned char can hold a value from 0 to 255, then the maximum hexadecimal representation for any individual element in digest is 0xFF, in other words, 2 ASCII characters per unsigned char.
The reason the size of mdString (33) appears cryptic is that MD4_DIGEST_LENGTH should have been used to calculate the size of the array: you need 2 characters to represent each of the elements in digest, plus 1 null terminator ('\0') to end the string:
char mdString[(MD4_DIGEST_LENGTH * 2) + 1];
sprintf will print 2 characters to the mdString array whenever it's fed 1 byte from digest, so you need to advance 2 index positions in mdString for each position in digest, hence the use of i * 2. The following produces the same result as using i * 2:
for(int i = 0, j = 0; i < MD4_DIGEST_LENGTH; i++, j += 2)
sprintf(&mdString[j], "%02x", (unsigned int)digest[i]);
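The same loop can be wrapped in a small function that works for a digest of any length. This is only a sketch; bytes_to_hex is a hypothetical name and not part of OpenSSL:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes 2 * n lowercase hex digits followed by a
 * terminating '\0' into `out`, which must hold at least 2 * n + 1 chars. */
static void bytes_to_hex(const unsigned char *in, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++) {
        /* Each input byte produces exactly two characters at offset i * 2. */
        snprintf(&out[i * 2], 3, "%02x", (unsigned int)in[i]);
    }
    out[n * 2] = '\0';   /* also covers the n == 0 case */
}
```

With char mdString[MD4_DIGEST_LENGTH * 2 + 1], the call would be bytes_to_hex(digest, MD4_DIGEST_LENGTH, mdString).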

Reallocing a char*

I am trying to write a function that will store in a char array some information to be printed:
int offset = 0;
size_t size = 1;
char *data = NULL;
data = malloc(sizeof(char));
void create(t_var *var){
size_t sizeLine = sizeof(char)*(strlen(var->nombre)+2)+sizeof(int);
size = size + sizeLine;
realloc(data, size);
sprintf(data+offset,"%s=%d\n",var->name,var->value);
offset=strlen(data);
}
list_iterate(aList, (void *)create);
t_var is a struct that has two fields: name (char*) and value (int).
What's wrong with this code? When running it on Valgrind it complains about the realloc and sprintf.
Without knowing the specific valgrind errors, the standout one is:
realloc(data, size); should be data = realloc(data, size);
I'm sorry to say that, but almost EVERYTHING is wrong with your code.
First, incomplete code.
You say your t_var type has two members, name and value.
But your code refers to a nombre member. Did you forget to mention it or did you forget to rename it when publishing the code?
Second, misused sizeof.
You use a sizeof(int) expression. Are you aware what you actually do here?!
Apparently you try to calculate the length of the printed int value. Alas, the sizeof operator yields the number of bytes its argument occupies in memory. So, for example, for a 32-bit integer the result of sizeof(int) is 4 (32 bits fit in 4 bytes), but the maximum signed 32-bit integer value is 2^31 - 1, that is 2147483647 in decimal. TEN digits, not four.
You can use (int)(2.41 * sizeof(any_unsigned_int_type) + 1) to determine the number of characters you may need to print the value of any_unsigned_int_type. Add one for a leading minus sign in the case of signed integer types.
The magic constant 2.41 is the decimal logarithm of 256 (about 2.408, rounded up to two decimal places), so it scales a length in bytes to a length in decimal digits.
If you prefer to avoid floating-point operations, you may use another approximation, 29/12 = 2.41666..., and compute (sizeof(any_unsigned_int_type) * 29 / 12 + 1).
Third, sizeof(char).
You multiply the result of strlen by sizeof(char).
Not an error, actually, but completely useless, as sizeof(char) equals 1 by definition.
Fourth, realloc.
As others already explained, you must store the return value:
data = realloc(data, size);
Otherwise you risk losing your re-allocated data AND you continue writing at the previous location, which may result in overwriting (and so destroying) some other data on the heap.
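Note that assigning the result straight back to data loses the old block if realloc fails, so a temporary pointer is the usual pattern. A minimal sketch of one append step follows; append_var is a hypothetical helper, not the asker's create function, and it uses snprintf with a NULL buffer to measure the line first:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: appends "name=value\n" to the heap string *buf,
 * whose current length (without the '\0') is *len. Returns 0 on success
 * and -1 on failure; on failure *buf and *len are unchanged and valid. */
static int append_var(char **buf, size_t *len, const char *name, int value)
{
    /* With a NULL buffer and size 0, snprintf only reports the length. */
    int needed = snprintf(NULL, 0, "%s=%d\n", name, value);
    if (needed < 0)
        return -1;

    /* Keep the old pointer until realloc is known to have succeeded. */
    char *grown = realloc(*buf, *len + (size_t)needed + 1);
    if (grown == NULL)
        return -1;

    snprintf(grown + *len, (size_t)needed + 1, "%s=%d\n", name, value);
    *buf = grown;
    *len += (size_t)needed;
    return 0;
}
```

Starting from char *buf = NULL and size_t len = 0 works because realloc(NULL, n) behaves like malloc(n).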
Fifth, offset.
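You use that value to determine the position to sprintf() at. After the print you set offset to strlen(data). Since data points to the start of the buffer, this does give the total length written so far, but it rescans the whole buffer on every call. Do not change it to offset += strlen(data), which would add the total length again and skip past the end of the text. To advance by the length of the last printout only, do:
offset += strlen(data + offset);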
Sixth: strlen of sprintf.
You needn't call strlen here at all, as all functions of printf family return the number of characters printed. You can just use that:
int outputlen = sprintf(data+offset, "%s=%d\n", var->name, var->value);
offset += outputlen;
Seventh: realloc. Seriously.
This is quite a costly function. It may need to malloc a block of the new size internally, copy your data to the new place and free the old block. Why do you force it to do that on every call? What impact will it have on your program if it needs to print five thousand strings some day...?
It is also quite dangerous. Really. Suppose you need to print 5,000 strings but there is room for 2,000 only. You will get a NULL pointer from realloc(). All the data printed to the point are still at the current data pointer, but what will you do next?
How can you tell list_iterate to stop iterating...?
How can you inform the routine above the list_iterate that the string is incomplete...?
There is no good answer. Luckily you needn't solve the problem — you can just avoid making it!
Solution.
Traverse your list first and calculate the size of the buffer you need. Then allocate the buffer (just once!) and go on to fill it. There is just one place where the allocation may fail, and you can simply skip the printing if that ever happens:
int totaloutputlength = 0;
char *outputbuffer = NULL;
char *currentposition = NULL;
void add_var_length(t_var *var){
const int numberlength = sizeof(var->value)*29/12 + 2; // max digits, plus one for a possible minus sign
totaloutputlength += strlen(var->name) + 2 + numberlength;
}
void calculate_all_vars_length(t_list *aList){
totaloutputlength = 0;
list_iterate(aList, (void *)add_var_length);
}
void sprint_var_value(t_var *var){
int outputlen = sprintf(currentposition, "%s=%d\n", var->name, var->value);
currentposition += outputlen; // advance the printing position
}
int sprint_all_vars(t_list *aList){
calculate_all_vars_length(aList);
outputbuffer = malloc(totaloutputlength + 1); // +1 for terminating NUL char
// did allocation succeed?
if(outputbuffer == NULL) { // NO
// possibly print some error message...
// possibly terminate the program...
// or just return -1 to inform a caller something went wrong
return -1;
}
else { // YES
// set the initial printing position
currentposition = outputbuffer;
// go print all variables into the buffer
list_iterate(aList, (void *)sprint_var_value);
// return a 'success' status
return 0;
}
}

Pointer initialization for a specific function

Alright, this one's been puzzling me for a bit.
The following function encodes a string into Base64:
void Base64Enc(const unsigned char *src, int srclen, unsigned char *dest)
{
static const unsigned char enc[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
unsigned char *cp;
int i;
cp = dest;
for(i = 0; i < srclen; i += 3)
{
*(cp++) = enc[((src[i + 0] >> 2))];
*(cp++) = enc[((src[i + 0] << 4) & 0x30)
| ((src[i + 1] >> 4) & 0x0f)];
*(cp++) = enc[((src[i + 1] << 2) & 0x3c)
| ((src[i + 2] >> 6) & 0x03)];
*(cp++) = enc[((src[i + 2] ) & 0x3f)];
}
*cp = '\0';
while (i-- > srclen)
*(--cp) = '=';
return;
}
Now, on the function calling Base64Enc() I have:
unsigned char *B64Encoded;
Which is the argument I pass onto unsigned char *dest in the base 64 encoding function.
I've tried different initializations, from malloc to NULL and others. No matter what I do I always get an exception, and if I don't initialize it, the compiler (VS2005 C compiler) gives a warning telling me that it hasn't been initialized.
If I run this code with the uninitialized variable, sometimes it works and sometimes it doesn't.
How do I initialize that pointer and pass it to the function?
You need to allocate a buffer big enough to contain the encoded result. Either allocate it on the stack, like this:
unsigned char B64Encoded[256]; // the number here needs to be big enough to hold all possible variations of the argument
But it is easy to cause stack buffer overflow by allocating too little space using this approach. It would be much better if you allocate it in dynamic memory:
int cbEncodedSize = (srclen + 2) / 3 * 4 + 1; // 4 output chars per 3 source bytes (rounded up), plus the null terminator
unsigned char *B64Encoded = (unsigned char*)malloc(cbEncodedSize);
Don't forget to free() the allocated buffer after you're done.
It looks like you would want to use something like this:
// allocate 4 output bytes per group of 3 source bytes (rounded up), plus one for the null terminator
unsigned char *B64Encoded = malloc((srclen + 2) / 3 * 4 + 1);
Base64Enc(src, srclen, B64Encoded);
It would help if you provided the error.
With your function above, I can do this successfully:
int main() {
unsigned char *B64Encoded;
B64Encoded = (unsigned char *) malloc (1000);
unsigned char *src = "ABC";
Base64Enc(src, 3, B64Encoded);
}
You definitely need to malloc space for the data. You also need to malloc more space than src takes (about 1/3 more, plus the terminator).
A Base64-encoded string has four bytes for every three bytes of input, so if srclen is 300 bytes (or characters), the length of the Base64-encoded string is 400.
Wikipedia has a brief but quite good article about it.
So, rounding srclen up to the nearest multiple of three, dividing by three and multiplying by four gives exactly the length of the encoded text; add one more byte for the terminating null character.
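That calculation can be sketched as a small function. base64_buffer_size is a hypothetical name, and the sketch does not guard against overflow for very large inputs:

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to hold the padded Base64
 * encoding of `srclen` input bytes, including the terminating '\0'. */
static size_t base64_buffer_size(size_t srclen)
{
    size_t groups = (srclen + 2) / 3;   /* round up to whole 3-byte groups */
    return groups * 4 + 1;              /* 4 output chars each, plus '\0' */
}
```

The destination buffer would then be allocated with malloc(base64_buffer_size(srclen)) before calling Base64Enc.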
I see a problem in your code in the fact that it may access the byte after the trailing null char, for instance if the string length is one char. The behavior is then undefined and may result in a thrown exception if buffer boundary checking is activated.
This may explain the message related to accessing uninitialized memory.
You should then change your code so that you handle the trailing chars separately.
int len = (srclen/3)*3;
for( int i = 0; i < len; i += 3 )
{
// your current code here, it is ok with this loop condition.
}
// Handle 0 bits padding if required
if( len != srclen )
{
// add new code here
}
...
PS: Here is a wikipedia page describing Base64 encoding.
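Putting the pieces together, an encoder that handles the trailing bytes separately could look like the sketch below. It is a variant written for illustration, not the asker's Base64Enc: the name base64_encode and its signature are assumptions, and it never reads past src[srclen - 1]:

```c
#include <stddef.h>

/* Sketch of a Base64 encoder with explicit tail handling.
 * `dest` must hold 4 * ((srclen + 2) / 3) + 1 bytes. */
static void base64_encode(const unsigned char *src, size_t srclen, char *dest)
{
    static const char enc[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t full = (srclen / 3) * 3;   /* bytes covered by complete groups */
    size_t i;

    for (i = 0; i < full; i += 3) {
        *dest++ = enc[src[i] >> 2];
        *dest++ = enc[((src[i] << 4) & 0x30) | (src[i + 1] >> 4)];
        *dest++ = enc[((src[i + 1] << 2) & 0x3c) | (src[i + 2] >> 6)];
        *dest++ = enc[src[i + 2] & 0x3f];
    }
    if (srclen - full == 1) {          /* one trailing byte: pad with "==" */
        *dest++ = enc[src[i] >> 2];
        *dest++ = enc[(src[i] << 4) & 0x30];
        *dest++ = '=';
        *dest++ = '=';
    } else if (srclen - full == 2) {   /* two trailing bytes: pad with "=" */
        *dest++ = enc[src[i] >> 2];
        *dest++ = enc[((src[i] << 4) & 0x30) | (src[i + 1] >> 4)];
        *dest++ = enc[(src[i + 1] << 2) & 0x3c];
        *dest++ = '=';
    }
    *dest = '\0';
}
```

The main loop is the same bit arithmetic as in the question; only the last one or two bytes are treated specially, so no padding fix-up pass is needed afterwards.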
