Copy 16-byte hash into a 32-character array - c

I have a problem with my use of sprintf and strcat. Here's my code :
unsigned char hashResults[8][16];
unsigned char tmp[2];
unsigned char hash[8][32];
transformation("toto", 4, hashResults);
for (int k = 0; k < 8; ++k)
{
for (int i = 0; i < 16; ++i)
{
sprintf(tmp,"%2.2x",hashResults[k][i]);
strcat(hash[k],tmp);
}
printf("%d \n%s\n", strlen(hash[k]), hash[k]);
}
printf("Test : %s\n", hash[3]);
The function transformation() gives me 8 hashes of 16 bytes each.
I use sprintf and strcat to turn each hash into a 32-character hex string. When I read hash[k] inside the loop, strlen(hash[k]) returns 32 (which is correct), and the string is correct too.
But when I try to read hash[3] outside the loop, it also contains all the values after it.
For example, the output of my program :
32 - 4a18e332afba75b9734e875323f452f8
32 - b96833277faf31a5915c769f44634506
32 - f89f6dd8cd5aee79de3b2c0c27cafe2e
32 - c9f629472c862c1e7542f4cb2835d02b
32 - 09fc12cfb0a81a38513dbd5edff19e52
32 - 35564354793555a3ae1382f647044445
Test : b96833277faf31a5915c769f44634506f89f6dd8cd5aee79de3b2c0c27cafe2ec9f629472c862c1e7542f4cb2835d02b09fc12cfb0a81a38513dbd5edff19e5235564354793555a3ae1382f647044445
Does anyone see the problem? I want to compare these hashes with other hashes.

You should increase unsigned char tmp[2] to unsigned char tmp[3] and unsigned char hash[8][32] to unsigned char hash[8][33];
One problem is sprintf(tmp,"%2.2x",hashResults[k][i]), because it writes two characters + '\0' which takes three elements in the array tmp.
But the biggest problem is strcat(hash[k],tmp);.
At the end of every inner for loop you have written 33 characters (32 chars + '\0') to the hash[k] array. When you populate one of the hash[k] arrays in the inner for loop, you also write '\0' to the first element of the next array; that's why printf("%d \n%s\n", strlen(hash[k]), hash[k]); prints correct results. That misleads you into believing you have null-terminated the hash[k] arrays. When you enter the next inner for loop, you overwrite the null you wrote to this array when you exited the previous inner for loop, so the previous array is no longer terminated.
So you null-terminate your hash[k] array by writing zero to the first element of the hash[k+1] array on exit from the inner for loop. Then you overwrite this null value every time you enter the inner for loop.
In the end you have no terminating nulls in your arrays; only the final null after the last array is there.
I wonder how you even get this to work every time, because you write the terminating null beyond the array size, which leads to undefined behavior.
Making tmp[3] and hash[8][33] should fix your problem.
In order for the strcat() function to work properly you have to have at least one null in the array you are concatenating to, because otherwise it wouldn't know where to concatenate to. You have to add hash[k][0] = 0; before entering every inner loop:
for (int k = 0; k < 8; ++k)
{
hash[k][0] = 0;
for (int i = 0; i < 16; ++i)
{
sprintf(tmp,"%2.2x",hashResults[k][i]);
strcat(hash[k],tmp);
}
printf("%d \n%s\n", strlen(hash[k]), hash[k]);
}
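As an alternative to repeated strcat calls, each byte can be written straight to its final position in the output buffer. The following is only a sketch: to_hex is a hypothetical helper name (it does not appear in the question), and it assumes the caller provides a buffer of at least 2 * len + 1 chars, e.g. char hash[8][33].

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write len raw bytes as lowercase hex into out.
   out must have room for 2 * len + 1 chars (the +1 is the terminating '\0'). */
void to_hex(const unsigned char *raw, size_t len, char *out)
{
    for (size_t i = 0; i < len; ++i) {
        /* Each call writes two hex digits plus '\0' at out[2*i .. 2*i+2];
           the next iteration overwrites that '\0' with its first digit. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned)raw[i]);
    }
    out[2 * len] = '\0';   /* final terminator, also covers len == 0 */
}
```

Calling to_hex(hashResults[k], 16, hash[k]) for each k would then fill every row independently, with no need to initialise hash[k][0] first.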

Related

Function to Split a String into Letters and Digits in C

I'm pretty new to C, and I'm trying to write a function that takes a user input RAM size in B, kB, mB, or gB, and determines the address length. My test program is as follows:
int bitLength(char input[6]) {
char nums[4];
char letters[2];
for(int i = 0; i < (strlen(input)-1); i++){
if(isdigit(input[i])){
memmove(&nums[i], &input[i], 1);
} else {
//memmove(&letters[i], &input[i], 1);
}
}
int numsInt = atoi(nums);
int numExponent = log10(numsInt)/log10(2);
printf("%s\n", nums);
printf("%s\n", letters);
printf("%d", numExponent);
return numExponent;
}
This works correctly as it is, but only because I have that one line commented out. When I try to alter the 'letters' character array with that line, it changes the 'nums' character array to '5m2'
My string input is '512mB'
I need the letters to be able to tell if the user input is in B, kB, mB, or gB.
I am confused as to why the commented out line alters the 'nums' array.
Thank you.
In your input 512mB, "mB" is not a digit and is supposed to be handled by the commented-out code. When handling those characters, i is 3 and 4. But because the length of letters is only 2, when you execute memmove(&letters[i], &input[i], 1);, letters[i] accesses the array out of bounds, which is undefined behaviour - in this case, it writes into the memory of the nums array.
To fix it, you have to keep a separate index for letters. Or better, for both nums and letters, since i is the index into input.
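A minimal sketch of the separate-index idea, assuming the input looks like "512mB". The function name split_size and its size parameters are hypothetical, introduced only for illustration; the sizes are the full buffer sizes including the terminating '\0'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split input such as "512mB" into digits and letters.
   Returns 0 on success, -1 if either part would not fit in its buffer. */
int split_size(const char *input,
               char *nums, size_t nums_size,
               char *letters, size_t letters_size)
{
    size_t n = 0;   /* next free slot in nums */
    size_t l = 0;   /* next free slot in letters */

    if (nums_size == 0 || letters_size == 0)
        return -1;

    for (size_t i = 0; input[i] != '\0'; ++i) {
        if (isdigit((unsigned char)input[i])) {
            if (n + 1 >= nums_size) return -1;     /* keep room for '\0' */
            nums[n++] = input[i];
        } else {
            if (l + 1 >= letters_size) return -1;  /* keep room for '\0' */
            letters[l++] = input[i];
        }
    }
    nums[n] = '\0';
    letters[l] = '\0';
    return 0;
}
```

Because n and l advance independently of i, neither array is ever indexed with a position taken from the input string.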
There are several problems in your code. @MarkSolus has already pointed out that you access letters out of bounds because you are using i as the index and i can be more than 1 when you do the memmove.
In this answer I'll address some of the other problems.
string size and termination
Strings in C need zero-termination. Therefore arrays must be 1 larger than the string you expect to store in them. So
char nums[4]; // Can only hold a 3 char string
char letters[2]; // Can only hold a 1 char string
Most likely you want to increase both arrays by 1.
Further, your code never adds the zero-termination. So your strings are invalid.
You need code like:
nums[some_index] = '\0'; // Add zero-termination
Alternatively you can start by initializing the whole array to zero. Like:
char nums[5] = {0};
char letters[3] = {0};
Missing bounds checks
Your loop is a for-loop using strlen as stop-condition. Now what would happen if I gave the input "123456789BBBBBBBB" ? Well, the loop would go on and i would increment to values ..., 5, 6, 7, ... Then you would index the arrays with a value bigger than the array size, i.e. out-of-bounds access (which is real bad).
You need to make sure you never access the array out-of-bounds.
No format check
Now what if I gave an input without any digits, e.g. "HelloWorld"? In this case nothing would be written to nums, so it would be uninitialized when used in atoi(nums). Again - real bad.
Further, there should be a check to make sure that the non-digit input is one of B, kB, mB, or gB.
Performance
This is not that important but... using memmove for copy of a single character is slow. Just assign directly.
memmove(&nums[i], &input[i], 1); ---> nums[i] = input[i];
How to fix
There are many, many different ways to fix the code. Below is a simple solution. It's not the best way but it's done like this to keep the code simple:
#define DIGIT_LEN 4
#define FORMAT_LEN 2
int bitLength(char *input)
{
char nums[DIGIT_LEN + 1] = {0}; // Max allowed number is 9999
char letters[FORMAT_LEN + 1] = {0}; // Allow at max two non-digit chars
if (input == NULL) exit(1); // error - illegal input
if (!isdigit(input[0])) exit(1); // error - input must start with a digit
// parse digits (at max 4 digits)
int i = 0;
while(i < DIGIT_LEN && isdigit(input[i]))
{
nums[i] = input[i];
++i;
}
// parse memory format, i.e. rest of string must be one of B, kB, mB, gB
if ((strcmp(&input[i], "B") != 0) &&
(strcmp(&input[i], "kB") != 0) &&
(strcmp(&input[i], "mB") != 0) &&
(strcmp(&input[i], "gB") != 0))
{
// error - illegal input
exit(1);
}
strcpy(letters, &input[i]);
// Now nums and letter are ready for further processing
...
...
}
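Once the digits and the unit have been separated, the address length can be computed with integer arithmetic instead of log10(x)/log10(2), which avoids floating-point rounding. This is only a sketch under stated assumptions: the names unit_shift and address_bits are hypothetical, and the units are assumed to be binary (kB = 2^10 bytes, mB = 2^20, gB = 2^30).

```c
#include <string.h>

/* Hypothetical helper: shift for a unit suffix.
   Assumes binary units: B -> 0, kB -> 10, mB -> 20, gB -> 30; -1 if unknown. */
int unit_shift(const char *unit)
{
    if (strcmp(unit, "B") == 0)  return 0;
    if (strcmp(unit, "kB") == 0) return 10;
    if (strcmp(unit, "mB") == 0) return 20;
    if (strcmp(unit, "gB") == 0) return 30;
    return -1;
}

/* Hypothetical helper: smallest number of bits b such that 2^b >= size_bytes. */
int address_bits(unsigned long long size_bytes)
{
    int bits = 0;
    unsigned long long capacity = 1;
    while (bits < 64 && capacity < size_bytes) {
        capacity <<= 1;   /* doubling capacity adds one address bit */
        ++bits;
    }
    return bits;
}
```

For an input of 512mB this would be called as address_bits(512ULL << unit_shift("mB")).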

Non-recursive combination algorithm to generate distinct character strings

This problem has been irritating me for too long. I need a non-recursive algorithm in C to generate non-distinct character strings. For instance, if the character set is 26 characters long and each string is of length 2, then there are 26^2 such strings.
Please note that these are distinct combinations, aab is not the same as baa or aba. I've searched S.O., and most solutions produce non-distinct combinations. Also, I do not need permutations.
The algorithm can't rely on any libraries. I'm going to translate this C code into CUDA, where standard C libraries don't work (at least not efficiently).
Before I show you what I started, let me explain an aspect of the program. It is multithreaded on a GPU, so I initialize the beginning string with a few characters, aa in this case. To create a combination, I add one or more characters depending on the desired length.
Here's one method that I have attempted:
int main(void){
//Declarations
char final[12] = {0};
char b[3] = "aa";
char charSet[27] = "abcdefghijklmnopqrstuvwxyz";
int max = 4; //Set for demonstration purposes
int ul = 1;
int k,i;
//This program is multithreaded on a GPU. Each thread is initialized
//to a starting value for the string. In this case, it is aa
//Set final with a starting prefix
int pref = strlen(b);
memcpy(final, b, pref+1);
//Determine the number of non-distinct combinations
for(int j = 0; j < length; j++) ul *= strlen(charSet);
//Start concatenating characters to the current character string
for(k = 0; k < ul; k++)
{
final[pref+1] = charSet[k];
//Do some work with the string
}
...
It should be obvious that this program does nothing useful, except if I'm only appending one character from charSet.
My professor suggested that I try using a mapping (this isn't homework; I asked him about possible ways to generate distinct combinations without recursion).
His suggestion is similar to what I started above. Using the number of combinations calculated, he suggested to decompose it according to mod 10. However, I realized it wouldn't work.
For example, say I need to append two characters. This gives me 676 combinations using the character set above. If I am on the 523rd combination, the decomposition he demonstrated would yield
523 % 10 = 3
52 % 10 = 2
5 % 10 = 5
It should be obvious that this doesn't work. For one, it yields three characters, and two, if my character set is larger than 10 characters, the mapping ignores those above index 9.
Still, I believe a mapping is key to the solution.
The other method I explored utilized for loops:
//Pseudocode
c = charset;
for(i = 0; i <length(charset); i++){
concat string
for(j = 0; j <length(charset); j++){
concat string
for...
However, this hardcodes the length of the string I want to compute. I could use an if statement with a goto to break it, but I would like to avoid this method.
Any constructive input is appreciated.
Given a string, to find the next possible string in the sequence:
1. Find the last character in the string which is not the last character in the alphabet.
2. Replace it with the next character in the alphabet.
3. Change every character to the right of that character with the first character in the alphabet.
Start with a string which is a repetition of the first character of the alphabet. When step 1 fails (because the string is all the last character of the alphabet) then you're done.
Example: the alphabet is "ajxz".
Start with aaaa.
First iteration: the rightmost character which is not z is the last one. Change it to the next character: aaaj
Second iteration. Ditto. aaax
Third iteration: Again. aaaz
Fourth iteration: Now the rightmost non-z character is the second last one. Advance it and change all characters to the right to a: aaja
Etc.
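The steps above can be sketched in plain C without any library calls. The function name next_string and its parameters are hypothetical; the sketch assumes every character of s comes from alphabet and that alphabet contains no repeated characters.

```c
/* Hypothetical helper: advance s (len characters, all drawn from alphabet)
   to the next string in the sequence.
   Returns 1 on success, 0 if s was already the last string. */
int next_string(char *s, int len, const char *alphabet, int alpha_len)
{
    char first = alphabet[0];
    char last = alphabet[alpha_len - 1];

    /* Step 1: find the rightmost character that is not the last letter. */
    int pos = len - 1;
    while (pos >= 0 && s[pos] == last)
        --pos;
    if (pos < 0)
        return 0;                       /* string is all "last" letters: done */

    /* Step 2: replace it with the next letter of the alphabet. */
    int idx = 0;
    while (idx < alpha_len - 1 && alphabet[idx] != s[pos])
        ++idx;
    if (idx >= alpha_len - 1)
        return 0;                       /* character not in alphabet */
    s[pos] = alphabet[idx + 1];

    /* Step 3: reset everything to the right to the first letter. */
    for (int i = pos + 1; i < len; ++i)
        s[i] = first;
    return 1;
}
```

Starting from a string made of the first letter and calling this in a loop until it returns 0 visits every string of that length exactly once.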
First, thanks for everyone's input; it was helpful. Since I am translating this algorithm into CUDA, I need it to be as efficient as possible on a GPU. The methods proposed certainly work, but are not necessarily optimal for GPU architecture. I came up with a different solution using modular arithmetic that takes advantage of the base of my character set. Here's an example program, primarily in C with a mix of C++ for output, and it's fairly fast.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <iostream>
using namespace std;
typedef unsigned long long ull;
int main(void){
//Declarations
int init = 2;
char final[12] = {'a', 'a'};
char charSet[27] = "abcdefghijklmnopqrstuvwxyz";
ull max = 2; //Modify as need be
int base = strlen(charSet);
int placeHolder; //Maps to character in charset (result of %)
ull quotient; //Quotient after division by base
ull nComb = 1;
char comb[max+1]; //Array to hold combinations
int c = 0;
ull i,j;
//Compute the number of distinct combinations ((size of charset)^length)
for(j = 0; j < max; j++) nComb *= strlen(charSet);
//Begin computing combinations
for(i = 0; i < nComb; i++){
quotient = i;
for(j = 0; j < max; j++){ //No need to check whether the quotient is zero
placeHolder = quotient % base;
final[init+j] = charSet[placeHolder]; //Copy the indicated character
quotient /= base; //Divide the number by its base to calculate the next character
}
string str(final);
c++;
//Print combinations
cout << final << "\n";
}
cout << "\n\n" << c << " combinations calculated";
getchar();
}
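The inner loop of the program above can also be isolated as a plain C function with no library calls, which may be easier to port to a GPU kernel. This is a sketch only: the name index_to_string and its parameter list are hypothetical, and out is assumed to have room for len + 1 characters.

```c
/* Hypothetical helper: write the idx-th string of length len over charset
   into out, treating idx as a number in base `base` (the charset size),
   least significant "digit" first. out must hold len + 1 chars. */
void index_to_string(unsigned long long idx, const char *charset, int base,
                     char *out, int len)
{
    for (int j = 0; j < len; ++j) {
        out[j] = charset[idx % (unsigned long long)base]; /* current digit */
        idx /= (unsigned long long)base;                  /* move to next digit */
    }
    out[len] = '\0';
}
```

With the 26-letter character set and len = 2, index 523 from the question maps to 523 % 26 = 3 ('d') followed by 523 / 26 = 20 ('u'), so the mod-by-base decomposition always yields exactly two characters.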

Algorithm for processing the string

I really don't know how to implement this function:
The function should take a pointer to an integer, a pointer to an array of strings, and a string for processing. The function should write to the array all variations obtained by replacing 'ch' combinations with the '#' symbol, and set the integer to the size of this array. Here is an example of processing:
choker => {"choker","#oker"}
chocho => {"chocho","#ocho","cho#o","#o#o"}
chachacha => {"chachacha","#achacha","cha#acha","chacha#a","#a#acha","cha#a#a","#acha#a","#a#a#a"}
I am writing this in C standard 99. So this is sketch:
int n;
char **arr;
char *string = "chacha";
func(&n,&arr,string);
And function sketch:
int func(int *n,char ***arr, char *string) {
}
So I think I need to create another function, which counts the number of 'ch' combinations and allocates memory for this one. I'll be glad to hear any ideas about this algorithm.
You can count the number of combinations pretty easily:
char * tmp = string;
int i;
for(i = 0; *tmp != '\0'; i++){
if(!(tmp = strstr(tmp, "ch")))
break;
tmp += 2; // Skip past the 2 characters "ch"
}
// i contains the number of times ch appears in the string.
int num_combinations = 1 << i;
// num_combinations contains the number of combinations. Since this is 2 to the power of the number of occurrences of "ch"
First, I'd create a helper function, e.g. countChs that would just iterate over the string and return the number of 'ch'-s. That should be easy, as no string overlapping is involved.
When you have the number of occurrences, you need to allocate space for 2^count strings. Each string is at most as long as the original, so strlen(original) + 1 bytes (including the terminating '\0') is enough for each. You also set your n variable to that 2^count.
After you have your space allocated, just iterate over all indices in your new table and fill them with copies of the original string (strcpy() or strncpy() to copy), then replace 'ch' with '#' in them (there are loads of ready snippets online, just look for "C string replace").
Finally make your arr pointer point to the new table. Be careful though - if it pointed to some other data before, you should think about freeing it or you'll end up having memory leaks.
If you would like to have all variations of replaced string, array size will have 2^n elements. Where n - number of "ch" substrings. So, calculating this will be:
int i = 0;
int n = 0;
while(string[i] != '\0')
{
if(string[i] == 'c' && string[i + 1] == 'h')
n++;
i++;
}
Then we can use the binary representation of a number. Note that as an integer i is incremented from 0 to 2^n - 1, its binary representation tells us which "ch" occurrences to change. So:
for(unsigned long long i = 0; i < (1ULL << n); i++)
{
unsigned long long number = i;
int k = 0;
while(number > 0)
{
if(number % 2 == 1)
{
// Replace k-th occurrence of "ch"
}
number /= 2;
k++;
}
// Add replaced string to array
}
This code checks every bit in the binary representation of number and changes the k-th occurrence if the k-th bit is 1. Changing the k-th "ch" is pretty easy, and I leave it to you.
This code is useful only for fewer than 64 occurrences, because unsigned long long typically holds only 64 bits.
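Putting the counting and the bitmask loop together, one possible complete version of func looks like the sketch below. It is not the only way to do it: the cap of 20 occurrences is an arbitrary assumption to keep the allocation small, the const qualifier on string is added for safety, and the caller is assumed to free every string and then the array.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: build every variation of string in which each "ch" is either kept
   or replaced by '#'. Bit k of mask decides the k-th occurrence.
   On success stores the array in *arr, its size in *n, and returns 0. */
int func(int *n, char ***arr, const char *string)
{
    size_t len = strlen(string);
    int count = 0;
    for (const char *p = strstr(string, "ch"); p != NULL; p = strstr(p + 2, "ch"))
        ++count;
    if (count >= 20)            /* arbitrary cap (assumption): 2^20 strings */
        return -1;

    int total = 1 << count;
    char **out = malloc((size_t)total * sizeof *out);
    if (out == NULL)
        return -1;

    for (int mask = 0; mask < total; ++mask) {
        char *dst = malloc(len + 1);   /* a result is never longer than the input */
        if (dst == NULL) {
            while (mask > 0) free(out[--mask]);
            free(out);
            return -1;
        }
        size_t i = 0, j = 0;
        int k = 0;                     /* index of the current "ch" occurrence */
        while (i < len) {
            if (string[i] == 'c' && string[i + 1] == 'h') {
                if (mask & (1 << k)) {
                    dst[j++] = '#';
                } else {
                    dst[j++] = 'c';
                    dst[j++] = 'h';
                }
                i += 2;
                ++k;
            } else {
                dst[j++] = string[i++];
            }
        }
        dst[j] = '\0';
        out[mask] = dst;
    }
    *n = total;
    *arr = out;
    return 0;
}
```

For "chocho" this produces the four strings from the question in the order chocho, #ocho, cho#o, #o#o, since mask counts up from 0 to 3.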
There are two sub-problems that you need to solve for your original problem:
allocating space for the array of variations
calculating the variations
For the first problem, you need to find the mathematical function f that takes the number of "ch" occurrences in the input string and returns the number of total variations.
Based on your examples: f(1) = 2, f(2) = 4 and f(3) = 8. This should give you a good idea of where to start, but it is important to prove that your function is correct. Induction is a good way to make that proof.
Since your replace process ensures that the results have either the same or a shorter length than the original, you can allocate space for each individual result equal to the length of the original (plus one byte for the terminating '\0').
As for the second problem, the simplest way is to use recursion, like in the example provided by nightlytrails.
You'll need another function which takes the array you allocated for the results, a count of results, the current state of the string and an index in the current string.
When called, if there are no further occurrences of "ch" beyond the index then you save the result in the array at position count and increment count (so the next time you don't overwrite the previous result).
If there are any "ch" beyond index then call this function twice (the recurrence part). One of the calls uses a copy of the current string and only increments the index to just beyond the "ch". The other call uses a copy of the current string with the "ch" replaced by "#" and increments the index to beyond the "#".
Make sure there are no memory leaks. No malloc without a matching free.
After you make this solution work you might notice that it plays loose with memory. It is using more than it should. Improving the algorithm is an exercise for the reader.

accessing array of strings in C

How to declare array of strings in C.
Is it like
char str[100][100] ={"this","that","those"};
If so, how do I access the values? Can I traverse it like this?
(It does not give any compilation error, but shows some additional garbage characters.)
int i ,j;
char c[100][100] = {"this","that"};
for(i = 0 ;c[i] != '\0';++i)
for(j = 0; c[i][j] != '\0';++j)
printf("%c",c[i][j]);
Is it necessary to add '\0' at the end of each string? For example:
char c[100][100]={"this\0","that\0"}
How to declare array of strings in C
It is Ok, but you will have to be extremely careful of buffer-overflow when dealing with these strings
Can I traverse it like this?
Note that the condition in the first for loop, for(i = 0 ;c[i] != '\0';++i), is wrong: c[i] is an array, whose address is never 0, so the loop never stops at the end of your data. You should iterate over the outer array by index [until you have read all elements], not until you find some specific character. You can do that by maintaining a separate variable n that records how many elements the array currently holds.
Is it necessary to add '\0' at the end of each string?
No - the compiler adds it for you; it is just fine without adding the '\0' to the string.
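A minimal sketch of iterating with an explicit element count, as suggested above. The function name total_length is hypothetical; it simply visits every character of the first n rows, bounding the outer loop by n and the inner loop by the terminating '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters in the first n strings of a
   table whose rows are 100 chars wide. */
size_t total_length(char table[][100], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)                    /* outer: bounded by n */
        for (size_t j = 0; table[i][j] != '\0'; ++j)  /* inner: stop at '\0' */
            ++total;
    return total;
}
```

With char c[100][100] = {"this","that"}; the caller would pass n = 2, the number of rows actually in use.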
Yes, you can declare an array of strings that way.
No, you can't traverse it like that, the condition on your outer loop is bad - a string (char *) will never be equal to a character '\0'. The inner loop is fine.
No, you don't need to add the '\0', that will happen automatically.
c[i] is an array (which decays to a pointer), so it never compares equal to '\0';
instead you should check c[i][0].
The compiler will add '\0' for you when you input a string like "this"
#include <stdio.h>
char str[100][100] ={"this","that","those"};
int main()
{
int i ,j;
char c[100][100] = {"this","that"};
for(i = 0 ;c[i][0] != '\0';++i)
{
for(j = 0; c[i][j] != '\0';++j)
printf("%c",c[i][j]);
}
}

Why are the elements in my char* array two bytes instead of four? :

I am new to C, so forgive me if this question is trivial. I am trying to reverse a string, in
my case the letters a,b,c,d. I place the characters in a char* array, and declare a buffer
which will hold the characters in the opposite order, d,c,b,a. I achieve this result using
pointer arithmetic, but to my understanding each element in a char* array is 4 bytes, so when I do the following: buffer[i] = *(char**)letters + 4; I am supposed to be pointing at the
second element in the array. Instead of pointing to the second element, it points to the third. After further examination I figured that if I increment the base pointer by two
each time I would get the desired results. Does this mean that each element in the array
is two bytes instead of 4? Here is the rest of my code:
#include <stdio.h>
int main(void)
{
char *letters[] = {"a","b","c","d"};
char *buffer[4];
int i, add = 6;
for( i = 0 ; i < 4 ; i++ )
{
buffer[i] = *(char**)letters + add;
add -= 2;
}
printf("The alphabet: ");
for(i = 0; i < 4; i++)
{
printf("%s",letters[i]);
}
printf("\n");
printf("The alphabet in reverse: ");
for(i = 0; i < 4; i++)
{
printf("%s",buffer[i]);
}
printf("\n");
}
You're not making an array of characters: you're making an array of character strings -- i.e., an array of pointers to arrays of characters. I am not going to rewrite the whole program for you of course, but I'll start out with two alternative possible correct declarations for your main data structure:
char letters[] = {'a','b','c','d', 0};
char * letters = "abcd";
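Either of these gives you the five characters a, b, c, d followed by 0, the traditional ending for a character string in C. (The first declares an array; the second declares a pointer to a string literal, which must not be modified.)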
Another thing: rather than making assumptions about the size of things, use the language to tell you. For instance:
char *my_array[] = { "foo" , "bar" , "baz" , "bat" , } ;
// the size of an element of my_array
size_t my_array_element_size = sizeof(my_array[0]) ;
size_t alt_element_size = sizeof(*my_array) ; // my_array decays to a pointer to its first element
// the number of elements in my_array
size_t my_array_element_cnt = sizeof(my_array) / sizeof(*my_array) ;
// the size of a char
size_t char_size = sizeof(*(my_array[0])) ; // size of a char
Another thing: understand your data structures (as noted above). You talk about chars, but your data structures are talking about strings. Your declarations:
char *letters[] = {"a","b","c","d"};
char *buffer[4];
get parsed as follows:
letters is an array of pointers to char (which happen to be nul-terminated C-style strings), and it's initialized with 4 elements.
Like letters, buffer is an array of 4 pointers to char, but uninitialized.
You are not actually dealing with individual chars anywhere, even in the printf() statements: the %s specifier says the argument is a nul-terminated string. Rather, you're dealing with strings (aka pointers to char) and arrays of the same.
An easier way:
#include <stdio.h>
int main(void)
{
char *letters[] = { "a" , "b" , "c" , "d" , } ;
size_t letter_cnt = sizeof(letters)/sizeof(*letters) ;
char *buffer[sizeof(letters)/sizeof(*letters)] ;
for ( size_t i=0 , j=letter_cnt ; i < letter_cnt ; ++i )
{
buffer[--j] = letters[i] ;
}
printf("The alphabet: ");
for( int i = 0 ; i < letter_cnt ; ++i )
{
printf("%s",letters[i]);
}
printf("\n");
printf("The alphabet in reverse: ");
for( int i=0 ; i < letter_cnt ; i++ )
{
printf("%s",buffer[i]);
}
printf("\n");
}
BTW, is this homework?
This is a case of operator precedence. When you use buffer[i] = *(char**)letters + add;, the * before the cast is performed before the +, making this code equivalent to (*(char**)letters) + add;. The first part is the value of the first element in your array, i.e. a pointer to the string "a". Since a string constant automatically gets a null byte appended, this points to 'a\0'. It happens that the compiler placed all four strings immediately after each other in memory, so if you go past the end of that string you flow into the next. When you add to the pointer, you are moving through this character array: 'a\0b\0c\0d\0'. Notice that each character is 2 bytes after the last. Since this is only true because the compiler placed the 4 strings directly after each other, you should never depend on it (it won't even work if you tried to re-reverse your other string). Instead, you need to put in parentheses to make sure the addition happens before the dereference. Pointer arithmetic on a char** is already scaled by the pointer size, so add must then count elements (3, 2, 1, 0) rather than bytes; you never need to multiply by the pointer size yourself. (As pointed out by Nicholas, you shouldn't assume the size of anything anyway; use sizeof if you need it.)
buffer[i] = *((char**)letters + add);
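To illustrate the parenthesised form, here is a small sketch that reverses an array of string pointers using pointer arithmetic on the array itself. The function name reverse_strings is hypothetical; the offsets count elements, not bytes, because the compiler scales src + k by sizeof(char *).

```c
#include <stddef.h>

/* Hypothetical helper: copy the n pointers in src into dst in reverse order.
   src + k advances by k elements, i.e. k * sizeof(char *) bytes. */
void reverse_strings(char **src, char **dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* add first, then dereference: element n-1-i of src */
        *(dst + i) = *(src + (n - 1 - i));
    }
}
```

Called as reverse_strings(letters, buffer, 4) with the arrays from the question, this leaves buffer pointing at "d", "c", "b", "a" regardless of where the compiler placed the string literals.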
char *letters[] = {"a","b","c","d"};
I think you didn't get the pointer arithmetic right. letters is an array of pointers, and adding 1 to it moves to the next element (the next row).
letters + 1 ; // Points to the second element, i.e., &letters[1], which holds the address of "b"
char *letters[] = { "abc" , "def" } ;
(letters + 1) ; // Points to the second row, i.e., &letters[1], which holds the address of "def"
*((*letters) + 1) ; // Gets the second character of the first row, i.e., 'b'

Resources