Function to Split a String into Letters and Digits in C - arrays

I'm pretty new to C, and I'm trying to write a function that takes a user input RAM size in B, kB, mB, or gB, and determines the address length. My test program is as follows:
int bitLength(char input[6]) {
char nums[4];
char letters[2];
for(int i = 0; i < (strlen(input)-1); i++){
if(isdigit(input[i])){
memmove(&nums[i], &input[i], 1);
} else {
//memmove(&letters[i], &input[i], 1);
}
}
int numsInt = atoi(nums);
int numExponent = log10(numsInt)/log10(2);
printf("%s\n", nums);
printf("%s\n", letters);
printf("%d", numExponent);
return numExponent;
}
This works correctly as it is, but only because I have that one line commented out. When I try to alter the 'letters' character array with that line, it changes the 'nums' character array to '5m2'
My string input is '512mB'
I need the letters to be able to tell if the user input is in B, kB, mB, or gB.
I am confused as to why the commented out line alters the 'nums' array.
Thank you.

In your input 512mB, the characters "mB" are not digits, so they are supposed to be handled by the commented-out line. When those characters are processed, i is 3 or larger. But letters has only 2 elements, so when you execute memmove(&letters[i], &input[i], 1);, letters[i] is accessed out of bounds of the array, which is undefined behaviour. In this case, it happens to write into the memory of the nums array.
To fix it, keep a separate index for letters. Better still, keep separate indices for both nums and letters, since i is the index into input.
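The separate-index idea can be sketched as follows. This is a minimal illustration, not the asker's final code: the function name split_digits_letters and its buffer-size parameters are assumptions introduced here.

```c
#include <ctype.h>
#include <stddef.h>

/* Splits input into its digits and its non-digits, using one index per
   output buffer. Returns 0 on success, -1 if a buffer is too small.
   (Hypothetical helper; name and signature are not from the question.) */
int split_digits_letters(const char *input,
                         char *nums, size_t nums_size,
                         char *letters, size_t letters_size)
{
    size_t n = 0; /* next free slot in nums */
    size_t l = 0; /* next free slot in letters */

    if (nums_size == 0 || letters_size == 0) return -1;

    for (size_t i = 0; input[i] != '\0'; i++) {
        if (isdigit((unsigned char)input[i])) {
            if (n + 1 >= nums_size) return -1;    /* keep room for '\0' */
            nums[n++] = input[i];
        } else {
            if (l + 1 >= letters_size) return -1; /* keep room for '\0' */
            letters[l++] = input[i];
        }
    }
    nums[n] = '\0';
    letters[l] = '\0';
    return 0;
}
```

With char nums[5] and char letters[3], the input "512mB" leaves "512" in nums and "mB" in letters, and an input with too many digits or letters is rejected instead of overflowing.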

There are several problems in your code. @MarkSolus has already pointed out that you access letters out of bounds because you are using i as the index, and i can be more than 1 when you do the memmove.
In this answer I'll address some of the other problems.
String size and termination
Strings in C need a zero-termination. Therefore an array must be 1 larger than the longest string you expect to store in it. So
char nums[4]; // Can only hold a 3 char string
char letters[2]; // Can only hold a 1 char string
Most likely you want to increase both arrays by 1.
Further, your code never adds the zero-termination. So your strings are invalid.
You need code like:
nums[some_index] = '\0'; // Add zero-termination
Alternatively you can start by initializing the whole array to zero. Like:
char nums[5] = {0};
char letters[3] = {0};
Missing bounds checks
Your loop is a for-loop using strlen as stop-condition. Now what would happen if I gave the input "123456789BBBBBBBB" ? Well, the loop would go on and i would increment to values ..., 5, 6, 7, ... Then you would index the arrays with a value bigger than the array size, i.e. out-of-bounds access (which is real bad).
You need to make sure you never access the array out-of-bounds.
No format check
Now what if I gave an input without any digits, e.g. "HelloWorld"? In this case nothing would be written to nums, so it would be uninitialized when used in atoi(nums). Again - real bad.
Further, there should be a check to make sure that the non-digit input is one of B, kB, mB, or gB.
Performance
This is not that important but... using memmove to copy a single character is slow. Just assign directly.
memmove(&nums[i], &input[i], 1); ---> nums[i] = input[i];
How to fix
There are many, many different ways to fix the code. Below is a simple solution. It's not the best way but it's done like this to keep the code simple:
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#define DIGIT_LEN 4
#define FORMAT_LEN 2
int bitLength(char *input)
{
    char nums[DIGIT_LEN + 1] = {0}; // Max allowed number is 9999
    char letters[FORMAT_LEN + 1] = {0}; // Allow at max two non-digit chars
    if (input == NULL) exit(1); // error - illegal input
    if (!isdigit((unsigned char)input[0])) exit(1); // error - input must start with a digit
    // parse digits (at max 4 digits)
    int i = 0;
    while (i < DIGIT_LEN && isdigit((unsigned char)input[i]))
    {
        nums[i] = input[i];
        ++i;
    }
    // parse memory format, i.e. rest of string must be one of B, kB, mB, gB
    if ((strcmp(&input[i], "B") != 0) &&
        (strcmp(&input[i], "kB") != 0) &&
        (strcmp(&input[i], "mB") != 0) &&
        (strcmp(&input[i], "gB") != 0))
    {
        // error - illegal input
        exit(1);
    }
    strcpy(letters, &input[i]); // safe: the suffix was just validated to be at most 2 chars
    // Now nums and letters are ready for further processing
    ...
    ...
}
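One possible way to do the "further processing" is sketched below. It assumes byte-addressable memory and binary multiples (1 kB = 1024 B, and so on); the helper name address_bits is hypothetical and not part of the answer above. It uses integer arithmetic rather than log10(x)/log10(2), since a floating-point quotient converted to int can come out one too low.

```c
#include <string.h>

/* Returns the number of address bits needed for `value` units of memory,
   or -1 on invalid input. Assumes byte addressing and binary multiples.
   (Hypothetical helper, not from the original answer.) */
int address_bits(unsigned long value, const char *unit)
{
    int shift;

    if (strcmp(unit, "B") == 0)       shift = 0;
    else if (strcmp(unit, "kB") == 0) shift = 10;
    else if (strcmp(unit, "mB") == 0) shift = 20;
    else if (strcmp(unit, "gB") == 0) shift = 30;
    else return -1;

    if (value == 0) return -1;

    /* ceil(log2(value)): count the bits needed to represent value - 1 */
    int bits = 0;
    for (unsigned long v = value - 1; v > 0; v >>= 1) {
        bits++;
    }
    return bits + shift;
}
```

Inside bitLength, this could be called as address_bits(strtoul(nums, NULL, 10), letters). For "512mB" it yields 9 + 20 = 29.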

Related

How to extract words from a string array in c language and store it in another 1-d array? I don't want to use pointers

I wrote this code in C, where I wanted to extract words from str, then store them in char word[] (one by one), and send it to another function- palindrome. However, the words are not being formed properly. I'm new to this language so I don't want to use pointers or something else. I want to do it in the most simple way possible. Could you please suggest modifications to the code so that the words get formed properly?
int main()
{
char str[100];
int a=0, l, p=0;
printf("Enter the text \n");
gets(str);
l=strlen(str);
for(int i=0;i<l;i++)
{
char word[100]; int a=0;
if(str[i]==' '||str[i]=='\0')
{
for(int j=p;j<i;j++)
{
word[a]=str[j];
a++;
}
printf("This \n");
puts(word);
palindrome(a,word);
}
}
return 0;
}
In C, basic does not mean easy to understand. I advise you to understand this code well in order to see why using high-level functions as a beginner is not necessarily a good thing.
#include <unistd.h>
int main()
{
char buffer;
while(read(0, &buffer, 1) > 0)
write(1, &buffer, 1);
return 0;
}
Since you are an enthusiastic learner and have already tried to solve a problem, the best way to help you is to provide you the guidelines which will assure that you will end up solving your issue successfully.
Pointers
You do not want to use pointers. But what are pointers? Nothing more than variables that point to a certain position in memory and are aware of the type stored there. You already use arrays, and an array is a section of memory with a type and a size. That section has a start address, so your array is actually not very different from a pointer. So the intent to avoid pointers was already breached by the use of arrays, but I think I understand what you want: you want to minimize pointer arithmetic.
Understandable, implementable solution
You can iterate over the characters of your array, store the index of the first non-space character since the last space you encountered, and find the next space. At that point, you can create a new array whose size matches the number of characters in that interval and copy the characters from the interval in question into that array. Since you are learning C, I will provide an algorithm that you can implement (this is not actual program code):
wordStart <- -1
lastSpace <- -1
index <- 0
while (index < length(input)) do
if (input[index] != ' ') then
if (wordStart == lastSpace) then
wordStart <- index
end if
else
if (wordStart == lastSpace) then
wordStart <- index
lastSpace <- index
else
lastSpace <- index
//create array of characters, having lastSpace - wordStart elements
//copy the elements between the index of lastSpace and wordStart - 1 into your new array
//pass that new array into your call to palindrome
end if
end if
index <- index + 1
end while
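The same idea can be written in C using only array indexing. The sketch below is one possible implementation under stated assumptions: the asker's palindrome function was not shown, so the version here is a hypothetical stand-in, and count_palindromes and MAX_WORD are names introduced for illustration. Unlike the pseudocode, it also flushes the last word when the text does not end with a space.

```c
#include <string.h>

#define MAX_WORD 100

/* Hypothetical stand-in for the asker's palindrome(): returns 1 if the
   first len characters of word read the same in both directions. */
int palindrome(int len, const char word[])
{
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        if (word[i] != word[j]) return 0;
    }
    return 1;
}

/* Builds each space-separated word in its own buffer, passes it to
   palindrome(), and returns how many words are palindromes. */
int count_palindromes(const char str[])
{
    char word[MAX_WORD];
    int len = 0;                  /* characters collected for the current word */
    int count = 0;
    int total = (int)strlen(str);

    /* i <= total so that the terminating '\0' also ends the last word */
    for (int i = 0; i <= total; i++) {
        if (str[i] == ' ' || str[i] == '\0') {
            if (len > 0) {
                word[len] = '\0'; /* terminate before using it as a string */
                count += palindrome(len, word);
                len = 0;          /* start the next word from scratch */
            }
        } else if (len < MAX_WORD - 1) {
            word[len++] = str[i];
        }
    }
    return count;
}
```

Two details differ from the code in the question: the word length is reset only after a word has been emitted, and the word is null-terminated before it is used.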

Memcpy and splitting up a string from a pointer array

I am attempting to split up a string allocated to a pointer array and assign it into a matrix array. The number of characters that get assigned to the array depends on the number (bytes) entered by the user, so I can't use a function like strtok() to split up the string, since I don't have an actual delimiter.
Anyways, if the number of bytes entered are 1, I can successfully fill my "matrix" from start to finish.
My issue comes when the input is either 2 or 4. For some reason, the code skips over the first character in my original pointer array and starts at the second. My suspicion was that memcpy() skips over the first char because it is stored at the pointer itself. I thought this might be the case since, technically, pointer arrays are not arrays but pointers to arrays, but that wouldn't really explain why the first char gets stored when the input byte count is 1.
Below is a little snippet of my code, which includes the dynamic array allocation, and use of memcpy().
Here is where I allocated the string and read in from a file:
char * fileArray = (char*)malloc(size*sizeof(char));
if (fileArray== NULL){
printf("NULL");
return 1;
}
fgets(fileArray, size+1, fp);
After a few lines in between, which calculated the amount of columns I would have to allocate for the matrix, I tried using memcpy.
char maTrix[numOfCols][bytes];
if (bytes == 1){
for (i = 0; i<numOfCols; i++) {
memcpy(&maTrix[i], &fileArray[i], bytes);
}
}
else if (bytes == 2 || bytes == 4) {
for (i = 0; i < numOfCols; i++) {
int k = i * bytes;
int p = k + bytes;
while (k < p) {
memcpy(&maTrix[i], &fileArray[k], bytes);
k++;
}
}
}
If my original theory is right, how would I go about correcting this issue?
The input file I am using contains something like this:
The highest forms of understanding we can achieve are laughter and human compass
ion. Richard Feynman
To make clearer what I am trying to do: I basically want to split the total number of characters into columns of n bytes (read: chars) each. I have padded the number of characters so that the string length should always be divisible by n.
I'm looking for an output of something like this.
if bytes == 2:
maTrix[0][0] = 'T'
maTrix[0][1] = 'h'
maTrix[1][0] = 'e'
maTrix[1][1] = ' '
maTrix[2][0] = 'h'
And so on until the whole matrix is filled.
Instead I get:
maTrix[0][0] = 'h'
maTrix[0][1] = 'e'
maTrix[1][0] = ' '
maTrix[1][1] = 'h'
maTrix[2][0] = 'i'
The same is expected for an input of 4 bytes, just with fewer columns and 4 rows.
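A likely cause, judging only from the snippet shown: the inner while loop copies bytes characters into the same row once for every k from i * bytes to i * bytes + bytes - 1, so each row ends up holding the last copy, which starts bytes - 1 characters too late. For bytes == 2 that looks exactly like the first character being skipped. One copy per row is enough. The sketch below assumes a C99 compiler with variable-length array support; fill_matrix is a name introduced here for illustration.

```c
#include <string.h>

/* Copies cols * bytes characters from src into a cols-by-bytes matrix,
   one chunk of `bytes` characters per row. src must hold at least
   cols * bytes characters. (Hypothetical helper for illustration.) */
void fill_matrix(size_t cols, size_t bytes,
                 char matrix[cols][bytes], const char *src)
{
    for (size_t i = 0; i < cols; i++) {
        /* Row i starts at offset i * bytes in the source string */
        memcpy(matrix[i], &src[i * bytes], bytes);
    }
}
```

This also covers bytes == 1, so the separate branch for that case is no longer needed.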

How to set the values from the token character to this array called customerData[][] in C?

I just started learning C language and I need some help with a program. Here is the code.
Questions:
What is this? customerData[NUM_FIELDS][FIELD_LENGTH];
Is it a char 2D array?
How do you input data into the array? fgetC, putchar, getchar ?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define INPUT_LENGTH 128
#define FIELD_LENGTH 30
#define NUM_FIELDS 9
int main()
{
FILE *data=NULL;
char input[INPUT_LENGTH];
char customerData[NUM_FIELDS][FIELD_LENGTH];
int element=0;
char *next;
char ch;
data= fopen("data.txt","r");
if(data!=NULL)
{
//token=strtok(input,"|");
/*while loop will go through line by line and stored it in an input array*/
while(fgets(input,INPUT_LENGTH,data)!= NULL)
{
next=strtok(input,"|");
while(next!=NULL)
{
//ch=getchar()
//>probably a get char for ch
strcpy(next,customerData[element][strlen(next)]);
/*need to put the values into customer data one by one*/
printf("%s\n",next);
//element+=1;
next=strtok(NULL,"|");
}
//element=0;
}
printf("program is done\n");
}
fclose(data);
return 0;
}
In general, "help me with my code" questions are off-topic on Stack Overflow. In order to keep the question on-topic, I'm going to focus only on the question of how to access 2D char arrays.
Yes, this is a 2D char array. Or, put another way, it's an array with NUM_FIELDS elements, where each element of the array is a char array with FIELD_LENGTH elements.
There are loads of ways to insert data into a 2D char array, but there are probably two I've encountered most often. Which one you choose to use will depend on how you want to think of this array.
Option 1: A 2D array of single chars
The first way to think about this variable is simply as a 2D array of chars - a grid of elements that you can access. Here, you can simply input values using the normal assignment operator. You'll want to make sure that your indexes are in range, or you'll start accessing invalid memory.
//Set a known element that's definitely in range
customerData[1][2] = 'A';
//Loop through all the elements
for(int ii = 0; ii < NUM_FIELDS; ii++)
{
    for (int jj = 0; jj < FIELD_LENGTH; jj++)
    {
        customerData[ii][jj] = 'B';
    }
}
//Set an element from variables
char nextInput = getNextCharFromInput();
if(x < NUM_FIELDS && y < FIELD_LENGTH)
{
    customerData[x][y] = nextInput;
}
//Bad. This could corrupt memory
customerData[100][60] = 'X';
//Risky without check. How do you know x and y are in range?
customerData[x][y] = 'X';
You could certainly write your code by assigning these elements one character at a time. However, the broader context of your program heavily implies to me that the next option is better.
Option 2: A 1D array of fixed-length strings
In C, a "string" is simply an array of chars that ends with a terminating '\0'. So another way to look at this variable (and the one that makes the most sense for this program) is to treat it as a 1D array of length NUM_FIELDS, where each element is a string that can hold up to FIELD_LENGTH - 1 characters plus the terminator.
Looking at it this way, you can start using the C string functions to input data into the array, rather than needing to deal with it character by character. As before, you still need to be careful of lengths so that you don't go off the end of the strings.
Also be aware that arrays decay into pointers in most expressions, so a char* can also refer to a string (just one of unknown length).
//Set a specific field to a known string, which is short enough to fit
strcpy(customerData[2], "date");
//Loop through all fields and wipe their data
for(int ii = 0; ii < NUM_FIELDS; ii++)
{
    memset(customerData[ii], 0, FIELD_LENGTH);
}
//Set field based on variables
if(x < NUM_FIELDS)
{
    //Will truncate next if it is too long
    strncpy(customerData[x], next, FIELD_LENGTH - 1);
    //strncpy does not terminate the string when it truncates, so do it here
    customerData[x][FIELD_LENGTH - 1] = '\0';
    //Will not input anything if field is too long
    if(strlen(next) < FIELD_LENGTH)
    {
        strcpy(customerData[x], next);
    }
}
//Bad. Could corrupt memory
strcpy(customerData[100], "date");
strcpy(customerData[1], "this string is definitely much longer than FIELD_LENGTH");
//Risky. Without a check, how do you know either variable is in range?
strcpy(customerData[x], next);
getchar and fgetc both read a single character at a time, from stdin and from a file respectively, so they are not the right tool for copying a token that is already in memory. putchar does write a character, but only to stdout, so it can't be used here either.
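Applied to the loop in the question, Option 2 might look like the sketch below. Note that the question's strcpy(next, customerData[element][strlen(next)]) has the destination and source swapped and passes a single char where a string is expected. The function name store_fields is an assumption made for this example; the macros mirror those in the question.

```c
#include <string.h>

#define FIELD_LENGTH 30
#define NUM_FIELDS 9

/* Splits line on '|' and copies each token into fields[], truncating
   tokens that do not fit. Modifies line (strtok does). Returns the
   number of fields stored. (Hypothetical helper for illustration.) */
int store_fields(char *line, char fields[NUM_FIELDS][FIELD_LENGTH])
{
    int element = 0;

    /* Drop the trailing newline that fgets leaves in the buffer */
    line[strcspn(line, "\n")] = '\0';

    for (char *next = strtok(line, "|");
         next != NULL && element < NUM_FIELDS;
         next = strtok(NULL, "|")) {
        strncpy(fields[element], next, FIELD_LENGTH - 1);
        fields[element][FIELD_LENGTH - 1] = '\0'; /* always terminate */
        element++;
    }
    return element;
}
```

In the question's program this would be called once per line read by fgets, for example store_fields(input, customerData).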

Why is the concatenated string at the end totally different from what the code does?

The code below might seem big, but it's really simple. I wanted to make an exercise generator in C, which concatenates latex formatted strings, stored in the function local arrays.
But the output is totally not what I expected it to be. I might have done something wrong with the pointers, or some overflow somewhere.
Tried all the tips I could find on internet:
(1) Initialize a string with '\0' at position 0;
(2) Use a pointer to the string to pass to the function;
int create_exercise_string(char **s, int s_size) {
// The format of the exercise is x+n=m;
// ASCII 48-57 = '0' ... '9';
int exercises_left = 10;
char *exercise;
char operation_list[][2] = {"+\0", "-\0"};
char equal[] = "&=&"; // & is for centering on the item contained in latex;
char array_begin[] = "\\begin{eqnarray*}\n";
char array_end[] = "\\end{eqnarray*}";
char number[1];
strcat(*s, array_begin);
s_size -= strlen(array_begin);
//REMOVE
puts("before exercise generation");
while (exercises_left > 0 && s_size > 0) {
exercise = malloc(256);
if (!exercise) {
puts("allocating error, quitting...");
getchar();
exit(1);
}
exercise[0] = '\0';
// THE INTERESTED PART =================================================
if (exercises_left < 10)
strcat(exercise, "\\\\\n");
printf("exercise number %d\n", exercises_left);
strcpy(exercise, "x");
//add an operator
strcat(exercise, operation_list[rand() % 2]);
// add a number
number[0] = (rand() % 10) + 48;
strcat(exercise, number);
// add an equal
strcat(exercise, equal);
// add a number
number[0] = (rand() % 10) + 48;
strcat(exercise, number);
// END =================================================================
s_size -= strlen(exercise);
strcat(*s, exercise);
free(exercise);
exercises_left--;
}
//REMOVE
puts("exercise generation ended");
if (s_size < strlen(array_end)) {
puts("create_exercise_string: buffer overflow detected, quitting...");
getchar();
exit(1); // for now... will be substituted with proper code
}
else strcat(*s, array_end);
puts("allocation worked, returning in main");
return exercises_left; // 0 if succesfull;
}
I was expecting the output to be like this
\begin{eqnarray*}
x-9&=&3\\
x+3&=&12\\
x-2&=&3\\
... 7 other exercises
\end{eqnarray*}
But I actually get
\begin{eqnarray*}
x-5\end{eqnarray*}&=&9\end{eqnarray*}x-9\end{eqnarray*}&=&2\end{eqnarray*}x+3\end{eqnarray*}&=&1\end{eqnarray*}x-7\end{eqnarray*}&=&0\end{eqnarray*}x+6\end{eqnarray*}&=&1\end{eqnarray*}x-6\end{eqnarray*}&=&5\end{eqnarray*}x+6\end{eqnarray*}&=&8\end{eqnarray*}x-8\end{eqnarray*}&=&6\end{eqnarray*}x+4\end{eqnarray*}&=&5\end{eqnarray*}x+1\end{eqnarray*}&=&5\end{eqnarray*}\end{eqnarray*}
With no \n added overall, and some \end{eqnarray*} repeatedly added...
What is wrong?
I managed to fix it.
All I can say is: when working with strings, always keep in mind whether the terminating character has been added. In my case, I need to study string allocation in C more carefully.
The problem was the variable number[1].
number[0] = rand() % 10 + 48;
This code stores the char representation of a digit between 0 and 9 (ASCII codes 48 to 57).
But it doesn't add a terminating null character!
Every time I concatenated number to the exercise string, strcat() read past the end of number into another local variable until it found a terminating null character. That is undefined behaviour, and here it showed up as the contents of another local variable being appended.
Luckily for me, there happened to be a '\0' nearby in memory. Otherwise strcat() could have kept reading until it reached protected memory, and I would have been banging my head against the wall.
Thanks for the help!
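A minimal sketch of the fix described above: give number room for the terminator and set it explicitly. The helper name append_random_digit is introduced here for illustration and is not from the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Appends one random decimal digit to dest. dest must be a valid string
   with room for one more character. (Hypothetical helper.) */
void append_random_digit(char *dest)
{
    char number[2];                       /* one digit plus the terminator */

    number[0] = (char)('0' + rand() % 10);
    number[1] = '\0';                     /* strcat needs a terminated source */
    strcat(dest, number);
}
```

Separately, the missing line breaks in the output appear to have a different cause: in the question's loop, strcpy(exercise, "x") runs after strcat(exercise, "\\\\\n") and overwrites the separator that was just added.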

Algorithm for processing the string

I really don't know how to implement this function:
The function should take a pointer to an integer, a pointer to an array of strings, and a string for processing. The function should write to the array all variations obtained by replacing occurrences of the 'ch' combination with the '#' symbol, and set the integer to the size of this array. Here is an example of processing:
choker => {"choker","#oker"}
chocho => {"chocho","#ocho","cho#o","#o#o"}
chachacha => {"chachacha","#achacha","cha#acha","chacha#a","#a#acha","cha#a#a","#acha#a","#a#a#a"}
I am writing this in C99. So this is the sketch:
int n;
char **arr;
char *string = "chacha";
func(&n,&arr,string);
And function sketch:
int func(int *n,char ***arr, char *string) {
}
So I think I need to create another function, which counts the number of 'ch' combinations and allocates memory for this one. I'll be glad to hear any ideas about this algorithm.
You can count the number of combinations pretty easily:
char * tmp = string;
int i;
for(i = 0; *tmp != '\0'; i++){
if(!(tmp = strstr(tmp, "ch")))
break;
tmp += 2; // Skip past the 2 characters "ch"
}
// i contains the number of times ch appears in the string.
int num_combinations = 1 << i;
// num_combinations contains the number of combinations. Since this is 2 to the power of the number of occurrences of "ch"
First, I'd create a helper function, e.g. countChs that would just iterate over the string and return the number of 'ch'-s. That should be easy, as no string overlapping is involved.
When you have the number of occurrences, you need to allocate space for 2^count strings. Every string other than the original is shorter than the original, so strlen(original) + 1 bytes per string (including the terminating null) is always enough. You also set your n variable to that 2^count.
After you have your space allocated, just iterate over all indices in your new table and fill them with copies of the original string (strcpy() or strncpy() to copy), then replace 'ch' with '#' in them (there are loads of ready snippets online, just look for "C string replace").
Finally make your arr pointer point to the new table. Be careful though - if it pointed to some other data before, you should think about freeing it or you'll end up having memory leaks.
If you would like to have all variations of the replaced string, the array will have 2^n elements, where n is the number of "ch" substrings. So, calculating this will be:
int i = 0;
int n = 0;
while(string[i] != '\0')
{
if(string[i] == 'c' && string[i + 1] == 'h')
n++;
i++;
}
Then we can use the binary representation of a number. Note that when an integer i is incremented from 0 to 2^n - 1, the binary representation of i tells us which "ch" occurrences to change. So:
for(unsigned long long i = 0; i < (1ULL << n); i++)
{
    unsigned long long number = i;
    int k = 0;
    while(number > 0)
    {
        if(number % 2 == 1)
        {
            // Replace k-th occurrence of "ch"
        }
        number /= 2;
        k++;
    }
    // Add replaced string to array
}
This code checks every bit in the binary representation of the number and changes the k-th occurrence if the k-th bit is 1. Changing the k-th "ch" is pretty easy, and I leave it to you.
This code works only for fewer than 64 occurrences, because the counter is an unsigned long long, which is typically 64 bits wide and so cannot count up to 2^64.
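The bit-mask idea could be completed along the following lines. This is a sketch under assumptions, not the answerer's code: make_variant and all_variants are hypothetical names, the input is taken as const char *, and the number of occurrences is capped at 29 so that the table size fits in an int. The variants come out in mask order, which may differ from the order in the question's examples.

```c
#include <stdlib.h>
#include <string.h>

/* Writes into out the variant of src in which the k-th "ch" (counting
   from 0) is replaced by '#' whenever bit k of mask is set.
   out must have room for strlen(src) + 1 bytes. */
void make_variant(const char *src, unsigned long long mask, char *out)
{
    int k = 0;

    while (*src != '\0') {
        if (src[0] == 'c' && src[1] == 'h') {
            if ((mask >> k) & 1ULL) {
                *out++ = '#';
            } else {
                *out++ = 'c';
                *out++ = 'h';
            }
            src += 2;
            k++;
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
}

/* Allocates and fills the table of all variants. Returns 0 on success,
   -1 on failure. The caller frees every string and then the table. */
int all_variants(int *n, char ***arr, const char *string)
{
    int count = 0;

    for (const char *p = strstr(string, "ch"); p != NULL;
         p = strstr(p + 2, "ch")) {
        count++;
    }
    if (count >= 30) return -1;          /* keep 1 << count within int range */

    int total = 1 << count;
    size_t len = strlen(string) + 1;     /* no variant is longer than this */
    char **result = malloc((size_t)total * sizeof *result);
    if (result == NULL) return -1;

    for (int m = 0; m < total; m++) {
        result[m] = malloc(len);
        if (result[m] == NULL) {
            while (m > 0) free(result[--m]);
            free(result);
            return -1;
        }
        make_variant(string, (unsigned long long)m, result[m]);
    }
    *arr = result;
    *n = total;
    return 0;
}
```

For "chocho" this produces, in order: "chocho", "#ocho", "cho#o", "#o#o".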
There are two sub-problems that you need to solve for your original problem:
allocating space for the array of variations
calculating the variations
For the first problem, you need to find the mathematical function f that takes the number of "ch" occurrences in the input string and returns the number of total variations.
Based on your examples: f(1) = 2, f(2) = 4 and f(3) = 8. This should give you a good idea of where to start, but it is important to prove that your function is correct. Induction is a good way to make that proof.
Since your replace process ensures that each result has either the same or a lower length than the original, you can allocate space for each individual result equal to the length of the original (plus one for the terminating null).
As for the second problem, the simplest way is to use recursion, like in the example provided by nightlytrails.
You'll need another function which take the array you allocated for the results, a count of results, the current state of the string and an index in the current string.
When called, if there are no further occurrences of "ch" beyond the index then you save the result in the array at position count and increment count (so the next time you don't overwrite the previous result).
If there are any "ch" beyond index then call this function twice (the recurrence part). One of the calls uses a copy of the current string and only increments the index to just beyond the "ch". The other call uses a copy of the current string with the "ch" replaced by "#" and increments the index to beyond the "#".
Make sure there are no memory leaks. No malloc without a matching free.
After you make this solution work you might notice that it plays loose with memory. It is using more than it should. Improving the algorithm is an exercise for the reader.
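A recursive version might look like the sketch below. It departs from the description above in two labeled ways: it reuses one shared work buffer instead of copying the current string for each call, and it stores results in a fixed-size table rather than in allocated memory. The names expand and MAX_LEN are assumptions for this example, and the caller is assumed to guarantee that the input is shorter than MAX_LEN and that the table has room for every variant.

```c
#include <string.h>

#define MAX_LEN 64

/* rest:    the part of the input not processed yet
   buf:     work buffer holding the variant built so far
   len:     number of characters already placed in buf
   results: output table; count: number of entries filled so far */
void expand(const char *rest, char *buf, size_t len,
            char results[][MAX_LEN], int *count)
{
    const char *hit = strstr(rest, "ch");

    if (hit == NULL) {
        /* No further "ch": append the tail and store the finished variant */
        strcpy(buf + len, rest);
        strcpy(results[*count], buf);
        (*count)++;
        return;
    }

    /* Copy everything that precedes the match unchanged */
    size_t prefix = (size_t)(hit - rest);
    memcpy(buf + len, rest, prefix);
    len += prefix;

    /* Branch 1: keep "ch" and continue just beyond it */
    buf[len] = 'c';
    buf[len + 1] = 'h';
    expand(hit + 2, buf, len + 2, results, count);

    /* Branch 2: replace "ch" with '#' and continue just beyond it */
    buf[len] = '#';
    expand(hit + 2, buf, len + 1, results, count);
}
```

A call such as expand("chocho", buf, 0, results, &count), with count initialized to 0, fills four entries: "chocho", "cho#o", "#ocho", "#o#o".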

Resources