Divide a string into a certain number of substrings (C)

So, I have a certain file I read as input and load it into the memory, and obtain a cstring char* text.
I want to split that string into a certain number of strings. Say I have 4 threads, for example, and I want each thread to print a certain "piece" of the string (not necessarily in order).
int num_threads = 4;
char *filename = "file.txt";
int file_size = load_file_to_mem(filename, &text); // I made this and it works.
int text_size = strlen(text);
int substring_size; // that would be the size of each substring
char to_thread[num_threads][block_size];
How can I split the *text string into 4 same sized substrings, that is, how can I find the block size so I can make each thread receive a substring of that size?

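Unless I misunderstand the question, one way to do it is with the ceil() function: num_threads - 1 substrings would each hold ceil((double)text_size / num_threads) characters, plus one byte for the null terminator. The remaining single substring would hold text_size - (num_threads - 1) * ceil((double)text_size / num_threads) characters, again plus one byte. Note the cast: with two ints, text_size / num_threads is truncated before ceil() ever sees it. Also note that the last length can come out negative when text_size is small relative to num_threads (for example 5 characters and 4 threads), so check for that case.
You might do it this way because num_threads may not divide text_size evenly. The extra byte leaves room for a null terminator character in each substring.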

Dividing the string is relatively easy: you have the string size, which you obtained from strlen, and you can compute the substring size by dividing the size by 4. You could then use something like strncpy to copy just the amount of characters you need from the string you need to split into some temporary buffers, for example. Basically, for thread i you need to copy substring_size characters, starting from character substring_size * i (just use a pen and paper to see why this happens).
Now, you have 2 problems. One is that the size of the string may not be divisible by 4, which means that the strings printed from the threads will not be equal. The way you handle this is a matter of choice. One way to go would be:
base_substring_size = string_size / no_of_threads
remainder_substring_size = string_size % no_of_threads
for thread_id = 0 to no_of_threads - 1:
    this_thread_size = base_substring_size
    if remainder_substring_size > 0:
        this_thread_size += 1
        remainder_substring_size -= 1
This way, you have in the this_thread_size variable the exact size of the buffer for each thread. And do not forget about the NUL terminator.
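The size calculation above can be sketched in C as follows. This is only an illustration: the function names chunk_length and chunk_offset are made up for this example and do not come from the question. The first (total % parts) chunks get one extra character, so the chunk lengths differ by at most one.

```c
#include <assert.h>
#include <stddef.h>

/* Length of chunk `index` when `total` characters are split into `parts`
 * chunks. The first (total % parts) chunks get one extra character. */
size_t chunk_length(size_t total, size_t parts, size_t index)
{
    assert(parts > 0 && index < parts);
    size_t base = total / parts;
    size_t extra = total % parts;
    return base + (index < extra ? 1 : 0);
}

/* Offset of the first character of chunk `index` in the original string. */
size_t chunk_offset(size_t total, size_t parts, size_t index)
{
    assert(parts > 0 && index < parts);
    size_t base = total / parts;
    size_t extra = total % parts;
    /* Every earlier chunk contributes `base` characters, and the earlier
     * chunks among the first `extra` ones contribute one more each. */
    return index * base + (index < extra ? index : extra);
}
```

If a thread only needs to print its piece, it may not need a copy at all: something like printf("%.*s", (int)chunk_length(text_size, num_threads, i), text + chunk_offset(text_size, num_threads, i)) prints the piece straight from the original buffer, with no NUL terminator required.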

Related

How do I print a string between two pointers?

I'm building a rocket for Elon Musk and the memory usage is very important to me.
I have text and a pointer to it pText. It's chilling in the heap.
Sometimes I need to analyse the string, its words. I don't store substrings in heap, instead I store two pointers start/end for representing a substring of the text. But sometimes I need to print those substrings for debugging purposes. How do I do that?
I know that for a string to be printed I need two things
a pointer to the beginning
null terminator at the end
Any ideas?
// Text
char *pText = "We've sold the Earch!";
// Substring `sold`
char *pStart = pText + 6; // s
char *pEnd = pStart + 3; // d
// Print that substring
printf("sold: %s", ???);
If you only want to print the sub-string, then use a precision argument for printf:
printf("sold: %.*s", (int) (pEnd - pStart) + 1, pStart);
If you need to use the sub-string in other ways then the simplest is probably to create a temporary string, copy into it, and then print that instead.
Perhaps something like this:
// Get the length of the sub-string
size_t length = pEnd - pStart + 1;
// Create an array for the sub-string, +1 for the null-terminator
char temp[length + 1];
// Copy the sub-string
memcpy(temp, pStart, length);
// Terminate it
temp[length] = '\0';
If you need to do this many times I recommend you create a generic function for this.
You might also need to dynamically allocate the string using malloc depending on use-case.
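A generic helper along those lines might look like the sketch below. The name copy_range is hypothetical, and the sketch assumes that both pointers refer to the same string and that end is inclusive, as in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, NUL-terminated copy of the characters from
 * start to end (both inclusive), or NULL if the allocation fails.
 * The caller is responsible for calling free() on the result. */
char *copy_range(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && start <= end);
    size_t length = (size_t)(end - start) + 1;
    char *copy = malloc(length + 1); /* +1 for the null terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, start, length);
    copy[length] = '\0';
    return copy;
}
```

With the pointers from the question, copy_range(pStart, pEnd) would return a heap copy of the substring that can be passed to any function expecting an ordinary C string, and that must be freed afterwards.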

Size of formatted string

I am struggling to understand what happens during snprintf.
Let's say I have two numbers:
int i =11; int k = 3;
I want to format them like this "[%02d] %03d\t" and use snprintf.
Afterwards I use the resulting string with write().
snprintf needs the length/bytes n.
I do not understand what is the length I need to provide...
I have 2 theories:
a) It is
sizeof(int)*2
b) I check how many chars the formatted string will contain by counting the digits of the two integers and adding the other chars that the output will have:
2*sizeof(char) + 1*sizeof(char) + 2*sizeof(char) + 3*sizeof(char)+ 1*sizeof(char)
-> digits of i + digits of k + zeros added to first int + zeros added to second int + tab
I am struggling to understand what is the "n" I have to give to snprintf
It is the buffer size.
According to the documentation:
Maximum number of bytes to be used in the buffer. The generated string
has a length of at most n-1, leaving space for the additional
terminating null character. size_t is an unsigned integral type.
Suppose you write to an array such as this:
char buf[32];
The buffer can hold 32 chars (including the null terminator). Therefore we call the function like this:
snprintf (buf, 32, "[%02d] %03d\t", i, k);
You can also check the return value to see how many chars have been written (or would have been written, not counting the null terminator). In this case, if it's 32 or more, that means some characters had to be discarded because they didn't fit.
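The return-value check can be wrapped up as in the following sketch. The function name format_pair is made up for this example; the format string is the one from the question.

```c
#include <assert.h>
#include <stdio.h>

/* Formats i and k into buf, which has room for `size` bytes.
 * Returns 1 if the whole string fit, 0 if it was truncated or if
 * snprintf reported an error. */
int format_pair(char *buf, size_t size, int i, int k)
{
    assert(buf != NULL && size > 0);
    int needed = snprintf(buf, size, "[%02d] %03d\t", i, k);
    /* snprintf returns the length the full string would have had,
     * excluding the terminator, so it only fits if needed < size. */
    return needed >= 0 && (size_t)needed < size;
}
```

Even when the function returns 0 because of truncation, buf still holds a valid null-terminated (but shortened) string.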
Pass 0 and NULL first to obtain an exact amount
int n = snprintf(NULL, 0, "[%02d] %03d\t", i, k);
Then you know you need n + 1
char *buf = malloc(n + 1);
snprintf(buf, n + 1, "[%02d] %03d\t", i, k);
free(buf);
See it on ideone: https://ideone.com/pt0cOQ
n is the size of the buffer you're passing into snprintf, so it knows when to stop writing. This is to prevent a category of errors known as buffer overflows. snprintf will write at most n - 1 characters into the passed-in buffer and then terminate it with the null character.

Concatenate char array and char

I am new to the C language. I need to concatenate a char array and a char. In Java we can use the '+' operator, but in C that is not allowed. strcat and strcpy are also not working for me. How can I achieve this? My code is as follows:
void myFunc(char prefix[], struct Tree *root) {
char tempPrefix[30];
strcpy(tempPrefix, prefix);
char label = root->label;
//I want to concat tempPrefix and label
My problem differs from "concatenate char array in C" because that question concatenates a char array with another char array, whereas mine concatenates a char array with a single char.
Rather simple really. The main concern is that tempPrefix should have enough space for the prefix + original character. Since C strings must be null terminated, your function shouldn't copy more than 28 characters of the prefix. It's 30(the size of the buffer) - 1 (the root label character) -1 (the terminating null character). Fortunately the standard library has the strncpy:
size_t const buffer_size = sizeof tempPrefix; // Only because tempPrefix is declared an array of characters in scope.
strncpy(tempPrefix, prefix, buffer_size - 2); // copies at most 28 characters (indices 0..27)
tempPrefix[buffer_size - 2] = root->label;
tempPrefix[buffer_size - 1] = '\0';
It's also worthwhile not to hard code the buffer size in the function calls, thus allowing you to increase its size with minimum changes.
If your buffer isn't an exact fit, some more legwork is needed. The approach is pretty much the same as before, but a call to strchr is required to complete the picture.
size_t const buffer_size = sizeof tempPrefix; // Only because tempPrefix is declared an array of characters in scope.
strncpy(tempPrefix, prefix, buffer_size - 2); // copies at most 28 characters (indices 0..27)
tempPrefix[buffer_size - 2] = tempPrefix[buffer_size - 1] = '\0';
*strchr(tempPrefix, '\0') = root->label;
We again copy no more than 28 characters, but explicitly pad the end with NUL bytes. Now, since strncpy fills the buffer with NUL bytes up to count in case the string being copied is shorter, in effect everything after the copied prefix is now \0. This is why I dereference the result of strchr right away: it is guaranteed to point at a valid character, the first free space to be exact.
strXXX() family of functions mostly operate on strings (except the searching related ones), so you will not be able to use the library functions directly.
You can find out the position of the existing null-terminator, replace that with the char value you want to concatenate and add a null-terminator after that. However, you need to make sure you have got enough room left in the destination buffer to hold the concatenated string.
Something like this (not tested)
#define SIZ 30
//function
char tempPrefix[SIZ] = {0}; //initialize
strcpy(tempPrefix, prefix); //copy the string
char label = root->label; //take the char value
if (strlen(tempPrefix) < (SIZ - 1)) // Check: do we have room left?
{
    size_t res = (size_t)(strchr(tempPrefix, '\0') - tempPrefix); // index of the current null
    tempPrefix[res] = label; // replace it with the value
    tempPrefix[res + 1] = '\0'; // add a null at the next index
}
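The same idea can be packaged as a small reusable function. This is a sketch rather than a tested library routine: the name append_char is hypothetical, and it assumes buf already holds a null-terminated string that fits within size bytes.

```c
#include <assert.h>
#include <string.h>

/* Appends c to the string stored in buf, whose total capacity is `size`
 * bytes (terminator included). Returns 1 on success, or 0 if there is
 * no room for one more character plus the terminator. */
int append_char(char *buf, size_t size, char c)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    if (len + 1 >= size) /* need room for c and the '\0' after it */
        return 0;
    buf[len] = c;        /* overwrite the old terminator */
    buf[len + 1] = '\0'; /* and write a new one right after */
    return 1;
}
```

In the question's function this would be called as append_char(tempPrefix, sizeof tempPrefix, root->label) after the prefix has been copied in.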

Simple Concatenation in C

I'm not good at C, and on top of that I'm coming back to it after a very long time. I need to do a very simple thing:
char code[]="aasd";
char *rmessage="";
strcat(rmessage,code[0]);
I simply want to concatenate the content of index 0 of array code to rmessage.
You need to ensure there is enough space in rmessage to store the result of the concatenation. You can use strncat to specify the number of characters to copy from a string:
char code[] = "aasd";
char rmessage[1024] = "";
strncat(rmessage, code, 1);
or, in this case, just assign the first character of rmessage:
rmessage[0] = code[0];
I haven't coded in C for a long time, but I think the syntax is correct.
int sz = 10; // number of chars you want to store + 1; I assumed at most 9 characters will be stored
char code[] = "aasd";
char *rmessage = malloc(sz * sizeof(char));
rmessage[0] = code[0];
rmessage[1] = '\0'; // terminate with the null character, not the NULL pointer
Remember to deallocate the memory allocated to rmessage after your job is done.
free(rmessage);

Determining the length of an array for memory efficiency

Write a function in C language that:
Takes as its only parameter a sentence stored in a string (e.g., "This is a short sentence.").
Returns a string consisting of the number of characters in each word (including punctuation), with spaces separating the numbers. (e.g., "4 2 1 5 9").
I wrote the following program:
int main()
{
    char* output;
    char *input = "My name is Pranay Godha";
    output = numChar(input);
    printf("output : %s",output);
    getch();
    return 0;
}
char* numChar(char* str)
{
    int len = strlen(str);
    char* output = (char*)malloc(sizeof(char)*len);
    char* out = output;
    int count = 0;
    while(*str != '\0')
    {
        if(*str != ' ' )
        {
            count++;
        }
        else
        {
            *output = count+'0';
            output++;
            *output = ' ';
            output++;
            count = 0;
        }
        str++;
    }
    *output = count+'0';
    output++;
    *output = '\0';
    return out;
}
I was just wondering that I am allocating len amount of memory for output string which I feel is more than I should have allocated hence there is some wasting of memory. Can you please tell me what can I do to make it more memory efficient?
I see lots of little bugs. If I were your instructor, I'd grade your solution at "C-". Here are some hints on how to turn it into "A+".
char* output = (char*)malloc(sizeof(char)*len);
There are two main issues with the above line. For starters, you are forgetting to free the memory you allocate. But that's easily forgiven.
The second is an actual bug. If your string was only 1 character long (e.g. "x"), you would allocate only one byte. But you would need to write two bytes into the buffer: a '1' followed by the terminating '\0'. The last byte gets written to invalid memory. :(
Another bug:
*output = count+'0';
What happens when "count" is larger than 9? If "count" was 10, then *output gets assigned a colon, not "10".
Start by writing a function that just counts the number of words in a string. Assign the result of this function to a variable called num_of_words.
You could very well have words longer than 9 characters, so some words will need two or more digits in the output. You also need to account for the space between each number. And don't forget the trailing null byte.
If you think about the case in which a 1-byte unsigned integer can have at most 3 chars in a string representation ('0'..'255') not including the null char or negative numbers, then sizeof(int)*3 is a reasonable estimate of the maximum string length for an integer representation (not including a null char). As such, the amount of memory you need to alloc is:
num_of_words = countWords(str);
num_of_spaces = (num_of_words > 0) ? (num_of_words - 1) : 0;
output = malloc(num_of_spaces + sizeof(int)*3*num_of_words + 1); // +1 for null char
So that's a fairly generous memory allocation estimate, but it will definitely allocate enough memory in all scenarios.
I think you have a few other bugs in your program. For starters, if there are multiple spaces between each word e.g.
"my    gosh"
I would expect your program to print "2 4". But your code prints something else. Likely other bugs exist if there are leading or trailing spaces in your string. And the memory allocation estimate doesn't account for the extra garbage chars you are inserting in those cases.
Update:
Given that you have persevered and attempted to make a better solution in your answer below, I'm going to give you a hint. I have written a function that PRINTs the length of all words in a string. It doesn't actually allocate a string. It just prints it - as if someone had called "printf" on the string that your function is to return. Your job is to extrapolate how this function works - and then modify it to return a new string (that contains the integer lengths of all the words) instead of just having it print. I would suggest you modify the main loop in this function to keep a running total of the word count. Then allocate a buffer of size = (word_count * 4 *sizeof(int) + 1). Then loop through the input string again to append the length of each word into the buffer you allocated. Good luck.
void PrintLengthOfWordsInString(const char* str)
{
    if ((str == NULL) || (*str == '\0'))
    {
        return;
    }
    while (*str)
    {
        int count = 0;
        // consume leading white space
        while ((*str) && (*str == ' '))
        {
            str++;
        }
        // count the number of consecutive non-space chars
        while ((*str) && (*str != ' '))
        {
            count++;
            str++;
        }
        if (count > 0)
        {
            printf("%d ", count);
        }
    }
    printf("\n");
}
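One possible way to carry out the suggested modification is sketched below. It is not the answerer's own solution: the name word_lengths is hypothetical, the buffer size uses the (word_count * 4 * sizeof(int) + 1) estimate from the hint, and snprintf is used for the appending step so that every write is bounds-checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a newly allocated string holding the length of each word in str,
 * separated by single spaces, or NULL on failure. Caller must free() it. */
char *word_lengths(const char *str)
{
    assert(str != NULL);

    /* First pass: count the words (runs of non-space characters). */
    size_t words = 0;
    for (const char *p = str; *p != '\0'; ) {
        while (*p == ' ')
            p++;
        if (*p != '\0')
            words++;
        while (*p != '\0' && *p != ' ')
            p++;
    }

    /* Generous estimate from the hint above, +1 for the terminator. */
    size_t size = words * 4 * sizeof(int) + 1;
    char *out = malloc(size);
    if (out == NULL)
        return NULL;
    out[0] = '\0';

    /* Second pass: append each word length to the buffer. */
    size_t used = 0;
    for (const char *p = str; *p != '\0'; ) {
        int count = 0;
        while (*p == ' ')
            p++;
        while (*p != '\0' && *p != ' ') {
            count++;
            p++;
        }
        if (count > 0) {
            const char *sep = (used > 0) ? " " : "";
            int n = snprintf(out + used, size - used, "%s%d", sep, count);
            if (n < 0 || (size_t)n >= size - used) { /* should not happen */
                free(out);
                return NULL;
            }
            used += (size_t)n;
        }
    }
    return out;
}
```

Unlike the printing version, this sketch puts the separator before every number except the first, so the result has no trailing space.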
The answer is: it depends. There are trade-offs.
Yes, it's possible to write some extra code that, before performing this action, counts the number of words in the original string and then allocates the new string based on the number of words rather than the number of characters.
But is it worth it? The extra code would make your program longer. That is, you would have more binary code, taking up more memory, which may be more than you gain. In addition, it will take more time to run.
By the way, you have a memory leak in your program, which is more of a problem.
As long as none of the words in the sentence is longer than 9 characters, the length of your output array needs only to be the number of words in the sentence multiplied by 2: one digit per word, one space after every word except the last, and the null terminator in the slot that the last space would have taken.
So for the string
My name is Pranay Godha
...you need only an array of length 10 (5 digits, 4 spaces and the terminator).
If any of the words are ten characters or more, you'll need to calculate how many extra char your array will need by determining how many digits each number requires. (e.g. a word of length 10 characters clearly requires two char to store the number 10.)
The real question is, is all of this worth it? Unless you're specifically required (homework?) to use the minimal space required in your output array, I'd be minded to allocate a suitably large array and perform some bounds checking when writing to it.
