malloc() into a recursive function that modifies a string - c

THE CODE:
The aim of this code is to take a string made of 0, 1 and * and print every string obtained by replacing each * with 0 or 1.
Ex.
input : 0*1 => output: 001 011
the idea is to build a recursive function (for practice purposes):
void rec_print (char *mod_str)
which counts the occurrences of * and records the offset of the first * found while looping over the string (I used ternaries just to practice them)
for (int i = 0; mod_str[i]; i++) {
    n_star = (mod_str[i] == '*') ? n_star + 1 : n_star;
    if (offset == -1) {
        offset = (mod_str[i] == '*') ? i : -1;
    }
}
The base case occurs when there is only one *: the function replaces the * with 0, prints the string, replaces the 0 with 1, and prints the string again:
if (n_star==1) {
    mod_str[offset] = '0'; // a single char, so single quotes
    printf("\n%s", mod_str);
    mod_str[offset] = '1';
    printf("\n%s", mod_str);
}
otherwise it changes the first * of the string to 0 and calls itself, then changes it to 1 when that call returns and calls itself again:
else {
    // replace the first encountered * with 0 and recurse
    mod_str[offset] = '0';
    rec_print(mod_str);
    // replace the previous 0 with 1 and recurse
    mod_str[offset] = '1';
    rec_print(mod_str);
}
THE ISSUE:
as you will notice, the problem is that I am trying to modify a string whose memory is read-only (yes, I have a huge "Python bias" here). Normally I would use malloc() to resolve this, but I can't figure out how to use it inside a function to modify a string.
I am well aware recursion is not the best solution to this exercise but I need to satisfy my curiosity about this.
I thank everyone for the time spent here and apologize in advance for any English mistakes.

The base case occurs when there is only one *: the function replaces the * with 0, prints the string, replaces the 0 with 1, and prints the string again:
This base case isn't really "base." For example, you could receive the string "001" as input, which contains zero stars. You should modify your base case to handle only zero stars, and rely on your recursive case to simplify the case with one star to zero stars.
as you will notice, the problem is that I am trying to modify a string whose memory is read-only
If the memory is read-only, you should clarify that fact to the compiler! Then the compiler can check your code to make sure it never writes to a read-only variable.
Example:
void rec_print (const char *mod_str)
as you will notice, the problem is that I am trying to modify a string whose memory is read-only (yes, I have a huge "Python bias" here). Normally I would use malloc() to resolve this, but I can't figure out how to use it inside a function to modify a string.
You need to copy the string to a place where you can modify it. Here's an example:
void rec_print (const char *src_str) { // note name change
    // Allocate memory (+1 for the terminating '\0')
    char *mod_str = malloc((strlen(src_str) + 1) * sizeof(char));
    if (mod_str == NULL) return; // allocation failed
    // This function copies src_str to mod_str
    strcpy(mod_str, src_str);
    // Rest of function
    ...
    // Clean up memory used
    free(mod_str);
}
Here we allocate enough memory to hold the input string. Then, we copy the input string to the space we allocated. We can do all modifications on this copy. At the end we free the allocation.
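Putting the two suggestions together (a base case of zero stars, and a private copy made with malloc), a minimal sketch could look like the following. The name rec_print_to, the FILE * parameter and the returned count are assumptions made for illustration and are not part of the original question; the function prints one string per line rather than using the leading "\n" of the original.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes every expansion of src (each '*' replaced by '0' or '1') to out,
 * one per line, and returns how many strings were written, or -1 if an
 * allocation fails. src itself is never modified. */
int rec_print_to(const char *src, FILE *out)
{
    const char *star = strchr(src, '*');

    /* Base case: no '*' left, so the string is complete. */
    if (star == NULL) {
        fprintf(out, "%s\n", src);
        return 1;
    }

    /* Recursive case: work on a private, writable copy. */
    size_t offset = (size_t)(star - src);
    char *copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, src);

    int total = 0;
    for (const char *digit = "01"; *digit != '\0'; digit++) {
        copy[offset] = *digit;          /* try '0', then '1' */
        int n = rec_print_to(copy, out);
        if (n < 0) {
            free(copy);
            return -1;
        }
        total += n;
    }

    free(copy);
    return total;
}
```

Calling rec_print_to("0*1", stdout) prints 001 and then 011, matching the example in the question. Because each call copies its argument, the function can be handed a string literal safely; the cost is one allocation per star at each level of recursion.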

Related

memset() not setting memory in c

I apologize if my formatting is incorrect, as this is my first post; I couldn't find a post on the site that dealt with the same issue I am running into. I'm using plain C on Ubuntu 12.04 server. I'm trying to concatenate several strings together into a single string, separated by Ns. The string sizes and the space between strings may vary, however. A struct was made to store the positional data as several integers that can be passed to multiple functions:
typedef struct pseuInts {
    int pseuStartPos;
    int pseuPos;
    int posDiff;
    int scafStartPos;
} pseuInts;
As well as a string struct:
typedef struct string {
    char *str;
    int len;
} myString;
Since there are break conditions for the concatenated string, multiple nodes of a dynamically linked list were assembled, each containing an identifier and the concatenated string:
typedef struct entry {
    myString title;
    myString seq;
    struct entry *next;
} entry;
The memset call is as follows:
} else if ((*pseuInts)->pseuPos != (*pseuInts)->scafStartPos) {
(*pseuEntry)->seq.str = realloc ((*pseuEntry)->seq.str, (((*pseuEntry)->seq.len) + (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)))); //realloc the string being extended to account for the Ns
memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len)), 'N', (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos))); //insert the correct number of Ns
(*pseuEntry)->seq.len += (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)); //Update the length of the now extended string
(*pseuInts)->pseuPos += (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)); //update the position values
}
These are all being dereferenced as this else if decision is in a function being called by a function called from main, but the changes to the pseuEntry struct need to be updated in main so as to be passed to another function for further processing.
I've double checked the numbers being used in pseuInts by inserting some printf commands and they are correct in the positioning of how many Ns need to be added, even as they change between different short strings. However, when the program is run the memset only inserts Ns the first time it's called. IE:
GATTGT and TAATTTGACT are separated by 4 spaces and they become:
GATTGTNNNNTAATTTGACT
The second time it is called on the same concatenated string it doesn't work though. IE:
TAATTTGACT and TCTCC are separated by 6 spaces so the long string should become:
GATTGTNNNNTAATTTGACTNNNNNNTCTCC
but it only shows:
GATTGTNNNNTAATTTGACTTCTCC
I've added printfs to display the concatenated string immediately before and after the memset, and they are identical in output.
Sometimes the insertion is adding extra character spaces, but not initializing them so they print nonsense (as would be expected). IE:
GAATAAANNNNNNNNNNNNNNNNN¬GCTAATG
should be
GAATAAANNNNNNNNNNNNNNNNNGCTAATG
I've switched the memset with a for or a while loop and I get the same result. I used an intermediate char * to realloc and still get the same result. I'm looking for suggestions as to where I should look to try and detect the error.
If you are okay with considering a completely different approach, I would like to offer this:
I understand your intent to be: Replace existing spaces between two strings with an equal number of "N"s. memset() (and associated memory allocations) is the primary method to perform the concatenations.
The problems you have described with your current concatenation attempts are :
1) garbage embedded in resulting string.
2) writing "N" in some unintended memory locations.
3) "N" not being written in other intended memory locations.
Different approach:
First: verify that the memory allocated to the string being modified is sufficient to contain the result.
Second: verify that all strings to be concatenated are \0 terminated before attempting concatenation.
Third: use strcat() and a for(;;) loop to append all the "N"s and, eventually, the subsequent strings.
e.g.
for(i=0;i<numNs;i++)//compute numNs with your existing variables
{
    strcat(firstStr, "N");//Note: "N" is already NUL terminated, and strcat() also terminates the result.
}
strcat(firstStr, lastStr); //a null terminated concatenation
I know this approach is vastly different from what you were doing, but it does address at least the issues identified from your problem statement. If this makes no sense, please let me know and I will address questions as I am able to. (currently have other projects going on)
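If you would rather keep the length-tracked myString from the question than call strcat() repeatedly, the same three checks can be folded into one helper. This is only a sketch: append_with_gap is a hypothetical name, it assumes gap is never negative, and nothing in the thread confirms that a missing terminator is the actual cause of the reported output. It reserves one extra byte, writes the Ns and the next fragment, and re-terminates the buffer so that printing with %s stays well defined.

```c
#include <stdlib.h>
#include <string.h>

typedef struct string {
    char *str;
    int len;   /* number of characters, not counting the terminating '\0' */
} myString;

/* Appends `gap` copies of 'N' followed by `next` to s.
 * Returns 0 on success, -1 if realloc fails (s is then left unchanged).
 * Assumes gap >= 0; that check is left to the caller. */
int append_with_gap(myString *s, int gap, const char *next)
{
    size_t next_len = strlen(next);
    size_t new_len = (size_t)s->len + (size_t)gap + next_len;

    /* +1 leaves room for the terminator. */
    char *grown = realloc(s->str, new_len + 1);
    if (grown == NULL)
        return -1;

    memset(grown + s->len, 'N', (size_t)gap);        /* the filler */
    memcpy(grown + s->len + gap, next, next_len);    /* the next fragment */
    grown[new_len] = '\0';

    s->str = grown;
    s->len = (int)new_len;
    return 0;
}
```

Starting from an empty myString (str set to NULL, len set to 0), appending "GATTGT" with a gap of 0, then "TAATTTGACT" with a gap of 4, then "TCTCC" with a gap of 6 builds the string the question expects.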
Looking at your memset:
memset((*pseuEntry)->seq.str + (*pseuEntry)->seq.len, ...
That's the destination. Shouldn't it be:
memset((*pseuEntry)->seq.str + (*pseuEntry)->seq.len + (*pseuInts)->pseuStartPos, ...
Otherwise I'm missing the meaning of pseuInts.

How to write into a char array in C at specific location using sprintf?

I am trying to port some code written in MATLAB to C, so that I can compile the function and execute it faster (the code is executed very often and it would bring a significant speed increase).
So basically, what my MATLAB code does is take a matrix and convert it to a string, adding brackets and commas, so I can write it to a text file. Here's an idea of how this would work for a vector MyVec:
MyVec = rand(1,5);
NbVal = length(MyVec)
VarValueAsText = blanks(2 + NbVal*30 + (NbVal-1));
VarValueAsText([1 end]) = '[]';
VarValueAsText(1 + 31*(1:NbVal-1)) = ',';
for i = 1:NbVal
VarValueAsText(1+(i-1)*31+(1:30)) = sprintf('%30.15f', MyVec(i));
end
Now, how can I achieve a similar result in C? It doesn't seem too difficult, since I can calculate in advance the size of my string (char array) and I know the position of each element that I need to write to my memory area. Also the sprintf function exists in C. However, I have trouble understanding how to set this up, also because I don't have an environment where I can learn easily by trial and error (for each attempt I have to recompile, which often leads to a segmentation fault and MATLAB crashing...).
I hope someone can help even though the problem will probably seem trivial, but I have very little experience with C and I haven't been able to find an appropriate example to start from...
Given an offset (in bytes) into a string, retrieving a pointer to this offset is done simply with:
char *ptr = &string[offset];
If you are iterating through the lines of your matrix to print them, your loop might look as follows:
char *ptr = output_buffer;
for (i = 0; i < n_lines; i++) {
    sprintf (ptr, "...", ...);
    ptr = &ptr[line_length];
}
Be sure that you have allocated enough memory for your output buffer though.
Remember that sprintf puts a string terminator at the end of whatever it prints, so if you print into the middle of a longer string, the terminator cuts that string short at that point.
So if you just want to overwrite part of the string, you should probably use sprintf to a temporary buffer, and then use memcpy to copy that buffer into the actual string. Something like this:
char temp[32];
sprintf(temp, "...", ...);
memcpy(&destination[position], temp, strlen(temp));
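Applied to the layout of the MATLAB snippet (an opening bracket, 30-character fields separated by commas, a closing bracket), the temporary-buffer approach might be sketched as follows. format_vector and FIELD_WIDTH are illustrative names, not anything from the original code, and the sketch assumes every value fits in 30 characters with 15 decimals; it returns NULL when one does not.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_WIDTH 30   /* matches the '%30.15f' field of the MATLAB code */

/* Returns a malloc'd string "[v0,v1,...]" with each value right-aligned in
 * a FIELD_WIDTH-character column, or NULL on failure. The caller frees it. */
char *format_vector(const double *values, size_t count)
{
    if (count == 0)
        return NULL;

    /* '[' + count fields + (count - 1) commas + ']' */
    size_t length = 2 + count * FIELD_WIDTH + (count - 1);
    char *text = malloc(length + 1);
    if (text == NULL)
        return NULL;

    memset(text, ' ', length);
    text[0] = '[';
    text[length - 1] = ']';
    text[length] = '\0';

    for (size_t i = 0; i < count; i++) {
        char field[FIELD_WIDTH + 1];
        size_t start = 1 + i * (FIELD_WIDTH + 1);

        /* Format into the temporary buffer, terminator included. */
        int written = snprintf(field, sizeof field, "%*.15f",
                               FIELD_WIDTH, values[i]);
        if (written != FIELD_WIDTH) {   /* value too wide for the column */
            free(text);
            return NULL;
        }
        /* Copy only the characters, not the terminator. */
        memcpy(text + start, field, FIELD_WIDTH);
        if (i + 1 < count)
            text[start + FIELD_WIDTH] = ',';
    }
    return text;
}
```

Using snprintf rather than sprintf for the temporary buffer means an unexpectedly large value is reported through the return value instead of overflowing the buffer.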

Coredump when parsing chapters

I am doing a homework assignment that reads in a book. First, a line is read in and a pointer is made to point at that line. Then a paragraph function reads in lines and stores their addresses into an array of pointers. Now I am on reading a chapter (a paragraph recognized by the next line being broke). It should call get_paragraph() and store the addresses of paragraphs until it comes to a new chapter.
A new chapter is the only time in the book where the first character in the line is not a space. I think this is where I am having problems in my code. All functions up to this point work. I hope I have provided enough information. The code compiles but core dumps when started.
I am a student and learning so please be kind. Thanks.
char*** get_chapter(FILE * infile){
    int i=0;
    char **chapter[10000];//an array of pointers
    // Populate the array
    while(chapter[i]=get_paragraph(infile)) { //get address store into array
        if(!isspace(**chapter[0])){ //check to see if it is a new chapter<---problem line?
            // save paragraph not used in chapter using static to put into next chapter
            break;
        }
        i++;//increment array
    }
    //add the null
    chapter[++i]='\0';//put a null at the end to signify end of array
    //Malloc the pointer
    char**(*chap) = malloc(i * sizeof(*chap));//malloc space
    //Copy the array to the pointer
    i=0;//reset address
    while(chapter[i]){//while there are addresses in chapter
        chap[i] = chapter[i++];//change addresses into chap
    }
    chap[i]='\0';//null to signify end of chapter
    //Return the pointer
    return(chap);//return pointer to array
}
For those who would rather see without comments:
char*** get_chapter(FILE * infile){
    int i=0;
    char **chapter[10000];
    while(chapter[i]=get_paragraph(infile)) {
        if(!isspace(**chapter[0])){
            break;
        }
        i++;
    }
    chapter[++i]='\0';
    char**(*chap) = malloc(i * sizeof(*chap));
    i=0;
    while(chapter[i]){
        chap[i] = chapter[i++];
    }
    chap[i]='\0';
    return(chap);
}
Comments inline.
char*** get_chapter(FILE * infile) {
    int i=0;
    // This is a zero length array!
    // (The comma operator returns its right-hand value.)
    // Trying to modify any element here can cause havoc.
    char **chapter[10,000];
    while(chapter[i]=get_paragraph(infile)) {
        // Do I read this right? I translate it as "if the first character of
        // the first line of the first paragraph is not whitespace, we're done."
        // Not the paragraph just read in -- the first paragraph. So this will exit
        // immediately or else loop forever and walk off the end of the array
        // of paragraphs. I think you mean **chapter[i] here.
        if(!isspace(**chapter[0])){
            break;
        }
        i++;
    }
    // Using pre-increment here means you leave one item in the array uninitialized
    // which can also cause a fault later on. Use post-increment instead.
    // Also '\0' here is the wrong sort of zero; I think you need NULL instead.
    chapter[++i]='\0';
    char**(*chap) = malloc(i * sizeof(*chap));
    i=0;
    while(chapter[i]) {
        // This statement looks ambiguous to me. Referencing a variable twice and
        // incrementing it in the same statement? You may end up with an off-by-one error.
        chap[i] = chapter[i++];
    }
    // Wrong flavor of zero again.
    chap[i]='\0';
    return(chap);
}
Can I suggest that you use for loops instead of whiles? You need to stop if you run out of space, so you might as well use the appropriate construct.
I suspect you have a bug in this code:
while(chapter[i]=get_paragraph(infile)) {
if(!isspace(**chapter[0])){
break;
}
i++;
}
chapter[++i]='\0';
Firstly, shouldn't it be chapter[i] instead of chapter[0]? You want to know if the pointer at chapter[i] points to a space, not the first pointer in chapter. So this will probably loop indefinitely - hence the need for a for loop, so you don't just loop forever accidentally.
Secondly, you increment i at the end of the while block, and then again in the chapter[++i] assignment. i has already been incremented by the final loop execution before the while condition breaks, so it is already the correct position to use. ++i increments before yielding the value, so presumably you meant to have i++ here, so that it would increment after yielding the current value of i. Either way, it's confusing one of us as to what you mean, so maybe just put the increment on a separate line for clarity. The compiler will sort out any available optimisation.
Finally, why are you setting the value to '\0'? That is the null character, but your array holds pointers. In C, '\0' is an integer constant with the value 0, so assigning it to a pointer does yield a null pointer and the code gets away with it, but writing NULL states the intent far more clearly.
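As a sketch of the termination step discussed above: the copy loop in the question writes one pointer per paragraph plus a terminator, so the heap array needs one more slot than the number of paragraphs. copy_paragraphs is a hypothetical helper name, not part of the original assignment, and it assumes the caller already knows how many paragraphs were read.

```c
#include <stdlib.h>

/* Copies the first `count` paragraph pointers into a heap array that is
 * terminated by a NULL pointer. Returns NULL if the allocation fails.
 * Only the pointers are copied; the paragraphs themselves are shared. */
char ***copy_paragraphs(char ***paragraphs, size_t count)
{
    /* count entries plus one slot for the NULL terminator */
    char ***chap = malloc((count + 1) * sizeof *chap);
    if (chap == NULL)
        return NULL;

    /* Index and increment are kept in separate expressions. */
    for (size_t i = 0; i < count; i++)
        chap[i] = paragraphs[i];

    chap[count] = NULL;
    return chap;
}
```

Passing the count explicitly avoids relying on a terminator inside the temporary array, and the for loop makes the upper bound visible.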
Have you tried single-stepping through it in gdb, and occasionally dumping the local variables to see the current state? It's a good way to learn. You may want to add a few extra intermediate variables that "info locals" will automatically dump as well (pointers to the current XXX, where XXX is various items in your hierarchy).
I assume a GNU environment:
% gcc -g homework.c -o hw
% gdb hw
(gdb) b 10
(gdb) r
(gdb) info locals
(gdb) n
(gdb) info locals
...
Replace "10" with a suitable line number near the beginning of the function.
Shouldn't it be:
if(!isspace(**chapter[i])){
Each chapter[i] is a pointer to a pointer to a char, so **chapter[i] is the first character of the first line of paragraph i. Using chapter[0] only ever looks at the first paragraph that was read.

Memory issue in my Sub String function in C?

I have this C function which attempts to tell me if a sub string is contained in a string.
int sub_string(char parent [1000], char child [1000]){
    int i;
    i = 0;
    int parent_size = (int) strlen(parent);
    int child_size = (int) strlen(child);
    char tempvar [child_size];
    int res;
    res = 1;
    while(i<(parent_size - child_size + 1) && res != 0){
        strncpy(tempvar, parent + i, child_size);
        if(strcmp(tempvar, child)==0){
            res = 0;
        }
        i++;
        memset(tempvar, 0, child_size);
    }
    memset(tempvar, 0, sizeof(tempvar));
    return res;
}
Now the strange thing is, when I pass a string "HOME_DIR=/tmp/" and "HOME_DIR" it returns a 0 the first time round, but after I call this function again, it returns a 1 to say it hasn't found it!!
I am guessing this is a memory issue, but I can't tell where, I would appreciate any help on this.
Is there any reason you can't use the strstr function? Otherwise, there are some things you should clean up in your code. For starters, since you are limiting the length of the arrays coming in to 1000 characters, you should use strnlen instead of strlen, with a limit of 1000 (note that strnlen is a POSIX function, not ISO C). You should also zero out the tempvar array before you start copying into it. If parent is not null-terminated, you could run off the end of the array in your while loop. I would also suggest using strncmp and giving it a length limit. In general, if you are using the C string library, prefer the 'n' versions of the functions (strnlen instead of strlen, for example) so that you put a bound on the operation; this helps protect against buffer overflows and potential security holes in your code.
I have noticed some issues with this program:
Use pointers instead of fixed char arrays. This is more space optimal. So your function definition becomes int sub_string(char *parent, int parent_len, char *child, int child_len). Please note that since I pass pointers I also need to pass the length of the string so I know how much to traverse. So now you access your string like so *(parent+i) in a loop.
i<(parent_size - child_size + 1) This condition looks a bit dicey to me. Let's say parent is 100 in length and child is 75, so the expression becomes i<26. Your loop then terminates once i reaches 26, so tempvar takes the substrings of parent starting at indexes 0 through 25. So how does this work again?
One problem is:
char tempvar [child_size];
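strcmp below compares up to child_size+1 characters (including the terminating '\0'), but tempvar has room for only child_size characters and strncpy does not terminate it here, so strcmp can read past the end of the array. That is undefined behaviour.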
Do you know the C-standard functions strstr and strncmp?
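tempvar is a variable-length array, so sizeof(tempvar) is evaluated at run time and equals child_size here; the memset after the loop therefore just repeats the one inside it.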
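A version that avoids the temporary array altogether, as a minimal sketch: it compares in place with strncmp and keeps the question's convention of returning 0 when the substring is found and 1 otherwise. Both arguments are assumed to be properly terminated strings.

```c
#include <string.h>

/* Returns 0 if child occurs somewhere in parent, 1 otherwise
 * (the same convention as the function in the question). */
int sub_string(const char *parent, const char *child)
{
    size_t parent_len = strlen(parent);
    size_t child_len = strlen(child);

    if (child_len > parent_len)
        return 1;

    /* Try every start position at which child still fits. */
    for (size_t i = 0; i + child_len <= parent_len; i++) {
        if (strncmp(parent + i, child, child_len) == 0)
            return 0;
    }
    return 1;
}
```

With no scratch buffer there is nothing left unterminated between calls, so repeated calls with the same arguments give the same answer. The standard library offers the same test as strstr(parent, child) != NULL.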

How do we know that a string element in C is uninitialized?

Is there a way to know whether the element in a string in C has a value or not? I have tried using NULL, '', and ' ', but they don't seem to be working. I need to shift the characters down to index 0 without using stdlib functions.
#include <stdio.h>
int main()
{
    char literal[100];
    //literal[99] = '\0'
    literal[98] = 'O';
    literal[97] = 'L';
    literal[96] = 'L';
    literal[95] = 'E';
    literal[94] = 'H';
    int index = 0;
    while(literal[index] != '\0')
    {
        if(literal[index] == NULL) // does not work
            printf("Empty");
        else
            printf("%c", literal[index]);
        ++index;
    }
    getchar();
    return 0;
}
No. Since literal has automatic storage, its elements are not initialized; the values in the array are indeterminate.
You could initialize every element to something special and check for that value.
e.g. you could change
char literal[100];
to
char literal[100] = {0};
to initialize every element to 0.
You'd have to change your while loop termination check to
while(index < 100) {
    if(literal[index] == 0)
        printf("Empty");
    ...
}
That might not be optimal if you need to perform more string manipulation on the array though, as 0 now means empty element and also 'end of string'.
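Once the array is zero-initialized, the shift to index 0 that the question asks for can be written without any library calls. The following is a sketch under that assumption; shift_to_front is an illustrative name, and it treats '\0' as the "empty" marker exactly as described above.

```c
#include <stddef.h>

/* Moves the first run of non-zero characters in buf down to index 0 and
 * returns its length. Assumes every unused element of buf was initialised
 * to '\0' (for example with `char buf[100] = {0};`). Vacated elements are
 * reset to '\0', so the result is terminated as long as the text does not
 * fill the whole buffer. */
size_t shift_to_front(char *buf, size_t size)
{
    size_t src = 0;
    size_t dst = 0;

    /* Skip the leading "empty" elements. */
    while (src < size && buf[src] == '\0')
        src++;

    /* Copy the text down, clearing each old position as we go. */
    while (src < size && buf[src] != '\0') {
        char c = buf[src];
        buf[src] = '\0';
        buf[dst] = c;
        src++;
        dst++;
    }
    return dst;
}
```

For the array in the question, with HELLO stored at indexes 94 to 98 of a zero-initialized buffer, the call moves the five characters to indexes 0 to 4 and returns 5.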
No, you can't do that. This is because it will have a value - there is just no way of knowing what that value is. This is why it is essential to initialise things to known values.
C does not default initialize anything. Therefore the contents in your string are whatever garbage was in that memory by whatever last used it on the stack. You need to explicitly set each literal value to a value that means "unset" to you.
No, there is no way of knowing what value the array has. You can, however, initialize it with a chosen "default" value of your choice and later check against that.
You need to set the end of the string to 0 (zero) or '\0'. C only does this for you automatically for string literals, not for local arrays on the stack.
Try
memset(literal, 0, sizeof literal);
Or just uncomment your line that sets literal at index 99 to '\0' (and add the missing semicolon).
A c string is just a bunch of memory locations and a convention that '\0' marks the end. There is no compiler enforcement, and no attached meta data (unless you build a structure to provide it).
Every cell in memory, always has a value, so every string always has a value, you just can't guarantee that it is sensible or even that it ends in the allotted space.
Ensuring that there is meaningful data in there is your responsibility, which suggests that you should initialize all strings either at declaration time or immediately after allocation. Exceptions to the rule are rare and are undertaken at your own risk.
No, reading the uninitialized elements is undefined behaviour; for all we know the runtime could have left arbitrary bytes there, and you really do not want to get into that. The best way to deal with it is to use a for loop to set every element, or to use calloc, which allocates memory that is already set to 0.
for (i = 0; i < 100; i++) literal[i] = '\0';
OR
char *literalPtr = (char*)calloc(100, sizeof(char)); // Array of 99 elements plus 1 for '\0'
There is absolutely no guarantee in doing that. Hence it would be classified as undefined behaviour as it is dependent on the compiler and runtime implementation.
Hope this helps,
Best regards,
Tom.
