A function to find and substitute specific text? - c

Is there a function I can use that will let me replace specific text?
For example:
char *test = "^Hello world^"; would be replaced with char *test = "<s>Hello world</s>";
Another example: char *test2 = "This is ~my house~ bud" would be replaced with char *test2 = "This is <b>my house</b> bud"

Before you can begin to replace substrings within a string, you have to understand what you are dealing with. In your example you want to know whether you can replace characters within a string, and you give as an example:
char *test = "^Hello world^";
Declared and initialized as shown above, test is a pointer to a string literal, which is stored in read-only memory on virtually all systems. Any attempt to modify the characters of a string literal invokes Undefined Behavior (and most likely a Segmentation Fault).
As noted in the comments, test could instead be declared and initialized as a character array, e.g. char test[] = "^Hello world^";, which ensures that test is modifiable, but that does not address the problem that your replacement strings are longer than the substrings being replaced.
To handle the additional characters, you have two options: (1) you can declare test[] to be sufficiently large to accommodate the substitutions, or (2) you can dynamically allocate storage for the replacement string, and realloc additional memory if you reach your original allocation limit.
For instance if you limit the code associated with test to a single function, you could declare test with a sufficient number of characters to handle the replacements, e.g.
#define MAXC 1024 /* define a constant for the maximum number of characters */
...
char test[MAXC] = "^Hello world^";
You would then simply need to keep track of the original string length plus the number of characters added with each replacement and ensure that the total never exceeds MAXC-1 (reserving space for the nul-terminating character).
However, if you decide to move the replacement code to a separate function, you now have the problem that you cannot return a pointer to a locally declared array. A locally declared array has automatic storage duration: it is created within the function's stack space, which is destroyed (released for reuse) when the function returns. See: C11 Standard - 6.2.4 Storage durations of objects
To avoid the problem of a locally declared array not surviving the function return, you can simply dynamically allocate storage for your new string which results in the new string having allocated storage duration which is good for the life of the program, or until the memory is freed by calling free(). This allows you to declare and allocate storage for a new string within a function, make your substring replacements, and then return a pointer to the new string for use back in the calling function.
For your circumstance, simply declaring a new string within a function and allocating twice the amount of storage as the original string is a reasonable approach to take. (You still must keep track of the number of bytes of memory you use, but you then have the ability to realloc additional memory if you should reach your original allocation limit.) This process can continue and accommodate any number of strings and substitutions, up to the available memory on your system.
While there are a number of ways to approach the substitutions, simply searching the original string for each substring, and then copying the text up to the substring to the new string, then copying the replacement substring allows you to "inch-worm" from the beginning to the end of your original string making replacement substitutions as you go. The only challenge you have is keeping track of the number of characters used (so you can reallocate if necessary) and advancing your read position within the original from the beginning to the end as you go.
Your example somewhat complicates the process by needing to alternate between one of two replacement strings as you work your way down the string. This can be handled with a simple toggle flag (a variable you alternate 0,1,0,1,...), which then determines the proper replacement string to use where needed.
The ternary operator (e.g. test ? if_true : if_false;) can help reduce the number of if (test) { if_true; } else { if_false; } blocks you have sprinkled through your code -- it's up to you. If the if (test) {} format is more readable to you -- use that, otherwise, use the ternary.
The following example takes the (1) original string, (2) the find substring, (3) the 1st replacement substring, and (4) the 2nd replacement substring as arguments to the program. It allocates for the new string within the strreplace() function, makes the substitutions requested and returns a pointer to the new string to the calling function. The code is heavily commented to help you follow along, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* replace all instances of 'find' in 's' with 'r1' and 'r2', alternating.
 * allocate memory, as required, to hold string with replacements,
 * returns allocated string with replacements on success, NULL otherwise.
 */
char *strreplace (const char *s, const char *find,
                  const char *r1, const char *r2)
{
    const char *p = s,          /* read pointer into s */
               *sp;             /* pointer to each substring found */
    char *newstr = NULL,        /* new string to allocate/return */
         *np;                   /* write pointer into newstr */
    size_t newlen,              /* bytes allocated for newstr */
           used = 0,            /* amount of allocated space used */
           slen, findlen, r1len, r2len;     /* string lengths */
    int toggle = 0;             /* simple 0/1 toggle flag for r1/r2 */

    /* validate BEFORE calling strlen(), an empty find would never advance */
    if (!s || !*s || !find || !*find || !r1 || !r2) {
        fputs ("strreplace() error: NULL or empty argument\n", stderr);
        return NULL;
    }
    slen = strlen (s);          /* length of s */
    findlen = strlen (find);    /* length of find string */
    r1len = strlen (r1);        /* length of replace string 1 */
    r2len = strlen (r2);        /* length of replace string 2 */

    newlen = slen * 2;              /* double length of s for newstr */
    newstr = calloc (1, newlen);    /* allocate twice length of s */
    if (newstr == NULL) {           /* validate ALL memory allocations */
        perror ("calloc-newstr");
        return NULL;
    }
    np = newstr;                    /* initialize write pointer to newstr */

    /* locate each substring using strstr */
    while ((sp = strstr (p, find))) {
        size_t len = sp - p,                    /* length to substring */
               rlen = toggle ? r2len : r1len;   /* replacement length */

        if (used + len + rlen + 1 > newlen) {   /* check if realloc needed */
            size_t want = newlen;
            void *tmp;
            while (used + len + rlen + 1 > want)    /* double until it fits */
                want *= 2;
            tmp = realloc (newstr, want);       /* realloc to temp */
            if (!tmp) {                         /* validate realloc succeeded */
                perror ("realloc-newstr");
                free (newstr);                  /* don't leak original block */
                return NULL;
            }
            newstr = tmp;                       /* assign realloc'ed block */
            newlen = want;                      /* update newlen */
            np = newstr + used;                 /* block may have moved */
        }
        memcpy (np, p, len);                    /* copy text before substring */
        np += len;                              /* advance write pointer */
        memcpy (np, toggle ? r2 : r1, rlen + 1);    /* copy r2/r1 with '\0' */
        np += rlen;                             /* advance by replacement */
        p += len + findlen;                     /* advance p past substring */
        used += len + rlen;                     /* update used characters */
        toggle = !toggle;                       /* toggle 0,1,0,1,... */
    }

    /* handle segment of s after last find substring */
    slen = strlen (p);                          /* get remaining length */
    if (slen) {                                 /* if not at end */
        if (used + slen + 1 > newlen) {         /* check if realloc needed */
            void *tmp = realloc (newstr, used + slen + 1);
            if (!tmp) {                         /* validate */
                perror ("realloc-newstr");
                free (newstr);
                return NULL;
            }
            newstr = tmp;                       /* assign */
            np = newstr + used;                 /* block may have moved */
        }
        memcpy (np, p, slen + 1);               /* add final segment with '\0' */
    }

    return newstr;                              /* return newstr */
}
int main (int argc, char **argv) {

    const char *s = NULL,
               *find = NULL,
               *r1 = NULL,
               *r2 = NULL;
    char *newstr = NULL;

    if (argc < 5) { /* validate required no. of arguments given */
        fprintf (stderr, "error: insufficient arguments,\n"
                         "usage: %s <string> <find> <rep1> <rep2>\n", argv[0]);
        return 1;
    }
    s = argv[1];        /* assign arguments to pointers */
    find = argv[2];
    r1 = argv[3];
    r2 = argv[4];

    newstr = strreplace (s, find, r1, r2);  /* replace substrings in s */

    if (newstr) {       /* validate return */
        printf ("oldstr: %s\nnewstr: %s\n", s, newstr);
        free (newstr);  /* don't forget to free what you allocate */
    }
    else {              /* handle error */
        fputs ("strreplace() returned NULL\n", stderr);
        return 1;
    }

    return 0;
}
(above, the strreplace function uses pointers to walk ("inch-worm") down the original string making replacements, but you can use string indexes and index variables if that makes more sense to you)
(also note the use of calloc for the original allocation. calloc allocates and sets the new memory to all zero, which can help ensure you don't forget to nul-terminate your string, but note any memory added by realloc will not be zeroed -- unless you manually zero it with memset or the like. The code above manually terminates the new string after each copy, so you can use either malloc or calloc for the allocation)
Example Use/Output
First example:
$ ./bin/str_substr_replace2 "^Hello world^" "^" "<s>" "</s>"
oldstr: ^Hello world^
newstr: <s>Hello world</s>
Second example:
$ ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
==8962== Memcheck, a memory error detector
==8962== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8962== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8962== Command: ./bin/str_substr_replace2 This\ is\ ~my\ house~\ bud ~ \<b\> \</b\>
==8962==
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
==8962==
==8962== HEAP SUMMARY:
==8962== in use at exit: 0 bytes in 0 blocks
==8962== total heap usage: 1 allocs, 1 frees, 44 bytes allocated
==8962==
==8962== All heap blocks were freed -- no leaks are possible
==8962==
==8962== For counts of detected and suppressed errors, rerun with: -v
==8962== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have any further questions.

Related

Differences between allocating memory spaces for a string based on the size of its characters vs. the size of the entire string

When we allocate memory for a string, do the following 2 ways give the same result?
char *s = "abc";
char *st1 = (char *)malloc(sizeof(char)*strlen(s));
char *st2 = (char *)malloc(sizeof(s));
In other words, does allocating the memory based on the size of its characters give the same result as allocating based on the size of the whole string?
If I do use the latter method, is it still possible for me to add to that memory character by character, such as:
*st = 'a';
st++;
*st = 'b';
or do I have to add a whole string at once now?
Let's see if we can't get you straightened out on your question and on allocating (and reallocating) storage. To begin, when you declare:
char *s = "abc";
You have declared a pointer to char s and you have assigned the starting address for the String Literal "abc" to the pointer s. Whenever you attempt to use sizeof() on a_pointer, you get sizeof(a_pointer), which is typically 8-bytes on x86_64 (or 4-bytes on x86, etc.).
If you take sizeof("abc"); you are taking the size of a character array with size 4 (e.g. {'a', 'b', 'c', '\0'}), because a string literal is an array of char initialized to hold the string "abc" (including the nul-terminating character). Also note, that on virtually all systems, a string literal is created in read-only memory and cannot be modified, it is immutable.
If you want to allocate storage to hold a copy of the string "abc", you must allocate strlen("abc") + 1 characters (the +1 is for the nul-terminating character '\0', which is simply ASCII 0, see ASCII Table & Description).
Whenever you allocate memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed. So if you allocate for char *st = malloc (len + 1); characters, you do not want to iterate with the pointer st (e.g. no st++). Instead, declare a second pointer, char *p = st; and you are free to iterate with p.
Also, in C, there is no need to cast the return of malloc, it is unnecessary. See: Do I cast the result of malloc?.
If you want to add to an allocation, you use realloc(), which will resize the block for you (creating a new block and copying your existing data to it if needed). When using realloc(), you always reallocate using a temporary pointer (e.g. don't st = realloc (st, new_size);) because when realloc() fails, it returns NULL and if you assign that to your pointer st, you have just lost the original pointer and created a memory leak. Instead, use a temporary pointer, e.g. void *tmp = realloc (st, new_size); then validate realloc() succeeds before assigning st = tmp;. Since the block may move, any second pointer into it (such as p) must be reset after a successful realloc().
Now, reading between the lines, that appears to be where you are going with your example. The following shows how it can be done, keeping track of the amount of memory allocated and the amount of memory used. When used + 1 == allocated, you reallocate more memory (remembering to ensure you have +1 bytes available for the nul-terminating character).
A short example would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define THISMANY 23
int main (void) {

    char *s = "abc", *st, *p;       /* string literal and pointer st */
    size_t len = strlen(s),         /* length of s */
           allocated = len + 1,     /* number of bytes in new block allocated */
           used = 0;                /* number of bytes in new block used */

    st = malloc (allocated);        /* allocate storage for copy of s */
    if (!st) {                      /* validate EVERY allocation */
        perror ("malloc-st");
        return 1;
    }
    p = st;                         /* pointer to iterate with, preserve st */

    for (int i = 0; s[i]; i++) {    /* copy s to new block of memory */
        *p++ = s[i];                /* (could use strcpy) */
        used++;                     /* advance counter */
    }
    *p = 0;                         /* nul-terminate copy */

    for (size_t i = 0; i < THISMANY; i++) { /* loop THISMANY times */
        if (used + 1 == allocated) {    /* check if realloc needed (remember '\0') */
            /* always realloc using temporary pointer */
            void *tmp = realloc (st, 2 * allocated);    /* realloc 2X current */
            if (!tmp) {                 /* validate EVERY reallocation */
                perror ("realloc-st");
                break;                  /* don't exit, original st still valid */
            }
            st = tmp;                   /* assign reallocated block to st */
            p = st + used;              /* block may have moved, reset p */
            allocated *= 2;             /* update allocated amount */
        }
        *p++ = 'a' + used++;            /* assign new char, increment used */
    }
    *p = 0;                         /* nul-terminate */

    printf ("result st : %s\n"      /* output final string, length, allocated */
            "length st : %zu bytes\n"
            "final size : %zu bytes\n", st, strlen(st), allocated);

    free (st);      /* don't forget to free what you have allocated */
}
Example Use/Output
$ ./bin/sizeofs
result st : abcdefghijklmnopqrstuvwxyz
length st : 26 bytes
final size : 32 bytes
Look things over and let me know if this answered your questions, and if not, leave a comment and I'm happy to help further.
If you are still shaky on what a pointer is, and would like more information, here are a few links that provide basic discussions of pointers that may help. Difference between char pp and (char) p? and Pointer to pointer of structs indexing out of bounds(?)... (ignore the titles, the answers discuss pointer basics)

How to read a text file and store in an array in C

The script successfully prints the text file; however, I want to store what is in the text file in an array. I have looked in a lot of places, but I do not exactly understand the information I have come across. Is there any way I can get some guidance?
#include <stdlib.h>
int main()
{
// OPENS THE FILE
FILE *fp = fopen("/classes/cs3304/cs330432/Programs/StringerTest/people.txt", "r");
size_t len = 1000;
char *word = malloc(sizeof(char) * len);
// CHECKS IF THE FILE EXISTS, IF IT DOESN'T IT WILL PRINT OUT A STATEMENT SAYING SO
if (fp == NULL)
{
printf("file not found");
return 0;
}
while(fgets(word, len, fp) != NULL)
{
printf("%s", word);
}
free(word);
}
the text file has the following in it(just a list of words):
endorse
vertical
glove
legend
scenario
kinship
volunteer
scrap
range
elect
release
sweet
company
solve
elapse
arrest
witch
invasion
disclose
professor
plaintiff
definition
bow
chauvinist
Let's see if we can't get you straightened out. First, you are thinking in the right direction, and you should be commended for using fgets() to read each line into a fixed buffer (character array), and then you need to collect and store all of the lines so that they are available for use by your program -- that appears to be where the wheels fell off.
Basic Outline of Approach
In an overview, when you want to handle an unlimited number of lines, you have two different types of blocks of memory you are going to allocate and manage. The first is a block of memory you allocate that will hold some number of pointers (one for each line you will store). It doesn't matter how many you initially allocate, because you will keep track of the number allocated (number available) and the number used. When (used == available) you will realloc() a bigger block of memory to hold more pointers and keep on going.
The second type of block of memory you will handle is the storage for each line. No mystery there. You will allocate storage for each character (+1 for the nul-terminating character) and you will copy the line from your fixed buffer to the allocated block.
The two blocks of memory work together, because to create your collection, you simply assign the address for the block of memory holding the line of data to the next available pointer.
Let's think through a short example where we declare char **lines; as the pointer to the block of memory holding pointers. Then say we allocate two pointers initially, so we have valid pointers available for lines[0] and lines[1]. We track the number of pointers available with nptrs and the number used with used. So initially nptrs = 2; and used = 0;.
When we read our first line with fgets(), we will trim the '\n' from the end of the string and then get the length of the string (len = strlen(buffer);). We can then allocate storage for the string assigning the address of the allocated block to our first pointer, e.g.
lines[used] = malloc (len + 1);
and then copy the contents of buffer to lines[0], e.g.
memcpy (lines[used], buffer, len + 1);
(note: there is no reason to call strcpy() and have it scan for end-of-string again, we already know how many characters to copy -- including the nul-terminating character)
Finally, all that is needed to keep our counters happy is to increment used by one. We store the next line the same way, and on the 3rd iteration used == nptrs so we realloc() more pointers (generally just doubling the number of pointers each time a realloc() is required). That is a good balance between calls to realloc() and growth of the number of pointers -- but you are free to increment the allocation any way you like -- but avoid calling realloc() for every line.
So you keep reading lines, checking if realloc() is required, reallocating if needed, and allocating for each line assigning the starting address to each of your pointers in turn. The only additional note is that when you realloc() you always use a temporary pointer so when realloc() fails and returns NULL, you do not overwrite your original pointer with NULL losing the starting address to the block of memory holding pointers -- creating a memory leak.
Implementation
The details were left out of the overview, so let's look at a short example to read an unknown number of lines from a file (each line fitting in a 1024-byte buffer, i.e. 1022 characters or fewer plus the '\n' and the nul-terminating character) and storing each line in a collection using a pointer-to-pointer to char as described above. Don't use Magic-Numbers in your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTRS 2 /* initial no. of pointers to allocate (lines) */
Don't hardcode filenames in your code either, that is what argc and argv are for in int main (int argc, char **argv). Pass the filename to read as the first argument to the program (or read from stdin by default if no argument is given):
int main (int argc, char **argv) {
char buf[MAXC], /* fixed buffer to read each line */
**lines = NULL; /* pointer to pointer to hold collection of lines */
size_t nptrs = NPTRS, /* number of pointers available */
used = 0; /* number of pointers used */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
(note: you should not need to recompile your program just to read from a different filename)
Now allocate and Validate your initial number of pointers
/* allocate/validate block holding initial nptrs pointers */
if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
perror ("malloc-lines");
exit (EXIT_FAILURE);
}
Read each line, trim the '\n' from the end, and get the number of characters that remain after the '\n' has been removed (you can use strcspn() to do it all at once):
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
size_t len;
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save length */
Next we check if a reallocation is needed and if so reallocate using a temporary pointer:
if (used == nptrs) { /* check if realloc of lines needed */
/* always realloc using temporary pointer (doubling no. of pointers) */
void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
if (!tmp) { /* validate reallocation */
perror ("realloc-lines");
break; /* don't exit, lines still good */
}
lines = tmp; /* assign reallocated block to lines */
nptrs *= 2; /* update no. of pointers allocated */
/* (optionally) zero all newly allocated memory here */
}
Now allocate and Validate the storage for the line and copy the line to the new storage, incrementing used when done -- completing your read-loop.
/* allocate/validate storage for line */
if (!(lines[used] = malloc (len + 1))) {
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buf, len + 1); /* copy line from buf to lines[used] */
used += 1; /* increment used pointer count */
}
/* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
Now you can use the lines stored in lines as needed in your program, remembering to free the memory for each line when done and then finally freeing the block of pointers, e.g.
/* use lines as needed (simply outputting here) */
for (size_t i = 0; i < used; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free line storage when done */
}
free (lines); /* free pointers when done */
}
That's all that is needed. Now you can go read the 324,000 words in /usr/share/dict/words (or perhaps on your system /var/lib/dict/words depending on distro) and you will not have any problems doing so.
Input File
A short example file:
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Example Use/Output
$ ./bin/fgets_lines_dyn_simple dat/captnjack.txt
line[ 0] : This is a tale
line[ 1] : Of Captain Jack Sparrow
line[ 2] : A Pirate So Brave
line[ 3] : On the Seven Seas.
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156== Memcheck, a memory error detector
==8156== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8156== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8156== Command: ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156==
line[ 0] : This is a tale
line[ 1] : Of Captain Jack Sparrow
line[ 2] : A Pirate So Brave
line[ 3] : On the Seven Seas.
==8156==
==8156== HEAP SUMMARY:
==8156== in use at exit: 0 bytes in 0 blocks
==8156== total heap usage: 9 allocs, 9 frees, 5,796 bytes allocated
==8156==
==8156== All heap blocks were freed -- no leaks are possible
==8156==
==8156== For counts of detected and suppressed errors, rerun with: -v
==8156== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
The Full Code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTRS 2 /* initial no. of pointers to allocate (lines) */
int main (int argc, char **argv) {
char buf[MAXC], /* fixed buffer to read each line */
**lines = NULL; /* pointer to pointer to hold collection of lines */
size_t nptrs = NPTRS, /* number of pointers available */
used = 0; /* number of pointers used */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* allocate/validate block holding initial nptrs pointers */
if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
perror ("malloc-lines");
exit (EXIT_FAILURE);
}
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
size_t len;
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save length */
if (used == nptrs) { /* check if realloc of lines needed */
/* always realloc using temporary pointer (doubling no. of pointers) */
void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
if (!tmp) { /* validate reallocation */
perror ("realloc-lines");
break; /* don't exit, lines still good */
}
lines = tmp; /* assign reallocated block to lines */
nptrs *= 2; /* update no. of pointers allocated */
/* (optionally) zero all newly allocated memory here */
}
/* allocate/validate storage for line */
if (!(lines[used] = malloc (len + 1))) {
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buf, len + 1); /* copy line from buf to lines[used] */
used += 1; /* increment used pointer count */
}
/* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
/* use lines as needed (simply outputting here) */
for (size_t i = 0; i < used; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free line storage when done */
}
free (lines); /* free pointers when done */
}
Look things over and let me know if you have any questions. If you also wanted to read lines of unknown length (millions of characters long), you would simply loop doing the same thing allocating and reallocating for each line until the '\n' character was found (or EOF) marking the end of the line. It is no different in principle than what we have done above for the pointers.

Need help for reading a file character by character in C

I have a question about reading a file character by character and counting it in C
here's my code down below
void read_in(char** quotes){
FILE *frp = fopen(IN_FILE, "r");
char c;
size_t tmp_len =0, i=0;
//char* tmp[100];
//char* quotes[MAX_QUOTES];
//char str = fgets(str, sizeof(quotes),frp);
while((c=fgetc(frp)) != EOF){
if(frp == NULL){
printf("File is empty!");
fclose(frp); exit(1);
}
else{
if(c != '\n'){
printf("%c",c);
c=fgetc(frp);
tmp_len++;
}
}
char* tmp = (char*)calloc(tmp_len+1, sizeof(char));
fgets(tmp, sizeof(tmp), frp);
strcpy((char*)quotes[i], tmp);
printf("%s\n", (char*)quotes[i]);
i++;
}
}
It doesn't work but I don't understand why.
Thank you
From your question and through the comments, it is relatively clear you want to read all quotes (lines) in a file into dynamically allocated storage (screen 1) and then sort the lines by length and output the first 5 shortest lines (screen 2) saving the 5 shortest lines to a second output file (this part is left to you). Reading and storing all lines from a file isn't difficult -- but it isn't trivial either. It sounds basic, and it is, but it requires that you use all of the basic tools needed to interface with persistent storage (reading the file from disk/storage media) and your computer's memory subsystem (RAM) -- correctly.
Reading each line from a file isn't difficult, but like anything in C, it requires you to pay attention to the details. You can read from a file using character-oriented input functions (fgetc(), getc(), etc.), you can use formatted-input functions (fscanf()) and you can use line-oriented input functions (such as fgets() or POSIX getline()). Reading lines from a file is generally done with line-oriented functions, but there is nothing wrong with using a character-oriented approach either. In fact you can relatively easily write a function based around fgetc() that will read each line from a file for you.
In the trivial case where you know the maximum number of characters for the longest line in the file, you can use a 2D array of characters to store the entire file. This simplifies the process by eliminating the need to allocate storage dynamically, but has a number of disadvantages like each line in the file requiring the same storage as the longest line in the file, and by limiting the size of the file that can be stored to the size of your program stack. Allocating storage dynamically with (malloc, calloc, or realloc) eliminates these disadvantages and inefficiencies allowing you to store files up to the limit of the memory available on your computer. (there are methods that allow both to handle files of any size by using sliding-window techniques well beyond your needs here)
There is nothing difficult about handling dynamically allocated memory, or in copying or storing data within it on a character-by-character basis. That said, the responsibility for each allocation, tracking the amount of data written to each allocated block, reallocating to resize the block to ensure no data is written outside the bounds of each block and then freeing each allocated block when it is no longer needed -- is yours, the programmer. C gives the programmer the power to use each byte of memory available, and also places on the programmer the responsibility to use the memory correctly.
The basic approach to storing a file is simple. You read each line from the file, allocating/reallocating storage for each character until a '\n' or EOF is encountered. To coordinate all lines, you allocate a block of pointers, and you assign the address for each block of memory holding a line to a pointer, in sequence, reallocating the number of pointers required as needed to hold all lines.
Sometimes a picture really is worth 1000 words. With the basic approach you declare a pointer to pointer to char, e.g. char **lines;, so you can allocate a block of memory containing pointers, to which you will assign each allocated line. A pointer-to-pointer is a single pointer that points to a block of memory containing pointers. Each pointer in that block has type char * and points to a block holding a line from the file, e.g.
 char **lines;
    |
    |      allocated
    |      pointers    allocated blocks holding each line
   lines --> +----+     +-----+
             | p1 | --> | cat |
             +----+     +-----+--------------------------------------+
             | p2 | --> | Four score and seven years ago our fathers |
             +----+     +-------------+------------------------------+
             | p3 | --> | programming |
             +----+     +-------------------+
             | .. |     |       ...         |
             +----+     +-------------------+
             | pn | --> | last line read |
             +----+     +----------------+
You can make lines a bit more flexible to use by allocating 1 additional pointer and initializing that pointer to NULL which allows you to iterate over lines without knowing how many lines there are -- until NULL is encountered, e.g.
             | .. |     |       ...         |
             +----+     +-------------------+
             | pn | --> | last line read |
             +----+     +----------------+
             |pn+1|     | NULL |
             +----+     +------+
While you can put this all together in a single function, to help the learning process (and just for practical reusability), it is often easier to break this up into two functions: one that reads and allocates storage for each line, and a second that basically calls the first, allocating pointers and assigning the address of each allocated block of memory holding a line read from the file to the next pointer in turn. When you are done, you have an allocated block of pointers where each of the pointers holds the address of (points to) an allocated block holding a line from the file.
You have indicated you want to read from the file with fgetc() and read a character at a time. There is nothing wrong with that, and there is little penalty to this approach since the underlying I/O subsystem provides a read-buffer that you are actually reading from rather than reading from disk one character at-a-time. (the size varies between compilers, but is generally provided through the BUFSIZ macro, both Linux and Windows compilers provide this)
There are virtually an unlimited number of ways to write a function that allocates storage to hold a line and then reads a line from the file one character at-a-time until a '\n' or EOF is encountered. You can return a pointer to the allocated block holding the line and pass a pointer parameter to be updated with the number of characters contained in the line, or you can have the function return the line length and pass the address-of a pointer as a parameter to be allocated and filled within the function. It is up to you. One way would be:
#define NSHORT 5 /* no. of shortest lines to display */
#define LINSZ 128 /* initial allocation size for each line */
...
/** read line from 'fp' stored in allocated block assigned to '*s' and
* return length of string stored on success, on EOF with no characters
* read, or on failure, return -1. Block of memory sized to accommodate
* exact length of string with nul-terminating char. unless -1 returned,
* *s guaranteed to contain nul-terminated string (empty-string allowed).
* caller responsible for freeing allocated memory.
*/
ssize_t fgetcline (char **s, FILE *fp)
{
int c; /* char read from fp */
size_t n = 0, size = LINSZ; /* no. of chars and allocation size */
void *tmp = realloc (NULL, size); /* tmp pointer for realloc use */
if (!tmp) /* validate every allocation/reallocation */
return -1;
*s = tmp; /* assign reallocated block to pointer */
while ((c = fgetc(fp)) != '\n' && c != EOF) { /* read chars until \n or EOF */
if (n + 1 == size) { /* check if realloc required */
/* realloc using temporary pointer */
if (!(tmp = realloc (*s, size + LINSZ))) {
free (*s); /* on failure, free partial line */
return -1; /* return -1 */
}
*s = tmp; /* assign reallocated block to pointer */
size += LINSZ; /* update allocated size */
}
(*s)[n++] = c; /* assign char to index, increment */
}
(*s)[n] = 0; /* nul-terminate string */
if (n == 0 && c == EOF) { /* if nothing read and EOF, free mem return -1 */
free (*s);
return -1;
}
if ((tmp = realloc (*s, n + 1))) /* final realloc to exact length */
*s = tmp; /* assign reallocated block to pointer */
return (ssize_t)n; /* return length (excluding nul-terminating char) */
}
(note: ssize_t is a signed type defined by POSIX in the sys/types.h header; it can hold a length or the value -1 used here to signal EOF or failure. it is not part of standard C, so you can adjust the type as desired)
The fgetcline() function makes one final call to realloc to shrink the size of the allocation to the exact number of characters needed to hold the line and the nul-terminating character.
The function called to read all lines in the file, allocating and reallocating pointers as required, does essentially the same thing as the fgetcline() function above does for characters. It simply allocates some initial number of pointers and then begins reading lines from the file, doubling the number of pointers each time more are needed. It also keeps one additional pointer set to NULL as a sentinel that will allow iterating over all pointers until NULL is reached (this is optional). The parameter n is updated with the number of lines stored to make that available back in the calling function. This function too can be written in a number of different ways; one would be:
/** read each line from `fp` and store in allocated block returning pointer to
* allocated block of pointers to each stored line with the final pointer
* after the last stored string set to NULL as a sentinel. 'n' is updated to
* the number of allocated and stored lines (excluding the sentinel NULL).
* returns valid pointer on success, NULL otherwise. caller is responsible for
* freeing both allocated lines and pointers.
*/
char **readfile (FILE *fp, size_t *n)
{
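size_t nptrs = LINSZ; /* no. of allocated pointers */
char **lines = malloc (nptrs * sizeof *lines); /* allocated block of pointers */
void *tmp = NULL; /* temp pointer for realloc use */
if (!lines) { /* validate every allocation */
perror ("malloc-lines");
return NULL;
}
*n = 0; /* no lines stored yet */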
/* read each line from 'fp' into allocated block, assign to next pointer */
while (fgetcline (&lines[*n], fp) != -1) {
lines[++(*n)] = NULL; /* set next pointer NULL as sentinel */
if (*n + 1 >= nptrs) { /* check if realloc required */
/* allocate using temporary pointer to prevent memory leak on failure */
if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
perror ("realloc-lines");
return lines; /* return original pointer on failure */
}
lines = tmp; /* assign reallocated block to pointer */
nptrs *= 2; /* update no. of pointers allocated */
}
}
lines[*n] = NULL; /* reset sentinel (failed fgetcline leaves a freed address) */
/* final realloc sizing exact no. of pointers required */
if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
return lines; /* return original block on failure */
return tmp; /* return updated block of pointers on success */
}
Note above, the function takes an open FILE* parameter for the file rather than taking a filename to open within the function. You generally want to open the file in the calling function and validate that it is open for reading before calling a function to read all the lines. If the file cannot be opened in the caller, there is no reason to make the function call to read the lines from the file to begin with.
With a way to read and store all lines from your file done, you next need to turn to sorting the lines by length so you can output the 5 shortest lines (quotes). Since you will normally want to preserve the lines from your file in order, the easiest way to sort the lines by length while preserving the original order is just to make a copy of the pointers and sort the copy by line length. For example, your lines pointer can continue to contain the pointers in original order, while the set of pointers sortedlines can hold the pointers in order sorted by line length, e.g.
int main (int argc, char **argv) {
char **lines = NULL, /* pointer to allocated block of pointers */
**sortedlines = NULL; /* copy of lines pointers to sort by length */
After reading the file and filling the lines pointer, you can copy the pointers to sortedlines (including the sentinel NULL), e.g.
/* allocate storage for copy of lines pointers (plus sentinel NULL) */
if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
perror ("malloc-sortedlines");
return 1;
}
/* copy pointers from lines to sorted lines (plus sentinel NULL) */
memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
Then you simply call qsort to sort the pointers in sortedlines by length. Your only job with qsort is to write the compare function. The prototype for the compare function is:
int compare (const void *a, const void *b);
Both a and b will be pointers to the elements being sorted. In your case with char **sortedlines;, the elements are pointer-to-char, so a and b will both have type pointer-to-pointer to char. You simply write the compare function so it returns less than zero if the line pointed to by a is shorter than the line pointed to by b (already in the right order), returns zero if the lengths are the same (no action needed), and returns greater than zero if the line for a is longer than the line for b (a swap is required). Writing the comparison as the difference of two conditionals rather than a simple a - b prevents any potential overflow, e.g.
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
/* a & b are pointer-to-pointer to char */
char *pa = *(char * const *)a, /* pa is pointer to string */
*pb = *(char * const *)b; /* pb is pointer to string */
size_t lena = strlen(pa), /* length of pa */
lenb = strlen(pb); /* length of pb */
/* for numeric types returning result of (a > b) - (a < b) instead
* of result of a - b avoids potential overflow. returns -1, 0, 1.
*/
return (lena > lenb) - (lena < lenb);
}
Now you can simply pass the collection of objects, the number of objects, the size of each object and the function to use to sort the objects to qsort. It doesn't matter what you need to sort -- it works the same way every time. There is no reason you should ever need to "go write" a sort (except for educational purposes) -- that is what qsort is provided for. For example, here with sortedlines, all you need is:
qsort (sortedlines, n, sizeof *sortedlines, complength); /* sort by length */
Now you can display all lines in their original order by iterating through lines, and display them in ascending order of length by iterating through sortedlines. To display the 5 shortest lines, just iterate over the first 5 valid pointers in sortedlines. The same applies to opening another file for writing and writing those 5 lines to it. (that is left to you)
That's it. Is any of it difficult -- No. Is it trivial to do -- No. It is a basic part of programming in C that takes work to learn and to understand, but that is no different than anything worth learning. Putting all the pieces together in a working program to read and display all lines in a file and then sort and display the first 5 shortest lines you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#define NSHORT 5 /* no. of shortest lines to display */
#define LINSZ 128 /* initial allocation size for each line */
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
/* a & b are pointer-to-pointer to char */
char *pa = *(char * const *)a, /* pa is pointer to string */
*pb = *(char * const *)b; /* pb is pointer to string */
size_t lena = strlen(pa), /* length of pa */
lenb = strlen(pb); /* length of pb */
/* for numeric types returning result of (a > b) - (a < b) instead
* of result of a - b avoids potential overflow. returns -1, 0, 1.
*/
return (lena > lenb) - (lena < lenb);
}
/** read line from 'fp' stored in allocated block assigned to '*s' and
* return length of string stored on success, on EOF with no characters
* read, or on failure, return -1. Block of memory sized to accommodate
* exact length of string with nul-terminating char. unless -1 returned,
* *s guaranteed to contain nul-terminated string (empty-string allowed).
* caller responsible for freeing allocated memory.
*/
ssize_t fgetcline (char **s, FILE *fp)
{
int c; /* char read from fp */
size_t n = 0, size = LINSZ; /* no. of chars and allocation size */
void *tmp = realloc (NULL, size); /* tmp pointer for realloc use */
if (!tmp) /* validate every allocation/reallocation */
return -1;
*s = tmp; /* assign reallocated block to pointer */
while ((c = fgetc(fp)) != '\n' && c != EOF) { /* read chars until \n or EOF */
if (n + 1 == size) { /* check if realloc required */
/* realloc using temporary pointer */
if (!(tmp = realloc (*s, size + LINSZ))) {
free (*s); /* on failure, free partial line */
return -1; /* return -1 */
}
*s = tmp; /* assign reallocated block to pointer */
size += LINSZ; /* update allocated size */
}
(*s)[n++] = c; /* assign char to index, increment */
}
(*s)[n] = 0; /* nul-terminate string */
if (n == 0 && c == EOF) { /* if nothing read and EOF, free mem return -1 */
free (*s);
return -1;
}
if ((tmp = realloc (*s, n + 1))) /* final realloc to exact length */
*s = tmp; /* assign reallocated block to pointer */
return (ssize_t)n; /* return length (excluding nul-terminating char) */
}
/** read each line from `fp` and store in allocated block returning pointer to
* allocated block of pointers to each stored line with the final pointer
* after the last stored string set to NULL as a sentinel. 'n' is updated to
* the number of allocated and stored lines (excluding the sentinel NULL).
* returns valid pointer on success, NULL otherwise. caller is responsible for
* freeing both allocated lines and pointers.
*/
char **readfile (FILE *fp, size_t *n)
{
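size_t nptrs = LINSZ; /* no. of allocated pointers */
char **lines = malloc (nptrs * sizeof *lines); /* allocated block of pointers */
void *tmp = NULL; /* temp pointer for realloc use */
if (!lines) { /* validate every allocation */
perror ("malloc-lines");
return NULL;
}
*n = 0; /* no lines stored yet */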
/* read each line from 'fp' into allocated block, assign to next pointer */
while (fgetcline (&lines[*n], fp) != -1) {
lines[++(*n)] = NULL; /* set next pointer NULL as sentinel */
if (*n + 1 >= nptrs) { /* check if realloc required */
/* allocate using temporary pointer to prevent memory leak on failure */
if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
perror ("realloc-lines");
return lines; /* return original pointer on failure */
}
lines = tmp; /* assign reallocated block to pointer */
nptrs *= 2; /* update no. of pointers allocated */
}
}
lines[*n] = NULL; /* reset sentinel (failed fgetcline leaves a freed address) */
/* final realloc sizing exact no. of pointers required */
if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
return lines; /* return original block on failure */
return tmp; /* return updated block of pointers on success */
}
/** free all allocated memory (both lines and pointers) */
void freelines (char **lines, size_t nlines)
{
for (size_t i = 0; i < nlines; i++) /* loop over each pointer */
free (lines[i]); /* free allocated line */
free (lines); /* free pointers */
}
int main (int argc, char **argv) {
char **lines = NULL, /* pointer to allocated block of pointers */
**sortedlines = NULL; /* copy of lines pointers to sort by length */
size_t n = 0; /* no. of pointers with allocated lines */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
if (!(lines = readfile (fp, &n))) /* read all lines in file, fill lines */
return 1;
if (fp != stdin) /* close file if not stdin */
fclose (fp);
/* allocate storage for copy of lines pointers (plus sentinel NULL) */
if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
perror ("malloc-sortedlines");
return 1;
}
/* copy pointers from lines to sorted lines (plus sentinel NULL) */
memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
qsort (sortedlines, n, sizeof *sortedlines, complength); /* sort by length */
/* output all lines from file (first screen) */
puts ("All lines:\n\nline : text");
for (size_t i = 0; i < n; i++)
printf ("%4zu : %s\n", i + 1, lines[i]);
/* output first five shortest lines (second screen) */
puts ("\n5 shortest lines:\n\nline : text");
for (size_t i = 0; i < (n >= NSHORT ? NSHORT : n); i++)
printf ("%4zu : %s\n", i + 1, sortedlines[i]);
freelines (lines, n); /* free all allocated memory for lines */
free (sortedlines); /* free block of pointers */
}
(note: the file reads from the filename passed as the first argument to the program, or from stdin if no argument is given)
Example Input File
$ cat dat/fleascatsdogs.txt
My dog
My fat cat
My snake
My dog has fleas
My cat has none
Lucky cat
My snake has scales
Example Use/Output
$ ./bin/fgetclinesimple dat/fleascatsdogs.txt
All lines:
line : text
1 : My dog
2 : My fat cat
3 : My snake
4 : My dog has fleas
5 : My cat has none
6 : Lucky cat
7 : My snake has scales
5 shortest lines:
line : text
1 : My dog
2 : My snake
3 : Lucky cat
4 : My fat cat
5 : My cat has none
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900== Memcheck, a memory error detector
==5900== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5900== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5900== Command: ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900==
All lines:
line : text
1 : My dog
2 : My fat cat
3 : My snake
4 : My dog has fleas
5 : My cat has none
6 : Lucky cat
7 : My snake has scales
5 shortest lines:
line : text
1 : My dog
2 : My snake
3 : Lucky cat
4 : My fat cat
5 : My cat has none
==5900==
==5900== HEAP SUMMARY:
==5900== in use at exit: 0 bytes in 0 blocks
==5900== total heap usage: 21 allocs, 21 frees, 7,938 bytes allocated
==5900==
==5900== All heap blocks were freed -- no leaks are possible
==5900==
==5900== For counts of detected and suppressed errors, rerun with: -v
==5900== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
There is a lot here, and as with any "how do I do X?" question, the devil is always in the details: the proper use of each function and the proper validation of each input or allocation/reallocation. Each part is just as important as the other to ensure your code does what you need it to do -- in a defined way. Look things over, take your time to digest the parts, and let me know if you have further questions.
If you are on Linux (or any other POSIX system) you can use getline instead of fgetc or fgets, because getline takes care of the memory allocation for you.
Example:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
char *line = NULL;
size_t len = 0;
ssize_t read;
if (argc != 2)
{
printf("usage: rf <filename>\n");
exit(EXIT_FAILURE);
}
fp = fopen(argv[1], "r");
if (fp == NULL)
{
perror("fopen");
exit(EXIT_FAILURE);
}
while ((read = getline(&line, &len, fp)) != -1) {
printf("Retrieved line of length %zd :\n", read); /* read is ssize_t, so %zd */
printf("%s", line);
}
free(line);
fclose(fp);
exit(EXIT_SUCCESS);
}

Modifying arrays in-place and understanding memory allocation for it

I have the following two functions, which take an array of strings and make them lowercase (in-place) --
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
void to_lower(char ** strings) {
char * original_string;
char * lower_string;
for (int i=0; (original_string=strings[i]) != NULL; i++) {
lower_string = malloc(strlen(original_string + 1) * sizeof(char));
for (int j=0; j<=strlen(original_string); j++) {
lower_string[j] = tolower(original_string[j]);
}
strings[i]=lower_string;
}
}
int main(void) {
char * strings[] = {"Hello", "Zerotom", "new", NULL };
to_lower(strings);
to_lower(strings); // this and successive calls won't change
to_lower(strings); // anything but are here to try and understand malloc
to_lower(strings);
to_lower(strings);
return 0;
}
At the beginning of the main function before to_lower is called, how much memory is consumed? My guess was 16 bytes from the array of chars (15 chars + 1 null byte at end).
After to_lower has run 5 times and before the function is returned, how much memory has been consumed? Where should I be "free"-ing the strings that are being passed into the function? (My thought was that calling malloc every time a string is copied/lower-cased creates that much additional memory but never frees anything.)
Does the to_lower function look ok, or how can it be modified so that it doesn't leak memory if it currently is?
You are wrapping yourself around the axle (confusing yourself) by combining the allocation and conversion to lower into a single void to_lower(char ** strings) function. Just as you have found, if you want to call the function twice on the same object, you have to free the memory you allocated to hold the lowercase strings in between calls -- but then you have lost the pointers to your original strings..
While there is nothing wrong with combining multiple operations in a function, you do have to step back and make sure what you are doing makes sense and won't cause more problems than it solves.
Your allocation and copying of strings is required before you modify the strings contained in strings because you initialize strings as an array of pointers to string-literals. String-literals are immutable and (on all but a very few systems) are created in read-only memory (generally the .rodata section of the executable). Attempting to modify a string-literal is Undefined Behavior and will just about guarantee a SegFault (on all but the quirky few systems).
Further, how are you going to get the original strings back if you have already overwritten the pointer addresses to the originals with pointers to allocated memory holding the lower-case results? (not to mention the tracking headache you create by replacing pointers to literals with pointers to allocated memory as far as when it is okay to free those pointers).
Here it is far better to leave your original strings untouched and simply allocate a copy of the originals (by allocating pointers, including one for the sentinel value, allocating storage for each of the original strings, and then copying the original strings) before converting to lowercase. This solves your problem with memory leaks as well as the problem with losing your original pointers to the string literals. You can free the lowercase strings as needed and you can always make another copy of the originals to send to your conversion function again.
So how would you go about implementing this? The easiest way is just to declare a pointer to pointer to char (e.g. a double-pointer) to which you can assign a block of memory for any number of pointers you like. In this case just allocate the same number of pointers as you have in the strings array, e.g.:
char *strings[] = {"Hello", "Zerotom", "new", NULL };
size_t nelem = *(&strings + 1) - strings; /* number of pointers */
/* declare & allocate nelem pointers */
char **modstrings = malloc (nelem * sizeof *modstrings);
if (!modstrings) { /* validate EVERY allocation */
perror ("malloc-modstrings");
exit (EXIT_FAILURE); /* cannot continue without the pointers */
}
(note: you can also use sizeof strings / sizeof *strings to obtain the number of elements)
Now that you have modstrings assigned a block of memory holding the same number of pointers as you have in strings, you can simply allocate blocks of memory sufficient to hold each of the string literals and assign the starting address for each block to successive pointers in modstrings setting the last pointer NULL as your sentinel, e.g.
void copy_strings (char **dest, char * const *src)
{
while (*src) { /* loop over each string */
size_t len = strlen (*src); /* get length */
if (!(*dest = malloc (len + 1))) { /* allocate/validate */
perror ("malloc-dest"); /* handle error */
exit (EXIT_FAILURE);
}
memcpy (*dest++, *src++, len + 1); /* copy *src to *dest (advance) */
}
*dest = NULL; /* set sentinel NULL */
}
(note: declaring the src parameter as char * const *src instead of just char **src tells the compiler, and anyone reading the code, that the pointers in the source array will not be modified through src, which can also help the compiler optimize. restrict would be similar, but that discussion is left to another day)
Your to_lower function then reduces to:
void to_lower (char **strings) {
while (*strings) /* loop over each string */
for (char *p = *strings++; *p; p++) /* loop over each char */
*p = (char)tolower ((unsigned char)*p); /* convert to lower */
}
As a convenience, since you know you will want to copy strings to modstrings before each call to to_lower, you can combine both functions into a single wrapper (that does make sense to combine), e.g.
void copy_to_lower (char **dest, char * const *src)
{
copy_strings (dest, src); /* just combine functions above into single */
to_lower (dest);
}
(you could even add the print_array and free_strings functions shown below as well if you always want to do those operations in a single call -- more later)
In between each copy_to_lower and print_array of modstrings you will need to free the storage assigned to each pointer so you do not leak memory when you call copy_to_lower again. A simple free_strings function could be:
void free_strings (char **strings)
{
while (*strings) { /* loop over each string */
free (*strings); /* free it */
*strings++ = NULL; /* set pointer NULL (advance to next) */
}
}
You can now allocate, copy, convert to lower, print, and free as many times as you like in main(). You would simply make repeated calls to:
copy_to_lower (modstrings, strings); /* copy_to_lower to modstrings */
print_array (modstrings); /* print modstrings */
free_strings (modstrings); /* free strings (not pointers) */
copy_to_lower (modstrings, strings); /* ditto */
print_array (modstrings);
free_strings (modstrings);
...
Now recall, you are freeing the storage for each string when you call free_strings, but you are leaving the block of memory containing the pointers assigned to modstrings. So to complete your free of all memory you have allocated, don't forget to free the pointers, e.g.
free (modstrings); /* now free pointers */
Putting it all together into an example you could do the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
void print_array (char **strings)
{
while (*strings)
printf ("%s, ", *strings++);
putchar ('\n');
}
void free_strings (char **strings)
{
while (*strings) { /* loop over each string */
free (*strings); /* free it */
*strings++ = NULL; /* set pointer NULL (advance to next) */
}
}
void copy_strings (char **dest, char * const *src)
{
while (*src) { /* loop over each string */
size_t len = strlen (*src); /* get length */
if (!(*dest = malloc (len + 1))) { /* allocate/validate */
perror ("malloc-dest"); /* handle error */
exit (EXIT_FAILURE);
}
memcpy (*dest++, *src++, len + 1); /* copy *src to *dest (advance) */
}
*dest = NULL; /* set sentinel NULL */
}
void to_lower (char **strings) {
while (*strings) /* loop over each string */
for (char *p = *strings++; *p; p++) /* loop over each char */
*p = (char)tolower ((unsigned char)*p); /* convert to lower */
}
void copy_to_lower (char **dest, char * const *src)
{
copy_strings (dest, src); /* just combine functions above into single */
to_lower (dest);
}
int main(void) {
char *strings[] = {"Hello", "Zerotom", "new", NULL };
size_t nelem = *(&strings + 1) - strings; /* number of pointers */
/* declare & allocate nelem pointers */
char **modstrings = malloc (nelem * sizeof *modstrings);
if (!modstrings) { /* validate EVERY allocation */
perror ("malloc-modstrings");
return 1; /* cannot continue without the pointers */
}
copy_to_lower (modstrings, strings); /* copy_to_lower to modstrings */
print_array (modstrings); /* print modstrings */
free_strings (modstrings); /* free strings (not pointers) */
copy_to_lower (modstrings, strings); /* ditto */
print_array (modstrings);
free_strings (modstrings);
copy_to_lower (modstrings, strings); /* ditto */
print_array (modstrings);
free_strings (modstrings);
copy_to_lower (modstrings, strings); /* ditto */
print_array (modstrings);
free_strings (modstrings);
free (modstrings); /* now free pointers */
}
Example Use/Output
$ ./bin/tolower_strings
hello, zerotom, new,
hello, zerotom, new,
hello, zerotom, new,
hello, zerotom, new,
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/tolower_strings
==5182== Memcheck, a memory error detector
==5182== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5182== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==5182== Command: ./bin/tolower_strings
==5182==
hello, zerotom, new,
hello, zerotom, new,
hello, zerotom, new,
hello, zerotom, new,
==5182==
==5182== HEAP SUMMARY:
==5182== in use at exit: 0 bytes in 0 blocks
==5182== total heap usage: 13 allocs, 13 frees, 104 bytes allocated
==5182==
==5182== All heap blocks were freed -- no leaks are possible
==5182==
==5182== For counts of detected and suppressed errors, rerun with: -v
==5182== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Now this is a long post, but you have made progress on learning dynamic allocation. It will take you a while to digest it all, but if you have further questions just let me know.
Does the to_lower function look ok, or how can it be modified so that
it doesn't leak memory if it currently is?
As pointed out by @chux in comments, you need to add 1 to the length of original_string, not to the pointer itself.
Regarding your question, yes, it leaks, each call to malloc wants a call to free. The problem here is: you are not allowed to call free on the initial values since they are filled with string-literals.
A possible solution is:
extern char *strdup(const char *);
static char *dup(const char *str)
{
char *res = strdup(str);
if (res == NULL) {
perror("strdup");
exit(EXIT_FAILURE);
}
return res;
}
int main(void)
{
char *strings[] = {
dup("Hello"),
dup("Zerotom"),
dup("new"),
NULL
};
...
Now you can call to_lower and you don't need to malloc inside the function, just call free for each element at the very end when the array is no longer needed.
Notice that strdup is not part of standard C before C23 (it is specified by POSIX and is available on many implementations).
Every time to_lower() is called, you are replacing all string literals with dynamic memory pointers.
If you are calling to_lower() again without freeing existing memory, there is a memory leak.
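lower_string = malloc(strlen(original_string) + 1); /* +1 on the length, not on the pointer */
for (int j=0; j<=strlen(original_string); j++) { /* <= also copies the nul terminator */
    lower_string[j] = tolower((unsigned char) original_string[j]);
}
strings[i]=lower_string; /* the previous pointer is overwritten here */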
So when the strings[] array is no longer needed, you should free all of its memory.
E.g.
for (int i=0; strings[i] != NULL; i++) {
free(strings[i]);
strings[i] = NULL;
}
As an alternative to {"Hello", "Zerotom", "new", NULL } plus malloc() and friends, initialize the array of pointers char * strings[] with pointers to modifiable data.
Since C99, compound literals make this possible.
void inplace_strtolower(char * s) {
    while (*s) {
        *s = (char) tolower((unsigned char ) *s);
        s++;
    }
}
// "Non-const string literal":
// a compound literal initialized from a string literal
#define NCSL(sl) ((char[sizeof(sl)]) { sl })
int main(void) {
    char * strings[] = {NCSL("Hello"), NCSL("Zerotom"), NCSL("new"), NULL};
    inplace_strtolower(strings[0]);
    inplace_strtolower(strings[1]);
    inplace_strtolower(strings[2]);
    puts(strings[0]);
    puts(strings[1]);
    puts(strings[2]);
    return 0;
}
Output
hello
zerotom
new

C - function returns a pointer to a string from char matrix ONLY use pointers

I need to write a function called MakeString. The function returns a pointer to a string that contains one word built from each row of the small matrix, in order, so that each line break is expressed as a single space between the words in the string. (There is no space after the last word.)
Constraint: the function must not use []; it must work with pointers only. In addition, traversals must be carried out with the pointers themselves, meaning the pointers actually move to point at each element as needed and do not stay in the same location the whole time.
The 'answer' that the function returns, which is a pointer, will be assigned to a pointer in main.
I tried to write the function, but it does not follow the instructions and does not work...
#define SIZE 4
static char allocbuf[SIZE][];
static char *allocp = allocbuf;
char matrix[SIZE][SIZE]
{
{U,N,T,E},
{C,P,G,X},
{D,L,A,B},
{J,T,N,N}
};
char MakeString(int n) /*return pointer to n charachters*/
{
if (allocbuf + SIZE - allocp >=n)
{
allocp += n;
return allocp - n;
}
else
return 0;
}
For example:
Small matrix:
U N T E
C P G X
D L A B
J T N N
pStr = UNTE CPGX DLAB JTNN
Thanks (:
If I understand your question, you want to write a function that reads the characters from matrix (a 2D array of char) into an allocated string, placing a space between each row's worth of characters, and returns the nul-terminated string to the calling function. You need to do this using pointers and without array [index] notation.
To begin, your declaration of matrix is wrong. E isn't a character, it is an identifier (a variable name). 'E' is a character constant (note the single quotes). The = before the initializer is also missing. So a proper declaration of matrix would be:
char matrix[SIZE][SIZE] = { {'U','N','T','E'}, /* don't use globals */
{'C','P','G','X'}, /* declare in main and */
{'D','L','A','B'}, /* pass as a parameter */
{'J','T','N','N'} };
(note: simply char matrix[][SIZE] = {{...}}; is sufficient, where the number of rows will be sized based on your initialization)
As noted in the comment, avoid the use of global variables unless absolutely necessary (very limited cases, not here). Instead, declare matrix in the scope where it is required and pass matrix as a parameter to any function that needs to process the data. By contrast, defining a constant with #define is perfectly correct, and you should define constants as needed to avoid using magic numbers in your code.
Since matrix is a 2D array, to pass it as a parameter, you must include the number of columns as part of the parameter passed. You can either declare the parameter as char matrix[SIZE][SIZE] or equivalently as char (*matrix)[SIZE] reflecting the fact that the first level of indirection is converted to a pointer to the first element on access. See: C11 Standard - 6.3.2.1 Other Operands - Lvalues, arrays, and function designators(p3) (paying attention to the 4-exceptions)
Within your makestring() function you must allocate storage of at least SIZE * SIZE + SIZE bytes (SIZE * SIZE characters, plus SIZE - 1 separating spaces, plus the nul-terminating character; with SIZE 4 that is 16 + 3 + 1 = 20). Assigning the starting address of your new block of memory to a pointer, and then creating a second pointer to the block, allows you to iterate over it, copying characters to it, while preserving a pointer to the beginning.
Putting those pieces together, you could do something similar to:
char *makestring (char (*a)[SIZE])
{
char *str = malloc (SIZE * SIZE + SIZE), *p = str; /* allocate */
if (!str) { /* validate EVERY allocation */
perror ("malloc-str");
return NULL;
}
for (int i = 0; i < SIZE; i++) { /* for each row */
if (i) /* if row not 1st */
*p++ = ' '; /* add space */
for (int j = 0; j < SIZE; j++) /* for each char */
*p++ = *(*(a + i) + j); /* copy to str */
}
*p = 0; /* nul-terminate string */
return str; /* return pointer to allocated string */
}
(note: while not an error, C generally avoids the use of camelCase or MixedCase variable names in favor of all lower-case while reserving upper-case names for use with macros and constants.)
Putting it all together in a short example, you could do:
#include <stdio.h>
#include <stdlib.h>
#define SIZE 4
char *makestring (char (*a)[SIZE])
{
char *str = malloc (SIZE * SIZE + SIZE), *p = str; /* allocate */
if (!str) { /* validate EVERY allocation */
perror ("malloc-str");
return NULL;
}
for (int i = 0; i < SIZE; i++) { /* for each row */
if (i) /* if row not 1st */
*p++ = ' '; /* add space */
for (int j = 0; j < SIZE; j++) /* for each char */
*p++ = *(*(a + i) + j); /* copy to str */
}
*p = 0; /* nul-terminate string */
return str; /* return pointer to allocated string */
}
int main (void) {
char matrix[SIZE][SIZE] = { {'U','N','T','E'}, /* don't use globals */
{'C','P','G','X'}, /* declare in main and */
{'D','L','A','B'}, /* pass as a parameter */
{'J','T','N','N'} },
*str;
if ((str = makestring (matrix))) { /* validate call to makestring */
printf ("str: '%s'\n", str); /* output string */
free (str); /* free allocated memory */
}
}
(note: don't forget to free the memory you allocate. While it will be freed on program exit, get in the habit of tracking your allocations and ensuring all blocks you allocate are freed)
Example Use/Output
$ ./bin/makestring
str: 'UNTE CPGX DLAB JTNN'
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/makestring
==6576== Memcheck, a memory error detector
==6576== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6576== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6576== Command: ./bin/makestring
==6576==
str: 'UNTE CPGX DLAB JTNN'
==6576==
==6576== HEAP SUMMARY:
==6576== in use at exit: 0 bytes in 0 blocks
==6576== total heap usage: 1 allocs, 1 frees, 20 bytes allocated
==6576==
==6576== All heap blocks were freed -- no leaks are possible
==6576==
==6576== For counts of detected and suppressed errors, rerun with: -v
==6576== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
You need to specify the size of allocbuf so it can hold the entire result:
char allocbuf[SIZE * (SIZE + 1)];
There's no need for allocp, because the array name will decay to a pointer when used in calculations. In MakeString, you need to loop over the rows and characters of matrix, copying them to allocbuf.
char *MakeString(void)
{
    for (int i = 0; i < SIZE; i++) {
        // each row occupies SIZE characters plus 1 separator in allocbuf
        char *dest = allocbuf + i * (SIZE + 1);
        memcpy(dest, matrix + i, SIZE);     // requires #include <string.h>
        if (i < SIZE - 1) {
            // write space between rows
            *(dest + SIZE) = ' ';
        } else {
            // write null at end
            *(dest + SIZE) = 0;
        }
    }
    return allocbuf;
}
The instructions don't mention the n argument to MakeString(), so I removed it.

Resources