realloc fails to expand char array when piping input from a file - c

I wrote the C function below to read a string from the user. It uses realloc to grow the char array dynamically so that it can hold input of unknown length. As I understand it, it should accept as much input as memory allows. However, when I pipe text into it from a randomized text file (I used "tr '\n' ' ' ./random.txt" to remove any newlines from the text file), I get the "Unable to allocate memory to hold char array. Exiting!" error message. Why is this occurring? Shouldn't my array be able to hold gigabytes of data, since I have 16 GB of RAM and the array is designed to grow dynamically?
#include <stdio.h>
#include <stdlib.h>
void GetString(int*, int*);
int main(void)
{
unsigned int strLength = 32;
char *stringPtr = malloc(strLength);
if (stringPtr == NULL)
{
fprintf(stderr, "Unable to allocate memory to hold char array. Exiting!\n");
return 1;
}
printf("Enter some input: ");
int c = EOF;
unsigned int i = 0;
while ((c = getchar()) != '\n' && c != EOF)
{
stringPtr[i++] = (char) c;
if (i == strLength)
{
strLength *= strLength;
if ((stringPtr = realloc(stringPtr, strLength)) == NULL)
{
fprintf(stderr, "Unable to expand memory to hold char array. Exiting!\n");
return 2;
}
}
}
stringPtr[i] = '\0';
if (sizeof(stringPtr) < strLength)
{
stringPtr = realloc(stringPtr, i);
}
printf("\n\nString value: %s\n\n\n", stringPtr);
free(stringPtr);
stringPtr = NULL;
}

I modified your program a bit to help figure out what's going wrong:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main(void)
{
unsigned int strLength = 32;
char *stringPtr = malloc(strLength);
if (!stringPtr)
{
fprintf(stderr, "failed to allocate %u bytes: %s\n",
strLength, strerror(errno));
return 1;
}
int c = EOF;
unsigned int i = 0;
while ((c = getchar()) != '\n' && c != EOF)
{
stringPtr[i++] = (char) c;
if (i == strLength)
{
unsigned int nStrLength = strLength;
nStrLength *= nStrLength;
if (nStrLength <= strLength)
{
fprintf(stderr, "cannot grow string of %u bytes any more\n",
strLength);
return 1;
}
if ((stringPtr = realloc(stringPtr, nStrLength)) == NULL)
{
fprintf(stderr,
"failed to enlarge string from %u to %u bytes: %s\n",
strLength, nStrLength, strerror(errno));
return 1;
}
strLength = nStrLength;
}
}
return 0;
}
When run more-or-less as you did, this is what I get:
$ yes | tr -d '\n' | ./a.out
cannot grow string of 1048576 bytes any more
1048576 is one megabyte, but more importantly, it's 2^20. The square of 2^20 is 2^40, which is bigger than 2^32 − 1, the largest value that can be represented in an unsigned int on this system. Unsigned arithmetic wraps around modulo 2^32, so the product comes out as 0 rather than 2^40. I predict that you will get the same results on your system.
I therefore recommend you make three changes:
1. As I mentioned already, all of those unsigned int variables should be size_t instead.
2. As chux mentioned already, change your code to just multiply strLength by two instead of by itself.
3. Incorporate an explicit check for overflow along the lines of what I have done here. Or adopt reallocarray, which probably isn't in your C library but you can drop in from the link. [EDIT: reallocarray is still a good idea in general, but it doesn't help with this class of numeric-overflow bugs, because it's the number of items in the array that is overflowing, not the product of item count and size.]
Also, it wasn't your immediate problem this time, but for future reference, strerror(errno) is your friend. Always print strerror(errno) when a system primitive fails.
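The recommendations above can be combined in a small helper. This is a minimal sketch rather than the answer's own code: the names next_capacity and read_line_dyn are hypothetical, and the starting size of 32 is simply carried over from the question.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the doubled capacity, or 0 if doubling
   would overflow size_t. A capacity of 0 yields the starting size. */
static size_t next_capacity(size_t cap)
{
    if (cap == 0)
        return 32;
    if (cap > SIZE_MAX / 2)
        return 0;
    return cap * 2;
}

/* Hypothetical helper: reads one line (without the newline) from fp into
   a heap buffer that doubles as needed. Returns NULL on overflow or
   allocation failure; the caller frees the result. */
static char *read_line_dyn(FILE *fp)
{
    size_t cap = next_capacity(0);
    size_t len = 0;
    char *buf = malloc(cap);
    int c;                              /* int, so EOF stays distinct */

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {           /* keep one byte for the '\0' */
            size_t ncap = next_capacity(cap);
            char *tmp = ncap ? realloc(buf, ncap) : NULL;
            if (tmp == NULL) {          /* overflow or out of memory */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = ncap;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_line_dyn(stdin) would take the place of the loop in main. Because the capacity is a size_t that doubles, the buffer can keep growing until realloc itself refuses the request.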

Related

How to store arrays inside array of pointers

I'm trying to implement a little program that takes a text, breaks it into lines, and sorts them in alphabetical order, but I ran into a problem. I have a readlines function that updates an array of pointers called lines. The problem is that when I try to printf the first pointer in lines as a string using %s, nothing is printed and there are no errors.
I have used strcpy to copy every single text line (a local char array) into a pointer variable and then stored that pointer in the lines array, but that gave me an error.
Here is the code:
#include <stdio.h>
#define MAXLINES 4
#define MAXLENGTH 1000
char *lines[MAXLINES];
void readlines() {
int i;
for (i = 0; i < MAXLINES; i++) {
char c, line[MAXLENGTH];
int j;
for (j = 0; (c = getchar()) != '\0' && c != '\n' && j < MAXLENGTH; j++) {
line[j] = c;
}
lines[i] = line;
}
}
int main(void) {
readlines();
printf("%s", lines[0]);
getchar();
return 0;
}
One problem is the following line:
lines[i] = line;
In this line, you make lines[i] point to line. However, line is a local char array whose lifetime ends as soon as the current loop iteration ends. Therefore, lines[i] will contain a dangling pointer (i.e. a pointer to an object that is no longer valid) as soon as the loop iteration ends.
For this reason, when you later call
printf("%s", lines[0]);
lines[0] is pointing to an object whose lifetime has ended. Dereferencing such a pointer invokes undefined behavior. Therefore, you cannot rely on getting any meaningful output, and your program may crash.
One way to fix this would be to make lines not an array of pointers, but rather a multidimensional array of char, i.e. an array of strings:
char lines[MAXLINES][MAXLENGTH+1];
Now you have a proper place for storing the strings, and you no longer need the local array line in the function readlines.
Another issue is that the line
printf("%s", lines[0]);
requires that lines[0] points to a string, i.e. to an array of characters terminated by a null character. However, you did not put a null character at the end of the string.
After fixing all of the issues mentioned above, your code should look like this:
#include <stdio.h>
#define MAXLINES 4
#define MAXLENGTH 1000
char lines[MAXLINES][MAXLENGTH+1];
void readlines() {
int i;
for (i = 0; i < MAXLINES; i++) {
char c;
int j;
for (j = 0; (c = getchar()) != '\0' && c != '\n' && j < MAXLENGTH; j++) {
lines[i][j] = c;
}
//add terminating null character
lines[i][j] = '\0';
}
}
int main(void) {
readlines();
printf("%s", lines[0]);
return 0;
}
However, this code still has a few issues, which are probably unrelated to your immediate problem, but could cause trouble later:
The function getchar will return EOF, not '\0', when there is no more data (or when an error occurred). Therefore, you should compare the return value of getchar with EOF instead of '\0'. However, a char is not guaranteed to be able to store the value of EOF. Therefore, you should store the return value of getchar in an int instead. Note that getchar returns a value of type int, not char.
When j reaches MAXLENGTH, you will call getchar one additional time before terminating the loop. This can cause undesired behavior, such as your program waiting for more user input or an important character being discarded from the input stream.
In order to also fix these issues, I recommend the following code:
#include <stdio.h>
#define MAXLINES 4
#define MAXLENGTH 1000
char lines[MAXLINES][MAXLENGTH+1];
void readlines() {
int i;
for (i = 0; i < MAXLINES; i++)
{
//changed type from "char" to "int"
int c;
int j;
for ( j = 0; j < MAXLENGTH; j++ )
{
if ( (c = getchar()) == EOF || c == '\n' )
break;
lines[i][j] = c;
}
//add terminating null character
lines[i][j] = '\0';
}
}
int main(void) {
readlines();
printf("%s", lines[0]);
return 0;
}
Problem 1
char *lines[MAXLINES];
For the compiler it makes no difference how you write this, but for you, as you are learning C, it may be worth considering different spacing and naming. The question is: what is lines[]? lines[] is supposed to be an array of strings holding some text. So lines[0] is a string, lines[1] is a string, and so on. As pointed out in a comment, you could also use char lines[MAXLINES][MAXLENGTH] and have a 2D box of NxM char. That way the number and size of the lines are fixed in advance and no memory allocation is needed, which keeps things simple, at the cost of wasting space for lines shorter than MAXLENGTH chars and of being limited to a fixed number of lines.
A more flexible way is to use an array of pointers. Since each pointer represents a single line, the singular name
char* line[MAXLINES];
is a better picture of the use: line[0] is char*, line[1] is char*, and so on. But you need to allocate memory for each line, and your code does not.
Remember int main(int argc, char** argv): a count plus a char** is the most flexible way, since it can hold any number of lines. The cost? Additional allocations.
size_t n_lines;
char** line;
This may be the best representation, familiar to every C programmer since K&R.
Problem 2
for (
j = 0;
(c = getchar()) != '\0' && c != '\n' && j < MAXLENGTH;
j++) {
line[j] = c;
}
lines[i] = line;
This loop does not copy the final 0 that terminates each string, and it reuses the same line, a char[], to hold the data for every line read. The final statement does not copy a string either: it stores only a pointer. There is no string there anyway, since the terminating 0 was never written, and there is no data, since the area is reused on each iteration.
A complete C example of loading a file into a container in memory
I will leave an example of a more controlled way of writing this: a container for a set of lines, and even a sorting function.
a data structure
The plan is to build an array of pointers, as the system does for main. Since we do not know the number of lines in advance and do not want that limitation, we will allocate memory in groups of blk_size lines. At any time we have limit pointers available; of these, size are in use. line[] is char* and points to a single line of text. The struct is
typedef struct
{
size_t blk_size; // block
size_t limit; // actual allocated size
size_t size; // size in use
char** line; // the lines
} Block;
the test function
Block* load_file(const char*);
Plan is to call load_file("x.txt") and the function returns a Block* pointing to the array representing the lines in file, one by one. Then we call qsort() and sort the whole thing. If the program is called lines we will run
lines x.txt
and it will load the file x.txt, show its contents on screen, sort it, show the sorted lines and then erase everything at exit.
main() for the test
int main(int argc, char** argv)
{
char msg[80] = {0};
if (argc < 2) usage();
Block* test = load_file(argv[1]);
sprintf(msg, "==> Loading \"%s\" into memory", argv[1]);
status_blk(test, msg);
qsort(test->line, test->size, sizeof(void*), cmp_line);
sprintf(msg, "==> \"%s\" after sort", argv[1]);
status_blk(test, msg);
test = delete_blk(test);
return 0;
};
As planned
load_file() is the constructor and loads the file contents into a Block.
status_blk() shows the contents and accepts a convenient optional message.
qsort() sorts the lines using a one-line cmp_line() function.
status_blk() is called again and shows the now sorted contents.
As in C++, delete_blk() is the destructor and erases the whole thing.
output using main() as tlines.c for testing
PS M:\> .\lines tlines.c
loading "tlines.c" into memory
Block extended for a total of 16 pointers
==> Loading "tlines.c" into memory
Status: 13 of 16 lines. [block size is 8]:
1 int main(int argc, char** argv)
2 {
3 char msg[80] = {0};
4 if (argc < 2) usage();
5 Block* test = load_file(argv[1]);
6 sprintf(msg, "==> Loading \"%s\" into memory", argv[1]);
7 status_blk(test, msg);
8 qsort(test->line, test->size, sizeof(void*), cmp_line);
9 sprintf(msg, "==> \"%s\" after sort", argv[1]);
10 status_blk(test, msg);
11 test = delete_blk(test);
12 return 0;
13 };
==> "tlines.c" after sort
Status: 13 of 16 lines. [block size is 8]:
1 Block* test = load_file(argv[1]);
2 char msg[80] = {0};
3 if (argc < 2) usage();
4 qsort(test->line, test->size, sizeof(void*), cmp_line);
5 return 0;
6 sprintf(msg, "==> Loading \"%s\" into memory", argv[1]);
7 sprintf(msg, "==> \"%s\" after sort", argv[1]);
8 status_blk(test, msg);
9 status_blk(test, msg);
10 test = delete_blk(test);
11 int main(int argc, char** argv)
12 {
13 };
About the code
I am not sure it needs much explanation: a single function does the file loading, and it has around 20 lines of code. The other functions have fewer than 10. The whole file is represented in line, which is a char**, and Block holds the needed info about the actual size.
Since line[] is an array of pointers we can call
qsort(test->line, test->size, sizeof(void*), cmp_line);
and use
int cmp_line(const void* one, const void* other)
{
return strcmp(
*((const char**)one), *((const char**)other));
}
using strcmp() to compare the strings and have the lines sorted.
create_blk() accepts a block size for use in the calls to realloc(), for efficiency.
Deleting a Block is a 3-step free() in the reverse order of allocation.
The complete code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{
size_t blk_size; // block
size_t limit; // actual allocated size
size_t size; // size in use
char** line; // the lines
} Block;
Block* create_blk(size_t);
Block* delete_blk(Block*);
int status_blk(Block*, const char*);
Block* load_file(const char*);
int cmp_line(const void*, const void*);
void usage();
int main(int argc, char** argv)
{
char msg[80] = {0};
if (argc < 2) usage();
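Block* test = load_file(argv[1]);
if (test == NULL)
{ // qsort() below must not dereference a NULL Block
fprintf(stderr, "could not load \"%s\"\n", argv[1]);
return EXIT_FAILURE;
}
// snprintf() cannot overrun msg, however long argv[1] is
snprintf(msg, sizeof msg, "\n\n==> Loading \"%s\" into memory", argv[1]);
status_blk(test, msg);
qsort(test->line, test->size, sizeof(void*), cmp_line);
snprintf(msg, sizeof msg, "\n\n==> \"%s\" after sort", argv[1]);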
status_blk(test, msg);
test = delete_blk(test);
return 0;
};
int cmp_line(const void* one, const void* other)
{
return strcmp(
*((const char**)one), *((const char**)other));
}
Block* create_blk(size_t size)
{
Block* nb = (Block*)malloc(sizeof(Block));
if (nb == NULL) return NULL;
nb->blk_size = size;
nb->limit = size;
nb->size = 0;
nb->line = (char**)malloc(sizeof(char*) * size);
if (nb->line == NULL)
{ // do not hand back a half-built Block
free(nb);
return NULL;
}
return nb;
}
Block* delete_blk(Block* blk)
{
if (blk == NULL) return NULL;
for (size_t i = 0; i < blk->size; i += 1)
free(blk->line[i]); // free lines
free(blk->line); // free block
free(blk); // free struct
return NULL;
}
int status_blk(Block* bl,const char* msg)
{
if (msg != NULL) printf("%s\n", msg);
if (bl == NULL)
{
printf("Status: not allocated\n");
return -1;
}
printf(
"Status: %zu of %zu lines. [block size is %zu]:\n",
bl->size, bl->limit, bl->blk_size); // %zu is the size_t format
for (size_t i = 0; i < bl->size; i += 1)
printf("%4zu\t%s", 1 + i, bl->line[i]);
return 0;
}
Block* load_file(const char* f_name)
{
if (f_name == NULL) return NULL;
fprintf(stderr, "loading \"%s\" into memory\n", f_name);
FILE* F = fopen(f_name, "r");
if (F == NULL) return NULL;
// file is open
Block* nb = create_blk(8); // block size is 8
if (nb == NULL)
{
fclose(F);
return NULL;
}
char line[200];
char* p = &line[0];
p = fgets(p, sizeof(line), F);
while (p != NULL)
{
// is block full?
if (nb->size >= nb->limit)
{
const size_t new_sz = nb->limit + nb->blk_size;
char** new_block =
realloc(nb->line, (new_sz * sizeof(char*)));
if (new_block == NULL)
{
fprintf(
stderr,
"\tCould not extend block to %zu "
"lines\n",
new_sz);
break;
}
printf(
"Block extended for a total of %zu "
"pointers\n",
new_sz);
nb->limit = new_sz;
nb->line = new_block;
}
// now copy the line
nb->line[nb->size] = (char*)malloc(1 + strlen(p));
if (nb->line[nb->size] == NULL) break; // out of memory: keep what we have
strcpy(nb->line[nb->size], p);
nb->size += 1;
// read next line
p = fgets(p, sizeof(line), F);
}; // while()
fclose(F);
return nb;
}
void usage()
{
fprintf(stderr,"Use: program file_to_load\n");
exit(EXIT_FAILURE);
}
Try something like this:
#include <stdio.h>
#include <stdlib.h> // for malloc(), free(), exit()
#include <string.h> // for strcpy()
#define MAXLINES 4
#define MAXLENGTH 1000
char *lines[MAXLINES];
void readlines() {
for( int i = 0; i < MAXLINES; i++) {
char line[MAXLENGTH + 1]; // ALWAYS one extra to allow for '\0'
int c; // int, not char, so that EOF can be told apart from real characters
int j = 0;
// RE-USE(!) local array for input characters until NL, EOF or length
// NB: test the length first so no character is read and then discarded
while( j < MAXLENGTH && (c = getchar()) != EOF && c != '\n' )
line[ j++ ] = (char)c;
line[j] = '\0'; // terminate array (transforming it to 'string')
// Attempt to get a buffer to preserve this line
// (Old) compiler insists on casting return from malloc()
if( ( lines[i] = (char*)malloc( (j + 1) * sizeof lines[0][0] ) ) == NULL ) {
fprintf( stderr, "malloc failure\n" );
exit( -1 );
}
strcpy( lines[i], line ); // preserve this line
}
}
int my_main() {
readlines(); // only returns after successfully reading 4 lines of input
for( int i = 0; i < MAXLINES; i++)
printf( "Line %d: '%s'\n", i, lines[i] ); // enhanced
/* Maybe do stuff here */
for( int j = 0; j < MAXLINES; j++) // free up allocated memory.
free( lines[j] );
return 0;
}
If you would prefer to 'factor out' some code (and have your own version of a facility that your library may lack), here's a version:
char *my_strdup( const char *str ) {
size_t len = strlen( str ) + 1; // ALWAYS +1
// Attempt to get a buffer to preserve this line
// (Old) compiler insists on casting return from malloc()
char *pRet = (char*)malloc( len * sizeof *pRet );
if( pRet == NULL ) {
fprintf( stderr, "malloc failure\n" );
exit( -1 );
}
return strcpy( pRet, str );
}
The terminate-and-preserve step is then condensed to:
line[j] = '\0'; // terminate array (transforming it to 'string')
lines[i] = my_strdup( line ); // preserve this line

Getting "Abort trap 6" using memset()

I am relatively new to C, so please bear with me if this is an obvious question. I've looked all over SO for an answer, and have not been able to figure this out.
I am writing a simple calculator -- it will take a calculation from the user ("1 + 3", for example) and return the result. To keep things simple, I am setting a length for the input buffer and forcing the user to stay within those bounds. If they input too many characters, I want to alert them they have gone over the limit, and reset the buffer so that they can re-input.
This functionality works fine when they stay under the limit. It also correctly gives them a message when they go over the limit. However, when they try to input a valid calculation after having put in an invalid one, I get abort trap: 6. I know this has something to do with how I am resetting the array and managing the memory of that buffer, but my C skills are not quite sharp enough to diagnose the problem on my own.
If anybody could please take a look, I'd really appreciate it! I've pasted my code below.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <stdlib.h>
#define BUFFER_SIZE 50
static void ready_for_input()
{
printf("> ");
}
static char *as_string(char buffer[], int size)
{
char *result = (char *)malloc((size + 1) * sizeof(char));
if (!result)
{
fprintf(stderr, "calculator: allocation error");
exit(EXIT_FAILURE);
}
for (int i = 0; i < size; i++)
{
result[i] = buffer[i];
}
// to make it a valid string
result[size] = '\0';
return result;
}
static char *read_line()
{
// put the input into a buffer
char buffer[BUFFER_SIZE], c;
int len = 0;
while (true)
{
c = getchar();
if (c == EOF || c == '\n')
{
// reset if input has exceeded buffer length
if (len > BUFFER_SIZE)
{
printf("Calculations must be under 100 characters long.\n");
memset(buffer, 0, sizeof(buffer));
len = 0;
ready_for_input();
}
else
{
return as_string(buffer, len);
}
}
else
{
buffer[len++] = c;
}
}
}
static void start_calculator()
{
ready_for_input();
char *line = read_line();
printf("input received : %s", line);
}
int main(int argc, char *argv[])
{
start_calculator();
}
You don't prevent the buffer overflow, because you check for it too late. You should check whether the user is about to exceed the buffer's size before storing each character, not after the user hits enter.
The code below slightly improves the way the buffer overflow is checked:
static char *read_line()
{
// put the input into a buffer
char buffer[BUFFER_SIZE];
int c; // getchar should be assigned to an int
int len = 0;
while (true)
{
c = getchar();
if (len >= BUFFER_SIZE)
{
// drop everything until EOF or newline
while (c != EOF && c != '\n')
c = getchar();
printf("Calculations must be under 100 characters long.\n");
memset(buffer, 0, sizeof(buffer));
len = 0;
ready_for_input();
}
else if (c == EOF || c == '\n')
{
return as_string(buffer, len);
}
else
{
buffer[len++] = c;
}
}
}
Another thing to notice is that the result of getchar() should be assigned to an int variable instead of a char, since you are checking for EOF.
Finally, you may want to look into better ways to read a line in C, such as fgets, dynamically allocating memory for your buffer and using realloc (or a combination of malloc and memmove) to double the size when a limit is reached, or using getline.
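As an illustration of the fgets route, here is a minimal sketch. The function name read_bounded_line and its return codes are hypothetical, not part of the original program. It reports an over-long line and discards the rest of it, so the next call starts on a fresh line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Reads one line into buf (capacity cap, cap >= 2).
   Returns  1 if a complete line was stored (newline stripped),
            0 if the line was too long (the rest of it is discarded),
           -1 on EOF or read error with nothing read. */
static int read_bounded_line(FILE *fp, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, fp) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: peek at the next character to find out why. */
    int c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;   /* the line fit exactly, or input ended without a newline */

    /* The line did not fit: drop everything up to the end of the line. */
    while (c != EOF && c != '\n')
        c = getc(fp);
    return 0;
}
```

In the calculator, the caller would loop on this function, print the "too long" message whenever it returns 0, and pass the buffer on for evaluation when it returns 1.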

How to get ASCII code for characters from a text file?

Update: Hello, and thank you all for the help. My initial approach was wrong, and I did not use ASCII codes at all.
Sorry for the late reply. I had a half-day off today and made a new post with the complete code.
(This is an update of an old post.) I wrote the program and it runs with no errors, but it is not giving me the results I wanted.
My only problem is how to check the ASCII code of a character when I read it, and how to store it.
#include <stdio.h>
#include <string.h>
int main()
{
char dictionary[300];
char ch, temp1, temp2;
FILE *test;
test=fopen("HW2.txt","r");
for(int i=0;i<2000;i+=1)
{ ch=fgetc(test);
printf("%c",ch);
}
}
If we are talking about plain ASCII, values go from 0 to 127, so your table should look like:
int dictionary[128] = {0};
Regarding your question:
how to check its ASCII and store it
Think of a char as a tiny int: the two are interchangeable here, and you don't need any conversion.
fgetc returns an int so that it can signal EOF, so store its result in an int. Trying to read 2000 characters from a file that contains fewer than 2000 bytes is also a problem: every call past the end of the file returns EOF, which the loop then prints as if it were a character. To read the whole file:
int c;
while ((c = fgetc(test)) != EOF)
{
if ((c > 0) && (c < 128))
{
dictionary[c]++;
}
}
for (int i = 1; i < 128; i++)
{
if (dictionary[i] > 0)
{
printf("%c appeared %d times\n", i, dictionary[i]);
}
}
EDIT:
Rereading, I see that you want to store words, not chars. That is a bit more difficult, but nothing terrible. Do not limit yourself to 300 words; use dynamic memory:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// A struct to hold the words and the
// number of times it appears
struct words
{
size_t count;
char *word;
};
int main(void)
{
FILE *file;
file = fopen("HW2.txt", "r");
// Always check the result of fopen
if (file == NULL)
{
perror("fopen");
exit(EXIT_FAILURE);
}
struct words *words = NULL;
size_t nwords = 0;
char *word = NULL;
size_t nchars = 1;
size_t i;
int c;
// while there is text to scan
while ((c = fgetc(file)) != EOF)
{
if (isspace(c))
{
if (word != NULL)
{
// Search the word in the table
for (i = 0; i < nwords; i++)
{
// Found, increment the counter
if (strcmp(word, words[i].word) == 0)
{
words[i].count++;
free(word);
break;
}
}
// Not found, add the word to the table
if (i == nwords)
{
struct words *temp;
temp = realloc(words, sizeof(*temp) * (nwords + 1));
if (temp == NULL)
{
perror("realloc");
exit(EXIT_FAILURE);
}
words = temp;
words[nwords].word = word;
words[nwords].count = 1;
nwords++;
}
// Prepare the next word
word = NULL;
nchars = 1;
}
}
else
{
char *temp;
temp = realloc(word, nchars + 1);
if (temp == NULL)
{
perror("realloc");
exit(EXIT_FAILURE);
}
word = temp;
word[nchars - 1] = (char)c;
word[nchars++] = '\0';
}
}
for (i = 0; i < nwords; i++)
{
printf("%s appeared %zu times\n", words[i].word, words[i].count);
free(words[i].word);
}
free(words);
fclose(file);
return 0;
}
In C, characters are, essentially, their ASCII code (or rather, their char or unsigned char value). So once you read a character, you have its ASCII code already.
However, fgetc() doesn't always return a character: it may fail or reach the end of the file. For that reason it returns an int, not an unsigned char, and that int is EOF (a negative value, typically -1) in case of failure.
So:
You need to define an int variable to take the result of fgetc().
If it's not EOF, you can cast the result back into an unsigned char. That's your character and its ASCII value at the same time.
PS - I'm ignoring non-ASCII characters, non-Latin languages etc. (But C mostly ignores them in its basic standard library functions too.)
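To make the int versus unsigned char distinction concrete, here is a minimal sketch. The function count_bytes is a hypothetical name, and the table size of 256 assumes 8-bit bytes.

```c
#include <stdio.h>

/* Hypothetical helper: tallies how often each byte value occurs in fp.
   counts must have room for 256 entries (assumes 8-bit bytes).
   Returns the total number of bytes read. */
static unsigned long count_bytes(FILE *fp, unsigned long counts[256])
{
    unsigned long total = 0;
    int c;                                   /* int, so EOF stays distinct */

    for (int i = 0; i < 256; i++)
        counts[i] = 0;

    while ((c = fgetc(fp)) != EOF) {
        unsigned char ch = (unsigned char)c; /* the character and its code */
        counts[ch]++;
        total++;
    }
    return total;
}
```

The value in ch can be used directly as an array index or printed with %d to see its numeric code. No lookup or conversion function is involved.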

Using realloc to expand buffer while reading from file crashes

I am writing some code that needs to read fasta files, so part of my code (included below) is a fasta parser. As a single sequence can span multiple lines in the fasta format, I need to concatenate multiple successive lines read from the file into a single string. I do this by realloc'ing the string buffer after reading every line, to be the current length of the sequence plus the length of the line read in. I do some other stuff, like stripping white space. All goes well for the first sequence, but fasta files can contain multiple sequences. So, similarly, I have a dynamic array of structs with two strings (title and actual sequence), each being a "char *". Again, as I encounter a new title (introduced by a line beginning with '>'), I increment the number of sequences and realloc the sequence list buffer. The realloc segfaults on allocating space for the second sequence with
*** glibc detected *** ./stackoverflow: malloc(): memory corruption: 0x09fd9210 ***
Aborted
For the life of me I can't see why. I've run it through gdb and everything seems to be working (i.e. everything is initialised, the values seems sane)... Here's the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <errno.h>
//a struture to keep a record of sequences read in from file, and their titles
typedef struct {
char *title;
char *sequence;
} sequence_rec;
//string convenience functions
//checks whether a string consists entirely of white space
int empty(const char *s) {
int i;
i = 0;
while (s[i] != 0) {
if (!isspace(s[i])) return 0;
i++;
}
return 1;
}
//substr allocates and returns a new string which is a substring of s from i to
//j exclusive, where i < j; If i or j are negative they refer to distance from
//the end of the s
char *substr(const char *s, int i, int j) {
char *ret;
if (i < 0) i = strlen(s)-i;
if (j < 0) j = strlen(s)-j;
ret = malloc(j-i+1);
strncpy(ret,s,j-i);
return ret;
}
//strips white space from either end of the string
void strip(char **s) {
int i, j, len;
char *tmp = *s;
len = strlen(*s);
i = 0;
while ((isspace(*(*s+i)))&&(i < len)) {
i++;
}
j = strlen(*s)-1;
while ((isspace(*(*s+j)))&&(j > 0)) {
j--;
}
*s = strndup(*s+i, j-i);
free(tmp);
}
int main(int argc, char**argv) {
sequence_rec *sequences = NULL;
FILE *f = NULL;
char *line = NULL;
size_t linelen;
int rcount;
int numsequences = 0;
f = fopen(argv[1], "r");
if (f == NULL) {
fprintf(stderr, "Error opening %s: %s\n", argv[1], strerror(errno));
return EXIT_FAILURE;
}
rcount = getline(&line, &linelen, f);
while (rcount != -1) {
while (empty(line)) rcount = getline(&line, &linelen, f);
if (line[0] != '>') {
fprintf(stderr,"Sequence input not in valid fasta format\n");
return EXIT_FAILURE;
}
numsequences++;
sequences = realloc(sequences,sizeof(sequence_rec)*numsequences);
sequences[numsequences-1].title = strdup(line+1); strip(&sequences[numsequences-1].title);
rcount = getline(&line, &linelen, f);
sequences[numsequences-1].sequence = malloc(1); sequences[numsequences-1].sequence[0] = 0;
while ((!empty(line))&&(line[0] != '>')) {
strip(&line);
sequences[numsequences-1].sequence = realloc(sequences[numsequences-1].sequence, strlen(sequences[numsequences-1].sequence)+strlen(line)+1);
strcat(sequences[numsequences-1].sequence,line);
rcount = getline(&line, &linelen, f);
}
}
return EXIT_SUCCESS;
}
You should use strings that look something like this:
struct string {
size_t len; // bytes in use
size_t cap; // bytes allocated
char *ptr;
};
This prevents strncpy bugs like the one you seem to have hit, and lets you do strcat and friends faster.
You should also use a doubling array for each string. This prevents too many allocations and memcpys. Something like this:
int sstrcat(struct string *a, const struct string *b)
{
size_t len = a->len + b->len;
if (a->cap < len) {
size_t cap = a->cap ? a->cap : 16; // doubling 0 would never grow
while (cap < len) {
cap *= 2;
}
char *tmp = realloc(a->ptr, cap);
if (tmp == NULL) {
return ENOMEM; // a->ptr is still valid here
}
a->ptr = tmp;
a->cap = cap;
}
memcpy(&a->ptr[a->len], b->ptr, b->len);
a->len = len;
return 0;
}
I now see you are doing bioinformatics, which means you probably need more performance than I thought. You should use strings like this instead:
struct string {
int len;
char ptr[]; // C99 flexible array member ([0] is a GNU extension)
};
This way, when you allocate a string object, you call malloc(sizeof(struct string) + len) and avoid a second call to malloc. It's a little more work but it should help measurably, in terms of speed and also memory fragmentation.
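A minimal sketch of that single-allocation layout follows, using the standard C99 flexible array member. The names fstring and fstring_new are hypothetical, and the extra terminating byte is a convenience assumed here, not something the answer specifies.

```c
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string whose bytes are stored inline, right after
   the header, thanks to the flexible array member. */
struct fstring {
    size_t len;
    char data[];
};

/* Hypothetical constructor: copies len bytes from src into one
   allocation that holds both the header and the characters.
   Returns NULL on failure; release with a single free(). */
static struct fstring *fstring_new(const char *src, size_t len)
{
    struct fstring *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->data, src, len);
    s->data[len] = '\0';    /* keeps data usable with the str* functions */
    return s;
}
```

The trade-off is that growing such a string means reallocating the whole object, so any pointer to it held elsewhere has to be updated.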
Finally, if this isn't actually the source of error, it looks like you have some corruption. Valgrind should help you detect it if gdb fails.
One potential issue is here:
strncpy(ret,s,j-i);
return ret;
ret might not get a null terminator. See man strncpy:
char *strncpy(char *dest, const char *src, size_t n);
...
The strncpy() function is similar, except that at most n bytes of src
are copied. Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null terminated.
There's also a bug here:
j = strlen(*s)-1;
while ((isspace(*(*s+j)))&&(j > 0)) {
What if strlen(*s) is 0? You'll end up reading (*s)[-1].
You also don't check in strip() that the string doesn't consist entirely of spaces. If it does, you'll end up with j < i.
edit: Just noticed that your substr() function doesn't actually get called.
I think the memory corruption problem might be the result of how you're handling the data used in your getline() calls. Basically, line is reallocated via strndup() in the calls to strip(), so the buffer size being tracked in linelen by getline() will no longer be accurate. getline() may overrun the buffer.
while ((!empty(line))&&(line[0] != '>')) {
strip(&line); // <-- assigns a `strndup()` allocation to `line`
sequences[numsequences-1].sequence = realloc(sequences[numsequences-1].sequence, strlen(sequences[numsequences-1].sequence)+strlen(line)+1);
strcat(sequences[numsequences-1].sequence,line);
rcount = getline(&line, &linelen, f); // <-- the buffer `line` points to might be
// smaller than `linelen` bytes
}
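One way to sidestep the problem is to trim the line inside the buffer that getline() owns, instead of replacing it with a new allocation. The sketch below is an assumption about how strip() could be rewritten, with the hypothetical name strip_in_place. It also handles the empty and all-white-space cases mentioned above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical replacement for strip(): trims leading and trailing
   white space inside the existing buffer, so the pointer and capacity
   that getline() tracks remain valid. */
static void strip_in_place(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);     /* one past the last character */

    while (start < end && isspace((unsigned char)s[start]))
        start++;
    while (end > start && isspace((unsigned char)s[end - 1]))
        end--;

    memmove(s, s + start, end - start); /* regions may overlap */
    s[end - start] = '\0';
}
```

With this version the call becomes strip_in_place(line), and line is still the buffer that getline() allocated, so the size recorded in linelen stays accurate.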

txt to separate strings in c

I have been trying to read characters from a txt file (in which the words that will become strings are separated by spaces) and import them into strings in my code. So far I have only managed to print the words. How can I store them in strings?
The code that prints the words is below, but I also need it to save the strings into arrays or pointers, if possible.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(){
FILE *fp;
int i=0;
char *words=NULL,*word=NULL,c;
if ((fp=fopen("monologue.txt","r"))==NULL){ /*Where monologue txt is a normal file with plain text*/
printf("Error Opening File\n");
exit(1);}
while ((c = fgetc(fp))!= EOF){
if (c=='\n'){ c = ' '; }
words = (char *)realloc(words, ++i*sizeof(char));
words[i-1]=c;}
word=strtok(words," ");
while(word!= NULL){
printf("%s\n",word);
word = strtok(NULL," ");}
exit(0);
}
Your code is rather hard to read. Here is almost identical code that is (I submit) considerably more readable:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
const char filename[] = "monologue.txt";
FILE *fp;
int i = 0;
char *words = NULL;
char *word = NULL;
int c;
if ((fp = fopen(filename, "r")) == NULL)
{
/*Where monologue txt is a normal file with plain text*/
fprintf(stderr, "Error opening file %s\n", filename);
exit(1);
}
while ((c = fgetc(fp)) != EOF)
{
if (c == '\n')
c = ' ';
words = (char *)realloc(words, ++i * sizeof(char));
words[i-1] = c;
}
word = strtok(words, " ");
while (word != NULL)
{
printf("%s\n", word);
word = strtok(NULL, " ");
}
return(0);
}
This shows us that you are slurping the entire file into the string pointed to by words, but you are doing so rather inefficiently in that you are reallocating memory one byte at a time for each byte read. You should be looking to do things much more effectively, by reading bigger chunks of the file into memory. For example, you might allocate an initial buffer of 32 KiB; you could read into that buffer using fread(); if you don't encounter EOF, you could then reallocate the space, doubling the amount available to you. (For testing, you'd start with a much smaller block - maybe 16 bytes, maybe even as small as 4 bytes; this ensures you test the memory reallocation code, whereas 32 KiB would probably seldom exercise the reallocation code.)
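The chunked approach described above might be sketched as follows. This is an illustration under assumptions, not the code the answer goes on to mention: the name slurp and its parameters are hypothetical, and the initial capacity is left to the caller so that a tiny value can be used for testing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads all of fp into a heap buffer that doubles
   whenever it fills up. On success returns a null-terminated buffer and
   stores its length in *out_len; on failure returns NULL. */
static char *slurp(FILE *fp, size_t initial_cap, size_t *out_len)
{
    size_t cap = initial_cap < 2 ? 2 : initial_cap;
    size_t len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always leave one byte free for the terminator. */
        len += fread(buf + len, 1, cap - len - 1, fp);
        if (len < cap - 1)
            break;                      /* short read: EOF or error */

        if (cap > (size_t)-1 / 2) {     /* doubling would overflow */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Passing 32 * 1024 as initial_cap matches the size suggested above, while a value such as 4 forces several reallocations on even a short file.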
You also need to ensure that your string is null terminated; as it stands, it is not. You would need to do a final realloc() to make space for the null terminator too.
You can avoid mapping newlines during input since strtok() can be given a list of characters on which to split, so you can add newline to that list.
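A minimal sketch of that idea, assuming a hypothetical helper named split_words() and a caller-supplied array of pointers: the delimiter string passed to strtok() contains both a space and a newline, so the input does not need to be rewritten first.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split text in place on spaces and newlines,
 * storing at most max_words pointers in words[]. Returns the number stored.
 * The text must be a modifiable, null-terminated string. */
static size_t split_words(char *text, char **words, size_t max_words)
{
    assert(text != NULL && words != NULL);
    static const char delimiters[] = " \n";   /* space and newline */
    size_t count = 0;
    char *word = strtok(text, delimiters);
    while (word != NULL && count < max_words)
    {
        words[count++] = word;
        word = strtok(NULL, delimiters);
    }
    return count;
}
```

Because strtok() treats a run of adjacent delimiters as a single separator, blank lines and doubled spaces do not produce empty words.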
To generate a list of words, you need to adapt the loop around strtok(). You might simply count the spaces and newlines and then allocate enough pointers to point to that many words; you might have an overestimate if there are adjacent spaces or newlines, but better over than under. Alternatively, you can allocate, for the sake of argument, 16 pointers. As you process the first 16 words, you use these pointers; when you run out of space, you double the number of pointers allocated, and use the new supply until that runs out. You can use any algorithm that allocates a significant number of pointers (meaning 'more than one' and 'increasing as the number already used goes up') instead of simple doubling, but doubling has its merits (notably, it is simple).
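The counting option could be sketched as follows. The function name max_word_count() is an assumption for illustration. It returns the number of delimiter characters plus one, which is never less than the number of words strtok() will find with the same delimiter string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: upper bound on the number of words in text.
 * N separators can divide the text into at most N + 1 words; adjacent
 * separators make this an overestimate, which is safe for sizing an array. */
static size_t max_word_count(const char *text, const char *delimiters)
{
    assert(text != NULL && delimiters != NULL);
    size_t separators = 0;
    for (const char *p = text; *p != '\0'; p++)
    {
        if (strchr(delimiters, *p) != NULL)
            separators++;
    }
    return separators + 1;
}
```

The result can size the pointer array in a single allocation, for example malloc(max_word_count(data, " \n") * sizeof(char *)), at the cost of one extra pass over the text before tokenizing.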
One word of caution: you should never assign the result of realloc() to the variable that is its first argument:
words = (char *)realloc(words, ++i * sizeof(char)); // Bad!
The trouble is that if realloc() fails, you've just wiped out the only pointer to the previously allocated memory, so you have leaked it all. Always assign to a new variable, test that it worked, then copy the result:
char *new_space = (char *)realloc(words, ++i * sizeof(char));
if (new_space == 0)
{
fprintf(stderr, "Memory allocation failed at size %d\n", i);
exit(1);
}
words = new_space;
I assembled this code yesterday. Notice that it uses functions to do repeated jobs, such as checking that memory allocation succeeded. There is room to improve it (there always is). It still does character-at-a-time input (and therefore newline mapping), but it allocates increasingly large chunks of memory so that it does not do memory allocation on every character read. The err_exit() function is a useful skeleton; you can flesh it out into a much more complex system, but the basic idea of a function that reports an error and exits (with a behaviour similar to fprintf() + exit()) can simplify programs a lot. Error checking and reporting is important, but it needs to be simple when it can be.
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void err_exit(const char *format, ...);
static void *emalloc(size_t nbytes);
static void *erealloc(void *old_space, size_t nbytes);
int main(void)
{
const char filename[] = "monologue.txt";
FILE *fp;
size_t i = 0;
size_t len_data = 4;
char *data = emalloc(len_data);
int c;
/* Read data from file */
if ((fp = fopen(filename, "r")) == NULL)
err_exit("Error opening file %s\n", filename);
while ((c = fgetc(fp)) != EOF)
{
if (c == '\n')
c = ' ';
if (i >= len_data)
{
assert(i == len_data);
data = erealloc(data, 2 * len_data);
len_data *= 2;
}
data[i++] = c;
}
if (i >= len_data)
{
assert(i == len_data);
data = erealloc(data, len_data + 1);
len_data++;
}
data[i] = '\0';
fclose(fp);
/* Split file into words */
size_t len_wordlist = 16;
size_t num_words = 0;
char **wordlist = emalloc(len_wordlist * sizeof(char *));
char *location = data;
char *word;
for (num_words = 0; (word = strtok(location, " ")) != NULL; num_words++)
{
if (num_words >= len_wordlist)
{
assert(num_words == len_wordlist);
wordlist = erealloc(wordlist, 2 * len_wordlist * sizeof(char *));
len_wordlist *= 2;
}
wordlist[num_words] = word;
location = NULL;
}
/* Print the word list - one per line */
for (i = 0; i < num_words; i++)
printf("%zu: %s\n", i, wordlist[i]);
/* Release allocated space */
free(data);
free(wordlist);
return(0);
}
static void err_exit(const char *format, ...)
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
va_end(args);
exit(1);
}
static void *emalloc(size_t nbytes)
{
void *new_space = malloc(nbytes);
if (new_space == 0)
err_exit("Failed to allocate %zu bytes of memory\n", nbytes);
return(new_space);
}
static void *erealloc(void *old_space, size_t nbytes)
{
void *new_space = realloc(old_space, nbytes);
if (new_space == 0)
err_exit("Failed to reallocate %zu bytes of memory\n", nbytes);
return(new_space);
}
Try this. I've modified very little about your code, just to keep it close to your starting point. The main thing I did was add allwords which is an array of char * (this is where I store each string one by one). Then right after printing each version of word (what you were already doing), I also copied it into the next open slot in the allwords array. At the end I added another printing loop to display the contents of each string.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXWORDS 999
int main(){
FILE *fp;
int i=0, j;
char *words=NULL,*word=NULL;
int c; /* int, not char, so that EOF can be distinguished from valid characters */
char *allwords[MAXWORDS];
if ((fp=fopen("monologue.txt","r"))==NULL){ /*Where monologue txt is a normal file with plain text*/
printf("Error Opening File\n");
exit(1);}
while ((c = fgetc(fp))!= EOF){
if (c=='\n'){ c = ' '; }
words = (char *)realloc(words, ++i*sizeof(char));
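words[i-1]=c;}
words = (char *)realloc(words, i+1); /* make room for the null terminator that strtok() needs */
words[i]='\0';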
word=strtok(words," ");
i=0;
while(word!= NULL && i < MAXWORDS){
printf("%s\n",word);
allwords[i] = malloc(strlen(word) + 1); /* +1 for the null terminator copied by strcpy() */
strcpy(allwords[i], word);
word = strtok(NULL," ");
i++;
}
printf("\nNow printing each saved string:\n");
for (j=0; j<i; j++)
printf("String %d: %s\n", j, allwords[j]);
exit(0);
}
