Let's say we have a string of words that are delimited by a comma.
I want to write a code in C to store these words in a variable.
Example
amazon, google, facebook, twitter, salesforce, sfb
We do not know how many words are present.
If I were to do this in C, I thought I need to do 2 iterations.
First iteration, I count how many words are present.
Then, in the next iteration, I store each words.
Step 1: 1st loop -- count number of words
....
....
//End 1st loop. num_words is set.
Step 2:
// Do malloc using num_words.
char **array = (char**)malloc(num_words* sizeof(char*));
Step 3: 2nd loop -- Store each word.
// First, walk until the delimiter and determine the length of the word
// Once len_word is determined, do malloc
*array= (char*)malloc(len_word * sizeof(char));
// And then store the word to it
// Do this for all words and then the 2nd loop terminates
Can this be done more efficiently?
I do not like having 2 loops. I think there must be a way to do it in 1 loop with just basic pointers.
The only restriction is that this needs to be done in C (due to constraints that are not in my control)
You don't need to do a separate pass to count the words. You can use realloc to enlarge the array on the fly as you read in the data on a single pass.
To parse an input line buffer, you can use strtok to tokenize the individual words.
When saving the parsed words into the word list array, you can use strdup to create a copy of the tokenized word. This is necessary for the word to persist. That is, whatever you were pointing to in the line buffer on the first line will get clobbered when you read the second line (and so on ...)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
char **words;
size_t wordmax;
size_t wordcount;
int
main(int argc,char **argv)
{
char *cp;
char *bp;
FILE *fi;
char buf[5000];
--argc;
++argv;
// get input file name
cp = *argv;
if (cp == NULL) {
printf("no file specified\n");
exit(1);
}
// open input file
fi = fopen(cp,"r");
if (fi == NULL) {
printf("unable to open file '%s' -- %s\n",cp,strerror(errno));
exit(1);
}
while (1) {
// read in next line -- bug out if EOF
cp = fgets(buf,sizeof(buf),fi);
if (cp == NULL)
break;
bp = buf;
while (1) {
// tokenize the word
cp = strtok(bp," \t,\n");
if (cp == NULL)
break;
bp = NULL;
// expand the space allocated for the word list [if necessary]
if (wordcount >= wordmax) {
// this is an expensive operation so don't do it too often
wordmax += 100;
words = realloc(words,(wordmax + 1) * sizeof(char *));
if (words == NULL) {
printf("out of memory\n");
exit(1);
}
}
// get a persistent copy of the word text
cp = strdup(cp);
if (cp == NULL) {
printf("out of memory\n");
exit(1);
}
// save the word into the word array
words[wordcount++] = cp;
}
}
// close the input file
fclose(fi);
// add a null terminator
words[wordcount] = NULL;
// trim the array to exactly what we need/used
words = realloc(words,(wordcount + 1) * sizeof(char *));
// NOTE: because we added the terminator, _either_ of these loops will
// print the word list
#if 1
for (size_t idx = 0; idx < wordcount; ++idx)
printf("%s\n",words[idx]);
#else
for (char **word = words; *word != NULL; ++word)
printf("%s\n",*word);
#endif
return 0;
}
What you're looking for is
http://manpagesfr.free.fr/man/man3/strtok.3.html
(From man page)
The strtok() function parses a string into a sequence of tokens. On the first call to strtok() the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str should be NULL.
But this thread look like duplicate of Split string with delimiters in C
Unless you are forced to produce your own implementation ...
We do not know how many words are present.
We know num_words <= strlen(string) + 1. Only 1 "loop" needed. The cheat here is a quick run down s via strlen().
// *alloc() out-of-memory checking omitted for brevity
char **parse_csv(const char *s) {
size_t slen = strlen(s);
size_t num_words = 0;
char **words = malloc(sizeof *words * (slen + 1));
// find, allocate, copy the words
while (*s) {
size_t len = strcspn(s, ",");
words[num_words] = malloc(len + 1);
memcpy(words[num_words], s, len);
words[num_words][len] = '\0';
num_words++;
s += len; // skip word
if (*s) s++; // skip ,
}
// Only 1 realloc() needed.
realloc(words, sizeof *words *num_words); // right-size words list
return words;
}
It makes send to NULL terminate the list, so
char **words = malloc(sizeof *words * (slen + 1 + 1));
...
words[num_words++] = NULL;
realloc(words, sizeof *words *num_words);
return words;
In considering the worst case for the initial char **words = malloc(...);, I take a string like ",,," with its 3 ',' would make for 4 words "", "", "", "". Adjust code as needed for such pathological cases.
Related
I am trying to read a file and store every word into a dynamically allocated 2D array. The size of the input file is unknown.
I am totally lost and don't know how I could "fix/finish" the program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char filename[25];
printf("Input the filename");
scanf("%s", filename);
fileConverter(filename);
}
int fileConverter(char filename[25]) {
//int maxLines = 50000;
//int maxWordSize = 128;
//char words[maxLines][maxWordSize];
//char **words;
char **arr = (char**) calloc(num_elements, sizeof(char*));
for ( i = 0; i < num_elements; i++ ) {
arr[i] = (char*) calloc(num_elements_sub, sizeof(char));
}
FILE *file = NULL;
int amountOfWords = 0;
file = fopen(filename, "r");
if(file == NULL) {
exit(0);
}
while(fgets(words[amountOfWords], 10000, file)) {
words[amountOfWords][strlen(words[amountOfWords]) - 1] = "\0";
amountOfWords++;
}
for(int i = 0; i < amountOfWords; i++) {
printf("a[%d] = ", i);
printf("%s\n", words[i]);
}
printf("The file contains %d words and the same amount of lines.\n", amountOfWords);
return amountOfWords;
The main challenges for this kind of problem are
reallocating the array of strings as the program reads new words, and
handling words that are larger than the buffer used by fgets.
The general approach for these kind of parsing problems, is to design a state machine. The state machine here has two states:
The current character is whitespace. Action: Continue reading whitespace until we reach the end of the buffer, or until we land on a non-whitespace character, in which case we switch to state 2.
The current character is non-whitespace (i.e. a word). Action: Continue reading non-whitespace until we reach the end of the buffer, or until we land on a whitespace character, in which case we copy the word we just read to the array of strings and switch to state 1.
Particularly difficult is the case in which we are in state 2 and reach the end of the buffer. This means that this word spans multiple buffers. To accommodate for this, we deviate slightly from a direct state machine implementation. State 2 is slightly different, depending on if we are reading a new word or continuing one that was started in a previous buffer.
We now keep track of wordSize. If we start reading from the start of a buffer, but wordSize is not 0, then we know we are continuing a previous word and we know what size it was for the realloc we need.
Below is one possible implementation. All the work is done in the wordArrayRead function. Walking through it from the top of the function:
First we declare the variables that we need across lineBuffer reads: an index for the word itself and the length of the word we are currently reading, followed by the declaration of the buffer itself. The outside loop repeatedly reads using fgets until we have exhausted the input.
We start reading at index 0 and stop at the null-terminator. The first if-statement checks if we should be in state 2: either the current character is the start of a word or we were already reading a word.
State 2
The index wordStartIdx stays at the first character of the word (segment) and we walk the wordEndIdx to the end of the word (segment) or to the end of the buffer.
We then check if we need to increase the size of the array of strings. Here we increase it to 2 times + 1 the previous size to avoid frequent reallocations.
We set a boolean value, indication whether we have reached the end of a word. If we have, we need to allocate for and write the null-terminator at the end of the string.
If wordLength == 0 it means we are reading a new word and have to allocate memory for it for the first time. If wordLength != 0, we have to reallocate to append to an existing word.
We copy the word (segment) currently in the lineBuffer to the array of strings.
Now, we do some bookkeeping. If we reached the end of a word, we write the null-terminator, increment the index to point to the next word location and reset wordLength. If this wasn't the case, we only increment the wordLength with the length of the segment we just read. Finally, we update wordStartIdx, which still points to the start of the word, to point to the end of the word, so we can continue iterating over the buffer.
State 1
Having finishing the State 2 processing, we go into State 1 which has only two lines. It simply advances the index until we land at non-whitespace. Note that the null-terminator of the lineBuffer ('\0') does not count as whitespace, so this loop will not continue past the end of the buffer.
After all input has been processed, we shrink the array of strings to the actual size of its data. This "corrects" the allocation policy of increasing the size by 2n+1 each time it wasn't large enough.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// BUFFER_SIZE must be >1U
#define BUFFER_SIZE 1024U
struct WordArray
{
char **words;
size_t numberOfWords;
};
static struct WordArray wordArrayConstruct(void);
static void wordArrayResize(struct WordArray *wordArray, size_t const newSize);
static void wordArrayDestruct(struct WordArray *wordArray);
static void wordArrayRead(FILE *restrict stream, struct WordArray *wordArray);
static char *reallocStringWrapper(char *restrict str, size_t const newSize);
static void wordArrayPrint(struct WordArray const *wordArray);
int main(void)
{
struct WordArray wordArray = wordArrayConstruct();
wordArrayRead(stdin, &wordArray);
wordArrayPrint(&wordArray);
wordArrayDestruct(&wordArray);
}
static void wordArrayRead(FILE *restrict stream, struct WordArray *wordArray)
{
size_t wordArrayIdx = 0U;
size_t wordLength = 0U;
char lineBuffer[BUFFER_SIZE];
while (fgets(lineBuffer, sizeof lineBuffer, stream) != NULL)
{
size_t wordStartIdx = 0U;
while (lineBuffer[wordStartIdx] != '\0')
{
if (!isspace(lineBuffer[wordStartIdx]) || wordLength != 0U)
{
size_t wordEndIdx = wordStartIdx;
while (!isspace(lineBuffer[wordEndIdx]) && wordEndIdx != BUFFER_SIZE - 1U)
++wordEndIdx;
if (wordArrayIdx >= wordArray->numberOfWords)
wordArrayResize(wordArray, wordArray->numberOfWords * 2U + 1U);
size_t wordSegmentLength = wordEndIdx - wordStartIdx;
size_t foundWordEnd = wordEndIdx != BUFFER_SIZE - 1U; // 0 or 1 bool
// Allocate for a new word, or reallocate for an existing word
// If a word end was found, add 1 to the size for the '\0' character
char *dest = wordLength == 0U ? NULL : wordArray->words[wordArrayIdx];
size_t allocSize = wordLength + wordSegmentLength + foundWordEnd;
wordArray->words[wordArrayIdx] = reallocStringWrapper(dest, allocSize);
memcpy(&(wordArray->words[wordArrayIdx][wordLength]),
&lineBuffer[wordStartIdx], wordSegmentLength);
if (foundWordEnd)
{
wordArray->words[wordArrayIdx][wordLength + wordSegmentLength] = '\0';
++wordArrayIdx;
wordLength = 0U;
}
else
{
wordLength += wordSegmentLength;
}
wordStartIdx = wordEndIdx;
}
while (isspace(lineBuffer[wordStartIdx]))
++wordStartIdx;
}
}
// All done. Shrink the words array to the size of the actual data
if (wordArray->numberOfWords != 0U)
wordArrayResize(wordArray, wordArrayIdx);
}
static struct WordArray wordArrayConstruct(void)
{
return (struct WordArray) {.words = NULL, .numberOfWords = 0U};
}
static void wordArrayResize(struct WordArray *wordArray, size_t const newSize)
{
assert(newSize > 0U);
char **tmp = (char**) realloc(wordArray->words, newSize * sizeof *wordArray->words);
if (tmp == NULL)
{
wordArrayDestruct(wordArray);
fprintf(stderr, "WordArray allocation error\n");
exit(EXIT_FAILURE);
}
wordArray->words = tmp;
wordArray->numberOfWords = newSize;
}
static void wordArrayDestruct(struct WordArray *wordArray)
{
for (size_t wordStartIdx = 0U; wordStartIdx < wordArray->numberOfWords; ++wordStartIdx)
{
free(wordArray->words[wordStartIdx]);
wordArray->words[wordStartIdx] = NULL;
}
free(wordArray->words);
}
static char *reallocStringWrapper(char *restrict str, size_t const newSize)
{
char *tmp = (char*) realloc(str, newSize);
if (tmp == NULL)
{
free(str);
fprintf(stderr, "Realloc string allocation error\n");
exit(EXIT_FAILURE);
}
return tmp;
}
static void wordArrayPrint(struct WordArray const *wordArray)
{
for (size_t wordStartIdx = 0U; wordStartIdx < wordArray->numberOfWords; ++wordStartIdx)
printf("%zu: %s\n", wordStartIdx, wordArray->words[wordStartIdx]);
}
Note: This program reads input from stdin, as Unix/Linux utilities typically do. Use input redirection to read from a file, or provide a file descriptor to the readWordArray function.
to allocate dynamic 2D array you need:
void allocChar2Darray(size_t rows, size_t columns, char (**array)[columns])
{
*array = malloc(rows * sizeof(**array));
}
I wanted to know if there was a way to use scanf so I can take in an unknown number of string arguments and put them into a char* array. I have seen it being done with int values, but can't find a way for it to be done with char arrays. Also the arguments are entered on the same line separated by spaces.
Example:
user enters hello goodbye yes, hello gets stored in array[0], goodbye in array[1] and yes in array[2]. Or the user could just enter hello and then the only thing in the array would be hello.
I do not really have any code to post, as I have no real idea how to do this.
You can do something like, read until the "\n" :
scanf("%[^\n]",buffer);
you need to allocate before hand a big enough buffer.
Now go through the buffer count the number of words, and allocate the necessary space char **array = ....(dynamic string allocation), go to the buffer and copy string by string into the array.
An example:
int words = 1;
char buffer[128];
int result = scanf("%127[^\n]",buffer);
if(result > 0)
{
char **array;
for(int i = 0; buffer[i]!='\0'; i++)
{
if(buffer[i]==' ' || buffer[i]=='\n' || buffer[i]=='\t')
{
words++;
}
}
array = malloc(words * sizeof(char*));
// Using RoadRunner suggestion
array[0] = strtok (buffer," ");
for(int w = 1; w < words; w++)
{
array[w] = strtok (NULL," ");
}
}
As mention in the comments you should use (if you can) fgets instead fgets(buffer,128,stdin);.
More about strtok
If you have an upper bound to the number of strings you may receive from the user, and to the number of characters in each string, and all strings are entered on a single line, you can do this with the following steps:
read the full line with fgets(),
parse the line with sscanf() with a format string with the maximum number of %s conversion specifiers.
Here is an example for up to 10 strings, each up to 32 characters:
char buf[400];
char s[10][32 + 1];
int n = 0;
if (fgets(buf, sizeof buf, sdtin)) {
n = sscanf("%32s%32s%32s%32s%32s%32s%32s%32s%32s%32s",
s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8], s[9]));
}
// `n` contains the number of strings
// s[0], s[1]... contain the strings
If the maximum number is not known of if the maximum length of a single string is not fixed, or if the strings can be input on successive lines, you will need to iterate with a simple loop:
char buf[200];
char **s = NULL;
int n;
while (scanf("%199s", buf) == 1) {
char **s1 = realloc(s, (n + 1) * sizeof(*s));
if (s1 == NULL || (s1[n] = strdup(buf)) == NULL) {
printf("allocation error");
exit(1);
}
s = s1;
n++;
}
// `n` contains the number of strings
// s[0], s[1]... contain pointers to the strings
Aside from the error handling, this loop is comparable to the hard-coded example above but it still has a maximum length for each string. Unless you can use a scanf() extension to allocate the strings automatically (%as on GNU systems), the code will be more complicated to handle any number of strings with any possible length.
You can use:
fgets to read input from user. You have an easier time using this instead of scanf.
malloc to allocate memory for pointers on the heap. You can use a starting size, like in this example:
size_t currsize = 10
char **strings = malloc(currsize * sizeof(*strings)); /* always check
return value */
and when space is exceeded, then realloc more space as needed:
currsize *= 2;
strings = realloc(strings, currsize * sizeof(*strings)); /* always check
return value */
When finished using the requested memory from malloc() and realloc(), it's always to good to free the pointers at the end.
strtok to parse the input at every space. When copying over the char * pointer from strtok(), you must also allocate space for strings[i], using malloc() or strdup.
Here is an example I wrote a while ago which does something very similar to what you want:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define INITSIZE 10
#define BUFFSIZE 100
int
main(void) {
char **strings;
size_t currsize = INITSIZE, str_count = 0, slen;
char buffer[BUFFSIZE];
char *word;
const char *delim = " ";
int i;
/* Allocate initial space for array */
strings = malloc(currsize * sizeof(*strings));
if(!strings) {
printf("Issue allocating memory for array of strings.\n");
exit(EXIT_FAILURE);
}
printf("Enter some words(Press enter again to end): ");
while (fgets(buffer, BUFFSIZE, stdin) != NULL && strlen(buffer) > 1) {
/* grow array as needed */
if (currsize == str_count) {
currsize *= 2;
strings = realloc(strings, currsize * sizeof(*strings));
if(!strings) {
printf("Issue reallocating memory for array of strings.\n");
exit(EXIT_FAILURE);
}
}
/* Remove newline from fgets(), and check for buffer overflow */
slen = strlen(buffer);
if (slen > 0) {
if (buffer[slen-1] == '\n') {
buffer[slen-1] = '\0';
} else {
printf("Exceeded buffer length of %d.\n", BUFFSIZE);
exit(EXIT_FAILURE);
}
}
/* Parsing of words from stdin */
word = strtok(buffer, delim);
while (word != NULL) {
/* allocate space for one word, including nullbyte */
strings[str_count] = malloc(strlen(word)+1);
if (!strings[str_count]) {
printf("Issue allocating space for word.\n");
exit(EXIT_FAILURE);
}
/* copy strings into array */
strcpy(strings[str_count], word);
str_count++;
word = strtok(NULL, delim);
}
}
/* print and free strings */
printf("Your array of strings:\n");
for (i = 0; i < str_count; i++) {
printf("strings[%d] = %s\n", i, strings[i]);
free(strings[i]);
strings[i] = NULL;
}
free(strings);
strings = NULL;
return 0;
}
I have buffer problem on this line
strcpy_s(*(pWords + word_count), word_length, pWord);
I'm trying to read a file from argv[1] and print out every single word in that file and their occurrence, but I can't figure out whats wrong..?!?
int main(int argc, char* argv[])
{
char *delimiters = argv[2]; // Prose delimiters
char buf[BUF_LEN]; // Buffer for a line of keyboard input
size_t str_size = INIT_STR_EXT; // Current memory to store prose
char* filePath = argv[1];
FILE *fP ;
char* pStr = malloc(str_size); // Pointer to prose to be tokenized
*pStr = '\0'; // Set 1st character to null
fopen_s(&fP, filePath, "r");
fread(buf, BUF_LEN, 10, fP);
size_t maxWords = 10; // Current maximum word count
int word_count = 0; // Current word count
size_t word_length = 0; // Current word length
char** pWords = calloc(maxWords, sizeof(char*)); // Stores pointers to the words
int* pnWord = calloc(maxWords, sizeof(int)); // Stores count for each word
size_t str_len = strnlen_s(buf, BUF_LEN); // Length used by strtok_s()
char* ptr = NULL; // Pointer used by strtok_s()
char* pWord = strtok_s(buf, delimiters, &ptr); // Find 1st word
if (!pWord)
{
printf("No words found. Ending program.\n");
return 1;
}
bool new_word = true; // False for an existing word
while (pWord)
{
// Check for existing word
for (int i = 0; i < word_count; ++i)
if (strcmp(*(pWords + i), pWord) == 0)
{
++*(pnWord + i);
new_word = false;
break;
}
if (new_word) // Not NULL if new word
{
//Check for sufficient memory
if (word_count == maxWords)
{ // Get more space for pointers to words
maxWords += WORDS_INCR;
pWords = realloc(pWords, maxWords*sizeof(char*));
// Get more space for word counts
pnWord = realloc(pnWord, maxWords*sizeof(int));
}
// Found a new word so get memory for it and copy it there
word_length = ptr - pWord; // Length of new word
*(pWords + word_count) = malloc(word_length);
strcpy_s(*(pWords + word_count), word_length, pWord); // Copy to array
*(pnWord + word_count++) = 1; // Increment word count
}
else
new_word = true; // Reset new word flag
pWord = strtok_s(NULL, delimiters, &ptr); // Find subsequent word
}
strcpy_s adds a null byte to the end of the string. You need to malloc(word_length+1).
There are two problems with this line:
fread(buf, BUF_LEN, 10, fP);
Firstly the buffer is too small by a factor of 10 as you read 10 elements.
Second, it does not read the file further than BUF_LEN (previously, *10).
Also the code does not take care of newline chars, as I cannot pass that in argv[2] delimiter spec, even as " \\n".
I suggest you replace fread() with a loop of fgets(), and redefine the word delimiters.
#define BUF_LEN 1000 // plenty of room
...
char buf[BUF_LEN+1]; // allow for 0 terminator
char delimiters[] = " \n\t"; // predefined
...
//size_t str_len = strnlen_s(buf, BUF_LEN); // unnecessary
while (fgets(buf, BUF_LEN, fP) != NULL) { // new outer loop
char* ptr = NULL; // carry on as you were
...
}
Next, as others commented, increase the string space allocation
*(pWords + word_count) = malloc(word_length+1);
In addition, although you have used the "safe" string functions, you did not check argc or the result of any of fopen_s(), fread(), malloc(), calloc(), realloc(), nor have you closed the file or released memory.
Looks to me like you forgot to get an additional byte for the 0 character.
Despite that: Instead of allocating a fixed buffer size for your file, you could get the filesize with fseek using SEEK_END and an offset of 0 to allocate that much memory+1 byte
i have a certain txt file(for instance - dic.txt) in which words appear in this order:
hello - ola - hiya \n
chips - fries - frenchfries \n
I need to read the contents of the file into an array of string arrays:
for instance:
array[0] : [hello,ola,hiya]
array[1] : [chips,fries,frenchfries]
I was thinking of using strtok in order to split each line in the file into a string (after copying the entire file into a string and calculating the number of lines),but i could not figure how to split each line ("hello - ola - hiya \n") into the words,and storing each array into the array (an array of strings within an array).
I was considering using malloc in order to allocate memory for each line of words,and storing the pointer to the string's array into the array,but i will be glad to receive any suggestions.
The straightforward way to read lines from a file and then split them into tokens is to read lines with fgets and then use strtok to split each line into tokens:
int main(int argc, char *argv[])
{
// Check for arguments and file pointer omitted
FILE *f = fopen(argv[1], "r");
for (;;) {
char line[80];
char *token;
if (fgets(line, 80, f) == NULL) break;
token = strtok(line, " -\n");
while (token) {
// Do something with token, for example:
printf("'%s' ", token);
token = strtok(NULL, " -\n");
}
}
fclose(f);
return 0;
}
This approach is fine as long as all the lines in your file are shorter than 80 characters. It works for variable numbers of tokens per line.
You have mentioned the issue of handling memory for the lines. The example above assumes that the memory handling is done by the data structure for each word. (It's not part of the example, which just prints the tokens.)
You can malloc memory for each line, which is more flexible than a rigid character limit per line, but you'll end up with a lot of allocations. The benefit is that your words don't need extra memory, they can just be pointers into the lines, but you'll have to take care of properly allocating memory for the lines - and freeing it afterwards.
If you read the whole text file to a contiguous chunk of memory, you're basically done with memory storage, as long as you keep that chunk "alive" as long as your words live:
char *slurp(const char *filename, int *psize)
{
char *buffer;
int size;
FILE *f;
f = fopen(filename, "r");
if (f == NULL) return NULL;
fseek(f, 0, SEEK_END);
size = ftell(f);
fseek(f, 0, SEEK_SET);
buffer = malloc(size + 1);
if (buffer) {
if (fread(buffer, 1, size, f) < size) {
free(buffer);
} else {
buffer[size] = '\0';
if (psize) *psize = size;
}
}
fclose(f);
return buffer;
}
With that chunk of memory, you can first look for lines by looking for the next newline, and then use strtok as above:
int main(int argc, char *argv[])
{
char *buffer; // contiguous memory chunk
char *next; // pointer to next line or NULL for last line
buffer = slurp(argv[1], NULL);
if (buffer == NULL) return 0;
next = buffer;
while (next) {
char *token;
char *p = next;
// Find beginning of the next line,
// i.e. the char after the next newline
next = strchr(p, '\n');
if (next) {
*next = '\0'; // Null-terminate line
next = next + 1; // Advance past newline
}
token = strtok(p, " -\n");
while (token) {
// Do something with token, for example:
printf("'%s' ", token);
token = strtok(NULL, " -\n");
}
}
free(buffer); // ... and invalidate your words
return 0;
}
If you use fscan, you always copy the found tokens to a temporary buffer and when you store them away in your dictionary structure, you have to copy them again with strcpy. That's a lot of copying. Here, you read and allocate once and then work with pointers into the chunk. strtok null-terminates the tokens, so your chunk is a chain of C strings.
Reading the wholem file into memory is usually not a good solution, but in this case, where the file basically is the data, it makes sense.
(Note: All this discussion about memory does not affect the memory needed for your dictionary structure, the nodes in trees and lined lists or whatever. It is just about storing the strings proper.)
using fgets:
int eol(int c, FILE *stream) //given a char and the file, check if eol included
{
if (c == '\n')
return 1;
if (c == '\r') {
if ((c = getc(stream)) != '\n')
ungetc(c, stream);
return 1;
}
return 0;
}
int charsNumInLine(FILE *stream)
{
int position = ftell(stream);
int c, num_of_chars=0;
while ((c = getc(stream)) != EOF && !eol(c, stream))
num_of_chars++;
fseek(stream,position,SEEK_SET); //get file pointer to where it was before this function call
return num_of_chars;
}
void main()
{
//...
char *buffer;
int size;
while()
{
size=charsNumInLine(stream);
buffer = (char*)malloc( size*sizeof(char) );
fgets(buffer,sizeof(buffer),stream);
if (feof(stream) || ferror(stream) )
break;
// use strtok to separate words...
}
//...
}
another way is to use fscanf(file,"%s",buff)to read words and then use the above function eol to see when we get to a newline.
I am doing an exercise on a book, changing the words in a sentence into pig latin. The code works fine in window 7, but when I compiled it in mac, the error comes out.
After some testings, the error comes from there. I don't understand the reason of this problem. I am using dynamic memories for all the pointers and I have also added the checking of null pointer.
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
Full source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#define inputSize 81
void getSentence(char sentence [], int size);
int countWord(char sentence[]);
char ***parseSentence(char sentence[], int *count);
char *translate(char *world);
char *translateSentence(char ***words, int count);
int main(void){
/* Local definition*/
char sentence[inputSize];
int wordsCnt;
char ***head;
char *result;
getSentence(sentence, inputSize);
head = parseSentence(sentence, &wordsCnt);
result = translateSentence(head, wordsCnt);
printf("\nFinish the translation: \n");
printf("%s", result);
return 0;
}
void getSentence(char sentence [81], int size){
char *input = (char *)malloc(size);
int length;
printf("Input the sentence to big latin : ");
fflush(stdout);
fgets(input, size, stdin);
// do not copy the return character at inedx of length - 1
// add back delimater
length = strlen(input);
strncpy(sentence, input, length-1);
sentence[length-1]='\0';
free(input);
}
int countWord(char sentence[]){
int count=0;
/*Copy string for counting */
int length = strlen(sentence);
char *temp = (char *)malloc(length+1);
strcpy(temp, sentence);
/* Counting */
char *pToken = strtok(temp, " ");
char *last = NULL;
assert(pToken == temp);
while (pToken){
count++;
pToken = strtok(NULL, " ");
}
free(temp);
return count;
}
char ***parseSentence(char sentence[], int *count){
// parse the sentence into string tokens
// save string tokens as a array
// and assign the first one element to the head
char *pToken;
char ***words;
char *pW;
int noWords = countWord(sentence);
*count = noWords;
/* Initiaze array */
int i;
words = (char ***)calloc(noWords+1, sizeof(char **));
for (i = 0; i< noWords; i++){
words[i] = (char **)malloc(sizeof(char *));
}
/* Parse string */
// first element
pToken = strtok(sentence, " ");
if (pToken){
pW = (char *)malloc(strlen(pToken)+1);
strcpy(pW, pToken);
**words = pW;
/***words = pToken;*/
// other elements
for (i=1; i<noWords; i++){
pToken = strtok(NULL, " ");
pW = (char *)malloc(strlen(pToken)+1);
strcpy(pW, pToken);
**(words + i) = pW;
/***(words + i) = pToken;*/
}
}
/* Loop control */
words[noWords] = NULL;
return words;
}
/* Translate a world into big latin */
char *translate(char *word){
int length = strlen(word);
char *bigLatin = (char *)malloc(length+3);
/* translate the word into pig latin */
static char *vowel = "AEIOUaeiou";
char *matchLetter;
matchLetter = strchr(vowel, *word);
// consonant
if (matchLetter == NULL){
// copy the letter except the head
// length = lenght of string without delimiter
// cat the head and add ay
// this will copy the delimater,
strncpy(bigLatin, word+1, length);
strncat(bigLatin, word, 1);
strcat(bigLatin, "ay");
}
// vowel
else {
// just append "ay"
strcpy(bigLatin, word);
strcat(bigLatin, "ay");
}
return bigLatin;
}
char *translateSentence(char ***words, int count){
char *bigLatinSentence;
int length = 0;
char *bigLatinWord;
/* calculate the sum of the length of the words */
char ***walker = words;
while (*walker){
length += strlen(**walker);
walker++;
}
/* allocate space for return string */
// one space between 2 words
// numbers of space required =
// length of words
// + (no. of words * of a spaces (1) -1 )
// + delimater
// + (no. of words * ay (2) )
int lengthOfResult = length + count + (count * 2);
bigLatinSentence = (char *)malloc(lengthOfResult);
// trick to initialize the first memory
strcpy(bigLatinSentence, "");
/* Translate each word */
int i;
char *w;
for (i=0; i<count; i++){
w = translate(**(words + i));
strcat(bigLatinSentence, w);
strcat(bigLatinSentence, " ");
assert(w != **(words + i));
free(w);
}
/* free memory of big latin words */
walker = words;
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
return bigLatinSentence;
}
Your code is unnecessarily complicated, because you have set things up such that:
n: the number of words
words: points to allocated memory that can hold n+1 char ** values in sequence
words[i] (0 <= i && i < n): points to allocated memory that can hold one char * in sequence
words[n]: NULL
words[i][0]: points to allocated memory for a word (as before, 0 <= i < n)
Since each words[i] points to stuff-in-sequence, there is a words[i][j] for some valid integer j ... but the allowed value for j is always 0, as there is only one char * malloc()ed there. So you could eliminate this level of indirection entirely, and just have char **words.
That's not the problem, though. The freeing loop starts with walker identical to words, so it first attempts to free words[0][0] (which is fine and works), then attempts to free words[0] (which is fine and works), then attempts to free words (which is fine and works but means you can no longer access any other words[i] for any value of i—i.e., a "storage leak"). Then it increments walker, making it more or less equivalent to &words[1]; but words has already been free()d.
Instead of using walker here, I'd use a loop with some integer i:
for (i = 0; words[i] != NULL; i++) {
free(words[i][0]);
free(words[i]);
}
free(words);
I'd also recommending removing all the casts on malloc() and calloc() return values. If you get compiler warnings after doing this, they usually mean one of two things:
you've forgotten to #include <stdlib.h>, or
you're invoking a C++ compiler on your C code.
The latter sometimes works but is a recipe for misery: good C code is bad C++ code and good C++ code is not C code. :-)
Edit: PS: I missed the off-by-one lengthOfResult that #David RF caught.
int lengthOfResult = length + count + (count * 2);
must be
int lengthOfResult = length + count + (count * 2) + 1; /* + 1 for final '\0' */
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
/* free(walker); Don't do this, you still need walker */
walker++;
}
free(words); /* Now */
And you have a leak:
int main(void)
{
...
free(result); /* You have to free the return of translateSentence() */
return 0;
}
In this code:
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
You need to check that **walker is not NULL before freeing it.
Also - when you compute the length of memory you need to return the string, you are one byte short because you copy each word PLUS A SPACE (including a space after the last word) PLUS THE TERMINATING \0. In other words, when you copy your result into the bigLatinSentence, you will overwrite some memory that isn't yours. Sometimes you get away with that, and sometimes you don't...
Wow, so I was intrigued by this, and it took me a while to figure out.
Now that I figured it out, I feel dumb.
What I noticed from running under gdb is that the thing failed on the second run through the loop on the line
free(walker);
Now why would that be so. This is where I feel dumb for not seeing it right away. When you run that line, the first time, the whole array of char*** pointers at words (aka walker on the first run through) on the second run through, when your run that line, you're trying to free already freed memory.
So it should be:
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
walker++;
}
free(words);
Edit:
I also want to note that you don't have to cast from void * in C.
So when you call malloc, you don't need the (char *) in there.