Stack Smashing and using malloc - c

I'm making a program that counts the number of words contained within a file. My code works for certain test cases with files that have less than a certain amount of words/characters...But when I test it with, let's say, a word like:
"loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong", (this is not random--this is an actual test case I'm required to check), it gives me this error:
*** stack smashing detected ***: ./wcount.out terminated
Abort (core dumped)
I know what the error means and that I have to implement some sort of malloc line of code to be able to allocate the right amount of memory, but I can't figure out where in my function to put it or how to do it:
int NumberOfWords(char* argv[1]) {
FILE* inFile = NULL;
char temp_word[20]; <----------------------I think this is the problem
int num_words_in_file;
int words_read = 0;
inFile = fopen(argv[1], "r");
while (!feof(inFile)) {
fscanf(inFile, "%s", temp_word);
words_read++;
}
num_words_in_file = words_read;
printf("There are %d word(s).\n", num_words_in_file - 1);
fclose(inFile);
return num_words_in_file;
}

As you've correctly identified by rendering your source code invalid (future tip: /* put your arrows in comments */), the problem is that temp_word only has enough room for 20 characters (one of which must be a terminal null character).
In addition, you should check the return value of fopen. I'll leave that as an exercise for you. I've answered this question in other questions (such as this one), but I don't think just shoving code into your face will help you.
In this case, I think it may pay to better analyse the problem you have, to see if you actually need to store words to count them. As we define a word (the kind read by scanf("%s", ...) as a sequence of non-whitespace characters followed by a sequence of (zero or more) whitespace characters, we can see that such a counting program as yours needs to follow the following procedure:
Read as much whitespace as possible
Read as much non-whitespace as possible
Increment the "word" counter if all was successful
You don't need to store the non-whitespace any more than you do the whitespace, because once you've read it you'll never revisit it. Thus you could write this as two loops embedded into one: one loop which reads as much whitespace as possible, another which reads non-whitespace, followed by your incrementation and then the outer loop repeats the whole lot... until EOF is reached...
This will be best achieved using the %*s directive, which tells scanf-related functions not to try to store the word. For example:
size_t word_count = 0;
do {
fscanf(inFile, "%*s");
} while (!feof(inFile) && ++word_count);

You are limited by the size of your array. A simple solution would be to increase the size of your array. But you are always susceptible to stack smashing if someone enters a long word.
A word is delimited by spaces.
You can simply store a counter variable initialized to zero, and a variable that records the current char that you are looking at. Every time you read in a character using fgetc(inFile, &temp) that is a space, you increment the counter.

In your current code you simply want to count the words. Therefore you are not interested in the words themselves. You can suppress the assignment with the optional * character:
fscanf(inFile, "%*s");

Related

How to store multiple sets of strings in a character array

I am trying to store a given input number of sets of strings in a 3D character array, but couldn't.
Is it even possible using char arrays or should i use any other concept like data structures......?
int main()
{
int i,j,T,N[10];
char word[10][10][10];
scanf("%d",&T);/* No of test cases*/
for(i=0;i<T;i++)
{
scanf("%d",&N[i]); /*No of strings*/
for(j=0;j<N[i];j++)
scanf("%s",word[i][j]); /* reading the strings*/
}
return 0;
First: a "3D character array" is better thought of as a "2D string matrix" in this case.
And yes, of course it's very possible.
There are some weaknesses with your code that might trip it up, hard to say since you don't show a full test case with the input data you provide.
scanf() can fail, in which case you cannot rely on the variables having values
scanf() with %s will stop on the first whitespace character, which might cause your scanning code to become very confused
You don't limit the size of string you scan, but only provide 10 bytes of buffer space per string, so easy to get buffer overruns.
A better solution would be to check that the scanning succeeded, and make each string be on a line on its own, and read full lines using fgets() into an appropriately-sized buffer, then perhaps copy the part you want to keep into place in the matrix.

How to print file contents to stdout without storing them in memory?

My program takes in files with arbitrarily long lines. Since I don't know how much characters would be on a line, I would like to print the whole line to stdout, without malloc-ing an array to store it. Is this possible?
I am aware that it's possible to print these lines one chunk at a time-- however, the function doing the printing would be called very often, and I wish to avoid the overhead of malloc-ing arrays that hold the output, in every single call.
First of all you can't print things that's not exist, means that you have to store it somewhere, either in the stack or heap. If you use FILE* then libc will do it for you automatically.
Now if you use FILE*, you can use getc to get an ASCII character a time, check if the character is a newline character and push it to stdout.
If you's using file descriptor, you can read a character a time and do exactly the same thing.
Both approaches does not require you explicitly allocate memory in the heap.
Now if you use mmap, you can perform some strtok family function and then print the string to stdout.
takes in files with arbitrarily long lines ... print the whole line to stdout, without malloc-ing an array to store it. Is this possible?
In general, for arbitrary long lines: no.
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. C11dr ยง7.21.2 2
The length of a line is not limited to SIZE_MAX, the longest array possible in C. The length of a line can exceed the memory capacity of the computer. There is just no way to read arbitrary long lines. Simply code could use the following. I doubt it will be satisfactory, yet it does print the entire contents of a file with scant memory.
// Reads one character at a time.
int ch;
while((ch = fgetc(fp)) != EOF) {
putchar(ch);
}
Instead, code should set a sane upper bound on line length. Create an array or allocate for the line. As much as a flexible long line is useful, it is also susceptible to malicious abuse by a hacker exploit consuming unrestrained resources.
#define LINE_LENGTH_MAX 100000
char *line = malloc(LINE_LENGTH_MAX + 1);
if (line) {
while (fgets(line, LINE_LENGTH_MAX+1, fp)) {
if (strlen(line) >= LINE_LENGTH_MAX) {
Handle_Possible_Attach();
}
foo(line); // Use line
}
free(line);
)

what does fscanf being == 1 do

There is more to this code obviously but I am just curious as to what this line of code actually does. I know the while loop and such but am new to the fscanf()
while (fscanf(input_file, "%s", curr_word) == 1)
fscanf() returns the number of input items successfully scanned and stored.
as per the man page
Return Value
These functions return the number of input items successfully matched and assigned, which can be fewer than provided for, or even zero in the event of an early matching failure.
In your case
while (fscanf(input_file, "%s", curr_word) == 1)
fsaacf() will return a value of 1 if it is able to successfully scan a string (as per the %s format specifier) from input_file and put it into curr_word.
fscanf(input_file, "%s", curr_word) reads the input stream input_file and stores the next sequence of non spacing characters into the array pointed to by cuur_word and appends a '\0' byte. As you can see, the size of this array is not passed to fscanf. This is a classical case of potential buffer overflow, a security flaw that can be exploited by a hacker by storing appropriate contents in the input stream.
After gets, the scanf family of library functions is the best source of buffer overflow bugs one can find.
It is very difficult to use fscanf correctly. Most C programmers should avoid it.

Dynamic memory allocation for input? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I am having a lot of trouble starting my project. Here are the directions:
"Complete counts.c as follows:
Read characters from standard input until EOF (the end-of-file mark) is read. Do not prompt the user to enter text - just read data as soon as the program starts.
Keep a running count of each different character encountered in the input, and keep count of the total number of characters input (excluding EOF)."
The format my professor gave me to start is: `
#include <stdio.h>
int main(int argc, char *argv[]) {
return 0;
}
In addition to how to start the problem, I'm also confused as to why the two parameter's are given in the main function when nothing is going to be passed to it. Help would be much appretiated! Thank you!
`
Slightly tricky to see what you're having trouble with here. The title doesn't form a complete question, nor is there one in the body; and they seem to be hinting at entirely different questions.
The assignment tells you to read characters - not store them. You could have a loop that only reads them one at a time if you wish (for instance, using getchar). You're also asked to report counts of each character, which would make sense to store in an array. Given that this is of "each different character", the simplest way would be to size the array for all possible characters (limits.h defines UCHAR_MAX, which would help with this). Remember to initialize the array if it's automatically allocated (the default for function local variables).
Regarding the arguments to main, this program does not need them, and the C standard does allow you to leave them out. They're likely included as this is a template of a basic C program, to make it usable if command line arguments will be used also.
For more reference code you might want to compare the word count utility (wc); the character counting you want is the basis of a frequency analysis or histogram.
This should give you a start to investigate what you need to learn to complete your task,
Initially declare a character input buffer of sufficient size to read chars as,
char input[SIZE];
Use fgets() to read the characters from stdin as,
if (fgets(input, sizeof input, stdin) == NULL) {
; // handle EOF
}
Now input array has your string of characters which you to find occurrence of characters. I did not understand When you say different characters to count, however you have an array to traverse it completely to count the characters you need.
Firstly, luckily for you we will not need dynamic memory allocation at all here as we are not asked to store the input strings, instead we simply need to record how many of each ascii code is input during program run, as there a constant and finite number of those we can simply store them in a fixed size array.
The functions we are looking at here (assuming we are using standard libs) are as follows:
getchar, to read chars from standard input
printf, to print the outputs back to stdout
The constructs we will need are:
do {} while, to loop around until a condition is false
The rest just needs simple mathematical operators, here is a short example which basically shows a sample solution:
#include <stdio.h>
int main(int argc, char *argv[])
{
/* Create an array with entries for each char,
* then init it to zeros */
int AsciiCounts[256] = {0};
int ReadChar;
int TotalChars = 0;
int Iterator = 0;
do
{
/* Read a char from stdin */
ReadChar = getchar();
/* Increment the entry for its code in the array */
AsciiCounts[ReadChar]++;
TotalChars++;
} while (ReadChar != EOF);
/* Stop if we read an EOF */
do
{
/* Print each char code and how many times it occurred */
printf("Char code %#x occurred %d times\n", Iterator, AsciiCounts[Iterator]);
Iterator++;
} while (Iterator <= 255);
/* Print the total length read in */
printf("Total chars read (excluding EOF): %d", --TotalChars);
return 0;
}
Which should achieve the basic goal, however a couple of extension exercises which would likely benefit your understanding of C. First you could try to convert the second do while loop to a for loop, which is more appropriate for the situation but I did not use for simplicity's sake. Second you could add a condition so the output phase skips codes which never occurred. Finally it could be interesting to check which chars are printable and print their value instead of their hex code.
On the second part of the question, the reason those arguments are passed to main even though they are ignored is due to the standard calling convention of c programs under most OSes, they pass the number of command line arguments and values of each command line argument respectively in case the program wishes to check them. However if you really will not use them you can in most compilers just use main() instead however this makes things more difficult later if you choose to add command line options and has no performance benefit.

Please Explain this Example C Code

This code comes from K&R. I have read it several times, but it still seems to escape my grasp.
#define BUFSIZE 100
char buf[BUFSIZE];
int bufp = 0;
int getch(void)
{
return(bufp>0)?buf[--bufp]:getchar();
}
int ungetch(int c)
{
if(bufp>=BUFSIZE)
printf("too many characters");
else buf[bufp++]=c;
}
The purpose of these two functions, so K&R says, is to prevent a program from reading too much input. i.e. without this code a function might not be able to determine it has read enough data without first reading too much. But I don't understand how it works.
For example, consider getch().
As far as I can see this is the steps it takes:
check if bufp is greater than 0.
if so then return the char value of buf[--bufp].
else return getchar().
I would like to ask a more specific question, but I literally dont know how this code achieves what it is intended to achieve, so my question is: What is (a) the purpose and (b) the reasoning of this code?
Thanks in advance.
NOTE: For any K&R fans, this code can be found on page 79 (depending on your edition, I suppose)
(a) The purpose of this code is to be able to read a character and then "un-read" it if it turns out you accidentally read a character too many (with a max. of 100 characters to be "un-read"). This is useful in parsers with lookahead.
(b) getch reads from buf if it has contents, indicated by bufp>0. If buf is empty, it calls getchar. Note that it uses buf as a stack: it reads it from right-to-left.
ungetch pushes a character onto the stack buf after doing a check to see if the stack isn't full.
The code is not really for "reading too much input", instead is it so you can put back characters already read.
For example, you read one character with getch, see if it is a letter, put it back with ungetch and read all letters in a loop. This is a way of predicting what the next character will be.
This block of code is intended for use by programs that make decisions based on what they read from the stream. Sometimes such programs need to look at a few character from the stream without actually consuming the input. For example, if your input looks like abcde12xy789 and you must split it into abcde, 12, xy, 789 (i.e. separate groups of consecutive letters from groups of consecutive digits) you do not know that you have reached the end of a group of letters until you see a digit. However, you do not want to consume that digit at the time you see it: all you need is to know that the group of letters is ending; you need a way to "put back" that digit. An ungetch comes in handy in this situation: once you see a digit after a group of letters, you put the digit back by calling ungetch. Your next iteration will pick that digit back up through the same getch mechanism, sparing you the need to preserve the character that you read but did not consume.
1. The other idea also shown here can be also called as a very primitive I/O stack mangement system and gives the implementation of the function getch() and ungetch().
2. To go a step further , suppose you want to design an Operating System , how can you handle the memory which stores all the keystrokes?
This is solved by the above code snippet.An extension of this concept is used in file handling , especially in editing files .In that case instead of using getchar() which is used to take input from Standard input , a file is used as a source of input.
I have a problem with code given in question. Using buffer (in form of stack) in this code is not correct as when getting more than one extra inputs and pushing into stack will have undesired effect in latter processing (getting input from buffer).
This is because when latter processing (getting input) going on ,this buffer (stack) will give extra input in reverse order (means last extra input given first).
Because of LIFO (Last in first out ) property of stack , the buffer in this code must be quene as it will work better in case of more than one extra input.
This mistake in code confused me and finally this buffer must be quene as shown below.
#define BUFSIZE 100
char buf[BUFSIZE];
int bufr = 0;
int buff = 0;
int getch(void)
{
if (bufr ==BUFSIZE)
bufr=0;
return(bufr>=0)?buf[bufr++]:getchar();
}
int ungetch(int c)
{
if(buff>=BUFSIZE && bufr == 0)
printf("too many characters");
else if(buff ==BUFSIZE)
buff=0;
if(buff<=BUFSIZE)
buf[buff++]=c;
}

Resources