K&R Chapter 1 - Exercise 22 solution, what do you think? - c

I'm learning C from K&R as my first language, and I wanted to ask whether you think this exercise is being solved the right way. I'm aware that it's probably not as complete as you'd like, but I wanted other views, so I'd know whether I'm learning C right.
Thanks
/* Exercise 1-22. Write a program to "fold" long input lines into two or
* more shorter lines, after the last non-blank character that occurs
* before the n-th column of input. Make sure your program does something
* intelligent with very long lines, and if there are no blanks or tabs
* before the specified column.
*
* ~svr
*
* [NOTE: Unfinished, but functional in a generic capacity]
* Todo:
* Handling of spaceless lines
* Handling of lines consisting entirely of whitespace
*/
#include <stdio.h>
#define FOLD 25
#define MAX 200
#define NEWLINE '\n'
#define BLANK ' '
#define DELIM 5
#define TAB '\t'
int
main(void)
{
int line = 0,
space = 0,
newls = 0,
i = 0,
c = 0,
j = 0;
char array[MAX] = {0};
while((c = getchar()) != EOF) {
++line;
if(c == NEWLINE)
++newls;
if((FOLD - line) < DELIM) {
if(c == BLANK) {
if(newls > 0) {
c = BLANK;
newls = 0;
}
else
c = NEWLINE;
line = 0;
}
}
array[i++] = c;
}
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
return 0;
}

I'm sure you're on the right track, but here are some pointers for readability:
comment your code
name the variables properly, or at least give a description if you refuse
be consistent about braces on single-line ifs: some have them and some don't (IMHO, always use {} so it's more readable)
the if statement in the last for loop can be simpler, like
if(array[0] != NEWLINE)
{
printf("%c", array[line]);
}

That's no good IMHO.
First, it doesn't do what you were asked to do. You were supposed to find the last blank after a nonblank before the output line boundary. Your program doesn't even remotely try to do that; it seems to aim for the first blank after (margin - 5) characters (where did the 5 come from? what if all the words had 9 letters?). However, it doesn't do that either, because of the way you manipulate the newls variable. Also, this:
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
is probably wrong, because you check for a condition that never changes throughout the loop.
And, last but not least, storing the whole file in a fixed-size buffer is not good, for two reasons:
the buffer is bound to overflow on large files
even if it never overflowed, people still wouldn't like you for storing e.g. a gigabyte file in memory just to cut it into 25-character chunks
I think you should start again, rethink your algorithm (incl. corner cases), and only after that, start coding. I suggest you:
process the file line-by-line (meaning output lines)
store the line in a buffer big enough to hold the largest output line
search for the character you'll break at in the buffer
then print it (hint: you can terminate the string with '\0' and print with printf("%s", ...)), copy what you didn't print to the start of the buffer, proceed from that
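The steps suggested above can be sketched roughly as follows. This is only one possible reading of the exercise, not a reference solution: the function name `fold`, the 256-byte line buffer, and the decision to count a tab as a single column are all assumptions made to keep the example short. It works on an in-memory string so it can be tried without any input handling; the same loop body would work on characters returned by getchar().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fold `in` so that no output line is longer than `width` columns.
 * `out` must have room for at least 2 * strlen(in) + 1 bytes.
 * Assumption: a tab counts as one column, like a blank. */
void fold(const char *in, size_t width, char *out)
{
    char line[256];   /* pending output line; holds at most width + 1 chars */
    size_t len = 0;   /* number of characters currently in line[] */
    size_t o = 0;     /* write position in out[] */

    assert(width > 0 && width < sizeof line);

    for (; *in != '\0'; in++) {
        if (*in == '\n') {              /* input newline: flush the line as is */
            memcpy(out + o, line, len);
            o += len;
            out[o++] = '\n';
            len = 0;
            continue;
        }
        line[len++] = *in;
        if (len > width) {              /* one character too many: break the line */
            size_t brk = width;         /* end of the emitted part (hard break by default) */
            size_t rest = width;        /* start of the part carried to the next line */
            size_t i;

            for (i = len; i > 0; i--) { /* look for the last blank */
                if (line[i - 1] == ' ' || line[i - 1] == '\t') {
                    brk = i - 1;
                    rest = i;
                    break;
                }
            }
            /* do not leave blanks at the end of the emitted line */
            while (brk > 0 && (line[brk - 1] == ' ' || line[brk - 1] == '\t'))
                brk--;
            memcpy(out + o, line, brk);
            o += brk;
            out[o++] = '\n';
            memmove(line, line + rest, len - rest);   /* keep the remainder */
            len -= rest;
        }
    }
    memcpy(out + o, line, len);         /* whatever is left at end of input */
    o += len;
    out[o] = '\0';
}
```

The buffer only ever holds one output line, so memory use does not depend on the size of the input, and a word longer than the width is simply cut at the margin.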

An obvious problem is that you statically allocate 'array' and never check the index limits while accessing it. Buffer overflow waiting to happen. In fact, you never reset the i variable within the first loop, so I'm kinda confused about how the program is supposed to work. It seems that you're storing the complete input in memory before printing it word-wrapped?
So, suggestions: merge the two loops together and print the output for each line that you have completed. Then you can re-use the array for the next line.
Oh, and better variable names and some comments. I have no idea what 'DELIM' is supposed to do.

It looks (without testing) like it could work, but it seems kind of complicated.
Here's some pseudocode for my first thought
const int MAXLINE = ?? -- maximum line length parameter
int chrIdx = 0 -- index of the current character being considered
int cand = -1 -- "candidate index": position in linebuf of a potential break character
char linebuf[bufsiz]
int lineIdx = 0 -- index into the output line
char buffer[bufsiz] -- a character buffer
read input into buffer
for ix = 0 to bufsiz - 1
do
    if buffer[ix] == ' ' then
        cand = lineIdx
    fi
    linebuf[lineIdx] = buffer[ix]
    lineIdx += 1
    if lineIdx >= MAXLINE then
        linebuf[cand] = '\0' -- end the string at the break point
        print linebuf
        do something to move remnants to front of line (memmove?), then reset lineIdx and cand
    fi
od
It's late and I just had a belt, so there may be flaws (for one, it does not handle a line with no blank at all, where cand is still -1), but it shows the general idea: load a buffer, and copy the contents of the buffer to a line buffer, keeping track of the possible break points. When you get close to the end, use the breakpoint.

Related

I am trying to create a code polisher program in C

I am trying to create the function delete_comments(). The read_file() and main functions are given.
Implement function char *delete_comments(char *input) that removes C comments from program stored at input. input variable points to dynamically allocated memory. The function returns pointer to the polished program. You may allocate a new memory block for the output, or modify the content directly in the input buffer.
You’ll need to process two types of comments:
Traditional block comments delimited by /* and */. These comments may span multiple lines. You should remove only characters starting from /* and ending to */ and for example leave any following newlines untouched.
Line comments starting with // until the newline character. In this case, newline character must also be removed.
The function calling delete_comments() only handles return pointer from delete_comments(). It does not allocate memory for any pointers. One way to implement delete_comments() function is to allocate memory for destination string. However, if new memory is allocated then the original memory in input must be released after use.
I'm having trouble understanding why my current approach is wrong or what is the specific problem that I'm getting weird output. I'm approaching the problem by trying to create a new array where to copy the input string with the new rules.
#include "source.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Remove C comments from the program stored in memory block <input>.
* Returns pointer to code after removal of comments.
* Calling code is responsible of freeing only the memory block returned by
* the function.
*/
char *delete_comments(char *input)
{
input = malloc(strlen(input) * sizeof (char));
char *secondarray = malloc(strlen(input) * sizeof (char));
int x, y = 0;
for (x = 0, y = 0; input[x] != '\0'; x++) {
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
else if ((input[x] == '/') && (input[x + 1] == '/')) {
int j = 0;
while (input[x + j] != '\n') {
y++;
j++;
}
}
else {
secondarray[x] = input[y];
y++;
}
}
return secondarray;
}
/* Read given file <filename> to dynamically allocated memory.
* Return pointer to the allocated memory with file content, or
* NULL on errors.
*/
char *read_file(const char *filename)
{
FILE *f = fopen(filename, "r");
if (!f)
return NULL;
char *buf = NULL;
unsigned int count = 0;
const unsigned int ReadBlock = 100;
unsigned int n;
do {
buf = realloc(buf, count + ReadBlock + 1);
n = fread(buf + count, 1, ReadBlock, f);
count += n;
} while (n == ReadBlock);
buf[count] = 0;
return buf;
}
int main(void)
{
char *code = read_file("testfile.c");
if (!code) {
printf("No code read");
return -1;
}
printf("-- Original:\n");
fputs(code, stdout);
code = delete_comments(code);
printf("-- Comments removed:\n");
fputs(code, stdout);
free(code);
}
Your program has fundamental issues.
It fails to tokenize the input. Comment start sequences can occur inside string literals, in which case they do not denote comments: "/* not a comment".
You have some basic bugs:
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
Here, when we enter the loop, with i = 0, input + x is still pointing to the opening /. We did not skip over the opening * and are already looking for a closing *. This means that the sequence /*/ will be recognized as a complete comment, which it isn't.
This loop also assumes that every /* comment is properly closed. It doesn't check for the null character that can terminate the input, so if the comment is not closed, it will march beyond the end of the buffer.
C has line continuations. In ISO C translation phase 2, all backslash-newline sequences are deleted, converting one or more physical lines into logical lines. What that means is that a // comment can span multiple physical lines:
// this is an \
extended comment
You can see, by the way, that StackOverflow's automatic language detector for syntax highlighting is getting this right!
Line continuations are independent of tokenization, which doesn't happen until translation phase 3. Which means:
/\
/\
this is an extended \
comment
That one has defeated StackOverflow's syntax highlighting.
Furthermore, a line continuation can happen in any token, possibly multiple times:
"\
this is a string literal\
"
If you really want to make this work 100% correctly, you need to parse the input. By "parse" I mean a more formal, rigorous detection routine that understands what it is reading, in the context it is reading it.
For example, there are many times where this code could be defeated.
printf("the answer is %d // %d\n", a, b);
would likely trip your // detection and strip the end of the printf.
There are two general approaches to the problem above:
Find every corner case where comment-like characters could be used, and write conditional statements to avoid them before stripping.
Fully parse the language, so you will know if you are within a string or some other context that's wrapping comment like characters, or if you are in the top level context where the characters really mean "this is a comment"
To learn about parsing, I generally recommend "The Dragon Book" but it is a hard read, unless you have studied a bit of Discrete Mathematics. It covers a lot of different parsing techniques, and in doing so it doesn't have many pages left for examples. This means that it's the kind of book where you have to read, think, and then program a mini-example. If you follow that path, there is no input you can't tackle.
If you are pragmatic about your solution, and the goal is not to learn parsing but to strip comments, I recommend that you find a well-constructed parser for C, and then learn how to walk the Abstract Syntax Tree with an emitter that simply does not emit the comments.
There are some projects that do this already, but I don't know if they have the right structure for easy modification. lint comes to mind, as well as other "pretty-printers". GCC certainly has the parsing code in there, but I've heard that GCC's Abstract Syntax Tree isn't easy to learn.
Your solution has several problems:
The worst issue
As the first instruction in delete_comments(), you overwrite input with a new pointer returned by malloc(), which points to uninitialized memory.
As a consequence, the address of the real input is lost.
Oh, and please check the returned value, if you call malloc().
Failing to increment the scanned position in comments correctly
You are scanning the input by the index x, but if you detect a comment, you don't change it.
You are actually advancing y but this is only used for the copying.
Think about lines like these:
int x; /* some /* weird /* comment */
///////////////////////////////
for (;;) { }
Ignoring character and string literals
Your solution should take character and string literals into account.
For example:
int c_plus_plus_comment_start = '//'; /* multi character constant */
const char* c_comment_start = "/*";
Note: There are more. Learn to use a debugger, or at least insert lots of printf()s in "interesting" places.
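To make the points raised in the answers above concrete, here is a minimal sketch of a small state machine that skips comments while leaving string and character literals alone. It is an illustration under stated assumptions, not the assignment's expected solution: the name `strip_comments` is hypothetical, it always allocates a new block and leaves freeing the input to the caller, and it does not handle backslash-newline continuations or trigraphs.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `input` with comments removed,
 * or NULL if the allocation fails. The caller frees the result
 * (and, separately, the input). Line continuations are not handled. */
char *strip_comments(const char *input)
{
    enum { CODE, STRING, CHARLIT, BLOCK, LINE } state = CODE;
    char *out = malloc(strlen(input) + 1);  /* output is never longer than input */
    size_t o = 0;                           /* write position in out */
    const char *p;

    if (out == NULL)
        return NULL;

    for (p = input; *p != '\0'; p++) {
        switch (state) {
        case CODE:
            if (p[0] == '/' && p[1] == '*') {
                state = BLOCK;
                p++;            /* skip the star, so slash-star-slash is not a whole comment */
            } else if (p[0] == '/' && p[1] == '/') {
                state = LINE;
                p++;
            } else {
                if (*p == '"')
                    state = STRING;
                else if (*p == '\'')
                    state = CHARLIT;
                out[o++] = *p;
            }
            break;
        case STRING:
        case CHARLIT:
            out[o++] = *p;
            if (*p == '\\' && p[1] != '\0') {
                out[o++] = *++p;            /* copy the escaped character verbatim */
            } else if ((state == STRING && *p == '"') ||
                       (state == CHARLIT && *p == '\'')) {
                state = CODE;               /* closing quote */
            }
            break;
        case BLOCK:
            if (p[0] == '*' && p[1] == '/') {
                state = CODE;
                p++;                        /* skip the closing slash */
            }
            break;                          /* an unclosed comment just ends at the NUL */
        case LINE:
            if (*p == '\n')
                state = CODE;               /* the newline is dropped too, as the task asks */
            break;
        }
    }
    out[o] = '\0';
    return out;
}
```

Because the read position and the write position are separate variables that each advance in exactly one place, the index confusion between x and y described above cannot occur.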

Stdin + Dictionary Text Replacement Tool -- Debugging

I'm working on a project in which I have two main files. Essentially, the program reads in a text file defining a dictionary with key-value mappings. Each key has a unique value and the file is formatted like this where each key-value pair is on its own line:
ipsum i%##!
fubar fubar
IpSum XXXXX24
Ipsum YYYYY211
Then the program reads in input from stdin, and if any of the "words" match the keys in the dictionary file, they get replaced with the value. There is a slight thing about upper and lower cases -- this is the order of "match priority"
The exact word is in the replacement set
The word with all but the first character converted to lower case is in the replacement set
The word converted completely to lower case is in the replacement set
Meaning if the exact word is in the dictionary, it gets replaced, but if not the next possibility (2) is checked and so on...
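The priority order described above could be sketched like this. The names `struct entry`, `find_exact`, `lower_from` and `replacement_for` are hypothetical, the linear search merely stands in for whatever dictionary structure the real program uses, and the 128-byte word limit is an arbitrary assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry; the real program may use a hash table. */
struct entry {
    const char *key;
    const char *value;
};

/* Linear search for an exact key match; returns the value or NULL. */
const char *find_exact(const struct entry *dict, size_t n, const char *word)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(dict[i].key, word) == 0)
            return dict[i].value;
    return NULL;
}

/* Copies word into buf, lowercasing every character at index >= from. */
void lower_from(char *buf, const char *word, size_t from)
{
    size_t i;
    for (i = 0; word[i] != '\0'; i++)
        buf[i] = (i >= from) ? (char)tolower((unsigned char)word[i]) : word[i];
    buf[i] = '\0';
}

/* Applies the three rules in priority order. Returns the replacement,
 * or NULL if no rule matches (the caller then prints the word unchanged). */
const char *replacement_for(const struct entry *dict, size_t n, const char *word)
{
    char buf[128];
    const char *hit;

    if ((hit = find_exact(dict, n, word)) != NULL)  /* rule 1: exact word */
        return hit;
    if (strlen(word) >= sizeof buf)                 /* too long for buf: give up */
        return NULL;
    lower_from(buf, word, 1);                       /* rule 2: keep the first character */
    if ((hit = find_exact(dict, n, buf)) != NULL)
        return hit;
    lower_from(buf, word, 0);                       /* rule 3: everything lower case */
    return find_exact(dict, n, buf);
}
```

With the sample dictionary from the question, "IpSum" matches by rule 1, "IPSUM" becomes "Ipsum" under rule 2, and "FUBAR" only matches once rule 3 turns it into "fubar".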
My program passes the basic cases we were provided but then the terminal shows
that the output vs reference binary files differ.
I opened both files (not C files, but binary files); one was very long with tons of numbers and the other just had a line of random characters, so that didn't really help. I also reviewed my code and ran some small tests, but it seems okay? A friend recommended I make sure I'm accounting for the null terminator in processInput(), and I already was (or at least I think so; correct me if I'm wrong). I also changed the variable that receives getchar() to an int to properly check for EOF, and allocated extra space for the char array. I also tried vimdiff and got more confused. I would love some help debugging this, please! I've been at it all day and I'm very confused.
There are multiple issues in the processInput() function:
the loop should not stop when the byte read is 0, you should process the full input with:
while ((ch = getchar()) != EOF)
the test for EOF should actually be done differently so the last word of the file gets a chance to be handled if it occurs exactly at the end of the file.
the cast in isalnum((char)ch) is incorrect: you should pass ch directly to isalnum. Casting as char is actually counterproductive because it will turn byte values beyond CHAR_MAX to negative values for which isalnum() has undefined behavior.
the test if(ind >= cap) is too loose: if word contains cap characters, setting the null terminator at word[ind] will write beyond the end of the array. Change the test to if (cap - ind < 2) to allow for a byte and a null terminator at all times.
you should check that there is at least one character in the word to avoid calling checkData() with an empty string.
char key[ind + 1]; is useless: you can just pass word to checkData().
checkData(key, ind) is incorrect: you should pass the size of the buffer for the case conversions, which is at least ind + 1 to allow for the null terminator.
the cast in putchar((char)ch); is useless and confusing.
There are some small issues in the rest of the code, but none that should cause a problem.
Start by testing your tokeniser with:
$ ./a.out <badhash2.c >zooi
$ diff badhash2.c zooi
$
Does it work for binary files, too?:
$ ./a.out <./a.out > zooibin
$ diff ./a.out zooibin
$
Yes, it does!
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
void processInput(void);
int main(int argc, char **argv) {
processInput();
return 0;
}
void processInput() {
int ch;
char *word;
int len = 0;
int cap = 60;
word = malloc(cap);
while(1) {
ch = getchar(); // (1)
if( ch != EOF && isalnum(ch)) { // (2)
if(len+1 >= cap) { // (3)
cap += cap/2;
word = realloc(word, cap);
}
word[len++] = ch;
} else {
if (len) { // (4)
#if 0
char key[len + 1];
memcpy(key, word, len); key[len] = 0;
checkData(key, len);
#else
word[len] = 0;
fputs(word, stdout);
#endif
len = 0;
}
if (ch == EOF) break; // (5)
putchar(ch);
}
}
free(word);
}
I only repaired your tokeniser, leaving out the hash table and the search & replace stuff. It is now supposed to generate a verbatim copy of the input. (which is silly, but great for testing)
(1) If you want to allow binary input, you cannot use while((ch = getchar()) ...): a NUL in the input would cause the loop to end. You must also postpone testing for EOF, because there could still be a final word in your buffer.
(2) In the test ch != EOF && isalnum(ch), EOF is treated just like a space: it could be the end of a word.
(3) You must reserve space for the NUL ('\0'), too.
(4) If len == 0 there is no word, so there is no need to look it up.
(5) We treated EOF just like a space, but we don't want to write it to the output. Time to break out of the loop.

Reading words from a file while ignoring spaces and ".,() In C

I'm writing a program that scans words from a text file.
How can I scan only the words that start with a space or a number and end with a space,
without taking any of ',.()\t\n into my string?
I know I can use the scanf function, but I didn't quite get how to use it that way.
Another small question:
I'm looking to count how many lines there are in my text,
so I guess I should look for a "\n" character to increase my count, right?
First of all, when you ask a question you need to follow some rules. Here is a link on how to ask questions on Stack Overflow:
How to ask from the Stack Overflow Help Center
If I were doing this task, I would read the file char by char using a while loop that ends at EOF, with an if inside it that checks whether the char should be added to your string, adding it using realloc.
For your second question, put another if before the first one that checks whether the char is equal to '\n', and if so increments an int that counts the number of lines.
At the end it should look like this:
char *input = NULL; /* grown with realloc as characters are added */
int linesLength = 0;
int now;
int lines = 0;
while ((now = getc(file)) != EOF) {
    if (now == '\n') {
        lines += 1;
    }
    if (/* what chars you want out */) {
        /* add your char using realloc */
        linesLength += 1;
    }
}
By the way, this is the exact same task I have to hand in by the 18th of this month, and your name looks familiar. The point is that this is not a site for homework; keep that in mind next time.
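For reference, the character-by-character idea can be sketched as below. The example works on an in-memory string so that it is self-contained; with a file, the same tests would be applied to each character returned by getc(). The function name `next_word` and the exact separator set are assumptions based on the characters listed in the question.

```c
#include <stddef.h>
#include <string.h>

/* Characters that separate words and never become part of one
 * (an assumption based on the list in the question). */
#define SEPARATORS " \t\n\",.()"

/* Copies the next word from *cursor into word[] (at most size - 1 chars,
 * so size should be at least 2), advances *cursor past it, and adds the
 * newlines it skipped to *lines.
 * Returns 1 if a word was found, 0 at the end of the text. */
int next_word(const char **cursor, char *word, size_t size, int *lines)
{
    const char *p = *cursor;
    size_t len = 0;

    /* skip separators, counting line breaks on the way */
    while (*p != '\0' && strchr(SEPARATORS, *p) != NULL) {
        if (*p == '\n')
            ++*lines;
        p++;
    }
    /* collect characters up to the next separator */
    while (*p != '\0' && strchr(SEPARATORS, *p) == NULL) {
        if (len + 1 < size)
            word[len++] = *p;   /* characters that do not fit are dropped */
        p++;
    }
    word[len] = '\0';
    *cursor = p;
    return len > 0;
}
```

Calling it in a loop, as in `while (next_word(&cur, word, sizeof word, &lines))`, yields one clean word per iteration, and `lines` holds the number of '\n' characters once the loop ends.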

Infinite loop when getting a line with getchar()

I was trying an exercise from K&R (ex 1-17), and I came up with my own solution.
The problem is that my program appears to hang, perhaps in an infinite loop. I omitted the NUL ('\0') character insertion as I find C generally automatically attaches it to the end of a string (Doesn't it?).
Can somebody please help me find out what's wrong?
I'm using the GCC compiler with Cygwin on win8(x64), if that helps..
Question - Print all input lines that are longer than 80 characters
#include<stdio.h>
#define MINLEN 80
#define MAXLEN 1000
/* getlin : inputs the string and returns its length */
int getlin(char line[])
{
int c,index;
for(index = 0 ; (c != '\n') && ((c = getchar()) != EOF) && (index < MAXLEN) ; index++)
line[index] = c;
return (index); // Returns length of the input string
}
main()
{
int len;
char chArr[MAXLEN];
while((len = getlin(chArr))>0)
{
/* A printf here,(which I had originally inserted for debugging purposes) Miraculously solves the problem!!*/
if(len>=MINLEN)
printf("\n%s",chArr);
}
return 0;
}
And I omitted the null('\0') character insertion as I find C generally automatically attaches it to the end of a string (Doesn't it?).
No, it doesn't. You're using getchar() to read input characters one at a time. If you put the chars in an array yourself, you'll have to terminate it yourself.
The C functions that return a string will generally terminate it, but that's not what you're doing here.
Your input loop is a little weird. The logical AND operator only evaluates the right-hand side if the left-hand side evaluates to true (it's called "short-circuiting"), and the operands are evaluated from left to right, so your first test reads c before it has been assigned. Rearranging the order of the tests in the loop should help.
for(index = 0 ; (index < MAXLEN) && ((c = getchar()) != EOF) && (c != '\n'); index++)
line[index] = c;
This way, c receives a value from getchar() before you perform tests on its contents.
I'm not positive about what's wrong, but you don't provide the input to the program so I'm guessing.
My guess is that in getlin your variable c gets set to '\n' and at that point it never gets another character. It just keeps returning and looping.
You never SET c to anything inside your getlin function before you test it, is the problem.
C does not insert a NUL terminator at the end of strings automatically. Some functions might do so (e.g. snprintf). Consult your documentation. Additionally, take care to initialize all your variables, like c in getlin().
1) C doesn't add a final \0 to your string. You are responsible for using an array of at least 81 chars and putting the final \0 after the last character you write in it.
2) You're testing the value of c before reading it
3) Your program may appear to print nothing because printf output is buffered; when stdout is a terminal it is typically line-buffered and flushed when a \n is written. Modify this statement so that the \n comes last:
printf("\n%s",chArr);
to become:
printf("%s\n",chArr);
4) To send an EOF to your program you should press Ctrl+D under Unix; in a Windows console it is usually Ctrl+Z followed by Enter. Not sending an EOF may be the reason why the program never ends.
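Putting the points from the answers together, a corrected version might look like the sketch below. It is one possible arrangement, not the only fix: passing the stream as a parameter (so `getlin(stdin, ...)` corresponds to the getchar() version), returning -1 at end of input, and the helper name `print_long_lines` are choices made for this example.

```c
#include <stdio.h>

#define MINLEN 80
#define MAXLEN 1000

/* Reads one line from `in` into line[], without the trailing newline,
 * and terminates it with '\0'. Returns the number of characters stored,
 * or -1 if end of input is reached before any character is read. */
int getlin(FILE *in, char line[], int max)
{
    int c = EOF;    /* initialized, so it is never read before being set */
    int index = 0;

    /* bounds check first, then read, then test the value just read */
    while (index < max - 1 && (c = getc(in)) != EOF && c != '\n')
        line[index++] = (char)c;
    line[index] = '\0';     /* C does not add the terminator for us */

    if (c == EOF && index == 0)
        return -1;
    return index;
}

/* Prints every line of `in` that is longer than MINLEN characters
 * and returns how many lines were printed. */
int print_long_lines(FILE *in, FILE *out)
{
    char line[MAXLEN];
    int len;
    int printed = 0;

    while ((len = getlin(in, line, MAXLEN)) >= 0) {
        if (len > MINLEN) {
            fprintf(out, "%s\n", line);
            printed++;
        }
    }
    return printed;
}
```

Returning -1 only at end of input means an empty line (length 0) no longer stops the loop, which the original `> 0` test would have done. A main() that calls `print_long_lines(stdin, stdout)` completes the program.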

Trying to convert morse code to english. struggling

I'm trying to create a function to read Morse code from one file, convert it to English text, print the converted text to the terminal, and write it to an output file. Here's a rough start...
#define TOTAL_MORSE 91
#define MORSE_LEN 6
void
morse_to_english(FILE* inputFile, FILE* outputFile, char morseStrings[TOTAL_MORSE][MORSE_LEN])
{ int i = 0, compare = 0;
char convert[MORSE_LEN] = {'\0'}, *buffer = '\0';
//read in a line of morse string from file
// fgets(buffer, //then what?
while(((convert[i] = fgetc(inputFile)) != ' ') && (i < (MORSE_LEN - 1)))
{ i++;
}
if (convert[i + 1] == ' ')
convert[i + 1] = '\0';
//compare read-in string w/morseStrings
for (i = 48, compare = strcmp(convert, morseStrings[i]); //48 is '0'
i < (TOTAL_MORSE - 1) && compare != 0;
i++)
{ compare = strcmp(convert, morseStrings[i]);
}
printf("%c", (char)i);
}
I have initialized morseStrings to the morse code.
That's my function right now. It does not work, and I'm not really sure what approach to take.
My original algorithm plan was something like this:
1. Scan Morse code in from file, character by character, until a space is reached
1.1 save to a temporary buffer (convert)
2. loop while i < 91 && compare != 0
compare = strcmp(convert, morseString[i])
3. if (test ==0) print ("%c", i);
4. loop through this until eof
but.. I can't seem to think of a good way to test if the next char in the file is a space. So this has made it very difficult for me.
I got pretty frustrated and googled for ideas, and found a suggestion to use this algorithm
Read a line
Loop
-strchr() for a SPACE or EOL
-copy characters before the space to another string
-Use strcmp() and loop to find the letter
-Test the next character for SPACE.
-If so, output another space
-Skip to next morse character
Endloop
But, this loops is kind of confusing. I would use fgets() (I think), but I don't know what to put in the length argument.
Anyways, I'm tired and frustrated. I would appreciate any help or insight for this problem. I can provide more code if necessary.
Your original plan looks fine. You're off by 1 when you check for the ' ' in the buffer, though. It's at convert[i], not convert[i + 1]. The i++ inside the loop doesn't happen when a space is detected.
I wouldn't use strchr(); it's too complicated.
Loop through the input file, reading a line at a time
tokenize the line with strtok
loop through the tokens and save (or better, append) the decoded letters to a buffer
close the loops and print
A bit of pseudocode for you:
while(there is a next line){
tokens = strtok(line);
int i = 0;
while(tokens hasnext){
save to buffer}}
If you are concerned about CPU time, you can write a lookup table to find the values. Conceptually it works like the switch below; note that C cannot switch on strings (and '.-' in single quotes would be a multi-character constant, not a string), so in real code this becomes a table searched with strcmp():
case ".-": code = "A"; break;
case "-...": code = "B"; break;
case "-.-.": code = "C"; break;
Split the Morse code on the spaces, then send each combination of . and - through the lookup to get the original character.
I hope this helps.
Best regards.
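The strtok and lookup-table suggestions above could be combined roughly as follows. This is a sketch under assumptions: `struct morse`, `MORSE_TABLE`, `morse_lookup` and `morse_decode_line` are hypothetical names, the table is deliberately partial, and it is a separate table rather than the question's morseStrings array indexed by character code.

```c
#include <stddef.h>
#include <string.h>

struct morse {
    const char *code;   /* dots and dashes */
    char letter;        /* decoded character */
};

/* Deliberately partial table, for illustration only. */
const struct morse MORSE_TABLE[] = {
    { ".-", 'A' }, { "-...", 'B' }, { "-.-.", 'C' },
    { "---", 'O' }, { "...", 'S' }
};

/* Returns the letter for one Morse symbol, or '?' if it is not in the table. */
char morse_lookup(const char *code)
{
    size_t i;
    for (i = 0; i < sizeof MORSE_TABLE / sizeof MORSE_TABLE[0]; i++)
        if (strcmp(code, MORSE_TABLE[i].code) == 0)
            return MORSE_TABLE[i].letter;
    return '?';
}

/* Decodes one line of space-separated Morse symbols into out[].
 * The line is modified, because strtok() writes '\0' over the separators.
 * Returns the number of letters written (excluding the terminator). */
size_t morse_decode_line(char *line, char *out, size_t outsz)
{
    size_t n = 0;
    char *tok;

    if (outsz == 0)
        return 0;
    for (tok = strtok(line, " \n"); tok != NULL && n + 1 < outsz;
         tok = strtok(NULL, " \n"))
        out[n++] = morse_lookup(tok);
    out[n] = '\0';
    return n;
}
```

To feed it from a file, read each line with `fgets(line, sizeof line, inputFile)`; the length argument is simply the size of the buffer being filled. Note that strtok() merges consecutive separators, so a double space used as a word gap would need separate handling.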
