K&R exercise 1-22 Hints - c

The exercise states as follows
Write a program to "fold" long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.
My question is on how to implement the foldStrings function. I have tried some things but none of them worked.
Can you give me some hints on how to do this? Please don't write out the solution; I want to figure it out myself.
I have written some code but I am stuck at the folding part
#include <stdio.h>
#include <string.h>
int getline(char s[], int lim);
void emptystring(char s[]);
void foldStrings(char s[],int len);
int main(){
int len ;
char line[255];
while((len = getline(line,255))>0)
{
foldStrings(line,len);
}
return 0;
}
int getline(char s[],int lim)
{
int c , i ;
for( i = 0 ; i < lim-1 && ( c = getchar()) != EOF && c !='\n';++i)
s[i] = c;
if ( c == '\n')
{
s[i] = c;
++i;
}
return i;
}
void foldStrings(char s[], int len)
{
}
void emptystring(char s[])
{
int i;
int len = strlen(s);
for( i = 0 ; i < len ; ++i){
s[i] = 0 ;
}
}
I am stuck at the foldStrings function.
P.S I am using the empty string function to print the lines, so print a segmented line and then empty it, fill it up again and print it and so on.
Update
I have tried doing the foldStrings, here is one of my implementations
void foldStrings(char s[], int len)
{
int i ;
char temp[255];
for(i = 1;i < len-1 ;++i)
{
if( i % 16 != 0)
{
temp[i-1] = s[i-1];
}
else if(i%16 == 0)
{
printf("%s",temp);
emptystring(temp);
}
}
}

When getline() returns, s is not necessarily terminated by a null character.
// for( i = 0 ; i < lim-1 && ...
for( i = 0 ; i < lim-2 && ( c = getchar()) != EOF && c !='\n';++i)
...
s[i] = '\0'; // add
return i;
The same applies to foldStrings(): temp is missing its null character.
temp[i-1] = s[i-1];
temp[i] = '\0'; // add
Other problems may exist
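One way to make the missing-terminator problem harder to hit is to do the chunk copy in a single helper that always writes the '\0' itself. The following is only a sketch: copy_chunk and its parameters are hypothetical names that do not appear in the question, and it copies a fixed-width slice rather than solving the folding exercise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n characters of s, starting at index
   start, into dst (capacity dstsize). dst is always null-terminated.
   Returns the number of characters copied. */
size_t copy_chunk(char dst[], size_t dstsize, const char s[],
                  size_t start, size_t n)
{
    size_t len = strlen(s);
    size_t i = 0;

    if (dstsize == 0)
        return 0;               /* no room even for the terminator */

    /* Stop at n characters, at the end of s, or one short of dstsize. */
    while (i < n && i + 1 < dstsize && start + i < len) {
        dst[i] = s[start + i];
        ++i;
    }
    dst[i] = '\0';              /* the step missing from the question */
    return i;
}
```

Because the helper terminates dst on every call, there is no need to wipe the buffer with something like emptystring() between chunks.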

This exercise is a little challenging.
To begin with, you don't need to buffer the whole input line. There are only two kinds of characters to handle (blank and non-blank), and a non-blank character always ends up being printed, so your main loop can be something like
while((c = getchar()) != EOF) {
...
}
(this is much the style in which all the exercises in K&R are written)
Note that when blank characters are read, you only have to count them, and you reset the counter when \n is read.
Since you asked me not to reveal the final solution, I'll respect that, but the trick is to count characters as you read the line: output non-blank characters (and count them), and only count blank characters without outputting them. If the character read is a blank and you have passed the limit, you fold the line (emit a \n).
Edit 1
In my first attempt to write the code I discovered a complication: when a run of blanks followed by non-blanks crosses the maximum line length, the line has to be broken at the first blank of that run, so the non-blank characters read since then must be remembered. That needs at most one maximum output line length of storage: if more non-blank characters than that accumulate, the line has to be broken before that point anyway.
My first attempt will store the number of blank characters read so far, followed by a buffer of the maximum output line length (not the input line length, which is unbounded, as the problem specifies). The possible states will be: (continued in the next edit)
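In keeping with the request for hints only, here is one small building block rather than a solution: finding where the last blank sits within the first n columns of a buffered line. This is a sketch under assumptions: last_blank_before is a hypothetical name, and every character (tabs included) is treated as one column wide, which a real solution may want to refine.

```c
/* Hypothetical helper: return the index of the last blank (space or tab)
   among the first n characters of s, or -1 if there is none.
   len is the number of valid characters in s. */
int last_blank_before(const char s[], int len, int n)
{
    int i;
    int last = -1;

    for (i = 0; i < len && i < n; ++i)
        if (s[i] == ' ' || s[i] == '\t')
            last = i;           /* remember the most recent blank */
    return last;
}
```

A return value of -1 is the "no blanks or tabs before the specified column" case from the exercise; deciding what to do there is left open on purpose.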

Related

Kernighan and Ritchie C exercise 1-16

I tried to implement a solution for the exercise on the C language of K&R's book. I wanted to ask here if this could be considered a legal "solution", just modifying the main without changing things inside external functions.
Revise the main routine of the longest-line program so it will
correctly print the length of arbitrary long input lines, and as much
as possible of the text.
#include <stdio.h>
#define MAXLINE 2 ////
int get_line1(char s[], int lim)
{
int c, i;
for (i = 0; i < lim - 1 && ((c = getchar()) != EOF) && c != '\n'; i++) {
s[i] = c;
}
if (c == '\n') {
s[i] = c;
i++;
}
s[i] = '\0';
return i;
}
int main()
{
int len;
int max = MAXLINE;
char line[MAXLINE];
int tot = 0;
int text_l = 0;
while ((len = get_line1(line, max)) > 0) {
if (line[len - 1] != '\n') {
tot = tot + len;
}
if (line[1] == '\n' || line[0] == '\n') {
printf("%d\n", tot + 1);
text_l = text_l + (tot + 1);
tot = 0;
}
}
printf("%d\n", text_l);
}
The idea is to set the maximum length of the string held in the array line to 2.
For a string such as abcdef\n, the array line will be ab. Since the last element of the array is not \n (so the line we are considering is not over), we save the length so far and repeat the cycle. We then get the array cd, then ef, and finally the array holding just \n. Then the second if condition is executed, since the first element of this array is \n, and we print the tot length obtained from the previous additions. We add 1 in order to also count the newline character \n. This also works for strings of odd length: with abcdefg\n the process goes on until we reach g\n, and the sum is computed correctly.
Outside the loop then we print the total amount of text.
Is this a correct way to do the exercise?
The exercise says to “Revise the main routine,” but you altered the definition of MAXLINE, which is outside of main, so that is not a valid solution.
Also, your code does not have the copy or getline routines of the original. Your get_line1 appears to be identical except for the name. However, a correct solution would use identical source code except for the code inside main.
Additionally, the exercise says to print “as much as possible of the text.” That is unclearly stated, but I expect it means to keep a buffer of MAXLINE characters (with MAXLINE at its original value of 1000) and use it to print the first MAXLINE−1 characters of the longest line.
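The accumulation idea can be sketched as follows: keep calling the line reader until a chunk ends in a newline (or input runs out), add up the chunk lengths, and keep only the first chunk as the printable text. This is an illustration, not the book's solution. The names get_line and full_line_length and the FILE * parameter are assumptions made so the logic can be tried in isolation; in an actual answer to the exercise this loop would live inside main and call the unmodified getline.

```c
#include <stdio.h>

#define MAXLINE 1000    /* original value from the book */

/* K&R-style reader, renamed and given a FILE * parameter (assumption). */
int get_line(FILE *fp, char s[], int lim)
{
    int c = EOF, i;

    for (i = 0; i < lim - 1 && (c = getc(fp)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

/* Read one whole input line. The first lim-1 characters are kept in s;
   the rest is read in chunks and only counted. Returns the full length
   of the line, or 0 at end of input. */
long full_line_length(FILE *fp, char s[], int lim)
{
    char rest[MAXLINE];         /* scratch space for the overflow */
    int len = get_line(fp, s, lim);
    long total = len;
    int complete = (len == 0 || s[len - 1] == '\n');

    while (!complete) {
        len = get_line(fp, rest, MAXLINE);
        total += len;
        complete = (len == 0 || rest[len - 1] == '\n');
    }
    return total;
}
```

With a 4-byte buffer and the input line abcdefg\n, the function reports a length of 8 while the buffer holds abc, which matches "the length of arbitrarily long input lines, and as much as possible of the text".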

Checking if a string is a palindrome - when I compile the program shuts down

I'm trying to write a C program which eats a string of a priori bounded length and returns 1 if it's a palindrome and 0 otherwise. We may assume the input consists of lower case letters.
This is a part of a first course in programming, so I have no experience.
Here's my attempt. As soon as I try to build/run it on CodeBlocks, the program shuts down. It's a shame because I thought I did pretty well.
#include <stdio.h>
#define MaxLength 50
int palindrome(char *a,int size) /* checks if palindrome or not */
{
int c=0;
for(int i=0;i<size/2;i++) /* for every spot up to the middle */
{
if (*(a+i)!=*(a+size-i-1)) /* the palindrome symmetry condition */
{
c++;
}
}
if (c==0)
{
return 1; /*is palindrome*/
}
else
return 0; /*is not palindrome*/
}
int main()
{
char A[MaxLength]; /*array to contain the string*/
char *a=&A[0]; /*pointer to the array*/
int i=0; /*will count the length of the string*/
int temp;
while ((temp=getchar())!='\n' && temp != EOF) /*loop to read input into the array*/
{
A[i]=temp;
i++;
}
if (palindrome(a,i)==1)
printf("1");
else
printf("0");
return 0;
}
Remark. I'm going to sleep now, so I will not be responsive for a few hours.
Your approach is ok, though you have a number of small errors. First, #define MaxLength=50 should be #define MaxLength 50 (the name to replace, a space, then its replacement).
You should also provide a prototype for your palindrome() function before main():
int palindrome(char *a,int size);
...or just move the whole palindrome() function above main(). Either a prototype or the actual function definition should appear before any calls to the function happen.
Next issue is that you're looking for a null character at the end of the input string. C strings are usually null terminated, but the lines coming from the console aren't (if there's a terminator it would be added by your program when it decides to end the string) -- you should probably check for a newline instead (and ideally, for errors as well). So instead of
while ((temp=getchar())!='\0')
try
while ((temp=getchar())!='\n' && temp != EOF)
When you print your results in main(), you should have a newline at the end, eg. printf("1\n"); instead of printf("1");, to ensure the output buffer gets flushed so you can see the output as well as to end that output line.
Then in your palindrome() function, your for loop syntax is wrong -- the three parts should be separated with semicolons, not commas. So change:
for(int i=0,i<size/2,i++)
...to:
for(int i=0; i<size/2; i++)
You also have an extra closing brace for the loop body to remove.
After fixing all that, it seems to work...
This directive
#define MaxLength=50
is invalid. There should be
#define MaxLength 50
Change the loop in main the following way
int temp;
^^^
while ( i < MaxLength && ( temp = getchar () )!= EOF && temp != '\n' )
{
A[i] = temp;
i++;
}
Otherwise, with the original loop, the only way to end the input is to enter a zero byte directly into the buffer, for example with the Alt key and the numeric keypad.
The function itself can be written simpler
int palindrome( const char *a, int size) /* checks if palindrome or not */
{
int i = 0;
while ( i < size / 2 && *( a + i ) == *( a + size - i - 1 ) ) ++i;
return i == size / 2;
}
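For comparison, the same check can be written with two indices moving toward each other, returning as soon as a mismatch is found. This is a sketch only; is_palindrome is a hypothetical name, and an empty string is treated as a palindrome by assumption.

```c
#include <stddef.h>

/* Hypothetical variant: returns 1 if the first size characters of a read
   the same forwards and backwards, 0 otherwise. */
int is_palindrome(const char *a, size_t size)
{
    size_t left = 0;        /* index of the next character from the front */
    size_t right = size;    /* one past the next character from the back */

    while (left + 1 < right) {
        if (a[left] != a[right - 1])
            return 0;       /* mismatch: stop early */
        ++left;
        --right;
    }
    return 1;
}
```

Using size_t for the length avoids negative sizes, and the early return means a long non-palindrome is rejected at the first differing pair.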

K&R exercise 1-18 gives segmentation fault when EOF is encountered

I've been stuck at this problem:
Write a Program to remove the trailing blanks and tabs from each input line and to delete entirely blank lines.
for the last couple of hours, it seems that I can not get it to work properly.
#include<stdio.h>
#define MAXLINE 1000
int mgetline(char line[],int lim);
int removetrail(char rline[]);
//==================================================================
int main(void)
{
int len;
char line[MAXLINE];
while((len=mgetline(line,MAXLINE))>0)
if(removetrail(line) > 0)
printf("%s",line);
return 0;
}
//==================================================================
int mgetline(char s[],int lim)
{
int i,c;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if( c == '\n')
{
s[i]=c;
++i;
}
s[i]='\0';
return i;
}
/* To remove Trailing Blanks,tabs. Go to End and proceed backwards removing */
int removetrail(char s[])
{
int i;
for(i = 0; s[i] != '\n'; ++i)
;
--i; /* To consider raw line without \n */
for(i > 0; ((s[i] == ' ') || (s[i] == '\t')); --i)
; /* Removing the Trailing Blanks and Tab Spaces */
if( i >= 0) /* Non Empty Line */
{
++i;
s[i] = '\n';
++i;
s[i] = '\0';
}
return i;
}
I am using gedit text editor in debian.
Anyway when I type text into the terminal and hit enter it just copies the whole line down, and if I type text with blanks and tabs and I press EOF(ctrl+D) I get the segmentation fault.
I guess the program is running out of memory and/or using memory 'blocks' out of its array, I am still really new to all of this.
Any kind of help is appreciated, thanks in advance.
P.S.: I tried using both the code from the solutions book and code from random sites on internet but both of them give me the segmentation fault message when EOF is encountered.
It's easy:
mgetline returns with the buffer filled with the data you entered in two cases:
when a newline character is encountered
when EOF is encountered.
In the first case it puts the newline character into the buffer; in the latter it does not.
Then you pass the buffer to the removetrail function, which first tries to find the newline character:
for(i=0; s[i]!='\n'; ++i)
;
But there is no newline character in the buffer when you hit Ctrl-D! The loop therefore runs past the end of the array, and you get a memory access violation once it leaves mapped memory.
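A version of the trimming step that does not assume a newline is present could look like the sketch below. It is one possible fix, not the book's solution; remove_trailing is a hypothetical name, and returning 0 for a line that is blank after trimming is an assumption chosen so the caller can skip it.

```c
#include <string.h>

/* Hypothetical replacement for removetrail(): strips trailing spaces and
   tabs, keeps the newline if the line had one, and returns the new
   length (0 means the line was entirely blank). */
int remove_trailing(char s[])
{
    size_t len = strlen(s);     /* stops at '\0', so it cannot overrun */
    int has_newline = (len > 0 && s[len - 1] == '\n');

    if (has_newline)
        --len;                  /* look at the line without its '\n' */

    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t'))
        --len;                  /* drop trailing blanks and tabs */

    if (len == 0) {             /* nothing left: report a blank line */
        s[0] = '\0';
        return 0;
    }
    if (has_newline)
        s[len++] = '\n';        /* put the newline back */
    s[len] = '\0';
    return (int)len;
}
```

Searching for the terminating '\0' instead of '\n' is what makes the last line of input safe when it ends at EOF.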

Program runs too slowly with large input - C

The goal for this program is for it to count the number of instances that two consecutive letters are identical and print this number for every test case. The input can be up to 1,000,000 characters long (thus the size of the char array to hold the input). The website which has the coding challenge on it, however, states that the program times out at a 2s run-time. My question is, how can this program be optimized to process the data faster? Does the issue stem from the large char array?
Also: I get a compiler warning "assignment makes integer from pointer without a cast" for the line str[1000000] = "" What does this mean and how should it be handled instead?
Input:
number of test cases
strings of capital A's and B's
Output:
Number of duplicate letters next to each other for each test case, each on a new line.
Code:
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
int main() {
int n, c, a, results[10] = {};
char str[1000000];
scanf("%d", &n);
for (c = 0; c < n; c++) {
str[1000000] = "";
scanf("%s", str);
for (a = 0; a < (strlen(str)-1); a++) {
if (str[a] == str[a+1]) { results[c] += 1; }
}
}
for (c = 0; c < n; c++) {
printf("%d\n", results[c]);
}
return 0;
}
You don't need the line
str[1000000] = "";
scanf() adds a null terminator when it parses the input and writes it to str. This line is also writing beyond the end of the array, since the last element of the array is str[999999].
The reason you're getting the warning is that the type of str[1000000] is char, while the string literal "" is an array of char that decays to a char * pointer.
To speed up the program, take the call to strlen() out of the loop.
size_t len = strlen(str)-1;
for (a = 0; a < len; a++) {
...
}
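One caveat with computing strlen(str)-1 up front: strlen returns an unsigned size_t, so for an empty string the subtraction wraps around to a huge value instead of -1. A single pass that stops at the terminator avoids both the repeated strlen calls and that edge case. The sketch below is an illustration; count_adjacent_pairs is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: count positions i where s[i] == s[i+1].
   Makes one pass over the string and never calls strlen. */
size_t count_adjacent_pairs(const char *s)
{
    size_t count = 0;
    size_t i;

    if (s[0] == '\0')
        return 0;               /* empty string: nothing to compare */

    for (i = 0; s[i + 1] != '\0'; ++i)
        if (s[i] == s[i + 1])
            ++count;
    return count;
}
```

With this counting rule, "AAAA" contributes 3, since each overlapping adjacent pair is counted.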
str[1000000] = "";
This does not do what you think it does, and it overflows the buffer, which results in undefined behaviour. Valid indices range from 0 up to, but not including, sizeof(str). So you either add one to the 1000000 when declaring the array or use 999999 as the index instead. To get rid of the compiler warning and produce cleaner code use:
str[1000000] = '\0';
Or
str[999999] = '\0';
Depending on what you did to fix it.
As to optimizing, you should look at the assembly and go from there.
count the number of instances that two consecutive letters are identical and print this number for every test case
For efficiency, the code needs a new approach, as suggested by #john bollinger & #molbdnilo
void ReportPairs(const char *str, size_t n) {
int previous = EOF;
unsigned long repeat = 0;
for (size_t i=0; i<n; i++) {
int ch = (unsigned char) str[i];
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
}
char *testcase1 = "test1122a33";
ReportPairs(testcase1, strlen(testcase1));
or directly from input and "each test case, each on a new line."
int ReportPairs2(FILE *inf) {
int previous = EOF;
unsigned long repeat = 0;
int ch;
while ((ch = fgetc(inf)) != '\n') {
if (ch == EOF) return ch;
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
return ch;
}
while (ReportPairs2(stdin) != EOF);
Unclear how OP wants to count "AAAA" as 2 or 3. This code counts it as 3.
One way to dramatically improve the run-time for your code is to limit the number of times you read from stdin (basically, process input in bigger chunks). You can do this a number of ways, but probably one of the most efficient would be with fread. Even reading in 8-byte chunks can provide a big improvement over reading a character at a time. One example of such an implementation considering capital letters [A-Z] only would be:
#include <stdio.h>
#define RSIZE 8
int main (void) {
char qword[RSIZE] = {0};
char last = 0;
size_t i = 0;
size_t nchr = 0;
size_t dcount = 0;
/* read up to 8-bytes at a time */
while ((nchr = fread (qword, sizeof *qword, RSIZE, stdin)))
{ /* compare each byte to byte before */
for (i = 1; i < nchr && qword[i] && qword[i] != '\n'; i++)
{ /* if not [A-Z] continue, else compare */
if (qword[i-1] < 'A' || qword[i-1] > 'Z') continue;
if (i == 1 && last == qword[i-1]) dcount++;
if (qword[i-1] == qword[i]) dcount++;
}
last = qword[i-1]; /* save last for comparison w/next */
}
printf ("\n sequential duplicated characters [A-Z] : %zu\n\n",
dcount);
return 0;
}
Output/Time with 868789 chars
$ time ./bin/find_dup_digits <dat/d434839c-d-input-d4340a6.txt
sequential duplicated characters [A-Z] : 434893
real 0m0.024s
user 0m0.017s
sys 0m0.005s
Note: the string was actually a string of '0's and '1's run with a modified test of if (qword[i-1] < '0' || qword[i-1] > '9') continue; rather than the test for [A-Z]...continue, but your results with 'A's and 'B's should be virtually identical. 1000000 would still be significantly under .1 seconds. You can play with the RSIZE value to see if there is any benefit to reading a larger (suggested 'power of 2') size of characters. (note: this counts AAAA as 3) Hope this helps.
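The chunked-read idea can also be written with a larger buffer and a single "previous character" variable carried across chunk boundaries, so no special case is needed for the first byte of a chunk. This is a sketch under assumptions: CHUNK and count_pairs_stream are hypothetical names, only [A-Z] is counted, and the function returns one total for the whole stream rather than one count per test case.

```c
#include <stdio.h>

#define CHUNK 4096      /* hypothetical chunk size; any positive value works */

/* Hypothetical helper: count adjacent identical capital letters in the
   whole stream, reading it in CHUNK-sized blocks. */
unsigned long count_pairs_stream(FILE *fp)
{
    char buf[CHUNK];
    size_t n, i;
    int prev = EOF;             /* last byte seen, carried across chunks */
    unsigned long count = 0;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; ++i) {
            int ch = (unsigned char)buf[i];
            if (ch >= 'A' && ch <= 'Z' && ch == prev)
                ++count;
            prev = ch;          /* a newline here separates lines */
        }
    }
    return count;
}
```

Because a newline is stored in prev like any other byte, the last letter of one line is never paired with the first letter of the next.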

Writing in the location outside of array

I've just started learning programming. This is my first post. I'm reading a book "C Programming Language" by Kernighan and Ritchie, and I came across an example that I don't understand (section 1.9, p 30).
This program takes text as input, determines the longest line, and prints it.
Char array line[MAXLINE] is declared, where MAXLINE is 1000. This should mean that the last element of this array has index of MAXLINE-1, which is 999.
However, if you look at function getline, which is being passed line[] array as an argument (and MAXLINE as lim), it appears that if user input is a line longer than MAXLINE, i will be incremented until i = lim, that is, i = MAXLINE. Therefore, the statement line[i] = '\0' will be line[MAXLINE] = '\0'.
This looks wrong to me - how can we write to the line[MAXLINE] location, if the size of line[] is MAXLINE. Wouldn't it be writing into the location outside of the array?
The only explanation I can come up with is that when declaring char array[size], C language actually creates char array[size+1] array, where the last element is reserved for the NULL character. If so, this is pretty confusing, and isn't mentioned in the book. Can anyone confirm this, or explain what's going on?
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line length */
int getline(char line[], int maxline);
void copy(char to[], char from[]);
/* print the longest input line */
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/* getline: read a line into s, return length */
int getline(char s[],int lim)
{
int c, i;
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
This for loop appears to be doing the reading in getline:
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
It looks like i is incremented until it reaches lim - 1, not lim (where lim here is equal to MAXLINE in the case you were talking about). Hence, if the line is longer than MAXLINE, it stops after reading MAXLINE-1 characters, and tacks on the '\0' at the end like you expect.
If you look at this line, you can see that the loop condition is i < lim-1, so the loop stops once i reaches lim-1; the last character it can store is at index lim-2.
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
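If the character that ended the loop was a \n, it is appended at index lim-2 at most, so the 0 byte lands at index lim-1 at most, the last valid element. That boundary case occurs when the line, including its newline, is exactly one byte shorter than the limit (which is correct, because the buffer must also hold the 0 byte).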
No, I think it is clean.
Note that since the book was written, POSIX has standardized a getline() function with a completely different interface; this can cause some grief, but it is fixable by renaming the function from K&R.
The code is:
int getline(char s[],int lim)
{
int c, i;
for (i = 0; i < lim-1 && (c=getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
Let's consider 2 cases:
998 characters followed by newline.
999 characters followed by newline.
In the first case, when the character before the newline is read, i is 997, which is less than 999 (lim-1), so the getchar() is executed, the character is neither EOF nor newline, and s[997] is assigned, and i is incremented to 998. Since i is still less than 999, the newline is read, and the loop is terminated. Because c is the newline, s[998] is given the newline and i is incremented to 999. Then the assignment s[i] = '\0'; writes to element 999, which is safe.
The analysis in the second case is similar. When the character before the newline is read, i is 998, which is less than 999, so getchar() is executed, the character is neither EOF nor newline, so s[998] is assigned, and i is incremented to 999. Since i is no longer less than 999, the loop exits without reading the newline; since c is not a newline, the body of the if after the loop is not executed; then the null is written to s[999], which is safe.
If EOF is detected before the newline (so the file doesn't end with a newline and technically isn't a text file according to the C standard), the loop is safely broken without overflowing the buffer.
Is there a case that isn't covered?
This is called testing the boundary conditions. It is important to test just below a limit (to make sure it works OK) and at the limit (to ensure it handles that OK). Most of the time, the algorithm doesn't need more than one test just below and one test at the limit; sometimes, if the algorithm handles several numbers either side of a limit (e.g. average of 3 cells), then you have to do more testing at the upper boundary. Lower boundary testing is also important — testing for 0, 1, 2, ... is very valuable.
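The two cases walked through above can also be checked mechanically at a smaller scale by placing a guard byte just past the buffer and confirming it survives. This is a sketch under assumptions: kr_getline is the book's loop with a FILE * parameter added, and stays_in_bounds, MAXLIM and GUARD are hypothetical names introduced for the check.

```c
#include <stdio.h>

#define MAXLIM 64       /* hypothetical upper bound on the tested limit */
#define GUARD  '#'      /* hypothetical sentinel placed just past the buffer */

/* The book's loop, with a FILE * parameter added (assumption). */
int kr_getline(FILE *fp, char s[], int lim)
{
    int c = EOF, i;

    for (i = 0; i < lim - 1 && (c = getc(fp)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

/* Returns 1 if reading text with a lim-byte buffer stays within bounds:
   the result is terminated inside the buffer and the guard is intact. */
int stays_in_bounds(const char *text, int lim)
{
    char buf[MAXLIM + 1];
    FILE *fp;
    int len;

    if (lim < 1 || lim > MAXLIM)
        return 0;
    fp = tmpfile();
    if (fp == NULL)
        return 0;
    fputs(text, fp);
    rewind(fp);

    buf[lim] = GUARD;           /* first byte outside the lim-byte buffer */
    len = kr_getline(fp, buf, lim);
    fclose(fp);

    return len <= lim - 1 && buf[len] == '\0' && buf[lim] == GUARD;
}
```

With lim set to 5, the inputs "abc\n" and "abcd\n" play the roles of the 998- and 999-character lines: one just below the limit and one at it.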
general answer
reading/writing outside of allocated memory is undefined behaviour.
In many cases it will lead to the dreaded Segmentation fault.
In some cases you might get away due to sheer luck (e.g. because the actual memory you have accessed is physically/logically existing and not used otherwise).
the simple answer is: do not do this!! protect your code against accessing out-of-bounds memory.
C never does any magic, like allocating n+1 bytes when you only asked it to allocate n bytes.
as for your specific example
for (i=0; i < lim-1 /* ... */ ; ++i)
This will not increment i up to lim: the condition makes sure the body only runs while i is smaller than lim-1, so as soon as i reaches lim-1 (which is still a valid index within s[]) the for loop stops.
