C program to sort characters in a string - c

I wrote this C program that reads a file line by line (each line holds a single word), sorts the letters of each word, and then prints the sorted word next to the original word, one pair per line.
#include<stdio.h>
int main()
{
char line[128];
int i=0;
int j;
int length;
while(fgets(line,sizeof line,stdin) != NULL)
{
char word[128];
for (i=0; line[i] != '\0'; i++)
{
word[i]=line[i];
}
while (line[i] != '\0')
i++;
length=i;
for (i=length-1; i >=0; i--)
{
for (j=0; j<i; j++)
{
if (line[j] > line[i])
{
char temp;
temp = line[j];
line[j] = line[i];
line[i]=temp;
}
}
}
printf("%s %s",line,word);
}
return 0;
}
I'm compiling and running it using the following bash commands.
gcc -o sign sign.c
./sign < sample_file | sort > output
The original file (sample_file) looks like this:
computer
test
file
stack
overflow
The output file is this:
ackst stack
cemoprtu computer
efil file
efloorvw overflow
er
estt test
ter
ter
I'm having two issues:
The output file has a bunch of newline characters at the beginning (i.e. about 5-7 blank lines before the actual text begins).
Why does it print 'ter' twice at the end?
PS - I know these are very elementary questions, but I only just started working with C / bash for a class and I'm not sure where I am going wrong.

Problem 1
After this code, the variable line contains a line of text, including the newline character from the end of the string
while(fgets(line,sizeof line,stdin) != NULL)
{
This is why you are getting the "extra" newlines. The ASCII value for newline is less than the ASCII value for 'A'. That is why the newlines end up at the beginning of each string, once you've sorted the characters. E.g. "computer\n" becomes "\ncemoprtu".
To solve this, you can strip the newlines off the end of your strings, after the for-loop
if(i > 0 && word[i-1] == '\n')
{
word[i-1] = '\0';
line[i-1] = '\0';
--i;
}
...
printf("%s %s\n",line,word); /* notice the addition of the newline at the end */
This happens to solve problem 2, as well, but please read on, to see what was wrong.
Problem 2
After the loop
for (i=0; line[i] != '\0'; i++) { /* */ }
The string word is not null-terminated, because the loop stops before it copies the '\0' (if a terminator happens to be there, that is blind luck, since word is uninitialized memory). This is why you get the "ter": it is part of the data left behind when the longer word "computer" was copied into word earlier.
Problem 3
After the loop
for (i=0; line[i] != '\0'; i++) { /* */ }
The condition line[i] != '\0' is already false, because that is exactly what ended the for-loop. This means that this code does nothing:
while (line[i] != '\0')
i++;
It might make the problem more obvious if I replace the for-loop and the while-loop with basically identical code, using goto:
i=0;
begin_for_loop:
if(line[i] != '\0')
{
{
word[i]=line[i];
}
i++;
goto begin_for_loop;
}
begin_while_loop:
if(line[i] != '\0')
{
i++;
goto begin_while_loop;
}
(btw, most professional programmers will do anything from laugh to yell at you if you mention using goto :) I only am using it here to illustrate the point)
A tip I find handy is to draw out my arrays, variables etc on a piece of paper, then trace through each line of my code (again, on paper) to debug how it works.
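Putting the three fixes together, here is a minimal sketch of the per-line work. The helper name sign_word and its cap parameter are made up for this illustration, and qsort is swapped in for the hand-written sort loop; the point is only that the newline is stripped first and that the copy is always terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders two characters by numeric value. */
static int compare_chars(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not part of the original program): strips the
 * newline kept by fgets, saves the unsorted word in original (a buffer of
 * cap bytes), then sorts the characters of line in place.
 */
static void sign_word(char *line, char *original, size_t cap)
{
    size_t len;

    line[strcspn(line, "\n")] = '\0';   /* cut the string at the '\n', if any */

    len = strlen(line);
    if (len >= cap)
        len = cap - 1;                  /* truncate rather than overflow */
    memcpy(original, line, len);
    original[len] = '\0';               /* the copy is always terminated */

    qsort(line, strlen(line), 1, compare_chars);
}
```

Inside the while(fgets(...)) loop you would call sign_word(line, word, sizeof word) and then printf("%s %s\n", line, word).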

Related

Why does the output have random characters at the end?

I was trying to solve a problem on the toph.co platform. My code gives the correct output in some cases, but when the input contains an 'o' it seems to get stuck in a loop. I suspect the problem is there, but I am unable to find it.
Problem link: https://toph.co/p/better-passwords
Please help me find the problem in my code. I am using the C programming language, and as I am new to programming I cannot see what is wrong.
I have tried many approaches and modified the code again and again, and it is driving me mad.
#include <stdio.h>
#include<string.h>
int main()
{
char s[40], i, j;
int len;
gets(s);
len= strlen(s);
for(i=0; i<=len; i++)
{
if(s[i] == 's')
{
s[i]= '$';
}
else if(s[i] == 'i')
{
s[i]= '!';
}
else if(s[i] == 'o')
{
s[i]= '(';
for(j=len; j>i; j--)
{
s[j]=s[j-1];
len= strlen(s);
}
s[i+1]=')';
}
else
{
continue;
}
len= strlen(s);
}
if (s[0]>='a' && s[0]<='z')
{
s[0]= s[0]- 32;
}
int new_len=strlen(s);
s[new_len]='.';
puts(s);
return 0;
}
I expect the output Un$()ph!$t!cated. but the program prints Un$()ph!$t!cated. followed by many unwanted characters.
The null byte that terminates your string is lost during the manipulations. The easiest way to avoid such problems is to initialize the whole character array to zero bytes:
char s[40] = {0}, // was char s[40], uninitialized !
Also do you notice a compile message "warning: the `gets' function is dangerous and should not be used" ?
gets() is dangerous, because it is not protected from buffer overflow - try to run your program with very long string exceeding your buffer s capacity and you will get a crash :
*** stack smashing detected ***: terminated
Aborted (core dumped)
Use fgets() instead of gets(), like so:
fgets(s, 39, stdin);
s[strcspn(s, "\n")] = '\0'; // deleting newline character, if there is one
Notice that we pass 39 (BUFFER_SIZE - 1) to fgets() instead of the full buffer size of 40, so the string we read is one character shorter than the buffer could hold. If the input filled the whole buffer, the code that extends the string would smash the stack once again. Every 'o' adds a character and a '.' is appended at the end, so an input with several 'o's needs an even smaller limit or a bigger buffer. You need to be serious about buffer overflows.
As most commenters wrote: you don't put the end-of-string character '\0' after the extended string. Strings in C are marked by this extra character at the end. It is needed to find the end of a string.
Example: A string "abc" is actually a sequence of 4 characters, 'a', 'b', 'c', and '\0'. Its length is 3, and its s[3] contains '\0'.
Your loop:
for(j=len; j>i; j--)
{
s[j]=s[j-1];
len= strlen(s);
}
In the first turn of the loop the character at s[len] is replaced by the character at s[len-1]. This overwrites the original '\0' with the last (visible) character of your string.
If you change the loop into:
for (j = len; j > i; j--)
{
s[j + 1] = s[j];
}
the '\0' will be copied in the first run.
Note 1: Move the assignment to len so that it comes after the loop.
Note 2: Make sure that your variable s is big enough for all inputs.
Note 3: See the whitespace I use? Adopt a good code style and stick to it.
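The same shift can be done in one step with memmove, which moves the tail of the string together with its '\0'. This is only a sketch: insert_char and its cap parameter are names invented here, not something from the question.

```c
#include <string.h>

/*
 * Hypothetical helper: inserts ch at index pos of the string s, whose
 * buffer holds cap bytes. Returns 0 on success, or -1 if pos is past the
 * end of the string or the longer string would not fit.
 */
int insert_char(char *s, size_t cap, size_t pos, char ch)
{
    size_t len = strlen(s);

    if (pos > len || len + 2 > cap)
        return -1;                      /* need room for ch and the '\0' */

    /* Shift s[pos..len], terminator included, one place to the right. */
    memmove(s + pos + 1, s + pos, len - pos + 1);
    s[pos] = ch;
    return 0;
}
```

In the question's loop this would replace the inner for-loop: after s[i] = '(' you would call insert_char(s, sizeof s, i + 1, ')') and check the return value.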
There are a couple of places in your code where you are overwriting the "\0".
These are:
s[j]=s[j-1];
And
s[new_len]='.';
So the rectified code would be....
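#include <stdio.h>
#include <string.h>
int main()
{
char s[40], i, j;
int len;
fgets(s, sizeof s, stdin); //fgets() limits the input length, unlike gets()
s[strcspn(s, "\n")] = '\0'; //remove the newline that fgets() keeps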
len= strlen(s);
for(i=0; i<=len; i++)
{
if(s[i] == 's')
{
s[i]= '$';
}
else if(s[i] == 'i')
{
s[i]= '!';
}
else if(s[i] == 'o')
{
s[i]= '(';
for(j=len+1; j>i; j--) //j needs to be one more than the length
{
s[j]=s[j-1];
len= strlen(s);
}
s[i+1]=')';
}
else
{
continue;
}
len= strlen(s);
}
if (s[0]>='a' && s[0]<='z')
{
s[0]= s[0]- 32;
}
int new_len=strlen(s);
s[new_len]='.';//add a '.'
s[new_len+1] = '\0'; //add the terminating character
puts(s);
return 0;
}
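Another way to sidestep the terminator problem entirely is to build the result in a second buffer instead of shifting characters in place. The sketch below assumes the rules are exactly the ones the question's code applies (s becomes $, i becomes !, o becomes (), the first letter is capitalized, a '.' is appended); better_password and cap are hypothetical names.

```c
#include <stddef.h>

/*
 * Hypothetical helper: writes the transformed password into out, a buffer
 * of cap bytes. Returns 0 on success, -1 if out is too small.
 */
int better_password(const char *in, char *out, size_t cap)
{
    size_t n = 0;                          /* characters written so far */

    for (; *in != '\0'; ++in) {
        char single[2] = { *in, '\0' };    /* the character unchanged */
        const char *rep;

        switch (*in) {
        case 's': rep = "$";    break;
        case 'i': rep = "!";    break;
        case 'o': rep = "()";   break;
        default:  rep = single; break;
        }
        for (; *rep != '\0'; ++rep) {
            if (n + 1 >= cap)
                return -1;                 /* always keep room for '\0' */
            out[n++] = *rep;
        }
    }
    if (n + 2 > cap)
        return -1;                         /* room for '.' and '\0' */
    if (n > 0 && out[0] >= 'a' && out[0] <= 'z')
        out[0] = (char)(out[0] - 'a' + 'A');
    out[n++] = '.';
    out[n] = '\0';                         /* terminate explicitly */
    return 0;
}
```

Because the output is only ever appended to, nothing can overwrite the terminator, and the capacity check makes the buffer size requirement explicit.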

K&R exercise 1-18 gives segmentation fault when EOF is encountered

I've been stuck at this problem:
Write a Program to remove the trailing blanks and tabs from each input line and to delete entirely blank lines.
for the last couple of hours, it seems that I can not get it to work properly.
#include<stdio.h>
#define MAXLINE 1000
int mgetline(char line[],int lim);
int removetrail(char rline[]);
//==================================================================
int main(void)
{
int len;
char line[MAXLINE];
while((len=mgetline(line,MAXLINE))>0)
if(removetrail(line) > 0)
printf("%s",line);
return 0;
}
//==================================================================
int mgetline(char s[],int lim)
{
int i,c;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if( c == '\n')
{
s[i]=c;
++i;
}
s[i]='\0';
return i;
}
/* To remove Trailing Blanks,tabs. Go to End and proceed backwards removing */
int removetrail(char s[])
{
int i;
for(i = 0; s[i] != '\n'; ++i)
;
--i; /* To consider raw line without \n */
for(i > 0; ((s[i] == ' ') || (s[i] == '\t')); --i)
; /* Removing the Trailing Blanks and Tab Spaces */
if( i >= 0) /* Non Empty Line */
{
++i;
s[i] = '\n';
++i;
s[i] = '\0';
}
return i;
}
I am using gedit text editor in debian.
Anyway when I type text into the terminal and hit enter it just copies the whole line down, and if I type text with blanks and tabs and I press EOF(ctrl+D) I get the segmentation fault.
I guess the program is running out of memory and/or using memory 'blocks' out of its array, I am still really new to all of this.
Any kind of help is appreciated, thanks in advance.
P.S.: I tried using both the code from the solutions book and code from random sites on internet but both of them give me the segmentation fault message when EOF is encountered.
It's easy:
mgetline returns the buffer filled with the data you entered in two cases:
when a newline character is encountered
when EOF is encountered.
In the first case it puts the newline character into the buffer; in the latter it does not.
Then you pass the buffer to removetrail function that first tries to find the newline char:
for(i=0; s[i]!='\n'; ++i)
;
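But there is no newline character when you hit Ctrl-D! The loop therefore keeps reading past the end of the string, and it can run on past the array until it touches memory that is not mapped, which is what raises the segmentation fault.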
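One way to make removetrail robust is to start from the end of the string (found with strlen) instead of searching for a '\n' that may not exist. This is a sketch under that assumption, not the solution from the book; it keeps the function's name and return convention from the question.

```c
#include <string.h>

/*
 * Removes trailing blanks and tabs from s, keeping the final '\n' if the
 * line had one. Returns the new length, or 0 for an entirely blank line.
 */
int removetrail(char s[])
{
    size_t len = strlen(s);
    int had_newline = (len > 0 && s[len - 1] == '\n');

    if (had_newline)
        --len;                          /* look at the text before the '\n' */
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t'))
        --len;                          /* step back over trailing blanks */

    if (len == 0) {                     /* nothing left: a blank line */
        s[0] = '\0';
        return 0;
    }
    if (had_newline)
        s[len++] = '\n';                /* put the newline back */
    s[len] = '\0';
    return (int)len;
}
```

The len > 0 test in the while condition also covers the blank-line case, where the question's second loop would otherwise step below index 0.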

C Program won't remove comments that take up the whole line

So I'm working through the K&R C book and there was a bug in my code that I simply cannot figure out.
The program is supposed to remove all the comments from a C program. Obviously I'm just using stdin
#include <stdio.h>
int getaline (char s[], int lim);
#define MAXLINE 1000 //maximum number of characters to put into string[]
#define OUTOFCOMMENT 0
#define INASINGLECOMMENT 1
#define INMULTICOMMENT 2
int main(void)
{
int i;
int isInComment;
char string[MAXLINE];
getaline(string, MAXLINE);
for (i = 0; string[i] != EOF; ++i) {
//finds whether loop is in a comment or not
if (string[i] == '/') {
if (string[i+1] == '/')
isInComment = INASINGLECOMMENT;
if (string[i+1] == '*')
isInComment = INMULTICOMMENT;
}
//fixes the problem of print messing up after the comment
if (isInComment == INASINGLECOMMENT && string[i] == '\0')
printf("\n");
//if the line is done, restates all the variables
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
//prints current character in loop
if(isInComment == OUTOFCOMMENT && string[i] != EOF)
printf("%c", string[i]);
//checks to see of multiline comment is over
if(string[i] == '*' && string[i+1] == '/' ) {
++i;
isInComment = OUTOFCOMMENT;
}
}
return 0;
}
So this works great except for one problem. Whenever a line starts with a comment, it prints that comment.
So for instance, if I had a line that was simply
//this is a comment
without anything before the comment begins, it will print that comment even though it's not supposed to.
I thought I was making good progress, but this bug has really been holding me up. I hope this isn't some super easy thing I've missed.
EDIT: Forgot the getaline function
//puts line into s[], returns length of that line
int getaline(char s[], int lim)
{
int c, i;
for (i = 0; i < lim-1 && (c = getchar()) != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
There are many problems in your code:
isInComment is not initialized in function main.
As pointed out by others, string[i] != EOF is wrong. You need to test for end of file more precisely, especially for files that do not end with a linefeed. This test only works if the char type is signed and EOF is a valid signed char value. It will nonetheless mistakenly stop on a stray \377 character, which is legal in a string or in a comment.
When you detect the end of line, you read another line and reset i to 0, but i will be incremented by the for loop before you test again for single line comment... hence the bug!
You do not handle special cases such as /* // */ or // /*
You do not handle strings. This is not a comment: "/*", nor this: '//'
You do not handle \ at end of line (escaped linefeed). This can be used to extend single line comments, strings, etc. There are more subtle cases related to \ handling and if you really want completeness, you should handle trigraphs too.
Your implementation has a limit for line size, this is not needed.
The problem you are assigned is a bit tricky. Instead of reading and parsing lines, read one character at a time and implement a state machine to parse escaped linefeeds, strings, and both comment styles. The code is not too difficult if you do it right with this method.
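To give an idea of what such a state machine can look like, here is a reduced sketch. It works on a string already in memory rather than on getchar() so that it is easy to try out, it handles both comment styles plus string and character literals, and it deliberately leaves out escaped linefeeds and trigraphs. Comments are simply dropped rather than replaced by a space. The names strip_comments and enum state are invented for this example.

```c
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR, STRING, CHAR_LIT };

/*
 * Copies src to dst with comments removed. dst must have room for
 * strlen(src) + 1 bytes. Escaped linefeeds and trigraphs are not handled.
 */
void strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    int escaped = 0;                 /* previous literal char was a backslash */

    for (; *src != '\0'; ++src) {
        char c = *src;

        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;          /* might start a comment: hold it back */
            } else {
                if (c == '"')  st = STRING;
                if (c == '\'') st = CHAR_LIT;
                *dst++ = c;
            }
            break;
        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {
                *dst++ = '/';        /* it was an ordinary slash after all */
                *dst++ = c;
                st = (c == '"') ? STRING : (c == '\'') ? CHAR_LIT : CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') {         /* the newline ends the comment and is kept */
                *dst++ = c;
                st = CODE;
            }
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/')      st = CODE;
            else if (c != '*') st = BLOCK_COMMENT;
            break;
        case STRING:
        case CHAR_LIT:
            *dst++ = c;
            if (escaped)        escaped = 0;
            else if (c == '\\') escaped = 1;
            else if (c == (st == STRING ? '"' : '\'')) st = CODE;
            break;
        }
    }
    if (st == SLASH)
        *dst++ = '/';                /* input ended right after a lone slash */
    *dst = '\0';
}
```

Because the state is carried from one character to the next, a comment at the very start of a line needs no special treatment, and the same switch works unchanged if the characters come from getchar() instead of a string.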
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
When you start a new line, you initialize i to 0. But then in the next iteration:
for (i = 0; string[i] != EOF; ++i)
i will be incremented, so you'll begin the new line with index 1. Therefore there is a bug when the line begins with //.
The problem goes away if you set i to -1 instead and skip the rest of the iteration, so that the loop's ++i brings it back to 0:
if (string[i] == '\0') {
getaline(string, MAXLINE);
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
i = -1; /* the ++i in the for statement makes this 0 */
continue;
}
though it's usually considered bad style to modify a for loop's index inside the loop. You may want to redesign your implementation in a more readable way.

C filling an array from a text file

I'm trying to fill an array from a text file. I'm using fgetc and my problem is dealing with the newline characters that are in the text file. I've currently got,
for(i = 0; i < rows; i++){
for(j = 0; j < columns; j++){
if((fgetc(fp) == '\n') || (fgetc(fp) == '\r')){
fgetc(fp);
array[i][j] = fgetc(fp);
else{
array[i][j] = fgetc(fp);
}
printf("i %d j %d char %c code %d\n", i, j, array[i][j], array[i][j]);
}
}
The idea is that if there's a newline character I want to advance the file pointer while in the same i,j position of the loop so I can get the next character. The output for this is jumbled for the first two rows and then it starts reading characters with character code -1. Am I doing something terribly wrong?
Each call to fgetc will advance the file pointer. Try calling it once:
int c = fgetc(fp);
then test the value of c. Store it if you want or go through the loop again.
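As a sketch of that idea, the helper below reads one character per iteration, skips line endings, and stops at end of file. fill_grid is a made-up name, and the grid is passed as a flat pointer to rows * cols chars (for a char array[rows][columns] you would pass &array[0][0]).

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads up to rows * cols characters from fp into
 * grid, row by row, ignoring '\n' and '\r'. Returns the number of cells
 * filled, which is less than rows * cols if the file ends early.
 */
size_t fill_grid(FILE *fp, char *grid, size_t rows, size_t cols)
{
    size_t filled = 0;
    int c;                              /* int, so EOF stays distinguishable */

    while (filled < rows * cols && (c = fgetc(fp)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;                   /* skip line endings, keep the slot */
        grid[filled++] = (char)c;
    }
    return filled;
}
```

Comparing the return value with rows * cols tells you whether the file was shorter than expected, which is the situation that produced the character code -1 in the question's output.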
In your first if() statement, there's a bit of an issue. When you do this:
if((fgetc(fp) == '\n') || (fgetc(fp) == '\r')){
fgetc(fp);
array[i][j] = fgetc(fp);
You are actually calling fgetc(fp) up to four times: once or twice in the if() statement, and twice more inside the branch.
Perhaps you are looking more for something like this:
for(i = 0; i < rows; i++){
for(j = 0; j < columns; j++){
int test = fgetc(fp);
if(test == EOF)
break; //Stop filling this row when the file runs out
if(test == '\n' || test == '\r'){
//We want to "undo" the last j++ if we got a line ending
j--;
continue;
}
array[i][j] = test;
printf("i %d j %d char %c code %d\n", i, j, array[i][j], array[i][j]);
}
}
In this example, you call fgetc(fp) exactly once per iteration, and if it's not a \n or \r, you put it in your array.
I'm sorry, I have little experience with fgetc(). If you notice something incredibly awful with what I've done, please notify me!
I can immediately see one source of error. In the following line:
if((fgetc(fp) == '\n') || (fgetc(fp) == '\r'))
There are 2 calls to fgetc(). This means that if the first call does not return '\n', another call will be made whose return value is then compared to '\r'. This has the effect of advancing the file pointer twice, as the pointer is advanced each time you call fgetc . A better way to do this would be to fetch one character and then test whether it is '\n' or '\r', and only then incrementing the file pointer with another call to fgetc if this is true. For example:
int letter = fgetc(fp); /* int, so that EOF can be told apart from a real character */
if((letter == '\n') || (letter == '\r'))
...
...
Try this and see if you still get the same error.
I believe you are getting the character in your evaluation statement twice. Also, typically the CRLF (carriage return and line feed) end of line characters can be two characters. Read http://en.wikipedia.org/wiki/Newline for details on this.
#include <stdio.h>
int main ()
{
FILE *fp;
int c;
fp = fopen("file.txt","r");
if(fp == NULL)
{
fprintf(stderr,"Error opening file");
return(-1);
}
while((c = fgetc(fp)) != EOF)
{
if ((c == '\n') || (c == '\r')) {
continue; // skip CR and LF without consuming the character after them
} else {
printf("%c", c); // print all other characters
}
}
fclose(fp);
return(0);
}
This was a quick hit at the code from memory. I don't have a compiler readily available but I think it is correct.

Printing C Array to a terminal

I need to read in a file from C, store it in an array and print its contents. For some reason I keep seeing octal in my output near the end. I am dynamically creating the array after counting how many lines and characters are in it after opening the file.
output:
Abies
abies
abietate
abietene
abietic
abietin
\320ΡΏ_\377Abietineae --> umlaut? where did he come from?
y\300_\377abietineous
code:
int main(int argc, char ** argv) {
char c = '\0';
FILE * file;
int i = 0, j = 0, max_line = 0, max_char_per_line = 0;
/* get array limits */
file = fopen(argv[1], "r");
while ((c = fgetc(file)) != EOF){
if (c == '\n'){
max_line++; j++;
if (j > max_char_per_line){
max_char_per_line = j;
}
j = 0;
continue;
}
j++;
}
rewind(file);
/* declare array dynamically based on max line and max char */
char word[max_line][max_char_per_line];
/*read in file*/
j = 0; c = '\0';
while ((c = fgetc(file)) != EOF){
if (c == '\n'){
word[i][j] = '\0';
i++; j=0;
continue;
}
word[i][j] = c;
j++;
}
word[i][j] = '\0';
fclose(file);
for (i = 0; i < max_line; i++){
printf("%s\n", word[i]);
}
return 0;
}
Change read routine:
if (c == '\n'){
word[i][j] = 0x0;
i++; j=0;
continue;
}
and add the "\n" back in the printf routine.
for (i = 0; i < max_line; i++){
printf("%s\n", word[i]);
}
C strings are zero-terminated, not "\n"-terminated, so when you printf()ed them, printf() did not know where to stop printing.
You aren't terminating your strings. You need to add the null-terminator: \0, after the last character for each line.
In your first loop, you determine enough space for the longest line, including a newline character.
If you want to keep the newlines in your input array, just add 1 to max_char_per_line, and add the null-terminator after the newline character when you finish each line in your second loop.
If you don't need the newline in your input array, instead simply use that space for the null-terminator.
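Here is a sketch of the second option, where the slot that would have held the newline is used for the terminator. The function names measure_lines and read_lines are invented for this illustration, and the rows are stored in one flat block of lines * width bytes. The characters are read into an int rather than a char so that EOF cannot be confused with a byte such as \377.

```c
#include <stdio.h>

/*
 * First pass: counts the lines in fp and the width needed per row,
 * including one byte for the terminating '\0'. A last line without a
 * trailing '\n' is counted too. Rewinds fp before returning.
 */
void measure_lines(FILE *fp, size_t *lines, size_t *width)
{
    size_t longest = 0, current = 0;
    int c;                              /* int, not char: EOF must stay distinct */

    *lines = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            ++*lines;
            if (current > longest) longest = current;
            current = 0;
        } else {
            ++current;
        }
    }
    if (current > 0) {                  /* last line had no '\n' */
        ++*lines;
        if (current > longest) longest = current;
    }
    *width = longest + 1;               /* +1 for the '\0' */
    rewind(fp);
}

/* Second pass: copies each line into rows and terminates every one. */
void read_lines(FILE *fp, char *rows, size_t lines, size_t width)
{
    size_t i = 0, j = 0;
    int c;

    while (i < lines && (c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            rows[i * width + j] = '\0';
            ++i;
            j = 0;
        } else if (j < width - 1) {
            rows[i * width + j++] = (char)c;
        }
    }
    if (i < lines)
        rows[i * width + j] = '\0';     /* unterminated last line */
}
```

Row i then starts at rows + i * width and can be printed with printf("%s\n", rows + i * width). Note that the final terminator is only written when a last line without '\n' actually exists, so nothing is stored past the last row.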
Not that it explains exactly the phenomenon you observe, but it may. You do not seem to take into account the terminating zero byte when calculating array boundaries. Just ++ the max_char_per_line after doing the calculations. And don't forget to add this zero byte if the array isn't guaranteed to be zero-initialized.
edit: do you see these lines after the output or in one these lines of output?
