Counting the number of lines in a file in C (UNIX)

I'm trying to find the total number of lines in a text file, but it's not working (the final line count is 0 - see below). Here's the code:
#define BUFFER_SIZE 1
int lineNumber = 0;
int columnNumber = 0;
char *byteCurrent;
while (read(openFile, &byteCurrent, BUFFER_SIZE) > 0)
{
if (byteCurrent[0] != '\0') columnNumber++;
if (byteCurrent[0] == '\n') lineNumber++;
printf("%c", byteCurrent);
}

This code has several problems. The first is that byteCurrent is an uninitialized pointer, and you pass &byteCurrent, which has type char **, to read(). So read() stores the byte inside the pointer variable itself rather than in a buffer, and byteCurrent[0] then dereferences a pointer whose value is garbage, which is undefined behavior. The printf("%c", byteCurrent) call has the same flaw: it passes a pointer where %c expects a character.
Another problem is that a file has no string terminator. If you read a zero byte (which is what '\0' is), it's because the file actually contains a zero byte, not because you have reached the end of anything. As a result, columnNumber counts the characters in the file rather than any column number.
The solution to the first problem is to use a plain char variable:
char byteCurrent;
I can't suggest a fix for the second problem, because I don't know what your columnNumber variable is supposed to count.
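As a rough sketch of the first fix, the loop could look like the function below. The name count_newlines is made up for this example, and it assumes the descriptor was already opened with open(); it only counts '\n' bytes, so a last line without a trailing newline is not counted.

```c
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes readable from an open descriptor.
 * Returns the count, or -1 if read() reports an error. */
long count_newlines(int fd)
{
    char byteCurrent;        /* a plain char, not a pointer */
    long lineNumber = 0;
    ssize_t n;

    /* &byteCurrent is now a char *, which is what read() needs */
    while ((n = read(fd, &byteCurrent, 1)) > 0) {
        if (byteCurrent == '\n')
            lineNumber++;
    }
    return n < 0 ? -1 : lineNumber;
}
```

With the question's variables this would be called as count_newlines(openFile). Reading one byte per read() call is slow on large files; a bigger buffer would be the usual next step.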

Related

Separating an input file into two different 2d Arrays

I'm currently working on an assignment that requires me to take an input file, separate it and store the contents into two different arrays.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char DataMem[32][3];
int RegMem[32][10];
char line[100][21]; //Holds the value for each line in the input file
int i = 0;
int j = 0;
while(fgets(line[i], 20, stdin) != NULL)
{
line[i++];
if(line[i] == " ")
DataMem[j] = line[i];
//printf("%s", line[3]);
}
return 0;
}
Suppose the input file looks something like:
95864312
68957425
-136985475
36547566
24957986

1
45
98
where the values before the blank line are stored in the array named line, and the lines following the blank line need to be put into the array DataMem.
Can anyone point me in the right direction as to how to do this? I can fill the array line correctly, but I am having a hard time stopping the fill at that point and then reading the rest of the file into the array DataMem.
Thank you
There are several issues. One (as Some programmer dude says) is that you do i++ before the test, so line[i] == " " looks at the next entry, which has not been filled in yet and is uninitialized memory.
You also have a problem in that you cannot compare strings using ==; you must use strcmp(). The issue is that a string in C is handled through a pointer to its first character, so
char *foo = "Hello";
char *bar = "Hello";
if (foo == bar)
printf("This will never print out\n");
is not going to do what you expect--you will never see the printout, as the way this will (likely) compile is that foo will be set to point to an address in memory (say 0x1000) that has 'H' in 0x1000, 'e' in 0x1001, etc, up to 'o' in 0x1004, and '\0' in 0x1005. The bar variable will point to some other address (say 0x2000) which will hold the string "Hello" also. So while both strings are the same, the if test is actually testing if (0x1000 == 0x2000), which will fail--you need to do if (!strcmp(foo, bar)) to actually test the contents.
[Note: this example is actually flawed in that most modern compilers will simply create one instance of the "Hello" string in read-only memory and point both variables at it, so the if test would actually work in this case. But you shouldn't rely on that and it definitely does not hold for the general case.]
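A small sketch of the contents-based comparison, with same_text as a name invented for this example:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when both strings hold the same characters,
 * no matter where in memory each one lives. */
bool same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;   /* strcmp returns 0 for equal contents */
}
```

Two separate char arrays that both hold "Hello" have different addresses, so == on them is false, while same_text() is true.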
Your test on line[i] looks wrong also, as I suspect you want to copy strings that start with a space, not strings that are exactly " ", so I suspect you actually want to test on line[i][0]. But I can't be sure without knowing your assignment.
Your declaration of DataMem is incorrect also--you have declared it as 32 3-char entries, but you are assigning line[i] to one of them. That does not compile, because arrays cannot be assigned with =, and 3 chars could not hold the line anyway. You either need to declare all instances big enough to hold the entire string you want (presumably the same 21 bytes that line can hold) and copy into it, or you need to declare it as an array of pointers (char *DataMem[32]). There is a key difference you need to understand: if you copy the string, then if you modify DataMem's view of the string, line's view of the string is unchanged. If you simply copy the pointer, then changing one string changes both (because they are both pointing at exactly the same memory). Obviously, copying the string is slower and takes more memory (well, except for very short strings).
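To make the copy-versus-pointer difference concrete, here is a minimal sketch. LINE_LEN and store_copy are names made up for this example, and the 21-byte width is assumed from the question's line array.

```c
#include <string.h>

#define LINE_LEN 21   /* assumed: same width as line[i] in the question */

/* Hypothetical helper: copy src into a fixed-size row.
 * Input longer than the row is truncated; dst is always terminated. */
void store_copy(char dst[LINE_LEN], const char *src)
{
    strncpy(dst, src, LINE_LEN - 1);
    dst[LINE_LEN - 1] = '\0';
}
```

After store_copy(copy, line), changing line[0] leaves copy untouched. A pointer set with char *alias = line; sees the change, because it refers to the same memory.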
The magic numbers are bad as well. Instead of 20 and 21, for example, I would do #define MAX_STRING_LEN 20 and use it in the code. (Good job remembering to declare the array big enough to hold the terminating NUL, by the way. However, fgets() is already aware of the need and reads at most one fewer character than the size you pass, so there is room for the NUL. You should be passing in 21, not 20.) Also, I would pass sizeof(line[i]) as the argument to fgets(), not MAX_STRING_LEN (and certainly not 20). That way, if the size of line[i] ever changes, the code will still be correct; if you pass in the same dimension that you used to declare the variable, someone might change it without realizing they need to change it here too.
Finally, you need to bounds-check inside your loop. What happens if the input is longer than the 100 entries you declared for line[]? Without a test you run the risk of writing beyond your variable boundary (which tends to lead to really hard-to-find bugs). A very useful macro is
#define NELEM(x) (sizeof(x) / sizeof(*(x)))
which you could use to do the test:
if (i >= NELEM(line)) {
printf("Data overflow\n");
exit(1);
}
so you don't need to embed the 100 (or the #define you replace it with) inside your code. (The #define works by taking the size of the entire data structure and dividing it by the size of the first element in it. And all the parentheses are actually required).
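Putting the last two points together (sizeof for fgets() and a bounds check), a reading loop might be sketched like this. row_t and read_rows are hypothetical names; the caller passes the row count, which could be NELEM(line) at the call site.

```c
#include <stdio.h>

#define MAX_STRING_LEN 20   /* longest line expected, excluding the terminator */

/* One row per input line; the +1 leaves room for the terminating NUL. */
typedef char row_t[MAX_STRING_LEN + 1];

/* Hypothetical helper: fill rows[] from fp, stopping at end of input
 * or when all max_rows rows are used. Returns the number of rows read. */
size_t read_rows(row_t rows[], size_t max_rows, FILE *fp)
{
    size_t n = 0;

    /* the bounds check comes first, so fgets() never sees a row past the end */
    while (n < max_rows && fgets(rows[n], sizeof rows[n], fp) != NULL)
        n++;
    return n;
}
```

Because the size comes from sizeof rows[n], changing MAX_STRING_LEN needs no other edit in the loop.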
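#include <stdio.h>
#define NELEM(x) (sizeof(x) / sizeof(*(x)))
int main(void)
{
    char DataMem[32][21]; //Wide enough for a whole input line (3 chars was too small)
    char line[100][21]; //Holds the value for each line in the input file
    size_t i;
    //First block: read until a blank line, end of input, or a full array
    for(i = 0; i < NELEM(line) && fgets(line[i], sizeof(line[i]), stdin) != NULL; ++i)
    {
        if('\r' == line[i][0] || '\n' == line[i][0]) {
            break;
        }
        printf("line[%zu] = %s", i, line[i]);
    }
    //Second block: everything after the blank line goes into DataMem
    for(i = 0; i < NELEM(DataMem) && fgets(DataMem[i], sizeof(DataMem[i]), stdin) != NULL; ++i)
    {
        printf("DataMem[%zu] = %s", i, DataMem[i]);
    }
    return 0;
}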

Understanding character pointers in a while loop

I am learning C and I came across this function in my study materials. The function accepts a string pointer and a character and counts how many times that character occurs in the string. For example, for the string "this is a string" and ch = 'i', the function would return 3 for the 3 occurrences of the letter i.
The part I found confusing is the while loop. I would have expected it to read something like while(buffer[j] != '\0'), where the program would cycle through each element j until it reads a null value. I don't get how the while loop works using *buffer as its condition, and how the program advances character by character using buffer++ until the null value is reached. I tried to use a debugger, but it doesn't work for some reason. Thanks in advance.
int charcount(char *buffer, char ch)
{
    int ccount = 0;
    while(*buffer != '\0')
    {
        if(*buffer == ch)
            ccount++;
        buffer++;
    }
    return ccount;
}
buffer is a pointer to a set of chars, a string, or a memory buffer holding char data.
*buffer will dereference the value at buffer, as a char. This can be compared with the null character.
When you add to buffer - you are adding to the address, not the value it points to, buffer++ adds 1 to the address, pointing to the next char. This means that now *buffer results in the next character.
In the loop you are incrementing the pointer buffer until it points to the null character, at which point you know you scanned the whole string. Instead of buffer[j], which is equivalent to *(buffer+j), we are incrementing the pointer itself.
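The same pointer-walking idea can be seen in a smaller sketch that measures a string. walk_length is a name made up for this example; it does what the standard strlen() does.

```c
#include <stddef.h>

/* Hypothetical helper: length of a NUL-terminated string,
 * found by advancing a pointer instead of an index. */
size_t walk_length(const char *buffer)
{
    const char *p = buffer;      /* start at the first character */

    while (*p != '\0')
        p++;                     /* move the address forward by one char */

    /* the distance between the two addresses is the character count */
    return (size_t)(p - buffer);
}
```

Here p plays the role that buffer plays in charcount(): the pointer itself moves, and *p always names the current character.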
When you say buffer++ you increment the address stored in buffer by one.
Once you internalize how pointers work, this code is cleaner than the code that uses a separate index to scan the character string.
In C and C++, the elements of an array are stored contiguously in memory, so an array can be described by the address of its first element and its length.
Therefore buffer holds the address of the first byte, and *buffer is synonymous with buffer[0]. Because of this, you can use buffer as an array, like this:
int charcount(char *buffer, char ch)
{
    int ccount = 0;
    int charno = 0;
    while(buffer[charno] != '\0')
    {
        if(buffer[charno] == ch)
            ccount++;
        charno++;
    }
    return ccount;
}
Note that this works because strings are null terminated. If the character array pointed to by buffer has no null terminator, the loop keeps reading past the end of the array (undefined behavior), because a bare pointer does not carry the array's length. This is why you see so many C functions to which you pass a pointer and a length: the pointer gives the [0] position of the array, and the size you specify tells the function how far to keep reading.
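A pointer-plus-length variant of the same function might look like the sketch below. charcount_n is a hypothetical name, and the caller is assumed to know how many bytes are valid.

```c
#include <stddef.h>

/* Hypothetical variant: count occurrences of ch in the first len bytes.
 * No terminator is needed because the caller supplies the length. */
int charcount_n(const char *buffer, size_t len, char ch)
{
    int ccount = 0;

    for (size_t i = 0; i < len; i++) {
        if (buffer[i] == ch)
            ccount++;
    }
    return ccount;
}
```

This version also works on raw data that contains no '\0' at all, for example charcount_n(raw, sizeof raw, 'i') on a plain char array.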
Hope this helps.

Program only works if dummy char array is declared [C]

The following code will print to the file correctly if char finalstr[2048]; is declared, however if I remove it (since it's not used anywhere) the program prints garbage ascii instead. This makes me believe it's something related to memory, however I have no clue.
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *fp;
FILE *fp2;
char str[2048];
char finalstr[2048];
fp = fopen("f_in.txt", "r");
fp2 = fopen("f_out.txt", "w");
while(fgets(str,2047,fp))//read line by line until end of file
{
int i;
for(i=0;i<=strlen(str);i++)//go trough the string cell by cell
{
if(str[i]>47 && str[i]<58 && str[i+1]>47 && str[i+1]<58)//from 0 to 9
{
char temp[2];//to hold temporary two digit string number
temp[0]=str[i];
i++;
temp[1]=str[i];
if(atoi(temp)<27)//if it's an upper case letter
fprintf(fp2,"%c",atoi(temp)+64);
else//if it's lowercase, skip the special characters between Z and a
fprintf(fp2,"%c",atoi(temp)+70);
}
else fprintf(fp2,"%c",str[i]);
}
}
fclose(fp);
fclose(fp2);
}
Input
20343545 3545 27 494140303144324738 343150 404739283144: ffabcd. 094540' 46 3546?
01404146343144 283127474635324738 404739283144 09 453131 3545 abcdefYXWVUTSRQP
2044474546 3931. 09 37404149 27 384146!
Output if finalstr[] is declared
This is a wonderful hex number: ffabcd. Isn' t it?
Another beautiful number I see is abcdefYXWVUTSRQP
Trust me. I know a lot!
Output if finalstr[] is not declared
?99? 9? 9 ?9999?9?9 99? 9?999?: ffabcd. ??9' ? 9??
((((.(( (((((.((. ((.((( ( ((( .( abcdefYXWVUTSRQP
øòòøò øò. ø òòòò ø òòò!
I did notice that the first if() statement could cause an overflow, however replacing <= with < had no effect on the end result.
I really wonder what the explanation behind this is, and whether it's C specific or if it would have happened in C++ too.
The main problem is with the temporary string you're using. It's not long enough to store a null terminating character, so you have an unterminated string.
Make the array 3 bytes long and add the terminator:
char temp[3];//to hold temporary two digit string number
temp[0]=str[i];
i++;
temp[1]=str[i];
temp[2]=0;
Also, you're looking too far off of the end of the array in your for loop. Use < instead of <=:
for(i=0;i<strlen(str);i++)//go trough the string cell by cell
Finally, make sure you #include <string.h> so that you have a proper declaration for strlen.
atoi(temp) causes undefined behaviour. The atoi function expects a pointer to null-terminated string as argument, however you provided a pointer to two characters without a terminator.
The atoi function will read off the end of your array. Your dummy array influences this because it changes what junk is present after the temp array.
BTW you could use (str[i] - '0') * 10 + (str[i+1] - '0') instead of atoi.
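A sketch of that arithmetic approach, wrapped in a function. decode_pair is a made-up name, and the range checks are an assumption added for this example (the original program does not reject 00 or values above 52). In ASCII, 'A' + value - 1 is the same as value + 64, and 'a' + value - 27 is the same as value + 70.

```c
#include <ctype.h>

/* Hypothetical helper: decode a two-digit code without atoi().
 * 01..26 -> 'A'..'Z', 27..52 -> 'a'..'z' (assumes an ASCII-like charset).
 * Returns 0 if the characters are not digits or the value is out of range. */
char decode_pair(char tens, char ones)
{
    if (!isdigit((unsigned char)tens) || !isdigit((unsigned char)ones))
        return 0;

    int value = (tens - '0') * 10 + (ones - '0');

    if (value >= 1 && value <= 26)
        return (char)('A' + value - 1);
    if (value >= 27 && value <= 52)
        return (char)('a' + value - 27);
    return 0;
}
```

Since no temporary string is built, there is no terminator to forget.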
To my understanding, the call
fgets(str,2047,fp)
is not the culprit: fgets stores at most 2046 characters and always appends the terminating zero, so strlen(str) is well defined, and the condition
i <= strlen(str)
only makes the loop visit the terminator as well. That is why changing <= to < made no difference; the garbage comes from the unterminated temp array.

How to compare a specific string with all elements of an array in C?

I'm reading in from a file that has a hex value on each line. It will look like this:
F0BA3240C
083FA52
45D3687AF
etc.
The hex values won't have the same length.
I have fgets reading from this file into a buffer and then a piece of code to get rid of the newline character. From there I put the string from the buffer into my data array. But before putting the string from the buffer into my data array, I'm attempting to compare the string from the buffer to the strings already stored in the data array and see if there is or isn't a match so I can update some counters. However, I'm having issues using strcmp and strncmp. Any help will be appreciated, thanks.
Relevant code:
char **data = NULL;
char data_buffer[100];
//program first goes through the file and determines amount of lines there are hence this variable
int count_line = 0;
...
data = malloc(count_line * sizeof(char *));
int f;
int i;
for(i=0; i<count_line; i++)
{
fgets(data_buffer, sizeof(data_buffer), fp);
...
//allocate space to store copy of line and add one for null terminator
data[i] = malloc(line_length + 1);
...
if(asdf != NULL)
{
//problem here. don't know how to compare stream from buffer and compare to all elements of data buffer
for(f=0; f<sizeof(data); f++)
{
if(strcmp(data[f], data_buffer) == 0)
there_was_a_match++;
}
}
...
//copy string from buffer into data array
strcpy(data[i], data_buffer);
}
Consider these lines:
for(f=0; f<sizeof(data); f++)
{
if(strcmp(data[f], data_buffer) == 0)
there_was_a_match++;
}
What is the value of sizeof(data)? Since data is of type char**, presumably
sizeof(data) is the size of a pointer in bytes, so some fixed integer value such as 4 or 8.
Now observe that the first time you encounter this loop within the "for i" loop,
i is 0 and data[0] is the only pointer in the array of pointers that has been allocated--
every other pointer in data is invalid.
So on the first iteration of the inner loop, f is 0, and we compare the string we just read
with data[0], whose memory has been allocated but not yet filled in (the strcpy comes after the loop),
so even this comparison reads uninitialized bytes.
On the next iteration, f is 1, and we try comparing our latest string to data[1],
but data[1] has not yet been initialized,
ergo we have undefined behavior (such as a crash).
You might be better off if the f loop were like this:
for(f=0; f<i; ++f)
{
if(strcmp(data[f], data_buffer) == 0)
there_was_a_match++;
}
This way, you will compare the newest string (which is still in data_buffer)
with only the strings that were already loaded.
There is one other thing that may be troublesome. Suppose your input consists of four
copies of the same string.
Then after you read the second copy and execute this loop, there_was_a_match will be 1;
after reading the third copy and executing that loop, there_was_a_match will be 3
(because it matches twice);
after reading the fourth copy and executing that loop, there_was_a_match will be 6.
I suspect these are not the results you want.
Perhaps you want to break out of the loop after finding the first match.
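One way to stop at the first match is to move the search into a small function, sketched below. already_stored is a hypothetical name; count is meant to be i, the number of strings stored so far.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if candidate equals any of the
 * first count strings in data[]. */
bool already_stored(char *const data[], size_t count, const char *candidate)
{
    for (size_t f = 0; f < count; f++) {
        if (strcmp(data[f], candidate) == 0)
            return true;     /* stop at the first match */
    }
    return false;
}
```

With this, the body of the "for i" loop could do if (already_stored(data, i, data_buffer)) there_was_a_match++; so each repeated line is counted once.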
I can't see problems with strcmp/strcpy, but:
for(f=0; f<sizeof(data); f++)
You can't use sizeof here, i.e. it won't result in the value count_line.
From the shown code, this could be enough to make it work.
Just compare to the previously allocated lines, of which there are i.
// for(f=0; f<sizeof(data); f++)
for(f=0; f<i; f++)

K&R Chapter 1 - Exercise 22 solution, what do you think?

I'm learning C from K&R as a first language, and I wanted to ask whether you think this exercise is solved the right way. I'm aware that it's probably not as complete as you'd like, but I wanted opinions so I'd know I'm learning C right.
Thanks
/* Exercise 1-22. Write a program to "fold" long input lines into two or
* more shorter lines, after the last non-blank character that occurs
* before the n-th column of input. Make sure your program does something
* intelligent with very long lines, and if there are no blanks or tabs
* before the specified column.
*
* ~svr
*
* [NOTE: Unfinished, but functional in a generic capacity]
* Todo:
* Handling of spaceless lines
* Handling of lines consisting entirely of whitespace
*/
#include <stdio.h>
#define FOLD 25
#define MAX 200
#define NEWLINE '\n'
#define BLANK ' '
#define DELIM 5
#define TAB '\t'
int
main(void)
{
int line = 0,
space = 0,
newls = 0,
i = 0,
c = 0,
j = 0;
char array[MAX] = {0};
while((c = getchar()) != EOF) {
++line;
if(c == NEWLINE)
++newls;
if((FOLD - line) < DELIM) {
if(c == BLANK) {
if(newls > 0) {
c = BLANK;
newls = 0;
}
else
c = NEWLINE;
line = 0;
}
}
array[i++] = c;
}
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
return 0;
}
I'm sure you're on the right track, but here are some pointers for readability:
comment your code
name the variables properly, or at least describe them if you won't
be consistent: you use braces on some ifs and not on others (IMHO, always use {} so it's more readable)
the if statement in the last for loop can be simpler, like
if(array[0] != NEWLINE)
{
printf("%c", array[line]);
}
That's no good IMHO.
First, it doesn't do what you were asked for. You were supposed to find the last blank after a nonblank before the output line boundary. Your program doesn't even remotely try to do that; it seems to aim for the first blank after (margin - 5) characters (where did the 5 come from? what if all the words had 9 letters?). However, it doesn't do that either, because of how you manipulate the newls variable. Also, this:
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
is probably wrong, because you check for a condition that never changes throughout the loop.
And, last but not least, storing the whole file in a fixed-size buffer is not good, for two reasons:
the buffer is bound to overflow on large files
even if it never overflowed, people still wouldn't like you for storing e.g. a gigabyte file in memory just to cut it into 25-character chunks
I think you should start again, rethink your algorithm (incl. corner cases), and only after that, start coding. I suggest you:
process the file line-by-line (meaning output lines)
store the line in a buffer big enough to hold the largest output line
search for the character you'll break at in the buffer
then print it (hint: you can terminate the string with '\0' and print with printf("%s", ...)), copy what you didn't print to the start of the buffer, proceed from that
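The steps above could be sketched roughly as follows. This is only one possible reading of them, not a reference solution: fold_stream and is_blank are names made up here, FOLD is taken from the question, a tab is counted as a single column, and the buffer is written with fwrite() rather than terminated and printed with printf().

```c
#include <stdio.h>
#include <string.h>

#define FOLD 25  /* assumed output width, taken from the question's FOLD */

static int is_blank(char ch)
{
    return ch == ' ' || ch == '\t';
}

/* Copy in to out, breaking any line longer than FOLD characters after the
 * last blank that fits, or at exactly FOLD characters if there is none.
 * Simplification: a tab is counted as a single column. */
void fold_stream(FILE *in, FILE *out)
{
    char buf[FOLD + 1];          /* one output line plus the char that overflowed */
    size_t len = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {         /* input line ended: flush what we have */
            fwrite(buf, 1, len, out);
            putc('\n', out);
            len = 0;
            continue;
        }
        buf[len++] = (char)c;
        if (len > FOLD) {
            size_t cut = len;    /* index just past the last blank */
            while (cut > 0 && !is_blank(buf[cut - 1]))
                cut--;
            if (cut == 0)
                cut = FOLD;      /* no blank at all: hard break */
            size_t end = cut;    /* drop trailing blanks from the output line */
            while (end > 0 && is_blank(buf[end - 1]))
                end--;
            fwrite(buf, 1, end, out);
            putc('\n', out);
            memmove(buf, buf + cut, len - cut);  /* keep the unprinted rest */
            len -= cut;
        }
    }
    fwrite(buf, 1, len, out);    /* final line without a trailing newline */
}
```

Only FOLD + 1 bytes are ever held in memory, so the size of the input no longer matters. Calling fold_stream(stdin, stdout) from main() gives a filter in the style of the exercise.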
An obvious problem is that you statically allocate 'array' and never check the index limits while accessing it. Buffer overflow waiting to happen. In fact, you never reset the i variable within the first loop, so I'm kinda confused about how the program is supposed to work. It seems that you're storing the complete input in memory before printing it word-wrapped?
So, suggestions: merge the two loops together and print the output for each line that you have completed. Then you can re-use the array for the next line.
Oh, and better variable names and some comments. I have no idea what 'DELIM' is supposed to do.
It looks (without testing) like it could work, but it seems kind of complicated.
Here's some pseudocode for my first thought
const int MAXLINE = ?? -- maximum line length parameter
int chrIdx = 0 -- index of the current character being considered
int cand = -1 -- "candidate index" in linebuf, set to a potential break character
char linebuf[bufsiz]
int lineIdx = 0 -- index into the output line
char buffer[bufsiz] -- a character buffer
read input into buffer
for ix = 0 to bufsiz - 1
do
    if buffer[ix] == ' ' then
        cand = lineIdx
    fi
    linebuf[lineIdx] = buffer[ix]
    lineIdx += 1
    if lineIdx >= MAXLINE then
        linebuf[cand] = '\0' -- end the string at the break character
        print linebuf
        do something to move remnants to front of line (memmove?)
        reset lineIdx to the remnant length and cand to -1
    fi
od
It's late and I just had a belt, so there may be flaws, but it shows the general idea — load a buffer, and copy the contents of the buffer to a line buffer, keeping track of the possible break points. When you get close to the end, use the breakpoint.
