C getting number of lines in a text file of words - c

So I'm trying to get the number of lines in a text file of words each on a new line. I have this method so far...
char * getS(char *fileName){
FILE *src;
if((src = fopen(fileName, "r")) == NULL){
printf("%s %s %s", "Cannot open file ", fileName, ". The program is now ending.");
exit(-1);
}
char *get = ".";
int c = 0;
char ch = 'x';
while(ch!=EOF) {
ch = fgetc(src);
if(ch == '\n') c++;
}
fseek(src, 0, SEEK_SET);
printf("%i",c);
int random = rand() % (c - 1);
return get;
}
For some reason, if I put a printf for ch in the middle of the while loop it gives me the correct number of lines; otherwise it prints 7801729.
Also how would I make a random int from 0 to the number of lines? The concept of using random in C is rather baffling to me right now.
Thanks in Advance!

fgetc() returns an int, but you are storing the returned value in a char. The assignment just converts the value and keeps what fits in a char; it does not spill over into c or any other neighbouring variable. What it does break is the comparison with EOF: if char is unsigned on your platform, ch can never equal EOF and the loop never ends, and if char is signed, a byte with the value 0xFF in the file is mistaken for EOF and the loop stops early. Try defining ch as an int and I bet the counting problem goes away.
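The question also asked how to get a random int between 0 and the number of lines. A minimal sketch of both steps, assuming the file is already open; the helper names count_lines and random_line_index are made up for illustration, and rand() should be seeded once at program start, for example with srand((unsigned) time(NULL)):

```c
#include <stdio.h>
#include <stdlib.h>

/* Count newline characters in an already opened stream, then rewind it.
 * ch is an int so that EOF stays distinguishable from every valid byte. */
long count_lines(FILE *src) {
    long lines = 0;
    int ch;
    while ((ch = fgetc(src)) != EOF) {
        if (ch == '\n')
            lines++;
    }
    rewind(src);
    return lines;
}

/* Pick an index in the range 0 .. lines - 1, or return -1 when there are
 * no lines. rand() % n is slightly biased, which rarely matters here. */
long random_line_index(long lines) {
    if (lines <= 0)
        return -1;
    return rand() % lines;
}
```

Checking for an empty file first avoids the division by zero that rand() % (c - 1) runs into when c is 1.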

Related

Dynamically allocate user inputted string

I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
char *str = NULL;
printf("> ");
while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
/*Add null byte to the end of str*/
/*Do stuff to input, including traversing until the null byte is reached*/
free(str);
str = NULL;
}
free(str);
str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not too sure if that would work though. I can't test if it would work because I'm encountering this error when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.
First of all, scanf format strings are not regular expressions. %[^\n] is a scanset, which does match everything up to a newline, but that is as far as the resemblance goes. As for the warning you get: according to my trusty manual, %a is a floating-point conversion in C99, and using a as an "allocate the buffer for me" flag is a GNU extension, which is why the compiler complains when it is configured for ISO C.
But then you have a bigger problem. Standard scanf expects you to pass it a previously allocated buffer to fill with the input. It does not malloc the string for you, so initializing str to NULL and freeing it afterwards will not work with scanf.
The simplest thing you can do is to give up on arbitrary-length strings. Create a large buffer and forbid inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check whether it managed to read the full line, check whether your string ends with a "\n".
char str[256+1];
for(;;){
    printf("> ");
    if(!fgets(str, sizeof str, stdin)){
        //error or end of file
        break;
    }
    size_t len = strlen(str);
    if(len > 0 && str[len-1] != '\n' && !feof(stdin)){
        //no newline was stored, so the line did not fit in the buffer
        exit(1);
    }
    printf("user typed %s", str);
}
Another alternative is you can use a nonstandard library function. For example, in Linux there is the getline function that reads a full line of input using malloc behind the scenes.
No error checking, don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString() {
int l = 256;
char *buf = malloc(l);
int p = 0;
int ch; /* int rather than char, so EOF can be told apart from data */
ch = getchar();
while(ch != '\n' && ch != EOF) {
buf[p++] = ch;
if (p == l) {
l += 256;
buf = realloc(buf, l);
}
ch = getchar();
}
buf[p] = '\0';
return buf;
}
int main(int argc, char *argv[]) {
printf("> ");
char *buf = readInfiniteString();
printf("%s\n", buf);
free(buf);
}
If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use it in a loop like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
int main(void)
{
char *line = NULL;
size_t nline = 0;
for (;;) {
ssize_t n; // getline returns ssize_t
printf("> ");
// read line, allocating as necessary
n = getline(&line, &nline, stdin);
if (n < 0) break;
// remove trailing newline
if (n && line[n - 1] == '\n') line[n - 1] = '\0';
// do stuff
printf("'%s'\n", line);
if (strcmp("quit", line) == 0) break;
}
free(line);
printf("\nBye\n");
return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.
Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice to say, if your program doesn't check the return value your program won't know when EOF or an error occurs; your program will probably be caught in an infinite loop.
When the stored string contains no '\n', the rest of the line has not been read yet. fgets always scans the input once internally to find the newline; when you add your own check for '\n' on top of that, you scan the same data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
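As a rough sketch of the "discard the remainder" option, assuming a fixed-size buffer supplied by the caller: the function name read_line_truncating and its return convention are invented for this example, and the discard step uses the fscanf(file, "%*[^\n]") call mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity cap) and strip the trailing newline.
 * Returns 1 for a complete line, 0 if the line was truncated (the rest of
 * it is discarded), and -1 on end-of-file or error with nothing read. */
int read_line_truncating(FILE *f, char *buf, size_t cap) {
    if (fgets(buf, (int)cap, f) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: peek at the next character to find out why. */
    int c = fgetc(f);
    if (c == EOF || c == '\n')
        return 1;   /* last line without newline, or an exact fit */

    /* The line was too long: skip everything up to and including '\n'. */
    fscanf(f, "%*[^\n]");
    fgetc(f);
    return 0;
}
```

A caller can warn the user whenever the function returns 0, since the text left in buf is then only a prefix of what was typed.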
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0, alloc_size = 0;
    char *bytes = NULL, *temp;
    do {
        alloc_size = alloc_size * 2 + 16; /* grow geometrically; a size of 1 would never make progress */
        temp = realloc(bytes, alloc_size);
        if (temp == NULL) {
            free(bytes);
            return NULL;
        }
        bytes = temp;
        bytes[bytes_read] = '\0'; /* keeps the buffer a valid string if fgets reads nothing */
        temp = fgets(bytes + bytes_read, (int)(alloc_size - bytes_read), f); /* Parsing data the first time */
        if (temp != NULL)
            bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
    } while (temp && bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0';
    return bytes;
}
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing data the second time by using fgetc, so using fgetc might seem most appropriate. Alas most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It may return typically one of 256 distinct values, between 0 and UCHAR_MAX (inclusive). It may otherwise return EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Trying to store those values into a char or unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255)
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0;
char *bytes = NULL;
do {
if ((bytes_read & (bytes_read + 1)) == 0) {
void *temp = realloc(bytes, bytes_read * 2 + 1);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
}
int c = fgetc(f);
bytes[bytes_read] = c >= 0 && c != '\n'
? c
: '\0';
} while (bytes[bytes_read++]);
return bytes;
}

Read from a .txt file and save it in an array.Trouble with fscanf

I want to read from a .txt file which contains English sentences and store them in a character array, character by character. I tried, but got "Segmentation fault: 11". I have trouble with fscanf and reading from a file in C.
#include<stdio.h>
#include<math.h>
#include<limits.h>
int main()
{
FILE* fp = fopen("file1.txt","r");
char c , A[INT_MAX];
int x;
while(1)
{
fscanf("fp,%c",&c);
if(c == EOF)
{break;}
A[x] = c;
x++;
}
int i;
for (i=0;i<x;i++)
printf("%c",A[i]);
return 0;
}
Problem 1: Putting the array onto the stack as A[INT_MAX] is bad practice; it allocates an unreasonable amount of space on the stack (and will crash on machines where INT_MAX is large relative to the size of memory). Get the file size, then malloc space for it.
fseek(fp, 0, SEEK_END); // offset 0 relative to the end of the file
long size = ftell(fp);
rewind(fp);
char *A = malloc((size_t) size); // assumes size_t and long are the same size
if (A == NULL) {
// handle error
}
Problem 2: The fscanf call is wrong. If you insist on using fscanf (which is not a good way to read an entire file; see problem 4), then
fscanf("fp,%c",&c);
should be
int count = fscanf(fp, "%c",&c);
if (count <= 0)
break;
Problem 3: Your x counter is not initialized. If you insist on using fscanf, you'd need to initialize it:
int x = 0;
Problem 4: The fscanf is the wrong way to read the entire file. Assuming you've figured out how large the file is (see problem 1), you should read the file with an fread, like this:
size_t bytes_read = fread(A, 1, (size_t) size, fp);
if (bytes_read < (size_t) size) {
// something went wrong
}
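Putting problems 1 and 4 together, one possible shape for a "read the whole file" helper is sketched below. The name read_whole_file and the out_len parameter are assumptions made for this example. It allocates one extra byte so the result can be used as a string, and it treats a short read as an error only when ferror reports one, because a text-mode stream may legitimately deliver fewer bytes than ftell reported.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire stream into a malloc'd, null-terminated buffer.
 * Stores the number of bytes read in *out_len when out_len is not NULL.
 * Returns NULL on failure; the caller frees the result. */
char *read_whole_file(FILE *fp, size_t *out_len) {
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0)
        return NULL;
    rewind(fp);

    char *data = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (data == NULL)
        return NULL;

    size_t n = fread(data, 1, (size_t)size, fp);
    if (n < (size_t)size && ferror(fp)) {
        free(data);
        return NULL;
    }
    data[n] = '\0';
    if (out_len != NULL)
        *out_len = n;
    return data;
}
```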
My initial answer, and a good general rule:
You need to check the return value of fscanf, because fscanf never stores EOF in c: EOF is an int value that is reported through the return value and cannot be reliably represented in a char. (You should always check return values, even when it seems like errors shouldn't happen, but I haven't consistently done that in the code above.)
From http://www.cplusplus.com/reference/cstdio/fscanf/ :
Return Value
On success, the function returns the number of items of the argument list successfully filled. This count can match the expected number of items or be less (even zero) due to a matching failure, a reading error, or the reach of the end-of-file.
If a reading error happens or the end-of-file is reached while reading, the proper indicator is set (feof or ferror). And, if either happens before any data could be successfully read, EOF is returned.
If an encoding error happens interpreting wide characters, the function sets errno to EILSEQ.
Hi, you should declare how far the program is allowed to read. If you read the file word by word as strings, you can still access every character.
Try it out:
#include<stdio.h>
#include<string.h>
#define MAX_WORDS 100
#define MAX_LEN 100
int main()
{
    FILE* fp = fopen("file1.txt","r");
    char A[MAX_WORDS][MAX_LEN];
    int j = 0;
    if (fp == NULL)
        return 1;
    /* %99s leaves room for the terminating null byte */
    while(j < MAX_WORDS && fscanf(fp,"%99s",A[j]) == 1)
    {
        j++;
    }
    fclose(fp);
    int q;
    for(q=0;q<j;q++)
    {
        size_t i;
        for (i=0;i<strlen(A[q]);i++)
            printf("%c ",A[q][i]);
        printf("\n");
    }
    return 0;
}

Array of strings being overwritten

I have a program that is trying to take a text file that consists of the following and feed it to my other program.
Bruce, Wayne
Bruce, Banner
Princess, Diana
Austin, Powers
This is my C code. It is trying to get the number of lines in the file, parse the comma-separated keys and values, and put them all in a list of strings. Lastly, it is trying to iterate through the list of strings and print them out. The output of this is just Austin Powers over and over again. I'm not sure if the problem is how I'm appending the strings to the list or how I'm reading them off.
#include<stdio.h>
#include <stdlib.h>
int main(){
char* fileName = "Example.txt";
FILE *fp = fopen(fileName, "r");
char line[512];
char * keyname = (char*)(malloc(sizeof(char)*80));
char * val = (char*)(malloc(sizeof(char)*80));
int i = 0;
int ch, lines;
while(!feof(fp)){
ch = fgetc(fp);
if(ch == '\n'){ //counts how many lines there are
lines++;
}
}
rewind(fp);
char* targets[lines*2];
while (fgets(line, sizeof(line), fp)){
strtok(line,"\n");
sscanf(line, "%[^','], %[^',']%s\n", keyname, val);
targets[i] = keyname;
targets[i+1] = val;
i+=2;
}
int q = 0;
while (q!=i){
printf("%s\n", targets[q]);
q++;
}
return 0;
}
The problem is with the two lines:
targets[i] = keyname;
targets[i+1] = val;
These do not make copies of the string - they only copy the address of whatever memory they point to. So, at the end of the while loop, each pair of target elements point to the same two blocks.
To make copies of the string, you'll either have to use strdup (if provided), or implement it yourself with strlen, malloc, and strcpy.
Also, as @mch mentioned, you never initialize lines, so while it may be zero, it may also be any garbage value (which can cause char* targets[lines*2]; to fail).
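For systems without strdup, a hand-rolled replacement could look roughly like this. The name copy_string is invented for the example, and memcpy is used in place of strcpy because the length is already known:

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of s, or NULL if malloc fails.
 * The caller owns the copy and must free it. */
char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;      /* include the terminating null byte */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

In the loop from the question this would be used as targets[i] = copy_string(keyname); and targets[i+1] = copy_string(val);, so each element keeps its own text when keyname and val are overwritten by the next line.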
First you open the file. Then, in the while loop, check for \n or EOF to end the loop. Inside the loop, when you read anything other than an alphanumeric character, split off the token and store it in the string array. Increment the count when you encounter \n or EOF. It is better to use do{}while(ch!=EOF);

Random exclamation marks in stream from malloc() but goes away if the line is removed? Am I corrupting heap?

I don't work with C often so please excuse any mistakes I might be making in terms of coding style :P I'm currently getting an error that I'm a bit stumped on: when I include the line tokenCopy = malloc(sizeof(fileSize));, I get a random exclamation mark about a quarter of the way through the output of the file to stdout, but if the line is removed or commented out, the data displays as expected:
MAY +1.32 D1 1002
JUNE -1.57 D3 201
JULY -2.37 D4 478
AUGUST +5.03 D2 930
SEPTEMBER -3.00 D1 370
OCTOBER +7.69 D1 112
and the actual output I get when the line is in place:
MAY +1.32 D1 1002
JUNE -1.57 D3 2!
and the relevant code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/**
* Machine struct - three column
**/
/**
* Parses the input file, size is the size of name
**/
char parseInputFile(char *name, int size) {
FILE *fp;
if ((fp = fopen(name,"r")) == NULL) {
printf("Cannot open file %s\n", name);
return 1;
}
else {
int fileSize;
fileSize = 0;
char *fileContent;
char *processedFileContent;
//get file size
fseek(fp, 0, SEEK_END);
fileSize = ftell(fp);
fseek(fp, 0, SEEK_SET);
//allocate
fileContent = malloc(sizeof(fileSize));
processedFileContent = malloc(sizeof(fileSize));
//read
char c;
int g;
g=0;
while((c = getc(fp)) != EOF) {
fileContent[g] = c;
g++;
}
//process
char delim[6] = "      ";
char *tokenCopy;
tokenCopy = malloc(sizeof(fileSize));
strcpy(tokenCopy, fileContent);
char *tokens = strtok(tokenCopy, delim);
while (tokens) {
tokens = strtok(NULL, delim);
}
puts(fileContent);
//printf("File Size: %i \n",fileSize);
//puts(tokenCopy);
return *processedFileContent;
}
}
int main(int argc, char *argv[])
{
//char *input;
if (argc == 1)
puts("You must enter a filename");
else {
int size = sizeof(argv[1]);
parseInputFile(argv[1],size);
}
return 0;
}
Could anyone offer any insight into what I'm doing wrong (or if my code is causing problems in itself)?
You put the size of the file in fileSize, but then you allocate only enough space to store an int, because that is what sizeof(fileSize) gives you.
These two lines
fileContent = malloc(sizeof(fileSize));
processedFileContent = malloc(sizeof(fileSize));
should be (assuming you will treat the text you'll read as a string):
fileContent = malloc(fileSize+1);
processedFileContent = malloc(fileSize+1);
and, after having read the file content, you should put a '\0' at the end.
That said, I really don't get what you are trying to achieve by using strtok(). If you only need to separate the components of each line, you can do it much more easily while you read the file, since you read it one character at a time.
If you elaborate a little bit more on what you're trying to achieve, we might have other advice.
UPDATE AFTER COMMENT BELOW
You should step back a second and reconsider your problem as I suspect you don't need to store any string at all. The first value is a month name, which can be stored as an integer, the second is a double (or float), the third seems 'Dx' with x varying from 1 to 4, again this could be an integer. It seems the name of a sensor, so I suspect it could be coded in an integer anyway as there will surely be a finite number of them. And the fourth is clearly another integer.
With a wild guess on what those fields mean, your struct would look like something like this:
struct val {
int month;
double value;
int sensor;
int var;
};
Now, you can get the values as you go one char at a time, or read an entire line and get the values from there.
Going one char at a time will not require any additional space but will result in a longer program (full of 'if' and 'while'). Reading the line will be slightly easier but will require you to handle the maximum size of a line.
A proper structuring of functions will help you a lot:
do {
if ((c = get_month(fp, &month)) != EOF)
if ((c = get_value(fp, &value)) != EOF)
if ((c = get_sensor(fp, &sensor)) != EOF)
if ((c = get_var(fp, &var)) != EOF)
measures = add_data(measures, month, value, sensor, var);
} while (c != EOF);
return measures;
Where measures can be a linked list or a resizable array of your structs and assuming you'll go one char at the time.
There are quite a few other details you should settle before you're done, but I hope this helps you find the right direction.
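For the "read an entire line" variant, a sketch using sscanf might look like the following. It relies on the same wild guess about the fields as above; struct measurement, month_number and parse_measurement are names made up for this example, and the sensor is assumed to always be written as D followed by a number.

```c
#include <stdio.h>
#include <string.h>

struct measurement {
    int month;      /* 1..12 */
    double value;
    int sensor;     /* the x in "Dx" */
    int var;
};

/* Map an upper-case month name to 1..12, or 0 if it is not recognized. */
static int month_number(const char *name) {
    static const char *const months[12] = {
        "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
        "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"
    };
    for (int i = 0; i < 12; i++) {
        if (strcmp(name, months[i]) == 0)
            return i + 1;
    }
    return 0;
}

/* Parse a line such as "MAY +1.32 D1 1002" into *out.
 * Returns 1 on success and 0 if the line does not match. */
int parse_measurement(const char *line, struct measurement *out) {
    char month[16];
    struct measurement m;
    if (sscanf(line, "%15s %lf D%d %d",
               month, &m.value, &m.sensor, &m.var) != 4)
        return 0;
    m.month = month_number(month);
    if (m.month == 0)
        return 0;
    *out = m;
    return 1;
}
```

Each line obtained with fgets can be passed to parse_measurement, and the filled struct appended to whatever container holds the measures.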
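Your delim string is not null-terminated: the array has room for exactly six characters and the initializer fills all six with spaces, so there is no space left for the terminating '\0'.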
Also, the delimiters are matched character-by-character, so there is no use repeating the space six times, nor will your call to strtok match a run of six spaces. Perhaps you need something like strpbrk.

How do I read hex numbers into an unsigned int in C

I'm wanting to read hex numbers from a text file into an unsigned integer so that I can execute Machine instructions. It's just a simulation type thing that looks inside the text file and according to the values and its corresponding instruction outputs the new values in the registers.
For example, the instructions would be:
1RXY -> Save register R with value in
memory address XY
2RXY -> Save register R with value XY
BRXY -> Jump to register R if xy is
this and that etc..
ARXY -> AND register R with value at
memory address XY
The text file contains something like this, each on a new line (in hexadecimal):
120F
B007
290B
My problem is copying each individual instruction into an unsigned integer...how do I do this?
#include <stdio.h>
int main(){
FILE *f;
unsigned int num[80];
f=fopen("values.txt","r");
if (f==NULL){
printf("file doesnt exist?!");
}
int i=0;
while (fscanf(f,"%x",num[i]) != EOF){
fscanf(f,"%x",num[i]);
i++;
}
fclose(f);
printf("%x",num[0]);
}
You're on the right track. Here are the problems I saw:
You need to exit if fopen() returns NULL - you're printing an error message but then continuing.
Your loop should terminate if i >= 80, so you don't read more integers than you have space for.
You need to pass the address of num[i], not the value, to fscanf.
You're calling fscanf() twice in the loop, which means you're throwing away half of your values without storing them.
Here's what it looks like with those issues fixed:
#include <stdio.h>
int main() {
FILE *f;
unsigned int num[80];
int i=0;
int rv;
int num_values;
f=fopen("values.txt","r");
if (f==NULL){
printf("file doesnt exist?!\n");
return 1;
}
while (i < 80) {
rv = fscanf(f, "%x", &num[i]);
if (rv != 1)
break;
i++;
}
fclose(f);
num_values = i;
if (i >= 80)
{
printf("Warning: Stopped reading input due to input too long.\n");
}
else if (rv != EOF)
{
printf("Warning: Stopped reading input due to bad value.\n");
}
else
{
printf("Reached end of input.\n");
}
printf("Successfully read %d values:\n", num_values);
for (i = 0; i < num_values; i++)
{
printf("\t%x\n", num[i]);
}
return 0;
}
You can also use the function strtol(). If you use a base of 16 it will convert your hex string value to an int/long.
errno = 0;
my_int = strtol(my_str, NULL, 16);
/* check errno */
Edit: One other note: various static analysis tools may flag things like atoi() and scanf() as unsafe. atoi() is discouraged because it cannot report errors the way strtol() does. scanf(), on the other hand, can overwrite memory because it cannot check that the pointer you pass matches the conversion specifier. For instance, you could pass a pointer to a short where the specifier makes scanf store a long... and boom.
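Since the target here is an unsigned int, strtoul may be a closer fit than strtol. A sketch of the error checking hinted at above, with parse_hex as a hypothetical helper name; it accepts one trailing newline so that lines read with fgets can be passed in directly:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a hexadecimal string into *out.
 * Returns 1 on success and 0 on failure (no digits, overflow, or
 * unexpected characters after the number). */
int parse_hex(const char *s, unsigned int *out) {
    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 16);

    if (end == s)
        return 0;                       /* no hex digits at all */
    if (errno == ERANGE || v > UINT_MAX)
        return 0;                       /* does not fit */
    if (*end != '\0' && *end != '\n')
        return 0;                       /* trailing junk such as "12G4" */

    *out = (unsigned int)v;
    return 1;
}
```

Note that strtoul also accepts leading whitespace, a sign and an optional 0x prefix; reject those beforehand if the input format should be stricter.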
You're reading two numbers into each element of your array, so you lose half of them as you overwrite them. Try using just
while (i < 80 && fscanf(f,"%x",&num[i]) == 1)
i++;
for your loop.
edit
you're also missing the '&' to get the address of the array element, so you're passing a random garbage pointer to scanf and probably crashing. The -Wall option is your friend.
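In this case, all of your input is upper-case hex, but that is not the problem: for fscanf, %x and %X are equivalent and both accept upper- and lower-case digits.
The case of the specifier only matters for printf output, so there is no need to change %x to %X.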
Do you want each of the lines (each 4 characters long) separated in 4 different array elements? If you do, I'd try this:
/* read the line */
/* fgets(buf, ...) */
/* check it's correct, mind the '\n' */
/* ... strlen ... isxdigit ... */
/* put each separate input digit in a separate array member */
num[i++] = convert_xdigit_to_int(buf[j++]);
Where the function convert_xdigit_to_int() simply converts '0' (the character) to 0 (an int), '1' to 1, '2' to 2, ... '9' to 9, 'a' or 'A' to 10, ...
Of course that pseudo-code is inside a loop that executes until the file runs out or the array gets filled. Maybe putting the fgets() as the condition for a while(...)
while(/*there is space in the array && */ fgets(...)) {
}
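One way convert_xdigit_to_int() could be written is sketched below. Returning -1 for characters that are not hex digits is a choice made for this example, not something the description above prescribes:

```c
#include <ctype.h>

/* Convert one hexadecimal digit character to its value (0..15).
 * Returns -1 if ch is not a hex digit. */
int convert_xdigit_to_int(int ch) {
    if (ch >= '0' && ch <= '9')
        return ch - '0';            /* decimal digits are contiguous in C */

    /* Letters are not guaranteed to be contiguous, so list them. */
    switch (tolower((unsigned char)ch)) {
    case 'a': return 10;
    case 'b': return 11;
    case 'c': return 12;
    case 'd': return 13;
    case 'e': return 14;
    case 'f': return 15;
    default:  return -1;
    }
}
```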

Resources