Using fread() to parse data from a file and stdin in C

I am trying to write an algorithm that takes input from a file and builds what is called an "s1 record". (The functionality of this function is not important) Depending on the command line arguments, the program will set the inputFile to the specified file, or stdin if no file is provided.
The algorithm needs to be structured in a way that can handle both file patterns.
The idea of this is to take FILE* data and read it into a buffer of size 16 bytes. Every 16 bytes of data, an s1 record will be built. As long as there are 16 bytes to read, it works fine and dandy. Once there is a line with fewer than 16 bytes, it doesn't create an s1 record.
I've tested the output and these are some of the things I noticed:
When I run the program using stdin, I am prompted for user input. I enter 20 characters (which should put 16 in one S-record and 4 in another) and my output is as follows:
12345678901234567890
Buffer: 1234567890123456
S113000031323334353637383930313233343536AA
When I run the program using a file (record.dat) with one single line with the characters of the alphabet on it, I get this:
Buffer: ABCDEFGHIJKLMNOP
Buffer: QRSTUVWXYZKLMNOP
This is not valid either, as it prints the "KLMNOP" at the end of the line as well.
My question is: How can I structure this to accept the input from either a file or stdin using the same algorithm, and what exactly am I doing wrong in my algorithm? I have tried providing all the useful information I can, and can specify more detail if requested. Below is the code for the algorithm I am trying to write.
// inputFile is set to stdin if no file is specified
char buffer[kMaxLineSize + 1] = { '\0' };
char byte = 0;
int count = 1;
while((fread(buffer, 1, kMaxLineSize, inputFile)))
{
printf("%c", byte);
clearCRLF(buffer);
printf("Buffer: %s\n", buffer);
if(outputFormat == 1)
{
char s1Record[kMaxSRecordSize] = { 0 };
buildS1Record(addressField, s1Record, buffer);
fprintf(outputFile, "%s\n", s1Record);
addressField += strlen(buffer);
s1Count++;
}
else
{
char asmRecord[kMaxASMRecordSize] = { 0 };
buildAssemblyRecord(asmRecord, buffer);
fprintf(outputFile, "%s\n", asmRecord);
}
}

I'll try to combine the comments in this answer.
But first, what you call an s1 "record" is not a record. It is a string of maximum 16 characters and a terminating null character. A record, in my understanding, is a struct with fields of possibly different types, one of which could be a string.
The code fixes are as follows:
char buffer[kMaxLineSize + 1] = { '\0' };
size_t len;   // fread() returns a size_t, not an int
while ((len = fread(buffer, 1, kMaxLineSize, inputFile)) > 0)
{
...
char s1Record[kMaxSRecordSize] = { 0 };
buildS1Record(addressField, s1Record, buffer, len);
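So you pass the length that was read to the function. Now it can copy exactly the characters read and terminate them with a '\0' character. Note also that kMaxLineSize and kMaxSRecordSize must be kept consistent: kMaxSRecordSize has to be large enough for whatever buildS1Record() produces from kMaxLineSize data bytes (plus 1 for '\0'), so it is better to derive one constant from the other than to maintain two independent values.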
I hope this late answer can still be of use to you.
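Another way to get the same effect without changing buildS1Record() is to terminate the buffer right after the bytes that fread() returned. fread() never clears the buffer, so after a short final read the tail of the previous chunk is still there (the "KLMNOP" in the question). The sketch below assumes the hypothetical helper names open_input() and read_chunk() and leaves the S-record building out. The same loop serves a file and stdin because both are just a FILE *; with stdin, fread() keeps waiting until it has all 16 bytes or reaches end-of-file, so the last, shorter chunk only shows up after you signal EOF (Ctrl+D on Unix-like systems, Ctrl+Z then Enter on Windows).

```c
#include <stdio.h>

/* Hypothetical helper: returns stdin when no path is given, otherwise opens
   the named file in binary mode. Returns NULL if fopen() fails. */
FILE *open_input(const char *path)
{
    return path == NULL ? stdin : fopen(path, "rb");
}

/* Hypothetical helper: reads up to max bytes from in into buf, which must
   have room for max + 1 bytes, and null-terminates it right after the bytes
   actually read, so nothing from a previous, longer chunk survives.
   Returns the number of bytes read; 0 means end-of-file or a read error. */
size_t read_chunk(FILE *in, char *buf, size_t max)
{
    size_t len = fread(buf, 1, max, in);
    buf[len] = '\0';
    return len;
}

/* Typical use in the question's loop (kMaxLineSize as in the question):
 *
 *     FILE *inputFile = open_input(argc > 1 ? argv[1] : NULL);
 *     char buffer[kMaxLineSize + 1];
 *     size_t len;
 *     while ((len = read_chunk(inputFile, buffer, kMaxLineSize)) > 0) {
 *         ... build one record from the len bytes in buffer ...
 *     }
 */
```

Binary mode is a deliberate choice here: it leaves any "\r\n" in the data for clearCRLF() to handle instead of depending on the platform's text-mode translation.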

Related

C write and read single character

I'm currently working with processes and encountered a problem while reading and writing a char to a file.
The idea is that we have a couple of processes which should read an integer from a file, increment it and write it back. Here is my attempt (I won't include error checking):
...
char n;
char buff[5];
int number;
...
read(my_desc, &n, 1);
number = (int)n;
number++;
sprintf(buff, "%4d", number);
write(my_desc, buff, sizeof(buff));
...
The file is just plain
0
But the output seems to be not correct (almost always garbage).
I have already read the write and read manuals, but I'm clueless. I've checked some topics on the read and write functions here on Stack Overflow, but most of them either don't work for me or I struggle with the implementation.
Thanks in advance.
It appears that you are reading a single character, taking the ASCII code of that character and converting that number to a 4-character string, and then writing those 4 characters and the terminating null character back to the file.
According to the information that you provided in the comments section, this is not intended. If I understand you correctly, you rather want to
1. read the entire file as a string,
2. convert that string to a number,
3. increment that number,
4. convert that number back to a string and
5. overwrite the entire file with that string.
Step #1 can be accomplished with the function read. However, you should read in the whole file instead of only a single character.
Step #2 can be accomplished by using the function strtol.
Step #3 is trivial.
Step #4 can be accomplished using the function snprintf.
Step #5 can be accomplished by rewinding the file using the function lseek, and then using the function write.
I am assuming that the number represented in the file is in the representable range of a long int, which is -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 on most 64-bit POSIX platforms. This means that the string can be up to 20 characters long (19 digits plus a minus sign), 21 including the terminating null character. That is why I am using a buffer size of 21.
char buffer[21], *p;
ssize_t bytes_read;
long num;
bytes_read = read( my_desc, buffer, (sizeof buffer) - 1 );
if ( bytes_read <= 0 )
{
//TODO: handle input error
}
//add null terminating character to string
buffer[bytes_read] = '\0';
//attempt to convert the string to a number
num = strtol( buffer, &p, 10 );
//check for conversion error
if ( p == buffer )
{
//TODO: handle conversion error
}
//increment the number
num++;
//write incremented number to buffer
snprintf( buffer, sizeof buffer, "%ld", num );
//rewind file
lseek( my_desc, 0, SEEK_SET );
//write buffer to file
write( my_desc, buffer, strlen(buffer) );
Note that I have not tested this code.
Also note that this program assumes that the input file does not contain any leading zeros. If the file contains the text string "003", then this program will overwrite the first character with a 4, but leave the remaining characters in the file intact. If this is an issue, then you will have to add a call to ftruncate to truncate the file.
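Putting the five steps and the ftruncate() call together, a sketch might look like the following. It assumes a POSIX system and a descriptor opened for both reading and writing (for example with O_RDWR); increment_counter_fd() is a hypothetical name. It also refuses to increment LONG_MAX instead of overflowing. Note that it does no locking: if several processes update the same file at the same time, as in the question, you still need to serialize them, for example with fcntl() record locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads a decimal number from the start of fd, adds
   one and rewrites the whole file with the new value (no trailing newline).
   Returns 0 on success, -1 on any failure. */
int increment_counter_fd(int fd)
{
    char buffer[32];   /* comfortably larger than any long in decimal */
    char *end;

    /* Step 1: read the file contents as a string. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    ssize_t bytes_read = read(fd, buffer, sizeof buffer - 1);
    if (bytes_read <= 0)
        return -1;
    buffer[bytes_read] = '\0';

    /* Step 2: convert it, rejecting "no digits" and out-of-range input. */
    errno = 0;
    long num = strtol(buffer, &end, 10);
    if (end == buffer || errno == ERANGE || num == LONG_MAX)
        return -1;

    /* Steps 3 and 4: increment and convert back to text. */
    num++;
    int len = snprintf(buffer, sizeof buffer, "%ld", num);
    if (len < 0 || (size_t)len >= sizeof buffer)
        return -1;

    /* Step 5: rewind, overwrite, then cut off any leftover characters
       (so "003" becomes "4" rather than "403"). */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, buffer, (size_t)len) != (ssize_t)len)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1)
        return -1;
    return 0;
}
```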

fscanf() how to go in the next line?

So I have a wall of text in a file, and I need to recognize the words that are between $ signs, replace them with numbers, and then print the modified text to another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array. Also, I don't know how to replace $word$ with a number.
Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the condition will always evaluate to false, regardless of any other consideration; fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. In that comparison it evaluates to a pointer to the first element of the array, and that pointer is never NULL. (To test whether the string stored there is empty, you would write matr[i][0] == '\0'.) It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking @Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then, for each line you read, parse out the "$word$" tokens by whatever means you like and output the needed results (everything except the $-delimited tokens verbatim, plus the bracketed substitution number for each token). Of course, you'll need to record the substitution tokens for later output. Remember to make copies of them, as the buffer will be overwritten on each read (if done as I suggest above).
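One way to do the per-line parsing is sketched below. substitute_line() and print_legend() are hypothetical names, MAX_WORDS is an assumed limit, and every occurrence gets a new number (repeated words are not merged). It assumes lines come from an fgets() loop like the one above, so a $word$ never spans two lines.

```c
#include <stdio.h>
#include <string.h>

#define N 80           /* maximum line length, as in the question */
#define MAX_WORDS 100  /* assumed limit on the number of $word$ tokens */

/* Writes one line to out, replacing each $word$ with [k], where k is the
   1-based position of that word in words[]. Each word is copied into
   words[], because the line buffer is overwritten by the next fgets().
   A '$' without a closing '$' on the same line is copied unchanged. */
void substitute_line(const char *line, FILE *out,
                     char words[][N + 1], int *count)
{
    const char *p = line;

    while (*p != '\0') {
        const char *close = NULL;

        if (*p == '$')
            close = strchr(p + 1, '$');
        if (close == NULL) {
            fputc(*p, out);            /* ordinary character: copy as-is */
            p++;
            continue;
        }
        if (*count < MAX_WORDS) {
            size_t len = (size_t)(close - (p + 1));
            if (len > N)
                len = N;               /* never overflow words[k] */
            memcpy(words[*count], p + 1, len);
            words[*count][len] = '\0';
            ++*count;
            fprintf(out, "[%d]", *count);
        } else {
            /* table full: leave the token untouched */
            fwrite(p, 1, (size_t)(close - p + 1), out);
        }
        p = close + 1;                 /* continue after the closing '$' */
    }
}

/* Prints the legend, one line per word, e.g. "[1] --> like". */
void print_legend(FILE *out, char words[][N + 1], int count)
{
    for (int k = 0; k < count; k++)
        fprintf(out, "[%d] --> %s\n", k + 1, words[k]);
}
```

In main(), you would call substitute_line() on every line inside the fgets() loop, writing to the output file, and call print_legend() once after the loop.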
fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
Code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space including '\n'. Read a line of text with fgets().
Simply read one line at a time, then march down the buffer looking for words.
The following uses "%n" to track how far into the buffer the scanning stopped.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}
Use fread() to read the file contents into a char[] buffer. Then iterate through this buffer, and whenever you find a $, perform a strncmp to detect which value to replace it with (keep in mind that there is a second $ at the end of the word). To replace $word$ with a number, you need to either shrink or extend the buffer at the position of the word, depending on the length of the number in ASCII; memmove is normally the right tool for that. Then you can write the number into the gap that arose from extending the buffer (just overwrite the $word$ as well).
Then write the buffer to the file, overwriting all its previous contents.
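The in-place approach needs a helper that opens or closes a gap in the buffer. A sketch, assuming the whole file is already in buf as a null-terminated string and cap is the total size of buf (replace_span() is a hypothetical name; growing the buffer with realloc() when it is too small is left out):

```c
#include <string.h>

/* Replaces old_len bytes at position pos in the null-terminated string buf
   (total capacity cap bytes) with the string repl, shifting the rest of the
   string left or right with memmove(). Returns 0 on success, or -1 if the
   span is out of range or the result would not fit. */
int replace_span(char *buf, size_t cap, size_t pos, size_t old_len,
                 const char *repl)
{
    size_t cur_len = strlen(buf);
    size_t new_len = strlen(repl);

    if (pos > cur_len || old_len > cur_len - pos)
        return -1;
    if (cur_len - old_len + new_len + 1 > cap)
        return -1;
    /* Shrink or extend: move the tail, including its '\0', into place.
       memmove() is required because source and destination overlap. */
    memmove(buf + pos + new_len, buf + pos + old_len,
            cur_len - pos - old_len + 1);
    /* Fill the gap with the replacement text. */
    memcpy(buf + pos, repl, new_len);
    return 0;
}
```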

Reading strings with spaces from a file

I'm working on a project and I just encountered a really annoying problem. I have a file which stores all the messages that my account received. A message is a data structure defined this way:
typedef struct _message{
char dest[16];
char text[512];
}message;
dest is a string that cannot contain spaces, unlike the other fields.
Strings are acquired using the fgets() function, so dest and text can have "dynamic" length (from 1 character up to length-1 legit characters). Note that I manually remove the newline character after every string is retrieved from stdin.
The "inbox" file uses the following syntax to store messages:
dest
text
So, for example, if I have a message from Marco which says "Hello, how are you?" and another message from Tarma which says "Are you going to the gym today?", my inbox-file would look like this:
Marco
Hello, how are you?
Tarma
Are you going to the gym today?
I would like to read the username from the file and store it in string s1, then do the same thing for the message and store it in string s2 (and then repeat the operation until EOF), but since the text field admits spaces I can't really use fscanf().
I tried using fgets(), but as I said before, the size of every string is dynamic. For example, if I use fgets(username, 16, my_file) it would end up reading unwanted characters. I just need to read the first string until \n is reached and then read the second string until the next \n is reached, this time including spaces.
Any idea on how can I solve this problem?
#include <stdio.h>
int main(void){
    char username[16];
    char text[512];
    int ch;
    size_t i;
    FILE *my_file = fopen("inbox.txt", "r");
    if (my_file == NULL)
        return 1;
    // %*c discards the newline that follows the name
    while(1==fscanf(my_file, "%15s%*c", username)){
        i=0;
        // the text runs until an empty line, so messages are expected
        // to be separated by a blank line
        while (i < sizeof(text)-1 && EOF!=(ch=fgetc(my_file))){
            if(ch == '\n' && i && text[i-1] == '\n')
                break;
            text[i++] = ch;
        }
        text[i] = 0;
        printf("user:%s\n", username);
        printf("text:\n%s\n", text);
    }
    fclose(my_file);
    return 0;
}
As the length of each string is dynamic, if I were you I would read the file once first to find each string's size, and then create a dynamic array of those lengths.
Suppose your file is:
A long time ago
in a galaxy far,
far away....
So the first line length is 15, the second line length is 16 and the third line length is 12.
Then create a dynamic array for storing these values.
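Then, while reading the strings, pass the corresponding element of the array plus 2 (room for the newline and the terminating '\0') as the 2nd argument to fgets, as in fgets(string, arrStringLength[i++] + 2, f);. Each string buffer must of course be at least that big.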
But in this way you'll have to read your file twice, of course.
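A sketch of that first pass, assuming the file is seekable (so it will not work on a pipe or on stdin) and using the hypothetical helper name line_lengths(). It counts characters per line with fgetc(), then rewinds so the caller can read each line with a buffer of length + 2 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* First pass: returns a malloc'ed array with the length of each line
   (not counting the '\n'), stores the number of lines in *count and
   rewinds f for the second pass. Returns NULL if memory runs out. */
size_t *line_lengths(FILE *f, size_t *count)
{
    size_t cap = 16, n = 0, len = 0;
    size_t *lens = malloc(cap * sizeof *lens);

    if (lens == NULL)
        return NULL;
    for (;;) {
        int c = fgetc(f);
        if (c != '\n' && c != EOF) {
            len++;
            continue;
        }
        if (c == EOF && len == 0)
            break;                       /* nothing left to record */
        if (n == cap) {                  /* grow the array by doubling */
            size_t *tmp = realloc(lens, 2 * cap * sizeof *lens);
            if (tmp == NULL) {
                free(lens);
                return NULL;
            }
            lens = tmp;
            cap *= 2;
        }
        lens[n++] = len;                 /* end of one line */
        len = 0;
        if (c == EOF)
            break;                       /* last line had no '\n' */
    }
    *count = n;
    rewind(f);
    return lens;
}
```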
You can use fgets() easily enough as long as you're careful. This code seems to work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAX_MESSAGES = 20 };
typedef struct Message
{
char dest[16];
char text[512];
} Message;
static int read_message(FILE *fp, Message *msg)
{
char line[sizeof(msg->text) + 1];
msg->dest[0] = '\0';
msg->text[0] = '\0';
while (fgets(line, sizeof(line), fp) != 0)
{
//printf("Data: %zu <<%s>>\n", strlen(line), line);
if (line[0] == '\n')
continue;
size_t len = strlen(line);
if (len > 0 && line[len - 1] == '\n') // the last line may lack a newline
line[--len] = '\0';
if (msg->dest[0] == '\0')
{
if (len < sizeof(msg->dest))
{
memmove(msg->dest, line, len + 1);
//printf("Name: <<%s>>\n", msg->dest);
}
else
{
fprintf(stderr, "Error: name (%s) too long (%zu vs %zu)\n",
line, len, sizeof(msg->dest)-1);
exit(EXIT_FAILURE);
}
}
else
{
if (len < sizeof(msg->text))
{
memmove(msg->text, line, len + 1);
//printf("Text: <<%s>>\n", msg->text);
return 0;
}
else
{
fprintf(stderr, "Error: text for %s too long (%zu vs %zu)\n",
msg->dest, len, sizeof(msg->text)-1);
exit(EXIT_FAILURE);
}
}
}
return EOF;
}
int main(void)
{
Message mbox[MAX_MESSAGES];
int n_msgs;
for (n_msgs = 0; n_msgs < MAX_MESSAGES; n_msgs++)
{
if (read_message(stdin, &mbox[n_msgs]) == EOF)
break;
}
printf("Inbox (%d messages):\n\n", n_msgs);
for (int i = 0; i < n_msgs; i++)
printf("%d: %s\n %s\n\n", i + 1, mbox[i].dest, mbox[i].text);
return 0;
}
The reading code will handle (multiple) empty lines before the first name, between a name and the text, and after the last name. It is slightly unusual in the way it decides whether to store the line just read in the dest or text part of the message. It uses memmove() because it knows exactly how much data to move, and the data is null terminated. You could replace it with strcpy() if you prefer, but that should be slower (though probably not measurably slower) because strcpy() has to test each byte as it copies, but memmove() does not. I use memmove() because it is always correct; memcpy() could be used here, but it only works when you guarantee no overlap. Better safe than sorry; there are plenty of software bugs without risking extras. You can decide whether the error exit is appropriate; it is fine for test code, but not necessarily a good idea in production code. You can decide how to handle '0 messages' vs '1 message' vs '2 messages' etc.
You can easily revise the code to use dynamic memory allocation for the array of messages. It would be easy to read the message into a simple Message variable in main(), and arrange to copy into the dynamic array when you get a complete message. The alternative is to 'risk' over-allocating the array, though that is unlikely to be a major problem (you would not grow the array one entry at a time anyway to avoid quadratic behaviour when the memory has to be moved during each allocation).
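A sketch of the dynamic version, assuming each complete Message is first read into a local variable (for example with read_message() above) and then appended. MessageList, push_message() and free_messages() are hypothetical names, and the Message type is repeated only so that the block compiles on its own. The capacity doubles whenever the array is full, which keeps the total copying cost linear rather than quadratic.

```c
#include <stdlib.h>

typedef struct Message
{
    char dest[16];
    char text[512];
} Message;

typedef struct MessageList
{
    Message *items;
    size_t   count;
    size_t   capacity;
} MessageList;

/* Appends a copy of *msg, doubling the capacity when the array is full.
   Returns 0 on success, -1 if realloc() fails (the list is left intact). */
int push_message(MessageList *list, const Message *msg)
{
    if (list->count == list->capacity)
    {
        size_t new_cap = list->capacity ? 2 * list->capacity : 8;
        Message *tmp = realloc(list->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->capacity = new_cap;
    }
    list->items[list->count++] = *msg;   /* struct assignment copies both arrays */
    return 0;
}

/* Releases the array and resets the list to empty. */
void free_messages(MessageList *list)
{
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```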
If there were multiple fields to be processed for each message (say, date received and date read too), then you'd need to reorganize the code some more, probably with another function.
Note that the code avoids the reserved namespace. A name such as _message is reserved for 'the implementation'. Code such as this is not part of the implementation (of the C compiler and its support system), so you should not create names that start with an underscore. (That over-simplifies the constraint, but only slightly, and is a lot easier to understand than the more nuanced version.)
The code is careful not to write any magic number more than once.
Sample output:
Inbox (2 messages):
1: Marco
How are you?
2: Tarma
Are you going to the gym today?

How would I compare a string (entered by the user) to the first word of a line in a file?

I am really struggling to understand how character arrays work in C. This seems like something that should be really simple, but I do not know what function to use, or how to use it.
I want the user to enter a string, and I want to iterate through a text file, comparing this string to the first word of each line in the file.
By "word" here, I mean substring that consists of characters that aren't blanks.
Help is greatly appreciated!
Edit:
To be more clear, I want to take a single input and search for it in a database in the form of a text file. I know that if it is in the database, it will be the first word of a line, since that is how the database is formatted. I suppose I COULD iterate through every single word of the database, but this seems less efficient.
After finding the input in the database, I need to access the two words that follow it (on the same line) to achieve the program's ultimate goal (which is computational in nature).
Here is some code that will do what you are asking. I think it will help you understand how string functions work a little better. Note - I did not make many assumptions about how well conditioned the input and text file are, so there is a fair bit of code for removing whitespace from the input, and for checking that the match is truly "the first word", and not "the first part of the first word". So this code will not match the input "hello" to the line "helloworld 123 234" but it will match to "hello world 123 234". Note also that it is currently case sensitive.
#include <stdio.h>
#include <string.h>
#include <ctype.h>   // isspace()
int main(void) {
    char buf[100];      // declare space for the input string
    FILE *fp;           // pointer to the text file
    char fileBuf[256];  // space to keep a line from the file
    int ii, ll;
    int found = 0;
    printf("give a word to check:\n");
    // fgets prevents you reading in a string longer than the buffer
    if (fgets(buf, sizeof buf, stdin) == NULL) return 1;
    printf("you entered: %s\n", buf); // check we read correctly
    // see (for debug) if there are any odd characters:
    printf("In hex, that is ");
    ll = strlen(buf);
    for(ii = 0; ii < ll; ii++) printf("%2X ", (unsigned char)buf[ii]);
    printf("\n");
    // you will probably see a newline (and maybe a carriage return,
    // depending on the OS). Get rid of it! Note that this loop turns every
    // whitespace character into '\0'. I could have reused ll, which is
    // strlen(buf), but that makes the code harder to understand.
    for(ii = (int)strlen(buf) - 1; ii >= 0; ii--) {
        if (isspace((unsigned char)buf[ii])) buf[ii] = '\0';
    }
    // open the file:
    if((fp=fopen("myFile.txt", "r"))==NULL) {
        printf("cannot open file!\n");
        return 1;
    }
    while( fgets(fileBuf, sizeof fileBuf, fp) ) { // read in one line at a time until eof
        printf("line read: %s", fileBuf); // show we read it correctly
        // find whitespace (or the end of the string): we need to keep only the first word.
        ii = 0;
        while(fileBuf[ii] != '\0' && !isspace((unsigned char)fileBuf[ii])) ii++;
        // now compare input string with first word from input file:
        if ((int)strlen(buf) == ii && strncmp(fileBuf, buf, ii) == 0) {
            printf("found a matching line: %s\n", fileBuf);
            found = 1;
            break;
        }
    }
    fclose(fp);
    // if found is set, fileBuf contains the line you are interested in;
    // the second and third word of the line are what you are really after.
    if (!found) printf("no matching line\n");
    return 0;
}
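Since the goal is to reach the two words that follow the matching first word, a shorter alternative is to let sscanf() split each line. A sketch, assuming words are at most 63 characters and lines fit in 256 bytes; find_entry() and WORD_MAX are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 64   /* assumed maximum word length + 1; matches %63s below */

/* Scans fp line by line for a line whose first word equals key. On a match,
   copies the second and third words into second and third (each at least
   WORD_MAX bytes) and returns 1; returns 0 if no line matches, in which
   case second and third hold nothing useful. */
int find_entry(FILE *fp, const char *key, char *second, char *third)
{
    char line[256];
    char first[WORD_MAX];

    while (fgets(line, sizeof line, fp) != NULL) {
        /* %63s skips leading blanks and stops at the next blank, so the
           first conversion reads exactly the first word of the line. */
        if (sscanf(line, "%63s %63s %63s", first, second, third) == 3
            && strcmp(first, key) == 0)
            return 1;
    }
    return 0;
}
```

The user's input still needs its trailing newline removed before the call, for example with buf[strcspn(buf, "\n")] = '\0';.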
Your recent update states that the file is really a database, in which you are looking for a word. This is very important.
If you have enough memory to hold the whole database, you should do just that (read the whole database and arrange it for efficient searching), so you should probably not ask about searching in a file.
Good database designs involve data structures like tries and hash tables. But for a start, you could use the most basic improvement of the database: holding the words in alphabetical order (use the somewhat tricky qsort function to achieve that).
struct Database
{
size_t count;
struct Entry // nested declaration is valid C; the tag Entry has file scope
{
char *word;
char *explanation;
} *entries;
};
char *find_explanation_of_word(struct Database* db, char *word)
{
for (size_t i = 0; i < db->count; i++)
{
int result = strcmp(db->entries[i].word, word);
if (result == 0)
return db->entries[i].explanation;
else if (result > 0)
break; // if the database is sorted, this means word is not found
}
return NULL; // not found
}
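To get the words into alphabetical order, qsort() needs a comparison function that receives pointers to two array elements; once the array is sorted, bsearch() from <stdlib.h> can replace the linear scan with a binary search using the same comparator. A sketch, with struct Entry repeated at file scope so the block compiles on its own; sort_entries() and find_explanation_sorted() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

struct Entry
{
    char *word;
    char *explanation;
};

/* qsort()/bsearch() comparator: orders entries alphabetically by word. */
static int compare_entries(const void *a, const void *b)
{
    const struct Entry *ea = a;
    const struct Entry *eb = b;
    return strcmp(ea->word, eb->word);
}

void sort_entries(struct Entry *entries, size_t count)
{
    qsort(entries, count, sizeof *entries, compare_entries);
}

/* Binary search in an array already sorted by sort_entries().
   Returns the explanation, or NULL if word is not present. */
char *find_explanation_sorted(const struct Entry *entries, size_t count,
                              const char *word)
{
    struct Entry key = { (char *)word, NULL };   /* only key.word is compared */
    const struct Entry *hit = bsearch(&key, entries, count,
                                      sizeof *entries, compare_entries);
    return hit != NULL ? hit->explanation : NULL;
}
```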
If your database is too big to hold in memory, you should use a trie that holds just the beginnings of the words in the database; for each beginning of a word, have a file offset at which to start scanning the file.
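// Copies the explanation into out (out_size bytes) and returns out, or
// returns NULL if word is not found. Returning a pointer into the local
// array line would be invalid once the function returns.
char *find_explanation_in_file(FILE *f, long offset, const char *word,
                               char *out, size_t out_size)
{
    if (fseek(f, offset, SEEK_SET) != 0)
        return NULL;
    char line[100]; // 100 should be greater than max line in file
    while (fgets(line, sizeof(line), f))
    {
        line[strcspn(line, "\n")] = '\0'; // drop the newline
        char *word_in_file = strtok(line, " ");
        if (word_in_file == NULL)
            continue; // blank line
        char *explanation = strtok(NULL, "");
        int result = strcmp(word_in_file, word);
        if (result == 0)
        {
            snprintf(out, out_size, "%s", explanation ? explanation : "");
            return out;
        }
        else if (result > 0)
            break; // the file is sorted, so word is not in it
    }
    return NULL; // not found
}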
I think what you need is fseek().
1) Pre-process the database file as follows. Find the positions of all the '\n' (newline) characters and store the offset just after each one in an array, say a, so that you know the ith line starts at character a[i] from the beginning of the file.
2) fseek() is a library function declared in stdio.h that moves the file position indicator. So when you need to process an input string, start from the beginning of the file and check the first word only at the positions stored in the array a. To do that:
fseek(inFile , a[i] , SEEK_SET);
and then
fscanf(inFile, "%s %s %s", yourFirstWordHere, secondWord, thirdWord);
for checking the ith line.
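Or, if the file position is still exactly at a[i-1] (which is no longer true once fscanf() has consumed part of that line), you could seek relative to the current position:
fseek(inFile, a[i] - a[i-1], SEEK_CUR);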
Explanation: fseek() sets the read/write position indicator associated with the stream to the desired position. So if you know at which point you need to read or write, you can go there and read or write directly. This way, you won't need to read whole lines just to get the first three words.
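A sketch of both steps, assuming a seekable file (not a pipe or stdin). Instead of the newline positions themselves, it records where each line starts, using ftell() right after each '\n', which is exactly the value fseek() needs. build_line_index() and read_line_words() are hypothetical names, and words are assumed to fit in 64 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Pre-processing pass: stores the offset (from ftell()) at which each line
   of f starts in a malloc'ed array and the number of lines in *count.
   Returns NULL if memory runs out. The caller frees the array. */
long *build_line_index(FILE *f, size_t *count)
{
    size_t cap = 64, n = 0;
    long *starts = malloc(cap * sizeof *starts);
    int c;

    if (starts == NULL)
        return NULL;
    rewind(f);
    starts[n++] = 0;                     /* line 0 starts at the beginning */
    while ((c = fgetc(f)) != EOF) {
        if (c != '\n')
            continue;
        if (n == cap) {
            long *tmp = realloc(starts, 2 * cap * sizeof *starts);
            if (tmp == NULL) {
                free(starts);
                return NULL;
            }
            starts = tmp;
            cap *= 2;
        }
        starts[n++] = ftell(f);          /* next line begins after '\n' */
    }
    /* A start offset equal to the end of the file is not a real line. */
    if (starts[n - 1] == ftell(f))
        n--;
    *count = n;
    return starts;
}

/* Jumps straight to line i with fseek() and reads its first three words
   into w1, w2, w3 (each at least 64 bytes). Returns 1 on success. Note that
   %s skips newlines, so a line with fewer than three words would borrow
   words from the following line. */
int read_line_words(FILE *f, const long *starts, size_t count, size_t i,
                    char *w1, char *w2, char *w3)
{
    if (i >= count || fseek(f, starts[i], SEEK_SET) != 0)
        return 0;
    return fscanf(f, "%63s %63s %63s", w1, w2, w3) == 3;
}
```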

How do I read a file in C and store the characters in a variable

I am completely new to C and need help with this badly.
I'm reading a file with fopen(), then obtaining its contents using fgetc(). What I want to know is how I can access the line fgetc() returns so that I can put the 4th to 8th characters into a char array. Below is an example I found online, but I am having a hard time parsing the data it returns. I still don't have a firm understanding of C and don't get how an int can be used to store a line of characters.
FILE *fr;
fr = fopen("elapsed.txt", "r");
int n = fgetc(fr);
while(n!= EOF){
printf("%c", n);
n = fgetc(fr);
} printf("\n");
Here is one way:
1. open the file
2. get the size of the file
3. allocate memory of that size to a character pointer
4. read the data from the file
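FILE *fr;
char *message;
fr = fopen("elapsed.txt", "r");
if (fr == NULL) {
    perror("elapsed.txt");
    return 1;               /* assumes this fragment is inside main() */
}
/* create a variable of type struct stat */
struct stat stp = { 0 };
/* stat() returns information about a file; no permissions are required on the file itself */
if (stat("elapsed.txt", &stp) != 0) {
    perror("stat");
    fclose(fr);
    return 1;
}
/* determine the size of the data in the file */
size_t filesize = (size_t)stp.st_size;
/* allocate one extra byte for the terminating '\0' */
message = malloc(filesize + 1);
if (message == NULL) {
    fclose(fr);
    return 1;
}
/* fread() returns the number of bytes read (never -1); in text mode it can
   legitimately be less than st_size, e.g. when "\r\n" becomes "\n" */
size_t nread = fread(message, 1, filesize, fr);
if (ferror(fr)) {
    printf("\nerror in reading\n");
    /* close the file and free the buffer before bailing out */
    fclose(fr);
    free(message);
    return 1;
}
message[nread] = '\0';
fclose(fr);
printf("\n\tEntered Message for Encode is = %s", message);
free(message);
P.S. Don't forget to add #include <sys/stat.h> (and <stdlib.h> for malloc() and free()).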
You're not retrieving a line with fgetc(); you are retrieving one character at a time from the file. That sample keeps retrieving characters until fgetc() returns EOF (end of file). Look at this description of fgetc.
http://www.cplusplus.com/reference/clibrary/cstdio/fgetc/
On each iteration of the while loop, fgetc() retrieves a single character and places it in the variable n. Something that can help you with "characters" in C is to think of each one as a byte instead of an actual character. What you're missing here is that an int is typically 4 bytes and a char is 1 byte, but both can store the same bit pattern for the same ASCII character; the only difference is the size of the variable. fgetc() returns an int rather than a char so that it can return every possible byte value and still have one extra, distinct value, EOF, to signal end of file or an error.
The sample you have above shows a printf with "%c", which means to take the value in "n" and treat it like an ASCII character.
http://www.cplusplus.com/reference/clibrary/cstdio/printf/
You can use a counter in the while loop to keep track of your position and pick out the 4th through 8th characters of the file. You should also think about what happens if the input file is shorter than that.
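A sketch of that counter idea, assuming "4th through 8th" means 1-based positions 4 to 8 inclusive (five characters) counted from the start of the file; extract_range() is a hypothetical name. The character stays in an int until it has been compared with EOF, and only then is stored in the char array.

```c
#include <stdio.h>

/* Copies the characters at 1-based positions first..last (inclusive) of f
   into out and null-terminates it; out must hold last - first + 2 bytes.
   Returns the number of characters copied, which is smaller than requested
   if the file is too short. */
size_t extract_range(FILE *f, size_t first, size_t last, char *out)
{
    size_t pos = 0, n = 0;
    int c;   /* int, not char, so that EOF can be told apart from data */

    while (pos < last && (c = fgetc(f)) != EOF) {
        pos++;
        if (pos >= first)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```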
Hope that helps.
OK, look at it as box sizes. I have a 30cm x 30cm box that can hold one foam letter. Now I am calling a function that 'could' return a 60cm x 60cm letter, but it is 99% likely to return a 30cm x 30cm letter because I know what it is reading. If I give it a 60cm x 60cm box, I know the result will always fit without surprises.
But if I am sure that the result will always be a 30cm x 30cm letter, then I know I can convert the result of a function that returns a 60cm x 60cm box without losing anything.
