I have written some basic code which writes a string into a file in binary mode (using fwrite()). I can also read the same string back from the file (using fread()) into a buffer and print it. It works, but in the part where I read from the file, extra junk is also read into the buffer. My question is: how do I know the correct number of bytes to read?
The following is the code --
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define BUFSZ 81
char * get_string (char *, size_t);
int main (int argc, char * argv[])
{
if (argc != 2)
{
fprintf (stderr, "Invalid Arguments!!\n");
printf ("syntax: %s <filename>\n", argv[0]);
exit (1);
}
FILE * fp;
if ((fp = fopen(argv[1], "ab+")) == NULL)
{
fprintf (stderr, "Cannot open file <%s>\n", argv[1]);
perror ("");
exit (2);
}
char string[BUFSZ];
char readString[BUFSZ];
size_t BYTES, BYTES_READ;
puts ("Enter a string: ");
get_string (string, BUFSZ);
// printf ("You have entered: %s\n", string);
BYTES = fwrite (string, sizeof (char), strlen (string), fp);
printf ("\nYou have written %zu bytes to file <%s>.\n", BYTES, argv[1]);
printf ("\nContents of the file <%s>:\n", argv[1]);
rewind (fp);
BYTES_READ = fread (readString, sizeof (char), BUFSZ, fp);
printf ("%s\n", readString);
printf ("\nYou have read %zu bytes from file <%s>.\n", BYTES_READ, argv[1]);
getchar ();
fclose (fp);
return 0;
}
char * get_string (char * str, size_t n)
{
char * ret_val = fgets (str, n, stdin);
char * find;
if (ret_val)
{
find = strchr (str, '\n');
if (find)
* find = '\0';
else
while (getchar () != '\n')
continue;
}
return ret_val;
}
in the part where I read from the file, extra junk is also read into the buffer.
No, it isn't. Since you're opening the file in append mode, it's possible that you're reading in extra data preceding the string you've written, but you are not reading anything past the end of what you wrote, because there isn't anything there to read. When the file is initially empty or absent, you can verify that by comparing the value of BYTES to the value of BYTES_READ.
What you are actually seeing is the effect of the read-back data not being null terminated. You did not write the terminator to the file, so you could not read it back. It might be reasonable to avoid writing the terminator, but in that case you must supply a new one when you read the data back in. For example,
readString[BYTES_READ] = '\0';
My question is how to know the length of the bytes to be read, correctly?
There are various possibilities. Among the prominent ones are
use fixed-length data
write the string length to the file, ahead of the string data.
Alternatively, in your particular case, when the file starts empty and you write only one string in it, there is also the possibility of capturing and working with how many bytes were read instead of knowing in advance how many should be read.
First of all you get a string from the user, which will contain up to BUFSZ-1 characters (the get_string() function removes the trailing newline, or discards any characters that exceed the BUFSZ limit).
For example, the user might have inserted the word Hello\n, so that after get_string() call string array contains
-------------------
|H|e|l|l|o|'\0'|...
-------------------
Then you fwrite the string buffer to the output file, writing strlen (string) bytes. This doesn't include the string terminator '\0'.
In our example the contents of the output file is
--------------
|H|e|l|l|o|...
--------------
Finally you read back from the file. But since the readString array is not initialized, the file contents will be followed by whatever junk characters happen to be present in the uninitialized array.
For example, readString could have the following initial contents:
---------------------------------------------
|a|a|a|a|a|T|h|i|s| |i|s| |j|u|n|k|!|'\0'|...
---------------------------------------------
and after reading from the file
---------------------------------------------
|H|e|l|l|o|T|h|i|s| |i|s| |j|u|n|k|!|'\0'|...
---------------------------------------------
So that the following string would be printed
HelloThis is junk!
In order to avoid these issues, you have to make sure that a trailing terminator is present in the target buffer. So, just initialize the array in this way:
char readString[BUFSZ] = { 0 };
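This way a string terminator will be present in the target array, provided fread() stores fewer than BUFSZ bytes. Since the code asks for up to BUFSZ bytes, request BUFSZ - 1 instead to guarantee that.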
Alternatively, memset it to 0 before every read:
memset (readString, 0, BUFSZ);
Related
I have a file ("career.txt") where each line is composed in this way:
serial number(long int) name(string) surname(string) exam_id(string) exam_result(string);
Each line can have from 1 to 25 pairs, each made up of an exam_id (e.g. INF070) and an exam_result (e.g. 30L).
Ex line:
333145 Name Surname INF120 24 INF070 28 INF090 R INF100 30L INF090 24
33279 Name Surname GIU123 28 GIU280 27 GIU085 21 GIU300 R
I don't know how many pairs there are in one line (there could be 1, 5 or 25), so I can't use a for loop with a fixed count, and I have to read them all.
How can I read the entire line?
I tried using getline and fgets, and then splitting the string with strtok, but I don't know why it doesn't work.
If you don't know the maximum number of characters in a line of the text file, one way to read the entire line is to dynamically allocate memory for the buffer as you read the file.
You can use getline, which is a POSIX function, to do this.
Here is an example of how you can use getline to read a line from a file:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *buffer = NULL;
size_t bufsize = 0;
ssize_t characters;
FILE *file = fopen("file.txt", "r");
if (file == NULL) {
printf("Could not open file\n");
return 1;
}
while ((characters = getline(&buffer, &bufsize, file)) != -1) {
printf("Line: %s", buffer);
}
free(buffer);
fclose(file);
return 0;
}
The getline function allocates memory for the buffer automatically, so you don't have to specify a fixed buffer size. It takes three arguments: a pointer to a pointer to a character (in this case, &buffer), a pointer to a size_t variable (in this case, &bufsize), and a pointer to a FILE object representing the file you want to read from. The function reads a line from the file and stores it in the buffer, allocating more memory if necessary. It returns the number of characters read, which includes the newline character at the end of the line, or -1 if it reaches the end of the file or an error occurs. In the example above, each line is read and printed. It is important to release the memory once you have finished with the buffer, in this case with free(buffer).
I'm trying to read every string separated with commas, dots or whitespaces from every line of a text from a file (I'm just receiving alphanumeric characters with scanf for simplicity). I'm using the getline function from <stdio.h> library and it reads the line just fine. But when I try to "iterate" over the buffer that was fetched with it, it always returns the first string read from the file. Let's suppose I have a file called "entry.txt" with the following content:
test1234 test hello
another test2
And my "main.c" contains the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_WORD 500
int main()
{
FILE *fp;
int currentLine = 1;
size_t characters, maxLine = MAX_WORD * 500;
/* Buffer can keep up to 500 words of 500 characters each */
char *word = (char *)malloc(MAX_WORD * sizeof(char)), *buffer = (char *)malloc((int)maxLine * sizeof(char));
fp = fopen("entry.txt", "r");
if (fp == NULL) {
return 1;
}
for (currentLine = 1; (characters = getline(&buffer, &maxLine, fp)) != -1; currentLine++)
{
/* This line gets "test1234" onto "word" variable, as expected */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // As expected
/* This line should get "test" string, but again it obtains "test1234" from the buffer */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // Not intended...
// Do some stuff with the "word" and "currentLine" variables...
}
return 0;
}
What happens is that I'm trying to get every alphanumeric string (namely word from now on) in sequence from the buffer, when the sscanf function just gives me the first occurrence of a word within the specified buffer string. Also, every line on the entry file can contain an unknown amount of words separated by either whitespaces, commas, dots, special characters, etc.
I'm obtaining every line from the file separately with "getline" because I need to get every word from every line and store it in other place with the "currentLine" variable, so I'll know from which line a given word would've come. Any ideas of how to do that?
fscanf has an input stream argument. A stream can change its state, so that the second call to fscanf reads a different thing. For example:
fscanf(stdin, "%s", str1); // str1 contains some string; stdin advances
fscanf(stdin, "%s", str2); // str2 contains some other string
scanf does not have a stream argument, but it has a global stream to work with, so it works exactly like fscanf(stdin, ...).
sscanf does not have a stream argument, nor is there any global state to keep track of what was read. There is an input string. You scan it, some characters get converted, and... nothing else changes. The string remains the same string (how could it possibly be otherwise?) and no information about how far the scan has advanced is stored anywhere.
sscanf(buffer, "%s", str1); // str1 contains some string; nothing else changes
sscanf(buffer, "%s", str2); // str2 contains the same string
So what does a poor programmer do?
Well, I lied. Information about how far the scan has advanced goes unrecorded only if you don't request it.
int nchars;
sscanf(buffer, "%s%n", str1, &nchars); // str1 contains some string;
// nchars contains number of characters consumed
sscanf(buffer+nchars, "%s", str2); // str2 contains some other string
Error handling and %s field widths omitted for brevity. You should never omit them in real code.
I'm trying to write a simple program that reads a file and prints each line of the file. Each line will contain a name and an age, for example:
Jenny 16
Amy 24
I'm getting Segmentation fault (core dumped) and I'm not sure what is causing this problem, since I have already used call-by-reference for the integer value in the fscanf. I would greatly appreciate any input.
*Sorry, I should have specified this before, but the input file name is just file1, so it shouldn't be longer than the buffer.
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
int main(){
char filename[10];
scanf("%s",filename);
FILE *file;
char name[20];
int num;
file = fopen("filename", "r");
while
((fscanf(file, "%s %d\n", name, &num)) != EOF) {
printf("%s %d\n", name, num);
}
fclose(file);
return 0;
}
Simple error: "filename" is a string literal, so fopen("filename", "r"); tries to open a file literally named filename, which does not exist. fopen then returns NULL, and passing that NULL to fscanf is what causes the segmentation fault. Change it to fopen(filename, "r"); and it should work. Note that reading with fscanf and a bare %s is unsafe: if the name is longer than 19 characters, fscanf writes past the end of name and can overwrite other data in the program. The program might run fine in many cases and give weird results in others; this is called "undefined behaviour". For simple learning purposes this approach is enough, just keep in mind that it can crash from time to time.
You need to limit the scanf so it doesn't read anything longer than your buffer. The segmentation fault is probably because you are reading a name or a filename with more characters than your buffer can hold (to be sure, you should post your input data).
You should use scanf family of functions like this:
scanf("%9s",filename);
Notice I limit the size to 9 (because your buffer is 10 chars long) to leave room for scanf to put a \x00 to terminate the string.
This is a circumstance where reading with fscanf() is horribly fragile. How would it handle:
Jenny 16
Minnie Mouse 92
or what if a space was missing?
Jenny16
Minnie Mouse92
Would fscanf() still work?
The key is to handle input in a line-oriented manner with fgets() (or POSIX getline()) to read the entire line into a buffer (character array) and then parse the values you need from the buffer using sscanf() (or a pair of pointers).
One way to approach this is to separate everything up to the first digit into name and then convert the remaining digits to age. You then trim any trailing whitespace from name by overwriting it with the nul-terminating character: use strlen() to get the length of name, point to the last character, and work backwards toward the beginning until the first non-whitespace character is found.
For example:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define MAXNM 64
int main (int argc, char **argv) {
char buf[MAXC]; /* buffer to hold each line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (buf, MAXC, fp)) { /* read each line of input */
char name[MAXNM]; /* name */
int age; /* age */
/* parse contents of buf into name and age -- VALIDATE */
if (sscanf (buf, "%63[^0-9]%d", name, &age) == 2) {
size_t len = strlen(name); /* get length of name */
char *p = name + len - 1; /* pointer to last char in name */
while (p > name && isspace((unsigned char)*p)) /* while char is whitespace */
*p-- = 0; /* overwrite with nul-terminating char */
printf ("name: '%s' age: %d\n", name, age);
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
}
(note: there is nothing special about a "buffer". It is just a character array (or unsigned char or uint8_t) sufficiently large to hold the entirety of what it is you need to temporarily store -- don't skimp on buffer size. You can adjust downward as needed if on a micro-controller with limited memory)
Also do not hardcode filenames or numbers (called Magic Numbers) unless there is no alternative --- as above when providing the field-width modifier to the %[^0-9] conversion specifier. Otherwise, if you need a constant #define one or use a global enum if you have multiple constants to define.
Example Use/Output
With your original example in dat/name_age.txt you would have:
$ ./bin/name_age dat/name_age.txt
name: 'Jenny' age: 16
name: 'Amy' age: 24
or in the case of the example with "Minnie Mouse 92":
$ ./bin/name_age <dat/name_age2.txt
name: 'Jenny' age: 16
name: 'Minnie Mouse' age: 92
Look things over and let me know if you have questions.
I want to know if I can read a complete file with a single scanf statement. I tried it with the code below.
#include<stdio.h>
int main()
{
FILE * fp;
char arr[200],fmt[6]="%[^";
fp = fopen("testPrintf.c","r");
fmt[3] = EOF;
fmt[4] = ']';
fmt[5] = '\0';
fscanf(fp,fmt,arr);
printf("%s",arr);
printf("%d",EOF);
return 0;
}
After everything had run, it ended with this message:
"* * * stack smashing detected * * *: terminated
Aborted (core dumped)"
Interestingly, printf("%s",arr); worked but printf("%d",EOF); is not showing its output.
Can you let me know what happened when I tried to read up to EOF with scanf?
If you really, really must (ab)use fscanf() into reading the file, then this outlines how you could do it:
open the file
use fseek() and ftell() to find the size of the file
rewind() (or fseek(fp, 0, SEEK_SET)) to reset the file to the start
allocate a big buffer
create a format string that reads the correct number of bytes into the buffer and records how many characters are read
use the format with fscanf()
add a null terminating byte in the space reserved for it
print the file contents as a big string.
If there are no null bytes in the file, you'll see the file contents printed. If there are null bytes in the file, you'll see the file contents up to the first null byte.
I chose the anodyne name data for the file to be read — there are endless ways you can make that selectable at runtime.
There are a few assumptions made about the size of the file (primarily that the size isn't bigger than can be fitted into a long without signed overflow, and that it isn't empty). It uses the fact that the %c format can accept a length, just like most of the formats can, and it doesn't add a null terminator at the end of the string it reads and it doesn't fuss about whether the characters read are null bytes or anything else; it just reads them. It also uses the fact that you can specify the size of the variable to hold the offset with the %n (or, in this case, the %ln) conversion specification. And finally, it assumes that the file is not shrinking (it will ignore growth if it is growing), and that it is a seekable file, not a FIFO or some other special file type that does not support seeking.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
const char filename[] = "data";
FILE *fp = fopen(filename, "r");
if (fp == NULL)
{
fprintf(stderr, "Failed to open file %s for reading\n", filename);
exit(EXIT_FAILURE);
}
fseek(fp, 0, SEEK_END);
long length = ftell(fp);
rewind(fp);
char *buffer = malloc(length + 1);
if (buffer == NULL)
{
fprintf(stderr, "Failed to allocate %ld bytes\n", length + 1);
exit(EXIT_FAILURE);
}
char format[32];
snprintf(format, sizeof(format), "%%%ldc%%ln", length);
long nbytes = 0;
if (fscanf(fp, format, buffer, &nbytes) != 1 || nbytes != length)
{
fprintf(stderr, "Failed to read %ld bytes (got %ld)\n", length, nbytes);
exit(EXIT_FAILURE);
}
buffer[length] = '\0';
printf("<<SOF>>\n%s\n<<EOF>>\n", buffer);
free(buffer);
return(0);
}
This is still an abuse of fscanf() — it would be better to use fread():
if (fread(buffer, sizeof(char), length, fp) != (size_t)length)
{
fprintf(stderr, "Failed to read %ld bytes\n", length);
exit(EXIT_FAILURE);
}
You can then omit the variable format and the code that sets it, and also nbytes. Or you can keep nbytes (maybe as a size_t instead of long) and assign the result of fread() to it, and use the value in the error report, along the lines of the test in the fscanf() variant.
You might get warnings from GCC about a non-literal format string for fscanf(). It's correct, but this isn't dangerous because the programmer is completely in charge of the content of the format string.
I'm currently working on a binary file creation. Here is what I have tried.
Example 1:
#include<stdio.h>
int main() {
/* Create the file */
int a = 5;
FILE *fp = fopen ("file.bin", "wb");
if (fp == NULL)
return -1;
fwrite (&a, sizeof (a), 1, fp);
fclose (fp);
return 0;
}
Example 2:
#include <stdio.h>
#include <string.h>
int main()
{
FILE *fp;
char str[256] = {'\0'};
strcpy(str, "3aae71a74243fb7a2bb9b594c9ea3ab4");
fp = fopen("file.bin", "wb");
if(fp == NULL)
return -1;
fwrite(str, sizeof str, 1, fp);
return 0;
}
Example 1 gives the right output in binary form. But Example 2, where I'm passing a string, doesn't give me the right output. It writes the input string I have given into the file and appends some extra data (in binary form).
I can't figure out what mistake I'm making.
The problem is that sizeof str is 256, that is, the entire size of the locally declared character array. However, the data you are storing in it does not require all 256 characters. The result is that the write operation writes all the characters of the string plus the rest of the array, which here is zero bytes because the array was zero-initialized (in an uninitialized array it would be whatever garbage happened to be there). Try the following line as a fix:
fwrite(str, strlen(str), 1, fp);
C strings are null terminated, meaning that anything after the '\0' character must be ignored. If you read the file written by Example 2 into a str[256] and print it out using printf("%s", str), you would get the original string back with no extra characters, because the null terminator would be read into the buffer as well, providing proper termination for the string.
The reason you get the extra "garbage" in the output is that fwrite does not interpret the str[] array as a C string. It interprets it as a buffer of size 256, so everything after the terminator is written too. Text editors do not interpret a null character as a terminator, so those padding bytes show up as extra binary data in the file.
If you want the string written to the file to end at the last valid character, use strlen(str) for the size in the call of fwrite.