How to read and print hexadecimal numbers from a file in C - c

I'm trying to read 14-digit hexadecimal numbers from a file and then print them. My idea is to use a long long int, read the lines from the file with fscanf as if they were strings, and then turn each string into a hex number using atoll. The problem is that, according to valgrind, I am getting a segfault on my fscanf line, and I have absolutely no idea why. Here is the code:
#include<stdio.h>
int main(int argc, char **argv){
if(argc != 2){
printf("error argc!= 2\n");
return 0;
}
char *fileName = argv[1];
FILE *fp = fopen( fileName, "r");
if(fp == NULL){
return 0;
}
long long int num;
char *line;
while( fscanf(fp, "%s", line) == 1 ){
num = atoll(line);
printf("%x\n", num);
}
return 0;
}

Are you sure you want to read your numbers as character strings? Why not let fscanf do the work for you?
unsigned long long num; // %llx expects a pointer to unsigned long long
while( fscanf(fp, "%llx", &num) == 1 ){ // read an unsigned long long in hex
printf("%llx\n", num); // print an unsigned long long in hex
}
BTW, note the ll length modifier on the %x conversion in printf: it tells printf that the argument is an unsigned long long rather than an unsigned int.
Edit
Here is a simple example of two loops reading a 3-line input (with two, no and three numbers in consecutive lines) with a 'hex int' format and with a 'string' format:
http://ideone.com/ntzKEi
A call to rewind allows the second loop read the same input data.

That line variable is not initialized, so when fscanf() dereferences it you get undefined behavior.
You should use:
char line[1024];
while(fgets(line, sizeof line, fp) != NULL)
To do the loading.
If you're on C99, you might want to use uint64_t to hold the number, since that makes it clear that 14-digit hexadecimal numbers (4 * 14 = 56) will fit.
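As a rough sketch of the fgets() route combined with uint64_t: the helper below converts one line of text with strtoull() in base 16 and reports failure, instead of silently returning 0 the way atoll() does. The names parse_hex_line and print_hex_file are made up for illustration, and the code assumes one number per line.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse one line of hex text into *out.
 * Returns 1 on success, 0 if the line holds no valid hex number. */
int parse_hex_line(const char *line, uint64_t *out)
{
    char *end;
    errno = 0;
    unsigned long long v = strtoull(line, &end, 16);
    if (end == line || errno == ERANGE)
        return 0;               /* no digits, or value out of range */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                  /* allow trailing whitespace */
    if (*end != '\0')
        return 0;               /* junk after the number */
    *out = (uint64_t)v;
    return 1;
}

/* Read every line of fp and print the valid numbers back in hex. */
void print_hex_file(FILE *fp)
{
    char line[1024];
    uint64_t num;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_hex_line(line, &num))
            printf("%" PRIx64 "\n", num);
    }
}
```

PRIx64 from <inttypes.h> picks the right printf length modifier for uint64_t, so the format does not depend on how the platform defines that type.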

The other answers are good, but I want to clarify the actual reason for the crash you are seeing. The problem is that:
fscanf(fp, "%s", line)
... essentially means "read a string from a file, and store it in the buffer pointed at by line". In this case, your line variable hasn't been initialised, so it doesn't point anywhere. Technically, this is undefined behavior; in practice, the result will often be that you write over some arbitrary location in your process's address space; furthermore, since it will often point at an illegal address, the operating system can detect and report it as a segment violation or similar, as you are indeed seeing.
Note that fscanf with a %s conversion will not necessarily read a whole line - it reads a string delimited by whitespace. It might skip lines if they are empty and it might read multiple strings from a single line. This might not matter if you know the precise format of the input file (and it always has one value per line, for instance).
Although it appears in that case that you can probably just use an appropriate modifier to read a hexadecimal number (fscanf(fp, "%llx", &num)), rather than read a string and try to do a conversion, there are various situations where you do need to read strings and especially whole lines. There are various solutions to that problem, depending on what platform you are on. If it's a GNU system (generally including Linux) and you don't care about portability, you could use the m modifier, and change line to &line:
fscanf(fp, "%ms", &line);
This passes a pointer to line to fscanf, rather than its value (which is uninitialised), and the m causes fscanf to allocate a buffer and store its address in line. You then should free the buffer when you are done with it. Check the Glibc manual for details. The nice thing about this approach is that you do not need to know the line length beforehand.
If you are not using a GNU system or you do care about portability, use fgets instead of fscanf - this is more direct and allows you to limit the length of the line read, meaning that you won't overflow a fixed buffer - just be aware that it will read a whole line at a time, unlike fscanf, as discussed above. You should declare line as a char-array rather than a char * and choose a suitable size for it. (Note that you can also specify a "maximum field width" for fscanf, eg fscanf(fp, "%1000s", line), but you really might as well use fgets).

Related

What happens with extra memory using fscanf?

I'm new to C and I have a couple questions about fscanf. I wrote a simple program that reads the contents of a file and spits it back out on the command line:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char* argv[1])
{
if (argc != 2)
{
printf("Usage: fscanf txt\n");
return 1;
}
char* txt = argv[1];
FILE* fp = fopen(txt, "r");
if (fp == NULL)
{
printf("Could not open %s.\n", txt);
return 2;
}
char s[50];
while (fscanf(fp, "%49s", s) == 1)
printf("%s\n", s);
return 0;
}
Let's say the contents of my text file is just "C is cool.", which will output:
C
is
cool.
So I have two questions here:
1) Does fscanf assume that the placeholder "%s" will be a single word (an array of chars only)? According to this program's output, spaces and line breaks seem to prompt the function to return. But what if I wanted to read a whole paragraph? Would I use fread() instead?
2) More importantly I'm wondering what happens with all of the unused space in the array. On the first iteration, I think s[0] = "C" and s[1] = "\0", so are s[2] - s[49] just wasted?
EDIT: while (fscanf(fp, "%49s", s) == 1) - thanks to @M Oehm for pointing this out - enforcing a hard limit here to prevent dangerous buffer overflows
1) Does fscanf assume that the placeholder "%s" will be a single word
(an array of chars only)? According to this program's output, spaces
and line breaks seem to prompt the function to return. But what if I
wanted to read a whole paragraph? Would I use fread() instead?
The %s specifier reads single words that are delimited by white space. The scanf family of functions is very crude; these functions do not normally distinguish between line breaks and spaces, for example.
A line is anything up to the next newline. There is no concept of paragraph, but you might consider anything between blank lines a paragraph. The function to read lines of text is fgets, so you could read lines until you find an empty one. (fgets retains the newline at the end, mind.)
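A minimal sketch of the "read lines until you find an empty one" idea, assuming a paragraph is a run of non-blank lines. The name read_paragraph and the fixed 256-byte line buffer are illustrative choices; a line that does not fit in the output buffer is simply dropped, and a line longer than 255 characters is counted in pieces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy lines from fp into out (capacity cap)
 * until a blank line or end of file. Returns the number of lines
 * copied; 0 means there was no paragraph left to read. */
int read_paragraph(FILE *fp, char *out, size_t cap)
{
    char line[256];
    size_t used = 0;
    int lines = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] == '\n' || (line[0] == '\r' && line[1] == '\n')) {
            if (lines == 0)
                continue;       /* skip blank lines before the paragraph */
            break;              /* a blank line ends the paragraph */
        }
        size_t len = strlen(line);
        if (used + len >= cap)
            break;              /* out of room: stop rather than overflow */
        memcpy(out + used, line, len + 1);   /* copy including the '\0' */
        used += len;
        lines++;
    }
    return lines;
}
```

Because fgets keeps the newline, the copied paragraph retains its line breaks.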
fread is a function for reading binary data. It is not useful for reading structured texts. (But it can be used to read the contents of a whole text file at once.)
2) More importantly I'm wondering what happens with all of the unused
space in the array. On the first iteration, I think c[0] = 'C' and
c[1] = '\0', so are c[2] - c[49] just wasted?
You are right, the data after the null terminator isn't used. "Wasted" is too negative: with user input you don't know whether you will encounter a longer word eventually. Because dynamic allocation requires some care in C, allocating "enough for most cases" is a good practice in C. You should enforce the hard limit when reading, though, to prevent buffer overruns:
fscanf(fp, "%49s", s)
The issue of "wasted" memory becomes more serious if you have an array of arrays of 50 chars. Most of the words will be much shorter than 50 chars. Here, the extra memory might eventually hurt you. 48 extra characters for reading a line are okay, though.
(A strategy to store "compact" arrays of chars is to have a running array of chars that is a concatenation of all strings, including their terminators. The word array is then an array of pointers into that master string.)
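Here is one way the running-array idea could look, as a sketch only. The struct word_pool, its two size constants and the function names are invented for this example; a real program would probably grow the storage dynamically (and then store offsets instead of pointers, since realloc may move the array).

```c
#include <stdio.h>
#include <string.h>

#define POOL_BYTES 4096   /* assumed capacity of the shared character array */
#define MAX_WORDS  512    /* assumed maximum number of words */

struct word_pool {
    char   chars[POOL_BYTES];  /* all words back to back, each with its '\0' */
    size_t used;               /* bytes of chars[] in use */
    char  *word[MAX_WORDS];    /* word[i] points into chars[] */
    size_t count;              /* number of words stored */
};

/* Append a copy of w to the pool. Returns 1 on success, 0 if full. */
int pool_add(struct word_pool *p, const char *w)
{
    size_t len = strlen(w) + 1;             /* include the terminator */
    if (p->count == MAX_WORDS || len > POOL_BYTES - p->used)
        return 0;
    p->word[p->count] = memcpy(p->chars + p->used, w, len);
    p->used += len;
    p->count++;
    return 1;
}

/* Read whitespace-delimited words from fp into the pool. */
void pool_read(struct word_pool *p, FILE *fp)
{
    char s[50];
    while (fscanf(fp, "%49s", s) == 1 && pool_add(p, s))
        ;
}
```

The pool must start zeroed (for example `struct word_pool p = {0};`). For the input "C is cool." it uses 11 bytes of character storage rather than 3 * 50.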
You use the %s specifier, which reads and stores data in the array s until it encounters whitespace such as a space or newline. As soon as it encounters that whitespace, fscanf returns.
I think c[0] = "C" and c[1] = "\0", so are c[2] - c[49] just wasted?
Yes, s[0]='C' and s[1]='\0', and there is not much you can do about the array being larger than the word it holds.
If you want the complete string "C is cool." stored in the array, use fgets.
#define len 1000
char s[len];
while(fgets(s,len,fp)!=NULL) {
//your code
}

Issue reading Japanese characters from file - C

I am writing a program which reads a file with almost 2 million lines. Each line has the format: integer ID, a tab, then the artist name string.
6821361 Selinsgrove High School Chorus
10151460 greek-Antique
10236365 jnr walker & the all-stars
6878792 Grieg - Kraggerud, Kjekshus
6880556 Mr. Oiseau
6906305 stars on 54 (maxi single)
10584525 Jonie Mitchel
10299729 エリス レジーナ/アントニオ カルロス ジョビン
Above is an example with some items from the file (note that some lines do not follow the specified format). My program works fine until it gets to the last line from the example; then it endlessly prints エリス レジーナ/アントニオ カルロス ジョビ\343\203.
struct artist *read_artists(char *fname)
{
FILE *file;
struct artist *temp = (struct artist*)malloc(sizeof(struct artist));
struct artist *head = (struct artist*)malloc(sizeof(struct artist));
file = fopen("/Users/Daniel/Library/Developer/Xcode/DerivedData/project_Audioscrobbler_Artists-hgwyqpinuoxayzbmvarcjxryqnrz/Build/Products/Debug/artist_data.txt", "r");
if(file == 0)
{
perror("fopen");
exit(1);
}
int artist_ID;
char artist_name[650];
while(!feof(file))
{
fscanf(file, "%d\t%65[^\t\n]\n", &artist_ID, artist_name);
temp = create_play(artist_ID, artist_name, 0, -1);
head = add_play(head, temp);
printf("%s\n", artist_name);
}
fclose(file);
//print_plays(head);
return head;
}
Above is my code for reading from the file. Can you please help explain what is wrong?
As the comments indicate, one problem is with while(!feof(file)). The linked content explains in detail why this is not a good idea, but in summary, quoting from one of the answers in the link:
(!feof(file))...
...is wrong because it tests for something that is
irrelevant and fails to test for something that you need to know. The
result is that you are erroneously executing code that assumes that it
is accessing data that was read successfully, when in fact this never
happened. - Kerrek SB
In your case, this usage does not cause your problem, but as Kerrek explains might happen, masks it.
You can replace that with fgets(...):
char lineBuf[1000];//make length longer or shorter for your purpose
file = fopen("/Users/Daniel/Library/Developer/Xcode/DerivedData/project_Audioscrobbler_Artists-hgwyqpinuoxayzbmvarcjxryqnrz/Build/Products/Debug/artist_data.txt", "r");
if(!file) return -1;
while(fgets (lineBuf, sizeof(lineBuf), file))
{
//process each line here
//But processing Japanese characters
//will require special considerations.
//Refer to the link below for UNICODE tips
}
Unicode in C and C++...
In particular, you will need to use variable types that are sufficient for containing the different size characters you will be processing. The link discusses this in great detail.
Here is an excerpt:
"char" no longer means character
I hereby recommend referring to character codes in C programs using a 32-bit unsigned integer type. Many platforms provide a
"wchar_t" (wide character) type, but unfortunately it is to be avoided
since some compilers allot it only 16 bits—not enough to represent
Unicode. Wherever you need to pass around an individual character,
change "char" to "unsigned int" or similar. The only remaining use for
the "char" type is to mean "byte".
Edit:
In the comments above, you state "but the string it's failing on is 66 bytes long". Because the %65[^\t\n] conversion stops after 65 bytes, the read was cut off one byte short of completing the last character. ASCII characters fit in a single char. Japanese characters in UTF-8 do not; each takes several bytes, and the field width counts bytes, not characters. A larger field width (with a buffer to match) would have let the last byte through.
OP's code failed because the result of fscanf() was not checked.
fscanf(file, "%d\t%65[^\t\n]\n", &artist_ID, artist_name);
The fscanf() read in 65 char of "エリス レジーナ/アントニオ カルロス ジョビン". Yet this string, encoded in UTF-8, has a length of 66. The last 'ン' is codes 227, 131, 179 (octal 343 203 263) and only the first 2 were read. When artist_name is printed the following appears.
エリス レジーナ/アントニオ カルロス ジョビ\343\203
Now begins the problem. The last char 179 remains in the file. On the next fscanf(), it fails as char 179 does not convert into an int ("%d"). So fscanf() returns 0. Since the code did not check the result of fscanf(), it does not realize artist_ID and artist_name are left over from before and so prints the same text.
As feof() is never true, because the char 179 is never consumed, we have an infinite loop.
The while(!feof(file)) hid this problem, but did not cause it.
The fgets() proposed by @ryyker is a good approach. Another is:
while (fscanf(file, "%d\t%65[^\t\n]\n", &artist_ID, artist_name) == 2) {
temp = create_play(artist_ID, artist_name, 0, -1);
head = add_play(head, temp);
printf("%s\n", artist_name);
}
In other words, validate the results of *scanf().
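The two suggestions can also be combined: read each line with fgets() and validate the sscanf() result. The sketch below assumes the "ID, tab, name" layout from the question and a 650-byte name buffer; parse_artist_line and count_artists are hypothetical names, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: split "ID<TAB>name" into its two fields.
 * name must have room for at least 650 bytes. Returns 1 if both
 * fields were found, 0 for a malformed line. */
int parse_artist_line(const char *line, int *id, char name[650])
{
    /* 649 = buffer size minus one for the terminating '\0'.
     * The width counts bytes, so multi-byte UTF-8 names are fine
     * as long as they fit. */
    return sscanf(line, "%d\t%649[^\t\n]", id, name) == 2;
}

/* Parse every line of fp; returns how many lines were well formed. */
int count_artists(FILE *fp)
{
    char line[1024];
    char name[650];
    int id, good = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_artist_line(line, &id, name))
            good++;     /* a bad line is skipped, not retried forever */
    }
    return good;
}
```

Since fgets() always consumes the line, a malformed record cannot leave bytes behind to trip up the next read, which is what caused the endless loop.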

Use of fgets() and gets()

#include <stdlib.h>
#include <stdio.h>
int main() {
char ch, file_name[25];
FILE *fp;
printf("Enter the name of file you wish to see\n");
gets(file_name);
fp = fopen(file_name,"r"); // is for read mode
if (fp == NULL) {
printf(stderr, "There was an Error while opening the file.\n");
return (-1);
}
printf("The contents of %s file are :\n", file_name);
while ((ch = fgetc(fp)) != EOF)
printf("%c",ch);
fclose(fp);
return 0;
}
This code seems to work but I keep getting a warning stating "warning: this program uses gets(), which is unsafe."
So I tried to use fgets() but I get an error which states "too few arguments to function call expected 3".
Is there a way around this?
First: never use gets(). It can cause buffer overflows.
Second: show us how you used fgets(). The correct way should look something like this:
fgets(file_name,sizeof(file_name),fp); // if fp has been opened
fgets(file_name,sizeof(file_name),stdin); // if you want to input the file name on the terminal
// argument 1 -> name of the array which will store the value
// argument 2 -> size of the input you want to take ( size of the input array to protect against buffer overflow )
// argument 3 -> input source
FYI:
fgets turns the whole input into a string by putting a \0 character at the end.
If there was enough space, fgets will also store the \n from your input (stdin). To get rid of the \n and still have a properly terminated string, do this:
fgets(file_name,sizeof(file_name),stdin);
file_name[strcspn(file_name, "\n")] = '\0'; // needs <string.h>; cuts the string at the first \n, if any
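A small sketch of how reading and newline removal could be wrapped together. read_line_stripped is a made-up name; strcspn() returns the index of the first '\n', or the string length when there is none, so the assignment is safe either way.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf and drop the trailing
 * newline. Returns buf, or NULL at end of file or on a read error. */
char *read_line_stripped(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first '\n', if any */
    return buf;
}
```

For the program in the question the call would be something like read_line_stripped(file_name, sizeof file_name, stdin), with the result checked for NULL before calling fopen().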
Yes: fgets expects 3 arguments: the buffer (same as with gets), the size of the buffer and the stream to read from. In your case your buffer-size can be obtained with sizeof file_name and the stream you want to read from is stdin. All in all, this is how you'll call it:
fgets(file_name, sizeof file_name, stdin);
The reason gets is unsafe is because it doesn't (cannot) know the size of the buffer that it will read into. Therefore it is prone to buffer-overflows because it will just keep on writing to the buffer even though it's full.
fgets doesn't have this problem because it makes you provide the size of the buffer.
ADDIT: your call to printf inside the if( fp == NULL ) is invalid. printf expects as its first argument the format, not the output stream. I think you want to call fprintf instead.
Finally, in order to correctly detect EOF in your while-condition you must declare ch as an int. EOF may not necessarily fit into a char, but it will fit in an int (and getc also returns an int). You can still print it with %c.
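To illustrate the point about int, here is a sketch of the read loop pulled out into a function (copy_stream is an invented name). With ch declared as char, a byte with value 0xFF could compare equal to EOF on some platforms, or EOF could never match at all; with int neither can happen.

```c
#include <stdio.h>

/* Copy everything from in to out one character at a time.
 * Returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;                 /* int, not char, so EOF stays distinguishable */
    long n = 0;
    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
        n++;
    }
    return n;
}
```

Calling copy_stream(fp, stdout) would take the place of the while loop in the original program.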
Rather than ask how to use fgets() you should either use google, or look at the Unix/Linux man page or the VisualStudio documentation for the function. There are hundreds of functions in C, C++ and lots of class objects. You need to first figure out how to answer the basics yourself, so that your real questions stand a chance of being answered.
If you are new to C, you are definitely doing the right thing of experimenting, but take a look at other code, as you go along, to learn some of the tips/tricks of how code is written.

C - fscanf to read two ints and then strings

I need to store in a text file two integers and then the lines of a text. I've successfully done it by writing each int on its own line and each line of the text on a new line too. Reading it back, however, has given me some trouble. I'm doing this:
FILE *f = fopen(arquivo, "r");
char *lna = NULL;
fscanf(f, "%d\n%d\n", &maxCol, &maxLin);
//↑This reads the two ints, works fine in step-by-step
for (;;) {
fscanf(f, "%s\n", &lna);
//↑This sets lna to NULL always, even if there are more lines
if (lna != NULL)
lna[strlen(lna) - 1] = '\0';
if (feof(f))
break;
inserirApos(lista, lna, &atual);
}
fclose(f);
I've tried a few different ways, but they never worked. I understand I can read everything as strings, with gets or something, but I think that has a problem if the string contains spaces. I wanted to know if the way I'm doing it is the best, and what's wrong with it. I've found one of these methods (that didn't work either) where you have to pass the maximum length of each line. I know this information if necessary; it's the maxCol I read before.
fscanf(f, "%s\n", &lna);
Is the wrong argument type. The %s format expects a char* as argument, but you gave it a char**. And you have not allocated memory to that pointer. fscanf expects a char* pointing to a large enough memory area.
char *lna = malloc(whatever_you_need);
...
fscanf(f, "%s ", lna);
(There is no difference between '\n' and ' ' in the fscanf format. Both consume all the whitespace following the string of non-whitespace characters scanned into lna.)
You seem to be expecting fscanf() to dynamically allocate strings for you; that's not at all how it works. This is undefined behavior.
You need to allocate the space for lna first.
char *lna = malloc(MAX_SIZE);//MAX_SIZE is the maximum size the string can be + 1
The additional arguments should point to already allocated objects of the type specified by their corresponding format specifier within the format string.
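Since the text lines may contain spaces, one possible approach is to read everything with fgets() and size the buffer from maxCol. The sketch below assumes maxCol is the length of the longest line without its newline; read_header and read_text_line are illustrative names, and the caller frees each returned line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the two-integer header, one number per line. Returns 1 on success. */
int read_header(FILE *f, int *maxCol, int *maxLin)
{
    char line[64];
    if (fgets(line, sizeof line, f) == NULL || sscanf(line, "%d", maxCol) != 1)
        return 0;
    if (fgets(line, sizeof line, f) == NULL || sscanf(line, "%d", maxLin) != 1)
        return 0;
    return 1;
}

/* Read the next text line (spaces included) into a freshly allocated
 * buffer. Returns NULL at end of file or if allocation fails. */
char *read_text_line(FILE *f, int maxCol)
{
    if (maxCol < 0)
        return NULL;
    size_t cap = (size_t)maxCol + 2;      /* text + '\n' + '\0' */
    char *lna = malloc(cap);
    if (lna == NULL)
        return NULL;
    if (fgets(lna, (int)cap, f) == NULL) {
        free(lna);
        return NULL;
    }
    lna[strcspn(lna, "\n")] = '\0';       /* drop the newline fgets keeps */
    return lna;
}
```

The loop in the question then becomes: call read_text_line() until it returns NULL, passing each result to inserirApos(). No feof() test is needed.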

string input and output in C

I have this snippet of the code:
char* receiveInput(){
char *s;
scanf("%s",s);
return s;
}
int main()
{
char *str = receiveInput();
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
can somebody explain why, and why this style of the coding is bad, thanks in advance
Several answers have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of the coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatible with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides that... with "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those is safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will read at most 10 characters and add a terminating nul character automatically, so the array needs room for 11. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", (int) sizeof string, string); // prints whole message; the precision argument must be an int
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%us", bufsize - 1); // the width excludes the terminating nul, hence - 1
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
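For the curious, a compilable sketch of that workaround might look like the following. scan_word is a hypothetical helper, and the 32-byte format buffer is an assumption that comfortably covers a 64-bit size_t printed in decimal.

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word of at most
 * bufsize - 1 bytes into buffer. Returns 1 on success, 0 otherwise. */
int scan_word(FILE *in, char *buffer, size_t bufsize)
{
    char fmt[32];           /* room for '%', up to 20 digits, 's', '\0' */
    if (bufsize < 2)
        return 0;           /* no room for even one character */
    /* Builds e.g. "%5s" for bufsize 6: the width leaves one byte for '\0'. */
    snprintf(fmt, sizeof fmt, "%%%zus", bufsize - 1);
    return fscanf(in, fmt, buffer) == 1;
}
```

Some compilers warn about a non-literal format string here, which is a fair summary of how awkward the technique is.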
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
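// needs <stdio.h>, <stdlib.h> and <string.h>; the caller frees the result
char *readline(FILE *file)
{
    size_t size = 80; // start off small
    size_t curr = 0;
    char *buffer = malloc(size);
    if(buffer == NULL) return NULL; // out of memory
    buffer[0] = '\0';
    while(fgets(buffer + curr, size - curr, file))
    {
        if(strchr(buffer + curr, '\n')) return buffer; // success
        curr = size - 1; // the next read overwrites the old terminator
        size *= 2;
        char *tmp = realloc(buffer, size);
        if(tmp == NULL) { free(buffer); return NULL; } // out of memory
        buffer = tmp;
    }
    if(buffer[0] != '\0' && !ferror(file)) return buffer; // last line had no '\n'
    free(buffer);
    return NULL; // end of file or read error
}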
The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in and just store the number, do the math as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUFSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
That will read the "2" but then fgets will read an empty line because scanf didn't read the newline! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL, 10) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
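As one possible shape for the fgets-plus-conversion approach, the sketch below reads an integer from its own line with fgets() and strtol(), leaving the following line untouched (leading whitespace included). read_int_line is an invented name and the 64-byte line buffer is an arbitrary assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from in and parse it as a decimal
 * int. Returns 1 on success, 0 at end of file or for a malformed line.
 * *out is only written on success. */
int read_int_line(FILE *in, int *out)
{
    char line[64];
    char *end;
    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    errno = 0;
    long v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* no digits, or out of range for int */
    if (*end != '\n' && *end != '\0')
        return 0;               /* junk after the number */
    *out = (int)v;
    return 1;
}
```

A following fgets(buf, BUFSIZE, stdin) then sees the next line exactly as typed, with no stray newline to work around and no getchar() call.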
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput(){
char *s = (char*) malloc( 100 );
scanf("%s",s);
return s;
}
But warning:
the function that calls receiveInput takes ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving ownership away like this is usually not considered good practice.)
An easy fix is getting the allocated memory as a parameter.
if the input string is longer than 99 characters (in my case), your program will suffer a buffer overflow (which is what is already happening).
An easy fix is to pass to scanf the length of your buffer:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
scanf("%99s",s);
return s;
}
int main()
{
char str[100];
receiveInput( str );
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
You have to first allocate memory to your s object in your receiveInput() method. Such as:
s = (char *)calloc(50, sizeof(char));

Resources