C - Inputting string into an int variable? - c

In the following code,
#include <stdio.h>
int main()
{
int i = 5;
scanf("%s", &i);
printf("%d\n", i);
return 0;
}
I take the input string that is stored at the address of i. When I try to print the variable i, I get some number.
Input example:
hello
Output:
1819043176
What number is this and what exactly is happening?

This program writes the string that it reads from the user into the memory occupied by the variable i and past it. As this is undefined behavior, anything could happen.
What is actually happening is that on your machine int is the size of 4 chars, and the characters "hell", when converted into ASCII and interpreted as a number in the CPU's byte order, turn out to be the number 1819043176. The rest of the string, the letter o and the terminating nul character, are past the end of where i is stored on your machine. So what scanf does is this:
h e l l o \0
|68 65 6c 6c|6f 00 ...
| i|memory past i
You seem to be running this on a little-endian machine, so when the bytes 68 65 6c 6c are stored into an int they are interpreted as the number 0x6c6c6568, or 1819043176 in decimal.
If int were a different size, or if the machine used another character set (like EBCDIC instead of ASCII), or if the CPU used big-endian byte order, or if the program ran in an environment where memory writes are bounds-checked, you would get different results or a program crash. In short, undefined behavior.

Related

Is it possible to read character array elements into a struct using ```strtoul``` in C?

I'm working on a project for a class and I could use some guidance. I need to parse a character array into constituent parts - the specifications of which I am given - but I am unsure how to do so in C.
I have been given a file and each page of the file is read into a buffer as a character array like so:
typedef struct page_t {
char reserved[PAGESIZE];
} page_t;
I have been given the following specifications about the pages read:
For each page it starts with a 2 byte gap offset, followed by key-value records, a gap at the indicated offset, and lastly an 8 byte address at the end pointing to the next page
The key-value records are of the following form: 8 byte unsigned integer key followed by a value where the first 4 bytes are an unsigned integer indicating the length of the string part of the value and a string of variable length (it will be the length indicated in the 4 bytes previously mentioned so the total length of the value portion will be length+4)
There can be multiple key-value records in the file but the sum of all key-value records will not exceed 4086 bytes and the gap is always at the end of the file prior to the address of the next page
Since I have not been given any more explanation about the format of the page read in and I need to parse through the char array, I was wondering if I could do something like use the strtoul function to read the 8 bytes of the array at a time to find the correct key (and to skip over the key's values if they are not the key I am trying to match). I asked my TA about it and the answer I got was:
You can use functions that convert character (byte) arrays to numbers. Consider making a toy example program that converts a structure to a character array and back to see if scan/atoi/strtoll... have the expected behavior. If the functions do not work you can also consider reading iteratively. You may find them useful to extract the key/value size. The value as a string should work!
So I tried making a short program that converted a struct to an array and back and tried using strtoul on the string but I'm not sure that I'm doing it correctly.
So my tester program looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
typedef struct record_test {
uint64_t key;
uint32_t val_size;
char value[255];
} record_test;
int main( int argc, char ** argv ) {
record_test record = {1234, 13, "asdfghjklqwer"};
char page[4096];
// print what is in record
printf("Here's the record itself:\n");
printf("key: %llu\n", record.key);
printf("val_size: %u\n", record.val_size);
printf("record: %s\n", record.value);
memcpy(page, &record, sizeof(record_test));
// print what is in page
printf("Here's what's in the page:\n");
printf("page: %s\n", page);
// check page contents with pointer
record_test* revert;
revert = (record_test*)page;
printf("Here's the reverted record using pointers:\n");
printf("key: %llu\n", revert->key);
printf("val_size: %u\n", revert->val_size);
printf("record: %s\n", revert->value);
// reading what is in page using strtoul
char* endKey;
char* value;
printf("reading using strtoul:\n");
printf("key: %lu\n", strtoul(page, &endKey, 8));
printf("val size: %d\n", (int)strtoul(endKey, &value, 4));
printf("value: %s\n", value);
}
And these are the results I'm getting from it when I use printf to follow it:
Here's the record itself:
key: 1234
val_size: 13
record: asdfghjklqwer
Here's what's in the page:
page: ?
Here's the reverted record using pointers:
key: 1234
val_size: 13
record: asdfghjklqwer
reading using strtoul:
key: 0
val size: 0
value: ?
So based on the pointer that I used to recast the struct, the character array does have the right information in it, but for whatever reason the character array itself shows ? when I try to print it, and similarly the printf statements showing what strtoul reads show 0 for the integers. I'm not sure what's going on here. Why am I getting ? when that character isn't even in the value string? Can someone tell me where I am going wrong or if I can even use this function at all? Should I be trying to iterate through the character array using bitwise operations to read it instead?
Any help would be great! Thank you!
I'm going to try to help you understand what's happening here. When you do memcpy to "flatten" your structure, let's analyze what should be going into memory.
We start out with 1234. Convert that to hexadecimal and that becomes 04D2. Now a uint64_t is probably an 8 byte long structure on your machine (you can verify this by doing a sizeof(uint64_t)), so in memory you can expect the first 8 bytes to be 00 00 00 00 00 00 04 D2.
Next up, you have 13, which in hexadecimal is 0D and it's in a uint32_t. This is typically half of what a uint64_t is, probably 4 bytes long on your machine (again, you can verify with sizeof). This means the next 4 bytes would be 00 00 00 0D.
Finally, you have an array of 255 char. Each char is 1 byte long. Each letter in your text asdfghjklqwer gets converted to an ASCII code representing that letter, so the hexadecimal would be 61 73 64 66 67 ... and, because the array is only partially initialized, the rest of those 255 bytes are set to zero. Only padding bytes that the compiler may add after the array can contain arbitrary data.
Now one final thing to keep in mind is the endianness of your computer. If your computer has an Intel processor or AMD processor, then your computer is using little-endian. If you're unfamiliar with what endianness is, then look at this Wikipedia article for an explanation. But, simply put, endianness refers to the order that bytes are written to memory. Little endian (which is probably what you have) means that the least significant byte of a multi-byte value is written first, at the lowest address.
So what does this mean? Up above I said the first 8 bytes in memory would be 00 00 00 00 00 00 04 D2. For little endian machines, this isn't really true. The bytes are actually written right to left. What's actually in memory would be D2 04 00 00 00 00 00 00. Hopefully this makes sense.
So now, with some little modifications to your program, you can actually print out what's in your computer's memory and you can see more clearly what I am talking about.
First, in your program, change char page[4096]; to unsigned char page[4096];. The reason is that all this is easier to understand with unsigned characters. If you really want to know how signed and unsigned numbers work in a computer system, Google two's complement to learn more. For now, just change it to unsigned. Then add this to your program:
record_test record = { 1234, 13, "asdfghjklqwer" };
unsigned char page[4096];
// print what is in record
printf("Here's the record itself:\n");
printf("key: %llu\n", record.key);
printf("val_size: %u\n", record.val_size);
printf("record: %s\n", record.value);
memcpy(page, &record, sizeof(record_test));
// print what is in page
for (size_t i = 0; i < sizeof(record_test); i++)
{
printf("page[%zu] = %02X\n", i, page[i]);
}
When you run this program, it will execute the memcpy like before, but then I have it printing out the data stored at the page address. Try modifying your record and see if you can understand what my explanation is all about!
Hopefully this all made sense! Good luck!!!

Why does this variable equal a number and a character?

While debugging this code snippet here:
int main () {
char str[] = "Stackoverflow";
char a = *str;
return 0;
}
Why does a show as 83 'S'?
I think you might want to have more than one thing clarified:
First, pointer str points to the first character of a sequence of character values in memory, i.e. S, t, a, ...
Then, *str dereferences this pointer, i.e. it reads the value of the character to which str points. Hence *str yields S.
Statement char a = *str assigns the value S to variable a of type char, which represents a portion of memory capable of storing one character. Usually, char is an 8-bit signed (or unsigned) integral type, and with a signed 8-bit char any simple character is therefore represented by a value between -128 and +127 (the range of signed 8-bit values). The character value S, for example, is represented as integral value 83 in ASCII. Whether a system uses ASCII or some other character set is implementation-defined, but ASCII is by far the most common character set today.
So S and 83 are actually the same thing, it's just that when a terminal interprets value 83 to be printed as character, it prints S. The other way round, if we interpret S as integral value, a terminal would print 83:
#include <stdio.h>
int main() {
printf("'S' as integral value: %d\n", 'S');
printf("83 as character value: %c\n", 83);
char c1 = 'S';
char c2 = 83;
if (c1 == c2) {
printf("c1 and c2 are equal.\n");
} else {
printf("c1 and c2 are not equal.\n");
}
}
Output:
'S' as integral value: 83
83 as character value: S
c1 and c2 are equal.
83 is the ASCII code for the uppercase letter 'S'.
*str is equivalent to writing str[0], so in this case it is the first element of the array str, which holds the character 'S'.
Computers understand everything as numbers: Characters, strings, photos, videos, audio ... etc. Everything is a number inside a computer and thus people wondered how to represent characters.
And because of this fact, they decided to encode characters as numbers so that every character has a corresponding number that encodes it inside the computer.
Throughout history, many character encoding schemes (mappings between characters and numbers) have been worked out, but one of them is very famous and used almost everywhere: the ASCII character encoding. ASCII is a 7-bit encoding that represents the decimal digits and the Latin alphabet (uppercase and lowercase), besides some other symbols.
By default, your system provides ASCII input to your C program and thus, internally, this input is stored in memory as the ASCII standard says. For instance, when you type A on your keyboard, the keyboard sends the value 65 (This is the decimal value of the character A in the ASCII standard. Internally, it is sent as the bit sequence 1000001 because computers work in binary) to your program. Your program stores this value (65) inside a memory location specified by a variable (char c;). When you ask the computer to print this character, it checks the ASCII value stored in the character's variable and then figures out how to draw the matching symbol on the screen.
In C, strings are just a sequence (Or an array) of characters. When you hold a pointer to a string, it actually points to the first character of the string (The character array). If you advance the pointer by 1, you will point to the second character and so on. So, if you dereference your original pointer (That points to the first character), you will get the ASCII value of the character stored in that position (The first position) and thus in your case you get 83 which corresponds to the symbol 'S'.
The program below shows all ASCII characters and their graphical representation. A few characters might not have a visual representation because they are used for controlling input and the terminal, especially the first 32 values (0 through 31).
#include <stdio.h>
int main ()
{
/* Unsigned to avoid integer overflow in the loop below */
unsigned char c;
/* ASCII is 7-bit so it can represent
2^7 = 128 (from 0 to 127) symbols */
for (c = 0; c < 128; c++)
printf ("ASCII value of %c = %d\n", c, c);
return 0;
}

How can C read chinese from console and file

I'm using ubuntu 12.04
I want to know how can I read Chinese using C
setlocale(LC_ALL, "zh_CN.UTF-8");
scanf("%s", st1);
for (b = 0; b < max_w;b++)
{
printf("%d ", st1[b]);
if (st1[b] == 0)
break;
}
For this code, when I input English, it outputs fine, but if I enter Chinese like "的", it outputs
Enter word or sentence (EXIT to break): 的
target char seq :
-25 -102 -124 0
I'm wondering why there are negative values in the array.
Further, I found that the bytes of a "的" in file read using fscanf is different from reading from the console.
UTF-8 encodes characters with a variable number of bytes. This is why you see three bytes for the 的 sign.
At graphemica - 的, you can see that 的 has the value U+7684 which translates to E7 9A 84 when you encode it in UTF-8.
You print every byte separately as an integer value. A char type might be signed and when it is converted to an integer, you can get negative numbers too. In your case this is
-25 = E7
-102 = 9A
-124 = 84
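You can print the bytes as hex values with %x or as unsigned integers with %u, but each byte has to be converted to unsigned char first. Otherwise a negative char is promoted to a negative int and prints as a large value (ffffffe7 with a 32-bit int). After the conversion you will see positive numbers only.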
You can also change your print statement to
printf("%d ", (unsigned char) st1[b]);
which will interpret the bytes as unsigned values and show your output as
231 154 132 0
There's no need (and in fact it's harmful) to hard-code a specific locale name. What characters you can read are independent of the locale's language (used for messages), and any locale with UTF-8 encoding should work fine.
The easiest (but ugly once you try to go too far with it) way to make this work is to use the wide character stdio functions (e.g. getwc) instead of the byte-oriented ones. Otherwise you can read bytes then process them with mbrtowc.

How does C retrieve the specific ASCII value?

In the below program
#include<stdio.h>
int main()
{
int k=65;
printf(" The ASCII value is : %c",k);
return 0;
}
The output is "The ASCII Value is : A" .
I just don't understand how %c brought up the corresponding ASCII character for that number.
I mean, how can an integer value be passed to %c (instead of %d) and still produce the ASCII character?
How does this process work? Please explain.
It's not the "%c" that is doing it. When you run your program, all it does is output a sequence of bytes (numbers) to the standard output. If you use "%c" it will output a single byte of value 65 and if you use "%d" it will output two bytes, one with value 54 for the 6 and one with value 53 for the 5. Then, your terminal displays those bytes as character glyphs, according to what encoding it is using. If your terminal is using an ASCII-compatible encoding then 65 will be the code for "A".
The short version: your operating system has a table that links a graphical symbol to each integer, and for 65 it has linked the graphics for "A". The ASCII standard says that 65 should be linked to a graphic that humans can read as "A".
I think the following answer helps you. Credits to Sujeet Gholap.
the %c works as following:
1. Take the least significant byte of the variable
2. Interpret that byte as an ascii character
that's it
so, when you look at the four bytes of 65, (in hex), they look like
00 00 00 41
the %c looks at the 41
and prints it as A
that's it
#include<stdio.h>
int main()
{
int k=65 + 256;
printf(" The ASCII value is : %c",k);
return 0;
}
consider that code
where k is
00 00 01 41
here, even now, the last byte is 41
so, it still prints A

Why does printf("%s",charstr) increasingly prints more than expected with each fread()?

In an attempt to learn file structures, I am trying to read in a .wav file and simply print information about it. I have a struct that holds all the information defined as so:
typedef struct{
char chunkId[4];
unsigned int chunkSize;
char format[4];
char subchunk1Id[4];
unsigned int subchunk1Size;
unsigned short audioFormat;
unsigned short numChannels;
unsigned int sampleRate;
unsigned int byteRate;
unsigned short blockAlign;
unsigned short bitsPerSample;
char subchunk2Id[4];
unsigned int subchunk2Size;
void *data;
} WavFile;
What's happening is that for each time I fread through the file, It causes my c-strings to print longer and longer. Here's a sample code snippet:
fseek(file, 0, SEEK_SET);
fread(wavFile.chunkId, 1, sizeof(wavFile.chunkId), file);
fread(&wavFile.chunkSize, 1, sizeof(wavFile.chunkSize), file);
fread(wavFile.format, 1,sizeof(wavFile.format), file);
fread(wavFile.subchunk1Id, 1, sizeof(wavFile.subchunk1Id), file);
fread(&wavFile.subchunk1Size, 1, sizeof(wavFile.subchunk1Size), file);
fread(&wavFile.audioFormat, 1, sizeof(wavFile.audioFormat), file);
printf("%s\n",wavFile.chunkId);
printf("%d\n",wavFile.chunkSize);
printf("%s\n",wavFile.format);
printf("%s\n",wavFile.subchunk1Id);
printf("%d\n",wavFile.subchunk1Size);
printf("%d\n",wavFile.audioFormat);
Something in the way I have my struct setup, the way I'm reading the file, or the way that printf() is seeing the string is causing the output to print as shown:
RIFF�WAVEfmt
79174602
WAVEfmt
fmt
16
1
The expected output:
RIFF
79174602
WAVE
fmt
16
1
I do understand that c-strings need to be null terminated, but then I got to thinking how is printing a string from a binary file any different from printing a string literal like printf("test");? The file specification requires the members to have the exact sizes defined in my struct. Doing char chunkId[5]; and then chunkId[4]='\0'; doesn't seem to be a good solution to this problem.
I've been trying to resolve this for a couple days now, so now I'm coming to SO to maybe get a push in the right direction.
For full disclosure, here's the hex output of the relevant portion of the file because this webform doesn't show all garbled mess that is showing up on my output.
52 49 46 46 CA 1B B8 04 57 41 56 45 66 6D 74 20 10 00 00 00 01 00 02 00 44 AC 00 00 98 09 04 00 06 00 18 00 64 61 74 61
If you know the size, you can limit the output of printf:
// Prints at most 4 bytes from format; no null terminator needed.
printf("%.4s\n", wavFile.format);
If the size is stored in a different field, you can use that too:
// The * says: print number of chars, as dictated by "theSize"
printf("%.*s\n", wavFile.theSize, wavFile.format);
The way you have called printf(), it expects a '\0' terminated string, but your struct elements aren't (fread() doesn't add '\0' and format, chunkId etc. don't have enough length to contain it).
The simplest way is:
printf( "%.*s\n", (int)sizeof(wavFile.format), wavFile.format );
If it is not a null terminated string you can use .* and an extra int argument which specifies the size of the string to printf, for example:
printf("%.*s\n", (int)sizeof(wavFile.chunkId), wavFile.chunkId);
or alternatively:
printf("%.4s\n", wavFile.chunkId);
which in your case may be simpler since the size seems to be fixed in your case.
From the printf document above the precision specifier in the format string works as follows:
(optional) . followed by integer number or * that specifies precision of the conversion. In the case when * is used, the precision is specified by an additional argument of type int. If the value of this argument is negative, it is ignored. See the table below for exact effects of precision.
and the table below which this text references says the following for character string:
Precision specifies the maximum number of bytes to be written.
First, be sure you're reading the file in binary mode (use fopen with the mode set to "rb"). This makes no difference on Unix-like systems, but on others reading a binary file in text mode may give you corrupted data. And you should be checking the value returned by each fread() call; don't just assume that everything works.
printf with a %s format requires a pointer to a string. A string always has a null character '\0' to mark the end of it.
If you have a chunk of data read from a file, it's unlikely to have a terminating null character.
As the other answers say, there are variations of the %s format that can limit the number of characters printed, but even so, printf won't print anything past the first null character that happens to appear in the array. (A null character, which is simply a byte with the value 0, may be valid data, and there may be more valid data after it.)
To print arbitrary character data of known length, use fwrite:
fwrite(wavFile.chunkId, sizeof wavFile.chunkId, 1, stdout);
putchar('\n');
In this particular case, it looks like you're expecting chunkId to contain printable characters; in your example, it has "RIFF" (but without the trailing null character). But you could be reading an invalid file.
And printing binary data to standard output can be problematic. If it happens to consist of printable characters, that's fine, and you can assume that everything is printable in an initial version. But you might consider checking whether the characters in the array actually are printable (see isprint()), and print their values in hexadecimal if they're not.
