Difference between binary zeros and ASCII character zero - c

gcc (GCC) 4.8.1
c89
Hello,
I was reading a book about pointers, and it used this code as a sample:
memset(buffer, 0, sizeof buffer);
The book says this will fill the buffer with binary zeros, not the character zero.
I am just wondering what the difference is between binary zero and the character zero; I thought they were the same thing.
I know that textual data is human-readable characters and binary data is non-printable characters. Correct me if I am wrong.
What would be a good example of binary data?
As an added example: if you want to write data to a file, you would use fprintf for strings (textual data) and fwrite for binary data.
Many thanks for any suggestions,

The quick answer is that the character '0' is represented in binary data by the ASCII number 48. That means, when you want the character '0', the file actually has these bits in it: 00110000. Similarly, the printable character '1' has a decimal value of 49, and is represented by the byte 00110001. ('A' is 65, and is represented as 01000001, while 'a' is 97, and is represented as 01100001.)
If you want the null terminator at the end of the string, '\0', that actually has a 0 decimal value, and so would be a byte of all zeroes: 00000000. This is truly a 0 value. To the compiler, there is no difference between
memset(buffer, 0, sizeof buffer);
and
memset(buffer, '\0', sizeof buffer);
The only difference is a semantic one to us. '\0' tells us that we're dealing with a character, while 0 simply tells us we're dealing with a number.
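To make this concrete, here is a minimal sketch (the buffer name and size are invented for the example) that fills a buffer three different ways and prints the numeric value of its first byte:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[8];

    memset(buffer, 0, sizeof buffer);    /* every byte is 0 (binary zero) */
    printf("%d\n", buffer[0]);           /* prints 0 */

    memset(buffer, '\0', sizeof buffer); /* identical: '\0' has the value 0 */
    printf("%d\n", buffer[0]);           /* prints 0 */

    memset(buffer, '0', sizeof buffer);  /* every byte is 48 (the character '0' in ASCII) */
    printf("%d\n", buffer[0]);           /* prints 48 on an ASCII system */

    return 0;
}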
It would help you tremendously to check out an ASCII table.
fprintf outputs data as text (using ASCII here) and works with strings; fwrite writes pure binary data. If you fprintf(fp, "0"), it will put the value 48 in the file, while an fwrite of a zero byte will put the actual value 0 in the file. (Note: my usage of fprintf and fwrite was obviously not proper, but it shows the point.)
Note: My answer refers to ASCII because it's one of the oldest, best-known character sets, but as Eric Postpichil mentions in the comments, the C standard isn't bound to ASCII. (In fact, while it occasionally gives examples using ASCII, the standard seems to go out of its way never to assume that ASCII will be the character set used.) fprintf outputs using the execution character set of your compiled program.
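As a rough illustration of the fprintf/fwrite point (the file name out.bin is made up for the example), the following writes the character '0' as text and then a single binary zero byte; a hex dump of the resulting file would show the bytes 30 00 on an ASCII system:
#include <stdio.h>

int main(void)
{
    FILE *fp;
    unsigned char zero = 0;

    fp = fopen("out.bin", "wb");   /* hypothetical output file */
    if (fp == NULL)
        return 1;

    fprintf(fp, "0");              /* writes one byte with value 48 ('0' in ASCII) */
    fwrite(&zero, 1, 1, fp);       /* writes one byte with value 0 */

    fclose(fp);
    return 0;
}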

If you are asking about the difference between '0' and 0, these two are completely different:
Binary zero corresponds to the non-printable character '\0' (also called the null character), with code zero. This character serves as the null terminator in C strings:
5.2.1.2 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.
ASCII character zero '0' is printable (not surprisingly, producing a character zero when printed) and has a decimal code of 48.

Binary zero: 0
Character zero: '0', which in ASCII is 48.

binary data: the raw data that the CPU gets to play with, bit after bit, the stream of 0s and 1s (usually organized in groups of 8, aka bytes, or multiples of 8)
character data: bytes interpreted as characters. Conventions like ASCII give the rules for how a specific bit sequence should be displayed by a terminal, a printer, ...
for example, the binary data (bit sequence) 00110000 should be displayed as the character 0
if I remember correctly, the unsigned integer data types have a direct match between the binary value of the stored bits and the interpreted value (ignoring strangeness like endianness ^^).
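A small sketch of that interpretation idea (the byte values assume ASCII): the very same bytes can be displayed as text or as raw numbers:
#include <stdio.h>

int main(void)
{
    unsigned char bytes[] = { 0x48, 0x69, 0x00 };  /* in ASCII: 'H', 'i', terminator */
    int i;

    printf("%s\n", (char *)bytes);     /* interpreted as text: prints Hi */

    for (i = 0; bytes[i] != 0; i++)
        printf("%d ", bytes[i]);       /* interpreted as numbers: prints 72 105 */
    printf("\n");

    return 0;
}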
On a higher level, for example when talking about FTP transfer, the distinction is made between:
the data should be interpreted as (multi-)byte characters, aka text (this includes non-printing characters like a line break)
the data is a big bit/byte stream that can't be broken down into smaller human-readable pieces, for example an image or a compiled executable

Every character has a code, and the ASCII code of the character zero ('0') is 0x30 in hex (48 in decimal).
To fill the buffer with the character zero you would write:
memset(buffer, '0', sizeof buffer);   /* or, equivalently on an ASCII system, memset(buffer, 0x30, sizeof buffer); */

Related

How does atof.c work? Subtracting an ASCII zero from an ASCII digit makes it an int? Am I missing something?

So as part of my C classes, for our first homework we are supposed to implement our own atof.c function and then use it for some tasks. So, being the smart stay-at-home student I am, I decided to look at the atof.c source code and adapt it to meet my needs. I think I'm on board with most of the operations this function does, like counting the digits before and after the decimal point, but there is one line of code that I do not understand. I'm assuming this is the line that actually converts the ASCII digit into a digit of type int. Posting it here:
frac1 = 10*frac1 + (c - '0');
In the source code, c is the digit being processed, and frac1 is an int that stores some of the digits from the incoming ASCII string. But why does c - '0' work? And as a follow-up, is there another way of achieving the same result?
There is no such thing as "text" in C. Just APIs that happen to treat integer values as text information. char is an integer type, and you can do math with it. Character literals are actually ints in C (in C++ they're char, but they're still usable as numeric values even there).
'0' is a nice way for humans to write "the ordinal value of the character for zero"; in ASCII, that's the number 48. Since the digits appear in order from 0 to 9 in all encodings I'm aware of, you can convert from the ordinal value in the encoding (e.g. ASCII) to actual numeric values by subtracting away '0' to get actual int values from 0 to 9.
You could just as easily subtract 48 directly (when compiled, it would be impossible to tell which option you used; 48 and ASCII '0' are indistinguishable), it would just be less obvious what you were doing to other people reading your source code.
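For instance, a minimal sketch (the string and variable names are invented for the example) of converting a digit string to an int with the c - '0' trick:
#include <stdio.h>

int main(void)
{
    const char *s = "345";   /* example digit string */
    int value = 0;
    int i;

    for (i = 0; s[i] >= '0' && s[i] <= '9'; i++)
        value = 10 * value + (s[i] - '0');   /* same idea as the frac1 line above */

    printf("%d\n", value);   /* prints 345 */
    return 0;
}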
The character '0' has the value 48 in ASCII (and in code page 437, the IBM PC's default character set). Similarly, '1' is 49, and so on. Subtracting '0' instead of a magic number such as 48 is much clearer as far as self-documentation goes.

itoa providing 7-bit output to character input

I am trying to convert a character to its binary representation using the built-in library function itoa in C (gcc 5.1), following the example from Conversion of Char to Binary in C, but I'm getting a 7-bit output for a character input to the itoa function. Since a character in C is essentially an 8-bit integer, I expected an 8-bit output. Can anyone explain why this is so?
code performing binary conversion:
for (temp = pt; *temp; temp++)
{
    itoa(*temp, opt, 2);   //convert to binary
    printf("%s \n", opt);
    strcat(s1, opt);       //add to encrypted text
}
PS: This is my first question on stackoverflow.com, so sorry in advance for any mistakes.
You could use printf( "%02X\n", *opt ); to print an 8-bit value as two hexadecimal digits.
It would print the first char of opt. Then, you must increment the pointer to the next char with opt++;.
The X means you want it printed as uppercase hexadecimal characters (use x for lowercase), and the 02 makes sure two digits are printed, padded with a leading zero if the value is less than 0x10.
In other words, the value 0xF will be printed as 0F (you could add the # flag, e.g. printf( "%#04X\n", *opt );, to also print the 0X prefix).
If you absolutely want a binary representation, you have to write a function that prints the right 0s and 1s. There are many of them on the internet. If you want to write your own, reading about bitwise operations will help (you have to know about bitwise operations to work with binary data anyway).
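As one possible sketch of such a function (the name print_byte_binary is invented here), using a shift and a mask to walk the 8 bits from most significant to least significant:
#include <stdio.h>

/* Print the 8 bits of an unsigned char, most significant bit first. */
void print_byte_binary(unsigned char byte)
{
    int bit;

    for (bit = 7; bit >= 0; bit--)
        putchar(((byte >> bit) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    print_byte_binary('A');   /* prints 01000001 on an ASCII system */
    return 0;
}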
Now that you can print it as desired (hex as above or with your own binary function), you can redirect the output of printf with the sprintf function.
Its prototype is int sprintf( char* str, const char* format, ... ). str is the destination.
In your case, you will just need to replace the strcat line with something like sprintf( s1, "%02X\n", *opt); or sprintf( s1, "%s\n", your_binary_conversion_function(opt) );.
Note that using the former, you would have to increment s1 by 2 for each char in opt because one 8-bit value is 2 hexadecimal symbols.
You might also have to manage s1's memory by yourself, if it was not the case before.
Sources :
MK27's second answer
sprintf prototype
The function itoa takes an int argument for the value to be converted. If you pass it a value of char type it will be promoted to int. So how would the function know how many leading zeros you were expecting? And then if you had asked for radix 10 how many leading zeros would you expect? Actually, it suppresses leading zeros.
See an ASCII table; note the hex column, and that for the ASCII characters the MSB is 0. Printable ASCII characters range from 0x20 through 0x7E (0x7F is DEL). Unicode shares the characters 0x00 through 0x7F.
Hex 20 through 7F are binary 00100000 through 01111111.
Not all binary values are printable characters, and in some encodings some are not even legal values.
ASCII, hexadecimal, octal and binary are just ways of representing binary values. Printable characters are another way, but not all binary values can be displayed; this is the main reason data that needs to be displayed or treated as character text is generally converted to hex-ASCII or Base64.
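As a rough sketch of that last point (the function name to_hex_ascii and the buffers are assumptions for the example), converting arbitrary bytes to hex-ASCII so they can be handled as text might look like this:
#include <stdio.h>

/* Write each byte of 'data' as two hexadecimal characters into 'out'.
   'out' must have room for 2*len + 1 characters. */
void to_hex_ascii(const unsigned char *data, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++)
        sprintf(out + 2 * i, "%02X", data[i]);
    out[2 * len] = '\0';
}

int main(void)
{
    unsigned char raw[] = { 0x00, 0x7F, 0xFF };   /* not all of these are printable */
    char hex[sizeof raw * 2 + 1];

    to_hex_ascii(raw, sizeof raw, hex);
    printf("%s\n", hex);   /* prints 007FFF */
    return 0;
}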

K&R 1-7 is it solvable by using putchar() instead of printf?

There are many questions about this exercise all over the internet, but I couldn't find any solution (nor any hint) on how to solve this exercise using 'putchar'.
Write a program to print the value of EOF.
I can easily get a working answer to this:
printf("%d", EOF);
I'd like to know if there are any known (or logical) answers using 'putchar' (as I guess that was the whole purpose of the exercise, being located at the end of a paragraph on 'getchar' and 'putchar')
Writing:
putchar(EOF);
or
int c;
c = EOF;
putchar(c);
the program just launches and closes itself without showing any text.
putchar converts its argument to unsigned char before it is output (and it is a character that is written, not the result of a decimal conversion). As EOF is negative, this conversion isn't value-preserving. Commonly, EOF has the value -1; with 8-bit char, that conversion results in 255, so putchar(EOF) is equivalent to putchar('\xff'), which isn't a printable character (assuming an ASCII system). Piping the output to hexdump -C would make the output visible.
Without using printf and friends, a function outputting a decimal number for an int can be written (a sketch is given at the end of this answer). Basically,
print a '-' if the value is negative
for every decimal digit, starting from the highest-valued:
convert the digit's value to a character (e.g. 7 should become '7')
try to print the result (with putchar, for example)
if that fails, return an error indicator
Hint: The division and modulus operators, / and %, will be useful.
The digits '0' ... '9' are ascending (and contiguous); so for the conversion from the digit's value to a character, adding '0' yields the desired result (3 + '0' is '3', for example).
Care must be taken to avoid integer overflow, even if corner cases like INT_MIN are passed to the function. -INT_MIN may result in an overflow (and in fact does on pretty much every system); a positive number, on the other hand, can always be negated. A value of 0 may need special handling.
(Alternatively, the number can be converted to a string first which then can be outputted. A char array of size 1 + sizeof(int)*CHAR_BIT/3+1 + 1 is big enough to hold that string, including minus sign and 0-terminator.)
If you get stuck, the (non-standard) itoa function does something similar, looking for example implementations may give some ideas.
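Here is a minimal sketch along the lines of the hints above (the names print_int and print_digits are invented; checking putchar's return value for errors is omitted, and the digit arithmetic assumes the truncation-toward-zero division that C99 guarantees):
#include <stdio.h>
#include <limits.h>

/* Print the digits of a non-positive value n (n <= 0), most significant first. */
static void print_digits(int n)
{
    if (n <= -10)
        print_digits(n / 10);
    putchar('0' - (n % 10));   /* n % 10 is in 0..-9, so '0' - (n % 10) is '0'..'9' */
}

/* Print n in decimal using only putchar; the arithmetic stays in the
   negative range so that INT_MIN does not overflow. */
void print_int(int n)
{
    if (n < 0)
        putchar('-');
    else
        n = -n;                /* a positive number can always be negated */
    print_digits(n);
}

int main(void)
{
    print_int(EOF);            /* typically prints -1 */
    putchar('\n');
    print_int(INT_MIN);        /* the most negative int is printed correctly too */
    putchar('\n');
    return 0;
}
With something like this, the exercise can indeed be answered with putchar alone.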
EOF is not a char that can be printed the way you expected, so putchar(EOF) doesn't print anything useful; putchar prints only characters.
Hence use printf("%d", EOF), which outputs the integer -1.
putchar takes an int argument. So if you pass an ASCII value to this API you get the corresponding character printed on the screen, but EOF evaluates to -1, which is not a printable ASCII character, so you don't see anything or might see some junk.
As per the man page of putchar(),
int putchar(int c);
putchar(c); is equivalent to putc(c,stdout).
and
putc() is equivalent to fputc() .......fputc() writes the character c, cast to an unsigned char, to stream.
Thus, putchar() is supposed to output the character representation of the value supplied as its argument.
So, if the value of EOF is a non printable character [in ASCII], you won't see anything ["without showing any text"] on stdout.

writing escape sequence in C using hex, dec, and oct values?

Can someone explain this question to me? I don't understand how the book arrived at its values or how one would arrive at the answer.
Here is the question:
Suppose that ch is a type char variable. Show how to assign the carriage-return character to ch by using an escape sequence, a decimal value, an octal character constant, and a hex character constant. (Assume ASCII code values.)
Here is the answer:
Assigning the carriage-return character to ch by using:
a) escape sequence: ch='\r';
b) decimal value: ch=13;
c) an octal character constant: ch='\015';
d) a hex character constant: ch='\xd';
I understand the answer to part a, but am completely lost for parts b, c, and d. Can you explain?
Computers represent characters using character encodings, such as ASCII, UTF-8, UTF-16, and ISO 8859-1 (http://en.wikipedia.org/wiki/ISO/IEC_8859-1), as well as others. The carriage return character was used by early computers as a printer instruction to return the print head to the leftmost position, and the line feed character was used to advance the paper to a new line (which is why DOS uses CRLF for line endings; it worked better with dot-matrix printers). Anyway, the CR character is stored internally as a numeric value, in either a single 8-bit byte/octet or a 16-bit pair of bytes/octets, depending on your language.
The common ASCII character set is found here: http://www.asciitable.com/ and you can see that CR, '\r', 13, 0xD, etc. are different representations of the same value.
Strings are just sequences of characters stored either as an array of characters with a marker at the end (terminator), or stored with a count of the current string length.
From wiki:
Computers and communication equipment represent characters using a character encoding that assigns each character to something — an integer quantity represented by a sequence of bits, typically — that can be stored or transmitted through a network. Two examples of usual encodings are ASCII and the UTF-8 encoding for Unicode.
For your questions b, c, and d: all the values are 13 (in decimal). Run this code to understand what's happening:
char ch1='\r';
printf("Ascii value of carriage return is %d", ch1);
There are two parts to explaining answers b-d.
You need to know that the ASCII code point for 'carriage return' or CR (also known as Control-M) is 13. You can find that out from various sources. It might not be obvious that the Unicode standard is one of those places (but it is) and U+000D is CARRIAGE RETURN (CR). Unicode code points U+0000..U+007F are identical to ASCII; Unicode code points U+0000..U+00FF are identical to ISO 8859-1 (Latin 1).
You need to know that C can use decimal numbers, or octal or hexadecimal escapes, when assigning to characters. Notations such as '\15' or '\015' are octal character constants, and octal 15 is decimal 13. Notations such as '\xD' or '\x0D' (or, indeed, '\x0000000000000D' and all stops en route) are hexadecimal constants, and hex D is also decimal 13. (Note that octal escapes are limited to 1-3 digits, but hex escapes are not so limited; values larger than '\xFF' typically have implementation-defined representations.)
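A purely illustrative check that all four notations assign the same value (assuming an ASCII system):
#include <stdio.h>

int main(void)
{
    char a = '\r';     /* escape sequence          */
    char b = 13;       /* decimal value            */
    char c = '\015';   /* octal character constant */
    char d = '\xd';    /* hex character constant   */

    /* All four print 13 13 13 13 on an ASCII system. */
    printf("%d %d %d %d\n", a, b, c, d);
    return 0;
}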

Endianness -- why do chars put in an Int16 print backwards?

The following C code, compiled and run in XCode:
UInt16 chars = 'ab';
printf("\nchars: %2.2s", (char*)&chars);
prints 'ba', rather than 'ab'.
Why?
That particular implementation seems to store multi-character constants in little-endian format. In the constant 'ab' the character 'b' is the least significant byte (the little end) and the character 'a' is the most significant byte. If you viewed chars as an array, it'd be chars[0] = 'b' and chars[1] = 'a', and thus would be treated by printf as "ba".
Also, I'm not sure how accurate you consider Wikipedia, but regarding C syntax it has this section:
Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.
So it appears the 'ab' multi-character constant format should be avoided in general.
It depends on the system you're compiling/running your program on.
Obviously on your system, the value 0x6162 ('ab') is stored with its bytes in memory as 62 61, i.e. 'b' then 'a': the little-endian way.
When you ask it to print a string, printf reads the bytes you have stored in memory one by one, which gives 'b', then 'a'. Hence your result.
Multicharacter character literals are implementation-defined:
C99 6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
gcc and icl print ba on Windows 7. tcc prints a and drops the second letter altogether...
The answer to your question can be found in your tags: endianness. On a little-endian machine the least significant byte is stored first. This is a convention and does not affect efficiency at all.
Of course, this means that you cannot simply cast it to a character string, since the byte order will not match the character order: a character string has no notion of significant bytes, just a sequence.
If you want to view the bytes within your variable, I suggest using a debugger that can read the actual bytes.
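To see what your implementation actually stores without a debugger, here is a minimal, implementation-dependent sketch (using unsigned short in place of Apple's UInt16) that dumps the bytes of the variable directly; remember that multi-character constants are implementation-defined, as noted above:
#include <stdio.h>

int main(void)
{
    unsigned short chars = 'ab';             /* implementation-defined value */
    unsigned char *p = (unsigned char *)&chars;
    size_t i;

    /* On a little-endian machine with gcc this typically prints 62 61,
       i.e. 'b' then 'a', which is why the string view shows "ba". */
    for (i = 0; i < sizeof chars; i++)
        printf("%02X ", p[i]);
    printf("\n");

    return 0;
}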
