how do I know if the case is true? - c

Let say if we are given a byte of binary data, how can you know what that data represents?
Is it true that you cant really know what the data represents because you need to know whether the one byte of binary data is represented in base 2, if it unsigned, signed, etc.
or is it that you can know what it represents since binary is base 2?

I am sorry to tell that a byte of data has nothing to do with it's supposed representation.
You state that because it's a byte, it's a binary representation. this is purely assumption.
It depends on the intention of the guy who store the very data.
It might represent anything. As #nos told you, it really depend on the convention the setter used to store it.
You may have a complementary to 2 number, a signed byte on 7 bit, un unsigned on 8 bits, an octal representation (or a partial representation) or a mask (each group of byte within the byte may describer something totally different than another). It could also be a representation of a special coding. Etc.
This is truly unlimited.
In order to properly interpret it you need to know the underlying convention (a spec). #fede1024 told you about files, which use special character so that you can double check with the convention.
One more thing… Bear in mind that even binary data can be stored in natural order or in reverse order: that's endianness. So when you examine a number store in at least 2 bytes, you have to know whether the most significant byte is stored first or sec on din memory. If you misinterpret this, you won't understand the underlying piece of data. Endianness is a constant for a given processor.

Base-2 and binary refer to the same thing. Typically, you do need to know whether the byte is signed or unsigned at least (in C). As for what the data represents - well, "it depends". Whether you want to interpret it as a single byte, as a character (or not), etc. With multi-byte data, you often also have to take endianness (ordering of the bytes into larger words) into account.

Some files format start with a magic number, for example all PNG files starts with 89 50 4E 47 0D 0A 1A 0A. That said, if you have a general binary file without any kind of magic number, you can just guess about his contents.
You can try to open it with an hexadecimal editor, but there is no automatic way to understand what the data represents.

You know it's base 2 since it's a byte of binary data, as you said. To see if it is true, in C everything but 0 is true. If it's 0, then it's false.

Related

About Memory Address convention? [duplicate]

Whenever I see C programs that refer directly to a specific location on the memory (e.g. a memory barrier) it is done with hexadecimal numbers, also in windows when you get a segfualt it presents the memory being segfualted with a hexadecimal number.
For example: *(0x12DF)
I am wondering why memory addresses are represented using hexadecimal numbers?
Is there a special reason for that or is it just a convention?
Memory is often manipulated in terms of larger units, such as pages or segments, which
tend to have sizes that are powers of 2. So if addresses are expressed in hex, it's
much easier to read them as page+offset or similar constructs. Decimal is difficult because
of that pesky factor of 5, and binary addresses are too long to be easily readable.
Its a much shorter way to represent what would otherwise be written in binary. It is also very nice and easy to convert hex to binary and back. Each 4 digits of binary corresponds to one digit of hex.
Convention and convenience: hex shows more clearly what relationship various pointers have to address segmenting. (For example, shared libraries are usually loaded on even hex boundaries, and the data segment likewise is on an even boundary.) DEC minicomputer convention actually preferred octal, but IBM's hex preference won out in practice.
(As for why this matters: what's easier to remember, 0xb73eb000 or 3074338816? It's the address of one of the shared objects in my current shell on jinx.)
It's the shortest, common number format, thus the numbers don't take up much place and everybody knows what they mean.
Computer only understands binary language which is collection of 0's and 1's. That means ON/OFF. As in case of the human readability the binary number which may be representing some address or data has to be converted into human readable format. Hexadecimal is one of them. But the question can be why we have converted binary to HEX only why not decimal, octal etc. Answer is HEX is the one which can be easily converted with the least amount of overhead on both HW as well as SW. thats why we are using addresses as HEX. But internally they are used as binary only.
Hope it helps :)

Why is infinity = 0x3f3f3f3f?

In some situations, one generally uses a large enough integer value to represent infinity. I usually use the largest representable positive/negative integer. That usually yields more code, since you need to check if one of the operands is infinity before virtually all arithmetic operations in order to avoid overflows. Sometimes it would be desirable to have saturated integer arithmetic. For that reason, some people use smaller values for infinity, that can be added or multiplied several times without overflow. What intrigues me is the fact that it's extremely common to see (specially in programming competitions):
const int INF = 0x3f3f3f3f;
Why is that number special? It's binary representation is:
00111111001111110011111100111111
I don't see any specially interesting property here. I see it's easy to type, but if that was the reason, almost anything would do (0x3e3e3e3e, 0x2f2f2f2f, etc). It can be added once without overflow, which allows for:
a = min(INF, b + c);
But all the other constants would do, then. Googling only shows me a lot of code snippets that use that constant, but no explanations or comments.
Can anyone spot it?
I found some evidence about this here (original content in Chinese); the basic idea is that 0x7fffffff is problematic since it's already "the top" of the range of 4-byte signed ints; so, adding anything to it results in negative numbers; 0x3f3f3f3f, instead:
is still quite big (same order of magnitude of 0x7fffffff);
has a lot of headroom; if you say that the valid range of integers is limited to numbers below it, you can add any "valid positive number" to it and still get an infinite (i.e. something >=INF). Even INF+INF doesn't overflow. This allows to keep it always "under control":
a+=b;
if(a>INF)
a=INF;
is a repetition of equal bytes, which means you can easily memset stuff to INF;
also, as #Jörg W Mittag noticed above, it has a nice ASCII representation, that allows both to spot it on the fly looking at memory dumps, and to write it directly in memory.
I may or may not be one of the earliest discoverers of 0x3f3f3f3f. I published a Romanian article about it in 2004 (http://www.infoarena.ro/12-ponturi-pentru-programatorii-cc #9), but I've been using this value since 2002 at least for programming competitions.
There are two reasons for it:
0x3f3f3f3f + 0x3f3f3f3f doesn't overflow int32. For this some use 100000000 (one billion).
one can set an array of ints to infinity by doing memset(array, 0x3f, sizeof(array))
0x3f3f3f3f is the ASCII representation of the string ????.
Krugle finds 48 instances of that constant in its entire database. 46 of those instances are in a Java project, where it is used as a bitmask for some graphics manipulation.
1 project is an operating system, where it is used to represent an unknown ACPI device.
1 project is again a bitmask for Java graphics.
So, in all of the projects indexed by Krugle, it is used 47 times because of its bitpattern, once because of its ASCII interpretation, and not a single time as a representation of infinity.

How do I work with bit data in C

In class I've been tasked with writing a C program that decompresses a text file and prints out the characters it contains. Each character in the file is represented by 2 bits (4 possible characters).
I've recently been informed that a byte is not necessarily 8 bits on all systems, and a char is not necessarily 1 byte. This then makes me wonder how on earth I'm supposed to know how many bits got loaded from a file when I loaded 1 byte. Also how am I supposed to keep the loaded data in memory when there are no data types that can guarantee a set amount of bits.
How do I work with bit data in C?
A byte is not necessarily 8 bits. That much is certainly true. A char, on the other hand, is defined to be a byte - C does not differentiate between the two things.
However, the systems you will write for will almost certainly have 8-bit bytes. Bytes of different sizes are essentially non-existant outside of really, really old systems, or certain embedded systems.
If you have to write your code to work for multiple platforms, and one or more of those have differently sized chars, then you write code specifically to handle that platform - using e.g. CHAR_BIT to determine how many bits each byte contains.
Given that this is for a class, assume 8-bit bytes, unless told otherwise. The point is not going to be extreme platform independence, the point is to teach you something about bit fiddling (or possibly bit fields, but that depends on what you've covered in class).
This then makes me wonder how on earth I'm supposed to know how many
bits got loaded from a file when I loaded 1 byte.
You'll be hard pressed to find a platform where a byte is not 8 bits. (though as noted above CHAR_BIT can be used to verify that). Also clarify the portability requirements with your instructor or state your assumptions.
Usually bits are extracted using shifts and bitwise operations, e.g. (x & 3) are the rightmost 2 bits of x. ((x>>2) & 3) are the next two bits. Pick the right data type for the platforms you are targettiing or as others say use something like uint8_t if available for your compiler.
Also see:
Type to use to represent a byte in ANSI (C89/90) C?
I would recommend not using bit fields. Also see here:
When is it worthwhile to use bit fields?
You can use bit fields in C. These indices explicitly let you specify the number of bits in each part of the field, if you are truly concerned about width. This page gives a discussion: http://msdn.microsoft.com/en-us/library/yszfawxh(v=vs.80).aspx
As an example, check out the ieee754.h for usage in the context of implementing IEEE754 floats

retrieve cobol s9(2) COMP onto C variable

I need to retrieve data from a COBOL variable of the type: "PIC S9(2) COMP" onto a C variable of the type "int".
It's stored using two bytes of a string, so I receive it as a couple of chars.
I know COBOL stores decimal data onto a "S9(2) COMP" in binary format, so It would be a great help letting me know any algorithm or way to convert it safely.
Any kind of help & suggestion will be welcome.
Update:
Finally we decided to change the picture of the variable to 9(3) in the COBOL part of the implementation, because of the endianess problem.
Thanks to all of you for the answers.
You should be able to treat that as a short, or a 16-bit twos complement integer. You will need to check for endianess though depending upon which platform originated the field.
The format of s9(2) comp will depend on the Cobol Compiler. Most Cobol compilers I know will store it a 2 byte big-endian integer (high byte first; Intel processors store low byte first). Exceptions include
MicroFocus (depends on compile parameters) - normally 1 byte integer
RM-Cobol : Has its own "Binary Format".
If it is a 2 byte integer
On Big-Endian machine (IBM Mainframe, Power etc) it should be a 2-Byte integer
On Little-Endian (Intel) you need to swap the bytes around.
The S9(2) COMP indicates it is a left-aligned (bit 0 at the left of word) not explicity synchronised signed numeric field, probably in an internal or pseudo-binary format, that can hold 2 digits and possibly a bit in the word (or in a different memory location) is used for indicating if the value is positive or negative. A cobol program can have a specific method for storing a COMP data item which may not be directly compatible with C. It appears that you need to access and test the sign indicator in a relevant way to check it and you may have to access each of the 2 bytes (2 of 8 bits) or characters (2 of 6 bits), check if they are big-endian or little-endian and put them into a signed integer field in C. A lot will depend on the architecture and compilers of the computer involved in creating and reading in the data field and can be more complicated if 2 computer types are involved.

Convert really big number from binary to decimal and print it

I know how to convert binary to decimal. I know at least 2 methods: table and power ;-)
I want to convert binary to decimal and print this decimal. Moreover, I'm not interested in this `decimal'; I want just to print it.
But, as I wrote above, I know only 2 methods to convert binary to decimal and both of them required addition. So, I'm computing some value for 1 or 0 in binary and add it to the remembered value. This is a thin place. I have a really-really big number (1 and 64 zeros). While converting I need to place some intermediate result in some 'variable'. In C, I have an `int' type, which is 4 bytes only and not more than 10^11.
So, I don't have enough memory to store intermedite result while converting from binary to decimal. As I wrote above, I'm not interested in THAT decimal, I just want to print the result. But, I don't see any other ways to solve it ;-( Is there any solution to "just print" from binary?
Or, maybe, I should use something like BCD (Binary Coded Decimal) for intermediate representation? I really don't want to use this, 'cause it is not so cross-platform (Intel's processors have a built-in feature, but for other I'll need to write own implementation).
I would glad to hear your thoughts. Thanks for patience.
Language: C.
I highly recommend using a library such as GMP (GNU multiprecision library). You can use the mpz_t data type for large integers, the various import/export routines to get your data into an mpz_t, and then use mpz_out_str() to print it out in base 10.
Biggest standard integral data type is unsigned long long int - on my system (32-bit Linux on x86) it has range 0 - 1.8*10^20 which is not enough for you, so you need to create your own type (struct or array) and write basic math (basically you just need an addition) for that type.
If I were you (and memory is not an issue), I'd use an array - one byte per decimal digit rather then BCD. BCD is more compact as it stores 2 decimal digits per byte but you need to put much more effort working with high and low nibbles separately.
And to print you just add '0' (character, not digit) to every byte of your array and you get a printable string.
Well, when converting from binary to decimal, you really don't need ALL the binary bits at the same time. You just need the bits you are currently calculating the power of and probably a double variable to hold the results.
You could put the binary value in an array, lets say i[64], iterate through it, get the power depending on its position and keep adding it to the double.
Converting to decimal really means calculating each power of ten, so why not just store these in an array of bytes? Then printing is just looping through the array.
Couldn't you allocate memory for, say, 5 int's, and store your number at the beginning of the array? Then manually iterate over the array in int-sized chunks. Perhaps something like:
int* big = new int[5];
*big = <my big number>;

Resources