How a file in a disk is represented [closed] - file

I have a weird, or rather stupid, question.
When I open a binary file in a text editor, it doesn't seem to be represented as binary 0s and 1s or as hex. So what representation is that?
IHDR\00\00k\00\00\C3\00\00\00\A2\B6\8D$\00\00\00sBIT|d\88\00\00 \00IDATx\9C̽Y\AC-\CBy\DF\F7\FB\AA\AA\BBװ\87\B3\CFtϹ󽼗

The hard disk (as well as any other digital device in your computer) transmits data as 0s and 1s. All files are just sequences of numbers, and they are all 'binary' in the sense that they are all a bunch of bits. But some files can be read by a human (after a simple decoding that text viewers perform), and we call those 'text' files; others are in a machine-oriented format and are not meant for human reading at all, or at least not without special software (those are called 'binary').
A text editor tries to display this data as text. Since "plain" text files usually encode each character in 8 bits, your editor interprets each octet (each byte) as an integer holding a character code and displays the corresponding character. For some codes there is no printable character in the encoding table; these are usually displayed as squares, question marks or (as in your case) their numerical (hexadecimal) codes.
Some editors can show a pure hexadecimal representation of a file, which is a rather convenient feature for low-level data analysis, since hexadecimal is compact and can be converted to binary quite easily.
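As a rough illustration of what such a hex view does, here is a minimal sketch in C. The function name format_hex_line and the choice of '.' for unprintable bytes are assumptions made for this example, not something any particular editor is known to use.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 16 bytes as "hex bytes |printable text|" into out.
   Bytes with no printable ASCII character are shown as '.'.
   Returns the number of characters written, or -1 if out is too small. */
int format_hex_line(const unsigned char *data, size_t len,
                    char *out, size_t out_size)
{
    size_t pos = 0;

    if (len > 16)
        len = 16;
    /* 3 chars per byte in hex, len text chars, two '|' and the NUL */
    if (out_size < len * 3 + len + 3)
        return -1;

    for (size_t i = 0; i < len; i++)
        pos += (size_t)sprintf(out + pos, "%02X ", data[i]);

    out[pos++] = '|';
    for (size_t i = 0; i < len; i++)
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    out[pos++] = '|';
    out[pos] = '\0';
    return (int)pos;
}
```

Called on the six bytes 49 48 44 52 00 00 (the "IHDR" marker from the question followed by two zero bytes), this produces the line "49 48 44 52 00 00 |IHDR..|".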

This is a hexadecimal representation along with the ASCII representation of the characters your software is able to display.

Related

Should open any C file as a binary file [closed]

I've read somewhere that we should always open a file in C as a binary file (even if it's a text file). At the time (a few years ago) I didn't care too much about it, but now I really need to understand whether that's the case and why.
I've been trying to search for info on this, but the most I find is the difference in how they are opened, not their structural difference.
So I guess my question is: why should we always open the file as binary even if we know beforehand it's a text file? My second question is about the structure of the file itself: is a binary file like an "encrypted" text file?
The names "text" vs. "binary", while quite mnemonic, can sometimes leave you wondering which one to apply. It's best to translate them to their underlying mechanics, and choose based on which one of those you need.
"Binary" could also be called "verbatim" opening mode. Each byte in the file will be read exactly as-is on disk. Which means that if it's a Windows file containing the text "ABC" on one line (including the line terminator), the bytes read from the file will be 65 66 67 13 10.
"Text" mode could also be called "line-terminator translating" opening mode. When the file contains a sequence of 1 or more characters which is defined by the platform on which you're running as "line terminator"(1), the entire sequence will be read from the file, but the runtime will make it appear as if only the character '\n' (10 when using ASCII) was read. For the same Windows-file above, if it was opened as a text file on Windows, the bytes read from the file would be 65 66 67 10.
The same applies when writing: a file opened as "binary" for writing will write exactly the bytes you give it. A file opened as "text" will translate the byte '\n' (10 in ASCII) to whatever the platform defines as the line-terminating character sequence.
I don't think an "always do this" rule can be distilled from the above, but perhaps you can use it to make an informed decision for each case.
(1) On Unix-style systems, the line-terminating character sequence is LF (ASCII 10). On Windows, it's the two-character sequence CR LF (ASCII 13 10). On old pre-X Mac OS, it was just the single-character CR (ASCII 13).
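The "verbatim" behaviour of binary mode can be checked with a small sketch like the following. The function name roundtrip_binary and the scratch file path are hypothetical; the code assumes an ASCII-based platform, as the answer does.

```c
#include <stdio.h>

/* Write the Windows-style line "ABC\r\n" in binary mode, then read it
   back in binary mode. Returns the number of bytes read into buf,
   or 0 on error. path names a scratch file chosen by the caller. */
size_t roundtrip_binary(const char *path, unsigned char *buf, size_t buf_size)
{
    static const unsigned char line[] = { 'A', 'B', 'C', '\r', '\n' };

    FILE *f = fopen(path, "wb");        /* "b": no line-terminator translation */
    if (f == NULL)
        return 0;
    size_t written = fwrite(line, 1, sizeof line, f);
    if (fclose(f) != 0 || written != sizeof line)
        return 0;

    f = fopen(path, "rb");              /* read the bytes exactly as stored */
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, buf_size, f);
    fclose(f);
    return n;
}
```

On any platform this should return 5 bytes: 65 66 67 13 10. Changing the second fopen mode from "rb" to "r" would be expected to give 65 66 67 10 on Windows, as described above, while on Unix-style systems the result should stay the same because no translation takes place there.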

Unknown characters when printing text file in c [closed]

I am trying to print the characters from a text file using C in CodeBlock terminal. I use getc and printf. But the terminal shows unwanted characters as well. For example,
when I read,
CAAAAATATAAAAACAGGTTTATGATATAAGGTAAAGTATGGGAGATGGGGACAAAAGT
It shows,
CΘA A A A A T A T A A A A A C A G G T T T A T G A T A T A A G GT A A A G T A T$GhGêG╝A G<AöT G#GñG<G AxC A A A A G T
Can any one please state what can be done to avoid this situation.
Your text file apparently uses a 2-byte character encoding. If this is on Windows, it's very likely UTF-16.
char in C is a single byte, so a single-byte encoding is assumed. There are many ways to solve this, e.g. you could use iconv. On Windows, you can use wchar_t (*) to read the characters of this file, together with functions for wide characters like getwc(), and if you need the text in an 8-bit encoding, Windows API functions like WideCharToMultiByte() can help.
(*) wchar_t is a type for "wide" characters, but it's implementation-defined how many bytes a wide character has. On Windows, wchar_t has 16 bits and typically holds UTF-16 encoded characters. On many other systems, wchar_t has 32 bits and typically holds UCS-4 encoded characters.
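If the file really is UTF-16 (little endian, as is usual on Windows) and only contains characters from the ASCII range, the conversion can also be done by hand. The sketch below is one possible approach, not the method the answer recommends; utf16le_to_ascii is a hypothetical helper, and it deliberately ignores surrogate pairs.

```c
#include <stddef.h>

/* Convert UTF-16LE bytes to a NUL-terminated ASCII string.
   A leading byte-order mark is skipped; any other code unit above 127
   becomes '?'. Surrogate pairs are not decoded (sketch only).
   Returns the number of characters stored, excluding the NUL. */
size_t utf16le_to_ascii(const unsigned char *in, size_t in_len,
                        char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;

    for (size_t i = 0; i + 1 < in_len && n + 1 < out_size; i += 2) {
        /* low byte first, then high byte: little endian */
        unsigned unit = in[i] | ((unsigned)in[i + 1] << 8);

        if (i == 0 && unit == 0xFEFF)
            continue;                   /* skip the byte-order mark */
        out[n++] = unit < 128 ? (char)unit : '?';
    }
    out[n] = '\0';
    return n;
}
```

Given the bytes FF FE 43 00 41 00 41 00, this yields the string "CAA". Reading such a file one char at a time with getc, as in the question, instead hands the program every one of those bytes separately, including the zero bytes and the byte-order mark, which would explain the stray symbols and gaps in the output.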

C programming - hexadecimal or decimal convention [closed]

My question is pretty simple. I am a newbie in the microcontroller world and am trying to understand the use of hexadecimal versus decimal conventions. I have seen a lot of C code, and some programmers use the decimal convention:
#define TEST_BUTTON_PORT 1
#define TEST_BUTTON_BIT 19
while others use the hexadecimal one:
#define IOCON_FUNC0 0x0
#define IOCON_FUNC1 0x1
Is there any important reason for the different conventions, or is it just the programmer's choice?
The purpose of hex is to ease the use of binary numbers, since binary is very hard for humans to read. Some examples when hexadecimal is used:
Describing binary numbers and binary representations.
Dealing with hardware addresses.
Doing bit-wise arithmetic.
Declaring bit masks/bit fields.
Dealing with any form of raw data, such as memory dumps, machine code or data protocols.
An exception to this is, oddly, when specifying the number of bits to shift. This is almost always done in decimal notation. If you wish to set bit 19 you would usually do it by writing:
PORT |= 1 << 19;
This assumes bits are enumerated from 0 to n.
I suppose this is because decimal format is more convenient when enumerating things, such as bit/pin numbers. (And manufacturers of MCUs etc usually enumerate pins with decimal notation.)
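To make the contrast concrete, here is a small sketch that uses a decimal bit position next to the equivalent hexadecimal mask. TEST_BUTTON_BIT comes from the question; TEST_BUTTON_MASK, set_bit and has_mask are names invented for this example.

```c
#include <stdint.h>

#define TEST_BUTTON_BIT  19           /* bit position: decimal reads naturally */
#define TEST_BUTTON_MASK 0x00080000u  /* same bit as a mask: hex shows the pattern */

/* Return reg with the given bit set (bits numbered from 0). */
uint32_t set_bit(uint32_t reg, unsigned bit)
{
    return reg | (UINT32_C(1) << bit);
}

/* Return nonzero if every bit in mask is set in reg. */
int has_mask(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}
```

Here set_bit(0, TEST_BUTTON_BIT) gives 0x00080000, which is 524288 in decimal. The hex form makes it easy to see that exactly one bit is set, whereas the decimal value hides that.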

UTF-8 encoding in c [closed]

What is UTF-8 encoding? I googled it but could not understand what it is. Please explain in simple words, with an example.
Next, I need to encode a string in UTF-8. I tried openssl, but it only converts to base64 format.
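#include <stdio.h>
#include <string.h>
struct some
{
    char string[40];
};
int main(void)
{
    struct some s;
    char str[40];
    /* fgets bounds the read, unlike gets */
    if (fgets(str, sizeof str, stdin) == NULL)
        return 1;
    str[strcspn(str, "\n")] = '\0';   /* drop the trailing newline */
    strcpy(s.string, str);
    /* Now how to get the encoded form of "Hello" in UTF-8? */
    /* printf("encoded data"); */
    return 0;
}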
The strings are only available at runtime, so I do not know anything about what is coming, and after encoding I need to store them in a DB.
I searched SO itself but could not find any source in C; it is available for .NET, Java and C#. I am using Red Hat Linux.
Encodings describe which bytes or sequences of bytes correspond to which characters. ASCII is the simplest encoding: a single byte value corresponds to a single character. Unfortunately there are far more characters in the world than the 128 that ASCII defines (or the 256 values one byte can hold). UTF-8 is probably the most common encoding because it is compatible with ASCII but also allows international characters. If you write a plain English string in C using only ASCII characters, it is already valid UTF-8: the bytes of "Hello" are the same in both encodings.
Joel has a fantastic article about this subject called: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
It does a good job of explaining ASCII, unicode, and UTF8 string encodings.
In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 4 (not 6, corrected by R.) bytes.
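The 1-to-4-byte rule quoted above can be sketched as a small encoder. This is an illustrative implementation of the standard UTF-8 bit layout; the function name utf8_encode is made up for this example and is not part of the C standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point
   (above U+10FFFF or in the surrogate range U+D800..U+DFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* four bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, 'H' (U+0048) encodes to the single byte 48, 'é' (U+00E9) to C3 A9, and '€' (U+20AC) to E2 82 AC. This is why the ASCII string "Hello" needs no conversion at all.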

Is there a C library that converts integers to hexadecimal or binary? [closed]

Is there a library function that takes in an integer and converts it to a single-byte hexadecimal or binary number?
For example, if I passed it the input of 64, it would output 0x40.
For hex numbers, you can use sprintf:
char buff[80];
sprintf(buff, "0x%02x", 64);
An int is an int, whether it is written as 0x40 or 64; the data representation of the two is exactly the same (binary 1000000, padded with leading zero bits). If you are asking how it would be represented in a char array, you'd use sprintf. The simplest way is sprintf(buf, "%#x", 64).
Internally, integers are already represented as binary. You can display a number as hexadecimal using the %x format string (%#02x will fit your example best).
See this question regarding binary, for which there isn't a built-in format string specifier.
In C, the size of the int type is implementation-defined. Typically it is 4 bytes long, so an arbitrary int cannot be stored in a single byte without losing information.
If you use a char or int8_t then you will have a single byte. Bytes are binary internally and always will be. So anytime you want to do anything with your byte, you must do it in binary.
Hexadecimal vs binary vs base 10 is a display decision. So if you accept those as input, you will have to convert a string into a single byte for storage in memory. When you display them, you will have to convert to the desired display format.
Using sprintf works for display. Use strtol for input.
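Putting the answers together, a minimal sketch might look like this. The helper names byte_to_hex, byte_to_binary and hex_to_byte are invented for the example; only snprintf and strtol are standard library calls. The binary conversion is written by hand because, as noted above, there is no built-in format specifier for it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write value as "0x" plus two hex digits, e.g. 64 -> "0x40". */
void byte_to_hex(unsigned char value, char out[5])
{
    snprintf(out, 5, "0x%02x", (unsigned)value);
}

/* Write value as eight binary digits, most significant bit first. */
void byte_to_binary(unsigned char value, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (value & (0x80u >> i)) ? '1' : '0';
    out[8] = '\0';
}

/* Parse text such as "0x40" or "40" as hexadecimal.
   Returns -1 if the text is not a number in the range 0..255. */
int hex_to_byte(const char *text)
{
    char *end;
    long v = strtol(text, &end, 16);

    if (end == text || *end != '\0' || v < 0 || v > 255)
        return -1;
    return (int)v;
}
```

With these, byte_to_hex(64, buf) stores "0x40", byte_to_binary(64, buf) stores "01000000", and hex_to_byte("0x40") returns 64. In all three cases the byte in memory is the same; only the text form differs.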
