Should we open any file in C as a binary file? [closed]

I've read somewhere that we should always open a file in C as a binary file (even if it's a text file). At the time (a few years ago) I didn't care much about it, but now I really need to understand whether that's the case and why.
I've been trying to search for information on this, but the most I find is the difference in how they are opened - not even their structural difference.
So I guess my question is: why should we always open a file as binary, even if we guess beforehand that it's a text file? My second question concerns the structure of each file itself: is a binary file like an "encrypted" text file?

The names "text" vs. "binary", while quite mnemonic, can sometimes leave you wondering which one to apply. It's best to translate them to their underlying mechanics, and choose based on which one of those you need.
"Binary" could also be called "verbatim" opening mode. Each byte in the file will be read exactly as-is on disk. Which means that if it's a Windows file containing the text "ABC" on one line (including the line terminator), the bytes read from the file will be 65 66 67 13 10.
"Text" mode could also be called "line-terminator translating" opening mode. When the file contains a sequence of 1 or more characters which is defined by the platform on which you're running as "line terminator"(1), the entire sequence will be read from the file, but the runtime will make it appear as if only the character '\n' (10 when using ASCII) was read. For the same Windows-file above, if it was opened as a text file on Windows, the bytes read from the file would be 65 66 67 10.
The same applies when writing: a file opened as "binary" for writing will write exactly the bytes you give it. A file opened as "text" will translate the byte '\n' (10 in ASCII) to whatever the platform defines as the line-terminating character sequence.
I don't think an "always do this" rule can be distilled from the above, but perhaps you can use it to make an informed decision for each case.
(1) On Unix-style systems, the line-terminating character sequence is LF (ASCII 10). On Windows, it's the two-character sequence CR LF (ASCII 13 10). On old pre-X Mac OS, it was just the single-character CR (ASCII 13).
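To make the difference concrete, here is a minimal sketch; the file name sample.txt is just an assumption for the example. It opens the same file once in text mode ("r") and once in binary mode ("rb") and prints the numeric value of every byte read.

    #include <stdio.h>

    int main(void)
    {
        const char *modes[] = { "r", "rb" };        /* text mode first, then binary mode */

        for (int m = 0; m < 2; m++) {
            FILE *fp = fopen("sample.txt", modes[m]);   /* example file name */
            if (fp == NULL) {
                perror("fopen");
                return 1;
            }
            printf("mode \"%s\":", modes[m]);
            int c;
            while ((c = fgetc(fp)) != EOF)
                printf(" %d", c);                   /* raw value of each byte read */
            printf("\n");
            fclose(fp);
        }
        return 0;
    }

On Windows, a file containing "ABC" followed by CR LF would print 65 66 67 10 in text mode and 65 66 67 13 10 in binary mode; on a Unix-style system both modes yield the same bytes.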

Related

Find the number of characters from the first character to the nth occurrence of a character in a text file, without using an array or the strchr function? [closed]

For example, let's assume the following is the text in a file:
"I want to find the number of characters from the first character to the nth occurrence of a character in a text file, but I want to do this without declaring an array, storing the text file in it, and applying the strchr function."
Let's say I want to find the position of the second newline character in the text: how many characters from the first character of the text is that newline? For example, the first occurrence of the character 't' is the 6th character of the text file.
If it is possible, can someone please explain how? If not, can someone please explain why?
That can be done on an operating system that supports memory-mapped files. This includes Windows and POSIX operating systems such as Linux.
In POSIX it is done using mmap(); there is example code in the documentation.
For Windows there is an API for memory-mapped files. Again, sample code is included in the documentation.
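A minimal POSIX sketch of that idea follows; the file name, the target character, and the occurrence number are placeholders chosen for the example, not anything prescribed by the question.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char target = '\n';            /* character to look for (example) */
        const int wanted = 2;                /* which occurrence (example) */

        int fd = open("input.txt", O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

        /* map the whole file read-only; no array is declared, no copy is made,
           and strchr is never called */
        char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        int seen = 0;
        for (off_t i = 0; i < st.st_size; i++) {
            if (data[i] == target && ++seen == wanted) {
                printf("occurrence %d is at offset %lld from the first character\n",
                       wanted, (long long)i);
                break;
            }
        }

        munmap(data, (size_t)st.st_size);
        close(fd);
        return 0;
    }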

Unknown characters when printing a text file in C [closed]

I am trying to print the characters from a text file using C in the Code::Blocks terminal. I use getc and printf, but the terminal shows unwanted characters as well. For example,
when I read,
CAAAAATATAAAAACAGGTTTATGATATAAGGTAAAGTATGGGAGATGGGGACAAAAGT
It shows,
CΘA A A A A T A T A A A A A C A G G T T T A T G A T A T A A G GT A A A G T A T$GhGêG╝A G<AöT G#GñG<G AxC A A A A G T
Can anyone please state what can be done to avoid this situation?
Your text file obviously uses a 2-byte character encoding. If this is on Windows, it's very likely UTF-16.
char in C is a single byte, so a single-byte encoding is assumed. There are many ways to solve this; e.g. you could use iconv. On Windows, you can use wchar_t(*) to read the characters of this file (together with functions for wide characters like getwc()), and if you need the text in an 8-bit encoding, Windows API functions like WideCharToMultiByte() can help.
(*) wchar_t is a type for "wide" characters, but it's implementation-defined how many bytes a wide character has. On Windows, wchar_t has 16 bits and typically holds UTF-16 encoded characters. On many other systems, wchar_t has 32 bits and typically holds UCS-4 encoded characters.
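As a rough illustration only (not a real UTF-16 decoder), the sketch below assumes the file is UTF-16LE and contains only ASCII-range characters, as the sample above does; the file name is an assumption. It reads the file in binary mode two bytes at a time and prints the characters whose code fits in ASCII. A robust program would use iconv, wide-character I/O, or the Windows conversion functions instead.

    #include <stdio.h>

    int main(void)
    {
        FILE *fp = fopen("dna.txt", "rb");    /* binary mode: no line-ending translation */
        if (fp == NULL) { perror("fopen"); return 1; }

        int lo, hi;
        while ((lo = fgetc(fp)) != EOF && (hi = fgetc(fp)) != EOF) {
            unsigned code = (unsigned)lo | ((unsigned)hi << 8);   /* little-endian byte pair */
            if (code < 128)
                putchar((int)code);           /* keep ASCII-range characters; skips the BOM */
        }
        putchar('\n');
        fclose(fp);
        return 0;
    }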

How to implement word spell checking in C? [closed]

I need to create a spell checker in C for my assessment. I managed to start the work: I have a text file which contains the whole dictionary, and I have coded how to read the files and how to compare a chosen file with the dictionary file. Now I need to print to a new text file where the misspelled words were found, what they are, and their correct versions. The big problem is that I have no idea how to do this. My code right now can say that there is a difference between the files, but I don't know how to make strcmp check string by string, word by word, whether something is wrong.
The dictionary file contains all the words, so of course, if my program reads the other file, compares, and then writes all the words which aren't in the file to the new output file with errors, these output-error words will just be random words which aren't even in the text file, or connected with it.
I hope I explained my problem well and that there is somebody who can tell me how to fix it. I'm not even asking for code, I just need some idea of how I would need to code the rest of the program. And sorry for my English, it's my second language, so I still make grammatical mistakes.
Here are some steps you can follow (a small sketch combining them appears after the function list below):
read the dictionary into a memory structure, for example an array of strings, which you will sort in lexicographical order (with strcmp).
read the file line by line, and for each line iterate these steps:
initialize a highlight line with spaces, same length as the line read.
skip characters that cannot be part of a word with strcspn(), save the index i.
scan the characters that can be part of a word with strspn() save this number n.
if n is 0, this is the end of the line
look up the word of n chars at index i in the dictionary (potentially ignoring the case)
if the word cannot be found, set the corresponding characters in the highlight line to ^ characters.
update the index i += n and iterate.
if at least one word was not found in the line, output the line and the highlight line.
Study these standard functions:
strspn()
strcspn()
qsort()
bsearch()
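Here is a small sketch combining these pieces. The tiny hard-coded dictionary, the use of stdin as the checked file, the buffer sizes, and the set of word characters are all placeholders so the example stays self-contained; a real program would load the dictionary file into a dynamically sized array and could fold case before comparing.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define LETTERS "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'"

    static int cmpstr(const void *a, const void *b)
    {
        return strcmp(*(const char * const *)a, *(const char * const *)b);
    }

    static int in_dictionary(const char *word, const char **dict, size_t n)
    {
        return bsearch(&word, dict, n, sizeof *dict, cmpstr) != NULL;
    }

    int main(void)
    {
        /* stand-in dictionary; a real program reads it from the dictionary file */
        const char *dict[] = { "a", "checker", "is", "simple", "spell", "this" };
        size_t ndict = sizeof dict / sizeof *dict;
        qsort(dict, ndict, sizeof *dict, cmpstr);      /* sort once for bsearch() */

        char line[1024];
        while (fgets(line, sizeof line, stdin)) {
            line[strcspn(line, "\n")] = '\0';

            char highlight[1024];
            size_t len = strlen(line);
            memset(highlight, ' ', len);               /* highlight line, same length */
            highlight[len] = '\0';

            int errors = 0;
            size_t i = 0;
            while (i < len) {
                i += strcspn(line + i, LETTERS);       /* skip non-word characters */
                size_t n = strspn(line + i, LETTERS);  /* length of the next word  */
                if (n == 0)
                    break;

                char word[128];
                if (n < sizeof word) {                 /* overly long "words" are skipped */
                    memcpy(word, line + i, n);
                    word[n] = '\0';
                    if (!in_dictionary(word, dict, ndict)) {
                        memset(highlight + i, '^', n); /* mark the misspelled word */
                        errors = 1;
                    }
                }
                i += n;
            }
            if (errors)
                printf("%s\n%s\n", line, highlight);
        }
        return 0;
    }

Given a line like "this is a spel cheker", the sketch prints the line and, below it, a highlight line with ^ under "spel" and "cheker".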

Why do Ctrl-J and Ctrl-M return 10 in C programming on OSX? [closed]

Consider this thread: What is EOF in the C programming language?
The answer was that EOF (Ctrl-D) results in getchar returning -1.
My question is: what do Ctrl-J and Ctrl-M represent in C on OSX, and why does getchar return 10 for both, using the same code as in the link above?
What other shortcuts (Ctrl-something / Cmd-something) result in getchar returning a static predefined number?
Ctrl-J is the shortcut for the line feed control character, which has character code 10. Here is a page with other control characters.
As of this time I do not know why Ctrl-M (ASCII value 13) returns 10, but I assume it is because it is similar in function to the line feed.
The reason EOF returns -1 is that its value is -1 on most systems.
Some other defined characters:
Ctrl-G: 7
Ctrl-I: 9
...
Ctrl-V: 22
stdin is typically in text mode. Various conversions occur per OS concerning line endings when reading/writing in text mode. Ctrl-M is one of them: it is converted to 10. Had the I/O been in binary mode, no conversion would be expected.
Consoles map various keyboard combinations to various chars and actions (like Ctrl-D --> EOF). The chars created certainly include most of the values 0 to 127. As these values are typically mapped to ASCII, the first 32 values (Ctrl-@, Ctrl-A, Ctrl-B, ... Ctrl-_) may have no graphical representation.
Note: Notice what is returned when getchar() is called again after it has returned EOF. Expect it to immediately return EOF again without waiting for any additional key presses. Ctrl-D sets a condition, not a char.
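A minimal sketch to observe this behaviour (just a loop around getchar(); nothing OSX-specific is assumed):

    #include <stdio.h>

    int main(void)
    {
        int c;
        /* In a terminal, input is delivered line by line; Ctrl-J ends the line,
           and Ctrl-M is translated to 10 as well, so both show up here as 10.
           Ctrl-D at the start of a line makes getchar() return EOF. */
        while ((c = getchar()) != EOF)
            printf("read %d\n", c);
        printf("read EOF (%d)\n", EOF);
        return 0;
    }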

How a file on a disk is represented [closed]

I have a weird, or rather stupid, question.
When I open a binary file using a text editor, it doesn't seem to be represented in binary 0s and 1s or in hex, so what representation is that?
IHDR\00\00k\00\00\C3\00\00\00\A2\B6\8D$\00\00\00sBIT|d\88\00\00 \00IDATx\9C̽Y\AC-\CBy\DF\F7\FB\AA\AA\BBװ\87\B3\CFtϹ󽼗
The hard disk (as well as any other digital device in your computer) transmits data as 0s and 1s. All files are just sequences of numbers, and they are all 'binary' in the sense that they are all a bunch of bits. But some of those files can be read by a human (after a simple decoding that is performed by text viewers), and we call those 'text' files; others are in a machine-oriented language and are not targeted at human perception at all, at least not without special software (those are called 'binary').
A text editor tries to display these data as text. As 'plain' text files usually contain text encoded with 8 bits per character, your editor interprets each binary octet (each byte) as an integer containing a character's code, and displays the appropriate character. For some codes there is no printable character in the encoding table; these characters are usually displayed as squares, question marks or (as in your case) their numerical (hexadecimal) codes.
Some editors can show a pure hexadecimal representation of a file, and that is a rather convenient feature for low-level data analysis, since hexadecimal is compact and can quite easily be converted to a binary representation.
What you are seeing is a hexadecimal representation, along with the ASCII representation of the characters your software is able to display.
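For comparison, here is a minimal sketch of what such a hex view does, assuming some example file name: it prints each byte as a two-digit hexadecimal number, with the printable ASCII characters shown alongside and the rest replaced by dots.

    #include <ctype.h>
    #include <stdio.h>

    int main(void)
    {
        FILE *fp = fopen("some_file.png", "rb");   /* example file name, read verbatim */
        if (fp == NULL) { perror("fopen"); return 1; }

        unsigned char buf[16];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            for (size_t i = 0; i < sizeof buf; i++) {     /* hexadecimal column */
                if (i < n)
                    printf("%02X ", buf[i]);
                else
                    printf("   ");
            }
            printf(" ");
            for (size_t i = 0; i < n; i++)                /* character column */
                putchar(isprint(buf[i]) ? buf[i] : '.');
            putchar('\n');
        }
        fclose(fp);
        return 0;
    }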
