Same file looks different when opened with Notepad vs. WordPad

I was accessing a file from code written in both C# and C++. When the file is opened in Notepad it looks like this (one integer at the left and the rest of the numbers are doubles):
But the same file when opened with WordPad looks like this (one integer next to each double):
Why do they look different?

It has to do with the way that newlines are encoded in your file. Windows represents a newline as two characters (\r\n), whereas other operating systems use only one: Unix-based systems use \n, and classic Mac OS used \r. WordPad is smart enough to recognize the other newline types, but Notepad is not.

Because Notepad and WordPad read files in different ways; apparently this file is written in a way that the two interpret differently...

Because Notepad and WordPad understand \r\n differently.

Notepad and WordPad treat "new line" differently - one accepts just \n, another requires \r\n to recognize "new line" (and some would be ok with \n\r).
The same goes for many other editors. E.g., if you try to open the file in Visual Studio, it is likely to ask something like "Do you want to convert Unix new lines to Windows new lines".
If you are writing the file with C#, use WriteLine rather than manually adding \n, or at least use Environment.NewLine to write a "new line" to the stream.
Similarly, in C++ you can write "\r\n" instead of just "\n" if you must open the file in Notepad or another editor that requires that sequence (most editors/viewers are fine with either). Do this only on a stream opened in binary mode: a text-mode stream on Windows already translates "\n" to "\r\n", so writing "\r\n" there produces "\r\r\n".
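As a rough illustration of writing explicit "\r\n" line endings, here is a minimal C sketch. The helper name write_crlf_lines and the file name used below are made up for this example; it assumes the file is opened in binary mode ("wb") so the runtime does not translate the "\n" a second time.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes each line followed by an explicit CR LF.
   The file is opened in binary mode ("wb") so that a Windows C runtime
   does not expand the "\n" again (which would yield "\r\r\n"). */
int write_crlf_lines(const char *path, const char *const *lines, size_t count)
{
    assert(path != NULL && lines != NULL);

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    for (size_t i = 0; i < count; i++) {
        if (fprintf(f, "%s\r\n", lines[i]) < 0) {
            fclose(f);
            return -1;
        }
    }
    /* fclose flushes buffered data, so its result matters too. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Called as write_crlf_lines("out.txt", lines, 2) with two strings in lines, this should produce a file whose lines end in CR LF on any platform.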

Related

How to solve the \r (carriage return) problem that prevents Windows text files from being cross-platform with Linux?

Intro
There is a "feature" in Windows that adds a carriage return character before every newline. This is an old thing, but I am having trouble finding specific solutions to the problem.
Problem
If you create a file in Windows with fopen("file.txt", "w"), it will create a text file where every \n in your code is converted to \r\n. This creates a problem in situations where you try to read the file on Linux, because there is now an unaccounted-for character in the mix, and most line reads depend on reading to \n.
Research
I created text ("w") and binary ("wb") files on Windows and Linux, and tried to read and compare them with the same files made on the other OS.
Binary files do not get an added carriage return, and seem to work fine.
Text files, on the other hand, are a mixed bag:
On Windows, comparing the Windows text file (\r\n) to the Linux text file (\n) will result in them being read equally (this is apparently a feature of some C Windows implementations - \r\n will get automatically read as \n)
On Linux, comparing the files will result in them not being equal and the \r will not be handled, creating a reading problem.
Conclusion
I would like to know of ways to handle this so my text files can be truly cross-platform. I have seen a lot of posts about this, but most have mixed and opposing claims and almost none have specific solutions.
So please, can we solve this problem once and for all after 40 years? Thank you.
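One common way to cope with the situation described above is to make the reader tolerant: read the file in binary mode and strip any trailing \r and \n from each line. The sketch below is only an illustration under that assumption; chomp_line and read_first_line are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes any trailing '\n' and '\r' characters in place
   and returns the new length, so "abc\r\n", "abc\n" and "abc" all become "abc". */
size_t chomp_line(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}

/* Reads the first line of `path` into `out` without its line terminator.
   "rb" is used so the bytes are seen unchanged on every platform.
   Returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *out, int size)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    char *ok = fgets(out, size, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    chomp_line(out);
    return 0;
}
```

With this approach the same reading code should behave identically on Windows and Linux, whichever of the two line endings the file uses.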

creating .mid file: writing a '\n' causes '\r\n' in Windows

I have created a .mid file by writing bytes to a file and save it as .midi. I can run it and it works, but there are some special cases where it does not.
If I write a byte containing \n (ASCII 10), then 2 bytes, \r\n, are written instead, which makes the .mid not runnable. (This is normal behavior on a Windows machine, but not desirable in my case.) An example of writing \n is picking a key that is represented by the byte value 10 (\n).
Is there a workaround to write \n and not \r\n or another way to make sure that byte written is ASCII 10 on a Windows machine?
Thanks!
On Linux/Unix, it doesn't matter whether you specify "wb" or "w" to create a file.
But creating a text file using fopen on Windows means that every \n is converted to \r\n, so if you're using this to create binary files, the binary files will be "corrupt" if they contain any bytes with value 10 (line feed).
Simple solution: always use fopen("file.bin","wb") when creating a binary file, on all platforms, so your code is portable.
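A minimal sketch of that advice, assuming the data is already in a byte buffer. The helper names write_bytes and file_size are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `len` raw bytes to `path` in binary mode,
   so a byte with value 10 is stored as exactly one byte on every platform.
   Returns 0 on success, -1 on failure. */
int write_bytes(const char *path, const unsigned char *data, size_t len)
{
    assert(path != NULL && (data != NULL || len == 0));

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    int close_ok = (fclose(f) == 0);
    return (written == len && close_ok) ? 0 : -1;
}

/* Hypothetical helper: counts the bytes in a file, reading in binary mode.
   Returns -1 if the file cannot be opened. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Writing a buffer that contains the byte 0x0A with write_bytes and then checking file_size is a quick way to confirm that no extra \r bytes were added.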

C file operation font in arabic

I've written a C program consisting of file operations (a .txt file).
When I open the output file in notepad, I don't read the contents in Latin script (or in simpler words - letters of the English alphabet), but some other script.
However when I open the file in C (using fopen etc.), I get the output in English (Latin script) again.
How do I view the output in English (Latin script) in notepad??
Sounds like the classic "Bush hid the facts" problem.
In the Notepad file open dialog, you can see a drop-down that allows specifying an encoding. Since the file does not start with a Unicode BOM, Notepad has to guess the encoding of the file (it does so as soon as you highlight the file in the list). And sometimes it guesses wrong (but that's visible in the drop-down, and you can change it before clicking OK).
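If the program's output is plain ASCII or UTF-8 (an assumption, the question does not say), one possible way to avoid the guessing is to have the C program write a UTF-8 BOM at the start of the file. The helper name write_utf8_with_bom is made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the three-byte UTF-8 BOM (EF BB BF) followed
   by `text`. Assumes `text` is ASCII or UTF-8. Binary mode is used, so any
   "\n" in `text` is written as-is and not translated.
   Returns 0 on success, -1 on failure. */
int write_utf8_with_bom(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    int ok = fwrite(bom, 1, sizeof bom, f) == sizeof bom
          && fputs(text, f) != EOF;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Note that some tools do not expect a BOM in UTF-8 files, so this is a trade-off rather than a universal fix.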

Changing backslashes to forward slashes changes file size

I have two small to medium sized files (2k) that are for all intents and purposes identical. The second file is the result of the first file being duplicated and replacing backslashes with forward slashes. The new file is bigger by 80 bytes (or one byte per line).
I did this with a simple batch script, and at first I thought the script might have unintentionally added some spaces or other artifacts. Or maybe the fact that their extensions are different has something to do with it (one has a tmp extension and the other has a lst extension).
From an editor, I replaced all forward slashes in the new file with backslashes and saved it without changing the extension.
And, hey guess what? The files were the same size again.
Now, before this is written off as a random fluke, I also see the same behavior exhibited in three other pairs of files (in other words six files) created in the same manner as the first. They are all one byte bigger per line in the file. The largest is about 12k bytes, and the smallest is about 2k.
I wouldn't think it has anything to do with escaping because I am on a Windows box using the Windows 7 cmd.exe shell.
Also one other thing. I tried the following:
echo \\\\\ >> a.txt
echo ///// >> b.txt
The files matched in size (7 bytes)
Does anyone have an explanation for this behavior?
I would suggest opening the files with an editor like Notepad++ that shows the type of linefeed (Windows/Mac/Unix). This is most likely your problem if the file size differs 1 byte per line.
Notepad++ can show line endings as small CR/LF symbols (View -> Show Symbol -> Show End of Line) and convert between the Windows/Mac/Unix line endings (Edit -> EOL Conversion).
Unix and classic Mac systems usually store files with a one-byte line ending (Mac: CR, Unix: LF), whereas Windows uses two bytes (CR LF).
Depending on the programs your batch scripts use, this might occur even though your system is a pure Windows box. The reason you don't get a difference when using an editor is that editors usually keep the file's original line endings.
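If no such editor is at hand, the line-ending type can also be checked with a few lines of C. This is only a sketch; the struct eol_counts and the function count_line_endings are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Counts of each line-ending style found in a file. */
struct eol_counts {
    long crlf; /* "\r\n" (Windows) */
    long lf;   /* "\n" alone (Unix) */
    long cr;   /* "\r" alone (classic Mac OS) */
};

/* Hypothetical helper: scans `path` in binary mode and tallies line endings.
   Returns 0 on success, -1 if the file cannot be opened. */
int count_line_endings(const char *path, struct eol_counts *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    out->crlf = out->lf = out->cr = 0;
    int prev_cr = 0; /* was the previous byte a CR? */
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            if (prev_cr)
                out->crlf++;
            else
                out->lf++;
            prev_cr = 0;
        } else {
            if (prev_cr)
                out->cr++; /* the earlier CR was not followed by LF */
            prev_cr = (c == '\r');
        }
    }
    if (prev_cr)
        out->cr++; /* file ended with a lone CR */

    fclose(f);
    return 0;
}
```

Comparing the counts for the two files shows whether a one-byte-per-line size difference comes from CR LF versus LF endings.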
Okay. I just solved it. @schnaader pointed me in the right direction. It actually has nothing to do with the forward slashes or backslashes.
What happened is that my script added one character of trailing white space to each line. The files became the same size again after I reverted the slashes because the editor I used for the find and replace (Komodo Edit) is set up to automatically trim trailing white space on file save.
Funny.

Line Break in Programming

In every programming language I know of, \n represents a line break. But when a string containing it is written to a .txt file and the file is viewed in Notepad, the \n shows up either literally or as some symbol instead of a line break. Other text editors such as Notepad++, Kate, etc. display it fine.
What is the reason for this discrepancy?
Can I make a program, e.g. in C, write line breaks so that Notepad itself displays them as line breaks and not as some symbol in between?
P.S. Everyone who has used Windows should have experienced this at some point in their programming life.
You'll need \r\n for Notepad on Windows. Notepad++ and Kate are far better editors that are aware of both \r\n and \n, and they display the line break as you would like, or as you configure them.
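For text that already contains bare \n characters, a small conversion step before writing is one option. The function lf_to_crlf below is a hypothetical helper written for this example; it assumes the result is then written to a file opened in binary mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies `in` to `out`, turning every bare "\n" into
   "\r\n" and leaving existing "\r\n" pairs untouched.
   Returns the length of the result, or (size_t)-1 if `out` is too small. */
size_t lf_to_crlf(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    size_t n = 0;
    char prev = '\0';
    for (; *in != '\0'; in++) {
        if (*in == '\n' && prev != '\r') {
            /* Keep one byte free for the terminating NUL. */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = '\r';
        }
        if (n + 1 >= out_size)
            return (size_t)-1;
        out[n++] = *in;
        prev = *in;
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

The check against prev avoids doubling the \r when the input already uses \r\n, so the function can be applied to mixed input.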

Resources