I have two files that have the same character ('\n' or 10 in ASCII) but display differently. Why is this?

They're both .iif files, except one is generated by QuickBooks and the other by an application I wrote. I've looked at the binary, and as far as I can tell there is no reason why they display differently in Notepad, and the one I wrote can't be imported into QuickBooks.
Edit: I looked at the binary in Eclipse. Windows requires a carriage return ('\r') in addition to the '\n' to display a new line correctly.

Related

How to solve the \r (carriage return) problem that prevents Windows text files from being cross-platform with Linux?

Intro
There is a "feature" in Windows that adds a carriage return character before every newline. This is an old thing, but I am having trouble finding specific solutions to the problem.
Problem
If you create a file in Windows with fopen("file.txt", "w"), it will create a text file where every \n in your code is converted to \r\n. This creates a problem in situations where you try to read the file on Linux, because there is now an unaccounted-for character in the mix, and most line reads depend on reading to \n.
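A quick way to see this for yourself (a minimal sketch; the file names are placeholders): write the same string through a text-mode stream and a binary-mode stream, then compare the two files in a hex viewer.

#include <stdio.h>

int main(void) {
    FILE *t = fopen("text.txt", "w");   /* text mode: on Windows, '\n' is written as "\r\n" */
    FILE *b = fopen("bin.txt", "wb");   /* binary mode: bytes go to disk unchanged */
    if (!t || !b) return 1;
    fputs("hello\n", t);
    fputs("hello\n", b);
    fclose(t);
    fclose(b);
    return 0;
}

On Windows, text.txt comes out one byte longer than bin.txt; on Linux the two files are identical.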
Research
I created text ("w") and binary ("wb") files on Windows and Linux, and tried to read and compare them with the same files made on the other OS.
Binary files do not get an added carriage return, and seem to work fine.
Text files, on the other hand, are a mixed bag:
On Windows, comparing the Windows text file (\r\n) to the Linux text file (\n) will result in them being read equally (this is the same text-mode translation working in reverse: the Windows C runtime converts \r\n back to \n on read)
On Linux, comparing the files will result in them not being equal and the \r will not be handled, creating a reading problem.
Conclusion
I would like to know how to handle this so my text files can be truly cross-platform. I have seen a lot of posts about this, but most make mixed and opposing claims, and almost none offer specific solutions.
So please, can we solve this problem once and for all after 40 years? Thank you.
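One way to sidestep the translation entirely is to open files in binary mode on every platform and strip the carriage return yourself, so both "\n" and "\r\n" files read identically everywhere. A minimal sketch (the file name is a placeholder):

#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *fp = fopen("file.txt", "rb");  /* binary mode: no hidden translation on any OS */
    if (!fp) return 1;
    char line[1024];
    while (fgets(line, sizeof line, fp)) {
        size_t n = strlen(line);
        if (n > 0 && line[n - 1] == '\n') line[--n] = '\0';  /* drop the newline */
        if (n > 0 && line[n - 1] == '\r') line[--n] = '\0';  /* drop a stray carriage return */
        puts(line);  /* line now has no line-ending characters at all */
    }
    fclose(fp);
    return 0;
}

The same trick works for writing: open with "wb" and decide explicitly which line ending you want to emit.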

DOSBox autoexec menu design

I'm trying to make a (somewhat) stylish DOS menu as a present for my father.
I was able to get the whole menu system to work, but I wanted to gussy it up with some box drawing characters and, possibly, colored text.
In this YouTube video, the user shows an example of what I'm trying to do (example at the 5:00 mark), but doesn't explain how those characters are being rendered. In the Notepad document, it is displayed as goofy characters.
Do I need to save the file with a special type of encoding? Can it only be done in Notepad (I'm using TextEdit on Mac)? Can someone provide an example menu that can be added to DOSBox's [autoexec] config?
Also, I'm not sure if it is possible, but how can the text color/background color be changed? When running DOSBox initially, it shows their welcome screen with a blue background and box drawing characters, so I would think all of that is possible.
I tried using escaped unicode characters and I tried using a capital-E acute (as shown in the linked video), but they just render funky stuff when run in DOSBox.
The discrepancy in characters is a result of different code pages being used in character rendering. English-language Windows uses ANSI code page 1252 (often loosely called Latin-1), while DOS uses OEM code page 437, also known as the IBM PC character set.
The code page that Windows uses will vary based on your system language, so you may need to experiment to find the correct characters, but basically: find the character you want to print in 437 (say ╔, which is 201) and then in your code use the 1252 character at the same position (where 201 is É). Then save the file in ANSI encoding.
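To make the mapping concrete, here is a small C sketch that emits the raw CP437 bytes for a double-line box: 201/205/187 across the top, 186 for the sides, 200/205/188 across the bottom. These are the same bytes you get by typing É, Í, », º, È and ¼ into a file saved as ANSI (Windows-1252).

#include <stdio.h>

int main(void) {
    printf("\xC9\xCD\xCD\xCD\xCD\xCD\xCD\xCD\xBB\n");  /* bytes 201,205,...,187: box top */
    printf("\xBA  MENU \xBA\n");                       /* byte 186: vertical double bars */
    printf("\xC8\xCD\xCD\xCD\xCD\xCD\xCD\xCD\xBC\n");  /* bytes 200,205,...,188: box bottom */
    return 0;
}

As for color: DOSBox has ANSI.SYS emulation built in, so printing an escape sequence such as byte 27 followed by "[44;33m" before the box should give yellow text on a blue background.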

Understanding compile errors due to copying code from a doc file and not a txt file

SITUATION:
My instructor for my micro-controller class refuses to save sample code to a text file and saves it to a Word document instead. When I open the .doc file and copy/paste the code into my IDE, CodeWarrior, it causes errors at compile time.
I am having to rewrite all the code into a text editor and then copy/paste it into my IDE.
MY UNDERSTANDING:
I was told to always save code as a text file, because saving code as a Word document will bring in unwanted characters when you're copy/pasting the code into your IDE for compiling.
MY QUESTIONS TO YOU:
1.)
Can someone explain this dilemma to me so I can understand it better? I would like to present a better case the next time I receive errors, and also to know more about what is happening.
2.)
Is it possible to write a script that will show me all the characters that are being copied and pasted into a file when the code is coming from a Word document vs. a text file? In other words, is there a program that will allow me to see what is going on between copying/pasting code from a Word .doc file versus a .txt file?
Saving source code as a Word document is just silly. If your instructor is insisting on this, chances are no matter how well-reasoned and thorough your argument, they're not going to listen. They're beyond help.
However, to answer your questions: 1) It depends on what you're pasting the thing into. Programs that copy onto the clipboard usually make the data available in several different formats, ranging from their own internal format to plain ASCII text, to maximize compatibility so that the data can be pasted into pretty much any target program. Most text editors will only accept the plain-text version, in which case no extra characters should be transferred. However, if your text editor supports RTF or HTML, this may not be true. I'm not sure what CodeWarrior supports, but it is certainly possible.
A workaround if this is the case: First paste into a PURE text editor like Notepad. Then copy from Notepad into CodeWarrior. This should eliminate any hidden formatting. As shoover said above, make sure double-quotes " are really double-quotes and not the fancy left- and right-specific quotes that Word sometimes uses.
Use a hex editor like XVI32 to see the raw contents of the file, including nonprinting characters. Or use a text editor with support for showing nonprinting characters (vi/vim, etc.).
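If you would rather script it yourself (your question 2), a small C program that passes printable ASCII through and shows every other byte as a hex code makes smart quotes and other Word leftovers jump out. A sketch:

#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if ((c >= 32 && c < 127) || c == '\n')
            putchar(c);             /* printable ASCII and newlines: pass through */
        else
            printf("<0x%02X>", c);  /* everything else: show the hex value */
    }
    fclose(fp);
    return 0;
}

Save the same snippet once via Notepad and once straight from Word, run this on both files, and compare the output.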
I'm studying C and I've just had the same problem. When copying a piece of code from a PDF file and trying to compile it, gcc would return a series of errors. Reading the answer above I had an idea: "What if I converted the UTF-8 into ASCII?". Well, I found a website that does just that (https://onlineutf8tools.com/convert-utf8-to-ascii). But instead of converting the UTF-8 characters into ASCII, it showed them as hexadecimal codes (copying from the website into a text editor makes this easier to see). From there I realised that the problem was mostly the quote marks "".
I then copied the ASCII "translation" into my code editor (I must add that this worked fine with Sublime, while VS Code read the same UTF-8 code as in the original file, even after copying from the website) and replaced all the hex codes with the actual ASCII characters needed to compile the code properly. I used my editor's find-and-replace function to do it. It wasn't very fast, but I believe that if the code you're trying to copy is long, doing it this way could still be faster than rewriting the entire thing.

How to get cakePHP i18n .pot files encoded in UTF-8?

I'm using the cake i18n command to extract the content of my __() functions in my application.
However, the default.pot output file is not encoded in UTF-8, and thus does not correctly display accented characters, which is a problem since the main language is French (lots of 'é', 'à'...).
I'm using wamp server on Windows 7.
I've tried to change Windows console's encoding with chcp, to convert default.pot file in UTF-8 with notepad++ or PSpad editor, without success.
Do you know of any way to get this default.pot file in UTF-8?
All the .php or .ctp files are edited with either Komodo or Geany, both on Windows and configured to use UTF-8.
Also, I'm using Subversion, if that helps.
Thank you for reading.
Had the same problem with CakePHP 1.3 (not sure if it is fixed in 2.x): all "special" characters which are not ANSI compliant (such as ä, ü, ö, ß) were extracted into the .pot file and interpreted there as ANSI (e.g. "ü" instead of "ü").
The solution mentioned by Camille (manually changing the characters) was not very feasible, since there were a lot of characters; this partly destroyed the .pot format and, even worse, an automated update of your .po files won't work.
The workaround I found was with the help of a comment in the PHP documentation for fwrite() (which is what the console task uses): http://www.php.net/manual/en/function.fwrite.php#73764.
According to the description there, I extended the file /cake/console/libs/tasks/extract.php with two lines:
The first line went into the function __buildFiles():
$string = utf8_decode($string);
I wrote it in line 351, but it just needs to be in the second foreach loop and of course before the variable is used by the function.
The second line went into the function __writeHeader():
replace the line $File->write($output); with
$File->write(utf8_encode($output));
That did it for me; but take care, updating your CakePHP will overwrite these changes.
I found a way to deal with this thanks to #Msalters. I changed the default encoding of my editor and overwrote the wrong characters.

How to read lines from a pdf file into a c program using ghostscript?

I am currently taking a course in C programming, and for our final project we need to read some text from a PDF into a string, so we can manipulate the string.
In essence, what I am looking for is something similar to this, only with a .pdf instead of a .txt file:
char line[256];
FILE *fp = fopen("myfile.txt", "r");  /* fscanf needs a FILE*, not a file name */
fscanf(fp, " %255[^\n]", line);       /* read up to one line into the buffer */
I have no experience with Ghostscript, so I have no idea if this is even possible, although we were told that we should use Ghostscript.
The current version of Ghostscript includes the 'txtwrite' device, which will extract text from any supported input (PostScript, PDF, XPS, PCL) and will emit it in a variety of forms.
The UTF-8 output would probably be most useful to you.
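As a concrete sketch of how that fits a C program (assuming gs is on your PATH; the file names are placeholders), shell out to Ghostscript and then read the extracted text line by line:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* -o names the output file and implies batch mode; -q suppresses chatter */
    if (system("gs -q -sDEVICE=txtwrite -o out.txt input.pdf") != 0) {
        fprintf(stderr, "Ghostscript failed\n");
        return 1;
    }
    FILE *fp = fopen("out.txt", "r");
    if (!fp) return 1;
    char line[1024];
    while (fgets(line, sizeof line, fp)) {
        printf("%s", line);  /* manipulate each extracted line here */
    }
    fclose(fp);
    return 0;
}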
Caveat! Many things which appear to be text in PDF files are not text, and no attempt is made to deal with these.
ps2ascii is deprecated with the release of the txtwrite device, but in any case it's perfectly capable (despite the name) of dealing with PDF as an input.
I can't think why anyone assigned you this project, PDF files are not text files, and cannot be treated as such. In addition to the fact that PDF files are generally compressed, identifying the contents stream and all the other streams it relies on (which may themselves include text) is non-trivial. Plus, the text is often encoded in a way which can be difficult to understand (this is particularly true of CIDFonts and TrueType fonts).
Perhaps your tutor expected you to first become expert in the PDF format, but that seems excessive for a C course.
You can convert your PDF to Postscript using pdf2ps, and then to ASCII using ps2ascii. You already know how to read ASCII.
Both utilities mentioned are in the ghostscript package.
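Since the end goal is a C program, the same shell-out approach works here too (a sketch; the file names are placeholders):

#include <stdlib.h>

int main(void) {
    system("pdf2ps input.pdf output.ps");     /* PDF -> PostScript */
    system("ps2ascii output.ps output.txt");  /* PostScript -> plain text */
    /* output.txt can now be read with fgets(), as shown above */
    return 0;
}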
