Mercurial: which config property controls the encoding of file contents?

Is there a dedicated Mercurial configuration property which specifies the encoding of file contents and hence should be used by a Mercurial client to properly display a file?
I've found `web.encoding`, which does not seem to mean exactly what I'm looking for. Google also gave some results for `ui.encoding`, but I couldn't find any hints in the reference.

Mercurial is not concerned with the encoding of the files you put in your repository: Mercurial is happy to store files with any encoding (or no particular encoding at all).
This means that you can add files with UTF-8, Latin-1, or any other encoding to your repository and Mercurial will check them out exactly as they were when you added them.
The encoding of each file is not stored anywhere in Mercurial and it is up to the client to recognize the encoding (perhaps based on file content where it makes sense, e.g., for XML files).
For a Mercurial desktop client (as per your comments below) I suggest looking at the file content (a sketch follows this list):
Can you decode it as UTF-16?
Can you decode it as UTF-8?
Are there NUL bytes in the file? Then stop and declare it to be "binary".
Fall back on a Latin-N encoding such as Latin-1 for Western Europe.
The UTF-16 and UTF-8 encodings are nice since they are structured, which makes it possible to detect that a file isn't valid UTF-8, say. The above list is written with a European perspective — you should probably also consult someone with knowledge about Shift JIS and other encodings used in Asia.
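Here is a minimal sketch of that heuristic in Python. The function name and the BOM-based UTF-16 check are my own assumptions, not anything Mercurial itself provides:

```python
# Hypothetical helper: guess a file's encoding from its raw bytes.
def guess_encoding(data: bytes) -> str:
    # UTF-16 text normally starts with a byte-order mark (BOM).
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    try:
        data.decode("utf-8")  # structured, so a failure is meaningful
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if b"\x00" in data:
        return "binary"       # NUL bytes: stop and treat as binary
    return "latin-1"          # fallback; Latin-1 decodes any byte
```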
In any case, I would only expect a Mercurial client to do a best effort attempt at showing me a file with an encoding other than ASCII.
Some alternative interpretations of your question:
If you're really asking about how to make your files look "correct" when you view them in hgweb, then it's a matter of using a consistent encoding in the repository and setting `web.encoding`.
If you're really asking how to ensure that text files get the OS-native line ending on different platforms (\n on Unix, \r\n on Windows), then take a look at the eol extension that comes with Mercurial; a configuration sketch follows.
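For reference, a hypothetical hgrc excerpt covering both points; adjust to your repository:

```
[web]
# Encoding hgweb uses when rendering file contents.
encoding = UTF-8

[extensions]
# Enable the bundled eol extension; per-repo line-ending rules
# then go in an .hgeol file at the repository root.
eol =
```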

No. The encoding (charset) is a property of each file in the repository.

Related

File encoding for repositories on DevOps (cp1252/UTF-8)

I'm having issues with DevOps using Windows cp1252 file encoding rather than UTF-8 for repositories, resulting in my code breaking. I have certain symbols that are only supported by UTF-8, and I have not found a way to change how I am pushing to the repository. I have no alternative to Azure DevOps as a platform (JetBrains Space is a mess, and no other service offers the same combination of features: boards, the integrated CI, everything in one place), so please don't point me elsewhere unless it actually meets these requirements.
Build log (custom Maven-based build environment)
9122 [WARNING]
9123 rip.verse.vserver.commands.implementations.VersionCommand [22,36]
9124 unmappable character (0x95) for encoding UTF-8
VersionCommand.java
19 public boolean execute(Sender sender, String[] arguments) {
20 if (arguments.length == 0) {
21 // ...
22 sender.sendTranslatedMessage("&8� &7Version: &b1.0");
23 // ...
24 return true;
25 }
26 }
I found a similar issue reported on the Developer Community forum, and the product group gave the following answer.
This is by design. Editing a file (in Git) in the VSTS Code hub will save that file as UTF-8. This behavior was selected since UTF-8 is ubiquitously used, the encoding can be accurately determined, and it can encode all the special characters of the various flavors of ANSI as well as other languages. Of course, this might present a problem if you're specifically expecting special characters with Win 1252 (or other) encoding in another application or particular purpose, and if that's the case then you'll want to avoid editing in the web UI. I'm curious to know if that's the case, but I would personally recommend eventually using UTF-8 if possible to avoid encoding issues of the past.
Therefore, Windows cp1252 file encoding is not supported in Azure Repos; you need to save your files using UTF-8 encoding. Please review the blog post "How TFS Version Control determines a file's encoding", and "What is the default encoding for source files in Visual Studio 2017?" if you use Visual Studio.
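As a one-off fix, a short script can re-encode the affected sources before you push. This is a minimal sketch assuming the file really is cp1252; the path is a guess based on the package name and a standard Maven layout:

```python
# Re-encode a source file from Windows-1252 to UTF-8 in place.
from pathlib import Path

# Hypothetical path, assuming the standard Maven src/main/java layout.
src = Path("src/main/java/rip/verse/vserver/commands/implementations/VersionCommand.java")
text = src.read_text(encoding="cp1252")  # decode using the legacy encoding
src.write_text(text, encoding="utf-8")   # write back as UTF-8
```

Once the files really are UTF-8, also set the standard `project.build.sourceEncoding` property to UTF-8 in your POM so the "unmappable character" warnings don't come back on another machine.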

Unreadable characters in a file on a remote server when viewing in a browser

I have to work with a text file on a remote server. The file can be accessed by a direct link using any browser, in the form http://server.school.com/files/people.all (not a real link, since access requires a password). When I view it in Firefox some of the characters are unreadable, for example: 'José Luis Paniagua Sánchez'. I have a few questions.
Could the issue be caused by incorrect settings of my browser or could there be a problem with the file itself?
Is opening a file in a web browser and copying the entire content to a text editor using copy/paste inherently different from downloading the information with a script? Could it affect the encoding of the data?
Thanks.
Select the encoding in the browser, likely UTF-8. In Firefox: View - Character Encoding. The problem is that the server does not specify the encoding of the file (or specifies a default encoding).
A binary download, like downloading an image file (which you could try), should keep the file as-is.
Cut-copy-paste using the right encoding in the browser should work for UTF-8.
Assuming it is indeed UTF-8 (multibyte sequences for special chars), and you are working on Windows (where many editors default to a single-byte legacy encoding), you'd better use a programmer's editor like Notepad++ or jEdit, both free. They can set the encoding explicitly, and even convert.
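To tell whether the browser or the file is at fault, you can fetch the raw bytes with a script and decode them explicitly. A minimal sketch; the URL is the placeholder from the question and would need your credentials in practice:

```python
import urllib.request

# Fetch the file as raw bytes; nothing re-encodes them on the way.
with urllib.request.urlopen("http://server.school.com/files/people.all") as resp:
    raw = resp.read()

# Decode explicitly; a UnicodeDecodeError here means the file is not UTF-8.
text = raw.decode("utf-8")
print(text[:200])
```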

.bin files used for upgrading embedded devices

I am a bit confused about .bin files. In Linux we normally deal with ELF and .ko files when upgrading a box or copying things onto it. But when upgrading the NAND flash in a router or other networking products, why is a .bin file always preferred? Is it some converged mix of all the OS-related files? Is it possible to see the contents of a .bin file, and how do I work with it? Is it something like the contents of a BootROM? How is it prepared, and how would I create and test one? How does Linux support this? Are there any historical reasons behind it?
Speaking about routers, those files are usually just snapshots of a router's flash memory, probably compressed and with some headers added. Typical things are a compressed squashfs image or simply a gzip'ed snapshot of memory.
There is no such thing as a .bin format; it's just a custom array of bytes, and every vendor interprets it in some vendor-specific way. Basically this extension means “it's not your business what's in the file, our device/software will handle it”. You can try to identify (think: reverse-engineer) what's actually in those files by using the file utility, or by looking at them in a hex editor and trying to guess what's going on.
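If you want to poke at an image by hand, here is a minimal sketch along those lines. The file name is hypothetical and the magic-number table is a tiny illustrative subset (real tools like file know many more):

```python
# Peek at the first bytes of a firmware image for well-known magic numbers.
path = "firmware.bin"  # hypothetical file name
with open(path, "rb") as f:
    header = f.read(8)

magics = {
    b"\x1f\x8b": "gzip-compressed data",
    b"hsqs": "squashfs filesystem (little-endian)",
    b"sqsh": "squashfs filesystem (big-endian)",
    b"\x27\x05\x19\x56": "U-Boot legacy uImage",
}
for magic, desc in magics.items():
    if header.startswith(magic):
        print(f"looks like: {desc}")
        break
else:
    print("unknown; inspect with a hex editor or the file utility")
```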

extract .bin files

So I have an old dictionary on my PC, so old that I cannot find any trace of its developer or its website (I guess it was never even released as official software). I have a personal project of mine and I might need some of these words translated (about 200-300). I see that the data folder contains the database/list of files, but I'm unable to extract or read these files.
Is there any way to extract or convert these .bin files to a text format or something readable? I've used some tools (Alcohol 120%, IsoBuster, MagicISO, IZArc) but with no luck; I keep getting an error message saying it is not a valid CD image file. So I'm thinking maybe this type of .bin file is not like the .bin/.iso CD images that you can mount and read, and something else might be going on here.
If you have any information, kindly reply with your suggestions. Thank you a lot.
You can try using the strings utility to extract the strings from the file. It comes with any Linux distribution, and if you are on Windows you can get it from Windows Sysinternals.
If you are lucky and the words are not encoded, you may be able to get at the data you are looking for; a rough equivalent is sketched below.
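For reference, a minimal sketch of what strings does, in case you'd rather stay in a script; the file name is hypothetical:

```python
import re

# Read the opaque binary file.
with open("dictionary.bin", "rb") as f:
    data = f.read()

# Print runs of 4+ printable ASCII characters, similar to strings' default.
for match in re.finditer(rb"[\x20-\x7e]{4,}", data):
    print(match.group().decode("ascii"))
```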
.bin is one of those extensions that has been way overused and could be anything... Where did the file come from originally? Do you need to convert these words and store them back in the original file (in their transformed form), and then expect the original app to work correctly?

Legacy dos system with flat file data store (ISAM-Files)

I have a legacy system which used to run on DOS. It is an ERP system for retail stores (fashion). I think it stores its data in flat files.
I have files ending with *.KEY and other files ending with *.D00 (counting up).
I think the key files hold the key information and the D-files hold some data... there are a lot of D77 files...
As far as my investigation goes, this is not DBF or FoxPro; it could be proprietary...
The company that wrote it is out of business, of course, so there is no chance of support or any hints.
When I open these files in vim or other editors I get some binary characters and some text... I tried hex mode but still found nothing usable...
Is there any chance I can dump out the data... in CSV, ASCII, XML?
I am pretty sure that this is not a standard format. Can someone point me in a direction: how was data like this stored back in the day, and how could I make it readable?
Any tools, tips or tricks?
// EDIT
After some time I made some progress and can now post some details which I did not know back then, and which made a good answer impossible.
I assume that the DOS system was written in Visual COBOL and that the files could be B-tree files stored in ISAM format. The closest thing I can offer is that the format is possibly C-ISAM.
How can I access, view or modify these files... C#, Java, Ruby... any modern language would be cool... I am not sure I can handle COBOL... It would be great to have a converter or a viewer tool, preferably open source...
Hope this clarifies my question =)
OpenCOBOL has a very active user group. The language itself is free and runs on Linux and Windows and perhaps Mac OS X. Have a chat with the user group there; they may be able to help.
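If the .D files do turn out to be fixed-length ISAM records, a crude first pass in a modern language can make them legible before you commit to COBOL tooling. This is a sketch under loud assumptions: the record length and file name are guesses you will have to adjust, and cp437 (the usual DOS code page) is assumed for the text fields:

```python
# Dump a DOS-era data file as fixed-length records. The record length is a
# guess: try divisors of the file size, or look for repeats in a hex dump.
RECLEN = 128  # hypothetical: adjust until fields line up in the output

with open("STORE.D00", "rb") as f:  # hypothetical file name
    data = f.read()

for offset in range(0, len(data) - RECLEN + 1, RECLEN):
    record = data[offset:offset + RECLEN]
    # cp437 is the classic DOS code page; binary fields come out as noise.
    print(record.decode("cp437", errors="replace"))
```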
Peachtree Accounting Software used those file extensions back in 1992.
