I'm having issues with Azure DevOps using Windows cp1252 file encoding rather than UTF-8 for repositories, which breaks my code. Certain symbols I use are only representable in UTF-8, and I haven't found a way to change how files are encoded when I push to the repository. I also have no real alternative to Azure DevOps as a platform: JetBrains Space is a mess, and no other service offers the same combination of features (boards, integrated CI, everything in one place). So please don't point me elsewhere unless it actually meets those requirements.
Build log (custom Maven-based build environment)
9122 [WARNING]
9123 rip.verse.vserver.commands.implementations.VersionCommand [22,36]
9124 unmappable character (0x95) for encoding UTF-8
VersionCommand.java
19 public boolean execute(Sender sender, String[] arguments) {
20 if (arguments.length == 0) {
21 // ...
22 sender.sendTranslatedMessage("&8• &7Version: &b1.0");
23 // ...
24 return true;
25 }
26 }
I found a similar issue reported on the Developer Community forum, and the product group gave the following answer.
This is by design. Editing a file (in Git) in the VSTS Code hub will save that file as UTF-8. This behavior was selected since UTF-8 is ubiquitously used, the encoding can be accurately determined, and it can encode all the special characters of the various flavors of ANSI as well as other languages. Of course, this might present a problem if you're specifically expecting special characters with Win 1252 (or other) encoding in another application or particular purpose, and if that's the case then you'll want to avoid editing in the web UI. I'm curious to know if that's the case, but I would personally recommend eventually using UTF-8 if possible to avoid encoding issues of the past.
Therefore, Windows cp1252 file encoding is not supported in Azure Repos; you need to save your files using UTF-8 encoding. Please review the blog post "How TFS Version Control determines a file's encoding", and "What is the default encoding for source files in Visual Studio 2017?" if you use Visual Studio.
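If you have sources already saved as cp1252, one practical fix is to convert them to UTF-8 once before committing. A minimal sketch in Python (the path and encoding names are parameters you'd adapt to your tree):

```python
from pathlib import Path

def reencode(path, src="cp1252", dst="utf-8"):
    """Rewrite a text file from the src encoding to the dst encoding in place."""
    p = Path(path)
    text = p.read_bytes().decode(src)   # decode the legacy bytes
    p.write_bytes(text.encode(dst))     # write back as UTF-8
```

Run something like this over the source tree once; from then on, configure your editor (and, for the Maven build shown above, the `project.build.sourceEncoding` property) to read and write UTF-8 so javac stops reporting unmappable characters.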
Related
I have to work with a text file on a remote server. The file can be accessed by a direct link in any browser, in the form http://server.school.com/files/people.all (not a real link, since access requires a password). When I view it in Firefox, some of the characters are unreadable, for example: 'José Luis Paniagua Sánchez'. I have a few questions.
Could the issue be caused by incorrect settings of my browser or could there be a problem with the file itself?
Is opening a file in a web browser and copying the entire content to a text editor using copy/paste inherently different from downloading the information with a script? Could it affect the encoding of the data?
Thanks.
Select the encoding in the browser, likely UTF-8 (in Firefox: View - Character Encoding). The problem is that the server does not specify the encoding of the file (or specifies a default encoding).
A binary download, like downloading an image file (which you could try), should keep the file as-is.
Cut-copy-paste using the right encoding in the browser should work for UTF-8.
Assuming it is indeed UTF-8 (multibyte sequences for special chars), and you are working on Windows (where the default ANSI code page is single-byte), you'd be better off using a programmer's editor like Notepad++ or jEdit, both free. They can set the encoding explicitly, and even convert between encodings.
Is there a dedicated Mercurial configuration property which specifies the encoding of file contents and hence should be used by a Mercurial client to properly display a file?
I've found web.encoding, which does not seem to be exactly what I'm looking for. Google also gave some results for ui.encoding, but I couldn't find any hints in the reference.
Mercurial is not concerned with the encoding of the files you put in your repository: Mercurial is happy to store files with any encoding (or no particular encoding at all).
This means that you can add files with UTF-8, Latin-1, or any other encoding to your repository and Mercurial will check them out exactly as they were when you added them.
The encoding of each file is not stored anywhere in Mercurial and it is up to the client to recognize the encoding (perhaps based on file content where it makes sense, e.g., for XML files).
For a Mercurial desktop client (as per your comments below) I suggest looking at the file content:
Can you decode it as UTF-16?
Can you decode it as UTF-8?
Are there NUL bytes in the file? Then stop and declare it to be "binary".
Fallback on a Latin-N encoding such as Latin-1 for Western Europe.
The UTF-16 and UTF-8 encodings are nice since they are structured and this makes it possible for you to detect that a file isn't valid UTF-8 encoded, say. The above list is written with a European perspective — you should probably also consult someone with knowledge about Shift JIS and other encodings used in Asia.
In any case, I would only expect a Mercurial client to do a best effort attempt at showing me a file with an encoding other than ASCII.
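The checklist above can be sketched as a small function. This is a best-effort heuristic, not anything from the Mercurial API; the BOM check for UTF-16 and the NUL test are simplifications (a BOM-less UTF-16 file would be misclassified here):

```python
def guess_encoding(data: bytes) -> str:
    """Best-effort charset guess following the checklist above."""
    # UTF-16 files commonly begin with a byte-order mark.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    # NUL bytes in a BOM-less file: declare it binary and stop.
    if b"\x00" in data:
        return "binary"
    # UTF-8 is structured, so a file that isn't valid UTF-8 fails to decode.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Fall back to a Latin-N encoding; Latin-1 accepts any byte.
        return "latin-1"
```

As the answer notes, the fallback branch is where a European bias creeps in; a client aimed at Asian locales would test Shift JIS and friends before settling on Latin-1.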
Some alternative interpretations of your question:
If you're really asking about how to make your files look "correct" when you view them in hgweb, then it's a matter of using a consistent encoding in the repository and setting `web.encoding`.
If you're really asking how to ensure that text files get the OS-native line ending character on different platforms (\n on Unix, \r\n on Windows), then take a look at the eol extension that comes with Mercurial.
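For the hgweb case, the setting looks like this in your hgrc (assuming the files in the repository are consistently UTF-8):

```ini
[web]
encoding = utf-8
```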
No. Encoding (charset) is a property of each file in the repository.
I have a legacy system which used to run on DOS. It is an ERP system for retail stores (fashion). I think it stores its data in flat files.
I have files ending with *.KEY and other files ending with *.D00 (counting up).
I think the .KEY files hold the key information and the D-files hold some data... there are a lot of .D77 files...
As far as my investigation goes, this is not DBF or FoxPro; it could be proprietary...
The company that wrote it is out of business, of course, so there is no chance of support or any hints.
When I open these files in vim or other editors I get some binary characters and some text... I tried it in hex mode, but still nothing usable...
Is there any chance I can dump out the data... to CSV, ASCII, XML?
I am pretty sure this is not a standard format. Can someone point me in a direction: how was data like this stored back in the day, and how could I make it readable...
Any tools, tips or tricks?
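As a first pass at getting something readable out of the files, you can pull out the printable runs, the same idea as the Unix strings(1) tool. A quick sketch (the minimum run length of 4 is arbitrary):

```python
import re

def extract_strings(path, min_len=4):
    """Return the printable-ASCII runs of at least min_len bytes in a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]
```

This won't recover the record structure, but field names and record contents usually show up, which can help identify the underlying format.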
// EDIT
After some time I made some progress and can now post some details which I did not know back then, and which made a good answer impossible.
I assume that the DOS system was written in Visual COBOL and that the files could be B-tree files stored in ISAM format. The closest I can narrow it down is that the format is possibly C-ISAM.
How can I access, view, or modify these files... C#, Java, Ruby... any modern language would be cool... I am not sure I can handle COBOL... It would be great to have a converter or viewer tool, preferably open source...
Hope this clarifies my question =)
OpenCOBOL has a very active user group. The language itself is free and runs on Linux, Windows, and perhaps Mac OS X. Have a chat with the user group there; they may be able to help.
Peachtree Accounting Software used those file extensions back in 1992.
I got this work assignment from my boss where I shall try to get information from an old database. The thing is, we know nothing about it. We hope it is some known format and not something the developer made himself.
It comes standalone with an application (unknown language) and seems to be a mix of file types. In one folder there are, for example:
MISCINFO.BRG (27 531 kb)
MISCINFO.IDX (264 kb)
MISCINFO.LOG (30 422 kb)
In another folder there are a bunch of VIS files.
I don't really know where to start. I need some driver to access these files, preferably via ODBC, or just some way to open them.
.brg could be a bridge file mentioned here:
http://www.recital.com/adminDBS.htm
The application in question comes with some DLL files. One of them is DATABASE.DLL, which contains a couple of names of people in plain text. I've searched for some of the names on Google and found a Delphi programmer, whom I've contacted and am awaiting a reply from. I've verified with other sources that Delphi is the application language.
According to Dependency Walker, DATABASE.DLL contains some functions for opening/closing a connection and for fetching, updating, and deleting data. Some function names indicate that the DLL is custom made. Perhaps I can use the same DLL.
Dependency Walker only shows exported functions, not anything about their parameters. Some files it can't open at all because they are 16-bit.
Well, the best way to go is to look at what software is known to use files with those extensions. LOG isn't much use, but BRG, VIS and IDX are reasonably rare.
VIS files:
Picture File
StudioPro 3D File
Vision Executive (Report) by Lasata Software
VISkompakt (Objects Description File) by PDV-Systeme GmbH
Vista Graphics
BRG files:
The only reference I can find is for Age of Mythology, which seems unlikely.
IDX files:
AOL (Temporary Internet Mail File)
ArcView (Geocoding Index For Read-Only Datasets) by ESRI
Ca Visual Objects Platform for Developer (CAVO) (Index File) by CA
Clip Gallery 1.x (Index) by Microsoft Corporation
Complete Works (Index File) by Toplevel Computing
Corel QuickFinder Information
FAX File
FoxPro (Index) by Microsoft Corporation
ICQ (Index) by ICQ Inc.
Java (Applet Cache Index) by Sun Microsystems, Inc.
LaTeX Index
NoX
Outlook Express (Mailbox Index) by Microsoft Corporation
Pro/ENGINEER (Index File) by PTC
RCA/Lyra Handheld MP3 Player (Database Index) by RCA
Symantec Q&A (Relational Database Index File) by Symantec Corporation
VSFilter (Index File)
Since none of those look that hopeful (there are no products I can see in both the VIS and IDX lists), I'm afraid your hope that it's not a custom format is likely to be in vain.
You might want to try the 'file' command on those files on a Linux system. file ignores the file extension; it actually examines the content to identify the format.
So copy the files to a Linux machine and execute the following command in a terminal window:
Usage:
$ cd my_directory_with_unknownfiles
$ file *
I would like to know the procedure to adopt to parse and obtain the text content from Microsoft Word (.doc and .docx) documents. The programming language used should be plain C (compiled with gcc).
Are there any libraries that already do this job?
Extension: can I use the same procedure to parse text from Microsoft PowerPoint files as well?
Microsoft Word documents are an enormous beast - you definitely don't want to be writing this code yourself. Look into using an existing free Word library such as antiword or wvWare.
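One thing worth noting: that warning applies to the legacy binary .doc format. The newer .docx format is just a ZIP archive with the body text in word/document.xml, so you can get at the text with nothing but a ZIP and XML reader. The question asks for C, but the approach is quickest to show as a sketch in Python; the tag-stripping regex here is a crude illustration, not a proper WordprocessingML parser:

```python
import re
import zipfile

def docx_text(path):
    """Extract the raw text of a .docx by reading word/document.xml from the ZIP."""
    with zipfile.ZipFile(path) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    # Crude: drop every tag and keep the character data between them.
    return re.sub(r"<[^>]+>", "", xml)
```

Paragraphs will run together without separators; a real extractor would walk the `<w:p>` and `<w:t>` elements. The same ZIP-of-XML idea applies to .pptx (slide text lives under ppt/slides/), but not to the old binary .ppt.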
I don't know about libraries that exist, but the format specifications are available from Microsoft for free and under a promise not to sue you for using them.
On Windows, let Word do the job and interface with it as a COM object; on Linux, the job was done in antiword. Or you can automate OpenOffice.org on any platform with the UNO object model.
If you're willing to go through the effort of using a COM interface in C, you can use the IFilter interface built into every version of Windows since Windows 2000. You can use it to extract text from any office document (Word, Excel, etc.), PDF file or any file type that has IFilter support installed.
I wrote a blog post about it a few years back. It's all C++, but you can use COM objects from C.