Unreadable characters in a file on a remote server when viewing in a browser - file

I have to work with a text file on a remote server. The file can be accessed by a direct link using any browser, in the form http://server.school.com/files/people.all (not a real link, since the access requires a password). When I view it in Firefox, some of the characters are unreadable, for example: 'José Luis Paniagua Sánchez'. I have a few questions.
Could the issue be caused by incorrect settings of my browser or could there be a problem with the file itself?
Is opening a file in a web browser and copying the entire content to a text editor using copy/paste inherently different from downloading the information with a script? Could it affect the encoding of the data?
Thanks.

Select the encoding in the browser, most likely UTF-8 (in Firefox: View - Character Encoding). The problem is that the file does not specify its encoding (or specifies a default encoding).
A binary download, like downloading an image file (with which you could try), should keep the file as-is.
Cut-copy-paste using the right encoding in the browser should work for UTF-8.
Assuming it is indeed UTF-8 (multibyte sequences for special chars), and you are working on Windows (where the default code page is typically single-byte), you'd better use a programmer's editor like Notepad++ or jEdit, both free. They can set the encoding explicitly, and even convert.
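To illustrate the points above, here is a minimal Python sketch (not part of the original answer). It shows how UTF-8 bytes become unreadable when decoded with a single-byte encoding, and that the raw bytes themselves are unchanged, so decoding them explicitly as UTF-8 recovers the text. The name comes from the question; using Latin-1 as the wrong encoding is an assumption.

```python
name = "José Luis Paniagua Sánchez"

# The bytes as they would be stored in a UTF-8 file on the server.
raw = name.encode("utf-8")

# Decoding those bytes with a single-byte encoding gives mojibake.
garbled = raw.decode("latin-1")
print(garbled)  # JosÃ© Luis Paniagua SÃ¡nchez

# Decoding the same, untouched bytes as UTF-8 gives the original text.
restored = raw.decode("utf-8")
print(restored)  # José Luis Paniagua Sánchez

# Text that was already garbled this way can often be repaired by
# reversing the wrong decoding step.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == name)  # True
```

A script that downloads the file in binary mode gets the bytes in `raw` directly, so the only decision left is which encoding to decode them with.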

Related

Mercurial: which config property controls the encoding of file contents?

Is there a dedicated Mercurial configuration property which specifies the encoding of file contents and hence should be used by a Mercurial client to properly display a file?
I've found web.encoding which does not seem to mean exactly what I'm looking for. Also, Google gave some results for ui.encoding as well, but I couldn't find any hints in the reference.
Mercurial is not concerned with the encoding of the files you put in your repository: Mercurial is happy to store files with any encoding (or no particular encoding at all).
This means that you can add files with UTF-8, Latin-1, or any other encoding to your repository and Mercurial will check them out exactly as they were when you added them.
The encoding of each file is not stored anywhere in Mercurial and it is up to the client to recognize the encoding (perhaps based on file content where it makes sense, e.g., for XML files).
For a Mercurial desktop client (as per your comments below) I suggest looking at the file content:
Can you decode it as UTF-16?
Can you decode it as UTF-8?
Are there NUL bytes in the file? Then stop and declare it to be "binary".
Fall back on a Latin-N encoding such as Latin-1 for Western Europe.
The UTF-16 and UTF-8 encodings are nice since they are structured and this makes it possible for you to detect that a file isn't valid UTF-8 encoded, say. The above list is written with a European perspective — you should probably also consult someone with knowledge about Shift JIS and other encodings used in Asia.
In any case, I would only expect a Mercurial client to do a best effort attempt at showing me a file with an encoding other than ASCII.
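The detection steps listed above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not what any Mercurial client actually does: the function name guess_encoding is hypothetical, UTF-16 is only accepted when a byte order mark is present (almost any even-length byte string decodes as UTF-16 otherwise), and the NUL check runs before the UTF-8 check because a NUL byte is itself valid UTF-8.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Best-effort guess at the encoding of a file's content."""
    # Assumption: only treat data as UTF-16 if it starts with a BOM.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        try:
            data.decode("utf-16")
            return "utf-16"
        except UnicodeDecodeError:
            pass

    # NUL bytes outside UTF-16 suggest a binary file.
    if b"\x00" in data:
        return "binary"

    # UTF-8 is structured, so invalid input raises an error.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Latin-1 maps every byte to a character, so it never fails.
    return "latin-1"


print(guess_encoding("José".encode("utf-8")))
print(guess_encoding("José".encode("latin-1")))
```

Because Latin-1 accepts any byte sequence, the fallback is only a guess; it cannot tell Latin-1 apart from other single-byte encodings.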
Some alternative interpretations of your question:
If you're really asking about how to make your files look "correct" when you view them in hgweb, then it's a matter of using a consistent encoding in the repository and setting web.encoding.
If you're really asking how to ensure that text files get the OS native line ending character on different platforms (\n on Unix, \r\n on Windows), then take a look at the eol extension that comes with Mercurial.
No. Encoding (charset) is a property of the file in the repository.

File extension .DB - What kind of database is it exactly?

I have a database file with a .DB file extension. I have been googling and it looks like SQLite. I tried to connect to it using the SQLite and SQLite3 drivers and I am getting the error "File is encrypted or not a database".
So I don't know whether the file is encrypted or whether it is not an SQLite database at all. Are there any other formats that the .DB extension could indicate? How do I find out whether the file is encrypted?
I tried to open it in a text editor and it is mostly a mess of characters, with words visible here and there. I have uploaded the file here: http://cl.ly/3k0E01373r3v182a3p1o for a closer look.
Thank you for your hints and ideas what to do and how to work with this file.
Marco Pontello's TrID is a great way to determine the type of any file.
TrID is simple to use. Just run TrID and point it to the file to be analyzed. The file will be read and compared with the definitions in the database. Results are presented in order of highest probability.
Just download the executable and the latest definitions file into the same directory and then run TrID:
trid.exe "path/to/file.xyz"
It will output a list of possible file types for the file with a confidence rating.
There's also a GUI version called TrIDNet.
If you're on a Unix-like platform (Mac OS X, Linux, etc), you could try running file myfile.db to see if that can figure out what type of file it is. The file utility will inspect the beginning of the file, looking for any clues like magic numbers, headers, and so on to determine the type of the file.
Look at the first 30 bytes of the file (open it in Notepad, Notepad++ or another simple text viewer). There's usually some kind of tag or extension name in there.
Both SQLite 2 and SQLite 3 have a very clear message: "SQLite format 3" for SQLite 3 (obviously) and "This file contains an SQLite 2.1 database" for SQLite 2.
Note that encrypted SQLite databases don't have a header like that since the entire file is encrypted. See siyw's comment below.
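The header check described above can be sketched in a few lines of Python. The function name sniff_sqlite and the file name in the usage line are hypothetical; the marker strings are the ones quoted in this answer. A result of "unknown" does not prove the file is encrypted, only that no plain SQLite header was found.

```python
def sniff_sqlite(path: str) -> str:
    """Report which SQLite header, if any, a file starts with."""
    # Read only the beginning of the file, in binary mode.
    with open(path, "rb") as f:
        head = f.read(64)

    # SQLite 3 files start with this string followed by a NUL byte.
    if head.startswith(b"SQLite format 3\x00"):
        return "SQLite 3"

    # SQLite 2 files carry this text near the start of the file.
    if b"This file contains an SQLite 2.1 database" in head:
        return "SQLite 2"

    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python sniff.py myfile.db
    if len(sys.argv) > 1:
        print(sniff_sqlite(sys.argv[1]))
```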
On a Unix-like system (or Cygwin under Windows), the strings utility will search a file for strings, and print them to stdout. Might help you narrow the field.
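Where the strings utility is not available, a rough stand-in can be written in Python. This is only a sketch of the idea, not a reimplementation of the real tool: the function name extract_strings is hypothetical, and it looks only for runs of printable ASCII of at least four characters.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters found in binary data."""
    # Bytes 0x20 to 0x7e are the printable ASCII range.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


sample = b"\x00\x01hello\x02ab\x03world!\x00"
print(extract_strings(sample))  # ['hello', 'world!']
```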
There are a lot of programs besides database programs that use a "db" extension, including
ArcView Object Database File (ESRI)
MultiEdit
Netscape
Palm
and so on. Google "file extensions" for some sites that catalog file extensions and the programs that use them.
There's no conclusive way to know, because an encrypted SQLite database has the entire file encrypted, including the header.
Further, there's not a lot of difference to you, except for possible error text to a user if you're prompting them for a password.

How to copy and paste script text from SSMS to Outlook or Word without garbling it?

Say, I have a script nicely formatted in SSMS and it's annotated with all kinds of comments in different languages. But when I copy and paste this nice thingy into Word with syntax highlighted I will get a syntax-highlighted message with those comments garbled, as if reading the source text with one code page and pasting it using another code page. Very nasty kinda bug. Does anyone know how to solve this issue once and for all?
Thank you!
[Update]
[Solution]
Save → Save with Encoding... → Encoding : Unicode (UTF-8 with signature).
[Related forums]
Need a way to set the default encoding for query files in SMSS. by ChrisMay #Microsoft Connect (Go and upvote this issue at Microsoft Connect)
SQL Server Management Studio - File Encoding #SqlDev
SSMS : Option to set default save as encoding for CSV by AaronBertrand #Microsoft Connect
After some tests, I'm still unable to reproduce the issue. And I have no idea why copy-pasting text from one Unicode-compatible app to another can give such results.
There are several things you can try:
Inside SSMS, save the script as a Unicode file: Save → Save with Encoding... → Encoding : Unicode (UTF-8 with signature). You will then probably be able to open it correctly in Word. The problem is that syntax highlighting will be lost.
Save the script as a Unicode file, then reopen it and copy-paste. Maybe SSMS assumes for some reason that there is some fancy encoding by default, so this will force it to use UTF-8 instead.
Try to paste into different applications (for example the browser). Looking at the first line in your screenshot, I remember once seeing the same problem with some browser renderings described on Wikipedia (can't find the link).
Try copying the same text from Visual Studio (if installed). Copying source code from Visual Studio to Office programs preserves syntax highlighting, so if you observe the same issue, it may come from this syntax highlighting feature.
If nothing works, report the problem to Microsoft Connect, describing precisely the situation so the people at Microsoft will be able to reproduce this issue.
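For reference, "UTF-8 with signature" means the file starts with a three-byte byte order mark that tells readers it is UTF-8. A small Python sketch of what that looks like at the byte level follows; the sample comment text is made up for illustration, and Python's "utf-8-sig" codec is used here as a stand-in for what SSMS writes.

```python
import codecs

# Hypothetical script line with comments in more than one language.
script = "SELECT 1 -- комментарий, commentaire été"

# "utf-8-sig" writes the signature (BOM) followed by plain UTF-8.
with_signature = script.encode("utf-8-sig")
without_signature = script.encode("utf-8")

print(with_signature[:3] == codecs.BOM_UTF8)          # True
print(with_signature[3:] == without_signature)        # True

# Reading with "utf-8-sig" strips the signature again.
print(with_signature.decode("utf-8-sig") == script)   # True
```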

Is file encryption different from content encryption

Is there any difference between encrypting a file and encrypting the content of the file? If so, how do I do each?
File encryption is encrypting a whole file, from the outside (e.g. by right-clicking in Windows XP Explorer). Content encryption is generally used as a synonym for file encryption, but the two things are different. Content encryption means encrypting the contents of a file, or part of the contents of a file.
Consider video streaming. For instance, we might want to encrypt an HD video so that anybody can see the low-res version but only paying subscribers get the top quality stream. We cannot do that by encrypting the whole video file.
There is no difference.
There is no difference; however, it is possible to go an extra step by mapping the name of the file to something completely meaningless (EncFS does this in paranoia mode) or by attempting to hide the encrypted file in some way (maybe as diffused bits in some other media file). However, those aren't really encryption, but rather steganography -- attempting to hide important (usually encrypted) information.

IE6 "helpfully" appends suffix to downloaded file

A webapp I've been developing allows users to upload and download a type of file which is meant to be treated as an opaque blob. My app serves it up with a file extension not commonly used for any other purpose, as well as specifying that its MIME Content-Type is application/octet-stream.
Internally, the file is a simple Zip archive containing a single compressed file. What I've found is that IE6 apparently inspects the content of the file, determines that it's a Zip archive, and "helpfully" saves it with an additional ".zip" extension. Unbelievable!
As I mentioned, this file is meant to be opaque, and we don't want users to be poking around inside the file--not because it's dangerous or contains sensitive information or anything, we just don't want to confuse them. I suggested prepending the Zip content with a magic number to prevent IE6 from recognizing it, but my manager says he'd prefer it if the file content could remain unchanged, so that knowledgeable people can rename the file and examine its contents as a zip archive, if necessary.
Is there any way to tell IE6 to keep its mitts off of the file? Or any alternative approach at all? (Alas, just not supporting IE6 at all is not an option.)
Incidentally, IE7 respects the file's name, but still identifies it as a Zip archive in the download dialog. That's better than IE6, but still less than ideal.
Short answer: Add correct MIME types to your web server so IE6 doesn't guess the file type.
Long Answer:
My work had a similar problem with Microsoft PowerPoint files.
.ppt vs .pps: these are identical files with different extensions. We wanted the user to view a show (.pps), but IE6 kept changing it to .ppt. It changed the extension because the user's machine had PowerPoint installed and understood that the file "looked" like a .ppt. I don't understand why not .pps.
The problem, besides IE6, was that our web server (IIS) was not aware of the MIME type for either .pps or .ppt. So we had to add the correct MIME types so the server would not deliver them as "application/octet-stream". As I understand it, when a file is served as "application/octet-stream", IE6 will try to guess the MIME type.
So we added:
.pps = "application/vnd.ms-powerpoint"
.ppt = "application/vnd.ms-powerpoint"
Now it works fine with IE6.
I hope this helps solve your problem.
Use this response header: Content-Disposition: attachment; filename="yourfilename.extension"
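As a sketch of how that header might be produced on the server side, here is a small Python helper. The function name download_headers and the example file name are hypothetical, the file name is assumed to contain no quote characters, and whether this alone stops IE6 from renaming the download is not something this example can confirm.

```python
def download_headers(filename: str, size: int) -> dict[str, str]:
    """Build response headers that ask the browser to save the body
    under the given name instead of interpreting it."""
    return {
        # Opaque binary content, as in the question.
        "Content-Type": "application/octet-stream",
        # Suggest a download with an explicit file name.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(size),
    }


# Hypothetical file name and size, for illustration only.
for key, value in download_headers("yourfilename.extension", 1024).items():
    print(f"{key}: {value}")
```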
This is a known problem, and the only solution is to edit the client computer's registry, which I'm sure doesn't help you a lot.

Resources