SSMS 'script as' is saving as utf-16 - sql-server

When I check my default SSMS file format it is Windows-1252, and opening a saved script in VS Code shows me it is UTF-8.
However, when I do a 'Generate Scripts', all I get are options for 'Unicode' and 'ASCII'. When I save as Unicode it is saved as UTF-16, which causes havoc with git. Saving a script using File → Save As is fine.
I'm no expert on code pages and encodings, but I presume that saving as 'ASCII' really is just ASCII and will lose any non-ASCII characters that are valid in UTF-8.
How do I change this 'script as Unicode' from UTF-16 to UTF-8, or, if that is a daft question, get it to save using the default option I have for standard files?
I'm using SSMS version 17.6.
Thanks.
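Since 'Generate Scripts' only offers UTF-16 for Unicode output, one workaround is to re-encode the generated files before committing. A minimal sketch, assuming the files really are UTF-16 with a BOM (as the SSMS 'Unicode' option produces); the "scripts" directory name is hypothetical:

```python
# Batch-convert SSMS "Generate Scripts" output from UTF-16 to UTF-8
# so git diffs the files as text.
from pathlib import Path

def utf16_to_utf8(path: Path) -> None:
    text = path.read_text(encoding="utf-16")  # the BOM selects LE vs BE
    path.write_text(text, encoding="utf-8")   # rewritten without a BOM

for sql in Path("scripts").glob("*.sql"):
    utf16_to_utf8(sql)
```

Running this as a post-generation step (or a git pre-commit hook) keeps the repository UTF-8 throughout.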

Related

Fix Japanese character issue in unix box

Our company build process uses Gradle and copies the deliverables, including SQL files, to a shared drive. Unix boxes can refer to these paths and run them via a shell script. The shell script invokes the db2 command-line utility to run the SQL files against the corresponding environment's database.
This works pretty well. However, I have hit a wall now that we are updating Japanese characters in the SQL file. When I download the file from the browser and inspect it, I can see the characters properly. However, on the Unix box, the content of the SQL file looks quite different. I used cat, more and vi to view the contents of the file. The value of the LANG environment variable is C. I see the environment variable TERM=xterm, so I think xterm is the terminal emulator.
LANG=C
When I run the command in the linux terminal window-
locale charmap
I see the encoding as ANSI_X3.4-1968. UTF-8 is one of the possibilities, but it is not set on this box, I think.
When I run the command in the linux terminal -
file -bi 1.sql
I see the charset encoding of the file as UTF-8.
When we run the shell script that invokes the db2 command-line utility, it inserts whatever I see in the Unix file, not the actual Japanese characters.
I tried searching for what could go wrong. Please find my analysis below:
When I run the SQL from Aqua Data Studio, the data is inserted properly. So there is no issue with the actual database setup or the table configuration.
There are some other XML files in the shared path which have Japanese characters. On the Unix box, I again see the same distorted characters as in the SQL file. When these XML files get copied to the reporting server, they work fine and the data is shown properly in the PDF.
So my line of thinking is that there may be some way to tell the db2 command-line utility to use the correct character set, and that should work fine. But I am unable to find anything useful on the internet.
There is a chance that there is something small which I am not able to figure out, hence my request for help. This is not very critical, as we can always drop the automated process of deployment and SQL insertion and instead go with manual insertion by giving the scripts to the DBA. However, this is highly discouraged.
Do let me know if you can help me in this regard. Just FYI, I unzipped the WAR produced by the Gradle build (from the Jenkins server) and I can see the file properly in Notepad++. Do let me know if you need any other info.
If the Japanese characters do not render correctly in cat or more in your terminal window then it means that the LANG variable has the wrong setting. In your case LANG=C is not suitable for Unicode.
Some terminal emulators, such as PuTTY, need to have the translation set to support UTF-8, and that may not be the default - otherwise Unicode characters can render incorrectly on screen.
The correct LANG setting depends on your country, on which locales are installed in Linux, and also on the target database territory and codepage and the Db2-server operating system (i.e. Db2 for z/OS, i-series, or Linux/Unix/Windows). Sometimes you have to install additional locales if they were not installed by default; it depends on your Linux distro, platform, version, and installation policies.
The solution is normally to ensure you set the LANG variable correctly before you connect to any Db2 database. It's wise to assert such things in configuration checks that get automatically run by scripts. You get the correct value for LANG from the Db2 documentation, and that depends on your Db2-server operating system and version and on your knowledge of the target database encoding and territory (or the CCSID of the target database/tables/columns/tablespaces etc.). For example, if your Db2 database lives on Linux/Unix/Windows, look at this page. Other pages exist for Db2 for z/OS or i-series.
Once you change the LANG variable (export LANG=...), verify again that more or cat renders the Japanese characters correctly. If you are already connected to the database, changing LANG has no effect, so remember to disconnect first. Test LANG values until the Japanese characters look correct, and only then try the Db2 connection and examine the new behavior.
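The mismatch described above can be reproduced directly: the UTF-8 bytes in the file are fine, and only the way a non-UTF-8 locale interprets them is wrong. A minimal Python sketch, with latin-1 standing in for the single-byte view a C locale gives you:

```python
# Mojibake demo: same bytes, two interpretations.
text = "日本語"                      # three Japanese characters
raw = text.encode("utf-8")          # nine bytes - what the .sql file contains
garbled = raw.decode("latin-1")     # what a non-UTF-8 locale effectively shows
assert garbled != text
assert raw.decode("utf-8") == text  # a UTF-8 locale renders the bytes correctly
```

This is why fixing LANG (rather than editing the file) is the right repair: the file never contained wrong data.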

converting .trc 2 ascii

I have accidentally saved some data using the Oracle TRACE file format. I have been trying to convert these data files into normal text (ASCII) files for a day now without much success. Can someone point me in the right direction? I would prefer to use Linux but also have access to a Windows machine. I could upload an example file as well. The files come from a RIGOL scope, and using vim to peer into them gives something along the lines of: "sc8^#DS1104Z^#^#^#^#^#^#^#^#^#^#^#^#^#00.04.01.SP2^#^#^#^#^#^#^#^#^A^#^E^#^[s`¢^#^#^#^#<8c><9c>^#^#^L^#^B^#^#É^C^#ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ..."

Unreadable characters in a file on a remote server when viewing in a browser

I have to work with a text file on a remote server. The file can be accessed by a direct link using any browser, in the form http://server.school.com/files/people.all (not a real link, since the access requires a password). When I view it in Firefox some of the characters are unreadable, for example: 'José Luis Paniagua Sánchez'. I have a few questions.
Could the issue be caused by incorrect settings of my browser or could there be a problem with the file itself?
Is opening a file in a web browser and copying the entire content to a text editor using copy/paste inherently different from downloading the information with a script? Could it affect the encoding of the data?
Thanks.
Select the encoding in the browser - likely UTF-8. In Firefox: View - Character Encoding. The problem is that the server does not specify the encoding of the file (or specifies a wrong default encoding).
A binary download, like downloading an image file (which you could try), should keep the file as-is.
Copy-paste with the right encoding selected in the browser should work for UTF-8.
Assuming it is indeed UTF-8 (multibyte sequences for special characters), and you are working on Windows (where the default ANSI code pages are single-byte), you had better use a programmer's editor like Notepad++ or jEdit, both free. They can set the encoding explicitly, and even convert between encodings.
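If a copy/paste has already produced the 'JosÃ©'-style text, the damage is often reversible as long as every byte survived: re-encode with the wrong charset, then decode with the right one. A sketch, assuming the usual UTF-8-read-as-Latin-1 mix-up:

```python
# "é" is 0xC3 0xA9 in UTF-8; read as Latin-1 those two bytes display as "Ã©".
# Reversing the two steps recovers the original text.
broken = "JosÃ© Luis Paniagua SÃ¡nchez"
fixed = broken.encode("latin-1").decode("utf-8")
assert fixed == "José Luis Paniagua Sánchez"
```

This only works when no byte was dropped or replaced along the way; if the paste substituted '?' or U+FFFD characters, the information is gone.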

Mercurial: which config property controls the encoding of file contents?

Is there a dedicated Mercurial configuration property which specifies the encoding of file contents and hence should be used by a Mercurial client to properly display a file?
I've found web.encoding which does not seem to mean exactly what I'm looking for. Also, Google gave some results for ui.encoding as well, but I couldn't find any hints in the reference.
Mercurial is not concerned with the encoding of the files you put in your repository: Mercurial is happy to store files in any encoding (or no particular encoding at all).
This means that you can add files with UTF-8, Latin-1, or any other encoding to your repository and Mercurial will check them out exactly as they were when you added them.
The encoding of each file is not stored anywhere in Mercurial and it is up to the client to recognize the encoding (perhaps based on file content where it makes sense, e.g., for XML files).
For a Mercurial desktop client (as per your comments below) I suggest looking at the file content:
Can you decode it as UTF-16?
Can you decode it as UTF-8?
Are there NUL bytes in the file? Then stop and declare it "binary".
Fall back to a Latin-N encoding such as Latin-1 for Western Europe.
The UTF-16 and UTF-8 encodings are nice since they are structured and this makes it possible for you to detect that a file isn't valid UTF-8 encoded, say. The above list is written with a European perspective — you should probably also consult someone with knowledge about Shift JIS and other encodings used in Asia.
In any case, I would only expect a Mercurial client to do a best effort attempt at showing me a file with an encoding other than ASCII.
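The detection order above can be sketched in a few lines of Python (UTF-16 is detected via its BOM here, which is one common simplification):

```python
# Best-effort encoding guess in the order described: UTF-16, UTF-8,
# binary (NUL bytes), then Latin-1 as the catch-all fallback.
def guess_encoding(data: bytes) -> str:
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):  # UTF-16 byte-order mark
        return "utf-16"
    try:
        data.decode("utf-8")                    # structured, so failures are meaningful
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if b"\x00" in data:
        return "binary"
    return "latin-1"                            # any byte sequence decodes here
```

Note the asymmetry the answer mentions: the UTF-8 branch can genuinely fail, while the Latin-1 fallback never can, which is why it has to come last.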
Some alternative interpretations of your question:
If you're really asking about how to make your files look "correct" when you view them in hgweb, then it's a matter of using a consistent encoding in the repository and setting web.encoding.
If you're really asking how to ensure that text files get the OS-native line endings on different platforms (\n on Unix, \r\n on Windows), then take a look at the eol extension that comes with Mercurial.
No. The encoding (charset) is a property of each file in the repository; there is no configuration setting for it.

How to copy and paste script text from SSMS to Outlook or Word without garbling it?

Say I have a script nicely formatted in SSMS, annotated with all kinds of comments in different languages. But when I copy and paste this nice thing into Word, I get syntax-highlighted text with the comments garbled, as if the source text were read with one code page and pasted using another. A very nasty kind of bug. Does anyone know how to solve this issue once and for all?
Thank you!
[Update]
[Solution]
Save → Save with Encoding... → Encoding : Unicode (UTF-8 with signature).
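"UTF-8 with signature" is UTF-8 preceded by a byte-order mark (EF BB BF); Python exposes it as the utf-8-sig codec. A sketch of what that save option writes to disk (the file name and script contents are made up):

```python
# Write a script the way SSMS's "UTF-8 with signature" option does,
# then verify the three-byte signature (BOM) is actually on disk.
from pathlib import Path
import tempfile

path = Path(tempfile.mkdtemp()) / "query.sql"
path.write_text("-- コメント (comment)\nSELECT 1;", encoding="utf-8-sig")

assert path.read_bytes()[:3] == b"\xef\xbb\xbf"            # the "signature"
assert path.read_text(encoding="utf-8-sig").startswith("-- コメント")
```

The signature is what lets Word and other consumers identify the file as UTF-8 instead of guessing a legacy code page.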
[Related forums]
Need a way to set the default encoding for query files in SSMS. by ChrisMay #Microsoft Connect (Go and upvote this issue at Microsoft Connect)
SQL Server Management Studio - File Encoding #SqlDev
SSMS : Option to set default save as encoding for CSV by AaronBertrand #Microsoft Connect
After some tests, I'm still unable to reproduce the issue, and I have no idea why copy-pasting text from one Unicode-compatible app to another can give such results.
There are several things you can try:
Inside SSMS, save the script as a Unicode file: Save → Save with Encoding... → Encoding: Unicode (UTF-8 with signature). You should then be able to open it correctly in Word. The drawback is that syntax highlighting will be lost.
Save the script as a Unicode file, then reopen it and copy-paste. Maybe SSMS assumes for some reason that there is some fancy encoding by default, so this will force it to use UTF-8 instead.
Try pasting into different applications (for example a browser). Looking at the first line of your screenshot, I remember once seeing the same problem with some browser renderings described on Wikipedia (can't find the link).
Try copying the same text from Visual Studio (if installed). Copying source code from Visual Studio to Office programs preserves syntax highlighting, so if you observe the same issue there, it may come from the syntax-highlighting feature.
If nothing works, report the problem to Microsoft Connect, describing the situation precisely so that the people at Microsoft will be able to reproduce the issue.
