File encoding format through command prompt : Windows - file

I have a file that may be encoded as ASCII or UTF-8. Notepad++ shows me which encoding it is, but can someone tell me a tool that shows the file's encoding from the command prompt?
Example:
Open Command Prompt,
C:><Some Command> FileName
which should give me the file format like ASCII or UTF-8.

Install Python 3.x
Run command in cmd.exe (Administrator): pip install chardet
Write a small Python script that reads a file, detects its encoding with the newly installed chardet module, and prints the result. See here for help.
Put the script somewhere under PATH
Suppose you create ec.py doing the job. Then you can invoke ec FileName on command line to get the encoding. If you do a good job writing the python script, you can invoke something like ec *.txt to get the encodings of multiple files.
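As a rough idea of what such an ec.py could look like, here is a minimal sketch. Note that it does not use chardet: it swaps in a standard-library check that only distinguishes ASCII, UTF-8, and "something else", which is all the question asks for. The function name detect_encoding and the output wording are made up for this illustration. Because cmd.exe does not expand wildcards itself, the script expands patterns such as *.txt with glob.

```python
import codecs
import glob
import sys


def detect_encoding(data: bytes) -> str:
    """Classify raw bytes as ASCII, UTF-8, or unknown (a heuristic, not proof)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 (with BOM)"
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "unknown (not ASCII or UTF-8)"


def main(patterns):
    for pattern in patterns:
        # cmd.exe passes wildcards through unexpanded, so expand them here;
        # fall back to the literal name if nothing matches.
        for path in glob.glob(pattern) or [pattern]:
            with open(path, "rb") as f:
                print(f"{path}: {detect_encoding(f.read())}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Invoked as python ec.py FileName or python ec.py *.txt, it prints one line per file. A file that decodes as UTF-8 is only probably UTF-8; for other encodings chardet's statistical guessing is the better tool.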

This is a duplicate of this question here which has a great answer (not by me I might add)
EDIT
I am pretty sure there is no reliable way to do this. Usually you are told the encoding of a file. You can look for a Byte Order Mark (BOM) at the start of the file, but it is not mandatory, so it is not a true indicator unless you know for SURE it is supposed to be there.
Unless someone knows differently, I don't think it is possible to work this out from scratch; you have to have some clue about the encoding used.

Related

Converting XLS(X) to csv with ssconvert or something else

I'm trying to convert XLS(X) files to csv on a RHEL server and have learned about gnumeric, which includes ssconvert. I've done testing on a lab VM to make sure ssconvert works for what I need. However, I want to know if there is a way to install ssconvert by itself (with any libs/dependencies it needs) and not install everything else that comes with gnumeric.
Alternatively, is there another way to convert XLS(X) files to csv?
If you're open to installing LibreOffice, you can accomplish this using:
soffice --headless --convert-to csv yourfile.xlsx
The output will be comma-separated, newline delimited (though you should test specific behavior on your system to be sure).
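If there are many workbooks to convert, the same command can be driven from a script. The following is only a sketch: it assumes soffice is on PATH, and the helper names build_convert_command and convert_all are invented for this example. It adds the --outdir option so the generated csv files land in a chosen directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, outdir="."):
    """Return the soffice argument list for converting one workbook to csv."""
    return [
        "soffice", "--headless",
        "--convert-to", "csv",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_all(folder, outdir="."):
    """Convert every .xls/.xlsx file in folder (hypothetical helper)."""
    for source in sorted(Path(folder).glob("*.xls*")):
        # check=True raises CalledProcessError if soffice exits non-zero.
        subprocess.run(build_convert_command(source, outdir), check=True)
```

Calling convert_all("/path/to/workbooks", "/path/to/csv") would run one soffice process per file; as with the single command, check the output on your own system before relying on it.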

VLC command line converting is not always working

I scheduled a .cmd file that would convert a network stream into a .mp4 file, using:
vlc -vvv "http://86.127.212.113/control/faststream.jpg?stream=mxpeg" --sout=#transcode{vcodec=h264,scale=Automat,scodec=none}:file{dst=C:\\Users\\ACV\\Videos\\rec3.mp4,no-overwrite} :no-sout-all :sout-keep
It often works, but sometimes it just creates big files that I am not able to play.
Even VLC itself cannot play these files, outputting just this
I would suggest that you use the following syntax:
Replace the = after --sout with a space character
Doublequote the --sout chain
Replace the prefix : characters for the global options for no-sout-all and sout-keep with --
"%ProgramFiles%\VideoLAN\VLC\vlc.exe" -vvv "http://86.127.212.113/control/faststream.jpg?stream=mxpeg" --sout "#transcode{vcodec=h264,scale=Automat,scodec=none}:file{dst=C:\\Users\\ACV\\Videos\\rec3.mp4,no-overwrite}" --no-sout-all --sout-keep
I have included the full path to vlc.exe for safety, please adjust it as you need.

En Dash in file path batch job query

Hoping someone can help.
I have a batch job (Windows env) which simply copies a file to another folder.
copy "\\ACP-MS-NAS21\Global\MEC Daily Productivity\Business Analysts\Master_List\HCP_Master_List.xlsx" ^
"\\ACP-MS-NAS21\Global\CSD – DWP Medical Services\CSL_CSD_DB\Master_List"
But I get the following error:
The filename, directory name, or volume label syntax is incorrect.
I can see there's an en dash in the file path which I believe is causing the problem.
Is there any way to have a wildcard in the file path or any other way the job can recognise this?
Thanks in advance.
PS. newbie at batch programming, coding etc, so please can explanations be in plain English. Many thanks
The command-line window (and, by extension, BAT files) operates in the OEM codepage by default. The exact codepage is determined by your OS settings (Language for non-Unicode programs or similar). Therefore you cannot normally use anything outside ASCII or whatever your codepage contains.
Save script as UTF-8
To use those characters you need to work in a codepage that actually has them. UTF-8 is the best one for this task (and probably the only one that will work in your case).
First, save your script as UTF-8. In Notepad you can select the encoding in the Save As dialog.
If your editor does not let you select UTF-8 without a BOM, leave the first line of your BAT file blank. Some editors prefix the file with a special header called a BOM that helps detect the encoding. If yours does and you leave the first line blank, you will get a Bad command or file name error as soon as your script starts, but this won't prevent it from running properly.
Select UTF-8 codepage
Now your script is in UTF-8, but the Windows command processor will still execute it as if it were in the OEM codepage, thus corrupting all special characters. To specify the encoding, add the following command, preferably as the first non-blank line of your script, before anything else including comments (those may contain non-ASCII characters, with unpredictable results):
CHCP 65001
CHCP changes the current codepage to 65001, which is the internal codepage number for UTF-8.
This works because Latin letters and digits have the same encoding in UTF-8 as in ASCII and the OEM codepages. Your script starts executing in the OEM codepage, but since the CHCP 65001 command itself contains no non-ASCII characters, it is understood correctly. All following commands are executed in UTF-8 and may contain non-ASCII characters.
You may now insert the en dash into the filename and it won't be replaced with ? upon saving.
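The byte-level reasoning can be checked with a few lines of Python. This is only an illustration: cp437 stands in for "the OEM codepage" (yours may differ depending on system settings), and the path fragment is a shortened, made-up version of the folder name from the question.

```python
EN_DASH = "\u2013"  # the en dash from the folder name
fragment = "CSD " + EN_DASH + " DWP"

# Pure-ASCII text such as the CHCP command has identical bytes in both
# encodings, which is why the command is understood before the switch.
same_bytes = "CHCP 65001".encode("utf-8") == "CHCP 65001".encode("cp437")

# UTF-8 can represent the en dash (as three bytes) ...
utf8_bytes = fragment.encode("utf-8")

# ... but cp437 has no en dash, so a lossy save substitutes "?".
oem_bytes = fragment.encode("cp437", errors="replace")

print(same_bytes)
print(utf8_bytes)
print(oem_bytes)
```

The last line shows the en dash turned into a question mark, which is the kind of substitution that produces an invalid path.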
Set UTF-8 font
Unfortunately, the default console window font does not display non-ASCII characters correctly.
To solve this, right-click the command-line window title bar, select Properties, and change the font to a Unicode one. Consolas, Lucida Console and Courier New should work.
Thanks for all the responses.
I've had to make a design change to my DB so have managed to get around the need to do this now (phew :))
Thanks again

Find multiple files from the command line

Description:
I am searching a very large server for files that are on a different server. Right now I open a command prompt and type
DIR [FILE NAME] /S/4
This returns the server location of the file with some other stuff that is not really needed.
Question:
I have a lot of files to search for, and entering them one by one into the above command could take forever. Is there a way to input all of the file names, search only once, and have the results show only the file name and location?
First, I hope you don't mean DOS, but rather Windows cmd or batch.
You can certainly write a script that will run your DIR command once per file being sought.
But what you most likely want instead is to search once and print the path of each file found. For this you can use PowerShell's Get-ChildItem or the improved Find-ChildItem function posted here: http://windows-powershell-scripts.blogspot.in/2009/08/unix-linux-find-equivalent-in.html
It will be something like:
Find-ChildItem -Name "firstfile.txt|secondfile.txt|..."
Another approach is to install msys or cygwin or another Linux tools environment for Windows and use the Linux find command.
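For comparison, the "search once, match many names" idea can also be sketched in Python with the standard library. The function name find_files and the script name in the usage comment are invented for this example, and matching is made case-insensitive on the assumption that the files live on a Windows share.

```python
import os
import sys


def find_files(root, names):
    """Walk root once and yield the full path of every file whose name is wanted."""
    wanted = {name.lower() for name in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower() in wanted:
                yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Usage (hypothetical script name): python findmany.py ROOT name1 name2 ...
    if len(sys.argv) >= 3:
        for path in find_files(sys.argv[1], sys.argv[2:]):
            print(path)
```

The directory tree is traversed a single time no matter how many names are given, and the output is just one full path per line.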

Is it possible to detect file format and encoding of file using batch files?

Is it possible to detect file format and encoding of file using batch files? And if a particular file is not of intended format, throw an error?
As a *nix guy, I'd reach for something more powerful than a batch file, such as Python (or a shell script, but I'm assuming you're using Windows; you might look into PowerShell, but I've never tried it).
Unix has a great utility for this sort of thing, named file. There appears to be a Windows version here: http://gnuwin32.sourceforge.net/packages/file.htm
Basically, you run file [your filename here] and file spits out a blurb about the file. For example:
$ file zdoom-2.4.1-src.7z
zdoom-2.4.1-src.7z: 7-zip archive data, version 0.3
It's not always right, and it doesn't mean that if file says "this is a JPEG" that the file is actually a JPEG: it could be corrupt, etc.
Also, if I rename the above 7z archive to "foo":
$ file foo
foo: 7-zip archive data, version 0.3
... file will still get it.
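The reason the rename makes no difference is that this kind of detection looks at the leading bytes of the content rather than the name. A very small sketch of that idea in Python follows; the signature table is a tiny hand-picked sample, nowhere near what file ships with, and the names MAGIC, sniff and sniff_file are made up for the illustration. The exit code is there so a batch file could test ERRORLEVEL and raise its own error.

```python
import sys

# A few well-known file signatures ("magic numbers") and a description each.
MAGIC = [
    (b"7z\xbc\xaf\x27\x1c", "7-zip archive data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"%PDF-", "PDF document"),
]


def sniff(data):
    """Return a description based only on the leading bytes."""
    for signature, description in MAGIC:
        if data.startswith(signature):
            return description
    return "unknown"


def sniff_file(path):
    with open(path, "rb") as f:
        return sniff(f.read(16))


if __name__ == "__main__":
    # Usage (hypothetical): python sniff.py FILE "7-zip archive data"
    if len(sys.argv) == 3:
        actual = sniff_file(sys.argv[1])
        print(f"{sys.argv[1]}: {actual}")
        # Non-zero exit code when the file is not of the intended format.
        sys.exit(0 if actual == sys.argv[2] else 1)
```

As with file itself, a matching signature only says what the file claims to be; the rest of the data could still be corrupt.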
