A old text file with special format - file-format

I am working on HP-UX project, there are a old document. Can open it with vim, but there are some special character among text. For example:
.P
"xxxxx"
.AL 1 10
.LI "xxx"
.H 3 "xxxx"
It looks like html but not be html. Is it possible convert it to modern document?

Looks like troff. Install GNU troff (Groff) and try:
groff -Thtml -pet -mm input.mm > output.html

I guess more details are needed - some ideas you may try:
First, issue a file command for the file. It will probably tell you what type of file is.
jim#debian:~$ file foo.bar
foo.bar: ASCII text
Second, search for similar files and see if there's a program to open them in the machine - maybe, they are binary files for some program out there, and you just don't know which one.
Last, but not least, I believe you are right - looks like HTML code to me, so maybe this is used by an application as a kind-of intermediate language, that is parsed later to transform it to real HTML.
I hope this helps!

Related

How to read content of unknown file

I have a file that holds manufacturing orders for a machine.
I would like to read the content of this file and edit it, but when I open it in a text editor i.e. Notepad++, I get a bunch of wierd charecters:
xÚ¥—_HSQÀo«a)’êaAXŽâê×pD8R‰¬©s“i+ƒ´#¡$
-þl-ó/ÓíºIúPôàƒHˆP–%a&RÎÈn÷ü¹·;Ú;ç<ìòÝÃý}¿ó}‡{϶«rWg>˜›ãR‡)Çn0³Ûf³yÎW[5–šw½ÇRW{ñ’rO6¹ŽŸp¦ÙœcÏ.9yÀnýg
)Ë—e90ejÕø£rC. f¦}3ËŒ˜hü”å1g[…ø±ú ÜJøz®‹˜YfÈ,4`ŽKÉ—ù“ÔË¿d„þlG3#=˜Ž´+hF¬¦£€«šm¿áØ
ïÖµv‡ËpíÍ~™‡Aù
šëÈÚ]ÿç™DŒÉFØ ïƒæsij  ¦y=-74Æ/t=ÕŠr\˜š»Âä‰Ý­¨žã΢
dz·à‡'fœ½­yâ½4qåPjácòÄŒeÊhñ“ý™ÙÎÕ÷5ôlñ=˜Õ{ú;ø=Û;4OêYä>Ìpxbæâ­'è"oëB×1gQ9“'¹]Ô³’Ô³ø!ÌózÞyŸõžÓIŽù*&OÌXPÕ"ŽWžpíOÌè‚Þ3Òr0{Ž†R=_?…/¼žÞ0,ê=/?£ûÓËîy“2Z<ij³[ËÁì™÷–ôžÎ’Ããa÷<Maêéí…¼ž}©žYýZ-˜=­”á¤}π>3°¢÷œ$ïè‰3ìž«ƒÄs¿—xnŒÀ*¯gi$ÕómDËÁìùIeоû‡À¬?3°x¾"~ª§c˜öÝÇî颌°›x¾Fßb>Ï}QXÓ{öFi-êÙßóR”œe^Ñ÷ü‘¿g[Lë ŽwJZϘë¹3”³L©gH‚,^Ïe 2ôžWGøëÙ2‚Î
øœL¾ÅqÈäõ,ýç\œË3¾þeྗ&`Ϻ<KÒf“’»ðù]í‰ãžU^wèþåÔÖy”H}ò•6ø6
It looks like the file is encoded.
Any idea how to find the encoding and make the file readable and editable?
It's binary and probably encoded so without knowledge of data structure you can't do much - just reverse engineering based on trying and checking what changed, operating with hex editor.
It isn't impossible, tho. If you can change the data the way you know (eg. change number of orders from 1 to 2) and export to file, you can compare binary values and find which byte holds that number. Of course if it is encrypted and you don't know the key... It's easier to find another way.
For further read, check this out - https://en.wikibooks.org/wiki/Reverse_Engineering/File_Formats
If you've got access to a Linux box why not use
hexdump -C <filename>
You will be able to get a much better insight into how the file is structured, than by using a text editor.
There are also many "hexdump" equivalent commands on Windows

Understanding compile errors due to copying code from a doc file and not a txt file

SITUATION:
My instructor for my micro-controller class refuses to save sample code to a text file and instead saves it to a word document file instead. When I open up the doc file and copy/paste the code into my IDE "CodeWarrior" it causes errors upon compile time.
I am having to rewrite all the code into a text editor and then copy/paste it into my IDE.
MY UNDERSTANDING:
I was told to always save code as a text file because when you save code as a word document file it will bring in unwanted characters when your copy/pasting the code into your IDE for compiling.
MY QUESTIONS TO YOU:
1.)
Can someone explain this dilemma to me so I can understand it better? I would like to present a better case next time when I receive errors and to also know more about what is happening.
2.)
Is it possible to write a script that will show me all the characters that are being copied and pasted into a file when the code is coming from a word document vs. a text file? In otherwords is there a program that will allow me to see what is going on between copying/pasting code from a word doc file versus a txt file?
Saving source code as a Word document is just silly. If your instructor is insisting on this, chances are no matter how well-reasoned and thorough your argument, they're not going to listen. They're beyond help.
However, to answer your questions: 1) It depends on what you're pasting the thing into. Programs that copy onto the clipboard usually make the data available in several different formats, ranging from their own internal format to plain ASCII text, to maximize compatibility so that the data can be pasted into pretty much any target program. Most text editors will only accept the plan-text version, in which case no extra characters should be transferred. However if your text editor supports RTF or HTML, this may not be true. I'm not sure what CodeWarrior supports but it is certainly possible.
A workaround if this is the case: First paste into a PURE text editor like Notepad. Then copy from Notepad into CodeWarrior. This should eliminate any hidden formatting. As shoover said above, make sure double-quotes " are really double-quotes and not the fancy left- and right-specific quotes that Word sometimes uses.
Use a hex editor like XVI32 to see the raw contents of the file, including nonprinting characters. Or use a text editor with support for showing nonprinting characters (vi/vim, etc.).
I'm studying C and I've just had the same problem. When coping a piece of code from a PDF file and trying to compiling it, gcc would return a serie of errors. Reading the answer above I had an idea: "What if I converted the utf8 into ascii?". Well, I found a website that does just that (https://onlineutf8tools.com/convert-utf8-to-ascii). But instead of also converting the utf8 characters into ascii, it showed them as hexadecimals (Copying from the website to the text editor you can see it better). From there i realised that the problem were mostly the quote marks "".
I then copied the ascii "translation" into my code editor (I must add that it worked fine with Sublime, while VScode read the same utf8 code as it was in the original file, even after cp from the website) and replaced all the hex with the actual ascii characters that were needed to compile the code properly. I used the function find and replace from my editor to do it. I must say that it wasn't very fast doing it. But I believe that in some cases, if the code you're trying to copying is too long, doing it the way I've just described could be faster than rewriting the entire code.

Opening a .hs file with ghci in Terminal?

I'm using Mountain Lion. I open the terminal, then I load ghci, I write :l and then I try to load my file (which is in my desktop) by dragging it with the mouse from my desktop to the terminal, so I know that the location is correct and I get this, thank you in advance:
Prelude> :l /Users/myusername/Desktop/Test.hs
[1 of 1] Compiling Main ( /Users/myusername/Desktop/Test.hs, interpreted )
/Users/myusername/Desktop/Test.hs:1:7: parse error on input `\'
Failed, modules loaded: none.
Prelude>
Edit: The file Im trying to open (written in Text Edit) is:
double :: Int -> Int
double x = x + x
TextEdit is not a plaintext editor (unlike e.g. Windows Notepad), so by default it will include formatting junk in your files that GHC obviously isn't happy about. Apparently you can still use TextEdit if set up correctly, but it's quite recommendable to use a proper programming editor. Like any Unix, OSX comes with a vi flavour, which takes some time to get used to but isn't that hard and works fine; at least you can use it to check what's really in your file.
vi /Users/myusername/Desktop/Test.hs
or, even simpler
cat /Users/myusername/Desktop/Test.hs
will just give you the exact contents of your file.
For the choice what editor to use best, consider this question.
I tried the same procedure on Windows and it worked perfectly.
Have you tried to go to the directory inside GHCi and open it?
The procedure would be:
Prelude> :cd /Users/myusername/Desktop/
Prelude> :l Test.hs
For me, copying/pasting the code you have posted, both situations worked on Windows.
The presence of a \ implies you've got an RTF (rich text) file from TextEdit. RTF is TextEdit's default, and it's a format that annotates plain text with information about text font, size, etc.
I'd recommend vi or emacs, but to fix your immediate problem, open the file in TextEdit and hit Cmd-Shift-t to convert your file to plain text.

How to get cakePHP i18n .pot files encoded in UTF-8?

I'm using the cake i18n command to extract the content of my __() functions in my application.
However, the default.pot output file is not encoded in UTF-8, and thus does not correctly display accentuated characters which is a problem since the main language is french (lot of 'é' , 'à' ...).
I'm using wamp server on Windows 7.
I've tried to change Windows console's encoding with chcp, to convert default.pot file in UTF-8 with notepad++ or PSpad editor, without success.
Do you know any way to get this default.pot file in UTF-8 ?
All the .php or .ctp files are edited with either Komodo or Geany, both on Windows and configured to use UTF-8.
Also i'm using subversion, if it helps.
Thank you for reading.
Had the same problem with cakephp 1.3 (not sure if it is fixed in 2.x): all "special" character which are not ANSI compliant (such as ä, ü, ö, ß), where extracted in the .pot file and there interpreted by ANSI (e.g. "ü" instead of "ü").
The solution mentioned by Camille (manually changing the characters) was not very feasable, since it was a lot of characters, this partly destroyed the .pot format and even worse an automated update of your .po file won't work.
The workaround I found was with the help of a comment in the php documentation for write() (which is used in the console task): http://www.php.net/manual/en/function.fwrite.php#73764.
According to the description there, I extended the file /cake/console/libs/tasks/extract.php with two lines:
First line went into the function __buildFiles():
$string = utf8_decode($string);
I wrote it in line 351, but it just needs to be in the second foreach loop and of couse before the variable is used by the function.
the second line went into the function __writeHeader():
replace the line $File->write($output); with
$File->write(utf8_encode($output));
That did it for me, take care the updating your cakephp will overwrite this changes.
I found a way to deal with this thanks to #Msalters. I changed the default encoding of my editor and overwrote wrong characters.

C - Reading multiple files

just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C so bear with me here. Say I have a folder with 1000+ text files, the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, each file is named after the company's respective ticker. I want to write a program that will open each file, read the data find the historical low and compare it to the current price and calculate the percent change, and then print it. Searching and calculating are not a problem, the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, having the program read that into an array and then run a loop that first opens the first filename in the array, perform the calculations, print the output, close the file, then loop back around moving to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think) but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? Not really asking for code ( unless there is some amazing function in c that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to metion that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
opendir(); on linux.
http://linux.die.net/man/3/opendir
Exemple :
http://snippets.dzone.com/posts/show/5734
In pseudo code it would look like this, I cannot define the code as I'm not 100% sure if this is the correct approach...
for each directory entry
scan the filename
extract the ticker name from the filename
open the file
read the data
create a record consisting of the filename, data.....
close the file
add the record to a list/array...
> sort the list/array into alphabetical order based on
the ticker name in the filename...
You could vary it slightly if you wish, scan the filenames in the directory entries and sort them first by building a record with the filenames first, then go back to the start of the list/array and open each one individually reading the data and putting it into the record then....
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogrammnig.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
glob_t results;
memset(&results, 0, sizeof(results));
glob("*.txt", 0, NULL, &results);
for (i = 0; i < results.gl_pathc; i++)
printf("%s\n", results.gl_pathv[i]);
globfree(&results);
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
If on Windows, you can use their Directory Management API's. More specifically, the FindFirstFile function, used with wildcards, in conjunction with FindNextFile

Resources