Trouble with running words through text files and counting them - file

Python 3+
This is the error I get
This is my code
I want the user to input some words, then the program should run each word through my two textfiles, if the word exists in any of them, I want the program to add +1 to the positive/negative count list.
Thank you for your help :)

It seems you have run into a decoding error when opening one of the input files in the wordlist function. It is usually hard to determine the encoding used for a particular file, so you could:
1. Try opening the file with a different encoding, such as ISO-8859-15.
def OpenFile():
    try:
        # ISO-8859-15 is only a guess; substitute the encoding your file really uses
        with open("My File.txt", mode="r", encoding="ISO-8859-15") as f:
            data = f.read()  # do process My File
    except UnicodeDecodeError:
        print("Something went wrong. Try a different file encoding.")
    else:
        # everything was okay, return the required data
        return data
    finally:
        # clean up here
        pass
2. Look into modules that try to determine the correct encoding for the file, such as the chardet module.
Install the chardet module:
sudo pip3 install chardet
You can run it at the command line with your file as the argument to determine the encoding:
cd /path/to/File/
chardetect My\ File.txt
This should return the likely encoding for the given file.
3. You can also use the chardet module inside your Python code. However, this is recommended only when you will be opening files you cannot inspect in advance, e.g. on a client's computer where the user chooses which file to open, because reopening the same file and re-detecting its encoding every time will make your program slow.
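Option 1 can be generalized by trying a short list of candidate encodings in turn. The following is only a minimal sketch using the standard library: the function name read_text_with_fallback and the default candidate list are assumptions for illustration, not something taken from the question.

```python
def read_text_with_fallback(path, encodings=("utf-8", "iso-8859-15")):
    # Try each candidate encoding in order and return (text, encoding_used)
    # for the first one that decodes the whole file without an error.
    # Strict encodings such as UTF-8 should come first: a single-byte
    # encoding accepts almost any byte sequence, so it belongs last.
    for encoding in encodings:
        try:
            with open(path, mode="r", encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings!r} could decode {path!r}")
```

A successful decode only means the bytes were valid in that encoding, not that the guess was right, so it is still worth checking the result by eye.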

First of all, positive_count and negative_count should be integers and not lists. If you want to count matches, adding 1 to a list is not what you need; increment an integer instead.
Second, the UnicodeDecodeError occurs because the underlying file is not encoded as UTF-8. Did you try utf-16 or utf-16-le? If you're using Windows, utf-16-le is a likely candidate, unless the file was saved in one of the legacy code pages, in which case guessing will be a nightmare.
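Since the original code is not shown, here is a hedged sketch of the integer-counter approach. The names load_wordlist and count_sentiment, and the assumption that each word list file holds one word per line, are hypothetical.

```python
def load_wordlist(path, encoding="utf-8"):
    # Assumes one word per line; returns a set for fast membership tests.
    with open(path, mode="r", encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def count_sentiment(words, positive_words, negative_words):
    # Plain integer counters, incremented once per matching input word.
    positive_count = 0
    negative_count = 0
    for word in words:
        w = word.lower()
        if w in positive_words:
            positive_count += 1
        if w in negative_words:
            negative_count += 1
    return positive_count, negative_count
```

The words argument could come from something like input().split(), and the two sets from load_wordlist called once per text file.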

Related

Opening Japanese-named files in Lua

I have a bunch of XML files named in Japanese. I use Lua to read them and put the necessary information into tables. I could open files whose names consist of a single kanji, like 名.xml, but not files with multiple kanji in the name, like 名前.xml. Before running the Lua file, I set the command line's code page to 65001 (UTF-8). To read the files I need to convert the filename from ACP (ASCII code page?) to UTF-8 using the WinAPI library, but this conversion only works for the single-kanji names. I've tried several suggestions from across the internet, such as using the short path to the file, but none of them worked. I also tried using the short path while running Lua as administrator, since another similar question states that you need administrator privileges to use the short path, but no luck.
...
for fn in io.popen("DIR xml /B /AA"):lines() do
...
local f = assert(io.open("xml\\" .. winapi.encode(winapi.CP_UTF8, winapi.CP_ACP, fn), "rb"))
...
end
...
But my code produced an "Invalid argument" error. I searched for this error, but none of the results are Lua-related, so I opened the C/C++-related ones, and all I got was "use _wfopen" or something like that. It's not implemented in Lua, nor do I want to implement it myself. Does anyone have any idea how to solve this? If you need more information, just let me know. Thanks!
I don't know why your program does not work, but try this workaround:
local pipe = io.popen([[for %G in (xml\*) do @(type "%G" & echo #FILENAMEMARKER#%G)]], "rb")
local all_files = pipe:read"*a"
pipe:close()
for filecontent, filename in all_files:gmatch"(.-)#FILENAMEMARKER#(.-)\r?\n" do
    -- process your file here
    print('===== This is your file name:')
    print(filename)
    print('== This is your file content:')
    print(filecontent)
    print('== End of file')
end
I think you can store the Japanese characters in a table like this:
local jaAlphbet={"一","|","丶","ノ","乙","亅","<","二","亠","人","⺅","𠆢","儿","入","ハ","丷","冂","冖","冫","几","凵","刀","⺉","力","勹","匕","匚","十","卜","卩","厂","厶","又","マ","九","ユ","乃","𠂉","⻌","口","囗","土","士","夂","夕","大","女","子","宀","寸","小","⺌","尢","尸","屮","山","川","巛","工","已","巾","干","幺","广","廴","廾","弋","弓","ヨ","彑","彡","彳","⺖","⺘","⺡","⺨","⺾","⻏","⻖","也","亡","及","久","⺹","心","戈","戸","手","支","攵","文","斗","斤","方","无","日","曰","月","木","欠","止","歹","殳","比","毛","氏","气","水","火","⺣","爪","父","爻","爿","片","牛","犬","⺭","王","元","井","勿","尤","五","屯","巴","毋","玄","瓦","甘","生","用","田","疋","疒","癶","白","皮","皿","目","矛","矢","石","示","禸","禾","穴","立","⻂","世","巨","冊","母","⺲","牙","瓜","竹","米","糸","缶","羊","羽","而","耒","耳","聿","肉","自","至","臼","舌","舟","艮","色","虍","虫","血","行","衣","西","臣","見","角","言","谷","豆","豕","豸","貝","赤","走","足","身","車","辛","辰","酉","釆","里","舛","麦","金","長","門","隶","隹","雨","青","非","奄","岡","免","斉","面","革","韭","音","頁","風","飛","食","首","香","品","馬","骨","高","髟","鬥","鬯","鬲","鬼","竜","韋","魚","鳥","鹵","鹿","麻","亀","啇","黄","黒","黍","黹","無","歯","黽","鼎","鼓","鼠","鼻","齊","龠"}
print(jaAlphbet[1]) -- and you can access the characters one by one
Sorry, but that's all I know about the subject you are talking about. I hope this helps.

Reading in the output of a program, saving it as a string, and using that string in the original program

I am trying to get specific information from a group of MP3 files. Currently I am in the main cygwin64 folder, which holds the MP3 files and a .c file that simply contains
FILE * fp;
It contains that single line of code because when that line is in place and I type and run "thing.c" in the Cygwin command line, it outputs what seems to be information about the contents of the folder. For example, it outputs:
home: sticky, directory
lib: directory
sbin: directory
setup-x86_64.exe: PE32+ executable (GUI) x86-64 (stripped to external PDB), for MS Windows
song.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
song1.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
thing.c: ASCII text, with CRLF line terminators
thing.txt: empty
What I want to do is pull that output into a string that I can use in my C file, alter it, and then print out the new, altered information. However, I'm not sure where the output is really coming from, or how I might get it or save it as a .txt file or back into a C file.
Any advice is appreciated. Thanks!
This file is not really a C file at all. Because you're in Cygwin, you're likely operating on a case-insensitive filesystem (NTFS). As such, Cygwin's file command is what runs when you execute the .c file: the shell reads the line as the command file * fp, expands the * to every name in the folder, and file reports on each one. I'm sure you're getting fp: Cannot open "fp" or something similar after the rest of your output.
This is not anything C-related at all but is just being interpreted as a script by your shell.
It sounds like you have a lot to learn if you want to do this in C. More likely, you can probably write a shell script to accomplish what you want. While I've never used it, mp3info (https://github.com/jaalto/cygwin-package--mp3info) exists for pulling tag information from MP3 files. You could possibly get the exact information you want from that, or pipe the output into sed, awk, or a number of other tools.
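For the general task in the title (running a program, keeping its output as a string, and working with that string), here is a minimal sketch in Python rather than C or shell. The helper name capture_output is hypothetical, and check=True is an assumption that a failing command should raise an error instead of returning partial output.

```python
import subprocess


def capture_output(command):
    # Run the command (a list of arguments, no shell involved) and
    # return everything it wrote to standard output as one string.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Assuming the file command is on the PATH, as it is in Cygwin, capture_output(["file", "song.mp3"]) would return the description line as a string that can then be split, edited, and printed again.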

Tcl determine file name from browser upload

I have run into a problem in one of my Tcl scripts where I am uploading a file from a Windows computer to a Unix server. I would like to get just the original file name from the Windows file and save the new file with the same name. The problem is that [file tail windows_file_name] does not work, it returns the whole file name like "c:\temp\dog.jpg" instead of just "dog.jpg". File tail works correctly on a Unix file name "/usr/tmp/dog.jpg", so for some reason it is not detecting that the file is in Windows format. However Tcl on my Windows computer works correctly for either name format. I am using Tcl 8.4.18, so maybe it is too old? Is there another trick to get it to split correctly?
Thanks
The problem here is that on Windows, both \ and / are valid path separators as far as the Windows API is concerned (even though only \ is deemed to be "official" on Windows). On the other hand, in POSIX, the only valid path separator is /, and the only two bytes which can't appear in a pathname component are / and \0 (a byte with value 0).
Hence, on a POSIX system, "C:\foo\bar.baz" is a perfectly valid short filename, and running
file normalize {C:\foo\bar.baz}
would yield /path/to/current/dir/C:\foo\bar.baz. By the same logic, [file tail $short_filename] is the same as $short_filename.
The solution is to either do what Glenn Jackman proposed or to somehow pass the short name from the browser via some other means (some JS bound to an appropriate file entry?). Also you could attempt to detect the user's OS from the User-Agent header.
To make Glenn's idea more agnostic to the user's platform, you could go like this:
1. Scan the file name for "/".
2. If none is found, do set fname [string map {\\ /} $fname], then go to the next step.
3. Use [file tail $fname] to extract the tail name.
It's not very bullet-proof, but supposedly better than nothing.
You could always do [lindex [split $windows_file_name \\] end]
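The difference between the two path conventions can be seen side by side in Python, whose standard library ships both a Windows-style and a POSIX-style path module. This is only an illustration of the idea in the answers above, not Tcl code, and the function name upload_tail is hypothetical.

```python
import ntpath
import posixpath


def upload_tail(client_filename):
    # ntpath treats both the backslash and the forward slash as separators,
    # so it extracts the tail from Windows-style and Unix-style names alike.
    return ntpath.basename(client_filename)


# POSIX rules see no separator in a Windows path, which matches what
# [file tail] does on the Unix server.
windows_name = r"c:\temp\dog.jpg"
print(posixpath.basename(windows_name))
print(upload_tail(windows_name))
print(upload_tail("/usr/tmp/dog.jpg"))
```

As with the Tcl version, this is not bullet-proof: a genuine Unix file name that contains a backslash would be split in the wrong place.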

removing a line from a text file?

I am working with a text file, which contains a list of processes under my programs control, along with relevant data.
At some point, one of the processes will finish, and thus will need to be removed from the file (as it's no longer under control).
Here is a sample of the file contents (which has entries added "randomly"):
PID=25729 IDLE=0.200000 BUSY=0.300000 USER=-10.000000
PID=26416 IDLE=0.100000 BUSY=0.800000 USER=-20.000000
PID=26522 IDLE=0.400000 BUSY=0.700000 USER=-30.000000
So for example, if I wanted to remove the line that says PID=26416.... how could I do that, without writing the file over again?
I can use external Unix commands; however, I am not very familiar with them, so if that is your suggestion, please give an example.
Thanks!
Either keep the contents of the file in memory and rewrite the file whenever something changes, or keep one file per PID with the relevant information in it and simply delete that file when the process is no longer running. Alternatively, you could use a database for this instead.
As others have already pointed out, your only real choice is to rewrite the file.
The obvious way to do that with "external UNIX commands" would be grep -v "PID=26416" (or whatever PID you want to remove, obviously).
Edit: It is probably worth mentioning that if the lines are all the same length (as you've shown here) and order doesn't matter, you could delete a line more efficiently by copying the last line into the space being vacated, then shortening the file to eliminate what had been the last line. This will only work if they really are all the same length though (e.g., if you got a PID of '1', you'd need to pad it to the same length as the others in the file).
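The copy-the-last-line-then-truncate idea could look roughly like the sketch below. It is written in Python for brevity, the function name remove_fixed_length_record is hypothetical, and it assumes every line (including the last) has exactly the same byte length and ends with a newline.

```python
import os


def remove_fixed_length_record(path, pid):
    # Overwrite the matching record with the last record, then cut the
    # file short by one record. Line order is not preserved.
    key = f"PID={pid} ".encode("ascii")
    with open(path, "r+b") as f:
        record_len = len(f.readline())  # all records share this length
        if record_len == 0:
            return False  # empty file
        count = os.fstat(f.fileno()).st_size // record_len
        for i in range(count):
            f.seek(i * record_len)
            if f.read(record_len).startswith(key):
                f.seek((count - 1) * record_len)
                last = f.read(record_len)
                f.seek(i * record_len)
                f.write(last)
                f.truncate((count - 1) * record_len)
                return True
    return False
```

Only one record is rewritten no matter how large the file is, which is where the efficiency gain over a full rewrite comes from.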
The only way is by copying each character that comes after the deleted line down over the characters that are deleted.
It is far more efficient to simply rewrite the file.
how could I do that, without writing the file over again?
You cannot. Filesystems (perhaps besides more esoteric record-based ones) do not support insertion or deletion.
So you'll have to write the lines to a temporary file up to the line you want to delete, skip over that line, and write the rest of the lines to the temporary file. When done, rename/copy the temp file to the original filename.
Why are you maintaining these in a text file? That's not the best model for such a task. But, if you're stuck with it ... if these lines are guaranteed to all be the same length (it appears that way from the sample), and if the order of the lines in the file doesn't matter, then you can write the last line over the line for the process that has died and then shorten the file by one line with the (f)truncate() call if you're on a POSIX system: see Jonathan Leffler's answer in How to truncate a file in C?
But note carefully netrom's answer, which gives three different better ways to maintain this info.
Also, if you stick with a text file (preferably written from scratch each time from data structures you maintain, as per netrom's first suggestion), and you want to be sure that the file is always well formed, then write the new data into a temp file on the same device (putting it in the same directory is easiest) and then do a rename() call, which is an atomic operation.
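A minimal sketch of the temp-file-plus-rename approach, in Python rather than C: the helper name remove_pid_line is hypothetical, and os.replace is used as the rename step, which on POSIX systems maps onto rename().

```python
import os
import tempfile


def remove_pid_line(path, pid):
    # Copy every line except the one for this PID into a temp file in the
    # same directory, then rename the temp file over the original.
    prefix = f"PID={pid} "
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                if not line.startswith(prefix):
                    tmp.write(line)
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

Matching on the prefix "PID=26416 " with the trailing space avoids also deleting a longer PID such as 264160.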
You can use sed:
sed -i.bak -e '/PID=26416/d' test
-i is for editing in place. With the suffix .bak, it also keeps a backup copy of the original file under the extension .bak
-e specifies the sed script. The pattern /PID=26416/ selects the matching lines, and the d command deletes them.
test is the filename
The unix command for it is:
grep -v "PID=26416" myfile > myfile.tmp
mv myfile.tmp myfile
The grep -v part outputs the file without the rows with the search term.
The > myfile.tmp part creates a new temp file for this output.
The mv part renames the temp file to the original file.
Note that we are rewriting the file here, and moreover, we can lose data if someone writes something to the file between the two commands.

C - Reading multiple files

I just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C, so bear with me here. Say I have a folder with 1000+ text files; the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, and each file is named after the company's respective ticker. I want to write a program that will open each file, read the data, find the historical low, compare it to the current price, calculate the percent change, and then print it. Searching and calculating are not a problem; the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, have the program read that into an array, and then run a loop that opens the first filename in the array, performs the calculations, prints the output, closes the file, and then loops back around to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think), but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? I'm not really asking for code (unless there is some amazing function in C that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to mention that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
Use opendir() on Linux:
http://linux.die.net/man/3/opendir
Example:
http://snippets.dzone.com/posts/show/5734
In pseudocode it would look like this. I cannot give real code, as I'm not 100% sure this is the correct approach:
for each directory entry
    scan the filename
    extract the ticker name from the filename
    open the file
    read the data
    create a record consisting of the filename, data, ...
    close the file
    add the record to a list/array
sort the list/array into alphabetical order based on the ticker name in the filename
You could vary it slightly if you wish, scan the filenames in the directory entries and sort them first by building a record with the filenames first, then go back to the start of the list/array and open each one individually reading the data and putting it into the record then....
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogramming.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
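To show the shape of the directory-walking loop without the C boilerplate, here is a short sketch in Python. It uses os.scandir, which on Linux is built on the same opendir()/readdir() calls. The function name list_ticker_files and the ".txt" extension are assumptions; the question only says the files are text files named after their tickers.

```python
import os


def list_ticker_files(directory, suffix=".txt"):
    # Return (ticker, path) pairs for every regular file with the given
    # extension, sorted alphabetically by ticker.
    pairs = []
    with os.scandir(directory) as entries:
        for entry in entries:
            ticker, ext = os.path.splitext(entry.name)
            if entry.is_file() and ext == suffix:
                pairs.append((ticker, entry.path))
    return sorted(pairs)
```

Each returned path can then be opened in turn and processed, so no hand-written list of file names is needed.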
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
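/* requires <glob.h>, <stdio.h> and <string.h> */
glob_t results;
size_t i;
memset(&results, 0, sizeof(results));
glob("*.txt", 0, NULL, &results);  /* match every .txt file in the current directory */
for (i = 0; i < results.gl_pathc; i++)
    printf("%s\n", results.gl_pathv[i]);
globfree(&results);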
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
If on Windows, you can use their Directory Management APIs; more specifically, the FindFirstFile function, used with wildcards, in conjunction with FindNextFile.

Resources