Possible bug in Matlab's "fileattrib" function. Workaround? - file

I have found some strange behaviour in Matlab's fileattrib function on Windows. With certain file names it wrongly identifies the file as a hidden, system folder.
To test it, download this file (the file is empty; it's only the file name that matters):
https://docs.google.com/file/d/0B9BeckFuQk1bNHY3T0NKaFpxbUU/edit?usp=sharing
Put the file in an empty folder (I'm using "c:\temp") and try this:
fileattrib('c:\temp\*')
If your Matlab is like mine, it will give you this wrong result:
ans =
Name: 'c:\temp\?aaa.txt'
archive: 1
system: 1
hidden: 1
directory: 1
[...]
Now rename the file, removing the first character of its name, and try again. It will correctly say
ans =
Name: 'c:\temp\aaa.txt'
archive: 1
system: 0
hidden: 0
directory: 0
[...]
I have seen this behaviour in Matlab R2010b and R2007a, on Windows Vista and 7.
The problem clearly has to do with certain "offending" characters (or character sets/encodings?), but I've no idea. Can someone figure out why this happens? And how to work around it?
EDIT:
This seems to have been corrected in R2015a (maybe earlier): it correctly returns
Name: 'C:\Users\Luis\Desktop\tmp\�aaa.txt'
archive: 1
system: 0
hidden: 0
directory: 0
[...]

One way to deal with this is not to depend (solely) on the fileattrib command.
In order to determine whether something is a file or directory, you can check how it registers when using the dir command on the containing folder.
It's a bit of a hassle, but when dir is called on the folder (it won't work when called on the file directly) you seem to get the correct output.
A quick and dirty alternative would of course be to put your entire handling in a try / catch construction and if one fails simply try the other.

Related

error in looping over files, -fs- command

I'm trying to split some datasets in two parts, running a loop over files like this:
cd C:\Users\Macrina\Documents\exports
qui fs *
foreach f in `r(files)' {
use `r(files)'
keep id adv*
save adv_spa*.dta
clear
use `r(files)'
drop adv*
save fin_spa*.dta
}
I don't know whether what is inside the loop is correctly written but the point is that I get the error:
invalid '"e2.dta'
where e2.dta is the second file in the folder. Does this message refer to the loop or maybe what is inside the loop? Where is the mistake?
You want lines like
use "`f'"
not
use `r(files)'
given that fs (installed from SSC, as you should explain) returns r(files) as a list of all the files whereas you want to use each one in turn (not all at once).
The error message was informative: use is puzzled by the second filename it sees (as only one filename makes sense). The other filenames are ignored: use fails as soon as something is evidently wrong.
Incidentally, note that putting "" around filenames remains essential if any filename includes spaces.

Trouble with running words through text files and counting them

Python 3+
This is the error I get
This is my code
I want the user to input some words, and the program should then check each word against my two text files. If the word exists in either of them, I want the program to add +1 to the positive/negative count list.
Thank you for your help :)
It seems you have hit a decoding error when trying to open one of the input files in the wordlist function. It is usually hard to determine the encoding used for a particular file, so you could:
1. Try opening the file with a different encoding, such as ISO-8859-15.
def OpenFile():
    try:
        with open("My File.txt", mode="r", encoding="ISO-8859-15") as f:
            text = f.read()  # process My File here
    except UnicodeDecodeError:
        print("Something went wrong. Try a different file encoding")
    else:
        return text  # everything was okay, return what is required
    finally:
        pass  # clean up here
2. Look into modules that try to determine the correct encoding for the file, such as the chardet module.
Install the chardet module:
sudo pip3 install chardet
You can run it at the command line, with your file as the argument, to determine the encoding:
cd /path/to/File/
chardetect My\ File.txt
This should return the likely encoding for the given file.
3. You can also use the chardet module inside your Python code. However, this is only recommended when you will be opening a file you cannot inspect beforehand, e.g. on a client's computer where the user wants to open some other file, because reopening the same file and redetecting its encoding each time will make your program slow.
First of all, positive_count and negative_count should be integers, not lists. If you wish to count, adding 1 to a list isn't really what you're trying to accomplish.
Second, the UnicodeDecodeError occurs because the encoding of the underlying file is not utf-8. Did you try utf-16 or utf-16-le? If you're using Windows, utf-16-le is probably the encoding used, unless the file was saved with a legacy code page, in which case guessing will be a nightmare.
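Both points can be put together in a rough sketch. It assumes the word lists are plain text files with one word per line; the names load_words and count_sentiment and the default list of encodings are illustrative and not taken from the question's code. Pass other encodings (for example utf-16) if they suit your files better.

```python
def load_words(path, encodings=("utf-8", "iso-8859-15")):
    """Read one word per line, trying each encoding until one decodes."""
    for enc in encodings:
        try:
            with open(path, mode="r", encoding=enc) as f:
                # Lower-case and strip each word; skip blank lines.
                return {line.strip().lower() for line in f if line.strip()}
        except UnicodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"could not decode {path!r} with any of {encodings}")


def count_sentiment(words, positive, negative):
    """Return (positive_count, negative_count) as plain integers."""
    positive_count = 0
    negative_count = 0
    for word in words:
        w = word.lower()
        if w in positive:
            positive_count += 1
        if w in negative:
            negative_count += 1
    return positive_count, negative_count
```

The counters are ordinary integers incremented with += 1, and the word lists are stored as sets so each lookup is a single membership test.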

Extracting string from any non-binary file irrespective of its location within file

OK, here is a problem I have been unsuccessfully trying to cope with, writing a batch script. Suppose I have a file containing, say, some youtube addresses (for example a html file with links to youtube pages).
The content of the file may look like this:
Blaaaa blaa
blaa blaa blaa <a href=https://www.youtube.com/watch?v=9bZkp7q19f0>Gangnam1</a> blaaa blaa
<a href=https://www.youtube.com/watch?v=kYtGl1dX5qI&list=RD9bZkp7q19f0>Scream and shout</a> blaa blaa
blaaaaa <a href=https://www.youtube.com/watch?v=lWA2pjMjpBs&list=RD9bZkp7q19f0>Diamonds</a> blaa
blaa bla bla
The strings will be found using wildcard character mask, like this:
https://www.youtube.com/watch\?v=*>
(or something of this kind)
And the output saved in another file should look as follows:
https://www.youtube.com/watch?v=9bZkp7q19f0>
https://www.youtube.com/watch?v=kYtGl1dX5qI&list=RD9bZkp7q19f0>
https://www.youtube.com/watch?v=lWA2pjMjpBs&list=RD9bZkp7q19f0>
The search may of course also concern other strings, not only YouTube-related ones.
Simple commands like FIND or FINDSTR cannot be used, as they return the whole line containing the string. Similarly, FOR with tokens and delimiters seems to be of little use here, as the strings to be found are scattered irregularly all over the file, sometimes a few in the same line.
I really do not know how to solve this problem. It may seem simple, but I have never found a script or program that would give output like that. Perhaps there is even a ready-made, compiled program to do it. I would be very grateful for any help.
I would use a scripting language other than batch to do that.
Here is a little example I made in AutoIt:
StringBetween.au3
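#include <String.au3>
Local $hOutFile=FileOpen("output.txt",2) ;Open output file for writing (erases previous contents)
Local $hTexte=FileRead($CmdLine[1]) ;Read the whole input file
$AFind=_StringBetween($hTexte,$CmdLine[2],$CmdLine[3]) ;Array of strings found between the two delimiters
For $i= 0 To UBound($AFind)-1 step 1
    FileWrite($hOutFile,$AFind[$i]&@CRLF) ;One result per line
Next
FileClose($hOutFile)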
You can compile it yourself or download it already compiled here:
StringBetween.rar
Usage:
Stringbetween [InputFile] [StringLeft] [StringRight]
Output: "Output.txt"
In your case:
Stringbetween.exe "example.html" "<a href=" ">"
A file "Output.txt" will be created with:
https://www.youtube.com/watch?v=9bZkp7q19f0
https://www.youtube.com/watch?v=kYtGl1dX5qI&list=RD9bZkp7q19f0
https://www.youtube.com/watch?v=lWA2pjMjpBs&list=RD9bZkp7q19f0
Thank you for your quick response. It really helped a lot. I am impressed very much.
I have never used AutoIt and now I see it is a useful utility indeed! I have downloaded the program and had big fun trying it. I like the huge library of functions (although their being scattered all over various scripts makes them slightly messy and unintuitive to find) and particularly the ability to compile script code into executable files. I will certainly use it in future, too.
I have slightly modified your script, to be able to process many files from one directory at a time. This is what it looks like right now:
#include <String.au3>
#include <File.au3>
#include <Array.au3>
#include <MsgBoxConstants.au3>
#include <WinAPIFiles.au3>
;Parameters:
Local $Ldelimiter, $Rdelimiter, $Filter, $Outputfilename
;Prompt for parameters if not stated in command line:
If $CmdLine[0] < 1 Then
$Ldelimiter=InputBox("","Enter the left delimiter :","")
Else
$Ldelimiter=$CmdLine[1]
EndIf
If $CmdLine[0] < 2 Then
$Rdelimiter=InputBox("","Enter the right delimiter :","")
Else
$Rdelimiter=$CmdLine[2]
EndIf
If $CmdLine[0] < 3 Then
$Filter=InputBox("","Enter the filter mask :","*.*")
Else
$Filter=$CmdLine[3]
EndIf
If $CmdLine[0] < 4 Then
$Outputfilename=InputBox("","Enter the name of output file :","output.txt")
Else
$Outputfilename=$CmdLine[4]
EndIf
Local $hOutFile=FileOpen($Outputfilename,2) ;Open output file
Local $curpath=_WinAPI_GetCurrentDirectory() ;Get current directory
Local $FileList=_FileListToArray($curpath,$Filter,1) ;Make an array with the list of files to process
For $k= 1 To UBound($FileList)-1 step 1 ;Process a file from the list
Local $hTexte=FileRead($FileList[$k]) ;Read file content
$AFind=_StringBetween($hTexte,$Ldelimiter,$Rdelimiter) ;Make an array with the list of strings to be found
For $i= 0 To UBound($Afind)-1 step 1 ;Get a string from the list
FileWrite($hOutFile,$Ldelimiter&$AFind[$i]&$Rdelimiter&@CRLF) ;Write the string to output file
Next
Next
FileClose($hOutFile)
exit
Usage:
Stringbetween [StringLeft] [StringRight] [FileMask] [OutputFile]
If you fail to provide parameters in command line, the program will prompt for them. FileMask is *.* by default (all files in directory will be processed). I have also added the left and right delimiter to the output.
Regards
PS: I still wonder if it is possible to do the same with simple BAT.
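For comparison, the same delimiter-based extraction can be sketched in a few lines of Python using a non-greedy regular expression. The function names strings_between and extract_to_file are hypothetical, and the delimiters are kept in the output, as in the modified AutoIt script above.

```python
import re


def strings_between(text, left, right):
    """Return every shortest substring that starts with left and ends with right."""
    # re.escape makes characters such as '?' and '.' in the delimiters literal.
    pattern = re.escape(left) + r".*?" + re.escape(right)
    return re.findall(pattern, text, flags=re.DOTALL)


def extract_to_file(in_path, out_path, left, right, encoding="utf-8"):
    """Write all matches from in_path to out_path, one per line; return the count."""
    with open(in_path, "r", encoding=encoding, errors="replace") as src:
        matches = strings_between(src.read(), left, right)
    with open(out_path, "w", encoding=encoding) as out:
        out.writelines(m + "\n" for m in matches)
    return len(matches)
```

Because the match is non-greedy, several links on the same line are returned separately, which is the case FIND and FINDSTR cannot handle.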

FindFirstFile returns access denied

I'm trying to create a robust recursive folder deleter function.
It works fine with normal directories.
The problem appears when I create a "hardcore" directory, like:
C:\test\x\x\x\x\x\x\x\x\x\x\x\x\x\x\x\x\x\x\x\x\ ... \x\x\x
The length of this is around 25000 (less than the MSDN limit of 32,767). Basically I created this directory recursively until the CreateDirectory function failed.
Now, the strangest thing is that my function is able to delete 2 directories, and then FindFirstFile fails with 0x5:
\\?\C:\test\x\ ... \x\x\x\*.* < no error
\\?\C:\test\x\ ... \x\x\*.* < no error
\\?\C:\test\x\ ... \x\*.* < access denied
(I can rerun it; the app slowly chews up the folder, 2 by 2, probably until the path length gets pretty small.)
I'm running FindFirstFile to check if the folder is empty.
Is there any sort of limitation that is less documented?
Does FindFirstFile simply not work (is it buggy)?
Am I missing some sort of NTFS permission thing?
Something else ...
EDIT:
IMPORTANT NOTE: If I run the program step by step slowly ... then nothing will fail.
You are probably experiencing something like a virus scanner, indexer or continuous-backup solution holding a handle to the directory, for example if the Indexing Service is configured to index that folder.
Trying to delete a folder or file that is open without the FILE_SHARE_DELETE flag will cause ACCESS_DENIED.
To confirm this, use Process Monitor to see opens and closes on anything matching your path.
(Of course, also confirm that you called FindClose.)

Unix : script as proxy to a file

Hi: Is there a way to create a file which, when read, is generated dynamically?
I wanted to create 3 versions of the same file (one with 10 lines, one with 100 lines, one with all of the lines). I don't see any need for these to be static; rather, it would be best if they were proxies for a head/tail/cat command.
The purpose of this is unit testing: I want a unit test to run on a small portion of the full input file used in production. However, since the code only runs on full files (it's actually a Hadoop map/reduce application), I want to provide a truncated version of the whole data set without duplicating information.
UPDATE: An Example
more myActualFile.txt
1
2
3
4
5
more myProxyFile2.txt
1
2
more myProxyFile4.txt
1
2
3
4
etc. So the proxy files are differently named files whose content is provided dynamically by simply taking the first n lines of the main file.
This is hacky, but... One way is to use named pipes, and a looping shell script to generate the content (one per named pipe). This script would look like:
while true; do
    (
        # linenr holds the number of lines to emit; replace the echo with the real content
        for i in $(seq "$linenr"); do echo something; done
    ) > thenamedpipe
done
Your script would then read from that named pipe.
Another solution, if you are ready to dig into low level stuff, is FUSE.
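The named-pipe idea can also be sketched in Python, as a minimal illustration for a Unix system where os.mkfifo is available. The function names serve_head_once and serve_head_forever are hypothetical, and whether a given consumer (such as a Hadoop job) accepts a FIFO in place of a regular file is an assumption you would need to verify.

```python
import itertools
import os


def serve_head_once(source_path, fifo_path, n_lines):
    """Write the first n_lines of source_path into the FIFO for one reader."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(source_path, "r") as src, open(fifo_path, "w") as fifo:
        fifo.writelines(itertools.islice(src, n_lines))


def serve_head_forever(source_path, fifo_path, n_lines):
    """Regenerate the content for every new reader, like the shell loop."""
    while True:
        try:
            serve_head_once(source_path, fifo_path, n_lines)
        except BrokenPipeError:
            pass  # the reader closed early; wait for the next one
```

One such process would run per proxy file, for example serve_head_forever("myActualFile.txt", "myProxyFile2.txt", 2), so that each read of myProxyFile2.txt yields the first two lines of the main file.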
