Clojure, raynes.fs, iterate-dir: a cryptic file-related data structure

I'm trying to traverse a filesystem tree using the iterate-dir function from the raynes.fs library: https://github.com/Raynes/fs/blob/master/src/me/raynes/fs.clj
Everything makes sense, except for the "File" part of the returned sequence of vectors (I tried to catch output from the clipboard and define a variable of it for research purposes):
(def res ([#<File /home/alexey/dir-src> #{"3" "4" "1" "2" "10"} #{"Фото-0015.jpg"}]
[#<File /home/alexey/dir-src/1> #{} #{"vB8vqyc4XBk.jpg" "valet.jpg"}]
[#<File /home/alexey/dir-src/10> #{} #{"jca3.jpg" "jca10.jpg" "jca1.jpg" "jca4.jpg" "jca2.jpg"}]
[#<File /home/alexey/dir-src/2> #{"002"} #{"warrior-babe-305079.jpg" "wallp_fant_0017.jpg"}]
[#<File /home/alexey/dir-src/2/002> #{} #{"tumblr_mt7rckyTbi1qd5ic3o1_500.jpg"}]
[#<File /home/alexey/dir-src/3> #{} #{"Сияние-cosplay-931717.jpeg"}]
[#<File /home/alexey/dir-src/4> #{} #{}]))
clojure.lang.ExceptionInfo: Unreadable form :: {:column 14, :line 27, :type :reader-exception}
The error also makes perfect sense, as there are no data structure literals like #< ... > in Clojure. I do need the paths for mapping and sorting, but I have no idea how to access them. By the way, the standard compare function gets accepted:
(sort compare (fs/iterate-dir path))
The sequence above is sorted, though not quite the way I want it.

The File objects are standard java.io.File instances.
You can look up java.io.File in the Javadoc to find the operations that can be done on a File.
Like most Java classes, it has no Clojure read support, so it must be created via constructor calls (i.e., (java.io.File. ".") to get a File for the current directory).
As the doc for iterate-dir mentions, it will return a sequence of every node in the tree starting with the input. You can ignore the File if it does not have anything you want in it. Likely, at each node you would want to know the path to that node though.
user> (java.io.File. ".")
#<File .>
user> (.getCanonicalPath (java.io.File. "."))
"/home/noisesmith/example/"

Related

How to Create Text Files from an Array of Values in Powershell

I have a text file "list.txt" with a list of hundreds of URLs that I want to parse, along with some common-to-all config data, into individual XML files (config files) using each value in "list.txt", like so:
list.txt contains:
line_1
line_2
line_3
The boilerplate config data looks like (using line_1 as an example):
<?xml version="1.0"?>
<Website xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Url>line_1.mydomain.com</Url>
<Title>line_1</Title>
<Enabled>true</Enabled>
<PluginInName>Tumblr</PluginInName>
</Website>
So if "list.txt" contains 100 items, I want 100 config files written with the URL and Title elements individualized.
I have fumbled with several posts on reading the array and on creating text files, but I haven't been able to make any of it work.
What I tried, although it's munged at this point. I'm not sure where I started or how I got to here:
$FileName = "C:\temp\list.txt"
$FileOriginal = Get-Content $FileName
# create an empty array
Foreach ($Line in $FileOriginal)
{
$FileModified += $Line
if ($Line -match $pattern)
{
# Add Lines after the selected pattern
$FileModified += 'add text'
$FileModified += 'add second line text'
}
}
Set-Content $fileName $FileModified
This is way beyond my neophyte Powershell skills. Can anyone help out?

You're looking for a string-templating approach, where a string template that references a variable is instantiated on demand with the then-current variable value:
# Define the XML file content as a *template* string literal
# with - unexpanded - references to variable ${line}
# (The {...}, though not strictly necessary here,
# clearly delineates the variable name.)
$template = @'
<?xml version="1.0"?>
<Website xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Url>${line}.mydomain.com</Url>
  <Title>${line}</Title>
  <Enabled>true</Enabled>
  <PluginInName>Tumblr</PluginInName>
</Website>
'@
# Loop over all input lines.
Get-Content C:\temp\list.txt | ForEach-Object {
  $line = $_ # store the line at hand in $line.
  # Expand the template based on the current $line value.
  $configFileContent = $ExecutionContext.InvokeCommand.ExpandString($template)
  # Save the expanded template to an XML file.
  $configFileContent | Set-Content -Encoding Utf8 "$line.xml"
}
Notes:
I've chosen UTF-8 encoding for the output XML files, and to name them "$line.xml", i.e. to name them for each input line and to store them in the current location; adjust as needed.
The template expansion (interpolation) is performed via automatic variable $ExecutionContext, whose .InvokeCommand property provides access to the .ExpandString() method, which allows performing string expansion (interpolation) on demand, as if the input string were a double-quoted string - see this answer for a detailed example.
Surfacing the functionality of the $ExecutionContext.InvokeCommand.ExpandString() method in a more discoverable way via an Expand-String cmdlet is the subject of this GitHub feature request.
Ansgar Wiechers points out that a simpler alternative in this simple case - given that only a single piece of information is passed during template expansion - is to use PowerShell's string-formatting operator, -f to fill in the template:
# Define the XML file content as a *template* string literal
# with '{0}' as the placeholder for the line variable, to
# be instantiated via -f later.
$template = @'
<?xml version="1.0"?>
<Website xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Url>{0}.mydomain.com</Url>
  <Title>{0}</Title>
  <Enabled>true</Enabled>
  <PluginInName>Tumblr</PluginInName>
</Website>
'@
# Loop over all input lines.
Get-Content C:\temp\list.txt | ForEach-Object {
  # Expand the template based on the current input line ($_).
  $configFileContent = $template -f $_
  # Save the expanded template to an XML file named for the input line.
  $configFileContent | Set-Content -Encoding Utf8 "$_.xml"
}
Optional reading: choosing between -f and $ExecutionContext.InvokeCommand.ExpandString() for template expansion:
Tip of the hat to Ansgar for his help.
Using -f:
Advantages:
It is made explicit on invocation what values will be filled in.
Additionally, it's easier to include formatting instructions in placeholders (e.g., {0:N2} to format numbers with 2 decimal places).
Passing the values explicitly allows easy reuse of a template in different scopes.
An error will occur by default if you accidentally pass too few or too many values.
Disadvantages:
-f placeholders are invariably positional and abstract; e.g., {2} simply tells you that you're dealing with the 3rd placeholder, but tells you nothing about its purpose; in larger templates with multiple placeholders, this can become an issue.
Even if you pass the right number of values, they may be in the wrong order, which can lead to subtle bugs.
Using $ExecutionContext.InvokeCommand.ExpandString():
Advantages:
If your variables have descriptive names, your template will be more readable, because the placeholders - the variable names - will indicate their purpose.
No need to pass values explicitly on invocation - the expansion simply relies on the variables available in the current scope.
Disadvantages:
If you use a template in multiple functions (scopes), you need to make sure that the variables referenced in the template are set in each.
At least by default, $ExecutionContext.InvokeCommand.ExpandString() will quietly ignore nonexistent variables referenced in the template - which may or may not be desired.
However, you can use Set-StrictMode -Version 2 or higher to report an error instead; using Set-StrictMode is good practice in general, though note that its effect isn't lexically scoped and it can disable convenient functionality.
Generally, you manually need to keep your template in sync with the code that sets the variables referenced in the template, to ensure that the right values will be filled in (e.g., if the name of a referenced variable changes, the template string must be updated too).

Perl data structures - looping an array of hashes inside a hash

I need a data structure to keep metadata about a field in a database, which I'm going to access to write dynamic SQL.
I'm using a hash to store things like the name, maybe data type, etc. And most importantly, an array of hashes containing information about the values I want to query out of the field, and the name I want to alias them with.
When I try to access elements of that array, I get:
Global symbol "%elem" requires explicit package name at test.pl line 18.
It sounds like maybe it's having trouble registering the fact that the loop variable representing the array elements is a hash, not a scalar. If I try:
foreach my %elem
then I get:
Missing $ on loop variable at test.pl line 17 (#1)
So far I can't find the relevant Perl documentation that addresses this.
#!/usr/local/bin/perl
use warnings;
use strict;
use diagnostics;
use POSIX 'strftime';
my %struct = (
    #"field" = "foobar",
    "values" => [
        {value => "Y", name => "FOO"}
        , {value => "N", name => "BAR"}
    ]
);
foreach my $elem (@{$struct->{'values'}}) {
    print $elem->{'value'};
}
I expect the program to print "YN" to the console.
UPDATE, as someone pointed out I needed to use %hash->{'ref'} in the loop addressing. I added it. Now, I get a notification saying that using a hash as a reference is deprecated (?) but it is printing to the console now!

When I tried running your code, I got a different error than you reported:
Global symbol "$struct" requires explicit package name
This is because you've defined a hash %struct, not a hashref $struct, so you don't need to dereference it. Thus, I changed the line
foreach my $elem (@{$struct->{'values'}}) {
to
foreach my $elem (@{$struct{'values'}}) {
(note no -> to dereference) and it ran perfectly, no errors or warnings, and emitted the output
YN
as expected.

%struct is a hash, not a hash reference. Therefore, $struct->{'values'} is not the correct way to access the values key.
for my $elem (@{$struct{values}}) {
    print "$elem->{value}\n";
}
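To make the difference concrete, here is a minimal sketch that holds the same data once in a plain hash (%struct, as in the question) and once behind a hash reference. The name $struct_ref is introduced here purely for illustration and does not appear in the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A plain hash: elements are reached with $struct{key}, no arrow.
my %struct = (
    values => [
        { value => "Y", name => "FOO" },
        { value => "N", name => "BAR" },
    ],
);

# A reference to that hash: elements are reached with $ref->{key}.
# ($struct_ref is a hypothetical name used only in this sketch.)
my $struct_ref = \%struct;

# Both expressions dereference the same inner array of hashes.
my $from_hash = join "", map { $_->{value} } @{ $struct{values} };
my $from_ref  = join "", map { $_->{value} } @{ $struct_ref->{values} };

print "$from_hash\n";
print "$from_ref\n";
```

Both lines print the same string, because $struct{values} and $struct_ref->{values} name the same array reference; only the way of reaching it differs.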

Why can't I lookup an array index inside a foreach loop in Powershell?

first question on here so forgive me if I make any mistakes, I will try to stick to the guidelines.
I am trying to write a PowerShell script that populates two arrays from data I read in via CSV file. I'm using the arrays to cross-reference directory names in order to rename each directory. One array contains the current name of the directory and the other array contains the new name.
This all seems to be working so far. I successfully create and populate the arrays, and using a short input and index lookup to check my work I can search one array for a current name and successfully retrieve the correct new name from the second array. However when I try to implement this code in a foreach loop that runs through every directory name, I can't lookup the array index (it keeps coming back as -1).
I used the code in the first answer to "Read a Csv file with powershell and capture corresponding data" as my template. Here's my modification to the input lookup, which works just fine:
$input = Read-Host -Prompt "Merchant"
if($Merchant -contains $input)
{
Write-Host "It's there!"
$find = [array]::IndexOf($Merchant, $input)
Write-Host Index is $find
}
Here is my foreach loop that attempts to use the Index lookup, but returns -1 every time. However I know it's finding the file because it enters the if statement and prints "It's there!"
foreach($file in Get-ChildItem $targetDirectory)
{
if($Merchant -contains $file)
{
Write-Host "It's there!"
$find = [array]::IndexOf($Merchant, $file)
Write-Host Index is $find
}
}
I can't figure it out. I'm a PowerShell newb so maybe it's a simple syntax problem, but it seems like it should work and I can't find where I'm going wrong.

Your problem seems to be that $Merchant is a collection of file names (of type string), whereas $file is a FileInfo object.
The -contains operator compares $file against each string in $Merchant and, because $Merchant is a string array, converts $file to a string first, so it works as you expect (since FileInfo.ToString() just returns the file name).
IndexOf() isn't so forgiving. It recognizes that none of the items in $Merchant are of the type FileInfo, so it never finds $file.
You can either refer directly to the file name:
[array]::IndexOf($Merchant,$file.Name)
or, as @PetSerAl showed, convert $file to a string instead:
[array]::IndexOf($Merchant,[string]$file)
# or
[array]::IndexOf($Merchant,"$file")
# or
[array]::IndexOf($Merchant,$file.ToString())
Finally, you can call IndexOf() directly on the array, no need to use the static method:
$Merchant.IndexOf($file.Name)

(Perl) Trying to write a foreach statement with a simple array. Confused with the formatting

I'm a beginner in programming, this is my first language. And in my class we are using a slightly out of date book to learn with (Book copyrighted '02). Doubt this would affect you helping me much, but worth noting.
The problem
I don't know how to format a simple foreach statement using/combined with an array. I'm getting mixed up and my book doesn't provide examples. I'm trying to get it so the Uses/Primary_Uses are shown when the user checks multiple checkboxes.
#!/usr/bin/perl
#c04ex5.cgi - creates a dynamic Web page that acknowledges
#the receipt of a registration form
print "Content-type: text/html\n\n";
use CGI qw(:standard -debug);
use strict;
#declare variables
my ($name, $serial, $modnum, $sysletter, $primary_uses, $use, @primary_uses, @uses);
my @models = ("Laser JX", "Laser PL", "ColorPrint XL");
my @systems = ("Windows", "Macintosh", "UNIX");
my @primary_uses = ("Home", "Business", "Educational", "Other");
#assign input items to variables
$name = param('Name');
$serial = param('Serial');
$modnum = param('Model');
$sysletter = param('System');
@primary_uses = param('Use');
#create Web page
print "<HTML><HEAD><TITLE>Juniper Printers</TITLE></HEAD>\n";
print "<BODY><H2>\n";
print "Thank you , $name, for completing \n";
print "the registration form.<BR><BR>\n";
print "We have registered your Juniper $models[$modnum] printer, \n";
print "serial number $serial.\n";
print "You indicated that the printer will be used on the\n";
print "$systems[$sysletter] system. <BR>\n";
print "The primary uses for this printer will be the following:\n";
#The part I'm having trouble with.
foreach $use (@primary_uses) {
    print "$use [@use]<BR>\n";
}
print "</H2></BODY></HTML>\n";
My naming of variables might be a bit off, I was getting desperate and making sure I declare more than I should.

If you wanted to print a simple list of items, you should just use the $use variable:
foreach $use (@primary_uses) {
    print "$use<BR>\n";
}
Note that this will also remove the fatal error that comes from not declaring @use. Perhaps that was also a point of confusion for you. $use and @use are two completely different variables, despite having the same name.
Note that you can print a list with the CGI module very easily:
my $cgi = CGI->new;
print $cgi->li(\@primary_uses);
This outputs each item wrapped in an HTML li element, like so:
<li>Home</li> <li>Business</li> <li>Educational</li> <li>Other</li>
Some other pointers:
Note that it is a good idea to declare your variables in the smallest scope possible
foreach my $use (@primary_uses) { # note the use of "my"
    print "$use<BR>\n";
}
That also goes with the other variables. A good idea is to declare them right as you initialize them:
my $name = param('Name');
Then people who read your code don't have to scan backwards in the file to see where the variable has "been" before.
Note that you should never, ever use the content of data from a web form without sanitizing it first, because it is a huge security risk, especially when you print it. It allows a web user to execute arbitrary code on your system.
You should know that for and foreach are synonyms for the same keyword.
Also, you should always, always use warnings:
use warnings;
There really is no good reason to ever not turn warnings on.
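The point above about sanitizing form data before printing it can be sketched as follows. This is only an illustration under stated assumptions: escape_html is a hand-written, hypothetical helper (it is not part of the original script), and the second list element stands in for hostile input a user might submit. In real code, the CGI module's escapeHTML function or HTML::Entities would normally do this job.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters HTML treats specially
# with their entity equivalents, so the browser shows them as text.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Sample values standing in for param('Use'); the second one
# imitates markup injected through the form.
my @primary_uses = ('Home', '<script>alert(1)</script>');

foreach my $use (@primary_uses) {
    print escape_html($use), "<BR>\n";
}
```

With the escaping in place, the injected markup is sent to the browser as harmless text rather than as a script element.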

foreach my $myuse (@primary_uses) {
    print $myuse;
}
You need to declare the variable.

Creating a list of duplicate filenames with Perl

I've been trying to write a script to pre-process some long lists of files, but I am not confident (nor competent) with Perl yet and am not getting the results I want.
The script below is very much work in progress but I'm stuck on the check for duplicates and would be grateful if anyone could let me know where I am going wrong. The block dealing with duplicates seems to be of the same form as examples I have found but it doesn't seem to work.
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', $ARGV[0] or die "can't open: $!";
foreach my $line (<$fh>) {
    # Trim list to remove directories which do not need to be checked
    next if $line =~ m/Inventory/;
    # MORE TO DO
    next if $line =~ m/Scanned photos/;
    $line =~ s/\n//; # just for a tidy list when testing
    my @split = split(/\/([^\/]+)$/, $line); # separate filename from rest of path
    foreach (@split) {
        push (my @filenames, "$_");
        # print "@filenames\n"; # check content of array
        my %dupes;
        foreach my $item (@filenames) {
            next unless $dupes{$item}++;
            print "$item\n";
        }
    }
}
I am struggling to understand what is wrong with my check for duplicates. I know the array contains duplicates (uncommenting the first print function gives me a list with lots of duplicates). The code as it stands generates nothing.
Not the main purpose of my post, but my final aim is to remove unique filenames from the list and keep filenames which are duplicated in other directories.
I know that none of these files are identical but many are different versions of the same file which is why I'm focussing on filename.
Eg I would want an input of:
~/Pictures/2010/12345678.jpg
~/Pictures/2010/12341234.jpg
~/Desktop/temp/12345678.jpg
to give an output of:
~/Pictures/2010/12345678.jpg
~/Desktop/temp/12345678.jpg
So I suppose ideally it would be good to check for uniqueness of a match based on the regex without splitting if that is possible.

This below loop does nothing, because the hash and the array only contain one value for each loop iteration:
foreach (@split) {
    push (my @filenames, "$_"); # add one element to lexical array
    my %dupes;
    foreach my $item (@filenames) { # loop one time
        next unless $dupes{$item}++; # add one key to lexical hash
        print "$item\n";
    }
} # @filenames and %dupes go out of scope
A lexical variable (declared with my) has a scope that extends to the surrounding block { ... }, in this case your foreach loop. When they go out of scope, they are reset and all the data is lost.
I don't know why you copy the file names from @split to @filenames, it seems very redundant. The way to dedupe this would be:
my %seen;
my @uniq;
@uniq = grep !$seen{$_}++, @split;
Additional information:
You might also be interested in using File::Basename to get the file name:
use File::Basename;
my $fullpath = "~/Pictures/2010/12345678.jpg";
my $name = basename($fullpath); # 12345678.jpg
Your substitution
$line =~ s/\n//;
Should probably be
chomp($line);
When you read from a file handle, using for (foreach) means you read all the lines and store them in memory. It is preferable most times to instead use while, like this:
while (my $line = <$fh>)

TLP's answer gives lots of good advice. In addition:
Why use both an array and a hash to store the filenames? Simply use the hash as your one storage solution, and you will automatically remove duplicates. i.e:
my %filenames; #outside of the loops
...
foreach (@split) {
    $filenames{$_}++;
}
Now when you want to get the list of unique filenames, just use keys %filenames or, if you want them in alphabetical order, sort keys %filenames. And the value for each hash key is a count of occurrences, so you can find out which ones were duplicated if you care.
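Putting the counting idea together with File::Basename, the asker's stated goal (keep only the paths whose file name also occurs elsewhere) might be sketched like this. It is a minimal illustration, not the original script: the in-memory @paths list stands in for the lines read from the input file, and the names %count and @duplicated are chosen here for the example.

```perl
use strict;
use warnings;
use File::Basename;

# Sample input standing in for the lines read from the file list.
my @paths = (
    '~/Pictures/2010/12345678.jpg',
    '~/Pictures/2010/12341234.jpg',
    '~/Desktop/temp/12345678.jpg',
);

# First pass: count how often each file name occurs.
# %count is declared outside the loop so it survives across iterations.
my %count;
$count{ basename($_) }++ for @paths;

# Second pass: keep the full paths whose file name was seen more than once,
# preserving the original input order.
my @duplicated = grep { $count{ basename($_) } > 1 } @paths;

print "$_\n" for @duplicated;
```

Two passes are used because a path can only be classified as duplicated once every line has been counted; reading real input would replace @paths with lines from the file handle, chomped first.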
