How would I implement a breadth-first traversal of a directory (depth unknown and not necessarily symmetrical)?
My first thought was to use fork, but I am not sure how to implement it. I was thinking of a loop that would first get the parents, then get the number of children of those parents, then fork once per child, with each new process chdir'ing to its child; the children would then become the parents for the next round of forking.
I feel like there are possible holes in this, and I am looking for input on possible flaws, or whether this is a terrible approach altogether. I have heard about people using fork with breadth-first traversal, but never found any examples, so if you have any I would gladly look at them.
Your code will look like:
Initialize the todo queue with the base directory.
While the todo queue isn't empty,
    Assign the head of the queue to path.
    Remove the head of the queue.
    If path references a directory,
        Append the paths of the files in path to the todo queue.
    Perform whatever action you want to perform with path.
I don't see why you think fork would help.
For example, an actual Perl implementation:
sub dir_contents {
    my ($path) = @_;

    my $dh;
    if (!opendir($dh, $path)) {
        warn("Can't open dir \"$path\": $!\n");
        return;
    }

    # Skip "." and ".." (anchored, so other dot files aren't excluded).
    return map { "$path/$_" } grep { !/^\.\.?$/ } readdir($dh);
}
my @todo = ('some path');

while ( my $path = shift(@todo) ) {
    if (!stat($path)) {
        warn("Can't stat \"$path\": $!\n");
        next;
    }

    # "_" reuses the result of the stat() call above.
    push @todo, dir_contents($path) if -d _;

    print("$path\n");
}
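If you specifically want the level-by-level order described in the question (all parents before any of their children, with the depth available as you go), the same idea works with two arrays instead of one queue. A minimal sketch reusing the dir_contents helper above; 'some path' is a placeholder:

my @current_level = ('some path');
my $depth = 0;

while (@current_level) {
    my @next_level;

    for my $path (@current_level) {
        if (!stat($path)) {
            warn("Can't stat \"$path\": $!\n");
            next;
        }

        # Children of this level's directories form the next level.
        push @next_level, dir_contents($path) if -d _;

        print("depth $depth: $path\n");
    }

    @current_level = @next_level;
    $depth++;
}

No fork is needed for any of this; a single process and a queue are enough.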
I have details like the below in an array; there will be plenty of testbed details in the actual case. I want to grep a particular testbed (TESTBED = vApp_eprapot_icr), and the information below should get copied to another array. How can I do it using Perl? The end of a testbed's info can be recognised by its closing curly bracket }.
TESTBED = vApp_eprapot_icr {
DEVICE = vApp_eprapot_icr-ipos1
DEVICE = vApp_eprapot_icr-ipos2
DEVICE = vApp_eprapot_icr-ipos3
DEVICE = vApp_eprapot_icr-ipos5
CARDS=1GIGE,ETHFAST
CARDS=3GIGE,ETHFAST
CARDS=10PGIGE,ETHFAST
CARDS=20PGIGE,ETHFAST
CARDS=40PGIGE,ETHFAST
CARDS=ETHFAST,ETHFAST
CARDS=10GIGE,ETHFAST
CARDS=ETH,ETHFAST
CARDS=10P10GIGE,ETHFAST
CARDS=PPA2GIGE,ETHFAST
CARDS=ETH,ETHFAST,ETHGIGE
}
I will make it simpler; please see the array below.
@array = ("
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
");
Consider an array like this, where there are plenty of such entries. I need to grep, for example, student=Akshaya's data: from the opening '{' to the closing '}', all the info should get copied to another array. This is what I'm looking for.
while (<>) {
    print if /TESTBED = vApp_eprapot_icr/../\}/;
}
As a side note, <> will read from the filename you pass on the command line. So if the data is stored in a file, you will run it from the command line as:
perl scriptname.pl filename.txt
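If you want the block copied into another array rather than printed, as the question asks, the same flip-flop range can feed a push. A minimal sketch, under the same assumption that the data file is named on the command line:

my @filtered;
while (<>) {
    # The flip-flop is true from the TESTBED line to the closing brace.
    push @filtered, $_ if /TESTBED = vApp_eprapot_icr/../\}/;
}
print @filtered;    # @filtered now holds the whole block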
Ok. We finally have enough information to come up with an answer. Or, at least, to produce two answers which will work on slightly different versions of your input file.
In a comment you say that you are creating your array like this:
@array = `cat $file`;
That's not a very good idea, for a couple of reasons. Firstly, why run an external command like cat when Perl will read the file for you? And secondly, this gives you one element in your array for each line in your input file. Things become far easier if you arrange it so that each of your TESTBED = foo { ... } records is a single array element.
Let's get rid of the cat first. The easiest way to read a single file into an array is to use the file input operator - <>. That will read data from the file whose name is given on the command line. So if you call your program filter_records, you can call it like this:
$ ./filter_records your_input_data.txt
And then read it into an array like this:
@array = <>;
That's good, but we still have each line of the input file in its own array element. How we fix that depends on the exact format of your input file. It's easiest if there's a blank line between each record in the input file, so it looks like this:
student=Amit {
Age=20
sex=male
rollno=201
}

student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}

student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
Perl has a special variable called $/ which controls how it reads records from input files. If we set it to be an empty string then Perl goes into "paragraph" mode and it uses blank lines to delimit records. So we can write code like this:
{
    local $/ = '';
    @array = <>;
}
Note that it's always a good idea to localise changes to Perl's special variables, which is why I have enclosed the whole thing in a naked block.
If there are no blank lines, then things get slightly harder. We'll read the whole file in and then split it.
Here's our example file with no blank lines:
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
And here's the code we use to read that data into an array.
my $data;
{
    local $/;
    $data = <>;
}
@array = split /(?<=^})\n/m, $data;
This time, we've set $/ to undef which means that all of the data has been read from the file. We then split the data wherever we find a newline that is preceded by a } on a line by itself.
Whichever of the two solutions above we use, we end up with an array which (for our sample data) has three elements - one for each of the records in our data file. It's then simple to use Perl's grep to filter that array in various ways:
# All students whose names start with 'Ak'
@filtered_array = grep { /student=Ak/ } @array;
If you use similar techniques on your original data file, then you can get the records that you are interested in with code like this:
@filtered_array = grep { /TESTBED = vApp_eprapot_icr/ } @array;
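Putting it all together, a minimal sketch of the complete filter_records program for the blank-line-separated format (for the format with no blank lines, swap in the split shown above):

#!/usr/bin/perl
use strict;
use warnings;

my @array;
{
    # Paragraph mode: blank lines delimit records,
    # so each record becomes a single array element.
    local $/ = '';
    @array = <>;
}

my @filtered_array = grep { /TESTBED = vApp_eprapot_icr/ } @array;
print @filtered_array;

Run it as ./filter_records your_input_data.txt, as shown earlier.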
I wrote the Rascal code below, which is supposed to build a tree out of a map from node names to nodes, starting at the node mapped from "top". It should repeatedly replace the children of all nodes in result that have strings as children by the nodes nodeMap maps them to, until nothing changes anymore (a fixpoint).
getNode returns the node a map[str,node] maps a key to, or the key itself if it is not present as a key in the map. This works fine, as proved by the fact that the other code at the bottom of this question does work. However, the code directly below seems to run infinitely, even on very small inputs.
node nodeMapToNode(map[str, node] nodeMap) {
    node result = nodeMap["top"];
    return outermost visit(result) {
        case node n: {
            if ([*str children] := getChildren(n)) {
                insert makeNode(getName(n), [getNode(child, nodeMap) | child <- children]);
            }
        }
    }
}
The following code does work, and returns in an instant on small inputs, as I expected. It is, however, doing exactly what I understood outermost visiting should do, based on the Rascal Tutor.
Can anyone explain to me what the difference between these code snippets is (besides the way they are written), and what I have thus misunderstood about the effect of outermost visit? Also, I'd like to know if a shorter and/or nicer way to write this code exists - using something like outermost visiting instead of writing the fixpoint by hand.
node nodeMapToNode(map[str, node] nodeMap) {
    node result = nodeMap["top"];
    node lastResult;
    do {
        lastResult = result;
        result = visit(lastResult) {
            case node n: {
                if ([*str children] := getChildren(n)) {
                    insert makeNode(getName(n),
                        [getNode(child, nodeMap) | child <- children]);
                }
            }
        }
    } while (result != lastResult);
    return result;
}
What is outermost?
The Rascal Tutor is very compact in its explanation, but let's start from there.
repeat a top-down traversal as long as the traversal changes the resulting value (compute a fixed-point).
which in Rascal terms means that this:
r = outermost visit(x) {
    case str s => s + "."
        when size(s) < 3
};
is syntactic sugar for:
r = x;
solve(r) {
    r = top-down visit(r) {
        case str s => s + "."
            when size(s) < 3
    };
}
I think there are two common cases where outermost/innermost makes sense:
your replacement should be repeated multiple times on the same node
your replacement generates new nodes that match other patterns
Your specific example
Regarding the example in your question: the manually rewritten fixpoint in your second snippet is actually an innermost, not an outermost, because the default visit strategy is bottom-up.
In general, a bottom-up visit of the tree is quicker than a top-down one, especially when you are rewriting it; since Rascal values are immutable, building a new tree bottom-up is quicker.
So, perhaps replace your code with an innermost visit instead of an outermost one (that is, change outermost to innermost in your first snippet)?
I am just learning Perl as a fourth language.
My wish is to use Parallel::ForkManager to speed up a foreach loop using an array whose members are taken from a text file.
Basically I am testing a .txt file of URLs, and wish to make it so that it tests multiple members of the array at once, not one at a time (five at a time in this instance), without spamming the same URL and inadvertently DoSing it.
Would something like this do the trick?
$limit = new Parallel::ForkManager(5);

foreach (@lines) {
    $limit->start and next;
    $lines = $_;
    ... do processing here ...
    $limit->finish;
}
or would it be the equivalent of running that loop 5 times making a small multithreaded DoS script?
It isn't too clear from the documentation, but
A call to start will block in the parent process until there are fewer children running than the limit specified. Then it will return the (non-zero) child PID in the parent, and zero in the child.
A child process can see all the data in the parent process as it was when start was called. The data is presumably copy-on-write, as the child may modify it, but the changes aren't reflected in any other process's workspace.
The $pm->start and next idiom may seem a little obscure. Essentially it skips the rest of the loop if the start method returns a true value. I prefer something like my $pid = $fm->start; next if $pid; or the if construct in the code below. All of these do the same thing, but I think the alternatives read more legibly.
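For comparison, a sketch of the two equivalent spellings ($fm is a Parallel::ForkManager instance, as in the code below):

# Compact but obscure: start() returns the (true) child PID in the
# parent, so "next" runs there; it returns 0 in the child, which
# falls through into the loop body.
$fm->start and next;

# More explicit equivalent:
my $pid = $fm->start;
next if $pid;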
I recommend that you experiment with this simpler application, which uses a pool of five child processes to print the numbers from zero to nine.
use strict;
use warnings;

use Parallel::ForkManager;

STDOUT->autoflush;

my $fm = Parallel::ForkManager->new(5);

for my $i (0 .. 9) {
    my $pid = $fm->start;
    if ($pid == 0) {
        # Child process: print, pause, and exit via finish().
        print "$i\n";
        sleep 2;
        $fm->finish;
    }
}

# Wait for any remaining children before the parent exits.
$fm->wait_all_children;
To test, use a safe local operation like print or write to avoid spamming the URLs. Here's a working snippet from a program I wrote that uses the fork manager.
my $pm = new Parallel::ForkManager(20);

foreach $add (@adds) {
    $pm->start and next;

    # if the email is invalid, move on
    if (!defined(Email::Valid::Loose->address($add))) {
        writeaddr(*BADADDR, $add);    # address is bad
        $pm->finish;
    }

    # if the email is valid, get the domain name
    $is_valid = Email::Valid::Loose->address($add);
    if ($is_valid =~ m/\@(.*)$/) {
        $host = $1;
    }
    $is_valid = "";

    # perform a DNS lookup to check the domain
    @mx = mx($resolver, $host);
    if (@mx) {
        writeaddr(*GOODADDR, $add);   # address is good
    } else {
        writeaddr(*BADADDR, $add);    # address is bad
    }
    $pm->finish;
}
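Applied to the URL-testing loop from the question, a minimal self-contained sketch might look like the following. LWP::Simple's head() and the urls.txt filename are my assumptions here, not part of the original code:

use strict;
use warnings;

use LWP::Simple qw(head);
use Parallel::ForkManager;

# Hypothetical input file: one URL per line.
open my $fh, '<', 'urls.txt' or die "Can't open urls.txt: $!\n";
chomp(my @lines = <$fh>);
close $fh;

my $pm = Parallel::ForkManager->new(5);    # at most five children at once

foreach my $url (@lines) {
    $pm->start and next;    # parent spawns a child and moves on

    # Child: a single HEAD request for its URL.
    my $ok = head($url);
    print $ok ? "OK   $url\n" : "FAIL $url\n";

    $pm->finish;
}

$pm->wait_all_children;

Each URL appears once in @lines, so no URL is requested more than once no matter how the children are scheduled.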
First, the code:
lblFileNbr.Text = "?/?";
lblFileNbr.ToolTipText = "Searching for files...";

lock (_fileLock)
{
    _dirFiles = new string[0];
    _fileIndex = 0;
}

if (_fileThread != null && _fileThread.IsAlive)
{
    _fileThread.Abort();
}

_fileThread = new Thread(() =>
{
    string dir = Path.GetDirectoryName(fileName) ?? ".";
    lock (_fileLock)
    {
        _dirFiles = GetImageFileExtensions().SelectMany(f => Directory.GetFiles(dir, f, _searchOption)).OrderBy(f => f).ToArray();
        _fileIndex = Array.IndexOf(_dirFiles, fileName);
    }
    int totalFileCount = Directory.GetFiles(dir, "*.*", _searchOption).Length;
    Invoke((MethodInvoker)delegate
    {
        lblFileNbr.Text = string.Format("{0}/{1}", NumberFormat(_fileIndex + 1), NumberFormat(_dirFiles.Length));
        lblFileNbr.ToolTipText = string.Format("{0} ({1} files ignored)", dir, NumberFormat(totalFileCount - _dirFiles.Length));
    });
});
_fileThread.Start();
I'm building a little image-viewing program. When you open an image, it lists the number of files in the same directory. I noticed when I open an image in a directory with a lot of other files (say 150K), it takes several seconds to build the file list. Thus, I'm delegating this task to another thread.
If, however, you open another image before it finishes searching for the files, that old count is no longer relevant, so I'm aborting the thread.
I'm locking _dirFiles and _fileIndex because I want to add some Left and Right key functionality to switch between photos, so I'll have to access those somewhere else (but in the UI thread).
Is this safe? There seem to be dozens of ways of dealing with threads in C# now; I just wanted something simple.
fileName is a local variable (which means it will be "copied" into the anonymous function, right?), and _searchOption is readonly, so I imagine those two are safe to access.
> Is it safe to abort this file-searching thread?
The short answer is NO!
It is almost never safe to abort a thread, and this advice applies even more when you might be executing native code.
If you can't cooperatively exit fast enough (because it is your call to Directory.GetFiles that takes the time), your best bet is to abandon the thread: let it finish cleanly but ignore its results.
As always, I recommend reading Joe Albahari's free ebook on threading.
It isn't safe to abort the thread using Thread.Abort(). But you could instead implement your own abort which could allow you to safely bring the thread to a close in a controlled fashion.
If you use EnumerateFiles instead of GetFiles, you can loop through each file as you increment a counter to get the total number of files while checking a flag to see if the thread needs to abort.
Calling something such as this in place of your current GetFiles().Length:
private bool AbortSearch = false;

private int NumberOfFiles(string dir, string searchPattern, SearchOption searchOption)
{
    var files = Directory.EnumerateFiles(dir, searchPattern, searchOption);
    int numberOfFiles = 0;
    foreach (var file in files)
    {
        numberOfFiles++;
        if (AbortSearch)
        {
            break;
        }
    }
    return numberOfFiles;
}
You could then replace
_fileThread.Abort();
with
AbortSearch = true;
_fileThread.Join();
You'll achieve what you do with the current Thread.Abort(), but you will allow all threads to end cleanly when you want them to.
The question pretty much sums it up. "dtrace 'print an associative array'" has exactly one Google hit, and similar searches are equally useless.
EDIT:
If I were to use an aggregation, I'm not aware that I'd still be able to remove entries. My application requires that I be able to do things like:
file_descriptors[0] = "stdin"
file_descriptors[3] = "service.log"
...
...
file_descriptors[3] = 0
...
...
# should print only those entries that have not been cleared.
print_array(file_descriptors)
I know that you can clear an entire aggregation, but what about a single entry?
UPDATE:
Since I'm doing this on OS X, and my application is to track all of the file descriptors that have been opened by a particular process, I was able to get by with an array of 256 pathnames, like this:
syscall::open*:entry
/execname == $1/
{
    self->path = copyinstr(arg0);
}

syscall::open*:return
/execname == $1/
{
    opened[arg0] = self->path;
}

syscall::close*:entry
/execname == $1/
{
    opened[arg0] = 0;
}

tick-10sec
{
    printf(" 0: %s\n", opened[0]);
}
The printf in that last probe is repeated 255 more times, once per descriptor...
It sucks. I'd really like to have something better.
Is this the link Google found? Because the advice seems pretty sound:
I think the effect you're looking for should be achieved by using an
aggregation rather than an array. So you'd actually do something like:
@requests[remote_ip, request] = count();
... and then:
profile:::tick-10sec
{
    /* print all of the requests */
    printa(@requests);

    /* Nuke the requests aggregation */
    trunc(@requests);
}
For your case, use an aggregation keyed the same way as your associative array, with sum(1) when a descriptor is opened and sum(-1) when it is closed, instead of count(); an entry that has been "cleared" then sums to zero.