Julia parallel file processing


I'm relatively new to the Julia language, and I've recently been trying to process some files in parallel. My code looks something like this:
for ln in eachline(somefile)
    # ... process this line
    for ln2 in eachline(someotherfile)
        # ... process ln and ln2
    end
end
I've been trying to speed things up a bit with the @everywhere and @parallel macros, but they don't seem to work with the eachline function.
Am I missing something?
Thanks for help.

From the documentation of the @parallel macro we already know that:
@parallel [reducer] for var = range
    body
end
The specified range is partitioned and locally executed across all workers.
To do this job in minimum time, @parallel gets length(range) and then partitions it among nworkers().
For more details you can:
. see the macro output -> macroexpand(:(@parallel for i in 1:5 i end))
or:
. check the macro source -> multi.jl
EachLine is one of Julia's iterables. It implements all mandatory methods of the iterable interface, but length() is not one of those (check this discussion). So EachLine is not a range, and @parallel fails to do its task because of the missing length() function.
But there are at least two solutions to parallelize the processing part:
. use lis = readlines() to collect an array of lines, then @parallel for li in lis
. use pmap()
Julia’s pmap() (page 483) is designed for the case where each function
call does a large amount of work. In contrast, @parallel for can
handle situations where each iteration is tiny, perhaps merely summing
two numbers.
Sample code:
# Return the length of a line together with the id of the worker that processed it
len = function(s::AbstractString)
    string(length(s)) * " " * string(myid())
end
function test()
    open("eula.1028.txt") do io
        pmap(len, eachline(io))
    end
end

Related

Matrix dotProduct with different result from python

I'm studying the multi-layer perceptron algorithm and I'm translating python code to golang.
I have 2 matrices. Let's call this matrix M1:
[[0 0 1 1]
[0 1 0 1]]
Let's call this matrix M2:
[[ 0.00041597 0.02185088 -0.00362142]
[-0.00057384 -0.02866677 0.00488404]
[-0.00056316 -0.02705587 0.00410378]
[ 0.00048268 0.01692128 -0.00262183]]
Computing dotProduct(M1, M2) in Python gives me this result:
[[ -8.04778516e-05 -1.01345901e-02 1.48194623e-03]
[ -9.11603819e-05 -1.17454886e-02 2.26221011e-03]]
Doing the same in Go with the same input matrices (M1, M2) returns this matrix:
[[-8.047785157755936e-05 -0.010134590118173147 0.0014819462317188985]
[-9.116038191682538e-05 -0.011745488603430228 0.0022622101145935328]]
In python I'm using numpy's dot operation:
resultMatrix = M1.dot(M2)
In Go, I'm using this package to work with matrices.
What puzzles me is that I compute other dot products with Go and they are all correct.
I ran N tests with other values, and I use this package (the same dotProduct method) in other parts of my code, and everything has been fine.
My Go code at line 128
Tutorial Python code at line 61
Matrix golang package method that implements the golang dotProduct at line 30
The Python code is not mine, which is why it is written in Portuguese, while my Go code is written in English.
In Python I know the result is right because the whole neural network works well, but in Go I'm not sure.
I have read the Go matrix package method many times and can't find the bug in the implementation. Does anyone know where I'm going wrong?

Well, actually the results are pretty much the same. What might confuse you is that the formatting is different: Python's -1.01345901e-02 = -0.0101345901 (see Scientific notation, particularly its "E-notation" section), which is pretty close to Go's -0.010134590118173147. Just to make it clear, let's align them:
Python -1.01345901e-02
Go -0.010134590118173147
So if you have any problems in your code, they probably come from some other source than matrix multiplication.
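The comparison can also be checked mechanically. The following is a minimal sketch, not the asker's code: it uses the matrices exactly as printed in the question (M2 is already rounded for display), a plain-Python `dot` helper as a stand-in for numpy's dot, and a hypothetical `sci` helper that formats a float with 8 digits after the decimal point, which is the precision of the Python output shown in the question.

```python
# M1 and M2 as printed in the question (M2 is already rounded for display).
M1 = [[0, 0, 1, 1],
      [0, 1, 0, 1]]
M2 = [[ 0.00041597,  0.02185088, -0.00362142],
      [-0.00057384, -0.02866677,  0.00488404],
      [-0.00056316, -0.02705587,  0.00410378],
      [ 0.00048268,  0.01692128, -0.00262183]]

# Values printed by the Go program in the question.
go_result = [
    [-8.047785157755936e-05, -0.010134590118173147, 0.0014819462317188985],
    [-9.116038191682538e-05, -0.011745488603430228, 0.0022622101145935328],
]


def dot(a, b):
    """Plain-Python matrix product (illustrative stand-in for numpy's dot)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sci(value):
    """Format a float in E-notation with 8 digits after the decimal point."""
    return f"{value:.8e}"


# Product recomputed from the rounded inputs shown in the question.
product = dot(M1, M2)

# The Go values re-rendered in the notation of the Python output.
go_as_python = [[sci(v) for v in row] for row in go_result]

for row in go_as_python:
    print(" ".join(row))
```

Formatted this way, the Go values come out as the same strings Python printed in the question, and the product recomputed from the rounded inputs agrees with them to within about 1e-8, so the two programs differ only in how they display the numbers.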

Nested for-loop array variable in MATLAB parfor: error "cannot be classified"

I use the PSO algorithm to find the global best positions. The Matlab code that I would like to run in parallel looks like the following:
%% Particle loop
parfor iter=1:ParticleNum
    for FoldNum = 1:10
        .............
        FoldHL(FoldNum)=Hamming_loss(labelTestOut,TVT);
    end
    HLoss(iter,:) = mean(FoldHL);
end
and the error information:
The variable FoldHL in a parfor cannot be classified.
Any idea why I am getting this error and how it can be resolved?

tcl multithreading like in c, having hard time to execute thread with procedure

I used to work in C, where it is easy to create a thread that runs a specific function I choose.
Now in Tcl I can't get a thread to start with a specific procedure. I tried this:
package require Thread
proc printme {aa} {
puts "$aa"
}
set abc "dasdasdas"
set pool [tpool::create -maxworkers 4 ]
# The list of *scripts* to evaluate
set tasks {
{puts "ThisisOK"}
{puts $abc}
{printme "1234"}
}
# Post the work items (scripts to run)
foreach task $tasks {
lappend jobs [tpool::post $pool $task]
}
# Wait for all the jobs to finish
for {set running $jobs} {[llength $running]} {} {
tpool::wait $pool $running running
}
# Get the results; you might want a different way to print the results...
foreach task $tasks job $jobs {
set jobResult [tpool::get $pool $job]
puts "TASK: $task"
puts "RESULT: $jobResult"
}
I always get:
Execution error 206: invalid command name "printme"invalid command name "printme"
while executing
"printme "1234""
invoked from within
"tpool::get $pool $job"
Why?

Your problem is that the Tcl threading model is very different from the one used in C. Tcl's model is basically a 'shared nothing by default' model, mostly based on message passing.
So every thread in the pool is an isolated interpreter and does not know anything about a proc printme. You need to initialize those interpreters with the procs you need.
See the docs for the ::tpool::create command, it has an option to provide a -initcmd where you can define or package require the stuff you need.
So try this to initialize your threads:
set pool [tpool::create -maxworkers 4 -initcmd {
    proc printme {aa} {
        puts "$aa"
    }
}]
https://www.tcl.tk/man/tcl/ThreadCmd/tpool.htm#M10
To answer your comment in a bit more detail:
No, there is no way to make Tcl threads work like C threads and share objects and procs freely. It is a fundamental design decision and allows Tcl to have an interpreter without a massive global lock (in contrast to e.g. CPython), as most things are thread local and use thread local storage.
But there are some ways to make initialization and use of multiple thread interpreters easier. One is the -initcmd parameter from ::tpool::create which allows you to run initialization code for every single interpreter in your pool without doing it manually. If all your code lives in a package, you simply add a package require and your interpreter is properly initialized.
If you really want to share state between multiple threads, you can use the ::tsv subcommands. They allow you to share arrays and other things between threads in an explicit way. But under the hood this involves the typical locks and mutexes you might know from C to mediate access.
There is another set of commands in the thread package that allow you to make initialization easier. This is the ttrace command, which allows you to simply trace what gets executed in one interpreter and automatically repeat it in another interpreter. It is quite smart and only shares/copies the procs you really use to the target instead of loading all things upfront.

How to add a "sleep" or "wait" to my Lua Script?

I'm trying to make a simple script for a game, by changing the time of day, but I want to do it in a fast motion. So this is what I'm talking about:
function disco ( hour, minute)
setTime ( 1, 0 )
SLEEP
setTime ( 2, 0 )
SLEEP
setTime ( 3, 0 )
end
and so on. How would I go about doing this?

Lua doesn't provide a standard sleep function, but there are several ways to implement one, see Sleep Function for detail.
For Linux, this may be the easiest one:
function sleep(n)
os.execute("sleep " .. tonumber(n))
end
In Windows, you can use ping:
function sleep(n)
if n > 0 then os.execute("ping -n " .. tonumber(n+1) .. " localhost > NUL") end
end
The one using select deserves some attention because it is the only portable way to get sub-second resolution:
require "socket"
function sleep(sec)
socket.select(nil, nil, sec)
end
sleep(0.2)

If you have luasocket installed:
local socket = require 'socket'
socket.sleep(0.2)

This homebrew function has a precision of a tenth of a second or better.
function sleep (a)
local sec = tonumber(os.clock() + a);
while (os.clock() < sec) do
end
end

wxLua has three sleep functions:
local wx = require 'wx'
wx.wxSleep(12) -- sleeps for 12 seconds
wx.wxMilliSleep(1200) -- sleeps for 1200 milliseconds
wx.wxMicroSleep(1200) -- sleeps for 1200 microseconds (if the system supports such resolution)

I know this is a super old question, but I stumbled upon it while I was working on something. Here's some code that's working for me...
time=os.time()
wait=5
newtime=time+wait
while (time<newtime)
do
time=os.time()
end
And I needed randomization so I added
math.randomseed(os.time())
math.random(); math.random(); math.random()
randwait = math.random(1,30)
time=os.time()
newtime=time+randwait
while (time<newtime)
do
time=os.time()
end

I needed something simple for a polling script, so I tried the os.execute option from Yu Hao's answer. But at least on my machine, I could no longer terminate the script with Ctrl+C. So I tried a very similar function using io.popen instead, and this one does allow early termination.
function wait (s)
local timer = io.popen("sleep " .. s)
timer:close()
end

You should read this:
http://lua-users.org/wiki/SleepFunction
There are several solutions and each one has a description, which is important to know.
This is, what I used:
function util.Sleep(s)
if type(s) ~= "number" then
error("Unable to wait if parameter 'seconds' isn't a number: " .. type(s))
end
-- http://lua-users.org/wiki/SleepFunction
local ntime = os.clock() + s/10
repeat until os.clock() > ntime
end

If you're using a Mac or another UNIX-based system, use this:
function wait(time)
    -- Fall back to 0.1 seconds when no numeric argument is given
    if tonumber(time) ~= nil then
        os.execute("sleep " .. tonumber(time))
    else
        os.execute("sleep 0.1")
    end
end
wait()

You can use "os.time" or "os.clock" with a "while" loop. I prefer a "repeat until" loop because it's shorter, but both are expensive because they keep a single core fully busy.
If you need something less demanding, you can use various wrappers such as wxLua, which I use, but some of them also carry a usage penalty, which is especially annoying in games, so it's best to test them and pick what works best for your project.
Or you can rely on the OS, e.g. Windows, to do the sleeping, using applications that exist in system32, via Batch or PowerShell, invoked with "os.execute" or "io.popen" and with ">nul" to hide the output: "ping" (localhost/127.0.0.1) with a timeout, "choice" (works on XP, newer versions may differ; I prefer it), or "timeout" (/nobreak may be useless because all Windows commands can be canceled with CTRL+C). The downsides are that this is limited to the given OS and to the numeric ranges and units (seconds or milliseconds) each command accepts; running it on e.g. Linux may need Wine emulation for Windows (if the application was written for it). You can also use "sleep" or "start-sleep" (from PowerShell), but since Lua is standalone, most people prefer pure Lua or wrappers, and you can use what suits your project.

function wait(time)
local duration = os.time() + time
while os.time() < duration do end
end
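This is probably one of the easiest ways to add a wait/sleep function to your script. Note that it busy-waits and only has whole-second resolution, since os.time() returns whole seconds.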

monitoring file changes in racket (like tail -f)

I would like to implement "tail -f"-like behavior in Racket. That is, I would like to read from a file and, when I hit the end, be able to do something like a "blocking" (read-line file) that returns when some other process appends a line to the file.
I tried synchronizing with (read-line-evt file) but, if I am at the end of file, instead of blocking until other data is available, it returns immediately.
Is there a way to do it?

I don't think that you have any way to avoid polling the file.
Note that all of Racket's input functions consider eof a value that should be returned when it reaches the end of the input stream -- so all of the events immediately return that when the end is reached. At least I don't see anything that looks like a "wait until some input is ready, not eof".
In any case, you also have the FFI, if you know about some system call that triggers a callback instead of polling the file. AFAICT, the Linux source code for tail uses inotify, so you might be able to use an old package called mzfam that interfaces with it from Racket. (But it's pretty old and might need some update work.)

I don't know when Racket added file system change events, but I suspect it was after this question was asked many years ago. Now you can wait on such an event and then see if you can read another line. (It's not fine-grained enough to tell specifically that more data was appended to the file, just that something changed about it.)
An example of a basic tail -f like program to demonstrate filesystem-change-evt:
;;; tail.rkt
#lang racket/base
(require racket/list racket/port)
;;; Some utility functions and macros
;; Like take but return the list if it's less than n elements long
;; instead of raising an error
(define (take* list n)
  (with-handlers ([exn:fail:contract? (lambda (e) list)])
    (take list n)))
;; Repeat body forever until a break is received
(define-syntax-rule (forever body ...)
  (with-handlers ([exn:break? (lambda (e) (void))])
    (let loop ()
      body ...
      (loop))))
;; Display the last N lines of a file. Could be more efficient, but
;; this part's not the point...
(define (display-last-lines port n)
  (for-each displayln
            (reverse
             (for/fold ([lines '()])
                       ([line (in-lines port)])
               (take* (cons line lines) n)))))
;; Wait for the file's status to change and try to read lines when it does.
(define (follow-tail file)
  (call-with-input-file file
    (lambda (port)
      (display-last-lines port 10)
      (forever
       (sync (filesystem-change-evt file))
       (for ([line (in-lines port)])
         (displayln line))))))
(module+ main
  (unless (= (vector-length (current-command-line-arguments)) 1)
    (displayln "Usage: racket tail.rkt FILENAME" (current-error-port))
    (exit 1))
  (follow-tail (string->path (vector-ref (current-command-line-arguments) 0))))
After being inspired by this question and Eli's mention of inotify in his answer, and seeing that there still wasn't a Racket package to provide access to it (I think the standard file system change code uses it internally, but it's not exposed at any low level to users), I wrote it myself. A version of the core tail function from above using it:
(require inotify)
(define (follow-tail file)
  (call-with-input-file file
    (lambda (port)
      (display-last-lines port 10)
      (call-with-inotify-instance
       `((,file (IN_MODIFY)))
       (lambda (inotify wds)
         (forever
          (sync inotify)
          (for ([line (in-lines port)])
            (displayln line))))))))