I am coming back from NumPy to MATLAB and don't quite have the hang of broadcasting here.
Can someone explain why the first call fails and the second, more explicit one works?
To my understanding, x0 and x1 are both 1x2 arrays, so it should be possible to expand them to 5x2.
n_a = 5;
n_b = 2;
x0 = [1, 2];
x1 = [11, 22];
% c = unifrnd(x0, x1, [n_a, n_b])
% Error using unifrnd
% Size information is inconsistent.
% c = unifrnd(x0, x1, [n_a, 1]) % also fails
c = unifrnd(ones(n_a, n_b) .* x0, ones(n_a, n_b) .* x1, [n_a, n_b])
% works
There is a size verification inside the unifrnd function (you can type open unifrnd at the command line to see the function code). It raises the error when the third input is not consistent with the sizes of the first two inputs:
[err, sizeOut] = internal.stats.statsizechk(2,a,b,varargin{:});
if err > 0
    error(message('stats:unifrnd:InputSizeMismatch'));
end
If you skip this check, though (that is, if you write a custom function without the size verification), both of the calls that fail above will actually work, thanks to implicit expansion. The real question is whether calling the function this way makes sense.
TL;DR: It is not the broadcasting that fails; it is the function that does not allow these combinations of inputs.
unifrnd essentially calls rand and then scales and shifts the result to the desired interval. So you can use rand directly and do the scaling and shifting yourself, which lets you employ broadcasting (singleton expansion):
c = x0 + (x1-x0).*rand(n_a, n_b);
I'm looking to measure the time a function takes to run in Idris. There are probably libraries for this, although I haven't found any. The way I've found is to use clockTime from System and take the difference between readings before and after the function runs. Here is an example:
module Main

import Data.String
import System

factorial : Integer -> Integer
factorial 0 = 1
factorial 1 = 1
factorial n = n * factorial (n - 1)

main : IO ()
main = do
  args <- getArgs
  case args of
    [self ] => putStrLn "Please enter a value"
    [_, ar] => do
      case parseInteger ar of
        Just a => do
          t1 <- clockTime
          let r = factorial a
          t2 <- clockTime
          let elapsed = (nanoseconds t2) - (nanoseconds t1)
          putStrLn $ "fact(" ++ show a ++ ") = "
                  ++ show r ++ " in "
                  ++ (show elapsed) ++ " ns"
        Nothing => putStrLn "Not a valid number"
To keep Idris from optimising the program by evaluating the factorial ahead of time, I simply require that the program be called with an argument.
This code doesn't work, though: no matter what number I enter, such as 10000, Idris always reports 0 nanoseconds, which makes me quite sceptical; even just allocating a bigint takes time. I compile with idris main.idr -o main.
What am I doing wrong in my code? Is clockTime not a good choice for benchmarks?
Idris 1 is no longer being maintained.
In Idris 2, clockTime can be used.
clockTime : (typ : ClockType) -> IO (clockTimeReturnType typ)
An example of its use for benchmarking can be found within the Idris2 compiler, here.
I've tried to implement an analogue of Haskell's Control.Concurrent.MVar that resides in shared memory and allows communication between multiple independent processes/programs, using POSIX functionality.
But I have failed with lots of deadlocks.
The problem is that pthread_cond_timedwait sometimes does not return when called through the GHC FFI (whether interruptible or unsafe).
After a few days of desperate attempts to resolve the problem, I decided to minify the code and ask the community for help. Unfortunately, I could not condense the problem into a few lines of code pastable here, so I put the (as small as possible) code on GitHub, together with instructions on how to replicate the problem; here is a permalink to the current state of it (mvar-fail branch).
In essence, the functions to take and put the mvar look like this:
int mvar_take(MVar *mvar, ...) {
  /* localDataPtr and timeToWait come from the elided arguments */
  pthread_mutex_timedlock(&(mvar->statePtr->mvMut), &timeToWait);
  while ( !(mvar->statePtr->isFull) ) {
    pthread_cond_signal(&(mvar->statePtr->canPutC));
    pthread_cond_timedwait(&(mvar->statePtr->canTakeC), &(mvar->statePtr->mvMut), &timeToWait);
  }
  memcpy(localDataPtr, mvar->dataPtr, mvar->statePtr->dataSize);
  mvar->statePtr->isFull = 0;
  pthread_mutex_unlock(&(mvar->statePtr->mvMut));
  return 0;  /* error paths elided */
}

int mvar_put(MVar *mvar, ...) {
  pthread_mutex_timedlock(&(mvar->statePtr->mvMut), &timeToWait);
  while ( mvar->statePtr->isFull ) {
    pthread_cond_signal(&(mvar->statePtr->canTakeC));
    pthread_cond_timedwait(&(mvar->statePtr->canPutC), &(mvar->statePtr->mvMut), &timeToWait);
  }
  memcpy(mvar->dataPtr, localDataPtr, mvar->statePtr->dataSize);
  mvar->statePtr->isFull = 1;
  pthread_mutex_unlock(&(mvar->statePtr->mvMut));
  return 0;  /* error paths elided */
}
(Plus error checking and printfs after every command).
Full code for mvar_take.
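For illustration, the elided checks around each timed call might look something like this (a minimal sketch; the helper name and message format are my own, not the repository's actual code):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: lock with a deadline and report any failure.
   pthread_mutex_timedlock returns 0 on success or an errno value
   such as ETIMEDOUT, which is what the printfs would surface. */
static int checked_timedlock(pthread_mutex_t *m, const struct timespec *deadline) {
  int r = pthread_mutex_timedlock(m, deadline);
  if (r != 0)
    fprintf(stderr, "timedlock failed: %s\n", strerror(r));
  return r;
}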
The initialization happens as follows:
pthread_mutexattr_init(&(s.mvMAttr));
pthread_mutexattr_settype(&(s.mvMAttr), PTHREAD_MUTEX_ERRORCHECK);
pthread_mutexattr_setpshared(&(s.mvMAttr), PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&(s.mvMut), &(s.mvMAttr));
pthread_condattr_init(&(s.condAttr));
pthread_condattr_setpshared(&(s.condAttr), PTHREAD_PROCESS_SHARED);
pthread_cond_init(&(s.canPutC), &(s.condAttr));
pthread_cond_init(&(s.canTakeC), &(s.condAttr));
Full code.
The Haskell part looks like this:
foreign import ccall interruptible "mvar_take"
  mvar_take :: Ptr StoredMVarT -> Ptr a -> IO CInt

foreign import ccall interruptible "mvar_put"
  mvar_put :: Ptr StoredMVarT -> Ptr a -> IO CInt

takeMVar :: Storable a => StoredMVar a -> IO a
takeMVar (StoredMVar _ fp) = withForeignPtr fp $ \p -> alloca $ \lp -> do
    r <- mvar_take p lp
    if r == 0
      then peek lp
      else throwErrno $ "takeMVar failed with code " ++ show r

putMVar :: Storable a => StoredMVar a -> a -> IO ()
putMVar (StoredMVar _ fp) x = withForeignPtr fp $ \p -> alloca $ \lp -> do
    poke lp x
    r <- mvar_put p lp
    unless (r == 0)
      $ throwErrno $ "putMVar failed with code " ++ show r
Full code.
Changing the FFI imports from interruptible to unsafe does not prevent the deadlock.
Sometimes the deadlock happens every second run; sometimes it happens only after 50 runs (and the rest execute as expected).
My guess is that GHC might interfere with the work of the POSIX mutexes through some OS signal handling, but I don't know GHC internals well enough to verify it.
Am I doing something stupidly wrong, or do I need some special tricks to make this work inside the GHC FFI?
P.S.: the latest version of the README with my investigations is available at interprocess mvar-fail.
UPDATE 13.06.2018:
I tried to temporarily block all OS signals by surrounding the function code with the following:
sigset_t mask, omask;
sigfillset(&mask);
sigprocmask(SIG_SETMASK, &mask, &omask);
...
sigprocmask(SIG_SETMASK, &omask, NULL);
This did not help.
Well, as expected, this was my fault: a very C-beginner error.
As one can see from the initialization snippet, I keep the mutex and the condition variables in a structure.
What one cannot see from the snippet here, but can see by following the links I gave (on GitHub), is that I copy that structure into shared memory. Not only is that not allowed for mutexes, but I also stupidly copy it before I have initialized everything in the structure.
That is, I copied a C structure where I should have set a pointer.
The most surprising thing is that the code still worked sometimes.
Here is the link to the erroneous code.
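For illustration, the correct pattern maps the shared region first and then initializes everything in place. A minimal C sketch, with my own names (MVarState, mvar_state_create, shm_name), not the repository's actual code:

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t mvMut;
    pthread_cond_t  canPutC;
    pthread_cond_t  canTakeC;
    int             isFull;
} MVarState;

/* Create the shared state and initialize it at its final address.
   A process-shared mutex must be initialized where it will live;
   memcpy'ing an initialized mutex is undefined behaviour. */
MVarState *mvar_state_create(const char *shm_name) {
    int fd = shm_open(shm_name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return NULL;
    if (ftruncate(fd, sizeof(MVarState)) != 0) return NULL;

    MVarState *s = mmap(NULL, sizeof(MVarState),
                        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping keeps the shared object alive */
    if (s == MAP_FAILED) return NULL;

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mvMut, &ma);

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->canPutC, &ca);
    pthread_cond_init(&s->canTakeC, &ca);

    s->isFull = 0;
    return s;
}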
I want to write a loop in Haskell using monads, but I am having a hard time understanding the concept.
Could someone provide me with a simple example of a while loop that runs while some condition is satisfied and involves IO actions? I don't want an abstract example but a concrete one that works.
Below there's a horrible example. You have been warned.
Consider the pseudocode:
var x = 23
while (x != 0) {
  print "x not yet 0, enter an adjustment"
  x = x + read()
}
print "x reached 0! Exiting"
Here's its piece-by-piece translation in Haskell, using an imperative style as much as possible.
import Data.IORef

main :: IO ()
main = do
  x <- newIORef (23 :: Int)
  let loop = do
        v <- readIORef x
        if v == 0
          then return ()
          else do
            putStrLn "x not yet 0, enter an adjustment"
            a <- readLn
            writeIORef x (v+a)
            loop
  loop
  putStrLn "x reached 0! Exiting"
The above is indeed horrible Haskell. It simulates the while loop using the recursively-defined loop, which is not too bad. But it uses IO everywhere, including for mimicking imperative-style mutable variables.
A better approach could be to remove those IORefs.
main = do
  let loop 0 = return ()
      loop v = do
        putStrLn "x not yet 0, enter an adjustment"
        a <- readLn
        loop (v+a)
  loop 23
  putStrLn "x reached 0! Exiting"
Not elegant code by any stretch, but at least the "while guard" now does not do unnecessary IO.
Usually, Haskell programmers strive hard to separate pure computation from IO as much as possible. This is because it often leads to better, simpler and less error-prone code.
As part of my coursework I have to write an MPI program (in Fortran) to solve a PDE from continuum mechanics.
In the sequential program, the file is written as follows:
do i=1,XX
  do j=1,YY
    do k=1,ZZ
      write(ifile) R(i,j,k)
      write(ifile) U(i,j,k)
      write(ifile) V(i,j,k)
      write(ifile) W(i,j,k)
      write(ifile) P(i,j,k)
    end do
  end do
end do
In the parallel program, I write the same as follows:
! parallelization takes place only along the X axis
call MPI_TYPE_CREATE_SUBARRAY(4, [INT(5), INT(ZZ), INT(YY), INT(XX)], &
    [5, ZZ, YY, PDB(iam).Xelements], [0, 0, 0, PDB(iam).Xoffset], &
    MPI_ORDER_FORTRAN, MPI_FLOAT, slice, ierr)
call MPI_TYPE_COMMIT(slice, ierr)
call MPI_FILE_OPEN(MPI_COMM_WORLD, cFileName, IOR(MPI_MODE_CREATE, MPI_MODE_WRONLY), &
    MPI_INFO_NULL, ifile, ierr)

do i = 1,PDB(iam).Xelements
  do j = 1,YY
    do k = 1,ZZ
      dataTmp(1,k,j,i) = R(i,j,k)
      dataTmp(2,k,j,i) = U(i,j,k)
      dataTmp(3,k,j,i) = V(i,j,k)
      dataTmp(4,k,j,i) = W(i,j,k)
      dataTmp(5,k,j,i) = P(i,j,k)
    end do
  end do
end do

call MPI_FILE_SET_VIEW(ifile, offset, MPI_FLOAT, slice, 'native', MPI_INFO_NULL, ierr)
call MPI_FILE_WRITE_ALL(ifile, dataTmp, 5*PDB(iam).Xelements*YY*ZZ, MPI_FLOAT, wstatus, ierr)
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
It works well. But I'm not sure about the use of the dataTmp array. Which solution would be faster and more correct? What about using a 4D array like dataTmp throughout the whole program? Or should I instead create 5 special MPI types with different displacements?
Using dataTmp is fine, if you have the memory space. Your MPI_FILE_WRITE_ALL call will be the most expensive part of this code.
You've done the hard part, setting an MPI-IO file view. If you want to get rid of dataTmp, you could create an MPI datatype to describe the arrays (probably using MPI_Type_hindexed and MPI_Get_address), then use MPI_BOTTOM as the memory buffer.
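A hedged sketch of that approach, in C (the Fortran calls are analogous): it uses the MPI-3 name MPI_Type_create_hindexed_block, treats the five fields as flat local arrays of n points each, and all function and parameter names here are illustrative; the real code would compute the displacements with the same (k,j,i) traversal it already uses to fill dataTmp.

#include <mpi.h>
#include <stdlib.h>

/* Illustrative only: build a memory datatype that visits R,U,V,W,P
   point by point (matching the interleaved file view), so the data
   can be written via MPI_BOTTOM with no staging copy. */
void write_interleaved(MPI_File fh, MPI_Datatype filetype, MPI_Offset offset,
                       const float *R, const float *U, const float *V,
                       const float *W, const float *P, int n)
{
    MPI_Aint *disps = malloc((size_t)(5 * n) * sizeof(MPI_Aint));
    MPI_Datatype memtype;
    MPI_Status status;
    int i;

    /* One absolute address per value, in exactly the order the file
       view expects: R,U,V,W,P for point 0, then for point 1, ... */
    for (i = 0; i < n; i++) {
        MPI_Get_address(&R[i], &disps[5*i + 0]);
        MPI_Get_address(&U[i], &disps[5*i + 1]);
        MPI_Get_address(&V[i], &disps[5*i + 2]);
        MPI_Get_address(&W[i], &disps[5*i + 3]);
        MPI_Get_address(&P[i], &disps[5*i + 4]);
    }
    MPI_Type_create_hindexed_block(5 * n, 1, disps, MPI_FLOAT, &memtype);
    MPI_Type_commit(&memtype);

    MPI_File_set_view(fh, offset, MPI_FLOAT, filetype, "native", MPI_INFO_NULL);
    /* With absolute displacements, the buffer argument is MPI_BOTTOM. */
    MPI_File_write_all(fh, MPI_BOTTOM, 1, memtype, &status);

    MPI_Type_free(&memtype);
    free(disps);
}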
If I/O speed is an issue and you have the option, I'd suggest changing the file format - or alternatively, how the data is laid out in memory - so that the two line up more closely: in the serial code, writing data in this transposed and interleaved way is going to be very slow:
program testoutput
  implicit none

  integer, parameter :: XX=512, YY=512, ZZ=512
  real, dimension(:,:,:), allocatable :: R, U, V, W, P
  integer :: timer
  integer :: ifile
  real :: elapsed
  integer :: i,j,k

  allocate(R(XX,YY,ZZ), P(XX,YY,ZZ))
  allocate(U(XX,YY,ZZ), V(XX,YY,ZZ), W(XX,YY,ZZ))

  R = 1.; U = 2.; V = 3.; W = 4.; P = 5.

  open(newunit=ifile, file='interleaved.dat', form='unformatted', status='new')
  call tick(timer)
  do i=1,XX
    do j=1,YY
      do k=1,ZZ
        write(ifile) R(i,j,k)
        write(ifile) U(i,j,k)
        write(ifile) V(i,j,k)
        write(ifile) W(i,j,k)
        write(ifile) P(i,j,k)
      end do
    end do
  end do
  elapsed=tock(timer)
  close(ifile)
  print *,'Elapsed time for interleaved: ', elapsed

  open(newunit=ifile, file='noninterleaved.dat', form='unformatted', status='new')
  call tick(timer)
  write(ifile) R
  write(ifile) U
  write(ifile) V
  write(ifile) W
  write(ifile) P
  elapsed=tock(timer)
  close(ifile)
  print *,'Elapsed time for noninterleaved: ', elapsed

  deallocate(R,U,V,W,P)

contains

  subroutine tick(t)
    integer, intent(OUT) :: t
    call system_clock(t)
  end subroutine tick

  ! returns time in seconds from now to time described by t
  real function tock(t)
    integer, intent(in) :: t
    integer :: now, clock_rate
    call system_clock(now,clock_rate)
    tock = real(now - t)/real(clock_rate)
  end function tock

end program testoutput
Running gives
$ gfortran -Wall io-serial.f90 -o io-serial
$ ./io-serial
Elapsed time for interleaved: 225.755005
Elapsed time for noninterleaved: 4.01700020
As Rob Latham, who knows more than a few things about this stuff, says, your transposition for the parallel version is fine - it does the interleaving and transposing explicitly in memory, where it's much faster, and then blasts it out to disk. It's about as fast as the IO is going to get.
You can definitely avoid the dataTmp array by creating one or five individual datatypes that do the transposition/interleaving for you on the way out to disk via the MPI_File_write_all routine. That will give you more of a balance between memory usage and performance. You won't be explicitly defining a big 3-D array, but the MPI-IO code will still improve performance over looping over individual elements by doing a fair bit of buffering, meaning that a certain amount of memory is set aside to do the writing efficiently. The good news is that the balance is tunable by setting MPI-IO hints in the Info variable; the bad news is that the code is likely to be less clear than what you have now.
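For reference, those hints are passed through an MPI_Info object when the file is opened. A minimal C sketch; the hint name and value here ("cb_buffer_size", a ROMIO collective-buffering hint) are just a common example, not a recommendation, and available hints are implementation-specific:

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Info info;
    MPI_File fh;

    MPI_Init(&argc, &argv);

    /* Attach I/O tuning hints at open time. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "16777216");
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);

    /* ... set the file view and write collectively, as in the question ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}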