Reading CSV file in loop Dataframe (Julia) - loops

I want to read multiple CSV files with changing names like "CSV_1.csv" and so on.
My idea was to simply implement a loop like the following
using CSV
for i = 1:8
a[i] = CSV.read("0.$i.csv")
end
but obviously that won't work.
Is there a simple way of implementing this, like introducing a additional dimension in the dataframe?

Assuming a in this case is an array, this is definitely possible, but to do it this way, you'd need to pre-allocate your array, since you can't assign an index that doesn't exist yet:
julia> a = []
0-element Array{Any,1}
julia> a[1] = 1
ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [1]
Stacktrace:
[1] setindex!(::Array{Any,1}, ::Any, ::Int64) at ./essentials.jl:455
[2] top-level scope at REPL[10]:1
julia> a2 = Vector{Int}(undef, 5);
julia> for i in 1:5
a2[i] = i
end
julia> a2
5-element Array{Int64,1}:
1
2
3
4
5
Alternatively, you can use push!() to add things to an array as you need.
julia> a3 = [];
julia> for i in 1:5
push!(a3, i)
end
julia> a3
5-element Array{Any,1}:
1
2
3
4
5
So for your CSV files,
using CSV
a = []
for i = 1:8
push!(a, CSV.read("0.$i.csv"))
end

You can alternatively to what Kevin proposed write:
# read in the files into a vector
a = CSV.read.(["0.$i.csv" for i in 1:8])
# add an indicator column
for i in 1:8
a[i][!, :id] .= i
end
# create a single data frame with indicator column holding the source
b = reduce(vcat, a)

You can read an arbitrary number of CSV files with a certain pattern in the file name, create a dataframe per file and lastly, if you want, create a single dataframe.
using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (raw is useful in Windows to add a \)
files=glob("*.csv", path) # to load all CSVs from a folder (* means arbitrary pattern)
dfs = DataFrame.( CSV.File.( files ) ) # creates a list of dataframes
# add an index column to be able to later discern the different sources
for i in 1:length(dfs)
dfs[i][!, :sample] .= i # I called the new col sample
end
# finally, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)

Related

Write and Read multiple arrays on a file in Python

I need to write multiple arrays (3,3) on the same file and read them one by one.
It’s the first time I use Python. Help me, please
You could use np.savez:
# save three arrays to a numpy archive file
a0 = np.array([[1,2,3],[4,5,6],[7,8,9]])
a1 = np.array([[11,12,13],[14,15,16],[17,18,19]])
a2 = np.array([[21,22,23],[24,25,26],[27,28,29]])
np.savez("my_archive.npz", soil=a0, crust=a1, bedrock=a2)
# opening the archive and accessing each array by name
with np.load("my_archive.npz") as my_archive_file:
out0 = my_archive_file["soil"]
out1 = my_archive_file["crust"]
out2 = my_archive_file["bedrock"]
https://www.pythonlikeyoumeanit.com/Module5_OddsAndEnds/WorkingWithFiles.html
if you mean an array like this
mylist = [[1,2,3],[4,5,6],[7,8,9]]
the n the solution is like this
with open("fout.txt", "w") as fout:
print(*mylist, sep="\n", file=fout)
to read like this
with open("fout.txt", "r") as fout:
print(fout.read())

How to exchange one specific value in an array in Julia?

I'm pretty new to Julia, so this is probably a pretty easy question. I want to create a vector and exchange a given value with a new given value.
This is how it would work in Java, but I can't find a solution for Julia. Do I have to copy the array first? I'm pretty clueless..
function sorted_exchange(v::Array{Int64,1}, in::Int64, out::Int64)
i=1
while v[i]!=out
i+=1
end
v[i]=in
return v
end
The program runs but just returns the "old" vector.
Example: sorted_exchange([1,2,3],4,3), expected:[1,2,4], actual:[1,2,3]
There's a nice built-in function for this: replace or its in-place version: replace!:
julia> v = [1,2,3];
julia> replace!(v, 3=>4);
julia> v
3-element Array{Int64,1}:
1
2
4
The code you have posted seems to work fine, though it does something slightly different. Your code only replaces the first instance of 3, while replace! replaces every instance. If you just want the first instance to be replaced you can write:
julia> v = [1,2,3,5,3,5];
julia> replace!(v, 3=>4; count=1)
6-element Array{Int64,1}:
1
2
4
5
3
5
You can find the value you want to replace using findall:
a = [1, 2, 5]
findall(isequal(5), a) # returns 3, the index of the 5 in a
and use that to replace the value
a[findall(isequal(5), a)] .= 6
a # returns [1, 2, 6]

Correct way of maintaining array structure in R [duplicate]

I am working with 3D arrays. A function takes a 2D array slice (matrix) from the user and visualizes it, using row and column names (the corresponding dimnames of the array). It works fine if the array dimensions are > 1.
However, if I have 1x1x1 array, I cannot extract the slice as a matrix:
a <- array(1, c(1,1,1), list(A="a", B="b", C="c"))
a[1,,]
[1] 1
It is a scalar with no dimnames, hence part of the necessary information is missing. If I add drop=FALSE, I don't get a matrix but retain the original array:
a[1,,,drop=FALSE]
, , C = c
B
A b
a 1
The dimnames are here but it is still 3-dimensional. Is there an easy way to get a matrix slice from 1x1x1 array that would look like the above, just without the third dimension:
B
A b
a 1
I suspect the issue is that when indexing an array, we cannot distinguish between 'take 1 value' and 'take all values' in case where 'all' is just a singleton...
The drop parameter of [ is all-or-nothing, but the abind package has an adrop function which will let you choose which dimension you want to drop:
abind::adrop(a, drop = 3)
## B
## A b
## a 1
Without any extra packages, the best I could do was to apply and return the sub-array:
apply(a, 1:2, identity)
# or
apply(a, 1:2, I)
# B
#A b
# a 1

How to create sub-arrays access the i-th dimension of an array within for()?

In a for-loop, I run in i over an array which I would like to sub-index in dimension i. How can this be done? So a minimal example would be
(A <- array(1:24, dim = 2:4))
A[2,,] # i=1
A[,1,] # i=2
A[,,3] # i=3
where I index 'by foot'. I tried something along the lines of this but wasn't successful. Of course one could could create "2,," as a string and then eval & parse the code, but that's ugly. Also, inside the for loop (over i), I could use aperm() to permute the array such that the new first dimension is the former ith, so that I can simply access the first component. But that's kind of ugly too and requires to permute the array back. Any ideas how to do it more R-like/elegantly?
The actual problem is for a multi-dimensional table() object, but I think the idea will remain the same.
Update
I accepted Rick's answer. I just present it with a for loop and simplified it further:
subindex <- c(2,1,3) # in the ith dimension, we would like to subindex by subindex[i]
for(i in seq_along(dim(A))) {
args <- list(1:2, 1:3, 1:4)
args[i] <- subindex[i]
print(do.call("[", c(list(A), args)))
}
#Build a multidimensional array
A <- array(1:24, dim = 2:4)
# Select a sub-array
indexNumber = 2
indexSelection = 1
# Build a parameter list indexing all the elements of A
parameters <- list(A, 1:2, 1:3, 1:4)
# Modify the appropriate list element to a single value
parameters[1 + indexNumber] <- indexSelection
# select the desired subarray
do.call("[", parameters)
# Now for something completely different!
#Build a multidimensional array
A <- array(1:24, dim = 2:4)
# Select a sub-array
indexNumber = 2
indexSelection = 1
reduced <- A[slice.index(A, indexNumber) == indexSelection]
dim(reduced) <- dim(A)[-indexNumber]
# Also works on the left-side
A[slice.index(A, 2)==2] <- -1:-8

Julia Approach to python equivalent list of lists

I just started tinkering with Julia and I'm really getting to like it. However, I am running into a road block. For example, in Python (although not very efficient or pythonic), I would create an empty list and append a list of a known size and type, and then convert to a NumPy array:
Python Snippet
a = []
for ....
a.append([1.,2.,3.,4.])
b = numpy.array(a)
I want to be able to do something similar in Julia, but I can't seem to figure it out. This is what I have so far:
Julia snippet
a = Array{Float64}[]
for .....
push!(a,[1.,2.,3.,4.])
end
The result is an n-element Array{Array{Float64,N},1} of size (n,), but I would like it to be an nx4 Array{Float64,2}.
Any suggestions or better way of doing this?
The literal translation of your code would be
# Building up as rows
a = [1. 2. 3. 4.]
for i in 1:3
a = vcat(a, [1. 2. 3. 4.])
end
# Building up as columns
b = [1.,2.,3.,4.]
for i in 1:3
b = hcat(b, [1.,2.,3.,4.])
end
But this isn't a natural pattern in Julia, you'd do something like
A = zeros(4,4)
for i in 1:4, j in 1:4
A[i,j] = j
end
or even
A = Float64[j for i in 1:4, j in 1:4]
Basically allocating all the memory at once.
Does this do what you want?
julia> a = Array{Float64}[]
0-element Array{Array{Float64,N},1}
julia> for i=1:3
push!(a,[1.,2.,3.,4.])
end
julia> a
3-element Array{Array{Float64,N},1}:
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
julia> b = hcat(a...)'
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
It seems to match the python output:
In [9]: a = []
In [10]: for i in range(3):
a.append([1, 2, 3, 4])
....:
In [11]: b = numpy.array(a); b
Out[11]:
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
I should add that this is probably not what you actually want to be doing as the hcat(a...)' can be expensive if a has many elements. Is there a reason not to use a 2d array from the beginning? Perhaps more context to the question (i.e. the code you are actually trying to write) would help.
The other answers don't work if the number of loop iterations isn't known in advance, or assume that the underlying arrays being merged are one-dimensional. It seems Julia lacks a built-in function for "take this list of N-D arrays and return me a new (N+1)-D array".
Julia requires a different concatenation solution depending on the dimension of the underlying data. So, for example, if the underlying elements of a are vectors, one can use hcat(a) or cat(a,dims=2). But, if a is e.g a 2D array, one must use cat(a,dims=3), etc. The dims argument to cat is not optional, and there is no default value to indicate "the last dimension".
Here is a helper function that mimics the np.array functionality for this use case. (I called it collapse instead of array, because it doesn't behave quite the same way as np.array)
function collapse(x)
return cat(x...,dims=length(size(x[1]))+1)
end
One would use this as
a = []
for ...
... compute new_a...
push!(a,new_a)
end
a = collapse(a)

Resources