Julia - How to convert DataFrame to Array? - arrays

I have a DataFrame containing only numerical values. Now, what I'd like to do is extract all the values of this DataFrame as an Array. How can I do this? I know that for a single column, if I do df[!,:x1], then the output is an array. But how to do this for all the columns?

The shortest form seems to be:
julia> Matrix(df)
3×2 Array{Float64,2}:
0.723835 0.307092
0.02993 0.0147598
0.141979 0.0271646
In some scenarios you might need to specify the element type, for example Matrix{Union{Missing, Float64}}(df).
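As a minimal sketch of the difference (the column names and values are made up for illustration), the plain constructor infers the element type from the columns, while the parametrized form produces a matrix that can also hold missing values later on:

```julia
using DataFrames

# Illustrative data; any all-numeric data frame behaves the same way.
df = DataFrame(x = [1.0, 2.0, 3.0], y = [4.0, 5.0, 6.0])

m = Matrix(df)                             # element type inferred from the columns
mm = Matrix{Union{Missing, Float64}}(df)   # element type requested explicitly

mm[1, 1] = missing                         # works; the same assignment on `m` would throw
```

Both calls copy the data, so changing `mm` does not touch `df`.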

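Try this:
convert(Matrix, df[:,:])
Note that this form comes from older DataFrames.jl releases and may not be available in recent ones, where Matrix(df) is the form to prefer.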

You can also use the Tables API for this, in particular the Tables.matrix function:
julia> df = DataFrame(x=rand(3), y=rand(3))
3×2 DataFrame
Row │ x y
│ Float64 Float64
─────┼─────────────────────
1 │ 0.33002 0.180934
2 │ 0.834302 0.470976
3 │ 0.0916842 0.45172
julia> Tables.matrix(df)
3×2 Array{Float64,2}:
0.33002 0.180934
0.834302 0.470976
0.0916842 0.45172

Related

DataFrames to Database tables

Hello, geniuses. I am a newbie in Julia, but I am ambitious.
I am trying to build the following pipeline, which should of course run automatically:
1. Read data from a CSV file into a DataFrame.
2. Check the data, then create DB tables according to the DataFrame's column types.
3. Insert the data from the DataFrame into the created table (e.g. SQLite).
I am stuck at step 2, because a column's type is, for example, Vector{String15}.
I am struggling with how to reflect that data type in the CREATE TABLE query.
That is, I could not find a way to make checks (a) and (b) below work.
fname = string(@__DIR__, "/", "testdata/test.csv")
df = CSV.read(fname, DataFrame)
last = ncol(df)
for i = 1:last
    col[i] = typeof(df[!, i])   # e.g. Vector{String15}
    if String == col[i]         # (a) does not work
        # create table sql
        # expect
        query = "create table testtable( col1 varchar(15),...."
    elseif Int == col[i]        # (b) does not work
        # create table sql
        # expect
        query = "create table testtable( col1 int,...."
    end
    # ...
end
I am wondering:
Do I really have to derive the table column type from 'Vector{String15}' somehow?
Does DataFrames have a utility method for this?
Should I combine it with another module to do it?
I am hoping for some smart tips. Thanks in advance.
Here is how you can do it both ways:
julia> using DataFrames
julia> using CSV
julia> df = CSV.read("test.csv", DataFrame)
3×3 DataFrame
Row │ a b c
│ String15 Int64 Float64
─────┼─────────────────────────────
1 │ a1234567890 1 1.5
2 │ b1234567890 11 11.5
3 │ b1234567890 111 111.5
julia> using SQLite
julia> db = SQLite.DB("test.db")
SQLite.DB("test.db")
julia> SQLite.load!(df, db, "df")
"df"
julia> SQLite.columns(db, "df")
(cid = [0, 1, 2], name = ["a", " b", " c"], type = ["TEXT", "INT", "REAL"], notnull = [1, 1, 1], dflt_value = [missing, missing, missing], pk = [0, 0, 0])
julia> query = DBInterface.execute(db, "SELECT * FROM df")
SQLite.Query(SQLite.Stmt(SQLite.DB("test.db"), 4), Base.RefValue{Int32}(100), [:a, Symbol(" b"), Symbol(" c")], Type[Union{Missing, String}, Union{Missing, Int64}, Union{Missing, Float64}], Dict(:a => 1, Symbol(" c") => 3, Symbol(" b") => 2), Base.RefValue{Int64}(0))
julia> DataFrame(query)
3×3 DataFrame
Row │ a b c
│ String Int64 Float64
─────┼─────────────────────────────
1 │ a1234567890 1 1.5
2 │ b1234567890 11 11.5
3 │ b1234567890 111 111.5
If you need more explanation, this is covered in chapter 8 of Julia for Data Analysis. This chapter should be available on MEAP in 1-2 weeks (the source code is already available at https://github.com/bkamins/JuliaForDataAnalysis).
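If you do want to build the CREATE TABLE statement yourself, checks (a) and (b) from the question can be rewritten to look at the element type of each column and test it with `<:` instead of `==`. The following is only a sketch: `sql_type` is a hypothetical helper, the table name and the mapping to TEXT/INTEGER/REAL are assumptions, and plain strings stand in for the String15 values that CSV.jl produces (String15 is also a subtype of AbstractString, so the same test applies).

```julia
using DataFrames

# Hypothetical helper: map a column's element type to an SQLite type name.
function sql_type(col::AbstractVector)
    T = nonmissingtype(eltype(col))   # strip Missing from Union{Missing, T}
    if T <: AbstractString            # also matches String15 and other inline strings
        return "TEXT"
    elseif T <: Integer
        return "INTEGER"
    elseif T <: AbstractFloat
        return "REAL"
    else
        return "BLOB"                 # assumed fallback for anything else
    end
end

# Illustrative data standing in for the CSV file.
df = DataFrame(a = ["x", "y"], b = [1, 2], c = [1.5, 2.5])

cols = ["$(name) $(sql_type(df[!, name]))" for name in names(df)]
query = "CREATE TABLE testtable (" * join(cols, ", ") * ")"
```

Comparing `typeof(df[!, i])` with `String` fails because the column is a vector, not a string; `eltype` gives the type of the values stored in it.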

How to export/import an array in Julia?

I want to move an array from my laptop (Julia 1.3.1) to my desktop PC (Julia 1.6.2).
I make an array in Julia 1.3.1 as follows.
using LinearAlgebra
H = ... # give a matrix H
vals, vector = eigen(H) # avoid naming a result `eigen`, which would clash with the function
Then, I'd like to move "vector" to Julia 1.6.2.
How do you do that?
The simplest way is by using DelimitedFiles:
julia> v = [1.0,2.0,3.0]
julia> using DelimitedFiles
julia> writedlm("f.txt", v)
julia> readdlm("f.txt")
3×1 Matrix{Float64}:
1.0
2.0
3.0
julia> vec(readdlm("f.txt"))
3-element Vector{Float64}:
1.0
2.0
3.0
Note that DelimitedFiles works with matrices, so the last example shows what to do if you would rather store a vector.
Edit following Bogumil's comment
When you have a Matrix of Complex numbers you need to provide the output type for readdlm:
julia> v = Complex.(rand(2,3), rand(2,3))
2×3 Matrix{ComplexF64}:
0.282157+0.540556im 0.757765+0.103518im 0.979935+0.212347im
0.557499+0.934859im 0.604032+0.338489im 0.431962+0.945946im
julia> writedlm("f.txt", v)
julia> readdlm("f.txt",'\t',Complex{Float64})
2×3 Matrix{ComplexF64}:
0.282157+0.540556im 0.757765+0.103518im 0.979935+0.212347im
0.557499+0.934859im 0.604032+0.338489im 0.431962+0.945946im
julia> readdlm("f.txt",'\t',Complex{Float64}) == v
true
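Applied to the eigenvector matrix from the question, the round trip could look like the sketch below. The matrix H and the file name are illustrative assumptions, and a temporary directory stands in for whatever you use to copy the file between machines.

```julia
using LinearAlgebra, DelimitedFiles

# A small symmetric matrix standing in for the question's H (illustrative values).
H = Symmetric([2.0 1.0; 1.0 3.0])
F = eigen(H)                       # F.values and F.vectors, without shadowing `eigen`

path = joinpath(mktempdir(), "vectors.txt")
writedlm(path, F.vectors)          # on the source machine

V = readdlm(path)                  # on the target machine, after copying the file
V ≈ F.vectors                      # compare the matrix read back with the original
```

Because the file is plain text, it does not depend on the Julia version that wrote it.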
Another way is to use a binary format. For long-term serialization that has to survive a change of Julia version, BSON (Binary JSON) could be a good option:
julia> using BSON
julia> BSON.bson("v.bson", v = v)
julia> v2 = BSON.load("v.bson")[:v]
2×3 Matrix{ComplexF64}:
0.282157+0.540556im 0.757765+0.103518im 0.979935+0.212347im
0.557499+0.934859im 0.604032+0.338489im 0.431962+0.945946im

Reading CSV file in loop Dataframe (Julia)

I want to read multiple CSV files with changing names like "CSV_1.csv" and so on.
My idea was to simply implement a loop like the following
using CSV
for i = 1:8
a[i] = CSV.read("0.$i.csv")
end
but obviously that won't work.
Is there a simple way of implementing this, like introducing an additional dimension in the dataframe?
Assuming a in this case is an array, this is definitely possible, but to do it this way, you'd need to pre-allocate your array, since you can't assign an index that doesn't exist yet:
julia> a = []
0-element Array{Any,1}
julia> a[1] = 1
ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [1]
Stacktrace:
[1] setindex!(::Array{Any,1}, ::Any, ::Int64) at ./essentials.jl:455
[2] top-level scope at REPL[10]:1
julia> a2 = Vector{Int}(undef, 5);
julia> for i in 1:5
a2[i] = i
end
julia> a2
5-element Array{Int64,1}:
1
2
3
4
5
Alternatively, you can use push!() to add things to an array as you need.
julia> a3 = [];
julia> for i in 1:5
push!(a3, i)
end
julia> a3
5-element Array{Any,1}:
1
2
3
4
5
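So for your CSV files (recent CSV.jl versions require the sink type, here DataFrame, as the second argument to CSV.read):
using CSV, DataFrames
a = DataFrame[]
for i = 1:8
    push!(a, CSV.read("0.$i.csv", DataFrame))
end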
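As an alternative to what Kevin proposed, you can write:
using CSV, DataFrames
# read the files into a vector (DataFrame is the sink type that recent CSV.jl versions require)
a = CSV.read.(["0.$i.csv" for i in 1:8], DataFrame)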
# add an indicator column
for i in 1:8
a[i][!, :id] .= i
end
# create a single data frame with indicator column holding the source
b = reduce(vcat, a)
You can read an arbitrary number of CSV files with a certain pattern in the file name, create a dataframe per file and lastly, if you want, create a single dataframe.
using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (raw is useful in Windows to add a \)
files=glob("*.csv", path) # to load all CSVs from a folder (* means arbitrary pattern)
dfs = DataFrame.( CSV.File.( files ) ) # creates a list of dataframes
# add an index column to be able to later discern the different sources
for i in 1:length(dfs)
dfs[i][!, :sample] .= i # I called the new col sample
end
# finally, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)

Julia: Add missing value to array

I have an array that can take Float64 and Missing values:
local x::Array{Union{Float64, Missing}, 1} = [1.0, missing, 3.0]
I can add more Float64 values using the append! function, but I can't add a missing value this way. I get the following error:
julia> append!(x, missing)
ERROR: MethodError: no method matching length(::Missing)
What's the correct way to add missing values to this array?
Yes, you are right that push! should be used.
Additionally, your code does not need to be so verbose:
julia> x = [1.0, missing, 3.0]
3-element Array{Union{Missing, Float64},1}:
1.0
missing
3.0
julia> y = Union{Missing, Float64}[]
0-element Array{Union{Missing, Float64},1}
julia> push!(y,1);
julia> push!(y,missing)
2-element Array{Union{Missing, Float64},1}:
1.0
missing
Moreover, instead of Array{Union{Float64, Missing}, 1} the shorter and more readable version Vector{Union{Float64, Missing}} can be used.
I should have been using push! - append! is for adding collections, while push! is for single values.
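A short sketch of the distinction, using the array from the question (the appended values are illustrative):

```julia
x = Union{Float64, Missing}[1.0, missing, 3.0]

push!(x, missing)            # push! adds exactly one element
append!(x, [4.0, missing])   # append! adds every element of a collection

length(x)                    # 6
```

So `append!(x, [missing])` would also work, since the missing value is then wrapped in a collection.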

Creating a 1x1 Julia array

I would like to create a 1×1 array (say an Array{Float64,2}) and initialize it to some value. Of course this works:
M=zeros(1,1)
M[1,1]=0.1234
Is there a more concise way to create M and initialize it at the same time?
Since [1.234] will give you a Vector in Julia, the simplest way I could come up with is:
julia> fill(1.234,1,1)
1x1 Array{Float64,2}:
1.234
An alternative is to reshape:
julia> reshape([1.234], 1, 1)
1x1 Array{Float64,2}:
1.234
The existing answers are not what I would recommend. The best way is to use
julia> hcat(5)
1×1 Array{Int64,2}:
5
This is most concise and parallels the [x y] concatenation form.
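For comparison, here is a small sketch that builds the matrix from the question with each of the three approaches above and checks that they agree (the variable names are arbitrary):

```julia
a = fill(0.1234, 1, 1)
b = reshape([0.1234], 1, 1)
c = hcat(0.1234)

size(a) == size(b) == size(c) == (1, 1)   # all three are 1×1 matrices
a == b == c                               # and hold the same value
```

On Julia 1.7 and later the literal `[0.1234;;]` should give the same 1×1 matrix, but it does not parse on older versions.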
