Hello geniuses. I am a newbie in Julia, but I have an ambitious goal.
I am trying to build the following workflow; of course, it should be an automatic process:
1. Read data from a CSV file into a DataFrame.
2. Check the data, then create DB tables based on the DataFrame column types.
3. Insert the data from the DataFrame into the created table (e.g. SQLite).
I am stuck at step 2 because, for example, a column's data type is 'Vector{String15}'.
I am struggling with how to map that data type to the CREATE TABLE query.
I mean, I could not get either approach (a) or (b) below to work.
fname = string(@__DIR__, "/", "testdata/test.csv")
df = CSV.read(fname, DataFrame)
last = ncol(df)
col = Vector{Any}(undef, last)
for i = 1:last
    col[i] = typeof(df[!, i])   # e.g. Vector{String15}
    if String == col[i]         # (a) does not work
        # create table sql
        # expected:
        query = "create table testtable( col1 varchar(15),...."
    elseif Int == col[i]        # (b) does not work
        # create table sql
        # expected:
        query = "create table testtable( col1 int,...."
    end
    # ...
end
I am wondering:
Do I really have to derive the table column type from 'Vector{String15}' somehow?
Does DataFrames have a utility method to do it?
Should I combine it with another package to do it?
I am hoping for some smart tips. Thanks in advance.
Here is how you can do it both ways (writing the data frame to SQLite and reading it back):
julia> using DataFrames
julia> using CSV
julia> df = CSV.read("test.csv", DataFrame)
3×3 DataFrame
 Row │ a            b      c
     │ String15     Int64  Float64
─────┼─────────────────────────────
   1 │ a1234567890      1      1.5
   2 │ b1234567890     11     11.5
   3 │ b1234567890    111    111.5
julia> using SQLite
julia> db = SQLite.DB("test.db")
SQLite.DB("test.db")
julia> SQLite.load!(df, db, "df")
"df"
julia> SQLite.columns(db, "df")
(cid = [0, 1, 2], name = ["a", " b", " c"], type = ["TEXT", "INT", "REAL"], notnull = [1, 1, 1], dflt_value = [missing, missing, missing], pk = [0, 0, 0])
julia> query = DBInterface.execute(db, "SELECT * FROM df")
SQLite.Query(SQLite.Stmt(SQLite.DB("test.db"), 4), Base.RefValue{Int32}(100), [:a, Symbol(" b"), Symbol(" c")], Type[Union{Missing, String}, Union{Missing, Int64}, Union{Missing, Float64}], Dict(:a => 1, Symbol(" c") => 3, Symbol(" b") => 2), Base.RefValue{Int64}(0))
julia> DataFrame(query)
3×3 DataFrame
 Row │ a            b      c
     │ String       Int64  Float64
─────┼─────────────────────────────
   1 │ a1234567890      1      1.5
   2 │ b1234567890     11     11.5
   3 │ b1234567890    111    111.5
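If you still want to build the CREATE TABLE statement yourself, note that the element type of a column is obtained with eltype, not typeof (which returns the vector type such as Vector{String15}). Below is a minimal sketch, assuming a placeholder table name testtable and a simple type mapping that you would adjust to your schema:
using CSV, DataFrames

df = CSV.read("test.csv", DataFrame)

# map a Julia element type to an SQL type name (assumed mapping; adjust as needed)
function sqltype(T::Type)
    T = Base.nonmissingtype(T)      # strip Union{Missing, ...}
    T <: AbstractString && return "TEXT"
    T <: Integer        && return "INT"
    T <: AbstractFloat  && return "REAL"
    return "BLOB"
end

# build "colname SQLTYPE" fragments for every column
cols = [string(name, " ", sqltype(eltype(col)))
        for (name, col) in zip(names(df), eachcol(df))]

query = "CREATE TABLE testtable (" * join(cols, ", ") * ")"
# e.g. "CREATE TABLE testtable (a TEXT, b INT, c REAL)"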
If you need more explanation, this is covered in chapter 8 of Julia for Data Analysis. The chapter should be available on MEAP in 1-2 weeks (and the source code is already available at https://github.com/bkamins/JuliaForDataAnalysis).
I have a data frame holding names of municipalities and names of states. It looks like this:
my.df <- structure(list(Location = c("Abatiá", "Adrianópolis", "Agudos do Sul",
"Almirante Tamandaré", "Altamira do Paraná", "Altônia"), State = c("PR",
"PR", "PR", "PR", "PR", "PR")), .Names = c("Location", "State"
), row.names = 0:5, class = "data.frame")
What I need to do is to convert this data frame to an array. The expected output would be something like:
my.array$PR
Abatiá, PR
Adrianópolis, PR
Agudos do Sul, PR
...
my.array$RS
Vitória das Missões, RS
Westfalia, RS
Xangri-lá, RS
...
and so on.
How can I get there?
My actual data set has about 10k rows, so a fast solution would perhaps be desirable over legibility. Thanks!
The following should get you what you want.
df = data.frame("location" = c("a", "b", "c", "d", "e", "f"), "state" = c("pr", "pr", "pr", "rs", "rs", "rs"), stringsAsFactors=F)
my.array = lapply(unique(df$state), function(x) paste(df$location[df$state == x], df$state[df$state == x], sep=", "))
names(my.array) = unique(df$state)
my.array$pr
# [1] "a, pr" "b, pr" "c, pr"
I simplified the values in df, but the point remains the same.
Because you want a list-like result (i.e., one that you can index with $), use split on State. It will naturally produce a list with the states as names.
One way to do it is to split first:
split_df <- split(my.df, my.df$State)
my.array <- sapply(names(split_df), function(name)
                   paste(split_df[[name]][["Location"]],
                         ", ", name, sep=""),
                   USE.NAMES = TRUE)
A second way to use split (which, after thinking more about your problem, seems more elegant) is to split the location, state pairs directly:
# First, create a new vector (array) of location, state pairs
# use apply(X, 1, FUN) which works row-wise along X
# and for each row, paste it together
location_state <- apply(my.df,
                        1,
                        function(r) paste(r["Location"],
                                          r["State"],
                                          sep=', '))
#Second, split that vector, using State
split(location_state, my.df$State)
Example data
states <- sapply(1:100, function(pass) paste0(sample(LETTERS, 2), collapse=""))
my.df <- data.frame(State=sample(states, 10000, replace=TRUE),
                    Location=sapply(1:1e4, function(pass) paste0(sample(letters, 5),
                                                                 collapse="")),
                    stringsAsFactors=FALSE)
How about using Reduce?
Reduce(function(...) paste(..., sep=", "), my.df)
EDIT: updated benchmarking with @thelatemail's suggestion
#for your benchmarking using 1 million rows
library(rbenchmark)
df <- data.frame(X=rnorm(1e6), Y=rnorm(1e6))
benchmark(M1=Reduce(function(...) paste(..., sep=", "), df),
M2=do.call(paste, c(df, sep=", ")))
##test replications elapsed relative user.self sys.self user.child sys.child
##1 M1 10 68.60 1.219 68.55 0.00 NA NA
##2 M2 10 56.28 1.000 56.22 0.07 NA NA
do.call(paste, c(df, sep=", ")) is certainly faster!
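To get the per-state list the question asks for, one way (a rough sketch using the my.df from the question) is to combine the fast do.call(paste, ...) with split:
# my.df has columns Location, State (in that order), so paste them row-wise...
location_state <- do.call(paste, c(my.df, sep = ", "))
# ...and split the resulting vector by State to get a $-indexable list
my.array <- split(location_state, my.df$State)

my.array$PR
# [1] "Abatiá, PR"        "Adrianópolis, PR"  "Agudos do Sul, PR"  ...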
I have two arrays of hashes with the same keys but different values.
A = [{:a=>1, :b=>4, :c=>2},{:a=>2, :b=>1, :c=>3}]
B = [{:a=>1, :b=>1, :c=>2},{:a=>1, :b=>3, :c=>3}]
I'm trying to compare the 1st hash in A with the 1st hash in B, and so on, using their keys, and to identify which key and which value do not match when they differ. Please help.
A.each_key do |key|
if A[key] == B[key]
puts "#{key} match"
else
puts "#{key} dont match"
I am not certain which comparisons you want to make, so I will show ways of answering different questions. You want to make pairwise comparisons of two arrays of hashes, but that's really no more difficult than just comparing two hashes, as I will show later. For now, suppose you merely want to compare two hashes:
h1 = {:a=>1, :b=>4, :c=>2, :d=>3 }
h2 = {:a=>1, :b=>1, :c=>2, :e=>5 }
What keys are in h1 or h2 (or both)?
h1.keys | h2.keys
#=> [:a, :b, :c, :d, :e]
See Array#|.
What keys are in both hashes?
h1.keys & h2.keys
#=> [:a, :b, :c]
See Array#&.
What keys are in h1 but not h2?
h1.keys - h2.keys
#=> [:d]
See Array#-.
What keys are in h2 but not h1?
h2.keys - h1.keys #=> [:e]
What keys are in one hash only?
(h1.keys - h2.keys) | (h2.keys - h1.keys)
#=> [:d, :e]
or
(h1.keys | h2.keys) - (h1.keys & h2.keys)
What keys are in both hashes and have the same values in both hashes?
(h1.keys & h2.keys).select { |k| h1[k] == h2[k] }
#=> [:a, :c]
See Array#select.
What keys are in both hashes and have different values in the two hashes?
(h1.keys & h2.keys).reject { |k| h1[k] == h2[k] }
#=> [:b]
Suppose now we had two arrays of hashes:
a1 = [{:a=>1, :b=>4, :c=>2, :d=>3 }, {:a=>2, :b=>1, :c=>3, :d=>4}]
a2 = [{:a=>1, :b=>1, :c=>2, :e=>5 }, {:a=>1, :b=>3, :c=>3, :e=> 6}]
and wished to compare the hashes pairwise. To do that first take the computation of interest above and wrap it in a method. For example:
def keys_in_both_with_different_values(h1, h2)
(h1.keys & h2.keys).reject { |k| h1[k] == h2[k] }
end
Then write:
a1.zip(a2).map { |h1,h2| keys_in_both_with_different_values(h1, h2) }
#=> [[:b], [:a, :b]]
See Enumerable#zip.
Since you're comparing elements of arrays...
A.each_with_index do |hasha, index|
hashb = B[index]
hasha.each_key do |key|
if hasha[key] == hashb[key]
puts "in array #{index} the key #{key} matches"
else
puts "in array #{index} the key #{key} doesn't match"
end
end
end
edit - added a missing end!
When you are dealing with an array, you should reference an element with square brackets '[]', as in
A[index of the element you are looking for]
If you want to access an element in a hash, you use square brackets with the corresponding key inside, as in
A[:a]
(referencing the value that corresponds to the key :a, which is of type Symbol).
In this case, the arrays in question have hashes nested within an array. So, for example, the expression B[0][:c] will give 2.
To compare the 1st hash in A with the 1st hash in B, the 2nd hash in A with the 2nd hash in B, and so forth, you can use the each_with_index method on an Array object, like so:
A = [{:a=>1, :b=>4, :c=>2},{:a=>2, :b=>1, :c=>3}]
B = [{:a=>1, :b=>1, :c=>2},{:a=>1, :b=>3, :c=>3}]
sym = [:a, :b, :c]
A.each_with_index do |hash_a, idx_a|
sym.each do |sym|
if A[idx_a][sym] == B[idx_a][sym]
puts "Match found! (key -- :#{sym}, value -- #{A[idx_a][sym]})"
else
puts "No match here."
end
end
end
which checks the values based on the keys (symbols) in the following order: :a -> :b -> :c -> :a -> :b -> :c
This will print out;
Match found! (key -- :a, value -- 1)
No match here.
Match found! (key -- :c, value -- 2)
No match here.
No match here.
Match found! (key -- :c, value -- 3)
The each_with_index method may look a little cryptic if you are not familiar with it.
If you are uncomfortable with it, you might want to check:
http://apidock.com/ruby/Enumerable/each_with_index
Last but not least, don't forget to add 'end'(s) at the end of each block (i.e. the code between do/end) and each if statement in your code.
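For reference, a corrected sketch of the snippet from the question (iterating A and B pairwise with zip, and with the missing 'end's added) could look like this:
A.zip(B).each do |hash_a, hash_b|
  hash_a.each_key do |key|
    if hash_a[key] == hash_b[key]
      puts "#{key} match"
    else
      puts "#{key} dont match"
    end
  end
end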
I hope it helps.
(I am definitely using the wrong terminology in this question, sorry for that - I just don't know the correct way to describe this in R terms...)
I want to create a structure of heterogeneous objects. The dimensions are not necessarily rectangular. What I need would probably just be called an "array of objects" in other languages like C. By 'object' I mean a structure consisting of different members, i.e. just a list in R - for example:
myObject <- list(title="Uninitialized title", xValues=rep(NA,50), yValues=rep(NA,50))
and now I would like to make 100 such objects, and to be able to address their members by something like
for (i in 1:100) {myObject[i]["xValues"]<-rnorm(50)}
or
for (i in 1:100) {myObject[i]$xValues<-rnorm(50)}
I would be grateful for any hint about where this thing is described.
Thanks in advance!
Are you looking for the name of this mythical beast or just how to do it? :) I could be wrong, but I think you'd just call it a list of lists. For example:
# create one list object
x <- list( a = 1:3 , b = c( T , F ) , d = mtcars )
# create a second list object
y <- list( a = c( 'hi', 'hello' ) , b = c( T , F ) , d = matrix( 1:4 , 2 , 2 ) )
# store both in a third object
z <- list( x , y )
# access x
z[[ 1 ]]
# access y
z[[ 2 ]]
# access x's 2nd object
z[[ 1 ]][[ 2 ]]
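Since the inner objects are named lists, you can also address their members by name (a quick illustration using the z from above):
# the 'a' member of x, i.e. 1:3
z[[ 1 ]]$a
# the 'd' member of y, i.e. the 2x2 matrix
z[[ 2 ]][[ "d" ]]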
I did not realize that you were looking to create multiple objects of the same structure. You are looking for replicate.
my_fun <- function() {
list(x=rnorm(1), y=rnorm(1), z="bla")
}
replicate(2, my_fun(), simplify=FALSE)
# [[1]]
# [[1]]$x
# [1] 0.3561663
#
# [[1]]$y
# [1] 0.4795171
#
# [[1]]$z
# [1] "bla"
#
#
# [[2]]
# [[2]]$x
# [1] 0.3385942
#
# [[2]]$y
# [1] -2.465932
#
# [[2]]$z
# [1] "bla"
Here is the solution I have for the moment; maybe it will be useful for somebody:
NUM <- 1000 # NUM is how many objects I want to have
xVal <- vector(NUM, mode="list")
yVal <- vector(NUM, mode="list")
title <- vector(NUM, mode="list")
for (i in 1:NUM) {
  xVal[i]  <- list(rnorm(50))
  yVal[i]  <- list(rnorm(50))
  title[i] <- list(paste0("This is title for instance #", i))
}
myObject <- list(xValues=xVal, yValues=yVal, titles=title)
# now I can address any member, as needed:
print(myObject$titles[[3]])
print(myObject$xValues[[4]])
If the dimensions are always going to be rectangular (in your case, 100x50), and the contents are always going to be homogeneous (in your case, numeric) then create a 2D array/matrix.
If you want the ability to add/delete/insert on individual lists (or change the data type), then use a list-of-lists.
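As a rough sketch of the two options, using the dimensions from the question (100 instances of 50 numeric values):
# rectangular, homogeneous case: one 100 x 50 numeric matrix
xValues <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)
xValues[3, ]            # the 50 xValues of instance 3

# heterogeneous / growable case: a list of lists (cf. the replicate answer above)
myObject <- replicate(100,
                      list(title = "Uninitialized title",
                           xValues = rnorm(50),
                           yValues = rnorm(50)),
                      simplify = FALSE)
myObject[[3]]$xValues   # the 50 xValues of instance 3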
The problem is that the order in which I "insert" elements in an array changes throughout the execution of the script.
Here is a quick reproduction of the problem:
#!/bin/bash
# : \
exec /home/binops/afse/eer/eer_SPI-7.3.1/tclsh "$0" "$@"
proc myProc { theArray } {
    upvar $theArray theArrayInside
    parray theArrayInside
    puts "------"
    foreach { key value } [array get theArrayInside] {
        puts "$key => $value"
    }
}
# MAIN
set myArray(AQHI) AQHI
set myArray(O3) 1
set myArray(NO2) 2
set myArray(PM2.5) 3
parray myArray
puts "------"
myProc myArray
Output is:
myArray(AQHI) = AQHI
myArray(NO2) = 2
myArray(O3) = 1
myArray(PM2.5) = 3
------
theArrayInside(AQHI) = AQHI
theArrayInside(NO2) = 2
theArrayInside(O3) = 1
theArrayInside(PM2.5) = 3
------
PM2.5 => 3
O3 => 1
NO2 => 2
AQHI => AQHI
Notice I didn't use generic keys like A, B, C and generic values like 1, 2, 3, as you may have expected. This is because the order isn't messed up with these generic keys/values. Maybe this can help identify the problem.
Also notice that the initial order (AQHI, O3, NO2, PM2.5) is lost even at the first call to parray (the order is now AQHI, NO2, O3, PM2.5; alphabetically sorted?). It is then changed again by the call to array get ... (reversed?).
So anyways, the question is: How can I make sure the initial order is kept?
You're making the mistake of equating Tcl arrays to those in a language like C, where it's a list of elements. Instead, Tcl arrays are maps (from a key to a value) like a HashMap in Java, and the order of elements is not preserved.
You may be better off using a list (if you just have a number of items you need to store in order).
If you're using 8.5 or higher, use a dict if you actually have a mapping of keys to values, since dictionaries are order-preserving maps. There are dict backports to Tcl versions before 8.5, but I'm not sure whether they preserve order (and they're slower).
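For instance, a dict keeps its keys in insertion order (a small sketch using the keys from the question):
# dicts (Tcl 8.5+) preserve insertion order
set myDict [dict create]
dict set myDict AQHI  AQHI
dict set myDict O3    1
dict set myDict NO2   2
dict set myDict PM2.5 3

dict for {key value} $myDict {
    puts "$key => $value"
}
# AQHI => AQHI
# O3 => 1
# NO2 => 2
# PM2.5 => 3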
If you can't use 8.5 dicts and need key/value pairs, one option is to use a list of key/value pairs and then use lsearch to pull out the values you need:
> set mylist {{key1 value1} {key2 value2} {key3 value3}}
> lsearch -index 0 $mylist key2
1
> lindex $mylist [list [lsearch -index 0 $mylist key2] 1]
value2
> proc kv_lookup {dictList key} {
set index [lsearch -index 0 $dictList $key]
if {$index < 0} {
error "Key '$key' not found in list $dictList"
}
return [lindex $dictList [list $index 1]]
}
> kv_lookup $mylist key2
value2
Man pages for 8.4 are here
You may also want to look at this page on keyed lists for Tcl. It implements what I mentioned above, plus some other useful commands.
For an example of the difference between an ordered "map" and an unordered one, you can take a look at the two Java classes HashMap (unordered) and LinkedHashMap (ordered).
Tcl is flexible enough that one can devise many schemes to handle what you want. Here's an idea that stores the order of your keys within the array itself, assuming the empty string is not a valid key in your data:
proc array_add {ary_name key value} {
    upvar 1 $ary_name ary
    set ary($key) $value
    lappend ary() $key
}

proc array_foreach {var_name ary_name script} {
    upvar 1 $var_name var
    upvar 1 $ary_name ary
    foreach var $ary() {
        uplevel 1 $script
    }
}
array_add a foo bar
array_add a baz qux
array_add a abc def
array_add a ghi jkl
array_foreach key a {puts "$key -> $a($key)"}
# foo -> bar
# baz -> qux
# abc -> def
# ghi -> jkl
array names a
# ghi {} foo baz abc
array get a
# ghi jkl {} {foo baz abc ghi} foo bar baz qux abc def
parray a
# a() = foo baz abc ghi
# a(abc) = def
# a(baz) = qux
# a(foo) = bar
# a(ghi) = jkl