Julia plotting unknown number of layers in Gadfly - arrays

I'm trying to create a plot in Julia (currently using Gadfly, but I'd be willing to use a different package). I have a multidimensional array. For a fixed dimension size (e.g. 4875x3x3), an appropriate plot would be:
p=Gadfly.plot(
layer(y=sim1.value[:,1,1],x=[sim1.range],Geom.line, Theme(default_color=color("red"))),
layer(y=sim1.value[:,1,2],x=[sim1.range],Geom.line, Theme(default_color=color("blue"))),
layer(y=sim1.value[:,1,3],x=[sim1.range],Geom.line, Theme(default_color=color("green")))
)
but in general I want to write a plot statement without knowing the size of the third dimension of the sim1.value array in advance. How can I write such a statement?
Perhaps something like:
p=Gadfly.plot([layer(y=sim1.value[:,1,i],x=[sim1.range], Geom.line, Theme(default_color=color("red"))) for i in 1:size(sim1)[3]])
but this doesn't work.
I was able to solve this problem by reshaping the array into a dataframe and adding a column to indicate what the third dimension is, but I was wondering if there was a way to do this without creating a dataframe.
Data look something like this:
julia> sim1.value
4875x3x3 Array{Float64,3}:
[:, :, 1] =
0.201974 0.881742 0.497407
0.0751914 0.921308 0.732588
-0.109084 1.06304 1.15962
-0.0149133 0.896267 1.22897
0.717094 0.72558 0.456043
0.971697 0.792255 0.40328
0.971697 0.792255 0.227884
-0.600564 1.23815 0.499631
-0.881391 1.07994 0.59905
-0.530923 1.00278 0.447363
⋮
0.866138 0.657875 0.280823
1.00881 0.594015 0.894645
0.470741 0.859117 1.09108
0.919887 0.540488 1.01126
2.22095 0.194968 0.954895
2.5013 0.202698 2.05665
1.94958 0.257192 2.01836
2.24015 0.209885 1.67657
0.76246 0.739945 2.2389
0.673887 0.640661 2.15134
[:, :, 2] =
1.28742 0.760712 1.61112
2.21436 0.229947 1.87528
-1.66456 1.46374 1.94794
-2.4864 1.84093 2.34668
-2.79278 1.61191 2.22896
-1.46289 1.21712 1.96906
-0.580682 1.3222 1.45223
0.17112 1.20572 0.74517
0.734113 0.629927 1.43462
1.29676 0.266065 1.52497
⋮
1.2871 0.595874 0.195617
1.84438 0.383567 1.15537
2.12446 0.520074 0.957211
2.36307 0.222486 0.402168
2.43727 0.19843 0.636037
2.33525 0.302378 0.811371
1.09497 0.605816 0.297978
1.366 0.56246 0.343701
1.366 0.56246 0.219561
1.35889 0.630971 0.281955
[:, :, 3] =
0.649675 0.899028 0.628103
0.718837 0.665043 0.153844
0.914646 0.807048 0.207743
0.612839 0.790611 0.293676
0.759457 0.758115 0.280334
0.77993 0.774677 0.396879
-1.63825 1.38275 0.85772
-1.43517 1.45871 0.835853
-1.15413 1.35757 1.05071
-1.10967 1.37525 0.685986
⋮
1.15299 0.561492 0.680718
1.14853 0.629728 0.294947
1.65147 0.517422 0.22285
1.65147 0.517422 0.517451
1.78835 0.719658 0.745866
2.36554 0.426616 1.49432
0.855502 0.739237 1.24224
-0.175234 0.701025 1.07798
-0.221313 0.939255 1.3463
1.58094 0.368615 1.63817

Splatting works here: a trailing ... after the array expands it into separate arguments, so each layer is passed to plot individually. Try:
p=Gadfly.plot([layer(y=sim1.value[:,1,i],x=[sim1.range], Geom.line, Theme(default_color=color("red"))) for i in 1:size(sim1)[3]]...)
For a different color per layer, here is a guess/hack that cycles through a fixed list of colors (feel free to edit for correctness):
p=Gadfly.plot([layer(y=sim1.value[:,1,i],x=[sim1.range], Geom.line, Theme(default_color=color(["red" "blue" "green" "cyan" "magenta" "yellow"][i%6+1]))) for i in 1:size(sim1)[3]]...)
Perhaps one of Gadfly's Scale color parameters would help here.


Generate all combinations of the SUM in Ruby but only using specific amount of numbers

I am currently pulling F1 prices from an API, placing them into an array, and determining which combinations sum to 20 or less. The code below does this successfully:
require 'net/http'
require 'json'
@url = 'HIDDEN URL AS HAS NO RELEVANCE'
@uri = URI(@url)
@response = Net::HTTP.get(@uri)
@fantasy = JSON.parse(@response)
arr = [[@fantasy.first["Mercedes"].to_f, @fantasy.first["Ferrari"].to_f], [@fantasy.first["Hamilton"].to_f, @fantasy.first["Verstappen"].to_f]]
target = 20
@array = arr[0].product(*arr[1..-1]).select { |a| a.reduce(:+) <= target }
Where:
@fantasy = [{"Mercedes" => "4", "Ferrari" => "6.2", "Hamilton" => "7.1", "Verstappen" => "3"}]
This is successfully outputting:
[[4.0, 7.1], [4.0, 3.0], [6.2, 7.1], [6.2, 3.0]]
Eventually this will contain all F1 teams on the left side and all F1 drivers on the right (making an F1 fantasy team builder). The idea is that each combination needs exactly 1 constructor and 5 drivers, with a total price of 20 or less.
Is there a way to define this, so that the calculation uses only 1 team (Mercedes, Ferrari, etc.) and 5 drivers (Hamilton, Verstappen, etc.)? I have not included 5 drivers yet because I am still testing. With the current data, my output would be:
[[4.0, 7.1, 3.0], [6.2, 7.1, 3.0]]
Here the constructor forms the 'base' for the calculation, and it is then combined with any 5 of the drivers.
My final question: considering what I am trying to do, is this the best way to put my API data into an array, i.e. manually placing @fantasy.first["Mercedes"].to_f inside the array brackets?
Thanks!
Not sure if I understand the question, but does this help?
arr = @fantasy.first.values.map(&:to_f)
target = 20
p result = arr.combination(2).select{|combi| combi.sum <= target}
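The "one constructor plus a fixed number of drivers" constraint from the question can be sketched as follows. This is an illustration in Python rather than Ruby, and the function name lineups, its parameters, and the split into two separate price dictionaries are assumptions made for the example, not part of the original code. Ruby's Array#product and Array#combination offer the same building blocks.

```python
from itertools import combinations

def lineups(constructors, drivers, n_drivers, budget):
    """Return every [constructor, driver, ...] price list that fits the budget."""
    results = []
    for team_price in constructors.values():
        # Choose n_drivers distinct drivers; order does not matter.
        for picked in combinations(drivers.values(), n_drivers):
            if team_price + sum(picked) <= budget:
                results.append([team_price, *picked])
    return results

# Prices taken from the question; keeping teams and drivers in separate
# dictionaries is an assumption of this sketch.
constructors = {"Mercedes": 4.0, "Ferrari": 6.2}
drivers = {"Hamilton": 7.1, "Verstappen": 3.0}

print(lineups(constructors, drivers, n_drivers=2, budget=20))
# [[4.0, 7.1, 3.0], [6.2, 7.1, 3.0]]
```

With the full data set, n_drivers would be 5. Keeping constructors and drivers in separate collections also avoids indexing each price by hand.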

Wrong outputs from torch.sub?

I'm currently using torch.sub alongside torch.div to compute the MAPE between the predicted and true labels of my neural network, but I'm not getting the answers I expect. According to the example in the documentation, I should be getting a 4x1 tensor, not 4x4.
Could anyone clear this up for me?
print('y_true ', y_true)
y_true tensor([[ 46],
[262],
[ 33],
[ 35]], device='cuda:0', dtype=torch.int16)
print('y_pred ', y_pred)
y_pred tensor([[[308.5075]],
[[375.8983]],
[[389.4587]],
[[406.4957]]], device='cuda:0', grad_fn=)
print('torch.sub ', torch.sub(y_true, y_pred))
torch.sub tensor([[[-262.5075],
[ -46.5075],
[-275.5075],
[-273.5075]],
[[-329.8983],
[-113.8983],
[-342.8983],
[-340.8983]],
[[-343.4587],
[-127.4587],
[-356.4587],
[-354.4587]],
[[-360.4957],
[-144.4957],
[-373.4957],
[-371.4957]]], device='cuda:0', grad_fn=<SubBackward0>)
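That is because y_pred has an extra dimension: its shape is (4, 1, 1), while y_true has shape (4, 1). Broadcasting aligns shapes starting from the trailing dimension, so y_true is treated as (1, 4, 1) and the result has shape (4, 4, 1).
If you remove the extra last dimension from y_pred, you get the desired result: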
>>> torch.sub(y_true, y_pred[...,0]).shape
torch.Size([4, 1])
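The shape rule behind this can be checked without a GPU or even PyTorch. The sketch below is a plain-Python illustration of the broadcasting rule (compare sizes from the last dimension backwards; sizes must match or one of them must be 1). The helper name broadcast_shape is made up for this example and is not a PyTorch function.

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    """Shape produced by broadcasting two shapes, compared from the last dimension."""
    result = []
    # Missing leading dimensions are treated as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        result.append(max(x, y))
    return tuple(reversed(result))

print(broadcast_shape((4, 1), (4, 1, 1)))  # (4, 4, 1): the unexpected result
print(broadcast_shape((4, 1), (4, 1)))     # (4, 1): after dropping the extra dimension
```

Checking both shapes before the subtraction makes this kind of mismatch easy to spot.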

scala collect always returns Array of type Any

I am trying to run collect on a dataframe and then perform operations on the elements of the resultant Array.
scala> scans.withColumn("src_port_list", udf((x: Seq[Int]) => x.distinct).apply($"src_port_list")).select("src_port_list").collect().map(_(0))
res29: Array[Any] = Array(WrappedArray(38897, 35378, 32947, 24280, 33181, 24782, 40937, 20824, 39685, 39841, 40191, 39031, 40981, 40919, 24436, 39765, 39784, 39881, 41037, 41079, 38874, 39916, 39788, 40468, 40041, 40941, 39325, 38902, 38896, 36151, 41061, 41016, 38921, 39269, 24437, 39001, 24282, 38923, 38920, 39835, 38901, 37585, 38922, 40977, 38898, 39862, 40926, 39909, 38743, 39774, 39761, 40918), WrappedArray(50974, 50998, 51947, 51428, 51012, 50996, 50984, 51564, 51037, 51045, 50980, 51027, 51010, 51036, 51030, 51025, 50992, 50983, 50993, 51009, 50991, 50989, 50990, 51011, 51031, 50987, 50986, 50985, 51028, 51041, 51001, 51035, 51029, 51026, 50995, 50976, 50997, 50981, 50994, 50988, 50975), WrappedArray(53148, 52396, 52318, 52422, 52420, 53064, 52394, 52329, 53156, 53072, 53126, 53...
I need to cast the WrappedArrays inside the resultant Array to sets so that I can perform union / intersection operations, but because they are being treated as type Any, I cannot perform any of the casting operations.
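Figured it out as I was posting the question:
collect() returns an Array[Row], and indexing a Row with _(0) yields Any. I need to import scala.collection.mutable.WrappedArray and cast each element with .asInstanceOf[WrappedArray[Int]]; the result can then be converted with .toSet.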

Sorting a second dictionary following the keys of the first dictionary

I have three arrays, namely:
ref_labels=array(['hammerthrow_g10_c07', 'wallpushups_g08_c04', 'archery_g09_c03',..., 'frisbeecatch_g09_c03', 'tabletennisshot_g12_c01',
'surfing_g10_c03'], dtype='<U26')
ref_labels is of shape (3000,).
ref_labels gives the reference order for the two other arrays, namely:
to_be_ordered_labels=array(['walkingwithdog_g08_c01', 'nunchucks_g13_c02', ....,'javelinthrow_g09_c03', 'playingflute_g12_c04', 'benchpress_g12_c02', 'frisbeecatch_g14_c01', 'jumpingjack_g13_c07', 'handstandpushups_g08_c05'], dtype='<U28')
which is of shape (3000,).
I also have a NumPy array of floats,
to_be_ordered_arrays_of_float, which is of shape (3000, 101).
Here is a sample:
to_be_ordered_arrays_of_float[0]
array([6.80778456e-08, 1.58984292e-08, 2.69517453e-09, 2.82882096e-09,
1.35314554e-06, 2.66444680e-08, 1.96892984e-06, 1.64217184e-07,
2.40923086e-08, 2.35174169e-09, 1.45098711e-09, 2.10457629e-09,
6.51394956e-08, 4.71427897e-10, 2.48873818e-07, 2.25375985e-08,
1.56526866e-07, 5.60892097e-08, 1.95728759e-07, 7.24156690e-09,
1.33053675e-06, 1.06113225e-08, 3.07328882e-08, 1.58847371e-07,
1.85805094e-09, 4.20591455e-08, 9.77163683e-09, 5.33082073e-07,
4.52592142e-09, 6.20161609e-06, 4.25105497e-08, 8.63415792e-08,
1.98478956e-05, 5.02593911e-10, 9.98565793e-01, 2.76135781e-09,
3.33678649e-08, 2.11770342e-07, 8.09025558e-09, 3.98751210e-09,
8.28181399e-08, 9.51544799e-09, 9.00462692e-06, 3.11626500e-05,
4.00733006e-06, 2.63792316e-07, 8.75839589e-07, 6.86739767e-08,
1.00570272e-08, 4.86615797e-08, 2.16352909e-08, 2.04790371e-08,
1.72958153e-07, 5.78688697e-09, 4.83830753e-09, 3.75843297e-06,
6.00361894e-09, 8.48605123e-06, 1.46872461e-08, 2.71486789e-09,
2.72728915e-08, 9.99970240e-09, 2.69397837e-08, 5.73341836e-08,
3.06793368e-09, 3.16495052e-10, 5.69838967e-08, 1.04099172e-07,
7.12405024e-09, 1.70841350e-08, 1.58363335e-07, 7.10246439e-09,
1.65444236e-09, 3.54519578e-08, 5.11049834e-08, 9.68790381e-09,
2.10373469e-06, 1.54864466e-09, 2.11581687e-06, 4.93066139e-08,
1.78782467e-09, 3.54902490e-08, 1.40120218e-08, 1.82792789e-07,
8.51292086e-08, 9.88524320e-08, 3.18586721e-08, 3.76303788e-08,
1.85764435e-08, 6.87650381e-09, 2.80555332e-06, 2.55424425e-06,
1.33028883e-03, 2.45268382e-07, 1.37083349e-08, 3.04683105e-08,
1.82895951e-06, 4.65470373e-09, 6.83182293e-08, 3.18085824e-08,
2.54011603e-08], dtype=float32)
My question is: how can I reorder to_be_ordered_labels and to_be_ordered_arrays_of_float given the order in ref_labels?
What have I tried?
I created a random array in order to build a dictionary where ref_labels represents the keys, then reordered as follows:
random_arrays=np.random.rand(3000,101)
dic1=dict(zip(ref_labels,random_arrays))
dic2=dict(zip(to_be_ordered_labels,to_be_ordered_arrays_of_float))
ordered_dic2=sorted(dic2.items(), key=lambda kv: dic1[kv[0]])
However, I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Thank you for your help
My question is: how can I reorder to_be_ordered_labels and to_be_ordered_arrays_of_float given the order in ref_labels?
If I understand it correctly, what you want to do is the following:
import numpy as np
ref = np.array(['labels', 'that', 'define', 'the', 'order'])
other_labels = np.array(['other', 'labels', 'to', 'be', 'sorted'])
rand_data = np.random.randn(5, 10)
# Indices that would sort ref alphabetically
idx_sort = np.argsort(ref)
# Apply the same permutation to the labels and to the rows of the data
sorted_labels = other_labels[idx_sort]
rand_data = rand_data[idx_sort, :]
If you want an ordered dict, you might want to check the OrderedDict class from the collections module.
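A different reading of the question is that both label arrays hold the same labels in different orders, and the second array should be rearranged to match the first. Under that assumption (which the question does not confirm), a dictionary lookup gives the required row order. The sketch below uses plain lists so it runs without NumPy, and the name reorder_like is invented for the example. The ValueError in the question most likely arises because the sort key was itself a NumPy row, and comparing two rows yields an array rather than a single boolean.

```python
def reorder_like(ref_labels, labels, rows):
    """Reorder labels and rows so that labels follow the order of ref_labels."""
    # Map each label to its current position.
    position = {label: i for i, label in enumerate(labels)}
    # Raises KeyError if a reference label is missing from labels.
    order = [position[label] for label in ref_labels]
    return [labels[i] for i in order], [rows[i] for i in order], order

ref = ["c", "a", "b"]
labels = ["a", "b", "c"]
rows = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]

new_labels, new_rows, order = reorder_like(ref, labels, rows)
print(order)       # [2, 0, 1]
print(new_labels)  # ['c', 'a', 'b']
print(new_rows)    # [[0.3, 0.7], [0.1, 0.9], [0.2, 0.8]]
```

With NumPy arrays, the same order list can be used directly as an index, for example to_be_ordered_arrays_of_float[order].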

using lookup tables to plot a ggplot and table

I'm creating a Shiny app in which the user chooses what data should be displayed in a plot and a table. The choice is made through 3 different input variables that contain 14, 4 and 2 choices respectively.
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "DataSource", label = "Data source",
                choices = c("Restoration plots", "all semi natural grasslands")),
    # The definition of choicesVariables is omitted here because it is very
    # long; it contains 14 string values.
    selectInput(inputId = "Variabel", label = "Variable",
                choices = choicesVariables),
    selectInput(inputId = "Factor", label = "Factor",
                choices = c("Company type", "Region and type of application",
                            "Approved or not approved applications", "Age group"))
  ),
  dashboardBody(
    plotOutput("thePlot"),
    tableOutput("theTable")
  )
)
This adds up to 73 choices (yes, I know the math doesn't add up, but some combinations are invalid). I would like to do this using a lookup table, so I created one with every valid combination of choices, like this:
rad1<-c(rep("Company type",20), rep("Region and type of application",20),
rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2<-choicesVariables[c(1:14,1,4,5,9,10,11, 1:14,1,4,5,9,10,11, 1:7,9:14,
1:14,1,4,5,9,10,11)]
rad3<-c(rep("Restoration plots",14),rep("all semi natural grasslands",6),
rep("Restoration plots",14), rep("all semi natural grasslands",6),
rep("Restoration plots",27), rep("all semi natural grasslands",6))
rad4<-1:73
letaLista<-data.frame(rad1,rad2,rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the choice that the user made. But how do I use this information to draw the plot and table without a 73-line ifelse statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with these kinds of arrays is limited and this might be a simple issue, but any hints would be helpful!
The dataset underlying the plots and tables is a data frame with 23 variables, both factors and numerical. The plots and tables are then created using the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar="Dist_brukcentrum",
                  groupvars="Companytype")
s_A1 <- s_A1[2:6,]
p_A1 <- ggplot(s_A1, aes(x=Companytype, y=Dist_brukcentrum)) +
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Dist_brukcentrum-se, ymax=Dist_brukcentrum+se),
                width=.2, position=position_dodge(.9)) +
  scale_y_continuous(name = "") +
  scale_x_discrete(name = "")
where summarySE is the following function, borrowed from Cookbook for R:
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=TRUE,
                      conf.interval=.95, .drop=TRUE) {
  library(plyr)  # provides ddply() and rename()
  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function (x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
    .fun = function(xx, col) {
      c(N    = length2(xx[[col]], na.rm=na.rm),
        mean = mean   (xx[[col]], na.rm=na.rm),
        sd   = sd     (xx[[col]], na.rm=na.rm)
      )
    },
    measurevar
  )
  # Rename the "mean" column
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  return(datac)
}
The code in its entirety is a bit too large to post, but I hope this clarifies what I'm trying to do.
Well, thanks to florian's comment I think I have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I collected the plots (which ggplot creates as list objects) into one list, and the tables into another:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id, which selects the right plot and table from the lists.
output$thePlot <- renderPlot({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
    letaLista$Variabel == input$Variabel & letaLista$rest_alla == input$DataSource)
  # [[ ]] extracts the ggplot object itself; [ ] would return a one-element list
  plotList[[as.integer(plotValue[1, 4])]]
})
output$theTable <- renderTable({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
    letaLista$Variabel == input$Variabel & letaLista$rest_alla == input$DataSource)
  # Return the summary data frame so that renderTable can display it
  tableList[[as.integer(plotValue[1, 4])]]
})
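A variation on the same lookup idea is to store, for each valid combination of inputs, the parameters needed to compute the summary (the grouping column and the measured column) instead of 73 prebuilt plots and tables. The sketch below shows that pattern in Python rather than R. The LOOKUP table, the "Distance" and "Agegroup" labels, and both function names are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup: (factor, variable, data source) -> (group column, measure column).
LOOKUP = {
    ("Company type", "Distance", "Restoration plots"): ("Companytype", "Dist_brukcentrum"),
    ("Age group", "Distance", "Restoration plots"): ("Agegroup", "Dist_brukcentrum"),
}

def resolve(factor, variable, source):
    """Return the (group column, measure column) pair, or None for an invalid combination."""
    return LOOKUP.get((factor, variable, source))

def group_means(rows, group_col, measure_col):
    """Mean of measure_col for each value of group_col; rows is a list of dicts."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[group_col], (0.0, 0))
        totals[row[group_col]] = (total + row[measure_col], count + 1)
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up sample rows standing in for the real data set.
rows = [
    {"Companytype": "A", "Dist_brukcentrum": 2.0},
    {"Companytype": "A", "Dist_brukcentrum": 4.0},
    {"Companytype": "B", "Dist_brukcentrum": 1.0},
]

cols = resolve("Company type", "Distance", "Restoration plots")
if cols is not None:
    print(group_means(rows, *cols))  # {'A': 3.0, 'B': 1.0}
```

In the Shiny app, the equivalent would be to look up the measurevar and groupvars for the selected inputs and pass them to summarySE inside the render functions, so only the requested summary is computed.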
