I'm new to Python.
I am writing code to organize data with Python.
I need to extract only the values that meet certain conditions from a large number of lists.
It seems very simple, but it is proving too difficult for me.
First, let me explain with the simplest example:
solutions
Out[73]:
array([[ 2.31350622e-04, -1.42539948e-02, -7.17361833e-02,
2.17545418e-01, -3.38251827e-01, 1.88254191e-01],
[ 4.23523963e-82, -9.48255372e-81, 5.22018863e-80,
-1.11271010e-79, 1.03507672e-79, -3.55573390e-80],
[ 2.31350597e-04, -1.42539951e-02, -7.17361800e-02,
2.17545409e-01, -3.38251817e-01, 1.88254187e-01],
[ 2.58309722e-02, -6.21550000e-01, 3.41867505e+00,
-7.53828444e+00, 7.09091365e+00, -2.39409614e+00],
[ 2.31350606e-04, -1.42539950e-02, -7.17361809e-02,
2.17545411e-01, -3.38251820e-01, 1.88254188e-01],
[ 1.14525725e-02, -3.25174709e-01, 2.11632584e+00,
-5.16113713e+00, 5.12508331e+00, -1.78380602e+00],
[ 9.75839726e-03, -3.08729919e-01, 2.26983591e+00,
-6.16462170e+00, 6.76409438e+00, -2.55992476e+00],
[ 1.13190092e-03, -6.72042220e-02, 7.10413638e-01,
-2.39952623e+00, 2.94849402e+00, -1.18046338e+00],
[ 5.24406689e-03, -1.86240596e-01, 1.36500589e+00,
-3.61106144e+00, 3.75606312e+00, -1.34699295e+00]])
coeff
Out[74]:
array([[ 1.03177808e-04, -6.35700011e-03, -3.19929208e-02,
9.70209594e-02, -1.50853634e-01, 8.39576506e-02,
4.45980248e-01],
[ 5.13911499e-83, -1.15062991e-81, 6.33426960e-81,
-1.35018220e-80, 1.25598048e-80, -4.31459067e-81,
1.21341776e-01],
[ 1.03177797e-04, -6.35700027e-03, -3.19929194e-02,
9.70209556e-02, -1.50853630e-01, 8.39576490e-02,
4.45980249e-01],
[ 4.26209161e-03, -1.02555298e-01, 5.64078896e-01,
-1.24381145e+00, 1.16999559e+00, -3.95024121e-01,
1.64999272e-01],
[ 1.03177801e-04, -6.35700023e-03, -3.19929198e-02,
9.70209566e-02, -1.50853631e-01, 8.39576495e-02,
4.45980248e-01],
[ 2.27512838e-03, -6.45980810e-02, 4.20421959e-01,
-1.02529362e+00, 1.01813129e+00, -3.54364724e-01,
1.98656535e-01],
[ 1.42058482e-03, -4.49435521e-02, 3.30432790e-01,
-8.97418681e-01, 9.84687293e-01, -3.72662657e-01,
1.45575629e-01],
[ 2.46722650e-04, -1.46486353e-02, 1.54850246e-01,
-5.23029411e-01, 6.42688990e-01, -2.57307904e-01,
2.17971950e-01],
[ 1.30617191e-03, -4.63880878e-02, 3.39990392e-01,
-8.99429225e-01, 9.35545685e-01, -3.35503798e-01,
2.49076135e-01]])
'solutions' is a NumPy array in which each row is one solution: 'solutions[0]', 'solutions[1]', ..., 'solutions[i]'. 'coeff' is also a NumPy array, and 'coeff[0]', 'coeff[1]', ..., 'coeff[i]' correspond to 'solutions[0]', 'solutions[1]', ..., 'solutions[i]'.
What I want is to find the rows 'solutions[i]' and 'coeff[i]' for which all elements of solutions[i] are less than 10^-10 and all elements of coeff[i] are greater than 10^-3.
Is there an appropriate way to extract rows from arrays when more than one condition must be met? I'm a Python beginner, so please bear with me.
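This can be done with boolean (advanced) indexing: build one True/False value per row with np.all(..., axis=1), combine the two masks with &, and index both arrays with the result:
import numpy as np
# True for each row whose elements are all below 1e-10 (i.e. 10^-10)
solution_valid = np.all(solutions < 1e-10, axis=1)
# True for each row whose elements are all above 1e-3
coeff_valid = np.all(coeff > 1e-3, axis=1)
# A row is kept only if it satisfies both conditions
both_valid = coeff_valid & solution_valid
valid_solutions = solutions[both_valid]
valid_coeffs = coeff[both_valid]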
But perhaps you mean that the absolute value should be below or above the threshold? In that case, compare np.abs(...) instead:
solution_valid = np.all(np.abs(solutions) < 1e-10, axis=1)
coeff_valid = np.all(np.abs(coeff) > 1e-3, axis=1)
both_valid = coeff_valid & solution_valid
valid_solutions = solutions[both_valid]
valid_coeffs = coeff[both_valid]
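To see what the masks contain, here is a small sketch on made-up 3-row arrays (not the data from the question). It also uses np.flatnonzero to recover the row indices i, and shows how a large negative entry is treated differently by the signed and the absolute-value tests:

```python
import numpy as np

# Made-up toy data, 3 rows each (not the arrays from the question).
solutions = np.array([[1e-12, -5e-11],   # row 0: all entries tiny
                      [1e-12,  2e-3],    # row 1: 2e-3 is too large
                      [3e-11, -5.0]])    # row 2: passes only if the sign counts
coeff = np.array([[0.5, 0.01],           # row 0: all above 1e-3
                  [0.2, 0.30],           # row 1: all above 1e-3
                  [0.4, 0.02]])          # row 2: all above 1e-3

# Signed comparison: -5.0 < 1e-10 is True, so row 2 passes.
signed = np.all(solutions < 1e-10, axis=1) & np.all(coeff > 1e-3, axis=1)
# Absolute comparison: |-5.0| = 5.0 is not below 1e-10, so row 2 fails.
absolute = (np.all(np.abs(solutions) < 1e-10, axis=1)
            & np.all(np.abs(coeff) > 1e-3, axis=1))

print(signed.tolist())            # [True, False, True]
print(absolute.tolist())          # [True, False, False]
print(np.flatnonzero(absolute))   # indices i of the matching rows

# Boolean indexing keeps whole rows: one row of 2 columns survives here.
valid_solutions = solutions[absolute]
valid_coeffs = coeff[absolute]
```

The threshold is written as 1e-10; note that 10e-10 would mean 10 × 10^-10 = 10^-9.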
I’m currently using torch.sub alongside torch.div to compute the MAPE between my predicted and true labels for my neural network, but I’m not getting the results I expect. According to the example in the documentation, I should be getting a 4x1 tensor, not 4x4.
Could anyone clear this up for me?
print('y_true ', y_true)
y_true tensor([[ 46],
[262],
[ 33],
[ 35]], device='cuda:0', dtype=torch.int16)
print('y_pred ', y_pred)
y_pred tensor([[[308.5075]],
[[375.8983]],
[[389.4587]],
[[406.4957]]], device='cuda:0', grad_fn=)
print('torch.sub ', torch.sub(y_true, y_pred))
torch.sub tensor([[[-262.5075],
[ -46.5075],
[-275.5075],
[-273.5075]],
[[-329.8983],
[-113.8983],
[-342.8983],
[-340.8983]],
[[-343.4587],
[-127.4587],
[-356.4587],
[-354.4587]],
[[-360.4957],
[-144.4957],
[-373.4957],
[-371.4957]]], device='cuda:0', grad_fn=<SubBackward0>)
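That is because y_pred has an extra dimension: its shape is [4, 1, 1], while y_true is [4, 1]. Broadcasting aligns the trailing dimensions, so y_true is treated as [1, 4, 1] and the result has shape [4, 4, 1], with every prediction subtracted from every label.
If you remove the extra last dimension you get the desired result: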
>>> torch.sub(y_true, y_pred[...,0]).shape
torch.Size([4, 1])
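The shape arithmetic can be checked without a GPU. The sketch below uses NumPy instead of PyTorch (PyTorch follows NumPy's broadcasting semantics, so the shapes come out the same) with the values printed in the question, and then computes a MAPE on the corrected shapes:

```python
import numpy as np

# Values printed in the question; NumPy stands in for PyTorch here.
y_true = np.array([[46], [262], [33], [35]], dtype=np.int16)                  # shape (4, 1)
y_pred = np.array([[[308.5075]], [[375.8983]], [[389.4587]], [[406.4957]]])  # shape (4, 1, 1)

# Shapes are aligned from the right: (4, 1) acts like (1, 4, 1), and together
# with (4, 1, 1) this broadcasts to (4, 4, 1): every label minus every prediction.
pairwise = y_true - y_pred
print(pairwise.shape)   # (4, 4, 1)

# Dropping y_pred's trailing axis gives matching (4, 1) shapes, element by element.
diff = y_true - y_pred[..., 0]
print(diff.shape)       # (4, 1)

# Mean absolute percentage error over the 4 samples, in percent.
mape = np.mean(np.abs(diff / y_true)) * 100
```

With tensors, the analogous calls would be torch.sub(y_true, y_pred[..., 0]) (or y_pred.squeeze(-1)) followed by torch.div, torch.abs and torch.mean; that PyTorch variant is not run here.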
I'm creating a Shiny app in which the user chooses what data should be displayed in a plot and a table. The choice is made through 3 input variables that have 14, 4 and 2 choices respectively.
library(shiny)
library(shinydashboard)
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "DataSource", label = "Data source",
                choices = c("Restoration plots", "all semi natural grasslands")),
    # choicesVariables definition is omitted here, because it's very long but it
    # contains 14 string values
    selectInput(inputId = "Variabel", label = "Variable",
                choices = choicesVariables),
    selectInput(inputId = "Factor", label = "Factor",
                choices = c("Company type", "Region and type of application",
                            "Approved or not approved applications", "Age group"))
  ),
  dashboardBody(
    plotOutput("thePlot"),
    tableOutput("theTable")
  )
)
This adds up to 73 combinations (yes, I know the math doesn't add up, but some combinations are invalid). I would like to handle this with a lookup table, so I created one with every valid combination of choices like this:
rad1<-c(rep("Company type",20), rep("Region and type of application",20),
rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2<-choicesVariables[c(1:14,1,4,5,9,10,11, 1:14,1,4,5,9,10,11, 1:7,9:14,
1:14,1,4,5,9,10,11)]
rad3<-c(rep("Restoration plots",14),rep("all semi natural grasslands",6),
rep("Restoration plots",14), rep("all semi natural grasslands",6),
rep("Restoration plots",27), rep("all semi natural grasslands",6))
rad4<-1:73
letaLista<-data.frame(rad1,rad2,rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the combination the user chose. But how do I use this information to draw the plot and table without a 73-line-long ifelse statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with these kinds of arrays is limited and this might be a simple issue, but any hints would be helpful!
My dataset, which is the foundation for the plots and table, consists of a data frame with 23 variables, both factors and numeric. The plots and tables are then created with the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar="Dist_brukcentrum",
groupvars="Companytype")
s_A1 <- s_A1[2:6,]
p_A1=ggplot(s_A1, aes(x=Companytype,
y=Dist_brukcentrum))+geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=Dist_brukcentrum-se,
ymax=Dist_brukcentrum+se),width=.2,position=position_dodge(.9))+
scale_y_continuous(name = "") + scale_x_discrete(name = "")
where summarySE is the following function, borrowed from the Cookbook for R:
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=TRUE,
conf.interval=.95, .drop=TRUE) {
library(plyr)  # provides ddply() and rename() used below
# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
# This does the summary. For each group's data frame, return a vector with
# N, mean, and sd
datac <- ddply(data, groupvars, .drop=.drop,
.fun = function(xx, col) {
c(N = length2(xx[[col]], na.rm=na.rm),
mean = mean (xx[[col]], na.rm=na.rm),
sd = sd (xx[[col]], na.rm=na.rm)
)
},
measurevar
)
# Rename the "mean" column
datac <- rename(datac, c("mean" = measurevar))
datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult
return(datac)
}
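For reference, the two columns summarySE adds beyond N, mean and sd are the standard error and the half-width of the confidence interval. Written out, with s the group's standard deviation, N its count and c = conf.interval:

```latex
\mathrm{se} = \frac{s}{\sqrt{N}}, \qquad
\mathrm{ci} = \mathrm{se} \cdot t_{\frac{1+c}{2},\,N-1}
```

Since c/2 + 0.5 = (1 + c)/2, this is exactly qt(conf.interval/2 + .5, datac$N-1); for the default c = .95 the t quantile is taken at 0.975 with N - 1 degrees of freedom.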
The code in its entirety is a bit too large, but I hope this clarifies what I'm trying to do.
Thanks to Florian's comment, I think I may have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I collected the plots (ggplot objects, which are themselves lists) into one list, and the tables into another:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id, and used that id to select the right plot and table from the lists.
output$thePlot <- renderPlot({
  # Find the lookup-table row that matches the user's three choices
  plotValue <- subset(letaLista, Factor == input$Factor &
    Variabel == input$Variabel & rest_alla == input$DataSource)
  # [[ ]] returns the ggplot object itself rather than a one-element list
  plotList[[plotValue$id[1]]]
})
output$theTable <- renderTable({
  plotValue <- subset(letaLista, Factor == input$Factor &
    Variabel == input$Variabel & rest_alla == input$DataSource)
  # Return the data frame so renderTable can display it
  tableList[[plotValue$id[1]]]
})
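The underlying pattern is one keyed lookup in place of a long if/else chain; in R the id-indexed list above (or a named list keyed by the pasted choices) already does this. The Python sketch below illustrates only that general pattern. The helper names build_lookup and pick, the placeholder variables "Variable A"/"Variable B", and the rows themselves are hypothetical:

```python
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Key = Tuple[str, str, str]  # (factor, variable, data source)


def build_lookup(rows: Sequence[Tuple[str, str, str, int]]) -> Dict[Key, int]:
    """Map each valid (factor, variable, source) combination to its 1-based id."""
    return {(factor, variable, source): id_ for factor, variable, source, id_ in rows}


def pick(lookup: Dict[Key, int], items: List[T],
         factor: str, variable: str, source: str) -> Optional[T]:
    """Return the prebuilt item for the chosen combination, or None if it is invalid."""
    id_ = lookup.get((factor, variable, source))
    return None if id_ is None else items[id_ - 1]  # ids start at 1, like rad4


# Hypothetical rows shaped like letaLista (Factor, Variabel, rest_alla, id).
rows = [
    ("Company type", "Variable A", "Restoration plots", 1),
    ("Company type", "Variable B", "Restoration plots", 2),
    ("Age group", "Variable A", "all semi natural grasslands", 3),
]
plots = ["plot 1", "plot 2", "plot 3"]  # stand-ins for prebuilt plot objects

lookup = build_lookup(rows)
print(pick(lookup, plots, "Company type", "Variable B", "Restoration plots"))  # plot 2
print(pick(lookup, plots, "Age group", "Variable B", "Restoration plots"))     # None
```

A dictionary lookup returns the matching entry directly, and an invalid combination simply yields None instead of falling through a chain of branches.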
I am trying to access a specific value inside an array. The array contains instances of specific classes and looks like this:
[[#<Supermarket:0x007f8e989daef8 @id=1, @name="Easybuy">,
#<Delivery:0x007f8e989f98a8 @type=:standard, @price=5.0>],
[#<Supermarket:0x007f8e99039f88 @id=2, @name="Walmart">,
#<Delivery:0x007f8e989f98a8 @type=:standard, @price=5.0>],
[#<Supermarket:0x007f8e9901a390 @id=3, @name="Forragers">,
#<Delivery:0x007f8e989eae20 @type=:express, @price=10.0>]]
I want to iterate over each inner array and count how many Delivery objects have @type == :standard. Is this possible? Thank you in advance.
# Sum, over all inner arrays, the number of Delivery objects whose @type is :standard
array_of_array.inject(0) do |sum, array|
sum + array.count { |el| el.class == Delivery && el.instance_variable_get(:@type) == :standard }
end
You can use select() to filter the elements of an array.
Reconstructing your data:
require 'ostruct'
require 'pp'
supermarket_data = [
['Easybuy', 1],
['Walmart', 2],
['Forragers', 3],
]
supermarkets = supermarket_data.map do |(name, id)|
supermarket = OpenStruct.new
supermarket.name = name
supermarket.id = id
supermarket
end
delivery_data = [
['standard', 5.0],
['standard', 5.0],
['express', 10.0],
]
deliveries = delivery_data.map do |(type, price)|
delivery = OpenStruct.new
delivery.type = type
delivery.price = price
delivery
end
combined = supermarkets.zip deliveries
pp combined
[[#<OpenStruct name="Easybuy", id=1>,
#<OpenStruct type="standard", price=5.0>],
[#<OpenStruct name="Walmart", id=2>,
#<OpenStruct type="standard", price=5.0>],
[#<OpenStruct name="Forragers", id=3>,
#<OpenStruct type="express", price=10.0>]]
Filtering the array with select():
standard_deliveries = combined.select do |(supermarket, delivery)|
delivery.type == 'standard'
end
pp standard_deliveries # pretty print
p standard_deliveries.count
[[#<OpenStruct name="Easybuy", id=1>,
#<OpenStruct type="standard", price=5.0>],
[#<OpenStruct name="Walmart", id=2>,
#<OpenStruct type="standard", price=5.0>]]
2
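The same filter-then-count idea, sketched in Python with the supermarket/delivery data from the question. The dataclasses here are stand-ins for the question's Ruby classes, not a translation of their real definitions:

```python
from dataclasses import dataclass


@dataclass
class Supermarket:
    id: int
    name: str


@dataclass
class Delivery:
    type: str
    price: float


# Pairs of (supermarket, delivery), mirroring the combined array above.
combined = [
    (Supermarket(1, "Easybuy"), Delivery("standard", 5.0)),
    (Supermarket(2, "Walmart"), Delivery("standard", 5.0)),
    (Supermarket(3, "Forragers"), Delivery("express", 10.0)),
]

# Keep only the pairs whose delivery is standard (like select), then count them.
standard = [(s, d) for s, d in combined if d.type == "standard"]
print(len(standard))  # 2

# If only the number is needed, a generator avoids building the filtered list.
count = sum(1 for _, d in combined if d.type == "standard")
```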
I want to extract the marketId of every market whose marketName == 'Moneyline'. I've tried a few combinations of .map, .reject and/or .select, but I can't narrow it down because the complicated structure confuses me.
Each event has many markets, and there are many events. A sample of the structure (trimmed for brevity):
{"currencyCode"=>"GBP",
 "eventTypes"=>[
  {"eventTypeId"=>6423,
   "eventNodes"=>[
    {"eventId"=>28017227,
     "event"=>{"eventName"=>"Philadelphia @ Seattle"},
     "marketNodes"=>[
      {"marketId"=>"1.128274650",
       "description"=>{"marketName"=>"Moneyline"}},
      {"marketId"=>"1.128274625",
       "description"=>{"marketName"=>"Winning Margin"}}
     ]},
    {"eventId"=>28018251,
     "event"=>{"eventName"=>"Arkansas @ Mississippi State"},
     "marketNodes"=>[
      {"marketId"=>"1.128299882",
       "description"=>{"marketName"=>"Under/Over 60.5pts"}},
      {"marketId"=>"1.128299881",
       "description"=>{"marketName"=>"Moneyline"}}
     ]},
    {"eventId"=> etc....
I've tried all kinds of things, for example:
markets = json["eventTypes"].first["eventNodes"].map {|e| e["marketNodes"].map { |e| e["marketId"] } if (e["marketNodes"].map {|e| e["marketName"] == 'Moneyline'})}
markets.flatten
# => yields every marketId, not just those whose marketName is 'Moneyline'
A simple array containing only the marketId of every Moneyline market is sufficient. Using Rails methods is fine too, if preferred.
Sorry if my editing messed up the syntax. Here's the source. After parsing the JSON it looks like this, only with => instead of :.
Thank you!
I love nested maps and selects :D
require 'json'
hash = JSON.parse(File.read('data.json'))
moneyline_market_ids = hash["eventTypes"].map{|type|
type["eventNodes"].map{|node|
node["marketNodes"].select{|market|
market["description"]["marketName"] == 'Moneyline'
}.map{|market| market["marketId"]}
}
}.flatten
puts moneyline_market_ids.join(', ')
#=> 1.128255531, 1.128272164, 1.128255516, 1.128272159, 1.128278718, 1.128272176, 1.128272174, 1.128272169, 1.128272148, 1.128272146, 1.128255464, 1.128255448, 1.128272157, 1.128272155, 1.128255499, 1.128272153, 1.128255484, 1.128272150, 1.128255748, 1.128272185, 1.128278720, 1.128272183, 1.128272178, 1.128255729, 1.128360712, 1.128255371, 1.128255433, 1.128255418, 1.128255403, 1.128255387
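For comparison, the same nested walk (eventTypes, then eventNodes, then marketNodes, keeping Moneyline markets) can be sketched in Python as a single comprehension. The JSON below is the two-event sample from the question rewritten as valid JSON, so the result covers only those two events:

```python
import json

# The two-event sample from the question, as valid JSON text.
raw = """
{"currencyCode": "GBP",
 "eventTypes": [
  {"eventTypeId": 6423,
   "eventNodes": [
    {"eventId": 28017227,
     "event": {"eventName": "Philadelphia @ Seattle"},
     "marketNodes": [
      {"marketId": "1.128274650", "description": {"marketName": "Moneyline"}},
      {"marketId": "1.128274625", "description": {"marketName": "Winning Margin"}}]},
    {"eventId": 28018251,
     "event": {"eventName": "Arkansas @ Mississippi State"},
     "marketNodes": [
      {"marketId": "1.128299882", "description": {"marketName": "Under/Over 60.5pts"}},
      {"marketId": "1.128299881", "description": {"marketName": "Moneyline"}}]}]}]}
"""
data = json.loads(raw)

# Each nested 'for' descends one level; the 'if' plays the role of select.
moneyline_ids = [
    market["marketId"]
    for event_type in data["eventTypes"]
    for node in event_type["eventNodes"]
    for market in node["marketNodes"]
    if market["description"]["marketName"] == "Moneyline"
]
print(moneyline_ids)  # ['1.128274650', '1.128299881']
```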
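Just for fun, here's another possible answer, this time with regexen. It is shorter but might break depending on your input data: it reads the JSON directly as a String, assumes there is no whitespace between each key and its value, and pairs the i-th marketName with the i-th marketId.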
json = File.read('data.json')
market_ids = json.scan(/(?<="marketId":")[\d\.]+/)
market_names = json.scan(/(?<="marketName":")[^"]+/)
moneyline_market_ids = market_ids.zip(market_names).select{|id,name| name=="Moneyline"}.map{|id,_| id}
puts moneyline_market_ids.join(', ')
#=> 1.128255531, 1.128272164, 1.128255516, 1.128272159, 1.128278718, 1.128272176, 1.128272174, 1.128272169, 1.128272148, 1.128272146, 1.128255464, 1.128255448, 1.128272157, 1.128272155, 1.128255499, 1.128272153, 1.128255484, 1.128272150, 1.128255748, 1.128272185, 1.128278720, 1.128272183, 1.128272178, 1.128255729, 1.128360712, 1.128255371, 1.128255433, 1.128255418, 1.128255403, 1.128255387
It outputs the same result as the other answer.
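To make the fragility concrete, here is a Python sketch of the same regex approach (Python's re supports these fixed-width lookbehinds). The data is a cut-down version of the question's sample; the function name moneyline_ids_by_regex is illustrative. The same data serialized with a space after each colon is no longer matched:

```python
import json
import re

# A cut-down version of the question's sample: two markets of one event.
data = {"eventTypes": [{"eventNodes": [{"marketNodes": [
    {"marketId": "1.128274650", "description": {"marketName": "Moneyline"}},
    {"marketId": "1.128274625", "description": {"marketName": "Winning Margin"}},
]}]}]}


def moneyline_ids_by_regex(text):
    """Pair the i-th marketId with the i-th marketName, as the regex answer does."""
    ids = re.findall(r'(?<="marketId":")[\d.]+', text)
    names = re.findall(r'(?<="marketName":")[^"]+', text)
    return [market_id for market_id, name in zip(ids, names) if name == "Moneyline"]


compact = json.dumps(data, separators=(",", ":"))  # no whitespace after ':'
spaced = json.dumps(data)                           # default adds a space after ':'

print(moneyline_ids_by_regex(compact))  # ['1.128274650']
print(moneyline_ids_by_regex(spaced))   # []
```

Parsing the JSON and walking the structure avoids both the whitespace assumption and the positional pairing of ids with names.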
I'm trying to follow a document that has some code on text mining clustering analysis.
I'm fairly new to R and to the concepts of text mining and clustering, so please bear with me if I sound clueless.
I create a simple matrix called dtm and then run kmeans to produce 3 clusters. The code I'm having issues with is a function defined to get the "five most common words of the documents in the cluster":
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
v = sort(colSums(df[clusters == n, ]),
decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
Upon running this code I get the following error:
Error in colSums(df[clusters == n, ]) :
'x' must be an array of at least two dimensions
Could someone help me fix this please?
Thank you.
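I can't reproduce your error; it works fine for me. Update your question with a reproducible example and you might get a more useful answer. Perhaps your input data object is empty: what do you get with dim(dtm0.75)? Another possibility: if cluster n contains only one document, df[clusters == n, ] drops to a plain vector and colSums() fails with exactly this error; df[clusters == n, , drop = FALSE] keeps it a matrix.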
Here it is working fine on the data that comes with the tm package:
library(tm)
data(crude)
dt0.75 <- DocumentTermMatrix(crude)
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
v = sort(colSums(df[clusters == n, ]),
decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
word freq
the the 69
and and 25
for for 12
government government 11
oil oil 10
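The logic of perClusterCounts can also be sketched in Python with NumPy, assuming a dense document-term matrix, an array of cluster labels and a list of column words. The toy matrix and vocabulary are hypothetical. One difference from R's default df[cond, ]: NumPy boolean row selection always keeps a 2-D array, even when only one document is in the cluster:

```python
import numpy as np


def per_cluster_counts(dtm, clusters, n, vocab, top=5):
    """Return the `top` most frequent (word, count) pairs over the documents in cluster n."""
    # Boolean row selection keeps a 2-D array, even when only one document matches.
    rows = dtm[clusters == n]
    totals = rows.sum(axis=0)                   # total count of each word in the cluster
    order = np.argsort(-totals, kind="stable")  # descending; ties keep column order
    return [(vocab[j], int(totals[j])) for j in order[:top]]


# Hypothetical toy document-term matrix: 3 documents x 4 words.
vocab = ["oil", "the", "and", "price"]
dtm = np.array([[2, 5, 1, 0],
                [0, 1, 0, 3],
                [1, 4, 2, 0]])
clusters = np.array([1, 2, 1])  # e.g. labels from a k-means run

print(per_cluster_counts(dtm, clusters, 1, vocab, top=2))  # [('the', 9), ('oil', 3)]
print(per_cluster_counts(dtm, clusters, 2, vocab, top=2))  # [('price', 3), ('the', 1)]
```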