I'm trying to assign reactive variables based on my inputs inside a loop. For example, I want to select a column (from an input) in the iris data set and then get the unique values from that column, and I want to do this in a loop. I find this works for my 'joke' variable, but not for my Group[[paste0('Gcol',i)]] variable. I've been searching for an answer for days.
Thank you for your help in advance!
library(shiny)

data <- iris

ui <- fluidPage(
  titlePanel("Old Faithful Geyser Data"),
  sidebarLayout(
    sidebarPanel(
      fluidRow(
        column(9, wellPanel(
          lapply(1:5, function(i) {
            selectizeInput(paste0('GroupVar', i), paste0('Group ', i),
                           choices = sort(colnames(data)),
                           options = list(placeholder = 'Select one',
                                          onInitialize = I('function() { this.setValue(""); }')))
          })
        ))
      )
    ),
    mainPanel(
      fluidRow(column(6, wellPanel(
        lapply(1:5, function(i) {
          uiOutput(paste0('GroupOpt', i))
        })
      ))),
      textOutput("try4"),
      textOutput("try2"),
      textOutput("try21"),
      textOutput("try3"),
      textOutput("try")
    )
  )
)
server <- function(input, output) {
  Group <- reactiveValues()
  for (i in 1:5) {
    Group[[paste0('Gcol', i)]] <- reactive({
      data[, which(colnames(data) == input[[paste0('GroupVar', i)]])]
    })
  }
  joke <- reactive({ data[, which(colnames(data) == input[[paste0('GroupVar', 1)]])] })
  lapply(1:5, function(i) {
    output[[paste0('GroupOpt', i)]] <- renderUI({
      selectInput(paste0("GroupOpt", i), "Select group", multiple = TRUE,
                  sort(as.character(unique(Group[[paste0('Gcol', i)]]()))))
    })
  })
  output$try4 <- renderText({ print(paste0('it is ', input[[paste0('GroupVar', 1)]])) })
  output$try2 <- renderText({ print(dim(Group[[paste0('Gcol', 1)]]())) })
  output$try21 <- renderText({ print(class(Group[[paste0('Gcol', 1)]]())) })
  output$try3 <- renderText({ print(which(colnames(data) == input[[paste0('GroupVar', 1)]])) })
  output$try <- renderText({ print(unique(as.character(joke()))) })
}
# Run the application
shinyApp(ui = ui, server = server)
data[, which(colnames(data)=="Species")] is not a data frame; it is the column Species, a factor. If you want to keep a one-column data frame, use data[, which(colnames(data)=="Species"), drop=FALSE].
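A quick illustration of the difference, using the built-in iris data:
class(iris[, "Species"])               # "factor": the bare column
class(iris[, "Species", drop = FALSE]) # "data.frame": a one-column data frame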
Replace your loop with the following one, and your app works (though maybe not as you expect; I'm not sure I understand exactly what you want).
for (i in 1:5) {
  local({
    ii <- i
    Group[[paste0('Gcol', ii)]] <- reactive({
      data[, which(colnames(data) == input[[paste0('GroupVar', ii)]])]
    })
  })
}
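The local() is needed because reactives evaluate lazily: by the time any Gcol reactive actually runs, the loop has finished and i is 5 for all of them, so every reactive reads GroupVar5. The ii <- i copy freezes the current value inside each local() scope. As an alternative sketch, lapply should behave the same way without the copy, because each iteration gets its own function scope:
lapply(1:5, function(i) {
  Group[[paste0('Gcol', i)]] <- reactive({
    data[, which(colnames(data) == input[[paste0('GroupVar', i)]])]
  })
})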
I'm trying to do a churn analysis with R and SQL Server 2016.
I have uploaded my dataset to a database on a local SQL Server instance and have done all the preliminary work on it.
Now I have this function trainModel(), which I use to estimate my random forest model:
trainModel <- function(sqlSettings, trainTable) {
  sqlConnString <- sqlSettings$connString
  trainDataSQL <- RxSqlServerData(connectionString = sqlConnString,
                                  table = trainTable,
                                  colInfo = cdrColInfo)

  ## Create training formula
  labelVar <- "churn"
  trainVars <- rxGetVarNames(trainDataSQL)
  trainVars <- trainVars[!trainVars %in% c(labelVar)]
  temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~")
  formula <- as.formula(temp)

  ## Train a random forest with rxDForest on the SQL data source
  library(RevoScaleR)
  rx_forest_model <- rxDForest(formula = formula,
                               data = trainDataSQL,
                               nTree = 8,
                               maxDepth = 16,
                               mTry = 2,
                               minBucket = 1,
                               replace = TRUE,
                               importance = TRUE,
                               seed = 8,
                               parms = list(loss = c(0, 4, 1, 0)))
  return(rx_forest_model)
}
But when I run the function, I get this warning instead of a usable result:
> system.time({
+   trainModel(sqlSettings, trainTable)
+ })
   user  system elapsed
   0.29    0.07   58.18
Warning message:
In tempGetNumObs(numObs) :
  Number of observations not available for this data source. 'numObs' set to 1e6.
With this warning, the function trainModel() does not create the object rx_forest_model.
Does anyone have any suggestions on how to solve this problem?
After several attempts, I found the reason why the function trainModel() did not work properly.
It is not a connection string problem, nor a data source type issue.
The problem is in the syntax of the function trainModel().
It is enough to remove the statement
return(rx_forest_model)
from the body of the function. This way, the function produces the same warning message, but creates the object rx_forest_model correctly.
So, the correct function is:
trainModel <- function(sqlSettings, trainTable) {
  sqlConnString <- sqlSettings$connString
  trainDataSQL <- RxSqlServerData(connectionString = sqlConnString,
                                  table = trainTable,
                                  colInfo = cdrColInfo)

  ## Create training formula
  labelVar <- "churn"
  trainVars <- rxGetVarNames(trainDataSQL)
  trainVars <- trainVars[!trainVars %in% c(labelVar)]
  temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~")
  formula <- as.formula(temp)

  ## Train a random forest with rxDForest on the SQL data source
  library(RevoScaleR)
  rx_forest_model <- rxDForest(formula = formula,
                               data = trainDataSQL,
                               nTree = 8,
                               maxDepth = 16,
                               mTry = 2,
                               minBucket = 1,
                               replace = TRUE,
                               importance = TRUE,
                               seed = 8,
                               parms = list(loss = c(0, 4, 1, 0)))
}
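One usage detail worth noting: in either version, the rx_forest_model created inside the function is local to it, so the model only ends up in your workspace if you assign the call's result. A minimal sketch, assuming sqlSettings and trainTable are defined as in the question:
# Capture the fitted model returned by the function
rx_forest_model <- trainModel(sqlSettings, trainTable)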
I'm trying to read an input file in Scala whose structure I know; however, I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue is that this leaves me with a huge dataset (we're talking 20GB of data). Not only have I been forced to write some very ugly code to convert between RDD[Array[String]] and Array[String], but it has essentially made my code useless.
I've tried different approaches and mixes of .map(), .flatMap(), and .reduceByKey(), but nothing actually puts my collected "cells" into the format I need.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep hold of the stock_symbol, as that is the identifier I'm counting. So far my attempts have been to turn the entire thing into an array and collect every 9th index into a collected_cells var. The issue is that, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkNum {
  def main(args: Array[String]) {
    // Do some Scala voodoo
    val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))

    // Set input file as per HDFS structure + input args
    val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
    val fields = lines.map(line => line.split(","))

    var collected_cells: Array[String] = new Array[String](0)
    //println("[MESSAGE] Length of CC: " + collected_cells.length)

    val divider: Long = 9
    val array_length = fields.count / divider
    val casted_length = array_length.toInt
    val indexedFields = fields.zipWithIndex
    val indexKey = indexedFields.map { case (k, v) => (v, k) }

    println("[MESSAGE] Number of lines: " + array_length)
    println("[MESSAGE] Casted length of: " + casted_length)

    for (i <- 1 to casted_length) {
      println("[URGENT DEBUG] Processing line " + i + " of " + casted_length)
      var index = 9 * i - 8
      println("[URGENT DEBUG] Index defined to be " + index)
      collected_cells :+ indexKey.lookup(index)
    }

    println("[MESSAGE] collected_cells size: " + collected_cells.length)
    val single_cells = collected_cells.flatMap(collected_cells => collected_cells)
    val counted_cells = single_cells.map(cell => (cell, 1)).reduceByKey { case (x, y) => x + y }

    // val result = counted_cells.reduceByKey((a,b) => (a+b))
    // val inmem = counted_cells.persist()
    //
    // // Collect driver into file to be put into user archive
    // inmem.saveAsTextFile("path to server location")
    // ==> Not necessary to save the result as processing time is recorded, not output
  }
}
The bottom part is currently commented out from my debugging attempts, but it acts as pseudo-code for what I need done. I should point out that I am barely familiar with Scala, so things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array the size of the data. The expression represents a lazy transformation of the base data: nothing is computed until an action asks for a result, and it can be further transformed until we reduce the data to the information set we desire.
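A quick sketch of that laziness, reusing the lines RDD from above:
val fields = lines.map(line => line.split(","))  // returns immediately; nothing is read yet
fields.take(5)                                   // an action: only now does Spark touch the data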
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange, which starts with a letter. We will do this before any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write out the types of the variables, for peace of mind that we have the data types we expect. As we progress in our Scala skills, that might become less important. Let's rewrite the expressions above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols do we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (as is usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "high", "low", "close", "volume", "adj_close")
We can answer our questions very easily now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
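One detail the snippets above take for granted: the $"column" syntax and the count aggregate need a couple of imports to resolve. A minimal setup sketch (the session name sparkSession is an assumption, chosen to match the code above):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val sparkSession = SparkSession.builder().appName("quotes").getOrCreate()
import sparkSession.implicits._ // enables the $"column" syntax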
I am writing an R Shiny app. Can someone tell me how to execute a Microsoft SQL Server query in R Shiny?
This is what I have done so far:
data <- reactive({
  conn <- reactive({ databaseOpen(serverName = "[serverName]", databaseName = [dbName]) })
  qr <- reactive({ SELECT * from myTable })
  res <- reactive({ databaseQuery(conn = conn, query = qr) })
  close(conn)
  View(res)
})
Any help is appreciated!
I was able to call a query by creating a function outside of the server and ui functions (in other words, in global.R). The server function could then call that query function with one of the inputs as an argument.
Here is my code:
library(RODBC)  # provides odbcConnect() and sqlQuery()

queryfunction <- function(zipper){
  odbcChannel <- odbcConnect("myconnection")
  querydoc <- paste0("
    SELECT distinct *
    FROM mydb
    where substring(nppes_provider_zip,1,2) = '43'
      and [provider_type] = 'General Practice'
  ")
  pricetable <- sqlQuery(odbcChannel, querydoc)
  close(odbcChannel)
  pricetable[which(substring(pricetable$nppes_provider_zip,1,5) == zipper),]
}
server <- shinyServer(function(input, output) {
  output$mytable1 <- renderDataTable(data.table(queryfunction(input$zip)))
})
I figured it out. It can be done as:
server.R
serverfun <- function(input, output){
  # Store the query results in myData
  myData <- reactive({
    # Open the database connection
    conn <- databaseOpen(serverName = "myServer", databaseName = "myDB")
    # Sample query which uses one of the inputs
    qr <- paste("SELECT name FROM Genes g WHERE Id = ", input$myId, " ORDER BY name")
    # Run the query and store the results
    res <- databaseQuery(conn = conn, query = qr)
    # Close the database connection
    databaseClose(conn)
    # Return the results
    res
  })
  output$tbTable <- renderTable({
    # Check that myData is not null
    if(is.null(myData())){ return() }
    # Return myData
    myData()
  })
}
ui.R
library("shiny")
shinyUI(
pageWithSidebar(
headerPanel("Hide Side Bar example"),
sidebarPanel(
textInput("Id", "Enter ID below","1234")
),
mainPanel(
tabsetPanel(
tabPanel("Data", tableOutput("tbTable"))
)
)
)
)
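A caveat on building the query with paste(): splicing raw input into SQL invites injection and quoting bugs. The databaseOpen/databaseQuery functions above are custom wrappers, but if your connection is DBI-based (an assumption), DBI's sqlInterpolate offers a safer sketch:
library(DBI)
# conn is a hypothetical DBI connection; substitute your own driver setup
qr <- sqlInterpolate(conn,
                     "SELECT name FROM Genes g WHERE Id = ?id ORDER BY name",
                     id = input$myId)
res <- dbGetQuery(conn, qr)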
I'm currently trying to filter a large database using Scala. I've written a simple piece of code to match an ID in one database to a list of IDs in another.
Essentially I want to go through database A and, if the ID number in the ID column matches one from database B, extract that entry from database A.
The code I've written works fine, but it's slow (i.e., it has to run over a couple of days), and I'm trying to find a way to speed it up. It may be that it can't be sped up much, or it may be much faster with better code.
So any help would be much appreciated.
Below is a description of the databases and a copy of the code.
Database A is approximately 10GB in size with over 100 million entries, and database B has a list of approximately 50,000 IDs.
Each database looks as follows:
Database A:
ID, DataX, date
10, 100, 01012000
15, 20, 01012008
5, 32, 01012006
etc...
Database B:
ID
10
15
12
etc...
My code is as follows:
import scala.io.Source
import java.io._

object filter extends App {

  def ext[T <: Closeable, R](resource: T)(block: T => R): R = {
    try { block(resource) }
    finally { resource.close() }
  }

  val key = io.Source.fromFile("C:\\~Database_B.csv").getLines()
  val key2 = new Array[String](50000)
  key.copyToArray(key2)

  ext(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:\\~Output.csv")))) {
    writer =>
      val line = io.Source.fromFile("C:\\~Database_A.csv").getLines.drop(1)
      while (line.hasNext) {
        val data = line.next
        val array = data.split(",").map(_.trim)
        val idA = array(0)
        val dataX = array(1)
        val date = array(2)
        key2.map { idB =>
          if (idA == idB) {
            val print = (idA + "," + dataX + "," + date)
            writer.write(print)
            writer.newLine()
          } else None
        }
      }
  }
}
First, there are way more efficient ways to do this than writing a Scala program. Loading the two tables into a database and doing a join would take about 10 minutes (including data loading) on a modern computer.
Assuming you have to use Scala, there is an obvious improvement. Store your keys as a HashSet and use keys.contains(x) instead of traversing all the keys. This gives you O(1) lookups instead of the O(N) you have now, which should speed up your program significantly.
Minor point -- use string interpolation instead of concatenation, i.e.
s"$idA,$dataX,$date"
// instead of
idA + "," + dataX + "," + date
Try this:
import scala.io.Source
import java.io._

object filter extends App {

  def ext[T <: Closeable, R](resource: T)(block: T => R): R = {
    try { block(resource) }
    finally { resource.close() }
  }

  // Convert the keys to a Set: contains() is O(1) instead of a full scan
  val key2 = io.Source.fromFile("C:\\~Database_B.csv").getLines().toSet

  ext(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:\\~Output.csv")))) {
    writer =>
      val lines = io.Source.fromFile("C:\\~Database_A.csv").getLines.drop(1)
      for (data <- lines) {
        val array = data.split(",").map(_.trim)
        array match {
          case Array(idA, dataX, date) =>
            if (key2.contains(idA)) {
              writer.write(s"$idA,$dataX,$date")
              writer.newLine()
            }
          case _ => // invalid input: skip the line
        }
      }
  }
}
The IDs are now stored in a Set, which gives much better lookup performance.
I've written a small Shiny app to test variable selection for user-uploaded data. Here is my code:
ui.R
shinyUI(pageWithSidebar(
  headerPanel("CSV Data explorer"),
  sidebarPanel(
    fileInput('datafile', 'Choose CSV file',
              accept = c('text/csv', 'text/comma-separated-values,text/plain')),
    htmlOutput("varselect", inline = TRUE),
    selectInput("vars", "Select a variable:", choices = htmlOutput("varselect")),
    br()
  ),
  mainPanel(
    dataTableOutput("table")
  )
))
server.R
shinyServer(function(session, input, output) {

  Dataset <- reactive({
    infile <- input$datafile
    if (is.null(infile)) {
      return(NULL)
    }
    read.csv(infile$datapath)
  })

  observe({
    output$varselect <- renderUI({
      if (identical(Dataset(), '') || identical(Dataset(), data.frame())) return(NULL)
      updateSelectInput(session, inputId = "vars", label = "Variables to use:",
                        choices = names(Dataset()), selected = names(Dataset()))
    })
  })

  output$table <- renderDataTable({
    if (is.null(input$vars) || length(input$vars) == 0) return(NULL)
    return(Dataset()[, input$vars, drop = FALSE])
  })
})
If you go ahead and test it on any of your CSV files, you will see two problems:
1. There is a mess showing all the variable names above the selectInput() box, caused by this line:
htmlOutput("varselect", inline=TRUE)
but if I delete this line of code, my selectInput disappears.
2. The selectInput only allows single-variable selection. If I change it to
selectInput("vars", "Select a variable:", choices=htmlOutput("varselect"), multiple=TRUE),
and try to click on multiple choices, it is not reflected in the table in the main panel.
I've been struggling with this problem for some time. Could someone help out, please? Millions of thanks in advance!
Regards,
mindy
I think you have made yours overly complicated. Briefly, you only really need a uiOutput in place of your htmlOutput and selectInput statements in ui.R. There also doesn't appear to be any need for the updateSelectInput or observe functions. You can simplify your code to the following, and the multiple select works as expected:
ui.R
shinyUI(pageWithSidebar(
  headerPanel("CSV Data explorer"),
  sidebarPanel(
    fileInput('datafile', 'Choose CSV file',
              accept = c('text/csv', 'text/comma-separated-values,text/plain')),
    uiOutput("varselect"),
    br()
  ),
  mainPanel(
    dataTableOutput("table")
  )
))
server.R
shinyServer(function(session, input, output) {

  Dataset <- reactive({
    infile <- input$datafile
    if (is.null(infile)) {
      return(NULL)
    }
    read.csv(infile$datapath)
  })

  output$varselect <- renderUI({
    if (identical(Dataset(), '') || identical(Dataset(), data.frame())) return(NULL)
    cols <- names(Dataset())
    selectInput("vars", "Select a variable:", choices = cols, selected = cols, multiple = TRUE)
  })

  output$table <- renderDataTable({
    if (is.null(input$vars) || length(input$vars) == 0) return(NULL)
    return(head(Dataset()[, input$vars, drop = FALSE]))
  })
})