This question is closely related to this one, and I will take the advice given there about schema design in a NoSQL context into account. Still, I'm curious to understand the following:
Actual questions
Suppose you have the following document:
_id : 2 abcd
name : 2 unittest.com
paths : 4
    0 : 3
        path : 2 home
        queries : 4
            0 : 3
                name : 2 query1
                url : 2 www.unittest.com/home?query1
                requests : 4
            1 : 3
                name : 2 query2
                url : 2 www.unittest.com/home?query2
                requests : 4
Basically, I'd like to know:
1. whether it is possible to use MongoDB's positional $ operator (details) multiple times, i.e. in update scenarios that involve array/document structures nested more than one level deep:
{ <update operator>: { "paths.$.queries.$.requests" : value } } (doesn't work)
instead of "only" being able to use $ once for a top-level array and being bound to explicit indexes for arrays at deeper levels:
{ <update operator>: { "paths.$.queries.0.requests" : value } } (works)
2. if it is possible at all, what the corresponding R syntax would look like.
Below you'll find a reproducible example. I tried to be as concise as possible.
Code example
Database connection
require("rmongodb")
db <- "__unittest"
ns <- paste(db, "hosts", sep=".")
# CONNECTION OBJECT
con <- mongo.create(db=db)
# ENSURE EMPTY DB
mongo.remove(mongo=con, ns=ns)
Example document
q <- list("_id"="abcd")
b <- list("_id"="abcd", name="unittest.com")
mongo.insert(mongo=con, ns=ns, b=b)
q <- list("_id"="abcd")
b <- list("$push"=list(paths=list(path="home")))
mongo.update(mongo=con, ns, criteria=q, objNew=b)
q <- list("_id"="abcd", paths.path="home")
b <- list("$push"=list("paths.$.queries"=list(
name="query1", url="www.unittest.com/home?query1")))
mongo.update(mongo=con, ns, criteria=q, objNew=b)
b <- list("$push"=list("paths.$.queries"=list(
name="query2", url="www.unittest.com/home?query2")))
mongo.update(mongo=con, ns, criteria=q, objNew=b)
Update of nested arrays with explicit position index (works)
This works, but it involves an explicit index for the second-level array queries (nested in a subdoc element of array paths):
q <- list("_id"="abcd", paths.path="home", paths.queries.name="query1")
b <- list("$push"=list("paths.$.queries.0.requests"=list(time="2013-02-13")))
> mongo.bson.from.list(b)
    $push : 3
        paths.$.queries.0.requests : 3
            time : 2 2013-02-13
mongo.update(mongo=con, ns, criteria=q, objNew=b)
res <- mongo.find.one(mongo=con, ns=ns, query=q)
> res
_id : 2 abcd
name : 2 unittest.com
paths : 4
    0 : 3
        path : 2 home
        queries : 4
            0 : 3
                name : 2 query1
                requests : 4
                    0 : 3
                        time : 2 2013-02-13
                url : 2 www.unittest.com/home?query1
            1 : 3
                name : 2 query2
                url : 2 www.unittest.com/home?query2
Update of nested arrays with positional $ indexes (doesn't work)
Now, I'd like to substitute the explicit 0 with the positional $ operator, just as I did to have the server find the desired subdoc element of array paths (paths.$.queries).
As far as I understand the documentation, this should work, as the crucial thing is to specify a "correct" query selector:
The positional $ operator, when used with the update() method, acts as a placeholder for the first match of the update query selector.
I think I specified a query selector that does find the correct nested element (due to the paths.queries.name="query1" part):
q <- list("_id"="abcd", paths.path="home", paths.queries.name="query1")
I guess that, translated to plain MongoDB syntax, the query selector looks something like this:
{ "_id": "abcd", "paths.path": "home", "paths.queries.name": "query1" }
which seems like a valid query selector to me. In fact it does match the desired element/doc:
> !is.null(mongo.find.one(mongo=con, ns=ns, query=q))
[1] TRUE
My thought was that if it works at the top level, why shouldn't it work for deeper levels as well (as long as the query selector points to the right nested components)?
However, the server doesn't seem to like a nested or multiple use of $:
b <- list("$push"=list("paths.$.queries.$.requests"=list(time="2013-02-14")))
> mongo.bson.from.list(b)
    $push : 3
        paths.$.queries.$.requests : 3
            time : 2 2013-02-14
> mongo.update(mongo=con, ns, criteria=q, objNew=b)
[1] FALSE
I'm not sure if it doesn't work because MongoDB doesn't support this or if I didn't get the R syntax right.
The positional operator only supports one level deep and only the first matching element.
There is a JIRA ticket for the sort of behaviour you want here: https://jira.mongodb.org/browse/SERVER-831
I am unsure whether it will allow for more than one match, but I believe it will, given how it will need to work.
If you can execute your query from the MongoDB shell, you can bypass this limitation by taking advantage of the cursor's forEach function (http://docs.mongodb.org/manual/reference/method/cursor.forEach/).
Here is an example with 3 nested arrays:
var collectionNameCursor = db.collection_name.find({...});

collectionNameCursor.forEach(function(collectionDocument) {
    var firstArray = collectionDocument.firstArray;
    for (var i = 0; i < firstArray.length; i++) {
        var secondArray = firstArray[i].secondArray;
        for (var j = 0; j < secondArray.length; j++) {
            var thirdArray = secondArray[j].thirdArray;
            for (var k = 0; k < thirdArray.length; k++) {
                // ... do some logic here with thirdArray's elements
                db.collection_name.save(collectionDocument);
            }
        }
    }
});
Note that this is more of a one-time solution than production code, but it will do the job if you have to write a fix-up script.
As #FooBar mentioned in the comments of the accepted answer, this feature was implemented in 2017 with MongoDB 3.6.
To do so, you must use positional filters with arrayFilters conditions.
Applied to your example:
db.hosts.updateOne(
    { "paths.path": "home" },
    { $push: {
        "paths.$.queries.$[q].requests": { time: "2022-11-15" }
      }
    },
    { arrayFilters: [ { "q.name": "query1" } ] }
)
The positional operator $ refers to the first paths element matched by the filter { "paths.path": "home" }. Then, the positional filter $[q] refers to the arrayFilters condition { "q.name": "query1" }.
Using this method, you can add as many positional filters as needed, as long as you put the condition in arrayFilters.
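For instance, the outer paths element can itself be addressed with a positional filter instead of $, which keeps the query document minimal. Here is a hedged sketch that just reuses the collection name and field values from the question:
db.hosts.updateOne(
    { "_id": "abcd" },
    { $push: { "paths.$[p].queries.$[q].requests": { time: "2022-11-15" } } },
    { arrayFilters: [ { "p.path": "home" }, { "q.name": "query1" } ] }
)
Every $[<identifier>] used in the update path must have a corresponding condition in arrayFilters, and the push is applied to every array element that satisfies those conditions.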
However, looking through the documentation of rmongodb, using arrayFilters does not appear to be possible at the moment. Alternatively, you could use another R package that has this feature implemented, such as mongolite.
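For illustration only, a rough sketch of how that might look with mongolite, assuming your mongolite version exposes arrayFilters through a filters argument of the update() method (please verify against the package documentation; the connection details simply mirror the example above):
library(mongolite)

# Hypothetical connection to the example collection used in the question
hosts <- mongo(collection = "hosts", db = "__unittest")

# Assumption: 'filters' forwards the arrayFilters conditions to the server
hosts$update(
  query   = '{ "_id": "abcd", "paths.path": "home" }',
  update  = '{ "$push": { "paths.$.queries.$[q].requests": { "time": "2022-11-15" } } }',
  filters = '[ { "q.name": "query1" } ]'
)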
Related
I want to combine the datasets within a single HDF5 file to form one dataset in a separate file, but I am struggling to set the dtype of the new dataset. I am getting the error AttributeError: 'Group' object has no attribute 'dtype' on the line with ds_0_dtype = h5f1[ds].dtype. The code below is based on some example code posted on Stack Overflow.
import sys
import h5py

with h5py.File('xxx_xxx_signals.hdf5', 'r') as h5f1, \
     h5py.File('file2.h5', 'w') as h5f2:
    for i, ds in enumerate(h5f1.keys()):
        if i == 0:
            ds_0 = ds
            ds_0_dtype = h5f1[ds].dtype
            n_rows = h5f1[ds].shape[0]
            n_cols = h5f1[ds].shape[1]
        else:
            if h5f1[ds].dtype != ds_0_dtype:
                print(f'Dset 0:{ds_0}: dtype:{ds_0_dtype}')
                print(f'Dset {i}:{ds}: dtype:{h5f1[ds].dtype}')
                sys.exit('Error: incompatible dataset dtypes')
            if h5f1[ds].shape[0] != n_rows:
                print(f'Dset 0:{ds_0}: shape[0]:{n_rows}')
                print(f'Dset {i}:{ds}: shape[0]:{h5f1[ds].shape[0]}')
                sys.exit('Error: incompatible dataset shape')
            n_cols += h5f1[ds].shape[1]
        prev_ds = ds

    h5f2.create_dataset('ds_xxxx', dtype=ds_0_dtype, shape=(n_rows, n_cols),
                        maxshape=(n_rows, None))

    first = 0
    for ds in h5f1.keys():
        xfer_arr = h5f1[ds][:]
        last = first + xfer_arr.shape[1]
        h5f2['ds_xxxx'][:, first:last] = xfer_arr[:]
        first = last
Likely you have one or more Groups in addition to Datasets at the root level. h5f1.keys() accesses all nodes, which can be Datasets or Groups. You need to add a test to skip over Groups. You can do this with an isinstance() check. Something like this:
else:
    if not isinstance(h5f1[ds], h5py.Dataset):
        print(f'Node {i}:{ds}: is not a dataset')
        sys.exit('Error: unexpected Group; only Datasets expected')
    if h5f1[ds].dtype != ds_0_dtype:
Once you know how to identify Groups, you can also modify the code to avoid copying them to the second file. However, that may not be your desired result. I have an extended SO post on using isinstance(). See this link:
Is there a way to get datasets in all groups at once in h5py?
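As a small sketch of that idea (reusing the file name from the question), you could filter the root-level keys up front so that Groups never enter the loop:
import h5py

with h5py.File('xxx_xxx_signals.hdf5', 'r') as h5f1:
    # Keep only the root-level nodes that are actually Datasets; Groups are skipped
    dset_names = [name for name in h5f1.keys()
                  if isinstance(h5f1[name], h5py.Dataset)]
    for ds in dset_names:
        print(ds, h5f1[ds].dtype, h5f1[ds].shape)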
Suppose I have two arrays in MATLAB; let's call them locations1 and locations2:
locations1
1123.44977625437 890.824688325172
1290.31273560851 5065.65794385883
1718.10632735926 2563.44895531365
1734.55379433782 4408.20631924691
2050.70084480064 1214.45353443990
2299.46239346717 3781.34694047196
4186.02801290113 4386.67818566045
5676.10649593031 4529.23023993815
locations2
7474.22619378039 3166.41503120846
8604.40241305284 5069.40744277799
9048.25231808890 2563.58997620248
9059.71923042408 4381.75034710351
9643.05902166767 3796.42822996919
11460.8617087264 4392.85930695209
I want any entries of the second columns that match each other to within 100.0 to remain, while any entry that has no match gets removed. So I want the output to look like:
locations1
1290.31273560851 5065.65794385883
1718.10632735926 2563.44895531365
1734.55379433782 4408.20631924691
2299.46239346717 3781.34694047196
4186.02801290113 4386.67818566045
locations2
8604.40241305284 5069.40744277799
9048.25231808890 2563.58997620248
9059.71923042408 4381.75034710351
9643.05902166767 3796.42822996919
11460.8617087264 4392.85930695209
How would I do this, preferably without loops? Here is what I've done, but it uses loops:
locround1=round(locations1/50)*50;
locround2=round(locations2/50)*50;
for i=1:size(locations1,1)
    nodel1(i)=sum(locround1(i,2)== locround2(:,2))
end
nodel1=repmat(nodel1>0,[2,1]);
nodel1=nodel1';
locations1=nodel1.*locations1;
locations1( ~any(locations1,2), : ) = [];
for i=1:size(locations2,1)
    nodel2(i)=sum(locround2(i,2)== locround1(:,2))
end
nodel2=repmat(nodel2>0,[2,1]);
nodel2=nodel2';
locations2=nodel2.*locations2;
locations2( ~any(locations2,2), : ) = [];
This is what I got. If your MATLAB version has ismembertol, you can do it with the following code:
Li1 = ismembertol(locations1(:,2),locations2(:,2),100, 'DataScale', 1);
locations1_new = locations1 (Li1,:);
Li2 = ismembertol(locations2(:,2),locations1(:,2),100, 'DataScale', 1);
locations2_new = locations2 (Li2,:);
I tested it, it works.
Let the data be defined as
locations1 = [
1123.44977625437 890.824688325172
1290.31273560851 5065.65794385883
1718.10632735926 2563.44895531365
1734.55379433782 4408.20631924691
2050.70084480064 1214.45353443990
2299.46239346717 3781.34694047196
4186.02801290113 4386.67818566045
5676.10649593031 4529.23023993815
];
locations2 = [
7474.22619378039 3166.41503120846
8604.40241305284 5069.40744277799
9048.25231808890 2563.58997620248
9059.71923042408 4381.75034710351
9643.05902166767 3796.42822996919
11460.8617087264 4392.85930695209
];
threshold = 100;
Then:
m = abs(locations1(:,2)-locations2(:,2).')<=threshold;
result1 = locations1(any(m,2),:);
result2 = locations2(any(m,1),:);
How this works:
The first line computes a matrix with the distance between each value from the second column of locations1 and each value from the second column of locations2. The distances are then compared with threshold, so that the matrix entries become true or false.
This makes use of implicit expansion, introduced in R2016b. For MATLAB versions before that, use bsxfun as follows:
m = abs(bsxfun(@minus, locations1(:,2), locations2(:,2).'))<=threshold;
Each row of the computed matrix, m, corresponds to a value from locations1; and each column corresponds to a value from locations2.
The second line uses logical indexing to select the rows of location1 that satisfy the criterion for some value of location2.
Similarly, the third line selects the rows of location2 that satisfy the criterion for some value of location1.
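As a toy illustration of the implicit-expansion step (made-up numbers, not from the question; requires R2016b or later):
a = [1; 2; 3];           % 3x1 column vector
b = [1.2 2.9];           % 1x2 row vector
m = abs(a - b) <= 0.5    % 3x2 logical matrix; m(i,j) compares a(i) with b(j)
Row i of m then tells you whether a(i) has any close-enough partner in b, which is exactly what any(m,2) and any(m,1) extract in the code above.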
I want to order an array. The JSONata expression below receives the following incoming array:
[{"id":"Air-1a",
"Controller":"ESP62",
"Cntr-TaskNo":10,
"Cntr-GPIO":13,
"name":"Air",
"valueName":"Humidity",
"Sensor":"DHT22",
(and many other key pairs)},
{next object}, ...]
I then transform the array with the following JSONata expression:
payload.(
{ "Controller" : $.Controller,
"Cntr-TaskNo": $.CntrDef.TaskNo,
"Cntr-GPIO" : $.CntrDef.GPIO,
"name" : $.name,
"valueName" : $.valueName,
"Sensor" : $.Sensor,
"id" : $.id
}
)
But now I want to, in the same JSONata expression, sort firstly on the Controller and then on the GPIO. I tried with the Controller only first.
I tried:
payload.(
{ $sort("Controller",function($l, $r){$l.Controller > $r.Controller}) : $.Controller ,
"Cntr-TaskNo": $.CntrDef.TaskNo,
"Cntr-GPIO" : $.CntrDef.GPIO,
"name" : $.name,
"valueName" : $.valueName,
"Sensor" : $.Sensor,
"id" : $.id
}
)
I also tried adding the sort function at the end with the ~> chaining operator, and I tried the order-by operator as well.
Could anyone point me in the right direction?
//----------
The new flow, with 'ESP62' changed to '-', that does not work:
[{"id":"874b0c77.f87418","type":"inject","z":"6f27a311.d135bc","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":200,"y":180,"wires":[["8c196590.c20638"]]},{"id":"8c196590.c20638","type":"change","z":"6f27a311.d135bc","name":"Dataset","rules":[{"t":"set","p":"payload","pt":"msg","to":"[{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":10,\"CntrGPIO\":13,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"111\",\"bbb\":\"222\",\"ccc\":\"333\"},{\"id\":\"Air-2a\",\"Controller\":\"ESP72\",\"CntrTaskNo\":11,\"CntrGPIO\":14,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"444\",\"bbb\":\"555\",\"ccc\":\"666\"},{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":2,\"CntrGPIO\":9,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"},{\"id\":\"Air-1a\",\"Controller\":\"-\",\"CntrTaskNo\":10,\"CntrGPIO\":12,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"}]","tot":"json"}],"action":"","property":"","from":"","to":"","reg":false,"x":360,"y":180,"wires":[["13981162.14e28f"]]},{"id":"c8a256a5.a170c8","type":"debug","z":"6f27a311.d135bc","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":690,"y":180,"wires":[]},{"id":"13981162.14e28f","type":"change","z":"6f27a311.d135bc","name":"Jsonata $sort","rules":[{"t":"set","p":"payload","pt":"msg","to":"($sort(payload,function($l , $r){$l.Controller > $r.Controller}) ; \t$sort(payload,function($l , $r){$l.CntrGPIO > $r.CntrGPIO}))","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":180,"wires":[["c8a256a5.a170c8"]]}]
I suggest first sorting the dataset and afterwards transforming the already sorted array of objects. The transformation is trivial, and since you want to know how to sort, I show one possible solution below. It uses an expression with two concatenated $sort functions.
Edited after a better understanding of the requirement.
I tested successfully a Node-RED flow using this expression in a change node:
($a := $sort(payload,function($l , $r){$l.Controller > $r.Controller}) ; $sort($a,function($l , $r){(($l.Controller = $r.Controller) and ($l.CntrGPIO > $r.CntrGPIO))}))
Flow (contain dataset set hardcoded):
[{"id":"a7814b7e.3adeb8","type":"tab","label":"Flow 4","disabled":false,"info":""},{"id":"8bf10833.c71748","type":"inject","z":"a7814b7e.3adeb8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":140,"wires":[["9e365564.edca08"]]},{"id":"9e365564.edca08","type":"change","z":"a7814b7e.3adeb8","name":"Dataset","rules":[{"t":"set","p":"payload","pt":"msg","to":"[{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":10,\"CntrGPIO\":13,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"111\",\"bbb\":\"222\",\"ccc\":\"333\"},{\"id\":\"Air-2a\",\"Controller\":\"ESP72\",\"CntrTaskNo\":11,\"CntrGPIO\":14,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"444\",\"bbb\":\"555\",\"ccc\":\"666\"},{\"id\":\"Air-1a\",\"Controller\":\"ESP62\",\"CntrTaskNo\":2,\"CntrGPIO\":9,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"},{\"id\":\"Air-1a\",\"Controller\":\"-\",\"CntrTaskNo\":10,\"CntrGPIO\":12,\"name\":\"Air\",\"valueName\":\"Humidity\",\"Sensor\":\"DHT22\",\"aaa\":\"777\",\"bbb\":\"888\",\"ccc\":\"999\"}]","tot":"json"}],"action":"","property":"","from":"","to":"","reg":false,"x":300,"y":140,"wires":[["762f6421.074fec"]]},{"id":"f827bddb.c9acd","type":"debug","z":"a7814b7e.3adeb8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":630,"y":140,"wires":[]},{"id":"762f6421.074fec","type":"change","z":"a7814b7e.3adeb8","name":"Jsonata $sort","rules":[{"t":"set","p":"payload","pt":"msg","to":"($a := $sort(payload,function($l , $r){$l.Controller > $r.Controller}) ; $sort($a,function($l , $r){(($l.Controller = $r.Controller) and ($l.CntrGPIO > $r.CntrGPIO))}))","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":460,"y":140,"wires":[["f827bddb.c9acd"]]}]
Also tested in the JSONata exerciser: http://try.jsonata.org/S1IlT3y-E
You can sort the array using the following expression:
payload^(Controller, CntrDef.GPIO)
The order-by operator ^ will sort the array, first by increasing value of Controller, then by increasing value of CntrDef.GPIO. You can then transform each object within that array:
payload^(Controller, CntrDef.GPIO).{
"Controller" : Controller,
"Cntr-TaskNo": CntrDef.TaskNo,
"Cntr-GPIO" : CntrDef.GPIO,
"name" : name,
"valueName" : valueName,
"Sensor" : Sensor,
"id" : id
}
I have three arrays, namely:
ref_labels=array(['hammerthrow_g10_c07', 'wallpushups_g08_c04', 'archery_g09_c03',..., 'frisbeecatch_g09_c03', 'tabletennisshot_g12_c01',
'surfing_g10_c03'], dtype='<U26')
ref_labels is of shape (3000,)
ref_labels defines the reference order for two other arrays, namely:
to_be_ordered_labels=array(['walkingwithdog_g08_c01', 'nunchucks_g13_c02', ....,'javelinthrow_g09_c03', 'playingflute_g12_c04', 'benchpress_g12_c02', 'frisbeecatch_g14_c01', 'jumpingjack_g13_c07', 'handstandpushups_g08_c05'], dtype='<U28')
which is of shape (3000,).
I also have a NumPy array of floats, to_be_ordered_arrays_of_float, which is of shape (3000, 101).
Here is a sample from to_be_ordered_arrays_of_float[0]:
array([6.80778456e-08, 1.58984292e-08, 2.69517453e-09, 2.82882096e-09,
1.35314554e-06, 2.66444680e-08, 1.96892984e-06, 1.64217184e-07,
2.40923086e-08, 2.35174169e-09, 1.45098711e-09, 2.10457629e-09,
6.51394956e-08, 4.71427897e-10, 2.48873818e-07, 2.25375985e-08,
1.56526866e-07, 5.60892097e-08, 1.95728759e-07, 7.24156690e-09,
1.33053675e-06, 1.06113225e-08, 3.07328882e-08, 1.58847371e-07,
1.85805094e-09, 4.20591455e-08, 9.77163683e-09, 5.33082073e-07,
4.52592142e-09, 6.20161609e-06, 4.25105497e-08, 8.63415792e-08,
1.98478956e-05, 5.02593911e-10, 9.98565793e-01, 2.76135781e-09,
3.33678649e-08, 2.11770342e-07, 8.09025558e-09, 3.98751210e-09,
8.28181399e-08, 9.51544799e-09, 9.00462692e-06, 3.11626500e-05,
4.00733006e-06, 2.63792316e-07, 8.75839589e-07, 6.86739767e-08,
1.00570272e-08, 4.86615797e-08, 2.16352909e-08, 2.04790371e-08,
1.72958153e-07, 5.78688697e-09, 4.83830753e-09, 3.75843297e-06,
6.00361894e-09, 8.48605123e-06, 1.46872461e-08, 2.71486789e-09,
2.72728915e-08, 9.99970240e-09, 2.69397837e-08, 5.73341836e-08,
3.06793368e-09, 3.16495052e-10, 5.69838967e-08, 1.04099172e-07,
7.12405024e-09, 1.70841350e-08, 1.58363335e-07, 7.10246439e-09,
1.65444236e-09, 3.54519578e-08, 5.11049834e-08, 9.68790381e-09,
2.10373469e-06, 1.54864466e-09, 2.11581687e-06, 4.93066139e-08,
1.78782467e-09, 3.54902490e-08, 1.40120218e-08, 1.82792789e-07,
8.51292086e-08, 9.88524320e-08, 3.18586721e-08, 3.76303788e-08,
1.85764435e-08, 6.87650381e-09, 2.80555332e-06, 2.55424425e-06,
1.33028883e-03, 2.45268382e-07, 1.37083349e-08, 3.04683105e-08,
1.82895951e-06, 4.65470373e-09, 6.83182293e-08, 3.18085824e-08,
2.54011603e-08], dtype=float32)
My question is: how can I reorder to_be_ordered_labels and to_be_ordered_arrays_of_float given the order in ref_labels?
What have I tried?
I created a random array in order to build a dictionary where the ref_labels are the keys, and then reordered as follows:
random_arrays=np.random.rand(3000,101)
dic1=dict(zip(ref_labels,random_arrays))
dic2=dict(zip(to_be_ordered_labels,to_be_ordered_arrays_of_float))
ordered_dic2=sorted(dic2.items(), key=lambda kv: dic1[kv[0]])
However, I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Thank you for your help
My question is: how can I reorder to_be_ordered_labels and to_be_ordered_arrays_of_float given the order in ref_labels?
If I understand it correctly, what you want to do is the following:
import numpy as np
ref = np.array(['labels', 'that', 'define', 'the', 'order'])
other_labels = np.array(['other', 'labels', 'to', 'be', 'sorted'])
rand_data = np.random.randn(5, 10)
idx_sort = np.argsort(ref)
sorted_labels = other_labels[idx_sort]
rand_data = rand_data[idx_sort, :]
If you want an ordered dict, you might want to check the OrderedDict class from the collections module.
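For instance, continuing the sketch above (sorted_labels and rand_data come from that snippet), you could pair each label with its data row while preserving the sorted order:
from collections import OrderedDict

# Map each sorted label to its corresponding data row; insertion order is kept
ordered = OrderedDict(zip(sorted_labels, rand_data))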
I'm creating a Shiny app and I'm letting the user choose what data should be displayed in a plot and a table. This choice is made through three different input variables that contain 14, 4 and 2 choices respectively.
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "DataSource", label = "Data source",
                choices = c("Restoration plots", "all semi natural grasslands")),
    # choicesVariables definition is omitted here because it's very long,
    # but it contains 14 string values
    selectInput(inputId = "Variabel", label = "Variable",
                choices = choicesVariables),
    selectInput(inputId = "Factor", label = "Factor",
                choices = c("Company type", "Region and type of application",
                            "Approved or not approved applications", "Age group"))
  ),
  dashboardBody(
    plotOutput("thePlot"),
    tableOutput("theTable")
  )
)
This adds up to 73 choices (yes, I know the math doesn't add up there, but some choices are invalid). I would like to do this using a lookup table, so I created one with every valid combination of choices like this:
rad1<-c(rep("Company type",20), rep("Region and type of application",20),
rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2<-choicesVariable[c(1:14,1,4,5,9,10,11, 1:14,1,4,5,9,10,11, 1:7,9:14,
1:14,1,4,5,9,10,11)]
rad3<-c(rep("Restoration plots",14),rep("all semi natural grasslands",6),
rep("Restoration plots",14), rep("all semi natural grasslands",6),
rep("Restoration plots",27), rep("all semi natural grasslands",6))
rad4<-1:73
letaLista<-data.frame(rad1,rad2,rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the choice that the user made. But how do I use this information to render the plot and table without a 73-line-long if/else statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with this kind of array is limited and this might be a simple issue, but any hints would be helpful!
My dataset, which is the foundation for the plots and tables, consists of a data frame with 23 variables, both factors and numerical. The plots and tables are then created using the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar="Dist_brukcentrum",
                  groupvars="Companytype")
s_A1 <- s_A1[2:6,]
p_A1 <- ggplot(s_A1, aes(x=Companytype, y=Dist_brukcentrum)) +
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Dist_brukcentrum-se, ymax=Dist_brukcentrum+se),
                width=.2, position=position_dodge(.9)) +
  scale_y_continuous(name = "") + scale_x_discrete(name = "")
where summarySE is the following function, borrowed from Cookbook for R:
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=TRUE,
                      conf.interval=.95, .drop=TRUE) {
  # Requires the plyr package (ddply, rename)
  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function(x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm=na.rm),
                     mean = mean(xx[[col]], na.rm=na.rm),
                     sd   = sd(xx[[col]], na.rm=na.rm)
                   )
                 },
                 measurevar
  )
  # Rename the "mean" column
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  return(datac)
}
The code in its entirety is a bit too large, but I hope this clarifies what I'm trying to do.
Well, thanks to florian's comment I think I might have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I rigged up the plots (which were created as lists by ggplot) and the tables into lists:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id and select the right plot and table from those lists:
output$thePlot <- renderPlot({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
                        letaLista$Variabel == input$Variabel &
                        letaLista$rest_alla == input$DataSource)
  plotList[as.integer(plotValue[1, 4])]
})

output$theTable <- renderTable({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
                        letaLista$Variabel == input$Variabel &
                        letaLista$rest_alla == input$DataSource)
  skriva <- tableList[as.integer(plotValue[4])]
  print(skriva)
})