I wrote code that splits a data frame, data, according to a factor a and, for each level of that factor, prints an ANOVA table for the factor b.
for (i in 1:length(levels(data$a))){
levels<-levels(data$a)
assign(paste("data_", levels[i], sep = ""), subset(data, a==levels[i]))
print (levels[i])
print(anova(lm(var~b, subset(data, a==levels[i]))))
}
The result is exactly what I want, but I would like to have all the ANOVA tables pooled and returned as a single list or data frame.
Can anyone help?
Apparently this code does the trick:
result_anova<-data.frame()
for (i in 1:length(levels(data$a))){
levels<-levels(data$a)
assign(paste("data_", levels[i], sep = ""), subset(data, a==levels[i]))
result<-as.data.frame(anova(lm(var~b, subset(data, a==levels[i]))))
result_anova[i, 1]<-levels[i]
result_anova[i, 2]<-result[1, 1 ]
result_anova[i, 3]<-result[1, 2 ]
result_anova[i, 4]<-result[1, 3 ]
result_anova[i, 5]<-result[1, 4 ]
result_anova[i, 6]<-result[1, 5 ]
result_anova[i, 7]<-result[2, 1 ]
result_anova[i, 8]<-result[2, 2 ]
result_anova[i, 9]<-result[2, 3 ]
result_anova[i, 10]<-result[2, 4 ]
result_anova[i, 11]<-result[2, 5 ]
}
# name the columns once, after the loop, on the data frame that was actually filled
colnames(result_anova) <- c("genotype", "Df_fac", "Sum_Sq_fac", "Mean_Sq_fac", "F_value_fac", "Pr(>F)_fac", "Df_res", "Sum_Sq_res", "Mean_Sq_res", "F_value_res", "Pr(>F)_res")
Please upvote this answer, or let me know if this code can be improved.
I'm trying to filter JSON and produce output with jq.
The API sometimes returns an object and sometimes an array. I want to handle both cases with an if statement and return an empty string when the object/array is not available.
{
"result":
{
"entry": {
"id": "207579",
"title": "Realtek Bluetooth Mesh SDK on Linux\/Android Segmented Packet reference buffer overflow",
"summary": "A vulnerability, which was classified as critical, was found in Realtek Bluetooth Mesh SDK on Linux\/Android (the affected version unknown). This affects an unknown functionality of the component Segmented Packet Handler. There is no information about possible countermeasures known. It may be suggested to replace the affected object with an alternative product.",
"details": {
"affected": "A vulnerability, which was classified as critical, was found in Realtek Bluetooth Mesh SDK on Linux\/Android (the affected version unknown).",
"vulnerability": "The manipulation of the argument reference with an unknown input leads to a unknown weakness. CWE is classifying the issue as CWE-120. The program copies an input buffer to an output buffer without verifying that the size of the input buffer is less than the size of the output buffer, leading to a buffer overflow.",
"impact": "This is going to have an impact on confidentiality, integrity, and availability.",
"countermeasure": "There is no information about possible countermeasures known. It may be suggested to replace the affected object with an alternative product."
},
"timestamp": {
"create": "1661860801",
"change": "1661861110"
},
"changelog": [
"software_argument"
]
},
"software": {
"vendor": "Realtek",
"name": "Bluetooth Mesh SDK",
"platform": [
"Linux",
"Android"
],
"component": "Segmented Packet Handler",
"argument": "reference",
"cpe": [
"cpe:\/a:realtek:bluetooth_mesh_sdk"
],
"cpe23": [
"cpe:2.3:a:realtek:bluetooth_mesh_sdk:*:*:*:*:*:*:*:*"
]
}
}
}
I would also like to use the statement globally for the whole array output so I can export it to .csv and escape the nulls, since the software name can also contain an array or an object. Having a global if statement would simplify the syntax and suppress the error with the ? operator.
The error I received from bash:
jq -r '.result [] | [ "https://vuldb.com/?id." + .entry.id ,.software.vendor // "empty",(.software.name | if type!="array" then [.] | join (",") else . //"empty" end )?,.software.type // "empty",(.software.platform | if type!="array" then [] else . | join (",") //"empty" end )?] | @csv' > tst.csv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7452 0 7393 100 59 4892 39 0:00:01 0:00:01 --:--:-- 4935
jq: error (at <stdin>:182): Cannot iterate over null (null)
What I have tried is the following code, which I tested on https://jqplay.org/ and which has incorrect syntax:
.result [] |( if .[] == null then // "empty" else . end
| ,software.name // "empty" ,.software.platform |if type!="array" then [.] // "empty" else . | join (",") end)
Current output
[
[
"Bluetooth Mesh SDK"
],
"Linux,Android"
]
Desired outcome
[
"Bluetooth Mesh SDK",
"empty"
]
After fixing your input JSON, I think you can get the desired output by using the following JQ filter:
if (.result | type) == "array" then . else (.result |= [.]) end \
| .result[].software | [ .name, (.platform // [ "Empty" ] | join(",")) ]
Where
if (.result | type) == "array" then . else (.result |= [.]) end
Wraps .result in an array if its type isn't array.
.result[].software
Loops through the software object of each .result entry.
[ .name, (.platform // [ "Empty" ] | join(",")) ]
Creates an array with .name and .platform (which is replaced by [ "Empty" ] when it doesn't exist). The platform array is then join()'d into a string.
Outcome:
[
"Bluetooth Mesh SDK",
"Linux,Android"
]
Or
[
"Bluetooth Mesh SDK",
"Empty"
]
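The "wrap a single object in an array, then default the missing fields" idea is not specific to jq. Below is a minimal sketch of the same normalization in Python, purely as an illustration; the helper names as_text and software_rows and the "empty" placeholder are assumptions of this sketch, not part of the API or of the filter above.

```python
import json

def as_text(value, missing):
    """Turn a scalar, a list, or a missing value into one CSV-friendly string."""
    if value is None:
        return missing
    if isinstance(value, list):
        return ",".join(str(v) for v in value)
    return str(value)

def software_rows(payload, missing="empty"):
    """Return [name, platform] per entry, whether "result" is an object or an array."""
    result = payload.get("result")
    if result is None:
        return []
    # Wrap a single object in a list so both API shapes take the same path.
    entries = result if isinstance(result, list) else [result]
    rows = []
    for entry in entries:
        software = entry.get("software") or {}
        rows.append([as_text(software.get("name"), missing),
                     as_text(software.get("platform"), missing)])
    return rows

# Hypothetical, trimmed-down payloads in the two shapes the API may return.
single = json.loads('{"result": {"software": {"name": "Bluetooth Mesh SDK", "platform": ["Linux", "Android"]}}}')
many = json.loads('{"result": [{"software": {"name": "Bluetooth Mesh SDK"}}]}')

print(software_rows(single))
print(software_rows(many))
```

Normalizing the shape once at the top keeps every later field lookup free of type checks, which is the same reason the jq filter wraps .result before iterating.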
I have a database with one table that holds hundreds of records. I need to write a for loop in a Groovy script that compares the first record with the second, the second with the third, and so on. I need to compare the length changes between consecutive records and print every change that is greater than 30. Example: first record 30 m, second record 40 m, third record 100 m. It would print the second and third records.
I don't know the number of records in the table, so I don't know how to write the for loop. Any suggestions?
Each record also has an IP. The same IP can appear multiple times, and I need to compare all records within each IP.
record 1:
port_nbr | 1
pair | pairA
length | 30.00
add_date | 2020-06-16 00:01:13.237164
record 2:
port_nbr | 1
pair | pairA
length | 65.00
add_date | 2020-06-16 00:02:13.237164
record 3:
port_nbr | 2
pair | pairc
length | 65.00
add_date | 2020-06-16 00:02:13.237164
I expect the for loop to check whether the current record's port_nbr is the same as the next record's. If so, it checks whether pair is the same, and if it is, it compares whether the length changed by 30 m or more. In this case it would report that there is a 30+ m change between records 1 and 2. After reporting it, it would compare the second and third records. But they don't have the same port_nbr and pair, so I expect it to start comparing again, now over all following records whose port_nbr is 2.
There could even be 10 records with port_nbr 1 but with different pairs. I need to check the pairs as well and only then compare lengths.
My code at this moment:
import java.sql.*;
import groovy.sql.Sql
class Main{
static void main(String[] args) {
def dst_db1 = Sql.newInstance('connection.........')
dst_db1.getConnection().setAutoCommit(false)
def sql = (" select d.* from (select d.*, lead((case when length <> 'N/A' then length else length_to_fault end)::float) over (partition by port_nbr, pair order by port_nbr, pair, d.add_date) as lengthh from diags d)d limit 10")
def lastRow = [id:-1, port_nbr:-1, pair:'', lengthh:-1.0]
dst_db1.eachRow( sql ) {row ->
if( row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair){
BigDecimal lengthChange =
new BigDecimal(row.lengthh ? row.lengthh : 0 ) - new BigDecimal(lastRow.lengthh ? lastRow.lengthh :0 )
if( lengthChange > 30.0){
print "Port ${row.port_nbr}, ${row.pair} length change: $lengthChange"
println "\tbetween row ID ${lastRow.id} and ${row.id}"
}
lastRow = row
}else{
println "Key Changed"
lastRow = row
}
}
}
}
The following code will report length changes > 30 within the same port_nbr and pair.
def sql = 'Your SQL here.' // Should include "order by pair, port_nbr, date"
def lastRow = [id:-1, port_nbr:-1, pair:'', length:-1.0]
dst_db1.eachRow( sql ) { row ->
if ( row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair ) {
BigDecimal lengthChange =
new BigDecimal( row.length ) - new BigDecimal( lastRow.length )
if ( lengthChange > 30.0 ) {
print "Port ${row.port_nbr}, ${row.pair} length change: $lengthChange"
println "\tbetween row ID ${lastRow.id} and ${row.id}"
}
lastRow = row
} else {
println "Key changed"
lastRow = row
}
}
To run the above code without a database I prefixed it with this test code:
class DstDb1 {
def eachRow ( sql, closure ) {
rows.each( closure )
}
def rows = [
[id: 1, port_nbr: 1, pair: 'pairA', length: 30.00 ],
[id: 2, port_nbr: 1, pair: 'pairA', length: 65.00 ],
[id: 3, port_nbr: 1, pair: 'pairA', length: 70.00 ],
[id: 4, port_nbr: 1, pair: 'pairA', length: 75.00 ],
[id: 5, port_nbr: 1, pair: 'pairB', length: 130.00 ],
[id: 6, port_nbr: 1, pair: 'pairB', length: 165.00 ],
[id: 7, port_nbr: 1, pair: 'pairB', length: 170.00 ],
[id: 8, port_nbr: 1, pair: 'pairB', length: 175.00 ],
[id: 9, port_nbr: 2, pair: 'pairC', length: 230.00 ],
[id:10, port_nbr: 2, pair: 'pairC', length: 265.00 ],
[id:11, port_nbr: 2, pair: 'pairC', length: 270.00 ],
[id:12, port_nbr: 2, pair: 'pairC', length: 350.00 ]
]
}
DstDb1 dst_db1 = new DstDb1()
Running the test gives this result:
Key changed
Port 1, pairA length change: 35 between row ID 1 and 2
Key changed
Port 1, pairB length change: 35 between row ID 5 and 6
Key changed
Port 2, pairC length change: 35 between row ID 9 and 10
Port 2, pairC length change: 80 between row ID 11 and 12
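For comparison, the same "compare each record with the previous one inside its group" logic can be sketched in Python. This is only an illustration under assumptions: the rows are plain dicts already sorted by port_nbr, pair and date (as the SQL "order by" would guarantee), and large_changes is a hypothetical helper name.

```python
from itertools import groupby

def large_changes(rows, threshold=30.0):
    """Return (previous id, current id, change) for consecutive rows of the same
    (port_nbr, pair) group whose length grows by more than threshold."""
    found = []
    # groupby only merges adjacent rows, so the input must already be sorted by the key.
    for _, group in groupby(rows, key=lambda r: (r["port_nbr"], r["pair"])):
        group = list(group)
        # zip pairs every row with its successor inside the group.
        for prev, cur in zip(group, group[1:]):
            change = cur["length"] - prev["length"]
            if change > threshold:
                found.append((prev["id"], cur["id"], change))
    return found

rows = [
    {"id": 1, "port_nbr": 1, "pair": "pairA", "length": 30.0},
    {"id": 2, "port_nbr": 1, "pair": "pairA", "length": 65.0},
    {"id": 3, "port_nbr": 1, "pair": "pairA", "length": 70.0},
    {"id": 9, "port_nbr": 2, "pair": "pairC", "length": 230.0},
    {"id": 10, "port_nbr": 2, "pair": "pairC", "length": 265.0},
    {"id": 11, "port_nbr": 2, "pair": "pairC", "length": 270.0},
    {"id": 12, "port_nbr": 2, "pair": "pairC", "length": 350.0},
]

for prev_id, cur_id, change in large_changes(rows):
    print(f"length change {change} between row ID {prev_id} and {cur_id}")
```

Because rows from different groups are never paired, no "key changed" bookkeeping is needed. Like the Groovy version, this only reports increases; wrapping the difference in abs() would also catch decreases.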
I'm trying to export a dataset to a JSON file. With PROC JSON every row in my dataset is exported nicely.
What I want to do is to add an array into each exported object with data from a specific column.
My dataset has structure like this:
data test;
input id $ amount $ dimension $;
datalines;
1 x A
1 x B
1 x C
2 y A
2 y X
3 z C
3 z K
3 z X
;
run;
proc json out='/MYPATH/jsontest.json' pretty nosastags;
export test;
run;
And the exported JSON object looks, obviously, like this:
[
{
"id": "1",
"amount": "x",
"dimension": "A"
},
{
"id": "1",
"amount": "x",
"dimension": "B"
},
{
"id": "1",
"amount": "x",
"dimension": "C"
},
...]
The result I want:
For each id I would like to insert all of the data from the dimension column into an array so my output would look like this:
[
{
"id": "1",
"amount": "x",
"dimensions": [
"A",
"B",
"C"
]
},
{
"id": "2",
"amount": "y",
"dimensions": [
"A",
"X"
]
},
{
"id": "3",
"amount": "z",
"dimensions": [
"C",
"K",
"X"
]
}
]
I've not been able to find a scenario like this or some guidelines on how to solve my problem. I hope somebody can help.
/Crellee
There are other methods for json output, including
hand-coded emitter in DATA Step
JSON package in Proc DS2
Here is an example of a hand-coded emitter for your data and desired mapping.
data _null_;
file 'c:\temp\test.json';
put '[';
do group_counter = 1 by 1 while (not end_of_data);
if group_counter > 1 then put @2 ',';
put @2 '{';
do dimension_counter = 1 by 1 until (last.amount);
set test end=end_of_data;
by id amount;
if dimension_counter = 1 then do;
q1 = quote(trim(id));
q2 = quote(trim(amount));
put
@4 '"id":' q1 ","
/ @4 '"amount":' q2 ","
;
put @4 '"dimensions":' / @4 '[';
end;
else do;
/* trailing @ holds the line so the next value follows the comma */
put @6 ',' @;
end;
q3 = quote(trim(dimension));
put @8 q3;
end;
/* close the dimensions array, then the object */
put @4 ']';
put @2 '}';
end;
put ']';
stop;
run;
Such an emitter can be macro-ized and generalized to handle specifications of data=, by= and arrayify=. Not a path recommended for friends.
You can try concatenating / grouping the text before calling proc json.
I don't have proc json in my SAS environment, but try this step and see if it works for you:
data want;
set test (rename=(dimension=old_dimension));
Length dimension $200. ;
retain dimension ;
by id amount notsorted;
if first.amount = 1 then do; dimension=''; end;
if last.amount = 1 then do; dimension=catx(',',dimension,old_dimension); output; end;
else do; dimension=catx(',',dimension,old_dimension); end;
drop old_dimension;
run;
Output:
id=1 amount=x dimension=A,B,C
id=2 amount=y dimension=A,X
id=3 amount=z dimension=C,K,X
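The underlying transformation (collect every dimension that shares the same id and amount into one array) can be sketched outside SAS as well. The following Python illustration is an assumption-laden sketch, not a PROC JSON feature: group_dimensions is a hypothetical helper, and the rows are given as plain dicts in the order of the test dataset.

```python
import json

def group_dimensions(rows):
    """Collapse rows sharing (id, amount) into one object with a "dimensions" list."""
    grouped = {}
    for row in rows:
        key = (row["id"], row["amount"])
        # dicts keep insertion order, so groups come out in first-seen order.
        grouped.setdefault(key, []).append(row["dimension"])
    return [{"id": i, "amount": a, "dimensions": dims}
            for (i, a), dims in grouped.items()]

rows = [
    {"id": "1", "amount": "x", "dimension": "A"},
    {"id": "1", "amount": "x", "dimension": "B"},
    {"id": "1", "amount": "x", "dimension": "C"},
    {"id": "2", "amount": "y", "dimension": "A"},
    {"id": "2", "amount": "y", "dimension": "X"},
    {"id": "3", "amount": "z", "dimension": "C"},
    {"id": "3", "amount": "z", "dimension": "K"},
    {"id": "3", "amount": "z", "dimension": "X"},
]

print(json.dumps(group_dimensions(rows), indent=2))
```

Unlike the comma-joined string above, this keeps the dimensions as a real JSON array, which matches the desired output in the question.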
I'll try my best to explain the situation.
I have the following db columns:
oid - task - start - end - realstart - realend
My requirement is to have an output like the following:
oid1 - task1 - start1 - end1
oid2 - task2 - start2 - end2
where task1 is task, task2 is task + "real", start1 is start, start2 is realstart, end1 is end, end2 is realend
BUT
the first row should always be created (those start/end fields are never empty); the second row should only be created if realstart and realend both exist, which may not be the case.
Inputs are 6 arrays (one for each column), Outputs must be 4 arrays, something like this:
#input oid,task,start,end,realstart,realend
#output oid,task,start,end
I was thinking about using something like oid.each but I don't know how to add nodes after the current one. Order is important in the requirement.
For any explanation please ask, thanks!
After your comment and understanding that you don't want (or cannot) change the input/output data format, here's another solution that does what you've asked using classes to group the data and make it easier to manage:
import groovy.transform.Canonical
@Canonical
class Input {
String[] oids = [ 'oid1', 'oid2' ]
String[] tasks = [ 'task1', 'task2' ]
Integer[] starts = [ 10, 30 ]
Integer[] ends = [ 20, 42 ]
Integer[] realstarts = [ 12, null ]
Integer[] realends = [ 21, null ]
List<Object[]> getEntries() {
// ensure all entries have the same size
def entries = [ oids, tasks, starts, ends, realstarts, realends ]
assert entries.collect { it.size() }.unique().size() == 1,
'The input arrays do not all have the same size'
return entries
}
int getSize() {
oids.size() // any field would do, they have the same length
}
}
@Canonical
class Output {
List oids = [ ]
List tasks = [ ]
List starts = [ ]
List ends = [ ]
void add( oid, task, start, end, realstart, realend ) {
oids << oid; tasks << task; starts << start; ends << end
if ( realstart != null && realend != null ) {
oids << oid; tasks << task + 'real'; starts << realstart; ends << realend
}
}
}
def input = new Input()
def entries = input.entries
def output = new Output()
for ( int i = 0; i < input.size; i++ ) {
def entry = entries.collect { it[ i ] }
output.add( *entry )
}
println output
Responsibility of arranging the data is on the Input class, while the responsibility of knowing how to organize the output data is in the Output class.
Running this code prints:
Output([oid1, oid1, oid2], [task1, task1real, task2], [10, 12, 30], [20, 21, 42])
You can get the arrays (Lists, actually, but you can call toArray() on a List to get an array) from the output object with output.oids, output.tasks, output.starts and output.ends.
The @Canonical annotation just makes the class implement equals, hashCode, toString and so on...
If you don't understand something, ask in the comments.
If you need an "array" whose size you don't know from the start, you should use a List instead. In Groovy, that's very easy to use.
Here's an example:
final int OID = 0
final int TASK = 1
final int START = 2
final int END = 3
final int R_START = 4
final int R_END = 5
List<Object[]> input = [
//oid, task, start, end, realstart, realend
[ 'oid1', 'task1', 10, 20, 12, 21 ],
[ 'oid2', 'task2', 30, 42, null, null ]
]
List<List> output = [ ]
input.each { row ->
output << [ row[ OID ], row[ TASK ], row[ START ], row[ END ] ]
if ( row[ R_START ] && row[ R_END ] ) {
output << [ row[ OID ], row[ TASK ] + 'real', row[ R_START ], row[ R_END ] ]
}
}
println output
Which outputs:
[[oid1, task1, 10, 20], [oid1, task1real, 12, 21], [oid2, task2, 30, 42]]
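For readers who want the six-arrays-in, four-arrays-out shape from the question without helper classes, here is a minimal Python sketch of the same rule. The function name expand is hypothetical, and the sample values are the ones used in the Groovy examples.

```python
def expand(oids, tasks, starts, ends, realstarts, realends):
    """Emit the planned row for every task, followed by a "real" row when both
    realstart and realend are present. Returns four parallel lists."""
    out_oids, out_tasks, out_starts, out_ends = [], [], [], []
    for oid, task, start, end, rstart, rend in zip(
            oids, tasks, starts, ends, realstarts, realends):
        out_oids.append(oid)
        out_tasks.append(task)
        out_starts.append(start)
        out_ends.append(end)
        # Compare against None explicitly so that a legitimate 0 is not skipped.
        if rstart is not None and rend is not None:
            out_oids.append(oid)
            out_tasks.append(task + "real")
            out_starts.append(rstart)
            out_ends.append(rend)
    return out_oids, out_tasks, out_starts, out_ends

result = expand(
    ["oid1", "oid2"], ["task1", "task2"],
    [10, 30], [20, 42],
    [12, None], [21, None],
)
print(result)
```

Appending the "real" row immediately after its planned row is what preserves the required order, so there is no need to insert into the middle of a list afterwards.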
I am a beginner with R. My situation is I have a JSON dataset with a nested array. In the JSON file, one institution looks like this:
{
"website": "www.123.org",
"programs": [
{
"website": "www.111.com",
"contact": "Jim"
},
{
"website": "www.222.com",
"contact": "Han"
}
]
}
Each institution may have one program or several. I have more than one hundred institutions and nearly two hundred programs in the JSON. I want to add an id for each institution and an idpr for each program. In the end, I hope to get a data.frame that looks like:
id idpr website websitepr contactpr
1 1 www.123.org www.111.com Jim
1 2 www.123.org www.222.com Han
2 1 www.345.org www.aaa.com Lily
3 1 www.567.org www.bbb.com Jack
3 2 www.567.org www.ccc.com Mike
3 3 www.567.org www.ddd.com Minnie
.........
I tried to write a nested loop like this:
count<-0
for (n in json_data){
count<-count+1
id<-c(id,count)
website<-c(website,n$website)
countpr<-1
for (i in n$programs){
id<-c(id,count)
website<-c(website,n$website)
idpr<-c(idpr,countpr)
websitepr<-c(websitepr,i$website)
contactpr<-c(contactpr,i$contact)
countpr<-countpr+1
}
}
but this nested loop does not give me the result I want. Thanks for helping me!
Try this:
# sample data
json.file <- textConnection('[{"website":"www.123.org","programs":[{"website":"www.111.com","contact":"Jim"},{"website":"www.222.com","contact":"Han"}]},{"website":"www.345.org","programs":[{"website":"www.aaa.com","contact":"Lily"}]},{"website":"www.567.org","programs":[{"website":"www.bbb.com","contact":"Jack"},{"website":"www.ccc.com","contact":"Mike"},{"website":"www.ddd.com","contact":"Minnie"}]}]')
# read the data into an R nested list
library(rjson)
raw.data <- fromJSON(file = json.file)
# a function that will process one institution
process.one <- function(id, institution) {
website <- institution$website
websitepr <- sapply(institution$programs, `[[`, "website")
contactpr <- sapply(institution$programs, `[[`, "contact")
data.frame(id, idpr = seq_along(websitepr),
website, websitepr, contactpr)
}
# run the function on all institutions and put the pieces together
do.call(rbind, Map(process.one, seq_along(raw.data), raw.data))
# id idpr website websitepr contactpr
# 1 1 1 www.123.org www.111.com Jim
# 2 1 2 www.123.org www.222.com Han
# 3 2 1 www.345.org www.aaa.com Lily
# 4 3 1 www.567.org www.bbb.com Jack
# 5 3 2 www.567.org www.ccc.com Mike
# 6 3 3 www.567.org www.ddd.com Minnie
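The flattening step itself (one output row per program, with the institution's fields repeated) is language-independent. As a cross-check of the logic, here is a minimal Python sketch using only the standard json module; flatten is a hypothetical helper and the sample data is a shortened version of the one above.

```python
import json

def flatten(institutions):
    """Return one dict per program, numbered per institution (id) and per program (idpr)."""
    rows = []
    for inst_id, inst in enumerate(institutions, start=1):
        # The program counter restarts at 1 for every institution.
        for prog_id, prog in enumerate(inst.get("programs", []), start=1):
            rows.append({
                "id": inst_id,
                "idpr": prog_id,
                "website": inst["website"],
                "websitepr": prog["website"],
                "contactpr": prog["contact"],
            })
    return rows

raw = json.loads(
    '[{"website":"www.123.org","programs":['
    '{"website":"www.111.com","contact":"Jim"},'
    '{"website":"www.222.com","contact":"Han"}]},'
    '{"website":"www.345.org","programs":['
    '{"website":"www.aaa.com","contact":"Lily"}]}]'
)

for row in flatten(raw):
    print(row)
```

Note that rows are appended only inside the inner loop. The loop in the question also appends id and website in the outer loop, which is why its vectors end up with different lengths.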
You can write a class:
class website {
// declare all fields as data members
// hold the programs as objects of a class program
}
and use the Jackson API to convert it into a string, using ObjectMapper's writeValueAsString(object of website).
The jars needed are:
1. jackson-core-2.0.2.jar
2. jackson-databind-2.0.2.jar