I have two arrays, and I want to subtract array2 from array1 to form a sorted result array. I'm wondering if it's even possible. I've searched everywhere, but haven't found a solution that I would know how to implement.
array1
NAME DATA1 DATA2 DATA3
MATT 6 2 4
ROBERT 3 2 1
JAKE 2 2 0
PETER 3 1 2
CHARLES 3 1 2
array2
NAME DATA1 DATA2 DATA3
MATT 6 2 4
JAKE 2 2 0
ROBERT 2 2 0
CHARLES 2 0 2
result array
NAME DATA1 DATA2 DATA3
PETER 3 1 2
CHARLES 1 1 0
ROBERT 1 0 1
MATT 0 0 0
JAKE 0 0 0
try:
=ARRAYFORMULA({A1:D1; QUERY(QUERY({A2:D6; IFERROR(A9:D12*-1, A9:D12)},
"select Col1,sum(Col2),sum(Col3),sum(Col4)
where Col1 is not null
group by Col1"),
"offset 1", 0)})
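For readers who want to check the logic outside the spreadsheet, here is a minimal Python sketch of the idea the formula above relies on: stack array1 on top of a negated copy of array2, then sum per name. The function name subtract_by_name and the choice to sort by DATA1 in descending order are assumptions for illustration; the formula itself only groups by name.

```python
from collections import defaultdict

# Data copied from the question: (NAME, DATA1, DATA2, DATA3)
array1 = [
    ("MATT", 6, 2, 4),
    ("ROBERT", 3, 2, 1),
    ("JAKE", 2, 2, 0),
    ("PETER", 3, 1, 2),
    ("CHARLES", 3, 1, 2),
]
array2 = [
    ("MATT", 6, 2, 4),
    ("JAKE", 2, 2, 0),
    ("ROBERT", 2, 2, 0),
    ("CHARLES", 2, 0, 2),
]


def subtract_by_name(minuend, subtrahend):
    """Sum minuend rows and negated subtrahend rows per name (hypothetical helper)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for sign, rows in ((1, minuend), (-1, subtrahend)):
        for name, *values in rows:
            for i, value in enumerate(values):
                totals[name][i] += sign * value
    return {name: tuple(values) for name, values in totals.items()}


result = subtract_by_name(array1, array2)

# Assumption: "sorted" means DATA1 descending; ties keep first-seen order.
ordered = sorted(result.items(), key=lambda item: item[1][0], reverse=True)
for name, values in ordered:
    print(name, *values)
```

Names that appear only in array1 (PETER here) pass through unchanged, which is why no explicit lookup or join is needed.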
I have the following dataset about the choices of different car brands and their attributes. I would like to create a matrix based on each attribute of the cars.
RespNum Task Concept Make Exterior.Design Interior.design
1 100086500 1 1 3 2 3
2 100086500 1 2 1 3 2
3 100086500 1 3 4 1 1
4 100086500 1 4 0 0 0
5 100086500 2 1 1 3 2
6 100086500 2 2 5 1 3
Driving.performance Driving.attributes Comfort Practibility Safety
1 1 1 1 3 3
2 3 3 3 2 1
3 2 2 2 1 2
4 0 0 0 0 0
5 3 2 1 1 3
6 1 3 3 3 2
Quality Equipment Sustainability Economy Price Response
1 2 1 1 3 1 0
2 1 3 3 1 3 0
3 3 2 2 2 2 1
4 0 0 0 0 0 0
5 3 2 1 1 4 0
6 1 3 3 3 8 0
I am using the function:
Make = attribcoding(6,4,'Other')
The first input (6) is the number of levels, the second (4) is the column position in the dataset, and the last ('Other') is the name of the outside option. However, I get the following error message:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
I have a nested loop in Stata with four levels of foreach statements. With this loop, I am trying to create a new variable named strata that ranges from 1 to 40.
foreach x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 {
foreach r in 1 2 3 4 5 {
foreach s in 1 2 {
foreach a in 1 2 3 4 {
gen strata= `x' if race==`r' & sex==`s' & age==`a'
}
}
}
}
I get an error:
"variable strata already defined"
Even with the error, the loop does assign strata = 1, but not the rest of the strata. All other cells are missing/empty.
Example data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age sex race)
1 2 2
1 2 1
1 1 1
1 1 1
1 2 1
2 2 1
2 2 1
4 2 1
1 2 1
4 2 1
3 2 1
2 2 1
4 2 1
4 2 2
3 2 1
4 1 3
4 2 1
4 2 1
2 1 2
4 2 1
2 2 1
3 2 1
3 2 1
1 2 3
4 2 1
1 2 5
4 2 1
4 2 1
4 2 2
4 2 1
2 2 1
4 1 1
3 2 1
1 2 1
2 2 1
4 2 1
1 2 2
2 2 3
1 1 3
4 2 1
2 2 3
1 2 1
1 1 1
2 2 3
1 2 1
1 1 3
1 2 1
2 2 1
3 2 1
1 2 1
4 2 1
1 2 2
1 2 1
2 2 1
4 2 1
4 2 1
1 2 1
1 2 1
4 2 1
2 2 1
4 2 1
1 2 1
1 1 3
2 2 1
1 1 1
4 1 1
3 2 1
2 2 1
1 2 1
1 1 1
2 2 3
4 2 2
2 2 1
2 2 1
3 2 1
2 2 2
3 2 1
2 1 1
1 1 1
3 2 1
1 2 3
4 2 1
4 2 1
2 2 1
1 2 1
1 1 1
3 2 1
4 2 1
2 2 3
1 2 3
4 2 1
3 2 1
2 2 1
4 2 1
3 2 1
2 1 1
1 2 1
2 2 1
2 2 3
1 1 1
end
label values sex sex
label def sex 1 "male (1)", modify
label def sex 2 "female (2)", modify
label values race race
label def race 1 "non-Hispanic white (1)", modify
label def race 2 "black (2)", modify
label def race 3 "AAPI/other (3)", modify
label def race 5 "Hispanic (5)", modify
generate is for generating new variables. The second time your code reaches a generate statement, the code fails for the reason given.
One answer is that you need to generate your variable outside the loops and then replace inside.
For other reasons your code can be rewritten in stages.
First, integer sequences can be more easily and efficiently specified with forvalues, which can be abbreviated: I tend to write forval.
gen strata = .
forval x = 1/40 {
forval r = 1/5 {
forval s = 1/2 {
forval a = 1/4 {
replace strata = `x' if race==`r' & sex==`s' & age==`a'
}
}
}
}
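Second, the code is flawed anyway. The outermost loop over x runs through every combination of race, sex and age on each pass, so the final pass overwrites every cell and everything ends up as 40!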
Third, you can do allocations much more directly, say by
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
This is a self-contained reproducible demonstration:
clear
set obs 5
gen race = _n
expand 2
bysort race : gen sex = _n
expand 4
bysort race sex : gen age = _n
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
isid strata
Clearly you can and should vary the recipe for a different preferred scheme.
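The two points above (the nested loops leave everything at 40, while the direct formula gives each cell its own stratum) can be checked with a small Python sketch. The function names strata_nested_loops and strata_direct are made up for illustration; the loop mirrors the logic of the Stata code rather than translating it line by line.

```python
from itertools import product

races, sexes, ages = range(1, 6), range(1, 3), range(1, 5)


def strata_nested_loops(race, sex, age):
    """Mimic the four nested loops: x is assigned whenever the cell matches."""
    strata = None
    for x in range(1, 41):
        for r, s, a in product(races, sexes, ages):
            if (race, sex, age) == (r, s, a):
                strata = x  # overwritten on every pass of the outer loop
    return strata


def strata_direct(race, sex, age):
    """Direct allocation: 8 * (race - 1) + 4 * (sex - 1) + age."""
    return 8 * (race - 1) + 4 * (sex - 1) + age


cells = list(product(races, sexes, ages))
looped = {strata_nested_loops(*cell) for cell in cells}
direct = sorted(strata_direct(*cell) for cell in cells)

print(looped)                           # every cell ends up with the same value
print(direct == list(range(1, 41)))     # one distinct stratum per cell
```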
I'm looking for the fastest way to get unique values in a matrix with MATLAB. I have a matrix like this:
1 2
1 2
1 3
1 5
1 23
2 1
3 1
3 2
3 2
3 2
4 17
4 3
4 17
and need to get something like this:
1 2
1 3
1 5
1 23
2 1
3 1
3 2
4 3
4 17
Actually, I need the unique combinations of column values, i.e. the unique rows.
Have a look at MATLAB's unique() function with the 'rows' argument.
C = unique(A,'rows')
https://de.mathworks.com/help/matlab/ref/unique.html
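As a rough cross-check of what unique(A,'rows') returns for this input, here is a minimal Python sketch that treats each row as a tuple, removes duplicates, and sorts the rows. It is only an illustration of the expected result, not MATLAB code, and the variable names A and C are reused from the answer for readability.

```python
# Rows copied from the question.
A = [
    (1, 2), (1, 2), (1, 3), (1, 5), (1, 23),
    (2, 1), (3, 1), (3, 2), (3, 2), (3, 2),
    (4, 17), (4, 3), (4, 17),
]

# A set drops duplicate rows; sorting compares the first column, then the second.
C = sorted(set(A))

for row in C:
    print(*row)
```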
train.csv:
01kcPWA9K2BOxQeS5Rju 1
04EjIdbPV5e1XroFOpiN 1
05EeG39MTRrI6VY21DPd 1
05rJTUWYAKNegBk2wE8X 1
0AnoOZDNbPXIr2MRBSCJ 1
0AwWs42SUQ19mI7eDcTC 1
0cH8YeO15ZywEhPrJvmj 1
0DNVFKwYlcjO7bTfJ5p1 1
0DqUX5rkg3IbMY6BLGCE 1
0eaNKwluUmkYdIvZ923c 1
0fHVZKeTE6iRb1PIQ4au 1
0G4hwobLuAzvl1PWYfmd 1
test.csv:
01IsoiSMh5gxyDYTl4CB
01SuzwMJEIXsK7A8dQbl
01azqd4InC7m9JpocGv5
01jsnpXSAlgw6aPeDxrU
01kcPWA9K2BOxQeS5Rju
02IOCvYEy8mjiuAQHax3
02JqQ7H3yEoD8viYWlmS
02K5GMYITj7bBoAisEmD
02MRILoE6rNhmt7FUi45
02mlBLHZTDFXGa7Nt6cr
02zcUmKV16Lya5xqnPGB
03nJaQV6K2ObICUmyWoR
04BfoQRA6XEshiNuI7pF
04EjIdbPV5e1XroFOpiN
These are the kinds of rows I have. I want to compare each row of test.csv with the rows of train.csv and, wherever there is a match, save the id against that row. The output should look like this:
output.csv:
01kcPWA9K2BOxQeS5Rju 2
04EjIdbPV5e1XroFOpiN 2
05EeG39MTRrI6VY21DPd 4
05rJTUWYAKNegBk2wE8X 1
0AnoOZDNbPXIr2MRBSCJ 1
0AwWs42SUQ19mI7eDcTC 5
0cH8YeO15ZywEhPrJvmj 5
0DNVFKwYlcjO7bTfJ5p1 1
0DqUX5rkg3IbMY6BLGCE 3
0eaNKwluUmkYdIvZ923c 1
0fHVZKeTE6iRb1PIQ4au 1
0G4hwobLuAzvl1PWYfmd 2
Kindly help me
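One possible starting point, sketched in Python under several assumptions: the two columns in train.csv are separated by whitespace, the first column is the key to match on, and the second column is the value to carry over. The helper names load_labels and match_rows are hypothetical, and the sample rows are a small subset of the data above held in memory rather than read from disk. The values shown in output.csv differ from those in train.csv, so this sketch covers only the matching step.

```python
import io

# A few rows from the question, kept in memory so the sketch is self-contained.
train_text = """\
01kcPWA9K2BOxQeS5Rju 1
04EjIdbPV5e1XroFOpiN 1
05EeG39MTRrI6VY21DPd 1
"""
test_text = """\
01IsoiSMh5gxyDYTl4CB
01kcPWA9K2BOxQeS5Rju
04EjIdbPV5e1XroFOpiN
"""


def load_labels(handle):
    """Map first column to second column (assumes whitespace-separated rows)."""
    labels = {}
    for line in handle:
        parts = line.split()
        if len(parts) == 2:
            labels[parts[0]] = parts[1]
    return labels


def match_rows(test_handle, labels):
    """Keep the test rows whose key also occurs in train, with the train value."""
    matched = []
    for line in test_handle:
        key = line.strip()
        if key in labels:
            matched.append((key, labels[key]))
    return matched


labels = load_labels(io.StringIO(train_text))
matched = match_rows(io.StringIO(test_text), labels)
for key, value in matched:
    print(key, value)
```

With real files, the io.StringIO objects would be replaced by open("train.csv") and open("test.csv"). The dictionary lookup keeps the comparison at one pass over each file.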
I have the following data
id pair_id id_in id_out date
1 1 2 3 1/1/2010
2 1 2 3 1/2/2010
3 1 3 2 1/3/2010
4 1 3 2 1/5/2010
5 1 3 2 1/7/2010
6 2 2 1 1/2/2010
7 3 1 3 1/5/2010
8 2 1 2 1/7/2010
At any given row, I want to know the inflow/outflow differential between the unique pair id_in and id_out, from the id_in perspective.
For example, for id_in == 2 and id_out == 3 it would look like the following (from the perspective of id_in == 2):
id pair_id id_in id_out date inflow_outflow
1 1 2 3 1/1/2010 1
2 1 2 3 1/2/2010 2
3 1 3 2 1/3/2010 1
4 1 3 2 1/5/2010 0
5 1 3 2 1/7/2010 -1
Explanation: id_in == 2 received first, so they get +1; then they received again, so +2. Then they gave out, which reduces the total by 1, bringing it to 1 at that point, and so on.
This is what I have tried
sort pair_id id_in date
gen count = 0
qui forval i = 2/`=_N' {
local I = `i' - 1
count if id_in == id_out[`i'] in 1/`I'
replace count = r(N) in `i'
}
I don't follow all the logic here and in particular presenting transactions from the point of view of one member seems quite arbitrary. But a broad impression from loosely similar problems is that you should not be thinking about loops here. It should suffice to use by: and cumulative sums. There is an attempt at some systematic discussion of how to handle dyads at http://www.stata-journal.com/sjpdf.html?articlenum=dm0043 but it is only a beginning.
Please note that presenting dates according to some display format is a small pain as they need to be reverse engineered. dataex from SSC can be used to create examples that are easy to copy and paste.
This code may suggest some technique:
clear
input id pair_id id_in id_out str8 sdate
1 1 2 3 "1/1/2010"
2 1 2 3 "1/2/2010"
3 1 3 2 "1/3/2010"
4 1 3 2 "1/5/2010"
5 1 3 2 "1/7/2010"
6 2 2 1 "1/2/2010"
7 3 1 3 "1/5/2010"
8 2 1 2 "1/7/2010"
end
gen date = daily(sdate, "MDY")
format date %td
assert id_in != id_out
gen pair1 = cond(id_in < id_out, id_in, id_out)
gen pair2 = cond(id_in < id_out, id_out, id_in)
bysort pair_id (date): gen sum1 = sum(id_in == pair1) - sum(id_out == pair1)
bysort pair_id (date): gen sum2 = sum(id_in == pair2) - sum(id_out == pair2)
list date id_* pair? sum?, sepby(pair_id)
     +----------------------------------------------------------+
     |      date   id_in   id_out   pair1   pair2   sum1   sum2 |
     |----------------------------------------------------------|
  1. | 01jan2010       2        3       2       3      1     -1 |
  2. | 02jan2010       2        3       2       3      2     -2 |
  3. | 03jan2010       3        2       2       3      1     -1 |
  4. | 05jan2010       3        2       2       3      0      0 |
  5. | 07jan2010       3        2       2       3     -1      1 |
     |----------------------------------------------------------|
  6. | 02jan2010       2        1       1       2     -1      1 |
  7. | 07jan2010       1        2       1       2      0      0 |
     |----------------------------------------------------------|
  8. | 05jan2010       1        3       1       3      1     -1 |
     +----------------------------------------------------------+
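The cumulative-sum idea in the Stata code above can also be sketched in Python, for anyone who wants to trace the arithmetic by hand. This is an illustration only: the function name running_differential is made up, dates are rewritten as ISO strings so they sort correctly as text, and the running total is kept from the point of view of the lower-numbered member of each pair (pair1 in the Stata code).

```python
from itertools import groupby

# (id, pair_id, id_in, id_out, date as ISO string)
rows = [
    (1, 1, 2, 3, "2010-01-01"),
    (2, 1, 2, 3, "2010-01-02"),
    (3, 1, 3, 2, "2010-01-03"),
    (4, 1, 3, 2, "2010-01-05"),
    (5, 1, 3, 2, "2010-01-07"),
    (6, 2, 2, 1, "2010-01-02"),
    (7, 3, 1, 3, "2010-01-05"),
    (8, 2, 1, 2, "2010-01-07"),
]


def running_differential(rows):
    """Running (received - given) count for the lower-numbered member of each pair."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[4]))  # by pair_id, then date
    for _, group in groupby(ordered, key=lambda r: r[1]):
        total = 0
        for row_id, _, id_in, id_out, _ in group:
            pair1 = min(id_in, id_out)
            # Since id_in != id_out, pair1 either receives (+1) or gives (-1).
            total += 1 if id_in == pair1 else -1
            result[row_id] = total
    return result


sum1 = running_differential(rows)
for row_id in sorted(sum1):
    print(row_id, sum1[row_id])
```

The differential for the other member of the pair is simply the negative of this value, which is what sum2 shows in the listing.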
A specific pair (as defined by pair_id) always consists of two entities that can be ordered in one of two ways: for example, entity 5 with entity 8, and entity 8 with entity 5. If one is receiving, the other is necessarily giving out.
Two slightly different ways of approaching the problem can be found below.
clear all
set more off
*----- example data -----
input id pair_id id_in id_out str8 sdate
1 1 2 3 "1/1/2010"
2 1 2 3 "1/2/2010"
3 1 3 2 "1/3/2010"
4 1 3 2 "1/5/2010"
5 1 3 2 "1/7/2010"
6 2 2 1 "1/2/2010"
7 3 1 3 "1/5/2010"
8 2 1 2 "1/7/2010"
end
gen date = daily(sdate, "MDY")
format date %td
drop sdate
sort pair_id date id
list, sepby(pair_id)
*---- what you want -----
// approach 1
bysort pair_id (date id) : gen sum1 = sum(cond(id_in == id_in[1], 1, -1))
gen sum2 = -1 * sum1
// approach 2
bysort pair_id (id_in date id) : gen temp = cond(id_in == id_in[1], 1, -1)
bysort pair_id (date id) : gen sum100 = sum(temp)
gen sum200 = -1 * sum100
// list
drop temp
sort pair_id date
list, sepby(pair_id)
The first approach involves creating a variable that holds the differential for the entity that first receives according to the date variable. sum1 does just that. Variable sum2 holds the differential for the other entity.
The second approach creates a variable that holds the differential for the entity that has the smallest identifying number. I've named it sum100. Variable sum200 holds the information for the other entity.
Note that I added id to the sorting list in case pair_id date does not uniquely identify observations.
The second approach is equivalent to the code provided by @NickCox, or so I believe.
The results:
. list, sepby(pair_id)
     +---------------------------------------------------------------------------+
     | id   pair_id   id_in   id_out        date   sum1   sum2   sum100   sum200 |
     |---------------------------------------------------------------------------|
  1. |  1         1       2        3   01jan2010      1     -1        1       -1 |
  2. |  2         1       2        3   02jan2010      2     -2        2       -2 |
  3. |  3         1       3        2   03jan2010      1     -1        1       -1 |
  4. |  4         1       3        2   05jan2010      0      0        0        0 |
  5. |  5         1       3        2   07jan2010     -1      1       -1        1 |
     |---------------------------------------------------------------------------|
  6. |  6         2       2        1   02jan2010      1     -1       -1        1 |
  7. |  8         2       1        2   07jan2010      0      0        0        0 |
     |---------------------------------------------------------------------------|
  8. |  7         3       1        3   05jan2010      1     -1        1       -1 |
     +---------------------------------------------------------------------------+
Check them carefully, as the difference between the two approaches is subtle, at least initially.
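To make that difference easier to see, here is a small Python sketch that computes both running totals and reports where they disagree. It is an illustration under assumptions: the helper names differentials, first_receiver and smallest_receiver are invented, dates are written as ISO strings so they sort as text, and the second reference is taken as the smallest id_in within the pair (as in the Stata code), which equals the smallest entity number only if that entity receives at least once.

```python
from itertools import groupby

# (id, pair_id, id_in, id_out, date as ISO string)
rows = [
    (1, 1, 2, 3, "2010-01-01"),
    (2, 1, 2, 3, "2010-01-02"),
    (3, 1, 3, 2, "2010-01-03"),
    (4, 1, 3, 2, "2010-01-05"),
    (5, 1, 3, 2, "2010-01-07"),
    (6, 2, 2, 1, "2010-01-02"),
    (7, 3, 1, 3, "2010-01-05"),
    (8, 2, 1, 2, "2010-01-07"),
]


def differentials(rows, pick_reference):
    """Running +1/-1 total per pair, from the reference entity's point of view."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[4], r[0]))  # pair_id, date, id
    for _, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        reference = pick_reference(group)
        total = 0
        for row in group:
            total += 1 if row[2] == reference else -1
            result[row[0]] = total
    return result


def first_receiver(group):
    """Approach 1: the entity that receives first by date."""
    return group[0][2]


def smallest_receiver(group):
    """Approach 2: the smallest id_in that occurs within the pair."""
    return min(row[2] for row in group)


sum1 = differentials(rows, first_receiver)
sum100 = differentials(rows, smallest_receiver)
differing_ids = sorted(i for i in sum1 if sum1[i] != sum100[i])
print(differing_ids)
```

In this example data the two approaches disagree only within pair_id 2, where the first receiver (entity 2) is not the lower-numbered entity.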