I currently have deviations for a number of models as shown in the data below:
year model_2015 model_2016 model_2017
2016 15 . .
2017 20 10 .
2018 30 20 30
Variable model_2015, from the model run in 2015, has deviations for 2016, 2017, and 2018; variable model_2016 has them for 2017 and 2018, and so on.
I would like to create a variable that sums the first non-missing observation of each model variable.
So for this example:
first = 15 + 10 + 30 = 55
I'm assuming that I have to make a loop, but I am stumped on how to go about it.
EDIT:
Ideally, I would also like a solution adding the second, third, and so on non-missing observations.
The following works for me:
generate first = model_2015[1] + model_2016[2] + model_2017[3]
However, here is a more general approach:
clear
input year model_2015 model_2016 model_2017
2016 15 . .
2017 20 10 .
2018 30 20 30
end
generate id = 1
tempfile myfile
save `myfile'
collapse (firstnm) model*, by(id)
egen first = rowtotal(model*)
keep id first
merge 1:m id using `myfile'
drop id _merge
order year model* first
list, abbreviate(15)
     +-----------------------------------------------------+
     | year   model_2015   model_2016   model_2017   first |
     |-----------------------------------------------------|
  1. | 2016           15            .            .      55 |
  2. | 2017           20           10            .      55 |
  3. | 2018           30           20           30      55 |
     +-----------------------------------------------------+
EDIT:
Below is an even more general solution:
clear
input year model_2015 model_2016 model_2017
2016 15 . .
2017 20 10 .
2018 30 20 30
2019 40 10 10
end
local i = 0
foreach v of varlist model* {
    local ++i
    local vals
    forvalues j = 1 / `=_N' {
        if !missing(`v'[`j']) local vals `vals' `=`v'[`j']'
    }
    local ind_`i' `: word 1 of `vals'' // change 1 to 2, 3, ... for the second, third, ... non-missing value
    local ind_all `ind_all' `ind_`i''
}
generate first = `= subinstr("`ind_all'", " ", "+", `= wordcount("`ind_all'") - 1')'
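Results, with the word number set to 1, 2, and 3 in turn (and the new variable named first, second, and third accordingly):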
list, abbreviate(15)
     +-----------------------------------------------------+
     | year   model_2015   model_2016   model_2017   first |
     |-----------------------------------------------------|
  1. | 2016           15            .            .      55 |
  2. | 2017           20           10            .      55 |
  3. | 2018           30           20           30      55 |
  4. | 2019           40           10           10      55 |
     +-----------------------------------------------------+
     +------------------------------------------------------+
     | year   model_2015   model_2016   model_2017   second |
     |------------------------------------------------------|
  1. | 2016           15            .            .       50 |
  2. | 2017           20           10            .       50 |
  3. | 2018           30           20           30       50 |
  4. | 2019           40           10           10       50 |
     +------------------------------------------------------+
     +-----------------------------------------------------+
     | year   model_2015   model_2016   model_2017   third |
     |-----------------------------------------------------|
  1. | 2016           15            .            .      40 |
  2. | 2017           20           10            .      40 |
  3. | 2018           30           20           30      40 |
  4. | 2019           40           10           10      40 |
     +-----------------------------------------------------+
Note that in this case I used a slightly modified example for better illustration.
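For readers who want to check the arithmetic outside Stata, the logic of the loop above (collect each variable's non-missing values, pick the k-th one, add them up) can be sketched in Python. This is only an illustration under stated assumptions: the function name `kth_nonmissing_sum` and the dictionary layout are hypothetical, and missing values are represented as `None`.

```python
def kth_nonmissing_sum(columns, k):
    """Sum the k-th (1-based) non-missing value of each column.

    Columns with fewer than k non-missing values contribute nothing,
    mirroring the empty word that the Stata macro picks up in that case.
    """
    total = 0
    for values in columns.values():
        present = [v for v in values if v is not None]
        if len(present) >= k:
            total += present[k - 1]
    return total


# The modified four-row example; None stands for Stata's missing value "."
columns = {
    "model_2015": [15, 20, 30, 40],
    "model_2016": [None, 10, 20, 10],
    "model_2017": [None, None, 30, 10],
}

first = kth_nonmissing_sum(columns, 1)
second = kth_nonmissing_sum(columns, 2)
third = kth_nonmissing_sum(columns, 3)
```

With the data above, the three sums agree with the first, second, and third listings.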
The code below might be the loop(s) that you are looking for:
forvalues i = 1 / `=_N' {
    generate S_`i' = 0
    forvalues j = `i' / `=_N' {
        capture replace S_`i' = S_`i' + model_`=2015+`j'-`i''[`j']
    }
}
I want to print a calendar month that is enclosed in a box made of - and |.
I wrote some code, and if I set the starting day to 2 it kind of works; otherwise it doesn't. What am I doing wrong?
for(cons=1;cons<inicio;cons++)
    printf("| ");
for(cons=2;cons<inicio;cons++)
    printf(" ");
for(cons=1;cons<=dias_mes(mes,ano);cons++){
    printf("%3d", cons);
    if((inicio+cons-1)%7 == 0)
        printf(" |\n|");
}
This is the code I have for printing the calendar, and if I set the starting day to 2 I get this result:
------------------------
| Fevereiro - 2022 |
------------------------
| D S T Q Q S S |
------------------------
| 1 2 3 4 5 6 |
| 7 8 9 10 11 12 13 |
| 14 15 16 17 18 19 20 |
| 21 22 23 24 25 26 27 |
| 28
Otherwise I just get this:
------------------------
| Fevereiro - 2022 |
------------------------
| D S T Q Q S S |
------------------------
| | | 1 2 3 4 |
| 5 6 7 8 9 10 11 |
| 12 13 14 15 16 17 18 |
| 19 20 21 22 23 24 25 |
| 26 27 28
This time I set it to 4.
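Looking at the two outputs, the first loop appears to print one `|` for every blank day, while the box needs exactly one `|` at the start of each row, and the last row is never padded and closed. A minimal sketch of that padding logic follows, written in Python rather than C for brevity. The name `render_month` and its parameters are hypothetical, and a three-character cell is assumed to match the `%3d` in the question.

```python
def render_month(start_col, n_days):
    """Return the body rows of a boxed month as strings.

    start_col: 1-based weekday column (1..7) in which day 1 falls.
    n_days:    number of days in the month.
    """
    cells = ["   "] * (start_col - 1)                  # blank cells before day 1
    cells += [f"{day:3d}" for day in range(1, n_days + 1)]
    while len(cells) % 7 != 0:                         # pad the final week
        cells.append("   ")
    rows = []
    for i in range(0, len(cells), 7):
        # exactly one opening bar and one closing bar per row
        rows.append("|" + "".join(cells[i:i + 7]) + " |")
    return rows


for line in render_month(4, 28):
    print(line)
```

Every row comes out the same width, so the right-hand border lines up whatever the starting day is.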
I am still learning Stata and am not sure how to get this to work. I need to run a regression over increasing sample sizes. I know how to get it to run for a specific sample size:
reg y x1 x2 in 1/10
but I need it to do it for sample size 10, then 11, then 12, etc. up to 1000. I tried the following:
foreach var in varlist x1 x2 {
    reg y x1 x2 in 10/_n+1
}
but that did not work. How do I get it to loop the regression increasing sample size by 1 each time?
forvalues i = 10/1000 {
    reg y x1 x2 if _n <= `i'
}
The first answer does exactly what you asked, but you then have the output from (in your example) 991 separate regressions to process. The next question is how to select what you want for further analysis. You could check out the official command rolling, which provides various machinery, or rangestat from SSC. Here's a token reproducible example with a listing of results. The dataset in question is panel data: note that options like by(company) would insist on separate regressions for each panel. If you wanted more or different results to be kept, there are related commands.
. webuse grunfeld
. rangestat (reg) invest mvalue, int(year . 0)
+------------------------------------------------------------------------------------------+
| year reg_nobs reg_r2 reg_adj~2 b_mvalue b_cons se_mvalue se_cons |
|------------------------------------------------------------------------------------------|
| 1935 10 .86526037 .84841791 .10253446 .20584135 .01430537 16.364981 |
| 1936 20 .75028331 .73641016 .0893724 7.3202446 .01215286 17.885845 |
| 1937 30 .70628424 .69579439 .08388549 11.163111 .01022308 17.600837 |
| 1938 40 .69788155 .68993107 .08333349 10.545486 .00889458 14.396864 |
| 1939 50 .70399484 .69782807 .08056278 9.3450069 .00754013 12.312899 |
|------------------------------------------------------------------------------------------|
| 1940 60 .72557587 .72084442 .0842278 7.6507759 .0068016 11.294786 |
| 1941 70 .73233666 .72840043 .08992373 7.496037 .00659263 11.034693 |
| 1942 80 .72656572 .72306015 .094125 7.7007269 .00653803 10.706012 |
| 1943 90 .73520447 .73219543 .0968378 6.7626664 .00619519 10.093819 |
| 1944 100 .74792336 .74535115 .09900035 6.0220131 .00580579 9.4585067 |
|------------------------------------------------------------------------------------------|
| 1945 110 .76426375 .762081 .1001161 5.3512756 .00535037 8.8046084 |
| 1946 120 .77316485 .77124251 .10424112 3.9977716 .00519777 8.6559782 |
| 1947 130 .76829138 .76648116 .10701191 4.703102 .0051944 8.549975 |
| 1948 140 .75348635 .75170002 .10927737 6.1833536 .00532076 8.6413437 |
| 1949 150 .75420863 .75254788 .11128353 6.3261435 .00522201 8.4080085 |
|------------------------------------------------------------------------------------------|
| 1950 160 .7520656 .75049639 .114046 5.7698694 .00520945 8.3379321 |
| 1951 170 .75387998 .75241498 .11796668 5.0385173 .00520028 8.4011668 |
| 1952 180 .74822014 .74680565 .12304588 3.702257 .00534999 8.728181 |
| 1953 190 .74683845 .74549185 .1322075 -1.296652 .00561387 9.387601 |
| 1954 200 .734334 .73299225 .14138597 -6.9762842 .00604359 10.272724 |
+------------------------------------------------------------------------------------------+
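The idea of keeping one row of results per sample size, rather than reading through hundreds of regression printouts, can be sketched outside Stata as well. The Python below is a rough illustration only: it assumes a single regressor (the question has two), uses the closed-form simple OLS formulas, and the name `expanding_ols` is hypothetical.

```python
def expanding_ols(x, y, start=10):
    """Fit y = a + b*x on the first n points for n = start..len(x).

    Returns a list of (n, intercept, slope) tuples, one per sample size.
    Assumes x is not constant within any window.
    """
    results = []
    for n in range(start, len(x) + 1):
        xs, ys = x[:n], y[:n]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((xi - mean_x) ** 2 for xi in xs)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
        slope = sxy / sxx
        results.append((n, mean_y - slope * mean_x, slope))
    return results


# Toy data on an exact line, so every window recovers the same coefficients
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi for xi in x]
table = expanding_ols(x, y, start=10)
```

Each tuple plays the role of one row in the listing above: the sample size followed by the stored coefficients.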
I'm using Vue. Let's say I have a database table with the historical price of a few kinds of fruits for the last few years, so: fruit, year and price columns.
| fruit | year | price |
|--------|------|-------|
| apple | 2018 | 52 |
| apple | 2019 | 57 |
| apple | 2020 | 56 |
| apple | 2021 | 50 |
| banana | 2018 | 25 |
| banana | 2019 | 26 |
| banana | 2021 | 28 |
| pear | 2018 | 61 |
| pear | 2019 | 65 |
| pear | 2020 | 67 |
| pear | 2021 | 64 |
Now I want to create an HTML table which has fruit names on one axis and years on the other, and the cells contain the price for the given fruit / year combination as below. Some combinations might be missing from the data.
Which features and template syntax would you use? Please do not suggest transforming the raw data: it comes straight from a database and there will be many tables like this, and I need a generic solution.
| | 2018 | 2019 | 2020 | 2021 |
|--------|------|------|------|------|
| apple | 52 | 57 | 56 | 50 |
| banana | 25 | 26 | n/a | 28 |
| pear | 61 | 65 | 67 | 64 |
I'm looking for elegant "vue-like" solutions. For now I created getRows(), getColumns() functions which collect all possible row and column values and then a getCell(col, row) function to pick up the right value from the dataset - but this might force Vue to rebuild the display more often than necessary when I edit the underlying data.
The broader question is how you work with relational data in Vue, because this is just a basic example; normally the name of the fruit would come from another base table...
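To make the getRows() / getColumns() / getCell() idea concrete, here is a language-neutral sketch in Python. It leaves the raw records untouched and only builds a lookup keyed by (row, column), so each cell read is a single dictionary access. The function name `pivot` and its parameters are hypothetical; in Vue the same derivation would presumably live in a computed property so that it is cached until the underlying data changes.

```python
def pivot(records, row_key, col_key, value_key, missing="n/a"):
    """Derive row labels, column labels, and a cell grid from flat records."""
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    # one pass over the data; later cell lookups do not rescan the records
    lookup = {(r[row_key], r[col_key]): r[value_key] for r in records}
    grid = [[lookup.get((row, col), missing) for col in cols] for row in rows]
    return rows, cols, grid


records = [
    {"fruit": "apple", "year": 2018, "price": 52},
    {"fruit": "apple", "year": 2019, "price": 57},
    {"fruit": "apple", "year": 2020, "price": 56},
    {"fruit": "apple", "year": 2021, "price": 50},
    {"fruit": "banana", "year": 2018, "price": 25},
    {"fruit": "banana", "year": 2019, "price": 26},
    {"fruit": "banana", "year": 2021, "price": 28},
    {"fruit": "pear", "year": 2018, "price": 61},
    {"fruit": "pear", "year": 2019, "price": 65},
    {"fruit": "pear", "year": 2020, "price": 67},
    {"fruit": "pear", "year": 2021, "price": 64},
]

rows, cols, grid = pivot(records, "fruit", "year", "price")
```

Because the key names are parameters, the same function covers other tables of the same shape.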
I have the following data (this is not a real table in database, it's just a group of information I need to store with each post in my database):
X = yes / true
O = no / false
Weekday | Morning | Day | Evening | Night |
---------------------------------------------
Monday | X | O | O | X |
Tuesday | X | O | O | X |
Wednesday | O | O | X | X |
Thursday | O | X | O | O |
Friday | X | X | X | X |
Saturday | O | O | X | O |
Sunday | X | X | X | O |
How should I store data like this in a database? I'm not too experienced with database design, and all the ways I could think of waste a lot of space. Normalization is not a requirement for this.
I don't need to query by this data, I just need to store it efficiently in parent object/entity.
From what you've said, I'd create a table with 4 columns (Morning, Day, Evening, Night) and a Primary Key to reference the individual datasets (like the one you've shown). Then I'd use a bitfield for each row entry. For example Monday = 1, Tuesday = 2, Wednesday = 4, Thurs = 8, Friday = 16, Saturday = 32, and Sunday = 64.
Your dataset provided (here as PK 001) could be saved in a single row as:
PK  | Morning | Day | Evening | Night |
001 | 83      | 88  | 116     | 23    |
The morning value is 83, because Monday (1) + Tues (2) + Friday (16) + Sunday (64) = 83.
You only need a datatype that can hold values up to 127 (seven bits, one per day), so a single byte is enough. The exact type depends on which database you use; a one-byte integer such as TINYINT (where available) or a BINARY(1) column would work.
You would then use the bitwise & operator to test if a day is represented by a particular value:
83 (Morning Value) & 16 (Friday) = 16 (Friday, therefore true)
88 (Day value) & 4 (Wednesday) = 0 (therefore false)
116 (Evening value) & 32 (Saturday) = 32 (Saturday, therefore true)
Alternatively, you could create a bitfield column for each Day of the Week, and the value would be Morning = 1, Day = 2, Evening = 4, and Night = 8.
Your given dataset would be represented as a single row in the database as:
PK  | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
001 | 9   | 9   | 12  | 2   | 15  | 4   | 7   |
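Both layouts can be checked with a few lines of code. The sketch below is in Python and is only an illustration: the names `encode_by_slot`, `encode_by_day`, and `is_set` are hypothetical, and the schedule is typed in as strings of X and O in Morning, Day, Evening, Night order.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
SLOTS = ["Morning", "Day", "Evening", "Night"]

# The question's table: one string per weekday, one character per time slot
SCHEDULE = {
    "Monday": "XOOX",
    "Tuesday": "XOOX",
    "Wednesday": "OOXX",
    "Thursday": "OXOO",
    "Friday": "XXXX",
    "Saturday": "OOXO",
    "Sunday": "XXXO",
}


def encode_by_slot(schedule):
    """One integer per time slot; bit d is set when DAYS[d] is marked X."""
    return {
        slot: sum(1 << d for d, day in enumerate(DAYS) if schedule[day][s] == "X")
        for s, slot in enumerate(SLOTS)
    }


def encode_by_day(schedule):
    """One integer per weekday; bit s is set when SLOTS[s] is marked X."""
    return {
        day: sum(1 << s for s in range(len(SLOTS)) if schedule[day][s] == "X")
        for day in DAYS
    }


def is_set(mask, index):
    """Test a single bit with the bitwise & operator."""
    return (mask & (1 << index)) != 0


by_slot = encode_by_slot(SCHEDULE)
by_day = encode_by_day(SCHEDULE)
friday_morning = is_set(by_slot["Morning"], DAYS.index("Friday"))
wednesday_day = is_set(by_slot["Day"], DAYS.index("Wednesday"))
```

The encoded values match the two single-row tables above, and the two bit tests match the Friday and Wednesday examples.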
You could also create a table with 28 columns, with each group of 4 columns corresponding to the 4 times of one day. Something like:
CREATE TABLE <name> ( SUNDAY_MORNING VARCHAR(1) ,
SUNDAY_DAY VARCHAR(1) ,
SUNDAY_EVENING VARCHAR(1) ,
SUNDAY_NIGHT VARCHAR(1) ,
MONDAY_MORNING VARCHAR(1) ,
MONDAY_DAY VARCHAR(1) ,
MONDAY_EVENING VARCHAR(1) ,
MONDAY_NIGHT VARCHAR(1) ,
...
) ;
Of course, KEYs may need to be defined, but it is not possible to suggest anything based on the provided info.
As for your concern about space, this structure will use much less space than you might imagine.
I have an embedded system (MCU without any OS) in which the end user should be able to define the system's level (scale: 0-100) for a year. As an example (time x day matrix):
| 1st Jan | 2nd Jan | 3rd Jan | .. | 31 Dec |
00:30 | 40 (%) | 40 | 45 | .. | 50 |
01:48 | 48 | 47 | 55 | .. | 33 |
02:26 | 64 | 64 | 60 | .. | 68 |
.. | .. | .. | .. | .. | .. |
22:15 | 79 | 82 | 89 | .. | 100 |
23:37 | 100 | 100 | 97 | .. | 100 |
What I thought is to store the data as: time [in minutes], sysLevel
so it would be something like this for the above table:
typedef struct{
    uint16_t minute; //scale: 0 - 1440 min
    uint8_t level; //scale 0 - 100 (%)
}timeLevel_t; //3 bytes of payload; sizeof may be 4 unless the struct is packed
then store each day as
timeLevel_t firstJan[24] = { .. }; //it stores level changes, the array length doesn't have to be 24
timeLevel_t secJan[17] = { .. };
timeLevel_t thirdJan[20] = { .. };
...
(I will fetch the data from a CSV file; that is probably off-topic for this question.)
In the worst case the system would have one level definition per hour, so 24 timeLevel_t entries (3 bytes each) would take 72 bytes per day, and the data for 365 days would take 26280 bytes.
Would you suggest a more memory-efficient way to store the information for a calendar year (the program will update it each year, so it would take the 29th of February into account)?
In addition, would it be better to use a two-dimensional array, with the day on one dimension and timeLevel_t on the other?
You'll have to store additional information, such as the #days/month, or assume 31 each and waste a couple of bytes.
It kind of all depends on your pain points. You could pack bytes more to have no bits wasted, at the expense of code complexity and compute time.
If storage is at an absolute premium, you could store everything in just one huge 1D array and use the extra bits for other stuff.
For instance, bit 11 could indicate "next day", bit 12 "next month" and then you can crawl the array and find any day of the year. And you'd have a whole other 4 bits for various shenanigans.
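The flag idea can be sketched as follows, in Python for brevity (on the MCU this would be plain C bit masking). The layout is an assumption made for illustration: the minute of day sits in bits 0 to 10 of the 16-bit field (0 to 1439 fits in 11 bits), bit 11 marks "next day", bit 12 marks "next month", and the level keeps its own byte. The names `pack_entry` and `unpack_entry` are hypothetical.

```python
import struct

MINUTE_MASK = 0x07FF     # bits 0-10: minute of day, 0..1439
NEXT_DAY = 1 << 11       # assumed flag: this entry starts a new day
NEXT_MONTH = 1 << 12     # assumed flag: this entry starts a new month


def pack_entry(minute, level, next_day=False, next_month=False):
    """Pack one entry into 3 bytes: little-endian 16-bit word plus level byte."""
    if not 0 <= minute < 1440:
        raise ValueError("minute must be in 0..1439")
    if not 0 <= level <= 100:
        raise ValueError("level must be in 0..100")
    word = minute
    if next_day:
        word |= NEXT_DAY
    if next_month:
        word |= NEXT_MONTH
    return struct.pack("<HB", word, level)


def unpack_entry(raw):
    """Return (minute, level, next_day, next_month) from 3 packed bytes."""
    word, level = struct.unpack("<HB", raw)
    return word & MINUTE_MASK, level, bool(word & NEXT_DAY), bool(word & NEXT_MONTH)


raw = pack_entry(30, 40, next_day=True)
entry = unpack_entry(raw)
worst_case_bytes = 24 * 365 * len(raw)
```

With the flags stored in the data, every day can have a different number of entries in one flat array, at the cost of scanning from the start to find a given date.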
Describe the issues more, please.