Tracking units over time if one value is smaller than another value - loops

I want to track the same group over time. Consider the data example below. Group = 4933 first appears in 2008, and treated = 1 if previous < sale & enter == year. I want to spread this 1 to all other observations of group = 4933, but not when year is 2010, for example, because in 2010 a new director ID enters and from 2010 onward it is his group that should carry the treatment. Thus, I need treated = 1 for group = 709 in 2011 as well.
Treatment is the entry of a director whose previous value is smaller than the sale of the firm, and treated should be 1 for this director until he leaves the company.
clear
input treated previous enter group dirid fid year sale exit
0 . 2006 2073 41610 11 2008 5.124932 2011
0 . 2007 1139 17558 11 2008 5.124932 2011
1 2.796215 2008 4933 238853 11 2008 5.124932 2011
0 . 2006 2070 41584 11 2008 5.124932 2011
0 . 2006 2075 41649 11 2008 5.124932 2011
0 2.796215 2008 4933 238853 11 2009 5.623706 2011
0 . 2007 1139 17558 11 2009 5.623706 2011
0 . 2006 2073 41610 11 2009 5.623706 2011
0 . 2006 2070 41584 11 2009 5.623706 2011
0 . 2006 2075 41649 11 2009 5.623706 2011
0 . 2009 2078 41712 11 2009 5.623706 2011
0 . 2006 2075 41649 11 2010 5.659729 2011
0 . 2006 2070 41584 11 2010 5.659729 2011
0 . 2007 1139 17558 11 2010 5.659729 2011
0 2.796215 2008 4933 238853 11 2010 5.659729 2011
1 5.123736 2010 709 9587 11 2010 5.659729 2011
0 . 2009 2078 41712 11 2010 5.659729 2011
0 . 2006 2073 41610 11 2010 5.659729 2011
0 . 2007 1139 17558 11 2011 5.513114 2011
0 . 2006 2073 41610 11 2011 5.513114 2011
0 . 2009 2078 41712 11 2011 5.513114 2011
0 2.796215 2008 4933 238853 11 2011 5.513114 2011
0 . 2006 2070 41584 11 2011 5.513114 2011
0 5.123736 2010 709 9587 11 2011 5.513114 2011
0 . 2006 2075 41649 11 2011 5.513114 2011
end
I tried this code:
sum dirid
forval i=9587 /238853{
qui replace treated0 =1 if dirid == `i' & enter <year & !missing(previous)
}
but it also sets the value to 1 for group = 4933 in 2010. I don't want that; in 2010 it should be 1 only for group = 709.
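One possible reading of the rule, sketched in Python rather than Stata purely to make the logic explicit. It assumes that within a firm the treatment belongs to the most recent director whose entry qualified (previous < sale in the entry year), so a later qualifying entrant takes it over from an earlier one. That reading, the function name `spread_treatment`, and the dictionary layout are assumptions, not something the question confirms; the exit year is ignored here because it is 2011 for every row in the example.

```python
def spread_treatment(rows):
    """Return one 0/1 flag per row under the assumed takeover rule."""
    # Step 1: collect qualifying entries per firm as (enter_year, group).
    entries = {}
    for r in rows:
        qualifies = (
            r["previous"] is not None
            and r["enter"] == r["year"]
            and r["previous"] < r["sale"]
        )
        if qualifies:
            entries.setdefault(r["fid"], []).append((r["enter"], r["group"]))

    # Step 2: in each firm-year, only the latest qualifying entrant is treated.
    flags = []
    for r in rows:
        past = [e for e in entries.get(r["fid"], []) if e[0] <= r["year"]]
        latest = max(past) if past else None
        flags.append(1 if latest is not None and latest[1] == r["group"] else 0)
    return flags


def row(group, year, enter, previous, sale):
    return {"fid": 11, "group": group, "year": year,
            "enter": enter, "previous": previous, "sale": sale}


# A subset of the example: groups 4933 and 709, plus one untreated director.
rows = [
    row(4933, 2008, 2008, 2.796215, 5.124932),
    row(4933, 2009, 2008, 2.796215, 5.623706),
    row(4933, 2010, 2008, 2.796215, 5.659729),
    row(709, 2010, 2010, 5.123736, 5.659729),
    row(4933, 2011, 2008, 2.796215, 5.513114),
    row(709, 2011, 2010, 5.123736, 5.513114),
    row(1139, 2008, 2007, None, 5.124932),
]
flags = spread_treatment(rows)
print(flags)  # [1, 1, 0, 1, 0, 1, 0]
```

Under this reading group 4933 is flagged in 2008 and 2009 and group 709 in 2010 and 2011. If an earlier director should stay treated until his own exit, step 2 would need to compare against the exit year instead.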

Related

How to identify gaps in reported year in an unbalanced panel with repeated observations for firm-year?

I have a dataset at the firm-product-year level. I want to identify which firms have gaps in their reporting years between 1994 and 2004. Consider the example below:
clear
input id year sales product
14 1994 28.9 2
14 1994 67.9 3
14 1994 12.5 9
14 1994 451.8 34
14 1994 27.5 44
14 1994 647.6 45
14 1995 9.7 2
14 1995 33.5 3
14 1995 112.4 9
14 1995 712.2 15
14 1995 902.3 41
14 1995 67.3 45
14 1995 15.1 50
14 1996 6.5 2
14 1996 24.6 3
14 1996 1009.4 5
14 1996 77.1 9
14 1996 76.9 17
14 1996 12.4 45
14 1996 946.3 88
14 1996 15.4 92
14 1997 .7 2
14 1997 63.2 2
14 1997 91.7 3
14 1997 860.8 9
14 1997 12.4 21
14 1997 800.8 32
14 1997 33.7 45
14 1997 41 95
15 1999 .1 44
15 2000 .1 58
15 2001 .4 27
15 2001 .1 95
15 2002 .5 5
15 2002 .1 58
15 2003 .1 17
15 2004 3.5 28
15 2004 .1 39
16 2000 .8 2
16 2001 .6 2
16 2003 .2 2
16 2004 .1 2
16 2004 .1 8
16 2004 2.5 8
end
Firm 14 produced 6 products in 1994 and reports in every consecutive year until 1997. Because there are no missing years in between, I keep this firm. But firm 16 reports in 2000 and 2001 and then not again until 2003. I assume that the firm still operated in 2002 but does not appear in the data for that year. How can I create a dummy variable that flags such a firm?
tsfill doesn't help because I have repeated values within id-year.
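A minimal sketch of the gap check, written in Python rather than Stata only to show the logic. It assumes that a gap means a firm's distinct reporting years do not form an unbroken run between its first and last year; the function name `gap_flags` is hypothetical.

```python
def gap_flags(records):
    """Map each firm id to 1 if its reporting years have a gap, else 0."""
    years_by_firm = {}
    for firm_id, year in records:
        # A set collapses the repeated firm-year rows (one per product).
        years_by_firm.setdefault(firm_id, set()).add(year)
    return {
        firm_id: int(max(years) - min(years) + 1 > len(years))
        for firm_id, years in years_by_firm.items()
    }


# (id, year) pairs taken from the example; repeats within a year are kept.
records = [
    (14, 1994), (14, 1994), (14, 1995), (14, 1996), (14, 1997), (14, 1997),
    (15, 1999), (15, 2000), (15, 2001), (15, 2001), (15, 2002),
    (15, 2003), (15, 2004),
    (16, 2000), (16, 2001), (16, 2003), (16, 2004), (16, 2004),
]
flags = gap_flags(records)
print(flags)  # {14: 0, 15: 0, 16: 1}
```

Because the comparison uses distinct years, repeated observations within a firm-year do not matter, which is the obstacle mentioned for tsfill.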
In the first step, you delete the companies that do not produce any products in a year by creating a dummy variable "firm_any_production" that indicates whether a company has produced at least one product in a given year. Then the maximum of this dummy variable is calculated for each firm and the firms for which the maximum is 0 are deleted.
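* total sales per firm-year; the dummy is 1 if the firm sold anything that year
bysort id year: egen year_sales = total(sales)
gen firm_any_production = year_sales > 0
* maximum of the dummy within each firm
bysort id: egen firm_missing_year = max(firm_any_production)
drop if firm_missing_year == 0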
In step 2 you calculate whether the newly added products of a company have higher sales than the core product. This is calculated by creating a dummy variable "is_new_product", which indicates whether a product is a new product. Then the sales of these new products are calculated and compared to the sales of the core product. If the sum of the turnover of the new products is greater than the turnover of the core product, another dummy variable "greater_than_core" is created and set to 1.
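bysort id year: egen core_product_sales = max(sales)
gen is_new_product = sales != core_product_sales
gen new_product_sales = sales * is_new_product
* total sales of the non-core products within each firm-year
bysort id year: egen total_new_sales = total(new_product_sales)
gen greater_than_core = total_new_sales > core_product_sales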
Added:
The code is creating a firm_missing_year variable that takes the value of 1 if a firm doesn't report any product in the current year. The is_core_product variable indicates which product has the highest sales in a given year for each firm. The is_new_product variable takes the value of 1 if the product wasn't produced in the previous year. Finally, the higher_new_sales variable takes the value of 1 if the sum of sales of new products is greater than the sales of the core product.
use "your_data_file.dta", clear
gen firm_missing_year = 0
bysort id: egen last_year = max(year)
replace firm_missing_year = 1 if year > last_year[1]
gen is_core_product = 0
bysort id year: egen max_sales = max(sales)
replace is_core_product = 1 if sales == max_sales
gen is_new_product = 0
bysort id year: gen lagged_product = product[_n-1]
replace is_new_product = 1 if product != lagged_product & sales != max_sales
bysort id year: egen sum_new_sales = total(sales * is_new_product)
gen higher_new_sales = 0
replace higher_new_sales = 1 if sum_new_sales > max_sales

How to calculate mean of previous values of other firms for Director ID before he joins the firm

I need to calculate a director's previous wage before he joins a new company.
I have created a simple dataset for one director (in practice I have many values of director_id). The director with ID = 1 manages 5 firms, which he joined in different years (the variable enter). If the director joined firm number 2 in 2011, I need the average of the variable wage over all years before 2011 in the firms he was managing. For the same director = 1, I need a different mean(wage) for firm number 3, which he joined in 2012 (it will include the wages from the previous 2 companies that he managed before entering company 3 in 2012).
Below is the data. I would really appreciate your help in coding this problem.
clear
input enter year wage director_id firm_id
2006 2006 6.4790964 1 1
2006 2010 6.4783854 1 1
2006 2011 6.4067149 1 1
2006 2012 6.3716507 1 1
2006 2013 6.2248578 1 1
2006 2014 6.0631728 1 1
2011 2011 5.0127039 1 2
2011 2012 4.9616795 1 2
2011 2013 4.9483747 1 2
2011 2014 5.2612371 1 2
2012 2012 4.5389338 1 3
2012 2013 4.4322848 1 3
2012 2014 4.3223209 1 3
2013 2013 4.336947 1 4
2013 2014 4.27459 1 4
2015 2015 -.60586482 1 5
2015 2016 .085194588 1 5
end
I just need to exclude from mean(wage) all values that occur after he enters, so only the years before he enters a new company should count.
A recipe for what I think you seek is that the mean previous wage in other firms =
(SUM of previous wages in all firms MINUS sum of previous wages in this firm) / (COUNT of previous years in all firms MINUS count of previous years in this firm).
Your example is helpful but the wage variable is too irregular to allow easy eyeball checks.
Consider this sequence, where rangestat is from SSC.
clear
input enter year wage director_id firm_id
2006 2006 6.4790964 1 1
2006 2010 6.4783854 1 1
2006 2011 6.4067149 1 1
2006 2012 6.3716507 1 1
2006 2013 6.2248578 1 1
2006 2014 6.0631728 1 1
2011 2011 5.0127039 1 2
2011 2012 4.9616795 1 2
2011 2013 4.9483747 1 2
2011 2014 5.2612371 1 2
2012 2012 4.5389338 1 3
2012 2013 4.4322848 1 3
2012 2014 4.3223209 1 3
2013 2013 4.336947 1 4
2013 2014 4.27459 1 4
2015 2015 -.60586482 1 5
2015 2016 .085194588 1 5
end
sort year firm_id
replace wage = _n
rangestat (sum) SUM=wage (count) COUNT=wage, int(year . -1) by(director_id)
rangestat (sum) sum=wage (count) count=wage, int(year . -1) by(director_id firm_id)
replace sum = 0 if sum == .
replace count = 0 if count == .
gen wanted = (SUM - sum) / (COUNT - count)
list, sepby(year)
     +---------------------------------------------------------------------------------+
     | enter   year   wage   direct~d   firm_id   SUM   COUNT   sum   count     wanted |
     |---------------------------------------------------------------------------------|
  1. |  2006   2006      1          1         1     .       .     0       0          . |
     |---------------------------------------------------------------------------------|
  2. |  2006   2010      2          1         1     1       1     1       1          . |
     |---------------------------------------------------------------------------------|
  3. |  2006   2011      3          1         1     3       2     3       2          . |
  4. |  2011   2011      4          1         2     3       2     0       0        1.5 |
     |---------------------------------------------------------------------------------|
  5. |  2006   2012      5          1         1    10       4     6       3          4 |
  6. |  2011   2012      6          1         2    10       4     4       1          2 |
  7. |  2012   2012      7          1         3    10       4     0       0        2.5 |
     |---------------------------------------------------------------------------------|
  8. |  2006   2013      8          1         1    28       7    11       4   5.666667 |
  9. |  2011   2013      9          1         2    28       7    10       2        3.6 |
 10. |  2012   2013     10          1         3    28       7     7       1        3.5 |
 11. |  2013   2013     11          1         4    28       7     0       0          4 |
     |---------------------------------------------------------------------------------|
 12. |  2006   2014     12          1         1    66      11    19       5   7.833333 |
 13. |  2011   2014     13          1         2    66      11    19       3      5.875 |
 14. |  2012   2014     14          1         3    66      11    17       2   5.444445 |
 15. |  2013   2014     15          1         4    66      11    11       1        5.5 |
     |---------------------------------------------------------------------------------|
 16. |  2015   2015     16          1         5   120      15     0       0          8 |
     |---------------------------------------------------------------------------------|
 17. |  2015   2016     17          1         5   136      16    16       1          8 |
     +---------------------------------------------------------------------------------+
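The recipe can be cross-checked by hand with a brute-force version. The sketch below is in Python, not Stata, and does not use rangestat; the function name `mean_previous_other_firms` and the tuple layout are hypothetical. It rebuilds the same toy data (wage replaced by the row number after sorting on year and firm_id) and applies (SUM - sum) / (COUNT - count) directly.

```python
def mean_previous_other_firms(rows):
    """Mean wage in earlier years over the same director's other firms."""
    wanted = []
    for director, firm, year, _ in rows:
        # All earlier wages of this director, in any firm.
        prev_all = [w for d, f, y, w in rows if d == director and y < year]
        # Earlier wages of this director in the current firm only.
        prev_own = [w for d, f, y, w in rows
                    if d == director and f == firm and y < year]
        count = len(prev_all) - len(prev_own)
        total = sum(prev_all) - sum(prev_own)
        wanted.append(total / count if count > 0 else None)
    return wanted


# (year, firm_id) pairs sorted by year then firm_id, as in the listing.
year_firm = [
    (2006, 1), (2010, 1), (2011, 1), (2011, 2), (2012, 1), (2012, 2),
    (2012, 3), (2013, 1), (2013, 2), (2013, 3), (2013, 4), (2014, 1),
    (2014, 2), (2014, 3), (2014, 4), (2015, 5), (2016, 5),
]
# Rows are (director_id, firm_id, year, wage) with wage = row number.
rows = [(1, firm, year, n) for n, (year, firm) in enumerate(year_firm, start=1)]
wanted = mean_previous_other_firms(rows)
print(wanted[3], wanted[4], wanted[15])  # 1.5 4.0 8.0
```

These values agree with rows 4, 5 and 16 of the listing. For the original question, the row where year equals enter is presumably the one of interest, since it uses only the years before the director joined that firm.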

JFreeChart: How to define a DataSet that groups by year, month and day?

I have a SQL query already that gets the data I need but I'm struggling to figure out how to get that into a chart. This is sample data as result of my query:
year month day mode amount duration
2013 2 22 0 1 36001
2013 7 7 1 1 55062
2015 12 23 1 6 13
2015 12 23 4 4 11
2015 12 23 7 31 104
2015 12 23 8 2 4
2015 12 23 12 11 21
2015 12 23 13 3 8
2016 3 24 1 207 519
If I wanted to graph, let's say, amount grouped by year, month and day, how would that be done in JFreeChart?

Copy, paste and save as a .xls file

I need to define cell ranges to copy and paste into a new workbook named after the value in column B (i.e. Alda.xls). How can I do it in VBA?
Thanks in advance.
A B C D E F G H I
1999 ALDA 1/14/1999 12:00 1999 1 14 12 -20.2 36.1
1999 ALDA 1/14/1999 15:00 1999 1 14 15 -20.3 36.25
1999 ALDA 1/14/1999 18:00 1999 1 14 18 -20.4 36.4
1999 ALDA 1/14/1999 21:00 1999 1 14 21 -20.35 36.3
1999 ALDA 1/15/1999 0:00 1999 1 15 0 -20.3 36.2
1999 ALDA 1/15/1999 3:00 1999 1 15 3 -20.25 36.2
I can do it manually, but I have 200 of them in one worksheet. I just want to know whether I can do it in VBA to make it faster.
If I have a code to define cell ranges and file name I think I can do it faster.

Functions with Arrays in R

Let's say I have maximum temperature data for the last 20 years. My data frame has a column for month, day, year and MAX_C (temperature data). I want to calculate the mean (and standard deviation, and range) of maximum temperature from July 1 of one year to June 30 of the following year (e.g. mean max daily temp from July 1, 1991 to June 30, 1992). Is there an efficient way to do this?
My approach, thus far, has been to create an array:
maxt.prev12<-tapply(maxt$MAX_C,INDEX=list(maxt$month,maxt$day,maxt$year),mean)
I put mean in as the function because tapply was not producing an array without a function after the INDEX, but mean is not actually calculating anything here. Then I was thinking about taking January through June from one of the matrices (e.g. 1992) and July through December from the preceding matrix (e.g. 1991), and then computing the mean. I'm not entirely sure how to do that part; however, there must be a more efficient way of performing these calculations in R.
EDIT
Here is a simple sample set of data
maxt
day month year MAX_C
1 1 1990 29
1 2 1990 28
1 3 1990 32
1 4 1990 26
1 5 1990 24
1 6 1990 32
1 7 1990 30
1 8 1990 28
1 9 1990 28
1 10 1990 24
1 11 1990 30
1 12 1990 30
1 1 1991 25
1 2 1991 26
1 3 1991 28
1 4 1991 25
1 5 1991 24
1 6 1991 32
1 7 1991 26
1 8 1991 32
1 9 1991 26
1 10 1991 26
1 11 1991 27
1 12 1991 26
1 1 1992 27
1 2 1992 25
1 3 1992 29
1 4 1992 32
1 5 1992 27
1 6 1992 27
1 7 1992 24
1 8 1992 25
1 9 1992 28
1 10 1992 26
1 11 1992 31
1 12 1992 27
I would create an "indicator year" column which was equal to the year if month in July-Dec but equal to year-1 when month in Jan-June.
EDITED month reference in light of the fact it was numeric rather than character:
> maxt$year2 <- maxt$year
> maxt[ maxt$month %in% 1:6, "year2"] <-
+ maxt[ maxt$month %in% 1:6, "year"] -1
> # month is numeric here, so 1:6 selects January through June
>
> mean_by_year <- tapply(maxt$MAX_C, maxt$year2, mean, na.rm=TRUE)
> mean_by_year
1989 1990 1991 1992
28.50000 27.50000 27.50000 26.83333
If you wanted to change the labels so they reflected the non-calendar year derivation:
> names(mean_by_year) <- paste(substr(names(mean_by_year),3,4),
+ as.character( as.numeric(substr(names(mean_by_year),3,4))+1),
+ sep="_")
> mean_by_year
89_90 90_91 91_92 92_93
28.50000 27.50000 27.50000 26.83333
Although I don't think it will be quite right at the millennial turn.
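A small sketch of the same indicator-year idea in Python (not R), with labels built by modulo arithmetic so that a season starting in 1999 is labelled 99_00. The helper names `season_start` and `season_label` are hypothetical, and the data are the monthly sample values from the question.

```python
# One reading per month (January..December) for each calendar year.
max_c = {
    1990: [29, 28, 32, 26, 24, 32, 30, 28, 28, 24, 30, 30],
    1991: [25, 26, 28, 25, 24, 32, 26, 32, 26, 26, 27, 26],
    1992: [27, 25, 29, 32, 27, 27, 24, 25, 28, 26, 31, 27],
}


def season_start(year, month):
    """July-December count toward that year; January-June toward the year before."""
    return year if month >= 7 else year - 1


def season_label(start_year):
    """Two-digit label such as '91_92'; the modulo keeps '99_00' correct."""
    return f"{start_year % 100:02d}_{(start_year + 1) % 100:02d}"


# Group every reading under the year in which its July-June season starts.
groups = {}
for year, values in max_c.items():
    for month, value in enumerate(values, start=1):
        groups.setdefault(season_start(year, month), []).append(value)

means = {season_label(start): sum(v) / len(v)
         for start, v in sorted(groups.items())}
print(means["89_90"], means["90_91"], means["91_92"])  # 28.5 27.5 27.5
print(season_label(1999))  # 99_00
```

The means match the tapply output above. Standard deviation and range could be computed from the same `groups` dictionary.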
