Copy, paste and save as a .xls file

I need to define cell ranges to copy and paste into a new workbook named after the value in column B (i.e. Alda.xls). How can I do it in VBA?
Thanks in advance.
A B C D E F G H I
1999 ALDA 1/14/1999 12:00 1999 1 14 12 -20.2 36.1
1999 ALDA 1/14/1999 15:00 1999 1 14 15 -20.3 36.25
1999 ALDA 1/14/1999 18:00 1999 1 14 18 -20.4 36.4
1999 ALDA 1/14/1999 21:00 1999 1 14 21 -20.35 36.3
1999 ALDA 1/15/1999 0:00 1999 1 15 0 -20.3 36.2
1999 ALDA 1/15/1999 3:00 1999 1 15 3 -20.25 36.2

I can do it manually, but I have 200 of them in one worksheet. I just want to know whether I can do it in VBA to make it fast.
If I had code to define the cell ranges and the file name, I think I could do it faster.
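The underlying task is a split-by-key: collect the rows that share a name in column B and write each group to its own file. A minimal sketch of that logic in Python rather than VBA, assuming the sheet has already been read into a list of rows; the function names and the second station name (BETA) are made up for illustration, and it writes .csv files because the standard library cannot write .xls:

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def group_rows_by_name(rows, name_col=1):
    """Group rows by the value in column `name_col` (0-based, so 1 = column B)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[name_col]].append(row)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<Name>.csv and return the paths written."""
    paths = []
    for name, rows in groups.items():
        # "ALDA" becomes "Alda.csv", mirroring the Alda.xls naming in the question.
        path = Path(out_dir) / f"{name.title()}.csv"
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths


rows = [
    ["1999", "ALDA", "1/14/1999 12:00", "1999", "1", "14", "12", "-20.2", "36.1"],
    ["1999", "ALDA", "1/14/1999 15:00", "1999", "1", "14", "15", "-20.3", "36.25"],
    # Hypothetical second name, only here to show the split.
    ["1999", "BETA", "1/14/1999 12:00", "1999", "1", "14", "12", "-19.0", "35.0"],
]

groups = group_rows_by_name(rows)
with tempfile.TemporaryDirectory() as tmp:
    written = sorted(p.name for p in write_groups(groups, tmp))
print(written)
```

The same two steps (group by the name column, then save one file per group) carry over to a VBA loop over the distinct values in column B.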

Related

How to identify gaps in reported year in an unbalanced panel with repeated observations for firm-year?

I have a dataset at the firm-product-year level. I want to identify which firms have gaps in their reporting years between 1994 and 2004. Consider the example below:
clear
input id year sales product
14 1994 28.9 2
14 1994 67.9 3
14 1994 12.5 9
14 1994 451.8 34
14 1994 27.5 44
14 1994 647.6 45
14 1995 9.7 2
14 1995 33.5 3
14 1995 112.4 9
14 1995 712.2 15
14 1995 902.3 41
14 1995 67.3 45
14 1995 15.1 50
14 1996 6.5 2
14 1996 24.6 3
14 1996 1009.4 5
14 1996 77.1 9
14 1996 76.9 17
14 1996 12.4 45
14 1996 946.3 88
14 1996 15.4 92
14 1997 .7 2
14 1997 63.2 2
14 1997 91.7 3
14 1997 860.8 9
14 1997 12.4 21
14 1997 800.8 32
14 1997 33.7 45
14 1997 41 95
15 1999 .1 44
15 2000 .1 58
15 2001 .4 27
15 2001 .1 95
15 2002 .5 5
15 2002 .1 58
15 2003 .1 17
15 2004 3.5 28
15 2004 .1 39
16 2000 .8 2
16 2001 .6 2
16 2003 .2 2
16 2004 .1 2
16 2004 .1 8
16 2004 2.5 8
end
Firm 14 produced 6 products in 1994. It produced every year consecutively until 1997. Because there are no missing years in between, I keep this firm. But firm 16 reports in 2000 and 2001, and then again in 2003. I assume that the firm still operated in 2002 but did not report in the data. How can I create a dummy variable that flags firms like this?
tsfill doesn't help because I have repeated values within id-year.
In the first step, you delete the companies that do not produce any products in a year by creating a dummy variable "firm_any_production" that indicates whether a company has produced at least one product in a given year. Then the maximum of this dummy variable is calculated for each firm and the firms for which the maximum is 0 are deleted.
gen firm_any_production = sum(sales) > 0
bysort id (year): egen firm_missing_year = max(firm_any_production)
drop if firm_missing_year == 0
In step 2 you calculate whether the newly added products of a company have higher sales than the core product. This is calculated by creating a dummy variable "is_new_product", which indicates whether a product is a new product. Then the sales of these new products are calculated and compared to the sales of the core product. If the sum of the turnover of the new products is greater than the turnover of the core product, another dummy variable "greater_than_core" is created and set to 1.
bysort id year: egen core_product_sales = max(sales)
gen is_new_product = sales != core_product_sales
gen new_product_sales = sales * is_new_product
gen greater_than_core = sum(new_product_sales) > core_product_sales
Added:
The code is creating a firm_missing_year variable that takes the value of 1 if a firm doesn't report any product in the current year. The is_core_product variable indicates which product has the highest sales in a given year for each firm. The is_new_product variable takes the value of 1 if the product wasn't produced in the previous year. Finally, the higher_new_sales variable takes the value of 1 if the sum of sales of new products is greater than the sales of the core product.
use "your_data_file.dta", clear
gen firm_missing_year = 0
bysort id (year): egen last_year = max(year)
replace firm_missing_year = 1 if year > last_year[1]
gen is_core_product = 0
bysort id year: egen max_sales = max(sales)
replace is_core_product = 1 if sales == max_sales
gen is_new_product = 0
bysort id year: gen lagged_product = product[_n-1]
replace is_new_product = 1 if product != lagged_product & sales != max_sales
bysort id year: egen sum_new_sales = total(sales * is_new_product)
gen higher_new_sales = 0
replace higher_new_sales = 1 if sum_new_sales > max_sales
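For the gap question itself, repeated id-year rows stop being a problem once each firm is reduced to its set of distinct years: a firm has a gap exactly when some year between its first and last reported year is absent. A minimal sketch of that check in Python rather than Stata; the function name is made up for illustration, and the records are the (id, year) pairs from the example with duplicates trimmed for brevity:

```python
from collections import defaultdict


def firms_with_gaps(records):
    """Return {firm_id: [missing years]} for firms with holes between first and last year."""
    years_by_firm = defaultdict(set)
    for firm_id, year in records:
        # A set collapses the repeated firm-year rows (one per product).
        years_by_firm[firm_id].add(year)

    gaps = {}
    for firm_id, years in years_by_firm.items():
        expected = set(range(min(years), max(years) + 1))
        missing = sorted(expected - years)
        if missing:
            gaps[firm_id] = missing
    return gaps


records = [
    (14, 1994), (14, 1994), (14, 1995), (14, 1996), (14, 1997),
    (15, 1999), (15, 2000), (15, 2001), (15, 2001), (15, 2002),
    (15, 2003), (15, 2004),
    (16, 2000), (16, 2001), (16, 2003), (16, 2004), (16, 2004),
]

gaps = firms_with_gaps(records)
print(gaps)  # {16: [2002]}
```

A firm-level dummy is then 1 for every id that appears in the result and 0 otherwise.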

Tracking units over time if one value is smaller than another value

I want to track the same group over time. Consider the data example below. Group = 4933 first appears in 2008 and treated = 1 if previous < sale & enter == year. I want to spread this 1 to all other group = 4933 but not when year is 2010, for example, because in 2010 there is a new entry of director ID and his group after 2010 should be spread. Thus, I need treated = 1 for group = 709 in 2011 as well.
Treatment is the entering of a director whose previous value is smaller than sale of a firm. And treatment should be 1 for this director up until he leaves the company.
clear
input treated previous enter group dirid fid year sale exit
0 . 2006 2073 41610 11 2008 5.124932 2011
0 . 2007 1139 17558 11 2008 5.124932 2011
1 2.796215 2008 4933 238853 11 2008 5.124932 2011
0 . 2006 2070 41584 11 2008 5.124932 2011
0 . 2006 2075 41649 11 2008 5.124932 2011
0 2.796215 2008 4933 238853 11 2009 5.623706 2011
0 . 2007 1139 17558 11 2009 5.623706 2011
0 . 2006 2073 41610 11 2009 5.623706 2011
0 . 2006 2070 41584 11 2009 5.623706 2011
0 . 2006 2075 41649 11 2009 5.623706 2011
0 . 2009 2078 41712 11 2009 5.623706 2011
0 . 2006 2075 41649 11 2010 5.659729 2011
0 . 2006 2070 41584 11 2010 5.659729 2011
0 . 2007 1139 17558 11 2010 5.659729 2011
0 2.796215 2008 4933 238853 11 2010 5.659729 2011
1 5.123736 2010 709 9587 11 2010 5.659729 2011
0 . 2009 2078 41712 11 2010 5.659729 2011
0 . 2006 2073 41610 11 2010 5.659729 2011
0 . 2007 1139 17558 11 2011 5.513114 2011
0 . 2006 2073 41610 11 2011 5.513114 2011
0 . 2009 2078 41712 11 2011 5.513114 2011
0 2.796215 2008 4933 238853 11 2011 5.513114 2011
0 . 2006 2070 41584 11 2011 5.513114 2011
0 5.123736 2010 709 9587 11 2011 5.513114 2011
0 . 2006 2075 41649 11 2011 5.513114 2011
end
I tried this code:
sum dirid
forval i=9587 /238853{
qui replace treated0 =1 if dirid == `i' & enter <year & !missing(previous)
}
but it also replaces treated for group = 4933 in 2010. I don't need that; it should be 1 only for group = 709.

Issues Regarding SAS

I was working on a homework problem regarding using arrays and looping to create a new variable to identify the date of when the maximum blood lead value was obtained but got stuck. For context, here is the homework problem:
In 1990 a study was done on the blood lead levels of children in Boston. The following variables for twenty-five children from the study have been entered on multiple lines per subject in the file lead_sum2018.txt in a list format:
Line 1
ID Number (numeric, values 1-25)
Date of Birth (mmddyy8. format)
Day of Blood Sample 1 (numeric, initial possible range: -9 to 31)
Month of Blood Sample 1 (numeric, initial possible range: -9 to 12)
Line 2
ID Number (numeric, values 1-25)
Day of Blood Sample 2 (numeric, initial possible range: -9 to 31)
Month of Blood Sample 2 (numeric, initial possible range: -9 to 12)
Line 3
ID Number (numeric, values 1-25)
Day of Blood Sample 3 (numeric, initial possible range: -9 to 31)
Month of Blood Sample 3 (numeric, initial possible range: -9 to 12)
Line 4
ID Number (numeric, values 1-25)
Blood Lead Level Sample 1 (numeric, possible range: 0.01 – 20.00)
Blood Lead Level Sample 2 (numeric, possible range: 0.01 – 20.00)
Blood Lead Level Sample 3 (numeric, possible range: 0.01 – 20.00)
Sex (character, ‘M’ or ‘F’)
All blood samples were drawn in 1990. However, during data entry the order of blood samples was scrambled so that the first blood sample in the data file (blood sample 1) may not correspond to the first blood sample taken on a subject; it could be the first, second or third. In addition, some of the months and days of blood sampling were not written on the forms. At data entry, missing month and missing day values were each coded as -9.
The team of investigators for this project has made the following decisions regarding the missing values. Any missing days are to be set equal to 15, and any missing months are to be set equal to 6. Any analyses that are done on this data set need to follow those decisions. Be sure to implement the SAS syntax as indicated for each question. For example, use SAS arrays and loops if the item states that these must be used.
Here is the data that the HW references (it is in list format and was contained in a separate file called lead_sum2018.txt):
1 04/30/78 6 10
1 -9 7
1 14 1
1 1.62 1.35 1.47 F
2 05/19/79 27 11
2 20 -9
2 5 6
2 1.71 1.31 1.76 F
3 01/03/80 11 7
3 6 6
3 27 2
3 3.24 3.4 3.83 M
4 08/01/80 5 12
4 28 -9
4 3 4
4 3.1 3.69 3.27 M
5 12/26/80 21 5
5 3 7
5 -9 12
5 4.35 4.79 5.14 M
6 06/20/81 7 10
6 11 3
6 22 1
6 1.24 1.16 0.71 F
7 06/22/81 19 6
7 3 12
7 29 8
7 3.1 3.21 3.58 F
8 05/24/82 26 7
8 31 1
8 9 10
8 2.99 2.37 2.4 M
9 10/11/82 2 7
9 25 5
9 28 3
9 2.4 1.96 2.71 F
10 . 10 8
10 30 12
10 28 2
10 2.72 2.87 1.97 F
11 11/16/83 19 4
11 15 11
11 7 -9
11 4.8 4.5 4.96 M
12 03/02/84 17 6
12 11 2
12 17 11
12 2.38 2.6 2.88 F
13 04/19/84 2 12
13 -9 6
13 1 7
13 1.99 1.20 1.21 M
14 02/07/85 4 5
14 17 5
14 21 11
14 1.61 1.93 2.32 F
15 07/06/85 5 2
15 16 1
15 14 6
15 3.93 4 4.08 M
16 09/10/85 12 10
16 11 -9
16 23 6
16 3.29 2.88 2.97 M
17 11/05/85 12 7
17 18 1
17 11 11
17 1.31 0.98 1.04 F
18 12/07/85 16 2
18 18 4
18 -9 6
18 2.56 2.78 2.88 M
19 03/02/86 19 4
19 11 3
19 19 2
19 0.79 0.68 0.72 M
20 08/19/86 21 5
20 15 12
20 -9 4
20 0.66 1.15 1.42 F
21 02/22/87 16 12
21 17 9
21 13 4
21 2.92 3.27 3.23 M
22 10/11/87 7 6
22 1 12
22 -9 3
22 1.43 1.42 1.78 F
23 05/12/88 12 2
23 21 4
23 17 12
23 0.55 0.89 1.38 M
24 08/07/88 17 6
24 27 11
24 6 2
24 0.31 0.42 0.15 F
25 01/12/89 4 7
25 15 -9
25 23 1
25 1.69 1.58 1.53 M
A) Input the data and in the data step:
1) make sure that Date of Birth variable is recorded as a SAS date;
2) use SAS arrays and looping to create a SAS date variable for each of the three blood samples and to address the missing data in accordance to the decisions of the investigators. Hint: use a single array and do loop to recode the missing values for day and month, separately, and an array/do loop for creating the SAS date variable;
3) use a SAS function to create a variable for the highest, i.e., maximum, blood lead value for each child;
4) use SAS arrays and looping to identify the date on which this largest value was obtained and create a new variable for the date of the largest blood lead value;
5) determine the age of the child in years when the largest blood lead value was obtained (rounded to two decimal places);
6) create a new variable based on the age of the child in years when the largest lead value was obtained (call it, “agecat”) that takes on three levels: for children less than 4 years old, agecat should equal 1; for children at least 4 years old, but less than 8, agecat should equal 2; and for children at least 8 years of age, agecat should be 3.;
7) print out the variables for the date of birth, date of the largest lead level, age at blood sample for the largest blood lead level, agecat, sex, and the largest blood lead level (Only print out these requested variables). All dates should be formatted to use the mmddyy10. format on the output.
The code I used in response to this was:
libname HW3 'C:\Users\johns\Desktop\SAS';
filename HW3new 'C:\Users\johns\Desktop\SAS\lead_sum2018.txt';
data one;
infile HW3new;
informat dob mmddyy8.;
input #1 id dob dbs1 mbs1
#2 dbs2 mbs2
#3 dbs3 mbs3
#4 bls1 bls2 bls3 sex;
array dbs{3} dbs1 dbs2 dbs3;
array mbs{3} mbs1 mbs2 mbs3;
do i=1 to 3;
if dbs{i}=-9 then dbs{i}=15;
end;
do i=4 to 6;
if mbs{i}=-9 then mbs{i}=6;
end;
array date{3} mdy1 mdy2 mdy3;
do i=1 to 3;
date{i}=mdy(mbs{i}, dbs{i}, 1990);
end;
maxbls=max(of bls1-bls3);
array bls{3} bls1 bls2 bls3;
array maxdte{3} maxdte1 maxdte2 maxdte3;
do i=1 to i=3;
if bls{i}=maxbls then maxdte=i;
end;
agemax=maxdte-dob;
ageest=round(agemax/365.25,2);
if agemax=. then agecat=.;
else if agemax < 4 then agecat=1;
else if 4 <= agemax < 8 then agecat=2;
else if agemax ge 8 then agecat=3;
run;
I received this error:
22 maxbls=max(of bls1-bls3);
23 array bls{3} bls1 bls2 bls3;
24 array maxdte{3} maxdte1 maxdte2 maxdte3;
25 do i=1 to i=3;
26 if bls{i}=maxbls then maxdte=i;
ERROR: Illegal reference to the array maxdte.
27 end;
Does anyone have any tips in regard to this issue? What did I do wrong? Was I supposed to create an additional array for the date on which the maximum blood lead sample value was collected? Thanks!
**I'm stuck on #4 of Part A, but I included the other parts for context. Thanks!
**Edits: I included the data that I had to read into SAS and the file name of the file it came from
Just from looking at the code immediately prior to the error, you have a problem on this line:
26 if bls{i}=maxbls then maxdte=i;
You are getting the error because you are attempting to assign a value to the array maxdte. Arrays cannot be assigned values like that (unless you are using the deprecated do over syntax...) Instead, choose an element of the array and assign the value to the element. E.g. you could do:
26 if bls{i}=maxbls then maxdte{1}=i;
Or instead of a literal 1, you could use a variable containing the relevant array index.
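For step 4 of the homework, the idea is to walk the lead values and the sample dates with the same index and keep the date at the position where the value equals the maximum. A minimal sketch of that logic in Python rather than SAS, using children 1 and 2 from the data file; the function names are made up for illustration:

```python
from datetime import date

MISSING = -9  # code used at data entry for a missing day or month


def sample_dates(days, months, year=1990):
    """Recode missing days to 15 and missing months to 6, then build one date per sample."""
    days = [15 if d == MISSING else d for d in days]
    months = [6 if m == MISSING else m for m in months]
    return [date(year, m, d) for d, m in zip(days, months)]


def date_of_max(levels, dates):
    """Return (maximum level, date of the sample that holds it)."""
    max_level = max(levels)
    max_date = None
    for i in range(len(levels)):
        # Same index into both lists, like bls{i} and date{i} in a SAS do loop.
        if levels[i] == max_level:
            max_date = dates[i]
    return max_level, max_date


# Child 1: days 6, -9, 14; months 10, 7, 1; lead levels 1.62, 1.35, 1.47
dates1 = sample_dates([6, -9, 14], [10, 7, 1])
max1, when1 = date_of_max([1.62, 1.35, 1.47], dates1)

# Child 2: days 27, 20, 5; months 11, -9, 6; lead levels 1.71, 1.31, 1.76
dates2 = sample_dates([27, 20, 5], [11, -9, 6])
max2, when2 = date_of_max([1.71, 1.31, 1.76], dates2)

print(max1, when1)  # 1.62 1990-10-06
print(max2, when2)  # 1.76 1990-06-05
```

Note that the result is a single date value, not an array; if two samples tie for the maximum, this loop keeps the later position.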
You are not properly handling the ID field on lines #2-4:
input #1 id dob dbs1 mbs1
#2 dbs2 mbs2
#3 dbs3 mbs3
#4 bls1 bls2 bls3 sex;
For example, you need to skip field 1 on lines 2-4, or read the IDs into separate variables (or an array), perhaps to check that they are all the same.
input #1 id dob dbs1 mbs1
#2 id2 dbs2 mbs2
#3 id3 dbs3 mbs3
#4 id4 bls1 bls2 bls3 sex;
This example shows how to check that you have 4 lines with the same ID; if you do, read the rest of the variables, otherwise execute LOSTCARD. In the data below, ID 3 has a missing record:
353 data ex;
354 infile cards n=4 stopover;
355 input #1 id #2 id2 #3 id3 #4 id4 #;
356 if id eq id2 eq id3 eq id4
357 then input #1 id dob:mmddyy. dbs1 mbs1
358 #2 id2 dbs2 mbs2
359 #3 id3 dbs3 mbs3
360 #4 id4 bls1 bls2 bls3 sex :$1.;
361 else lostcard;
362 format dob mmddyy.;
363 cards;
NOTE: LOST CARD.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
372 3 01/03/80 11 7
373 3 27 2
374 3 3.24 3.4 3.83 M
375 4 08/01/80 5 12
NOTE: LOST CARD.
376 4 28 -9
NOTE: LOST CARD.
377 4 3 4
NOTE: The data set WORK.EX has 3 observations and 15 variables.
data ex;
infile cards n=4 stopover;
input #1 id #2 id2 #3 id3 #4 id4 #;
if id eq id2 eq id3 eq id4
then input #1 id dob:mmddyy. dbs1 mbs1
#2 id2 dbs2 mbs2
#3 id3 dbs3 mbs3
#4 id4 bls1 bls2 bls3 sex :$1.;
else lostcard;
format dob mmddyy.;
cards;
1 04/30/78 6 10
1 -9 7
1 14 1
1 1.62 1.35 1.47 F
2 05/19/79 27 11
2 20 -9
2 5 6
2 1.71 1.31 1.76 F
3 01/03/80 11 7
3 27 2
3 3.24 3.4 3.83 M
4 08/01/80 5 12
4 28 -9
4 3 4
4 3.1 3.69 3.27 M
;;;;
run;
proc print;
run;

Required help in building a crosstab query - PostgreSQL

I have two tables.
Table1:
Label Date CT
A 2014-01-01 19
A 2014-02-01 10
A 2014-03-01 19
A 2014-04-01 18
B 2014-01-01 20
B 2014-02-01 16
B 2014-03-01 14
B 2014-04-01 16
C 2014-01-01 13
C 2014-02-01 12
C 2014-03-01 19
C 2014-04-01 14
Table2 :
Label Date CT
D 2014-01-01 19
D 2014-02-01 10
D 2014-03-01 19
D 2014-04-01 18
E 2014-01-01 20
E 2014-02-01 16
E 2014-03-01 14
E 2014-04-01 16
F 2014-01-01 13
F 2014-02-01 12
F 2014-03-01 19
F 2014-04-01 14
Desired Output :
Label Jan'14 Feb'14 Mar'14 Apr'14 Total
A 19 10 19 18 66
B 20 16 14 16 66
C 13 12 19 14 58
D 19 10 19 18 66
E 20 16 14 16 66
F 13 12 19 14 58
I'm new to PostgreSQL.
I wanted to take the unique values of Label column from both the table.
And produce the sum total of count to their respective label.
I can combine both the tables in a straight forward method using UNION ALL.
But that'll not give me the desired output or the view like a pivot.
I did google on this but nothing could help me out.
Came across this in SO. And I'm still trying on with it.
But I actually don't have a clue whether it can be done or not.
Can someone help me in getting the desired output.
Thanks in advance!!
Try it like this (crosstab() comes from the tablefunc extension, so run CREATE EXTENSION tablefunc; first). Here id, da and no stand for your Label, Date and CT columns:
select *,("Jan '14" + "Feb '14" + "Mar '14" +"Apr '14") as total
from crosstab($$
select id,to_char(da,'Mon ''yy') as tt,no from t2
union all
select id,to_char(da,'Mon ''yy') as tt,no from "T1"
order by 1
$$,$$values ('Jan ''14'), ('Feb ''14'),('Mar ''14'),('Apr ''14') $$) as at
(id text, "Jan '14" integer,"Feb '14" integer,"Mar '14" integer,
"Apr '14" integer) order by id
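To see what the pivot does, it can help to spell out the same reshaping by hand: stack the rows of both tables, make one column per month, and add a row total. A minimal sketch in Python rather than SQL, using labels A and D from the question; the function name is made up for illustration:

```python
from collections import defaultdict
from datetime import date


def pivot_counts(rows):
    """Pivot (label, date, count) rows into one line per label with a column per month and a total."""
    months = sorted({d for _, d, _ in rows})
    table = defaultdict(dict)
    for label, d, ct in rows:
        table[label][d] = table[label].get(d, 0) + ct

    header = ["Label"] + [d.strftime("%b'%y") for d in months] + ["Total"]
    body = []
    for label in sorted(table):
        cells = [table[label].get(m, 0) for m in months]  # 0 where a month is absent
        body.append([label] + cells + [sum(cells)])
    return header, body


table1 = [("A", date(2014, 1, 1), 19), ("A", date(2014, 2, 1), 10),
          ("A", date(2014, 3, 1), 19), ("A", date(2014, 4, 1), 18)]
table2 = [("D", date(2014, 1, 1), 19), ("D", date(2014, 2, 1), 10),
          ("D", date(2014, 3, 1), 19), ("D", date(2014, 4, 1), 18)]

# Concatenating the two lists plays the role of UNION ALL.
header, body = pivot_counts(table1 + table2)
print(header)
for line in body:
    print(line)
```

The totals for A and D both come out as 66, matching the desired output in the question.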

Functions with Arrays in R

Let's say I have maximum temperature data for the last 20 years. My data frame has a column for month, day, year and MAX_C (temperature data). I want to calculate the mean (and standard deviation, and range) of the maximum temperature from July 1 of one year to June 30 of the following year (i.e. mean max daily temp from July 1, 1991 to June 30, 1992). Is there an efficient way to do this?
My approach, thus far, has been to create an array:
maxt.prev12<-tapply(maxt$MAX_C,INDEX=list(maxt$month,maxt$day,maxt$year),mean)
I put mean in as the function as tapply was not producing an array without a function after the INDEX, but mean is not actually calculating anything here. Then I was thinking about trying to take January through June from one of the matrices (i.e. 1992), and July through December from the preceding matrix (i.e. 1991), and then computing the mean. I'm not entirely sure how to do that part; however, there must be a more efficient way of performing these calculations in R.
EDIT
Here is a simple sample set of data
maxt
day month year MAX_C
1 1 1990 29
1 2 1990 28
1 3 1990 32
1 4 1990 26
1 5 1990 24
1 6 1990 32
1 7 1990 30
1 8 1990 28
1 9 1990 28
1 10 1990 24
1 11 1990 30
1 12 1990 30
1 1 1991 25
1 2 1991 26
1 3 1991 28
1 4 1991 25
1 5 1991 24
1 6 1991 32
1 7 1991 26
1 8 1991 32
1 9 1991 26
1 10 1991 26
1 11 1991 27
1 12 1991 26
1 1 1992 27
1 2 1992 25
1 3 1992 29
1 4 1992 32
1 5 1992 27
1 6 1992 27
1 7 1992 24
1 8 1992 25
1 9 1992 28
1 10 1992 26
1 11 1992 31
1 12 1992 27
I would create an "indicator year" column which was equal to the year if month in July-Dec but equal to year-1 when month in Jan-June.
EDITED month reference in light of the fact it was numeric rather than character:
> maxt$year2 <- maxt$year
> maxt[ maxt$month %in% 1:6, "year2"] <-
+ maxt[ maxt$month %in% 1:6, "year"] -1
> # rows in months 1:6 (Jan-June) now carry the previous calendar year in year2
>
> mean_by_year <- tapply(maxt$MAX_C, maxt$year2, mean, na.rm=TRUE)
> mean_by_year
1989 1990 1991 1992
28.50000 27.50000 27.50000 26.83333
If you wanted to change the labels so they reflected the non-calendar year derivation:
> names(mean_by_year) <- paste(substr(names(mean_by_year),3,4),
+ as.character( as.numeric(substr(names(mean_by_year),3,4))+1),
+ sep="_")
> mean_by_year
89_90 90_91 91_92 92_93
28.50000 27.50000 27.50000 26.83333
Although it will not be quite right at the turn of the millennium: 1999 would be labelled 99_100 and 2000 would be labelled 00_1.
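The question also asked for the standard deviation and range, which follow from the same grouping. A minimal sketch of the indicator-year idea in Python rather than R, run on the sample data from the question; the function names are made up for illustration, and the labels use full four-digit years so that none of them breaks around 2000:

```python
from collections import defaultdict
from statistics import mean, stdev


def july_june_year(month, year):
    """Start year of the July-June period that a given month/year falls in."""
    return year if month >= 7 else year - 1


def summarize(records):
    """Group (month, year, max_c) records by July-June year and summarize each group."""
    groups = defaultdict(list)
    for month, year, max_c in records:
        groups[july_june_year(month, year)].append(max_c)
    return {
        f"{start}_{start + 1}": {
            "mean": mean(values),
            "sd": stdev(values),  # sample standard deviation, needs at least 2 values
            "range": (min(values), max(values)),
        }
        for start, values in sorted(groups.items())
    }


# MAX_C for the first of each month, January to December, from the sample data.
sample = {
    1990: [29, 28, 32, 26, 24, 32, 30, 28, 28, 24, 30, 30],
    1991: [25, 26, 28, 25, 24, 32, 26, 32, 26, 26, 27, 26],
    1992: [27, 25, 29, 32, 27, 27, 24, 25, 28, 26, 31, 27],
}
records = [(month, year, value)
           for year, values in sample.items()
           for month, value in enumerate(values, start=1)]

summary = summarize(records)
for label, stats in summary.items():
    print(label, round(stats["mean"], 5), stats["range"])
```

The first and last periods (1989_1990 and 1992_1993) only contain six months each in this sample, so their statistics cover half a year.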

Resources