Loop for Data Management

For a dataset like the following (picture: https://i.stack.imgur.com/Yv7RI.png), I want to change the value of G7_A6, which is either 1 or 2, based on B1_04. For a given ID (each ID appears twice), G7_A6 takes either only 1 or only 2.
I need to use loops in Stata because I have a very large dataset and typing individual IDs is cumbersome.
replace G7_A6="2" if B1_04=="3" | B1_04=="4" | B1_04=="5" | B1_04=="6" | B1_04=="7"
replace G7_A6="1" if B1_04=="2"

You do not need a loop for this, as each relationship is row-specific. The code you are using would solve your problem but is a bit cumbersome. A cleaner version would be:
replace G7_A6 = 2 if B1_04 != 1 & B1_04 != 2
replace G7_A6 = 1 if B1_04 == 1 | B1_04 == 2
This will give you the following data:

id   name         B1_04   G7_A6
1    sam          2       1
1    margaret     1       1
9    Jim          5       2
9    Cinderella   1       1
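As a sketch, the two replace lines can also be collapsed into one with cond() and inlist(); this assumes B1_04 is numeric and never missing:
replace G7_A6 = cond(inlist(B1_04, 1, 2), 1, 2)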
You did not post a question either: you simply said you need a program to do what you already posted.

Equivalent of Excel Pivoting in Stata

I have been working with country-level survey data in Stata that I needed to reshape. I ended up exporting the .dta to a .csv and making a pivot table in Excel, but I am curious to know how to do this in Stata, as I couldn't figure it out.
Suppose we have the following data:
country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
I would like the data to be reformatted as such:
country sum_1 sum_2
A 4 4
B 3 2
First I tried a simple reshape wide command but got the error that "values of variable response not unique within country" before realizing reshape without additional steps wouldn't work anyway.
Then I tried generating new variables conditional on the value of response and trying to use reshape following that... the whole thing turned into kind of a mess so I just used Excel.
Just curious if there is a more intuitive way of doing that transformation.
If you just want a table, then just ask for one:
clear
input str1 country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
end
tabulate country response
           |       response
   country |         1          2 |     Total
-----------+----------------------+----------
         A |         4          4 |         8
         B |         3          2 |         5
-----------+----------------------+----------
     Total |         7          6 |        13
If you want the data to be changed to this, reshape is part of the answer, but you should contract first. collapse is in several ways more versatile, but your "sum" is really a count or frequency, so contract is more direct.
contract country response, freq(sum_)
reshape wide sum_, i(country) j(response)
list
+-------------------------+
| country sum_1 sum_2 |
|-------------------------|
1. | A 4 4 |
2. | B 3 2 |
+-------------------------+
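For comparison, a minimal sketch of the collapse route mentioned above (the helper variable one is introduced here purely for illustration):
gen byte one = 1
collapse (count) sum_=one, by(country response)
reshape wide sum_, i(country) j(response)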
In Stata 16 and later, help frames introduces frames as a way to work with multiple datasets in the same session.

How to remove first joined row Talend DI

How do I delete the first matching row in a file using a second one?
I use Talend DI 7.2 and I need to delete some rows in one delimited file using a second one containing the rows to delete. My first file contains multiple rows matching the second one, but for each row in my second file I need to delete only the first matching row in the first file.
For example:
File A:            File B:
Code | Amount      Code | Amount
1    | 45          1    | 45
1    | 45          3    | 70
2    | 50          3    | 70
2    | 60
3    | 70
3    | 70
3    | 70
3    | 70
At the end, I need to obtain:
File A:
Code | Amount
1    | 45
2    | 50
2    | 60
3    | 70
3    | 70
For each row in file B, only the first matching row in file A has been removed.
I tried with tMap and tFilterRow, but they match all rows, not only the first one.
Edit: file B can contain the same code-amount pair several times, and I need to remove that same number of rows from file A.
You can do this by using variables within the tMap. I created three:
v_match - returns "match" if code and amount are in lookup file B.
v_count - adds to the count if it's a repeating value; otherwise resets to 0.
v_last_row - set to the value of v_match before comparing again; this way we can compare the current row to the last row and get counts.
Then add an Expression filter to remove any first match.
This will give the desired results.
You can't delete rows from a file, so you'll have to generate a new file containing only the rows you want.
Here's a simple solution.
First, join your files using a left join between A as a main flow, and B as a lookup.
In the tMap, using an output filter, you only write to the output file the rows from A that don't match anything in B (row2.code == null) or those which have a match, but not a first match.
The trick is to use a Numeric.sequence, with the code as an id of the sequence; if the sequence returns a value other than 1, you know you've already had that line previously. If it's the first occurrence of the code, the sequence starts at 1 and returns 1, so the row is filtered out.
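As a sketch, that output filter expression could look like this (the flow names row1/row2 and the sequence label are assumptions for illustration):
// keep rows from A with no match in B; for matched rows, a per-code
// sequence counts occurrences, and || short-circuits so the counter
// only advances on matched rows
row2.code == null || Numeric.sequence("seq_" + row1.code, 1, 1) > 1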

How can I use the LAG function and return the same value if the subsequent value in a row is duplicated?

I am using the LAG function to move my values one row down.
However, I need to use the same value as the previous one if the value in the source column is duplicated:
ID | SOURCE | LAG | DESIRED OUTCOME
1  | 4      | -   | -
2  | 2      | 4   | 4
3  | 3      | 2   | 2
4  | 3      | 3   | 2
5  | 3      | 3   | 2
6  | 1      | 3   | 3
7  | 4      | 1   | 1
8  | 4      | 4   | 1
As you can see, for instance in the ID range 3-5 the source data doesn't change, and the desired outcome should be fed from the last row with a different value (so in this case ID 2).
SQL Server's version of lag supports an expression in the second argument to determine how many rows back to look. You can replace this with some sort of check to not look back, e.g.
select lagged = lag(data,iif(decider < 0,0,1)) over (order by id)
from (values(0,1,'dog')
,(1,2,'horse')
,(2,-1,'donkey')
,(3,2,'chicken')
,(4,23,'cow'))f(id,decider,data)
This returns the following list
null
dog
donkey
donkey
chicken
Because the decider value on the row with id of 2 was negative.
Well, first, lag may not be the tool for the job. This might be easier to solve with a recursive CTE. SQL and window functions work over sets. That said, our goal here is to come up with a way of describing what we want. We'd like a way to partition our data so that sequential islands of the same value are part of the same set.
One way we can do that is by using lag to help us discover if the previous row was different or not.
From there, we can take a running sum over these change events to create partitions. Once we have partitions, we can assign a row number to each element in the partition. Finally, once we have that, we can use the row number to look back that many elements.
;with d as (
select * from (values
(1,4)
,(2,2)
,(3,3)
,(4,3)
,(5,3)
,(6,1)
,(7,4)
,(8,4)
)f(id,source))
select *,lag(source,rn) over (order by Id)
from (
select *,rn=row_number() over (partition by partition_id order by id)
from (
select *, partition_id = sum(change) over (order by id)
from (
select *,change = iif(lag(source) over (order by id) != source,1,0)
from d
) source_with_change
) partitioned
) row_counted
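For reference, here is a minimal sketch of the recursive-CTE route mentioned at the start; it assumes the ids are consecutive, as in the sample data, and carries the answer forward whenever source repeats:
;with d as (
select * from (values
(1,4)
,(2,2)
,(3,3)
,(4,3)
,(5,3)
,(6,1)
,(7,4)
,(8,4)
)f(id,source))
, r as (
select id, source, cast(null as int) as outcome
from d where id = 1
union all
select d.id, d.source,
case when d.source = r.source then r.outcome else r.source end
from d join r on d.id = r.id + 1
)
select id, source, outcome from r order by id
This reproduces the desired outcome column: null, 4, 2, 2, 2, 3, 1, 1.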
As an aside, this is an absolutely cruel interview question I was asked to do once.

VB.NET: How to put a value on a checkbox (database: MS Access)

I just need a little help with my homework. I did my best to think about the logic, but it's too late. So basically, this is what I need to do:
If I check checkbox 1, it should automatically input "100" in the second column of my DataGridView, and if I check checkbox 2, it should input "50" in my 3rd column.
[ ] Fee1 (checkbox1) = $100
[ ] Fee2 (checkbox2) = $50
The output should look like this:
StudentFeeTableGridView
Name | Fee 1 | Fee 2
Jack | 0     | 50
Jill | 100   | 0
John | 100   | 0
Jose | 0     | 50
Thank you so much guys for helping! I owe you.
Set DataGridView.SelectionMode = FullRowSelect and use the CheckedChanged event. With the code below, values will be inserted into the cells of the selected row.
For checkbox 1:
If (Fee1.Checked = True) Then dataGridView1.CurrentRow.Cells("Fee1").Value = "100"
For checkbox 2:
If (Fee2.Checked = True) Then dataGridView1.CurrentRow.Cells("Fee2").Value = "50"
Note: .Cells("Fee1") and .Cells("Fee2") must be valid column names in the DataGridView.
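For context, a minimal sketch of how such a line sits inside a CheckedChanged handler (control names are taken from the snippets above; writing "0" when the box is unticked is an assumption, not part of the original answer):
Private Sub Fee1_CheckedChanged(sender As Object, e As EventArgs) Handles Fee1.CheckedChanged
    ' write the fee into the selected row when the box is ticked
    If Fee1.Checked Then
        dataGridView1.CurrentRow.Cells("Fee1").Value = "100"
    Else
        dataGridView1.CurrentRow.Cells("Fee1").Value = "0"
    End If
End Sub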

Comparisons across multiple rows in Stata (household dataset)

I'm working on a household dataset and my data looks like this:
input id id_family mother_id male
1 2 12 0
2 2 13 1
3 3 15 1
4 3 17 0
5 3 4 0
end
What I want to do is identify the mother in each family. A mother is a member of the family whose id is equal to one of the mother_id's of another family member. In the example above, for the family with id_family=3, individual 5 has mother_id=4, which makes individual 4 her mother.
I create a family size variable that tells me how many members there are per family. I also create a rank variable for each member within a family. For families of three, I then have the following piece of code that works:
bysort id_family: gen family_size=_N
bysort id_family: gen rank=_n
gen mother=.
bysort id_family: replace mother=1 if male==0 & rank==1 & family_size==3 & (id[_n]==mother_id[_n+1] | id[_n]==mother_id[_n+2])
bysort id_family: replace mother=1 if male==0 & rank==2 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n+1])
bysort id_family: replace mother=1 if male==0 & rank==3 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n-2])
What I get is:
id  id_family  mother_id  male  family_size  rank  mother
 1          2         12     0            2     1       .
 2          2         13     1            2     2       .
 3          3         15     1            3     1       .
 4          3         17     0            3     2       1
 5          3          4     0            3     3       .
However, in my real data set, I have to get the mother for families of size 4 and higher (up to 9), which makes this procedure very inefficient (in the sense that there are too many row elements to compare "manually").
How would you obtain this in a cleaner way? Would you make use of permutations to index the rows? Or would you use a for-loop?
Here's an approach using merge.
// create sample data
clear
input id id_family mother_id male
1 2 12 0
2 2 13 1
3 3 15 1
4 3 17 0
5 3 4 0
end
save families, replace
clear
// do the job
use families
drop id male
rename mother_id id
sort id_family id
duplicates drop
list, clean abbreviate(10)
save mothers, replace
use families, clear
merge 1:1 id_family id using mothers, keep(master match)
generate byte is_mother = _merge==3
list, clean abbreviate(10)
The second list yields
   id  id_family  mother_id  male           _merge  is_mother
1.  1          2         12     0  master only (1)          0
2.  2          2         13     1  master only (1)          0
3.  3          3         15     1  master only (1)          0
4.  4          3         17     0      matched (3)          1
5.  5          3          4     0  master only (1)          0
where I retained _merge only for expositional purposes.
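For completeness, here is a loop-based sketch of the kind the question asks about (mother2 is a hypothetical name; it assumes numeric ids and families of up to 9 members, so offsets 1 through 8 suffice; out-of-range subscripts return missing, which never equals a non-missing id):
use families, clear
gen byte mother2 = 0
forvalues j = 1/8 {
    bysort id_family (id): replace mother2 = 1 if male==0 & (id == mother_id[_n+`j'] | id == mother_id[_n-`j'])
}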
