How can I compare two CSV files?

train.csv:
01kcPWA9K2BOxQeS5Rju 1
04EjIdbPV5e1XroFOpiN 1
05EeG39MTRrI6VY21DPd 1
05rJTUWYAKNegBk2wE8X 1
0AnoOZDNbPXIr2MRBSCJ 1
0AwWs42SUQ19mI7eDcTC 1
0cH8YeO15ZywEhPrJvmj 1
0DNVFKwYlcjO7bTfJ5p1 1
0DqUX5rkg3IbMY6BLGCE 1
0eaNKwluUmkYdIvZ923c 1
0fHVZKeTE6iRb1PIQ4au 1
0G4hwobLuAzvl1PWYfmd 1
test.csv:
01IsoiSMh5gxyDYTl4CB
01SuzwMJEIXsK7A8dQbl
01azqd4InC7m9JpocGv5
01jsnpXSAlgw6aPeDxrU
01kcPWA9K2BOxQeS5Rju
02IOCvYEy8mjiuAQHax3
02JqQ7H3yEoD8viYWlmS
02K5GMYITj7bBoAisEmD
02MRILoE6rNhmt7FUi45
02mlBLHZTDFXGa7Nt6cr
02zcUmKV16Lya5xqnPGB
03nJaQV6K2ObICUmyWoR
04BfoQRA6XEshiNuI7pF
04EjIdbPV5e1XroFOpiN
The files contain rows like these. I want to compare each row of test.csv with the rows of train.csv and find the matches. Where a row matches, I want to save the id against that row. The output should look like this:
output.csv:
01kcPWA9K2BOxQeS5Rju 2
04EjIdbPV5e1XroFOpiN 2
05EeG39MTRrI6VY21DPd 4
05rJTUWYAKNegBk2wE8X 1
0AnoOZDNbPXIr2MRBSCJ 1
0AwWs42SUQ19mI7eDcTC 5
0cH8YeO15ZywEhPrJvmj 5
0DNVFKwYlcjO7bTfJ5p1 1
0DqUX5rkg3IbMY6BLGCE 3
0eaNKwluUmkYdIvZ923c 1
0fHVZKeTE6iRb1PIQ4au 1
0G4hwobLuAzvl1PWYfmd 2
Kindly help me
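The question does not say where the numbers in output.csv come from, so the following Python sketch covers only the matching step, under one possible reading: keep the train.csv rows whose id also appears in test.csv. The function name `match_ids` is hypothetical, and the rows are assumed to be whitespace-separated as in the samples above.

```python
def match_ids(train_lines, test_lines):
    """Return (id, label) pairs from train whose id also appears in test."""
    # A set gives constant-time membership checks for the test ids.
    test_ids = {line.split()[0] for line in test_lines if line.strip()}
    matches = []
    for line in train_lines:
        parts = line.split()
        # Skip blank or malformed rows that lack an id and a label.
        if len(parts) >= 2 and parts[0] in test_ids:
            matches.append((parts[0], parts[1]))
    return matches


# Sample rows copied from the question (assumed whitespace-separated).
train = [
    "01kcPWA9K2BOxQeS5Rju 1",
    "04EjIdbPV5e1XroFOpiN 1",
    "05EeG39MTRrI6VY21DPd 1",
]
test = [
    "01IsoiSMh5gxyDYTl4CB",
    "01kcPWA9K2BOxQeS5Rju",
    "04EjIdbPV5e1XroFOpiN",
]
matched = match_ids(train, test)
```

With real files, the two lists could be read with `open(path).read().splitlines()` and the result written back out one pair per line.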

Related

Error: Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

I have the following dataset about the choices of different car brands and their attributes. I would like to create a matrix based on each attribute of the cars.
RespNum Task Concept Make Exterior.Design Interior.design
1 100086500 1 1 3 2 3
2 100086500 1 2 1 3 2
3 100086500 1 3 4 1 1
4 100086500 1 4 0 0 0
5 100086500 2 1 1 3 2
6 100086500 2 2 5 1 3
Driving.performance Driving.attributes Comfort Practibility Safety
1 1 1 1 3 3
2 3 3 3 2 1
3 2 2 2 1 2
4 0 0 0 0 0
5 3 2 1 1 3
6 1 3 3 3 2
Quality Equipment Sustainability Economy Price Response
1 2 1 1 3 1 0
2 1 3 3 1 3 0
3 3 2 2 2 2 1
4 0 0 0 0 0 0
5 3 2 1 1 4 0
6 1 3 3 3 8 0
I am using the function:
Make = attribcoding(6,4,'Other')
The first input (6) is the number of levels, the second (4) is the column position in the dataset, and the last ('Other') is the name of the outside option. However, I get the following error message:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

Nested for-loop: error variable already defined

I have a nested loop in Stata with four levels of foreach statements. With this loop, I am trying to create a new variable named strata that ranges from 1 to 40.
foreach x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 {
    foreach r in 1 2 3 4 5 {
        foreach s in 1 2 {
            foreach a in 1 2 3 4 {
                gen strata= `x' if race==`r' & sex==`s' & age==`a'
            }
        }
    }
}
I get an error:
"variable strata already defined"
Even with the error, the loop does assign strata = 1, but not the rest of the strata. All other cells are missing/empty.
Example data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age sex race)
1 2 2
1 2 1
1 1 1
1 1 1
1 2 1
2 2 1
2 2 1
4 2 1
1 2 1
4 2 1
3 2 1
2 2 1
4 2 1
4 2 2
3 2 1
4 1 3
4 2 1
4 2 1
2 1 2
4 2 1
2 2 1
3 2 1
3 2 1
1 2 3
4 2 1
1 2 5
4 2 1
4 2 1
4 2 2
4 2 1
2 2 1
4 1 1
3 2 1
1 2 1
2 2 1
4 2 1
1 2 2
2 2 3
1 1 3
4 2 1
2 2 3
1 2 1
1 1 1
2 2 3
1 2 1
1 1 3
1 2 1
2 2 1
3 2 1
1 2 1
4 2 1
1 2 2
1 2 1
2 2 1
4 2 1
4 2 1
1 2 1
1 2 1
4 2 1
2 2 1
4 2 1
1 2 1
1 1 3
2 2 1
1 1 1
4 1 1
3 2 1
2 2 1
1 2 1
1 1 1
2 2 3
4 2 2
2 2 1
2 2 1
3 2 1
2 2 2
3 2 1
2 1 1
1 1 1
3 2 1
1 2 3
4 2 1
4 2 1
2 2 1
1 2 1
1 1 1
3 2 1
4 2 1
2 2 3
1 2 3
4 2 1
3 2 1
2 2 1
4 2 1
3 2 1
2 1 1
1 2 1
2 2 1
2 2 3
1 1 1
end
label values sex sex
label def sex 1 "male (1)", modify
label def sex 2 "female (2)", modify
label values race race
label def race 1 "non-Hispanic white (1)", modify
label def race 2 "black (2)", modify
label def race 3 "AAPI/other (3)", modify
label def race 5 "Hispanic (5)", modify
generate is for generating new variables. The second time your code reaches a generate statement, the code fails for the reason given.
One answer is that you need to generate your variable outside the loops and then replace inside.
For other reasons your code can be rewritten in stages.
First, integer sequences can be more easily and efficiently specified with forvalues, which can be abbreviated: I tend to write forval.
gen strata = .
forval x = 1/40 {
    forval r = 1/5 {
        forval s = 1/2 {
            forval a = 1/4 {
                replace strata = `x' if race==`r' & sex==`s' & age==`a'
            }
        }
    }
}
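Second, the code is flawed anyway. Each pass of the outer loop overwrites the values set by the previous pass, so every observation matched by the conditions ends up as 40!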
Third, you can do allocations much more directly, say by
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
This is a self-contained reproducible demonstration:
clear
set obs 5
gen race = _n
expand 2
bysort race : gen sex = _n
expand 4
bysort race sex : gen age = _n
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
isid strata
Clearly you can and should vary the recipe for a different preferred scheme.
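The arithmetic behind the direct formula can also be checked outside Stata. The following is a minimal Python sketch, assuming race runs over 1 to 5, sex over 1 to 2, and age over 1 to 4 as in the loops above; the function name `strata` is only illustrative.

```python
from itertools import product


def strata(race, sex, age):
    """Same allocation as the Stata expression 8*(race-1) + 4*(sex-1) + age."""
    return 8 * (race - 1) + 4 * (sex - 1) + age


# Enumerate every (race, sex, age) cell in the same nesting order as the loops.
codes = [
    strata(r, s, a)
    for r, s, a in product(range(1, 6), range(1, 3), range(1, 5))
]
```

Each of the 5 x 2 x 4 = 40 cells maps to a distinct code, which is what `isid strata` confirms in the Stata demonstration.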

Get unique values in matrix with Matlab

I'm looking for the fastest way to get the unique rows of a matrix in MATLAB. I have a matrix like this:
1 2
1 2
1 3
1 5
1 23
2 1
3 1
3 2
3 2
3 2
4 17
4 3
4 17
and need to get something like this:
1 2
1 3
1 5
1 23
2 1
3 1
3 2
4 3
4 17
Actually I need unique values by combination of columns in each row.
Have a look at MATLAB's unique() function with the 'rows' argument.
C = unique(A,'rows')
https://de.mathworks.com/help/matlab/ref/unique.html
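For comparison, the same idea can be sketched in plain Python. This is an analogue and not MATLAB code: each row is turned into a tuple so it can go into a set, and the result is sorted, which mirrors the sorted output that `unique(A,'rows')` returns. The variable names are illustrative.

```python
# The matrix from the question, one inner list per row.
A = [
    [1, 2], [1, 2], [1, 3], [1, 5], [1, 23],
    [2, 1], [3, 1], [3, 2], [3, 2], [3, 2],
    [4, 17], [4, 3], [4, 17],
]

# Tuples are hashable, so a set removes duplicate rows;
# sorting orders rows by first column, then second.
C = [list(row) for row in sorted(set(map(tuple, A)))]
```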

Unique Columns Across an Array?

I have an array structured like so:
a = [1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5;
1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 3 3 1 1 1 2 3 4 4 4 1 1 1 1 2 2 3 3];
It is a 2-by-n matrix with no real pattern (I have reduced the number of columns here for simplicity's sake). I want to find the number of unique columns. In this simplified example I can count by hand, though it takes a while, and see that my unique matrix b is:
b= 1 1 2 2 2 3 3 3 3 4 5 5
1 2 1 2 3 1 2 3 4 1 2 3
In MATLAB, I can do something like
size(b,2)
To get the number of unique columns. In this example
size(b,2) = 12
My question is, how do I go from matrix a to matrix b so that I can do this computationally for very large n dimensional matrices that I have?
Use unique:
a = [1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5;
1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 3 3 1 1 1 2 3 4 4 4 1 1 1 1 2 2 3 3];
% Transpose to leverage the rows flag, then transpose back
b = unique(a.', 'rows').';
Which returns:
b =
1 1 2 2 2 3 3 3 3 4 5 5
1 2 1 2 3 1 2 3 4 1 2 3
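The transpose trick carries over to other languages. As a rough Python sketch (an analogue, not MATLAB), `zip(*a)` plays the role of the transpose: it turns the two rows into column tuples, which are deduplicated and then transposed back.

```python
a = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
     3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3,
     1, 1, 1, 2, 3, 4, 4, 4, 1, 1, 1, 1, 2, 2, 3, 3],
]

# zip(*a) yields one tuple per column; the set drops repeated columns.
unique_columns = sorted(set(zip(*a)))

# Transpose back so b has the same 2-by-n layout as a.
b = [list(row) for row in zip(*unique_columns)]
num_unique = len(unique_columns)
```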

Actionscript 3.0 Cube Crash like game

I'm trying to build a game like http://games.yahoo.com/game/bricks-breaking in ActionScript 3 (Flash Builder).
I am able to create an array of bricks (that are visible on game start), but I have no idea how to find a group of bricks in the array.
Let's say we have an array like this:
1 2 2 1 3 3 1 1 1 1 1 1 1
1 2 1 1 1 3 1 1 1 1 1 1 1
1 2 1 1 1 3 1 1 1 1 1 1 3
1 1 2 1 1 3 3 3 1 1 1 1 3
1 1 1 2 1 3 1 3 3 1 1 1 3
1 1 1 3 3 3 1 3 3 1 1 1 3
1 1 1 1 1 1 1 3 3 1 1 1 1
When the user clicks any red brick (let's say red is 3 in the array), the array after removing that group of 3s will look like this:
1 2 2 0 0 0 0 0 0 1 1 1 1
1 2 1 1 0 0 1 0 0 1 1 1 1
1 2 1 1 1 0 1 0 0 1 1 1 3
1 1 2 1 1 0 1 0 1 1 1 1 3
1 1 1 1 1 0 1 1 1 1 1 1 3
1 1 1 2 1 0 1 1 1 1 1 1 3
1 1 1 1 1 1 1 1 1 1 1 1 1
Basically, I want to remove all the items that are in a group and have the same color.
Any suggestions on how to do that?
Is there an algorithm I should use?
Thanks for any advice.
A simple way to remove elements is to use a recursive function. It's not the only way (or even a good one) but it should be enough for this kind of game. Basically something like this:
function breakBricks(x:int, y:int, color:int):void {
    if(bricks[y][x] != color) return;
    bricks[y][x] = 0;
    breakBricks(x + 1, y, color);
    breakBricks(x, y + 1, color);
    breakBricks(x - 1, y, color);
    breakBricks(x, y - 1, color);
}
Begin with the position that the user clicked and the colour of that position. If the colour matches it will set that entry to 0, if not it leaves the element alone. It recursively does this to all neighbouring elements. What is missing in this code are boundary checks which you need to add.
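To show where the boundary checks fit, here is a minimal Python sketch of the same flood fill. It is not the ActionScript from the answer: it swaps the recursion for an explicit stack so that large groups cannot hit a recursion limit, and it assumes the grid is a rectangular list of lists indexed as `bricks[y][x]` with 0 meaning empty. The name `break_bricks` and the returned count are illustrative additions.

```python
def break_bricks(bricks, x, y):
    """Zero out the connected same-coloured group containing (x, y).

    Returns the number of bricks removed.
    """
    rows, cols = len(bricks), len(bricks[0])
    color = bricks[y][x]
    if color == 0:
        # Clicking an empty cell removes nothing (and avoids an endless fill).
        return 0
    removed = 0
    stack = [(x, y)]
    while stack:
        cx, cy = stack.pop()
        # Boundary check: ignore positions outside the grid.
        if not (0 <= cx < cols and 0 <= cy < rows):
            continue
        if bricks[cy][cx] != color:
            continue
        bricks[cy][cx] = 0
        removed += 1
        # Visit the four direct neighbours (no diagonals).
        stack.extend([(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)])
    return removed


grid = [
    [1, 3, 3],
    [1, 1, 3],
    [3, 1, 1],
]
count = break_bricks(grid, 1, 0)  # click the 3 in the top row
```

The 3 in the bottom-left corner stays, because it touches the clicked group only diagonally.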
In the next step you could iterate over each of the array's columns from bottom to top, keep a reference to the position of the first 0 element you find, and move any non-empty values you find after that to the lowest empty row position.
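That column pass might be sketched in Python as follows, again assuming a rectangular `bricks[y][x]` grid where 0 marks an empty cell and the last row is the bottom. The name `collapse_columns` is hypothetical.

```python
def collapse_columns(bricks):
    """Let non-empty bricks fall to the bottom of each column, in place."""
    rows, cols = len(bricks), len(bricks[0])
    for x in range(cols):
        # write_y is the lowest row in this column that is still free.
        write_y = rows - 1
        for y in range(rows - 1, -1, -1):
            if bricks[y][x] != 0:
                bricks[write_y][x] = bricks[y][x]
                if write_y != y:
                    # The brick moved down, so clear its old position.
                    bricks[y][x] = 0
                write_y -= 1


grid = [
    [1, 2, 3],
    [0, 0, 3],
    [1, 0, 0],
]
collapse_columns(grid)
```

Each column is scanned once, so the pass costs time proportional to the number of cells.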
