I have this small geo location dataset.
37.9636140,23.7261360
37.9440840,23.7001760
37.9637190,23.7258230
37.9901450,23.7298770
I also have an arbitrary reference location, for example this one: 37.97570, 23.66721
I need a bash command using awk that returns the simple Euclidean distance from that location to each point in the dataset.
This is the command I use:
awk -v OFMT=%.17g -F',' -v long=37.97570 -v lat=23.66721 '{for (i=1;i<=NR;i++) distances[i]=sqrt(($1 - long)^2 + ($2 - lat)^2 ); a[i]=$1; b[i]=$2} END {for (i in distances) print distances[i], a[i], b[i]}' filename
When I run this command I get the following result, which is not correct. Could someone explain what I am doing wrong?
➜ awk -v OFMT=%.17g -F',' -v long=37.97570 -v lat=23.66721 '{for (i=1;i<=NR;i++) distances[i]=sqrt(($1 - long)^2 + ($2 - lat)^2 ); a[i]=$1; b[i]=$2} END {for (i in distances) print distances[i], a[i], b[i]}' filename
44,746962127881936 37.9440840 23.7001760
44,746962127881936 37.9901450 23.7298770
44,746962127881936 37.9636140 23.7261360
44,746962127881936
44,746962127881936 37.9637190 23.7258230
Updated.
I appended the command that #jas provided and included the od -c output, as #mark-fuso suggested.
The issue now is that I get different results from #jas.
Command output showing the new issue:
awk -v OFMT=%.17g -F, -v long=37.97570 -v lat=23.66721 '
{distance=sqrt(($1 - long)^2 + ($2 - lat)^2 ); print distance, $1, $2}
' file
1,1820150904705098 37.9636140 23.7261360
1,1820150904705098 37.9440840 23.7001760
1,1820150904705098 37.9637190 23.7258230
1,1820150904705098 37.9901450 23.7298770
od -c that shows the content of the input file.
od -c file
0000000 3 7 . 9 6 3 6 1 4 0 , 2 3 . 7 2
0000020 6 1 3 6 0 \n 3 7 . 9 4 4 0 8 4 0
0000040 , 2 3 . 7 0 0 1 7 6 0 \n 3 7 . 9
0000060 6 3 7 1 9 0 , 2 3 . 7 2 5 8 2 3
0000100 0 \n 3 7 . 9 9 0 1 4 5 0 , 2 3 .
0000120 7 2 9 8 7 7 0 \n
0000130
While #jas has provided a 'fix' for the problem, thought I'd throw in a few comments about what OP's code is doing ...
Some basics ...
the awk program ({for (i=1;i<=NR;i++) ... ; b[i]=$2}) is applied against each row of the input file
as each row is read from the input file the awk variable NR keeps track of the row number (ie, NR=1 for the first row, NR=2 for the second row, etc)
when the for loop exits, the counter (i in this case) has a value of NR+1 (ie, the i++ is applied after the last pass through the loop and the i<=NR test then fails, leaving i=NR+1)
unless there are conditional checks for each line of input the awk program will apply against every line from the input file (including blank lines - more on this below)
for (i in distances)... isn't guaranteed to process the array indices in numerical order
The awk/for loop is doing the following:
for the 1st input row (NR=1) we get for (i=1;i<=1;i++) ...
for the 2nd input row (NR=2) we get for (i=1;i<=2;i++) ...
for the 3rd input row (NR=3) we get for (i=1;i<=3;i++) ...
for the 4th input row (NR=4) we get for (i=1;i<=4;i++) ...
For each row processed by awk the program will overwrite all previous entries in the distances[] array; net result is the last row (NR=4) will place the same value in all 4 entries of the distances[] array.
The a[i]=$1; b[i]=$2 array assignments occur outside the scope of the for loop so these will be assigned once per input row (ie, will not be overwritten); however, the array assignments are being made with i=NR+1; net result is the contents of the 1st row (NR=1) are stored in array entries a[2] and b[2], the contents of the 2nd row (NR=2) are stored in array entries a[3] and b[3], etc.
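The overwrite and the off-by-one can be seen in isolation with a stripped-down sketch. This is not OP's command: the three-line input and the array name seen[] are made up for illustration, and only the shape of the loop is kept.

```shell
# Hypothetical 3-line input; only the row count matters here.
out=$(printf 'a\nb\nc\n' | awk '
  { for (i = 1; i <= NR; i++) seen[i] = NR   # rewrites entries 1..NR on every row
    print "row " NR ": i=" i }               # after the loop, i is NR+1
  END { print seen[1], seen[2], seen[3] }    # every entry holds the value from the last row
')
printf '%s\n' "$out"
```

Each row reports i one higher than NR, and the END line shows all three entries holding the value written while processing the last row.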
Modifying OP's code with print i, distances[i], a[i], b[i]} and running against the 4-line input file I get:
1 0.064310270672728084 # no data for 2nd/3rd columns because a[1] and b[1] are never set
2 0.064310270672728084 37.9636140 23.7261360 # 2nd/3rd columns are from 1st row of input
3 0.064310270672728084 37.9440840 23.7001760 # 2nd/3rd columns are from 2nd row of input
4 0.064310270672728084 37.9637190 23.7258230 # 2nd/3rd columns are from 3rd row of input
From this we can see the first column of output is the same (ie, distances[1]=distances[2]=distances[3]=distances[4]), while the 2nd and 3rd columns are the same as the input columns except they are shifted 'down' by one row.
That leaves us with two outstanding issues ...
why does OP show 5 lines of output?
why does the first column consist of the garbage value 44,746962127881936?
I was able to reproduce this issue by adding a blank line on the end of my input file:
$ cat geo.dat
37.9636140,23.7261360
37.9440840,23.7001760
37.9637190,23.7258230
37.9901450,23.7298770
<<=== blank line !!
Which generates the following with OP's awk code:
44.746962127881936
44.746962127881936 37.9636140 23.7261360
44.746962127881936 37.9440840 23.7001760
44.746962127881936 37.9637190 23.7258230
44.746962127881936 37.9901450 23.7298770
NOTES:
this order is different from OP's sample output and is likely due to OP's awk version not processing for (i in distances)... in numerical order; OP can try something like for (i=1;i<=NR;i++)... or for (i=1;i in distances; i++)... (though the latter will not work correctly for a sparsely populated array)
OP's output (in the question; in comment to #jas' answer) shows a comma (,) in place of the period (.) in the first column, so I'm guessing OP's env is using a locale that swaps the roles of comma and period as thousands/decimal delimiters (though the input data is based on an 'opposite' locale)
Notice we finally get to see the data from the 4th line of input (shifted 'down' and displayed on line 5) but the first column has what appears to be a nonsensical value ... which can be tracked back to applying the following against a blank line:
sqrt(($1 - long)^2 + ($2 - lat)^2 )
sqrt(( - long)^2 + ( - lat)^2 ) # empty line => $1 = $2 = undefined/empty
sqrt(( - 37.97570)^2 + ( - 23.66721)^2 )
sqrt( 1442.153790 + 560.136829 )
sqrt( 2002.290619 )
44.746962... # contents of 1st column
To 'fix' this issue the OP can either a) remove the blank line from the input file or b) add some logic to the awk script to only perform calculations if the input line has (numeric) values in fields #1 & #2 (ie, $1 and $2 are not empty); it's up to the coder to decide on how much validation to apply (eg, are the fields numeric, are the fields within the bounds of legitimate long/lat values, etc).
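Option b) could look something like the sketch below. It is only one possible level of validation: the NF == 2 test and the regular expression for plain decimal numbers are assumptions chosen for illustration, the sample file is made up from the first two input rows plus a blank line, and LC_ALL=C is added on the assumption that it makes awk treat the period as the decimal separator.

```shell
# Hypothetical sample file: two data rows followed by a blank line.
tmp=$(mktemp)
printf '37.9636140,23.7261360\n37.9440840,23.7001760\n\n' > "$tmp"

# Only rows with exactly two plain decimal fields reach the calculation.
result=$(LC_ALL=C awk -F, -v long=37.97570 -v lat=23.66721 '
  NF == 2 && $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
    printf "%.6f %s %s\n", sqrt(($1 - long)^2 + ($2 - lat)^2), $1, $2
  }
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With this input the blank line produces no output row, so only the two data rows are printed with their distances.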
One last design-related comment ... as demonstrated in jas' answer there is no need for any of the arrays (which in turn reduces memory usage) when all desired output can be generated 'on-the-fly' while processing each line of the input file.
Awk takes care of the looping for you. The code will be run in turn for each line of the input file:
$ awk -v OFMT=%.17g -F, -v long=37.97570 -v lat=23.66721 '
{distance=sqrt(($1 - long)^2 + ($2 - lat)^2 ); print distance, $1, $2}
' file
0.060152679674309095 37.9636140 23.7261360
0.045676346307474212 37.9440840 23.7001760
0.059824979147508742 37.9637190 23.7258230
0.064310270672728084 37.9901450 23.7298770
EDIT:
OP is getting different results. I notice in OP's output that there are commas instead of decimal points when printing the distance. This points to a possible issue with the locale setting.
OP confirms that the locale was set to Greek, causing the difference in output.
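A common workaround for this kind of locale mismatch is to force the C locale for the awk invocation only. The sketch below assumes that LC_ALL=C makes the awk in use both read and print the period as the decimal separator; the temporary file holding the first input row is made up for illustration.

```shell
# Hypothetical one-row input file taken from the first line of the dataset.
tmp=$(mktemp)
printf '37.9636140,23.7261360\n' > "$tmp"

# Assumption: LC_ALL=C applies to this command only and overrides the Greek locale.
fixed=$(LC_ALL=C awk -v OFMT=%.17g -F, -v long=37.97570 -v lat=23.66721 '
  { distance = sqrt(($1 - long)^2 + ($2 - lat)^2); print distance, $1, $2 }
' "$tmp")
printf '%s\n' "$fixed"
rm -f "$tmp"
```

Setting the variable on the command line leaves the rest of the shell session in its original locale.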
I've looked around but couldn't find a satisfying solution. Basically I made a function that calculates the probability distribution of the number of losses x in a portfolio of n credits, and I am trying to write the output to a text file in two columns, where the first column is X (the number of defaults) and the second column is P (the density for that number of losses), something like this:
X P
1 0.005
2 0.003
3 0.005
4 0.005
5 0.005
etc.
I've looked around and people suggested putting a negative sign (-) in front of the field width in my %d and %f when using fprintf, but no luck.
Here's a sample of my code and the output it gives me...
Code:
for(i=0;i<d+1;i++)
{
    Densite= gsl_ran_binomial_pdf(i,p,d);
    fprintf(pF,"%-5d %-20f .\n",i, Densite);
}
Output:
0 0.005921 .
1 0.031161 .
2 0.081182 .
3 0.139576 .
4 0.178143 .
5 0.180018 .
6 0.150015 .
7 0.106026 .
8 0.064871 .
9 0.034901 .
10 0.016716 .
How to remedy?
Thanks in advance! (complete noob that started coding in C like two days ago..)
Did you run the executable program on Windows or Linux? If Windows, please use \r\n for the newline.
I am trying to print the contents of a file. I have a file maze.txt with the following contents:
7 7
1 1 R N E
1 2 B N W
1 3 B N N
And I am printing it using the following code:
with open(os.path.join('maze.txt')) as f:
    for line in f:
        print line
f.close()
However, my output has extra empty lines in between:
7 7

1 1 R N E

1 2 B N W

1 3 B N N
I've tried changing my print line to print line[0:-1], which works except it will cut off the last character in the final line because there's not a newline to get rid of after it. Is there an easy way to avoid this?
Put a comma at the end of the print statement:
print line,
As the previous answer says: when the print statement doesn't end with a ',', it appends a newline. Each line read from the file already ends with its own newline, which is where the empty lines come from.
Also, in your code, when opening a file with a 'with' statement, you don't need to close the file: it is closed automatically when the 'with' block exits.
Using MATLAB, I want to read a table of data from a txt file, where the table starts after a specific expression and a number of unwanted lines. For example, AA.txt contains:
Information about students :
AAAA
BBBB
1 10 100
2 3 15
! ! ! a number of lines
10 6 9
The information I have is the expression 'Information about students', the number of lines to skip (2), and the size of the desired matrix (10 rows, 3 columns).
If I understand correctly, you want to skip the first 3 lines (treating them as headers) and then read the rest.
I would follow this procedure:
fid = fopen(filename,'r');
A = textscan(fid,'%f %f %f','HeaderLines',3,'Delimiter','\r\n');
I currently do not have access to MATLAB, but I do believe it will work.
I have a file called resistors.dat and I need to get my program to read and parse the values from the file into my program.
How would I read a file like this in C?
Read from the file resistors.dat (supplied on Blackboard) similarly to what you have done in Problem 2 of Lab 12. Each line in resistors.dat now represents one row: Ria, Rib and Ric (i = 1, 2, ..., n) of the circuit. Expand Problem 2 of Lab 12 to calculate the total resistance of the circuit. Hint: The total resistance is given by 1/R = 1/R1 + 1/R2 + 1/R3 + ... + 1/Rn, where Ri is the sum of resistances in one input row. In a loop, compute the sum of the inverse resistances 1/Ri. After the input has finished, compute the inverse of this sum to obtain the final result.
This is the content of resistors.dat:
64.35 35.52 85.37
90.43 12.99 80.40
98.37 32.63 78.42
3.82 82.74 52.61
3.75 72.47 49.05
96.73 16.07 23.46
48.15 36.62 83.64
51.96 27.19 22.38
4.18 46.07 91.21
96.94 8.17 50.45
0
There are several ways to accomplish this. I expect that your Resistors.dat file looks something like this:
r=1
r=20
r=22
r=2
I suggest you do something like this:
Use fopen to open the file, then call fgets in a while loop to read each line until it returns NULL (end of file), and use sscanf to parse each line.