Convert text/data into x, y arrays?

I have the following text as the output of my program:
Size: 31 Mflops/s: 355.10 Percentage: 0.79
Size: 32 Mflops/s: 370.89 Percentage: 0.83
...
Size: 767 Mflops/s: 360.15 Percentage: 0.80
and I want to get an X array = [31, 32, ... 767] and a Y array = [355.10, 370.89, ..., 360.15].
I can't select and copy the values column by column, so extracting them by hand isn't easy. The fields are separated by spaces, as shown above.
I would then put these into matplotlib for easy plotting.
Is there an easy way to do this?
Thanks!

I used
http://www.molbiotools.com/textextractor.html
(input: "Size: ")
followed by
https://delim.co/#
to get exactly what I needed.
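If you would rather do the extraction in the same Python script that feeds matplotlib, a regular expression can pull both columns out directly. This is a minimal sketch, assuming every data line has the `Size: ... Mflops/s: ...` layout shown in the question; `parse_output` and `plot_output` are hypothetical names.

```python
import re

# Matches lines such as "Size: 31 Mflops/s: 355.10 Percentage: 0.79",
# tolerating any amount of whitespace between the fields.
LINE_RE = re.compile(r"Size:\s*(\d+)\s+Mflops/s:\s*([\d.]+)")


def parse_output(text):
    """Return (X, Y): the Size values as ints and the Mflops/s values as floats."""
    xs, ys = [], []
    for match in LINE_RE.finditer(text):
        xs.append(int(match.group(1)))
        ys.append(float(match.group(2)))
    return xs, ys


def plot_output(text):
    # matplotlib is imported lazily so parsing works even without it installed
    import matplotlib.pyplot as plt
    xs, ys = parse_output(text)
    plt.plot(xs, ys)
    plt.xlabel("Size")
    plt.ylabel("Mflops/s")
    plt.show()


sample = """Size: 31 Mflops/s: 355.10 Percentage: 0.79
Size: 32 Mflops/s: 370.89 Percentage: 0.83
Size: 767 Mflops/s: 360.15 Percentage: 0.80"""

X, Y = parse_output(sample)
print(X, Y)  # [31, 32, 767] [355.1, 370.89, 360.15]
```

For the real output, pass the whole text, e.g. `plot_output(open("output.txt").read())`, where `output.txt` is a hypothetical file name.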

Related

Calculating the drawdown within a Numpy Array Python

I am trying to write a function that calculates the biggest dip in each array. The function below subtracts the largest value that occurs before the global minimum from that minimum, but it does not give the expected output I am looking for. The result of calc(C) should be -62: in the run 11, 66, 45, 4 the value goes down from 66 to 4, a dip of 62 points below 66. How can I fix the function below? Sample code taken from: issue
import numpy as np

def calc(arr):
    try:
        _min = min(arr)
        index_min = np.where(arr == _min)[0][0]  # first occurrence
        _max = max(arr[:index_min])              # empty slice if the minimum comes first
        print(_min - _max)
    except ValueError:                           # raised by max() on an empty slice
        print('No drawdown')

A = np.array([0,2,5,44,-12,3,-5])
B = np.array([0,10,-110,23,45,66,30,2,12])
C = np.array([0,10,11,-23,45,11,66,45,4,12])
D = np.array([0,5,6,7,8])
E = np.array([0,10,5,6,8])
calc(A)
calc(B)
calc(C)
calc(D)
calc(E)
Output:
-56
-120
-34
No drawdown
No drawdown
Expected Output:
-56
-120
-62
No drawdown
-5
The biggest dip does not necessarily start at the global maximum or end at the global minimum, so every position has to be checked:
track the maximum value seen so far, for which we can use numpy.maximum.accumulate;
compute the dip at each position as the current value minus that running maximum;
take the largest dip (the most negative value) among all of them.
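import numpy as np

def calc(a):
    acc_max = np.maximum.accumulate(a)   # running maximum up to each position
    return (a - acc_max).min()           # most negative gap = biggest dip
calc(A)
# -56
calc(B)
# -120
calc(C)
# -62
calc(D)
# 0 (no dip: the values never fall below an earlier maximum)
calc(E)
# -5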
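A slightly extended sketch of the same running-maximum idea, assuming you also want to know where the dip starts and ends, and want "No drawdown" printed as in the expected output. `max_drawdown` and `report` are hypothetical helper names, not NumPy functions.

```python
import numpy as np


def max_drawdown(a):
    """Return (dip, peak_index, trough_index) for the largest drop from an earlier maximum.

    dip is 0 when the array never falls below an earlier value.
    """
    a = np.asarray(a)
    acc_max = np.maximum.accumulate(a)      # running maximum up to each position
    dips = a - acc_max                      # <= 0 everywhere
    trough = int(np.argmin(dips))           # position of the deepest dip
    peak = int(np.argmax(a[:trough + 1]))   # where that running maximum was first reached
    return dips[trough].item(), peak, trough


def report(a):
    dip, _, _ = max_drawdown(a)
    return dip if dip < 0 else "No drawdown"


for arr in ([0, 2, 5, 44, -12, 3, -5], [0, 10, -110, 23, 45, 66, 30, 2, 12],
            [0, 10, 11, -23, 45, 11, 66, 45, 4, 12], [0, 5, 6, 7, 8], [0, 10, 5, 6, 8]):
    print(report(arr))   # -56, -120, -62, No drawdown, -5 (one per line)
```

For C the dip runs from index 6 (the value 66) to index 8 (the value 4).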

Matlab finding best fit line from scatter plot but exclude some data points

I use Matlab to find the best fit line from a scatter plot, but I need to delete some data points. For example, I am trying to find the best fit line of
x = [10 70 15 35 55 20 45 30];
y = [40 160 400 90 500 60 110 800];
Now I need to delete all y points whose value is over 300, along with the corresponding x points, and then make a scatter plot and find the best fit line. How can I implement this?
There is a standard Matlab trick for this, logical indexing (see for example matrix-indexing):
x = [10 70 15 35 55 20 45 30];
y = [40 160 400 90 500 60 110 800];
filter = (y<300);   % logical mask: true where y is below 300
y1 = y(filter);     % keep only the selected y values...
x1 = x(filter);     % ...and the matching x values
plot(x,y,'+b',x1,y1,'or');
You can then use the polyfit function (Matlab Doc) for a linear fit:
ff = polyfit(x1,y1,1);   % ff(1) is the slope, ff(2) the intercept
plot(x,y,'*b',x1,y1,'or',x1,ff(1)*x1 + ff(2),'-g');
grid on;
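For comparison, the same mask-then-fit approach in NumPy, as a minimal sketch: a boolean mask plays the role of filter, and np.polyfit with degree 1 returns the slope and intercept (highest power first, as in Matlab's polyfit).

```python
import numpy as np

x = np.array([10, 70, 15, 35, 55, 20, 45, 30])
y = np.array([40, 160, 400, 90, 500, 60, 110, 800])

keep = y < 300                   # boolean mask, like filter = (y<300)
x1, y1 = x[keep], y[keep]        # the same positions are dropped from both arrays

slope, intercept = np.polyfit(x1, y1, 1)   # coefficients come highest power first
print(slope, intercept)
```

With this data the five kept points happen to lie exactly on a line, so the fit comes out as slope 2 and intercept 20, up to floating-point rounding.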
The best way is to logically filter the dataset, then fit and plot it.
NOTE: The data should be in column format. If it isn't, transpose it first, e.g. x = x'; y = y';
filter = (y<300);
x = x(filter);                  % drop the rejected points instead of zeroing them
y = y(filter);
X = [ones(length(x),1), x];     % column of ones gives the intercept coefficient b(1)
b = X\y;                        % least-squares solution: y ~ b(1) + b(2)*x
plot(x,y,'bo')
hold on
plot([min(x),max(x)], b(1) + b(2)*[min(x),max(x)])
hold off
axis tight
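The backslash version sets up a design matrix and solves it by least squares. A NumPy sketch of the same computation, with np.linalg.lstsq standing in for Matlab's backslash; the column of ones is what produces the intercept, while a column of zeros would leave that coefficient undetermined.

```python
import numpy as np

x = np.array([10, 70, 15, 35, 55, 20, 45, 30], dtype=float)
y = np.array([40, 160, 400, 90, 500, 60, 110, 800], dtype=float)

keep = y < 300
xf, yf = x[keep], y[keep]                   # remove rejected points, don't zero them

# Design matrix: ones for the intercept b[0], then x for the slope b[1].
A = np.column_stack([np.ones_like(xf), xf])
b, *_ = np.linalg.lstsq(A, yf, rcond=None)  # least squares, like Matlab's A\y

line_x = np.array([xf.min(), xf.max()])
line_y = b[0] + b[1] * line_x               # end points of the fitted line
print(b, line_y)
```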

How is an array sliced?

I have some sample code where the array is sliced as follows:
A = X(:,2:300)
What does this mean about the slice of the array?
In MATLAB, : on its own stands for 'all', and 2:300 gives a vector of integers from 2 to 300 with a spacing of 1 (the 1 is implicit). 2:300 is the same as 2:1:300, and you can use any spacing you wish; for example, 2:37:300 (result: [2 39 76 113 150 187 224 261 298]) generates equally spaced numbers.
So your statement selects every row of the matrix X and columns 2 to 300 of it, and stores the result in A. Suggested reading
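If you meet the same slice in NumPy code, the translation has to account for 0-based indexing and exclusive stop values. A short sketch with a hypothetical 3-by-400 matrix X:

```python
import numpy as np

X = np.arange(3 * 400).reshape(3, 400)   # hypothetical 3-by-400 matrix

# MATLAB X(:,2:300) -> NumPy X[:, 1:300]
# The start is 0-based and the stop is exclusive, so MATLAB columns 2..300
# are NumPy indices 1..299: 299 columns in both cases.
A = X[:, 1:300]
print(A.shape)          # (3, 299)

# MATLAB 2:37:300 -> np.arange(2, 301, 37); the stop must be one past 300
steps = np.arange(2, 301, 37)
print(steps.tolist())   # [2, 39, 76, 113, 150, 187, 224, 261, 298]
```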

How to store the Centroid values of blobs in one Array?

My picture has a number of blobs of various shapes. I want to store their centroid values in one array for future use. I tried the following code, but it did not work. Can anyone help?
Sample:
for i = 1:length(STATS)
    centroid = STATS(i).Centroid;
    array = zeros(length(STATS));
    array(i) = centroid;
end
I want to store the centroid data in one array like below
array=
145 145
14 235
145 544
14 69
74 55
Try the following:
for i = 1:length(STATS)
    array{i} = STATS(i).Centroid;
end
You can print the entire array using the following:
array{:}
You can read more about cell arrays here. Also, your original code was trying to assign a 1x2 array (Centroid) to a single element of an array (array(i)), and it reset array to zeros on every iteration of the loop.
How about:
array = cell2mat({STATS.Centroid}');   % transpose the cell array so the centroids are stacked as rows
Assuming
STATS(1).Centroid = [145 145];
STATS(2).Centroid = [14 235]; % Etc...
Try:
array = reshape([STATS.Centroid], 2, numel(STATS))'
array =
145 145
14 235
145 544
14 69
74 55
How this works:
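[STATS.Centroid] is shorthand for [STATS(1).Centroid, STATS(2).Centroid, ..., STATS(n).Centroid], which gives all the values in one row vector. reshape then turns that vector into a 2-by-n matrix (MATLAB fills it column by column, so each column is one centroid), and the final ' transposes it into the desired n-by-2 array.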
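The column-major detail is what makes the reshape work. A small NumPy sketch of the same step, where the list `centroids` is a hypothetical stand-in for STATS; `order="F"` reproduces MATLAB's column-by-column fill, and NumPy's default row-major reshape reaches the same n-by-2 result without the transpose.

```python
import numpy as np

# Hypothetical stand-in for STATS: one (x, y) centroid per blob.
centroids = [(145, 145), (14, 235), (145, 544), (14, 69), (74, 55)]

# Like [STATS.Centroid]: every coordinate in one flat vector.
flat = np.array([coord for pair in centroids for coord in pair])

# MATLAB's reshape(v, 2, n)' fills column by column (order="F"), then transposes.
matlab_style = flat.reshape(2, -1, order="F").T

# NumPy's default row-major reshape gives the same n-by-2 result directly.
array = flat.reshape(-1, 2)

print(array)
```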

How to get an evenly distributed sample from Perl array values?

I have an array containing many values between 0 and 360 (like degrees in a circle), but unevenly distributed:
1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 100, 120, 140, 188, 210, 280, 355
Now I need to reduce those values to, e.g., only 4 values that are distributed as evenly as possible.
How to do that?
Thanks,
Jan
Put the numbers on a circle, like a clock. Now construct a logical cross, say at 12, 3, 6, and 9 o’clock. Put the 12 at the first number. Now find what numbers would be nearest to 3, 6, and 9 o’clock, and record the sum of those three numbers’ distances next to the first number.
Iterate by rotating the top of your cross — the 12 o’clock point — clockwise until it exactly lines up with the next number. Again measure how far the nearest numbers are to each of your three other crosspoints, and record that score next to this current 12 o’clock number.
Repeat until your 12 o’clock has rotated all the way to the original 3 o’clock, at which point you’re done. The number with the lowest sum assigned to it determines the winning configuration.
This solution generalizes to any range of values R and any number N of final points you wish to reduce the set to. Adjacent points on the “cross” are R/N apart, and you only need to rotate until the top of your cross reaches where the next arm was in the original position. So if you wanted 6 points, you would use a 6-pointed cross with arms 60 degrees apart instead of a 4-pointed cross with arms 90 degrees apart. If your range is different, the operation is the same, so you don’t need a physical clock and cross to implement this algorithm: it works for any R and N.
I feel bad about this answer from a Perl perspective, as I’ve not managed to include any dollar signs in the solution. :)
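A minimal Python sketch of the rotating-cross idea, assuming the values lie on a circle of period 360 and distances wrap around (so 355 and 1 are 6 apart). Instead of stopping at the original 3 o’clock, it anchors the 12 o’clock on every value in turn, which also covers alignments where a different arm sits on a value; it does not stop the same value from being the nearest one to two arms. `circular_distance` and `spread_sample` are hypothetical names.

```python
def circular_distance(a, b, period):
    """Shortest distance between a and b on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def spread_sample(values, n, period=360.0):
    """Rotating-cross selection: returns (sorted picks, total arm distance)."""
    step = period / n                        # arms of the cross are period/n apart
    best_score, best_pick = None, None
    for anchor in values:                    # put the 12 o'clock arm on each value
        pick, score = [anchor], 0.0
        for k in range(1, n):                # the other n - 1 arms
            arm = (anchor + k * step) % period
            nearest = min(values, key=lambda v: circular_distance(v, arm, period))
            pick.append(nearest)
            score += circular_distance(nearest, arm, period)
        if best_score is None or score < best_score:   # keep the lowest sum
            best_score, best_pick = score, pick
    return sorted(best_pick), best_score


values = [1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
          100, 120, 140, 188, 210, 280, 355]
print(spread_sample(values, 4))   # ([1, 100, 188, 280], 11.0)
```

For the question's data the winning cross has its arms near 100, 190, 280 and 10, so the four picks are 1, 100, 188 and 280.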
Use a clustering algorithm (k-means below) to partition your data into K clusters, then grab a random value from each cluster. The following $datafile looks like this:
1 1
45 45
46 46
...
210 210
280 280
355 355
First column is a tag, second column is data. Running the following with $K = 4:
use strict;
use warnings;
use Algorithm::KMeans;

my $datafile = $ARGV[0] or die "usage: $0 datafile [K]\n";
my $K = $ARGV[1] || 0;    # defaults to 0 when no K is given on the command line
my $mask = 'N1';          # column 1 is the tag (N), column 2 is used as data (1)

my $clusterer = Algorithm::KMeans->new(
    datafile        => $datafile,
    mask            => $mask,
    K               => $K,
    terminal_output => 0,
);
$clusterer->read_data_from_file();
my ($clusters, $cluster_centers) = $clusterer->kmeans();

my %clusters;
while (@$clusters) {
    my $cluster = shift @$clusters;
    my $center  = shift @$cluster_centers;
    # rand(N) returns a number in [0, N), so this can pick any element of the cluster
    $clusters{"@$center"} = $cluster->[int rand scalar @$cluster];
}

use YAML;
print Dump \%clusters;
returns this:
120: 120
199: 188
317.5: 355
45.9166666666667: 46
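The first column is the center of each cluster; the second is the value selected from that cluster. Because the pick is random, repeated runs can select different values. k-means places the centers so as to minimize the sum of squared distances from each value to its cluster center (it can be viewed as a hard-assignment variant of the Expectation Maximization algorithm), which tends to spread the centers across the data.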
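For readers without Algorithm::KMeans, here is a plain-Python sketch of the same idea using Lloyd's k-means on the raw numbers. Several choices are assumptions rather than anything from the answer: the evenly spaced initialisation, picking the value nearest each center instead of a random one (so the result is repeatable), and the helper names `kmeans_1d` and `spread_sample_kmeans`. Like the Perl version, it treats the values as points on a line, so 1 and 355 count as far apart even though they are close on a circle.

```python
def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on a list of numbers (treated as points on a line)."""
    data = sorted(values)
    span = max(k - 1, 1)
    # Hypothetical initialisation: k evenly spaced positions in the sorted data.
    centers = [float(data[round(i * (len(data) - 1) / span)]) for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            # assign each value to its nearest center
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # move each center to the mean of its cluster (keep it if the cluster is empty)
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:   # nothing moved: converged
            break
        centers = new_centers
    return centers, clusters


def spread_sample_kmeans(values, k):
    """Pick the value closest to each cluster center (instead of a random one)."""
    centers, clusters = kmeans_1d(values, k)
    picks = [min(c, key=lambda v: abs(v - ctr)) for ctr, c in zip(centers, clusters) if c]
    return picks, centers


values = [1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
          100, 120, 140, 188, 210, 280, 355]
picks, centers = spread_sample_kmeans(values, 4)
print(picks, centers)
```

The result depends on the starting centers, so it need not match the clusters shown in the YAML output above.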
