Structuring a for loop to output classifier predictions in Python

I have an existing .py file that prints classifier.predict for an SVC model. I would like to loop through each row of the feature set X and return a prediction for each row.
I am currently trying to define the element to iterate over, so that I can define the test statistic feature set for each row.
The test statistic feature set is written in code as:
X_1 = xspace.iloc[testval-1:testval, 0:5]
testval in the line above is the loop variable of this for loop:
for testval in X.T.iterrows():
    print(testval)
I am having trouble returning a basic set of index values for X (X is a pandas DataFrame).
I have tested the following with no success:
for index in X.T.iterrows():
    print(index)
for index in X.T.iteritems():
    print(index)
I am looking for the set of index values, 1-based if possible, like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., n.
This seems simple, but I haven't located an existing question via Stack Overflow or Google.
Also, the individual dataframes I used as the basis for X were refined with the line:
df1.set_index('Date', inplace = True)
Because dates were used as the basis for concatenating the individual dataframes, the loops as written above return date values rather than the positional values I would prefer, hence:
X_1 = xspace.iloc[testval-1:testval, 0:5]
where iloc selects by integer location.
Please ask for additional code if you'd like to see more.
The loops I've written so far return date values; I would like to return the positional index of each row to accommodate the line:
X_1 = xspace.iloc[testval-1:testval, 0:5]

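The loop structure below seems to be working for my application.
# 1-based row positions 1, 2, ..., len(X); the upper bound of range() is exclusive
j = list(range(1, len(X) + 1))
for i in j:
    X_1 = xspace.iloc[i-1:i, 0:5]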
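A minimal sketch of the same idea with enumerate, which yields a 1-based position alongside each date label. The DataFrame here is a made-up stand-in for X, and the column names f1 to f5 are hypothetical; only iloc and the Date index come from the question.

```python
import pandas as pd

# Hypothetical stand-in for X: five feature columns indexed by date.
dates = pd.date_range("2020-01-01", periods=4, freq="D")
X = pd.DataFrame(
    {f"f{k}": list(range(k, k + 4)) for k in range(1, 6)},
    index=dates,
)
X.index.name = "Date"

positions = []
for testval, date in enumerate(X.index, start=1):
    # iloc selects by integer position, so the Date index does not matter here.
    X_1 = X.iloc[testval - 1:testval, 0:5]
    positions.append(testval)
    print(testval, date.date(), X_1.shape)
```

Each X_1 is a one-row DataFrame, which is the two-dimensional shape a scikit-learn style predict call generally expects.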

Related

Python: Finding the row index of a value in 2D array when a condition is met

I have a 2D array PointAndTangent of dimension 8500 x 5. The data is row-wise with 8500 data rows and 5 data values for each row. I need to extract the row index of an element in 4th column when this condition is met, for any s:
abs(PointAndTangent[:,3] - s) <= 0.005
I just need the row index of the first match for the above condition. I tried using the following:
index = np.all([[abs(s - PointAndTangent[:, 3])<= 0.005], [abs(s - PointAndTangent[:, 3]) <= 0.005]], axis=0)
i = int(np.where(np.squeeze(index))[0])
which doesn't work. I get the following error:
i = int(np.where(np.squeeze(index))[0])
TypeError: only size-1 arrays can be converted to Python scalars
I am not very proficient with NumPy in Python. Any suggestions would be great. I am trying to avoid using a for loop, as this is a small part of a huge simulation that I am running.
Thanks!
Possible Solution
I used the following
idx = (np.abs(PointAndTangent[:,3] - s)).argmin()
It seems to work. It returns the row index of the nearest value to s in the 4th column.
You were almost there. np.where is one of the most abused functions in numpy. Half the time, you really want np.nonzero, and the other half, you want to use the boolean mask directly. In your case, you want np.flatnonzero or np.argmax:
mask = abs(PointAndTangent[:,3] - s) <= 0.005
mask is a 1D boolean array that is True where the condition is met and False elsewhere. You can get the indices of all the True entries with flatnonzero and select the first one:
index = np.flatnonzero(mask)[0]
Alternatively, you can select the first one directly with argmax:
index = np.argmax(mask)
The solutions behave differently when no rows meet your condition. The former does indexing, so it will raise an error. The latter will return zero, which can also be a real result.
Both can be written as a one-liner by replacing mask with the expression that was assigned to it.
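The difference in the no-match case can be sketched as follows. The array contents and the helper name first_match are made up for illustration; only the column index 3 and the tolerance 0.005 come from the question.

```python
import numpy as np

# Hypothetical stand-in for PointAndTangent: 6 rows x 5 columns.
PointAndTangent = np.zeros((6, 5))
PointAndTangent[:, 3] = [0.0, 0.5, 1.0, 1.002, 1.5, 2.0]


def first_match(values, s, tol=0.005):
    """Return the first index where |values - s| <= tol, or None if there is none."""
    mask = np.abs(values - s) <= tol
    hits = np.flatnonzero(mask)
    return int(hits[0]) if hits.size else None


print(first_match(PointAndTangent[:, 3], 1.0))
print(first_match(PointAndTangent[:, 3], 0.75))

# argmax alone cannot tell "no match" apart from "match at row 0",
# so it needs an extra any() check.
no_match_mask = np.abs(PointAndTangent[:, 3] - 0.75) <= 0.005
print(np.argmax(no_match_mask), no_match_mask.any())
```

Checking hits.size before indexing avoids the IndexError that flatnonzero(mask)[0] raises on an empty result.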

Compute if loop SPSS

Ultimately, I want to change scores of 0 to 1, scores of 1 to 2, and scores of 2 to 3. I thought one way to do that was using +1, but I realize I could also use a more complicated if then series.
Here is what I did so far:
I used the existing variable (x) to create a new variable (y=x+1) using SPSS syntax. I only want to do this for variables with values >=0 (this was my approach to excluding cells with missing data; the range for x is 0-2).
I can create x+1, but it overwrites the existing variables.
DO REPEAT x =var_1 TO var_86.
if (x>=0) x=(x+1).
end repeat.
exe.
I tried this modification, but it doesn't work:
DO REPEAT x = var_1 TO var_86 / y = var_1a TO var_86a.
IF (x >= 0) y=x +1.
END REPEAT.
EXE.
The error message is:
DO REPEAT The form VARX TO VARY to refer to a range of variables has
been used incorrectly. When using VARX TO VARY to create new
variables, X must be an integer less than or equal to the integer Y.
(Can't use A3 TO A1.)
I tried many other configurations including vectors and loops but haven't yet figured out how to do this computation across the range of variables without overwriting the existing ones. Thanks in advance for any recommendations.
The message you are getting is because SPSS doesn't understand the form var_1a TO var_86a.
For the x to y form to work the number has to be at the end of the name, so for example varA_1 to varA_86 should work.
While you're at it, here's a simple way to go about your task:
recode var_1 TO var_86 (0=1)(1=2)(2=3) into varA_1 TO varA_86.

Extract Data From NetCDF4 File Using List

I am using a list of integers corresponding to an x,y index of a gridded NetCDF array to extract specific values; the initial code was derived from here. My NetCDF file has a single dimension at a single timestep, which is named TMAX2M. My code written to execute this is as follows (please note that I have not shown the import of netCDF4 at the top of the script):
# grid point lists
lat = [914]
lon = [2141]
# Open netCDF File
fh = Dataset('/pathtofile/temperaturedataset.nc', mode='r')
# Variable Extraction
point_list = zip(lat,lon)
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][i,j])
print(dataset_list)
The code executes, and the result is as follows:
[masked_array(data=73, mask=False, fill_value=999999, dtype=int16)]
The data value here is correct, however I would like the output to only contain the integer contained in "data". The goal is to pass a number of x,y points as seen in the example linked above and join them into a single list.
Any suggestions on what to add to the code to make this achievable would be great.
The particular value for each x,y pair at a single timestep in the dataset can be retrieved as follows:
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][:][i,j])
The previously linked example contained [0,16] for the indexed variables; [:] can be used in this case.
I suggest converting to a NumPy array like this:
for i, j in point_list:
    dataset_list.append(np.array(fh.variables['TMAX2M'][i,j]))
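If plain Python integers are wanted in the list, one option is to convert each element with int() and handle masked points explicitly. This is a sketch under assumptions: the small masked grid below is a made-up stand-in for fh.variables['TMAX2M'][:], and the index lists are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for fh.variables['TMAX2M'][:]: a small masked int16 grid.
grid = np.ma.masked_array(
    np.arange(12, dtype=np.int16).reshape(3, 4),
    mask=np.zeros((3, 4), dtype=bool),
)

lat = [0, 2]
lon = [1, 3]

dataset_list = []
for i, j in zip(lat, lon):
    value = grid[i, j]
    # Masked points carry no data; store None rather than the fill value.
    dataset_list.append(None if np.ma.is_masked(value) else int(value))

print(dataset_list)
```

Storing None for masked points keeps the fill value (999999 in the question's output) from being mistaken for a real measurement.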

Use of SumIf function with range object or arrays

I am trying to optimize a sub that uses Excel's SUMIF function, since it takes a long time to finish.
The specific line (inside a For loop) is this one:
Cupones = Application.WorksheetFunction.SumIf(Range("Test_FecFinCup"), Arr_FecFlujos(i), Range("Test_MtoCup"))
where the ranges are named ranges in the workbook and Arr_FecFlujos() is an array of dates.
That code works fine, except that it takes too much time to finish.
I am trying these two approaches.
Arrays:
Declare my arrays
With Test
    Fluj = .Range(Range("Test_Emision").Cells(2, 1), Range("Test_Emision").Cells(2, 1).End(xlDown)).Rows.Count
    Arr_FecFinCup = .Range("Test_FecFinCup")
    Arr_MtoCup = .Range("Test_MtoCup")
End With
Cupones = Application.WorksheetFunction.SumIf(Arr_FecFinCup, Arr_FecFlujos(i), Arr_MtoCup)
The error tells me I need to work with Range objects, so I changed to:
With Test
    Set Rango1 = .Range("Test_FecIniCup")
    Set Rango2 = .Range("Test_MtoCup")
End With
Cupones = Application.WorksheetFunction.SumIf(Rango1, Arr_FecFlujos(i), Rango2)
That one doesn't show any error messages, but the sum is incorrect.
Can anybody tell me what is going wrong with these methods and perhaps point me in the right direction?
It seems that you are trying to sum a range of numbers using a range of criteria:
WorksheetFunction.SumIf(Arr_FecFinCup, Arr_FecFlujos(i), Arr_MtoCup)
As far as I know, if the criteria parameter is given a range, Excel doesn't iterate over that range; instead it looks for the one value in the criteria range that sits on the same row as the cell being calculated.
For example:
Range("D3") = WorksheetFunction.SumIf(Range("A1:A10"),Range("B1:B10"))
Excel will actually calculate this as follows:
Range("D3") = WorksheetFunction.SumIf(Range("A1:A10"),Range("B3"))
If there is no cell on the same row, the result is 0.
For example:
Range("D7") = WorksheetFunction.SumIf(Range("A1:A10"),Range("B1:B5"))
Then D7 is always 0, because [B7] lies outside [B1:B5].
Therefore, to sum with multiple criteria, the correct way is to use SUMIFS, as suggested by @mrtiq.

Split array into smaller unequal-sized arrays dependend on array-column values

I'm quite new to MATLAB and this problem is really driving me insane:
I have a huge array of two columns and about 31,000 rows. One of the two columns holds a spatial coordinate on a grid, the other a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column. Let's say the spatial coordinates range from 0 to 500. I now want arrays that give me the two column values for spatial coordinates 0-10, then 10-20 and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I need to calculate the average values of the resulting columns of every single array, so that I obtain one 2-dimensional point per array.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I fail miserably at step I. Maybe there is even an easier way than splitting the giant array into so many small arrays, who knows.
I would be really happy for any suggestion.
Thank you,
Arne
First of all, since you want arrays of different sizes, you will need to place them in a cell array. You could try something like this:
res = arrayfun(@(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
This returns a cell array with the array split according to the values in its first column. @(x)arr(arr(:,1)==x,:) defines an anonymous function of x, and arrayfun(function, ..., 'UniformOutput', 0) applies that function to each element of the following arguments (taking a single value from each argument per call). Note that arr must be numeric; if it is not, map your values to numeric values or select them another way.
In the same way you could do
uo = 'UniformOutput';
res = arrayfun(@(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))}, unique(arr(:,1)), uo, 0);
You will probably want to flatten the returned value; check the function cat. You could do:
res = cat(1,res{:})
How to plot your data depends on its format, so I can't help without knowing what the data look like, but you could try plotting inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for step k by
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
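For readers who work in Python rather than MATLAB, the same binning steps can be sketched with NumPy. The data, max_value and step_size below are made up for illustration; the half-open ranges mirror the find() condition above.

```python
import numpy as np

# Hypothetical data: column 0 is the spatial coordinate, column 1 the parameter.
X = np.array([
    [1.0, 10.0],
    [4.0, 20.0],
    [12.0, 30.0],
    [25.0, 50.0],
    [29.0, 70.0],
])
max_value, step_size = 30, 10

n_bins = max_value // step_size
# Bin k covers [k*step_size, (k+1)*step_size), like the find() condition.
bin_index = (X[:, 0] // step_size).astype(int)

# Step I: one sub-array per bin (sizes may differ, so keep them in a list).
parts = [X[bin_index == k] for k in range(n_bins)]
# Step II: one 2-D point per non-empty bin, the column-wise mean.
means = np.array([p.mean(axis=0) for p in parts if len(p)])

print(means)
```

The rows of means can then be plotted directly, for example as means[:, 0] against means[:, 1].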
