Spotfire Line chart with min max bars - analytics

I am trying to make a chart that has a line graph showing the change in value in the count column for each month, and then two points showing the min and max value in that month. The data table is below.
Date Min Max Count
1/1/2015 0.28 6.02 13
2/1/2015 0.2 7.72 8
3/1/2015 1 1 1
4/1/2015 0.4 6.87 7
5/1/2015 0.36 3.05 8
6/1/2015 0.17 1.26 13
7/1/2015 0.31 1.59 15
8/1/2015 0.39 3.35 13
9/1/2015 0.22 0.86 10
10/1/2015 0.3 2.48 13
11/1/2015 0.16 0.82 9
12/1/2015 0.33 2.18 5
1/1/2016 0.23 1.16 14
2/1/2016 0.38 1.74 7
3/1/2016 0.1 8.87 9
4/1/2016 0.28 0.68 3
5/1/2016 0.13 3.23 11
6/1/2016 0.33 1 5
7/1/2016 0.28 1.26 4
8/1/2016 0.08 0.41 2
9/1/2016 0.43 0.61 2
10/1/2016 0.49 1.39 4
11/1/2016 0.89 0.89 1
I tried doing a scatter plot but when I try to Add a Line from Column value I get an error saying that the line cannot work on categorical data.
Any suggestions on how I can prepare this visualization?
Thanks!

I would do this in a combination chart.
Insert a combination chart (Line & Bar Graph)
On your X-Axis put your date as <BinByDateTime([Date],"Year.Month",1)>
On your Y-Axis put your aggregations: Sum([Count]), Max([Max]), Min([Min])
Right click > Properties > Series > set the Min and Max to Line Type
(Optional) Change the Y-Axis scale


SQL- Using lead with group by to get closest value for a group

I would like to use the Lead function to get the closest value for a group
Below is some sample data from flx_alps_boundaries
Subject code Grade Score
20-BD-AC-AL 1 1.12
20-BD-AC-AL 2 1.03
20-BD-AC-AL 3 0.97
20-BD-AC-AL 4 0.92
20-BD-AC-AL 5 0.86
20-BD-AC-AL 6 0.84
20-BD-AH-AL 1 1.15
20-BD-AH-AL 2 1.10
20-BD-AH-AL 3 1.05
20-BD-AH-AL 4 1.00
20-BD-AH-AL 5 0.98
20-BD-AH-AL 6 0.96
I am calculating the score for a subject using a formula and getting the grade for the nearest matching score from the above table, e.g. if the score is 0.95 for subject 20-BD-AC-AL, the grade should be 4.
This is my current sql
select top 1
ab.alps_grade as alps_grade,
round( sum (actual_alps_points - expected_alps_points)
/ (count(reference) * 100) + 1,2 ) as alps_score
from alps_cte
inner join [flx_alps_boundaries] ab
on alps_cte.course = ab.course_code
where ab.course_code in ('20-BD-AC-AL','20-BD-AH-AL')
group by course,ab.alps_grade,ab.alps_score
order by abs(round(sum(actual_alps_points
- expected_alps_points)
/ (count(reference)*100) + 1, 2)
- ab.alps_score)
This query only returns one row. How do I use LEAD to get the appropriate grade for each subject's score?
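The example in the question (0.95 gives grade 4) is consistent with reading each boundary score as the minimum needed for its grade; picking the smallest absolute difference would instead select grade 3 (0.97 is closer to 0.95 than 0.92 is). Below is a minimal Python sketch of that boundary reading, applied per subject. It is an assumption about the intended rule, not the SQL/LEAD query asked for, and `grade_for` is a hypothetical helper name.

```python
# Boundary rows copied from the sample table: (subject code, grade, score).
BOUNDARIES = [
    ("20-BD-AC-AL", 1, 1.12), ("20-BD-AC-AL", 2, 1.03), ("20-BD-AC-AL", 3, 0.97),
    ("20-BD-AC-AL", 4, 0.92), ("20-BD-AC-AL", 5, 0.86), ("20-BD-AC-AL", 6, 0.84),
    ("20-BD-AH-AL", 1, 1.15), ("20-BD-AH-AL", 2, 1.10), ("20-BD-AH-AL", 3, 1.05),
    ("20-BD-AH-AL", 4, 1.00), ("20-BD-AH-AL", 5, 0.98), ("20-BD-AH-AL", 6, 0.96),
]


def grade_for(subject, score, boundaries=BOUNDARIES):
    """Return the first grade (1 upwards) whose boundary is <= score, else None.

    Assumption: each boundary score is the minimum score for that grade.
    """
    rows = sorted((grade, bound) for code, grade, bound in boundaries if code == subject)
    for grade, bound in rows:
        if score >= bound:
            return grade
    return None  # score is below the lowest boundary for this subject


# One computed score per subject; the values here are illustrative only.
scores = {"20-BD-AC-AL": 0.95, "20-BD-AH-AL": 1.07}
grades = {subject: grade_for(subject, score) for subject, score in scores.items()}
```

In SQL the same idea amounts to ranking the matching boundary rows within each subject and keeping the first one, rather than taking a single top row across all subjects.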

Repeat certain pandas series values, so that it has an entry for all index values between 1 and 100

I have created a list of pandas series, with each series indexed by numbers between 1 and 100, e.g.
Index Value
1 62.99
4 64.39
37 75.225
65 88.12
74 89.89
79 93.30
88 94.30
92 95.83
100 100.00
What I want to do, either while it is a Series, or as an array after calling .to_numpy() on it, is to fill it out so that my series has 100 values (1 to 100), with any new entries taking the previous existing value, i.e.
Index Value
1 62.99
2 62.99
3 62.99
4 64.39
5 64.39
6 64.39
...
...
36 64.39
37 75.225
38 75.225
and so on.
I can do this programmatically the long-winded way by iterating through each series and checking for a change in value; my question is, is there a version of Series.repeat() which could do this in one hit, or a numpy function which can 'pad out' my array in this manner with my 100 values?
Thanks in advance for reading, and for any suggestions. This isn't homework; it's a genuine question so please don't attack me if my style of asking isn't as you expect.
What you need to do is forward-fill the values in the series.
This code
import pandas as pd

series = pd.Series([33.2, 36, 39, 55], index=[3, 6, 12, 14], name='series')
indices = range(100)
df = pd.DataFrame(indices)
# Join on the index, then forward-fill each gap with the last known value.
series = df.join(series).ffill()['series']
produces
0 NaN
1 NaN
2 NaN
3 33.2
4 33.2
...
95 55.0
96 55.0
97 55.0
98 55.0
99 55.0
The first values are NaN because the series has no earlier value to fill them with.
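A possibly shorter route to the same forward fill, assuming the series index holds unique integers as in the question, is `Series.reindex` followed by `ffill()`. The sketch below reuses the question's sample values; the variable names are illustrative.

```python
import pandas as pd

# Sample series from the question, indexed by a sparse subset of 1..100.
s = pd.Series(
    [62.99, 64.39, 75.225, 88.12, 89.89, 93.30, 94.30, 95.83, 100.00],
    index=[1, 4, 37, 65, 74, 79, 88, 92, 100],
)

# Reindex onto the full 1..100 range (new rows become NaN), then carry
# the last existing value forward into each gap.
filled = s.reindex(range(1, 101)).ffill()

# Plain NumPy array of the 100 padded values, if an array is preferred.
values = filled.to_numpy()
```

If the first existing index were greater than 1, the leading entries would stay NaN; chaining `fillna(0)` is one way to handle that case.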
So here's the solution I went with - an ffill() with fillna(0), joining to range(1,101). I had to iterate through a larger dataset which needed grouping by ID first / taking the maximum 'Pct' per 'Bucket' :-
j=df[['ID','Bucket','Pct']].groupby(['ID','Bucket']).max()
for i in df['ID'].unique():
    index=pd.DataFrame(range(1,101))
    index.columns=['Bucket']
    k=pd.merge(index,j.loc[i],how='left',on='Bucket').ffill().fillna(0)
In:
Bucket Pct
3 0.03
3 0.1
3 0.26
3 0.42
3 0.45
3 0.59
3 0.69
3 0.83
3 0.86
3 0.91
3 0.94
3 0.98
4 1.1
... ...
91 98.89
93 99.08
94 99.17
94 99.26
94 99.43
94 99.48
94 99.63
100 100.0
Out:
Bucket Pct
1 0.00
2 0.00
3 0.98
4 1.83
5 22.83
... ...
91 98.89
92 98.89
93 99.08
94 99.63
95 99.63
96 99.63
97 99.63
98 99.63
99 99.63
100 100.00
Many, many thanks once again to you both!

how to split an array into separate arrays (R)? [duplicate]

I have the array:
>cent
b e r f
A19 60.46 0.77 -0.12 1
A15 16.50 0.53 0.08 2
A17 2.66 0.51 0.20 3
A11 36.66 0.40 -0.25 4
A12 38.96 0.91 0.23 1
A05 0.00 0.29 0.01 2
A09 3.40 0.35 0.03 3
A04 0.00 0.25 -0.03 4
Could someone please tell me how to split this array into 4 separate arrays, where the last column «f» is the flag? As a result I would like to see:
>cent1
b e r f
A19 60.46 0.77 -0.12 1
A12 38.96 0.91 0.23 1
>cent2
b e r f
A15 16.50 0.53 0.08 2
A05 0.00 0.29 0.01 2
….
Should I use a for loop and check the flag "f", or is there a built-in function? Thanks.
We can use split to create a list of data.frames.
lst <- split(cent, cent$f)
NOTE: Here I assumed that 'cent' is a data.frame. If it is a matrix:
lst <- split(as.data.frame(cent), cent[,"f"])
Usually, the list is enough for most analyses. But if we need to create multiple objects in the global environment, we can name the list elements and use list2env (not recommended)
list2env(setNames(lst, paste0("cent", seq_along(lst))), envir = .GlobalEnv)
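For readers working outside R, the same split-by-flag idea can be sketched in plain Python. This is only an analogue of `split`, not part of the answer above; `split_by_flag` is a hypothetical helper, and the rows are copied from the question as tuples.

```python
from collections import defaultdict

# Rows copied from the question: (row name, b, e, r, f).
cent = [
    ("A19", 60.46, 0.77, -0.12, 1),
    ("A15", 16.50, 0.53, 0.08, 2),
    ("A17", 2.66, 0.51, 0.20, 3),
    ("A11", 36.66, 0.40, -0.25, 4),
    ("A12", 38.96, 0.91, 0.23, 1),
    ("A05", 0.00, 0.29, 0.01, 2),
    ("A09", 3.40, 0.35, 0.03, 3),
    ("A04", 0.00, 0.25, -0.03, 4),
]


def split_by_flag(rows, flag_index=-1):
    """Group rows into a dict keyed by the flag column, keeping row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[flag_index]].append(row)
    return dict(groups)


groups = split_by_flag(cent)
# groups[1] holds the A19 and A12 rows, groups[2] the A15 and A05 rows, etc.
```

As with the list returned by `split`, keeping the groups in one container is usually easier to work with than creating separately named variables.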

Plot this kind of graph from data of an array

Good afternoon,
I am working on a Matlab project and I have stored some data in an array. I would like to produce a plot like the one shown below. However, I don't know which plotting function I need to use, or how, to obtain it (it will not be the same image, but in this style).
My data is on a 11x16 - matrix.
Thank you guys so much beforehand!
@rayryeng
It was a really useful answer, although I didn't need that exact shape. I need the shape that my data would create. I've been trying to modify the code you wrote to obtain what I need, but I have not managed it...
My data is
data = [ 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 ;
8.00 8.02 8.04 8.07 8.12 8.20 8.30 8.42 8.53 8.63 8.72 8.80 8.86 8.91 8.96 9.00;
6.00 6.03 6.07 6.12 6.22 6.37 6.59 6.83 7.07 7.28 7.45 7.60 7.72 7.83 7.92 8.00;
4.00 4.03 4.07 4.14 4.26 4.48 4.85 5.26 5.63 5.95 6.21 6.43 6.61 6.75 6.88 7.00;
2.00 2.02 2.05 2.10 2.20 2.44 3.08 3.70 4.23 4.67 5.01 5.29 5.52 5.70 5.86 6.00;
0 0 0 0 0 0 1.33 2.24 2.93 3.47 3.88 4.21 4.46 4.67 4.84 5.00;
0 0 0 0 0 0 0 1.01 1.78 2.38 2.84 3.19 3.46 3.67 3.84 4.00;
0 0 0 0 0 0 0 0 0.80 1.43 1.91 2.25 2.51 2.70 2.86 3.00;
0 0 0 0 0 0 0 0 0 0.63 1.10 1.41 1.62 1.77 1.89 2.00;
0 0 0 0 0 0 0 0 0 0 0.44 0.66 0.79 0.88 0.94 1.00;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
This is my matrix of data (sorry, I know it's long). When I try to plot it by writing:
[x,y] = meshgrid(1:16,1:11);
contourf(x,y,data,20,'LineStyle','none');
colorbar
The result should have a different shape from what I get. I need the parts that are 0 (zeros) to appear like the white part of the plot I showed before (with a different shape, though). I don't really know how to do it (my data should be read properly). If you could help me I would be really thankful.
Thank you so much for last answer.
It depends on your data, but I believe you should use contourf.
This is as close as I could get,
[x,y] = meshgrid(1:16,1:11);
data = - y;
data(end,5:10) = NaN;
data(end-1,6:9) = NaN;
data(end-2,7:8) = NaN;
contourf(x,y,data,20,'LineStyle','none');
colorbar
with,
data = - y .* abs(log(sin(.10 * x - 5.5)+.5));
data(data < -4) = NaN;
So I suppose the code is right and it's a matter of your data, for example with
data = max(data(:)) - data;
What you have is almost correct. All you need to do is set any data that is 0 to NaN. That way, when you throw it into contourf, those parts are not visualized. As such:
data = [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 ;
8.00 8.02 8.04 8.07 8.12 8.20 8.30 8.42 8.53 8.63 8.72 8.80 8.86 8.91 8.96 9.00;
6.00 6.03 6.07 6.12 6.22 6.37 6.59 6.83 7.07 7.28 7.45 7.60 7.72 7.83 7.92 8.00;
4.00 4.03 4.07 4.14 4.26 4.48 4.85 5.26 5.63 5.95 6.21 6.43 6.61 6.75 6.88 7.00;
2.00 2.02 2.05 2.10 2.20 2.44 3.08 3.70 4.23 4.67 5.01 5.29 5.52 5.70 5.86 6.00;
0 0 0 0 0 0 1.33 2.24 2.93 3.47 3.88 4.21 4.46 4.67 4.84 5.00;
0 0 0 0 0 0 0 1.01 1.78 2.38 2.84 3.19 3.46 3.67 3.84 4.00;
0 0 0 0 0 0 0 0 0.80 1.43 1.91 2.25 2.51 2.70 2.86 3.00;
0 0 0 0 0 0 0 0 0 0.63 1.10 1.41 1.62 1.77 1.89 2.00;
0 0 0 0 0 0 0 0 0 0 0.44 0.66 0.79 0.88 0.94 1.00;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
data(data == 0) = NaN;
[x,y] = meshgrid(1:16,1:11);
contourf(x,y,data,20,'LineStyle','none');
colorbar
This is what I get:
Given your comments, you want the y-axis to be reversed. Simply put axis ij; at the end of the code above to flip the y-axis so that y-down is the positive direction. If you do that, we get this figure:
Credit should go to Kamtal as he figured out where you needed to start. I just helped finish off the requirement.

SAS: Using a Loop for Creating Many Data Sets and renaming the variables in them

I have a dataset in long format, for example:
time subject var1 var2 var3
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
I would like to get a dataset in a wide format as
time var1_sub1 var2_sub1 var3_sub1 var1_sub2 var2_sub2 var3_sub2
1 0.41 0.48 0.85 0.76 0.49 0.03
2 0.58 0.38 0.15 0.36 0.26 0.93
3 0.08 0.39 0.96 0.83 0.88 0.63
4 0.58 0.87 0.15 0.19 0.65 0.99
5 0.55 0.40 0.67 0.89 0.91 0.47
So far, I came up with an idea to do it in the following way:
data data_sub1;
set data;
if subject=1;
var1_sub1=var1;
var2_sub1=var2;
var3_sub1=var3;
run;
data data_sub2;
set data;
if subject=2;
var1_sub2=var1;
var2_sub2=var2;
var3_sub2=var3;
run;
proc sort data=data_sub1;
by time;
run;
proc sort data=data_sub2;
by time;
run;
data datamerged;
merge data_sub1 data_sub2;
by time;
run;
It works and everything is fine, but I would like to learn how to code it more elegantly, since in practice I have many more subjects and variables.
This is a PROC TRANSPOSE problem. To solve most PROC TRANSPOSE problems, make it totally vertical (one value-one variable name per row) and then transpose using the ID statement.
data have;
input time subject var1 var2 var3;
datalines;
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
;;;;
run;
data have_vert;
set have;
array vars var:;
do _t = 1 to dim(vars);
id=cats(vname(vars[_t]),'_','sub',subject); *this is our future variable name;
value = vars[_t]; *this is our future variable value;
output;
end;
keep time id value subject;
run;
proc sort data=have_vert;
by time subject id;
run;
proc transpose data=have_vert out=want;
by time;
var value;
id id;
run;
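The reshaping logic above (build a `var_subN` name for every value, then collect the values by time) can be illustrated outside SAS as well. The following is a plain-Python sketch of the same long-to-wide idea, not a replacement for PROC TRANSPOSE; `long_to_wide` is a hypothetical helper, and the rows are the sample data as tuples.

```python
# Sample rows in long format: (time, subject, var1, var2, var3).
rows = [
    (1, 1, 0.41, 0.48, 0.85), (2, 1, 0.58, 0.38, 0.15), (3, 1, 0.08, 0.39, 0.96),
    (4, 1, 0.58, 0.87, 0.15), (5, 1, 0.55, 0.40, 0.67),
    (1, 2, 0.76, 0.49, 0.03), (2, 2, 0.36, 0.26, 0.93), (3, 2, 0.83, 0.88, 0.63),
    (4, 2, 0.19, 0.65, 0.99), (5, 2, 0.89, 0.91, 0.47),
]


def long_to_wide(rows, var_names=("var1", "var2", "var3")):
    """Return one dict per time value, with keys such as 'var1_sub2'."""
    wide = {}
    for time, subject, *values in rows:
        record = wide.setdefault(time, {"time": time})
        for name, value in zip(var_names, values):
            # Same naming scheme as the id variable built in the SAS step.
            record[f"{name}_sub{subject}"] = value
    return [wide[t] for t in sorted(wide)]


wide_rows = long_to_wide(rows)
```

Because the column names are generated from the data, adding more subjects or variables needs no change to the code, which is the same property the ID statement provides in the SAS version.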
