How can I sum values with multiple conditions including different dates - arrays

I have some data as follows (columns A:D contain the data; column E is the sum I created):
NO
SE
Date        Country  ID       Value  Sum
30-01-2014  SE       B-08888  10     10
05-02-2014  SE       B-08888  23
06-02-2014  SE       B-08888  20
13-05-2014  SE       B-08888  17     27
14-05-2014  SE       B-08888  10
13-05-2014  NO       A-07777  15     35
14-05-2014  NO       A-07777  20
I would like to sum all values that have the same country and the same ID, separately for: 1) dates after 1/5; and 2) dates before 1/5.
I am using SUMIFS, but it doesn't give the correct result when I include the date condition for dates before 1/5.
=SUMIFS($D$5:$D$11;$A$5:$A$11;"<="&DATE(2014;5;1);$B$5:$B$11;A2;$C$5:$C$11;C5) ==> gives incorrect result (=10)
=SUMIFS($D$5:$D$11;$A$5:$A$11;">="&DATE(2014;5;1);$B$5:$B$11;A2;$C$5:$C$11;C8) ==> gives correct result (=27)
Is there a way to take both date conditions into account (i.e. dates after and before 1/5) and make the formula general, so I don't have to go through every cell to change the references?
Thank you.

Using your data, the second formula returns 27 for me, so I assume the cell references you have not mentioned are as I have guessed. The first formula returns 53 for me, which I suspect is the result you want, though you have not said so.
Something is wrong with your data (not the formulae). The most likely cause is that there is a trailing space in C6 and C7 that is not in C5. Copying C5 down to C9 should fix that. There might however be a data issue in other cells in those two rows.
It might make things easier for you if the formulae were in separate columns.
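To see why a trailing space would produce exactly the reported 10 instead of 53, here is a minimal Python sketch of the same conditional sum. The position of the trailing spaces (the second and third rows) is the guess made above, not something confirmed by the question, and `sum_before` is a hypothetical helper name.

```python
from datetime import date

# Rows from the question: (date, country, id, value).
# Assumption: the 2nd and 3rd IDs carry a trailing space, as guessed above.
rows = [
    (date(2014, 1, 30), "SE", "B-08888", 10),
    (date(2014, 2, 5), "SE", "B-08888 ", 23),
    (date(2014, 2, 6), "SE", "B-08888 ", 20),
    (date(2014, 5, 13), "SE", "B-08888", 17),
    (date(2014, 5, 14), "SE", "B-08888", 10),
    (date(2014, 5, 13), "NO", "A-07777", 15),
    (date(2014, 5, 14), "NO", "A-07777", 20),
]

CUTOFF = date(2014, 5, 1)


def sum_before(rows, country, id_, clean=False):
    """Sum values dated on or before CUTOFF that match country and ID exactly."""
    total = 0
    for d, c, i, v in rows:
        if clean:
            i = i.strip()  # drop stray leading/trailing whitespace
        if d <= CUTOFF and c == country and i == id_:
            total += v
    return total


dirty_total = sum_before(rows, "SE", "B-08888")              # exact match only
clean_total = sum_before(rows, "SE", "B-08888", clean=True)  # after trimming
print(dirty_total, clean_total)
```

With the assumed spaces, the exact match only picks up the first row, while the trimmed comparison picks up all three rows before the cutoff.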

Related

Google Sheets: ArrayFormula sumifs, but return some symbol or empty cell if there's nothing to sum

I have the following table:
`Jan` `Feb` `Mar`
`P7` `Q7` `R7` 56.80 0 0
`Column E Column I Column N`
17 expense Jan-5 15.87
18 $ Jan-9 56.80
19 expense Feb-8 38.12
20 expense Mar 5 45.38
21 $ Mar-12 0.00
So I have `Cell P7` with the following formula `=ArrayFormula(sumifs(N17:N,E17:E,"$",MONTH(I17:I),1))`
and `Cell Q7` with the following formula
`=ArrayFormula(sumifs(N17:N,E17:E,"$",MONTH(I17:I),2))`
and `Cell R7` with the following formula
`=ArrayFormula(sumifs(N17:N,E17:E,"$",MONTH(I17:I),3))`
that checks `Column I` for January dates (for `P7`), February dates (for `Q7`) and March dates (for `R7`), then checks if there's at least one sign `$` in corresponding `Column E` for those dates, and if there is, sums all corresponding amounts in `Column N`.
Now, my problem is this: the formula, as it is now, returns `0` even if there is no `$` sign for a specific month, as for February in the above table. There is no `$` sign in `Column E` for February, yet cell `Q7` shows `0`.
I would like it to:
- return nothing at all (empty cell instead of `0`) if there are no amounts in `Column N` for a specific month marked with `$` sign in `Column E`
- return `0` only if there is `0.00` amount in `Column N` along with the `$` sign in corresponding `Column E`.
or
- return something else, like `~`, if there's nothing to sum up.
Let's say for March I could put `$` in `E21` and `~` in `N21` and see that `~` returned in `R7` if there are no more amounts to sum up.
Is there a way to do it?
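Thanks to MAttKing's reference: simply wrap your formula in IFERROR(1/(1/ ComplexFunction() )). The expression 1/(1/x) returns x for any nonzero x and raises a division error when x is 0, which IFERROR then turns into an empty cell. It should look something like this:
=IFERROR(1/(1/ArrayFormula(sumifs(D3:D,A3:A,"$",MONTH(B3:B),2))))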
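As a rough illustration of the double-reciprocal trick, here is a small Python sketch. Note that the trick blanks every zero result, including a genuine `0.00` marked with `$`, so the second function shows one possible way to blank only when nothing matched. The row layout and both function names are assumptions made for the example, not part of the spreadsheet.

```python
def blank_if_zero(x):
    """Mimic IFERROR(1/(1/x)): zero raises a division error, which becomes blank."""
    try:
        return 1 / (1 / x)
    except ZeroDivisionError:
        return ""


def sum_or_blank(rows, month):
    """Return the sum of '$' rows for the month, or blank if no row matched."""
    matched = [amount for flag, m, amount in rows if flag == "$" and m == month]
    return sum(matched) if matched else ""


# (marker in column E, month of the date in column I, amount in column N)
rows = [
    ("expense", 1, 15.87),
    ("$", 1, 56.80),
    ("expense", 2, 38.12),
    ("expense", 3, 45.38),
    ("$", 3, 0.00),
]

print(repr(blank_if_zero(0)))         # zero is blanked
print(repr(sum_or_blank(rows, 2)))    # no '$' row in February, so blank
print(repr(sum_or_blank(rows, 3)))    # a real 0.00 marked with '$' is kept
```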

Median-If With Month Criteria not working in LibreOffice

I have a simple spreadsheet like below, with columns:
A: Timestamp
B: A numerical result
C: Time duration to compute above result
I want to compute the median value for duration for year 2019 March in cell I4. I used the following formula for it:
{=MEDIAN(IF((YEAR(A:A) = G1) * (MONTH(A:A) = 3), C:C))}
I expect 48.5 to appear (the median of 41 and 56), but it shows a #VALUE! error when entered using Ctrl-Shift-Enter.
Can someone point out where the problem might be?
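For reference, the calculation the formula is meant to perform can be sketched in Python as follows. This only restates the intended logic; it does not diagnose the #VALUE! error. The sample rows are hypothetical, chosen so that March 2019 contains the two durations 41 and 56 mentioned above.

```python
from datetime import datetime
from statistics import median

# Hypothetical rows: (timestamp from column A, duration from column C)
rows = [
    (datetime(2019, 2, 27, 9, 0), 30),
    (datetime(2019, 3, 4, 10, 15), 41),
    (datetime(2019, 3, 18, 16, 40), 56),
    (datetime(2019, 4, 2, 8, 5), 72),
]


def median_for_month(rows, year, month):
    """Median of the durations whose timestamp falls in the given year and month."""
    durations = [d for ts, d in rows if ts.year == year and ts.month == month]
    return median(durations) if durations else None


march_median = median_for_month(rows, 2019, 3)
print(march_median)
```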

Confused with the declaration of 2D arrays in COBOL

So let's assume that I have a file which consists of 10 students with 3 fields: Name, Gender, Age. So, theoretically, I would want to create a 10 by 3 array.
But when it comes to COBOL, two-dimensional tables are created by this example:
01 WS-TABLE.
   05 WS-A OCCURS 10 TIMES.
      10 WS-B PIC A(10).
      10 WS-C OCCURS 5 TIMES.
         15 WS-D PIC X(6).
In this example, I can not understand what WS-B and WS-D are. If I want to create an array like the one I mentioned (10 by 3), how may I do so?
Thanks
First of all, COBOL doesn't have arrays per se; it has tables. There is no separate declaration for a 2-dimensional table: extra dimensions are made by nesting OCCURS clauses, and the example you have given is exactly that (a nested table). If I were faced with your problem (a file of 10 students with Name, Gender and Age), I would structure my data like this:
01 WS-TABLE.
   05 WS-STUDENT OCCURS 10 TIMES.
      10 WS-NAME PIC X(10).
      10 WS-GENDER PIC X.
      10 WS-AGE PIC 9(3).
In this example I would use a subscript to access the fields of each student. This is what a loop to display them all would look like:
PERFORM VARYING WS-X
        FROM 1 BY 1
        UNTIL WS-X > 10
    DISPLAY "NAME: " WS-NAME(WS-X) " GENDER: " WS-GENDER(WS-X) " AGE: " WS-AGE(WS-X)
END-PERFORM
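If it helps to think of this in terms of another language, the layout above is closer to a list of 10 records with three named fields than to a 10 by 3 matrix. A rough Python analogy, with placeholder student data invented purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str    # corresponds to WS-NAME   PIC X(10)
    gender: str  # corresponds to WS-GENDER PIC X
    age: int     # corresponds to WS-AGE    PIC 9(3)


# Placeholder data (hypothetical): 10 entries, like OCCURS 10 TIMES
students = [
    Student(f"Student{i}", "F" if i % 2 else "M", 18 + i) for i in range(1, 11)
]

# Same idea as the PERFORM VARYING loop: one subscript selects a whole record
for index, s in enumerate(students, start=1):
    print(f"{index}: NAME: {s.name} GENDER: {s.gender} AGE: {s.age}")
```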

MATLAB sort function yields tampered results

I have a vector of 126 elements which is usually correctly sorted; however, I always sort it to make sure everything is okay.
The problem is that when the array is already sorted, performing a sort changes the original values of the array.
I attached the array as a CSV file and ran the script below: I put the vector in the first column of 'a', sort it into the second column, then compute the differences in the third column.
a = csvread('a.csv')
a(:,2)=sort(a(:,1))
a(:,3)=a(:,2)-a(:,1)
result=sum(a(:,3).^2)
You can easily see that the first two columns aren't identical, and the third column has some nonzero values.
The array in MATLAB syntax:
a = [17.4800
18.6800
19.8800
21.0800
22.2800
23.4800
24.6800
25.8800
27.0800
28.2800
29.4800
30.6800
46.1600
47.3600
48.5600
49.7600
50.9600
52.1600
53.3600
54.5600
55.7600
56.9600
58.1600
59.3600
74.8400
76.0400
77.2400
78.4400
79.6400
80.8400
103.5200
104.7200
105.9200
107.1200
108.3200
109.5200
110.7200
111.9200
113.1200
114.3200
115.5200
116.7200
132.2000
133.4000
134.6000
135.8000
137.0000
138.2000
139.4000
140.6000
141.8000
143.0000
144.2000
145.4000
165.4200
166.6200
167.8200
169.0200
170.2200
171.4200
172.6200
173.8200
175.0200
176.2200
177.4200
178.6200
179.9300
181.1300
182.3300
183.5300
184.7300
185.9300
187.1300
188.3300
189.5300
201.3700
202.5700
203.7700
204.9700
206.1700
207.3700
236.1100
237.3100
238.5100
239.7100
240.9100
242.1100
243.3100
244.5100
245.7100
246.9100
248.1100
249.3100
239.8400
241.0400
242.2400
276.9900
278.1900
279.3900
280.5900
281.7900
282.9900
284.1900
285.3900
286.5900
287.7900
288.9900
290.1900
277.8200
279.0200
280.2200
281.4200
282.6200
283.8200
285.0200
286.2200
287.4200
288.6200
289.8200
291.0200
291.0700
292.2700
293.4700
295.6900
296.8900
298.0900];
Your original vector is unfortunately not sorted. Sorting it therefore cannot reproduce the original vector, because the values that were out of order get moved into order.
You can check this by using diff on the read in vector from the CSV file and seeing if there are any negative differences. diff takes the difference between the (i+1)th value and the ith value and if your values are monotonically increasing, you should get positive differences all around. We can see which locations are affected by finding values in the difference that are negative:
a = csvread('a.csv');
ind = find(diff(a) < 0);
We get:
>> ind
ind =
93
108
This says that the order breaks right after locations 93 and 108; the drops themselves occur at locations 94 and 109. Let's check elements 90 to 110 of your vector to be sure:
>> a(90:110)
ans =
245.7100 % 90
246.9100 % 91
248.1100 % 92
249.3100 % 93
239.8400 %<-------
241.0400
242.2400
276.9900
278.1900
279.3900
280.5900
281.7900
282.9900
284.1900
285.3900
286.5900
287.7900 % 106
288.9900 % 107
290.1900 % 108
277.8200 % <------
279.0200
As you can see, the values dip right after locations 93 and 108. If you sort the vector and then take the difference, the early locations show a difference of 0, but the two columns stop matching as soon as the misplaced values are moved back to where they belong: 239.84 (originally at location 94) sorts in ahead of 240.91, so the mismatch already starts at location 86.
I'm frankly surprised you didn't see that they're out of order because your snapshot clearly shows there's a decrease in value on the left column towards the top of the snapshot.
Therefore, either check your data to see if you have input it correctly, or modify whatever process you're working on to ensure that it can handle unsorted data.
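The same check is easy to reproduce outside MATLAB. Here is a minimal Python sketch of the `find(diff(a) < 0)` idea, run on a short excerpt of the vector around the first dip; `out_of_order_positions` is a hypothetical helper name.

```python
def out_of_order_positions(values):
    """Return 1-based positions i where values[i+1] < values[i].

    This mirrors MATLAB's find(diff(a) < 0).
    """
    return [i + 1 for i in range(len(values) - 1) if values[i + 1] < values[i]]


# Excerpt around the first dip (elements 91 to 97 of the original vector)
sample = [246.91, 248.11, 249.31, 239.84, 241.04, 242.24, 276.99]

positions = out_of_order_positions(sample)
print(positions)                 # positions within the excerpt, not the full vector
print(sorted(sample) == sample)  # False whenever any dip exists
```

An empty result means the data is already in non-decreasing order, so sorting it would leave it unchanged.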

What are some good ways to compress data across time?

I have an array of objects with time and value property. Looks something like this.
UPDATE: dataset with epoch times rather than time strings
[{datetime:1383661634, value: 43},{datetime:1383661856, value: 40}, {datetime:1383662133, value: 23}, {datetime:1383662944, value: 23}]
The array is far larger than this, possibly with a six-digit length. I intend to build a graph to represent this array. For obvious reasons, I cannot use every data point to build this graph (value vs time), so I need to normalize it across time.
So here's the main problem: there is no trend in the timestamps of these objects, so I need to dynamically choose slots of time in which I either average out the values or show counts of objects in that slot.
How can I calculate slots that are user friendly, i.e. per minute, hour, day, eight hours or so? I am looking at a maximum of 25 slots computed from the array, which I then show on the graph.
I hope this helps get my point through.
You can convert the date/time into epoch and use numpy.histogram to get the ranges:
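import random
import numpy

# 1000 random integers standing in for epoch timestamps
l = [random.randint(0, 1000) for x in range(1000)]
# Split the value range into 25 equal-width bins: counts per bin plus the 26 bin edges
num_items_bins, bin_ranges = numpy.histogram(l, 25)
print(num_items_bins)
print(bin_ranges)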
Gives:
[34 38 42 41 43 50 34 29 37 46 31 47 43 29 30 42 38 52 42 44 42 42 51 34 39]
[ 1. 40.96 80.92 120.88 160.84 200.8 240.76 280.72
320.68 360.64 400.6 440.56 480.52 520.48 560.44 600.4
640.36 680.32 720.28 760.24 800.2 840.16 880.12 920.08
960.04 1000. ]
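Equal-width bins like these do not land on round units of time. If the slots should be user friendly (per minute, per hour, per eight hours, per day), one possible approach is to pick the smallest width from a fixed list that keeps the slot count at or below 25, and then average the values per slot. The sketch below uses only the standard library; the list of widths and both function names are assumptions made for illustration.

```python
# Candidate slot widths in seconds: 1 min, 5 min, 15 min, 1 h, 8 h, 1 day, 1 week
NICE_WIDTHS = [60, 300, 900, 3600, 8 * 3600, 86400, 7 * 86400]


def choose_width(timestamps, max_slots=25):
    """Pick the smallest candidate width that covers the data in <= max_slots slots."""
    lo, hi = min(timestamps), max(timestamps)
    for width in NICE_WIDTHS:
        slots = hi // width - lo // width + 1  # slots aligned to multiples of width
        if slots <= max_slots:
            return width
    # Fallback: the largest width, which may still exceed max_slots for long spans
    return NICE_WIDTHS[-1]


def bucket_average(points, width):
    """Average 'value' per slot; keys are slot start times in epoch seconds."""
    sums = {}
    for p in points:
        start = p["datetime"] // width * width
        total, count = sums.get(start, (0, 0))
        sums[start] = (total + p["value"], count + 1)
    return {start: total / count for start, (total, count) in sorted(sums.items())}


points = [
    {"datetime": 1383661634, "value": 43},
    {"datetime": 1383661856, "value": 40},
    {"datetime": 1383662133, "value": 23},
    {"datetime": 1383662944, "value": 23},
]

width = choose_width([p["datetime"] for p in points])
print(width, bucket_average(points, width))
```

Counting objects per slot instead of averaging only requires returning `count` rather than `total / count`.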
It is hard to say without knowing the nature of your values. Compressing values for display is a matter of what you can afford to discard and what you can't. Some ideas though:
histogram
candlestick chart
Is this JSON, with the DateTimes transmitted as text?
Why not transmit the date as a long (Int64), and use a method to convert to/from DateTime? Depending on the language, you could use these implementations:
DateTime to Long in C#
Date to long using Unix timestamp in Java
That alone would save you a considerable amount of space, since strings in C# and Java use 16 bits per character, while the long timestamp takes just 64 bits in total.
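The same round trip can be sketched in Python with the standard library. The choice of UTC and the helper names `to_epoch` and `from_epoch` are assumptions for the example; the timestamp is the first one from the question.

```python
from datetime import datetime, timezone


def to_epoch(dt):
    """Convert a timezone-aware datetime to whole epoch seconds."""
    return int(dt.timestamp())


def from_epoch(seconds):
    """Convert epoch seconds back to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = 1383661634
as_text = from_epoch(stamp).isoformat()

# Compare how many characters each form needs when sent as text
print(len(as_text), "characters as ISO text")
print(len(str(stamp)), "characters as an epoch integer")
```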
