How to set RRD to store data for 2 years?

I'm monitoring more than 300 servers with Ganglia, which uses RRD as the database to collect and store data about each server's resources.
I would like to keep about 2 years of history or more, so after reading this article, I think my RRA configuration should be:
RRAs "RRA:AVERAGE:0.5:1:17520"
17520 = (365 days [year] x 2) * 24 [hour]
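The 17520 figure assumes that every row holds one hour. How far an RRA actually reaches depends on the RRD's base step, because each row covers `step × steps` seconds. A minimal sketch for checking this; the 15-second step is only a hypothetical comparison value, not something taken from this setup:

```python
SECONDS_PER_DAY = 24 * 3600


def rows_needed(period_seconds: int, step_seconds: int, steps: int = 1) -> int:
    """Rows an RRA needs so that rows * steps * step covers the period."""
    seconds_per_row = step_seconds * steps
    # ceiling division, so the archive never falls short of the period
    return -(-period_seconds // seconds_per_row)


two_years = 2 * 365 * SECONDS_PER_DAY

# 17520 rows is right only if each row represents one hour
print(rows_needed(two_years, step_seconds=3600))  # 17520
# with a (hypothetical) 15-second step and steps=1, far more rows are needed
print(rows_needed(two_years, step_seconds=15))  # 4204800
```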
This is the Ganglia default configuration, which is running today:
#
# Round-Robin Archives
# You can specify custom Round-Robin archives here (defaults are listed below)
#
# RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
# "RRA:AVERAGE:0.5:5760:374"
#
Is my way of thinking right, or am I missing something here?
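One way to check this is to compute how far back each default archive reaches: resolution = step × steps, retention = resolution × rows. The sketch below assumes a 15-second base step; that value is an assumption here, so substitute the step your RRD files were actually created with.

```python
def rra_retention(rra: str, step_seconds: int) -> tuple[int, int]:
    """Return (seconds per row, seconds retained) for an 'RRA:CF:xff:steps:rows' string."""
    _, _cf, _xff, steps, rows = rra.split(":")
    resolution = step_seconds * int(steps)
    return resolution, resolution * int(rows)


GANGLIA_DEFAULTS = [
    "RRA:AVERAGE:0.5:1:244",
    "RRA:AVERAGE:0.5:24:244",
    "RRA:AVERAGE:0.5:168:244",
    "RRA:AVERAGE:0.5:672:244",
    "RRA:AVERAGE:0.5:5760:374",
]

ASSUMED_STEP = 15  # seconds; hypothetical, check the step of your own RRDs

for rra in GANGLIA_DEFAULTS:
    resolution, retention = rra_retention(rra, ASSUMED_STEP)
    print(f"{rra}: {resolution} s per row, about {retention / 86400:.2f} days")
```

Under that assumption the five archives cover roughly an hour, a day, a week, a month and a year, so a single `steps=1` archive with 17520 rows would hold only about 3 days.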

After studying this subject for a while, I came up with an answer that may help someone in the future. I read these two articles many times and recommend them.
Read this one first: Creating an initial RRD. Then read this one: How to create an RRDTool database.
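I will try to explain it simply. The format is RRA:CF:xff:steps:rows, where:
RRA: Round Robin Archive
CF: Consolidation Function (AVERAGE, MIN, MAX or LAST)
xff: X-Files Factor (the fraction of a consolidation interval that may be unknown while the consolidated value is still regarded as known)
steps: how many primary data points (one per step) are consolidated into each row
rows: how many consolidated rows the archive keeps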
The biggest issue for me was figuring out the right values for steps and rows.
After reading, I came up with this layout:
1 day - 5-minute resolution
1 week - 15-minute resolution
1 month - 1-hour resolution
1 year - 6-hour resolution
RRA:AVERAGE:0.5:1:288 \
RRA:AVERAGE:0.5:3:672 \
RRA:AVERAGE:0.5:12:744 \
RRA:AVERAGE:0.5:72:1480
Keep in mind that our step is 300 seconds, so the idea is very simple:
If I want to cover one day (86400 seconds) at 5-minute resolution, as in the first archive, how many rows do I need? The answer is 288 rows. Why?
`86400 seconds [1 day] / 300 seconds [5 minutes] = 288 rows`
Another example: to cover
1 week [= 604800 seconds] at 15-minute resolution [= 900 seconds]: 604800 / 900 = 672 rows
The same calculation gives the number of rows for the other archives.
Finding out how many steps you need is just as simple: divide the resolution you want by the step.
Let me explain: our step is 300 seconds, right?
So for 5-minute resolution [= 300 seconds], we just multiply the step by 1.
For 15 minutes it is 300 seconds x 3, for 1 hour 300 x 12, for 6 hours 300 x 72, and so on.
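The two rules (rows = period / resolution, steps = resolution / step) can be wrapped in a small helper. This is a sketch assuming the 300-second step above; `make_rra` is a hypothetical name, not part of RRDtool:

```python
def make_rra(step: int, resolution: int, period: int, cf: str = "AVERAGE", xff: float = 0.5) -> str:
    """Build an RRA definition keeping `period` seconds at `resolution` seconds per row."""
    if resolution % step or period % resolution:
        raise ValueError("resolution must be a multiple of step, period a multiple of resolution")
    steps = resolution // step   # primary data points consolidated into each row
    rows = period // resolution  # rows needed to cover the whole period
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


MIN, HOUR, DAY = 60, 3600, 86400
STEP = 300

print(make_rra(STEP, 5 * MIN, DAY))         # RRA:AVERAGE:0.5:1:288
print(make_rra(STEP, 15 * MIN, 7 * DAY))    # RRA:AVERAGE:0.5:3:672
print(make_rra(STEP, HOUR, 31 * DAY))       # RRA:AVERAGE:0.5:12:744
print(make_rra(STEP, 6 * HOUR, 365 * DAY))  # RRA:AVERAGE:0.5:72:1460
```

For exactly 365 days the last archive needs 1460 rows; the 1480 rows used above keep 370 days, slightly more than a year.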
In my specific case, I want my step to be 30 seconds, so I came up with this structure:
steps  row written        resolution  calculation
1      every time         30 seconds  1 * 30s = 30s
2      every 2nd time     1 minute    2 * 30s = 1m
4      every 4th time     2 minutes   4 * 30s = 2m
10     every 10th time    5 minutes   10 * 30s = 5m
20     every 20th time    10 minutes  20 * 30s = 10m
60     every 60th time    30 minutes  60 * 30s = 30m
80     every 80th time    40 minutes  80 * 30s = 40m
100    every 100th time   50 minutes  100 * 30s = 50m
120    every 120th time   1 hour      120 * 30s = 1h
240    every 240th time   2 hours     240 * 30s = 2h
360    every 360th time   3 hours     360 * 30s = 3h
RRA:AVERAGE:0.5:1:120 \
RRA:AVERAGE:0.5:2:120 \
RRA:AVERAGE:0.5:4:120 \
RRA:AVERAGE:0.5:10:288 \
RRA:AVERAGE:0.5:20:1008 \
RRA:AVERAGE:0.5:60:1440 \
RRA:AVERAGE:0.5:80:3240 \
RRA:AVERAGE:0.5:100:5184 \
RRA:AVERAGE:0.5:120:8760 \
RRA:AVERAGE:0.5:240:8760 \
RRA:AVERAGE:0.5:360:8760
Which means:
1 hour - 30 seconds resolution
2 hours - 1 minute resolution
4 hours - 2 minutes resolution
1 day - 5 minutes resolution
1 week - 10 minutes resolution
1 month - 30 minutes resolution
3 months - 40 minutes resolution
6 months - 50 minutes resolution
1 year - 1 hour resolution
2 years - 2 hour resolution
3 years - 3 hour resolution
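The list can be checked by computing, for each archive, the resolution (30 s × steps) and the coverage (resolution × rows). A sketch, assuming the 30-second step stated above:

```python
STEP = 30  # seconds, as chosen above

RRAS = [
    "RRA:AVERAGE:0.5:1:120",
    "RRA:AVERAGE:0.5:2:120",
    "RRA:AVERAGE:0.5:4:120",
    "RRA:AVERAGE:0.5:10:288",
    "RRA:AVERAGE:0.5:20:1008",
    "RRA:AVERAGE:0.5:60:1440",
    "RRA:AVERAGE:0.5:80:3240",
    "RRA:AVERAGE:0.5:100:5184",
    "RRA:AVERAGE:0.5:120:8760",
    "RRA:AVERAGE:0.5:240:8760",
    "RRA:AVERAGE:0.5:360:8760",
]


def describe(rra: str, step: int = STEP) -> tuple[int, float]:
    """Return (seconds per row, hours covered) for an RRA:CF:xff:steps:rows string."""
    steps, rows = (int(field) for field in rra.split(":")[3:5])
    resolution = step * steps
    return resolution, resolution * rows / 3600


for rra in RRAS:
    resolution, hours = describe(rra)
    print(f"{rra:26} {resolution:>6} s per row, {hours / 24:g} days")
```

The 80-step and 100-step archives come out at 90 and 180 days, which is what "3 months" and "6 months" mean here.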
Well, I hope this helps someone. That's all.

Related

Avoid multiple counting of overlapped times

I have calculated the overlap times between start_time and end_time for groups of IDs for multiple user login sessions. ID is unique to a user. A user can have multiple sessions from different browsers, devices, etc.
Here's the dataset I have:
row id start_time end_time overlap_in_seconds
1 1 08:41:27 08:47:26 359
2 1 08:39:31 08:40:42 71
3 1 08:41:37 08:47:26 349
Notice that for rows 1 and 3, the overlap between 08:41:37 and 08:47:26 has been counted twice.
There are two ways I would like to show the result:
Option 1:
row id start_time end_time overlap_in_seconds
1 1 08:41:27 08:47:26 10 (the extra overlap time from 08:41:27 to 08:41:37)
2 1 08:39:31 08:40:42 71
3 1 08:41:37 08:47:26 349
Option 2:
I use this table as a temp table, and when the outer query sums overlap_in_seconds, I want the total overlap time to be 430 seconds (10 + 71 + 349), not 779 (349 + 359 + 71).
meeting_id correct_overlap
1 430
Below is my outer query.
select temp.id as meeting_id, sum(temp.Overlap_in_seconds) as correct_overlap
from temp
group by temp.id
Any ideas how to do this?
I'm not sure if this is the quickest way to do it, but what I would do is convert every session to an 86400-bit array, where each position corresponds to one second of the day. Then OR all the arrays in a group together; the number of set bits is the total number of covered seconds for that group.
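A minimal sketch of the bit-array idea, assuming every session starts and ends on the same day and that the end time is exclusive (which matches the table, where 08:41:27 to 08:47:26 counts as 359 seconds). Setting a slot that is already set plays the role of the OR, so overlapping seconds are counted once; `covered_seconds` is a hypothetical helper name:

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def to_seconds(hhmmss: str) -> int:
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def covered_seconds(sessions: list[tuple[str, str]]) -> int:
    """Seconds covered by at least one session, treating each as [start, end)."""
    bits = bytearray(SECONDS_PER_DAY)  # one slot per second of the day
    for start, end in sessions:
        for second in range(to_seconds(start), to_seconds(end)):
            bits[second] = 1  # re-setting an already-set slot is the OR
    return sum(bits)


rows = [
    (1, "08:41:27", "08:47:26"),
    (1, "08:39:31", "08:40:42"),
    (1, "08:41:37", "08:47:26"),
]

sessions_by_id = defaultdict(list)
for user_id, start, end in rows:
    sessions_by_id[user_id].append((start, end))

for user_id, sessions in sessions_by_id.items():
    print(user_id, covered_seconds(sessions))  # 1 430
```

For id 1 this yields 430 seconds, the `correct_overlap` the question asks for.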

Working with an upper-triangular array in SAS (challenge +2 Points)

I'm looking to improve my code efficiency by turning my code into arrays and loops. The data I'm working with starts off like this:
ID Mapping Asset Fixed Performing Payment2017 Payment2018 Payment2019 Payment2020
1 Loan1 1 1 1 90 30 30 30
2 Loan1 1 1 0 80 20 40 20
3 Loan1 1 0 1 60 40 10 10
4 Loan1 1 0 0 120 60 30 30
5 Loan2 ... ... ... ... ... ... ...
So for each ID (essentially the data sorted by Mapping, Asset, Fixed and then Performing), I'm looking to build a profile for the payment scheme.
The Payment Vector for the first ID looks like this:
PaymentVector1 PaymentVector2 PaymentVector3 PaymentVector4
1 0.33 0.33 0.33
It is represented by the formula
PaymentVector(I)=Payment(I)/Payment(1)
The above is easy to create with an array; I can give example code if you wish.
Next, assume that every payment made is replaced, i.e. when 30 is paid in 2018, it must be replaced, and so on.
I'm looking to make a profile that shows the outflows for the movement of the payments (the inflows are shown in brackets for illustration only; they are not required in the code). For ID=1:
Year Payment2017 Payment2018 Payment2019 Payment2020
2017 (+90) -30 -30 -30
2018 N/A (+30) -10 -10
2019 N/A N/A (+40) -13.3
2020 N/A N/A N/A (+53.3)
So, looking forwards, the rows can be thought of as the current year and the columns as the years still to come.
Hence, in 2019, the 2017 and 2018 columns are N/A because those payments are in the past and cannot be paid now.
In 2018, looking at what has to be paid in 2019, you have to pay one-third of the money you have now, so -10.
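The replacement rule can be sketched outside SAS to check the expected numbers. This Python version assumes that the amount on hand in each year is paid out to later years in the fixed proportions `Payment(i) / Payment(1)`, and that each of those payments comes back as an inflow in its year; `None` stands for N/A and `payment_profile` is a hypothetical helper name:

```python
from typing import Optional


def payment_profile(payments: list[float]) -> list[list[Optional[float]]]:
    """Row k is year k: its inflow on the diagonal, the outflows it triggers
    for later years (negative), and None for years already in the past."""
    n = len(payments)
    # constant fraction per year, as in PaymentVector(i) = Payment(i) / Payment(1)
    fraction = [p / payments[0] for p in payments]
    inflow = [0.0] * n       # money arriving in each year
    inflow[0] = payments[0]  # the first year starts with the full amount
    rows = []
    for year in range(n):
        row: list[Optional[float]] = [None] * n
        row[year] = inflow[year]
        for later in range(year + 1, n):
            outflow = fraction[later] * inflow[year]
            row[later] = -outflow
            inflow[later] += outflow  # the payment is replaced in that later year
        rows.append(row)
    return rows


for row in payment_profile([90, 30, 30, 30]):  # ID=1
    print(["N/A" if v is None else round(v, 1) for v in row])
```

For ID=1 this reproduces the table above: 90, -30, -30, -30 in the first row and 53.3 as the final inflow.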
I've been turning this dataset into the array row by row, but there surely has to be a quicker way using an array.
The code I've used so far looks like this:
Data Want;
Set Have;
Array Vintage(2017:2020) Vintage2017-Vintage2020;
Array PaymentSchedule(2017:2020) PaymentSchedule2017-PaymentSchedule2020;
Array PaymentVector(2017:2020) PaymentVector2017-PaymentVector2020;
Array PaymentVolume(2017:2020) PaymentVolume2017-PaymentVolume2020;
do i=1 to 4;
PaymentVector(i)=PaymentSchedule(i)/PaymentSchedule(1);
end;
I'll add code tomorrow... but the code doesn't work regardless.
data have;
input ID Mapping $ Asset Fixed Performing Payment2017 Payment2018 Payment2019 Payment2020;
datalines;
1 Loan1 1 1 1 90 30 30 30
2 Loan1 1 1 0 80 20 40 20
3 Loan1 1 0 1 60 40 10 10
4 Loan1 1 0 0 120 60 30 30
;
run;
data want(keep=id payment: fraction:);
  set have;
  array p payment:;
  array fraction(4); * track constant fraction determined at start of profile;
  array out(4);      * track outlay for ith iteration;
  * compute constant (over iterations) fraction for row;
  do i = dim(p) to 1 by -1;
    fraction(i) = p(i) / p(1);
  end;
  * reset to missing to allow for sum statement, which is <variable> + <expression>;
  call missing(of out(*));
  out(1) = p(1);
  do iter = 1 to 4;
    p(iter) = out(iter);
    do i = iter+1 to dim(p);
      p(i) = -fraction(i) * p(iter);
      out(i) + (-p(i)); * <--- compute next iteration outlay with ye olde sum statement;
    end;
    output;
    p(iter) = .;
  end;
  format fract: best4. payment: 7.2;
run;
You've indexed your arrays with 2017:2020 but then try to use them with indexes 1 to 4. That won't work; you need to be consistent:
Array PaymentSchedule(2017:2020) PaymentSchedule2017-PaymentSchedule2020;
Array PaymentVector(2017:2020) PaymentVector2017-PaymentVector2020;
do i=2017 to 2020;
PaymentVector(i)=PaymentSchedule(i)/PaymentSchedule(2017);
end;

SSRS. How to group in a group?

I have an SSRS report like the one below, with a Boolean parameter to show a 12h view or a 24h view. To fit the report onto a single screen, the 24h report needs to group every 2 hours.
07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 ...
Line 1 25 30 24 26 25 25 30 30 ...
08:00 10:00 12:00 14:00 ...
Line 1 55 50 50 60 ...
The query for the dataset is:
SELECT LineID
,Hour
,HourValue
,Target
FROM vwData
ORDER BY LineID, CASE WHEN [Hour] > 6 THEN - 1 ELSE [Hour] END
How can I achieve this?
This declares your bit variable (which should be true when they want the 24-hour view and false for the 12-hour view):
DECLARE @24Hour bit = 0
SELECT CASE WHEN @24Hour = 0
THEN Hour
ELSE Hour + (Hour % 2)
END AS [HourGroup]
,SUM(Target) AS [TargetTotal]
FROM vwData
GROUP BY CASE WHEN @24Hour = 0
THEN Hour
ELSE Hour + (Hour % 2)
END
If they want the 24-hour view, we set hour = hour + hour % 2 (7 becomes 8, 8 stays 8, 9 becomes 10, and so on). If you had a more complex query, I would suggest reading up on CROSS APPLY, but this is simple enough that I think this will suffice. The GROUP BY makes sure to aggregate the REAL 7 and REAL 8 hour records (which will both be returned as "8" in the 24-hour view). If you don't group your results, you will get two 8 o'clock records: one with the REAL 7 hour total and one with the REAL 8 hour total.
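The bucketing can be checked on the sample numbers for Line 1 in the question. This sketch mirrors the CASE expression (`Hour + Hour % 2` in the 24-hour view) and then sums each bucket, as the GROUP BY does; the function names are hypothetical:

```python
from collections import defaultdict


def hour_group(hour: int, twenty_four_hour_view: bool) -> int:
    """Same rule as the CASE expression: odd hours move up to the next even hour."""
    return hour + hour % 2 if twenty_four_hour_view else hour


def group_totals(values: dict[int, int], twenty_four_hour_view: bool) -> dict[int, int]:
    """Sum the values per hour group, like GROUP BY on the CASE expression."""
    totals = defaultdict(int)
    for hour, value in values.items():
        totals[hour_group(hour, twenty_four_hour_view)] += value
    return dict(totals)


line_1 = {7: 25, 8: 30, 9: 24, 10: 26, 11: 25, 12: 25, 13: 30, 14: 30}
print(group_totals(line_1, twenty_four_hour_view=True))
```

The result, {8: 55, 10: 50, 12: 50, 14: 60}, matches the 2-hour row shown in the question.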
EDIT:
Since you didn't include the schema of your DB, I'm guessing that 'Target' is the value being summed, but it could just as easily be 'HourValue'. Furthermore, I have no idea why you would need LineID, so I omitted it from my answer, but you can easily modify that if it's inaccurate. In the future, you should provide some sample data and your database schema so that others aren't forced to make assumptions or guess.
You could add a calculated field with a value given by something like `Fields!Hour.Value + Fields!Hour.Value Mod 2` and then group on that field, using a parameter to choose the group-by field in the report (your new field or the actual hour value).

How to set up a cron job to run every 5 days?

I want to set up a cron job that starts, for example, today and then runs every 5 days.
This is what I have now. Will it work correctly if I install this job at 5 o'clock, so that it then runs every 5 days at 6 AM?
0 6 */5 * * mailx -r root@mail.com -s "Message title" -c "cc@mail.com" primary@mail.com < body.txt
0 0 */5 * * runs at midnight every 5 days.
See this similar question, just asked, about running every 3 days:
Cron job every three days
Bear with me; this is my first time posting an answer. Let me know if anything I wrote is unclear.
I don't think you have what you need with:
0 0 */5 * * ## <<< WARNING!!! CAUSES UNEVEN INTERVALS AT END OF MONTH!!
Unfortunately, */5 sets the interval based on the day of the month (see the explanation here), so the interval is uneven at the end of every month that does not have exactly 30 days:
1st at 2019-01-01 00:00:00
then at 2019-01-06 00:00:00 << 5 days, etc. OK
then at 2019-01-11 00:00:00
...
then at 2019-01-26 00:00:00
then at 2019-01-31 00:00:00
then at 2019-02-01 00:00:00 << 1 day WRONG
then at 2019-02-06 00:00:00
...
then at 2019-02-26 00:00:00
then at 2019-03-01 00:00:00 << 3 days WRONG
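To see where the gaps come from, the day-of-month field can be simulated directly. This sketch models only `*/5` in that field (it matches days 1, 6, 11, 16, 21, 26 and 31) and assumes the other fields are `*`, as in the crontab line above; the helper names are hypothetical:

```python
from datetime import date, timedelta


def day_of_month_matches(start: date, end: date, every: int = 5) -> list[date]:
    """Dates between start and end matched by a day-of-month field of */every."""
    matches = []
    current = start
    while current <= end:
        # */5 over the range 1-31 matches 1, 6, 11, 16, 21, 26, 31
        if (current.day - 1) % every == 0:
            matches.append(current)
        current += timedelta(days=1)
    return matches


def irregular_gaps(runs: list[date], expected: int = 5) -> list[tuple[date, date, int]]:
    """Consecutive runs whose distance differs from the expected interval."""
    return [(a, b, (b - a).days) for a, b in zip(runs, runs[1:]) if (b - a).days != expected]


runs = day_of_month_matches(date(2019, 1, 1), date(2019, 3, 31))
for previous, current, gap in irregular_gaps(runs):
    print(f"{previous} -> {current}: {gap} day(s)")
```

Months with 30 days happen to work (the 26th to the 1st is 5 days), which is why the problem shows up after 31-day months and February.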
According to this article, you need to add some modulo math to the command being executed to get a TRUE "every N days". For example:
0 0 * * * bash -c '(( $(date +\%s) / 86400 \% 5 == 0 )) && runmyjob.sh'
In this example, the job will be checked daily at 12:00 AM, but will only execute when the number of days since 01-01-1970 modulo 5 is 0.
If you want it to be every 5 days from a specific date, use the following format:
0 0 * * * bash -c '(( $(date +\%s -d "2019-01-01") / 86400 \% 5 == 0 )) && runmyjob.sh'
Brad's last snippet will not work because it ignores the current date:
date +%s -d "2019-01-01"
always returns the same number of seconds, so the test is either always true or always false, depending on which date you choose.
You would have to do an extra calculation like this to run it every 5 days at 00:00, starting from 2019-01-01 00:00:00:
0 0 * * * bash -c '(( ( $(date +\%s) - $(date +\%s -d "2019-01-01 00:00:00") ) / 86400 \% 5 == 0 )) && runmyjob.sh'
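The same test can be written with calendar dates instead of raw seconds. Counting whole days between two dates avoids a possible off-by-one in the seconds-based version: in a local time zone with daylight saving time, two local midnights can be an hour short of a whole number of days apart, and integer division by 86400 then rounds down. A sketch; `should_run` is a hypothetical helper, and the start date is the 2019-01-01 used above:

```python
from datetime import date


def should_run(today: date, start: date, every: int = 5) -> bool:
    """True on start, start + every days, start + 2*every days, and so on."""
    elapsed = (today - start).days  # whole calendar days, so DST shifts don't matter
    return elapsed >= 0 and elapsed % every == 0


START = date(2019, 1, 1)  # the start date from the crontab example
print(should_run(date.today(), START))
```

The `elapsed >= 0` guard is an addition here, so the check does not fire on dates before the start date.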

Get the number of days of the current date with Time

I'm trying to get the number of days until today, like this:
#include <stdio.h>
#include <time.h>
int main(void) {
    int days, seconds;
    seconds = time(0); // Get the number of SECONDS from January 1, 1970 until now.
    days = seconds / (60 * 60 * 24);
    printf("%d", days);
}
The output is: 16326.
But when I use a website that does the conversion for you, it shows 6464 days instead.
What am I doing wrong?
You are right. From 1970 to 2014 there are 30 + 14 = 44 years, which gives about 16000 days.
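A quick cross-check in Python (an illustration, not the C code above): integer-dividing a Unix timestamp by 86400 gives whole days since 1970-01-01 in UTC, and adding a day count back onto the epoch shows which date it corresponds to. `days_since_epoch` and `day_number_to_date` are hypothetical helper names:

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)
SECONDS_PER_DAY = 60 * 60 * 24


def days_since_epoch(unix_seconds: int) -> int:
    """Whole days in a Unix timestamp (same arithmetic as the C snippet)."""
    return unix_seconds // SECONDS_PER_DAY


def day_number_to_date(days: int) -> date:
    """The calendar date (UTC) that is `days` days after 1970-01-01."""
    return EPOCH + timedelta(days=days)


now = datetime.now(timezone.utc)
print(days_since_epoch(int(now.timestamp())))  # today's day number
print(day_number_to_date(16326))  # 2014-09-13
```

Day 16326 is 2014-09-13, consistent with the 44-year estimate, so the 16326 from the question is in the expected range.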
