Cumulative Sum - Choosing Portions of Hierarchy - sql-server

I have a bit of an interesting problem.
I require a cumulative sum over a set built from pieces of a Time dimension. The Time dimension is based on hours and minutes; it begins at hour 0, minute 0 and ends at hour 23, minute 59.
What I need to do is slice out portions, say 09:30 AM - 04:00 PM or 4:30 PM - 09:30 AM, and I need these values in order to perform my cumulative sums. I'm hoping that someone could suggest a means of doing this with standard MDX. If not, is my only alternative to write my own stored procedure that builds my periods-to-date set using the logic described above?
Thanks in advance!

You can create a secondary hierarchy in your time dimension with only the hour level and filter the query with it:
[Time].[Calendar] -> the hierarchy with year, month, day and hour levels
[Time].[Hour] -> the 'new' hierarchy with only the hour level (e.g. 09:30 AM).
Then you can write an MDX query adding your criteria as a filter:
SELECT
my axis...
WHERE ( SELECT { [Time].[Hour].[09:30 AM]:[Time].[Hour].[04:00 PM] } on 0 FROM [MyCube] )
You can also create a new dimension instead of a hierarchy; the difference is in the autoexists behaviour and the performance.
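For the cumulative sum itself, one option is a calculated member that aggregates from the start of the slice up to the current member. A minimal sketch, assuming a hypothetical [Measures].[Sales] measure on the cube used in the answer above:
WITH MEMBER [Measures].[Cumulative Sales] AS
    SUM(
        { [Time].[Hour].[09:30 AM] : [Time].[Hour].CurrentMember },
        [Measures].[Sales]
    )
SELECT
    { [Measures].[Cumulative Sales] } ON 0,
    { [Time].[Hour].[09:30 AM] : [Time].[Hour].[04:00 PM] } ON 1
FROM [MyCube]
Each row then shows the running total from 09:30 AM up to that hour member. A slice such as 4:30 PM - 09:30 AM that wraps past midnight would need the union of two ranges rather than a single one.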

Related

Average for loop in loop PL / SQL

I am trying to calculate the average of durations from the last 40 days for different IDs.
Example: I have 40 days, and for each day IDs from 1-20, and each ID has a start date and end date in HH:MI:SS.
My code is a cursor which fetches the last 40 days; then I made a second for loop. In this one I select all the IDs for that day. Then I go through every ID for this day, select its start and end date, and calculate the duration. So far so good. But how do I calculate the average of the durations for the IDs over the last 40 days?
The idea is simple: take the durations for one ID (over the last 40 days), add them together and divide them by 40, then do the same for all IDs. My plan was to build a 2D array, putting all IDs in the first dimension and the durations in the second, adding the values for each ID together. Then I would have all the durations for one ID added together and could read the value from the array. But I am kind of stuck on that idea.
I also wonder if there is a better solution.
Thanks for any help!
From my point of view, you don't need loops or PL/SQL - just calculate the average:
select id,
avg(end_date - start_date)
from your_table
where start_date >= trunc(sysdate) - 40
group by id
The drawback might be what you said - that you stored the times as hh:mi:ss. What does that mean? That you stored them as strings? If so, that's most probably a bad idea; dates (Oracle doesn't have a separate datatype for time) should be stored in DATE columns.
If you really have to work with strings, then convert them to dates:
avg(to_date(end_date, 'hh:mi:ss') - to_date(start_date, 'hh:mi:ss'))
Also, you'll then have to have another DATE column which is capable of saying what "last 40 days" actually means.
The result (the average) will be the number of days between these values. You can then format it more nicely if you want.
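Putting those pieces together, a sketch of the string-based variant might look like this (log_date, start_time and end_time are assumed column names, and the HH24 mask assumes 24-hour time strings):
select id,
       avg(to_date(end_time, 'hh24:mi:ss') - to_date(start_time, 'hh24:mi:ss')) as avg_days,
       -- the same average expressed as a day-to-second interval, which is easier to read
       numtodsinterval(
         avg(to_date(end_time, 'hh24:mi:ss') - to_date(start_time, 'hh24:mi:ss')),
         'DAY') as avg_duration
from your_table
where log_date >= trunc(sysdate) - 40
group by id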

I need to provide a tabulated output of this month, month+1, month+2, and month+3 derived from numerous tables within one sheet

The History:
I have a data set that refreshes every Monday morning, adding last week's values to a growing tally until there are 52 weeks in the data set (9 separate cohorts), across 38 different departments.
I have built a power query to filter the department and compiled tables for each cohort, limiting the data to the last 17 weeks, and using excel forecast modelling then setup each table to forecast 16 weeks ahead.
Because the week beginning (WB) dates keep changing, I can't hard-code the result table to cells within each cohort table.
My result table needs to show current month, month +1, month +2, and month +3 forecast values as per the highest date closest to or equal to EOM and I need this to be automated, hence a formula.
PS: an added complexity is that date and value sit in adjacent columns for the last 17 weeks, but in separate columns for the future 16 weeks of data in each table. The structure is exactly the same across all 9 cohort forecast tables.
My Question:
Am I best to use a nested EOM formula, or VLOOKUP(MAX) based on the cohort_forecast_table image link below?
Because the current month needs to be current, I have created a cell using =NOW().
I then complete a VLOOKUP within each cell in the master table that references the data in each sub-table, using MAX and EOMONTH for the current month, then month+1, month+2, month+3, etc.
In a simplified broken down solution:
Date array = 'D3:D35'
Volume array = 'E3:E35'
End of current month formula cell B3: =MAX(($D$3:$D$35<=EOMONTH(D1,0))*D3:D35)
Call for result in cell C3:
'=VLOOKUP(B3,Dates:Volumes,2,FALSE)'
I think this will work for me and thank you all...
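For the month+1, month+2 and month+3 cells, the same idea can be generalised into a single formula per cell. A sketch, assuming D1 holds the =NOW() cell and the dates/volumes sit in D3:E35 (entered as an array formula in older Excel versions):
=INDEX($E$3:$E$35,
  MATCH(MAX(($D$3:$D$35<=EOMONTH($D$1,1))*$D$3:$D$35), $D$3:$D$35, 0))
Changing the second argument of EOMONTH from 1 to 2 or 3 gives month+2 and month+3, and using INDEX/MATCH avoids having to store the looked-up date in a helper cell first.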

Manufacturing process cycle time database design

I want to create a database to store process cycle time data. For example:
Say a particular process for a certain product, say welding, theoretically takes about 10 seconds (the process cycle time). Due to various issues, the machine's actual cycle time varies throughout the day. I would like to store the machine's actual cycle time throughout the day and analyze it over time (days, weeks, months). How would I go about designing the database for this?
I considered using a time series database, but I figured it isn't suitable - cycle time data has a start time and an end time - basically I'm measuring time performance over time, if that even makes sense. At the same time, I was also worried that using a relational database to store and then display/analyze time-related data is inefficient.
Any thoughts on a good database structure would be greatly appreciated. Let me know if any more info is needed and I will gladly edit this question.
You are tracking the occurrence of an event. The event (weld) starts at some time and ends at some time. It might be tempting to model the event entity like so:
StationID StartTime StopTime
with each welding station having a unique identifier. Some sample data might look like this:
17 08:00:00 09:00:00
17 09:00:00 10:00:00
For simplicity, I've set the times to large values (1 hour each) and removed the date values. This tells you that welding station #17 started a weld at 8am and finished at 9am, at which time the second weld started which finished at 10am.
This seems simple enough. Notice, however, that the StopTime of the first entry matches the StartTime of the second entry. Of course it does, the end of one weld signals the start of the next weld. That's how the system was designed.
But this sets up what I call the Row Spanning Dependency antipattern: where the value of one field of a row must be synchronized with the value of another field in a different row.
This can create any number of problems. For example, what if the StartTime of the second entry showed '09:15:00'? Now we have a 15-minute gap between the end of the first weld and the start of the next. The system does not allow for gaps -- the end of each weld also starts the next weld. How should we interpret this gap? Is the StopTime of the first row wrong? Is the StartTime of the second row wrong? Are both wrong? Or was there another row between them that was somehow deleted? There is no way to tell which is the correct interpretation.
What if the StartTime of the second entry showed '08:45'? This is an overlap where the start of the second cycle supposedly started before the first cycle ended. Again, we can't know which row contains the erroneous data.
A row spanning dependency allows for gaps and overlaps, neither of which is allowed in the data. A large amount of database and application code would be required to prevent such a situation from ever occurring, and when it does occur (as it assuredly will) there is no way to determine which data is correct and which is wrong -- not from within the database, that is.
An easy solution is to do away with the StopTime field altogether:
StationID StartTime
17 08:00:00
17 09:00:00
Each entry signals the start of a weld. The end of the weld is indicated by the start of the next weld. This simplifies the data model, makes it impossible to have a gap or overlap, and more precisely matches the system we are modeling.
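As a rough sketch of that simplified model (names taken from the discussion above, T-SQL types assumed):
CREATE TABLE Welds (
    StationID  INT       NOT NULL,
    StartTime  DATETIME  NOT NULL,  -- start of this weld; its end is the start of the station's next row
    PRIMARY KEY (StationID, StartTime)
);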
But we need the data from two rows to determine the length of a weld.
select w1.StartTime, w2.StartTime as StopTime
from Welds w1
join Welds w2
on w2.StationID = w1.StationID
and w2.StartTime =(
select Min( StartTime )
from Welds
where StationID = w1.StationID
and StartTime > w1.StartTime );
This may seem like a more complicated query than if the start and stop times were in the same row -- and, well, it is -- but think of all the checking code that no longer has to be written and executed at every DML operation. And since the combination of StationID and StartTime would be the obvious PK, the query would use only indexed data.
There is one addition to suggest. What about the first weld of the day or after a break (like lunch), and the last weld of the day or before a break? We must make an effort not to include the break time as cycle time. We could build the intelligence to detect such a situation into the query, but that would increase the complexity even more.
Another way would be to include a status value in the record.
StationID StartTime Status
17 08:00:00 C
17 09:00:00 C
17 10:00:00 C
17 11:00:00 C
17 12:00:00 B
17 13:00:00 C
17 14:00:00 C
17 15:00:00 C
17 16:00:00 C
17 17:00:00 B
So the first few entries represent the start of a cycle, whereas the entries for noon and 5pm represent the start of a break. Now we just need to append the line
where w1.Status = 'C'
to the end of the query above. Thus the 'B' entries supply the end times of the previous cycle but do not start another cycle.
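Putting the pieces together, here is a sketch of the full query with the break filter, extended to a per-station daily average (T-SQL syntax assumed; the Status column is the one introduced above):
select w1.StationID,
       cast(w1.StartTime as date) as WorkDay,
       avg(datediff(second, w1.StartTime, w2.StartTime)) as AvgCycleSeconds
from Welds w1
join Welds w2
on w2.StationID = w1.StationID
and w2.StartTime =(
select Min( StartTime )
from Welds
where StationID = w1.StationID
and StartTime > w1.StartTime )
where w1.Status = 'C'
group by w1.StationID, cast(w1.StartTime as date);
The 'B' rows still supply the end time for the cycle that precedes them but never start a cycle of their own, so break and overnight gaps drop out of the average.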

Calculate Facebook likes, comments, and shares for different time zones from saved UTC

I've been struggling with this for a while and hope someone can give me an idea of how to tackle it.
We have a service that goes out and collects Facebook likes, comments, and shares for each status update multiple times a day. The table that stores this data is something like this:
PostId  EngagementTypeId  Value  CollectedDate
100     1 (likes)         10     1/1/2013 1:00
100     2 (comments)       2     1/1/2013 1:00
100     3 (shares)         0     1/1/2013 1:00
100     1                 12     1/1/2013 3:00
100     2                  3     1/1/2013 3:00
100     3                  5     1/1/2013 3:00
Value holds the total for each engagement type at the time of collection.
I got a requirement to create a report that shows the new values per day in different time zones.
Currently, I'm doing the calculation in a stored procedure that takes in a time zone offset, and based on that I calculate the delta for each day. If this is for someone in California, the report will show 12 likes, 3 comments, and 5 shares for 12/31/2012. But someone with a time zone offset of -1 will see 10 likes on 12/31/2012 and 2 likes on 1/1/2013.
The problem I'm having is that doing the calculation on the fly can be slow if we have a lot of data and a big date range. We're talking about having the delta pre-calculated for each day and stored in a table so I can just query from that (we're considering SSAS, but that's for the next phase). But doing this, I would need to have the data for each day for 24 time zones. Am I correct (and if so, this is not ideal), or is there a better way to approach this?
I'm using SQL 2012.
Thank you!
You need to convert the UTC DateTime stored in your column to a Date based on the user's UTC offset. This way you don't have to worry about any extra table that has to be populated with data. To get the user's local date from your UTC column you would use something like this
SELECT CONVERT(DATE,(DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), '01/29/2014 04:00')))
AS MyLocalDate
The select statement above figures out the local date based on the difference between the UTC date and the local date. You will need to replace GETDATE() with the user's DATETIME that is passed in to your procedure and replace '01/29/2014 04:00' with your column. This way, when you select any date from your table, it will be according to what that date was at the user's local time. Then you can calculate the other fields accordingly.
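To turn that converted date into the per-day deltas the report needs, one approach is to group by the local date and let SQL Server 2012's LAG compute the day-over-day difference. A sketch only: the table name PostEngagement and the @OffsetMinutes parameter are assumptions, while the column names come from the question:
DECLARE @OffsetMinutes int = -480;  -- e.g. Pacific time; in practice this is the offset passed to the procedure

WITH DailyTotals AS (
    SELECT PostId,
           EngagementTypeId,
           CAST(DATEADD(minute, @OffsetMinutes, CollectedDate) AS date) AS LocalDate,
           MAX(Value) AS EndOfDayValue  -- Value is a running total, so the day's highest reading is the end-of-day figure
    FROM   PostEngagement
    GROUP  BY PostId, EngagementTypeId,
              CAST(DATEADD(minute, @OffsetMinutes, CollectedDate) AS date)
)
SELECT PostId,
       EngagementTypeId,
       LocalDate,
       EndOfDayValue
         - LAG(EndOfDayValue, 1, 0) OVER (PARTITION BY PostId, EngagementTypeId
                                          ORDER BY LocalDate) AS NewForTheDay
FROM DailyTotals;
This keeps the calculation on the fly but avoids materialising a delta table per time zone; only the offset changes per user.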

SSAS -> AdventureWorks Example -> Using the browser to splice a measure by week, shows results that have two of the same week records?

I have been working on a cube and noticed that when I am browsing measures in my cube by weeks, I am getting an unexpected result, but first let me describe my current scenario. I am looking at counts of a fact load by weeks. When I do so I am getting results like these:
Weeks | Fact Internet Sales Count
2001-07-01 00:00:00.000 | 28
2001-07-08 00:00:00.000 | 29
....and so on as you would expect.
Further down I noticed this. :
2001-09-30 00:00:00.000 | 10
2001-09-30 00:00:00.000 | 24
As you can see, it shows the week twice with different counts, when you add these counts together it is the correct number of counts for this week (i.e. 34).
I am just confused about why it is showing the week twice. When I look at the data in SQL I can see that the difference between these two rows is strictly the month in which the dates fell (10 in the earlier month and 24 in the later month, in this example).
I initially saw this in the original cube that I created on my own; in turn, I pulled up the trusty AdventureWorks practice cube and found that it was present in that cube as well.
This is due to the fact that within this date hierarchy, the lowest attribute in the hierarchy is date, not week. Therefore, weeks are always split by date. This can be alleviated by building a date hierarchy with week as the lowest level.
