I have a large time-series data set in a table that contains 5 years of data. The data is very structured: it is clustered/ordered on the time column, and there is exactly one record for every 10-minute interval over the entire 5-year period.
In my user-side application I have a time-series chart that is 400 pixels wide, and users can set the time scale from 1 hour up to 5 years. Therefore any query to the database by this chart that returns more than 400 records provides data that cannot be physically displayed.
What I want to know is: can anyone suggest an approach such that, when the database is queried for a certain time range, SQL Server would dynamically perform a suitable averaging aggregation that returns no more than 400 records?
Example 1): if the time range was 5 years, SQL Server would calculate ~1 value for every 4.5 days (5 yrs * 365 days / 400 records), so it would average all the 10-minute samples in each 4.5-day bin and return a record for each bin. About 400 in total.
Example 2): if the time range was one month, SQL Server would calculate ~1 record for every 1.86 hours (31 days / 400 records), so it would average all the 10-minute samples in each 1.86-hour bin and return a record for each bin. About 400 in total.
Ideally I'd like a solution that, from the application's perspective, can be queried just like a static table.
I'd really appreciate any suggested approaches or code snippets.
Some examples, assuming you have a datetime column (which is not quite clear from your question, as there is no table schema):
Grouping into interval of 5 minutes within a time range
SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc)
They should be quite easy to port to SQL Server: use DATEDIFF to convert your datetime values into a Unix-style timestamp, and use ROUND() with the function parameter <> 0 (i.e. truncation) for the division.
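To make that concrete, here is a minimal T-SQL sketch of the dynamic binning, assuming a table dbo.Samples(SampleTime datetime, Value float) - the table, column and parameter names are illustrations, not from the question:

DECLARE @From datetime = '2013-01-01';
DECLARE @To   datetime = '2013-02-01';
DECLARE @MaxPoints int = 400;

-- Bin width in seconds, chosen so the range produces at most @MaxPoints bins.
DECLARE @BinSeconds int = CEILING(DATEDIFF(second, @From, @To) / (@MaxPoints * 1.0));

SELECT DATEADD(second,
               (DATEDIFF(second, @From, SampleTime) / @BinSeconds) * @BinSeconds,
               @From)   AS BinStart,
       AVG(Value)       AS AvgValue
FROM dbo.Samples
WHERE SampleTime >= @From
  AND SampleTime <  @To
GROUP BY DATEDIFF(second, @From, SampleTime) / @BinSeconds
ORDER BY BinStart;

Wrapped in an inline table-valued function taking @From and @To, the application could query this almost like a static table, which is what the question asks for.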
Related
I have one table in my Informix 12.10 database which is currently logged to at a 10-second frequency.
My requirement is to retrieve data from that table between a start date and an end date, at a chosen time interval (10 sec, 1 min, 2 min, 10 min, 1 hour, 6 hours).
Please help me to find a way.
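One possible approach - a sketch only, which assumes you want one aggregated row per bucket and uses illustrative names (sensor_log, log_time, reading) - is to truncate the timestamp with Informix's EXTEND and group on it, shown here for the 1-minute interval:

-- Average 10-second readings into 1-minute buckets within a date range.
SELECT EXTEND(log_time, YEAR TO MINUTE) AS bucket,
       AVG(reading) AS avg_reading
FROM sensor_log
WHERE log_time >= DATETIME (2016-01-01 00:00:00) YEAR TO SECOND
  AND log_time <  DATETIME (2016-01-02 00:00:00) YEAR TO SECOND
GROUP BY 1
ORDER BY 1;

For the coarser intervals (2 min, 10 min, 6 hours) you would additionally round the extracted minute or hour down with integer arithmetic before grouping.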
I am creating a report in SSRS that shows the duration of phone calls.
In my T-SQL script I am using:
CONVERT(VARCHAR, DATEADD(second, Call, 0), 108) AS [Call Duration]
which works nicely and shows the time as 00:03:20, for example.
However, when I create a table in SSRS and try to sum all the different time values, it just says #Error in the report. I need the report to be able to add these time values up so I can give a total per switchboard operator. So, for example, if officer x took three calls that each lasted 3 minutes, I'd need the total to say 00:09:00.
Do you know of a way I can display the total time spent rather than having to list each value separately? I can sum up the number of seconds for each call - so, for example, get a total of 540 seconds - but I need to show this as hh:mm:ss.
Thanks
The report is throwing an error because you are trying to sum up a varchar value. Rather than trying to format your data in your SQL query, simply return the values in their raw form to your SSRS report and let your presentation layer format the data for you.
It seems your call length, in seconds, is already held within your Call column, so the DATEADD isn't needed? If that is the case, simply return that column to your report - either as detail rows to be summed, if the detail is required elsewhere in your report, or pre-aggregated in your SQL, as this will perform better.
You can then format your Call Duration as follows:
=format(today().AddSeconds(Fields!Call.Value),"HH:mm:ss")
If you aren't aggregating your call seconds in your SQL query, you will need to do this in your expression:
=format(today().AddSeconds(sum(Fields!Call.Value)),"HH:mm:ss")
Obviously this method assumes you won't have any calls longer than 24 hours. If that is a possibility, you will need to calculate the hours, minutes and seconds to be concatenated together.
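For completeness, a hedged T-SQL sketch of that concatenation; @TotalSeconds here simply stands in for your summed Call value:

-- Build hh:mm:ss by hand so the hours are not capped at 24.
DECLARE @TotalSeconds int = 93784;  -- e.g. 26 hours, 3 minutes, 4 seconds

SELECT CAST(@TotalSeconds / 3600 AS varchar(10)) + ':'
     + RIGHT('0' + CAST(@TotalSeconds % 3600 / 60 AS varchar(2)), 2) + ':'
     + RIGHT('0' + CAST(@TotalSeconds % 60 AS varchar(2)), 2)
     AS [Call Duration];  -- returns 26:03:04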
I am working on a Cassandra data model for storing time series (I'm a Cassandra newbie).
I have two applications: intraday stock data and sensor data.
The stock data will be saved with a time resolution of one minute.
Seven data fields make up one timeframe:
Symbol, Datetime, Open, High, Low, Close, Volume
I will query the data mostly by Symbol and Date. e.g. give me all data for AAPL between 2013-01-01 and 2013-01-31 ordered by Datetime.
The recommendation for Cassandra queries is to query whole columns. So you could create five rows with the keys Open, High, Low, Close and Volume, and one column for each symbol and minute, e.g. "AAPL:2013-01-04T130400Z".
This would result in a table of five rows and n*nT columns (n: number of symbols, nT: number of minutes).
Most of the time I will query date ranges. I.e. all minutes of a day. So I could rearrange the data to have columns named "AAPL:2013-01-04" and rows: OpenT130400Z, HighT130400Z, LowT130400Z, CloseT130400Z, VolumeT130400Z.
This would result in a table with n*nD columns (n: number of Symbols, nD: number of Days) and 5*nM rows (nM: number of minutes/entries per day).
To sum up: I have columns, which hold the information for a whole day for one symbol.
I have found a description of how to deal with time series data in Cassandra here: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
But I don't really get whether they use the hour (1332960000) as a column name or as a row key!?
I understood it as: they use the hour as the row key and the small timesteps as columns, so they would have a fixed number of columns. But that would have disadvantages when reading, because I would have to do a range query on keys! Am I right?
Second question:
If I have sensor data that is much more fine-grained than 1-minute stock data (say I have to save timesteps with a resolution of microseconds), how would I deal with this?
If I use columns for saving a composite of sensor channel and hour, and rows for the microseconds since the last hour, this would result in 3,600,000,000 rows and n*nH columns (n: number of sensors, nH: number of hours).
I could not use the microseconds since the last hour for columns, because 3.6 billion points is higher than the allowed number of 2 billion columns.
Did I get that right?
What do you think about this problem, and how would you solve it?
Thank you!
Best,
Malte
So I have a suggestion for your first question about the stock data. A naive implementation might look like this:
RowKey: the stock symbol (e.g. AAPL)
Column format:
    Name: the current datetime, granular to a minute
    Value: a composite column of Open, High, Low, Close, Volume
So you would have something like
AAPL = [2013-05-02-15:38:00 | 441.78:448.59:440.63:15066146:445.52] ... [2013-05-02-15:39:00 | 441.78:448.59:440.63:15066146:445.52] ... [2013-05-02-15:40:00 | 441.78:448.59:440.63:15066146:445.52]
That would give you roughly half a million columns in one year, so it might be OK for maybe 4 years; I wouldn't go and attempt to hit the 2 billion limit. What you could do is define a splitting factor on the row key. It all depends on your usage pattern, but a simple one might be the year, so the column family entry would look like the following, with a composite row key, which guarantees that you always have fewer than a million columns per row.
AAPL:2013 = [05-02-15:38:00 | 441.78:448.59:440.63:15066146:445.52] ... [05-02-15:39:00 | 441.78:448.59:440.63:15066146:445.52] ... [05-02-15:40:00 | 441.78:448.59:440.63:15066146:445.52]
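If you're on CQL 3 rather than the raw Thrift column model, the same year-split idea maps to a composite partition key. A minimal sketch, with all table and column names being my own illustrations:

-- (symbol, year) plays the role of the AAPL:2013 row key;
-- ts clusters the minute entries inside each partition.
CREATE TABLE stock_ohlcv (
    symbol text,
    year   int,
    ts     timestamp,
    open   decimal,
    high   decimal,
    low    decimal,
    close  decimal,
    volume bigint,
    PRIMARY KEY ((symbol, year), ts)
);

-- A date-range read stays within a single partition, in time order:
SELECT * FROM stock_ohlcv
WHERE symbol = 'AAPL' AND year = 2013
  AND ts >= '2013-01-01' AND ts < '2013-02-01';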
I have a large transaction table in SQL Server which is used to store about 400-500 records each day. What data type should I use for my PK column? The PK column stores numeric values, for which integer seems suitable, but I'm afraid it will exceed the maximum value for integer since I have so many records every day.
I am currently using integer data type for my PK column.
With a type INT, starting at 1, you get over 2 billion possible rows - that should be more than sufficient for the vast majority of cases. With BIGINT, you get roughly 9.2 quintillion (9,223,372,036,854,775,807 - about 9.2 * 10^18) - enough for you??
If you use an INT IDENTITY starting at 1, and you insert a row every second, you need about 68 years before you hit the 2 billion limit .... so with 400-500 rows per day it will take millennia before you run out of possible values; take 1,000 rows per day and you should be fine for 5,883 years - good enough?
If you use a BIGINT IDENTITY starting at 1, and you insert one thousand rows every second, you need a mind-boggling 292 million years before you hit the 9.2 quintillion limit ....
Read more about it (with all the options there are) in the MSDN Books Online.
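As a minimal sketch (table and column names are illustrative, not from the question):

-- INT IDENTITY comfortably covers 400-500 inserts/day; switch the key
-- column to bigint only if billions of rows ever become realistic.
CREATE TABLE dbo.TransactionLog (
    TransactionID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CreatedAt     datetime NOT NULL DEFAULT GETDATE()
    -- ... remaining transaction columns ...
);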
I may be wrong here as maths has never been my strong point, but if you use bigint it has a max value of 2^63 - 1 (9,223,372,036,854,775,807),
so if you divide that by, say, 500 to get roughly the number of days' worth of records, you get 18,446,744,073,709,551 days' worth of 500 new records.
Dividing again by 365 gives you 50,539,024,859,478.2 years' worth of 500 records a day,
so (((2^63-1) / 500) / 365)
if that's not me being stupid then that's a lot of days :-)
I have a WPF app, where one of the fields has a numeric input box for length of a phone call, called ActivityDuration.
Previously this has been saved as an integer value that represents minutes. However, the client now wishes to record meetings using the same table, but meetings can last 4-5 hours, so entering 240 minutes doesn't seem very user friendly.
I'm currently considering my options: whether to change ActivityDuration to a time value in SQL Server 2008 and try to use a time-mask input box, or keep it as an integer and present the client with two numeric input boxes (one for hours, one for minutes) and then do the calculation to save it in SQL Server 2008 as integer minutes.
I'm open to comments and suggestions. One further consideration is that I will need to be able to calculate total time based upon ActivityDuration, so the field's data type should allow it to be summed easily.
The new time datatype only supports 24 hours, so if you need more you'll have to use datetime.
So if you sum 7 x 4-hour meetings, you'll get "4 hours" back (28 hours wraps past 24).
How the DB stores it is also different to how you present and capture the data.
Why not display it as hh:nn, convert in the client, and store it as datetime?
Track the start and end time; there's no need to mask out the date, since the duration will just be a calculation off of the two dates. You can even do this in "sessions", such that one meeting can have multiple sessions (e.g. one meeting that spans lunch, where the break shouldn't be counted toward the duration).
The data type, then is either datetime or smalldatetime.
Then to get the "total duration" it's just a query using
Select sum(datediff(minute, startdate, enddate)) from table where meetingID = 1
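For illustration, a hedged sketch of what the sessions table behind that query could look like (all names are my own):

-- One meeting can own several sessions (e.g. before and after lunch);
-- duration is always derived from the two datetimes, never stored.
CREATE TABLE dbo.MeetingSessions (
    SessionID int IDENTITY(1,1) PRIMARY KEY,
    MeetingID int NOT NULL,
    StartDate smalldatetime NOT NULL,
    EndDate   smalldatetime NOT NULL
);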