Store Curve Data in SQL Server table

I have a SQL Server table in which I need to store daily interest rate data.
Each day, rates are published for multiple time periods. For example, a 1-day rate might be 0.12%, a 180-day rate might be 0.070%, and so on.
I am considering 2 options for storing the data.
One option is to create columns for date, "days" and rate:
Date       | Days | Rate
=========================
11/16/2015 |    1 | 0.12
11/16/2015 |   90 | 0.12
11/16/2015 |  180 | 0.70
11/16/2015 |  365 | 0.97
The other option is to store the "days" and rate via a JSON string (or XML):
Date | Rates
=============================================================
11/16/2015 | { {1,0.12}, {90,0.12}, {180, 0.7}, {365, 0.97} }
Data will only be imported via bulk insert; when we need to delete, we'll just delete all the records and re-import; there is no need for updates. So my need is mostly to read rates for a specified date or range of dates into a .NET application for processing.
I like option 2 (JSON) - it will be easier to create objects in my application; but I also like option 1 because I have more control over the data - data types and constraints.
Any similar experience out there on what might be the best approach or does anyone care to chime in with their thoughts?

I would go with option 1. MS SQL Server is a relational database; storing key/value pairs as in option 2 is not normalized and is not efficient for SQL Server to deal with. If you really want option 2, I would use something other than SQL Server.
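For reference, a minimal sketch of what option 1 could look like in T-SQL, with the data types and constraints mentioned above (table and constraint names are illustrative):

CREATE TABLE dbo.DailyInterestRate
(
    RateDate date          NOT NULL,
    [Days]   int           NOT NULL,  -- term in days: 1, 90, 180, 365, ...
    Rate     decimal(9, 4) NOT NULL,
    CONSTRAINT PK_DailyInterestRate PRIMARY KEY CLUSTERED (RateDate, [Days]),
    CONSTRAINT CK_DailyInterestRate_Days CHECK ([Days] > 0)
);

Reading the rates for a specific date or range of dates into .NET is then a plain range query (WHERE RateDate BETWEEN @from AND @to) that the clustered primary key supports well.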

Related

How to store and query dates in SQLite - using DB browser for Sqlite

I'm not sure which data type and formatting to use when storing dates in my tables in DB Browser for SQLite, or how to format the SQL queries when retrieving the data. For instance, I've created a films database with a Release attribute (as a text data type), and I want to be able to find films between two dates.
table: films
+----+-----------+------------+
| id | title     | Release    |
+----+-----------+------------+
| 1  | Star Wars | 2000-01-01 |
| 2  | Star Trek | 2010-01-01 |
+----+-----------+------------+
I tried:
SELECT *
FROM tblFilms
WHERE Release BETWEEN (2000-01-01 AND 2010-01-01)
This does not return any values
It doesn't matter whether the following comes from DB Browser for SQLite or from the SQLite command line itself; the software is just a frontend.
Dates can in principle be stored in any form, since SQLite is dynamically typed, but natively SQLite works with one of these formats:
text in a subset of the ISO-8601 format: 2022-09-17 23:34:08
Julian day: 2459840.40688294
seconds since (or before) 1970-01-01 00:00:00 UTC (the Unix timestamp): 1663451396
See documentation for SQLite dates.
Queries on dates, and limiting results, are explained with examples here.
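For example, the query from the question above starts returning rows once the dates are compared as quoted ISO-8601 text, which sorts correctly as plain strings (assuming the table is named films as shown earlier):

SELECT *
FROM films
WHERE Release BETWEEN '2000-01-01' AND '2010-01-01';

BETWEEN is inclusive, so both boundary dates are part of the result.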
Formatting results can be done using strftime:
select strftime('%m/%d/%Y', '2022-09-17') as 'US DATE'
returns
09/17/2022

Do I need to overwrite Google Analytics Data stored in the DB

I am trying to store page views per 7 days in the DB. The application has a scheduled task that fetches the most accessed pages from Google Analytics every 30 minutes and stores the page URL (active source) and count in the DB.
| _id | active_source | page_views |
| 1   | /foo-1        | 20         |
| 2   | /foo-3        | 9          |
| 3   | /foo-2        | 2          |
Should I remove the previous data before overwriting it?
I'm afraid that while the data is being deleted, users won't be able to fetch any info.
I'm using MongoDB.
Technically, Google Analytics data does not change after 72 hours. Assuming you are using the Google Analytics Reporting API v4, you can check the isDataGolden field in the result. If the data is golden, you know it has finished processing and will never change.
So there is no reason to re-request data you have already stored that is older than 72 hours, as it has completed processing.
In the past I have run a nightly request that would select data for the last five days. Before inserting, I would delete anything less than 72 hours old. This ensured that I always got the final totals after a few days and refreshed the incomplete data every day until it was considered final.
Example of a nightly run:
Delete all data in the database for the last three days.
Request data from Google Analytics for the last four days.
Insert the data into the database.
Run through with dates:
Today is 2018-01-11, so I delete all the data in the database for 2018-01-10, 2018-01-09 and 2018-01-08. I select all the rows from Google Analytics for 2018-01-11, 2018-01-10, 2018-01-09 and 2018-01-08 and insert them.
Tomorrow is 2018-01-12, so tomorrow I will delete all the data in the database for 2018-01-11, 2018-01-10 and 2018-01-09. I select all the rows from Google Analytics for 2018-01-12, 2018-01-11, 2018-01-10 and 2018-01-09 and insert them. Notice how tomorrow I will be leaving 2018-01-08 alone, as this data is now processed and won't ever change.
This way you get partial data for the last three days and only ever update data that may not have completed processing.
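As a rough sketch, the nightly run could look like this in SQL against a hypothetical page_views table keyed by report_date (T-SQL date functions shown; the same delete-then-reinsert logic maps to MongoDB's deleteMany and insertMany):

DECLARE @today date = CAST(GETDATE() AS date);

-- drop the still-changing last three days
DELETE FROM page_views
WHERE report_date >= DATEADD(DAY, -3, @today);

-- re-insert the freshly fetched last four days (today plus the three deleted days)
INSERT INTO page_views (report_date, active_source, view_count)
SELECT report_date, active_source, view_count
FROM staging_ga_report          -- rows just fetched from the Analytics API
WHERE report_date >= DATEADD(DAY, -3, @today);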

Optimal View Design To Find Mismatches Between Two Sets of Data

A bit of background...my company utilizes a piece of software that stores information about a mortgage loan in independent fields. These fields are broken up across many tables in the loan database.
My current dilemma revolves around designing a view(s) that will allow me to find mismatched data on a subset of loans from the underwriting side of our software and the lock side of our software.
Here is a quick example of the data returned from the two views that already exist:
UW View
transID | DTIField | LTVField | MIField
50000 | 37.5 | 85.0 | 1
Lock View
transID | DTIField | LTVField | MIField
50000 | 42.0 | 85.0 | 0
In the above situation, the view should return the fields that do not match (in this case the DTIField and the MIField). I have already built a comparison view that uses a series of CASE statements to return 0 for not matched or 1 for matched:
transID | DTIField | LTVField | MIField
50000 | 0 | 1 | 0
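A comparison view along those lines might look roughly like this, with the two existing views assumed to be named vw_UW and vw_Lock and only three of the 27 fields shown:

CREATE VIEW vw_UW_Lock_Compare AS
SELECT uw.transID,
       CASE WHEN uw.DTIField = lk.DTIField THEN 1 ELSE 0 END AS DTIField,
       CASE WHEN uw.LTVField = lk.LTVField THEN 1 ELSE 0 END AS LTVField,
       CASE WHEN uw.MIField  = lk.MIField  THEN 1 ELSE 0 END AS MIField
       -- ...repeated for the remaining fields
FROM vw_UW AS uw
JOIN vw_Lock AS lk
  ON lk.transID = uw.transID;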
This is fine in itself but it is creating a bit of an issue downstream on the reporting side. We want to be able to build a report that would display only those transIDs that have mismatched data and show which columns are not matched. Crystal Reports is the reporting solution in question.
Some specifics about the data sets: we are comparing 27 items of the loan (so 54 fields in total). There are over 4,000 loans in the system and growing. There are already indexes on the transID fields.
How would you structure the view to return all the data needed for the report? We can do a good amount of work in Crystal Reports but ideally much of the logic would be handled in MSSQL.
Thanks for any assistance.
I think there should be no issue in comparing the 27 columns for a given row. Since you'll be reading each row just once and comparing the columns of that row on both sides, it shouldn't really pose any performance problems. You could also use a hash function such as HASHBYTES to assign a hash value to the combination of these 27 fields on each side and then compare that single value to decide which rows the view should return. This should give some performance improvement; testing will reveal more.
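A sketch of the HASHBYTES idea, which returns only the transIDs whose compared fields differ between the two sides (view and column names are assumptions, and CONCAT_WS requires SQL Server 2017 or later):

SELECT uw.transID
FROM vw_UW AS uw
JOIN vw_Lock AS lk
  ON lk.transID = uw.transID
WHERE HASHBYTES('SHA2_256',
          CONCAT_WS('|', uw.DTIField, uw.LTVField, uw.MIField /* , ...remaining fields */))
   <> HASHBYTES('SHA2_256',
          CONCAT_WS('|', lk.DTIField, lk.LTVField, lk.MIField /* , ...remaining fields */));

Note that the hash only tells you which rows mismatch; the CASE-based view is still what tells you which columns differ.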

How to optimize large database requests

I am working with a database that contains information (measurements) about ships. The ships send an update with their position, fuel use, etc. So an entry in the database looks like this
| measurement_id | ship_id | timestamp | position | fuel_use |
| key | f_key | dd-mm-yy hh:ss| lat-lon | in l/km |
A new entry gets added for every ship every second, so the number of entries in the database grows very fast.
What I need for the application I am working on is not the information for one second but rather cumulative data for 1 minute, 1 day, or even 1 year. For example the total fuel use over a day, the distance traveled in a year, or the average fuel use per day over a month.
Fetching and aggregating that from the raw data is infeasible; you would have to pull 31.5 million records from the server just to calculate the distance traveled in a year.
What I thought would be smart is to combine entries into one bigger entry: for example, take 60 measurements and combine them into a 1-minute measurement entry in a separate table, averaging the fuel use and summing the distance traveled between consecutive entries. A minute entry would then look like this:
| min_measurement_id | ship_id | timestamp | position | distance_traveled | fuel_use |
| new key |same ship| dd-mm-yy hh| avg lat-lon | sum distance_traveled | avg fuel_use |
This process could then be repeated for hours, days, months, and years. That way, a query for a week could be answered by requesting only 7 entries, or 168 if I want hourly detail. Those look like far more usable numbers to me.
The new tables can be filled by querying the original database every 10 minutes; that data then fills the minute table, which in turn updates the hours table, and so on.
However, this seems like a lot of management and duplication of almost the same data, with the same operation being run over and over.
So what I am interested in is whether there is some better way of structuring this data. Could it be stored hierarchically (after all, seconds, minutes, and days are pretty hierarchical), or are there other ways to optimize this?
This is the first time I am working with a database of this size, so I did not really know what to look for on the internet.
Aggregates are common in data warehouses, so your approach of grouping the data is fine. Yes, you are duplicating some of the data, but you get the speed benefit in return.
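A rough sketch of the minute-level roll-up described in the question, in T-SQL with assumed table and column names (segment_distance stands in for a precomputed distance between consecutive positions):

DECLARE @window_start datetime2 = '2015-11-16T10:00:00',
        @window_end   datetime2 = '2015-11-16T10:10:00';   -- the 10-minute batch being rolled up

INSERT INTO measurements_per_minute (ship_id, minute_start, avg_fuel_use, distance_traveled)
SELECT ship_id,
       DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [timestamp]), 0) AS minute_start,  -- truncate to the minute
       AVG(fuel_use)         AS avg_fuel_use,
       SUM(segment_distance) AS distance_traveled
FROM measurements
WHERE [timestamp] >= @window_start AND [timestamp] < @window_end
GROUP BY ship_id,
         DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [timestamp]), 0);

The hour, day, and month tables can then be built the same way from the minute table instead of from the raw data.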

Large amount of timecourses in database

I have a rather large amount of data (~400 million data points) organized in a set of ~100,000 timecourses. This data may change every day and, for reasons of revision safety, has to be archived daily.
Obviously this is far too much data to handle efficiently in full, so I did some analysis on sample data. Approximately 60 to 80% of the courses do not change at all between two days, and for the rest only a very limited number of elements changes. All in all, I expect far fewer than 10 million data points to change.
The question is, how do I make use of this knowledge? I am aware of concepts like the delta trees used by SVN and similar techniques; however, I would prefer it if the database itself were capable of handling such semantic compression. We are using Oracle 11g for storage, so the question is: is there a better way than a homebrew solution?
Clarification
I am talking about timecourses representing hourly energy currents. Such a timecourse might start in the past (say, 2005), contains 8,760 elements per year, and might end any time up to 2020 (currently). Each timecourse is identified by a unique string.
The courses themselves are more or less boring:
"Course_XXX: 1.1.2005 0:00 5; 1.1.2005 1:00 5;1.1.2005 2:00 7,5;..."
My task is to make day-to-day changes in these courses visible, and to do so a snapshot has to be taken each day at a given time. My hope is that some lossless semantic compression will spare me from archiving ~20 GB per day.
Basically my source data looks like this:
Key | Value0 | ... | Value23
To archive that data I need to add an additional dimension which directly or indirectly tells me the time at which the data was loaded from the source system, so my archive table is
Key | LoadID | Value0 | ... | Value23
where LoadID is more or less the time the source DB was accessed.
Now, compression in my scenario is easy: LoadIDs grow with each run, and I can store a range, i.e.
Key | LoadID1 | LoadID2 | Value0 | ... | Value23
where LoadID1 gives me the ID of the first load in which the 24 values were observed and LoadID2 gives me the ID of the last consecutive load in which the same 24 values were observed.
In my scenario, this reduces the amount of data stored in the database to 1/30th.
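A sketch of how each load could maintain those ranges in Oracle, assuming hypothetical table names archive and staging (where staging holds just the rows fetched in the current load), a bind variable :current_load holding the new LoadID, and only the first of the 24 value columns spelled out:

-- 1) Extend the open range where the new load repeats the archived values.
UPDATE archive a
   SET a.loadid2 = :current_load
 WHERE a.loadid2 = :current_load - 1
   AND EXISTS (SELECT 1
                 FROM staging s
                WHERE s.course_key = a.course_key
                  AND s.value0 = a.value0
                  /* ... AND s.value23 = a.value23 */);

-- 2) Open a new range for courses that changed or are new.
INSERT INTO archive (course_key, loadid1, loadid2, value0 /* , ..., value23 */)
SELECT s.course_key, :current_load, :current_load, s.value0 /* , ..., s.value23 */
  FROM staging s
 WHERE NOT EXISTS (SELECT 1
                     FROM archive a
                    WHERE a.course_key = s.course_key
                      AND a.loadid2 = :current_load);

Step 1 only extends a range when the immediately preceding load (loadid2 = :current_load - 1) had identical values, so a course that changes and later reverts still gets a fresh row rather than silently reusing the old one.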
