Database/Datawarehouse design suggestions - sql-server

We have a legacy ERP system that stores data in flat files. We have replicated these flat files in a SQL Server database pretty much as they are.
Some of the sales tables store historical data in multiple columns without any dates. The column name tells us which month the sales data belongs to: Sales01 is the current month, Sales02 is the previous month, Sales03 is the month before that, and so on. The same applies to sales quantity and margin, i.e. Qty01, Qty02 and Margin01, Margin02, and so on. This is repeated for each customer and each item sold.
Now, I am working on a small project where I have to design a small DB for reporting with some tables that will be fed by this main database.
I want to load this data so that the values for each month are stored in rows, with a month-year or first-day-of-the-month date in another column, so I can filter by date in a WHERE clause.
I'm not sure what the best approach would be. I have written a stored procedure in the past to load data this way, but I wonder if there is a better way.
Can I somehow use the SSIS Pivot transformation?
Or should I just use PIVOT or a similar statement in a stored procedure?
I will most likely use this practice to build a data warehouse in the future.
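Turning month-numbered columns into dated rows is an unpivot rather than a pivot. As a minimal sketch of the idea in T-SQL, assuming a hypothetical three-month version of the wide table and assuming Sales01 refers to the calendar month in which the load runs, CROSS APPLY with a VALUES list keeps sales, quantity, and margin together on one row per month. The table and column names other than Sales01/Qty01/Margin01 and their siblings are made up for illustration.

```sql
-- Hypothetical wide table mirroring the ERP layout (only three months shown).
CREATE TABLE #SalesWide (
    CustomerID int, ItemID int,
    Sales01 money, Sales02 money, Sales03 money,
    Qty01 int, Qty02 int, Qty03 int,
    Margin01 money, Margin02 money, Margin03 money
);

-- Hypothetical reporting table: one row per customer, item, and month.
CREATE TABLE #SalesByMonth (
    CustomerID int, ItemID int, MonthStart date,
    Sales money, Qty int, Margin money
);

INSERT INTO #SalesWide
VALUES (1, 10, 500, 450, 400, 5, 4, 4, 50, 45, 40);

-- Assumption: column 01 is the month of the load date. If the source is an
-- older extract, anchor on the extract date instead of GETDATE().
DECLARE @CurrentMonth date = DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1);

INSERT INTO #SalesByMonth (CustomerID, ItemID, MonthStart, Sales, Qty, Margin)
SELECT s.CustomerID,
       s.ItemID,
       DATEADD(MONTH, -v.MonthsBack, @CurrentMonth) AS MonthStart,
       v.Sales,
       v.Qty,
       v.Margin
FROM #SalesWide AS s
CROSS APPLY (VALUES
    (0, s.Sales01, s.Qty01, s.Margin01),   -- current month
    (1, s.Sales02, s.Qty02, s.Margin02),   -- previous month
    (2, s.Sales03, s.Qty03, s.Margin03)    -- two months back
) AS v (MonthsBack, Sales, Qty, Margin);

-- Each wide row now yields one dated row per month.
SELECT * FROM #SalesByMonth ORDER BY MonthStart DESC;
```

Extending the VALUES list covers the remaining monthly columns. The same statement could sit inside a stored procedure or an SSIS Execute SQL task; whether it beats the SSIS Unpivot transformation depends on data volume and on where the team prefers to keep the logic.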

Related

SSAS Semi-Additive Measures on some dates

So I'm having trouble trying to configure a new cube that takes a snapshot of my company's open orders each day. Every night, a snapshot is taken and stored in the data warehouse with a date key for the date the snapshot was taken. This date would be the one that we want to be non-additive. However, we also have other dates in this data set, such as scheduled ship-date, order date, etc. that are fully-additive, just like the other non-date dimensions.
Does anyone have any advice on how I can create a cube for this data so that the order totals can be summed across the other dates, but the capture date is LastNonEmpty?
The first connected Date dimension in the Dimension Usage tab is the semi-additive Date dimension. The rest are additive. I describe the exact behavior here.
This answer applies to Multidimensional cubes (not Tabular) which is what I assume you have since you mentioned LastNonEmpty.

Creating "View" to connect two Databases

I have two databases, one covering financial year 2018-2019 and another covering financial year 2019-2020.
These are Sage databases, so each has a lot of tables, and both databases have exactly the same structure.
I had a report that looked at the current financial year with no problem, but I didn't realise that the previous financial year would be stored in a separate database.
I've said "View" in my title, but that's not really what I need, as I need whole tables.
So my question is this: is there a construct in SQL that will let me create a new data source of some kind containing the 12 tables I actually need for my report, where the data in each table is the union of the corresponding table from 18-19 and the one from 19-20?
I hope that makes at least a little sense.
(I would just try and make use of a view, but there's multiple datasets within the report which make use of the joins in slightly different ways)

How can I use variables and SQL code within an SSIS package?

I have an SSIS package I am building to take data from a .CSV file and load it into a table in a SQL Server database. The .CSV file has more columns than my table and I'm looking to filter out the data based on some of these columns that are not being inserted into the table.
I have year, kind, type, dollars as my column names in the .CSV file but I'm only pulling type and dollars into the DB. However, I can only pull those rows where the kind= "L" and year is the current year (with one major caveat). If the process is running in the first quarter of a given year (so month <= 3) it needs to use the previous year as my qualifier for what rows it pulls in from the .CSV file. For instance, say it is February 2015 when this package is running, I need it to pull only rows with a year of 2014 and kind="L" from my .CSV file. If it is September 2015 then it needs to pull in rows with a year of 2015 and kind="L".
Any idea what the best way of doing this is? Right now I have a conditional split in my package, but I can only get it to say year==YEAR(GETDATE()), and this will not work for the first quarter. I'd need some sort of variable logic to say something like IF(currentmonth<=3 THEN #year = currentyear-1) ELSE (#year = currentyear) and then use the #year variable in the conditional split. Is this possible?
Any help is much appreciated!
Normally for this kind of workflow, I will import the entire CSV into a temporary table, and then have a separate SQL script or view which reads from the temporary table and applies whatever business logic is needed for the final view.
If you want the logic to be in the SSIS package, you can use a Derived Column component to declare a new boolean field, for example IncludeRowInOutput, and set it to something like the following (SSIS expression syntax, assuming year is read as an integer column):
[kind] == "L" && [year] == (MONTH(GETDATE()) <= 3 ? YEAR(GETDATE()) - 1 : YEAR(GETDATE()))
Then you can do the conditional split based on the IncludeRowInOutput field.
I'd normally be wary of using too many script components; I find that they are harder to debug and make the data flow harder to understand.
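For the first option (import the whole CSV, then filter in SQL), the first-quarter rule could be sketched as below. This is only an illustration: #CsvStaging and #Target are hypothetical names, and the column types are guesses based on the column names in the question.

```sql
-- Hypothetical staging table holding every row of the CSV as loaded by SSIS.
CREATE TABLE #CsvStaging (
    [year]  int,
    kind    char(1),
    [type]  varchar(50),
    dollars decimal(18, 2)
);

-- Hypothetical destination table with only the columns that are kept.
CREATE TABLE #Target (
    [type]  varchar(50),
    dollars decimal(18, 2)
);

-- In January through March use the previous year, otherwise the current year.
DECLARE @Today date = GETDATE();
DECLARE @TargetYear int =
    CASE WHEN MONTH(@Today) <= 3 THEN YEAR(@Today) - 1
         ELSE YEAR(@Today)
    END;

INSERT INTO #Target ([type], dollars)
SELECT [type], dollars
FROM #CsvStaging
WHERE kind = 'L'
  AND [year] = @TargetYear;
```

Keeping the rule in one variable makes it easy to test for a specific run date by assigning @Today a fixed value instead of GETDATE().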

storing time and day of week

Challenge :
I have a requirement to implement recurring events. I am storing the day of the week, the time, and the date range over which the event will recur.
Possible solution:
Storing time and day of week as string and enumeration.
Storing current and setting the time and day for that day ?
Question :
What is the best practice for storing time and day of week?
EDIT: I am using a SQL Server database.
Another alternative is to have computed columns representing the parts that you're after.
CREATE TABLE dbo.foo
(
    bar DATETIME,
    -- WEEKDAY numbering depends on the session's SET DATEFIRST setting
    bar_day_of_week AS DATEPART(WEEKDAY, bar),
    bar_time AS CONVERT(TIME, bar)
);
INSERT dbo.foo(bar) SELECT GETDATE();
SELECT bar, bar_day_of_week, bar_time FROM dbo.foo;
The best approach might depend on the database you are using. But there are two general approaches, both quite reasonable.
The first is to store dates and times in the native format for the database. Most databases have functions to extract day of the week and time from a date time type. You would write your queries using these functions. Typically, you would store these as one field, but sometimes you might want to separate the date and time portions.
The second is to have a calendar table. A calendar table would have a date or dateid as a key, and then contain columns for what you want to know about it. Day of the week would be an obvious column.
A calendar table is the preferred solution in some situations. For instance, if you are going to internationalize your application, then being able to get the day of the week from a table makes it easier to get the string in English, Greek, Chinese, Russian, Swahili, or whatever your favorite language is. Or, if you want to keep track of specific holidays, then a calendar table can store this information as well. Or, if you have a business running on a strange financial calendar (such as 5-4-4), then a calendar table is a good choice.
In either case, you do not need to store redundant date information in the table. You should derive it, either from a built-in function or by looking up what you want in a reference table.
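To make the calendar-table option concrete, here is a minimal T-SQL sketch. The table name, column names, the 2024 date range, and the Monday = 1 numbering are all assumptions chosen for illustration; a real calendar table would usually carry more columns (holiday names, fiscal periods, translated day names).

```sql
-- Hypothetical calendar table: one row per date.
CREATE TABLE #Calendar (
    CalendarDate    date        NOT NULL PRIMARY KEY,
    DayOfWeekNumber tinyint     NOT NULL,  -- 1 = Monday ... 7 = Sunday (chosen convention)
    DayNameEnglish  varchar(10) NOT NULL,
    IsHoliday       bit         NOT NULL DEFAULT 0
);

-- Fill one year of dates with a recursive CTE.
WITH Dates AS (
    SELECT CAST('2024-01-01' AS date) AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM Dates WHERE d < '2024-12-31'
)
INSERT INTO #Calendar (CalendarDate, DayOfWeekNumber, DayNameEnglish)
SELECT d,
       -- 1900-01-01 was a Monday, so this does not depend on SET DATEFIRST
       (DATEDIFF(DAY, '19000101', d) % 7) + 1,
       -- DATENAME follows the session language; English is assumed here
       DATENAME(WEEKDAY, d)
FROM Dates
OPTION (MAXRECURSION 366);

-- Example lookup: every Tuesday in a recurring event's date range.
SELECT CalendarDate
FROM #Calendar
WHERE DayOfWeekNumber = 2
  AND CalendarDate BETWEEN '2024-03-01' AND '2024-03-31';
```

With this in place, a recurring event only needs to store its day-of-week number, a TIME value, and its start and end dates; the individual occurrences come from joining to the calendar.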

Recommended way of adding daily data to database

I receive new data files every day. Right now, I'm building the database with all the required tables to import the data and perform the required calculations.
Should I just append each new day's data to my current tables? Each file contains a date column, which would allow for a "WHERE" query in the future if I need to analyze data for one particular day. Or should I be creating a new set of tables for every day?
I'm new to database design (coming from Excel). I will be using SQL Server for this.
Assuming that the structure of the data being received is the same, you should only need one set of tables rather than creating new tables each day.
I'd recommend storing the value of the date column from your incoming data in your database, and also having a 'CreateDate' column in your tables, with a default value of 'GetDate()' so that it automatically gets populated with the current date when the row is inserted.
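A minimal sketch of that layout, with hypothetical table and column names apart from CreateDate:

```sql
-- Hypothetical table for the daily files. DataDate comes from the file's own
-- date column; CreateDate is filled in automatically at insert time.
CREATE TABLE #DailyData (
    DailyDataID int IDENTITY(1, 1) PRIMARY KEY,
    DataDate    date           NOT NULL,
    Amount      decimal(18, 2) NULL,
    CreateDate  datetime       NOT NULL DEFAULT (GETDATE())
);

-- The insert lists only the columns supplied by the file.
INSERT INTO #DailyData (DataDate, Amount)
VALUES ('2024-01-15', 100.00),
       ('2024-01-16', 120.50);

-- Analysing a single day is then a plain WHERE clause.
SELECT DataDate, Amount, CreateDate
FROM #DailyData
WHERE DataDate = '2024-01-15';
```

Because the default applies only when CreateDate is omitted from the column list, a reload can still supply an explicit value if the original load time needs to be preserved.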
You may also want to have another column to store the data filename that the row was imported from, but if you're already storing the value of the date column and the date that the row was inserted, this shouldn't really be necessary.
In the past, when doing this type of activity using a custom data loader application, I've also found it useful to create log files for success, error, and warning messages, including some type of unique key for both the source data and the target database. For example, if the data comes from an Excel file and goes into a database table, you could store the row index from Excel and the primary key of the inserted row. This helps with tracking down any problems later on.
You might want to have a look at SSIS (SQL Server Integration Services). It's the SQL Server tool for ETL activities.
Yes, append each day's data to the tables; use one set of tables for all data.
Yes, use a date column to identify the day that the data was loaded.
Maybe have another table with a date column and a CLOB column (varchar(max) in SQL Server): the date holds the load date and the CLOB holds the file that you imported.
Good question. You most definitely should have a single set of tables and append the data daily. Consider this: if you create a new set of tables each day, what would, say, a monthly report query look like? A quarterly report query? It would be a mess, with UNIONs and JOINs all over the place.
A single set of tables with a WHERE clause makes the querying and reporting manageable.
You might do a little reading on relational database theory. Wikipedia is a good place to start. The basics are pretty straightforward if you have the knack for it.
I would load the data into a stage table regardless and append it to the main tables afterwards. Once a week I would then refresh all data in the main table to ensure that it remains consistent with the source.
Marcus
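The stage-then-append pattern might look roughly like this. The table names, the columns, and the choice of (DataDate, ItemCode) as the row key are assumptions for illustration only; the weekly full refresh is not shown.

```sql
-- Hypothetical stage and main tables with the same shape.
CREATE TABLE #DailyStage (DataDate date, ItemCode varchar(20), Amount decimal(18, 2));
CREATE TABLE #DailyMain  (DataDate date, ItemCode varchar(20), Amount decimal(18, 2));

-- Step 1: the day's file is loaded into the stage table (simulated here).
INSERT INTO #DailyStage (DataDate, ItemCode, Amount)
VALUES ('2024-01-15', 'A100', 25.00),
       ('2024-01-15', 'B200', 40.00);

-- Step 2: append only rows the main table does not already hold, so that
-- re-running the same file does not create duplicates.
BEGIN TRANSACTION;

INSERT INTO #DailyMain (DataDate, ItemCode, Amount)
SELECT s.DataDate, s.ItemCode, s.Amount
FROM #DailyStage AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM #DailyMain AS m
    WHERE m.DataDate = s.DataDate
      AND m.ItemCode = s.ItemCode
);

COMMIT TRANSACTION;

-- Step 3: empty the stage table ready for the next file.
TRUNCATE TABLE #DailyStage;
```

Validation or cleansing checks can run against the stage table between steps 1 and 2 without touching the data that reports already depend on.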
