load into Fact Table - sql-server

I have four dimension tables (titles, publishers, stores, period) and I would like to load data into a fact table, but I don't know how. By the way, I'm using SSIS.
The measures of this fact table are Qty and turnover (chiffre d'affaires), which I need to calculate, but I don't know how to. Also, in my source data there is a Qty for every date.
I want to know which SSIS components I need to use to achieve this, plus how to calculate the measures.
I calculate Qty from the sources, but it doesn't give me a Qty for every row; it gives me the Qty of everything in one row!

Your question is very basic, so I suggest you take your time and look at some SSIS tutorials on YouTube, or:
https://learn.microsoft.com/en-us/sql/integration-services/lesson-1-create-a-project-and-basic-package-with-ssis
To answer your question though: you first need to add a Data Flow Task to your Control Flow. Double-click the Data Flow Task and, in the Data Flow, add an Excel Source to connect to your source. If you want to do some calculations, you need to use a Derived Column transformation.
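If you stage the source rows in SQL Server first, the fact load itself can be a single set-based statement. Below is a minimal sketch; all table and column names (FactSales, DimTitle, SourceSales, UnitPrice, and so on) are assumptions for illustration, not your actual schema.

INSERT INTO dbo.FactSales (TitleKey, PublisherKey, StoreKey, PeriodKey, Qty, Turnover)
SELECT
    t.TitleKey,
    p.PublisherKey,
    st.StoreKey,
    pe.PeriodKey,
    SUM(s.Qty)               AS Qty,      -- quantity per title/publisher/store/period combination
    SUM(s.Qty * s.UnitPrice) AS Turnover  -- turnover (chiffre d'affaires) = quantity * unit price
FROM dbo.SourceSales AS s
JOIN dbo.DimTitle     AS t  ON t.TitleNaturalKey     = s.TitleId
JOIN dbo.DimPublisher AS p  ON p.PublisherNaturalKey = s.PublisherId
JOIN dbo.DimStore     AS st ON st.StoreNaturalKey    = s.StoreId
JOIN dbo.DimPeriod    AS pe ON pe.PeriodDate         = s.SaleDate
GROUP BY t.TitleKey, p.PublisherKey, st.StoreKey, pe.PeriodKey;

Grouping by all four dimension keys is what gives you a Qty (and a turnover) per row of the fact grain instead of one aggregated total. In a pure SSIS data flow, the joins correspond to Lookup transformations against the dimensions and the GROUP BY to an Aggregate transformation.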

Related

How to modify the projection of a dataset in an ADF Data Flow

I want to optimize my data flow by reading just the data I really need.
I created a dataset that maps a view on my database. This dataset is used by different data flows, so I need a generic projection.
Now I am creating a new data flow and I want to read just a subset of the dataset.
Here is how I created the dataset:
And this is the generic projection:
Here is how I created the data flow. These are the source settings:
But now I want just a subset of my dataset:
It works, but I think I am doing it wrong:
I want to read data from my dataset (as you can see from the Source settings tab), but when I modify the projection I read from the underlying table (as you can see from the Source options). It seems inconsistent. What is the correct way to manage this kind of customization?
Thank you
EDIT
The solution proposed does not solve my problem. If I go to Monitor and analyze the executions, this is what I see...
Before applying the proposed solution, with the workaround I wrote above, I got this:
As you can see, I read just 8 columns from the database.
With the solution proposed, I get this:
And just then:
Just to be clear, the purpose of my question is:
How can I read only the data I really need, instead of reading all the data and filtering it afterwards?
I found a way (explained in my question), but there is an inconsistency in the configuration of the data flow (I set the dataset as the source, but in the source options I write a query that reads from the underlying table).
First, import the data as a Source.
You can use a Select transformation in the Data Flow activity to select CustomerID from the imported dataset.
Here you can remove unwanted columns.
Refer to https://learn.microsoft.com/en-us/azure/data-factory/data-flow-select
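Note that a Select transformation trims columns after the source has already read them. If the goal is to avoid reading the extra columns at all (as in the workaround described in the question's edit), the projection has to be narrowed at the source itself, for example with a source query along these lines (the view name here is only a placeholder):

SELECT CustomerID
FROM dbo.MyGenericView;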

SQL to Excel - Max each sheet

I have a SQL Table with close to 2 Million rows and I am trying to export this data into an Excel file so the stakeholders can manipulate data, see charts, so on...
The issue is, when I hit refresh, it fails after getting all the data, saying the number of rows exceeds the maximum row limit in Excel. This table is going to keep growing every day.
What I am looking for here is a way to refresh data, then add rows to Sheet 1 until max rows limitation is reached. Once maxed out, I want the rows to start getting inserted into Sheet 2. Once maxed out, move to 3rd sheet, all from the single SQL table, from a single refresh.
This does not have to happen in Excel (the Data -> Refresh option); I can have this as part of the SSIS package that I am already using to populate rows in the SQL table.
I am also open to any alternate ways to export SQL table into a different format that can be used by said stakeholders to create charts, analyze data, and whatever else pleases them.
Without sounding too facetious, you are suggesting a very inefficient method.
The best way of approaching this is not to use .xlsx files for the data storage at all.
Assuming your stakeholders don't have read access to the SQL Server, export the data to .csv and then use Power Query in some sort of 'Dashboard.xlsx' type file to load the .csv into the data model, which can handle hundreds of millions of rows instead of just 1.05m.
This will allow for the use of Power Pivot and DAX for analysis, and the data will also be visible in the data model table view if users do want raw rows (or they can refer to the .csv file).
If they do have SQL read access, then you can query the server directly, so you don't need to store any rows whatsoever, as the workbook will read straight from the server.
Failing all that, if you decide to do it your way, I would suggest the following.
Read your table into a pandas DataFrame and iterate over each row and cell of the DataFrame, writing to your xlsx [Sheet1] using openpyxl; then, once the row number reaches 1,048,560, simply move on to xlsx [Sheet2].
In short: openpyxl allows you to create workbooks, worksheets, and write to cells directly.
But depending on how many columns you have, it could take an incredibly long time.
Product limitations: Excel 2007+ supports 1,048,576 rows by 16,384 columns.
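If you would rather keep everything inside your existing SSIS package, another option is to let SQL Server assign a sheet number to each row up front and then route rows to the matching worksheet destination. A minimal sketch, assuming a table dbo.BigTable with an Id column to order by (both names are hypothetical):

SELECT
    t.*,
    ((ROW_NUMBER() OVER (ORDER BY t.Id) - 1) / 1048576) + 1 AS SheetNumber  -- 1-based sheet index
FROM dbo.BigTable AS t;

A Conditional Split (or a loop over the distinct SheetNumber values) can then send each batch to its own Excel destination sheet.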
A challenge with your suggestion of filling a worksheet with the max number of rows and then splitting is "How are they going to work with that data?" and "Did you split data that should have been together to make an informed choice?"
If Excel is the tool the users want to use and they must have access to all the data, then you're going to need to put the data into a Power Pivot data model (and yes, that's going to impact the availability of some data visualizations). A Power Pivot model is an in-memory tabular data set. What that means is that the data engine, xVelocity, is going to use a bunch of memory but can get past the 1 million row limitation. Depending on how much memory is required, you might need to switch from the default 32-bit Office install to a 64-bit install (and I've seen clients have to max out the RAM on old, low-end desktops because they went cheap for business users).
Power Pivot will have a connection to your SQL Server (or other provider). When it refreshes data, it's going to fire off queries and determine the unique values in columns and then create a dictionary of unique values. This allows it to compress the data with low cardinality really well - sales dates are likely going to be repeated heavily within your set so the compression is good. Assuming your customers are typically not-repeat customers, a customer surrogate key would have high cardinality and thus not compress well since there's little to no repeat. The refresh is going to be dependent on your use case and environment. Maybe the user has to manually kick it off, maybe you have SharePoint with Excel services installed and then you can have it refresh the data on various intervals.
If they're good analysts, you might try turning them on to Power BI. It has the same-ish engine behind the scenes, but it's built from the ground up to be a responsive reporting tool. If they're just wading through tables of data, they're not ready for PBI. If they are making visuals out of the data, PBI is likely a better fit.

How to read and change series numbers in a column in SSIS?

I'm trying to manipulate a column in SSIS which looks like the example below, after I removed unwanted rows with a Derived Column and a Conditional Split in my data flow task. The source for this is a flat file.
XXX008001161022061116030S1TVCO3057
XXX008002161022061146015S1PUAG1523
XXX009001161022063116030S1DVLD3002
XXX009002161022063146030S1TVCO3057
XXX009003161022063216015S1PUAG1523
XXX010001161022065059030S1MVMA3020
XXX010002161022065129030S1TVCO3057
XXX01000316102206515901551PPE01504
The first three digits after "XXX" (starting with "008" on the first row) represent a series, and the next three ("001") represent a sequence number within the series. What I need is to renumber the series, starting from "001" and continuing to the end.
The desired result would thus look like:
XXX001001161022061116030S1TVCO3057
XXX001002161022061146015S1PUAG1523
XXX002001161022063116030S1DVLD3002
XXX002002161022063146030S1TVCO3057
XXX002003161022063216015S1PUAG1523
XXX003001161022065059030S1MVMA3020
XXX003002161022065129030S1TVCO3057
XXX00300316102206515901551PPE01504
...
My potential solution would be to load the file into a temporary database table and query it with SQL from there, but I am trying to avoid this.
The final destination is a flatfile.
Does anybody have any ideas how to pull this off in SSIS? Other solutions are appreciated also.
Thanks in advance
I would definitely use the staging table approach and use window functions to accomplish this. I could see a use case for staying out of the database if SSIS were on a different machine than the database engine and there was a need to offload the processing to the SSIS box.
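A rough sketch of the window-function approach, assuming the lines have been staged into a table dbo.Staging with a single varchar column named Line (both names are assumptions):

SELECT
    LEFT(Line, 3)
    -- DENSE_RANK turns the original series values 008, 009, 010 into 1, 2, 3,
    -- which are then padded back out to three digits
    + RIGHT('000' + CAST(DENSE_RANK() OVER (ORDER BY SUBSTRING(Line, 4, 3)) AS varchar(3)), 3)
    + SUBSTRING(Line, 7, LEN(Line) - 6) AS RenumberedLine
FROM dbo.Staging;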
In that case I would create a Script transformation. You can process each row and make the necessary changes before passing the row to the output. You can use C# or VB.
There are many examples out there. Here is an MSDN article: https://msdn.microsoft.com/en-us/library/ms136114.aspx

SSIS error when importing an Excel date column and feeding it to a Slowly Changing Dimension

I'm hoping someone that has come across this can help me out, because I'm pulling my hair out here.
I have an Excel sheet that has a bunch of columns, one of them being a date column. When I use an Excel Source and link it to a Slowly Changing Dimension transformation, everything goes great until I click the last button to configure the component, and then I get the following message. The date column is coming from Excel as a DB_DATE type and the database column is date. I've tried a Data Conversion and a Derived Column to coerce the date, but still no love. Any ideas?
Here is the error:
Error at Data Flow Task [SSIS.Pipeline]: The component view is unavailable. Make sure the component view has been created.
Error at Data Flow Task [Slowly Changing Dimension [26]]: The input column "input column "TargetDate" (94)" cannot be mapped to external column "external column "TargetDate" (87)" because they have different data types. The Slowly Changing Dimension transform does not allow mapping between column of different types except for DT_STR and DT_WSTR.
I have also successfully used a standard OLE DB destination with this same Excel sheet with a date field, and it imported the whole sheet fine, so I can't see why it's having an issue with the Slowly Changing Dimension.
Based on the advice of some SSIS rock stars I've met, you're probably better off avoiding the SCD transformation and rolling your own. The fact that you're getting a standard OLE DB destination to work leads me to this conclusion. The advice I received against the SCD transformation is based on performance: as I recall, OLE DB Commands are generated, and that gets you into row-by-row slowness.
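Rolling your own usually means landing the source rows in a staging table and applying the changes set-based. A minimal Type 1 sketch using MERGE, where every table and column name (DimCustomer, stg.Customer, CustomerCode, CustomerName, TargetDate) is a placeholder:

MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerCode = src.CustomerCode
WHEN MATCHED AND (tgt.TargetDate <> src.TargetDate OR tgt.CustomerName <> src.CustomerName) THEN
    UPDATE SET tgt.TargetDate   = src.TargetDate,
               tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerCode, CustomerName, TargetDate)
    VALUES (src.CustomerCode, src.CustomerName, src.TargetDate);

Type 2 history takes more work (expiring the current row and inserting a new one), but a set-based statement like this avoids the row-by-row OLE DB Command pattern that makes the built-in SCD transformation slow.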

Cross-reference Distance chart in SQL Server 2008 or Excel?

I would like to construct a distance cross-reference chart similar to the one here (the example is a road-distance cross-reference chart) and, ideally, store the data in SQL Server 2008 (preferably the Express version). It needs these properties/abilities:
Every column has a corresponding row with the same name (i.e. not misspelled like in my example).
Changing the value at one Row-Column intersection would update the mirror intersection (Column-Row) or the mirror data could be ignored.
The distance-values would need to be end-user editable.
The end-user would need to be able to add, delete or rename a column/row pair.
The end-user needs to be able to sort the columns and have the rows move automatically.
There could be hundreds of pairs.
A look-up query needs to find a distance given a start and destination (row and column).
The distance chart is reasonably straightforward to implement in Excel. Considering this, am I better off...
Using Excel as the user editing UI and then updating an SQL 'thing' with the new data?
Using Excel as the data-source even if it means performance issues with querying the data?
Using an as-yet undiscovered stroke of genius detailed here in an answer?
Sure looks like an Excel application to me, start to end. (heh)
I can't imagine your users typing enough data in to make performance an issue. Since the chart is square, the real ceiling is Excel's column limit (16,384 columns in Excel 2007+), and hundreds of pairs is nowhere near that. If that's enough, I'd say you're golden.
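If you do end up storing the data in SQL Server behind the Excel front end, a minimal sketch of a pair table and the look-up query could look like this (the table, column, and parameter names are all assumptions):

CREATE TABLE dbo.Distance (
    FromLocation varchar(100) NOT NULL,
    ToLocation   varchar(100) NOT NULL,
    Miles        int          NOT NULL,
    CONSTRAINT PK_Distance PRIMARY KEY (FromLocation, ToLocation)
);

-- Look up a distance regardless of which direction the pair was stored in
SELECT TOP (1) Miles
FROM dbo.Distance
WHERE (FromLocation = @Start AND ToLocation = @Destination)
   OR (FromLocation = @Destination AND ToLocation = @Start);

Storing each pair only once and querying both directions sidesteps the need to keep the mirror intersection in sync.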
