Updating dimension tables using SQL Server (BIDS or Data Tools) - sql-server

I'm quite confused as to how I'm supposed to be adding dimension members to my data warehouse. Let's say that TOWN_NAME is a dimension table that links town_Id to a town_name. So, now, I have 1000 customer names, and they are from 9 towns. Suddenly, in my next ETL process, a customer is added whose town is not among the 9 towns I have in my dimension, so I need to add a member to my dimension table. Which step/process in BIDS or Data Tools (BIDS 2012) would I have to use? How should this be done? I'm quite lost as to what could be done.

The usual pattern - regardless of what tools you're using to populate your data warehouse - is to populate your Dimension before you populate your Fact, precisely to avoid this problem.
The usual way to do things is to have a package which pulls your Dimension data out of your source system(s) and then loads any new rows into your Dimension table. Then, when your Fact table load happens later in the process, you look up the ID column from the Dimension using the town name. Your Fact data is then loaded into the Fact table with the ID of the relevant town as one of its column values.
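To make the pattern concrete, here is a rough T-SQL sketch of the two steps (the sort of thing the Dimension and Fact loads boil down to); the table and column names (staging.Customer, dbo.DimTown, dbo.FactCustomer) are illustrative assumptions, not your actual schema:
-- 1. Dimension load: add any towns that appear in staging but are not yet in the dimension.
INSERT INTO dbo.DimTown (TownName)
SELECT DISTINCT s.TownName
FROM staging.Customer AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimTown AS d WHERE d.TownName = s.TownName);

-- 2. Fact load afterwards: look up each town's surrogate key by name.
INSERT INTO dbo.FactCustomer (CustomerName, TownId)
SELECT s.CustomerName, d.TownId
FROM staging.Customer AS s
JOIN dbo.DimTown AS d ON d.TownName = s.TownName;
In SSIS itself, step 2 is typically a Lookup transformation against the Dimension table inside the Fact package's Data Flow.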
Specifically, in SSIS, you can manage this by creating a package which does your Dimension table load, and another package in the same project which does your Fact table load. Then you can control the order these happen in a couple of different ways:
You can create a third package which uses two Execute Package tasks to call the Dimension package first, and then the Fact package.
You could create a SQL Server Agent Job which first calls the Dimension package, then calls the Fact package (a rough sketch of this follows below).
If you want to be able to run everything from within Visual Studio in order in one go, then go with the first option.
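For the second option, here is a hedged sketch of creating such an Agent job in T-SQL; the job name and package paths are made up for illustration, and in practice you can build the same job through the SSMS UI instead:
-- Step 1 runs the Dimension package and, on success, continues to step 2, which runs the Fact package.
EXEC msdb.dbo.sp_add_job @job_name = N'DW Nightly Load';
EXEC msdb.dbo.sp_add_jobstep @job_name = N'DW Nightly Load', @step_name = N'Load dimensions',
    @subsystem = N'CmdExec', @command = N'dtexec /F "C:\ETL\LoadDimensions.dtsx"',
    @on_success_action = 3;  -- 3 = go to the next step
EXEC msdb.dbo.sp_add_jobstep @job_name = N'DW Nightly Load', @step_name = N'Load facts',
    @subsystem = N'CmdExec', @command = N'dtexec /F "C:\ETL\LoadFacts.dtsx"';
EXEC msdb.dbo.sp_add_jobserver @job_name = N'DW Nightly Load';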

Related

Adding table values to a treeview

I am using VB 2008 Express Edition and am very new to treeviews. I have basic knowledge of how to connect to a database. The database I am working with is a Microsoft Access database and has a large number of tables with various information. Two of these tables I need to put into a treeview. One has 2 columns called date and date ID; the dates will be the main nodes in the treeview. The other table has 8 columns, among them the corresponding date IDs from the first table, the purchase order ID and the purchase order number. The child nodes will be the purchase order numbers.
Now I know there are a bunch of tutorials out there on treeview population from Microsoft Access databases, but I have found none covering specifically what I need; they are all just about dumping ALL the data from the database into the tree. I just want specific contents of two tables. If someone could help me out with this I would be very grateful. I can give more information if needed on what I am working with or anything else.
This is an example of what it needs to look like. I am upgrading this program from VB6 to VB.NET, which is why I already have the program.
What you will have to do is loop through the first table (using SQL and a DataReader, for example) and then create the initial (parent) nodes. (Note that the below is a general idea; the column names and the DataReader setup are placeholders you will have to adapt to your tables.)
While reader.Read() ' reader is a DataReader over the dates table
    TreeView1.Nodes.Add(reader("DateID").ToString(), reader("Date").ToString())
End While
Then, loop through the second table, adding the record to the correct node...
While reader2.Read() ' reader2 is a DataReader over the purchase orders table
    TreeView1.Nodes(reader2("DateID").ToString()).Nodes.Add(reader2("POID").ToString(), reader2("PONumber").ToString())
End While
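For reference, the two queries fed to those DataReaders might look something like the following; the table and column names (Dates, PurchaseOrders, [Date ID], [PO Number]) are assumptions based on the question, so substitute your actual Access names:
SELECT [Date ID], [Date] FROM Dates ORDER BY [Date]

SELECT [Date ID], [PO ID], [PO Number] FROM PurchaseOrders ORDER BY [Date ID], [PO Number]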

Load data from multiple source into a destination

I have a desktop application through which data is entered, and it is captured in an MS Access DB. The application is used by multiple users (at different locations). The idea is to download the data entered for a particular day into an Excel sheet and load it into a centralized server, which is an MS SQL Server instance.
That is, data (in the form of Excel sheets) will come from multiple locations and be saved into a shared folder on the server, and it needs to be loaded into SQL Server.
There is an ID column with IDENTITY in the MS SQL Server table, which is the primary key column, and there are no other columns in the table which contain unique values. Though the data is coming from multiple sources, we need to maintain a single auto-updating series (IDENTITY).
Suppose, if there are 2 sources,
Source1: Has 100 records entered for the day.
Source2: Has 200 records entered for the day.
When they get loaded into the Destination (SQL Server), the table should have 300 records, with ID column values from 1 to 300.
Also, for the next day, when the data comes from the sources, the Destination has to load data starting from ID 301.
The issue is, there may be requests to change data at a Source which is already loaded into the central server. So how do I update that row in the central server, given that the ID column value will not be the same in the Source and the Destination? As mentioned earlier, the ID is the only unique-value column in the table.
Please suggest some ideas for doing this, or whether I have to take a different approach to accomplish this task.
Thanks in advance!
Krishna
Okay so first I would suggest .NET and doing it through a File Stream Reader, dumping it to the disconnected layer of ADO.NET in a DataSet with multiple DataTables from the different sources. But... you mentioned SSIS so I will go that route.
Create an SSIS project in Business Intelligence Development Studio (BIDS).
If you know for a fact you are just importing a bunch of Excel files, I would create either many 'Data Flow Tasks' or many source-to-destination flows in a single 'Data Flow Task'; it's up to you.
a. Personally, I would create a table in the database for each Excel file location and have their columns map up. I will explain why later.
b. In a Data Flow Task, select 'Excel Source' as the source. Double-click the Excel Source and point a new connection at the appropriate file location.
c. Choose an 'ADO NET Destination' and drag the blue line from the Excel Source to this endpoint.
d. Map the destination to the corresponding table in SQL.
e. Repeat as needed for each Excel file.
Set up the SSIS package to run automatically from SQL Server through SQL Server Management Studio. Remember to connect to an Integration Services instance, not a database instance.
Okay, now you have a bunch of tables instead of one big one, right? I did that for a reason: they should just be entry points, and I would leave the logic for spotting dupes and recording import time to another table.
I would set up another two tables: one for the combined data and one for auditing duplicates later.
a. Create a table like 'Imports' or similar. Have the columns be the same, but add two more columns to it: 'ExcelFileLocation' and 'DateImported'. Also create an 'identity' column as the first column, seeded with the default of (1,1), and assign it as the primary key. (A rough CREATE TABLE sketch follows further below.)
b. Create a second table like 'ImportDupes' or similar, repeating the process above for the columns.
c. Create a unique constraint on the first table on a value, or set of values, that makes an import row unique.
d. Write a 'procedure' in SQL to do inserts from the MANY tables that match up to the Excel files into the ONE 'Imports' location. In the many inserts, do a process similar to:
Begin Try
    Insert Into Imports (datacol1, datacol2, ExcelFileLocation, DateImported)
    Select datacol1, datacol2, 'location of file', GetDate()
    From TableExcel1
End Try
-- if the insert breaks the unique constraint, put the rows into the second table
Begin Catch
    Insert Into ImportDupes (datacol1, datacol2, ExcelFileLocation, DateImported)
    Select datacol1, datacol2, 'location of file', GetDate()
    From TableExcel1
End Catch
-- repeat the above for EACH excel staging table
-- clean up the individual staging tables for the next import cycle, for EACH excel table
Truncate Table TableExcel1
e. Automate the procedure to run on a schedule.
You now have two tables, one for successful imports and one for duplicates.
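A rough CREATE TABLE sketch of those two tables; the datacolN columns stand in for whatever your Excel columns actually are, and the unique constraint goes on whatever makes an import row unique:
CREATE TABLE dbo.Imports (
    ImportId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    datacol1 VARCHAR(100) NULL,
    datacol2 VARCHAR(100) NULL,
    ExcelFileLocation VARCHAR(260) NOT NULL,
    DateImported DATETIME NOT NULL DEFAULT (GETDATE()),
    CONSTRAINT UQ_Imports UNIQUE (datacol1, datacol2)  -- whatever combination makes a row unique
);
CREATE TABLE dbo.ImportDupes (
    ImportDupeId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    datacol1 VARCHAR(100) NULL,
    datacol2 VARCHAR(100) NULL,
    ExcelFileLocation VARCHAR(260) NOT NULL,
    DateImported DATETIME NOT NULL DEFAULT (GETDATE())
);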
The reason I did what I did is twofold:
A lot of the time you need to know more than just the data itself: when it came in, what source it came from, whether it was a duplicate, and, if you do this for millions of rows, whether it can be indexed easily.
This model is easier to take apart and automate. It may be more work to set up, but if a piece breaks you can see where, and you can easily stop the import for one location by turning off the code in that section.

Database Table Design: Expanding a data table while maintaining backwards compatibility

The project I'm working on tracks data on a year-by-year basis. The user will log into the system and specify the year whose data they want to access. For example, the user could specify the year 2004, and the .jsp pages will display 2004 data.
My problem is that from 2013 onward, the data for one .jsp page will be different, and the current database table schema needs to be modified, but backwards compatibility for 2012 and earlier years needs to be maintained.
Currently (2012 and before), the relevant database table has two columns, "continuing students" & "new starts", which are displayed by a single .jsp. For 2013 and onward, 4 columns need to be displayed. The original two columns are being split into two subcategories each, undergrad and graduate. So I can't simply add those new columns to the existing table, because that would violate third normal form.
What do you think is the best way to manage this situation? How do I display the new data while still maintaining backwards compatibility to display the data for older years?
Some options:
Introduce the fields and allow for nulls for older data. I think you rejected this idea.
Create new table structures to store the new data going forward. It's at least an option if you don't want (1). You could easily create a view that queries from both tables and presents a unified set of data (a sketch of such a view follows below). You could also handle this in the UI and call two separate stored procedures depending on the year queried.
Create a new table with the new attributes and then optionally join back to your original table. This keeps the old table the same and the new table is just an extension of the old data. You would write a stored procedure potentially to take in the year and then return the appropriate data.
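For option (2), here is a minimal sketch of a view that presents both table shapes as one result set, assuming hypothetical table names EnrollmentLegacy and EnrollmentCurrent and a ReportYear column (none of these come from the question):
CREATE VIEW dbo.EnrollmentUnified AS
SELECT ReportYear,
       ContinuingStudents,
       NewStarts,
       NULL AS ContinuingUndergrad,   -- detail not captured before 2013
       NULL AS ContinuingGraduate,
       NULL AS NewStartsUndergrad,
       NULL AS NewStartsGraduate
FROM dbo.EnrollmentLegacy             -- 2012 and earlier
UNION ALL
SELECT ReportYear,
       ContinuingUndergrad + ContinuingGraduate,
       NewStartsUndergrad + NewStartsGraduate,
       ContinuingUndergrad,
       ContinuingGraduate,
       NewStartsUndergrad,
       NewStartsGraduate
FROM dbo.EnrollmentCurrent;           -- 2013 onward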
One of the things to really consider is that the old data is now inactive. If you aren't writing to it anymore, it's just historical data that can be "archived" mentally. In that case I think it's ok to freeze the schema and the data and let it live by itself.
Also consider if your customers are likely to change the schema yet again. If so, then maybe (1) is the best.

Merging multiple Access databases into SQL Server

We have a program in which each user is given their own Access database. We'd like to merge these all together into a single SQL Server database.
The problem is that, using the SQL Server import/export wizard, the primary/foreign keys do not get updated. So for instance if one user has this table:
1 Apple
2 Banana
and another user has this:
1 Coconut
2 Cheeseburger
the resulting table looks like this:
1 Apple
2 Banana
1 Coconut
2 Cheeseburger
Similarly, anything that referenced Banana by its primary key (2) is now referencing both Banana and Cheeseburger, which will not make the vegans very happy.
Is there any way to automatically update the primary/foreign key references when importing, other than writing an extremely long and complex import-script?
If you need to keep them fully compartmentalized, you have to assign some kind of partitioning column to each table. Is there a reason you need your SQL Server to have the same referential integrity as Access? Are you just importing to SQL Server for read-only reporting? In that case, I would not bother with RI. The queries will all require a partitionid/siteid/customerid. You could enforce that for single-entity access by wrapping the tables with a table-valued UDF which requires the partitionid. For cross-site queries that doesn't work.
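As a minimal sketch of that single-entity wrapping idea, an inline table-valued UDF that forces callers to supply the partition id might look like this (table and column names are assumptions):
CREATE FUNCTION dbo.fn_Orders (@PartitionId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE PartitionId = @PartitionId   -- every caller must say whose data it wants
);
GO
-- Usage: single-entity access for partition 2 only.
SELECT * FROM dbo.fn_Orders(2);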
If you are just loading to SQL Server for reporting, I would also consider altering the data model to support reporting (i.e. a dimensional model is sometimes better than a normalized model) instead of worrying about transaction processing.
I think we need to know more about the underlying goals.
Need more information on the requirements.
My basic question is 'Do you need to preserve the original record key?' e.g. 1:apple in table T of user-database A; 1:coconut in table T of user-database B. Table T is assumed to have the same structure in all database instances. Reasons I can suppose that you may want to preserve the original data: (a) you may have a requirement to reference the original data (maybe a visual for previous reporting), and/or (b) there may be a data dependency in the application itself.
If the answer is 'no,' then you are probably interested only in preserving all of the distinct data values. Allow the SQL table to build using a new key and constrain the SQL table field such that it contains unique data. This approach seems to preserve the original table structure (but not the original key value or its 'location') and may suffice to meet your requirement.
If the answer is 'yes,' I do not see a way around creating an index that preserves a pointer to the original database and the key that was created in its table T. This approach would seem to require an application modification.
The best approach in this case is probably to split the incoming data into two tables: one to identify the database and original key, another to identify the distinct data values. For example: (database) table D has records such as 'A:1:a,' 'A:2:b,' 'B:1:c,' 'B:2:d,' 'B:15:a,' 'C:8:a'; (data) table T1 has records such as 'a:apple,' 'b:banana,' 'c:coconut,' 'd:cheeseburger', where 'A' identifies the original database, 1 is the original key in database 'A,' and 'a' is a value that links records in table D to records in table T1. (Otherwise you have a lot of redundant data in the one table; e.g. A:1:apple, B:15:apple, C:8:apple.) Also, T1 has a structure similar to the original T and seems to be more directly useful in the application.
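A rough T-SQL sketch of that two-table split; the names and types are illustrative assumptions:
-- Distinct data values, keyed by a surrogate value key ('a' = apple, 'b' = banana, ...).
CREATE TABLE dbo.T1 (
    ValueKey CHAR(1) NOT NULL PRIMARY KEY,
    ValueText VARCHAR(50) NOT NULL UNIQUE
);
-- Original database and original key, pointing at the shared value.
CREATE TABLE dbo.D (
    SourceDb CHAR(1) NOT NULL,                              -- 'A', 'B', 'C'
    OriginalKey INT NOT NULL,                               -- key in the source table T
    ValueKey CHAR(1) NOT NULL REFERENCES dbo.T1 (ValueKey),
    PRIMARY KEY (SourceDb, OriginalKey)
);
-- e.g. rows ('A',1,'a'), ('B',15,'a'), ('C',8,'a') all point at the single 'apple' value.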
I ended up creating an SSIS project for this. SSIS is a visual programming tool made by Microsoft (part of Business Intelligence Development Studio, which comes with SQL Server) designed for solving exactly these sorts of problems.
Why not let Access use its replication manager to merge the databases? This will allow you to identify the conflicts and resolve them before importing to SQL Server. I'm fairly confident it will retain the foreign key relationships. If I understand your situation correctly, and the databases are the same structure with different data, you could load the combined database to the application and verify the data before moving to SQL Server.
What version of Access are you using? Here's a link for Access 2000; adjust the search terms to find the equivalent for your version.
http://technet.microsoft.com/en-us/library/cc751054.aspx

Adding a new dimension based on a key in fact table linked to one of the dimension tables

I have a fact table that holds all date & time attributes as keys which link to actual DATE & TIME dimensions.
When I create a cube on top of it using SSAS 2005, these datetime attributes are split into individual dimensions for the cube, which is OK.
The problem is that when I add a new datetime attribute to the fact table, my cube doesn't pick it up and will not create a new datetime dimension like the other ones, unless I recreate the cube from scratch.
Can anyone please suggest how I can add this new attribute as a separate dimension, without having to recreate the cube?
I'm struggling to understand your issue.
It sounds as if you are trying to add a new datetime column (a fact key referencing the appropriate Dimension attribute) to the Fact table. If so, this changes the structure of the cube and so requires that the cube be re-processed.
To use the terminology correctly: a Dimension contains Attributes; a Fact table contains Facts, not attributes.
The following reference may be of use.
http://msdn.microsoft.com/en-us/library/aa905984(SQL.80).aspx
Re: Comments
Any structural changes need to be applied/registered within the Data Source View (DSV) in Business Intelligence Development Studio (BIDS) prior to processing the cube. Clicking the refresh button on the DSV should prompt you with an option to apply any discovered changes to your tables. Also, should any of your additions/modifications be to the underlying tables of Dimensions, then you may also need to add the attributes in question to the appropriate Dimension .dim file prior to re-processing the cube.
Hope this makes sense.
The problem usually comes from the Unknown Member and Null Processing options, together with a snowflake schema if you have one in your cube. I figured out what the problem actually was.
If you have a case like the one mentioned, SSAS doesn't pick up the structural changes by itself when you refresh the Data Source View. In my case, since it was date & time dimensions, I had to add the new dimensions manually (cube dimensions) and set their Null Processing options correctly (in my case UnknownMember, not Automatic).
Since it can be a tad difficult to make these changes for every new column added to the underlying fact table, you can try updating the XMLA script with a carefully crafted Find & Replace.
