I'm creating a warehouse using SQL Server 2008 and Analysis Services. I've managed to create and populate the dimension tables, but I'm having a lot of trouble writing the SQL for loading the fact table. For one thing, I'm not sure how to load the keys of the fact table with the PKs from the dimension table. I tried writing a query that had a series of JOINs to get the keys and the measures I want, but the statement got so complicated that I got lost.
This is the star schema that I have to work from:
http://i.imgur.com/C3DGj.png
What am I doing wrong? I have a feeling that I'm missing something pretty basic, but I'm fairly new to this and most of the information I found online seemed to deal with using SSIS, which I don't have installed.
Any help would be appreciated.
Today's data warehouse developer uses SSIS to load dimensional models. Typically, lookups are used to convert each dimension attribute into its key. Most of the time, the data is going to be on another server, in a flat file, or somewhere else that forces you to use an ETL tool like SSIS, but in your case you can get it done without one. If your enterprise is serious about BI, you should push to get SSIS installed and learn it.
For your situation, assuming you have a table loaded with raw facts locally, you should be able to do an insert/select.
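Basically, you'll want to INNER JOIN each dimension table to the raw facts table on its natural attribute, so that each join hands back that dimension's key (an inner join is fine here since you've had no problems populating the dimension tables). Something like: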
INSERT trainingcentrefact
(timekey,locationkey,instructorkey,coursekey,paid,notpaid,... etc)
SELECT
t.timekey
,l.locationkey
,i.instructorkey
,c.coursekey
,rf.paid
,rf.notpaid
,... etc
FROM rawfacts rf
INNER JOIN timedimension t ON rf.time = t.time
INNER JOIN locationdimension l ON rf.location = l.location
INNER JOIN instructordimension i ON rf.instructor = i.instructor
INNER JOIN coursedimension c ON rf.course = c.course
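To see the lookup-by-join idea end to end, here is a minimal, runnable sketch in Python that uses the built-in sqlite3 module as a stand-in for SQL Server. Only two dimensions are shown; the table and column names follow the query above, while the sample values (months, course names, counts) are invented for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server; the sample rows are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE timedimension (timekey INTEGER PRIMARY KEY, time TEXT);
CREATE TABLE coursedimension (coursekey INTEGER PRIMARY KEY, course TEXT);
CREATE TABLE rawfacts (time TEXT, course TEXT, paid INTEGER, notpaid INTEGER);
CREATE TABLE trainingcentrefact (timekey INTEGER, coursekey INTEGER, paid INTEGER, notpaid INTEGER);

INSERT INTO timedimension (timekey, time) VALUES (1, '2011-01'), (2, '2011-02');
INSERT INTO coursedimension (coursekey, course) VALUES (10, 'SQL'), (20, 'SSAS');
-- The '2011-03' row has no matching time member in the dimension.
INSERT INTO rawfacts VALUES
    ('2011-01', 'SQL', 5, 1),
    ('2011-02', 'SSAS', 3, 2),
    ('2011-02', 'SQL', 4, 0),
    ('2011-03', 'SQL', 9, 9);
""")

# Each INNER JOIN acts as a lookup: it swaps the natural value (time, course)
# in the raw row for the surrogate key stored in the dimension table.
cur.execute("""
INSERT INTO trainingcentrefact (timekey, coursekey, paid, notpaid)
SELECT t.timekey, c.coursekey, rf.paid, rf.notpaid
FROM rawfacts rf
INNER JOIN timedimension t ON rf.time = t.time
INNER JOIN coursedimension c ON rf.course = c.course
""")
conn.commit()

rows = cur.execute(
    "SELECT * FROM trainingcentrefact ORDER BY timekey, coursekey"
).fetchall()
print(rows)  # [(1, 10, 5, 1), (2, 10, 4, 0), (2, 20, 3, 2)]
```

Note that the '2011-03' raw row has no matching time member, so the INNER JOIN silently drops it and the fact table ends up with 3 rows instead of 4. Comparing row counts, or running the same SELECT with LEFT JOINs and filtering on a NULL key, is a cheap way to catch missing dimension members before they vanish from the load.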
I'm following a tutorial on Azure Data Factory migration from Azure SQL to Blob through pipelines. While most of the concepts make sense, the 'Copy Data' query is a bit confusing. I have a background in writing Oracle SQL, but Azure SQL on ADF is pretty different and I'm struggling to find specific technical documentation, probably because it's not widely adopted yet.
Pipeline configuration shown below:
Query is posted below:
SELECT data_source_table.PersonID,data_source_table.Name,data_source_table.Age,
CT.SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION
FROM data_source_table
RIGHT OUTER JOIN CHANGETABLE(CHANGES data_source_table,
#{activity('LookupLastChangeTrackingVersionActivity').output.firstRow.SYS_CHANGE_VERSION})
AS CT ON data_source_table.PersonID = CT.PersonID
WHERE CT.SYS_CHANGE_VERSION <=
#{activity('LookupCurrentChangeTrackingVersionActivity').output.firstRow.CurrentChangeTrackingVersion}
Output to the sink Blob as a result of the 'Copy Data' query:
2,name2,14,4,U
7,name7,51,3,I
8,name8,31,5,I
9,name9,38,6,I
A couple of questions I had:
There's a lot of external referencing to other activities in the 'Copy Data' query, like #{activity('...').output.firstRow.CurrentChangeTrackingVersion}. Is there a way to know the appropriate syntax for referencing external activities? I can't find any good documentation on the syntax, like what .firstRow is or what the CHANGETABLE output looks like. I can't replicate this query in SSMS, which makes it a bit of a black box for me.
SYS_CHANGE_OPERATION appears in the SELECT with no table name prefix. Is this directly querying the table in SourceDataset? (It points to data_source_table, which has change tracking enabled.) My main confusion stems from how change tracking information is stored for the enabled tables. Is there a way to show all of a table's tracked changes in SSMS? I've seen some documentation on what the return values are, but it's hard for me to visualize them without seeing them in a table, so a query that outputs some of the return values would be nice.
The LookupLastChangeTracking activity queries all rows from a table (which, when I checked, is just one row), but the LookupCurrentChangeTracking activity uses a CHANGE_TRACKING function to pull the version of the data sink in table_store_ChangeTracking_version. Why does it use a function when the data sink's version is already recorded in table_store_ChangeTracking_version?
Sorry for the many questions, but I can't find any way to make this learning curve a bit less steep. Any guides or resources would be awesome!
There is an article that does the same thing from the UI, and it will help you understand it better:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal
1. These are Lookup activities, and they are very straightforward; .output.firstRow is how a later activity reads the first row returned by a Lookup. Please read about them here:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
2. SYS_CHANGE_OPERATION is not a column on data_source_table; it is one of the columns returned by CHANGETABLE(CHANGES ...). Because no other table in the FROM clause has a column with that name, SQL Server resolves the unprefixed reference to CT, so that should be fine. Regarding how the change tracking data is stored, I am not sure whether all the system tables are exposed on Azure SQL, but we did have a few tables in the on-premises version of SQL Server that could be queried if needed. For this exercise, though, I think that would be overkill.
I need some help.
For example, I have a SQL Server view, MyView:
Adb.vs.MyView
with columns:
ID
Name
Address
Email
Phone
The logic behind the view is as follows:
SELECT
Ac.AccountID AS ID,
Ac.AccountName AS Name,
Ad.Main_Adress AS Address,
Em.Main_Email AS Email,
Concat(Ph.Phone_Area_Code,Ph.Mobile_Phone_Number) AS Phone
FROM
Bdb.dbo.Account AS Ac
INNER JOIN
Cdb.dbo.Address AS Ad ON Ac.AccountID = Ad.AccountID
INNER JOIN
Cdb.dbo.Emails AS Em ON Ac.AccountID = Em.AccountID
INNER JOIN
Cdb.dbo.PhoneBook AS Ph ON Ac.AccountID = Ph.AccountID
NOTE:
No key relationships (foreign keys) are defined between any of these tables.
My goal is to reverse engineer this view to get the following kind of result:
Please suggest any tools or scripts that can do this.
Also, if somebody knows a similar solution for reverse engineering the stored procedures that were used to populate data into tables, I would really appreciate it, because I will need to reverse engineer tons of views and stored procedures like these in the near future.
Thanks in advance for any support!
Take a look at the system views. Most of what you want is probably available in INFORMATION_SCHEMA.VIEW_COLUMN_USAGE. For example:
USE Adb
GO
Select * from INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
where VIEW_NAME='MyView'
GO
Views are just stored SQL scripts that SQL Server expands into your query as a subquery. Consequently, the columns used are not actually saved within SQL Server in the same way that table definitions are. Your best bet is to script out all your views using SQL Server Management Studio and feed the files into a tool such as the General SQL Parser, which can output the columns and tables used in each script.
It isn't perfect, but it should get you a long way toward what you are trying to achieve. You can try it for free here.
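To give a feel for what such a parser does with a view like MyView, here is a deliberately naive sketch in Python using regular expressions. It is not the General SQL Parser: the helpers `split_top_level` and `parse_view` are hypothetical, and they only handle views shaped like the one in the question (`expr AS column` select items, `table AS alias` joins, simple equality ON clauses). It maps each view column to the source columns it reads and lists the join conditions, which stand in for the missing key relationships.

```python
import re

# Assumed input: the view body from the question, as scripted out of SSMS.
VIEW_SQL = """
SELECT
Ac.AccountID AS ID,
Ac.AccountName AS Name,
Ad.Main_Adress AS Address,
Em.Main_Email AS Email,
Concat(Ph.Phone_Area_Code,Ph.Mobile_Phone_Number) AS Phone
FROM
Bdb.dbo.Account AS Ac
INNER JOIN
Cdb.dbo.Address AS Ad ON Ac.AccountID = Ad.AccountID
INNER JOIN
Cdb.dbo.Emails AS Em ON Ac.AccountID = Em.AccountID
INNER JOIN
Cdb.dbo.PhoneBook AS Ph ON Ac.AccountID = Ph.AccountID
"""


def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_view(sql):
    """Return (lineage, joins) for a simple SELECT ... FROM ... JOIN view."""
    select_part, from_part = re.split(r"\bFROM\b", sql, maxsplit=1, flags=re.IGNORECASE)
    select_part = re.sub(r"^\s*SELECT\b", "", select_part, flags=re.IGNORECASE)

    # Table aliases, e.g. 'Bdb.dbo.Account AS Ac' -> {'Ac': 'Bdb.dbo.Account'}
    aliases = {alias: table for table, alias in
               re.findall(r"([\w.]+)\s+AS\s+(\w+)", from_part, flags=re.IGNORECASE)}

    def qualify(alias, column):
        return f"{aliases[alias]}.{column}"

    # View column -> fully qualified source columns its expression reads.
    lineage = {}
    for item in split_top_level(select_part):
        expr, name = re.match(r"(.*)\s+AS\s+(\w+)$", item,
                              flags=re.IGNORECASE | re.DOTALL).groups()
        lineage[name] = [qualify(a, c) for a, c in re.findall(r"(\w+)\.(\w+)", expr)]

    # Implicit relationships taken from the ON clauses (no foreign keys needed).
    joins = [(qualify(a1, c1), qualify(a2, c2)) for a1, c1, a2, c2 in
             re.findall(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", from_part,
                        flags=re.IGNORECASE)]
    return lineage, joins


lineage, joins = parse_view(VIEW_SQL)
for name, sources in lineage.items():
    print(f"{name:8} <- {', '.join(sources)}")
for left, right in joins:
    print(f"join: {left} = {right}")
```

Anything beyond this shape (subqueries, CASE expressions, unaliased tables, comments) calls for a real SQL parser, which is why a dedicated tool is the better bet for a large number of views.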
I have four SQL tables (with different numbers of rows and columns) from which I want to build a new table for reporting purposes based on some rules. I built query statements and ran them in Management Studio. There, I get a response with some data, but if I try to use the same SQL queries in a data source to build a report in Visual Studio, I get a memory exception. What can I do about this?
Here is the SQL statement I used:
SELECT Intable.Fra, EqTable.Name, Rf.Data
FROM EqTable,InTable,RfTable
WHERE RfTable.Name = EqTable.Name AND EqTable.Name] NOT LIKE '%Ann%';
The equivalent tables are shown in the following diagram.
I can see two possibilities:
You have an extra "]" character in your SQL, but this may be a typo
Do you need a join for the table [inTable]?
This is almost certainly because you are using the ANSI-89 style join. You should use the "newer" ANSI-92 style join.
Bad habits to kick: using old-style JOINs
What has happened here is that you have joined RfTable and EqTable by Name, but then you have created a cross join to InTable. The memory exception is probably because, once you create this cross join, the number of rows is staggering.
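The row explosion is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server; the InTable.Name column and all row counts are assumptions made for the demo, since the question doesn't show how InTable relates to the other tables.

```python
import sqlite3

# SQLite stands in for SQL Server. InTable.Name is a hypothetical join column.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE EqTable (Name TEXT);
CREATE TABLE RfTable (Name TEXT, Data TEXT);
CREATE TABLE InTable (Name TEXT, Fra TEXT);
""")
# 100 equipment names, one RfTable row per name, ten InTable rows per name.
cur.executemany("INSERT INTO EqTable VALUES (?)", [(f"eq{i}",) for i in range(100)])
cur.executemany("INSERT INTO RfTable VALUES (?, ?)", [(f"eq{i}", f"rf{i}") for i in range(100)])
cur.executemany("INSERT INTO InTable VALUES (?, ?)", [(f"eq{i % 100}", f"in{i}") for i in range(1000)])

# Old-style query from the question: InTable has no join condition,
# so every RfTable/EqTable match is paired with every InTable row.
old_style = cur.execute("""
SELECT COUNT(*)
FROM EqTable, InTable, RfTable
WHERE RfTable.Name = EqTable.Name AND EqTable.Name NOT LIKE '%Ann%'
""").fetchone()[0]

# ANSI-92 version: each joined table gets its own ON clause.
ansi_92 = cur.execute("""
SELECT COUNT(*)
FROM EqTable
INNER JOIN RfTable ON RfTable.Name = EqTable.Name
INNER JOIN InTable ON InTable.Name = EqTable.Name
WHERE EqTable.Name NOT LIKE '%Ann%'
""").fetchone()[0]

print(old_style, ansi_92)  # 100000 1000
```

With only 100 and 1,000 rows the accidental cross join already returns 100,000 rows; at real table sizes that growth is the likely cause of the memory exception when the report tries to load the result.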
What I really don't understand though is you said you have 4 tables but only 3 of them are in your query.
I set up a test-bed application vulnerable to MSSQL injection and wondered: how do I extract column data from another database? To extract column data from the current database we do:
convert(int,(select columnnamegoeshere from tablenamegoeshere))--
and then to enumerate the other column data we do:
convert(int,(select columnnamegoeshere from tablenamegoeshere where columnnamegoeshere not in ('firstentryfromcolumn')))--
But if it's not inside the default database and we want to extract column data from another database, how do we do that? Thanks.
I would do a join... but, to keep it simple for you, here's your code using a different database column:
convert(int,(select columnnamegoeshere from tablenamegoeshere where columnnamegoeshere not in (select top 1 firstentryfromcolumn from otherdb.dbo.otherdbtable)))
It would be better to join the two tables together and exclude the records, if that's possible... subqueries are often slower and not the best way to go about it.
I am new to SQL and have a question that I have not been able to find the answer to yet, probably because I am not sure exactly what to search for. I am using SQL Server 2012. I have a DB that was converted from an old Access DB (Jobs DB) and I need to join it with a SQL Spatial DB. The join would most likely be one-to-many, as there is one "LOT" in the spatial DB that could correspond to many "JOBS" in the Jobs DB. The join would have to be on the "LOT"; the only problem is that the field containing the "LOT" in the Jobs DB is text and might contain multiple lots in the same field, as in the example below.
"L.0005&L.0006" in the Jobs DB would correspond to "5" & "6" (separate rows) in the Spatial DB
What I need is to create another row, copy in all the columns, and modify the first row to be just "5" instead of "L.0005" and the new row to be "6" instead of "L.0006". The key would then have to change from just "JOB_NUMBER" to "JOB_NUMBER" & "LOT" for that table. If I could do this all in one query, that would be great, but if it needs to be two queries, then I think I can do the join to the Spatial DB myself; it's just the first part that has me stumped.
If anyone knows of a better way to accomplish this, I am open to suggestions for sure. If this has already been answered elsewhere, please direct me to that solution as, like I said above, I don't think I even know what to search for specifically and haven't found anything with what I am searching.
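A minimal sketch of the split, in Python with the built-in sqlite3 module standing in for SQL Server. It assumes every lot in the text field is "L." plus a zero-padded number and that lots are separated by "&"; the table names Jobs and JobLots and the DESCRIPTION column are hypothetical. A recursive CTE peels one lot off the front of the string per step, and the result goes into a new table keyed on (JOB_NUMBER, LOT).

```python
import sqlite3

# SQLite stands in for SQL Server. Jobs, JobLots and DESCRIPTION are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Jobs (JOB_NUMBER TEXT PRIMARY KEY, LOT TEXT, DESCRIPTION TEXT);
INSERT INTO Jobs VALUES ('J1', 'L.0005&L.0006', 'Fence'), ('J2', 'L.0012', 'Shed');

-- New table keyed on JOB_NUMBER plus LOT, one row per lot.
CREATE TABLE JobLots (
    JOB_NUMBER TEXT,
    LOT INTEGER,
    DESCRIPTION TEXT,
    PRIMARY KEY (JOB_NUMBER, LOT)
);

WITH RECURSIVE split(JOB_NUMBER, DESCRIPTION, piece, rest) AS (
    -- Seed row: nothing peeled off yet, and '&' is appended so every lot ends with one.
    SELECT JOB_NUMBER, DESCRIPTION, '', LOT || '&' FROM Jobs
    UNION ALL
    -- Peel the text before the next '&' off the front of rest.
    SELECT JOB_NUMBER, DESCRIPTION,
           substr(rest, 1, instr(rest, '&') - 1),
           substr(rest, instr(rest, '&') + 1)
    FROM split
    WHERE rest <> ''
)
INSERT INTO JobLots (JOB_NUMBER, LOT, DESCRIPTION)
SELECT JOB_NUMBER, CAST(replace(piece, 'L.', '') AS INTEGER), DESCRIPTION
FROM split
WHERE piece <> '';
""")

rows = cur.execute(
    "SELECT JOB_NUMBER, LOT, DESCRIPTION FROM JobLots ORDER BY JOB_NUMBER, LOT"
).fetchall()
print(rows)  # [('J1', 5, 'Fence'), ('J1', 6, 'Fence'), ('J2', 12, 'Shed')]
```

The same recursive-CTE approach can be written in T-SQL with CHARINDEX and SUBSTRING (STRING_SPLIT is not available until SQL Server 2016). Once the per-lot table exists, the join to the spatial table is an ordinary join on LOT, cast as needed to match the spatial table's column type.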