I am a final year IT student and I need a dataset to create a data warehouse. I tried searching on Kaggle and on open data but could not find the right one. Is there any source for a large free data set I can use for Business Intelligence purposes? Any help would be greatly appreciated!
Microsoft have a few - AdventureWorks is the one I've used in the past.
https://learn.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15
Related
I have just started learning SQL Server and Power BI. I'm working on a project, and for it I need to create a measure for a Power BI report that dynamically extracts information about all the tables (table name, last updated, dependencies, etc.) used in the Tabular solution.
I couldn't find any satisfactory answer anywhere. Is there anyone here who could help?
If you are using an SSAS tabular data model or multi-dimensional cube, you can use DMVs (dynamic management views) to extract metadata to report on or analyze.
This article includes a great walkthrough and a Power BI template you can use to report on metadata.
Hope it helps!!
DAX isn't going to do this for you. However, there are 3rd party tools like Power BI Helper that will allow you to extract model details.
I'm looking for some OLAP data, preferably in a star schema (or snowflake), for testing a new tool. I've already got the Foodmart database that Mondrian provides. The type of data is not important as long as it has dimensions and associated facts. The larger the better, for load testing. Does anybody know where I can download such a dataset, ideally in SQL or CSV? (Other formats are fine too.)
Apologies for being MS SQL focussed, but the Adventure Works DWH is not bad as far as snowflake schema design goes. It's not huge in terms of data volume, but with some clever SQL you would be able to generate extra rows in the database.
Alternatively, try Project Real, a larger DWH project that was put together by MS in 2005.
This article gives a pretty clear description of a star schema:
IBM (née Informix) Red Brick Warehouse
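To give an idea of what "clever SQL" could mean here, one common trick is to cross join the fact table against a small numbers table and insert the result back with shifted keys. The sketch below uses Python's built-in sqlite3 so it runs anywhere. The table and column names (fact_sales, sale_id, product_key, date_key, amount) are made up for illustration and are not the real Adventure Works schema, and on SQL Server you would write the equivalent INSERT ... SELECT in T-SQL.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a tiny, hypothetical fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales ("
        "sale_id INTEGER PRIMARY KEY, product_key INTEGER, "
        "date_key INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 10, 20240101, 9.99), (2, 11, 20240101, 4.50), (3, 10, 20240102, 9.99)],
    )
    conn.commit()
    return conn


def multiply_fact_rows(conn, copies):
    """Add `copies` duplicates of every existing fact row; return the new row count."""
    cur = conn.cursor()
    # Small helper table holding the numbers 1..copies.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS multiplier (n INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM multiplier")
    cur.executemany(
        "INSERT INTO multiplier (n) VALUES (?)", [(i,) for i in range(1, copies + 1)]
    )
    # Shift each copy's key by a multiple of the current maximum so keys stay unique
    # (assumes existing keys are positive).
    max_id = cur.execute("SELECT COALESCE(MAX(sale_id), 0) FROM fact_sales").fetchone()[0]
    cur.execute(
        """
        INSERT INTO fact_sales (sale_id, product_key, date_key, amount)
        SELECT f.sale_id + m.n * ?, f.product_key, f.date_key, f.amount
        FROM fact_sales AS f
        CROSS JOIN multiplier AS m
        """,
        (max_id,),
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Calling multiply_fact_rows repeatedly grows the table geometrically, so a few passes are usually enough for load testing. The copies are exact duplicates apart from the key, so you may want to vary dates or amounts as well if the tool under test is sensitive to data distribution.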
I am new to data warehousing and I am a little confused.
Please provide some simple steps to create a cube, fill it, and query it.
What I have so far:
I have a database with the original data, and I have designed the star schema and created the appropriate tables.
I have created an Analysis Services project in VS 2008 and then created the data source, data source view, dimensions, and the cube, all based on the star schema I created previously.
Now what should I do to:
fill this cube
query this cube
Microsoft has a pretty good tutorial here, and for MDX see here.
In SSAS, loading data into a cube is called processing; you can process a cube, a measure group, a dimension, and so on.
See details here.
The tutorials are fairly long, so I do not think that a generic answer would fit here. I would suggest that you work your way through the tutorials and then return with more focused, specific questions.
Hi, I need a sample SQL Server Employee database with data such as id, surname, name, age, address, etc. It must be quite big. I searched with Google, but I didn't find any good sample.
Can anybody help?
The only tool I can think of is Red Gate Data Generator.
Otherwise, you'd be looking at someone's actual data or expecting someone to provide such a tool free of charge.
You may find it a challenge to find such a database, privacy concerns prevent most publication of personal data such as you require. You may find one using pseudo-data, that is data that looks like what you want but is not about 'real' people. But you will probably find it easier to generate your own such pseudo-data. If you take this approach you can be sure that the data you generate meets your requirements too.
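As a rough illustration of the generate-your-own approach, here is a minimal sketch using only the Python standard library. The name and street lists, the employee table layout, and the age range are all placeholder assumptions; swap in whatever fits your requirements. It loads into an in-memory SQLite database for the sake of a runnable example, but the same generator could feed a CSV file or a bulk insert into SQL Server.

```python
import random
import sqlite3

# Placeholder value pools; extend these for more variety.
FIRST_NAMES = ["Anna", "Ben", "Carla", "David", "Eva", "Frank", "Grace", "Hugo"]
SURNAMES = ["Smith", "Jones", "Brown", "Miller", "Davis", "Wilson", "Moore"]
STREETS = ["Main St", "Oak Ave", "Station Rd", "Park Lane", "High St"]


def generate_employees(count, seed=0):
    """Yield `count` fake employee tuples: (id, surname, name, age, address)."""
    rng = random.Random(seed)  # seeded so the same data can be regenerated
    for emp_id in range(1, count + 1):
        yield (
            emp_id,
            rng.choice(SURNAMES),
            rng.choice(FIRST_NAMES),
            rng.randint(18, 65),
            f"{rng.randint(1, 999)} {rng.choice(STREETS)}",
        )


def load_employees(conn, count, seed=0):
    """Create the employee table, fill it with fake rows, and return the row count."""
    conn.execute(
        "CREATE TABLE employee ("
        "id INTEGER PRIMARY KEY, surname TEXT, name TEXT, age INTEGER, address TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?, ?)", generate_employees(count, seed)
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Because the generator is seeded, load_employees(sqlite3.connect(":memory:"), 100000) produces the same 100,000 rows every time, which helps when you need repeatable test runs.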
If you have access to Microsoft Visual Studio for Database Professionals, it has a built-in data generator which you can use.
Also, the AdventureWorks db has an Employee table, I think.
This might not be Employee based, but is definitely worth having a look
AdventureWorks
The Northwind database contains some sample employee data in a .mdf file which can be attached to SQL Server.
The old Northwind database has an Employee table, but it's quite small.
I have about 150,000 rows of data written to a database every day. These rows represent outgoing articles, for example. Now I need to show a graph using SSRS that shows the average number of articles per day over time. I also need to have information about the actual number of articles from yesterday.
The idea is to have an aggregated view of all our transactions and to have something that can indicate that something is wrong (for example, that we sent out 20% fewer articles than the average).
My idea is to have yesterday's data moved into SSAS every night and to store there both the aggregated number of transactions and the actual number of transactions from yesterday's data. Using SSAS would hopefully speed up the reports.
Do you think this is the right idea? Should I skip SSAS and run the reports straight on the raw data? I know how to use Reporting Services on raw data using standard SQL queries, but how would this change when querying SSAS? I don't know SSAS, so where do I start?
The neat thing with SSAS is that you can get those indicators that you talk about quite easily either by creating calculated measures or by using KPIs.
I started with Delivering Business Intelligence with Microsoft SQL Server 2005. It has a good introduction, but unfortunately it's too verbose when it comes to the details. Still, if you want to understand SSAS, OLAP and reporting using this framework, it's a good start.
Mosha Pasumansky has a blog on SSAS and MDX with great links.
Other than that, I would recommend Microsoft's Books Online.
Are you sure you aren't mixing up SSAS (Analysis Services) and SSIS (integration services)?
SSAS is not an ETL, it is an OLAP tool.
SSIS is an ETL tool.
I agree with everything that Rowan said. I'm just confused by the terms.
SSAS is an ETL tool. Basically you get data from somewhere (your outgoing articles), do something to it (aggregate), and put it somewhere else (your aggregates table, data warehouse, etc). Check the link for details.
You probably won't be keeping all of the rows in the DB indefinitely, and if you want to be able to report on longer trends you will in any case need to do some kind of aggregation of the historical data. So making the reports use this historical data store as their source makes sense. You can then use it to do all kinds of fancy reporting.
TL;DR: Define your aggregated history table with your future reporting needs in mind. Use the SSAS to populate the table and refresh it from the daily updates. Report from that table. Further reading: Star Schemas and data warehousing.
@Sergio and @Rowan
Yes, we're not talking about loading and transforming data into the database (like an SSIS tool would do). That's solved using our integration platform.
@Riri maybe SSAS is overkill for the situation you presented. If you only need to populate summarization tables daily, you can accomplish that by creating a regular SQL Server Agent job that runs a regular T-SQL script.
I've used this approach for several years in a daily process to calculate business indicators from about 9GB of new data per day. It works, it's fast, it's simple and it uses a technology you're already used to. If your daily process gets more complicated (it needs to read from files, use FTP, send emails) you can move to an SSIS package (or any other ETL tool you like), but I cannot recommend using SSAS unless you need to provide OLAP capabilities to your users.
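For what it's worth, the summarization-table idea can be sketched in a few lines. The example below uses Python's built-in sqlite3 purely so that it is runnable as-is; in practice the two SQL statements would live in a T-SQL script run by the nightly job. The names outgoing_articles, sent_date and daily_article_summary are assumptions, not anything from the original system, and the 20% threshold comes from the question.

```python
import sqlite3


def refresh_daily_summary(conn):
    """Rebuild per-day article counts from the raw rows into a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_article_summary ("
        "day TEXT PRIMARY KEY, article_count INTEGER NOT NULL)"
    )
    # A real nightly job would aggregate only yesterday's rows; this sketch
    # recomputes every day for simplicity.
    conn.execute(
        """
        INSERT OR REPLACE INTO daily_article_summary (day, article_count)
        SELECT sent_date, COUNT(*) FROM outgoing_articles GROUP BY sent_date
        """
    )
    conn.commit()


def check_day(conn, day, threshold=0.20):
    """Compare one day's count with the average of all earlier days.

    Returns None when there is no data for `day` or no earlier history.
    """
    row = conn.execute(
        "SELECT article_count FROM daily_article_summary WHERE day = ?", (day,)
    ).fetchone()
    average = conn.execute(
        "SELECT AVG(article_count) FROM daily_article_summary WHERE day < ?", (day,)
    ).fetchone()[0]
    if row is None or average is None:
        return None
    count = row[0]
    return {
        "day": day,
        "count": count,
        "average": average,
        # Flag the day when it falls more than `threshold` below the average.
        "alert": count < average * (1 - threshold),
    }
```

The SSRS report would then read from the small summary table instead of the raw rows, and the alert flag plays the role of the "something is wrong" indicator. Dates are stored as ISO strings here so that plain string comparison orders them correctly; on SQL Server a date column would be the natural choice.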