I've developed an intranet system for our company which uses a SQL Server 2008 backend. It stores an awful lot of information, and I'm frequently asked to build reports for various managers to help with the business. Quite often these reports are variations on a theme, whilst sometimes they're quite unique. At the moment I write SQL to perform each report and dump the required output via ASP.NET pages.

What I'd really like to do is get away from that, and I was thinking along the lines of having the managers query the database using Excel so that they can decide which fields to filter on and so on. To this end I wrote a couple of views and used Excel to connect to them. The problem is that without filtering you end up with a lot of data, so I was wondering about the best way to approach this.

I've not had anything to do with data warehousing or Analysis Services, but I wondered if that was a route to look at, or should I be looking at Reporting Services? I've got access to the full Microsoft stack, so I'm happy to use different solutions.
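To give an idea, the views I mentioned are just straightforward joins over the base tables, something like this (the table and column names here are made up):

-- Hypothetical example of the kind of view the managers connect to
CREATE VIEW dbo.vOrderSummary
AS
SELECT o.OrderDate, o.Region, o.OrderTotal, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
    ON c.CustomerId = o.CustomerId;

Excel then pulls the whole result down before anyone filters it, which is where the volume problem comes in.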
I'm more than happy to spend some time doing some reading/research, but I'm a bit unsure where to begin, so any pointers would be gratefully received.
Thanks in advance
I am a developer and performance tester but not a DBA. My team is working on a performance testing tool that is specific to our software. One of the features we want it to have is the ability to generate a database report immediately after the test. Our software is database agnostic. For Oracle, I can easily create a snapshot ID before and after the test, programmatically create an AWR report for those snapshots, write it to a file, and save it with the other artifacts we gather. Works great.
For SQL Server, however, there is no AWR equivalent (that I know of). I know the Management Data Warehouse (MDW) in SSMS has a UI for things like the top 10 slowest queries. But I have not yet found a way to programmatically create and extract a SQL performance report (preferably similar to Oracle's AWR) for SQL Server.
I am even willing to create the report myself if I can find a way to extract the raw data.
Any ideas would be greatly appreciated because searching online is not getting me anywhere.
P.S. I'm trying to do this in Java, by the way, but will accept help in any language. Thanks again!
Good news! In SQL Server 2016 you can use Query Store. It's like a flight recorder black box, capturing long-running queries and waits, with baselining built into SQL Server. You can compare how queries behave before and after hardware changes and/or upgrades. Maybe this is the closest thing to Oracle's AWR.
It's only available in SQL Server 2016 and up.
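If you want to extract the raw data yourself (from Java over JDBC, say), the Query Store catalog views can be queried directly. A minimal sketch, assuming Query Store has been enabled on the database under test (the database name is a placeholder):

-- One-time setup
ALTER DATABASE YourDb SET QUERY_STORE = ON;

-- Top 10 statements by average duration (avg_duration is in microseconds)
SELECT TOP (10)
    qt.query_sql_text,
    SUM(rs.count_executions) AS executions,
    AVG(rs.avg_duration) / 1000.0 AS avg_duration_ms
FROM sys.query_store_runtime_stats AS rs
INNER JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_query AS q ON q.query_id = p.query_id
INNER JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
GROUP BY qt.query_sql_text
ORDER BY avg_duration_ms DESC;

Running something like this before and after your test run would approximate AWR's snapshot-to-snapshot comparison.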
I've been doing some research on this topic for a while now and can't seem to find a similar instance to my issue. I will try and explain everything as best I can, as simply as I can.
The problem is in the title: I am trying to migrate data from an Access database to SQL Server. Typically this isn't really a hard problem, as there exist several import/export tools within SQL Server, but I am looking for the best solution, or at least some advice/tips, as I am somewhat new to database migration. I will now begin to explain my situation.
So I am currently working on migrating data that exists in an Access “database” (database in quotes because I don’t think it is actually a database, you’ll know why in a minute) in an un-normalized form. What I mean by un-normalized is that all of the data is in one table. This table has about 150+ columns and the rows number in the thousands. Yikes, I know; this is what I’ve walked into lol. Anyways, sitting down and sorting through everything, I’ve designed relationships for the data that normalize it nicely in its new home, SQL Server.

Enter my predicament (or at least part of it). I have the normalized database set up to hold the data, but I’m not sure how to import it, massage/cut it up, and place it in the respective tables I’ve set up.
Thus far I’ve done a bunch of research into what can be done, and for starters I found out about the SQL Server Migration Assistant (SSMA). I’ve begun messing with it and was able to import the data from Access into SQL Server, but not in the way I wanted: all I got was a straight copy and paste of the data into my SQL Server database, exactly as it was in the Access database. I then learned about the typical practice of setting up a global table/staging area for this type of migration, but I am somewhat of a novice when it comes to using T-SQL.

The heart of my question comes down to this: Is there some feature in SQL Server (either its import/export tool or the SSMA) that will allow me to send the data to the right tables that already exist in my normalized SQL Server database? Or do I import to the staging area and write the script(s) to dissect and extract the data to the respective normalized tables? If it is the latter, can someone please show me some tips/examples of what the T-SQL would look like to do this sort of thing? Obviously I couldn’t expect exact scripts from anyone without sharing the data (which I don’t have the liberty to do, as it is customer data), so some cookie-cutter examples will work.
Additionally, future data is going to come into the new database from various sources (Excel, for example), so that is something to keep in mind. I would hate to create a new issue where, every time someone wants to add data to the database, a new import, sort, and store script has to be written.
Hopefully this hasn’t been too convoluted and someone will be willing (and able) to help me out. I would greatly appreciate any advice/tips. I believe this would help other people besides me, because I found a lot of other people searching for similar things. Additionally, it may lead to T-SQL experts showing examples of such data migration scripts, and/or explaining how to use the existing tools in ways others haven’t, or covering functions/capabilities not adequately explained in the documentation.
Thank you,
L
First this:
Additionally, future data is going to come into the new database from various sources (Excel, for example)...
That's what SSIS is for. Setting up SSIS is not a trivial task, but it's not rocket science either. SQL Server Management Studio has an Import/Export Wizard, which is an easy-to-use SSIS package creator; that will get you started. There are many alternatives, such as PowerShell, but SSIS is the quickest and easiest solution IMO, especially when dealing with data from multiple sources.
SSIS works nicely with Microsoft products as data sources (such as Excel and SharePoint).
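As an aside, for a quick one-off pull you don't even need a package; T-SQL can read Excel directly via OPENROWSET, assuming the ACE OLE DB provider is installed and ad hoc distributed queries are enabled (the file path and sheet name below are placeholders):

-- Allow ad hoc distributed queries (requires sysadmin)
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE;

-- Read a worksheet straight into a result set
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0;Database=C:\Data\Workbook.xlsx;HDR=YES',
    'SELECT * FROM [Sheet1$]');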
For some things, too, you can create an MS Access front end that interfaces with SQL Server via stored procedures. It just depends on the target audience. This is easy to set up, and a quick Google search will return many simple examples. It's actually how I learned SQL Server 20+ years ago.
Is there some feature in SQL Server that will allow me to send the data to the right tables that already exist in my normalized SQL Server database?
Yes, but don't use it. For what you're describing it will be frustrating.
Or do I import to the staging area and write the script(s) to dissect and extract the data to the respective normalized tables?
This.
If it is the latter, can someone please show me some tips/examples of what the T-SQL would look like to do this sort of thing?
When dealing with denormalized data a good splitter is important. Here are my two favorites:
DelimitedSplit8K
PatternSplitCM
In SQL Server 2016 you also have the built-in STRING_SPLIT, which is faster (but has issues, such as not guaranteeing the order of the output).
Another must-have is a good NGrams function. The link I posted has the function attached at the bottom of the article. I have some string cleaning functions here.
The links I posted have some good examples.
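As a quick illustration of the built-in option (and of one of its issues: the order of the output rows is not guaranteed):

SELECT value
FROM STRING_SPLIT('red,green,blue', ',');
-- returns three rows: red, green, blue (in no guaranteed order)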
I agree with all the approaches mentioned: load the data into one staging table (possibly using SSIS), then shred it with T-SQL (probably wrapped up in stored procedures).
This is a custom piece of work that needs hand-built scripts. There's no automated tool for this, because both your source and target schemas are custom schemas, so you'd need to define all that mapping and all those rules somehow... and no, SSIS does not magically do this!
It sounds like you have a target schema and mappings between the source and target schema already worked out.
As an example, your first step is to load the 'lookup' tables with this kind of query:
INSERT INTO TargetLookupTable1 (Field1, Field2, Field3)
SELECT DISTINCT Field1, Field2, Field3
FROM SourceStagingTable
TargetLookupTable1 should already have an identity primary key defined (it's not mentioned in the above query because it's auto-generated).
This is where you will find your first problem: you'll almost certainly find that your DISTINCT query just gives you a whole lot of duplicated, misspelt, rubbish data. So before you even load your lookup table you need to do data cleansing.
I suggest you clean the data in your source system directly, but it depends how comfortable you are with that.
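A hypothetical example of the kind of cleansing pass you might run against a staging copy (reusing the placeholder names from above):

-- Trim stray whitespace before deduplicating
UPDATE SourceStagingTable
SET Field1 = LTRIM(RTRIM(Field1));

-- Eyeball what would actually land in the lookup table
SELECT DISTINCT Field1
FROM SourceStagingTable
ORDER BY Field1;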
Next step: assuming your data is all clean and you've loaded a dozen lookup tables in this way...
Now you need to load the transactions, but you don't know the lookup keys that you just generated!
The trick is to include an empty column for this in your staging table up front, to record the key.
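For example (assuming the staging table from above and an integer surrogate key):

ALTER TABLE SourceStagingTable ADD MyNewLookupKey int NULL;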
Once you've loaded your lookup table, you can write the key back into the staging table. This query matches on the fields you used to load the lookup and writes the generated key back:
-- Write the generated lookup key back into the staging table
UPDATE stg
SET stg.MyNewLookupKey = lkp.MyKey
FROM SourceStagingTable AS stg
INNER JOIN TargetLookupTable1 AS lkp
    ON  stg.Field1 = lkp.Field1
    AND stg.Field2 = lkp.Field2
    AND stg.Field3 = lkp.Field3
Now you have a column called MyNewLookupKey in your staging table which holds the correct lookup key to load into your transaction table.
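From there, loading the transaction table is a plain INSERT...SELECT from staging (the transaction table and its columns here are hypothetical):

INSERT INTO TargetTransactionTable (LookupKey, TransactionDate, Amount)
SELECT stg.MyNewLookupKey, stg.TransactionDate, stg.Amount
FROM SourceStagingTable AS stg;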
Ongoing uploads of data are a separate issue, but you might want to investigate an MS Access Data Project (although they are apparently being phased out, they are very handy as a front end into SQL Server).
The thing to remember is: if there is anything ambiguous about your data (for example, "these rows say my car is black but these rows say my car is white"), then you (a human) need to come up with a rule for "disambiguating" it. It can't be done automatically.
So there are quite a number of ways to skin this cat. I don't know much about the "Migration Assistant", but I somehow doubt it's going to make your life easier given what you're trying to do.
I'd just dump the whole denormalized mess into a single big staging table then shred it where you need it using SQL. I know you asked for help with the TSQL, but without having some idea of what the denormalized data is and how you want to re-shape it, all I can do really is suggest you read up on SQL in general (select, from, where, group by, etc).
You could also do the work in SSIS, but ultimately the solution you use is largely going to depend on the nature of how you need to normalize the big denormalized data set. IMHO doing this in SQL is usually the easiest way, but then again when you're a hammer, everything looks like a nail.
As far as future-proofing the process, how you import the Access data probably will have little bearing on how you'd import Excel data. If you have a whole lot of different data sources which you'll need to incorporate on a recurring basis, SSIS might be a good choice to invest some time and effort into for the long run. No matter what, incorporating data from a new data source takes extra work. I would weigh how frequently you think you'll have to integrate a given data source against how much effort is involved in massaging it into the format you want.
I have a completely different opinion, because I do both database development and Microsoft's Power BI. On the PBI side we come across a lot of non-normalized data, because a lot of it is coming in from Excel.
My guess is that what is now in Access was an import of something that originally began in Excel.
Excel's Power Query and PBI offer transforms to pivot and unpivot the layout. I would use those tools to do that task, then import the results into SQL.
We have an Access 2016 database with lots of tables, forms and reports from a client. The client would like other people to access the data in this database but doesn't want to spend the money to convert the forms and reports to a website. They would rather keep Access 2016 as a frontend with its forms and reports and store the data in a centralized location. The issue is that the users who will access this data won't be on the same LAN or network.
The solution I came up with was to use SQL Azure as the database backend and keep the forms and reports in the Access 2016 frontend. Can anyone think of an alternative? Does Microsoft have some kind of online hosting with Office 365? I have nothing at all against SQL Server and use it frequently, but I just don't want to go through the effort of upsizing the database if a simpler solution exists.
You can certainly place the back end on SQL Azure. However, given that a typical internet connection is about 100 times slower than a LAN, MUCH effort is required to optimize the application, so you need significant experience in optimizing an Access application to work with SQL Server. This setup is thus doable, but it will take a significant amount of work to achieve decent performance.
Another possible solution is to use a SharePoint or Office 365 back end (which supports SharePoint tables). This setup only works well if table sizes are in general below 5,000 records. You also have to ensure all table relations use a standard autonumber PK, and that child tables use a standard long number column to relate back to the parent table.
Likely the best solution is to set up a server and run Remote Desktop. This gives the best performance, and the end users don't need to install Access or your front-end part.
I explain in detail the “slowness” with using SQL server over the internet in this article of mine along with some suggestions and possible solutions.
http://www.kallal.ca//Wan/Wans.html
Ok, let me explain the environment we are facing here:
We have an ASP.NET MVC 4 app that uses a SQL Server database.
This app isolates data in "projects", so any user who connects can only work on the data of one of these projects.
Sometimes... a group of users has to travel to remote regions for some days to retrieve data for a single project, and quite often they won't have an internet connection (even mobile or satellite solutions are often out of reach).
While the displaced team works on a project, people at the office can still work on the rest of the projects (but not on the one that is abroad).
So... we are pondering the possibility of using a laptop to act as a "mobile server", where users can download the data from a specific project before travelling. While abroad, they can work against this "mobile server", update any data on their project and, when they come back, they could upload their updated data to the main server.
Our idea is to create stored procedures on both servers (main and mobile) that execute queries to move a project's data between them, passing the project identifier as a parameter, probably using linked servers so the main and mobile servers can see each other during update operations.
Our questions here are:
Is this a good approach?
Is there any better approach that we're not seeing?
Are there any risks we should pay attention to with this or other approaches?
I've never used bidirectional transactional replication, so if that works for you, problem solved. I do have quite a bit of experience with data migration, including merging large data sets into software-driven systems, and from that experience, replication has hurt us more than it has helped us (from a migration/merge point of view).
The biggest challenge, in my opinion, is going to be conflict resolution. I know you say that all of the data is in project-specific databases, but is there no shared data at all? What about multiple remote users updating the same data? In those cases you're going to need a little more than just replication.
Instead of maintaining two databases at all times (one for mobile, one as the regular in-house DB), why not a system where a job is run on your main system to prepare a project for "offline mode"? (The job could be stored procedures, SSIS packages, or straight T-SQL.) Whatever the technology used, this job would copy all of the requested project data to a new database on the remote server/laptop and mark that data in the main database as read-only, to prevent users in the office from updating it.
Once the data is in offline mode on the remote server, the users can update and use the data as much as they want from that remote server. Then when the users get an internet connection or they are back in the office they can kick off another job that syncs the data to the main server, removes offline mode, and deletes/archives the remote database. Almost like a temporary project database.
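A minimal sketch of that "offline mode" bookkeeping, assuming a Projects table with an IsOffline flag and a linked server named MOBILE01 (all names here are hypothetical):

DECLARE @ProjectId int = 42;  -- the project going offline

-- On the main server: flag the project as read-only for office users
UPDATE dbo.Projects
SET IsOffline = 1
WHERE ProjectId = @ProjectId;

-- Copy the project's rows out to the laptop via the linked server
INSERT INTO MOBILE01.ProjectDb.dbo.ProjectData (ProjectId, Field1, Field2)
SELECT ProjectId, Field1, Field2
FROM dbo.ProjectData
WHERE ProjectId = @ProjectId;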
Seriously, it sounds like a fun project.
Technologies to look at:
SSIS (SQL Server Integration Services) - In my experience, this is extremely fast at moving data and gives you the ability to add logic for conflict resolution, error handling, etc. It's free (with certain SQL Server editions) and the community is huge, so supporting it should be easy. SSIS is not as dynamic as some of the specialized solutions out there.
A data migration suite like Pervasive's Data Integrator - I loved this, but it's expensive. You could write an entire solution in this product to handle processing your data bidirectionally, and like SSIS it allows for complex programming logic.
T-SQL - With a linked server you could just write straight queries (using stored procedures if you wanted). The problem here is security on the linked server. We don't use them because of this issue. Linked Servers: Good or Bad?
Start using some of Microsoft's built-in change detection technologies right off the bat; they're harder to retrofit once the system is already in use. Change Data Capture (CDC) will give you a full history of the records updated, while Change Tracking will give you a lightweight summary of your changes. Using either technology will make syncing the data many times easier.
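A minimal sketch of Change Tracking (database, table, and key names are hypothetical):

-- Enable at the database level, then per table (the table needs a primary key)
ALTER DATABASE ProjectDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.ProjectData ENABLE CHANGE_TRACKING;

-- Read everything changed since the last sync (persist this version between syncs)
DECLARE @last_sync_version bigint = 0;
SELECT ct.ProjectDataId, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES dbo.ProjectData, @last_sync_version) AS ct;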
Change Tracking: http://msdn.microsoft.com/en-us/library/bb933874.aspx
Change Data Capture: http://msdn.microsoft.com/en-us/library/cc645937.aspx
SSIS: http://msdn.microsoft.com/en-us/library/ms169917.aspx
SQL Server Agent Jobs: http://msdn.microsoft.com/en-us/library/ms189237.aspx
Some time ago my company was evaluating different reporting solutions. We settled on MS SSRS, particularly because it's capable of connecting to various types of data stores, including MS SQL Server, Oracle, and SAP NetWeaver BI. It has stood the test of time pretty well; however, we're now under fire from management because SSRS is not capable of mixing data sources in the same data set.
So, I searched long and hard for a reporting solution that can "inner join" data from separate systems, but I came up short. I am about to propose that we custom write reports (ASP.NET) for these cross-system report requests, but I wanted to ping the internet first for any advice.
How do you "inner join" across your massive enterprise systems for reporting purposes?
Take a look at BIRT from Actuate. BIRT comes in open-source and commercial flavors and I believe it allows joins across data sources.
Perhaps a linked server in SQL Server? Watch out for performance issues, and I'm not sure if SSRS has limitations with them (I don't think it does). You can reference a table with a four-part name like MYSERVER01.DATABASE1.dbo.TABLE. More info from the source.
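A minimal sketch of setting one up and querying across it (server, database, and table names are placeholders):

-- Register the remote SQL Server instance as a linked server
EXEC sp_addlinkedserver @server = N'MYSERVER01', @srvproduct = N'SQL Server';

-- Join local data against the remote table via its four-part name
SELECT l.OrderId, r.CustomerName
FROM dbo.LocalOrders AS l
INNER JOIN MYSERVER01.DATABASE1.dbo.Customers AS r
    ON r.CustomerId = l.CustomerId;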
For best performance you would pull all your disparate data into a data warehouse, but that is a major undertaking that management may not be willing to fund.
One way to join across data sources in SSRS is to use subreports - see http://msdn.microsoft.com/en-us/library/ms159837.aspx .
Performance is unlikely to be good using this method, however; Sam's suggestion of a linked server is likely to be more practical.
(According to BIRT's documentation it does enable joins between datasets, as DMKing suggested - I haven't tried using this feature yet.)
This limitation was removed in SQL Server 2008 R2. There is a workaround in previous versions;
see my full response here: How can i add a field to dataset from another dataset in ssrs?
I was going to suggest using SSIS as a possibility until I read this
http://blogs.msdn.com/b/jenss/archive/2009/04/23/consuming-ssis-package-data-in-reporting-services-and-using-web-services-in-addition-part-1.aspx
I have used linked servers to combine data from Oracle/SQL Server; not nice, but it worked.
Failing that I'd go with subreports.
Failing that, point out to management how expensive SAP/Oracle etc are and they'll soon stop moaning.
:)