Storing Data in MS Access and Querying it in Excel

I'm currently on a non-IT project that has data requiring some systematic analysis (mathematical formulas). The data is stored in Excel, but it's a pain to manually enter/massage the data in Excel to do the analysis.
Would it be better to store the data in MS Access and use Excel to query Access? In other words, store the data in Access and do the analysis in Excel. Being able to run SQL queries on the data would also simplify the analysis.
If so, do any of you have websites/books that describe how I would go about querying Access from Excel?

You're talking about using Access for entry and editing of the raw data, then doing your advanced computations in Excel with Access feeding it the data.
I think that's an Excel-ent strategy because you're taking advantage of the strengths of both applications.
As to how to query Access from Excel, this page provides detailed clear instructions:
https://web.archive.org/web/1/http://articles.techrepublic.com.com/5100-10878_11-6112813.html
I found that one from Google by searching for "excel query access". That was the first link; check the others if you want more information.

I think there are a lot of advantages to storing the data in Access. If getting at the data with SQL will be helpful, then it's a no-brainer. Other than that, you can store the data in Excel just as if it was in a database, but it will be up to you to enforce normalization and data integrity. If you put it in Access, Jet will force you to do it (assuming you set it up properly).
For Excel 2003 and prior, I have a page with lots of pictures. http://dicks-clicks.com/excel/ExternalData.htm
Also, if you're comfortable entering your own SQL, do yourself a favor and download QueryManager from here http://www.jkp-ads.com/download.asp It will allow you to edit your queries much faster than using MSQuery.
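To give a feel for what you'd be typing into MSQuery or QueryManager: once the data lives in Access, the queries are plain SQL. Here's a minimal sketch of an analysis query you could pull into Excel, assuming a hypothetical Measurements table with Project, Reading, and RecordedOn columns (Access uses # delimiters for date literals):
SELECT Project, AVG(Reading) AS AvgReading, COUNT(*) AS SampleCount
FROM Measurements
WHERE RecordedOn >= #1/1/2009#
GROUP BY Project
ORDER BY AVG(Reading) DESC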

Yes, it would be better. There are many different paths you can choose from. Here's an option you may not have considered:
Open a new Access file
In that Access file, make a link to the Excel file where your data is stored. (Go to Tables, right-click and select "Link Tables".)
Query as you need and copy/paste your results into the same Excel file (but on a different tab) or into a second Excel file.
This way, you can leave your raw data in Excel if you're more comfortable with that, but still run SQL queries.
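As a rough illustration with invented table names: once a couple of sheets are linked, you can join them like ordinary tables, which is where the SQL really pays off:
SELECT c.CustomerName, SUM(o.Amount) AS TotalAmount
FROM LinkedOrders AS o INNER JOIN LinkedCustomers AS c
ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerName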
Regarding your second question, you don't need a book for that. In Excel, you can go to Data->"Import External Data"->"Import Data" and automatically pull data from your Access queries straight into Excel.
I hope this helps.
Edit: I also recommend that you google advanced Excel functions like SUMPRODUCT, SUMIF, and COUNTIF. Depending on your analysis, these three functions may allow you to skip Access entirely.

Related

Power BI and SQL Server Indexes

I've done some research without finding valuable information about my question.
I'm working on a data warehouse project, and one of my customer's requirements is to use Power BI Pro for data visualisation.
What is not clear to me is whether Power BI, while loading data into its data model, would benefit from the indexing structures developed in SQL Server.
Thank you in advance for any recommendations/tips on this subject.
It somewhat depends on whether you are using a live connection.
Existing indexes may speed up data loading when using PowerBI in import mode where the data source is a view, query, or stored procedure.
They will also be used in Live mode when connecting to the above sources, and might be used when connecting directly to multiple tables.
As the comments state, if you are bringing entire tables into PowerBI with import mode, then the existing indexes will not benefit you, and the internal SSAS instance that PBI uses is a whole different kettle of fish.
One caveat is that columnstore indexes can be used to get around some of the data size limitations when dealing with the gateway as described here: https://community.powerbi.com/t5/Power-Query/Using-SQL-Server-with-Nonclustered-Columnstore-Index/td-p/563787, but that's not directly related to your question.
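For reference, a columnstore index like the one discussed in that thread is a single statement on the SQL Server side; a sketch against a hypothetical fact table:
-- Columnstore storage compresses by column, which can sharply reduce
-- the data volume pushed through the gateway (SQL Server 2012+ syntax).
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_Columnstore
ON dbo.FactSales (SaleDate, StoreID, ProductID, Quantity, Amount);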
Indexes help with retrieval speed on the server end. How much they will help depends on the specifics of your situation. If you are doing a lot of data transformation and mashup in the Power BI query editor, indexes will only help where there is a step that selects rows from the SQL Server. They won't help with steps where the processing is done on the Power BI end (such as merging with data from an Excel file, adding custom columns, or some forms of substituting values). However, since you mention a data warehouse rather than a simple database, I'm going to assume you're barely doing any transformation on the Power BI end, relying instead on the server end to do the heavy lifting. In that case, indexes will definitely help speed things up if they're placed strategically.
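One common way to keep that heavy lifting on the server is to point Power BI at a view rather than raw tables; a hedged sketch with made-up names:
-- The filter and aggregation run on SQL Server, where indexes on
-- SaleDate/StoreID can be used; Power BI imports only the summary.
CREATE VIEW dbo.vDailyStoreSales AS
SELECT StoreID, CAST(SaleDate AS date) AS SaleDay, SUM(Amount) AS DailyAmount
FROM dbo.FactSales
GROUP BY StoreID, CAST(SaleDate AS date);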
There are some differences between Import mode and Connect live mode.
Import mode:
Data import can be used against any data source type, and it can combine data from different sources. The current Power BI service limit on published file size is 1 GB.
When using import, the data is stored in the Power BI file/service. There is therefore no need to set up permissions on the data source side (a service account for the load is enough), and you can share data publicly or with people outside your organization. On the other hand, all the data is stored in Power BI. Full DAX expressions and full Power Query transformations are supported.
Connect live mode:
There are more limitations in place for live connections. A live connection doesn't work against all data sources (the current list can be seen here), and it cannot combine data from multiple sources.
You are also limited to just the one data source/database you selected; you can't combine data from multiple data sources anymore. If you are connected to a SQL database, you can still create logical relationships between objects from that database, as well as measures and calculated columns. When you are connected to SQL Server Analysis Services, you are limited to just the report layout: you can't create calculated columns, only measures, currently. When using a live connection, users have to have access to the underlying data source, which means you can't share outside your organization or publicly. Full DAX expressions are not supported, only report-level measures (to learn more about report-level measures, watch this great video from Patrick), and there are no Power Query transformations.
You can learn more: directquery-live-connection-or-import-data-tough-decision

Most Efficient Way to Migrate Un-Normalized Data in an Access Database to a Normalized Form in a SQL Server Database

I've been doing some research on this topic for a while now and can't seem to find a similar instance to my issue. I will try and explain everything as best I can, as simply as I can.
The problem is in the title; I am trying to migrate data from an Access database to SQL Server. Typically this isn't really a hard problem, as there exist several import/export tools within SQL Server, but I am looking for the best solution. That, or some advice/tips, as I am somewhat new to database migration. I will now begin to explain my situation.
So I am currently working on migrating data that exists in an Access “database” (database in quotes because I don’t think it is actually a database, you’ll know why in a minute) in an un-normalized form. What I mean by un-normalized is that all of the data is in one table. This table has about 150+ columns and the rows number in the thousands. Yikes, I know; this is what I’ve walked into lol. Anyways, sitting down and sorting through everything, I’ve designed relationships for the data that normalize it nicely in its new home, SQL Server. Enter my predicament (or at least part of it). I have the normalized database set up to hold the data but I’m not sure how to import it, massage/cut it up, and place it in the respective tables I’ve set up.
Thus far I've done a bunch of research into what can be done, and for starters I have found out about the SQL Server Migration Assistant. I've begun messing with it and was able to import the data from Access into SQL Server, but not in the way I wanted. All I got was a straight copy & paste of the data into my SQL Server database, exactly as it was in the Access database. I then learned about the typical practice of setting up a global table/staging area for this type of migration, but I am somewhat of a novice when it comes to using T-SQL. The heart of my question comes down to this: is there some feature in SQL Server (either its import/export tool or the SSMA) that will allow me to send the data to the right tables that already exist in my normalized SQL Server database? Or do I import to the staging area and write the script(s) to dissect and extract the data to the respective normalized tables? If it is the latter, can someone please show me some tips/examples of what the T-SQL would look like to do this sort of thing? Obviously I couldn't expect exact scripts from anyone without sharing the data (which I don't have the liberty of doing, as it is customer data), so some cookie-cutter examples will work.
Additionally, future data is going to come into the new database from various sources (like maybe excel for example) so that is something to keep in mind. I would hate to create a new issue where every time someone wants to add data to the database, a new import, sort, and store script has to be written.
Hopefully this hasn't been too convoluted and someone will be willing (and able) to help me out. I would greatly appreciate any advice/tips. I believe this would help other people besides me, because I found a lot of other people searching for similar things. Additionally, it may lead to T-SQL experts showing examples of such data migration scripts and/or explaining how to use the existing tools in ways others haven't, or covering functions/capabilities not adequately explained in the documentation.
Thank you,
L
First this:
Additionally, future data is going to come into the new database from
various sources (like maybe excel for example)...?
That's what SSIS is for. Setting up SSIS is not a trivial task, but it's not rocket science either. SQL Server Management Studio has an Import/Export Wizard, which is an easy-to-use SSIS package creator. That will get you started. There are many alternatives, such as PowerShell, but SSIS is the quickest and easiest solution IMO, especially when dealing with data from multiple sources.
SSIS works nicely with Microsoft Products as data sources (such as Excel and Sharepoint).
For some things, too, you can create an MS Access front-end that interfaces with SQL Server via stored procedures. It just depends on the target audience. This is easy to set up; a quick Google search will return many simple examples. It's actually how I learned SQL Server 20+ years ago.
Is there some feature in SQL Server that will allow me to send the
data to the right tables that already exist in my normalized SQL
Server database?
Yes and don't. For what you're describing it will be frustrating.
Or do I import to the staging area and write the script(s) to dissect
and extract the data to the respective normalized table?
This.
If it is the latter, can someone please show me some tips/examples of
what the TSQL would look like to do this sort of thing.
When dealing with denormalized data, a good splitter is important. Here are my two favorites:
DelimitedSplit8K
PatternSplitCM
In SQL Server 2016 you also have the built-in STRING_SPLIT, which is faster (but has issues).
Another must-have is a good NGrams function. The link I posted has the function attached at the bottom of the article. I have some string cleaning functions here.
The links I posted have some good examples.
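To give a feel for what a splitter does, here's the built-in STRING_SPLIT (SQL Server 2016+) on a made-up delimited value:
-- Returns one row per element, in a column named "value":
-- red / green / blue
SELECT value
FROM STRING_SPLIT('red,green,blue', ',');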
I agree with all the approaches mentioned: Load the data into one staging table (possibly using SSIS) then shred it with T-SQL (probably wrapped up in stored procedures).
This is a custom piece of work that needs hand-built scripts. There's no automated tool for this, because both your source and target schemas are custom schemas, so you'd need to define all that mapping and rules somehow... and no, SSIS does not magically do this!
It sounds like you have a target schema and mappings between source and target schema already worked out.
As an example your first step is to load 'lookup' tables with this kind of query:
INSERT INTO TargetLookupTable1 (Field1,Field2,Field3)
SELECT DISTINCT Field1,Field2,Field3
FROM SourceStagingTable
TargetLookupTable1 should already have an identity primary key defined (which is not mentioned in the above query because it is auto-generated).
This is where you will find your first problem. You'll almost certainly find your DISTINCT query just gives you a whole lot of duplicated, misspelt, rubbish data. So before you even load your lookup table you need to do data cleansing.
I suggest you clean the data in your source system directly but it depends how comfortable you are with that.
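If you do clean in the staging table instead, a typical first pass is just trimming and normalising case; a sketch reusing the same made-up field names:
-- Trim whitespace and unify casing so 'Acme ' and 'acme'
-- don't become two distinct lookup rows.
UPDATE SourceStagingTable
SET Field1 = UPPER(LTRIM(RTRIM(Field1)));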
Next step: assuming your data is all clean and you've loaded a dozen lookup tables this way...
Now you need to load transactions, but you don't know the lookup keys that you just generated!
The trick is to pre-include an empty column in your staging table to record this.
Once you've loaded up your lookup table you can write the key back into the staging table. This query matches back on the fields you used to load the lookup and writes the key back into the staging table:
UPDATE TGT
SET MyNewLookupKey = NewLookupTable.MyKey
FROM SourceStagingTable TGT
INNER JOIN NewLookupTable
ON TGT.Field1 = NewLookupTable.Field1
AND TGT.Field2 = NewLookupTable.Field2
AND TGT.Field3 = NewLookupTable.Field3
Now you have a column called MyNewLookupKey in your staging table which holds the correct lookup key to load into your transaction table.
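The final step then reads straight out of the staging table; a sketch with hypothetical column names:
-- Every staging row now carries its lookup key, so the transaction
-- load is a plain INSERT...SELECT.
INSERT INTO TargetTransactionTable (LookupKey, TransactionDate, Amount)
SELECT MyNewLookupKey, TransactionDate, Amount
FROM SourceStagingTable;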
Ongoing uploads of data are a separate issue, but you might want to investigate an MS Access Data Project (although they are apparently being phased out, they are very handy as a front end into SQL Server).
The thing to remember is: if there is anything ambiguous about your data, for example, "these rows say my car is black but these rows say my car is white", then you (a human) need to come up with a rule for disambiguating it. It can't be done automatically.
So there are quite a number of ways to skin this cat. I don't know much about the "Migration Assistant", but I somehow doubt it's going to make your life easier given what you're trying to do.
I'd just dump the whole denormalized mess into a single big staging table then shred it where you need it using SQL. I know you asked for help with the TSQL, but without having some idea of what the denormalized data is and how you want to re-shape it, all I can do really is suggest you read up on SQL in general (select, from, where, group by, etc).
You could also do the work in SSIS, but ultimately the solution you use is largely going to depend on the nature of how you need to normalize the big denormalized data set. IMHO doing this in SQL is usually the easiest way, but then again when you're a hammer, everything looks like a nail.
As far as future-proofing the process, how you import the Access data probably will have little bearing on how you'd import Excel data. If you have a whole lot of different data sources which you'll need to incorporate on a recurring basis, SSIS might be a good choice to invest some time and effort into for the long run. No matter what, incorporating data from a distinct data source takes time and effort; you'll have to do some extra work regardless. I would weigh how frequently you think you'll have to integrate a given data source against how much effort is involved to massage it into the format you want.
I have a completely different opinion, because I do both database development and Microsoft's Power BI. On the PBI side we come across a lot of non-normalized data, because a lot of the data is coming in from Excel.
My guess is that what is now in Access was an import of something that originally began in Excel.
Excel Power Query and PBI offer transforms to pivot and unpivot the layout. I would use those tools for that task, then import the results into SQL.
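If you'd rather do the reshaping in T-SQL after staging instead of in Power Query, UNPIVOT does the same job; a sketch assuming invented month columns:
-- Rotates repeated measure columns (Jan, Feb, Mar) into rows of
-- (RecordID, MonthName, MonthValue).
SELECT RecordID, MonthName, MonthValue
FROM StagingWide
UNPIVOT (MonthValue FOR MonthName IN ([Jan], [Feb], [Mar])) AS u;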

MS Access database with no relations

Can anyone recommend a tool or suggest the approach when dealing with MS Access database with no relationships between tables?
As part of data migration project I am creating data mapping definition rules but it becomes more and more difficult and time consuming to correctly identify source tables/fields for extraction.
I have many tables with the same data appearing in different places. Furthermore, as there were no validation rules when data was input, many entries contain spelling errors or generally do not match expected data type. Most of the tables however already have the keys (primary & foreign) created.
I am looking for a quick solution to rebuild the database (*.mdb), ideally using some software which could identify all potential data issues, suggest corrections, allow for adjustments, and finally leave me with a fully relational database where the data can easily be identified and is not scattered all over the place.
I have some general knowledge of databases and SQL but didn't use Access much before, so I'm trying to save myself some time. And, if it matters, I don't care about database performance at all... only the data itself. I will be extracting it to *.csv files later anyway...
Comments, suggestions and/or other considerations will be appreciated.
Thanks in advance
J.
I don't believe there is any software that will analyze an Access database and use some kind of artificial intelligence to generate a new database with good data and strong relationships.
My recommendation though is to export all the data into SQL Server (or even MySQL) and then work with it there. It's much easier to manipulate the data with a real query language instead of trying to scrub data in Access.
You can do mass updates, comparisons, joins, etc. with SQL Server. You can query the schema easily (write queries to see if a field appears in a table), change schemas/table definitions with code, etc.
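For example, finding every table that contains a given column name is a one-liner against the standard INFORMATION_SCHEMA views (the column name pattern here is made up):
-- Lists every table/column pair whose column name matches the pattern.
SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Customer%';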
Then once you're done you can use jobs (SSIS) to export the data to CSV.
(You can download SQL Express if you don't have/can't afford SQL Server.)

User-friendly tools for retrieving data from a SQL Server database

We have several SQL Server databases containing measurements from generators that we build. However, this useful data is only accessible to a few engineers, since most are unfamiliar with SQL (including me). Are there any tools that would allow an engineer to extract chosen subsets of the data in order to analyze it in Excel or another environment? The ideal tool would
protect the database from any accidental changes,
require no SQL knowledge to extract data,
be very easy to use, for example with a GUI to select fields and the chosen time range,
allow export of the data values into a file that could be read by Excel,
require no participation/input from the database manager for the extraction task to run, and
be easy for a newbie database manager to set up.
Thanks for any recommendations or suggestions.
First off, I would never let users run their own queries on a production machine. They could run table scans or some other performance killer all day.
We have a similar situation, and we generally create custom stored procedures for the users to "call", and only allow access to a backup server running "almost live" data.
Our users are familiar with excel, so I create a stored procedure with ample parameters for filtering/customizations and they can easily call it by using something like:
EXEC YourProcedureName '01/01/2010','12/31/2010','Y',null,1234
I document exactly what the parameters do, and they generally are good to go from there.
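For context, the procedure behind a call like that is just a parameterized SELECT; a hedged sketch with invented table and parameter names matching the EXEC above (NULL meaning "don't filter"):
CREATE PROCEDURE YourProcedureName
@StartDate date,
@EndDate date,
@ActiveOnly char(1),
@Region varchar(50) = NULL,
@GeneratorID int = NULL
AS
BEGIN
SET NOCOUNT ON;
-- Each optional parameter is ignored when passed as NULL.
SELECT GeneratorID, MeasuredAt, Reading
FROM dbo.Measurements
WHERE MeasuredAt BETWEEN @StartDate AND @EndDate
AND (@ActiveOnly = 'N' OR IsActive = 1)
AND (@Region IS NULL OR Region = @Region)
AND (@GeneratorID IS NULL OR GeneratorID = @GeneratorID);
END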
To set up an Excel query you'll need to set up the data source on the user's PC (Control Panel - Data Sources - ODBC), which will vary slightly depending on your version of Windows.
From within Excel, you need to set up the "query", which is just the EXEC command from above. Depending on the version of Excel, it should be something like: menu - Data - Import External Data - New Database Query. Then choose the data source, connect, skip the table diagram maker, and enter the above SQL. Also, don't try to make one procedure do everything; make different ones based on what they do.
Once the data is on the excel sheet, our users pull it to other sheets and manipulate it at will.
Some users are a little advanced and "try" to write their own SQL, but that is a pain: I end up debugging and fixing their incorrect queries. Also, once you do correct the query, they always tinker with it and break it again. Using a stored procedure means that they can't change it, and I can put it with our other procedures in the source code repository.
I would recommend you build your own in Excel. Excel can make queries to your SQL Server Database through an ODBC connection. If you do it right, the end user has to do little more than click a "get data" button. Then they have access to all the GUI power of Excel to view the data.
Excel allows you to load the output of stored procedures directly into a tab. That IMO is the best way: users need no knowledge of SQL, they just invoke a procedure, and there are no extra moving parts besides Excel and your database.
Depending on your version of SQL Server, I would look at some of the excellent self-service BI tools that come with the later editions, such as Report Builder. This is like a stripped-down version of Visual Studio with all the complex bits taken out and just the simple reporting bits left in.
If you set up a shared data source that logs into the server with quite low access rights, then the users can build reports but not edit anything.
I would echo the comments by KM that letting the great unwashed run queries on a production system can lead to some interesting results, with either the wrong query being used, or massive table scans, or Cartesian joins, etc.

What's the best way to work with SQL Server data non-programmatically?

We have a SQL server database. To manipulate the data non-programmatically, I can use SQL Server Management Studio by right-clicking a table and selecting "Open Table". However this is slow for very large tables and sorting and filtering is cumbersome.
Typically what we have done until now is to create an Access database containing linked tables which point to the SQL Server tables and views. Opening a large table is much faster this way, and Access has easy-to-use right-click filtering and sorting.
However, since Access 2007, sorting in particular has been quite slow when working with large tables. The Access database can also inadvertently lock the database tables, blocking other processes that may need to access the data. Creating the Access database in the first place, and updating it when new tables are added to SQL Server, is also tedious.
Is there a better way to work with the data that offers the usability of Access without its drawbacks?
Joel Coehoorn's answer is of course correct: if the data is critical or there are naive users using the data, then an application front end should be developed. That being said, I have cases where a wise user (OK, me) needs to just get in there and poke around.
Instead of directly looking at the tables, use MS Access but use queries to narrow down what you're looking at both column wise and row wise. That will improve the speed. Then edit the query properties and make sure that the query is No Locks. That should eliminate any blocking behavior. You may want to limit the number of rows returned which again will improve the speed. You can still edit the data in the query as you look at it.
Depending on what you're looking at, it may also be useful to set up database Views in the SQL Server to do some of the heavy lifting on the server rather than on the client.
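A hedged sketch of such a server-side view, with invented names; the join and filter run on SQL Server, so Access only pulls the narrowed result set:
-- Access links to dbo.vOpenOrders instead of the raw tables.
CREATE VIEW dbo.vOpenOrders AS
SELECT o.OrderID, o.OrderDate, c.CustomerName, o.Amount
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.Status = 'Open';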
I don't know how well it will do with really large tables, but Visual Studio is much quicker than SQL Management Studio for basic table operations. Open up your database in Server Explorer, right-click on a table, and select either "Open" to just display the data or "New Query" to filter, sort, etc.
I've used Visual Studio to do lots of stuff, just for convenience rather than having to log into the server and work on the database manager directly.
However, have you tried Toad for MS SQL (from Quest Software)? I use it all the time for Oracle, and have had good results (often better than Oracle's tools).
Editing raw data is a dangerous no-no. Better to identify the situations where you find yourself doing that and put together an application interface to act as an intermediary that can prevent you from doing stupid things like breaking a foreign key.
I don't know what the performance would be like for large datasets, but open office has a database program (Base), which is an Access clone and may just be what you are looking for.
You might want to read Tony Toews's Access Performance FAQ, which provides a number of hints on how to improve performance in an Access application. Perhaps one of those tips will solve the problem in your A2K7 app.
