Run a query from two data sets programmatically - database

I am trying to reconcile data from a website and a database programmatically. Right now my process is manual: I download data from the website, download data from my database, and reconcile the two using an Excel VLOOKUP. Within Excel, I am reconciling only one date for many items.
I'd like to programmatically reconcile the data for multiple dates and multiple items. The problem is that I have to download the data from the website manually. I have heard of people doing "outer joins" and "table joins", but I do not know where to begin. Is this something that I code in VBA or in Notepad?

Generally I do this by bulk inserting the website data into a staging table and then writing SELECT statements that join that table to my data in the database. You may need to do some cleanup first so the records can be matched if they are stored differently.
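Below is a minimal sketch of the staging-table approach. It uses Python's built-in sqlite3 module as a stand-in for the real database. The table names (staging_web, db_data) and columns (trade_date, item, value) are hypothetical. SQLite versions before 3.39 lack FULL OUTER JOIN, so the sketch emulates one with a LEFT JOIN plus the unmatched rows from the other side. On SQL Server you could write FULL OUTER JOIN directly.

```python
import sqlite3

# Rows that appear on only one side, or whose values differ.
RECONCILE_SQL = """
WITH joined AS (
    SELECT w.trade_date AS trade_date, w.item AS item,
           w.value AS web_value, d.value AS db_value
    FROM staging_web AS w
    LEFT JOIN db_data AS d
           ON d.trade_date = w.trade_date AND d.item = w.item
    UNION ALL
    -- database rows with no website counterpart
    SELECT d.trade_date, d.item, NULL, d.value
    FROM db_data AS d
    LEFT JOIN staging_web AS w
           ON w.trade_date = d.trade_date AND w.item = d.item
    WHERE w.item IS NULL
)
SELECT trade_date, item, web_value, db_value
FROM joined
WHERE web_value IS NULL OR db_value IS NULL OR web_value <> db_value
ORDER BY trade_date, item
"""

def reconcile(website_rows, db_rows):
    """Stage both data sets and return the rows that do not reconcile.

    Each input is a list of (trade_date, item, value) tuples.
    """
    con = sqlite3.connect(":memory:")
    for table, rows in (("staging_web", website_rows), ("db_data", db_rows)):
        con.execute(f"CREATE TABLE {table} (trade_date TEXT, item TEXT, value REAL)")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    result = con.execute(RECONCILE_SQL).fetchall()
    con.close()
    return result


web = [("2024-01-02", "A", 10.0), ("2024-01-02", "B", 5.0), ("2024-01-03", "D", 2.0)]
db = [("2024-01-02", "A", 10.0), ("2024-01-02", "B", 6.0), ("2024-01-02", "C", 1.0)]
for row in reconcile(web, db):
    print(row)
```

In practice, the website rows would come from the downloaded file and the database rows from a query against your own tables. Because the date is a join key, the same query handles many dates at once.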

Python is a scripting language. http://www.python.org
There are tools to allow you to read Excel spreadsheets. For example:
http://michaelangela.wordpress.com/2008/07/06/python-excel-file-reader/
You can also use Python to talk to your database server.
http://pymssql.sourceforge.net/
http://www.oracle.com/technology/pub/articles/devlin-python-oracle.html
http://sourceforge.net/projects/pydb2/
Probably the easiest way to automate this is to save the Excel files you get to disk, use Python to read them, and compare that data with what is in your database.
This will not be a trivial project, but it is very flexible and straightforward. Trying to do it all in SQL will be, IMHO, a recipe for frustration, especially if you are new to SQL.
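Here is a minimal Python sketch of the compare step. It assumes the downloaded file is saved as CSV, with hypothetical columns date, item and value; reading .xlsx directly would need a third-party library such as openpyxl. It also assumes the database rows have already been fetched into a dict keyed by (date, item).

```python
import csv
import io

def load_export(file_obj):
    """Map (date, item) -> value from a CSV with a date,item,value header."""
    return {(r["date"], r["item"]): float(r["value"])
            for r in csv.DictReader(file_obj)}

def compare(website, database, tolerance=1e-9):
    """Return (key, website_value, database_value) for every mismatch.

    None means the key is missing on that side; numeric values are
    compared with a small tolerance to absorb floating-point noise.
    """
    mismatches = []
    for key in sorted(website.keys() | database.keys()):
        w, d = website.get(key), database.get(key)
        if w is None or d is None or abs(w - d) > tolerance:
            mismatches.append((key, w, d))
    return mismatches


export = io.StringIO("date,item,value\n2024-01-02,A,10\n2024-01-02,B,5\n")
db_values = {("2024-01-02", "A"): 10.0, ("2024-01-02", "B"): 6.0,
             ("2024-01-03", "A"): 7.5}
print(compare(load_export(export), db_values))
```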
Alternatively:
You could also use VBA to read in your Excel files and generate SQL INSERT statements that are compatible with your DB schema. Then use SQL to compare them.

Related

Extract data from thousands of Excel files into database

We use SharePoint 2013 as a library to hold thousands of Excel files, whose formatting is almost never consistent, to manage projects carried out on servers. Somewhere in these files, possibly formatted as table objects, is a common set of server names.
Somehow, without being able to change this process in the short term, I need to pull data from all these files to identify how many projects are targeting a particular server.
I have access to SQL Server 2016 Enterprise, and I'm wondering whether something like PolyBase could help with this. I have also considered SSIS, but I don't expect any two tables to look exactly alike.
Other tools may be an option, but I'm not sure what can handle this scale and variety. I think daily updates to the data would be enough, but even so it's still a mess.
How do I pull thousands of varied Excel tables into a database? Is this even possible?
Any longer term solution that doesn't allow them to format and annotate like excel is unlikely to actually be adopted.
The less you know in advance, the more difficult it will be...
Some ideas:
Technology
Read about SELECT ... FROM OPENROWSET(...), which lets you read from an Excel file.
Read about linked servers.
Use Excel and its great abilities through VBA to iterate through all your Excel sheets, open them, analyse them and fill proper tables. Within Excel, you know the most about your messy data...
Target structure
You might create thousands of tables, each representing a single sheet from all your Excel files. You could query these tables with dynamically created SQL (using the metadata in INFORMATION_SCHEMA) or consider Full-Text Search.
You might import each sheet into a single XML structure (SELECT * ... FOR XML PATH('...')). In this case you'd need a target table with columns for the path and name of your Excel file, the name of the sheet, and an XML column for your data. Another approach would be to represent each file as one XML document and include all of its sheets there. Try to define common naming for all your data. Querying XML lets you query columns without knowing their actual names (XQuery with an XPath using *).
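To illustrate the "query columns without knowing their names" idea outside SQL Server, here is a small Python analogue. It uses xml.etree.ElementTree rather than FOR XML and XQuery. It assumes each sheet has already been read into a header list and rows (for example by the VBA route mentioned above), and that header names are valid XML element names. The file names, sheet names and server names are made up.

```python
import xml.etree.ElementTree as ET

def sheet_to_xml(path, sheet, header, rows):
    """Wrap one sheet in a generic <sheet> element with one <row> per row.

    Column element names come from that sheet's own header, so every
    file can use different names. Headers must be valid XML names.
    """
    root = ET.Element("sheet", {"path": path, "name": sheet})
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(header, values):
            ET.SubElement(row, column).text = str(value)
    return root

def cells_matching(root, wanted):
    """Find a value under any column, like an XPath step using *."""
    return [(cell.tag, cell.text)
            for cell in root.findall("./row/*") if cell.text == wanted]


sheets = [
    sheet_to_xml("proj1.xlsx", "Sheet1", ["Server", "Owner"], [["SRV01", "Ann"]]),
    sheet_to_xml("proj2.xlsx", "Plan", ["HostName", "Due"], [["SRV01", "2024-03-01"]]),
    sheet_to_xml("proj3.xlsx", "Tasks", ["Machine"], [["SRV02"]]),
]
# Which files mention the server, whatever the column happens to be called?
hits = [s.get("path") for s in sheets if cells_matching(s, "SRV01")]
print(hits)
```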
If your Excel files are already .xlsx, you can unzip them and take the existing XML as-is.
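An .xlsx file is a ZIP archive of XML parts, so Python's standard zipfile module can list and read those parts without Excel installed. The sketch below is minimal. Text cells in worksheet XML usually hold an index into xl/sharedStrings.xml rather than the text itself, so a real import would read that part too. The demo archive is a tiny fake, not a valid workbook.

```python
import io
import zipfile

def worksheet_parts(xlsx):
    """List worksheet XML parts (e.g. xl/worksheets/sheet1.xml) in an .xlsx.

    `xlsx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as zf:
        return sorted(name for name in zf.namelist()
                      if name.startswith("xl/worksheets/") and name.endswith(".xml"))

def read_part(xlsx, part):
    """Return the raw XML text of one part, ready to store in an XML column."""
    with zipfile.ZipFile(xlsx) as zf:
        return zf.read(part).decode("utf-8")


# Build a tiny fake archive in memory to show the calls.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
    zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    zf.writestr("xl/worksheets/_rels/sheet1.xml.rels", "<Relationships/>")
print(worksheet_parts(buf))
print(read_part(buf, "xl/worksheets/sheet1.xml"))
```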
To be honest, I do not think any tool can magically import such a wide range of messy data automatically...

Querying a .txt file on a web server

The National Weather Service's Climate Prediction Center maintains recent weather data from about 1400 weather stations across the United States. The data for the previous day can always be found at the following address:
http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/prcp_temp_tables/dly_glob1.txt
In an ambitious attempt to store weather data for future reference, I want to store this data by row using SQL Server 2012. Five years ago a similar question was asked, and this answer mentioned the BULK INSERT command. I do not have access to this option.
Is there an option which allows for direct import of a web hosted text file which does not use the BULK statement? I do not want to save the file as I plan on automating this process and having it run daily direct to the server.
Update: I have found another option in Ad Hoc Distributed Queries. This option is also unavailable to me based on the nature of the databases in question.
Why do you NOT have access to Bulk Insert? I can't think of a reason that would be disabled on your version of SQL Server.
I can think of a couple ways of doing the work.
#1) Record a macro, using Excel, to do everything from the data import, to the parsing of the data sets, and then to saving as a CSV file. I just did it; it was very easy. Then use BULK INSERT to get the data from the CSV into SQL Server.
#2) Record a macro, using Excel, to do everything from the data import to the parsing of the data sets. Then use a VBA script to send the data to SQL Server. You will find several ideas at the link below.
http://www.excel-sql-server.com/excel-sql-server-import-export-using-vba.htm#Excel%20Data%20Export%20to%20SQL%20Server%20using%20ADO
#3) You could actually use Python or R to get the data from the web. Both have excellent HTML parsing packages. Then, as mentioned in point #1 above, save the data as a CSV (using Python or R) and BULK INSERT into SQL Server.
R is probably a bit off topic here, but it is still a viable option. I just tested the idea, and everything is done in just two lines of code!
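# Read the text file straight from the URL into a data frame
X <- read.csv(url("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/prcp_temp_tables/dly_glob1.txt"))
# Save it as a CSV without R's row-name column, so BULK INSERT sees only the data columns
write.csv(X, file = "C:\\Users\\rshuell001\\Desktop\\foo.csv", row.names = FALSE)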
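The question rules out both BULK INSERT and saving the file, so here is a hedged Python variant of option #3. It downloads the text into memory and inserts it row by row with a parameterized executemany. sqlite3 stands in for SQL Server so the sketch runs on its own. With a SQL Server driver such as pyodbc, the same qmark-style executemany call should apply, but that is an assumption to verify. The weather_raw table is hypothetical. Storing each raw line leaves the parsing of the fixed layout to a later SQL step.

```python
import sqlite3
import urllib.request

URL = ("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/"
       "cdus/prcp_temp_tables/dly_glob1.txt")

def fetch_text(url=URL, timeout=30):
    """Download the file into memory; nothing is written to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # latin-1 never fails to decode; adjust if the file uses another encoding
        return resp.read().decode("latin-1")

def store_lines(con, text, load_date):
    """Insert every non-blank line as its own row, keeping its line number."""
    con.execute("CREATE TABLE IF NOT EXISTS weather_raw "
                "(load_date TEXT, line_no INTEGER, line TEXT)")
    rows = [(load_date, n, line.rstrip())
            for n, line in enumerate(text.splitlines(), start=1)
            if line.strip()]
    with con:  # commit if every insert succeeds, roll back otherwise
        con.executemany("INSERT INTO weather_raw VALUES (?, ?, ?)", rows)
    return len(rows)
```

A daily scheduled job would then call something like store_lines(con, fetch_text(), date.today().isoformat()).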

How can I import multiple csv files from a folder into sql, into their own separate table

I would like some advice on the best way to go about this. I have multiple files, all with different layouts, and I would like to create a procedure to import them into new tables in SQL.
I have written a procedure which uses xp_cmdshell to get the list of file names in a folder, then uses a cursor to loop through those file names and a BULK INSERT to get them into SQL, but I don't know the best way to create a new table with a new layout each time.
I thought that if I could import just the header row into a temp table, I could use it to create a new table to bulk insert into, but I couldn't get that to work.
So what's the best way to do this using SQL? I am not that familiar with .NET either. I have thought about doing this in SSIS. I know it's easy enough to load multiple files with the same layout in SSIS, but can it be done with variable layouts?
Thanks
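The header-row idea in the question can be done outside SQL by reading only the first line of each file and generating the DDL and load statement from it. Below is a minimal Python sketch. It assumes comma-delimited files with a unique, non-empty name in every header cell. Every column is created as NVARCHAR(MAX) for staging and can be retyped later. The dbo schema and the table-per-file naming are assumptions. The path in BULK INSERT must be one the SQL Server service can see, and a plain FIELDTERMINATOR does not handle quoted commas.

```python
import csv
import glob
import os

def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"

def read_header(csv_path):
    """Read only the first row; utf-8-sig drops a leading byte-order mark."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))

def build_statements(table, header, data_path):
    """Return (CREATE TABLE, BULK INSERT) statements for one file."""
    columns = ",\n    ".join(f"{quote_ident(c)} NVARCHAR(MAX)" for c in header)
    create = f"CREATE TABLE dbo.{quote_ident(table)} (\n    {columns}\n);"
    path_literal = data_path.replace("'", "''")
    load = (f"BULK INSERT dbo.{quote_ident(table)} FROM '{path_literal}' "
            "WITH (FIRSTROW = 2, FIELDTERMINATOR = ',');")
    return create, load

def statements_for_folder(folder):
    """Yield statements for every .csv in a folder, one table per file."""
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        table = os.path.splitext(os.path.basename(path))[0]
        yield build_statements(table, read_header(path), os.path.abspath(path))
```

The generated statements can be reviewed and then run from any SQL Server client, which replaces the temp-table step that was hard to get working inside T-SQL.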
You could use BimlScript to make the whole process automated where you just point it at the path of interest and it writes all the SSIS and T-SQL DDL for you, but for the effort involved in writing the C# you'd need, you may as well just put the data dump into SQL Server in the C#, too.
You can use SSIS to solve this issue, though, and there are a few levels of effort to pick from.
The easiest is to use the SQL Server Import and Export Wizard to create SSIS packages from your Excel spreadsheets that will dump the sheet into its own table. You'd have to run this wizard every time you had a new spreadsheet you wanted to import, but you could save the package(s) so that you could re-import that spreadsheet again.
The next level would be to edit a saved SSIS package (or write one from scratch) to parameterize the file path and the destination table names, and you could then re-use that package for any spreadsheets that followed the same format.
Further along would be to write a package that determines which of the packages from the previous level to call. If you can query the header rows effectively, you could probably write an SSIS package that accepts a path as an input parameter, finds all the Excel sheets in that path, queries the header rows to determine the spreadsheet format, and then passes that information to the parameterized package for that format type.
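The format-detection step at this level boils down to mapping a header row to a handler. Below is a small Python sketch of that decision, with hypothetical layouts and package names. In SSIS, the chosen name would instead be handed to whatever runs the parameterized package. Headers are compared after trimming and lowercasing, so stray spaces and case differences don't break the match.

```python
# Hypothetical layouts: normalized header tuple -> package that loads it.
FORMATS = {
    ("servername", "project", "startdate"): "LoadProjects.dtsx",
    ("host", "owner"): "LoadHosts.dtsx",
}

def normalize(header):
    """Trim and lowercase header cells so trailing spaces and case don't matter."""
    return tuple(cell.strip().lower() for cell in header)

def route(header, formats=FORMATS):
    """Return the package name for this header, or None if the layout is new."""
    return formats.get(normalize(header))

def route_files(headers_by_file, formats=FORMATS):
    """Split files into known layouts and ones that need a new package."""
    routed, unknown = {}, []
    for name, header in sorted(headers_by_file.items()):
        package = route(header, formats)
        if package is None:
            unknown.append(name)
        else:
            routed.setdefault(package, []).append(name)
    return routed, unknown
```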
SSIS development is, of course, its own topic - Integration Services Features and Tasks on MSDN is a good place to start. SSIS has its quirks, and I highly recommend learning BimlScript if you want to do a lot of SSIS development. If you'd like to talk over what the ideas above would require in more detail, please feel free to message me.

What would be the best method for building Dynamic Reports from my SQL DB Data?

I am building a simple database with about 6-7 tables. I will be setting a schedule to do a clean import from a .txt file.
I want to take this data and create a report, as I would in an Excel spreadsheet, convert it to a PDF and post it to our company intranet for anyone interested.
I'm trying to think of the best way to build my report. Would I just use an Excel spreadsheet with a direct connection to the database? Would I create some sort of console application (c/c#/vb/vb.net) that would query the db, generate the report in an Excel file, convert it to PDF and save it?
I'm quite comfortable in these different languages, just not as experienced with Reporting Services (although I do have a lot of experience working with Excel and VBA macros), but I want to get into SSRS and get familiar with it, as I will be doing a lot of projects like this in the future. This seems like an easy one to get my hands dirty with, learn from, and build on.
Any insight or suggestions would be greatly appreciated.
Thanks so much!
My suggestion:
Create SQL queries that retrieve the data in the form you need
Link these queries to your Excel sheet, perhaps directly in the form of pivot tables to aggregate the results
Using VBA, you can easily create PDF from the data at the click of a button
The initial design will be time intensive, but after that, everything is automated and one just needs to press the button that creates the PDF.
How to link Access queries to your Excel file:
Data --> Get External Data
You can refresh all data whenever the workbook is opened by putting the code below in the workbook's Workbook_Open event (in the ThisWorkbook module):
ThisWorkbook.RefreshAll
If you need further clarification, do not hesitate to ask
If your end goal is to create a PDF that will be out on your intranet then I would create the report in SSRS. Then you can schedule it to run and output a PDF to your network location.
I've had good experiences using a pivot table in Excel which is a connected table to your SQL database.
In the connection parameters in Excel there is a field where you can define your SQL query, whether it be to call a stored procedure or just a simple SELECT statement.
The main reason I prefer a pivot table SQL connection to a normal table connection is that if you have a chart that references the connected table, the chart formatting will be reset when you refresh your connection (if you need to update your report).
If I use a chart that references a pivot table (or a pivot chart) then the formatting is retained.

Tools to update tables in SQL server 2000/2005

Is there any handy tool that can make updating tables easier? Usually I get an Excel file with the original value in one column and the new value in another column. Then I write a formula in Excel to create the UPDATE statement. Is there any way to simplify the updating task?
I believe the approach in SQL Server 2000 and 2005 would be different, so could we discuss them both? Thanks.
In addition, these updates are usually requested by "non-programmers" (who don't understand SQL, so it may not be feasible to let them run queries). Is there any tool that can let them update the table directly without having DBAs do this task? That tool also needs to limit their privileges to modifying only certain tables, and ideally it should have a way to roll back the change.
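The Excel-formula step in the question can be replaced with a short script. Below is a minimal Python sketch that turns (old value, new value) pairs into T-SQL UPDATE statements. It doubles embedded single quotes so values like O'Brien don't break the SQL. The table and column names are hypothetical and are meant to come from the DBA, not from the spreadsheet. If the script runs the updates itself, parameterized queries are the safer choice.

```python
import csv
import io

def quote_literal(value):
    """Render a value as an N'...' T-SQL literal, doubling single quotes."""
    return "N'" + str(value).replace("'", "''") + "'"

def update_statements(pairs, table, column):
    """Build one UPDATE per (old_value, new_value) pair.

    `table` and `column` are trusted identifiers, not spreadsheet input.
    """
    return [
        f"UPDATE {table} SET {column} = {quote_literal(new)} "
        f"WHERE {column} = {quote_literal(old)};"
        for old, new in pairs
    ]


# Two columns exported from the spreadsheet: old value, new value.
sheet = io.StringIO("old,new\nO'Brien,OBrien\nSmith,Smyth\n")
reader = csv.reader(sheet)
next(reader)  # skip the header row
for statement in update_statements(reader, "dbo.Customers", "LastName"):
    print(statement)
```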
Create a DTS package that imports a CSV file, makes the updates and then archives the file. The user can drop the file in a folder designated for the task, or this can be done by an ops person. Schedule the DTS package to run every hour, day, etc.
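Here is a hedged Python analogue of this drop-folder job; it is not DTS itself. Each CSV in an inbox folder is applied inside one transaction, so a bad file rolls back its own changes and stays in the inbox, and a file that succeeds is then moved to an archive folder. sqlite3 stands in for the database. The products table, name column and old_value/new_value headers are hypothetical. Scheduling would be left to something like Windows Task Scheduler or cron.

```python
import csv
import shutil
import sqlite3
import tempfile
from pathlib import Path

UPDATE_SQL = "UPDATE products SET name = ? WHERE name = ?"

def process_drop_folder(con, inbox, archive):
    """Apply each CSV in `inbox`, then move it to `archive`.

    Returns {file name: number of rows updated}.
    """
    archive = Path(archive)
    archive.mkdir(parents=True, exist_ok=True)
    results = {}
    for path in sorted(Path(inbox).glob("*.csv")):
        with open(path, newline="", encoding="utf-8-sig") as f:
            params = [(r["new_value"], r["old_value"]) for r in csv.DictReader(f)]
        # One transaction per file: commit on success; on error roll back and
        # re-raise, leaving the file in the inbox.
        with con:
            cur = con.executemany(UPDATE_SQL, params)
        results[path.name] = cur.rowcount
        shutil.move(str(path), str(archive / path.name))
    return results

def demo():
    """Run the sketch against an in-memory database and temp folders."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (name TEXT)")
    con.executemany("INSERT INTO products VALUES (?)", [("Widget",), ("Gadget",)])
    con.commit()
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp, "inbox")
        inbox.mkdir()
        (inbox / "changes.csv").write_text(
            "old_value,new_value\nWidget,Widget v2\n", encoding="utf-8")
        counts = process_drop_folder(con, inbox, Path(tmp, "archive"))
        archived = sorted(p.name for p in Path(tmp, "archive").iterdir())
        left = sorted(p.name for p in inbox.iterdir())
    names = [r[0] for r in con.execute("SELECT name FROM products ORDER BY name")]
    return counts, archived, left, names
```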
In case your users would insist that they keep using Excel, you've got several different possibilities of getting the data transferred to SQL Server. My preferred one would be to use DTS/SSIS, as mentioned by buckbova.
However, another method is by using OPENROWSET(), which makes it possible to query your Excel file as if it was a table. I wrote a small article about it here: http://blog.hoegaerden.be/2010/03/29/retrieving-data-from-excel/
Another approach that hasn't been mentioned yet (I'm not a big fan of letting regular users edit data directly in the DB): is it possible to create a small custom application for them?
There you go, a couple more possible solutions :-)
Valentino.
I think the best approach is to expose a view on your data to the users who are allowed to make updates, and set up INSTEAD OF triggers on the view to perform the actual updates on the underlying data. Restrict changes to only the columns they should be changing.
This technique can work on SQL Server 2000 and 2005.
I would add audit triggers on the underlying tables so you can always track changes.
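Below is a sketch of the view-plus-triggers pattern, run through Python's sqlite3 so it is self-contained. SQLite syntax differs from SQL Server. There you would write CREATE TRIGGER ... ON the view INSTEAD OF UPDATE AS ... and read the new and old values from the inserted and deleted pseudo-tables. Permissions would be granted on the view only, which SQLite cannot show because it has no GRANT. The table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL);
CREATE TABLE customers_audit (
    id INTEGER, old_name TEXT, new_name TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Users work only through this view; credit_limit is not exposed.
CREATE VIEW customers_editable AS SELECT id, name FROM customers;

-- Route updates on the view to the base table, touching only name.
CREATE TRIGGER customers_editable_update
INSTEAD OF UPDATE ON customers_editable
BEGIN
    UPDATE customers SET name = NEW.name WHERE id = OLD.id;
END;

-- Audit trigger on the base table records every name change.
CREATE TRIGGER customers_name_audit
AFTER UPDATE OF name ON customers
BEGIN
    INSERT INTO customers_audit (id, old_name, new_name)
    VALUES (OLD.id, OLD.name, NEW.name);
END;
"""

def build_demo_db():
    """Create the schema in memory and add two sample customers."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "Acme", 5000.0), (2, "Globex", 100.0)])
    con.commit()
    return con


con = build_demo_db()
with con:
    con.execute("UPDATE customers_editable SET name = 'Acme Corp' WHERE id = 1")
print(con.execute("SELECT * FROM customers WHERE id = 1").fetchone())
print(con.execute("SELECT id, old_name, new_name FROM customers_audit").fetchall())
```

Because credit_limit is not in the view, an UPDATE against it through the view fails, which is the column restriction described above.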
You'll have complete control, and they can connect to it with Access or whatever and perform their maintenance.
You could create some accounts in SQL Server for these users and limit their access to only certain tables and columns, with only SELECT/UPDATE/INSERT privileges. Then you could create an Access database with linked tables pointing to these.

Resources