Converting an Excel LINEST function to T-SQL or SSRS - sql-server

I'm stuck on this one.
We're using linear regression for some trending and forecasting, and right now I have to query the data, create a dataset, paste it into Excel, and apply a LINEST function to it. Since the data requirements have changed daily, this has become a very cumbersome thing to whip together. I want SQL Server to take care of that processing, as this will be an automated forecast that I do not want to touch after I hand it over to an end user. When they refresh the data, I want the LINEST calculation to refresh as well.
Here's some sample data.
The [JanTrend] field is a logarithmic trend calculated in Excel from the Jan-12, Jan-13, and Jan-14 fields.
Here's that function in Excel:
=LINEST([Jan-12]:[Jan-14]^{1})
The Forecasted field is basically [Jan-14] + [JanTrend].
StockCode  Jan-12  Jan-13  Jan-14  JanTrend  Forecasted
300168     2       3       11      5         16
300169     1       4       3       1         4
The JanTrend field is where my LINEST function is located in my Excel spreadsheet.
I want to convert the above function to T-SQL or into an SSRS report. How can I achieve this?
EDIT: I'm trying to calculate a logarithmic trend. I made some changes to my sample data to make things clearer.

The LINEST Excel function is just linear regression. It's (still) not available in SQL Server, but you will find a lot of examples of UDFs or queries implementing it. Just Google "sqlserver udf linear regression" or refer to this previous question:
Are there any Linear Regression Function in SQL Server?
UDFs are generally slow, so you might want to go with the solution in the third post in this forum thread: http://www.sqlservercentral.com/Forums/Topic710626-338-1.aspx
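To give a flavor of the set-based approach, here is a minimal sketch that computes the least-squares slope with plain aggregates. The table dbo.Trends and its columns are assumptions mirroring the sample data above, not anything from the linked thread; adjust the names to your schema.

WITH Unpivoted AS (
    SELECT t.StockCode, v.Yr, v.Qty
    FROM dbo.Trends AS t
    CROSS APPLY (VALUES (1.0, t.[Jan-12]),
                        (2.0, t.[Jan-13]),
                        (3.0, t.[Jan-14])) AS v (Yr, Qty)
)
SELECT StockCode,
       -- least-squares slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx)
       (COUNT(*) * SUM(Yr * Qty) - SUM(Yr) * SUM(Qty))
     / (COUNT(*) * SUM(Yr * Yr) - SUM(Yr) * SUM(Yr)) AS Slope
FROM Unpivoted
GROUP BY StockCode;

For the logarithmic trend described in the question, regress against LOG(Yr) (natural log in T-SQL) instead of Yr; the Forecasted column is then [Jan-14] + Slope, as described above.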

Related

SQL to Excel - Max each sheet

I have a SQL table with close to 2 million rows and I am trying to export this data into an Excel file so the stakeholders can manipulate the data, see charts, and so on...
The issue is, when I hit refresh, it fails after getting all the data, saying the number of rows exceeds the max row limitation in Excel. This table is going to keep growing every day.
What I am looking for here is a way to refresh the data, then add rows to Sheet 1 until the max row limitation is reached. Once maxed out, I want the rows to start getting inserted into Sheet 2. Once that is maxed out, move to a 3rd sheet, all from the single SQL table, from a single refresh.
This does not have to happen in Excel (the Data -> Refresh option); I can have this as part of the SSIS package that I am already using to populate rows in the SQL table.
I am also open to any alternate ways to export the SQL table into a different format that can be used by said stakeholders to create charts, analyze data, and whatever else pleases them.
Without sounding too facetious, you are suggesting a very inefficient method.
The best way of approaching this is not to use .xlsx files at all for the data storage.
Assuming your destination stakeholders don't have read access to the SQL Server, export the data to .csv and then use Power Query in some sort of 'Dashboard.xlsx' type file to load the .csv into the data model, which can handle hundreds of millions of rows instead of just ~1.05M.
This will allow for the use of Power Pivot and DAX for analysis, and the data will also be visible in the data model table view if users do want raw rows (or they can refer to the .csv file).
If they do have SQL read access, then you can query the server directly, so you don't need to store any rows in a file at all.
Failing all that, if you decide to do it your way, I would suggest the following.
Read your table into a Pandas DataFrame and iterate over each row and cell of the DataFrame, writing to your xlsx [Sheet1] using openpyxl; then, once the row number reaches 1,048,560, simply move on to xlsx [Sheet2].
In short: openpyxl allows you to create workbooks, worksheets, and write to cells directly.
But depending on how many columns you have, it could take an incredibly long time.
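If you'd rather have SQL Server hand the exporter pre-split batches, here is a sketch; dbo.BigTable and Id are hypothetical stand-ins for your table and its sort key. Each row gets a 1-based SheetNo (1,048,575 data rows per sheet leaves one row free for a header), and the exporter, whether SSIS or openpyxl, just switches sheets whenever SheetNo changes.

SELECT *,
       -- integer division buckets rows into sheet-sized chunks
       1 + (ROW_NUMBER() OVER (ORDER BY Id) - 1) / 1048575 AS SheetNo
FROM dbo.BigTable;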
Product      Limitations
Excel 2007+  1,048,576 rows by 16,384 columns
A challenge with your suggestion of filling a worksheet with the max number of rows and then splitting is "How are they going to work with that data?" and "Did you split data that should have been together to make an informed choice?"
If Excel is the tool the users want to use and they must have access to all the data, then you're going to need to put the data into a Power Pivot data model (and yes, that's going to impact the availability of some data visualizations). A Power Pivot model is an in-memory tabular data set. What that means is that the data engine, xVelocity, is going to use a bunch of memory, but it can get over the 1 million row limitation. Depending on how much memory is required, you might need to switch from the default 32-bit Office install to a 64-bit install (and I've seen clients have to max out the RAM on old, low-end desktops because they went cheap for business users).
Power Pivot will have a connection to your SQL Server (or other provider). When it refreshes data, it's going to fire off queries, determine the unique values in columns, and then create a dictionary of unique values. This allows it to compress low-cardinality data really well - sales dates are likely going to be repeated heavily within your set, so the compression is good. Assuming your customers are typically non-repeat customers, a customer surrogate key would have high cardinality and thus not compress well, since there's little to no repetition. The refresh is going to be dependent on your use case and environment. Maybe the user has to kick it off manually; maybe you have SharePoint with Excel Services installed and can have it refresh the data on various intervals.
If they're good analysts, you might try turning them on to Power BI. It has the same-ish engine behind the scenes, but it was built from the ground up to be a responsive reporting tool. If they're just wading through tables of data, they're not ready for PBI. If they are making visuals out of the data, PBI is likely a better fit.

Method/Process to Handle Data in Persistent Manner

I've been banging my head against this for about a year, on and off, and I just hit crunch time.
Business Issue: We use software called Compeat Advantage (a General Ledger system), and it comes with an Excel add-in that lets you use a function to retrieve data from the Microsoft SQL database. The problem is that it must make a call to the database for each cell containing that function. On average it takes about .2 seconds to make the call and retrieve the data. Not bad, except when a report has these in volume. Our standard report built with it has ~1,000 calls, so by the math it takes just over 3 minutes to produce the report.
Again, in and of itself not a bad amount of time for a fully custom report. The issue I am trying to address is that this is one of the smaller reports we run, AND in some cases we have to produce 30 variants of the same report, unique per location.
The function's arguments are: Unit(s) [String], Account(s) [String], Start Date, End Date. Everything is retrieved in a SUM() so that a single [Double] is returned.
SELECT SUM(acctvalue)
FROM acctingtbl
WHERE [Date] BETWEEN @StartDate AND @EndDate
  AND storeCode = @Unit
  AND Acct = @Account;
Solution Sought: For the standard report there are only three variations of the data retrieved (Current Year, Prior Year, and Budget), and if they were all retrieved in bulk into detailed tables/arrays, the 3-minute report would take less than a second to produce.
I want to find a way to retrieve the line-item detail and store it locally for access, without the need for an ODBC call for every single function call on the sheet.
SELECT Unit, Acct, SUM(acctvalue) AS acctvalue
FROM acctingtbl
WHERE [Date] BETWEEN @StartDate AND @EndDate
GROUP BY Unit, Acct;
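Taking that one step further, conditional aggregation can pull two of the three variants in a single round trip; this is only a sketch, since @CurStart/@CurEnd and @PriorStart/@PriorEnd are assumed parameter names, and the Budget figures would have to join in from wherever they actually live in Compeat's schema.

SELECT Unit, Acct,
       SUM(CASE WHEN [Date] BETWEEN @CurStart AND @CurEnd THEN acctvalue END) AS CurrentYear,
       SUM(CASE WHEN [Date] BETWEEN @PriorStart AND @PriorEnd THEN acctvalue END) AS PriorYear
FROM acctingtbl
WHERE [Date] BETWEEN @PriorStart AND @CurEnd
GROUP BY Unit, Acct;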
Help: I am failing to find a functional way to do this. The biggest problem I have is the scope/persistence of the data. It is easy to call for all the data I need from the database, but keeping it around for use is killing me. Since these are spreadsheet functions, the data in the variables is released after each call, so I end up in the same spot: each function call on the sheet takes .2 seconds.
I have tried storing the data in a CSV file, but I continue to have data-handling issues insofar as moving it from the CSV into an array to search and sum the data. I don't want to manipulate the registry to store the info.
I am coming to the conclusion that if I want this to work, I will need to call the database, store the data in a very-hidden tab, and then pull it forward from there.
Any thoughts on what process I should use would be much appreciated.
Okay!
After some lucky Google-fu I found a passable workaround.
VBA - Update Other Cells via User-Defined Function - this provided many of the answers.
Beyond his code, I had to add code to make that sheet recalculate every time the UDF was called, to check the trigger. I did that with a simple cell + cell formula, placing a random number in it every time the workbook calculates.
I am expanding the code in the Workbook section now to fill in the holes.
Should solve the issue!

Query SQL database from Excel

I'm attempting to create an MS Query to return data from a SQL database based on a value from a cell in Excel. I have actually accomplished this successfully, but only for 1 row. I can't figure out how to get it to copy down to other rows.
I've created a connection as follows:
Notice that the SQL statement includes a parameter. The parameter is set to point to a specific cell:
I guess this makes sense, as I'm only looking to return 1 value per row. The problem is that I have multiple lines to return values for. How do I return a value per row for multiple rows?
I've tried changing the cell reference in the Parameters dialog box, but this does not work as the Excel Table is designed to grow dynamically.
Excel data connections work in such a way that every connection has only one SQL query. So in order to do what you're looking for, you would need many connections, and that's not best practice.
However, there are two ways you can solve this situation:
1. Make a single connection with all of the data and create a pivot table based on it. Then use VLOOKUP/INDEX to pull the data into your requested cells.
2. If the data is too big, you can use VBA code to create a smaller query based on the cells you mentioned and then continue as described in the first option (sketches of both below).
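For what it's worth, here is a sketch of what those queries might look like; dbo.Items and its columns are made-up names. Option 1 pulls the whole lookup set once so the workbook can VLOOKUP/INDEX against it, while option 2's VBA-assembled query restricts the fetch to the values read from the sheet (the IN list is a placeholder built from the Excel range at runtime).

-- Option 1: fetch everything once, look up locally in the workbook
SELECT ItemCode, ItemValue
FROM dbo.Items;

-- Option 2: a smaller, VBA-built query limited to the cells' values
SELECT ItemCode, ItemValue
FROM dbo.Items
WHERE ItemCode IN ('A100', 'A101', 'A102');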
Good luck.

OpenOffice Base Report that takes user input

I'm working on a LOT of VBA code that generates reports from Excel.
I reckon it's a really stupid idea to use Excel and VBA to run "queries" on a worksheet, so I'd like to do it in a database environment.
I successfully ported the required table and data into OpenOffice Base. The problem now is that I need to run a report that takes user input. For example, in a pizza database:
"Show me total use of ham on a weekly basis and consolidate that on a month by month basis."
Where "ham" would be an ingredient that the user could change for say, "pepperoni".
How do I create a report that takes user input? Do I need to do it with a subform? Can I connect a form to a report?
Thanks to drop-down lists, named ranges, and advanced functions like SUMPRODUCT(), you might be able to do everything you need in Excel, possibly without any VBA (it's more powerful than most people realize). Can you give us some more examples?
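As for the Base side of the question: Base queries support named parameters, so any :name placeholder in the SQL pops up an input box at run time, and a report can then be based on that query. Here is a sketch against a hypothetical pizza schema (the tables "Orders" and "OrderItems" and their columns are made up, and the date functions assume the embedded HSQLDB engine).

-- :ingredient prompts the user when the query (or a report on it) runs
SELECT MONTH("OrderDate") AS "Month",
       WEEK("OrderDate") AS "Week",
       SUM("Qty") AS "TotalUsed"
FROM "Orders" JOIN "OrderItems" ON "Orders"."OrderID" = "OrderItems"."OrderID"
WHERE "Ingredient" = :ingredient
GROUP BY MONTH("OrderDate"), WEEK("OrderDate")
ORDER BY "Month", "Week";

The user can then type ham or pepperoni at the prompt without touching the SQL, which covers the weekly totals consolidated month by month.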

Cross-reference Distance chart in SQL Server 2008 or Excel?

I would like to construct a cross-reference distance chart similar to the one here (the example is a road-distance cross-reference chart) and, ideally, store the data in SQL Server 2008 (preferably the Express version). It needs these properties / abilities:
Every column has a corresponding row with the same name (i.e. not misspelled like my example).
Changing the value at one Row-Column intersection would update the mirror intersection (Column-Row) or the mirror data could be ignored.
The distance-values would need to be end-user editable.
The end-user would need to be able to add, delete or rename a column/row pair.
The end-user needs to be able to sort the columns and have the rows move automatically.
There could be hundreds of pairs.
A look-up query needs to find a distance given a start & destination (Row & Column).
The distance chart is reasonably straightforward to implement in Excel. Considering this, am I better off...
Using Excel as the user editing UI and then updating an SQL 'thing' with the new data?
Using Excel as the data-source even if it means performance issues with querying the data?
Using an as-yet undiscovered stroke of genius detailed here in an answer?
Sure looks like an Excel application to me, start to end. (heh)
I can't imagine your users typing in enough data to make performance an issue. A square chart in Excel is bounded by the column limit (16,384 columns in Excel 2007+), and you said hundreds of pairs. If that's enough, I'd say you're golden.
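If the SQL 'thing' route wins out, here is a minimal sketch of the storage and the symmetric lookup; the table and column names are assumptions. Storing each pair once, with the key order normalized on write, keeps the mirror (Column-Row) intersection consistent by construction, and the lookup works regardless of which direction the user asks for.

CREATE TABLE dbo.Distance (
    PlaceA varchar(50) NOT NULL,
    PlaceB varchar(50) NOT NULL,
    Miles  int         NOT NULL,
    CONSTRAINT PK_Distance PRIMARY KEY (PlaceA, PlaceB),
    CONSTRAINT CK_Distance_Order CHECK (PlaceA < PlaceB)  -- one row per pair
);

-- look-up that covers both directions of a start/destination pair
SELECT Miles
FROM dbo.Distance
WHERE (PlaceA = @Start AND PlaceB = @Destination)
   OR (PlaceA = @Destination AND PlaceB = @Start);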
