Speed up Excel array formula to find unique distinct values

I have a workbook in which a variable number of rows of data (one per employee) are entered each week on one sheet (DATA ENTRY), and then stored on another sheet (LOG) with the help of a macro that is executed every time the file is saved.
To be able to then retrieve and review employee data for a specific week, I need a column of helper cells in which all the unique distinct dates (weeks) are listed.
I currently do this with the following array formula:
{=IFERROR(INDEX($B$2:$B$1600, MATCH(0,COUNTIF($K$1:K1, $B$2:$B$1600), 0)),"")}
This all works brilliantly, except that I found that this one specific formula slows my file down tremendously. When the file is saved (which triggers data to be copied over to the LOG sheet), it can take up to 10 seconds to process. When this array formula is disabled, it is pretty much instantaneous.
Limiting it to 1,600 rows helped significantly (it took much longer when I had it set to 20,000), but it is still not enough, and I can't really have it check fewer than 1,600 rows.
Any creative solutions to either make this formula run faster, or to get to the same result (a list of unique distinct dates from a large list of dates) without using an array formula?
Thanks!

You could use Power Query (Get & Transform Data) to populate your list of unique dates.
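To show why a one-pass approach scales better than the array formula, here is a minimal sketch of the de-duplication logic in Python (not Excel; it only illustrates the idea). The function name `unique_in_order` and the sample dates are made up, and skipping blanks is an assumption. The list is scanned once and each date is kept the first time it appears, whereas the COUNTIF-based formula rescans the source range for every output cell.

```python
from datetime import date


def unique_in_order(values):
    """Return the distinct non-blank values, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        # Assumption: blank cells (None) are skipped.
        if value is None or value in seen:
            continue
        seen.add(value)
        result.append(value)
    return result


# Hypothetical week dates as they might appear in the LOG sheet.
weeks = [
    date(2020, 1, 6),
    date(2020, 1, 6),
    date(2020, 1, 13),
    None,
    date(2020, 1, 6),
    date(2020, 1, 20),
]
distinct_weeks = unique_in_order(weeks)
print(distinct_weeks)
```

Since the LOG sheet is already filled by a macro on save, the same single-pass idea could also be applied inside that macro instead of a helper column of array formulas.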

Related

How to Subtract row by row in Power BI?

In this dataset I am trying to develop a column or a measure based upon the hours column. I am trying to determine the difference between the first and second hour rows, the second and third hour rows, etc. and all the way through the entirety of the data.
Note: there are multiple serial numbers in this table; I just used this serial as an example.

I'm not sure this should be tagged with SQL-Server unless you can change the SQL that sources the data. If so you could pre-calculate this inside SQL Server.
If you can change the Power Query that brings the data into the data model you can add an Index column as the data's coming in and use that.
Please see:
How to Compare the Current Row to the Previous Row Using DAX
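The index-column idea can be sketched roughly as follows. This is plain Python rather than DAX or Power Query, and the field names (`serial`, `index`, `hours`) are assumptions, since the original table is not shown. Rows are ordered by serial number and index, and each row gets the difference from the previous row of the same serial (current minus previous; flip the sign if you need the other direction).

```python
def add_hour_differences(rows):
    """Add a 'diff' field: hours minus the previous row's hours, per serial."""
    ordered = sorted(rows, key=lambda r: (r["serial"], r["index"]))
    previous_hours = {}  # last seen hours value for each serial number
    result = []
    for row in ordered:
        prev = previous_hours.get(row["serial"])
        # The first row of each serial has no previous row, so diff is None.
        diff = None if prev is None else row["hours"] - prev
        result.append({**row, "diff": diff})
        previous_hours[row["serial"]] = row["hours"]
    return result


# Hypothetical sample data with two serial numbers.
sample = [
    {"serial": "A1", "index": 1, "hours": 10},
    {"serial": "A1", "index": 2, "hours": 14},
    {"serial": "A1", "index": 3, "hours": 21},
    {"serial": "B7", "index": 1, "hours": 5},
    {"serial": "B7", "index": 2, "hours": 6},
]
diffs = [row["diff"] for row in add_hour_differences(sample)]
print(diffs)
```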

Finding difference between columns in column group

I am using report builder to create a report showing a budget for a project. The dataset includes line items for both budget and projected. See below for example rows. I am using a matrix with column group to display budget and projected side by side as well as a row group to show section, category, etc. I need to have a variance column that subtracts projected from budget.
I have scoured the interwebs for solutions, but nothing has worked so far. I feel like there has to be a simple solution to this, given that it is something that could be done in a SQL query with zero effort. Most solutions assume I have two separate fields, but these are dynamic fields pulled out with the column group.
Dataset Row Samples
Type       Section   Category  Phase             Task              Total
Budget     Building  Kitchen   Pre-Construction  Cabinet Hardware  $100
Projected  Building  Kitchen   Pre-Construction  Cabinet Hardware  $220
Report sample
           COL GROUP            This is the column I want
           Budget   Projected   Variance
+Building  $100     $220        -$120
+Kitchen
+Pre-Con
EDIT: I tried the solution below without success and have already visited every link provided in the second answer. Maybe there is something I am missing, but I ended up just doing everything in the SQL query and not using column groups. This is 100% the simplest solution. I am very surprised there is no easy way to reference individual columns in a column group. The answers below may work for others, but I just could not get them to work for me. Not sure why.

You could add an additional column inside the "Type" group (provided that this is the name of your column group). Set the Column Visibility to hide the column by an expression like
= IsNothing(Previous(Fields!Type.Value, "Type"))
Calculate the values for that column as
= Previous(Sum(Fields!Total.Value), "Type") - Sum(Fields!Total.Value)
That should calculate the difference between the values of the previous type and the current type, and only show that column for the "Projected" type (when there is a previous type).
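For reference, the arithmetic the variance column is meant to produce can be sketched outside of Report Builder. The following Python snippet is only an illustration of the expected result, not SSRS code; the dictionary keys are assumptions based on the sample rows in the question. It totals Budget and Projected per section and subtracts Projected from Budget.

```python
from collections import defaultdict


def variance_by_section(rows):
    """Return {section: budget total minus projected total}."""
    totals = defaultdict(lambda: {"Budget": 0, "Projected": 0})
    for row in rows:
        totals[row["section"]][row["type"]] += row["total"]
    return {
        section: amounts["Budget"] - amounts["Projected"]
        for section, amounts in totals.items()
    }


# The two sample rows from the question.
rows = [
    {"type": "Budget", "section": "Building", "total": 100},
    {"type": "Projected", "section": "Building", "total": 220},
]
variance = variance_by_section(rows)
print(variance)
```

For the sample rows this gives -120 for Building, matching the Variance value in the report sample.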

On the matrix, you can use the group subtotals to achieve this; you only have to override the SUM operation with an expression that subtracts the two values. There are many links that explain how to do that or that may help you:
How to add calculated column from dynamic columns to a matrix
Adding subtotals to SSRS report tablix
How to write Expression to subtract row Group SubTotals
Reporting in SQL Server – Using calculated Expressions within reports

Method/Process to Handle Data in Persistent Manner

I've been banging my head against this for about a year on and off and I just hit a crunch time.
Business Issue: We use software called Compeat Advantage (a General Ledger system), and they provide an Excel add-in that allows you to use a function to retrieve data from the Microsoft SQL database. The problem is that it must make a call to the database for each cell with that function. On average it takes about 0.2 seconds to make the call and retrieve the data. Not bad, except when a report has these in volume. Our standard report built with it has ~1,000 calls, so it takes just over 3 minutes (1,000 x 0.2 seconds = 200 seconds) to produce the report.
Again, in and of itself that is not a bad amount of time for a fully custom report. The issue I am trying to address is that this is one of the smaller reports we run, AND in some cases we have to produce 30 variants of the same report, one per location.
The function's arguments are: Unit(s) [String], Account(s) [String], Start Date, End Date. All of this is retrieved in a SUM() so that a single [Double] is returned.
SELECT SUM(acctvalue)
FROM acctingtbl
WHERE DATE BETWEEN startDate AND endDate AND storeCode = Unit(s) AND Acct = Account(s)
Solution Sought: For the standard report there are only three variations of the data retrieved (Current Year, Prior Year, and Budget). If each were retrieved in bulk into detailed tables/arrays, the 3-minute report would drop to less than a second to produce.
I want to find a way to retrieve the data in line-item detail and store it locally, so it can be accessed without creating an ODBC connection for every single function call on the sheet.
SELECT Unit, Account, SUM(acctvalue)
FROM acctingtbl
WHERE date BETWEEN startDate AND endDate
GROUP BY Unit, Account
Help: I am failing to find a functional way to do this. The largest problem I have is the scope/persistence of data. It is easy to call for all the data I need from the database, but keeping it around for use is killing me. Since these are spreadsheet functions, the data in the variables is released after the call, so I end up in the same spot: each function call on the sheet takes 0.2 seconds.
I have tried storing the data in a CSV file but continue to have data handling issues in so far as moving it from the CSV to an array to search and sum data. I don't want to manipulate the registry to store the info.
I am coming to the conclusion that if I want this to work, I will need to call the database, store the data in a .veryhidden tab, and then pull it forward from there.
Any thoughts would be much appreciated on what process I should use.

Okay!
After some lucky Google-fu I found a passable workaround.
"VBA - Update Other Cells via User-Defined Function": this provided many of the answers.
Beyond his code, I had to add code to make that sheet calculate every time the UDF was called, to check the trigger. I did that with a simple cell + cell formula and having a random number placed in it every time the workbook calculates.
I am expanding the code in the Workbook section now to fill in the holes.
Should solve the issue!
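The bulk-retrieval idea from the question (one grouped query, then local lookups) can be sketched as follows. This is Python with an in-memory SQLite table standing in for the real SQL Server database, so it is an illustration of the pattern rather than the add-in's actual behavior. The table and column names come from the pseudo-SQL in the question; the helper names `load_totals` and `lookup` and the sample rows are made up.

```python
import sqlite3


def load_totals(conn, start_date, end_date):
    """Run ONE grouped query and return {(unit, account): total}."""
    cursor = conn.execute(
        "SELECT storeCode, Acct, SUM(acctvalue) FROM acctingtbl "
        "WHERE date BETWEEN ? AND ? GROUP BY storeCode, Acct",
        (start_date, end_date),
    )
    return {(unit, acct): total for unit, acct, total in cursor}


def lookup(totals, units, accounts):
    """Sum cached totals for the given units and accounts (no database call)."""
    return sum(totals.get((u, a), 0.0) for u in units for a in accounts)


# Stand-in database with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acctingtbl (storeCode TEXT, Acct TEXT, date TEXT, acctvalue REAL)"
)
conn.executemany(
    "INSERT INTO acctingtbl VALUES (?, ?, ?, ?)",
    [
        ("S1", "4000", "2020-01-15", 100.0),
        ("S1", "4000", "2020-02-15", 50.0),
        ("S1", "4100", "2020-01-20", 25.0),
        ("S2", "4000", "2020-01-10", 10.0),
        ("S1", "4000", "2019-12-31", 999.0),  # outside the date range
    ],
)

totals = load_totals(conn, "2020-01-01", "2020-12-31")
report_value = lookup(totals, ["S1"], ["4000", "4100"])
print(report_value)
```

In the workbook, the equivalent of `totals` would be the block of rows written to the hidden sheet, and each report cell would then do a local lookup (for example with SUMIFS) instead of a database call.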

Copy a very large number of rows from one sheet to another, excluding blank rows in Excel 2010

I'm currently working on an Excel workbook that uses the following formula to copy all rows from one sheet (Creation_Series_R) to another, excluding empty rows.
{=IFERROR(INDEX(Creation_Series_R!C:C;SMALL(IF(Creation_Series_R!$C$3:$C$20402<>"";ROW(Creation_Series_R!$C$3:$C$20402));ROW()-ROW(Creation_Series_R!$C$3)+1));"")}
And the formula works very well. Except, when I did my proof of concept I only had a few rows but with the final data, I need to work on 20400 rows... adding to the fact that I have 17 columns, and 3 similar sheets with similar formula, my workbook takes an hour to compute every time I input just one value.
This workbook is designed as a way for a client to enter data, and then it reorganizes the data so that it can be imported directly into our software. I already limited the amount of data the user can enter per workbook (to their very big disappointment), so I can't really reduce it to fewer than 20400 rows (it's only 100 funds' worth of financial data).
Is there a way, maybe even using a macro, that I could do this more efficiently?

The big block of array formulas is killing your performance (time-wise).
If your data is in column A through Q, then I would use column R as a "helper" column. In R2 insert:
=COUNTA(A2:Q2)
and copy down. The macro would:
AutoFilter column R
Hide all rows showing 0 in column R
Copy the visible rows and paste elsewhere as a block
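The logic of those steps (count the non-empty cells per row, drop rows with a count of 0, copy the rest as one block) can be sketched like this. It is Python rather than VBA, purely to show the idea. The helper names are made up, and treating both `None` and the empty string as blank is an assumption.

```python
def count_non_empty(row):
    """Rough equivalent of the helper column: number of non-blank cells."""
    # Assumption: None and "" both count as blank.
    return sum(1 for cell in row if cell not in (None, ""))


def non_blank_rows(rows):
    """Keep only rows that contain at least one non-blank cell."""
    return [row for row in rows if count_non_empty(row) > 0]


# Hypothetical source rows; the second and fourth are empty.
source = [
    ["Fund A", 100, "2020-01-31"],
    [None, "", None],
    ["Fund B", 250, "2020-01-31"],
    ["", "", ""],
    [None, 0, None],  # a zero is a value, so this row is kept
]
copied = non_blank_rows(source)
print(copied)
```

The whole range is handled in one pass, so the work grows linearly with the number of rows instead of recalculating a large array formula in every cell.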

Running Database in Excel

Is there a way to create a running database in Excel (and only in Excel, without using third-party programs)? For example:
-One worksheet has today's data for each person
-The additional worksheets (one for each person on the first worksheet) keep a list of each of the past columns
-Each of the worksheets, except for the current worksheet, charts each new row of data added daily.
Here is a picture in case it helps:

This can be done in Excel, but you need to get the data architecture right.
Use ONE sheet for all raw data. Columns are Date, member, score, number of pages, number of files, notes. New data goes at the bottom of the list. You can use VBA to create a data entry form if you don't want to enter data straight into the sheet. The sheet can be hidden, if needed.
Then use ONE other sheet to create a dynamic report where you can select the time frame and the member to report on. Data is pulled from the raw data sheet and aggregated as required. Pivot tables are immensely powerful.
Using a sheet for each member would be duplication of functionality and bad data design.
Edit: a few conceptual screenshots
The raw data table. New data is added at the bottom of the table. A VBA form can ensure a pleasant user interface, so that the user never sees this table.
The report could be a pivot table grouped by date. Slicers allow the selection of specific time frames, for example a month. Another slicer allows filtering by a specific member.
It took me roughly 5 minutes to create the scenario, including making up the dummy data. With a few hours to spend, this could be made really shiny.
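To make the "one raw table, one report" design concrete, here is a minimal sketch in Python. It is not Excel; it only mirrors what the pivot table with slicers does. The record fields follow the columns suggested above, while the function name `monthly_scores` and the sample data are made up.

```python
from collections import defaultdict
from datetime import date


def monthly_scores(records, member):
    """Total score per (year, month) for one member, like a filtered pivot."""
    totals = defaultdict(int)
    for rec in records:
        if rec["member"] == member:
            key = (rec["date"].year, rec["date"].month)
            totals[key] += rec["score"]
    return dict(sorted(totals.items()))


# Single raw data table: new entries are simply appended at the bottom.
raw = [
    {"date": date(2020, 1, 6), "member": "Ann", "score": 5},
    {"date": date(2020, 1, 7), "member": "Bob", "score": 3},
    {"date": date(2020, 1, 20), "member": "Ann", "score": 7},
    {"date": date(2020, 2, 3), "member": "Ann", "score": 4},
]
ann_report = monthly_scores(raw, "Ann")
print(ann_report)
```

Because every member lives in the same table, adding a person or a day means adding rows, not sheets, and any report is just a filter plus an aggregation over that one table.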
