Count the number of specific words in a list - arrays

I have a large computation to do: I have an Excel file with a column listing the unique IDs of the people who worked on each incident in our system. I would like to know the total number of interventions that have been done on all incidents. For example, let's say I have this:
ID|People working on that incident
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
0|AA0000 BB1111 CC2222 ZZ1234
1|BB1111
2|CC2222 ZZ1234 CC2222 ZZ1234
3|BB1111 CC2222 AA0000 BB1111
I have a list named List which has a zone with the list of people IDs I actually want to include. For example, let's say that the first zone of List = {"AA0000","CC2222"}.
Now, I would like to know how many interventions have been done by our employees (in List) on all the incidents I have (we have 4 in the array above). The result would be 6: 2 interventions for incident ID 0, 0 for ID 1, 2 for ID 2 and 2 for ID 3.
Assuming the data are in a different (closed) workbook, how can I calculate that using my list List and the range above A1:B4 (I would like to eventually use the whole columns, so let's say A:B)?
EDIT:
I already have something working that counts the number of times a specific word appears in a whole column.
SUM(
LEN('[myFile.xlsx]Sheet1'!$A:$A)
-LEN(
SUBSTITUTE('[myFile.xlsx]Sheet1'!$A:$A;$Z$1;"")
)
)
/LEN($Z$1)
Z1 is the word I'm looking for (example: CC2222) and '[myFile.xlsx]Sheet1'!$A:$A is the column I'm searching in.
Isn't there a really simple way to make this work with an array instead of Z1? The length is always the same (six characters plus a space).
Source: http://office.microsoft.com/en-ca/excel-help/count-the-number-of-words-in-a-cell-or-range-HA001034625.aspx

Split your source data ColumnB with Text to Columns. Unpivot the result, delete the middle column and pivot what's left.

You could do this fairly easily with a User Defined Function. The function below takes two arguments. The first is the range constituting your second column, labelled above "People working on that incident". The second is your List, which is a range consisting of a single entry for each ID you wish to count. As shown in your example, if multiple identical IDs appear in a single entry (e.g. your ID 2 has CC2222 twice), each occurrence will be counted.
To enter this User Defined Function (UDF), open the Visual Basic Editor (Alt+F11). Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=InterventionCount(B2:B5,H1:H2)
in some cell.
Option Explicit
Function InterventionCount(myRange As Range, myList As Range) As Long
    Dim RE As Object, MC As Object
    Dim vRange As Variant, vList As Variant
    Dim sPat As String
    Dim I As Long
    vRange = myRange
    vList = myList
    'Build an alternation pattern from the non-empty entries in the list
    If IsArray(vList) Then
        For I = 1 To UBound(vList)
            If Not vList(I, 1) = "" Then _
                sPat = sPat & "|" & vList(I, 1)
        Next I
    Else
        sPat = "|" & vList
    End If
    sPat = "\b(?:" & Mid(sPat, 2) & ")\b"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = True
        .Pattern = sPat
    End With
    'Count every match in every cell of the range
    For I = 1 To UBound(vRange)
        Set MC = RE.Execute(vRange(I, 1))
        InterventionCount = InterventionCount + MC.Count
    Next I
End Function
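As a cross-check on the UDF above, here is a minimal sketch of the same count done with Split and a dictionary instead of a regular expression. The names CountListedIds and DemoCountListedIds are hypothetical, the inputs are assumed to be plain one-dimensional arrays of strings rather than ranges, and Scripting.Dictionary is assumed to be available (Windows).

```vba
Option Explicit

' Hypothetical helper (not part of the answer above): counts how many
' space-separated tokens in each entry match any ID in ids.
' Both arguments are assumed to be 1-D arrays of strings.
Function CountListedIds(entries As Variant, ids As Variant) As Long
    Dim wanted As Object
    Dim tokens() As String
    Dim i As Long, j As Long

    ' Store the wanted IDs in a dictionary for fast membership checks
    Set wanted = CreateObject("Scripting.Dictionary")
    wanted.CompareMode = vbTextCompare   ' case-insensitive, like .IgnoreCase
    For i = LBound(ids) To UBound(ids)
        wanted(CStr(ids(i))) = True
    Next i

    ' Split each entry on spaces and count the tokens that are wanted
    For i = LBound(entries) To UBound(entries)
        tokens = Split(CStr(entries(i)), " ")
        For j = LBound(tokens) To UBound(tokens)
            If wanted.Exists(tokens(j)) Then CountListedIds = CountListedIds + 1
        Next j
    Next i
End Function

Sub DemoCountListedIds()
    ' Sample rows and list taken from the question
    Debug.Print CountListedIds( _
        Array("AA0000 BB1111 CC2222 ZZ1234", "BB1111", _
              "CC2222 ZZ1234 CC2222 ZZ1234", "BB1111 CC2222 AA0000 BB1111"), _
        Array("AA0000", "CC2222"))
End Sub
```

With the sample data from the question, the demo should print 6, matching the total worked out by hand in the question.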
For a non-VBA solution you could use a helper column. Again, List is a single column which contains the list of people you want to add up, one entry per cell.
If your data is in column B, add a helper column and enter this formula in row 2 of that column (for example, C2):
=SUM(N(TRIM(MID(SUBSTITUTE(B2," ",REPT(" ",99)),(COLUMN($A:$J)=1)+(COLUMN($A:$J)>1)*(COLUMN($A:$J)-1)*99,99))=(List)))
This formula must be array-entered. The $A:$J terms act as a counter allowing for up to ten items in each entry in column B. If there might be more than that, expand as needed: e.g. for up to 26 items, you would change them to $A:$Z.
Fill down as far as necessary, then SUM the column to get your total.
To array-enter a formula, after entering the formula into the cell or formula bar, hold down Ctrl+Shift while pressing Enter. If you did this correctly, Excel will place braces {...} around the formula.

I finally went for a completely different solution based on my working formula for 1 employee:
SUM(
LEN('[myFile.xlsx]Sheet1'!$A:$A)
-LEN(
SUBSTITUTE('[myFile.xlsx]Sheet1'!$A:$A;$Z$1;"")
)
)
/LEN($Z$1)
Instead of trying something more complicated, I just added a new column to my employee list where the total is evaluated for each employee (it was already needed elsewhere anyway). Then, I just have to sum up all the employees to get my total.
It is not as elegant as I would like and it feels like a workaround, but since it is the easiest solution from a programming standpoint and I need the individual data anyway, it's what I really need for now.
+1 to all the other answers for your help though.

Related

Long calculation times with XLOOKUP vs INDEX-MIN-COLUMN

I'm using this formula =IF(B24="","",IFERROR(INDEX(Sheet3!$C$3:$EE$3,,MIN(IF(Sheet3!$C$4:$EE$23=(Sheet2!C24&$K$18),COLUMN(Sheet3!$C:$EE)))-2),"NF")) to return a cell value in the top row of an array - a date in this case.
The search criterion is a combination of a unique project number and a two-character alphanumeric status code for the project. The array consists of 23 rows where combinations of the unique numbers are found, each with different status codes.
So essentially, I'm building a FILTERED project status dashboard that returns dates linked to the relevant project status.
The formula above is inspired by ( LINK ), which uses a very similar layout but with town suburbs linked to postal codes instead of project numbers and status codes. The formula works well (though not entered as an array formula), but I don't have a single formula in the sheet: I have 3 300 occurrences of it.
The problem comes in when the user changes the FILTER - Excel recalculates the entire dashboard and that takes anywhere from 2 to 5 minutes to run. You hit the escape button and cancel the calculation after setting the filter, but Excel just starts calculating again after a few seconds. After that, Excel's response is sluggish and almost unusable. Yes - our hardware is pretty weak ...
I tried XLOOKUP as well, but can't set the "lookup_array" to an array ( Sheet3!$C$4:$EE$23 ) because it doesn't match the "return_array" ( Sheet3!$C$3:$EE$3 ). Concatenating the lookup arrays with & works, but then you'd have to do that for all 23 rows, and again, multiply that by 3 300.
I thought of creating a UDF, but the function will still be called every time Excel recalculates after filtering... 3 300 calls ...
Any ideas on how to make the INDEX version run faster, or make the XLOOKUP accept the lookup_array as Sheet3!$C$4:$EE$23 in the hopes that it'll run faster?
Thank you!
Not really an elegant solution, but it works.
I imported the dataset into a helper sheet, where I combined the cell value with the corresponding value in Column A for each row ( a name in this case ) and the date from row 1 for each column, using underscore as a delimiter.
This new data range was then given a unique name, EE in this case.
On a second helper sheet, I used this formula =INDEX(Filtered,1+INT((ROW('Sheet1'!C3)-1)/COLUMNS(Filtered)),MOD(ROW('Sheet1'!C3)-1+COLUMNS(Filtered),COLUMNS(Filtered))+1) and dragged it down until it returned a #REF! error, then went back one row before the error.
This transposes all the data into a single column G. Using =UNIQUE(SORT(FILTER(B3:B3240,B3:B3240<> "",""))) then gives me a filtered list of unique values in column H, which I then split with
=IF(H3="","",LEFT(H3, SEARCH("_",H3,1)-1)) for the first data value in I,
=IF(H3="","",MID(H3, SEARCH("_",H3) + 1, SEARCH("_",H3,SEARCH("_",H3)+1) - SEARCH("_",H3) - 1)) for the middle data value in J, and
=IF(H3="","",IFERROR(TEXT(RIGHT(H3,5),"yyyy-mm-dd"),"NF")) for the last data value in K.
Then just run XLOOKUP across columns I, J and K.
It runs quickly and solves a few of the other issues I had as well.
The second data set has just over 35 000 rows - still works well and fast.
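The same flatten-once-then-look-up idea can also be sketched in VBA, for anyone who would rather build the key list in memory than on helper sheets. This is only an illustration: BuildKeyToHeaderMap and LookupHeader are hypothetical names, the inputs are assumed to be 2-D arrays read from ranges such as Sheet3!C3:EE3 (dates) and Sheet3!C4:EE23 (keys) from the question, the cells are assumed to contain no error values, and Scripting.Dictionary is assumed to be available (Windows).

```vba
Option Explicit

' Hypothetical helper: maps each non-empty key in body to the header of
' the first column in which it appears (the same column MIN(IF(...)) picks).
' headers is a 1 x n array and body an m x n array, e.g. from Range.Value.
Function BuildKeyToHeaderMap(headers As Variant, body As Variant) As Object
    Dim map As Object
    Dim r As Long, c As Long
    Dim key As String

    Set map = CreateObject("Scripting.Dictionary")
    For c = 1 To UBound(body, 2)          ' columns left to right
        For r = 1 To UBound(body, 1)
            key = CStr(body(r, c))
            If Len(key) > 0 Then
                ' Keep only the first (leftmost) column for each key
                If Not map.Exists(key) Then map(key) = headers(1, c)
            End If
        Next r
    Next c
    Set BuildKeyToHeaderMap = map
End Function

' Hypothetical helper: returns the mapped header, or "NF" when the key is absent
Function LookupHeader(map As Object, key As String) As Variant
    If map.Exists(key) Then
        LookupHeader = map(key)
    Else
        LookupHeader = "NF"
    End If
End Function
```

The map would be built once per recalculation, for example with BuildKeyToHeaderMap(Worksheets("Sheet3").Range("C3:EE3").Value, Worksheets("Sheet3").Range("C4:EE23").Value), and each of the 3 300 lookups then becomes a single dictionary access instead of a scan of the whole array.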

Populate a row on a sheet for each item on a table and several other cells on another sheet

I'm trying to copy data from a data entry/form sheet "SalesEntry" to a "SalesLog" table on a separate sheet "SalesLog". I have done this before with the following code (adapted from here):
Private Sub SaleEntry_Click()
    Dim config, itm, arr
    Dim rw As Range, listCols As ListColumns
    Dim shtForm As Worksheet
    Set shtForm = Worksheets("SalesEntry") '<< data source
    With Sheets("SalesLog").ListObjects("SalesLog")
        Set rw = .ListRows.Add.Range 'add a new row and get its Range
        Set listCols = .ListColumns 'get the columns collection
    End With
    'array of strings with pairs of "[colname]<>[range address]"
    config = Array("Fecha<>B3", "Client<>E3", "Product<>?", "Quantity<>?", "Total Sale Price<>?", "Tax Charges<>D26", "Customs Charges<>D27", " Shipping Charges<>D28", "Sale Channel<>B5", "Sale Channel ID<>E5", "Payment Channel<>B7", "Payment Status<>B9", "Amount payed<>E9")
    ' loop over each item in the config array and transfer the value to the appropriate column
    For Each itm In config
        arr = Split(itm, "<>") ' split to colname and cell address
        rw.Cells(listCols(arr(0)).Index).Value = shtForm.Range(arr(1)).Value
    Next itm
End Sub
This works great for storing the info from a few original scattered cells in the entry sheet.
However, the problem I have is that the entry sheet contains a table listing the items sold on each order, and I need to create a row in the SalesLog table for each of those items while duplicating the info from the fields on the entry sheet that are not in the items table.
Find below a screenshot of the data entry form. In blue is the items table and in red are the scattered values I'd like to paste for each item on the SalesLog table.
And this is how the resulting table should look:
I have read several articles, the documentation and some posts here, but I'm not sure about the solution. I really like the code above and how it stores data in an array and populates things easily.
So far I have 3 possible courses of action:
Merge and fill two arrays: Create an array for the scattered fields (above the items table) and an array for the items table, then merge them, using the items array as a second dimension and duplicating every scattered-field value for each item in that dimension. I know how to create the two arrays, but I'm not sure how to merge them into a two-dimensional array, and the few attempts I've made have returned an error.
Double For loop: Create a loop that appends the scattered cell values for each item in the items table, then loop that result into the SalesLog table, as per the original code above. To me this is the most feasible solution, but I'm not sure about the loop order, I don't think using a bunch of ReDim Preserve is the correct way, and at the end of some loops I get a 1004 error or nothing happens.
Use a collection?: From what I have read, collections are better suited since the items will vary with each sale, although the size of the array could simply be retrieved by counting the rows in the items table before any operation. I have never worked with collections before and honestly can't tell how to use one.
Can someone point me in the right direction?
I would also like to include a Now() timestamp for each row when the user runs the macro (clicks on the "Add sale" button).
Don't overthink this.
You want to generate a row for each item, so the easiest approach is to count the items in the table, then run a for loop and for each item in the table copy the data from the table row and the scattered fields.
This is pretty straightforward and you don't need to mess around with arrays or collections. Why make things more complicated than they need to be?
pseudo code:
myCount = Count rows in item table
for i = 1 to myCount
    copy table cell1 in table row i   ' this is from the table
    copy table cell2 in table row i   ' this is from the table
    copy cell3                        ' this is from somewhere else in the form
    ...
    copy celln
next i
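The pseudo code above could be fleshed out roughly as follows. This is a sketch under several assumptions that are not in the question: the items table is a ListObject named "Items" on the SalesEntry sheet, its columns carry the same names as the matching SalesLog columns (Product, Quantity, Total Sale Price), the SalesLog table has a "Timestamp" column, and the routine name AddSaleRows is hypothetical. Only some of the scattered fields from the question are listed; extend the array as needed.

```vba
Option Explicit

Sub AddSaleRows()
    Dim shtForm As Worksheet
    Dim logTbl As ListObject, itemsTbl As ListObject
    Dim itemRow As ListRow, newRow As Range
    Dim formFields As Variant, itemCols As Variant
    Dim part As Variant, arr As Variant
    Dim stamp As Date

    Set shtForm = Worksheets("SalesEntry")
    Set logTbl = Worksheets("SalesLog").ListObjects("SalesLog")
    Set itemsTbl = shtForm.ListObjects("Items")   ' ASSUMED table name

    ' "[colname]<>[range address]" pairs for the scattered fields
    formFields = Array("Fecha<>B3", "Client<>E3", "Tax Charges<>D26", _
                       "Customs Charges<>D27", "Sale Channel<>B5")
    ' ASSUMED: these column names exist in both tables
    itemCols = Array("Product", "Quantity", "Total Sale Price")
    stamp = Now   ' one timestamp shared by all rows of this sale

    For Each itemRow In itemsTbl.ListRows
        Set newRow = logTbl.ListRows.Add.Range   ' one log row per item
        ' Values that come from the current row of the items table
        For Each part In itemCols
            newRow.Cells(logTbl.ListColumns(CStr(part)).Index).Value = _
                itemRow.Range.Cells(itemsTbl.ListColumns(CStr(part)).Index).Value
        Next part
        ' Values that are repeated from the scattered form cells
        For Each part In formFields
            arr = Split(part, "<>")
            newRow.Cells(logTbl.ListColumns(CStr(arr(0))).Index).Value = _
                shtForm.Range(CStr(arr(1))).Value
        Next part
        newRow.Cells(logTbl.ListColumns("Timestamp").Index).Value = stamp  ' ASSUMED column
    Next itemRow
End Sub
```

Looping over ListRows avoids counting rows by hand and needs no ReDim Preserve, since each item is written to the log as soon as it is read.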

Excel - setting a dynamic print area for a range of cells covered by an array formula

I have a spreadsheet with a page (Sheet 3) which I would like to export as a PDF (using a macro to export the pdf).
Currently, this links in to another worksheet, in which a user can put in a date range to pull out relevant data from a larger worksheet. This uses the following array formula to populate the data on Sheet 3:
=IFERROR(INDEX(Sheet1!$V$3:$W$5998,SMALL(IF((Sheet1!$C$3:$C$5998>=Crynodeb!$D$3)*(Sheet1!$C$3:$C$5998<=Crynodeb!$F$3)*(Sheet1!$V$3:$V$5998<>""),ROW(Sheet1!$V$3:$W$5998)-2),ROW(18:18)),1),"")
The array formula on Sheet 3 has been applied to around 6000 rows of data. So there is potential for 6000 lines of data to be returned. However, depending on the criteria the user has put in, maybe only 5 rows of data will be returned.
In addition to this, I've applied cell formatting to the 6000 rows so that there's a print-friendly line in between the rows of data.
However, because this has been applied to the 6000 rows of data, there could be 61 or so pages exported, when in reality only a page worth of data is displaying.
Is there an easy way to continue having the array formula applied across a large range, while limiting the print function to only apply to pages containing data that is returned from the array formula?
I'm also using the Format > AutoFit Row Height function to adjust the row height in accordance to the length of the returned items, but at the moment I think I have to do this manually every time I return data. Is there a way of applying that automatically to adjust around the content of the page?
Many thanks
Since you want to export the data to PDF via macro, we can just put everything you want into that one macro.
The macro needs to find the range that has values. From your example, I'm assuming there will always be data returned starting in A3, and that there will be no blanks in Column A until the end of the data.
So, this macro starts at A3 and looks at each cell until it finds a blank value to determine where the data ends.
I'm also assuming you have a fixed number of columns of data. I can only see 2 columns so I've used that figure in the macro. You can update the code where indicated to the actual number of columns.
Then, from A1 to the last row and column of the data, it autofits the rows and then saves that range as a PDF.
Public Sub SaveCopy()
    'Long rather than Integer, so row numbers above 32,767 cannot overflow
    Dim lastRow As Long
    'Loops through column A to find last row with data
    lastRow = 3
    Do While Cells(lastRow + 1, 1).Value <> ""
        lastRow = lastRow + 1
    Loop
    'With the range of data...
    'NOTE: REPLACE '2' WITH ACTUAL NUMBER OF COLUMNS OF DATA
    With Range("A1", Cells(lastRow, 2))
        'Ensure text is wrapping - required to autofit
        .WrapText = True
        'Autofit row height
        .Rows.AutoFit
        'Save as PDF
        .ExportAsFixedFormat Type:=xlTypePDF, _
            Filename:=ThisWorkbook.FullName, _
            Quality:=xlQualityStandard, _
            IncludeDocProperties:=True, _
            IgnorePrintAreas:=False, _
            OpenAfterPublish:=True
    End With
End Sub
This is a basic export: it saves the PDF in the same location as the workbook, using the workbook's full filename (including the Excel file extension) with the .PDF extension appended.
There are various options for handling the filename and location of the PDF. With more information of what is required I could help provide a robust solution.
Alternatively, you can remove the line of code for exporting to PDF and replace it with just ".Select"
This will select the range of data and then you can manually save to PDF. Just use Save As... change the type to PDF, click Options and choose "Selection" in the "Publish what" section.
If any of my assumptions are incorrect, please let me know.
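As one possible way of handling the filename, the sketch below swaps the workbook's extension for .pdf instead of appending to it. The function name PdfPathFor is hypothetical, and it assumes the path uses Excel's own Application.PathSeparator.

```vba
Option Explicit

' Hypothetical helper: returns workbookPath with its extension replaced by
' ".pdf", or with ".pdf" appended when the file name has no extension.
Function PdfPathFor(ByVal workbookPath As String) As String
    Dim dotPos As Long, sepPos As Long

    dotPos = InStrRev(workbookPath, ".")
    sepPos = InStrRev(workbookPath, Application.PathSeparator)

    ' Only treat the dot as an extension marker if it is in the file name,
    ' not in one of the folder names
    If dotPos > sepPos Then
        PdfPathFor = Left$(workbookPath, dotPos - 1) & ".pdf"
    Else
        PdfPathFor = workbookPath & ".pdf"
    End If
End Function
```

In the macro, this would be used as Filename:=PdfPathFor(ThisWorkbook.FullName), so that a workbook saved as Report.xlsm is exported as Report.pdf in the same folder.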

VBA group by process using Public Types

I want to create a counter in a routine that will count how many times a specific entry has appeared so far.
The routine that I have created so far populates data in a spreadsheet through a For...Next loop. For each of these rows I have an extra column that will act as the counter and count how many times a characteristic of the entry row has appeared so far in the previous rows. For that, I am using the Application.WorksheetFunction.CountIf function, but the reference range has to be dynamic.
For example, I have the following table
Example Table
The overall idea is to group by month and expense type and get the summed amount. The role of the counter is to identify the rows that can be grouped together, so that I can loop through their values and sum them. The table has approximately 10,000 rows and 53 columns. For this process, I have created the following public type:
Public Type OP
    Month As String
    expense_type As String
    amount As Double
End Type
Sub NewOuput()
    With Sheet1
        For i = 1 To lastrow 'output is the existing table that I get the data from; I want to manipulate the data and then populate it into another table of the same format
            op.Month = output(i, 1)
            op.expense_type = output(i, 2)
            op.amount = output(i, 3)
            '----------------------------
            .Cells(i, 1) = op.Month 'this is the population of the data in the new table
            .Cells(i, 2) = op.expense_type
            .Cells(i, 3) = op.amount
        Next i
    End With
End Sub
Through functions, I try to identify the rows that need to be summed up and then call the respective functions in the output part of the loop.
The COUNTIF worksheet function cannot be applied to arrays, so it is now out of the question. I have read many posts on various ways of grouping, including data connections, collections and other customised approaches. Collections appeared to be the best option, but I am missing some of the background there.
Does this make any sense? Any suggestions are appreciated.
I didn't fully grasp your exact needs, but judging from the example table image I'd go as follows:
Sub NewOuput()
    With Sheet1
        'fill in the voids of 1st column
        With .Range("A1:A" & .Cells(.Rows.Count, "B").End(xlUp).Row) '<--| change "A" and "B" to your actual 1st and 2nd columns index
            .SpecialCells(xlCellTypeBlanks).FormulaR1C1 = "=R[-1]C"
            .Value = .Value
        End With
        'more code to exploit a "full" database structure
    End With
End Sub
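Once the first column is filled in, the grouping itself could be done with a dictionary keyed on month and expense type. The following is a minimal sketch, not the asker's routine: GroupTotals and DemoGroupTotals are hypothetical names, the data is assumed to be a 2-D array with month, expense type and amount in columns 1 to 3, the "|" separator is assumed not to occur in the values, Scripting.Dictionary is assumed to be available (Windows), and the sample values in the demo are made up for illustration.

```vba
Option Explicit

' Hypothetical helper: sums column 3 of a 2-D array, grouped by columns 1 and 2.
' Returns a Scripting.Dictionary keyed by "month|expense type".
Function GroupTotals(data As Variant) As Object
    Dim totals As Object
    Dim i As Long
    Dim groupKey As String

    Set totals = CreateObject("Scripting.Dictionary")
    For i = LBound(data, 1) To UBound(data, 1)
        groupKey = data(i, 1) & "|" & data(i, 2)
        ' A key that is not stored yet reads as Empty, which adds as 0
        totals(groupKey) = totals(groupKey) + data(i, 3)
    Next i
    Set GroupTotals = totals
End Function

Sub DemoGroupTotals()
    ' Made-up sample rows, for illustration only
    Dim sample(1 To 3, 1 To 3) As Variant
    Dim totals As Object, k As Variant

    sample(1, 1) = "Jan": sample(1, 2) = "Rent": sample(1, 3) = 100
    sample(2, 1) = "Jan": sample(2, 2) = "Rent": sample(2, 3) = 50
    sample(3, 1) = "Feb": sample(3, 2) = "Food": sample(3, 3) = 20

    Set totals = GroupTotals(sample)
    For Each k In totals.Keys
        Debug.Print k, totals(k)   ' one line per month/expense-type group
    Next k
End Sub
```

With real data, the array could be read in one step from the sheet (for example with Range.Value), which keeps the loop over roughly 10,000 rows in memory and avoids calling CountIf for every row.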

Excel VBA: Chart-making macro that will loop through unique name groups and create corresponding charts?

Alright, I've been racking my brain, reading up excel programming for dummies, and looking all over the place but I'm stressing over this little problem I have here. I'm completely new to vba programming, or really any programming language but I'm trying my best to get a handle on it.
The Scenario and what my goal is:
The picture below is a sample of a huge long list of data I have from different stream stations. The sample only holds two (niobrara and snake) to illustrate my problem, but in reality I have a little over 80 stations worth of data, each varying in the amount of stress periods (COLUMN B).
COLUMN A, is the station name column.
COLUMN B, stress period number
COLUMN C, modeled rate
COLUMN D, estimated rate
What I have been TRYING to figure out is how to make a macro that will loop through the station names (COLUMN A) and for each UNIQUE Group of station names, make a chart that will pop out to the right of the group, say in the COLUMN E area.
The chart is completely simple, it just needs two series scatterplot/line chart; one series with COLUMN B as x-value and COLUMN C as y-value; and the other series needs COLUMN B as x-value and COLUMN D as y-value.
Now my main ordeal, is that I don't know how to make the macro distinguish between station names, use all the data relating to that name to make the chart, then looping on to the next Station group and creating a chart that corresponds for that, and to continue looping through all 80+ station names in COLUMN A and to make the corresponding 80+ charts to the right of it all in somewhere like the COLUMN E.
If I had enough points to "bounty" this, I would in a heartbeat. But since I do not, whoever can solve my dilemma would receive my sincere gratitude for helping me work through this problem and, hopefully, better my understanding of scenarios like this in the future. If there is any more information I need to clarify to make my question more understandable, please comment and I'd be happy to explain in more detail.
Cheers.
Oh, and for extra credit: now that I think about it, I manually entered the numbers in COLUMN B. Since the loop would need to use that column as the x-value, it would be important for the macro to fill that column on its own before it makes the chart (I would imagine it would involve something as simple as counting the rows that correspond to the station name). But again, I don't know the proper terminology for matching rows to a station name, hence the pickle I'm in. If a veteran programmer savvy enough to answer this question could include it, I'd imagine such a piece of code would be simple enough, yet crucial to the success of the macro I seek.
Try this
Sub MakeCharts()
    Dim sh As Worksheet
    Dim rAllData As Range
    Dim rChartData As Range
    Dim cl As Range
    Dim rwStart As Long, rwCnt As Long
    Dim chrt As Chart
    Set sh = ActiveSheet
    With sh
        ' Get reference to all data
        Set rAllData = .Range(.[A1], .[A1].End(xlDown)).Resize(, 4)
        ' Get reference to first cell in data range
        rwStart = 1
        Set cl = rAllData.Cells(rwStart, 1)
        Do While cl <> ""
            ' cl points to first cell in a station data set
            ' Count rows in current data set
            rwCnt = Application.WorksheetFunction. _
                CountIfs(rAllData.Columns(1), cl.Value)
            ' Get reference to current data set range
            Set rChartData = rAllData.Cells(rwStart, 1).Resize(rwCnt, 4)
            With rChartData
                ' Auto fill sequence number
                .Cells(1, 2) = 1
                .Cells(2, 2) = 2
                .Cells(1, 2).Resize(2, 1).AutoFill _
                    Destination:=.Columns(2), Type:=xlFillSeries
            End With
            ' Create Chart next to data set
            Set chrt = .Shapes.AddChart(xlXYScatterLines, _
                rChartData.Width, .Range(.[A1], cl).Height).Chart
            With chrt
                .SetSourceData Source:=rChartData.Offset(0, 1).Resize(, 3)
                ' --> Set any chart properties here
                ' Add Title
                .SetElement msoElementChartTitleCenteredOverlay
                .ChartTitle.Caption = cl.Value
                ' Adjust plot size to allow for title
                .PlotArea.Height = .PlotArea.Height - .ChartTitle.Height
                .PlotArea.Top = .PlotArea.Top + .ChartTitle.Height
                ' Name series
                .SeriesCollection(1).Name = "=""Modeled"""
                .SeriesCollection(2).Name = "=""Estimated"""
                ' turn off markers
                .SeriesCollection(1).MarkerStyle = -4142
                .SeriesCollection(2).MarkerStyle = -4142
            End With
            ' Get next data set
            rwStart = rwStart + rwCnt
            Set cl = rAllData.Cells(rwStart, 1)
        Loop
    End With
End Sub
