Unique data consolidation across multiple Worksheets - database

I have six worksheets, and I want to take the unique IDs from a specific column in each and consolidate them into one master sheet (for analysis and a different data representation).
The data starts at the same cell, C17, on every sheet but ends at different rows (e.g. C180, C268). I want to be able to consolidate the unique IDs from all six sheets weekly.
Is there a solution that does not use array formulas? They SERIOUSLY cause a problem because of the sheer number of rows and the resources needed to calculate the list. VBA automation is preferred, where the cell ranges for consolidation can be dynamic and the sheet names are read from specific cells in the master sheet (which will never be deleted or altered to the extent the six others are).
So, I would run a macro which will consolidate all the data based off either a named range or specific cells with the sheet names & ranges in them (using indirect to use those strings) and paste that into a new range.
UDFs would be acceptable as well; I just do not want Excel to "freeze" while doing calculations.
BTW, I did read Getting unique values in Excel by using formulas only but those solutions only work if the data is on the same sheet or under very specific conditions. Also the array formulas would not work efficiently since my data is literally thousands of rows long.
Edit:
Here's a test macro I used to get data from one sheet. The problem is that I can't make the range dynamic or add the IDs from the other sheets, because I can't find the first blank cell after the IDs already copied into the destination range.
Sub ConsolidateDATA()
'yStr = Evaluate("=ADDRESS(MIN(IF($C$10:$C$9999 = "", ROW($C$10:$C$9999))), 3, 1, 1)")
'Attempted dynamic range copy ^ - failed
yStr = "C10"
Range("Sheet1!$B$5:$B$29").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:="", CopyToRange:=Range(yStr), Unique:=True
End Sub
I have also had successful attempts with array formulas, but unfortunately they are so resource-intensive that they are REALLY bad solutions.
-- Array formula to combine lists into 1 master
=IFERROR(INDEX(INDIRECT($B$6, TRUE), ROWS(B$13:$B14)), IFERROR(INDEX(INDIRECT($B$7, TRUE), ROWS(B$13:$B14) - ROWS(INDIRECT($B$6, TRUE))), IFERROR(INDEX(INDIRECT($B$8, TRUE), ROWS(B$13:$B14) - ROWS(INDIRECT($B$6, TRUE)) - ROWS(INDIRECT($B$7))), IFERROR(INDEX(INDIRECT($B$9, TRUE), ROWS(B$13:$B14) - ROWS(INDIRECT($B$6, TRUE)) - ROWS(INDIRECT($B$7)) - ROWS(INDIRECT($B$8))), IFERROR(INDEX(INDIRECT($B$10, TRUE), ROWS(B$13:$B14) - ROWS(INDIRECT($B$6, TRUE)) - ROWS(INDIRECT($B$7)) - ROWS(INDIRECT($B$8)) - ROWS(INDIRECT($B$9, TRUE))), IFERROR(INDEX(INDIRECT($B$11, TRUE), ROWS(B$13:$B14) - ROWS(INDIRECT($B$6, TRUE)) - ROWS(INDIRECT($B$7)) - ROWS(INDIRECT($B$8)) - ROWS(INDIRECT($B$9, TRUE)) - ROWS(INDIRECT($B$10, TRUE))),IFERROR(INDEX(INDIRECT($B$12, TRUE), ROWS(B$13:$B14) - ROWS(INDIRECT($B$6, TRUE)) - ROWS(INDIRECT($B$7)) - ROWS(INDIRECT($B$8)) - ROWS(INDIRECT($B$9, TRUE)) - ROWS(INDIRECT($B$10, TRUE)) - ROWS(INDIRECT($B$11, TRUE))),"")))))))
-- Array formula to get just unique data
=INDEX(TotalNameListRangeFromFormulaAbove, MATCH(0, COUNTIF($D$16:D16, TotalNameListRangeFromFormulaAbove), 0))

I think a combination of loops and collections might solve your problem :)
http://excelmacromastery.com/Blog/index.php/the-complete-guide-to-collections-in-excel-vba/
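' Rough sketch only: the sheet names below are placeholders for your six sheets
Dim dict As Object, ws As Variant, cell As Range
Set dict = CreateObject("Scripting.Dictionary")
For Each ws In Array("Sheet1", "Sheet2")
    With Worksheets(ws)
        ' Column C from C17 down to the last used row on this sheet
        For Each cell In .Range("C17", .Cells(.Rows.Count, "C").End(xlUp))
            ' A dictionary key can only exist once, so duplicates are skipped
            If Len(cell.Value) > 0 And Not dict.Exists(cell.Value) Then dict.Add cell.Value, cell.Value
        Next cell
    End With
Next ws
' dict.Keys now holds the unique ids from all listed sheets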
I believe this is good enough to get you started on the correct path. I ended up using a dictionary instead of a collection but you can change that if you'd like. Only minor differences on declarations and adding but essentially the same (anecdotally speaking, there are a couple of huge differences, not that I think it matters here). Give me some time and I'll return with something more polished/finished than just the "basic idea of how it might work."
Here is a link on dictionaries from the same author (I really love the way he elaborates on things):
http://excelmacromastery.com/Blog/index.php/vba-dictionary/


Matching and replacing a selection of data from two different dataframes

(First time posting, so please bear with me.) I have two different dataframes, one of which contains a column of replacement data for a selection of data within the first dataframe.
#dataframe 1
df<-data.frame(site= rep(1:4,3), landings = rep("val",12),
harbour = c("a","b","c","d","e","f","g","h","i","j","k","l"))
#dataframe 2
new_site4<-data.frame(harbour = c("a","b","c","d","e","f","g","h","i","j","k","l"),
sub_site = c("x","x","y","x","y","y","y","x","y","x","y","y") )
I want to replace the "site" in dataframe 1 with the "sub_site" in dataframe 2 based on a match of "harbour"; however, I only need to do it for records with site "4".
Is there a neat way to select only site 4 and then replace the site number with the sub_site, ideally without merging or creating a whole new dataframe? My real dataset is large, but the key is small, as it only refers to the small selection of the data that needs the sub_site added.
I tried using match() on my main dataset, but for some reason it only matched some of the required data, not all of it. The code below won't work on my sample data either.
#df$site[match(df$harbour, new_site4$harbour)] <- new_site4$sub_site[match(df$harbour, df$harbour)]

Filter Array For IDs Existing in Another Array with Ruby on Rails/Mongo

I need to compare the 2 arrays declared here to return records that exist only in the filtered_apps array. I am using the contents of previous_apps array to see if an ID in the record exists in filtered_apps array. I will be outputting the results to a CSV and displaying records that exist in both arrays to the console.
My question is this: How do I get the records that only exist in filtered_apps? Easiest for me would be to put those unique records into a new array to work with on the csv.
start_date = Date.parse("2022-02-05")
end_date = Date.parse("2022-05-17")
valid_year = start_date.year
dupe_apps = []
uniq_apps = []
# Finding applications that meet my criteria:
filtered_apps = FinancialAssistance::Application.where(
:is_requesting_info_in_mail => true,
:aasm_state => "determined",
:submitted_at => {
"$exists" => true,
"$gte" => start_date,
"$lte" => end_date })
# Finding applications that I want to compare against filtered_apps
previous_apps = FinancialAssistance::Application.where(
is_requesting_info_in_mail: true,
:submitted_at => {
"$exists" => true,
"$gte" => valid_year })
# I'm using this to pull the ID that I'm using for comparison just to make the comparison lighter by only storing the family_id
previous_apps.each do |y|
previous_apps_array << y.family_id
end
# This is where I'm doing my comparison and it is not working.
filtered_apps.each do |app|
if app.family_id.in?(previous_apps_array) == false
then #non_dupe_apps << app
else "No duplicate found for application #{app.hbx_id}"
end
end
end
So what am I doing wrong in the last code section?
Let's check your original method first (I fixed the indentation to make it clearer). There are quite a few issues with it:
filtered_apps.each do |app|
if app.family_id.in?(previous_apps_array) == false
# Where is "#non_dupe_apps" declared? It isn't anywhere in your example...
# Also, "then" is not necessary unless you want a one-line if-statement
then #non_dupe_apps << app
# This doesn't do anything, it's just a string
# You need to use "p" or "puts" to output something to the console
# Note that the "else" is also only triggered when duplicates WERE found...
else "No duplicate found for application #{app.hbx_id}"
end # Extra "end" here, this will mess things up
end
end
Also, you haven't declared previous_apps_array anywhere in your example, you just start adding to it out of nowhere.
Getting the difference between 2 arrays is dead easy in Ruby: just use the - operator!
uniq_apps = filtered_apps - previous_apps
You can also do this with ActiveRecord results, since they are just arrays of ActiveRecord objects. However, this doesn't help if you specifically need to compare results using the family_id column.
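When the comparison has to go through family_id rather than whole records, one possible approach in plain Ruby is sketched below. The App struct and the sample records are hypothetical stand-ins for the real application objects, used only so the snippet runs on its own.

```ruby
require "set"

# Hypothetical stand-in for an application record; only the fields used here.
App = Struct.new(:id, :family_id)

filtered_apps = [App.new(1, "fam_a"), App.new(2, "fam_b"), App.new(3, "fam_c")]
previous_apps = [App.new(10, "fam_a"), App.new(11, "fam_c")]

# Collect the comparison keys once; a Set avoids rescanning an array per record.
previous_family_ids = previous_apps.map(&:family_id).to_set

# Keep only the applications whose family_id never appears in previous_apps.
uniq_apps = filtered_apps.reject { |app| previous_family_ids.include?(app.family_id) }
```

With the sample data above, only the application for "fam_b" remains in uniq_apps.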
TIP: Getting the values of only a specific column/columns from your database is probably best done with the pluck or select method if you don't need to store any other data about those objects. With pluck, you only get an array of values in the result, not the full objects. select works a bit differently and returns ActiveRecord objects, but filters out everything but the selected columns. select is usually better in nested queries, since it doesn't trigger a separate query when used as a part of another query, while pluck always triggers one.
# Querying straight from the database
# This is what I would recommend, but it doesn't print the values of duplicates
uniq_apps = filtered_apps.where.not(family_id: previous_apps.select(:family_id))
I highly recommend getting really familiar with at least filter/select, and map out of the basic array methods. They make things like this way easier. The Ruby docs are a great place to learn about them and others. A very simple example of doing a similar thing to what you explained in your question with filter/select on 2 arrays would be something like this:
arr = [1, 2, 3]
full_arr = [1, 2, 3, 4, 5]
unique_numbers = full_arr.filter do |num|
if arr.include?(num)
puts "Duplicates were found for #{num}"
false
else
true
end
end
# Duplicates were found for 1
# Duplicates were found for 2
# Duplicates were found for 3
=> [4, 5]
NOTE: The OP is working with Ruby 2.5.9, where filter is not yet available as an array method (it was introduced in Ruby 2.6). However, filter is just an alias for select, which can be found on earlier versions of Ruby, so they can be used interchangeably. Personally, I prefer using filter because, as seen above, select is already used in other methods, and filter is also the more common term in other programming languages I usually work with. Of course, when both are available, it doesn't really matter which one you use, as long as you keep it consistent.
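For older Rubies, the same example can be written with select. As a small sketch (not part of the original answer), partition is another built-in that may be handy here, since it returns the duplicates and the unique values in one pass:

```ruby
arr = [1, 2, 3]
full_arr = [1, 2, 3, 4, 5]

# select is available on older Rubies where filter is not.
unique_numbers = full_arr.select { |num| !arr.include?(num) }

# partition returns two arrays: elements for which the block is true, then the rest.
duplicates, uniques = full_arr.partition { |num| arr.include?(num) }
```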
EDIT: My last answer did, in fact, not work.
Here is the code all nice and working.
It turns out the issue was that when comparing family_id from the set of records I forgot that the looped record was a part of the set, so it would return it, too. I added a check for the ID of the array to match the looped record and bob's your uncle.
I added the pass and reject arrays so I could check my work instead of downloading a csv every time. Leaving them in mostly because I'm scared to change anything else.
start_date = Date.parse(date_from)
end_date = Date.parse(date_to)
valid_year = start_date.year
date_range = (start_date)..(end_date)
comparison_apps = FinancialAssistance::Application.by_year(start_date.year).where(
aasm_state:'determined',
is_requesting_voter_registration_application_in_mail:true)
apps = FinancialAssistance::Application.where(
:is_requesting_voter_registration_application_in_mail => true,
:submitted_at => date_range).uniq{ |n| n.family_id}
#pass_array = []
#reject_array = []
apps.each do |app|
family = app.family
app_id = app.id
previous_apps = comparison_apps.where(family_id:family.id,:id.ne => app.id)
if previous_apps.count > 0
#reject_array << app
puts "\e[32mApplicant hbx id \e[31m#{app.primary_applicant.person_hbx_id}\e[32m in family ID \e[31m#{family.id}\e[32m has registered to vote in a previous application.\e[0m"
else
<csv fields here>
csv << [csv fields here]
end
end
Basically, I pulled the applications into the apps array, then de-duplicated them by the family_id field of each record.
I had to do this because the root issue was that apps itself contained duplicate records, submitted only a few days apart. Since I had assumed that the initial apps array would be all unique, I thought the duplicates in the output were due to the rest of the code not filtering correctly.
I then loop through the de-duplicated array with apps.each do, and for each app I query comparison_apps for other applications from the same family, storing the result in previous_apps inside the loop. Since previous_apps is reassigned on each iteration, if it ever has more than 0 records in it, the app gets called out as having been submitted already. Otherwise, it goes to my csv report.
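The two steps described here (one record per family, then a lookup that excludes the record itself) can be illustrated without a database. This is only a sketch in plain Ruby: AppRecord and the sample data are hypothetical, and the real code runs the lookup as a query instead of scanning an array.

```ruby
# Hypothetical stand-in for an application record.
AppRecord = Struct.new(:id, :family_id)

comparison_apps = [AppRecord.new(1, "fam_a"), AppRecord.new(2, "fam_a"), AppRecord.new(3, "fam_b")]
apps = [AppRecord.new(2, "fam_a"), AppRecord.new(3, "fam_b"), AppRecord.new(4, "fam_b")]

# Step 1: keep one application per family (the first one encountered wins).
apps_by_family = apps.uniq { |app| app.family_id }

# Step 2: an app passes only if no OTHER record of the same family exists;
# the id check stops a record from matching itself.
passed, rejected = apps_by_family.partition do |app|
  comparison_apps.none? { |other| other.family_id == app.family_id && other.id != app.id }
end
```

In this sample, application 2 is rejected because application 1 belongs to the same family, while application 3 passes because its only match in comparison_apps is itself.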
Thanks for the help on this, it really got my brain thinking in another direction that I needed to. It also helped improve the code even though the issue was at the very beginning.

Populate a Multi-column Combobox with a 2D array on Access

I tried to follow this method:
ComboBox1.ColumnCount = 2
Dim Films(1 To 5, 1 To 2) As String
Dim i As Integer, j As Integer
Films(1, 1) = "Lord of the Rings"
Films(2, 1) = "Speed"
Films(3, 1) = "Star Wars"
Films(4, 1) = "The Godfather"
Films(5, 1) = "Pulp Fiction"
Films(1, 2) = "Adventure"
Films(2, 2) = "Action"
Films(3, 2) = "Sci-Fi"
Films(4, 2) = "Crime"
Films(5, 2) = "Drama"
ComboBox1.List = Films
source
But the .List property does not work on Access. Any ideas ?
As June7 said, use the ComboBox.AddItem() method in a loop. For your purposes, the ComboBox must not be bound to a data source: its Row Source Type property should be set to "Value List". To add a multi-column string to a ComboBox row, use a semicolon to delimit the columns. For example:
ComboBox1.AddItem (Films(1, 1) & ";" & Films(1, 2))
or
Dim rowStr As String
rowStr = Films(1, 1) & ";" & Films(1, 2)
ComboBox1.AddItem (rowStr)
AddItem() automatically appends the row to the end of the ComboBox's list, if you do not specify a row index parameter. For more info, see ComboBox.AddItem method at Office Dev Center.
Screenshot: VBA Demonstration Image
A "Form" in Access is not the same kind of element/object as a "UserForm" is in Excel where your "source" link points to (https://www.excel-easy.com/vba/examples/multicolumn-combo-box.html).
In Access it would be a good idea to get the information into your Combo Box (or List Box) from either a table or a query. You can of course code it with VBA, but then you might find yourself adding/editing a whole lot of VBA here and there, whereas in Access it all goes more naturally by using SQL and the database engine.
This is a larger topic, but basically you should probably have different tables for "Films" and for "Categories"
Table1:
Table2:
Then you should define the relationships, since most likely there are more films in your database than there are categories. Say you have to add another movie, for example "Die Hard", to your movie list. It would probably fall into the category "Action". In the database we do not want to repeat ourselves, so instead of typing the category again, the film simply refers to the category by the value of its categoryID.
So, having done that you need to create a form in Access. Create maybe a query that will get the values for you:
After this you can define the source for the combo e.g. by using wizard:
So this way you can maintain each of the lists separately in their own tables.
Here is the query that got created:
On the Data tab you can decide which bound column to use relative to datasource.
On the Format tab you can adjust the widths of the columns in your combobox. Use 0 length to hide a column.
This way no VBA code is needed.
If needed it is also possible to create or edit the queries with VBA but that is another story.
Hope this helps.

VBA to paste only certain values of cell from one sheet to another

Can someone help me with the code below? What I am looking for is to copy the values of certain cells from sheet "Form", listed in 2 arrays, to sheet "Tracker".
The 1st array should be copied to sheet "Tracker" from C3 onward, and the second array from the next cell after the 1st array ends, say EF3 onward.
Right now, however, the first set is pasted starting at A3 and the second at A4. Please let me know if you have any questions.
Following is the code which I am using now:
Sub AddEntry()
Dim LR As Long, i As Long, cls
Dim LR2 As Long, j As Long, cls2
cls = Array("C2", "C3", "G2", "G3", "C5", "C6", "C7", "C8", "C9", "C10", "C11", "C12", "C13", "A17", "C17", "D17", "F17", "G17", "H17", "A18", "C18", "D18", "F18", "G18", "H18", "A19", "C19", "D19", "F19", "G19", "H19", "A20", "C20", "D20", "F20", "G20", "H20", "A21", "C21", "D21", "F21", "G21", "H21", "A25", "B25", "C25", "D25", "E25", "F25", "G25", "H25", "A26", "B26", "C26", "D26", "E26", "F26", "G26", "H26", "A27", "B27", "C27", "D27", "E27", "F27", "G27", "H27", "A28", "B28", "C28", "D28", "E28", "F28", "G28", "H28", "A32", "C32", "E32", "G32", "H32", "A33", "C33", "E33", "G33", "H33", "A34", "C34", "E34", "G34", "H34", "A35", "C35", "E35", "G35", "H35", "A39", "D39", "F39", "A40", "D40", "F40", "A41", "D41", "F41", "A45", "C45", "E45", "G45", "A46", "C46", "E46", "G46", "A47", "C47", "E47", "G47", "D51", "D52", "D53", "D54", "D55", "D56", "D57", "D58", "D59", "D60", "D61", "D62", "D63", "D64", "D65", "D66", "D67")
With Sheets("Tracker")
LR = WorksheetFunction.Max(3, .Range("C" & Rows.Count).End(xlUp).Row + 1)
For i = LBound(cls) To UBound(cls)
.Cells(LR, i + 1).Value = Sheets("Form").Range(cls(i)).Value
Next i
End With
cls2 = Array("E51", "E52", "E53", "E54", "E55", "E56", "E57", "E58", "G59", "E60", "E61", "E62", "G63", "E64", "E65", "E66", "E67", "C70", "D70", "E70", "F70", "G70", "H70", "C71", "E71", "G71", "C72", "E72", "G72", "C73", "E73", "G73", "C74", "E74", "G74", "C75", "E75", "G75", "C76", "E76", "G76", "C77", "E77", "G77", "C78", "E78", "G78", "C79", "E79", "G79", "C82", "D82", "E82", "F82", "G82", "H82", "C83", "E83", "G83", "C84", "E84", "G84", "B88", "B89", "B90", "B91", "C88", "C89", "C90", "C91", "D88", "D89", "D90", "D91", "E88", "E89", "E90", "E91", "F88", "F89", "F90", "F91", "G88", "G89", "G90", "G91", "H88", "H89", "H90", "H91")
With Sheets("Tracker")
LR2 = WorksheetFunction.Max(3, .Range("EW" & Rows.Count).End(xlUp).Row + 1)
For j = LBound(cls2) To UBound(cls2)
.Cells(LR, j + 1).Value = Sheets("Form").Range(cls2(j)).Value
Next j
End With
End Sub
Assuming that you want to start cell entries in sheet "Tracker" more to the right, you can add the column number instead of +1 (= column A) and write as follows:
Array 1: assigning cell values starting from column C
.Cells(LR, i + [C1].Column).Value = Sheets("Form").Range(cls(i)).Value
Array 2: assigning cell values starting from column EF
' should be LR2 instead of LR :-)
.Cells(LR2, j + [EF1].Column).Value = Sheets("Form").Range(cls2(j)).Value
Note
[C1].Column returns the column number (in any worksheet), e.g. column C counts as 3.
I took a look at your file; the first thing I did was flip through the VBA and try to compile it, which, incidentally, I would recommend to anyone as a first step with a downloaded XLSM. (I haven't seen a malicious macro yet and I'd like to keep it that way!)
I can see that this file has been a "work in progress" because there are bits of code here and there that don't compile properly, such as Me statements pointing to a missing userform, and references to mis-named worksheets such as Form (View) instead of View_Form.
Ideally, this project should be moved from Excel to Access. Excel can be used for filling forms and storing data, but if this is potentially going to be sizable, you're best off using "the right tool for the job". Duplicating your form(s) into Access forms instantly removes the need to copy certain cells to certain sheets, not to mention ease of validation, reporting, security, and unlimited room for expansion, plus ease of moving data between Excel, Access, Outlook, etc.
(You even called the spreadsheet a database in one spot!) If your concern is that you're unfamiliar with Access, if you designed this workbook, migration to Access will be a breeze once you figure out the basics of table and form design.
Even Outlook has some pretty nifty form capabilities which can autopopulate the data table when an emailed form is received.
If you need to stay in Excel, how about a User Form instead of the sheet-based form? I too often see people forgetting about Office's built-in features and starting from scratch. That being said, I've been a user of MS Office for 25 years and have never used an Excel User Form. When I think "form", I think MS Access.
Another option, if you want to stay with the worksheet-based form: instead of listing all the cells in the array, a minor redesign could make it simpler. One way would be to have a hidden row on the form tab so you have a single uninterrupted line of all the data you need to store. For example, you could hide rows 1 and 2, make row 1 the headings (Sourced, Processed, Year, Address, etc.), and use row 2 as an "interim" place to store the data, so the A2 formula is =C2, B2 is =C3, B3 is =C5, etc.
Finally another sneaky option could be to add hidden comments in each cell that has data that needs to be saved, and then when the form is complete, loop through all the cells looking for comments, and each comment would contain a title or cell reference indicating where that cell's data needs to go.
The destination should be a very straightforward table. Use as many columns as you need, but it's not a place for formatting or formulas. (Think database!)
For example, C2 (Sourced By) could have a hidden comment like "Tracker:C" then when the form is filled, you could parse the comments and move the data dynamically (instead of hardcoding 250 cell addresses!) with something like:
Option Explicit
Sub moveData() 'untested; example only
Dim cell As Range, nextBlankRow As Long 'Long avoids overflow past row 32,767
Dim comm As String, sht As String, col As String
nextBlankRow = 5 'calculate this somehow
'loop through cells with comments
For Each cell In ActiveSheet.Cells.SpecialCells(xlCellTypeComments)
If cell.Comment.Text <> "" Then
'get comment
comm = cell.Comment.Text
'extract location for data like "Sheetname:Columnletter"
sht = Left(comm, InStr(comm, ":") - 1)
col = Right(comm, Len(comm) - InStr(comm, ":"))
'populate correct location with data
Sheets(sht).Range(col & nextBlankRow).Value = cell.Value
End If
Next cell
End Sub
As with anything in Excel (or Office in General) there are a dozen ways you could accomplish the same task. Opt for the ones that don't involve repeating the same code over and over, nor hardcoded data. Planning for future (unexpected) growth is very important, as is debugging as-you-go, which is my last suggestion:
Option Explicit
at the top of every module, and compile often (Alt+D, then L), removing or commenting out unused code.
Bottom line, best bet: Access, Excel, and Outlook all have form capabilities built in. Use a form for a form and you'll save yourself a headache now and later.
Hopefully this gives you some ideas.
Good Luck!

Moving a set of names from Excel to VBA in an array, then using it as a wildcard filter parameter

I have a list in Excel. This list is ever-growing, ever-changing, and is currently defined with a
ListbyName =offset(A1,0,0,counta(A1:A200),1)
Ok, great. Now, I get reports where I need to pull only the elements from the list to a second report. Simply dropping all of the data then filtering isn't workable, since it can only be the "essential" data that moves over. Ok, fine:
Dim listbyName As Variant
Set listbyName = Range("ListbyName")
Set CSTab = CSReport.Sheets(1)
LastRowCS = CSTab.Range("A" & Rows.Count).End(xlUp).Row
CSTab.Range("$A$1:$W$100").AutoFilter Field:=4, Criteria1:=listbyNumber, Operator:=xlFilterValues
Set RngCS = CSTab.Range("A1:W" & LastRowCS).SpecialCells(xlCellTypeVisible)
RngCS = RngCS.SpecialCells(xlCellTypeVisible)
Set CabReport.Sheets("CS").Range("C1").Resize(RngCS.Rows.Count).Value = RngCS.Value
Now, where I'm struggling for a bit is the next step - I need to wildcard each entry in the array when filtering for it. So instead of the data being "Apples, Oranges, Potatoes", and I'm looking to get just the apples and oranges, it's "This product contains Apples", "May have Oranges in it" "Apple", etc. How can I put wildcards around each entry in the array, so that when it filters for that entry, it finds everything?
A second example would be, in ugly brute-force manner, of what I'm trying to accomplish:
ActiveSheet.Range("$A$1:$W$71").AutoFilter Field:=1, Criteria1:="=*Apples*", Criteria2:="=*Oranges*", Operator:=xlFilterValues
But flexible enough to accommodate additional add-ons.
How can I get wildcards in my filters? And is my method of grabbing data, filtering it, and pasting it workable, or have I made a fundamental mistake somewhere along the line?
Edit: It's also unhappy with my filter method, error 1004.
