I've passed arrays back and forth from spreadsheets to VBA functions many times. I recently "upgraded" to Excel 365, and now I can't get it to work. For example:
Public Function test(x As Range)
Dim y()
ReDim y(3)
y(1) = 1
y(2) = 2
y(3) = 3
test = y
End Function
Then I highlight three cells, for example B1:B3, and in the top cell I enter =test(A1:A2) and hit Ctrl+Shift+Enter. This fills the range with an array formula that is supposed to receive y() from the test function.
However, the cells that reference the function are all zeroes. I put in debugging lines and I can tell the function is running as intended. It's just not passing the array to the spreadsheet.
What's up with that? Has anyone else had this experience?
#RDHS, #tim-williams and #garys-student - thank you for your spot-on answers. And Gary's Student - thanks for the incredibly quick response. I'd vote everyone up but I can't 'cuz I'm a noob.
But... for completeness' sake -- your answers raise another question (of a more theoretical kind): I SHOULD be able to coerce a one-dimensional array into a column range directly, and vice versa.
Obviously it's easy enough to check the shape of the range and transform it accordingly (well, it's easy now that you've shown me how!). But it's so sloppy:
using the above example, instead of just writing
test = y
I need to write:
If x.Rows.Count = 1 Then
    test = y
Else
    test = WorksheetFunction.Transpose(y)
End If
I don't know about you but I'd take Door # 1 (test=y). The other way is SOOOO sloppy.
But MS is holding out on us - Excel doesn't force you to do those gymnastics when using built-in spreadsheet array functions like INDEX, MATCH, etc. INDEX(C1:C10,3) and INDEX(A3:K3,3) both return the value in C3, which is the third ITEM in each ARRAY. INDEX is smart enough to figure out which is the third item. Surely if you can do it on a worksheet, there must be a way to do it in VBA?
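For what it's worth, here is a rough sketch (mine, not from any of the answers) of one way to get that orientation-agnostic behavior in VBA: check the shape of the calling range via Application.Caller and transpose only when the target is vertical. The name testAuto is just a placeholder.
Public Function testAuto() As Variant
    Dim y(1 To 3) As Variant
    y(1) = 1
    y(2) = 2
    y(3) = 3
    ' When entered on a worksheet, Application.Caller is the range holding the formula
    If TypeName(Application.Caller) = "Range" Then
        If Application.Caller.Rows.Count > 1 Then
            testAuto = WorksheetFunction.Transpose(y) ' vertical target: return a column
            Exit Function
        End If
    End If
    testAuto = y ' horizontal target (or non-range caller): return a row
End Function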
My favorite Comp. Sci. professor - one of the founders of the field of computer science - used to say, "A programming language is low level when its programs require attention to the irrelevant."
He actually made a lot of insightful observations, which he distributed over the ARPANET, making him one of the world's first bloggers (Google Alan Perlis). For twenty years, every programmer had a list of Perlisisms taped above his VT100 -- like:
"In computing, turning the obvious into the useful is a living definition of the word 'frustration'";
"One man's constant is another man's variable";
"Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it."
I bring him up because the desire to produce "clean" code goes way back to the first coders on the first computers. And I was very fond of him.
Give this a try:
Public Function test(x As Range)
Dim y()
ReDim y(0 To 2) ' three elements, indexed 0 to 2
y(0) = 1
y(1) = 2
y(2) = 3
test = WorksheetFunction.Transpose(y)
End Function
We have a dataset of 1222x20 in Stata.
There are 611 individuals, so each individual occupies 2 rows of the dataset. From the second row of each individual there is only one variable of interest that we would like to use.
This means we want a 611x21 dataset for our analysis.
It might also help if we could split off the odd (or even) rows and merge them back in later.
However, my Stata skills let me down at this point and I hope someone can help us.
Maybe someone knows a command or menu option that we could try.
For reference, the individuals are identified by the variable rescode, and the variable of interest on the second row is called enterprise.
Below, the head of our dataset is given. There is a binary time variable followup: we want to regress enterprise (yes/no) at followup = Followup, as the dependent variable, on enterprise at followup = Baseline as an independent variable.
We have tried something like this:
reg enterprise(if followup="Folowup") i.aimag group loan_baseline eduvoc edusec age16 under16 marr_cohab age age_sq buddhist hahl sep_f nov_f enterprise(if followup ="Baseline"), vce(cluster soum)
followup is a numeric variable with value labels, as its colouring in the Data Editor makes clear, so you can't test its values directly for equality or inequality with literal strings. (And even if you could, the match would need to be exact: Folowup would not be read as implying Followup.)
There is a syntax for identifying observations by value labels: see [U] 13.11 in the pdf documentation or https://www.stata-journal.com/article.html?article=dm0009.
However, it is usually easiest just to use the numeric value underneath the value label. So if the variable followup had numeric values 0 and 1, you would test for equality with 0 or 1.
You must use == not = for testing for equality here:
... if followup == 1
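As a minimal sketch (mine, not part of the original answer), assuming the value label attached to followup is also named followup and that Baseline and Followup are coded 0 and 1:
label list followup                                         // confirm the numeric codes behind the labels
list rescode enterprise if followup == 1                    // test against the numeric value
list rescode enterprise if followup == "Followup":followup  // or look the value up via the value label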
For any future Stata questions, please see the Stata tag wiki for detailed advice on how to present data. Screenshots are usually difficult to read and impossible to copy, and leave many details about the data obscure.
I am wondering why indexing Julia's DataArrays with NA values is not possible.
Executing the snippet below results in an error (NAException("cannot index an array with a DataArray containing NA values")):
dm = data([1 4 7; 2 5 8; 3 1 9])
dm[dm .== 5] = NA
dm[dm .< 3] = 1 #Error
dm[(!isna(dm)) & (dm .< 3)] = 1 #Working
There is a solution for ignoring NAs in a DataFrame with isna(), as answered here. At first glance it works as it should, and ignoring NAs in DataFrames is the same approach as for DataArrays, because each column of a DataFrame is a DataArray, as stated here. But in my opinion, ignoring missing values with !isna() in every condition is not the best solution.
It is not clear to me why the DataArrays module throws an error if NAs are included. If the boolean array used for indexing contains NA values, those values should be converted to false, as MATLAB® or Python's Pandas does. In the DataArrays module's source code (shown below, from indexing.jl), there is an explicit function that throws the NAException:
# Indexing with NA throws an error
function Base.to_index(A::DataArray)
    any(A.na) && throw(NAException("cannot index an array with a DataArray containing NA values"))
    Base.to_index(A.data)
end
If you change the snippet by first setting the NAs to false ...
# NA entries are set to false first, so indexing with NA no longer throws
function Base.to_index(A::DataArray)
    A[A.na] = false   # assigning clears the NA flags, so the check below never fires
    any(A.na) && throw(NAException("cannot index an array with a DataArray containing NA values"))
    Base.to_index(A.data)
end
... dm[dm .< 3] = 1 works as it should (as in MATLAB® or Pandas).
To me it makes no sense to automatically throw an error when NAs are present during indexing. There should at least be a parameter when creating the DataArray that lets the user choose whether NAs are ignored. There are two significant reasons: on the one hand, it is not very pleasant to write and read code when you have formulas with a lot of indexing and NA values (e.g. calculating meteorological grid models), and on the other hand there is a noticeable loss of performance, as this timing test shows:
@timeit dm[(!isna(dm)) & (dm .< 3)] = 1   # 14.55 µs per loop
@timeit dm[dm .< 3] = 1                   # 754.79 ns per loop
What is the reason the developers use this exception, and is there a simpler approach than !isna() for ignoring NAs in DataArrays?
Suppose you have three rabbits. You want to put the female rabbit(s) in a separate cage from the males. You look at the first rabbit, and it looks like a male, so you leave it where it is. You look at the second rabbit, and it looks like a female, so you move it to the separate cage. You can't really get a good look at the third rabbit. What should you do?
It depends. Maybe you're fine with leaving the rabbit of unknown sex behind. But if you're separating out the rabbits because you don't want them to make baby rabbits, then you might want your analysis software to tell you that it doesn't know the sex of the third rabbit.
Situations like this arise often when analyzing data. In the most pathological cases, data is missing systematically rather than at random. If you were to survey a bunch of people about how fluffy rabbits are and whether they should be eaten more, you could compare mean(fluffiness[should_be_eaten_more]) and mean(fluffiness[!should_be_eaten_more]). But, if people who really like rabbits are incensed that you're talking about eating them at all, they might leave that second question blank. If you ignore that, you will underestimate the mean fluffiness rating among people who don't think rabbits should be eaten more, which would be a grave mistake. This is why fluffiness[!should_be_eaten_more] will throw an error if there are missing values: It is a sign that whatever you are trying to do with your data may not give the right results. This situation is bad enough that people write entire papers about it, e.g. this one.
Enough about rabbits. It is possible that there should be (and may someday be) a more concise way to drop/keep all missing values when indexing, but it will always be explicit rather than implicit for the reason described above. As far as performance goes, while there is a slowdown for !isna(x) & (x .< 3) vs x .< 3, the overhead of repeatedly indexing into an array is also high, and DataArrays adds additional overhead on top of that. The relative overhead decreases as the array gets larger. If this is a bottleneck in your code, your best bet is to write it differently.
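For instance, here is a rough sketch (mine, not the answerer's) of "writing it differently" for the snippet in the question: loop once over the elements instead of building boolean masks. It assumes the same DataArrays API as above.
for i in 1:length(dm)
    if !isna(dm[i]) && dm[i] < 3   # skip NAs explicitly, no mask allocation
        dm[i] = 1
    end
end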
I have a problem concerning efficiency and algorithms when it comes to finding the difference between two very large arrays. I'm hoping someone with a good understanding of algorithms can point me in the right direction on how to solve this, as my current implementations are taking an extremely long time.
Problem:
I have two very large arrays. One contains a list of emails that have invalid domain names and the other is a mixed list that I need to check against the first array.
accounts_with_failed_email_domains = [279,000 records in here]
unchecked_account_domains = [149,000 records in here]
What I need to do is go through the list of unchecked_account_domains and then compare each entry to see if there is a match in the accounts_with_failed_email_domains. I need to insert all matches between the lists in a separate array to be processed later.
How can I efficiently write something that can quickly check through these accounts? Here is what I have tried so far.
unchecked_account_domains = [really big array]
unchecked_account_domains = unchecked_account_domains.sort
accounts_with_failed_email_domains = [another huge array].sort
unchecked_account_domains.keep_if do |email|
  accounts_with_failed_email_domains.any? { |failed_email| email == failed_email }
end
# Count to see how many accounts are left
puts unchecked_account_domains.count
The above implementation has been running forever. Here is the second attempt, which proved to be no better.
unchecked_account_domains = [really big array]
unchecked_account_domains = unchecked_account_domains.sort
accounts_with_failed_email_domains = [another huge array].sort
unchecked_account_domains.each do |email|
  accounts_with_failed_email_domains.bsearch do |failed_email|
    final_check << email if email == failed_email
  end
end
# Count to see how many accounts are left
puts final_check.count
bsearch seemed promising, but I'm pretty sure I'm not using it correctly. Also, I tried looking into this question comparing large lists, but it is in Python and I couldn't find a Ruby equivalent of set. Does anyone have any ideas on how to solve this?
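As an aside, here is a minimal sketch of how bsearch is conventionally used in find-minimum mode, assuming the failed list is sorted as above: the block returns true/false against each sorted element, and you compare the element it returns afterwards.
final_check = []
unchecked_account_domains.each do |email|
  hit = accounts_with_failed_email_domains.bsearch { |failed_email| failed_email >= email }
  final_check << email if hit == email
end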
It seems like you can use Array#-:
result = unchecked_account_domains - accounts_with_failed_email_domains
I didn't have a new solution here, because the good answers were already taken. However, I wanted to see if there was a performance difference between the two code-based solutions.
This answer is a benchmark to highlight any performance differences in the use of Array#- and two uses of Set#include?. The first Set#include? benchmark always does the set conversion, and the second converts once and keeps the set for subsequent searches.
Here's the code that runs each test 50 times:
require 'set'
require 'benchmark'
string = 'asdfghjkl'
Times = 50
a = 279_000.times.map {|n| "#{n}#{string}" }
b = 149_000.times.map {|n| "#{n*2}#{string}" }
puts RUBY_DESCRIPTION
puts "============================================================"
puts "Running tests for trimming strings"
Benchmark.bm(20) do |x|
  x.report("Array#-:") { Times.times {|n| a - b } }
  x.report("Set#include? #1:") do
    Times.times do |n|
      d = []
      c = Set.new(b)
      a.each {|email| d << email if c.include?(email) }
    end
  end
  x.report("Set#include? #2:") do
    c = Set.new(b)
    Times.times do |n|
      d = []
      a.each {|email| d << email if c.include?(email) }
    end
  end
end
Here are the results:
ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-darwin14]
============================================================
Running tests for trimming strings
user system total real
Array#-: 12.350000 0.250000 12.600000 ( 13.001546)
Set#include? #1: 16.090000 0.330000 16.420000 ( 17.196469)
Set#include? #2: 8.250000 0.100000 8.350000 ( 8.726609)
Clearly, if you just need a single differences comparison, use the Array#- approach. However, if you need to do this type of thing multiple times, pre-converting the set makes a tremendous difference and performs better than Array#-. The cost of converting the Array to a Set is fairly high (comparatively), but once you have a Set, it performs the difference comparison much more quickly.
A Set would be useful here if you know the array contains unique items (or you're not bothered by losing duplicates - which I don't think you are), so simply take your big array and do:
require 'set'
unchecked_account_domains = [really big array]
accounts_with_failed_email_domains = Set.new([another huge array])
final_check = []
unchecked_account_domains.each do |email|
  final_check << email if accounts_with_failed_email_domains.include?(email) # Set#include? is O(1) lookup time
end
Convert the array of failed emails to a set (I think the Ruby command is .to_set, read about it in the Ruby docs). Then check each of the unchecked emails against the set using .include?.
The reason your version runs forever is that it scans all (or much) of the list for each check. The Set class hashes its entries, making lookups much, much faster.
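A minimal sketch of that description (to_set comes with the 'set' standard library; select is just one way to collect the matches):
require 'set'
failed = accounts_with_failed_email_domains.to_set
final_check = unchecked_account_domains.select { |email| failed.include?(email) }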
I am not a trained programmer, but I assist in developing/maintaining macros within our VBA-based systems to expedite various tasks our employees do manually. For instance, copying data from one screen to another. By hand, any instance of this could take 30 seconds to 2 minutes, but with a macro, it could take 2-3 seconds.
Most of the macros we develop rely on the ability to accurately pull data as displayed (not from its relative field!) based on a row/column position for each character. As such, we use a custom command (let's call it, say... Instance.Grab) that pulls what we need from the screen using row x / column y coordinates and the length of what we want to pull. For example, where we would normally pull an 8-character string from coordinates 1,1:
Dim PulledValue As String
PulledValue = Instance.Grab(1, 1, 8)
If I ran that code on my question so far, the returned value for our macro would have been "I am not"
Unfortunately, our systems are having their displays altered to handle values of an increased character length. As such, the coordinates of the data we're pulling are changing significantly. Rather than go through our macros and change the coordinates and lengths manually in each one (which would need to be repeated if the screen formats change again), I'm converting our macros so that any time they need to pull a string, the needed coordinates/length can be changed from a central location.
My question is, what would be the best way to handle this task? I've thought of a few ideas, but want to maximize effectiveness and minimize the time I spend developing it, given my limited programming experience. For the sake of this, let's call what I need to make happen CoorGrab, and where an array is needed, make an array called CoorArray:
1) Creating Public Function CoorGrab(ThisField As Variant) - if I did it this way, then I would simply list all the needed coordinate/length sets based on the value I enter, then pull whichever set is needed using a 3-dimensional array. For instance, CoorGrab(situationA) would return CoorArray(5, 7, 15). This would be easy enough to edit for one of us who knows something about programming, but if we're not around for any reason, there could be issues.
2) Creating all the needed coordinates as public arrays in the module. I'm not overly familiar with how to implement this, but I think I read up on something called public constants? I kind of like this idea for its simplicity, but I'm hesitant to make any variable or array public.
3) Creating a .txt file that has all the needed data plus a label to identify each entry, saved to a shared drive that any terminal can access when running these macros. This would be the easiest for a non-programmer to jump in and edit in case I or one of our other programming-savvy employees isn't available, but it seems like far more work than is needed, and I fear what could happen if the .txt file got a typo or was accidentally deleted.
Any thoughts on how I should proceed? Is one of the above options inherently better/easier than the others? Or is there another way to handle this situation that I didn't cover? Any info or advice you can provide would be greatly appreciated!
8/2/15 Note - I should probably mention that the VBA is used as part of a terminal emulator with custom applications for the needs of our department. I don't manage the emulator or its applications, nor do I have system admin access; I just create/edit macros used within it to streamline some of the ways our users handle their workloads. Of the three of us who do this, I'm the least skilled at programming, but also the only one who could be pulled to update the macros before the changes take effect.
Your approach is not so bad. I would:
Use a string label as the parameter for CoorGrab
Return a Range instead of a String (you can still use a single-cell range as text, and you keep a trace of where your data comes from)
Public Function CoorGrab(ByVal label As String) As Range
Create an Excel sheet with 3 rows: 1 = label, 2 = x, 3 = y (you could add a 4th if you need to search in another sheet)
Have CoorGrab() find the label in the Excel sheet and return the X/Y (a sketch follows below)
If developers aren't available, others just have to edit the Excel sheet.
You could also put the Excel file in a shared location so coordinates are read from outside the local file, or use it to update everybody's files (read the file from the server and add/update any labels that are in the server file but not in the local file).
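Here is a rough sketch (mine, not the answerer's exact code) of that lookup approach, adapted to return the three stored numbers instead of a Range so the workbook can be closed afterwards. It assumes the coordinate table lives in a workbook on a shared drive (the path and sheet name below are placeholders), with labels in row 1, x in row 2, y in row 3 and length in row 4, and that the emulator's VBA can reach Excel via CreateObject. No error handling is shown.
Public Function CoorGrab(ByVal label As String) As Variant
    Dim xl As Object, wb As Object, ws As Object
    Dim col As Long
    Set xl = CreateObject("Excel.Application")
    Set wb = xl.Workbooks.Open("\\SharedDrive\Macros\ScreenCoords.xlsx")
    Set ws = wb.Worksheets("Coords")
    col = 1
    Do While ws.Cells(1, col).Value <> ""
        If ws.Cells(1, col).Value = label Then
            CoorGrab = Array(ws.Cells(2, col).Value, _
                             ws.Cells(3, col).Value, _
                             ws.Cells(4, col).Value)
            Exit Do
        End If
        col = col + 1
    Loop
    wb.Close False
    xl.Quit
End Function
Usage, with situationA as an example label:
Dim c As Variant
c = CoorGrab("situationA")
PulledValue = Instance.Grab(c(0), c(1), c(2))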
I have a vector of numbers like this:
myVec= [ 1 2 3 4 5 6 7 8 ...]
and I have a custom function which takes one number as input, runs an algorithm, and returns another number.
cust(1)= 55, cust(2)= 497, cust(3)= 14, etc.
I want to be able to return the number in the first vector which yielded the highest outcome.
My current thought is to generate a second vector, outcomeVec, which contains the output of the custom function for each element, then find the index at which outcomeVec attains max(outcomeVec), and use that index into myVec. I am wondering, is there a more efficient way of doing this?
What you described is a good way to do it.
outcomeVec = myfunc(myVec);
[~,ndx] = max(outcomeVec);
myVec(ndx) % input that produces max output
Another option is to do it with a loop. This saves a little memory, but may be slower.
maxOutputValue = -Inf;
maxOutputNdx = NaN;
for ndx = 1:length(myVec)
    output = myfunc(myVec(ndx));
    if output > maxOutputValue
        maxOutputValue = output;
        maxOutputNdx = ndx;
    end
end
myVec(maxOutputNdx) % input that produces max output
Those are pretty much your only options.
You could make it fancy by writing a general purpose function that takes in a function handle and an input array. That method would implement one of the techniques above and return the input value that produces the largest output.
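For example, a quick sketch (mine, not the answerer's) of such a general-purpose helper; the name argmaxOf is just a placeholder, and it would live in its own argmaxOf.m file:
function [bestInput, bestOutput] = argmaxOf(fh, inputs)
% Apply fh to each element of inputs and return the input that gives the largest output.
outputs = arrayfun(fh, inputs);   % works even if fh only accepts scalars
[bestOutput, ndx] = max(outputs);
bestInput = inputs(ndx);
end
Usage: best = argmaxOf(@cust, myVec);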
Depending on the size of the range of discrete numbers you are searching over, you may find that a golden section algorithm works more efficiently. For instance, I tried to minimize the following:
bf = -21;
f = @(x) round(x-bf).^2;
within the range [-100 100] with a routine based on a script from the MathWorks File Exchange. That particular script does not appear to implement golden section correctly, as it makes two function calls per iteration. After fixing this, the number of calls required is reduced to 12, which certainly beats evaluating the function 200 times before a "dumb" call to min. The gains quickly become dramatic: for instance, if the search region is [-100000 100000], golden section finds the minimum in 25 function calls as opposed to 200000 - the dependence of the number of calls on the size of the range is logarithmic, not linear.
So if the range is sufficiently large, other methods can definitely beat min by requiring fewer function calls. Minimization routines sometimes incorporate such a search in their early steps. However, you will have a problem with the convergence (termination) criteria, which you will have to modify so that the routine knows when to stop. The best option is probably to narrow the search region for a final application of min by starting out with a few iterations of golden section.
An important caveat is that golden section is guaranteed to work only with unimodal regions, that is, displaying a single minimum. In a region containing multiple minima it's likely to get stuck in one and may miss the global minimum. In that sense min is a sure bet.
Note also that the function in the example here rounds input x, whereas your function takes an integer input. This means you would have to place a wrapper around your function which rounds the input passed by the calling golden routine.
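A one-line sketch of such a wrapper, assuming your function is called cust as in the question and that the golden section routine minimizes:
g = @(x) -cust(round(x));   % round onto the integer grid; negate because the question asks for a maximum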
Others appear to have used genetic algorithms to perform such a search, although I did not research this.