I'd like to create a loop or something cleaner to rbind all the coefficients from a rolling panel regression, without knowing how many rows there may be (i.e. if I change the window). So I'd like a shortcut for the temp code below, without writing out all the "mod"s, which could number in the hundreds. Thanks.
temp <- rbind(mod[[1]]$coefficients, mod[[2]]$coefficients)
temp <- rbind(temp, mod[[3]]$coefficients)
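One common idiom (a sketch, assuming mod is a plain list of fitted model objects) is to let do.call hand every list element to rbind at once, so the number of windows never has to be written out:

```r
# Assuming mod is a list of fitted models (e.g. from lm/plm):
# lapply extracts each coefficient vector, and do.call(rbind, ...)
# stacks them, however many models there are.
temp <- do.call(rbind, lapply(mod, coef))

# An equivalent alternative:
temp <- t(sapply(mod, coef))
```

Both give one row per window, so nothing needs to change if you alter the window size.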
Suppose I have the following array:
a <- sample(letters,100,replace=TRUE)
Then suppose those letters are ordered in a sequence; I want to extract all possible n-sized sequences from that array. For example:
For n=2 I would do: paste0(a[1:99],"->",a[2:100])
For n=3 I would do: paste0(a[1:98],"->",a[2:99],"->",a[3:100])
You get the point. Now, my goal is to create a function that takes n as input and gives me back the corresponding set of sequences of that length from array a.
I was able to do it using loops and all that, but I was hoping for a high-performance one-liner.
I am a bit new to R so I'm not aware of all existing functions.
You can use embed. For embed(a, 3), this gives a matrix with columns
a[3:100]
a[2:99]
a[1:98]
in that order.
To reverse the column order use matrix syntax m[rows, cols]:
res = embed(a, 3)[, 3:1]
If you want arrows printed between the columns, then
do.call(paste, c(split(res, col(res)), sep = " -> "))
is one way. This is probably better than apply(res, 1, something), performance-wise, since this is vectorized while apply would loop over rows.
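Putting those pieces together, a possible wrapper (the name seqs is just illustrative) would be:

```r
# Wrapper around embed(): [, n:1] restores left-to-right order,
# and split(res, col(res)) turns the columns into a list for paste.
seqs <- function(a, n) {
  res <- embed(a, n)[, n:1, drop = FALSE]
  do.call(paste, c(split(res, col(res)), sep = " -> "))
}

seqs(letters[1:5], 3)
# "a -> b -> c" "b -> c -> d" "c -> d -> e"
```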
As pointed out by @DavidArenburg, this can similarly be done with data.table:
library(data.table)
do.call(paste, c(shift(a, 2:0), sep = " -> "))[-(1:2)]
shift is like embed, except it ...
returns a list instead of a matrix, so we don't need to split by col to paste
pads with missing values to keep the full length, so we need to drop with -(1:2)
I was hoping to say something useful about how to find obscure functions in R, but came up mostly blank on how embed might be found. Maybe...
Go to any HTML help page
Click the "Index" hyperlink at the bottom
Read every single page
?
Calling all computer scientists - I need your expert advice :)
Here's my problem:
I have a mapping application, and I've divided the world into 10 million possible squares of fixed size (latitude/longitude, i.e. double/double data type). Let's call that data set D1.
A second set of data, call it D2, is around 20,000 squares of the same size (latitude/longitude, or double/double data type), and represents locations of interest in my app.
When the user zooms in far enough, I want to display all the squares of interest that are in the present view, but not the ones outside the view, because that's way too many for the app to handle (generating overlays, etc.) without getting completely bogged down.
So rather than submitting 20,000 overlay squares for rendering and letting the Mapkit framework manage what gets shown (it completely chokes on that much data), here are a few things I've tried to optimize performance:
1) Put D2 in an array. Iterate through every possible visible square in my view, and for each one do a lookup in D2 (using Swift's find() function) to see if the corresponding element exists in that array. If it exists, display it. This is really slow: if my view has an area of 4000 viewable squares, I have to check 4000 squares * 20000 points in the array = up to 80 million lookups = SLOW.
2) Put D2 in an array. Iterate through D2 and for each element in D2, check if that element is within the bounds of my view. If it is, display it. This is better than #1 (only takes 10% of the time of #1) but still on the slow side
3) Put D2 in an array. Iterate through D2 and create a new array D3 which filters out (using Swift's array.filter() method with a closure) all datapoints outside the view, then submit just those points for rendering. This is fastest (about 2% of original time of #1) but still too slow (depending on the data pattern, can still take several seconds of processing on an iphone4).
I have been thinking about dictionaries or other data structures. Would a dictionary populated with key = (latitude, longitude) and value = (true/false) be faster to look up than an array? I'm thinking that, given a map view with bounds y2, y1, x2, x1, I could do a simple for{} loop to find all the dictionary entries in those bounds with value = true (or even no value; all I'd really need is something like dictionarydata.exists(x,y), unless a value is absolutely required to build a dictionary). This would be much faster, but again it depends on how fast a dictionary is compared to an array.
Long story short: is searching through a large dictionary for a key a lot faster than searching through an array? I contemplated sorting my array and building a binary search as a test, but figured dictionaries might be more efficient. Since my array D2 will be built dynamically over time, I'd rather commit much more time/resources per addition (which are singular in nature) in order to absolutely maximize lookup performance later (which are orders of magnitude more data to sift through).
Appreciate any/all advice - are dictionaries the way to go? Any other suggestions?
Riffing off the examples in the comments here: create a two-dimensional array sorted on both the X and Y axes. Then do a binary search to find the NW-corner and SE-corner elements. All the elements in the box formed by those two corners are the ones that need to be displayed.
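To make the dictionary idea from the question concrete, here is a minimal Swift sketch. The names (GridKey, squareSize, gridKey) and the sample coordinates are all illustrative assumptions, not from the question; the point is that a hashed Set gives O(1) average membership tests, so only the ~4,000 visible squares need checking:

```swift
// Illustrative sketch: quantize each square's origin to integer grid
// indices and keep the squares of interest (D2) in a Set.
struct GridKey: Hashable {
    let ix: Int
    let iy: Int
}

let squareSize = 0.01  // assumed side length of a square, in degrees

func gridKey(lat: Double, lon: Double) -> GridKey {
    // floor (not truncation toward zero) so negative coordinates
    // fall into the correct bucket
    GridKey(ix: Int((lon / squareSize).rounded(.down)),
            iy: Int((lat / squareSize).rounded(.down)))
}

// Build once, and insert as D2 grows; insertion is cheap:
var interesting: Set<GridKey> = []
interesting.insert(gridKey(lat: 48.8566, lon: 2.3522))

// On zoom, test only the squares currently in view:
if interesting.contains(gridKey(lat: 48.8566, lon: 2.3522)) {
    // submit this square for rendering
}
```

This matches the asker's instinct to pay more per insertion in exchange for fast lookups; the sorted-array-plus-binary-search above is a reasonable alternative when range queries over a whole bounding box matter more.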
I'm at a loss for what to do in my program (written in C). There is a large matrix of numbers (an image) that I am processing. The processing happens one line at a time, with reference to the previous line, so I only need to access two lines of numbers at a time.
Originally, I tried a 2-by-X array, but once I save the information for the third line, the array is upside down, with the third line of the image in the first row of the array and the second line of the image in the second row of the array.
Is there a better way to correct this other than simply copying the second row of the array to the first row? Maybe it wouldn't be so bad, but I would imagine doing that for every other line of the image would be expensive. Should I use two pointers on the image instead?
I apologize if this is a common thing that can be easily looked up but I couldn't figure out how to begin looking. If anyone needs clarification, let me know. Thank you very much!
Diagram of what numbers I need access to:
http://www.gliffy.com/go/publish/5968966
I suppose that you are processing the image as you read it, or as you decompress it, or some such, for if you already had the whole thing in memory in usable form then you would just use that.
I see two reasonably good alternatives:
Instead of hard-coding the indices of the earlier and later lines in your 2 by x array, use variables to track which row contains which line.
Use a 1-D array for each line, and use pointers to track which one contains the current line and which one contains the previous line.
(Though really, those boil down to pretty much the same thing.) Either way, you can avoid needless copying.
Let's assume you have:
struct rgb_t bitmap[Y][X];
To get a window of dimensions X by 2, it is like this:
struct rgb_t *first_line = bitmap[0], *second_line = bitmap[1];
Then you can process the two lines like so:
for (int i = 0; i < X; ++i)
{
    do_work(first_line[i], second_line[i]);
}
To shift the window down by one line, advance each pointer by one row. Note that pointer arithmetic already scales by sizeof(struct rgb_t), so you add X elements, not X * sizeof bytes:
first_line += X;
second_line += X;
Where X is the width of the bitmap and Y its height.
In my excel document I have two sheets. The first is a data set and the second is a matrix of the relationship between two of the variables in my data set. Each possibility of the variable is a column in my matrix. I'm trying to get the sum of the products of the elements in two different arrays. Right now I'm using the formula {=SUM(N3:N20 * F3:F20)} and manually changing the columns each time. But my data set is over 800 items...
Ideally I'd like to know how to write a program that reads the value of the variable in my data set, looks up the correct columns in the matrix, multiplies them together, sums the products, and puts the result in the correct place in my data set. However, just knowing the result for all the possible combinations of columns would also save me a lot of time. It's an 18x18 matrix. Thanks for any feedback!
Your question is a little ambiguous, but as far as I understand it, you want to multiply different sets of two columns in the same sheet and put their result into the next sheet, is that so? If so, please post images of your work (all sheets). Your goal is achievable in Excel alone, without any VBA code. Thanks.
You can also use =SUMPRODUCT(N3:N20,F3:F20) instead of the array formula {=SUM(N3:N20 * F3:F20)}.
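To pick the matrix column automatically from the variable's value, rather than editing the formula by hand for each row, one possible pattern combines SUMPRODUCT with INDEX/MATCH. All ranges here are illustrative assumptions: suppose the matrix column headers sit in Sheet2!$B$2:$S$2, the 18-row matrix body in Sheet2!$B$3:$S$20, and the variable for the current data-set row in $A3:

```
=SUMPRODUCT(INDEX(Sheet2!$B$3:$S$20, 0, MATCH($A3, Sheet2!$B$2:$S$2, 0)), $F$3:$F$20)
```

MATCH finds which matrix column corresponds to the variable, INDEX(..., 0, col) returns that whole column, and SUMPRODUCT does the multiply-and-sum. The formula can then be filled down the 800+ rows of the data set.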
I am using Matlab for some data collection, and I want to save the data after each trial (just in case something goes wrong). The data is organized as a cell array of cell arrays, basically in the format
data{target}{trial} = zeros(1000,19)
But the actual data gets up to >150 MB by the end of the collection, so saving everything after each trial becomes prohibitively slow.
So now I am looking at opting for the matfile approach (http://www.mathworks.de/de/help/matlab/ref/matfile.html), which would allow me to only save parts of the data. The problem: this doesn't support cells of cell arrays, which means I couldn't change/update the data for a single trial; I would have to re-save the entire target's data (100 trials).
So, my question:
Is there another different method I can use to save parts of the cell array to speed up saving?
(OR)
Is there a better way to format my data that would work with this saving process?
A not very elegant but possibly effective solution is to use trial as part of the variable name. That is, use not a cell array of cell arrays (data{target}{trial}), but just different cell arrays such as data_1{target}, data_2{target}, where 1, 2 are the values of the trial counter.
You could do that with eval: for example
trial = 1; % change this value in a for loop
eval([ 'data_' num2str(trial) '{target} = zeros(1000,19);']); % fill data_1{target}
You can then save the data for each trial in a different file. For example, this
eval([ 'save temp_save_file_' num2str(trial) ' data_' num2str(trial)])
saves data_1 in file temp_save_file_1, etc.
Update:
Actually, it does appear to be possible to index into cell arrays, just not into cells inside cell arrays. Hence, if you store your data slightly differently, it seems you can use matfile to update only part of it. See this example:
x = cell(3,4);
save x;
matObj = matfile('x.mat','writable',true);
matObj.x(3,4) = {eye(10)};
Note that this gives me a version warning, but it seems to work.
Hope this does the trick. However, still look into the next part of my answer as it may help you even more.
For calculations it is usually not required to save to disk after every iteration. An easy way to get a speedup (at the cost of a little more risk) is to save only after every n trials.
Like this for example:
maxTrial = 99;
saveEvery = 10;
for trial = 1:maxTrial
myFun; %Do your calculations here
if trial == maxTrial || mod(trial, saveEvery) == 0
save %Put your save command here
end
end
If your data is always at (or within) a certain size, you can also choose to store your data in a matrix rather than a cell array; then you can use indexing to save only part of the file.
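Since each trial here is a fixed-size 1000x19 matrix, one possible layout (the file name, sizes, and variable names are illustrative assumptions) is a 4-D numeric array data(sample, channel, trial, target); matfile can then update a single trial's slice in place:

```matlab
% Illustrative sketch: fixed-size trials stored in one numeric array
% so that matfile can write a single slice after each trial.
nSamples = 1000; nChannels = 19; nTrials = 100; nTargets = 5;

matObj = matfile('experiment.mat', 'Writable', true);
% Preallocate on disk once by writing the last element:
matObj.data(nSamples, nChannels, nTrials, nTargets) = 0;

% After each trial, save only that trial's slice:
trial = 3; target = 2;                    % example indices
newData = zeros(nSamples, nChannels);     % your trial data here
matObj.data(:, :, trial, target) = newData;
```

This keeps per-trial saves small regardless of how large the full file grows.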
In response to @Luis, I will post another way to deal with the situation.
It is indeed an option to save data in named variables or files, but to save a named variable in a named file seems too much.
If you only change the name of the file, you can save everything without using eval:
assuming you are dealing with trial 't':
filename = ['temp_save_file_' num2str(t)];
If you really want, you can use sprintf to zero-pad the number (writing it as 001, for example).
Now you can simply use this:
save(filename, 'myData')
To read the data back, construct the filename again and do something like this:
totalData = {}; %Initialize your total data
And then read them as you wrote them (inside a loop):
load(filename)
totalData{t} = myData