I am trying to look for specific characters in an array and print the output into an Excel sheet in the same order (i.e., if there are elements in between without a match, the corresponding cell is left blank).
I used the following code within the loop:
EDIT:
[num,txt,~] = xlsread('protein-peptides.xls')
for i=1:size(txt)
str(i)=txt(i)
expression='\w*Pyro-glu from E\w*';
matchStr(i)=regexp(str(i),expression,'match','once');
ArrayOfStrings=vertcat(matchStr{:});
end
After the loop:
xlswrite(filename,ArrayOfStrings,1);
The output is shown below. It has two problems:
1) The elements without a match are not shown as blank.
2) Each character of the match is displayed in a different cell.
P y r o - g l u f r o m E
P y r o - g l u f r o m E
P y r o - g l u f r o m E
P y r o - g l u f r o m E
How do I keep the blank entries in the matrix and get the entire matching phrase into a single cell in the output?
I tried concatenating the cells, but that prints all the output in a single row, still with each character in a different cell.
I know you'd probably prefer to use xlswrite, but I wasn't able to get this to work (at least on OS X). It appears that xlswrite automatically strips out empty cells, as you observed. It also appears to be spreading each cell over multiple columns, which is bizarre and different from the behavior I remember. I wonder if there was a recent update (I'm using R2015b).
I was able to get this to work using simple fprintf calls to write a CSV file, which can be opened in Excel. Note that the termination character (\r) is critical; empty cells do not seem to be preserved in Excel if this is replaced by a linefeed. I also refactored your code a bit.
% Load data from Excel file
[~, txt, ~] = xlsread('protein-peptides.xls');

% Perform analysis
expression = '\w*Pyro-glu from E\w*';
for i = 1:length(txt)
    matches(i, 1) = regexp(txt(i), expression, 'match', 'once');
end

% Write data to CSV file
fid = fopen('test.csv', 'w+');
for i = 1:length(matches)
    fprintf(fid, '%s\r', matches{i});
end
fclose(fid);
Input file rows
Pyro-glu from E
test
Pyro-glu from E
test
Output file rows
Pyro-glu from E
(blank)
Pyro-glu from E
(blank)
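For comparison outside MATLAB, the same idea (one output entry per input row, with an empty entry where the pattern does not match) can be sketched in Python. This is only an illustration: the function name `match_rows` and the sample rows are assumptions, and the `\r` terminator simply mirrors the choice made in the answer.

```python
import re

# Same pattern as in the MATLAB code.
EXPRESSION = r"\w*Pyro-glu from E\w*"

def match_rows(rows, pattern=EXPRESSION):
    """Return one entry per input row: the first match, or '' if none."""
    out = []
    for row in rows:
        m = re.search(pattern, row)
        out.append(m.group(0) if m else "")
    return out

# Hypothetical sample input: two matching rows, two non-matching rows.
rows = ["Pyro-glu from E", "test", "Pyro-glu from E", "test"]
matches = match_rows(rows)

# One line per row, so empty entries keep their position in the file.
csv_text = "".join(m + "\r" for m in matches)
```

Because the list always has the same length as the input, the blanks stay aligned with the rows that produced them.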
I have been trying to use the Magma Calculator: http://magma.maths.usyd.edu.au/calc/
Given a word u, in a finitely presented group, how do I declare g to be the group element represented by u?
Context:
I can define a group via a finite presentation. For example using the code:
G<a,b,c> := Group< a,b,c | a^2=b^2, b^2=c^2, c=a*b >;
If I then ask for the order of the group:
Order (G);
The correct answer of 8 is returned, so it does understand what G is.
However I want to know how to ask if two elements of the group are equal.
The problem is that a,b,c as well as G.1, G.2, G.3 denote elements of the free group generated by a,b,c. Similarly products of those symbols (and their inverses) represent words in the free group.
Thus
a*a^-1 eq a^-1*a;
returns true, as it is true in the free group, but
a^2 eq c^2;
returns false, even though it is true in the group G.
This is explained in https://magma.maths.usyd.edu.au/magma/handbook/text/851.
In the section "Operations on words" it says that:
"u eq v : Returns true if the words u and v are identical when freely reduced. NB When G is not a free group and false is returned, this does not imply that u and v do not represent the same element of G."
However Magma can do operations with group elements, including answering if two elements g,h, are the same:
g eq h;
The question is then, given a word u, in a finitely presented group, how do I declare g to be the group element represented by u?
Following the answer by @Userulli, I typed
G<a,b,c> := Group< a,b,c | a^2=b^2, b^2=c^2, c=a*b >;
u := a^2;
v := c^2;
g := ElementToSequence(G, u);
h := ElementToSequence(G, v);
g eq h;
and got the reply
>> g := ElementToSequence(G, u);
^
Runtime error in 'ElementToSequence': Bad argument types
Argument types given: GrpFP, GrpFPElt
>> h := ElementToSequence(G, v);
^
Runtime error in 'ElementToSequence': Bad argument types
Argument types given: GrpFP, GrpFPElt
>> g eq h;
^
User error: Identifier 'g' has not been declared or assigned
Have you tried using the ElementToSequence method?
For example :
u := a^2;
g := ElementToSequence(u, G);
Then you can compare g with other elements in the group to see if they are the same:
v := c^2;
h := ElementToSequence(v, G);
h eq g; // this should return true
EDIT:
According to the docs (magma.maths.usyd.edu.au/magma/handbook/text/208), you need to use fields instead. So perhaps you should convert the group into a field and then use the ElementToSequence method to compare the two sequences?
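The distinction drawn in the question, between equality of freely reduced words and equality of the elements they represent in G, can be illustrated outside Magma. The Python sketch below does not use any Magma API. It assumes that G is realized by the Hamilton quaternion units (a = i, b = j, c = ij); these satisfy the three relations and generate a group of order 8, which matches the reported Order(G), so comparing images decides equality in G. All function names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qinv(p):
    """Inverse of a unit quaternion (its conjugate)."""
    w, x, y, z = p
    return (w, -x, -y, -z)

ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
# Assumed images of the generators; c is defined as a*b.
IMAGES = {"a": I, "b": J, "c": qmul(I, J)}

def free_reduce(word):
    """Cancel adjacent inverse pairs; a word is a list of (generator, +1/-1)."""
    stack = []
    for gen, sign in word:
        if stack and stack[-1] == (gen, -sign):
            stack.pop()
        else:
            stack.append((gen, sign))
    return stack

def evaluate(word):
    """Image of the word under the assumed map to the quaternion units."""
    result = ONE
    for gen, sign in word:
        g = IMAGES[gen]
        result = qmul(result, g if sign == 1 else qinv(g))
    return result

a, a_inv, c = ("a", 1), ("a", -1), ("c", 1)

# a*a^-1 and a^-1*a: equal already as freely reduced words.
same_free_1 = free_reduce([a, a_inv]) == free_reduce([a_inv, a])
# a^2 and c^2: different words, same group element.
same_free_2 = free_reduce([a, a]) == free_reduce([c, c])
same_in_G = evaluate([a, a]) == evaluate([c, c])
```

Here `same_free_1` and `same_in_G` are true while `same_free_2` is false, which is the behaviour described for `eq` on words versus equality in G.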
My prototype is here:
the sheet where the formula is placed in cell B2
This query works, but I would like to replace the repeated conditions in the WHERE clause with an array, if that is possible.
=IFERROR(QUERY(F:N, "SELECT F WHERE G CONTAINS '"&A2&"' OR H CONTAINS '"&A2&"' OR I CONTAINS '"&A2&"' OR J CONTAINS '"&A2&"' OR K CONTAINS '"&A2&"' OR L CONTAINS '"&A2&"' OR M CONTAINS '"&A2&"' OR N CONTAINS '"&A2&"'"),"")
Is there a formula that replaces all the OR clauses with an array?
I tried with no success:
SELECT ArrayFormula(textjoin(", ",TRUE,("Col"&row(indirect("A"&F1&":A"&O1)))))
instead of using OR like:
=IFERROR(QUERY(F:N,
"SELECT F
WHERE G CONTAINS '"&A2&"'
OR H CONTAINS '"&A2&"'
OR I CONTAINS '"&A2&"'
OR J CONTAINS '"&A2&"'
OR K CONTAINS '"&A2&"'
OR L CONTAINS '"&A2&"'
OR M CONTAINS '"&A2&"'
OR N CONTAINS '"&A2&"'"))
you can do:
=IFERROR(QUERY({F:N, FLATTEN(QUERY(TRANSPOSE(G:N),,9^9))},
"select Col1
where Col10 contains '"&A2&"'", ))
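The trick in this formula is to join the searched columns of each row into one helper string and run a single "contains" test on it, instead of one test per column. A minimal Python sketch of that idea, with hypothetical sample data and a hypothetical function name, and assuming the first column is the one to return:

```python
def select_first_where_any_contains(rows, needle):
    """rows: list of lists; column 0 is returned, columns 1.. are searched."""
    selected = []
    for row in rows:
        # Helper column: all searched cells of the row joined by spaces.
        joined = " ".join(str(cell) for cell in row[1:])
        if needle in joined:
            selected.append(row[0])
    return selected

# Hypothetical sample data.
rows = [
    ["r1", "apple", "pear"],
    ["r2", "kiwi", "plum"],
    ["r3", "fig", "apple pie"],
]
result = select_first_where_any_contains(rows, "apple")
```

One caveat of joining with spaces: a search term that itself contains a space could match across the boundary between two adjacent cells.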
It's not necessarily more efficient if the case is simple, but if it's a complicated or long code you could design flags that resemble certain situations and just compare the array of the wanted outcome to the array that you actually get.
I have a custom function in Google Sheets. I pass as parameters a string, a range and a number N (an index) and it looks up the string on the range, then it returns me the value of cell N positions away from my found item. So in the table below:
   A  B  C  D
1  z  y  w  v
2  q  w  e  r
3  i  d  e  a
4  s  t  a  r
if I run =myfunc('q',A:D,2) it returns me 'e'. Not too different from a vlookup() but mine searches everywhere in the table (not just the first column) and can return values to the left of the found value, like =myfunc('v', A:D, -2) returns 'y'.
Now I'm adding error handling: if the index to return is out of range, the function throws an error with throw new Error("Item out of range.");. Because the error is thrown rather than returned, Google Sheets shows it as a spreadsheet error, which is what I want.
My problem is this: when I use myfunc inside an ArrayFormula() it works fine, unless there is one value out of range, then Google apps throws an error to all the rows in the column. I made a simpler prototype below:
function testError(range) {
  var cellRef;
  if (range.map) {
    return range.map(function (x) {
      cellRef = SpreadsheetApp.getActive().getRange(x).getValue();
      //(*) below is the problem!!!
      if (cellRef == 'q') throw new Error("Error on purpose on cell A2");
      return cellRef;
    });
  }
  else {
    cellRef = SpreadsheetApp.getActive().getRange(range).getValue();
    // (**) below works fine!!
    if (cellRef == 'q') throw new Error("Error on purpose on cell A2");
    return cellRef;
  }
}
So, in my sheet, when I type =testerror(address(row(A:A),column(A:A))) in every row (of any column, because I'm referring to column(A:A) here), the second row (referring to cell A2) shows an error, as intended (**). But if I type =ArrayFormula(testerror(address(row(A:A),column(A:A)))) in the first row of any column, nothing shows up, because a single error is applied to the whole column (*).
To illustrate, cells E1, E2, E3 and E4 contain the formula =testerror(address(row(),column(A:A))), and F1 contains =ArrayFormula(testerror(address(row(A:A),column(A:A)))). Below is the result:
   A  B  C  D  E       F
1  z  y  w  v  z       #ERROR
2  q  w  e  r  #ERROR
3  i  d  e  a  i
4  s  t  a  r  s
As you see, the results of column F didn't populate, because it threw one error. For my array part, if I substitute that line (*) for something like this:
if (cellRef == 'q') return "String for ERROR on A2"; the result is:
   A  B  C  D  E       F
1  z  y  w  v  z       z
2  q  w  e  r  #ERROR  "String for ERROR on A2"
3  i  d  e  a  i       i
4  s  t  a  r  s       s
Now it doesn't break my map() but that's because I returned a string. So, how do I throw an error to that cell only when using ArrayFormula()?
I have a set of 18 Stata data files (one per year) whose names are:
{aindresp.dta, bindresp.dta, ... , rindresp.dta}
I want to eliminate some variables from each dataset. For this, I want to use the fact that many variables across datasets have the same name, plus a prefix given by the dataset prefix (a, b, c, ... r). For example, the variable rach12 is called arach12 in dataset aindresp.dta, etc. Thus, to clean each dataset, I run a loop like the following:
clear all
local list a b c d e f g h i j k l m n o p q r
foreach var of local list {
    use `var'indresp.dta
    drop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
    save `var'indresp.dta, replace
}
The actual loop is much larger. I am deleting around 200 variables.
The problem is that some variables change name over time, or disappear after a few years. Other variables are added. Therefore, the loop stops as soon as a variable is not found. This is because the drop command in Stata stops. Yet, that command has no option to force it to continue.
How can I achieve my goal? I would rather not go through each dataset manually.
help capture
You can just put capture in front of the drop. You can just keep going, but a little better would be to flag which datasets fail.
In this sample code, I've presumed that there is no point to the save, replace if you didn't drop anything. The basic idea is that a failure of a command results in a non-zero error code accessible in _rc. This will be positive (true) if there was a failure and zero (false) otherwise.
A more elaborate procedure would be to loop over the variables concerned and flag specific variables not found.
clear all
local list a b c d e f g h i j k l m n o p q r
foreach var of local list {
    use `var'indresp.dta
    capture drop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
    if _rc {
        noisily di "Note: failure for `var'indresp.dta"
    }
    else save `var'indresp.dta, replace
}
See also Does Stata have any `try and catch` mechanism similar to Java?
EDIT:
If you want to drop whatever exists, then this should suffice for your problem.
clear all
local list a b c d e f g h i j k l m n o p q r
foreach var of local list {
    use `var'indresp.dta
    capture drop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
    if _rc {
        di "Note: problem for `var'indresp.dta"
        checkdrop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
    }
    save `var'indresp.dta, replace
}
where checkdrop is something like
*! 1.0.0 NJC 1 April 2016
program checkdrop
    version 8.2
    foreach v of local 0 {
        capture confirm var `v'
        if _rc == 0 {
            local droplist `droplist' `v'
        }
        else local badlist `badlist' `v'
    }
    if "`badlist'" != "" {
        di _n "{p}{txt}variables not found: {res}`badlist'{p_end}"
    }
    if "`droplist'" != "" {
        drop `droplist'
    }
end
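The logic of checkdrop (split the requested names into those that exist and those that do not, drop the first group, report the second) is language-independent. A minimal Python sketch of the same partitioning, assuming a dataset represented as a plain dictionary from variable name to values; the names `check_drop`, `data` and `akeep` are hypothetical:

```python
def check_drop(dataset, names):
    """Drop the names that exist in dataset; return (dropped, missing)."""
    dropped = [n for n in names if n in dataset]
    missing = [n for n in names if n not in dataset]
    for n in dropped:
        del dataset[n]
    return dropped, missing

# Hypothetical dataset: variable name -> column of values.
data = {"arach12": [1, 2], "ajbchc1": [3, 4], "akeep": [5, 6]}
dropped, missing = check_drop(data, ["arach12", "ajbchc1", "axpchcf"])
if missing:
    print("variables not found:", " ".join(missing))
```

As in the Stata program, a missing name is reported rather than treated as a fatal error, so the remaining names are still dropped.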
I am trying to remove the duplicates from 7 different columns and combine the unique values into one column and I can't find a way to do that using an Excel formula
I've tried the array approach below, but it doesn't work for more than one column:
=INDEX($A$11:$A$100000, MATCH(0, COUNTIF($C$11:C11,$A$11:$A$100000), 0))
Here's what I'd like ideally:
Starting data:
Column 1: a b d c b i
Column 2: c g h f d c
Column 3: f e a g b a
Ending result:
a
b
c
d
e
f
g
h
i
...
(order not important)
Any solutions would be appreciated.
Not sure if this answers the question exactly, but you could try using COUNTIFS to identify rows where combinations of two or more columns contain duplicate values:
=COUNTIFS($B:$B,$B1,$C:$C,$C1)
This formula returns the number of rows in which the combination of values in B1 and C1 appears. You can copy and paste it down to every row in your sheet, or use it as an array formula.
There's more on how to do this here:
http://fiveminutelessons.com/learn-microsoft-excel/find-duplicate-rows-excel-across-multiple-columns
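The COUNTIFS approach flags duplicate rows rather than merging columns. For reference, the result the question asks for (the unique values across several columns, gathered into one list) can be sketched in Python as follows. This is not an Excel formula, only an illustration of the intended logic using the sample data from the question; the function name is hypothetical.

```python
def unique_across_columns(columns):
    """Collect values from all columns, keeping the first occurrence of each."""
    seen = set()
    result = []
    for column in columns:
        for value in column:
            if value not in seen:
                seen.add(value)
                result.append(value)
    return result

# Sample data from the question.
columns = [
    ["a", "b", "d", "c", "b", "i"],
    ["c", "g", "h", "f", "d", "c"],
    ["f", "e", "a", "g", "b", "a"],
]
unique_values = unique_across_columns(columns)
```

The values come out in first-seen order; since the question says order is not important, `sorted(unique_values)` can be used if an alphabetical list is preferred.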