Convert tabulated column to array - arrays

My input is copied from an HTML table and looks like this in text format:
1 2 3
4 5 6
(imagine 'tabs' instead of the spaces)
The string would then become:
1\t2\t3\r\n4\t5\t6
How can I create an array so that:
myArray(0,0) returns 1
myArray(0,1) returns 2
myArray(1,0) returns 4
I have tried this:
String input = Clipboard.GetText();
String[] content = input.Split(("\t").ToCharArray());
but this creates an array with the following elements:
1
2
3\r\n4
5
6
-- Thank you... --

Since you know how to split it at the tabs, you can split it at the line breaks too. Split on the line break first:
String input = "1\t2\t3\r\n4\t5\t6";
String[] rows = input.Split(new[] { "\r\n" }, StringSplitOptions.None);
Now you have an array with two elements:
rows[0] is "1\t2\t3"
rows[1] is "4\t5\t6"
So just split each of those at the tabs, the same way, into a new array.
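The two-stage split described above can be sketched in a few lines. This is an illustration in Python rather than the C# of the question, and the function name parse_table is made up for the example; it assumes every row has the same number of tab-separated cells.

```python
def parse_table(text):
    """Split clipboard-style text into a list of rows, each a list of cell strings."""
    # splitlines() handles "\r\n" as well as "\n" and drops the line terminators.
    # Empty lines (for example a trailing line break) are skipped.
    return [line.split("\t") for line in text.splitlines() if line]


grid = parse_table("1\t2\t3\r\n4\t5\t6")
print(grid[0][0], grid[0][1], grid[1][0])  # prints: 1 2 4
```

The cells stay strings here; convert them with int() if numbers are needed.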


Splitting two strings with matching indexes into multiple rows in SAS

I have imported some SQL data into SAS and am trying to figure out how to split two strings with matching indexes into multiple rows using a data statement. I've seen several examples here of how to do this with one string at a time, but not two parallel strings. An example of my problem is below:
HAVE
ID TIME_ARRAY RESPONSE_ARRAY
1 15:23,13:00,12:02 3,4,2
2 17:03,11:07,19:05 1,2,3
3 15:59,10:34,12:12 4,1,2
WANT
ID TIME RESPONSE
1 15:23 3
1 13:00 4
1 12:02 2
2 17:03 1
2 11:07 2
2 19:05 3
3 15:59 4
3 10:34 1
3 12:12 2
As you can see, the index of the elements in TIME_ARRAY matches the index of the elements in RESPONSE_ARRAY.
Apologies if the problem is unclear, am still a noob with this type of thing.
Any help is much appreciated!
Cheers,
Sean
The two-string solution isn't particularly different from the one-string solution. Use a single loop and pull the element out of both strings with the same index.
data want;
    set have;
    do _i = 1 to countw(time_array,',');
        time = scan(time_array,_i,',');
        response = scan(response_array,_i,',');
        output;
    end;
    keep id time response;
run;
You probably also want to convert those values into numbers once you get them separated out from the string they are in.
You can use the INPUT() function to do that. So building on the code from Joe's answer you get something like this.
data want;
    set have;
    length time response 8;
    format time time.;
    do _i = 1 to countw(time_array,',');
        time = input(scan(time_array,_i,','),time.);
        response = input(scan(response_array,_i,','),32.);
        output;
    end;
    keep id time response;
run;
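For comparison, the same "one loop, one shared index" idea can be sketched outside SAS. The Python below is only an illustration: the rows list copies the HAVE table from the question, and the name explode is invented for the example. It assumes both strings in a row contain the same number of comma-separated elements.

```python
rows = [
    (1, "15:23,13:00,12:02", "3,4,2"),
    (2, "17:03,11:07,19:05", "1,2,3"),
    (3, "15:59,10:34,12:12", "4,1,2"),
]


def explode(rows):
    """Yield one (id, time, response) tuple per matching pair of list elements."""
    for row_id, times, responses in rows:
        # zip pairs the i-th time with the i-th response, like the shared _i index.
        for time, response in zip(times.split(","), responses.split(",")):
            yield row_id, time, int(response)


long_rows = list(explode(rows))
for row in long_rows:
    print(*row)
```

The time is left as a string here; only the response is converted to a number.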

Print words from the corresponding line numbers

Hello Everyone,
I have two files, File1 and File2, which have the following data.
File1:
TOPIC:topic_0 30063951.0
2 19195200.0
1 7586580.0
3 2622580.0
TOPIC:topic_1 17201790.0
1 15428200.0
2 917930.0
10 670854.0
and so on. There are 15 topics, and each topic has its own weights. The numbers in the first column (2, 1, 3, ...) are line numbers that have corresponding words in File2. For example,
File 2 has:
1 i
2 new
3 percent
4 people
5 year
6 two
7 million
8 president
9 last
10 government
and so on. There are about 10,470 lines of words. In short, I want the corresponding words in the first column of File1 instead of the line numbers. My output should look like this:
TOPIC:topic_0 30063951.0
new 19195200.0
i 7586580.0
percent 2622580.0
TOPIC:topic_1 17201790.0
i 15428200.0
new 917930.0
government 670854.0
My Code:
import sys
d1 = {}
n = 1
with open("ap_vocab.txt") as in_file2:
    for line2 in in_file2:
        #print n, line2
        d1[n] = line2[:-1]
        n = n + 1
with open("ap_top_t15.txt") as in_file:
    for line1 in in_file:
        columns = line1.split(' ')
        firstwords = columns[0]
        #print firstwords[:-8]
        if firstwords[:-8] == 'TOPIC':
            print columns[0], columns[1]
        elif firstwords[:-8] != '\n':
            num = columns[0]
            print d1[n], columns[1]
This code runs when I write print d1[2], columns[1] instead, printing the second word of File2 on every line. But when I run the code above, it raises an error:
KeyError: 10472
There are 10472 lines of words in File2. Please help me with what I should do to rectify this. Thanks in advance!
In your first for loop, n is incremented with each line until reaching a final value of 10472. You are only setting values for d1[n] up to 10471 however, as you have placed the increment after you set d1 for your given n, with these two lines:
d1[n] = line2[:-1]
n = n + 1
Then on the line
print d1[n], columns[1]
in your second for loop (for in_file), you are attempting to access d1[10472], which evidently doesn't exist, because n still holds the value it had when the first loop finished. The dictionary itself is fine: looking up an integer key such as d1[2] works, as you have already seen. The actual problem is that the lookup uses the leftover counter n rather than the number read from the current line. You already store that number in num, but it is a string, so convert it to an integer before using it as the key.
Replace your last line with
print d1[int(num)], columns[1]
so that each line prints the word whose line number appears in its first column.
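Putting the fix together, a minimal sketch of the whole lookup might look like the following. It is written for Python 3 (the question uses Python 2 print statements), the function names load_vocab and replace_ids are invented for the example, and it assumes the vocabulary file holds exactly one word per line so that the line number is the key.

```python
def load_vocab(lines):
    """Map each 1-based line number to the word on that line."""
    return {n: line.strip() for n, line in enumerate(lines, start=1)}


def replace_ids(topic_lines, vocab):
    """Return the topic lines with each leading word number replaced by its word."""
    out = []
    for line in topic_lines:
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        first, weight = columns[0], columns[1]
        if not first.startswith("TOPIC"):
            # Look up the number found on this line, not a leftover counter.
            first = vocab[int(first)]
        out.append(f"{first} {weight}")
    return out


vocab = load_vocab(["i\n", "new\n", "percent\n"])
result = replace_ids(
    ["TOPIC:topic_0 30063951.0\n", "2 19195200.0\n", "1 7586580.0\n", "3 2622580.0\n"],
    vocab,
)
print("\n".join(result))
```

With real files, pass the open file objects (for example open("ap_vocab.txt")) in place of the small lists used here.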

summing & matching cell arrays of different sizes

I have a 4016 x 4 cell called 'totalSalesCell'. The first two columns contain text; the remaining two are numeric.
1st field CompanyName
2nd field UniqueID
3rd field NumberItems
4th field TotalValue
In my code I have a loop which goes over the last month in weekly steps - i.e. 4 loops.
At each loop my code returns a cell of the same structure as totalSalesCell, called weeklySalesCell which generally contains a different number of rows to totalSalesCell.
There are two things I need to do. First if weeklySalesCell contains a company that is not in totalSalesCell it needs to be added to totalSalesCell, which I believe the code below will do for me.
co_list = unique([totalSalesCell(:, 1); weeklySalesCell (:, 1)]);
index = ismember(co_list, totalSalesCell(:, 1));
new_co = co_list(index==0, :);
totalSalesCell = [totalSalesCell; new_co];
The second thing I need to do, and I am unsure of the best way to go about it, is to add the weeklySalesCell numeric fields to totalSalesCell. As mentioned, 90% of the time the cells will have a different number of rows, so I cannot apply a simple addition. Below is an example of what I wish to achieve.
totalSalesCell         weeklySalesCell        Result
co_id   sales_value    co_id   sales_value    co_id   sales_value
23DFG   5              DGH84   3              23DFG   5
DGH84   6              ABC33   1              DGH84   9
12345   7              PLM78   4              ABC33   1
PLM78   4              12345   3              12345   10
KLH11   11                                    PLM78   8
                                              KLH11   11
I believe the following code should take care of both of your tasks -
[x1,x2] = ismember(totalSalesCell(:,1),weeklySalesCell(:,1))
corr_c2 = nonzeros(x1.*x2)
newval = cell2mat(totalSalesCell(x1,2)) + cell2mat(weeklySalesCell(corr_c2,2))
totalSalesCell(x1,2) = num2cell(newval)
excl_c2 = ~ismember(weeklySalesCell(:,1),totalSalesCell(:,1))
out = vertcat(totalSalesCell,weeklySalesCell(excl_c2,:)) %// desired output
Output -
out =
'23DFG' [ 5]
'DGH84' [ 9]
'12345' [10]
'PLM78' [ 8]
'KLH11' [11]
'ABC33' [ 1]
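The merge logic itself (add to an existing company, append a new one) can also be sketched with a keyed lookup. The Python below is only an illustration of that logic, not a MATLAB replacement: the function name merge_sales is invented, and the data is the two-column example from the question.

```python
def merge_sales(total, weekly):
    """Add weekly values into total and append companies seen for the first time."""
    merged = dict(total)  # dicts keep insertion order in Python 3.7+
    for co_id, value in weekly:
        # Existing companies are summed; unknown ones start from 0 and are appended.
        merged[co_id] = merged.get(co_id, 0) + value
    return list(merged.items())


total = [("23DFG", 5), ("DGH84", 6), ("12345", 7), ("PLM78", 4), ("KLH11", 11)]
weekly = [("DGH84", 3), ("ABC33", 1), ("PLM78", 4), ("12345", 3)]
merged = merge_sales(total, weekly)
for co_id, value in merged:
    print(co_id, value)
```

The rows come out in the same order as the MATLAB output above: the original companies first, then the new ones.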

Check 2d array for same values

I am trying to make a game and I have a 2D array.
It is indexed like this:
Grid[x][y]
Let's pretend these values are in it:
Column 1 Column 2 Column 3 Column 4 Column 5
1 2 5 2 5
2 2 3 1 1
1 4 3 4 5
1 3 3 3 5 <-- match this row
3 5 3 4 5
2 4 3 4 5
2 4 4 4 5
In the middle row (index 4) I want to check whether the same number appears at least 3 times, and also handle the cases where it appears 4 or even 5 times.
How do you check this? What would be a good way to find the matching values and delete them? I am stuck on the logic for something like this.
This is what I tried:
grid = {}
for x = 1, 5 do
    grid[x] = {finish = false}
    for y = 1, 7 do
        grid[x][y] = {key= math.random(1,4)}
    end
end
function check(t)
    local tmpArray = {}
    local object
    for i = 1,5 do
        object = t[i][1].key
        if object == t[i+1][1].key then
            table.insert( tmpArray, object )
        else
            break
        end
    end
end
print_r(grid)
check(grid)
print_r(grid)
where print_r prints the grid:
function print_r ( t )
    local print_r_cache={}
    local function sub_print_r(t,indent)
        if (print_r_cache[tostring(t)]) then
            print(indent.."*"..tostring(t))
        else
            print_r_cache[tostring(t)]=true
            if (type(t)=="table") then
                for pos,val in pairs(t) do
                    if (type(val)=="table") then
                        print(indent.."["..pos.."] => "..tostring(t).." {")
                        sub_print_r(val,indent..string.rep(" ",string.len(pos)+8))
                        print(indent..string.rep(" ",string.len(pos)+6).."}")
                    else
                        print(indent.."["..pos.."] => "..tostring(val))
                    end
                end
            else
                print(indent..tostring(t))
            end
        end
    end
    sub_print_r(t," ")
end
It doesn't work that well, because I compare each element with the one after it, and if that one isn't the same, the last matching element doesn't get added.
I don't know if this is the best way to go...
If I "delete" the matched indexes, my plan is to move the row above or beneath it into row 4... but first things first.
You should compare along the second index, not the first. In the table
g = {{1,2,3}, {4,5,6}}
g[1] is the first row, i.e. {1,2,3}, not the first column {1,4} (the first elements of the first and second rows). You did the same thing in a previous post of yours, so it is worth rereading the Lua docs about tables. You should do something like
for i = 1,#t do
    object = t[i][1].key
    if object == t[i][2].key then
This will only compare the first two items in each row. If you want to check whether the row has any identical consecutive items, you will have to loop over the second index from 1 to #t[i]-1.
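You might find the following print function more useful, as it prints the table as a grid, which makes the before/after easier to see. Note that table.concat only accepts strings and numbers, so this works as written for grids of plain values such as g above; for cells of the form {key = ...} you would first collect each cell's key into a temporary table.
function printGrid(g)
    for i, t in ipairs(g) do
        print('{' .. table.concat(t, ',') .. '}')
    end
end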
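The "loop over the second index and compare neighbours" idea can be extended to report whole runs of 3, 4 or 5 equal values. The sketch below is in Python rather than Lua, the name find_runs is invented for the example, and it assumes a row is a plain list of numbers; note that its indices are 0-based, whereas Lua tables start at 1.

```python
def find_runs(row, min_len=3):
    """Return (start_index, length, value) for each run of equal neighbours of at least min_len."""
    runs = []
    start = 0
    for i in range(1, len(row) + 1):
        # A run ends at the end of the row or where the value changes.
        if i == len(row) or row[i] != row[start]:
            if i - start >= min_len:
                runs.append((start, i - start, row[start]))
            start = i
    return runs


print(find_runs([1, 3, 3, 3, 5]))  # prints: [(1, 3, 3)]
```

Because each run is closed only when the value changes or the row ends, the last matching element is always counted, which is the case the original loop missed.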

Changing indices and order in arrays

I have a struct mpc with the following structure:
               num  type  col3  col4  ...
mpc.bus    =     1     2   ...   ...
                 2     2   ...   ...
                 3     1   ...   ...
                 4     3   ...   ...
                 5     1   ...   ...
                10     2   ...   ...
                99     1   ...   ...
                to  from  col3  col4  ...
mpc.branch =     1     2   ...   ...
                 1     3   ...   ...
                 2     4   ...   ...
                10     5   ...   ...
                10    99   ...   ...
What I need to do is:
1: Re-order the rows of mpc.bus, such that all rows of type 1 are first, followed by type 2 and, last, type 3. There is only one element of type 3, and no other types (4 / 5 etc.).
2: Make the numbering (column 1 of mpc.bus) consecutive, starting at 1.
3: Change the numbers in the to/from columns of mpc.branch to correspond to the new numbering in mpc.bus.
4: After running simulations, reverse the steps above to end up with the same order and numbering as above.
It is easy to update mpc.bus using find.
type_1 = find(mpc.bus(:,2) == 1);
type_2 = find(mpc.bus(:,2) == 2);
type_3 = find(mpc.bus(:,2) == 3);
mpc.bus(:,:) = mpc.bus([type_1; type_2; type_3],:);
mpc.bus(:,1) = 1:nb % Where nb is the number of rows of mpc.bus
The numbers in the to/from columns in mpc.branch corresponds to the numbers in column 1 in mpc.bus.
It's OK to update the numbers on the to, from columns of mpc.branch as well.
However, I'm not able to find a non-messy way of retracing my steps. Can I update the numbering using some simple commands?
For the record: I have deliberately not included my code for re-numbering mpc.branch, since I'm sure someone has a smarter, simpler solution (that will make it easier to redo when the simulations are finished).
Edit: It might be easier to create normal arrays (to avoid working with structs):
bus = mpc.bus;
branch = mpc.branch;
Edit #2: The order of things:
Re-order and re-number.
Columns (3:end) of bus and branch are changed. (Not part of this question)
Restore original order and indices.
Thanks!
I'm proposing this solution. It generates an n x 2 matrix (PAIRS), where n corresponds to the number of rows in mpc.bus, and a temporary copy of mpc.branch:
function [mpc_1, mpc_2, mpc_3] = minimal_example
    mpc.bus = [ 1 2;...
                2 2;...
                3 1;...
                4 3;...
                5 1;...
               10 2;...
               99 1];
    mpc.branch = [ 1  2;...
                   1  3;...
                   2  4;...
                  10  5;...
                  10 99];
    mpc.bus = sortrows(mpc.bus,2);
    mpc_1 = mpc;
    mpc_tmp = mpc.branch;
    for I=1:size(mpc.bus,1)
        PAIRS(I,1) = I;
        PAIRS(I,2) = mpc.bus(I,1);
        mpc.branch(mpc_tmp(:,1:2)==mpc.bus(I,1)) = I;
        mpc.bus(I,1) = I;
    end
    mpc_2 = mpc;
    % (a) the following mpc_tmp is only needed if you want to truly reverse the operation
    mpc_tmp = mpc.branch;
    %
    % do some stuff
    %
    for I=1:size(mpc.bus,1)
        % (b) you can decide not to use the following line, then comment the line below (a)
        mpc.branch(mpc_tmp(:,1:2)==mpc.bus(I,1)) = PAIRS(I,2);
        mpc.bus(I,1) = PAIRS(I,2);
    end
    % uncomment the following line, if you commented (a) and (b) above:
    % mpc.branch = mpc_tmp;
    mpc.bus = sortrows(mpc.bus,1);
    mpc_3 = mpc;
The minimal example above can be executed as is. The three outputs (mpc_1, mpc_2 & mpc_3) are just in place to demonstrate the workings of the code but are otherwise not necessary.
1.) mpc.bus is ordered using sortrows, simplifying the approach and not using find three times. It targets the second column of mpc.bus and sorts the remaining matrix accordingly.
2.) The original contents of mpc.branch are stored.
3.) A loop is used to replace the entries in the first column of mpc.bus with ascending numbers while at the same time replacing them correspondingly in mpc.branch. Here, the reference to mpc_tmp is necessary to ensure a correct replacement of the elements: comparing against the untouched copy prevents an already renumbered entry from being matched and replaced a second time.
4.) Afterwards, mpc.branch can be reverted analogously to (3.) - here, one might argue, that if the original mpc.branch was stored earlier on, one could just copy the matrix. Also, the original values of mpc.bus are re-assigned.
5.) Now, sortrows is applied to mpc.bus again, this time with the first column as reference to restore the original format.
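The five steps above amount to building a mapping between old and new bus numbers and applying it in both directions. As a hedged illustration of that idea only, here is a sketch in Python with plain lists instead of MATLAB matrices; the function names renumber and restore are invented, and, like the sortrows call in step 5, it assumes the original bus rows were ordered by bus number.

```python
def renumber(bus, branch):
    """Sort buses by type, number them 1..n, and relabel branch endpoints to match."""
    ordered = sorted(bus, key=lambda row: row[1])  # stable: ties keep original order
    old_ids = [row[0] for row in ordered]  # old_ids[k] is the old number of new bus k+1
    new_of = {old: new for new, old in enumerate(old_ids, start=1)}
    new_bus = [[new_of[row[0]]] + row[1:] for row in ordered]
    new_branch = [[new_of[r[0]], new_of[r[1]]] + r[2:] for r in branch]
    return new_bus, new_branch, old_ids


def restore(bus, branch, old_ids):
    """Undo renumber(): map numbers back and sort buses by their original number."""
    old_of = {new: old for new, old in enumerate(old_ids, start=1)}
    old_bus = sorted(([old_of[row[0]]] + row[1:] for row in bus), key=lambda row: row[0])
    old_branch = [[old_of[r[0]], old_of[r[1]]] + r[2:] for r in branch]
    return old_bus, old_branch


bus = [[1, 2], [2, 2], [3, 1], [4, 3], [5, 1], [10, 2], [99, 1]]
branch = [[1, 2], [1, 3], [2, 4], [10, 5], [10, 99]]
new_bus, new_branch, old_ids = renumber(bus, branch)
back_bus, back_branch = restore(new_bus, new_branch, old_ids)
print(new_bus)
print(new_branch)
```

Only the list old_ids has to be kept between the two calls, and because every number is translated through the mapping in a single pass, no temporary copy of the branch data is needed.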
