Append data to new cell arrays using token arrays

I have a problem that I couldn't solve. I have a set of data stored in lines (generally text organized into a number of sentences).
Example of my text, one sentence per line:
1. Hello, world, It, is, beautiful, to, see, you, all
2. ,Wishing, you, happy, day, ahead
I am using strtok:
[token, remain] = strtok(remain, ', ');
% token = strtrim(token);
CellArray{NumberOFCells} = token;
NumberOFCells = NumberOFCells + 1;
I use CellArray to store each token in its own cell. However, my code puts the tokens of the first sentence into cells, and when it moves on to the second sentence it overwrites the previously assigned cells with the tokens of the second sentence.
Expected output:
[ nxn ] [ nxn ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ......
'Hello' 'world' 'It' 'is' 'beautiful' 'to' 'see' 'you' 'all' 'Wishing' 'you' 'happy' 'day' 'ahead'
The question is: how can I append the strings of the second sentence to the cells without clearing the previously filled cells?
Thank you; I look forward to hearing from expert MATLAB programmers.
My code is below (ignore the commented lines). Retrieved has the form shown after the code.
[Index,Retrieved] = system(['wn ' keyword type ]);
Retrieved;
arrowSymbol = ' => ';
CommaSymbol = ', ';
NumberOfSense= 'Sense ';
% let's look for the lines with '=> ' only?
senses = regexp(Retrieved, [arrowSymbol '[\w, ]*\n '], 'match');
SplitIntoCell = regexp(senses, [CommaSymbol '[\w, ]*\n'], 'match');
% now, we take out the '=> ' symbol
for i = 1:size(senses, 2)
senses{i} = senses{i}(size(arrowSymbol,2):end);
SplitIntoCell{i}= SplitIntoCell{i}(size(CommaSymbol,2): end);
% SeperateCells= senses ([1:2 ; end-1:end]);
% SplitCellContentIntoSingleRows{i}= strtok (SeperateCells, ['\n' ])
numberCommas = size(regexp(senses{i}, CommaSymbol), 2);
remain = senses{i};
RestWord= SplitIntoCell{i};
NumberOFCells=1;
for j = 2:numberCommas + 1 + 1 % 1 for a word after last comma and 1 because starts at index 2
% RemoveCellComma= regexp (Conversion,',');
% CellArray = [CellArray strsplit(remain, ', ')];
% [str,~] = regexp(remain,'[^, \t]+', 'match', 'split');
% CellArray = [CellArray str];
% [token remain] = strtok(remain, ', ');
% token = strtrim(token);
% CellArray {NumberOFCells} = token(1:end) ;
%
% % CellArray =[CellArray strsplit(remain, ', ')]
% [str, ~]= regexp(remain,'[^, \t]+', 'match', 'split');
% CellArray = [CellArray str];
% NumberOFCells= NumberOFCells+1;
[token, remain] = strtok(remain, ', ');
token = strtrim(token);
CellArray{NumberOFCells} = token;
NumberOFCells = NumberOFCells + 1;
end
end
Retrieved=
cat, true cat
=> feline, felid
=> carnivore
=> placental, placental mammal, eutherian, eutherian mammal
=> mammal, mammalian
=> vertebrate, craniate
=> chordate
=> animal, animate being, beast, brute, creature, fauna
=> organism, being
=> living thing, animate thing
=> object, physical object
=> physical entity
=> entity

Your question is a little confusing, but reading it (and other comments) a couple of times, I think I understand what you're asking.
Eitan T is correct about using regexp for this, and when it comes to cell arrays, be careful of the difference in indexing/concatenation with [] and {}: see Combining Cell Arrays. Assuming you're using a loop to go through each sentence, you can do something like:
CellArray = [CellArray strsplit(next_sentence, ', ')];
Using regexp (or its case-insensitive alternative regexpi), try adding 'split' as another function option, for example:
[str,~] = regexp(next_sentence,'[^, \t]+', 'match', 'split');
CellArray = [CellArray str];
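For comparison only, here is a minimal Python sketch (not MATLAB) of the append-instead-of-overwrite idea: the accumulator list is created once, before the loop over sentences, and each sentence's tokens are added to its end, which is what CellArray = [CellArray str]; does above. In the question's code, NumberOFCells=1; sits inside the outer loop, so every new sentence starts writing at cell 1 again. The name collect_tokens is hypothetical; the pattern is the same [^, \t]+ used in the answer.

```python
import re

def collect_tokens(sentences):
    """Split each sentence on commas/spaces/tabs and append every token to one flat list."""
    tokens = []  # created once, outside the loop, so earlier sentences are kept
    for sentence in sentences:
        # Runs of anything except comma, space or tab, like regexp(..., '[^, \t]+', 'match')
        tokens.extend(re.findall(r"[^, \t]+", sentence))
    return tokens

sentences = [
    "Hello, world, It, is, beautiful, to, see, you, all",
    ",Wishing, you, happy, day, ahead",
]
print(collect_tokens(sentences))
```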

Booleans, arrays, and not typing 256 possible scenarios

I'm trying to make a program based around 8 boolean statements.
I build the array: array = [0,0,0,0,0,0,0,0];
For each possible combination I need the program to output a different text.
To make things simpler, I can drop any combination that contains fewer than 3 true statements.
For example: if (array === [1,1,1,0,0,0,0,0]){console.log('Targets: 4, 5, 6, 7')};
Is it possible to set it up so that, whenever a value is false, its position is added to the end of "Targets: "? I'm very new to coding as a hobby and have only written one extensive program. I feel like {console.log("Targets: " + if(array[0]===0){console.log(" 1,")} + if(array[2]===0)...} would portray what I'm looking for, but it's terrible as code.
I'm sure someone has had this issue before, but I don't think I'm experienced enough to search with the right keywords.
PS: I'd greatly appreciate it if we could stick to the very basics, as I haven't had any luck installing anything other than discord.js.
This does what you need:
const values = [1,1,1,0,0,0,0,0];
const positions = values.map((v, i) => !v ? i + 1 : null).filter(v => v != null);
console.log('Targets: ' + positions.join(', '));
In essence:
Map each value to its 1-based position if the value is falsy (0 is considered falsy), otherwise map it to null.
Filter out all null values.
Join all remaining positions into a string.
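A minimal Python sketch of the same map, filter and join steps, for illustration only. It numbers positions from 1, an assumption based on the question's array[0] giving " 1,"; the function name false_positions is hypothetical.

```python
def false_positions(values):
    """Return the 1-based positions of every falsy (0) entry."""
    # enumerate(..., start=1) yields 1-based positions; the filter keeps only falsy values
    return [i for i, v in enumerate(values, start=1) if not v]

values = [1, 1, 1, 0, 0, 0, 0, 0]
print("Targets: " + ", ".join(str(p) for p in false_positions(values)))
```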
To address your additional requirements:
const locations = ['Trees', 'Rocks', 'L1', 'R1', 'L2', 'R2', 'L3', 'R3'];
const values = [1,1,1,0,0,0,0,0];
const result = values.map((v, i) => !v ? locations[i] : null).filter(v => v != null);
console.log('Target: ' + result.join(', '));
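Extending the idea to the labelled version, again as a Python sketch with hypothetical names: it applies the question's rule that combinations with fewer than 3 true values can be ignored, returning None for them, and otherwise lists the labels of the false entries.

```python
LOCATIONS = ["Trees", "Rocks", "L1", "R1", "L2", "R2", "L3", "R3"]

def describe_targets(values, labels=LOCATIONS, min_true=3):
    """Label every false entry; return None when fewer than min_true entries are true."""
    if sum(1 for v in values if v) < min_true:
        return None  # the question says these combinations can be skipped
    targets = [label for label, v in zip(labels, values) if not v]
    return "Targets: " + ", ".join(targets)

print(describe_targets([1, 1, 1, 0, 0, 0, 0, 0]))
print(describe_targets([1, 0, 0, 0, 0, 0, 0, 1]))
```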

Scala read only certain parts of file

I'm trying to read an input file in Scala that I know the structure of, however I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue is that this leaves me with an array that is huge (we're talking 20 GB of data). Not only have I been forced to write some very ugly code to convert between RDD[Array[String]] and Array[String], but it has essentially made my code useless.
I've tried different approaches and mixes between using
.map()
.flatMap() and
.reduceByKey()
however, none of them actually put my collected "cells" into the format that I need.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and keep only the stock_symbol, as that is the identifier I'm counting. So far my attempt has been to turn the entire thing into an array and collect every 9th index, starting from the first one, into a collected_cells var. The issue is that, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SparkNum {
def main(args: Array[String]) {
// Do some Scala voodoo
val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))
// Set input file as per HDFS structure + input args
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
var collected_cells:Array[String] = new Array[String](0)
//println("[MESSAGE] Length of CC: " + collected_cells.length)
val divider:Long = 9
val array_length = fields.count / divider
val casted_length = array_length.toInt
val indexedFields = fields.zipWithIndex
val indexKey = indexedFields.map{case (k,v) => (v,k)}
println("[MESSAGE] Number of lines: " + array_length)
println("[MESSAGE] Casted length of: " + casted_length)
for( i <- 1 to casted_length ) {
println("[URGENT DEBUG] Processing line " + i + " of " + casted_length)
var index = 9 * i - 8
println("[URGENT DEBUG] Index defined to be " + index)
collected_cells :+ indexKey.lookup(index)
}
println("[MESSAGE] collected_cells size: " + collected_cells.length)
val single_cells = collected_cells.flatMap(collected_cells => collected_cells);
val counted_cells = single_cells.map(cell => (cell, 1).reduceByKey{case (x, y) => x + y})
// val result = counted_cells.reduceByKey((a,b) => (a+b))
// val inmem = counted_cells.persist()
//
// // Collect driver into file to be put into user archive
// inmem.saveAsTextFile("path to server location")
// ==> Not necessary to save the result as processing time is recorded, not output
}
}
The bottom part is currently commented out because I was debugging, but it acts as pseudo-code so I know what needs to be done. I should point out that I am barely familiar with Scala, so things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array the size of the data. That expression represents a transformation of the base data, which can be further transformed until we reduce it to the information we want.
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange, which starts with a letter. We will do this before any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write the types of the variables, to have peace of mind that we have the data types that we expect. As we progress in our Scala skills that might become less important. Let's rewrite the expression above with types:
import org.apache.spark.rdd.RDD
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols do we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
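To make the pipeline concrete, here is a plain-Python analogue (no Spark) of the filter, split, field #1, count-by-key chain. The rows are made up for illustration, and the .strip() call is an addition of this sketch, assuming fields are separated by ", " as in the question's header line; the Scala code above does not trim.

```python
from collections import Counter

def count_by_symbol(lines):
    """Drop banner lines, split on commas, keep field #1 and count records per symbol."""
    # Generators are lazy: nothing runs until Counter consumes them,
    # loosely like RDD transformations that only execute when an action is called.
    valid = (line for line in lines if line and line[0].isalpha())  # lines.filter(...)
    fields = (line.split(",") for line in valid)                     # validLines.map(_.split(","))
    symbols = (record[1].strip() for record in fields)               # fields.map(record => record(1))
    return Counter(symbols)                                           # map((s, 1)).reduceByKey(_ + _)

# Hypothetical rows in the exchange, stock_symbol, date, ... layout from the question
SAMPLE_LINES = [
    "*---------*",
    "| NASDAQ: |",
    "*---------*",
    "NASDAQ, AAA, 2000-01-03, 1.0, 1.2, 0.9, 1.1, 100, 1.1",
    "NASDAQ, BBB, 2000-01-03, 2.0, 2.1, 1.9, 2.0, 200, 2.0",
    "NASDAQ, AAA, 2000-01-04, 1.1, 1.3, 1.0, 1.2, 150, 1.2",
]

print(count_by_symbol(SAMPLE_LINES))
```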
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (which is common in large datasets), we need to provide the column names, one per field:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "high", "low", "close", "volume", "adj_close")
We can now answer our questions very easily:
import sparkSession.implicits._              // enables the $"column" syntax
import org.apache.spark.sql.functions.count
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
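The DataFrame step relies on supplying names for a header-less CSV. A rough plain-Python analogue using the standard csv module is below; the column names come from the question's field list, while the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Names follow the question's field list; the CSV itself has no header row
COLUMNS = ["exchange", "symbol", "date", "open", "high", "low", "close", "volume", "adj_close"]

def summarize(csv_text):
    """Return (number of distinct symbols, records per symbol) for header-less CSV text."""
    # fieldnames= supplies the names (like .toDF(...)); skipinitialspace drops the blank after each comma
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS, skipinitialspace=True)
    per_symbol = Counter(row["symbol"] for row in reader)
    return len(per_symbol), dict(per_symbol)

SAMPLE_CSV = (
    "NASDAQ, AAA, 2000-01-03, 1.0, 1.2, 0.9, 1.1, 100, 1.1\n"
    "NASDAQ, BBB, 2000-01-03, 2.0, 2.1, 1.9, 2.0, 200, 2.0\n"
    "NASDAQ, AAA, 2000-01-04, 1.1, 1.3, 1.0, 1.2, 150, 1.2\n"
)
print(summarize(SAMPLE_CSV))
```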

Combining a set of reversable paths

I need some help recognizing this problem and finding a solution. I don't need someone to code the solution, just to say how to go about solving it.
An array of hashes, each hash containing one path, its ID, and its order (forward (F) or reverse (R))
Each path is initialized in the F position
my @paths = (
{ id => 1, path => [ qw[ A B ] ], order => 'F' },
{ id => 2, path => [ qw[ C D E ] ], order => 'F' },
{ id => 3, path => [ qw[ E B ] ], order => 'F' }
);
Each node or vertex of each path also has an orientation ( + or - )
my %plus_minus;
$plus_minus{1}{A} = '+';
$plus_minus{1}{B} = '+';
$plus_minus{2}{C} = '+';
$plus_minus{2}{D} = '-';
$plus_minus{2}{E} = '-';
$plus_minus{3}{E} = '-';
$plus_minus{3}{B} = '-';
You can reverse the order of a path ( e.g., [A, B] to [B, A] )
When you reverse order from F => R or R => F you also switch the orientation of each node in the path from + to - or - to +
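The reversal rule can be written as a small function. This Python sketch assumes each node is stored as its name followed by a single + or - character (the representation the Perl answer builds); reverse_path is a hypothetical name.

```python
def reverse_path(nodes):
    """Reverse a signed path and flip every orientation: ['A+', 'B+'] -> ['B-', 'A-']."""
    flip = {"+": "-", "-": "+"}
    # node[:-1] is the node name, node[-1] its orientation character
    return [node[:-1] + flip[node[-1]] for node in reversed(nodes)]

print(reverse_path(["C+", "D-", "E-"]))
```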
The paths with orientations look like this:
A+ : B+
C+ : D- : E-
E- : B-
This is the problem input
For output, I'd like to know whether it is possible, by reversing path orders, to create a consensus path, and how to do this so that you are guaranteed to find it.
For example, if we reversed path 1 we'd get:
B- : A-
C+ : D- : E-
E- : B-
and the resulting consensus path would be:
C+ : D- : E- : B- : A-
But it's not obvious that path 1 is the one to reverse first. For example, what if we reverse path 3 first? So you can't proceed randomly.
Does anyone recognize this problem or know how to solve it?
What you're asking for isn't easy, and I'm not entirely clear about your requirements.
This partial solution takes the brute-force approach of creating a directed graph, adding all the paths from your data and their reversals, and finding the longest path in the resulting data structure.
Using your sample data, it produces either the consensus path that you expect or its reverse: according to your rules there will always be two equally valid answers if there are any at all, and because of the random ordering of Perl hashes, either one may be presented as the result from one run to the next.
If I have understood you correctly, then you also need to ensure that the result contains all of the paths in the original data.
use strict;
use warnings 'all';
use feature 'say';
use Graph::Directed;
use List::Util 'max';
use List::MoreUtils 'first_index';
my @paths = (
{ id => 1, path => [ qw[ A B ] ], order => 'F' },
{ id => 2, path => [ qw[ C D E ] ], order => 'F' },
{ id => 3, path => [ qw[ E B ] ], order => 'F' },
);
my %plus_minus;
$plus_minus{1}{A} = '+';
$plus_minus{1}{B} = '+';
$plus_minus{2}{C} = '+';
$plus_minus{2}{D} = '-';
$plus_minus{2}{E} = '-';
$plus_minus{3}{E} = '-';
$plus_minus{3}{B} = '-';
# index the array by ID
#
my %paths;
$paths{$_->{id}} = $_ for @paths;
# Incorporate the inexplicably separate plus-minus data
#
for my $id ( keys %plus_minus ) {
my $nodes = $plus_minus{$id};
for my $node ( keys %$nodes ) {
my $sign = $nodes->{$node};
my $nodes = $paths{$id}{path};
my $i = first_index { $_ eq $node } @$nodes;
die sprintf "Node $node not found in path ID $id" if $i < 0;
$nodes->[$i] .= $sign;
}
}
# Add the reverse paths to the hash:
# - Change the `order` field to `R` (original is reliably `F`)
# - Reverse the order of the elements of `path`
# - Reverse the sign of the elements of `path`
#
my $n = max map { $_->{id} } values %paths;
for my $path ( @paths ) {
my $nodes = $path->{path};
my $new_id = ++$n;
$paths{$new_id} = {
id => $new_id,
order => 'R',
path => [
map {
s/([+-])/ $1 eq '+' ? '-' : '+' /er or die;
} reverse @$nodes
],
};
}
# Build the directed graph
#
my $g = Graph::Directed->new;
for my $path ( values %paths ) {
my $nodes = $path->{path};
for my $i ( 0 .. $#$nodes - 1 ) {
$g->add_edge(@{$nodes}[$i, $i+1]);
}
}
# Report the longest path
#
say join ' : ', $g->longest_path;
output
C+ : D- : E- : B- : A-
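A rough Python translation of the same brute-force idea, without Graph::Directed: add every path and its reversal as edges, then take the longest path with a memoised depth-first search. This only works if the resulting graph is acyclic (an assumption; a cycle would make the recursion fail), and like the Perl version it returns one of the two mirror-image answers. All names are hypothetical.

```python
from functools import lru_cache

def reverse_path(nodes):
    """Reverse a signed path and flip each node's trailing +/- orientation."""
    flip = {"+": "-", "-": "+"}
    return [n[:-1] + flip[n[-1]] for n in reversed(nodes)]

def consensus_path(paths):
    """Add each path and its reversal as edges, then return the longest path (graph must be a DAG)."""
    edges = {}
    for path in paths:
        for p in (path, reverse_path(path)):
            for a, b in zip(p, p[1:]):
                edges.setdefault(a, set()).add(b)
                edges.setdefault(b, set())  # make sure every node has an entry

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest path starting at node; terminates only if there are no cycles
        best = (node,)
        for nxt in sorted(edges[node]):
            candidate = (node,) + longest_from(nxt)
            if len(candidate) > len(best):
                best = candidate
        return best

    return max((longest_from(n) for n in sorted(edges)), key=len)

print(" : ".join(consensus_path([["A+", "B+"], ["C+", "D-", "E-"], ["E-", "B-"]])))
```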

Filtering a Cell Array with Recursion

I'm pretty close on this problem. I have to filter a cell array: it can contain a variety of items, but I want to pull out the strings, using recursion. I just have an issue when the cells have spaces in them. This is what I should get:
Test Cases:
cA1 = {'This' {{{[1:5] true} {' '}} {'is '} false true} 'an example.'};
[filtered1] = stringFilter(cA1)
filtered1 => 'This is an example.'
cA2 = {{{{'I told '} 5:25 'her she'} {} [] [] ' knows'} '/take aim and reload'};
[filtered2] = stringFilter(cA2)
filtered2 => 'I told her she knows/take aim and reload'
Here is what I have:
%find the strings in the cArr and then concatenate them.
function [Str] = stringFilter(in)
Str = [];
for i = 1:length(in)
%The base case is a single cell
if length(in) == 1
Str = ischar(in{:,:});
%if the length>1 than go through each cell and find the strings.
else
str = stringFilter(in(1:end-1));
if ischar(in{i})
Str = [Str in{i}];
elseif iscell(in{i})
str1 = stringFilter(in{i}(1:end-1));
Str = [Str str1];
end
end
end
end
I tried to use 'ismember', but that didn't work. Any suggestions? My code outputs the following:
filtered1 => 'This an example.'
filtered2 => '/take aim and reload'
You can simplify your function quite a bit:
function [Str] = stringFilter(in)
Str = [];
for i = 1:length(in)
if ischar(in{i})
Str = [Str in{i}];
elseif iscell(in{i})
str1 = stringFilter(in{i});
Str = [Str str1];
end
end
end
Just loop through all elements in the cell and test whether each one is a string or a cell. In the latter case, call the function on that cell again. Output:
>> [filtered1] = stringFilter(cA1)
filtered1 =
This is an example.
>> [filtered2] = stringFilter(cA2)
filtered2 =
I told her she knows/take aim and reload
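The same loop-and-recurse structure in Python, as an illustration only: nested Python lists stand in for cell arrays (both {} and [] from the test cases become lists), and anything that is not a string or a list contributes nothing. string_filter is a hypothetical name.

```python
def string_filter(items):
    """Concatenate every string found in an arbitrarily nested list, in order."""
    out = []
    for item in items:
        if isinstance(item, str):
            out.append(item)                 # strings are kept as-is, spaces included
        elif isinstance(item, list):
            out.append(string_filter(item))  # recurse into a nested "cell"
        # numbers, booleans and ranges contribute nothing
    return "".join(out)

cA1 = ["This", [[[list(range(1, 6)), True], [" "]], ["is "], False, True], "an example."]
cA2 = [[[["I told "], list(range(5, 26)), "her she"], [], [], [], " knows"], "/take aim and reload"]
print(string_filter(cA1))
print(string_filter(cA2))
```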
Here is a different implementation
function str = stringFilter(in)
if ischar(in)
str = in;
elseif iscell(in) && ~isempty(in)
str = cell2mat(cellfun(@stringFilter, in(:)', 'uni', 0));
else
str = '';
end
end
If it's a string, return it. If it's a cell, apply the same function to all of its elements and concatenate the results. Here I use in(:)' to make sure it is a row vector, and then cell2mat concatenates the resulting strings. If the type is anything else, return an empty string. We need to check whether the cell array is empty because cell2mat({}) is of type double.
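A Python sketch of this second, functional style, under the same list-for-cell assumption: joining an empty sequence already gives an empty string, so the empty-cell check that cell2mat needs has no counterpart here. The name string_filter is hypothetical.

```python
def string_filter(x):
    """Return x if it is a string, the joined results for a list, and '' for anything else."""
    if isinstance(x, str):
        return x
    if isinstance(x, list):
        # "".join of an empty sequence is "", so empty lists need no special case
        return "".join(string_filter(item) for item in x)
    return ""  # numbers, booleans and other types contribute nothing

print(string_filter(["This", [[" "], ["is "], False], "an example."]))
```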
The line
Str = ischar(in{:,:});
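is the problem. It doesn't make any sense to me: ischar only returns a logical (true/false) saying whether its argument is a character array, not the string itself, so Str ends up holding a logical value instead of text.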
You're close to getting the answer, but you made a few small but significant mistakes.
You need to check for these things:
1. Loop over the cells of the input.
2. For each cell, see if it itself is a cell, if so, call stringFilter on the cell's VALUE
3. if it is not a cell but is a character array, then use its VALUE as it is.
4. Otherwise, if the cell's VALUE is not a character array, the contribution of that cell to the output is '' (blank).
I think you made a mistake by not taking advantage of the difference between in(1) and in{1}.
Anyway, here's my version of the function. It works.
function [out] = stringFilter(in)
out = [];
for idx = 1:numel(in)
if iscell (in{idx})
% Contents of the cell is another cell array
tempOut = stringFilter(in{idx});
elseif ischar(in{idx})
% Contents are characters
tempOut = in{idx};
else
% Contents are not characters
tempOut = '';
end
% Concatenate the current output to the overall output
out = [out, tempOut];
end
end

Reading TDM (Diadem) files from script

My customer is sending TDM/TDX files captured in National Instruments Diadem, which I haven't got. I'm looking for a way to convert the files into .CSV, XLS or .MAT files for analysis in Matlab (without using Diadem or Diadem DLLs!)
The format consists of a well-structured XML file (.TDM) and a binary file (.TDX), with the .TDM defining how fields are packed in the binary TDX. I'd like to read the files (for use in MATLAB and other environments). Does anyone have a general-purpose tool or conversion script, for instance in Python or Perl (not using the NI DLLs), or directly in MATLAB?
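To illustrate the header/binary split, a minimal Python sketch that reads the block descriptors out of a TDM header with the standard xml.etree.ElementTree module. The element and attribute names (include, file, block, id, byteOffset, length, valueType) are taken from the MATLAB script in one of the answers; the namespace URI and sample document are hypothetical, and the {*} wildcard needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def read_block_descriptors(tdm_xml):
    """Collect byteOffset, length and valueType of every <block> element, keyed by id."""
    root = ET.fromstring(tdm_xml)
    blocks = {}
    # '{*}block' matches <block> in any namespace or none (Python 3.8+)
    for block in root.findall(".//{*}block"):
        blocks[block.get("id")] = {
            "byteOffset": int(block.get("byteOffset")),
            "length": int(block.get("length")),
            "valueType": block.get("valueType"),
        }
    return blocks

# Hypothetical, heavily trimmed header; a real TDM file contains much more
SAMPLE_TDM = """<usi:tdm xmlns:usi="urn:example:usi">
  <usi:include>
    <file>
      <block id="inc0" byteOffset="0" length="3" valueType="eFloat64Usi"/>
      <block id="inc1" byteOffset="24" length="2" valueType="eInt32Usi"/>
    </file>
  </usi:include>
</usi:tdm>"""

print(read_block_descriptors(SAMPLE_TDM))
```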
I've looked into buying the tool, but didn't like it for anything other than one-time conversion to a compatible file format.
Thanks!
I know this is a little late, but I have a simple library to read TDM/TDX files in Python. It works by parsing the TDM file to figure out the data type, then using numpy.memmap to open the TDX file, so the data can be used like a standard NumPy array. The code is pretty simple, so you could probably implement something similar in MATLAB.
Here's the link: https://bitbucket.org/joshayers/tdm_loader
Hope that helps.
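A stdlib-only sketch of the TDX side, using struct instead of numpy.memmap: it seeks to a block's byte offset and unpacks length values of the given type. The type names come from the Dtype table in the MATLAB script below; little-endian byte order is an assumption, so check it against your TDM header. read_tdx_block is a hypothetical name.

```python
import struct

# TDM valueType names (from the script's Dtype table) mapped to struct format codes
VALUE_TYPES = {
    "eInt8Usi": "b", "eInt16Usi": "h", "eInt32Usi": "i", "eInt64Usi": "q",
    "eUInt8Usi": "B", "eUInt16Usi": "H", "eUInt32Usi": "I", "eUInt64Usi": "Q",
    "eFloat32Usi": "f", "eFloat64Usi": "d",
}

def read_tdx_block(path, byte_offset, length, value_type, byte_order="<"):
    """Read `length` values of `value_type` starting at `byte_offset` of a TDX file."""
    # '<' = little-endian, standard sizes, no padding (assumed byte order)
    fmt = f"{byte_order}{length}{VALUE_TYPES[value_type]}"
    size = struct.calcsize(fmt)
    with open(path, "rb") as f:
        f.seek(byte_offset)
        data = f.read(size)
    if len(data) != size:
        raise ValueError(f"expected {size} bytes at offset {byte_offset}, got {len(data)}")
    return list(struct.unpack(fmt, data))
```

Combined with the block descriptors from the TDM header, each channel becomes one call with that block's byteOffset, length and valueType.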
Maybe a little too late, but I think there is a simple way to get the data from TDM files: NI provides plug-ins for reading TDM files into Excel and OpenOffice Calc. Having the data in one of these programs you could use the CSV export. Search google for "tdm excel" or "tdm openoffice".
Hope this helps...
Gemue
The following script converts all channels into fields of a struct named 'variable'.
CurrDirectory = '...//'; % Path to current directory
fileNametdx = '.../utility/'; % Path to TDX file
%%
% Data type conversion
Dtype.eInt8Usi='int8';
Dtype.eInt16Usi='int16';
Dtype.eInt32Usi='int32';
Dtype.eInt64Usi='int64';
Dtype.eUInt8Usi='uint8';
Dtype.eUInt16Usi='uint16';
Dtype.eUInt32Usi='uint32';
Dtype.eUInt64Usi='uint64';
Dtype.eFloat32Usi='single';
Dtype.eFloat64Usi='double';
%% Read .tdx file Name
wb=waitbar(0,'Reading *.tdx Files');
fileNameTDM = strrep(fileNametdx,'.tdx','.TDM');
%% Read .TDM
tdm=xml2struct(fileNameTDM);
for i=1:numel(tdm.usi_colon_tdm.usi_colon_data.tdm_channel)
waitbar((1/numel(tdm.usi_colon_tdm.usi_colon_data.tdm_channel))*i,wb,['File ' fileNametdx ' conversion started']);
s1=strsplit(string(tdm.usi_colon_tdm.usi_colon_data.tdm_channel{1, i}.local_columns.Text),'"');
usi1=s1(2);
% if condition match untill we get usi2
for j=1:numel(tdm.usi_colon_tdm.usi_colon_data.localcolumn)
usi2=string(tdm.usi_colon_tdm.usi_colon_data.localcolumn{1, j}.Attributes.id);
if usi1==usi2
%take new usi
s2=strsplit(string(tdm.usi_colon_tdm.usi_colon_data.localcolumn{1, j}.values.Text),'"');
new_usi1=s2(2);
w1=strsplit(string(tdm.usi_colon_tdm.usi_colon_data.tdm_channel{1, i}.datatype.Text),'_');
str_1=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence'));
str_2=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.Attributes.id'));
str_3=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.values.Attributes.external'));
str_4=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.values'));
for k=1:numel(eval(str_1))
new_usi2=string(eval(str_2));
if new_usi1==new_usi2
if isfield(eval(str_4), 'Attributes')
inc_value1=string(eval(str_3));
for m=1:numel(tdm.usi_colon_tdm.usi_colon_include.file.block)
inc_value2=string(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.id);
if inc_value1==inc_value2
% offset=round(str2num(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.byteOffset)/8);
% renamed from length/m so the builtin length and the loop variable m are not shadowed
blockLength = round(str2double(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.length));
offset1 = round(str2double(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.byteOffset));
value_type = tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.valueType;
mm = memmapfile(fullfile(CurrDirectory,fileNametdx),'Offset',offset1,'Format',{Dtype.(value_type) [blockLength 1] 'dat'},'Writable',false,'Repeat',1);
dat = mm.Data.dat;
end
end
else
str_5=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.values.',char(fieldnames(tdm.usi_colon_tdm.usi_colon_data.string_sequence{1, k}.values))));
dat=eval(str_5)';
end
name_variable = string(tdm.usi_colon_tdm.usi_colon_data.tdm_channel{1, i}.name.Text);
varname = genvarname(char(name_variable));
variable.(varname) = dat;
end
end
end
end
end
waitbar(1,wb,[fileNametdx ' conversion completed']);
pause(1)
close(wb)
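% WARNING: the next line deletes the source .tdx and .TDM files after conversion; remove it to keep them
delete(fullfile(CurrDirectory,fileNametdx),fullfile(CurrDirectory,fileNameTDM));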
%Output Variable is Struct
clearvars -except variable
This script requires the following XML parser (xml2struct):
function [ s ] = xml2struct( file )
%Convert xml file into a MATLAB structure
% [ s ] = xml2struct( file )
%
% A file containing:
% <XMLname attrib1="Some value">
% <Element>Some text</Element>
% <DifferentElement attrib2="2">Some more text</DifferentElement>
% <DifferentElement attrib3="2" attrib4="1">Even more text</DifferentElement>
% </XMLname>
%
% Will produce:
% s.XMLname.Attributes.attrib1 = "Some value";
% s.XMLname.Element.Text = "Some text";
% s.XMLname.DifferentElement{1}.Attributes.attrib2 = "2";
% s.XMLname.DifferentElement{1}.Text = "Some more text";
% s.XMLname.DifferentElement{2}.Attributes.attrib3 = "2";
% s.XMLname.DifferentElement{2}.Attributes.attrib4 = "1";
% s.XMLname.DifferentElement{2}.Text = "Even more text";
%
% Please note that the following characters are substituted
% '-' by '_dash_', ':' by '_colon_' and '.' by '_dot_'
%
% Written by W. Falkena, ASTI, TUDelft, 21-08-2010
% Attribute parsing speed increased by 40% by A. Wanner, 14-6-2011
% Added CDATA support by I. Smirnov, 20-3-2012
%
% Modified by X. Mo, University of Wisconsin, 12-5-2012
if (nargin < 1)
clc;
help xml2struct
return
end
if isa(file, 'org.apache.xerces.dom.DeferredDocumentImpl') || isa(file, 'org.apache.xerces.dom.DeferredElementImpl')
% input is a java xml object
xDoc = file;
else
%check for existence
if (exist(file,'file') == 0)
%Perhaps the xml extension was omitted from the file name. Add the
%extension and try again.
if (isempty(strfind(file,'.xml')))
file = [file '.xml'];
end
if (exist(file,'file') == 0)
error(['The file ' file ' could not be found']);
end
end
%read the xml file
xDoc = xmlread(file);
end
%parse xDoc into a MATLAB structure
s = parseChildNodes(xDoc);
end
% ----- Subfunction parseChildNodes -----
function [children,ptext,textflag] = parseChildNodes(theNode)
% Recurse over node children.
children = struct;
ptext = struct; textflag = 'Text';
if hasChildNodes(theNode)
childNodes = getChildNodes(theNode);
numChildNodes = getLength(childNodes);
for count = 1:numChildNodes
theChild = item(childNodes,count-1);
[text,name,attr,childs,textflag] = getNodeData(theChild);
if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
%XML allows the same elements to be defined multiple times,
%put each in a different cell
if (isfield(children,name))
if (~iscell(children.(name)))
%put existing element into cell format
children.(name) = {children.(name)};
end
index = length(children.(name))+1;
%add new element
children.(name){index} = childs;
if(~isempty(fieldnames(text)))
children.(name){index} = text;
end
if(~isempty(attr))
children.(name){index}.('Attributes') = attr;
end
else
%add previously unknown (new) element to the structure
children.(name) = childs;
if(~isempty(text) && ~isempty(fieldnames(text)))
children.(name) = text;
end
if(~isempty(attr))
children.(name).('Attributes') = attr;
end
end
else
ptextflag = 'Text';
if (strcmp(name, '#cdata_dash_section'))
ptextflag = 'CDATA';
elseif (strcmp(name, '#comment'))
ptextflag = 'Comment';
end
%this is the text in an element (i.e., the parentNode)
if (~isempty(regexprep(text.(textflag),'[\s]*','')))
if (~isfield(ptext,ptextflag) || isempty(ptext.(ptextflag)))
ptext.(ptextflag) = text.(textflag);
else
%what to do when element data is as follows:
%<element>Text <!--Comment--> More text</element>
%put the text in different cells:
% if (~iscell(ptext)) ptext = {ptext}; end
% ptext{length(ptext)+1} = text;
%just append the text
ptext.(ptextflag) = [ptext.(ptextflag) text.(textflag)];
end
end
end
end
end
end
% ----- Subfunction getNodeData -----
function [text,name,attr,childs,textflag] = getNodeData(theNode)
% Create structure of node info.
%make sure name is allowed as structure name
name = toCharArray(getNodeName(theNode))';
name = strrep(name, '-', '_dash_');
name = strrep(name, ':', '_colon_');
name = strrep(name, '.', '_dot_');
attr = parseAttributes(theNode);
if (isempty(fieldnames(attr)))
attr = [];
end
%parse child nodes
[childs,text,textflag] = parseChildNodes(theNode);
if (isempty(fieldnames(childs)) && isempty(fieldnames(text)))
%get the data of any childless nodes
% faster than if any(strcmp(methods(theNode), 'getData'))
% no need to try-catch (?)
% faster than text = char(getData(theNode));
text.(textflag) = toCharArray(getTextContent(theNode))';
end
end
% ----- Subfunction parseAttributes -----
function attributes = parseAttributes(theNode)
% Create attributes structure.
attributes = struct;
if hasAttributes(theNode)
theAttributes = getAttributes(theNode);
numAttributes = getLength(theAttributes);
for count = 1:numAttributes
%attrib = item(theAttributes,count-1);
%attr_name = regexprep(char(getName(attrib)),'[-:.]','_');
%attributes.(attr_name) = char(getValue(attrib));
%Suggestion of Adrian Wanner
str = toCharArray(toString(item(theAttributes,count-1)))';
k = strfind(str,'=');
attr_name = str(1:(k(1)-1));
attr_name = strrep(attr_name, '-', '_dash_');
attr_name = strrep(attr_name, ':', '_colon_');
attr_name = strrep(attr_name, '.', '_dot_');
attributes.(attr_name) = str((k(1)+2):(end-1));
end
end
end
