I need to convert uploaded binary files to Base64 strings on the fly. I'm using classic ASP with VBScript, and Midori's component for the Base64 conversion. For small files (<20K) the performance is okay, but once a file exceeds 75 or 100K the performance collapses. Is there an efficient way to convert big binary files (2MB) to Base64 string format?
Thanks in advance,
Kenney
I have solved this issue by implementing a .NET component that converts to a Base64 string. The hard part is that the binary data sent from ASP to the .NET COM component is received as a string, while Convert.ToBase64String() accepts only byte[]. So I tried converting the string to byte[].
But the encodings available in .NET (Unicode, ASCII, UTF) don't work for this: data is lost when they are used. I finally got it done with a StringReader object, reading char by char (16 bit) and converting each char to a byte (8 bit) in a byte[] array.
And the performance is very good.
Regards,
Siva.
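The data loss described above can be reproduced outside .NET. The following is a minimal Python sketch, assuming (as the post describes) that every character of the received string carries exactly one byte value in the range 0-255. The variable names are made up for illustration and it is not the component's actual code.

```python
# Every possible byte value, standing in for an uploaded binary file.
raw = bytes(range(256))

# Assumption: the string received by the component holds one byte per character.
as_string = raw.decode("latin-1")

# Narrowing each 16-bit character back to 8 bits recovers the data exactly.
recovered = bytes(ord(ch) for ch in as_string)

# A text encoding changes the data instead of preserving it.
ascii_lossy = as_string.encode("ascii", errors="replace")  # values >= 128 become "?"
utf8_bytes = as_string.encode("utf-8")                     # values >= 128 become two bytes

print(recovered == raw)      # True
print(ascii_lossy == raw)    # False
print(len(utf8_bytes))       # 384 instead of 256
```

The char-by-char narrowing is the only one of the three conversions that returns the original bytes, which matches the behaviour reported in the post.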
You should use the .NET methods Convert.ToBase64String and Convert.FromBase64String.
Use the Convert.FromBase64String( ) method. This will give you the binary
data back (as a byte array).
To convert binary data to a Base64 string in VBScript, see the binary-to-string conversion functions and the encoder
from http://www.motobit.com/tips/detpg_Base64Encode/
Function Base64EncodeBinary(inData)
Base64EncodeBinary = Base64Encode(BinaryToString(inData))
End Function
Function Base64Encode(inData)
'rfc1521
'2001 Antonin Foller, Motobit Software, http://Motobit.cz
Const Base64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
Dim cOut, sOut, I
'For each group of 3 bytes
For I = 1 To Len(inData) Step 3
Dim nGroup, pOut, sGroup
'Create one long from this 3 bytes.
nGroup = &H10000 * Asc(Mid(inData, I, 1)) + _
&H100 * MyASC(Mid(inData, I + 1, 1)) + MyASC(Mid(inData, I + 2, 1))
'Oct splits the long To 8 groups with 3 bits
nGroup = Oct(nGroup)
'Add leading zeros
nGroup = String(8 - Len(nGroup), "0") & nGroup
'Convert To base64
pOut = Mid(Base64, CLng("&o" & Mid(nGroup, 1, 2)) + 1, 1) + _
Mid(Base64, CLng("&o" & Mid(nGroup, 3, 2)) + 1, 1) + _
Mid(Base64, CLng("&o" & Mid(nGroup, 5, 2)) + 1, 1) + _
Mid(Base64, CLng("&o" & Mid(nGroup, 7, 2)) + 1, 1)
'Add the part To OutPut string
sOut = sOut + pOut
'Add a new line For Each 76 chars In dest (76*3/4 = 57)
'If (I + 2) Mod 57 = 0 Then sOut = sOut + vbCrLf
Next
Select Case Len(inData) Mod 3
Case 1: '8 bit final
sOut = Left(sOut, Len(sOut) - 2) + "=="
Case 2: '16 bit final
sOut = Left(sOut, Len(sOut) - 1) + "="
End Select
Base64Encode = sOut
End Function
Function MyASC(OneChar)
If OneChar = "" Then MyASC = 0 Else MyASC = Asc(OneChar)
End Function
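For readers who want to check the grouping logic of the function above, here is a small Python sketch of the same idea: take 3 bytes, treat them as one 24-bit number, and emit four 6-bit indexes into the alphabet. It is an illustration only, not a port of the Motobit code. The pieces are collected in a list and joined once, which avoids the repeated string concatenation that is a likely cause of the slowdown on large inputs in the VBScript version.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_encode(data: bytes) -> str:
    """Encode bytes as Base64 by hand, three input bytes at a time."""
    pieces = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pad a short final chunk with zero bytes so it still forms 24 bits.
        group = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit alphabet indexes.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # n input bytes produce n + 1 meaningful characters; the rest is "=".
        keep = len(chunk) + 1
        pieces.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(pieces)

# Cross-check against the standard library encoder.
sample = bytes(range(256))
print(base64_encode(sample) == base64.b64encode(sample).decode("ascii"))
```

The two padding cases correspond to the Select Case at the end of the VBScript function: one leftover byte ends in "==", two leftover bytes end in "=".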
Use MSXML to do the encoding for you. Here is a function encapsulating the procedure:
Function ToBase64(rabyt)
Dim xml: Set xml = CreateObject("MSXML2.DOMDocument.3.0")
xml.LoadXml "<root />"
xml.documentElement.dataType = "bin.base64"
xml.documentElement.nodeTypedValue = rabyt
ToBase64 = xml.documentElement.Text
End Function
Note that this will include line breaks in the Base64 output, but most Base64 decoders tolerate line breaks. If yours does not, you could simply use Replace(base64, vbLF, "") to remove them; this will still be quicker than a pure VBScript solution.
Edit Example usage:-
Dim sBase64: sBase64 = ToBase64(Request.BinaryRead(Request.TotalBytes))
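The point about line breaks is not specific to MSXML. As a hedged illustration in Python (standard library only, and not a statement about where MSXML itself places its breaks), base64.encodebytes also emits wrapped output, a lenient decoder accepts it unchanged, and stripping the newlines gives the single-line form:

```python
import base64

payload = bytes(range(256)) * 4

# encodebytes wraps its output, inserting a newline after every 76 characters.
wrapped = base64.encodebytes(payload).decode("ascii")

# Removing the line feeds gives the same text a non-wrapping encoder produces.
unwrapped = wrapped.replace("\n", "")

# The default decoder ignores the embedded newlines.
print("\n" in wrapped)                                          # True
print(base64.b64decode(wrapped) == payload)                     # True
print(unwrapped == base64.b64encode(payload).decode("ascii"))   # True
```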
I use the following code for C#:
public static string ImageToBase64(Image image, ImageFormat format)
{
    using (MemoryStream ms = new MemoryStream())
    {
        // Convert Image to byte[]
        image.Save(ms, format);
        byte[] imageBytes = ms.ToArray();
        // Convert byte[] to Base64 String
        return Convert.ToBase64String(imageBytes);
    }
}
public static Image Base64ToImage(string base64String)
{
    // Convert Base64 String to byte[]
    byte[] imageBytes = Convert.FromBase64String(base64String);
    // The stream already contains the bytes and is positioned at 0, so no extra Write is needed.
    // It is intentionally not disposed: the stream must stay open for the lifetime of the Image.
    MemoryStream ms = new MemoryStream(imageBytes);
    // Convert byte[] to Image
    return Image.FromStream(ms, true);
}
For VBScript, see http://www.freevbcode.com/ShowCode.asp?ID=5248, which may help you.
There is a good discussion of this in base64-encode-string-in-vbscript.
In addition, I have found this site useful for trying to eke speed out of VB code. There are several variants of Base64 there for VB6 that are quite fast.
This is what worked for me
Function Base64DataURI(url)
'Create an Http object, use any of the four objects
Dim Http
Set Http = CreateObject("WinHttp.WinHttpRequest.5.1")
'Send request To URL
Http.Open "GET", url, False
Http.Send
'Get response data As a string and encode as base64
Base64DataURI = Encrypt(Http.ResponseText)
End Function
In my case the URL is a script that generates a barcode on the fly, and I needed to encode the result so it could be included in emails.
Encrypt is a pretty standard function we use to encode as Base64; the main thing we needed was to get the file via a URL rather than from the file system.
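The function name suggests the result ends up in a data URI. As a rough Python sketch of that idea, with assumptions stated up front: the helper names to_data_uri and fetch_as_data_uri are hypothetical, the image/png MIME type is a guess about the barcode script, and the response is read as raw bytes rather than text so that binary content is not altered by a text decoding step.

```python
import base64
import urllib.request

def to_data_uri(content: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw bytes in a data: URI (the MIME type is an assumption)."""
    encoded = base64.b64encode(content).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded

def fetch_as_data_uri(url: str, mime_type: str = "image/png") -> str:
    """Download a resource over HTTP and return it as a data: URI."""
    with urllib.request.urlopen(url) as response:
        return to_data_uri(response.read(), mime_type)

# Offline check with the first four bytes of a PNG signature.
uri = to_data_uri(b"\x89PNG")
print(uri)
```

Only to_data_uri is exercised here; fetch_as_data_uri needs a reachable URL and is shown for shape only.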
I'm trying to read an input file in Scala that I know the structure of, however I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue is that this leaves me with an array that is huge (we're talking 20GB of data). Not only have I been forced to write some very ugly code to convert between RDD[Array[String]] and Array[String], but it has essentially made my code useless.
I've tried different approaches and mixes between using
.map()
.flatMap() and
.reduceByKey()
however nothing actually put my collected "cells" into the format that I need them to be.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep hold of the stock_symbol, as that is the identifier I'm counting. So far my attempts have been to turn the entire thing into an array and collect only every 9th index, starting from the first one, into a collected_cells var. The issue is that, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SparkNum {
def main(args: Array[String]) {
// Do some Scala voodoo
val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))
// Set input file as per HDFS structure + input args
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
var collected_cells:Array[String] = new Array[String](0)
//println("[MESSAGE] Length of CC: " + collected_cells.length)
val divider:Long = 9
val array_length = fields.count / divider
val casted_length = array_length.toInt
val indexedFields = fields.zipWithIndex
val indexKey = indexedFields.map{case (k,v) => (v,k)}
println("[MESSAGE] Number of lines: " + array_length)
println("[MESSAGE] Casted lenght of: " + casted_length)
for( i <- 1 to casted_length ) {
println("[URGENT DEBUG] Processin line " + i + " of " + casted_length)
var index = 9 * i - 8
println("[URGENT DEBUG] Index defined to be " + index)
collected_cells :+ indexKey.lookup(index)
}
println("[MESSAGE] collected_cells size: " + collected_cells.length)
val single_cells = collected_cells.flatMap(collected_cells => collected_cells);
val counted_cells = single_cells.map(cell => (cell, 1).reduceByKey{case (x, y) => x + y})
// val result = counted_cells.reduceByKey((a,b) => (a+b))
// val inmem = counted_cells.persist()
//
// // Collect driver into file to be put into user archive
// inmem.saveAsTextFile("path to server location")
// ==> Not necessary to save the result as processing time is recorded, not output
}
}
The bottom part is currently commented out as I tried to debug it, but it acts as pseudo-code for me to know what I need done. I may want to point out that I am next to not at all familiar with Scala and hence things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array of the size of the data. That expression represents a transformation of the base data. It can be further transformed until we reduce the data to the information set we desire.
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is to remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange that starts with a letter. We will do this before we do any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write the types of the variables, to have peace of mind that we have the data types that we expect. As we progress in our Scala skills that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols do we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
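To make the shape of that pipeline concrete without a cluster, here is the same filter, split, select and count sequence as a plain-Python sketch. It is not Spark code, and the sample rows are invented for illustration. It only mirrors the steps above: drop banner lines, split on commas, keep field 1, and count per symbol.

```python
from collections import Counter

# Made-up sample in the format described in the question (banner plus CSV rows).
sample_lines = [
    "*---------*",
    "| NASDAQ: |",
    "*---------*",
    "NASDAQ,AAPL,2010-01-04,30.5,30.6,30.3,30.6,123432400,27.4",
    "NASDAQ,MSFT,2010-01-04,30.6,31.1,30.6,31.0,38409100,25.7",
    "NASDAQ,AAPL,2010-01-05,30.7,30.8,30.5,30.6,150476200,27.5",
    "",
]

def count_by_symbol(lines):
    """Count records per stock symbol, skipping empty and banner lines."""
    valid = (line for line in lines if line and line[0].isalpha())
    symbols = (line.split(",")[1] for line in valid)
    return Counter(symbols)

counts = count_by_symbol(sample_lines)
print(len(counts))        # number of distinct symbols
print(counts["AAPL"])     # records for one symbol
```

As in the Spark version, nothing here builds an index of every cell; each line is reduced to its symbol as soon as it is read.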
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (as is usual in large datasets), we will need to provide the column names, one for each of the nine fields:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "high", "low", "close", "volume", "adj_close")
We can answer our questions very easily now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
I need code in VBScript or batch to replace 5 characters (the port number after the colon, 54593, in the line below) in a line of a text file in order to change port numbers.
change_port.vbs:
prefsFile = "%userprofile%\Desktop\teste.msrcincident"
prefsFile = CreateObject("WScript.Shell").ExpandEnvironmentStrings(prefsFile)
newPrefs = "5500"
Set fso = CreateObject("Scripting.FileSystemObject")
json = fso.OpenTextFile(prefsFile).ReadAll
Set re = New RegExp
re.Pattern = "":*?",*,"
json = re.Replace(json, ": & newPrefs & ",*,")
fso.OpenTextFile(prefsFile, 2).Write(json)
Original text file:
RCTICKET="65538,1,10.0.0.1:54593,*,ucIdnri2n4QPf/bv92mtx4w2qliCNdyDgBpHPr7nJFdxYL2/dR+iel9Mh4zgD6QR,*,*,Fbjf5rcIrdrlnibnisrzRcO8tsY=" PassStub="HG)7HbhIZPTiKy" RCTICKETENCRYPTED="1" DtStart="1457700115" DtLength="142560" L="0"/></UPLOADINFO>
Expected result text file:
RCTICKET="65538,1,10.0.0.1:5500,*,ucIdnri2n4QPf/bv92mtx4w2qliCNdyDgBpHPr7nJFdxYL2/dR+iel9Mh4zgD6QR,*,*,Fbjf5rcIrdrlnibnisrzRcO8tsY=" PassStub="HG)7HbhIZPTiKy" RCTICKETENCRYPTED="1" DtStart="1457700115" DtLength="142560" L="0"/></UPLOADINFO>
Can anyone help me?
Your search and replacement expressions are messed up. You're looking for a colon (:) followed by one or more digits (\d+ or [0-9]+) followed by a comma (,), and want to replace that with a colon followed by the new port number and a comma.
Change this:
re.Pattern = "":*?",*,"
json = re.Replace(json, ": & newPrefs & ",*,")
into this.
re.Pattern = ":\d+,"
json = re.Replace(json, ":" & newPrefs & ",")
Always keep your expressions as simple as possible.
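The corrected pattern can be checked quickly outside VBScript. This is a small Python sketch using the ticket string from the question (trimmed to the RCTICKET attribute); the function name change_port is made up for the example. It replaces only the first match, which corresponds to a VBScript RegExp whose Global property is left at its default.

```python
import re

def change_port(line: str, new_port: str) -> str:
    """Replace the first ':<digits>,' in the line with ':<new_port>,'."""
    return re.sub(r":\d+,", lambda match: ":" + new_port + ",", line, count=1)

ticket = ('RCTICKET="65538,1,10.0.0.1:54593,*,'
          'ucIdnri2n4QPf/bv92mtx4w2qliCNdyDgBpHPr7nJFdxYL2/dR+iel9Mh4zgD6QR,'
          '*,*,Fbjf5rcIrdrlnibnisrzRcO8tsY="')

updated = change_port(ticket, "5500")
print(updated)
```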
I'm using JNDI to query Active Directory from global catalog servers:
Hashtable<String, Object> env = new Hashtable<String, Object>();
env.put(Context.INITIAL_CONTEXT_FACTORY,
"com.sun.jndi.ldap.LdapCtxFactory");
env.put(Context.PROVIDER_URL, "ldap://" + serverUrl + "/");
env.put(Context.SECURITY_AUTHENTICATION, "simple");
env.put(Context.SECURITY_PRINCIPAL, userName + "#" + currentDomain);
env.put(Context.SECURITY_CREDENTIALS, credentials);
env.put("java.naming.ldap.attributes.binary", "objectSid");
// Create the initial context
DirContext ctx = new InitialDirContext(env);
When I get objectSid back and convert the byte[] to hex string I get sids such as:
HEX: ACED0005757200025B42ACF317F8060854E002000078700000001001020000000000052000000025020000
SID: S-172-23445241858-4088152667-134674455-188500-7370752-17825792-2-537198592-620756992
This results in byte 0 having a value of 172 and byte 1 of 237, as well as 3 bytes at the end of parsing the 4 byte sub authorities.
Byte 0 should always be 1 and byte 1 should be the number of 4-byte sub-authority identifiers (in this case 9). I'm having trouble figuring out what's going on, as I'm unable to map correctly between expected and actual.
I'm betting there's some newbie mistake that I'm making, but can't figure out what it might be; my hope is that someone out there has been through this and can tell me what it is!
This was actually not an LDAP issue, but an issue with writing the object I was getting back to a byte array. The lesson is, debug harder...
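The hex dump in the question supports that diagnosis: it starts with ACED0005, which is the Java serialization stream header, so the bytes look like a serialized byte[] rather than the raw SID. The sketch below is a Python illustration under that assumption. The offsets 23 and 27 are specific to this one dump and the function name parse_sid is made up. It skips the serialization header and decodes the remaining 16 bytes as a SID.

```python
import struct

# Hex dump copied from the question.
HEX_DUMP = "ACED0005757200025B42ACF317F8060854E002000078700000001001020000000000052000000025020000"

def parse_sid(raw: bytes) -> str:
    """Decode a binary SID: revision, count, 6-byte authority, 4-byte sub-authorities."""
    revision = raw[0]
    sub_authority_count = raw[1]
    authority = int.from_bytes(raw[2:8], "big")            # big-endian, 48 bits
    sub_authorities = struct.unpack_from("<%dI" % sub_authority_count, raw, 8)  # little-endian
    parts = ["S", str(revision), str(authority)] + [str(s) for s in sub_authorities]
    return "-".join(parts)

blob = bytes.fromhex(HEX_DUMP)
if blob[:4] != b"\xac\xed\x00\x05":
    raise ValueError("not a Java serialization stream")

# In this dump the array length sits at bytes 23..26 and the payload follows it.
payload_length = int.from_bytes(blob[23:27], "big")
sid = parse_sid(blob[27:27 + payload_length])
print(sid)  # S-1-5-32-549
```

With the header skipped, byte 0 is 1 and byte 1 is 2, which is the layout the question expected.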
email = self.request.get('email')
name = self.request.get('name')
mail.send_mail(sender="myemail", email=email, body=name, subject="sss " + name + "sdafsaã")
(Edit: I added the ã to the example. The problem was that "sdafsaã" should be u"sdafsaã", with a "u" before the string, and now it works.)
Then I get this:
main.py", line 85, in post
subject="sss " + name + "sdafsa",
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 36: ordinal not in range(128)
The name might contain characters like õ and ó.
For more details, this is the code that queues the worker (it runs before the code above). The name is received from the datastore and contains characters like õ and ó:
taskqueue.add(url='/emailworker', params={'email': e.email, 'name': e.name})
thanks
Try reading a little about how unicode works in Python:
Dive Into Python - Unicode
Unicode In Python, Completely Demystified
Also, make sure you're running Python 2.5 if you are seeing this error on the development server.
You should use:
email = self.request.get('email')
name = self.request.get('name')
mail.send_mail(sender="myemail",
email=email,
body=name,
subject="hello " + name.encode('utf-8') + " user!")
The variable name is a unicode string and should be encoded in UTF-8 (or whichever encoding your web application uses) before it is concatenated with other byte strings.
Without name.encode(), Python uses the default 7-bit ASCII codec, which can't handle non-ASCII characters such as ã.
The problem is joining two strings:
body = name + "ã" => error
body = name + u"ã" => works!
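The traceback in the question comes from Python 2 silently decoding the byte string with the ASCII codec before joining it to the unicode one. Python 3 does not do this implicitly, so the step has to be written out to see it. The following is a Python 3 sketch with a made-up name value, intended only to show where byte 0xc3 comes from:

```python
name = "João"                          # text (unicode) string, hypothetical value
utf8_bytes = name.encode("utf-8")      # the bytes a Python 2 str literal would hold

# This is the conversion Python 2 attempts implicitly when mixing str and unicode.
try:
    utf8_bytes.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

print(error)

# Decoding with the encoding that was actually used works.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == name)
```

Either side of the concatenation can be converted, as the two answers show: encode the unicode value to bytes, or make the literal unicode with the u prefix.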
Try with encode
t ='việt ứng '
m = MyModel()
m.data = t.encode('utf-8')
m.put() #success!
My customer is sending TDM/TDX files captured in National Instruments Diadem, which I haven't got. I'm looking for a way to convert the files into .CSV, XLS or .MAT files for analysis in Matlab (without using Diadem or Diadem DLLs!)
The format consists of a well structured XML file (.TDM) and a binary (.TDX), with the .TDM defining how fields are packed as bits in the binary TDX. I'd like to read the files (for use in Matlab and other environments). Does anyone have a general purpose tool or conversion script in for instance Python or Perl (not using the NI DLL's) or directly in Matlab?
I've looked into buying the tool, but didn't like it for anything other than one-time conversion to a compatible file format.
Thanks!
I know this is a little late, but I have a simple library to read TDM/TDX files in Python. It works by parsing the TDM file to figure out the data type, then using numpy.memmap to open the TDX file. The result can then be used like a standard NumPy array. The code is pretty simple, so you could probably implement something similar in Matlab.
Here's the link: https://bitbucket.org/joshayers/tdm_loader
Hope that helps.
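To show the parse-then-read idea in a few lines, here is a standard-library Python sketch. It is not the tdm_loader code: it reads the whole file and unpacks with struct instead of using numpy.memmap. The XML snippet is a simplified, made-up stand-in for a TDM header without namespaces, and the data is assumed to be little-endian. The attribute names byteOffset, length and valueType are the ones the block elements carry in the Matlab script elsewhere in this thread.

```python
import os
import struct
import tempfile
import xml.etree.ElementTree as ET

# valueType names mapped to struct format codes.
VALUE_TYPES = {
    "eInt8Usi": "b", "eInt16Usi": "h", "eInt32Usi": "i", "eInt64Usi": "q",
    "eUInt8Usi": "B", "eUInt16Usi": "H", "eUInt32Usi": "I", "eUInt64Usi": "Q",
    "eFloat32Usi": "f", "eFloat64Usi": "d",
}

# Simplified, hypothetical header describing two blocks inside the binary file.
SAMPLE_TDM = """
<tdm><include><file>
  <block id="inc0" byteOffset="0" length="3" valueType="eFloat64Usi"/>
  <block id="inc1" byteOffset="24" length="2" valueType="eInt32Usi"/>
</file></include></tdm>
"""

def read_blocks(tdm_xml: str, tdx_path: str) -> dict:
    """Return {block id: list of values} for every block element in the header."""
    with open(tdx_path, "rb") as handle:
        raw = handle.read()
    blocks = {}
    for element in ET.fromstring(tdm_xml).iter("block"):
        count = int(element.get("length"))
        code = VALUE_TYPES[element.get("valueType")]
        offset = int(element.get("byteOffset"))
        # "<" means little-endian with no padding (an assumption about the file).
        blocks[element.get("id")] = list(struct.unpack_from("<%d%s" % (count, code), raw, offset))
    return blocks

# Build a tiny binary file that matches the header, then read it back.
with tempfile.NamedTemporaryFile(suffix=".tdx", delete=False) as tmp:
    tmp.write(struct.pack("<3d2i", 1.0, 2.5, -3.0, 7, 8))
try:
    channels = read_blocks(SAMPLE_TDM, tmp.name)
finally:
    os.remove(tmp.name)

print(channels)
```

A real TDM file uses namespaced element names and links channels to blocks through several id references, so a full reader needs the lookup steps as well.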
Maybe a little too late, but I think there is a simple way to get the data from TDM files: NI provides plug-ins for reading TDM files into Excel and OpenOffice Calc. Having the data in one of these programs you could use the CSV export. Search google for "tdm excel" or "tdm openoffice".
Hope this helps...
Gemue
The following MATLAB script reads every channel into a struct named 'variable'.
CurrDirectory = '...//'; % Path to current directory
fileNametdx = '.../utility/'; % Path to TDX file
%%
% Data type conversion
Dtype.eInt8Usi='int8';
Dtype.eInt16Usi='int16';
Dtype.eInt32Usi='int32';
Dtype.eInt64Usi='int64';
Dtype.eUInt8Usi='uint8';
Dtype.eUInt16Usi='uint16';
Dtype.eUInt32Usi='uint32';
Dtype.eUInt64Usi='uint64';
Dtype.eFloat32Usi='single';
Dtype.eFloat64Usi='double';
%% Read .tdx file Name
wb=waitbar(0,'Reading *.tdx Files');
fileNameTDM = strrep(fileNametdx,'.tdx','.TDM');
%% Read .TDM
tdm=xml2struct(fileNameTDM);
for i=1:numel(tdm.usi_colon_tdm.usi_colon_data.tdm_channel)
waitbar((1/numel(tdm.usi_colon_tdm.usi_colon_data.tdm_channel))*i,wb,['File ' fileNametdx ' conversion started']);
s1=strsplit(string(tdm.usi_colon_tdm.usi_colon_data.tdm_channel{1, i}.local_columns.Text),'"');
usi1=s1(2);
% search the local columns until one whose id matches usi1 is found
for j=1:numel(tdm.usi_colon_tdm.usi_colon_data.localcolumn)
usi2=string(tdm.usi_colon_tdm.usi_colon_data.localcolumn{1, j}.Attributes.id);
if usi1==usi2
%take new usi
s2=strsplit(string(tdm.usi_colon_tdm.usi_colon_data.localcolumn{1, j}.values.Text),'"');
new_usi1=s2(2);
w1=strsplit(string(tdm.usi_colon_tdm.usi_colon_data.tdm_channel{1, i}.datatype.Text),'_');
str_1=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence'));
str_2=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.Attributes.id'));
str_3=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.values.Attributes.external'));
str_4=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.values'));
for k=1:numel(eval(str_1))
new_usi2=string(eval(str_2));
if new_usi1==new_usi2
if isfield(eval(str_4), 'Attributes')
inc_value1=string(eval(str_3));
for m=1:numel(tdm.usi_colon_tdm.usi_colon_include.file.block)
inc_value2=string(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.id);
if inc_value1==inc_value2
% offset=round(str2num(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.byteOffset)/8);
% nValues and mm are used instead of 'length' and 'm' so that the built-in length
% function and the loop variable m are not overwritten
nValues = round(str2num(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.length));
offset1=round(str2num(tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.byteOffset));
value_type = tdm.usi_colon_tdm.usi_colon_include.file.block{1, m}.Attributes.valueType;
mm = memmapfile(fullfile(CurrDirectory,fileNametdx),'Offset',offset1,'Format',{Dtype.(value_type) [nValues 1] 'dat'},'Writable',true,'Repeat',1);
dat=mm.Data.dat ;
end
end
else
str_5=char(strcat('tdm.usi_colon_tdm.usi_colon_data.',lower(w1(2)),'_sequence{1, k}.values.',char(fieldnames(tdm.usi_colon_tdm.usi_colon_data.string_sequence{1, k}.values))));
dat=eval(str_5)';
end
name_variable = string(tdm.usi_colon_tdm.usi_colon_data.tdm_channel{1, i}.name.Text);
varname = genvarname(char(name_variable));
variable.(varname) = dat;
end
end
end
end
end
waitbar(1,wb,[fileNametdx ' conversion completed']);
pause(1)
close(wb)
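% Note: the next line deletes the source .tdx and .TDM files after conversion; comment it out to keep them
delete(fullfile(CurrDirectory,fileNametdx),fullfile(CurrDirectory,fileNameTDM));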
%Output Variable is Struct
clearvars -except variable
This script requires the following XML parser:
function [ s ] = xml2struct( file )
%Convert xml file into a MATLAB structure
% [ s ] = xml2struct( file )
%
% A file containing:
% <XMLname attrib1="Some value">
% <Element>Some text</Element>
% <DifferentElement attrib2="2">Some more text</DifferentElement>
% <DifferentElement attrib3="2" attrib4="1">Even more text</DifferentElement>
% </XMLname>
%
% Will produce:
% s.XMLname.Attributes.attrib1 = "Some value";
% s.XMLname.Element.Text = "Some text";
% s.XMLname.DifferentElement{1}.Attributes.attrib2 = "2";
% s.XMLname.DifferentElement{1}.Text = "Some more text";
% s.XMLname.DifferentElement{2}.Attributes.attrib3 = "2";
% s.XMLname.DifferentElement{2}.Attributes.attrib4 = "1";
% s.XMLname.DifferentElement{2}.Text = "Even more text";
%
% Please note that the following characters are substituted
% '-' by '_dash_', ':' by '_colon_' and '.' by '_dot_'
%
% Written by W. Falkena, ASTI, TUDelft, 21-08-2010
% Attribute parsing speed increased by 40% by A. Wanner, 14-6-2011
% Added CDATA support by I. Smirnov, 20-3-2012
%
% Modified by X. Mo, University of Wisconsin, 12-5-2012
if (nargin < 1)
clc;
help xml2struct
return
end
if isa(file, 'org.apache.xerces.dom.DeferredDocumentImpl') || isa(file, 'org.apache.xerces.dom.DeferredElementImpl')
% input is a java xml object
xDoc = file;
else
%check for existance
if (exist(file,'file') == 0)
%Perhaps the xml extension was omitted from the file name. Add the
%extension and try again.
if (isempty(strfind(file,'.xml')))
file = [file '.xml'];
end
if (exist(file,'file') == 0)
error(['The file ' file ' could not be found']);
end
end
%read the xml file
xDoc = xmlread(file);
end
%parse xDoc into a MATLAB structure
s = parseChildNodes(xDoc);
end
% ----- Subfunction parseChildNodes -----
function [children,ptext,textflag] = parseChildNodes(theNode)
% Recurse over node children.
children = struct;
ptext = struct; textflag = 'Text';
if hasChildNodes(theNode)
childNodes = getChildNodes(theNode);
numChildNodes = getLength(childNodes);
for count = 1:numChildNodes
theChild = item(childNodes,count-1);
[text,name,attr,childs,textflag] = getNodeData(theChild);
if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
%XML allows the same elements to be defined multiple times,
%put each in a different cell
if (isfield(children,name))
if (~iscell(children.(name)))
%put existsing element into cell format
children.(name) = {children.(name)};
end
index = length(children.(name))+1;
%add new element
children.(name){index} = childs;
if(~isempty(fieldnames(text)))
children.(name){index} = text;
end
if(~isempty(attr))
children.(name){index}.('Attributes') = attr;
end
else
%add previously unknown (new) element to the structure
children.(name) = childs;
if(~isempty(text) && ~isempty(fieldnames(text)))
children.(name) = text;
end
if(~isempty(attr))
children.(name).('Attributes') = attr;
end
end
else
ptextflag = 'Text';
if (strcmp(name, '#cdata_dash_section'))
ptextflag = 'CDATA';
elseif (strcmp(name, '#comment'))
ptextflag = 'Comment';
end
%this is the text in an element (i.e., the parentNode)
if (~isempty(regexprep(text.(textflag),'[\s]*','')))
if (~isfield(ptext,ptextflag) || isempty(ptext.(ptextflag)))
ptext.(ptextflag) = text.(textflag);
else
%what to do when element data is as follows:
%<element>Text <!--Comment--> More text</element>
%put the text in different cells:
% if (~iscell(ptext)) ptext = {ptext}; end
% ptext{length(ptext)+1} = text;
%just append the text
ptext.(ptextflag) = [ptext.(ptextflag) text.(textflag)];
end
end
end
end
end
end
% ----- Subfunction getNodeData -----
function [text,name,attr,childs,textflag] = getNodeData(theNode)
% Create structure of node info.
%make sure name is allowed as structure name
name = toCharArray(getNodeName(theNode))';
name = strrep(name, '-', '_dash_');
name = strrep(name, ':', '_colon_');
name = strrep(name, '.', '_dot_');
attr = parseAttributes(theNode);
if (isempty(fieldnames(attr)))
attr = [];
end
%parse child nodes
[childs,text,textflag] = parseChildNodes(theNode);
if (isempty(fieldnames(childs)) && isempty(fieldnames(text)))
%get the data of any childless nodes
% faster than if any(strcmp(methods(theNode), 'getData'))
% no need to try-catch (?)
% faster than text = char(getData(theNode));
text.(textflag) = toCharArray(getTextContent(theNode))';
end
end
% ----- Subfunction parseAttributes -----
function attributes = parseAttributes(theNode)
% Create attributes structure.
attributes = struct;
if hasAttributes(theNode)
theAttributes = getAttributes(theNode);
numAttributes = getLength(theAttributes);
for count = 1:numAttributes
%attrib = item(theAttributes,count-1);
%attr_name = regexprep(char(getName(attrib)),'[-:.]','_');
%attributes.(attr_name) = char(getValue(attrib));
%Suggestion of Adrian Wanner
str = toCharArray(toString(item(theAttributes,count-1)))';
k = strfind(str,'=');
attr_name = str(1:(k(1)-1));
attr_name = strrep(attr_name, '-', '_dash_');
attr_name = strrep(attr_name, ':', '_colon_');
attr_name = strrep(attr_name, '.', '_dot_');
attributes.(attr_name) = str((k(1)+2):(end-1));
end
end
end