How to Decode XML Blob field in D7 - sql-server

I'm having a problem trying to decode XML data returned by an instance of MS SQL Server 2014 to an app written in D7. (the version of Indy is the one which came with it, 9.00.10).
Update When I originally wrote this q, I was under the impression that the contents of the blob field needed to be Base64-decoded, but it seems that that was wrong. Having followed Remy Lebeau's suggestion, the blob stream contains recognisable text in the field names and field values before decoding but not afterwards.
In the code below, the SQL in the AdoQuery is simply
Select * from Authors where au_lname = 'White' For XML Auto
the Authors table being the one in the demo 'pubs' database. I've added the "Where" clause to restrict the size of the result set so I can show a hex dump of the returned blob.
According to the Sql Server OLH, the default type of the returned data when 'For XML Auto' is specified is 'binary base64-encoded format'. The data type of the single field of the AdoQuery is ftBlob, if I let the IDE create this field.
Executing the code below generates an exception "Uneven size in DecodeToStream". At the call to IdDecoderMIME.DecodeToString(S), the length of the string S is 3514, and 3514 mod 4 is 2, not 0 as it apparently should be, hence the exception. I've confirmed that the number of bytes in the field's value is 3514, so there's no difference between the size of the variant and the length of the string, i.e. nothing has gone awol in between.
procedure TForm1.FormCreate(Sender: TObject);
var
SS : TStringStream;
Output : String;
S : String;
IdDecoderMIME : TIdDecoderMIME;
begin
SS := TStringStream.Create('');
IdDecoderMIME := TIdDecoderMIME.Create(Nil);
try
AdoQuery1.Open;
TBlobField(AdoQuery1.Fields[0]).SaveToStream(SS);
S := SS.DataString;
IdDecoderMIME.FillChar := #0;
Output := IdDecoderMIME.DecodeToString(S);
Memo1.Lines.Text := S;
finally
SS.Free;
IdDecoderMIME.Free;
end;
end;
I'm using this code:
procedure TForm1.FormCreate(Sender: TObject);
var
SS : TStringStream;
MS : TMemoryStream;
Output : String;
begin
SS := TStringStream.Create('');
MS := TMemoryStream.Create;
try
AdoQuery1.Open;
TBlobField(AdoQuery1.Fields[0]).SaveToStream(SS);
SS.WriteString(#13#10);
Output := SS.DataString;
SS.Position := 0;
MS.CopyFrom(SS, SS.Size);
MS.SaveToFile(ExtractFilePath(Application.ExeName) + 'Blob.txt');
finally
SS.Free;
MS.Free;
end;
end;
A hex dump of the Blob.Txt file looks like this
00000000 44 05 61 00 75 00 5F 00 69 00 64 00 44 08 61 00 D.a.u._.i.d.D.a.
00000010 75 00 5F 00 6C 00 6E 00 61 00 6D 00 65 00 44 08 u._.l.n.a.m.e.D.
00000020 61 00 75 00 5F 00 66 00 6E 00 61 00 6D 00 65 00 a.u._.f.n.a.m.e.
00000030 44 05 70 00 68 00 6F 00 6E 00 65 00 44 07 61 00 D.p.h.o.n.e.D.a.
00000040 64 00 64 00 72 00 65 00 73 00 73 00 44 04 63 00 d.d.r.e.s.s.D.c.
00000050 69 00 74 00 79 00 44 05 73 00 74 00 61 00 74 00 i.t.y.D.s.t.a.t.
00000060 65 00 44 03 7A 00 69 00 70 00 44 08 63 00 6F 00 e.D.z.i.p.D.c.o.
00000070 6E 00 74 00 72 00 61 00 63 00 74 00 44 07 61 00 n.t.r.a.c.t.D.a.
00000080 75 00 74 00 68 00 6F 00 72 00 73 00 01 0A 02 01 u.t.h.o.r.s.....
00000090 10 E4 04 00 00 0B 00 31 37 32 2D 33 32 2D 31 31 .......172-32-11
000000A0 37 36 02 02 10 E4 04 00 00 05 00 57 68 69 74 65 76.........White
000000B0 02 03 10 E4 04 00 00 07 00 4A 6F 68 6E 73 6F 6E .........Johnson
000000C0 02 04 0D E4 04 00 00 0C 00 34 30 38 20 34 39 36 .........408 496
000000D0 2D 37 32 32 33 02 05 10 E4 04 00 00 0F 00 31 30 -7223.........10
000000E0 39 33 32 20 42 69 67 67 65 20 52 64 2E 02 06 10 932 Bigge Rd....
000000F0 E4 04 00 00 0A 00 4D 65 6E 6C 6F 20 50 61 72 6B ......Menlo Park
00000100 02 07 0D E4 04 00 00 02 00 43 41 02 08 0D E4 04 .........CA.....
As you can see, some of it is legible (field names and contents), some of it not. Does anyone recognise this format and know how to clean it up into the plain text I get from executing the same query in SS Management Studio, i.e. how do I successfully extract the XML from the result set?
Btw, I get the same result (including the contents of the Blob.Txt file) using both the MS OLE DB Provider for Sql Server and the Sql Server Native Client 11 provider, and using Delphi Seattle in place of D7.
Given that the code accesses an external database, this code is the closest I can get to an MCVE.
Update #2 The decoding problem vanishes if I change the Sql query to
select Convert(Text,
(select * from authors where au_lname = 'White' for xml AUTO
))
which gives the result (in SS) of
<authors au_id="172-32-1176" au_lname="White" au_fname="Johnson" phone="408 496-7223" address="10932 Bigge Rd." city="Menlo Park" state="CA" zip="94025" contract="1"/>
but I'm still interested to know how to get this to work without needing the Convert(). I've noticed that if I remove the Where clause from the Sql, what is returned is not well-formed XML - it contains a series of nodes, one per data row, but there is no enclosing root node.
Also btw, I realise that I can avoid this problem by not using "For XML Auto", I'm just interested in how to do it correctly. Also, I don't need any help parsing the XML once I've managed to extract it.

Add the TYPE Directive to specify that you want XML returned.
select *
from Authors
where au_lname = 'White'
for xml auto, type

You can't simply decode the binary blob into XML.
You can use TADOCommand and direct its output stream to an XML document object e.g.:
const
adExecuteStream = 1024;
var
xmlDoc, RecordsAffected: OleVariant;
cmd: TADOCommand;
xmlDoc := CreateOleObject('MSXML2.DOMDocument.3.0'); // or CoDomDocument30.Create;
xmlDoc.async := False;
cmd := TADOCommand.Create(nil);
// specify your connection string
cmd.ConnectionString := 'Provider=SQLOLEDB;Data Source=(local);...';
cmd.CommandType := cmdText;
cmd.CommandText := 'select top 1 * from items for xml auto';
cmd.Properties['Output Stream'].Value := xmlDoc;
cmd.Properties['XML Root'].Value := 'RootNode';
cmd.CommandObject.Execute(RecordsAffected, EmptyParam, adExecuteStream);
xmlDoc.save('d:\test.xml');
cmd.Free;
This results a well-formed XML with enclosing root node RootNode.

Related

How to parse this Apple-specific MPEG file header?

I have an MPEG file which starts like this:
0: 00 0f 6d 79 5f 66 69 6c 65 6e 61 6d 65 2e 6d 70 ..my_filename.mp
10: 67 00 04 fc 00 00 f0 00 b2 10 39 a8 b2 10 39 ad g.........9...9.
20: 0f 6d 79 5f 66 69 6c 65 6e 61 6d 65 2e 6d 70 67 .my_filename.mpg
30: 03 92 3b 40 00 00 00 00 03 7a b5 7c 03 7a d7 d0 ..;#.....z.|.z..
40: 00 4d 6f 6f 56 54 56 4f 44 01 00 01 2a 00 80 00 .MooVTVOD...*...
50: 00 00 00 00 36 b2 83 00 00 04 fc b2 10 39 a8 b2 ....6........9..
60: 10 39 ad 00 00 00 00 00 00 00 00 00 00 00 00 00 .9..............
70: 00 00 00 00 00 00 00 00 00 00 81 81 35 d3 00 00 ............5...
80: 00 36 b2 83 6d 64 61 74 00 00 01 ba 21 00 01 00 .6..mdat....!...
90: 05 80 2b 81 00 00 01 bb 00 0c 80 2f d9 04 e1 ff ..+......../....
a0: c0 c0 20 e0 e0 2e 00 00 01 c0 07 ea ff ff ff ff .. .............
What is the file format of the beginning of the file (first 0x80 bytes), and how do I parse it?
I've run a Google search on MooVTVOD, it looks like something related to QuickTime and iTunes.
What I understand already:
There is 4 bytes of big endian file size in front mdat, according to the QuickTime .mov file format when the .mov contains an MPEG.
Right after mdat there is the MPEG-PS header 00 00 01 ba. Shortly after there is the MPEG-PES header 00 00 01 c0 indicating an audio stream.
However, the first 0x80 bytes in this file seem to be in a different file file format (not QuickTime .mov, not MPEG-PS, not MPEG-PES), and in this question I'm only interested in the file format of the first 0x80 bytes.
Media players such as VLC routinely ignore junk at the beginning of the file, and start playing the MPEG-PS stream at offset 0x80. However, I'm interested in the 0x80 bytes they ignore.
The file format is "Quicktime movie atom," that contains meta information about the media file, or the media itself. mdat is a Media Data atom.
Media atoms describe and define a track’s media type and sample data.
the spec is here: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html
You can parse it with this python script:
https://github.com/kzahel/quicktime-parse
or this software, mentioned in the linked question:
https://archive.codeplex.com/?p=mp4explorer
The file format of the first 0x80 bytes in the question is MacBinary II.
Based on the file format description https://github.com/mietek/theunarchiver/wiki/MacBinarySpecs , it's not MacBinary I, because MacBinary II has data[0x7a] == '\x81' and data[0x7b] == '\x81' (and MacBinary I has data[0x7a] == '\x00', and MacBinary III has data[0x7a] == '\x82'); and it's also not MacBinary III, because that one has data[0x66 : 0x6a] == 'mBIN'.
The CRC value data[0x7c : 0x7e] is incorrect because the filename was modified before posting to StackOverflow. FYI The CRC algorithm used is CRC16/XMODEM (on https://crccalc.com/), also implemented as CalcCRC in
MacBinary.c.
The data[0x41 : 0x45] == 'MooV' is the file type code. According to the Excel spreadsheet downloadable from TCDB, it means QuickTime movie (video and maybe audio).
The data[0x45 : 0x49] == 'TVOD' is the file creator code. TCDB and this database indicate that it means QuickTime Player.
More info and links about MacBinary: http://fileformats.archiveteam.org/wiki/MacBinary
Please note that all these headers are unnecessary to play the video: by removing the first 0x88 bytes we get an MPEG-PS video file, which many video players can play (not only QuickTime Player, and not only players running on macOS).

Xbee sending wrong ZDO responses

I´m playing around with two Xbees, one defined as coordinator, another as router. I want to read information about the network interoperably so i decided to use the ZDO messages.
I send a message like this ((profile ID 0x00 00, cluster ID 0x 00 31) and receive for example the following response from the router:
7E 00 2D 91 00 13 A2 00 40 E5 F0 B4 FB CE 00 00 80 31 00 00 01 2C 00 01 00 01 58 CE C1 8D 7A 3F 2D 40 AB F0 E5 40 00 A2 13 00 00 00 04 02 00 FF 33
Correct answer cluster ID: 0x 80 31
Focussing on the RF Data i have the following:
2C 00 01 00 01 58 CE C1 8D 7A 3F 2D 40 AB F0 E5 40 00 A2 13 00 00 00 04 02 00 FF
I now try to decode this hex string and face some problems.
From my point of view, this string should be encoded like defined within the ZigBee Spec from 2012, at Table 2.126 and 2.127
Unfortunately this don´t work for me. If i ignore, that the first byte should be the status and take the first two of them, i can read out NeighborTableEntries, StartIndex, NeighborTabelListCount. But when it comes to the NeighTableList i only can read out the Extended PAN id, the extended address and the network address, the rest of the string does not fit to the standard. Am i doing something wrong here or does the xbee´s don´t stick to the standard?
2C = Sequence Number
00 = Status (Success)
01 = 1 entry (total)
00 = starting at index 0
01 = 1 entry (in packet)
58 CE C1 8D 7A 3F 2D 40 = Extended Pan ID
AB F0 E5 40 00 A2 13 00 = IEEE address
00 00 = NodeId
04 = (Coordinator, RxOnWhenIdle)
02 = (Unknown Permit Join)
00 = (Coordinator)
FF = (LQI)
The values after the NodeId are bitmasks, not bytes.

searching Tags (TLV) with C function

I' m working on Mastercard Paypass transactions, I Have sent a READ RECORD command and got the result:
70 81 AB 57 11 54 13 33 00 89 60 10 83 D2 51 22
20 01 23 40 91 72 5A 08 54 13 33 00 89 60 10 83
5F 24 03 25 12 31 5F 25 03 04 01 01 5F 28 02 00
56 5F 34 01 01 8C 21 9F 02 06 9F 03 06 9F 1A 02
95 05 5F 2A 02 9A 03 9C 01 9F 37 04 9F 35 01 9F
45 02 9F 4C 08 9F 34 03 8D 0C 91 0A 8A 02 95 05
9F 37 04 9F 4C 08 8E 0E 00 00 00 00 00 00 00 00
42 03 5E 03 1F 03 9F 07 02 FF 00 9F 08 02 00 02
9F 0D 05 00 00 00 00 00 9F 0E 05 00 08 00 60 00
9F 0F 05 00 00 00 00 00 9F 42 02 09 78 9F 4A 01
82 9F 14 01 00 9F 23 01 00 9F 13 02 00 00
This response contains TLV data objects (without spaces). I have converted the response as described in the following:
// Read Record 1 with SFI2
//---------------------------------SEND READ RECORD-------------------
inCtlsSendVAPDU(0x2C,0x03,(unsigned char *)"\x00\xB2\x01\x14\x00",5);
clrscr();
inRet = inCTLSRecv2(Response, 269);
LOG_HEX_PRINTF("Essai EMV4 Read record 1 EMV Paypass:",Response,inRet);
if(Response[14]==0x70)
{
sprintf(Response_PPSE,"%02X%02X",Response[12],Response[13]);//To retrieve length of received data
t1=hexToInt(Response_PPSE);// Convert length to integer
t11=t1-2;
i=14;
k=0;
//--------------------------- Converting data to be used later---------
while(i<t11+14)// 14 to escape the header+ command+ status+ length
{
sprintf(READ1+(2*k),"%02X",Response[i]);
i++;
k++;
}
Now I should check if this Response contains the Mandatory Tags:
5A - Application Primary Account Number (PAN)
5F24 - Application Expiration Date
8C - Card Risk Management Data Object List 1 (CDOL1)
8D - Card Risk Management Data Object List 2 (CDOL2)
So I tried the following to check for the 5A tag (Application Primary Account Number (PAN)):
i=0;
t11=2*t11;
while(i<=t11)
{
strncpy(Response_PPSE,READ1+i,2);
if(strncmp(Response_PPSE,"\x05\x0A")==0)
{
write_at("true",4,1,1);// Just to test on the terminal display
goto end;
}
else
i=i+2;
}
goto end;
I don't know why nothing is displayed on the terminal, The if block is not executed!
I tried to print the 5A tag manually by:
strncpy(Response_PPSE,READ1+44,2);
write_at(Response_PPSE,strlen(Response_PPSE),1,1);
And it display the right value!!
Can someone helps to resolve this issue?
You don't find that tag because you are not searching for the string "5A" but for the string "\x05\x0A" (ENQ character + line feed character). Moreover, I wonder if the above code actually compiles as you did not specify the mandatory length argument to strncmp(). You could try something like
if(strncmp(Response_PPSE,"5A", 2)==0)
instead.
However, you should understand that you are scanning the whole response data for the value 5A. Therefore, finding this value could also mean that it was part of some other TLV tag's data field, length field or even part of a multi-byte tag field. It would therefore make sense to implement (or use an existing) TLV parser for the BER (Basic Encoding Rules) format.
It's not a good approach to search a specific byte in a raw byte-stream data using strings functions at first place.
The generic TLV parser is a very easy algorithm and you will do it in 30 minutes or so.
In general a pseudo-code for TLV parser that look for a specific Tag would be something like this:
index = 0
while (byte[i] != 0x5A or EOF)
{
length = DecodeLength(byte[i+1])
i += length + 2 // + 1 for L (length) byte itself, it might be encoded with
// 2 bytes so the function DecodeLength can return the number
// of bytes lenght has been encoded
// +1 for T (tag) byte
}
if(EOF) return tag_not_found
return byte[i + 2], length // pointer to data for Tag '5A'and length of data

Visual C++, Extract text information from a DLL database (sentence mining)

The 2006 version of the free offline Chinese sentence dictionary Jukuu contains a collection of 100,000 publicly sourced example sentences in Chinese and English in a .dll file.
The application size is about 80mb, but once installed a 500mb DLL dictionary file is created with the source text. For whatever reason the application doesn't run on my computer, and I'd like to extract all the example sentences so I can do some POS analysis on them.
Opening the 500mb .DLL file is mostly gibberish, except for some fragments of text here and there and references to other resources.
I'm wondering if there is any way I can extract the information in plain text?
The application can be downloaded here: http://www.jukuu.com/down/download.html
Thanks!
Edit: Never-mind, It looks like when viewed in HEX, the file is ordered in a way that is not conducive to sentence mining at all:
00 06 00 02 00 07 03 00 01 00 00 00 FF FE 6E 65 76 65 72 7E 73 74 61
6E 64 20 75 70 0B 00 00 80 00 00 00 00 00 00 00 00 FF FE 33 30 32
32 31 38 32 36 38 2D 00 16 00 06 00 02 00 07 03 00 01 00 00 00 FF
FE 65 78 74 72 65 6D 65 6C 79 7E 63 6C 6F 73 65 0B 00 00 80 00 00
00 00 00 00 00
Something like ÿþnever~stand upÿþ302218268-ÿþextremely~close
Any other ideas on how to mine sentences from the application? Maybe a batch script?

wrapping h264 olimex a13 encoded data to RTMP server

I am trying to stream Olimex A13 encoded data to RTMP server with librtmp and view it on VLC. The problem is I cannot find how to correctly wrap Cedar data to flash container... I have found this example which looks exactly what I would need, but VLC has still problems understanding it -
No suitable decoder module:
VLC does not support the audio or video format "undf". Unfortunately there is no way for you to fix this.
I tried grabbing data through rtmpdump and it appears I cannot disable audio track which VLC is trying to get from the header
Format : Flash Video
File size : 195 KiB
Duration : 1mn 27s
Overall bit rate : 1 464 Kbps
_Server : NGINX RTMP (github.com/arut/nginx-rtmp-module)
_displayWidth : 640.000
_displayHeight : 480.000
_fps : 25.000
Video
Format : AVC
Format/Info : Advanced Video Codec
Codec ID : 7
Duration : 1mn 27s
Width : 640 pixels
Height : 480 pixels
Display aspect ratio : 4:3
Frame rate mode : Constant
Frame rate : 25.000 fps
Bit depth : 8 bits
Bits/(Pixel*Frame) : 0.191
Audio
The last line for 'Audio' is what I suspect is causing VLC to err. Moreover I am sure the video data I am trying to send is not correctly wrapped.
From what I understood looking into example is that FLV expects a general stream description which can be flagged for video and audio stream information, other examples I found (including ffmpeg ) use some magic number following onMetaData tag, but I dont think I do that by following an example from above:
STR2AVAL(av, "onMetaData");
enc = AMF_EncodeString(enc, pend, &av);
*enc++ = AMF_ECMA_ARRAY;
enc = AMF_EncodeInt32(enc, pend, 5+5+2); //5 - video 5 - audio
The dump of my rtmp header is:
sending 307 as header
00000000 02 00 0d 40 73 65 74 44 61 74 61 46 72 61 6d 65 ...#setDataFrame
00000010 02 00 0a 6f 6e 4d 65 74 61 44 61 74 61 03 00 06 ...onMetaData...
00000020 61 75 74 68 6f 72 02 00 00 00 09 63 6f 70 79 72 author.....copyr
00000030 69 67 68 74 02 00 00 00 0b 64 65 73 63 72 69 70 ight.....descrip
00000040 74 69 6f 6e 02 00 00 00 08 6b 65 79 77 6f 72 64 tion.....keyword
00000050 73 02 00 00 00 06 72 61 74 69 6e 67 02 00 00 00 s.....rating....
00000060 0a 70 72 65 73 65 74 6e 61 6d 65 02 00 06 43 75 .presetname...Cu
00000070 73 74 6f 6d 00 05 77 69 64 74 68 00 40 84 00 00 stom..width.#...
00000080 00 00 00 00 00 05 77 69 64 74 68 00 40 84 00 00 ......width.#...
00000090 00 00 00 00 00 06 68 65 69 67 68 74 00 40 7e 00 ......height.#~.
000000a0 00 00 00 00 00 00 09 66 72 61 6d 65 72 61 74 65 .......framerate
000000b0 00 40 39 00 00 00 00 00 00 00 0c 76 69 64 65 6f .#9........video
000000c0 63 6f 64 65 63 69 64 02 00 04 61 76 63 31 00 0d codecid...avc1..
000000d0 76 69 64 65 6f 64 61 74 61 72 61 74 65 00 40 96 videodatarate.#.
000000e0 e3 60 00 00 00 00 00 08 61 76 63 6c 65 76 65 6c .`......avclevel
000000f0 00 3f f0 00 00 00 00 00 00 00 0a 61 76 63 70 72 .?.........avcpr
00000100 6f 66 69 6c 65 00 40 50 80 00 00 00 00 00 00 17 ofile.#P........
00000110 76 69 64 65 6f 6b 65 79 66 72 61 6d 65 5f 66 72 videokeyframe_fr
00000120 65 71 75 65 6e 63 79 00 40 20 00 00 00 00 00 00 equency.# ......
00000130 00 00 09 ...
After that application send PSP and SPS data which I am not sure I encode correctly too:
This is what I have in my encoder application
psp struct start
00000000 34 32 30 31 66 00 00 00 00 00 5a 30 49 41 48 2b 4201f.....Z0IAH+
00000010 56 41 55 42 37 49 00 00 00 00 00 00 00 00 00 00 VAUB7I..........
00000020 00 00 00 00 00 00 00 00 61 4f 34 78 45 67 3d 3d ........aO4xEg==
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000040 00 00 00 00 00 00 ......
psp struct end
And this is what I send to RTMP server:
sending 38 as sps/pps info
00000000 17 00 00 00 00 01 42 c0 15 03 01 00 0d 67 5a 30 ......B......gZ0
00000010 49 41 48 2b 56 41 55 42 37 49 01 00 09 68 61 4f IAH+VAUB7I...haO
00000020 34 78 45 67 3d 3d 4xEg==
info frame sent
So the questions are:
How do you modify the header to explicitly state that its video only?
How does one wrap video data in NAL format and how often do you have to transmit SPS and PPS data?
Are there any online services to inspect the RTMP data to understand whats wrong with it? VLC doesnt help much...

Resources