advanced array of bytes searching - arrays

I have a binary file and I need to find a "dynamic array of bytes" in it. The array looks like this:
d0 30 60 XX 5d 48
where XX can be any hex value.
I need to find all occurrences of this array in the binary file, that is, every byte sequence that starts with D0 30 60 (hex), followed by "XX" (any byte), followed by 5D 48 (hex).
Is there any tool or Python script that can do that?

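You can use this, assuming the file is a text dump of space-separated lowercase hex bytes:
import re
fil = open("myfile.txt")
txt = fil.read()
fil.close()
# re.search scans the whole text; re.match would only test the very start
mo = re.search(r'd0 30 60 ([0-9a-f][0-9a-f]) 5d 48', txt)
I haven't used regex much in Python before; I have mostly used it in PHP and the like, so have a look at that link.
Edit: this might be better, since findall returns every occurrence:
with open('test.txt', 'r') as f:
    s = re.findall(r'd0 30 60 ([0-9a-f][0-9a-f]) 5d 48', f.read())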
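Since the question is about a binary file rather than a text dump of hex digits, a bytes-based regex may be a closer fit. The following is a minimal sketch, not a tested tool: the function name `find_pattern` is made up for illustration, and it assumes the whole file fits in memory.

```python
import re

# D0 30 60, then any single byte, then 5D 48.
# re.DOTALL is needed so that "." also matches the newline byte 0x0A.
PATTERN = re.compile(rb"\xd0\x30\x60(.)\x5d\x48", re.DOTALL)


def find_pattern(data: bytes):
    """Return a list of (offset, wildcard_byte) for every match in data."""
    return [(m.start(), m.group(1)[0]) for m in PATTERN.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\xd0\x30\x60\x0a\x5d\x48\xff\xd0\x30\x60\xab\x5d\x48"
    for offset, value in find_pattern(sample):
        print(f"offset {offset}: XX = {value:02x}")
```

For a real file, the bytes would come from something like `open(path, "rb").read()`, where `path` is whatever file is being searched.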

Related

Converting a hexadecimal array into Binary and then into Decimal

I have a sample structure which has two sets of data. The first data contains the following Hex array '00 7F 3F FF 08 FF 60 26' and then when I convert it into binary and then decimal I get a correct answer which is '0 127 63 255 8 255 96 38'.
However, I have some data arrays that are not formatted like the first one; they look something like this: '1 40 0 F 00 40 00 47'. When I try to convert these kinds of data sets the result is inaccurate. I get something like '64 0 64 0 71' while the expected result is '1 64 0 15 0 64 0 71'.
This is my code with a sample data:
%% Structure
a(1).Id = 118;
a(1).Data = '00 7F 3F FF 08 FF 60 26';
a(2).Id = 108;
a(2).Data = '1 40 0 F 00 40 00 47';
%% Hexadecimal (Data) --> Binary --> Decimal
Data = a(2).Data;
str = regexp(Data,' ','split');
Ind = cellfun(@length,str);
str = str(Ind==2);
%Hex to Binary
binary = hexToBinaryVector(str,8,'MSBFirst');
%Binary to Decimal
Decimal = bi2de(binary,'left-msb');
Any help will be really appreciated!
Adding two lines should do the trick:
str = regexp(Data,' ','split');
Ind = cellfun(@length,str);
str(Ind==1) = strcat('0',str(Ind==1));
Ind = cellfun(@length,str);
str = str(Ind==2);
All this does is prepend a 0 to any hex string that is only one character long, which puts it into the correct two-character format. You could also do this inside the cellfun call.
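As a cross-check of the expected result, the same conversion can be sketched in Python, where parsing each token as base 16 makes the zero-padding step unnecessary. The function name `hex_tokens_to_decimal` is illustrative only.

```python
def hex_tokens_to_decimal(data: str) -> list[int]:
    """Parse whitespace-separated hex tokens of any width into integers."""
    return [int(token, 16) for token in data.split()]


if __name__ == "__main__":
    print(hex_tokens_to_decimal("00 7F 3F FF 08 FF 60 26"))
    print(hex_tokens_to_decimal("1 40 0 F 00 40 00 47"))
```

Both sample strings from the question produce eight values, so one-character tokens such as '1' and 'F' are no longer dropped.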

Using a 'for loop' to iterate through a cell array

In matlab, I have a cell array block (s) with hex values.
a = '40 C0 70 EB';
b = '40 C0 80 94';
c = '40 C0 90 59';
s = {a;b;c};
I want to iterate horizontally through each line, from right to left, such that:
first byte 'EB' must be converted to binary ( i.e. EB = 1110 1011 = 8 bits) and saved in some variable/array
Then, 'EB & 70' must be converted to binary but their binary values must be stored together (i.e. EB & 70 = 11101011 01110000 = 16 bits) in some variable/array.
Similarly, 'EB & 70 & C0' converted to binary (i.e. EB & 70 & C0 = 11101011 01110000 11000000 = 24 bits) in some variable/array.
Similarly, '40 C0 70 EB' (i.e. 40 & C0 & 70 & EB = 11101011 01110000 11000000 01000000 = 32 bits)
Finally, same thing has to be carried out for the rest of the lines.
I have written some code to convert the individual hex values into their binary equivalents, but I am not sure how to proceed from here.
a = '40 C0 70 EB';
b = '40 C0 80 94';
c = '40 C0 90 59';
s = {a;b;c};
s = cellfun(@strsplit, s, 'UniformOutput', false);
s = vertcat(s{:});
dec = hex2dec(s);
bin = dec2bin(dec);
x=cellstr(bin);
bin = mat2cell(x, repmat(size(s,1),1,size(s,2)));
Any suggestions on how to accomplish these feats?
From the code you've included in your question it seems you're most of the way there.
The bit I think you're missing is how to concatenate binary words, which is a bit awkward in MATLAB. See this post for some tips. However, for your example the slightly hacky option of just converting to strings and concatenating might be easier.
Making use of your code, the example below outputs:
'11101011' '1110101101110000' '111010110111000011000000' '11101011011100001100000001000000'
'10010100' '1001010010000000' '100101001000000011000000' '10010100100000001100000001000000'
'01011001' '0101100110010000' '010110011001000011000000' '01011001100100001100000001000000'
which I think is what you want, though I wasn't totally sure from your text. I assume you want to keep all four strings (8-bit, 16-bit, 24-bit and 32-bit) from each row, for a total of 12 binary strings.
a = '40 C0 70 EB';
b = '40 C0 80 94';
c = '40 C0 90 59';
s = {a;b;c};
s = cellfun(@strsplit, s, 'UniformOutput', false);
s = vertcat(s{:});
% Empty cell to store output binary strings
outputBinary = cell(size(s));
outputDec = zeros(size(s));
% Iterate over each row
for rowNum = 1:size(s,1)
    % Build up the binary string from left to right
    binaryString = [];
    % Iterate over each column
    for colNum = 1:size(s,2)
        % Convert hex -> dec -> 8-bit binary word
        % and add the new word to the end of the growing string for this row
        thisBinary = dec2bin(hex2dec(s{rowNum,end+1-colNum}), 8);
        binaryString = [binaryString, thisBinary]; %#ok<AGROW>
        % Save solution to output array:
        outputBinary{rowNum, colNum} = binaryString;
        outputDec(rowNum, colNum) = bin2dec(binaryString);
    end
end
% Display result
format long;
disp(outputBinary);
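For comparison, the same right-to-left accumulation can be sketched in Python. This is only an illustration of the string-concatenation idea, not a replacement for the MATLAB code; the name `cumulative_bits` is hypothetical.

```python
def cumulative_bits(row: str) -> list[str]:
    """Walk the hex bytes of a row from right to left and return the
    growing bit string after each byte (8, 16, 24, ... bits)."""
    results = []
    accumulated = ""
    for token in reversed(row.split()):
        # Each byte becomes a zero-padded 8-bit binary word
        accumulated += format(int(token, 16), "08b")
        results.append(accumulated)
    return results


if __name__ == "__main__":
    for row in ["40 C0 70 EB", "40 C0 80 94", "40 C0 90 59"]:
        print(cumulative_bits(row))
```

Keeping the bits as strings avoids any integer width limits; `int(bits, 2)` converts a given string to a number if the decimal value is needed.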

Array manipulation in Perl

The Scenario is as follows:
I have a dynamically changing text file which I'm passing to a variable to capture a pattern that occurs throughout the file. It looks something like this:
my @array1;
my $file = `cat <file_name>.txt`;
if (@array1 = ( $file =~ m/<pattern_match>/g) ) {
    print "@array1\n";
}
The array looks something like this:
10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
From the above array1 output, the pattern of the array is something like this:
T1 P1 t1(1) t1(2)...t1(25) T2 P2 t2(1) t2(2)...t2(25) so on and so forth
Currently, /g in the regex returns a set of values that occur only twice (only because the txt file contains this pattern that number of times). This particular pattern occurrence will change depending on the file name that I plan to pass dynamically.
What I intend to achieve:
The final result should be a csv file that contains these values in the following format:
T1,P1,t1(1),t1(2),...,t1(25)
T2,P2,t2(1),t2(2),...,t2(25)
so on and so forth
For instance: My final CSV file should look like this:
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
The delimiter for this pattern is T1 which is time in the format \d\d:\d\d:\d\d
Example: 10:38:49, 10:38:51 etc
What I have tried so far:
use Data::Dumper;
use List::MoreUtils qw(part);
my $partitions = 2;
my $i = 0;
print Dumper part {$partitions * $i++ / @array1} @array1;
In this particular case, my $partitions = 2; holds good since the pattern occurrence in the txt file is only twice, and hence, I'm splitting the array into two. However, as mentioned earlier, the pattern occurrence number keeps changing according to the txt file I use.
The Question:
How can I make this code more generic to achieve my final goal of splitting the array into multiple equal sized arrays without losing the contents of the original array, and then converting these mini-arrays into one single CSV file?
If there is any other workaround for this other than array manipulation, please do let me know.
Thanks in advance.
PS: I considered a hash of hashes and an array of hashes, but neither data structure seemed like a healthy solution for the problem I'm facing right now.
As far as I can tell, all you need is splice, which will work fine as long as you know the record size and it is constant.
The data you showed has 52 fields, but your description requires 27 fields per record. It looks like each record has T, P, and t1 .. t24, rather than ending at t25.
Here's how it looks if I split the data into 26-element chunks:
use strict;
use warnings 'all';
my @data = qw/
10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
/;
while ( @data ) {
    my @set = splice @data, 0, 26;
    print join(',', @set), "\n";
}
output
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
If you wanted to use List::MoreUtils instead of splice, the natatime function returns an iterator that will do the same thing as the splice above, like this:
use List::MoreUtils qw/ natatime /;
my $iter = natatime 26, @data;
while ( my @set = $iter->() ) {
    print join(',', @set), "\n";
}
The output is identical to that of the program above
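If the record size is not known in advance, a different technique from the fixed-size chunking above is to start a new record whenever a token looks like a timestamp, since the question states that the \d\d:\d\d:\d\d time is the delimiter. A rough Python sketch of that idea follows; the name `to_csv_rows` is invented for illustration, and it assumes no data value ever has the form of a timestamp.

```python
import re

# Assumption: only the T field ever has the form dd:dd:dd
TIME = re.compile(r"\d\d:\d\d:\d\d")


def to_csv_rows(tokens):
    """Group tokens into records that each start at a timestamp,
    and return one comma-joined string per record."""
    rows = []
    for token in tokens:
        if TIME.fullmatch(token) or not rows:
            rows.append([])  # a timestamp opens a new record
        rows[-1].append(token)
    return [",".join(row) for row in rows]


if __name__ == "__main__":
    tokens = "10:38:49 788 56 51 10:38:51 788 56 52".split()
    print("\n".join(to_csv_rows(tokens)))
```

This trades the constant-record-size assumption for the assumption that timestamps are unambiguous, which may or may not hold for other input files.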
Note
It is very wrong to start a new shell process just to use cat to read a file. The standard method is to undefine the input record separator $/ like this
my $file = do {
    open my $fh, '<', '<file_name>.txt' or die "Unable to open file for input: $!";
    local $/;
    <$fh>;
};
Or if you prefer you could use File::Slurper like this
use File::Slurper qw/ read_binary /;
my $file = read_binary '<file_name>.txt';
although you will probably have to install it as it is not a core module

import complex data structure in hive with custom separators

I have a huge dataset with the following structure
fieldA,fieldB,fieldC;fieldD|fieldE,FieldF;fieldG|fieldH,FieldI ...
where:
fieldA,fieldB and fieldC are strings that should be imported into separate columns
fieldD|fieldE,FieldF;fieldG|fieldH,FieldI is an array (elements separated by semicolon) of maps (elements separated by |) of arrays (elements separated by comma, e.g. fieldE,FieldF)
My problem is that the initial array is separated from the fieldA,fieldB,fieldC with a semicolon. My question is how do I set the separators correctly when I create a table.
This one does not recognize an array - although I provide a semicolon as a field separator
CREATE TABLE string_array(
first_part STRING -- this would be to store fieldA,fieldB,fieldC
,second_part ARRAY<STRING> -- this would be to store fieldD|fieldE,FieldF;fieldG|fieldH,FieldI and split it by semicolon
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\u003b'
COLLECTION ITEMS TERMINATED BY '\\u003b'
MAP KEYS TERMINATED BY '|'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '...' INTO TABLE string_array;
Any ideas how to make it work so I can build upon it? Thanks a lot in advance!
Great question.
I think that we can break this problem up into two discrete pieces: (1) Hive table structure, and (2) data delimiters.
Let's start by looking at the Hive table structure. If I understood your data structure correctly (please correct me if I didn't), the table structure that would best describe your data could be represented as:
CREATE TABLE string_array
AS
SELECT 'fieldA,fieldB,fieldC' AS first_part, array(map('fieldD', array('fieldE', 'FieldF')), map('fieldG', array('fieldH','FieldI'))) AS second_part;
Note that the field second_part is an array of maps, where the key to each map references an array of strings. In other words, the field second_part consists of an array within a map within an array.
If I use the statement above to create a table, I can then copy the resulting table to the local filesystem and look at how Hive assigns default delimiters to it. I know that you don't want to use default delimiters, but please bear with me here. The resulting table looks like this in its serialized on-disk representation:
00000000 66 69 65 6c 64 41 2c 66 69 65 6c 64 42 2c 66 69 |fieldA,fieldB,fi|
00000010 65 6c 64 43 01 66 69 65 6c 64 44 04 66 69 65 6c |eldC.fieldD.fiel|
00000020 64 45 05 46 69 65 6c 64 46 02 66 69 65 6c 64 47 |dE.FieldF.fieldG|
00000030 04 66 69 65 6c 64 48 05 46 69 65 6c 64 49 0a |.fieldH.FieldI.|
If we look at how Hive sees the delimiters we note that Hive actually sees five types or levels of delimiters:
delimiter 1 = x'01' (between fieldC & fieldD) -- between first_part and second_part
delimiter 2 = x'02' (between fieldF & fieldG) -- between the two maps in the array of maps
delimiter 3 = x'03' not used
delimiter 4 = x'04' (between fieldD & fieldE) -- between the key and the array of fields within the map
delimiter 5 = x'05' (between fieldE & fieldF) -- between the fields within the array within the map
And herein lies your problem. Current versions of Hive (as of 0.11.0) only allow you to override three levels of delimiters. But due to the levels of nesting within your data, Hive is seeing a requirement for greater than three levels of delimiters.
My suggestion would be to pre-process your data to use Hive's default delimiters. With this approach you should be able to load your data into Hive and reference it.
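One way such pre-processing might look is sketched below in Python. It maps the custom separators onto the delimiter bytes seen in the hex dump above (x'01', x'02', x'04', x'05'). The function name `to_hive_defaults` is hypothetical, and the sketch assumes every line contains at least one semicolon and that the separator characters never occur inside field values.

```python
def to_hive_defaults(line: str) -> str:
    """Rewrite one input line so it uses the default delimiters
    observed in the serialized table."""
    # The first semicolon separates first_part from second_part
    first_part, _, second_part = line.partition(";")
    second_part = (
        second_part.replace(";", "\x02")  # between maps in the array
        .replace("|", "\x04")             # between a map key and its array
        .replace(",", "\x05")             # between items of the inner array
    )
    return first_part + "\x01" + second_part


if __name__ == "__main__":
    raw = "fieldA,fieldB,fieldC;fieldD|fieldE,FieldF;fieldG|fieldH,FieldI"
    print(repr(to_hive_defaults(raw)))
```

The commas inside first_part are left untouched because that column is a plain string.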

Importing text files with comments in MATLAB

Is there any character or character combination that MATLAB interprets as a comment when importing data from text files? That is, when it detects it at the beginning of a line, it knows to ignore the whole line?
I have a set of points in a file that look like this:
And as you can see, MATLAB doesn't seem to understand them very well. Is there anything other than // that I could use that MATLAB knows to ignore?
Thanks!
Actually, your data is not consistent: you must have the same number of columns on each line.
1)
Apart from that, using '%' as comments will be correctly recognized by importdata:
file.dat
%12 31
12 32
32 22
%abc
13 33
31 33
%ldddd
77 7
66 6
%33 33
12 31
31 23
matlab
data = importdata('file.dat')
2)
Otherwise use textscan to specify arbitrary comment symbols:
file2.dat
//12 31
12 32
32 22
//abc
13 33
31 33
//ldddd
77 7
66 6
//33 33
12 31
31 23
matlab
fid = fopen('file2.dat');
data = textscan(fid, '%f %f', 'CommentStyle','//', 'CollectOutput',true);
data = cell2mat(data);
fclose(fid);
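The comment-skipping behaviour that CommentStyle provides can be illustrated with a small Python sketch. This is only a model of the idea, not how textscan is implemented; the name `read_points` is made up, and it only handles comments that start a line.

```python
def read_points(lines, comment="//"):
    """Parse whitespace-separated numbers, skipping blank lines and
    lines that start with the comment marker."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment):
            continue
        rows.append([float(value) for value in line.split()])
    return rows


if __name__ == "__main__":
    sample = ["//12 31", "12 32", "32 22", "//abc", "13 33"]
    print(read_points(sample))
```

Passing `comment="%"` would mimic the default comment character instead.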
If you use the function textscan, you can set the CommentStyle parameter to // or %. Try something like this:
fid = fopen('myfile.txt');
iRow = 1;
while (~feof(fid))
    myData(iRow,:) = textscan(fid,'%f %f\n','CommentStyle','//');
    iRow = iRow + 1;
end
fclose(fid);
That will work if there are two numbers per line. I notice in your examples the number of numbers per line varies. There are some lines with only one number. Is this representative of your data? You'll have to handle this differently if there isn't a uniform number of columns in each row.
Have you tried %, the default comment character in MATLAB?
As Amro pointed out, if you use importdata this will work.
