Data conversion from known data - data-modeling

I am trying to add a function to a software (not written by me, no sources available) in which a byte string seems to be convertet somehow:
I am reading an ID from a RFID token: as Bytes:
52 55 48 48 69 54 52 65 54 65 this ID is written in the database as 57770346.
Not just the entries are different, the length is, too. Maybe the first entrys are cut off.
How could the function look like? How could i find it?
Here is some additional data:
52 55 48 48 69 54 52 65 54 65 -> 57770346
52 55 48 48 69 53 68 67 51 49 -> 57742129
48 49 48 52 68 70 57 69 65 68 -> 76731309

Found it myself:
BIN to ASCII
ASCII as HEX to INT
(first 4 characters are missing in the DB)
I'm embarrassed...

Related

PDF file with two trailers?

If I analyse multiple PDF files with a hex editor, I see that all of them have two trailers.
That's possible if an object has changed or renewed (https://blog.idrsolutions.com/multiple-trailers-in-a-pdf-file/), but in my case, the PDF files are not edited.
Does anyone know why all of the analysed files have two trailers?
This is a PDF file that contains a lot of text and also two images (there are two trailers in this file, who are (almost) identical to each other: :
0001a30bh: 74 72 61 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 ; TRAILER..<</Size
0001a31bh: 20 34 37 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 ; 47/Root 1 0 R/I
0001a32bh: 6E 66 6F 20 31 35 20 30 20 52 2F 49 44 5B 3C 45 ; nfo 15 0 R/ID[<E
0001a33bh: 42 33 46 46 33 41 31 45 33 37 33 43 36 34 45 39 ; B3FF3A1E373C64E9
0001a34bh: 31 30 45 33 46 42 43 34 45 37 38 39 31 33 43 3E ; 10E3FBC4E78913C>
0001a35bh: 3C 45 42 33 46 46 33 41 31 45 33 37 33 43 36 34 ; <EB3FF3A1E373C64
0001a36bh: 45 39 31 30 45 33 46 42 43 34 45 37 38 39 31 33 ; E910E3FBC4E78913
0001a37bh: 43 3E 5D 20 3E 3E 0D 0A 73 74 61 72 74 78 72 65 ; C>] >>..startxre
0001a38bh: 66 0D 0A 31 30 36 33 32 33 0D 0A 25 25 45 4F 46 ; f..106323..%%EOF
0001a39bh: 0D 0A 78 72 65 66 0D 0A 30 20 30 0D 0A 74 72 61 ; ..xref..0 0..TRA
0001a3abh: 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 20 34 37 ; ILER..<</Size 47
0001a3bbh: 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 6E 66 6F ; /Root 1 0 R/Info
0001a3cbh: 20 31 35 20 30 20 52 2F 49 44 5B 3C 45 42 33 46 ; 15 0 R/ID[<EB3F
0001a3dbh: 46 33 41 31 45 33 37 33 43 36 34 45 39 31 30 45 ; F3A1E373C64E910E
0001a3ebh: 33 46 42 43 34 45 37 38 39 31 33 43 3E 3C 45 42 ; 3FBC4E78913C><EB
0001a3fbh: 33 46 46 33 41 31 45 33 37 33 43 36 34 45 39 31 ; 3FF3A1E373C64E91
0001a40bh: 30 45 33 46 42 43 34 45 37 38 39 31 33 43 3E 5D ; 0E3FBC4E78913C>]
0001a41bh: 20 2F 50 72 65 76 20 31 30 36 33 32 33 2F 58 52 ; /Prev 106323/XR
0001a42bh: 65 66 53 74 6D 20 31 30 35 39 37 32 3E 3E 0D 0A ; efStm 105972>>..
0001a43bh: 73 74 61 72 74 78 72 65 66 0D 0A 31 30 37 34 32 ; startxref..10742
0001a44bh: 31 0D 0A 25 25 45 4F 46 ; 1..%%EOF
This is a PDF file that does only contain some random characters:
000071cbh: 74 72 61 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 ; TRAILER..<</Size
000071dbh: 20 32 33 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 ; 23/Root 1 0 R/I
000071ebh: 6E 66 6F 20 39 20 30 20 52 2F 49 44 5B 3C 39 46 ; nfo 9 0 R/ID[<9F
000071fbh: 46 31 32 45 31 43 30 41 35 36 44 42 34 38 41 33 ; F12E1C0A56DB48A3
0000720bh: 41 31 43 37 32 30 33 38 32 33 30 32 45 32 3E 3C ; A1C720382302E2><
0000721bh: 39 46 46 31 32 45 31 43 30 41 35 36 44 42 34 38 ; 9FF12E1C0A56DB48
0000722bh: 41 33 41 31 43 37 32 30 33 38 32 33 30 32 45 32 ; A3A1C720382302E2
0000723bh: 3E 5D 20 3E 3E 0D 0A 73 74 61 72 74 78 72 65 66 ; >] >>..startxref
0000724bh: 0D 0A 32 38 36 35 39 0D 0A 25 25 45 4F 46 0D 0A ; ..28659..%%EOF..
0000725bh: 78 72 65 66 0D 0A 30 20 30 0D 0A 74 72 61 69 6C ; xref..0 0..TRAIL
0000726bh: 65 72 0D 0A 3C 3C 2F 53 69 7A 65 20 32 33 2F 52 ; ER..<</Size 23/R
0000727bh: 6F 6F 74 20 31 20 30 20 52 2F 49 6E 66 6F 20 39 ; oot 1 0 R/Info 9
0000728bh: 20 30 20 52 2F 49 44 5B 3C 39 46 46 31 32 45 31 ; 0 R/ID[<9FF12E1
0000729bh: 43 30 41 35 36 44 42 34 38 41 33 41 31 43 37 32 ; C0A56DB48A3A1C72
000072abh: 30 33 38 32 33 30 32 45 32 3E 3C 39 46 46 31 32 ; 0382302E2><9FF12
000072bbh: 45 31 43 30 41 35 36 44 42 34 38 41 33 41 31 43 ; E1C0A56DB48A3A1C
000072cbh: 37 32 30 33 38 32 33 30 32 45 32 3E 5D 20 2F 50 ; 720382302E2>] /P
000072dbh: 72 65 76 20 32 38 36 35 39 2F 58 52 65 66 53 74 ; rev 28659/XRefSt
000072ebh: 6D 20 32 38 33 37 34 3E 3E 0D 0A 73 74 61 72 74 ; m 28374>>..start
000072fbh: 78 72 65 66 0D 0A 32 39 32 37 35 0D 0A 25 25 45 ; xref..29275..%%E
0000730bh: 4F 46 ; OF
Those files are most likely created by MS Word. The excerpts you posted look like their interpretation of hybrid reference PDFs.
There are two special constructs in which the PDF specification uses the mechanisms it introduced for incremental updates for something else:
Linearized PDFs (see ISO 32000-2:2020 Annex F) and
Hybrid-reference PDFs (see ISO 32000-2:2020 Section 7.5.8.4).
Your excerpts look like the latter type of PDFs.
Some backgrounds:
With PDF 1.5 Adobe introduced the option to collect multiple non-stream indirect objects in a stream, a so called "object stream". The advantage of doing so is that data in streams can be compressed while otherwise those object cannot be compressed. At the same time they also introduced the option to put the cross reference table data into streams, the so called "cross-reference streams", also to allow compression.
Obviously a new type of cross reference entry was necessary to describe indirect objects in object streams, so they defined entries of that kind, but only for the cross-reference streams, not for the old cross reference tables.
PDFs stored using object and cross-reference streams often indeed are much smaller than the same PDFs stored as regular indirect objects with cross reference tables. On the other hand PDF processors that were not aware of these techniques couldn't open these PDFs at all.
Thus, Adobe came up with the idea of hybrid files: Files that contain the basic objects in a PDF required to view it at all in the old-fashioned way and the objects for newer or optional features in object and cross reference streams. The trailers of the cross reference tables contain an entry XRefStm pointing to the cross reference stream.
For some reason, though, it was specified that object lookup first had to be attempted in the cross reference table, and only if no entry was found there for the object number in question, the associated cross reference stream was to be searched.
As the first cross reference table is required to cover the complete range of object numbers used, this lookup strategy implied that hybrid-reference files needed a second cross reference table whose trailer could point to the cross reference stream that would be used for lookups before the innermost, first cross reference table.
This is what we see in your example:
trailer
<</Size 47/Root 1 0 R/Info 15 0 R/ID[<EB3FF3A1E373C64E910E3FBC4E78913C><EB3FF3A1E373C64E910E3FBC4E78913C>] >>
startxref
106323
%%EOF
xref
0 0
trailer
<</Size 47/Root 1 0 R/Info 15 0 R/ID[<EB3FF3A1E373C64E910E3FBC4E78913C><EB3FF3A1E373C64E910E3FBC4E78913C>] /Prev 106323/XRefStm 105972>>
startxref
107421
%%EOF
Actually most PDF producers implemented hybrid-reference files (if they did at all) under the impression that the cross reference stream and probably also the object streams should go between the first trailer and the second cross reference table. But there is no requirement for that, and the PDF export of MS Office chose to put all the streams before the first cross reference table. As that's the case for your examples, too, I assume they were produced by MS Office.

Matlab colon in getting all values of array have different pattern value as compare to without colon

I have created an array in Matlab and am facing an issue that when I ask to show all value of array not using colon, e.g., allfriendsmarks=m(), the result is:
allfriendsmarks =
45 65 57 86 49 56 99 87 67 60
But when I use a colon, allfriendsmarks=m(:) results in:
allfriendsmarks =
45
65
57
86
49
56
99
87
67
60
Why do both methods return a different result?

The first letter is being repeated of the first word

I am reading text from a file and outputting it as binary. I have modified the binary conversion as per follows:
Each capital letter shall start with 01 and will be followed by 5 bits.
The 5 bits shall hold the value of the letter.
The letters will have the value as A-2,B-3,C-4,D-5...
For example: HI-> (0101001)(0101010)
My code snippet is as follows:
void printinbits(int n)
{
for (int c = 4; c >= 0; c--)
{
long int k = n >> c;
if (k & 1)
printf("1");
else
printf("0");
}
}
int main()
{
//first letter is being repeated
char check[200];
FILE*fin= fopen("/Users/priya/Desktop/test.txt.rtf","r");
while((fscanf(fin,"%199s",check))==1)
{
for(int i=0;i<strlen(check);++i)
{
if(check[i]>=65&&check[i]<=90)
{
printf("01");
int n=check[i];
n-=63;
printinbits(n);
}
}
}
return 0;
}
My input->
HELLO
My output->
(0101001)(0101001)(0100110)(0101101)(0101101)(0110000)
(As you can see, the first letter H is being repeated)(Various letters are separated by brackets)
Here's a hex dump of a file hello.rtf containing the word HELLO in upper case. It was generated by TextEdit on a Mac.
0x0000: 7B 5C 72 74 66 31 5C 61 6E 73 69 5C 61 6E 73 69 {\rtf1\ansi\ansi
0x0010: 63 70 67 31 32 35 32 5C 63 6F 63 6F 61 72 74 66 cpg1252\cocoartf
0x0020: 31 34 30 34 5C 63 6F 63 6F 61 73 75 62 72 74 66 1404\cocoasubrtf
0x0030: 34 36 30 0A 7B 5C 66 6F 6E 74 74 62 6C 5C 66 30 460.{\fonttbl\f0
0x0040: 5C 66 73 77 69 73 73 5C 66 63 68 61 72 73 65 74 \fswiss\fcharset
0x0050: 30 20 48 65 6C 76 65 74 69 63 61 3B 7D 0A 7B 5C 0 Helvetica;}.{\
0x0060: 63 6F 6C 6F 72 74 62 6C 3B 5C 72 65 64 32 35 35 colortbl;\red255
0x0070: 5C 67 72 65 65 6E 32 35 35 5C 62 6C 75 65 32 35 \green255\blue25
0x0080: 35 3B 7D 0A 5C 6D 61 72 67 6C 31 34 34 30 5C 6D 5;}.\margl1440\m
0x0090: 61 72 67 72 31 34 34 30 5C 76 69 65 77 77 31 30 argr1440\vieww10
0x00A0: 38 30 30 5C 76 69 65 77 68 38 34 30 30 5C 76 69 800\viewh8400\vi
0x00B0: 65 77 6B 69 6E 64 30 0A 5C 70 61 72 64 5C 74 78 ewkind0.\pard\tx
0x00C0: 37 32 30 5C 74 78 31 34 34 30 5C 74 78 32 31 36 720\tx1440\tx216
0x00D0: 30 5C 74 78 32 38 38 30 5C 74 78 33 36 30 30 5C 0\tx2880\tx3600\
0x00E0: 74 78 34 33 32 30 5C 74 78 35 30 34 30 5C 74 78 tx4320\tx5040\tx
0x00F0: 35 37 36 30 5C 74 78 36 34 38 30 5C 74 78 37 32 5760\tx6480\tx72
0x0100: 30 30 5C 74 78 37 39 32 30 5C 74 78 38 36 34 30 00\tx7920\tx8640
0x0110: 5C 70 61 72 64 69 72 6E 61 74 75 72 61 6C 5C 70 \pardirnatural\p
0x0120: 61 72 74 69 67 68 74 65 6E 66 61 63 74 6F 72 30 artightenfactor0
0x0130: 0A 0A 5C 66 30 5C 66 73 32 34 20 5C 63 66 30 20 ..\f0\fs24 \cf0
0x0140: 48 45 4C 4C 4F 7D HELLO}
0x0146:
You may or may not be able to see the H of 'Helvetica' as the only other capital letter in the file โ€” that would account for producing the output for HHELLO. It looks like you might be on a Mac too, so maybe you'd see the same result โ€” or, at least, an equivalent one. (I used a homebrew hex dump program; you'd probably use xxd -g 1 test.txt.rtf, which would produce the hex with lower-case letters, and wouldn't include the final byte count line.)
You could, and should, print the data that your program reads in the loop, at least while debugging it, so that you can see what the program is processing. This is a very basic debugging technique.
In TextEdit, you can switch between rich text and plain text with the 'Make Plain Text' or 'Make Rich Text' option under the Format menu, or using โ‡งโŒ˜T (shift command T) to toggle between the two modes. Note how the file name changes as you do that.
Community Wiki since M Oehm pointed out the likely problem.

Matlab Recursion Not Working as Intended

So I'm trying to write some pretty basic code to essentially create an array of random numbers, based on certain rules. My end goal is to try to have an array of numbers, in which none of them match one another. However, it seems like the end array that my code is outputting has numbers that match, and I can't seem to figure out why.
I've pasted a sample output below, and as you can see, some of the numbers in the 'Totals' array match. I'm guessing something is wrong with the way I wrote the recursive 'addboard' function, but I have no clue what's wrong. If anyone could provide some advice, that'd be great. Thanks.
function [Components,Totals] = BoardForm1()
BoardLengths = [4,6,8,10,12];
Initial = [0,1,2,3,4,5,6,7];
Components = zeros(8,14);
Totals = zeros(8,14);
for i=1:14
for row = 1:length(Initial)
[currentboard,test] = addboard(row,BoardLengths,Initial,Totals);
Initial(row) = test;
Components(row,i) = currentboard
Totals(row,i) = test
end
end
end
function [currentboard,test] = addboard(x,BoardLengths,Initial,Totals)
currentboard = BoardLengths(randi(length(BoardLengths)));
test = Initial(x) + currentboard;
if ismember(test,Totals)
addboard(x,BoardLengths,Initial,Totals);
end
end
Totals =
12 16 28 34 44 56 68 76 84 94 106 114 118 128
13 25 35 39 43 49 53 61 65 75 83 91 95 103
6 18 22 34 42 50 54 66 72 82 86 92 104 112
15 23 35 41 51 57 63 69 73 81 87 99 105 111
14 26 36 48 58 68 80 90 100 104 108 114 120 130
9 13 23 27 31 37 43 49 55 61 65 75 87 91
12 24 34 42 46 54 60 64 72 76 82 88 92 96
19 29 33 39 47 57 69 77 83 89 101 109 119 125
MATLAB passes by value, so any changes made in the recursed addboard is ignored since its output values are ignored. Fix by setting the output values of [currentboard, test] = addboard
In general, I recommend doing this iteratively (while loop) instead of recursively. There may even be a one-liner that can do this, but I'm not sure from the comments what the requirements of the board are.
I am unable to understand your code, maybe some comments what the functions are supposed to do would be helpful, but just on a formal level reading the code there is at least one error in these lines:
if ismember(test,Totals)
addboard(x,BoardLengths,Initial,Totals);
end
You are calling addboard without output arguments which has no effect. It should probably be:
if ismember(test,Totals)
[currentboard,test] = addboard(x,BoardLengths,Initial,Totals);
end

Count frequency of each row in an array using Matlab

I have this huge array. I extracted the unique rows in a separate array. Now I would like to create a vector in which to store the occurrences of each unique row. How could I do that? Tried using histc. I found about tabulate, but only works on vectors.
x=[62 29 64
63 32 61
63 32 61
63 32 61
63 31 62
62 29 64
62 29 64
65 29 60
62 29 64
63 32 61
63 32 61
63 29 62
63 32 61
62 29 64
];
uA=unique(x)
[row, count] = histc(x,unique(x,'rows'))
I get the following error: Edge vector must be monotonically non-decreasing. Also encountered this error in a couple of other attempts.
Use unique this way -
[unique_rows,~,ind] = unique(x,'rows')
counts = histc(ind,unique(ind))
unqiue_rows and counts would be the outputs you might be interested in.
With your given data it yields -
unique_rows =
62 29 64
63 29 62
63 31 62
63 32 61
65 29 60
counts =
5
1
1
6
1
Bonus: You can improve on performance by avoiding the second use of unique this way -
counts = histc(ind,1:max(ind));

Resources