EF ADN specification in SIM/USIM - sim-card

I am building an application to read SIM EF files. Working from 3GPP TS 31.102, I am trying to parse the EF ADN file.
According to the spec, the structure of EF ADN is:
Bytes          Description                           M/O   Length
1 to X         Alpha Identifier                      O     X bytes
X+1            Length of BCD number/SSC contents     M     1 byte
X+2            TON and NPI                           M     1 byte
X+3 to X+12    Dialling Number/SSC String            M     10 bytes
X+13           Capability/Configuration Identifier   M     1 byte
X+14           Extension1 Record Identifier          M     1 byte
I am not able to work out the coding for "Length of BCD number/SSC contents".
In the spec the coding is said to be according to GSM 04.08, but I am not able to find it there.

There is a good utility class for BCD operations you can test with. Assuming that you are asking how to get the length of the BCD digits of the Abbreviated Dialling Number: ADN numbers can be 3-4 digits; written as BCD they would be 2 bytes long, because each BCD digit is a 4-bit nibble. After the TON/NPI byte you should read N bytes and convert them to a decimal value:
byte[] bcds = DecToBCDArray(211);
System.out.println("BCD is "+ Hex.toHexString(bcds));
System.out.println("BCD length is "+ bcds.length);
System.out.println("To decimal "+ BCDtoString(bcds));

Related

Can I sum integer input from terminal without saving the input as a variable?

I'm trying to write code for the digital root of an extremely big number and can't save it as a variable. Is it possible to do this without one?
What you're looking to do is to repeatedly add the digits of a number until you're left with a single digit number, i.e. given 123456, you want 1 + 2 + 3 + 4 + 5 + 6 = 21 ==> 2 + 1 = 3
For a number with up to 50 million digits, the sum of those digits will be no more than 500 million which is well within the range of a 32-bit int.
Start by reading the large number as a string. Then iterate over each character in the string. For each character, verify that it's a digit character, i.e. between '0' and '9'. Convert that character to the appropriate number, then add that number to the sum.
Once you've done that, you've got the first-level sum stored in an int. Now you can loop through the digits of that number using x % 10 to get the lowest digit and x / 10 to shift over the remaining digits. Once you've exhausted the digits, repeat the process until you're left with a value less than 10.
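A minimal sketch of that approach (class and variable names are just for illustration):

import java.util.Scanner;

public class DigitalRoot {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String number = in.next();                    // the huge number, kept as text

        // First pass: sum the digit characters. 50 million digits * 9 < 500 million,
        // so an int is enough.
        int sum = 0;
        for (int i = 0; i < number.length(); i++) {
            char c = number.charAt(i);
            if (c >= '0' && c <= '9') {
                sum += c - '0';
            }
        }

        // Repeat: sum the digits of the running sum until a single digit remains.
        while (sum >= 10) {
            int next = 0;
            for (int x = sum; x > 0; x /= 10) {
                next += x % 10;
            }
            sum = next;
        }
        System.out.println(sum);                      // e.g. 123456 -> 21 -> 3
    }
}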

Integer compression method

How can I compress a row of integers into something shorter ?
Like:
Input: '1 2 4 5 3 5 2 3 1 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 5 3 5 2 3 1 2 3 4')
Note: the input will only contain numbers between 1 and 5, and the total count of numbers will be 10-16.
Is there any way I can compress it to 3-5 numbers?
Here is one way. First, subtract one from each of your little numbers. For your example input that results in
0 1 3 4 2 4 1 2 0 1 2 3
Now treat that as the base-5 representation of an integer. (You can choose either most significant digit first or last.) Calculate the number in binary that means the same thing. Now you have a single integer that "compressed" your string of little numbers. Since you have shown no code of your own, I'll just stop here. You should be able to implement this easily.
Since you will have at most 16 little numbers, the maximum resulting value from that algorithm will be 5^16 which is 152,587,890,625. This fits into 38 bits. If you need to store smaller numbers than that, convert your resulting value into another, larger number base, such as 2^16 or 2^32. The former would result in 3 numbers, the latter in 2.
#SergGr points out in a comment that this method does not show the number of integers encoded. If that is not stored separately, that can be a problem, since the method does not distinguish between leading zeros and coded zeros. There are several ways to handle that, if you need the number of integers included in the compression. You could require the most significant digit to be 1 (first or last depends on where the most significant number is.) This increases the number of bits by one, so you now may need 39 bits.
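A sketch of that packing in Java, assuming 1-5 input values, least significant base-5 digit first, and the leading-1 trick just described (BigInteger keeps the sentinel bit easy to set):

import java.math.BigInteger;

public class Base5Pack {

    static final BigInteger FIVE = BigInteger.valueOf(5);

    static BigInteger encode(int[] nums) {
        BigInteger value = BigInteger.ZERO;
        BigInteger weight = BigInteger.ONE;            // 5^position
        for (int n : nums) {
            value = value.add(BigInteger.valueOf(n - 1).multiply(weight));
            weight = weight.multiply(FIVE);
        }
        // weight is now 5^k; pad to the width of the largest k-digit base-5
        // number and set one extra sentinel bit just above it.
        int padWidth = weight.subtract(BigInteger.ONE).bitLength();
        return value.setBit(padWidth);
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 4, 5, 3, 5, 2, 3, 1, 2, 3, 4};
        BigInteger encoded = encode(input);
        System.out.println(encoded + " (" + encoded.bitLength() + " bits)");
    }
}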
Here is a toy example of variable-length encoding. Assume we want to encode two strings: 1 2 3 and 1 2 3 0 0. How will the results be different? Let's consider the two base-5 numbers 321 and 00321. They represent the same value, but let's still convert them into base 2, preserving the padding.
1 + 2*5 + 3*5^2 = 86 dec = 1010110 bin
1 + 2*5 + 3*5^2 + 0*5^3 + 0*5^4 = 000001010110 bin
Those additional 0s in the second line are there because the biggest 5-digit base-5 number 44444 has a base-2 representation of 110000110100, so the binary representation of the number is padded to that same 12-bit size.
Note that there is no need to pad the first line because the biggest 3-digit base-5 number 444 has a base-2 representation of 1111100 i.e. of the same length. For an initial string 3 2 1 some padding will be required in this case as well, so padding might be required even if the top digits are not 0.
Now let's add the most significant 1 to the binary representations, and those will be our encoded values:
1 2 3 => 11010110 binary = 214 dec
1 2 3 0 0 => 1000001010110 binary = 4182 dec
There are many ways to decode those values back. One of the simplest (but not the most efficient) is to first calculate the number of base-5 digits by calculating floor(log5(encoded)) and then remove the top bit and fill the digits one by one using mod 5 and divide by 5 operations.
Obviously such variable-length encoding always adds exactly 1 bit of overhead.
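A matching decoder sketch for the toy values above. It works directly on the 0-4 digits used in this example; for the original 1-5 input you would add one back to each digit, mirroring the subtraction in the encoder:

import java.math.BigInteger;
import java.util.Arrays;

public class Base5Decode {

    static final BigInteger FIVE = BigInteger.valueOf(5);

    static int[] decode(BigInteger encoded) {
        // Recover the digit count k = floor(log5(encoded)) without floating point.
        int k = 0;
        BigInteger power = BigInteger.ONE;              // 5^k
        while (power.multiply(FIVE).compareTo(encoded) <= 0) {
            power = power.multiply(FIVE);
            k++;
        }
        int padWidth = power.subtract(BigInteger.ONE).bitLength();
        BigInteger value = encoded.clearBit(padWidth);  // drop the sentinel 1

        int[] digits = new int[k];
        for (int i = 0; i < k; i++) {                   // least significant digit first
            digits[i] = value.mod(FIVE).intValue();     // add 1 here for 1-5 input
            value = value.divide(FIVE);
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decode(BigInteger.valueOf(214))));  // [1, 2, 3]
        System.out.println(Arrays.toString(decode(BigInteger.valueOf(4182)))); // [1, 2, 3, 0, 0]
    }
}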
It's called polidatacompressor.js, but the license will cost you; you have to ask the author about prices LOL
https://github.com/polidatacompressor/polidatacompressor
Ncomp(65535) will output 255, 255, and when you store this in the database as bytes you get 2 chars.
Another way is to use hexadecimal (base 16): in JavaScript (1231).toString(16) gives you '4cf', which in about 60% of situations shortens the string by one character.
Or convert from base 10 to base 62: https://github.com/base62/base62.js/
4131 --> 14D
413131 --> 1Jtp

Can big-endian order be associated with the way Englishmen say the numbers under 100 and little-endian with the way Germans say those numbers?

I was thinking that for me and most people around me big-endian order of the bytes in memory seems the most natural way of arranging numbers.
You start with the most significant bytes, just like you write the numbers down and just like you spell them, e.g. twenty-eight.
The most significant digit is written first, and then you continue to write the next digits from the next most significant to the least significant. This is the same way you say the numbers.
But the German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit.
I think this is a good analogy to endianness.
"I was thinking that for me... Big-endian order of the bytes in memory seems the most natural way of arranging numbers... You start with the most significant bytes, just like you write the numbers down"
Actually all binary data (zero/one bits) is written in MSB format. We always write the value as starting with MSD (Most-Significant Digit) on the left side, just like in real-life.
However, with 8 slots within a byte to fill, we write the value itself starting from the right side and growing upwards by shifting to the left. PS: endianness only applies at the multi-byte level.
In summary: in a single byte (holding a value < 100, like 28 or even 99):
The value 28 is written as 28 (but since it's binary format, it looks like : 11100).
To write the value we start at the right side: x x x 1 1 1 0 0 (where the left-most 1 is the MSD).
So the value itself is written in MSB style, but noted within the byte using LSB style of writing.
There is no concept of endianness within a single-byte value.
Example : Imagine bits were slots for holding 0-9 digits...
We still write 28 as : [0 0 0 0 0 0 2 8] so the twenties part is placed like MSB but the whole value starts from the right as if written in LSB style.
Since a single byte does not have endianness, writing value 28 is never going to look like : [0 0 0 0 0 0 8 2] and never as [2 8 0 0 0 0 0 0] since that would give an incorrect 82 or incorrect 28 million values.
"You start with the most significant bytes, just like you write the numbers down and just like you spell them e.g twenty-eight... But the German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit. I think this is a good analogy to endianness."
Sorry. No it isn't. It stopped being a good analogy as soon as you mentioned that it involves one byte. A verbally spoken eight-twenty phrase could mean a different thing compared to the written decimal value 820.
What about the English eight-ten (aka eight-teen) for value 18? By your logic the Germans also say eight-ten, right? What happens to eight-ten when a machine is told to simply "reverse" the input when converting between English and German style?

How to handle this in huffman coding?

The input characters for compression, with their frequencies, are:
A = 1
B = 2
C = 4
D = 8
E = 16
F = 32
G = 64
H = 128
I = 256
J = 512
K = 1024
L = 2048
M = 4096
N = 8192
The Huffman coding algorithm is:
First we have to pick the two lowest-frequency characters and build a tree, with the parent being the sum of those two characters' frequencies.
After that, put 0 on the left child and 1 on the right child.
Then finally read off the code for each character in binary form: start from the root and check whether the character sits to the left or the right; if it is placed to the left add a 0, if to the right add a 1.
This forms a tree that goes more than 8 levels deep. We can only write the binary code in 8 bits, but for this input the code exceeds 8 bits.
What do we have to do here?
If you encode all 256 possible values, some will be represented by more than 8 bits, that's right. But your encoded string isn't interpreted as an array of bytes, but as a series of bits, which may occupy more than one byte, so it is okay to have branches of your Huffman tree that go deeper than eight levels.
Say you have a Huffman tree that contains these encodings (among others):
E    000         # 3 bits
X    0100000001  # 10 bits
NUL  001         # 3 bits
Now when you want to encode the NUL-terminated string EEXEEEX, you get:
E E X E E E X NUL # original text
000 000 0100000001 000 000 000 0100000001 001 # encoded bits
You now organise this series of bits into blocks of 8, that is bytes:
eeeEEExx xxxxxxxx EEEeeeEE Exxxxxxx xxxNNN # orig
00000001 00000001 00000000 00100000 00100100 # bits
enc[0] enc[1] enc[2] enc[3] enc[4] # bytes
(The grouping into blocks of eight is just for easy reading. The last two zero bits are padding.) The byte array enc is now your encoded string.
The compression comes from the fact that frequently used characters occupy less than a byte. For example the first two Es fit into a single byte. Infrequent characters like X here have a longer encoding, which may even span several bytes.
You must, of course, extract the current bit from the current byte in order to traverse your Huffman tree. You'll need the bitwise operators for that.
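A minimal sketch of that bit-by-bit traversal, using the codes and the five bytes worked out above (the Node type and helper methods are illustrative, not a full Huffman implementation):

public class HuffmanBits {

    static class Node {
        char symbol;               // meaningful only at a leaf
        Node left, right;          // bit 0 goes left, bit 1 goes right
        boolean isLeaf() { return left == null && right == null; }
    }

    // Place a symbol in the tree at the position named by its code ('0'/'1' string).
    static void insert(Node root, String code, char symbol) {
        Node node = root;
        for (char bit : code.toCharArray()) {
            if (bit == '0') {
                if (node.left == null) node.left = new Node();
                node = node.left;
            } else {
                if (node.right == null) node.right = new Node();
                node = node.right;
            }
        }
        node.symbol = symbol;
    }

    // Bit number i of the stream, where bit 0 is the most significant bit of enc[0].
    static int bitAt(byte[] enc, int i) {
        return (enc[i / 8] >> (7 - (i % 8))) & 1;
    }

    static String decode(byte[] enc, int bitCount, Node root) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bitCount; i++) {
            node = (bitAt(enc, i) == 0) ? node.left : node.right;
            if (node.isLeaf()) {
                out.append(node.symbol);
                node = root;       // start over at the root for the next symbol
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, "000", 'E');
        insert(root, "001", '\0');                 // NUL
        insert(root, "0100000001", 'X');

        // The five bytes from above; the trailing two bits are padding, so only
        // the first 38 bits are decoded.
        byte[] enc = {0b00000001, 0b00000001, 0b00000000, 0b00100000, 0b00100100};
        System.out.println(decode(enc, 38, root)); // EEXEEEX followed by NUL
    }
}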

Decode table construction for base64

I am reading this libb64 source code for encoding and decoding base64 data.
I know the encoding procedure, but I can't figure out how the following decoding table is constructed for fast lookup when decoding base64-encoded characters. This is the table they are using:
static const char decoding[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
Can someone explain to me how the values in this table are used for decoding?
It's a shifted and limited ASCII translating table. The keys of the table are ASCII values, the values are base64 decoded values. The table is shifted such that the index 0 actually maps to the ASCII character + and any further indices map the ASCII values after +. The first entry in the table, the ASCII character +, is mapped to the base64 value 62. Then three characters are ignored (ASCII ,-.) and the next character is mapped to the base64 value 63. That next character is ASCII /.
The rest will become obvious if you look at that table and the ASCII table.
Its usage is something like this:
int decode_base64(char ch) {
    if (ch < '+' || ch > 'z') {
        return SOME_INVALID_CH_ERROR;
    }
    /* shift range into decoding table range */
    ch -= '+';
    int base64_val = decoding[ch];
    if (base64_val < 0) {
        return SOME_INVALID_CH_ERROR;
    }
    return base64_val;
}
As we know, each byte has 8 bits, giving 256 possible combinations with 2 symbols (base 2).
With 2 symbols you need to spend 8 chars to represent a byte, for example '01010011'.
With base 64 it is possible to represent 64 combinations with 1 char...
So, we have a base table:
A = 000000
B = 000001
C = 000010
...
If you have the word 'Man', you have the bytes:
01001101, 01100001, 01101110
and so the stream:
010011010110000101101110
Break it into groups of six bits: 010011 010110 000101 101110
010011 = T
010110 = W
000101 = F
101110 = u
So, 'Man' => base64 coded = 'TWFu'.
As seen, this works perfectly for streams whose length is a multiple of 6.
If you have a stream whose length isn't a multiple of 6, for example 'Ma', you have the stream:
010011 010110 0001
you need to pad it to complete groups of 6:
010011 010110 000100
so you have the base 64 coding:
010011 = T
010110 = W
000100 = E
So, 'Ma' => 'TWE'
After decoding the stream, in this case you need to truncate the length down to the last multiple of 8 and remove the extra bits to obtain the original stream:
T = 010011
W = 010110
E = 000100
1) 010011 010110 000100
2) 01001101 01100001 00
3) 01001101 01100001 = 'Ma'
In reality, when we add the trailing 00s, we mark the end of the Base64 string with one '=' for each trailing '00' added ('Ma' ==> Base64 'TWE=').
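For a quick check of these two worked examples with the standard library (not part of the original answers):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Check {
    public static void main(String[] args) {
        Base64.Encoder enc = Base64.getEncoder();
        // "Man" -> "TWFu" (length is a multiple of 3, so no padding)
        System.out.println(enc.encodeToString("Man".getBytes(StandardCharsets.US_ASCII)));
        // "Ma" -> "TWE=" (two zero bits of padding, marked by one '=')
        System.out.println(enc.encodeToString("Ma".getBytes(StandardCharsets.US_ASCII)));
    }
}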
See also the link: http://www.base64decode.org/
Representing images in base 64 is a good option in many applications where it is hard to work directly with a real binary stream. A real binary stream is more compact because it is base 256, but it is difficult to embed inside HTML, for example; so the trade-off is between less traffic and easier handling as strings.
See the ASCII codes too: the chars of base 64 are in the range '+' to 'z' of the ASCII table, but there are some values between '+' and 'z' that aren't base 64 symbols.
'+' = ASCII DEC 43
...
'z' = ASCII DEC 122
from DEC 43 to 122 there are 80 values, but:
43 OK = '+'
44 isn't a base 64 symbol, so its decoding value is -1 (invalid symbol for base64)
45 ...
46 ...
...
122 OK = 'z'
So the char to decode is decremented by 43 ('+') so that '+' lands at index 0 of the array, giving quick access by index: decoding[80] = {62, -1, -1, ..., 49, 50, 51};
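The table can be generated mechanically from the encoding alphabet. Here is a sketch of that construction in Java (the -2 at the '=' position is presumably libb64's marker for the padding character and is added by hand):

import java.util.Arrays;

public class BuildDecodingTable {
    public static void main(String[] args) {
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

        int[] decoding = new int['z' - '+' + 1];      // 80 entries, '+' .. 'z'
        Arrays.fill(decoding, -1);                    // -1 = not a base64 symbol
        decoding['=' - '+'] = -2;                     // padding character

        // For every base64 value 0..63, store it at the shifted ASCII index.
        for (int value = 0; value < alphabet.length(); value++) {
            decoding[alphabet.charAt(value) - '+'] = value;
        }

        // Prints the same values as the table in the question:
        // 62, -1, -1, -1, 63, 52, 53, ... 49, 50, 51
        System.out.println(Arrays.toString(decoding));
    }
}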
Considering these 2 mapping tables:
static const char decodingTab[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
static unsigned char encodingTab[64]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
decodingTab is the reverse mapping table of encodingTab.
For a valid base64 character, decodingTab[c - '+'] will never be -1. Only 64 meaningful values are expected, but decodingTab has 80 entries because it is indexed by the shifted ASCII code, so the indices that do not correspond to base64 characters are set to -1 (an arbitrary value which is not in [0,63]).
char c;          /* a base64 character, '+' <= c <= 'z' */
unsigned char i; /* a 6-bit value, 0 <= i <= 63 */
...
/* the two tables are inverses of each other (modulo the '+' shift): */
encodingTab[decodingTab[c - '+']] == c
decodingTab[encodingTab[i] - '+'] == i
Hope it helps.
