how to use arm neon vbit intrinsics? - arm

I don't understand how I differentiate between vbit, vbsl and vbif with neon intrinsics. I need to do the vbit operation but if I use the vbslq instruction from the intrinsics I don't get what I want.
For example I have a source vector like this:
uint8x16_t source = 39 62 9b 52 34 5b 47 48 47 35 0 0 0 0 0 0
The destination vector is:
uint8x16_t destination = 0 0 0 0 0 0 0 0 0 0 0 0 c3 c8 c5 d5
I would like to have as an output this:
39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5
meaning that I want to copy the first ten bytes from the source and leave the other 6 unchanged.
I'm using this mask:
{0,0,0,0,0,0,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};
What is the correct way to use the vbslq_u8?

The ARM documentation is not very clear, but it looks like you would need to use the intrinsic like this:
uint8x16_t src = {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x0};
uint8x16_t dest = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
uint8x16_t mask = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};
dest = vbslq_u8(mask, src, dest);
Note that order of bytes in the mask needs to correspond with the order in the source/dest registers (they seem to be swapped in your question ?).
Also note that the first param to the intrinsic appears to be the selection mask, where a 1 bit selects the corresponding bit from the second param and a 0 bit selects the corresponding bit from the third param.

Related

Converting a hexadecimal array into Binary and then into Decimal

I have a sample structure which has two sets of data. The first data contains the following Hex array '00 7F 3F FF 08 FF 60 26' and then when I convert it into binary and then decimal I get a correct answer which is '0 127 63 255 8 255 96 38'.
However, I have some data arrays which are not exactly arranged as the first one, they look something like this '1 40 0 F 00 40 00 47' and when I try to convert these kind of data sets the result is inaccurate. I get something like this '64 0 64 0 71' while the expected result is '1 64 0 15 0 64 0 71'.
This is my code with a sample data:
%% Structure
a(1).Id = 118;
a(1).Data = '00 7F 3F FF 08 FF 60 26';
a(2).Id = 108;
a(2).Data = '1 40 0 F 00 40 00 47';
%% Hexadecimal (Data) --> Binary --> Decimal
Data = a(2).Data;
str = regexp(Data,' ','split');
Ind = cellfun(#length,str);
str = str(Ind==2);
%Hex to Binary
binary = hexToBinaryVector(str,8,'MSBFirst');
%Binary to Decimal
Decimal = bi2de(binary,'left-msb');
Any help will be really appreciated!
adding 2 lines should do the trick:
str = regexp(Data,' ','split');
Ind = cellfun(#length,str);
str(Ind==1) = strcat('0',str(Ind==1) );
Ind = cellfun(#length,str);
str = str(Ind==2);
All it is doing is when it sees a String (your Hex) that is 1 Char, it puts a 0 infront of it, so correct it into its correct format. you can actually do this in the cellfun.

Array contents display in pairs

I have an array for example: A=[01 255 03 122 85 107]; and I want to print the contents as
A=
FF 01
7A 03
6B 55
Basically a read out from a memory. Is there any function in MatLab lib? I need to do this with minimum use of loops.
Use this -
str2num(num2str(fliplr(reshape(A,2,[])'),'%1d'))
Output -
ans =
21
43
65
87
If you only want to print it as characters, use it without str2num, like this -
num2str(fliplr(reshape(A,2,[])'),'%1d')
Output -
ans =
21
43
65
87
General case with zeros padding -
A=[1 2 3 4 5 6 7 8 9 3] %// Input array
N = 3; %// groupings, i.e. 2 for pairs and so on
A = [A zeros(1,N-mod(numel(A),N))]; %// pad with zeros
out = str2num(num2str(fliplr(reshape(A,N,[])'),'%1d'))
Output -
out =
321
654
987
3
Edit for hex numbers :
Ar = A(flipud(reshape(1:numel(A),2,[])))
out1 = reshape(cellstr(dec2hex(Ar))',2,[])'
out2 = [char(out1(:,1)) repmat(' ',[numel(A)/2 1]) char(out1(:,2))]
Output -
out1 =
'FF' '01'
'7A' '03'
'6B' '55'
out2 =
FF 01
7A 03
6B 55

Detecting I-frame data in an MPEG-4 transport stream

I am testing a project. I need to break the payload data(making zero some bytes) of the MPEG-4 ts packets by a percentage coming from the user. I am doing it by reading the ".ts" file packet by packet(188 bytes). But the video is changing to really mud after process. (By the way I'm writing the program in C)
So I decided to find the data/packets that belongs to I-frames, then not touching them but scrambling the other datas by percentage. I could find below
(in hex)
00 00 00 01 E0 start of video PES packet
..
..
00 00 01 B8 start of group of pictures header
..
..
00 00 01 00 the picture start code. This is 32 bits. The 10 bits immediately following this is called as the temporal reference. So temporal reference will include the byte following the picture start code and the first two bits of the second byte after the picture start code ie one byte(8 bits) + 2 bits. These we need to skip. Now the three bits present(3, 4 and 5th bits of the second byte from the picture start code) will indicate the Frame type ie I, B or P. So to get this simply logical AND & the second byte from the picture start code with 0x38 and right shift >> with 3.
For example the data is like that;
00 00 01 00 00 0F FF F8 00 00 01 B5........... and so on.
Here the first four bytes 00 00 01 00 is the picture start code.
The fifth byte and the first two bits of the sixth byte is the temporal reference.
So our concern is in the sixth byte --> 0F
((0F & 38)>>3)
Frame type = 1 ==> I Frame
Frame type 000 forbidden
Frame type 001 intra-coded (I) - iframe
Frame type 010 predictive-coded (P) - p frame
Frame type 011 bidirectionally-predictive-coded (B) - b frame
But this is for MPEG-2. Is there some patterns like that so I recognize and get the frame type with bitwise operations for MPEG-4 transport stream(extension is ".ts")?
And I need to get how many bytes or packets belong to that frame?
Thanks a lot for your help
I would parse the complete TS packet. So first determine what PID your video stream belongs to (by parsing the PAT and PMT). Then find keyframes by looking for the 'Random Access indicator' bit in the Adaptation Field.
uint8_t *pkt = <your 188 byte TS packet>;
assert( 0x47 == pkt[0] );
int16_t pid = ( ( pkt[1] & 0x1F) << 8 ) | pkt[2];
if ( pid == video_pid ) {
// found video stream
if( ( pkt[3] & 0x20 ) && ( pkt[4] > 0 ) ) {
// have AF
if ( pkt[5] & 0x40 ) {
// found keyframe
} } }
If you are using H.264 there should be specific byte stream for I and P frame ..
Like 0x0000000165 for I frame and 0x00000001XX for P frame ..
So just parse and look for continuous such byte stream in such a way you can identify I or P frame..
Again above byte stream is codec implementation dependent ..
For more information you can look into FFMPEG..

XBee packet format

I have to IEEE 802.15.4 devices running. The question is about XBee-PRO.
Firmware: XBEE PRO 802.15.4 (Version: 10e6)
Hardware: XBEE (Version: 1744)
Both units are configured to the same channel (15) and same PAN id (0x1234). It's hooked to my machines COM port and can actually transmit data when I connect picocom to it. (It responds to AT commands properly and can be configured fully through moltosenso Network Manager - I'm on a Mac). All other registers are at their defaults, apart from the serial baudrate.
The XBee side source address is at 0x1, destination address is 0x2. Now when I type an ASCII character into picocom, this is what I see received on the other device, running in promiscous mode:
-- Typing "a"
E 61 88 7E 34 12 2 0 1 0 2B 0 61 E1
E 61 88 7E 34 12 2 0 1 0 2B 0 61 E1
E 61 88 7E 34 12 2 0 1 0 2B 0 61 E1
E 61 88 7E 34 12 2 0 1 0 2B 0 61 E1
-- Typing "b"
E 61 88 7F 34 12 2 0 1 0 2C 0 62 58
E 61 88 7F 34 12 2 0 1 0 2C 0 62 58
E 61 88 7F 34 12 2 0 1 0 2C 0 62 58
E 61 88 7F 34 12 2 0 1 0 2C 0 62 58
--- Typing "a" again
E 61 88 80 34 12 2 0 1 0 2D 0 61 A9
E 61 88 80 34 12 2 0 1 0 2D 0 61 A9
...
ln pc pan da sa ct pl ck
So for every data payload sent, I see four frames sent out (nobody is picking them up of course). I suppose three of these are 802.15.4 retries, and XBee adds another one for kicks (although the RR register is clearly zero...).
What's the packet format here and where is this specified?
I've looked at XBee API packets and this does look vaguely similar, but I don't see 0x7e delimiters or anything like that here.
I guess what I am seeing is:
ln = length
61 = ??
88 = ??
pc = some sort of packet counter
pan = 16 bits of PAN ID
da = 16 bits of destination address
sa = 16 bits of source address
ct = another counter?
0 = ??
pl = my ASCII character payload
ck = probably a checksum
I tried with setting PAN to 0xFFFF and setting the destination address to 0xFF or broadcast, seeing pretty much the same. These 0x61 and 0x88 don't seem to correspond to much anything in the XBee documentation...
It doesn't directly look like 802.15.4 MAC level data frame either - or if it does, what are the missing fields and where are they specified? Pointers?
EDIT:
Actually, hmm. After importing a hex-formatted dump into Wireshark, it told me exactly that it's a 802.15.4 MAC frame and how to read it.
IEEE 802.15.4 Data, Dst: 0x0002, Src: 0x0001, Bad FCS
Frame Control Field: Data (0x8861)
.... .... .... .001 = Frame Type: Data (0x0001)
.... .... .... 0... = Security Enabled: False
.... .... ...0 .... = Frame Pending: False
.... .... ..1. .... = Acknowledge Request: True
.... .... .1.. .... = Intra-PAN: True
.... 10.. .... .... = Destination Addressing Mode: Short/16-bit (0x0002)
..00 .... .... .... = Frame Version: 0
10.. .... .... .... = Source Addressing Mode: Short/16-bit (0x0002)
Sequence Number: 126
Destination PAN: 0x1234
Destination: 0x0002
Source: 0x0001
I still don't know where the second 16-bit counter comes from in front of the actual data byte, and why FCS is messed up (I had to strip the beginning len field to get Wireshark to read it - that's probably it.)
I think the second counter ct is a counter for the application layer in Zigbee protocol to notice when it should update its data because it is receiving a new one :)
For more information about Frames Format in Zigbee Stack try to download this :
Newnes.ZigBee.Wireless.Networks.and.Transceivers.Sep.2008.eBook-DDU.pdf
Have a nice day :)
Have you try to read packets with X-CTU software?
I suggest you to read this post entry: http://www.tunnelsup.com/xbee-guide/
The pdf with the "Quick Reference Guide" is really useful and contains some data format indicated.
Also, it's always good to study the real documentation from developer (Digi in this case).
The frame is like:
API Frame
But only if you have configured previously the xbee to work in API mode with command:
ATAP 1
Or with XCTU.
Try monitoring communication between two XBee modules to see what the acknowledgement frame looks like.
Try sending a sequence of bytes.
Try performing a Node Discovery (ATND) to see what those frames look like.
Try sending a remote AT command from X-CTU to see what those frames and responses look like.
When reverse engineering a protocol, it's useful to see both sides of the conversation. You can test various theories by emulating each side of the protocol, and trying out variations on what you see. For example, "What if I change this byte, does the remote end still respond?".
My guess is that you're correct about the ct byte being a counter. The following zero byte could be flags, or it could identify the type of packet sent (serial data, remote AT command/response, node discovery/response, etc.).
As you build up an understanding of the structure, you can write a program to parse and dump the contents of the frames. Dump an interpreted version of what you know, and leave the unknown bytes as a sequence of hex bytes. Continue to experiment until you can narrow down the meaning of the remaining bytes.
The extra 2 bytes in payload (0x2D 0x0) is MaxStream header (MM in XCTU). If you disable the MaxStream headers by setting the MM command to without MaxStream headers, then these two bytes will become a part of a 802.15.4 payload, so your full payload would become 2B 0 61 instead of just 61

What's the proper implementation for hardware emulation?

I'm going to programme a Game Boy emulator (Z80 is the CPU in case somebody is not familiar with it), and while I was doing my research, I've found some things I'm not so sure about.
The first one was that C is the programming language to choose here. That's not so much of a problem, but I'd like to hear your opinion from today's point of view. Even C++ was not recommended.
The second thing I found out was that everybody was using one function per opcode. That seems logical since it's just one function call and probably better optimised than having one function for the "ADD" instruction and then you've got to find out what registers are used here. But how necessary is that today? Is it something I should stick to or should I rather rewrite my emulator if I notice that another way which might be more convenient just doesn't cut it (more or less modern gaming consoles pop into my mind right now)?
Also, it's kind of demotivating to write a function for "add that register to this register" over and over again. Is there a way to automate that from an opcode map or something like that?
I mostly agree with WingsOfIcarus. I wrote a few emulators already so here is my insight:
The use of function pointers is a good idea (for speed and clarity of code)
OOP is not a problem
Yes, member calls are a little bit slower, but if you are careful it will not affect performance too much. On the other hand, OOP emulation code is much better to manage/read/understand.
Use an instruction database instead of fixed instruction decoding.
I am using a single text file which consist of all the necessary information for all instructions. The emulator parses it during initialization (feeds the arrays of function pointers and operands...). In this architecture it is very easy to correct errors in the instruction set without any code change.
Complex instruction sets documentation are almost always faulty to some point. The worst case is Z80 (I have never see a 100% error-free instruction set). So use more instruction sets, compare them and create an error-free set (if you can).
Add sound, video, keyboard and mouse to your emulation
This is usually not a problem. On Windows use WaveOut instead of DirectSound. It's more stable, much faster (usable latencies of DSound are sometimes even > 400 ms). With WaveOut I was able to lover latency to 20-80 ms which is OK.
Apply limit speed by T cycles of emulated CPU per second
I am using machine cycle correct timings which is much slower, but allows me to correctly implement any hardware periphery emulations as (FDC, DMAC, sound chips,...without any hacks)
Apply load/save of files for the emulated platform
For example, this is part of my instruction set (which is directly fed to CPU emulation:
opc T0 T1 MC1 MC2 MC3 MC4 MC5 MC6 MC7 mnemonic
B8 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,B
B9 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,C
BA 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,D
BB 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,E
BC 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,H
BD 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,L
BE 07 00 M1R 4 MRD 3 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,(HL)
BF 04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,A
C0 11 05 M1R 5 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET NZ
C1 10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 POP BC
C2L2H2 10 10 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP NZ,U16
C3L1H1 10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP U16
C4L2H2 17 10 M1R 4 MRD 3 MRD 4 MWR 3 MWR 3 ... 0 ... 0 CALL NZ,U16
C5 11 00 M1R 5 MWR 3 MWR 3 ... 0 ... 0 ... 0 ... 0 PUSH BC
C6U2 07 00 M1R 4 MRD 3 ... 0 ... 0 ... 0 ... 0 ... 0 ADD A,U8
C7 11 00 M1R 5 MWR 3 MWR 3 ... 0 ... 0 ... 0 ... 0 RST 00H
C8 11 05 M1R 5 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET Z
C9 10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET
CAL2H2 10 10 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP Z,U16
opc: operation code [hex]
L1,H1,U1,S1 means first operand direct number or address
L2,H2,U2,S2 means second operand direct number or address
L3,H3,U3,S3 means third operand direct number or address
H,L ... U16 high and low byte
U ... U8 unsigned byte
S ... S8 signed byte
T0 normal instruction duration [T] always 2 decimal digits
T1 instruction duration if condition not met [T] always 2 decimal digits
MC1++ Machine cycle first is type,second is duration [T] always 1 decimal digit
... unused
M1R M1 cycle
MRD memory read
MWR memory write
IOR IO read
IOW IO write
NON no external operation (internal computation)
INT interrupt cycle
mnem instruction text (mnemonic)
opc is used for the address in an array of pointers
mnemonic is used to select the proper function pointer, and operands type
T0 and T1 are used for instructions timing (this is enough for rough emulations)
MC1++ are used for correct MC timings (to implement correct hardware emulation and contentions timing)
Here is my Zilog Z80A complete instruction set with machine cycle timing link for download. Feel free to use (just mention my nick somewhere). After porting to this I was finally able to 100% pass the ZEXALL test. For more info see Writing a graphical Z80 emulator in C or C++.
First suggestion, you shouldn't use nested switch statements, you should rather use array of function pointers, alot faster -> better emulation, and nicer code, nested switch-es can also get a bit messy, here are some links where you can read more about these arrays http://www.newty.de/fpt/fpt.html
http://www.multigesture.net/wp-content/uploads/mirror/zenogais/FunctionPointers.htm
Second suggestion, Yes you can do it in C#, Java, C++, but since you want every single bit of your CPU cycles so you can get as close emulation as possible - emulating one CPU cycle of target architecture with least number of CPU cycles on curret architecture, and OOP isn't so good in this case from what I heard/read from people. One of the things is performance, and second is pretty much obvious, emulation is, as you probably noticed, really complex task and wraping it in OOP can be unnecessary pain in the neck.
Here's a pretty cool implementation of working with some opcodes for an NES emulator:
http://bisqwit.iki.fi/jutut/kuvat/programming_examples/nesemu1/
Here's the accompanying youtube videos that have a little more explanation as to what's going on
http://www.youtube.com/watch?v=y71lli8MS8s
It uses C++ templates and some additional C++11 features. As to whether you choose C++ or C that is up to you but it shouldn't really matter a whole lot. If you're just emulating a gameboy I doubt that speed is going to be an issue on modern processors so try to just use whatever you're comfortable with.

Resources