Disassembly of old Turbo Pascal (V3) code - how to create data segment in IDA - disassembly

I would like to disassemble the final version of a self-written Turbo Pascal V3 program, i.e. a simple .COM file, and to that end I've dug out my old (AD 2004) registered copy of IDA Pro (V4.7.0.831). Not having used it for more than 10 years, and no longer having access to their forum, I'm now stuck. The .COM file loads and IDA happily disassembles it, but it creates just one single segment, and I no longer have a clue how to create the data segment. There's a bit of info in the TP3 manual, and using David Lindauer's GRDB in DOSBox-X allows me to single-step through the RTL initialisation code, which shows me that it sets up DS and SS, but that doesn't help me set up these segments in IDA.
I've tried the "Create Segment" option, but I'm lost when entering the required values for start address, end address and base; "class" is probably "DATA". The ones for the single "seg000" that IDA creates are CODE, start # 0x0100, end # 0xD623, which leads me to assume that a to-be-created "seg001" should start at 0x0000, end at 0xffff, and have a base of 0xd63 (paragraphs), but that results in a "Bad segment base: segment would have bytes with a negative offset" pop-up.
Trying start # 0xd630, end # 0x1d630, with a base 0x0000 creates a segment, but it looks like
seg000:D622
seg001:C8C00 ; ---------------------------------------------------------------------------
seg001:C8C00
seg001:C8C00 ; Segment type: Regular
seg001:C8C00 seg001 segment byte public '' use16
seg001:C8C00 assume cs:seg001
seg001:C8C00 ;org 0C8C00h
seg001:C8C00 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, gs:nothing
Which may be correct, but the "org 0c8c00" makes absolutely no sense to me.
If you can help me I would be grateful, and to help you, I've uploaded a RAR archive with the full sources, the resulting "lift.com" executable and the input file to my Google drive # https://drive.google.com/drive/folders/0B0oygbfs7DsVVWNBZWpqaHRHX3c?usp=sharing, look for lift16bit.rar Please note that the code will not compile with anything more advanced than Turbo Pascal 3, and in my case it was compiled with TP 3.01a.

The following IDA snippet of IDC code will set up the segment registers for programs compiled with Turbo Pascal V3.01a:
//-------------------------------------------------------------------
// This code sets-up the Turbo Pascal segment registers
//-------------------------------------------------------------------
auto _rds;
auto _lds;
_rds = word(word(0x101) + 0x103 + 9);
_lds = word(word(0x101) + 0x103 + 11);
add_segm_ex(0X100, _rds * 16, 0, 0, 1, 2, ADDSEG_NOSREG);
SegRename(0X100, "cseg");
SegClass (0X100, "CODE");
SegDefReg(0x100, "ds", _rds);
SegDefReg(0x100, "es", 0xFFFF);
SegDefReg(0x100, "ss", 0xFFFF);
SegDefReg(0x100, "fs", 0xFFFF);
SegDefReg(0x100, "gs", 0xFFFF);
set_segm_type(0X100, 2);
add_segm_ex(_rds * 16, (_rds + _lds) * 16, _rds, 0, 3, 2, ADDSEG_NOSREG);
SegRename(_rds * 16, "dseg");
SegClass (_rds * 16, "DATA");
SegDefReg(_rds * 16, "ds", _rds);
SegDefReg(_rds * 16, "es", 0xFFFF);
SegDefReg(_rds * 16, "ss", 0xFFFF);
SegDefReg(_rds * 16, "fs", 0xFFFF);
SegDefReg(_rds * 16, "gs", 0xFFFF);
set_segm_type(_rds * 16, 3);
set_inf_attr(INF_LOW_OFF, 0xffff);
set_inf_attr(INF_HIGH_OFF, 0xffff);
It will (quite likely) work for other versions of TP3, and maybe even for TP1/2, but no guarantees!
The IDC code relies on the disassembled TP3 code from http://www.pcengines.ch/tp3.htm and single-stepping through it using a debugger, I use David Lindauer's GRDB # https://ladsoft.tripod.com/grdb_debugger.html
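The `_rds * 16` terms in the script convert a segment value in paragraphs into a linear address, which is the standard 16-bit real-mode arithmetic. A minimal C sketch of that arithmetic follows; the function names are made up for illustration and are not part of IDA or IDC. It also shows why a segment with base 0xD63 that starts at linear address 0, as tried in the question, would contain bytes with a negative offset.

```c
#include <stdint.h>

/* Linear address of segment:offset in 16-bit real mode
   (one paragraph = 16 bytes). */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Offset of a linear address inside a segment whose base is given in
   paragraphs. The result is negative when the address lies below the
   start of the segment. Hypothetical helper, for illustration only. */
int32_t offset_in_segment(uint32_t linear, uint16_t base_paragraph)
{
    return (int32_t)linear - (int32_t)((uint32_t)base_paragraph << 4);
}
```

With the values from the question, `linear_address(0x0D63, 0)` is 0xD630, and `offset_in_segment(0, 0x0D63)` is negative, so a data segment with that base has to start at linear address 0xD630 or above.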


c - Are defined values slower than hard-coded numbers

this question might appear dumb to you but I couldn't find an answer to it and I want to be sure that it works as I think.
Recently I came across this code:
void RDP_G_SETBLENDCOLOR(void)
{
Gfx.BlendColor.R = _SHIFTR(w1, 24, 8) * 0.0039215689f;
Gfx.BlendColor.G = _SHIFTR(w1, 16, 8) * 0.0039215689f;
Gfx.BlendColor.B = _SHIFTR(w1, 8, 8) * 0.0039215689f;
Gfx.BlendColor.A = _SHIFTR(w1, 0, 8) * 0.0039215689f;
if(OpenGL.Ext_FragmentProgram && (System.Options & BRDP_COMBINER)) {
glProgramEnvParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 2, Gfx.BlendColor.R, Gfx.BlendColor.G, Gfx.BlendColor.B, Gfx.BlendColor.A);
}
}
I understand that the 0.0039215689f (which refers to 1/255) is hard-coded for optimization reasons.
Now imagine that I want to define it
for readability reasons (even if the name chosen here is not better, it's just for the example).
#define PIXEL_VALUE 0.0039215689f
void RDP_G_SETBLENDCOLOR(void)
{
Gfx.BlendColor.R = _SHIFTR(w1, 24, 8) * PIXEL_VALUE;
Gfx.BlendColor.G = _SHIFTR(w1, 16, 8) * PIXEL_VALUE;
Gfx.BlendColor.B = _SHIFTR(w1, 8, 8) * PIXEL_VALUE;
Gfx.BlendColor.A = _SHIFTR(w1, 0, 8) * PIXEL_VALUE;
if(OpenGL.Ext_FragmentProgram && (System.Options & BRDP_COMBINER)) {
glProgramEnvParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 2, Gfx.BlendColor.R, Gfx.BlendColor.G, Gfx.BlendColor.B, Gfx.BlendColor.A);
}
}
Would this define make the code execution slower?
No, since these two code snippets are identical, because MACROS are expanded before a translation unit is compiled.
Macros do text replacement. The code that gets compiled is exactly the same as if you copied and pasted the replacement text of the macro in your code.
I believe they make no difference at all.
A macro is a pattern of text replacement. So it gets replaced before your code is compiled.
You can try preprocessing both files and see the difference in a terminal:
gcc -E 1.c -o 1.i
gcc -E 2.c -o 2.i
diff -u 1.i 2.i

Correct way to unpack a 32 bit vector in Perl to read a uint32 written in C

I am parsing a Photoshop raw, 16 bit/channel, RGB file in C and trying to keep a log of exceptional data points. I need a very fast C analysis of up to 36 MPix images with 16 bit quanta or 216 MB Photoshop .RAW files.
<1% of the points have weird skin tones and I want to graph them with PerlMagick or Perl GD to see where they are coming from.
The first 4 bytes of the C data file contain the unsigned image width as a uint32_t. In Perl, I read the whole file in binary mode and extract the first 32 bits:
Xres=1779105792l = 0x6a0b0000
It looks a lot like the C log file:
DA: Color anomalies=14177=0.229%:
DA: II=1) raw PIDX=0x10000b25, XCols=[0]=0x00000b6a
Dec(0x00000b6a) = 2922, the Exact X_Columns_Width of a small test file.
Clearly a case of intel's 1972 8008 NUXI architecture. How hard could it possibly be to translate 0x6a0b0000 to 0x00000b6a; swap 2 bytes and 2 nibbles and you're done. Slicing the 8 characters and rearranging them could be done but that is the kind of ugly hack I am trying to avoid.
Grab the same 32 bit vector from file offset zero and unpack it as "VAX" unsigned long.
$xres = vec($bdat, 0, 32); # vec EXPR,OFFSET,BITS
$vul = unpack("V", vec($bdat, 0, 32));
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731
Every single hex character is mangled. Obviously wrong Endian, it is not VAX. The "Other" one is Network Big-endian
http://perldoc.perl.org/functions/pack.html
N An unsigned long (32-bit) in "network" (big-endian) order.
V An unsigned long (32-bit) in "VAX" (little-endian) order.
$nul = unpack("N", vec($bdat, 0, 32)); # Network Unsigned Long 32b
printf("Xres=0x%08x, NET ulong=%ul=0x%08x\n", $xres, $nul, $nul);
Xres=0x6a0b0000, NET ulong=825702201l=0x31373739
The $XRES still shows the right hex in the wrong order. The "NETWORK" long 32 bit uint extracted from the same bits is unrecognizable. Try Binary
$bits = unpack("b*", vec($bdat, 0, 32));
printf("bits=$bits, len=%d\n", length $bits);
bits=10001100111011001110110010011100100011000000110010101100111011001001110001001100, len=80
I clearly asked for 32 bits and got 80 bits. What gives?
Try for 4, unsigned, 8bit bytes which can NOT be swapped:
for($ii = 0; $ii < 4; $ii++) {
$bit_off=$ii*8; # Bit offset
$uc = unpack("C", vec($bdat, $bit_off, 8)); # C An unsigned char
printf("II $ii, bo $bit_off, d=%d, u=%u, x=0x%x\n",
$uc,$uc, $uc);
}
II 0, bo 0, d=49, u=49, x=0x31
II 1, bo 8, d=51, u=51, x=0x33
II 2, bo 16, d=49, u=49, x=0x31
II 3, bo 24, d=49, u=49, x=0x31
I am looking for hex 0, 6, a or b. There are no "3"s or "1"s in the right answer. Try pirating from a C file:
http://cpansearch.perl.org/src/MHX/Convert-Binary-C-0.76/tests/include/include/bits/byteswap.h
$x = $xres;
$x= (((($x) & 0xff000000) >> 24) | ((($x) & 0x00ff0000) >> 8) | ((($x) & 0x0000ff00) << 8) | ((($x) & 0x000000ff) << 24));
printf("\$xres=0x%08x -> \$x=0x%08x = %u\n", $xres, $x, $x);
$xres=0x6a0b0000 -> $x=0x00000b6a = 2922
It WORKS! But, this is uglier than converting the original, wrong order hex number to a string to untangle it:
$stupid_str = sprintf("%08x", $xres);
$stupid_num = join('', reverse ($stupid_str =~ m/../g));
printf("Stupid_num '%s'->0x%08x=%d\n", $stupid_num, $dec=hex $stupid_num, $dec);
Stupid_num '00000b6a'->0x00000b6a=2922
It's like judging the Ugliest Dog contest, but I would still rather have to maintain the text version than the even more abominable C version.
I know there are ways to do this in Java/Python/Go/Ruby/.....
I know there are command line utilities that do exactly this.
I must figure out how I am misusing either VEC or Unpack, both of which I have used a zillion times. It is the Brain Teasing aspect which is driving me nuts! EndianNess == EndianMess!!!
TYVM!
=================================================
Borodin,
Thanks for lookin' at this.
My intel processor is little-endian. When I read it back, it was trans-mutilated by vec to the "correct" big-endian, network format.
I just tried reading it VERBATIM from a BINARY file read and it works fine:
($b4 = $bdat) =~ s/^(....).*$/$1/msg; # Give me my 4 bytes back without mutilation!
printf("B4='%s'=>0x%08x=<0x%08x\n", $b4, unpack("L>", $b4), unpack("L<", $b4));
B4='j...' = >0x6a0b0000 = <0x00000b6a <<< THE RIGHT ANSWER!!!
If you try unpack 'V', $bdat then you will find that it works
That was my first attempt:
$vul = unpack("V", vec($bdat, 0, 32)); # UNPACK V!
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731 <<<< TOTALLY WRONG!
I had already verified that the $BDAT info was the right data in the wrong format. It just needed some rearrangement.
I just used vec() to generate 1 bit and 4 bit graphics files and it worked faithfully, returning the exact bits I wrote. It must have mistaken my Intel i7 for my IBM System/370. I7/37??? Easy mistake to make. :)
I read the [confusing] part about "converted to a number as with pack ...". That's why my number was backward. The >>unpack("V", vec($bdat"<< ... was my ill-fated attempt to byte-swap the backward number in $BDAT from the WRONG VEC()-preferred FORMAT to the native format supported by my architecture.
Now I understand why I saw so many examples of people extracting by the byte, to avoid Big Brother's helping hand!
Data::BitStream::Vec "uses a Perl vec to store the data. The vector is accessed in 1-bit units"
Thanks 1E6,
B
You are confusing things by combining vec with unpack
The correct way is simply
unpack 'V', $bdat
which returns a value of 0x00000B6A as you expect
vec($bdat, 0, 32) is equivalent to unpack 'N', $bdat as you can see from the value of $xres in your first code block, and the documentation for vec confirms this with
If BITS is 16 or more, bytes of the input string are grouped into chunks of size BITS/8, and each group is converted to a number as with pack()/unpack() with big-endian formats n/N
The line
$vul = unpack("V", vec($bdat, 0, 32))
is very wrong, because the decimal value of vec($bdat, 0, 32) is 1779105792, so you are then calling unpack on the string "1779105792" which doesn't do anything useful at all
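For comparison, here is a minimal C sketch of what the two unpack formats do with the first four bytes of the file. The function names are illustrative only. Little-endian assembly corresponds to 'V' and big-endian assembly to 'N' (which is also what vec with BITS=32 returns).

```c
#include <stdint.h>

/* Assemble four bytes in little-endian order, like Perl's unpack 'V'. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Assemble four bytes in big-endian order, like Perl's unpack 'N'. */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For the header bytes 6a 0b 00 00 from the question, `read_u32_le` gives 0x00000b6a (2922) and `read_u32_be` gives 0x6a0b0000 (1779105792), independent of the byte order of the machine running the code.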

Emulation Implementing CPU instructions?

I'm trying to learn emulation programming. I've done a CHIP-8 emulator, under 40 instructions, and lived because of my music. I'm now hoping to do something a bit more complex, like an SNES. The problem I'm encountering is the sheer number of CPU instructions. Looking through the wiki.SuperFamicom.org 65c816 instruction listing, it looks like a pain in the rear. And I've seen notes here and there on various internet pages that the CPU is the easiest part of an emulator to implement.
Under the assumption that it was so hard because I was doing it wrong, I looked around and found a simple implementation: SNES Emulator in 15 minutes, which is about 900 lines of code. Easy enough to work through.
So then, from the SNES Emulator in 15 minutes source, I found where the CPU instructions are. It looks a lot simpler than what I was thinking. I don't really understand it, but it's a few lines of code as opposed to a large mass of code. The first thing I notice is that the instructions only have 1 implementation each. If you look at the table in SuperFamicom then you see that it has
ADC #const
ADC (_db_),X
ADC (_db_,X)
ADC addr
ADC long
...
And the emulator source for (I think) ALL of those is:
// Note: op 0x100 means "NMI", 0x101 means "Reset", 0x102 means "IRQ". They are implemented in terms of "BRK".
// User is responsible for ensuring that WB() will not store into memory while Reset is being processed.
unsigned addr=0, d=0, t=0xFF, c=0, sb=0, pbits = op<0x100 ? 0x30 : 0x20;
// Define the opcode decoding matrix, which decides which micro-operations constitute
// any particular opcode. (Note: The PLA of 6502 works on a slightly different principle.)
const unsigned o8 = op / 32, o8m = 1u << (op%32);
// Fetch op'th item from a bitstring encoded in a data-specific variant of base64,
// where each character transmits 8 bits of information rather than 6.
// This peculiar encoding was chosen to reduce the source code size.
// Enum temporaries are used in order to ensure compile-time evaluation.
#define t(w8,w7,w6,w5,w4,w3,w2,w1,w0) if( \
(o8<1?w0##u : o8<2?w1##u : o8<3?w2##u : o8<4?w3##u : \
o8<5?w4##u : o8<6?w5##u : o8<7?w6##u : o8<8?w7##u : w8##u) & o8m)
t(0,0xAAAAAAAA,0x00000000,0x00000000,0x00000000,0xAAAAA2AA,0x00000000,0x00000000,0x00000000) { c = t; t += A + P.C; P.V = (c^t) & (A^t) & 0x80; P.C = t & 0x100; }
In short, my General question:
Condensing the phenomenal cosmic power of CPU instructions into an itty bitty piece of code
Questions specific to the SNES emulator in 15 minutes source (portion posted above):
How does t(0, 0xAAAAAAAA, 0x00000000, ....) parse the instruction? I see the if statement, but I don't know where the numbers for any of the arguments come from, or what they mean to the overall code.
Why o8 = op / 32 and o8m = 1u << (op%32)?
The opcodes for ADC include ADC #const, which has a 2 byte operand, and ADC addr, which has a 3 byte operand. And the code t(0, 0xAAAAAAAA, ...) implements both cases?
While I'm asking:
what do the dp, _dp_ and sr that appear in ADC dp, ADC (_dp_) and ADC sr,S mean?
what is the difference between ADC (_dp_,X) and ADC dp,X? (probably redundant given the question above.)
I can't answer all of this, but dp stands for Direct Page, meaning that the instruction takes a single-byte operand which is a memory address within the Direct Page. Direct Page addressing is an extension of the Zero Page addressing mode of the 6502, where the single-byte addresses referred to memory locations $00 through $FF. The 16-bit derivatives of the 6502 have a configuration register which basically relocates the Zero Page to an alternate location.
In the wiki page you linked to, some of the dp in the table have underscores on them, and the others are in italics. I assume that they are all intended to be italic, and the wiki markup isn't working. A quick check of the Edit link supports this assumption (in the wiki source, they all have underscores). So don't read anything into that.
In 6502 assembly and derivatives of it, ADC dp,X means... let's take a concrete example instead... ADC $10,X means to add $10 to the value in register X to obtain an address, then load a value from that address and add it to the accumulator. ADC ($10,X) adds an extra level of indirection: add $10 to X to obtain an address, load a value from that address, interpret the loaded value as another address, and load the value from that address and add it to the accumulator. Parenthesized operands always add a level of indirection.
Note that the available modes include (dp,X) and (dp),Y and the placement of the parentheses relative to the comma and register is significant. With (dp),Y the value of Y is added to the first loaded value to get the address to use in the second load.
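To make the difference between the three forms concrete, here is a minimal C sketch of how the effective address could be computed. It assumes a plain 6502 zero page (addresses wrap within $00 to $FF) and a flat 64 KB `mem` array; the 65c816 direct page register is not modelled, and all function names are invented for this example.

```c
#include <stdint.h>

/* Read a 16-bit little-endian pointer stored in the zero page.
   The second byte wraps around inside the zero page. */
static uint16_t read_zp_pointer(const uint8_t *mem, uint8_t zp)
{
    return (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));
}

/* ADC $10,X : the operand plus X is the address of the value. */
uint16_t ea_zp_x(uint8_t operand, uint8_t x)
{
    return (uint8_t)(operand + x);
}

/* ADC ($10,X) : the operand plus X is the address of a pointer,
   and that pointer is the address of the value. */
uint16_t ea_indexed_indirect(const uint8_t *mem, uint8_t operand, uint8_t x)
{
    return read_zp_pointer(mem, (uint8_t)(operand + x));
}

/* ADC ($10),Y : the operand is the address of a pointer,
   and Y is added to that pointer to get the address of the value. */
uint16_t ea_indirect_indexed(const uint8_t *mem, uint8_t operand, uint8_t y)
{
    return (uint16_t)(read_zp_pointer(mem, operand) + y);
}
```

In all three cases the emulator would then read `mem[address]` and add it to the accumulator; only the way the address is obtained differs.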
As for that emulator... code golf doesn't lead to enhanced readability! I don't think the portion you've posted is actually understandable by itself, and I don't feel like tracking down and reading the rest of it. But the key concept in the t macro is bitstring. Its arguments are a series of 9 bitmasks, each 32 bits long, for a total of 288 bits. Every possible opcode (256 of them), plus the 3 pseudo-opcodes mentioned in the first comment, is therefore represented by a single bit in this 288-bit-long bitstring, with 29 bits left over.
That explains the construction of o8 and o8m. The opcode value is split into a high portion (to select one of the 9 arguments supplied to t) and a 5-bit low portion (to select a single bit from the selected argument). The big ?: chain does the first selection and the combination of & and 1 << ... does the second selection.
And then, oh look we have a variable called t too. It's not related to the macro. Giving them the same name was just cruel.
Maybe I can figure out what that bitstring is doing. When the opcode is a low number, o8 (the high bits) will be 0, so the ?: chain will use w0, which is the last argument to the macro. As the opcode increases, the selected argument moves leftward through the argument list to w1, then w2... and the o8m selector likewise starts at the right and moves left (& (1<<0) is the rightmost bit, & (1<<1) is the next one, etc.) and the if condition will be true when the selected bit is 1. Values are:
0, # opcodes $100 and up
0xAAAAAAAA, # opcodes $E0 to $FF
0x00000000, # opcodes $C0 to $DF
0x00000000, # opcodes $A0 to $BF
0x00000000, # opcodes $80 to $9F
0xAAAAA2AA, # opcodes $60 to $7F
0x00000000, # opcodes $40 to $5F
0x00000000, # opcodes $20 to $3F
0x00000000 # opcodes $00 to $1F
or in binary
0, # opcodes $100 and up
0b10101010101010101010101010101010, # opcodes $E0 to $FF
0b00000000000000000000000000000000, # opcodes $C0 to $DF
0b00000000000000000000000000000000, # opcodes $A0 to $BF
0b00000000000000000000000000000000, # opcodes $80 to $9F
0b10101010101010101010001010101010, # opcodes $60 to $7F
0b00000000000000000000000000000000, # opcodes $40 to $5F
0b00000000000000000000000000000000, # opcodes $20 to $3F
0b00000000000000000000000000000000 # opcodes $00 to $1F
Reading each line from right to left, the 1's are in positions corresponding to these opcodes: $61 $63 $65 $67 $69 $6D $6F $71 $73 $75 $77 $79 $7B $7D $7F $E1 $E3 $E5 $E7 $E9 $EB $ED $EF $F1 $F3 $F5 $F7 $F9 $FB $FD $FF
Hmm... that sort of resembles the list of ADC and SBC opcodes, but some of them are wrong.
Oh (I finally gave up and looked at some more of the emulator code) that's a NES emulator, not a SNES emulator, so it only has 6502 opcodes.
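The bitstring selection described above can be written without the macro. The following C sketch is a rewrite for readability, not the emulator's actual code: the array and function names are invented here, and the words are stored in ascending order (index 0 covers opcodes $00 to $1F), which is the reverse of the macro's argument order.

```c
#include <stdint.h>

/* The nine 32-bit words from the t(...) call, in ascending opcode order:
   index 0 covers $00-$1F, index 3 covers $60-$7F, index 7 covers $E0-$FF,
   index 8 covers the pseudo-opcodes $100 and up. */
const uint32_t adc_sbc_words[9] = {
    0x00000000u, 0x00000000u, 0x00000000u, 0xAAAAA2AAu,
    0x00000000u, 0x00000000u, 0x00000000u, 0xAAAAAAAAu,
    0u
};

/* Returns 1 when the bit belonging to opcode `op` is set, else 0.
   op / 32 picks the word, op % 32 picks the bit inside that word. */
int opcode_selected(const uint32_t words[9], unsigned op)
{
    unsigned word = op / 32;
    if (word > 8)
        word = 8;   /* the ?: chain in the macro ends at w8 */
    return (int)((words[word] >> (op % 32)) & 1u);
}
```

Looping `op` from 0 to 0x102 and collecting the values for which `opcode_selected(adc_sbc_words, op)` returns 1 reproduces the opcode list given above, for example $69 and $E9 are selected while $6B is not.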

Finding position of '1's efficiently in a bit array

I'm writing a program that tests a set of wires for open or short circuits. The program, which runs on an AVR, drives a test vector (a walking '1') onto the wires and receives the result back. It compares this resultant vector with the expected data which is already stored on an SD Card or external EEPROM.
Here's an example, assume we have a set of 8 wires all of which are straight through i.e. they have no junctions. So if we drive 0b00000010 we should receive 0b00000010.
Suppose we receive 0b11000010. This implies there is a short circuit between wires 7, 8 and wire 2. I can detect which bits I'm interested in by 0b00000010 ^ 0b11000010 = 0b11000000. This tells me clearly that wires 7 and 8 are at fault, but how do I find the position of these '1's efficiently in a large bit-array? It's easy to do this for just 8 wires using bit masks but the system I'm developing must handle up to 300 wires (bits). Before I started using macros like the following and testing each bit in an array of 300*300-bits I wanted to ask here if there was a more elegant solution.
#define BITMASK(b) (1 << ((b) % 8))
#define BITSLOT(b) ((b / 8))
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITCLEAR(a,b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITTEST(a,b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) ((nb + 8 - 1) / 8)
Just to further show how to detect an open circuit. Expected data: 0b00000010, received data: 0b00000000 (the wire isn't pulled high). 0b00000010 ^ 0b00000000 = 0b00000010 - wire 2 is open.
NOTE: I know testing 300 wires is not something the tiny RAM inside an AVR Mega 1281 can handle, that is why I'll split this into groups i.e. test 50 wires, compare, display result and then move forward.
Many architectures provide specific instructions for locating the first set bit in a word, or for counting the number of set bits. Compilers usually provide intrinsics for these operations, so that you don't have to write inline assembly. GCC, for example, provides __builtin_ffs, __builtin_ctz, __builtin_popcount, etc., each of which should map to the appropriate instruction on the target architecture, exploiting bit-level parallelism.
If the target architecture doesn't support these, an efficient software implementation is emitted by the compiler. The naive approach of testing the vector bit by bit in software is not very efficient.
If your compiler doesn't implement these, you can still code your own implementation using a de Bruijn sequence.
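As a rough sketch of how such an intrinsic could be used to list every set bit in a word, assuming GCC or Clang: the function name below is made up, and `__builtin_ctzl` is used rather than `__builtin_ctz` because `int` is only 16 bits wide on AVR while `unsigned long` is at least 32 bits everywhere.

```c
#include <stdint.h>

/* Writes the zero-based positions of all set bits in `word` to `out`
   in ascending order and returns how many were found.
   Relies on the GCC/Clang builtin __builtin_ctzl. */
int set_bit_positions(uint32_t word, int out[32])
{
    int n = 0;
    while (word != 0) {
        out[n++] = __builtin_ctzl((unsigned long)word); /* lowest set bit */
        word &= word - 1;                               /* clear that bit */
    }
    return n;
}
```

For the example from the question, `set_bit_positions(0xC0, out)` returns 2 and stores 6 and 7, which are wires 7 and 8 when counting from 1. The loop runs once per set bit instead of once per bit position.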
How often do you expect faults? If you don't expect them that often, then it seems pointless to optimize the "fault exists" case -- the only part that will really matter for speed is the "no fault" case.
To optimize the no-fault case, simply XOR the actual result with the expected result and use an (input ^ expected) == 0 test to see if any bits are set. Note the parentheses: in C, == binds more tightly than ^.
You can use a similar strategy to optimize the "few faults" case, if you further expect the number of faults to typically be small when they do exist -- mask the input ^ expected value to get just the first 8 bits, just the second 8 bits, and so on, and compare each of those results to zero. Then, you just need to search for the set bits within the ones that are not equal to zero, which should narrow the search space to something that can be done pretty quickly.
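A minimal sketch of that strategy in portable C, assuming the received and expected vectors are stored as byte arrays with wire 0 in bit 0 of byte 0. The function name, the `faults` output array and its `max_faults` limit are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Compares two bit vectors byte by byte. Bytes that match are skipped
   with a single comparison; only mismatching bytes are searched bit by
   bit. Zero-based indices of differing wires are stored in `faults`
   (at most max_faults of them). Returns the total number of faults. */
size_t find_faulty_wires(const uint8_t *received, const uint8_t *expected,
                         size_t nbytes, uint16_t *faults, size_t max_faults)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        uint8_t diff = (uint8_t)(received[i] ^ expected[i]);
        if (diff == 0)
            continue;                       /* no fault in these 8 wires */
        for (unsigned bit = 0; bit < 8; bit++) {
            if (diff & (1u << bit)) {
                if (count < max_faults)
                    faults[count] = (uint16_t)(i * 8 + bit);
                count++;
            }
        }
    }
    return count;
}
```

With expected data 0b00000010 and received data 0b11000010 in the first byte, the function reports two faults at indices 6 and 7. When there are no faults, the cost is one XOR and one comparison per byte.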
You can use a lookup table. For example, a log-base-2 lookup table of 256 bytes can be used to find the most-significant 1-bit in a byte:
uint8_t bit1 = log2[bit_mask];
where log2 is defined as follows:
uint8_t const log2[] = {
    0,                      /* not used log2[0] */
    0,                      /* log2[0x01] */
    1, 1,                   /* log2[0x02], log2[0x03] */
    2, 2, 2, 2,             /* log2[0x04],..,log2[0x07] */
    3, 3, 3, 3, 3, 3, 3, 3, /* log2[0x08],..,log2[0x0F] */
    ...
};
On most processors a lookup table like this will go to ROM. But AVR is a Harvard machine, and placing data in code space (ROM) requires a special non-standard extension, which depends on the compiler. For example, the IAR AVR compiler would need to use the extended keyword __flash. In WinAVR (GNU AVR) you would need to use the PROGMEM attribute, but it's more complex than that, because you would also need to use special macros to read from the program space.
I think there is only one way to do this:
Create an array, "outdata". Each item of the array can, for example, correspond to an 8-bit port register.
Send the outdata on the wires.
Read back this data as "indata".
Store the indata in an array mapped exactly as the outdata.
In a loop, XOR each byte of outdata with each byte of indata.
I would strongly recommend inline functions instead of those macros.
Why can't your MCU handle 300 wires?
300/8 = 37.5 bytes. Rounded to 38. It needs to be stored twice, outdata and indata, 38*2 = 76 bytes.
You can't spare 76 bytes of RAM?
I think you're missing the forest for the trees. Seems like a bed of nails test. First test some assumptions:
1) You know which pins should be live for each pin tested/energized.
2) you have a netlist translated for step 1 into a file on sd
If you operate on a byte level as well as bit, it simplifies the issue. If you energize a pin, there is an expected pattern out stored in your file. First find the mismatched bytes; identify mismatched pins in the byte; finally store the energized pin with the faulty pin numbers.
You don't need an array for searching, or results. general idea:
unsigned int numwires = 300;
unsigned char numbytes = numwires / 8 + ((numwires % 8) ? 1 : 0); /* round up */
for (unsigned char currbyte = 0; currbyte < numbytes; currbyte++)
{
    unsigned char testbyte = inchar(baseaddr + currbyte);
    unsigned char goodbyte = getgoodbyte(testpin, currbyte /* byte offset */);
    if (testbyte ^ goodbyte) {
        // have a mismatch, report the pins
        unsigned char j, mask;
        for (j = 0, mask = 0x01; mask != 0; mask <<= 1, j++) { /* all 8 bits */
            if ((mask & testbyte) != (mask & goodbyte)) // for clarity
                logbadpin(testpin, currbyte * 8 + j /* pin/wire value */, mask & testbyte /* bad value */);
        }
    }
}

What is wrong with my program?

I cannot figure out what is wrong. I spent a few hours trying to debug this. I am compiling with gcc -m32 source.c -o source
How else can I approach this when debugging? Right now, I am isolating the code in many different ways and everything is working the way I expect but its working the wrong way when I have it all together.
This program takes an input and then looks for the highest position with the 1 bit.
I removed my code for now.
In bitsearch, you are storing num in eax, and you store a special value in edx in order to perform check. check tests whether the highest bit is set (indicating a negative number), and exits if that's the case...
The andl instruction in check stores the result of the operation in the second operand (eax), so the result overwrites num.
Then in zero you are using edx to perform your computation... edx contains the special value from the start of the function, so your result will always be wrong.
Now at the end of zero, you are going back to check, but the check is unnecessary here; you should loop back to zero instead...
Does the bit-search need to be implemented in assembly? A simple for loop can accomplish the same task, and is much more readable:
int num = 10;
int maxFound = -1;
for (int numShifts = 0; numShifts < 32 && num != 0; numShifts++) {
if ((num & 1) == 1) {
maxFound = numShifts;
}
num = num >> 1;
}
//the last position that had a 1 will be in maxFound
There's a neat bit-fiddling trick: x & -x isolates the last 1-bit. The following C program uses a lookup table based on de Bruijn sequences to compute the number of trailing (!) zeros of a number in constant (!) time:
unsigned int x; // find the number of trailing zeros in 32-bit x
int r; // result goes here
int table[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = table[((uint32_t)((x & -x) * 0x077CB531U)) >> 27];
Doing this in assembly language (which I stopped learning by the age of 16) should be no problem. Now all you have to do is to reverse the bits in num and apply the technique described above.
I wrote a paper about the trick described above, but unfortunately it's not available on the web. If you're interested, I can send it to you (or anyone else who's interested) by email.
My assembly knowledge is a little rusty, but it seems to me like bitsearch is overly complicated. How about just shifting the number to the right and counting the times you need to do that until it's zero?
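In C, that idea might look like the sketch below. It uses an unsigned type so that the right shift always brings in zeros (a rotate, or an arithmetic shift of a negative value, would never reach zero). The function name is chosen for this example.

```c
#include <stdint.h>

/* Returns the zero-based position of the highest set bit in num,
   or -1 when num is zero. Each loop pass shifts one bit out on the
   right, so the pass count minus one is the position of the top bit. */
int highest_set_bit(uint32_t num)
{
    int position = -1;
    while (num != 0) {
        num >>= 1;
        position++;
    }
    return position;
}
```

For num = 10 (binary 1010) the loop runs four times and the function returns 3. The same loop translates almost line for line into a shrl / incl / jnz sequence in assembly.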
