CHIP-8 SDL rendering problems - c

I have coded a CHIP-8 emulator. Whatever I do, I cannot get any pixels to show on the screen. The weird thing is that I have checked the code top to bottom for 2 days already, and there does not seem to be any problem. It reads the .rom file into memory and fetches the opcodes correctly.
Here is the source code:
// Render: expand the 64x32 display into one ARGB8888 value per pixel
SDL_SetRenderDrawColor( renderer, 0, 0, 0, SDL_ALPHA_OPAQUE );
SDL_RenderClear(renderer);
uint32_t pixels[(WINDOW_WIDTH / 10) * (WINDOW_HEIGHT / 10)]; // must hold 64*32 entries
uint16_t i;
for(i = 0; i < 64*32; i++){
pixels[i] = (0x00FFFFFF * display[i]) | 0xFF000000; // opaque white if set, opaque black if not
}
//upload the pixels to the texture (pitch = bytes in one 64-pixel row)
SDL_UpdateTexture(tex,NULL,pixels, 64 * sizeof(uint32_t));
//Now get the texture to the screen (scaled to fill the window)
SDL_RenderCopy(renderer,tex,NULL,NULL);
SDL_RenderPresent(renderer); // Update screen
ch8.drawF = false;
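// DXYN: XOR an N-byte sprite from memory[I] onto the display at (VX, VY)
uint16_t x = ch8->V[((ch8->opcode & 0x0F00) >> 8)];
uint16_t y = ch8->V[((ch8->opcode & 0x00F0) >> 4)];
uint8_t n = (ch8->opcode & 0x000F);
ch8->V[0xF] = 0; // VF is the collision flag: clear it before drawing
for(i = 0; i < n; i++) {
uint8_t pixel = memory[ch8->I.word + i]; // one sprite row, MSB = leftmost pixel
for(uint8_t j = 0; j < 8; j++) {
if((pixel & (0x80 >> j)) != 0){
// wrap at the 64x32 edges so the index never leaves display[]
uint16_t idx = ((x + j) % 64) + ((y + i) % 32) * 64;
if(display[idx] == 1) {
ch8->V[0xF] = 1; // a lit pixel is being erased: collision
}
display[idx] ^= 1;
}
}
}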
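To test the DXYN logic without SDL, it can be pulled out into a pure function. This is a sketch under assumptions: draw_sprite, the uint8_t display layout and the CH8_W/CH8_H macros are hypothetical, not from the question. It clears the collision flag before drawing and wraps coordinates at the screen edges, which many interpreters do, although some clip instead.

```c
#include <stdint.h>

#define CH8_W 64
#define CH8_H 32

/* Hypothetical helper: XOR an n-byte sprite from mem[I..I+n-1] onto a
   64x32 display at (vx, vy). Returns 1 if any lit pixel was cleared,
   which is the value DXYN stores in VF, otherwise 0. */
int draw_sprite(uint8_t display[CH8_W * CH8_H], const uint8_t *mem,
                uint16_t I, uint8_t vx, uint8_t vy, uint8_t n)
{
    int collision = 0;
    for (int row = 0; row < n; row++) {
        uint8_t bits = mem[I + row];              /* MSB = leftmost pixel */
        for (int col = 0; col < 8; col++) {
            if (!(bits & (0x80 >> col)))
                continue;                         /* sprite bit 0: pixel unchanged */
            /* Wrap so the index always stays inside display[]. */
            int x = (vx + col) % CH8_W;
            int y = (vy + row) % CH8_H;
            uint8_t *px = &display[x + y * CH8_W];
            if (*px)
                collision = 1;                    /* lit pixel about to be erased */
            *px ^= 1;
        }
    }
    return collision;
}
```

Assuming display is a uint8_t array, the opcode handler would then reduce to something like ch8->V[0xF] = draw_sprite(display, memory, ch8->I.word, x, y, n);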

So basically, the problem was in the init() function. I was initially using SDL_CreateWindow and SDL_CreateRenderer, but now I'm using SDL_CreateWindowAndRenderer, which takes pointers to pointers (SDL_Window** and SDL_Renderer**) and fills them in, instead of taking a pointer to a char (the title) and a pointer to a window and returning the new object.
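The pointer issue behind both the init() fix and the SDL_Texture** change can be reproduced without SDL. In this sketch, Texture, init_by_value and init_by_reference are hypothetical stand-ins: assigning to a pointer parameter only changes the function's local copy, while passing the pointer's address lets the function update the caller's variable. That second pattern is the one SDL_CreateWindowAndRenderer(width, height, flags, &window, &renderer) uses.

```c
#include <stdlib.h>

/* Hypothetical stand-in for SDL_Texture so this compiles without SDL. */
typedef struct { int w, h; } Texture;

/* Wrong: 'tex' is a copy of the caller's pointer, so the assignment only
   changes the local copy and the caller's pointer is left untouched. */
void init_by_value(Texture *tex)
{
    tex = malloc(sizeof *tex);
    free(tex);   /* the caller can never see this object, so release it */
}

/* Right: the caller passes the address of its pointer (Texture **),
   so writing through it updates the caller's variable. */
int init_by_reference(Texture **tex)
{
    *tex = malloc(sizeof **tex);
    if (*tex == NULL)
        return -1;
    (*tex)->w = 64;
    (*tex)->h = 32;
    return 0;
}
```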
I also fixed 3 other problems:
1. I was adding 0x200 to NNN operands because at first I thought the NNN addresses in ROMs were relative to 0 (the start of the ROM). They are absolute, so I removed the +0x200 from each xNNN opcode. I had also forgotten a * in SDL_Texture* tex: it is supposed to be SDL_Texture** tex, because with a single pointer I was merely changing the address the local copy of the pointer was pointing to...
2. In opcode 2NNN, the value pushed onto the stack must be the return address: instead of (ch8->SP) = ch8->opcode & 0x0FFF; it is (ch8->SP) = ch8->PC.word;
3. In opcode FX65, the loop condition is i <= ((ch8->opcode & 0x0F00) >> 8), because V0 through VX inclusive must be loaded.
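A sketch of the three corrected opcodes, assuming a simplified, hypothetical Chip8 struct with an explicit stack array (the question's ch8 layout, with PC.word and I.word, is different). It assumes PC has already been advanced past the current instruction, so the pushed value is the return address. Interpreters differ on whether FX65 also advances I; this sketch leaves I unchanged.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state, not the question's struct. */
typedef struct {
    uint8_t  V[16];
    uint16_t I, PC, SP;
    uint16_t stack[16];
    uint8_t  memory[4096];
} Chip8;

/* 2NNN: call subroutine at NNN (no stack overflow check in this sketch). */
void op_2NNN(Chip8 *c, uint16_t opcode)
{
    c->stack[c->SP++] = c->PC;   /* push the return address, not NNN */
    c->PC = opcode & 0x0FFF;     /* NNN is an absolute address: no +0x200 */
}

/* 00EE: return from subroutine. */
void op_00EE(Chip8 *c)
{
    c->PC = c->stack[--c->SP];
}

/* FX65: load V0 through VX (inclusive) from memory starting at I. */
void op_FX65(Chip8 *c, uint16_t opcode)
{
    uint8_t x = (opcode & 0x0F00) >> 8;
    for (uint8_t i = 0; i <= x; i++)
        c->V[i] = c->memory[c->I + i];
}
```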
Basically, the differences between SDL_CreateWindowAndRenderer and SDL_CreateWindow + SDL_CreateRenderer had me confused; I should have checked the documentation first.
Now I only need to make the emulator redraw only the changed pixels, and then make it play sound.

Related

C code runs slower when SIMD instructions are used?

I am new to SIMD and am writing a program that converts an image from ARGB to grayscale. The main loop is as follows:
void* ptr;
int* pBitmap;
posix_memalign(&ptr, 16, height * width * sizeof(int));
pBitmap = (int*)ptr;
for(row = 0; row < height; row++){
for(col = 0; col < width; col++){
int pixel = pBitmap[col + row * width];
int alpha = (pixel >> 24) & 0xff;
int red = (pixel >> 16) & 0xff;
int green = (pixel >> 8) & 0xff;
int blue = pixel & 0xff;
int bw = (int)(red * 0.299 + green * 0.587 + blue * 0.114);
pBitmap[col + row * width] = (alpha << 24) + (bw << 16) + (bw << 8) + (bw);
}
}
And this is my modified SIMD program, which is much slower than the original one.
__m128i bw;
__m128i* rec;
__m128d blue, green, red, alpha;
for(int i = 0; i < width * height; i += 2){
// note: _mm_cvtepi32_pd converts only the low two 32-bit lanes (hence i += 2),
// but the 128-bit store at the end still writes four pixels, and pBitmap + i
// is only 16-byte aligned when i is a multiple of 4
rec = (__m128i*)(pBitmap + i);
alpha = _mm_cvtepi32_pd(_mm_srli_epi32(*rec, 24));
red = _mm_cvtepi32_pd(_mm_and_si128(_mm_srli_epi32(*rec, 16), _mm_set1_epi32(0xff)));
green = _mm_cvtepi32_pd(_mm_and_si128(_mm_srli_epi32(*rec, 8), _mm_set1_epi32(0xff)));
blue = _mm_cvtepi32_pd(_mm_and_si128(*rec, _mm_set1_epi32(0xff)));
bw = _mm_add_epi32(_mm_cvtpd_epi32(_mm_mul_pd(red, _mm_set_pd1(0.299))), _mm_cvtpd_epi32(_mm_mul_pd(green, _mm_set_pd1(0.587))));
bw = _mm_add_epi32(bw, _mm_cvtpd_epi32(_mm_mul_pd(blue, _mm_set_pd1(0.114))));
*rec = _mm_add_epi32(_mm_add_epi32(_mm_slli_epi32(_mm_cvtpd_epi32(alpha), 24), _mm_slli_epi32(bw, 16)), _mm_add_epi32(_mm_slli_epi32(bw, 8), bw));
}
Is this result caused by the extra type conversions? I don't know where else I can optimize. Please help me, thank you.
There are a few issues with your implementation.
SIMD works best when processing multiple pixels at a time in parallel. Do an Internet search for "Arrays of Structures vs. Structures of Arrays" for some examples.
Why use doubles instead of single precision? An __m128d holds only two doubles, while an __m128 holds four floats, so that is halving your throughput.
Most compilers do not have a way to automatically create data constants from SIMD vectors. All those calls to _mm_set_* intrinsics are doing a lot of work at runtime that you should really do at compile time.
Replace all uses of the _mm_set_* intrinsics with something like:
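#include <xmmintrin.h> // __m128
// typedef so the type can be named without the 'union' keyword in C
typedef union
{
float f[4];
__m128 v;
} simdConstant;
static const simdConstant c_luminance = { { 0.299f, 0.587f, 0.114f, 1.f } };
static const simdConstant c_luminanceRed = { { 0.299f, 0.299f, 0.299f, 0.299f } };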
Then use c_luminance.v or c_luminanceRed.v instead of _mm_set_ps or _mm_set_ps1.
See also DirectXMath, which provides numerous examples of SIMD implementations.
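Putting those suggestions together, here is a minimal SSE2 sketch (argb_to_gray_sse2 is a hypothetical name). It loads four pixels at once, splits them into per-channel lanes, does the weighted sum in single precision, and builds its constants once outside the loop. It uses unaligned loads/stores so it does not depend on buffer alignment, and it truncates like the scalar (int) cast. Because it uses float instead of double, a pixel whose weighted sum lands extremely close to an integer could come out one gray level different from the scalar version.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Convert n ARGB8888 pixels to grayscale in place, four per iteration. */
void argb_to_gray_sse2(uint32_t *buf, size_t n)
{
    /* Constants are built once, outside the loop. */
    const __m128i byte_mask  = _mm_set1_epi32(0xff);
    const __m128i alpha_mask = _mm_slli_epi32(byte_mask, 24); /* 0xff000000 */
    const __m128  k_red   = _mm_set1_ps(0.299f);
    const __m128  k_green = _mm_set1_ps(0.587f);
    const __m128  k_blue  = _mm_set1_ps(0.114f);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Split four packed pixels into per-channel lanes (SoA). */
        __m128i a = _mm_and_si128(px, alpha_mask);
        __m128  r = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(px, 16), byte_mask));
        __m128  g = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(px, 8), byte_mask));
        __m128  b = _mm_cvtepi32_ps(_mm_and_si128(px, byte_mask));
        /* Weighted sum, then truncate toward zero like (int). */
        __m128 lum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(r, k_red), _mm_mul_ps(g, k_green)),
                                _mm_mul_ps(b, k_blue));
        __m128i bw = _mm_cvttps_epi32(lum);
        /* Repack as alpha | bw << 16 | bw << 8 | bw. */
        __m128i out = _mm_or_si128(_mm_or_si128(a, _mm_slli_epi32(bw, 16)),
                                   _mm_or_si128(_mm_slli_epi32(bw, 8), bw));
        _mm_storeu_si128((__m128i *)(buf + i), out);
    }
    /* Scalar tail for the last n % 4 pixels. */
    for (; i < n; i++) {
        uint32_t p = buf[i];
        uint32_t bw = (uint32_t)(((p >> 16) & 0xff) * 0.299f
                               + ((p >> 8) & 0xff) * 0.587f
                               + (p & 0xff) * 0.114f);
        buf[i] = (p & 0xff000000u) | (bw << 16) | (bw << 8) | bw;
    }
}
```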

Color gradient in C

I'm taking my first steps in C, and was trying to make a gradient color function, that draws a bunch of rectangles to the screen (vertically).
This is the code so far:
void draw_gradient(uint32_t start_color, uint32_t end_color) {
int steps = 8;
int draw_height = window_height / 8;
//Change this value inside the loop to write different color
uint32_t loop_color = start_color;
for (int i = 0; i < steps; i++) {
draw_rect(0, i * draw_height, window_width, draw_height, loop_color);
}
}
Ignoring end_color for now, I want to pass in a simple red color like 0xFFFF0000 (ARGB), then take the red 'FF' and convert it to an integer, or decrease it using the loop_color variable.
I'm not sure how to get the red value out of the hex code, manipulate it as a number, and then write it back as hex. Any ideas?
So in 8 steps the code should, for example, go in hex from FF to 00, or as an integer from 255 to 0.
As you have said, your color is in ARGB format. This calculation assumes a vertical gradient, meaning from top to bottom (one horizontal line per step).
The steps are:
Get the number of lines to draw; this is your rectangle height.
Get the A, R, G, B color components from your start and end colors:
uint8_t start_a = start_color >> 24;
uint8_t start_r = start_color >> 16;
uint8_t start_g = start_color >> 8;
uint8_t start_b = start_color >> 0;
uint8_t end_a = end_color >> 24;
uint8_t end_r = end_color >> 16;
uint8_t end_g = end_color >> 8;
uint8_t end_b = end_color >> 0;
Calculate the step for each component:
float step_a = (float)(end_a - start_a) / (float)height;
float step_r = (float)(end_r - start_r) / (float)height;
float step_g = (float)(end_g - start_g) / (float)height;
float step_b = (float)(end_b - start_b) / (float)height;
Run the for loop and apply the step for each color channel:
for (int i = 0; i < height; ++i) {
// a float cannot be masked with &, so convert each channel to an integer first
uint32_t color = 0 |
((uint32_t)(start_a + i * step_a) & 0xFF) << 24 |
((uint32_t)(start_r + i * step_r) & 0xFF) << 16 |
((uint32_t)(start_g + i * step_g) & 0xFF) << 8 |
((uint32_t)(start_b + i * step_b) & 0xFF) << 0;
draw_horizontal_line(i, color);
}
It is better to use float for the step_* values and multiply/add on each iteration. With integer steps, the division is truncated, so a small per-line change can become 0 and the color would never change.
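Applied to the original draw_gradient, the interpolation can be wrapped in a small helper. gradient_color is a hypothetical name; it works per channel in integers and multiplies by i before dividing, which avoids the rounding-to-zero problem described above, and it reaches end_color exactly on the last band.

```c
#include <stdint.h>

/* Hypothetical helper: color of band i out of 'steps' bands, linearly
   interpolating every ARGB channel from start (i = 0) to end
   (i = steps - 1). Assumes steps >= 2 and 0 <= i < steps. */
uint32_t gradient_color(uint32_t start, uint32_t end, int i, int steps)
{
    uint32_t color = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int s = (int)((start >> shift) & 0xFF);  /* start channel value */
        int e = (int)((end >> shift) & 0xFF);    /* end channel value */
        int c = s + (e - s) * i / (steps - 1);   /* stays within 0..255 */
        color |= (uint32_t)c << shift;
    }
    return color;
}
```

Inside the question's loop this becomes draw_rect(0, i * draw_height, window_width, draw_height, gradient_color(start_color, end_color, i, steps));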

Round Constants in Keccak

Recently, just for the heck of it, I've been playing around with an attempt at implementing Keccak, the cryptographic primitive behind SHA-3. I've run into some issues however, specifically with calculating the round constants used in the "Iota" step of the permutation.
Just to get it out of the way: Yes. I know they are round constants. I know I could hard code them as constants. But where's the fun in that?
I've specifically been referencing the FIPS 202 specification document on SHA-3 as well as the Keccak team's own Keccak reference. However, despite my efforts, I can't seem to end up with the correct constants. I've never dealt with bit manipulation before, so if I'm doing something the complete wrong way, feel free to let me know.
rc is a function defined in the FIPS 202 standard of Keccak that is a linear feedback shift register with a feedback polynomial of x^8 + x^6 + x^5 + x^4 + 1.
The values of t (specific to SHA-3) are defined as the set of integers that includes j + 7 * i_r, where i_r = {0, 1, ..., 22, 23} and j = {0, 1, ..., 4, 5}.
The expected outputs (the round constants) are:
0x0000000000000001, 0x0000000000008082, 0x800000000000808a,
0x8000000080008000, 0x000000000000808b, 0x0000000080000001,
0x8000000080008081, 0x8000000000008009, 0x000000000000008a,
0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
0x000000000000800a, 0x800000008000000a, 0x8000000080008081,
0x8000000000008080, 0x0000000080000001, and 0x8000000080008008.
rc Function Implementation
uint64_t rc(int t)
{
if(t % 255 == 0)
{
return 0x1;
}
uint64_t R = 0x1;
for(int i = 1; i <= t % 255; i++)
{
R = R << 0x1;
R |= (((R >> 0x0) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x0;
R |= (((R >> 0x4) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x4;
R |= (((R >> 0x5) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x5;
R |= (((R >> 0x6) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x6;
R &= 0xFF;
}
return R & 0x1;
}
rc Function Call
for(int i_r = 0; i_r < 24; i_r++)
{
uint64_t RC = 0x0;
// TODO: Fix so the limit is not constant
for(int j = 0; j < 6; j++)
{
RC ^= (rc(j + 7 * i_r) << ((int) pow(2, j) - 1));
}
printf("%llu\n", RC);
}
Any help on this matter is much appreciated.
I made some random changes to the code and now it works. Here are the highlights:
The j loop needs to count from 0 to 6. That's because 2^6-1 = 63. So if j is never 6, then the output can never have the MSB set, i.e. an output of 0x8... is not possible.
Using the pow function is generally a bad idea for this type of application. Floating-point results have a nasty habit of being slightly lower than desired, e.g. a value that should be 4 may come out as 3.99999999999, which gets truncated to 3 when you convert it to an int. It's doubtful that was happening in this case, but why risk it, when it's easy to just multiply a shift variable by 2 on each pass through the loop.
The maximum value for t is 7*23+6 = 167, so the % 255 does nothing (at least with the value of i and t in this code). Also, there's no need to treat t == 0 as a special case. The loop won't run when t is 0, so the result is 0x1 by default.
Implementing a linear feedback shift register is quite simple in C. Each term in the polynomial corresponds to a single bit. x^8 is just 2^8 which is 0x100 and x^6 + x^5 + x^4 + 1 is 0x71. So whenever bit 0x100 is set, you XOR the result by 0x71.
Here's the updated code:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
uint64_t rc(int t)
{
uint64_t result = 0x1;
for (int i = 1; i <= t; i++)
{
result <<= 1;
if (result & 0x100)
result ^= 0x71;
}
return result & 0x1;
}
int main(void)
{
for (int i = 0; i < 24; i++)
{
uint64_t result = 0x0;
uint64_t shift = 1;
for (int j = 0; j < 7; j++)
{
uint64_t value = rc(7*i + j);
result |= value << (shift - 1);
shift *= 2;
}
printf("0x%016" PRIx64 "\n", result);
}
}
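Since t = 7*i + j runs through 0, 1, ..., 167 in order, the LFSR does not need to be restarted for every call. This sketch (keccak_round_constants is a hypothetical name) runs it once and fills a table of all 24 constants, using the same x^8 + x^6 + x^5 + x^4 + 1 feedback (XOR with 0x71 when the top bit shifts out).

```c
#include <stdint.h>

/* Fill table[0..23] with the Keccak-f[1600] round constants by running
   the rc() LFSR once over t = 0, 1, ..., 167. */
void keccak_round_constants(uint64_t table[24])
{
    uint8_t lfsr = 0x1;                     /* register state for t = 0 */
    for (int i = 0; i < 24; i++) {
        uint64_t rc = 0;
        for (int j = 0; j < 7; j++) {
            /* rc(7*i + j) is the low bit of the current state ... */
            if (lfsr & 0x1)
                rc |= (uint64_t)1 << ((1u << j) - 1);   /* bit 2^j - 1 */
            /* ... then step: shift left, XOR 0x71 if bit 7 fell out */
            lfsr = (uint8_t)((lfsr << 1) ^ ((lfsr & 0x80) ? 0x71 : 0x00));
        }
        table[i] = rc;
    }
}
```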

Writing images with an Arduino

I have an SD card, an SD card shield, and an Arduino Uno R3. I need to write an image onto the SD card. I would much prefer going from a raw array to JPEG/PNG/BMP/etc. rather than using formats that are easy to write but not really openable by most viewers (PPM, PGM, etc.).
Is the image writing function included in the Arduino standard libraries? If not, what library should I use? I've looked at lodePNG, but ran into weird errors (vector is not a member of std).
I take zero credit for this code as I pulled it from a thread on the Arduino forums (http://forum.arduino.cc/index.php?topic=112733.0). It writes a .bmp file to an SD card.
Another discussion indicated that, because of the compression algorithms associated with JPG and PNG files, the code needed to support those formats would be much harder to fit on an Arduino, which makes sense to me (http://forum.arduino.cc/index.php?topic=76376.0).
Hope this helps. Definitely not an expert with Arduino - just tinkered a bit.
#include <SdFat.h>
#include <SdFatUtil.h>
/*
WRITE BMP TO SD CARD
Jeff Thompson
Summer 2012
TO USE MEGA:
The SdFat library must be edited slightly to use a Mega - in line 87
of SdFatConfig.h, change to:
#define MEGA_SOFT_SPI 1
(this uses pins 10-13 for writing to the card)
Writes pixel data to an SD card, saved as a BMP file. Lots of code
via the following...
BMP header and pixel format:
http://stackoverflow.com/a/2654860
SD save:
http://arduino.cc/forum/index.php?topic=112733 (lots of thanks!)
... and the SdFat example files too
www.jeffreythompson.org
*/
char name[] = "9px_0000.bmp"; // filename convention (will auto-increment)
const int w = 16; // image width in pixels
const int h = 9; // " height
const boolean debugPrint = true; // print details of process over serial?
const int imgSize = w*h;
int px[w*h]; // actual pixel data (grayscale - added programmatically below)
SdFat sd;
SdFile file;
const uint8_t cardPin = 8; // pin that the SD is connected to (d8 for SparkFun MicroSD shield)
void setup() {
// iteratively create pixel data
int increment = 256/(w*h); // divide color range (0-255) by total # of px
for (int i=0; i<imgSize; i++) {
px[i] = i * increment; // creates a gradient across pixels for testing
}
// SD setup
Serial.begin(9600);
if (!sd.init(SPI_FULL_SPEED, cardPin)) {
sd.initErrorHalt();
Serial.println("---");
}
// if name exists, create new filename
for (int i=0; i<10000; i++) {
name[4] = (i/1000)%10 + '0'; // thousands place
name[5] = (i/100)%10 + '0'; // hundreds
name[6] = (i/10)%10 + '0'; // tens
name[7] = i%10 + '0'; // ones
if (file.open(name, O_CREAT | O_EXCL | O_WRITE)) {
break;
}
}
// set fileSize (used in bmp header); use 32-bit types because int is
// only 16 bits on the Uno and the header stores 32-bit values
uint32_t rowSize = 4 * ((3*w + 3)/4); // bytes per row, padded to a multiple of 4
uint32_t fileSize = 54 + h*rowSize; // headers (54 bytes) + pixel data
// create image data; heavily modified version via:
// http://stackoverflow.com/a/2654860
unsigned char *img = (unsigned char *)malloc(3*imgSize); // 3 bytes (R, G, B) per pixel
if (img == NULL) { // not enough RAM for the image buffer
file.close();
return;
}
for (int y=0; y<h; y++) {
for (int x=0; x<w; x++) {
int colorVal = px[y*w + x]; // classic formula for px listed in line
img[(y*w + x)*3+0] = (unsigned char)(colorVal); // R
img[(y*w + x)*3+1] = (unsigned char)(colorVal); // G
img[(y*w + x)*3+2] = (unsigned char)(colorVal); // B
// padding (the 4th byte) will be added later as needed...
}
}
// print px and img data for debugging
if (debugPrint) {
Serial.print("\nWriting \"");
Serial.print(name);
Serial.print("\" to file...\n");
for (int i=0; i<imgSize; i++) {
Serial.print(px[i]);
Serial.print(" ");
}
}
// create padding: BMP rows are padded to a multiple of 4 bytes, so at most
// 3 zero bytes are ever needed (with w = 16 the old rowSize - 3*w array had length 0)
unsigned char bmpPad[3] = { 0, 0, 0 };
// create file headers (also taken from StackOverflow example)
unsigned char bmpFileHeader[14] = { // file header (always starts with BM!)
'B','M', 0,0,0,0, 0,0, 0,0, 54,0,0,0 };
unsigned char bmpInfoHeader[40] = { // info about the file (size, etc)
40,0,0,0, 0,0,0,0, 0,0,0,0, 1,0, 24,0 };
// shift 32-bit values: shifting a 16-bit int by 16 or more is undefined
bmpFileHeader[ 2] = (unsigned char)((uint32_t)fileSize );
bmpFileHeader[ 3] = (unsigned char)((uint32_t)fileSize >> 8);
bmpFileHeader[ 4] = (unsigned char)((uint32_t)fileSize >> 16);
bmpFileHeader[ 5] = (unsigned char)((uint32_t)fileSize >> 24);
bmpInfoHeader[ 4] = (unsigned char)((uint32_t)w );
bmpInfoHeader[ 5] = (unsigned char)((uint32_t)w >> 8);
bmpInfoHeader[ 6] = (unsigned char)((uint32_t)w >> 16);
bmpInfoHeader[ 7] = (unsigned char)((uint32_t)w >> 24);
bmpInfoHeader[ 8] = (unsigned char)((uint32_t)h );
bmpInfoHeader[ 9] = (unsigned char)((uint32_t)h >> 8);
bmpInfoHeader[10] = (unsigned char)((uint32_t)h >> 16);
bmpInfoHeader[11] = (unsigned char)((uint32_t)h >> 24);
// write the file (thanks forum!)
file.write(bmpFileHeader, sizeof(bmpFileHeader)); // write file header
file.write(bmpInfoHeader, sizeof(bmpInfoHeader)); // " info header
for (int i=0; i<h; i++) { // iterate image array
file.write(img+(w*(h-i-1)*3), 3*w); // write px data
file.write(bmpPad, (4-(w*3)%4)%4); // and padding as needed
}
file.close(); // close file when done writing
free(img); // release the image buffer
if (debugPrint) {
Serial.print("\n\n---\n");
}
}
void loop() { }
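The header the sketch builds byte by byte can also be written as a small portable function, which makes each field's offset explicit. make_bmp24_header and put_le32 are hypothetical helpers; the offsets match the bmpFileHeader/bmpInfoHeader arrays above (a 14-byte file header followed by a 40-byte info header), with all multi-byte values stored little-endian and computed in 32-bit arithmetic.

```c
#include <stdint.h>

/* Store v as 4 little-endian bytes, the byte order BMP headers use. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v);
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Fill a 54-byte header for an uncompressed, bottom-up 24-bit BMP of
   w x h pixels. Returns the total file size in bytes. */
uint32_t make_bmp24_header(unsigned char hdr[54], uint32_t w, uint32_t h)
{
    uint32_t row_size = 4 * ((3 * w + 3) / 4);  /* rows padded to 4 bytes */
    uint32_t file_size = 54 + row_size * h;
    for (int i = 0; i < 54; i++)
        hdr[i] = 0;
    hdr[0] = 'B';                                /* signature "BM" */
    hdr[1] = 'M';
    put_le32(hdr + 2, file_size);                /* total file size */
    put_le32(hdr + 10, 54);                      /* offset of pixel data */
    put_le32(hdr + 14, 40);                      /* info header size */
    put_le32(hdr + 18, w);                       /* width */
    put_le32(hdr + 22, h);                       /* height (> 0: bottom-up rows) */
    hdr[26] = 1;                                 /* color planes */
    hdr[28] = 24;                                /* bits per pixel */
    /* bytes 30..33 stay 0: no compression; remaining fields may be 0 */
    return file_size;
}
```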

Running out of RAM declaring a global 2D array

I need a different way to have global access to 160*160 bits of data that won't cause me to run out of RAM. I am trying to create a back buffer for a 160*160 black-and-white LCD screen, so 160*10 ints give me 160*160 bits, because an int is 16 bits. However, I am running out of RAM on the board. Does anyone have a way to do this without using the RAM? Maybe allocating it in some way? I can't seem to find a proper way to allocate a 2D array. Is there any other way of doing this?
Edit:
It is an MSP430 RBX430 board (here is a link to a picture of it: http://i.ytimg.com/vi/rr18why8wzY/0.jpg), and yes, ints are 16 bits on this device; longs and doubles are 32 bits. The device has 64K of memory, and I am running it at 16 MHz. I am asking for 3,200 bytes.
As for whether it makes sense, how does it not? I have a 64K device where ints are 16 bits. I am creating a map of the 160*160 LCD screen, using 1s and 0s to keep track of whether each pixel is on or off. After I turn on all the pixels I want, I apply the map to the LCD. This way I do not have to draw to the LCD, erase it, and then draw again; I can simply draw, and then draw over it, so it will not flicker.
This effectively creates a back buffer for the LCD.
static int lcdPixels[160][10];
/*Must call this before using RBX430_graphics*/
void initGraphics(void)
{
int h = 0;
int w = 0;
for(h=0; h < ROW_SIZE; h++)
{
for(w=0; w < COLUMN_SIZE; w++)
{
lcdPixels[h][w] = 0;
}
}
}
Here is the rest:
void pixelOn(int posX, int posY)
{
// each int holds 16 pixels: first grab the right column (word)...
int column = posX / 16;
// ...then the right bit inside it (0..15), matching (w * 16) + k in commitBuffer
int bit = posX % 16;
//turn on the bit/pixel (1u: shifting a signed 1 into bit 15 overflows a 16-bit int)
lcdPixels[posY][column] |= (1u << bit);
}
void pixelOFF(int posX, int posY)
{
// first grab the right column (word)
int column = posX / 16;
// next grab the right bit inside it (0..15)
int bit = posX % 16;
//turn off the bit/pixel
lcdPixels[posY][column] &= ~(1u << bit);
}
/* Call this to commit the current backBuffer to the LCD display*/
void commitBuffer(void)
{
int h = 0;
int w = 0;
int k = 0;
for(h=0; h < ROW_SIZE; h++)
{
for(w=0; w < COLUMN_SIZE; w++)
{
for(k=0; k < INT_SIZE; k++)
{
if((lcdPixels[h][w] >> k) & 1) // test bit k without shifting 1 into the sign bit
{
lcd_point(((w * 16) + k), h, ON);
}
else
{
lcd_point(((w * 16) + k), h, OFF);
}
}
}
}
}
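For comparison, here is a sketch of pixel addressing that matches commitBuffer's (w * 16) + k layout: pixel x lives in word x / 16 at bit x % 16. The names (lcd_bits, pixel_set, pixel_get, LCD_W, LCD_H) are hypothetical, and unsigned 16-bit words are used so that setting bit 15 is well defined. Note that this does not reduce the memory needed: the buffer is still 3200 bytes.

```c
#include <stdint.h>

#define LCD_W 160
#define LCD_H 160
#define WORDS_PER_ROW (LCD_W / 16)   /* 10 sixteen-bit words per row */

/* One bit per pixel: 160 rows x 10 words x 2 bytes = 3200 bytes. */
static uint16_t lcd_bits[LCD_H][WORDS_PER_ROW];

/* Pixel x lives in word x >> 4 (x / 16) at bit x & 15 (x % 16). */
void pixel_set(int x, int y, int on)
{
    uint16_t mask = (uint16_t)(1u << (x & 15));
    if (on)
        lcd_bits[y][x >> 4] |= mask;
    else
        lcd_bits[y][x >> 4] &= (uint16_t)~mask;
}

int pixel_get(int x, int y)
{
    return (lcd_bits[y][x >> 4] >> (x & 15)) & 1;
}
```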
So I then tried to allocate the array using malloc, and that is a no-go as well. I guess I just cannot do this; 160*160 bits is just too much data...
Do you have 64K of RAM or 64K of Flash memory? I think the RBX430 has a msp430f2274 on it (http://www.ti.com/product/msp430f2274) which only has 1K of RAM.
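The size mismatch behind that answer, assuming the msp430f2274's 1 KB of RAM:

$$\frac{160 \times 160\ \text{bits}}{8\ \text{bits/byte}} = 3200\ \text{bytes} > 1024\ \text{bytes of RAM}$$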
