XOR performance improvement via intrinsics? - c
I have some code where I step through two buffers and XOR one against the other, placing the results in a third. Specifically, these are a buffer of data, a key stream buffer, and a destination buffer.
The function in question is (in full):
static int
ssh_aes_ctr(EVP_CIPHER_CTX *ctx, u_char *dest, const u_char *src,
    LIBCRYPTO_EVP_INL_TYPE len)
{
    typedef union {
#ifdef CIPHER_INT128_OK
        __uint128_t *u128;
#endif
        uint64_t *u64;
        uint32_t *u32;
        uint8_t *u8;
        const uint8_t *cu8;
        uintptr_t u;
    } ptrs_t;
    ptrs_t destp, srcp, bufp;
    uintptr_t align;
    struct ssh_aes_ctr_ctx_mt *c;
    struct kq *q, *oldq;
    int ridx;
    u_char *buf;
    if (len == 0)
        return 1;
    if ((c = EVP_CIPHER_CTX_get_app_data(ctx)) == NULL)
        return 0;
    q = &c->q[c->qidx];
    ridx = c->ridx;
    /* src already padded to block multiple */
    srcp.cu8 = src;
    destp.u8 = dest;
    do { /* do until len is 0 */
        buf = q->keys[ridx];
        bufp.u8 = buf;
        /* figure out the alignment on the fly */
#ifdef CIPHER_UNALIGNED_OK
        align = 0;
#else
        align = destp.u | srcp.u | bufp.u;
#endif
        /* xor the src against the key (buf)
         * different systems can do all 16 bytes at once or
         * may need to do it in 8 or 4 bytes chunks
         * worst case is doing it as a loop */
#ifdef CIPHER_INT128_OK
        if ((align & 0xf) == 0) {
            destp.u128[0] = srcp.u128[0] ^ bufp.u128[0];
        } else
#endif
        /* 64 bits */
        if ((align & 0x7) == 0) {
            destp.u64[0] = srcp.u64[0] ^ bufp.u64[0];
            destp.u64[1] = srcp.u64[1] ^ bufp.u64[1];
        /* 32 bits */
        } else if ((align & 0x3) == 0) {
            destp.u32[0] = srcp.u32[0] ^ bufp.u32[0];
            destp.u32[1] = srcp.u32[1] ^ bufp.u32[1];
            destp.u32[2] = srcp.u32[2] ^ bufp.u32[2];
            destp.u32[3] = srcp.u32[3] ^ bufp.u32[3];
        } else {
            /* 1 byte at a time; use destp/srcp, which advance every block
             * (dest and src still point at the first block) */
            size_t i;
            for (i = 0; i < AES_BLOCK_SIZE; ++i)
                destp.u8[i] = srcp.cu8[i] ^ buf[i];
        }
        /* inc/decrement the pointers by the block size (16)*/
        destp.u += AES_BLOCK_SIZE;
        srcp.u += AES_BLOCK_SIZE;
        /* Increment read index, switch queues on rollover */
        if ((ridx = (ridx + 1) % KQLEN) == 0) {
            oldq = q;
            /* Mark next queue draining, may need to wait */
            c->qidx = (c->qidx + 1) % numkq;
            q = &c->q[c->qidx];
            pthread_mutex_lock(&q->lock);
            while (q->qstate != KQFULL) {
                pthread_cond_wait(&q->cond, &q->lock);
            }
            q->qstate = KQDRAINING;
            pthread_cond_broadcast(&q->cond);
            pthread_mutex_unlock(&q->lock);
            /* Mark consumed queue empty and signal producers */
            pthread_mutex_lock(&oldq->lock);
            oldq->qstate = KQEMPTY;
            pthread_cond_broadcast(&oldq->cond);
            pthread_mutex_unlock(&oldq->lock);
        }
    } while (len -= AES_BLOCK_SIZE);
    c->ridx = ridx;
    return 1;
}
Running this through VTune, I find that the line destp.u128[0] = srcp.u128[0] ^ bufp.u128[0]; consumes 4% of the CPU time.
The assembly for this line is:
shl $0x4, %rax
addq (%rsp), %rax
movq (%rax), %rdx
xorq (%rbx), %rdx
movq 0x8(%rax), %rax
xorq 0x8(%rax), %rax
movq %rax, 0x8(%r12)
The xorq (%rbx), %rdx instruction alone accounts for 3.5% of the CPU time. I'm wondering whether I could improve performance by vectorizing the data being XORed and using intrinsics to perform the XOR. I don't have much (read: any) real experience with intrinsics, but I'm willing to learn. I just don't know whether this code, as is, is about as good as I can expect. Any pointers would be appreciated.
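Below is a minimal sketch of the intrinsics route. It assumes an x86 target with SSE2 (every x86-64 CPU has it) and a compiler that ships <emmintrin.h>. xor_blocks_sse2 is a hypothetical helper, not part of the code above.

- _mm_loadu_si128 and _mm_storeu_si128 accept unaligned pointers, so the alignment dispatch is unnecessary.
- _mm_xor_si128 XORs all 16 bytes in one instruction.

Measure whether this actually beats the two scalar 64-bit xorq pairs the compiler already emits. The xorq (%rbx), %rdx instruction also performs the load from the key stream buffer, so the samples attributed to it may reflect waiting on memory rather than the XOR itself.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86/x86-64 target */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dest[k] = src[k] ^ key[k] for nblocks 16-byte blocks.
 * Unaligned loads/stores are used, so no alignment checks are required. */
static void xor_blocks_sse2(uint8_t *dest, const uint8_t *src,
                            const uint8_t *key, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + 16 * i));
        __m128i k = _mm_loadu_si128((const __m128i *)(key + 16 * i));
        _mm_storeu_si128((__m128i *)(dest + 16 * i), _mm_xor_si128(s, k));
    }
}
```

In ssh_aes_ctr, the per-block if/else chain could be replaced by a call such as xor_blocks_sse2(destp.u8, srcp.cu8, buf, 1). Pass one block per call, because buf changes with every ridx.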
Thanks
What is the safest way to hash a list of non-repeating integers without taking order into account?
I am looking for a hash function that can hash a list of non-repeating integers while ignoring their order.
Example: I want the two lists l1 = [0, 1, 3, 7] and l2 = [7, 3, 1, 0] to have the same hash.
Background: I have an algorithm that finds lists of vertices on a graph. In an undirected graph, the algorithm will find certain lists multiple times in different orders. With my current understanding of the algorithm, it is easier to filter out the duplicates than to reinvent the algorithm. For performance reasons, I understand it to be easier to hash the found lists of vertices than to compare whole lists.
Possible answers: an XOR or a simple sum might be an answer. Unfortunately, as I see it, both leave too much room for hash collisions. The working but not very efficient method is to sort a list and then compare a new (also sorted) list against it.
Other thoughts: given that
- the lists contain only integers,
- the integers are vertex indices, and the graph can have billions of vertices,
- the integers in a list are non-repeating, and their order doesn't matter,
- the lists can and will have between 2 and 100 entries (in some cases more than 1000), and
- there is no need for cryptographically secure randomness,
I have a feeling there should be a relatively easy and straightforward answer, and I just haven't found it.
Use a combination of the product, the sum and ^ (XOR). All are commutative (order independent) with unsigned math.
unsigned long long product = 1;
unsigned sum = 0; // Maybe unsigned long long
unsigned x = 0;
for (i = 0; i < array_element_count; i++) {
    product *= l[i];
    sum += l[i];
    x ^= l[i];
}
unsigned long long pre_hash = product + sum + ((unsigned long long) x << 32);
unsigned hash = pre_hash % hash_table_size;
Tip: hash_table_size should be a prime to effectively use all pre_hash bits.
If array_element_count is high, I would consider product *= shift_right_until_odd(l[i]). Otherwise each even factor contributes at least one factor of 2, and the product will too often become 0 (after 64 such factors a 64-bit product is 0). The case l[i] == 0 also deserves something different, since a single zero makes the whole product 0. A simple mitigation is product *= l[i] | 1, but that is something pulled out of the air.
Hashing takes time for good design, and the above are candidate building blocks for the OP.
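The answer names shift_right_until_odd() without defining it. Below is one possible definition, together with the product/sum/XOR combination wrapped in a function. Both function names and the choice to map 0 to 1 are assumptions.

Stripping trailing zero bits makes every factor odd. A product of odd numbers stays odd modulo 2^64, so it can never collapse to 0. The elements are assumed to be uint32_t, which covers vertex indices up to about 4.29 billion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical definition: drop trailing zero bits so the result is odd.
 * 0 has no odd part, so it is mapped to 1 (an assumption). */
static uint64_t shift_right_until_odd(uint64_t v)
{
    if (v == 0)
        return 1;
    while ((v & 1) == 0)
        v >>= 1;
    return v;
}

/* Order-independent pre-hash: product of odd parts + sum + (XOR << 32). */
static uint64_t unordered_pre_hash(const uint32_t *l, size_t count)
{
    uint64_t product = 1, sum = 0;
    uint32_t x = 0;
    for (size_t i = 0; i < count; i++) {
        product *= shift_right_until_odd(l[i]);
        sum += l[i];
        x ^= l[i];
    }
    return product + sum + ((uint64_t)x << 32);
}
```

As in the answer, reduce the result with pre_hash % hash_table_size to pick a bucket.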
Any CRC will do the job. Just XOR the elements together mod 2&31 (I have used 64-bit numbers but a 32-bit CRC; it should also work with a full 64-bit XOR/CRC or a 32-bit XOR/CRC). Because XOR is commutative, this eliminates any dependency on the order of the elements. Then take a CRC32 of the result. This spreads the set of values uniformly, as a CRC warrants (or tries to) that a change in one bit will affect half of the bits in the result. See here for sample code and several CRC tables. The repository is BSD-licensed, so you can use it as desired. Below is a sample implementation that generates random lists, reorders them, and compares their hashes: crc32ieee8023.h #ifndef CRC32IEEE8023_H #define CRC32IEEE8023_H #include "crc.h" extern CRC_STATE crc32ieee8023[]; #endif /* CRC32IEEE8023_H */ crc.h #ifndef CRC_H #define CRC_H #include <stdlib.h> #include <stdint.h> #define CRC_TABLE_SIZE 256 #define CRC_BYTE_SIZE 8 #define CRC_BYTE_MASK 0xff typedef uint8_t CRC_BYTE; typedef uint64_t CRC_STATE; CRC_STATE do_crc( CRC_STATE state, CRC_BYTE *buff, size_t nbytes, CRC_STATE *table); #endif /* CRC_H */ test_xor_crc_hash.c (This is the important file, where all the stuff is included.) /* test_crc_table -- program to test a crc hash algorithm that * checks a list of numbers and generates the same crc in a form * that is independent of the list order presented. * Program generates a list of random numbers (64bit) then it * generates a random permutation of the list and a sorted list, * calculates the hash over the three lists, and compares them.
*/ #include <errno.h> #include <fcntl.h> #include <getopt.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include "crc.h" #include "crc32ieee8023.h" #define DFLT_N 10 #define RANDOM_DEV "/dev/urandom" int long_compare( const void *_a, const void *_b); void print( const char *name, const uint64_t *v, int vsz, CRC_STATE crc, uint64_t xor); int main(int argc, char **argv) { int opt; int n = DFLT_N, res; /* process options */ while ((opt = getopt(argc, argv, "n:")) != EOF) { switch (opt) { case 'n': res = sscanf(optarg, "%d", &n); if (res != 1) { fprintf(stderr, "%s: invalid format (-n)\n", optarg); } break; } /* switch */ } /* while */ /* initialization of random number generator */ unsigned short random_state[3]; int fd = open(RANDOM_DEV, O_RDONLY); if (fd < 0) { fprintf(stderr, "open: %s: %s\n", RANDOM_DEV, strerror(errno)); exit(EXIT_FAILURE); } res = read(fd, random_state, sizeof random_state); if (res < 0) { /* error */ fprintf(stderr, "read: %s: %s\n", RANDOM_DEV, strerror(errno)); exit(EXIT_FAILURE); } if (res < sizeof random_state) { fprintf(stderr, "read: %s: incomplete read (%d/%zu)\n", RANDOM_DEV, res, sizeof random_state); exit(EXIT_FAILURE); } seed48(random_state); close(fd); /* generate a list of random numbers and make two copies */ uint64_t *original = calloc(n, sizeof *original), *copy_sorted = calloc(n, sizeof *copy_sorted), *random_sort = calloc(n, sizeof *random_sort); /* make two copies */ for (int i = 0; i < n; i++) { original[i] = copy_sorted[i] = random_sort[i] = (long)lrand48() | ((long)lrand48() << 32); } /* sort the numbers */ qsort(copy_sorted, n, sizeof *copy_sorted, long_compare); /* and random permutation (Fisher-Yates: pick j from i..n-1) */ for (int i = 0; i < n-1; i++) { int j = i + lrand48() % (n - i); if (i != j) { uint64_t temp = random_sort[i]; random_sort[i] = random_sort[j]; random_sort[j] = temp; } } /* calculate the xors */ uint64_t xor_original = 0, xor_sorted = 0, xor_random = 0; for (int i = 0; i < n; i++)
{ xor_original ^= original[i]; xor_sorted ^= copy_sorted[i]; xor_random ^= random_sort[i]; } /* now, calculate the crc's (a crc64 would be better for long) */ CRC_STATE crc_original = do_crc(0xffffffff, (unsigned char *)&xor_original, sizeof xor_original, crc32ieee8023), crc_sorted = do_crc(0xffffffff, (unsigned char *)&xor_sorted, sizeof xor_sorted, crc32ieee8023), crc_random = do_crc(0xffffffff, (unsigned char *)&xor_random, sizeof xor_random, crc32ieee8023); print("original", original, n, crc_original, xor_original); print(" sorted", copy_sorted, n, crc_sorted, xor_sorted); print(" random", random_sort, n, crc_random, xor_random); if (crc_original != crc_sorted || crc_sorted != crc_random) { fprintf(stderr, "crc's don't match (crc_original == 0x%08lx, " "crc_sorted == 0x%08lx, crc_random == 0x%08lx)\n", crc_original, crc_sorted, crc_random); } /* change only one bit in one element to see how it changes the hash */ int bit_to_change = lrand48() % (n * 64), elem_to_change = bit_to_change % n; bit_to_change %= 64; original[elem_to_change] ^= (1UL << bit_to_change); /* change the bit */ /* we should do the calculation over all elements, but just * changing a bit in one element will change just the same bit in the * xor_original accumulation variable */ uint64_t xor_original_new = xor_original; xor_original_new ^= (1UL << bit_to_change); printf("element=%d, bit=%d\n", elem_to_change, bit_to_change); uint64_t crc_original_new = do_crc(0xffffffff, (unsigned char *)&xor_original_new, sizeof xor_original_new, crc32ieee8023); print(" chg1bit", original, n, crc_original_new, xor_original_new); } int long_compare(const void *_a, const void *_b) { const uint64_t *a = _a, *b = _b; return *a == *b ? 0 : *a > *b ? 
+1 : -1; } void print(const char *name, const uint64_t *v, int vsz, CRC_STATE crc, uint64_t xor) { printf("%s: { ", name); char *sep = ""; for (int i = 0; i < vsz; i++) { printf("%s0x%016lx", sep, v[i]); sep = ", "; } printf(" }\n" " xor = 0x%016lx, crc = 0x%08lx\n", xor, crc); } crc.c #include <sys/types.h> #include "crc.h" /* table based CRC calculation */ CRC_STATE do_crc( CRC_STATE state, CRC_BYTE *buff, size_t nbytes, CRC_STATE *table) { CRC_STATE index; while (nbytes--) { state ^= *buff++; index = state & CRC_BYTE_MASK; state >>= CRC_BYTE_SIZE; state ^= table[index]; } /* while */ return state; } /* do_crc */ crc32ieee8023.c #include "crc.h" /* variables */ CRC_STATE crc32ieee8023[] = { /* Command used: mkcrc -gpedb88320 */ /* Polynomial: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */ /* 0 */ 0x0, 0x77073096, 0xee0e612c, 0x990951ba, /* 4 */ 0x76dc419, 0x706af48f, 0xe963a535, 0x9e6495a3, /* 8 */ 0xedb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, /* 12 */ 0x9b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, /* 16 */ 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, /* 20 */ 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, /* 24 */ 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, /* 28 */ 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5, /* 32 */ 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, /* 36 */ 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, /* 40 */ 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, /* 44 */ 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, /* 48 */ 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, /* 52 */ 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f, /* 56 */ 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, /* 60 */ 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, /* 64 */ 0x76dc4190, 0x1db7106, 0x98d220bc, 0xefd5102a, /* 68 */ 0x71b18589, 0x6b6b51f, 0x9fbfe4a5, 0xe8b8d433, /* 72 */ 0x7807c9a2, 0xf00f934, 0x9609a88e, 0xe10e9818, /* 76 */ 0x7f6a0dbb, 0x86d3d2d, 0x91646c97, 0xe6635c01, /* 80 */ 0x6b6b51f4, 0x1c6c6162, 0x856530d8,
0xf262004e, /* 84 */ 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, /* 88 */ 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, /* 92 */ 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, /* 96 */ 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, /* 100 */ 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, /* 104 */ 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, /* 108 */ 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, /* 112 */ 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, /* 116 */ 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, /* 120 */ 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, /* 124 */ 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad, /* 128 */ 0xedb88320, 0x9abfb3b6, 0x3b6e20c, 0x74b1d29a, /* 132 */ 0xead54739, 0x9dd277af, 0x4db2615, 0x73dc1683, /* 136 */ 0xe3630b12, 0x94643b84, 0xd6d6a3e, 0x7a6a5aa8, /* 140 */ 0xe40ecf0b, 0x9309ff9d, 0xa00ae27, 0x7d079eb1, /* 144 */ 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, /* 148 */ 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7, /* 152 */ 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, /* 156 */ 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, /* 160 */ 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, /* 164 */ 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, /* 168 */ 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, /* 172 */ 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79, /* 176 */ 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, /* 180 */ 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, /* 184 */ 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, /* 188 */ 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, /* 192 */ 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x26d930a, /* 196 */ 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x5005713, /* 200 */ 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0xcb61b38, /* 204 */ 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0xbdbdf21, /* 208 */ 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, /* 212 */ 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, /* 216 */ 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, /* 220 */ 0x8f659eff, 
0xf862ae69, 0x616bffd3, 0x166ccf45, /* 224 */ 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, /* 228 */ 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, /* 232 */ 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, /* 236 */ 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, /* 240 */ 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, /* 244 */ 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf, /* 248 */ 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, /* 252 */ 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d, }; /* crc32ieee8023 */ Makefile targets = test_xch toclean = $(targets) test_xch_deps = test_xch_objs = crc32ieee8023.o crc.o test_xor_crc_hash.o test_xch_libs = test_xch_ldfl = toclean += $(test_xch_objs) all: $(targets) clean: $(RM) $(toclean) test_xch: $(test_xch_deps) $(test_xch_objs) $(CC) $(LDFLAGS) $($@_ldfl) -o $@ $($@_objs) $($@_libs) $(LIBS) To build the program, just run: $ make To run it, you can use the -n option to specify the number of random elements to generate.
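The same XOR-then-CRC idea also fits in a single table-free function, sketched below. It uses the reflected polynomial 0xEDB88320 (the crc32ieee8023 table above starts with the standard entries for it). It adds the usual final XOR with 0xFFFFFFFF, which the answer's do_crc leaves out, so the "123456789" check value comes out as the standard 0xCBF43926. The function names are hypothetical.

Because the CRC is applied after the XOR, any two lists whose XORs are equal still collide; the CRC only spreads the bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), init and final XOR 0xFFFFFFFF. */
static uint32_t crc32_bitwise(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* XOR all elements (order independent), then CRC the 8 bytes of the result,
 * least significant byte first so the hash does not depend on host endianness. */
static uint32_t xor_crc_hash(const uint64_t *v, size_t n)
{
    uint64_t x = 0;
    uint8_t bytes[8];
    for (size_t i = 0; i < n; i++)
        x ^= v[i];
    for (int b = 0; b < 8; b++)
        bytes[b] = (uint8_t)(x >> (8 * b));
    return crc32_bitwise(bytes, sizeof bytes);
}
```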
I think you will have to invent one to avoid the slow sorting option. In addition to XOR and arithmetic addition, there are bit rotations and bit masks you could use. If you need high collision resistance, you could combine more than one of these hash functions. For example, assuming the d_i and the arithmetic are modular, as with uint32_t:
H_1 = sum_{i = 1 to n} d_i
H_2 = xor_{i = 1 to n} d_i
H_3 = xor_{i = 1 to n} (rotl(d_i, d_i & 0x1f) + c)
Then concatenate H_1, H_2 and H_3 into a 12-byte hash.
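Below is a sketch of H_1, H_2 and H_3 in C, assuming uint32_t elements as the answer does. The answer does not give a value for the constant c, so HASH_C is an arbitrary placeholder. rotl32, struct hash96 and hash_unordered96 are hypothetical names. rotl32 masks the shift count so that a rotation by 0 never becomes a shift by 32, which would be undefined behavior in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate left by r (taken mod 32); safe for r == 0. */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    r &= 31;
    return (v << r) | (v >> ((32 - r) & 31));
}

/* The 12-byte hash: H_1, H_2, H_3 side by side. */
struct hash96 { uint32_t h1, h2, h3; };

#define HASH_C 0x9e3779b9u /* placeholder for the answer's constant c */

static struct hash96 hash_unordered96(const uint32_t *d, size_t n)
{
    struct hash96 h = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        h.h1 += d[i];                               /* H_1: modular sum */
        h.h2 ^= d[i];                               /* H_2: XOR */
        h.h3 ^= rotl32(d[i], d[i] & 0x1f) + HASH_C; /* H_3: XOR of rotated terms */
    }
    return h;
}
```

Each component uses only commutative, associative updates, so the element order cannot affect the result.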
Iterating over string returns empty in c (os development)
I was making an OS, or at least trying to, but I stumbled upon a problem. While iterating over a string to print each char to the screen, the returned char seemed to be empty! (I am actually new to OS development.) Here is the code snippet:
int offset = 0;
void clear_screen() {
    unsigned char * video = 0xB8000;
    for(int i = 0; i < 2000; i+=2){
        video[i] = ' ';
    }
}
void printc(char c) {
    unsigned char * video = 0xB8000;
    video[offset] = c;
    video[offset+1] = 0x03;
    offset += 2;
}
void print(unsigned char *string) {
    char * sus = '\0';
    uint32 i = 0;
    printc('|');
    sus[0] = 'a';
    printc(sus[0]); //this prints "a" correctly
    string[i] = 'c';
    while (string[i] != '\0') {
        printc(string[i]); //this while loop is only called once
        i++; //it prints " " only once and exits
    }
    printc('|');
}
int bootup(void) {
    clear_screen();
    // printc('h');
    // printc('e');
    // printc('l'); /* These work */
    // printc('l');
    // printc('o');
    print("hello"); //this doesn't
    return 1;
}
Output that it prints: |a |
Thanks in advance!!
Edit: new print function:
void print(unsigned char *string) {
    uint32 i = 0;
    printc('|');
    while (string[i] != '\0') {
        printc('i'); //not printed
        printc(string[i]);
        i++;
    }
    printc('|');
}
It still does not work.
Edit 2: updated the code as per @Lundin's advice:
int offset = 0;
void clear_screen() {
    unsigned char * video = (unsigned char *)0xB8000;
    for(int i = 0; i < 2000; i+=2){
        video[i] = ' ';
    }
}
void printc(char c) {
    unsigned char * video = (unsigned char *)0xB8000;
    video[offset] = c;
    video[offset+1] = 0x03;
    offset += 2;
}
void print(const char *string) {
    int i = 0;
    printc('|');
    while (string[i] != '\0') {
        printc('i');
        printc(string[i]);
        i++;
    }
    printc('|');
}
int bootup(void) {
    clear_screen();
    // printc('h');
    // printc('e');
    // printc('l');
    // printc('l');
    // printc('o');
    print("hello");
    return 1;
}
Stack:
init_lm:
    mov ax, 0x10
    mov fs, ax ;other segments are ignored
    mov gs, ax
    mov rbp, 0x90000 ;set up stack
    mov rsp, rbp
    ;Load kernel from disk
    xor ebx, ebx ;upper 2 bytes above bh in ebx is for cylinder = 0x0
    mov bl, 0x2 ;read from 2nd sectors
    mov bh, 0x0 ;head
    mov ch, 1 ;read 1 sector
    mov rdi, KERNEL_ADDRESS
    call ata_chs_read
    jmp KERNEL_ADDRESS
    jmp $
Before proceeding, I would recommend reading the OSDev wiki's page on text-based UIs. While this may go somewhat beyond the scope of the question, I would strongly recommend that, rather than working with the character/attribute values manually as unsigned char, you declare a struct type for those pairs:
#include <stdint.h>
struct TextCell {
    volatile unsigned char ch;
    volatile uint8_t attribute;
};
(You could be even more refined about it by using a bitfield for the individual foreground, background, and decoration components of the attribute, but that's probably getting ahead of things.) From there you can define the text buffer as a constant pointer:
struct TextCell * const text_buffer = (struct TextCell *)0xB8000;
You could further define:
const uint16_t MAXH = 80, MAXV = 25;
uint16_t currv = 0, currh = 0;
struct TextCell *text_cursor = (struct TextCell *)0xB8000;
void advance_cursor(void)
{
    text_cursor++;
    if (currh < MAXH - 1) {
        currh++;
    } else {
        currh = 0;
        if (currv < MAXV - 1) {
            currv++;
        } else {
            /* handle scrolling */
        }
    }
}
void gotoxy(uint16_t x, uint16_t y)
{
    /* clamp to the last cell instead of running past the buffer */
    if (x >= MAXH)
        x = MAXH - 1;
    if (y >= MAXV)
        y = MAXV - 1;
    text_cursor = text_buffer + (y * MAXH + x); /* row-major cell index */
    currh = x;
    currv = y;
}
This would lead to the following modifications of your code:
void kprintc(char c, uint8_t attrib)
{
    text_cursor->ch = c;
    text_cursor->attribute = attrib;
    advance_cursor();
}
void kprint(const char *string, uint8_t attribs)
{
    int i;
    for (i = 0; string[i] != '\0'; i++) {
        kprintc(string[i], attribs);
    }
}
void clear_screen(void)
{
    gotoxy(0, 0);
    for (int i = 0; i < (MAXH * MAXV); i++) {
        kprintc(' ', 0);
    }
    gotoxy(0, 0); /* leave the cursor at the top-left corner */
}
int bootup(void)
{
    clear_screen();
    // kprintc('h', 0x03);
    // kprintc('e', 0x03);
    // kprintc('l', 0x03);
    // kprintc('l', 0x03);
    // kprintc('o', 0x03);
    kprint("hello", 0x03);
    return 1;
}
So, why am I suggesting all of this extra stuff?
Because it is a lot easier to debug this way, mainly - it divides the concerns up better, and structures the data (or in this case, the video text buffer) more effectively. Also, you'll eventually need to do something like this at some point in the project, so if it helps now, you might as well do it now. If I am out of line in this, please let me know.
Your program has undefined behavior since it contains multiple lines that aren't valid C. You will have gotten compiler messages about those lines. unsigned char * video = 0xB8000; etc is not valid C, you need an explicit cast. "Pointer from integer/integer from pointer without a cast" issues Similarly, char * sus = '\0'; is also not valid C. You are trying to assign a pointer to a single character, which doesn't make sense. String handling beginner FAQ here: Common string handling pitfalls in C programming. It also addresses memory allocation basics. sus[0] = 'a'; etc here you have wildly undefined behavior since sus isn't pointing at valid memory. In case you are actually trying to access physical memory addresses, this isn't the correct way to do so. You need volatile qualified pointers. See How to access a hardware register from firmware? (In your case it probably isn't a register but everything from that link still applies - how to use hex constants etc.) EDIT: void print(unsigned char *string) ... string[i] = 'c'; is also wrong. First of all you are passing a char* which is not necessarily compatible with unsigned char*. Then you shouldn't modify the passed string from inside a function called print, that doesn't make sense. This should have been const char* string to prevent such bugs. As it stands you are passing a string literal to this function and then try to modify it - that is undefined behavior since string literals are read-only. Assuming gcc or clang, if you wish to block the compiler from generating an executable out of invalid C code, check out What compiler options are recommended for beginners learning C? In your case you also likely need the -ffreestanding option mentioned there.
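Below is a minimal sketch of what this answer describes: a volatile-qualified pointer for the text buffer, an explicit cast from the integer address, and a const char * parameter, so string literals are never modified. The buffer pointer is a parameter so the code can be exercised against an ordinary array. Inside the kernel you would pass VGA_TEXT_BASE. VGA_TEXT_BASE, putc_at and print_to are hypothetical names; 0xB8000 is the address from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* VGA text buffer with the explicit cast; only meaningful inside the kernel. */
#define VGA_TEXT_BASE ((volatile uint8_t *)0xB8000)

/* Byte offset of the next cell (each cell = character byte + attribute byte). */
static size_t cursor = 0;

/* Write one cell through a volatile-qualified pointer, so the compiler
 * may not drop or merge the stores. */
static void putc_at(volatile uint8_t *video, char c, uint8_t attr)
{
    video[cursor] = (uint8_t)c;
    video[cursor + 1] = attr;
    cursor += 2;
}

/* const char *: the string is only read, so passing a literal is fine. */
static void print_to(volatile uint8_t *video, const char *s, uint8_t attr)
{
    for (size_t i = 0; s[i] != '\0'; i++)
        putc_at(video, s[i], attr);
}
```

In bootup(), the call would then be print_to(VGA_TEXT_BASE, "hello", 0x03);.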
char * sus = '\0'; Have not checked more... but this assigns a null pointer to sus, and most probably is not what you want to do.
Understanding hook_finder
I'm trying to understand the PE format and the source code of "hook_finder" here: "https://github.com/Mr-Un1k0d3r/EDRs/blob/main/hook_finder64.c". In this snippet, I know it's trying to calculate the export table offset: VOID DumpListOfExport(VOID *lib, BOOL bNt) { DWORD dwIter = 0; CHAR* base = (CHAR*)lib; CHAR* PE = base + (unsigned char)*(base + 0x3c); DWORD ExportDirectoryOffset = *((DWORD*)PE + (0x8a / 4)); CHAR* ExportDirectory = base + ExportDirectoryOffset; DWORD dwFunctionsCount = *((DWORD*)ExportDirectory + (0x14 / 4)); DWORD OffsetNamesTableOffset = *((DWORD*)ExportDirectory + (0x20 / 4)); CHAR* OffsetNamesTable = base + OffsetNamesTableOffset; printf("------------------------------------------\nBASE\t\t\t0x%p\t%s\nPE\t\t\t0x%p\t%s\nExportTableOffset\t0x%p\nOffsetNameTable\t\t0x%p\nFunctions Count\t\t0x%x (%d)\n------------------------------------------\n", base, base, PE, PE, ExportDirectory, OffsetNamesTable, dwFunctionsCount, dwFunctionsCount); for(dwIter; dwIter < dwFunctionsCount - 1; dwIter++) { DWORD64 offset = *((DWORD*)OffsetNamesTable + dwIter); CHAR* current = base + offset; GetBytesByName((HANDLE)lib, current, bNt); } } 0x3c is the e_lfanew offset. However, I can't understand what the other hex values are, and why they are divided by 4 bytes. Further, VOID GetBytesByName(HANDLE hDll, CHAR *name, BOOL bNt) { FARPROC ptr = GetProcAddress((HMODULE)hDll, name); DWORD* opcode = (DWORD*)*ptr; if(bNt) { if(name[0] != 'N' && name[1] != 't') { return; } } if((*opcode << 24) >> 24 == 0xe9) { if(!IsFalsePositive(name)) { printf("%s is hooked\n", name); } } } What exactly is the left and right shifting doing, and why 24 specifically? From my understanding of EDRs, they add a JMP instruction at the very beginning of the function, and that's why the condition checks for 0xe9. But how does it follow the function flow and stay certain about it? And is this applicable only to ntdll.dll? Sorry, I'm just starting to study PE behavior and am trying to make things very clear.
Thank you in advance
The function DumpListOfExport reads the offset of the NT headers from e_lfanew at offset 0x3c from the base. However, it reads that field through an unsigned char cast, so it only gets the low byte of the 4-byte value. That works only while e_lfanew is below 0x100, which is not always the case, depending on the size of the DOS stub. Probably, this code makes that assumption for ntdll.dll. In the function GetBytesByName, the procedure is judged to be hooked if two conditions hold. First, its first byte must be a JMP instruction (in this case a near, relative JMP, whose opcode is E9). Second, the procedure name must not be in the false-positives list. Let the value of the 4 bytes pointed to by opcode be 0xca0e4be9 (x86 is little-endian, so e9 is the first byte in memory). Left-shifting it by 24 gives 0xe9000000, and right-shifting that result by 24 gives 0x000000e9, which is the value of the first byte at ptr. That procedure can be simplified as follows. Note that the name filter needs || rather than the original &&: with &&, a name is skipped only when both of its first two characters differ from "Nt". VOID GetBytesByName(HANDLE hDll, CHAR *name, BOOL bNt) { FARPROC ptr = GetProcAddress((HMODULE)hDll, name); BYTE* opcode = (BYTE*)ptr; if(ptr == NULL) { return; } if(bNt) { if(name[0] != 'N' || name[1] != 't') { return; } } if(!IsFalsePositive(name) && *opcode == 0xe9) { printf("%s is hooked\n", name); } } As a note: I can say that the code isn't written well and doesn't follow any good coding style.
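Below is a portable sketch of the same header walk with the byte offsets spelled out. It does not use <windows.h>, so it can be tried against a byte buffer on any little-endian machine. It assumes a PE32+ (64-bit) image mapped in memory, where RVAs are offsets from the base. rd32, export_dir_rva, export_name and starts_with_rel_jmp are hypothetical names.

Dividing by 4 in the original works because adding n to a DWORD* advances 4*n bytes. So *((DWORD*)PE + 0x88/4) reads the DWORD at byte offset 0x88, and 0x8a / 4 truncates to 34, the same index.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit field at a byte offset; memcpy avoids alignment and
 * aliasing issues (assumes a little-endian host, like x86). */
static uint32_t rd32(const uint8_t *base, uint32_t off)
{
    uint32_t v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

enum {
    DOS_E_LFANEW         = 0x3c, /* IMAGE_DOS_HEADER.e_lfanew (a full DWORD) */
    NT_EXPORT_DIR_RVA    = 0x88, /* 4 (signature) + 20 (file header) + 0x70 (PE32+ DataDirectory[0]) */
    EXP_NUMBER_OF_FUNCS  = 0x14, /* IMAGE_EXPORT_DIRECTORY.NumberOfFunctions */
    EXP_ADDRESS_OF_NAMES = 0x20  /* IMAGE_EXPORT_DIRECTORY.AddressOfNames (RVA) */
};

/* RVA of the export directory; reads all 4 bytes of e_lfanew. */
static uint32_t export_dir_rva(const uint8_t *base)
{
    uint32_t nt = rd32(base, DOS_E_LFANEW);
    return rd32(base, nt + NT_EXPORT_DIR_RVA);
}

/* Name of export i, for i below NumberOfNames. */
static const char *export_name(const uint8_t *base, uint32_t i)
{
    uint32_t names = rd32(base, export_dir_rva(base) + EXP_ADDRESS_OF_NAMES);
    return (const char *)(base + rd32(base, names + 4 * i));
}

/* The hook test: (v << 24) >> 24 on a 32-bit v is v & 0xff, the first byte
 * in memory on x86; comparing that byte directly is equivalent. */
static int starts_with_rel_jmp(const uint8_t *code)
{
    return code[0] == 0xe9; /* E9 = JMP rel32 */
}
```

The name table has NumberOfNames entries, stored at offset 0x18 of the export directory. The original loop bounds it by NumberOfFunctions (0x14), which can be larger when some exports are by ordinal only.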
Why can't i exit from shellcode with a syscall?
I'm trying to write a program where you pass in some assembled machine code as hex and it runs it. With simple instructions like int3 it works, but when I try to exit from the program with a syscall, it doesn't work. I assembled this with rasm2:
mov eax, 1
mov ebx, 12
int 0x80
and then passed it as an argument:
./Programm b801000000bb0c000000cd80 1
but I get a segfault. Here is my code:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
char *base16dec(char *b16str) {
    size_t stingrlength = strlen(b16str);
    char *decodedstr = malloc(stingrlength / 2);
    for (size_t i = 0; i < stingrlength; i += 2) {
        u_int8_t num = 0;
        char stringIn[3];
        stringIn[0] = b16str[i];
        stringIn[1] = b16str[i+1];
        stringIn[2] = 0;
        sscanf(stringIn, "%hhx", &num);
        decodedstr[i/2] = (char) num;
    }
    return decodedstr;
}
This decodes the hex string.
int main(int argc, char *argv[]) {
    char *dirstr = "XXXXXX";
    char dir[7];
    strcpy(dir, dirstr);
    int fd = mkstemp(dir);
    if (fd == -1) {
        dirstr = "/tmp/XXXXXX";
        char dir[12];
        strcpy(dir, dirstr);
        fd = mkstemp(dir);
    }
    unlink(dir);
This creates the temp file where the assembled code is stored.
    char *stringIn;
    if (argc == 2) {
        stringIn = malloc(strlen(argv[1]));
        strcpy(stringIn, argv[1]);
    } else if (argc == 3) {
        u_int8_t num = 0;
        sscanf(argv[2], "%hhu", &num);
        if (num == 1) {
            char *done = base16dec(argv[1]);
            stringIn = malloc(strlen(done));
            strcpy(stringIn, done);
        } else {
            stringIn = malloc(strlen(argv[1]));
            strcpy(stringIn, argv[1]);
        }
    } else {
        stringIn = malloc(1024);
        scanf("%s", stringIn);
        char *done = base16dec(stringIn);
        stringIn = malloc(strlen(done));
        strcpy(stringIn, done);
    }
This parses the input and copies it to stringIn.
    ftruncate(fd, strlen(stringIn));
    u_int8_t *code = mmap(NULL, strlen(stringIn), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE , fd, 0);
This expands the temp file, maps it executable, and creates a pointer to it named code.
    for (int i = 0; i < 1024; i++) {
        code[i] = (u_int8_t) stringIn[i];
    }
This copies the assembled bytes into code.
#if __x86_64__
    __asm__(
        "mov %0, %%rbx\n"
        "jmp *%%rbx"
        :
        : "g" (code)
        : "memory"
    );
#elif __i386__
    __asm__(
        "mov %0, %%ebx\n"
        "jmp *%%ebx"
        :
        : "r" (code)
    );
#else
#endif
This jumps to the assembled code.
    return 0;
}
EDIT: I can't debug the shellcode using gdb. I use 64-bit Linux Mint. I tried to copy 0 using strcpy.
Since this shellcode passes through C string functions, it can't contain null bytes. Your code has two movs with immediates that are padded to 32 bits:
mov eax, 1
mov ebx, 12
These encode as B801000000BB0C000000. When strlen/strcpy hit the first null byte, they treat the string as ended, so only part of the instruction is copied, and the CPU then executes garbage. Instead, you'll need to use:
xor eax, eax
inc eax
xor ebx, ebx
mov bl, 12
This provides the values you want for your system call and does not encode to any null bytes.
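Below is a sketch of the point above. It decodes the hex with an explicit length, so zero bytes survive, and it checks a payload for zero bytes before it goes through any C string function. hex_decode and has_zero_byte are hypothetical helpers.

The second hex string in the test is a hand assembly of the replacement sequence followed by int 0x80 (31 c0, ff c0, 31 db, b3 0c, cd 80); double-check it with rasm2 before relying on it. With an explicit length, the bytes can be copied into the mmap'd region with memcpy(code, buf, n) instead of strcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode hex into out[]; returns the byte count or -1 on bad input.
 * The length is returned explicitly, so zero bytes in the output are kept. */
static long hex_decode(const char *hex, uint8_t *out, size_t cap)
{
    size_t len = strlen(hex);
    if (len % 2 != 0 || len / 2 > cap)
        return -1;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_nibble(hex[2 * i]), lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (uint8_t)(hi << 4 | lo);
    }
    return (long)(len / 2);
}

/* Nonzero if the payload contains a 0x00 byte that strcpy/strlen would stop at. */
static int has_zero_byte(const uint8_t *code, size_t n)
{
    return memchr(code, 0, n) != NULL;
}
```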
How to print an integer in Nios 2?
I've written code that handles interrupts, and I added a function that does something "useful" (calculating the next prime) while the interrupts keep being handled. The interrupts work, and the time is printed as long as I don't try to print the prime, but the calculated primes are not printed from within the while loop. I think putchar can't be used, since a) it's for characters and I want to print an int, and b) putchar uses interrupts, so I must do it some other way. When I tried printf("%d",next), it didn't work either. Why? How should I print the next prime from the main loop? The program is: #include <stddef.h> #include <stdio.h> #include "system.h" #include "altera_avalon_pio_regs.h" #include "sys/alt_alarm.h" #include "alt_types.h" #include "alt_irq.h" extern int initfix_int(void); extern void puttime(int* timeloc); extern void puthex(int time); extern void tick(int* timeloc); extern void delay(int millisec); extern int hexasc(int invalue); #define TRUE 1 #define KEYS4 ( (unsigned int *) 0x840 ) #define TIMER1 ( (unsigned int *) 0x920 ) #define PERIOD (49999) #define NULL_POINTER ( (void *) 0) #define NULL ( (void *) 0) #define PRIME_FALSE 0 /* Constant to help readability. */ #define PRIME_TRUE 1 /* Constant to help readability. */ volatile int timeloc = 0x5957; volatile int RUN = 1; volatile int * const de2_pio_keys4_base = (volatile int *) 0x840; volatile int * const de2_pio_keys4_intmask = (volatile int *) 0x848; volatile int * const de2_pio_keys4_edgecap = (volatile int *) 0x84c; volatile int * de2_pio_hex_low28 = (volatile int *) 0x9f0; const int de2_pio_keys4_intindex = 2; const int de2_pio_keys4_irqbit = 1 << 2; /* The alt_alarm must persist for the duration of the alarm. */ static alt_alarm alarm; /* * The callback function.
*/ alt_u32 my_alarm_callback(void* context) { /* This function is called once per second */ if (RUN == 1) { tick(&timeloc); puttime(&timeloc); puthex(timeloc); } return alt_ticks_per_second(); } void n2_fatal_error() { /* Define the pattern to be sent to the seven-segment display. */ #define N2_FATAL_ERROR_HEX_PATTERN ( 0xcbd7ff ) /* Define error message text to be printed. */ static const char n2_fatal_error_text[] = "FATAL ERROR"; /* Define pointer for pointing into the error message text. */ register const char * cp = n2_fatal_error_text; /* Send pattern to seven-segment display. */ *de2_pio_hex_low28 = N2_FATAL_ERROR_HEX_PATTERN; /* Print the error message. */ while (*cp) { //out_char_uart_0( *cp ); cp = cp + 1; } /* Stop and wait forever. */ while (1) ; } /* * Interrupt handler for de2_pio_keys4. * The parameters are ignored here, but are * required for correct compilation. * The type alt_u32 is an Altera-defined * unsigned integer type. */ void irq_handler_keys(void * context, alt_u32 irqnum) { alt_u32 save_value; save_value = alt_irq_interruptible(de2_pio_keys4_intindex); /* Read edge capture register of the de2_pio_keys4 device. 
*/ int edges = *de2_pio_keys4_edgecap; *de2_pio_keys4_edgecap = 0; /* If action on KEY0 */ if (edges & 1) { /* If KEY0 is pressed now */ if ((*de2_pio_keys4_base & 1) == 0) { if (RUN == 0) { RUN = 1; } else { RUN = 0; } } /* If KEY0 is released now */ else if ((*de2_pio_keys4_base & 1) != 0) { } alt_irq_non_interruptible(save_value); } else if (edges & 2) { /* If KEY1 is pressed now */ if ((*de2_pio_keys4_base & 2) == 0) { tick(&timeloc); puttime(&timeloc); puthex(timeloc); } /* If KEY1 is released now */ else if ((*de2_pio_keys4_base & 2) != 0) { } alt_irq_non_interruptible(save_value); } else if (edges & 4) { /* If KEY2 is pressed now */ if ((*de2_pio_keys4_base & 4) == 0) { timeloc = 0x0; puttime(&timeloc); puthex(timeloc); } /* If KEY2 is released now */ else if ((*de2_pio_keys4_base & 4) != 0) { } alt_irq_non_interruptible(save_value); } else if (edges & 8) { /* If KEY3 is pressed now */ if ((*de2_pio_keys4_base & 8) == 0) { timeloc = 0x5957; puttime(&timeloc); puthex(timeloc); } /* If KEY3 is released now */ else if ((*de2_pio_keys4_base & 8) != 0) { } alt_irq_non_interruptible(save_value); } } /* * Initialize de2_pio_keys4 for interrupts. */ void keysinit_int(void) { /* Declare a temporary for checking return values * from system-calls and library functions. */ register int ret_val_check; /* Allow interrupts */ *de2_pio_keys4_intmask = 15; /* Set up Altera's interrupt wrapper for * interrupts from the de2_pio_keys4 device. * The function alt_irq_register will enable * interrupts from de2_pio_keys4. * Return value is zero for success, * nonzero for failure. */ ret_val_check = alt_irq_register(de2_pio_keys4_intindex, NULL_POINTER, irq_handler_keys); /* If there was an error, terminate the program. */ if (ret_val_check != 0) n2_fatal_error(); } /* * NextPrime * * Return the first prime number larger than the integer * given as a parameter. The integer must be positive. */ int nextPrime(int inval) { int perhapsprime; /* Holds a tentative prime while we check it. 
                       */
    int testfactor;   /* Holds various factors for which we test perhapsprime. */
    int found;        /* Flag, false until we find a prime. */

    if (inval < 3) /* Initial sanity check of parameter. */
    {
        if (inval <= 0) return (1); /* Return 1 for zero or negative input. */
        if (inval == 1) return (2); /* Easy special case. */
        if (inval == 2) return (3); /* Easy special case. */
    }
    else
    {
        /* Testing an even number for primeness is pointless, since
         * all even numbers are divisible by 2. Therefore, we make sure
         * that perhapsprime is larger than the parameter, and odd. */
        perhapsprime = (inval + 1) | 1;
    }

    /* While prime not found, loop. */
    for (found = PRIME_FALSE; found != PRIME_TRUE; perhapsprime += 2)
    {
        /* Assume perhapsprime is prime until a factor is found. This must be
         * set before the inner loop: for perhapsprime == 5 the inner loop
         * runs zero times, so 5 would otherwise be skipped. */
        found = PRIME_TRUE;

        /* Check factors from 3 up to perhapsprime/2. */
        for (testfactor = 3; testfactor <= (perhapsprime >> 1); testfactor += 1)
        {
            if ((perhapsprime % testfactor) == 0) /* If testfactor divides perhapsprime... */
            {
                found = PRIME_FALSE;   /* ...then, perhapsprime was non-prime. */
                goto check_next_prime; /* Break the inner loop, go test a new perhapsprime. */
            }
        }
check_next_prime: ; /* This label is used to break the inner loop. */
        if (found == PRIME_TRUE) /* If the loop ended normally, we found a prime. */
        {
            return (perhapsprime); /* Return the prime we found. */
        }
    }
    return (perhapsprime); /* Not reached: the loop returns as soon as it finds a prime. */
}

int main()
{
    int next = 3;

    /* Remove unwanted interrupts.
     * A nonzero return value indicates failure. */
    if (initfix_int() != 0) n2_fatal_error();

    keysinit_int();

    if (alt_alarm_start(&alarm, alt_ticks_per_second(), my_alarm_callback, NULL) < 0)
    {
        printf("No system clock available\n");
    }

    while( 1 ) /* Loop forever.
                */
    {
        next = nextPrime(next);
        //printf("%d",next);
        //printf("P %d ", next);
    }
    return 0;
}

int hex7seg(int digit)
{
    int trantab[] = {
        0x40, 0x79, 0x24, 0x30, 0x19, 0x12, 0x02, 0x78,
        0x00, 0x10, 0x08, 0x03, 0x46, 0x21, 0x06, 0x0e };
    register int tmp = digit & 0xf;
    return (trantab[tmp]);
}

void puthex(int inval)
{
    unsigned int hexresult;
    hexresult = hex7seg(inval);
    hexresult = hexresult | (hex7seg(inval >> 4) << 7);
    hexresult = hexresult | (hex7seg(inval >> 8) << 14);
    hexresult = hexresult | (hex7seg(inval >> 12) << 21);
    IOWR_ALTERA_AVALON_PIO_DATA(DE2_PIO_HEX_LOW28_BASE, hexresult);
}
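The prime search does not depend on the DE2 hardware, so it can be checked with an ordinary desktop compiler before running it on the board. The sketch below is a hedged host-side variant, not the poster's code: `next_prime` is a hypothetical name, `PRIME_TRUE`/`PRIME_FALSE` are assumed to be 1 and 0 because their definitions are not shown, and it tries only odd factors since every candidate is odd. It resets `found` for every candidate; without that reset, a candidate such as 5, for which the factor loop never runs, would be skipped.

```c
#include <assert.h>

/* Assumed values: the question uses PRIME_TRUE and PRIME_FALSE
 * without showing their definitions. */
#define PRIME_TRUE  1
#define PRIME_FALSE 0

/* Hypothetical host-side version of nextPrime: returns the smallest
 * prime strictly greater than inval, or 1 for inval <= 0 (mirroring
 * the special case in the original). */
int next_prime(int inval)
{
    int perhapsprime;
    int testfactor;
    int found;

    if (inval <= 0) return 1;
    if (inval == 1) return 2;
    if (inval == 2) return 3;

    /* Smallest odd number strictly larger than inval. */
    perhapsprime = (inval + 1) | 1;
    assert(perhapsprime > inval && (perhapsprime & 1) == 1);

    for (;; perhapsprime += 2) {
        found = PRIME_TRUE; /* reset for every candidate */

        /* Candidates are odd, so only odd trial factors are needed. */
        for (testfactor = 3; testfactor <= (perhapsprime >> 1); testfactor += 2) {
            if (perhapsprime % testfactor == 0) {
                found = PRIME_FALSE; /* testfactor divides it: not prime */
                break;
            }
        }

        if (found == PRIME_TRUE)
            return perhapsprime;
    }
}
```

For example, `next_prime(3)` returns 5 and `next_prime(8)` returns 11.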
Your code basically works fine for me. The only problem is with printing: for some reason, the output only shows up if you print a newline first, for example:

printf("start program\n");
while( 1 ) /* Loop forever. */
{
    next = nextPrime(next);
    printf("%d",next);
}
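If the missing output is a buffering effect (an assumption; the answer does not say why the newline helps), a more robust option is to end every number with a newline and flush explicitly. This also keeps the numbers from running together, which `printf("%d",next)` does. A minimal sketch using a hypothetical helper `report_value`; `fprintf` and `fflush` are standard C stdio calls, but whether the Nios II HAL's stdout is line- or fully buffered is not confirmed here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one value on its own line and push it out of
 * the stdio buffer right away. Returns 0 on success, -1 on a write or
 * flush error. */
int report_value(FILE *out, int value)
{
    assert(out != NULL);

    /* The trailing '\n' separates the numbers and flushes a
     * line-buffered stream. */
    if (fprintf(out, "%d\n", value) < 0)
        return -1;

    /* fflush also covers a fully buffered stream, where a newline
     * alone would not force the output out. */
    if (fflush(out) != 0)
        return -1;

    return 0;
}
```

In the main loop it would replace the `printf` call: `next = nextPrime(next); report_value(stdout, next);`.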