I want to convert a buffer of binary data in bytes into a buffer of sextets, where a sextet is a byte with the two most significant bits set to zero. I also want to do the reverse, i.e. convert a buffer of sextets back to bytes. As a test, I am generating a buffer of bytes using C's built-in pseudo-random number generator, producing numbers between 0 and 255 to simulate binary data. The details of the generator and its quality are of little importance, as long as a stream of bytes with various values is generated. Eventually a binary file will be read.
I've modified the functions in the link:
How do I base64 encode (decode) in C?
so that instead of encoding bytes to base64 characters and then decoding them back to bytes, sextets are used instead of base64. My encoding function is as follows:
int bytesToSextets(int inx, int iny, int numBytes, CBYTE* byteData, BYTE* sextetData) {
static int modTable[] = { 0, 2, 1 };
int numSextets = 4 * ((numBytes + 2) / 3);
int i, j;
for (i = inx, j = iny; i < numBytes;) {
BYTE byteA = i < numBytes ? byteData[i++] : 0;
BYTE byteB = i < numBytes ? byteData[i++] : 0;
BYTE byteC = i < numBytes ? byteData[i++] : 0;
UINT triple = (byteA << 0x10) + (byteB << 0x08) + byteC;
sextetData[j++] = (triple >> 18) & 0x3F;
sextetData[j++] = (triple >> 12) & 0x3F;
sextetData[j++] = (triple >> 6) & 0x3F;
sextetData[j++] = triple & 0x3F;
}
for (int i = 0; i < modTable[numBytes % 3]; i++) {
sextetData[numSextets - 1 - i] = 0;
}
return j - iny;
}
where inx is the index in the input byte buffer where I want to start encoding, iny is the index in the output sextet buffer where the first sextet is written, numBytes is the number of bytes to be encoded, and *byteData, *sextetData are the respective buffers to read from and write to. The last for-loop sets the padding elements of sextetData to zero, not to '=' as in the original code. Although zero bytes can be valid data, the lengths of the buffers are known in advance, so I presume this is not a problem. The function returns the number of sextets written, which can be checked against 4 * ((numBytes + 2) / 3). The first few sextets of the output buffer encode the number of bytes of data encoded in the rest of the buffer, with the number of sextets given by the formula above.
The code for decoding sextets back to bytes is as follows:
int sextetsToBytes(int inx, int iny, int numBytes, CBYTE* sextetData, BYTE* byteData) {
int numSextets = 4 * ((numBytes + 2) / 3);
int padding = 0;
if (sextetData[numSextets - 1 + inx] == 0) padding++;
if (sextetData[numSextets - 2 + inx] == 0) padding++;
int i, j;
for (i = inx, j = iny; i < numSextets + inx;) {
UINT sextetA = sextetData[i++];
UINT sextetB = sextetData[i++];
UINT sextetC = sextetData[i++];
UINT sextetD = sextetData[i++];
UINT triple = (sextetA << 18) + (sextetB << 12) + (sextetC << 6) + sextetD;
if (j < numBytes) byteData[j++] = (triple >> 16) & 0xFF;
if (j < numBytes) byteData[j++] = (triple >> 8) & 0xFF;
if (j < numBytes) byteData[j++] = triple & 0xFF;
}
return j - iny - padding;
}
where, as before, inx and iny are the indices at which to start reading from and writing to a buffer, and numBytes is the number of bytes that will be in the output buffer, from which the number of input sextets is calculated. The length of the input buffer is found from the first few sextets written by bytesToSextets(), so inx is the position in the input sextet buffer at which to start the actual conversion back to bytes. In the original function the number of sextets is given, from which the number of bytes is calculated using numSextets / 4 * 3. As the number of bytes is already known, this is not done, which should not make a difference. The last two arguments *sextetData and *byteData are the input and output buffers respectively.
An input buffer in bytes is created, converted to sextets, then as a test converted back to bytes. A comparison is made between the generated initial buffer of bytes and the output buffer in bytes after converting back from the intermediate sextet buffer. When the length of the input buffer is a multiple of 3, the match is perfect and the final output buffer is exactly the same. However, if the number of bytes in the initial buffer is not a multiple of 3, the last 3 bytes in the final output buffer may not match the original bytes. This obviously has something to do with the padding when the number of bytes is not a multiple of 3, but I am unable to find the source of the problem. Incidentally, the return values from the two functions are always correct, even when the last few bytes do not match.
In a header file I have the following typedefs:
typedef unsigned char BYTE;
typedef const unsigned char CBYTE;
typedef unsigned int UINT;
Although the main function is more complicated, in its simplest version it would have a form like:
// Allocate memory for bufA and bufB.
// Write the data length and other information into sextets 0 to 4 in bufB.
// Convert the bytes in bufA starting at index 0 to sextets in bufB starting at index 5.
int countSextets = bytesToSextets(0, 5, lenBufA, bufA, bufB);
// Allocate memory for bufC.
// Convert the sextets in bufB starting at index 5 back to bytes in bufC starting at index 0.
int countBytes = sextetsToBytes(5, 0, lenBufC, bufB, bufC);
As I said, this all works correctly, except that when lenBufA is not a multiple of 3, the last 3 recovered bytes in bufC do not match those in bufA, although the calculated buffer lengths are all correct.
Perhaps someone can kindly help throw some light on this.
sextetData[numSextets - 1 - i] = 0; should be sextetData[iny + numSextets - 1 - i] = 0;.
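To see why the offset matters, here is a minimal sketch of the encoder with the padding loop shifted by iny, together with a matching decoder. The names bytesToSextetsFixed and sextetsToBytesAt are made up for this illustration, and the sketch assumes numBytes counts bytes starting from inx. Without the iny offset, the padding loop zeroes sextets that lie iny positions before the real end of the data, which clobbers valid sextets near the end. With the offset, the padded positions are already zero, because the missing input bytes are read as 0.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char BYTE;
typedef unsigned int UINT;

/* Encode numBytes bytes starting at byteData[inx] into sextets starting at
   sextetData[iny]. Returns the number of sextets written. */
int bytesToSextetsFixed(int inx, int iny, int numBytes,
                        const BYTE *byteData, BYTE *sextetData) {
    static const int modTable[] = { 0, 2, 1 };
    int numSextets = 4 * ((numBytes + 2) / 3);
    int end = inx + numBytes;   /* assumption: numBytes counts from inx */
    int i = inx, j = iny;

    assert(byteData != NULL && sextetData != NULL);
    while (i < end) {
        /* Missing bytes in the last group are treated as 0. */
        UINT a = i < end ? byteData[i++] : 0;
        UINT b = i < end ? byteData[i++] : 0;
        UINT c = i < end ? byteData[i++] : 0;
        UINT triple = (a << 16) | (b << 8) | c;
        sextetData[j++] = (triple >> 18) & 0x3F;
        sextetData[j++] = (triple >> 12) & 0x3F;
        sextetData[j++] = (triple >> 6) & 0x3F;
        sextetData[j++] = triple & 0x3F;
    }
    /* The padding positions are relative to iny, not to index 0. */
    for (int k = 0; k < modTable[numBytes % 3]; k++) {
        sextetData[iny + numSextets - 1 - k] = 0;
    }
    return j - iny;
}

/* Decode sextets starting at sextetData[inx] into numBytes bytes starting
   at byteData[iny]. Returns the number of bytes written. */
int sextetsToBytesAt(int inx, int iny, int numBytes,
                     const BYTE *sextetData, BYTE *byteData) {
    int numSextets = 4 * ((numBytes + 2) / 3);
    int endOut = iny + numBytes;
    int i = inx, j = iny;

    assert(sextetData != NULL && byteData != NULL);
    while (i < inx + numSextets) {
        UINT triple = ((UINT)sextetData[i] << 18) | ((UINT)sextetData[i + 1] << 12)
                    | ((UINT)sextetData[i + 2] << 6) | (UINT)sextetData[i + 3];
        i += 4;
        if (j < endOut) byteData[j++] = (triple >> 16) & 0xFF;
        if (j < endOut) byteData[j++] = (triple >> 8) & 0xFF;
        if (j < endOut) byteData[j++] = triple & 0xFF;
    }
    return j - iny;
}
```

Calling bytesToSextetsFixed(0, 5, lenBufA, bufA, bufB) followed by sextetsToBytesAt(5, 0, lenBufA, bufB, bufC) mirrors the simplified main function from the question.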
The version of sextetsToBytes() I originally posted had the problem that I tested for padding by using:
if (sextetData[numSextets - 1 + inx] == 0) padding++;
if (sextetData[numSextets - 2 + inx] == 0) padding++;
as of course testing for '=' as in base64 cannot be used. However, testing for zero can still cause problems, as zero can be a valid data item. This indeed sometimes caused a difference between the specified number of output bytes and the number found by counting the bytes in the loop and subtracting the padding bytes. Simply removing the padding check from the function, and then comparing the returned count against the specified input value numBytes, works. The modified code is as follows:
int sextetsToBytes(int numBytes, CBYTE* sextetData, BYTE* byteData) {
int numSextets = 4 * ((numBytes + 2) / 3);
int i, j;
for (i = 0, j = 0; i < numSextets;) {
UINT sextetA = sextetData[i++];
UINT sextetB = sextetData[i++];
UINT sextetC = sextetData[i++];
UINT sextetD = sextetData[i++];
UINT triple = (sextetA << 18) + (sextetB << 12) + (sextetC << 6) + sextetD;
if (j < numBytes) byteData[j++] = (triple >> 16) & 0xFF;
if (j < numBytes) byteData[j++] = (triple >> 8) & 0xFF;
if (j < numBytes) byteData[j++] = triple & 0xFF;
}
return j;
}
I have a char[] that contains a value such as "0x1800785" but the function I want to give the value to requires an int, how can I convert this to an int? I have searched around but cannot find an answer. Thanks.
Have you tried strtol()?
strtol - convert string to a long integer
Example:
const char *hexstring = "abcdef0";
int number = (int)strtol(hexstring, NULL, 16);
In case the string representation of the number begins with a 0x prefix, one can use 0 as the base, which lets strtol detect the prefix:
const char *hexstring = "0xabcdef0";
int number = (int)strtol(hexstring, NULL, 0);
(It is also possible to specify an explicit base such as 16, since strtol then accepts an optional 0x prefix, but I wouldn't recommend introducing redundancy.)
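The one-liners above silently return 0 for input such as "xyz" and ignore trailing garbage. If that matters, a minimal sketch with validation could look like the following. The helper name parse_hex_int is hypothetical; it only uses strtol's end pointer and errno, both of which are standard.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a hex string (with or without a 0x prefix) into *out.
   Returns 0 on success and -1 on malformed or out-of-range input. */
int parse_hex_int(const char *s, int *out) {
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 16);          /* base 16 accepts an optional 0x/0X */
    if (end == s || *end != '\0') {
        return -1;                    /* no digits at all, or trailing junk */
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return -1;                    /* does not fit into an int */
    }
    *out = (int)v;
    return 0;
}
```

With this, parse_hex_int("0x1800785", &n) stores 0x1800785 in n, while an input such as "12g" is rejected instead of being read as 0x12.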
Or if you want to have your own implementation, I wrote this quick function as an example:
/**
* hex2int
* take a hex string and convert it to a 32bit number (max 8 hex digits)
*/
uint32_t hex2int(char *hex) {
uint32_t val = 0;
while (*hex) {
// get current character then increment
uint8_t byte = *hex++;
// transform hex character to the 4bit equivalent number, using the ascii table indexes
if (byte >= '0' && byte <= '9') byte = byte - '0';
else if (byte >= 'a' && byte <='f') byte = byte - 'a' + 10;
else if (byte >= 'A' && byte <='F') byte = byte - 'A' + 10;
// shift 4 to make space for new digit, and add the 4 bits of the new digit
val = (val << 4) | (byte & 0xF);
}
return val;
}
Something like this could be useful:
char str[] = "0x1800785";
unsigned int num; // %x expects a pointer to unsigned int
sscanf(str, "%x", &num);
printf("0x%x %u\n", num, num);
Read man sscanf
Assuming you mean it's a string, how about strtol?
Use strtol if you have libc available like the top answer suggests. However if you like custom stuff or are on a microcontroller without libc or so, you may want a slightly optimized version without complex branching.
#include <inttypes.h>
/**
* xtou64
* Take a hex string and convert it to a 64bit number (max 16 hex digits).
* The string must only contain digits and valid hex characters.
*/
uint64_t xtou64(const char *str)
{
uint64_t res = 0;
char c;
while ((c = *str++)) {
char v = (c & 0xF) + (c >> 6) | ((c >> 3) & 0x8);
res = (res << 4) | (uint64_t) v;
}
return res;
}
The bit shifting magic boils down to: just use the last 4 bits, but if it is a non-digit, then also add 9.
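To make the "add 9" step easier to follow, here is the same expression isolated for a single character, with explicit parentheses and the reasoning in comments. The function name hex_nibble is only for this illustration, and as with xtou64 the input is assumed to be a valid ASCII hex digit.

```c
#include <assert.h>

/* Branch-free value of one ASCII hex digit; the input must be valid. */
unsigned hex_nibble(char c) {
    unsigned u = (unsigned char)c;
    /* '0'..'9' are 0x30..0x39: bit 6 is clear and (u >> 3) & 0x8 is 0,
       so the result is just the low four bits.
       'A'..'F' (0x41..0x46) and 'a'..'f' (0x61..0x66) have bit 6 set:
       (u >> 6) contributes 1 and (u >> 3) & 0x8 contributes 8, i.e. 9 in
       total on top of the low four bits (1..6), giving 10..15. */
    return ((u & 0xF) + (u >> 6)) | ((u >> 3) & 0x8);
}
```

The OR with 8 behaves like an addition here because the sum on its left is at most 7 for letters, so bit 3 is still free.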
One quick & dirty solution:
// makes a number from two ascii hexa characters
int ahex2int(char a, char b){
a = (a <= '9') ? a - '0' : (a & 0x7) + 9;
b = (b <= '9') ? b - '0' : (b & 0x7) + 9;
return (a << 4) + b;
}
You have to be sure your input is correct, no validation included (one could say it is C). Good thing it is quite compact, it works with both 'A' to 'F' and 'a' to 'f'.
The approach relies on the position of the alphanumeric characters in the ASCII table; see e.g. the chart on Wikipedia (https://en.wikipedia.org/wiki/ASCII#/media/File:USASCII_code_chart.png). Long story short, the digits come before the letters, so the numeric characters (0 to 9) are easily converted by subtracting the code for zero. The alphabetic characters (A to F) are read by zeroing all but the last three bits (which makes it work with either upper- or lowercase), subtracting one (because after the bit masking, the alphabet starts at position one) and adding ten (because A to F represent the values 10 to 15 in hexadecimal). Finally, we need to combine the two digits that form the upper and lower nibble of the encoded number.
Here we go with same approach (with minor variations):
#include <stdio.h>
// takes a null-terminated string of hexa characters and tries to
// convert it to numbers
long ahex2num(unsigned char *in){
unsigned char *pin = in; // lets use pointer to loop through the string
long out = 0; // here we accumulate the result
while(*pin != 0){
out <<= 4; // we have one more input character, so
// we shift the accumulated interim-result one order up
out += (*pin < 'A') ? *pin & 0xF : (*pin & 0x7) + 9; // add the new nibble
pin++; // go ahead
}
return out;
}
// main function will test our conversion fn
int main(void) {
unsigned char str[] = "1800785"; // no 0x prefix, please
long num;
num = ahex2num(str); // call the function
printf("Input: %s\n",str); // print input string
printf("Output: %lx\n",num); // print the converted number back as hexa (%lx because num is a long)
printf("Check: %ld = %ld \n",num,0x1800785L); // check that the numeric values match
return 0;
}
Try the block of code below; it's working for me.
char p[] = "0x820";
unsigned int intVal; // %x stores an unsigned int, so a uint16_t target would be overrun
sscanf(p, "%x", &intVal);
printf("value x: %x - %u", intVal, intVal);
Output is:
value x: 820 - 2080
So, after a while of searching, and finding out that strtol is quite slow, I've coded my own function. It only works for uppercase letters, but adding lowercase support isn't a problem.
int hexToInt(PCHAR _hex, int offset = 0, int size = 6)
{
int _result = 0;
DWORD _resultPtr = reinterpret_cast<DWORD>(&_result);
for(int i=0;i<size;i+=2)
{
int _multiplierFirstValue = 0, _addonSecondValue = 0;
char _firstChar = _hex[offset + i];
if(_firstChar >= 0x30 && _firstChar <= 0x39)
_multiplierFirstValue = _firstChar - 0x30;
else if(_firstChar >= 0x41 && _firstChar <= 0x46)
_multiplierFirstValue = 10 + (_firstChar - 0x41);
char _secndChar = _hex[offset + i + 1];
if(_secndChar >= 0x30 && _secndChar <= 0x39)
_addonSecondValue = _secndChar - 0x30;
else if(_secndChar >= 0x41 && _secndChar <= 0x46)
_addonSecondValue = 10 + (_secndChar - 0x41);
*(BYTE *)(_resultPtr + (size / 2) - (i / 2) - 1) = (BYTE)(_multiplierFirstValue * 16 + _addonSecondValue);
}
return _result;
}
Usage:
char *someHex = "#CCFF00FF";
int hexDevalue = hexToInt(someHex, 1, 8);
1 because the hex we want to convert starts at offset 1, and 8 because it's the hex length.
Speedtest (1.000.000 calls):
strtol ~ 0.4400s
hexToInt ~ 0.1100s
This is a function that directly converts a char array containing hexadecimal digits to an integer, without needing any extra library:
int hexadecimal2int(char *hdec) {
int finalval = 0;
while (*hdec) {
int onebyte = *hdec++;
if (onebyte >= '0' && onebyte <= '9'){onebyte = onebyte - '0';}
else if (onebyte >= 'a' && onebyte <='f') {onebyte = onebyte - 'a' + 10;}
else if (onebyte >= 'A' && onebyte <='F') {onebyte = onebyte - 'A' + 10;}
finalval = (finalval << 4) | (onebyte & 0xF);
}
finalval = finalval - 524288; // note: this subtracts 0x80000 from the result; drop this line for a plain hex-to-int conversion
return finalval;
}
I have done a similar thing before and I think this might help you.
The following works for me:
#include <stdio.h>
int main(void){
int co[8] = {0};
char ch[9]; // 8 hex digits plus the terminating '\0'
printf("please enter the string:");
if (scanf("%8s", ch) != 1) return 1; // read at most 8 characters
for (int i=0; i<8 && ch[i] != '\0'; i++) {
if ((ch[i]>='A') && (ch[i]<='F')) {
co[i] = ch[i]-'A'+10;
} else if ((ch[i]>='0') && (ch[i]<='9')) {
co[i] = ch[i]-'0';
}
}
return 0;
}
Here, I have only taken a string of 8 characters.
If you want you can add similar logic for 'a' to 'f' to give their equivalent hex values. Though, I haven't done that because I didn't need it.
I made a library for hexadecimal/decimal conversion that doesn't use stdio.h. It is very simple to use:
unsigned hexdec (const char *hex, const int s_hex);
Before the first conversion, initialize the array used for conversion with:
void init_hexdec ();
Here is the link on GitHub: https://github.com/kevmuret/libhex/
I like @radhoo's solution; it is very efficient on small systems. One can modify the solution to convert the hex string to int32_t (hence, a signed value).
/**
* hex2int
* take a hex string and convert it to a 32bit number (max 8 hex digits)
*/
int32_t hex2int(char *hex) {
uint32_t val = *hex >= '8' ? 0xFFFFFFFF : 0; // sign-extend when the first digit is 8..F (top bit set)
while (*hex) {
// get current character then increment
uint8_t byte = *hex++;
// transform hex character to the 4bit equivalent number, using the ascii table indexes
if (byte >= '0' && byte <= '9') byte = byte - '0';
else if (byte >= 'a' && byte <='f') byte = byte - 'a' + 10;
else if (byte >= 'A' && byte <='F') byte = byte - 'A' + 10;
// shift 4 to make space for new digit, and add the 4 bits of the new digit
val = (val << 4) | (byte & 0xF);
}
return val;
}
Note the return value is int32_t while val is still uint32_t, so that the shifts do not overflow.
The sign-extension check on the first character (the initial value of val) is not protected against a malformed string.
Here is a solution building upon "sairam singh"s solution. Where that answer is a one to one solution, this one combines two ASCII nibbles into one byte.
// Assumes input is null terminated string.
//
// IN                        OUT
// --------------------      --------------------
// Offset  Hex   ASCII       Offset  Hex
// 0       0x31  1           0       0x13
// 1       0x33  3
// 2       0x61  a           1       0xA0
// 3       0x30  0
// 4       0x00  NUL         2       0x00 (NUL)
int convert_ascii_hex_to_hex2(char *szBufOut, char *szBufIn) {
int i = 0; // input buffer index
int j = 0; // output buffer index
char a_byte;
// Two hex digits are combined into one byte
while (0 != szBufIn[i]) {
// zero result
szBufOut[j] = 0;
// First hex digit
if ((szBufIn[i]>='A') && (szBufIn[i]<='F')) {
a_byte = (unsigned int) szBufIn[i]-'A'+10;
} else if ((szBufIn[i]>='a') && (szBufIn[i]<='f')) {
a_byte = (unsigned int) szBufIn[i]-'a'+10;
} else if ((szBufIn[i]>='0') && (szBufIn[i]<='9')) {
a_byte = (unsigned int) szBufIn[i]-'0';
} else {
return -1; // error with first digit
}
szBufOut[j] = a_byte << 4;
// second hex digit
i++;
if ((szBufIn[i]>='A') && (szBufIn[i]<='F')) {
a_byte = (unsigned int) szBufIn[i]-'A'+10;
} else if ((szBufIn[i]>='a') && (szBufIn[i]<='f')) {
a_byte = (unsigned int) szBufIn[i]-'a'+10;
} else if ((szBufIn[i]>='0') && (szBufIn[i]<='9')) {
a_byte = (unsigned int) szBufIn[i]-'0';
} else {
return -2; // error with second digit
}
szBufOut[j] |= a_byte;
i++;
j++;
}
szBufOut[j] = 0;
return 0; // normal exit
}
I know this is really old but I think the solutions looked too complicated. Try this in VB:
Public Function HexToInt(sHEX as String) as long
Dim iLen as Integer
Dim i as Integer
Dim SumValue as Long
Dim iVal as long
Dim AscVal as long
iLen = Len(sHEX)
For i = 1 to Len(sHEX)
AscVal = Asc(UCase(Mid$(sHEX, i, 1)))
If AscVal >= 48 And AscVal <= 57 Then
iVal = AscVal - 48
ElseIf AscVal >= 65 And AscVal <= 70 Then
iVal = AscVal - 55
End If
SumValue = SumValue + iVal * 16 ^ (iLen- i)
Next i
HexToInt = SumValue
End Function
I have a string say:
char *hexstring = "08fc0021";
This is a concatenation of two pieces of information, each two bytes (four hex digits) long.
The first four digits of this string, i.e. 08fc, correspond to 2300 in decimal.
The last four digits, i.e. 0021, correspond to 33.
My problem is to convert this string into two different variables, say:
int varA, varB;
here varA will have the number 2300, and varB = 33.
Normally I would have used sscanf to convert the string into a decimal number,
but now I have the problem of a concatenated string with two different pieces of information.
Any ideas or suggestions on how to nail this?
Thanks in advance.
Bitwise AND to the Rescue!
So, what you require can be done by applying the bitwise AND operator to the 32-bit number you get from sscanf.
You first get the number from the string:
const char *hexstring = "0x08fc0021";
unsigned int num = 0;
sscanf(hexstring, "%x", &num); // put the number into num
Then you get the bits you want using &:
int varA = 0, varB = 0;
varA = (num & 0xFFFF0000u) >> 16; // upper 16 bits: 0x08fc = 2300
varB = num & 0xFFFFu; // lower 16 bits: 0x0021 = 33
And there you have it.
#include <stdio.h>
int main(int argc, char *argv[]) {
char *hexstring = "08fc0021";
unsigned long hexnumber = 0u;
unsigned short a = 0u;
unsigned short b = 0u;
/* Use sscanf() to convert the string to integer (%lx matches unsigned long) */
sscanf(hexstring, "%lx", &hexnumber);
/* Use bitwise and to filter out the two higher bytes *
* and shift it 16 bits right */
a = ((hexnumber & 0xFFFF0000u) >> 16u);
/* Use bitwise AND to filter out the two lower bytes */
b = (hexnumber & 0x0000FFFFu);
printf("0x%X 0x%X\n",a,b);
return 0;
}
You can use this approach (bit operations):
char *hexstring = "08fc0021";
int aux;
sscanf(hexstring, "%x", &aux);
printf("aux = 0x%x = %d\n", aux, aux);
int varA = (aux & 0xFFFF0000) >> 16, varB = aux & 0x0000FFFF;
printf("varA = 0x%x = %d\n", varA, varA);
printf("varB = 0x%x = %d\n", varB, varB);
Result:
aux = 0x8fc0021 = 150732833
varA = 0x8fc = 2300
varB = 0x21 = 33
EDIT:
Or this approach (string manipulation):
// requires a hexstring length of 8 or more sophisticated logic
char *hexstring = "08fc0021";
int len = strlen(hexstring);
char varA[5], varB[5];
for(int i = 0; i<len; i++)
{
if(i < 4) varA[i] = hexstring[i];
else varB[i-4] = hexstring[i];
}
varA[4] = varB[4] = '\0';
int varAi, varBi;
sscanf(varA, "%x", &varAi);
sscanf(varB, "%x", &varBi);
printf("varAi = 0x%x = %d\n", varAi, varAi);
printf("varBi = 0x%x = %d\n", varBi, varBi);
Same result:
varAi = 0x8fc = 2300
varBi = 0x21 = 33
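A shorter variant of the string approach is to let sscanf do the splitting with field widths, since %4x reads at most four hex digits per conversion. The helper name split_hex16 below is hypothetical, and the sketch assumes the input is exactly two groups of four hex digits.

```c
#include <assert.h>
#include <stdio.h>

/* Split an 8-digit hex string into its upper and lower 16-bit halves.
   Returns 0 if both halves were converted, -1 otherwise. */
int split_hex16(const char *hexstring, unsigned int *hi, unsigned int *lo) {
    assert(hexstring != NULL && hi != NULL && lo != NULL);
    /* Each %4x consumes at most four hex digits. */
    return sscanf(hexstring, "%4x%4x", hi, lo) == 2 ? 0 : -1;
}
```

For "08fc0021" this fills the two outputs with 2300 and 33 without any temporary buffers or masking.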
I have a string for which I compute a sha1 digest like this:
SHA1(sn, snLength, sha1Bin);
If I'm correct this results in a 20 byte char (with binary data). I want to compare the last 3 bytes of this char with another char. This char contains the string "6451E6". 64, 51 & E6 are hex values. How do I convert "6451E6" so that I can compare it via:
if(memcmp(&sha1Bin[(20 - 3)], theVarWithHexValues, 3) == 0)
{
}
I have this function:
/*
* convert hexadecimal ssid string to binary
* return 0 on error or binary length of string
*
*/
u32 str2ssid(u8 ssid[],u8 *str) {
u8 *p,*q = ssid;
u32 len = strlen(str);
if( (len % 2) || (len > MAX_SSID_OCTETS) )
return(0);
for(p = str;(*p = toupper(*p)) && (strchr(hexTable,*p)) != 0;) {
if(--len % 2) {
*q = ((u8*)strchr(hexTable,*p++) - hexTable);
*q <<= 4;
} else {
*q++ |= ((u8*)strchr(hexTable,*p++) - hexTable);
}
}
return( (len) ? 0 : (p - str) / 2);
}
which does the same but I'm new to C and don't understand it :-(
It's easier to go the other way — convert the binary data to a hex string for comparison:
char suffix[7];
sprintf(suffix, "%02x%02x%02x", sha1Bin[17], sha1Bin[18], sha1Bin[19]);
return stricmp(suffix, theVarWithHexValues) == 0; // stricmp is non-standard (Windows); on POSIX use strcasecmp from <strings.h>
Even if you prefer converting to binary, sscanf(...%2x...) is better than manually parsing hex numbers.
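For the binary direction, a minimal sketch using sscanf with %2hhx (C99) might look like this. The function names hex6_to_bytes and sha1_suffix_matches are made up for the example, and the validation is deliberately light: only the length and the number of successful conversions are checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convert six hex digits such as "6451E6" into three bytes.
   Returns 0 on success, -1 on failure. */
int hex6_to_bytes(const char *hex, unsigned char out[3]) {
    assert(hex != NULL && out != NULL);
    if (strlen(hex) != 6) {
        return -1;
    }
    /* %2hhx reads two hex digits into an unsigned char. */
    if (sscanf(hex, "%2hhx%2hhx%2hhx", &out[0], &out[1], &out[2]) != 3) {
        return -1;
    }
    return 0;
}

/* Compare the last three bytes of a 20-byte SHA-1 digest with a hex suffix.
   Returns 1 on a match, 0 otherwise. */
int sha1_suffix_matches(const unsigned char sha1Bin[20], const char *hex) {
    unsigned char want[3];
    if (hex6_to_bytes(hex, want) != 0) {
        return 0;
    }
    return memcmp(&sha1Bin[20 - 3], want, 3) == 0;
}
```

The result of hex6_to_bytes can be passed straight to the memcmp call from the question.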
Fix for AShelly's code:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
int hashequal(const unsigned char *sha1Bin, const char *hexstr) {
unsigned long hexvar = strtoul(hexstr, NULL, 16);
unsigned char theVarWithHexValues[] = { hexvar >> 16, hexvar >> 8, hexvar };
return memcmp(sha1Bin + 17, theVarWithHexValues, 3) == 0;
}
int main() {
unsigned char sha1Bin[20];
sha1Bin[17] = 0x64;
sha1Bin[18] = 0x51;
sha1Bin[19] = 0xE6;
printf("%d\n", hashequal(sha1Bin, "6451E6"));
printf("%d\n", hashequal(sha1Bin, "6451E7"));
}
If theVarWithHexValues is indeed a constant of some sort, then the easiest thing would be to put it into binary form directly. Instead of:
const char *theVarWithHexValues = "6451E6";
use:
const char *theVarWithHexValues = "\x64\x51\xE6";
...then you can just memcmp() directly.
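// needs <stdint.h>, <stdlib.h>, <string.h> and <arpa/inet.h> (for htonl)
const char *hexstr = "6451E6";
uint32_t hexvar = (uint32_t)strtoul(hexstr, NULL, 16);
hexvar = htonl(hexvar << 8); // drop the leading zero byte, then convert to big-endian
memcmp(&sha1Bin[(20 - 3)], &hexvar, 3); // compare against the bytes of hexvar, not its value as an address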
How does this work?
I know to use it you pass in:
start: string (e.g. "Item 1, Item 2, Item 3")
delim: delimiter string (e.g. ",")
tok: reference to a string which will hold the token
nextpos (optional): reference to the position in the original string where the next token starts
sdelim (optional): pointer to a character which will hold the starting delimiter of the token
edelim (optional): pointer to a character which will hold the ending delimiter of the token
Code:
#include <stdlib.h>
#include <string.h>
int token(char* start, char* delim, char** tok, char** nextpos, char* sdelim, char* edelim) {
// Find beginning:
int len = 0;
char *scanner;
int dictionary[8];
int ptr;
for(ptr = 0; ptr < 8; ptr++) {
dictionary[ptr] = 0;
}
for(; *delim; delim++) {
dictionary[*delim / 32] |= 1 << *delim % 32;
}
if(sdelim) {
*sdelim = 0;
}
for(; *start; start++) {
if(!(dictionary[*start / 32] & 1 << *start % 32)) {
break;
}
if(sdelim) {
*sdelim = *start;
}
}
if(*start == 0) {
if(nextpos != NULL) {
*nextpos = start;
}
*tok = NULL;
return 0;
}
for(scanner = start; *scanner; scanner++) {
if(dictionary[*scanner / 32] & 1 << *scanner % 32) {
break;
}
len++;
}
if(edelim) {
*edelim = *scanner;
}
if(nextpos != NULL) {
*nextpos = scanner;
}
*tok = (char*)malloc(sizeof(char) * (len + 1));
if(*tok == NULL) {
return 0;
}
memcpy(*tok, start, len);
*(*tok + len) = 0;
return len + 1;
}
I get most of it except for:
dictionary[*delim / 32] |= 1 << *delim % 32;
and
dictionary[*start / 32] & 1 << *start % 32
Is it magic?
Since each character of the delimiter is 8 bits (sizeof(char) == 1 byte), it is limited to 256 possible values.
The dictionary is broken into 8 pieces (int dictionary[8]) with 32 possibilities per piece (assuming int is 32 bits wide, i.e. sizeof(int) == 4), and 32 * 8 = 256.
This forms a 256-bit table of flags. The code then turns on the flag for each character in the delimiter string (dictionary[*delim / 32] |= 1 << *delim % 32;). The index into the array is *delim / 32, i.e. the character code divided by 32. Since the code ranges from 0 to 255 (when treated as unsigned), this division yields a value from 0 to 7, with a remainder. The remainder, obtained with the modulus operation, selects which bit to turn on.
All this does is flag certain bits of the 256-bit table as true if the corresponding character exists in the delimiter string.
Determining whether a character is a delimiter is then simply a lookup in the 256-bit table (dictionary[*start / 32] & 1 << *start % 32).
They store which characters have occurred by making an 8 x 32 = 256 table of bits stored in dictionary.
dictionary[*delim / 32] |= 1 << *delim % 32;
sets the bit corresponding to *delim
dictionary[*start / 32] & 1 << *start % 32
checks the bit
OK, so if we send in the string "," for the delimiter then dictionary[*delim / 32] |= 1 << *delim % 32 will be dictionary[1] = 4096. The expression dictionary[*start / 32] & 1 << *start % 32 simply checks for a matching character.
What puzzles me is why they are not using direct char comparison.