Working with base for strtol argument in C

Working with base for strtol argument in C - c

I've been learning C by using some book and there's some code that I don't understand regarding working with base in C programming. So, here's the code:
/* By default, integers are decimal */
#define GN_ANY_BASE 0100 /* Can use any base - like strtol(3) */
#define GN_BASE_8 0200 /* Value is expressed in octal */
#define GN_BASE_16 0400 /* Value is expressed in hexadecimal */
// rlength: Read length bytes from the file, starting at the current file offset, and display them in text form.
int main(int argc, char *argv[]) {
fd = open(argv[1], O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH); /* rw-rw-rw- */
if (fd == -1)
errExit("open");
for (ap = 2; ap < argc; ap++) {
switch (argv[ap][0]) {
case 'r': /* Display bytes at current offset, as text */
case 'R': /* Display bytes at current offset, in hex */
len = getLong(&argv[ap][1], GN_ANY_BASE, argv[ap]);
buf = malloc(len);
...
}
And here's the getLong function:
long getLong(const char *arg, int flags, const char *name) {
return getNum("getLong", arg, flags, name);
}
And here's the getNum function:
static long getNum(const char *fname, const char *arg, int flags, const char *name) {
long res;
char *endptr;
int base;
if (arg == NULL || *arg == '\n')
gnFail(fname, "null or empty string", arg, name);
base = (flags & GN_ANY_BASE) ? 0 : (flags & GN_BASE_8) ? 8 : (flags & GN_BASE_16) ? 16 : 10;
res = strtol(arg, &endptr, base);
return res;
}
This line really confuses me, and what happened after that didn't help either:
base = (flags & GN_ANY_BASE) ? 0 : (flags & GN_BASE_8) ? 8 : (flags & GN_BASE_16) ? 16 : 10;
What did it do? and why do we need this? what is GN_BASE ?
I can run the programs from the terminal by using the command:
./main file.txt r50
And it will give the following output like this:
r50: this is a text inside file.txt
I tried removing as many codes as possible for this example and hope it helps clear the example.
I'm sorry for asking beforehand. Thank you very much.

The argument flag can carry other information so we mask what we are interested in with bitwise and-operator (&). Btw, often those mutually exclusive flags are written with a left shift instead of absolute values (1<<6, 1<<7, 1<<8).
We map the value provided to something that is more convenient for use internally (0100 to 0, 0200 to 8, and 0400 to 16).
Finally, we provide a default value of 10 if none of those 3 base bits were found.

Related

How to parse string to struct?

I want to know if there is any way to convert a string like "5ABBF13A000A01" to the next struct using struct and union method:
struct xSensorData2{
unsigned char flags;
unsigned int refresh_rate;
unsigned int timestamp;
};
Data should be:
flags = 0x01 (last byte);
refresh_rate = 0x000A (two bytes);
timestamp = 5ABBF13A (four bytes);
I'm thinking in next struct of data:
struct xData{
union{
struct{
unsigned char flags:8;
unsigned int refresh_rate:16;
unsigned int timestamp:32;
};
char pcBytes[8];
};
}__attribute__ ((packed));
But I got a struct of 12 bytes, I think it's because bit fields don't work for different types of data. I should just convert string to array of hex, copy to pcBytes and have access to each field.
Update:
In stm32 platform i used this code:
bool bDecode_HexString(char *p)
{
uint64_t data = 0; /* exact width type for conversion
*/
char *endptr = p; /* endptr for strtoul validation */
errno = 0; /* reset errno */
data = strtoull (p, &endptr, 16); /* convert input to uint64_t */
if (p == endptr && data == 0) { /* validate digits converted */
fprintf (stderr, "error: no digits converted.\n");
return false;
}
if (errno) { /* validate no error occurred during conversion */
fprintf (stderr, "error: conversion failure.\n");
return false;
}
//printf ("data: 0x%" PRIx64 "\n\n", data); /* output conerted string */
sensor.flags = data & 0xFF; /* set flags */
sensor.rate = (data >> CHAR_BIT) & 0xFFFF; /* set rate */
sensor.tstamp = (data >> (3 * CHAR_BIT)) & 0xFFFFFFFF; /* set timestamp */
return true;
/* output sensor struct */
// printf ("sensor.flags : 0x%02" PRIx8 "\nsensor.rate : 0x%04" PRIx16
// "\nsensor.tstamp: 0x%08" PRIx32 "\n", sensor.flags, sensor.rate,
// sensor.tstamp);
}
and call the function:
char teste[50] = "5ABBF13A000A01";
bDecode_HexString(teste);
I get data = 0x3a000a01005abbf1

If you are still struggling with the separation of your input into flags, rate & timestamp, then the suggestions in the comments regarding using an unsigned type to hold your input string converted to a value, and using the exact width types provided in <stdint.h>, will avoid potential problems inherent in manipulating signed types (e.g. potential sign-extension, etc.)
If you want to separate the values and coordinate those in struct, that is 100% fine. The work come in separating the individual flags, rate & timestamp. How you choose to store them so they are convenient within your code is up to you. A simple struct utilizing exact-width type could be:
typedef struct { /* struct holding sensor data */
uint8_t flags;
uint16_t rate;
uint32_t tstamp;
} sensor_t;
In a conversion from char * to uint64_t with strtoull, you have two primary validation checks:
utilize both the pointer to the string to convert and the endptr parameter to validate that digits were in fact converted (strtoull sets endptr to point 1-character after the last digit converted). You use this to compare endptr with the original pointer for the data converted to confirm that a conversion took place (if no digits were converted the original pointer and endptr will be the same and the value returned will be zero); and
you set errno = 0; before the conversion and then check again after the conversion to insure no error occurred during the conversion itself. If strtoull encounters an error, value exceeds range, etc.., errno is set to a positive value.
(and if you have specific range validations, e.g. you want to store the result in a size less than that of the conversion, like uint32_t instead of uint64_t, you need to validate the final value can be stored in the smaller type)
A simple approach would be:
uint64_t data = 0; /* exact width type for conversion */
...
char *p = argc > 1 ? argv[1] : "0x5ABBF13A000A01", /* input */
*endptr = p; /* endptr for strtoul validation */
errno = 0; /* reset errno */
...
data = strtoull (p, &endptr, 0); /* convert input to uint64_t */
if (p == endptr && data == 0) { /* validate digits converted */
fprintf (stderr, "error: no digits converted.\n");
return 1;
}
if (errno) { /* validate no error occurred during conversion */
fprintf (stderr, "error: conversion failure.\n");
return 1;
}
printf ("data: 0x%" PRIx64 "\n\n", data); /* output conerted string */
Finally, separating the value in data into the individual values of flags, rate & timestamp, can be done with simple shifts & ANDS, e.g.
sensor_t sensor = { .flags = 0 }; /* declare instance of sensor */
...
sensor.flags = data & 0xFF; /* set flags */
sensor.rate = (data >> CHAR_BIT) & 0xFFFF; /* set rate */
sensor.tstamp = (data >> (3 * CHAR_BIT)) & 0xFFFFFFFF; /* set timestamp */
/* output sensor struct */
printf ("sensor.flags : 0x%02" PRIx8 "\nsensor.rate : 0x%04" PRIx16
"\nsensor.tstamp: 0x%08" PRIx32 "\n", sensor.flags, sensor.rate,
sensor.tstamp);
Putting it altogether, you could do something similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <errno.h>
#include <limits.h>
typedef struct { /* struct holding sensor data */
uint8_t flags;
uint16_t rate;
uint32_t tstamp;
} sensor_t;
int main (int argc, char **argv) {
uint64_t data = 0; /* exact width type for conversion */
sensor_t sensor = { .flags = 0 }; /* declare instace of sensor */
char *p = argc > 1 ? argv[1] : "0x5ABBF13A000A01", /* input */
*endptr = p; /* endptr for strtoul validation */
errno = 0; /* reset errno */
data = strtoull (p, &endptr, 0); /* convert input to uint64_t */
if (p == endptr && data == 0) { /* validate digits converted */
fprintf (stderr, "error: no digits converted.\n");
return 1;
}
if (errno) { /* validate no error occurred during conversion */
fprintf (stderr, "error: conversion failure.\n");
return 1;
}
printf ("data: 0x%" PRIx64 "\n\n", data); /* output conerted string */
sensor.flags = data & 0xFF; /* set flags */
sensor.rate = (data >> CHAR_BIT) & 0xFFFF; /* set rate */
sensor.tstamp = (data >> (3 * CHAR_BIT)) & 0xFFFFFFFF; /* set timestamp */
/* output sensor struct */
printf ("sensor.flags : 0x%02" PRIx8 "\nsensor.rate : 0x%04" PRIx16
"\nsensor.tstamp: 0x%08" PRIx32 "\n", sensor.flags, sensor.rate,
sensor.tstamp);
return 0;
}
Example Use/Output
$ ./bin/struct_sensor_bitwise
data: 0x5abbf13a000a01
sensor.flags : 0x01
sensor.rate : 0x000a
sensor.tstamp: 0x5abbf13a
Look things over and let me know if you have further questions.

Here, you have a string of length 14, representing a 7-byte value, consisting of 14 hexadecimal values. Consider this code using strtol, which hexadecimally converts your string into a long int, and then decodes it digit-wise.
uint64_t n = strtoul(str, NULL, 16); // Convert to hexadecimal
int flags = n % 0x10;
int refresh_rate = (n / 0x100) % 0x100000;
int timestamp = n / 0x1000000;
Here is a test case (#import <stdlib.h>):
char str[16] = "5ABBF13EE00AFF";
uint64_t n = strtoul(str, NULL, 16); // Convert to hexadecimal
// Use divide and mod to extract digit segment
int flags = n % 0x100;
int refresh_rate = (n / 0x100) % 0x10000;
int timestamp = n / 0x1000000;
// Print timestamp, refresh rate, and flags
printf("%p %p %p", timestamp, refresh_rate, flags);
The expected result is 0x5abbf13e 0xe00a 0xff as expected.

C Convert Large Integer To Binary

I have IP addresses coming into a box in the form of, for example, 21211328.
I want to take this integer and convert it to its binary form.
The problem I'm getting right now is that my function spits out, based off of the example, -62674752.
This is obviously wrong, as it can't be negative and this isn't in binary.
The function I am using looks like this:
int toBinary(int decimalNo){
if (decimalNo == 0) return 0;
if (decimalNo == 1) return 1; /* optional */
return (decimalNo % 2) + 10 * toBinary(decimalNo / 2);
}
I am using it as follows:
int num_converted = toBinary(iph->saddr); // convert it to binary
printk(KERN_INFO "BSADDR: %d", num_converted); // print the binary conversion to kernel for debugging
if ((num_converted & mask_array) == masked_sub){
return NF_DROP;
}
And this is returning the incorrect output in my kernel logs as seen above.
iph->saddr returns the 21211328, and int num_converted = toBinary(iph->saddr); returns -62674752.

First, ip addresses are normally represented (for human consumption) as quartets of individual bytes (four numbers separated by dots in network byte order) and not as binary sequences.
The address you post (21211328) represents the address 1.67.168.192 (which I guess is a wrong representation ---you need to consider that all ip addresses are in network byte order, most signifiant byte first--- of address 192.168.67.1)
to successfully decode an ip address, you have first to consider if you machine will treat it in the correct endianness or you will have to switch bytes to be able to interpret it as a number. The correct integer (in decimal) for the address 192.168.67.1 would be 3,232,252,673. This number cannot be represented as a signed int, because it has the most signifiant bit on, so it would be represented as a negative number.
To decode an IP address, you first have to convert the number you get from the socket interface to host byte order (endianness) by use of the ntohl(3) function. Once you have the address in host byte order, then you have to convert the number to base 256. This is quite easy, and you can do it with the following snippet of code:
unsigned long ip_address = ntohl(server_address.sin_addr.s_addr);
int i;
for (i = 24; i >= 0; i -= 8) {
if (i < 24) printf("."); /* print the dot between the numbers */
printf("%d",
(ip_address >> i) & 0xff);
}
which I'll explain below:
The expression (ip_address >> i) & 0xff means to right shift the bits 24 to 31, by i places (or 24 places, 16 places, 8 places and 0 places) to the right. So first we will put the 8 most signifiant bits in the place 0 to 7 (24..31 => 0..7), next we will do that for the bits next to them (16..23 => 0..7), and so on until the least signifiant bits, which are not shifted at all (0..7 => 0..7). Once we have the bits we are interested in positions 0..7, we mask them with 0xff (which is a value with 1s in bit positions 0..7 and 0s elsewhere) so the and bit operator & will leave only the bits we have moved to the fixed positions 0..7 and will mask out all the others. So we got a number between 0 and 255 finally, which is what we are printing.
In the case you indeed want to represent them in binary form, you can do it by slightly modifying the above code, just considering that instead of having eight bit numbers as digits you have one bit numbers as digits:
unsigned long ip_address = ntohl(server_address.sin_addr.s_addr);
for (i = 31; i >= 0; i--) {
/* In this case, we don't print the dot between the numbers */
printf("%d",
(ip_address >> i) & 0x1);
}
The mask this time is 0x01 which only masks the least signifiant bit of the number.
Finally, I wrote a complete program to illustrate both ways of decoding, showing in addition the use of the ntohl(3) function:
ipaddr.c
/* 00001 */ #include <arpa/inet.h>
/* 00002 */ #include <stdio.h>
/* 00003 */ #ifndef DOTTED_DECIMAL
/* 00004 */ #define DOTTED_DECIMAL 1
/* 00005 */ #endif
/* 00006 */ #if DOTTED_DECIMAL
/* 00007 */ # define NBITS 8
/* 00008 */ # define SEP "."
/* 00009 */ #else /* BINARY */
/* 00010 */ # define NBITS 1
/* 00011 */ # define SEP ""
/* 00012 */ #endif
/* 00013 */ #define MASK ((1 << NBITS) - 1) /* 100..00 - 1 = 011..11, with NBITS `1` bits */
/* 00014 */ char *ip_formatted(long ip, char *sep, char *buff, size_t buffsz);
/* 00015 */ int main()
/* 00016 */ {
/* 00017 */ char line[1024];
/* 00018 */ unsigned long ip_netfmt = 21211328; /* this was the ip you posted (in network byte order) */
/* 00019 */ unsigned long ip_hostfmt = ntohl(ip_netfmt); /* this is the ip in host byte order */
/* 00020 */ printf("%lu => [%s]\n", ip_netfmt, ip_formatted(ip_netfmt, SEP, line, sizeof line));
/* 00021 */ printf("%lu => [%s]\n", ip_hostfmt, ip_formatted(ip_hostfmt, SEP, line, sizeof line));
/* 00022 */ }
/* 00023 */ char *ip_formatted(long ip, char *sep, char *buff, size_t buffsz)
/* 00024 */ {
/* 00025 */ size_t n;
/* 00026 */ char *s = buff;
/* 00027 */ int i;
/* 00028 */ for (i = 32 - NBITS; i >= 0; i -= NBITS) {
/* 00029 */ int digit = (ip >> i) & MASK;
/* 00030 */ n = snprintf(s, buffsz,
/* 00031 */ "%s%d",
/* 00032 */ i == 32 - NBITS ? "" : sep,
/* 00033 */ digit);
/* 00034 */ s += n; buffsz -= n;
/* 00035 */ }
/* 00036 */ return buff;
/* 00037 */ }
compile this code with the following commands:
$ cc -o ipaddr -DDOTTED_DECIMAL=1 ipaddr.c
to see output in dotted decimal, and
$ cc -o ipaddr -DDOTTED_DECIMAL=0 ipaddr.c
to see output in binary digits.
(I included line numbers, so references to the code can be used, and commented, so you can directly compile the code by cut and paste it)

Try this:
#include <stdio.h>
char *getBin( unsigned long val) {
static char buf[33];
int i;
buf[32]='\0';
for(i=31; i>=0; i--){
buf[i] = val & 1?'1':'0';
val/=2;
}
return buf;
}
int main()
{
unsigned long ip=0x80e2d301;
printf( " ip:%08lx, bin:%sB\n", (unsigned long)ip, getBin(ip) );
return 0;
}

Understanding C Requirements of bits and offset [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 5 years ago.
Improve this question
I understand the general idea of C and how making a log file would go. Reading/writing to a file and such.
My concern is the following format that is desired:
[![enter image description here][1]][1]
I've gotten a good chunk done now but am concerned with how to append to my log file after the first record. I increment the file's record count (in the top 2 bytes) and write the first record after it. How would I then setup to add the 2nd/3rd/etc records to showup after each other?
//confirm a file exists in the directory
bool fileExists(const char* file)
{
struct stat buf;
return (stat(file, &buf) == 0);
}
int rightBitShift(int val, int space)
{
return ((val >> space) & 0xFF);
}
int leftBitShift(int val, int space)
{
return (val << space);
}
int determineRecordCount(char * logName)
{
unsigned char record[2];
FILE *fp = fopen(logName, "rb");
fread(record, sizeof(record), 1, fp);
//display the record number
int recordNum = (record[0] << 8) | record[1];
recordNum = recordNum +1;
return (recordNum);
}
void createRecord(int argc, char **argv)
{
int recordNum;
int aux = 0;
int dst;
char* logName;
char message[30];
memset(message,' ',30);
//check argument count and validation
if (argc == 7 && strcmp("-a", argv[2]) ==0 && strcmp("-f", argv[3]) ==0 && strcmp("-t", argv[5]) ==0)
{
//aux flag on
aux = 1;
logName = argv[4];
strncpy(message, argv[6],strlen(argv[6]));
}
else if (argc == 6 && strcmp("-f", argv[2]) ==0 && strcmp("-t", argv[4]) ==0)
{
logName = argv[3];
strncpy(message, argv[5],strlen(argv[5]));
}
else
{
printf("Invalid Arguments\n");
exit(0);
}
//check if log exists to get latest recordNum
if (fileExists(logName))
{
recordNum = determineRecordCount(logName);
printf("%i\n",recordNum);
}
else
{
printf("Logfile %s not found\n", logName);
recordNum = 1;
}
//Begin creating record
unsigned char record[40]; /* One record takes up 40 bytes of space */
memset(record, 0, sizeof(record));
//recordCount---------------------------------------------------------------------
record[0] = rightBitShift (recordNum, 8); /* Upper byte of sequence number */
record[1] = rightBitShift (recordNum, 0); /* Lower byte of sequence number */
//get aux/dst flags---------------------------------------------------------------
//get date and time
time_t timeStamp = time(NULL);
struct tm *date = localtime( &timeStamp );
if (date->tm_isdst)
dst = 1;
record[2] |= aux << 7; //set 7th bit
record[2] |= dst << 6; //set 6th
//timeStamp-----------------------------------------------------------------------
record[3] |= rightBitShift(timeStamp, 24);//high byte
record[4] |= rightBitShift(timeStamp, 16);
record[5] |= rightBitShift(timeStamp, 8);
record[6] |= rightBitShift(timeStamp, 0); //low byte
//leave bytes 7-8, set to 0 -----------------------------------------
record[7] = 0;
record[8] = 0;
//store message--------------------------------------------
strncpy(&record[9], message, strlen(message));
//write record to log-----------------------------------------------------------------
FILE *fp = fopen(logName, "w+");
unsigned char recordCount[4];
recordCount[0] = rightBitShift (recordNum, 8); /* Upper byte of sequence number */
recordCount[1] = rightBitShift (recordNum, 0); /* Lower byte of sequence number */
recordCount[2] = 0;
recordCount[3] = 0;
fwrite(recordCount, sizeof(recordCount), 1, fp);
fwrite(record, sizeof(record), 1, fp);
fclose(fp);
printf("Record saved successfully\n");
}

NOTE: I've never had to do this before in C, take it with a grain of salt.
This is a very specific binary formatting where each bit is precisely accounted for. It's using the Least-Significant-Bit numbering scheme (LSB 0) where the bits are numbered from 7 to 0.
Specifying that the "upper byte" comes first means this format is big-endian. The most significant bits come first. This is like how we write our numbers, four thousand, three hundred, and twenty one is 4321. 1234 would be little-endian. For example, the Number Of Records and Sequence are both 16 bit big-endian numbers.
Finally, the checksum is a number calculated from the rest of the record to verify there were no mistakes in transmission. The spec defines how to make the checksum.
Your job is to precisely reproduce this format, probably using the fixed-sized types found in stdint.h or unsigned char. For example, the sequence would be a uint16_t or unsigned char[2].
The function to produce a record might have a signature like this:
unsigned char *make_record( const char *message, bool aux );
The user only has to supply you with the message and the aux flag. The rest you can be figured out by the function. You might decide to let them pass in the timestamp and sequence. Point is, the function needs to be passed just the data, it takes care of the formatting.
This byte-ordering means you can't just write out integers, they might be the wrong size or the wrong byte order. That means any multi-byte integers must be serialized before you can write them to the record. This answer covers ways to do that and I'll be using the ones from this answer because they proved a bit more convenient.
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
unsigned char *make_record( const char *message, bool aux ) {
// Allocate and zero memory for the buffer.
// Zeroing means no risk of accidentally sending garbage.
unsigned char *buffer = calloc( 40, sizeof(unsigned char) );
// As we add to the buffer, pos will track the next byte to be written.
unsigned char *pos = buffer;
// I decided not make the user responsible for
// the sequence number. YMMV.
static uint16_t sequence = 1;
pos = serialize_uint16( pos, sequence );
// Get the timestamp and DST.
time_t timestamp = time(NULL);
struct tm *date = localtime( &timestamp );
// 2nd row is all flags and a bunch of 0s. Start with them all off.
uint8_t flags = 0;
if( aux ) {
// Flip the 7th bit on.
flags |= 0x80;
}
if( date->tm_isdst ) {
// Flip the 6th bit on.
flags |= 0x40;
}
// That an 8 bit integer has no endianness, this is to ensure
// pos is consistently incremented.
pos = serialize_uint8(pos, flags);
// I don't know what their timestamp format is.
// This is just a guess. It's probably wrong.
pos = serialize_uint32(pos, (uint32_t)timestamp);
// "Spare" is all zeros.
// The spec says this is 3 bytes, but only gives it bytes
// 7 and 8. I'm going with 2 bytes.
pos = serialize_uint16(pos, 0);
// Copy the message in, 30 bytes.
// strncpy() does not guarantee the message will be null
// terminated. This is probably fine as the field is fixed width.
// More info about the format would be necessary to know for sure.
strncpy( pos, message, 30 );
pos += 30;
// Checksum the first 39 bytes.
// Sorry, I don't know how to do 1's compliment sums.
pos = serialize_uint8( pos, record_checksum( buffer, 39 ) );
// pos has moved around, but buffer remains at the start
return buffer;
}
int main() {
unsigned char *record = make_record("Basset hounds got long ears", true);
fwrite(record, sizeof(unsigned char), 40, stdout);
}
At this point my expertise is exhausted, I've never had to do this before. I'd appreciate folks fixing up the little mistakes in edits and suggesting better ways to do it in the comments, like what to do with the timestamp. And maybe someone else can cover how to do 1's compliment checksums in another answer.

As a byte is composed by 8 bits (from 0 to 7) you can use bitwise operations to modify them as asked in your specifications. Take a look for general information (https://en.wikipedia.org/wiki/Bitwise_operations_in_C). As a preview, you can use >> or << operators to determine which bit to modify, and use logical operators | and & to set it's values.

searching an lzw encoded file

I am looking to alter the LZW compressor to enable it to search for a word in an LZW encoded file and finds the number of matches for that search term.
For example if my file is used as
Prompt:>lzw "searchterm" encoded_file.lzw
32
Any suggestions on how to achive this?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BITS 12 /* Setting the number of bits to 12, 13*/
#define HASHING_SHIFT (BITS-8) /* or 14 affects several constants. */
#define MAX_VALUE (1 << BITS) - 1 /* Note that MS-DOS machines need to */
#define MAX_CODE MAX_VALUE - 1 /* compile their code in large model if*/
/* 14 bits are selected. */
#if BITS == 16
#define TABLE_SIZE 99991
#endif
#if BITS == 14
#define TABLE_SIZE 18041 /* The string table size needs to be a */
#endif /* prime number that is somewhat larger*/
#if BITS == 13 /* than 2**BITS. */
#define TABLE_SIZE 9029
#endif
#if BITS <= 12
#define TABLE_SIZE 5021
#endif
void *malloc();
int *code_value; /* This is the code value array */
unsigned int *prefix_code; /* This array holds the prefix codes */
unsigned char *append_character; /* This array holds the appended chars */
unsigned char decode_stack[4000]; /* This array holds the decoded string */
/*
* Forward declarations
*/
void compress(FILE *input,FILE *output);
void expand(FILE *input,FILE *output);
int find_match(int hash_prefix,unsigned int hash_character);
void output_code(FILE *output,unsigned int code);
unsigned int input_code(FILE *input);
unsigned char *decode_string(unsigned char *buffer,unsigned int code);
/********************************************************************
**
** This program gets a file name from the command line. It compresses the
** file, placing its output in a file named test.lzw. It then expands
** test.lzw into test.out. Test.out should then be an exact duplicate of
** the input file.
**
*************************************************************************/
main(int argc, char *argv[])
{
FILE *input_file;
FILE *output_file;
FILE *lzw_file;
char input_file_name[81];
char command;
command=(argv==3);
/*
** The three buffers are needed for the compression phase.
*/
code_value=(int*)malloc(TABLE_SIZE*sizeof(int));
prefix_code=(unsigned int *)malloc(TABLE_SIZE*sizeof(unsigned int));
append_character=(unsigned char *)malloc(TABLE_SIZE*sizeof(unsigned char));
if (code_value==NULL || prefix_code==NULL || append_character==NULL)
{
printf("Fatal error allocating table space!\n");
exit(-1);
}
/*
** Get the file name, open it up, and open up the lzw output file.
*/
if (argc>1)
strcpy(input_file_name,argv[1]);
else
{
printf("Input file name? ");
scanf("%s",input_file_name);
}
input_file=fopen(input_file_name,"rb");
lzw_file=fopen("test.lzw","wb");
if (input_file==NULL || lzw_file==NULL)
{
printf("Fatal error opening files.\n");
exit(-1);
};
/*
** Compress the file.
*/
if(command=='r')
{
compress(input_file,lzw_file);
}
fclose(input_file);
fclose(lzw_file);
free(c-ode_value);
/*
** Now open the files for the expansion.
*/
lzw_file=fopen("test.lzw","rb");
output_file=fopen("test.out","wb");
if (lzw_file==NULL || output_file==NULL)
{
printf("Fatal error opening files.\n");
exit(-2);
};
/*
** Expand the file.
*/
expand(lzw_file,output_file);
fclose(lzw_file);
fclose(output_file);
free(prefix_code);
free(append_character);
}
/*
** This is the compression routine. The code should be a fairly close
** match to the algorithm accompanying the article.
**
*/
void compress(FILE *input,FILE *output)
{
unsigned int next_code;
unsigned int character;
unsigned int string_code;
unsigned int index;
int i;
next_code=256; /* Next code is the next available string code*/
for (i=0;i<TABLE_SIZE;i++) /* Clear out the string table before starting */
code_value[i]=-1;
i=0;
printf("Compressing...\n");
string_code=getc(input); /* Get the first code */
/*
** This is the main loop where it all happens. This loop runs util all of
** the input has been exhausted. Note that it stops adding codes to the
** table after all of the possible codes have been defined.
*/
while ((character=getc(input)) != (unsigned)EOF)
{
if (++i==1000) /* Print a * every 1000 */
{ /* input characters. This */
i=0; /* is just a pacifier. */
printf("*");
}
index=find_match(string_code,character);/* See if the string is in */
if (code_value[index] != -1) /* the table. If it is, */
string_code=code_value[index]; /* get the code value. If */
else /* the string is not in the*/
{ /* table, try to add it. */
if (next_code <= MAX_CODE)
{
code_value[index]=next_code++;
prefix_code[index]=string_code;
append_character[index]=character;
}
output_code(output,string_code); /* When a string is found */
string_code=character; /* that is not in the table*/
} /* I output the last string*/
} /* after adding the new one*/
/*
** End of the main loop.
*/
output_code(output,string_code); /* Output the last code */
output_code(output,MAX_VALUE); /* Output the end of buffer code */
output_code(output,0); /* This code flushes the output buffer*/
printf("\n");
}
/*
** This is the hashing routine. It tries to find a match for the prefix+char
** string in the string table. If it finds it, the index is returned. If
** the string is not found, the first available index in the string table is
** returned instead.
*/
int find_match(int hash_prefix,unsigned int hash_character)
{
int index;
int offset;
index = (hash_character << HASHING_SHIFT) ^ hash_prefix;
if (index == 0)
offset = 1;
else
offset = TABLE_SIZE - index;
while (1)
{
if (code_value[index] == -1)
return(index);
if (prefix_code[index] == hash_prefix &&
append_character[index] == hash_character)
return(index);
index -= offset;
if (index < 0)
index += TABLE_SIZE;
}
}
/*
** This is the expansion routine. It takes an LZW format file, and expands
** it to an output file. The code here should be a fairly close match to
** the algorithm in the accompanying article.
*/
void expand(FILE *input,FILE *output)
{
unsigned int next_code;
unsigned int new_code;
unsigned int old_code;
int character;
int counter;
unsigned char *string;
next_code=256; /* This is the next available code to define */
counter=0; /* Counter is used as a pacifier. */
printf("Expanding...\n");
old_code=input_code(input); /* Read in the first code, initialize the */
character=old_code; /* character variable, and send the first */
putc(old_code,output); /* code to the output file */
/*
** This is the main expansion loop. It reads in characters from the LZW file
** until it sees the special code used to inidicate the end of the data.
*/
while ((new_code=input_code(input)) != (MAX_VALUE))
{
if (++counter==1000) /* This section of code prints out */
{ /* an asterisk every 1000 characters */
counter=0; /* It is just a pacifier. */
printf("*");
}
/*
** This code checks for the special STRING+CHARACTER+STRING+CHARACTER+STRING
** case which generates an undefined code. It handles it by decoding
** the last code, and adding a single character to the end of the decode string.
*/
if (new_code>=next_code)
{
*decode_stack=character;
string=decode_string(decode_stack+1,old_code);
}
/*
** Otherwise we do a straight decode of the new code.
*/
else
string=decode_string(decode_stack,new_code);
/*
** Now we output the decoded string in reverse order.
*/
character=*string;
while (string >= decode_stack)
putc(*string--,output);
/*
** Finally, if possible, add a new code to the string table.
*/
if (next_code <= MAX_CODE)
{
prefix_code[next_code]=old_code;
append_character[next_code]=character;
next_code++;
}
old_code=new_code;
}
printf("\n");
}
/*
** This routine simply decodes a string from the string table, storing
** it in a buffer. The buffer can then be output in reverse order by
** the expansion program.
*/
unsigned char *decode_string(unsigned char *buffer,unsigned int code)
{
int i;
i=0;
while (code > 255)
{
*buffer++ = append_character[code];
code=prefix_code[code];
if (i++>=MAX_CODE)
{
printf("Fatal error during code expansion.\n");
exit(-3);
}
}
*buffer=code;
return(buffer);
}
/*
** The following two routines are used to output variable length
** codes. They are written strictly for clarity, and are not
** particularyl efficient.
*/
unsigned int input_code(FILE *input)
{
unsigned int return_value;
static int input_bit_count=0;
static unsigned long input_bit_buffer=0L;
while (input_bit_count <= 24)
{
input_bit_buffer |=
(unsigned long) getc(input) << (24-input_bit_count);
input_bit_count += 8;
}
return_value=input_bit_buffer >> (32-BITS);
input_bit_buffer <<= BITS;
input_bit_count -= BITS;
return(return_value);
}
void output_code(FILE *output,unsigned int code)
{
static int output_bit_count=0;
static unsigned long output_bit_buffer=0L;
output_bit_buffer |= (unsigned long) code << (32-BITS-output_bit_count);
output_bit_count += BITS;
while (output_bit_count >= 8)
{
putc(output_bit_buffer >> 24,output);
output_bit_buffer <<= 8;
output_bit_count -= 8;
}
}

Here's a document on an algorithm to do regex searching directly in LZW compressed bytes :
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.1434&rep=rep1&type=pdf
It contains references to efficient algorithms to search for exact strings as well.

How to format flags in c?

Assume that there are flag definitions such as:
SHF_WRITE 0x1
SHF_ALLOC 0x2
SHF_EXECINSTR 0x4
SHF_MASKPROC 0xf0000000
Given a flag, I need to output SHF_WRITE|SHF_ALLOC if the bits 0x1 and 0x2 is on.
How to do the trick in C?

#define V(n) { n, #n }
struct Code {
int value;
char *name;
} decode[] = {
V(SHF_WRITE),
V(SHF_ALLOC),
{ 0, NULL },
};
void f(const int x) {
struct Code *c = decode;
int beenthere = 0;
for(; c->name; ++c)
if(x & c->value)
printf("%s%s", beenthere++ ? "|" : "", c->name);
if(beenthere)
printf("\n");
}

Just create a character buffer with enough space to hold all possible combinations of strings and add to it the appropriate strings for each applicable bit set. (or you could ditch the buffer and write straight to stdout, your choice) Here's a naive implementation of how you could do such a thing:
void print_flags(int flag)
{
#define BUFLEN (9+9+13+12+3+1)
/* for the text, pipes and null terminator*/
#define PAIRLEN 4
static struct { int value; const char *string; } pair[] =
{
{ SHF_WRITE, "SHF_WRITE" },
{ SHF_ALLOC, "SHF_ALLOC" },
{ SHF_EXECINSTR, "SHF_EXECINSTR" },
{ SHF_MASKPROC, "SHF_MASKPROC" },
};
char buf[BUFLEN]; /* declare the buffer */
char *write = buf; /* and a "write" pointer */
int i;
for (i = 0; i < PAIRLEN; i++)
{
if ((flag & pair[i].value) == pair[i].value) /* if flag is set... */
{
size_t written = write - buf;
write += _snprintf(write, BUFLEN-written, "%s%s",
written > 0 ? "|" : "",
pair[i].string); /* write to the buffer */
}
}
if (write != buf) /* if any of the flags were set... */
{
*write = '\0'; /* null terminate (just in case) */
printf("(%s)", buf); /* print out the buffer */
}
#undef PAIRLEN
#undef BUFLEN
}

PROBLEM:
"SHF_WRITE|SHF_ALLOC" says "bit 0x1 OR bit 0x2", not "bits 0x1 AND 02x".
Nevertheless, if you wanted to print "SOME MSG" if the bits 0x1 and 0x2 were both "on" in some value "flag", here's how:
if (flag & SHF_WRITE & SHF_ALLOC)
printf ("SOME MSG, flag= 0x%x\n", flag);
If you wanted to print the text represtation of ANY bits that were "on" in the value, you might do something like this:
char buf[80] = '\0';
if (flag & SHF_WRITE)
strcpy (buf, " SHF_WRITE");
if (flag & SHF_ALLOC)
strcpy (buf, " SHF_ALLOC");
...
printf ("SOME MSG, flag= %s\n", buf);
And finally, if you DON'T want to print if NO bit is set, just do this:
if (flag)
{
... do printing ...
}
else
{
... do nothing? ...
}