I'm trying to implement an all-purpose function for printing 2D data. What I've come up with is:
int mprintf(FILE* f, char* fmt, void** data, size_t cols, size_t rows)
The challenge is determining how many bits to read at once from data, based on fmt.
The format fmt will use the standard library's own format syntax, as accepted by printf() and friends.
Do you know of any existing facilities in the C library (glibc, with GCC) that I could use to make this easier?
I try avoiding having to do it all manually, because I know "I am stupid" (I don't want to introduce stupid bugs). Thus, reusing code would be the bug-freest way.
Thanks
Addendum
I see there's a /usr/include/printf.h. Can't I use any of those functions to do it right and ease my job at the same time?
Design proposed in question:
int mprintf(FILE *f, char *fmt, void **data, size_t cols, size_t rows);
High-level design points
If you want to print a 4x4 section of an 8x8 matrix, you need to know the row length of the matrix as well as the size to print. Or you may prefer to have that as a separate function.
Presumably, the format will define the separation between matrix entries, or will you force a space between them, or what? (If the user specifies "%d", will the numbers all be joined together?)
You're implicitly assuming that the matrix will be printed by itself, left-justified on the page. How would you adapt the interface to print the matrix elsewhere? Leading spaces on the line? Text before each line of the matrix? Text after each line of the matrix?
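The questions above about row length, separators, and per-line decoration could each be answered with an extra parameter. A minimal sketch for int data only; `print_int_block`, `stride`, `prefix`, and `sep` are hypothetical names chosen for illustration and are not part of the proposed interface:

```c
#include <assert.h>
#include <stdio.h>

/* Print a rows x cols block of an int matrix whose full row length is
 * `stride` elements (hypothetical parameter). Each output line starts with
 * `prefix`, and `sep` is written between neighbouring entries.
 * Returns 0 on success, -1 on an output error. */
static int print_int_block(FILE *fp, const int *data, size_t stride,
                           size_t rows, size_t cols,
                           const char *prefix, const char *sep)
{
    assert(fp && data && prefix && sep);
    for (size_t i = 0; i < rows; i++)
    {
        if (fputs(prefix, fp) == EOF)
            return -1;
        for (size_t j = 0; j < cols; j++)
        {
            if (j > 0 && fputs(sep, fp) == EOF)
                return -1;
            if (fprintf(fp, "%d", data[i * stride + j]) < 0)
                return -1;
        }
        if (fputc('\n', fp) == EOF)
            return -1;
    }
    return 0;
}
```

With this shape, printing the top-left 2x2 corner of an `int m[3][4]` with four leading spaces would be `print_int_block(stdout, &m[0][0], 4, 2, 2, "    ", " ")`.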
Low-level design points
The format string should be a const char *.
Clearly, your code can do what printf() does, more or less. It looks at the format conversion specifier, and then determines what type to collect. Your code will be slightly more complex, in some respects. You'll need to treat an array of unsigned char differently from an array of short, etc. C99 provides for modifier hh for signed char or unsigned char (before the format specifiers d, i, o, u, x, or X), and the modifier h for short or unsigned short. You should probably recognize these too. Similarly, the modifiers L for long double and l for long and ll for long long should be handled. Interestingly, printf() does not have to deal with float (because any single float value is automatically promoted to double), but your code will have to do that. By analogy with h and L, you should probably use H as the modifier to indicate a float array. Note that this case means you will need to pass to the printf() function a different format from the one specified by the user. You can make a copy of the user-provided format, dropping the 'H' (or use exactly the user-provided format except when it contains the 'H'; you will not modify the user's format string - not least because the revised interface says it is a constant string).
Ultimately, your code will have to determine the size of the elements in the array. It might be that you modify the interface to include that information - by analogy with functions such as bsearch() and qsort(), or fread() and fwrite(). Or you can determine it from the format specifier.
Note that although GCC allows pointer arithmetic on void *, Standard C does not.
Are you sure you want a void ** in the interface? I think it would be easier to understand if you pass the address of the starting element of the array - a single level of pointer.
short s[3][4];
float f[2][5];
char c[20][30];
mprintf(fp, "%3hd", &s[0][0], 4, 3);
mprintf(fp, "%8.4Hf", &f[0][0], 5, 2);
mprintf(fp, "%hhu", &c[0][0], 30, 20);
This changes the data parameter to a void *. Maybe I'm too decaffeinated, but I can't see how to make a double pointer work sanely.
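With a single-level pointer, locating an element only needs the element size, in the same way that qsort() and fread() take a size argument. A minimal sketch, assuming a contiguous row-major layout; `element_at` is a hypothetical helper and not part of the proposed mprintf():

```c
#include <assert.h>
#include <stddef.h>

/* Address of element (row, col) in a row-major matrix with `cols` elements
 * per row, each `size` bytes wide. The cast to const char * keeps the
 * arithmetic within Standard C, which does not define it on void *. */
static const void *element_at(const void *base, size_t cols,
                              size_t row, size_t col, size_t size)
{
    assert(base != NULL && col < cols);
    return (const char *)base + (row * cols + col) * size;
}
```

The caller still has to convert the result to the right pointer type before dereferencing it, for example `*(const short *)element_at(s, 4, 2, 1, sizeof(short))` for the `short s[3][4]` above.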
Outline
Determine size of elements and correct format string.
For each row
    For each column
        Find the data for the element
        Call an appropriate function to print it
        Print a separator if you need to
    Print a newline
Illustration
This code assumes a '0 is success' convention. It assumes you are dealing with numbers, not matrices of pointers or strings.
typedef int (*PrintItem)(FILE *fp, const char *format, void *element);
static int printChar(FILE *fp, const char *format, void *element)
{
char c = *(char *)element;
return (fprintf(fp, format, c) <= 0) ? -1 : 0;
}
...and a whole lot more like this...
static int printLongDouble(FILE *fp, const char *format, void *element)
{
long double ld = *(long double *)element;
return (fprintf(fp, format, ld) <= 0) ? -1 : 0;
}
int mprintf(FILE *fp, const char *fmt, void *data, size_t cols, size_t rows)
{
char *format = strdup(fmt);
int rc = 0;
size_t size;
PrintItem print;
if ((rc = print_info(format, &size, &print)) == 0)
{
for (size_t i = 0; i < rows; i++)
{
for (size_t j = 0; j < cols; j++)
{
void *element = (char *)data + (i * cols + j) * size;
if ((rc = print(fp, format, element)) < 0)
goto exit_loop;
}
fputc('\n', fp); // Possible error ignored
}
}
exit_loop:
free(format);
return rc;
}
static int print_info(char *fmt, size_t *size, PrintItem *print)
{
...analyze format string...
...set *size to the correct size...
...set *print to the correct printing function...
...modify format string if need so be...
...return 0 on success, -1 on failure...
}
Working code
Left as an exercise:
Pointers
Strings
size_t
intmax_t
ptrdiff_t
Note that I would not normally use the += or *= operators on the same line as other assignments; it was convenient for generating test numbers, though.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>
/* mprintf() - print a matrix of size cols x rows */
extern int mprintf(FILE *fp, const char *fmt, void *data, size_t cols, size_t rows);
typedef int (*PrintItem)(FILE *fp, const char *format, void *element);
static int printChar(FILE *fp, const char *format, void *element)
{
char value = *(char *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printShort(FILE *fp, const char *format, void *element)
{
short value = *(short *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printInt(FILE *fp, const char *format, void *element)
{
int value = *(int *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printLong(FILE *fp, const char *format, void *element)
{
long value = *(long *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printLongLong(FILE *fp, const char *format, void *element)
{
long long value = *(long long *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printFloat(FILE *fp, const char *format, void *element)
{
float value = *(float *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printDouble(FILE *fp, const char *format, void *element)
{
double value = *(double *)element;
return (fprintf(fp, format, value) <= 0) ? -1 : 0;
}
static int printLongDouble(FILE *fp, const char *format, void *element)
{
long double valued = *(long double *)element;
return (fprintf(fp, format, valued) <= 0) ? -1 : 0;
}
/* analyze format string - all arguments can be modified */
static int print_info(char *format, size_t *size, PrintItem *print)
{
char *fmt = format;
char c;
bool scanning_type = false;
int hcount = 0;
int lcount = 0;
int Hcount = 0;
int Lcount = 0;
char *Hptr = 0;
while ((c = *fmt++) != '\0')
{
switch (c)
{
case '%':
if (*fmt == '%')
fmt++;
else
scanning_type = true;
break;
/* Length modifiers */
case 'h':
if (scanning_type)
hcount++;
break;
case 'l':
if (scanning_type)
lcount++;
break;
case 'L':
if (scanning_type)
Lcount++;
break;
case 'H':
if (scanning_type)
{
Hptr = fmt - 1;
Hcount++;
}
break;
/* Integer format specifiers */
case 'd':
case 'i':
case 'o':
case 'u':
case 'x':
case 'X':
if (scanning_type)
{
/* No floating point modifiers */
if (Hcount > 0 || Lcount > 0)
return -1;
/* Can't be both longer and shorter than int at the same time */
if (hcount > 0 && lcount > 0)
return -1;
/* Valid modifiers are h, hh, l, ll */
if (hcount > 2 || lcount > 2)
return -1;
if (hcount == 2)
{
*size = sizeof(char);
*print = printChar;
}
else if (hcount == 1)
{
*size = sizeof(short);
*print = printShort;
}
else if (lcount == 2)
{
*size = sizeof(long long);
*print = printLongLong;
}
else if (lcount == 1)
{
*size = sizeof(long);
*print = printLong;
}
else
{
*size = sizeof(int);
*print = printInt;
}
return 0;
}
break;
/* Floating point format specifiers */
case 'e':
case 'E':
case 'f':
case 'F':
case 'g':
case 'G':
case 'a':
case 'A':
if (scanning_type)
{
/* No integer modifiers */
if (lcount > 0 || hcount > 0)
return -1;
/* Can't be both float and long double at once */
if (Lcount > 0 && Hcount > 0)
return -1;
/* Cannot repeat L or H modifiers */
if (Lcount > 1 || Hcount > 1)
return -1;
if (Lcount > 0)
{
*size = sizeof(long double);
*print = printLongDouble;
}
else if (Hcount > 0)
{
/* modify format string, dropping the H */
assert(Hptr != 0 && strlen(Hptr+1) > 0);
memmove(Hptr, Hptr+1, strlen(Hptr)); // Copy '\0' too!
*size = sizeof(float);
*print = printFloat;
}
else
{
*size = sizeof(double);
*print = printDouble;
}
return 0;
}
break;
default:
break;
}
}
return -1;
}
int mprintf(FILE *fp, const char *fmt, void *data, size_t cols, size_t rows)
{
char *format = strdup(fmt); // strdup() is not standard C99
int rc = 0;
size_t size;
PrintItem print;
if (format == NULL) // strdup() failed
return -1;
if ((rc = print_info(format, &size, &print)) == 0)
{
for (size_t i = 0; i < rows; i++)
{
for (size_t j = 0; j < cols; j++)
{
void *element = (char *)data + (i * cols + j) * size;
if ((rc = print(fp, format, element)) < 0)
{
fputc('\n', fp); // Or fputs("<<error>>\n", fp);
goto exit_loop;
}
}
fputc('\n', fp); // Possible error ignored
}
}
exit_loop:
free(format);
return rc;
}
#ifdef TEST
int main(void)
{
short s[3][4];
float f[2][5];
char c[8][9];
FILE *fp = stdout;
int v = 0;
for (size_t i = 0; i < 3; i++)
{
for (size_t j = 0; j < 4; j++)
{
s[i][j] = (v += 13) & 0x7FFF;
printf("s[%zu,%zu] = %hd\n", i, j, s[i][j]);
}
}
v = 0;
for (size_t i = 0; i < 8; i++)
{
for (size_t j = 0; j < 9; j++)
{
c[i][j] = (v += 13) & 0x7F;
printf("c[%zu,%zu] = %hhu\n", i, j, c[i][j]);
}
}
float x = 1.234;
for (size_t i = 0; i < 2; i++)
{
for (size_t j = 0; j < 5; j++)
{
f[i][j] = x *= 13.12;
printf("f[%zu,%zu] = %g\n", i, j, f[i][j]);
}
}
mprintf(fp, " %5hd", &s[0][0], 4, 3);
mprintf(fp, "%%(%3hhu) ", &c[0][0], 9, 8);
mprintf(fp, " %11.4He", &f[0][0], 5, 2);
mprintf(fp, " %11.4He", f, 5, 2);
return 0;
}
#endif /* TEST */
Assuming I understood your requirements, and assuming fmt specifies how to format only one element (it could be extended so that it is no longer something you can pass to (f)printf directly but a description of how to print the whole matrix, which I think is more useful), you simply have to do some easy string manipulation to find out which type the data is, and thus decide how to cast your void** pointer.
The number of cases is not infinite; it is limited to the data types you wish to support.
Once you have cast the pointer, a simple for loop based on cols and rows will do the trick.
There is no means to do this that is universally applicable, it all depends on HOW you want it displayed. However, the following is pretty general and you can adapt it if you need it changed slightly:
int mprintf(FILE* file, char* fmt, int** data, size_t cols, size_t rows) {
for(size_t r=0; r<rows; r++) {
for(size_t c=0; c<cols; c++) {
if(fprintf(file, fmt, data[r][c]) < 0) // fmt can be e.g. "%9i"
return -1;
}
fputc('\n', file); // newline goes to the same stream as the data
}
return 0;
}
You can't determine the argument type from the printf format string. OP's problem is that he wants to print like printf, but consume arguments by pointer like scanf. Unfortunately, the printf conversion specifiers are insufficient for the 'scanf' side of the task. Consider "%f", which printf uses for floats or doubles. The argument conversion rules for variadic functions mean that printf always sees doubles and doesn't care, but in this case scanf needs "%f" and "%lf" to distinguish them. Nor can the OP try to use scanf versions of the specifiers, as this will break printf. Neither is a subset of the other.
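The asymmetry can be seen in a small sketch. `format_float` and `format_double` are hypothetical helpers invented for illustration: both pass the same "%f" to snprintf(), yet each has to read a different number of bytes through the pointer, and nothing in "%f" says which.

```c
#include <assert.h>
#include <stdio.h>

/* A float argument is promoted to double when passed through "...", so the
 * printing side is identical. Only the pointer read differs. */
static int format_float(char *buf, size_t n, const void *p)
{
    assert(buf && p);
    float value = *(const float *)p;    /* reads sizeof(float) bytes */
    return snprintf(buf, n, "%f", value);
}

static int format_double(char *buf, size_t n, const void *p)
{
    assert(buf && p);
    double value = *(const double *)p;  /* reads sizeof(double) bytes */
    return snprintf(buf, n, "%f", value);
}
```

Calling `format_double` on the address of a float (or the reverse) would compile without complaint, which is exactly the mismatch the interface has to rule out.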
Your generic function needs to be told at least both the format conversion specifier and how big the elements are to do the right pointer arithmetic. (In fact, if you are going to delegate to fprintf and friends, the size is insufficient - you will need to know the type at the fprintf call site). I would probably pass in a function rather than two parameters to avoid the risk of mismatch between the two, thus:
#include <stdio.h>
typedef const void *(*Printfunc)(FILE *f, const void *datum);
/* print an integer and advance the pointer */
static const void* print_int(FILE *f, const void *datum)
{
const int* p = datum;
fprintf(f, "%d", *p);
return p + 1;
}
/* print a char and advance the pointer */
static const void* print_char(FILE *f, const void *datum)
{
const char* p = datum;
fprintf(f, "%c", *p);
return p + 1;
}
static void mprint(FILE *f, Printfunc p, const void *data, size_t cols, size_t rows)
{
const void *next = data;
int i;
for (i = 0; i < rows; ++i) {
int j;
for (j = 0; j < cols; ++j) {
next = p(f, next);
putc(' ', f);
}
putc('\n', f);
}
}
int main()
{
int imatrix[3][2] = { { 0, 1 }, { 2, 3 }, { 4, 5 } };
char cmatrix[2][2] = { { 'a', 'b' }, { 'c', 'd' } };
mprint(stdout, print_int, imatrix, 2, 3);
mprint(stdout, print_char, cmatrix, 2, 2);
return 0;
}
I think typecasting the void **data based upon *fmt would do the trick, although I am not sure I understand your question correctly. You can put a switch-case statement based upon *fmt to select the typecast, and then use **data as a 2-D array to print it, with the rows and columns as the indices of the 2-D array.
Related
So given a string of up to 7 letters, I need to find every permutation of that string (with and without all the letters), check whether any of those permutations can be found in my dictionary.txt file, and print the ones that match. So basically, if the user inputs "try", the permutations would be try, tr, tyr, ty, t, rty, etc., and I then check whether any of them match words in the txt file. I tried to do this using strncpy and strcmp, but the program doesn't always correctly deduce that two things are equal, it takes forever to run, and there's a bug where it counts having zero letters as a permutation of the original string.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 100 /* number of words in dictionary.txt */
#define MAX 7 /* max number of letters in given string */
/* function to swap values at two pointers */
void swap(char *x, char *y){
char temp;
temp = *x;
*x = *y;
*y = temp;
}
/* function to find permutations of the string */
void permute(char *letters, int l, int r){
if (l == r){
char *a[SIZE];
FILE *file = fopen("dictionary.txt", "r");
char target[MAX_2];
memset(target, '\0', sizeof(target));
for (int i = 0; i < SIZE; i++){
a[i] = malloc(100000);
fscanf(file, "%s", a[i]);
}
for (int i = 0; i < 10; i++){
for (int j = 0; j < r - 1; j++){
strcpy(target, a[i]);
if (strcmp(target, &letters[i]) == 0){
printf("%s\n", target);
printf("%s\n", letters);
printf("Match\n");
}
/*else if (strcmp(target, &letters[i]) != 0){
printf("%s\n", target);
printf("%s\n", letters);
printf("Not a match\n");
}
*/
}
}
for (int i = 0; i < SIZE; i++){
free (a[i]);
}
fclose(file);
}
else{
for (int i = l; i <= r; i++){
swap((tiles+l), (tiles+i));
permute(tiles, l+1, r);
swap((tiles+l), (tiles+i));
}
}
}
int main(){
/* initializing tile input */
char letters[MAX];
printf("Please enter your letters: ");
scanf("%s", letters);
/* finding size of input */
int size = strlen(letters);
/* finds all the permutation of the input */
/* parameters: string; start of the string; end of the string */
permute(letters, 0, size);
return 0;
}
Any help or suggestions to pinpoint what I'm doing wrong would be greatly appreciated.
As hinted in my comment, you can map all permutations of a string to a single code value, just by using the bits of a big enough unsigned integer as a bit set. Thus, the (same length) permutations of e.g. the word "try" all map to the same value.
As far as I understood your problem, you also want to match words that start with a substring of the wanted word. For that to work, you need to generate N such codes, where N is the number of letters in the word. That is, for a three-letter word: the code for the first letter, the code for the first 2 letters, and the code for all 3 letters.
Since reading from a file is probably not the problem, here is the code showcasing the "code based" string matching idea (which should be reasonably fast):
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#define MAX_WORD_LENGTH 7
typedef uint32_t WordCode;
typedef struct WordCodes_tag {
size_t count;
WordCode codes[MAX_WORD_LENGTH];
} WordCodes_t;
bool word_to_code(const char* word,
size_t start,
size_t end,
WordCode* code) {
if ((end - start) > MAX_WORD_LENGTH)
return false;
*code = 0;
for (size_t i = start; i < end; i++) {
char c = word[i];
if ((c >= 'a') && (c <= 'z')) {
char bit = c - 'a';
WordCode mask = 1 << bit;
(*code) |= mask;
} else {
return false;
}
}
return true;
}
bool word_to_codes(const char* word, WordCodes_t* codes) {
if (NULL == codes)
return false;
if (NULL == word)
return false;
codes->count = 0;
size_t nchars = strlen(word);
if (nchars > MAX_WORD_LENGTH)
return false;
for (size_t len = nchars; len >= 1; len--) {
WordCode word_code;
if (word_to_code(word, 0, len, &word_code)) {
codes->codes[codes->count] = word_code;
codes->count++;
} else {
return false;
}
}
return true;
}
void show_word_codes(const WordCodes_t* codes) {
if (NULL == codes) return;
printf("(");
for (size_t i = 0; i < codes->count; i++) {
if (i > 0)
printf(", %" PRIu32, codes->codes[i]);
else
printf("%" PRIu32, codes->codes[i]);
}
printf(")\n");
}
bool is_match(const WordCodes_t* a, const WordCodes_t* b) {
if ((NULL == a) || (NULL == b))
return false;
if ((0 == a->count) || (0 == b->count))
return false;
const WordCodes_t *temp = NULL;
if (a->count < b->count) {
temp = a;
a = b;
b = temp;
}
size_t a_offset = a->count - b->count;
for (size_t i = a_offset, j = 0; i < a->count; i++, j++) {
if (a->codes[i] == b->codes[j])
return true;
}
return false;
}
int main(int argc, const char* argv[]) {
const char* wanted = "try";
const char* dictionary[] = {
"house", "mouse", "cat", "tree", "try", "yrt", "t"
};
size_t dict_len = sizeof(dictionary) / sizeof(char*);
WordCodes_t wanted_codes;
if (word_to_codes(wanted, &wanted_codes)) {
printf("word codes of the wanted word '%s': ", wanted);
show_word_codes(&wanted_codes);
for (size_t i = 0; i < dict_len; i++) {
WordCodes_t found_codes;
if (word_to_codes(dictionary[i],&found_codes)) {
printf("word codes of dictionary word '%s' (%s): ",
dictionary[i],
is_match(&wanted_codes, &found_codes) ?
"match" : "no match");
show_word_codes(&found_codes);
} else {
printf("word_to_codes(%s) failed!", dictionary[i]);
}
}
} else {
puts("word_to_codes() failed!");
return -1;
}
}
As the function is_match() above shows, you only need to compare the codes for the respective substring lengths. Thus, even if you have 2 sets of up to 7 numbers, you need at most 7 comparisons.
The output looks like this (which seems to make sense):
./search
word codes of the wanted word 'try': (17432576, 655360, 524288)
word codes of dictionary word 'house' (no match): (1327248, 1327232, 1065088, 16512, 128)
word codes of dictionary word 'mouse' (no match): (1331216, 1331200, 1069056, 20480, 4096)
word codes of dictionary word 'cat' (no match): (524293, 5, 4)
word codes of dictionary word 'tree' (match): (655376, 655376, 655360, 524288)
word codes of dictionary word 'try' (match): (17432576, 655360, 524288)
word codes of dictionary word 'yrt' (match): (17432576, 16908288, 16777216)
word codes of dictionary word 't' (match): (524288)
If you want to match the words in a dictionary against all partial permutations of a search term, you don't have to create all permutations. (The number of permutations n! grows very quickly with the length of the search term, n.)
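To put a number on that growth, the count of non-empty ordered arrangements of n distinct letters can be computed directly. This is a small sketch with a hypothetical helper name, `count_arrangements`; it assumes all letters are distinct, so repeated letters would give a smaller count.

```c
#include <assert.h>

/* Number of non-empty ordered arrangements of n distinct letters:
 * P(n,1) + P(n,2) + ... + P(n,n), where P(n,k) = n!/(n-k)!.
 * Limited to n <= 20 so the sum fits in 64 bits. */
static unsigned long long count_arrangements(unsigned n)
{
    assert(n <= 20);
    unsigned long long total = 0, term = 1;
    for (unsigned k = 0; k < n; k++)
    {
        term *= (n - k);    /* term is now P(n, k + 1) */
        total += term;
    }
    return total;
}
```

For "try" this gives 15 candidates, and for a full rack of 7 distinct letters it gives 13699, each of which the original program would look up separately.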
Instead, it is easier to write a customized search function. You can make use of two strategies here:
A word w is a permutation of the search term s if both are equal once their letters are sorted. For example, "integral" and "triangle" are anagrams of each other, because both sort to "aegilnrt".
You can skip letters in the search term when searching to account for partial anagrams. Because the search term and the word will be sorted, you know which ones to skip: The ones that are lexically "smaller" than the next letter in the word.
So your matching function should sort the words first and then compare the words character by character in such a way that characters from the search term can be skipped.
Here's code that does that:
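#include <stdbool.h> /* bool */
#include <stdlib.h> /* qsort */
#include <string.h> /* strcpy, strlen */
int char_cmp(const void *pa, const void *pb)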
{
const char *a = pa;
const char *b = pb;
return *a - *b;
}
bool partial_anagram(const char *aa, const char *bb)
{
char A[64];
char B[64];
const char *a = strcpy(A, aa);
const char *b = strcpy(B, bb);
qsort(A, strlen(A), 1, char_cmp);
qsort(B, strlen(B), 1, char_cmp);
while (*b) {
while (*a && *a < *b) a++;
if (*a != *b) return false;
a++;
b++;
}
return true;
}
Things to note:
Sorting is done with the function qsort from <stdlib.h>, for which you need a comparator function, in this case char_cmp.
The sorted strings are copies, so that the original strings are not modified. (The code above is unsafe, because it doesn't enforce that the length of the strings is less than 64 characters. Unfortunately, the function strncpy, which can accept a maximum buffer size, is not safe, either, because it can leave the buffer unterminated. A safe way to copy the strings would be snprintf(A, sizeof(A), "%s", aa), but I've kept the strcpy for, er, "simplicity".)
The function partial_anagram takes unsorted strings and sorts them. That makes for a clean interface, but it is inefficient when you want to test against the same search term repeatedly as in your case. You could change the function, so that it expects already sorted strings. This will reduce the function to just the loop and will place the responsibility of sorting to the caller.
If you really have a lot of searches, there is yet more room for optimization. For example, you could insert the sorted dictionary into a trie. Given that your original code read the whole file for each permutation, I guess you're not worried that much about performance. :)
I've put a working example online. The code above works with pointers. If you are more at ease with indices, you can rewrite the function:
bool partial_anagram(const char *aa, const char *bb)
{
char a[64];
char b[64];
unsigned i = 0;
unsigned j = 0;
strcpy(a, aa);
strcpy(b, bb);
qsort(a, strlen(a), 1, char_cmp);
qsort(b, strlen(b), 1, char_cmp);
while (b[j]) {
while (a[i] && a[i] < b[j]) i++;
if (a[i] != b[j]) return false;
i++;
j++;
}
return true;
}
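Two of the notes above, the bounded copy and the presorted variant, can be combined. The following is a sketch under those assumptions rather than a drop-in replacement; `cmp_char`, `sorted_copy`, and `partial_anagram_sorted` are hypothetical names. The caller sorts the search term once and then reuses it for every dictionary word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_char(const void *pa, const void *pb)
{
    return *(const char *)pa - *(const char *)pb;
}

/* Copy src into dst (capacity dstsize) and sort its letters.
 * Returns false if src does not fit, instead of overflowing dst. */
static bool sorted_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst && src && dstsize > 0);
    int n = snprintf(dst, dstsize, "%s", src);
    if (n < 0 || (size_t)n >= dstsize)
        return false;
    qsort(dst, (size_t)n, 1, cmp_char);
    return true;
}

/* Same loop as partial_anagram(), but both strings must already be sorted. */
static bool partial_anagram_sorted(const char *a, const char *b)
{
    assert(a && b);
    while (*b)
    {
        while (*a && *a < *b) a++;
        if (*a != *b) return false;
        a++;
        b++;
    }
    return true;
}
```

Note that an empty word matches any search term here, so a caller who does not want the zero-letter case has to reject empty words explicitly.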
Problem
The code in the question uses an algorithm whose run time grows exponentially with the problem size. There are probably lots of ways to speed this up but, as suggested by @SparKot, a trie, or prefix tree, is a particularly good fit. Assuming the lengths of the strings in the dictionary are bounded, one can build a trie from a dictionary array of size n in O(n log n). Looking up anagrams in the worst case, where the letters never run out (ignoring the arbitrary limit of 7), is still O(n).
$ bin/trie AAABBBCCCDDDEEEFFFGGGHHHIIIJJJKKKLLLMMMNNNOOOPPPQQQRRRSSSTTTUUUVVVWWWXXXYYYZZZ < Tutte_le_parole_inglesi.txt
build_index warning: duplicate "OUTSOURCING".
build_index warning: duplicate "OUTSOURCINGS".
Loaded 216553 trie entries.
AA
AAH
AAHED
AAHING
...
ZYTHUMS
ZYZZYVA
ZYZZYVAS
211929 words found.
Proposal
A prefix tree is so effective because it lets you query prefixes as efficiently as (or even more efficiently than) a full lookup. With this, one can build a very effective branch-and-bound-style algorithm. That is, the longer the string, the fewer words it is a prefix of; if the string is not a prefix of any word in the dictionary, one can rule out all longer strings and simply not test them.
So my idea is: form a histogram of the Scrabble string of length k in O(k). Then, recursively, add more and more letters, matching, until no dictionary entries are prefix matches of the string. This will run in (I think) O(n log n + k), assuming a bound on the number of comparisons needed to distinguish words; i.e., one's dictionary is not { a, aa, aaa, aaaa, aaaaa, aaaaaa, ... }.
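The idea can be sketched without a trie at all. The version below is a simplified stand-in, not the PATRICIA-based implementation: it swaps the trie for a sorted array of words and a binary search for the prefix test, and it assumes lowercase a-z input. The names `lower_bound`, `search`, `find_words`, and `MAX_LEN` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 7   /* assumed rack size, as in the question */

/* Index of the first word in the sorted array that is >= key. */
static size_t lower_bound(const char *const *words, size_t n, const char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(words[mid], key) < 0) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Extend `prefix` (current length `len`) with every letter still available
 * in `counts`; abandon a branch as soon as no word starts with the prefix.
 * Returns the number of words found and prints each one if `out` != NULL. */
static size_t search(const char *const *words, size_t n, int counts[26],
                     char *prefix, size_t len, FILE *out)
{
    size_t found = 0;
    for (int c = 0; c < 26; c++)
    {
        if (counts[c] == 0 || len == MAX_LEN) continue;
        prefix[len] = (char)('a' + c);
        prefix[len + 1] = '\0';
        size_t at = lower_bound(words, n, prefix);
        if (at < n && strncmp(words[at], prefix, len + 1) == 0)
        {
            if (strcmp(words[at], prefix) == 0)   /* the prefix is a word */
            {
                found++;
                if (out) fprintf(out, "%s\n", prefix);
            }
            counts[c]--;
            found += search(words, n, counts, prefix, len + 1, out);
            counts[c]++;
        }
    }
    prefix[len] = '\0';
    return found;
}

/* Build the letter histogram of `rack`, then run the pruned search. */
static size_t find_words(const char *const *words, size_t n,
                         const char *rack, FILE *out)
{
    int counts[26] = { 0 };
    char prefix[MAX_LEN + 1] = "";
    for (const char *p = rack; *p; p++)
    {
        assert(*p >= 'a' && *p <= 'z');
        counts[*p - 'a']++;
    }
    return search(words, n, counts, prefix, 0, out);
}
```

Because the recursion walks the histogram rather than the rack string, a repeated letter is tried only once per position, so no word is reported twice. The `words` array must be sorted with strcmp() ordering before the first call.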
Implementation
I use a PATRICIA tree. It is especially attractive because a lot of data is implicit; one can use a simple array to represent the leaves of a complete binary tree. Specifically, the n leaves are just the list of words in lexicographical order, and we want to build an index of n - 1 branches. It requires a stop code; the null-termination in C is perfect. I don't have to create copies of everything and manage them. The code below first sets up a dynamic array, which is useful for input, then sets up a trie, then implements the algorithm.
#include <stdlib.h> /* EXIT malloc free qsort */
#include <stdio.h> /* printf */
#include <string.h> /* memmove memcpy */
#include <assert.h> /* assert */
#include <errno.h> /* errno */
#include <limits.h> /* UINT_MAX */
#include <ctype.h> /* isgraph */
/* Dynamic array. */
#define MIN_ARRAY(name, type) \
struct name##_array { type *data; size_t size, capacity; }; \
static int name##_array_reserve(struct name##_array *const a, \
const size_t min) { \
size_t c0; \
type *data; \
const size_t max_size = (size_t)-1 / sizeof *a->data; \
if(a->data) { \
if(min <= a->capacity) return 1; \
c0 = a->capacity < 7 ? 7 : a->capacity; \
} else { \
if(!min) return 1; \
c0 = 7; \
} \
if(min > max_size) return errno = ERANGE, 0; \
/* `c_n = a1.625^n`, approximation golden ratio `\phi ~ 1.618`. */ \
while(c0 < min) { \
size_t c1 = c0 + (c0 >> 1) + (c0 >> 3); \
if(c0 >= c1) { c0 = max_size; break; } /* Unlikely. */ \
c0 = c1; \
} \
if(!(data = realloc(a->data, sizeof *a->data * c0))) \
{ if(!errno) errno = ERANGE; return 0; } \
a->data = data, a->capacity = c0; \
return 1; \
} \
static type *name##_array_buffer(struct name##_array *const a, \
const size_t n) { \
if(a->size > (size_t)-1 - n) { errno = ERANGE; return 0; } \
return name##_array_reserve(a, a->size + n) \
&& a->data ? a->data + a->size : 0; \
} \
static type *name##_array_append(struct name##_array *const a, \
const size_t n) { \
type *b; \
if(!(b = name##_array_buffer(a, n))) return 0; \
return a->size += n, b; \
} \
static type *name##_array_new(struct name##_array *const a) \
{ return name##_array_append(a, 1); } \
static struct name##_array name##_array(void) \
{ struct name##_array a; a.data = 0, a.capacity = a.size = 0; return a; } \
static void name##_array_(struct name##_array *const a) \
{ if(a) free(a->data), *a = name##_array(); }
MIN_ARRAY(char, char)
/** Append a file, `fp`, to `c`, and add a '\0'.
@return Success. A partial read is failure. @throws[fopen, fread, malloc]
@throws[EILSEQ] The text file has embedded nulls.
@throws[ERANGE] If the standard library does not follow POSIX. */
static int append_file(struct char_array *c, FILE *const fp) {
const size_t granularity = 4096;
size_t nread;
char *cursor;
int success = 0;
assert(c && fp);
/* Read entire file in chunks. */
do if(!(cursor = char_array_buffer(c, granularity))
|| (nread = fread(cursor, 1, granularity, fp), ferror(fp))
|| !char_array_append(c, nread)) goto catch;
while(nread == granularity);
/* File to `C` string. */
if(!(cursor = char_array_new(c))) goto catch;
*cursor = '\0';
/* Binary files with embedded '\0' are not allowed. */
if(strchr(c->data, '\0') != cursor) { errno = EILSEQ; goto catch; }
{ success = 1; goto finally; }
catch:
if(!errno) errno = EILSEQ; /* Will never be true on POSIX. */
finally:
if(fp) fclose(fp);
return success;
}
/* Trie is base-2 compact radix tree, described in <Morrison, 1968 PATRICiA>.
Specifically, this is a full binary tree. */
struct branch { unsigned skip, left; };
static const size_t skip_max = UINT_MAX, left_max = UINT_MAX;
MIN_ARRAY(branch, struct branch)
MIN_ARRAY(leaf, char *)
struct trie { struct branch_array branches; struct leaf_array leaves; };
static struct trie trie(void) { struct trie t;
t.branches = branch_array(), t.leaves = leaf_array(); return t; }
static void trie_(struct trie *const t) { if(t) branch_array_(&t->branches),
leaf_array_(&t->leaves), *t = trie(); }
/** From string `a`, extract `bit`, either 0 or 1. */
static int is_bit(const char *const a, const size_t bit) {
const size_t byte = bit >> 3;
const unsigned char mask = 128 >> (bit & 7);
return !!(a[byte] & mask);
}
/** @return Whether `a` and `b` are equal up to the minimum of their lengths. */
static int is_prefix(const char *a, const char *b) {
for( ; ; a++, b++) {
if(*a == '\0') return 1;
if(*a != *b) return *b == '\0';
}
}
/** [low, high). */
struct range { size_t low, high; };
static int init_branches_r(struct trie *const t, size_t bit,
const struct range range) {
struct range r;
size_t skip = 0, left;
struct branch *branch;
assert(t && t->leaves.size);
assert(t->branches.capacity >= t->leaves.size - 1);
assert(range.low <= range.high && range.high <= t->leaves.size);
if(range.low + 1 >= range.high) return 1; /* Only one, leaf. */
/* Endpoints of sorted range: skip [_1_111...] or [...000_0_] don't care. */
while(is_bit(t->leaves.data[range.low], bit)
|| !is_bit(t->leaves.data[range.high - 1], bit)) {
if(skip == skip_max) return errno = ERANGE, 0;
bit++, skip++;
}
/* Binary search for the rightmost 0 (+1). */
r = range;
while(r.low < r.high) {
size_t m = r.low + (r.high - r.low) / 2;
if(is_bit(t->leaves.data[m], bit)) r.high = m; else r.low = m + 1;
}
if((left = r.low - range.low - 1) > left_max) return errno = ERANGE, 0;
/* Should have space for all branches pre-allocated. */
branch = branch_array_new(&t->branches), assert(branch);
branch->left = (unsigned)left;
branch->skip = (unsigned)skip;
bit++;
return (r.low = range.low, r.high = range.low + left + 1,
init_branches_r(t, bit, r)) && (r.low = r.high, r.high = range.high,
init_branches_r(t, bit, r)) /* && (printf("}\n"), 1) */;
}
/** Orders `a` and `b` by their pointed-to-strings. @implements qsort bsearch */
static int vstrcmp(const void *const a, const void *const b)
{ return strcmp(*(const char *const*)a, *(const char *const*)b); }
/** @param[a] A zero-terminated file containing words. Will be parsed and
modified.
@param[t] An idle tree that is initialized from `a`. Any modification of `a`
invalidates `t`.
@return Whether the tree initialization was entirely successful. */
static int build_trie(struct trie *const t, struct char_array *const a) {
struct range range;
size_t i;
char *cursor, *end, **leaf;
int is_run = 0;
/* Strict for processing ease; this could be made more permissive. */
assert(a && a->size && a->data[a->size - 1] == '\0'
&& t && !t->branches.size && !t->leaves.size);
for(cursor = a->data, end = a->data + a->size; cursor < end; cursor++) {
/* Fixme: 7-bit; mælström would be parsed as "m", "lstr", "m". */
if(!isgraph(*cursor)) {
*cursor = '\0', is_run = 0;
} else if(!is_run) {
if(!(leaf = leaf_array_new(&t->leaves))) return 0;
*leaf = cursor, is_run = 1;
}
}
if(!t->leaves.size) return errno = EILSEQ, 0; /* No parseable info. */
/* Sort and de-duplicate (inefficiently.) Want to treat it as an index. */
qsort(t->leaves.data, t->leaves.size, sizeof *t->leaves.data, &vstrcmp);
for(i = 1; i < t->leaves.size; i++) {
if(strcmp(t->leaves.data[i - 1], t->leaves.data[i]) < 0) continue;
fprintf(stderr, "build_index warning: duplicate \"%s\".\n",
t->leaves.data[i]);
memmove(t->leaves.data + i, t->leaves.data + i + 1,
sizeof *t->leaves.data * (t->leaves.size - i - 1));
t->leaves.size--, i--;
}
range.low = 0, range.high = t->leaves.size;
if(!branch_array_reserve(&t->branches, t->leaves.size - 1)
|| !init_branches_r(t, 0, range)) return 0;
assert(t->branches.size + 1 == t->leaves.size);
return 1;
}
/** @return In `t`, which must be non-empty, given a `prefix`, stores all leaf
prefix matches, only given the index, ignoring don't care bits.
@order \O(`prefix.length`) */
static struct range partial_prefix(const struct trie *const t,
const char *const prefix) {
size_t n0 = 0, n1 = t->branches.size, i = 0, left;
struct branch *branch;
size_t byte, key_byte = 0, bit = 0;
struct range range = { 0, 0 };
assert(t && prefix);
assert(n1 + 1 == t->leaves.size); /* Full binary tree. */
while(n0 < n1) {
branch = t->branches.data + n0;
bit += branch->skip;
/* '\0' is not included for partial match. */
for(byte = bit >> 3; key_byte <= byte; key_byte++)
if(prefix[key_byte] == '\0') goto finally;
left = branch->left;
if(!is_bit(prefix, bit++)) n1 = ++n0 + left;
else n0 += left + 1, i += left + 1;
}
assert(n0 == n1);
finally:
assert(n0 <= n1 && i - n0 + n1 < t->leaves.size);
range.low = i, range.high = i - n0 + n1 + 1;
return range;
}
/* #return Given a `prefix`, what is the range of matched strings in `t`. */
static struct range prefix(const struct trie *const t,
const char *const prefix) {
struct range range;
assert(t && prefix);
if(!t->leaves.size) goto catch;
range = partial_prefix(t, prefix);
if(range.low <= range.high)
if(!is_prefix(prefix, t->leaves.data[range.low])) goto catch;
goto finally;
catch:
range.low = range.high = 0;
finally:
return range;
}
/* Debug graph. */
/** Given a branch `b` in `tr` branches, calculate the right child branches.
#order \O(log `size`) */
static unsigned right_count(const struct trie *const tr,
const unsigned b) {
unsigned left, right, total = (unsigned)tr->branches.size, b0 = 0;
assert(tr && b < tr->branches.size);
for( ; ; ) {
right = total - (left = tr->branches.data[b0].left) - 1;
assert(left < total && right < total);
if(b0 >= b) break;
if(b <= b0 + left) total = left, b0++;
else total = right, b0 += left + 1;
}
assert(b0 == b);
return right;
}
/** #return Follows the branches to `b` in `tr` and returns the leaf. */
static unsigned left_leaf(const struct trie *const tr,
const unsigned b) {
unsigned left, right, total = (unsigned)tr->branches.size, i = 0, b0 = 0;
assert(tr && b < tr->branches.size);
for( ; ; ) {
right = total - (left = tr->branches.data[b0].left) - 1;
assert(left < tr->branches.size && right < tr->branches.size);
if(b0 >= b) break;
if(b <= b0 + left) total = left, b0++;
else total = right, b0 += left + 1, i += left + 1;
}
assert(b0 == b);
return i;
}
static void graph(const struct trie *const tr, const char *const fn) {
unsigned left, right, b, i;
FILE *fp = 0;
assert(tr && fn);
if(!(fp = fopen(fn, "w"))) { perror(fn); return; }
fprintf(fp, "digraph {\n"
"\tgraph [truecolor=true, bgcolor=transparent];\n"
"\tfontface=modern;\n"
"\tnode [shape=none];\n"
"\n");
if(!tr->branches.size) {
assert(!tr->leaves.size);
fprintf(fp, "\tidle;\n");
} else {
assert(tr->branches.size + 1 == tr->leaves.size);
fprintf(fp, "\t// branches\n");
for(b = 0; b < tr->branches.size; b++) { /* Branches. */
const struct branch *branch = tr->branches.data + b;
left = branch->left, right = right_count(tr, b);
fprintf(fp, "\ttree%pbranch%u [label = \"%u\", shape = circle, "
"style = filled, fillcolor = Grey95];\n"
"\ttree%pbranch%u -> ", (const void *)tr, b, branch->skip,
(const void *)tr, b);
if(left) fprintf(fp, "tree%pbranch%u [arrowhead = rnormal];\n",
(const void *)tr, b + 1);
else fprintf(fp,
"tree%pleaf%u [color = Gray85, arrowhead = rnormal];\n",
(const void *)tr, left_leaf(tr, b));
fprintf(fp, "\ttree%pbranch%u -> ", (const void *)tr, b);
if(right) fprintf(fp, "tree%pbranch%u [arrowhead = lnormal];\n",
(const void *)tr, b + left + 1);
else fprintf(fp,
"tree%pleaf%u [color = Gray85, arrowhead = lnormal];\n",
(const void *)tr, left_leaf(tr, b) + left + 1);
}
}
fprintf(fp, "\t// leaves\n");
for(i = 0; i < tr->leaves.size; i++) fprintf(fp,
"\ttree%pleaf%u [label = <%s<FONT COLOR=\"Gray85\">⊔</FONT>>];\n",
(const void *)tr, i, tr->leaves.data[i]);
fprintf(fp, "\n"
"\tnode [color = \"Red\"];\n"
"}\n");
fclose(fp);
}
/* Actual program. */
/* The input argument histogram. Used in <fn:find_r>. (Simple, but questionable
design choice.) */
static unsigned char hist[128];
static const size_t hist_max = UCHAR_MAX,
hist_size = sizeof hist / sizeof *hist;
static size_t words_found;
/** Branch-and-bound recursive function. */
static void find_r(const struct trie *const tr, char *const word) {
struct range r;
size_t len, i;
assert(word);
r = prefix(tr, word);
if(r.low >= r.high) return; /* Found nothing, we can bound this branch. */
if(!strcmp(word, tr->leaves.data[r.low])) { /* Found a match. */
printf("%s\n", word), words_found++;
if(++r.low == r.high) return;
}
len = strlen(word);
for(i = 0; i < hist_size; i++) {
unsigned char *freq;
if(!*(freq = hist + i)) continue;
(*freq)--;
word[len] = (char)i, word[len + 1] = '\0';
find_r(tr, word);
(*freq)++;
}
}
int main(int argc, char *argv[]) {
struct char_array dict = char_array();
struct trie tr = trie();
char *word;
size_t i;
int success = EXIT_FAILURE;
assert(CHAR_BIT == 8); /* The standard only guarantees CHAR_BIT >= 8; this code assumes exactly 8. */
if(argc != 2) { errno = EILSEQ;
fprintf(stderr, "Needs argument and dictionary input.\n"); goto catch; }
word = argv[1];
/* Load the dictionary from stdin and index it into a trie. */
if(!append_file(&dict, stdin) || !build_trie(&tr, &dict)) goto catch;
fprintf(stderr, "Loaded %lu trie entries.\n",(unsigned long)tr.leaves.size);
graph(&tr, "dictionary.gv");
/* Histogram the argument. */
for(i = 0; word[i] != '\0'; i++) {
unsigned char *freq;
if(word[i] & 0x80) continue; /* UTF-8 is not supported. :[ */
if(*(freq = hist + word[i]) == hist_max)
{ errno = ERANGE; goto catch; } /* "aaaaaaaaa..." x 5M? */
(*freq)++;
}
/* Might as well re-use the word now that we're finished with it; it's the
right length. */
*word = '\0', find_r(&tr, word);
fprintf(stderr, "%lu words found.\n", (unsigned long)words_found);
{ success = EXIT_SUCCESS; goto finally; }
catch:
perror("word");
finally:
trie_(&tr);
char_array_(&dict);
return success;
}
I am trying to solve a problem in which I need to implement a function that receives a void pointer as an argument. This function needs to search an array for a specific value and return a pointer to the value.
The *arr == *val comparison produces an "Invalid operands to binary expression ('void' and 'void')" error when I compile the following code. I have not been able to figure out why.
void* search(void *arr, int n, void* val, char c){
if(c == 'c') {
arr = (char*) arr;
val = (char*) val;
} else if(c == 'i'){
arr = (int*) arr;
val = (int*)val;
} else {
arr = (float*)arr;
val = (float*)val;
}
for (int i = 0; i < n; i++) {
if(*arr == *val){
return arr;
} else {
arr++;
}
return NULL;
}
}
You get the error because it doesn't make sense to de-reference void pointers.
Some compilers like gcc -std=gnu11 have dangerous compiler extensions enabled by default, which hide bugs in the program. The bugs being that you can't de-reference a void pointer nor can you do arithmetic (++) on one. You have to cast the void pointers to some data pointers first.
The statement arr = (char*) arr; says “Take the value of arr, convert that value to a char *, and assign that value to arr.” It does not say “Change the type of arr to be a char *.”
Below are several ways you can write code in C that works with multiple types.
One is to write code for each type you want to support:
void *search(void *arr, int n, void *val, char c)
{
switch (c)
{
case 'c':
{
char *a = arr, *v = val;
for (int i = 0; i < n; ++i, ++a)
if (*a == *v)
return a;
return NULL;
}
case 'i':
{
int *a = arr, *v = val;
for (int i = 0; i < n; ++i, ++a)
if (*a == *v)
return a;
return NULL;
}
default:
{
float *a = arr, *v = val;
for (int i = 0; i < n; ++i, ++a)
if (*a == *v)
return a;
return NULL;
}
}
}
Another is to treat the data as raw bytes:
void *search(void *arr, int n, void *val, char c)
{
// Determine number of bytes in the desired type.
size_t s;
switch (c)
{
case 'c': s = sizeof(char ); break;
case 'i': s = sizeof(int ); break;
default: s = sizeof(float); break;
}
// Set unsigned char pointers to the bytes.
unsigned char *a = arr, *v = val;
/* Compare the s bytes of the current element at a to the bytes of v.
In each iteration, advance the pointer a by s bytes.
*/
for (int i = 0; i < n; ++i, a += s)
if (memcmp(a, v, s) == 0)
return a;
return NULL;
}
One problem with the above method is that some types have multiple representations of equal values. That is, different values in the raw bytes may represent the same value. For example, floating-point formats commonly have a −0 representation and a +0 representation that compare equal when the == operator is used but that will be different when their bytes are compared with the memcmp operation above.
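A minimal sketch of that difference, assuming IEEE 754 floats (true on most current platforms, but not guaranteed by the C standard). The helper names equal_by_value and equal_by_bytes are made up for illustration:

```c
#include <assert.h>
#include <string.h>

/* Compares two floats the way the == operator does. */
static int equal_by_value(float x, float y)
{
    return x == y;
}

/* Compares the object representations, as the memcmp-based search does. */
static int equal_by_bytes(float x, float y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}
```

With IEEE 754 floats, equal_by_value(0.0f, -0.0f) is true while equal_by_bytes(0.0f, -0.0f) is false, because the two zeros differ in the sign bit. NaN values go the other way: two NaNs with identical bytes never compare equal with ==.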
A third method is to have the caller provide the size of the object and a routine to compare them:
void *search(void *arr, int n, void *val, size_t s, int compare(void *, void *))
{
// Set unsigned char pointers to the bytes.
unsigned char *a = arr, *v = val;
/* Compare each array element to the target element using the
caller's routine.
In each iteration, advance the pointer a by s bytes.
*/
for (int i = 0; i < n; ++i, a += s)
if (compare(a, v) == 0)
return a;
return NULL;
}
To use the above routine to compare characters, the caller would call it using search(arr, n, &character, sizeof (char), compareC);, where compareC could be:
int compareC(void *a, void *v)
{
char *x = a, *y = v;
if (*x == *y)
return 0;
if (*x < *y)
return -1;
else
return +1;
}
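The same pattern works for other element types. Below is a sketch for float, where the comparison routine uses the relational operators, so −0 and +0 are treated as equal. The names search_generic and compare_float are illustrative, and the search routine is repeated (under a different name) only so that the example compiles on its own:

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the third method: the caller supplies the element size
   and a comparison routine that returns 0 for "equal". */
static void *search_generic(void *arr, int n, void *val, size_t s,
                            int (*compare)(void *, void *))
{
    unsigned char *a = arr;
    for (int i = 0; i < n; ++i, a += s)
        if (compare(a, val) == 0)
            return a;
    return NULL;
}

/* Hypothetical comparison routine for float elements. */
static int compare_float(void *a, void *b)
{
    float x = *(float *)a, y = *(float *)b;
    return (x > y) - (x < y);   /* -1, 0 or +1 */
}
```

A call then looks like search_generic(data, n, &key, sizeof (float), compare_float), and the returned pointer (or NULL) can be converted back to float *.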
I'm pretty new to C and how would I check the duplicates of a 1D char array
for example
#define MAX_SIZE 60
Char canvas[MAX_SIZE] = {0};
for(int i=0; i<MAX_SIZE;i++){
//How do i check if there is a duplicate in that array?
}
How do I iterate through the array to check for duplicates? Do I have to use nested for loops and something like sizeof(canvas)/SOMETHING here?
My solution, using a function:
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
bool mem_hasduplicates(const char arr[], size_t len)
{
assert(arr != NULL);
if (len == 0)
return false;
for (size_t i = 0; i < len - 1; ++i) {
for (size_t j = i + 1; j < len; ++j) {
if (arr[i] == arr[j]) {
return true;
}
}
}
return false;
}
int main() {
const char canvas[] = "zcxabca";
printf("%x\n", mem_hasduplicates(canvas, sizeof(canvas)/sizeof(canvas[0])));
const char other_canvas[] = "abcfsd";
printf("%x\n", mem_hasduplicates(other_canvas, sizeof(other_canvas)/sizeof(other_canvas[0])));
}
Live version available at onlinegdb.
Edit: Or we can "just" create a histogram of all the values, as @selbie suggested, although this got complicated fast:
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
struct histogram_value_s {
char value;
unsigned int count;
};
struct histogram_s {
struct histogram_value_s *v;
size_t len;
};
#define HISTOGRAM_INIT() {0}
void histogram_fini(struct histogram_s *t)
{
t->len = 0;
free(t->v);
}
static int histogram_sort_by_value_qsort_cb(const void *a0, const void *b0)
{
const struct histogram_value_s *a = a0;
const struct histogram_value_s *b = b0;
assert(a != NULL);
assert(b != NULL);
return a->value - b->value;
}
void histogram_sort_by_value(struct histogram_s *t)
{
qsort(t->v, t->len, sizeof(*t->v), histogram_sort_by_value_qsort_cb);
}
static int histogram_sort_by_count_qsort_cb(const void *a0, const void *b0)
{
const struct histogram_value_s *a = a0;
const struct histogram_value_s *b = b0;
assert(a != NULL);
assert(b != NULL);
return (a->count > b->count) - (a->count < b->count); /* subtracting unsigned counts could wrap around */
}
void histogram_sort_by_count(struct histogram_s *t)
{
qsort(t->v, t->len, sizeof(*t->v), histogram_sort_by_count_qsort_cb);
}
int histogram_getValue_2(const struct histogram_s *t, char value, size_t *idx, unsigned int *ret0)
{
for (size_t i = 0; i < t->len; ++i) {
if (t->v[i].value == value) {
if (ret0) {
*ret0 = t->v[i].count;
}
if (idx) {
*idx = i;
}
return 0;
}
}
return -1;
}
void histogram_printlns_generic(const struct histogram_s *t, const char fmt[])
{
assert(t != NULL);
for (size_t i = 0; i < t->len; ++i) {
printf(fmt, t->v[i].value, t->v[i].count);
}
}
int histogram_add(struct histogram_s *t, char value)
{
size_t idx;
if (histogram_getValue_2(t, value, &idx, NULL) == 0) {
if (t->v[idx].count == UINT_MAX) {
goto ERR;
}
++t->v[idx].count;
} else {
void *tmp;
tmp = realloc(t->v, (t->len + 1) * sizeof(*t->v));
if (tmp == NULL) goto ERR;
t->v = tmp;
t->v[t->len] = (struct histogram_value_s){
.value = value,
.count = 1,
};
++t->len;
}
return 0;
ERR:
return -1;
}
bool histogram_has_any_count_greater_then_2(const struct histogram_s *t)
{
assert(t != NULL);
for (size_t i = 0; i < t->len; ++i) {
if (t->v[i].count >= 2) {
return true;
}
}
return false;
}
/* ----------------------------------------------------------- */
int histogram_create_from_mem(struct histogram_s *ret0, const char arr[], size_t len)
{
assert(ret0 != NULL);
assert(arr != NULL);
struct histogram_s ret = HISTOGRAM_INIT();
for (size_t i = 0; i < len; ++i) {
const char to_add = arr[i];
if (histogram_add(&ret, to_add) < 0) {
goto ERR;
}
}
*ret0 = ret;
return 0;
ERR:
histogram_fini(&ret);
return -1;
}
int main() {
const char canvas[] = "abc";
struct histogram_s h;
int ret;
ret = histogram_create_from_mem(&h, canvas, sizeof(canvas)/sizeof(canvas[0]));
if (ret) {
fprintf(stderr, "mem_createhistogram error!\n");
return -1;
}
printf("'%s' %s duplicates\n",
canvas,
histogram_has_any_count_greater_then_2(&h)
? "has"
: "does not have"
);
histogram_fini(&h);
}
Live version here.
Edit: Or we can sort the array, and check if any two adjacent bytes are the same!
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
int cmp_chars(const void *a, const void *b)
{
return *(char*)a - *(char*)b;
}
int main() {
char canvas[] = "abca";
qsort(canvas, sizeof(canvas) - 1, sizeof(canvas[0]), cmp_chars);
bool duplicate_found = false;
for (char *p = canvas; p[1] != '\0'; ++p) {
if (p[0] == p[1]) {
duplicate_found = true;
break;
}
}
printf("'%s' %s duplicates\n",
canvas,
duplicate_found ? "has" : "does not have");
}
Live version available at onlinegdb.
If Char is just a typo for char, then this becomes relatively simple - set up a second array, indexed by character code, that keeps track of the number of occurrences of each character:
#include <limits.h>
#include <ctype.h>
...
int charCount[SCHAR_MAX+1] = {0}; // We're only going to worry about non-negative
// character codes (i.e., standard ASCII)
// [0..127]
...
/**
* This assumes that canvas is *not* a 0-terminated string, and that
* every element of the array is meaningful. If that's not the case,
* then loop on the length of the string instead of MAX_SIZE.
*/
for ( int i = 0; i < MAX_SIZE; i++ )
{
if ( canvas[i] >= 0 && canvas[i] <= SCHAR_MAX )
{
charCount[canvas[i]]++; // index into charCount by the value of canvas[i]
}
}
Then you can walk through the charCount array and print all the character values that occurred more than once:
for ( int i = 0; i <= SCHAR_MAX; i++ )
{
if ( charCount[i] > 1 )
{
/**
* If the character value is a printable character (punctuation, alpha,
* digit), print the character surrounded by single quotes - otherwise,
* print the character code as a decimal integer.
*/
printf( isprint( i ) ? "'%c': %d\n" : "%d: %d\n", i, charCount[i] );
}
}
What's that SCHAR_MAX all about, and why am I yammering about non-negative character codes in the comments?
In C, characters in the basic execution character set (digits, upper and lowercase letters, common punctuation characters) are guaranteed to have non-negative encodings (i.e., the [0..127] range of standard ASCII). Characters outside of that basic execution character set may have positive or negative values, depending on the implementation. Thus, the range of char values may be [-128..127] on some platforms and [0..255] on others.
The limits.h header defines constants for various type ranges - for characters, it defines the following constants:
UCHAR_MAX - maximum unsigned character value (255 on most platforms)
SCHAR_MIN - minimum signed character value (-128 on most platforms)
SCHAR_MAX - maximum signed character value (127 on most platforms)
CHAR_MIN - minimum character value, either 0 or SCHAR_MIN depending on platform
CHAR_MAX - maximum character value, either UCHAR_MAX or SCHAR_MAX depending on platform
To keep this code simple, I'm only worrying about character codes in the range [0..127]; otherwise, I'd have to map negative character codes onto non-negative array indices, and I didn't feel like doing that.
Both this method and the nested loop solution require some tradeoffs. The nested loop solution trades time for space, while this solution trades space for time. In this case, the additional space is fixed regardless of how large canvas becomes. In the nested loop case, time will increase with the square of the length of canvas. For short inputs, there's effectively no difference, but if canvas gets large enough, you will notice a significant decrease in performance with the nested loop solution.
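If negative character codes do need to be covered, one option is to convert each element to unsigned char before using it as an index, so every possible value lands in [0..UCHAR_MAX]. The following is only a sketch of that idea; the function name has_duplicate_char is made up, and it reports whether any duplicate exists rather than counting occurrences:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any value occurs more than once in arr[0..len-1]. */
static bool has_duplicate_char(const char *arr, size_t len)
{
    bool seen[UCHAR_MAX + 1] = { false };   /* one flag per possible byte value */
    for (size_t i = 0; i < len; i++) {
        /* The conversion maps negative char values onto [128..UCHAR_MAX]. */
        unsigned char idx = (unsigned char)arr[i];
        if (seen[idx])
            return true;
        seen[idx] = true;
    }
    return false;
}
```

The table costs UCHAR_MAX + 1 flags regardless of the input length, and the loop stops at the first repeated value.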
I'm working on an embedded DSP where speed is crucial, and memory is very short.
At the moment, sprintf uses the most resources of any function in my code. I only use it to format some simple text: %d, %e, %f, %s, nothing with precision or exotic manipulations.
How can I implement a basic sprintf or printf function that would be more suitable for my usage?
This one assumes the existence of an itoa to convert an int to character representation, and an fputs to write out a string to wherever you want it to go.
The floating point output is non-conforming in at least one respect: it makes no attempt at rounding correctly, as the standard requires, so if you have (for example) a value of 1.234 that is internally stored as 1.2339999..., it'll be printed out as 1.2339 instead of 1.2340. This saves quite a bit of work, and remains sufficient for most typical purposes.
This also supports %c and %x in addition to the conversions you asked about, but they're pretty trivial to remove if you want to get rid of them (and doing so will obviously save a little memory).
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* itoa() is declared here on toolchains that provide it; it is not standard C */
static void ftoa_fixed(char *buffer, double value);
static void ftoa_sci(char *buffer, double value);
int my_vfprintf(FILE *file, char const *fmt, va_list arg) {
int int_temp;
char char_temp;
char *string_temp;
double double_temp;
char ch;
int length = 0;
char buffer[512];
while ( ch = *fmt++) {
if ( '%' == ch ) {
switch (ch = *fmt++) {
/* %% - print out a single % */
case '%':
fputc('%', file);
length++;
break;
/* %c: print out a character */
case 'c':
char_temp = va_arg(arg, int);
fputc(char_temp, file);
length++;
break;
/* %s: print out a string */
case 's':
string_temp = va_arg(arg, char *);
fputs(string_temp, file);
length += strlen(string_temp);
break;
/* %d: print out an int */
case 'd':
int_temp = va_arg(arg, int);
itoa(int_temp, buffer, 10);
fputs(buffer, file);
length += strlen(buffer);
break;
/* %x: print out an int in hex */
case 'x':
int_temp = va_arg(arg, int);
itoa(int_temp, buffer, 16);
fputs(buffer, file);
length += strlen(buffer);
break;
case 'f':
double_temp = va_arg(arg, double);
ftoa_fixed(buffer, double_temp);
fputs(buffer, file);
length += strlen(buffer);
break;
case 'e':
double_temp = va_arg(arg, double);
ftoa_sci(buffer, double_temp);
fputs(buffer, file);
length += strlen(buffer);
break;
}
}
else {
putc(ch, file);
length++;
}
}
return length;
}
int normalize(double *val) {
int exponent = 0;
double value = *val;
while (value >= 1.0) {
value /= 10.0;
++exponent;
}
while (value < 0.1) {
value *= 10.0;
--exponent;
}
*val = value;
return exponent;
}
static void ftoa_fixed(char *buffer, double value) {
/* Carry out a fixed conversion of a double value to a string, with `width` (4) digits after the decimal point.
* Digits are truncated, not rounded, so magnitudes below 0.0001 print as 0.0000.
* Note: this blindly assumes that the buffer will be large enough to hold the largest possible result.
* The largest value we expect is an IEEE 754 double precision real, with maximum magnitude of approximately
* e+308, i.e. about 309 integer digits; the 512-byte buffer in my_vfprintf() is sized with that in mind.
*/
int exponent = 0;
int places = 0;
static const int width = 4;
if (value == 0.0) {
buffer[0] = '0';
buffer[1] = '\0';
return;
}
if (value < 0.0) {
*buffer++ = '-';
value = -value;
}
exponent = normalize(&value);
while (exponent > 0) {
int digit = value * 10;
*buffer++ = digit + '0';
value = value * 10 - digit;
++places;
--exponent;
}
if (places == 0)
*buffer++ = '0';
*buffer++ = '.';
places = 0; /* from here on, count fractional digits only */
while (exponent < 0 && places < width) {
*buffer++ = '0';
--exponent;
++places;
}
while (places < width) {
int digit = value * 10.0;
*buffer++ = digit + '0';
value = value * 10.0 - digit;
++places;
}
*buffer = '\0';
}
void ftoa_sci(char *buffer, double value) {
int exponent = 0;
int places = 0;
static const int width = 4;
if (value == 0.0) {
buffer[0] = '0';
buffer[1] = '\0';
return;
}
if (value < 0.0) {
*buffer++ = '-';
value = -value;
}
exponent = normalize(&value);
int digit = value * 10.0;
*buffer++ = digit + '0';
value = value * 10.0 - digit;
--exponent;
*buffer++ = '.';
for (int i = 0; i < width; i++) {
int digit = value * 10.0;
*buffer++ = digit + '0';
value = value * 10.0 - digit;
}
*buffer++ = 'e';
itoa(exponent, buffer, 10);
}
int my_printf(char const *fmt, ...) {
va_list arg;
int length;
va_start(arg, fmt);
length = my_vfprintf(stdout, fmt, arg);
va_end(arg);
return length;
}
int my_fprintf(FILE *file, char const *fmt, ...) {
va_list arg;
int length;
va_start(arg, fmt);
length = my_vfprintf(file, fmt, arg);
va_end(arg);
return length;
}
#ifdef TEST
int main() {
float floats[] = { 0.0, 1.234e-10, 1.234e+10, -1.234e-10, -1.234e-10 };
my_printf("%s, %d, %x\n", "Some string", 1, 0x1234);
for (int i = 0; i < sizeof(floats) / sizeof(floats[0]); i++)
my_printf("%f, %e\n", floats[i], floats[i]);
return 0;
}
#endif
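The answer above notes that its conversions truncate instead of rounding. If the values are known to be small, one inexpensive way to get rounded output is to scale to an integer first and then print the integer digits. This is only a sketch under stated assumptions: the name ftoa_fixed4_rounded is made up, the precision is fixed at 4 decimals, the magnitude must stay below 400000 so the scaled value fits in an unsigned long, and halves are rounded away from zero rather than to even as a conforming printf() would do.

```c
#include <assert.h>
#include <string.h>

/* Writes value with exactly 4 decimals, rounded to nearest.
   The buffer must hold at least 13 bytes ("-399999.9999" plus '\0'). */
static void ftoa_fixed4_rounded(char *buffer, double value)
{
    char digits[16];
    int n = 0;
    unsigned long scaled, whole, frac, div;

    if (value < 0.0) {
        *buffer++ = '-';
        value = -value;
    }
    assert(value < 400000.0);               /* keeps scaled below 2^32 */
    scaled = (unsigned long)(value * 10000.0 + 0.5);
    whole = scaled / 10000UL;
    frac = scaled % 10000UL;

    /* Integer part: collect digits in reverse, then emit them in order. */
    do {
        digits[n++] = (char)('0' + whole % 10UL);
        whole /= 10UL;
    } while (whole != 0);
    while (n > 0)
        *buffer++ = digits[--n];

    /* Fractional part: always four digits, including leading zeros. */
    *buffer++ = '.';
    for (div = 1000UL; div > 0; div /= 10UL)
        *buffer++ = (char)('0' + (frac / div) % 10UL);
    *buffer = '\0';
}
```

Because the work after the initial multiplication is done in integers, a value such as 1.234 comes out as 1.2340 even though its stored binary value is slightly below 1.234.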
I wrote nanoprintf in an attempt to find a balance between tiny binary size and having good feature coverage. As of today the "bare-bones" configuration is < 800 bytes of binary code, and the "maximal" configuration including float parsing is < 2500 bytes. 100% C99 code, no external dependencies, one header file.
https://github.com/charlesnicholson/nanoprintf
I haven't seen a smaller vsnprintf implementation than this that has a comparable feature set. I also released the software in the public domain and with the Zero-clause BSD license so it's fully unencumbered.
Here's an example that uses the vsnprintf functionality:
your_project_nanoprintf.c
#define NANOPRINTF_USE_FIELD_WIDTH_FORMAT_SPECIFIERS 1
#define NANOPRINTF_USE_PRECISION_FORMAT_SPECIFIERS 1
#define NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS 1
#define NANOPRINTF_USE_FLOAT_FORMAT_SPECIFIERS 1
#define NANOPRINTF_USE_WRITEBACK_FORMAT_SPECIFIERS 0
// Compile nanoprintf in this translation unit.
#define NANOPRINTF_IMPLEMENTATION
#include "nanoprintf.h"
your_log.h
void your_log_write(char const *s);
void your_log(char const *fmt, ...);
your_log.c
#include "your_log.h"
#include "nanoprintf.h"
#include <stdarg.h>
void your_log_write(char const *s) {
    // Do whatever you want with the fully formatted string s.
    (void)s;
}
void your_log(char const *fmt, ...) {
    char buf[128];
    va_list args;
    va_start(args, fmt);
    npf_vsnprintf(buf, sizeof(buf), fmt, args); // Use nanoprintf for formatting.
    va_end(args);
    your_log_write(buf);
}
Nanoprintf also provides an snprintf-alike and a custom version that takes a user-provided putc callback for things like UART writes.
I add here my own implementation of (v)sprintf, but it does not provide float support (it is why I am here...).
However, it implements the specifiers c, s, d, u, x and the non standard ones b and m (binary and memory hexdump); and also the flags 0, 1-9, *, +.
#include <stdarg.h>
#include <stdint.h>
#define min(a,b) __extension__\
({ __typeof__ (a) _a = (a); \
__typeof__ (b) _b = (b); \
_a < _b ? _a : _b; })
enum flag_itoa {
FILL_ZERO = 1,
PUT_PLUS = 2,
PUT_MINUS = 4,
BASE_2 = 8,
BASE_10 = 16,
};
static char * sitoa(char * buf, unsigned int num, int width, enum flag_itoa flags)
{
unsigned int base;
if (flags & BASE_2)
base = 2;
else if (flags & BASE_10)
base = 10;
else
base = 16;
char tmp[32];
char *p = tmp;
do {
int rem = num % base;
*p++ = (rem <= 9) ? (rem + '0') : (rem + 'a' - 0xA);
} while ((num /= base));
width -= p - tmp;
char fill = (flags & FILL_ZERO)? '0' : ' ';
while (0 <= --width) {
*(buf++) = fill;
}
if (flags & PUT_MINUS)
*(buf++) = '-';
else if (flags & PUT_PLUS)
*(buf++) = '+';
do
*(buf++) = *(--p);
while (tmp < p);
return buf;
}
int my_vsprintf(char * buf, const char * fmt, va_list va)
{
char c;
const char *save = buf;
while ((c = *fmt++)) {
int width = 0;
enum flag_itoa flags = 0;
if (c != '%') {
*(buf++) = c;
continue;
}
redo_spec:
c = *fmt++;
switch (c) {
case '%':
*(buf++) = c;
break;
case 'c':;
*(buf++) = va_arg(va, int);
break;
case 'd':;
int num = va_arg(va, int);
if (num < 0) {
num = -num;
flags |= PUT_MINUS;
}
buf = sitoa(buf, num, width, flags | BASE_10);
break;
case 'u':
buf = sitoa(buf, va_arg(va, unsigned int), width, flags | BASE_10);
break;
case 'x':
buf = sitoa(buf, va_arg(va, unsigned int), width, flags);
break;
case 'b':
buf = sitoa(buf, va_arg(va, unsigned int), width, flags | BASE_2);
break;
case 's':;
const char *p = va_arg(va, const char *);
if (p) {
while (*p)
*(buf++) = *(p++);
}
break;
case 'm':;
const uint8_t *m = va_arg(va, const uint8_t *);
width = min(width, 64); // buffer limited to 256!
if (m)
for (;;) {
buf = sitoa(buf, *(m++), 2, FILL_ZERO);
if (--width <= 0)
break;
*(buf++) = ':';
}
break;
case '0':
if (!width)
flags |= FILL_ZERO;
// fall through
case '1'...'9':
width = width * 10 + c - '0';
goto redo_spec;
case '*':
width = va_arg(va, unsigned int);
goto redo_spec;
case '+':
flags |= PUT_PLUS;
goto redo_spec;
case '\0':
default:
*(buf++) = '?';
}
width = 0;
}
*buf = '\0';
return buf - save;
}
int my_sprintf(char * buf, const char * fmt, ...)
{
va_list va;
va_start(va,fmt);
int ret = my_vsprintf(buf, fmt, va);
va_end(va);
return ret;
}
#if TEST
#include <stdio.h> /* putchar */
int main(int argc, char *argv[])
{
char b[256], *p = b;
my_sprintf(b, "%x %d %b\n", 123, 123, 123);
while (*p)
putchar(*p++);
}
#endif
tl;dr: Consider a smaller, but more complete, sprintf() implementation
https://github.com/eyalroz/printf
The standard library's sprintf() implementation you may be using is probably quite resource-taxing. But if you avail yourself of a stand-alone sprintf() implementation, you could get more complete functionality without paying for it with so much memory use.
Now, why would you choose that if you've told us you only need some basic functionality? Because the nature of (s)printf() use is that we tend to use more aspects of it as we go along. You notice you want to print larger numbers, or differences in far decimal digits; you want to print a bunch of values and then decide you want them aligned. Or somebody else wants to use the printing capability you added to print something you haven't thought of. So, instead of having to switch implementations, you use an implementation where compile-time options configure which features get compiled and which get left out.
Suppose we have a matrix in C (a 2D array).
I wonder how could I print the matrix into a string in C.
For example, if I have
double M[2][2]={{1.2,3.4},{3.14,2.718}}
I want to get a string like this:
"1.200 3.400\n3.140 2.718"
which could be printed as
1.200 3.400
3.140 2.718
When printing to the screen, the problem is easier because we need not consider the buffer size. However, when printing to a string, it seems hard to know how large the string buffer should be, if I want to use functions like 'sprintf'.
I searched google and stackoverflow but almost all of what I get is about how to convert string to numerical...
How to do this in C? Are there any libs?
Could you please help? Thank you!
EDIT:
The 'snprintf' solution works well for the situation of just one number.
But for matrix, it is supposed to use loops to go through every element.
And then add a little to the string in each loop. Will this work with 'snprintf'?
You'll need to do this in two steps, first calculate the total size to hold the string, then allocate it and compose the string. Could be done in a function like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define ARRAY_SIZE(a) (sizeof(a) / sizeof(*a))
static char * matrix2D_to_string(const double *matrix, size_t rows, size_t columns)
{
const char format[] = "%f";
const char column_separator[] = " ";
const char row_separator[] = "\n";
int *column_widths = NULL;
size_t r = 0, c = 0;
char *buffer = NULL, *p = NULL;
size_t size = 0;
if (!rows || ! columns) {
errno = EINVAL;
return NULL;
}
// calculate maximum width for each column
column_widths = (int *)calloc(columns, sizeof(*column_widths));
if (!column_widths) {
return NULL; // calloc failed
}
for (r = 0; r < rows; ++r) {
for (c = 0; c < columns; ++c) {
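// Measure only: with a NULL buffer and size 0, snprintf writes nothing and returns the length needed,
// so a very large value cannot overflow a fixed scratch buffer.
int width = snprintf(NULL, 0, format, matrix[r * columns + c]);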
if (width > column_widths[c]) {
column_widths[c] = width;
}
}
}
// calculate total buffer size...
// ... values
for (c = 0; c < columns; ++c) {
size += column_widths[c] * rows;
}
// ... column separators
size += (columns - 1) * strlen(column_separator);
// ... row separators
size += (rows - 1) * strlen(row_separator);
// ... nul terminator
++size;
// make the string
buffer = (char *)malloc(size);
if (!buffer) {
free(column_widths);
return NULL;
}
p = buffer;
for (r = 0; r < rows; ++r) {
if (r) {
strcpy(p, row_separator);
p += strlen(row_separator);
}
for (c = 0; c < columns; ++c) {
if (c) {
strcpy(p, column_separator);
p += strlen(column_separator);
}
int width = sprintf(p, format, matrix[r * columns + c]);
p += width;
if (width < column_widths[c]) {
width = column_widths[c] - width;
memset(p, ' ', width);
p += width;
}
}
}
*p = '\0';
// cleanup
free(column_widths);
return buffer;
}
int main()
{
double M[3][2]={{1.2,3.4},{3.14,2.718},{100.999,0.000005}};
char *s = matrix2D_to_string((const double *)M, ARRAY_SIZE(M), ARRAY_SIZE(M[0]));
puts(s);
free(s);
return 0;
}
This prints:
1.200000   3.400000
3.140000   2.718000
100.999000 0.000005
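You may use snprintf (here vsnprintf, since the arguments are variadic) to get the buffer length you need first, and then allocate and print into it:
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
char* PrettyPrint(const char* format, ...)
{
    va_list va, va2;
    va_start(va, format);
    va_copy(va2, va); // the measuring pass consumes va, so keep a copy for the real pass
    int len = vsnprintf(NULL, 0, format, va); // returns the length needed, writes nothing
    va_end(va);
    char* s = (len < 0) ? NULL : malloc((size_t)len + 1);
    if (s != NULL)
        vsnprintf(s, (size_t)len + 1, format, va2);
    va_end(va2);
    return s;
}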
You can start with some estimated size, and use realloc if it isn't enough.
If snprintf returns more than the remaining buffer size, it means it didn't have enough room. In this case, you should reallocate, and retry the same snprintf.
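For the matrix case, the grow-and-retry idea can be sketched as follows. This is one possible arrangement, not the only one: the function name matrix_to_string, the fixed "%.3f" format, the single-space and newline separators, and the starting capacity of 16 are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats a rows x cols matrix stored in row-major order.
   Returns a malloc'ed string the caller must free, or NULL on failure. */
static char *matrix_to_string(const double *m, size_t rows, size_t cols)
{
    size_t cap = 16, len = 0, r, c;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    buf[0] = '\0';
    for (r = 0; r < rows; ++r) {
        for (c = 0; c < cols; ++c) {
            /* Space between columns, newline between rows, nothing at the end. */
            const char *sep = (c + 1 < cols) ? " " : (r + 1 < rows) ? "\n" : "";
            for (;;) {
                int n = snprintf(buf + len, cap - len, "%.3f%s", m[r * cols + c], sep);
                if (n < 0) {
                    free(buf);
                    return NULL;
                }
                if ((size_t)n < cap - len) {    /* it fit, including the '\0' */
                    len += (size_t)n;
                    break;
                }
                /* Not enough room: grow the buffer and retry the same snprintf. */
                {
                    size_t new_cap = cap * 2 + (size_t)n;
                    char *tmp = realloc(buf, new_cap);
                    if (tmp == NULL) {
                        free(buf);
                        return NULL;
                    }
                    buf = tmp;
                    cap = new_cap;
                }
            }
        }
    }
    return buf;
}
```

Called as matrix_to_string(&M[0][0], 2, 2) on the matrix from the question, this yields "1.200 3.400\n3.140 2.718". Each element is appended at buf + len, so the loop adds a little to the string on every iteration, which is what the question's edit asks about.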
It's an array.
Assuming float values, you can print it like this:
for(i=0;i<2;i++)
{
for(j=0;j<2;j++)
{
printf("%f \t",M[i][j]);
}
printf("\n");
}
Hope this will help you.
for(i=0;i<2;i++)
{
for(j=0;j<2;j++)
{
printf("%f \t",M[i][j]);
}
printf("\\n");//Change printf("\n") as printf("\\n")
}
#include <stdio.h>
#include <stdlib.h>
long GetFileSize(FILE *fp){
long fsize = 0;
fseek(fp,0,SEEK_END);
fsize = ftell(fp);
fseek(fp,0,SEEK_SET);//reset stream position!!
return fsize;
}
char *ReadToEnd(const char *filepath){
FILE *fp;
long fsize;
char *buff;
if(NULL==(fp=fopen(filepath, "rb"))){
perror("file cannot open at ReadToEnd\n");
return NULL;
}
fsize=GetFileSize(fp);
buff=(char*)malloc(sizeof(char)*fsize+1);
if(NULL==buff){//allocation failed
fclose(fp);
return NULL;
}
fread(buff, sizeof(char), fsize, fp);
fclose(fp);
buff[fsize]='\0';
return buff;
}
char *printStrMatrix(const char *fmt, const int col, const int row, const double* matrix ){
FILE *fp;
int c,r;
char *str;
if(NULL==(fp=fopen("printStr.tmp", "wb"))){//tmpfile() would be safer than a fixed file name
perror("temporary file cannot open at printStr\n");
return NULL;
}
for(r=0;r<row;++r){
for(c=0;c<col;++c){
fprintf(fp, fmt, *matrix++);
if(c != col-1)
fprintf(fp, " ");
}
fprintf(fp, "\n");
}
fflush(fp);
fclose(fp);
str=ReadToEnd("printStr.tmp");
remove("printStr.tmp");
return str;
}
int main(void){
double M[2][2]={{1.2,3.4},{3.14,2.718}};
char *str;
str=printStrMatrix("%1.3lf", 2, 2, &M[0][0]);
printf("%s", str);
free(str);
return 0;
}