Combining two strings by removing duplicate substrings

I have two strings that I would like to combine, removing duplicate substrings. Note that every two consecutive numbers constitute a substring. Consider string str1 and str2:
str1 = "#100#123#100#678"
str2 = "#100#678#100#56"
I would like to produce a combined string as:
comboStr = "#100#123#100#678#100#56" (i.e. I removed the duplicate #100#678)
What's the easiest way to do this? Is there a way I can achieve this using regular expressions?

I don't think regular expressions are a good way to solve this problem. Regexes might be useful for finding the #123 tokens, but the problem needs to look back over the string built so far in a way that regex back references are not designed for.
I also don't think that there is an easy way (as in three lines of code) to solve this.
I assume that the strings always follow the pattern (#\d+)* and that the pair created at the seam when joining two strings is not treated as special, i.e. the resulting pair may itself be considered a duplicate. This means we can separate concatenation from pair removal.
Convert your strings to lists of integers, operate on these lists and then join them back. That's some work, but it makes the actual code to strip duplicates easier (it's complicated enough) and might also come in handy when you need to operate on similar strings often.
#include <stdlib.h>
#include <stdio.h>
/*
* Convert a string to a list of at most max integers. The
* return value is the number of integers in the list (which
* may be greater than max!) or -1 if the string is invalid.
*/
int ilist_split(int *ilist, int max, const char *str)
{
const char *p = str;
int n = 0;
while (*p) {
int x;
int pos;
if (sscanf(p, "#%d %n", &x, &pos) < 1) return -1;
if (n < max) ilist[n] = x;
n++;
p += pos;
}
return n;
}
/*
* Convert a list of integers back to a string. The string
* is at most nbuf - 1 characters long and is assured to be
* zero-terminated if nbuf isn't 0. It is legal to pass NULL
* as char buffer if nbuf is 0. Returns the number of characters
* that would have been written had the buffer been long enough,
* snprintf-style.
*/
int ilist_join(const int *ilist, int n, char *buf, int nbuf)
{
int len = 0;
int i;
for (i = 0; i < n; i++) {
len += snprintf(buf + len,
nbuf > len ? nbuf - len : 0, "#%d", ilist[i]);
}
return len;
}
/*
* Auxiliary function to find a pair in an integer list.
*/
int ilist_find_pair(int *ilist, int n, int a1, int a2)
{
int i;
for (i = 1; i < n; i++) {
if (ilist[i - 1] == a1 && ilist[i] == a2) return i - 1;
}
return -1;
}
/*
* Remove duplicate pairs from an integer list. The first
* pair is kept, subsequent pairs are deleted. Returns the
* new length of the array.
*/
int ilist_remove_dup_pairs(int *ilist, int n)
{
int i, j;
j = 1;
for (i = 1; i < n; i++) {
int a1 = ilist[i - 1];
int a2 = ilist[i];
if (ilist_find_pair(ilist, i - 1, a1, a2) < 0) {
ilist[j++] = ilist[i];
} else {
i++;
}
}
return j;
}
#define MAX 40
int main()
{
const char *str1 = "#100#123#100#678";
const char *str2 = "#100#678#100#56";
char res[80];
int ilist[MAX];
int nlist;
/* convert str1 */
nlist = ilist_split(ilist, MAX, str1);
if (nlist > MAX) nlist = MAX;
/* convert and concatenate str2 */
nlist += ilist_split(ilist + nlist, MAX - nlist, str2);
if (nlist > MAX) nlist = MAX;
/* remove duplicate pairs */
nlist = ilist_remove_dup_pairs(ilist, nlist);
/* convert back to string */
ilist_join(ilist, nlist, res, sizeof(res));
printf("%s\n", res);
return 0;
}

Related

Iterate through every char in string stored in an array

I am really new to C and in my first half year at university. This is my first question on StackOverflow.
My task is to write a program that converts every string stored in numbers into a decimal, without changing anything outside the main function.
I have been trying for the past 4 hours to solve this problem. I want to iterate through every char in the string I am currently on and then, based on its position relative to the length, convert it into a decimal.
My only question is whether someone can help me understand how to get the string length without using strlen(), since I can't add #include <string.h>.
This is what I got so far (getting the length of the array to iterate through every index):
#include <stdio.h>
#include <math.h> // Compile with -lm : gcc -Wall -std=c11 filename.c -lm
int main() {
char* numbers[] = {
"01001001",
"00101010",
"010100111001",
"011111110100101010010111",
"0001010110011010101111101111010101110110",
"01011100110000001101"};
// Add here..
int length = sizeof(numbers);
for ( int i = 0; i < length; i++ ){
//how do i get the string size without strlen() D:
}
return 0;
}

In C, strings are really just char arrays with a special terminator character to mark the end of the string. So, say you have something like:
char *str = "hello";
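The characters are stored just as they would be in this array (although str itself is a pointer to a read-only literal rather than an array):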
char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};
Notice that \0 character at the end of the array? This is the special terminator character that C places at the end of strings. Functions like strlen() pretty much iterate through the char array looking for the first occurrence of the \0 character and then stopping.
So, you can make your own version of strlen(), say my_strlen() like this:
int my_strlen(char *str)
{
/* Initialize len to 0 */
int len = 0;
/* Iterate through str, increment len, and stop when we reach '\0' */
while(str[len] != '\0')
len++;
/* Return the len */
return len;
}
Then within your for loop, you can just call this function. Also, note that your calculation of the size of the numbers array:
int length = sizeof(numbers);
will not give you the number of elements in the array. That code gives you the size (in bytes) of numbers, which is an array of char pointers. If you want to get the number of elements, you have to divide that size by the size (in bytes) of a single element (i.e., a char pointer). So, something like this would work:
int length = sizeof(numbers) / sizeof(numbers[0]);
Your final code can look something like this:
#include <stdio.h>
#include <math.h> // Compile with -lm : gcc -Wall -std=c11 filename.c -lm
int my_strlen(char *str) {
/* Initialize len to 0 */
int len = 0;
/* Iterate through str, increment len, and stop when we reach '\0' */
while(str[len] != '\0')
len++;
/* Return the len */
return len;
}
int main() {
char* numbers[] = {
"01001001",
"00101010",
"010100111001",
"011111110100101010010111",
"0001010110011010101111101111010101110110",
"01011100110000001101"};
// Add here..
// Notice the change here
int length = sizeof(numbers) / sizeof(numbers[0]);
for(int i = 0; i < length; i++ ){
int str_len = my_strlen(numbers[i]);
// Do what you need with str_len
}
return 0;
}

This project can be done without computing the length of the strings. How? In C, all strings are nul-terminated, containing the nul-character '\0' (with ASCII value 0) after the last character that makes up the string. When you need to iterate over a string, you just loop until the character value is 0 (i.e. the nul-character).
This is how all string functions know when to stop reading characters. Since you have an array of pointers that contains your strings, you just need to loop over each pointer and, for each pointer, loop over each character until the nul-character is found.
Putting it all together (and noting you don't need math.h), you can do:
#include <stdio.h>
int main() {
char* numbers[] = {
"01001001",
"00101010",
"010100111001",
"011111110100101010010111",
"0001010110011010101111101111010101110110",
"01011100110000001101"};
int nnumbers = sizeof numbers / sizeof *numbers; /* no. of elements */
for (int i = 0; i < nnumbers; i++) {
long long unsigned number = 0;
/* you don't care about the length, strings are nul-terminated,
* just loop until \0 is found.
*/
for (int j = 0; numbers[i][j]; j++) {
number <<= 1; /* shift left */
number += numbers[i][j] == '1' ? 1 : 0; /* add bit */
}
printf ("%s = %llu\n", numbers[i], number); /* output result */
}
return 0;
}
(note: you must use a 64-bit type to hold the converted values, as "1010110011010101111101111010101110110" requires a minimum of 37 bits to represent)
Example Use/Output
Simple example output converting each string to a numeric value:
$ ./bin/binstr2num
01001001 = 73
00101010 = 42
010100111001 = 1337
011111110100101010010111 = 8342167
0001010110011010101111101111010101110110 = 92790519158
01011100110000001101 = 379917
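If the input might contain stray characters or more than 64 digits, the shift-and-add loop can be wrapped in a function that reports those cases. This is only a sketch: the name bin_to_u64 and its return codes are my own choices, not part of the task.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse a string of '0'/'1' characters into *out.
 * Returns 0 on success, -1 on an empty string or an invalid character,
 * and -2 if the value would need more than 64 bits. */
int bin_to_u64(const char *s, uint64_t *out)
{
    uint64_t value = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (*s != '0' && *s != '1')
            return -1;
        if (value >> 63)              /* top bit set: the next shift would lose it */
            return -2;
        value = (value << 1) | (uint64_t)(*s - '0');
    }
    *out = value;
    return 0;
}
```

Called on the longest string in the array above, it stores 92790519158 in *out and returns 0.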

#include <stdio.h>
int main(){
char arr[20]="Hello";
int count=0;
while(arr[count]!='\0'){
count++;
}
printf("%d",count);
return 0;
}
Look at this small piece of code and you will understand. In C, a string ends with a null character ('\0'). We can use that to our advantage.

There are a few ways to do it. IMO, a simple, reasonable way to implement strlen is:
size_t string_length(const char *s) { return strchr(s, '\0') - s; }
but if you're not allowed to use strlen then you're probably not allowed to use strchr either. So you just have to count. The most idiomatic way to do that is probably a bit obscure for a complete beginner, so here is a more verbose method.
Note that your computation of the number of elements in the array is invalid, and has been corrected below.
#include <stdio.h>
int
length(const char *s)
{
int len = 0;
while( *s++ ){
len += 1;
}
return len;
}
int
main(void)
{
char *numbers[] = {
"01001001",
"00101010",
"010100111001",
"011111110100101010010111",
"0001010110011010101111101111010101110110",
"01011100110000001101"
};
int count = sizeof numbers / sizeof *numbers; /* Number of entries */
for( int i = 0; i < count; i++ ){
printf(" length of %s is %d\n", numbers[i], length(numbers[i]));
}
return 0;
}
It's pretty subjective, but IMO a more idiomatic way to write this is:
#include <stdio.h>
int
length(const char *e)
{
const char *s = e;
while( *e++ )
;
return e - s - 1;
}
int
main(void)
{
char *numbers[] = {
"01001001",
"00101010",
"010100111001",
"011111110100101010010111",
"0001010110011010101111101111010101110110",
"01011100110000001101"
};
char **e = numbers + sizeof numbers / sizeof *numbers;
for( char **t = numbers; t < e; t++ ){
printf(" length of %s is %d\n", *t, length(*t));
}
return 0;
}

Check permutations without modifying original string C

I am doing a check whether 2 strings are permutations. I sort the strings then compare each character to each other. However, I think my sorting process also changes the original strings (I am very bad with pointers and passing by reference).
Is there a way to check without modifying the original strings?
I also tried using strcpy but I don't really know how to use it.
I tried this in my check() function:
char temp[128];
strcpy(temp, word);
Below is my code. I call the areAnagram function from another function like this:
void check(char *word, struct Entry *en) {
if (areAnagram(en->word, word) == 1) {
//printf("EW:%s W:%s\n", en->word, word);
//For example, this should return something like
// EW:silent W:listen
//But I got
// EW:eilnst W:eilnst
}
}
Structure for Entry:
typedef struct Entry {
char *word;
int len;
struct Entry *next;
} Entry;
Here is the anagram check process:
void quickSort(char *arr, int si, int ei);
int areAnagram(char *str1, char *str2)
{
// Get lengths of both strings
int n1 = strlen(str1);
int n2 = strlen(str2);
// If the lengths of the two strings differ, they cannot be anagrams
if (n1 != n2) {
return 0;
}
// Sort both strings
quickSort (str1, 0, n1 - 1);
quickSort (str2, 0, n2 - 1);
int i;
// Compare sorted strings
for (i = 0; i < n1; i++) {
if (str1[i] != str2[i]) {
return 0;
}
}
return 1;
}
void exchange(char *a, char *b)
{
char temp;
temp = *a;
*a = *b;
*b = temp;
}
int partition(char A[], int si, int ei)
{
char x = A[ei];
int i = (si - 1);
int j;
for (j = si; j <= ei - 1; j++) {
if(A[j] <= x) {
i++;
exchange(&A[i], &A[j]);
}
}
exchange (&A[i + 1], &A[ei]);
return (i + 1);
}
void quickSort(char A[], int si, int ei)
{
int pi; /* Partitioning index */
if(si < ei) {
pi = partition(A, si, ei);
quickSort(A, si, pi - 1);
quickSort(A, pi + 1, ei);
}
}

There is a better way of checking whether two strings are anagrams. You can create an array to store the count of each character in the first string (increment the entry at the character's ASCII value). Then traverse the second string and decrement the count of each character (again indexed by its ASCII value). Now check whether all elements of the array are zero: if yes, the strings are anagrams, otherwise not.
int arr[123] = {0};
Suppose the two strings are s1="abba" and s2="baba".
While traversing the first string: arr[97]=2, arr[98]=2;
While traversing the second string: arr[97]=0, arr[98]=0;
Now if you traverse the whole array, all elements will be zero.
But if the two strings are s1="abba" and s2="abac":
While traversing the first string: arr[97]=2, arr[98]=2;
While traversing the second string: arr[97]=0, arr[98]=1, arr[99]=-1;
Since not all elements of the array are zero, these are not anagrams.
The complexity of the above algorithm is O(n).
Hope it helps.
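The counting idea could be sketched in C roughly as follows. The function name is made up for illustration, and I size the table with UCHAR_MAX + 1 instead of 123 so that any byte value is a valid index, which is an assumption on my part rather than something the description requires.

```c
#include <limits.h>
#include <stddef.h>

/* Sketch of the counting approach: returns 1 if s1 and s2 are anagrams,
 * 0 otherwise. Neither string is modified. */
int are_anagrams_count(const char *s1, const char *s2)
{
    int count[UCHAR_MAX + 1] = {0};   /* one slot per possible byte value */
    size_t i;

    for (i = 0; s1[i] != '\0'; i++)
        count[(unsigned char)s1[i]]++;    /* first string adds */
    for (i = 0; s2[i] != '\0'; i++)
        count[(unsigned char)s2[i]]--;    /* second string subtracts */
    for (i = 0; i <= UCHAR_MAX; i++)
        if (count[i] != 0)
            return 0;                     /* some character count differs */
    return 1;
}
```

Because both parameters are const char *, the compiler will also complain if the function ever tries to write to the originals.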

Make a copy using strcpy:
char *copy = malloc(strlen(word) + 1); // can use a temporary buffer, but this allows variable length inputs
strcpy(copy, word);
// use copy as your temporary string
free(copy);
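To show how the copies fit into the anagram check, here is a rough sketch that sorts copies instead of the originals. It swaps the hand-written quicksort for the standard qsort, and the function names are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders bytes as unsigned char. */
static int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Hypothetical wrapper: returns 1 if the strings are anagrams, 0 if not,
 * and -1 if memory could not be allocated. The inputs are never modified. */
int are_anagrams_sorted_copy(const char *s1, const char *s2)
{
    size_t n = strlen(s1);
    char *c1, *c2;
    int result;

    if (n != strlen(s2))
        return 0;
    c1 = malloc(n + 1);
    c2 = malloc(n + 1);
    if (c1 == NULL || c2 == NULL) {
        free(c1);                 /* free(NULL) is a no-op */
        free(c2);
        return -1;
    }
    memcpy(c1, s1, n + 1);        /* copy including the terminator */
    memcpy(c2, s2, n + 1);
    qsort(c1, n, 1, cmp_char);    /* sort the copies, not the originals */
    qsort(c2, n, 1, cmp_char);
    result = (memcmp(c1, c2, n) == 0);
    free(c1);
    free(c2);
    return result;
}
```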

Your title states that you do not want to modify the original string, but your solution uses quicksort, which sorts in place and therefore modifies the string. Plus, sorting, even with a fast optimized sort, is an expensive operation and is not needed for the problem you are trying to solve. You could use a lookup table for speed, and it would not modify the original string. You assign a unique number to each letter and sum the values. Equal sums are a necessary condition for an anagram but not a sufficient one (with powers of two, "AAC" and "BBB" both sum to 6), so treat a match as a fast filter and confirm it with a per-letter count.
/* OPTION 1: let the compiler build your table */
static const int A=0x0000001;
static const int B=0x0000002;
static const int C=0x0000004;
/* continue to double for other letters until ... */
static const int Z=0x4000000;
/* OPTION 2: calculate a cheap hash for each letter */
/* Returns 0 for anagram similar to strcmp */
int anagram (const char* word1, const char* word2)
{
/* strings must be equal length */
if (strlen(word1) != strlen(word2))
return -1;
unsigned long sum1 = 0;
unsigned long sum2 = 0;
int c;
for (int i = 0 ; word1[i] != '\0' ; i++)
{
/* toupper() makes the comparison case insensitive */
c = toupper((unsigned char)word1[i]);
sum1 += 1UL << (c - 'A'); /* assumes letters A-Z only */
}
for (int i = 0 ; word2[i] != '\0' ; i++)
{
/* toupper() makes the comparison case insensitive */
c = toupper((unsigned char)word2[i]);
sum2 += 1UL << (c - 'A'); /* assumes letters A-Z only */
}
return (int)(sum1 - sum2); /* ignore overflow */
}
The anagram function above is untested and has been written for clarity. You'd need to include the ctype.h to convert the case using toupper().
Finally, you could make a copy of one of the strings, traverse the other string calling strchr() on each character to find the matching character in the copy. If strchr() returns NULL then there is no anagram, otherwise if strchr() returns a valid pointer, use it to modify the copy, e.g. set to char value to 0x01, so that you can sum the chars in the modified copy. In this instance, the strings would be anagrams if the sum of all the chars in the modified copy equal the integer length of the comparison string.
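A rough sketch of that strchr() idea, under the assumption that neither input contains the byte 0x01 (otherwise an already marked slot could be matched again). The function name is invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns 1 if anagrams, 0 if not, -1 if malloc fails.
 * Assumes neither string contains the marker byte 0x01. */
int are_anagrams_strchr(const char *s1, const char *s2)
{
    size_t n = strlen(s1);
    size_t i, sum = 0;
    char *copy;

    if (n != strlen(s2))
        return 0;
    copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, s1, n + 1);              /* work on a copy of s1 */

    for (i = 0; i < n; i++) {
        char *hit = strchr(copy, s2[i]);  /* find an unused matching char */
        if (hit == NULL) {
            free(copy);
            return 0;                     /* s2 has a char that s1 lacks */
        }
        *hit = 0x01;                      /* mark this slot as used */
    }
    /* Every slot should now hold 0x01, so the sum equals the length.
     * The check is redundant once all lookups succeeded; it is kept
     * only to mirror the description above. */
    for (i = 0; i < n; i++)
        sum += (unsigned char)copy[i];
    free(copy);
    return sum == n;
}
```

Note that each strchr() call scans the copy from the start, so this is O(n^2) in the worst case, unlike the counting approach.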

C substrings / C string slicing?

Hi everybody!
I am trying to write a program that checks whether a given string of text is a palindrome (for this I made a function called is_palindrome that works) and whether any of its substrings is a palindrome, and I can't figure out the optimal way to do this:
For example, for the string s = "abcdefg" it should first check "a", then "ab", "abc", "abcd" and so on, for each character
In Python this is the equivalent of
s[:1], s[:2], ... (a, ab, ...)
s[1:2], s[1:3] ... (b, bc, ...)
What function/method is there that I can use in a similar way in C ?

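This is the short helper I use to get a slice of a string in C.
void slice(const char *str, char *result, size_t start, size_t end)
{
    strncpy(result, str + start, end - start);
    result[end - start] = '\0'; /* strncpy does not terminate when the source is longer */
}
Pretty straightforward, provided you've checked the boundaries, made sure end >= start, and given result room for end - start + 1 bytes.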

This slice_str() function will do the trick, with end actually being the end character, rather than one-past-the-end as in Python slicing:
#include <stdio.h>
#include <string.h>
void slice_str(const char * str, char * buffer, size_t start, size_t end)
{
size_t j = 0;
for ( size_t i = start; i <= end; ++i ) {
buffer[j++] = str[i];
}
buffer[j] = 0;
}
int main(void) {
const char * str = "Polly";
const size_t len = strlen(str);
char buffer[len + 1];
for ( size_t start = 0; start < len; ++start ) {
for ( int end = len - 1; end >= (int) start; --end ) {
slice_str(str, buffer, start, end);
printf("%s\n", buffer);
}
}
return 0;
}
which, when used from the above main() function, outputs:
paul@horus:~/src/sandbox$ ./allsubstr
Polly
Poll
Pol
Po
P
olly
oll
ol
o
lly
ll
l
ly
l
y
paul@horus:~/src/sandbox$

There isn't; you'll have to write your own.

In order to check a string, you would need to supply the number of characters to check for a palindrome:
int palindrome(char* str, int len)
{
if (len < 2 )
{
return 0;
}
// position p and q on the first and last character
char* p = str;
char* q = str + len - 1;
// compare start char with end char
for ( ; p < str + len / 2; ++p, --q )
{
if (*p != *q)
{
return 0;
}
}
return 1;
}
Now you would need to call the function above for each substring (as you described it, i.e. always starting from the beginning), e.g.
char candidate[] = "wasitaratisaw";
int n = (int)strlen(candidate);
for (int len = 1; len <= n; ++len) /* <= so the full string is checked too */
{
if (palindrome(candidate, len))
{
...
}
}
disclaimer: not compiled.

Honestly, you don't need a string slicing function just to check for palindromes within substrings:
/* start: Pointer to first character in the string to check.
* end: Pointer to one byte beyond the last character to check.
*
* Return:
* -1 if start >= end; this is considered an error
* 0 if the substring is not a palindrome
* 1 if the substring is a palindrome
*/
int
ispalin (const char *start, const char *end)
{
if (start >= end)
return -1;
for (; start < end; ++start)
if (*start != *--end)
return 0;
return 1;
}
With that, you can create the following:
int
main ()
{
const char *s = "madam";
/* i: index of first character in substring
* n: number of characters in substring
*/
size_t i, n;
size_t len = strlen (s);
for (i = 0; i < len; ++i)
{
for (n = 1; n <= len - i; ++n)
{
/* Start of substring. */
const char *start = s + i;
/* ispalin(s[i:i+n]) in Python */
switch (ispalin (start, start + n))
{
case -1:
fprintf (stderr, "error: %p >= %p\n", (void *) start, (void *) (start + n));
break;
case 0:
printf ("Not a palindrome: %.*s\n", (int) n, start);
break;
case 1:
printf ("Palindrome: %.*s\n", (int) n, start);
break;
} /* switch (ispalin) */
} /* for (n) */
} /* for (i) */
}
Of course, if you really wanted a string slicing function merely for output (since you technically shouldn't cast a size_t to int), and you still want to be able to format the output easily, the answer by Paul Griffiths should suffice quite well, or you can use mine or even one of strncpy or the nonstandard strlcpy, though they all have their strengths and weaknesses:
/* dest must have
* 1 + min(strlen(src), n)
* bytes available and must not overlap with src.
*/
char *
strslice (char *dest, const char *src, size_t n)
{
char *destp = dest;
/* memcpy here would be ideal, but that would mean walking the string twice:
* once by calling strlen to determine the minimum number of bytes to copy
* and once for actually copying the substring.
*/
for (; n != 0 && *src != 0; --n)
*destp++ = *src++;
*destp = 0;
return dest;
}
strslice actually works like a combination of strncpy and the nonstandard strlcpy, though there are differences between these three functions:
strlcpy will cut the copied string short to add a null terminator at dest[n - 1], so copying exactly n bytes before adding a null terminator requires you to pass n + 1 as the buffer size.
strncpy may not terminate the string at all, leaving dest[n - 1] equal to src[n - 1], so you would need to add a null terminator yourself just in case. If n is greater than the src string length, dest will be padded with null terminators until n bytes have been written.
strslice will copy up to n bytes if necessary, like strncpy, and will require an extra byte for the null terminator, meaning a maximum of n+1 bytes are necessary. It doesn't waste time writing unnecessary null terminators as strncpy does. This can be thought of as a "lightweight strlcpy" with a small difference in what n means and can be used where the resulting string length won't matter.
You could also create a memslice function if you wanted, which would allow for embedded null bytes, but it already exists as memcpy.

There is not any built-in function/method in any standard C library which can handle this. However, you can come up with your own method to do the same.
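For illustration, a home-grown slice with Python-like s[start:end] semantics might look like the sketch below. The name str_slice_dup and the clamping behaviour are my own choices, not anything from a standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mimicking Python's s[start:end]: returns a newly
 * allocated copy of the half-open range, clamped to the string length,
 * or NULL if allocation fails. The caller frees the result. */
char *str_slice_dup(const char *s, size_t start, size_t end)
{
    size_t len = strlen(s);
    size_t n;
    char *out;

    if (end > len)
        end = len;          /* clamp like Python does */
    if (start > end)
        start = end;        /* empty slice instead of an error */
    n = end - start;
    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, n);
    out[n] = '\0';
    return out;
}
```

With s = "abcdefg", str_slice_dup(s, 1, 3) yields "bc", matching s[1:3] in Python. Negative indices are not supported in this sketch.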

Creating a string in C of the form "1,2,3,4,5"

I'm having difficulty in generating a string of the form "1,2,3,4,5" to pass to a command line program.
Here's what I have tried:
int N=100;
char list[200];
for (i=0; i<2*N; i+=2) {
char tmp;
sprintf(tmp,'%d', i);
strcpy(list[i], tmp);
strcpy(list[i+1], ',');
}
Edit:
I don't feel this question is a duplicate, as it has more to do with appending strings into a list and managing that memory than with literally just putting a comma between two integers.

The following code will do what you need.
#include <stdlib.h>
#include <stdio.h>
char* CommaSeparatedListOfIntegers(const int N)
{
if (N < 1)
return NULL;
char* result = malloc(1 + N*snprintf(NULL, 0, "%d,", N));
if (result == NULL)
return NULL;
char* p = result;
for (int i = 1; i <= N; i++)
p += sprintf(p, "%d,", i);
*(p-1) = '\0';
return result;
}
Note that the function returns a heap allocated block of memory that the caller is responsible for clearing up.
Some points of note:
We put a crude upper bound on the length of each number when converted to text. This does mean that we will over allocate the block of memory, but not by a massive amount. If that is a problem for you then you can code a more accurate length. That would involve looping from 1 to N and calling snprintf for each value to determine the required length.
Note that we initially write out a comma after the final value, but then replace that with the null-terminator.
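The more accurate length mentioned above could be computed along these lines. This is a sketch with an invented function name; it writes the separator before each value after the first instead of replacing a trailing comma, so that the allocation can be exact.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant: builds "1,2,...,N" in an exactly sized buffer.
 * Returns NULL if N < 1 or allocation fails; the caller frees the result. */
char *comma_list_exact(int N)
{
    size_t total = 0;
    char *result, *p;
    int i;

    if (N < 1)
        return NULL;
    /* Pass 1: digits of each value plus one extra byte per value, which
     * covers the N - 1 commas and the final terminator. */
    for (i = 1; i <= N; i++)
        total += (size_t)snprintf(NULL, 0, "%d", i) + 1;
    result = malloc(total);
    if (result == NULL)
        return NULL;
    /* Pass 2: a comma before every value except the first. */
    p = result;
    for (i = 1; i <= N; i++) {
        if (i > 1)
            *p++ = ',';
        p += sprintf(p, "%d", i);
    }
    return result;
}
```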

Let's forget about writing strings for the moment and write a function that just prints that list to the screen:
int range_print(int begin, int end, const char *sep)
{
int len = 0;
int i;
for (i = begin; i < end; i++) {
if (i > begin) {
len += printf("%s", sep);
}
len += printf("%d", i);
}
return len;
}
You can call it like this:
range_print(1, 6, ", ");
printf("\n");
The function does not write a new-line character, so we have to do that. It prints all numbers and a custom separator before each number after the first. The separator can be any string, so this function also works if you want to separate your numbers with slashes or tabs.
The function has printf semantics, because it returns the number of characters written. (That value is often ignored, but it can come in handy, as we'll see soon.) We also make the upper bound exclusive, so that in order to print (1, 2, 3, 4, 5) you have to pass 1 and 6 as bounds.
We'll now adapt this function so that it writes to a string. There are several ways to do that. Let's look at a way that works similarly to snprintf: It should take a pre-allocated char buffer and a maximum length, and it should return the number of characters written or, if the output doesn't fit, the number of characters that would have been written had the buffer been big enough.
int range(char *buf, int n, int begin, int end, const char *sep)
{
int len = 0;
int m, i;
for (i = begin; i < end; i++) {
m = snprintf(buf, n, "%s%d",
(i > begin) ? sep : "", i);
len += m;
buf += m;
n -= m;
if (n < 0) n = 0;
}
return len;
}
This function is tricky because it has to keep track of the number of characters written and of the free buffer still available. It keeps printing after the buffer is full, which is a bit wasteful in terms of performance, but it is legal to call snprintf with a buffer size of zero, and that way we keep the semantics tidy.
You can call this function like this:
char buf[80];
range(buf, sizeof(buf), 1, 6, ", ");
printf("%s\n", buf);
That means that we need to define a buffer that is large enough. If the range of numbers is large, the string will be truncated. We might therefore want a function that allocates a string for us that is long enough:
char *range_new(int begin, int end, const char *sep, int *plen)
{
int len = (end - begin - 1) * strlen(sep) + 1;
char *str;
char *p;
int i;
for (i = begin; i < end; i++) {
len += snprintf(NULL, 0, "%d", i);
}
str = malloc(len);
if (str == NULL) return NULL;
p = str;
for (i = begin; i < end; i++) {
if (i > begin) p += sprintf(p, "%s", sep);
p += sprintf(p, "%d", i);
}
if (plen) *plen = len - 1;
return str;
}
This function needs two passes: in the first pass, we determine how much memory we need to store the list. Next, we allocate and fill the string. The function returns the allocated string, which the user has to free after use. Because the return value is already used, we lose the information on the string length. An additional argument, a pointer to int, may be given. If it is not NULL, the length will be stored.
This function can be called like this.
char *r;
int len;
r = range_new(1, 6, ", ", &len);
printf("%s (%d)\n", r, len);
free(r);
Note that the same can be achieved by calling our old range function twice:
char *r;
int len;
len = range(NULL, 0, 1, 6, ", ");
r = malloc(len + 1);
range(r, len + 1, 1, 6, ", ");
printf("%s (%d)\n", r, len);
free(r);
So, pick one. For short ranges, I recommend the simple range function with a fixed-size buffer.

What is the proper way of implementing a good "itoa()" function?

I was wondering if my implementation of an "itoa" function is correct. Maybe you can help me getting it a bit more "correct", I'm pretty sure I'm missing something. (Maybe there is already a library doing the conversion the way I want it to do, but... couldn't find any)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char * itoa(int i) {
char * res = malloc(8*sizeof(int));
sprintf(res, "%d", i);
return res;
}
int main(int argc, char *argv[]) {
...

// Yet, another good itoa implementation
// returns: the length of the number string
int itoa(int value, char *sp, int radix)
{
char tmp[sizeof(int) * 8];// one slot per bit covers radix 2 (assumes 8-bit bytes)
char *tp = tmp;
int i;
unsigned v;
int sign = (radix == 10 && value < 0);
if (sign)
v = 0u - (unsigned)value;// well defined even for INT_MIN
else
v = (unsigned)value;
while (v || tp == tmp)
{
i = v % radix;
v /= radix;
if (i < 10)
*tp++ = i+'0';
else
*tp++ = i + 'a' - 10;
}
int len = tp - tmp;
if (sign)
{
*sp++ = '-';
len++;
}
while (tp > tmp)
*sp++ = *--tp;
*sp = '\0';// terminate the result; not counted in len
return len;
}
// Usage Example:
char int_str[15]; // be careful with the length of the buffer
int n = 56789;
int len = itoa(n,int_str,10);

The only actual error is that you don't check the return value of malloc for null.
The name itoa is kind of already taken for a function that's non-standard, but not that uncommon. It doesn't allocate memory, rather it writes to a buffer provided by the caller:
char *itoa(int value, char * str, int base);
If you don't want to rely on your platform having that, I would still advise following the pattern. String-handling functions which return newly allocated memory in C are generally more trouble than they're worth in the long run, because most of the time you end up doing further manipulation, and so you have to free lots of intermediate results. For example, compare:
void delete_temp_files() {
char filename[20];
strcpy(filename, "tmp_");
char *endptr = filename + strlen(filename);
for (int i = 0; i < 10; ++i) {
itoa(i, endptr, 10); // itoa doesn't allocate memory
unlink(filename);
}
}
vs.
void delete_temp_files() {
char filename[20];
strcpy(filename, "tmp_");
char *endptr = filename + strlen(filename);
for (int i = 0; i < 10; ++i) {
char *number = itoa(i, 10); // itoa allocates memory
strcpy(endptr, number);
free(number);
unlink(filename);
}
}
If you had reason to be especially concerned about performance (for instance if you're implementing a stdlib-style library including itoa), or if you were implementing bases that sprintf doesn't support, then you might consider not calling sprintf. But if you want a base 10 string, then your first instinct was right. There's absolutely nothing "incorrect" about the %d format specifier.
Here's a possible implementation of itoa, for base 10 only:
char *itobase10(char *buf, int value) {
sprintf(buf, "%d", value);
return buf;
}
Here's one which incorporates the snprintf-style approach to buffer lengths:
int itobase10n(char *buf, size_t sz, int value) {
return snprintf(buf, sz, "%d", value);
}
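As a sketch of how a caller might use the snprintf-style return value, the wrapper below reports whether the number fit. The name itobase10n_fits is hypothetical; itobase10n is repeated only so the example compiles on its own.

```c
#include <stddef.h>
#include <stdio.h>

/* Same shape as the snprintf-style function above. */
int itobase10n(char *buf, size_t sz, int value)
{
    return snprintf(buf, sz, "%d", value);
}

/* Hypothetical caller-side check: returns 1 if the whole number fit in
 * buf (terminator included), 0 if the output was truncated or failed. */
int itobase10n_fits(char *buf, size_t sz, int value)
{
    int needed = itobase10n(buf, sz, value);
    return needed >= 0 && (size_t)needed < sz;
}
```

With a 3-byte buffer and the value 12345, the buffer ends up holding "12" and the check returns 0, because snprintf reports that 5 characters were needed.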

A good int to string or itoa() has these properties:
Works for all [INT_MIN...INT_MAX], base [2...36] without buffer overflow.
Does not assume int size.
Does not require 2's complement.
Does not require unsigned to have a greater positive range than int. In other words, does not use unsigned.
Allows use of '-' for negative numbers, even when base != 10.
Tailor the error handling as needed. (needs C99 or later):
char* itostr(char *dest, size_t size, int a, int base) {
// Max text needs occur with itostr(dest, size, INT_MIN, 2)
char buffer[sizeof a * CHAR_BIT + 1 + 1];
static const char digits[36] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (base < 2 || base > 36) {
fprintf(stderr, "Invalid base");
return NULL;
}
// Start filling from the end
char* p = &buffer[sizeof buffer - 1];
*p = '\0';
// Work with negative `int`
int an = a < 0 ? a : -a;
do {
*(--p) = digits[-(an % base)];
an /= base;
} while (an);
if (a < 0) {
*(--p) = '-';
}
size_t size_used = &buffer[sizeof(buffer)] - p;
if (size_used > size) {
fprintf(stderr, "Scant buffer %zu > %zu", size_used , size);
return NULL;
}
return memcpy(dest, p, size_used);
}

I think you are allocating perhaps too much memory. malloc(8*sizeof(int)) will give you 32 bytes on most machines, which is probably excessive for a text representation of an int.

I found an interesting resource covering several different issues with itoa implementations.
You might want to look it up too:
itoa() implementations with performance tests

I'm not quite sure where you get 8*sizeof(int) as the maximum possible number of characters -- ceil(8 / (log(10) / log(2))) yields a multiplier of 3*. Additionally, under C99 and some older POSIX platforms you can create an accurately-allocating version with snprintf():
char *
itoa(int i)
{
int n = snprintf(NULL, 0, "%d", i) + 1;
char *s = malloc(n);
if (s != NULL)
snprintf(s, n, "%d", i);
return s;
}
HTH

You should use a function in the printf family for this purpose. If you'll be writing the result to stdout or a file, use printf/fprintf. Otherwise, use snprintf with a buffer big enough to hold 3*sizeof(type)+2 bytes or more.

sprintf is quite slow; if performance matters, it is probably not the best solution.
If the base argument is a power of 2, the conversion can be done with shifting and masking, and one can avoid reversing the string by recording the digits from the highest positions. For instance, something like this for base=16:
/* assumes: unsigned value (the number), char result[] and int len = 0 */
const int bits_per_digit = 4;
const int num_digits = sizeof(int) * CHAR_BIT / bits_per_digit; /* needs <limits.h> */
const char digits[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
/* skip zeros in the highest positions, but always keep the last digit */
int i = num_digits - 1;
for (; i > 0; i--)
{
int digit = (value >> (bits_per_digit*i)) & 15;
if ( digit > 0 ) break;
}
for (; i >= 0; i--)
{
int digit = (value >> (bits_per_digit*i)) & 15;
result[len++] = digits[digit];
}
For decimals there is a nice idea to use a static array big enough to record the numbers in the reversed order, see here
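Since the link is not available here, the following is only my own sketch of the reversed-order idea, not the referenced implementation. It uses a local scratch array rather than a static one and assumes unsigned int can hold the magnitude of INT_MIN, which is true on common platforms.

```c
#include <stddef.h>

/* Hypothetical sketch: writes the decimal form of value into dest, which
 * must have room for 3 * sizeof(int) + 2 bytes, and returns the length. */
size_t int_to_dec(char *dest, int value)
{
    char rev[3 * sizeof(int) + 1];   /* digits only, least significant first */
    size_t n = 0, len = 0;
    /* Work on the magnitude as unsigned so that INT_MIN is handled. */
    unsigned int v = value < 0 ? 0u - (unsigned int)value
                               : (unsigned int)value;

    do {                             /* do-while so that 0 yields "0" */
        rev[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (value < 0)
        dest[len++] = '-';
    while (n > 0)                    /* copy back in the right order */
        dest[len++] = rev[--n];
    dest[len] = '\0';
    return len;
}
```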

Integer-to-ASCII needs to convert data from a standard integer type into an ASCII string.
All operations need to be performed using pointer arithmetic, not array indexing.
The number you wish to convert is passed in as a signed 32-bit integer.
You should be able to support bases 2 to 16 by specifying the integer value of the base you wish to convert to (base).
Copy the converted character string to the uint8_t* pointer passed in as a parameter (ptr).
The signed 32-bit number will have a maximum string size (Hint: Think base 2).
You must place a null terminator at the end of the converted c-string.
Function should return the length of the converted data (including a negative sign).
Example my_itoa(ptr, 1234, 10) should return an ASCII string length of 5 (including the null terminator).
This function needs to handle signed data.
You may not use any string functions or libraries.
.
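uint8_t my_itoa(int32_t data, uint8_t *ptr, uint32_t base){
    uint8_t tmp[33];    /* 32 binary digits plus a sign, stored in reverse */
    uint8_t cnt=0,sgnd=0;
    uint32_t mag;
    if(!ptr||base<2||base>16){return 0;}
    /* take the magnitude in unsigned arithmetic so INT32_MIN is safe */
    if(data<0){sgnd=1;mag=0u-(uint32_t)data;}
    else{mag=(uint32_t)data;}
    /* do/while so that 0 is converted to "0" instead of an empty string */
    do{
        if(mag%base<10){*(tmp+cnt)=(uint8_t)((mag%base)+48);}
        else{*(tmp+cnt)=(uint8_t)((mag%base)+55);}
        mag/=base;
        cnt++;
    }while(mag!=0);
    if(sgnd){*(tmp+cnt)=45;++cnt;}
    /* my_reverse and my_memcopy are helpers defined elsewhere */
    my_reverse(tmp, cnt);
    my_memcopy(tmp,ptr,cnt);
    *(ptr+cnt)='\0';    /* ptr must have room for 34 bytes in base 2 */
    return ++cnt;       /* length including the null terminator */
}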
ASCII-to-Integer needs to convert data back from an ASCII represented string into an integer type.
All operations need to be performed using pointer arithmetic, not array indexing
The character string to convert is passed in as a uint8_t * pointer (ptr).
The number of digits in your character set is passed in as a uint8_t integer (digits).
You should be able to support bases 2 to 16.
The converted 32-bit signed integer should be returned.
This function needs to handle signed data.
You may not use any string functions or libraries.
.
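int32_t my_atoi(uint8_t *ptr, uint8_t digits, uint32_t base){
    int32_t sgnd=0, rslt=0;
    for(int i=0; i<digits && *(ptr+i); i++){
        uint8_t c=*(ptr+i);
        /* a leading '-' only sets the sign; the input string is not modified */
        if(i==0 && c=='-'){sgnd=1;continue;}
        rslt*=(int32_t)base;
        if(c>='a'){rslt+=c-'a'+10;}     /* lowercase digits a-f */
        else if(c>'9'){rslt+=c-'7';}    /* 'A'-10 == '7', so 'A'..'F' map to 10..15 */
        else{rslt+=c-'0';}
    }
    if(sgnd){rslt=-rslt;}
    return rslt;
}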

I don't know about good, but this is my implementation that I did while learning C
static int ft_getintlen(int value)
{
int l;
int neg;
l = 1;
neg = 1;
if (value < 0)
{
value *= -1;
neg = -1;
}
while (value > 9)
{
l++;
value /= 10;
}
if (neg == -1)
{
return (l + 1);
}
return (l);
}
static int ft_isneg(int n)
{
if (n < 0)
return (-1);
return (1);
}
static char *ft_strcpy(char *dest, const char *src)
{
unsigned int i;
i = 0;
while (src[i] != '\0')
{
dest[i] = src[i];
i++;
}
dest[i] = src[i];
return (dest);
}
char *ft_itoa(int n)
{
size_t len;
char *instr;
int neg;
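neg = ft_isneg(n);
if (n == -2147483648)
{
/* INT_MIN cannot be negated, so it gets its own 12-byte buffer */
instr = (char *)malloc(12);
if (!instr)
return (NULL);
return (ft_strcpy(instr, "-2147483648"));
}
len = ft_getintlen(n);
instr = (char *)malloc((sizeof(char) * len) + 1);
if (!instr)
return (NULL);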
if (neg == -1)
n *= -1;
instr[len--] = 0;
if (n == 0)
instr[len--] = 48;
while (n)
{
instr[len--] = ((n % 10) + 48);
n /= 10;
}
if (neg == -1)
instr[len] = '-';
return (instr);
}

This should work:
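#include <stdio.h>
#include <stdlib.h>
#include <math.h>
char * itoa_alloc(int x) {
    int s = x < 0 ? 1 : 0; // space for a leading -
    // digit count is floor(log10(|x|)) + 1; 0 has one digit.
    // fabs on a double avoids the overflow of abs(INT_MIN).
    size_t len = x == 0 ? 1 : (size_t) floor( log10( fabs((double) x) ) ) + 1;
    char * str = malloc(len + s + 1);
    if (str != NULL)
        sprintf(str, "%i", x);
    return str;
}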
If you don't want to have to use the math/floating point functions (and have to link in the math libraries) you should be able to find non-floating point versions of log10 by searching the Web and do:
size_t len = my_log10( abs(x) ) + 1;
That might give you 1 more byte than you needed, but you'd have enough.
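One possible integer-only my_log10, as a sketch: it counts how many times the value can be divided by 10, which equals floor(log10(n)) for n >= 1 and is taken to be 0 for n == 0. The wrapper itoa_alloc_int is a made-up name showing how it would plug into the allocation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Integer stand-in for floor(log10(n)); returns 0 for any n below 10. */
static unsigned my_log10(unsigned n)
{
    unsigned r = 0;
    while (n >= 10) {
        n /= 10;
        r++;
    }
    return r;
}

/* Hypothetical wrapper: allocates just enough for the digits, an optional
 * '-' and the terminating '\0'. The caller frees the result. */
char *itoa_alloc_int(int x)
{
    /* magnitude as unsigned, so the most negative int does not overflow */
    unsigned mag = x < 0 ? 0u - (unsigned)x : (unsigned)x;
    size_t len = my_log10(mag) + 1;
    char *str = malloc(len + (x < 0 ? 1 : 0) + 1);
    if (str != NULL)
        sprintf(str, "%d", x);
    return str;
}
```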

There are a couple of suggestions I might make. You can use a static buffer and strdup to avoid repeatedly allocating too much memory on subsequent calls. I would also add some error checking.
char *itoa(int i)
{
static char buffer[12];
if (snprintf(buffer, sizeof(buffer), "%d", i) < 0)
return NULL;
return strdup(buffer);
}
If this will be called in a multithreaded environment, remove "static" from the buffer declaration.

This is chux's code without safety checks and the ifs:
char* itostr(char * const dest, size_t const sz, int a, int const base) {
bool posa = a >= 0;
char buffer[sizeof a * CHAR_BIT + 1];
static const char digits[36] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char* p = &buffer[sizeof buffer - 1];
do {
*(p--) = digits[abs(a % base)];
a /= base;
} while (a);
*p = '-';
p += posa;
size_t s = &buffer[sizeof(buffer)] - p;
memcpy(dest, p, s);
dest[s] = '\0';
return dest;
}

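#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int i=1234;
    char stmp[12]; /* room for any 32-bit int, its sign and the '\0' */
#if _MSC_VER
    puts(_itoa(i,stmp,10));
#else
    puts((sprintf(stmp,"%d",i),stmp));
#endif
    return 0;
}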
