Unintended output character "p" when reversing a DNA string in C

The intended output is to first reverse the whole DNA string, and then convert A <-> T, C <-> G.
However, in the actual output, the first character prints as "p", which is coming out of nowhere, but the rest of the output string is fine.
Here's the code:
#include <stdio.h>
#include <string.h>
int main() {
const char dna[] = "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT"
"TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG"
"GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT"
"CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA"
"AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT";
int dna_len = strlen(dna);
char rev_comp[dna_len+1];
char temp = '\0';
char temp_dna[dna_len+1];
for (int i = 0; i < dna_len + 1; i++) {
temp_dna[i] = dna[dna_len - i];
if (temp_dna[i] == 'A') {
temp = 'T';
rev_comp[i] = temp;
}
else if (temp_dna[i] == 'T') {
temp = 'A';
rev_comp[i] = temp;
}
else if (temp_dna[i] == 'C') {
temp = 'G';
rev_comp[i] = temp;
}
else if (temp_dna[i] == 'G') {
temp = 'C';
rev_comp[i] = temp;
}
}
rev_comp[dna_len+1] = '\0';
printf("original: %s\n", dna);
printf("rev_comp: %s\n", rev_comp);
return 0;
}

@TedLyngmo has already pointed out the indexing errors in your original code, but another thing worth considering is reusing the code you write here in other programs later. Rather than writing specialized code over and over again for each program, identify the parts you are likely to need again and put each of them in a short function.
You will likely have the need to reverse a string more times than just in this program, so writing a reusable function to reverse a string that you can use wherever it is needed makes sense. Depending on your career path, you may also have the need to transform A <-> T, C <-> G more than in this one program, so a short function to do that may make sense as well.
Caveat: if utmost efficiency is required (dealing with strings of billions of characters), then it would make sense to combine the operations and do both in a single pass over the DNA sequence. By working from each end of the string toward the middle you can handle two characters per iteration, halving the number of iterations needed.
To make reusable functions for both the reversal and the transform of the string, you can write them as follows. The string reversal function shows how to work from each end toward the middle, so it needs only half as many iterations as the string has characters:
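A sketch of that combined approach is below. The names revcomp() and complement() are hypothetical, not part of this answer's code, and characters other than A, C, G and T are copied through unchanged, which is an assumption about how you want to treat them.

```c
#include <string.h>

/* Complement one base: A <-> T, C <-> G; any other character is returned unchanged. */
static char complement(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return c;
    }
}

/* Hypothetical helper: write the reverse complement of src into dest in one
   pass, handling two characters per iteration (one from each end).
   dest must have room for strlen(src) + 1 chars and must not overlap src. */
void revcomp(char *dest, const char *src)
{
    size_t begin = 0, end = strlen(src);     /* end is one past the last char */

    dest[end] = '\0';                        /* nul-terminate dest */
    while (begin < end) {
        end--;
        dest[begin] = complement(src[end]);  /* end to begin, complemented */
        dest[end] = complement(src[begin]);  /* begin to end, complemented */
        begin++;
    }
}
```

For the dna[] array from the question, declaring char rev_comp[sizeof dna]; and calling revcomp(rev_comp, dna); gives the same result as calling strrev() followed by xformATCG().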
#include <stdio.h>
#include <string.h>
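/* reverse src into dest, copying 2 characters per iteration.
 * (Identifiers beginning with "str" plus a lowercase letter are reserved
 * for the C library, and some libraries already ship a non-standard
 * strrev(); rename this function if it clashes.) */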
void strrev(char *dest, const char *src)
{
size_t begin = 0, end = strlen(src); /* begin and 1-past-end indexes */
dest[end] = 0; /* nul-terminate dest */
for(; begin < end--; ++begin) {
dest[begin] = src[end]; /* end to begin */
dest[end] = src[begin]; /* begin to end */
}
}
/* transform A <-> T, C <-> G */
void xformATCG (char *s)
{
do {
if (*s == 'A')
*s = 'T';
else if (*s == 'T')
*s = 'A';
else if (*s == 'C')
*s = 'G';
else if (*s == 'G')
*s = 'C';
} while (*s++);
}
If you like, you can write a simple print function that breaks long lines of output after a given number of characters, similar to how you laid out your initialization of dna[]. For what it's worth, you could add:
/* simple print with break at brk chars function */
void prnwbrk (const char *s, size_t brk)
{
size_t n = 0; /* counter */
while (s[n]) { /* loop until end-of-string */
if (n && n % brk == 0) /* if brk chars, output \n */
putchar ('\n');
putchar (s[n++]); /* output char */
}
putchar ('\n'); /* final \n */
}
Now reversing and transforming the string simply becomes a matter of calling strrev() and xformATCG() in main(). You can output between each operation to check each step (which makes debugging a bit easier). A short main() could be:
int main (void) {
const char dna[] = "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT"
"TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG"
"GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT"
"CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA"
"AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT";
char rev_comp[sizeof dna];
prnwbrk (dna, 50); /* print original dna */
putchar ('\n');
strrev (rev_comp, dna); /* reverse and print */
prnwbrk (rev_comp, 50);
putchar ('\n');
xformATCG (rev_comp); /* transform chars and print */
prnwbrk (rev_comp, 50);
}
Example Use/Output
If I understood your question and the operations properly, the reversed and transformed strings would look like:
$ ./bin/revdna
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT

TAAGTTAACAATAATAATACAGGATGTTCGTAATTAATTAATTGTGTGAA
ATCATCCATACAAGCGGACATTATAACTTGCATCCACGCTATTTATTATC
TTACTCCGTCCTTAGTTTCTGTCTATGACGCTGTATCCCACGAGGCCGAG
GTCGCAGAGCGTTACGATAGCGCACGTGTGGGGGGTCTGCTTTTATGGTT
TACGTACCTCTCGAGGGCACTCACCAATTATCCCACTATCTGGACACTAG

ATTCAATTGTTATTATTATGTCCTACAAGCATTAATTAATTAACACACTT
TAGTAGGTATGTTCGCCTGTAATATTGAACGTAGGTGCGATAAATAATAG
AATGAGGCAGGAATCAAAGACAGATACTGCGACATAGGGTGCTCCGGCTC
CAGCGTCTCGCAATGCTATCGCGTGCACACCCCCCAGACGAAAATACCAA
ATGCATGGAGAGCTCCCGTGAGTGGTTAATAGGGTGATAGACCTGTGATC
Nothing wrong with doing it all in main(), but thinking ahead can save you from reinventing the wheel each time you need to do the same thing in another program. (Additionally, writing and debugging a function once prevents new bugs from slipping in when you would otherwise rewrite it later.)
Look things over and let me know if you have further questions.

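Your loop is wrong: it runs up to and including dna_len (where the null terminator is). On the first iteration temp_dna[0] = dna[dna_len] is the '\0', which matches none of your if branches, so rev_comp[0] is never assigned and keeps whatever garbage was in that memory; that uninitialized byte is what shows up as the stray p. It should be: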
for (int i = 0; i < dna_len; i++) { // corrected loop
temp_dna[i] = dna[dna_len - i - 1]; // corrected calculation
Also, the final null terminator in rev_comp should be assigned at index dna_len, not dna_len + 1, which is out of bounds, so your program has undefined behavior. Printing p is one possible outcome of undefined behavior.
rev_comp[dna_len] = '\0';
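Putting both corrections together, the question's loop might look like the sketch below. It is wrapped in a hypothetical function, reverse_complement(), so it can be tested on its own, and the temp and temp_dna variables are dropped because they are not needed. The final else branch, which copies any character other than A, C, G or T unchanged, is an assumption added so that no element of rev_comp is ever left uninitialized.

```c
#include <string.h>

/* Hypothetical wrapper around the corrected loop from the question.
   rev_comp must have room for strlen(dna) + 1 chars. */
void reverse_complement(const char *dna, char *rev_comp)
{
    int dna_len = (int)strlen(dna);

    for (int i = 0; i < dna_len; i++) {      /* stop before the '\0' */
        char c = dna[dna_len - i - 1];       /* read dna from the end backwards */
        if (c == 'A')
            rev_comp[i] = 'T';
        else if (c == 'T')
            rev_comp[i] = 'A';
        else if (c == 'C')
            rev_comp[i] = 'G';
        else if (c == 'G')
            rev_comp[i] = 'C';
        else
            rev_comp[i] = c;                 /* assumption: keep unknown chars */
    }
    rev_comp[dna_len] = '\0';                /* last valid index, not dna_len + 1 */
}
```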
You can also make a small helper function that does only the reversing, before you start swapping characters in the string. Small dedicated functions that do one thing only make debugging easier later, since it is then easier to isolate and find problems. Example:
void rev(const char *in, size_t len, char *out) {
for(size_t i = 0; i < len; ++i) {
out[len - i - 1] = in[i];
}
out[len] = '\0';
}
And call it with
rev(dna, dna_len, rev_comp);
before swapping the letters:
for (size_t i = 0; i < dna_len; ++i) {
switch(rev_comp[i]) {
case 'A': rev_comp[i] = 'T'; break;
case 'T': rev_comp[i] = 'A'; break;
case 'G': rev_comp[i] = 'C'; break;
case 'C': rev_comp[i] = 'G'; break;
}
}


What is the easiest and most efficient way to remove spaces from a string in C?
Easiest and most efficient don't usually go together…
Here's a possible solution for in-place removal:
void remove_spaces(char* s) {
char* d = s;
do {
while (*d == ' ') {
++d;
}
} while (*s++ = *d++);
}
Here's a very compact, but entirely correct version:
do while(isspace(*s)) s++; while(*d++ = *s++);
And here, just for my amusement, are code-golfed versions that aren't entirely correct, and get commenters upset.
If you can risk some undefined behavior, and never have empty strings, you can get rid of the body:
while(*(d+=!isspace(*s++)) = *s);
Heck, if by space you mean just space character:
while(*(d+=*s++!=' ')=*s);
Don't use that in production :)
As we can see from the answers posted, this is surprisingly not a trivial task. When faced with a task like this, it would seem that many programmers choose to throw common sense out the window, in order to produce the most obscure snippet they possibly can come up with.
Things to consider:
You will want to make a copy of the string, with spaces removed. Modifying the passed string is bad practice; it may be a string literal. Also, there are sometimes benefits to treating strings as immutable objects.
You cannot assume that the source string is not empty. It may contain nothing but a single null termination character.
The destination buffer can contain any uninitialized garbage when the function is called. Checking it for null termination doesn't make any sense.
Source code documentation should state that the destination buffer needs to be large enough to contain the trimmed string. The easiest way to do so is to make it as large as the untrimmed string.
The destination buffer needs to hold a null terminated string with no spaces when the function is done.
Consider if you wish to remove all white space characters or just spaces ' '.
C programming isn't a competition over who can squeeze in as many operators on a single line as possible. It is rather the opposite, a good C program contains readable code (always the single-most important quality) without sacrificing program efficiency (somewhat important).
For this reason, you get no bonus points for hiding the insertion of null termination of the destination string, by letting it be part of the copying code. Instead, make the null termination insertion explicit, to show that you haven't just managed to get it right by accident.
What I would do:
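#include <ctype.h>
/* str_trimmed must be at least as large as str_untrimmed, and the two must not overlap. */
void remove_spaces (char* restrict str_trimmed, const char* restrict str_untrimmed)
{
  while (*str_untrimmed != '\0')
  {
    /* cast: isspace() requires a value representable as unsigned char (or EOF) */
    if(!isspace((unsigned char)*str_untrimmed))
    {
      *str_trimmed = *str_untrimmed;
      str_trimmed++;
    }
    str_untrimmed++;
  }
  *str_trimmed = '\0';
}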
In this code, the source string "str_untrimmed" is left untouched, which is guaranteed by using proper const correctness. It does not crash if the source string contains nothing but a null termination. It always null terminates the destination string.
Memory allocation is left to the caller. The algorithm should only focus on doing its intended work. It removes all white spaces.
There are no subtle tricks in the code. It does not try to squeeze in as many operators as possible on a single line. It will make a very poor candidate for the IOCCC. Yet it will yield pretty much the same machine code as the more obscure one-liner versions.
When copying something, you can however optimize a bit by declaring both pointers as restrict. This is a contract between the programmer and the compiler: the programmer guarantees that memory accessed through one pointer is not also accessed through the other, i.e. the destination and source do not overlap. The compiler can then assume that writing through str_trimmed never changes what str_untrimmed points to, which allows more aggressive optimization, such as not having to reload source characters after every store.
In C, you can replace some strings in place, for example a string returned by strdup() (POSIX, and standard C since C23):
char *str = strdup(" a b c "); /* check for NULL in real code */
char *write = str, *read = str;
do {
if (*read != ' ')
*write++ = *read;
} while (*read++);
printf("%s\n", str);
free(str); /* strdup() allocated it */
Other strings are read-only, for example string literals in your code. You'd have to copy those to newly allocated memory and fill the copy while skipping the spaces:
const char *oldstr = " a b c ";
char *newstr = malloc(strlen(oldstr)+1); /* check for NULL in real code */
char *np = newstr;
const char *op = oldstr;
do {
if (*op != ' ')
*np++ = *op;
} while (*op++);
printf("%s\n", newstr);
free(newstr);
You can see why people invented other languages ;)
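#include <ctype.h>
/* Copy source into target, leaving out white space. target must be at least
   as large as source. Returns a pointer to the '\0' written at the end of target. */
char * remove_spaces(const char * source, char * target)
{
    while (*source)
    {
        if (!isspace((unsigned char)*source))
            *target++ = *source;
        source++;
    }
    *target = '\0';
    return target;
}
Note: this doesn't handle Unicode.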
If you are still interested, this function removes spaces from the beginning of the string (leading spaces only), and I have it working in my code:
void removeSpaces(char *str1)
{
char *str2;
str2=str1;
while (*str2==' ') str2++;
if (str2!=str1) memmove(str1,str2,strlen(str2)+1);
}
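#include <stdio.h>
#include <string.h>
/* Trims leading and trailing spaces; spaces in the middle are kept. */
int main(void)
{
    char str[] = " Nar ayan singh ";
    size_t n;
    printf("strlen str: %zu\n", strlen(str));
    /* The regions overlap, so memmove() is required here; memcpy() would be undefined */
    while (str[0] == ' ')
        memmove(str, str + 1, strlen(str + 1) + 1);
    printf("strlen str: %zu\n", strlen(str));
    /* Walk back over trailing spaces without running past the start of str */
    n = strlen(str);
    while (n > 0 && str[n - 1] == ' ')
        n--;
    str[n] = '\0';
    printf("str:%s ", str);
    printf("strlen str: %zu\n", strlen(str));
    return 0;
}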
The easiest and most efficient way to remove spaces from a string is to simply remove the spaces from the string literal. For example, use your editor to 'find and replace' "hello world" with "helloworld", and presto!
Okay, I know that's not what you meant. Not all strings come from string literals, right? Supposing this string you want spaces removed from doesn't come from a string literal, we need to consider the source and destination of your string... We need to consider your entire algorithm, what actual problem you're trying to solve, in order to suggest the simplest and most optimal methods.
Perhaps your string comes from a file (e.g. stdin) and is bound to be written to another file (e.g. stdout). If that's the case, I would question why it ever needs to become a string in the first place. Just treat it as though it's a stream of characters, discarding the spaces as you come across them...
#include <stdio.h>
int main(void) {
for (;;) {
int c = getchar();
if (c == EOF) { break; }
if (c == ' ') { continue; }
putchar(c);
}
}
By eliminating the need for storage of a string, not only does the entire program become much, much shorter, but theoretically also much more efficient.
/* Function to remove all spaces from a given string.
https://www.geeksforgeeks.org/remove-spaces-from-a-given-string/
*/
void remove_spaces(char *str)
{
int count = 0;
for (int i = 0; str[i]; i++)
if (str[i] != ' ')
str[count++] = str[i];
str[count] = '\0';
}
Code taken from the zString library:
/* search for character 's' */
int zstring_search_chr(const char *token,char s){
if (!token || s=='\0')
return 0;
for (;*token; token++)
if (*token == s)
return 1;
return 0;
}
char *zstring_remove_chr(char *str,const char *bad) {
char *src = str , *dst = str;
/* validate input */
if (!(str && bad))
return NULL;
while(*src)
if(zstring_search_chr(bad,*src))
src++;
else
*dst++ = *src++; /* assign first, then increment */
*dst='\0';
return str;
}
Example Usage
char s[]="this is a trial string to test the function.";
const char *d=" .";
printf("%s\n",zstring_remove_chr(s,d));
Example Output
thisisatrialstringtotestthefunction
Have a look at the zString code; you may find it useful:
https://github.com/fnoyanisi/zString
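This is the easiest approach I could think of:
char message[50];
size_t i, j, len;
if (fgets(message, sizeof message, stdin) != NULL) {
    len = strlen(message); /* compute the length once */
    for (i = 0, j = 0; i < len; i++) {
        message[i-j] = message[i]; /* shift left by the number of spaces seen so far */
        if (message[i] == ' ')
            j++;
    }
    message[i-j] = '\0'; /* terminate after the last kept character, not at i */
}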
Here is the simplest thing I could think of. Note that this program uses the second command-line argument (argv[1]) as the line to delete spaces from.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/* The function itself, with debug printing to help you trace through it.
   Returns a newly allocated copy of str without ' ' characters; the caller must free() it. */
char* trim(const char* str)
{
    size_t len = strlen(str);
    char* res = malloc(len + 1); /* strlen, not sizeof: sizeof(str) is just the size of a pointer */
    size_t index = 0;
    if (res == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (str[i] != ' ')
        {
            res[index] = str[i];
            index++;
        }
        res[index] = '\0'; /* keep res a valid string so it can be printed below */
        printf("End of iteration %zu\n", i);
        printf("Here is the initial line: %s\n", str);
        printf("Here is the resulting line: %s\n", res);
        printf("\n");
    }
    res[index] = '\0';
    return res;
}
int main(int argc, char* argv[])
{
    //trim function test
    if (argc < 2) {
        fprintf(stderr, "Usage: %s \"line to trim\"\n", argv[0]);
        return 1;
    }
    const char* line = argv[1];
    printf("Here is the line: %s\n", line);
    char* res = trim(line);
    if (res == NULL)
        return 1;
    printf("\nAnd here is the formatted line: %s\n", res);
    free(res);
    return 0;
}
This is implemented on a microcontroller and it works. It is not a smart way of doing it, but it should avoid all problems, and it will work :)
void REMOVE_SYMBOL(char* string, uint8_t symbol)
{
uint32_t size = LENGHT(string); // LENGHT() is my own simple string-length function (not shown); strlen() from <string.h> would do the same job
uint32_t i = 0;
uint32_t k = 0;
uint32_t loop_protection = size*size; // guarantees the loop can never run forever
while(i<size)
{
if(string[i]==symbol)
{
k = i;
while(k<size)
{
string[k]=string[k+1];
k++;
}
}
if(string[i]!=symbol)
{
i++;
}
loop_protection--;
if(loop_protection==0)
{
i = size;
break;
}
}
}
While this is not as concise as the other answers, it is very straightforward to understand for someone new to C. It is adapted from the CalculiX source code.
char* remove_spaces(char * buff, int len)
{
int i=-1,k=0;
while(1){
i++;
if((buff[i]=='\0')||(buff[i]=='\n')||(buff[i]=='\r')||(i==len)) break;
if((buff[i]==' ')||(buff[i]=='\t')) continue;
buff[k]=buff[i];
k++;
}
buff[k]='\0';
return buff;
}
I assume the C string is in fixed memory, so if you remove spaces in place you have to shift all the following characters.
The easiest approach seems to be to create a new string, iterate over the original one, and copy only the non-space characters.
I came across a variation of this question where you need to reduce runs of multiple spaces to a single space that "represents" them.
This is my solution:
char str[] = "Put Your string Here.....";
int copyFrom = 0, copyTo = 0;
printf("Start String %s\n", str);
while (str[copyTo] != 0) {
if (str[copyFrom] == ' ') {
str[copyTo] = str[copyFrom];
copyFrom++;
copyTo++;
while ((str[copyFrom] == ' ') && (str[copyFrom] !='\0')) {
copyFrom++;
}
}
str[copyTo] = str[copyFrom];
if (str[copyTo] != '\0') {
copyFrom++;
copyTo++;
}
}
printf("Final String %s\n", str);
Hope it helps :-)

Put characters from a char array in a string till a specific character is found

I'd like a reliable method to read the characters from a character array and put them in a string, until a \r is found. I can iterate through the array but have no good way to put the characters into a string. I am afraid to use malloc since, at times, it puts garbage values in the string.
Here payload is the HTTP data from a TCP packet. \r\n\r\n indicates the end of the payload.
My code so far to iterate through the character array:
void print_payload(const unsigned char *payload, int len) {
int i;
const unsigned char *ch = payload;
for (i = 0; i < len; i++) {
if (strncmp((char*) ch, "\r\n\r\n", 4) == 0) {
// Indicates end of payload data.
break;
} else if (strncmp((char*) ch, "\r\n", 2) == 0) {
//Indicates EOL
printf("\r\n");
ch++;
i++;
} else if(strncmp((char*) ch, "Host:", 5) == 0){
printf("Host: ");
const unsigned char *del = ch + 6;
int i = 0;
while (del[i] != 13 ){
/*
*13 is decimal value for '\r'.
* The characters below are to be inserted
* in a string. Not sure how though.
*/
printf("%c",del[i]);
i++;
}
} else if(strncmp((char*) ch, "User-Agent: ", 11) == 0){
/*
* It has to be implemented here as well.
* And in every case where my string matches.
*/
printf("UserAgent: ");
const unsigned char* del = ch + 11;
int i = 0;
while(del[i] != 13){
printf("%c", del[i]);
i++;
}
}
ch++;
}
printf("\r\n\r\n");
printf("\n");
return;
}
Can somebody help me achieve this? I know this is basic, but I'm still learning C programming and am not sure how to do this. Thanks in advance.
You have a few options. First, if you can limit the size of the string, and do not need it outside of the function, then a char array would work:
#define STRING_MAX_LEN 999 //chux mentions this is better than just putting "1000" in the array[]: 1000 needs to make sense in terms of the program, or be something you wish to enforce (and check!)
char newString[STRING_MAX_LEN+1] = {0}; //Initialize every element to '\0'.
There is no reason to fear malloc though; just remember to work safely and free, and you should be fine:
char *newString = malloc(sizeof(char)*(len+1)); //Better limit on needed space: +1 for a final '\0'.
if (!newString) { //Oh no! Hard fail.
//do something
}
memset(newString,0,sizeof(char)*(len+1)); //No garbage in my new string anymore!
...
...
free(newString);
//Finish up with program
You will not even have to append a '\0': you already know the buffer is full of them, so you have a valid C string. Note that sizeof(char) is always 1 by definition in C, so it is redundant; I like to keep it anyway as documentation of the element type.
Note if you have to return the new string for some reason you must use a dynamically allocated array, using malloc. Finally, if you only need to check/hold one sub-string at a time, then re-using the same string is preferable.
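For the specific task in the question (copying a header value, such as the one after Host:, into a string until the \r), a fixed-size buffer is often enough. The following is a sketch assuming that longer values may simply be truncated and that the input is not necessarily nul-terminated, so its length is passed in; the name copy_until() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: copy bytes from src into dst until `stop` is found,
   src_len bytes have been examined, or dst is full. dst is always
   nul-terminated (dst_size must be at least 1). Returns the number of
   characters copied, not counting the '\0'. */
size_t copy_until(const unsigned char *src, size_t src_len, unsigned char stop,
                  char *dst, size_t dst_size)
{
    size_t n = 0;

    while (n < src_len && src[n] != stop && n + 1 < dst_size) {
        dst[n] = (char)src[n];
        n++;
    }
    dst[n] = '\0';   /* terminate explicitly, so no memset() is needed */
    return n;
}
```

In the Host: branch of print_payload(), it could be called with del as src, the number of payload bytes remaining after del as src_len, '\r' as stop, and a char array of STRING_MAX_LEN + 1 elements as dst.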
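#include <stdio.h>
#include <string.h>
void print_payload(const unsigned char *payload, int len)
{
    const char *p = (const char *)payload;
    int i;
    /* Look for the "\r\n\r\n" that ends the headers, never reading past len */
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(&p[i], "\r\n\r\n", 4) == 0)
            break;
    }
    if (i + 4 > len)
        return; /* no end marker found */
    /* Print up to and including the marker; the precision limits how much
       printf reads, so the const buffer never has to be modified */
    printf("%.*s\n", i + 4, p);
}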

Custom STRCAT is overwhelmed by too many arguments

I am trying to code a custom strcat that separates arguments with \n except for the last one and terminates the string with \0.
It's working fine as is up to 5 arguments, but if I try passing a sixth one I get a strange line in response:
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok
ok
ok
ok
ok
ok
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok ok
ok
ok
ok
ok
ok
P/Users/domingodelmasok
Here is my custom strcat code:
char cat(char *dest, char *src, int current, int argc_nb)
{
int i = 0;
int j = 0;
while(dest[i])
i++;
while(src[j])
{
dest[i + j] = src[j];
j++;
}
if(current < argc_nb - 1)
dest[i + j] = '\n';
else
dest[i + j] = '\0';
return(*dest);
}
UPDATE: complete calling function:
char *concator(int argc, char **argv)
{
int i;
int j;
int size = 0;
char *str;
i = 1;
while(i < argc)
{
j = 0;
while(argv[i][j])
{
size++;
j++;
}
i++;
}
str = (char*)malloc(sizeof(*str) * (size + 1));
i = 1;
while(i < argc)
{
cat(str, argv[i], i, argc);
i++;
}
free(str);
return(str);
}
What's wrong here?
Thanks!
Edit: Fixed blunder.
There are quite a few issues with the code:
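sizeof (char) == 1 by the C standard, so the sizeof(*str) factor in the malloc() call is unnecessary.
cat() requires the destination to be a string (terminated by a \0), but does not append one itself (except when current >= argc_nb - 1). This is a bug. Worse, the buffer returned by malloc() in concator() is never initialized, so the very first while(dest[i]) scans uninitialized memory.
size counts only the characters of the arguments, not the \n separators between them, so the buffer is too small by argc - 2 bytes whenever there are two or more arguments.
free(str); return str; is a use-after-free bug. Once free(str) is called, the memory str points to is released and must not be accessed anymore. The free(str) should simply be removed; it is the caller's job to free() the result when done with it.
Arrays in C are indexed from 0. However, the concator() function skips the first string pointer (because argv[0] contains the name used to execute the program). This is a poor design and will eventually trip someone up. Instead, have concator() add all strings in the array, but call it using concator(argc - 1, argv + 1);.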
There might be even more, but at this point, I believe a rewrite from scratch, using a much more appropriate approach, is in order.
Consider the following join() function:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
char *join(const size_t parts, const char *part[],
const char *separator, const char *suffix)
{
const size_t separator_len = (separator) ? strlen(separator) : 0;
const size_t suffix_len = (suffix) ? strlen(suffix) : 0;
size_t total_len = 0;
size_t p;
char *dst, *end;
/* Calculate sum of part lengths */
for (p = 0; p < parts; p++)
if (part[p])
total_len += strlen(part[p]);
/* Add separator lengths */
if (parts > 1)
total_len += (parts - 1) * separator_len;
/* Add suffix length */
total_len += suffix_len;
/* Allocate enough memory, plus end-of-string '\0' */
dst = malloc(total_len + 1);
if (!dst)
return NULL;
/* Keep a pointer to the current end of the result string */
end = dst;
/* Append each part */
for (p = 0; p < parts; p++) {
/* Insert separator */
if (p > 0 && separator_len > 0) {
memcpy(end, separator, separator_len);
end += separator_len;
}
/* Insert part */
if (part[p]) {
const size_t len = strlen(part[p]);
if (len > 0) {
memcpy(end, part[p], len);
end += len;
}
}
}
/* Append suffix */
if (suffix_len > 0) {
memcpy(end, suffix, suffix_len);
end += suffix_len;
}
/* Terminate string. */
*end = '\0';
/* All done. */
return dst;
}
The logic is simple. First, we find out the length of each component. Note that separator is only added between parts (so occurs parts-1 times), and suffix at the very end.
(The (string) ? strlen(string) : 0 idiom just means "if string is non-NULL, strlen(string), otherwise 0". We do that because we allow a NULL separator and suffix, but strlen(NULL) is Undefined Behaviour.)
Next, we allocate enough memory for the result, including the end-of-string NUL char, \0, that was not included in the lengths.
To append each part, we keep the result pointer intact, and instead use a temporary end pointer. (It is the end of the string thus far.) We use a loop, where we copy the next part to the end. Before the second and subsequent parts, we copy the separator before the part.
Next, we copy the suffix, and finally the end-of-string '\0'. (It is important to return a pointer to the beginning of the string, rather than end, of course; and that is why we kept dst to point to the new resulting string, and end at the point we appended each substring.)
You could use it from the command line using for example the following main():
int main(int argc, char *argv[])
{
char *result;
if (argc < 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s SEPARATOR SUFFIX PART [ PART ... ]\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
result = join(argc - 3, (const char **)(argv + 3), argv[1], argv[2]);
if (!result) {
fprintf(stderr, "Failed.\n");
return EXIT_FAILURE;
}
fputs(result, stdout);
free(result);
return EXIT_SUCCESS;
}
If you compile the above to e.g. example (I use gcc -Wall -O2 example.c -o example), then running
./example ', ' $'!\n' Hello world
in a Bash shell outputs
Hello, world!
(with a newline at end). Running
./example ' and ' $'.\n' a b c d e f g
outputs
a and b and c and d and e and f and g
(again with a newline at end). The $'...' is just a Bash idiom to specify special characters in strings; $'!\n' is the same in Bash as "!\n" is in C, and $'.\n' is the Bash equivalent of ".\n" in C.
(Removing the automatic newline between parts, and allowing a string rather than just one char to be used as a separator and suffix, was a deliberate choice for two reasons. The main one is to stop anyone from just copy-pasting this as an answer to some exercise. The secondary one is to show that while it might sound more complicated than just using single characters for them, it is actually very little additional code; and if you consider the practical use cases, allowing a string to be used as the separator opens up a lot of options.)
The example code above is only very lightly tested, and might contain bugs. If you find any, or disagree with anything I've written above, do let me know in a comment so I can review, and fix as necessary.

void function that removes all the non-alphabet chars

I am trying to write a program that gets several strings until it gets the 'Q' string (this string basically stops the scanf).
Each one of the strings is sent to a function that removes everything except the letters. For example, if I scan 'AJUYFEG78348' the printf should be 'AJUYFEG'.
The problem is that the function has to be void.
I have tried several ways to make the "new array with only letters" printed, but none of them worked.
(It is not allowed to use the strlen function.)
#include <stdio.h>
void RemoveNonAlphaBetChars(char*);
int main()
{
int flag=1;
char array[100]={0};
while (flag == 1)
{
scanf("%s", &array);
if(array[0] == 'Q' && array[1] =='\0') {
flag=0;
}
while (flag == 1)
{
RemoveNonAlphaBetChars(array);
}
}
return 0;
}
void RemoveNonAlphaBetChars(char* str)
{
int i=0, j=0;
char new_string[100]={0};
for (i=0; i<100; i++)
{
if (((str[i] >= 'a') && (str[i] <= 'z')) || ((str[i] >= 'A') && (str[i] <= 'Z')))
{
new_string[j] = str[i];
j++;
}
}
printf("%s", new_string);
return;
}
The fact that the function has only one argument, non-const char pointer, hints at the fact that the string is going to be changed in the call (better document it anyway), and it's perfectly all right.
A few fixes to your code can make it right:
First, don't loop to the end of the buffer, just to the end of the string (without strlen, it's probably faster too):
for (i=0; str[i] != '\0'; i++)
then don't forget to nul-terminate the new string after your processing:
new_string[j] = '\0';
Then, at the end (where you're printing the string), copy the new string into the old string. Since it is never longer than the original, there's no risk of overflow (strcpy() is declared in <string.h>):
strcpy(str,new_string);
now str contains the new stripped string.
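Putting those three fixes together, the function might look like the sketch below. The printf() is left to the caller so the function does only one job (an assumption about how you want to split the work), and, like the question's code, it assumes the input holds fewer than 100 characters.

```c
#include <string.h>

/* The question's function with the fixes applied: loop only to the end of
   the string, nul-terminate new_string, and copy the result back into str.
   Assumption: str holds fewer than 100 characters, like the question's buffer. */
void RemoveNonAlphaBetChars(char *str)
{
    int i, j = 0;
    char new_string[100] = {0};

    for (i = 0; str[i] != '\0'; i++) {
        if ((str[i] >= 'a' && str[i] <= 'z') || (str[i] >= 'A' && str[i] <= 'Z')) {
            new_string[j] = str[i];
            j++;
        }
    }
    new_string[j] = '\0';
    strcpy(str, new_string);   /* the result is never longer than str */
}
```

In main(), the inner while (flag == 1) around the call would then become an if followed by printf("%s\n", array), otherwise the same string is processed forever; scanf("%99s", array) (no &, with a field width) also avoids overflowing array.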
Another approach would be to work in-place (without another buffer): each time you encounter a character to remove, copy the rest of the string at this position, and repeat. It can be inefficient if there are a lot of characters to remove, but uses less memory.
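A minimal sketch of that in-place approach (the function name is hypothetical): when a non-letter is found, memmove() shifts the rest of the string, including its '\0', one position to the left, and the same index is examined again. Each removal costs a pass over the remainder of the string, which is where the inefficiency comes from.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch of the "shift the rest down" approach: O(n^2) in the
   worst case, but it needs no second buffer. */
void remove_non_alpha_shift(char *str)
{
    size_t i = 0;

    while (str[i] != '\0') {
        if (isalpha((unsigned char)str[i]))
            i++;   /* keep this letter and move on */
        else       /* drop it: move the tail (with its '\0') one place left */
            memmove(&str[i], &str[i + 1], strlen(&str[i + 1]) + 1);
    }
}
```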
The key here is that you are never inserting new characters into the string. That guarantees that the input buffer is large enough to hold the result. It also makes for an easy in-place solution, which is what the void return type is implying.
#include <ctype.h>
#include <stdio.h>
...
void RemoveNonAlphaBetChars(char* str)
{
char *from, *to;
for(from = to = str; *from; from++) {
if(isalpha((unsigned char)*from)) {
if(from > to) *to = *from;
to++;
}
}
*to = *from;
printf("%s\n", str);
return;
}
The pointer from steps along the string until it points to a NUL character, hence the simple condition in the loop. to only receives the value of from if it is a letter. The final copy after the loop ensures NUL termination.
Update
If you are dealing with 1) particularly large strings, and 2) you have long stretches of letters with some numbers in between, and 3) your version of memmove is highly optimized compared to copying things manually (e.g. with a special processor instruction), you can do the following:
#include <stdio.h>
#include <ctype.h>
#include <string.h>
...
void RemoveNonAlphaBetChars(char* str)
{
char *from, *to, *end;
size_t len;
for(from = to = str; *from; from = end) {
for(; *from && !isalpha((unsigned char)*from); from++) ;
for(end = from; *end && isalpha((unsigned char)*end); end++) ;
len = end - from;
if(from > to) {
if(len > 1) {
memmove(to, from, len);
} else {
*to = *from;
}
}
to += len;
}
*to = '\0'; /* not *end: end is never set when str is empty */
printf("%s\n", str);
return;
}
The general idea is to find the limits of each range of letters (between from and end), and copy into to block by block. As I stated before though, this version should not be used for the general case. It will only give you a boost when there is a huge amount of data that meets particular conditions.
void return type is a common approach to making functions that produce C string results. You have two approaches to designing your API:
Make a non-destructive API that takes output buffer and its length, or
Make an API that changes the string in place.
The first approach would look like this:
void RemoveNonAlphaBetChars(const char* str, char *result, size_t resultSize) {
...
}
Use result in place of new_string, and make sure you do not go past resultSize. The call would look like this:
if (flag == 1) { // if (flag == 1), not while (flag == 1)
char result[100];
RemoveNonAlphaBetChars(array, result, 100);
printf("%s\n", result);
}
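The body of the first approach is elided above; one possible body is sketched here. It assumes resultSize is at least 1 and simply drops letters that do not fit, which you may prefer to handle differently (for example by reporting an error).

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch of the non-destructive API: copy only the letters of str into
   result, writing at most resultSize - 1 letters plus a terminating '\0'.
   Assumption: resultSize >= 1; letters that do not fit are dropped. */
void RemoveNonAlphaBetChars(const char *str, char *result, size_t resultSize)
{
    size_t j = 0;

    for (; *str != '\0'; str++) {
        if (isalpha((unsigned char)*str) && j + 1 < resultSize)
            result[j++] = *str;
    }
    result[j] = '\0';
}
```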
If you decide to use the second approach, move printf into main, and use strcpy to copy the content of new_string back into str:
strcpy(str, new_string);

Print out all possible strings in C using recursion

My assignment is to write code that takes a string as input and, if there are x's in the string, replaces each of them with either a 0 or a 1 and prints out all the possible string combinations. We have to use recursion for this as well. For example, if the input string was "1x0X" the output would be:
1000
1001
1100
1101
I'm really struggling with how to find all the permutations of the string without having the complete string yet. I have a series of functions that combine to print out all permutations of a list of numbers, but I don't know how to write a function that only permutes certain elements of a list.
Does anyone have any suggestions on how to accomplish this?
Jonathan's initial suggestion
This code implements what I suggested in a comment essentially verbatim. It accepts either x or X as a valid marker because the examples in the question do too.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void map_x(const char *str)
{
size_t xloc = strcspn(str, "xX");
if (str[xloc] == '\0')
printf("%s\n", str);
else
{
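char *copy = strdup(str); /* strdup() is POSIX; standard C only since C23 */
if (copy == NULL)
return; /* out of memory: abandon this branch */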
copy[xloc] = '0';
map_x(copy);
copy[xloc] = '1';
map_x(copy);
free(copy);
}
}
int main(void)
{
char buffer[4096];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
buffer[strcspn(buffer, "\n")] = '\0';
map_x(buffer);
}
return 0;
}
The main() function is essentially the same in all three variants. The use of strcspn() is a standard idiom that trims everything from the first newline onwards, or harmlessly overwrites the terminating null byte with another null byte if there is no newline in the string.
Note that this solution is safe even if a read-only string literal is passed to the function; it does not modify the string that it is passed. The following solutions will both crash or otherwise fail if the initial string is in fact a read-only string literal.
It would be possible to determine the string length, allocate a VLA (variable length array) to take the string copy, and copy the string into the VLA. That would dramatically reduce the cost of allocating memory for the string (VLA allocation is much simpler than a general purpose memory allocator).
Gene's Suggestion
This code implements what Gene suggested in a comment. It will be more efficient because it does no extra memory allocation, an expensive operation on most systems.
#include <stdio.h>
#include <string.h>
static void map_x(char *str)
{
size_t xloc = strcspn(str, "xX");
if (str[xloc] == '\0')
printf("%s\n", str);
else
{
char letter = str[xloc];
str[xloc] = '0';
map_x(str);
str[xloc] = '1';
map_x(str);
str[xloc] = letter;
}
}
int main(void)
{
char buffer[4096];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
buffer[strcspn(buffer, "\n")] = '\0';
map_x(buffer);
}
return 0;
}
Mildly optimized variant
This optimizes the work by not rescanning the prefix that is already known to be free of x's.
/* SO 4764-4683 */
#include <stdio.h>
#include <string.h>
static void map_x(char *str, size_t offset)
{
size_t xloc = strcspn(&str[offset], "xX") + offset;
if (str[xloc] == '\0')
printf("%s\n", str);
else
{
char letter = str[xloc];
str[xloc] = '0';
map_x(str, xloc);
str[xloc] = '1';
map_x(str, xloc);
str[xloc] = letter;
}
}
int main(void)
{
char buffer[4096];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
buffer[strcspn(buffer, "\n")] = '\0';
map_x(buffer, 0);
}
return 0;
}
The difference in performance is probably not measurable on almost any input simply because the I/O time will dominate.
With all due respect to chux, I think that the code in the answer is more complex than necessary. The extra data structure seems like overkill.
Recursion is best used when the recursive depth is limited and not too big.
Yet setting aside that axiom, below is a double recursive solution. It eats up stack space quickly (that is its biggest constraint), so it is not a robust solution. Yet it gets the job done.
For "code takes a string as input,", use fgets() - not shown.
Yet in the spirit of recursion why not recurse the input too? The print_combo() recursion produces a linked list (LL) of characters and keeps track of the number of 'x' read. Once an end-of-line/end-of-file occurs, it is time to print and the linked-list starts with the last character.
The foo() recursion prints the LL in reverse order, passing in a binary mask to direct the x substitution of 0 or 1. The unsigned binary mask limits the input to fewer x's than unsigned has bits (typically at most 31, since 1u << 32 is undefined for a 32-bit unsigned). That is another restriction.
#include <stdio.h> /* getchar(), putchar(), EOF, NULL */
typedef struct node {
const struct node *prev;
int ch;
} node;
// Print the line
void foo(const node *prev, unsigned mask) {
if (prev) {
if (prev->ch == 'x' || prev->ch == 'X') {
foo(prev->prev, mask >> 1);
putchar("01"[mask & 1]);
} else {
foo(prev->prev, mask);
putchar(prev->ch);
}
}
}
// Read, form the LL and then print
void print_combo(const node *prev, unsigned xcount) {
node n = {.prev = prev, .ch = getchar()};
if (n.ch == '\n' || n.ch == EOF) {
for (unsigned mask = 0; mask < (1u << xcount); mask++) {
foo(prev, mask);
putchar('\n');
}
} else {
print_combo(&n, xcount + (n.ch == 'x' || n.ch == 'X'));
}
}
int main(void) {
print_combo(NULL, 0);
}
Input
00x01x10x11
Output
00001010011
00001010111
00001110011
00001110111
00101010011
00101010111
00101110011
00101110111
I would do something a bit simpler. Just use a position parameter to iterate over the input string. Whenever you hit an 'x' character, recurse twice, once for '0' and once for '1', and make sure to reset the character to its original value ('x' or 'X') after you return. Whenever you hit a digit character, just recurse once. Increment the position parameter each time you recurse. When you hit the end of the string, print it out. With this idea, you'd get something like this:
#include <stdio.h>
#include <string.h> /* for strcpy() */
void print_combo(char *str, int pos) {
char c;
c = str[pos];
switch (c) {
case '0':
case '1':
print_combo(str, pos+1);
break;
case 'x':
case 'X':
str[pos] = '0';
print_combo(str, pos+1);
str[pos] = '1';
print_combo(str, pos+1);
str[pos] = c;
break;
case '\0':
printf("%s\n", str);
break;
default:
printf("bad input\n");
break;
}
}
int main(void) {
char str[10];
strcpy(str, "1x0x");
printf("printing %s\n", str);
print_combo(str, 0);
strcpy(str, "0x01x");
printf("printing %s\n", str);
print_combo(str, 0);
strcpy(str, "0x01x0X1");
printf("printing %s\n", str);
print_combo(str, 0);
return 0;
}
My output looks like this:
printing 1x0x
1000
1001
1100
1101
printing 0x01x
00010
00011
01010
01011
printing 0x01x0X1
00010001
00010011
00011001
00011011
01010001
01010011
01011001
01011011

Resources