I'm getting a segfault when changing dna[i] to U. I've debugged it but I still can't understand why.
Also, I was comparing the value at a position against T with strcmp, but from what I understand that's for strings, and I can simply compare a single character with dna[i] == 'T'. Is that right? Thanks.
#include <string.h>
char *dna_to_rna(char *dna) {
int size = (int)( sizeof(dna) / sizeof(dna[0]));
char *retour[size];
strcpy(retour, dna);
for (int i = 0; i < size; i++) {
if (dna[i] == 'T') {
dna[i] = 'U';
}
}
return (char *) retour;
}
int main() {
char *dna[] = {
"TTTT",
"GCAT",
"GACCGCCGCC"
};
char *actual, *expected;
size_t n;
for (n = 0; n < 3; ++n) {
actual = dna_to_rna(*(dna + n));
}
return 0;
}
You are passing to the function dna_to_rna a pointer to a string literal
actual = dna_to_rna(*(dna + n));
and then within the function you are trying to change the string literal
if (dna[i] == 'T') {
dna[i] = 'U';
}
Any attempt to change a string literal results in undefined behavior.
Also the expression with the sizeof operator in this declaration
int size = (int)( sizeof(dna) / sizeof(dna[0]));
does not make sense. It evaluates to the size of a pointer of the type char * (divided by the size of one char), not to the length of the string.
Instead you should use the standard string function strlen.
And this declaration is incorrect
char *retour[size];
At least you need a character array instead of an array of pointers.
char retour[size];
And the function returns a pointer to an array with automatic storage duration that will not be alive after exiting the function
char *dna_to_rna(char *dna) {
//...
char retour[size];
//...
return (char *) retour;
}
That is, the function returns an invalid pointer.
You should dynamically allocate a character array within the function with the length strlen( dna ) + 1 and change and return this array.
It seems what you mean is something like the following
#include <string.h>
#include <stdlib.h>
char * dna_to_rna( const char *dna )
{
size_t n = strlen( dna );
char *retour = malloc( n + 1 );
if ( retour != NULL )
{
strcpy( retour, dna );
for ( size_t i = 0; i < n; i++ )
{
if ( retour[i] == 'T' ) retour[i] = 'U';
}
}
return retour;
}
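If allocating inside the function is not wanted, one alternative is to let the caller supply the output buffer. The following is only a sketch under that assumption; the name dna_to_rna_buf and its return convention are hypothetical and not part of the answer above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: writes the RNA string into a caller-supplied buffer.
   Returns 0 on success, -1 if `out` (capacity `cap`) is too small. */
int dna_to_rna_buf(const char *dna, char *out, size_t cap)
{
    size_t n = strlen(dna);

    if (cap < n + 1)
        return -1;                      /* no room for the text plus '\0' */
    for (size_t i = 0; i < n; i++)
        out[i] = (dna[i] == 'T') ? 'U' : dna[i];
    out[n] = '\0';
    return 0;
}
```

The input is only read, so passing a string literal such as "GCAT" is fine here. With the malloc version, the caller has to free the returned pointer; with this one there is nothing to free.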
I am supposed to write a program that fills an array with the abbreviation of a constant char array. While my program does not report any errors, it also does not print any characters at my printf calls. Because of that I assume that my program does not work properly and isn't filling my array with any characters.
void abbrev(const char s[], char a[], size_t size) {
int i = 0;
while (*s != '\0') {
printf('%c', *s);
if (*s != ' ' && *s - 1 == ' ') {
a[i] = *s;
i++;
printf('%c', a[i]);
}
s++;
}
}
void main() {
char jordan1[60] = " Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
system("PAUSE");
}
The actual result is nothing. At least I assume so, since my console isn't showing anything. The result should be "EFF", and size_t size is supposed to limit my char array a in case the abbreviation is too long. So it should only store letters until my array is full and then the '\0', but I have not implemented that yet, since my program is apparently not filling the array at all.
#include <stdio.h>
#include <ctype.h>
/* in: the string to abbreviate
out: output abbreviation. Function assumes there's enough room */
void abbrev(const char in[], char out[])
{
const char *p;
int zbPosOut = 0; /* current zero-based position within the `out` array */
for (p = in; *p; ++p) { /* iterate through `in` until we see a zero terminator */
/* if the letter is uppercase
OR if (the letter is alphabetic AND we are not at the zero
position AND the previous char. is a space character) OR if the
letter is lowercase and it is the first char. of the array... */
if (isupper(*p) || (isalpha(*p) && (p - in) > 0 && isspace(p[-1]))
|| (islower(*p) && p == in)) {
out[zbPosOut++] = *p; /* ... then the letter is the start letter
of a word, so add it to our `out` array, and
increment the current `zbPosOut` */
}
}
out[zbPosOut] = 0; /* null-terminate the out array */
}
This code says a lot in few lines. Let's take a look:
isupper(*p) || (isalpha(*p) && (p - in) > 0 && isspace(p[-1]))
|| (islower(*p) && p == in)
If the current character (*p) is an uppercase character OR if it is alphabetic (isalpha(*p)) and the previous character p[-1] is a space, then we may consider *p to be the first character of a word, and it should be added to our out array. We include the test (p - in) > 0 because if p == in, then we are at the zero position of the array and therefore p[-1] is out of bounds.
The order in this expression matters a lot. If we were to put (p - in) > 0 after the isspace(p[-1]) test, then we would not be taking advantage of the laziness of the && operator: as soon as it encounters a false operand, the following operand is not evaluated. This is important because if p - in == 0, then we do not want to evaluate the isspace(p[-1]) expression. The order in which we have written the tests makes sure that isspace(p[-1]) is evaluated after making sure we are not at the zero position.
The final expression (islower(*p) && p == in) handles the case where the first letter is lowercase.
out[zbPosOut++] = *p;
We append the character *p to the out array. The current position of out is kept track of by the zbPosOut variable, which is incremented afterwards (which is why we use postfix ++ rather than prefix).
Code to test the operation of abbrev:
int main()
{
char jordan1[] = " electronic frontier foundation ";
char out[16];
abbrev(jordan1, out);
puts(out);
return 0;
}
It gives eff as the output. For it to look like an acronym, we can change the code to append the letter *p to out to:
out[zbPosOut++] = toupper(*p);
which capitalizes each letter added to the out array (if *p is already uppercase, toupper just returns *p).
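The original question also asked for a size limit on the output. A minimal sketch of that, assuming the limit counts the terminator, could look like the following. The name abbrev_n and the at_word_start flag are hypothetical. Unlike the answer above, this sketch only takes the first letter of each whitespace-separated word, so it never looks at p[-1].

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded variant: writes at most size-1 initials plus '\0'. */
void abbrev_n(const char *in, char *out, size_t size)
{
    size_t pos = 0;
    int at_word_start = 1;      /* the start of the string counts as a boundary */

    if (size == 0)
        return;                 /* no room even for the terminator */
    for (const char *p = in; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;   /* ctype functions need non-negative values */
        if (isspace(c)) {
            at_word_start = 1;
        } else {
            if (at_word_start && isalpha(c) && pos < size - 1)
                out[pos++] = (char)toupper(c);
            at_word_start = 0;
        }
    }
    out[pos] = '\0';
}
```

Calling abbrev_n(" electronic frontier foundation ", out, 16) stores "EFF" in out, and a size of 3 truncates that to "EF".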
void print_without_duplicate_leading_trailing_spaces(const char *str)
{
while(*str == ' ' && *str) str++;
while(*str)
{
if(*str != ' ' || (*str == ' ' && *(str + 1) != ' ' && *str))
{
putchar(*str);
}
str++;
}
}
What you want to do could be simplified with a for() loop.
#include <stdio.h>
#include <string.h>
void abbrev(const char s[], char a[], size_t size) {
int pos = 0;
// Loop for every character in 's'.
for (int i = 0; i < strlen(s); i++)
// If the character just before was a space, and this character is not a
// space, and we are still in the size bounds (we subtract 1 for the
// terminator), then copy and append.
if (s[i] != ' ' && s[i - 1] == ' ' && pos < size - 1)
a[pos++] = s[i];
a[pos] = '\0'; // Terminate the abbreviation before printing it.
printf("%s\n", a); // Print.
}
void main() {
char jordan1[] = " Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
}
However, I don't think this is the best way to achieve what you are trying to do. Firstly, the character s[0] can never be selected because of the check on the previous character. Which brings me to the second reason: on the first index you will be checking s[-1], which reads outside the array and is undefined behavior. If I were implementing this function I would do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void abbrev(const char s[], char a[], size_t size) {
char *str = strdup(s); // Make local copy.
size_t i = 0;
// Break it up into words, then grab the first character of each word.
for (char *w = strtok(str, " "); w != NULL; w = strtok(NULL, " "))
if (i < size - 1)
a[i++] = w[0];
a[i] = '\0'; // Terminate the abbreviation.
free(str); // Release our local copy.
printf("%s\n", a);
}
int main() {
char jordan1[] = "Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
return 0;
}
I have seen the standard implementation of strlen using pointer as:
int strlen(char * s) {
char *p = s;
while (*p!='\0')
p++;
return p-s;
}
I get why this works, but I tried to write it in 3 more ways (I'm learning pointer arithmetic right now) and would like to know what's wrong with them.
This is somewhat similar to what the book does. Is this wrong?
int strlen(char * s) {
char *p = s;
while (*p)
p++;
return p-s;
}
I thought this would be wrong if I pass an empty string, but it still gives me 0, which is kind of confusing since p is pre-incremented (and for "hello" it returns 5):
int strlen(char * s) {
char *p = s;
while (*++p)
;
return p-s;
}
Figured this out, does the post increment and returns +1 on it.
int strlen(char * s) {
char *p = s;
while (*p++)
;
return p-s;
}
1) Looks fine to me. I personally prefer the explicit comparison against '\0' so that it's clear you didn't mean to (for example) compare p to the NULL pointer in situations where it's not clear from context.
2) This one never tests the first character: p is incremented before it is dereferenced. For a non-empty string that happens to be harmless, which is why you still get 5. For an empty string, s[0] is the terminator, and pre-incrementing looks at the byte immediately after the \0, so you're reading memory outside the string. That is undefined behavior, and getting 0 back is luck. Here be dragons!
3) I don't think there's a question there :) As you see, it always returns one more than the correct answer.
Addendum: You can also write this using a for-loop, if you prefer this style:
size_t strlen(char * s) {
char *p = s;
for (; *p != '\0'; p++) {}
return p - s;
}
Or (more error-prone-ly)
size_t strlen(char * s) {
char *p = s;
for (; *p != '\0'; p++);
return p - s;
}
Also, strlen can't return a negative number, so you should use an unsigned value. size_t is even better.
Version 1 is fine - while (*p != '\0') is equivalent to while (*p != 0), which is equivalent to while (*p).
In the original code and version 1, the pointer p is advanced if and only if *p is not 0 (IOW, you're not at the end of the string).
Versions 2 and 3 advance p regardless of whether *p is 0 or not. *p++ evaluates to the character p points to, and as a side effect advances p. *++p evaluates to the character following the character p points to, and as a side effect advances p. Therefore, version 3 always leaves p one past the terminator, which is why its result is one too large, and version 2 never tests the first character, so on an empty string it walks past the end of the string.
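To make the difference concrete, here is a small sketch with the three loops side by side. The names len_v1, len_v2 and len_v3 are hypothetical, chosen so that they do not clash with the standard strlen.

```c
#include <stddef.h>

/* Version 1: tests *p before advancing. */
size_t len_v1(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Version 2: advances before testing, so s[0] is never checked.
   Only safe for non-empty strings. */
size_t len_v2(const char *s)
{
    const char *p = s;
    while (*++p)
        ;
    return (size_t)(p - s);
}

/* Version 3: tests, then advances even when the test fails,
   so p ends one past the terminator. */
size_t len_v3(const char *s)
{
    const char *p = s;
    while (*p++)
        ;
    return (size_t)(p - s);
}
```

For "hello", len_v1 and len_v2 both return 5 and len_v3 returns 6. For "", len_v1 returns 0 and len_v3 returns 1, while len_v2 must not be called because it would read past the terminator.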
One issue you will run into when you compare strlen replacement functions is that their performance will suffer compared to the actual strlen function for long strings. Why? Library strlen implementations typically process more than one byte per iteration when searching for the end of the string. How can you implement a more efficient replacement?
It's not that difficult. The basic approach is to look at 4-bytes per iteration and adjust the return based on where within those 4-bytes the nul-byte is found. You could do something like the following (using array indexing):
size_t strsz_idx (const char *s) {
size_t len = 0;
for(;;) {
if (s[0] == 0) return len;
if (s[1] == 0) return len + 1;
if (s[2] == 0) return len + 2;
if (s[3] == 0) return len + 3;
s += 4, len += 4;
}
}
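You can do the same thing using pointers and masks. Note that this version assumes a little-endian platform with a 4-byte unsigned, that reading through the cast pointer may violate alignment and aliasing rules, and that it can read up to 3 bytes past the terminator: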
size_t strsz (const char *s) {
size_t len = 0;
for(;;) {
unsigned x = *(unsigned*)s;
if((x & 0xff) == 0) return len;
if((x & 0xff00) == 0) return len + 1;
if((x & 0xff0000) == 0) return len + 2;
if((x & 0xff000000) == 0) return len + 3;
s += 4, len += 4;
}
}
Either way, checking 4 bytes per iteration reduces loop overhead and can narrow the gap to the library strlen on long strings; measure on your own platform to see how close it gets.
char *p_word;
p_word = strtok (p_input, " ,.-:\n1234567890");
while (p_word != NULL)
{
printf ("%s\n", p_word);
p_word = strtok (NULL, " ,.-:\n1234567890");
}
I'm reading in a text file and want to perform various functions on one word at a time, ignoring any characters that aren't part of the alphabet.
I want to know if there is a way, instead of typing every single undesired character into the delimiter string (e.g. " ,.-:\n1234567890"), to specify a range of ASCII decimal values I don't want, i.e. 0-64, or otherwise all non-alphabet characters.
Thanks
EDIT: I'm not allowed to use material that hasn't been taught, so I don't think I can use functions from "ctype.h".
If you must use strtok, you can build a delimiter string like this (assumes ASCII character set) which excludes the alphabet.
char *p_word;
char delims[128];
int dindex;
int i;
dindex = 0;
for (i = 1; i < 'A'; i++)
delims[dindex++] = i;
for (i = 'Z' + 1; i < 'a'; i++)
delims[dindex++] = i;
for (i = 'z' + 1; i < 128; i++)
delims[dindex++] = i;
delims[dindex] = '\0';
p_word = strtok (p_input, delims);
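The same idea can be packaged so that the delimiter table is built in one place. This is a sketch assuming an ASCII character set, as above; the helper names build_nonalpha_delims and count_words are hypothetical.

```c
#include <string.h>

/* Fills `delims` with every ASCII code 1..127 that is not a letter
   (75 characters plus the terminator). */
void build_nonalpha_delims(char delims[128])
{
    int n = 0;

    for (int c = 1; c < 128; c++) {
        int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (!is_letter)
            delims[n++] = (char)c;
    }
    delims[n] = '\0';
}

/* Counts the words in `text`. Note that strtok modifies `text` in place. */
int count_words(char *text)
{
    char delims[128];
    int count = 0;

    build_nonalpha_delims(delims);
    for (char *w = strtok(text, delims); w != NULL; w = strtok(NULL, delims))
        count++;
    return count;
}
```

Because strtok writes terminators into its input, the text must be a modifiable array, not a string literal.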
You can write your own strtok function that will accept a predicate as the second parameter.
Of course you should use some other name for the function as you like.
Here is a demonstrative program. I have written a simplified predicate that checks any alpha ASCII character. You may use your own predicate.
#include <stdio.h>
char * strtok( char *s, int cmp( char ) )
{
static char *p;
if ( s ) p = s;
if ( p )
{
while ( *p && cmp( *p ) ) ++p;
}
if ( !p || !*p ) return NULL;
char *t = p++;
while ( *p && !cmp( *p ) ) ++p;
if ( *p ) *p++ = '\0';
return t;
}
int cmp( char c )
{
c |= 0x20;
return c < 'a' || c > 'z';
}
int main( void )
{
char s[] = " ABC123abc<>XYZ!##xyz";
char *p = strtok( s, cmp );
while ( p )
{
puts( p );
p = strtok( NULL, cmp );
}
}
The program output is
ABC
abc
XYZ
xyz
Using the predicate you can specify in it any rules for skipped characters.
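The version above keeps its position in a static variable, like the standard strtok. A variation, sketched here under the assumption that the caller is willing to hold the cursor itself, avoids the hidden state. The names next_token and not_letter are hypothetical.

```c
#include <stddef.h>

/* Hypothetical reentrant variant: the caller owns the cursor `*pos`.
   Returns the next token, or NULL when the input is exhausted. */
char *next_token(char **pos, int is_delim(char))
{
    char *p = *pos;

    while (*p && is_delim(*p))      /* skip leading delimiters */
        ++p;
    if (*p == '\0') {
        *pos = p;
        return NULL;
    }
    char *start = p;
    while (*p && !is_delim(*p))     /* scan to the end of the token */
        ++p;
    if (*p)
        *p++ = '\0';                /* terminate the token in place */
    *pos = p;
    return start;
}

/* Predicate: true for anything that is not an ASCII letter. */
int not_letter(char c)
{
    return !((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}
```

Usage follows the same pattern as before: set char *cur = s; once, then call next_token(&cur, not_letter) until it returns NULL.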
Given a string of hex values, e.g. "0011223344", so that's 0x00, 0x11, etc.
How do I add these values to a char array?
Equivalent to say:
char array[4] = { 0x00, 0x11 ... };
You can't fit 5 bytes worth of data into a 4 byte array; that leads to buffer overflows.
If you have the hex digits in a string, you can use sscanf() and a loop:
#include <stdio.h>
#include <ctype.h>
int main()
{
const char *src = "0011223344";
char buffer[5];
char *dst = buffer;
char *end = buffer + sizeof(buffer);
unsigned int u;
while (dst < end && sscanf(src, "%2x", &u) == 1)
{
*dst++ = u;
src += 2;
}
for (dst = buffer; dst < end; dst++)
printf("%d: %c (%d, 0x%02x)\n", (int)(dst - buffer),
(isprint((unsigned char)*dst) ? *dst : '.'), *dst, (unsigned char)*dst);
return(0);
}
Note that printing the string starting with a zero-byte requires care; most operations terminate on the first null byte. Note that this code did not null-terminate the buffer; it is not clear whether null-termination is desirable, and there isn't enough space in the buffer I declared to add a terminal null (but that is readily fixed). There's a decent chance that if the code was packaged as a subroutine, it would need to return the length of the converted string (though you could also argue it is the length of the source string divided by two).
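Packaged as a subroutine that returns the converted length, the loop might look like the sketch below. The name hex_decode and its parameters are hypothetical, and it assumes the input consists only of hex digit pairs (sscanf with %x would also accept leading whitespace or a sign, which this sketch does not try to reject in every case).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: decodes pairs of hex digits from `src` into `dst`.
   Returns the number of bytes written (at most `cap`). */
size_t hex_decode(const char *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;
    unsigned int u;
    int used;

    /* %n records how many characters were consumed; a full pair is 2. */
    while (n < cap && sscanf(src, "%2x%n", &u, &used) == 1 && used == 2) {
        dst[n++] = (unsigned char)u;
        src += 2;
    }
    return n;
}
```

Because the count is returned, the caller does not need a terminator to know how many bytes are valid, which matters when the data starts with a zero byte.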
I would do something like this;
// Convert from ascii hex representation to binary
// Examples;
// "00" -> 0
// "2a" -> 42
// "ff" -> 255
// Case insensitive, 2 characters of input required, no error checking
int hex2bin( const char *s )
{
int ret=0;
int i;
for( i=0; i<2; i++ )
{
char c = *s++;
int n=0;
if( '0'<=c && c<='9' )
n = c-'0';
else if( 'a'<=c && c<='f' )
n = 10 + c-'a';
else if( 'A'<=c && c<='F' )
n = 10 + c-'A';
ret = n + ret*16;
}
return ret;
}
int main()
{
const char *in = "0011223344";
char out[5];
int i;
// Hex to binary conversion loop. For example;
// If in="0011223344" set out[] to {0x00,0x11,0x22,0x33,0x44}
for( i=0; i<5; i++ )
{
out[i] = hex2bin( in );
in += 2;
}
return 0;
}
If the string is known to be valid and its content does not need to be kept, then I would convert it in place this way:
#define hex(c) ((*(c)>='a')?*(c)-'a'+10:(*(c)>='A')?*(c)-'A'+10:*(c)-'0')
void hex2char( char *to ){
for(char *from=to; *from; from+=2) *to++=hex(from)*16+hex(from+1);
*to=0;
}
EDIT 1: Sorry, I forgot to account for the letters A-F (a-f).
EDIT 2: I tried to write more pedantic code:
#include <string.h>
int xdigit( char digit ){
int val;
if( '0' <= digit && digit <= '9' ) val = digit -'0';
else if( 'a' <= digit && digit <= 'f' ) val = digit -'a'+10;
else if( 'A' <= digit && digit <= 'F' ) val = digit -'A'+10;
else val = -1;
return val;
}
int xstr2str( char *buf, unsigned bufsize, const char *in ){
if( !in ) return -1; // missing input string
unsigned inlen=strlen(in);
if( inlen%2 != 0 ) return -2; // hex string must even sized
for( unsigned i=0; i<inlen; i++ )
if( xdigit(in[i])<0 ) return -3; // bad character in hex string
if( !buf || bufsize<inlen/2+1 ) return -4; // no buffer or too small
for( unsigned i=0,j=0; i<inlen; i+=2,j++ )
buf[j] = xdigit(in[i])*16 + xdigit(in[i+1]);
buf[inlen/2] = '\0';
return inlen/2+1;
}
Testing:
#include <stdio.h>
char buf[100] = "test";
void test( char *buf, const char *s ){
printf("%3i=xstr2str( \"%s\", 100, \"%s\" )\n", xstr2str( buf, 100, s ), buf, s );
}
int main(){
test( buf, (char*)0 );
test( buf, "123" );
test( buf, "3x" );
test( (char*)0, "" );
test( buf, "" );
test( buf, "3C3e" );
test( buf, "3c31323e" );
strcpy( buf, "616263" ); test( buf, buf );
}
Result:
-1=xstr2str( "test", 100, "(null)" )
-2=xstr2str( "test", 100, "123" )
-3=xstr2str( "test", 100, "3x" )
-4=xstr2str( "(null)", 100, "" )
1=xstr2str( "", 100, "" )
3=xstr2str( "<>", 100, "3C3e" )
5=xstr2str( "<12>", 100, "3c31323e" )
4=xstr2str( "abc", 100, "abc" )
I was searching for the same thing and, after reading a lot, finally created this function. Thought it might help someone.
// in = "63 09 58 81"
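// Requires <stdio.h>, <stdlib.h> and <string.h>.
void hexatoascii(char *in, char* out, int len){
int i,j=0;
char * data[5000];
(void)len; // not used by this version
printf("\n size %zu", strlen(in));
// Stop when fewer than two characters are left or the table is full.
for (i = 0; in[i] != '\0' && in[i + 1] != '\0' && j < 5000; i+=2)
{
if (in[i] == ' '){
i++;
}
else if(in[i + 1] == ' '){
i++;
}
if (in[i] == '\0' || in[i + 1] == '\0') break; // nothing left after the separator
data[j] = (char*)malloc(8);
printf("\n %c%c", in[i],in[i+1]);
sprintf(data[j], "%c%c", in[i], in[i+1]);
j++;
}
for (i = 0; i < j; i++){ // convert every pair, including the last one
unsigned int tmp = 0; // %x expects a pointer to unsigned int
printf("\n data %s", data[i] );
sscanf(data[i], "%2x", &tmp);
out[i] = tmp;
free(data[i]); // release the temporary pair buffer
}
//printf("\n ascii value of hexa %s", out);
}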
Let's say this is a little-endian ascii platform.
Maybe the OP meant "array of char" rather than "string"..
We work with pairs of chars and bit masking; note that multiplying by 16 is the same as shifting left by 4 bits.
/* not my original work, on stacko somewhere ? */
for (i=0;i < 4;i++) {
char a = string[2 * i];
char b = string[2 * i + 1];
array[i] = (((encode(a) * 16) & 0xF0) + (encode(b) & 0x0F));
}
and function encode() is defined...
unsigned char encode(char x) { /* Function to encode a hex character */
/****************************************************************************
* these offsets should all be decimal ..x validated for hex.. *
****************************************************************************/
if (x >= '0' && x <= '9') /* 0-9 is offset by hex 30 */
return (x - 0x30);
else if (x >= 'a' && x <= 'f') /* a-f offset by hex 57 */
return(x - 0x57);
else if (x >= 'A' && x <= 'F') /* A-F offset by hex 37 */
return(x - 0x37);
return 0; /* not a hex digit; callers should validate the input first */
}
This approach floats around elsewhere, it is not my original work, but it is old.
Not liked by the purists because it is non-portable, but extension would be trivial.
The best way I know:
int hex2bin_by_zibri(char *source_str, char *dest_buffer)
{
char *line = source_str;
char *data = line;
int offset;
unsigned int read_byte; /* %x expects a pointer to unsigned int */
int data_len = 0;
while (sscanf(data, " %02x%n", &read_byte, &offset) == 1) {
dest_buffer[data_len++] = read_byte;
data += offset;
}
return data_len;
}
The function returns the number of converted bytes saved in dest_buffer.
The input string can contain spaces and mixed case letters.
"01 02 03 04 ab Cd eF garbage AB"
translates to dest_buffer containing
01 02 03 04 ab cd ef
and also
"01020304abCdeFgarbageAB"
translates as before.
Parsing stops at the first "error" (non hex, non space).
Note: also this is a valid string:
"01 2 03 04 ab Cd eF garbage AB"
and produces:
01 02 03 04 ab cd ef
Fatalfloor...
There are a couple of ways to do this... first, you can use memcpy() to copy the exact representation into the char array.
You can use bit shifting and bit masking techniques as well. I'm guessing this is what you need to do as it sounds like a homework problem.
Lastly, you can use some fancy pointer indirection to copy the memory location you need.
All of these methods are detailed here:
Store an int in a char array?
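As an illustration of the bit shifting and masking technique mentioned above, here is a minimal sketch. The function name store_be32 is hypothetical, and the choice of most-significant-byte-first order is an assumption about what the output should look like.

```c
#include <stdint.h>

/* Stores `value` most-significant byte first, independent of the
   host's byte order. */
void store_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >> 8) & 0xFF);
    out[3] = (unsigned char)(value & 0xFF);
}
```

Shifting works on the value rather than on its memory representation, so the result is the same on little-endian and big-endian machines. A plain memcpy of the integer would instead copy the bytes in whatever order the host stores them.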
One simple way:
To convert a hex string to a numeric value, i.e. str[] = "0011223344" to the value 0x0011223344, use
value = strtoul(string, NULL, 16); // or strtoull()
Done. If you need to remove the leading 0x00, see below.
On LITTLE_ENDIAN platforms there is one more step:
Hex value to char array, value 0x11223344 to char arr[N] = {0x00, 0x11, ...}
unsigned long *hex = (unsigned long*)arr;
*hex = htonl(value);
// you'd like to remove any beginning 0x00
char *zero = arr;
while (0x00 == *zero) { zero++; }
if (zero > arr) memmove(arr, zero, sizeof(arr) - (zero - arr)); // shift the remaining bytes to the front
done.
Notes:
To convert a long string to a 64-bit hex char array on a 32-bit system, use unsigned long long instead of unsigned long. htonl is not enough there, and there may be no htonll, htonq or hton64, so define it yourself as below:
#if __KERNEL__
/* Linux Kernel space */
#if defined(__LITTLE_ENDIAN_BITFIELD)
#define hton64(x) __swab64(x)
#else
#define hton64(x) (x)
#endif
#elif defined(__GNUC__)
/* GNU, user space */
#if __BYTE_ORDER == __LITTLE_ENDIAN
#define hton64(x) __bswap_64(x)
#else
#define hton64(x) (x)
#endif
#elif
...
#endif
#define ntoh64(x) hton64(x)
see http://effocore.googlecode.com/svn/trunk/devel/effo/codebase/builtin/include/impl/sys/bswap.h
{
char szVal[] = "268484927472";
char szOutput[30];
size_t nLen = strlen(szVal);
// Make sure it is even.
if ((nLen % 2) == 1)
{
printf("Error string must be even number of digits %s", szVal);
}
// Process each set of characters as a single character.
nLen >>= 1;
for (size_t idx = 0; idx < nLen; idx++)
{
char acTmp[3];
sscanf(szVal + (idx << 1), "%2s", acTmp);
szOutput[idx] = (char)strtol(acTmp, NULL, 16);
}
}
Below are my hex2bin and bin2hex implementations.
These functions:
Are public domain (feel free to copy and paste)
Are simple
Are correct (i.e., tested)
Perform error handling (-1 means invalid hex string)
hex2bin
static int h2b(char c) { /* returns int so that -1 survives where plain char is unsigned */
return '0'<=c && c<='9' ? c - '0' :
'A'<=c && c<='F' ? c - 'A' + 10 :
'a'<=c && c<='f' ? c - 'a' + 10 :
/* else */ -1;
}
int hex2bin(unsigned char* bin, unsigned int bin_len, const char* hex) {
for(unsigned int i=0; i<bin_len; i++) {
int b[2] = {h2b(hex[2*i+0]), h2b(hex[2*i+1])};
if(b[0]<0 || b[1]<0) return -1;
bin[i] = b[0]*16 + b[1];
}
return 0;
}
bin2hex
static char b2h(unsigned char b, int upper) {
return b<10 ? '0'+b : (upper?'A':'a')+b-10;
}
void bin2hex(char* hex, const unsigned char* bin, unsigned int bin_len, int upper) {
for(unsigned int i=0; i<bin_len; i++) {
hex[2*i+0] = b2h(bin[i]>>4, upper);
hex[2*i+1] = b2h(bin[i]&0x0F, upper);
}
}
First, your question isn't very precise. Is the string a std::string or a char buffer? Set at compile-time?
Dynamic memory is almost certainly your answer.
char* arr = (char*)malloc(numberOfValues);
Then, you can walk through the input, and assign it to the array.
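A minimal sketch of that approach in C, assuming the input is a NUL-terminated char buffer of hex digit pairs, is shown below. The names hex_value and hex_to_bytes, and the convention of returning NULL on any error, are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Returns a malloc'd array holding strlen(hex)/2 bytes and stores the
   count in *out_len. Returns NULL on odd length, bad digit, or
   allocation failure. The caller frees the result. */
unsigned char *hex_to_bytes(const char *hex, size_t *out_len)
{
    size_t len = strlen(hex);

    if (len % 2 != 0)
        return NULL;
    size_t n = len / 2;
    unsigned char *arr = malloc(n ? n : 1);   /* avoid a zero-size request */
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        int hi = hex_value(hex[2 * i]);
        int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) {
            free(arr);
            return NULL;
        }
        arr[i] = (unsigned char)(hi * 16 + lo);
    }
    *out_len = n;
    return arr;
}
```

Sizing the allocation from the input length means the 4-versus-5 byte mismatch from the question cannot occur.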