Perhaps a little trivial, but I'm just learning C and I hate doing in two lines what can be done in one (as long as it doesn't make the code confusing, of course).
Anyway, I'm building strings by appending one character at a time. I do this by keeping track of the index into the string being built, as well as the index into the input line (inStr).
str[strIndex] = inStr[index];
str[strIndex + 1] = '\0';
str is used to temporarily store one of the words from the input line.
I need to append the terminator every time I add a char.
I guess what I want to know is: is there a way to combine these into one statement, without using strcat() (or clearing str with memset() every time I start a new word) and without creating other variables?
Simple solution: Zero out the string before you add anything to it. The NULs will already be at every location ahead of time.
// On the stack:
char str[STRLEN] = {0};
// On the heap
char *str = calloc(STRLEN, sizeof(*str));
In the calloc case, for large allocations, you won't even pay the cost of zeroing the memory explicitly (in bulk allocation mode, it requests memory directly from the OS, which is either lazily zeroed (Linux) or has been background-zeroed before you ask for it (Windows)).
Obviously, you can avoid even this amount of work by deferring the NUL termination of the string until you're done building it, but if you might need to use it as a C-style string at any time, guaranteeing it's always NUL-terminated up front isn't unreasonable.
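To make the zero-initialized approach concrete, here is a minimal sketch. The names append_char and WORD_MAX are made up for illustration and are not from the original post; the helper assumes the buffer was zero-filled before the first call.

```c
#include <stddef.h>

#define WORD_MAX 64  /* hypothetical capacity, not from the original post */

/* Append one character to a buffer that was zero-filled beforehand.
 * Returns 1 on success, 0 if the character would overwrite the final NUL. */
static int append_char(char *buf, size_t cap, size_t *len, char c)
{
    if (*len + 1 >= cap)
        return 0;          /* keep the last byte as the terminator */
    buf[(*len)++] = c;     /* the byte after it is already '\0' */
    return 1;
}
```

Typical use would be `char word[WORD_MAX] = {0}; size_t n = 0;` followed by `append_char(word, sizeof word, &n, ch);` for each character. Note that this only holds for the first word: if the same buffer is reused for a shorter word, characters from the previous word are still in it, so it has to be cleared again or terminated explicitly.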
I believe the way you are doing it now is the neatest that satisfies your requirement of
1) Not having string all zero to start with
2) At every stage the string is valid (as in always has a termination).
Basically you want to add two bytes each time. And really the most neat way to do that is the way you are doing it now.
If you are wanting to make the code seem neater by having the "one line" but not calling a function then perhaps a macro:
#define SetCharAndNull( String, Index, Character ) \
{ \
String[Index] = (Character); \
String[Index+1] = 0; \
}
And use it like:
SetCharAndNull( str, strIndex, inStr[index]);
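If a function call is acceptable after all, a small inline function is one alternative to the macro: it evaluates each argument exactly once and behaves like an ordinary statement inside if/else. This is only a sketch, and the name set_char_and_nul is invented here.

```c
#include <stddef.h>

/* Store c at s[i] and a terminator right after it.
 * The caller must guarantee that s has room for index i + 1. */
static inline void set_char_and_nul(char *s, size_t i, char c)
{
    s[i] = c;
    s[i + 1] = '\0';
}
```

It would be called as `set_char_and_nul(str, strIndex, inStr[index]);`, and most compilers will inline it, so the generated code should be the same two stores.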
Otherwise the only other thing I can think of which would achieve the result is to write a "word" at a time (two bytes, so an unsigned short) in most cases. You could do this with some horrible typecasting and pointer arithmetic. I would strongly recommend against this though as it won't be very readable, also it won't be very portable. It would have to be written for a particular endianness, also it would have problems on systems that require alignment on word access.
[Edit: Added the following]
Just for completeness I'm putting that awful solution I mentioned here:
*((unsigned short*)&str[strIndex]) = (unsigned short)(inStr[index]);
This is type casting the pointer of str[strIndex] to an unsigned short which on my system (OSX) is 16 bits (two bytes). It is then setting the value to a 16 bit version of inStr[index] where the top 8 bits are zero. Because my system is little endian, then the first byte will contain the least significant one (which is the character), and the second byte will be the zero from the top of the word. But as I said, don't do this! It won't work on big endian systems (you would have to add in a left shift by 8), also this will cause alignment problems on some processors where you can not access a 16bit value on a non 16-bit aligned address (this will be setting address with 8bit alignment)
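For comparison, a two-byte write can be done without the cast by going through memcpy, which sidesteps both the byte-order and the alignment problem. This is a different technique from the cast above, shown only as a sketch; put_char_and_nul is a made-up name.

```c
#include <string.h>

/* Write c followed by '\0' at dst with a single memcpy call.
 * No cast to a wider type is involved, so neither byte order nor
 * alignment matters. dst must have room for two bytes. */
static void put_char_and_nul(char *dst, char c)
{
    const char pair[2] = { c, '\0' };
    memcpy(dst, pair, sizeof pair);
}
```

Called as `put_char_and_nul(&str[strIndex], inStr[index]);`. It is still two bytes stored, so it is no faster than the original two assignments; it only packs them into one statement.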
Declare a char array:
char str[100];
or,
char * str = (char *)malloc(100 * sizeof(char));
Add the characters one by one in a loop:
for(i = 0; i<length; i++){
str[i] = inStr[index + i];
}
Finish it with a null character (outside the loop):
str[i] = '\0';
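Put together, the "terminate once, after the loop" idea might look like the sketch below. It assumes words are separated by whitespace, which the question does not actually state, and copy_word is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy characters from in[*index] up to the next whitespace (or the end
 * of the string) into out, then terminate once. *index is advanced past
 * the word. Returns the number of characters stored in out. */
static size_t copy_word(const char *in, size_t *index, char *out, size_t cap)
{
    size_t n = 0;
    while (in[*index] != '\0' && !isspace((unsigned char)in[*index])) {
        if (n + 1 < cap)            /* leave room for the terminator */
            out[n++] = in[*index];
        (*index)++;
    }
    if (cap > 0)
        out[n] = '\0';              /* single terminator, outside the loop */
    return n;
}
```

The trade-off is that out is not a valid C string while the loop is still running, so this only fits if nothing needs to read the word before it is complete.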
C language, How to iterate character by character in strings which lengths are larger than INT_MAX, or SIZE_MAX?
How can I tell that a string's length has exceeded the maximum size that the code below can handle?
int len = strlen(item);
int i=0;
while (i <= len ) {
//do smth
i++;
}
You can access characters in a string (or elements in an array generally) without integer indices by using pointers:
for (char *p = item; *p; ++p)
{
// Do something.
// *p is the current character.
}
int len = strlen(item);
First, nothing prevents a string from being longer than INT_MAX, but you run into trouble as soon as you have to store its length in an int. If you think about how strlen() must be implemented, given how strings are defined (a sequence of chars in memory terminated by a null char), the only possible implementation is to walk the string, incrementing a counter until the first null char is found. This makes your code inefficient, because you first traverse the string to find its end, and then traverse it a second time to do the useful work.
int i=0;
while (i <= len ) {
//do smth
i++;
}
It would be better to use a pointer directly in a for loop, like this:
char *p;
for (p = item; *p; p++) {
// do something, knowing that `*p` is the current char.
}
In this way, you navigate the string and stop when you find the null char, and you will not have to traverse it twice.
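As a small worked example of the single-pass pointer loop, here is a sketch that counts how often a character occurs. count_char is an illustrative name; the counter is a size_t so it is not limited to INT_MAX.

```c
#include <stddef.h>

/* Count occurrences of ch in s in one pass, without calling strlen
 * and without an int index. */
static size_t count_char(const char *s, char ch)
{
    size_t count = 0;
    for (const char *p = s; *p != '\0'; ++p) {
        if (*p == ch)
            ++count;
    }
    return count;
}
```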
By the way, having strings longer than INT_MAX is quite difficult, because normally (even more so on the new 64-bit architectures) you are not allowed to create such a large contiguous memory structure: if you try to create a static array of that size you will be fighting with the compiler, and if you try to malloc() such a huge amount of memory you will end up fighting with the operating system.
Developers who have to deal with huge amounts of data normally use a different structure to hold large numbers of characters. Just imagine that you need to insert one char and this forces you to move one gigabyte of memory by one position, because you have no other way to make room for it. It's simply impractical to keep that much data in one flat array. A simple approach is to use a structure similar to the one used for file data on disk in a Unix system. The data has a series of direct pointers that point to fixed blocks of memory holding characters, but at some point those pointers become double pointers, pointing to an array of simple pointers, then triple pointers, etc. This way you can handle strings as short as one byte (with just a memory page) up to more than INT_MAX bytes, by selecting an appropriate size for the page and the number of pointers.
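To give a feel for block-based storage, here is a deliberately simplified sketch. It uses a plain linked list of fixed-size blocks rather than the direct/indirect pointer scheme described above, and every name in it (chunk, chunked_str, CHUNK_SIZE, chunked_append, chunked_free) is invented for this example.

```c
#include <stdlib.h>
#include <stddef.h>

#define CHUNK_SIZE 4096  /* hypothetical block size */

struct chunk {
    struct chunk *next;
    size_t used;                 /* bytes of data[] in use */
    char data[CHUNK_SIZE];
};

struct chunked_str {
    struct chunk *head, *tail;
    size_t length;               /* total characters stored */
};

/* Append one character, allocating a new block when the last one is full.
 * Returns 1 on success, 0 if the allocation failed. */
static int chunked_append(struct chunked_str *s, char c)
{
    if (s->tail == NULL || s->tail->used == CHUNK_SIZE) {
        struct chunk *n = malloc(sizeof *n);
        if (n == NULL)
            return 0;
        n->next = NULL;
        n->used = 0;
        if (s->tail != NULL)
            s->tail->next = n;
        else
            s->head = n;
        s->tail = n;
    }
    s->tail->data[s->tail->used++] = c;
    s->length++;
    return 1;
}

/* Release every block and reset the string to empty. */
static void chunked_free(struct chunked_str *s)
{
    struct chunk *c = s->head;
    while (c != NULL) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
    s->head = s->tail = NULL;
    s->length = 0;
}
```

No single allocation is ever larger than one block, and the total length lives in a size_t. Random access is linear in the number of blocks here, which is the problem the indirect-pointer layout is meant to solve.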
Another approach is the mbuf_t approach used by BSD software to handle network packets. This is especially appropriate when you have to add data in front of the string (e.g. a new protocol header) or at the rear of the packet (payload and/or checksum or trailing data).
One last thing: if you create an array of 5 GB, almost any current operating system will swap part of it out as soon as you stop using that part. Your application will then start swapping as soon as you move through the array, and you will probably not be able to run it at all on a computer with a limited address space (as a 32-bit machine is today).
When the character array substring[#] is set as [64], the file outputs an additional character. The additional character varies with each compile. Sometimes es?, sometimes esx among others.
If I change the [64] to any other number (at least the ones I've tried: 65, 256,1..) it outputs correctly as es.
Even more strange, if I leave the unused/undeclared character array char newString[64] in this file, it outputs the correct substring es even with the 64.
How does the seemingly arbitrary size of 64 affect the output?
How does a completely unrelated character array (newString) influence how another character array is output?
int main () {
char string[64];
char newString[64];
char substring[64];
fgets(string,64,stdin);
strncpy(substring, string+1, 1);
printf("%s\n", substring);
return 0;
}
The problem is, strncpy() will not copy the null terminator because you've asked it not to.
Using strncpy() is safe and dangerous at the same time, because it does not always copy the null terminator. Using it for a single byte is also pointless; instead, do this:
substring[0] = string[1];
substring[1] = '\0';
and it will work.
You should read the strncpy(3) manual page to see exactly what I mean. If you read the manual carefully every time, you will become a better programmer sooner.
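The effect can be checked directly. The sketch below (the helper name strncpy_left_unterminated is made up) reports whether strncpy left the first n bytes of the destination without any '\0'. When it did, printf("%s") keeps reading whatever bytes happen to follow in memory, which is why the output changed with the array size and with the presence of an unrelated array.

```c
#include <string.h>

/* Returns 1 if the first n bytes of dst contain no '\0' after
 * strncpy(dst, src, n), i.e. the copy is not a valid C string yet.
 * dst must have room for at least n bytes. */
static int strncpy_left_unterminated(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) == NULL;
}
```

With a source of at least n characters the function returns 1; with a shorter source strncpy pads the remainder with '\0' and it returns 0.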
I am new to C, and I am trying to figure out the best way to approach this problem. I have 2 strings that are both char *'s.
They have multiple \n characters within the strings themselves, and they are usually about 1000 characters in length. I want to display only single lines that are different. Typically only one character (or a relatively small number) would be different in the entire string. So I was hoping to make it so that I could display only that one changed line (the whole string from \n to \n).
I'm not asking for anybody to write the code, or even supply code examples, just in theory what would be the most efficient way to do this?
I've been looking into using strtok, using the '\n' symbol as a delimiter, and then using strcmp to compare the two strings, and if they were not equal then I could add that string to a "old_data" and "new_data" array. Would this be a bad way to do this?
Any advice would be a huge help.
It sounds like you're on the right track: strsep will let you chunk the string up by newline. One thing to keep in mind is that it operates on the original string in place, and doesn't allocate any new memory, which can be both a blessing and a curse.
Probably the most memory efficient way to do this would be to look into allocating arrays of pointers to hold your "old_data" and "new_data" values, and then just save the pointers that point directly into the original string rather than copying the strings themselves over. As long as your original two strings are going to stick around/not get freed from under you, this could save you a decent chunk of memory.
If you aren't ever going to be removing strings from the arrays, one naïve (but effective) way to implement your arrays is to maintain two state variables — a count, and a capacity — and double the capacity each time you're about to overflow the array. E.g.:
char **strArray = NULL;
unsigned int capacity = 10;
unsigned int count = 0;
strArray = malloc(capacity * sizeof(char *));
/* on insert */
if (count == capacity)
{
capacity *= 2;
strArray = realloc(strArray, capacity * sizeof(char *));
}
strArray[count++] = pointerIntoOriginalString;
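A slightly more defensive version of the same doubling scheme is sketched below. It keeps the old block if realloc fails instead of overwriting the only pointer to it. The struct and function names (ptr_array, ptr_array_push) are invented for the example, as is the starting capacity of 10 taken from the snippet above.

```c
#include <stdlib.h>

struct ptr_array {
    char **items;
    size_t count;
    size_t capacity;
};

/* Append p, doubling the capacity when the array is full.
 * Returns 1 on success, 0 on allocation failure (array left unchanged). */
static int ptr_array_push(struct ptr_array *a, char *p)
{
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 10;
        char **tmp = realloc(a->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return 0;           /* a->items is still valid */
        a->items = tmp;
        a->capacity = new_cap;
    }
    a->items[a->count++] = p;
    return 1;
}
```

Starting from `struct ptr_array a = {0};` works because realloc on a null pointer behaves like malloc; `free(a.items)` releases the array when done.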
Good luck!
strtok() is not reentrant. If you were going to do this with strtok, you'd have to iterate over the arrays one after the other. I recommend using strtok_r(), which is the reentrant implementation of strtok.
The other consideration you need to worry about is making sure that your old_data and new_data arrays are big enough, or resizable. Matt's answer shows a simple example of resizing the array, although if you're new to C you might just want to declare something like:
char *new_data[2000];
char *old_data[2000];
Especially since it sounds like you have a good idea of how many lines are in your buffer.
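For what it's worth, the line-by-line comparison can also be done without any tokenizer and without modifying either string, by measuring each line with strcspn. This is a different technique from strtok/strsep, given only as a sketch; print_changed_lines and its output format are made up, and it compares line k of one string with line k of the other, so an inserted or deleted line makes every following line look changed.

```c
#include <stdio.h>
#include <string.h>

/* Print each line that differs between old_text and new_text.
 * Returns the number of differing lines. Neither input is modified. */
static size_t print_changed_lines(const char *old_text, const char *new_text)
{
    size_t changed = 0;
    while (*old_text != '\0' || *new_text != '\0') {
        size_t old_len = strcspn(old_text, "\n");  /* length up to '\n' or end */
        size_t new_len = strcspn(new_text, "\n");
        if (old_len != new_len || memcmp(old_text, new_text, old_len) != 0) {
            printf("- %.*s\n+ %.*s\n",
                   (int)old_len, old_text, (int)new_len, new_text);
            changed++;
        }
        old_text += old_len;
        new_text += new_len;
        if (*old_text == '\n') old_text++;         /* step over the newline */
        if (*new_text == '\n') new_text++;
    }
    return changed;
}
```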
This might be somewhat pointless, but I'm curious what you guys think about it. I'm iterating over a string with pointers and want to pull a short substring out of it (placing the substring into a pre-allocated temporary array). Are there any reasons to use assignment over strncpy, or vice versa? I.e.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main()
{ char orig[] = "Hello. I am looking for Molly.";
/* Strings to store the copies
* Pretend that strings had some prior value, ensure null-termination */
char cpy1[4] = "huh\0";
char cpy2[4] = "huh\0";
/* Pointer to simulate iteration over a string */
char *startptr = orig + 2;
int length = 3;
int i;
/* Using strncpy */
strncpy(cpy1, startptr, length);
/* Using assignment operator */
for (i = 0; i < length; i++)
{ cpy2[i] = *(startptr + i);
}
/* Display Results */
printf("strncpy result:\n");
printf("%s\n\n", cpy1);
printf("loop result:\n");
printf("%s\n", cpy2);
}
It seems to me that strncpy is both less typing and more easily readable, but I've seen people advocate looping instead. Is there a difference? Does it even matter? Assume that this is for small values of i (0 < i < 5), and null-termination is assured.
Refs: Strings in c, how to get subString, How to get substring in C, Difference between strncpy and memcpy?
strncpy(char * dst, char *src, size_t len) has two peculiar properties:
if (strlen(src) >= len) : the resulting string will not be nul-terminated.
if (strlen(src) < len) : the end of the string will be filled/padded with '\0'.
The first property forces you to actually check if (strlen(src) >= len) and act appropriately (or brutally set the final character to nul with dst[len-1] = '\0';, like @Gilles does in his answer). The other property is not particularly dangerous, but it can waste a lot of cycles. Imagine:
char buff[10000];
strncpy(buff, "Hello!", sizeof buff);
which touches 10000 bytes, where only 7 need to be touched.
My advice:
A: if you know the sizes, just do memcpy(dst,src,len); dst[len] = 0;
B: if you don't know the sizes, get them somehow (using strlen and/or sizeof and/or the allocated size for dynamically allocated memory). Then: goto A above.
Since for safe operation the strncpy() version already needs to know the sizes, (and the checks on them!), the memcpy() version is not more complex or more dangerous than the strncpy() version. (technically it is even marginally faster; because memcpy() does not have to check for the '\0' byte)
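Advice A, with the size check made explicit, could be wrapped like this. It is a sketch under the assumption that the caller knows both the destination size and how many characters it wants; copy_n is a hypothetical name.

```c
#include <string.h>

/* Copy len characters from src into dst and terminate the result.
 * Returns 0 without copying if dst (dst_size bytes) cannot hold
 * len + 1 bytes. src must have at least len readable bytes. */
static int copy_n(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}
```

For the question's example this would be `copy_n(cpy1, sizeof cpy1, startptr, length);`.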
While this may seem counter-intuitive, there are more optimized ways to copy a string than by using the assignment operator in a loop. For instance, IA-32 provides the REP prefix for MOVS, STOS, CMPS etc for string handling, and these can be much faster than a loop that copies one char at a time. The implementation of strncpy or strcpy may choose to use such hardware-optimized code to achieve better performance.
As long as you know your lengths are "in range" and everything is correctly nul terminated, then strncpy is better.
If you need to get length checks etc in there, looping could be more convenient.
A loop with assignment is a bad idea because you're reinventing the wheel. You might make a mistake, and your code is likely to be less efficient than the code in the standard library (some processors have optimized instructions for memory copies, and optimized implementations usually at least copy word by word if possible).
However, note that strncpy is not a well-rounded wheel. In particular, if the string is too long, it does not append a null byte to the destination. The BSD function strlcpy is better designed, but not available everywhere. Even strlcpy is not a panacea: you need to get the buffer size right, and be aware that it might truncate the string.
A portable way to copy a string, with truncation if the string is too long, is to call strncpy and always add the terminating null byte. If the buffer is an array:
char buffer[BUFFER_SIZE];
strncpy(buffer, source, sizeof(buffer)-1);
buffer[sizeof(buffer)-1] = 0;
If the buffer is given by a pointer and size:
strncpy(buf, source, buffer_size-1);
buf[buffer_size-1] = 0;
A comment on one of my answers has left me a little puzzled. When trying to compute how much memory is needed to concat two strings to a new block of memory, it was said that using snprintf was preferred over strlen, as shown below:
size_t length = snprintf(0, 0, "%s%s", str1, str2);
// preferred over:
size_t length = strlen(str1) + strlen(str2);
Can I get some reasoning behind this? What is the advantage, if any, and would one ever see one result differ from the other?
I was the one who said it, and I left out the +1 in my comment which was written quickly and carelessly, so let me explain. My point was merely that you should use the pattern of using the same method to compute the length that will eventually be used to fill the string, rather than using two different methods that could potentially differ in subtle ways.
For example, if you had three strings rather than two, and two or more of them overlapped, it would be possible that strlen(str1)+strlen(str2)+strlen(str3)+1 exceeds SIZE_MAX and wraps past zero, resulting in under-allocation and truncation of the output (if snprintf is used) or extremely dangerous memory corruption (if strcpy and strcat are used).
snprintf will return -1 with errno=EOVERFLOW when the resulting string would be longer than INT_MAX, so you're protected. You do need to check the return value before using it though, and add one for the null terminator.
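A sketch of the "measure and fill with the same function" pattern, including the return-value check and the extra byte for the terminator, is below. The function name concat is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return a newly allocated concatenation of a and b, or NULL on error.
 * The caller frees the result. */
static char *concat(const char *a, const char *b)
{
    int len = snprintf(NULL, 0, "%s%s", a, b);   /* measure only */
    if (len < 0)
        return NULL;                             /* error reported by snprintf */
    char *out = malloc((size_t)len + 1);         /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    if (snprintf(out, (size_t)len + 1, "%s%s", a, b) != len) {
        free(out);                               /* inputs changed or error */
        return NULL;
    }
    return out;
}
```

Both calls use the same format string and arguments, so the measured size and the written size cannot drift apart the way a strlen sum and a later strcpy/strcat sequence could.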
If you only need to determine how big would be the concatenation of the two strings, I don't see any particular reason to prefer snprintf, since the minimum operations to determine the total length of the two strings is what the two strlen calls do. snprintf will almost surely be slower, because it has to check the parameters and parse the format string besides just walking the two strings counting the characters.
... but... it may be an intelligent move to use snprintf if you are in a scenario where you want to concatenate two strings, and have a static, not too big buffer to handle normal cases, but you can fallback to a dynamically allocated buffer in case of big strings, e.g.:
/* static buffer "big enough" for most cases */
char buffer[256];
/* pointer used in the part where work on the string is actually done */
char * outputStr=buffer;
/* try to concatenate, get the length of the resulting string */
int length = snprintf(buffer, sizeof(buffer), "%s%s", str1, str2);
if(length<0)
{
/* error, panic and death */
}
else if(length>sizeof(buffer)-1)
{
/* buffer wasn't enough, allocate dynamically */
outputStr=malloc(length+1);
if(outputStr==NULL)
{
/* allocation error, death and panic */
}
if(snprintf(outputStr, length+1, "%s%s", str1, str2)<0)
{
/* error, the world is doomed */
}
}
/* here do whatever you want with outputStr */
if(outputStr!=buffer)
free(outputStr);
One advantage would be that the input strings are only scanned once (inside the snprintf()) instead of twice for the strlen/strcpy solution.
Actually, on rereading this question and the comment on your previous answer, I don't see what the point is in using snprintf() just to calculate the concatenated string length. If you're actually doing the concatenation, my above paragraph applies.
You need to add 1 to the strlen() example. Remember you need to allocate space for nul terminating byte.
So snprintf( ) gives me the size a string would have been. That means I can malloc( ) space for that guy. Hugely useful.
I wanted (but did not find until now) this function of snprintf( ) because I format tons of strings for output later; but I wanted not to have to assign static bufs for the outputs because it's hard to predict how long the outputs will be. So I ended up with a lot of 4096-long char arrays :-(
But now -- using this newly-discovered (to me) snprintf( ) char-counting function, I can malloc( ) output bufs AND sleep at night, both.
Thanks again and apologies to the OP and to Matteo.
EDIT: random, mistaken nonsense removed. Did I say that?
EDIT: Matteo in his comment below is absolutely right and I was absolutely wrong.
From C99:
2 The snprintf function is equivalent to fprintf, except that the output is written into
an array (specified by argument s) rather than to a stream. If n is zero, nothing is written,
and s may be a null pointer. Otherwise, output characters beyond the n-1st are
discarded rather than being written to the array, and a null character is written at the end
of the characters actually written into the array. If copying takes place between objects
that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
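The last sentence of the quoted text translates directly into a check. A minimal sketch (format_fits is an invented name):

```c
#include <stdio.h>

/* Format "%s%s" into buf (n bytes). Returns 1 if the complete output,
 * including its terminator, fit; 0 if it was truncated or an error occurred. */
static int format_fits(char *buf, size_t n, const char *a, const char *b)
{
    int r = snprintf(buf, n, "%s%s", a, b);
    return r >= 0 && (size_t)r < n;
}
```

When it returns 0 because of truncation, buf still holds the first n-1 characters followed by a terminator, as the standard text describes.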
Thank you, Matteo, and I apologize to the OP.
This is great news because it gives a positive answer to a question I'd asked here only three weeks ago. I can't explain why I didn't read all of the answers, which gave me what I wanted. Awesome!
The "advantage" that I can see here is that strlen(NULL) might cause a segmentation fault, while (at least glibc's) snprintf() handles NULL parameters without failing.
Hence, with glibc-snprintf() you don't need to check whether one of the strings is NULL, although length might be slightly larger than needed, because (at least on my system) printf("%s", NULL); prints "(null)" instead of nothing.
I wouldn't recommend using snprintf() instead of strlen() though. It's just not obvious. A much better solution is a wrapper for strlen() which returns 0 when the argument is NULL:
size_t my_strlen(const char *str)
{
return str ? strlen(str) : 0;
}