I'm trying to parse argv back into a single string to use for a system call.
The string shows up fine through the printf call (even without the \0 terminator), but using it as a parameter to system creates all sorts of undefined behaviour.
How can I ensure the string is properly terminated?
Is there a better and more reliable way to go about parsing char[][] into char[]?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[]){
char cmd[255]="tcc\\tcc.exe ";
char**ptr=argv+1;
while(*ptr){
strcat(cmd,*ptr);
++ptr;
if(*ptr)cmd[strlen(cmd)]=' ';
}
printf("cmd: ***%s***\n",cmd);
system(cmd);
}
I just discovered and corrected another flaw in this code: a system command needs (escaped) backslashes in file paths.
This instruction:
if(*ptr)
cmd[strlen(cmd)]=' ';
else
cmd[strlen(cmd)]='\0';
will break cmd, because it will overwrite its zero termination. Try instead:
size_t l = strlen(cmd);
if (*ptr) {
cmd[l++] = ' ';
}
cmd[l] = 0x0;
This appends a space and zero-terminates the string. Actually, since the string is already zero-terminated, you could do better:
if (*ptr) {
size_t l = strlen(cmd);
cmd[l++] = ' ';
cmd[l] = 0x0;
}
Update
A better alternative could be this:
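#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
char cmd[255]="tcc/tcc.exe";
char **ptr;
for (ptr = argv+1; *ptr; ptr++)
{
/* strncat() appends at most n characters and then always writes a '\0',
so n must leave room for that terminator: hence the - 1 */
strncat(cmd, " ", sizeof(cmd)-strlen(cmd)-1);
strncat(cmd, *ptr, sizeof(cmd)-strlen(cmd)-1);
}
printf("String: '%s'.\n", cmd);
return 0;
}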
We use strncat() to avoid overrunning the cmd buffer, and the space is added before each argument, so there's no extra space at the end of the string.
It is true that strncat() is a mite slower than directly assigning to cmd[], but factoring in the safety and the debugging time saved, I think it's worthwhile.
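For reuse, the same double-strncat() idea can be wrapped in a function. This is a sketch, not part of the original answer: the name join_args(), its signature and the truncation flag are illustrative assumptions. It expects a NULL-terminated array of arguments, as argv + 1 is, and reports truncation instead of silently running a shortened command.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies prefix into buf, then appends " arg" for each
 * element of args (a NULL-terminated array, like argv + 1).
 * Returns true if everything fit, false if the result was truncated. */
bool join_args(char *buf, size_t size, const char *prefix, char *const args[])
{
    size_t needed;

    if (size == 0)
        return false;
    buf[0] = '\0';
    strncat(buf, prefix, size - 1);
    needed = strlen(prefix);
    for (; *args != NULL; args++) {
        /* strncat writes up to n characters plus a '\0', hence the - 1 */
        strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, *args, size - strlen(buf) - 1);
        needed += 1 + strlen(*args);
    }
    return needed < size;   /* false means the command was cut short */
}
```

In main() this might be used as `if (join_args(cmd, sizeof cmd, "tcc/tcc.exe", argv + 1)) system(cmd); else fprintf(stderr, "Command too long\n");`, which additionally needs <stdio.h> and <stdlib.h>.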
Update 2
OK, so let's try to do this fast. We keep track of what cmd's length ought to be in a variable, and copy each string with memcpy(), which is slightly faster than strcpy(): it neither checks the string length nor copies the terminating zero.
(This saves something - remember that strcat() has to implicitly calculate the strlen of both its arguments. Here we save that).
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
#define MAXCMD 255
char cmd[MAXCMD]="tcc/tcc.exe";
int cmdlen = strlen(cmd);
char **ptr;
for (ptr = argv+1; *ptr; ptr++)
{
/* How many bytes do we have to copy? */
int l = strlen(*ptr);
/* STILL, this check HAS to be done, or the program is going to crash */
if (cmdlen + 1 + l + 1 < MAXCMD)
{
/* No danger of crashing */
cmd[cmdlen++] = ' ';
memcpy(cmd + cmdlen, *ptr, l);
cmdlen += l;
}
else
{
printf("Buffer too small!\n");
}
}
cmd[cmdlen] = 0x0;
printf("String: '%s'.\n", cmd);
return 0;
}
Update 3 - not really recommended, but fun
It is possible to try to be smarter than the strlen and memcpy built-ins most compilers provide (file under: "Bad ideas"), and do without strlen() altogether. This translates into a smaller inner loop and, when strlen and memcpy are implemented as library calls, much faster performance (look ma, no stack frames!).
#include <stdio.h>
int main(int argc, char *argv[])
{
#define MAXCMD 254
char cmd[MAXCMD+1]="tcc/tcc.exe";
int cmdlen = 11; // We know initial length of "tcc/tcc.exe"!
char **ptr;
for (ptr = argv+1; *ptr; ptr++)
{
cmd[cmdlen++] = ' ';
while(**ptr) {
cmd[cmdlen++] = *(*ptr)++;
if (MAXCMD == cmdlen)
{
fprintf(stderr, "BUFFER OVERFLOW!\n");
return -1;
}
}
}
cmd[cmdlen] = 0x0;
printf("String: '%s'.\n", cmd);
return 0;
}
Discussion - not so fun
Shamelessly cribbed from many a lecture I received from professors I thought shortsighted, until they were proved right each and every time.
The problem here is to circumscribe exactly what we are doing: what's the forest this particular tree is part of.
We're building a command line that will be fed to an exec() call, which means that the OS will have to build another process environment and allocate and track resources. Let's step back a bit: an operation will be run that will take about one millisecond, and we're feeding it a loop that might take ten microseconds instead of twenty.
The 20:10 (that's 50%!) improvement we have on the inner loop translates into 1020:1010 (about 1%) on the overall process startup. If the process then takes half a second (five hundred milliseconds) to complete, we're looking at 500020:500010, or a 0.002% improvement, in accordance with the never-sufficiently-remembered http://en.wikipedia.org/wiki/Amdahl%27s_law .
Or let's put it another way. One year hence, we will have run this program, say, one billion times. Those 10 microseconds saved now translate to a whopping 10,000 seconds, or around two hours and three quarters. We're starting to talk big, except that to obtain this result we've spent sixteen hours coding, checking and debugging :-)
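The arithmetic above can be reproduced with two small C functions. The figures plugged in (20 µs vs 10 µs for the loop, 1 ms or 500 ms for the rest, one billion runs) are the illustrative numbers from the text, not measurements, and the function names are made up for this sketch.

```c
/* Fraction of the total time saved when a step that took old_us microseconds
 * now takes new_us, inside an operation whose other work takes rest_us. */
double overall_saving(double old_us, double new_us, double rest_us)
{
    return (old_us - new_us) / (rest_us + old_us);
}

/* Total hours saved when saved_us microseconds are saved on each of
 * runs executions. */
double hours_saved(double saved_us, double runs)
{
    return saved_us * runs / 1e6 / 3600.0;
}
```

overall_saving(20, 10, 1000) comes to about 0.0098 (roughly 1%), overall_saving(20, 10, 500000) to about 0.00002 (0.002%), and hours_saved(10, 1e9) to about 2.78 hours.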
The double-strncat() solution (which is actually the slowest) supplies code that is easier to read, understand, modify and reuse. The fastest solution, above, implicitly relies on the separator being one character, and this fact is not immediately apparent. This means that reusing the fastest solution with ", " as the separator (say we need this for CSV or SQL) would introduce a subtle bug.
When designing an algorithm or piece of code, it is wise to weigh not only tightness of code and local ("keyhole") performance, but also things like:
how that single piece affects the performance of the whole. It makes no sense to spend 10% of development time on less than 10% of the overall goal.
how easy it is for the compiler to interpret it (and optimize it with no further effort on our part, maybe even specifically for different platforms, all at no cost!)
how easy it will be for us to understand it days, weeks, or months down the line.
how non-specific and robust the code is, so that it can be reused somewhere else (DRY).
how clear its intent is, so that it can be reengineered later, or replaced with a different implementation of the same intent (DRAW).
A subtle bug
This is in answer to WilliamMorris's question, so I'll use his code, but mine has the same problem (actually, mine is - not completely unintentionally - much worse).
This is the original functionality from William's code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
#define CMD "tcc/tcc.exe"
char cmd[255] = CMD;
char *s = cmd + sizeof CMD - 1;
const char *end = cmd + sizeof cmd - 1;
// Cycle syntax modified to pre-C99 - no consequences on our code
int i;
for (i = 1; i < argc; ++i) {
size_t len = strlen(argv[i]);
if (s + len >= end) {
fprintf(stderr, "Buffer overrun!\n");
exit(1);
}
// Here (will) be dragons
//*s++ = '.';
//*s++ = '.';
//*s++ = '.';
*s++ = ' ';
memcpy(s, argv[i], len);
s += len;
}
*s = '\0';
// Get also string length, which should be at most 254
printf("%s: string length is %d\n", cmd, (int)strlen(cmd));
return 0;
}
The buffer overrun check verifies that the string written so far, plus the string that has yet to be written, together do not exceed the buffer. The length of the separator itself is not counted, but things will work out somehow:
size_t len = strlen(argv[i]);
if (s + len >= end) {
fprintf(stderr, "Buffer overrun!\n");
exit(1);
}
Now we add on the separator in the most expeditious way - by repeating the poke:
*s++ = ',';
*s++ = ' ';
Now if s + len is equal to end - 1, the check will pass. We now add two bytes. The total length will be s + len + 2, which is equal to end plus one:
tcc/tcc.exe, It, was, the, best, of, times, it, was, the, worst, of,
times, it, was, the, age, of, wisdom, it, was, the, age, of,
foolishness, it, was, the, epoch, of, belief, it, was, the, epoch, of,
incredulity, it, was, the, season, of, Light, it, was: string length is 254
tcc/tcc.exe, It, was, the, best, of, times, it, was, the, worst, of,
times, it, was, the, age, of, wisdom, it, was, the, age, of,
foolishness, it, was, the, epoch, of, belief, it, was, the, epoch, of,
incredulity, it, was, the, season, of, Light, it, ouch: string length is 255
With a longer separator, such as "... ", the problem is even more evident:
tcc/tcc.exe... It... was... the... best... of... times... it... was...
the... worst... of... times... it... was... the... age... of...
wisdom... it... was... the... age... of... foolishness... it... was...
the... epoch... of... belief... it... was... longer: string length is 257
In my version, the fact that the check requires an exact match leads to catastrophic results, since once the buffer is overrun, the match will always fail and result in a massive memory overwrite.
If we modify my version with
if (cmdlen >= MAXCMD)
we will get code that always intercepts buffer overruns, but still does not prevent them up to the delimiter's length minus two; i.e., a hypothetical delimiter 20 bytes long could overwrite 18 bytes past cmd's buffer before being caught.
I would point out that this is not to say that my code had a catastrophic bug (and so, once fixed, it'll live happy ever after); the point was that the code was structured in such a way that, for the sake of squeezing a little speed, a dangerous bug could easily go unnoticed, or the same bug could easily be introduced upon reuse of what looked like "safe and tested" code. This is a situation that one would be well advised to avoid.
(I'll come clean now, and confess that I myself rarely did... and too often still don't).
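One way to make the overrun check independent of the separator's length is to count every byte about to be written (separator, field and terminating '\0') before writing any of them. The sketch below uses a hypothetical append_field() helper; it is not taken from either version above, just an illustration of that idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends sep followed by field to buf, where *len is
 * the current string length and size the buffer capacity. Nothing is written
 * unless separator, field and the terminating '\0' all fit. */
bool append_field(char *buf, size_t size, size_t *len,
                  const char *sep, const char *field)
{
    size_t seplen = strlen(sep);
    size_t fieldlen = strlen(field);

    /* Need *len + seplen + fieldlen + 1 <= size; compare by subtraction
     * so the sum cannot overflow size_t. */
    if (*len >= size || seplen + fieldlen >= size - *len)
        return false;

    memcpy(buf + *len, sep, seplen);
    *len += seplen;
    memcpy(buf + *len, field, fieldlen);
    *len += fieldlen;
    buf[*len] = '\0';
    return true;
}
```

Reusing it with ", " or "... " as the separator needs no change to the check, which is exactly the property the fastest version lacked.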
This might be more complicated than you want, but it avoids buffer overflows and it also exits if the buffer is too small. Note that continuing to loop once there is not enough space in the buffer for argv[N] can result in later strings (argv[N+1] etc.) that are shorter than argv[N] being added to the string even though argv[N] was omitted...
Note that I'm using memcpy, because by that point I already know how long argv[i] is.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
#define CMD "tcc/tcc.exe"
char cmd[255] = CMD;
char *s = cmd + sizeof CMD - 1;
const char *end = cmd + sizeof cmd - 1;
for (int i = 1; i < argc; ++i) {
size_t len = strlen(argv[i]);
if (s + len >= end) {
fprintf(stderr, "Buffer overrun!\n");
exit(1);
}
*s++ = ' ';
memcpy(s, argv[i], len);
s += len;
}
*s = '\0';
printf("%s\n", cmd);
return 0;
}
Related
I am trying to create the function delete_comments(). The read_file() and main functions are given.
Implement the function char *delete_comments(char *input) that removes C comments from the program stored at input. The input variable points to dynamically allocated memory. The function returns a pointer to the polished program. You may allocate a new memory block for the output, or modify the content directly in the input buffer.
You’ll need to process two types of comments:
Traditional block comments delimited by /* and */. These comments may span multiple lines. You should remove only the characters from /* through */, and leave, for example, any following newlines untouched.
Line comments starting with // and running until the newline character. In this case, the newline character must also be removed.
The function calling delete_comments() only handles the pointer returned from delete_comments(). It does not allocate memory for any pointers. One way to implement delete_comments() is to allocate memory for the destination string. However, if new memory is allocated, the original memory in input must be released after use.
I'm having trouble understanding why my current approach is wrong, or what specific problem is causing the weird output. I'm approaching the problem by creating a new array and copying the input string into it according to the new rules.
#include "source.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Remove C comments from the program stored in memory block <input>.
* Returns pointer to code after removal of comments.
* Calling code is responsible of freeing only the memory block returned by
* the function.
*/
char *delete_comments(char *input)
{
input = malloc(strlen(input) * sizeof (char));
char *secondarray = malloc(strlen(input) * sizeof (char));
int x, y = 0;
for (x = 0, y = 0; input[x] != '\0'; x++) {
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
else if ((input[x] == '/') && (input[x + 1] == '/')) {
int j = 0;
while (input[x + j] != '\n') {
y++;
j++;
}
}
else {
secondarray[x] = input[y];
y++;
}
}
return secondarray;
}
/* Read given file <filename> to dynamically allocated memory.
* Return pointer to the allocated memory with file content, or
* NULL on errors.
*/
char *read_file(const char *filename)
{
FILE *f = fopen(filename, "r");
if (!f)
return NULL;
char *buf = NULL;
unsigned int count = 0;
const unsigned int ReadBlock = 100;
unsigned int n;
do {
buf = realloc(buf, count + ReadBlock + 1);
n = fread(buf + count, 1, ReadBlock, f);
count += n;
} while (n == ReadBlock);
buf[count] = 0;
return buf;
}
int main(void)
{
char *code = read_file("testfile.c");
if (!code) {
printf("No code read");
return -1;
}
printf("-- Original:\n");
fputs(code, stdout);
code = delete_comments(code);
printf("-- Comments removed:\n");
fputs(code, stdout);
free(code);
}
Your program has fundamental issues.
It fails to tokenize the input. Comment start sequences can occur inside string literals, in which case they do not denote comments: "/* not a comment".
You have some basic bugs:
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
Here, when we enter the loop, with i = 0, input + x is still pointing to the opening /. We did not skip over the opening * and are already looking for a closing *. This means that the sequence /*/ will be recognized as a complete comment, which it isn't.
This loop also assumes that every /* comment is properly closed. It doesn't check for the null character that terminates the input, so if the comment is not closed, it will march beyond the end of the buffer.
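A minimal sketch of a scanner that avoids both problems: it starts looking for the closing */ only after skipping the opening /*, and it stops at the terminating null character. The name skip_block_comment() and the convention of returning NULL for an unterminated comment are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical helper. p must point at the slash of an opening slash-star
 * pair. Returns a pointer just past the closing star-slash pair, or NULL if
 * the input ends before the comment is closed. */
const char *skip_block_comment(const char *p)
{
    p += 2;                          /* skip the opening pair first */
    while (*p != '\0') {
        if (p[0] == '*' && p[1] == '/')
            return p + 2;            /* just past the closing pair */
        p++;
    }
    return NULL;                     /* unterminated: don't run off the end */
}
```

With this, "/*/ x */y" resumes scanning at y, and "/* open" yields NULL instead of reading past the buffer.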
C has line continuations. In ISO C translation phase 2, every backslash-newline sequence is deleted, joining one or more physical lines into a logical line. This means that a // comment can span multiple physical lines:
// this is an \
extended comment
You can see, by the way, that StackOverflow's automatic language detector for syntax highlighting is getting this right!
Line continuations are independent of tokenization, which doesn't happen until translation phase 3. This means:
/\
/\
this is an extended \
comment
That one has defeated StackOverflow's syntax highlighting.
Furthermore, a line continuation can happen in any token, possibly multiple times:
"\
this is a string literal\
"
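Because splicing happens before tokenization, one option is to perform it as a separate pass first. A minimal in-place sketch; splice_lines() is a hypothetical name, it assumes '\n' line endings and ignores trigraphs. Note that it also rewrites non-comment code (multi-line macros, for instance), so a stripper that must otherwise preserve the program exactly would instead need to recognize backslash-newline during its own scan.

```c
/* Hypothetical helper: deletes every backslash-newline pair in place, as
 * translation phase 2 does, and returns s. Assumes '\n' line endings and
 * ignores trigraphs. */
char *splice_lines(char *s)
{
    const char *src = s;
    char *dst = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;                /* drop the backslash and the newline */
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return s;
}
```

After this pass, the first example above becomes a plain // comment on one line, so a simple two-slash check sees it.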
If you really want to make this work 100% correctly, you need to parse the input. By "parse" I mean a more formal, rigorous detection routine that understands what it is reading, in the context it is reading it.
For example, there are many times where this code could be defeated.
printf("the answer is %d // %d\n", a, b);
would likely trip your // detection and strip the end of the printf.
There are two general approaches to the problem above:
Find every corner case where comment-like characters could be used, and write conditional statements to avoid them before stripping.
Fully parse the language, so you will know if you are within a string or some other context that's wrapping comment like characters, or if you are in the top level context where the characters really mean "this is a comment"
To learn about parsing, I generally recommend "The Dragon Book" but it is a hard read, unless you have studied a bit of Discrete Mathematics. It covers a lot of different parsing techniques, and in doing so it doesn't have many pages left for examples. This means that it's the kind of book where you have to read, think, and then program a mini-example. If you follow that path, there is no input you can't tackle.
If you are pragmatic, and your goal is stripping comments rather than learning parsing, I recommend that you find a well-constructed parser for C, and then learn how to walk its Abstract Syntax Tree in an Emitter that deliberately omits the comments.
There are some projects that do this already, but I don't know if they have the right structure for easy modification. lint comes to mind, as well as other "pretty-printers". GCC certainly has the parsing code in there, but I've heard that GCC's Abstract Syntax Tree isn't easy to learn.
Your solution has several problems:
The worst issue
As the first instruction in delete_comments() you overwrite input with a new pointer returned by malloc(), which points to memory of random contents.
In consequence the address to the real input is lost.
Oh, and please check the returned value if you call malloc().
Failing to increment the scanned position in comments correctly
You are scanning the input by the index x, but if you detect a comment, you don't change it.
You are actually advancing y but this is only used for the copying.
Think about lines like these:
int x; /* some /* weird /* comment */
///////////////////////////////
for (;;) { }
Ignoring character and string literals
Your solution should take character and string literals into account.
For example:
int c_plus_plus_comment_start = '//'; /* multi character constant */
const char* c_comment_start = "/*";
Note: There are more. Learn to use a debugger, or at least insert lots of printf()s in "interesting" places.
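Pulling these points together, here is a sketch of delete_comments() written as a small state machine. It edits the buffer in place, which the task explicitly allows, so the original pointer is never lost and there is no malloc() result to check. It respects string and character literals (including escapes such as \"), but it does not handle line continuations, so it is not a complete solution.

```c
/* Sketch of delete_comments() as a small in-place state machine.
 * Assumptions: string and character literals are respected, including
 * backslash escapes; line continuations (backslash-newline) are NOT handled. */
char *delete_comments(char *input)
{
    const char *p = input;   /* read position */
    char *q = input;         /* write position; never gets ahead of p */

    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {
            /* block comment: drop everything up to and including star-slash */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;      /* an unterminated comment runs to the end */
        } else if (p[0] == '/' && p[1] == '/') {
            /* line comment: drop up to and including the newline */
            while (*p != '\0' && *p != '\n')
                p++;
            if (*p == '\n')
                p++;
        } else if (*p == '"' || *p == '\'') {
            /* string or character literal: copy verbatim */
            char quote = *p;
            *q++ = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    *q++ = *p++;   /* backslash of an escape sequence */
                *q++ = *p++;
            }
            if (*p != '\0')
                *q++ = *p++;       /* closing quote */
        } else {
            *q++ = *p++;
        }
    }
    *q = '\0';
    return input;
}
```

Because it returns the same pointer it was given, the provided main() can free() the result unchanged.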
I'm posting a continuation of my question from this thread.
I'm trying to create a string that begins with a '!' and adds 6 values read from a sensor (separated by commas) and then sends it over a serial port. A sample output would be: "!5,5,5,5,5,5" or "!34,34,34,34,34,34".
My code is mostly working; I'm able to send the value of one analog sensor across my serial port, !215!215!215 for example, but when I un-comment the for loop code below I see nothing across my serial port and the program seems, for lack of a better word, useless.
There seems to be a runtime error occurring in my for loop but I can't determine where it happens. Why does my code below successfully send serial data for one analog sensor without using the for loop, and send nothing when using the for loop? How can I tweak my code to achieve my desired output?
char* convertIntToString(uint8_t integerValue, char* str){
utoa(integerValue, str, 10);
return str;
}
char* concat(char *s1, char *s2)
{
char *result = malloc(strlen(s1)+strlen(s2)+1);
strcpy(result, s1);
strcat(result, s2);
return result;
}
int main(void){
uint8_t analogValue;
char *outputStr = malloc(1);
while (1) {
outputStr = realloc(outputStr, 2);
strcpy(outputStr, "!");
analogValue = ReadADC(0);
char str[4];
outputStr = concat(outputStr, convertIntToString(analogValue, str));
//RUNTIME ERROR IN THIS LOOP
for(int i = 0; i < 5; i++){
char* newStr = concat(outputStr, ",");
// free the old memory before using the new memory
free(outputStr);
outputStr = newStr;
newStr = concat(outputStr, convertIntToString(analogValue, str));
// free the old memory before using the new memory
free(outputStr);
outputStr = newStr;
}
CDC_Device_SendString(&VirtualSerial_CDC_Interface, outputStr); //send sring over serial port
free(outputStr);
}
}
Expanded from the comment above and comments in the previous question.
If you are able to calculate the maximum size of a "packet", then you can avoid dynamic memory altogether and just use a fixed buffer size. The calculation doesn't even have to be 100% accurate, as long as it errs on the side of "too big".
e.g.: 5 instances of 5 numbers with a max of 3 digits, separated by commas: 5 * 5 * 4 (3 digits + a comma). Not 100% right, because the 5th group doesn't need a comma, so you are overestimating by one (or is that 5?). Just be aware of the possible cumulative effect of this if you have multiple "known errors".
So, assuming you can estimate the max size, perhaps "encode" it via #defines (perhaps even fixing some of the "known errors").
So now you have char buffer[KNOWN_UPPER_BOUND]. As long as you initialize it correctly (e.g. buffer[0] = '\0';), you can just keep appending to it via strcat(). If you were talking big numbers, you could keep track of the last index to avoid repeated scans through the string looking for the end.
e.g. (using globals for simplicity)
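#include <string.h>
#define KNOWN_UPPER_BOUND 32 /* illustrative value: derive yours as described above */
char buffer[KNOWN_UPPER_BOUND]; /* globals start zeroed, so buffer[0] == '\0' */
size_t last_index = 0; /* length of the string in buffer so far */
void addString(const char *str)
{
size_t len = strlen(str);
if (last_index + len >= KNOWN_UPPER_BOUND) /* leave room for the '\0' */
{
/* error handling */
}
else
{
/* start at the current end so strcat() doesn't rescan the whole string */
strcat(&buffer[last_index], str);
last_index += len;
}
}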
So what were some of the issues with the dynamic code?
Potential for leaks (much like the errors I mentioned in the calculation - ok (by which I mean 'not overly harmful in a small program') if you leak 2 bytes once, not so good when you put it in a loop and leak 2 bytes over and over again)
Speed issues - malloc is out of your control; it could be very slow. A lot of small allocations can fragment memory, which may mean that later on, when you want a bigger block, you can't get one.
Lots of copying and re-copying of data. Your concat is an example here - each concat does a malloc and copies both strings. Every time you call it.
You could still use dynamic memory to hold the final string, but build up each "component" in a fixed size buffer.
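A sketch of the fixed-buffer approach for this particular packet. It assumes six uint8_t readings and that your toolchain's C library provides snprintf(); build_packet(), NUM_SENSORS and PACKET_MAX are hypothetical names. Here the buffer size is computed exactly rather than estimated, and the write position is tracked so nothing is rescanned.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SENSORS 6
/* '!' + up to 3 digits per uint8_t + (NUM_SENSORS - 1) commas + '\0' */
#define PACKET_MAX (1 + NUM_SENSORS * 3 + (NUM_SENSORS - 1) + 1)

/* Hypothetical helper: formats values as "!v1,v2,...,v6" into buf.
 * Returns the string length, or -1 if it would not fit. No malloc/free. */
int build_packet(const uint8_t values[NUM_SENSORS], char *buf, size_t size)
{
    size_t used = 0;

    for (int i = 0; i < NUM_SENSORS; i++) {
        int n = snprintf(buf + used, size - used, "%s%u",
                         i == 0 ? "!" : ",", (unsigned)values[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;             /* encoding error or truncation */
        used += (size_t)n;         /* advance the write position */
    }
    return (int)used;
}
```

In the main loop you might fill a uint8_t v[NUM_SENSORS] with ReadADC() results (assuming ReadADC() takes a channel number, as ReadADC(0) suggests), call build_packet(v, packet, sizeof packet) on a char packet[PACKET_MAX], and pass packet to CDC_Device_SendString() when the result is positive. Nothing is allocated inside the loop, so nothing can leak.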
What if you move the declaration of char* newStr outside the loop?
Declaring newStr as a char array rather than a pointer would be better, to avoid leaks; something like char newStr[50].
char sentence2[10];
strncpy(sentence2, second, sizeof(sentence2)); //shouldn't I specify the sizeof(source) instead of sizeof(destination)?
sentence2[10] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
//////////////////////////////////////////////////////////////
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crashes without this meaningless loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
So here's the problem. When I run the first part of this code, the program crashes.
However, when I add the for loop that just prints garbage values in memory locations, it does not crash but still won't strcpy properly.
Second, when using strncpy, shouldn't I specify the sizeof(source) instead of sizeof(destination) since I'm moving the bytes of the source ?
Third, it makes sense to me to add the null terminating character after strncpy, since I've read that it doesn't add the null character on its own, but my Pelles C IDE warns that it's a possible out-of-bounds store.
Fourth, and most importantly, why doesn't the simple strcpy work?!
////////////////////////////////////////////////////////////////////////////////////
UPDATE:
#include <stdio.h>
#include <string.h>
void main3(void)
{
puts("\n\n-----main3 reporting for duty!------\n");
char *first = "Metal Gear";
char *second = "Suikoden";
printf("strcmp(first, first) = %d\n", strcmp(first, first)); //returns 0 when both strings are identical.
printf("strcmp(first, second) = %d\n", strcmp(first, second)); //returns a negative when the first different char is less in first string. (M=77 S=83)
printf("strcmp(second, first) = %d\n", strcmp(second, first)); //returns a positive when the first different char is greater in first string.(M=77 S=83)
char sentence1[10];
strcpy(sentence1, first);
puts(sentence1);
char sentence2[10];
strncpy(sentence2, second, 10); //shouldn't I specify the sizeof(source) instead of sizeof(destination).
sentence2[9] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crashes without this nonsensical loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
}
This is how I teach myself to program: I write code and comment all I know about it, so that the next time I need to look something up, I just look at my own code in my files. In this one, I'm trying to learn the string library in C.
char *first = "Metal Gear";
char sentence1[10];
strcpy(sentence1, first);
This doesn't work because first has 11 characters: the ten in the string, plus the null terminator. So you would need char sentence1[11]; or more.
strncpy(sentence2, second, sizeof(sentence2));
//shouldn't I specify the sizeof(source) instead of sizeof(destination)?
No. The third argument to strncpy is supposed to be the size of the destination. The strncpy function will always write exactly that many bytes.
If you want to use strncpy you must also put a null terminator on (and there must be enough space for that terminator), unless you are sure that strlen(second) < sizeof sentence2.
Generally speaking, strncpy is almost never a good idea. If you want to put a null-terminated string into a buffer that might be too small, use snprintf.
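To make the strncpy()/snprintf() difference concrete, here is a sketch of two copy helpers; the names are hypothetical. Both always leave a terminated string and report truncation, and the strncpy() version writes its terminator at index size - 1, the valid last slot (for char sentence2[10], that is sentence2[9]).

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst (capacity size), always null-terminating.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_with_snprintf(char *dst, size_t size, const char *src)
{
    int n = snprintf(dst, size, "%s", src);   /* n = length it wanted */
    return n >= 0 && (size_t)n < size;
}

/* strncpy equivalent; size must be > 0. strncpy pads with '\0' when src is
 * short, but writes no terminator at all when src fills the buffer. */
int copy_with_strncpy(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size);
    if (dst[size - 1] != '\0') {
        dst[size - 1] = '\0';     /* last valid index, not dst[size] */
        return 0;                 /* truncated */
    }
    return 1;
}
```

With char sentence1[10], copying "Metal Gear" reports truncation and leaves "Metal Gea"; an 11-byte buffer holds it whole.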
This is how I teach myself to program.
Learning C by trial and error is not good. The problem is that if you write bad code, you may never know. It might appear to work, and then fail later on. For example, whether your strcpy steps on any other variable's toes depends on what happens to lie in memory after sentence1.
Learning from a book is by far and away the best idea. K&R 2 is a decent starting place if you don't have any other.
If you don't have a book, do look up online documentation for standard functions anyway. You could have learnt all this about strcpy and strncpy by reading their man pages, or their definitions in a C standard draft, etc.
Your problems start from here:
char sentence1[10];
strcpy(sentence1, first);
The number of characters in first, excluding the terminating null character, is 10. The space allocated for sentence1 has to be at least 11 for the program to behave in a predictable way. Since you have already used memory that you are not supposed to use, expecting anything to behave after that is not right.
You can fix this problem by changing
char sentence1[10];
to
char sentence1[N]; // where N > 10.
But then, you have to ask yourself. What are you trying to accomplish by allocating memory on the stack that's on the edge of being wrong? Are you trying to learn how things behave at the boundary of being wrong/right? If the answer to the second question is yes, hopefully you learned from it. If not, I hope you learned how to allocate adequate memory.
This is an array-bounds write error. The indices are only 0-9:
sentence2[10] = '\0';
it should be
sentence2[9] = '\0';
Second, you're protecting the destination from buffer overflow, so specifying its size is appropriate.
EDIT:
Lastly, there is this amazingly bad piece of code, which really isn't worth mentioning and is relevant to neither strcpy() nor strncpy(), yet seems to have earned me the disfavor of #nonsensicke, who seems to write very verbose and thoughtful posts... It has the following problems:
char *pointer = first;
for(int i =0; i < 500; i++)
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
Your use of int i=0 in the for loop is C99 specific. Depending on your compiler and compiler arguments, it can result in a compilation error.
for(int i =0; i < 500; i++)
better
int i = 0;
...
for(i=0;i<500;i++)
You neglect to check the return code of printf or indicate that you are deliberately ignoring it. I/O can fail after all...
printf("%c", *pointer);
better
int n = 0;
...
n = printf("%c", *pointer);
if(n!=1) { /* error! */ }
or
(void) printf("%c", *pointer);
Some folks will get onto you for not using {} with your if statements:
if(*pointer == '\n') putchar('\n');
better
if(*pointer == '\n') {
putchar('\n');
}
But wait, there's more... you didn't check the return code of putchar()... dang
better
int c = 0;
...
if(*pointer == '\n') {
c = putchar('\n');
if(c == EOF) { /* error: putchar() returns EOF on failure */ }
}
And lastly, with this nasty little loop you're basically romping through memory like a Kiwi in a Tulip field, and you're lucky if you hit a newline. Depending on the OS (if you even have an OS), you might actually encounter some type of fault, e.g. reading outside your process space, maybe outside addressable RAM, etc. There's just not enough info provided to say for sure, but it could happen.
My recommendation, beyond the absurdity of actually performing some type of detailed analysis on the rest of that code, would be to just remove it altogether.
Cheers!
I am trying to count how many times a . appears in a single string passed in on the command line.
calling myprog "this...is a test."
prints "The count is: 0".
What am I doing wrong here?
Note: I know this code may look odd, but it is for educational purposes.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int len = strlen(argv[1]);
char *d = malloc (strlen(argv[1])+1);
strcpy(d,argv[1]);
char *p=d;
int count;
count=0;
while(*p){
if (*p ==','){
count++;
}
*p++;
}
printf("The count is: %d\n", count);
return 0;
}
You are counting the number of commas, not of periods. To count periods change the if statement to:
if (*p =='.'){
count++;
}
This code has quite a few...oddities. It's a little hard to guess which are intentional and which aren't, so let's go through it line by line and see what's there.
int len = strlen(argv[1]);
char *d = malloc (strlen(argv[1])+1);
strcpy(d,argv[1]);
It appears you probably intended to use len at some point, but as it stands right now, you get len, then re-compute the same value to use it. Presumably you intended something more like:
size_t len = strlen(argv[1]);
char *d = malloc(len+1);
strcpy(d, argv[1]);
I'd note, however, that there's really no reason to do any of this. Since you're just trying to examine the contents, you might as well just use argv[1] directly (or create another pointer to the same place and use that).
char *p=d;
This creates another pointer to the same location as d. You didn't really need d to start with, and you don't really need this either, but it's fairly harmless.
int count;
count=0;
I'd (strongly) prefer to see count initialized rather than left uninitialized, then assigned a value afterwards. Since there's no possibility of its being negative, I'd probably also make it an unsigned type: size_t count = 0;
while(*p){
if (*p ==','){
count++;
}
*p++;
}
As others have already pointed out, you're comparing to the wrong value here. I'd also note, however, that when you have an initialization, a test, and an "increment" operation of some sort, you're almost certainly better off using a for loop instead of a while loop.
In addition, you have the increment part a bit wrong here. You really only want p++, not *p++.
for (char *p=d; *p; ++p)
if (*p == '.')
++count;
When we get down to it, a slightly modified version of that loop is pretty much all we really need for the whole task though:
char const *p;
for (p = argv[1]; *p; ++p)
if (*p == '.')
++count;
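Putting the fixes together, the counting can live in a small function; count_char() is an illustrative name, not from the question.

```c
#include <stddef.h>

/* Counts occurrences of ch in the null-terminated string s. */
size_t count_char(const char *s, char ch)
{
    size_t count = 0;

    for (const char *p = s; *p != '\0'; ++p)   /* advance the pointer, p++ not *p++ */
        if (*p == ch)
            ++count;
    return count;
}
```

main() then only needs to guard against a missing argument before touching argv[1], e.g. `if (argc < 2) return 1;` followed by `printf("The count is: %zu\n", count_char(argv[1], '.'));`, which prints 4 for "this...is a test." (%zu needs C99).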
change
if (*p ==',')
to
if (*p =='.')
to count periods.
I believe there is just a typo. Replace ',' with '.' and it will print 4; I've just tested it.
I'm porting some code from Java to C, and so far things have gone well.
However, I have a particular function in Java that makes liberal use of StringBuilder, like this:
StringBuilder result = new StringBuilder();
// .. build string out of variable-length data
for (SolObject object : this) {
result.append(object.toString());
}
// .. some parts are conditional
if (freezeCount < 0) result.append("]");
else result.append(")");
I realize SO is not a code translation service, but I'm not asking for anyone to translate the above code.
I'm wondering how to efficiently perform this type of mass string concatenation in C. It's mostly small strings, but each is determined by a condition, so I can't combine them into a simple sprintf call.
How can I reliably do this type of string concatenation?
A rather "clever" way to convert a number of "objects" to a string is:
char buffer[100];
char *str = buffer;
str += sprintf(str, "%06d", 123);
str += sprintf(str, "%s=%5.2f", "x", 1.234567);
This is fairly efficient, since sprintf returns the length of the string copied, so we can "move" str forward by the return value, and keep filling in.
Of course, if there are true Java Objects, then you'll need to figure out how to make a Java style ToString function into "%somethign" in C's printf family.
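The cursor trick is easy to overrun, because sprintf() has no idea how big buffer is. A bounded sketch of the same idea with snprintf(), which returns the length it wanted to write, so truncation can be detected; build_list() and its parameters are illustrative, loosely mirroring the Java loop and the freezeCount condition.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: concatenates items, then appends "]" or ")"
 * depending on freeze_count. Returns the final length, or -1 if buf is
 * too small. */
int build_list(char *buf, size_t size, const char *items[], size_t count,
               int freeze_count)
{
    size_t used = 0;
    int n;

    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, "%s", items[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;                      /* would not fit */
        used += (size_t)n;                  /* move the cursor forward */
    }
    n = snprintf(buf + used, size - used, "%s", freeze_count < 0 ? "]" : ")");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```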
The performance problem with strcat() is that it has to scan the destination string to find the terminating '\0' before it can start appending to it.
But remember that strcat() doesn't take strings as arguments, it takes pointers.
If you maintain a separate pointer that always points to the terminating '\0' of the string you're appending to, you can use that pointer as the first argument to strcat(), and it won't have to re-scan the string every time. For that matter, you can use strcpy() rather than strcat().
Maintaining the value of this pointer and ensuring that there's enough room are left as an exercise.
NOTE: you can use strncat() to avoid overwriting the end of the destination array (though it will silently truncate your data). I don't recommend using strncpy() for this purpose. See my rant on the subject.
If your system supports them, the (non-standard) strlcpy() and strlcat() functions can be useful for this kind of thing. They both return the total length of the string they tried to create. Using them makes your code less portable, but there are open-source implementations that you can use anywhere.
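If strlcpy() isn't available, its documented behaviour is easy to reproduce. A minimal sketch, named my_strlcpy (a hypothetical name) so it won't clash with a system-provided version:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of BSD strlcpy() semantics: copies at most size - 1 bytes,
   always NUL-terminates when size > 0, and returns strlen(src), the
   length of the string it tried to create. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;   /* a value >= size means the copy was truncated */
}
```

Because the return value is the full source length, a result of size or more signals truncation; when nothing was truncated, it tells you exactly where the string now ends, so the next copy can start at dst + result with size - result bytes left.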
Another solution is to call strlen() on the string you're appending. This isn't ideal, since it's then scanned twice, once by strcat() and once by strlen() -- but at least it avoids re-scanning the entire destination string.
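A minimal sketch of the end-pointer technique described above: strlen() is called only on the piece being added, and strcpy() writes directly at the saved end, so the existing contents are never rescanned. The helper append_at_end and its limit parameter (a pointer one past the last byte of the destination array) are assumptions for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Copies src to *end (the current '\0' of the destination) if it fits,
   then advances *end to the new '\0'. 'limit' points one past the last
   byte of the destination array. Returns 0 on success, or -1 (buffer
   unchanged) if src plus its terminator would not fit. */
int append_at_end(char **end, const char *limit, const char *src)
{
    size_t len = strlen(src);              /* scan only the new piece */

    if (len >= (size_t)(limit - *end))     /* need len bytes + the '\0' */
        return -1;
    strcpy(*end, src);                     /* safe: room was checked */
    *end += len;                           /* *end now points at the '\0' */
    return 0;
}
```

Start with char *end = buf; and buf[0] = '\0';, then pass &end and buf + sizeof buf on every call.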
The poor performance of naive string concatenation comes from rescanning the destination string over and over, and from reallocating memory. Joel Spolsky discusses this in his article Back to Basics, where he describes the naive method of concatenating strings:
Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.
The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.
The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"
"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"
If you can, you want to know how large your destination buffer needs to be before allocating it. The only realistic way to do this is to call strlen on all of the strings you want to concatenate. Then allocate the appropriate amount of memory and use a slightly modified version of strncpy that returns a pointer to the end of the destination buffer.
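A minimal sketch of that measure-then-allocate approach, assuming the pieces are already in an array of NUL-terminated strings. join_strings is a hypothetical name, and overflow of the total length is not checked:

```c
#include <stdlib.h>
#include <string.h>

/* Measures every piece first, allocates exactly once, then copies each
   piece at a running end position so nothing is rescanned.
   Returns a malloc'd string the caller must free(), or NULL on failure. */
char *join_strings(const char *const *parts, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(parts[i]);

    char *result = malloc(total + 1);   /* +1 for the terminating '\0' */
    if (result == NULL)
        return NULL;

    char *end = result;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        memcpy(end, parts[i], len);     /* no terminator copied yet */
        end += len;
    }
    *end = '\0';                        /* terminate once, at the very end */
    return result;
}
```

Because the buffer is sized exactly, no bounds checks are needed during the copy; the price is walking each piece twice with strlen, which a real implementation could avoid by caching the lengths.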
#include <stddef.h>
// Copies src to dest and returns a pointer to the next available
// character in the dest buffer (the '\0' it just wrote).
// Unless size is 0, a null terminator is always written. If
// src is longer than size - 1 characters, only size - 1 bytes are copied.
char* StringCopyEnd( char* dest, const char* src, size_t size )
{
size_t pos = 0;
if ( size == 0 ) return dest;  // no room even for the terminator
while ( pos < size - 1 && *src )
{
*dest = *src;
++dest;
++src;
++pos;
}
*dest = '\0';
return dest;
}
Note that you have to set the size parameter to the number of bytes left before the end of the destination buffer.
Here's a sample test function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void testStringCopyEnd( char* str1, char* str2, size_t size )
{
// Create an oversized buffer and fill it with A's so that
// if a string is not null terminated it will be obvious.
char* dest = malloc( size + 10 );
if ( dest == NULL ) return;
memset( dest, 'A', size + 10 );
char* end = StringCopyEnd( dest, str1, size );
end = StringCopyEnd( end, str2, size - ( end - dest ) );
printf( "length: %zu - '%s'\n", strlen( dest ), dest );
free( dest );
}
int main( void )
{
// Test with a buffer large enough to concatenate 'Hello World',
// then reduce the buffer size from there.
for ( int i = 12; i > 0; --i )
{
testStringCopyEnd( "Hello", " World", i );
}
return 0;
}
Which produces:
length: 11 - 'Hello World'
length: 10 - 'Hello Worl'
length: 9 - 'Hello Wor'
length: 8 - 'Hello Wo'
length: 7 - 'Hello W'
length: 6 - 'Hello '
length: 5 - 'Hello'
length: 4 - 'Hell'
length: 3 - 'Hel'
length: 2 - 'He'
length: 1 - 'H'
length: 0 - ''
If operations like these are very frequent, you could implement them in your own buffer type. Example (error handling omitted for brevity ;-):
#include <stdlib.h>
#include <string.h>
struct buff {
size_t used;
size_t size;
char *data;
} ;
/* To be implemented: grows bp->data to at least newsize bytes. */
void buff_resize(struct buff *bp, size_t newsize);
struct buff * buff_new(size_t size)
{
struct buff *bp;
bp = malloc (sizeof *bp);
bp->data = malloc (size);
bp->size = size;
bp->used = 0;
return bp;
}
void buff_add_str(struct buff *bp, const char *add)
{
size_t len;
len = strlen(add);
/* Make sure there is room for len characters plus the '\0'. */
if (bp->used + len +1 >= bp->size) buff_resize(bp, bp->used+1+len);
memcpy(bp->data + bp->used, add, len+1);
bp->used += len;
return;
}
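One possible way to fill in buff_resize() and the omitted error handling, as a sketch only. It grows the allocation by doubling (my assumption, chosen so that repeated appends stay cheap), and the functions return int instead of void so that a failed malloc or realloc can be reported; buff_free is an added helper:

```c
#include <stdlib.h>
#include <string.h>

struct buff {
    size_t used;   /* characters stored, not counting the '\0' */
    size_t size;   /* bytes allocated for data */
    char *data;
};

/* Grows bp->data to at least 'needed' bytes by doubling the current size.
   Returns 0 on success, -1 if realloc fails (the old data stays valid).
   Overflow of newsize is not checked in this sketch. */
int buff_resize(struct buff *bp, size_t needed)
{
    size_t newsize = bp->size ? bp->size : 16;
    char *p;

    while (newsize < needed)
        newsize *= 2;
    p = realloc(bp->data, newsize);
    if (p == NULL)
        return -1;
    bp->data = p;
    bp->size = newsize;
    return 0;
}

struct buff *buff_new(size_t size)
{
    struct buff *bp = malloc(sizeof *bp);
    if (bp == NULL)
        return NULL;
    if (size == 0)
        size = 1;                 /* always leave room for the '\0' */
    bp->data = malloc(size);
    if (bp->data == NULL) {
        free(bp);
        return NULL;
    }
    bp->data[0] = '\0';           /* start out as an empty string */
    bp->size = size;
    bp->used = 0;
    return bp;
}

/* Appends add without rescanning the existing contents:
   bp->used already says where the string ends. */
int buff_add_str(struct buff *bp, const char *add)
{
    size_t len = strlen(add);

    if (bp->used + len + 1 > bp->size &&
        buff_resize(bp, bp->used + len + 1) != 0)
        return -1;
    memcpy(bp->data + bp->used, add, len + 1);   /* includes the '\0' */
    bp->used += len;
    return 0;
}

void buff_free(struct buff *bp)
{
    if (bp != NULL) {
        free(bp->data);
        free(bp);
    }
}
```

Doubling means each byte is copied only a constant number of times on average, which is essentially what Java's StringBuilder does internally.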
Given that the strings look so small, I'd be inclined just to use strcat and revisit if performance becomes an issue.
You could write your own function that remembers the string length, so it doesn't need to iterate through the string to find the end (which is potentially the slow part of strcat if you are doing lots of appends to long strings).