What if there's '\0' character in command line input? - c

int main(int argc,char* argv[]);
If there's a '\0' character in A, will it be split into 2 arguments ?
./programe "A"
I can't reproduce it easily as I can't put a '\0' into A, but there might be someone who can.

Parameters are passed into programs as C strings. In particular, the execve() syscall (the lowest level visible to programs, and generally either very close to or identical to the kernel API) takes C strings, so it is not possible to pass \0 within a parameter. Note that the kernel usually places the parameter vector contiguously in the process's address space, so that an embedded \0 would indeed split a parameter there; the low-level exec() interface, however, takes a list of (char *)s, so an embedded \0 simply terminates the parameter early.
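To make the two cases concrete, here is a minimal sketch. The helper names visible_length and count_packed_strings are made up for illustration; nothing here calls execve() itself, it only mimics how a C-string consumer and a contiguous argument block would treat an embedded \0.

```c
#include <stddef.h>
#include <string.h>

/* Length of an argument as a C-string consumer sees it:
 * everything from the first '\0' onwards is invisible. */
size_t visible_length(const char *arg)
{
    return strlen(arg);
}

/* Count the NUL-terminated strings packed back to back in a block of
 * `size` bytes, similar to a contiguous argument area. */
size_t count_packed_strings(const char *block, size_t size)
{
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        if (block[i] == '\0')
            count++;
    }
    return count;
}
```

With the 4-byte literal "A\0B", visible_length() only sees the "A", while count_packed_strings() over all 4 bytes finds two separate strings.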

Related

Different behavior for \" in C

I have a strange problem when using string function in C.
Currently I have a function that sends string to UART port.
When I give to it a string like
char buf[32];
strcpy(buf, "AT+CPMS=\"SM");
strcat(buf, "\"");
uart0_putstr(buf);
//or
uart0_putstr("AT+CPMS=SM"); //not a valid AT command, but without quotes just for test
it works well and sends string to UART. But when I use such call:
char buf[32];
strcpy(buf, "AT+CPMS=\"SM\"");
uart0_putstr(buf);
//or
uart0_putstr("AT+CPMS=\"SM\"");
it doesn't print to UART anything.
Maybe you can explain to me what the difference is between the strings in the first and the second/third cases?
First the C language part:
String literals: All C string literals include an implicit null byte at the end; the C string literal "123" defines a 4 byte array with the values 49,50,51,0. The null byte is always there even if it is never mentioned and enables strlen, strcat etc. to find the end of the string. The suggestion strcpy(buf, "AT+CPMS=\"SM\"\0"); is nonsensical: The character array produced by "AT+CPMS=\"SM\"\0" now ends in two consecutive zero bytes; strcpy will stop at the first one already. "" is a 1 byte array whose single element has the value 0. There is no need to append another 0 byte.
strcat, strcpy: Both functions always add a null byte at the end of the string. There is no need to add a second one.
Escaping: As you know, a C string literal consists of characters book-ended by double quotes: "abc". This makes it impossible to have simple double quotes as part of the string because that would end the string. We have to "escape" them. The C language uses the backslash to give certain characters a special meaning, or, in this case, suppress the special meaning. The entire combination of backslash and subsequent source code character is transformed into a single character, or byte, in the compiled program. The combination \n is transformed into a single byte with the value 10 (usually interpreted as a newline by output devices), \r is 13 (usually carriage return), and \" is transformed into the byte 34, usually printed as the " glyph. The string Between the arrows is a double quote: ->"<- must be coded as "Between the arrows is a double quote: ->\"<-" in C. The middle double quote doesn't end the string literal because it is "escaped".
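The points about literal sizes and escapes can be checked directly. This is only a small sketch: the function names are invented for the example, and the byte values assume an ASCII execution character set.

```c
#include <stddef.h>

/* sizeof on a string literal counts the implicit terminating zero byte. */
size_t size_of_123(void)   { return sizeof "123"; }  /* 3 characters + terminator */
size_t size_of_empty(void) { return sizeof ""; }     /* terminator only */

/* An escape sequence in the source becomes a single byte in the program. */
int escaped_quote(void)   { return "\""[0]; }
int escaped_newline(void) { return '\n'; }
int escaped_return(void)  { return '\r'; }
```

On an ASCII system these return 4, 1, 34, 10 and 13 respectively.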
Then the UART part: The internet makes me believe that the command you want to send over the UART looks like AT+CPMS="SM", followed by a carriage return. The corresponding C string literal would be "AT+CPMS=\"SM\"\r".
The page I linked also inserts a delay between sending commands. Sending too quickly may cause errors that appear only sometimes.
The things to note are :
The AT command syntax probably demands that SM be surrounded by quotes on both sides.
Additionally, the protocol probably demands that a command end in a carriage return.
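Putting the two notes together, a command with both quotes and a trailing carriage return could be built like this. format_cpms is a hypothetical helper, not part of any UART or modem library, and whether your modem really wants this exact command is an assumption taken from the discussion above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes AT+CPMS="<storage>"\r into buf.
 * Returns the command length, or -1 if buf is too small. */
int format_cpms(char *buf, size_t size, const char *storage)
{
    int n = snprintf(buf, size, "AT+CPMS=\"%s\"\r", storage);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Something like format_cpms(buf, sizeof buf, "SM") followed by uart0_putstr(buf) would then send the quoted form with the terminating \r.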
This ...
char buf[32];
strcpy(buf, "AT+CPMS=\"SM");
strcat(buf, "\"");
... produces the same contents in buf as this ...
char buf[32];
strcpy(buf, "AT+CPMS=\"SM\"");
... does, up to and including the string terminator at index 12. I fully expect an immediately following call to ...
uart0_putstr(buf);
... to have the same effect in each case unless uart0_putstr() looks at bytes past the terminator or its behavior is sensitive to factors other than its argument.
If it does look past the terminator, however, then that might explain not only a difference between those two, but also a difference with ...
uart0_putstr("AT+CPMS=\"SM\"");
... because in this last case, looking past the string terminator would overrun the bounds of the array, producing undefined behavior.
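The claim that both ways of filling buf give the same contents can be verified with a short sketch; same_result is just a name chosen for this example.

```c
#include <string.h>

/* Returns 1 if building the command in two steps and in one step
 * leaves the same string in the buffer, 0 otherwise. */
int same_result(void)
{
    char two_steps[32];
    char one_step[32];

    strcpy(two_steps, "AT+CPMS=\"SM");
    strcat(two_steps, "\"");

    strcpy(one_step, "AT+CPMS=\"SM\"");

    /* Compare up to and including the terminator at index 12. */
    return memcmp(two_steps, one_step, 13) == 0;
}
```

Only the bytes after index 12 can differ between the two buffers, and a well-behaved string function never looks at those.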
Thanks all. In the end it was resolved by adding a null character to the end of the string.

Properties of strcpy()

I have a global definition as following:
#define globalstring "example1"
typedef struct
{
char key[100];
char trail[10][100];
bson_value_t value;
} ObjectInfo;
typedef struct
{
ObjectInfo CurrentOrderInfoSet[5];
} DataPackage;
DataPackage GlobalDataPackage[10];
And I would like to use the strcpy() function in some of my functions as following:
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, "example2");
char string[100] = "example3";
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string);
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, globalstring);
First question: Are the global defined strings all initiated with 100 times '\0'?
Second question: I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is or does it fully delete any content of the destination string prior to that?
Third question: All my strings are fixed length of 100. If I use the 3 examples of strcpy() above, with my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it? Meaning do I run into problems when using functions like strlen(), printf() later?
Fourth question: What happens when I strcpy() empty strings?
I plan to overwrite these strings in loops various times and would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Thx.
Are the global defined strings all initiated with 100 times '\0'?
Yes. Global char arrays will be initialized to all zeros.
I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is or does it fully delete any content of the destination string prior to that?
Exactly. It copies the characters up until and including '\0' and does not care about the rest.
If I use ... my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it?
Yes, but NULL is a pointer; the string is terminated with a zero byte, sometimes called NUL. You might want to see What is the difference between NUL and NULL? .
Meaning do I run into problems when using functions like strlen(), printf() later?
Not if your string lengths are less than or equal to 99.
What happens when I strcpy() empty strings?
It just copies one zero byte.
would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Safety is a broad concept. If safety means whether the program will execute properly, there is no point in caring about anything after the zero byte, so just strcpy it.
But you should check that your strings are at most 99 characters long and decide what to do if they are longer. You might be interested in strnlen, but the interface is confusing - I recommend using memcpy and explicitly setting the zero byte yourself.
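A minimal sketch of the memcpy + explicit zero byte approach, assuming src is itself a valid NUL-terminated string. copy_checked is a hypothetical name, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes).
 * Returns 0 on success, -1 if src does not fit; dst is left untouched then. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';   /* explicit zero byte */
    return 0;
}
```

For the struct in the question this would be called as copy_checked(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, sizeof GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string). Like strcpy(), it leaves any bytes after the new terminator as they were, and copying an empty string just writes one zero byte.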

C Why is an unrelated/undeclared variable influencing the output of another?

When the character array substring[#] is set as [64], the file outputs an additional character. The additional character varies with each compile. Sometimes es?, sometimes esx among others.
If I change the [64] to any other number (at least the ones I've tried: 65, 256,1..) it outputs correctly as es.
Even more strange, if I leave the unused/undeclared character array char newString[64] in this file, it outputs the correct substring es even with the 64.
How does the seemingly arbitrary size of 64 affect the output?
How does a completely unrelated character array (newString) influence how another character array is output?
#include <stdio.h>
#include <string.h>
int main () {
char string[64];
char newString[64];
char substring[64];
fgets(string,64,stdin);
strncpy(substring, string+1, 1);
printf("%s\n", substring);
return 0;
}
The problem is that strncpy() does not copy the null terminator here: you asked it to copy only 1 byte, and it stops once that count is reached, so the rest of substring keeps whatever indeterminate contents it had.
Using strncpy() is safe and dangerous at the same time, because it will not always copy the null terminator. Using it for a single byte is also pointless; instead do this
substring[0] = string[1];
substring[1] = '\0';
and it shall work.
You should read the manual page strncpy(3) to understand exactly what I mean. Reading the manual carefully every time will make you a better programmer in a shorter time.
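For substrings longer than one character, the same idea (copy a known number of bytes, then terminate explicitly) can be wrapped in a small function. This is only a sketch; copy_substring is a made-up helper, not something from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to count characters of src, starting at
 * offset start, into dst and always terminate the result.
 * Returns the number of characters copied. */
size_t copy_substring(char *dst, size_t dst_size,
                      const char *src, size_t start, size_t count)
{
    if (dst_size == 0)
        return 0;

    size_t src_len = strlen(src);
    if (start > src_len)
        start = src_len;              /* nothing left to copy */
    if (count > src_len - start)
        count = src_len - start;      /* do not read past the source */
    if (count > dst_size - 1)
        count = dst_size - 1;         /* leave room for the terminator */

    memcpy(dst, src + start, count);
    dst[count] = '\0';
    return count;
}
```

In the program from the question, copy_substring(substring, sizeof substring, string, 1, 1) would take the place of the strncpy() call.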

Kindly explain how this function of the C program works?

I am reading the book Windows System Programming. In the second chapter, there is a program Cat.c, which implements the cat command of Linux. The code is http://pastebin.com/wwQFp599
On the 20th line, a function is called
iFirstFile = Options (argc, argv, _T("s"), &dashS, NULL);
Code for Option is http://pastebin.com/QegxxFpn
Now, the parameters for option is
(int argc, LPCTSTR argv [], LPCTSTR OptStr, ...)
1) What is this "..."? Does it mean we can supply it an unlimited number of arguments of type LPCTSTR?
2)If I execute the program as cat -s a.txt
a) What will be the argc and why?
b) What will be the argv and why?
c) What is _T("s")? Why _T is used here?
d) Why &dashS is used? It is address of a boolean most probably. But I can't understand the logic behind use of this.
e) Why they have passed NULL as last parameter?
I have Basic knowledge of C programming and these things are really confusing. So kindly explain.
You have two different kinds of "variable" lists of arguments here.
First, you have the arguments passed to the program on the command line, clearly a person could invoke the program from the command line with many arguments
cat file1 file2 file3
and so on. The main() of C programs has, since the early days of C, given access to the command line arguments in the variables argc and argv. argc is the count of arguments (3 + the name of the program itself in my example above) and argv is the array of the arguments (actually an array of pointers to strings). So in this case we can access argv[0], argv[1], argv[2] and argv[3], knowing to stop there because argc tells us there are four arguments.
So in your example argc will be 3, argv[0] will point to "cat", argv[1] to "-s" and argv[2] to "a.txt".
Next, the function you are looking at itself takes an indefinite number of arguments, as indicated by the ellipsis - the ...
You need to read about variable arguments. This is a language feature that was not in the earliest C language, and is considered to be a little bit advanced, hence some of your books either may not cover it, or leave it until late in the book. The key point here though is that when interpreting the variable list we need to know when we have reached the end of it, and we don't have an "argc" equivalent. So we put a "this is the last one, stop here" value in the function call; that's the NULL you ask about.
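Here is a rough sketch of how such a function can walk a variable argument list that ends in NULL. parse_flags is a hypothetical stand-in loosely shaped like the call in the question; it is not the book's Options code, and it uses plain char and bool instead of the Windows types.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. For each letter in opt_str the caller passes one
 * bool* in the variable argument list, followed by a terminating NULL.
 * Returns the index of the first argument that is not an option. */
int parse_flags(int argc, const char *argv[], const char *opt_str, ...)
{
    va_list ap;
    va_start(ap, opt_str);

    for (size_t i = 0; opt_str[i] != '\0'; i++) {
        bool *flag = va_arg(ap, bool *);
        if (flag == NULL)            /* sentinel reached early: stop */
            break;
        *flag = false;
        /* Look for this letter in every leading "-..." argument. */
        for (int a = 1; a < argc && argv[a][0] == '-'; a++) {
            if (strchr(argv[a] + 1, opt_str[i]) != NULL)
                *flag = true;
        }
    }
    va_end(ap);

    int first = 1;
    while (first < argc && argv[first][0] == '-')
        first++;
    return first;
}
```

For cat -s a.txt, a call like parse_flags(argc, argv, "s", &dash_s, (bool *)NULL) would set dash_s to true through the pointer and return 2, the index of a.txt. The cast on the sentinel is there because a bare NULL is not guaranteed to have pointer type when passed through "...".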
1) "..." is a variable argument list as username Cornstalks pointed out. It allows functions like printf() to have a variable number of arguments, but their type and number have to be specified in one of the arguments (like the formatting string for printf()). See stdarg.h, which defines va_list.
2) a) argc is the number of arguments specified at the command line.
b) argv is the argument array, it's an array of strings.
c) The _T() is a macro, I know it as TEXT(). Basically it allows programmers to use either ASCII strings or Unicode strings at build time without having to modify the entire code. If the UNICODE macro is defined the string specified as argument to the _T() macro becomes L"string", else it becomes "string". That's why some functions have an A or a W as a last letter. For example, OutputDebugString defaults to OutputDebugStringW if UNICODE is defined and OutputDebugStringA if UNICODE is not defined. The functions that have A as a last letter in their name only accept ASCII strings while W only accepts Unicode strings. There's also a type defined for this purpose, TCHAR defaults either to CHAR or WCHAR, and there's also another entry point, i.e. _tmain().
d) &variable means the address of a variable. It is used to pass along to a function the location in the memory of the variable contents so that if the function modifies the value of the variable, the variable is modified everywhere else where it is used.
e) You'll have to look at the function prototype.
It seems to me that you have been misled to believe that starting Windows Programming is the way to go if you want to learn to program. The C and C++ programming languages are OS independent by default and you should learn the independent part first. I recommend "C Programming : A modern approach".

Tool functions for chars

I want to handle some char variables and would like to get a list of some functions that can do these tasks when it comes to handling chars.
Getting first characters of a char (var_name[1] doesnt seem to work)
Getting last characters of a char
Checking for char1 matches with char2 (e.g. if "unicorn" matches words with "bicycle")
I am pretty sure some of these methods exist in libraries such as stdio.h or so but google isnt my friend.
EDIT:My 3rd question means not direct match with strcmp but single character match(eg if "hey" and "hello") have e as common letter.
Use var_name[0] to get first character (array indexes run from 0 to N - 1, where N is the number of elements in the array).
Use var_name[strlen(var_name) - 1] to get the last character.
Use strcmp() to compare two char strings.
EDIT:
To search for character in a string you can use strchr():
if (strchr("hello", 'e') && strchr("hey", 'e'))
{
}
There is also strpbrk() function that would indicate if two strings have any common characters:
if (strpbrk("hello", "hey"))
{
}
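The three tasks could be wrapped up roughly like this. The function names are invented for the example; the only library calls are strlen() and strpbrk() from string.h.

```c
#include <stdbool.h>
#include <string.h>

/* First character of s; '\0' if s is empty. */
char first_char(const char *s)
{
    return s[0];
}

/* Last character of s; '\0' if s is empty (strlen(s) - 1 would wrap around). */
char last_char(const char *s)
{
    size_t len = strlen(s);
    return len == 0 ? '\0' : s[len - 1];
}

/* True if a and b share at least one character. */
bool share_a_char(const char *a, const char *b)
{
    return strpbrk(a, b) != NULL;
}
```

The empty-string check in last_char() matters: strlen() returns an unsigned size_t, so subtracting 1 from 0 gives a huge index instead of -1.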
Assuming you mean a char[], and not a char which is a single character.
C uses 0-based indexing, var_name[0] gives you the first char.
strlen() gives you the length of the string, which together with my answer to 1. means
char lastchar = var_name[strlen(var_name)-1]; http://www.cplusplus.com/reference/clibrary/cstring/strlen/
strcmp(var_name1, var_name2) == 0. http://www.cplusplus.com/reference/clibrary/cstring/strcmp/
I am pretty sure some of these methods exist in libraries such as
stdio.h or so but google isnt my friend.
The string functions in the C standard library (libc) are described in the header file <string.h>. If you're on a unix-ish machine, try typing man 3 string at a command line. You can then use the man program again to get more information about specific functions, e.g. man 3 strlen. (The '3' just tells man to look in "section 3", which describes the C standard library functions.)
What you're looking for is the string functions in the C runtime library. These are defined in string.h, not stdio.h.
But your list of problems is simple:
var_name[0] works perfectly well for accessing the first char in an array. var_name[1] doesn't work because arrays in C are zero-based.
The last char in an array is:
char c;
c = var_name[strlen(var_name)-1];
Testing for equality is simple:
if (var_name[0] == var_name[1])
; // they match
C and C++ strings are zero indexed. The memory you need to hold a particular length string has to be at least the string length and one character for the string terminator \0. So, the first character is array[0].
As @Carey Gregory said, the basic string handling functions are in string.h. But these are only primitives for handling strings. C is a low-level enough language that you have an opportunity to build up your own string handling library based on the functions in string.h.
One example might be that you want to pass a string pointer to a function and also the length of the buffer holding that same string, not just the string length itself.
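As a sketch of that idea, here is a small append function that takes the buffer capacity alongside the pointer. append_checked is a hypothetical helper built on strlen() and memcpy(), not something string.h provides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append suffix to the string in buf, where buf_size
 * is the capacity of the buffer (not the current string length).
 * Returns 0 on success, -1 if the result would not fit; buf is unchanged then. */
int append_checked(char *buf, size_t buf_size, const char *suffix)
{
    size_t used = strlen(buf);
    size_t extra = strlen(suffix);
    if (used >= buf_size || extra >= buf_size - used)
        return -1;
    memcpy(buf + used, suffix, extra + 1);   /* +1 copies the terminator */
    return 0;
}
```

Because the function knows the capacity, it can refuse an append that would overflow instead of writing past the end the way a bare strcat() would.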
