Inserting "String" variable inside a given String in C

I have a function, which has a String Parameter:
function(char str[3]){
//here i want to insert the string Parameter str
f = open("/d1/d2/d3/test"+str+"/d2.xyz")
}
I am trying to "insert" the String parameter into the given String path. How can I do this in C?

The typical way would be to create a new string by piecing together the three pieces. One way to do this would be the following (shamelessly stolen from @chux's comment):
char buf[1000];
sprintf(buf, "/d1/d2/d3/test%s/d2.xyz", str);
But before you go that route you need to ensure you really understand the printf family of functions, as they are a common source of security-related errors. For example, my buf size is large enough for your example, but certainly not for a general solution. Instead, the sizes of the input strings would need to be taken into account to ensure the output buffer is large enough.
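As a minimal sketch of the bounded alternative, the following uses snprintf and checks its return value for truncation. The helper name build_path and its error convention are assumptions made for illustration, not part of the original answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "/d1/d2/d3/test<str>/d2.xyz" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting failed. */
int build_path(char *buf, size_t bufsize, const char *str)
{
    assert(buf != NULL && str != NULL);
    int n = snprintf(buf, bufsize, "/d1/d2/d3/test%s/d2.xyz", str);
    /* snprintf returns the length it wanted to write (excluding the
     * terminator); n >= bufsize means the output was truncated. */
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return 0;
}
```

A caller would pass sizeof buf as the second argument and only open the file when the function returns 0.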

Related

Can a C implementation use length-prefixed-strings "under the hood"?

After reading this question: What are the problems of a zero-terminated string that length-prefixed strings overcome? I started to wonder, what exactly is stopping a C implementation from allocating a few extra bytes for any char or wchar_t array allocated on the stack or heap and using them as a "string prefix" to store the number N of its elements?
Then, if the N-th character is '\0', N - 1 would signify the string length.
I believe this could mightily boost performance of functions such as strlen or strcat.
This could potentially turn to extra memory consumption if a program uses non-0-terminated char arrays extensively, but that could be remedied by a compiler flag turning on or off the regular "count-until-you-reach-'\0'" routine for the compiled code.
What are possible obstacles for such an implementation? Does the C Standard allow for this? What problems can this technique cause that I haven't accounted for?
And... has this actually ever been done?
You can store the length of the allocation. And malloc implementations really do do that (or some do, at least).
You can't reasonably store the length of whatever string is stored in the allocation, though, because the user can change the contents to their whim; it would be unreasonable to keep the length up to date. Furthermore, users might start strings somewhere in the middle of the character array, or might not even be using the array to hold a string!
Then, if the N-th character is '\0', N - 1 would signify the string length.
Actually, no, and that's why this suggestion cannot work.
If I overwrite a character in a string with a 0, I have effectively truncated the string, and a subsequent call of strlen on the string must return the truncated length. (This is commonly done by application programs, including every scanner generated by (f)lex, as well as the strtok standard library function. Amongst others.)
Moreover, it is entirely legal to call strlen on an interior byte of the string.
For example (just for demonstration purposes, although I'll bet you can find code almost identical to this in common use):
#include <string.h>  /* strchr, strlen */

/* Split a string like 'key=value...' into key and value parts, and
 * return the value, and optionally its length (if the second argument
 * is not a NULL pointer).
 * On success, returns the value part and modifies the original string
 * so that it is the key.
 * If there is no '=' in the supplied string, neither it nor the value
 * pointed to by plen are modified, and NULL is returned.
 */
char* keyval_split(char* keyval, int* plen) {
  char* delim = strchr(keyval, '=');
  if (delim) {
    /* strlen is called on an interior byte of the original string */
    if (plen) *plen = (int)strlen(delim + 1);
    /* overwriting '=' with 0 truncates the original string to the key */
    *delim = 0;
    return delim + 1;
  } else {
    return NULL;
  }
}
There's nothing fundamentally stopping you from doing this in your application, if that was useful (one of the comments noted this). There are two problems that would emerge, however:
1. You'd have to reimplement all the string-handling functions, and have my_strlen, my_strcpy, and so on, and add string-creating functions. That might be annoying, but it's a bounded problem.
2. You'd have to stop people, when writing for the system, deliberately or automatically treating the associated character arrays as ‘ordinary’ C strings, and using the usual functions on them. You might have to make sure that such usages broke promptly.
This means that it would, I think, be infeasible to smuggle a reimplemented ‘C string’ into an existing program.
Something like
typedef struct {
size_t len;
char* buf;
} String;
size_t my_strlen(String*);
...
might work, since type-checking would frustrate (2) (unless someone decided to hack things ‘for efficiency’, in which case there's not much you can do).
Of course, you wouldn't do this unless and until you'd proven that string management was the bottleneck in your code and that this approach provably improved things....
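To make the struct idea concrete, here is a rough sketch under stated assumptions: string_init and string_free are hypothetical names, and keeping buf 0-terminated is a convenience choice, not something the answer above requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len;
    char *buf;   /* owned; kept 0-terminated here as a convenience (assumption) */
} String;

/* Hypothetical constructor: copies a C string and records its length.
 * Returns 0 on success, -1 on allocation failure. */
int string_init(String *s, const char *cstr)
{
    assert(s != NULL && cstr != NULL);
    size_t n = strlen(cstr);
    char *p = malloc(n + 1);
    if (p == NULL)
        return -1;
    memcpy(p, cstr, n + 1);
    s->len = n;
    s->buf = p;
    return 0;
}

/* O(1): reads the stored length instead of scanning for a terminator. */
size_t my_strlen(const String *s)
{
    assert(s != NULL);
    return s->len;
}

/* Hypothetical destructor: releases the buffer and resets the fields. */
void string_free(String *s)
{
    assert(s != NULL);
    free(s->buf);
    s->buf = NULL;
    s->len = 0;
}
```

Because String is a distinct type, passing one to strlen or strcpy by mistake is a compile-time error, which is the type-checking benefit mentioned above. Nothing stops code from writing into s.buf directly and leaving len stale, though.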
There are a couple of issues with this approach. First of all, you wouldn't be able to create arbitrarily long strings. If you only reserve 1 byte for length, then your string can only go up to 255 characters. You can certainly use more bytes to store the length, but how many? 2? 4?
What if you try to concatenate two strings that are both at the edge of their size limits (i.e., if you use 1 byte for length and try to concatenate two 250-character strings to each other, what happens)? Do you simply add more bytes to the length as necessary?
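The 250 + 250 case can be sketched with a Pascal-style layout in which byte 0 holds the length. The names pstr_cat and pstr_fill and the choice to reject the operation (rather than widen the prefix) are assumptions for illustration only.

```c
#include <assert.h>
#include <string.h>

#define PSTR_MAX 255  /* largest length a single unsigned byte can hold */

/* Hypothetical layout: p[0] is the length, p[1..len] are the characters.
 * Every buffer is assumed to have room for PSTR_MAX + 1 bytes. */

/* Fills p with n copies of c; n must be at most PSTR_MAX. */
void pstr_fill(unsigned char *p, unsigned char c, size_t n)
{
    assert(p != NULL && n <= PSTR_MAX);
    memset(p + 1, c, n);
    p[0] = (unsigned char)n;
}

/* Appends src to dst. Returns 0 on success, -1 if the combined length
 * would not fit in the one-byte prefix (dst is left unchanged). */
int pstr_cat(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    size_t total = (size_t)dst[0] + src[0];
    if (total > PSTR_MAX)
        return -1;
    memcpy(dst + 1 + dst[0], src + 1, src[0]);
    dst[0] = (unsigned char)total;
    return 0;
}
```

With a one-byte prefix, concatenating two 250-character strings simply has to fail (or truncate). Any scheme that grows the prefix instead must also move the character data.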
Secondly, where do you store this metadata? It somehow has to be associated with the string. This is similar to the problem Dennis Ritchie ran into when he was implementing arrays in C. Originally, array objects stored an explicit pointer to the first element of the array, but as he added struct types to the language, he realized that he didn't want that metadata cluttering up the representation of the struct object in memory, so he got rid of it and introduced the rule that array expressions get converted to pointer expressions in most circumstances.
You could create a new aggregate type like
struct string
{
char *data;
size_t len;
};
but then you wouldn't be able to use the C string library to manipulate objects of that type; an implementation would still have to support the existing interface.
You could store the length in the leading byte or bytes of the string, but how many do you reserve? You could use a variable number of bytes to store the length, but now you need a way to distinguish length bytes from content bytes, and you can't read the first character by simply dereferencing the pointer. Functions like strcat would have to know how to step around the length bytes, how to adjust the contents if the number of length bytes changes, etc.
The 0-terminated approach has its disadvantages, but it's also a helluva lot easier to implement and makes manipulating strings a lot easier.
The string methods in the standard library have defined semantics. If one generates an array of char that contains various values, and passes a pointer to the array or a portion thereof, the methods whose behavior is defined in terms of NUL bytes must search for NUL bytes in the same fashion as defined by the standard.
One could define one's own methods for string handling which use a better form of string storage, and simply pretend that the standard library string-related functions don't exist unless one must pass strings to things like fopen. The biggest difficulty with such an approach is that unless one uses non-portable compiler features it would not be possible to use in-line string literals. Instead of saying:
ns_output(my_file, "This is a test"); // ns -- new string
one would have to say something more like:
MAKE_NEW_STRING(this_is_a_test, "This is a test");
ns_output(my_file, this_is_a_test);
where the macro MAKE_NEW_STRING would create a union of an anonymous type, define an instance called this_is_a_test, and suitably initialize it. Since a lot of strings would be of different anonymous types, type-checking would require that strings be unions that include a member of a known array type, and code expecting strings should be given a pointer to that member, likely using something like:
#define ns_output(f,s) (ns_output_func((f),(s).stringref))
It would be possible to define the types in such a way as to avoid the need for the stringref member and have code just accept void*, but the stringref member would essentially perform static duck-typing (only things with a stringref member could be given to such a macro) and could also allow type-checking on the type of stringref itself.
If one could accept those constraints, I think one could probably write code that was more efficient in almost every way than zero-terminated strings; the question would be whether the advantages would be worth the hassle.

Is it safe to concatenate formatted strings using sprintf()?

I am writing a program that requires a gradual building of a formatted string, to be printed out as the last stage. The string includes numbers that are collected while the string is formed. Thus, I need to add formatted string fragments to the output string.
One straightforward way is to use sprintf() to write the formatted fragment to a temporary string, which is then concatenated to the output string using strcat(), as demonstrated in this answer.
A more sophisticated approach is to point sprintf() to the end of the current output string, when adding the new fragment. This is demonstrated here.
The help page to the MSVC sprintf_s() function (and the other variants of sprintf()) states that:
If copying occurs between strings that overlap, the behavior is
undefined.
Now, technically, using sprintf() to concatenate the fragment to the end of the output string means overwriting the terminating null character, which is considered a part of the first string. So, this action falls under the category of overlapping strings. The technique seems to work well, but is it really safe?
The method in the answer you linked to:
strcat() for formatted strings
is safe against overlapping string issues, but of course it's unsafe in that it's performing unbounded writes using sprintf rather than snprintf.
What's not safe, and what the text about overlapping strings is referring to, is something like:
snprintf(buf, sizeof buf, "%s%s", buf, tail);
Here, overlapping ranges of buf are being used both as an input and an output, and this results in undefined behavior. Don't do it.
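By contrast, appending at the end of the buffer is fine as long as none of the arguments point into the buffer itself. A minimal sketch of that pattern with bounded writes follows; append_fmt is a hypothetical helper name, and discarding a fragment that does not fit is one possible policy among several.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a fragment into buf starting at offset *used
 * and advances *used. The variadic arguments must not point into buf.
 * Returns 0 on success, -1 on a formatting error or if the fragment did
 * not fit (in which case the previous contents of buf are kept). */
int append_fmt(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    assert(buf != NULL && used != NULL && fmt != NULL && *used < size);
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *used) {
        buf[*used] = '\0';   /* drop the partial fragment */
        return -1;
    }
    *used += (size_t)n;
    return 0;
}
```

The caller starts with buf[0] = '\0' and used = 0, then calls append_fmt once per fragment. Tracking the offset also avoids the repeated scan that strcat performs.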

Splitting input lines at a comma

I am reading the contents of a file into a 2D array. The file is of the type:
FirstName,Surname
FirstName,Surname
etc. This is a homework exercise, and we can assume that everyone has a first name and a surname.
How would I go about splitting the line using the comma so that in a 2D array it would look like this:
char name[100][2];
with
Column1 Column2
Row 0 FirstName Surname
Row 1 FirstName Surname
I am really struggling with this and couldn't find any help that I could understand.
You can use strtok to tokenize your string based on a delimiter, and then strcpy each token it returns into your name array.
Alternatively, you could use strchr to find the location of the comma, and then use memcpy to copy the parts of the string before and after this point into your name array. This way will also preserve your initial string and not mangle it the way strtok would. It'll also be more thread-safe than using strtok.
Note: a thread-safe alternative to strtok is strtok_r, however that's declared as part of the POSIX standard. If that function's not available to you there may be a similar one defined for your environment.
EDIT: Another way is by using sscanf; however, you won't be able to use the %s format specifier for the first string. You'd instead have to use a specifier with a set of characters to not match against (','). Since it's homework (and really simple) I'll let you figure that out.
EDIT2: Also, your array should be char name[2][100] for an array of two strings, each of 100 chars in size. Otherwise, with the way you have it, you'll have an array of 100 strings, each of 2 chars in size.
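The strchr plus memcpy approach might look roughly like the sketch below. FIELD_MAX and split_name are names invented for this example, and stripping a trailing newline is an assumption about how the lines are read (for instance with fgets).

```c
#include <assert.h>
#include <string.h>

#define FIELD_MAX 100   /* assumed field size, including the terminator */

/* Hypothetical helper: splits "First,Last" at the first comma without
 * modifying line. Returns 0 on success, -1 if there is no comma or a
 * part does not fit into FIELD_MAX bytes. */
int split_name(const char *line, char *first, char *last)
{
    assert(line != NULL && first != NULL && last != NULL);
    const char *comma = strchr(line, ',');
    if (comma == NULL)
        return -1;
    size_t first_len = (size_t)(comma - line);
    size_t last_len = strcspn(comma + 1, "\r\n");  /* stop before a line ending */
    if (first_len >= FIELD_MAX || last_len >= FIELD_MAX)
        return -1;
    memcpy(first, line, first_len);
    first[first_len] = '\0';
    memcpy(last, comma + 1, last_len);
    last[last_len] = '\0';
    return 0;
}
```

For one record, the two output buffers can be name[0] and name[1] of a char name[2][FIELD_MAX]. To hold many rows of two strings each, a three-dimensional array such as char names[100][2][FIELD_MAX] would be needed.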

C: Search and Replace/Delete

Is there a function in C that lets you search for a particular string and delete/replace it? If not, how do I do it myself?
It can be dangerous to do a search and replace in place, unless you are just replacing single chars with another char (i.e. changing all the 'a' to 'b'). The reason is that the replacement could make the string longer than the char array that holds it. It is better to copy the string, replacing as you go, into a new char array that can hold the result. A good C function for finding a substring is strstr(). So you can find your string, copy everything before it to another buffer, add your replacement to the buffer, and repeat.
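A sketch of that copy-and-replace loop, assuming the result is returned in freshly allocated memory, is shown here. The function name replace_all and the decision to reject an empty search string are choices made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src with every
 * occurrence of find replaced by repl, or NULL on allocation failure or
 * if find is empty. The caller frees the result. */
char *replace_all(const char *src, const char *find, const char *repl)
{
    assert(src != NULL && find != NULL && repl != NULL);
    size_t find_len = strlen(find), repl_len = strlen(repl);
    if (find_len == 0)
        return NULL;

    /* First pass: count matches so the result can be sized exactly. */
    size_t count = 0;
    for (const char *p = strstr(src, find); p != NULL;
         p = strstr(p + find_len, find))
        count++;

    size_t out_len = strlen(src) - count * find_len + count * repl_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    char *dst = out;
    const char *p;
    while ((p = strstr(src, find)) != NULL) {
        size_t before = (size_t)(p - src);
        memcpy(dst, src, before);
        dst += before;
        memcpy(dst, repl, repl_len);
        dst += repl_len;
        src = p + find_len;
    }
    strcpy(dst, src);   /* remaining tail, including the terminator */
    return out;
}
```

Passing an empty string as repl deletes every occurrence of find.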
<string.h> is full of string processing functions. Look here for reference:
http://www.edcc.edu/faculty/paul.bladek/c_string_functions.htm
http://www.cs.cf.ac.uk/Dave/C/node19.html

strncmp proper usage

Here's the quick background: I've got a client and a server program that are communicating with each other over a Unix socket. When parsing the received messages on the server side, I am attempting to use strncmp to figure out what action to take.
The problem I'm having is figuring out exactly what to use for the length argument of strncmp. The reason this is being problematic is that some of my messages share a common prefix. For example, I have a message "getPrimary", which causes the server to respond with a primary server address, and a message "getPrimaryStatus", which causes the server to respond with the status of the primary server. My initial thought was to do the following:
if(strncmp(message,"getPrimary",strlen("getPrimary"))==0){
return foo;
}
else if(strncmp(message,"getPrimaryStatus",strlen("getPrimaryStatus"))==0){
return bar;
}
The problem with this is when I send the server "getPrimaryStatus", the code will always return foo because strncmp is not checking far enough in the string. I could pass in strlen(message) as the length argument to strncmp, but this seems to defeat the purpose of using strncmp, which is to prevent overflow in the case of unexpected input. I do have a static variable for the maximum message length I can read, but it seems like passing this in as the length is only making sure that if the message overflows, the effects are minimized.
I've come up with a few solutions, but they aren't very pretty, so I was wondering if there was a common way of dealing with this problem.
For reference, my current solutions are:
1. Order my if / else if statements in such a way that any messages with common prefixes are checked in order of descending length (which seems like a really good way to throw a landmine in my code for anyone trying to add something to it later on).
2. Group my messages with common prefixes together and look for the suffix first:
if(strncmp(message,"getPrimary",strlen("getPrimary"))==0){
if(strncmp(message,"getPrimaryStatus",strlen("getPrimaryStatus"))==0){
return bar;
}
else{
return foo;
}
}
But this just feels messy, especially since I have about 20 different possible messages that I'm handling.
3. Create an array of all the possible messages I have, add a function to my init sequence that will order the array by descending length, and have my code search through the elements of that list until it finds a match. This seems complicated and silly.
It seems like this should be a common enough issue that there ought to be a solution for it somewhere, but I haven't been able to find anything so far.
Thanks in advance for the help!
Presuming that the string in message is supposed to be null-terminated, the only reason to use strncmp() here rather than strcmp() would be to prevent it looking beyond the end of message, in the case where message is not null-terminated.
As such, the n you pass to strncmp() should be the received size of message, which you ought to know (from the return value of the read() / recv() function that read the message).
One technique is to compare the longest names first - order the tests (or the table containing the keywords) so that the longer names precede the shorter. However, taking your example:
GetPrimaryStatus
GetPrimary
You probably want to ensure that GetPrimaryIgnition is not recognized as GetPrimary. So you really need to compare using the length of the longer of the two strings - the message or the keyword.
Your data structure here might be:
static const struct
{
char *name;
size_t name_len;
int retval;
} Messages[] =
{
{ "getPrimaryStatus", sizeof("getPrimaryStatus"), CMD_PRIMARYSTATUS },
{ "getPrimary", sizeof("getPrimary"), CMD_PRIMARY },
...
};
You can then loop through this table to find the relevant command. With some care you can limit the range that you have to look at. Note that the sizeof() values include the NUL at the end of the string. This is useful if you can null-terminate your message.
However, it is by far simplest if you can null terminate the command word in the message, either by copying the message somewhere or by modifying the message in situ. You then use strcmp() instead of strncmp(). A Shortest Unique Prefix lookup is harder to code.
One plausible way of finding the command word is with strcspn() - assuming your commands are all alphabetic or alphanumeric.
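Putting the table and the command-word idea together might look like the sketch below. It swaps strcspn for a bounded loop so that the message does not have to be null-terminated, and it stores lengths without the terminator. The CMD_ values, CMD_UNKNOWN, and lookup_command are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { CMD_UNKNOWN = 0, CMD_PRIMARY, CMD_PRIMARYSTATUS };  /* assumed values */

static const struct {
    const char *name;
    size_t name_len;   /* length without the terminator */
    int retval;
} Messages[] = {
    { "getPrimary",       sizeof("getPrimary") - 1,       CMD_PRIMARY },
    { "getPrimaryStatus", sizeof("getPrimaryStatus") - 1, CMD_PRIMARYSTATUS },
};

/* msg holds msg_len received bytes and need not be null-terminated. */
int lookup_command(const char *msg, size_t msg_len)
{
    assert(msg != NULL);
    /* Length of the leading command word: stop at the first byte that is
     * not alphanumeric, or at the end of the received data. */
    size_t word_len = 0;
    while (word_len < msg_len && isalnum((unsigned char)msg[word_len]))
        word_len++;

    for (size_t i = 0; i < sizeof Messages / sizeof Messages[0]; i++) {
        /* Requiring equal lengths makes the table order irrelevant. */
        if (word_len == Messages[i].name_len &&
            memcmp(msg, Messages[i].name, word_len) == 0)
            return Messages[i].retval;
    }
    return CMD_UNKNOWN;
}
```

Because the lengths must match exactly, getPrimaryIgnition is not mistaken for getPrimary, and new commands can be added to the table in any order.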
I sense that you are using strncmp to prevent buffer overflow. However, the message has already been copied into memory (i.e. the message buffer). Also, the prototype
int strncmp ( const char * str1, const char * str2, size_t num );
indicates that the function has no side effects (i.e. it does not change either input buffer) so there should be no risk that it will overwrite the buffer and change memory. (This is not the case for strcpy(). )
You could make sure that the length of your message buffer is longer than your longest command string. That way you are sure that you are always accessing memory that you own.
Also, if you insist on using strncmp you could store your list of commands in an array and sort it from largest to smallest. You could associate each string with a length (and possibly a function pointer to execute a handler).
Finally, you could find a C version of what C++ calls a map or what Ruby or PHP call associative arrays. This lets a library handle this if-else tree for you efficiently and correctly.
Don't use strncmp(). Use strlcmp() instead, if your platform provides one (it is not part of standard C or POSIX). It's safer.
Does your message contain just one of these commands, or a command string followed by whitespace/open-parenthesis/etc.?
If it's the former, drop strncmp and just use strcmp.
If it's the latter, simply check isspace(message[strlen(command)]) or message[strlen(command)]=='(' or similar. (Note: strlen(command) is a constant and you should probably write it as such, or use a macro to get it from the size of the string literal.)
The only safe way to use strncmp to determine if two strings are equal is to verify beforehand that the strings have the same length:
/* len is a placeholder for whatever variable or function you use to get the length */
if ((len(a) == len(b)) && (strncmp(a, b, len(a)) == 0))
{
/* Strings are equal */
}
Otherwise you will match to something longer or shorter than your comparison:
strncmp(a, "test", strlen("test")) matches "testing", "test and a whole bunch of other characters", etc.
strncmp(a, "test", strlen(a)) matches "", "t", "te", "tes".
Use strcmp, but also compare the lengths of the two strings. If the lengths are identical, then strcmp will give you the result you seek.
Digging out from my memory of doing C programming a year ago, I think the third argument is supposed to tell the function how many characters to process for the comparison. That's why it's safe, as you have control over how many characters to process.
So it should be something like:
if(strncmp(message, "getPrimary", strlen("getPrimary")) == 0) {
//
}

Resources