What is a delimiter in this chomp function? - c

I am trying to understand what the following code does
void chomp (char* string, char delim) {
    size_t len = strlen (string);
    if (len == 0) return;
    char* nlpos = string + len - 1;
    if (*nlpos == delim) *nlpos = '\0';
}
What is a delimiter? Does the fourth line basically save the last character in the string?

If the last character of the string matches delim, then that character's position in the string (the byte nlpos points to) is assigned a zero byte, which effectively terminates the C string one position closer to the beginning of the string.
I think the term chomp became popular with Perl, where it is often used to trim the terminating newline when processing input line by line.
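The behaviour described above can be sketched as a small self-contained example. The function body is the one from the question with comments added; the inputs mentioned in the note below are made up for illustration.

```c
#include <string.h>

/* Same logic as the function in the question: drop one trailing `delim`. */
void chomp(char *string, char delim)
{
    size_t len = strlen(string);
    if (len == 0)
        return;                       /* empty string: nothing to remove */
    char *nlpos = string + len - 1;   /* address of the final character */
    if (*nlpos == delim)
        *nlpos = '\0';                /* the string now ends one byte earlier */
}
```

With a writable buffer such as char line[] = "hello\n", calling chomp(line, '\n') should leave "hello" in line. A string that does not end in delim is left unchanged, and only one trailing delimiter is removed per call.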

The delimiter is whatever character the caller passes as delim; in practice this is usually the newline character.
The string length is computed with strlen and stored in len, whose type size_t is the standard unsigned integer type for object sizes.
If the length is zero, the function returns to the caller immediately.

This code removes a trailing delimiter from a string (which can be a buffer) by putting a null character (\0) in its place.
The fourth line stores a pointer to the last character of the string; the fifth line replaces that character with the null character if it matches delim.
A delimiter is a character or sequence of characters that marks a boundary in plain text. Here it is expected at the end of the string.
The null character is used in C-style strings to indicate where the string ends.

Related

in C I want to read in line by line from a file a certain way with the end length of the file changing

OK, I need to read information in from a file. I have to take certain parts of each line apart and do different things with each part. I know the maximum and minimum length of a line, but I am doing something wrong when I read in the file and split it up, as I am getting really funny values when I try to compare the parts. The maximum length of any line is 80 characters.
The format for each line will be as follows: (I will write them in column form as they would appear in a character array)
0-7 _ 9 10-16 _ 18 19-28_ _31-79
spots 0-7 will contain a string(any being under 8 will have blank spaces)
spots 8,17,29,30 are all blank spaces (Marked by underscores)
spots 10-16 will contain a string (again any being under the max length will have blank spaces at the end)
spot 18 will contain a blank space or a character
spot 19-28 will contain another string (Same as other cases)
spot 31-79 can be filled with a string or may not exist at all depends on the users input.
Right now I am using a buffer of size 82 and then doing strncpy to take certain parts from the buffer to break it up. It appears to be working fine but when I do strcmp I am getting funky answers and the strlen is not giving the char arrays I declared the right length.
(I have declared them as having a max length of 8,9,etc. but strlen has been returning weird numbers like 67)
So if I could just read it in broken up it should completely resolve the issue.
I was hoping there would be a way to do this but am currently unsure.
Any help would be greatly appreciated. I have attached the part of the code where I think the error is.
(I know it isn't good to have the size hardcoded in there but I want to get it working first and then I'll get rid of the magic numbers)
while (fgets(buffer, sizeof buffer, fp) != NULL) /* read a line from a file */
{
if (buffer[0] == '.') //If it is a comment line just echo it do not increase counter
{
printf("%s", buffer);
}
else if (buffer[0] == ' ' && buffer[10] == ' ') // If it is a blank line print blank line do not increase counter
{
printf("\n");
}
else //it is an actual instruction perform the real operations
{
//copy label down
strncpy(label, &buffer[0], 8);
//copy mnemonic into command string
strncpy(command, &buffer[9], 8);
//copy symbol down
symbol = buffer[syLoc];
//copy operand down
strncpy(operand, &buffer[19], 9);
Funky characters and overlong string lengths are a sign that the strings aren't null-terminated, as C (or at least most of C's library functions) expects them.
strncpy yields a null-terminated string only if the source string is shorter than the count you pass. In your case you copy substrings out of the middle of a longer string, so strncpy copies exactly that many characters and adds no terminator.
You could add the null-terminator by hand:
char label[9];
strncpy(label, &buffer[0], 8);
label[8] = '\0';
But given that you have spaces after the substrings you want anyway, you could also use strtok's approach to make your substrings pointers into the line you have read and overwrite the spaces with the null character:
char *label;
char *command;
label = &buffer[0];
buffer[8] = '\0';
command = &buffer[9];
buffer[9 + 8] = '\0';
This approach has the advantage that you don't need extra memory for the substrings. It has the drawback that your substrings become invalid when you read the next line. If your substrings don't need to "live" longer than that, this approach might be good for you.
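Warning: strncpy does not add a null terminator (\0) after the copied characters when the source is at least as long as the count.
To protect the target char array you have to add a \0 manually after each strncpy call (each target array needs one more byte than the number of characters copied), like this: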
//copy label down
strncpy(label, &buffer[0], 8);
label[8]='\0';
//copy mnemonic into command string
strncpy(command, &buffer[9], 8);
command[8]='\0';
//copy symbol down
symbol = buffer[syLoc]; //Ok just a single char
//copy operand down
strncpy(operand, &buffer[19], 9);
operand[9]='\0';
If no '\0' is added, functions such as strlen and strcmp keep reading past the end of the char array until they happen to find a '\0' in the memory that follows (a buffer over-read).
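The manual termination shown in the answers above can be wrapped in one helper so that every field is handled the same way. This is only a sketch: copy_field, its parameters, and the decision to trim trailing blanks are assumptions for illustration, not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to `width` bytes starting at line[start]
 * into dst and always null-terminate dst.
 * dst must have room for width + 1 bytes.
 * If the line is shorter than `start`, dst becomes the empty string.
 * Trailing spaces and a trailing newline are trimmed, so a field can be
 * compared with strcmp against "LDA" rather than "LDA     ". */
void copy_field(char *dst, const char *line, size_t start, size_t width)
{
    size_t len = strlen(line);
    size_t n = 0;

    if (start < len) {
        n = len - start;            /* bytes available from start */
        if (n > width)
            n = width;              /* never copy more than the field */
        memcpy(dst, line + start, n);
    }
    dst[n] = '\0';                  /* terminate by hand */

    while (n > 0 && (dst[n - 1] == ' ' || dst[n - 1] == '\n'))
        dst[--n] = '\0';            /* drop padding at the end */
}
```

With char label[9], a call like copy_field(label, buffer, 0, 8) would then leave a properly terminated string in label, and strlen(label) can never exceed 8.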

How to add a character to the back of a char array when you obtain it with a gets() function in c?

I have an array of characters that I fill using gets().
char inname[30];
gets(inname);
How can I add another character to this array without knowing the length of the string in C? (I mean the part that is actual letters, not the unused memory after it.)
Note: my buffer is long enough for what I want to ask the user (a filename; probably not many people have names longer than 29 characters).
Note that gets is prone to buffer overflow and should be avoided.
Reading a line of input:
char inname[30] = "";
size_t len = 0;
if (fgets(inname, sizeof(inname), stdin) != NULL) {
    len = strlen(inname);
    // Remove trailing newline
    if (len > 0 && inname[len-1] == '\n') {
        len--;
        inname[len] = '\0';
    }
}
Appending to the string:
char *string_to_append = ".";
if (len + strlen(string_to_append) + 1 <= sizeof(inname)) {
// There is enough room to append the string
strcat(inname, string_to_append);
}
Optional way to append a single character to the string:
if (len + 1 < sizeof(inname)) {
// There is room to add another character
inname[len++] = '.'; // Add a '.' character to the string.
inname[len] = '\0'; // Don't forget to nul-terminate
}
As you have asked in comment, to determine the string length you can directly use
strlen(inname);
OR
you can loop through string in a for loop until \0 is found.
Now, after getting the length of the previous string, you can append a new string with
strcat(&inname[prevLength],"NEW STRING");
EDIT:
To find the null character you can write a for loop like this (i is declared outside the loop because it is used afterwards):
int i;
for (i = 0; inname[i] != 0; i++)
{
//do nothing
}
Now you can use i directly to put a character at the end of the string, for example:
inname[i] = your_char;
After this, increment i and write the null character (0) at that position.
P.S.
Every string in C ends with a null character. The null character '\0' has the value 0.
You know that the final character of a C string is '\0', e.g. the array:
char foo[10]={"Hello"};
starts with these elements (the remaining four elements are also '\0'):
['H'] ['e'] ['l'] ['l'] ['o'] ['\0']
Thus you can iterate on the array until you find the '\0' character, and then you can substitute it with the character you want.
Alternatively you can use the function strcat of string.h library
Short answer is you can't.
In C you must know the length of the string to append characters to it. The same applies in other languages, but there it happens behind the scenes; internally the same work must be done.
C strings are defined as sequences of bytes terminated by a special byte, the null character, which has the value 0 and is written '\0' in C.
You must find this terminator, put the new character in its place, and write a terminator after it. To illustrate this, suppose you have
char hello[10] = "Hello";
then you want to append a '!' after the 'o' so you can just do this
size_t length;
length = strlen(hello);
/* move the '\0' one position after its current position */
hello[length + 1] = hello[length];
hello[length] = '!';
now the string is "Hello!".
Of course, you must take care that hello is large enough to hold one extra character. That is also not automatic in C, which is one of the things I love about working with it because it gives you maximum flexibility.
You can of course use some available functions to achieve this without worrying about moving the '\0' for example, with
strcat(hello, "!");
you will achieve the same.
Both strlen() and strcat() are defined in string.h header.
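The steps from the answers above (find the terminator, check for room, write the character, re-terminate) can be combined into one function. This is a minimal sketch; append_char and its cap parameter are hypothetical names introduced here, with cap being the total size of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append ch to the string stored in buf.
 * cap is the total size of buf in bytes.
 * Returns 1 on success, 0 if there is no room for ch plus the terminator. */
int append_char(char *buf, size_t cap, char ch)
{
    size_t len = strlen(buf);   /* index of the current '\0' */

    if (len + 2 > cap)          /* need space for ch and a new '\0' */
        return 0;

    buf[len] = ch;              /* overwrite the old terminator */
    buf[len + 1] = '\0';        /* write the new one after it */
    return 1;
}
```

For char hello[10] = "Hello", the call append_char(hello, sizeof hello, '!') should succeed and leave "Hello!". With a six-byte buffer holding "Hello" the call should return 0 and leave the buffer untouched.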

Strange assignment in implementing strtok

I am studying the implementation of strtok and have a question. On the line s[-1] = 0, I don't understand how tok is limited to the first token, since we had previously assigned it everything contained in s.
char *strtok(char *s, const char *delim)
{
    static char *last;
    return strtok_r(s, delim, &last);
}
char *strtok_r(char *s, const char *delim, char **last)
{
    char *spanp;
    int c, sc;
    char *tok;
    if (s == NULL && (s = *last) == NULL)
        return (NULL);
    tok = s;
    for (;;) {
        c = *s++;
        spanp = (char *)delim;
        do {
            if ((sc = *spanp++) == c) {
                if (c == 0)
                    s = NULL;
                else
                    s[-1] = 0;
                *last = s;
                return (tok);
            }
        } while (sc != 0);
    }
}
tok was not previously assigned "everything contained in s". It was set to point to the same address as the address in s.
The s[-1] = 0; line is equivalent to *(s - 1) = '\0';, which sets the location just before where s is pointing to zero.
By setting that location to zero, returning the current value of tok will point to a string whose data spans from tok to s - 2 and is properly null-terminated at s - 1.
Also note that before tok is returned, *last is set to the current value of s, which is the starting scan position for the next token. strtok saves this value in a static variable so it can be remembered and automatically used for the next token.
This took much more space than I anticipated when I started, but I think it offers a useful explanation along with the others. (it became more of a mission really)
NOTE: In this pair of functions, strtok_r is the reentrant one, because the caller supplies the place (last) where the scan position is saved between calls. strtok is a thin wrapper that keeps that position in a static variable, which is why the usual strtok from string.h is not reentrant. (Whether this strtok_r really is reentrant was not tested.)
The easiest way to understand this code (at least for me) is to understand what strtok and strtok_r do with the string they are operating on. Here strtok_r is where the work is done. strtok_r basically assigns a pointer to the string provided as an argument and then 'inch-worms' down the string, character-by-character, comparing each character to a delimiter character or null terminating character.
The key is to understand that the job of strtok_r is to chop the string up into separate tokens, which are returned on successive calls to the function. How does it work? The string is broken up into separate tokens by replacing each delimiter character found in the original string with a null-terminating character and returning a pointer to the beginning of the token (which will either be the start of the string on first call, or the next-character after the last delimiter on successive calls)
As with the string.h strtok function, the first call to strtok takes the original string as the first argument. For successive parsing of the same string NULL is used as the first argument. The original string is left littered with null-terminating characters after calls to strtok, so make a copy if you need it further. Below is an explanation of what goes on in strtok_r as you inch-worm down the string.
Consider for example the following string and strtok_r:
'this is a test'
The outer for loop stepping through string s
Ignoring the assignments and the NULL tests, the function assigns tok a pointer to the beginning of the string (tok = s). It then enters the for loop, where it steps through string s one character at a time. c is assigned the (int value of the) current character pointed to by s, and the pointer s is incremented to the next character (this is the for loop's increment of s). spanp is assigned the pointer to the delimiter array.
The inner do loop stepping through the delimiters in delim
The do loop is entered and then, using the spanp pointer, goes through the delim array testing whether sc (the character read through spanp) equals the current for loop character c. Because the do loop also compares c against the terminating null of delim, reaching the end of s counts as a match too. If and only if our character c matches, we encounter the confusing if (c == 0) if-then-else test.
The if (c == 0) if-then-else test
This test is actually simple to understand when you think about it. We are crawling down string s, checking each character against the delim array. If we match one of the delimiters or hit the end, then what? We are about to return from the function, so what must we do?
Here we ask: did we reach the normal end of the string (c == 0)? If so, we set s = NULL. Otherwise we matched a delimiter but are not at the end of the string.
Here is where the magic happens. We need to replace the delimiter character in the string with a null-terminating character (either 0 or '\0'). Why not write *s = 0 here? Answer: we can't. We incremented s when assigning c = *s++; at the beginning of the for loop, so s now points to the next character in the string rather than to the delimiter. So in order to replace the delimiter in string s with a null-terminating character, we must do s[-1] = 0; This is where the string s gets chopped into a token. *last is assigned the current value of s, and tok (pointing to the original beginning of s) is returned by the function.
So, in the main program, you now have the return value of strtok_r: a pointer to the first character of the string s you passed in, which is now null-terminated at the first occurrence of a character from delim. That is the token you asked for.
There are two ways to reach the statement return(tok);. One way is that at the point where tok = s; occurs, s contains none of the delimiter characters (contents of delim).
That means s is a single token. The for loop ends when c == 0, that is, at the
null byte at the end of s, and strtok_r returns tok (that is,
the entire string that was in s at the time of tok = s;), as it should.
The other way for that return statement to occur is when s contains some character
that is in delim. In that case, at some point *spanp == c will be true where
*spanp is not the terminating null of delim, and therefore c == 0 is false.
At this point, s points to the character after the one from which c was read,
and s - 1 points to the place where the delimiter was found.
The statement s[-1] = 0; overwrites the delimiter with a null character, so now
tok points to a string of characters that starts where tok = s; said to start,
and ends at the first delimiter that was found in that string. In other words,
tok now points to the first token in that string, no more and no less,
and it is correctly returned by the function.
The code is not very well self-documenting in my opinion, so it is understandable
that it is confusing.
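The effect of overwriting delimiters in place can be observed with a short sketch. Note the assumptions: it calls the standard strtok from string.h rather than the listing above, and split_words, out and max are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split text in place on spaces.
 * Stores up to max token pointers in out and returns how many were stored.
 * The tokens point into text itself; no characters are copied. */
size_t split_words(char *text, char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    /* First call passes the string, later calls pass NULL. */
    for (tok = strtok(text, " "); tok != NULL && n < max;
         tok = strtok(NULL, " "))
        out[n++] = tok;

    return n;
}
```

After splitting a writable copy of "this is a test", the first token pointer equals the start of the buffer, and the byte at index 4 (formerly a space) is now '\0'. That is why the original buffer, read as a string, appears to contain only "this" afterwards.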

How is blank defined in C?

I want to get a string as input by using scanf and if the string is just a space or blank I have to print error message.
This is what I've tried to do:
char string1[20]
scanf("%s",string1)
if(string1=='')
print error message
But that didn't work, actually I didn't expect it to work because string1 is an array of chars.
Any hint how to do it?
You should note that the scanf function will never scan a string with only blanks in it. Instead check the return value of the function, if it's (in your case) less than one it failed to read a string.
You may want to use fgets to read a line, remove the trailing newline, and then check if each character in the string is a space (with the isspace function).
Like this:
/* Needs <stdio.h>, <string.h> and <ctype.h> */
char string1[20];
if (fgets(string1, sizeof(string1), stdin) != NULL)
{
    /* Remove the trailing newline left by the `fgets` function, if any.
     * strcspn returns the index of the first '\n', or the string length
     * when the line was too long to include one.
     */
    string1[strcspn(string1, "\n")] = '\0';
    /* Now skip leading whitespace; `ptr` is declared outside the loop
     * because it is used afterwards
     */
    char *ptr = string1;
    while (*ptr != '\0' && isspace((unsigned char)*ptr))
        ++ptr;
    /* After the above loop, `*ptr` will either be the string terminator,
     * in which case the string was all blanks, or else `ptr` will be
     * pointing to the actual text
     */
    if (*ptr == '\0')
    {
        /* Error, string was empty */
    }
    else
    {
        /* Success, `ptr` points to the input */
        /* Note: The string may contain trailing whitespace */
    }
}
scanf() does not always skip leading blanks.
Some format specifiers, such as "%s", "%d" and "%f", do skip leading whitespace.
Other format specifiers, such as "%c", "%[]" and "%n", do not skip leading whitespace.
Scan in a line and look for spaces (string1 may contain whitespace):
char string1[20];
// Scan in up to 19 non-LineFeed chars, then the next char (assumed \n)
int result = scanf("%19[^\n]%*c", string1);
if (result < 0) handle_IOError_or_EOF();
else if (result == 0) handle_nothing_entered();
else {
const char *p = string1;
while (isspace((unsigned char)*p)) p++;
if (*p == '\0')
puts("Error: input is blank");
}
First, scanf will skip any leading whitespace if you put a space (or another whitespace character such as '\n' or '\t') before the format specifier, like scanf(" %s", string1). Note that the array is passed without &.
Second, if(string1=='') does not do what you want. '' is not a valid character constant in C (there is no "blank" char like that), and even with a valid one such as ' ', the comparison would be between the address of the array string1 and a character value, not the contents of the string. You need to read the whole line and check whether it is empty or contains only spaces.
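The check that the answers above describe (empty, or nothing but whitespace) can be written as a small function that is applied after the line has been read. This is a sketch; is_blank is a hypothetical name, and it treats every character accepted by isspace as blank.

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if s is empty or contains only
 * whitespace characters, otherwise 0. */
int is_blank(const char *s)
{
    while (*s != '\0') {
        /* The cast avoids undefined behaviour for negative char values. */
        if (!isspace((unsigned char)*s))
            return 0;
        s++;
    }
    return 1;
}
```

A line read with fgets can be passed in directly, since the trailing newline also counts as whitespace: if is_blank(string1) returns 1, print the error message.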

Strchr and strncpy Misuse

Hi, I'm trying to find the - char and then place the characters to its left into a string. Here I would like FUPOPER to be stored in program_id_DB; however, when I run this code the output is:
Character '-' found at position 8.
The prgmid contains FUPOPERL <-where is it getting this l?!?!
char data_DB[]="FUPOPER-$DSMSCM.OPER*.FUP";
char program_id_DB[10];
char program_name_DB_c[ZSYS_VAL_LEN_FILENAME];
char *pos = strchr(data_DB, '-');
if (pos)
strncpy(program_id_DB,data_DB, pos-data_DB);
printf("Character '-' found at position %d.\n", pos-data_DB+1);
printf("The prgmid contains %s\n",program_id_DB);
You didn't initialize program_id_DB, so it's free to contain anything it wants. Set it to zero before you start:
memset(program_id_DB, 0, 10);
(You need to #include <string.h> for memset.)
In fact, what you're doing is terribly dangerous because there's no guarantee that the string you pass to printf is null-terminated! Always zero the array before use and copy at most 9 non-null characters into it.
You need to put a \0 to mark the string's end.
A way to do it is: memset(program_id_DB, 0, sizeof(program_id_DB)); before you strncpy to it.
You have to append a null terminator to program_id_DB yourself, because strncpy does not add one when the source contains at least N characters before its own terminator. In your case N is pos - data_DB, which is seven, and data_DB is longer than that, so exactly the seven characters of FUPOPER are copied and no terminator is written. Alternatively, zero-initialize program_id_DB with memset before using it with strncpy.
strncpy is a bitch!
It doesn't terminate the string. You need to terminate the string yourself.
if (pos) {
strncpy(program_id_DB,data_DB, pos-data_DB);
program_id_DB[pos - data_DB] = 0;
}
And if the source string is shorter than the count, strncpy fills the remainder of the destination with zeros.
strncpy(dst, src, 1000); /* always writes 1000 bytes, whether it needs to or not */
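Putting the advice from these answers together, one possible shape for the extraction is a function that checks the destination size and terminates the result itself. It is only a sketch: copy_prefix and its parameters are hypothetical, and it uses memcpy instead of strncpy because the number of bytes to copy is already known.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the part of src that precedes the first
 * occurrence of sep into dst, whose total size is cap bytes.
 * Returns 1 on success, 0 if sep is not found or the prefix
 * (plus terminator) does not fit in dst. */
int copy_prefix(char *dst, size_t cap, const char *src, char sep)
{
    const char *pos = strchr(src, sep);
    if (pos == NULL)
        return 0;

    size_t n = (size_t)(pos - src);   /* length of the prefix */
    if (n + 1 > cap)
        return 0;                     /* would overflow dst */

    memcpy(dst, src, n);
    dst[n] = '\0';                    /* terminate explicitly */
    return 1;
}
```

For the data in the question, copy_prefix(program_id_DB, sizeof program_id_DB, data_DB, '-') should store "FUPOPER" with no stray trailing character.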
