Trouble understanding how to process C string - c

I'm trying to use Mac OS X's listxattr C function and turn it into something useful in Python. The man page tells me that the function fills a buffer with the attribute names, which are "simple NULL-terminated UTF-8 strings and are returned in arbitrary order. No extra padding is provided between names in the buffer."
In my C file, I have it set up correctly it seems (I hope):
char buffer[size];
res = listxattr("/path/to/file", buffer, size, options);
But when I go to print it, I get ONLY the FIRST attribute, which was two characters long, even though the size is 25. So then I manually set buffer[3] = 'z' and, lo and behold, when I print buffer again I get the first TWO attributes.
I think I understand what is going on. The buffer is a sequence of NULL-terminated strings, and printing stops as soon as it reaches a NULL character. But then how am I supposed to unpack the entire sequence into ALL of the attributes?
I'm new to C and using it to figure out the mechanics of extending Python with C, and ran into this doozy.

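1. char *p = buffer;
2. If p has reached buffer + res (res being the byte count listxattr returned), stop. Otherwise get the length with strlen(p); if the length is 0, stop.
3. Process the current chunk.
4. p = p + length + 1;
5. Go back to step 2.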

So you guessed pretty much right.
The listxattr function returns a bunch of null-terminated strings packed in next to each other. Since strings (and arrays) in C are just blobs of memory, they don't carry around any extra information with them (such as their length). The convention in C is to use a null character ('\0') to represent the end of a string.
Here's one way to traverse the list, in this case changing it to a comma-separated list.
int i = 0;
for (; i < res; i++)
    if (buffer[i] == '\0' && i != res - 1) // we're in between strings
        buffer[i] = ',';
Of course, you'll want to make these into Python strings rather than just substituting in commas, but that should give you enough to get started.

It looks like listxattr returns the number of bytes it placed in the buffer, so you can use that to help you. Here's an idea:
for(int i=0; i<res-1; i++)
{
if( buffer[i] == 0 )
buffer[i] = ',';
}
Now, instead of being separated by null characters, the attributes are separated by commas.

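Actually, since I'm going to send it to Python, I don't have to process it C-style after all. Just use Py_BuildValue with the format code s#, which knows what to do with it. You'll also need the length: pass the byte count that listxattr returned, so nothing past the filled part of the buffer is included.
return Py_BuildValue("s#", buffer, res);
You can process it into a list on Python's end using split('\x00'); because the last name is also null-terminated, the resulting list ends with an empty string that you'll want to drop. I found this after trial and error, but I'm glad to have learned something about C.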

Related

Integer to pointer type conversion

int count_words(string word)
{
string spaces = "";
int total_words = 1;
int i, j = 0 ;
for (i = 0; i < strlen(word); i++)
{
strcpy(spaces, word[i]);
if (strcmp(spaces, " ") == 0)
{
total_words = total_words + 1;
}
}
return total_words;
}
I am trying to write a function in C that counts the total number of words, and my strategy is to count the spaces in the input string. However, I get an error at strcpy about an integer-to-pointer conversion, and I can't seem to compare the two strings without getting it. Can someone explain how the error is happening and how I would go about fixing it? The IDE also suggests adding an ampersand before word[i], but then the program crashes with a segmentation fault.
You need to learn a little more about the distinction between characters and strings in C.
When you say
strcpy(spaces, word[i]);
it looks like you're trying to copy one character to a string, so that in the next line you can do
if (strcmp(spaces, " ") == 0)
to compare the string against a string consisting of one space character.
Now, it's true, if you're trying to compare two strings, you do have to call strcmp. Something like
if (spaces == " ") /* WRONG */
definitely won't cut it.
In this case, though, you don't need to compare strings. You're inspecting your input a character at a time, so you can get away with a simple character comparison instead. Get rid of the spaces string and the call to strcpy, and just do
if (word[i] == ' ')
Notice that I'm comparing against the character ' ', not the string " ". Using == to compare single characters like this is perfectly fine.
Sometimes, you do have to construct a string out of individual characters, "by hand", but when you do, it's a little more elaborate. It would look like this:
char spaces[2];
spaces[0] = word[i];
spaces[1] = '\0';
if (strcmp(spaces, " ") == 0)
...
This would work, and you might want to try it to be sure, but it's overkill, and there's no reason to write it that way, except perhaps as a learning exercise.
Why didn't your code
strcpy(spaces, word[i]);
work? What did the error about "integer to pointer conversion" mean? Actually there are several things wrong here.
It's not clear what the actual type of spaces is (more on this later), but it has room for no characters at all besides the terminating null, so you're not going to be able to copy a 1-character string into it.
It's also not clear that spaces is even writable. It might be a constant, meaning that you can't legally copy any characters into it.
Finally, strcpy copies one string to another. In C, although strings are arrays, they're usually referred to as pointers. So strcpy accepts two pointers, one to the source and one to the destination string. But you passed it word[i], which is a single character, not a string. Let's say the character was A. If you hadn't gotten the error, and if strcpy had tried to do its job, it would have treated A as a pointer, and it would have tried to copy a string from address 65 in memory (because the ASCII value of the character A is 65).
This example shows that working with strings is a little bit tricky in C. Strings are represented as arrays, but arrays are second-class citizens in C. You can't pass arrays around, but arrays are usually referred to by simple pointers to their first element, which you can pass around. This turns out to be very convenient and efficient once you understand it, but it's confusing at first, and takes some getting used to.
It might be nice if C did have a built-in, first-class string type, but it does not. Since it does not, C programmers must always keep in mind the distinction between strings (arrays) and characters. That "convenient" string typedef they give you in CS50 turns out to be a bad idea, because it's not actually convenient at all: it merely hides an important distinction, and ends up making things even more confusing.

realloc: invalid next size error; can anyone point out the mistake I made in memory allocation?

The text I pass to the get_document function is ordinary string data:
1. " " separates words.
2. "." separates sentences.
3. "\n" separates paragraphs.
get_document allocates a separate memory block for each word, sentence, and paragraph, making them easy to access.
Here's the code snippet.
char**** get_document(char* text) {
//get_document
int l=0,k=0,j=0,i=0;
char**** document = (char****)malloc(sizeof(char***));//para
document[l] = (char***)malloc(sizeof(char**));//sen
document[l][k] = (char**)malloc(sizeof(char*));//word
document[l][k][j] = (char*)malloc(sizeof(char));//letter
for(int z = 0; z < strlen(text); z++) {
if(strcmp(&text[z]," ")==0) {
document[l][k][j][i] = '\0';
j++;
document[l][k] = realloc(document[l][k],(sizeof(char*)) * j+1);
i=0;
document[l][k][j] = (char*)malloc(sizeof(char));
}
else if(strcmp(&text[z],".")==0) {
k++;
document[l] = realloc(document[l],(sizeof(char**)) * k+1);
j=0;
i=0;
document[l][k] =(char**)malloc(sizeof(char*));
document[l][k][j] = (char*)malloc(sizeof(char));
}
else if(strcmp(&text[z],"\n")==0) {
l++;
document = realloc(document,(sizeof(char***)) * l+1);
k=0;
j=0;
i=0;
document[l] = (char***)malloc(sizeof(char**));
document[l][k] =(char**)malloc(sizeof(char*));
document[l][k][j] = (char*)malloc(sizeof(char));
}
else {
strcpy(&document[l][k][j][i],&text[z]);
i++;
document[l][k][j] = realloc(document[l][k][j],(sizeof(char)) * i+1);
}
}
return document;
}
But when I run the program, I get the error
realloc:invalid next size
Can anyone help me with this? Thanks in advance.
when I run the program, I get the error
realloc:invalid next size
It appears that one of your realloc calls is failing because the allocator's tracking data has been corrupted. This is one of the more common things that can go wrong when you overwrite the bounds of an object, especially an allocated one. Which you do, a lot:
strcpy(&document[l][k][j][i],&text[z]);
If you want to make any progress in your study of C, it is essential that you learn the difference between a char and a string. The C string functions, such as strcmp() and strcpy(), apply only to the latter. You may use them on empty strings (containing only a nul) or on single-character strings (containing one character plus a nul), among other kinds, but they are neither safe nor useful for individual chars. For individual chars you would use standard C operators instead, such as == and =.
In the case of the line quoted above, each strcpy call will attempt to copy the entire tail of the input string, including the terminator, into the one-char-big space pointed to by &document[l][k][j][i]. This will always write past the end of the allocated space, often by a lot, thus producing undefined behavior. You appear to instead want:
document[l][k][j][i] = text[z];
(well-deserved criticism of the choice of a quadruple pointer left aside). I see that you leave appending a string terminator for later, which is ok in principle, but I also see that you fail to terminate the last word of each sentence if the period ('.') immediately follows the word without any space.
Along the same lines, your several uses of strcmp() each compare the entire tail of the input string to one of several length-one string literals. Such comparisons are allowed, but they will not yield the results you appear to want. It appears you want simple equality tests against character constants, instead:
if (text[z] == ' ')
// ...
else if (text[z] == '.')
// ...
else if (text[z] == '\n')
And of course, even with those corrections, your approach is highly inefficient. Memory [re]allocation is comparatively expensive, and you are performing an allocation or reallocation for every. single. character. in the document. At least scan ahead to the end of each word so as to allocate a word at a time, though it is possible to do better even than that.
Also, do not neglect the fact that malloc() and realloc() can fail, in which case they return a null pointer. Robust code is meticulous about checking for and handling error results from its function calls, including allocation errors.
You are mixing up characters and strings.
Your conditions to detect your elements are wrong:
if(strcmp(&text[z]," ")==0)
else if(strcmp(&text[z],".")==0)
...
Unless strlen(&text[z]) == 1 (that is, you are at the last character), you will never enter any of these branches.
strcmp compares strings, not single characters. This means it compares the whole remaining buffer with a string of length 1 which can never be true except for the last character.
If you want to compare single characters, use if(text[z] == ' ') instead.
In your final else branch you completely smash your heap:
strcpy(&document[l][k][j][i],&text[z]);
You copy a string (again: the complete remaining buffer) into a single character.
The memory for document[l][k][j] was allocated using size=1. This cannot even hold a string of length 1 because there is no room for terminating '\0' byte.
Copying the string into memory that can hold exactly 1 character corrupts the heap, and a later call to a memory allocation function will eventually blow up, as your error message shows.
What you need, once document[l][k][j] has been made large enough to hold at least i+2 characters, is:
document[l][k][j][i] = text[z];
document[l][k][j][i+1] = 0;
Finally your memory size for allocation is wrong:
document = realloc(document,(sizeof(char***)) * l+1);
You want to add 1 extra element to the array but you only add 1 byte. Use this instead:
document = realloc(document,(sizeof(char***)) * (l+1));
The same applies for all other levels of your construction.
In addition, your naming of counters is poor. One-character variable names should only be used for loop counters and the like, where there is no risk of confusion.
If you use them for different levels of array indexing, you should use names like wordcount, paracount, etc. This would make the code much more readable.
Also, I suggest you follow the hints in the comments and rethink your entire design.

What is the proper way to populate an array of Strings in C, such that each string is a single element in the array

I'm trying to initialize a 2D array of strings in C, which does not seem to work like in any other language I've coded in. What I'm TRYING to do is read input, take all of the comments out of that input, and store them in a 2D array, where each string would be a new row in the array. When I get to a newline character, I want to advance the first index of the array so that I can separate each "comment string", i.e.:
char arr[100][100];
<whatever condition>
arr[i][j] = "//First comment";
Then when I get to a '\n' I want to increment the first index such that:
arr[i+1][j] = "//Second comment";
I just want to be able to access each input as an individual element in my array. In Java I wouldn't need to do this, as each string would already be an individual element in a String array. I've only been working with C for 3 weeks now, and things that I used to take for granted as simple have proven to be quite frustrating in C.
My actual code is below. It gives me an infinite loop and prints out a ton of numbers:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const int MAXLENGTH = 500;
int main(){
char comment[MAXLENGTH][MAXLENGTH];
char activeChar;
int cIndex = 0;
int startComment = 0;
int next = 0;
while((activeChar = getchar()) != EOF){
if(activeChar == '/'){
startComment = 1;
}
if(activeChar == '\n'){
comment[next][cIndex] = '\0';
next++;
}
if(startComment){
comment[next][cIndex] = activeChar;
cIndex++;
}
}
for(int x = 0 ; x < MAXLENGTH; x++){
for (int j = 0; j < MAXLENGTH; j++){
if(comment[x][j] != 0)
printf("%s", comment[x][j]);
}
}
return 0;
}
The problem you are having is that C was designed to be essentially a glorified assembler. That means that the only things it has built in are those for which there is an obvious correct implementation. Strings do not meet this criterion, so strings are not first-class citizens in C.
In particular, there are at least three viable ways to deal with strings. C doesn't force you to use any of them, but instead lets you use whichever suits the job at hand.
Method 1: Static Array
This method appears to be similar to what you are trying to do, and is often used by new C programmers. In this method a string is just an array of characters exactly long enough to fit its contents. Assigning arrays becomes difficult, so this encourages treating strings as immutable. It feels likely that this is how most JVMs would implement strings. C code: char my_string[] = "Hello";
Method 2: Static Bounded Array
This method is what you are doing. You decide that your strings must be shorter than a specified length, and pre-allocate a large enough buffer for them. In this case it is relatively easy to assign strings and change them, but they must not become longer than the set length. C code: char my_string[MAX_STRING_LENGTH] = "Hello";
Method 3: Dynamic Array
This is the most advanced and risky method. Here you dynamically allocate your strings so that they always fit their contents. If they grow too big, you resize them. They can be implemented in many ways (usually as a single char pointer that is realloc'd as necessary, in combination with method 2; occasionally as a linked list).
Regardless of how you implement strings, to C's eyes they are all just arrays of characters. The only caveat is that to use the standard string functions you need to null-terminate your strings yourself (although some functions, such as memcpy() and fwrite(), work with an explicit length instead).
This is why Java strings are not primitive types, but rather objects of type String.
Interestingly enough, many languages actually use different String types for these solutions. For example Ada has String, Bounded_String, and Unbounded_String for the three methods above.
Solution
Look at your code again: char arr[100][100]; which method is this, and what is it?
Obviously it is method 2 with a MAX_STRING_LENGTH of 100. So you could pretend the line says my_strings arr[100], which makes your issue apparent: this is not a 2D array of strings, but a 2D array of characters that represents a 1D array of strings. To create a 2D array of strings in C you would use char arr[WIDTH][HEIGHT][MAX_STRING_LENGTH], which is easy to get wrong. That said, you also have some logic errors in your code, and you can probably solve this problem with just a 1D array of strings (a 2D array of chars).
comment is a 2D array of chars, which are single characters. In C, a string is simply an array of characters, so your definition of comment is one way to define a 1D array of strings.
As far as the loading goes, the only obvious potential problems are that you never reset startComment (or cIndex) to zero (but you should use a debugger to make sure the array is being loaded correctly); however, your code to print it out is wrong.
Using printf() with a %s tells it to start printing the string at whatever address you give it, but you're giving it individual characters, not whole strings, so it's interpreting each character in each string (because C is a horrible, horrible language) as an address in RAM and trying to print that RAM. To print an individual character, use %c instead of %s. Or, just make a 1D for loop:
for (int x = 0; x < next; x++) /* only the rows that were filled in */
    printf("%s\n", comment[x]);
It's also a bit confusing that you use the same MAXLENGTH for both the number of lines in the array and the length of the string in each line.

How to check if there is a `\0` character in a filename using C?

I'd like to write a function like this:
int validate_file_name(char *filename)
{
//...
}
which will:
return 1 if there was no \0 character in the filename,
0 otherwise.
I thought it might be achieved using a simple for(size_t i = 0; i < strlen(filename); i++), but I don't know how to determine how many characters I have to check.
I can't use strlen() because it will terminate on the first occurrence of a \0 character.
How should I approach this problem?
Clarification:
I am trying to apply these guidelines to a filename I receive. If you should avoid putting a \0 in a filename, how could you validate this if you've got no size parameter?
Moreover, there are strings with multiple \0 characters, like here: http://www.gnu.org/software/libc/manual/html_mono/libc.html#Argz-and-Envz-Vectors. Still, I had no idea that it is impossible to determine their length if it is not explicitly provided.
Conclusion:
There is no way to determine the length of a string that is not NULL-terminated, unless of course you already know the length, or you resort to dirty hacks (see: Checking if a pointer is allocated memory or not).
You are trying to solve a problem that does not need to be solved.
A file name is a string. In C, a "string" is by definition "a contiguous sequence of characters terminated by and including the first null character".
It is impossible to have a string or a file name with a null character embedded in it.
It's possible to have a sequence of characters with an embedded null character. For example:
char buf[] = "foo\0bar.txt";
buf is an array of 12 characters; the characters at positions 3 and 11 are both null characters. If you treat buf as a string, for example by calling
fopen(buf, "r")
it will be treated as a string with a length of 3 (the length of a string does not include the terminating null character).
If you're working with character arrays that may or may not contain strings, then it makes sense to do what you're asking. You would need to keep track of the size of the buffer separately from the address of the initial character, either by passing an additional argument or by wrapping the pointer and the length in a structure.
But if you're dealing with file names, it's almost certainly best just to deal with strings and assume that whatever char* value is passed to your function points to a valid string. If it doesn't (if there is no null character anywhere in the array), that's the caller's fault, and not something you can reasonably check.
(Incidentally, Unix/Linux file systems explicitly forbid null characters in file names. The / character is also forbidden, because it's used as a directory name delimiter. Windows file systems have even stricter rules.)
One last point: NULL is (a macro that expands to) a null pointer constant. Please don't use the term NULL to refer to the null character '\0'.
The answer is that you can't write such a function if you don't know the length of the string.
To determine the length of a string, strlen() searches for the '\0' character, and if that character is not present, the behavior is undefined.
If you knew the length of the string, then
for (size_t i = 0; i < length; ++i)
{
    if (string[i] == '\0')
        return 0; /* found a null character */
}
return 1; /* no null character among the first length bytes */
would work. If you don't know the length of the string, the loop condition would have to be
for (int i = 0 ; string[i] != '\0' ; ++i)
which obviously means that searching for the '\0' makes no sense, because its presence is what makes all the other string-related functions work properly.
If the string is not null-terminated, what is it terminated by? And if you don't know that, what is its length? If you know the answers to these questions, you know the answer to your question.

Read all the subsequent chars and transform into an int

I'm reading a txt file and getting all the chars that aren't space, transforming them to int using (int)c-'0' and that is working.
The problem is if the number has more than 1 digit, because I'm reading char by char.
How can I read a sequence of chars and transform that whole sequence into an int?
I tried using a string, but when I try to pass this string to my other function, it treats each index as a number, but what I need is that the whole string is treated as one number.
Any ideas?
A convenient way to do the conversion is to read the whole number into a buffer (string) and then call atoi. Make triple sure that the string is properly null-terminated.
One solution (I won't say it's good or bad in your case, since you don't provide any code) is something like this (pseudo-ish code):
int i;
int val = 0;
const char *string = "5238785";
for (i = 0; string[i] != '\0'; i++) {
    val = val * 10 + (string[i] - '0'); /* digit character to its value, e.g. '7' - '0' == 7 */
}
NOTE: I simplified this; you should add more checks to make sure you don't go out of bounds, that each character really is a digit, etc. Make sure the string is null-terminated ('\0'). The concept is that you read one digit at a time and shift what you've read so far "one step left" (multiply it by 10) to make room for the next digit.
