I'm writing a parser in C, and I've got some code that looks like this:
char *consume_while(parser *self, int(*test)(char)) {
    char *result;
    while (!eof(self) && (*test)(next_char(self))) {
        // append the return value from the function consumed_char(self)
        // onto the "result" string defined above.
    }
    return result;
}
But I'm kinda new to the whole string manipulation aspect of C, so how would I append the character returned from the function consumed_char(self) to the result char pointer? I've seen people using the strcat function, but that won't work as it takes two char pointers, whereas I'm dealing with a char* and a char. In Java it would be something like this:
result += consumed_char(self);
What's the equivalent in C?
Thanks :)
In C, strings do not exist as a type; they are just char arrays ending with a null terminator. This means that, assuming result points to a buffer that is big enough and filled with zeroes, it can be as simple as:
result[strlen(result)] = consumed_char(self);
If not, your best bet is to use strcat and change your consumed_char function to return a char *.
That being said, writing a parser without basic understanding of C-style strings, is, to say the least, quite ambitious.
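If the buffer size is not known up front, one common approach is a small growable buffer that reallocates as characters arrive. The following is only a minimal sketch: the strbuf type and strbuf_push function are hypothetical names invented for illustration, not part of the parser in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type (not from the original parser): a growable,
   NUL-terminated string buffer. Start it as {NULL, 0, 0}. */
typedef struct {
    char  *data;  /* heap buffer, NUL-terminated once non-NULL */
    size_t len;   /* number of characters, excluding the terminator */
    size_t cap;   /* allocated size in bytes */
} strbuf;

/* Append one character, growing the buffer when it is full.
   Returns 0 on success, -1 if allocation fails (buffer left unchanged). */
int strbuf_push(strbuf *sb, char c)
{
    if (sb->len + 2 > sb->cap) {              /* need room for c plus '\0' */
        size_t new_cap = sb->cap ? sb->cap * 2 : 16;
        char *p = realloc(sb->data, new_cap); /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';                 /* keep it a valid C string */
    return 0;
}
```

Inside consume_while, the loop body would then presumably be something like strbuf_push(&sb, consumed_char(self)), with sb.data returned at the end and the caller responsible for calling free() on it.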
I have been working with C for the first time in a long time and one of the biggest problems for me has been working with strings, since they aren't expressed as well as they are in Python.
From what I know and understand, a char * is just a pointer to a string(or rather, the first character in a string). A char[] is very similar and can be used the same way.
My first question is a little side question, but while we use it to execute the same things, is there a difference in correctness or how the compiler views it?
Going ahead, I know that char *[] is just an array, but each element is a pointer of type char *. So each element, when dereferenced/accessed, would just return a string. Which is why char *argv[] just takes values from the command line.
For a problem that I was working on I needed a 2D array of strings and had been trying to declare it as char *[][] and make function calls with it.
I have a function type defined as void runoff_function(candidates *, int a, int b,char * array[a][b]); That expects a 2D array of character pointers.
My main function has a variable defined and populated as char* list[n][argc];
Except when running a loop to initialize user inputs:
char* list[n][argc];
for(int i=0;i<n;i++)
{
    printf("Voter %d\n",(i+1));
    for(int j=1;j<argc;j++)
    {
        printf("Rank %d\t",j);
        scanf("%s",list[i][j-1]);
    }
}
I get a seg fault after my first input and I don't know why.
The declaration char* list[n][argc]; reserves space for the string pointers, only. However, each string needs a place to store its characters. You must supply this space.
The easiest and safest way to do it, is to instruct scanf() to allocate some space on the heap for your string. This is done by adding the "m" modifier to the "%s" conversion. scanf() will then expect a char** as the argument, and store the pointer to a new string at that location. Your code would look like this:
scanf("%ms", &list[i][j-1]);
Note that it is your job to subsequently get rid of the memory allocations. So, once you are done with your strings, you will need to add a loop that calls free() on each cell of the 2D array:
for(int i=0;i<n;i++) {
    for(int j=1;j<argc;j++) {
        free(list[i][j-1]);
    }
}
The "%ms" format is specified by the POSIX.1-2008 standard, so it is safe to use on any modern Linux system.
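On a platform where the "m" modifier is not available, a possible fallback is to read into a bounded temporary buffer and copy the result to the heap yourself. This is just a sketch under assumptions: read_word and WORD_MAX are made-up names, and the 63-character limit is an arbitrary choice for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 63   /* assumed upper bound on one input word */

/* Read one whitespace-delimited word from `in` into freshly allocated
   memory. Returns NULL on end of input or allocation failure.
   The caller owns the result and must free() it. */
char *read_word(FILE *in)
{
    char tmp[WORD_MAX + 1];

    /* The width (63 == WORD_MAX) keeps scanf from overflowing tmp. */
    if (fscanf(in, "%63s", tmp) != 1)
        return NULL;

    char *copy = malloc(strlen(tmp) + 1);
    if (copy != NULL)
        strcpy(copy, tmp);
    return copy;
}
```

In the loop from the question this would be used as list[i][j-1] = read_word(stdin);, and the same free() loop applies afterwards.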
I am finding myself again confused by C strings, chars, etc.
Here is some code I am using to test the syntax on the Arduino. I know that *message_buff will give me a pointer (I still don't really know why I need to use pointers, but I can do some research on that!). I convert the *message_buff to a String (just for something to do, but note that later on when I try to print this string to serial I only get a single 'c' character).
I set an array pointer three elements long (three bytes long?? I don't really know):
char *mqtt_command[3] = {};
And later on when I try and add a value to the array using:
*mqtt_command[i] = str;
I get the error:
error: invalid conversion from 'char*' to 'char'
If I change that to:
mqtt_command[i] = str;
(without the *) it compiles fine. I don't know why...
Here is my code:
char *message_buff = "command:range:1";
char *str;
String msgString = String(*message_buff);
char *mqtt_command[3] = {};
int i = 0;

void setup()
{
    Serial.begin(9600);
    delay(500);
    while ((str = strtok_r(message_buff, ":", &message_buff)) != NULL)
    {
        Serial.println(str);
        mqtt_command[i] = str;
        i++;
    }
    delay(1000);
    Serial.print("Command: ");
    Serial.println(mqtt_command[1]);
    Serial.print("MQTT string: ");
    Serial.println(msgString);
}

void loop()
{
    // Do something here later
}
And here is the output:
command
range
1
Command: range
MQTT string: c
How can I understand chars, strings, pointers, and char arrays? Where can I go for a good all round tutorial on the topic?
I am passing in a command string (I think it is a string, maybe it is a char array????) via MQTT, and the message is:
command:range:1
I am trying to build a little protocol to do things on the Arduino when an MQTT message is received. I can handle the MQTT callbacks fine, that's not the problem. The issue is that I don't really understand C strings and chars. I would like to be able to handle commands like:
command:range:0
command:digital:8
read:sensor:2
etc.
You need a C (and/or C++) primer first; you need to work more on your understanding of declarations and the syntax for pointer access and so on.
This:
char *mqtt_command[3] = {};
means "mqtt_command is an array of 3 char *", i.e. three pointers to characters. Since strings are represented as pointers to characters, this can be called "an array of three strings". There's no actual space for the characters themselves though, so this is not enough to work with but it's a good start.
Then, your first error is this code:
*mqtt_command[i] = str;
The problem the compiler is complaining about is that you're dereferencing things too many times. Just mqtt_command[i] is enough, that evaluates to the i:th value of the array, which has type char *. Then, your initial asterisk dereferences that pointer, meaning the type of the left-hand expression is now char, i.e. it's a single character. You can't assign a pointer into a character, it (typically) won't fit.
Drop the initial asterisk to solve this.
To analyze further, this:
char *message_buff = "command:range:1";
String msgString = String(*message_buff);
is also wrong, for the same reason. You're dereferencing the message_buff pointer, so the argument to the String() constructor is merely the first character, i.e. c. Again, drop the initial asterisk; you mean:
String msgString = String(message_buff);
which can be written as just:
String msgString(message_buff);
mqtt_command[i] = str;
This will work, as mqtt_command[i] is already a char pointer. Adding * would dereference it, i.e. write into whatever memory it currently points to, and no such memory has been allocated in the code.
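To make the pointer-versus-character distinction concrete, here is a small plain C sketch of the splitting step. The names split_command and MAX_FIELDS are hypothetical, and it uses standard strtok on a writable array rather than the strtok_r call from the question.

```c
#include <string.h>

#define MAX_FIELDS 3   /* assumed: commands have at most three parts */

/* Split `buf` in place on ':' and store a pointer to each field in
   `fields`. Returns the number of fields found (at most MAX_FIELDS).
   `buf` must be a writable array, not a string literal, because
   strtok() overwrites each delimiter with '\0'. */
int split_command(char *buf, char *fields[MAX_FIELDS])
{
    int n = 0;
    for (char *tok = strtok(buf, ":");
         tok != NULL && n < MAX_FIELDS;
         tok = strtok(NULL, ":")) {
        fields[n++] = tok;   /* store the pointer itself: no '*' on the left */
    }
    return n;
}
```

With char msg[] = "command:range:1"; the call fills fields[0], fields[1] and fields[2] with pointers into msg. The expression fields[1] is then the whole string "range", while *fields[1] is only its first character, 'r'.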
If I am reading a C string such as:
char myData[100];
and I want to process this data and produce a copy out of it, so my code looks like:
char myData[100], processedData[50];
loop
fill myData from file...
setProcessedData(myData, processedData);
store processedData to file...
where setProcessedData is a function that returns a processed string. let's say for simplicity it returns a substring
void setProcessedData (char *myData, char *processedData) {
    memcpy(processedData, myData, 5);
}
Is what I am doing something wrong? Like creating extra objects/strings? is there a better way to do it?
Let's say I read a string from a file which contains * characters:
I am* A T*est String* But Ho*w to Process*.
I want to get the substring which has the first 3 *s. So my processedData is:
I am* A T*est String*
and I want to do this for all lines of a file as efficiently as possible.
Thanks
The problem is that your function is inherently unsafe, because it makes assumptions about the memory allocated for the parameters you pass to it.
If someone calls setProcessedData with a string which is shorter than 5 bytes, then bad things will happen.
In addition, you are copying memory with memcpy using a raw size; a safer approach, even if it is quite picky in this situation, is to use sizeof(char)*5.
Best thing you can do, though, is to follow the same approach used by safer functions of standard library like strcpy vs strncpy: you pass a third parameter which is the maximum length that should be copied, eg:
void processData(const char *data, char *processedData, unsigned int length) {
    memcpy(processedData, data, length*sizeof(char));
}
I think you can improve your code by:
1) Making the input string pointer const (i.e. const char* myData), to mark that myData is an input string and its content is not modified by the function.
2) Passing the size of the destination buffer, so in your function you can do proper checks and avoid buffer overruns (a security enemy).
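Putting both suggestions together, a bounds-checked version might look roughly like the sketch below. It assumes the goal is to keep everything up to and including the third '*' in a line such as the sample above; copy_through_nth is a hypothetical name, not a standard function.

```c
#include <stddef.h>

/* Copy `src` into `dst` up to and including the n-th occurrence of
   `delim` (or the whole string if there are fewer than n). Never writes
   more than dst_size bytes and always NUL-terminates when dst_size > 0.
   Returns the number of characters copied, excluding the terminator. */
size_t copy_through_nth(const char *src, char delim, int n,
                        char *dst, size_t dst_size)
{
    size_t i = 0;
    int seen = 0;

    if (dst_size == 0)
        return 0;

    /* i + 1 < dst_size always leaves one byte for the terminator. */
    while (src[i] != '\0' && i + 1 < dst_size && seen < n) {
        dst[i] = src[i];
        if (src[i] == delim)
            seen++;
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Called as copy_through_nth(myData, '*', 3, processedData, sizeof processedData), it reads the input once and allocates nothing, so it should be cheap to run on every line of a file. Output is silently truncated if the destination is too small, which may or may not be the behaviour you want.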
I want to copy a string in C (Windows) that contains nulls in it. I need a function to which I will pass buffer length so that the NULL characters will be meaningless. I found StringCbCopy function but it still stops at the first NULL character.
Since you know the length, use memcpy().
Here is a quick bit of code that may help:
char array1[5] = "test", array2[5];
int length = 5;
memcpy(array2, array1, length*sizeof(char));
//the sizeof() is redundant in this because each char is a byte long
//but it is useful if you are working with other datatypes
memcpy probably will become your best friend for situations like this.
It should be very easy to write your own function to do this. If you know the length of the string, just create a char[] or char* with the specified length, and copy characters one by one.
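A hand-written version of that loop could look like the sketch below (copy_bytes is a made-up name). It does the same job as memcpy for non-overlapping buffers, so in practice memcpy is usually the better choice.

```c
#include <stddef.h>

/* Copy exactly `len` bytes from src to dst, one at a time. Unlike
   strcpy(), this does not stop at '\0', so embedded NUL bytes survive.
   The buffers must not overlap and dst must hold at least len bytes. */
void copy_bytes(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```

Because the length is passed explicitly, a buffer such as {'a', 'b', '\0', 'c'} is copied in full instead of being cut off after "ab".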
Coming from a Java background I'm learning C, but I find those vague compiler error messages increasingly frustrating. Here's my code:
/*
 * PURPOSE
 * Do case-insensitive string comparison.
 */
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int compareString(char cString1[], char cString2[]);
char strToLower(char cString[]);

int main() {
    // Declarations
    char cString1[50], cString2[50];
    int isEqual;
    // Input
    puts("Enter string 1: ");
    gets(cString1);
    puts("Enter string 2: ");
    gets(cString2);
    // Call
    isEqual = compareString(cString1, cString2);
    if (isEqual == 0)
        printf("Equal!\n");
    else
        printf("Not equal!\n");
    return 0;
}

// WATCH OUT
// This method *will* modify its input arrays.
int compareString(char cString1[], char cString2[]) {
    // To lowercase
    cString1 = strToLower(cString1);
    cString2 = strToLower(cString2);
    // Do regular strcmp
    return strcmp(cString1, cString2);
}

// WATCH OUT
// This method *will* modify its input arrays.
char strToLower(char cString[]) {
    // Declarations
    int iTeller;
    for (iTeller = 0; cString[iTeller] != '\0'; iTeller++)
        cString[iTeller] = (char)tolower(cString[iTeller]);
    return cString;
}
This generates two warnings.
assignment makes pointer from integer without a cast
cString1 = strToLower(cString1);
cString2 = strToLower(cString2);
return makes integer from pointer without a cast
return cString;
Can someone explain these warnings?
C strings are not anything like Java strings. They're essentially arrays of characters.
You are getting the error because strToLower returns a char. A char is a form of integer in C. You are assigning it into a char[] which is a pointer. Hence "converting integer to pointer".
Your strToLower makes all its changes in place, there is no reason for it to return anything, especially not a char. You should "return" void, or a char*.
On the call to strToLower, there is also no need for assignment, you are essentially just passing the memory address for cString1.
In my experience, Strings in C are the hardest part to learn for anyone coming from Java/C# background back to C. People can get along with memory allocation (since even in Java you often allocate arrays). If your eventual goal is C++ and not C, you may prefer to focus less on C strings, make sure you understand the basics, and just use the C++ string from STL.
strToLower's return type should be char* not char
(or it should return nothing at all, since it doesn't re-allocate the string)
1) Don't use gets! You're introducing a buffer-overflow vulnerability. Use fgets(..., stdin) instead.
2) In strToLower you're returning a char instead of a char-array. Either return char* as Autopulated suggested, or just return void since you're modifying the input anyway. As a result, just write
strToLower(cString1);
strToLower(cString2);
3) To compare case-insensitive strings, you can use strcasecmp (Linux & Mac) or stricmp (Windows).
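For point 1, a minimal sketch of replacing gets with fgets might look like this. The read_line wrapper is a hypothetical helper; the important difference from gets is that fgets is told the buffer size, and that it keeps the trailing newline, which has to be stripped before comparing.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf (at most size-1 characters) and strip
   the trailing newline, if any. Returns 1 on success, 0 on EOF or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* fgets keeps the '\n'; drop it */
    return 1;
}
```

In main, gets(cString1); would become read_line(stdin, cString1, sizeof cString1);, and the return value can be checked to detect end of input.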
As others already noted, in one case you are attempting to return cString (which is a char * value in this context - a pointer) from a function that is declared to return a char (which is an integer). In another case you do the reverse: you are assigning a char return value to a char * pointer. This is what triggers the warnings. You certainly need to declare your return values as char *, not as char.
Note BTW that these assignments are in fact constraint violations from the language point of view (i.e. they are "errors"), since it is illegal to mix pointers and integers in C like that (aside from integral constant zero). Your compiler is simply too forgiving in this regard and reports these violations as mere "warnings".
What I also wanted to note is that in several answers you might notice the relatively strange suggestion to return void from your functions, since you are modifying the string in-place. While it will certainly work (since you indeed are modifying the string in-place), there's nothing really wrong with returning the same value from the function. In fact, it is a rather standard practice in C language where applicable (take a look at the standard functions like strcpy and others), since it enables "chaining" of function calls if you choose to use it, and costs virtually nothing if you don't use "chaining".
That said, the assignments in your implementation of compareString look completely superfluous to me (even though they won't break anything). I'd either get rid of them:
int compareString(char cString1[], char cString2[]) {
    // To lowercase
    strToLower(cString1);
    strToLower(cString2);
    // Do regular strcmp
    return strcmp(cString1, cString2);
}
or use "chaining" and do
int compareString(char cString1[], char cString2[]) {
    return strcmp(strToLower(cString1), strToLower(cString2));
}
(this is where your char * return would come in handy). Just keep in mind that such "chained" function calls are sometimes difficult to debug with a step-by-step debugger.
As an additional, unrelated note, I'd say that implementing a string comparison function in such a destructive fashion (it modifies the input strings) might not be the best idea. A non-destructive function would be of much greater value in my opinion. Instead of performing an explicit conversion of the input strings to lower case, it is usually a better idea to implement a custom char-by-char case-insensitive string comparison function and use it instead of calling the standard strcmp.
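A rough sketch of such a non-destructive comparison follows. The name compare_ignore_case is made up for this example, and it only handles single-byte characters in the current C locale.

```c
#include <ctype.h>

/* Compare two strings ignoring case, without modifying either one.
   Returns <0, 0 or >0, like strcmp(). */
int compare_ignore_case(const char *a, const char *b)
{
    for (;; a++, b++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;   /* first differing character decides */
        if (ca == '\0')
            return 0;         /* both strings ended together */
    }
}
```

Because both parameters are const char *, the compiler will complain if the function ever tries to write to its inputs, and it can be called on string literals as well.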
You don't need these two assignments:
cString1 = strToLower(cString1);
cString2 = strToLower(cString2);
you are modifying the strings in place.
Warnings are because you are returning a char, and assigning to a char[] (which is equivalent to char*)
You are returning char, and not char*, which is the pointer to the first character of an array.
If you want to return a new character array instead of doing in-place modification, you can ask for an already allocated pointer (char*) as a parameter, or for an uninitialized pointer. In the latter case you must allocate the proper number of characters for the new string, and remember that in C parameters are ALWAYS passed by value, so you must use char** as the parameter when the array is allocated inside the function. Of course, the caller must free that pointer later.
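As an illustration of the char** variant, here is a minimal sketch. The function name lower_copy and its return convention are assumptions made for this example only.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a lowercase copy of `src` and store it in *out.
   Returns 0 on success, -1 if allocation fails (*out is set to NULL).
   The caller must free(*out) when finished. */
int lower_copy(const char *src, char **out)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);   /* +1 for the terminating '\0' */

    *out = copy;                    /* write through the char** so the caller sees it */
    if (copy == NULL)
        return -1;

    for (size_t i = 0; i < len; i++)
        copy[i] = (char)tolower((unsigned char)src[i]);
    copy[len] = '\0';
    return 0;
}
```

A caller would declare char *lowered; and pass &lowered. A plain char * parameter would not work here, since the function would only change its own copy of the pointer.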
strToLower should return a char * instead of a char. Something like this would do.
char *strToLower(char *cString)
char cString1[]
As a function parameter, this array declaration is really a pointer to the first element of a range of elements of the same data type. Note you're not passing the array by value but by pointer.
char strToLower(...)
However, this returns a char. So your assignment
cString1 = strToLower(cString1);
has different types on each side of the assignment operator: you're actually assigning a 'char' (a kind of integer) to an array parameter, which is really a simple pointer. C's implicit conversion rules let this compile (with a warning), but the result is rubbish and further access through that pointer causes undefined behaviour.
The solution is to make strToLower return char*.