Regular expression for a string literal in flex/lex - c

I'm experimenting to learn flex and would like to match string literals. My code currently looks like:
"\""([^\n\"\\]*(\\[.\n])*)*"\"" {/*matches string-literal*/;}
I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.
I am probably just writing a poor regular expression or one incompatible with flex. Please advise!

A string consists of a quote mark
"
followed by zero or more of either an escaped anything
\\.
or a character that is neither a quote nor a backslash
[^"\\]
and finally a terminating quote
"
Put it all together, and you've got
\"(\\.|[^"\\])*\"
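The delimiting quotes are escaped because they are Flex meta-characters. Note that in flex . does not match a newline, so \\. does not accept a backslash followed by a newline; write \\(.|\n) for that, and add \n to the character class if unescaped newlines should be rejected.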
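To see what the pattern accepts without running flex, the same logic can be sketched by hand in C. This is only an illustration under stated assumptions: match_string_literal is a hypothetical name, the input is a NUL-terminated C string, and an escape is not allowed to cover a newline because flex's . does not match one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the string literal that starts
 * at s (both quotes included), or 0 if no literal starts there.
 * Mirrors \"(\\.|[^"\\])*\" on a NUL-terminated input. */
size_t match_string_literal(const char *s)
{
    assert(s != NULL);
    if (s[0] != '"')
        return 0;                   /* no opening quote */
    size_t i = 1;
    for (;;) {
        if (s[i] == '\\' && s[i + 1] != '\0' && s[i + 1] != '\n')
            i += 2;                 /* \\.    : backslash plus any non-newline */
        else if (s[i] != '\0' && s[i] != '"' && s[i] != '\\')
            i += 1;                 /* [^"\\] : ordinary character */
        else
            break;                  /* quote, lone backslash, or end of input */
    }
    return s[i] == '"' ? i + 1 : 0; /* closing quote required */
}
```

Because the two alternatives start with different characters (a backslash versus anything that is not a backslash or quote), the loop never needs to backtrack.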

For a single line... you can use this:
\"([^\\\"]|\\.)*\" {/*matches string-literal on a single line*/;}

How about using a start state...
%{
int enter_dblquotes = 0;
%}
%x DBLQUOTES
%%
\" { BEGIN(DBLQUOTES); enter_dblquotes++; }
<DBLQUOTES>[^\"]*\" {
    /* yytext holds the string body plus the closing quote */
    if (enter_dblquotes) {
        handle_this_dblquotes(yytext);
        BEGIN(INITIAL); /* revert back to normal */
        enter_dblquotes--;
    }
}
...more rules follow...
The idea is this: flex uses %s or %x to declare start states. When the lexer sees a quote, it switches to another state and keeps lexing until it reaches the next quote, at which point it reverts to the normal state.

Here is my snippet for handling strings in flex; I hope it gives you some ideas.
Using a start condition to handle string literals is more scalable and clearer.
%x SINGLE_STRING
%%
\" BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
\n yyerror("the string misses \" to terminate before newline");
<<EOF>> yyerror("the string misses \" to terminate before EOF");
([^\\\"]|\\.)* {/* do your work like save in here */}
\" BEGIN(INITIAL);
. ;
}

This is what we use in Zolang for single line string literals with embedded templates ${...}
\"(\$\{.*\}|\\.|[^\"\\])*\"

An answer that arrives late but which can be useful for the next one who will need it:
\"(([^\"]|\\\")*[^\\])?\"

Related

What is the utility of escape sequence '\'?

In the code snippet below, how does '\' behave?
printf("hii\"); // This line gives error : missing terminating " character
printf("hii\ n"); // This line prints hii n
I don't understand how this escape sequence behaves here. Please explain.
An escape sequence isn't the single \ character; it's that followed by another character. For example, \" is an escape sequence, as is \n. Under some circumstances you can see more than a single character following the backslash all as the same escape code; this has to do with how the characters are represented internally (ASCII or Unicode value) and can be safely ignored for now.
An escape sequence is used to write a character that is inconvenient/impossible to put into the code directly. For example, \" is the escape sequence for a quotation mark. It is like putting a quote inside the string, which you couldn't otherwise do because it would instead close the string literal. Look at the syntax highlighting of your question to see what I mean; most of the first line is considered part of the string, because you never have an unescaped closing quote.
The most common escape sequence is perhaps \n. Unlike with \", it doesn't just produce a literal n in the string; you could do that without an escape. Instead it produces a newline. The code
printf("hii\nthere");
prints
hii
there
to the screen.
The second line of code in your question uses the escape sequence \ (backslash space). This is not a standard escape sequence; if you compile with warnings enabled, your compiler will probably warn about an unknown escape sequence.
(If you want to actually print a backslash to the screen, you need to escape a backslash, using \\)

C: Ignoring a specific character, while using fscanf

As an example I have a text file that includes this text: "name?"
I want to save this String only as name?
I tried ("%["]"), but this doesn't work.
Which function should I use?
The scanf and fscanf functions work exactly the same. Your format is however wrong.
Try instead e.g. "\"%[^\"]\"" as your format.
The first and last " is to mark the start and end of the string. Inside the string one can't use plain double-quote as that will end the string. So these have to be escaped using the backslash.
If we break down the format string into its three main components:
\" - This matches the literal double-quote
%[^\"] - This matches a string not containing the double-quote (the negation is what the ^ does)
Lastly \" again, to match the end quote of your input
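As a rough sketch of that format in use, the helper below applies it with sscanf on an in-memory line; fscanf takes the same format with a FILE * as its first argument. The name read_quoted and the 64-byte buffer are assumptions made for the example, and the 63 width is added so the scanset cannot overflow the buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies the text between the double quotes at the
 * start of `line` into `out` (64 bytes, terminator included).
 * Returns 1 if a quoted, non-empty field was read, 0 otherwise. */
int read_quoted(const char *line, char out[64])
{
    assert(line != NULL && out != NULL);
    /* \"        matches the opening quote
     * %63[^\"]  reads at most 63 characters that are not a quote
     * \"        matches the closing quote */
    return sscanf(line, "\"%63[^\"]\"", out) == 1;
}
```

Note that %[ needs at least one matching character, so an empty pair of quotes is reported as a failure, and the return value cannot tell whether the closing quote was actually present.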

Can use escape character for Double Quote JSON

I have JSON without escape characters, which my code is unable to parse. I can make it work by adding a \ before the double quotes. However, due to some constraints I am looking for a workaround, and I want to know:
a. Is there any other way I can make this json work without an escape character and the content having double quotes is displayed on my application as is, or
b. do I necessarily need to have an escape character before all double quotes and there's no workaround?
"abc": {
"x1": {
"text1": "key1",
"text2": "Given "Example text" is wrong"
}
}
Thanks !!
Your example is invalid JSON, but I think you know that. :-)
do I necessarily need to have an escape character before all double quotes
Yes, the only way to have a " inside a JSON string is to use an escape of some kind. Unlike JavaScript, JSON doesn't have '-delimited strings or backtick-delimited templates that become strings (new in ES2015). There are a couple of different escape sequences you can use (\" and \u0022 for instance), but they're still escape sequences. After all, the " is how the JSON parser knows it's found the end of the string.
In the specific case of HTML, you could also use &quot; (a named character entity) if you're interpreting the string as HTML. But that doesn't change the fact you need to properly escape the string (since newlines and several other characters need escaping as well, not just ").
My experience is that the best way to produce JSON is to produce a structure in memory and then use the facility of your environment to convert that structure to valid JSON. In JavaScript, that's JSON.stringify; in PHP, it's json_encode; etc. Just about any language or environment you can find has a JSON library (built-in or not) for this.
You SHOULD add an escape character (\) in order to have valid JSON.
According to the specs, this is the list of special characters used in JSON:
\b Backspace (ascii code 08)
\f Form feed (ascii code 0C)
\n New line
\r Carriage return
\t Tab
\" Double quote
\\ Backslash character
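A minimal sketch of applying these escapes in C follows. The function name json_escape and its return convention are assumptions for illustration, not part of any library. It covers only the escapes listed above; JSON also requires the remaining control characters below 0x20 to be written as \u00XX, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes `src` into `dst` with the escapes above applied.
 * Returns the number of characters written (terminator excluded),
 * or (size_t)-1 if `dst` (capacity `cap`) is too small. */
size_t json_escape(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        char esc = 0;               /* 0 means "copy the character as is" */
        switch (*src) {
        case '\b': esc = 'b';  break;
        case '\f': esc = 'f';  break;
        case '\n': esc = 'n';  break;
        case '\r': esc = 'r';  break;
        case '\t': esc = 't';  break;
        case '"':  esc = '"';  break;
        case '\\': esc = '\\'; break;
        default:   break;
        }
        size_t need = esc ? 2 : 1;
        if (n + need >= cap)        /* keep one byte for the terminator */
            return (size_t)-1;
        if (esc)
            dst[n++] = '\\';
        dst[n++] = esc ? esc : *src;
    }
    if (n >= cap)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

Called on the text Given "Example text" is wrong, it produces Given \"Example text\" is wrong, which can then sit between the outer quotes of a JSON string. Where a JSON library is available, letting it do the encoding is still the safer route.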

How to escape characters in a string and also show the escaped characters?

How can I place \ before " of a string in C without parsing the string character by character?
Actually, we are using sprintf to take the string and form a JSON response. But JSON gives us an error, as it expects \ to be there before ".
For example, if the string is in the format :
"hi "hello" bye"
I should get it in format of
"hi \"hello\" bye"
Just escape the backslash in the sprintf by adding \\ before \":
json_resp->offset += sprintf(&json_resp->buffer[json_resp->offset],
"\n\\\""JSON_FIELD_EVENT_SYNOPSIS"\\\": \\\"%s\\\",", utf8_str);
You should consider using snprintf to avoid buffer overflows.
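One way the snprintf suggestion could look is sketched below. The helper append_field, its parameters, and the plain "key": "value", layout are assumptions for the example (the original code writes a different layout into json_resp->buffer). The value is assumed to be escaped already.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: appends "key": "value", to buf starting at *offset.
 * Returns 1 on success; returns 0 and leaves *offset unchanged if the
 * text does not fit in the remaining space. */
int append_field(char *buf, size_t cap, size_t *offset,
                 const char *key, const char *value)
{
    assert(buf != NULL && offset != NULL && *offset <= cap);
    size_t room = cap - *offset;
    /* snprintf never writes more than `room` bytes and returns the length
     * the full text would have needed. */
    int n = snprintf(buf + *offset, room, "\"%s\": \"%s\",", key, value);
    if (n < 0 || (size_t)n >= room) {
        if (room > 0)
            buf[*offset] = '\0';    /* drop the truncated tail */
        return 0;
    }
    *offset += (size_t)n;
    return 1;
}
```

Checking the return value against the remaining room is what separates this from sprintf: a field that is too long is rejected instead of running past the end of the buffer.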
Look at it piece by piece, rather than getting confused by the whole thing in one go.
You want to see a \, which needs escaping: \\.
Next, you want to see ", which also needs escaping : \".
So all together it's:
printf("hi \\\"hello\\\" bye");
It looks nastier and more confusing than it actually is if you break it down as above - there's no special rule to it, it's just stringing (ahem..) characters together that you already know how to escape.

string literals/escapes

I am wondering if there is some sort of string prefix so that the C string is taken as is, without my having to escape all the characters. I am not 100% sure, but I remember something about prefixing the string with the # symbol ( char str[] = #"some\text\here"; ) so that you would not need to escape characters such as \, \n, etc. I'm working with curl and URLs, and it is a pain to have to escape every single backslash.
Can anyone shed some light on this, or am I stuck escaping every character prefixed with a backslash?
No. In C there are only two types of "string", the string literal surrounded by double quotes and the char literal surrounded by single quotes.
In both cases you must backslash escape characters that have special meaning.
This feature is not available in C. It seems you read about the verbatim string literals of C#, which use the @ prefix.
If you need a literal backslash (or another special character) in a C string, you have to escape it with a backslash ( \ ).
In C, there is no such thing. You are stuck escaping everything, or perhaps you could put your URLs in a file and read them in.

Resources