I am a beginner in C#. I was just learning how to write simple instructions for a program

This is the instruction and the error message I received.
I just tried to write a line that contains two slash marks.

Single quotes are used to denote character literals; anything longer than one character is a string literal, so the given data should be placed inside double quotes instead of single quotes.
To write a string that contains backslashes or line breaks without escape sequences (a file path, for example), put the '@' symbol in front of it. This marks the string as a verbatim string literal.

Related

Flex match string literal, escaping line feed

I am using flex to try and match C-like, simplified string literals.
A regular expression such as:
\"([^"\\]|\\["?\\btnr]|\\x{HEXDIG}{HEXDIG})*\"
will match all one-line string literals I am interested in.
A string literal cannot contain a non-escaped backslash. A string literal also cannot contain a literal line feed (0x0a) unless it is escaped by a backslash, in which case the backslash, the line feed and any following spaces and tabs are ignored.
For example, assuming {LF} is an actual line feed and {TAB} an actual tab (I could not format it better than that):
In: "This is an example \{LF}{TAB}{TAB}{TAB}of a confusing valid string"
Token: "This is an example of a confusing valid string"
My first idea was to use a start condition, a trailing context and yymore() to match what I want and check for errors, giving something like the following:
...
%%
\" { BEGIN STRING; yymore(); }
<STRING>{
    \n { /* ERROR HERE! */ }
    <<EOF>> { /* ERROR HERE AS WELL */ }
    ([^"\\]|\\["?\\btnr]|\\x{HEXDIG}{HEXDIG})* {
        /* String ok up to here */
        yymore();
    }
    \\\n[ \t]* {
        /* Valid inside a string but needs to be ignored! */
        yymore();
    }
    \" { /* Full string matched */ BEGIN INITIAL; }
    .|\n { /* Anything else is considered an error */ }
}
%%
...
Is there a way to do what I want in the way I am trying to do it? Or is there some other 'standard' method provided by flex that I just have not thought of? This does not look to me like an uncommon use case. Should I instead match the pieces of the string separately (from the beginning to just before the backslash, and from after the whitespace to the end) and concatenate them? That is a bit complicated, since a string can be split across an arbitrary number of lines using backslashes.

If all you want to do is to recognise a string literal, there's no need for start conditions. You can use some variant of the simple pattern which you'll find in many answers:
["]({normal}|{escape})*["]
(I used macros to make the structure clear, although in practice I would hardly ever use them.)
"Normal" here means any character without special significance in a string: any character other than " (which ends the literal), \ (which starts an escape sequence), or newline (which is usually an error, although some languages allow newlines in strings). In other words, [^"\n\\] (or something similar).
"escape" would be any valid escape sequence. If you didn't want to validate the escape sequence, you could just match a backslash followed by any single character (including newline): \\(.|\n). But since you do seem to want to validate, you'd need to be explicit about the escape sequences you're prepared for:
\\([\n\\btnr"]|x[[:xdigit:]]{2})
But all that only recognises valid string literals. Invalid string literals won't match the pattern, and will therefore fall back to whatever you're using as a fallback rule (matching only the initial "). Since that's practically never what you want, you need to add a second rule that detects errors. The easiest way to write the second rule is ["]({normal}|{escape})*, i.e. the valid rule without the final double quote. That will only match erroneous string literals because of (f)lex's maximal munch rule: a valid string literal has a longer match with the valid rule than with the error rule (because the valid rule's match includes the final double quote).
In real-life lexical scanners (as opposed to school exercises), the scanner is usually expected to resolve the string literal into the actual bytes it represents, replacing escape sequences with the corresponding characters. That is generally done with a start condition, but the individual patterns are more focused (and there are more of them). For an example of such a scanner, you could look at these two answers (and many others):
Flex / Lex Encoding Strings with Escaped Characters
Optimizing flex string literal parsing

Parsing delimited strings using petitparser

I was originally looking to (manually) write a simple tokeniser/parser for my grammar, but one of my requirements makes tokenising a bit fiddly.
I need to support delimited strings where the delimiter could be any character. Strings are most likely to be delimited with double quotes (e.g. "hello"), but they could just as easily be /hello/ or ,hello, or, pathologically, xhellox.
So I started looking at alternatives for a combined tokenise/parse step, which is when I stumbled across PetitParser.
I'm curious whether this type of delimited string could be parsed using PetitParser. Thanks.

There are multiple ways to achieve this with PetitParser. Probably the most elegant is to use a continuation parser:
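import 'package:petitparser/petitparser.dart';

final delimited = any().callCC((continuation, context) {
  // Parse the first character; it becomes the delimiter.
  final start = continuation(context);
  if (start.isFailure) return start; // e.g. empty input: fail instead of throwing
  final delimiter = start.value.toParser();
  // Re-parse from the original position: delimiter, contents, delimiter.
  final parser = [
    delimiter,
    delimiter.neg().star().flatten(),
    delimiter,
  ].toSequenceParser().pick<String>(1);
  return parser.parseOn(context);
});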
The above snippet parses the start character with any() (which can be further restricted, if necessary) and then dynamically creates a delimiter parser from it. It then builds a new parser that reads the start character, the contents (anything other than the delimiter) and the end character, and uses that parser to consume the input. This also gives really nice error messages.

Writing Regular Expressions for a C string

I am currently learning about regex, and I am trying to figure out how to match a C string literal that does not allow newlines. I have searched around and found answers involving flex and lex, but I'm trying to learn it as simply as I can to gain a better understanding.
This is an expression that I found while searching, and it appears to be common (I have found it a lot). But I have yet to find a clear explanation of what it means and how it is used.
\"(\\.|[^"])*\"

What this expression means is that there must be a double quote \" at the beginning and at the end, and in between a sequence of zero or more of the following:
A backslash character \\ followed by any single character ., or
A non-doublequote character [^"]
The second clause is self-explanatory. The first clause treats any single character preceded by a backslash as an escape sequence, so an escaped double quote does not end the match. This ensures that the expression captures each of the following strings up to its end:
"string \"one\" has embedded doublequotes"
"string two \
is split across \
multiple lines"
"string\tthree\nhas\tembedded\tescape\tcharacters"

Rigorous definition for CSV file reading/writing

I have written my own CSV reader/writer in C to store records in a character column in an ODBC database. Unfortunately I have discovered many edge cases that trip over my implementation, and I have come to the conclusion my problem is that I have not rigorously defined the rules for CSV. I've read RFC4180, but it seems incomplete and does not resolve ambiguities.
For example, should "" be considered an empty token or a double quote? Do quotes match outside-in or left to right? What do I do with an input string that has unmatched single quotes? The real mess begins when I have nested tokens, which doubles up the escaped quotation characters.
What I really need is a definitive CSV standard that I can implement in code. Every time I feel I have nailed every corner case, I find another one. I am sure this problem has been mulled over and solved many times by minds superior to mine: has anyone written a rigorous definition of CSV that I can implement in code? I realise C is not the ideal language here, but I don't have a choice about the compiler at this stage, nor can I use a third-party library (unless it compiles with C-90). Boost is not an option, as my compiler doesn't support C++. I have contemplated ditching CSV for XML, but it seems like overkill for storing a few tokens in a 256-character database record. Has anyone made a definitive CSV spec?

There is no standard (see Wikipedia's article, in particular http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard), so in order to use CSV, you need to follow the general principle of being conservative in what you generate and liberal in what you accept. In particular:
Do not use quotation marks for blank fields. Simply write an empty field (two adjacent delimiters, or a delimiter in the first/last position of the line).
Quote any field containing a quotation mark, comma, or newline.

Find the most authoritative CSV library you trust and read the source. CSV is not so complicated that you won't be able to understand its rules from a careful reading of a source implementation. I have been happy with Java's opencsv; Perl has its own, and so forth.

According to RFC 4180, fields should be parsed from left to right to correctly interpret a double quote. In some contexts "" is an escaped double quote (when inside a quoted field), otherwise it's either an empty string or two double quotes (when inside an otherwise non-empty field value).
For example, consider a file with 4 records (1 column):
"field""value" CRLF
"" CRLF
field""value CRLF
"field value" extra CRLF
"field""value" - should be read as field"value
"" - should be read as an empty string
field""value - should be read as field""value
"field value" extra - could be read as field value extra or you can reject it
Record 4 is really an invalid field so you can either accept it or reject it.
When you start reading a field, check whether the first character is a double quote. If it is, the field value is quoted and you need to read until you find an unescaped closing double quote. In this case you can ignore newlines and commas, since the field is quoted: it only ends when you encounter a closing double quote.
If the first character is not a double quote, all double quotes in the field value should be treated as literal double quotes. In this case you reach the end of the field when you encounter a comma or a newline character.
Based on this, I'd recommend always quoting all fields when you write out records, and writing a proper parser to read them back. This way you can store any data in your CSV files (even multiline text with embedded quotes), and your format will be clear. When reading a CSV file, I'd reject any file that cannot be parsed correctly; if this is a database, you can expect users not to mess with the records manually unless they know what they're doing.

K and R exercise 1-24

I am doing programs in The C Programming Language by Kernighan and Ritchie.
I am currently at exercise 1-24 that says:
Write a program to check a C program for rudimentary syntax errors
like unbalanced parentheses, brackets and braces. Don't forget about
quotes, both single and double, escape sequences, and comments.
I have done everything else, but I don't see how escape sequences would affect these parentheses, brackets and braces.
Why did they warn about escape sequences?

In "\"", there are three double quote characters, but still it's a valid string literal. The middle " is escaped, meaning the outer two balance each other. Similarly, '\'' is a valid character literal.
Parentheses, brackets and braces are not affected, unless of course they appear in a string literal that you don't parse correctly because of an escaped quote.

I'd guess they mean that you need to differentiate between " (which starts or ends a string) and \" (which is a " character, possibly inside a string).
This is important if you're to avoid reporting e.g. strlen("\")"); as having unbalanced parentheses.

The obvious possibility would be an escaped quote inside a string. If you don't take the escape into account, you might think the string ended there. For example: "\")\"". The ) is part of the string literal, so it doesn't count as a mismatched parenthesis.