How to print quotes inside quotes in Commodore 64 BASIC v2.0 (C64)

I'm writing a little hobby C64 text adventure and I've hit one very specific problem: I don't know how to put a quote character inside a quoted string.
How can I do that in Commodore 64 BASIC v2.0?

You have to generate a string containing the quote character some other way than as a literal. The obvious way is to use CHR$, as in:
? "ONE ";CHR$(34);"QUOTED";CHR$(34);" WORD"
One of the examples at http://www.c64-wiki.com/index.php/CHR%24 is quite similar to this.
If you have to do a lot of them, you can store the character in a variable to make the code shorter (which may make it faster or slower; if this matters, measure it yourself):
10 QU$ = CHR$(34)
20 ? "ONE ";QU$;"QUOTED";QU$;" WORD"

Related

I need help filtering bad words in C

As you can see, I am trying to filter various bad words. I have some code to do so. I am using C, and also this is for a GTK application.
char LowerEnteredUsername[EnteredUsernameLen];
for (unsigned int i = 0; i < EnteredUsernameLen; i++) {
    LowerEnteredUsername[i] = tolower(EnteredUsername[i]);
}
LowerEnteredUsername[EnteredUsernameLen+1] = '\0';
if (strstr(LowerEnteredUsername, (char[]){LetterF, LetterU, LetterC, LetterK}) ||
    strstr(LowerEnteredUsername, (char[]){LetterF, LetterC, LetterU, LetterK})) {
    gtk_message_dialog_set_markup((GtkMessageDialog*)Dialog, "This username seems to be inappropriate.");
    UsernameErr = 1;
}
My issue is that it only filters the last bad word specified in the if statement; in this example, "fcuk". If I input "fuck", the code passes it as clean. How can I fix this?
(char[]){LetterF, LetterU, LetterC, LetterK}
(char[]){LetterF, LetterC, LetterU, LetterK}
You've forgotten to terminate your strings with a '\0', so strstr reads past the end of each array. In any case, this approach doesn't seem to me to be very effective at keeping "bad words" out of source code, so I'd really suggest just writing regular string literals:
if (strstr(LowerEnteredUsername, "fuck") || strstr(LowerEnteredUsername, "fcuk")) {
Much clearer. If this is really, truly a no-go, then some other indirect but less error-prone ways are:
"f" "u" "c" "k"
or
#define LOWER_F "f"
#define LOWER_U "u"
#define LOWER_C "c"
#define LOWER_K "k"
and
LOWER_F LOWER_U LOWER_C LOWER_K
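And if the compound literals really must stay, the minimal fix for the missing terminators named above is simply:
(char[]){LetterF, LetterU, LetterC, LetterK, '\0'}
(char[]){LetterF, LetterC, LetterU, LetterK, '\0'}
so that strstr sees properly NUL-terminated strings.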
Doing human-language text processing in C is painful because C's string types (char*/char[] and wchar_t*/wchar_t[]) are very low-level: they are not expressive enough to easily represent Unicode text, let alone locate word boundaries in text or match words against a known dictionary (also consider inflection, declension, plurals, and the use of diacritics to evade naive string matching).
For example, your program would need to handle George Carlin's famous "Seven Dirty Words" quote:
https://www.youtube.com/watch?v=vbZhpf3sQxQ
Someone was quite interested in these words. They kept referring to them: they called them bad, dirty, filthy, foul, vile, vulgar, coarse, in poor taste, unseemly, street talk, gutter talk, locker room language, barracks talk, bawdy, naughty, saucy, raunchy, rude, crude, lude, lascivious, indecent, profane, obscene, blue, off-color, risqué, suggestive, cursing, cussing, swearing... and all I could think of was: shit, piss, fuck, cunt, cocksucker, motherfucker, and tits!
This could be slightly modified to evade a naive filter, like so:
Someone was quite interested in these words. They kept referring to them: they called them bad, dirty, filthy, foul, vile, vulgar, coarse, in poor taste, unseemly, street talk, gutter talk, locker room language, barracks talk, bawdy, naughty, saucy, raunchy, rude, crude, lude, lascivious, indecent, profane, obscene, blue, off-color, risqué, suggestive, cursing, cussing, swearing... and all I could think of was: shít, pis$, phuck, c​unt, сocksucking, motherfúcker, and títs!
Above, some of the words have simple replacements (like s to $), others have diacritics added (like u to ú), and some are homophones. However, some of the other words look the same as the originals but actually contain homoglyphs or "invisible" characters like Unicode's zero-width space, so they would evade naive text-matching systems.
So in short: avoid doing this in C. If you must, then use a robust and fully-featured Unicode handling library (i.e. do not use the C standard library's string functions like strstr, strtok, strlen, etc.).
Here's how I would do it:
Read the input into a binary blob containing Unicode text (presumably UTF-8).
Use a Unicode library to:
Normalize the encoded Unicode text data (see https://en.wikipedia.org/wiki/Unicode_equivalence )
Identify word boundaries (assuming we're dealing with European-style languages that use sentences comprised of words).
Use a linguistics library and database (English alone is full of special cases) to normalize each word to some canonical form.
Then look up each morpheme in a case-insensitive hash-set of known "bad words".
Now, there are a few shortcuts you can take:
You can use regular-expressions to identify word-boundaries.
There exist Unicode-aware regular-expression libraries for C, for example PCRE2 (http://www.pcre.org/current/doc/html/pcre2unicode.html); see the sketch after this list.
You can skip normalizing each word's inflections/declensions if you're happy with having to list those in your "bad word" list.
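Picking up the PCRE2 suggestion above, here is a minimal sketch (the pattern, subject string, and bare-bones error handling are my illustrative choices, not a full implementation): it compiles \b\w+\b with PCRE2_UTF | PCRE2_UCP so that \w and \b follow Unicode word rules, then walks the subject match by match. Each extracted word could then be normalized and looked up in the bad-word set. Compile with something like cc example.c $(pcre2-config --libs8).

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    PCRE2_SPTR pattern = (PCRE2_SPTR)"\\b\\w+\\b";
    PCRE2_SPTR subject = (PCRE2_SPTR)"they called them bad, dirty, filthy words";
    int errcode;
    PCRE2_SIZE erroffset;

    /* PCRE2_UTF: treat the subject as UTF-8; PCRE2_UCP: \w and \b use Unicode properties */
    pcre2_code *re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED,
                                   PCRE2_UTF | PCRE2_UCP,
                                   &errcode, &erroffset, NULL);
    if (re == NULL)
        return 1;

    pcre2_match_data *md = pcre2_match_data_create_from_pattern(re, NULL);
    PCRE2_SIZE start = 0, len = strlen((const char *)subject);

    while (pcre2_match(re, subject, len, start, 0, md, NULL) > 0) {
        PCRE2_SIZE *ov = pcre2_get_ovector_pointer(md);
        /* ov[0]..ov[1] delimit the match; this is the word you would look up */
        printf("word: %.*s\n", (int)(ov[1] - ov[0]), (const char *)subject + ov[0]);
        start = ov[1];                      /* resume after this match */
    }

    pcre2_match_data_free(md);
    pcre2_code_free(re);
    return 0;
}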
I would write working code for this example, but I'm short on time tonight (and it would be a LOT of code); hopefully this answer provides you with enough information to figure out the rest yourself.
(Pro-tip: don't match strings in a list by checking each character - it's slow and inefficient. This is what hashtables and hashsets are for!)
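To illustrate that pro-tip (everything below is an illustrative sketch, not code from the question), a tiny fixed-size open-addressing hash set is enough for a small word list, giving O(1) average-case membership tests:

#include <stdio.h>
#include <string.h>

#define NSLOTS 64  /* power of two, comfortably larger than the word count */

static const char *slots[NSLOTS];

/* FNV-1a: a simple, decent string hash */
static unsigned hash_str(const char *s) {
    unsigned h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

static void set_add(const char *word) {
    unsigned i = hash_str(word) & (NSLOTS - 1);
    while (slots[i] != NULL)                 /* linear probing on collision */
        i = (i + 1) & (NSLOTS - 1);
    slots[i] = word;
}

static int set_contains(const char *word) {
    unsigned i = hash_str(word) & (NSLOTS - 1);
    while (slots[i] != NULL) {
        if (strcmp(slots[i], word) == 0)
            return 1;
        i = (i + 1) & (NSLOTS - 1);
    }
    return 0;
}

int main(void) {
    set_add("fuck");
    set_add("fcuk");
    /* each already-lowercased word from the tokenizer gets one cheap lookup */
    printf("%d %d\n", set_contains("fuck"), set_contains("hello"));  /* prints: 1 0 */
    return 0;
}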

Parsing shell commands in C: string cutting with respect to its contents

I'm currently creating Linux shell to learn more about system calls.
I've already figured out most of the things. Parser, token generation, passing appropriate things to appropriate system calls - works.
The thing is that even before I start making tokens, I split the whole command string into separate words. It's based on an array of separators, and it works surprisingly well. But I'm struggling to add further functionality to it, like escape sequences or quotes, which I can't really live without, since even people running basic grep commands use arguments with quotes. I'll need to add functionality for:
' ' - ignore every other separator, operator, or double quote found between the pair; pass the contents as one string, and don't include the quotation marks in the resulting word,
" " - same as above, but ignore single quotes instead,
\\ - escape this into a single backslash,
\(space) - escape this into a space; do not parse the resulting space as a separator,
\", \' - analogously to the above,
many other things I haven't yet figured out that I need,
and every single one of them seems like an exception of its own. Each must handle a diversity of possible positions in commands, being included in the result or not, and influencing the rest of the parsing. It makes my code look like a big ball of mud.
Is there a better approach to do this? Is there a more general algorithm for that purpose?
You are trying to solve a classic problem in program analysis (lexing and parsing) using a nontraditional lexer structure ("I split whole command string into separate words..."). OK, then you will have non-traditional trouble getting the lexer right.
That doesn't mean that way is doomed to failure, and without seeing specific instances of your problem (you list a set of constructs you want to handle, but don't say why these are hard to process), it is hard to provide any specific advice. It also doesn't mean that way will lead to success; splitting the line first may break tokens that shouldn't be broken (usually by getting confused about what has been escaped).
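To make the contrast concrete, here is a rough single-pass sketch (my illustration, not the poster's code): a small state machine in which each quoting rule from the question becomes a state transition rather than a special case bolted onto word-splitting.

#include <ctype.h>
#include <stdio.h>

enum state { NORMAL, IN_SINGLE, IN_DOUBLE };

/* Copies the next word from s into buf, honoring '...', "..." and backslash
   escapes; returns the position to resume from, or NULL at end of input. */
static const char *next_word(const char *s, char *buf, size_t bufsz) {
    while (*s && isspace((unsigned char)*s))
        s++;                                          /* skip separators */
    if (*s == '\0')
        return NULL;

    enum state st = NORMAL;
    size_t n = 0;
    for (; *s; s++) {
        char c = *s;
        if (st == NORMAL) {
            if (c == '\'') { st = IN_SINGLE; continue; }  /* quotes not copied */
            if (c == '"')  { st = IN_DOUBLE; continue; }
            if (c == '\\' && s[1] != '\0') {
                c = *++s;                             /* \x becomes literal x */
            } else if (isspace((unsigned char)c)) {
                break;                                /* unquoted separator ends word */
            }
        } else if (st == IN_SINGLE) {
            if (c == '\'') { st = NORMAL; continue; } /* everything else literal */
        } else {                                      /* IN_DOUBLE */
            if (c == '"')  { st = NORMAL; continue; }
            if (c == '\\' && (s[1] == '"' || s[1] == '\\'))
                c = *++s;
        }
        if (n + 1 < bufsz)
            buf[n++] = c;
    }
    buf[n] = '\0';
    return s;
}

int main(void) {
    char word[128];
    const char *cmd = "grep \"hello world\" file\\ name.txt";
    for (const char *p = cmd; (p = next_word(p, word, sizeof word)) != NULL; )
        printf("[%s]\n", word);   /* [grep] [hello world] [file name.txt] */
    return 0;
}

Real shells handle much more (variable expansion, operators, here-documents), which is exactly why the standard lexer-generator route discussed next tends to scale better.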
The point of using a standard lexer (such as Flex or any of the 1000 variants you can get) is that it provides a proven approach to complex lexing problems, based generally on the idea that regular expressions can describe the shape of individual lexemes. You get one regexp per lexeme type: an ocean of them, but each one is pretty easy to specify by itself.
I've done around 40 languages using strong lexers and parsers (using one of the ones in that list). I assure you the standard approach is empirically pretty effective; the types of surprises are well understood and manageable. A nonstandard approach always carries the risk that it will surprise you in a bad way.
Last remark: shell languages for Unix have had people adding crazy stuff for 40 years. Expect the job to be at least medium hard, and don't expect it to be pretty like Wirth's original Pascal.

How to efficiently construct HTML on-the-fly in PHP

Four approaches occur immediately to me:
a) Simply echo many small fragments;
b) Create logically complete blocks by concatenating literals and PHP variables, then echo the block;
c) Start with a string representing a logically complete block with placeholder tokens, replace the tokens with PHP variables, then echo the block;
d) Create an array of strings comprising the literal fragments of the block, interspersed with named place-holder elements, then traverse the array and replace the placeholders, then join the array and echo the result.
(a) sounds intuitively inefficient, but since the system will probably buffer them, it may be OK, and then there are no string concatenation or substitution operations. However, the code will look awful and probably be difficult to maintain.
(b) is very messy, and since HTML contains lots of quote characters, it's tedious to get the syntax right - you often can't just use double quotes with interspersed variables. Also I suspect that string concatenation is inefficient.
(c) is good from a maintainability point of view, since you can clearly see the intended HTML, but substitution is probably also inefficient.
(d) may be quite efficient, since the join function can be clever and just allocate memory for the combined string once, and then copy the parts into it. It is also reasonably maintainable if the starting array literal is nicely laid out.
I'm sure lots of developers have thought about this, and quite likely I have missed some obvious alternative - which is the way to go?
It's a mix of (a) and (b), depending on what you need, but you shouldn't be doing this yourself: people have already built things like this for you to use for free, so pick up a framework. Frameworks take a little while to learn, but once you get into one, development is much faster and more maintainable.
I'd say take a look at Zend, since it's what I know (http://en.wikipedia.org/wiki/Zend_framework).

strstr vs regex in C

Let's say, for example, I have a list of user id's, access times, program names, and version numbers as a list of CSV strings, like this:
1,1342995305,Some Program,0.98
1,1342995315,Some Program,1.20
2,1342985305,Another Program,15.8.3
1,1342995443,Bob's favorite game,0.98
3,1238543846,Something else,
...
Assume this list is not a file, but is an in-memory list of strings.
Now let's say I want to find out how many times certain programs have been accessed, broken down by version number (e.g. "Some Program version 1.20" was accessed 193 times, "Some Program version 0.98" was accessed 876 times, and "Some Program 1.0.1" was accessed 1,932 times).
Would it be better to build a regular expression and then use regexec() to find the matches and pull out the version numbers, or strstr() to match the program name plus comma, and then just read the following part of the string as the version number? If it makes a difference, assume I am using GCC on Linux.
Is there a performance difference? Is one method "better" or "more proper" than the other? Does it matter at all?
Go with strstr(). Using a regex to count occurrences is not a good idea, as you would need a loop anyway, so I would suggest a simple loop that searches for the position of the substring, then increments a counter and advances the search start position after each match.
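For example, the counting loop could look like this (a sketch; count_occurrences is an illustrative name, not a library function):

#include <stdio.h>
#include <string.h>

static int count_occurrences(const char *haystack, const char *needle) {
    size_t step = strlen(needle);
    if (step == 0)
        return 0;                 /* avoid an infinite loop on an empty needle */
    int count = 0;
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + step, needle))
        count++;                  /* advance past each match and keep searching */
    return count;
}

int main(void) {
    printf("%d\n", count_occurrences("abcabcab", "abc"));  /* prints: 2 */
    return 0;
}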
strchr/memcmp is how most libc versions implement strstr. Hardware-dependent implementations of strstr in glibc do better: both the SSE2 and SSE4.2 (x86) instruction sets can do far better than scanning byte by byte. If you want to see how, I posted a couple of blog articles a while back ("SSE2 and strstr" and "SSE2 and BNDM search") that you might find interesting.
I'd do neither: I'm betting it would be faster to use strchr() to find the commas, and strcmp() to check the program name.
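That approach might look like the following sketch (the field layout is taken from the sample rows, and match_program is an illustrative name; strncmp stands in for strcmp since the program name ends at a comma rather than a NUL):

#include <stdio.h>
#include <string.h>

/* Returns the version substring if `line` records an access to `prog`,
   or NULL otherwise. Assumed layout: id,timestamp,program name,version */
static const char *match_program(const char *line, const char *prog) {
    const char *p = strchr(line, ',');   /* skip user id */
    if (p == NULL) return NULL;
    p = strchr(p + 1, ',');              /* skip access time */
    if (p == NULL) return NULL;
    p++;                                 /* start of program name */
    size_t n = strlen(prog);
    if (strncmp(p, prog, n) == 0 && p[n] == ',')
        return p + n + 1;                /* version follows the comma */
    return NULL;
}

int main(void) {
    const char *v = match_program("1,1342995315,Some Program,1.20", "Some Program");
    if (v != NULL)
        printf("version: %s\n", v);      /* prints: version: 1.20 */
    return 0;
}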
As for performance, I expect the string functions (strtok/strstr/strchr/strcmp...) all to run at more or less the same speed (i.e. really, really fast), and regex to run appreciably slower, albeit still quite fast.
The real performance benefit would come from properly designing the search though: how many times it must run, is the number of programs fixed...?
For example, a single scan whereby you get ALL the frequency data for all the programs would be much slower than a single scan seeking for a given program. But properly designed, all subsequent queries for other programs would run way faster.
Use strtok() to break the data up into something more structured (like a list of structs).
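A sketch of that (the struct and field names are my assumptions from the sample rows; note that strtok collapses consecutive delimiters, so the empty version field in the last sample row simply comes back as NULL here):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct access_record {
    int  user_id;
    long access_time;
    char program[64];
    char version[16];
};

/* Parses a line like "1,1342995305,Some Program,0.98" in place. */
static int parse_record(char *line, struct access_record *r) {
    char *id   = strtok(line, ",");
    char *time = strtok(NULL, ",");
    char *prog = strtok(NULL, ",");
    char *ver  = strtok(NULL, ",\n");    /* NULL when the version field is empty */
    if (id == NULL || time == NULL || prog == NULL)
        return 0;
    r->user_id     = atoi(id);
    r->access_time = atol(time);
    snprintf(r->program, sizeof r->program, "%s", prog);
    snprintf(r->version, sizeof r->version, "%s", ver != NULL ? ver : "");
    return 1;
}

int main(void) {
    char line[] = "1,1342995443,Bob's favorite game,0.98";
    struct access_record r;
    if (parse_record(line, &r))
        printf("%s %s\n", r.program, r.version);  /* Bob's favorite game 0.98 */
    return 0;
}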

What is this character ` called?

I never noticed the character ` (the one on the same key as the tilde ~). There is another single-quote character ' on the same key as ". I see that ` and ' aren't interchangeable, whereas ' and " are.
I lost a lot of time to this when compiling GTK programs: I got an error (file not found), and finally figured out that ` is not a single quote.
What is the purpose of this ` character, and when is it (or when should it be) used?
Thanks.
It's typically called a "backtick", and in bash, it is used for command substitution (although the $(cmd) construct is usually preferred due to easier nesting).
` is variously known as a backtick or grave accent (see http://en.wikipedia.org/wiki/Grave_accent).
In UNIX shells, as well as some scripting languages (Ruby, Perl...), it introduces input to be executed in a subshell. In C and C++ it has no special purpose, but it can appear as a character literal or as part of a string literal. One reason it's not used for anything more interesting is the extremely wide portability of those languages, which spans machines where the character can't be expected to be on the keyboard, and where it may display only subtly differently from the single right quote "'" on screen and in printouts, making for extremely hard-to-see bugs.
In some word-processing and similar application programs, typing a backtick will insert a single left-quote character "‘". Keyboard input software also commonly lets a user combine "`" with "e" to enter the character "è", or with "a" for "à", etc., as used in some languages' alphabets.
I call it a "grave", as in a grave accent.
In MySQL, it's used to surround identifiers when they might otherwise be ambiguous (such as using a reserved word as a table or column name). There are going to be lots of different uses of that character in lots of different pieces of software, just as there are for the other keys on the keyboard.
In a few languages, including PHP, Perl, and I think Ruby, backticks execute shell commands.
http://php.net/manual/en/language.operators.execution.php
The SQL use mentioned above is another one, which unfortunately I am well aware of, thanks to co-workers who decided 'Desc' was a good name for a field.
