How to properly locate unbalanced parentheses in source code?

Recently I learned that some text editors have functions to check whether all parentheses in a file are balanced (e.g. check-parens in Emacs). They may even highlight the first unbalanced parenthesis, if there is one. However, I am not clear whether there is a way to locate the actual mistake instead of always highlighting the first unbalanced occurrence.
Consider the code below:
void foo()
{
    int some_num0 = 0;
    {
        int some_num1 = 1;
        {
            int some_num2 = 2;
            {
                int some_num3 = 3;
                // oops, I forgot to close this block.
        }
    }
}
A simple balance check (e.g. check-parens in Emacs) will bring me to the opening brace of foo's definition. However, a more useful indication would bring me to somewhere around the innermost level, near some_num3. Is there a way to achieve this?

The parenthesis check basically reads the code and updates a counter whenever it hits a parenthesis. If it is an opening bracket, it adds 1; if it is a closing one, it subtracts 1. If the counter ever goes negative, there is a problem, and it tells you. If, at the end of the code, the counter is not zero, there is also a problem; that is your case.
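The counter described above can be sketched in C as follows. This is only an illustration, not how check-parens is actually implemented: the function name check_braces and its return convention are made up here, only curly braces are counted, and braces inside strings or comments are not skipped.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: counter-based brace check.
 * Returns 0 if the braces balance, a positive number if that many '{'
 * were never closed (only known at the end of the input), or -1 if a
 * '}' appeared with nothing open. In the -1 case *bad_line is set to
 * the 1-based line of the offending '}'; otherwise it is set to 0.
 */
int check_braces(const char *src, int *bad_line)
{
    int depth = 0; /* the counter from the explanation above */
    int line = 1;

    assert(src != NULL && bad_line != NULL);
    *bad_line = 0;

    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '\n') {
            line++;
        } else if (*p == '{') {
            depth++;
        } else if (*p == '}') {
            if (--depth < 0) {
                *bad_line = line; /* counter went negative here */
                return -1;
            }
        }
    }
    return depth; /* non-zero: something was left open, but not where */
}
```

Note that for the missing-close case the counter alone only says how many braces are left open, not which one was forgotten.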
But how would it know where the missing brace(s) actually belong? The only way to do something close to what you want is to check the indentation as well. But what if your code is not properly indented?
I do not know of any tool that checks indentation while checking braces, and I doubt there is a good one (if any).
The fact is, the more often you forget a parenthesis, and above all the more code you write, the better you get at not forgetting them. Or at least, that is likely to happen.
So do not blame your IDE for not being endowed with reason; blame yourself for not immediately locating the missing bracket. And write code.

Some of the popular Emacs libraries include, but are not limited to:
highlight-parentheses: https://github.com/nschum/highlight-parentheses.el
rainbow-delimiters: https://github.com/jlr/rainbow-delimiters
Here is a link to a recent modification that I made to one of the functions used by highlight-parentheses, which will preserve the overlays when using scroll-up, scroll-down, or mwheel-scroll: https://stackoverflow.com/a/25269210/2112489
Here is a link to a modified version of highlight-parentheses, which is what I use: https://stackoverflow.com/a/23998965/2112489 I have not yet updated the posted version of parens-mode to include the modification for scrolling that is referred to in the previous link. My personal preference is to add / delete overlays, instead of storing them at the beginning of the buffer and moving them around.

If you indented all of the closing braces by one level, would you still say that it would be more reasonable to indicate the 4th opening brace?
If not, perhaps it should be checking for consistent indentation too.
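A brace check that also looks at indentation could be sketched roughly like this. It is a heuristic under loud assumptions: the code is consistently indented, a closing brace sits at the same indentation as the line containing its opening brace, and braces in strings or comments are ignored. The name find_suspect_open_brace and the MAX_DEPTH limit are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256 /* arbitrary nesting limit for this sketch */

/*
 * Hypothetical helper: returns the 1-based line of the innermost '{'
 * that looks unclosed, judged by indentation; 0 if nothing looks
 * wrong; -1 on a stray '}' or if nesting exceeds MAX_DEPTH.
 */
int find_suspect_open_brace(const char *src)
{
    int open_line[MAX_DEPTH];   /* line of each currently open '{' */
    int open_indent[MAX_DEPTH]; /* indentation of that line */
    int depth = 0;
    int line = 1;
    int indent = 0;        /* leading blanks on the current line */
    int at_line_start = 1; /* still inside the leading blanks? */

    assert(src != NULL);

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;
        if (c == '\n') {
            line++;
            indent = 0;
            at_line_start = 1;
            continue;
        }
        if (at_line_start && (c == ' ' || c == '\t')) {
            indent++; /* each space or tab counts as one unit */
            continue;
        }
        at_line_start = 0;
        if (c == '{') {
            if (depth == MAX_DEPTH)
                return -1;
            open_line[depth] = line;
            open_indent[depth] = indent;
            depth++;
        } else if (c == '}') {
            if (depth == 0)
                return -1;
            /* A '}' that does not line up with the innermost open
               brace suggests that brace was never closed. */
            if (open_indent[depth - 1] != indent)
                return open_line[depth - 1];
            depth--;
        }
    }
    return depth > 0 ? open_line[depth - 1] : 0;
}
```

On the foo example from the question (indented as shown there), this reports the brace just before some_num3 rather than the opening brace of foo. If every closing brace were shifted by one level, as suggested above, the heuristic would be fooled, which is exactly the weakness of relying on indentation.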

Not really a check-parens solution, but when something like this happens to me I usually mark the whole block and run M-x indent-region to have Emacs try to indent the selected region correctly.
In your simple case it would indent them all one level (as in MRAB's answer) and you would just close the last brace.
In a more complicated case the indentation will break somewhere (i.e. the code won't look the way you expect), and that is usually where the parenthesis is missing.

I just used Notepad++ with the BracketsCheck plugin: https://community.notepad-plus-plus.org/topic/14090/best-way-to-find-unmatched-parentheses/2. After some hours of searching and trying VSCode plugins, Notepad++ did it for me :) I was checking a 2.5K-line XML file with about 270 formula expressions. Checking parentheses formula by formula with the usual matching-parenthesis highlighting wasn't much of an option.

Related

Accurately count number of keywords "if", "while" in a c file

Are there any libraries that I can pass my .c files through and that will count the visible number of, for example, "if" statements?
We don't have to worry about "if" statements in other files included by the current file, just the count in the current file.
I can do a simple grep or regex, but wanted to check if there is something better (but still simple).
If you want to be sure it's done right, I'd probably make use of clang and walk the AST. A URL to get you started:
http://clang.llvm.org/docs/IntroductionToTheClangAST.html
First off, there is no way to use regular expressions or grep to give you the correct answer you are looking for. There are lots of ways that you would find those strings, but they could be buried in any amount of escape characters, quotations, comments, etc.
As some commenters have stated, you will need to use a parser/lexer that understands the C language. You want something simple, you said, so you won't be writing this yourself :)
This seems like it might be usable for you:
http://wiki.tcl.tk/3891
From the page:
lexes a string containing C source into a list of tokens
That will probably get you what you want, but even then it's not going to be trivial.
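To illustrate why strings and comments matter, here is a minimal sketch of a scanner that counts a keyword only when it appears as a whole word outside comments and string or character literals. It is not a real C lexer: the name count_keyword is made up for this example, and preprocessor directives (a "#if" would be counted as an "if"), macros, and line continuations are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/*
 * Hypothetical helper: counts occurrences of kw as a complete
 * identifier-like token, skipping comments and quoted literals.
 */
int count_keyword(const char *src, const char *kw)
{
    size_t kwlen;
    int count = 0;
    const char *p = src;

    assert(src != NULL && kw != NULL);
    kwlen = strlen(kw);

    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {          /* block comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else if (p[0] == '/' && p[1] == '/') {   /* line comment */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (*p == '"' || *p == '\'') {      /* string or char literal */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                           /* skip the escaped character */
                p++;
            }
            if (*p != '\0')
                p++;
        } else if (is_ident_char((unsigned char)*p)) {
            /* identifier, keyword, or number: consume the whole token */
            const char *start = p;
            while (is_ident_char((unsigned char)*p))
                p++;
            if ((size_t)(p - start) == kwlen && strncmp(start, kw, kwlen) == 0)
                count++;
        } else {
            p++;
        }
    }
    return count;
}
```

Reading a file into a buffer and calling count_keyword(buffer, "if") would give a count that ignores an "if" inside a comment or a string, which a plain grep would not.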
What everyone has said so far is correct; it seems a lot easier to just grep the shit out of your file. The performance hit of this is negligible compared to the alternative, which is to get the gcc source code (or that of whichever compiler you're using), go through the parsing code, and hook in what you want to do while it parses the syntax tree. This seems like a pain in the ass, especially when all you care about is conditional statements. If you actually care about the branches, you could just look at the object code and count the conditional branch instructions in the assembly, which would correctly tell you the number of branches (rather than relying on how many times you typed a conditional, which will not translate exactly to the branching of the program).

Move forward over a compound statement in emacs

Is there an emacs command to move forward over a compound statement in C or C++ files?
If I have the following code
^ if (foo)
{
    doSomething();
}$
and the point is at the caret (^), I want to be able to do something like M-x forward-compound-statement RET and have the point move to the dollar sign.
forward-sexp seems like it should be the right command, but that will only step over a single word, putting the point after the if. c-end-of-statement is also wrong, as it only goes to the first close parenthesis.
EDIT: In the case of an if-then-else-if-else block, I would ideally want it to go past all of the else blocks.
I am also hoping to use this inside defuns that I write in the future, so key strokes are less of a concern than having one command to do it rather than repeating a command until I see that it gets to the right location.
Something like forward-compound-statement may very well exist for specific major-modes. I wouldn't bother learning them though.
Instead, I recommend you get used to navigating with more composable commands like C-s. You can try something like jump-char to shorten the sequence by one key.

How to handle labels in a scripting language I'm writing?

So I've been stewing over this for a long time, thinking about it. Here's a code example first, and then I'll explain it.
:main
dostuff
otherlabel
:otherlabel
dostuff
Alright so in this example, main is where the code starts, and it 'calls' the label 'otherlabel'. This is really just a shortcut for a jump command that changes execution to a different location in memory. My problem though is, how do I handle these labels so that they don't have to be declared before they are called?
At the moment, I'm doing a single-step compilation, reading straight from the source and outputting the bytecode. I simply add labels to a dictionary when I find them, and then replace 'otherlabel' with a jump command to the correct location in the code. But in this case that code wouldn't compile, because 'otherlabel' is used before it is defined.
I've thought of a few ways to do this:
First is handling labels before anything else, but this requires me to do everything in two steps and deal with the same code twice. This slows down the process and just seems like a mess.
Second is queueing up the label calls until AFTER I've gone through the entire file and compiled everything else, and then dealing with them. This seems much cleaner.
I'm writing this in C, so I'd rather not implement complex data structures. I'm looking for the most straightforward way to handle this.
Use multiple passes. One pass isn't going to suffice for a scripting language, especially when you are getting to the more complex structures.
In a first pass, before compiling, construct your dictionary of labels.
In a later pass, when the compiling happens, just use that dictionary.
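A two-pass approach might look like the sketch below. The toy format is an assumption made for illustration: the program is an array of lines, a line starting with ':' defines a label, and every other line is one instruction. The "bytecode" is reduced to an array holding, for each instruction, the target address if the line names a label and -1 otherwise. The names compile_two_pass, find_label, and MAX_LABELS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 64 /* fixed-size table keeps the data structure trivial */

struct label {
    const char *name;
    int addr; /* index of the instruction that follows the label */
};

static int find_label(const struct label *tab, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].addr;
    return -1;
}

/*
 * Returns the number of instructions written to out[], or -1 if there
 * are more than MAX_LABELS labels. out[] must have room for one entry
 * per non-label line.
 */
int compile_two_pass(const char *const *lines, int nlines, int *out)
{
    struct label tab[MAX_LABELS];
    int nlabels = 0;
    int addr = 0;

    assert(lines != NULL && out != NULL);

    /* Pass 1: build the dictionary of labels, emit nothing. */
    for (int i = 0; i < nlines; i++) {
        if (lines[i][0] == ':') {
            if (nlabels == MAX_LABELS)
                return -1;
            tab[nlabels].name = lines[i] + 1;
            tab[nlabels].addr = addr;
            nlabels++;
        } else {
            addr++;
        }
    }

    /* Pass 2: emit code; every label is already known by now. */
    addr = 0;
    for (int i = 0; i < nlines; i++) {
        if (lines[i][0] == ':')
            continue;
        out[addr++] = find_label(tab, nlabels, lines[i]);
    }
    return addr;
}
```

For the example in the question, the second instruction ('otherlabel') resolves to address 2 even though the label is defined after its use. The first pass only looks at the first character of each line, so the cost of reading the source twice is small.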
You could use "backpatching", although it sounds like that's what you've tried already, and it could be construed as a complex structure.
When you encounter a call to an undefined label, you emit the jump with a blank address field (probably into a buffer; otherwise this becomes the same as "multipass" if you have to re-read the file to patch it), and you also store a pointer to the blank field in a "patch-up" list in the dictionary. When you encounter the label definition, you fill in all the blanks in the list and proceed normally.
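Backpatching could be sketched in C along these lines. As a simplification of the description above, the sketch keeps one flat list of pending fix-ups instead of a patch-up list per dictionary entry, and it uses an assumed toy format: a line starting with ':' defines a label, every other line is one instruction, and out[] holds each instruction's jump target or -1. All names (compile_backpatch, MAX_SYMS, struct pending) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 64 /* arbitrary limit for labels and for pending fix-ups */

struct label {
    const char *name;
    int addr;
};

struct pending {
    const char *name; /* label that was not yet defined */
    int slot;         /* index in out[] left blank (-1) */
};

/*
 * Single pass. Returns the number of instructions emitted, or -1 if a
 * table overflows. Lines that never match a label keep the value -1.
 */
int compile_backpatch(const char *const *lines, int nlines, int *out)
{
    struct label labels[MAX_SYMS];
    struct pending fixups[MAX_SYMS];
    int nlabels = 0, nfix = 0, addr = 0;

    assert(lines != NULL && out != NULL);

    for (int i = 0; i < nlines; i++) {
        const char *s = lines[i];
        if (s[0] == ':') {
            /* Label definition: record it, then fill in earlier blanks. */
            if (nlabels == MAX_SYMS)
                return -1;
            labels[nlabels].name = s + 1;
            labels[nlabels].addr = addr;
            nlabels++;
            for (int k = 0; k < nfix; k++)
                if (strcmp(fixups[k].name, s + 1) == 0)
                    out[fixups[k].slot] = addr;
        } else {
            int target = -1;
            for (int k = 0; k < nlabels; k++)
                if (strcmp(labels[k].name, s) == 0)
                    target = labels[k].addr;
            if (target < 0) {
                /* Unknown so far: emit a blank and remember where it is. */
                if (nfix == MAX_SYMS)
                    return -1;
                fixups[nfix].name = s;
                fixups[nfix].slot = addr;
                nfix++;
            }
            out[addr++] = target;
        }
    }
    return addr;
}
```

Because the emitted code lives in the out[] buffer, patching is just an array write and the source is read only once. In this toy format ordinary instructions also occupy a pending slot, since they cannot be told apart from forward references until the end; a real language with an explicit jump syntax would avoid that.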

How can I automatically fold a long C code in Vim?

I regularly run into C code without folding. It is irritating to read without folding, particularly in long files. How can I fold it?
To fold according to syntax
:set foldmethod=syntax
If you want to do it manually on the bits you want to fold away
:set foldmethod=manual
then create new folds by selecting / moving and pressing zf
e.g.
shift-v j j zf
(ignoring the spaces)
Edit: Also see the comments of this answer for indent and marker foldmethods.
I think you may have mixed up the terminology. Do you need "wrapping" or "folding"? Wrapping is when lines that are too long to fit on screen are shown on several consecutive screen lines (it is still one line, displayed across several; hard to explain, best seen in practice).
In vim wrapping is set by
:set wrap
to turn it on, and
:set textwidth=80
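to make Vim break lines at that width as you type (80 characters is usually a nice measure). Note that textwidth inserts real line breaks, whereas wrap only changes how long lines are displayed.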
Folding on the other hand is a completely different matter. It is the one where vim folds several lines of code (for example, a function) into one line of code. It is useful for increasing readability of code. Vim has several folding methods, you can see all of them if you
:help folding
What you are looking for, I think would be, syntax folding, but I could be wrong. I recommend reading the help page, it is not long, and very useful.
Actually, there is another very straightforward and effective way: set foldmethod=marker and set foldmarker to {,}. All of the functions are then folded, so the result basically looks like the outline in an IDE. (You can also set foldlevel=1 or higher if you do not want everything folded at the start.) A folded function can be opened one level with zo.
In addition, folding by syntax needs a bit of extra work, and here is a good tutorial about it. But I think folding by marker={,} is quite enough and, most importantly, it's simple and neat.
I've rolled up a fold plugin for C and C++. It goes beyond what is done with syntax folding (maybe that could be improved, I don't know), and compared to indentation-based and marker-based folding it leaves fewer noisy, not really useful things unfolded.
The caveat: in order to have decent reaction times, I had to make some simplifications, and sometimes the result is quite messed up (you have to type zx to fix it).
Here is a little screencast to see how the plugin folds a correctly balanced C++ source code, which is not currently being modified :(
In vi (as opposed to vim) the answer was:
:set wm=1
This sets the wrap margin to one character before the end of the line. This isn't the world's best specification with variable sized windows (it made sense with green screens when it was hard to change the size).
That means there is also an alternative way to do it in vim:
:set textwidth=30
See: VimDoc User Manual Section 25.1
Then you probably want the setting
:set foldmethod=syntax
But don't put that in manually! That's missing out on one of Vim's biggest features, which is having custom settings for hundreds of file types already built in. To get that, add this to your ~/.vimrc:
filetype plugin on
filetype indent on
Filetype detection is mostly based on extension, in this case *.c files. See :help :filetype for more info. You can also customize these filetype-based settings.

What is this strange C code format?

What advantage, if any, is provided by formatting C code as follows:
while(lock_file(lockdir)==0)
    {
    count++;
    if(count==20)
        {
        fprintf(stderr,"Can't lock dir %s\n",lockdir);
        exit(1);
        }
    sleep(3);
    }
if(rmdir(serverdir)!=0)
    {
    switch(errno)
        {
        case EEXIST:
            fprintf(stderr,"Server dir %s not empty\n",serverdir);
            break;
        default:
            fprintf(stderr,"Can't delete dir %s\n",serverdir);
        }
    exit(1);
    }
unlock_file(lockdir);
versus something more typical such as
while(lock_file(lockdir)==0) {
    count++;
    if(count==20) {
        fprintf(stderr,"Can't lock dir %s\n",lockdir);
        exit(1);
    }
    sleep(3);
}
if(rmdir(serverdir)!=0) {
    switch(errno) {
        case EEXIST:
            fprintf(stderr,"Server dir %s not empty\n",serverdir);
            break;
        default:
            fprintf(stderr,"Can't delete dir %s\n",serverdir);
    }
    exit(1);
}
unlock_file(lockdir);
I just find the top version difficult to read, and it is hard to get the indentation level correct for statements outside of a long block, especially for long blocks containing several nested blocks.
The only advantage I can see is being different and leaving your fingerprints on code that you've written.
I notice Vim formatting would have to be hand-rolled to handle the top case.
The top example is known as "Whitesmiths style". Wikipedia's entry on Indent Styles explains several styles along with their advantages and disadvantages.
The indentation you're seeing is Whitesmiths style. It's described in the first edition of Code Complete as "begin-end Block Boundaries". The basic argument for this style is that in languages like C (and Pascal) an if governs either a single statement or a block. Thus the whole block, not just its contents should be shown subordinate to the if-statement by being indented consistently.
XXXXXXXXXXXXXXX      if (test)
   XXXXXXXXXXXX          one_thing();

XXXXXXXXXXXXXXX      if (test)
   X                     {
   XXXXX                 one_thing();
   XXXXX                 another_thing();
   X                     }
Back when I first read this book (in the 90s) I found the argument for "begin-end Block Boundaries" to be convincing, though I didn't like it much when I put it into practice (in Pascal). I like it even less in C and find it confusing to read. I ended up using what Steve McConnell calls "Emulating Pure Blocks" (Sun's Java Style, which is almost K&R).
XXXXXXXXXXXXXX X     if (test) {
    XXXXXX               one_thing();
    XXXXXX               another_thing();
X                    }
This is the most common style used to program in Java (which is what I do all day). It's also most similar to my previous language which was a "pure block" language, requiring no "emulation". There are no single-statement bodies, blocks are inherent in the control structure syntax.
IF test THEN
    oneThing;
    anotherThing
END
Nothing. Indentation and other coding standards are a matter of preference.
Personal preference, I would have thought. I guess it has the code block in one vertical line, so it is possibly easier to work out at a glance? Personally I prefer the brace to start directly under the previous line.
It looks pretty standard to me. The only personal change I'd make is aligning the curly-braces with the start of the previous line, rather than the start of the next line, but that's just a personal choice.
Anyway, the style of formatting you're looking at there is a standard one for C and C++, and is used because it makes the code easier to read, and in particular by looking at the level of indentation you can tell where you are with nested loops, conditionals, etc. E.g.:
if (x == 0)
{
    if (y == 2)
    {
        if (z == 3)
        {
            do_something (x);
        }
    }
}
OK in that example it's pretty easy to see what's happening, but if you put a lot of code inside those if statements, it can sometimes be hard to tell where you are without consistent indentation.
In your example, have a look at the position of the exit(1) statement -- if it weren't indented like that, it would be hard to tell where this was. As it is, you can tell it's at the end of that big if statement.
Code formatting is personal taste. As long as it is easy to read, it pays off in maintenance!
By following some formatting and commenting standards, you first of all show respect for the other people who will read and edit the code you write. If you don't accept the rules and write somewhat esoteric code, the most probable result is that you will not be able to communicate effectively with other people (programmers). Code format is a personal choice if the software is written only by you and for you and nobody else is expected to read it, but how much modern software is written by only one person?
The "advantage" of Whitesmiths style (as the top one in your example is called) is that it mirrors the actual logical structure of the code:
indent if there is a logical dependency
place corresponding brackets on the same column so they are easy to find
opening and closing of a context (which may open/close a stack frame etc) are visible, not hidden
So: fewer if/else errors, fewer loops gone wrong, fewer catches at the wrong level, and better overall logical consistency.
But as benefactual wrote: within certain rational limits, formatting is a matter of personal preference.
It's just another style. People code how they like to code, and that is one accepted style (though not my preferred one). I don't think it has much of a disadvantage or advantage over the more common style, in which brackets are not indented but the code within them is. Perhaps one could justify it by saying that it more clearly delimits code blocks.
In order for this format to have "advantage", we really need some equivalent C code in another format to compare to!
Where I work, this indentation scheme is used in order to facilitate a home-grown folding editor mechanism.
Thus, I see nothing fundamentally wrong with this format - within certain rational limits, formatting is a matter of personal preference.
