Bulk spell checking for C code

I have a large source code package in C and I would like to spell check all string literals and comments. After adding exceptions to a file, I would like to repeat the same procedure on each release, to see whether any new spelling errors have been introduced.
I have checked with ispell, hunspell and aspell but, to my disappointment and surprise, although they do seem to understand HTML, TeX and a few other formats, they have no C mode. The closest I found was a "ccpp" filter mentioned for aspell, but when I run "aspell dump filters", the ccpp filter is not listed.
Any ideas?

You have to write a lexer first, to extract string constants and comments into a text file along with the line and column where each occurs in the source file.
(ply can be useful here, or lex/yacc, but either needs some coding.)
Then run whatever spell checker you like over that file, parse its report, and trace each hit back to the original location in the C file.
Or connect the spell checker directly to your lexer.
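For a first pass you don't even need a full lexer; a minimal Python sketch of the extraction step might look like this (it assumes ordinary C sources and ignores preprocessor corner cases such as trigraphs and line-spliced tokens):

import re
import sys

# Strings are tried before comments so that "/* ... */" inside a string
# literal is not misreported as a comment, and vice versa.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"     # character constant (skipped below)
    r'|/\*.*?\*/'             # block comment
    r'|//[^\n]*',             # line comment
    re.DOTALL)

def extract(path):
    text = open(path, encoding="utf-8", errors="replace").read()
    for m in TOKEN.finditer(text):
        tok = m.group(0)
        if tok.startswith("'"):
            continue  # char constants rarely contain checkable words
        line = text.count("\n", 0, m.start()) + 1
        yield line, tok

if __name__ == "__main__":
    for line, tok in extract(sys.argv[1]):
        print("%s:%d: %s" % (sys.argv[1], line, tok))

The file:line output format makes it easy to feed the extracted text to the spell checker and map its complaints back to the original source location.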


Support for compiling ASN1 file to C with CONTAINING keyword

I'm using ESNACC to compile multiple ASN.1 source files to C code. For ease of understanding, I will explain the scenario here as succinctly as possible:
FileA.asn1 contains the following:
FileA DEFINITIONS ::=
BEGIN
A ::= SEQUENCE
{
AContent [0] OCTET STRING (CONTAINING FileB.B)
}
END
FileB.asn1 contains the following:
FileB DEFINITIONS ::=
BEGIN
B ::= SEQUENCE
{
BElem1 [0] INTEGER,
BElem2 [1] INTEGER
}
END
I used ESNACC to compile both files in one command. Upon analysing the C source files generated, I observed that the AContent field will be decoded as a constructed OCTET STRING (the data being received in the application guarantees that the field will be specified as constructed) with its contents being filled into a simple string. This means that FileB does not come into the picture at all. I was hoping that AContent would be further decoded with a structure of FileB being filled, so that I can easily access the elements within. This does not seem to be the case.
I'm fairly new with ASN1, so please let me know if my understanding is wrong in any way.
Is ESNACC not capable of generating code for supporting CONTAINING keyword properly?
Are there other compilers that are able to do this?
Can this be done by using ESNACC in any way?
If this cannot be done using ESNACC, and I don't want to use any other compiler, how would I access the contents within AContent at runtime easily?
I am not sure of the capabilities of ESNACC, but there are many other compilers that support the CONTAINING keyword. An excellent list of compilers can be found at https://www.itu.int/en/ITU-T/asn1/Pages/Tools.aspx which is part of the ITU-T ASN.1 Project.
Heimdal's ASN.1 compiler (lib/asn1/) has support for the funky Information Object System syntax extensions that allow you to declare things like what all goes into Certificate Extensions (for example), and the generated code will decode everything recursively in one go.
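Whichever compiler you use, if it does not decode the CONTAINING field for you, the workaround for your last question is a second decoding pass: take the raw octets that the outer decode leaves in AContent and run them through a decoder for FileB.B. A minimal sketch of that idea using Python's pyasn1 rather than ESNACC (the type and variable names are invented for illustration, and IMPLICIT tags are assumed for brevity):

from pyasn1.type import univ, namedtype, tag
from pyasn1.codec.ber import decoder

class B(univ.Sequence):
    componentType = namedtype.NamedTypes(
        namedtype.NamedType('bElem1', univ.Integer().subtype(
            implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 0))),
        namedtype.NamedType('bElem2', univ.Integer().subtype(
            implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 1))))

def decode_acontent(acontent_octets):
    # acontent_octets: the bytes the outer decode produced for AContent
    inner, _rest = decoder.decode(acontent_octets, asn1Spec=B())
    return int(inner['bElem1']), int(inner['bElem2'])

In C with ESNACC the shape is the same: call the generated decode routine for B on the buffer that decoding A left inside AContent.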

Unicode regex in SQL Server CLR function

I have a REGEX SQL CLR function:
var rule1 = new Regex("شماره\\s?\\d{1,10}");
Calling it on SQL Server 2016, however, returns this error:
System.ArgumentException: parsing "?????\s?\d{1,10}" - Quantifier {x,y} following nothing.
at System.Text.RegularExpressions.Regex..ctor(String pattern)
It seems that my unicode characters are changed to question marks, which makes the whole Regex wrong.
This issue has nothing to do with datatypes, whether for input parameters or return values, as the code provided, while sparse on detail, does show enough to see that:
there is no input parameter being used (the string is hard-coded).
the error is being thrown by System.Text.RegularExpressions.Regex, so has nothing to do with T-SQL or return values / types.
Also, while the error message does mention "Quantifier {x,y}", and there is indeed a {1,10} quantifier being used in the Regular Expression, it is a false correlation (albeit a rather understandable one) that the error message is referring to that specific quantifier. If you shorten the Regular Expression down to just "شماره", you will get the same error, except it will report the Regular Expression as being just "?????". Hence, "Quantifier {x,y}" actually refers to the first "?" in the expression shown in the error message (you will get the same error even if the Regular Expression is nothing more than "ش"). I figure that "Quantifier {x,y}" is the generalized way of looking at the ?, +, and * quantifiers as they can also be expressed as {0,1}, {1,}, and {0,}, respectively (or at least they should be).
This issue has nothing to do with SQL Server, or even Regular Expressions. This is an encoding issue, and RegEx is reporting the problem because it is being given ????? instead of شماره.
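The mangling is easy to reproduce outside SQL Server altogether. A quick Python sketch (the thread's code is C#, but the effect is identical):

import re

text = "شماره"
# Round-tripping through a non-Unicode code page replaces every Arabic
# letter with '?', which is what happened to the pattern here.
mangled = text.encode("cp1252", errors="replace").decode("cp1252")
print(mangled)  # ?????

try:
    re.compile(mangled)
except re.error as e:
    print(e)  # "nothing to repeat at position 0" -- a leading '?' quantifies nothing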
<TL;DR> Check your source code file's encoding. You might need to go to "Save As...", click on the down-arrow to the right of the word "Save" on the "Save" button, select "Save with Encoding...", and then select "Unicode (UTF-8 with signature) - Codepage 65001".
There is a problem with the project configuration and/or the compiler. I placed the following string in both a Console Application and a Database Project:
"-😈-ŏ-א---\U0001F608-\u014F-\u05D0-"
(The second half of that test string, after the ---, is merely the escape sequences for the same three characters as appear in the first half, and in the same order.)
I compiled both and inspected the compiled output (meaning: it hasn't been deployed to SQL Server yet). That string appears in the EXE file (Console App) as:
2D003DD808DE2D004F012D00D0052D002D002D003DD808DE2D004F012D00D0052D00
which is the UTF-16 LE encoding for: -😈-ŏ-א---😈-ŏ-א-
Yet, it appears in the DLL file (SQLCLR Assembly) as:
2D003F003F002D003F002D003F002D002D002D003DD808DE2D004F012D00D0052D00
which is the UTF-16 LE encoding for: -??-?-?---😈-ŏ-א-
I even changed the output type of the Console App project to be "Class Library" and the string still got embedded correctly in that DLL file. So, for some reason the literal characters are being turned into literal question marks when compiled into a SQLCLR Assembly. I haven't yet figured out what is causing this as a quick look at the config settings and command-line flags for csc.exe seems to show them being effectively the same.
In either case, it should be clear that specifying the Arabic characters via escape sequences, while cumbersome, will at least work, hence providing a (hopefully short-term) work-around so that you can move forward on this. I will continue looking to see what could be causing this difference in behavior.
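If you want to reproduce the binary inspection above, here is a small Python sketch (the file names are hypothetical; .NET stores string literals as UTF-16 LE in the assembly):

# The two byte patterns correspond to the hex dumps shown above.
marker_ok = "-😈-ŏ-א-".encode("utf-16-le")    # intact literal
marker_bad = "-??-?-?-".encode("utf-16-le")   # question-mark version

for path in ("ConsoleApp.exe", "SqlClrAssembly.dll"):  # hypothetical names
    blob = open(path, "rb").read()
    print(path, "intact:", marker_ok in blob, "mangled:", marker_bad in blob)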
UPDATE
In order to determine if the string was being converted to an 8-bit encoding or something else, I added two characters to the test string (one in both Windows-1252 and ISO-8859-1, and one only in Windows-1252):
§ = 0xA7 in CP-1252, 0xA7 in ISO-8859-1, and 0x00A7 in UTF-16
œ = 0x9C in CP-1252, not in ISO-8859-1, and 0x0153 in UTF-16
The new test string is:
"-😈-ŏ-א-§-œ---\U0001F608-\u014F-\u05D0-\x00A7-\x0153-"
That string appears in the EXE file (Console App) as:
2D003DD808DE2D004F012D00D0052D00A7002D0053012D002D002D003DD808DE2D004F012D00D0052D00A7002D0053012D00
which is the UTF-16 LE encoding for: -😈-ŏ-א-§-œ---😈-ŏ-א-§-œ-
Yet, it appears in the DLL file (SQLCLR Assembly) as:
2D003F003F002D003F002D003F002D00A7002D0053012D002D002D003DD808DE2D004F012D00D0052D00A7002D0053012D00
which is the UTF-16 LE encoding for: -??-?-?-§-œ---😈-ŏ-א-§-œ-
So, because both § and œ came through correctly in the SQLCLR Assembly, it is clearly not ISO-8859-1. And, it is either Code Page Windows-1252 or some other that supports both of those characters (CP-1252 being the most likely given that my system is using it).
Still investigating the root cause...
UPDATE 2
Ok, I feel kinda silly. Sometimes it helps to close a file (or the entire solution sometimes) and reopen it. Doing so I noticed that my test string now appeared as:
"-??-?-?-?-?---\U0001F608-\u014F-\u05D0-\x00A7-\x0153-"
Funny, I don't remember pasting that in ;-). So, I checked the file encoding that Visual Studio was saving it as and sure enough it was "Western European (Windows) - Codepage 1252". And just to be extra special certain, I checked the file for the Console App and it was correctly set to "Unicode (UTF-8 with signature) - Codepage 65001". D'oh! Changing the file encoding under "Save As..." to "Unicode (UTF-8 with signature) - Codepage 65001", I then replaced both the test string and the O.P.'s Regular Expression. Both came through perfectly, no errors or question marks.

Frama-C: save plugin analysis results in a C file

I'm new to Frama-C, so I apologize in advance for my question.
I would like to write a plugin that modifies the source code: clone some functions, insert some function calls, and have the plugin generate a second file containing the modified version of the input file.
I would like to know if it is possible to generate a new C file with Frama-C. For example, the results of the Sparecode and Semantic constant folding plugins are printed directly on the terminal, not to a file. So I would like to know whether Frama-C has a function for writing to a file instead of sending the result of the analysis to the standard output.
Of course we can redirect the output of frama-c to a file, but in that case, with the scf plugin for example, the value analysis results are included as well, and I found that Frama-C replaces "for" loops with "while" loops.
What I would like is for Frama-C to generate a file containing my original code plus the modifications I have inserted.
I looked in the directory src/kernel_services/ast_printing but I have not really found functions that can guide me.
Thanks.
On the command line, option -ocode <file> indicates that any subsequent -print will be done in <file> instead of the standard output (use -ocode "" after that if you want to print on stdout again). Note that -print prints the code corresponding to the current project. You can use -then-on <prj> to change the project you're interested in. More information is of course available in the user manual.
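For example, using only the options described above (with <prj> standing for the project your plugin created):

frama-c input.c -ocode modified.c -print
frama-c input.c -then-on <prj> -ocode modified.c -print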
All of this is of course available programmatically. In particular, File.pretty_ast by default pretty-prints (i.e. outputs a C program) the AST of the current project on stdout, but it takes two optional arguments for changing the project or the formatter to which the output is sent.

Prolog, Definite Clause Grammars and files

I'm new to Prolog and I have just started looking around. I read the Definite Clause Grammars chapter in both Simply Logical and Learn Prolog Now!, so now I wanted to get started with some exercises, but I'm stuck.
I have to read from a file with this syntax
setName = {element1, element2, ..., elementN}.
element1: element2 > element3.
Now, I have read that when you define a DCG you get a parser for free, so I wanted to do that to get the data from my file into the Prolog program.
My problem is that in all the examples I have read they always provide a basic dictionary like
article --> [the]
but I cannot do that because I don't know what is going to be written in the file.
Any suggestions?
In SWI-Prolog, consider using library(dcg/basics). It provides building-blocks that you can use in your DCG. Focus on a clear declarative description of what the contents of the file look like, state this with a DCG. Then use phrase_from_file/2 from library(pio) to apply the DCG to a file.

Makefile running on Linux - how to ignore case sensitivity?

I have a huge project, written entirely in C, and a single makefile that is used to compile it. The project's C files contain lots of capitalization problems in their #include directives, meaning there are tons of header file names that are misspelled in lots of C files.
The problem is that I need to migrate this project to compile on a Linux machine, and since Linux file systems are case sensitive, I get tons of errors.
Is there an elegant way to run make on Linux and tell it to ignore case?
Any other solution will be welcome as well.
Thanks a lot.
Motti.
You'll have to fix everything by hand and rename every file, or fix every #include. Even in a huge project (comparable to the Linux kernel), it should be possible to do this within an hour or two. Automation may be possible, but the manual way is probably better, because a script won't be able to guess which name is right: the filename, or the name used in the #include.
Besides, this situation is the fault of the original project developer. If he or she hadn't been sloppy and had named every header correctly in every #include, this wouldn't have happened. Technically, this is a code problem similar to a syntax error. The only right way to deal with it is to fix it.
I think it wouldn't take too long to write a small script that goes through the directories first and then fixes the C sources; a rough sketch follows below. Explained:
Scan the headers' folder and collect the filenames.
Make a lowercased list of them. You now have pairs of original and lowercased names.
Scan the C source files and find each line that contains "#include".
Lowercase the included filename.
Look the lowercased filename up in the list collected from the headers.
Replace the name in the source line with the actual filename from the headers.
You should put the modified files into a separate folder structure, to avoid overwriting the whole source with buggy output. Don't forget to create the target folders during the source tree scan.
I recommend a scripting language for this task; I prefer PHP, but only because it's the scripting language I know. Yes, it will run for a while, but only once.
(I bet that you will have other difficulties with that project; this kind of problem is not typical of high-quality work.)
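A minimal sketch of such a script in Python (rather than PHP), implementing the steps above; it trusts the on-disk header names and rewrites quoted #include lines to match, so run it on a copy of the tree:

import os
import re
import sys

# Matches quoted includes only ("..."); system <...> includes are left alone.
INCLUDE = re.compile(r'(#\s*include\s*")([^"]+)(")')

def build_header_map(root):
    # Map lowercased header basenames to the real names on disk.
    headers = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".h"):
                headers[name.lower()] = name
    return headers

def fix_includes(path, headers):
    text = open(path, encoding="latin-1").read()  # latin-1 round-trips any bytes

    def repair(m):
        dirpart, base = os.path.split(m.group(2))
        real = headers.get(base.lower(), base)  # trust the filesystem's spelling
        name = dirpart + "/" + real if dirpart else real
        return m.group(1) + name + m.group(3)

    fixed = INCLUDE.sub(repair, text)
    if fixed != text:
        open(path, "w", encoding="latin-1").write(fixed)
        print("fixed", path)

if __name__ == "__main__":
    root = sys.argv[1]
    headers = build_header_map(root)
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith((".c", ".h")):
                fix_includes(os.path.join(dirpath, name), headers)

Note that the sketch assumes header basenames are unique across the tree; if two directories contain headers that differ only in case or location, you'll have to resolve those by hand.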
Well, I can only tell you that you need to change the case of those header files. I don't know of any way to make it automatic, but you can use cscope to do it in an easier way.
http://www.linux-tutorial.info/modules.php?name=ManPage&sec=1&manpage=cscope
You can mount the files on a case-insensitive file system. FAT comes to mind. ntfs-3g does not appear to support this.
I use the find-all and replace-all functionality of Source Insight when I have to do a complete replacement. Your problem seems quite big, but you can try replacing every header file name across all occurrences in the source files using the "Find All" + "Replace" functionality. You can use Notepad++ to do the same.
A long time ago there was a great tool under MPW (Macintosh Programmer's Workshop) called Canon. It was used to canonize text files, i.e. make all symbols found in a given reference list have the same usage of upper/lower case. This tool would be ideal for a task like this - I wonder if anything similar exists under Linux?
