Tool to Scan Code Comments, and convert to Standard Format [closed] - c

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I'm working on a C project that has seen many different authors and many different documentation styles.
I'm a big fan of doxygen and other documentation generations tools, and I would like to migrate this project to use one of these systems.
Is anybody aware of a tool that can scan source code comments for keywords like "Description", "Author", "File Name" and other sorts of context to intelligently convert comments to a standard format? If not, I suppose I could write a crazy script or convert manually.
Thanks

The only thing I can think of comes from O'Reilly's Lex & Yacc book: there is a section in chapter 2 that shows how to parse code for comments, both // and /*..*/, and print them out. The book's page links to example downloads; grab progs.zip, and the file you're looking for is ch2-09.l. It needs to be built with lex, but it can easily be modified to output just the comments. That output can then be used in a script to filter out 'Name', 'Description', etc.
I can post instructions here on how to do this if you're interested.
Edit: I think I have found what you are looking for, a prebuilt comment documentation extractor here.
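If building the lex example is more trouble than it's worth, a plain C filter gets you most of the way for block comments. Here is a minimal sketch of my own (not the ch2-09.l code) that prints every /*..*/ comment found on stdin; the output can then be piped into a script that looks for 'Name', 'Description' and so on. It deliberately ignores // comments and comment markers inside string literals.
#include <stdio.h>

/* Minimal sketch: print every block comment read from stdin.
   Not a full C lexer - it is only a starting point for feeding
   comments into a conversion script. */
int main(void)
{
    int c, prev = 0, in_comment = 0;

    while ((c = getchar()) != EOF) {
        if (!in_comment && prev == '/' && c == '*') {
            in_comment = 1;                 /* entering a comment */
            prev = 0;
            continue;
        }
        if (in_comment) {
            if (prev == '*' && c == '/') {
                in_comment = 0;             /* leaving the comment */
                putchar('\n');              /* separate comments */
                prev = 0;
                continue;
            }
            if (prev)
                putchar(prev);              /* echo comment text */
        }
        prev = c;
    }
    return 0;
}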

I think as tommieb75 suggests, a proper parser is the way to handle this.
I'd suggest looking at ANTLR, since it supports rewriting the token buffer in place, which I think would minimise what you have to do to preserve whitespace etc. - see chapter 9.7 of The Definitive ANTLR Reference.

If you have a relatively limited set of styles to parse, it would be fairly simple to write a Visual Studio macro (for use in the IDE) or a standalone application (for processing the source code 'offline') that will search a file for comments and then reformat them into a new style, using certain titles or tags to split them apart.
A shortcut that may help you is to use my AtomineerUtils Pro Documentation add-in. It can find and convert all the comments in a source file in one pass. Out of the box it parses XML Documentation, Doxygen, JavaDoc and Qt formats (or anything sufficiently close to them) and can then output the comment in any of those formats. It can also be configured to convert incompatible legacy comments. There are several options to aid conversion, but the most powerful one calls a Visual Studio macro with the comment text before it is parsed, allowing you to apply a bit of string processing to convert legacy comments into a format that AtomineerUtils can subsequently read. An example macro for one of the most commonly used legacy styles is supplied on the website, so it's usually pretty simple to modify it to cope with your legacy format, as long as the format is regular enough for a computer to parse.
The converted text need not be particularly tidy - once AtomineerUtils can extract the documentation entries, it will clean up the comments for you. It optionally applies word wrapping, consistent element ordering, spacing and so on automatically, ensures that the comment accurately describes the code element it documents (its entries match the params, typeparams, exceptions thrown, etc.), and then outputs a replacement comment in its configured format. This saves you a lot of work in the conversion macro to get things tidy - and once you have finished converting, you can continue to use the add-in to save time documenting your code and to ensure that all new comments keep the same style.
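To give an idea of the scale of that string processing, here is a rough standalone C sketch (not the Visual Studio macro from the website; the keyword table is made up and would need to match your project's actual comment headings) that maps legacy keywords onto Doxygen tags line by line:
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping from legacy comment keywords to Doxygen tags.
   Adjust the table to whatever headings your legacy comments use. */
static const struct { const char *legacy; const char *doxygen; } map[] = {
    { "Description:", "@brief"  },
    { "Author:",      "@author" },
    { "File Name:",   "@file"   },
};

int main(void)
{
    char line[1024];

    while (fgets(line, sizeof line, stdin)) {
        size_t i;
        for (i = 0; i < sizeof map / sizeof map[0]; i++) {
            char *p = strstr(line, map[i].legacy);
            if (p) {
                /* text before the keyword, the Doxygen tag, then the rest */
                fwrite(line, 1, (size_t)(p - line), stdout);
                fputs(map[i].doxygen, stdout);
                fputs(p + strlen(map[i].legacy), stdout);
                break;
            }
        }
        if (i == sizeof map / sizeof map[0])
            fputs(line, stdout);        /* no keyword: pass through */
    }
    return 0;
}
A filter like this only rewrites the keywords; the add-in (or whatever formatter you settle on) would still be responsible for tidying the surrounding comment markup, as described above.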

Related

Justification of BOM mark in file encoding [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
I want to be confident that using a BOM mark to indicate file encoding is absolutely needed, for the following reasons.
The information in a file should be self-contained, and we have never figured out a clear algorithm for identifying which encoding a file uses.
As for the compatibility issue with the shebang line, that should be fixed inside the scripting language, because encoding is a much higher-level concept than the shebang line.
Regarding the first claim: I have a hard time determining which encoding is right for a given file. Files therefore frequently end up with mismatched encodings, and I suspect most new developers run into this and simply ignore the weird characters that result from the differing encoding strategies.
I recognize that compatibility is an important aspect of software maintenance. However, I think an old rule that keeps confusing the system should be changed going forward.
Is there any thinking or movement toward making the BOM mark official? Or is there a critical reason we must not introduce the BOM mark (e.g. a clear algorithm for identifying a file's encoding already exists)?
My understanding comes from the following link, so additional links that might change my perspective would be very welcome.
What's the difference between UTF-8 and UTF-8 without BOM?
Thanks,
Your first assumption is wrong. We have protocols that define what a file (or a packet) contains and how to interpret the contents, and we should always keep metadata separate from data. You are effectively pushing to use the BOM as metadata describing the bytes that follow, but that is not enough. "It is text" is not very useful information on its own: we still need to understand and interpret what that text means. The most obvious example is whether U+0020 (white space) is interpreted as a printed character or as control data; HTML treats it as the latter (two white spaces are nothing special, nor is a space followed by a newline, except inside <pre>). And a file might be a mail message, a mailbox, a Markdown file, HTML, and so on. A BOM alone doesn't help; to cover your first point we would need to add more and more information, and at that point we have a general container format (metadata plus one or more pieces of data), which is no longer plain text, and it is not the BOM that helps us.
If you need a BOM, you have already lost the battle: the bytes that look like a BOM may not really be a BOM, but actual data in some other encoding. Two or three bytes are not enough to be unambiguous. The shebang, which is old, used four bytes, "#! /"; the space is no longer required, but in any case it is an old protocol from a time when files were not exchanged much and the path mattered (nobody executed random files, and if it wasn't a shebang it was an a.out file).
And you are discussing old problems. Everything is UTF-8 now, so there is no need for a BOM. Microsoft is just making things more complex: Unix, Linux and macOS managed a short transition without much pain (and without a "flag day"), and the web is UTF-8 by default too. Your question is about programming languages, and there UTF-8 is fine: their syntax uses ASCII, and what is inside strings doesn't matter that much. It is standard to treat strings and Unicode as opaque objects except in a few cases; otherwise you will get some Unicode detail wrong anyway (e.g. splitting combining characters, or splitting emoji in languages that work with UTF-16 code units).
UTF-16 is not something you write programs in. It may be used by APIs (fixed-length units may be, or may seem, better), or possibly for data, but usually not for source code.
And a BOM doesn't help unless you modify all scripts and programs (and if you are going to do that, you might as well just declare "everything is UTF-8"). It is not rare to find program sources in multiple encodings, even within the same file: the copyright notice (and author name) may have been copy-pasted with a simple editor in one encoding, the strings may be in another, and a few comments (and committer names) in yet another. Git and other tools work line by line, so they may happily insert lines in the wrong encoding: git has very little information to go on, and users often have an incorrect configuration. So you could break sources that were previously fine, just because the encoding problems were confined to comments.
Then a short comment on the second assumption, which is also problematic.
You want to split layers, but that is very problematic: some scripts contain binary data at the end, so the operating system should not try to transcode a script (and thus remove the BOM), because the first part may be plain text while another part requires exactly the original bytes. (Some Unicode test files are in this category too: they are text, possibly containing deliberately invalid sequences.)
Just use UTF-8 without a BOM and everything becomes much simpler.
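For what it's worth, a reader that wants to tolerate files produced by tools that do add a BOM can detect and skip it in a few lines; a minimal C sketch:
#include <stdio.h>

/* Skip a UTF-8 BOM (EF BB BF) at the start of a stream, if present.
   Minimal sketch; assumes the stream is seekable. */
static void skip_utf8_bom(FILE *fp)
{
    unsigned char bom[3];

    if (fread(bom, 1, 3, fp) == 3 &&
        bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
        return;         /* BOM found and consumed */

    rewind(fp);         /* no BOM: start again at the first byte */
}
Readers can afford to be tolerant like this; writers, as argued above, are better off simply not emitting the BOM at all.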

Tool to convert (translate) C to Go? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
What tool to use to convert C source code into Go source code?
For example, if the C code contains:
struct Node {
    struct Node *left, *right;
    void *data;
};

char charAt(char *s, int i) {
    return s[i];
}
the corresponding Go code generated by the tool should be:
type Node struct {
    left, right *Node
    data        interface{}
}

func charAt(s string, i int) byte {
    return s[i]
}
The tool does not need to be perfect. It is OK if some parts of the generated Go code need to be corrected by hand.
rsc created github.com/rsc/c2go to convert the C-based Go compiler into Go.
As an external example, akavel seems to be trying to use it to create a Go-based Lua: github.com/akavel/goluago/
github.com/xyproto/c2go is another project, but it hasn't been touched in a little while.
I guess no such (C to Go source code conversion) tool exists today. You might consider making your own converter. The question becomes: is it worth it, and how would you do it?
It might not be worth the effort, because Go and C can interoperate to some degree. For example, if you use GCC 4.6 (or the to-be-released 4.7, i.e. the latest snapshot) you can probably link C and Go code together, with some care.
Of course, as usual, the devil is in the details.
If you want a converter, do you want the resulting Go code to be readable and editable? Then the task is more difficult, since you want to keep the structure of the code and also keep the comments. In that case you probably need your own C parser (and that is a difficult task).
If you don't care about the readability of the generated Go code, you could instead extend an existing compiler to do the work. For example, GCC is extensible through plugins or through MELT extensions, and you could customize GCC (with MELT, or with your own C plugin for GCC) to transform the GIMPLE representation (the main internal representation of instructions inside GCC) into unreadable Go code. This is somewhat simpler, but still requires more than a week of work.
Of course, Go interfaces, channels and even memory management (garbage-collected memory) have no standard C counterpart.
Check out this project
https://github.com/elliotchance/c2go
The detailed description is in this article
Update: August 6, 2021
Also check this one
https://github.com/gotranspile/cxgo
I'm almost sure there is no such tool, but IMHO in every language it's good to write in that language's own "coding style".
Remember how much we all loved C preprocessor tricks and really artistic work with pointers? Remember how much care it took to deal with malloc/free or with threads?
Go is different. You have no preprocessor, but you have closures, objects with methods, interfaces, garbage collector, slices, goroutines and many other nice features.
So why convert code instead of rewriting it in a much better and cleaner way?
Of course, I hope you don't have 1000K lines of C code that you have to port to Go :)
Take a look at SWIG (http://www.swig.org/Doc2.0/Go.html); it will translate the C/C++ headers to Go and wrap them as a starting point. Then you can port parts over bit by bit.
As far as I know, such a tool does not exist (yet), so you're bound to convert your C code to Go by hand.
I don't know how complex the C code you want to convert is, but keep in mind that Go has a "special" way of doing things, like the use of interfaces and channels.

C XML library for Embedded Systems [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I'm working on a project for an embedded system that's using XML for getting data into and out of the system. I don't want the XML handling to devolve into a bunch of bits that build XML strings using snprintf()/strcat() and friends or parse XML by counting "<" and ">" characters.
I've found several XML libraries, a couple of which might even be small enough, but the closest they come to C is C++, which is not in the cards for this system. I'm hoping I can find an XML library that meets the following constraints:
C source code
no dynamic memory allocation
cheap. Free is better, but copyleft won't do the trick.
It doesn't have to be a full parser - I just want to be able to pull text out of nested elements and have a reasonably simple way to generate XML that doesn't rely on format strings. Attributes aren't being used (yet), so the library doesn't even need to support them. The XML documents will be pretty small, so something DOM-like would be fine, as long as it works with client-provided buffers (parsing the raw XML in-place would be nice).
PugXML and TinyXML look to be pretty close, but I'm hoping that someone out there knows about an XML lib tailored just for C-based embedded systems that my googling is missing.
I don't know about dynamic memory allocation, but a standard C XML parser is expat, which is the underlying library for a number of parsers out there.
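Expat's push-parser API is quite small. Here is a minimal sketch of pulling character data out of elements with it; note that stock Expat allocates memory internally, so it may not meet the no-dynamic-allocation requirement unless you supply a custom memory-handling suite.
#include <stdio.h>
#include <expat.h>

/* Minimal Expat sketch: print the character data of each element. */
static void XMLCALL on_start(void *user, const XML_Char *name,
                             const XML_Char **atts)
{
    (void)user; (void)atts;
    printf("<%s>: ", name);
}

static void XMLCALL on_chars(void *user, const XML_Char *s, int len)
{
    (void)user;
    printf("%.*s\n", len, s);
}

int main(void)
{
    static const char doc[] =
        "<cfg><name>board-1</name><rate>9600</rate></cfg>";
    XML_Parser p = XML_ParserCreate(NULL);

    XML_SetElementHandler(p, on_start, NULL);
    XML_SetCharacterDataHandler(p, on_chars);

    if (XML_Parse(p, doc, sizeof doc - 1, 1) == XML_STATUS_ERROR)
        fprintf(stderr, "parse error: %s\n",
                XML_ErrorString(XML_GetErrorCode(p)));

    XML_ParserFree(p);
    return 0;
}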
I am not sure but perhaps Mini-XML: Lightweight XML Library will help you:
Mini-XML only requires an ANSI C compatible compiler.
It is freeware.
Its binary size is around 100k.
It parses the whole XML file and then stores all the info in a linked list.
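If Mini-XML looks like a fit, usage is roughly as follows. This is a sketch only - check the Mini-XML documentation for the exact load-callback and accessor names in the version you end up with, and note that it allocates the node tree on the heap, which matters for the no-dynamic-allocation constraint.
#include <stdio.h>
#include <mxml.h>

int main(void)
{
    FILE *fp = fopen("config.xml", "r");
    if (!fp)
        return 1;

    /* Load the whole document into a node tree (heap allocated). */
    mxml_node_t *tree = mxmlLoadFile(NULL, fp, MXML_OPAQUE_CALLBACK);
    fclose(fp);

    /* Find a <name> element anywhere under the root and read its text. */
    mxml_node_t *node = mxmlFindElement(tree, tree, "name",
                                        NULL, NULL, MXML_DESCEND);
    if (node)
        printf("name = %s\n", mxmlGetOpaque(node));

    mxmlDelete(tree);
    return 0;
}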
You could use an ASN.1 XER encoder; there's a free one at http://lionet.info/asn1c/
You could use the one from Gnome.
I have written sxmlc to be as simple as possible, and I know people use it in routers to perform in-place parsing of web queries.
Unfortunately (and I'm 5 years late...) it does use memory allocation, though kept to a minimum: one buffer to read each "XML line" (what lies between < and >, sorry ;)), and many small buffers to keep track of the tag name, attribute names and values, and text (though I always wanted to use char[16] or so for those).
And it makes use of strdup/strcpy and such.
As I want anybody to be able to use it freely, the licence is BSD.
The Xerces-C library would be optimal to use in this scenario.
If the XML is going to be pretty small, why not generate it programmatically using sprintf and friends, and use string-extraction functions to parse it? But as mentioned earlier, if it gets a little bigger, I would suggest using the Xerces-C library, as it is open source.

Cross-platform editor control [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I need a cross-platform editor control to use as the GUI part of an in-house tool. The control may be commercial, but at a reasonable price.
Required features:
Platforms: Win32, OS X, Linux
UTF-8 support
Fine-grained run-time control over the text style (or at least color)
Nice low-level plain C API without the usual horrible bloat
Should not prevent me from having these features (even if I have to implement them myself):
Undo / Redo
Copy / Paste
Context menu, depending on click position in text
Toolbar, depending on cursor position in text
Sidebar panel, depending on cursor position in text
Actually, the above requires not just a simple control but a whole cross-platform GUI library.
Discarded options:
Scintilla and descendants
FLTK
Fox-toolkit
gtksourceview
Update:
Note: I've slipped in some half-written discard reasoning here, I apologize. Scintilla does indeed work on OS X. However, if I understand correctly, Scintilla's API is C++.
Use-case:
My use case is to write a custom "semi-rigid" logic editor, where the user is free to copy-paste around, add comments where he wishes, and even type in text directly if he wants. But the text structure is a rigid natural-language representation of a logic tree (somewhat AST-like in nature). I plan to write something IntelliSense-like (or code-template-like) to be used as the main authoring tool, instead of typing the logic by hand.
BTW, the storage format would not be plain text, but the internal representation of the aforementioned logic tree (with comments, whitespace and other meta-info).
So I have all the information necessary to render the text in the needed colors myself; I do not need any external lexers.
As John wrote, Scintilla is known to run on OS X.
Now, it is not a rich text component, if that's what you are looking for. It is a source code editor: you can't apply arbitrary colors to arbitrary segments of text; it uses a lexer to style the content.
You didn't tell us what your use case is.
[EDIT] Thanks for adding the use case.
Disclaimer 1: I don't try to "sell" Scintilla, I just try to provide you information about a component I know well, hoping that helps you... :-D
Note that the Related Sites page lists a number of alternative Editing Components which may be interesting (or not; a lot of them are Win32-only).
Disclaimer 2: I have no experience of using Scintilla outside of the Win32 platform.
But looking at the source tree, I see a scintilla/macosx folder. Among other things, it has a SciTest sub-folder with a main.cpp file. Despite its extension, it looks very much like pure C to me, so it can serve as an example of how to use Scintilla from C.
Note that by design the Scintilla API is very limited: it was originally meant to be used like most traditional Win32 components, by sending messages to it. The Scintilla Documentation page simply lists these messages and their parameters. The main.cpp example creates the window with the component in Mac OS X style and sends commands with lines like scintilla->WndProc(SCI_STYLESETFORE, 0, 0x808080);
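On Win32 the same message-based API is reachable from plain C via SendMessage; a rough sketch, assuming the Scintilla DLL has already been loaded with LoadLibrary so the "Scintilla" window class is registered (the SCI_* constants come from Scintilla.h):
#include <windows.h>
#include <Scintilla.h>

/* Sketch: drive a Scintilla window from plain C through its message API. */
void setup_editor(HWND parent)
{
    HWND sci = CreateWindowEx(0, TEXT("Scintilla"), TEXT(""),
                              WS_CHILD | WS_VISIBLE,
                              0, 0, 600, 400,
                              parent, NULL, GetModuleHandle(NULL), NULL);

    SendMessage(sci, SCI_SETCODEPAGE, SC_CP_UTF8, 0);          /* UTF-8 text */
    SendMessage(sci, SCI_SETTEXT, 0, (LPARAM)"hello, world");
    SendMessage(sci, SCI_STYLESETFORE, STYLE_DEFAULT, RGB(0x80, 0x80, 0x80));
}
The Mac OS X adaptation wraps the same messages behind the WndProc call shown above.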
I won't claim it does everything you need, or even that it works flawlessly on Mac OS X; you'll have to experiment (or ask the author of the adaptation) to be sure.
Also, Scintilla provides neither a toolbar nor a sidebar panel (those belong more to the application itself), but I think it can send enough notifications to help you keep these side components in sync with the current context.
You will also need to write a specific lexer (C++ here) for your syntax. It isn't hard if you look at how other lexers work, and perhaps you will find one for a language close enough to be used as a starting point.
Perhaps also of interest is the feature to mark some portions of the document as read-only, although I believe this hasn't been thoroughly tested.
HTH.
Scintilla and descendants (no OS X)
But Scintilla does work on OS X.
You could try GTK+ with GtkTextView, or Qt's QTextEdit.
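If GTK+ is acceptable after all, GtkTextView already gives fine-grained run-time control over text style through tags, from a plain C API. A small sketch (a fragment only; it assumes gtk_init has been called and the widget is packed into a window elsewhere, and the tag name and sample text are made up):
#include <gtk/gtk.h>

/* Sketch: colour an arbitrary range of a GtkTextView using a tag. */
static GtkWidget *make_editor(void)
{
    GtkWidget *view = gtk_text_view_new();
    GtkTextBuffer *buf = gtk_text_view_get_buffer(GTK_TEXT_VIEW(view));
    GtkTextIter start, end;

    gtk_text_buffer_set_text(buf, "IF pressure > limit THEN open_valve", -1);

    /* Define a named style once, then apply it to any range at run time. */
    gtk_text_buffer_create_tag(buf, "keyword", "foreground", "blue", NULL);

    gtk_text_buffer_get_iter_at_offset(buf, &start, 0);
    gtk_text_buffer_get_iter_at_offset(buf, &end, 2);   /* the leading "IF" */
    gtk_text_buffer_apply_tag_by_name(buf, "keyword", &start, &end);

    return view;
}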
FLTK's TextEditor widget is all you need. It is simple, straightforward and easy to use, has UTF-8 support, and you can easily apply text styles. With just a few lines you can have an editor; check the /test/editor.cxx example. It works perfectly on OS X as well. Everything you need is explained here: http://www.fltk.org/doc-1.1/editor.html
Well, you might be able to use Tk -- the text widget is supposedly good and flexible -- have a look at the Tcl/Tk wiki.
Or you could go for some embedded/game toolkit (like Agar) -- but there a text widget with editing capabilities would be more cumbersome, I imagine.
But saying you want to do a cross-platform C GUI and then writing off GTK seems like a whole lot of wasted time and effort, to me. You'll probably end up switching languages or using GTK.

General Binary Data Viewer for Windows Vista [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I'm looking for recommendations for a good program for 32-bit Windows Vista that will load any arbitrary binary file and display textual information or a graphical visualization relevant to identifying what actual data the bits are supposed to represent. Is there anything better than a hex editor for this kind of thing?
One thing I'd like to do is say, look at the non-visible data in a Spore PNG file to get a clue as to what's actually being stored in there. Right now I'm using WordPad and all I get is something that looks like this:
‰PNG
IHDR ¢
/Qã!$D4"Ž‚îvÚ°‰ÅØàïjÃÞÉ_{!…‡ú 9¥Ý´îÁ6 ‰ms ^
I guess what I'm looking for is a souped-up hex editor that acts more like an Excel for bits, so I can slice and dice statistical patterns to get a better idea of what the bits might be doing.
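Short of a dedicated tool, even a tiny hex dump beats WordPad for a first look at something like that Spore PNG; a minimal C sketch:
#include <stdio.h>

/* Minimal hex dump: offset, 16 bytes in hex, then the printable ASCII. */
int main(int argc, char **argv)
{
    FILE *fp = argc > 1 ? fopen(argv[1], "rb") : NULL;
    unsigned char buf[16];
    size_t n, i;
    long off = 0;

    if (!fp)
        return 1;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        printf("%08lx  ", off);
        for (i = 0; i < 16; i++) {
            if (i < n)
                printf("%02x ", buf[i]);
            else
                printf("   ");
        }
        printf(" ");
        for (i = 0; i < n; i++)
            putchar(buf[i] >= 0x20 && buf[i] < 0x7f ? buf[i] : '.');
        putchar('\n');
        off += (long)n;
    }
    fclose(fp);
    return 0;
}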
Try HxD:
HxD is a carefully designed and fast hex editor which, additionally to raw disk editing and modifying of main memory (RAM), handles files of any size.
The easy to use interface offers features such as searching and replacing, exporting, checksums/digests, insertion of byte patterns, a file shredder, concatenation or splitting of files, statistics and much more.
I like xvi32, although it seems similar to the above - I've found it to be fairly fast even for big files.
What you probably want is a hex editor. The PSPad text editor has a pretty good hex-editing mode.
I use HHD's HexEditor - it's free!
I use frhed and vim (with its convert-to-hex mode, though that can be slow for big files).
Do you mean something that detects a set of known file formats and knows how to display them? Otherwise, a hex editor (PSPad contains one, for example) is the best thing you can wish for. It's just bits, which can mean anything.
I use Total Commander's built-in Lister (file viewer). It can show data as text or hex, and it can show images. There are also many plugins, ranging from code editors/viewers to image viewers and more.
Plugins I use are:
imagine for viewing images
fileinfo for displaying info of executables
HTML viewer
Syn2 - code viewer / editor with syntax highlighting
There are a lot of plugins listed on the author's web page; another good source is www.totalcmd.net
I've used Hex Workshop before; it has a "Find Strings" option in the Tools menu. Not free, but it works great.
