A newbie question regarding making an executable program [closed] - c

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
I’m a student of another discipline. I would like to make an executable program on my own. Suppose the program I would like to make is a small dictionary with a few thousand words. In the hope of creating such a program on my Windows XP machine, I obtained a compiler called Borland C++ v5.02. I downloaded some books on the C language, like The C Programming Language (2nd Edition) by Brian W. Kernighan and Dennis M. Ritchie; Sams Teach Yourself C in 24 Hours; and Programming with C (2nd Edition) by Byron S. Gottfried. I started reading those books, but I soon found there were no instructions on how to make such a program, or I was unable to work it out from those huge volumes. I’m hoping for some instruction from you all on how I should proceed. Please leave some comments to help me create this type of program.

C is not the friendliest language to learn on your own if you have no notions of computer architecture.
Maybe something like Python is more suitable?
I'm sure you can download lots of Python books :-)

Welcome to programming. :)
It might be easier to think of your problem in small pieces:
How will you store your dictionary?
Plain text, any order -- simple to work with, slower
Plain text, sorted -- requires sorting the word list (easy with a sort utility, but you or your users have to remember to sort the list); faster
A binary version (say, 32-byte 'records' for words): much harder to edit, but fast lookups
A binary encoding of a tree structure, where child nodes are the allowed character transitions: requires tools to create, very fast lookups
Clever hashing (e.g. Bloom filters, http://en.wikipedia.org/wiki/Bloom_filters): can be very fast indeed
Once you've picked the storage (I suggest plain text, any order, as a good starting point) you'll need to figure out the algorithm:
For a given word, compare it against every single word in the file
So you'll need to read every line in the file, one at a time (fgets in a loop)
Compare the word with the line (strcmp)
return 1 if found
return 0 if you reach the end of the file
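A minimal sketch of that lookup routine, assuming the dictionary file has one word per line (the function name is just for illustration):

#include <stdio.h>
#include <string.h>

/* Return 1 if `word` appears (one word per line) in the open dictionary
   file, 0 otherwise. Rewinds the file so it can be called repeatedly. */
int word_in_dictionary(FILE *dict, const char *word)
{
    char line[128];

    rewind(dict);
    while (fgets(line, sizeof line, dict) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the newline fgets keeps */
        if (strcmp(line, word) == 0)
            return 1;                          /* found */
    }
    return 0;                                  /* reached end of file */
}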
Now, iterate this, once for each word in the input:
read in a line (fgets)
tokenize the string into words (strtok)
strip off punctuation (or ignore? or ...)
pass the word to the routine you wrote earlier
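And the outer loop tying it together (a sketch; word_in_dictionary is the routine above, and punctuation is handled the crude way, by treating it as a delimiter):

#include <stdio.h>
#include <string.h>

int word_in_dictionary(FILE *dict, const char *word);   /* from the sketch above */

void check_document(FILE *doc, FILE *dict)
{
    char line[1024];
    const char *delims = " \t\r\n.,;:!?\"()";

    while (fgets(line, sizeof line, doc) != NULL) {
        /* split the line into words on whitespace and punctuation */
        for (char *w = strtok(line, delims); w != NULL; w = strtok(NULL, delims)) {
            if (!word_in_dictionary(dict, w))
                printf("not found: %s\n", w);
        }
    }
}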
This is an awful dictionary program: for a dictionary of 100,000 words (my /usr/share/dict/words is 98,000, and I think the wordlist on OpenBSD systems is in the 150,000 range) and a document of 5,000 words, you'll run your inner loop roughly 250,000,000 times. That'll be slow even on fast machines. Which is why I mentioned all those much-more-complex data structures earlier -- if you want it fast, the naive approach won't do.
If you sort the word list and binary-search it, it'll be roughly 83,000 comparisons in your inner loop -- about 17 per lookup.
(And now a small diversion: the look program supports a -b option to ask for a binary search; without the -b, it runs a linear search:
            'help'    'universe'    'apple'
  linear    .040s     .054s         .058s
  binary    .001s     .001s         .001s
In other words, if you're going to be doing 5,000 of these, sorted word list will give you much faster run times.)
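If the sorted word list is loaded into memory as an array of strings, the C library's bsearch gives you that binary search directly -- a minimal sketch, assuming the list fits in memory:

#include <stdlib.h>
#include <string.h>

/* bsearch comparator: the key is a char *, each array element is a char * */
static int cmp_word(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* `words` must already be sorted in strcmp order */
int word_in_sorted_list(const char *word, const char *const *words, size_t nwords)
{
    return bsearch(word, words, nwords, sizeof *words, cmp_word) != NULL;
}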
If you build a finite-state machine (the tree structure), it'll be as many comparisons as your input word has characters, times 5000. That'll be a huge savings.
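A sketch of that lookup, with one node per character transition (the structure is illustrative, and the code that builds the tree is omitted):

/* One node per prefix; child[c] is the node reached by appending character c */
struct trie_node {
    struct trie_node *child[26];   /* 'a'..'z' only, to keep the sketch short */
    int is_word;                   /* nonzero if the path to this node spells a word */
};

int trie_contains(const struct trie_node *root, const char *word)
{
    const struct trie_node *node = root;

    for (; *word != '\0'; word++) {
        int c = *word - 'a';                   /* assumes lower-case input */
        if (c < 0 || c >= 26 || node->child[c] == NULL)
            return 0;
        node = node->child[c];
    }
    return node->is_word;
}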
If you build the Bloom filter, it'll be computing one or two hashes (which is some simple arithmetic on your characters, very quick) and then one or two lookups. VERY fast.
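A bare-bones Bloom filter has roughly this shape (the two hash functions here are arbitrary illustrative choices, and remember a Bloom filter can report false positives, so a hit only means "probably present"):

#include <stdint.h>

#define BLOOM_BITS (1u << 20)                 /* ~1 Mbit filter */
static uint8_t bloom[BLOOM_BITS / 8];

/* two cheap string hashes (djb2 and sdbm) */
static uint32_t hash1(const char *s) { uint32_t h = 5381; while (*s) h = h * 33 + (uint8_t)*s++; return h; }
static uint32_t hash2(const char *s) { uint32_t h = 0;    while (*s) h = (uint8_t)*s++ + (h << 6) + (h << 16) - h; return h; }

void bloom_add(const char *word)
{
    uint32_t a = hash1(word) % BLOOM_BITS, b = hash2(word) % BLOOM_BITS;
    bloom[a / 8] |= 1u << (a % 8);
    bloom[b / 8] |= 1u << (b % 8);
}

int bloom_maybe_contains(const char *word)    /* 0 = definitely absent, 1 = probably present */
{
    uint32_t a = hash1(word) % BLOOM_BITS, b = hash2(word) % BLOOM_BITS;
    return ((bloom[a / 8] >> (a % 8)) & 1) && ((bloom[b / 8] >> (b % 8)) & 1);
}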
I hope this is helpful, at least the simpler versions shouldn't be hard to implement.

Programming is not the easiest thing to do, despite what some people think. It takes a lot of time to learn and master, so if you are really considering creating your own application, a lot of patience and time will be required.
If you want to use this application of yours to learn programming, then I'd suggest finding some tutorials for a particular language and digging in.
If it's something you need for your main discipline, maybe you could hire somebody to create such an app for you, browse sourceforge.net for a similar solution, or find a commercial alternative.
And yes, it's hard in the beginning :)

If you want to learn C++, you can use Microsoft Visual C++ Express, which is free. Creating an executable program is a pretty straightforward task. Look at the Visual C++ Guided Tour.

Related

Best way to identify system library commands in Lexer/Bison

I'm writing an interpreter for a new programming language. The language's syntax is very simple, and the "system library" commands are treated as simple identifiers (there is no special construct; each is a function like everything else, only pre-defined internally). And no, this is not yet another of the million Lisps out there.
The question is:
Should I have the Lexer catch them, or should I do it in the AST-construction code?
What I've done so far:
I tried recognizing all of them in my Lexer script, and there are a lot of them already -- over 200. I send the same token back to Bison (SYSTEM_CMD), only with a different value (basically a numeric index pointing into the array of system commands where they are all stored).
As an approach, I think this makes it much faster than having to look up every single one of them in a hash table to see if it's a system command.
The thing is, the Lexer is getting quite huge (in terms of the resulting binary file size, I mean) rather fast. And I obviously don't like that.
Given that my focus is something both lightning-fast (I'm already quite good with that) and small enough to be embedded, what would be the most recommended approach?
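For comparison, the table-lookup alternative mentioned above doesn't have to be slow or large: a single identifier rule in the lexer can bsearch a sorted name table and return SYSTEM_CMD with the index as its value. A rough sketch, with made-up command names:

#include <stdlib.h>
#include <string.h>

/* Sorted table of system-command names; the index doubles as the token value. */
static const char *const system_cmds[] = {
    "abs", "concat", "print", "sqrt",   /* ...the full ~200 entries, kept sorted... */
};
#define NUM_SYSTEM_CMDS (sizeof system_cmds / sizeof system_cmds[0])

static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the command's index, or -1 if the identifier is not a system command. */
int system_cmd_index(const char *ident)
{
    const char *const *hit = bsearch(ident, system_cmds, NUM_SYSTEM_CMDS,
                                     sizeof *system_cmds, cmp_name);
    return hit ? (int)(hit - system_cmds) : -1;
}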

Tool to convert (translate) C to Go? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
What tool to use to convert C source code into Go source code?
For example, if the C code contains:
struct Node {
    struct Node *left, *right;
    void *data;
};

char charAt(char *s, int i) {
    return s[i];
}
the corresponding Go code generated by the tool should be:
type Node struct {
    left, right *Node
    data        interface{}
}

func charAt(s string, i int) byte {
    return s[i]
}
The tool does not need to be perfect. It is OK if some parts of the generated Go code need to be corrected by hand.
rsc created github.com/rsc/c2go to convert the C-based Go compiler into Go.
As an external example, akavel seems to be trying to use it to create a Go based lua: github.com/akavel/goluago/
github.com/xyproto/c2go is another project, but it hasn't been touched in a while.
I guess no such tool (for C to Go source code conversion) exists today. You might consider making your own converter. The question becomes: is it worth it, and how would you do it?
It might not be worth the effort, because Go and C can interoperate to some degree. For example, if you use GCC 4.6 (or the soon-to-be-released 4.7, i.e. the latest snapshot), you can probably link C and Go code together, with some care.
Of course, as usual, the devil is in the details.
If you want a converter, do you want the resulting Go code to be readable and editable? Then the task is harder, since you want to keep the structure of the code and also keep the comments. In that case, you probably need your own C parser (and that is a difficult task).
If you don't care about the readability of the generated Go code, you could, for example, extend an existing compiler to do the work. For example, GCC is extensible through plugins or through MELT extensions, and you could customize GCC (with MELT, or your own C plugin for GCC) to transform the GIMPLE representation (the main internal representation for instructions inside GCC) into unreadable Go code. This is somewhat simpler (but would still require more than a week of work).
Of course, Go interfaces, channels and even memory management (garbage-collected memory) have no standard C counterparts.
Check out this project
https://github.com/elliotchance/c2go
The detailed description is in this article
Update: August 6, 2021
Also check this one
https://github.com/gotranspile/cxgo
I'm almost sure there is no such tool, but IMHO in every language it's good to write in that language's own coding style.
Remember how much we all loved C preprocessor tricks and really artistic work with pointers? Remember how much care it took to deal with malloc/free or with threads?
Go is different. You have no preprocessor, but you have closures, objects with methods, interfaces, garbage collector, slices, goroutines and many other nice features.
So why convert the code instead of rewriting it in a much better and cleaner way?
Of course, I hope you don't have a million lines of C code that you have to port to Go :)
Take a look at SWIG (http://www.swig.org/Doc2.0/Go.html); it will translate the C/C++ headers to Go and wrap them as a starting point. Then you can port parts over bit by bit.
As far as I know, such a tool does not exist (yet), so you're bound to convert your C code to Go by hand.
I don't know how complex the C code you want to convert is, but you might want to keep in mind that Go has its own way of doing things, like the use of interfaces and channels.

What are AST, CFG and Clang, and how can we use them in a dead-code removal algorithm? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
I am about to write a dead-code removal algorithm in C for an online event with our team.
The requirements are:
Read a C program source file, which contains many forms of dead code.
The output should be a file that is free of all dead code.
While surfing the internet, we came across these SO links:
How can I know which parts in the code are never used?
Dead code detection in legacy C/C++ project
Before seeing these links, we had this basic idea:
Read the input C file line by line using a normal file stream and store the lines in an array of strings.
Then analyze those strings and detect the very basic dead code, like if(0) and if(1), etc.
Maintain a stack for matching parentheses. And so on...
But this has a big problem: the idea leads us to do more string manipulation than actual dead-code removal.
After seeing those links, we came to know about the Clang library, the Abstract Syntax Tree, the Control Flow Graph, etc.
But we are complete newbies to those libraries and concepts.
We came to know that they are used to parse C code.
Hence we need some basic ideas about the AST and CFG, and some basic guidance explaining how we can use them in our code.
Can we include the Clang library like a normal library, such as math.h?
Where can we download that library?
Can we use the Clang libraries on Windows?
I can explain to you the concept of control flow graphs, but I am not familiar with the library itself.
The concept is simple. Imagine any sequential lines of code (that is, without if, goto, function calls or labels) as one node of a graph. Every goto or function call creates a directed link from the current node to the node where the goto label is, or to the function it is calling. Remember that a function itself could be a graph and not a simple node, because it may have ifs or other function calls inside. Each function call also creates a directed link from the leaf nodes of the function (where the function returns) to the node containing the code right after the function call. (That can create a lot of links going out of the function's graph, because the function can be called in many parts of the code.)
Likewise, if you have an if, you have two directed links from the current node, one to the if part and one to the else part of the statement (unless you detect if(0) or if(1), like you said, in which case there is only one link, to the proper location).
The root of your graph is the entry point of main. Now, to find dead code, simply traverse the graph from the root (using DFS or BFS, for example) and at the end see which nodes were NOT visited. Those are the dead code: places that, no matter what path your program takes, it will never reach.
If you want to implement this yourself, you can take a recursive approach (similar to parsing the code, but simpler). For example, when you see an if, you would do something like:
#include <stdlib.h>

typedef char *line;

/* One node of the control flow graph (this type was implied above) */
typedef struct FlowGraph {
    struct FlowGraph **flow_to;   /* nodes control can flow to from here */
    int flow_to_count;
    int visited;                  /* used later when traversing the graph */
} FlowGraph;

FlowGraph *get_flow_graph(line *code)
{
    FlowGraph *current_node = malloc(sizeof *current_node);
    current_node->flow_to = malloc(some_maximum * sizeof *current_node->flow_to);
    current_node->flow_to_count = 0;
    current_node->visited = 0;
    ...
    if (is_if_statement(code[0]))
    {
        FlowGraph *if_part = get_flow_graph(code + 1);
        FlowGraph *else_part = get_flow_graph(code + find_matching_else(code));

        current_node->flow_to[current_node->flow_to_count++] = if_part;
        current_node->flow_to[current_node->flow_to_count++] = else_part;
    }
    else
        ...
}
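Once the graph is built, finding dead code is just a reachability pass from the root node: anything you never visit is dead. A sketch, using the visited flag from the structure above:

/* Mark every node reachable from `node` (depth-first) */
void mark_reachable(FlowGraph *node)
{
    if (node == NULL || node->visited)
        return;
    node->visited = 1;
    for (int i = 0; i < node->flow_to_count; i++)
        mark_reachable(node->flow_to[i]);
}

/* After mark_reachable(root), any node in `all_nodes` that is still unvisited
   corresponds to code the program can never execute. */
int count_dead_nodes(FlowGraph **all_nodes, int n)
{
    int dead = 0;
    for (int i = 0; i < n; i++)
        if (!all_nodes[i]->visited)
            dead++;
    return dead;
}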
You can see examples of control and data flow graphs extracted by the DMS Software Reengineering Toolkit.
We have done this on very large applications (26 million lines of C) using DMS's data flow analysis machinery and its C Front End, including a points-to analysis, which is a practical necessity if you really want to find dead functions in a large C system.

Parsing: library functions, FSM, explode() or lex/yacc?

When I have to parse text (e.g. config files or other rather simple/descriptive languages), there are several solutions that come to my mind:
using library functions, e.g. strtok(), sscanf()
a finite state machine which processes one char at a time, tokenizing and parsing
using the explode() function I once wrote out of pure boredom
using lex/yacc (read: flex/bison) to generate an appropriate parser
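To make the comparison concrete, the library-functions approach on a hypothetical key = value config file looks roughly like this:

#include <stdio.h>

/* Sketch only: parse lines of the form "key = value" with sscanf. */
void parse_config(FILE *fp)
{
    char line[256], key[64], value[128];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] == '#' || line[0] == '\n')
            continue;                                   /* skip comments and blank lines */
        if (sscanf(line, " %63[^= ] = %127[^\n]", key, value) == 2)
            printf("key='%s' value='%s'\n", key, value);
        else
            fprintf(stderr, "unparsable line: %s", line);
    }
}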
I don't like the "library functions" approach. It feels clumsy and awkward. explode(), while it doesn't take much new code, feels even more blown up. And flex/bison often seems like sheer overkill.
I usually implement a FSM, but at the same time I already feel sorry for the poor guy that may have to maintain my code at a later point.
Hence my question:
What is the best way to parse relatively simple text files?
Does it matter at all?
Is there a commonly agreed-upon approach?
I'm going to break the rules a bit and answer your questions out of order.
Is there a commonly agreed-upon approach?
Absolutely not. IMHO the solution you choose should depend on (to name a few) your text, your timeframe, your experience, even your personality. If the text is simple enough to make flex and bison overkill, maybe C is itself overkill. Is it more important to be fast, or robust? Does it need to be maintained, or can it start quick and dirty? Are you a passionate C user, or can you be enticed away with the right language features? &c., &c.
Does it matter at all?
Again, this is something only you can answer. If you're working closely with a team of people, with particular skills and abilities, and the parser is important and needs to be maintained, it sure does matter! If you're writing something "out of pure boredom," I would suggest that it doesn't matter at all, no. :-)
What is the best way to parse relatively simple text files?
Well, I don't know that you're going to like my answer. Maybe first read some of the other fine answers here.
No, really, go ahead. I'll wait.
Ah, you're back and relaxed. Let's ease into things, shall we?
Never write it in 'C' if you can do it in 'awk';
Never do it in 'awk' if 'sed' can handle it;
Never use 'sed' when 'tr' can do the job;
Never invoke 'tr' when 'cat' is sufficient;
Avoid using 'cat' whenever possible.
-- Taylor's Laws of Programming
If you're writing it in C, but C feels like the wrong tool...it really might be the wrong tool. awk or perl will likely do what you're trying to do without all the aggravation. You may even be able to do it with cut or something similar.
On the other hand, if you're writing it in C, you probably have a good reason to write it in C. Maybe your parser is a tiny part of a much larger system, which, for the sake of argument, is embedded, in a refrigerator, on the moon. Or maybe you loooove C. You may even hate awk and perl, heaven forfend.
If you don't hate awk and perl, you may want to embed them into your C program. This is doable, in principle--I've never done it myself. For awk, try libmawk. For perl, there are probably a few ways (TMTOWTDI). You can run perl separately using popen to start it, or you can actually embed a Perl interpreter into your C program--see man perlembed.
Anyhow, as I've said, "the best way to parse" entirely depends on you and your team, the problem space, and your approach to the issue. What I can offer is my opinion.
I'm going to assume that in your C-only solutions (library functions and FSM (considering your explode to essentially be a library function)) you've already done your best at isolating the relevant code, designing the code and files well, and so forth.
Even so, I'm going to recommend lex and yacc.
Library functions feel "clumsy and awkward." A state machine seems unmaintainable. But you say that lex and yacc feel like overkill.
I think you should approach your complaints differently. What you're really doing is specifying a FSM. However, you're also hiring someone to write and maintain it for you, thereby solving most of the maintainability problem. Overkill? Did I mention they'll work for free?
I suspect, but do not know, that the reason lex and yacc originally felt like overkill was that your config / simple files just felt too, well, simple. If I'm right (a big if), you may be able to do most of your work in the lexer. (It's even conceivable that you can do all of your work in the lexer, but I know nothing about your input.) If your input is not only simple but widespread, you may be able to find a lexer/parser combination freely available for what you need.
In short: if you can do this not in C, try something else. If you want C, use lex and yacc--they have a little overhead, but they're a very good solution.
If you can get it to work, I'd go with an FSM, but with a huge assist from Perl-compatible regular expressions. This library is easy to understand, and you ought to be able to trim back sufficient extraneous spaghetti to give your monster that aerodynamic flair to which all flying monsters aspire. That, and plenty of comments in well-structured spaghetti, ought to make your code-maintaining successor comfortable. (And, as I'm sure you know, that code-maintaining successor is you after six months, when you've moved on to something else and the details of this code have slipped your mind.)
My short answer is to use the right tool for the problem. If you have configuration files, use existing standards and formats, e.g. INI files, and parse them using Boost program_options.
If you enter the world of "own" languages, use lex/yacc, since they provide you with the required features, but you have to consider the cost of maintaining the grammar and language implementation.
As a result, I would recommend further narrowing your problem scope to find the right tool.

Making a perfect hash (all consecutive buckets full), gperf or alternatives?

Let's say I want to build a perfect hash table for looking up an array where the predefined keys are the 12 months, so I would want
hash("January")==0
hash("December")==11
I ran my month names through gperf and got a nice hash function, but it appears to give 16 buckets (or rather, the range is 16)!
#define MIN_HASH_VALUE 3
#define MAX_HASH_VALUE 18
/* maximum key range = 16, duplicates = 0 */
Looking at the generated gperf code, its hash function simply returns the key's length plus per-character values looked up in a 256-entry table. Somehow, in my head, I had imagined a fancier-looking function... :)
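For reference, the generated function's shape is roughly the following; the actual table values and character positions are chosen by gperf, so this is only illustrative:

/* Rough shape of a gperf-generated hash: the key's length plus associated
   values for one or two selected character positions. */
static unsigned int hash(const char *str, unsigned int len)
{
    static const unsigned char asso_values[256] = {
        0 /* ...the real values are computed by gperf so the 12 month names
             land in distinct slots within [MIN_HASH_VALUE, MAX_HASH_VALUE]... */
    };
    return len + asso_values[(unsigned char)str[1]] + asso_values[(unsigned char)str[0]];
}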
What if I want exactly 12 buckets (that is, I do not want to skip over unused buckets)? For a small set like this it really doesn't matter, but what about when I have 1000 predefined keys and want exactly 1000 buckets in a row?
Can one find a deterministic way to do this?
I was interested in the answer to this question & came to it via a search for gperf. I tried gperf, but it was very slow on a large input file and thus did not seem suitable. I tried cmph but I wasn't happy with it. It requires building a file which is then loaded into the C program at run time. Further, the program is so fragile (crashes with "segmentation fault" on any kind of mistaken input) that I did not trust it. A further Google search led me to this page, and onward to mph. I downloaded mph and found it is very nice. It has an optional program to generate a C file, called "emitc", and using it like
mph < systemdictionaryfile | emitc > output.c
worked almost instantly (a few seconds with a dictionary of about 200,000 words) and created a working C file which compiles with no problems. My tests also indicate that it works. I haven't tested the performance of the hashing algorithm yet though.
The only alternative to gperf I know of is cmph: http://cmph.sourceforge.net/ but, as Jerome said in the comment, having 16 buckets provides you with some speed benefit.
When I first looked at minimal perfect hashing I found some very interesting reading on CiteSeerX, but I resisted the temptation to try coding one of those solutions myself. I know I would end up with an inferior solution compared to gperf or cmph, or, even assuming the solution was comparable, I would have to spend a lot of time on it.
There are many MPH solutions and algorithms; gperf doesn't do MPHs yet, but I'm working on it, especially for large sets. See https://gitlab.com/rurban/gperf/-/tree/hashfuncs
The classic cmph has a lot of constant overhead and is only recommended for huge key sets.
There's the NetBSD nbperf and my improved variant, https://github.com/rurban/nbperf, which does CHM, CHM3 and BZD, with integer key support, optimizations for smaller key sets, and alternate hash functions.
There's Bob Jenkins' generator, and Taj Khattra's mph-1.2.
There are also two Perl libraries that generate C lookups, one in PostgreSQL (PerfectHash.pm) and one for newer perl5 Unicode lookups (regen/mph.pl), plus a tool to compare the various generators: https://github.com/rurban/Perfect-Hash
