I can't find many recommendations or style guides for
C that mention how to wrap lines so that you
have fewer than 80 characters per line.
About the only thing I can find is PEP 7,
the style guide for the main Python implementation
(CPython).
Does a link exist to a comprehensive C style guide
which includes recommendations for wrapping?
Or failing that, at least some good personal advice
on the matter?
P.S.: What do you do with really_long_variable_names_that_go_on_forever
(besides shortening)? Do you put them on the left edge or let it spill?
Here is Linus's original article about (Linux) kernel coding style. The document has probably evolved since then; it is part of the kernel source distribution.
You can also have a look at the GNU Coding Standards, which cover much more than coding style but are pretty interesting nonetheless.
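As a rough illustration of the kind of wrapping those two guides describe (my own sketch, not a quotation from either): the GNU standards ask you to split a long expression before an operator rather than after it, and the kernel style wants continuation lines placed well to the right, commonly aligned with the opening parenthesis. Standard C also joins adjacent string literals, which is one way to wrap a long message, although as far as I recall the kernel guide discourages that for user-visible strings because it makes them harder to grep for. All names below are made up just to be long enough to force wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are concatenated by the compiler. */
static const char usage_message[] =
    "usage: frobnicate [options] input-file\n"
    "  -v    print progress while frobnicating\n";

/* Parameters aligned with the opening parenthesis. */
int is_record_worth_processing(int record_length_in_bytes,
                               int minimum_length_in_bytes,
                               int maximum_length_in_bytes)
{
    assert(minimum_length_in_bytes <= maximum_length_in_bytes);

    /* Long condition split before the operator (GNU convention). */
    return record_length_in_bytes >= minimum_length_in_bytes
           && record_length_in_bytes <= maximum_length_in_bytes;
}

size_t usage_message_length(void)
{
    return strlen(usage_message);
}
```

With really long identifiers, giving each operand its own line like this usually keeps everything under 80 columns without shortening any names.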
The 80 characters per line "rule" is obsolete.
http://richarddingwall.name/2008/05/31/is-the-80-character-line-limit-still-relevant/
http://en.wikipedia.org/wiki/Characters_per_line
http://news.ycombinator.com/item?id=180949
We don't use punched cards much anymore. We have huge displays with great resolutions that will only get larger as time goes on (obviously hand-helds, tablets, and netbooks are a big part of modern computing, but I think most of us are coding on desktops and laptops, and even laptops have big displays these days).
Here are the rules that I feel we should consider:
One line of code does one thing.
One line of code is written as one line of code.
In other words, make each line as simple as possible and do not split a logical line into several physical lines. The first part of the rule helps to ensure reasonable brevity so that conforming to the second part is not burdensome.
Some people believe that certain languages encourage complex "one-liners." Perl is an example of a language that is considered by some to be a "write once, read never" language, but you know what? If you don't write obfuscated Perl, if instead you do one thing per line, Perl code can be just as manageable as anything else... ok, maybe not APL ;)
Besides complex one-liners, another drawback that I see with conforming to some artificial character limit is the shortening of identifiers to conform to the rule. Descriptive identifiers that are devoid of abbreviations and acronyms are often clearer than shortened alternatives. Clear identifiers move us that much closer to literate programming.
Perhaps the best "modern" argument that I've heard for keeping the 80, or some other value, character limit is "side-by-side" comparison of code. Side-by-side comparison is useful for comparing different versions of the same source file, as in source code version control system merge operations. Personally, I've noticed that if I abide by the rules I've suggested, the majority of my lines of code are sufficiently short to view them in their entirety when two source files (or even three, for three-way merges) are viewed side-by-side on a modern display. Sure, some of them overrun the viewport. In such cases, I just scroll a little bit if I need to see more. Also, modern comparison tools can easily tell you which lines are different, so you know which lines you should be looking at. If your tooling tells you that there's no reason to scroll, then there's no reason to scroll.
I think the old recommendation of 80 chars per line comes from a time when monitors were 80x25; nowadays 128 or more should be fine.
I often write code in MATLAB/Python to test whether my algorithm is feasible (and actually works). I then need to convert the entire program into C and, sometimes, Fortran 90.
What would be a good way to manually convert a medium-sized program from one language to another?
I have tried:
Converting the entire code from one into another and then testing it.
(Sometimes there are errors and bugs which just won't go away, and finding the source of the error becomes a problem)
Go line by line and check for consistency of outputs every few lines.
(Too time consuming)
Use converters like f2c.
(In my experience, they are extremely horrible. I link to a lot of libraries which have different function calls for C and Fortran)
Also:
I am fairly conversant with the programming languages I deal with so I don't need manuals or reference guides for my work (i.e. I know the syntax).
I am not asking this question specifically about MATLAB and C but rather as a translation paradigm.
Regarding the size, the programs are less than 100 lines long.
I don't want to call code written in one language from another. Please don't suggest that.
Different languages call for different paradigms. You definitely don't write and design code the same way in, e.g., Matlab, Python, C# or C++. Even object hierarchies will change a lot depending on the language.
That said, if your code consists of a few interconnected procedures, then you may get away with a direct line-by-line translation (every language allows you to write two or three interconnected functions while remaining idiomatic). But this is the case only for the simplest programs.
Prototyping in a high-level language and then implementing the same idea in a robust and clean way in a "production" language is a very good practice, but it involves two very different things:
1) Prototype in whatever language you want. Test, experiment, and convince yourself that the idea works. Pay attention to the big picture; don't focus on performance but on the high-level ideas. Also pay attention to the difficulties you encounter while implementing, as you'll face them again in step 2.
2) Implement the idea from scratch in the production environment, in language X. It will be quicker than if you had skipped the prototyping stage, since most of the difficulties have been met in step 1. Use idiomatic X, and focus on correctness. Pay attention to corner cases, general robustness, and, once it works correctly, performance. You'll notice that roughly half of your code consists of new things that did not appear in step 1 (e.g., error checking, corner-case handling, input/output, unit testing).
You can see that line by line translation is obviously not a good idea, since you don't translate into the same program.
Also, when not prototyping, I find myself throwing away the first version and making another one that I like better, i.e., I find myself prototyping! Implementing the same thing twice is not a waste of time; it is normal development flow.
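One way to keep the re-implementation honest without going line by line is to record a handful of input/output pairs from the prototype and check the C version against them with a tolerance. A minimal sketch, assuming a made-up routine `mean_of` as the thing being ported; the reference value in the usage note stands in for whatever your MATLAB/Python prototype printed.

```c
#include <assert.h>
#include <stddef.h>

/* The routine being ported (hypothetical): mean of n samples. */
double mean_of(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Returns 1 if the port agrees with the recorded reference value to
   within a relative tolerance, 0 otherwise. Floating-point results
   rarely match bit for bit across languages, hence the tolerance. */
int matches_reference(double got, double expected, double rel_tol)
{
    double diff = got - expected;
    if (diff < 0.0)
        diff = -diff;

    double scale = expected < 0.0 ? -expected : expected;
    if (scale < 1.0)
        scale = 1.0;

    return diff <= rel_tol * scale;
}
```

For example, if the prototype reported 2.5 for the input {1, 2, 3, 4}, the check would be `matches_reference(mean_of(x, 4), 2.5, 1e-12)`. Checking at function boundaries like this is much cheaper than comparing outputs every few lines.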
You may want to consider using a higher level domain specific language with multiple backends (e.g., Matlab, C, Fortran), producing clean and idiomatic code for each target language, probably with some optimisations. If your problem domain is narrow and every piece of code is more or less typical, it should be fairly trivial to design and implement such a DSL.
Break the source down into pseudo-code with input/process/output and then write your new code base to fit that spec.
I'm looking for a tool that checks whether two (C) source code files generate the same binary so that I can find actual functional changes between two files and ignore mere coding style changes.
It would be great if this worked even within a file for different changesets, so a file may have changed in coding style on some places, but also had one functional patch added.
It's very, very hard to write a program to figure out the "functional" result of another program. Such a program sounds like it would be necessary for this. I would guess that computer programs themselves are just about the most compact and machine-readable way we have to even describe functionality, so it's kind of hard to write a program that analyses a program and generates a "better" description.
Somehow abstracting out and "understanding" that coding style differences don't affect functionality also sounds very, very hard. I find it hard when manually reading other people's code somehow, because the differences in style can be pretty large, even though the end result might be the same in "my style".
I would be surprised if a solution wouldn't also require a solution to the halting problem, which is proven impossible for the general case.
The only way is to compile both with the same compiler options and do a binary diff.
It's not only style changes you'd have to look out for; someone may have extracted code to a function that gets inlined in an optimised build. This may, or may not, depending on compiler options and version, give the same binary.
Mapping binary back to source to "high level functionality" - unlikely.
Comparing two source files with respect to "high level functionality" (ignoring coding style) - possible:
http://cscope.sourceforge.net/
Alternative suggestion:
Write a tool that "normalizes" your source files - by applying the same formatting to both sets of code.
This can easily be automated.
For example:
1) check out both from version control,
2) apply "standard format",
3) compare
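To make the "normalize, then compare" idea concrete, here is a deliberately crude sketch that only collapses whitespace before comparing. In practice you would run both versions through a real formatter (GNU indent or clang-format, for instance) and diff the results; this toy version does not understand string literals or comments, and the function names are my own.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Collapse every run of whitespace in src to a single space and drop
   leading/trailing whitespace. dst must hold at least strlen(src) + 1
   bytes. */
void normalize_whitespace(const char *src, char *dst)
{
    int pending_space = 0;
    char *out = dst;

    for (; *src != '\0'; src++) {
        if (isspace((unsigned char)*src)) {
            pending_space = (out != dst);   /* skip leading whitespace */
        } else {
            if (pending_space)
                *out++ = ' ';
            pending_space = 0;
            *out++ = *src;
        }
    }
    *out = '\0';
}

/* Returns 1 if a and b differ only in whitespace layout. */
int same_after_normalizing(const char *a, const char *b)
{
    char na[256], nb[256];

    assert(strlen(a) < sizeof na && strlen(b) < sizeof nb);
    normalize_whitespace(a, na);
    normalize_whitespace(b, nb);
    return strcmp(na, nb) == 0;
}
```

Anything that still differs after normalization is a candidate functional change, which narrows down what you have to read by hand.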
If all you're interested in is whether they both "generate the same binary", then the easiest solution is simply to generate both binaries, and compare.
Note, however, that there are things that would result in binaries that are bitwise different, even though they're functionally identical:
Change in external function names
Optimisations
Reordering non-dependent code snippets
etc.
There is a branch of computer science that deals with concurrency and parallel processes.
One of the applications is deciding whether two systems are behaviorally equivalent (in some bisimulation relation (weak or strong)).
Though it's computationally very difficult to decide whether two large systems are behaviorally equivalent. The usage is mainly for verification of small critical applications where we can't afford failure.
I am currently playing around with programming languages. I have spent some time writing parsers and interpreters in high level languages (most notably Haxe).
I have had some results, that I think are actually quite nice, but now I'd like to make them fast.
My idea was to translate the input language into C.
My C knowledge is limited to what you learn at university. Beyond some exercises, I have never written actual C programs. But I feel confident I can make it work.
Of course I could try to write a frontend for LLVM or generate MSIL or JVM bytecode. But I feel that's too much to learn right now, and I don't see much of a gain actually.
Also C is perfectly human readable, so if I screw up, it's much easier to understand why. And C is, after all, high level. I can really translate concepts from the input language without too much mind-bending. I should be having something working up and running in a reasonable amount of time and then optimize it as I see fit.
So: Are there any downsides to using C? Can you recommend an alternative?
Thank you for your insight :)
Edit: Some Clarification
The reason why I want to go all the way down is that I am writing a language with OOP support and I want to actually implement my method dispatching by hand, because I have something very specific in mind.
A primary area of use would be writing HTTP services, but I could imagine adding bindings to a GUI library (wxWidgets maybe) or whatever.
C is a good and quite popular choice for what you're trying to do.
Still, take a look at LLVM's intermediate representation (IR). It's pretty readable and I think it's cleaner and easier to generate and parse than C. LLVM comes with quite a big collection of tools to work with it. You can generate native code for a variety of platforms (as with C, but with slightly more control over the output) or for virtual machines. The possibility of JIT compilation is also a plus.
See The Architecture of Open Source Applications, Chapter 11, for an introduction to the LLVM approach and some snippets of IR.
What is your target environment? This might help us give you a better answer.
C is actually a pretty good choice for a target language for a little or experimental compiler -- it's widely available on many platforms, so your compiler becomes immediately useful in many environments. The main drawback is dealing with things that are not well supported in C, or are not well defined in the C spec. For example, if you want to do dynamic code generation (JIT compilation), C is problematic. Things like stack unwinding and reflection are tricky to do in C (though setjmp/longjmp and careful use of structs for which you generate layout descriptions can do a lot). Things like word sizes, big- or little-endian layout, and arithmetic precision vary between C compilers, so you have to be aware of that, but those are things you need to deal with if you want to support multiple target machines anyway.
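To give an idea of the setjmp/longjmp approach to unwinding, here is a small sketch of what generated C for a "throw" might look like. The names are hypothetical, and a single global handler is used for brevity; a real runtime would keep a stack of handlers and would have to be careful with local variables modified between the setjmp and the longjmp.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Where control returns to when something "throws". */
static jmp_buf current_handler;

static int checked_divide(int numerator, int denominator)
{
    if (denominator == 0)
        longjmp(current_handler, 1);    /* "throw" */
    return numerator / denominator;
}

/* Returns 0 and stores the quotient on success, -1 if the callee
   jumped back here instead of returning normally. */
int try_divide(int numerator, int denominator, int *result)
{
    assert(result != NULL);

    if (setjmp(current_handler) == 0) {  /* "try" */
        *result = checked_divide(numerator, denominator);
        return 0;
    }
    return -1;                           /* "catch" */
}
```

Note that longjmp skips any cleanup in the frames it jumps over, so the generated code has to release resources some other way.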
Other languages can be used as well -- the main advantage of C is its ubiquity.
You might consider C--, a C-like language intended to be a better target for code generation than C.
C is a good choice, IMHO. Unlike many languages, C is generally considered "elegant" in that you have only 32 keywords (in C89; later standards add more), and very basic constructs (sequence, selection, iteration), with a very simple-and-consistent collection of tokens and operators.
Because syntax is very consistent within C (brackets and braces, blocks and statements, use of expressions), you're not marching into an unbounded world of language expansion. C is a mature language, has weathered time nicely, and now-a-days is a "known quantity" (which is really hard to say about many other languages, even "mature" ones).
I know about the existence of questions such as this one and this one. Let me explain.
After reading Joel's article Back to Basics and seeing many similar questions on SO, I've begun to wonder what are specific examples of situations where knowing stuff like C can make you a better high level programmer.
What I want to know is if there are many examples of this. Many times, the answer to this question is something like "Knowing C gives you a better feel of what's happening under the covers" or "You need a solid foundation for your program", and these answers don't have much meaning. I want to understand the different specific ways in which you will benefit from knowing low level concepts.
Joel gave a couple of examples: Binary databases vs XML, and strings. But two examples don't really justify learning C and/or Assembly. So my question is this: What specific examples are there of knowing C making you a better high level programmer?
My experience with teaching students and working with people who only studied high-level languages is that they tend to think at a certain high level of abstraction, and they assume that "everything comes for free". They can become very competent programmers, but eventually they have to deal with some code that has performance issues and then it comes to bite them.
When you work a lot with C, you do think about memory allocation. You often think about memory layout (and cache locality if that's an issue). You understand how and why certain graphics operations just cost a lot. How efficient or inefficient certain socket behaviors are. How buffers work, etc. I feel that using the abstractions in a higher level language when you do know how it is implemented below the covers sometimes gives you "that extra secret sauce" when thinking about performance.
For example, Java has a garbage collector and you can't directly assign things to memory directly. And yet, you can make certain design choices (e.g., with custom data structures) that affect performance because of the same reasons this would be an issue in C.
Also, and more generally, I feel that it is important for a power programmer to not only know big-O notation (which most schools teach), but that in real-life applications the constant is also important (which schools try to ignore). My anecdotal experience is that people with skills in both language levels tend to have a better understanding of the constant, perhaps because of what I described above.
In addition, many higher level systems that I have seen interface with lower level libraries and infrastructures. For instance, some communications, databases or graphics libraries. Some drivers for certain devices, etc. If you are a power programmer, you may eventually have to venture out there and it helps to at least have an idea of what is going on.
Knowing low level stuff can help a lot.
To become a racing driver, you have to learn and understand the basic physics of how tyres grip the road. Anyone can learn to drive pretty fast, but you need a good understanding of the "low level" stuff (forces and friction, racing lines, fine throttle and brake control, etc) to get those last few percent of performance that will allow you to win the race.
For example, if you understand how the CPU architecture works in your computer, you can write code that works better with it (e.g. if you know you have a certain CPU cache size or a certain number of bytes in each CPU cache line, you can arrange your data structures and the way that you access them to make the best use of the cache - for example, processing many elements of an array in order is often faster than processing random elements, due to the CPU cache). If you have a multi-core computer, then understanding how low level techniques like threading work can give huge benefits (just as not understanding the low level can lead to disaster in threading).
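A small sketch of the access-order point: both functions below compute the same sum, but the first walks the array in the order it sits in memory (C arrays are row-major), while the second jumps a whole row ahead on every step. On typical hardware the first is often noticeably faster for large arrays, though I'm not quoting numbers because that depends on the machine and compiler. The array size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 512, COLS = 512 };

/* Consecutive accesses touch neighbouring addresses, so each cache
   line that gets loaded is fully used. */
long sum_row_major(int grid[ROWS][COLS])
{
    assert(grid != NULL);
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Same result, but consecutive accesses are COLS * sizeof(int) bytes
   apart, which tends to waste most of every cache line. */
long sum_column_major(int grid[ROWS][COLS])
{
    assert(grid != NULL);
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

The same reasoning carries over to high-level languages: iterating a contiguous array in order is usually kinder to the cache than chasing references scattered over the heap.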
If you understand how disk I/O and caching work, you can modify file operations to work well with them (e.g. if you read from one file and write to another, working on large batches of data in RAM can help reduce I/O contention between the reading and writing phases of your code, and vastly improve throughput).
If you understand how virtual functions work, you can design high-level code that uses virtual functions well. If used incorrectly they can severely hamper performance.
If you understand how drawing is handled, you can use clever tricks to improve drawing speed. e.g. You can draw a chessboard by alternately drawing 64 white and black squares. But it is often faster to draw 32 white squares and then 32 black ones (because you only have to change the drawing colour twice instead of 64 times). But you can actually draw the whole board black, then XOR 4 stripes across the board and 4 stripes down the board in white, and this can be much faster still (2 colour changes, and only 9 rectangles to draw instead of 64). This chessboard trick teaches you a very important programming skill: Lateral thinking. By designing your algorithm well, you can often make a big difference to how well your program operates.
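The XOR chessboard trick can be sketched with a plain 8x8 array standing in for the screen (0 = black, 1 = white). `xor_rect` is a made-up stand-in for whatever XOR-mode rectangle fill your graphics library offers.

```c
#include <assert.h>
#include <string.h>

enum { BOARD = 8 };

/* XOR a filled rectangle into the board; this is one "draw call". */
static void xor_rect(unsigned char board[BOARD][BOARD],
                     int top, int left, int height, int width)
{
    assert(top >= 0 && left >= 0);
    assert(top + height <= BOARD && left + width <= BOARD);

    for (int r = top; r < top + height; r++)
        for (int c = left; c < left + width; c++)
            board[r][c] ^= 1;
}

/* 1 fill + 4 stripes across + 4 stripes down = 9 rectangles. */
void draw_chessboard(unsigned char board[BOARD][BOARD])
{
    memset(board, 0, BOARD * BOARD);        /* whole board black */
    for (int i = 1; i < BOARD; i += 2) {
        xor_rect(board, i, 0, 1, BOARD);    /* stripe across */
        xor_rect(board, 0, i, BOARD, 1);    /* stripe down   */
    }
}
```

A square ends up white exactly when it lies in a horizontal stripe or a vertical stripe but not both, which is the checkerboard pattern.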
Understanding C, or for that matter, any low level programming language, gives you an opportunity to understand things like memory usage (i.e. why is it a bad thing to create several million heavy objects), how pointers/object references work, etc.
The problem is that as we've created ever increasing levels of abstraction, we find ourselves doing a lot of 'lego block' programming, without understanding how the legos actually function. And by having almost infinite resources, we start treating memory and resources like water, and tend to solve problems by throwing more iron at the situation.
While not limited to C, there's a tremendous benefit to working at a low level with much smaller, memory constrained systems like the Arduino or old-school 8-bit processors. It lets you experience close to the metal coding in a much more approachable package, and after spending time squeezing apps into 512K, you will find yourself applying these skills at a larger level within your day to day programming.
So the language itself is not important, but having a deeper appreciation for how all of the bits come together, and how to work effectively at a level closer to the hardware is a set of skills beneficial to any software developer.
For one, knowing C helps you understand how memory works in the OS and in other high level languages. When your C# or Java program balloons in memory usage, understanding that references (which are basically just pointers) take memory too, and understanding how many of the data structures are implemented (which you get from making your own in C), helps you see that your dictionary is reserving huge amounts of memory that aren't actually used.
For another, knowing C can help you understand how to make use of lower level operating system features. You don't need this often, but sometimes you may need memory mapped files, or to use marshalling in C#, and C will greatly help understand what you're doing when that happens.
I think C has also helped my understanding of network protocols, but I can't put my finger on specific examples. I was reading another SO question the other day where someone was complaining about how C's bit-fields are 'basically useless' and I was thinking how elegantly C bit fields represent low-level network protocols. High level languages dealing with structures of bits always end up a mess!
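To show what I mean about bit-fields, here is the first byte of an IPv4 header (4-bit version, 4-bit header length) written both ways. One caveat: the order in which a compiler lays out bit-fields in memory is implementation-defined, so the struct is good for readability but the shift-and-mask version is what you would rely on for the actual wire format. The type and function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Readable declaration; the in-memory bit order is up to the
   compiler, so don't memcpy this onto the wire. */
struct ipv4_first_byte {
    unsigned int version : 4;
    unsigned int ihl     : 4;   /* header length in 32-bit words */
};

/* Portable equivalent: version in the high nibble, IHL in the low. */
uint8_t pack_first_byte(unsigned version, unsigned ihl)
{
    assert(version < 16 && ihl < 16);
    return (uint8_t)((version << 4) | ihl);
}

unsigned unpack_version(uint8_t byte) { return byte >> 4; }
unsigned unpack_ihl(uint8_t byte)     { return byte & 0x0Fu; }
```

A typical IPv4 header (version 4, five 32-bit words) packs to 0x45.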
In general, the more you know, the better programmer you will be.
However, sometimes knowing another language, such as C, can make you do the wrong thing, because there might be an assumption that is not true in a higher-level language (such as Python, or PHP). For example, one might assume that finding the length of a list might be O(N) where N is the length of the list. However, this is probably not the case in many high-level language instances. In Python, for most list-like things the cost is O(1).
Knowing more about the specifics of a language will help, but knowing more in general might lead one to make incorrect assumptions.
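Where the O(N) assumption comes from: in C, strlen has to walk to the terminating '\0' on every call, so people learn not to call it inside a loop condition. A counted string (a hypothetical struct here) carries its length along, which is roughly why asking for the length is constant-time in Python.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length travels with the data. */
struct counted_string {
    const char *data;
    size_t length;
};

struct counted_string counted_from(const char *s)
{
    assert(s != NULL);
    struct counted_string cs = { s, strlen(s) };   /* O(N), paid once */
    return cs;
}

/* Counts spaces; the loop bound is read from the struct in O(1)
   instead of calling strlen() on every iteration. */
size_t count_spaces(struct counted_string cs)
{
    size_t n = 0;
    for (size_t i = 0; i < cs.length; i++)
        if (cs.data[i] == ' ')
            n++;
    return n;
}
```

So the C habit of caching the length is harmless in Python but unnecessary, which is exactly the kind of mismatched assumption described above.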
Just "knowing" C would not make you better.
But if you understand the whole picture (how native binaries work, how the CPU executes them, what the architectural limitations are), you can write code that is easier on the CPU.
For example, how L1/L2 caches affect your work, and how you should write your code to get more hits in the L1/L2 caches. When working with C/C++ and doing heavy optimizations, you will have to go down to that kind of thing.
It isn't so much knowing C as it is that C is closer to the bare metal than many other languages. You need to be more aware of how to allocate/deallocate memory because you have to do it yourself. Doing it yourself helps you understand the implications of many decisions that you make.
To me any language is acceptable as long as you understand how the compiler/interpreter (basically) maps your code onto the machine. It's a bit easier to do in a language that exposes this directly, but you should be able to, with a bit of reading, figure out how memory is allocated and organized, what sort of indexing patterns are more optimal than others, what constructs are more efficient for particular applications, etc.
More important, I think, is a good understanding of operating systems, memory architectures, and algorithms. If you understand how your algorithm works, why it would be better to choose one algorithm or data structure over another (e.g., HashSet vs. List), and how your code maps onto the machine, it shouldn't matter what language you are using.
This is my experience of how I learnt and taught myself programming, specifically C. It goes back to the early 1990s, so it may be a bit antique, but the passion and the drive are what matter:
Learn the low-level principles of the computer, such as EGA/VGA programming; here's a link to the Simtel archive on the C programmer's guide to the PC.
Understand how TSRs work.
Download the whole archive of Bob Stout's snippets, which is a big collection of C code where each piece does one thing only. Study them and understand them; not only that, the collection of snippets strives to be portable.
Browse the International Obfuscated C Code Contest (IOCCC) online, see how C code can be abused, and understand the intricacies of the language. The worst code abuse is the winner! Download the archives and study them.
I loved the infamous Ponzo's C Tutorial, which helped me immensely; unfortunately, the archive is very hard to find. If anyone knows where to obtain it, please leave a comment and I will amend this answer to include the link. There is another one that I can remember - Coronado's [Generic?] C Tutorial; again, my memory on this one is hazy...
Look at Dr. Dobb's Journal and C User Journal here - I do not know if you can still get them in print but they were classics; I can remember the feeling of holding a printed copy in my hand and tearing off home to type in the code to see what happens!
Grab an ancient copy of Turbo C v2, which I believe you can get from borland.com, and just play with 16-bit C programming to get a feel and mess with the pointers... sure, it is ancient and old, but playing with pointers on it is fine.
Understand and learn pointers (link here to the legacy Simtel.net). This is crucial to achieving C guru-ship, for want of a better word. You will also find a host of downloads pertaining to the C programming language - I remember actually ordering the Simtel CD Archive and looking for the C stuff...
A couple of things that you have to deal directly with in C that other languages abstract away from you include explicit memory management (malloc) and dealing directly with pointers.
My girlfriend is one semester from graduating MIT (where they mainly use Java, Scheme, and Python) with a Computer Science degree, and she is currently working at a company whose codebase is in C++. For the first few days she had a difficult time understanding all the pointers/references/etc.
On the other hand, I found moving from C++ to Java very easy, because I was never confused about pass-references-by-value vs pass-by-reference.
Similarly, in C/C++ it is much more apparent that primitives are just the compiler treating the same sets of bits in different ways, as opposed to a language like Python or Ruby where everything is an object with its own distinct properties.
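The pass-references-by-value distinction is easy to see in C, because the "reference" is an ordinary pointer that gets copied like any other argument. A minimal sketch with made-up function names:

```c
#include <assert.h>
#include <stddef.h>

/* The pointer itself is copied: reassigning the copy is invisible to
   the caller. This is the "pass references by value" case, and it is
   what Java does with object references. */
void reseat(int *p, int *other)
{
    p = other;
    (void)p;    /* silence "set but not used" warnings */
}

/* Writing through the copied pointer does reach the caller's object. */
void write_through(int *p, int value)
{
    assert(p != NULL);
    *p = value;
}

/* To change the caller's pointer, you must pass a pointer to it. */
void reseat_for_real(int **pp, int *other)
{
    assert(pp != NULL);
    *pp = other;
}
```

After `reseat(p, &b)` the caller's `p` still points where it did before; only `reseat_for_real(&p, &b)` changes it.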
A simple (not entirely realistic) example to illustrate some of the advice above. Consider the seemingly harmless
while (true)
    for (Iterator iter = foo.iterator(); iter.hasNext();)
        bar.doSomething(iter.next());
or the even higher level
while (true)
    for (Baz b : foo)
        bar.doSomething(b);
A possible problem here is that each time round the while loop a new object (the iterator) is created. If all you care about is programmer convenience, then the latter is definitely better. But if the loop has to be efficient or the machine is resource constrained then you are pretty much at the mercy of the designers of your high level language.
For example, a typical complaint for doing high-performance Java is having execution stop while garbage (such as all those allocated Iterator objects) is reclaimed. Not very good if your software is charged with tracking incoming missiles, auto-piloting a passenger jet, or just not leaving the user wondering why the GUI has stopped responding.
One possible solution (still in the higher-level language) would be to weaken the convenience of the iterator to something like
Iterator iter = new Iterator();
while (true)
    for (foo.initAlreadyAllocatedIterator(iter); iter.hasNext();)
        bar.doSomething(iter.next());
But this would only make sense if you had some idea about memory allocation...otherwise it just looks like a nasty API. Convenience always costs somewhere, and knowing lower-level stuff can help you identify and mitigate those costs.
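For comparison, this is roughly how the same pattern looks in C, where the iterator is a plain struct that lives wherever the caller puts it, so re-initialising it on each pass allocates nothing. The `cursor` type and its functions are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array cursor. */
struct cursor {
    const int *next;
    const int *end;
};

void cursor_init(struct cursor *c, const int *items, size_t count)
{
    assert(c != NULL && items != NULL);
    c->next = items;
    c->end = items + count;
}

int cursor_has_next(const struct cursor *c) { return c->next != c->end; }
int cursor_next(struct cursor *c)           { return *c->next++; }

/* Sums the array `passes` times, reusing one cursor throughout. */
long sum_repeatedly(const int *items, size_t count, int passes)
{
    struct cursor c;    /* created once, outside the loops */
    long total = 0;

    for (int p = 0; p < passes; p++)
        for (cursor_init(&c, items, count); cursor_has_next(&c);)
            total += cursor_next(&c);
    return total;
}
```

Once you have written something like this, the "nasty" initAlreadyAllocatedIterator API above looks a lot less strange.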
I would like to know what projects cannot be done in C.
I know programming can be quicker and more intuitive in
other languages. But I would like to know what features
are missing in C that would prevent a project from being
completed well.
For example, very few web-frameworks exist in C.
C, like many other languages, is Turing complete.
So the simple answer is: none.
However, C++ Template Meta Programming meets the same criterion, so "it is possible" is not a good criterion to choose tools.
The very first C compiler?
A working solution to the halting problem
Alright, here's one: you cannot write an x86 boot sector in C. This is one of those things that has to be written in ASM.
There are none.
Different languages give you different ways to say things. For some classes of problems a given language may be more expressive and/or concise. Are there projects for which you should pick something other than C? Yes, of course. But to say you can't do it well in C is misleading. It would be better to ask which language is the best choice for the problem at hand, and whether the gains are worth using something unfamiliar.
Anything can be done in virtually any language.
That said there is a level of practicality. As your system's complexity increases, you need better tools to manage it.
The problems are still solvable, but you start to need more people and much more effort in design. I'm not saying other languages don't benefit from design, I'm saying that the same level and attention to detail may not be required.
Since we programmers are human (I am at least), we have trouble in one area or another. My biggest is memory. If I can visualize my code as objects, manipulating large modules in my head becomes easier, and my brain can handle larger projects.
Of course, it's even possible to write good OO code in C, the patterns were developed in C by manually managing dispatch tables (tables of pointers with some pointers updated to point to different methods), and this is true of all programming constructs from higher languages--they can be done in any language, but...
If you were to implement objects in C, every single class you wrote would have a large amount of boilerplate overhead. If you made some form of exception handling, you would expose more boilerplate.
Higher level languages abstract this boilerplate out of your code and into the system, simplifying what you have to think about and debug (a dispatch table in C could take a lot of debugging, but in C++ it isn't going to fail because the code generated by a working compiler is going to be bug-free and hidden, you never see it).
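To show the kind of boilerplate I mean, here is a minimal hand-rolled dispatch table for a single "virtual method". The shape types and names are made up for the example; every new class needs its own table and constructor, and nothing stops you from forgetting to fill in a slot.

```c
#include <assert.h>
#include <stddef.h>

struct shape;   /* forward declaration */

/* The dispatch table: one function pointer per "virtual method". */
struct shape_vtable {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vtable;
    double width, height;
};

static double rect_area(const struct shape *s)
{
    return s->width * s->height;
}

static double triangle_area(const struct shape *s)
{
    return 0.5 * s->width * s->height;
}

static const struct shape_vtable rect_vtable     = { rect_area };
static const struct shape_vtable triangle_vtable = { triangle_area };

struct shape make_rect(double w, double h)
{
    struct shape s = { &rect_vtable, w, h };
    return s;
}

struct shape make_triangle(double w, double h)
{
    struct shape s = { &triangle_vtable, w, h };
    return s;
}

/* The "virtual call": look up the function at run time. */
double shape_area(const struct shape *s)
{
    assert(s != NULL && s->vtable != NULL && s->vtable->area != NULL);
    return s->vtable->area(s);
}
```

A C++ compiler generates the equivalent of all of this from the word `virtual`.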
I guess I'd say that's the biggest (only?) difference between low level and higher level languages, how much boilerplate do you hide. In the latest batch of dynamic languages, they are really into hiding loop constructs within the language, so more things look like:
directory.forEachFile(print file.name); // Not any real language
In C, even if you isolated part of the looping inside a function, setting up the function pointers and stuff would still take lines of un-obvious code that is not solving part of your primary problem.
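For instance, a C version of the forEach idea needs a callback type, a context pointer to smuggle state in and out, and a separate named function for the loop body. The names below are illustrative, and a list of strings stands in for the directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The callback type: the name to visit plus an opaque context. */
typedef void (*name_visitor)(const char *name, void *context);

/* The loop, isolated inside a function. */
void for_each_name(const char *const *names, size_t count,
                   name_visitor visit, void *context)
{
    assert(visit != NULL);
    for (size_t i = 0; i < count; i++)
        visit(names[i], context);
}

/* The loop body has to live in its own function; here it adds each
   name's length to a running total passed in through context. */
void add_length(const char *name, void *context)
{
    size_t *total = context;
    *total += strlen(name);
}
```

The call site then reads `for_each_name(names, count, add_length, &total);`, which works, but most of what you wrote is plumbing rather than the problem you were trying to solve.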
There is not a single algorithm that cannot be written with C.
Depends on how much you want to invest (time/money/energy) to make it happen. Otherwise, I'd say there aren't any. It is just easier sometimes to use something else.
OS kernels have been written in C and everything runs on top of them, so you can write everything in C.
A boot sector that needs ASM :-) I don't think you meant that.