This may seem pedantic but I'm asking from a learner's perspective.
The it helper naturally lends itself to test names of the form
it('should behave in this certain way...')
and almost all the coders I know insist on writing 'should' in every single test name.
To me this is a subtle but annoying pet peeve, because the whole point of it is to save me keystrokes -- if it comes with the requirement to type 'should' for every test, it seems like an utter waste of a shortcut.
Is it a best practice to write it('should...') or can we simply write it('behaves in this expected way')?
Serious question -- I'm losing my mind over this little detail.
This might be an opinion-based question, but I found myself writing a lot of shoulds in the past, until I found a Spotify repository that changed my testing life.
should-up is a CLI that removes shoulds from your tests, and its README.md explains why and how you should write test descriptions.
Basically it just removes a lot of redundant text and makes everything easier to read: it('should return the sum') becomes it('returns the sum').
So I've run into an interesting design pattern and I wanted to know if you guys had an opinion on it.
Basically, the design is passing everything around as a pre-serialized type. There are no types for the return values, for example; everything is passed as a simple uint8_t*. There is a defined header that "tells" you what is in the buffer, how big it is, what the version of the buffer is, etc. I call it "pre-serialized" because it forces flattening of all structures.
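To make the description concrete, here is a minimal sketch of what such a self-describing header might look like; the field names are my own illustration, not taken from the actual code:

#include <stdint.h>

/* Hypothetical header that precedes every payload. */
typedef struct {
    uint32_t magic;    /* what kind of data follows    */
    uint32_t version;  /* format version of the buffer */
    uint32_t length;   /* payload size in bytes        */
} buffer_header;

/* Every interface then degenerates to the same opaque shape: */
uint8_t *process(const uint8_t *buf);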
The pros:
You can easily write it (or even a set of them) to whatever you want: files, IO, whatever.
Can store arbitrary data.
The cons (IMHO):
No type safety, which is going to be a nightmare.
The programmer has to parse the code. Even if there is an enumerated type, the user would have to know what that type means. Even if there are functions to parse the type, the programmer has to know that those are the functions to call.
Version hell: changing code will cause a ripple effect of errors. Because every call site parses the buffer differently, you have no idea where the code works and where it is broken.
It is viral: because it is flat, you can't "insert" the header onto outside data. You could wrap the call if you copy your "data", but this could cause an unnecessary copy, which would be SLOW. So either your code is slower than it needs to be, or you conform to this data structure.
It isn't human-readable OR debuggable.
Have you seen this design pattern before? Is there a name for this design pattern? Things I missed?
Is there a name for this design pattern?
Well, Legacy Code? :) I have seen such a design in 30-year-old COBOL systems...
The pros you have stated are just as easily achieved by using XML (or JSON):
You can easily write it (or even a set of them) to whatever you want. Files, IO, whatever -- most of all, web services!
Can store arbitrary data.
Furthermore, all your cons are eliminated.
The only pro I can see in your solution is conciseness: when every byte counts and any overhead is too expensive, then this is nice.
Added: COBOL has a feature for easily defining the structure of such serialized data; see the PICTURE clause. Reading the data is then very easy: you read it as variables. (Much as if you had binary data in the C language, defined a struct, and typecast the binary to the struct.)
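For readers more at home in C, a rough sketch of that analogy; the record layout is invented for illustration, and real code would also have to worry about padding, alignment, and endianness:

#include <stdint.h>
#include <string.h>

/* Hypothetical fixed layout, playing the role of a PICTURE clause. */
typedef struct {
    uint32_t id;
    char     name[20];
} record;

/* Overlay the struct on the raw bytes; memcpy avoids the
   alignment traps of a direct pointer cast. */
void read_record(const uint8_t *buf, record *out)
{
    memcpy(out, buf, sizeof *out);
}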
As Honza said, this would be normal in legacy COBOL/PL1 (was there a COBOL/PL1 conversion, or an interface to COBOL programs?).
In COBOL this design pattern would make sense; I am not sure about C, though (one of the binary serialization packages, or JSON etc., might be more sensible).
In COBOL, you would have a copybook which all programs would use, and you could edit the data using that copybook (with something like File-AID or the Micro Focus Data Editor).
Why use this "design pattern" in COBOL:
Regression testing of modules; you can write a driver module like:
Read Test-data-file
while more-data
    Call Module
    Write Result to output-file
    Read Test-data-file
end
You can then compare the output of the pre-change program with the output of the changed program.
Testing: sometimes you can use a "production file" in testing.
A file provides a trace or snapshot of what is going on; this can be very useful.
Easy to reorganize batch streams:
Split programs up (and pass the data via a file). There are a variety of reasons for doing this, including:
the program has grown too big and is hard to maintain;
sorting the data;
performance (use a file rather than hitting the DB multiple times);
new uses for the extracted data.
While your cons are valid for C, they will be less of an issue in COBOL.
The key to using this "design pattern" is being able to edit/view/compare the format. If you cannot edit/view/compare a file, I do not see the point.
I'm searching for information about how to compare two pieces of code and decide whether the code submitted by someone is correct or not (based on a solution code defined beforehand).
I could compare the outputs, but many different programs may produce the same output. So I think I must somehow compare the code itself and give a percentage of similarity.
Can anybody help me?
(The language is C, but I think this isn't important.)
Some of my teachers used online automated program grading systems like http://web-cat.org/
In the assignment they would specify a public api you must provide, and then they would just write tests against your functions, much like unit tests. They would intentionally pick tests that would exploit boundary conditions and other things students are notorious for not thinking about, and just call your code with many different inputs to try to get your code to fail.
Sometimes they would hardcode the expected values, other times they would allow values within a range, and other times they just did the assignment themselves and made it so your own code has to match the results produced by their code.
Obviously, not all programs can be effectively graded this way. It's also somewhat error-prone: sometimes even the teacher made a mistake and overflowed an int or something, and then the correct student submissions wouldn't match the teacher's incorrect results. But a system doesn't need to be perfect to be useful, and this raises an important point: manually grading by reading the code won't necessarily reveal all mistakes either.
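To illustrate the idea, here is a minimal sketch of a grader-side test in C; student_max is a hypothetical function the assignment's public API would require, not anything from web-cat itself:

#include <assert.h>
#include <limits.h>

int student_max(int a, int b);   /* provided by the student's submission */

int main(void)
{
    assert(student_max(1, 2) == 2);                    /* ordinary case   */
    assert(student_max(-5, -5) == -5);                 /* equal inputs    */
    assert(student_max(INT_MAX, INT_MIN) == INT_MAX);  /* boundary values */
    return 0;
}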
Another possibility is to copy the submitted code, strip out all of the whitespace, and search for substrings that must exist for the code to be correct and/or substrings that cannot exist for the code to be considered correct. The troublesome bit might be setting this up to allow for some of the trickier requirements, such as [(a or c), ((a or b) and c)], where each variable is the result of a boolean check for whether the substring related to that variable exists within the code.
For example, [("printf"), ("for"), (not "1,2,3,4,5,6,7,9,10")] would require that "printf" and "for" be substrings of the code, while "1,2,3,4,5,6,7,9,10" must not be. I'm not familiar with C, so I'm assuming here that "printf" is required to be able to print anything without involving output streams, which could be accounted for by something like [("printf" or "out"), ("for"), (not "1,2,3,4,5,6,7,9,10")], where "out" is the part of C code required to make use of output streams.
It might be possible to automatically find required substrings based on a "correct" solution, but as others have mentioned, there are alternative ways to do things, which is why hard-coding the "solution" is probably required. Even so, it's quite possible that a correct submission will lack one of the required substrings and be marked as wrong, but it's probably the only way to do what you ask with some degree of success.
Regular expressions might be useful here.
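A minimal sketch of that substring check in C, using the example requirements above (the whitespace stripping and the pass/fail rule are simplified for illustration):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Remove all whitespace in place. */
static void strip_ws(char *s)
{
    char *dst = s;
    for (; *s; s++)
        if (!isspace((unsigned char)*s))
            *dst++ = *s;
    *dst = '\0';
}

/* Required substrings must appear; forbidden ones must not. */
static int grade(char *code)
{
    strip_ws(code);
    return strstr(code, "printf") != NULL
        && strstr(code, "for") != NULL
        && strstr(code, "1,2,3,4,5,6,7,9,10") == NULL;
}

int main(void)
{
    char src[] = "for (i = 0; i < 10; i++) printf(\"%d\\n\", i);";
    printf("%s\n", grade(src) ? "pass" : "fail");
    return 0;
}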
The macros listed in this gist https://gist.github.com/1177043 (pasted below)
(defmacro wrap-connection [& body]
  `(if (sql/find-connection)
     ~@body
     (sql/with-connection db ~@body)))

(defmacro transaction [& body]
  `(if (sql/find-connection)
     (sql/transaction ~@body)
     (sql/with-connection db (sql/transaction ~@body))))
seem to be pretty useful. Is there a "standard" implementation of these? By standard, I mean something in clojure.contrib or similar. I can easily copy and paste this into my code, but I'm wondering if there's a better way. Or, put another way, what's the Clojure way of doing this?
This is my first foray into actually writing Clojure code (I've read a lot about it and Common Lisp), so I'm also trying to get a feel for what libraries are out there. It seems to me that the Lisp mentality is kind of "I can write it myself in 15 lines, so why would I use somebody else's?"
From what I have seen, once you abstract beyond sql/transaction and the like, the abstractions tend to be more application-specific. Writing your own wrappers, as above, when it truly makes things simpler is the canonical way to go about things.
Only use as many macros as actually make your life simpler; it can be tempting to nest macros (as in the gist) in ways that are more confusing or harder to maintain. I like this gist; just be careful not to over-macro.
PS: if some code is more than about five lines, I look to see if someone else has written it first :) but many Clojurians feel differently.
When I have to parse text (e.g. config files or other rather simple/descriptive languages), there are several solutions that come to my mind:
using library functions, e.g. strtok(), sscanf()
a finite state machine which processes one char at a time, tokenizing and parsing
using the explode() function I once wrote out of pure boredom
using lex/yacc (read: flex/bison) to generate an appropriate parser
I don't like the "library functions" approach. It feels clumsy and awkward. explode(), while it doesn't take much new code, feels even more bloated. And flex/bison often seems like sheer overkill.
I usually implement an FSM, but at the same time I already feel sorry for the poor guy who may have to maintain my code at a later point.
Hence my question:
What is the best way to parse relatively simple text files?
Does it matter at all?
Is there a commonly agreed-upon approach?
I'm going to break the rules a bit and answer your questions out of order.
Is there a commonly agreed-upon approach?
Absolutely not. IMHO the solution you choose should depend on (to name a few) your text, your timeframe, your experience, even your personality. If the text is simple enough to make flex and bison overkill, maybe C is itself overkill. Is it more important to be fast, or robust? Does it need to be maintained, or can it start quick and dirty? Are you a passionate C user, or can you be enticed away with the right language features? &c., &c.
Does it matter at all?
Again, this is something only you can answer. If you're working closely with a team of people, with particular skills and abilities, and the parser is important and needs to be maintained, it sure does matter! If you're writing something "out of pure boredom," I would suggest that it doesn't matter at all, no. :-)
What is the best way to parse relatively simple text files?
Well, I don't know that you're going to like my answer. Maybe first read some of the other fine answers here.
No, really, go ahead. I'll wait.
Ah, you're back and relaxed. Let's ease into things, shall we?
Never write it in 'C' if you can do it in 'awk';
Never do it in 'awk' if 'sed' can handle it;
Never use 'sed' when 'tr' can do the job;
Never invoke 'tr' when 'cat' is sufficient;
Avoid using 'cat' whenever possible.
-- Taylor's Laws of Programming
If you're writing it in C, but C feels like the wrong tool...it really might be the wrong tool. awk or perl will likely do what you're trying to do without all the aggravation. You may even be able to do it with cut or something similar.
On the other hand, if you're writing it in C, you probably have a good reason to write it in C. Maybe your parser is a tiny part of a much larger system, which, for the sake of argument, is embedded, in a refrigerator, on the moon. Or maybe you loooove C. You may even hate awk and perl, heaven forfend.
If you don't hate awk and perl, you may want to embed them into your C program. This is doable, in principle--I've never done it myself. For awk, try libmawk. For perl, there are probably a few ways (TMTOWTDI). You can run perl separately using popen to start it, or you can actually embed a Perl interpreter into your C program--see man perlembed.
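For the popen route, a minimal sketch (the perl one-liner is just an example filter):

#include <stdio.h>

int main(void)
{
    /* Pipe text into an external perl process; here it upper-cases
       whatever we write to it. */
    FILE *p = popen("perl -ne 'print uc'", "w");
    if (!p)
        return 1;
    fputs("hello from c\n", p);
    return pclose(p);
}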
Anyhow, as I've said, "the best way to parse" entirely depends on you and your team, the problem space, and your approach to the issue. What I can offer is my opinion.
I'm going to assume that in your C-only solutions (library functions and FSM (considering your explode to essentially be a library function)) you've already done your best at isolating the relevant code, designing the code and files well, and so forth.
Even so, I'm going to recommend lex and yacc.
Library functions feel "clumsy and awkward." A state machine seems unmaintainable. But you say that lex and yacc feel like overkill.
I think you should approach your complaints differently. What you're really doing is specifying a FSM. However, you're also hiring someone to write and maintain it for you, thereby solving most of the maintainability problem. Overkill? Did I mention they'll work for free?
I suspect, but do not know, that the reason lex and yacc originally felt like overkill was that your config / simple files just felt too, well, simple. If I'm right (a big if), you may be able to do most of your work in the lexer. (It's even conceivable that you can do all of your work in the lexer, but I know nothing about your input.) If your input is not only simple but widespread, you may be able to find a lexer/parser combination freely available for what you need.
In short: if you can do this not in C, try something else. If you want C, use lex and yacc--they have a little overhead, but they're a very good solution.
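To give a feel for how little a simple case costs, here is a toy flex specification for key=value config lines; the format and messages are invented for illustration:

%option noyywrap
%{
#include <stdio.h>
%}
%%
[A-Za-z_]+[ \t]*"="[^\n]*   { printf("setting: %s\n", yytext); }
"#"[^\n]*                   { /* skip comment lines */ }
[ \t\n]+                    { /* skip whitespace */ }
.                           { /* ignore anything else */ }
%%
int main(void) { yylex(); return 0; }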
If you can get it to work, I'd go with an FSM, but with a huge assist from Perl-compatible regular expressions. The PCRE library is easy to understand, and you ought to be able to trim back enough extraneous spaghetti to give your monster that aerodynamic flair to which all flying monsters aspire. That, and plenty of comments in well-structured spaghetti, ought to make your code-maintaining successor comfortable. (And, as I'm sure you know, that code-maintaining successor is you after six months, when you've moved on to something else and the details of this code have slipped your mind.)
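A minimal sketch of that approach with the classic PCRE API, pulling apart one "key = value" line (link with -lpcre; the pattern is illustrative):

#include <stdio.h>
#include <string.h>
#include <pcre.h>

int main(void)
{
    const char *err;
    int erroff, ov[9];               /* room for whole match + 2 captures */
    const char *line = "name = value";

    pcre *re = pcre_compile("^\\s*(\\w+)\\s*=\\s*(.*)$", 0, &err, &erroff, NULL);
    if (!re)
        return 1;

    if (pcre_exec(re, NULL, line, (int)strlen(line), 0, 0, ov, 9) == 3) {
        char key[64], val[64];
        pcre_copy_substring(line, ov, 3, 1, key, sizeof key);
        pcre_copy_substring(line, ov, 3, 2, val, sizeof val);
        printf("key=%s value=%s\n", key, val);
    }
    pcre_free(re);
    return 0;
}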
My short answer is to use the right tool for the problem. If you have configuration files, use existing standards and formats, e.g. INI files, and parse them using Boost.Program_options.
If you enter the world of rolling your own languages, use lex/yacc, since they provide you with the required features, but you have to consider the cost of maintaining the grammar and the language implementation.
As a result, I would recommend further narrowing your problem scope to find the right tool.
I've seen, here and elsewhere, many questions that, to get input data, use something like this:
...
printf("What's your name? ");
scanf("%s",name);
...
This is very reminiscent of the old BASIC days (INPUT for those who remember it).
The majority of those questions, if not all, are from people just learning C, and are homework assignments or examples taken from their book.
I clearly remember that when I learned C I was told that this question/answer style was not good practice for getting user input. The "Right Way" was either to take parameters on the command line (argv[...]) or to read from a data file to be parsed with fgets(). When user-friendliness was a must, termio and friends had to be used.
Now, I wonder if anything has changed in recent years. Are people taught to structure user interaction as a set of questions and answers now?
I can only see disadvantages in the printf()/scanf() approach, the main one being the diversity of terminals (^H anyone?) that can make it difficult for the user to correct mistakes.
Could anyone point me to concrete advantages of this approach?
This structure is easy to explain and easy to learn, which is why it appears in so many introductory materials. Doing user input "the right way" in C can appear fairly daunting to a neophyte, especially when you have to deal with tokenizing and conversions.
However, I agree that it would be valuable for introductory materials to demonstrate more robust methods for handling user input.
I always thought the Unix way was to accept input from stdin. That way the caller of the command can pipe input in from another command, from a file or manually.
fgets() / sscanf() or similar is the right way to accept user input.
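A sketch of that pattern: read a whole line with fgets(), then parse it with sscanf(), so malformed input can't overflow a buffer or desynchronize the stream:

#include <stdio.h>

int main(void)
{
    char line[128];
    int age;

    printf("How old are you? ");
    fflush(stdout);

    if (fgets(line, sizeof line, stdin) != NULL
        && sscanf(line, "%d", &age) == 1)
        printf("You are %d.\n", age);
    else
        fprintf(stderr, "That didn't look like a number.\n");
    return 0;
}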
Take what you read here and elsewhere with a (large) pinch of salt.
The GNU readline library is really an excellent resource for this. Its main advantage is that it handles all the intricacies of line editing, as well as letting users have their own input settings, e.g. Vi or Emacs mode.
This is the library that bash and many other programs accepting interactive line-based input use.
By using the library, you get an interface your users will already know how to use, plus all sorts of nice features, without having to explicitly code support for line editing.
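A minimal readline loop in C looks something like this (link with -lreadline):

#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>
#include <readline/history.h>

int main(void)
{
    char *line;

    /* readline() returns a malloc'd line, or NULL on EOF (Ctrl-D). */
    while ((line = readline("> ")) != NULL) {
        if (*line)
            add_history(line);   /* enables up-arrow recall */
        printf("you typed: %s\n", line);
        free(line);
    }
    return 0;
}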
I agree with CTFord that IF you're doing a command-line program, then stdin is a perfectly reasonable way to deal with input.
However, nowadays the "correct" way is to make a window that has a text box, a label, and a button, and blah blah blah.
I know that doesn't really answer your question, but I think my point is that that style of programming has become MUCH less prevalent, so that it doesn't really matter what is "correct".
In many cases you are probably missing the point of the exercise if you think this is important. There's a world of difference between a homework exercise and a real-world program. The purpose of such exercises is seldom, if ever, to teach user interface design; the technique you describe is generally just a simple way to obtain test input for the real exercise, and is also probably encouraged by tutors who, pressed for time, require submitted code to adhere to some 'course style' to make marking easier. 'Course style' is seldom the same thing as 'good practice' or 'practical style', but that is not to say that it does not serve some purpose, or even that that purpose is not beneficial to the student's education.
Unfortunately, novices often get hung up on this stuff when it is generally irrelevant to the actual purpose of the exercise. The problem is that user input using standard library primitives is not as foolproof as some tutors seem to think.