Today I was in a Webex meeting, sharing my screen with some Perl code I wrote. My boss suddenly told me, while everyone else was watching and listening, that I had to remove the trailing commas from my hash and array structures because they are a bad practice. I said I didn't think that was a bad practice in Perl, but he insisted and made me delete those commas just to show my script running in the meeting.
I still think it's not a bad practice in Perl, but I could be wrong. I actually find trailing commas convenient and a good practice, because they keep me from adding a new element and forgetting the corresponding comma in the process.
But I'd really like to know whether it's a good or bad practice, and to be able to show my boss (if he's wrong) good arguments, ideally with good sources.
So, is it a bad practice to leave trailing commas?
This is an example:
my $hash_ref = {
    key1 => 'a',
    key2 => 'b',
    key3 => 'c',
};

my $array_ref = [
    1,
    2,
    3,
];
It's a great practice to have the trailing comma. Larry added it because he saw programmers add elements to a list (or whatever their language called it) but forget the separator character. Perl allows the trailing comma to make that less common. It's not a quirk or side effect of something else. That's what Perl wants you to do.
What is bad practice, however, is distracting a meeting full of people with something your boss could have corrected later. Unless the meeting was specifically a code review, your boss wasted a bunch of time. I've always wished that to join a video conference you had to enter your per-minute compensation, so a counter on everyone's screen could show how much money was being wasted. Spending a couple hundred dollars watching you remove commas from a working program would tamp down that nonsense.
So the PBP page referred to by Miller argues for making it easier to reorder the list by cutting and pasting lines; the mod_perl coding style document linked by Borodin argues for avoiding a momentary syntax error when you add stuff.
Much more significant than either, in my opinion, is that if you always have a trailing comma and you add a line, the diff only shows the line you added and the existing lines remain unchanged. This makes blame-finding better, and makes diffs more readable.
All three are good reasons for always using trailing commas, and there are in my opinion no good reasons not to do so.
The Apache mod_perl coding style document says this
Whenever you create a list or an array, always add a comma after the last item. The reason for doing this is that it's highly probable that new items will be appended to the end of the list in the future. If the comma is missing and this isn't noticed, there will be an error.
What your manager may have been thinking of is that doing the same thing in C is non-standard and non-portable; however, there is no excuse for his extraordinary behaviour.
It is indeed a good practice, and it is also recommended in the famous Perl Best Practices (PBP).
There is actually a Policy for perlcritic which always gets me: https://metacpan.org/pod/Perl::Critic::Policy::CodeLayout::RequireTrailingCommas
I favor leading commas, though I know it's rather unpopular and seems to irritate the dyslexic. I also haven't been able to find a perltidy option for it. It fixes the line-change-diff problem as well (except for the first line, but that's not usually the one being changed in my experience), and I adore the way the commas line up in neat columns. It also works neatly in languages that are white-space agnostic but don't like trailing commas on lists. I think I learned this pattern while working with JavaScript...
my $hash_ref =
    { key1 => 'a'
    , key2 => 'b'
    , key3 => 'c'
    };

my $array_ref =
    [ 1
    , 2
    , 3
    ];
I have a problem that I think could be solved relatively quickly with a loop. I have to work with SPSS, and I think it can only be solved with syntax.
Unfortunately I am not good with loops, so I hope that one of you can help me.
I have done a study on reasons for abortions. Now I would like to present the distribution of reasons.
The problem is that each person was first asked about all of her pregnancies (because this is also relevant for the later analysis), and then one pregnancy was selected as the one the rest of the questionnaire refers to.
So the rest of the questionnaire was only about one of the pregnancies, whereas the first questions (e.g. year of pregnancy, reason for abortion) were answered for every pregnancy. For the reasons, I only need the information that refers to the pregnancy that was used for the rest of the questionnaire.
I have an index variable ("index") that indicates at which pass of the loop the relevant pregnancy was asked about. Then I have the variables "Loop_1_R" to "Loop_5_R", which record the reasons for up to 5 abortions (of course, for each woman, only as many as the number of pregnancies she indicated). In between there are some missing data; for example, a woman may have said that she had 5 pregnancies, but only two of them were abortions (say the third and the fifth). Then she would only give reasons for an abortion in loop 3 and loop 5.
Now I want to create a new variable which contains only the reason that refers to the relevant pregnancy, so just one value per woman. I was thinking you could build a loop, in the sense of "compute new variable", in such a way that loop i is taken when index equals i.
I could of course do it by hand, but with over 3,000 participants (VPN) that will obviously take considerably longer.
I hope someone can help me! This is an example dataset with fewer loops and participants:
You can use do repeat to loop and catch the value you need this way:
do repeat vr=Loop_1_R to Loop_5_R/vl=1 to 5.
if Index=vl reason=vr.
end repeat.
Recently I saw an answer to a question where it was explained that addressing arrays in the form <number>[array] is valid C code.
How do square brackets work in C?
Example:
char x[] = {'A','B','C','D','E','F','G','H','I','J'};
printf("%d\n", 5[x]);
// Will print 70 == 'F'
This kind of notation seems cumbersome and potentially confusing for everybody including the author.
Does this way of addressing arrays come with some justifiable advantage?
or
Can I continue with my life without worrying?
I have never encountered this in "real code" (i.e., outside of intentionally obfuscated things and puzzles with artificial limitations) so it would seem that it is quite universally agreed that this shouldn't be done.
However, I can come up with a contrived example where it might be considered by some (not necessarily me) a nicer syntax: if you have multiple pieces of data related to a single entity in a column, and you represent the rows as different arrays:
enum { ADA, BRIAN, CLAIRE };
const char *name[] = { "Ada", "Brian", "Claire" };
const unsigned age[] = { 30, 77, 41 };
printf("%s is %u years old\n", ADA[name], ADA[age]);
I will be the first to agree that this obfuscates the syntax by making it look like the people are the arrays instead of being the indexes, and I would prefer an array of struct in most cases. I think a case could be made for this being nicer-looking, though, or perhaps in some cases it would be a way to swap the rows and columns (arrays and indexes) with minimal edits elsewhere.
As far as I can tell, there are no technical pros or cons with either method. They are 100% equivalent. As the link you provided says, a[i] = *(a+i) = [addition is commutative] = *(i+a) = i[a].
For subjective pros and cons: well, it's confusing. So the form index[array] is useful for code obfuscation, but other than that I cannot see any use for it at all.
One reason (but I'm really digging here) to use the standard way is that a[b+c] is not equivalent to b+c[a]. You would have to write (b+c)[a] instead to make it equivalent. This can be especially important in macros. Macros usually have parenthesis around every single argument in every single usage for this particular reason.
It's basically the same argument as to write if(2==x) instead of if(x==2). If you by accident write = instead of == you will get a compiler error with the first method.
Can I continue with my life without worrying?
Yes.
Yes, array subscripting is commutative because addition is commutative. References like a[n] are converted to *(a+n), but n[a] is likewise converted to *(n+a), which is identical. If you want to win IOCCC competitions you must use this.
We have an application that will be generating 4-digit random strings for guest WiFi usage. So you walk into a hotel, get your room key and your WiFi password. I want to make these generated passwords as simple as possible to save calls to the helpdesk, but not so simple that they are so easily guessed.
The problem is that inevitably you'll end up with passwords like "POOP" or "DICK". I think a simple solution is so to have a database table of the "forbidden" words, and upon generation check it against the database first to make sure it isn't a banned word.
I have looked at probably dozens of filtered/banned/censored word lists, but I can't find one that is sufficiently detailed so as to include things like DIKK and P00P, and I don't exactly want to use my time today to try to think of every possible offensive 4-letter combination and type them all out manually.
Does anyone have a good resource/word list that would contain these "potentially-offensive" strings?
First I wrote this as a comment. But then I realized it actually answers your question about skipping offensive words:
Consider generating random strings without vowels. You won't get any actual English word, and you'll avoid both real words like 'tree' and recognizable misspellings like 'fukc'.
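Just to sketch the idea, here is a minimal example in Go (the alphabet and the function name are arbitrary choices for illustration; adapt them to whatever your application uses, and digits could be mixed into the same alphabet):

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// Consonant-only alphabet: no vowels, so (as suggested above) the codes
// cannot spell out ordinary English words.
const alphabet = "BCDFGHJKLMNPQRSTVWXZ"

// newCode returns a random n-character code drawn from the alphabet.
func newCode(n int) (string, error) {
	out := make([]byte, n)
	for i := range out {
		idx, err := rand.Int(rand.Reader, big.NewInt(int64(len(alphabet))))
		if err != nil {
			return "", err
		}
		out[i] = alphabet[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	code, err := newCode(4)
	if err != nil {
		panic(err)
	}
	fmt.Println(code) // e.g. "KTRZ"
}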
I suggest you use numbers too; it will be "more secure" and you will eliminate this problem.
I was wondering what some of the common pitfalls are that a novice Go programmer could fall into when writing unintentionally slow Go code.
1) First, I know that in Python string concatenation can be (or used to be) expensive. Is that the same in Go when adding one element to a string, as in "hello"+"World"?
2) The other issue is that I find myself very often having to extend my slice with a list of more bytes (rather than 1 byte at a time). I have a "dirty" way of appending it by doing the following:
newStr := string(arrayOfBytes) + string(newBytesToAppend)
Is that way slower than just doing something like this?
for _, b := range newBytesToAppend {
    arrayOfBytes = append(arrayOfBytes, b)
}
Or is there a better way to append whole slices to other slices, or maybe a built-in way? It just seems a little odd to me that I would even have to write my own extend function (or benchmark it).
Also, sometimes I end up having to loop through every element of the byte slice, and for readability I convert the current byte to a string. As in:
for _, b := range newBytesToAppend {
    c := string(b)
    // some more logic on c
    logic(c)
}
3) I was wondering if converting types in Go is expensive (especially between strings and byte slices) and if that might be one of the factors making the code slow. By the way, I sometimes convert types (to strings) very often, nearly every iteration.
But more generally, I was trying to search the web for a list of things that often make Go code slow, so I could avoid them (but I didn't have much luck). I am very much aware that this depends from application to application, but I was wondering if there is any "expert" advice on what usually makes "novice" Go code slow.
4) The last thing I can think of is that sometimes I know the length of the slice in advance, so I could just use fixed-length arrays. Could that change anything?
5) I have also made my own types as in:
type Num int
or
type Name string
Do those hinder performance?
6) Is there a general list of heuristics to watch out for in Go code optimization? For example, is dereferencing a problem, as it can be in C?
Use bytes.Buffer / Buffer.Write; it handles resizing the internal slice for you and it's by far the most efficient way to manage multiple []bytes.
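Here's a minimal sketch of what that looks like (variable names are borrowed from your question, the data is made up); the built-in append with ... is also shown, since it appends a whole slice in one call:

package main

import (
	"bytes"
	"fmt"
)

func main() {
	// bytes.Buffer grows its internal slice as needed, so whole chunks
	// can be written without a per-byte loop or string conversions.
	var buf bytes.Buffer
	buf.Write([]byte("hello"))
	buf.Write([]byte(" world"))
	fmt.Println(buf.String()) // "hello world"

	// For plain []byte slices, append with ... appends a whole slice at once.
	arrayOfBytes := []byte("hello")
	newBytesToAppend := []byte(" world")
	arrayOfBytes = append(arrayOfBytes, newBytesToAppend...)
	fmt.Println(string(arrayOfBytes)) // "hello world"
}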
About the 2nd question, it's rather easy to answer that using a simple benchmark.
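For example, a pair of benchmarks along these lines (the chunk data is made up; run with go test -bench .) would show the difference:

// main_test.go
package main

import "testing"

var chunk = []byte("some new bytes to append")

// Concatenating via string() builds a brand-new string on every iteration.
func BenchmarkStringConcat(b *testing.B) {
	var s string
	for i := 0; i < b.N; i++ {
		s += string(chunk)
	}
	_ = s
}

// append with ... reuses (and occasionally grows) the same backing array.
func BenchmarkAppend(b *testing.B) {
	var buf []byte
	for i := 0; i < b.N; i++ {
		buf = append(buf, chunk...)
	}
	_ = buf
}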
I'm wondering: is there an algorithm or a library that helps me identify the parts of an English sentence that have no meaning, e.g. very serious grammar errors? If so, could you explain how it works, because I would really like to implement it or use it in my own projects.
Here's a random example:
In the sentence: "I closed so etc page hello the door."
As humans, we can quickly identify that [so etc page hello] does not make any sense. Is it possible for a machine to point out that the string does not make sense and also contains grammar errors?
If there's such a solution, how precise can that be? Is it possible, for example, given a clip of an English sentence, the algorithm returns a measure, indicating how meaningful, or correct that clip is? Thank you very much!
PS: I've looked at CMU's Link Grammar as well as the NLTK library. But I'm still not sure how to use, for example, the Link Grammar parser to do what I would like to do, because if the parser doesn't accept the sentence, I don't know how to get it to tell me which part is not right... and I'm not sure whether NLTK supports that.
Another thought I had towards solving the problem is to look at the frequencies of word combinations, since I'm currently interested in correcting only very serious errors. I would define a "serious error" as a case where the words in a clip of a sentence are rarely used together, i.e., the frequency of the combination is much lower than that of the other combinations in the sentence.
For instance, in the above example, the four words [so etc page hello] really seldom occur together. One intuition behind my idea comes from Google: when I type such a combination into Google, no related results come up. So is there any library that provides me frequency information like Google does? Such frequencies might give a good hint about the correctness of a word combination.
I think that what you are looking for is a language model. A language model assigns a probability to each sentence of k words appearing in your language. The simplest kind of language model is the n-gram model: given the first i words of your sentence, the probability of observing the (i+1)th word depends only on the previous n-1 words.
For example, for a bigram model (n=2), the probability of the sentence w1 w2 ... wk is equal to
P(w1 ... wk) = P(w1) P(w2 | w1) ... P(wk | w(k-1)).
To compute the probabilities P(wi | w(i-1)), you just have to count the number of occurrences of the bigram w(i-1) wi and of the word w(i-1) in a large corpus.
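As a rough sketch, here is that counting in Go (the toy corpus is made up; a real model would be trained on a large corpus and would also need smoothing for unseen bigrams):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Toy training corpus.
	corpus := "i closed the door . i opened the door ."
	words := strings.Fields(corpus)

	// Count unigrams and bigrams.
	unigram := map[string]int{}
	bigram := map[string]int{}
	for i, w := range words {
		unigram[w]++
		if i+1 < len(words) {
			bigram[w+" "+words[i+1]]++
		}
	}

	// P(w | prev) = count(prev w) / count(prev)
	prob := func(prev, w string) float64 {
		if unigram[prev] == 0 {
			return 0
		}
		return float64(bigram[prev+" "+w]) / float64(unigram[prev])
	}

	fmt.Println(prob("the", "door")) // 1.0: "the door" is the only continuation seen here
	fmt.Println(prob("the", "page")) // 0.0: never observed, flags an unlikely combination
}

A clip whose words all receive very low probabilities under such a model is exactly the kind of "serious error" you describe.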
Here is a good tutorial paper on the subject: A Bit of Progress in Language Modeling, by Joshua Goodman.
Yes, such things exist.
You can read about it on Wikipedia.
You can also read about some of the precision issues here.
As far as determining which part is not right after determining the sentence has a grammar issue, that is largely impossible without knowing the author's intended meaning. Take, for example, "Over their, dead bodies" and "Over there dead bodies". Both are incorrect, and could be fixed either by adding/removing the comma or swapping their/there. However, these result in very different meanings (yes, the second one would not be a complete sentence, but it would be acceptable/understandable in context).
Spell checking works because there are a limited number of words against which you can check a word to determine if it is valid (spelled correctly). However, there are infinite sentences that can be constructed, with infinite meanings, so there is no way to correct a poorly written sentence without knowing what the meaning behind it is.
I think what you are looking for is a well-established library that can process natural language and extract the meanings.
Unfortunately, there's no such library. Natural language processing, as you probably can imagine, is not an easy task. It is still a very active research field. There are many algorithms and methods in understanding natural language, but to my knowledge, most of them only work well for specific applications or words of specific types.
And those libraries, such as the CMU one, still seem quite rudimentary. They can't do what you want to do (like identifying errors in an English sentence). You have to develop an algorithm to do that using the tools they provide (such as a sentence parser).
If you want to learn about it, check out ai-class.com. They have some sections that talk about processing language and words.