This is a rather long question with quite a long story, so I'll try to be as quick and precise as I can. I've written a program that lets the user create questions for a quiz and export them to a .txt file with the structure below:
#
Level: 1
Ref: testRef
Question: test question
A: test A
B: test B
C: test C
D: test D
Ans: A
APerc: 100
BPerc: 0
CPerc: 0
DPerc: 0
Phone Answer: Right, I know this. The answer is 100% A. Good luck!
50/50: B
50/50 Percentage 1: 100
50/50 Percentage 2: 0
Force: True
!
where:
# indicates the start of a question,
Level is the stage of the quiz the question belongs to (the higher it is, the more difficult the question),
A, B, C and D are the possible answers,
Ans is the correct answer,
APerc through 50/50 Percentage 2 hold the data for the assists the user can use to help answer the question,
Force indicates whether this question must definitely appear, since there may be more than one question with the same level,
and ! indicates the end of a question.
Now, in terms of code, what would be the easiest way to read the whole file and identify each individual question using the delimiters above?
Would it be a good idea to use something like StreamReader? I'd like to read the whole file, check whether any questions have been forced, and then randomly pick 15 questions of different levels, either at startup or as each next question is revealed (possibly by passing the current level to the read function), storing all the question info in variables.
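As a rough sketch (written in C rather than C#, so StreamReader itself is not shown), the format can be read line by line with a small state machine: a `#` line starts a record, each `Key: value` line fills one field, and a `!` line stores the finished record. The `struct question` layout, the 256-byte field size and the name `parse_quiz` are hypothetical, and only some of the keys are handled; the assist keys would follow the same pattern.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_LEN 256

/* Hypothetical record type: only some of the file's fields are kept here. */
struct question {
    int level;
    char ref[FIELD_LEN];
    char text[FIELD_LEN];
    char options[4][FIELD_LEN]; /* answers A, B, C, D */
    char answer;                /* 'A'..'D' from the "Ans" line */
    int forced;                 /* 1 if the file says "Force: True" */
};

/* Strip a trailing "\n" or "\r\n" in place. */
static void chomp(char *s) { s[strcspn(s, "\r\n")] = '\0'; }

/* Copy src into a FIELD_LEN buffer, truncating and always NUL-terminating. */
static void copy_field(char *dst, const char *src) { snprintf(dst, FIELD_LEN, "%s", src); }

/* Read questions delimited by "#" and "!" lines from fp into out[].
   Returns the number of complete questions stored (at most max). */
size_t parse_quiz(FILE *fp, struct question *out, size_t max) {
    char line[1024];
    size_t count = 0;
    int inside = 0;
    struct question q = {0};

    while (count < max && fgets(line, sizeof line, fp)) {
        chomp(line);
        if (strcmp(line, "#") == 0) {           /* start of a question */
            memset(&q, 0, sizeof q);
            inside = 1;
            continue;
        }
        if (!inside) continue;                  /* ignore text outside # ... ! */
        if (strcmp(line, "!") == 0) {           /* end of a question */
            out[count++] = q;
            inside = 0;
            continue;
        }
        char *sep = strstr(line, ": ");         /* split "Key: value" at the first ": " */
        if (!sep) continue;
        *sep = '\0';
        const char *key = line, *val = sep + 2;

        if (strcmp(key, "Level") == 0) q.level = atoi(val);
        else if (strcmp(key, "Ref") == 0) copy_field(q.ref, val);
        else if (strcmp(key, "Question") == 0) copy_field(q.text, val);
        else if (strlen(key) == 1 && key[0] >= 'A' && key[0] <= 'D')
            copy_field(q.options[key[0] - 'A'], val);
        else if (strcmp(key, "Ans") == 0) q.answer = val[0];
        else if (strcmp(key, "Force") == 0) q.forced = strcmp(val, "True") == 0;
        /* APerc, Phone Answer, 50/50 ... would be handled the same way. */
    }
    return count;
}
```

Once every question is in an array, checking the Force flags and then picking 15 questions at random by level is a separate pass over that array, so the file only has to be read once at startup.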
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I want to compare the value at a particular location in a binary file (say, the value at index n × i, where i = 0, 1, 2, 3, ... and n is any number, say 10).
I want to check whether that value equals another value, say m. The value I'm looking for can only be at positions n × i.
I can think of three ways to do this:
Keep a temporary variable holding n × i, use fseek to go directly to that index, and check whether the value there equals m.
Do an fseek for the value m in the file.
Search for m at positions 0, n, 2n, 3n, ... using fseek.
I don't know how each of these operations works internally, so which one is the most efficient in terms of time and space?
Edit:
This is part of a bigger process that handles many more files, so time matters.
If there is a better way than using fseek, please tell me.
Any help appreciated. Thanks!
Without any prior knowledge of the values and ordering in the file you are searching, the quickest way is just to look through the file linearly and compare values.
You could do this by calling fseek() repeatedly, but repeatedly calling fseek() and a read may be slower than reading big chunks of the file and scanning them in memory, because system calls have a lot of overhead.
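A minimal sketch of the chunked approach, assuming the file is a flat array of `int32_t` values in native byte order and that "index n × i" means the element index (not the byte offset). The function name `find_strided` and the 4096-element chunk size are illustrative choices. It reads the file sequentially in large `fread` calls and only inspects every n-th element in memory.

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_ELEMS 4096   /* elements per fread call; an arbitrary, tunable size */

/* Scan a file of raw int32_t values (native byte order assumed) and return the
   smallest i such that element n*i equals m, or -1 if none does (or n <= 0). */
long find_strided(FILE *fp, long n, int32_t m) {
    int32_t buf[CHUNK_ELEMS];
    long base = 0;          /* element index in the file of buf[0] */
    size_t got;

    if (n <= 0 || fseek(fp, 0, SEEK_SET) != 0) return -1;
    while ((got = fread(buf, sizeof buf[0], CHUNK_ELEMS, fp)) > 0) {
        /* First multiple of n that falls at or after the start of this chunk. */
        long first = ((base + n - 1) / n) * n;
        for (long idx = first; idx < base + (long)got; idx += n)
            if (buf[idx - base] == m) return idx / n;
        base += (long)got;
    }
    return -1;
}
```

With large chunks, the number of library calls is roughly the file size divided by the chunk size, instead of one seek plus one read per candidate position; whether that is actually faster on your files is worth measuring.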
However, if you are doing a lot of searches on the same files, you would be better off building an index and/or sorting your records. One way to do this would be to put the data into a relational database with built-in indexes (pretty much any SQL database).
Edit:
Since you know your file is sorted, you can use a binary search.
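If the values at positions 0, n, 2n, ... are sorted in ascending order (an assumption about what "sorted" means here), a binary search needs only about log2(count) seeks. This sketch assumes the same hypothetical layout (raw native-endian `int32_t` values), and `bsearch_strided` is an illustrative name. Plain `fseek`/`ftell` take a `long` offset, so for very large files you would need something like POSIX `fseeko`/`ftello`.

```c
#include <stdint.h>
#include <stdio.h>

/* Read the int32_t at element index idx; returns 0 on success, -1 on failure. */
static int read_at(FILE *fp, long idx, int32_t *out) {
    if (fseek(fp, idx * (long)sizeof *out, SEEK_SET) != 0) return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}

/* Binary search over elements 0, n, 2n, ... (assumed sorted ascending).
   Returns i such that element n*i == m, or -1 if m is not among them. */
long bsearch_strided(FILE *fp, long n, int32_t m) {
    if (n <= 0 || fseek(fp, 0, SEEK_END) != 0) return -1;
    long elems = ftell(fp) / (long)sizeof(int32_t);
    if (elems <= 0) return -1;

    long lo = 0, hi = (elems - 1) / n;      /* inclusive range of i */
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int32_t v;
        if (read_at(fp, mid * n, &v) != 0) return -1;
        if (v == m) return mid;
        if (v < m) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}
```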
I am new to Golang.
Should I always avoid appending to slices?
I need to load a newline-separated data file into memory.
With performance in mind, should I count the lines first and then load all the data into a pre-sized array, or can I just append the lines to a slice?
You should stop thinking about performance and start measuring what the actual bottleneck of your application is.
Any answer to a question like "Should I do/avoid X because of performance?" is useless in 50% of cases and counterproductive in 25%.
There are a few very general pieces of advice, like "do not needlessly generate garbage", but your question cannot be answered in general, as it depends a lot on the size of your file:
Your file is ~3 terabytes? Most probably you will have to read it line by line anyway.
Your file has just a few (~50) lines: counting lines first is probably more work than letting append reallocate the []string slice a handful of times (or 0 times if you make([]string, 0, 100) it initially). A string header is just 2 words (pointer and length).
Your file has an unknown but large number (>10k) of lines: it might be worth it. "Might" in the sense that you should measure on real data.
Your file is known to be big (>500k lines): definitely count first, but you might start hitting the problem from the first case.
You see: general performance advice is bad advice, so I won't give any.
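To make the reallocation point concrete: Go's append grows a slice's backing array geometrically (roughly doubling while the slice is small), so the number of reallocations grows with the logarithm of the line count, not with the line count itself. The sketch below shows the same trade-off in C; `read_lines` is a hypothetical helper that doubles its capacity when full and counts how often it had to grow, and a non-zero `initial_cap` plays the role of `make([]string, 0, 100)`. It assumes lines shorter than 4096 bytes, and Go's exact growth policy differs in detail.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read every line of fp into a growable array of strings.
   Capacity starts at initial_cap (0 allowed) and doubles whenever it is full.
   *count receives the number of lines, *reallocs how many times it grew. */
char **read_lines(FILE *fp, size_t initial_cap, size_t *count, size_t *reallocs) {
    char **lines = NULL;
    size_t cap = 0, n = 0;
    char buf[4096];          /* assumes each line fits in this buffer */

    *count = 0;
    *reallocs = 0;
    if (initial_cap > 0) {   /* pre-sized, like make([]string, 0, initial_cap) */
        lines = malloc(initial_cap * sizeof *lines);
        if (!lines) return NULL;
        cap = initial_cap;
    }
    while (fgets(buf, sizeof buf, fp)) {
        buf[strcspn(buf, "\r\n")] = '\0';
        if (n == cap) {      /* full: grow geometrically */
            size_t new_cap = cap ? cap * 2 : 1;
            char **tmp = realloc(lines, new_cap * sizeof *lines);
            if (!tmp) break; /* keep the lines read so far */
            lines = tmp;
            cap = new_cap;
            (*reallocs)++;
        }
        size_t len = strlen(buf) + 1;
        lines[n] = malloc(len);
        if (!lines[n]) break;
        memcpy(lines[n], buf, len);
        n++;
    }
    *count = n;
    return lines;
}
```

For 50 lines starting from an empty array, this doubling policy grows the array 7 times (capacities 1, 2, 4, ..., 64); with `initial_cap` set to 100 it never grows. Either way the cost is small next to reading the file, which is why measuring beats guessing here.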
I was at a job interview, and this is the question they asked me:
Are the two grammars below ambiguous? If they are, give an ambiguous string. If they are not, prove why they are not.
I couldn't solve it, and would like to know the answer and the reason for the future.
Question 1
S-->XaaaX
X-->aX | bX | e(epsilon)
Question 2
S-->aaS | aaaS | a
Again, this is not HW.
Thank you. An explanation would help.
Recall that a grammar is ambiguous if (and only if) some string it generates has more than one leftmost derivation (equivalently, more than one parse tree).
In question 1 the symbol S expands to XaaaX, and the available alternatives for expanding X include aX and epsilon (ε). Conventionally, ε denotes the empty string, so expanding X as ε in aX produces a. There are therefore at least two ways to get aaaa. Richard Mckenna, I'll leave it to you to find them.
In question 2 the symbol S expands to aaS, aaaS, or a. There are at least two ways to get aaaaaa. Again I'll leave it to you to find the derivations.
If you wish, you may write your derivations on this page.
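For reference, here is one pair of leftmost derivations for each grammar. Two different leftmost derivations of the same string are enough to show that a grammar is ambiguous.

```latex
% Question 1: two leftmost derivations of aaaa
\begin{align*}
S &\Rightarrow XaaaX \Rightarrow aXaaaX \Rightarrow aaaaX \Rightarrow aaaa
  && (\text{left } X \to aX,\ X \to \varepsilon,\ X \to \varepsilon)\\
S &\Rightarrow XaaaX \Rightarrow aaaX \Rightarrow aaaaX \Rightarrow aaaa
  && (\text{left } X \to \varepsilon,\ \text{right } X \to aX,\ X \to \varepsilon)
\end{align*}
% Question 2: two leftmost derivations of aaaaaa
\begin{align*}
S &\Rightarrow aaS \Rightarrow aaaaaS \Rightarrow aaaaaa
  && (S \to aaS,\ S \to aaaS,\ S \to a)\\
S &\Rightarrow aaaS \Rightarrow aaaaaS \Rightarrow aaaaaa
  && (S \to aaaS,\ S \to aaS,\ S \to a)
\end{align*}
```

In question 1 the extra a comes from either the left or the right X; in question 2 the five leading a's can be split as 2 + 3 or 3 + 2.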
Sorry if this is off-topic, but here is your chance to reduce the amount of "homework" questions on this site :-)
I'm teaching a C programming class where the students work on a small library of numeric routines in C. This year, the source files from several groups of students had significant amounts of code duplication in them.
(Down to identically misspelled printf debug statements. I mean, how dumb can you be?)
I know that Git can detect when two source files are similar to each other beyond a certain threshold, but I never managed to get that to work on two source files that are not in a Git repository.
Keep in mind that these are not particularly sophisticated students. It is unlikely that they would go to the trouble of changing variable/function names.
Is there a way I can use Git to detect significant and literal code duplication, a.k.a. plagiarism? Or is there some other tool you could recommend for that?
Why use git at all? A simple but effective technique would be to compare the sizes of the diffs between all of the different submissions, and then to manually inspect and compare those with the smallest differences.
Moss is a tool that was developed by a Stanford CS prof. I think they use it there as well. It's like diff for source code.
Adding to the other answers, you could use diff, but I don't think its output will be that useful by itself. What you want is the number of non-blank lines that match, and to get that automatically you need to do a fair bit of magic with wc -l and grep: compute the sum of the lengths of the files, minus the length of the diff file, minus the number of blank lines that diff counted as matching. And even then you'll miss some cases where diff decided that identical lines didn't match because different things were inserted before them.
A much better option is one of the suggestions listed in https://stackoverflow.com/questions/5294447/how-can-i-find-source-code-copying (or in https://stackoverflow.com/questions/4131900/how-to-detect-plagiarized-code, though the answers seem to duplicate).
You could use diff and check whether the two files seem similar:
diff -iEZbwB -U 0 file1.cpp file2.cpp
Those options tell diff to ignore case and whitespace differences (including blank lines) and to produce a unified, git-like diff with no context lines. Try it out on two samples.
Using diff is absolutely not a good idea unless you want to venture into the realm of combinatorial hell:
If you have 2 submissions, you have to perform 1 diff to check for plagiarism,
If you have 3 submissions, you have to perform 3 diffs to check for plagiarism,
If you have 4 submissions, you have to perform 6 diffs to check for plagiarism,
...
If you have n submissions, you have to perform n(n-1)/2 diffs, one for every pair!
On the other hand, Moss, already suggested in another answer, uses a completely different algorithm. Basically, it computes a set of fingerprints for significant k-grams of each document. Each fingerprint is a hash used to classify documents, and possible plagiarism is detected when two documents end up being sorted into the same bucket.
I am going to write a program that reads in a line and gets up to 6 numbers. The program will eventually solve a square matrix between 2x2 and 6x6. My question is: what errors do I need to check for in the get_numb() function?
I am thinking that the function will have to check character by character to make sure that the individual characters are actual digits and not EOF or \n. I will also have to check that there are no more than 6 numbers on a line. I am about a week into programming, so is there anything I need to know to tackle this?
I absolutely recommend you start by adopting a good unit testing framework and writing unit tests as you go. That way you can cover all the cases you mention above and make sure that your program really works the way you think it should.
There are loads of questions on SO about C unit testing frameworks; pick your favourite.
Apart from the cases you mention, I can think of the following:
fewer than 6 numbers on a line
empty line
(if the numbers are floating point, various number formats)
If your teacher gave you sample input / output, you may of course incorporate that into your unit tests as well.
The potential errors you described are reasonable ones to check for.
I recommend you give it a shot. If they're not sufficient and you get stuck, then post your code and explain what you're seeing.
Most ASCII-to-integer converters will help you out with the error checking. Here's hoping your teacher gave you some example input code and perhaps, depending on the input method, some example conversion code. As this is homework, I don't want to get too specific.
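A sketch of that error checking, assuming the line has already been read with `fgets` (which returns NULL at end of file, so EOF is handled there) and that the numbers are integers. `strtol` reports where it stopped through its `end` pointer and sets `errno` to `ERANGE` on overflow, which `atoi` does not. The name `parse_numbers` and the `PARSE_*` codes are hypothetical; for floating-point input, `strtod` works the same way.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_NUMS 6

/* Hypothetical error codes returned by parse_numbers(). */
enum { PARSE_EMPTY = -1, PARSE_BAD_TOKEN = -2, PARSE_TOO_MANY = -3, PARSE_RANGE = -4 };

/* Parse up to MAX_NUMS whitespace-separated integers from one line.
   Returns how many were read (1..MAX_NUMS) or a negative PARSE_* code. */
int parse_numbers(const char *line, long out[MAX_NUMS]) {
    int count = 0;
    const char *p = line;

    for (;;) {
        while (isspace((unsigned char)*p)) p++;    /* skip blanks and the '\n' */
        if (*p == '\0') break;                     /* end of the line */
        if (count == MAX_NUMS) return PARSE_TOO_MANY;

        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);
        if (end == p) return PARSE_BAD_TOKEN;      /* no digits at all, e.g. "x" */
        if (errno == ERANGE) return PARSE_RANGE;   /* does not fit in a long */
        if (*end != '\0' && !isspace((unsigned char)*end))
            return PARSE_BAD_TOKEN;                /* trailing junk, e.g. "12abc" */
        out[count++] = v;
        p = end;
    }
    return count == 0 ? PARSE_EMPTY : count;
}
```

A caller would loop with `while (fgets(buf, sizeof buf, stdin))`, pass each buffer to `parse_numbers`, and would likely also check that every row of the matrix has the same count. Each `PARSE_*` result is also a natural unit-test case.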