As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I'm trying to implement an autocomplete algorithm for a programming language. I want it to be context aware, meaning that suggestions must appear relative to the statement the user is currently typing.
What is the best way to go about this? What algorithms should I be looking into?
You do not actually need to parse the language to do this.
Assuming you have a list of valid symbols, you need only choose the most likely completions when the user presses the autocomplete key (say, TAB). You can weight the symbols by their frequency in the code. You can also weight by symbol type, giving more weight to variable names than to reserved words. For example, if the user types "th[TAB]" and they have a variable named "themes" which appears 50 times, that might be the top completion, with the reserved word "then" perhaps being second.
To generate the frequency weighting you need to count the number of times each symbol appears in the code. This can be done using a standard string search algorithm.
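As a rough sketch of the frequency weighting described above (the reserved-word list, the penalty value and the function name are made up for illustration, not taken from any real editor):

```python
import re
from collections import Counter

# Hypothetical reserved words of an imaginary language.
RESERVED = {"then", "this", "throw", "if", "else"}

def complete(prefix, source, limit=5, reserved_penalty=0.5):
    """Return up to `limit` symbols starting with `prefix`, best first."""
    # Count every identifier-like token in the code; no parsing involved.
    counts = Counter(re.findall(r"[A-Za-z_]\w*", source))
    # Reserved words are always candidates, even if they are not used yet.
    for word in RESERVED:
        counts.setdefault(word, 0)
    scored = []
    for symbol, count in counts.items():
        if symbol.startswith(prefix) and symbol != prefix:
            # Weight by symbol type: reserved words rank below user symbols.
            weight = reserved_penalty if symbol in RESERVED else 1.0
            scored.append(((count + 1) * weight, symbol))
    # Highest score first, alphabetical order as the tie-breaker.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [symbol for _, symbol in scored[:limit]]
```

With a file that uses "themes" 50 times and "then" once, complete("th", source) puts "themes" first and "then" second.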
If you do have a parser, you can do more fancy things. For example, if you determine all the methods of a class and the user enters the symbol for an instance of a class followed by a period, you can automatically display a list of the methods, because those are the only valid possibilities.
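A minimal sketch of that member lookup, assuming the parser has already produced two tables (both are hard-coded, hypothetical data here):

```python
import re

# Hypothetical output of a parser: methods per class, and the class of each variable.
CLASS_METHODS = {"Theme": ["apply", "load", "save"]}
VARIABLE_TYPES = {"themes": "Theme"}

def members_after_dot(line):
    """Suggest methods when the line ends in '<variable>.<partial name>'."""
    match = re.search(r"(\w+)\.(\w*)$", line)
    if not match:
        return []
    variable, typed = match.groups()
    # Unknown variables or classes simply yield no suggestions.
    methods = CLASS_METHODS.get(VARIABLE_TYPES.get(variable), [])
    return [name for name in methods if name.startswith(typed)]
```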
BTW: How you build the symbol list depends on the language. For example, if it is Java, you can use the built-in reflection (introspection) methods to identify the defined symbols.
You need a state machine that recognizes the grammar of your language. Additionally, the state transitions should be weighted according to their probability.
If the state of your engine is at public static, the weight of the state transition class could be higher than that of abstract. This would be necessary to display a practical number of options as suggestions.
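To make the weighting concrete, here is a small sketch. The transition table and its weights are invented, and the "state" is simply the tuple of tokens typed so far, which is a simplification of a real grammar-driven state machine:

```python
# Hypothetical weighted transitions: state -> {next token: probability-like weight}.
TRANSITIONS = {
    ("public",): {"static": 0.4, "class": 0.3, "abstract": 0.2, "final": 0.1},
    ("public", "static"): {"class": 0.5, "void": 0.3, "final": 0.15, "abstract": 0.05},
}

def suggest(tokens_so_far, typed="", limit=3):
    """Return the most likely next tokens for the current state."""
    options = TRANSITIONS.get(tuple(tokens_so_far), {})
    matches = [(weight, token) for token, weight in options.items()
               if token.startswith(typed)]
    # Highest weight first; the limit keeps the suggestion list practical.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [token for _, token in matches[:limit]]
```

After "public static", this offers "class" ahead of "abstract", and only the top few options are shown.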
I am writing a weather program that calls an API for data. One of the flags available is a preferred language, of which there are about 45 options. This leads me to the question.
What is the most efficient way to display all the language options, then allow user input, then check for valid input?
My best idea is a loop that prints all the options from a file. The user then inputs an option. Their selection is checked against the list to find a match. If there is a match then the program continues. If not, they are prompted again.
Is this the best way to go about this? I'm trying to make this program as efficient and professional looking as possible as I'm using it for my portfolio.
There are always multiple competing goals (single-thread performance, scalability, features, flexibility/extensibility, code readability, fault tolerance). For well-designed code, it's good to understand the importance of each of these goals for each piece of code (and good to understand that their importance can differ between pieces of code in the same project). For this specific piece of code, I'd say that flexibility/extensibility (e.g. the ability to add new languages easily later) is the most important, followed by code readability (the ability to understand the code later, and find/fix bugs in it). The least important are scalability (e.g. how much performance increases when the number of CPUs increases) and then single-thread performance, because the code only needs to run once and is held back by the speed at which a human can type anyway.
In terms of "human computer interaction", the best way is to make it impossible for the user to enter invalid data (e.g. a drop-down list with a well-predicted default to avoid the need for a "not set yet" option). The second best way is "active status": for every "user input event" (key press, mouse click, etc.) a status field corresponding to the input field/control is updated to either indicate that the field/input is in an acceptable state, or give the reason why it's not; it's impossible for the user to continue (e.g. because an "OK" button is disabled) until all of the status fields say that the input is acceptable. With either of these options there is no need to validate the submitted input afterwards.
Sadly; for "command line", it's almost impossible to use the best way and almost impossible to use the 2nd best way.
In other words; you need to forget about performance/efficiency (because that's the least important); and then forget about writing software that is good/user-friendly (because it's command line).
The question then is; what is the "least bad" option? For this; I'd start by assuming that the data for each language is stored in a separate file (or directory?) where the file name is usable for display purposes; and all of the data is in a specific directory (e.g. a "project/lang" directory that contains a "project/lang/UK_English" file, a "project/lang/Spanish" file, etc). In this case you can get a list of files in the "project/lang" directory, sort them in alphabetical order, and use them to display a list of numbered options ("1) Spanish", "2) UK English", ..). Then if/when the user selects an option you can validate it (and report any errors if the user entered a bad character, a number that's too high, etc, then ask the user to retry); and load the right file for whichever language they chose (and report any errors if there's a problem with the file and ask the user to choose something else).
That way; people/translators can just create new files, and none of the code will need to be modified.
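The directory-driven menu described above could look roughly like this in Python. It is only a sketch: the function names are made up, the directory layout is the assumed "one file per language" scheme, and underscores in file names are shown as spaces:

```python
from pathlib import Path

def list_languages(lang_dir):
    """Return (display name, path) pairs for every file, sorted alphabetically."""
    files = [p for p in Path(lang_dir).iterdir() if p.is_file()]
    return sorted((p.name.replace("_", " "), p) for p in files)

def parse_choice(text, option_count):
    """Return a zero-based index, or raise ValueError with a readable reason."""
    text = text.strip()
    if not text.isdecimal():
        raise ValueError(f"'{text}' is not a number")
    number = int(text)
    if not 1 <= number <= option_count:
        raise ValueError(f"choose a number from 1 to {option_count}")
    return number - 1

def choose_language(lang_dir, read=input, write=print):
    """Show a numbered menu and keep asking until the choice is valid."""
    languages = list_languages(lang_dir)
    if not languages:
        raise FileNotFoundError(f"no language files in {lang_dir}")
    for number, (name, _) in enumerate(languages, start=1):
        write(f"{number}) {name}")
    while True:
        try:
            index = parse_choice(read("Enter language choice: "), len(languages))
        except ValueError as error:
            write(f"Invalid choice: {error}")
            continue
        return languages[index][1]
```

Passing read and write as parameters is just a convenience so the prompt loop can be exercised without a real terminal.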
For a comparison; the fastest way is to use constant strings (e.g. puts("1) Spanish\n2) UK English\n\nEnter language choice:")); and to predict what the user will choose (e.g. based on keeping track of what they chose last time) and "pre-fetch and pre-parse" in the background (so that hopefully all the work is done for the correct choice before the user actually makes a choice), with the ability to quickly cancel the "pre-fetch and pre-parse" work if the user makes a choice that wasn't predicted. This would be extremely good for performance (likely "instant") but extremely bad (inflexible, over-complicated, too hard to maintain).
I have a kind of Q&A site (very approximately) where users enter questions to be answered by our staff. I am quite concerned about users posting non-questions, which are an annoyance. The best I've thought of so far is a system that detects whether the text is in Italian (our users' language) and, if it is, checks it against a list of common copypastas.
So, long story short: users will input some text, I have to make sure it's a proper question in Italian and not random characters.
Not sure what language you'll be writing this in, but these may help:
http://www.easywayserver.com/blog/java-string-contains-example/
How do I check if a string contains a specific word in PHP?
Checking if the input String (Question) contains any forbidden word would be one way to go at it.
Pseudo code
ListOfForbiddenWords;
if Language = Italian
    if Input does not contain any of ListOfForbiddenWords
        //It's fine
    else
        //Don't spam
else
    //You're not Italian
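The same logic as a small runnable Python sketch. The forbidden-word list is a placeholder, and naive_is_italian is only a crude stand-in (it counts a few common Italian words); a real language detector would be plugged in through the is_italian parameter:

```python
import re

# Placeholder list; a real one would hold words typical of known copypastas.
FORBIDDEN_WORDS = {"lorem", "ipsum"}

def naive_is_italian(text):
    """Crude stand-in detector: at least two very common Italian words."""
    common = {"il", "la", "che", "di", "non", "come", "per", "è"}
    words = re.findall(r"\w+", text.lower())
    return sum(word in common for word in words) >= 2

def check_question(text, is_italian=naive_is_italian):
    """Return 'ok', 'spam' or 'not italian', mirroring the pseudo code."""
    if not is_italian(text):
        return "not italian"
    words = set(re.findall(r"\w+", text.lower()))
    if words & FORBIDDEN_WORDS:
        return "spam"
    return "ok"
```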
I'm not quite sure what the best way is to check whether a string is written in a specific language.
You can use Rosoka's language detection if you want a commercial option.
You can try it out at Rosoka Cloud for about $1/hour with all of the features. The language ID is available as a standalone library, so you can feed it example inputs that you are concerned about to see if it gives back what you want.
Random text like "jgujqkwfjpihoujlkfa" will be flagged as ROMANIZATION, or, if it is non-ASCII, with a tag based on the underlying code blocks that were used. In other words, input that is not a language will not be tagged as a language.
There are many free language detection libraries. One popular example is libexttextcat from LibreOffice. There are many clones and ports and variants if you don't want a C library; see e.g. http://odur.let.rug.nl/vannoord/TextCat/competitors.html for an (incomplete, slightly dated) list of pointers.
A similar question was asked here a while ago and the answers listed a number of language detection API solutions. One of the answers points to detectlanguage.com which offers up a limited free language detection service.
Recently I had an interesting conversation with a fellow developer who told me that every time you write a "try" it is mandatory to provide a "catch". He could not explain the reason for this rule; he only told me that it was a principle of good programming. Why does this rule exist?
For your information, I don't agree with him. I think that sometimes you can write a "try" block with only a "finally" block. But I do think that if you write a "catch", you must do something in it. Never just re-throw the error.
You're right: you don't need to write a catch clause if you don't know what to do with the exception and just want to ensure your finally clause is executed.
It's bad practice to add a catch clause just to rethrow the exception.
As an aside, to illustrate that catch and finally in fact address two different (admittedly not unrelated) problems, note that some languages use different constructs for catching exceptions and for ensuring that some code (usually resource release) is executed. Go uses defer, for example.
In most applications try/finally constructs heavily outnumber try/catch constructs, because it's much more common to have resources to clean up than it is to receive an exception you know how to handle.
However, try/finally is nearly always replaceable by a using statement in C#, so in C# your developer might have a point in that case; but it most definitely isn't "a principle of good programming".
try
{
...
}
finally
{
...
}
This gives you the opportunity to execute code in the finally block that would otherwise be skipped if an exception were thrown in the try block.
You only need to add a catch block if you have something specific to do when an exception occurs.
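Python's try/finally has the same shape, so here is a small runnable sketch of the point (read_resource and the events list are invented for the demonstration). The cleanup runs on both paths, and the exception still reaches a caller that knows what to do with it:

```python
events = []

def read_resource(fail):
    events.append("open")
    try:
        if fail:
            raise ValueError("read failed")
        return "data"
    finally:
        # Runs on the normal path and on the exceptional path; no catch needed.
        events.append("close")

read_resource(fail=False)
try:
    read_resource(fail=True)
except ValueError:
    # Only the caller, which has something specific to do, catches.
    events.append("handled by caller")
```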
I'm developing an OpenGL ES 2.0 application with C++.
I want to show my blender's models using OpenGL but I don't know which is the easiest format to load with OpenGL ES 2.0.
I've been trying the Wavefront OBJ format, but I can't figure out how to unpack the vertices and how to obtain the values for glDrawElements' last parameter.
Do you know an easier format?
Thanks.
OBJ is a pretty easy format. You can see the spec at http://www.martinreddy.net/gfx/3d/OBJ.spec
You do the loading yourself, of course. You read the .obj file and build the vertex list yourself. Faces are essentially lists of vertex indices.
Be careful, though: OpenGL ES 2.0 cannot render polygons other than triangles, so your obj files must not contain any other polygons or you must convert those yourself.
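A minimal loader sketch along those lines, written in Python for brevity (the same steps carry over to C++). It reads only "v" and "f" lines, keeps just the position index of each face corner, and splits larger polygons into a triangle fan, which is only correct for convex faces. Negative (relative) indices are not handled:

```python
def load_obj(text):
    """Parse OBJ text into (vertices, flat list of triangle indices)."""
    vertices = []  # one (x, y, z) tuple per "v" line
    indices = []   # three zero-based indices per triangle
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "f":
            # A corner is "v", "v/vt", "v//vn" or "v/vt/vn"; OBJ indices start at 1.
            corners = [int(p.split("/")[0]) - 1 for p in parts[1:]]
            # Fan triangulation: (0, i, i + 1) for each remaining pair of corners.
            for i in range(1, len(corners) - 1):
                indices.extend((corners[0], corners[i], corners[i + 1]))
    return vertices, indices
```

The flat index list is the kind of data you would upload and hand to glDrawElements.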
I just tried a couple of formats.
It looks like the PLY format (you might have to enable that export format in the user preferences) exports the model with only one index array, so you don't need multiple index arrays as with the Wavefront OBJ format, which is very difficult to handle with OpenGL. See rendering-meshes-with-multiple-indices.
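If you do stay with OBJ, one common workaround is to collapse its separate position/texcoord/normal indices into a single index array by treating each distinct triple as its own vertex. A sketch of that idea (the function name is made up):

```python
def unify_indices(corners):
    """corners: (v, vt, vn) index triples, one per face corner.

    Returns (unique_triples, indices), where indices[i] points into
    unique_triples, so a single index array can drive all attributes.
    """
    seen = {}
    unique = []
    indices = []
    for corner in corners:
        if corner not in seen:
            # First time this exact combination appears: it becomes a new vertex.
            seen[corner] = len(unique)
            unique.append(corner)
        indices.append(seen[corner])
    return unique, indices
```

You would then build the actual vertex buffer by looking up each triple in the position, texcoord and normal lists.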
Any programming language that does not have a suitable reflection mechanism I find seriously debilitating for rapidly changing problems.
It seems that with certain languages it's incredibly hard or impossible to do the following without reflection:
Convention over Configuration
Automatic Databinding
AOP / Meta programming
Some example languages that do not have some sort of programmatic reflection are:
C, C++, Haskell, OCaml. I'm sure there are plenty more.
One example of DRY (Don't Repeat Yourself) being violated in most of these languages is writing unit tests. You almost always need to register your test cases somewhere other than where you define them.
How do programmers of these languages mitigate this problem?
EDIT: Common languages that do have reflection for those that do not know are: C#, Java, Python, Ruby, and my personal favorite F# and Scala.
EDIT: The two common approaches it seems are code instrumentation and code generation. However I have never seen instrumentation for C.
Instead of just voting to close, could someone please comment on why this should be closed, and I'll delete the post.
You don't.
But you can keep the repetitions close to each other so when changing something, you see something else has to be changed too.
For example, I wrote a JSON parser that outputs objects; a typical type it can fill in looks like this:
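struct SomeStruct
{
    int a;
    int b;
    double c;

    // Tag typedef (presumably how the parser recognizes serializable types).
    typedef int serializable;

    // Each field is named once more here, right next to its declaration.
    template<class SerializerT> void serialize(SerializerT& s)
    {
        s("a",a)("b",b)("c",c);
    }
};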
Sure, when you add a field, you have to add another field in the function, but maybe you don't want to serialize that field (something you'd have to handle in languages with reflection, too), and if you delete a field without removing it from the function, the compiler will complain.
I think it's a matter of degree. Reflection is just one very powerful method of avoiding repetition.
Any time you generalize a function from a specific case you are applying the DRY principle; the more general you make it, the more DRY it is. Just because some languages don't get you as far as reflection does doesn't mean there aren't DRY ways of programming with them. They may not be as DRY, but they have their own unique advantages, which in sum may outweigh the advantages of using a language that has reflection. (For example, the speed cost of heavy use of reflection could be a consideration.)
Also, one method of getting something like the DRY benefits of reflection with a language that doesn't support it is by using a good code-generation tool. In that case you modify the code for different cases once, in the code generation template, and the template pushes it out to different instances in code. (I'm not saying whether or not using code generation is a good thing, but with a good "active" generator it is certainly one way of getting something like the DRY benefit of reflection in a language that doesn't have reflection. And the benefits of code generation go beyond this simple benefit. I'm thinking of something like CodeSmith, although there are many others: http://www.codesmithtools.com/ )
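As a toy illustration of the template idea (this is not how CodeSmith works, just a hand-rolled sketch with made-up names), a few lines of Python can emit a C++ struct and its serialize function from one field list, so each field is written down only once:

```python
from string import Template

# Hypothetical template for a C++ struct with a visitor-style serialize method.
STRUCT_TEMPLATE = Template("""\
struct $name
{
$members
    template<class SerializerT> void serialize(SerializerT& s)
    {
        s$calls;
    }
};
""")

def generate_struct(name, fields):
    """fields: list of (C++ type, field name) pairs, the single source of truth."""
    members = "\n".join(f"    {ctype} {field};" for ctype, field in fields)
    calls = "".join(f'("{field}",{field})' for _, field in fields)
    return STRUCT_TEMPLATE.substitute(name=name, members=members, calls=calls)
```

Adding a field then means editing the field list and regenerating, rather than touching two places in the C++ source.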
Abstractly, do more at runtime, without the benefits of things like compile-time type checking (you have to essentially write your own type-checking routines) and beautiful code. E.g., use a table instead of a class. (But if you did this, why not use a dynamically-typed language instead?) This is often bad. I do not recommend this.
In C++, generic programming techniques allow you to programmatically include members of a class (is that what you want to do?) via inheritance.
One nice example for C++ unit testing is cxxtest:
http://cxxtest.tigris.org/. It relies on a naming convention and a Python script that post-processes your C++ test code to generate the test suite runner.
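To show the general idea of convention-based generation (a simplified sketch, not cxxtest's actual script or output format), the following scans a header for methods whose names start with "test" and writes a main() that calls each of them, so tests never have to be registered by hand:

```python
import re

def generate_runner(header_name, header_text):
    """Emit C++ source that runs every 'void test...()' method of the first class."""
    class_name = re.search(r"\bclass\s+(\w+)", header_text).group(1)
    # The convention: any void method whose name starts with "test" is a test.
    tests = re.findall(r"\bvoid\s+(test\w*)\s*\(", header_text)
    lines = [f'#include "{header_name}"', "", "int main()", "{",
             f"    {class_name} suite;"]
    lines += [f"    suite.{name}();" for name in tests]
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"
```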
A good way to think about getting around restrictions in languages is Michael Feathers' notion of "seams". A seam is a place where your program's behavior can be changed without changing the code. For example, in C the preprocessor and linker provide seams. In C++, polymorphism is another. In more dynamic languages, where you can change method definitions or use reflection, you get even more flexibility. Without seams, things can be more complicated, and sometimes you just don't want to try to hammer a nail with your shoe but would rather go with the flow of the tool at hand.