What is an Object Oriented Scanner (Lexical Analysis)?

Recently I came across the concept in lexical analysis of an "object oriented scanner", but I wasn't able to distinguish it from the normal scanning technique. What extra things does an object oriented scanner provide? Please help me understand this concept: what does object orientation mean in terms of lexical analysis?

This should interest you; it is the theory behind oolex.
You should also look into the theory behind object oriented scanners, as in the proceedings of CompSysTech 2000 (CompSysTech '00: Proceedings of the Conference on Computer Systems and Technologies), where "From structure oriented to object oriented scanner design" was published. You will need an ACM membership if you can't find it in your university's library, but you can check this and this in the meantime. The last link is some Java code with appropriate explanations.
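For a flavor of what object orientation can mean in a scanner, here is a minimal Python sketch (my own illustration, not oolex's actual design): each token category is an object with a common interface, so the scanner is extended by adding objects rather than by editing a monolithic switch or table.

    import re

    # Hedged sketch: token categories as objects sharing one interface.
    class TokenClass:
        def __init__(self, name, pattern):
            self.name = name
            self.regex = re.compile(pattern)

        def match(self, text, pos):
            m = self.regex.match(text, pos)
            return (self.name, m.group(), m.end()) if m else None

    # Extending the scanner means adding an object, not editing a switch.
    token_classes = [
        TokenClass("NUMBER", r"\d+"),
        TokenClass("IDENT",  r"[A-Za-z_]\w*"),
        TokenClass("OP",     r"[+\-*/=]"),
        TokenClass("SKIP",   r"\s+"),
    ]

    def scan(text):
        pos, tokens = 0, []
        while pos < len(text):
            for tc in token_classes:
                result = tc.match(text, pos)
                if result:
                    name, lexeme, pos = result
                    if name != "SKIP":
                        tokens.append((name, lexeme))
                    break
            else:
                raise ValueError(f"unexpected character at {pos}")
        return tokens

    print(scan("x = 42 + y"))  # [('IDENT', 'x'), ('OP', '='), ...]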

Related

Is a neural network a lazy or eager learning method?

Is a neural network a lazy or eager learning method? Different web pages say different things so I want to get a solid answer with good literature to back it up. The most obvious book to look in would be Mitchell's famous Machine Learning book but skimming through the whole thing I can't see the answer. Thanks :).
Looking at the definitions of lazy and eager learning, and knowing how a neural network works, I believe it is clearly eager. A trained network is a generalisation function: all the weights and paths used to arrive at a classification are entirely determined by the training data, but the training data itself is not retained for the purposes of decision making.
An important distinction is that a lazy system stores its training data and uses it directly to determine a solution, while an eager system derives a function from the training data, after which the training data is no longer required. That is to say, you cannot determine what the training data was from an eager system's function. A neural network certainly fits that description. An eager system can therefore be very storage efficient, but it is conversely opaque, in the sense that it is not possible to determine how or why it arrived at a particular solution, so problems caused by poor or inappropriate training data may be difficult to deal with.
The eager article linked above even gives artificial neural networks as an example. You might of course prefer a cited text to Wikipedia, but the page has existed with that assertion since 2007 without contradictory edits, so I'd say it's pretty robust.
Some neural networks are eager learners, and some are lazy. Feedforward neural networks (as are commonly trained by some variant of backpropagation) are eager: they attempt to derive a representation of the underlying relationships in the data at the time of training. Radial basis function networks (such as probabilistic NN or generalized regression NN), on the other hand, are lazy learners (very much like k-nearest neighbors, the classic lazy learner).
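To see the difference in code, here is a small scikit-learn sketch (my illustration, assuming scikit-learn is available; the XOR data is arbitrary): the eager network does its work in fit() and keeps only weights, while the lazy k-NN's fit() just stores the data.

    from sklearn.neural_network import MLPClassifier
    from sklearn.neighbors import KNeighborsClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]  # XOR

    # Eager: fit() does the work up front; predictions use only the
    # learned weights, and X could be thrown away afterwards.
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000,
                        random_state=0).fit(X, y)

    # Lazy: fit() essentially just stores X; the real work (finding
    # neighbours) happens at predict() time.
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

    print(net.predict([[0, 1]]), knn.predict([[0, 1]]))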
A neural network is generally considered to be an "eager" learning method.
"Eager" learning methods are models that learn from the training data in real-time, adjusting the model parameters as new examples are presented. Neural networks are an example of an eager learning method because the model parameters are updated during the training process, as the algorithm iteratively processes the training examples. This allows the model to adapt and improve its performance as more examples are seen.
On the other hand, "lazy" learning methods, also known as instance-based or memory-based learning, only learn from the training data when a new example is presented. The model does not update its parameters during the training process but instead, it memorizes the training data and uses it to make predictions. Lazy learning methods typically require less computation time to make predictions than eager learning methods, but they may not perform as well on unseen data.
In general, neural networks are considered eager learning methods because their parameters are updated during the training process.
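For a concrete picture of the lazy side of that distinction, here is a from-scratch toy (my own sketch, not drawn from the references below): the "model" is literally the stored training set, and all the computation happens at query time.

    import math

    # Toy 1-nearest-neighbour: training is just storing the data (lazy);
    # each prediction scans the stored examples.
    class OneNN:
        def fit(self, X, y):
            self.X, self.y = X, y          # memorize, nothing else
            return self

        def predict(self, x):
            dists = [math.dist(x, xi) for xi in self.X]
            return self.y[dists.index(min(dists))]

    clf = OneNN().fit([[0, 0], [5, 5]], ["a", "b"])
    print(clf.predict([1, 0]))  # 'a'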
Here are a few literature references:
"Eager Learning vs. Lazy Learning" by R. S. Michalski, J. G. Carbonell, and T. M. Mitchell. This paper provides a comprehensive overview of the distinction between eager and lazy learning, and discusses the strengths and weaknesses of each approach. It was published in Machine Learning, 1983.
"An overview of instance-based learning algorithms" by A. K. Jain and R. C. Dubes. This book chapter provides an overview of the main concepts and techniques used in instance-based or lazy learning, and compares them to other types of learning algorithms, such as decision trees and neural networks. It was published in "Algorithms for Clustering Data" by Prentice-Hall, Inc. in 1988.
"Machine Learning" by Tom Mitchell. This book provides a comprehensive introduction to the field of machine learning, and its chapter on instance-based learning explicitly contrasts lazy and eager methods. It covers a wide range of topics, from decision trees and neural networks to Bayesian and reinforcement learning. It was published by McGraw-Hill in 1997.
"Introduction to Machine Learning" by Alpaydin, E. This book provides an introduction to the field of machine learning, including the concepts of eager and lazy learning, as well as a broad range of machine learning algorithms. It was published by MIT press in 2010
It's also worth noting that the classification into lazy and eager learning is not always clear-cut and can be somewhat subjective, and some algorithms can belong to either category depending on the specific implementation.

What is the best approach of creating a talking bot?

When creating an AI talking bot, what kind of design methods should I use? Should it be one function or multiple modules? Should it have classes?
Understanding language is complicated, so the goal you need to determine first is what aspect of language you want to understand.
An AI must be able to understand what the person says to it, then relate it to what it already knows, and then generate a legitimate response.
These three steps can all be thought of as nearly independent, so you need to address each on its own.
The brain, the world's best language processor, uses a neural network, but that's not likely to work well for you.
A logic-based proof-solving system, in which facts that follow from other facts are derived, would probably work best, and I know of at least one system that uses this approach fairly effectively.
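As a skeleton of that three-stage decomposition (an entirely hypothetical structure, just to show how the stages stay independent):

    # Hypothetical three-stage pipeline: understand -> relate -> respond.
    # Each stage can be developed and swapped independently.
    def understand(utterance: str) -> dict:
        # Parse the user's words into some structured meaning.
        return {"intent": "greet"} if "hello" in utterance.lower() else {"intent": "unknown"}

    def relate(meaning: dict, knowledge: dict) -> dict:
        # Connect the meaning to what the bot already knows.
        return {"reply_topic": knowledge.get(meaning["intent"], "fallback")}

    def respond(thought: dict) -> str:
        # Generate a legitimate response from the chosen topic.
        return {"greeting": "Hello there!", "fallback": "Tell me more."}[thought["reply_topic"]]

    knowledge = {"greet": "greeting"}
    print(respond(relate(understand("Hello, bot"), knowledge)))  # Hello there!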
I'd start with an existing AI program (like the famous Eliza) and run its output through a speech synthesizer.
Some source code for Eliza is available here. One open source speech synthesizer is FreeTTS.
If you're using a language other than Java, there are similar candidate AI bots and text-to-speech libraries out there.
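To give a feel for how little machinery the Eliza approach needs, here is a minimal pattern-and-reflection toy in Python (in the spirit of Eliza, not its actual source):

    import re

    # Eliza-style bot: match a pattern, reflect pronouns, fill a template.
    REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}
    RULES = [
        (re.compile(r"i am (.*)", re.I), "Why do you say you are {0}?"),
        (re.compile(r"i feel (.*)", re.I), "What makes you feel {0}?"),
        (re.compile(r".*"), "Please, tell me more."),
    ]

    def reflect(fragment):
        return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

    def eliza(utterance):
        for pattern, template in RULES:
            m = pattern.match(utterance)
            if m:
                return template.format(*(reflect(g) for g in m.groups()))

    print(eliza("I am worried about my exams"))
    # Why do you say you are worried about your exams?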
I've started to do some work in this space using this open source project called Talkify:
https://github.com/manthanhd/talkify
It is a bot framework intended to help orchestrate the flow of information between bot providers like Microsoft (Skype), Facebook (Messenger), etc., and your backend services. The framework doesn't provide implementations for the bot providers yet, but it does provide hooks into its natural language recognition engine.
The built-in natural language recognition library can be used to classify sentences into topics, which you can then map to skill functions.
Give it a try! I'd really like people's input to see how it can be improved.
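The topic-to-skill idea is easy to sketch independently of Talkify (the names below are hypothetical, not Talkify's API): classify an utterance to a topic, then dispatch to the skill function registered for that topic.

    # Hypothetical topic -> skill dispatch, not Talkify's actual API.
    skills = {}

    def skill(topic):
        def register(fn):
            skills[topic] = fn
            return fn
        return register

    @skill("weather")
    def weather_skill(utterance):
        return "It looks sunny today."

    @skill("greeting")
    def greeting_skill(utterance):
        return "Hi! How can I help?"

    def classify(utterance):
        # Stand-in for a real NL classifier: crude keyword matching.
        return "weather" if "weather" in utterance.lower() else "greeting"

    def handle(utterance):
        return skills[classify(utterance)](utterance)

    print(handle("What's the weather like?"))  # It looks sunny today.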

How to generate a sequence diagram for my native (C, C++) code?

I would like to know how to generate a sequence diagram for my native (C, C++) code. I have written my C code using the vim editor.
First of all, a sequence diagram is an object oriented concept. It is meant to convey, at a glance, the message passing between objects in an object oriented program in sequential fashion, which helps the reader understand the time-ordered interaction between the objects. As such, it does not make sense to talk about sequence diagrams in the context of a procedural language like C.
When it comes to C++, sequence diagrams are defined in the general sense by the UML specification, which is the same for all object oriented languages. UML is a higher-level concept than source code, one that looks the same regardless of language, and the process of converting source code to UML is called code reverse engineering. There are tools that can convert source code in Java, C++ and other languages into UML diagrams showing the relationships between classes, like Enterprise Architect, Visual Paradigm and IBM Rational Software Architect.
A sequence diagram, however, is a special kind of UML diagram, and it turns out that reverse engineering a sequence diagram is quite challenging. If you wanted to generate one through static analysis, one of the first questions you would have to answer is whether, given two objects and a message passed between them, a result is ever returned. That means analyzing the method's algorithm and deciding whether it loops forever or returns. This is the halting problem, which has been proven undecidable, so a sequence diagram produced through static analysis must sacrifice accuracy.
Dynamic analysis works by actually running the code and mapping the interactions between the objects at run time. This presents its own challenges: first, you have to instrument the code; then, filtering the interactions you care about out of library calls, system calls and other fluff in the code is not doable without user intervention.
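To see why static analysis is forced to approximate, consider this tiny Python example (a standalone illustration, not tied to any tool): whether the function below returns for every positive n is the open Collatz conjecture, so no analyzer can decide in general whether this "message" ever produces a reply.

    # Whether collatz(n) terminates for all positive n is an open problem
    # (the Collatz conjecture); a static analyzer asking "does this call
    # ever return a result?" must therefore approximate.
    def collatz(n: int) -> int:
        steps = 0
        while n != 1:
            n = 3 * n + 1 if n % 2 else n // 2
            steps += 1
        return steps

    print(collatz(27))  # 111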
This is not to say that creating a tool that would produce usable sequence diagrams is not possible, but the market interest has apparently not been strong enough to justify the effort, and apart from a few research papers on the subject, like CPP2XMI, I'm not aware of any commercially available tools to reverse engineer C++ into sequence diagrams.
Compounding the problem is the fact that C++ is one of the most complex object oriented languages around, so even if somebody devised a good way of reverse engineering sequence diagrams, C++ would be the last language to receive the treatment. Case in point: Visual Paradigm offers rudimentary support for reversing Java code into sequence diagrams, but not for C++.
Even if such a tool existed for C++, the sad truth is that if your C++ code is complex enough that you would rather use a tool to create a sequence diagram than do it manually, it is most likely too complex for the tool to give you anything useful, and you would have to fix the diagram up yourself anyway.
You can try CppDepend, which provides a dependency graph and a dependency matrix for exploring the dependencies between directories, files and functions.
Have you tried PlantUML? It works really well with Doxygen. I use it at work with the company template, and the syntax is really easy, though you have to write the call sequence yourself. There are plenty of examples on its page. If you are working on Linux you can install it with your native packaging tool, and the same applies to Doxygen (e.g. sudo apt-get install plantuml). If you are using Windows, you can use the installers from the official pages instead.
You'll have to do some configuration, but it's pretty straightforward. I'll leave you the links to each tool.
Download pages:
http://plantuml.com/download
http://www.doxygen.nl/download.html
Plantuml examples:
http://plantuml.com/sequence-diagram
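For instance, a minimal hand-written call sequence in PlantUML's own syntax (a generic illustration with made-up participant names, in the style of the examples above) looks like this:

    @startuml
    participant Client
    participant Server
    Client -> Server: openConnection()
    Client -> Server: request(data)
    Server --> Client: response
    @enduml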
You can find the documentation on each page. PlantUML ships as a Java executable (.jar), so you don't have to install anything; you just need to configure Doxygen to find the executable. You can find out how in the Doxygen documentation:
http://www.doxygen.nl/manual/index.html
If you want to configure it without reading the documentation you could also watch this video:
https://www.youtube.com/watch?v=LZ5E4vEhsKs
I hope this helps, cheers.
You could also explore trace2uml, which works with Doxygen.

Applications for the Church Programming Language

Has anyone worked with the programming language Church? Can anyone recommend practical applications? I just discovered it, and while it sounds like it addresses some long-standing problems in AI and machine-learning, I'm skeptical. I had never heard of it, and was surprised to find it's actually been around for a few years, having been announced in the paper Church: a language for generative models.
I'm not sure what to say about the matter of practical applications. Does modeling cognitive abilities with generative models constitute a "practical application" in your mind?
The key importance of Church (at least right now) is that it gives those of us working on probabilistic inference solutions to AI problems a simpler way to model. It's essentially a subset of Lisp.
I disagree with Chris S that it is at all a toy language. While some of these inference problems can be replicated in other languages (I've built several in Matlab), the results generally aren't very reusable, and you really have to love working four and five for-loops deep (I hate it).
Instead of tackling the problem that way, Church uses the recursive advantages of the lambda calculus and also provides something called memoization, which is really useful for generative models, since your generative model is often not the same from one trial to the next, though for testing you really need it to be.
I would say that if what you're doing has anything to do with Bayesian Networks, Hierarchical Bayesian Models, probabilistic solutions to POMDPs or Dynamic Bayesian Networks then I think Church is a great help. For what it's worth, I've worked with both Noah and Josh (two of Church's authors) and no one has a better handle on probabilistic inference right now (IMHO).
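To make the memoization point concrete, here is a tiny Python sketch of the idea (my own illustration of stochastic memoization, not Church code; the eye-color example is the kind used in Church tutorials): a random choice is drawn once per argument and then reused, so repeated queries within one trial stay consistent.

    import random

    # Hedged sketch of Church-style stochastic memoization: the first call
    # for a given argument samples a value; later calls reuse that sample.
    def stochastic_mem(f):
        cache = {}
        def wrapper(*args):
            if args not in cache:
                cache[args] = f(*args)
            return cache[args]
        return wrapper

    @stochastic_mem
    def eye_color(person):
        return random.choice(["blue", "brown", "green"])

    print(eye_color("alice"))  # sampled once...
    print(eye_color("alice"))  # ...and reused: same answer within this run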
Church is part of the family of probabilistic programming languages, which allow the estimation of a model to be separated from its definition. This makes probabilistic modeling and inference much more accessible to people who want to apply machine learning but are not themselves hardcore machine learning researchers.
For a long time, probabilistic modeling meant you had to come up with a model for your data and derive the estimation of that model yourself: you have some observed values, and you want to learn the parameters. The structure of the model is closely related to how you estimate the parameters, and you needed pretty advanced knowledge of machine learning to do the computations correctly. Recent probabilistic programming languages are an attempt to address that and make things more accessible for data scientists and others applying machine learning.
As an analogy, consider the following:
You are a programmer and you want to run some code on a computer. Back in the 1970s, you had to write assembly language on punch cards and feed them into a mainframe (on which you had to book time) in order to run your program. It is now 2014, and there are high-level, simple-to-learn languages that you can write code in even with no knowledge of how computer architecture works. It's still helpful to understand how computers work when writing in those languages, but you don't have to, and many more people write code than would if you still had to program with punch cards.
Probabilistic programming languages do the same for machine learning with statistical models. Also, Church isn't the only choice for this. If you aren't a functional programming devotee, you can also check out the following frameworks for Bayesian inference in graphical models:
Infer.NET, written in C# by the Microsoft Research lab in Cambridge, UK
Stan, written in C++ by the statistics department at Columbia
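To illustrate the separation these languages provide (a hand-rolled Python toy, not the actual API of Church, Infer.NET, or Stan): the model is written as a generative function, and a generic inference routine, here naive rejection sampling, conditions it on the observed data.

    import random

    # Model definition: a generative story, written with no reference to
    # how it will be estimated.
    def flip(p=0.5):
        return random.random() < p

    def model():
        coin_is_fair = flip(0.5)             # latent variable
        p_heads = 0.5 if coin_is_fair else 0.9
        observations = [flip(p_heads) for _ in range(5)]
        return coin_is_fair, observations

    # Generic inference: rejection sampling, usable for any model of this
    # shape. Here we condition on having seen five heads in a row.
    def posterior(model, condition, samples=100_000):
        kept = [latent for latent, obs in (model() for _ in range(samples))
                if condition(obs)]
        return sum(kept) / len(kept)

    print(posterior(model, lambda obs: all(obs)))  # P(fair | 5 heads), ~0.05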
You know what does a better job of describing Church than what I said? This MIT article: http://web.mit.edu/newsoffice/2010/ai-unification.html
It's slightly more hyperbolic, but then, I'm not immune to the optimism present in this article.
Likely, the article was intended to be published on April Fool's Day.
Here's another article, dated late March of last year: http://dspace.mit.edu/handle/1721.1/44963

What programming language is used to IMPLEMENT Google's algorithm?

It is known that Google has the best search and indexing algorithms.
They also have good relevancy.
They are also quicker at surfacing the latest results.
All that's fine.
What programming language (C, C++, Java, etc.) and database (Oracle, MySQL, etc.) have they used to achieve this (since they have to manipulate a large volume of data quickly and effectively)?
Though I'm not looking for their in-depth architecture (in case that violates their company policies), an overview of all such things could be useful.
Would anybody please add your valuable suggestions and insight on this?
Google internally uses C++, Java and Python. See Rhino on Rails:
One of the (hundreds of) cool things about working for Google is that they let teams experiment, as long as it's done within certain broad and well-defined boundaries. One of the fences in this big playground is your choice of programming language. You have to play inside the fence defined by C++, Java, Python, and JavaScript.
Google's search pipeline is essentially built on MapReduce, which stems from functional programming techniques and is implemented in C++.
Google has its own storage mechanism for this called the Google File System.
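As a sketch of the MapReduce programming model itself (a single-process Python toy, nothing like Google's distributed C++ implementation): a map function emits key-value pairs, the framework groups them by key, and a reduce function folds each group.

    from collections import defaultdict

    # Toy MapReduce word count: map emits (word, 1), shuffle groups by
    # key, reduce sums each group. The real system distributes these
    # phases across many machines.
    def map_phase(doc):
        for word in doc.split():
            yield word.lower(), 1

    def run_mapreduce(docs, mapper, reducer):
        groups = defaultdict(list)
        for doc in docs:                      # map + shuffle
            for key, value in mapper(doc):
                groups[key].append(value)
        return {key: reducer(values) for key, values in groups.items()}

    counts = run_mapreduce(["the web the index", "the crawl"],
                           map_phase, sum)
    print(counts)  # {'the': 3, 'web': 1, 'index': 1, 'crawl': 1}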
Mainly pigeons:
PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.
The relevance of search results is governed by the quality of the information retrieval algorithms they use, not by the programming language.
But C++ is what most of their backend code is written in (for most services).
They don't use any off-the-shelf RDBMS products for data storage; all of that is written in-house.
Check out Bigtable.
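The Bigtable paper describes the data model as a sparse, sorted, multidimensional map from (row key, column, timestamp) to an uninterpreted string of bytes. A toy Python sketch of just that shape (the row and columns below echo the paper's example; everything else is illustrative):

    # Hedged sketch of the Bigtable data model: a sparse map keyed by
    # (row, "family:qualifier", timestamp) -> bytes. Real Bigtable shards
    # rows into tablets across many servers; this is just the shape.
    table = {}

    def put(row, column, timestamp, value):
        table[(row, column, timestamp)] = value

    put("com.cnn.www", "contents:", 3, b"<html>...")
    put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")

    # Reads return the most recent version of a cell.
    def get_latest(row, column):
        versions = [(ts, v) for (r, c, ts), v in table.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    print(get_latest("com.cnn.www", "anchor:cnnsi.com"))  # b'CNN'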
