I've been working on a project that combines an application server and an object database, and currently runs on a single machine only. Some time ago I read a paper describing a distributed relational database, and got some ideas on how to apply them to my project, so that I could make a high-availability version of it running on a cluster with a shared-nothing architecture.
My problem is that I don't have experience in designing distributed systems and their protocols - I did not take the advanced CS courses on distributed systems at university. So I'm worried about being able to design a protocol that does not cause deadlock, starvation, split brain and other problems.
Question: Where can I find good material about designing distributed systems? What methods are there for verifying that a distributed protocol works correctly? Recommendations for books, academic articles and other resources are welcome.
I learned a lot by looking at what is published about really huge web-based platforms, and especially how their systems evolved over time to meet their growth.
Here are some examples I found enlightening:
eBay Architecture: Nice history of their architecture and the issues they had. Obviously they can't use a lot of caching for the auctions and bids, so their story differs from many others on that point. As of 2006, they deployed 100,000 new lines of code every two weeks - and were able to roll back an ongoing deployment if issues arose.
Paper on the Google File System: Nice analysis of what they needed, how they implemented it and how it performs in production use. After reading this, I found it less scary to build parts of the infrastructure myself to meet exactly my needs, if necessary, and such a solution can and probably should be quite simple and straightforward. There is also a lot of interesting material on the net (including YouTube videos) on BigTable and MapReduce, other important parts of Google's architecture.
Inside MySpace: One of the few really huge sites built on the Microsoft stack. You can learn a lot about what not to do with your data layer.
A great start for finding many more resources on this topic is the Real Life Architectures section on the "High Scalability" website. For example, they have a good summary of Amazon's architecture.
Learning distributed computing isn't easy. It's really a very vast field covering communication, security, reliability, concurrency etc., each of which would take years to master. Understanding will eventually come through a lot of reading and practical experience. You seem to have a challenging project to start with, so here's your chance :)
The two most popular books on distributed computing are, I believe:
1) Distributed Systems: Concepts and Design - George Coulouris et al.
2) Distributed Systems: Principles and Paradigms - A. S. Tanenbaum and M. Van Steen
Both these books give a very good introduction to current approaches (including communication protocols) that are being used to build successful distributed systems. I've personally used mostly the latter and I've found it to be an excellent text. If you think the reviews on Amazon aren't very good, it's because most readers compare this book to other books written by A. S. Tanenbaum (who IMO is one of the best authors in the field of computer science), which are quite frankly better written.
PS: I really question your need to design and verify a new protocol. If you are working with application servers and databases, what you need is probably already available.
I liked the book Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen.
At a more abstract and formal level, Communicating and Mobile Systems: The Pi-Calculus by Robin Milner gives a calculus for verifying systems. There are variants of pi-calculus for verifying protocols, such as SPI-calculus (the wikipedia page for which has disappeared since I last looked), and implementations, some of which are also verification tools.
Where can I find good material about designing distributed systems?
I have never been able to finish the famous book by Nancy Lynch. However, I find the book by Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, much easier to read, and it points to the original papers when needed.
I haven't read the books by Gerard Tel and Nicola Santoro, though. Perhaps they are even easier to read...
What methods are there for verifying that a distributed protocol works correctly?
To survey the possibilities (and also to understand the question), I think it is useful to get an overview of the available tools from the book Software Specification Methods.
My final decision was to learn TLA+. Why? Even if the language and tools already seem better than the alternatives, I really decided to try TLA+ because the person behind it is Leslie Lamport. That is, not just a prominent figure in distributed systems, but also the author of LaTeX!
You can get the TLA+ book and several examples for free.
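To give a feel for what a tool like TLC (the TLA+ model checker) does, here is a toy sketch in Python - not TLA+, just an illustration of the underlying idea: enumerate every reachable state of a tiny lock protocol and check an invariant in each one. The protocol and all names here are invented for the example.

```python
# Toy illustration (not TLA+): explicit-state exploration of a tiny made-up
# lock protocol, checking a mutual exclusion invariant in every reachable
# state. This is roughly what TLC does, on a vastly larger scale.

# Each process: 0 = idle, 1 = wants the lock, 2 = in critical section.
START = (0, 0, None)  # (state of p0, state of p1, current lock holder)

def _set(s, who, val):
    l = list(s)
    l[who] = val
    return tuple(l)

def next_states(s):
    p, q, holder = s
    out = []
    for who, my in ((0, p), (1, q)):
        if my == 0:                        # request the lock
            out.append(_set(s, who, 1))
        elif my == 1 and holder is None:   # acquire it if free
            t = _set(s, who, 2)
            out.append((t[0], t[1], who))
        elif my == 2:                      # release and go back to idle
            t = _set(s, who, 0)
            out.append((t[0], t[1], None))
    return out

def check(start):
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        # The invariant: both processes must never be in the critical section.
        assert not (s[0] == 2 and s[1] == 2), f"mutual exclusion violated: {s}"
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    print(f"invariant holds over {len(seen)} reachable states")

check(START)
```

Real protocols have astronomically more states, which is exactly why a dedicated model checker and a proper specification language are worth learning.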
There are many classic papers written by Leslie Lamport (http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html) and Edsger Dijkstra (http://www.cs.utexas.edu/users/EWD/).

For the database side, a major stream is the NoSQL movement; many projects are appearing on the market, including CouchDB (couchdb.apache.org), MongoDB and Cassandra. These all promise scalability and manageability (replication, fault tolerance, high availability).
One good book is Birman's Reliable Distributed Systems, although it has its detractors.
If you want to formally verify your protocol you could look at some of the techniques in Lynch's Distributed Algorithms.
It is likely that whatever protocol you are trying to implement has been designed and analysed before. I'll just plug my own blog, which covers e.g. consensus algorithms.
I'm starting to study machine learning and Bayesian inference applied to computer vision and affective computing.
If I understand right, there is a big debate between classical AI, ontology and semantic web researchers on one side, and machine learning and Bayesian researchers on the other.
I think it is usually referred to as strong AI vs. weak AI, related also to philosophical issues like functional psychology (the brain as a black box) and cognitive psychology (theory of mind, mirror neurons), but this is not the point in a programming forum like this.
I'd like to understand the differences between the two points of view. Ideally, answers will reference examples and academic papers where one approach gets good results and the other fails. I am also interested in the historical trends: why some approaches fell out of favour and newer approaches began to rise. For example, I know that exact Bayesian inference is computationally intractable (an NP-hard problem), and that's why for a long time probabilistic models were not favoured in the information technology world. However, they've begun to rise in econometrics.
I think you have several ideas mixed up together. It's true that a distinction gets drawn between rule-based and probabilistic approaches to 'AI' tasks, but it has nothing to do with strong or weak AI, very little to do with psychology, and it's not nearly as clear-cut as a battle between two opposing sides. Also, I think saying Bayesian inference was not used in computer science because inference is NP-complete in general is a bit misleading. That result often doesn't matter much in practice, and most machine learning algorithms don't do real Bayesian inference anyway.
Having said all that, the history of Natural Language Processing went from rule-based systems in the 80s and early 90s to machine learning systems up to the present day. Look at the history of the MUC conferences to see the early approaches to the information extraction task. Compare that with the current state of the art in named entity recognition and parsing (the ACL wiki is a good source for this), which is all based on machine learning methods.
As far as specific references go, I doubt you'll find anyone writing an academic paper that says 'statistical systems are better than rule-based systems', because it's often very hard to make a definite statement like that. A quick Google for 'statistical vs. rule based' yields papers like this one, which looks at machine translation and recommends using both approaches according to their strengths and weaknesses. I think you'll find that this is pretty typical of academic papers. The only thing I've read that really takes a stand on the issue is 'The Unreasonable Effectiveness of Data', which is a good read.
As for the "rule-based" vs. "probabilistic" thing, you can go for the classic book by Judea Pearl, "Probabilistic Reasoning in Intelligent Systems". Pearl writes with a strong bias towards what he calls "intensional systems", which is basically the counterpart to rule-based stuff. I think this book is what set off the whole probabilistic thing in AI (you can also argue the time was ripe, but it was THE book of that time).
I think machine learning is a different story (though it's nearer to probabilistic AI than to logic).
I need help choosing a project to work on for my master's graduation. The project must involve AI / machine learning or business intelligence, but suggestions outside these topics are also OK. Please help me.
One of the most rapidly growing areas in AI today is computer vision. There are many practical needs where the results of your master's thesis could be helpful. You could try researching something like emotion detection, eye tracking, etc.
An appropriate thesis for an MS in CS at any good university could survey the current state of research in this field and compare different approaches and algorithms. As a practical part, it's also a lot of fun when your program recognizes your mood properly :)
Netflix
If you want to work on non-trivial datasets (not Google-sized, but not trivial either, and with a real application), with an objective measure of success, why not work on the Netflix challenge (the first one)? You can get all the data for free, there are many papers on it, and you have a pretty good way to compare your results against other people's (since everyone used exactly the same dataset, and it was not so easy to "cheat", contrary to what happens quite often in the academic literature). While not trivial in size, you can work on it with only one computer (assuming it is recent enough), and depending on the type of algorithms you are using, you can implement them in a language other than C/C++, at least for prototyping (for example, I could get decent results doing things entirely in Python).
Bonus point: it passes the "family" test - it's easy to tell your parents what you are working on, which is always a pain in my experience :)
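If it helps to see how little code a first attempt can take, here is a hedged sketch of the kind of baseline model people often started with on the Netflix data: predict the global mean rating plus a per-user and per-movie bias. The tiny ratings list below stands in for the real dataset, and all names are made up.

```python
# Minimal baseline recommender sketch: prediction = global mean + user bias
# + item bias. Toy data only; the real Netflix set has ~100M ratings.

from collections import defaultdict

ratings = [  # (user_id, movie_id, rating) -- invented stand-in for the real data
    (1, 10, 4.0), (1, 20, 3.0), (2, 10, 5.0), (2, 30, 2.0), (3, 20, 4.5),
]

mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

user_dev, item_dev = defaultdict(list), defaultdict(list)
for u, m, r in ratings:
    user_dev[u].append(r - mu)
    item_dev[m].append(r - mu)

# Average deviation from the global mean, per user and per movie.
b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}
b_item = {m: sum(d) / len(d) for m, d in item_dev.items()}

def predict(user, movie):
    return mu + b_user.get(user, 0.0) + b_item.get(movie, 0.0)

print(predict(1, 30))  # prediction for a (user, movie) pair not seen in training
```

Matrix factorization and neighbourhood models then fight over the remaining error, but even this baseline gives you a working end-to-end pipeline to measure against.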
Music-related tasks
A bit more original: something that is cool, non-trivial, but not too complicated in terms of data handling is anything around music, like music genre recognition (classical / electronic / jazz / etc.). You would also need to know about signal processing, though - I would not advise it if you cannot get easy access to professors who know the topic.
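To make the signal-processing side concrete, here is a rough sketch of the usual genre-recognition pipeline: extract spectral features (MFCCs) per track and feed them to an ordinary classifier. It assumes the librosa and scikit-learn libraries; real work would load audio files, while here synthetic tones stand in so the sketch is self-contained, and the labels and parameters are invented.

```python
# Genre-recognition pipeline sketch: per-track MFCC summary features + SVM.
# Synthetic sine tones replace real audio clips to keep the example runnable.

import numpy as np
import librosa
from sklearn.svm import SVC

SR = 22050  # sample rate in Hz

def features(signal):
    mfcc = librosa.feature.mfcc(y=signal, sr=SR, n_mfcc=13)       # shape (13, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 26-dim track summary

def tone(freq, seconds=2.0):
    t = np.linspace(0, seconds, int(SR * seconds), endpoint=False)
    return np.sin(2 * np.pi * freq * t).astype(np.float32)

# Invented stand-ins for real tracks: (audio signal, label). With real data
# the labels would be genres and the signals would come from librosa.load().
tracks = [(tone(220), "low"), (tone(440), "mid"), (tone(880), "high"),
          (tone(230), "low"), (tone(460), "mid"), (tone(900), "high")]

X = np.array([features(sig) for sig, _ in tracks])
y = [label for _, label in tracks]

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([features(tone(450))]))  # expect something like ['mid']
```

The interesting thesis work is in choosing and evaluating the features and classifier, not in the plumbing above.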
I can use the same answer I used on a previous, similar question:
Russ Greiner has a great list of project topics for his machine learning course, so that's a great place to start.
Both GAs and ANNs are learners/classifiers. So I ask you the question, what is an interesting "thing" to learn? Maybe it's:
Detecting cancer
Predicting the outcome between two sports teams
Filtering spam
Detecting faces
Reading text (OCR)
Playing a game
The sky is the limit, really!
Since it has a business tie-in: given some input set, determine probable business fraud from the input (something the SEC seems challenged in doing). We now have several examples (Madoff and others). Or a system to estimate investment risk (there are lots of such systems, apparently, but were any accurate in the case of Lehman, for example?).
A starting point might be the Chen book Genetic Algorithms and Genetic Programming in Computational Finance.
Here's an AAAI writeup of an award to the National Association of Securities Dealers for a system that monitors NASDAQ insider trading.
Many great answers have been posted already, but I wanted to add my 2 cents. There is one hot topic in which big companies all around are investing lots of resources, and which is still very challenging with lots of potential: automated detection of fake news.
This is even more relevant nowadays, when most of us connect through social media and there's a huge crisis looming.
Fake news, content removal, source reliability... The problem is huge and very exciting. As I said, it is challenging because it can be approached from many perspectives (from analysing images to detect fakes using adversarial networks, to detecting fake written news based on text content (NLP), or using graph theory to find sources), and the possibilities for a research project are endless.
I suggest you read some general articles (e.g. this or this) or have a look at research articles from the last couple of years (a quick Google search will turn up a lot of related material).
I wish I had the opportunity to start a project on this topic from scratch. I think it's going to be of the utmost relevance in the next few years.
As a researcher I am curious to hear what people think of multi-agent systems, if of course you have come across the idea. Do you believe there is more to them than just hype and another buzzword? Can you see any potential uses in business or everyday computing? Or do you think that we can already achieve everything MAS has to offer, but with simpler, more elegant solutions?
I am a research professor who has published many articles in the Autonomous Agents and Multiagent Systems Conference (AAMAS): the main vehicle for multiagent research.
MAS is a term used by researchers (coined around 1995, for the first International Conference on Multiagent Systems (ICMAS), which brought together the Distributed Artificial Intelligence (DAI) and Autonomous Agents research communities under one tent: the MAS tent) that refers to algorithms and methods for organizing teams of autonomous agents. Researchers in MAS have developed algorithms for robot soccer (see RoboCup), coordinating autonomous robot rovers (as in Mars rovers), distributed allocation of resources (who does which task), and many other domains.
I don't see the "hype" you describe. You can read all the papers in past conferences, and each one clearly states what the authors tried to accomplish, how they tried it, and what the results were. I do not know of anyone making silly claims about the power of these techniques: they are all just algorithms (isn't everything?). No magic here.
The question
do you think that we can already achieve everything MAS has to offer, but with simpler, more elegant solutions?
is mistaken, in that if you can solve a MAS problem with a simpler and more elegant solution, your solution is now a MAS solution!
MAS is a problem domain, along with the solutions we found so far. If you find a better solution then, awesome, publish it and join the MAS community.
As an aside, I see this confusion often. Journeymen programmers don't realize that research communities are (usually) defined by the problem they work on, not a solution approach.
Compared to many other fields of Artificial Intelligence and Technology, multi-agent systems aren't hyped enough!
I meet people who know nothing about multi-agent systems yet are active in the fields of technology, programming, and "artificial intelligence" (quoted since it is now hype and has effectively lost all meaning).
I learned about multi-agent systems in 2008 through NetLogo, and it changed my perspective on problem solving with computational technology. I realized at the time that these types of programs would require ever-increasing computing power. More recently I have learned all the hype-driven stuff (data science, ML, DNNs, RL, etc.). I think all this hype will integrate well with MAS, in ways that have yet to be fully understood. Many people are introduced to this way of thinking through MMO gaming, which has been a huge hit, so there may be a leap yet to come.
As a computer software expert witness, I am required to analyze a huge range of different software technologies. During my deposition or trial testimony, the opposing expert may direct questions targeted at exposing or revealing my weaknesses. There is no time for research or education.
Given that I can't be an expert in every technology, what are the most versatile and transferable skills or technologies I should learn?
I will start with the obvious:
Databases are omnipresent (but which are the best archetypes?)
C is often involved due to the prevalence of older Windows and DOS-based systems
What should be added to this list?
I may be misreading your question, but I suspect that if you are being called upon as an expert witness, you already have the expertise they are seeking. I suppose that learning the more technical aspects of any technology would make you more likely to become an expert witness, but ultimately I would say the best skill is truthfulness. If you don't know, say so. Any unknown questions can then become the "to be studied" list for later review.
just my 2 cents ...
It would be silly to call you as an expert witness if you cannot be an expert in the line of questioning.
Well, the big thing about being a witness is to listen to the counsel for whom you are testifying. In the computer world, your credibility is not easily impugned. If they were to try, it would be by calling into question your formal education or training as insufficient for an expert. They won't be asking you to explain what a Turing machine is, or how to write a sorting algorithm in LISP, unless it is directly relevant to the matter at hand. They won't be playing "gotcha!" with difficult technical questions, as it won't resonate with the judge or jury. How many jury members can you picture saying: "I can't BELIEVE that 'expert' doesn't understand database normalization! What a fraud!"? If the jury doesn't understand the question, they won't understand the answer. Any first-year law student will tell you all about this problem (it comes up in all kinds of expert testimony situations).
No, your credibility will be questioned in laymen's terms. If you are being asked to testify, it's because you have the answers that are relevant. Stick to those and don't do any tricks (as your counsel will tell you), and you'll be fine. If your information is correct, and your degree/experience is solid, you may not even be cross-examined (they will just find their own expert to say the opposite of what you said).
Computer software expert witnesses also need to have a good understanding of networking technology and be able to explain it to a jury or judge. Because a great deal of software is client/server based, it is important to be able to explain how firewalls, IP addresses, HTTP and internet routers work, and why you can tell that certain pieces of software were definitely used at certain times and locations.
Being familiar with server operating systems and the log files they generate is also helpful.
I would say forget learning new technology beyond understanding industry concepts and how they're really applied in the real world. The key thing you need to be able to do as an expert witness is explain these concepts in terms that can be easily understood by the layman. You already know this stuff, or you wouldn't be the expert witness. You're there because your name and reputation are well thought of and they [prosecution/defence] need your help.
I think of it like this: The lawyer/barrister/attorney's job is to sell their vision of the truth and get the jury to buy into their vision [skewed as that vision may or may not be]. Your job is to sell the facts. Either the two are one and the same, or they aren't. Sell the facts to the best of your ability, if you have easily understood examples [by easily understood, I mean by an 8 year old], all the better.
The key concepts, I would think, are the software systems that people use or exploit to either commit or cover up a crime:
Networking systems: Common protocols, packet tracing etc.
Firewall systems and common exploits.
Viruses and replication: Worms vs. Trojans etc.
Major Operating Systems: Basic concepts and common exploits.
Web Applications: How they're structured and how they can be exploited.
Common hacking concepts: DoS, OOB, SQL injection etc. (a short SQL injection sketch follows this list).
Email concepts: transmission, receipt, tracking, header information.
Data storage and recovery concepts and key software.
Surveillance Techniques: Packet analysis, key loggers etc.
I'm sure there are a few others, but no others immediately spring to mind
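Since SQL injection came up in the list above, here is a hedged sketch (using Python's built-in sqlite3 module and made-up table data) of the difference between a query built by string concatenation and a parameterized one - the kind of contrast an expert may have to explain in plain terms.

```python
# SQL injection in miniature: attacker-controlled text pasted into a query
# changes its meaning, while the same text passed as a bound parameter does not.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t'), ('bob', 'hunter2')")

attacker_input = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text, so the OR clause
# rewrites the query and every row comes back.
vulnerable = conn.execute(
    "SELECT * FROM users WHERE name = '" + attacker_input + "'"
).fetchall()
print("injected query returned:", vulnerable)

# Safe: the same input is bound as a parameter, so it is treated as a
# literal name and matches nothing.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker_input,)
).fetchall()
print("parameterized query returned:", safe)
```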
Definitely learn about email systems. I'd imagine email communications come into play fairly often in court cases these days. Learn how SMTP and POP3 work. Learn the basics of email servers, the ways they can be manipulated, and how difficult that is to do.
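As a small illustration of the header analysis that email evidence often involves, here is a sketch using only Python's standard library: parse a raw message and walk its Received headers from bottom to top to reconstruct the relay path. The message text is invented.

```python
# Each mail relay prepends a Received header, so reading them in reverse
# order shows the path the message took from origin to final destination.

from email import message_from_string

raw = """\
Received: from mail.example.org (mail.example.org [203.0.113.5])
\tby mx.destination.net; Mon, 01 Jan 2024 10:00:02 +0000
Received: from sender-pc (unknown [198.51.100.7])
\tby mail.example.org; Mon, 01 Jan 2024 10:00:00 +0000
From: alice@example.org
To: bob@destination.net
Subject: test

Hello.
"""

msg = message_from_string(raw)

for hop, received in enumerate(reversed(msg.get_all("Received")), 1):
    # Collapse the folded header onto one line for readability.
    print(f"hop {hop}: {' '.join(received.split())}")
```

Of course, Received headers added by machines outside your control can be forged, which is exactly the kind of caveat an expert needs to be able to explain.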
I think you're deceiving yourself - what is a "computer software expert witness"? That's like saying that, because you're an electrical engineer, you have the capacity to answer any engineering question, whether from chemical, mechanical, civil or any other specific area of engineering.
Welcome!
I really enjoyed programming artificial intelligence during my studies - neural networks, expert systems and others. But at work I mainly develop web applications.
Now I'm thinking about returning to that kind of programming, maybe as a hobby, or maybe at work. Are there areas where AI is commonly used in application development and where a programmer with such skills can find work?
Or maybe I could sell some ideas to my boss and use AI to extend some of our applications.
What are your experiences and ideas about using AI in applications?
I recently started reading the book Programming Collective Intelligence. It's an excellent book which discusses exactly what you are looking for - using AI techniques in web applications.
The book is written clearly, is easy to understand, explains everything in terms of real applications (it covers how some commonly used technology works: Google PageRank, Amazon's recommendation system, matchmaking websites, link recommendation systems, Bayesian spam filters and more) and it uses genuinely useful examples with real data (the eBay API, Facebook API, etc. are used to collect data). In one chapter, it even explains how to draw graphs (I mean the data structure, not bar/line/etc. charts) optimally (so that no nodes are too close together, lines overlap as little as possible, etc.), which could be useful for, for example, mapping social networks.
I would recommend having a look at it and see the different ways AI can be applied to web applications.
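As a taste of the book's style, here is a hedged sketch (with made-up ratings, not the book's own code) of the user-similarity idea its first recommender is built on: score how alike two people's ratings are with Pearson correlation, then weight their opinions by that score when recommending.

```python
# Pearson correlation between two users' ratings on the items they share.
# Toy ratings dictionary; a real recommender would load this from user data.

from math import sqrt

ratings = {
    "Ann":  {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 4.0},
    "Bob":  {"Movie A": 4.0, "Movie B": 2.5, "Movie D": 5.0},
    "Cara": {"Movie B": 4.5, "Movie C": 2.0, "Movie D": 3.0},
}

def pearson(a, b):
    shared = [m for m in ratings[a] if m in ratings[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [ratings[a][m] for m in shared]
    ys = [ratings[b][m] for m in shared]
    sx, sy = sum(xs), sum(ys)
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    den = sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
    return 0.0 if den == 0 else (sxy - sx * sy / n) / den

print(pearson("Ann", "Bob"))  # 1.0 here: their two shared ratings move together
```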
As a counter-example, parsing data acquired from water testing equipment would probably be a bad place to use artificial intelligence:
The Daily WTF: No, We Need a Neural Network
Just a reminder for all of us to choose the right tool for the right job.
Neural networks are great for working on images, so one area of web applications where you could use AI would be identifying and/or manipulating patterns in images over large sets of data. For example, a site like Flickr or Facebook might have some interesting training material for identifying people by face, or for associating groupings of pixels (those being the features you work with) with certain items mentioned in captions or tags.
In terms of text manipulation, there's a lot of stuff, but it's usually icing on the cake for other web apps. I'm talking mostly in the areas of automatic completion in search bars and back-end things the user doesn't usually see, like automatic machine translation or improved search capability.
The problem with putting AI at the front of an application's offering is that usually, artificial intelligence is not a feature in and of itself, but rather a way of negotiating large data sets effectively without regular prompts from the designer. In general, a user will associate with an application on a one-to-one basis, and therefore judges it only on the quality of a relatively low number of responses.
Email spam filtering systems - definitely.
As are any other security applications that need to spot patterns of malicious activity.
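For a feel of how small the core of such a filter can be, here is a minimal naive Bayes sketch with made-up training messages; a real filter needs far more data and care, but the idea of scoring words by how often they appear in spam versus ham is the same.

```python
# Minimal naive Bayes spam filter: compare the (smoothed) log-likelihood of a
# message under the spam word distribution vs. the ham one, plus class priors.

import math
from collections import Counter

spam = ["cheap meds buy now", "win money now", "cheap offer buy"]
ham = ["meeting at noon", "project status report", "lunch at noon?"]

spam_counts = Counter(w for m in spam for w in m.split())
ham_counts = Counter(w for m in ham for w in m.split())
vocab = set(spam_counts) | set(ham_counts)

def log_prob(message, counts, total):
    # Laplace smoothing so unseen words don't zero out the score.
    return sum(math.log((counts[w] + 1) / (total + len(vocab))) for w in message.split())

def is_spam(message):
    prior_spam = math.log(len(spam) / (len(spam) + len(ham)))
    prior_ham = math.log(len(ham) / (len(spam) + len(ham)))
    s = prior_spam + log_prob(message, spam_counts, sum(spam_counts.values()))
    h = prior_ham + log_prob(message, ham_counts, sum(ham_counts.values()))
    return s > h

print(is_spam("buy cheap meds"))     # True with this toy data
print(is_spam("status of project"))  # False with this toy data
```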
You could probably analyze the behaviour of your web application's visitors: how they navigate the site, so as to provide a better, optimized interface. It depends on what kind of web applications you're working on. For online shopping, you can come up with suggestions extrapolated from customers' habits.
You can also detect "abnormal" behavior and fraud. Fraud and bot detection can take advantage of AI.
Forecasting, of course.
It has immense value for businesses (e.g. inventory optimization) and is especially valuable in times of global crisis.
Games do need AI.
Expert systems too.
Outside of games, I've seen very few commercial uses of AI.
It could, in theory, be very useful in industrial robotics and imaging, but those fields also tend to be very conservative, and uncomfortable with non-deterministic algorithms.
You might want to research what iRobot does, but even they use rather simple algorithms in their commercial robots.
In the area of cognitive architectures (e.g. Soar, ACT-R, etc), rather than concentrating on algorithms like A* and games, researchers investigate models of human behavior including decision-making, cultural interchange and learning. They often focus on cognitive plausibility, i.e. how close does a model track what a human would do, including timing, etc.
These systems tend to be strictly research-based with limited commercial applications. So far anyway. Military applications, especially for training, are fairly common though.
Image processing for detecting cancer! (We actually implement IEEE papers on it; creating the algorithms is way harder than coding them, so we write papers about the performance of other papers.)
Risk assessment is a pretty good case for neural networks, mostly because they're pretty good at pattern matching. Insurance and credit companies use them to some degree for determining the risk of a customer.
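As a hedged sketch of that pattern-matching use, here is a tiny neural network (via scikit-learn, with invented customer features) that classifies customers as high or low risk; real systems obviously use far richer data and much more careful validation.

```python
# Toy risk classifier: a small multilayer perceptron trained on a handful of
# made-up customer records. Illustrative only, not a production risk model.

from sklearn.neural_network import MLPClassifier

# Hypothetical features: [age, debt-to-income ratio, late payments last year]
X = [[25, 0.60, 3], [40, 0.20, 0], [35, 0.55, 2],
     [50, 0.10, 0], [29, 0.70, 4], [45, 0.15, 1]]
y = [1, 0, 1, 0, 1, 0]  # 1 = high risk, 0 = low risk

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
print(model.predict([[30, 0.50, 2]]))  # risk class for a new customer
```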
I have done some extensive research on using artificial neural networks to classify underwater sound sources. The algorithm seemed to work quite well, especially since I devoted a big portion of the work to figuring out which combination of Fourier transform coefficients made the best feature set for classification (using Principal Component Analysis).
Anything (seriously):
http://highlevellogic.blogspot.com/2010/09/high-level-logic-rethinking-software.html
The High Level Logic (HLL) Open Source project is about finding and coding high-level logic under which all other AI (and in fact, all programming) fits. There are serious, concrete ideas and code. HLL is already an application framework.