What is the best technique or framework to implement one robot in one website, something like IKEA Site:
http://193.108.42.79/ikea-us/cgi-bin/ikea-us.cgi
But I don't care about the frontend what I really need to know is what framework or technique to manage the Artificial Inteligence.
Best Regards,
Pedro
While it is possible that the only technique they are using is Keyword Spotting it is more likely that they are also using Naive Bayesian Classification.
There are some some client-side classifiers written in javascript such as Brain, eliminating the need for server-side processing.
Brain: http://harthur.github.com/brain/#bayesian
A more comprehensive solution may include Classifier4J or OpenNLP, either of which could be utilized in a Java Applet or on a server if need be.
OpenNLP: http://incubator.apache.org/opennlp/
Related
Scenario :
I am having an application where I am using AWS Lambdas which are written in Kotlin to query data from a relational DB residing in AWS.
--
My problem is that I want to use an ORM for firing these queries. I dont want to use hibernate as it is too heavy and takes too long to setup, and I need a solution that would take up the least time in setting up and firing from the Lambdas. I have looked upon multiple ORMSs like Exposed, Requery, Jooq, Ktorm and Squash.
Is anybody out there having experience with any of these libraries in the serverless context? What are your experiences with them and what would you suggest using in my scenario?
You can have a look at exposed, https://github.com/JetBrains/Exposed
I have been using Squash with the Hikari connection pool for some large projects and I have been very happy with it. I like that is is very extensible and my team has been able to solve any issues that come up, implementing extensions to the dialect and the simplicity of defining TableDefinition classes makes it work well for generating code. It is also very self contained with very few dependencies and light on reflection, so should be good for serverless though I have not personally used it for that.
Squash is less an ORM than an sql abstraction / translation layer that ties into entities and it doesn't try to solve all the problems that something like hibernate does. In my experience ORMs start as simple, efficient, and powerful projects and grow to heavyweight libraries that try to do too much and their complexity begins to cause issues when the developer cannot easily see what's going on in the chain from usage through to the database / storage mechanism.
One negative about squash that deserves mention is that, while it is a jetlbrains official library and created by a kotlin developer, support is limited as orangy, the creator, is quite busy and I have feature pull requests outstanding, with many more of them backed up currently. I chose it because I favored it's simplicity and extensibility among a small but advanced team of developers all capable of improving upon it.
Which ever library you choose I hope these factors assist you in making your decision at the least.
We are using Kafka-storm in our project. In storm we will use multiple bolts for transformations. But before that, as part of POC, we want to persist data into DB. Which framework we should use? For BigData scenario which can be used? Is Trident applicable here? For persistence I am looking for something like Hibernate/JPA. What can be used? and if possible provide a sample code for this.
You can use almost any framework you want, although your specific choice may impact your topology slightly. Choose your favorite framework and give it a try.
Given that a lot of people use content repositories. There must be a good reason. I'm building out a new web application that will need to store content. Can someone help me understanding this?
What are the advantages to using a content repository like Apache Jackrabbit as opposed to writing your own code/API to store images or text pages? Writing your own requires time etc. but so too does implementing and learning a new framework like the content repository API. A benefit to rolling your own seems to me that you know your code and have immediate expertise if you need to enhance or fix it. Using another framework you need to learn its foibles, and it is always easier to modify code you know that don't know... i.e. you don't know that underlying framework code as well as your own.
As I said a lot of people use them. There must be a reason. I can't see it as being just another "everyone is using them so, so should we." At least I hope it isn't that. :)
A JCR repository allows you to store all your content (from structured database-type data to large multimedia files) in a single place and with a single API, which is extremely convenient and makes your code simpler, avoiding the impedance mismatch between files and data that you usually have in content-based systems.
JCR also provides a lot of infrastructure functionality that you won't have to build or assemble yourself: search (including full-text), observation (callbacks when something changes) versioning, data types including multi-value, ordered nodes, etc...
If you allow a shameless plug, my "JCR - best of both worlds" article at http://java.dzone.com/articles/java-content-repository-best describes this in more detail and also provides a reading list for the JCR spec, that should allow you go get a good overview without reading the whole thing.
The article uses Apache Sling for its examples, which combined with a JCR repository provides a very nice (IMO, but as a Sling committer I'm biased ;-) platform for content-based applications.
My most recent projects have involved both choices: a custom-built data store (MySQL and image files) wtih a multi-level caching mechanism, and a JCR-based commercial repository.
A few thoughts:
In the short run, a DIY solution offers reduced complexity: you only have to build and learn what you need. And there is at least the opportunity to optimize
the data store for your particular application's needs -- more than likely speed of retrieval, but possibly storage footprint, security, or reliability concerns are foremost for you.
However, in the long run, you're looking at a significant increment of work to extend the home-grown system to a new content type (video, e.g.) or to provide new functionality (maybe,
versioning).
Also, it's difficult to separate the choice of a data store approach from the choice of tools that content providers will use to populate and maintain the data store. You'll have to give
your authors something more than an HTML form with a textarea and a submit button.
This is related to the advantages of standardization: compatibility and interchangeability. If everybody writes his own library and API, there is no compatibility and interchangeability, leading to higher cost.
I'm interested in web development and by that I mean the bigger projects like facebook or twitter. I know the basics of java, css, php and mysql. I know there is a lot more out there. I read about it. But I don't know what the purpose is and how to put in place.
Things like: Scribe, thrift, casandra, Unix/Linux, shell/perl/python scripting, PostgreSQL, MongoDB, non-relational NoSQL datastores, JVM, nginx
I want to know why they need it, how they use it and what te purpose is.
What I need is a book like technical background of facebook for dummies or so.
Are there any books or websites that explain this from scratch?
Thank you!
EDIT:
Thank you for your answers! You have been very helpful. I was in the assumption, experienced programmers know almost anything about the technology there's used today. But as I read, you can only know so much and I need to figure out which technology to use. I take on the encouragement to start building small. And will take on php and improve my skills from there.
Thanks again!
http://highscalability.com/
This is one of the best sites out there. There are several case studies describing what and why many websites use, and pointers to further references. I would also look at the Google Scalability Conference 2007 talks
http://www.google.com/search?q=Google+Scalability&hl=en&client=firefox-a&hs=YUg&rls=org.mozilla:en-US:official&prmd=v&source=univ&tbs=vid:1&tbo=u&ei=fl4OTPUkorIwueCQxQw&sa=X&oi=video_result_group&ct=title&resnum=4&ved=0CDIQqwQwAw
It's all about choosing the right tool for the job in my eyes. There is so much technology out there it's impossible to learn it all. Just choose the subset that will work for you.
The best place to start is by building small simple websites, and as you come accross problems that you need solved you research the tools needed to solve those problems.
If you attack all of the areas at once, it's going to be overwhelming and you will not get anywhere.
For a general overview on what each of the technologies does, Wikipedia gives a good overview on most technologies.
If you are interested in database content which it seems like you are, a good place to start is reading up on normalisation.
Scribe, thrift, casandra, Unix/Linux, shell/perl/python scripting, PostgreSQL, MongoDB, non-relational NoSQL datastores, JVM, nginx
Those I would search on Wikipedia for to get a quick overview. Facebook is written in PHP/MySQL. There are some books on the subject of creating social networking sites, and some books have gotten decent reviews on Amazon.com, however, I have not read any of them myself.
If I were you, I'd start with PHP/MySQL and sit down and write a simple social network. Break the project down into components and tasks and Google for each challenge you encounter such as sessions, database structure, security, friend structure, and processing POST and GET requests.
You'll learn a lot and you get the big picture. Once you see the big picture, you can take another look at different technologies that are available and then decide which component you could have developed better with other tools. I personally don't think that looking too much into the technology available is good for someone who is still in the beginning stages. Start doing, learn from it, and then your questions become much more specific and a lot of things will make more sense.
The problem you're having is you're looking at smaller, specialty products, and not at larger, more mature technologies. Wikipedia will actually give you a decent overview of most of the medium-and-large projects out there.
Cassandra, Hadoop, Mongo, and NoSQL are all lovely... but they're specialty tools. SQL is a general purpose solution that works for 99% of the sites on the net.
Unix/Linux isn't a specialty tool; you might want to try going to Ubuntu's website and installing Linux, and just using it day-to-day, the way you'd use Windows. When you need to figure out something new, like setting up a webserver, do it on the Linux box and a Windows box, and you'll eventually learn linux pretty darn well.
As far as scripting, O'Reilly makes a great line of books on Bash, Perl, and Python.
JVM is a Java Virtual Machine, which is a core of getting Java code to go. Sun's website has a great set of tutorials on learning Java.
It might be much, much easier to pick a project (or three) that you'd like to learn, and learn some of these by doing. I'd probably suggest learning some SQL before learning the newly established alternatives; that lets you learn the rest of the system, as SQL is pretty easy. Once you've got the rest of the thing solid, try swapping in a NoSQL solution at that point.
There are a lot of frameworks that do a lot of different things. You've named a lot of different things from a lot of different areas. The best way to think of these things is to group them by category. Here's an example:
Suppose you have a laptop and you want to host a website. You'll need the following at a minimum:
1) Web Server software. Two popular options are Microsoft's IIS and Apache Web Server.
That's really all you need. You can set up your www_root folder and load files into it. Assuming everything is configured properly, you can now load HTML pages into that folder and access them through your IP address. Every page you view in your web browser is in HTML format. CSS is a stylesheet language that defines how your HTML will be formatted. You can also start writing Javascript, as most modern browsers support the client-side scripting language.
Chances are you'll want the following as well:
2) Database software. Two popular options are Microsoft's SQL Server and MySQL
3) Server-side scripting. PHP is very popular, as is ASP. You'll need the runtime deployed on your server. Python, Ruby, Perl, etc all fall under this category.
4) Web Application Framework(s). This will provide you with libraries for your language of choice to help develop web applications and websites. CakePHP, Ruby on Rails, and the Google Web Toolkit are examples of web application frameworks.
Additionally, you may want to utilize:
5) Additional libraries. JQuery, for example, is quickly becoming a popular library for Javascript that handles a lot of common tasks for you. Instead of writing complex effects code and what-not yourself, just use the pre-written code in the JQuery library.
6) Data interchange technology. If you are passing a lot of information back and forth, you will likely want to encapsulate this data in a logical format. Ideally, this format would describe the data and allow your applications to easily read/process it following a standard. This is where XML and JSON come into play.
I can't recommend a good book for you to learn this stuff, but I feel that the collective replies to your question here should be more than enough to get you started.
Ultimately, what you need to do is determine what technologies you need, and then choose the right one for the job. Don't go building an application using Ruby on Rails just because it's what Twitter used, but rather choose it because it provides some advantage to you over the other options.
What technologies should I use when designing for a large social website (with a lot of transactions, like twitter)? using open source solutions
- database
- webserver
- os
Twitter uses Ruby-on-rails and Scala
Facebook uses PHP
StackOverflow uses asp.net mvc
As you can see, it doesn't really matter what you choose; all of these sites have lots of traffic, but are based on very different technologies.
What matters most in a social networking sites is the backend, since most of the bottleneck will be from there. You might want to consider No-SQL databases.
Facebook and Twitter use Cassandra
LinkedIn uses Voldemort
There are a few others like:
Hypertable
MongoDB, used by Sourceforge.
CouchDB
As for the programming language, as others have said, it does not matter that much. But if you really can not decide, you might want to consider a non-blocking webserver like Tornado.
Doesn't matter what kind of scripting language you'll choose, as long as you'll heavily utilize memcached. Having the right caching hierarchy is a must.
At the end of the day, this is a matter of personal preference. Twitter uses Ruby on Rails. Wikipedia runs on PHP. Reddit uses a Python library called web.py, but intitially, it was written in Lisp. I would say pick the technologies you are most familiar with.
A good book on optimizing for high performance websites from the Yahoo engineers is High Performance Web Sites: Essential Knowledge for Front-End Engineers. It is nice and short and basically a bulleted guide on the steps to take to make websites faster by optimizing the less well explored front-end.
As Joel says
People all over the world are constantly building web applications using .NET, using Java, and using PHP all the time. None of them are failing because of the choice of technology.
Choose whichever of the "big 3" (.Net, Java or PHP) that you know best - these technologies are known to be scalable, the real question of whether or not your site will scale is how the site is structured and the quality of the code - using whichever framework you are most familiar with gives you the best chance of achieving that.
Any technologies that suite your taste, In your situation I think algorithms is more important.
Technologies, techniques ,
research what other scaled sites have used and done and what the problems they had were less than he successes, there are podcasts on iTunes, talks and interviews on Youtube
look at industry best practices and follow them to a degree
don't take peoples word for it, make sure you see the problem or the success as opposed to the pr glitz about it
avoid obvious things that do not scale vertically or horizontally, database connectivity, sessions - cookies and the like
look at nosql storage as an sql alternative less overhead but less functionality
take care when looking at the language/framework. frameworks come with lots of baggage you do not need, they speed you up initially and slow you down eventually, i.e. you spend more time hacking the framework than building the site, same with languages does it do what you want rather than be trendy, cool to programme in etc.
If you are building something like Facebook, then your choices are a little limited, Facebook made their own PHP Runtime, check HipHop For PHP