I am looking for research (published) on AI techniques for reading cookbook recipes. Recipes are a very limited domain that might be doable in a natural language recognition engine with some degree of accuracy.
I have in mind writing a program that would allow copy/pasting a recipe from a web browser into the AI and having it determine the title, author, ingredients, instructions, nutritional information, etc. by "reading" the recipe. I would also like to be able to process PDF files (I have a large collection), maybe also just using copy/paste.
The output will be some kind of (standard) XML-based format that can be read by a recipe organizer.
I have in mind PhD or Masters-level work.
One subfield of AI that you might find relevant is information extraction.
Information extraction algorithms often work by using rules (e.g. regular expressions) to identify entities and relations in text. These rules can either be defined by hand (e.g. the Suiseki system) or learned with supervised machine learning algorithms (e.g. RAPIER, Wrapper Induction, Conditional Random Fields).
For example, an information extraction algorithm might grab data from a job posting:
Job Title: Senior DBMS Consultant
Location: Dallas,TX
Responsibilities: DBMS Applications consultant works with project teams to define DBMS based solutions that support the enterprise deployment of Electronic Commerce, Sales Force Automation, and Customer Service applications.
Desired Requirements: 3-5 years exp. developing Oracle or SQL Server apps using Visual Basic, C/C++, Powerbuilder, Progress, or similar. Recent experience related to installing and configuring Oracle or SQL Server in both dev. and deployment environments.
Desired Skills: Understanding of UNIX or NT, scripting language. Know principles of structured software engineering and project management
...and distill it into this template:
title: Senior DBMS Consultant
state: TX
city: Dallas
country: US
language: Powerbuilder, Progress, C, C++, Visual Basic
platform: UNIX, NT
application: SQL Server, Oracle
area: Electronic Commerce, Customer Service
required years of experience: 3
desired years of experience: 5
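To make the hand-written-rule approach concrete, here is a minimal sketch in Python that fills a few slots of the template above with one regular expression per slot. The rule set, the `PLATFORMS` word list and the `extract` function are my own illustrative assumptions, not taken from Suiseki, RAPIER or any published system, and rules this naive would break on postings laid out differently.

```python
import re

# A shortened copy of the job posting shown above.
POSTING = """Job Title: Senior DBMS Consultant
Location: Dallas,TX
Desired Requirements: 3-5 years exp. developing Oracle or SQL Server apps using Visual Basic, C/C++, Powerbuilder, Progress, or similar.
Desired Skills: Understanding of UNIX or NT, scripting language."""

# Hand-written rules (hypothetical): one regular expression per template slot.
RULES = {
    "title": re.compile(r"^Job Title:\s*(.+)$", re.MULTILINE),
    "city": re.compile(r"^Location:\s*([^,]+),", re.MULTILINE),
    "state": re.compile(r"^Location:\s*[^,]+,\s*([A-Z]{2})\b", re.MULTILINE),
    "required years of experience": re.compile(r"(\d+)-\d+ years"),
    "desired years of experience": re.compile(r"\d+-(\d+) years"),
}

# A tiny hand-made word list (gazetteer) for the "platform" slot.
PLATFORMS = ["UNIX", "NT"]


def extract(text):
    """Fill the template slots whose rules match the text."""
    template = {}
    for slot, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            template[slot] = match.group(1).strip()
    # Whole-word lookup of each known platform name.
    template["platform"] = [
        p for p in PLATFORMS if re.search(rf"\b{re.escape(p)}\b", text)
    ]
    return template


result = extract(POSTING)
for slot, value in result.items():
    print(f"{slot}: {value}")
```

The learned-rule systems mentioned above aim to induce patterns like these from annotated examples instead of having someone write them by hand. A recipe extractor could start the same way, with rules for quantities, units and ingredient names.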
Ray Mooney and his group at the University of Texas at Austin have done some great work in information extraction. Here are some references that might make good jumping-off points:
Raymond J. Mooney and Razvan Bunescu, Mining Knowledge from Text Using Information Extraction. SIGKDD Explorations, 7:1 (2005), pp 3-10.
Stephen Soderland, Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 34:1 (1999), pp 233-272.
C. Blaschke and A. Valencia. The frame-based module of the Suiseki information extraction system. IEEE Intelligent Systems, 17:14–20 (2002).
I'm taking a database course and I have to write a command line application. The prof wants us to write an ESQL (embedded SQL) application.
I have a feeling that this kind of technology is deprecated.
We have to use the Oracle precompiler to translate ESQL code into C++. This kind of application looks terrible to maintain.
A PHP application would also work well, but they probably want a command line application to make grading faster (unit tests with an input feed). What do you think: is embedded SQL used in the industry, and is it worth asking the prof to let us do a Java application instead? Is there another, more appropriate technology?
Embedded SQL was one of the most popular ways to do SQL in C during the "old days" (C++ was not yet invented).
These days mostly we'll be using an ORM library. It is not recommended to do embedded SQL any more because, as you put it well, it depends on a proprietary pre-processor and makes code difficult to debug, manage, and maintain. It also hooks you to one single database vendor and your code will be extremely difficult to move to another database backend. Generally, we don't do it in "real life".
But as it is only a class, your prof is probably interested in teaching you SQL and database concepts. Embedded SQL is only a tool. You're supposed to learn SQL and databases, not embedded SQL in C++.
However, I believe that you're missing the point by asking about PHP and Java. PHP is a scripting language, and Java is just another language for which you could (potentially) write an embedded SQL preprocessor.
So your point about embedded SQL really has nothing to do with language choice. It has to do with the tradeoffs between (1) a proprietary embedded system with a preprocessor and (2) an ORM library or a data-access library (e.g. ODBC).
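To illustrate option (2), here is a minimal sketch of the data-access-library style. It uses Python's standard `sqlite3` module purely as a stand-in for ODBC or a vendor library; the `students` table and the `top_students` function are made up for the example. The point is that the SQL is an ordinary string handed to a library call at run time, with values bound as parameters, so no preprocessor is involved.

```python
import sqlite3


def top_students(conn, min_grade):
    """Return (name, grade) rows at or above min_grade, best first."""
    cursor = conn.execute(
        # The "?" placeholder is bound by the library at run time.
        "SELECT name, grade FROM students WHERE grade >= ? ORDER BY grade DESC",
        (min_grade,),
    )
    return cursor.fetchall()


# An in-memory database with a hypothetical table, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ann", 91), ("Bob", 78), ("Cy", 85)],
)

rows = top_students(conn, 80)
print(rows)
```

With embedded SQL the query would instead sit inline in the host-language source and be rewritten into library calls by the vendor's precompiler before the compiler ever sees it.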
Off-Topic:
I first started using embedded SQL when I was in college (that was about 30 years ago!). I got programming jobs out of college and still used it, but it was obviously on the way out. I have not seen it used since 1990 or so.
Yes, but no. I have not met a single line of Embedded SQL in my 10 years in the field. I would say (and hope) this technology only exists in (some) legacy systems.
Nowadays, database related development in the industry would involve:
Direct database access using JDBC, ADO .NET, OLE DB, ODBC or native libraries (OCCI in your case).
Some sort of ORM (Hibernate, Entity Framework or a home made solution).
Some sort of data access layer based on frameworks and/or patterns (think Ruby on Rails, Active Record or a home made solution).
IMHO, home made solutions should be eradicated but they are more common than you would think. Part of this would certainly have to do with students having only experimented with outdated and unsuitable tools at school...
ORM (and data access layer) related problems can be very complex and, I would say, very interesting to look at, especially if you are a student. I would recommend delving into Martin Fowler's Patterns of Enterprise Application Architecture (P of EAA).
In C++, I would have a look at SOCI.
We have to maintain an old system here (20 years and older).
ESQL is used massively in here. Most of the problems we had while moving the software to a new OS (the old one was a 15-year-old HP-UX) were with the ESQL code.
The new software we are writing all makes use of the C++ library. This gives us more readable code, and our IDE doesn't say 'invalid syntax' all the time, etc.
In general terms, the C++ library works much like the way I connect to a database in .NET or Java.
Using the C++ library we get a speed improvement (if used wisely) and far fewer errors.
ESQL is deprecated, from my point of view. But since we have entered a time when a lot of software work consists of updating, upgrading or maintaining existing systems, it is very handy to have basic knowledge of old techniques!
I haven't seen embedded SQL in an application for 10 years. The last time I saw it was in a legacy mainframe app written in COBOL. Yes, it is still being used at an electrical utility company.
The little C++ programming I do these days doesn't involve SQL. These days most relational DB programming I encounter is one of these:
ORM (object-relational mapping: Hibernate or JPA)
JDBC
stored procedures (Oracle or MySQL)
While this is probably outdated (I also did ESQL ~15-20 years ago), it may still serve as a good example of how to approach things, even if only so that you enjoy ORM more afterwards.
Also, from my understanding, LINQ in .NET is somewhat similar to the idea of embedding SQL in the host language, and LINQ seems to be quite popular.
Extrapolating from this to broader CS, embedded DSLs seem to be a current topic of research, so the example of ESQL as an early version may not be too far-fetched in today's world.
ESQL is the prime language being heavily promoted for IBM middleware products. It's not an object-oriented language but a procedural language. It's extensively used in some places to do mapping between XMLs (alias for XSLTs).
I am currently using ESQL on an Informix 9.x database in the legacy C++ application code that I work on as part of my job.
While I agree with everyone that it is an old technique and that there are better options out there, I would still say it is a very neat technique. The good part is that the SQL is embedded as part of the C/C++ code flow, both syntax-wise and logic-wise. The small syntax change that ESQL carries is easy to learn, and hence I say it's fun to use.
As Heiko mentioned, LINQ is close in idea to ESQL.
I asked this over on Superuser and it was suggested I try this here:
Can anyone recommend a quality source to learn databases? I am changing careers and have no background in computers but this is what I have chosen to do now.
I was thinking of taking an intro course at a community college but I have no problem teaching myself with a book and some software. I am looking to accelerate my learning curve and don't want to spend an entire semester on an introductory course if there is something better out there.
Any suggestions are welcome. Thank you.
Edit:
Thank you for the feedback. The MS site and Oracle look promising.
I am pursuing a career in software development. I have taken C++ and C# at a community college and was accepted to a masters program for the spring. What level of database knowledge/implementation is required to program in C++ at the master's level and not get handed your lunch?
I guess I need to know how databases are used in programming. I don't have the common core of experience in order to explain the question more thoroughly without tangentially departing into illusory ideals of a programming career.
At any rate, what you have provided is enough to get me going and it will, I am sure, uncover further areas of interest.
Thank you kindly.
There are a whole lot of aspects to "learning databases":
administering databases
designing databases (modeling)
implementing databases (might require adding business logic into triggers and stored procedures).
using databases (this might range from writing a query to aspects of data mining)
And then there's the question of what type of database. Most people these days assume RDBMS, but there's a push towards NoSQL, tuple stores, etc.
There's no one good answer without a little more information about what you're trying to do.
In addition to what people here would consider database-related jobs (DBA, database developer, data modelling analyst), there are a lot of semi-technical jobs around whose holders refer to themselves as "database people". In particular, you get report analysts -- often using tools like Crystal Reports, Cognos, SSRS -- and business analysts who work on large customer databases, often defining how the data should be used in various marketing activities more than doing much coding. So it would be up to you to decide if you wish to have a more "business" aspect to your career goal, or to work in data entry/management, or do development and design, or to become a specialist admin certified in a particular vendor's software.
In all cases, however, given that you start from scratch, some understanding of how relational database systems work would be beneficial. Someone mentioned Element K courses. Personally, I've found the SQL Zoo interactive online course particularly engaging, and have used it with new staff who had to get up to speed with SQL skills in a short time. Also, this tutorial looks well done. For self-study, I recommend the Head First book series, as in Head First SQL. All these recommendations are extremely SQL-centric, though.
Microsoft also offers tutorials that go with Access and SQL Server Express, but I'd warn against becoming locked in. SQLite is a free and extremely lightweight RDBMS, which is perfect for trying things out from a programming language (if you know one -- if not, I recommend Python).
As for recommending a provider, you're not even saying which country you're in (though from mentioning "community college" I guess it's the US), what your budget and mobility are like etc, so it's hard to decide. Maybe a mentor or advisor at your community college could help?
The IEEE Computer Society has some online Element K database courses available for members. IEEE runs memberships by the calendar year. Professional membership dues are $50 for a half year and $99 for a full year of professional membership. Student memberships are $40 and $20 for full and half year, respectively. This might be less than the cost of one course.
ACM has some online database courses in their catalog also. They also use Element K, so it seems likely they are the same courses. A full membership is also $99 per year, student is $19 per year. (Their membership year starts when you join.)
I assume your goal is to get a job, since you mention you are changing careers.
I understand you wish to accelerate your learning curve. Make sure you do lots of hands-on work, then, to make sure you know how to do things and have skills in databases. The skills, as they say, pay the bills.
I don't know what your background in computer programming, computer science, or information science / managed information science is. If you have some background in those topics, you might be able to make it into the stuff I recommend below.
Most databases in businesses are relational. Depending on the business, they run different databases. The two really big ones are Oracle and SQL Server.
I've known a couple of friends who learned databases (and other enterprise software) by going online and reading as much documentation as they could get their hands (mouse?) on at the Oracle and SQL Server websites, and for other enterprise software. They went and applied for jobs, had enough business sense / charisma, could read and write American business English well, and were good enough to get the job with little or no experience, and have done very well. It pays their bills, and is a stepping stone to bigger things. You might be able to do the same. Go on the Oracle or Microsoft websites, and grab the documentation. Read as much of it as you can, circle words you don't know, and try to find the definitions. Find a local guru (that's the main advantage of the community college: they have smart people with communication skills who can help), and talk with them.
If you're a tremendous reader, you could do this in a month.
EDIT:
As to how much "database" knowledge you need to know to get into your master's...
It depends on the program; I'm going to assume you're going for a professional program and not an academic one. Depending on the master's program, you may or may not need to know databases. Consult your professor or advisor to see if this is something you could tackle right away.
You need to know, ultimately, how to "talk" to a database. Think of this like trying to do business with folks in a foreign country:
You'll need to learn the language (relational databases use a language called SQL; make sure you know how to formulate a basic query (SELECT), and modify the data in the database (INSERT / UPDATE / DELETE)).
Make sure you know the culture (what is a table, what is a row, what is a column, what is a tuple, what are keys (foreign/primary/candidate/alternate/super), what are triggers, what are the normal forms, what is an ER diagram, what is relational theory).
Make sure you have safe transportation (learn APIs to communicate with the database in C++ and C#. Learn how you can run SQL queries on the database from your app and get resulting data back inside your app. Perhaps learn about ORM tools and how to get/set data that way).
The above is what an intro course covers.
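As a small taste of the "language" and "transportation" points, here is a sketch that runs the four basic statements from application code. It uses Python with the bundled SQLite engine only because that needs no setup; the same ideas carry over to the C++ and C# APIs. The `author` and `book` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Two tables linked by a primary key / foreign key pair.
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE book (
           id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           author_id INTEGER NOT NULL REFERENCES author(id)
       )"""
)

# INSERT: add rows.
conn.execute("INSERT INTO author (id, name) VALUES (?, ?)", (1, "A. Writer"))
conn.execute("INSERT INTO book (title, author_id) VALUES (?, ?)", ("First Draft", 1))
conn.execute("INSERT INTO book (title, author_id) VALUES (?, ?)", ("Old Notes", 1))

# UPDATE: change an existing row.
conn.execute("UPDATE book SET title = ? WHERE title = ?", ("Final Draft", "First Draft"))

# DELETE: remove a row.
conn.execute("DELETE FROM book WHERE title = ?", ("Old Notes",))

# SELECT: query across both tables with a join and get the data back in the app.
rows = conn.execute(
    "SELECT author.name, book.title FROM book "
    "JOIN author ON author.id = book.author_id"
).fetchall()
print(rows)
```

If you can read this and explain what each statement does and why `author_id` exists, you already have most of the vocabulary from the list above.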
I was looking at the Oracle license, and Named User Plus looks cheap. Suppose I develop a web application in which users have no interaction with the database other than registering and logging in, and I create a virtual user inside the server to do all of this: get the user names and passwords from users etc., keep them in a queue, and execute database commands one by one. Will I need more than one Named User Plus license for this? I am a total noob in Oracle and the web field; I'm just a designer who is learning server-side technologies, so if this question is invalid please let me know why.
Named user licensing is not the best option in this situation - Oracle considers the web application a multiplexing device and will require you to track the users of the application and purchase a named user license for each of them.
[Edit]
I see that you've received some good additional licensing information in the other answers, but in short an Oracle schema != an application user. Years ago I was unlucky enough to be the POC for an unwelcome audit by Oracle and for our intranet application I was required to report distinct IP addresses connecting to the application from the web server.
Oracle licensing is a labyrinth which few people understand. Even most Oracle employees won't discuss it because it's so complicated. In fact there are almost as many consultants making a living from offering licensing advice as there are from tuning the actual databases.
So the following is just an opinion, and you definitely should not use it as the basis of a business plan.
If your web application is for an intranet you could purchase a Named User Plus license, because you should be able to identify each and every user of your application. But if your application is going on the Internet with an unknown and unknowable userbase you will need to buy Per Processor licenses.
Oracle has a complicated mechanism for licensing multi-core processors. It very much depends on which platform and type of chip we're using. It is an area of licensing which Oracle revises on a regular basis, as they try to come to terms with multi-core CPUs. It used to be that pretty much everything was 0.75; as Zendar points out, it is now the case that many configurations are licensed at 0.5 per core. Oracle always round up, so if we have a single dual-core CPU which attracts a 0.75 per core multiplier it will still cost us two Per Processor licenses, but a quad-core will only cost three.
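The rounding rule is easier to see as arithmetic. This is only a sketch of the calculation as described in this answer (cores times core factor, rounded up); the function name is mine, and the factors are the ones quoted here, so check Oracle's current Processor Core Factor Table before relying on any number.

```python
import math
from fractions import Fraction


def processor_licenses(cores, core_factor):
    """Per Processor licenses needed: cores x core factor, rounded up."""
    # Fraction avoids floating-point surprises with decimal factors.
    return math.ceil(cores * Fraction(str(core_factor)))


print(processor_licenses(2, 0.75))  # dual-core at 0.75: 1.5 rounds up to 2
print(processor_licenses(4, 0.75))  # quad-core at 0.75: exactly 3
print(processor_licenses(4, 0.5))   # quad-core at 0.5: exactly 2
```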
One thing to bear in mind is that if your application has quite lightweight DB requirements - that is, less than 4GB of application data, suitable to run on a single CPU (single core) - you can use the Express Edition for free, for any purpose.
One more thing: licenses apply to all databases, not just those in production. So you need to factor in the cost of licensing your development and test environments as well.
With regard to that last point, Zendar cites the OTN Download license. That outlines what we can do with products we have downloaded from OTN. The problem with the OTN Download license is made clear in Oracle's explanation of Database Licensing:
"This limited license gives the user the right to develop, but not to deploy, applications using the licensed products. It also limits the use of the downloaded product to one person, and limits installation of the product to one server."
So: if we're a one-man operation (no dog) we can develop an application using the OTN Download license. But if we want a team of developers sharing a database we need a Full Use license. And once we're supporting an application in production we need a Full Use license for the maintenance (formerly development) environment.
The other consideration is this: if we want support and patches for our development environment then we need a proper license.
I said it was a labyrinth.
Oracle has had its price list published on its website for a long time. So there is no secret there.
There you will find their definition of "Named user plus".
Short interpretation: a named user plus is every individual and/or device that accesses the database.
You can buy a per-processor license or a per-named-user license; pick the one that suits you better (be careful with the processor license: Oracle has a formula for counting processor cores, so check the price list and the Oracle Processor Core Factor Table).
Regarding APC's answer: all Intel and AMD chips have a core factor of 0.5, meaning 1 processor license per 2 cores.
The development license for Oracle RDBMS products states:
We grant you a nonexclusive, nontransferable limited license to use the programs only for the purpose of developing, testing, prototyping and demonstrating your application, and not for any other purpose.
So, you can download an Oracle product and use it for developing, testing, prototyping and demonstrating your application. Well, not really. See the edit below.
Disclaimer: I am not and have never been an Oracle employee or Oracle reseller. The information here is my interpretation of documents freely available on the Oracle website. I have worked with Oracle products; they are far from perfect, but I don't like misinformation, especially if correct information is available.
Edit:
RE APC's comment:
Yep. You are right. It's restrictive as you wrote in your answer.
I reread the license agreement. A few sentences after the one I quoted above, it says:
The programs may be installed on one computer only, and used by one person in the operating environment identified by us.
So, the OTN development license is practically useless for the majority of developers.
Quantitative Analysts or "Quants" predict the behavior of markets to maximize profits. I am interested in the software that they use to accomplish this. Are there development platforms, libraries, languages or Data Mining suites specifically tailored to Financial Modeling?
Statistical Modeling:
First, there are statistical computing languages like R, which is powerful and open-source, with lots of packages for analysis and plotting.
You will find some R packages that relate to finance:
http://www.quantmod.com/
https://www.rmetrics.org/
https://www.rmetrics.org/ebooks-tseries
Machine Learning and AI to train the system on past data:
Weka Data Mining: http://www.cs.waikato.ac.nz/ml/weka/
libsvm (data classifiers): http://www.csie.ntu.edu.tw/~cjlin/libsvm/
"Artificial Intelligence: A Modern Approach" book (code: http://aima.cs.berkeley.edu/code.html)
Backtesting the trading system on past data:
More often than not, broker trading platforms will provide facilities for trading automation, in the form of scripts and languages with which you can program the logic of the trading "strategy" (some use common languages like Java, some use proprietary ones). They will also provide some minimal support for testing the strategy on past data and getting a detailed report on the trades taken and their outcome.
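To give an idea of what such a backtest does under the hood, here is a toy sketch in plain Python. The strategy (a moving-average crossover), the window sizes and the price series are all made up for illustration; this is not how any particular broker platform implements it, and it ignores costs, slippage and position sizing.

```python
def moving_average(prices, window, end):
    """Mean of the `window` prices ending at index `end` (inclusive)."""
    return sum(prices[end - window + 1 : end + 1]) / window


def backtest(prices, fast=2, slow=3):
    """Replay past prices and return the profit or loss of each closed trade.

    Buy one unit when the fast average rises above the slow one,
    sell when it falls below. Trades happen at the current price.
    """
    trades = []
    entry = None  # price at which the open position was bought, if any
    for i in range(slow - 1, len(prices)):
        fast_ma = moving_average(prices, fast, i)
        slow_ma = moving_average(prices, slow, i)
        if entry is None and fast_ma > slow_ma:
            entry = prices[i]  # open a position
        elif entry is not None and fast_ma < slow_ma:
            trades.append(prices[i] - entry)  # close it and record the outcome
            entry = None
    return trades


# Hypothetical past data, one price per bar.
history = [10, 10, 10, 11, 12, 13, 12, 10, 9]
report = backtest(history)
print("trades:", report, "total:", sum(report))
```

On this made-up series the strategy buys late and sells late, so the single trade loses money, which is exactly the kind of thing a backtest report is meant to show before real money is involved.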
Connection to broker and System Testing:
Either you use some broker-proprietary trading API, or go with the more standardized FIX.
Building a FIX server that does a quotation ticks playback to your trading system (which in this case will be a FIX client) is also a very good form of validation of the system. Most reputable ECNs will provide FIX access. So this is more portable than any other interface.
QuickFIX/J is a full featured messaging engine for the FIX protocol. It is a 100% Java open source implementation of the popular C++ QuickFIX engine.
http://www.quickfixj.org/
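In practice you would let an engine like QuickFIX/J handle the wire format, but it helps to see how simple a FIX message is. The sketch below, in Python, builds the classic tag=value encoding with the BodyLength (9) and CheckSum (10) fields computed the way the FIX specification describes. The `fix_message` helper and the CLIENT/SERVER identifiers are hypothetical, and a real session message would also need further header fields such as SendingTime (52).

```python
SOH = "\x01"  # FIX field delimiter


def fix_message(msg_type, fields, begin_string="FIX.4.2"):
    """Encode a FIX message from a message type and (tag, value) pairs."""
    # Body: everything after BodyLength up to (not including) the CheckSum field.
    body = "".join(f"{tag}={value}{SOH}" for tag, value in [(35, msg_type)] + fields)
    # BodyLength (9) is the byte count of the body.
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum (10) is the sum of all preceding bytes modulo 256, as 3 digits.
    checksum = sum((head + body).encode("ascii")) % 256
    return f"{head}{body}10={checksum:03d}{SOH}"


# A heartbeat (MsgType 0) from a hypothetical client to a hypothetical server.
heartbeat = fix_message("0", [(49, "CLIENT"), (56, "SERVER"), (34, 1)])
print(heartbeat.replace(SOH, "|"))
```

A playback server for testing would send market data messages built the same way, one per recorded tick, to the trading system acting as the FIX client.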
There aren't any full blown platforms/applications per se, since pretty much all software in this field is developed in-house, and usually behind the firewall (obviously for competitive advantage, in a fiercely competitive industry).
A well known library that includes a lot of algorithms and pricing models, and makes for a suitable starting point for a framework or app, is QuantLib.
The Strata project from OpenGamma provides a comprehensive open source Java library for market risk, including all the basic elements a quant would need to manage things like holidays, trades, valuation and risk measures. Disclaimer, I am an author.
We are looking at acquiring Data Mining software to primarily run predictive analysis processes.
How does the SQL Server Data Mining solution compare to other solutions like SPSS from IBM?
Since SQL Server DM is included in the SQL Server Enterprise license, what would be the justification for spending an extra couple of hundred thousand to buy separate software just to do DM?
I would look into open source options as well, including R, RapidMiner and Weka.
I would recommend checking out the Rexer survey, as it shows popularity and satisfaction measures for a variety of data mining products:
http://www.kdnuggets.com/2010/03/f-annual-rexer-analytics-data-miner-survey-results.html
Depending on what you are looking to accomplish, and obviously your budget, there are certainly some great things being done in R. Check out Rattle for R and Revolution Computing.
I am a big fan of SPSS, and unfortunately have not used their Modeler package, but it seems like it may be worth considering. I have used SAS Enterprise Miner, and while it is powerful, I am not a big fan.
I haven't dabbled with Weka that much, but I found RapidMiner to have a steep learning curve, though it does have a lot of capability.
If you want to keep everything in the Microsoft stack check out www.predixionsoftware.com which is planning the release of a disruptive Excel add-in as an update to the current MS DM add-ins.
You might want to give KNIME a try before paying for something else. It works well with databases and is excellent for exploratory analysis.
I would suggest checking out open-source data mining software. There is some very good open-source software that is free.
I would start by building some data mining models in SSAS using both Multidimensional and Tabular, and then get an account for Google Analytics. I built a social networking website where members had to join, used Google Analytics to start building reporting dashboards, and have probably built close to a thousand of them. That is a good starting point. R is good; Omniture used to be the top dog, but Adobe bought them. There are also ClickTracks, QlikView, Sisense, Tableau and Actuate. However, I would wait and see how the product Microsoft releases turns out. Chances are it will set itself apart, as they have done in the BI market, having shot up to 2nd in market share and 1st in growth in the database market.