Hidden Markov Model Bayesian Relation - artificial-intelligence

Hi all, this is from the artificial intelligence class on Udacity, and I have a question.
P(R0)=1 means the probability that day 0 is rainy is 1. Here is my question:
P(R2 | H1 G2)? That is, given that I am happy on day 1 and grumpy on day 2, what is the probability that it is raining on day 2?

Some useful data
P(R_2|H_1, G_2) can be reduced to P(R_2|G_2) because no transition coefficients between moods are given (they can be worked out for the weather sequence, however).
P(R_2|G_2) = P(G_2|R_2)*P(R_2)/P(G_2) = 0.264*0.440/0.320 = 0.363
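The arithmetic in the line above can be reproduced with a few lines of Python. This is only a sketch that takes the three quoted values at face value (the transition and emission tables behind them are not reproduced in the text); the function name `posterior` is made up for illustration.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Values exactly as quoted in the answer (assumed, not re-derived here).
p_g2_given_r2 = 0.264
p_r2 = 0.440
p_g2 = 0.320

p_r2_given_g2 = posterior(p_g2_given_r2, p_r2, p_g2)
print(round(p_r2_given_g2, 3))  # 0.363
```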

Related

Forecasting 5 Day Sales

I have daily sales data from 2013-02-18 to 2017-02-12 with only 4 days missing (Christmas, the 25th of December, each year). These holidays have a sales volume of zero. My goal is to work out how to staff my store for the upcoming week by forecasting my sales for the next 5-7 days.
I start by setting up this data as a time series:
ts <- ts(mydata, frequency = 365)
and then an initial analysis through a decomposition:
This seems to show I have a declining sales trend, but there is some seasonality, if I'm not mistaken. So, to start my forecast implementation, I fit an ARIMA model to the first two years' worth of data:
fit <- auto.arima(ts[1:730], stepwise = FALSE, approximation = FALSE)
Series: ts[1:730]
ARIMA(4,1,1)
Coefficients:
         ar1      ar2      ar3      ar4      ma1
      0.3638  -0.2290  -0.1451  -0.2075  -0.8958
s.e.  0.0413   0.0388   0.0388   0.0398   0.0241
sigma^2 estimated as 15424930: log likelihood=-7068.67
AIC=14149.33 AICc=14149.45 BIC=14176.88
This model doesn't seem right to me, because it does not include any seasonality. I know I have enough data. Rob Hyndman's blog said to try using ets which also showed no seasonality. What am I not understanding about this data series or the forecasting methodology?
I've re-asked this question more appropriately on the stats Stack Exchange site. Could someone please close this question here for me?
The question is now here:
https://stats.stackexchange.com/questions/295012/forecast-5-7-day-sales

Genetic algorithm fitness function for stock market prediction

I have been working on stock market prediction for a couple of months but could not find any relevant information. I googled it and found some research papers, but unfortunately they only describe how a genetic algorithm works, which I already know.
I need to design a fitness function to predict the stock market.
I have already obtained real data from the stock market:
Open    High    Low     Close   Volume
253.8   255.8   253.8   255.8   809300
250.8   250.8   243.05  247.8   2041000
248.1   254.9   248.19  254     4550500
254     261.39  252.35  259.54  9926000
259.54  260.60  253.5   253.94  5425700
253.94  257.25  248.05  256.10  7504500
256.1   258.35  248.30  251     10933400
251     253.64  249.25  250.44  5478500
250.44  252.89  248.60  252.25  6316600
252.25  254.85  252     254.05  6332500
254.05  255.35  252     252.25  6961600
253.5   259.5   253.5   259.25  10216200
259.25  260.20  257.10  257.89  6071400
Can anyone please help me find a suitable fitness function?
Your fitness function would be how close your predictions were to the actual values. So you've got your population of agents who are predicting tomorrow's prices. Say agent #12683 goes through his model and predicts that the price of eggs will be up 0.5% tomorrow. You take each prediction (+0.5%), subtract it from the actual outcome, and take the absolute value.
A score of zero is perfect.
You'd use historical data as the learning set.
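As a rough sketch of that idea, the fitness of one agent can be computed as the mean absolute error of its one-step-ahead predictions over historical closing prices. Everything named here (`fitness`, `last_value`, `window_mean`, the `window` parameter) is hypothetical and only for illustration; the closing prices are the Close column from the question.

```python
from typing import Callable, Sequence


def fitness(predict: Callable[[Sequence[float]], float],
            closes: Sequence[float], window: int = 3) -> float:
    """Mean absolute error of one-step-ahead predictions; 0.0 is perfect."""
    errors = []
    for t in range(window, len(closes)):
        predicted = predict(closes[t - window:t])  # agent sees only the past
        errors.append(abs(closes[t] - predicted))
    return sum(errors) / len(errors)


# Close column from the data in the question.
closes = [255.8, 247.8, 254, 259.54, 253.94, 256.10, 251,
          250.44, 252.25, 254.05, 252.25, 259.25, 257.89]


# Two toy "agents" (assumed strategies, not real trading models).
def last_value(history: Sequence[float]) -> float:
    return history[-1]


def window_mean(history: Sequence[float]) -> float:
    return sum(history) / len(history)


scores = {agent.__name__: fitness(agent, closes)
          for agent in (last_value, window_mean)}
best = min(scores, key=scores.get)  # lower score means a fitter agent
print(scores, best)
```

In a genetic algorithm the agents with the lowest scores would be the ones selected for crossover and mutation.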
And you'd be a decade behind the quant-devs who have already done this and a few years behind the quant-devs who gamed those systems to make a buck. Welcome to the stock market.

Relational database design for multidimensional matrix questions

I am designing a relational DB for an online survey.
However, I am not sure what is the best relational database design for storing multidimensional matrix questions.
Let's say, I have the following question (sorry, it does not allow me to insert HTML table):
What was your experience of...
         | Not friendly | (2) | Very friendly | Length of stay | Visited in the last year? |
--------------------------------------------------------------------------------------------
Sydney   | radio button | rb  | rb            | drop down      | check box                 |
--------------------------------------------------------------------------------------------
New York | rb           | rb  | rb            | drop down      | check box                 |
--------------------------------------------------------------------------------------------
London   | rb           | rb  | rb            | drop down      | check box                 |
--------------------------------------------------------------------------------------------
Do you think I should do something along the following lines or is there a better way?
To hold all the questions:
Question
    questionID
    question
QuestionMatrix2d
    matrix2dID
    questionID
    subquestionID
    subquestion
QuestionMatrix
    questionID
    matrix2dID
    question_parentID
And to hold all the responses:
QuestionResponse
    questionID
    response_code
QuestionMatrix2dResponse
    questionID
    subquestionID
    response_code
Thank you for your help.
I disagree with ryan1234. This totally is a relational problem, and there is very little reason not to put it into a database.
I have to do a bit of guesswork, though, about what you're trying to achieve here. You have an online survey, so I assume it will be used by more than one person. Your database will need to accommodate that by having a session or user table; I'll go with the latter since it is clearer to read.
Secondly, you have a list of locations (Sydney, New York, London). I assume this list can change over time or even from one questionnaire to the next.
Then you have a set of questions. You don't explicitly state that these would be variable or fixed. Since you designed a set of tables for that, I assume it's supposed to be variable. Please note that your questions are not a matrix, but a list. Even if they are hierarchical, they still do not compose a matrix.
Last but not least you've got answers to those questions.
Let's create a users table:
user_id  user_name
1        me
2        somebody else
The second table is just as simple: locations
location_id  location_name
1            Sydney
2            New York
3            London
Third table is a bit more complicated - and to be honest: just plain ugly. But this is what you get if you design a database in a database, and the alternatives (using DDL or storing that information in XML/JSON or even outside the database) are not pretty either. If there is a hierarchical question (your examples don't show them), you could add a "parent_question_id" column.
question_id  question_text      question_type  question_type_info
1            How do you rate    RADIO          0 to 5
2            Length of stay     COMBOBOX       1 day, 2 days, whatever
3            Visited last year  CHECKBOX
Finally, you need a fourth table to store all the answers:
user_id  location_id  question_id  value
1        1            1            2      <-- value here means "rating of 2"
1        1            2            5      <-- value here means "5 days"
1        1            3            1      <-- value here means "yes, visited last year"
Yep. ugly as well. If you had a fixed list of questions I could provide you with a pretty database :)
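To make the four tables concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types and the composite primary key on the answers table are my assumptions (the layout above does not specify them); the rows are the sample rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users     (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, location_name TEXT NOT NULL);
CREATE TABLE questions (
    question_id        INTEGER PRIMARY KEY,
    question_text      TEXT NOT NULL,
    question_type      TEXT NOT NULL,
    question_type_info TEXT
);
-- Assumption: one answer per (user, location, question).
CREATE TABLE answers (
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    location_id INTEGER NOT NULL REFERENCES locations(location_id),
    question_id INTEGER NOT NULL REFERENCES questions(question_id),
    value       INTEGER,
    PRIMARY KEY (user_id, location_id, question_id)
);
INSERT INTO users VALUES (1, 'me'), (2, 'somebody else');
INSERT INTO locations VALUES (1, 'Sydney'), (2, 'New York'), (3, 'London');
INSERT INTO questions VALUES
    (1, 'How do you rate', 'RADIO', '0 to 5'),
    (2, 'Length of stay', 'COMBOBOX', '1 day, 2 days, whatever'),
    (3, 'Visited last year', 'CHECKBOX', NULL);
INSERT INTO answers VALUES (1, 1, 1, 2), (1, 1, 2, 5), (1, 1, 3, 1);
""")

# Everything user 1 answered, with readable location and question names.
rows = conn.execute("""
    SELECT l.location_name, q.question_text, a.value
    FROM answers a
    JOIN locations l ON l.location_id = a.location_id
    JOIN questions q ON q.question_id = a.question_id
    WHERE a.user_id = ?
    ORDER BY q.question_id
""", (1,)).fetchall()
for row in rows:
    print(row)
```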
Edit: In answer to your comments: to link your questions to a survey, you'll need a few more tables defining which questions are going to be asked for which locations. The following database layout lets you specify a list of locations and a list of questions asked, as well as a survey name.
Table surveys:
survey_id  survey_name
1          Spring 2013 London Travel Survey
2          Spring 2013 Northern Hemisphere Short Survey
Table survey_questions:
survey_id  question_id
1          1
1          2
1          3
2          1
Table survey_locations:
survey_id  location_id
1          3
2          2
2          3
The contents I put in here give you two surveys. Survey #1 asks all three questions about just one location: London. Survey #2 asks just one question about both London and New York. If you want to ask different questions for different locations, your table layout will have to accommodate that, but such a system won't fit into your original table-like layout.
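A sketch of how the survey tables combine into the grid of (location, question) pairs a survey asks, again using sqlite3 in memory. The column types and the helper name `survey_grid` are assumptions; the location ids are chosen so that survey #1 covers London and survey #2 covers London and New York, as the text describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, location_name TEXT);
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, question_text TEXT);
CREATE TABLE surveys   (survey_id INTEGER PRIMARY KEY, survey_name TEXT);
CREATE TABLE survey_questions (
    survey_id   INTEGER REFERENCES surveys(survey_id),
    question_id INTEGER REFERENCES questions(question_id),
    PRIMARY KEY (survey_id, question_id)
);
CREATE TABLE survey_locations (
    survey_id   INTEGER REFERENCES surveys(survey_id),
    location_id INTEGER REFERENCES locations(location_id),
    PRIMARY KEY (survey_id, location_id)
);
INSERT INTO locations VALUES (1, 'Sydney'), (2, 'New York'), (3, 'London');
INSERT INTO questions VALUES
    (1, 'How do you rate'), (2, 'Length of stay'), (3, 'Visited last year');
INSERT INTO surveys VALUES
    (1, 'Spring 2013 London Travel Survey'),
    (2, 'Spring 2013 Northern Hemisphere Short Survey');
INSERT INTO survey_questions VALUES (1, 1), (1, 2), (1, 3), (2, 1);
-- London has location_id 3, New York has location_id 2.
INSERT INTO survey_locations VALUES (1, 3), (2, 2), (2, 3);
""")


def survey_grid(survey_id: int) -> list:
    """Every (location, question) pair the given survey asks about."""
    return conn.execute("""
        SELECT l.location_name, q.question_text
        FROM survey_locations sl
        JOIN survey_questions sq ON sq.survey_id = sl.survey_id
        JOIN locations l ON l.location_id = sl.location_id
        JOIN questions q ON q.question_id = sq.question_id
        WHERE sl.survey_id = ?
        ORDER BY l.location_id, q.question_id
    """, (survey_id,)).fetchall()


print(survey_grid(1))
print(survey_grid(2))
```

Because locations and questions are stored independently per survey, each survey always asks the full cross product of the two lists, which is exactly the table-like layout of the original question.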
Having done things similar to this, I would recommend considering not turning this into a relational problem. What if you have objects and just serialize them to something like JSON and store that?
Doing this relationally you'll end up spending quite a bit of time making tables and wiring together complex drawing code in your application to make sure the questions/answers draw in the right order etc.
Otherwise I think you can make your approach work. There is no silver bullet for designing survey stuff in an RDBMS.
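For comparison, here is a small sketch of the serialize-to-JSON alternative. The dictionary layout and the "Friendliness" label are purely hypothetical; the point is only that the whole matrix question round-trips through a single string that could be stored in one text column.

```python
import json

# Hypothetical object layout for the matrix question from the original post.
question = {
    "text": "What was your experience of...",
    "rows": ["Sydney", "New York", "London"],
    "columns": [
        {"label": "Friendliness", "type": "radio",
         "options": ["Not friendly", "(2)", "Very friendly"]},
        {"label": "Length of stay", "type": "dropdown"},
        {"label": "Visited in the last year?", "type": "checkbox"},
    ],
}

stored = json.dumps(question)   # this string is what would go into the database
restored = json.loads(stored)   # and this is what the application reads back
print(restored["rows"])
```

The trade-off is that the database can no longer query or constrain individual answers without parsing the JSON.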

Should I create two tables to store specific information of an object?

I have some tables:
Course: contains info about a course; one course has many topics.
Topic: contains info about a topic; one topic belongs to one course and has many questions.
Question: contains info about a question; one question belongs to one topic.
GeneralExam: contains info about the exam of a course; one general exam belongs to one course.
GeneralQuestion: contains the set of questions of a general exam.
These are the columns of the last two tables:
GeneralExam: name, description, semester, duration, user_id, course_id, used (boolean), number_question
GeneralQuestion: general_exam_id, question_id
The questions for a GeneralExam are picked at random: I draw a specific number of random questions from each topic.
Now I want to know specific information about a general exam, such as the number of questions taken from each topic of the course the exam was made for. Currently, I think I will create a new table to store that info, something like:
New table: general_exam_id, topic_id, number_question
But I don't know if this is the best way to do it, or whether there are other approaches or patterns for this case. If I create that new table, then whenever I change a GeneralExam (e.g. change its set of questions), I will need to update 3 tables: GeneralExam, GeneralQuestion, and the new table. I'm not sure that is a good approach.
So I want to ask: should I create a new table to store that information (the number of questions of each topic in a general exam),
or should I change the GeneralQuestion table so that it stores the general exam's info better, and if so, what changes should I make? Thanks for any suggestions and advice.
What we are trying to say is that you do not need to create an extra table. You want to manage your schema efficiently, touching as few tables as possible.
Design Rules:
One should not confuse the numbered topics in a particular course book with the Topic table's ID numbers. A Course doesn't necessarily have to belong to an Exam; it's the Exam that must belong to a Course. Your design is correct so far. I assume you are storing all Questions for an Exam in the GeneralQuestion table, which acts as a sort of question bank of past Exams (including Exams scheduled for the near future, to which only the Exam moderators have access).
It makes more sense to rename your GeneralQuestion table to ExamsQuestions. With this bank, your design has two virtual question types: Exam questions from the bank and questions from the Question table, where Exam questions reference your Question table. That gives you the required referential key into the Exam question bank. In my opinion it is a history table. The final table that you are unsure about should ideally be just a stored query providing real-time data.
Main question: are you planning to store each past or scheduled Exam's questions? You say yes. Hence,
Date becomes a crucial column in your Exam table in the design I have provided. You need both Date and Course ID in the Exam table.
The following is the table schema I would suggest.
Reference on SQLFiddle
tblCourse
ID, Course
ID    NAME
b105  biology 1st year
c323  chemistry 1st year
e120  english 1st year
m122  maths 1st year
m250  maths 2nd year
p302  physics 3rd year
tblTopic: although ID is the index, CID is what identifies the Topic's parent (the Course)
ID, CID, Topic
ID  CID   NAME
t1  m122  Algebra
t2  m122  Probability
t3  e120  Essay Writing
t4  p302  Optics
t5  b105  liver system
t6  b105  neural system
t7  p302  mechanics
tblQuestion: although ID is the index, TID is what identifies the Question's parent (the Topic)
ID, TID, Question
ID   TID  QUESTION
q1   t2   x
q10  t7   p
q11  t4   n
q12  t6   i
q13  t7   r
q14  t6   k
q2   t1   y
q3   t1   z
q4   t2   a
q5   t2   v
q6   t6   s
q7   t6   h
q8   t1   l
q9   t2   g
tblExam: although ID is the index, CID is what identifies the Exam's parent (the Course)
ID, CID, Exam, Date
ID  CID   EXAM                          DATE
e1  b105  1st Year Biology Main Stream  June, 08 2012
e2  m122  1st Year Maths Elective       December, 20 2011
e3  b105  1st Year Biology Main Stream  February, 10 2012
tblExamsQuestions: foreign keys: Exam ID, Question ID
ID, QID
Application:
Somebody wants last year's Exam questions for the 1st Year Maths course. How do you query that? If the Exam ID is an auto-increment, it is very hard to know which ID is which exam. So here you can search for the questions of a particular course's exam using only the course ID and the date the exam was held. That does the job, unless exams for the same course are held several times on the same day; in that case you can store the time as well. You can drop Date and Time as long as you change your Exam table design so that you query by an Exam ID that is a meaningful exam identifier, not just 1, 2, 3, ...
Course ID = m122
Date = Last Year/Month/Date
These are the most logical and important details; they work as the COMPOSITE SEARCH KEY you need to find the Exam ID in the Exam table, which you then use in the ExamsQuestions bank to pull the Exam questions.
select * from question
where id in (
    select eq.qid
    from examsquestions eq
    inner join exam e
        on e.id = eq.id
    where e.date = '2011-12-20'
      and e.cid = 'm122');
ID  TID  QUESTION
q1  t2   x
q5  t2   v
q7  t6   h
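The lookup above can be tried out with Python's sqlite3 module. This is a sketch under stated assumptions: the ExamsQuestions rows are not listed anywhere in this answer, so the links below are inferred from the result shown (plus one made-up link for exam e1), dates are stored as ISO text, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id TEXT PRIMARY KEY, tid TEXT, question TEXT);
CREATE TABLE exam     (id TEXT PRIMARY KEY, cid TEXT, exam TEXT, date TEXT);
-- id is the exam id and qid the question id, as in tblExamsQuestions.
CREATE TABLE examsquestions (
    id  TEXT REFERENCES exam(id),
    qid TEXT REFERENCES question(id)
);
INSERT INTO question VALUES
    ('q1', 't2', 'x'), ('q5', 't2', 'v'), ('q7', 't6', 'h'), ('q12', 't6', 'i');
INSERT INTO exam VALUES
    ('e1', 'b105', '1st Year Biology Main Stream', '2012-06-08'),
    ('e2', 'm122', '1st Year Maths Elective', '2011-12-20');
-- Assumed links: e2 rows inferred from the result above, e1 row is made up.
INSERT INTO examsquestions VALUES
    ('e2', 'q1'), ('e2', 'q5'), ('e2', 'q7'), ('e1', 'q12');
""")


def exam_questions(course_id: str, exam_date: str) -> list:
    """Questions of the exam identified by (course id, date held)."""
    return conn.execute("""
        SELECT * FROM question
        WHERE id IN (
            SELECT eq.qid
            FROM examsquestions eq
            INNER JOIN exam e ON e.id = eq.id
            WHERE e.date = ? AND e.cid = ?)
        ORDER BY id
    """, (exam_date, course_id)).fetchall()


print(exam_questions("m122", "2011-12-20"))
```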
By the way, since you are choosing questions randomly for an Exam, I would be quite worried if I had to take it, because the risk of getting all the questions from one topic is fairly high. Anyway, that's a side issue; I hope you have an unbiased yet FAIR mechanism to generate an Exam from all the topics of a course ;)
Let me know if you have further doubts. Anyone, please share ideas for better solutions.
PS: Sorry for the late reply.
If the information you want can be queried from the current data, in general you should not store it in another table. The reason is: every time you add/remove rows from other tables, you'd have to update this one as well. It's easy to create data inconsistencies that way.
For your example (number of questions of a given topic in an exam), you can easily retrieve that info using aggregation:
select q.topic_id, count(gq.question_id)
from topic t join question q on t.id = q.topic_id
join general_question gq on q.id = gq.question_id
where gq.general_exam_id = 10
group by q.topic_id;
OTOH, if the data you want to store is not deducible from the rest of the data, then yes, it's better to store it where it makes sense: if it's specific to the pair (exam, topic), then in a table that has those two values as its candidate key (i.e. exactly the way you suggested in your question). Whether to create a new table or add those columns to an existing one (with the correct candidate key, of course) is your choice; I don't have any arguments for or against either.
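Here is a small runnable sketch of the aggregate query from this answer, using sqlite3 in memory. The table and column names follow the query; the sample rows, the column types, and the helper name `questions_per_topic` are invented for illustration, and an ORDER BY is added so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE question (id INTEGER PRIMARY KEY, topic_id INTEGER REFERENCES topic(id));
CREATE TABLE general_question (
    general_exam_id INTEGER,
    question_id     INTEGER REFERENCES question(id)
);
-- Hypothetical sample data.
INSERT INTO topic VALUES (1, 'Algebra'), (2, 'Probability');
INSERT INTO question VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2);
INSERT INTO general_question VALUES (10, 1), (10, 3), (10, 4), (11, 2);
""")


def questions_per_topic(general_exam_id: int) -> list:
    """Number of questions per topic, derived rather than stored."""
    return conn.execute("""
        SELECT q.topic_id, COUNT(gq.question_id)
        FROM topic t
        JOIN question q ON t.id = q.topic_id
        JOIN general_question gq ON q.id = gq.question_id
        WHERE gq.general_exam_id = ?
        GROUP BY q.topic_id
        ORDER BY q.topic_id
    """, (general_exam_id,)).fetchall()


print(questions_per_topic(10))  # [(1, 1), (2, 2)]
```

Since the counts are computed on demand, changing the question set of an exam only touches the general_question rows and the counts can never drift out of sync.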

Determining the functional dependencies of a relationship and their normal forms

I'm studying for a database test, and the study guide has some (many) exercises on database normalization and functional dependencies, but the teacher did not work through any similar exercise, so I would like someone to help me understand this one so that I can attack the other 16 problems.
1) Given the following logical schema:
Relationship product_sales
POS    Zone    Agent  Product_Code  Qualification  Quantity_Sold
123-A  Zone-1  A-1    P1            8              80
123-A  Zone-1  A-1    P1            3              30
123-A  Zone-1  A-2    P2            3              30
456-B  Zone-1  A-3    P1            2              20
456-B  Zone-1  A-3    P3            5              50
789-C  Zone-2  A-4    P4            2              20
Assuming that:
• Points of Sale are grouped into Zones.
• Each Point of Sale has agents.
• Each agent operates in a single POS.
• Two agents of the same points of sale can not market the same product.
• For each product sold by an agent, it is assigned a Qualification depending on the product and the quantity sold.
a) Indicate 4 functional dependencies present.
b) What is the normal form of this structure?
To get you started finding the 4 functional dependencies, think about which attributes depend on another attribute:
eg: does the Zone depend on the POS? (if so, POS -> Zone) or does the POS depend on the Zone? (in which case Zone -> POS).
Four of your five statements tell you something about the dependencies between attributes (or combinations of several attributes).
As for normalisation, there's a (relatively) clear tutorial here. The phrase "the key, the whole key, and nothing but the key" is also a good way to remember the 1st, 2nd and 3rd normal forms.
In your comment, you said
Well, according to the theory I've read I think it may be, but I have many doubts: POS → Zone, {POS, Agent} → Zone, Agent → POS, {Agent, Product_code, Quantity_Sold} → Qualification
I think that's a good effort.
I think POS->Zone is right.
I don't think {POS, Agent} → Zone is quite right. If you look at the sample data, and you think about it a bit, I think you'll find that Agent->POS, and that Agent->Zone.
I don't think {Agent, Product_code, Quantity_Sold} → Qualification is quite right. The requirement states "For each product sold by an agent, it is assigned a Qualification depending on the product and the quantity sold." The important part of that is "a Qualification depending on the product and the quantity sold". Qualification depends on product and quantity, so {Product_code, Quantity}->Qualification. (Nothing in the requirement suggests to me that the qualification might be different for identical orders from two different agents.)
So based on your comment, I think you have these functional dependencies so far.
POS → Zone
Agent → POS
Agent → Zone
{Product_code, Quantity_Sold} → Qualification
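A candidate dependency can be sanity-checked against the sample rows with a few lines of Python. This is only a sketch: the helper `holds` is a made-up name, and sample data can refute a dependency but never prove it, since the real dependencies come from the stated requirements.

```python
from typing import Sequence

COLUMNS = ("POS", "Zone", "Agent", "Product_Code", "Qualification", "Quantity_Sold")

# Sample rows from the question ("Zona-1" is treated as a typo for "Zone-1").
ROWS = [
    ("123-A", "Zone-1", "A-1", "P1", 8, 80),
    ("123-A", "Zone-1", "A-1", "P1", 3, 30),
    ("123-A", "Zone-1", "A-2", "P2", 3, 30),
    ("456-B", "Zone-1", "A-3", "P1", 2, 20),
    ("456-B", "Zone-1", "A-3", "P3", 5, 50),
    ("789-C", "Zone-2", "A-4", "P4", 2, 20),
]


def holds(lhs: Sequence[str], rhs: Sequence[str], rows=ROWS) -> bool:
    """False if two rows agree on every lhs column but differ on rhs."""
    seen = {}
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[c] for c in lhs)
        value = tuple(record[c] for c in rhs)
        # Remember the first rhs seen for this lhs; any later mismatch refutes it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(["POS"], ["Zone"]))                                     # True
print(holds(["Agent"], ["POS"]))                                    # True
print(holds(["Product_Code", "Quantity_Sold"], ["Qualification"]))  # True
print(holds(["Zone"], ["POS"]))                                     # False
```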
But you're missing at least one that has a significant effect on determining keys. Here's the requirement.
Two agents of the same points of sale can not market the same product.
How do you express the functional dependency implied in that requirement?
