Summary: Exam 2018
 This + 400k other summaries
 A unique study and practice tool
 Never study anything twice again
 Get the grades you hope for
 100% sure, 100% understanding
Read the summary and the most important questions on Exam 2018

1 Introduction
This is a preview. There are 1 more flashcards available for chapter 1
Show more cards here 
Cristiano is a sporty guy, and this is also his main occupation. In order to prepare well for an important tournament he is engaged in the quantified self. He tracks a variety of things, including all kinds of physiological measurements (such as heart rate and respiration), his movements (using the sensors of his mobile phone) as well as the training sessions he performs.(3 pt) Choe et al distinguishes various purposes why someone would engage in the quantified self. Argue which purpose best fits Cristiano.
”Improve health would be the most natural answer here (though others are accepted provided that a good rationale is provided). Cristiano tries to optimize his health state/execute a plan to get in the best possible shape.” 
Cristiano is a sporty guy, and this is also his main occupation. In order to prepare well for an important tournament he is engaged in the quantified self. He tracks a variety of things, including all kinds of physiological measurements (such as heart rate and respiration), his movements (using the sensors of his mobile phone) as well as the training sessions he performs.(5 pt) Explain for one of the two tasks you have identified above what the table X would look like (explain both the columns and the rows).
For the prediction of the heart rate we could consider a number of different measurements (the columns), including the accelerometer data, the respiration, the training characteristics. The rows would be time points at which we perform measurements (i.e. examples in our dataset). The target is not part of X. 
Cristiano is a sporty guy, and this is also his main occupation. In order to prepare well for an important tournament he is engaged in the quantified self. He tracks a variety of things, including all kinds of physiological measurements (such as heart rate and respiration), his movements (using the sensors of his mobile phone) as well as the training sessions he performs.(3 pt) Explain how we could apply reinforcement learning to the case of Cristiano.
”We could apply reinforcement learning to learn when to motivate Cristiano to really push his limits. For his, we could learn when to send motivating messages to him on his mobile phone such that he will in the end be in a better shape.” 
2 Outlier Detection

We want to apply an outlier detection algorithm to this data. If we considerremoving the outliers for the attribute X1 alone, would you prefer to use Chauvenet’s criterion or a mixture model? Argue your choice.
”The mixture model would be preferred since it seems to be very difficult
to fit a single normal distribution on this data. Two normal distributions seem to fit quite well (one centred around 2 and one around 6. With a mixture model we could establish this.” 
(5 pt) Explain the local outlier factor algorithm on a conceptual level.
Local outlier factor is a distance based outlier detection algorithm and
considers the k closest neighbors around a point to determine whether it is an outlier. For those points it considers how far they are located from their closest neighbors and compares that to the distance of the current point to its neighbors. If the current point is much more distant from its neighbors compared to how distant its neighbors are to their neighbors the point is considered to be an outlier. 
(4 pt) Let us consider the point shown by means of the black star (at X1 = 8 and X2 = 8). Would it be more likely that this point would flagged as an outlier using the simple distance based outlier detection or using the local outlier factor? Argue why.
It is more likely to be flagged as outlier by the simple distance based outlier
detection algorithm since it seems relatively far away from the other points (meaning that possibly to few points would fall in dmin). Local outlier factor would consider the fact that point are in general pretty distant from each other in that area. 
(3 pt) In outlier detection, the outlier detection algorithms have parameters to be set that eventually influence what is considered to be an outlier or not. Explain how appropriate parameter values can be found.
Through visual inspection, or by considering the number of points that are considered to be outliers. 
(4 pt) We want to apply a Principal Component Analysis to this data. Illustrate graphically what the principal component would look like. Argue why you have drawn it in that way.
This should be a diagonal line going from around (0, 0) to (8, 8). The reason for this is that it explain most variance in the data. 
3 Feature Engineering
This is a preview. There are 1 more flashcards available for chapter 3
Show more cards here 
(3 pt) Given this dataset, we are considering to aggregate the values of heart rate in the time domain by using the mean. A proposal is done to apply a window size of λ = 3. Given the size of the dataset shown, do you think this an appropriate choice? Argue why (not).
No, with such a window size only two datapoints would remain where we have a value for the feature from the time domain, that would be too limited. 
(4 pt) Explain how we can derive temporal features in the time domain in case we have a combination of numerical and categorical features.
We create categories for the numerical features (e.g. normal, low, high,
or increasing and decreasing) and apply the algorithm by Batal emphet al..
 Higher grades + faster learning
 Never study anything twice
 100% sure, 100% understanding