# Categorical Latent Variables - R assignment

## 6 important questions on Categorical Latent Variables - R assignment

### How can you fit normal mixture models with 2 to 12 potential clusters, and plot the BIC values for each model?

clustbic <- mclustBIC(data, G = 2:12) # fit models with 2 to 12 clusters

clustbic

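plot(clustbic) # plot the BIC values

Printing clustbic shows the BIC of every fitted model and lists the three best models.

The mclust package defines BIC as twice the log-likelihood minus the complexity penalty and uses a maximum-BIC strategy, so here a higher BIC indicates a better fit.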
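The steps above can be tried end to end with a small runnable sketch. It assumes the mclust package is installed and uses the four numeric columns of the built-in iris data as a stand-in for `data`; the assignment's own data set is not shown in these notes.

```r
library(mclust)

# Stand-in data (assumption): four numeric columns of the built-in iris set.
data <- iris[, 1:4]

# BIC for every covariance structure mclust tries, for 2 to 12 clusters.
clustbic <- mclustBIC(data, G = 2:12)

summary(clustbic)  # lists the top models ranked by BIC
plot(clustbic)     # one line per covariance structure; higher is better in mclust
```

In the plot, each line is one covariance structure and the x-axis is the number of clusters, so the highest point marks the model that mclust would select.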

### After exploratory fitting to find the number of clusters, how can you fit the best-fitting model to your data?

clusfit <- Mclust(data, x = clustbic) # reuses the BIC results and fits the model with the highest BIC

### Once you have fitted your model, how can you obtain some interesting statistics from it?

- clusfit$parameters$pro # the mixing proportions (prior class probabilities)
- clusfit$z # the posterior probabilities of each individual for each cluster
- clusfit$classification # the cluster each individual is assigned to (highest posterior probability)
- clusfit$parameters$mean # the mean score per question per cluster
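As a minimal sketch of how these components relate to each other, the following fits a model and inspects them. The iris columns are again only a stand-in for `data` (an assumption), so the variables here are flower measurements rather than questions.

```r
library(mclust)

# Stand-in data (assumption): four numeric columns of the built-in iris set.
data <- iris[, 1:4]

clustbic <- mclustBIC(data, G = 2:12)
clusfit  <- Mclust(data, x = clustbic)  # best model according to the stored BIC values

clusfit$G                          # number of clusters in the selected model
clusfit$modelName                  # covariance structure of the selected model
sum(clusfit$parameters$pro)        # mixing proportions sum to 1
range(rowSums(clusfit$z))          # each individual's posterior probabilities sum to 1
table(clusfit$classification)      # cluster sizes under hard assignment
round(clusfit$parameters$mean, 2)  # variables in rows, clusters in columns
```

The classification is simply the column of `clusfit$z` with the largest value in each row, so `clusfit$z` carries more information about how certain each assignment is.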


### How can you identify whether the clusters are well separated?

clustred <- MclustDR(clusfit) # perform dimensionality reduction to plot the clusters

plot(clustred, what = "boundaries", ngrid = 200)

plot(clustred, what = "density", dimens = 1)
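A self-contained version of these calls, for experimenting, might look like the sketch below. It assumes the mclust package and uses the iris columns with three clusters as illustrative stand-ins; neither choice comes from the assignment.

```r
library(mclust)

# Stand-in model (assumption): three clusters on the numeric iris columns.
clusfit  <- Mclust(iris[, 1:4], G = 3)
clustred <- MclustDR(clusfit)  # directions that best separate the fitted clusters

summary(clustred)  # estimated directions and their eigenvalues

# Classification boundaries in the first two directions:
# little overlap between regions suggests well-separated clusters.
plot(clustred, what = "boundaries", ngrid = 200)

# Cluster densities along the first direction only.
plot(clustred, what = "density", dimens = 1)
```

Points that fall close to a boundary, or density curves that overlap heavily, indicate individuals whose cluster membership is uncertain.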

### What is a strategy for identifying how individual characteristics relate to the clustering?

- In a data frame that contains the characteristics of every individual (characteristics as columns, individuals as rows), add the cluster assignment as a column.
- Then you can group by one or more characteristics and calculate what percentage of the people in a cluster have them.
- This way you can look at main effects, 2-way effects, 3-way effects, and so on.

raters %>%
  group_by(cluster) %>%
  count(age_group) %>%
  mutate(clust_tot = sum(n)) %>%
  mutate(clust_prop = n / clust_tot) %>%
  arrange(desc(clust_prop))

- This shows which characteristics are over-represented in which clusters. It describes an association between characteristics and cluster membership, not proof that a characteristic causes the assignment.
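The strategy above can be sketched on a small made-up data set. The `raters` table below, including its `gender` column and all of its values, is hypothetical and only illustrates the mechanics; it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data: one row per rater, with the cluster added as a column.
raters <- tibble(
  cluster   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  age_group = c("18-30", "18-30", "18-30", "31-50",
                "18-30", "31-50", "31-50", "31-50", "51+", "51+"),
  gender    = c("f", "m", "f", "f", "m", "m", "f", "m", "f", "m")
)

# Main effect: share of each age group within each cluster.
main_effect <- raters %>%
  count(cluster, age_group) %>%
  group_by(cluster) %>%
  mutate(clust_prop = n / sum(n)) %>%
  ungroup() %>%
  arrange(cluster, desc(clust_prop))

# 2-way effect: share of each age group and gender combination within each cluster.
two_way <- raters %>%
  count(cluster, age_group, gender) %>%
  group_by(cluster) %>%
  mutate(clust_prop = n / sum(n)) %>%
  ungroup()

main_effect
two_way
```

Adding more columns to `count()` gives higher-order combinations, but the cells become small quickly, so the proportions get noisier with every extra characteristic.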

### What is the effect of assigning a cluster only to individuals whose highest conditional (posterior) probability is .8 or more, and NA otherwise?

- This causes your clusters to become better separated.
- Otherwise you also assign individuals who do not clearly belong to any one cluster, because their conditional probabilities are similar across several clusters.
- Assigning these people dilutes the essence of the cluster.
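A minimal base-R sketch of this thresholding is shown below. The matrix `z` is a hypothetical set of posterior probabilities made up for illustration; with a fitted model you would use `clusfit$z` in its place.

```r
# Hypothetical posterior probabilities: five individuals (rows), three clusters (columns).
z <- rbind(
  c(0.95, 0.03, 0.02),
  c(0.10, 0.85, 0.05),
  c(0.40, 0.35, 0.25),
  c(0.05, 0.15, 0.80),
  c(0.50, 0.50, 0.00)
)

max_post <- apply(z, 1, max)                    # highest posterior per individual
hard     <- max.col(z, ties.method = "first")   # cluster with the highest posterior

# Keep the assignment only when the model is at least 80% sure, NA otherwise.
confident <- ifelse(max_post >= 0.8, hard, NA)

confident               # individuals 3 and 5 are left unassigned
mean(is.na(confident))  # share of individuals dropped by the threshold
```

The trade-off is visible in the last line: a stricter threshold gives cleaner clusters but leaves more individuals without a cluster.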
