
Summary: Introduction To Data Mining | 9780321321367 | Pang Ning Tan, et al

Name: Introduction to data mining
ISBN: 9780321321367


PLEASE KNOW!!! There are just 44 flashcards and notes available for this material. This summary might not be complete. Please search similar or other summaries.


Read the summary and the most important questions on Introduction to data mining | 9780321321367 | Pang-Ning Tan ; Michael Steinbach ; Vipin Kumar.

1 Introduction

The Data Miner's Toolkit can be broken down into two fundamental tasks. Which two tasks are we talking about?
- Predictive Tasks.
- Descriptive Tasks.
What is one of the most famous discoveries made by data mining?
- Men who bought diapers on a Friday night were also likely to buy beer.
- It highlights data mining's power to discover surprising, non-obvious, and useful patterns.
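A pattern like the diapers-and-beer example is usually expressed as an association rule and scored with support and confidence. The card does not name these measures, so the sketch below is an illustration under that assumption; the transactions are hypothetical toy data, not figures from the book.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Score the rule {diapers} -> {beer}.
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

Here 2 of the 5 transactions contain both items, and 2 of the 3 diaper transactions also contain beer. Note that `confidence` divides by zero if the antecedent never occurs; a fuller version would guard against that.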
Deviation/Anomaly Detection is a predictive task of data mining. Can you explain what it is?
- Identify abnormal behaviour.
Common Applications:
- Credit card fraud detection (which transactions are anomalies and potentially fraudulent?).
- Network intrusion detection (detecting unusual patterns in network traffic that might indicate an intrusion).
- Identifying disease and ecosystem disturbances.
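As a rough illustration of "identify abnormal behaviour", the sketch below flags values that sit far from the median, using a modified z-score based on the median absolute deviation. This particular technique, the 3.5 threshold, and the transaction amounts are assumptions chosen for the example; the cards do not prescribe a method.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |v - median| / MAD, where MAD is
    the median absolute deviation. The 3.5 cutoff is a common rule of
    thumb and an assumption here.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical card transaction amounts, invented for illustration.
amounts = [12.5, 9.9, 14.2, 11.0, 10.4, 13.1, 950.0, 12.0]
print(flag_anomalies(amounts))
```

A median-based score is used rather than mean and standard deviation because a single extreme value in a small sample inflates the standard deviation enough to hide itself.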
2 Data

What is data preprocessing and what are the seven key preprocessing techniques?
- Data Preprocessing.
- Seven key preprocessing techniques:
Can you describe the preprocessing technique called Aggregation?
- Combining two or more attributes/objects into a single attribute/object.
- Pros:
- Cons:
- E.g., taking daily sales figures and aggregating them into monthly/yearly totals.
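The daily-to-monthly example above can be sketched in a few lines. The sales records and the `aggregate_monthly` helper are hypothetical, made up to show the idea of combining several objects (days) into one (a month).

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (day, amount).
daily_sales = [
    (date(2024, 1, 30), 120.0),
    (date(2024, 1, 31), 80.0),
    (date(2024, 2, 1), 100.0),
    (date(2024, 2, 2), 50.0),
]

def aggregate_monthly(records):
    """Sum daily amounts into one total per (year, month)."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)

monthly_totals = aggregate_monthly(daily_sales)
print(monthly_totals)
```

Four daily records become two monthly ones, which shows the trade-off: less data to store and process, but the day-level detail is no longer recoverable from the result.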
What are the best ways to perform dimensionality reduction?
- Linear Algebra Techniques.
- Feature Subset Selection.
Similarity and dissimilarity are both proximity measures. Can you explain how they differ from each other?
- Similarity.
- Dissimilarity.
Similarity measures, like the Jaccard similarity measure, share two of the properties of a metric (distance). Which two properties are they, and why doesn't the third property apply?
- Properties in common:
- Similarity doesn't have the triangle inequality.
Like cosine similarity, the extended Jaccard coefficient is a vector-based similarity measure. When would you use the extended Jaccard coefficient over cosine similarity?
- Absolute quantities.
What is the extended Jaccard coefficient (Tanimoto coefficient), and how does it differ from the standard Jaccard coefficient?
- Non-binary attributes.
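To make the last two cards concrete, the sketch below computes both measures using the standard definitions: cosine similarity is x·y / (‖x‖ ‖y‖), and the extended Jaccard (Tanimoto) coefficient is x·y / (‖x‖² + ‖y‖² − x·y). The example vectors are hypothetical.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def cosine_similarity(x, y):
    """x.y / (|x| * |y|); depends only on direction, not magnitude."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def extended_jaccard(x, y):
    """x.y / (|x|^2 + |y|^2 - x.y); also sensitive to magnitude."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)

# Same direction, different magnitude (hypothetical vectors).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(cosine_similarity(x, y))  # about 1.0: the vectors point the same way
print(extended_jaccard(x, y))   # about 0.667: the size difference is penalised

# On binary vectors it reduces to the ordinary Jaccard coefficient:
# 1 shared attribute out of 3 present in either vector.
print(extended_jaccard([1, 1, 0], [1, 0, 1]))
```

This is why the extended Jaccard coefficient suits cases where absolute quantities matter, while cosine similarity treats a vector and its scaled copy as identical. Both functions divide by zero on all-zero vectors, which a fuller version would handle.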
