Data Mining (IMS)

This is the website for the lecture “Data Mining & Machine Learning” in the Master’s program International Marketing & Sales (IMS) by Prof. Dr. Michael Bücker.

“Without data, you’re just another person with an opinion.”
— W. Edwards Deming

Course Overview

This course provides a practical introduction to data-driven problem solving and machine learning using the CRISP-DM process as the organizing backbone.

  • Foundations: Python & Jupyter, data handling, exploratory analysis
  • CRISP-DM in action: Business & Data Understanding, Preparation, Modeling, Evaluation, Deployment
  • ML Toolkit: supervised/unsupervised learning, model selection, validation, metrics
  • Good Practice: reproducibility, documentation, experiment tracking, deployment basics

The course combines theory, live coding, and hands-on exercises. You will frame business problems, build & evaluate ML models, and communicate results clearly.

Why This Matters

Organizations rely on evidence-based decisions and automation:

  • Identify patterns/drivers in complex data
  • Predict outcomes for operations & strategy
  • Translate analysis into robust, repeatable workflows

You will learn to structure projects end-to-end, from scoping and data work to credible evaluation and delivery.

What Are Data Mining & Machine Learning?

  • Data Mining: discovering useful patterns/relationships in data using statistical and AI methods to support decisions.
  • Machine Learning: software that learns from data to make predictions/decisions (e.g., neural nets, tree-based models, linear models).

Both perspectives meet in this course: rigorous analysis plus predictive modeling.

The Interdisciplinarity of ML/DM

Successful projects blend:

  • Math/Stats: probability, optimization, inference
  • Computer Science: programming, algorithms, data systems
  • Business/Domain: KPIs, constraints, success criteria

We’ll practice switching between these lenses throughout the project lifecycle.

The CRISP-DM Framework

CRISP-DM structures analytics projects into six iterative phases:

  1. Business Understanding – define objectives, success criteria
  2. Data Understanding – acquire, explore, assess quality
  3. Data Preparation – cleaning, feature engineering, splits
  4. Modeling – select algorithms & tune hyperparameters
  5. Evaluation – validate against business & statistical goals
  6. Deployment – deliver insights or operationalize the model

Figure 1: CRISP-DM: an iterative process for end-to-end analytics projects (cf. Shearer 2000).

Note

Focus this semester: We emphasize prediction problems (classification/regression) and credible evaluation.
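
To make "credible evaluation" concrete, here is a minimal sketch of how the Data Preparation, Modeling, and Evaluation phases might look in scikit-learn. It runs on synthetic stand-in data, not a course dataset, and the choice of logistic regression, 5 folds, and ROC AUC is an illustrative assumption rather than the course's prescribed setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption: placeholder for real project data).
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Data Preparation + Modeling in one pipeline: the scaler is re-fit on the
# training part of every fold, so no information leaks from the validation part.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Evaluation: stratified 5-fold cross-validation keeps the class ratio per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print(f"ROC AUC per fold: {scores.round(3)}")
print(f"Mean ROC AUC: {scores.mean():.3f} (std {scores.std():.3f})")
```

Putting the preprocessing inside the pipeline, instead of scaling the full dataset up front, is what keeps the cross-validated score an honest estimate of performance on unseen data.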

Case Study: msbank Fraud Detection

We’ll work with a realistic dataset of credit-card transactions (100k rows; 22 predictors; binary target: fraud yes/no) to illustrate each CRISP-DM phase.

  • Scenario: msbank is a digital-first bank for students/young adults.
  • Goal: detect fraudulent transactions with real-time scoring; design interventions that balance risk and customer experience.
  • Practice: Exploratory Data Analysis, Feature Engineering, Model Building and Comparison, Evaluation
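
Fraud data is typically heavily imbalanced, which changes how a model should be evaluated. The sketch below illustrates this on synthetic data with 22 predictors; the 2% fraud share, the sample size, and the logistic regression model are assumptions made for illustration and do not describe the actual msbank dataset.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: roughly 2% positives).
X, y = make_classification(
    n_samples=10_000,
    n_features=22,
    n_informative=5,
    n_clusters_per_class=1,
    weights=[0.98, 0.02],
    flip_y=0.0,
    random_state=42,
)

# A stratified split preserves the fraud share in both train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Baseline: always predict the majority class ("no fraud").
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)
baseline_accuracy = accuracy_score(y_test, baseline_pred)
baseline_recall = recall_score(y_test, baseline_pred)

# Simple model; class_weight="balanced" up-weights the rare fraud class.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)
model_recall = recall_score(y_test, model.predict(X_test))
model_ap = average_precision_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Baseline accuracy: {baseline_accuracy:.3f}, recall: {baseline_recall:.3f}")
print(f"Model recall: {model_recall:.3f}, average precision: {model_ap:.3f}")
```

The baseline reaches high accuracy while catching no fraud at all, which is why recall, precision, and average precision are usually more informative than accuracy for this kind of problem.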

Practical Information

  • Format: weekly lectures with integrated coding sessions
  • Tools: Python, Jupyter, pandas, scikit-learn, matplotlib/Altair, (optionally) experiment tracking
  • Resources: slides, notebooks, datasets in the course repository
  • Communication: via MS Teams
  • Schedule: see the Course Schedule
  • General Info: see General Information

Examination & Assessment

The main deliverable is a group project (3–5 students): an end-to-end CRISP-DM case.
Details and rubrics: see the Assignment.

Learning Outcomes

After this course you will be able to:

  • Translate business questions into data mining tasks
  • Build and compare ML models with proper validation
  • Use metrics appropriately (classification, regression, clustering)
  • Communicate results and recommend decisions
  • Structure projects with CRISP-DM and good MLOps habits

References

Shearer, Colin. 2000. “The CRISP-DM Model: The New Blueprint for Data Mining.” Journal of Data Warehousing 5 (4): 13–22.