Data Mining (IMS)
This is the website for the lecture “Data Mining & Machine Learning” in the Master’s program International Marketing & Sales (IMS) by Prof. Dr. Michael Bücker.
“Without data, you’re just another person with an opinion.”
— W. Edwards Deming
Course Overview
This course provides a practical introduction to data-driven problem solving and machine learning using the CRISP-DM process as the organizing backbone.
- Foundations: Python & Jupyter, data handling, exploratory analysis
- CRISP-DM in action: Business & Data Understanding, Preparation, Modeling, Evaluation, Deployment
- ML Toolkit: supervised/unsupervised learning, model selection, validation, metrics
- Good Practice: reproducibility, documentation, experiment tracking, deployment basics
The course combines theory, live coding, and hands-on exercises. You will frame business problems, build & evaluate ML models, and communicate results clearly.
Why This Matters
Organizations rely on evidence-based decisions and automation:
- Identify patterns/drivers in complex data
- Predict outcomes for operations & strategy
- Translate analysis into robust, repeatable workflows
You will learn to structure projects end-to-end, from scoping and data work to credible evaluation and delivery.
What Are Data Mining & Machine Learning?
- Data Mining: discovering useful patterns/relationships in data using statistical and AI methods to support decisions.
- Machine Learning: software that learns from data to make predictions/decisions (e.g., neural nets, tree-based models, linear models).
Both perspectives meet in this course: rigorous analysis plus predictive modeling.
The Interdisciplinarity of ML/DM
Successful projects blend:
- Math/Stats: probability, optimization, inference
- Computer Science: programming, algorithms, data systems
- Business/Domain: KPIs, constraints, success criteria
We’ll practice switching between these lenses throughout the project lifecycle.
The CRISP-DM Framework
CRISP-DM structures analytics projects into six iterative phases:
- Business Understanding – define objectives, success criteria
- Data Understanding – acquire, explore, assess quality
- Data Preparation – cleaning, feature engineering, splits
- Modeling – select algorithms & tune hyperparameters
- Evaluation – validate against business & statistical goals
- Deployment – deliver insights or operationalize the model
Focus this semester: We emphasize prediction problems (classification/regression) and credible evaluation.
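The Data Preparation, Modeling, and Evaluation phases can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the course's reference solution: the data is synthetic (generated with `make_classification`), and the choice of logistic regression, the grid of `C` values, and ROC AUC as the metric are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for a real business dataset (assumption: synthetic data only).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=42,
)

# Data Preparation: hold out a test set that is never used for tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Modeling: a pipeline keeps scaling inside each cross-validation fold,
# so no information from validation folds leaks into preprocessing.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluation: score the tuned model once on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("best C:", search.best_params_["model__C"])
print("test ROC AUC:", round(test_auc, 3))
```

Tuning on cross-validation folds and reporting a single score on untouched test data is one common way to keep the evaluation credible.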
Case Study: msbank Fraud Detection
We’ll work with a realistic dataset of credit-card transactions (100k rows, 22 predictors, and a binary target: fraud yes/no) to illustrate each CRISP-DM phase.
- Scenario: msbank is a digital-first bank for students/young adults.
- Goal: detect fraudulent transactions with real-time scoring; design interventions that balance risk and customer experience.
- Practice: exploratory data analysis, feature engineering, model building and comparison, evaluation
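To make the trade-off between risk and customer experience concrete, here is a hedged sketch of threshold selection for a fraud classifier. It does not use the msbank data: the dataset is synthetic, the 2% fraud rate, the smaller sample size, the logistic regression model, and the `target_precision` value are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 22 predictors and a rare positive class.
# Assumptions: 20,000 rows (for speed) and roughly 2% fraud.
X, y = make_classification(
    n_samples=20_000, n_features=22, n_informative=8,
    n_clusters_per_class=1, weights=[0.98, 0.02], flip_y=0.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" upweights the rare fraud class during fitting.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # fraud score per transaction

# Average precision is more informative than accuracy for rare events.
average_precision = average_precision_score(y_test, scores)
fraud_rate = y_test.mean()  # baseline for a model with no skill

# Pick the threshold with the highest recall among those that keep
# precision at or above a hypothetical business target.
target_precision = 0.5  # assumption: at least half of alerts are real fraud
precision, recall, thresholds = precision_recall_curve(y_test, scores)
meets_target = precision[:-1] >= target_precision
if meets_target.any():
    best = np.argmax(np.where(meets_target, recall[:-1], -1.0))
    threshold = thresholds[best]
else:
    threshold = 0.5  # fallback if the target cannot be met

flagged = scores >= threshold
print("average precision:", round(average_precision, 3))
print("threshold:", round(float(threshold), 3))
print("transactions flagged:", int(flagged.sum()))
```

A higher `target_precision` means fewer legitimate customers are interrupted, at the cost of missing more fraud; a lower value does the opposite.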
Practical Information
- Format: weekly lectures with integrated coding sessions
- Tools: Python, Jupyter, pandas, scikit-learn, matplotlib/Altair, and optionally an experiment-tracking tool
- Resources: slides, notebooks, datasets in the course repository
- Communication: via MS Teams
- Schedule: see the Course Schedule
- General Info: see General Information
Examination & Assessment
The main deliverable is a group project (3–5 students): an end-to-end CRISP-DM case.
Details and rubrics: see the Assignment.
Learning Outcomes
After this course you will be able to:
- Translate business questions into data mining tasks
- Build and compare ML models with proper validation
- Use metrics appropriately (classification, regression, clustering)
- Communicate results and recommend decisions
- Structure projects with CRISP-DM and good MLOps habits
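As a small illustration of the three metric families named above, the following sketch computes one metric per task type with scikit-learn. The toy labels, predictions, and points are made up for the example, and the specific metrics (F1, mean absolute error, silhouette score) are just one reasonable choice per family.

```python
import numpy as np
from sklearn.metrics import f1_score, mean_absolute_error, silhouette_score

# Classification: F1 combines precision and recall for the positive class.
y_true_cls = [0, 0, 1, 1, 1]
y_pred_cls = [0, 1, 1, 1, 0]
f1 = f1_score(y_true_cls, y_pred_cls)

# Regression: mean absolute error in the units of the target.
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.5, 5.0, 3.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)

# Clustering: silhouette score needs no ground truth; values near 1
# indicate compact, well-separated clusters.
points = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
cluster_labels = [0, 0, 1, 1]
silhouette = silhouette_score(points, cluster_labels)

print("F1:", round(f1, 3))
print("MAE:", mae)
print("silhouette:", round(silhouette, 3))
```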