Modulhandbuch (module handbook)

M.184.5451 Statistical Learning for Data Science with R and Python
Koordinator (coordinator): Prof. Dr. Yuanhua Feng
Ansprechpartner (contact): Prof. Dr. Yuanhua Feng (yuanhua.feng[at]uni-paderborn.de)
Dominik Schulz (dominik.schulz[at]upb.de)
Credits: 5 ECTS
Workload: 150 Std (h)
Semesterturnus (semester cycle): WS (winter semester)
Studiensemester (study semester): 1-4
Dauer in Semestern (duration in semesters): 1
Lehrveranstaltungen (courses):
Nummer / Name (number / title) | Art (type) | Kontaktzeit (contact time) | Selbststudium (self-study) | Status (P/WP) | Gruppengröße (group size)
a) K.184.54511 / Statistical Learning for Data Science with R and Python (Vorlesung) | Vorlesung (lecture) | | | P (Pflicht, compulsory) | 75 TN (participants)
b) K.184.54512 / Statistical Learning for Data Science with R and Python (Übung) | Übung (exercise session) | | | P (Pflicht, compulsory) | 75 TN (participants)
Wahlmöglichkeiten innerhalb des Moduls (Options within the module):
Keine (none)
Empfohlene Voraussetzungen (recommended prerequisites):

W4479 Econometrics

Inhalte (short description):

This module introduces students to Data Science and one of its main sub-areas, namely Statistical Learning, as well as to the programming languages R and Python. Topics covered include a brief introduction to Data Science, an introduction to Statistical Learning, Linear Regression, Classification, Cross-Validation and Resampling Methods, Model Selection using Stepwise Regression, Regularization using Ridge Regression and LASSO, Regression Splines, Non-parametric Regression, Decision Trees, Bagging, Boosting, Random Forests, Support Vector Machines and, if possible, Unsupervised Learning. The course is structured into three parts:

Part 1 – An introduction to Data Science and Statistical Learning, together with an overview of the purpose, organization, main topics and assessment of this module.

Part 2 – Fundamentals of Statistical Learning. This part covers basic and advanced concepts such as Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, K-Nearest Neighbours, Cross-Validation, the Bootstrap and Stepwise Regression.

Part 3 – Advanced topics in Statistical Learning. This part focuses on more sophisticated methods such as Ridge Regression, LASSO, Principal Component Regression, Partial Least Squares, Regression Splines, Generalized Additive Models (GAMs), Regression Trees, Classification Trees, Bagging, Random Forests, Boosting, Maximal Margin Classifiers, Support Vector Classifiers and Support Vector Machines. Possible further topics from unsupervised learning include Principal Components Analysis, K-Means Cluster Analysis and Hierarchical Cluster Analysis.

Please note that the seminar project topics should address the application of statistical learning approaches to financial and economic data.

Lernergebnisse (learning outcomes):
Fachkompetenz Wissen (professional expertise):
Studierende (students)...
  • gain an understanding of modern Data Science.
  • gain fundamental knowledge of Data Science, its typical problems and the methods used to solve them.
  • learn various advanced and modern approaches in Statistics and Econometrics.
  • understand the relationship between Statistics, Econometrics and Data Science.
  • understand the role of Econometrics in Data Science and vice versa.
  • learn further advanced concepts of supervised Statistical and Machine Learning.
  • learn further advanced concepts of unsupervised Statistical and Machine Learning.
Fachkompetenz Fertigkeit (practical professional and academic skills):
Studierende (students)...
  • acquire the ability to apply basic and sophisticated Statistical Learning concepts.
  • gain skills in computer-intensive data analysis and model selection.
  • gain skills to collect, manage, visualize and analyse large and complex data sets.
  • gain advanced knowledge of the programming language R.
  • gain basic knowledge of the programming language Python.

Personale Kompetenz / Sozial (individual competences / social skills):
Studierende (students)...
  • further improve their skills in problem definition and problem solving.
  • gain the ability to manage and carry out a small empirical study project.
  • improve their ability to cooperate and work in teams.
  • improve their ability to present their own results.
  • gain communication and conversation skills.
Personale Kompetenz / Selbstständigkeit (individual competences / ability to perform autonomously):
Studierende (students)...
  • gain the ability for self-directed learning.
  • gain further expertise in scientific work.
  • receive further training in independent study.
  • improve their computational data analysis skills.
  • improve their ability to write a detailed project report.

Prüfungsleistungen (examinations)
Art der Modulprüfung (type of module examination): Modulteilprüfungen (partial module examinations)
Art der Prüfung (type of examination) | Umfang (extent) | Gewichtung (weighting)
a) Hausarbeit (term paper) | 15-20 pages | 70.00 %
b) Hausarbeit mit Präsentation (term paper with presentation) | 8-12 pages and a 10-minute presentation | 30.00 %
Studienleistung / qualifizierte Teilnahme (module participation requirements)
Nein (no)
Voraussetzungen für die Teilnahme an Prüfungen (formal requirements for participating in examinations)
Keine (none)
Voraussetzungen für die Vergabe von Credits (formal requirements for granting credit points)
Die Vergabe der Credits erfolgt, wenn die Modulnote mindestens „ausreichend“ ist (credits are awarded if the module grade is at least "ausreichend", i.e. sufficient)
Gewichtung für Gesamtnote (calculation of overall grade)
Das Modul wird mit der Anzahl seiner Credits gewichtet (Faktor: 1) (the module is weighted by its number of credits, factor: 1)
Verwendung des Moduls in den Studiengängen (The module can be selected in the following degree programmes)
M.Sc. International Business Studies, M.Sc. Betriebswirtschaftslehre, M.Sc. International Economics and Management, M.Sc. Management Information Systems, M.Sc. Wirtschaftsinformatik, M.Sc. Wirtschaftspädagogik, M.Ed. Wirtschaftspädagogik
Umfang QT (participation requirements):
Lernmaterialien, Literaturangaben (learning material, literature):

Main literature:
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R (ISLR). Springer, New York. Seventh printing (2017):
http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf

Supplement: Python code for ISLR:
https://github.com/JWarmenhoven/ISLR-python

Baumer, B.S., Kaplan, D.T. and Horton, N.J. (2017). Modern Data Science with R. Chapman & Hall/CRC, Boca Raton.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning (ESL). 2nd ed. Springer, New York.
https://web.stanford.edu/~hastie/Papers/ESLII.pdf

Supplement: The Elements of Statistical Learning – Python Notebooks:
https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks

Further references:
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.
Teilnehmerbegrenzung (participant limit):
Keine (none)
Sonstige Hinweise (additional information):

The teaching language is English.

Please note that students who have already attended the lecture W5333 – Data Science for Business are not allowed to attend this course, and vice versa.
