M.184.5451 Statistical Learning for Data Science with R and Python | |
---|---|
(Statistical Learning for Data Science with R and Python) |
Koordinator (coordinator): | Prof. Dr. Yuanhua Feng |
Ansprechpartner (contact): | Prof. Dr. Yuanhua Feng (yuanhua.feng[at]uni-paderborn.de); Sebastian Letmathe (lettron[at]mail.uni-paderborn.de) |
Credits: | 5 ECTS |
Workload: | 150 h |
Semesterturnus (semester cycle): | Winter semester (WS) |
Studiensemester (study semester): | 1-4 |
Dauer in Semestern (duration in semesters): | 1 |
Lehrveranstaltungen (courses): | ||||||
---|---|---|---|---|---|---|
| Nummer / Name (number / title) | Art (type) | Kontaktzeit (contact time) | Selbststudium (self-study) | Status (P/WP) (status) | Gruppengröße (group size) |
a) | K.184.54511 / Statistical Learning for Data Science with R and Python (Vorlesung) | Lecture | | | P | 75 participants |
b) | K.184.54512 / Statistical Learning for Data Science with R and Python (Übung) | Exercise | | | P | 75 participants |
Wahlmöglichkeiten innerhalb des Moduls (Options within the module): | ||||||
None |
Empfohlene Voraussetzungen (prerequisites): |
---|
W4479 Econometrics |
Inhalte (short description): |
---|
This module introduces students to Data Science and to one of its main sub-areas, Statistical Learning, as well as to the programming languages R and Python. Topics covered include a brief introduction to Data Science, an introduction to Statistical Learning, Linear Regression, Classification, Cross-Validation and Resampling Methods, Model Selection using Stepwise Regression, Regularization using Ridge Regression and the Lasso, Regression Splines, Non-parametric Regression, Tree-Based Methods, Bagging, Boosting, Random Forests, Support Vector Machines and, if possible, Unsupervised Learning. The course is structured into three parts. Part 1: an introduction to Data Science and Statistical Learning, with an overview of the purpose, organization, main topics and assessment of this module. Part 2: fundamentals of Statistical Learning. This part covers Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, K-Nearest Neighbours, Cross-Validation, the Bootstrap and Stepwise Regression. Part 3: advanced topics in Statistical Learning. The focus here is on more sophisticated concepts such as Ridge Regression, the Lasso, Principal Component Regression, Partial Least Squares, Regression Splines, Generalized Additive Models (GAMs), Regression Trees, Classification Trees, Bagging, Random Forests, Boosting, Maximal Margin Classifiers, Support Vector Classifiers and Support Vector Machines. Possible further topics from unsupervised learning include Principal Components Analysis, K-Means Clustering and Hierarchical Clustering. Please note that the seminar projects should apply statistical learning approaches to financial and economic data.
|
Lernergebnisse (learning outcomes): |
---|
Fachkompetenz Wissen (professional expertise): |
Students... |
Fachkompetenz Fertigkeit (practical professional and academic skills): |
Students... |
Personale Kompetenz / Sozial (individual competences / social skills): |
Students... |
Personale Kompetenz / Selbstständigkeit (individual competences / ability to perform autonomously): |
Students... |
Prüfungsleistungen (examinations) | |||
---|---|---|---|
Art der Modulprüfung (type of module examination): partial module examinations | |||
| Art der Prüfung (type of examination) | Umfang (extent) | Gewichtung (weighting) |
a) | Term paper | 15-20 pages | 70.00 % |
b) | Term paper with presentation | 8-12 pages and 10 minutes, respectively | 30.00 % |
Studienleistung / qualifizierte Teilnahme (module participation requirements) |
---|
No |
Voraussetzungen für die Teilnahme an Prüfungen (formal requirements for participating in examinations) |
---|
None |
Voraussetzungen für die Vergabe von Credits (formal requirements for granting credit points) |
---|
Credits are awarded if the module grade is at least "sufficient" (ausreichend) |
Gewichtung für Gesamtnote (calculation of overall grade) |
---|
The module is weighted by its number of credits (factor: 1) |
Verwendung des Moduls in den Studiengängen (The module can be selected in the following degree programmes) |
---|
M.Sc. International Business Studies, M.Sc. Betriebswirtschaftslehre, M.Sc. International Economics and Management, M.Sc. Management Information Systems, M.Sc. Wirtschaftsinformatik, M.Sc. Wirtschaftspädagogik, M.Ed. Wirtschaftspädagogik |
Umfang QT (participation requirements): |
---|
Lernmaterialien, Literaturangaben (learning material, literature): |
---|
Main literature:
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R (ISLR). Springer, New York. Seventh printing (2017): http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf
Python code for ISLR: https://github.com/JWarmenhoven/ISLR-python
Baumer, B.S., Kaplan, D.T. and Horton, N.J. (2017). Modern Data Science with R. Chapman & Hall/CRC, Boca Raton.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning (ESL), 2nd ed. Springer, New York. https://web.stanford.edu/~hastie/Papers/ESLII.pdf
Python notebooks for ESL: https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks
Further references:
Vapnik, V. (1996). The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York. |
Teilnehmerbegrenzung (participant limit): |
---|
None |
Sonstige Hinweise (additional information): |
---|
The teaching language is English. Please note that students who have already attended the lecture W5333 - Data Science for Business are not allowed to attend this course, and vice versa. |