Kurt Hornik: Data and Text Mining Summer Semester 2023
Classes
All classes start at 14:30.
| Unit | Date | Time | Location | Topic | Materials | Assignments |
| | | | | | | Read Ch 1 pages 15-42 |
| 1 | 2023-03-01 | 14:30-17:30 | D4.0.136 | Generalized linear models | Slides | Ex 1-6 (due 2023-03-07 23:59); read Ch 4 pages 129-189 |
| 2 | 2023-03-08 | 14:30-17:30 | D4.0.136 | Resampling | Slides | Ex 7-13 (due 2023-03-14 23:59); read Ch 5 pages 197-219 |
| 3 | 2023-03-15 | 14:30-17:30 | D4.0.136 | Penalized regression | Slides | Ex 14-16 (due 2023-03-21 23:59); read Ch 6 pages 225-282 |
| 4 | 2023-03-22 | 14:30-17:30 | D4.0.136 | Trees, forests and boosting | Slides | Ex 17-21, 24-25 (due 2023-03-28 23:59); read Ch 8 pages 327-361 |
| 5 | 2023-03-29 | 14:30-17:30 | D4.0.136 | Additional topics and case study | Slides | Ex 22-23, 26-28 (due 2023-04-11 23:59) |
| 6 | 2023-04-12 | 14:30-17:30 | D4.0.136 | Text mining foundations | Slides | |
| 7 | 2023-04-19 | 14:30-17:30 | D4.0.136 | Text mining applications | Slides | |
| 8 | 2023-04-26 | 14:30-17:30 | D4.0.136 | Presentations | | |
Assignments
Exercises
When submitting homework assignments by email, please use the subject
‘DTM Unit n Team k’, where n is the number of the unit and
k is the letter of your team (A to E, see the team list below).
Submit by email to <Kurt.Hornik@wu.ac.at> and cc all team members.
Chapters in assignments refer to the textbook by James et al.
Homework and presentation teams:
| Team | Member | Package | Papers | Presentations |
| A | Luna | xgboost | Loughran and McDonald (2016) | |
| B | Andriy | ranger | Gunnarsson et al. (2021) | |
| C | Pedro | caret | Loughran and McDonald (2014) | |
| D | Na | edgar/finreportr/XBRL | Loughran and McDonald (2011) | |
| E | Sebastian | mlr3 | Gu and Kurov (2020), Renault (2019) | |
R package projects
caret, mlr3, ranger, xgboost, edgar/finreportr/XBRL.
Reading list
Dogu Araci (2019),
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.
arXiv:1908.10063.
Mike Chen, George Mussalli, Amir Amel-Zadeh and Michael Oliver Weinberg
(2021),
NLP for SDGs: Measuring Corporate Alignment with the Sustainable
Development Goals.
The Journal of Impact and ESG Investing.
https://jesg.pm-research.com/content/early/2021/12/12/jesg.2021.1.035.
Chen Gu and Alexander Kurov (2020),
Informational role of social media: Evidence from Twitter sentiment.
Journal of Banking & Finance, volume 121.
DOI:10.1016/j.jbankfin.2020.105969.
Björn Rafn Gunnarsson, Seppe vanden Broucke, Bart Baesens, María
Óskarsdóttir and Wilfried Lemahieu (2021),
Deep learning for credit scoring: Do or don’t?
European Journal of Operational Research,
volume 295, issue 1, pages 292-305.
DOI:10.1016/j.ejor.2021.03.006.
Tim Loughran and Bill McDonald (2011),
When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and
10-Ks.
The Journal of Finance, volume 66, issue 1, pages 35-65.
DOI:10.1111/j.1540-6261.2010.01625.x.
Tim Loughran and Bill McDonald (2014),
Measuring Readability in Financial Disclosures.
The Journal of Finance, volume 69, issue 4, pages 1643-1671.
DOI:10.1111/jofi.12162.
Tim Loughran and Bill McDonald (2016),
Textual Analysis in Accounting and Finance: A Survey.
Journal of Accounting Research, volume 54, issue 4,
pages 1187-1230.
DOI:10.1111/1475-679X.12123.
Thomas Renault (2019),
Sentiment analysis and machine learning in finance: a comparison of
methods and models on one million messages.
Digital Finance, volume 2, pages 1-13.
DOI:10.1007/s42521-019-00014-x.
Data sets
german (data, docs),
firms (data, docs),
Financial PhraseBank (data).
Textbooks
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani,
"An Introduction to Statistical Learning (with Applications in R)",
Second edition.
https://www.statlearning.com/.
Links
Electronic University Calendar (eVVZ)