Kurt Hornik: Data and Text Mining, Summer Semester 2023

Classes

All classes start at 14:30.
| Unit | Date | Time | Location | Topic | Materials | Assignments |
|---|---|---|---|---|---|---|
| | | | | | | Read Ch 1 pages 15-42 |
| 1 | 2023-03-01 | 14:30-17:30 | D4.0.136 | Generalized linear models | Slides | Ex 1-6 (due 2023-03-07 23:59); read Ch 4 pages 129-189 |
| 2 | 2023-03-08 | 14:30-17:30 | D4.0.136 | Resampling | Slides | Ex 7-13 (due 2023-03-14 23:59); read Ch 5 pages 197-219 |
| 3 | 2023-03-15 | 14:30-17:30 | D4.0.136 | Penalized regression | Slides | Ex 14-16 (due 2023-03-21 23:59); read Ch 6 pages 225-282 |
| 4 | 2023-03-22 | 14:30-17:30 | D4.0.136 | Trees, forests and boosting | Slides | Ex 17-21, 24-25 (due 2023-03-28 23:59); read Ch 8 pages 327-361 |
| 5 | 2023-03-29 | 14:30-17:30 | D4.0.136 | Additional topics and case study | Slides | Ex 22-23, 26-28 (due 2023-04-11 23:59) |
6 2023-04-12 14:30-17:30 D4.0.136 Text mining foundations Slides
7 2023-04-19 14:30-17:30 D4.0.136 Text mining applications Slides
8 2023-04-26 14:30-17:30 D4.0.136 Presentations

Assignments

Exercises
When submitting homework assignments by email, please use the subject ‘DTM Unit n Team k’, where n is the number of the unit and k is the number of your team.
Submit by email to <Kurt.Hornik@wu.ac.at> and cc all team members.
Chapters in assignments refer to the textbook by James et al.
Homework and presentation teams:
| Team | Member | Package | Papers |
|---|---|---|---|
| A | Luna | xgboost | Loughran et al (2016) |
| B | Andriy | ranger | Gunnarsson |
| C | Pedro | caret | Loughran et al (2014) |
| D | Na | edgar/finreportr/XBRL | Loughran et al (2011) |
| E | Sebastian | mlr3 | Gu, Renault |

R package projects

caret, mlr3, ranger, xgboost, edgar/finreportr/XBRL.

Reading list

Dogu Araci (2019), FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
Mike Chen, George Mussalli, Amir Amel-Zadeh and Michael Oliver Weinberg (2021), NLP for SDGs: Measuring Corporate Alignment with the Sustainable Development Goals. The Journal of Impact and ESG Investing. https://jesg.pm-research.com/content/early/2021/12/12/jesg.2021.1.035.
Chen Gu and Alexander Kurov (2020), Informational role of social media: Evidence from Twitter sentiment. Journal of Banking & Finance, volume 121. DOI:10.1016/j.jbankfin.2020.105969.
Björn Rafn Gunnarsson, Seppe vanden Broucke, Bart Baesens, María Óskarsdóttir and Wilfried Lemahieu (2021), Deep learning for credit scoring: Do or don’t? European Journal of Operational Research, volume 295, issue 1, pages 292-305. DOI:10.1016/j.ejor.2021.03.006.
Tim Loughran and Bill McDonald (2011), When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, volume 66, issue 1, pages 35-65. DOI:10.1111/j.1540-6261.2010.01625.x.
Tim Loughran and Bill McDonald (2014), Measuring Readability in Financial Disclosures. The Journal of Finance, volume 69, issue 4, pages 1643-1671. DOI:10.1111/jofi.12162.
Tim Loughran and Bill McDonald (2016), Textual Analysis in Accounting and Finance: A Survey. Journal of Accounting Research, volume 54, issue 4, pages 1187-1230. DOI:10.1111/1475-679X.12123.
Thomas Renault (2019), Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digital Finance, volume 2, pages 1-13. DOI:10.1007/s42521-019-00014-x.

Data sets

german (data, docs), firms (data, docs), Financial PhraseBank (data).

Textbooks

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, "An Introduction to Statistical Learning (with Applications in R)", Second edition. https://www.statlearning.com/.

Links

Electronic University Calendar (eVVZ)


