Kurt Hornik: Data and Text Mining Summer Semester 2023
Classes
All classes start at 14:30.
| Unit | Date | Time | Location | Topic | Materials | Assignments | Due | Reading |
| | | | | | | | | Read Ch 1 pages 15-42 |
| 1 | 2023-03-01 | 14:30-17:30 | D4.0.136 | Generalized linear models | Slides | Ex 1-6 | Due: 2023-03-07 23:59 | Read Ch 4 pages 129-189 |
| 2 | 2023-03-08 | 14:30-17:30 | D4.0.136 | Resampling | Slides | Ex 7-13 | Due: 2023-03-14 23:59 | Read Ch 5 pages 197-219 |
| 3 | 2023-03-15 | 14:30-17:30 | D4.0.136 | Penalized regression | Slides | Ex 14-16 | Due: 2023-03-21 23:59 | Read Ch 6 pages 225-282 |
| 4 | 2023-03-22 | 14:30-17:30 | D4.0.136 | Trees, forests and boosting | Slides | Ex 17-21, 24-25 | Due: 2023-03-28 23:59 | Read Ch 8 pages 327-361 |
| 5 | 2023-03-29 | 14:30-17:30 | D4.0.136 | Additional topics and case study | Slides | Ex 22-23, 26-28 | Due: 2023-04-11 23:59 | |
| 6 | 2023-04-12 | 14:30-17:30 | D4.0.136 | Text mining foundations | Slides | | | |
| 7 | 2023-04-19 | 14:30-17:30 | D4.0.136 | Text mining applications | Slides | | | |
| 8 | 2023-04-26 | 14:30-17:30 | D4.0.136 | Presentations | | | | |
Assignments
Exercises
When submitting homework assignments by email, please use the subject ‘DTM Unit n Team k’, where n is the number of the unit and k is the number of your team. Submit by email to <Kurt.Hornik@wu.ac.at> and cc all team members. Chapters in assignments refer to the textbook by James et al.
Homework and presentation teams:
| Team | Name | Package | Papers | Presentations |
| A | Luna | xgboost | Loughran et al (2016) | |
| B | Andriy | ranger | Gunnarsson et al (2021) | |
| C | Pedro | caret | Loughran et al (2014) | |
| D | Na | edgar/finreportr/XBRL | Loughran et al (2011) | |
| E | Sebastian | mlr3 | Gu et al (2020), Renault (2019) | |
R package projects
caret, mlr3, ranger, xgboost, edgar/finreportr/XBRL.
Reading list
Dogu Araci (2019), FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
Mike Chen, George Mussalli, Amir Amel-Zadeh and Michael Oliver Weinberg (2021), NLP for SDGs: Measuring Corporate Alignment with the Sustainable Development Goals. The Journal of Impact and ESG Investing. https://jesg.pm-research.com/content/early/2021/12/12/jesg.2021.1.035.
Chen Gu and Alexander Kurov (2020), Informational role of social media: Evidence from Twitter sentiment. Journal of Banking & Finance, volume 121. DOI:10.1016/j.jbankfin.2020.105969.
Björn Rafn Gunnarsson, Seppe vanden Broucke, Bart Baesens, María Óskarsdóttir and Wilfried Lemahieu (2021), Deep learning for credit scoring: Do or don’t? European Journal of Operational Research, volume 295, issue 1, pages 292-305. DOI:10.1016/j.ejor.2021.03.006.
Tim Loughran and Bill McDonald (2011), When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, volume 66, issue 1, pages 35-65. DOI:10.1111/j.1540-6261.2010.01625.x.
Tim Loughran and Bill McDonald (2014), Measuring Readability in Financial Disclosures. The Journal of Finance, volume 69, issue 4, pages 1643-1671. DOI:10.1111/jofi.12162.
Tim Loughran and Bill McDonald (2016), Textual Analysis in Accounting and Finance: A Survey. Journal of Accounting Research, volume 54, issue 4, pages 1187-1230. DOI:10.1111/1475-679X.12123.
Thomas Renault (2019), Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digital Finance, volume 2, pages 1-13. DOI:10.1007/s42521-019-00014-x.
Data sets
german (data, docs), firms (data, docs), Financial Phrase bank (data).
Text books
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, "An Introduction to Statistical Learning (with Applications in R)", Second edition. https://www.statlearning.com/.
Links
Electronic University Calendar (eVVZ)