THE CORPORATE FINANCIAL INFORMATION ENVIRONMENT

Analysis of Financial Text

About Us



CFIE is a research programme exploring accounting and financial market text using natural language processing (NLP) and corpus linguistics methods.

Our work aims to understand the properties and impact of financial narratives, with particular emphasis on annual reports, preliminary earnings announcements, conference calls, and the financial media.

Our multidisciplinary team unites researchers with expertise in financial reporting and financial markets, computer science, and computational linguistics.

Key outputs include:

  • academic research papers
  • software tools to support analysis of financial narratives
  • datasets to support academic research
  • commissioned research projects for business users
  • training in textual analysis of financial text

Research funding to date includes UK Research and Innovation via the Economic and Social Research Council (ESRC), the Financial Conduct Authority (FCA), the Financial Reporting Council (FRC), the accounting profession via the Institute of Chartered Accountants in England and Wales (ICAEW), Lancaster University, and the International Centre for Research in Accounting.



People


Research



Text extraction: We develop tools supporting structured text retrieval from PDF annual reports, earnings announcements provided in HTML format, and conference calls provided as rich text files.

  • We develop a series of Apps to support text extraction. Apps are free for academics and non-commercial users.
  • We construct datasets of UK companies’ qualitative disclosures to support academic research on narrative reporting.

Strategy and business models: What are the properties of commentary on strategy and business model, and how useful is it for investors?

  • We develop a method for scoring strategy-related annual report commentary and then test whether these disclosures improve the quality of companies’ information environment.
  • We also study the characteristics of strategy discourse in corporate disclosures and compare it with the textbook representation of strategy.

Annual report quality: What are the distinguishing linguistic features of high quality annual report narratives?

  • We study the properties of annual reports that win awards.
  • We also develop new measures of annual report narrative quality using NLP and corpus linguistics methods.

Performance reporting: What focus do management give to earnings- versus non-earnings-based measures of performance, and how useful are alternative performance measures (APMs)?

  • We apply topic modelling methods to detect the aspects of performance highlighted by management and then compare these aspects to reported KPIs.
  • We are also using named entity recognition (NER) methods to detect specific performance measures discussed by management and how they are distributed through the annual report.

Preliminary earnings announcements: Are UK preliminary earnings announcement narratives useful beyond quantitative results? Is there any suggestion that management use narratives to present a biased view of financial performance?

  • Research funded by the ESRC examines the properties and economic effects of management commentary in preliminary earnings announcements.

Predicting accounting errors and manipulation: Can narrative disclosures provide clues that help to predict accounting manipulation and fraud?

  • We use machine learning methods to isolate features in corporate narratives with incremental predictive ability over financial statement data.


Publications

  • Tuan Q. Ho, Norman Strong & Martin Walker (2018) Modelling analysts’ target price revisions following good and bad news?, Accounting and Business Research, 48:1, 37-61
  • Tsileponis, N., Stathopoulos, K., & Walker, M. (2020). Do Corporate Press Releases Drive Media Coverage? British Accounting Review, 52(2) pp.1-18. Winner of the BAFA 2020 Prize.
  • Tsileponis, Nikolaos; Stathopoulos, Konstantinos; & Walker, Martin. (2020) The Monitoring Role of the Financial Press Around Corporate Announcements. Accounting and Business Research, Vol. 50, No. 6, pp. 539-573.
  • Vasiliki Athanasakou, Florian Eugster, Thomas Schleicher & Martin Walker (2020) Annual Report Narratives and the Cost of Equity Capital: U.K. Evidence of a U-shaped Relation, European Accounting Review, 29:1, 27-54.
  • George Emmanuel Iatridis, Kostas Pappas, & Martin Walker. (2021) Narrative disclosure quality and the timeliness of goodwill impairments. British Accounting Review. Forthcoming.
  • El-Haj, M., P. Alves, P. Rayson, M. Walker, S. Young (2019). Retrieving, classifying and analysing narrative commentary in unstructured (glossy) annual reports published as pdf files. Accounting & Business Research forthcoming
  • Lewis, C., S. Young (2019). Fad or future? Automated analysis of financial text and its implications for corporate reporting. Accounting & Business Research 49(5) 2019: 587–615 (open access)
  • El-Haj, M., P. Rayson, M. Walker, V. Simaki, S. Young. In search of meaning: Lessons, resources and next steps for computational analysis of financial discourse. Journal of Business Finance and Accounting 46(3-4) 2019: 265-306 (open access)
  • Salzedo, C., S. Young, M. El-Haj (2018). Does equity analyst research lack rigor and objectivity? Evidence from conference call questions and research notes. Accounting & Business Research 48(1) 2018: 5-36 (lead article)
  • El-Haj, M., P. Rayson, M. Walker, S. Young, A. Moore, V. Athanasakou, T. Schleicher (2016). Learning tone and attribution for financial text mining. In: N. Calzolari, K. Choukri, T. Declerck, M. Grobelnik, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (Eds.) Proceedings of LREC 2016, Tenth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA)
  • El-Haj, M., P. Rayson, M. Walker, S. Young (2014). Detecting document structure in a very large corpus of UK financial reports. In: LREC14 Ninth International Conference on Language Resources and Evaluation. Reykjavik, Iceland: European Language Resources Association (ELRA): 1335-1338
  • El-Haj, M., P. Rayson, P. Alves, C. Herrero-Zorita, S. Young (2019). Multilingual Financial Narrative Processing: Analysing Annual Reports in English, Spanish and Portuguese, in M. Litvak & N. Vanetik (eds) Multilingual Text Analysis: Challenges, Models, and Approaches DOI: 10.1142/11116 ISBN: 978-981-327-487-7 Chapter online

Working papers and work-in-progress

  • Athanasakou, V., M. El-Haj, P. Rayson, M. Walker, S. Young (2019). Annual report commentary on the value creation process. Under review.
  • Athanasakou, V., F. Eugster, T. Schleicher, M. Walker (2019). Annual report narratives and the cost of equity capital: U.K. evidence of a u-shaped relation. Under review
  • Munro, J., S. Young. The linguistic features of high quality reporting.
  • El-Haj, M., M. Walker, S. Young, V. Athanasakou, P. Rayson, T. Schleicher (2019). Classifying tone and attribution in preliminary earnings announcements.

Research funding

  • Detecting & disrupting misleading statements: Phase 2 project on using machine learning to assist Primary Oversight Department (PI): Financial Conduct Authority, ongoing (start date 1 May 2019)
  • Economic and Social Research Council Innovation Fellowship: Analysis of financial texts (PI Vaso Simaki). 3-year Fellowship: Economic and Social Research Council, ongoing (start date 24 January 2018)
  • Detecting & disrupting misleading statements: Phase 1 project on using machine learning to assist Primary Oversight Department (PI): Financial Conduct Authority, ongoing (start date 1 January 2018)
  • Analysing Narrative Aspects of UK Preliminary Earnings Announcements and Annual Reports: Tools and Insights for Researchers and Regulators (PI), £419,283: Economic and Social Research Council (ES/R003904/1) and £67,000: Financial Reporting Council, ongoing (start date 1 December 2017)
  • PLSA Workforce Reporting Toolkit: Impact Assessment (PI), £4,000: Pensions and Lifetime Savings Association, completed November 2017
  • An Analysis of CEO Pay Arrangements and Value Creation for FTSE 350 Companies (PI), £15,000: Chartered Financial Analysts UK Society, completed April 2017
  • Pilot Study of CEO Pay Arrangements and Value Creation for FTSE 100 Companies (PI), £2,730: Chartered Financial Analysts UK Society, completed December 2014
  • Understanding Corporate Communications as part of “ESRC Centre for Corpus Approaches to Social Science (CASS)” (Co-I), £3.5 million Economic and Social Research Council (ES/K002155/1), subproject start date 01/12/14.
  • Understanding the Influences of Financial Reporting, Corporate Disclosures and Financial Media on the Corporate Financial Information Environment (Co-I), £332,416, Economic and Social Research Council (ES/J012394/1), plus £50,000 from the Institute of Chartered Accountants in England and Wales’ Centre for Business Performance, start date 01/12/12

Workshops

  • 2nd ESRC workshop on textual analysis methods in accounting and finance. 2-4 September 2019, Lancaster University.
  • 2nd Financial Narrative Processing Workshop (FNP 2019). 30 September 2019, to be held at the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19), Turku University, Turku, Finland. Shared task: FinTOC-2019. Programme details are available here.
  • 5th MultiLing Summarisation Workshop (MultiLing 2019) @ RANLP Conference. 6 September 2019, to be held at Recent Advances in Natural Language Processing Conference (RANLP) 2019, Varna, Bulgaria. Shared Task: Financial Narrative Summarisation (+ other subtasks). Programme details are available here.
  • Introduction to large sample analysis of financial narratives. 30-31 July 2019, part of 8th WHU Doctoral Summer Program in Accounting Research (SPAR) “Current Issues in Empirical Financial Reporting Research”, WHU – Otto Beisheim School of Management, Vallendar (Germany)
  • In Search of High Quality Financial Reporting Narratives: Concepts, Research Methods, and Evidence. 1 July 2019, The Work Foundation, 21 Palmer Street, London. Programme.
  • 1st ESRC workshop on textual analysis methods in accounting and finance. 12-14 September 2018. Lancaster University. Programme.
  • 1st Financial Narrative Processing Workshop (FNP 2018). 7 May 2018, held at the 11th Edition of the Language Resources and Evaluation Conference (LREC 2018), 7-12 May 2018, Miyazaki (Japan)
  • Introduction to large sample analysis of financial narratives. 12-13 July 2016, part of 6th WHU Doctoral Summer Program in Accounting Research (SPAR) “Current Issues in Empirical Financial Reporting Research”, WHU – Otto Beisheim School of Management, Vallendar (Germany).

Practitioner-focused presentations

  • Wolfe Research 2nd Annual Quantitative and Macro Investment Conference. 17 June 2019, May Fair Hotel, Stratton Street, Mayfair, London. Paper: The linguistic features of high quality reporting.
  • INQUIRE UK & Europe Joint Conference. 25-26 March 2019, Oakley Court, Windsor. Overview: Where’s the value in unstructured data?
  • Workshop on Natural Language Processing in Financial Markets. 16 November 2018. Center for Financial Reporting and Auditing ESMT, Berlin (Germany). Keynote: The Implications of Automated Text Processing for Financial Reporting Disclosure Regulation and Research
  • Financial Reporting Council. 20 February 2018. Invited talk: Overview of automated textual analysis and opportunities for financial reporting practice.

Industry


In addition to publishing work on financial narratives in international peer-reviewed academic journals, we also work with a range of industry partners to deliver bespoke research solutions.

The FRC regulates auditors, accountants and actuaries, and sets the UK’s Corporate Governance and Stewardship Codes with the aim of promoting transparency and integrity in business.

The FRC is a project partner and cofunder in our ESRC project Analysing Narrative Aspects of UK Preliminary Earnings Announcements and Annual Reports: Tools and Insights for Researchers and Regulators (contract ES/R003904/1).

We are working on several analyses including the properties of earnings announcement narratives, alternative performance measures (APMs), and strategy and business model reporting.

The FCA is the conduct regulator for UK financial markets and financial services firms, and the prudential regulator for a subset of UK financial services firms. Ongoing research is exploring how automated analysis of text can assist the FCA in its market scrutiny activities.

The PLSA works together with the industry and other parties to raise standards, share best practice and support pension schemes, pension advisors and pension savers.

We worked with colleagues at the PLSA to evaluate annual reporting practices by large UK-listed companies on workforce-related matters. Annual reporting practices were assessed against the PLSA's stewardship toolkit.

Evidence indicated that while exemplars of good reporting practice exist, disclosure practices vary considerably across companies and the overall level of transparency is lower than one might expect given executives’ claims about the key role that their workforce plays in delivering long-term corporate success.

Read the final report here.

CFA UK represents around 12,000 investment professionals and comprises part of the worldwide network of member societies of the CFA Institute.

CFA UK commissioned an analysis of the link between CEO pay and long-term value creation for a sample of the largest companies listed on the London Stock Exchange.

The final report highlighted a weak link between traditional performance metrics used in executive remuneration contracts such as EPS and TSR, and proxies for long-term value creation. The evidence also suggested a weak association between CEO pay and long-term value creation.

Read the final report here.

The IR Society promotes best practice in investor relations and serves as the UK focal point for investor relations practice and IR professionals.

Since 2015 we have provided input to the Best Annual Report category in the IR Society’s annual Best Practice Awards. We use a version of our CFIE-FRSE app to score aspects of annual reports automatically. These automatic scores serve as a cross-check on detailed manual evaluations performed by members of the IR Society’s expert judging panel.

Datasets


Our work on financial market text has generated several novel datasets and associated resources that are available to download and use for academic research and non-commercial purposes.


Summary narrative features for annual reports published between 2003 and 2017 by firms listed on the London Stock Exchange’s Main Market and Alternative Investment Market.

The dataset adjusts for firm name changes to ensure time-series comparability. Fiscal year-ends are matched to Thomson Reuters Datastream.

Note: We do not publish company identifiers due to licensing restrictions. Instead, we provide details on how to match the dataset to Thomson Reuters Datastream using firm names and a SAS script.

A zip file containing the dataset and supplementary material is available to download here.


A range of wordlist resources drawn from prior work and our own research relating to features such as sentiment, forward-lookingness, risk, uncertainty, and strategy.

Wordlists available to download:


A set of annual report corpora constructed using reports published between 2003 and 2017. These corpora can be used to study the linguistic properties of UK annual reports and to identify unusual linguistic features associated with a specific report or report section.

Available UK annual report corpora include:

  • Letter from board chair
  • CEO review
  • Financial review
  • Operating review
  • Business review
  • Aggregate management commentary (comparable to MD&A schedule in Form 10-K)
  • Principal risks and uncertainties
  • Governance statement
  • Chair’s governance introduction
  • Remuneration report
  • Corporate social responsibility disclosures
  • Highlights
  • Group audit report
  • Entire Narratives component (including audit report)
  • Entire Narratives component (excluding audit report)

A zip file containing the dataset and supplementary material is available to download here.
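
As an illustration of how a reference corpus can surface unusual linguistic features, the sketch below computes log-likelihood keyness, a standard corpus linguistics statistic, for words in a target text relative to a reference corpus. It is a minimal example under stated assumptions and is not taken from the CFIE resources: the function names `log_likelihood` and `keywords` are hypothetical, and both inputs are assumed to be pre-tokenised lists of lower-cased words.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a, b: frequency of the word in the target and reference corpus.
    c, d: total token counts of the target and reference corpus.
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    ll = 0.0
    # By convention, a zero frequency contributes nothing to the sum.
    if a > 0:
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively over-used in the target text."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least one token")
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words whose relative frequency is higher in the target.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]
```

In practice the target might be one company's risk section and the reference the pooled corpus for that section type; higher scores indicate words that are more distinctive of the target.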

Software and Script


Resources for scraping, retrieving and parsing a range of financial market texts including 10-Ks from EDGAR, PDF annual reports and HTML annual earnings announcements published by firms listed on the London Stock Exchange, and conference call transcripts.


The CFIE-FRSE annual report App for digital UK annual reports published as PDF files (submitted as individual files or in bulk for batch processing).

The App supports structured retrieval of annual report text based on the report table of contents (or PDF bookmarks where a valid table of contents cannot be detected). It also classifies annual report contents into a range of generic sections (e.g., chairman’s statements, governance statements, etc.) to facilitate cross-sectional comparisons.

Output is provided as individual .txt files and as a pooled Excel spreadsheet.

Note that the App is not recommended for structured extraction from scanned (image-based) PDF files.

The Java-based App is available on GitHub for download to your PC.

See here for an academic journal article providing a detailed discussion and validation of the App.

The tool can be adapted to process reports published in other languages and reporting regimes.
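
The idea of mapping report-specific section headings to generic section types can be sketched as follows. This is a simplified Python illustration only; it is not the method used by the Java-based App. The pattern list `SECTION_PATTERNS`, the label names, and the function `classify_heading` are all hypothetical, and real table-of-contents headings would need a much richer rule set.

```python
import re

# Hypothetical (label, pattern) pairs, checked in order so that more
# specific patterns take priority over more general ones.
SECTION_PATTERNS = [
    ("chair_governance_intro", re.compile(r"chair.*governance|governance.*chair")),
    ("chair_statement", re.compile(r"chair")),
    ("ceo_review", re.compile(r"chief executive|\bceo\b")),
    ("financial_review", re.compile(r"financial review|finance director|\bcfo\b")),
    ("risks", re.compile(r"\brisks?\b")),
    ("governance", re.compile(r"governance")),
    ("remuneration", re.compile(r"remuneration")),
]


def classify_heading(heading: str) -> str:
    """Return a generic section label for a table-of-contents heading."""
    text = heading.lower()
    for label, pattern in SECTION_PATTERNS:
        if pattern.search(text):
            return label
    return "unclassified"
```

Assigning every report's headings to a common set of labels is what makes sections comparable across companies, even when each report uses its own wording.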


Python scripts for scraping files from EDGAR, retrieving text from specific items (e.g., Item 7, MD&A), and text preprocessing prior to analysis.
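
A minimal sketch of item-level retrieval is shown below. It is not the CFIE script: it assumes the 10-K has already been downloaded and converted to plain text, that item headings start a line in the form "Item 7.", and that the longest candidate span is the real section rather than a table-of-contents entry. The names `ITEM7_START`, `ITEM_NEXT` and `extract_item7` are hypothetical.

```python
import re

# Heading that opens Item 7 (but not Item 7A) at the start of a line.
ITEM7_START = re.compile(r"^\s*item\s+7\.?\s", re.IGNORECASE | re.MULTILINE)
# Heading of the next item, which closes the Item 7 section.
ITEM_NEXT = re.compile(r"^\s*item\s+(7a|8)\.?\s", re.IGNORECASE | re.MULTILINE)


def extract_item7(text: str) -> str:
    """Return the longest span running from an Item 7 heading to the next item."""
    best = ""
    for start in ITEM7_START.finditer(text):
        end = ITEM_NEXT.search(text, start.end())
        stop = end.start() if end else len(text)
        section = text[start.start():stop]
        # Table-of-contents entries also match, so keep the longest span.
        if len(section) > len(best):
            best = section
    return best.strip()
```

Filings vary widely in layout, so a heuristic of this kind should be validated against a hand-checked sample before use at scale.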


Java scripts to process transcript files saved in RTF and HTML formats.

Scripts support:

  • splitting transcripts into the management presentation and the Q&A sections of the call
  • distinguishing between analysts’ questions and management responses in the Q&A component of the call
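
The two steps above can be sketched as follows. The example is written in Python for brevity and is not the CFIE code. It assumes the transcript has already been converted to plain text, that the Q&A section opens with a heading line such as "Questions and Answers" or "Q&A", that each turn is a single line of the form "Speaker: text", and that the caller supplies the names of the company's participants. `split_call` and `label_turns` are hypothetical names.

```python
import re
from typing import List, Set, Tuple

# Heading line that is assumed to open the Q&A section.
QA_HEADING = re.compile(
    r"^[ \t]*(questions?\s*(and|&)\s*answers?|q\s*&\s*a)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)
# One turn per line: speaker name, colon, then the spoken text.
SPEAKER_LINE = re.compile(r"^(?P<speaker>[^:\n]{1,60}):\s*(?P<text>.+)$", re.MULTILINE)


def split_call(transcript: str) -> Tuple[str, str]:
    """Split a transcript into (management presentation, Q&A)."""
    match = QA_HEADING.search(transcript)
    if match is None:
        return transcript.strip(), ""
    return transcript[:match.start()].strip(), transcript[match.end():].strip()


def label_turns(qa_text: str, managers: Set[str]) -> List[Tuple[str, str, str]]:
    """Label each Q&A turn as operator, management or analyst."""
    turns = []
    for match in SPEAKER_LINE.finditer(qa_text):
        speaker = match.group("speaker").strip()
        if speaker.lower() == "operator":
            role = "operator"
        elif speaker in managers:
            role = "management"
        else:
            # Anyone who is neither operator nor management is treated as an analyst.
            role = "analyst"
        turns.append((role, speaker, match.group("text").strip()))
    return turns
```

Real transcripts often list corporate participants in a header block, which could be parsed to build the `managers` set instead of supplying it by hand.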


Use the following resources to support corpus analysis of financial text

Training


Details of our workshops are listed under Workshops above.

Contact Us


If you have any questions/comments regarding the CFIE project, please let us know.

CONTACT

  • Steven Young
    Office C041, C - Floor
    Accounting and Finance Department
    Lancaster University Management School
    United Kingdom
    LA1 4YX