New research claims that large language models, specifically GPT-4 (which powers certain versions of ChatGPT and various Microsoft Copilot-branded generative AI products), can analyze financial statements more accurately than humans.
The findings from researchers at the University of Chicago suggest significant implications for the future of financial analysis and decision-making as AI becomes more commonplace.
The study also highlights the versatility of generic, multipurpose LLMs such as GPT-4, which can offer abilities similar to those of more specialized tools, noting, “We find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model.”
LLMs are outstanding at analyzing financial reports
In their tests, the researchers found that GPT-4 outperformed human analysts even without textual context, reaching an accuracy of 60% compared with the 53-57% range of human analysts.
The success didn’t come without some initial groundwork, though: the paper goes into detail about the researchers’ use of chain-of-thought prompts to elicit more suitable and accurate responses.
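For readers unfamiliar with the technique, a chain-of-thought prompt asks the model to work through explicit intermediate steps before giving an answer. The snippet below is only a rough sketch of how such a prompt might be assembled in Python. The function name `build_cot_prompt`, the wording of the steps, and the sample figures are all assumptions made for illustration; they are not taken from the study.

```python
def build_cot_prompt(statement_text: str) -> str:
    """Assemble a chain-of-thought style prompt around a financial statement.

    The step wording is hypothetical and is not the prompt used in the study.
    """
    # Illustrative reasoning steps the model is asked to follow in order.
    steps = [
        "Identify notable changes in the reported line items.",
        "Compute relevant financial ratios and show the formula for each.",
        "Explain what the changes and ratios suggest about the company.",
        "State a final assessment with a short rationale.",
    ]
    # Number the steps so the model can refer to them in its answer.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "You are a financial analyst. Work through the steps below in order, "
        "showing your reasoning before you give a final answer.\n\n"
        f"{numbered}\n\n"
        "Financial statements:\n"
        f"{statement_text}\n"
    )


# Example usage with made-up figures (assumed, not from the study).
example_prompt = build_cot_prompt("Revenue: 100\nNet income: 10")
print(example_prompt)
```

The resulting string would then be sent to whichever model is being evaluated; that call is left out here because it depends on the provider and client library in use.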
Moreover, the study found that GPT-4 and human analysts complement each other well – while the LLM excels in areas where humans might be inefficient or biased, humans add value where additional context is required.
GPT-4’s capabilities were attributed to its vast knowledge base and theoretical understanding, which enable it to draw conclusions from data patterns even without specific financial training. The model was shown to have some limitations, but the latest GPT-4o model has already significantly improved efficiency while becoming multimodal.
While there has been a good deal of scepticism surrounding generative AI’s readiness to replace human workers, its role as an integral support tool is becoming increasingly evident as human workers prepare to work alongside the efficiency-boosting technology.