LLMs hold promise for the actuarial field

The rapid advancement of generative artificial intelligence has led to the development of large language models (LLMs). ChatGPT and other AI chatbots are examples of tools powered by LLMs. These models specialize in natural language processing, understanding and generating human language. LLMs are trained on massive amounts of text data to perform tasks such as translation, summarization, question answering, text generation and even software coding.

LLMs hold promise for many industries and professions, including the insurance industry and the actuarial field.
LLM use cases in insurance
In March, the Society of Actuaries Research Institute convened a panel of experts to discuss the use of generative AI in the insurance industry. The panel consisted of actuaries from a variety of practice areas, and they noted several LLM insurance applications such as:
- Coding assistance: Code generation and automating documentation
- Digital assistant: Email, document creation, note taking, meeting summarization
- Data summarization and categorization: Claims data, submissions, notes; reinsurance treaties; medical underwriting files, calls and meetings
- Testing and model validation assistance: Generating test cases, testing documentation, review and validation
- Other applications: Translation, research source attribution, claims integration
The panel concluded that current AI tools, such as LLMs, can boost productivity for some tasks, but the technology hasn’t evolved enough to replicate actuarial analysis and decision-making. However, the panel predicted it will become necessary for actuaries to use these tools.
Implementing LLMs isn’t without challenges. The sensitive data insurance companies manage makes data privacy and security critical. Also, compliance with regulations and ethical standards is needed to build trust with customers and stakeholders. As a result, incorporating LLMs into current systems demands thorough planning and teamwork among various departments within the organization.
Benchmarking and comparing models
After identifying tasks that might be suited for LLMs to complete or assist with, consider which type of model best meets the needs of each task. There are four basic variants:
- Foundational models: Have not been tuned for specific tasks
- Instruct models: More fine-tuned, meant for task-oriented applications
- Code models: Specialize in understanding and generating code
- Multimodal models: Understand and generate text, images and audio
Opting for the largest and highest-performing LLM may not always be necessary or cost-effective. Other considerations include latency requirements, budgets, scalability, and ethical and bias issues. Experimentation and evaluating the results are helpful in choosing the appropriate LLM.
Finding the right LLM for a specific task depends on these considerations:
- Model size and computational requirements
| Need | Requirement | Size |
| --- | --- | --- |
| Simple tasks, quick responses | Less powerful hardware | Smaller model |
| Complex reasoning | More computational resources | Larger model |
- Task-specific performance
- Context window size: The amount of text the model can take into account in a single interaction
- Cost vs. performance
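As a rough illustration, the size-versus-need trade-off in the table above can be sketched as a simple selection helper. The tier names and decision rules below are hypothetical, not vendor recommendations:

```python
# Illustrative only: maps task needs to a model tier, mirroring the
# size-versus-requirement trade-off in the table above.
# Tier names and rules are hypothetical, not vendor recommendations.

def recommend_model_tier(needs_complex_reasoning: bool,
                         latency_sensitive: bool) -> str:
    """Pick a rough model tier from two simple task attributes."""
    if needs_complex_reasoning:
        # Complex reasoning generally favors a larger model,
        # accepting higher cost and compute requirements.
        return "large"
    if latency_sensitive:
        # Simple tasks with quick-response needs suit a smaller model
        # that runs on less powerful hardware.
        return "small"
    # Middle ground when neither constraint dominates.
    return "medium"

print(recommend_model_tier(needs_complex_reasoning=True, latency_sensitive=False))  # large
print(recommend_model_tier(needs_complex_reasoning=False, latency_sensitive=True))  # small
```

In practice, the decision also weighs budget, context window size and measured task performance, so a real selection process would score candidate models against a benchmark rather than rely on rules of thumb alone.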
LLM benchmarks are assessment tools that compare models’ strengths and limitations. Benchmarks fall into several categories; the table below lists some of these categories, representative benchmarks within each, and a description of how each works:
| Benchmark category | Benchmark product | Description |
| --- | --- | --- |
| Knowledge and recall | Massive Multitask Language Understanding (MMLU) | Uses about 16,000 multiple-choice questions across a range of topics, from mathematics to law. |
| Knowledge and recall | Google-Proof Question and Answering (GPQA) | 448 multiple-choice questions written by experts in biology, physics and chemistry. Tests a model’s expert-level knowledge. |
| Mathematics | Mathematics Aptitude Test of Heuristics (MATH) | 12,500 problems from mathematics competitions, covering a range of difficulty levels and math topics. Requires LLMs to demonstrate their reasoning. |
| Coding | HumanEval | Assesses an LLM’s code-writing capabilities. Consists of 164 programming problems that the LLM is required to synthesize. |
| Reading comprehension | Discrete Reasoning Over the Content of Paragraphs (DROP) | A Q&A dataset that assesses an LLM’s ability to understand and extract information from inputs. |
The best way to evaluate LLMs is to create a benchmark that is tailored to a specific task. This method not only gives a more accurate measure of performance for that task but also supports development and enables ongoing performance tracking.
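A task-specific benchmark can be as simple as a scored set of prompt/expected-answer pairs. The sketch below illustrates the idea with a stubbed model function standing in for a real LLM call; all names and test cases are hypothetical:

```python
# Minimal sketch of a task-specific benchmark: score a model's answers
# against expected answers for a fixed set of prompts. The model function
# here is a stub; in practice it would call a real LLM.

BENCHMARK = [
    {"prompt": "What does IBNR stand for?", "expected": "incurred but not reported"},
    {"prompt": "What does ALM stand for?", "expected": "asset liability management"},
]

def stub_model(prompt: str) -> str:
    """Placeholder for a real LLM call; returns canned answers."""
    answers = {
        "What does IBNR stand for?": "Incurred But Not Reported",
        "What does ALM stand for?": "asset and liability matching",
    }
    return answers.get(prompt, "")

def run_benchmark(model) -> float:
    """Return the fraction of cases where the expected answer appears."""
    hits = 0
    for case in BENCHMARK:
        answer = model(case["prompt"]).lower()
        if case["expected"] in answer:
            hits += 1
    return hits / len(BENCHMARK)

score = run_benchmark(stub_model)
print(f"accuracy: {score:.2f}")  # 0.50 with the stub above
```

Running the same benchmark after each model or prompt change supports the ongoing performance tracking described above.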
Deploying an LLM
The easiest way to use an LLM is through an application programming interface (API) from a major provider, such as OpenAI, the developer of ChatGPT. While hosting an LLM independently offers more control, using an API is simpler, faster and more cost-effective. It is important to ensure that the chosen provider meets security and privacy standards.
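As a sketch of the API route, a request to an OpenAI-style chat-completions endpoint is a small JSON payload. The example below only builds and inspects the payload rather than sending it; the endpoint URL and model name are illustrative, and a real call would also require an API key from the provider:

```python
import json

# Illustrative only: builds a chat-completion request payload in the
# OpenAI-style API format without sending it. A real call would POST
# this body to the provider's endpoint with an Authorization header.

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint

def build_request(user_message: str, model: str = "example-model") -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are an assistant for actuarial work."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.2,  # low temperature for more deterministic answers
    }

payload = build_request("Summarize the key terms of this reinsurance treaty.")
print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the network call also makes it easier to log requests and swap providers, which helps with the security and privacy review mentioned above.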
Launching an open LLM follows a similar process to other software deployments, although the details can differ depending on the specific LLM being used. There are a variety of deployment methods, from software for beginners to more robust solutions that are appropriate for production environments. As far as where to locate the LLM, the cloud offers a simpler solution compared to building an independent server.
Because deploying LLMs falls outside typical actuarial training and expertise, it is recommended to seek assistance from cloud engineers and software developers.
Assessing risk and maintaining governance
Actuaries are experts in risk management and governance and have extensive knowledge about technology and data. Their expertise and professional standards make it crucial that they have key roles in the responsible and ethical use of AI and LLMs.
Risk and ethics considerations are essential in choosing LLMs for responsible actuarial use. For example, it is important to choose a provider that shares the organization’s view of ethical AI practices and whose AI governance structure the organization is comfortable with.
Other provider considerations include:
- Privacy and protection: Ensuring models and their providers meet privacy and data protection requirements.
- Risk and compliance: Regularly reviewing LLM output to ensure it meets compliance requirements.
- Technology and reliability: Ensuring the model has the necessary capabilities, performs consistently and offers sufficient technical support.
- Bias, fairness and discrimination: Confirming the LLM addresses these risks.
- Transparency and explainability: Documenting model specifications and how it is used, logging outputs, and detailing the development process.
- Accountability and responsibility: Establishing clear lines of accountability and responsibility to oversee decision-making with LLM help.
Below are two important resources that provide a high-level overview of AI ethics:
- UNESCO’s Recommendation on the Ethics of Artificial Intelligence
- The National Association of Insurance Commissioners Principles on Artificial Intelligence, particularly relevant to the financial services and insurance sectors.
SOA resources to help actuaries leverage AI
The SOA Research Institute published a detailed guide on deploying LLMs for actuarial use, Operationalizing LLMs: A Guide for Actuaries, which provides more details and helpful tips. Additionally, SOA’s AI Research landing page provides a library of reports and resources, including the monthly Actuarial Intelligence Bulletin, which informs readers about advancements in actuarial technology and new AI research reports.
© Entire contents copyright 2025 by InsuranceNewsNet.com Inc. All rights reserved. No part of this article may be reprinted without the expressed written consent from InsuranceNewsNet.com.