CBIZ
November 11, 2024

Accounting for Data Acquisition Costs

Background

In the world of artificial intelligence (AI), data is the lifeblood that fuels the development of large language models (LLMs). Acquiring and preparing the large datasets required for training these models is a significant undertaking, both operationally and financially. As companies invest in data acquisition and preparation, understanding the appropriate accounting treatment of these costs becomes critical. Companies must consider whether these costs should be expensed as incurred, recognized as a separate intangible asset or capitalized as part of the generative AI application or LLM.

In this article, part of our ongoing AI Accounting Insights series discussing accounting considerations for companies developing generative AI technologies, we examine the accounting considerations for data acquisition and related costs associated with LLM and generative AI application development. By exploring the relevant accounting standards, we aim to guide financial leaders in making informed decisions on whether to capitalize or expense these costs.

Data Acquisition Costs

Generative AI refers to a type of AI that can produce new content — whether it’s text, images, code or even music — based on the patterns it has learned from existing data. Generative AI is capable of generating content that is novel, coherent and contextually appropriate. This technology is powered by advanced machine learning techniques that enable computers to understand and mimic the complexities of human creativity and language.

One of the key innovations behind generative AI is the development of LLMs. LLMs are designed to process and generate human-like text. They are built using deep learning techniques and trained on vast amounts of data, which may include books, articles, websites and other written content. The model learns the structure of language, including grammar, context and the relationships between concepts, which enables it to generate relevant and contextually accurate responses. By leveraging LLMs through generative AI applications, businesses can automate and enhance a variety of language-intensive tasks, leading to increased efficiency and innovation.

LLMs form the backbone of generative AI applications and are increasingly being integrated into various business applications.

The large amounts of data required to train an LLM or fine-tune an AI application can often be utilized to train multiple models as long as the data rights remain active. As freely available data becomes more limited or less novel, companies increasingly need to purchase data to maintain the pace of advancing models. Data rights are typically acquired through perpetual licenses, term-based licenses or outright purchases. Additionally, data can be sourced in non-digital formats, such as books or printed materials, but these must be converted into digitized, machine-readable formats for use in training. Even then, licensing may still be required to comply with copyright laws.

After the data is acquired and digitized, it still needs to undergo several steps before it is ready to train models. These include cleaning the data to remove errors, making sure it is in a consistent format, labeling it so the AI knows what to learn from and splitting the data into training, validation and testing sets.
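The preparation steps described above can be illustrated with a minimal sketch. It assumes the data is a list of (text, label) pairs, and the function name `prepare_and_split`, the 80/10/10 proportions and the cleaning rules are all hypothetical choices made for illustration, not a description of any particular company's pipeline.

```python
import random

def prepare_and_split(records, train_frac=0.8, val_frac=0.1, seed=0):
    """Clean, normalize and split (text, label) pairs. Hypothetical pipeline."""
    cleaned = []
    seen = set()
    for text, label in records:
        # Consistent format: collapse whitespace and lowercase the text.
        text = " ".join(text.split()).lower()
        # Cleaning: drop empty rows and rows that were never labeled.
        if not text or label is None:
            continue
        # Cleaning: drop exact duplicates after normalization.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append((text, label))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(cleaned)

    n_train = int(len(cleaned) * train_frac)
    n_val = int(len(cleaned) * val_frac)
    train = cleaned[:n_train]
    validation = cleaned[n_train:n_train + n_val]
    test = cleaned[n_train + n_val:]
    return train, validation, test

# Illustrative input: ten usable rows plus a duplicate, a blank and an unlabeled row.
records = [(f"Doc {i}", i % 2) for i in range(10)]
records += [("doc  0", 0), ("   ", 1), ("Doc 11", None)]
train, validation, test = prepare_and_split(records)
print(len(train), len(validation), len(test))  # prints "8 1 1"
```

Each of these steps consumes labor or compute, which is why the accounting treatment of preparation costs is discussed alongside the cost of the data itself.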

Current Accounting Framework

Entities developing generative AI applications and LLMs will need to consider whether costs to acquire datasets from third parties should be (1) expensed as incurred, (2) recognized as a separate intangible asset, or (3) capitalized as part of the AI application or LLM development. Under U.S. GAAP, costs to acquire data from a third party can be capitalized as intangible assets if they meet certain criteria outlined in ASC 350, Intangibles—Goodwill and Other. Specifically, ASC 350-30-25-4 states the following regarding the acquisition of intangible assets:

Intangible assets that are acquired individually or with a group of assets in a transaction other than a business combination or an acquisition by a not-for-profit entity may meet asset recognition criteria in FASB Concepts Statement No. 5, Recognition and Measurement in Financial Statements of Business Enterprises, even though they do not meet either the contractual-legal criterion or the separability criterion (for example, specially-trained employees or a unique manufacturing process related to an acquired manufacturing plant). Such transactions commonly are bargained exchange transactions that are conducted at arm’s length, which provides reliable evidence about the existence and fair value of those assets. Thus, those assets shall be recognized as intangible assets.

Acquired data is likely to meet the definition of an asset and can be recognized separately as an intangible asset if the data is separately identifiable and provides an entity with a present right to future economic benefits.

Costs paid to third parties for acquired data that will be used to train future LLMs or other AI applications are likely capitalizable under ASC 350-30 based on the perceived future economic benefit. When assessing data for future benefit and its useful life, companies should consider the type of data and its permanence. High-quality, enduring datasets like academic texts or historical data tend to have a longer useful life, while time-sensitive data, such as market reports or news articles, may become obsolete more quickly. Foundational data generally retains its value longer than data subject to rapid change. Companies may need to consider whether stratifying the data assets into multiple tiers is appropriate to align their useful lives with their anticipated benefit.
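To make the tiering idea concrete, the sketch below amortizes three hypothetical data tiers on a straight-line basis. The tier names, costs and useful lives are invented for illustration, and the straight-line method with no residual value is an assumption. In practice, the method should reflect the pattern in which the asset's economic benefits are consumed.

```python
from dataclasses import dataclass

@dataclass
class DataTier:
    name: str
    cost: float             # capitalized acquisition cost (hypothetical)
    useful_life_years: int  # management's estimate (hypothetical)

def annual_amortization(tier):
    """Straight-line amortization, assuming no residual value."""
    return tier.cost / tier.useful_life_years

# Longer-lived foundational data sits in its own tier, separate from
# time-sensitive data that is expected to become obsolete quickly.
tiers = [
    DataTier("academic texts", 600_000, 10),
    DataTier("historical archives", 300_000, 6),
    DataTier("news articles", 120_000, 2),
]

schedule = {tier.name: annual_amortization(tier) for tier in tiers}
for name, amount in schedule.items():
    print(f"{name}: {amount:,.0f} per year")
print(f"total in year 1: {sum(schedule.values()):,.0f}")  # prints 170,000
```

Grouping everything into a single asset with one blended life would either overstate the carrying value of the short-lived data or write off the enduring data too quickly, which is the motivation for stratifying.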

The treatment of costs to acquire data will also depend on the nature of the rights obtained. If data is acquired under a perpetual license or outright purchase and provides future economic benefits through the anticipated training of future LLMs and AI applications, these costs can be capitalized separately as intangible assets under ASC 350 as discussed. Costs incurred related to data acquired under term-based licenses, however, are generally expensed over the term of the license since the future benefit associated with this data would be limited to the term of the active license.
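As a rough illustration of expensing a term-based license over its term, the sketch below spreads a hypothetical license fee evenly across the months of the term and totals the expense by calendar year. The function name `expense_by_year`, the amounts and the even monthly pattern are assumptions made for this example only.

```python
def expense_by_year(total_cost, term_months, start_year, start_month):
    """Spread a license fee evenly over its term and total it by calendar year."""
    monthly = total_cost / term_months
    by_year = {}
    year, month = start_year, start_month
    for _ in range(term_months):
        by_year[year] = by_year.get(year, 0.0) + monthly
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            month = 1
            year += 1
    return by_year

# Hypothetical 36-month data license costing 360,000, starting 1 July 2025.
print(expense_by_year(360_000, 36, 2025, 7))
# {2025: 60000.0, 2026: 120000.0, 2027: 120000.0, 2028: 60000.0}
```

No expense falls outside the license term, consistent with the point that the benefit of the data ends when the rights expire.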

Costs incurred for data acquired for a specific LLM or AI application that otherwise has no alternative future use (i.e., it cannot be used to train other LLMs or applications) would not qualify as a separate intangible asset under ASC 350-30. However, these costs may still be capitalizable as direct costs incurred to develop internal-use software within the scope of ASC 350-40. If the LLM or AI application project is in the application development stage, it may be appropriate to capitalize these data acquisition costs as direct costs incurred in the development of this software. If the data and related training are not related to the application development stage (i.e., they are only needed to maintain existing functionality or features, or are incurred during the preliminary project stage), these costs would not be capitalized.

Costs incurred for acquiring data used in research and development activities that do not have an alternative future use would be expensed as incurred under ASC 730, Research and Development. Costs to acquire data for a specific software development project that is within the scope of ASC 985-20 for which technological feasibility has not been established would also be expensed as incurred, as these are considered research and development activities.
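The paths described in the preceding paragraphs can be summarized as a simple decision sketch. This is a simplification for illustration only. The class and field names are hypothetical, the order of the checks is an assumption, and real conclusions depend on facts and judgment that a few boolean flags cannot capture.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Treatment(Enum):
    EXPENSE_AS_INCURRED = "expense as incurred"
    EXPENSE_OVER_TERM = "expense over the license term"
    SEPARATE_INTANGIBLE = "separate intangible asset (ASC 350-30)"
    INTERNAL_USE_SOFTWARE = "direct cost of internal-use software (ASC 350-40)"

@dataclass
class DataPurchase:
    rights: str                    # "perpetual", "purchase" or "term"
    alternative_future_use: bool   # can train other LLMs or applications
    software_scope: str            # "internal_use" (ASC 350-40) or "external" (ASC 985-20)
    in_app_development_stage: bool = False
    technological_feasibility: bool = False

def suggest_treatment(p: DataPurchase) -> Optional[Treatment]:
    """Return a candidate treatment, or None where the article is silent."""
    # Term-based licenses: benefit is limited to the active license term.
    if p.rights == "term":
        return Treatment.EXPENSE_OVER_TERM
    # Perpetual license or purchase with use beyond a single model.
    if p.alternative_future_use:
        return Treatment.SEPARATE_INTANGIBLE
    # Data specific to one model, with no alternative future use.
    if p.software_scope == "internal_use":
        if p.in_app_development_stage:
            return Treatment.INTERNAL_USE_SOFTWARE
        # Preliminary project stage or maintenance of existing features.
        return Treatment.EXPENSE_AS_INCURRED
    if p.software_scope == "external" and not p.technological_feasibility:
        # Treated as research and development before feasibility.
        return Treatment.EXPENSE_AS_INCURRED
    # Not addressed in the discussion above; requires separate analysis.
    return None

example = DataPurchase(rights="perpetual", alternative_future_use=False,
                       software_scope="internal_use", in_app_development_stage=True)
print(suggest_treatment(example).value)
```

A sketch like this is best treated as a checklist for gathering facts rather than a substitute for the analysis itself.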

Companies can also incur costs to prepare acquired data for use in training LLMs or AI applications. Preparation costs specific to the LLM or AI application being developed may be capitalizable as part of the LLM or application development if those costs are incurred during the application development stage. In some cases, significant costs may be associated with converting acquired data into a suitable format for training purposes. For example, data is sometimes acquired in non-digital form and must be digitized before it can be used to train LLMs and AI applications. Although the current accounting guidance is less clear on this point, we believe such costs may be capitalizable under ASC 350-30 as part of the acquired data intangible asset if they are directly attributable to preparing the asset for its intended use and have benefit for training future models. However, we encourage management to consult with their auditors, as this is an area where interpretations are still being developed.

Conclusion

The appropriate accounting treatment for data acquisition and preparation costs is a critical consideration for companies developing LLMs and AI applications. Depending on the nature of the data, the rights obtained, and its intended use, costs may be expensed, capitalized as a separate intangible asset or included in the cost of developing internal-use software. Careful evaluation is essential to ensure compliance with U.S. GAAP standards. As interpretations of accounting guidance in this evolving field continue to develop, consultation with experts and a thorough understanding of the nature of the data and related rights are recommended to make informed accounting decisions.


© Copyright CBIZ, Inc. All rights reserved. Use of the material contained herein without the express written consent of the firms is prohibited by law. This publication is distributed with the understanding that CBIZ is not rendering legal, accounting or other professional advice. The reader is advised to contact a tax professional prior to taking any action based upon this information. CBIZ assumes no liability whatsoever in connection with the use of this information and assumes no obligation to inform the reader of any changes in tax laws or other factors that could affect the information contained herein. Material contained in this publication is informational and promotional in nature and not intended to be specific financial, tax or consulting advice. Readers are advised to seek professional consultation regarding circumstances affecting their organization.