
Combating shadow AI

Implementing controls for government use of AI

AI legislation is on the rise

AI regulation is expanding at both the federal and state levels. In 2024, the White House Office of Management and Budget issued Memorandum M-24-10 to all federal agencies and departments in the United States about the use of AI in government. The three-part focus of the memorandum is to:

Strengthen AI governance
Advance responsible AI innovation
Manage the risks from the use of AI

There have been similar efforts at the state level to address concerns about the use and misuse of AI. In 2023, 25 states introduced legislation focused on some aspect of AI, and legislation was enacted in 18 states and Puerto Rico. While some laws focus on the initial study or evaluation of AI use, others seek to govern the use of AI by employees or implement controls to mitigate malicious use or unintended consequences.

Recent legislation highlights some of the dangers of using AI in government and presents some challenges for government agencies and other public sector organizations. These organizations will need to put controls in place to protect public-facing properties from threats and help ensure appropriate consumption of AI.

One emerging concern highlighted by this wave of legislation is the rise of shadow AI — unsanctioned use of public AI tools and models by employees or departments without oversight. Like “shadow IT” before it, shadow AI introduces governance and data leakage risks that regulations are increasingly aiming to address.

Challenge #1: Protecting public Internet properties from AI bots

AI-based crawlers can have legitimate, even beneficial, uses for government agencies and other public sector organizations. In some contexts, responsible crawlers and indexers can use publicly accessible data to enhance citizens’ ability to find relevant online services and information.

On the other hand, poorly developed or malicious AI crawlers can scrape content to train public AI platforms without regard for the privacy of that content. If this data ends up training models, it could raise numerous intellectual property and privacy issues. When left unchecked, these bots can also degrade the performance of public websites for all users by consuming resources needed for legitimate interactions.

Control 1: Deploy application-side protections

Agencies can implement several server- or application-side protections to help control how bots interact with servers. For example, they can deploy a robots.txt file in the root of the site. This file tells crawlers which agents (bots) may crawl the site and which sections and resources they may access.

There are a couple of challenges with this approach, however. First, the crawler must respect the robots.txt file. While this is a general best practice for “respectable” bots, not everyone follows the rules. There are also non-malicious bots that might just misinterpret syntax and consequently interact with elements that agencies want to stay hidden.

In short, while this is a common approach, relying on robots.txt or similar .htaccess (Apache) strategies is not foolproof protection. However, it can be part of a holistic approach for governing how legitimate bots interact with application content.
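As a minimal sketch of the robots.txt approach, the Python below embeds an illustrative robots.txt and checks it with the standard library's urllib.robotparser, the same kind of check a well-behaved crawler would run before fetching a page. The crawler names listed, the /internal/ path, and the agency.example domain are assumptions chosen for illustration, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a recommended policy):
# two AI training crawlers are disallowed site-wide, and all other
# agents are kept out of a hypothetical /internal/ section.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /internal/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks this question before each request.
for agent, url in [
    ("GPTBot", "https://agency.example/services"),
    ("Mozilla/5.0", "https://agency.example/services"),
    ("Mozilla/5.0", "https://agency.example/internal/report"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

The check only has an effect when the crawler chooses to run it, which is why the file is advisory rather than an enforcement mechanism.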

Control 2: Deploy bot mitigation within a web application firewall

Web application firewalls (WAFs) and bot mitigation solutions are essential for protecting public web applications. These controls help organizations safeguard their public digital properties from distributed denial-of-service (DDoS) attacks, shadow and insecure APIs, and various other bot-related threats.

Any bot mitigation solution today should include the ability to programmatically identify and classify bots that are scraping content in the service of AI data training. This classification mechanism is a critical capability. It can allow legitimate and verified AI crawlers, or it can block them altogether until an agency determines how these bots should be allowed to interact with a website.
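The allow-or-block decision described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular WAF or bot management product works: the token list is a small, incomplete sample of User-Agent strings used by AI crawlers, and the `verified` flag is assumed to come from a separate upstream check (for example, validating the source IP against ranges the crawler operator publishes), since a User-Agent header alone is trivial to spoof.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Illustrative, incomplete sample of AI crawler User-Agent tokens.
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "claudebot", "perplexitybot", "bytespider")


@dataclass(frozen=True)
class BotPolicy:
    # Hypothetical agency setting: block every AI crawler until a
    # decision is made, then optionally admit verified ones.
    allow_verified_ai_crawlers: bool = False


def is_ai_crawler(user_agent: str) -> bool:
    """Classify a request as AI crawler traffic by User-Agent token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


def decide(user_agent: str, verified: bool, policy: BotPolicy) -> Verdict:
    """Apply the agency's AI crawler policy to one request."""
    if not is_ai_crawler(user_agent):
        # Not AI crawler traffic; other WAF rules would handle it.
        return Verdict.ALLOW
    if verified and policy.allow_verified_ai_crawlers:
        return Verdict.ALLOW
    return Verdict.BLOCK


crawler_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
print(decide(crawler_ua, verified=True, policy=BotPolicy()))
print(decide(crawler_ua, verified=True, policy=BotPolicy(allow_verified_ai_crawlers=True)))
```

Defaulting the policy to block mirrors the conservative stance of holding AI crawlers back until the agency has decided how they should interact with the site.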

Selecting scalable solutions is also key. In 2023, United Nations Secretary-General António Guterres observed that while it took more than 50 years for printed books to become widely available across Europe, “ChatGPT reached 100 million users in just two months.” The scale and unprecedented growth of AI platforms correlate directly with the growing number of AI bots searching for any publicly exposed datasets to use for training. The architecture of a bot mitigation solution must therefore be able to scale in a distributed global environment.

Challenge #2: Shadow AI: Unsanctioned consumption of public AI models

Public AI platforms have enabled users to accelerate tasks ranging from writing a memo to creating complex code. Within government, state and federal agencies see the potential for using AI to solve complex social problems such as healthcare challenges, access to citizen services, and food and water safety. However, without AI governance, organizations could end up leaking regulated data into the training data of insecure public language models.

In the same way that organizations have leveraged tools to get a handle on the consumption of unsanctioned cloud applications, or “shadow IT,” they now need to understand the scope of “shadow AI” consumption within their organizations. The increase of shadow AI is making headlines. A global study reported in The Conversation, involving more than 32,000 employees across 47 countries, found that nearly 70% of workers favor using free, public AI tools over employer-provided solutions. Alarmingly, almost half admitted to uploading sensitive company or customer data into public generative AI platforms, and 44% acknowledged using AI at work in ways that violate their organization’s policies.

This sensitive data can also be unknowingly shared across AI models. AI models are increasingly trained on data produced by other models as opposed to traditionally sourced content.

Control 1: Determine appropriate use of AI

For a comprehensive approach to shadow AI, organizations first need to define the acceptable use of public AI models. In addition, they should determine which roles need access to those models. Establishing these guardrails is a critical first step. New laws on AI in government, and AI in the public sector more generally, frequently highlight the importance of reviewing appropriate use of AI within agencies and deciding which models should be allowed.

Control 2: Deploy controlled access

Once determinations of appropriate use have been made, agencies must then develop controls for enforcing policies. Zero trust network access (ZTNA) principles enable the development and enforcement of those policies to restrict unsanctioned access.

For example, an agency might allow only authorized users from specific administrative groups to access public AI models. Prior to allowing access to those models, a ZTNA solution can also conduct additional posture checks, such as ensuring corporate devices are up to date with patches or that devices have government-approved endpoint management agents running. With ZTNA, the agency can enforce and restrict who can access these public AI models while operating on government assets.
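The access decision in this example can be sketched as a small Python function. It is a simplified illustration of the logic only, assuming a hypothetical group name ("ai-developers") and two hypothetical posture signals; a real ZTNA product would source these from an identity provider and an endpoint management agent rather than from hand-built objects.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical group allowed to reach public AI models.
AUTHORIZED_GROUPS: FrozenSet[str] = frozenset({"ai-developers"})


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]


@dataclass(frozen=True)
class Device:
    patched: bool                 # OS and software patches are current
    endpoint_agent_running: bool  # approved management agent is active


def may_access_public_ai(user: User, device: Device) -> Tuple[bool, str]:
    """Return (allowed, reason) after identity and posture checks."""
    if not (user.groups & AUTHORIZED_GROUPS):
        return False, "user is not in an authorized group"
    if not device.patched:
        return False, "device is missing required patches"
    if not device.endpoint_agent_running:
        return False, "endpoint management agent is not running"
    return True, "access granted"


developer = User("dev", frozenset({"ai-developers", "staff"}))
healthy = Device(patched=True, endpoint_agent_running=True)
print(may_access_public_ai(developer, healthy))
```

Every check must pass before access is granted, so a user in the right group is still denied on an unpatched or unmanaged device.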

Control 3: Determine what data is appropriate for disclosure to AI platforms

Acceptable use extends beyond defining which users can access AI platforms. Agencies also need to understand and control the data that is posted or submitted to AI platforms. Even something as innocuous as a department memo could include non-public or sensitive data points. Once those data points are submitted to a large language model (LLM), there is a risk of that data being exposed.

Data loss prevention (DLP) controls can help stop the inappropriate use of sensitive data. The right controls will help ensure that proprietary information, such as sensitive application code or even citizen data, does not become a part of an unsecured training data set for an AI platform.

Take the example of an AI developer group that needs to interact with both public and private (in-house) AI platforms. An agency could allow for the consumption of both public (e.g., ChatGPT) and private (e.g., Amazon Bedrock) AI platforms. Only approved users in the AI development group would be allowed access to these platforms. General users would be blocked from both platforms.

Even when there is an approved AI development group of users, the implementation of a DLP rule can be beneficial. The DLP rule can examine data posted to AI platforms and can make sure non-public sensitive data is posted only to the internal private AI platform.
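A DLP rule of this kind might look like the Python sketch below. It is deliberately minimal and rests on several assumptions: the two regular expressions (a US Social Security number shape and the CUI/FOUO document markings) stand in for a much richer detection set, and the host name private-llm.agency.example is a hypothetical internal platform. Production DLP tools use far more robust detection than simple pattern matching.

```python
import re
from typing import List

# Illustrative detectors only; real DLP profiles are far broader.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "marking": re.compile(r"\b(?:CUI|FOUO)\b"),
}

# Hypothetical host name for the agency's private AI platform.
PRIVATE_PLATFORMS = frozenset({"private-llm.agency.example"})


def find_sensitive(text: str) -> List[str]:
    """Return the names of the detectors that match the text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


def allow_submission(text: str, destination_host: str) -> bool:
    """Allow sensitive data only when it goes to a private platform."""
    if destination_host in PRIVATE_PLATFORMS:
        return True
    return not find_sensitive(text)


prompt = "Applicant SSN 123-45-6789 needs review"
print(allow_submission(prompt, "public-ai.example"))
print(allow_submission(prompt, "private-llm.agency.example"))
```

The rule applies even to approved users: membership in the AI development group grants access to the platforms, while the DLP check governs what content may be sent to each one.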

Protecting constituents with AI governance

AI governance should start with a policy or a mission, not with technology. To evaluate the benefits and risks of AI, an agency’s leaders should appoint focused teams that can evaluate the potential intersections of AI and the agency’s mission.

As the public continues to increase engagement with government through technology, there will be larger, richer data sets that could be used to train AI models. Public sector organizations might choose a conservative approach by blocking all AI crawlers, for example, until the impact of allowing those interactions is understood. For organizations that see a potential benefit from legitimate crawling of public properties, teams must be able to control access by verified AI crawlers and protect against malicious actions.

To get ahead of increased AI regulation, teams should also establish which roles and tasks require access to AI platforms. Determining who gets access and when, and controlling the kinds of data posted to AI models, can address shadow AI risks without sacrificing the tangible benefits of this technology.

Bot management and zero trust security capabilities are core to helping government entities reduce risk in the face of proliferating AI usage. Protecting public web properties and maintaining responsible use of AI should be top of mind when developing mitigation strategies.

AI has tremendous promise for helping to solve many complex social problems. However, using AI in government and the public sector carries several potential risks. For government agencies and other public sector organizations, protecting their constituencies must always take precedence when exploring this new technology.

This article is part of a series on the latest trends and topics impacting today’s technology decision-makers.

Dive deeper into this topic.

Learn more about how zero trust can reduce risk in the face of proliferating AI usage in the guide “A roadmap to zero trust architecture.”

Author

Scottie Ray — @H20nly
Principal Solutions Architect, Cloudflare

Key takeaways

After reading this article you will be able to understand:

The emerging state of AI-focused legislation
Two primary challenges AI presents
Controls that help agencies achieve legislative compliance
