The Robodebt Royal Commission report said that:

"When done well, AI and automation can enable government to provide services in a way that is faster, cheaper, quicker and more accessible. The concept of when done well is what government must grapple with as increasingly powerful technology becomes more ubiquitous."

Three recent papers provide some developed thinking on what “when done well” might look like for government use of generative AI (GenAI): from Australia’s Department of the Prime Minister and Cabinet (see ‘read more’ below), California’s Government Operations Agency and the UK’s Ada Lovelace Institute.

Who do you trust less: Government or AI?

Governments (understandably) frame the risk in their use of AI as AI degrading the trust that citizens otherwise place in government. As the Australian paper puts it, “[i]f the community does not trust AI, and the [Australian Public Service or APS] still uses it within a service offering, the APS may itself be seen as untrustworthy.”

A more cynical observer might say the real risk is that AI compounds the distrust citizens already have of government. The evidence suggests that, at least in Australia, that is not right:

  • most Australians (61%) trust Australian public services and believe they will change to meet Australians’ needs in the future.

  • but less than half of Australians (44%) believe the benefits of AI outweigh the risks.

  • however, if they had to choose between a list of private and public purposes for using AI, people reported highest trust in government to responsibly use AI to deliver services faster (42% of respondents).

  • that said, the level of trust in government use of AI varies demographically:

    • 70% of people who reported knowing about AI ‘very well’ trust government to responsibly use AI to deliver faster services, compared with 54% who reported a moderate knowledge of AI, and 37% who reported having a slight knowledge of AI.

    • younger people reported higher trust in government’s use of AI than older people.

    • women reported lower trust in government’s use of AI than men.

    • people in regional Australia reported lower trust in government’s use of AI than those in metro areas.

    • people born in Australia reported lower trust in government’s use of AI than those born overseas (with the exception of those born in the United Kingdom).

What “when done well” by government looks like

The Californian paper provides a good catalogue of the ways in which a government can use AI for the benefit of its citizens (echoed in the Australian Government and Lovelace papers):

  • improve the performance, capacity, and efficiency of ongoing research and analysis within Government through summarisation and classification. This could go beyond the mechanical task of summarising submissions to a public consultation to involve “sentiment analysis of public feedback” across a wider commentary set, including traditional and social media.

  • personalize and customize Government work products to reflect California’s demographic diversity, with the potential to improve access to services and outcomes for all. This could include auto-populating public program applications based on a person’s situation and household composition. GenAI also can identify groups that, for language or other reasons, are disproportionately not accessing services by analyzing faulty responses, feedback surveys or comments.

  • improve language and communications access for Government materials by producing them in multiple languages and different formats like audio books, large print text, and braille documents.

  • optimize software coding and explain and categorize unfamiliar code across Government IT systems. GenAI can generate code in multiple computing languages and translate code from one language to another. This can improve government operations if a system is using code that is written in an obsolete language.

  • find insights and predict key outcomes in complex datasets to empower and support public service decision-makers. For example, GenAI could analyse data streams from drones, satellites, and sensors monitoring public infrastructure to generate detailed damage and deterioration assessments for maintenance cycles.

  • optimize workloads for environmental sustainability in planning and approval processes for private sector works and in the Government’s own activities. GenAI simulation tools could model the carbon footprint, water usage, and other environmental impacts of major infrastructure projects.

  • increase first-call resolution for public service centres.

  • provide better information access for citizens and answers on a ‘whole of government’ basis. For example, searching or matching government codes, permits and regulations applicable to the type of development proposed by a citizen.

While the California State government develops more specific principles, there is an interim set of ‘do’s’ and ‘don’ts’ for government use of AI:

  • to protect the safety and privacy of individuals’ data, in performing their work state employees should only use state-provided enterprise GenAI tools on state-approved equipment.

  • under no circumstances should state employees provide Californians’ data or state data to a free, publicly available GenAI solution like ChatGPT or Google Bard.

  • a plain-language explanation of how GenAI systems factor into delivering a state service should be provided, and the fact that content is generated by GenAI should be disclosed.

  • state supervisors and employees are encouraged to review GenAI products for accuracy and ensure they paraphrase rather than use AI-generated content verbatim.

The Californian paper also imposes some extra obligations on government agencies using ‘high risk’ AI, which is defined as:

an automated decision system that is used to assist or replace human discretionary decisions that have a legal or similarly significant effect, including decisions that materially impact access to, or approval for, housing or accommodations, education, employment, credit, health care, and criminal justice.

These additional obligations are:

  • pre-deployment assessments and red-teaming of GenAI systems to identify any issues with fairness, privacy, security, performance, and safety in the model beforehand.

  • post-deployment monitoring to detect security vulnerabilities, performance changes, and equity issues.

  • a human reviewer of any GenAI-supported workflow or output that results in a decision about program eligibility or social safety net benefits.

  • collecting and reviewing community feedback, particularly from historically marginalised groups, to capture diverse perspectives and provide for decisions to be appealed to humans.

The Lovelace paper goes further and canvasses upfront, independent third-party audits of any AI used in government, whether developed internally or externally, extending to providers of APIs supplied as part of integrated products or as custom-built tools.

All three papers also emphasise the importance of ‘co-design’ with users and public interest groups. The Lovelace paper says:

The government should incorporate meaningful public engagement into the governance of foundation models, particularly in public-facing applications. While public sector institutions have existing mandates, deploying AI systems raises new questions of benefits, risks and appropriate use that need to be informed by public perspectives.

However, that paper also notes that, at least outside the US, governments will often be acquiring foundation models developed by foreign commercial providers, and that much more work needs to be done on new models of public and citizen participation between government, the private sector, and the public.

Some things may not be capable of being “done well”

The Californian paper candidly acknowledges that the nature of GenAI makes it difficult to comply with some commonly advocated ‘guardrails’ for AI.

First, explainability is difficult with GenAI, given the vast pool of data from which it draws inferences:

The difficulty in extracting human-interpretable explanations from GenAI technology is an important factor to consider for government to provide sufficient information about decisions that concern constituents. GenAI models can be prompted to "explain their reasoning" through prompting techniques. However, these techniques can be inconsistent because GenAI models have been shown to have misrepresented their stated reasoning.

The rationale for an explainability guardrail is sound: ideally, outcomes produced by AI models would be readily understandable and their decision-making processes transparent, which is especially important in a government context in order to build trust. In practice, however, explainability tends to prove much more elusive. GenAI, like other AI models, may produce output that is difficult to anticipate and likewise difficult to explain. The Californian, Australian and UK papers all emphasise the importance of capacity building across government. But when GenAI’s ‘point of use’ is a call centre or the counter at a government service centre, how are frontline staff going to be equipped with the skills to provide a decent, plain-language rationale that satisfies the requirements for ‘explainability’?

Second, the ‘right to be forgotten’ in privacy law (for example, under the GDPR) can be compromised because of the difficulty of erasing personal information embedded within a model’s features (a process known as algorithmic disgorgement).

Third, standard approaches of anonymizing data to use in training may be susceptible to being undone by GenAI. If a GenAI model is trained on a data set of images of people, it could potentially generate new images that are similar to the images of real people in the training dataset. These new GenAI images could then be used to identify real individuals in the training dataset, even if the original images were anonymized.

Lastly, the AI supply chain is complex, with multiple independent suppliers involved, and guardrails may need to be applied and tested at each stage of the supply chain. For example, third-party plug-ins and browser extensions that interact with GenAI models can also pose privacy risks: a plug-in could collect data about the user's interactions with a GenAI model, such as the text that they generate or the images that they create.

Should the answer sometimes be that AI can never be “done well” in government?

Both the Australian Government and Ada Lovelace papers step back to consider the deeper philosophical issue of what AI means for or says about the government that we want or expect.

The Lovelace paper cautions that “[t]here is a risk that foundation models are used because they are a new ‘shiny’ technology everyone is talking about, not because they are the best tool or system for the challenge.” Further, governments do not have a good track record in adopting and managing new technology:

..harnessing data and technology has long been a challenge for the UK government. Outdated IT systems are common. Data architecture is fragmented, with information sitting in organisational silos. This makes it difficult to gather and summarise valuable information at speed or in a crisis; impedes holistic analysis, collaborative decision-making and coordinated service delivery; and creates costs and barriers to innovation. The civil service recognises these pressures but has not found solutions despite many digital transformation initiatives.

But the Lovelace paper’s concerns go beyond the IT cackhandedness of governments. The reason governments over-estimate the suitability of services for automation is that they choose to treat the relationship with individual citizens as transactional (and usually one way): you apply for a social security benefit, you sit a test for a driver’s licence, or you lodge a tax return. In the UK, there have been calls for a root-and-branch rethinking of public service to shift to a relationship model:

Try teaching GenAI to do that.

The Australian Government paper is more sanguine (realistic?) about the inevitability of widespread adoption of AI in government given the external pressures governments are under:

We will need to innovate to meet community expectations of public services in the future. The community’s expectations around the quality of public services are growing: for a higher standard of care; for tailored and personalised services; and for greater responsiveness, convenience and efficiency when accessing services. Australia’s population is ageing, increasing demand for care and support services. At the same time, an increase in the share of older Australians in the population means fewer working-age Australians to help fund public services. External forces, such as climate change, are also expected to increase demand for services while decreasing the resources (people and funding) available to provide them.

However, the paper also recognised the importance of the ‘human dimension’:

..using artificial intelligence shouldn’t come at the expense of empathy. AI will increase the trustworthiness of public services if it is designed and implemented in a way that demonstrates empathy. Trustworthiness is built when the APS demonstrates empathy for the people it serves.

While the AI developer’s response might be “no worries, I can make GenAI mimic a human”, the paper noted that this could be counterproductive to trust:

While AI’s human-like capability can compel users to perceive an AI-system as a person, this was seen to pose a significant risk to trustworthiness, with focus group participants noting that fake empathy from AI could completely destroy trust. Equally, there is a risk that decisions and outcomes that are seen to be lacking empathy are attributed to the use of AI - whether or not this is justified - reducing trust in government’s ability to use AI responsibly in service delivery.

Reflecting the bite of the Robodebt Royal Commission, the paper noted that in some situations there is no substitute for dealing with a human public servant:

A loss of human interaction in moments of need could significantly erode trust. Research literature indicates that a lack of interpersonal interaction with public service actors and decision-makers is a significant driver of distrust in AI.

[Additionally] [f]or some in the community, human interaction and a relationship with public services is as important as the service itself for demonstrating trustworthiness.

But for all that, and while trying to put some sugar coating on it, the paper acknowledged that, over time, having to deal with government through AI will be inevitable:

Successful service delivery depends on supporting people to engage with AI-enabled services in the long term. Maintaining trustworthiness requires the APS to deliver services to the whole community. Public services are for everyone, including those who don’t want to engage with digital and AI-enabled systems or provide additional personal data, to ensure that using AI in public service delivery doesn’t entrench disadvantage. Nevertheless, in the long term, opting-out will not be an option in a more connected world, where AI will be critical to address future challenges. It will be important to invest in building the AI literacy and digital connectivity of the community, particularly cohorts experiencing vulnerability and those that support them, in order to bring everyone along on the AI adoption process.

The paper points out that allowing citizens to opt out of AI has potential implications for both the individual and society as a whole:

..if individuals who had previously shared data chose to opt out of future sharing, would a system that had learned about them then continue to view them as the person they were prior to opting out? Similarly, they noted that if data collection systems were not comprehensive, which might be the case in the context of health, mental health and other serious life issues, then individuals might receive services or interventions that were not appropriate for their current circumstances.

If opt-out occurs at scale (as Australia saw with the My Health Record electronic medical record), the risks of bias and error in AI systems could be exacerbated as the data pool shrinks.