An in-house generative artificial intelligence (GenAI) capability built by the Australian Securities and Investments Commission (ASIC) to summarise public submissions was shelved immediately after a pilot experiment, senior staff revealed during a senate inquiry on Tuesday, after the machine outputs were judged inferior to human-written summaries.
ASIC chair Joe Longo, during the inquiry, also expressed his dissatisfaction with the quality and substance of the machine-generated summaries.
The standalone capability, built on Meta’s Llama 2 large language model (LLM), was designed to produce accurate summaries of public submissions forwarded to the regulator, said Graham Jefferson, ASIC’s digital and transformation lead, who was among the overseers of the GenAI experiment.
Generating these summaries is a typically onerous but necessary responsibility for ASIC's human staff, and machine-generated summaries offered the prospect of significantly reducing the staff hours spent on the task.
“That was an offline experiment,” Jefferson confirmed during a hearing of the Senate Select Committee on Adopting Artificial Intelligence, clarifying that the summarisation pilot was conducted on submissions made to a parliamentary joint inquiry into the consultancy sector.
According to Jefferson, the submissions, numbering around 50, were loaded into and parsed by the LLM for summarisation. To provide a control for the study, each of the 50 or so submissions was also summarised by ASIC staff members. The two sets of outputs were then assessed by the regulator's senior staff in a blind comparison.
While ASIC deemed the LLM pilot a “success” as an experiment, Jefferson hastened to add that “the results… wouldn’t be something we’d want to use going forward”.
“The results weren’t sufficiently good for us to want to use that summary technique in that particular way. Not at this stage of the process.
“What we found in general terms was that the summaries were quite generic. The nuance as to how ASIC had been referenced wasn’t coming through in the AI-generated summaries in the way that it was when an ASIC employee was doing the summarising work,” Jefferson added.
Longo labelled the AI-generated summaries “bland”, adding that they lacked sufficient nuance.
“It wasn’t misleading. It just really didn’t capture what the submissions were saying. The human [summaries] were able to extract nuances and substance.”
The technology, if improved, no doubt has promise for the regulator and government agencies more broadly, with the prospect of saving significant labour hours. It is not uncommon for inquiries of significant public or business interest to attract hundreds of submissions.
Jefferson said the regulator was currently reviewing the results of the summarisation experiment and taking feedback from staff.
Outside of the summarisation pilot, Jefferson said the regulator has around 20 machine learning algorithms “registered in its inventory” right now.
ASIC also recently joined a whole-of-government Microsoft Copilot trial, coordinated by the Digital Transformation Agency (DTA), testing a range of GenAI tools embedded in Microsoft's 365 environment.
Jefferson revealed that around 150 ASIC staff had been testing the technology for about a month.