The Department for Science, Innovation and Technology and the National Cyber Security Centre published results from a Government Cyber Coordination Centre pilot on 12 June 2026, after frontier AI models helped identify 407 cyber findings across nine government organisations.
The work was carried out through weekly, in-person hackathons led by the Government Cyber Coordination Centre, known as GC3. Teams scanned public government code repositories with support from specialists at the UK AI Security Institute and the National Cyber Security Centre.
The pilot tested whether frontier AI could help find and mitigate previously unidentified vulnerabilities before they could be exploited. It also examined how model performance in controlled evaluations translates into applied cyber defence across real public sector environments.
Frontier AI tested in government cyber defence
The pilot was linked to the Government Cyber Action Plan, which aims to improve cyber resilience across the UK public sector. GC3 explored how emerging AI systems could be used safely as part of defensive cyber work across government.
The government said frontier models have developed quickly in cyber-related tasks, but synthetic benchmarks can give only a partial view of real-world performance. The pilot therefore focused on operational testing against public code repositories rather than relying only on laboratory-style evaluation.
- Applied testing: teams used frontier AI in practical defensive workflows rather than standalone benchmark tasks.
- Human oversight: findings were checked by specialists before escalation or remediation decisions were made.
- Public code focus: repositories already published in the open could be reviewed quickly with limited additional sharing checks.
How the pilot scanned public code
The hackathons gave participating teams access to frontier models and allowed them to develop their own tools and workflows. Instead of mandating one approach, GC3 observed what worked each week and encouraged teams to build on the most effective methods.
The pilot used public government repositories because UK Government policy encourages new source code to be open by default, with specific and justified exceptions. The source material said openness reduces duplication and supports cleaner code maintenance, but it also gives attackers visibility of the same code.
AI approaches used by participating teams
Teams tested a range of approaches to determine how frontier AI could support vulnerability discovery. Some combined traditional security scanning tools with AI-assisted review, while others built specialised workflows designed to validate findings and reduce false positives.
The pilot found there was no single successful approach. Departments adapted methods to suit their own systems, allowing GC3 to compare different approaches and identify the techniques that produced the most useful results.
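As a rough illustration of combining scanner output with AI-assisted review, the sketch below groups findings by location and weakness type and puts those reported by both sources at the front of the human review queue. It is a minimal sketch under stated assumptions, not a description of any team's tooling: the `Finding` fields, the `triage` function and the example paths are all hypothetical, and the source material does not describe how teams matched or ranked findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str       # file the finding refers to (hypothetical field)
    weakness: str   # short label such as "command-injection" (hypothetical field)
    source: str     # "scanner" or "ai" (hypothetical values)


def triage(findings):
    """Group findings by (path, weakness) and split them by corroboration.

    Returns two sorted lists of (path, weakness) keys: those reported by more
    than one source, and those reported by a single source. Both lists still
    need human validation; corroboration only affects review order.
    """
    sources_by_key = {}
    for finding in findings:
        key = (finding.path, finding.weakness)
        sources_by_key.setdefault(key, set()).add(finding.source)
    corroborated = sorted(k for k, s in sources_by_key.items() if len(s) > 1)
    single_source = sorted(k for k, s in sources_by_key.items() if len(s) == 1)
    return corroborated, single_source


# Illustrative data only; these are not findings from the pilot.
example = [
    Finding("ci/deploy.yml", "command-injection", "scanner"),
    Finding("ci/deploy.yml", "command-injection", "ai"),
    Finding("app/auth.py", "auth-bypass", "ai"),
]
corroborated, single_source = triage(example)
print(corroborated)   # [('ci/deploy.yml', 'command-injection')]
print(single_source)  # [('app/auth.py', 'auth-bypass')]
```

Single-source findings are kept rather than discarded, since the pilot reported that some weaknesses had not previously been identified by any other means.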
Pilot Method Indicators
| Indicator | Detail | Context |
|---|---|---|
| Testing format | Weekly hackathons | GC3 led in-person sessions with specialists from AISI and NCSC. |
| Code estate | Public repositories | The pilot focused on government code already published in the open. |
| Tooling model | Varied team approaches | Departments built their own workflows instead of using one mandated method. |
Findings from the government cyber pilot
Participants identified 407 findings in total. The source material said these included weaknesses exposing services to authentication bypass, data exposure and remote code execution.
Some findings were already understood and mitigated through compensating controls, while others had not previously been identified. All highest-risk weaknesses were remediated, and no evidence of exploitation was identified for any finding.
The pilot cost £13,000 in tokens across the month. That work covered nine government organisations and showed how AI models could trace vulnerabilities across service boundaries and connect business logic with technical detail.
- Findings total: participants reported 407 findings across the pilot period.
- Organisations involved: the work covered nine government organisations during the month.
- Token cost: the reported model token cost was £13,000.
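The reported totals allow some simple averages, worked through below. These are derived figures rather than numbers from the source material: they assume the £13,000 covers model tokens only, as reported, and they say nothing about how cost or findings were actually distributed across organisations.

```python
# Reported totals from the pilot.
TOKEN_COST_GBP = 13_000
FINDINGS = 407
ORGANISATIONS = 9

# Simple averages; the source gives no per-organisation breakdown.
cost_per_finding = TOKEN_COST_GBP / FINDINGS
cost_per_organisation = TOKEN_COST_GBP / ORGANISATIONS
findings_per_organisation = FINDINGS / ORGANISATIONS

print(f"Token cost per finding: £{cost_per_finding:.2f}")            # £31.94
print(f"Token cost per organisation: £{cost_per_organisation:.2f}")  # £1444.44
print(f"Findings per organisation: {findings_per_organisation:.1f}") # 45.2
```

Staff time for validation and remediation is not included in these averages, so they should not be read as the full cost of a finding.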
Pilot Findings Summary
| Indicator | Result | Context |
|---|---|---|
| Total findings | 407 | GC3 reported the total across participating government organisations. |
| Highest-risk remediation | Completed | The source material said all highest-risk weaknesses were remediated. |
| Evidence of exploitation | None identified | No evidence of exploitation was found for any reported finding. |
Critical vulnerability example
One notable finding affected legacy GitHub Actions workflows in a repository supporting a government digital service. The issue allowed an external user to trigger a workflow chain by posting a specially structured comment on an open pull request.
The source material said that because the workflow was triggered by a comment rather than by the pull request itself, it bypassed the usual protections for pull requests from unknown contributors and created a route to remote code execution on the GitHub Actions runner.
The workflow took content from the comment, passed it into deployment parameters and used it in an environment substitution step. The source material said this could have allowed malicious actors to extract secrets and tokens available to the workflow.
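The general pattern described above can be checked for with a simple heuristic. The sketch below flags workflow files that are triggered by comments and that interpolate the comment text directly into a `run:` step. It rests on several assumptions: the source does not name the trigger, so GitHub's `issue_comment` event and the `github.event.comment.body` expression are assumed here; the two example workflows and `deploy.sh` are hypothetical and are not the workflow from the pilot; and the check uses regular expressions and indentation rather than a YAML parser, so it can miss cases or raise false alarms.

```python
import re

# Heuristic patterns, not a YAML parser.
COMMENT_TRIGGER = re.compile(r"\bissue_comment\b")
COMMENT_BODY = re.compile(r"\$\{\{\s*github\.event\.comment\.body\s*\}\}")
RUN_KEY = re.compile(r"^(\s*(?:-\s+)?)run:")


def find_comment_injection(workflow_text):
    """Return 1-based line numbers where comment text is expanded inside a run step.

    Only workflows that mention the issue_comment trigger are inspected.
    """
    if not COMMENT_TRIGGER.search(workflow_text):
        return []
    hits = []
    run_col = None  # column of the current `run:` key while inside its block
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        key = RUN_KEY.match(line)
        if key:
            run_col = len(key.group(1))
        elif run_col is not None and line.strip():
            indent = len(line) - len(line.lstrip())
            if indent <= run_col:
                run_col = None  # a sibling or parent key ends the run block
        if run_col is not None and COMMENT_BODY.search(line):
            hits.append(number)
    return hits


# Hypothetical workflow: comment text is pasted into the shell command.
RISKY = """\
on:
  issue_comment:
    types: [created]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh --target "${{ github.event.comment.body }}"
"""

# Hypothetical variant: comment text reaches the shell as an environment variable.
SAFER = """\
on:
  issue_comment:
    types: [created]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - env:
          COMMENT_BODY: ${{ github.event.comment.body }}
        run: ./deploy.sh --target "$COMMENT_BODY"
"""

print(find_comment_injection(RISKY))  # [8]
print(find_comment_injection(SAFER))  # []
```

Passing untrusted input through an environment variable is a commonly recommended hardening pattern for shell steps. The source material does not say whether it would have been sufficient for the workflow found in the pilot, which also involved an environment substitution step.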
Lessons from using AI in cyber defence
The pilot found that carefully designed workflows produced better results than using AI models alone. Human validation remained essential because potential vulnerabilities were identified faster than security teams could fully assess them.
The pilot also showed that finding vulnerabilities is not the same as fixing them. Identified weaknesses still had to enter normal remediation pipelines, with prioritisation, review and patching aligned to existing human-centred processes.
Next phase of the GC3 pilot
GC3 will begin a second phase with more departments, additional models and an extension from public code into closed-source estates. The stated aim is to identify vulnerabilities earlier and help departments share proven defensive techniques.
AISI and NCSC involvement will also deepen as the government continues to evaluate AI for cyber defence in applied settings. The next phase is expected to focus on closing the gap between theoretical model scores and measurable reductions in cyber risk.
The GC3 pilot tested frontier AI models against public government code to assess their practical value in cyber defence. The work identified 407 findings, remediated the highest-risk weaknesses and found no evidence of exploitation. The next phase will widen participation, add more models and move into closed-source estates while keeping human validation central to government cyber resilience work.
Sources: Department for Science, Innovation and Technology and National Cyber Security Centre.
Prepared by Ivan Alexander Golden, Founder of THX News, an independent news organisation delivering timely insights from global official sources. Combines AI-analysed research with human-edited accuracy and context.