- AI-Assisted Coding: The promises
- Uncovering the AI productivity claim
- Human motivation: are we outsourcing the fun and adding to the mundane
- Are we focusing AI on the right problem
- No need to settle for 55% faster, 10x ourselves
- The big win: optimise for software maintenance
- Research: AI refactoring
- Can AI help us refactor existing code
- Fact checking the AI refactorings: Can we separate the good from the bad refactorings
- Outcome: elevated to the level of human experts with a fact-checking model
- References
AI-Assisted Coding: The promises
- Travelling back in time to the 1970s
- Setting the scene: in 2024, began hobby programming on the Atari 2600
- No operating system, 128 bytes of RAM
- Compensated for lack of RAM with 4K of ROM for game logic, graphics and sound
- GitHub Copilot advertised being "55% faster" on this kind of code
- Raised existential questions: do I want to be 55% faster at a hobby? At what?
Uncovering the AI productivity claim
- "The impact of AI on developer productivity: Evidence from GitHub Copilot"
- The researchers found it was faster, but with no guarantee that this transfers to real-world tasks
- The study was done in an experimental setting, almost like a classroom
- They also point out that the study suggested less experienced developers would benefit most
- "this study does not examine the effects of AI on code quality"
- What are the implications? Will we make the world a better place?
Human motivation: are we outsourcing the fun and adding to the mundane
- We are basically turning ourselves into "maintenance programmers"
Are we focusing AI on the right problem
- Writing new code is a small part (around 5%) of what we do
- A 55% speed-up on that 5% saves roughly one hour a week (a rough calculation is sketched after this list)
- That is neither disruptive nor groundbreaking
- The big potential win for AI is in understanding code
- Understanding the existing system is what tells us what to change
- What if we refocus AI so that existing code becomes easier to understand?
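- A rough back-of-the-envelope sketch of the arithmetic above, assuming a 40-hour work week (my assumption; the 5% and 55% figures are the talk's):

```python
# Back-of-the-envelope check of the "55% faster on 5% of our time" argument.
# Assumption (not from the talk): a 40-hour work week.
hours_per_week = 40
share_writing_new_code = 0.05  # ~5% of our time goes to writing new code
hours_writing_new_code = hours_per_week * share_writing_new_code  # 2.0 hours

advertised_speedup = 0.55  # "55% faster", read loosely as 55% of that time saved
hours_saved = hours_writing_new_code * advertised_speedup
print(f"Time saved per week: ~{hours_saved:.1f} hours")  # ~1.1 hours, i.e. roughly one hour
```

- Read strictly, "55% faster" saves a bit less (the same work takes 1/1.55 of the time), but either way the saving is on the order of one hour per week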
No need to settle for 55% faster, 10x ourselves
- Supporting data: 'Code Red: The Business Impact of Code Quality' (arXiv), Tornhill and Borg (2022)
- Categorises code as:
- 'green' (easy to work with)
- 'yellow' (problematic: more complicated than the problem calls for)
- 'red' (the worst category)
- Red code makes you 10x slower, even when the problem is of a similar scale
- If we can use AI to turn red code green, we can indeed make ourselves 10x faster
- We want to ensure AI generates green code, else it just generates maintenance burden
- A follow-up to the 'Code Red' study looked at onboarding costs
- New programmers need extra time to adapt to a codebase, especially when code quality is low
- Borg, Tornhill & Mones (2023): 'U Owns the Code...'
The big win: optimise for software maintenance
- Refactoring is defined as improving the design of existing code without changing its behaviour
- It's not a refactoring unless we improve the design; we need a gold standard to judge whether we did
- It's not a refactoring if we fail to preserve the behaviour of the original code, e.g. by introducing a bug (a minimal behaviour check is sketched after this list)
- "Refuctoring": when we fail to keep these requirements
Research: AI refactoring
- 'Refactoring vs. Refuctoring'
- Measured code quality improvement using the 'Code Health' metric
- The only code-level metric shown to correlate with business outcomes
- An aggregated metric, since no single metric captures a multifaceted problem
- A file-level metric covering three categories of code smells:
- Module/class-level smells, e.g. low cohesion, God classes (illustrated after this list)
- Low cohesion: a class with too many unrelated business rules, which makes the code hard to understand
- A low-cohesion class that keeps growing becomes a God class
- Function-level smells, e.g. copy-pasted logic, God functions, primitive obsession
- Implementation smells, e.g. deeply nested logic, complex conditionals
- Roughly 20% of all programmer mistakes are due to things like deeply nested logic
- It just doesn't play well with how the human brain works
- Code Health:
- green = healthy code with low defect risk
- yellow = maintenance risk
- red = the worst category
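- To illustrate the module-level smell mentioned above: a hypothetical low-cohesion class and a more cohesive split; the class and method names are my own examples, not CodeScene's Code Health rules

```python
# Hypothetical illustration of low cohesion: one class mixing unrelated business rules.
# Classes like this tend to keep growing until they become God classes.
class OrderManager:
    def calculate_price(self, items): ...
    def apply_member_discount(self, customer): ...
    def send_confirmation_email(self, customer): ...  # notification concern
    def export_to_accounting_csv(self, order): ...    # reporting concern
    def retry_failed_payment(self, payment): ...      # payment concern

# A more cohesive split: each class owns one concern, so someone changing the
# pricing rules no longer has to understand emails, reports and payments too.
class PriceCalculator:
    def calculate_price(self, items): ...
    def apply_member_discount(self, customer): ...

class OrderNotifier:
    def send_confirmation_email(self, customer): ...

class AccountingExporter:
    def export_to_accounting_csv(self, order): ...

class PaymentRetrier:
    def retry_failed_payment(self, payment): ...
```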
Can AI help us refactor existing code
- Benchmarking on 100k refactorings of real-world code
- 99%+ produced valid code, around 68% improved code health, but only 18-37% counted as valid refactorings
- Imagine a coworker who broke the code in 70-80% of cases; we would never accept that
- Yet when it comes to a machine, we accept it
Fact checking the AI refactorings: Can we separate the good from the bad refactorings
- How do we know which is refactoring vs. refuctoring?
- CodeScene ACE: automatically refactors the code
- Each refactoring first goes to a 'model selector'
- It analyses the code and selects the best AI service for the job; the AI services have different strengths
- Bad refactorings are thrown away; make another attempt or ask another AI
- Demo: refactoring nested conditionals and naming things (expressions built from logical operators); a sketch of this kind of transformation follows
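- A sketch of the kind of transformation the demo shows: guard clauses remove the deep nesting, and the compound condition gets an explanatory name; the Order type and the shipping rule are hypothetical, not the demo's actual code

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Order:  # hypothetical example type, just enough to exercise the logic
    items: list
    paid: bool
    cancelled: bool
    address: Optional[str]

# Before: deeply nested conditionals with an anonymous compound condition.
def can_ship_before(order: Optional[Order]) -> bool:
    if order is not None:
        if order.items:
            if order.paid and not order.cancelled and order.address is not None:
                return True
    return False

# After: guard clauses flatten the nesting and the condition gets a name.
def can_ship_after(order: Optional[Order]) -> bool:
    if order is None or not order.items:
        return False
    is_ready_for_dispatch = (
        order.paid and not order.cancelled and order.address is not None
    )
    return is_ready_for_dispatch

# Spot-check that the rewrite preserved behaviour on a few samples.
samples = [
    None,
    Order(items=[], paid=True, cancelled=False, address="Main St"),
    Order(items=["book"], paid=True, cancelled=False, address="Main St"),
    Order(items=["book"], paid=True, cancelled=True, address="Main St"),
    Order(items=["book"], paid=False, cancelled=False, address=None),
]
assert all(can_ship_before(o) == can_ship_after(o) for o in samples)
```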
Outcome: elevated to the level of human experts with a fact-checking model
- Correctness improved from the ~30% range to 98%
- Focus on comprehending code over mere writing
- Understanding existing code is a very human-intensive aspect
References
- Tornhill & Borg: 'Code Red: The Business Impact of Code Quality' (2022)
- 'Refactoring vs. Refuctoring'
- Tornhill: 'Your Code As A Crime Scene' (2023)