
Not a day goes by without a renowned CEO making a bold statement about how AI is poised to replace manual coding or tossing around a staggering percentage of programmers who will be displaced by AI in just a few months.
Airbnb’s recent revelations suggest how teams are likely to increasingly utilise AI to manage and migrate codebases. The company has completed its first large-scale, LLM-driven code migration, updating around 3,500 React component test files from Enzyme to React Testing Library (RTL).
“We’d originally estimated this would take 1.5 years of engineering time to do by hand, but — using a combination of frontier models and robust automation — we finished the entire migration in just 6 weeks,” said Charles Covey-Brandt, a software engineer at Airbnb, in a blog post.
The company wanted to move away from Enzyme as its deep access to component internals no longer aligned with modern React testing practices.
75% of Target Files Migrated in just 4 Hrs
In mid-2023, the company validated the concept by using LLMs to convert hundreds of Enzyme files to RTL in just a few days. It then built a scalable pipeline for an “LLM-driven migration” last year.
The pipeline involved breaking down the migration into automated validation and refactoring steps. “Each file moves through stages of validation, and when a check fails, we bring in the LLM to fix it,” stated Covey-Brandt. The company said this approach made it easy to migrate hundreds of files at once.
To improve migration success, Airbnb initially experimented with prompt engineering but found that a brute-force retry loop was most effective. It implemented a system where each migration step retried validation multiple times, dynamically updating the prompt with the latest errors and the latest version of the file.
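The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not Airbnb's actual code: the names `run_step_with_retries`, `build_prompt`, and `StepResult` are hypothetical, the validator stands in for a check such as a test or lint run, and `fix` stands in for a call to an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical validator: returns None on success, or an error message on failure.
Validator = Callable[[str], Optional[str]]
# Hypothetical fixer: stands in for an LLM call; takes a prompt, returns new file contents.
Fixer = Callable[[str], str]


@dataclass
class StepResult:
    passed: bool
    attempts: int
    source: str
    last_error: Optional[str] = None


def build_prompt(source: str, error: str) -> str:
    # The prompt is rebuilt on every attempt from the latest error and file version.
    return (
        "Fix the following validation error.\n\n"
        f"Error:\n{error}\n\n"
        f"Current file:\n{source}\n"
    )


def run_step_with_retries(
    source: str,
    validate: Validator,
    fix: Fixer,
    max_attempts: int = 10,  # assumed default; the article gives no figure
) -> StepResult:
    error = validate(source)
    attempts = 0
    # Keep asking for a fix until validation passes or the attempt budget runs out.
    while error is not None and attempts < max_attempts:
        attempts += 1
        source = fix(build_prompt(source, error))
        error = validate(source)
    return StepResult(
        passed=error is None,
        attempts=attempts,
        source=source,
        last_error=error,
    )
```

A file that still fails after `max_attempts` keeps its last error, which is what makes a later, more targeted re-run possible.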
Besides increasing the retry attempts, Airbnb also increased the prompt context. A context-rich prompt engineering approach helped LLMs understand various team-specific patterns, common testing approaches, and the overall architecture of the codebase.
“Our prompts had expanded to anywhere between 40,000 to 100,000 tokens, pulling in as many as 50 related files, a whole host of manually written few-shot examples, as well as examples of existing, well-written, passing test files from within the same project,” he said.
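One way to picture such a context-rich prompt is as a set of required sections plus optional context that is added until a token budget is reached. The sketch below is an assumption-laden illustration: `assemble_prompt` and `estimate_tokens` are hypothetical names, the four-characters-per-token estimate is a crude stand-in for a real tokenizer, and the section labels are invented. Only the 100,000-token and 50-file figures come from the quote above.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # A real tokenizer would be needed for accurate counts.
    return max(1, len(text) // 4)


def assemble_prompt(
    instructions: str,
    target_source: str,
    few_shot_examples: List[str],
    related_files: List[Tuple[str, str]],  # (path, contents) pairs
    token_budget: int = 100_000,
    max_related_files: int = 50,
) -> str:
    # Required sections always go in.
    parts = [instructions, "## File to migrate\n" + target_source]
    used = sum(estimate_tokens(p) for p in parts)

    # Optional context: hand-written examples first, then related files.
    optional = ["## Example\n" + ex for ex in few_shot_examples]
    optional += [
        f"## Related file: {path}\n{src}"
        for path, src in related_files[:max_related_files]
    ]

    # Add optional sections in order until the budget would be exceeded.
    for section in optional:
        cost = estimate_tokens(section)
        if used + cost > token_budget:
            break
        parts.append(section)
        used += cost

    return "\n\n".join(parts)
```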
Using the above techniques, Airbnb migrated 75% of target files in just four hours. However, it still had 900 files that failed the step-based validation criteria. The company then built tools for targeted re-runs. A migration status comment tracked each file’s progress, while a re-run feature allowed filtering by failure step.
“After running this ‘sample, tune, sweep’ loop for 4 days, we had pushed our completed files from 75% to 97% of the total files, and had just under 100 files remaining,” said Covey-Brandt. The company felt that further retry attempts on the remaining files were “pushing into the ceiling” of what it could fix via automation, so it dealt with the rest of the files manually.
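The idea of tracking status per file and re-running only the files stuck at a given step can be sketched as below. The step names, the `FileStatus` record, and both helper functions are hypothetical; the article does not describe how Airbnb stored this information beyond a status comment per file.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical step names; the article does not list the exact stages.
STEPS = ["refactor", "jest", "lint", "typecheck"]


@dataclass
class FileStatus:
    path: str
    failed_step: Optional[str] = None  # None means every step passed


def failure_summary(statuses: List[FileStatus]) -> Counter:
    # Count how many files are stuck at each step, to decide where to tune next.
    return Counter(s.failed_step for s in statuses if s.failed_step is not None)


def select_for_rerun(statuses: List[FileStatus], step: str) -> List[str]:
    # Pick only the files that failed at the given step.
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    return [s.path for s in statuses if s.failed_step == step]
```

A summary like this shows which step accounts for most failures, so prompts or scripts can be tuned for that step before sweeping the matching files again.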
Airbnb’s success story isn’t a first for code migrations driven by AI. Giants like Google and Amazon have revealed something similar in the past.
Google Saw a 50% Improvement in Speed
Earlier this year, Google published a comprehensive report detailing its experiences in various situations involving LLMs for code migrations. It revealed that LLM-driven migrations accelerate the process by 50%.
The company cited an example of converting unique ID types from 32-bit to 64-bit capacity in the Google Ads code base. The transformation was necessary because 32-bit IDs were at risk of exceeding their maximum value, which could cause system failures due to integer overflow.
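The overflow risk is easy to demonstrate. The sketch below assumes signed integers, which the article does not specify, and uses Python's `ctypes` fixed-width types to mimic how a 32-bit counter behaves once it passes its maximum value. The function names are illustrative only.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2,147,483,647
INT64_MAX = 2**63 - 1


def next_id_32(current: int) -> int:
    # ctypes does no overflow checking, so the value wraps like a 32-bit register.
    return ctypes.c_int32(current + 1).value


def next_id_64(current: int) -> int:
    # The same increment in 64 bits has ample headroom.
    return ctypes.c_int64(current + 1).value


if __name__ == "__main__":
    print(next_id_32(INT32_MAX))  # wraps around to a negative number
    print(next_id_64(INT32_MAX))  # continues counting upward
```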
“The full effort, if done manually, was expected to require hundreds of software engineering years and complex cross-team coordination,” said Google, listing out several potential challenges in the migration process. The company then devised a workflow involving an LLM-based migration toolkit and a human engineer/expert. Google found that 80% of the code modifications in the change lists were fully AI-authored, while the remaining 20% were either human-written or edited.
However, the company also mentioned that engineers had to revert or adjust some AI-generated changes due to inaccuracies or needless modifications. “This observation led to further investment in LLM-driven verification to reduce this burden,” said Google.
Do Engineers Like AI-Driven Migration?
Amazon Web Services (AWS) conducted research on the human-AI partnership in code migrations. The study focused on transitioning legacy Java code to a modern version using Amazon Q Code Transformation. Through a series of interviews with 11 software developers, the study revealed that developers view AI as a collaborative teammate.
The study indicated that developers desire control over the migration process, prefer to guide the AI based on their expertise, and serve as reviewers to verify the changes meticulously.
“Just as code reviews help junior developers improve, constructive user critiques of the AI system enable it to better align with expectations and continuously amend its understanding like a programmer assimilating feedback,” stated AWS in a section of the report.
Moreover, the study also suggested that designers of human-AI partnership systems should reveal limitations instead of obscuring imperfections to help align expectations.
The study did find multiple errors where the Java dependency was modified without verifying that it was the correct version.
“I feel like I am being gaslighted here. The first article says one version, and the second says another version. Which one is it?” said participant 9 (P9), responding to the error.
Furthermore, even when the AI produced correct output, participants said they wanted to double-check everything.
“I don’t know if any AI, I would find extremely trustworthy. I am still going to double-check everything. I don’t expect it to be malicious, it gave me a valid dependency update, but I am still going to double-check everything (sic),” said P9.
The above scenarios are LLM-driven code migrations, but not migrations completed fully by LLMs themselves. There is still a need for human oversight, review, and verification.