U of T researcher aims to improve laborious process of summarizing source code
Software may play an integral role in the modern world, but its development, maintenance and management remain expensive and laborious – a challenge the University of Toronto’s Eldan Cohen aims to address.
Cohen, an assistant professor in the department of mechanical and industrial engineering in the Faculty of Applied Science & Engineering, is leading a team of researchers to develop novel, human-centred machine learning algorithms to automatically summarize a snippet of code into clear and concise language, a process known as source code summarization.
Such summaries are meant to capture the purpose of code, helping developers understand, maintain and work with the codebase. They are particularly important in large software development projects, and generating them automatically draws on both natural language processing and machine learning.
While there has been significant research into using AI to automatically generate natural language summaries of code, Cohen says there is still much room for improvement.
“Even state-of-the-art deep learning models are prone to mistakes in prediction, yielding summaries that do not match the provided source code,” says Cohen. “In such cases, software developers must reject the proposed summary and resort to manually documenting the code.”
To address this challenge, Cohen proposes developing a human-in-the-loop technique for automated code summarization that draws on the developer’s knowledge, preferences and insight to overcome and learn from model mistakes. The approach lets developers take an active part in generating code summaries, integrating their insights into the automated workflow.
He is also developing specialized machine learning algorithms to overcome the limitations of existing approaches, including limited diversity and lower-quality summaries.
“We plan on doing this by creating interactive approaches where developers are presented with a small number of diverse and high-quality code summaries to choose from, reducing the risk of generating a single, incorrect summary,” he says.
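To make the idea concrete, here is a minimal sketch of how a short list of diverse, high-scoring candidate summaries might be picked for a developer to choose from. It is an illustration only and not Cohen's method: the function names, the word-overlap similarity measure, the greedy selection rule and the example scores are all assumptions made for this sketch.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-level overlap between two summaries, from 0.0 to 1.0 (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pick_diverse(
    candidates: List[Tuple[str, float]], k: int = 3, trade_off: float = 0.7
) -> List[str]:
    """Greedily pick up to k summaries, trading model score against redundancy.

    candidates: (summary text, model confidence score) pairs; hypothetical input.
    trade_off:  weight on the score; the remainder penalizes similarity to
                summaries that have already been picked.
    """
    remaining = list(candidates)
    chosen: List[str] = []
    while remaining and len(chosen) < k:

        def value(item: Tuple[str, float]) -> float:
            text, score = item
            # Redundancy is the highest overlap with anything already chosen.
            redundancy = max((jaccard(text, c) for c in chosen), default=0.0)
            return trade_off * score - (1 - trade_off) * redundancy

        best = max(remaining, key=value)
        chosen.append(best[0])
        remaining.remove(best)
    return chosen


# Hypothetical model output for a function that returns the largest list element.
candidates = [
    ("Returns the largest value in a list", 0.90),
    ("Returns the largest value in the list", 0.88),
    ("Finds the maximum element of a sequence", 0.80),
    ("Sorts the list in place", 0.30),
]
shortlist = pick_diverse(candidates, k=2)
```

In this sketch the second candidate scores well but nearly duplicates the first, so the shortlist shown to the developer pairs the first candidate with the differently worded third one.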
The long-term goal of Cohen’s work is to significantly improve the effectiveness of automatic source code summarization. By developing these human-in-the-loop approaches, Cohen and his colleagues hope to incorporate developer input into state-of-the-art deep learning models to improve the quality of generated code summaries.
The approach is expected to have significant scholarly impact with the potential to catalyze both research and commercial activity on human-in-the-loop automation in software engineering.
Cohen is one of 49 researchers from across U of T – and one of four from U of T Engineering – supported in the latest round of the Connaught New Researcher Awards, which help early-career faculty members establish their research programs.
“Students are involved in all stages of this project and are actively involved in developing and evaluating the novel human-in-the-loop techniques for automatic source code summarization,” says Cohen. “The funds from this award will primarily go to supporting their research.”
The other three projects from U of T Engineering supported by the Connaught New Researcher Awards are:
- Margaret Chapman, Edward S. Rogers Sr. department of electrical and computer engineering: Risk-aware, adaptive and scalable algorithms for smart sewer technology in Toronto
- Christopher Lawson, department of chemical engineering and applied chemistry: Engineering untapped anaerobic bacteria for sustainable fuel and chemical production
- Jay Werber, department of chemical engineering and applied chemistry: Ultra-thin bipolar membranes for carbon dioxide removal applications