Machine learning has a variety of applications in scientific research, from rapidly analyzing datasets to making predictions. At the Joint BioEnergy Institute (JBEI), researchers are using machine learning to find new proteins that play a role in plant gene expression, giving the scientific community new avenues to explore in bioenergy crop engineering.
In a new study published in Cell Systems, JBEI researchers used a machine learning algorithm to screen the proteins of a plant species and a yeast species for proteins involved in gene expression beyond those already identified. In doing so, they confirmed hundreds of proteins that could be new targets for plant engineering research.
“This will help identify new levers to pull to help improve the vision of deploying engineered bioenergy crops to sustain the future bioeconomy,” said corresponding author Patrick Shih, the director of Plant Biosystems Design at JBEI.
Researchers have a limited understanding of the core mechanisms involved in plant gene expression, largely because plants take a long time to grow and are therefore slow to study. Most of the plant traits that researchers want to engineer are controlled by transcription, the process in which genetic information in DNA is copied into RNA. The more researchers understand about transcription, the better they can engineer plants to have desirable traits.
“Most research looks at transcription factors, the proteins that regulate transcription,” Shih said. “But we know that transcription is really complex, and there are many other proteins that are not transcription factors that are contributing to this phenomenon. We’ve done a very poor job of even identifying what proteins could be key regulators in transcription.”
Lead author Niklas Hummel, a graduate student at JBEI, decided to look at proteins that aren't transcription factors. Examining these non-transcription-factor proteins would give researchers a strong list of candidates for further study of their involvement in transcription. Using a previously published machine learning algorithm called PADDLE, he screened all of the proteins in the yeast Saccharomyces cerevisiae and the plant Arabidopsis thaliana for non-transcription-factor proteins with transcriptional activity.
“It turned out that 89% of the proteins we studied contained fragments that can activate transcription if they’re in the correct context,” Hummel said. “That was really encouraging.”
Using the machine learning algorithm sped up the research process significantly.
“We used this algorithm to extract areas of interest from plant proteins, which would have taken me forever to do manually,” Hummel said. “I would’ve been doing plant experiments for like a hundred years to get the same amount of knowledge we got from using the machine learning algorithm in a single experiment.”
Machine learning is becoming increasingly common in plant engineering, Shih said.
“We have various projects in our lab right now using machine learning,” he said. “It’s permeating all aspects of our research, so it’s safe to assume many in the field are also moving in this direction.”
That trend is largely driven by the growing availability of high-quality datasets for training algorithms. The researchers hope the data generated in this study will be incorporated into future machine learning models.
Within this data, the researchers were surprised to find "universal" synthetic biology parts that can activate transcription in both yeast and plants. Some of these parts might be usable across a variety of organisms to accelerate engineering efforts.
The researchers plan to expand their approach to other plant species and to investigate some of the proteins identified in this study in more detail.
“With this approach, we can start to make really dramatic discoveries in plant biology,” Shih said. “Here are all of these new proteins that have never been characterized, and we have strong evidence that they could potentially be key players in gene expression.”
The Joint BioEnergy Institute is a DOE Bioenergy Research Center.