Can we train computers to predict bacterial functions in plants?


You may know that trillions of microbes call your body home. But did you know trillions of microbes call plants and the food you eat home as well? Plants have diverse and complex interactions with microorganisms, from bacteria and viruses to larger organisms like fungi, protozoa, and nematodes. As with the human microbiome, many factors contribute to a microbe being harmful or beneficial to plants. The environment, the host, and fellow microbes all shape these host-microbe-microbe interactions.

Take Xylella fastidiosa as an example. This is the causal agent of Pierce’s Disease of grapevine. Xylella is a plant pathogen of grapes but it also calls 350 other plant species home. Many of these plants are not affected by the presence of Xylella.

Researchers can look at Xylella’s genome to better understand what functions it has and pull out genes that could be detrimental to the plant. We call these genes virulence factors, and they can contribute to disease progression. To find them, researchers compare the pathogen’s genes to a database of genes with known pathogenic functions. Examples include enzymes that break down cell walls or genes that let the bacteria protect themselves by forming a sticky shield known as a biofilm.
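
To make the database comparison a little more concrete, here is a minimal Python sketch. It is only an illustration: the sequences, the gene names, and the `best_match` helper are all made up, and real pipelines rely on dedicated alignment tools such as BLAST rather than this simple shared-fragment score. The idea is similar, though: break each gene into short overlapping fragments and ask which known gene shares the most fragments with the query.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping fragments of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Fraction of fragments shared by two sets (0 = none, 1 = identical sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query, database, k=4):
    """Return (name, score) for the database gene most similar to the query."""
    query_kmers = kmers(query, k)
    scores = {
        name: jaccard(query_kmers, kmers(seq, k))
        for name, seq in database.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy database; real entries would be full-length genes.
KNOWN_VIRULENCE_GENES = {
    "cell_wall_enzyme": "ATGGCTAGCTTGACCGATAGGCTA",
    "biofilm_factor": "ATGCCCGGGTTTAAACCCGGGTTT",
}

# A made-up query that differs from "cell_wall_enzyme" at a single position.
query_gene = "ATGGCTAGCTTGTCCGATAGGCTA"
name, score = best_match(query_gene, KNOWN_VIRULENCE_GENES)
print(f"Closest known gene: {name} (similarity {score:.2f})")
```

A high score only says the query looks like a known virulence gene. Whether it actually behaves like one still has to be tested.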

Historically, microbiology has revolved around pathogens and infectious diseases. As we get better at deciphering the microbial world, this focus is shifting. Beneficial microbes can enhance plant growth, protect plants from diseases, and aid in the creation of more resilient crop systems. Finding pathogens is comparatively easy: they cause disease and create undesirable symptoms. Finding beneficial microbes is more challenging because it’s hard to connect health and resiliency to a single factor.

How do researchers scan thousands of microbes to find potential beneficial qualities? A process called machine learning can help. In a recent article, “Genomics- and machine learning-accelerated discovery of biocontrol bacteria,” authors Matthew Biggs, Kelly Craig, Esther Gachango, Mathias Twizeyimana, and David Ingham used the power of computers to help them discover bacteria that can help fight fungal diseases. They looked for bacteria with potential antifungal properties.

This is not an AI takeover, and computers are not getting smarter. Computers cannot think like you and me, but they are phenomenal at pattern recognition. Let’s look at an example. If I gave you the sequence ATTGGCTA, you could memorize it. You’d be able to tell me certain properties about it, such as its length and composition. An eight-nucleotide sequence is human readable. What about a full genome? The average bacterial genome is about 5 million base pairs (bp) long, which can encode roughly 5,000 proteins! If you looked at this, what do you think you could pull out? Even after months of straining your eyes staring at this sequence, you’d still barely understand what you are looking at. This is where machine learning comes in. Machine learning is like a training program for computers.
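
For the curious, here is a small Python sketch of the “length and composition” you could read off the eight-letter example, followed by the back-of-the-envelope arithmetic behind the genome numbers. The `describe` helper is hypothetical, and the figure of roughly 1,000 bp per gene is a common rule of thumb that I am assuming here, not a number taken from the article.

```python
from collections import Counter


def describe(seq):
    """Return the length, per-base counts, and GC fraction of a DNA sequence."""
    counts = Counter(seq)
    gc_fraction = (counts["G"] + counts["C"]) / len(seq)
    return len(seq), dict(counts), gc_fraction


length, counts, gc_fraction = describe("ATTGGCTA")
print(length)       # 8
print(counts)       # {'A': 2, 'T': 3, 'G': 2, 'C': 1}
print(gc_fraction)  # 0.375

# Rough arithmetic behind "5 million bp can encode about 5,000 proteins".
GENOME_LENGTH_BP = 5_000_000
ASSUMED_BP_PER_GENE = 1_000  # assumption: rule-of-thumb average gene length
estimated_genes = GENOME_LENGTH_BP // ASSUMED_BP_PER_GENE
print(estimated_genes)  # 5000
```

Counting letters is trivial for eight bases and still trivial for a computer at five million. What a person loses at that scale is the ability to see patterns.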

Let’s say you went to ballet lessons as a kid but now you want to learn swing dancing. From your prior experience in ballet, you know that dance includes rhythm, music, certain moves, positions, and a specific outfit, just to name a few. You can assume that swing dancing includes these same elements, although the details will be specific to swing dancing. The prior experience you have with music, rhythm, etc. will help you learn swing dancing faster than someone who has never danced before in their life.

This is similar to machine learning for computers. Researchers can use example data (past experience) to train a computer to optimize its performance. Here, performance is how well it can predict the function of certain genes. In the case of Biggs and colleagues, they were interested in finding antifungal genes. The goal is to find novel microbes that can be developed into microbial products. Microbial products can be alternatives to conventional fertilizers and pesticides.

In total, the team looked at 1,227 bacterial genomes. They told the computer to look through the genomes for any patterns that are similar to known patterns of antifungal activity. The machine learning process flagged 72 isolates as potentially being antagonistic against fungi. There are several fungicidal compounds already known: fengycin, pyrrolnitrin, zwittermicin, bacilysin, and the siderophore pyoverdine are just a few. Researchers can train computers to look for similar patterns and predict which bacteria may have fungicidal genes. Notice that researchers are not asking the computer to find the exact gene. They are asking the computer to take prior knowledge and apply it to a new scenario. In doing so, researchers will find genes that have a known fungicidal function, and they may also find novel fungicidal compounds.
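
Here is a toy Python sketch of what “training” and “flagging” can look like. To be clear, this is not the method from the article: the post does not describe the team’s features or models, so I am assuming a very simple setup in which each genome is reduced to the presence (1) or absence (0) of three hypothetical gene families, and a small naive Bayes classifier learns from genomes whose antifungal activity is already known. All data and names below are invented.

```python
import math


def train_naive_bayes(X, y):
    """Learn, for each class, its prior and smoothed per-feature probabilities."""
    model = {}
    n_features = len(X[0])
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        # Laplace smoothing (+1 / +2) keeps probabilities away from exactly 0 or 1.
        probs = [
            (sum(row[j] for row in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model


def predict_proba(model, x):
    """Return the estimated probability that genome x belongs to class 1."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for present, p in zip(x, probs):
            score += math.log(p if present else 1 - p)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    return weights[1] / sum(weights.values())


# Invented training data: columns are hypothetical gene families A, B, C.
train_X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = antifungal in a lab test, 0 = not

model = train_naive_bayes(train_X, train_y)

# Invented new isolates whose activity is unknown.
new_isolates = {
    "isolate_1": [1, 1, 0],
    "isolate_2": [0, 0, 1],
    "isolate_3": [1, 0, 1],
}
flagged = [name for name, x in new_isolates.items() if predict_proba(model, x) > 0.5]
print(flagged)
```

Notice that `isolate_3` does not exactly match any training genome, yet it can still be scored. That is the “apply prior knowledge to a new scenario” part.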

In practice, machine learning is not this simple. There are a number of different machine learning algorithms. Each one focuses on different criteria and will produce slightly different results. The team compared four different models, each performing slightly differently. Different models may perform better for specific applications and conditions. It’s a bit like how learning jazz or tap as a kid, instead of ballet, would also help you pick up swing dancing, just in a slightly different way. Computers cannot determine the function of bacteria on their own, but they can help us predict the potential function of a gene within a bacterial genome. The computer can apply previous knowledge to 1,227 genomes and report a subset of promising isolates, in this case 72, to test further.
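
One common way to compare models is to hide one genome at a time, train on the rest, and check whether the hidden one is predicted correctly. The Python sketch below does this for two deliberately simple stand-in models. These are not the four models from the article, and the data are invented: each row is a hypothetical genome described by the presence or absence of four made-up gene families.

```python
def hamming(a, b):
    """Count the positions at which two presence/absence profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(train_X, train_y, x):
    """Predict the label of the most similar training genome."""
    best = min(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    return train_y[best]


def majority_class(train_X, train_y, x):
    """Baseline: ignore the genome and predict the most common training label."""
    return max(set(train_y), key=train_y.count)


def leave_one_out_accuracy(predict, X, y):
    """Hold out each genome in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += predict(rest_X, rest_y, X[i]) == y[i]
    return correct / len(X)


# Invented data: 3 antifungal genomes (1) and 5 non-antifungal genomes (0).
X = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 0], [0, 1, 0, 1],
]
y = [1, 1, 1, 0, 0, 0, 0, 0]

results = {
    "nearest_neighbor": leave_one_out_accuracy(nearest_neighbor, X, y),
    "majority_class": leave_one_out_accuracy(majority_class, X, y),
}
for model_name, accuracy in results.items():
    print(f"{model_name}: {accuracy:.3f}")
```

On this tiny, tidy dataset the nearest-neighbor model wins easily. Real genomes are far messier, which is part of why it is worth trying several models.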

Antagonism is a very complex interaction. In the simplest form, a bacterium produces an antifungal, killing a susceptible fungus. This is what the computer is predicting. But antagonism is more dynamic than this. Having the gene in the bacterial genome does not mean the gene is functional. The computer also can’t predict whether there are spatial dependencies for producing the antifungal. Does the fungus have to be touching the bacterium for it to release the antifungal? What conditions trigger this behavior?

Antifungal behavior is also not the only way a bacterium can be antagonistic towards a fungus. The bacteria could simply take up too much space and/or too many resources in an environment. Without space and resources, the other microbes would starve or have to find a new home. Another way microbes can be antagonistic to each other is through the host. Microbes can manipulate the host. They can tell the host to enhance its defense system, arming it for a pathogenic attack. This is a process known as priming.

These more complex interactions cannot be captured by machine learning. It’s also important to say that computers are not infallible: the prediction can be wrong. Machine learning and other bioinformatics techniques are helping researchers to study the complex and dynamic interactions between plants and microbes at a rate never seen before! However, bioinformatics and machine learning do not replace biology; the two complement each other.

After any bioinformatic analysis, it’s vital to bring back the biology. The predictions made by the computer should be tested in real-world settings. What’s so exciting about this technique is that it can be adapted to many other situations! Here they looked specifically at bacteria with the potential to destroy a specific fungus, but this machine learning method could also be applied to finding microbes that cause disease in humans or microbes that can help us fight climate change. It has the potential to drastically reduce research time and help researchers focus their efforts!

So, in conclusion, researchers can use computers to help predict bacterial functions, both virulence factors and beneficial characteristics, across various environments. Computers can help narrow a pool of thousands of microbes down to fewer than a hundred potentially important microbes. This can help researchers save time and focus their research when they test these microbes in plants.

Link to the original post: Matthew Biggs, Kelly Craig, Esther Gachango, Mathias Twizeyimana, and David Ingham. Genomics- and machine learning-accelerated discovery of biocontrol bacteria. Phytobiomes, May 2021