Scientists from the laboratory of Dr. Tomáš Pluskal at IOCB Prague are helping colleagues around the world identify previously unknown compounds. They have created an extensive library called MSnLib, which contains several million records showing how small molecules "break apart" when measured by mass spectrometry.
Until now, comparable databases have expanded only very slowly, but thanks to a new approach developed at IOCB Prague, spectral data for a compound can now be acquired in a matter of minutes. This could speed up drug discovery, improve monitoring of chemical substances in the environment, and drive further advances in artificial intelligence for biomedicine.
An article about the library has been published in the journal Nature Methods.
Credit: Institute of Organic Chemistry and Biochemistry of the CAS
Mass spectrometry reveals the composition of chemical substances and is a key tool in medicine, pharmacy, and environmental research. The instrument breaks a compound into smaller parts, and from these fragments scientists deduce the structure of the original molecule.
Fragment spectra, which can be thought of as a fingerprint unique to each substance, are compared with known spectra stored in libraries. However, existing databases have covered only a limited number of compounds, which makes identification considerably more difficult.
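Library matching of this kind is commonly scored with a cosine similarity between the measured (query) spectrum and each reference spectrum. The Python sketch below is a simplified illustration, not the scoring actually used for MSnLib or in mzmine: it assumes each spectrum is a list of (m/z, intensity) pairs, groups peaks into fixed-width m/z bins (the 1.0 bin width is an arbitrary assumption), and ranks a small, entirely hypothetical two-entry library by score.

```python
from math import sqrt

def binned_vector(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (simplifying assumption)."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec

def cosine_score(query, reference, bin_width=1.0):
    """Cosine similarity between two binned spectra: 1.0 = identical shape, 0.0 = no shared bins."""
    a = binned_vector(query, bin_width)
    b = binned_vector(reference, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_library(query, library, bin_width=1.0):
    """Score the query against every library entry and return hits, best first."""
    scores = [(name, cosine_score(query, peaks, bin_width))
              for name, peaks in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical reference spectra: (m/z, relative intensity) pairs, invented for illustration
library = {
    "compound_A": [(91.05, 100.0), (119.05, 40.0), (65.04, 15.0)],
    "compound_B": [(77.04, 100.0), (105.03, 60.0)],
}

# Hypothetical measured spectrum that resembles compound_A
query = [(91.05, 95.0), (119.05, 45.0), (65.04, 10.0)]

for name, score in search_library(query, library):
    print(f"{name}: {score:.3f}")
```

Identical spectra score 1.0 and spectra with no shared bins score 0.0. Real tools typically match peaks within a mass tolerance rather than fixed bins, a refinement left out here to keep the sketch short.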
Pluskal and his team have significantly advanced the development of spectral libraries. By the time they prepared their study for Nature Methods, they had compiled a catalog of 30,000 small molecules and recorded two million high-quality spectra for them.
Rather than settling for a rough picture, they used multistage fragmentation (MSn), i.e. repeated breaking of molecules, to obtain a more detailed view of the molecules' internal structure. This is the first time such a comprehensive data set has been made available to the scientific community.
Pluskal explains, "During the twenty years I've worked in this field, spectral libraries have not expanded much. We managed to change that and created the largest database currently in existence. Moreover, we've made it openly available to the global scientific community."
The researchers also substantially accelerated the analysis itself: they can measure ten compounds at once, and the entire process takes only a minute and a half. Because Pluskal's team is exceptionally well known and active in the global scientific community, companies and institutions have donated thousands of compounds for measurement.
"Since writing the article in Nature Methods, we've advanced further. So far, we've processed about 70,000 compounds, and we have another 150,000 awaiting analysis. We continue uploading data online, and by the end of the year we'd like to reach 200,000 measured compounds. That's roughly 10 times more than has been available over the past 20 years," says the article's first author, Dr. Corinna Brungs.
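Combining the quoted throughput (ten compounds per minute and a half) with the 150,000 compounds awaiting analysis gives a rough sense of scale. The Python snippet is a back-of-the-envelope illustration only: it assumes, hypothetically, that the instrument runs back to back with no time lost to sample preparation, calibration, or maintenance, so the result is an idealized minimum of instrument time, not a schedule reported by the team.

```python
# Figures quoted in the article
COMPOUNDS_PER_BATCH = 10      # compounds measured at once
MINUTES_PER_BATCH = 1.5       # duration of one measurement run
AWAITING_ANALYSIS = 150_000   # compounds still waiting to be measured

# Idealized rate, ASSUMING continuous back-to-back runs with no downtime
compounds_per_hour = COMPOUNDS_PER_BATCH * 60 / MINUTES_PER_BATCH

# Pure instrument time needed for the backlog under that assumption
hours_needed = AWAITING_ANALYSIS / compounds_per_hour
days_needed = hours_needed / 24

print(f"{compounds_per_hour:.0f} compounds per hour")
print(f"{hours_needed:.0f} hours (~{days_needed:.1f} days) of continuous measurement")
```

Under these assumptions the rate is 400 compounds per hour and the backlog needs 375 hours of instrument time, a little over two weeks; any real measurement campaign would take longer once overhead is included.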
Pluskal and his colleagues are also using this wealth of new data to improve AI algorithms that autonomously recognize unknown chemical substances, from metabolites in the human body to compounds in plants and microorganisms.
Scientists "feed" data from the spectral library to a machine learning model. The more data the model receives, the more accurately it can predict, from a given spectrum, what the molecule behind that spectrum might look like.
The spectral library was created using the open-source software "mzmine," which enabled automated processing of a vast number of measurements. As a result, the resource is not only extensive but also easily usable for further scientific projects worldwide.