Mcule - Ultimate Database Project
Synthesis of brand-new building blocks
One of the aims of the ULTIMATE project is to provide a large virtual compound library with a high degree of novelty and diversity. In currently available compound screening libraries, flat, aromatic systems are overrepresented and there is a clear need for compounds with more 3D character.
As part of the ULTIMATE project, we subcontracted the internationally recognised Medicinal Chemistry Group of Vrije Universiteit Amsterdam, Amsterdam, The Netherlands (VUA) for the synthesis of novel scaffolds with more 3D character. The synthesis of the new scaffolds with small rings was finished at the end of March 2019. The results have been sent to our supplier and pharmaceutical partners for validation. Our aim is to integrate these novel building blocks into the ULTIMATE database and also to use them for generating new virtual compounds that will be highly unique and also synthesisable.
Validation of our artificial chemist algorithm
We are continuously developing the ULTIMATE database and validating our results. The back-end development of the ULTIMATE platform is finished, and the front-end development is nearing completion. Using our synthetic feasibility algorithm, we generated millions of virtual compounds from the building block sets of our supplier partners. The synthetic routes for the compounds were validated in collaboration with our supplier partners, and the results are very promising. The reactions used to create the virtual molecules were classified as high-scoring (successful synthesis is highly probable), medium-scoring and low-scoring. The reasons for the low scores were analysed and adjustments to the synthetic feasibility algorithm were suggested; these are being implemented continuously. The results confirmed that high-scoring reactions can be carried out with a high success rate (over 80%). These reactions will be included in the final ULTIMATE database, while the algorithm will be further improved for the other reactions.
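The update does not describe how success rates were tallied per score class. As a minimal sketch, assuming each validated reaction is recorded as a hypothetical (score class, succeeded) pair, the per-class success rate and the check against the 80% target could look like this; the record format and the sample outcomes are illustrative, not project data.

```python
from collections import defaultdict

# Assumed record format: (score_class, synthesised_successfully),
# where score_class is "high", "medium" or "low".
def success_rates(validation_results):
    """Return the fraction of successful syntheses per score class."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for score_class, succeeded in validation_results:
        attempts[score_class] += 1
        if succeeded:
            successes[score_class] += 1
    return {cls: successes[cls] / attempts[cls] for cls in attempts}

def classes_meeting_target(rates, target=0.8):
    """Score classes whose success rate exceeds the target (over 80% in the text)."""
    return sorted(cls for cls, rate in rates.items() if rate > target)

# Made-up validation outcomes for illustration only.
results = ([("high", True)] * 9 + [("high", False)] * 1
           + [("medium", True)] * 6 + [("medium", False)] * 4
           + [("low", True)] * 2 + [("low", False)] * 8)
rates = success_rates(results)
print(rates)                          # {'high': 0.9, 'medium': 0.6, 'low': 0.2}
print(classes_meeting_target(rates))  # ['high']
```

Only the classes returned by `classes_meeting_target` would then be candidates for inclusion, mirroring the decision to keep high-scoring reactions.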
Synthesis of brand-new heterocycles as new building blocks
According to literature data (Bemis, G. W.; Murcko, M. A. J. Med. Chem. 1996, 39, 2887.), half of the drugs approved up to 1996 can be described by the 32 most frequently occurring scaffolds, and the top 50 scaffolds covered about 50% of approved and experimental drugs up to 2010 (Wang, J.; Hou, T. J. Chem. Inf. Model. 2010, 50, 55.). These numbers demonstrate that the currently available chemical space for drug discovery is limited and based on a few privileged scaffolds. Thus, there is a need for novel scaffolds in drug discovery. We believe that drug discovery could greatly benefit from new heterocycles, leading to novel building blocks and, eventually, novel compounds.
As part of the ULTIMATE project, we subcontracted the internationally recognised Medicinal Chemistry Research Group of the Research Centre for Natural Sciences of the Hungarian Academy of Sciences (RCNS) for the synthesis of new heterocycles. The synthesis of the new heterocycle structures was finished at the end of October 2018. The results have been sent to our suppliers and pharmaceutical partners for validation. Our aim is to integrate these novel building blocks into the ULTIMATE database and also to use them for generating new virtual compounds that will be highly unique and also synthesisable.
First version of the artificial chemist algorithm is operational
The development team successfully implemented the first version of ULTIMATE’s enumeration algorithm, including reaction rules for robust chemical reactions. The first tests were successfully carried out using building blocks from our supplier partners Life Chemicals, HTS Biochemie and Key Organics. New chemical structures (virtual compounds) were successfully created; some examples will be reported shortly.
We aim to extend the existing chemical space with “virtual molecules” generated by our artificial chemist algorithm, a method for predicting compounds that have not yet been synthesized but can be made with simple reactions from existing building blocks and reagents at an affordable price.
This algorithm will subsequently go through agile optimisation cycles for further improvement to reach the targeted 80% synthetic success rate.
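The enumeration algorithm itself is not described here. As a minimal sketch of the combinatorial bookkeeping only, assuming each building block is tagged with its reactive functional groups and each reaction rule names the two groups it joins, compatible reactant pairs can be enumerated as below. The classes, group names and building block IDs are hypothetical, and no actual structure transformation is performed.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class BuildingBlock:
    id: str
    groups: frozenset  # reactive functional groups, e.g. frozenset({"carboxylic_acid"})

@dataclass(frozen=True)
class ReactionRule:
    name: str
    group_a: str  # functional group required on the first reactant
    group_b: str  # functional group required on the second reactant

def enumerate_virtual_products(blocks, rules):
    """Yield (rule name, reactant A id, reactant B id) for every compatible pair."""
    for rule in rules:
        first = [b for b in blocks if rule.group_a in b.groups]
        second = [b for b in blocks if rule.group_b in b.groups]
        for a, b in product(first, second):
            if a.id != b.id:  # a building block is not paired with itself here
                yield (rule.name, a.id, b.id)

blocks = [
    BuildingBlock("BB-1", frozenset({"carboxylic_acid"})),
    BuildingBlock("BB-2", frozenset({"primary_amine"})),
    BuildingBlock("BB-3", frozenset({"primary_amine"})),
]
rules = [ReactionRule("amide_coupling", "carboxylic_acid", "primary_amine")]
products = list(enumerate_virtual_products(blocks, rules))
print(products)
# [('amide_coupling', 'BB-1', 'BB-2'), ('amide_coupling', 'BB-1', 'BB-3')]
```

Because the number of products grows with the product of the compatible reactant lists, a few thousand building blocks per rule are enough to yield millions of virtual compounds, which is why the later filtering and search steps matter.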
Chemoinformatics developments for the ULTIMATE project
To handle the 500 million compounds of the ULTIMATE database, chemoinformatics developments are needed to speed up substructure and similarity searches. In the last 3 months, we developed the concept for the chemoinformatics developments of the ULTIMATE project. We also carried out extensive tests; based on the results, our similarity search implementation achieved very short average runtimes, even on a single-core machine.
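How the ULTIMATE similarity search is implemented is not stated. The sketch below is a generic illustration of fingerprint-based similarity search, assuming binary fingerprints stored as Python integers and the Tanimoto coefficient as the similarity measure. It also shows one well-known speed-up, a bit-count upper bound that lets candidates be skipped without computing the full score. The function names and toy fingerprints are hypothetical.

```python
import heapq

def popcount(x):
    """Number of set bits in an integer-encoded fingerprint."""
    return bin(x).count("1")

def tanimoto(a, b):
    """Tanimoto coefficient |a AND b| / |a OR b| of two bitset fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0

def similarity_search(query, library, k=3, threshold=0.0):
    """Return up to k (score, id) pairs with Tanimoto >= threshold, best first."""
    q_bits = popcount(query)
    hits = []
    for cid, fp in library.items():
        fp_bits = popcount(fp)
        # Upper bound: T(a, b) <= min(|a|, |b|) / max(|a|, |b|),
        # so candidates whose bit counts differ too much are skipped unscored.
        lo, hi = min(q_bits, fp_bits), max(q_bits, fp_bits)
        if hi and lo / hi < threshold:
            continue
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((score, cid))
    return heapq.nlargest(k, hits)

# Toy 6-bit fingerprints; real fingerprints are much longer (often 1024 or 2048 bits).
library = {"A": 0b001111, "B": 0b000111, "C": 0b001000, "D": 0b110000}
print(similarity_search(0b001111, library))                 # [(1.0, 'A'), (0.75, 'B'), (0.25, 'C')]
print(similarity_search(0b001111, library, threshold=0.5))  # [(1.0, 'A'), (0.75, 'B')]
```

At the scale of hundreds of millions of compounds, the bit counts would typically be precomputed and the library sorted or bucketed by them, so whole ranges can be excluded before any pairwise comparison.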
ULTIMATE database defined
The aim of the ULTIMATE project is to create an easily searchable chemical database of at least 500 million purchasable “virtual” compounds. In recent months, we defined the overall workflow of the filtering process for compounds to be included in the ULTIMATE database. We implemented property and novelty filters, as well as a method to filter out unwanted structures. Furthermore, we developed a visualisation tool for the chemical space based on self-organizing maps.
Novelty filters were implemented to ensure the unique nature of molecules in the ULTIMATE database. The molecules already covered in patents (SureChEMBL) or described in public chemical / biological databases (ChEMBL, PubChem) and those present in already existing purchasable compound databases (ZINC) are filtered out.
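The text does not say how structures are matched against the reference databases. A minimal sketch, assuming each compound has already been converted to a canonical identifier (for example an InChIKey) and each source is available as a set of such identifiers, is a set-membership check that also records which sources caused a removal. The identifiers below are placeholders, not real keys or database contents.

```python
def novelty_filter(candidates, reference_sets):
    """Split candidate identifiers into novel ones and those already known.

    candidates: iterable of canonical structure identifiers (assumed, e.g. InChIKeys)
    reference_sets: mapping of source name -> set of identifiers from that source
    """
    kept, removed = [], {}
    for key in candidates:
        sources = [name for name, keys in reference_sets.items() if key in keys]
        if sources:
            removed[key] = sources  # record why the compound was dropped
        else:
            kept.append(key)
    return kept, removed

# Placeholder identifiers for illustration only.
references = {
    "SureChEMBL": {"ID-1"},
    "ChEMBL": {"ID-2"},
    "PubChem": {"ID-2", "ID-3"},
    "ZINC": {"ID-4"},
}
kept, removed = novelty_filter(["ID-1", "ID-2", "ID-5", "ID-6"], references)
print(kept)     # ['ID-5', 'ID-6']
print(removed)  # {'ID-1': ['SureChEMBL'], 'ID-2': ['ChEMBL', 'PubChem']}
```

In-memory sets keep each lookup cheap; for reference collections of this size, a disk-backed key-value store or database index would likely be needed instead.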
Property filters aim to filter out compounds with properties that are not compatible with medicinal chemistry purposes. Several physicochemical features are examined, such as molecular mass, logP, tPSA, number of aromatic/aliphatic rings, heteroatom ratio, number of acidic/basic groups, number of sp3 chiral centres, etc.
Compounds containing structures that are unwanted in medicinal chemistry are also filtered out, using well-established published methods.
The workflow and implemented filters were successfully tested on a model database, and the filters performed according to the requirements. As intended, the ULTIMATE database will contain only compounds that fulfil all applied criteria.
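The actual thresholds are not given in the text. As a minimal sketch, assuming the descriptors have already been computed for each compound (for example with a cheminformatics toolkit), a property filter reduces to inclusive range checks. The property names and ranges below are hypothetical and purely illustrative, not the project's criteria.

```python
# Hypothetical, illustrative ranges (inclusive); the project's real thresholds are not stated.
PROPERTY_RANGES = {
    "mol_weight": (150.0, 500.0),
    "logp": (-1.0, 5.0),
    "tpsa": (0.0, 140.0),
    "aromatic_rings": (0, 3),
    "sp3_chiral_centres": (0, 3),
}

def failed_properties(props, ranges=PROPERTY_RANGES):
    """Names of properties that are missing or fall outside their allowed range."""
    failed = []
    for name, (low, high) in ranges.items():
        value = props.get(name)
        if value is None or not low <= value <= high:
            failed.append(name)
    return failed

def passes_property_filter(props, ranges=PROPERTY_RANGES):
    """A compound passes only if no property check fails."""
    return not failed_properties(props, ranges)

compound_a = {"mol_weight": 320.4, "logp": 2.1, "tpsa": 75.3,
              "aromatic_rings": 2, "sp3_chiral_centres": 1}
compound_b = {"mol_weight": 612.7, "logp": 6.3, "tpsa": 95.0,
              "aromatic_rings": 4, "sp3_chiral_centres": 0}
print(passes_property_filter(compound_a))  # True
print(failed_properties(compound_b))       # ['mol_weight', 'logp', 'aromatic_rings']
```

Returning the list of failed properties, rather than only a boolean, makes it easier to see which criterion removes the most compounds when tuning the filters.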
Furthermore, we have developed a method for visualising the chemical space using self-organizing maps. The aim of the visualisation is, on the one hand, to tailor the development of the ULTIMATE database to favour molecules with properties similar to those of known drugs but underrepresented among already available purchasable compounds and, on the other hand, to help medicinal chemists decide which molecules are best suited to their research projects. In the latter case, the pharma partners may use the maps to compare their in-house molecules to those available in ULTIMATE.
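The details of Mcule's map are not described. The sketch below is a generic, minimal self-organizing map, assuming each compound is represented by a short vector of descriptors already scaled to a comparable range (unscaled descriptors such as molecular mass would dominate the distance). Each node holds a weight vector; training pulls the best-matching node and its grid neighbours towards each sample, and a hit map counts how many compounds of a set land on each node, which is one way to compare two compound sets. The class name, parameters and data are hypothetical.

```python
import math
import random

class SelfOrganizingMap:
    """Minimal rectangular SOM with a Gaussian neighbourhood (illustrative sketch)."""

    def __init__(self, rows, cols, dim, seed=0):
        self.rows, self.cols, self.dim = rows, cols, dim
        self.rng = random.Random(seed)
        # One weight vector per grid node, initialised uniformly in [0, 1).
        self.weights = {
            (r, c): [self.rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)
        }

    def best_matching_unit(self, x):
        """Grid coordinates of the node whose weight vector is closest to x."""
        return min(self.weights,
                   key=lambda node: sum((w - v) ** 2 for w, v in zip(self.weights[node], x)))

    def train(self, data, iterations=1000, lr0=0.5, sigma0=None, sigma_end=0.5):
        sigma0 = sigma0 or max(self.rows, self.cols) / 2
        for t in range(iterations):
            frac = t / iterations
            lr = lr0 * (1 - frac)                         # linearly decaying learning rate
            sigma = sigma0 + (sigma_end - sigma0) * frac  # shrinking neighbourhood radius
            x = self.rng.choice(data)
            br, bc = self.best_matching_unit(x)
            for (r, c), w in self.weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # neighbourhood weight
                for i in range(self.dim):
                    w[i] += lr * h * (x[i] - w[i])

    def hit_map(self, data):
        """Count how many items of a compound set land on each node."""
        counts = {node: 0 for node in self.weights}
        for x in data:
            counts[self.best_matching_unit(x)] += 1
        return counts

# Hypothetical scaled 2-D descriptor vectors; not real compound data.
in_house = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05)]
library = [(0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
som = SelfOrganizingMap(rows=3, cols=3, dim=2, seed=42)
som.train(in_house + library, iterations=2000)
print(som.best_matching_unit((0.05, 0.05)), som.best_matching_unit((0.95, 0.95)))
print(som.hit_map(in_house))
print(som.hit_map(library))
```

Comparing the two hit maps node by node shows which regions of the map are populated by one set but not the other.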
ULTIMATE project first milestone reached
The ULTIMATE project started in August 2017. The aim of the project is to create an easily searchable chemical database of at least 500 million purchasable “virtual” compounds that can be synthesized (min. 80% delivery rate) at an affordable price and in reasonable time (max. 6 weeks delivery time). Such a large chemical space would present a major advantage for pharmaceutical and biotech companies, increasing their chances of effectively identifying novel compounds for treating diseases while reducing costs and time losses.
In the first 3 months of the project, Mcule defined the criteria for compound selection and proposed design and filtering rules to be applied in the ULTIMATE database, based on input from our collaborating partners: pharmaceutical companies (GSK, AstraZeneca and Boehringer Ingelheim), suppliers (HTS Biochemie, Key Organics and Life Chemicals), and academic partners (Research Centre for Natural Sciences of the Hungarian Academy of Sciences and Vrije Universiteit Amsterdam).
To increase the novelty of compounds in the ULTIMATE database, synthesis strategies for new and underrepresented scaffolds were proposed by two academic subcontractors: the Medicinal Chemistry Group of Vrije Universiteit, Amsterdam, the Netherlands (VUA) and the Medicinal Chemistry Research Group of the Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest, Hungary (RCNS).
In the first three months of the project, Mcule also defined the IT and chemoinformatics requirements for handling such a large database. The functions to be offered by the ULTIMATE database were proposed based on an analysis of users’ needs.
About the project
On 1 August 2017, Mcule started a challenging new project called ULTIMATE - The best online drug discovery platform, building the Ultimate chemical database for drug discovery. In this project, a commercial database of 500 million novel, diverse and synthetically feasible compounds will be developed. Standard parameters of the database: min. 80% success rate, max. 6 weeks delivery time, fixed prices. Compound selection, automated quote generation and ordering will be available online at https://mcule.com
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777828. Selected subcontractors and several suppliers will work together with Mcule on the realization of ULTIMATE for 24 months. Project partners include state-of-the-art synthesis design software developers, major chemical suppliers and leading pharma companies.
Mcule realized that the currently accessible chemical space of commercially available compounds is limited. Compound aggregators typically provide access to only 7-10 million in-stock compounds, while the synthetically feasible chemical space is orders of magnitude larger. Some pharma companies already complement the available in-stock chemical space with in-house virtual libraries; however, in-house chemistry resources are typically limited and expensive. Some suppliers already offer virtual libraries; however, compound aggregators are not able to integrate them, as the size of these libraries presents a major chemoinformatic challenge. Mcule's online platform uses the latest IT technology and chemoinformatic tools, making it capable of handling larger databases integrated with complex modeling tools. Mcule already integrates virtual compounds from major chemical suppliers such as Enamine and UkrOrgSyntez and provides one of the largest compound webshops, with over 35 million screening compounds and building blocks. Together with distinguished partners, Mcule decided to develop an unprecedented database of 500 million commercially available compounds (Ultimate database) that will be hosted by Mcule.
Mcule (IT, chemoinformatics)
HTS Biochemie (industrial chemistry partner)
Key Organics (industrial chemistry partner)
Life Chemicals (industrial chemistry partner)
Medicinal Chemistry Research Group, MTA-TTK (academic chemistry partner)
Division of Medicinal Chemistry, Vrije Universiteit Amsterdam (academic chemistry partner)