publications
2023
- Blue Brain Nexus: An open, secure, scalable system for knowledge graph management and data-driven science. Mohameth François Sy, Bogdan Roman, Samuel Kerrien, Didac Montero Mendez, Henry Genet, Wojciech Wajerowicz, Michaël Dupont, and 16 more authors. Semantic Web, 2023. Publisher: IOS Press.
Modern data-driven science often consists of iterative cycles of data discovery, acquisition, preparation, analysis, model building and validation leading to knowledge discovery as well as dissemination at scale. The unique challenges of building and simulating the whole rodent brain in the Swiss EPFL Blue Brain Project (BBP) required a solution to managing large-scale highly heterogeneous data, and tracking their provenance to ensure quality, reproducibility and attribution throughout these iterative cycles. Here, we describe Blue Brain Nexus (BBN), an ecosystem of open source, domain agnostic, scalable, extensible data and knowledge graph management systems built by BBP to address these challenges. BBN builds on open standards and interoperable semantic web technologies to enable the creation and management of secure RDF-based knowledge graphs validated by W3C SHACL. BBN supports a spectrum of (meta)data modeling and representation formats including JSON and JSON-LD as well as more formally specified SHACL-based schemas enabling domain model-driven runtime API. With its streaming event-based architecture, BBN supports asynchronous building and maintenance of multiple extensible indices to ensure high performance search capabilities and enable analytics. We present four use cases and applications of BBN to large-scale data integration and dissemination challenges in computational modeling, neuroscience, psychiatry and open linked data.
- [KG] Blue Brain Knowledge Graph: Leveraging Semantic Web Technologies for Simulation Neuroscience. Cristina E Gonzalez-Espinoza, Anna-Kristin Kaufmann, Eugenia Oshurko, Alejandra Garcia Rojas Martinez, Sarah Mouffok, Konstantinos Platis, Jonathan Lurie, and 11 more authors. In ISWC 2023 Posters and Demos: 22nd International Semantic Web Conference, November 6-10, 2023, Athens, Greece, 2023.
The Blue Brain Project, a Swiss neuroscience research initiative, has pioneered a data-driven approach to digitally building and simulating biologically detailed models of the mouse brain as a complementary approach to understanding the brain alongside experimental, theoretical and clinical neuroscience. One of the key steps of this approach involves acquiring, organizing, and integrating heterogeneous data describing the structural and functional organization of the brain at various levels, ranging from synapses and subcellular components to individual neurons, circuits, and entire brain regions. The data is acquired from many sources including neuroscience experiments, published scientific papers, and brain databases. To address many of the data organization, reuse, sparsity, and publishing challenges that arise alongside this approach, Blue Brain built an RDF-based large-scale knowledge graph bringing together RDFS/OWL ontologies, SHACL schemas, JSON-LD, as well as ontology-, rule-, and graph-based inference to complement classical neuroinformatics tools and methods. In this paper, we present how such a knowledge graph is built and used by the project’s domain teams to go beyond high-quality and FAIR metadata cataloging. We describe how the knowledge graph serves a multifaceted role: it addresses the diversity, evolution, and quality assessment of data at the whole brain scale, while concurrently tracking data provenance to facilitate reproducibility and precise attribution. Additionally, it facilitates diverse use cases, including the inference of missing data through knowledge-graph-based methods.
2022
- [NeuronInformatics] The Neuron Phenotype Ontology: A FAIR Approach to Proposing and Classifying Neuronal Types. Thomas H. Gillespie, Shreejoy J. Tripathy, Mohameth François Sy, Maryann E. Martone, and Sean L. Hill. Neuroinformatics, Jul 2022.
The challenge of defining and cataloging the building blocks of the brain requires a standardized approach to naming neurons and organizing knowledge about their properties. The US Brain Initiative Cell Census Network, Human Cell Atlas, Blue Brain Project, and others are generating vast amounts of data and characterizing large numbers of neurons throughout the nervous system. The neuroscientific literature contains many neuron names (e.g. parvalbumin-positive interneuron or layer 5 pyramidal cell) that are commonly used and generally accepted. However, it is often unclear how such common usage types relate to many evidence-based types that are proposed based on the results of new techniques. Further, comparing different types across labs remains a significant challenge. Here, we propose an interoperable knowledge representation, the Neuron Phenotype Ontology (NPO), that provides a standardized and automatable approach for naming cell types and normalizing their constituent phenotypes using identifiers from community ontologies as a common language. The NPO provides a framework for systematically organizing knowledge about cellular properties and enables interoperability with existing neuron naming schemes. We evaluate the NPO by populating a knowledge base with three independent cortical neuron classifications derived from published data sets that describe neurons according to molecular, morphological, electrophysiological, and synaptic properties. Competency queries to this knowledge base demonstrate that the NPO knowledge model enables interoperability between the three test cases and neuron names commonly used in the literature.
2021
- [KG_NLP] A Machine-Generated View of the Role of Blood Glucose Levels in the Severity of COVID-19. Emmanuelle Logette, Charlotte Lorin, Cyrille Favreau, Eugenia Oshurko, Jay S. Coggan, Francesco Casalegno, Mohameth François Sy, and 11 more authors. Frontiers in Public Health, Jul 2021.
SARS-CoV-2 started spreading toward the end of 2019 causing COVID-19, a disease that reached pandemic proportions among the human population within months. The reasons for the spectrum of differences in the severity of the disease across the population, and in particular why the disease affects more severely the aging population and those with specific preconditions, are unclear. We developed machine learning models to mine 240,000 scientific articles openly accessible in the CORD-19 database, and constructed knowledge graphs to synthesize the extracted information and navigate the collective knowledge in an attempt to search for a potential common underlying reason for disease severity. The machine-driven framework we developed repeatedly pointed to elevated blood glucose as a key facilitator in the progression of COVID-19. Indeed, when we systematically retraced the steps of the SARS-CoV-2 infection, we found evidence linking elevated glucose to each major step of the life-cycle of the virus, progression of the disease, and presentation of symptoms. Specifically, elevations of glucose provide ideal conditions for the virus to evade and weaken the first level of the immune defense system in the lungs, gain access to deep alveolar cells, bind to the ACE2 receptor and enter the pulmonary cells, accelerate replication of the virus within cells increasing cell death and inducing a pulmonary inflammatory response, which overwhelms an already weakened innate immune system to trigger an avalanche of systemic infections, inflammation and cell damage, a cytokine storm and thrombotic events. We tested the feasibility of the hypothesis by manually reviewing the literature referenced by the machine-generated synthesis, reconstructing atomistically the virus at the surface of the pulmonary airways, and performing quantitative computational modeling of the effects of glucose levels on the infection process.
We conclude that elevation in glucose levels can facilitate the progression of the disease through multiple mechanisms and can explain much of the differences in disease severity seen across the population. The study provides diagnostic considerations, new areas of research and potential treatments, and cautions on treatment strategies and critical care conditions that induce elevations in blood glucose levels.
2012
- [IR/Search] User centered and ontology based information retrieval system for life sciences. Mohameth-François Sy, Sylvie Ranwez, Jacky Montmain, Armelle Regnault, Michel Crampes, and Vincent Ranwez. BMC Bioinformatics, Jan 2012.
Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations.