EuBIC-MS Winter School 2024 – Full Program

This page contains the full program for the EuBIC-MS Winter School 2024 in Winterberg. Below you can find a detailed table with information for each day and speaker.

Abstract booklet: EuBIC-MS Winter School 2024 Abstracts

Full Program Overview

Time	Sunday 14/01	Monday 15/01	Tuesday 16/01	Wednesday 17/01	Thursday 18/01	Friday 19/01
9:00 AM – 9:15 AM			Announcements	Announcements	Announcements	Announcements
9:15 AM – 10:00 AM		Registration Open	Keynote	Keynote	Keynote	Keynote
10:00 AM – 10:45 AM		Parallel workshops	Keynote	Sponsored talk	Keynote	Keynote
10:45 AM – 11:15 AM		Parallel workshops	Coffee break	Coffee break	Coffee break	Coffee break
11:15 AM – 12:00 PM		Parallel workshops	Keynote	Keynote	Keynote	Closing session
12:00 PM – 1:30 PM		Lunch	Lunch	Lunch	Lunch	Lunch
1:30 PM – 3:30 PM		Parallel workshops	Parallel workshops	Special session	Parallel workshops	Shuttle (1:30 PM)
3:30 PM – 4:00 PM		Coffee break	Coffee break	Coffee break	Coffee break
4:00 PM – 5:00 PM	Shuttle (5:00 PM)	Opening session and poster flash talks	Parallel workshops (continued)	Special session	Parallel workshops (continued)
5:30 PM – 6:30 PM					EuBIC-MS session
6:00 PM – open end			Poster session	Social event

Daily Breakdown

Monday
9:00 AM – 4:00 PM	Registration open: coffee and poster set-up
10:00 AM – 12:00 PM	PROTrEIN workshops + GitHub workshop: Introduction to mass spectrometry based proteomics workflows and data analysis (PROTrEIN workshop); Workflows for PTM identification and validation (PROTrEIN workshop); Prediction models in proteomics (PROTrEIN workshop); GitHub Actions Demystified: A Hands-On Workshop (Pieter Verschaffelt, VIB – UGent, BE)
12:00 PM – 1:30 PM	Lunch
1:30 PM – 3:30 PM	PROTrEIN workshops (continued) + MSAID workshop: Introduction to mass spectrometry based proteomics workflows and data analysis, continued (PROTrEIN workshop); Workflows for PTM identification and validation, continued (PROTrEIN workshop); Prediction models in proteomics, continued (PROTrEIN workshop); AI-powered analysis of bottom-up proteomics data with CHIMERYS (MSAID)
3:30 PM – 4:00 PM	Coffee break
4:00 PM – 4:30 PM	Opening session: welcome and organisational information
4:30 PM – 5:15 PM	Flash talks TBA

Tuesday
9:00 AM – 9:15 AM	Announcements
9:15 AM – 10:00 AM	Keynote: A new take on missing value imputation for bottom-up label-free LC-MS/MS proteomics Thomas Burger, CNRS, FR
10:00 AM – 10:45 AM	Keynote: Puzzle pieces – Unraveling the human proteome with the Human Protein Atlas project Cecilia Lindskog, Uppsala University, SE
10:45 AM – 11:15 AM	Coffee break
11:15 AM – 12:00 PM	Keynote: Exploring the unknown to better appreciate the “known”: a bioinformatics odyssey Ben Neely, NIST, US
12:00 PM – 1:30 PM	Lunch
1:30 PM – 3:30 PM	Parallel workshops: Statistical analysis of quantitative peptide-level proteomics data with Prostar (Thomas Burger, CNRS, FR); PRIDE and ProteomeXchange: Introduction, submission and re-usage of data (Juan Antonio Vizcaino, EMBL-EBI, UK); Database-Independent Analysis of LC-MS/MS datasets with compareMS2 (Magnus Palmblad, CPM, Leiden University, NL); Demonstration of a python package designed for quality control (Niveda Sundararaman, Cedars-Sinai, US); Solve the protein puzzle – Use the Human Protein Atlas to understand the different pieces of the human proteome (Cecilia Lindskog, Uppsala University, SE)
3:30 PM – 4:00 PM	Coffee break
4:00 PM – 5:00 PM	Parallel workshops (continued) as above
6:00 PM – open end	Poster session With finger food, drinks and open end

Wednesday
9:00 AM – 9:15 AM	Announcements
9:15 AM – 10:00 AM	Keynote: Challenges in Computational (clinical) Lipidomics and QC – a tale of impostors, cut-throat competition and lack of etiquette Nils Hoffmann, FZ Jülich, DE
10:00 AM – 10:15 AM	Sponsored Talk: de.NBI
10:15 AM – 10:45 AM	Sponsored Talk: What determines precision and accuracy of DIA proteomics? Vadim Demichev, Institute of Biochemistry, Charité, DE, invited by EvoSep
10:45 AM – 11:15 AM	Coffee break
11:15 AM – 12:00 PM	Keynote: Making proteomics data FAIR: Challenges and rewards Juan Antonio Vizcaino, EMBL-EBI, UK
12:00 PM – 1:30 PM	Lunch
1:30 PM – 2:30 PM	Special Session: Scientific communications in the age of skibidi toilet Ben Orsburn, Johns Hopkins University, US and Ben Neely, NIST, US
3:30 PM – 4:00 PM	Coffee Break
4:00 PM – 5:00 PM	Special Session: Scientific communications in the age of skibidi toilet / Free Time Ben Orsburn, Johns Hopkins University, US and Ben Neely, NIST, US
6:00 PM – open end	Social event TBA

Thursday
9:00 AM – 9:15 AM	Announcements
9:15 AM – 10:00 AM	Keynote: A principled approach to process, analyse and interpret single-cell proteomics data Laurent Gatto, De Duve Institute / UCLouvain, BE
10:00 AM – 10:45 AM	Keynote: AlphaPeptDeep: a deep learning framework for peptide property prediction and what we can do with it Wen-Feng Zeng, MPI Biochemistry, DE
10:45 AM – 11:15 AM	Coffee break
11:15 AM – 12:00 PM	Keynote: W.T.F. are isotopes? Dirk Valkenborg, Hasselt University, BE
12:00 PM – 1:30 PM	Lunch
1:30 PM – 3:30 PM	Parallel workshops: Overview and challenges in the lipidomics data analysis (Nils Hoffmann, FZ Jülich, DE); Deep learning enabled software solutions for discovery proteomics, immunopeptidomics, and glycoproteomics (BSI); How to analyse single cell proteomics data with the scp-package (Laurent Gatto, De Duve Institute / UCLouvain, BE); Training and transfer learning deep learning models to predict peptide property values with AlphaPeptDeep (Wen-Feng Zeng, MPI Biochemistry, DE); BRAIN, Pointless4Peptides, MIND and QCQuan (Dirk Valkenborg, Frédérique Vilenne and Piotr Prostko, Hasselt University, BE)
3:30 PM – 4:00 PM	Coffee break
4:00 PM – 5:00 PM	Parallel workshops (continued) as above
5:30 PM – 6:30 PM	EuBIC-MS introduction & open meeting Meet EuBIC-MS members, see what we do, and become a member!

Friday
9:00 AM – 9:15 AM	Announcements
9:15 AM – 10:00 AM	Keynote: Rethinking the space race in proteomics informatics; are we using the right metrics? Robbin Bouwmeester, Ghent University / VIB, BE
10:00 AM – 10:45 AM	Keynote: Simplified and Automated Analysis of Large-Scale Proteomics Datasets Niveda Sundararaman, Cedars-Sinai, US
10:45 AM – 11:15 AM	Coffee break
11:15 AM – 12:00 PM	Closing session Poster prizes, final words
12:00 PM	Lunch

Daily Detailed Program and Speaker-Information

About all speakers

PROTrEIN Workshops

Introduction to mass spectrometry based proteomics workflows and data analysis – PROTrEIN workshop

This workshop will give you a basic introduction to mass spectrometry based proteomics and how to analyse this type of data. We will start with a presentation, introducing how the data is acquired in the mass spectrometer and how you can use this data to identify peptides and proteins. Then we will do some hands-on work and do a tutorial on data analysis.

Workflows for PTM identification and validation – PROTrEIN workshop

Analyzing post-translational modifications (PTMs) from mass spectrometry data poses challenges due to the expanded search space and sample complexity. These challenges are amplified by the diversity of existing processing workflows, where the choice of workflow can lead to substantial variations in identification results. This workshop aims to explore various strategies for PTM dataset analysis, offering a comprehensive understanding of key processing and validation approaches for PTM studies. Participants will work with a dataset of synthetic phosphopeptides, covering all analysis steps from spectrum identification to the validation and scoring of phosphorylation sites. The results from the different approaches investigated will be compared to assess their respective limitations and advantages.

Prediction models in proteomics – PROTrEIN workshop

This workshop is focused on the use of prediction models in proteomics. We will explore what tasks can be solved by machine learning, the history of these methods in proteomics and state-of-the-art models. In the afternoon session we will apply some of them in the hands-on exercises.

GitHub Workshop

GitHub Actions Demystified: A Hands-On Workshop – Pieter Verschaffelt, VIB – UGent, BE

Are you tired of repetitive, time-consuming tasks in your software development process? GitHub Actions can be your answer. Introduced by GitHub in 2018, these powerful workflows can automate everything from running tests to publishing software releases. But how can you harness this automation for your own projects?

MSAID Workshop

AI-powered analysis of bottom-up proteomics data with CHIMERYS – MSAID

Join us for an immersive workshop on harnessing the power of AI in bottom-up proteomics analysis using CHIMERYS. This workshop is designed to equip participants with the skills to navigate Proteome Discoverer 3.1 (PD) and leverage CHIMERYS 2.0 for insightful data exploration. Throughout the workshop, attendees will delve into the essential steps, starting with the download and seamless installation of PD 3.1. You’ll unlock the potential of CHIMERYS by activating a demo license and initiating a CHIMERYS search on a compact dataset. Hands-on guidance will lead you through the process of effectively exploring data within Proteome Discoverer.

Keynotes

A new take on missing value imputation for bottom-up label-free LC-MS/MS proteomics

Label-free bottom-up proteomics using mass spectrometry and liquid chromatography has long been established as one of the most popular high-throughput analysis workflows for proteome characterization. However, it produces data hindered by complex and heterogeneous missing values, whose imputation has long remained problematic. To cope with this, we introduce Pirat, an algorithm that addresses this challenge with an unprecedented approach. Notably, it models the instrument limit by estimating a global censoring mechanism from the data available. Moreover, it leverages the correlations between enzymatic cleavage products (i.e., peptides or precursor ions), while offering a natural way to integrate complementary transcriptomic information, when available. Our benchmarking on several datasets covering a variety of experimental designs (number of samples, acquisition mode, missingness patterns, etc.) and using a variety of metrics (differential analysis ground truth or imputation errors) shows that Pirat outperforms all pre-existing imputation methods. These results pinpoint the potential of Pirat as an advanced tool for imputation in proteomic data analysis, and more generally underscore the value of improving imputation by explicitly modelling the correlation structures grounded either in the analytical pipeline or in the central dogma of molecular biology governing multiple omic approaches.

Puzzle pieces – Unraveling the human proteome with the Human Protein Atlas project

The Human Protein Atlas (www.proteinatlas.org) is one of the world’s largest biological databases, focusing on generating a high-resolution map of the human proteome in tissues, cells, organelles and blood, both in health and disease. The current version consists of 12 different sections, each focusing on particular aspects of the human proteome, piece by piece contributing to the large puzzle of understanding the human proteome.

Exploring the unknown to better appreciate the “known”: a bioinformatics odyssey

Mass spec-based proteomics relies on knowing what to look for when identifying peptides (bottom-up) or proteoforms (top-down), notwithstanding de novo efforts. These search spaces are defined by sequence collections of proteins (fasta), and despite the constant churn of updates from different data producers, we accept that, at least for humans, we “know” what should be in a sample and everyone agrees what we should use. In contrast, working in non-model systems requires an appreciation of the unknown and a constant questioning of any species-specific resource that comes online. This healthy skepticism for search space may likewise be warranted in human proteomics. To demonstrate these concepts, we will delve into the trials and tribulations faced when analyzing non-model organisms, from crows to sea lions, including misappropriated fasta from other species, and how search space choices affect results. These same lessons will be re-hashed using human pangenomes, with an eye to population-level proteomics. We hope to emphasize what may or may not be a problem currently and on the horizon, and lead into a discussion of what sequence variability actually matters, at what level of identification sequence differences are impactful, and whether our current identification paradigm is equipped to handle the pangenomic search space.

Parallel Workshops

Statistical analysis of quantitative peptide-level proteomics data with Prostar – Thomas Burger, CNRS, FR

Prostar is a software tool dedicated to the statistical processing of quantitative data resulting from mass spectrometry based label-free proteomics. Practically, once biological samples have been analyzed by bottom-up proteomics, the raw mass spectrometer outputs are processed by bioinformatics tools, so as to identify peptides and quantify them, notably by means of precursor ion chromatogram integration. The peptide level data, once summarized in a quantitative table, can be processed using Prostar. Prostar proposes a well-standardized workflow to: (1) Filter peptides according to their meta-data or missing/observed status in the various samples; (2) Normalize the sample to cope with batch effects; (3) Impute the missing values; (4) Aggregate the peptide-level signal into protein-level abundances; (5) Perform differential analysis (hypothesis testing and FDR control). The entire statistical workflow is accompanied by visual outputs, so as to check the effect of each processing step. Using Prostar does not require coding capabilities thanks to its Shiny-based graphical user interface. The workshop targets proteomics researchers who need to perform the statistical analysis of their data on their own. It will cover the basic usage of Prostar and there is no statistical or computational prerequisite to attend the workshop.
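
As an illustration of the FDR control mentioned in step (5), here is a minimal Python sketch of the Benjamini-Hochberg adjustment. It is an assumption that this particular procedure is representative; the abstract does not say which FDR method Prostar applies, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four protein-level tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]
```

Proteins whose adjusted value falls below the chosen threshold (0.05 in this sketch) would be reported as differentially abundant at that FDR level.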

PRIDE and ProteomeXchange: Introduction, submission and re-usage of data – Juan Antonio Vizcaino, EMBL-EBI, UK

This workshop will be a tutorial focused on PRIDE and ProteomeXchange, showing some highlights of public proteomics data re-use. The three hours will consist of:

Introductory talk: “Proteomics data repositories: PRIDE and ProteomeXchange”
Hands-on: How to submit data to PRIDE. Also, we will be highlighting the main functionality of the PRIDE Application Programming Interface (API).
Educational talk on principles and examples of data re-use of public proteomics data.

Database-Independent Analysis of LC-MS/MS datasets with compareMS2 – Magnus Palmblad, CPM, Leiden University, NL

While tandem mass spectrometry data is typically analyzed using sequence databases or spectral libraries, there are cases where database (and library)-independent analyses, such as clustering and comparing spectra across datasets, are useful. Examples of such use cases are identification of biological species and tissues, quality control and experimental design. In this workshop, we will show how to use the compareMS2 software tool we have developed (https://github.com/524D/compareMS2). This tool calculates distances between sets of tandem mass spectra, such as LC-MS/MS datasets, based on the share of similar spectra between the datasets. The tool can be used both from the command line and through a simple graphical user interface. The workshop will begin with explaining the theoretical basis for compareMS2 and describing the underlying algorithms. We will then show how to install and run compareMS2 on some sample data. If participants bring their own data, we can also help them analyze this. Finally, if there is time, we will show how novel features in compareMS2 can be used to guide database (or library) dependent analysis of LC-MS/MS data.
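
To illustrate the idea of a distance based on the share of similar spectra, here is a minimal Python sketch. It is not the compareMS2 implementation: the m/z binning, the cosine threshold and the final formula (one minus the mean share) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed binning)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def share_similar(dataset_a, dataset_b, threshold=0.8):
    """Fraction of spectra in A with at least one similar spectrum in B."""
    binned_b = [bin_spectrum(s) for s in dataset_b]
    hits = sum(
        any(cosine(bin_spectrum(s), t) >= threshold for t in binned_b)
        for s in dataset_a
    )
    return hits / len(dataset_a) if dataset_a else 0.0


def dataset_distance(dataset_a, dataset_b, threshold=0.8):
    """Symmetric toy distance: 1 minus the mean share of similar spectra."""
    share = 0.5 * (share_similar(dataset_a, dataset_b, threshold)
                   + share_similar(dataset_b, dataset_a, threshold))
    return 1.0 - share
```

Each spectrum is a list of `(m/z, intensity)` pairs and each dataset is a list of spectra. Datasets that share most of their spectra end up close to 0, datasets with nothing in common end up at 1.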

Demonstration of a python package designed for quality control – Niveda Sundararaman, Cedars-Sinai, US

Demonstration of an in-house python package designed to facilitate quality control (QC) analysis, particularly for large-scale mass-spectrometry projects comprising several samples spanning multiple batches for sample preparation and MS acquisition. Detection of technical biases and consistency verification is imperative for enhancing data reliability and enabling nuanced analysis and interpretation. Assessment of quality is performed through two primary steps: (1) ID-free, allowing for examining QC metrics from raw MS data and (2) ID-based, allowing for QC metrics evaluation from MS data at the protein, peptide or precursor level that has been processed through a search engine.

Solve the protein puzzle – Use the Human Protein Atlas to understand the different pieces of the human proteome – Cecilia Lindskog, Uppsala University, SE

The Human Protein Atlas integrates data from bulk transcriptomics, single-cell transcriptomics, mass spectrometry and antibody-based assays for generating a high-resolution map of the human proteome in tissues, cells, organelles and blood, both in health and disease. In this workshop, you will learn how to browse, search and interpret different pieces of information in the large proteome puzzle available at www.proteinatlas.org, and discuss how it can contribute to your own research project.

Keynotes

Challenges in Computational (clinical) Lipidomics and QC – a tale of impostors, cut-throat competition and lack of etiquette

Lipids play an integral part in biological functions and emerge as crucial biomarkers in medical research, notably in predicting cardiovascular risks. However, their analysis via mass spectrometry poses challenges due to their structural diversity, concentration ranges, isomerism, and complex fragmentation. This presentation highlights pitfalls in lipid identification, emphasizing the need for evidence-backed reporting using standardized nomenclature and lipid-class-specific MS features, together with class-specific quantification. Moreover, the talk explores strategies and presents tools for enhancing reporting practices in lipidomics experiments to foster reanalysis and comparability. Presenting ongoing work by the HUPO-PSI QC group, I introduce the mzQC file format and QC metrics tailored for mass spectrometry, offering insights into their application in lipid studies. Drawing from experiences in an international clinical lipidomics ring trial, I delve into the hurdles faced and strategies employed to establish concentration ranges measured by multiple labs across the world for four clinically relevant ceramides, addressing key challenges encountered during setup, execution, and analysis.

Vadim Demichev
Institute of Biochemistry, Charité, DE

What determines precision and accuracy of DIA proteomics?

DIA proteomics has long been viewed as a discovery method with good label-free quantitative performance. However, while algorithms for peptide identification from DIA data have progressed immensely in the past years, quantification approaches have in contrast remained rather primitive. In this talk, I will discuss recent technology and algorithm developments towards boosting precision and accuracy of DIA.

Making proteomics data FAIR: Challenges and rewards

First of all, I will summarise the current state of the art with regard to open data practices in proteomics, which have revolutionised the field in recent years. Throughout the talk, I will cover some key questions: why is it good to make data available in the public domain, how can this be done, and, maybe even more importantly, what are these practices useful for? I will then highlight some nice examples of how this data is being reused by the community. Finally, I will explain some of the upcoming challenges.

Ben Orsburn
Johns Hopkins University School of Medicine, US

Special Session: Scientific communications in the age of skibidi toilet

This three-part workshop by Ben Neely and Ben Orsburn will begin with a history of the terribly and inaccurately named “News In Proteomics Research” blog, as described by the blog author. Some chronological history will help to portray some of the perils of social media and how these can have long-lasting negative effects on your career. The second part of this informal workshop will be focused on how to set up and run your own podcast series. Part 3 will feature the distortion of social media with audience participation to let us all learn something. (This description of this workshop was written by Ben Orsburn)

Keynotes

Laurent Gatto
De Duve Institute / UCLouvain, BE

A principled approach to process, analyse and interpret single-cell proteomics data

Mass spectrometry (MS)-based single-cell proteomics (SCP) has become a credible player in the single-cell biology arena. Continuous technical improvements have pushed the boundaries of sensitivity and throughput. However, the computational efforts to support the analysis of these complex data have been missing. Strong batch effects coupled to high proportions of missing values complicate the analysis, causing strong entanglement between biological and technical variability. We propose a simple, yet powerful approach to address this need: linear models. We use linear regression to model and remove undesired technical factors while retaining the biological variability, even in the presence of high proportions of missing values. The key advantage of linear models lies in the interpretability of the results they generate. Inspired by previous research, we streamlined modelling and exploration of the patterns induced by known technical and biological factors. The exploration enables a thorough assessment of the model coefficients, and highlights key factors influencing SCP experiments. Further exploration of the unmodelled variance recovers unknown but biologically relevant patterns in the data, leveraging the power of single-cell proteomics technologies. We successfully applied our approach to a diverse collection of SCP datasets, and could demonstrate that it is also amenable to integrating datasets acquired using different technologies. Our approach represents a turning point for principled SCP data analysis, moving the tension point from how to perform the analysis to result generation and interpretation.
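
The general idea of using a linear model to remove a technical factor while keeping the biological one can be sketched as follows. This is a minimal Python illustration for a single protein with two batches and two conditions, fitted on the observed values only. It is not the speaker's method or the scp package API: the function name, the 0/1 encoding and the example data are assumptions.

```python
import numpy as np


def remove_batch_effect(y, batch, condition):
    """Fit y ~ intercept + batch + condition on the observed values of one
    protein, then subtract only the fitted batch term."""
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch, dtype=float)          # 0/1 batch indicator
    condition = np.asarray(condition, dtype=float)  # 0/1 biological group
    design = np.column_stack([np.ones_like(y), batch, condition])
    observed = ~np.isnan(y)                         # ignore missing values
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    corrected = y - coef[1] * batch                 # missing values stay NaN
    return corrected, coef


# Hypothetical log-intensities for six cells; one value is missing.
y = [10.0, 11.0, 12.0, 13.0, np.nan, 12.0]
batch = [0, 0, 1, 1, 0, 1]
condition = [0, 1, 0, 1, 1, 0]
corrected, coef = remove_batch_effect(y, batch, condition)
```

The coefficients remain directly interpretable: `coef[1]` is the estimated batch shift and `coef[2]` the estimated biological effect, which is left in the corrected values.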

AlphaPeptDeep: a deep learning framework for peptide property prediction and what we can do with it

Deep learning has demonstrated its efficacy in enhancing the search capabilities of mass spectrometry (MS)-based proteomics data. Leveraging our AlphaX ecosystem, we have developed a deep learning framework called AlphaPeptDeep, equipped with an intuitive programming interface for both training and transfer learning. This framework is designed to handle diverse properties from peptide sequences with sufficient training data. Our studies revealed that AlphaPeptDeep excelled in predicting fragments (MS2), retention times (RT), and ion mobilities (IM) of peptides. Moreover, transfer learning significantly enhanced the accuracy of these predictions, even when trained on a limited dataset specific to certain experimental conditions. In addition to its proficiency in MS2/RT/IM prediction, AlphaPeptDeep can also forecast Major Histocompatibility Complex (MHC)-binding peptides. This feature streamlines the direct search of data-independent acquisition (DIA) MS data without the necessity of data-dependent acquisition (DDA)-based spectral libraries for immunopeptides. During this presentation, we will delve into other peptide properties, such as MS-detectability and charge states, which AlphaPeptDeep can predict through deep learning. We will also highlight the simplicity of model training using AlphaPeptDeep.

W.T.F. are isotopes?

In this overview presentation, we will provide an introduction to elemental isotopes. What are they, how were they discovered, and how are these isotopes incorporated in molecular analytes? The discovery of stable isotopes changed chemistry forever as we had to rethink the concept of molecular mass. Therefore, we determine molecules by their monoisotopic and average masses. The mass spectrometry community found an interest in how these isotopes were distributed to aid in molecular identification. This identification worked well for low-mass molecules as it is easy to calculate a theoretical isotope distribution for a given molecule. However, for larger molecules, the calculation of the isotope distribution becomes cumbersome and requires a deep understanding of mathematics and computer science. Therefore, we explain the history of algorithmic design to calculate the isotope distributions. Furthermore, we introduce concepts like, e.g., isotopologues, the aggregated isotope distribution, and the fine isotope distribution.
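
For a small molecule, the monoisotopic mass, the average mass and the aggregated isotope distribution can be computed directly, as in the following Python sketch. It uses brute-force repeated convolution, which is an assumption made for illustration and not one of the optimised algorithms covered in the talk. The isotope masses and abundances are rounded values from standard tables, and the function names are hypothetical.

```python
# Isotope masses (Da) and natural abundances, rounded from standard tables.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "N": [(14.0030740, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991610, 0.00205)],
    "S": [(31.9720707, 0.9499), (32.9714585, 0.0075),
          (33.9678669, 0.0425), (35.9670808, 0.0001)],
}


def monoisotopic_mass(composition):
    """Mass using the lightest isotope of each element, which for
    C, H, N, O and S is also the most abundant one."""
    return sum(n * ISOTOPES[el][0][0] for el, n in composition.items())


def average_mass(composition):
    """Mass using the abundance-weighted mean isotope mass per element."""
    return sum(n * sum(m * a for m, a in ISOTOPES[el])
               for el, n in composition.items())


def convolve(p, q, max_len):
    """Multiply two polynomials, keeping only the first max_len terms."""
    out = [0.0] * min(len(p) + len(q) - 1, max_len)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < len(out):
                out[i + j] += a * b
    return out


def element_polynomial(element):
    """Abundances indexed by nominal mass offset from the lightest isotope."""
    isotopes = ISOTOPES[element]
    base = round(isotopes[0][0])
    poly = [0.0] * (round(isotopes[-1][0]) - base + 1)
    for mass, abundance in isotopes:
        poly[round(mass) - base] = abundance
    return poly


def aggregated_distribution(composition, n_peaks=5):
    """Probabilities of the first n_peaks aggregated isotope peaks."""
    dist = [1.0]
    for element, count in composition.items():
        poly = element_polynomial(element)
        for _ in range(count):
            dist = convolve(dist, poly, n_peaks)
    return dist


glucose = {"C": 6, "H": 12, "O": 6}
mono = monoisotopic_mass(glucose)
avg = average_mass(glucose)
peaks = aggregated_distribution(glucose)
```

The cost of this naive approach grows with the number of atoms, which is one reason why dedicated algorithms are needed for larger molecules such as peptides and proteins.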

Parallel Workshops

Overview and challenges in the lipidomics data analysis – Nils Hoffmann, FZ Jülich, DE

In this workshop, we will give participants a brief overview of some of the challenges that exist in lipidomics identification and quantification from mass spectrometry data and how they should best be tackled. We will present the recommendations provided by the Lipidomics Standards Initiative and will discuss why and when they make sense to be applied. Further, we will give an overview of existing resources and tools for lipidomics researchers, to make data analysis, integration and interpretation easier, also in relation to multi-omics integration efforts. The workshop will include hands-on exercises that walk participants through a typical analysis of a lipidomics MS experiment using a mix of local and web-based software.

Deep learning enabled software solutions for discovery proteomics, immunopeptidomics, and glycoproteomics – BSI

Learn about the latest developments by Bioinformatics Solutions Inc. and how PEAKS can take your LC-MS/MS data analysis to new heights. PEAKS is a comprehensive proteomics software solution that supports data-dependent and data-independent acquisition analyses (DDA and DIA, respectively), and ion mobility mass spectrometry (IMS-MS). As a vendor-neutral computing platform, PEAKS is capable of directly loading raw mass spectrometry data and standard data formats. PEAKS now has streamlined workflows to identify and quantify peptides and proteins for DDA or DIA data. DDA workflows include de novo sequencing, PEAKS DB (database search) identification, PEAKS PTM (post translational modification) analysis, SPIDER search for mutations or variants, PEAKS Glycan for glycoproteomics, and PEAKS DeepNovo Peptidome for analysis of peptidomic datasets. DIA workflows integrate spectral library search, with direct database searching and de novo sequencing. Furthermore, the PEAKS QC tool allows the user to efficiently assess data quality, reproducibility, and troubleshoot potential problems with respect to sample preparation, instrumentation, and data analysis. All of these workflows take advantage of deep learning technology to improve the accuracy and sensitivity of protein/peptide identification. Taken together, PEAKS provides an innovative software solution for proteomics that’s driven by advanced deep learning-enabled algorithms and accommodates the latest mass spectrometry technology.

How to analyse single cell proteomics data with the scp-package – Laurent Gatto, De Duve Institute / UCLouvain, BE

Mass spectrometry-based single-cell proteomics (SCP) has become a credible player in the single-cell biology arena. Continuous technical improvements have pushed the boundaries of sensitivity and throughput to the point where high quality data sets are publicly available. However, the computational efforts to support their exploration, processing and analysis aren’t widely available. The goal of this workshop is to give participants the opportunity to learn how to use the QFeatures and scp Bioconductor packages to manage, process and analyse single-cell proteomics data. The hands-on workshop will start by introducing the QFeatures class and its functions to perform generic proteomics data analysis. We will then move to SCP and present how scp extends QFeatures to single-cell applications. The remainder of the workshop will guide participants through a real-life analysis of published SCP data. This workshop is meant for users familiar with R who want to learn current state-of-the-art SCP data analysis. More information is available here.

Training and transfer learning deep learning models to predict peptide property values with AlphaPeptDeep – Wen-Feng Zeng, MPI Biochemistry, DE

In this workshop, participants will develop their own deep learning models to predict peptide properties of interest using the AlphaPeptDeep Python package. This hackathon will provide different kinds of training data for model training and transfer learning. These data include RT, CCS, charge state, and MS2 for regular peptides and modified peptides. Participants can also prepare their own training data. For details, please check here.

BRAIN, Pointless4Peptides, MIND and QCQuan – Dirk Valkenborg, Frédérique Vilenne and Piotr Prostko, Hasselt University, BE

BRAIN (45 Minutes) by Dirk Valkenborg and Frédérique Vilenne – The workshop starts out with a short 15-minute presentation to introduce attendees to our BRAIN algorithm. BRAIN, or Baffling Recursive Algorithm for Isotopic distributioN is an algorithm to accurately calculate the aggregated isotope distribution from an elemental composition using a polynomial method. After the introductory presentation, a hands-on case study of 30 minutes on using BRAIN is provided in the workshop through Jupyter Notebooks. During the practical session, the participants will work with a theoretical database and learn the basics of the isotope distributions through BRAIN in the R programming language.

Pointless4Peptides (45 Minutes) by Dirk Valkenborg and Frédérique Vilenne – After an initial introduction to the isotope distribution, we extend further on the topic through Pointless4Peptides. Pointless4Peptides is a novel algorithm to predict the isotope distribution of average peptides based on their monoisotopic mass. It employs penalized spline regression to model the isotope distribution as a function of the monoisotopic mass. The combination of a compositional data representation with penalized spline regression offers a precise and flexible method to model the isotope probabilities. Additionally, acknowledging the importance of Sulphur in the isotope distribution, the model allows for highly accurate detection of Sulphur present in the molecule. After a short introductory presentation of 15 minutes, the participants will have a 30-minute hands-on session on working with Pointless4Peptides, which is available as an online tool and as an R-function. Using the previously accomplished results from BRAIN, the predictive capabilities of Pointless4Peptides are further explored and validated.

MIND (45 Minutes) by Piotr Prostko – Stepping further outside of the theoretical framework, we present MIND. A short introductory presentation of 15 minutes is given, elaborating on the theory behind MIND. MIND, or MonoIsotopic liNear preDictor, is a framework to accurately predict the monoisotopic mass of a precursor peptide, utilizing the most abundant isotope peak. After the presentation, a workshop of 30 minutes is provided to explore MIND in a practical context.

QCQuan (45 Minutes) by Dirk Valkenborg and Frédérique Vilenne – The final part of the workshop will be stepping outside of the isotopic framework. QCQuan is an online web application that automatically provides the user with an exploratory and quality control analysis and a differential expression analysis of a quantitative label-based proteomics experiment. An introductory presentation of 15 minutes and a hands-on session of 30 minutes are given.

Keynotes

Robbin Bouwmeester
Ghent University / VIB, BE

Rethinking the space race in proteomics informatics; are we using the right metrics?

Machine learning for predicting the LC-MS behaviour of peptides has become widespread in the field of computational proteomics. These predictions find utility across various applications such as experimental design, rescoring, spectral library creation, and many more. Despite the improvement in commonly used metrics (such as number of PSMs or quantified proteins) when incorporating machine learning, there is disagreement on metrics to quantify this improvement. For example, when new models are introduced, their prediction accuracy in terms of the spectral angle or Pearson correlation is compared with the current state of the art. However, making a fair comparison is difficult and does not always help your chances of getting published (unfortunately). Furthermore, while an improvement in prediction accuracy or more identified PSMs can indicate a potential to better answer biological questions, in many cases more accurate predictions will not influence the quality of answers to our biological questions. Are we hyper-focused on the accuracy of our models, and are we thus wasting valuable research time? In this talk I will present three practical cases where metrics the field commonly uses are not indicative of the quality of answers to biological questions.
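
For readers unfamiliar with the two accuracy metrics mentioned in the abstract, the following Python sketch shows one common way to compute them for a predicted and an observed fragment intensity vector. The normalized spectral angle is written here as 1 - 2·arccos(cosine similarity)/π, which is a widely used definition but an assumption about what a given paper means by "spectral angle"; the function names and example intensities are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spectral_angle(x, y):
    """Normalized spectral angle: 1 for identical direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical observed and predicted fragment intensities.
observed = [0.0, 0.2, 1.0, 0.4, 0.1]
predicted = [0.05, 0.25, 0.9, 0.5, 0.0]
print("Pearson:       ", pearson(observed, predicted))
print("Spectral angle:", spectral_angle(observed, predicted))
```

The two metrics respond differently to the same errors (Pearson centers the vectors, the spectral angle does not), which is one reason why comparisons across papers can be hard to interpret.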

Simplified and Automated Analysis of Large-Scale Proteomics Datasets

Analysis of large-scale proteomics datasets involves several steps including quality control (QC), search, quantitation, visualization and reporting. Here, we demonstrate implementing these steps using an easy-to-use, interactive, customizable, web-based next-generation analysis platform for deeper proteomics insights. The highly scalable platform supports standard and custom workflows and robust pipelines, and enables reproducible, value-driven data transformation and data visualization by leveraging supercomputing for higher throughput. The talk will cover the following aspects: 1) a user interface that can be standardized or customized for diverse project needs, 2) the ability to plug in any tool to work with a complex web of dynamic pipelines, and 3) visualization and reporting of proteomics datasets to identify biological patterns and trends.

Thomas Burger

CNRS, FR

Thomas Burger is a CNRS senior scientist specialized in statistical and computational methodologies to improve knowledge extraction from high-throughput mass spectrometry based proteomics data. He holds two MS degrees in computer sciences and in applied mathematics (in 2004), a PhD in pattern recognition (in 2007) and a Habilitation thesis (in 2017), all from Grenoble Alpes University (France). Thomas was an associate professor in machine learning with South Brittany University for three years, before rushing back to his beloved mountains in 2011, as a CNRS scientist. Since then, he has been affiliated with EDyP, a joint lab with Grenoble Alpes University, CNRS, CEA and INSERM. His research group is essentially focused on theoretical questions underlying missing value imputation, multi-omics data fusion and false discovery rate control, while maintaining and developing the Prostar software suite for the statistical analysis of label-free proteomics data.

Cecilia Lindskog

Uppsala University, SE

Cecilia Lindskog is Associate Professor in experimental pathology and head of her research group at the Faculty of Medicine at Uppsala University in Sweden. Her research focuses on integrating transcriptomics and antibody-based proteomics with the purpose of linking cell type specificity with function and disease mechanisms. Additionally, she is leading the tissue-based profiling of the Human Protein Atlas.

Ben Neely

NIST, US

Ben Neely is a research chemist with the National Institute of Standards and Technology in Charleston, South Carolina (NIST-Charleston) focused on analytical biochemistry and bioinformatics. His diverse background includes microbiology, wildlife disease, cancer biology, biomarker discovery and validation (protein and glycan) and bioinformatics. His focus is primarily bottom-up proteomic methods and data analysis (DDA and DIA, and metaproteomics). He leads the Comparative Mammalian Proteome Aggregator Resource (CoMPARe) Program, generating standardized proteomic data across non-model species. Approaching this problem requires genome sequencing and annotation, understanding optimum search space construction, and implementing quality control metrics with and without known fasta (including generative ML applications). These solutions have helped drive techniques to identify unknown species, and to apply bottom-up proteomics in complex samples such as host-virus-vector systems.

Nils Hoffmann

FZ Jülich, DE

Nils studied computer science and developed an interest in biology, chemistry, and mass spectrometry during his time at Bielefeld University. After completing his PhD on processing GCxGC-MS metabolomics data, he transitioned to the IT industry, where he contributed to maintaining and developing search engines for the European Patent Office. Nils later returned to the scientific field, joining ISAS Dortmund. He currently coordinates activities for the de.NBI LIFS consortium and works at Forschungszentrum Jülich, sharing duties in the development and operations of the de.NBI Cloud portal and coordinating the German ELIXIR node activities related to cloud computing and interoperability.

Vadim Demichev

Institute of Biochemistry, Charité, DE

Vadim Demichev’s laboratory for Quantitative Proteomics, Charité – Universitätsmedizin Berlin, works on the development and application of new mass spectrometry methods to boost the speed, sensitivity, reliability and quantitative accuracy of proteomics. Our recent developments include the QuantUMS concept for machine-learning guided quantification in proteomics, the plexDIA platform for multiplexed data-independent acquisition proteomics and the Slice-PASEF technology that maximises the MS/MS duty cycle and hence the sensitivity and precision of the mass spectrometry method. We also develop the DIA-NN algorithm suite for proteomics data processing.

Juan Antonio Vizcaíno

EMBL-EBI, UK

Dr. Juan Antonio Vizcaíno leads the Proteomics Team at the European Bioinformatics Institute (EMBL-EBI, Cambridge, UK). His group is responsible for the development of the PRIDE database (https://www.ebi.ac.uk/pride/), the world-leading public repository for mass spectrometry (MS) proteomics data, and related tools and resources. In addition, he co-founded and coordinates the ProteomeXchange Consortium (http://www.proteomexchange.org/), standardizing data submission and dissemination in proteomics resources worldwide. He actively promotes open data policies in the proteomics field and has participated in many studies where public proteomics datasets are reused for different purposes. Additionally, over the years, he has heavily contributed to the development of open proteomics data standard formats and related software, under the umbrella of the HUPO Proteomics Standards Initiative (PSI). He also co-leads the ELIXIR Proteomics Community (https://elixir-europe.org/communities/proteomics) in Europe. Furthermore, he has co-organised the annual Proteomics Bioinformatics course at EMBL-EBI since 2009.

Ben Orsburn (Special Guest)

Johns Hopkins University School of Medicine, US

Ben Orsburn is a Principal Investigator at the Johns Hopkins University School of Medicine in the Department of Pharmacology. His research centers on measuring atypical biomolecules of interest to human health. He spends most of his free time engaged in scientific communications through projects like the News In Proteomics blog, the Proteomics Show podcast, and as chair of the US HUPO Virtual Media Organization.

Laurent Gatto

De Duve Institute / UCLouvain, BE

Laurent is an Associate Professor of Bioinformatics at the de Duve Institute, UCLouvain, in Belgium. His research group focuses on the development and application of statistical learning for the analysis, integration and comprehension of large-scale biological data. The development and publication of scientific software is an integral part of the lab's work, as reflected by their numerous contributions to the Bioconductor project. Laurent is an avid open and reproducible research advocate, making his research outputs openly available. He is a Software Sustainability Institute fellow, a Data and Software Carpentry instructor and a member of the Bioconductor technical advisory board.

Wen-Feng Zeng

MPI Biochemistry, DE

Dr. Wen-Feng Zeng obtained his PhD at the pFind Lab at the Institute of Computing Technology, Chinese Academy of Sciences. After working as an assistant researcher in the pFind Lab, he joined the Mann Lab at the Max Planck Institute of Biochemistry as a postdoctoral researcher. His research interests include computational proteomics/glycoproteomics and deep learning in proteomics.

Dirk Valkenborg

Hasselt University, BE

Dirk Valkenborg is team leader of BIER-lab (Bioinformatics, Intelligence, Exploration and Research) at Hasselt University, where he is affiliated with the Data Science Institute and the Center for Statistics. With his education in engineering, biostatistics, and mathematics and an interest in biology and clinical research, he develops theories and applications for today’s problems in biotechnology and human health. His research is centered on data processing, statistical analysis of various ‘omics’ data, and integrating data workflows.

Robbin Bouwmeester

Ghent University / VIB, BE

Robbin is a researcher dedicated to studying peptide and small molecule behaviour in liquid chromatography, ion mobility, and mass spectrometry. He completed his PhD in 2020 as part of the H2020 project MASSTRPLAN. Following his PhD, Robbin took on the role of a postdoctoral scientist at Johnson & Johnson and the Flemish Institute for Biotechnology (VIB). There, he applied his knowledge to the development of machine learning models designed to improve quality control processes in pharmaceutical workflows.

Niveda Sundararaman

Cedars-Sinai, US

Niveda Sundararaman is the Bioinformatics lead at the Van Eyk Lab at Cedars-Sinai Medical Center, Los Angeles, CA. She completed her M.Sc. in bioinformatics and computation at Georgia Tech. Her focus is primarily on DIA bottom-up proteomic data analysis through the development of bioinformatics algorithms, pipelines and visualization tools to handle large-scale mass spectrometry datasets.