Implementing a Proteomics Data Pipeline and Database on the LabKey Server Platform, with OpenSlice, to Promote In-Depth Analysis, Data Sharing and Integration

Wen Yu, Principal Scientist, MedImmune, a Member of the AstraZeneca Group


A new generation of proteomics technology, using high-resolution, fast tandem mass spectrometry coupled to multiplexed quantitation techniques, can routinely quantify most of a proteome (~8000 proteins in cell lysates, or more than 1000 proteins in plasma). After careful evaluation, an open-source platform, LabKey, was introduced to manage the proteomics and other data. In this presentation, we will describe the overall approach, data architecture, pipeline and various customizations made possible by the power and flexibility of the LabKey platform.

One of the first workflows being implemented is TMT-based multiplexed quantitation of the total proteome. Following data acquisition and processing in Proteome Discoverer, the experimental design, peptide and protein identifications, and quantitation are imported into LabKey, where a custom-built data ingestion pipeline written in R transforms the data and prepares them for deposition in a Microsoft SQL Server database. Additional workflows will be implemented to support label-free quantitation by MaxQuant. Targeted quantitation via MRM will also be supported through Skyline/Panorama integration in LabKey.
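As a rough illustration of the kind of transformation such an ingestion step might perform, the sketch below reshapes a wide protein table (one column per TMT channel) into long rows suited to a relational table. It is written in Python for brevity and is not the R pipeline described above. The column names, the `Abundance ` prefix, the sample data and the `wide_to_long` helper are all assumptions made for this example.

```python
import csv
import io

# Hypothetical wide-format export: one row per protein, one column per TMT channel.
# Real export column names will differ; these are placeholders for illustration.
WIDE_EXPORT = """Accession,Description,Abundance 126,Abundance 127N,Abundance 127C
P12345,Example protein A,1000.0,1200.5,
Q67890,Example protein B,250.0,300.0,275.0
"""

CHANNEL_PREFIX = "Abundance "  # assumed naming convention for channel columns


def wide_to_long(text):
    """Reshape wide channel columns into one row per (protein, channel)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if not column.startswith(CHANNEL_PREFIX):
                continue  # skip annotation columns such as Description
            if value is None or value.strip() == "":
                continue  # skip missing measurements instead of storing zeros
            rows.append({
                "accession": record["Accession"],
                "channel": column[len(CHANNEL_PREFIX):],
                "abundance": float(value),
            })
    return rows


long_rows = wide_to_long(WIDE_EXPORT)
for row in long_rows:
    print(row)
```

A long layout like this keeps the table schema fixed when the number of channels changes between experiments, which is one common reason to reshape data before loading it into a database.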

One of the key strengths of LabKey is the flexibility of its custom queries, visualizations and reports, which can be built with SQL/R or through a point-and-click interface. For example, box plots and volcano plots can be readily generated in LabKey and shared with other researchers. Once a study is established in LabKey, its experimental design, LC-MS/MS runs, protein identifications and quantitation can be inspected via the web interface as data grids or plots. To visualize the raw MS and MS/MS data, another open-source program, OpenSlice, was adopted. It pre-processes the raw files to allow instantaneous review of spectra and XIC traces. We have integrated OpenSlice with LabKey to enable drill-down from summary-level results to the underlying experimental evidence.
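To make the volcano plot mentioned above concrete, the sketch below computes the two coordinates usually plotted for each protein (log2 fold change and -log10 p-value) and flags proteins that pass both cutoffs. It is a minimal illustration in Python, not a LabKey report. The input values, the cutoffs and the `volcano_points` helper are hypothetical, and the p-values are assumed to come from an upstream statistical test.

```python
import math

# Hypothetical per-protein summary: (accession, mean treated, mean control, p-value).
RESULTS = [
    ("P1", 400.0, 100.0, 0.001),
    ("P2", 100.0, 100.0, 0.50),
    ("P3", 50.0, 200.0, 0.01),
    ("P4", 300.0, 100.0, 0.20),
]


def volcano_points(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (accession, log2 fold change, -log10 p, status) for each protein."""
    points = []
    for accession, treated, control, p_value in results:
        log2_fc = math.log2(treated / control)   # x-axis of the volcano plot
        neg_log10_p = -math.log10(p_value)       # y-axis of the volcano plot
        if p_value < p_cutoff and log2_fc >= fc_cutoff:
            status = "up"
        elif p_value < p_cutoff and log2_fc <= -fc_cutoff:
            status = "down"
        else:
            status = "ns"  # not significant under the assumed cutoffs
        points.append((accession, log2_fc, neg_log10_p, status))
    return points


points = volcano_points(RESULTS)
for point in points:
    print(point)
```

The cutoffs here (a two-fold change and p < 0.05) are conventional defaults chosen for the example; an actual report would likely expose them as parameters.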

