SIB

Recipe 2.1 - Reading Mass Spectrometry data

Problem

You want to read MS data iteratively from a character stream.

Solution

Use one of our readers, which implement IterativeReader and extend AbstractMsReader, to go through each parsed spectrum with hasNext() and next().

In this recipe we use the MGF and mzXML readers as examples.

Building MS Readers

A reader can be created from a java.io.File:


String mgfFilename = "mgf_test.mgf";
MgfReader reader = new MgfReader(new File(mgfFilename), PeakList.Precision.DOUBLE);

or from a java.io.Reader, together with a URI identifying the data source:


Reader sr = new StringReader("BEGIN IONS\n" +
        "TITLE=B06-1151_p.00478.00478.2\n" +
        "PEPMASS=476.23272705078125\n" +
        "CHARGE=2\n" +
        "135.032\t1.376\n" +
        "146.042\t5.997\n" +
        "158.052\t25.335\n" +
        "175.047\t202.932\n" +
        "186.047\t27.338\n" +
        "203.140\t60.595\n" +
        "213.230\t5.732\n" +
        "221.182\t1.083\n" +
        "231.047\t1159.671...");

MgfReader reader = new MgfReader(sr, new URI("http://somewhere.ch/mymsdata.mgf"), PeakList.Precision.DOUBLE);

We also provide static factory methods that create the appropriate reader for a given file type:


IterativeReader reader =
        MsReaderFactory.newMsnSpectraReader(new File(mgfFilename), PeakList.Precision.DOUBLE);
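A factory like this has to pick a reader from the file type. The sketch below assumes the choice is made from the file extension alone; MsReaderFactory may rely on other cues, and the `MsFormat` enum and `detectFormat` method are hypothetical names, not part of the library.

```java
import java.io.File;
import java.util.Locale;

public class FormatDetection {

    // Hypothetical set of formats; not the library's own type
    enum MsFormat { MGF, MZXML }

    // Chooses a format from the file extension (case-insensitive)
    static MsFormat detectFormat(File file) {
        String name = file.getName().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("No file extension: " + file);
        }
        switch (name.substring(dot + 1)) {
            case "mgf":
                return MsFormat.MGF;
            case "mzxml":
                return MsFormat.MZXML;
            default:
                throw new IllegalArgumentException("Unsupported format: " + file);
        }
    }

    public static void main(String[] args) {
        System.out.println(detectFormat(new File("mgf_test.mgf")));     // MGF
        System.out.println(detectFormat(new File("mzxml_test.mzXML"))); // MZXML
    }
}
```

Lower-casing the name first means "mzXML", "MZXML" and "mzxml" all map to the same reader.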

Some MS formats store metadata that may be inconsistent with the spectra actually read. For example, the mzXML format provides attributes such as "peaksCount" or "totIonCurrent" that can conflict with the values computed from the decoded spectrum. MzxmlReader lets you control these consistency checks:


String mzxmlFilename = "mzxml_test.mzXML";

// the default constructor strictly checks for inconsistencies
MzxmlReader reader = new MzxmlReader(new File(mzxmlFilename), PeakList.Precision.DOUBLE);

// checks can then be added or removed; here every check except TOTAL_ION_CURRENT is removed
reader.removeConsistencyChecks(EnumSet.complementOf(EnumSet.of(MzxmlReader.ConsistencyCheck.TOTAL_ION_CURRENT)));
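A consistency check of this kind can be pictured as comparing the declared attributes with values recomputed from the decoded peaks. The stand-alone sketch below is a hypothetical illustration: the `Check` enum, its constants and the 1% relative tolerance are assumptions and do not reproduce MzxmlReader's internal logic. It also shows what the `EnumSet.complementOf` call above does: removing the complement of `{TOTAL_ION_CURRENT}` leaves only the total-ion-current check enabled.

```java
import java.util.EnumSet;
import java.util.Set;

public class ConsistencySketch {

    // Hypothetical checks, named after the mzXML attributes they verify
    enum Check { PEAKS_COUNT, TOTAL_ION_CURRENT }

    // Returns true if every enabled check passes for one scan
    static boolean isConsistent(Set<Check> enabled,
                                int declaredPeaksCount, double declaredTic,
                                double[] intensities) {
        if (enabled.contains(Check.PEAKS_COUNT)
                && declaredPeaksCount != intensities.length) {
            return false;
        }
        if (enabled.contains(Check.TOTAL_ION_CURRENT)) {
            double tic = 0;
            for (double intensity : intensities) {
                tic += intensity; // total ion current = sum of intensities
            }
            // assumed 1% relative tolerance to absorb rounding in the file
            if (Math.abs(tic - declaredTic) > 0.01 * Math.abs(declaredTic)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Check> enabled = EnumSet.allOf(Check.class); // strict: every check on
        // mirrors removeConsistencyChecks(complementOf(of(TOTAL_ION_CURRENT)))
        enabled.removeAll(EnumSet.complementOf(EnumSet.of(Check.TOTAL_ION_CURRENT)));
        System.out.println(enabled); // [TOTAL_ION_CURRENT]

        double[] intensities = {1.0, 2.0, 3.0};
        // declared peaksCount is wrong (5 vs 3), but that check is now disabled
        System.out.println(isConsistent(enabled, 5, 6.0, intensities)); // true
    }
}
```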

MzxmlReader also provides static factory methods:


MzxmlReader tolerantReader = MzxmlReader.newTolerantReader(new File(mzxmlFilename), PeakList.Precision.DOUBLE);
MzxmlReader strictReader = MzxmlReader.newStrictReader(new File(mzxmlFilename), PeakList.Precision.DOUBLE);

Defining processing applied while reading

A PeakProcessorChain can be passed to the reader so that each spectrum is processed as it is read:


// this filter only retains peaks with an intensity greater than 0
MzxmlReader reader = MzxmlReader.newStrictReader(new File(mzxmlFilename), PeakList.Precision.FLOAT, new PeakProcessorChain<>()
        .add(new ThresholdFilter<>(0, ThresholdFilter.Inequality.GREATER)));
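Conceptually, a threshold filter with the GREATER inequality keeps only the peaks whose intensity is strictly above the threshold. The stand-alone sketch below applies that rule to parallel m/z and intensity arrays; `filterGreaterThan` is a hypothetical helper, not the ThresholdFilter implementation, which works through the PeakProcessorChain instead.

```java
import java.util.Arrays;

public class ThresholdSketch {

    // Keeps the peaks whose intensity is strictly greater than the threshold.
    // Returns {filteredMz, filteredIntensities}.
    static double[][] filterGreaterThan(double[] mz, double[] intensities, double threshold) {
        int kept = 0;
        double[] outMz = new double[mz.length];
        double[] outIntensities = new double[mz.length];
        for (int i = 0; i < mz.length; i++) {
            if (intensities[i] > threshold) {
                outMz[kept] = mz[i];
                outIntensities[kept] = intensities[i];
                kept++;
            }
        }
        // trim the output arrays to the number of retained peaks
        return new double[][] {Arrays.copyOf(outMz, kept), Arrays.copyOf(outIntensities, kept)};
    }

    public static void main(String[] args) {
        double[] mz = {135.032, 146.042, 158.052};
        double[] intensities = {1.376, 0.0, 25.335};
        double[][] result = filterGreaterThan(mz, intensities, 0);
        System.out.println(Arrays.toString(result[0])); // [135.032, 158.052]
    }
}
```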

Reading MsnSpectrum

Spectra are read one at a time until there are none left. Here is a snippet using MgfReader:


MgfReader reader = new MgfReader(new File(mgfFilename), PeakList.Precision.FLOAT);

try {
    // hasNext() returns true if there are more spectra to read
    while (reader.hasNext()) {

        // next() returns the next spectrum, or throws an IOException if something goes wrong
        MsnSpectrum spectrum = reader.next();

        // do some stuff with your spectrum
        Assert.assertTrue(spectrum.size() > 0);
    }
} finally {
    // release the underlying resource even if reading fails
    reader.close();
}
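The hasNext()/next() contract can be implemented with one spectrum of lookahead: hasNext() parses the next BEGIN IONS ... END IONS block in advance and next() hands it over. The sketch below shows that pattern on a character stream; `MiniMgfReader` and its nested `Spectrum` are hypothetical, simplified types, not MgfReader or MsnSpectrum, and only the TITLE header and whitespace-separated "m/z intensity" peak lines are handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class MiniMgfReader implements Iterator<MiniMgfReader.Spectrum>, AutoCloseable {

    // Hypothetical, simplified spectrum: a title and a list of {mz, intensity} pairs
    static final class Spectrum {
        String title;
        final List<double[]> peaks = new ArrayList<>();
    }

    private final BufferedReader in;
    private Spectrum lookahead; // parsed but not yet returned

    MiniMgfReader(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    // Parses the next block ahead of time if none is buffered
    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            try {
                lookahead = parseNextBlock();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return lookahead != null;
    }

    @Override
    public Spectrum next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Spectrum s = lookahead;
        lookahead = null;
        return s;
    }

    // Returns the next BEGIN IONS ... END IONS block, or null at end of stream
    private Spectrum parseNextBlock() throws IOException {
        String line;
        Spectrum current = null;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.equals("BEGIN IONS")) {
                current = new Spectrum();
            } else if (line.equals("END IONS") && current != null) {
                return current;
            } else if (current != null && !line.isEmpty()) {
                if (line.startsWith("TITLE=")) {
                    current.title = line.substring("TITLE=".length());
                } else if (Character.isDigit(line.charAt(0))) {
                    // peak line assumed to be "mz<whitespace>intensity"
                    String[] fields = line.split("\\s+");
                    current.peaks.add(new double[] {
                            Double.parseDouble(fields[0]), Double.parseDouble(fields[1])});
                }
                // other headers (PEPMASS, CHARGE, ...) are ignored in this sketch
            }
        }
        return null; // no complete block left
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        String mgf = "BEGIN IONS\nTITLE=a\n135.032\t1.376\nEND IONS\n"
                + "BEGIN IONS\nTITLE=b\n146.042\t5.997\n158.052\t25.335\nEND IONS\n";
        try (MiniMgfReader reader = new MiniMgfReader(new StringReader(mgf))) {
            while (reader.hasNext()) {
                Spectrum s = reader.next();
                System.out.println(s.title + " " + s.peaks.size()); // "a 1", then "b 2"
            }
        }
    }
}
```

Wrapping IOException in UncheckedIOException is a choice made for this sketch so that it fits java.util.Iterator; the readers in this recipe report I/O failures as an IOException from next() instead.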

Discussion

See also

For more advanced use of MgfReader, see Recipe 2.2.