Recipe 2.2 - Customizing the MGF reader
Problem
You want to customize MgfReader or MgfReaderGeneric to parse new tags and/or return a custom spectrum object.
Solution
MgfReaderGeneric is designed for customization: you extend this class and override the methods that handle and parse specific parts of the MGF document.
Parsing TITLE tag
If you need information from the TITLE tag, you can access the whole TITLE value from the Metadata comments. You can also provide your own implementation of TitleParser and give it to the MgfReader, which delegates the parsing of the TITLE tag to its TitleParsers.
Default implementations of TitleParser are found in the package org.expasy.mzjava.proteomics.io.ms.reader.mgf and are automatically loaded by ServiceProvider.
A TitleParser instance just needs to be added to the MgfReader. The following example uses the implementing class RegexScanNumTitleParser, which matches the data you need with a regular expression:
String entry =
"BEGIN IONS\n" +
"TITLE=scan 1\n" +
"PEPMASS=822.000\t946967.268\n" +
"CHARGE=2+\n" +
"198.053 38 0\n" +
"199.141 34 0\n" +
"711.524 29 0\n" +
"715.513 48 0\n" +
"738.915 1.86E2 0\n" +
"739.374 81 1\n" +
"743.954 37 0\n" +
"744.386 88 0\n" +
"7.51116E2 90 0\n" +
"768.535 462 0\n" +
"771.675 169 1\n" +
"772.725 70 0\n" +
"END IONS";
MgfReader reader = new MgfReader(new StringReader(entry), URI.create("file:/one_entry.mgf"), PeakList.Precision.DOUBLE);
// extracts the scan number from the TITLE tag and adds it to the metadata of the parsed spectrum
// the pattern has to match the TITLE value of the entry (here "scan 1")
TitleParser titleParser = new RegexScanNumTitleParser(Pattern.compile("scan\\s+(\\d+).*"));
// just add it to the MgfReader instance
reader.addTitleParser(titleParser);
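To make the role of the capture group concrete, here is a minimal, library-independent sketch of pulling a scan number out of a TITLE value with java.util.regex. The class ScanNumberTitleDemo and its method extractScanNumber are hypothetical names used for illustration only and are not part of MzJava; the pattern assumes titles of the form "scan <number>", as in the sample entry, and whole-string matching. How RegexScanNumTitleParser applies its pattern internally may differ.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an MzJava class): extracts a scan number from a TITLE value. */
class ScanNumberTitleDemo {

    // Assumed title layout: "scan <number>", optionally followed by other text.
    private static final Pattern SCAN_PATTERN = Pattern.compile("scan\\s+(\\d+).*");

    /** Returns the scan number captured by group 1, or empty if the title does not match. */
    static OptionalInt extractScanNumber(String title) {
        Matcher matcher = SCAN_PATTERN.matcher(title);
        if (matcher.matches()) {
            return OptionalInt.of(Integer.parseInt(matcher.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractScanNumber("scan 1"));
        System.out.println(extractScanNumber("no scan info"));
    }
}
```

A title that does not fit the pattern yields an empty result, so a mismatch between the pattern and the actual TITLE format is easy to detect before wiring the pattern into a reader.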
Here is how to write a custom TitleParser:
TitleParser titleParser = new TitleParser() {
    @Override
    public boolean parseTitle(String title, MsnSpectrum spectrum) {
        // process the information from TITLE and store it in the spectrum
        return true;
    }
};
// add it to the MgfReader instance
reader.addTitleParser(titleParser);
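The delegation of TITLE parsing to registered parsers can be pictured with the following self-contained sketch. Everything in it is a hypothetical stand-in: SimpleSpectrum and SimpleTitleParser only mirror the shape of MsnSpectrum and TitleParser, and the rule "try the parsers in registration order and stop at the first one that returns true" is an assumption made for illustration, not a statement about how MgfReader is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a reader that delegates TITLE parsing to a list of parsers. */
class TitleParserChainDemo {

    /** Stand-in for a spectrum: it only stores metadata as key/value pairs. */
    static final class SimpleSpectrum {
        final Map<String, String> metaData = new HashMap<>();
    }

    /** Stand-in mirroring the shape of parseTitle(String, MsnSpectrum). */
    interface SimpleTitleParser {
        boolean parseTitle(String title, SimpleSpectrum spectrum);
    }

    private final List<SimpleTitleParser> parsers = new ArrayList<>();

    void addTitleParser(SimpleTitleParser parser) {
        parsers.add(parser);
    }

    /** Assumed policy: try parsers in registration order, stop at the first success. */
    boolean parseTitle(String title, SimpleSpectrum spectrum) {
        for (SimpleTitleParser parser : parsers) {
            if (parser.parseTitle(title, spectrum)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TitleParserChainDemo chain = new TitleParserChainDemo();
        // First parser rejects every title, second one stores the raw title.
        chain.addTitleParser((title, spectrum) -> false);
        chain.addTitleParser((title, spectrum) -> {
            spectrum.metaData.put("title", title);
            return true;
        });
        SimpleSpectrum spectrum = new SimpleSpectrum();
        System.out.println(chain.parseTitle("scan 1", spectrum));
        System.out.println(spectrum.metaData);
    }
}
```

The boolean return value is what lets a parser signal "this title is not mine", which is why a custom parseTitle() should return true only when it actually recognised the title.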
Parsing custom tags
MgfReader also provides default implementations for parsing the different parts of an MGF entry (CHARGE, PEPMASS, SCANS, RTINSECONDS and other, unknown tags).
These methods are meant to be overridden by subclasses to customize the parsing of any part of the entry.
To customize the parsing of exotic tags, override the method parseUnknownTag():
// here is the content of an entry
String entry =
"BEGIN IONS\n" +
"TITLE=scan 1\n" +
"PEPMASS=822.000\t946967.268\n" +
"CHARGE=2+\n" +
"MYTAG=my content\n" +
"198.053 38 0\n" +
"199.141 34 0\n" +
"711.524 29 0\n" +
"715.513 48 0\n" +
"738.915 1.86E2 0\n" +
"739.374 81 1\n" +
"743.954 37 0\n" +
"744.386 88 0\n" +
"7.51116E2 90 0\n" +
"768.535 462 0\n" +
"771.675 169 1\n" +
"772.725 70 0\n" +
"END IONS";
// create an anonymous subclass of MgfReader
MgfReader reader = new MgfReader(new StringReader(entry), URI.create("file:/tmp/one_entry.mgf"), PeakList.Precision.DOUBLE) {
    // override the method parseUnknownTag() to handle the custom tag
    @Override
    protected boolean parseUnknownTag(String tag, String value, MsnSpectrum spectrum) {
        // MYTAG=my tag value
        if (tag.startsWith("MYTAG")) return parseMyTag(value, spectrum);
        else return super.parseUnknownTag(tag, value, spectrum);
    }
    private boolean parseMyTag(String tagValue, MsnSpectrum spectrum) {
        // parse "my tag value" here
        return true;
    }
};
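The override-and-fall-back pattern used above can be tried without the library. The sketch below is a simplified stand-in, not MzJava code: DefaultTagHandler, MyTagHandler and handleLine are hypothetical names, and splitting a line at the first '=' is an assumption about how a "TAG=value" line is broken into tag and value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of dispatching "TAG=value" lines to an overridable handler. */
class UnknownTagDemo {

    /** Stand-in for the default behaviour of a reader. */
    static class DefaultTagHandler {
        // Values captured from recognised tags, keyed by tag name.
        final Map<String, String> captured = new LinkedHashMap<>();

        /** Default: the tag is not recognised, so nothing is stored. */
        boolean parseUnknownTag(String tag, String value) {
            return false;
        }

        /** Splits a "TAG=value" line at the first '=' and dispatches it. */
        boolean handleLine(String line) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                return false; // not a tag line, for example a peak line
            }
            return parseUnknownTag(line.substring(0, eq), line.substring(eq + 1));
        }
    }

    /** Handles MYTAG itself and leaves every other tag to the default handler. */
    static class MyTagHandler extends DefaultTagHandler {
        @Override
        boolean parseUnknownTag(String tag, String value) {
            if (tag.equals("MYTAG")) {
                captured.put(tag, value);
                return true;
            }
            return super.parseUnknownTag(tag, value);
        }
    }

    public static void main(String[] args) {
        MyTagHandler handler = new MyTagHandler();
        System.out.println(handler.handleLine("MYTAG=my content"));
        System.out.println(handler.handleLine("OTHER=ignored"));
        System.out.println(handler.captured);
    }
}
```

Calling super for every tag the subclass does not recognise keeps the default behaviour intact, so adding one custom tag does not silently change how the remaining tags are treated.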
When you need to store information that MsnSpectrum cannot hold, you should define your own custom MsnSpectrum and create a reader that extends MgfReaderGeneric. In that case you have to provide an implementation of newSpectrum():
String mgf =
"BEGIN IONS\n" +
"TITLE=scan 1\n" +
"PEPMASS=822.000\t946967.268\n" +
"CHARGE=2+\n" +
"MYTAG=my content\n" +
"198.053 38 0\n" +
"199.141 34 0\n" +
"711.524 29 0\n" +
"715.513 48 0\n" +
"738.915 1.86E2 0\n" +
"739.374 81 1\n" +
"743.954 37 0\n" +
"744.386 88 0\n" +
"7.51116E2 90 0\n" +
"768.535 462 0\n" +
"771.675 169 1\n" +
"772.725 70 0\n" +
"END IONS";
// create an anonymous subclass of AbstractMgfReader
AbstractMgfReader<PeakAnnotation, CustomMsnSpectrum> reader = new AbstractMgfReader<PeakAnnotation, CustomMsnSpectrum>(
        new StringReader(mgf), new URIBuilder("www.expasy.ch", "test").build(), PeakList.Precision.DOUBLE, new PeakProcessorChain<>()) {
    // create the custom spectrum instead of the default one
    @Override
    protected CustomMsnSpectrum newSpectrum(AbstractMsReader.ParseContext context, PeakList.Precision precision) {
        return new CustomMsnSpectrum(precision);
    }
    @Override
    protected boolean parseUnknownTag(String tag, String value, CustomMsnSpectrum spectrum) {
        if (tag.startsWith("MYTAG")) return parseMyTag(value, spectrum);
        else return super.parseUnknownTag(tag, value, spectrum);
    }
    private boolean parseMyTag(String tagValue, CustomMsnSpectrum spectrum) {
        // store the tag value in the custom spectrum
        spectrum.setMyTagValue(tagValue);
        return true;
    }
    @Override
    protected boolean parseTitleTag(String value, CustomMsnSpectrum spectrum) {
        return false;
    }
    @Override
    protected void setRetentionTimes(CustomMsnSpectrum spectrum, RetentionTimeList retentionTimeList) {
        spectrum.addRetentionTimes(retentionTimeList);
    }
    @Override
    protected void setScanNumbers(CustomMsnSpectrum spectrum, ScanNumberList scanNumbers) {
        spectrum.addScanNumbers(scanNumbers);
    }
};
// next() returns your custom spectrum
CustomMsnSpectrum spectrum = reader.next();
// your custom spectrum object returned by your parser
public static class CustomMsnSpectrum extends MsnSpectrum {
    private String myTagValue;
    public CustomMsnSpectrum(Precision precision) {
        super(precision);
    }
    public void setMyTagValue(String myTagValue) {
        this.myTagValue = myTagValue;
    }
    // getter added so that the stored value can be read back
    public String getMyTagValue() {
        return myTagValue;
    }
}
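The idea behind newSpectrum(), a generic reader whose subclass decides which spectrum type gets created, can be sketched in isolation as follows. All types here (BaseSpectrum, TaggedSpectrum, EntryReader) are hypothetical stand-ins and far simpler than the MzJava classes: every "TAG=value" line is routed to parseUnknownTag(), and a peak line is assumed to start with an m/z value followed by an intensity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a generic reader with a spectrum factory method. */
class GenericReaderDemo {

    /** Stand-in for a basic spectrum: a list of {mz, intensity} pairs. */
    static class BaseSpectrum {
        final List<double[]> peaks = new ArrayList<>();
    }

    /** Custom spectrum that can additionally hold the value of MYTAG. */
    static class TaggedSpectrum extends BaseSpectrum {
        private String myTagValue;

        String getMyTagValue() { return myTagValue; }

        void setMyTagValue(String myTagValue) { this.myTagValue = myTagValue; }
    }

    /** Generic skeleton: subclasses choose the spectrum type via newSpectrum(). */
    abstract static class EntryReader<S extends BaseSpectrum> {

        protected abstract S newSpectrum();

        /** Default: unknown tags are ignored. */
        protected boolean parseUnknownTag(String tag, String value, S spectrum) {
            return false;
        }

        S read(String entry) {
            S spectrum = newSpectrum();
            for (String line : entry.split("\n")) {
                if (line.isEmpty() || line.equals("BEGIN IONS") || line.equals("END IONS")) {
                    continue;
                }
                int eq = line.indexOf('=');
                if (eq > 0) {
                    // In this sketch every tag is treated as "unknown".
                    parseUnknownTag(line.substring(0, eq), line.substring(eq + 1), spectrum);
                } else {
                    String[] cols = line.trim().split("\\s+");
                    spectrum.peaks.add(new double[] {
                            Double.parseDouble(cols[0]), Double.parseDouble(cols[1]) });
                }
            }
            return spectrum;
        }
    }

    /** Builds a reader that produces TaggedSpectrum objects and understands MYTAG. */
    static EntryReader<TaggedSpectrum> newTaggedReader() {
        return new EntryReader<TaggedSpectrum>() {
            @Override
            protected TaggedSpectrum newSpectrum() {
                return new TaggedSpectrum();
            }

            @Override
            protected boolean parseUnknownTag(String tag, String value, TaggedSpectrum spectrum) {
                if (tag.equals("MYTAG")) {
                    spectrum.setMyTagValue(value);
                    return true;
                }
                return super.parseUnknownTag(tag, value, spectrum);
            }
        };
    }

    public static void main(String[] args) {
        String entry = "BEGIN IONS\nTITLE=scan 1\nMYTAG=my content\n198.053 38 0\nEND IONS";
        TaggedSpectrum spectrum = newTaggedReader().read(entry);
        System.out.println(spectrum.getMyTagValue() + ", peaks: " + spectrum.peaks.size());
    }
}
```

Because the spectrum type is a type parameter of the reader, the overridden parseUnknownTag() receives the custom type directly and can call setMyTagValue() without a cast.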
Discussion
See also
Recipe 2.1