Mass Spectrum Analysis and Data Conversion Tool

 

Mass Spectrum Analysis and Data Conversion Tool was created by Pavel Cejnar (pavel.cejnar@vscht.cz). The software is also based on Martin Strohalm’s program mMass (https://www.mmass.org) and on Joachim Wuttke’s implementation of Levenberg-Marquardt least-squares minimization (https://joachimwuttke.de/lmfit).

Installation on MS Windows

The program is distributed as a ZIP archive containing the “ms” application folder. It provides two command-line executables, ms.exe and ms-alone.exe.

ms.exe requires exactly two parameters:

  1. the file with configuration options
  2. the file with mass spectrometry data to process

If you want to process more than one file, either use shell scripting or create a batch file (with a .bat extension) containing, on each line, a command of the form

ms.exe path_to_config_file path_to_spectrum_file >> path_to_log_file

For example, create a file convertMySpectra.bat containing these three lines:

D:\ms\ms.exe D:\ms\configs\configHighProteins.xml D:\ms\examples\bacteria\Z1-03-Analysis.mzML >> D:\ms\log.txt

D:\ms\ms.exe D:\ms\configs\configHighProteins.xml D:\ms\examples\bacteria\Z1-04-Analysis.mzML >> D:\ms\log.txt

D:\ms\ms.exe D:\ms\configs\configHighProteins.xml D:\ms\examples\bacteria\Z1-05-Analysis.mzML >> D:\ms\log.txt

Then, by double-clicking this file, you convert all three files at once: the configuration parameters are read from the file D:\ms\configs\configHighProteins.xml, and all success and failure output is appended to the log file D:\ms\log.txt.
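If you prefer scripting to a .bat file, the same loop can be written in a few lines. The following is a hypothetical Python 3 script, not part of the distribution: it reuses the example paths above, calls ms.exe once per *.mzML file in the examples folder and appends the output to the log file, much as `>>` does.

```python
import glob
import os
import subprocess

# Paths from the .bat example above; adjust them to your installation.
MS_EXE = r"D:\ms\ms.exe"
CONFIG = r"D:\ms\configs\configHighProteins.xml"
LOG = r"D:\ms\log.txt"
SPECTRA_DIR = r"D:\ms\examples\bacteria"


def build_commands(ms_exe, config, spectra):
    """One argument list per spectrum: ms.exe <config> <spectrum>."""
    return [[ms_exe, config, spectrum] for spectrum in spectra]


def run_all(commands, log_path):
    """Run the commands in turn, appending their output to the log file.

    Like ">>" in the .bat file, stdout is appended; unlike ">>", stderr
    is appended to the same log as well.
    """
    with open(log_path, "a") as log:
        for cmd in commands:
            subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    spectra = sorted(glob.glob(os.path.join(SPECTRA_DIR, "*.mzML")))
    if spectra:  # do nothing (and create no log) if no spectra are found
        run_all(build_commands(MS_EXE, CONFIG, spectra), LOG)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.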

 

ms-alone.exe requires exactly one parameter: the file with mass spectrometry data to process. It loads the configuration file config.xml from the folder containing ms-alone.exe. ms-alone.exe produces no console output; instead, for each file it processes, it appends a report to the ms-alone-log.txt and LM-log.txt files in the same folder as ms-alone.exe.

To simplify things, you can associate ms-alone.exe with files of a selected extension in MS Windows (the Open With command in the Explorer right-click context menu); a spectrum file can then be converted with a single double-click.

Current versions of ms.exe and ms-alone.exe have been tested successfully on Windows XP and on 32-bit and 64-bit Windows 7 (the program itself is a 32-bit application).

Running from Source

The Python source code was built with Python 2.7 (https://www.python.org, 32-bit Python 2.7.3 for Windows). You will also need the NumPy package (https://www.numpy.org, 32-bit numpy-1.6.2 for Python 2.7), and if you want to add GUI windows or widgets, you can use wxPython (https://www.wxpython.org, for example 32-bit unicode wxPython 2.8.12.1 for Python 2.7).

If you want to build your own executable file for MS Windows, you need py2exe (https://pypi.python.org/pypi/py2exe, 32-bit py2exe 0.6.9 for Python 2.7).

You can build the Windows executable simply by running the command “python setup.py py2exe” or “python setup-ms-alone.py py2exe” in the main “ms” folder. If you want to distribute the compiled files, make sure you bundle all dependencies, such as msvcrXX.dll, and that you have a license to distribute them.

To compile the C source files calculations.c, lmcurve.c and lmmin.c, use Microsoft Visual Studio 2008 or Microsoft Visual Studio 2010 Express. Open the command line interpreter, change to the “mspy” folder and run “python setup.py build”. Then find the resulting “calculations.pyd” file and move it back to the “mspy” folder.

Input Formats

The program supports several mass spectrometry formats, including the XML-based formats mzData (https://www.psidev.info), mzXML (https://tools.proteomecenter.org) and mzML (https://www.psidev.info), and the simple text-based Mascot Generic Format and two-column text format (*.txt). The mMass Spectrum Document format (*.msd) is also supported. Native vendor file formats are currently not supported, because manufacturers' descriptions of them are often impossible to obtain. However, if Bruker’s CompassXport tool is installed on your computer and the command “CompassXport.exe” is available on the search path, it is used automatically to convert and open raw data from all Bruker instruments. The tool is available free of charge, but only for MS Windows. The configuration file contains several options controlling the cooperation with this tool.
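The two-column text format is the simplest of these. The Python 3 sketch below shows one way such a file could be read; the accepted separators (whitespace, comma or semicolon), the decimal point and the skipping of header lines are assumptions for illustration, not a description of the program's actual parser.

```python
import numpy as np


def read_two_column_txt(lines):
    """Parse m/z and intensity pairs from an iterable of text lines.

    Assumes one data point per line, two numbers separated by whitespace,
    a comma or a semicolon, and a decimal point. Blank lines and lines
    that do not start with two numbers (e.g. a header) are skipped.
    """
    points = []
    for line in lines:
        fields = line.replace(",", " ").replace(";", " ").split()
        if len(fields) < 2:
            continue
        try:
            mz, intensity = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # header or comment line
        points.append((mz, intensity))
    data = np.array(points, dtype=float).reshape(-1, 2)
    # Keep the spectrum sorted by m/z for the later processing steps.
    return data[np.argsort(data[:, 0], kind="stable")]
```

Because it accepts any iterable of lines, an open file can be passed directly, e.g. `read_two_column_txt(open("spectrum.txt"))`.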

Data Processing

The application is configured by its XML configuration file, which is divided into several sections. The <batch> section controls which operations are executed during a run. The operations are executed in this order: cropping, smoothing, peak picking with baseline evaluation, deisotoping, exporting the peak list, baseline subtraction and exporting the spectrum. To execute an operation, set its parameter value to 1; to skip it, set the value to 0.

<batch>
    <param name="crop" value="1" type="int" />
    <param name="smoothing" value="1" type="int" />
    <param name="peakpicking" value="1" type="int" />
    <param name="deisotoping" value="1" type="int" />
    <param name="exportPeaks" value="1" type="int" />
    <param name="baseline" value="1" type="int" />
    <param name="exportSpectrum" value="1" type="int" />
</batch>
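To illustrate how the flags and the fixed execution order interact, here is a Python 3 sketch that reads a <batch> section like the one above with the standard xml.etree.ElementTree module. The root element that surrounds <batch> in the real configuration file is not described here, so the sketch accepts either a bare <batch> element or any document that contains one.

```python
import xml.etree.ElementTree as ET

# Execution order of the operations, as documented above.
OPERATION_ORDER = ["crop", "smoothing", "peakpicking", "deisotoping",
                   "exportPeaks", "baseline", "exportSpectrum"]


def enabled_operations(config_xml):
    """Return the enabled operations of the <batch> section, in execution order."""
    root = ET.fromstring(config_xml)
    batch = root if root.tag == "batch" else root.find(".//batch")
    if batch is None:
        raise ValueError("no <batch> section found")
    flags = {p.get("name"): int(p.get("value"))
             for p in batch.findall("param") if p.get("type") == "int"}
    # The order of the <param> elements in the file does not matter.
    return [name for name in OPERATION_ORDER if flags.get(name) == 1]
```

Operations missing from the section are treated as disabled in this sketch.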

The parameters of the individual operations are stored in the corresponding subsections of the <processing> section.

Cropping

This operation discards all spectrum data outside the m/z range specified by the lowMass and highMass parameters.
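A minimal NumPy sketch of this step; whether the lowMass and highMass bounds themselves are kept is an assumption (here they are inclusive).

```python
import numpy as np


def crop(mz, intensity, low_mass, high_mass):
    """Keep only the points with low_mass <= m/z <= high_mass."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = (mz >= low_mass) & (mz <= high_mass)  # boolean mask of points in range
    return mz[keep], intensity[keep]
```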

Smoothing

You can use this operation to smooth out noise that distorts the shape of the spectrum. Three smoothing methods are available: Moving Average, Gaussian and Savitzky-Golay. In general, Moving Average and Gaussian smoothing are much faster but cause significant intensity loss for sharp peaks, so they should preferably be used for high-mass spectra, where peaks are broader. The Savitzky-Golay filter, on the other hand, is very slow, but its intensity loss is much lower, so it should preferably be used for low-mass spectra, where peaks are sharp.

To choose the method, set the method parameter to “MA”, “GA” or “SG”. Set the windowSize parameter to the width of the smoothing window as an m/z interval, and the cycles parameter to the number of times the smoothing is repeated.
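The following Python 3 sketch shows how the three methods could be applied as sliding-window filters. Several details are assumptions rather than the program's documented behaviour: the m/z window is converted to an odd number of points using the median point spacing, the Gaussian weights span ±3 standard deviations of the window, the Savitzky-Golay filter uses a second-order polynomial, and the spectrum ends are padded by repeating the edge values.

```python
import numpy as np


def window_points(mz, window_size):
    """Convert an m/z window width into an odd number of points (at least 3)."""
    step = float(np.median(np.diff(mz)))  # assumes roughly uniform m/z spacing
    n = max(int(round(window_size / step)), 3)
    return n if n % 2 == 1 else n + 1


def smoothing_kernel(method, n):
    """Weights for "MA", "GA" or "SG" over a window of n points."""
    half = n // 2
    x = np.arange(-half, half + 1, dtype=float)
    if method == "MA":    # moving average: equal weights
        weights = np.ones(n)
    elif method == "GA":  # Gaussian weights; window assumed to span +-3 sigma
        weights = np.exp(-0.5 * (x / (n / 6.0)) ** 2)
    elif method == "SG":  # Savitzky-Golay with an assumed 2nd-order polynomial
        # Row 0 of the pseudo-inverse of [1, x, x^2] gives the least-squares
        # estimate of the polynomial value at the window centre.
        return np.linalg.pinv(np.vander(x, 3, increasing=True))[0]
    else:
        raise ValueError("unknown smoothing method: %r" % method)
    return weights / weights.sum()


def smooth(mz, intensity, method, window_size, cycles=1):
    """Smooth the intensities `cycles` times with the chosen method."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    n = window_points(mz, window_size)
    kernel = smoothing_kernel(method, n)
    for _ in range(cycles):
        # Repeat the edge values so the ends are not pulled towards zero.
        padded = np.pad(y, n // 2, mode="edge")
        y = np.convolve(padded, kernel[::-1], mode="valid")
    return y
```

The Savitzky-Golay kernel reproduces straight lines and parabolas exactly, which is why it flattens sharp peaks less than the averaging kernels.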

Peak Picking

If you want to find peaks in the spectra automatically, enable the peak picking operation. It is strongly recommended to apply cropping and/or smoothing first.

To filter peaks by their signal-to-noise (S/N) ratio, the baseline (zero-noise level) must be computed first. The baseline is computed as the median intensity of the signal in a selected window, which is specified by the baselinePrecision parameter. The higher the value of baselinePrecision, the shorter the window from which the baseline is computed, i.e. the more closely the baseline follows the shape of the spectrum. Set the value 1 to compute the baseline from the widest possible window, or 0 to compute a constant baseline from the whole spectrum. Set the baselineOffset parameter for a relative correction of the computed baseline, i.e. to lower the baseline by the given fraction of the computed noise deviation. Then set the snThreshold parameter; peaks below this S/N threshold are not reported.

Set the pickingHeight parameter to the relative peak height at which the m/z center of the peak, and thus its intensity at that point, is determined. Set computePeakArea to 1 if you want to compute peak areas: first, the m/z values where the relative peak intensity is above pickingHeight are determined, and then the area between them is computed. The area is therefore also affected by smoothing.
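The Python 3 sketch below illustrates the S/N filter, the picking height and the area computation for a constant baseline. The local-maximum test, the definition of the picking level as base + pickingHeight × (peak height above base), the peak center as the midpoint of the two interpolated crossings, and the area taken under the raw signal are all assumptions made for illustration; the returned keys follow the exported column names.

```python
import numpy as np


def _crossing(x0, y0, x1, y1, level):
    """m/z where the straight line between two points reaches `level`."""
    if y1 == y0:
        return x0
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def pick_peaks(mz, intensity, base, noise, sn_threshold, picking_height):
    """Local maxima with S/N >= sn_threshold, centred at picking_height."""
    x = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    last = len(y) - 1
    peaks = []
    for i in range(1, last):
        if not (y[i] > y[i - 1] and y[i] >= y[i + 1]):
            continue  # not a local maximum
        sn = (y[i] - base) / noise
        if sn < sn_threshold:
            continue  # below the S/N threshold: not reported
        level = base + picking_height * (y[i] - base)
        left, right = i, i
        while left > 0 and y[left - 1] >= level:
            left -= 1
        while right < last and y[right + 1] >= level:
            right += 1
        # Interpolated m/z values where the signal crosses the picking level.
        if left == 0:
            mz_b = x[0]
        else:
            mz_b = _crossing(x[left - 1], y[left - 1], x[left], y[left], level)
        if right == last:
            mz_e = x[last]
        else:
            mz_e = _crossing(x[right], y[right], x[right + 1], y[right + 1], level)
        center = 0.5 * (mz_b + mz_e)
        # Trapezoidal area under the signal between the two crossings.
        xs = np.concatenate(([mz_b], x[left:right + 1], [mz_e]))
        ys = np.interp(xs, x, y)
        area = float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)
        peaks.append({"mz": center, "ai": float(np.interp(center, x, y)),
                      "base": base, "sn": sn, "pickheight_b": mz_b,
                      "pickheight_e": mz_e, "area": area})
    return peaks
```

For a symmetric triangular peak with apex 10 at m/z 5 and pickingHeight 0.5, the crossings fall at 3.75 and 6.25, so the reported center is 5.0.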

To determine the peak maximum and peak area more precisely, set the peakApproximation parameter to 1. Each detected peak is then approximated by a curve of a given shape, fitted to all data points whose relative intensities are above peakApproximationDataPickingHeight. The currently supported parametrized functions (peakApproximationType) are:

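gaussian: $f(x) = a \exp\left(-\frac{(x-b)^2}{2c^2}\right)$

sumof2gaussian: $f(x) = a_1 \exp\left(-\frac{(x-b_1)^2}{2c_1^2}\right) + a_2 \exp\left(-\frac{(x-b_2)^2}{2c_2^2}\right)$

where a, b and c are the height, center and standard deviation of each Gaussian curve.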

Approximating each peak by a single Gaussian may give a better approximation of the whole spectrum when the individual curves are combined, while the sum of two Gaussians gives a more precise approximation of the peak maximum and peak area. The function parameters are determined by the Levenberg-Marquardt least-squares minimization algorithm (see [1]), and the peak center (mz), peak maximum (ai) and peak area (area) are then computed from them. The result of each Levenberg-Marquardt call is written to the LM-log.txt file. The peak approximation does not affect the spectrum written by the exportSpectrum operation.
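To show what the fit does for the single-Gaussian case, here is a basic Levenberg-Marquardt loop in Python 3 with NumPy. It is an illustrative re-implementation, not the lmfit C library the program uses, and it covers only the gaussian type; the initial guess (highest data point, width of a quarter of the data span) and the damping schedule are assumptions. The area follows from the integral of the Gaussian, a·c·√(2π).

```python
import numpy as np


def gaussian(x, a, b, c):
    """Gaussian with height a, center b and standard deviation c."""
    return a * np.exp(-((x - b) ** 2) / (2.0 * c ** 2))


def _jacobian(x, a, b, c):
    """Partial derivatives of gaussian() with respect to a, b and c."""
    g = np.exp(-((x - b) ** 2) / (2.0 * c ** 2))
    return np.column_stack((g,
                            a * g * (x - b) / c ** 2,
                            a * g * (x - b) ** 2 / c ** 3))


def fit_gaussian_lm(x, y, max_iter=200):
    """Fit one Gaussian by a basic Levenberg-Marquardt iteration."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i = int(np.argmax(y))
    # Initial guess: highest data point, width from the data span.
    p = np.array([y[i], x[i], (x[-1] - x[0]) / 4.0])
    cost = np.sum((y - gaussian(x, *p)) ** 2)
    lam = 1e-3  # damping factor
    for _ in range(max_iter):
        J = _jacobian(x, *p)
        r = y - gaussian(x, *p)
        A = J.T @ J
        # Damped normal equations with Marquardt's diagonal scaling.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        trial = p + step
        trial_cost = np.sum((y - gaussian(x, *trial)) ** 2)
        if trial_cost < cost:  # accept: move towards Gauss-Newton
            p, cost, lam = trial, trial_cost, lam / 10.0
        else:                  # reject: move towards gradient descent
            lam *= 10.0
            if lam > 1e16:
                break  # no further progress possible
    a, b, c = p[0], p[1], abs(p[2])
    return {"mz": b, "ai": a, "area": a * c * np.sqrt(2.0 * np.pi),
            "params": (a, b, c)}
```

The fitted parameters correspond to the approx_gauss1_a, approx_gauss1_b and approx_gauss1_c export columns.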

Deisotoping

The main purpose of this operation is to group the peaks into isotopic clusters and, if required, to remove isotope peaks or peaks that do not belong to any cluster. Starting from the specified maxCharge, the isotopes of every peak are searched for using the corresponding isotopic mass difference (1.00287/abs(z)) ± massTolerance. If at least one isotope is found, the peak is set as the parent (monoisotopic) peak with the current charge state. If no isotope is found, the charge state is decreased (abs(z) - 1) and the search starts again for the same peak. Because peaks may overlap, the theoretical isotopic pattern also has to be taken into account: while searching for isotopes, the intensity of every found peak is compared with its theoretical isotopic value. If the intensity matches the theoretical value ± (intTolerance * theoretical value), the peak is set as an isotope peak of the given parent peak and excluded from any subsequent search cycle. If the difference exceeds the tolerance, the peak is kept as a possible parent (monoisotopic) peak for a subsequent search cycle. By default, the isotope distance is 1.00287; you can change it by setting the isotopeShift parameter to a value that is added to this default. If you want to report only the parent peaks and not their isotopes, set the removeIsotopes parameter to 1. If you do not want to report peaks that were not assigned to any cluster, set the removeUnknown parameter to 1.
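The charge-state search can be sketched in Python 3 as follows. This is a deliberately simplified illustration: it omits the comparison with the theoretical isotopic pattern (intTolerance), processes peaks in increasing m/z order, follows consecutive isotopes until one is missing, applies isotopeShift as (1.00287 + isotopeShift)/abs(z), and numbers clusters from 1. All of these choices are assumptions; the returned keys mirror the z, deisotoped and deisotoping_grp export columns.

```python
ISOTOPE_DISTANCE = 1.00287  # default isotope distance given above


def _find_unassigned(peaks, start, target, tolerance):
    """Index of the unassigned peak closest to target within tolerance."""
    best = None
    for j in range(start, len(peaks)):
        if peaks[j]["deisotoped"] is not None:
            continue
        diff = abs(peaks[j]["mz"] - target)
        if diff <= tolerance and (best is None or diff < abs(peaks[best]["mz"] - target)):
            best = j
    return best


def deisotope(peaks, max_charge, mass_tolerance, isotope_shift=0.0):
    """Group (mz, intensity) pairs, sorted by m/z, into isotopic clusters."""
    result = [{"mz": mz, "ai": ai, "z": None, "deisotoped": None,
               "deisotoping_grp": None} for mz, ai in peaks]
    sign = 1 if max_charge >= 0 else -1
    group = 0
    for i, parent in enumerate(result):
        if parent["deisotoped"] is not None:
            continue  # already part of a cluster
        for charge in range(abs(max_charge), 0, -1):  # maxCharge down to 1
            step = (ISOTOPE_DISTANCE + isotope_shift) / charge
            members, k = [], 1
            while True:  # follow consecutive isotopes k = 1, 2, ...
                j = _find_unassigned(result, i + 1, parent["mz"] + k * step,
                                     mass_tolerance)
                if j is None:
                    break
                members.append(j)
                k += 1
            if members:  # at least one isotope: parent with this charge
                group += 1
                parent.update(z=sign * charge, deisotoped=False, deisotoping_grp=group)
                for j in members:
                    result[j].update(z=sign * charge, deisotoped=True,
                                     deisotoping_grp=group)
                break
    return result


def filter_peaks(result, remove_isotopes=False, remove_unknown=False):
    """Apply the removeIsotopes and removeUnknown options."""
    return [p for p in result
            if not (remove_isotopes and p["deisotoped"] is True)
            and not (remove_unknown and p["deisotoped"] is None)]
```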

Exporting Peaklist

This operation exports the processed peak list to a text file. All enabled preceding operations are applied according to the configuration file settings before the peaks are exported. The output text file is stored in the same directory as the input spectrum.

Set the peaklistHeader parameter to 1 if you want the first line of the exported file to contain the column names. Set the peaklistSeparator parameter to the column delimiter character, using HTML escape sequences for special characters and the value “tab” for the tab character. Set the peaklistColumns parameter to choose which columns to export. The possible columns are (in order of appearance):

  1. mz – m/z of the peak or approximated m/z of the peak
  2. ai – absolute processed intensity of the peak or approximated absolute intensity of the peak
  3. base – computed baseline intensity for given m/z
  4. int – intensity of the peak (i.e. ai – base)
  5. rel – relative intensity of the peak in %. The base (100%) is the highest peak in the spectrum
  6. sn – signal-to-noise ratio
  7. z – detected peak charge
  8. mass – peak mass computed from its m/z and z values
  9. fwhm – full width at half maximum of the peak
  10. pickheight_b – m/z start of the peak at peak picking height, not affected by the peak approximation
  11. pickheight_e – m/z end of the peak at peak picking height, not affected by the peak approximation
  12. resol – peak resolution (i.e. (m/z) / fwhm)
  13. deisotoped – whether peak is a part of any peak cluster. Possible values are: ‘None’ – not a part of any cluster, ‘False’ – peak is a parent peak of a peak cluster, ‘True’ – peak is some subsequent peak in a peak cluster
  14. deisotoping_grp – number of the cluster the given peak belongs to, or ‘None’ if it is not part of any peak cluster
  15. area – partial peak area between the two m/z values at which the relative intensity equals pickingHeight
  16. approx_gauss1_a – approximated height of the first gaussian curve
  17. approx_gauss1_b – approximated center of the first gaussian curve
  18. approx_gauss1_c – approximated standard deviation of the first gaussian curve
  19. approx_gauss2_a – approximated height of the second gaussian curve
  20. approx_gauss2_b – approximated center of the second gaussian curve
  21. approx_gauss2_c – approximated standard deviation of the second gaussian curve

Separate the column names in the peaklistColumns string with semicolons (;). The order of the exported columns does not depend on their order in the parameter string; it is always the order listed above.
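The column selection rule can be sketched in Python 3 as follows: the semicolon-separated string only decides which columns appear, while their order always comes from the list above. Representing peaks as dictionaries keyed by column name, using the same names in the header line, and writing a missing value as None are assumptions of this sketch; the separator is expected to be already decoded (e.g. "\t" for “tab”).

```python
# Column order as listed above; the export always follows this order.
PEAKLIST_COLUMNS = ["mz", "ai", "base", "int", "rel", "sn", "z", "mass", "fwhm",
                    "pickheight_b", "pickheight_e", "resol", "deisotoped",
                    "deisotoping_grp", "area", "approx_gauss1_a",
                    "approx_gauss1_b", "approx_gauss1_c", "approx_gauss2_a",
                    "approx_gauss2_b", "approx_gauss2_c"]


def selected_columns(peaklist_columns):
    """Parse e.g. 'sn;mz;ai' into the selected names, in canonical order."""
    wanted = {name.strip() for name in peaklist_columns.split(";") if name.strip()}
    unknown = wanted.difference(PEAKLIST_COLUMNS)
    if unknown:
        raise ValueError("unknown columns: %s" % ", ".join(sorted(unknown)))
    return [name for name in PEAKLIST_COLUMNS if name in wanted]


def format_peaklist(peaks, peaklist_columns, separator, header=True):
    """Return the lines of the exported peak list (one dict per peak)."""
    columns = selected_columns(peaklist_columns)
    lines = [separator.join(columns)] if header else []
    for peak in peaks:
        lines.append(separator.join(str(peak.get(name)) for name in columns))
    return lines
```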

Baseline Subtraction

Before exporting the spectrum, you can subtract the baseline in order to export baseline-corrected intensities instead of absolute intensities. The baseline is computed as the median intensity of the signal in a selected window, which is specified by the baselinePrecision parameter. The higher the value of baselinePrecision, the shorter the window from which the baseline is computed, i.e. the more closely the baseline follows the shape of the spectrum. Set the value 1 to compute the baseline from the widest possible window, or 0 to compute a constant baseline from the whole spectrum. Set the baselineOffset parameter for a relative correction of the computed baseline, i.e. to lower the baseline by the given fraction of the computed noise deviation.
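A windowed-median baseline with an offset can be sketched in Python 3 as below. The mapping of baselinePrecision to windows is hypothetical: here the spectrum is split into baselinePrecision equal-width m/z windows (0 and 1 both give a single constant baseline in this sketch). The noise deviation is taken as the median absolute deviation, the baseline is interpolated linearly between window centers, and negative corrected intensities are set to zero; all of these are assumptions rather than the program's documented algorithm.

```python
import numpy as np


def compute_baseline(mz, intensity, precision, offset):
    """Windowed-median baseline, lowered by offset * noise deviation."""
    x = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    windows = max(int(precision), 1)  # hypothetical: 0 and 1 give one window
    edges = np.linspace(x[0], x[-1], windows + 1)
    centers, levels = [], []
    for k in range(windows):
        upper = x <= edges[k + 1] if k == windows - 1 else x < edges[k + 1]
        inside = (x >= edges[k]) & upper
        if not inside.any():
            continue
        level = np.median(y[inside])
        width = np.median(np.abs(y[inside] - level))  # noise deviation (MAD)
        centers.append(0.5 * (edges[k] + edges[k + 1]))
        levels.append(level - offset * width)
    # Linear interpolation between window centers; constant beyond them.
    return np.interp(x, centers, levels)


def subtract_baseline(mz, intensity, precision, offset):
    """Baseline-corrected intensities, clipped at zero (an assumption)."""
    y = np.asarray(intensity, dtype=float)
    corrected = y - compute_baseline(mz, y, precision, offset)
    return np.clip(corrected, 0.0, None)
```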

Exporting Spectrum

This operation exports the processed spectrum (the m/z values and the intensities after cropping, smoothing and baseline subtraction) to a text file. All enabled preceding operations are applied according to the configuration file settings. The output text file is stored in the same directory as the input spectrum.

Set the spectrumHeader parameter to 1 if you want the first line of the exported file to contain the column names. Set the spectrumSeparator parameter to the column delimiter character, using HTML escape sequences for special characters and the value “tab” for the tab character.
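The separator handling can be sketched in Python 3 with the standard html.unescape function (the program itself targets Python 2.7, where this function is not available under that name). The sketch assumes the configured value reaches the program as a literal escape sequence such as "&#59;"; the header names "mz" and "intensity" and the output formatting are also assumptions.

```python
import html


def decode_separator(value):
    """Turn a spectrumSeparator value into the actual delimiter."""
    if value == "tab":
        return "\t"
    return html.unescape(value)  # e.g. "&#59;" becomes ";"


def format_spectrum(mz, intensity, separator_value, header=True):
    """Return the lines of an exported two-column spectrum."""
    sep = decode_separator(separator_value)
    lines = [sep.join(["mz", "intensity"])] if header else []  # header names assumed
    lines.extend("%s%s%s" % (m, sep, i) for m, i in zip(mz, intensity))
    return lines


def write_spectrum(path, mz, intensity, separator_value, header=True):
    """Write the formatted spectrum to `path`, one point per line."""
    with open(path, "w") as out:
        lines = format_spectrum(mz, intensity, separator_value, header)
        out.write("\n".join(lines) + "\n")
```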

License

This program, along with all associated documentation, is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. However, if you find the software useful, please cite the paper:

Hrdlickova Kuckova, S., Rambouskova, G., Hynek, R., Cejnar, P., Oltrogge, D., and Fuchs, R. (2015) Evaluation of mass spectrometric data using principal component analysis for determination of the effects of organic lakes on protein binder identification. J. Mass Spectrom., 50: 1270–1278. doi: 10.1002/jms.3699.

References

[1] K. Madsen, H. B. Nielsen, O. Tingleff: Methods for non-linear least squares problems. https://www.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf (2004).

Last modified: 28.08.2023