unthermo, or simply reading out .raw mass spectrometer data

When working with Thermo Scientific mass spectrometer binary data files (.raw), you’re usually limited to the Windows platform and the MSFileReader DLL. Gene Selkov has put a lot of effort into liberating file access with unfinnigan, which provides Perl scripts that read the spectral data directly from the file. But you still need to have a Perl library installed. Dealing with libraries and all kinds of file types is more often than not overkill for the problem at hand. Extracting a few peaks should be as easy as executing one simple command.

unthermo is the result of reimplementing parts of unfinnigan in Go, a programming language that compiles to self-contained executables for Windows, Mac OS X, and Linux. This makes it easy to provide tools that read Thermo files without requiring any installation. All that is just great fun.

A first tool, xic, extracts a mass chromatogram for given m/z values within a certain tolerance in ppm.
Download it to your computer, run xic -mz 361.1466 -tol 2.5 -raw rawfile.raw, and you’re done. The tool and some more details are available on the homepage of the project.
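A ppm tolerance scales with the m/z it is applied to. The snippet below is a minimal sketch of that arithmetic, not code taken from xic; the helper name ppmWindow is made up for illustration.

```go
package main

import "fmt"

// ppmWindow returns the lower and upper m/z bounds around mz for a
// tolerance given in parts per million (ppm).
// Hypothetical helper for illustration; not part of xic or unthermo.
func ppmWindow(mz, tolPPM float64) (lo, hi float64) {
	delta := mz * tolPPM / 1e6 // absolute half-width of the window
	return mz - delta, mz + delta
}

func main() {
	// Same values as in the xic command above.
	lo, hi := ppmWindow(361.1466, 2.5)
	fmt.Printf("accepting peaks between %.4f and %.4f\n", lo, hi)
}
```

With the values from the command above, the window reaches roughly 0.0009 m/z units to either side of 361.1466.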

Technically, the library provides two levels of abstraction over the binary data. The lowest level holds the data structures of the .raw files, so you can read out, for example, frequency data or metadata written by the mass spectrometer. The higher level deals with scan properties and peak lists (i.e. spectra). The xic tool, for example, uses this level to output a specific m/z, time and intensity for every MS1-level scan.
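To make the higher level a bit more concrete, here is a rough sketch of how a chromatogram could be pulled from a list of scans. It does not use the library: Peak, Spectrum, Scan and XICPoint are local stand-ins with assumed field names, the sample values are made up, and picking the most intense peak inside the window is an assumption rather than a description of what xic actually does.

```go
package main

import "fmt"

// Local stand-in types for illustration only; the field names are
// assumptions, not the library's definitions.
type Peak struct {
	Mz float64 // mass-to-charge ratio
	I  float64 // intensity
}

type Spectrum []Peak

type Scan struct {
	MSLevel  int
	Time     float64 // retention time
	Spectrum Spectrum
}

// XICPoint is one point of an extracted ion chromatogram.
type XICPoint struct {
	Mz, Time, I float64
}

// extract returns, for every MS1 scan, the most intense peak whose m/z
// lies within tolPPM of target. Scans without such a peak are skipped.
func extract(scans []Scan, target, tolPPM float64) []XICPoint {
	delta := target * tolPPM / 1e6
	var out []XICPoint
	for _, s := range scans {
		if s.MSLevel != 1 {
			continue // only MS1-level scans contribute
		}
		best, found := Peak{}, false
		for _, p := range s.Spectrum {
			if p.Mz < target-delta || p.Mz > target+delta {
				continue // outside the tolerance window
			}
			if !found || p.I > best.I {
				best, found = p, true
			}
		}
		if found {
			out = append(out, XICPoint{best.Mz, s.Time, best.I})
		}
	}
	return out
}

func main() {
	// Made-up scans, purely for demonstration.
	scans := []Scan{
		{MSLevel: 1, Time: 0.5, Spectrum: Spectrum{{361.1465, 10}, {361.1467, 40}, {400.2, 99}}},
		{MSLevel: 2, Time: 0.6, Spectrum: Spectrum{{361.1466, 500}}},
		{MSLevel: 1, Time: 0.7, Spectrum: Spectrum{{361.1480, 70}}},
	}
	for _, p := range extract(scans, 361.1466, 2.5) {
		fmt.Println(p.Mz, p.Time, p.I)
	}
}
```

In this toy input only the first scan yields a point: the second scan is MS2, and the single peak of the third falls outside the 2.5 ppm window.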


Developers who want to get their hands on the code are welcome at the git repository on bitbucket. If you have the Go language tools installed (via their web site or your package manager), you can run “go get bitbucket.org/proteinspector/ms/unthermo”, which will download the repository to your Go workspace.

Here’s the code of a simple example that uses unthermo to print the m/z and intensity of every peak in a specific scan of a raw file. The Scan method returns an ms.Scan, which is handed to the function printspectrum. This function loops over the peaks in the mass spectrum (Spectrum is a []Peak), printing their m/z and intensity.

The output of running this code with arguments -sn 1 -raw rawfile.raw [1] will have the following form:

300.0573502627188 22.64058
300.0583320195028 184.16449
300.05931378110523 544.771
300.060295547526 972.3836
300.061277318765 1211.7169
300.0615234375 1220.9489
300.06225909482254 1127.0696
300.06324087569857 834.89545
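
The peak loop described earlier can also be tried without a raw file. The sketch below uses local stand-in types instead of the library (Spectrum as a []Peak follows the description above, but the field names Mz and I and the helper peakLines are assumptions) and feeds in the first three peaks of the output shown above.

```go
package main

import "fmt"

// Peak and Spectrum are local stand-ins for illustration; the field
// names are assumptions, not the library's definitions.
type Peak struct {
	Mz float64 // mass-to-charge ratio
	I  float64 // intensity
}

type Spectrum []Peak

// peakLines formats every peak as "m/z intensity", one string per peak.
// Hypothetical helper that mirrors the loop printspectrum is described to run.
func peakLines(s Spectrum) []string {
	lines := make([]string, 0, len(s))
	for _, p := range s {
		lines = append(lines, fmt.Sprint(p.Mz, p.I))
	}
	return lines
}

func main() {
	// First three peaks copied from the output above.
	spectrum := Spectrum{
		{300.0573502627188, 22.64058},
		{300.0583320195028, 184.16449},
		{300.05931378110523, 544.771},
	}
	for _, line := range peakLines(spectrum) {
		fmt.Println(line)
	}
}
```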

[1] “go run printspectrum.go -sn 1 -raw rawfile.raw” or “go build printspectrum.go” and then “./printspectrum -sn 1 -raw rawfile.raw”