Study of Techniques for Parsing and Analyzing N42 Gamma Spectra Files

04 Jan 2024

Note: this post was generated with JupyterLab. HTML version is here.

GitHub here.

Intro

The goal of this project is to practice and demonstrate my knowledge of gamma detection systems and the N42 file format, as well as to document some of my initial, naive approach for parsing and analyzing N42 files with Python. Resources I used for this project include the N42 file format standard (available here) and LLNL RASE (GitHub here, credited here). Because this project is educational, and I’m leaning heavily on others work, I’ll warn it’s possible I’ve made a mistake either in the coding or physics side.

Background

Many gamma detection systems are spectroscopic, meaning they measure a range of photon energy values, i.e., a spectrum. The energy profiles emitted by radioactive materials are typically known, so the measured spectroscopic data can be used as a “fingerprint” to identify materials. In the figure below, you can see how the measured spectra is used to to identify the nuclides Ra-226, Pb-214, and Bi-214 of the uranium decay chain.

Gammaspektrum Uranerz

Many spectroscopic instruments output spectroscopic data within files that follow the N42.42 standard. As described by NIST, “the ANSI/IEEE N42.42 standard specifies the XML data format that shall be used for both required and optional data to be made available by radiation measurement instruments.”, and they also mention its purpose is “to facilitate manufacturer-independent transfer of information from radiation measurement instruments for use in homeland security applications as well as for detection of illicit trafficking of radioactive material”.

These files more-or-less contains all the information necessary to assess the spectrum and/or radionuclide identification results. The standard defines the required and optional data elements and how they can be organized. It is also flexible enough to account for the various types of instruments defined in the other ANSI/IEEE N42 Radiation Detection Standards, from handhelds to portal monitors.

The latest version of the N42 file format, N42.N42.2020, can be downloaded for free from the IEEE website (but you have to sign-up for an account). In the standard, there are a few N42 file examples given. I’ve extracted and saved one of these to play around with.

Reading a spectra from an example N42

The example I took from the N42 standard has the following characteristics:

To parse the file’s data, I’ve taken a page from RASE and set-up a YAML configuration file which specifies the xpaths of data elements.

Side note: despite the intuition of being a “standard”, every manufacturer will have different N42 file formats. Certain elements will always have the same name or structured in same way, but, as described earlier, there are various ways to structure and extend the information depending on the instrument, application, etc. In our example we only have on manufacturer, but if there were more, each one would need a separate set of xpaths defined in the YAML config.

Grab filenames of N42s

In this case, just one.

# Get filenames of N42s (in thise case, just one)
import glob
n42_files = []
indir = 'input/'
n42_files.extend(glob.glob(indir + '*.n42'))

Set-up data elements xpath configurations

import yaml
from lxml import etree as ET

def join(loader, node):
    seq = loader.construct_sequence(node)
    return ''.join([str(i) for i in seq])

yaml.SafeLoader.add_constructor(tag='!join', constructor=join)

n42_config_yaml_str = """
    iRID:
        intrument_base: &IB ./RadInstrumentInformation
        instrument_manuf_xpath: !join [*IB, /RadInstrumentManufacturerName/text()]
        instrument_model_xpath: !join [*IB, /RadInstrumentModelName/text()]
        instrument_identifier_xpath: !join [*IB, /RadInstrumentIdentifier/text()]
        calibration_base: &CB ./EnergyCalibration
        calibration_id_xpath: !join [*CB, /@id]
        calibration_coefval_xpath: !join [*CB,/CoefficientValues/text()]
        calibration_remark_xpath: !join [*CB, /Remark/text()]
        
        radmeasurement_bkg: &RMB ./RadMeasurement[MeasurementClassCode="Background"]
        bgk_id_xpath: !join [*RMB,/@id]
        bgk_mcc_xpath: !join [*RMB,/MeasurementClassCode/text()]
        bgk_realtime_xpath: !join [*RMB,/RealTimeDuration/text()]
        bgk_startdt_xpath: !join [*RMB,/StartDateTime/text()]
        
        spectrum_bkg: &SB !join [*RMB,/Spectrum]
        bgk_spectrum_id_xpath: !join [*SB,/@id]
        bgk_spectrum_calibref_xpath: !join [*SB,/@energyCalibrationReference]
        bgk_spectrum_channeldata_xpath: !join [*SB,/ChannelData/text()]
        bgk_spectrum_chdata_compcode_xpath: !join [*SB,/ChannelData/@compressionCode]
        bgk_spectrum_livetime_xpath: !join [*SB,/LiveTimeDuration/text()]
        
        radmeasurement_frg: &RMF ./RadMeasurement[MeasurementClassCode="Foreground"]
        fgr_id_xpath: !join [*RMF,/@id]
        fgr_mcc_xpath: !join [*RMF,/MeasurementClassCode/text()]
        fgr_realtime_xpath: !join [*RMF,/RealTimeDuration/text()]
        fgr_startdt_xpath: !join [*RMF,/StartDateTime/text()]
        
        spectrum_fg: &SF !join [*RMF,/Spectrum]
        fgr_spectrum_id_xpath: !join [*SF,/@id]
        fgr_spectrum_calibref_xpath: !join [*SF,/@energyCalibrationReference]
        fgr_spectrum_channeldata_xpath: !join [*SF,/ChannelData/text()]
        fgr_spectrum_chdata_compcode_xpath: !join [*SF,/ChannelData/@compressionCode]
        fgr_spectrum_livetime_xpath: !join [*SF,/LiveTimeDuration/text()]

        result_base: &RB ./AnalysisResults
        result_id_xpath: !join [*RB,/@id]
        result_rmgr_xpath: !join [*RB,/@radMeasurementGroupReferences]
        result_nucldies_xpath: !join [*RB,/NuclideAnalysisResults/Nuclide/NuclideName/text()]
"""   

n42_config = yaml.safe_load(n42_config_yaml_str)

Parse N42 file into dataframe

Using pandas dataframe to ‘hold’ the data. All of the file’s data translates into a row in the dataframe. If there were multiple files, there would be multiple rows, etc.

import pandas as pd
import rase_functions_copied as rf

df_n42 = pd.DataFrame()
df_n42_list = []
elems_dict = {}
for i,f in enumerate(n42_files):
    tree = ET.parse(f)
    tree = rf.strip_namespaces(tree)
    root = tree.getroot()
    etemp = {}
    etemp['filename'] = f
    model = tree.xpath('./RadInstrumentInformation/RadInstrumentModelName//text()')[0]
    config = n42_config[model]
    for key, val in config.items():
        if 'base' in key:
            continue
        e = tree.xpath(val)
        if len(e) == 1 and (key != 'result_nuclides_xpath'):
            e = e[0]
        etemp[key.replace('_xpath','')] = e
    elems_dict[f] = etemp
df_n42 = pd.DataFrame.from_dict(elems_dict, orient='index').reset_index(drop=True)
# display(df_n42['msr_startdt'])
df_n42.melt()
variable value
0 filename input\n42_example_riid.n42
1 instrument_manuf RIDs R Us
2 instrument_model iRID
3 instrument_identifier SN 7890A
4 calibration_id EnergyCalibration-1
5 calibration_coefval -21.8 12.1 6.55e-03
6 calibration_remark []
7 radmeasurement_bkg [[], [], [], [[], []], [[], []], [[]]]
8 bgk_id RadMeasurement-1
9 bgk_mcc Background
10 bgk_realtime PT60S
11 bgk_startdt 2003-11-22T23:35:30-07:00
12 spectrum_bkg [[], []]
13 bgk_spectrum_id RadMeasurement-1-Spectrum-1
14 bgk_spectrum_calibref EnergyCalibration-1
15 bgk_spectrum_channeldata \n 0 0 0 22 421 847 1295 1982 2127 2222 2302 2...
16 bgk_spectrum_chdata_compcode None
17 bgk_spectrum_livetime PT59.7S
18 radmeasurement_frg [[], [], [], [[], []], [[], []], [[]]]
19 fgr_id RadMeasurement-2
20 fgr_mcc Foreground
21 fgr_realtime PT60S
22 fgr_startdt 2003-11-22T23:45:19-07:00
23 spectrum_fg [[], []]
24 fgr_spectrum_id RadMeasurement-2-Spectrum-2
25 fgr_spectrum_calibref EnergyCalibration-1
26 fgr_spectrum_channeldata \n 0 3 22 421 847 1295 1982 2127 2222 2302 227...
27 fgr_spectrum_chdata_compcode CountedZeroes
28 fgr_spectrum_livetime PT59.61S
29 result_id AnalysisResults-1
30 result_rmgr RadMeasurementGroup-1
31 result_nucldies [Cs-137, K-40]

Some clean-up / pre-processing

Basically getting dataframe in better shape for analysis. Some of this could of been avoided if I had parsed data better, I think.

# Melting dataframe (regret parsing file into one row)
idx = ['filename', 'instrument_manuf', 'instrument_model',
       'instrument_identifier', 'calibration_id', 'calibration_coefval',
       'calibration_remark', 'result_id', 'result_rmgr', 'result_nucldies']
mc = ['id','mcc','realtime','startdt','spectrum_id','spectrum_calibref','spectrum_channeldata','spectrum_chdata_compcode','spectrum_livetime']
df_n42_m = df_n42.melt(id_vars=idx, value_vars=['bgk_id','fgr_id'], value_name='msr_id').drop(columns='variable')
for c in mc[1:]:
    df_n42_m['msr_' + c] = df_n42.melt(value_vars=['bgk_'+c,'fgr_'+c]).drop(columns='variable')

# convert to list, cast to float, reverse for polyval function
df_n42_m['calibration_coefval'] = (
    df_n42_m['calibration_coefval']
    .str.split(' ')
    .apply(lambda x: [float(v) for v in x])
    .apply(lambda x: list(reversed(x))) 
)
# convert to datetime
df_n42_m['msr_startdt'] = pd.to_datetime(df_n42_m['msr_startdt'])
# convert to list, cast to int
df_n42_m['msr_spectrum_channeldata'] = (
    df_n42_m['msr_spectrum_channeldata']
    .str.replace('\n','')
    .str.replace(' +',' ',regex=True)
    .str.strip().str.split(' ')
    .apply(lambda x: [int(v) for v in x])
)

# Account for compression algorithm
cz = df_n42_m['msr_spectrum_chdata_compcode'] == 'CountedZeroes'
df_n42_m.loc[cz,'msr_spectrum_channeldata'] = (
    df_n42_m['msr_spectrum_channeldata'].apply(rf.uncompressCountedZeroes)
)
# go ahead and split calibration coefficients into 3 columns
# makes for slightly easier processing ahead
df_n42_m[['calib_coef_2','calib_coef_1','calib_coef_0']] = pd.DataFrame(df_n42_m['calibration_coefval'].tolist(), index= df_n42_m.index)
df_n42_m = df_n42_m[['filename', 'instrument_manuf', 'instrument_model',
       'instrument_identifier', 'calibration_id', 'calib_coef_2', 'calib_coef_1', 'calib_coef_0',
       'calibration_remark', 'msr_id', 'msr_mcc','msr_realtime', 'msr_startdt',
       'msr_spectrum_id', 'msr_spectrum_calibref', 'msr_spectrum_channeldata',
       'msr_spectrum_livetime', 'result_id', 'result_rmgr', 'result_nucldies']]

df_n42_m
filename instrument_manuf instrument_model instrument_identifier calibration_id calib_coef_2 calib_coef_1 calib_coef_0 calibration_remark msr_id msr_mcc msr_realtime msr_startdt msr_spectrum_id msr_spectrum_calibref msr_spectrum_channeldata msr_spectrum_livetime result_id result_rmgr result_nucldies
0 input\n42_example_riid.n42 RIDs R Us iRID SN 7890A EnergyCalibration-1 0.00655 12.1 -21.8 [] RadMeasurement-1 Background PT60S 2003-11-22 23:35:30-07:00 RadMeasurement-1-Spectrum-1 EnergyCalibration-1 [0, 0, 0, 22, 421, 847, 1295, 1982, 2127, 2222... PT59.7S AnalysisResults-1 RadMeasurementGroup-1 [Cs-137, K-40]
1 input\n42_example_riid.n42 RIDs R Us iRID SN 7890A EnergyCalibration-1 0.00655 12.1 -21.8 [] RadMeasurement-2 Foreground PT60S 2003-11-22 23:45:19-07:00 RadMeasurement-2-Spectrum-2 EnergyCalibration-1 [0, 0, 0, 22, 421, 847, 1295, 1982, 2127, 2222... PT59.61S AnalysisResults-1 RadMeasurementGroup-1 [Cs-137, K-40]

Subtract background

In gamma measurement systems, usually a reference measurement is taken with no sources of radiation present other than the environment’s background radiation. This background measurement can be compared with future measurements to discern if there are significant sources of radiation presesnt. In our sample N42 file, there is both a background and foreground spectrum. You’ll see some instruments will instead have seperate files for each.

Naturally, as I am playing around with this data, I’d like to try subtracting the background from the foreground. However, in this case it will just be an exercise. It turns out both spectra are the same, meaning both measurements were taken in identical conditions (or, as I suspect, the spectra was copy+pasted because it’s just an example file). So when we subtract them, we just get 0s.

There is one difference between the foreground and background measurement within the example file. One uses the “CountedZeroes” compression algorithm, which is described in the N42 standard as “compressed by the removal of repeated zero values… When a “0” value appears in the ChannelData contents, the next value is the number of consecutive zero-value channels beginning with the first zero-value in the sequence.” You’ll see I accounted for this in previous cell.

# demo bg and fg spectra are the same
import numpy as np
spectra = df_n42_m['msr_spectrum_channeldata'].apply(lambda x: np.array(x))
bsub = (spectra[1] - spectra[0])

print(min(bsub),max(bsub))
0 0

Replicate

One N42 file is not much fun. With a set of spectra, I’ll be able to explore more analysis and visualization techniques. Let’s simulate multiple N42 files by replicating the one I have.

To improve this simulated data set, I want to change the cloned spectra so that they are all slightly different, as if they were actually taken at different times, varying environmental conditions, etc. I’ve taken mostly from RASE here and tried to repeat their process for energy calibration distortion influences. In short, the spectra is shifted along the energy axis using a quadratic equation, where C_0=C_2=0 and C_1=1 mean no distortion.

import datetime

def distort_spec(counts, calibration_coefval, distortion_coefval):
    """ Mimics energy calibration distortions, where spectra is shifted along the energy axis using quadratic equation
    Just wrapper for RASE code, relies on their rebin function.
    
    param counts: numpy array of counts indexed by channel
    param calibration_coefval: list of calibration coefficients, for converting channels to energy
    param distortion_coefval: list of distortion coefficients
    return msr_spectrum_channeldata_dist: numpy array of distorted/rebinned counts indexed by channel
    """
    channels = np.arange(len(counts))
    energies = np.polyval(calibration_coefval, channels)
    energiesDist = np.polyval(distortion_coefval, energies)
    counts_dist = rf.rebin(counts, energiesDist, calibration_coefval)
    return counts_dist

r_num = 50
df_n42_repl = df_n42_m.iloc[[0]]
for i in range(0,r_num):
    df_n42_repl = pd.concat([df_n42_repl, df_n42_m.iloc[[1]]])

# Unique timestamps
df_n42_repl = df_n42_repl.reset_index(drop=True)
df_n42_repl['msr_startdt'] = df_n42_repl['msr_startdt'] + pd.Series(df_n42_repl.index.values).apply(lambda x: datetime.timedelta(minutes=x*3))
df_n42_repl['msr_date'] = df_n42_repl['msr_startdt'].apply(lambda x: x.strftime('%m/%d/%Y'))
df_n42_repl['msr_time'] = df_n42_repl['msr_startdt'].apply(lambda x: x.strftime('%H:%M'))

# Distort spectra by applying energy calibration distortion
scale_factor = 0.3
s = np.array((range(0,df_n42_repl.shape[0])))
s = list((s - s.min()) / (s.max() - s.min())*scale_factor)
df_n42_repl['ecal_dist_lvl'] = s
df_n42_repl['calib_coef_0_dist'] = 0
df_n42_repl['calib_coef_1_dist'] = 1 * (1+df_n42_repl['ecal_dist_lvl']) # gain change
# df_n42_repl['calib_coef_2_dist'] = df_n42_repl['calib_coef_2'] * (1+df_n42_repl['ecal_dist_lvl'])
df_n42_repl['calib_coef_2_dist'] = 0

df_n42_repl['msr_spectrum_channeldata'] = (df_n42_repl.apply(
    lambda x: distort_spec(
        x['msr_spectrum_channeldata'],
        [x['calib_coef_2'], x['calib_coef_1'], x['calib_coef_0']],
        [x['calib_coef_2_dist'], x['calib_coef_1_dist'], x['calib_coef_0_dist']]), 
    axis='columns'))

# some noise for fun
df_n42_repl['msr_spectrum_channeldata'] = df_n42_repl.apply(lambda x: np.random.normal(x['msr_spectrum_channeldata']-x['msr_spectrum_channeldata']*x['ecal_dist_lvl']*0.01,x['msr_spectrum_channeldata']*0.01),axis='columns')
df_n42_repl.reset_index(inplace=True)

Plotting Spectra

Here I set-up and save straightforward plots of all of the replicated spectra. The two IDs given by the instrument, Cs-137 and K-40, make sense as it looks like there are peaks around those isotopes’ known peaks, ~660 and 1460 Kev, respectively.

import matplotlib.pyplot as plt
import regex as re

def plot_gamma_spectra(energies, counts,
                       RadInstrumentManufacturerName,
                       RadInstrumentModelName,
                       RadInstrumentIdentifier,
                       StartDateTime,RealTimeDuration,SpectrumID,
                       text='',title_append='',directory='',
                      ylim=None,xlabel="",ylabel=""):
    datetime = '-'.join(re.findall("\d+", str(StartDateTime)))
    fig, ax = plt.subplots()
    ax.plot(energies, counts, linewidth=2.0)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    # ax = spectra.plot()
    # plt.xscale("log")
    if ylim:
        ax.set_ylim(0,ylim)
    # fig = ax.get_figure()
    if text:
        fig.text(.95,.96, text, ha='right',va='top', size=14, transform=ax.transAxes)
    ax.set_title(RadInstrumentManufacturerName + ' ' + RadInstrumentModelName + ' '
                + str(RadInstrumentIdentifier) + '\n' + str(SpectrumID)
                 + ' ' + title_append)
    if directory != '':
        # display('not show')
        fig.savefig(directory + 'spectrum_' + '_' + str(datetime) + '_' + str(SpectrumID) 
                    + str(RadInstrumentModelName) + '.png', dpi=150)
        plt.close(fig)
    else:
        plt.show()
        plt.close(fig)

df_fg = df_n42_repl.loc[df_n42_repl['msr_mcc']=='Foreground'].copy()
# df_fg = df_n42_repl.copy()
# display(df_fg['msr_spectrum_channeldata'].iloc[0])
# Energies (x axis), know in this case will be the same for all spectra
channels = np.arange(len(df_fg['msr_spectrum_channeldata'].iloc[0]))
energies = np.polyval([df_fg['calib_coef_2'].iloc[0], df_fg['calib_coef_1'].iloc[0], df_fg['calib_coef_0'].iloc[0]], channels)
# Plot first spectra in notebook as demonstration
df_fg.iloc[[0]].apply(lambda x:
    plot_gamma_spectra(energies, x['msr_spectrum_channeldata'],
                       x['instrument_manuf'],
                       x['instrument_model'],
                       x['instrument_identifier'],
                       x['msr_startdt'],x['msr_realtime'],x['msr_spectrum_id'],
                       text=x['msr_time'] + '\n IDs: ' + ','.join(x['result_nucldies']),
                       title_append=str(x['index']), 
                       ylim=2500,
                      xlabel="Energy (keV)",ylabel="Counts")
,axis='columns')

# Saves plots for all spectra in dataframe
df_fg.apply(lambda x:
    plot_gamma_spectra(energies, x['msr_spectrum_channeldata'],
                       x['instrument_manuf'],
                       x['instrument_model'],
                       x['instrument_identifier'],
                       x['msr_startdt'],x['msr_realtime'],x['msr_spectrum_id'],
                       text=x['msr_time'] + '\n IDs: ' + ','.join(x['result_nucldies']),
                       title_append=str(x['index']), ylim=2500,
                       directory='plots_1/',
                      xlabel="Energy (keV)",ylabel="Counts")
,axis='columns')
display(' ')

png

' '

Spectra Plots to GIF

I’ve found creating GIFs of spectra plots can help to quickly visualize spectra measurement over time. In this case, the effects of the distortion and noise applied to the replicated spectra are seen.

from PIL import Image
def make_gif(frame_folder):
    frames = [Image.open(image) for image in glob.glob(f"{frame_folder}/*.png")]
    frame_one = frames[0]
    frame_one.save("N42_replicate_test.gif", format='GIF', append_images=frames[1:],
                   save_all=True, duration=100, loop=0)
print('.')
make_gif('plots_1/')
.

spectra-gif

Next Time?