LineDetect.spectra_processor

Created on Wed Apr 1 06:43:54 2023

@author: daniel

Module Contents

Classes

Spectrum

A class for processing spectral data stored in FITS files.

objective_spec

This class is used for optimizing the spectra processing parameters, using the high-level Optuna API.

ThresholdExceeded

Exception raised when a threshold is exceeded during optimization with Optuna.

Functions

_set_style_()

Function to configure the matplotlib.pyplot style. This function is called before any images are saved, after which the style is reset to the default.

format_labels(→ list)

Takes a list of labels and returns the list with all words capitalized and underscores removed.

class LineDetect.spectra_processor.Spectrum(halfWindow=25, resolution_range=(1400, 1700), region_size=100, resolution_element=3, savgol_window_size=100, savgol_poly_order=5, N_sig_limits=0.5, N_sig_line1=5, N_sig_line2=3, rest_wavelength_1=2796.35, rest_wavelength_2=2803.53, directory=None, save_all=False)[source]

A class for processing spectral data stored in FITS files.

Can process either a set of .fits files or single spectra.

Note

If a line is detected, its features are added to the DataFrame stored in the df attribute, which always appends new detections. If no line is detected, nothing is added to the DataFrame, but a message with the object name is printed.

Parameters:
  • halfWindow (int, list, np.ndarray) – The half-size of the window/kernel (in Angstroms) used to compute the continuum. If a list/array of integers is given, the continuum is calculated as the median curve across the fits obtained with each half-window size. Defaults to 25.

  • region_size (int) –

  • savgol_window_size (int) –

  • savgol_poly_order (int) – The order of the polynomial used for smoothing the spectrum.

  • resolution_range (tuple) – A tuple of the minimum and maximum resolution (in km/s) used to detect MgII absorption. Can also be an integer or a float.

  • directory (str) – The path to the directory containing the FITS files. Defaults to None.

  • save_all (bool) – Parameter to control whether to save the non-detections. If the spectral feature is not detected and save_all=True, the qso_name will be appended alongside ‘None’ entries. Defaults to False to save only positive detections.
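The multi-window behaviour described for halfWindow can be pictured with a small, self-contained sketch. It is not the LineDetect implementation: it assumes a simple running median as the continuum fit, measures the half-window in samples rather than Angstroms, and the names running_median and median_continuum are hypothetical.

```python
from statistics import median


def running_median(values, half_window):
    """Median of the samples within +/- half_window of each index.

    Assumption: the window is measured in samples here, not Angstroms.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


def median_continuum(values, half_windows):
    """Point-by-point median across the fits from every half-window size."""
    fits = [running_median(values, hw) for hw in half_windows]
    return [median(column) for column in zip(*fits)]


# A flat spectrum with a single absorption dip at index 3.
flux = [1.0, 1.0, 1.0, 0.2, 1.0, 1.0, 1.0]
continuum = median_continuum(flux, half_windows=[1, 2, 3])
print(continuum)
```

In this toy case every window size ignores the narrow dip, so the combined continuum stays flat. Taking the median across window sizes is meant to reduce sensitivity to any single choice of halfWindow.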

process_files()[source]

Process the FITS files in the directory.

process_spectrum(Lambda, y, sig_y, z, file_name)[source]

Process a single instance of spectral data.

_reprocess()[source]

Re-runs the process_spectrum method using the saved spectrum attributes.

plot(include, errorbar, xlim, ylim, xlog, ylog, savefig)[source]

Plots the spectrum and/or continuum.

find_MgII_absorption(Lambda, y, yC, sig_y, sig_yC, z, qso_name)[source]

Find the MgII lines, if present.

find_CIV_absorption(Lambda, y, yC, sig_y, sig_yC, z, qso_name)

Find the CIV lines, if present.

process_files()[source]

Processes each FITS file in the directory, detecting any Mg II absorption that may be present.

The method iterates through each FITS file in the directory specified during initialization, reads in the spectrum data and associated header information, applies continuum normalization, identifies Mg II absorption features, and calculates the equivalent widths of said absorptions. The results are stored in a pandas DataFrame (df attribute).

Note

Unlike when processing single spectra, this method does not save the continuum and continuum_err attributes, so the plot() method cannot be called afterwards. To save the continuum attributes, load a single spectrum using process_spectrum.

Returns:

None

process_spectrum(Lambda, flux, flux_err, z, qso_name=None)[source]

Processes a single spectrum, detecting any Mg II absorption that may be present.

Parameters:
  • Lambda (array-like) – An array-like object containing the wavelength values of the spectrum.

  • flux (array-like) – An array-like object containing the flux values of the spectrum.

  • flux_err (array-like) – An array-like object containing the flux error values of the spectrum.

  • z (float) – The redshift of the QSO associated with the spectrum.

  • qso_name (str, optional) – The name of the QSO associated with the spectrum; it will be saved in the DataFrame. Defaults to None, in which case ‘No_Name’ is used.

Returns:

None
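As a rough illustration of how a redshift maps the Mg II doublet onto the observed wavelength grid, the sketch below applies λ_obs = λ_rest (1 + z) to the default rest_wavelength_1 and rest_wavelength_2 values from the Spectrum signature. The function observed_doublet is hypothetical and not part of LineDetect, and the class itself may search for the lines differently.

```python
# Default rest wavelengths (Angstroms) taken from the Spectrum signature.
MGII_REST = (2796.35, 2803.53)


def observed_doublet(z_abs, rest=MGII_REST):
    """Observed wavelengths of a doublet for an absorber at redshift z_abs."""
    return tuple(w * (1.0 + z_abs) for w in rest)


def doublet_in_range(Lambda, z_abs, rest=MGII_REST):
    """True if both observed lines fall inside the wavelength coverage."""
    lo, hi = min(Lambda), max(Lambda)
    return all(lo <= w <= hi for w in observed_doublet(z_abs, rest))


Lambda = [4000.0 + i for i in range(3000)]  # toy grid: 4000 to 6999 Angstroms
line_1, line_2 = observed_doublet(1.0)
print(line_1, line_2, doublet_in_range(Lambda, 1.0))
```

An absorber can only be detected when both shifted lines land inside the spectrum's coverage, which is why the helper checks the range before any search would begin.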

_reprocess(qso_name=None)[source]

Reprocesses the data. Intended to be used after running process_spectrum(); useful for changing the attributes and quickly re-running the same sample without having to re-input the spectra.

Note

This will update the DataFrame by appending the new object line features (if found).

Parameters:

qso_name (str, optional) –

Returns:

None

find_MgII_absorption(Lambda, y, yC, sig_y, sig_yC, z, qso_name=None)[source]

Finds Mg II absorption features in the QSO spectrum and adds the line information to the DataFrame, including the equivalent width and its corresponding error.

Parameters:
  • Lambda (array-like) – Wavelength array.

  • y (array-like) – Observed flux array.

  • yC (array-like) – Estimated continuum flux array.

  • sig_y (array-like) – Observed flux error array.

  • sig_yC (array-like) – Estimated continuum flux error array.

  • z (float) – The redshift of the QSO associated with the spectrum.

  • qso_name (str, optional) – The name of the QSO associated with the spectrum; it will be saved in the DataFrame. Defaults to None, in which case ‘No_Name’ is used.

Returns:

None
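For readers unfamiliar with the quantity, the equivalent width and its error can be estimated from the same five arrays this method takes. The sketch below uses the standard definition W = Σ (1 − y/yC) Δλ with first-order error propagation over independent pixels. It is an assumption-laden illustration: the function name equivalent_width is hypothetical, and the integration limits and error treatment inside LineDetect may differ.

```python
import math


def equivalent_width(Lambda, y, yC, sig_y, sig_yC):
    """Left Riemann sum of (1 - y/yC) over the wavelength grid.

    Returns (W, sigma_W). The last pixel only closes the final interval.
    Assumes independent pixel errors (first-order propagation).
    """
    ew = 0.0
    variance = 0.0
    for i in range(len(Lambda) - 1):
        dlam = Lambda[i + 1] - Lambda[i]
        ew += (1.0 - y[i] / yC[i]) * dlam
        # d(y/yC) = sig_y/yC (flux term) and y*sig_yC/yC^2 (continuum term)
        term = (sig_y[i] / yC[i]) ** 2 + (y[i] * sig_yC[i] / yC[i] ** 2) ** 2
        variance += term * dlam ** 2
    return ew, math.sqrt(variance)


Lambda = [5590.0, 5591.0, 5592.0, 5593.0, 5594.0]
y = [1.0, 0.5, 0.2, 0.5, 1.0]       # normalised flux with one dip
yC = [1.0] * 5                      # flat continuum
sig_y = [0.1] * 5
sig_yC = [0.0] * 5

ew, ew_err = equivalent_width(Lambda, y, yC, sig_y, sig_yC)
print(ew, ew_err)
```

With a flat continuum and 1 Angstrom pixels, the width is simply the summed depth of the dip, which makes the toy result easy to check by hand.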

optimize(Lambda, flux, flux_err, z_qso, z_element, halfWindow, region_size, resolution_element, savgol_window_size, savgol_poly_order, N_sig_limits, N_sig_line1, N_sig_line2, n_trials=100, threshold=0.1, show_progress_bar=False)[source]

This class method optimizes the element detection parameters according to the input constraints.

Parameters:
  • resolution_element (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • halfWindow (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • N_sig_limits (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • N_sig_line1 (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • N_sig_line2 (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • n_trials (int) – Number of trials to run the optimizer for. Defaults to 100.

  • threshold (float) – The desired threshold. For example, if this is set to 0.1, the optimization will stop once the result is within 0.1 of the target. A very small value such as 0.0001 will essentially stop the optimization only when it reaches the exact target value. Defaults to 0.1.
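The "(min, max) tuple or fixed value" convention used by the search parameters can be sketched in plain Python. This is only an illustration of the convention: sample_parameter is a hypothetical helper, and random sampling stands in for whatever Optuna sampler the library actually uses (in Optuna itself, ranges are normally drawn through trial.suggest_int or trial.suggest_float).

```python
import random


def sample_parameter(spec, rng):
    """Return a fixed value as-is, or draw one from a (min, max) range."""
    if isinstance(spec, tuple):
        low, high = spec
        if isinstance(low, int) and isinstance(high, int):
            return rng.randint(low, high)   # integer range, inclusive
        return rng.uniform(low, high)       # float range
    return spec                             # hard-coded, excluded from search


rng = random.Random(0)  # seeded only to make the sketch repeatable
search_space = {
    "halfWindow": (10, 50),       # searched as an integer range
    "N_sig_limits": (0.1, 1.0),   # searched as a float range
    "N_sig_line1": 5,             # fixed, not searched
}
params = {name: sample_parameter(spec, rng) for name, spec in search_space.items()}
print(params)
```

Passing a single number therefore pins that parameter for every trial, while a tuple lets the optimizer vary it.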

plot(include='both', highlight=True, xlim=None, ylim=None, xlog=False, ylog=False, savefig=False, path=None)[source]

Plots the spectrum and/or continuum.

Parameters:
  • include (str) – Designates what to plot; options are ‘spectrum’, ‘continuum’, or ‘both’. Defaults to ‘both’.

  • highlight (bool) – If True then the line will be highlighted with accompanying vertical lines to visualize the equivalent width. Defaults to True.

  • xlim – Limits for the x-axis. Ex) xlim = (4000, 6000)

  • ylim – Limits for the y-axis. Ex) ylim = (0.9, 0.94)

  • xlog (boolean) – If True the x-axis will be log-scaled. Defaults to False.

  • ylog (boolean) – If True the y-axis will be log-scaled. Defaults to False.

  • savefig (bool) – If True the figure will not display but will be saved instead. Defaults to False.

  • path (str, optional) – Path where the figure should be saved. Defaults to None, in which case the image is saved in the local home directory.

Returns:

AxesImage

plot_param_opt(xlim=None, ylim=None, xlog=True, ylog=False, savefig=False)[source]

Plots the parameter optimization history.

Note

The Optuna API has its own plot function: plot_optimization_history(self.optimization_results)

Parameters:
  • baseline (float) – Baseline accuracy achieved when using only the default engine hyperparameters. If provided, a vertical line will be plotted to indicate this baseline accuracy. Defaults to None.

  • xlim – Limits for the x-axis. Ex) xlim = (0, 1000)

  • ylim – Limits for the y-axis. Ex) ylim = (0.9, 0.94)

  • xlog (boolean) – If True the x-axis will be log-scaled. Defaults to True.

  • ylog (boolean) – If True the y-axis will be log-scaled. Defaults to False.

  • savefig (bool) – If True the figure will not display but will be saved instead. Defaults to False.

Returns:

AxesImage

plot_param_importance(plot_time=False, savefig=False)[source]

Plots the hyperparameter importances.

Note

The Optuna API provides its own plotting function: plot_param_importances(self.optimization_results)

Parameters:
  • plot_time (bool) – If True, the parameter importance for the trial duration will also be included. Defaults to False.

  • savefig (bool) – If True the figure will not display but will be saved instead. Defaults to False.

Returns:

AxesImage

class LineDetect.spectra_processor.objective_spec(Lambda, flux, flux_err, z_qso, z_element, rest_wavelength_1, rest_wavelength_2, resolution_range, halfWindow, region_size, resolution_element, savgol_window_size, savgol_poly_order, N_sig_limits, N_sig_line1, N_sig_line2)[source]

Bases: object

This class is used for optimizing the spectra processing parameters, using the high-level Optuna API.

Parameters:
  • Lambda (array-like) – An array-like object containing the wavelength values of the spectrum.

  • flux (array-like) – An array-like object containing the flux values of the spectrum.

  • flux_err (array-like) – An array-like object containing the flux error values of the spectrum.

  • z_qso (float) – The redshift of the QSO associated with the spectrum.

  • z_element (float) –

  • rest_wavelength_1 (float) –

  • rest_wavelength_2 (float) –

  • resolution_range (tuple) –

  • resolution_element (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • halfWindow (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • N_sig_limits (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • N_sig_line1 (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

  • N_sig_line2 (tuple, int) – A tuple containing the (min, max) parameter range to search through. Can be an integer to hard-code this parameter and exclude it from the search.

__call__(trial)[source]

class LineDetect.spectra_processor.ThresholdExceeded[source]

Bases: optuna.exceptions.OptunaError

Exception raised when a threshold is exceeded during optimization with Optuna.

This exception is a subclass of optuna.exceptions.OptunaError and is used to represent an error condition where a threshold has been exceeded during the optimization process with Optuna.

Parameters:

None
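The idea of ending an optimization early by raising an exception can be sketched without Optuna. Everything below is a stand-in: ThresholdReached, check_threshold and run_trials are hypothetical names, the loop over precomputed values replaces a real study, and the stopping rule (stop once a value is within the threshold of the target) is an assumption based on the threshold parameter description.

```python
class ThresholdReached(Exception):
    """Hypothetical stand-in for an early-stopping exception."""


def check_threshold(value, target, threshold):
    """Raise once a trial value is within `threshold` of the target."""
    if abs(value - target) <= threshold:
        raise ThresholdReached(f"value {value} is within {threshold} of {target}")


def run_trials(values, target, threshold):
    """Evaluate trial values in order; return how many ran before stopping."""
    completed = 0
    try:
        for value in values:
            completed += 1
            check_threshold(value, target, threshold)
    except ThresholdReached:
        pass  # early stop: remaining trials are skipped
    return completed


n_run = run_trials([0.5, 0.8, 0.95, 0.99], target=1.0, threshold=0.1)
print(n_run)
```

Using an exception lets the check live inside the objective while the caller decides how to end the run cleanly.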

LineDetect.spectra_processor._set_style_()[source]

Function to configure the matplotlib.pyplot style. This function is called before any images are saved, after which the style is reset to the default.

Parameters:

None

LineDetect.spectra_processor.format_labels(labels: list) → list[source]

Takes a list of labels and returns the list with all words capitalized and underscores removed.

Parameters:

labels (list) – A list of strings.

Returns:

Reformatted list, of the same length.
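The described behaviour can be approximated as follows. This is a guess at the semantics rather than the library's code: it assumes underscores are replaced by spaces before each word is capitalized, and the name format_labels_sketch is hypothetical.

```python
def format_labels_sketch(labels: list) -> list:
    """Replace underscores with spaces and capitalize each word."""
    return [label.replace("_", " ").title() for label in labels]


formatted = format_labels_sketch(["half_window", "n_trials"])
print(formatted)
```

Note that str.title() also lowercases the remaining letters of each word, so a label such as "N_sig_line1" would become "N Sig Line1" under this assumption.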