ML-PSF: Picking Good Sources for PSF Generation

TL;DR

We can maybe use Machine Learning (ML) to help pick stars for Point Spread Function (PSF) creation to save time in astronomy data processing pipelines.

Built With

Background

What is a PSF?

PSFs mathematically describe how point source objects are distorted in an image. Images are a convolution between the true object and its PSF.

Why are PSFs important to astronomy?

PSFs are necessary to study any object close to the resolution limit of a telescope with high precision.

What do we need in order to create PSFs?

Examples of point-like sources in the image of interest are needed as inputs to PSF generation software. In astronomy, good point-like sources would be stars that are bright, round, and well isolated from other sources. The task of selecting these good sources for PSF generation is what this deep learning model has been trained to do.

Goal

Given cutout of each source in an image along with their respective x and y coordinates, this program calls on an already trained ML model that will return a subset of cutouts of sources to use for PSF creation. It also returns the x and y coordinates of these sources that can be used to pass into the python module TRIPPy in order to create the desired PSF.

Method

For this project, images from 2020 taken by the Hyper SuprimeCam (HSC). For each image, the top 25 sources were selected as those with the lowest flux outside the central source, as inferred by the flux of the most discrepant pixel in the source-PSF residual, and the standard deviation of all residual pixels.

Of these top 25, the ones which fell in the accepted range of pixel brightness values were deemed good and labelled 1. All other sources were considered bad and labelled 0. Using this approach, there are far more bad sources than good ones and so a random selection of bad sources is made such that the 0 and 1 class sizes are equal.

A simple 2D Convolutional Neural Network (CNN) was developed for this binary classification problem:

Results

The accuracy on the test set was found to be 89.12% overall.

We can raise the confidence threshold beyond which the model labels a source as good. The default threshold is 50%, however, since we care much more about achieving a low false positive rate than a low false negative rate, the threshold of 90% was adopted such that we can achieve a false positive rate of 93.87% while still having a significant number of sources classified as good:

Once the model is trained, the CNN method takes only ~6% the CPU time of the non-CNN method, dramatically speeding up the pipeline:

Conclusion

This machine learning-based method allows for faster PSF generation but not nessisarily better. Some form of unsupervised learning is likely needed as a next step.

Resources

Project is publically available on GitHub: ashley-ferreira/ML-PSF
Poster available here

Share on

Twitter Facebook LinkedIn

Ashley Ferreira