Testing a science module
This page walks you through the design, test, and submission of a science module in Fink for LSST.
Cloning the repository
First, fork and clone the fink-science repository on your machine, and create a new folder in fink_science/rubin. The name of the folder does not matter much, but try to make it as meaningful as possible!
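For instance, a possible sequence is (a sketch only: replace `<username>` with your GitHub username, and `your_module` with the folder name you picked):

```shell
# Clone your fork of fink-science (hypothetical username)
git clone https://github.com/<username>/fink-science.git
cd fink-science

# Create the folder that will host your science module
mkdir -p fink_science/rubin/your_module
```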
Working environment
Fink uses a strict Python environment to run. All package versions can be found in the fink-broker-images repository (containerfs/<survey>). You can install these on your computer, but it is best to use Docker, since Fink also relies on other components, such as Apache Spark, that you probably do not want to install on your computer.
We expose ready-to-use Fink Docker images on a registry in GitLab. First pull the latest image, start a container, and mount your version of fink-science inside it:
# 3GB compressed
docker pull gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-rubin:latest
# Assuming you are in /path/to/fink-science on the host
docker run -t -i --rm -v \
$PWD:/workspace/fink-science \ # (1)!
gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-rubin:latest bash
- Mount a volume to persist the data generated and used by you in the Docker container.
Mounted volume
Note that since we invoke docker with -v, your local fink-science folder is seen by both your host computer and the container. This means any change made locally will be available in the container. You can therefore use your favourite code editor on your computer, and only test the code within the container (no need to code inside it).
The advantage of this method is that everything is already installed in the image (Python and the various frameworks). Beware, it is quite big! Next, remove the pre-installed version of fink-science from the container:
and update your PYTHONPATH to point to the new location:
Finally, move to the mounted folder:
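Putting the three steps above together, a possible sequence is the following (the uninstall command assumes the image shipped fink-science via pip, and the paths match the volume mounted earlier; adapt them if your setup differs):

```shell
# Remove the pre-installed fink-science (assumes it was installed with pip)
pip uninstall -y fink-science

# Point PYTHONPATH at your mounted copy instead
export PYTHONPATH=/workspace/fink-science:$PYTHONPATH

# Move to the mounted folder
cd /workspace/fink-science
```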
LSST test data
You need data to test your module. Our test suite makes use of data in the fink-alert-schemas repository. Inside the Docker container, clone this repository and copy the data into your workspace:
cd /workspace
git clone https://github.com/astrolabsoftware/fink-alert-schemas
cp -r /workspace/fink-alert-schemas/datasim /workspace/fink-science/fink_science/
At this stage you can run the test suite to make sure your setup is right:
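A sketch, using the `run_tests.sh` helper shipped with the repository (the same script is used below to run a single module):

```shell
cd /workspace/fink-science
./run_tests.sh
```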
You should see some harmless Spark logs of the form
/spark-3.4.1-bin-hadoop3/python/pyspark/sql/pandas/functions.py:399: UserWarning:
In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF
instead of specifying pandas UDF type which will be deprecated in the future releases.
See SPARK-28264 for more details.
Then, if you do not have errors, you will see the coverage report printed on screen.
Science module design
A module contains the routines and classes necessary to process the alert data and add new values. A science module typically contains two parts: the processor, which holds the main routine called by Fink, and any other modules used by the processor:
fink_science/rubin/
├── your_module
│ ├── __init__.py
│ ├── processor.py # (1)!
│ ├── README.md
│ └── utils.py
- The filename processor.py is mandatory. All the remaining files or folders can have any name.
The processor will typically look like:
from line_profiler import profile  # (1)!
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import FloatType
import pandas as pd

from fink_science.rubin.your_module.utils import super_magic_function


@pandas_udf(FloatType())
@profile
def myprocessor(
    diaObjectId: pd.Series, psfFlux: pd.Series, anothercolumn: pd.Series
) -> pd.Series:
    """Documentation please!"""
    # your logic goes here
    output = super_magic_function(diaObjectId, psfFlux, anothercolumn)

    # Return a column
    return pd.Series(output)
- The use of a profiler will help us understand performance bottlenecks and optimise your code.
Remarks
- The use of the pandas_udf decorator is mandatory. It is an Apache Spark decorator, and it specifies the output type. We are returning floats in this example.
- You can return only one new column (i.e. add one new piece of information per module). However, the column can be nested (i.e. contain lists or dictionaries as elements).
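To illustrate the nested-column case, here is a minimal sketch. The function name and the min/max statistics are made up for this example; in a real module you would decorate the function with the Spark `pandas_udf` decorator, as in the processor above, using a nested output type such as `ArrayType(FloatType())`:

```python
import pandas as pd

# In a real Fink module you would add, as in the processor above:
# from pyspark.sql.functions import pandas_udf
# from pyspark.sql.types import ArrayType, FloatType
# @pandas_udf(ArrayType(FloatType()))
def nested_processor(psfFlux: pd.Series) -> pd.Series:
    """Return a single column whose elements are lists: [min, max] flux per alert."""
    output = [[float(min(fluxes)), float(max(fluxes))] for fluxes in psfFlux]
    return pd.Series(output)
```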
You can browse the structure of existing science modules in the repository.
Science module test
The more tests the better! Typically, we expect at least unit tests using doctest for all functions. Once you have written your unit tests, you can easily run them:
cd /workspace/fink-science
./run_tests.sh --single_module fink_science/rubin/your_module/processor.py
Additional dependencies
If you need additional dependencies, install them in the container. Do not forget to specify them when you submit your science module later (during the pull request phase).
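For example (the package name is purely illustrative, not a Fink requirement):

```shell
# Install any extra dependency inside the running container
# (scikit-learn is just an example package here)
pip install scikit-learn
```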
Need more representative test data?
There is test data in fink-science already, but it might not be representative enough of your science case. In that case, it is best to use the Data Transfer service to get tailored data for your tests.
Authentication
Make sure you have an account to use the fink-client.
Once you have an account, install it and register your credentials on the container:
# Install the client. You need at least v10.0
pip install fink-client
# register using your credentials
fink_client_register ...
Trigger a job on the Data Transfer service and download the data in your container (December 10 2025 is a good starting point, with only 2k alerts):
# Change accordingly
TOPIC=ftransfer_lsst_2024-07-16_682277
mkdir -p /data/$TOPIC
fink_datatransfer \
-topic $TOPIC \
-survey lsst \
-outdir /data/$TOPIC \
--verbose
and specify this data path in your test:
# usually at the end of processor.py
...
if __name__ == "__main__":
""" Execute the test suite """
globs = globals()
custom_path = "file:///data/ftransfer_lsst_2024-07-16_682277"
globs["custom_path"] = custom_path
...
Submit your science module
Once you are ready (either the module is done, or you are stuck and want help), you can open a Pull Request on the fink-science repository on GitHub, and we will review the module and test it extensively before deployment. Among other things, we will perform profiling and performance tests, since long running times delay further follow-up observations. See fink-science-perf for more information.
Once your module is accepted and deployed, outgoing alerts will contain new information! You can then define your filter using fink-filters, and you will then be able to receive these alerts in (near) real-time using the fink-client, or access them at any time in the Science Portal.