Create your Fink/LSST filter
This tutorial goes step-by-step through creating a filter, used to define which information the broker will send to you. It is expected that you know the basics of Python and Pandas; familiarity with Apache Spark is a plus. If you are not at ease with software development, that is also fine! Just contact us with your scientific idea, and we will help you design the filter.
Running Fink entirely just to test a module would be an overwhelming task. Fink can be a complex system, but fortunately it is highly modular, so you do not need all the parts to test one part in particular. In principle, to test a module you only need Python and alert data.
Development environment
First fork and clone the fink-filters repository on your machine, and create a new folder in fink_filters/rubin/livestream. The name of the folder must start with filter_. The rest does not matter much, but try to make it as meaningful as possible!
To make sure you are working in the correct environment, with the exact versions of the dependencies used by Fink, we recommend using the Fink Docker image. Download the image and mount your version of fink-filters in a container:
# 2.3GB compressed
docker pull gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-rubin:latest
# Assuming you are in /path/to/fink-filters
docker run -t -i --rm -v \
$PWD:/home/libs/fink-filters \ # (1)!
gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-rubin:latest bash
- Mount a volume for persisting data generated by and used by you in the Docker container.
The advantage of this method is that everything is installed in it (Python and the various frameworks). Beware, it is quite big... You should see some logs appearing when entering the container (this is ok). Finally, remove the pre-installed version of fink-filters from the container:
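The exact command depends on how the image was built; assuming the pre-installed fink-filters was installed with pip (check the image documentation if unsure), one way is:

```shell
# Remove the copy shipped with the image so that your mounted version
# in /home/libs/fink-filters is used instead
# (assumed install method: pip)
pip uninstall -y fink-filters
```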
From now on, you will be working in /home/libs/fink-filters in the container, which points to the fink-filters folder on your machine. You can therefore edit files on your machine, and the changes will be seen from within the container.
Filter design
A filter is typically a Python routine that selects which alerts should be sent, based on user-defined criteria. Criteria are based on the alert entries (position, flux, properties, ...) plus all the scientific surplus from the Fink science modules. We recommend checking the schemas before starting.
A filter typically contains two parts: the filter module with the main routine called by Fink, and any other modules used by the filter:
.
├── fink_filters/rubin
│ └── livestream
│ ├── filter_example
│ │ ├── filter.py # (1)!
│ │ ├── __init__.py
- The filename filter.py is mandatory. All the remaining files or folders can have any name.
The best way to start coding your filter is to look at existing filters written by the community of users and take inspiration. The only thing to know is that the arguments of the filter function should be existing names in the schema (without the prefix if the field is nested).
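For illustration, here is a minimal sketch of what such a routine could look like. The field names psfFlux and snr, the function name, and the cuts are all made up for this example; in practice each argument receives a pandas Series holding the values of one schema field for a batch of alerts:

```python
import pandas as pd

def example_filter(psfFlux: pd.Series, snr: pd.Series) -> pd.Series:
    """Return a boolean mask: True for alerts to forward, False otherwise.

    Argument names (hypothetical here) must match fields of the alert
    schema; each argument is a pandas Series covering a batch of alerts.
    """
    # Keep bright, well-measured alerts (made-up criteria for illustration)
    return (psfFlux > 0) & (snr > 5)

# Apply on a small fake batch of three alerts
mask = example_filter(pd.Series([10.0, -1.0, 3.0]), pd.Series([8.0, 12.0, 2.0]))
print(mask.tolist())  # [True, False, False]
```

The function is vectorized: it never loops over alerts, so the same code works on a batch of three alerts or three million.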
Filter test
The more tests the better! Typically, we expect at least unit tests using doctest for all functions (see an example here). Once you have written your unit tests, you can easily run them:
# in /home/libs/fink-filters
./run_tests.sh --single_module fink_filters/rubin/livestream/filter_example/filter.py
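As a sketch of the doctest style (the function and its criterion are made up for illustration), each function carries its examples directly in the docstring, which the test runner then executes:

```python
import pandas as pd

def positive_flux(psfFlux: pd.Series) -> pd.Series:
    """Keep alerts with strictly positive flux (made-up criterion).

    Examples
    --------
    >>> positive_flux(pd.Series([1.0, -2.0, 3.0])).tolist()
    [True, False, True]
    """
    return psfFlux > 0

if __name__ == "__main__":
    # Run the doctests embedded in this module
    import doctest
    doctest.testmod()
```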
You should see some harmless Spark logs of the form
/spark-3.4.1-bin-hadoop3/python/pyspark/sql/pandas/functions.py:399: UserWarning:
In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF
instead of specifying pandas UDF type which will be deprecated in the future releases.
See SPARK-28264 for more details.
Then, if there are no errors, you will see the coverage report.
Need more representative test data?
There is already test data in fink-filters, but it might not be representative enough of your science case. In that case, the best option is to use the Data Transfer service to get tailored data for your tests.
Authentication
Make sure you have an account to use the fink-client. Once you have an account, install the client and register your credentials in the container:
# Install the client
pip install fink-client
# register using your credentials
fink_client_register ...
Trigger a job on the Data Transfer service and download the data in your container (July 12 2024 is a good starting point, with only 17k alerts):
# Change accordingly
TOPIC=ftransfer_lsst_2024-07-16_682277
mkdir -p /data/$TOPIC
fink_datatransfer \
-topic $TOPIC \
-survey lsst \
-outdir /data/$TOPIC \
--verbose
and specify this data path in your test:
# usually at the end of filter.py
...
if __name__ == "__main__":
    """ Execute the test suite """
    globs = globals()
    custom_path = "file:///data/ftransfer_lsst_2024-07-16_682277"
    globs["custom_path"] = custom_path
    ...
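Before wiring the data into your test suite, it can be useful to inspect what you downloaded. Here is a quick sketch with pandas, assuming the Data Transfer service wrote parquet files and using the topic name from the example above (adjust both to your case):

```python
import glob
import pandas as pd

# Folder populated by fink_datatransfer (adjust to your topic name)
datadir = "/data/ftransfer_lsst_2024-07-16_682277"

# Assumption: the Data Transfer output is stored as parquet files
files = glob.glob(f"{datadir}/**/*.parquet", recursive=True)
if files:
    df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
    print(f"{len(df)} alerts, first columns: {list(df.columns)[:5]}")
else:
    print(f"No parquet files found under {datadir}")
```

Browsing the columns this way is a quick check that the fields your filter relies on are actually present in the downloaded data.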
Submit your Filter
Once you are ready (either the filter is done, or you are stuck and want help), open a Pull Request on the fink-filters repository on GitHub. We will review the filter and test it extensively before deployment.
Once your filter is accepted and deployed, you will be able to receive the corresponding alerts in (near) real-time using the fink-client, or access them at any time in the Science Portal.