The use case
I had a Jupyter Notebook that connected to a database, ran some heavy processing on the data (using machine learning and everything), and saved aggregated data back to the database.
Since notebooks are great for developing, the requirement was to keep using notebooks for development but to be able to trigger the processing from an API call. The notebook should never be executed twice in parallel though. In other words, the API should return an error if the notebook is already being executed. Note that the notebook is provided once, at deployment time: it won't change during the lifecycle of the app.
I was initially planning to use a simple Flask app but soon got into trouble. How can I (1) run the notebook in a background thread and (2) restrict its execution to one at a time?
This is how I discovered FastAPI and Celery. I implemented an MLP (**M**inimum **L**ovable **P**roduct) based on those technologies, available on GitHub.
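To make the "one run at a time" requirement concrete, here is a minimal sketch using only the Python standard library. It is not the code from the repository, which relies on FastAPI and Celery; `run_notebook` and `trigger` are hypothetical names, and the notebook run is simulated with a sleep. The status codes 202 and 409 are my own choice for "started" and "already running".

```python
import threading
import time

# Guards the notebook: held for the whole duration of a run.
_notebook_lock = threading.Lock()


def run_notebook(duration: float) -> None:
    """Hypothetical stand-in for the real notebook execution (assumption)."""
    time.sleep(duration)


def trigger(duration: float = 0.2) -> tuple[int, str]:
    """Mimic an API handler: return a (status_code, message) pair."""
    # Non-blocking acquire: fails immediately if a run is in progress.
    if not _notebook_lock.acquire(blocking=False):
        return 409, "notebook is already running"

    def worker() -> None:
        try:
            run_notebook(duration)
        finally:
            # Release the lock even if the notebook raises.
            _notebook_lock.release()

    # Run in the background so the caller gets an answer right away.
    threading.Thread(target=worker, daemon=True).start()
    return 202, "notebook execution started"
```

Calling `trigger()` twice in a row returns a 202 first and a 409 second, as long as the first run is still in progress. Note that a lock like this only works inside a single process: as soon as the API runs with several worker processes, the state has to live somewhere shared, which is one reason to reach for a task queue.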
The use case was perfect for learning. This is why I cooked up a complete tutorial based on it, along with schemas and explanations. The tutorial repository can be used as a base to follow along. Not only will you learn about FastAPI and Celery, but also Poetry, ruff, and other nice tips and tricks.
I used the above website as a base for a talk at the GDG Fribourg and figured some of you could also benefit from it. Don't forget to leave a ⭐!