Apache Airflow — No Backfill

A lot of software seems to be designed to save users from themselves. That is great the 90% of the time when you mess up and really want the help (or when the software’s help is only cosmetic … auto-correcting smart quotes being my usual gripe). But I seem to fall into the other 10% a lot. And I mean a LOT. By default, the Apache Airflow scheduler runs a DAG once for every schedule interval it missed between the DAG’s start date and now, so a newly enabled job tries to grab old information all.of.the.time. It’s a feature called “backfill”, and I’m sure it helps all sorts of people do exactly what they really wanted done. Not me 🙁

Having updated to 1.8, though, I now see a DAG parameter that tells a DAG not to do me any favors. Just do what you’re asked when you’re asked to do it: catchup=False

DAG('testjob', default_args=default_args, schedule_interval='0 * * * *', catchup=False)
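
To make the difference concrete, here is a rough sketch in plain Python of what the flag changes. It is a simplified model and not Airflow's scheduler code: the function name runs_to_schedule and the dates are made up for illustration, and it assumes a run is only created once its interval has fully elapsed.

```python
from datetime import datetime, timedelta


def runs_to_schedule(start_date, now, interval, catchup):
    """Return the interval start times a scheduler would create runs for.

    Hypothetical helper: a simplified model of the behavior, not Airflow code.
    """
    completed = []
    period_start = start_date
    # An interval only counts once it has fully elapsed.
    while period_start + interval <= now:
        completed.append(period_start)
        period_start += interval
    if catchup:
        # Backfill: one run for every missed interval.
        return completed
    # No catchup: only the most recent completed interval.
    return completed[-1:]


# Made-up dates: an hourly job enabled five and a half hours after its start date.
start = datetime(2017, 1, 1, 0, 0)
now = datetime(2017, 1, 1, 5, 30)
hourly = timedelta(hours=1)

with_catchup = runs_to_schedule(start, now, hourly, catchup=True)
without_catchup = runs_to_schedule(start, now, hourly, catchup=False)

print(len(with_catchup))     # 5
print(len(without_catchup))  # 1
```

With catchup left on, the hourly job above gets five runs queued the moment it is enabled; with it off, only the latest completed hour runs and the schedule carries on from there.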
