2012-03-01
I’ve recently started looking at the awesome flexget and thought it would solve a problem with my son’s school newsletter.
Like most schools, the one our son attends publishes a weekly newsletter to inform parents of upcoming activities and events. When he started, the school gave us the option of either having a physical printout sent home with our child, or downloading the pdf from their website.
We opted out of the printed newsletter and were happy to check the website for the pdf version of the newsletter.
As we’re both busy and sometimes forget to check the website, we’d occasionally miss things.
I wanted to download the pdf automatically when it appeared on the website and email it to us so we wouldn’t have to remember to check for the latest version.
Initially I used wget called from cron to check the website like this:
/usr/bin/wget -r -l1 -N --no-verbose --continue --no-parent \
--no-directories --no-host-directories --reject html,htm,txt \
--accept .pdf -o /var/log/newsletters.log \
--directory-prefix=/srv/samba/newsletters \
http://www.redacted.schools.nsw.edu.au/newsletters
I used the -N and --continue flags so that it wouldn’t download the same pdfs over and over. Even so, this method still felt like a brute-force approach.
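For reference, the crontab entry just wrapped that same command up; something like this (the twice-daily schedule here is only an example):

# check the school site for new pdfs at 8am and 4pm
0 8,16 * * * /usr/bin/wget -r -l1 -N --no-verbose --continue --no-parent --no-directories --no-host-directories --reject html,htm,txt --accept .pdf -o /var/log/newsletters.log --directory-prefix=/srv/samba/newsletters http://www.redacted.schools.nsw.edu.au/newsletters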
(I won’t go into the details here, but I then use incron to watch for changes to the /srv/samba/newsletters directory, which calls another script that emails the file as an attachment.)
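For the curious, the incron side amounts to a single line in incrontab -e; something like this (the script path is a placeholder for my actual script; $@ expands to the watched directory and $# to the name of the file that changed):

# run the mail script whenever a file finishes being written to the directory
/srv/samba/newsletters IN_CLOSE_WRITE /home/davidmarsh/bin/mail-newsletter.sh $@/$#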
I like how flexget remembers what it has seen in a database and doesn’t download the same file again. I thought this would solve the problem very elegantly, and as I couldn’t find much info about fetching files from a URL automatically, I thought I’d share my config here so others looking to do the same could benefit.
Here’s my newsletter.yml config:

presets:
  global:
    free_space:
      path: /srv/samba/newsletters
      space: 1 # make sure there's Xgb free before downloading more
    domain_delay:
      www.redacted.schools.nsw.edu.au: 10 seconds
    email:
      active: True
      from: davidmarsh
      to:
        - redacted@example.com

feeds:
  newsletter:
    interval: 6 hours
    html:
      url: http://www.redacted.schools.nsw.edu.au/newsletters/
      title_from: link
    regexp:
      accept:
        - redacted_newsletter*
      rest: reject
    download: /srv/samba/newsletters
This will:

- make sure there’s enough free space on /srv/samba/newsletters (which is on the same disk as /)
- wait 10 seconds between requests to www.redacted.schools.nsw.edu.au (even though there’s only one check, I wanted this here in case I add more later)
- download any new newsletters to the /srv/samba/newsletters directory

I call it from cron with this command:
/usr/local/bin/flexget --cron -c /home/davidmarsh/.flexget/newsletter.yml
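In my crontab that looks something like this (the half-hourly schedule is arbitrary, for the reason below):

# run flexget every 30 minutes; the interval option stops it hammering the site
*/30 * * * * /usr/local/bin/flexget --cron -c /home/davidmarsh/.flexget/newsletter.yml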
It doesn’t really matter how often it runs, as it will only actually hit the website every 6 hours due to the interval: 6 hours option in the yml file.
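A tip if you’re adapting this: flexget has a --test mode that runs the feeds without downloading or recording anything, which is handy for checking that the config parses and the regexp matches before handing it to cron:

/usr/local/bin/flexget --test -c /home/davidmarsh/.flexget/newsletter.yml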
(Like before, I’m still using incron to call a script that emails the files.)
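In case it’s useful, the emailing script boils down to a few lines of shell. A minimal sketch, assuming mutt is installed (the script name, subject line, and recipient are placeholders):

#!/bin/sh
# mail-newsletter.sh (hypothetical): email the given pdf as an attachment
pdf="$1"
echo "New newsletter attached." | mutt -s "School newsletter: $(basename "$pdf")" -a "$pdf" -- redacted@example.com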
Now we get an email with the newsletter attached within 6 hours of a new newsletter appearing on the school’s website.