I recently spent some time getting our AirWave server to automatically generate PDFs of reports, and my manager suggested I share the information with Airheads, since PDF reports seem to be a pretty frequently requested feature.
What you need:
- an AirWave version that supports the post-report scripting hook if you want to automate the process (check /var/airwave/custom/ for a file called post_report.sample; if it's there, you should be good); I believe 7.2 or later supports it, and possibly 7.1 as well
- shell access to the AirWave server (root access if you want to use the post-report scripting hook)
- a little shell and shell-scripting knowledge
- wget, which should be installed by default (if /usr/bin/wget is not present, run sudo yum install wget to obtain it)
- wkhtmltopdf, which uses the WebKit rendering engine to generate PDFs from HTML pages, available at http://code.google.com/p/wkhtmltopdf/; I recommend the statically linked AMD64 version (or i386 if you are on an older version of AirWave)
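Setting up wkhtmltopdf is just a matter of downloading the static tarball from the project page and unpacking it somewhere convenient; something like the following, with the filename as a placeholder for whichever version you grab:

cd /opt
tar xjf /path/to/wkhtmltopdf-VERSION-static-amd64.tar.bz2

That leaves you with a wkhtmltopdf-amd64 binary you can run in place; there's no separate install step.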
BEFORE YOU START: I TESTED ON AIRWAVE 7.4.8 AND 7.5 BETA 1 AND HAVE NO IDEA IF ANY OF THIS WILL WORK ON VERSIONS PRIOR TO THAT, SO CAVEAT EMPTOR
ALSO ANY WORK OR EXPERIMENTATION YOU DO ON YOUR AIRWAVE SERVER WHILE FOLLOWING THESE INSTRUCTIONS IS COMPLETELY AT YOUR OWN RISK AND I AM NOT RESPONSIBLE IF YOU ACCIDENTALLY ERASE /usr OR BREAK SOMETHING ELSE REALLY BADLY
1. The first problem I needed to solve was getting access to the HTML version of the report itself. Since AirWave's on-disk file storage is pretty opaque and not externally documented, the only way to get the report is to log into the AirWave server via the web UI with an existing account and then scrape the report URL. But how do you go about that?
The answer is to use the wget command-line utility to automate the process. wget does two things we need for this: it can send arbitrary strings as POST data, and it can save the login cookie it gets back from the AirWave server to a text file.
(Nota bene: this assumes you are using at least AirWave 7.3, which I think is when AirWave changed to form-based authentication; if you are using an older version that still uses HTTP basic authentication, you'll want to use --http-user and --http-password instead, and you may be able to consolidate both wget commands into a single command without having to save the login cookie; see the sketch below.)
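For what it's worth, on one of those older basic-auth versions I'd expect the consolidated command to look roughly like this; I don't have a pre-7.3 box to test against, so treat it as a starting point rather than gospel (the retrieval flags are explained under step 2 below):

/usr/bin/wget -q -E -k -r -w 1 --no-check-certificate --http-user=USERNAME --http-password=PASSWORD -P /tmp/ "https://127.0.0.1/nf/report_detail?id=REPORT_ID&format=xml"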
Here is the first wget command:
/usr/bin/wget -q --keep-session-cookies --save-cookies /tmp/login_cookie.txt --post-data "credential_0=USERNAME&credential_1=PASSWORD&destination=/&login=Log In" --no-check-certificate https://127.0.0.1/LOGIN -O -
Things to note about this command:
- it will save the login cookie to /tmp/login_cookie.txt
- the username and password to log in with are in plaintext in the --post-data string (replace USERNAME and PASSWORD); if that bothers you, see the sketch after this list
- you must log in via the localhost IP (using the external IP/FQDN of the server will not work)
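About those plaintext credentials: one option is to stash them in a root-only file and splice them in at runtime. Here's a rough sketch; the /root/.amp_credentials file is my own invention (username on line 1, password on line 2, created with chmod 600), not anything AirWave knows about:

AMP_USER=$(sed -n 1p /root/.amp_credentials)
AMP_PASS=$(sed -n 2p /root/.amp_credentials)
/usr/bin/wget -q --keep-session-cookies --save-cookies /tmp/login_cookie.txt --post-data "credential_0=${AMP_USER}&credential_1=${AMP_PASS}&destination=/&login=Log In" --no-check-certificate https://127.0.0.1/LOGIN -O -

One caveat: if the password contains characters like & or =, it will need to be URL-encoded in the file, since it gets pasted straight into the POST data.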
Assuming the login is successful, /tmp/login_cookie.txt will look something like this:
# HTTP cookie file.
# Generated by Wget on 2012-06-07 13:50:10.
# Edit at your own risk.
127.0.0.1 FALSE / TRUE 1339095010 Mercury::Handler::AuthCookieHandler_AMPAuth aabbccddeeff00112233445566778899
If it's empty, something went wrong with the login, so go back and try again. The -q switch suppresses wget's normal output, so remove it if you need more information for debugging.
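If you're scripting this, a quick sanity check along these lines saves some head-scratching later; this is just my own addition that verifies the auth cookie actually landed in the file:

if ! grep -q AMPAuth /tmp/login_cookie.txt 2>/dev/null; then
    echo "AirWave login failed: no auth cookie saved" >&2
    exit 1
fi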
2. Now that you have a valid login cookie, run wget a second time, telling it to use the cookie you just saved to reach the URL where the report lives and scrape the report, along with all of its images/CSS/JavaScript, into a directory, rewriting the URLs in the HTML as it goes so they become relative links pointing at the new file locations.
Here is the second wget command:
/usr/bin/wget -q -E -k -r -w 1 --no-check-certificate --load-cookies /tmp/login_cookie.txt -P /tmp/ "https://127.0.0.1/nf/report_detail?id=REPORT_ID&format=xml"
Things to note about this command:
- -E causes wget to append .html to the end of XML/HTML files that don't already end in .html (this is necessary to keep wkhtmltopdf from spitting out weird errors)
- -k causes wget to convert links to be relative links, as mentioned above
- -r causes wget to be recursive about its retrieval (get all the images and CSS and such that are necessary to make the report look correct)
- -w 1 causes wget to wait 1 second between retrievals so as to not slam the server (you can remove it if you just want everything downloaded immediately; you'll probably want to leave it off while experimenting anyway)
- -P provides a new root folder for wget to fetch into (it will create a directory named 127.0.0.1 at the given location and create subdirectories underneath for the files it fetches; you can leave this off to just let it write into the current directory)
- in the URL to fetch, you must provide the ID of the report to be fetched (to find a report's ID, mouse over the report's link in your web browser and take note of the ID in the URL)
Once executed, wget will busily set about recursively retrieving everything that's necessary to make the report look like a report.
3. When wget is finished, have a look in /tmp/127.0.0.1/ (or wherever you told wget to write the downloaded files to). You'll see a few directories, like css and images and nf. nf is where the actual report data gets written to, so have a look in there for a file named report_detail?id=ID&format=xml.html (where ID is the ID number you supplied in the previous wget command).
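Since the filename contains ?, =, and & (all characters the shell treats specially), I find it easiest in scripts to capture the path in a variable and quote it from then on. A small sketch; REPORT_HTML is just a name I picked:

REPORT_HTML=$(find /tmp/127.0.0.1/nf -name 'report_detail*format=xml.html' | head -n 1)

Quoting "$REPORT_HTML" in later commands sidesteps the backslash escaping you'll see in the wkhtmltopdf command below.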
Now, you can do whatever you want with the report. If you want to turn it into a PDF immediately, here's what you do.
Using wkhtmltopdf, which you should have already downloaded and untarballed somewhere, you can convert the report to a PDF with the following command:
/path/to/wkhtmltopdf-amd64 -s Letter -q --disable-internal-links --disable-external-links /path/to/report/127.0.0.1/nf/report_detail\?id\=ID\&format\=xml.html ~/output.pdf
You can ignore any warning messages it outputs about QPixmap. Assuming that's successful, you should now have a PDF of the report in your home directory!
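If you're going to run this repeatedly, it's also worth cleaning up the scratch files between runs so a stale cookie or a half-downloaded report doesn't trip you up:

rm -f /tmp/login_cookie.txt
rm -rf /tmp/127.0.0.1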
(As with anything, there are some caveats; the main problem I've noticed is that on reports with really wide tables, wkhtmltopdf scales the entire report down to minuscule proportions to cram the whole table onto the page, instead of keeping the scale normal and paginating the table horizontally.)
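If you hit the wide-table problem, one thing worth trying is landscape orientation, which at least gives wkhtmltopdf more horizontal room to work with; -O Landscape is a standard wkhtmltopdf option, though I can't promise it rescues every wide report (this reuses the REPORT_HTML variable from the sketch above):

/path/to/wkhtmltopdf-amd64 -s Letter -O Landscape -q --disable-internal-links --disable-external-links "$REPORT_HTML" ~/output.pdf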
How to automate this process in the next post...