Building News Applications
COMM 177T/277T (2019)

Automating the News

Day 7 ~ April 23, 2019

Overview

Automated journalism, or the practice of programmatically generating news from data, stretches back 50 years to the field of weather forecasting, according to the Tow Center. It has become more widely adopted in the last decade with the advent of web frameworks and advances in computational techniques.

The trend began surfacing more widely in data journalism circa 2008, when Matt Waite created a site called Neighborhood Watch at the St. Petersburg Times. The site -- sadly no longer on the Web -- featured code-generated real estate stories on roughly 400 neighborhoods in Florida's Hillsborough and Pinellas Counties.

Neighborhood Watch was a Django app that leaned on a combination of custom business logic and story templates to auto-generate stories on home prices and trends. The approach was novel at the time for most news organizations -- especially smaller newsrooms -- and held the promise to free reporters from the chore of writing boilerplate stories so they could focus on more original reporting and analysis. A similar strategy has been used to automate news tips and "first drafts" of stories in machine-augmented workflows, as the Los Angeles Times demonstrated with Quakebot.

These semi-automated techniques for story generation require journalists to encode templatized narratives and data-driven business logic for a limited number of story types. In recent years, news organizations have started partnering with companies such as Automated Insights and Narrative Science on fully automated, AI-driven stories generated purely from data. This trend will likely continue as newsrooms shrink, data sources improve, and algorithms get better at generating human-readable text.

But there's still a place for semi-automated techniques. At the moment, AI-driven news remains dependent on access to high-quality, structured data and advanced computational techniques beyond the reach of many newsrooms. As newsroom budgets shrink, developers can apply the techniques of semi-automated journalism to generate human-friendly news tips for colleagues and prepare data-generated stories ready for an edit and publication -- stories that may otherwise go uncovered.

And of course, template engines continue to play a key role in the creation of interactive graphics and news applications. They're essential for keeping code DRY and help speed up the pace of development. They allow developers to re-use boilerplate HTML (think headers and footers) and dynamically generate and style web page elements such as page navigation and dropdown menus. They help get the boring stuff out of the way, so news app developers can focus on the visual, narrative and interactive features that make an application unique.

Mad Libs with strings and templates

Semi-automated news stories and alerts rely on a basic technical formula: combine a prefabricated template with data.

Below are some basic Python examples that demonstrate the technique using earthquake data. These are toy examples, but the techniques could easily be applied to produce, for example, a storybot using a real feed of USGS earthquake data.

location = '118km W of Cantwell, Alaska'
magnitude = '1.7'
date = 'April 2, 2019'
time = '4:30am Alaskan Standard Time'

The most basic approach is string concatenation.

'A magnitude ' + magnitude + ' earthquake occurred ' + \
location + ' on ' + date + ' at ' + time + '.'

A better approach relies on string formatting. Below are examples that use the str.format method as well as f-strings, which were added in Python 3.6.

# Brace-style string formatting
"A magnitude {} earthquake occurred {} on {} at {}.".format(
    magnitude,
    location,
    date,
    time
)

# f-strings allow you to embed variables directly in a string, rather
# than passing values using the ".format" method
f"A magnitude {magnitude} earthquake occurred {location} on {date} at {time}."
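The examples above store the magnitude as a string. If it arrives as a number instead, as it often would from a data feed, a format specification inside the braces controls how it is rendered. The following is a minimal sketch; the numeric value and the name magnitude_value are assumptions made for illustration.

```python
# Hypothetical numeric magnitude, e.g. as parsed from a data feed
magnitude_value = 1.6789
location = '118km W of Cantwell, Alaska'

# The ":.1f" format spec rounds to one decimal place when rendering
sentence = f"A magnitude {magnitude_value:.1f} earthquake occurred {location}."
print(sentence)
```

The same format spec works with the brace-style method, for example "{:.1f}".format(magnitude_value).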

Python also has a built-in string template library that can simplify the construction of slightly longer snippets.

from string import Template
t = Template("A magnitude $magnitude earthquake occurred $location on $date at $time.")
t.substitute(magnitude=magnitude, location=location, date=date, time=time)
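A string template can also be filled from a dictionary, which is convenient when the same sentence has to be generated for many records. The sketch below assumes a small list of hypothetical records (the second one is invented and deliberately incomplete) and shows how substitute and safe_substitute differ when a value is missing.

```python
from string import Template

t = Template("A magnitude $magnitude earthquake occurred $location on $date at $time.")

# Hypothetical records, as they might look after parsing a data file
quakes = [
    {"magnitude": "1.7", "location": "118km W of Cantwell, Alaska",
     "date": "April 2, 2019", "time": "4:30am Alaskan Standard Time"},
    {"magnitude": "2.4", "location": "10km N of Example Town, California",
     "date": "April 3, 2019"},  # "time" is left out on purpose
]

# substitute() accepts a mapping, so a dict can be passed directly
first = t.substitute(quakes[0])

# substitute() raises KeyError when a placeholder has no value
try:
    t.substitute(quakes[1])
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

# safe_substitute() leaves the unfilled placeholder in the text instead
partial = t.safe_substitute(quakes[1])

print(first)
print(missing_key)
print(partial)
```

For a storybot, the stricter substitute is usually the safer default, since an error is easier to catch than a published sentence containing a stray $time.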

These examples work well for short snippets of text, but generating a longer narrative often calls for the use of a formal templating library.

Libraries such as Jinja2 allow you not only to merge data with a template, but also to embed logic that determines the content dynamically based on the values of variables.

from jinja2 import Template
template = Template("""A {% if magnitude|int < 5 %}small{% else %}large{% endif %}
magnitude {{ magnitude }} earthquake occurred {{ location }}
on {{ date }} at {{ time }}.
""")
template.render(magnitude=magnitude, location=location, date=date, time=time)

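Above, we applied a few of Jinja2's built-in features to generate customized text: an if/else block that chooses between "small" and "large", the int filter, which converts the magnitude string to a number for the comparison, and double-brace expressions that insert the values of variables.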

Jinja2 and similar libraries offer many more features, but even this small example demonstrates the basic strategy that drives news applications such as Neighborhood Watch and Quakebot.
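To connect the toy examples to something closer to a real feed, here is a minimal sketch of the "business logic plus template" pattern using only the standard library. The record is hypothetical; its field names (mag, place, and time in milliseconds since the Unix epoch) are assumptions modeled on the GeoJSON format USGS publishes, so check them against the feed documentation before relying on them. The helper names and the magnitude threshold of 5 are also illustrative.

```python
from datetime import datetime, timezone
from string import Template

# Hypothetical record; field names are assumptions modeled on USGS GeoJSON
sample_feature = {
    "properties": {
        "mag": 1.7,
        "place": "118km W of Cantwell, Alaska",
        "time": 1554208200000,  # milliseconds since the Unix epoch (UTC)
    }
}

STORY = Template(
    "A $size magnitude $magnitude earthquake occurred $location "
    "on $date at $time UTC."
)


def describe_size(mag, threshold=5.0):
    """Business logic: pick a word based on the magnitude."""
    return "small" if mag < threshold else "large"


def write_sentence(feature):
    """Merge one earthquake record with the story template."""
    props = feature["properties"]
    # Convert milliseconds to seconds and build a timezone-aware datetime
    when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return STORY.substitute(
        size=describe_size(props["mag"]),
        magnitude=f"{props['mag']:.1f}",
        location=props["place"],
        date=f"{when:%B} {when.day}, {when.year}",
        time=f"{when:%H:%M}",
    )


print(write_sentence(sample_feature))
```

Keeping the logic (describe_size) separate from the wording (STORY) means an editor can revise the sentence without touching the code that interprets the data.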

Assignment 6: Quakebot

Complete the Quakebot exercise by class time on Tuesday, April 30. To submit the work, post the URL of your GitHub fork of the assignment on Canvas. If you forked the code as a private repo, you must add me as a collaborator.