D3.js • Data Viz • Accessibility • JavaScript

Utah Digital Newspapers Dashboard

An interactive exploration tool for 36 million digitized Utah newspaper documents—and what they reveal.

Line chart showing frequency of the words "internet" and "radio" in the Salt Lake Tribune from 1975 to 2004, with radio peaking in the late 1970s and internet rising sharply in the mid-1990s.

Overview

The Marriott Library has digitized over 36 million Utah newspaper documents, spanning roughly 160 years, and made the collection available through a public API. But the archive offered no way to explore it: no visualization, no interface for asking what the collection contained or what patterns it might reveal. This project asked: what does this archive actually look like, and what can it tell us about Utah's history? My partner and I built an interactive dashboard from scratch over one semester, targeting researchers and curious general readers alike.

The problem

The Utah Digital Newspapers archive is a remarkable public resource: over 300 newspapers, millions of documents, and roughly 160 years of Utah history. But accessing it meant querying a raw API with no interface for exploration. You could retrieve documents if you knew exactly what you were looking for, but there was no way to ask broader questions: How have certain words trended over time? Which counties are well-represented in the archive? When did a particular newspaper stop being published?

Without an exploration layer, the archive's historical signal was essentially invisible to anyone without the technical ability to write their own queries.

My role & approach

This was a two-person graduate course project with equal contribution across the full project lifecycle—data exploration, visualization design, implementation, and evaluation. We used vanilla JavaScript and D3.js, pulling live data from the UDN API and supplementary geographic data from the Utah Geospatial Resource Center.

We designed for a broad audience—not just researchers who might know what they're looking for, but anyone curious about Utah history. That framing shaped every visualization choice: accessibility and learnability over density.

Process

Defining the questions

We started with exploratory data analysis using simple bar charts—a deliberate choice to understand the data before committing to any final design. Bar charts were useful but limiting, and they clarified what we actually wanted to show: temporal trends, geographic distribution, archive structure, and word-level patterns. From there, we identified five distinct questions the dashboard should answer, and designed one visualization for each.

Choosing visualization types deliberately

Each chart type was chosen for a specific reason, not by default.

Line chart (the core view): Users search for terms, select a newspaper or "All Papers," set a date range, and receive a multi-line chart showing word frequency over time. Line charts let us show multiple search terms simultaneously—the Gestalt principle of connectedness makes the temporal relationship between lines legible in a way scatter plots or bar charts can't match. The chart includes word count transparency: each query displays how many total words were scanned across all papers, and how many came from the specific newspaper chosen. This gives users a sense of what they're actually working with.
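As a rough illustration of the data behind the multi-line chart and the word count totals, here is a minimal sketch. The sample shape (`date`, `paper`, `words`) and the function names `buildSeries` and `scannedTotals` are assumptions made for this example, not the project's actual code or the UDN API's response format.

```javascript
// Hypothetical input: one entry per sampled newspaper issue, e.g.
// { date: "1995-01-01", paper: "Salt Lake Tribune", words: ["internet", ...] }

// Build one series per search term: [{ term, points: [{ date, count }] }].
// Each series maps onto one line in a multi-line chart.
function buildSeries(samples, terms) {
  const wanted = terms.map((t) => t.toLowerCase());
  const series = wanted.map((term) => ({ term, points: [] }));
  for (const sample of samples) {
    const counts = new Map(wanted.map((t) => [t, 0]));
    for (const word of sample.words) {
      const key = word.toLowerCase();
      if (counts.has(key)) counts.set(key, counts.get(key) + 1);
    }
    for (const s of series) {
      s.points.push({ date: sample.date, count: counts.get(s.term) });
    }
  }
  return series;
}

// Totals shown alongside the chart: words scanned across all papers,
// and words that came from the selected paper.
function scannedTotals(samples, selectedPaper) {
  let all = 0;
  let selected = 0;
  for (const sample of samples) {
    all += sample.words.length;
    if (sample.paper === selectedPaper) selected += sample.words.length;
  }
  return { all, selected };
}
```

Keeping one `points` array per term means each search term can be handed to a line generator independently, which is what lets several terms share the same time axis.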

Supplemental bar charts: Generated per query—one showing which papers were published in the user's date range (encouraging discovery of other newspapers), and one showing the top 10 most frequent words in the requested paper on each queried date.
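The top 10 bar chart boils down to a count-and-sort step. A minimal sketch, assuming the words for one issue are already available as an array (the function name `topWords` is illustrative, not from the project):

```javascript
// Count occurrences and return the n most frequent words.
// Ties are broken alphabetically so the output order is stable.
function topWords(words, n = 10) {
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([word, count]) => ({ word, count }));
}
```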

Gantt-style timeline: Shows the archive presence of all 300+ newspapers as horizontal bars on a shared time axis. We implemented vertical scrolling specifically because of scale—300+ newspapers spanning ~160 years can't be meaningfully compressed. Hover reveals newspaper name, location, and exact archive dates in a sidebar panel.
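To show why scrolling follows from the scale, here is a sketch of the layout arithmetic for a timeline like this one. The field names (`startYear`, `endYear`), the `layoutTimeline` function, and the 14 px row height are assumptions for illustration only.

```javascript
// Lay out one horizontal bar per newspaper on a shared time axis.
// x is a plain linear mapping from year to pixels; y is the row index.
function layoutTimeline(papers, width, rowHeight = 14) {
  const minYear = Math.min(...papers.map((p) => p.startYear));
  const maxYear = Math.max(...papers.map((p) => p.endYear));
  const span = Math.max(1, maxYear - minYear); // avoid dividing by zero
  const x = (year) => ((year - minYear) / span) * width;

  const bars = papers.map((p, i) => ({
    name: p.name,
    x: x(p.startYear),
    y: i * rowHeight,
    width: Math.max(1, x(p.endYear) - x(p.startYear)), // keep short runs visible
    height: rowHeight - 2,
  }));

  // Total height grows with the number of papers, not the viewport.
  return { bars, totalHeight: papers.length * rowHeight };
}
```

Under these assumed numbers, 300 rows at 14 px each need 4,200 px of height, far more than a typical viewport, so a scrollable container is the natural fit.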

Utah map: An SVG map with dots for each city or town represented in the archive. Clicking a city populates a table with all publications from that city, their total document count, and their archive date range. Maps provide a geographic perspective that categorical charts can't give—they make the archive's uneven coverage immediately visible.
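The click-to-table interaction can be thought of as a lookup keyed by city. A minimal sketch, assuming each publication record carries a `city`, `title`, `documents`, `firstYear`, and `lastYear` field (hypothetical names, not the UDN API's schema):

```javascript
// Group publication records by city so a click handler can look up
// the rows for the selected city in one step.
function publicationsByCity(publications) {
  const byCity = new Map();
  for (const pub of publications) {
    if (!byCity.has(pub.city)) byCity.set(pub.city, []);
    byCity.get(pub.city).push({
      title: pub.title,
      documents: pub.documents,
      range: `${pub.firstYear} to ${pub.lastYear}`,
    });
  }
  return byCity;
}

// Rows for the table; an unknown city yields an empty table.
function rowsForCity(byCity, city) {
  return byCity.get(city) || [];
}
```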

County bubble chart: Shows publication count by county. We chose bubbles over a bar chart here because Salt Lake County is a dramatic outlier—a case where extreme size difference communicates more directly in area than in bar height, even though bubbles struggle with close comparisons between similar values.

Bubble chart showing newspaper publication counts by Utah county, with Salt Lake County as a dominant outlier. Salt Lake County's dominance is immediately visible in bubble area. We chose this chart type specifically because the data has a dramatic outlier, and accepted that close comparisons between similar counties are harder to read.
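The tradeoff described for the bubble chart comes from encoding value as area. A small sketch of that mapping, assuming a square-root radius scale (D3's `scaleSqrt` serves the same purpose; the `bubbleRadius` helper and the 40 px maximum are illustrative assumptions):

```javascript
// Map a value to a radius so that bubble AREA is proportional to the value.
// Area = pi * r^2, so r must scale with the square root of the value.
function bubbleRadius(value, maxValue, maxRadius = 40) {
  if (maxValue <= 0 || value <= 0) return 0;
  return maxRadius * Math.sqrt(value / maxValue);
}
```

With this mapping, a county with a quarter of the largest count gets half the radius and a quarter of the area. That is why an outlier stands out strongly, while two counties with similar counts get radii that differ only slightly.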

Data cleaning and responsible API usage

The UDN API's OCR data—text scanned from physical newspapers—required significant cleanup before it was usable. We implemented a stop word list and dictionary check, filtering out terms that weren't valid words before counting frequency. A naive first implementation was noticeably slow when data structures exceeded 100,000 elements; we optimized the algorithm after identifying the bottleneck.
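A minimal sketch of this kind of filter follows. The tiny word lists, the `cleanTokens` name, and the `filter` option are placeholders for illustration. Using `Set` lookups instead of array scans is one common way to keep such a filter fast at 100,000+ elements; it is shown here as an assumption, not as the optimization the project actually made.

```javascript
// Placeholder lists; real stop word lists and dictionaries are far larger.
const STOP_WORDS = new Set(["the", "and", "of", "a", "to", "in"]);
const DICTIONARY = new Set(["internet", "radio", "utah", "newspaper"]);

// Split OCR text into lowercase tokens, then optionally drop stop words
// and anything that is not in the dictionary (typical OCR noise).
// Set.has is a constant-time lookup on average, unlike Array.includes,
// which scans the whole list for every token.
function cleanTokens(ocrText, { filter = true } = {}) {
  const tokens = ocrText
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
  if (!filter) return tokens;
  return tokens.filter((t) => !STOP_WORDS.has(t) && DICTIONARY.has(t));
}
```

Passing `{ filter: false }` keeps every token, so non-dictionary strings survive at the cost of holding many more tokens in memory.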

We also gave users control over the stop word filter. Turning it off lets users search for terms like "y2k" that wouldn't survive dictionary validation—but at a real memory cost, which we surfaced in the UI so users could make an informed choice.

The API itself is inefficiently designed: finding the closest published paper to a given date can require 30-100 sequential API calls. Rather than hiding this constraint, we designed around it. The line chart's granularity controls—increment amount and number of dates—let users explicitly limit how many API calls their query generates. The UDN API is a public resource; consuming it carelessly would be poor practice, and building around the constraint made the tool more honest about what it was doing.
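To make the effect of the granularity controls concrete, here is a sketch of how an increment and a number of dates bound the request volume. The function names and the day-based increment are assumptions for this example; the 100-call ceiling is the upper end of the 30-100 range mentioned above.

```javascript
// Generate the dates a query will sample: `count` dates starting at
// `start` (YYYY-MM-DD), spaced `incrementDays` apart. Uses UTC throughout
// so local time zones cannot shift the calendar date.
function sampleDates(start, incrementDays, count) {
  const dates = [];
  const cursor = new Date(start + "T00:00:00Z");
  for (let i = 0; i < count; i++) {
    dates.push(cursor.toISOString().slice(0, 10));
    cursor.setUTCDate(cursor.getUTCDate() + incrementDays);
  }
  return dates;
}

// Worst-case number of API calls for a query, if locating the nearest
// published issue can take up to `maxCallsPerDate` sequential calls.
function worstCaseCalls(dateCount, maxCallsPerDate = 100) {
  return dateCount * maxCallsPerDate;
}
```

Under these assumptions, a query with 10 sampled dates could trigger up to 1,000 calls, while 30 dates could trigger up to 3,000, which is why capping the number of dates is the user's most direct lever on load.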

What the tool can actually reveal

The line chart turned out to be the most revealing visualization. Tracking "handicapped," "disabled," and "retarded" across Utah papers shows a visible shift in language use after the ADA passed in 1990—"disabled" rising as older terms decline. The pattern makes cultural change legible in a way that narrative description can't quite match.

Tracking "y2k" and "millennium" shows panic accelerating through 1998-1999 and dropping off sharply after January 2000—the arc of public anxiety made visible in word frequency.

Line chart showing frequency of 'handicapped', 'disabled', and 'retarded' in Utah newspapers from 1975-2004 — Tracking disability language across Utah papers: 'disabled' rises after the ADA passes in 1990, while older terms decline. The tool surfaces cultural change through word frequency—a use case that wasn't possible without the exploration layer.

Gantt-style newspaper timeline with hover state showing Beaver County News details in a sidebar panel — The timeline covers 300+ newspapers spanning ~160 years. Vertical scrolling was a deliberate usability decision—the data couldn't be compressed without losing meaning. Hover reveals exact archive dates and location.

Outcome & results

We delivered five interconnected visualizations covering word frequency trends, geographic distribution, archive structure, and historical span, all pulling live from the UDN API. The dashboard makes the archive's contents legible in ways that weren't possible before, and it surfaces the archive's gaps just as clearly: where coverage is thin, which papers end too early to capture recent events, which regions are underrepresented.

We shipped a working, interactive dashboard within a single semester using no external libraries beyond D3.js. Scope was an explicit constraint, not an afterthought, and the project is better for having treated it that way.

Reflection

The map's biggest usability weakness is the absence of zoom and city labels — users need prior geographic knowledge to find specific locations. Given more time, adding zoom and inline labels would be the first change.

The bubble chart's limitation—difficulty distinguishing counties with similar publication counts—was a conscious tradeoff. A bar chart would have handled close comparisons better; bubbles handled the outlier better. I'd add click interaction next, letting users drill into the papers belonging to a given county.

The ethical API usage decision is something I'd carry into any future project involving public third-party APIs. Building around the constraint rather than ignoring it made the tool more honest about what it was doing — and more respectful of a shared resource.
