ChromaScribe | charlieburr

ChromaScribe

Overview

Ask

Develop a web tool that uses multimodal data visualization to solve the problem described below, drawing on grant proposals and prior academic research in the field to inform design decisions.

The Problem

Researchers who use qualitative coding collect and code hours of transcripts across dozens, if not hundreds, of participants. The sheer volume of data creates an untenable cognitive burden. The resulting data fatigue leads to omitted key themes and imperfect coding.

Additionally, when working with controversial topics, researcher bias can influence outcomes. This can lead to biased conclusions that negatively impact the participating groups and the data.

Tools

Figma & FigJam, Slack, GitHub, Visual Studio Code

Duration

2022 - 2023

Made possible by The National Science Foundation

Key Issue

Human bias can influence research outcomes at many steps. From question writing, sample collection, question wording, and interviewer conduct to transcript coding, commenting, and finally drafting conclusions, researchers can intentionally or unintentionally introduce bias at each one.

What the tool addresses

ChromaScribe was created to address the transcript coding step of the process. This is the portion of the process where researchers will develop themes and then code words in the transcript to those themes. Think of the themes as labeled buckets. Words (and the sentences they came from) will be dropped into those buckets and help the researcher to draw their conclusions at the end. The issue occurs when researchers develop these themes with unintentional bias that will skew all remaining data.

Our tool is specifically built with research on controversial issues in mind. Machine learning can parse themes for researchers without their preconceptions by seeing all the transcripts at once and pulling themes directly from what is said.

Why This Work Is Important

As a queer designer, this work is incredibly impactful for me. The transcripts on which we are building the program come from a study of individuals with HIV. I hope that this tool can create more fairness in the way we do research and bring a more just and unbiased approach to qualitative coding that will protect at-risk groups along the way.

But what is thematic coding anyway?

Thematic coding is a qualitative data analysis technique that, in the simplest terms, is the act of organizing and grouping similar ideas or topics in transcribed interviews, helping researchers see the main points more clearly.

Design Process

By the time I joined the project, exploratory research had been conducted. It was my job to understand this research and develop designs based on it. I started by reading the research, taking notes, and annotating the text.

Competitor Analysis

To better understand our core users, I studied how they would use our competitors' programs. I watched the tutorial videos of those programs and took notes on areas where we could improve. I also sought to understand how users currently complete the tasks we solve.

Original Design

This is the original design that I was presented with. Many changes have been made to it, including a UI overhaul. Demographic filtering was shrunk to prioritize the transcript and visualization. I also made more room for transcripts by moving the audio player to the bottom and removing unnecessary data.

User Flow

Sketching

Early in the process, I used GoodNotes to create digital sketches for easy remote handoff. I illustrated my ideas to present to the group, noted specific questions/options, and offered real-world examples to give a better understanding.

Feedback Loops 🔁

In addition to communicating regularly on Slack with my peers developing the tool, I created a system in Figma to comment and communicate on the status of implemented designs.

Each section is dated with space to summarize our main comments. This creates an archive of the implementation as well as an organized space to collaborate.

Accessibility

Accessibility was important for me to think about when creating our design system, given the product's heavy reliance on color. Colors were analyzed for contrast and color blindness and edited to prioritize accessibility.
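As an illustration of the contrast analysis, the sketch below computes the WCAG 2.x contrast ratio between two sRGB colors. The function names and the grey hex value are assumptions made for this example, not ChromaScribe's actual palette or the tooling the team used.

```python
def channel_to_linear(value: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * channel_to_linear(r)
            + 0.7152 * channel_to_linear(g)
            + 0.0722 * channel_to_linear(b))


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical grey checked against a white background (not a real ChromaScribe color).
ratio = contrast_ratio("#767676", "#FFFFFF")
passes_aa_text = ratio >= 4.5  # WCAG AA threshold for normal-size text
```

A check like this only covers contrast. Color-blindness review needs a separate simulation step or a manual pass with a design tool.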

Iterative UX Design Decisions

The full blue design lacked professionalism and conveyed a playful, toy-like aesthetic that did not appeal to the target user.
The design was revised to use a sidebar for color, and the main body was changed to grey.
The grey color posed a problem by blending in with the background of the homepage, harming visual continuity. The harsh drop shadows added too much depth, so they were toned down and the rounding was reduced to enhance the professional appeal.

User Research & Usability Findings

In-depth interview sessions were conducted by walking participants through the tool over Zoom. Participants were asked for impressions at every step and asked to share overall impressions at the end of the session.

Positives

The visualization of the transcripts was immediately insightful in reducing bias

Participant indicated that our tool reduced cognitive burden, a key goal of the project

Time to code was greatly reduced and the participant was interested in using the tool

Negatives

Saving versions was necessary for auditing and was missing from the tool

The tool was missing an Excel/CSV import and export function

It was unclear where speakers were silent in the audio vs neutral

Next Steps

We were pleased to find that the project was meeting its key goals of reducing bias and cognitive burden. Overall, participant impressions were highly positive.

However, there were a few features that were deemed necessary for minimum viability. For example, researchers need the ability to save versions of their work for auditing purposes, and the ability to import and export data from Excel is necessary for seamless integration into the user's workflow. Therefore, these features will be our focus for the next steps of the project.

We plan to conduct further interviews in January 2023.

Designs

* Note: To protect the identity and sensitive medical information of HIV health study participants, transcript areas are blurred unless the content is fabricated.

Transcript Visualization Panels

Interviews are represented by two paired color bars. The top bar represents the interviewer while the bottom represents the participant. Each color corresponds to a theme generated by our machine learning. This is a novel way to represent these transcripts, and it aims to save time by presenting highly complex data in a simple visual form.

When an interview is selected, the audio waveform can be seen. The waveform represents pitch and volume, giving users an instant visual sense of the speakers' relative emotion and tone.

Transcript Panels

Transcribed audio segments appear here. Highlights mark coded words, and each highlight's color corresponds to its coded theme. The overall segment color is determined by the most dominant theme. The transcript can be read via this bar, and the segments auto-scroll as the interview is played.
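One way to pick a segment's overall color is sketched below. It assumes that "most dominant" means the theme with the most coded words in the segment, which the project does not spell out, and the `CodedWord` type and theme names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodedWord:
    text: str   # the highlighted word
    theme: str  # the theme it was coded to


def dominant_theme(coded_words: List[CodedWord]) -> Optional[str]:
    """Return the theme with the most coded words, or None for an uncoded segment.

    Ties go to the theme that appears first in the segment, because Counter
    keeps first-seen order among equal counts.
    """
    if not coded_words:
        return None
    counts = Counter(word.theme for word in coded_words)
    return counts.most_common(1)[0][0]
```

The segment color would then be looked up from whatever theme-to-color mapping the design system defines.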

Theme Filtering

The themes bar allows users to filter audio segments by theme, making it faster for researchers to find the segments they need.

Demographic Filtering

Interviews can be filtered via the expanding sidebar. Users can find patterns in race, age, HIV status or any other demographic as it relates to the selected themes.

Originally, demographics appeared at the top of the page. I designed a collapsible sidebar so users can keep it open if desired or close it to save space when they are not adjusting filters.
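The filtering behavior described in this section could be modeled roughly as follows. This is a minimal sketch: the `Interview` shape, the field names, and the rule that an interview must match every selected demographic value and contain every selected theme are assumptions, not the tool's documented logic.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Interview:
    interview_id: str
    demographics: Dict[str, str]                     # e.g. {"age_group": "30-39"}
    themes: Set[str] = field(default_factory=set)    # themes coded in this interview


def filter_interviews(
    interviews: Iterable[Interview],
    demographics: Optional[Dict[str, str]] = None,
    themes: Optional[Iterable[str]] = None,
) -> List[Interview]:
    """Keep interviews matching every requested demographic value and theme."""
    wanted_demographics = demographics or {}
    wanted_themes = set(themes or [])
    return [
        interview
        for interview in interviews
        if all(
            interview.demographics.get(key) == value
            for key, value in wanted_demographics.items()
        )
        and wanted_themes <= interview.themes
    ]
```

Calling `filter_interviews(all_interviews, {"age_group": "30-39"}, ["support"])` on fabricated data would return only the interviews in that age group where the "support" theme was coded. Passing no filters returns every interview.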

Let's generate our themes 🧪

Rather than having users (researchers) create themes that could be influenced by their own bias, the tool first presents them with a list of the top themes generated by ML. The option to add their own is secondary.

Themes will light up with color when selected. When done, users click next to continue.

Once themes are selected, further machine learning generates a word list for each theme from the provided transcripts. The program uses these words to auto-code transcripts after themes are created. Users drag and drop words from the word cloud (which shows relative relevance) to the theme boxes.
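The auto-coding step can be pictured with the sketch below, which tags words in a transcript segment using per-theme word lists. It assumes simple case-insensitive whole-word matching; the actual program may use stemming or a learned model instead, and the function name, themes, and sample sentence are fabricated for illustration.

```python
import re
from typing import Dict, List, Tuple


def auto_code(segment: str, theme_words: Dict[str, List[str]]) -> List[Tuple[str, str, int, int]]:
    """Return (word, theme, start, end) for each word in the segment that is on a theme's list.

    If a word is listed under several themes, the theme listed last wins.
    """
    lookup = {
        word.lower(): theme
        for theme, words in theme_words.items()
        for word in words
    }
    matches = []
    for match in re.finditer(r"[A-Za-z']+", segment):
        theme = lookup.get(match.group().lower())
        if theme is not None:
            matches.append((match.group(), theme, match.start(), match.end()))
    return matches


# Fabricated example, not taken from study transcripts.
theme_words = {"support": ["family", "friends"], "healthcare": ["clinic"]}
coded = auto_code("My family drove me to the Clinic.", theme_words)
```

The start and end offsets are what a front end would need to draw each highlight in its theme color.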

Generated themes live here permanently. Color, words, theme name, and related words can be edited here.