How researchers are using Reddit and Twitter data to forecast suicide rates
Social media sites offer unprecedented levels of real-time data that could be used for mental health research. | Thomas Trutschel/Photothek/Getty Images

Researchers at the CDC and Georgia Tech are using a whole lot of data, including social media, to forecast the suicide rate, a statistic that can lag by up to two years.

The Centers for Disease Control and Prevention (CDC) is using data from platforms like Reddit and Twitter to power artificial intelligence that can forecast suicide rates. The agency is doing this because its current suicide statistics are delayed by up to two years, which means that officials are forming policy and allocating mental health resources throughout the country without the most up-to-date numbers.

The CDC’s suicide rate statistics are calculated from cause-of-death reports from all 50 states, which are compiled into a national database. That information is the most accurate reporting we have, but it can take a long time to produce.

“If we want to do any kind of policy change, intervention, budget allocation, we need to know the real picture of what is going on in the world in terms of people’s mental health experiences,” Munmun de Choudhury, a professor at Georgia Tech’s School of Interactive Computing who is working with the CDC, told Recode.

Researchers believe that combining other types of real-time data, including content from social media platforms like Reddit and Twitter, with health-related data we already have, like data from suicide helplines, could reduce that lag. The idea is that, together, these sources of data send “signals” about what the suicide rate is, and what it will be, which artificial intelligence can be trained to uncover.

This effort is just another way that AI is being used to study how we talk online and to power new approaches to public health.
Similar technology is already being used to catch illegal sales of opioids online and has even helped track the initial outbreak of the novel coronavirus. Approaches like these could help save lives, but they’re a reminder that information shared publicly on the internet is increasingly driving health policy, informing decisions that have a real impact on our lives, including suicide prevention efforts.

Combining health data with information gleaned from Twitter and Reddit can make for better predictions

Without real-time estimates of the suicide rate, it can be incredibly difficult for public health officials to direct suicide and self-harm prevention efforts precisely where they’re needed. A CDC spokesperson said those numbers can be delayed by one to two years, which makes it harder to respond properly to the increasing suicide rate, which has surged 40 percent in less than two decades.

“When you have data that is dated, and you know that the rates of suicide are increasing but you don’t know by how much, it can severely impact the kinds of interventions organizations like the CDC can do, [such as] maybe improving access to resources [and] allocating resources throughout the country,” de Choudhury told Recode.

She explains that keywords related to suicide help whittle down publicly available data. A summary of research Recode obtained through a public records request noted that this information could be drawn from “news reports, Twitter, Reddit, Google Trends, [and] YouTube Search trends.” That data is then combined with other health data the CDC has, including data from crisis text and call lines. Based on all these sources, along with previous suicide rates produced through the CDC’s National Vital Statistics program, researchers can train an algorithm to forecast what the actual rate is.
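The keyword-filtering step de Choudhury describes can be illustrated with a minimal sketch: scan public posts for suicide-related keywords and aggregate the matches into a weekly count, the kind of “signal” a forecasting model could consume. The keyword list, posts, and field names below are invented for illustration; the actual keyword set and data pipeline used by the CDC and Georgia Tech researchers are not public.

```python
# Hypothetical sketch: filter public posts by suicide-related keywords
# and aggregate the matches into a weekly count signal.
from collections import Counter
from datetime import date

# Illustrative keyword list; the real research keyword set is not public.
KEYWORDS = {"suicide", "self-harm", "hopeless"}

# Toy stand-ins for public posts, bucketed by the Monday of their week.
posts = [
    {"week": date(2019, 1, 7), "text": "feeling hopeless lately"},
    {"week": date(2019, 1, 7), "text": "great game last night"},
    {"week": date(2019, 1, 14), "text": "resources for self-harm support"},
]

def matches(text: str) -> bool:
    """True if any keyword appears in the post text."""
    lowered = text.lower()
    return any(kw in lowered for kw in KEYWORDS)

# Weekly counts of keyword-matching posts: one real-time "signal".
weekly_signal = Counter(p["week"] for p in posts if matches(p["text"]))
print(dict(weekly_signal))
```

In practice, a signal like this would sit alongside others, such as crisis-line call volumes and search trends, before any model sees it.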
“You train a machine-learning model using data and then you apply that model on an unseen data set to see how well it is doing,” de Choudhury told Recode. “The project was: How can we intelligently harness signals from these different real-time sources in order to offset this one- to two-year lag?”

She says the first phase of the research had “remarkable success,” with the algorithm showing an error rate of less than 1 percent. That number represents the average difference between the predicted suicide rate and the actual rate as historically reported by the CDC.

“What our method does is give estimates at a weekly granularity over all of 2019,” de Choudhury says. “What we are saying is that we can now estimate these rates of suicide up to a year in advance of when death records become available.” That means the researchers could use data collected through December 2019 to estimate the suicide rate for every week of 2019, a year or more before the official death records are compiled.

A CDC spokesperson told Recode a research paper is expected later this year but that the work is still in an early stage.

AI is increasingly being used to identify suicide risk

De Choudhury says her work with the CDC is just one way AI can drive mental health efforts. Another idea: using machine learning to study patients’ social media (with their consent) to help determine when a person’s mental health symptoms are getting worse.

“By the time people do get connected with care, to receive adequate health [care], that is pretty late in their trajectory of the illness, which makes appropriate treatment that can be tailored to the person specifically really, really challenging,” she explains.

The CDC and de Choudhury are not alone in looking at the role of AI in identifying people at risk of suicide. Researchers at Vanderbilt University have used machine-learning algorithms, trained on a wide range of data, to predict the likelihood that someone might take their own life.
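The train-and-evaluate loop de Choudhury describes, fitting a model on seen weeks, applying it to unseen weeks, and scoring the average gap between predicted and actual rates, can be sketched as below. The model (a one-feature least-squares fit) and every number are invented for illustration; the article does not disclose the actual model or data.

```python
# Hypothetical sketch of the train/evaluate loop described above:
# fit on historical ("seen") weeks, predict held-out ("unseen") weeks,
# and report the mean absolute percentage error between predicted and
# actual rates. Signal counts and rates here are toy values.
signal = [100, 110, 120, 130, 140, 150, 160, 170]          # weekly keyword counts
rate = [10.0, 10.9, 12.1, 12.9, 14.1, 15.0, 16.1, 16.9]    # deaths per 100k

train_x, train_y = signal[:6], rate[:6]   # "seen" weeks
test_x, test_y = signal[6:], rate[6:]     # held-out "unseen" weeks

# Ordinary least squares for a single feature: rate ≈ a * signal + b
n = len(train_x)
mean_x = sum(train_x) / n
mean_y = sum(train_y) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
b = mean_y - a * mean_x

# Apply the fitted model to the unseen weeks and measure the error.
pred = [a * x + b for x in test_x]
mape = sum(abs(p - y) / y for p, y in zip(pred, test_y)) / len(test_y)
print(f"error rate: {mape:.2%}")
```

The “error rate of less than 1 percent” quoted above is, by de Choudhury’s description, an average of exactly this kind of predicted-versus-actual difference, computed against historical CDC rates rather than toy numbers.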
And researchers in Berkeley, California, working with the Department of Energy and the Department of Veterans Affairs, are using deep learning to identify and score a patient’s risk of suicide.

Meanwhile, the Crisis Text Line, a text messaging service that lets people who are struggling with their mental health text with a counselor, is using similar AI to figure out which people who reach out on its service are more likely to engage in self-harm or to attempt suicide. That approach is not unlike the AI used by Facebook, which analyzes content on its site to make an informed guess about whether someone is at risk of “imminent harm,” though that strategy has also raised questions about data privacy and transparency.

As with most tech innovations, there are trade-offs to using people’s online communication, even personal comments about mental health, to help power AI. It’s worth asking whether we’re comfortable with corporate social platforms making these types of judgments about us, especially on sensitive matters like suicide. At the same time, this tech could also help save people’s lives and get them the resources they need, assuming it works and is used responsibly.

And, as the CDC’s research demonstrates, that information can do more than help individual people. It can help shape how we address the suicide epidemic as a whole.

Open Sourced is made possible by Omidyar Network. All Open Sourced content is editorially independent and produced by our journalists.