Reposted from Northeastern Global News
A Northeastern researcher and a former postdoctoral fellow have created an artificial intelligence tool that uses sequences of life events — such as health history, education, job and income — to predict everything from a person’s personality to their mortality.
Built using transformer models, which power large language models (LLMs) like ChatGPT, the new tool, life2vec, is trained on a data set pulled from the entire population of Denmark — 6 million people. The data set was made available to the researchers only through the Danish government.
The tool the researchers built based on this complex set of data is capable of predicting the future, including the lifespan of individuals, with an accuracy that exceeds state-of-the-art models. But despite its predictive power, the team behind the research says it is best used as the foundation for future work, not an end in and of itself.
“Even though we’re using prediction to evaluate how good these models are, the tool shouldn’t be used for prediction on real people,” says Tina Eliassi-Rad, professor of computer science and the inaugural President Joseph E. Aoun Professor at Northeastern University. “It is a prediction model based on a specific data set of a specific population.”
Eliassi-Rad brought her AI ethics expertise to the project. “These tools allow you to see into your society in a different way: the policies you have, the rules and regulations you have,” she says. “You can think of it as a scan of what is happening on the ground.”
By involving social scientists in the process of building this tool, the team hopes to bring a human-centered approach to AI development, one that doesn’t lose sight of the humans behind the massive data set their tool has been trained on.
“This model offers a much more comprehensive reflection of the world as it’s lived by human beings than many other models,” says Sune Lehmann, an author of the paper, which was recently published in Nature Computational Science.
At the heart of life2vec is the massive data set that the researchers used to train their model. The data is held by Statistics Denmark, the central authority on Danish statistics, and, although tightly regulated, can be accessed by some members of the public, including researchers. The reason it’s so tightly controlled is it includes a detailed registry of every Danish citizen.
The many events and elements that make up a life are spelled out in the data, from health factors and education to income. The researchers used that data to create long patterns of recurring life events to feed into their model, taking the transformer model approach used to train LLMs on language and adapting it for a human life represented as a sequence of events.
“The whole story of a human life, in a way, can also be thought of as a giant long sentence of the many things that can happen to a person,” says Lehmann, a professor of networks and complexity science at DTU Compute, Technical University of Denmark, and previously a postdoctoral fellow at Northeastern.
The model uses the information it learns from observing millions of life event sequences to build what are called vector representations in embedding spaces, where it starts to categorize and draw connections between life events such as income, education or health factors. These embedding spaces serve as the foundation for the predictions the model ends up making.
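The idea of treating a life as a sentence of event tokens, each mapped to a vector in an embedding space, can be sketched in a few lines. This is a minimal illustration, not the authors’ code: the event tokens are invented for the example, and the embedding weights are random placeholders, whereas the real life2vec model learns its representations by training a transformer on registry data.

```python
import numpy as np

# Hypothetical event tokens: a life as a sequence of discrete events,
# analogous to words in a sentence (these names are illustrative only).
life_a = ["BIRTH", "EDU_highschool", "JOB_nurse", "INCOME_q3", "DX_asthma"]
life_b = ["BIRTH", "EDU_university", "JOB_teacher", "INCOME_q3"]

# Build a shared vocabulary of event tokens, as an LLM tokenizer would.
vocab = {tok: i for i, tok in enumerate(sorted(set(life_a + life_b)))}

rng = np.random.default_rng(0)
dim = 8  # tiny embedding dimension, for illustration
# One vector per event token; in the real model these weights are
# learned by the transformer, here they are random stand-ins.
embedding = rng.normal(size=(len(vocab), dim))

def encode(life):
    """Map a life (list of event tokens) to vectors, then summarize
    the whole sequence as a single vector by mean-pooling."""
    ids = [vocab[tok] for tok in life]
    return embedding[ids].mean(axis=0)

def cosine(u, v):
    """Similarity of two lives in the embedding space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vec_a, vec_b = encode(life_a), encode(life_b)
print(vec_a.shape)           # each life becomes one 8-dimensional vector
print(cosine(vec_a, vec_b))  # similarity between the two lives
```

Once every life occupies a point in the same space, downstream predictions (mortality, personality traits) amount to learning which regions of that space correspond to which outcomes.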
One of the life events that the researchers predicted was a person’s probability of mortality.
“When we visualize the space that the model uses to make predictions, it looks like a long cylinder that takes you from low probability of death to high probability of death,” Lehmann says. “Then we can show that in the end where there’s high probability of death, a lot of those people actually died, and in the end where there’s low probability of dying, the causes of death are something that we couldn’t predict, like car accidents.”
The paper also illustrates how the model is capable of predicting individual answers to a standard personality questionnaire, specifically when it comes to extroversion.
Eliassi-Rad and Lehmann note that although the model makes highly accurate predictions, they are based on correlations, highly specific cultural and societal contexts and the kinds of biases that exist in every data set.
“This kind of tool is like an observatory of society — and not all societies,” Eliassi-Rad says. “This study was done in Denmark, and Denmark has its own culture, its own laws and its own societal rules. Whether this can be done in America is a different story.”
Given all those caveats, Eliassi-Rad and Lehmann view their predictive model less like an end product and more like the beginning of a conversation. Lehmann says major tech companies have likely been creating these kinds of predictive algorithms for years in locked rooms. He hopes this work can start to create a more open, public understanding of how these tools work, what they are capable of, and how they should and shouldn’t be used.
“The other path ahead is to say, once we can make these accurate predictions about everything — because we just chose two things, but we can predict all kinds of things — which are the ones we want to implement in democratic societies?” Lehmann says. “I don’t have those answers, but it’s high time we start the conversation because what we know is that detailed prediction about human lives is already happening and right now there is no conversation and it’s happening behind closed doors.”
“It’s about getting knowledge instead of just predictions,” Lehmann adds. “The knowledge is something we can share and something we can turn into action.”
One of the more promising areas where the researchers see this tool having a positive impact is in health care.
“I am optimistic and want to spend more time in that direction because I think we could do some real good and we could really help by mining this space to help people,” Lehmann says. “This is not sending out a text message to people saying, ‘You’re going to have cancer if you don’t change this,’ but you can have your medical professional get this piece of information so that they can help guide you.”
Eliassi-Rad says health care is also a promising application because it accounts for one of the ethical problem areas that she is concerned about when it comes to how this technology is often implemented: accountability.
“I think health care is a good avenue to use this tool as an exploration and perhaps to be able to provide better health care,” Eliassi-Rad says. “In particular, it’s suitable because there are people who can be held accountable, as opposed to no accountability whatsoever when people’s lives are ruined on some prediction that an AI model makes.”
Eliassi-Rad wants to avoid the ethical pitfalls of how predictive tools have been used to impact policy in the past, like in the case of a Dutch fraud assessment algorithm. A tool like life2vec is less about predicting every aspect of a person’s future and more about exploring trends in a society, its policies and its people at a level that was never possible before.
“It’s not good to think about people as vectors in some Euclidean space, and that’s why it’s more about exploration, because if you start thinking about people as vectors (i.e., mathematical objects), mathematical objects come and go,” Eliassi-Rad says. “But they’re real people — they have hearts and minds.”