Breakthrough · MAY 28, 2026 · AI COMPANIONS · ELDER CARE

AI Companions and Elder Loneliness — What the Research Actually Says

ElliQ, Pi, Replika, Character.ai for lonely seniors. The 95% headline, the peer-reviewed reality, and the parasocial risk nobody is pricing in.

By Kadin Nestler · May 28, 2026 · 14 min read


Chart: Loneliness change in AI-companion studies of older adults (self-reported)

NYSOFA / ElliQ press release, n≈800, no control (2024): -95

Broadbent / Doraiswamy ElliQ RCT, n≈200, CoBot-I-7 (2024): -28

De Freitas et al. HBS RCT, 1-week LLM companion (2024): -19

Systematic review pooled mean, 9 studies (2025): -22

Stanford RCT 4-week heavy users, Hancock lab (2025): +6

The headline you have probably seen is from the New York State Office for the Aging — "ElliQ AI Companion Robot Shows 95% Reduction in Loneliness." It ran in trade press through 2024 and into early 2025, and it is the most-cited number in the entire elder-AI-companion category. It is also a self-reported figure from a vendor-partnered deployment with no control group, no UCLA Loneliness Scale anchor, and no peer review at the time of the press cycle. The actual peer-reviewed literature on AI companions for older adults reads differently — more modest, more contingent, more interesting.

This piece is for the operator, the family caregiver, the geriatrician, and the agency director trying to decide whether to put a Pi, a Replika, a Character.ai, a Kuki Companion, or an ElliQ in front of a lonely 78-year-old this quarter. The answer is not "yes, deploy at scale." The answer is also not "no, this is a Sherry Turkle nightmare." The answer is "it depends on what the device is replacing and what you are actually measuring." Below — the studies, the design flaws, the user accounts, and the parasocial risks the headline number is not pricing in.

The category in 2026, briefly

There are four product shapes worth distinguishing, because the research splits cleanly along these lines and the failure modes do not generalize across them.

The first shape is the dedicated elder-care device — ElliQ from Intuition Robotics is the dominant deployed example. It is a tabletop unit with a screen-and-head form factor that initiates conversations proactively, runs scheduled check-ins, gates conversation through care-partner-tuned prompts, and pipes structured data back to the deploying agency. The New York State Office for the Aging (NYSOFA) program has put it in more than 800 households since 2022.

The second shape is the mainstream consumer companion chatbot — Replika (Eugenia Kuyda, founded 2017) and Character.ai (Noam Shazeer, Daniel De Freitas, founded 2021) are the two with the largest user bases. They were not designed for older adults. The research on them is overwhelmingly about general-population users, and most of the elderly use is incidental rather than designed-for. Replika now positions itself toward "wellness and emotional support" after a 2023 repositioning away from romance flows.

The third shape is the general-purpose conversational AI being studied as a loneliness intervention: primarily Pi from Inflection AI (now functionally a Microsoft asset after the March 2024 acquihire of Mustafa Suleyman and most of Inflection's 70-person team for roughly $650M), and to a lesser extent ChatGPT and Claude when wrapped in research-grade scaffolding. The working paper by Julian De Freitas and colleagues (Harvard Business School Working Paper 24-078, also on arXiv as 2407.19096) is the most-cited rigorous example in this bucket.

The fourth shape is the legacy social robot — Paro the robotic seal, the Jibo line, ElliQ's early predecessors. The Sherry Turkle nursing-home critique is anchored here and the relevant literature stretches back fifteen years. The new wave of LLM-driven companions inherits both the affordances and the criticisms.

The research on shapes one and three is genuinely encouraging in places. The research on shape two is mixed at best and openly cautionary at the heavy-user end. Most of the press conflates all four. That is the analytic problem the rest of this piece is trying to undo.

What the peer-reviewed evidence actually shows

Two papers carry most of the weight. The first is the Broadbent–Doraiswamy ElliQ study published in the Journal of Aging Research and Lifestyle in early 2024 ("ElliQ, an AI-Driven Social Robot to Alleviate Loneliness: Progress and Lessons Learned"). Elizabeth Broadbent is an associate professor in health psychology at the University of Auckland and a longtime researcher on social robots; Murali Doraiswamy is a professor of psychiatry at Duke and a regular voice on AI-in-medicine. The study enrolled more than 200 older adults across multiple deployment sites, with randomized assignment and a purpose-built measurement instrument called the CoBot-I-7 scale. Headline finding: 56% of users reported an increase in their social connections with others, not just with the device. The trial was co-run by the vendor, which is a real limitation, but the design was randomized, the instrument was disclosed, and the data were submitted to peer review. That is a different category of evidence than a press release.

The second is the De Freitas et al. work out of Harvard Business School. Their core experiment is a controlled, high-powered design in which participants interact with an LLM-driven companion versus control activities (watching YouTube, sitting in silence, conversing with another person). Results: AI companions reduced momentary loneliness on par with human conversation and substantially more than passive media consumption, and the effect ran through whether participants felt heard rather than through the underlying language model itself. The follow-on one-week longitudinal study finds the effect persists at the day-and-week timescale. De Freitas's own framing in the HBS Working Knowledge writeup is careful: "while we know interacting on a deep level with humans is best, a technology like this could be better than nothing." That is the right register for this evidence.

Then there is the 2025 systematic review in PubMed Central (PMC11898439), "AI Applications to Reduce Loneliness Among Older Adults," which pooled nine studies: six randomized controlled trials and three pre-post designs. Six of the nine reported statistically significant loneliness reductions, with the strongest effects clustered in social-robot interventions that ran longer than five weeks. A separate 2025 meta-analysis (PMC11930482) of AI intervention programs for older adults found the largest effect sizes in the 5-to-12-week intervention window, suggesting that brief one-week studies likely undershoot the true effect and very-long deployments may run into adaptation effects that have not been characterized yet.

The honest read across these papers: in dedicated elder-care deployments with proactive scaffolding, the loneliness-reduction effect appears real, the effect size is meaningful but not miraculous (somewhere in the 20-30 percentage-point neighborhood on validated instruments, not 95%), and the effect grows with deployment duration up to a point. That is a defensible scientific claim. It is not the same claim as the headline.

The 95% figure, examined

The NYSOFA / Intuition Robotics 95% number is real in the narrow sense that it reflects what was reported by participants, but it is structurally not the same kind of evidence as the studies above. The deployment was a pilot program, not a trial. There was no randomized control arm — every participant got an ElliQ. The participant pool was screened (residents with Alzheimer's or dementia were excluded from the rollout, which removes the population most resistant to AI-companion benefit from the denominator). The measurement instrument was not the UCLA Loneliness Scale or another validated tool; it was a survey question administered by the program. And the analysis was performed by the vendor in partnership with the state agency that funded the deployment.

None of that makes the number fraudulent. Self-reported loneliness reduction from a screened, motivated, vendor-supported deployment can easily reach extraordinary levels — that is the same dynamic that gives wellness apps their five-star App Store ratings. But the 95% figure should be read as "in a self-selected and screened group, willingness to report less loneliness after extended interaction with a proactive companion is very high," not "ElliQ removes 95% of clinical loneliness in the elderly population." Those two sentences mean different things and the headline blurs them.
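Part of the gap between those two sentences is plain arithmetic. The sketch below uses an invented cohort (not data from NYSOFA or any study) on an assumed 20-80 loneliness scale to show how the same participants can produce a near-universal "reported less loneliness" rate alongside a modest change in the average score. The function names are hypothetical.

```python
from statistics import mean


def share_improved(before, after):
    """Fraction of participants whose score dropped at all (lower = less lonely)."""
    pairs = list(zip(before, after))
    return sum(1 for b, a in pairs if a < b) / len(pairs)


def mean_percent_change(before, after):
    """Percent change in the cohort's mean score (negative = less lonely)."""
    return 100 * (mean(after) - mean(before)) / mean(before)


# Hypothetical cohort of 20 on a 20-80 scale (higher = lonelier).
# 19 participants improve by 4 points; one does not change.
before = [50] * 20
after = [46] * 19 + [50]

print(f"{share_improved(before, after):.0%} reported less loneliness")
print(f"{mean_percent_change(before, after):.1f}% change in mean score")
# prints "95% reported less loneliness" then "-7.6% change in mean score"
```

Both printed lines describe the same cohort. A headline built on the first and a policy decision built on the second would be talking about different quantities.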

Local reporting on the NYSOFA program has been more nuanced than the press release. The WSKG public-radio coverage noted that ElliQ is not a one-size-fits-all fit — some recipients lost interest, some never engaged at all, and the screening process matters more than the press cycle implies. That is the kind of detail that gets lost between trade publication and Twitter.

"The press number says 95%. The peer-reviewed evidence says 20-30 percentage points on validated instruments, contingent on deployment design. Both numbers are real. Only one of them is the basis for a policy decision."

Where the consumer chatbots break

The second product shape — Replika and Character.ai used by older adults outside any formal deployment — is where the research turns cautionary.

The most important paper here is the four-week randomized controlled trial out of the Stanford communications lab led by Jeffrey Hancock (with Yutong Zhang and Dora Zhao), studying more than 1,100 active AI-companion users. The headline finding is structurally inverted from the ElliQ data: voice-based interaction modestly reduced loneliness over the trial, but heavy daily users — people who had displaced human social contact with chatbot interaction — showed greater loneliness, more emotional dependence on the AI, and reduced real-world socializing compared to lower-frequency users. The mechanism the research points at is fairly clean: the AI companion is comfortable, available 24/7, and never asks for anything back, and that combination makes it disproportionately attractive to users who already have low social bandwidth — who then use it as a substitute for, rather than a supplement to, human contact.

This is the Sherry Turkle argument with a randomized control arm. Turkle, the MIT social scientist who has been writing about this since the Paro deployments in the early 2010s, has argued for over a decade that AI companions risk producing "the illusion of companionship without the demands of friendship." For nursing-home residents given robotic seals, she observed attachment without reciprocity — the resident believed the device understood them; the device was running response patterns. The 2025 Stanford evidence suggests her critique generalizes to LLM-era consumer companions in measurable ways.

The Replika user-account literature outside academia is consistent with this. The Brookings 2024 essay on chatbots replacing human connection, the APA Monitor on Psychology January-February 2026 piece on AI relationships, and a string of arXiv preprints (2410.20130 on harmful algorithmic behaviors; 2506.12605 on human-chatbot well-being) all converge on the same finding: emotional dependency on consumer companion chatbots is real, the parasocial dynamic is measurable, and the populations most at risk are also the populations with the highest baseline loneliness. The elderly are over-represented in that risk pool, especially elderly users who live alone, have lost a spouse recently, or have impaired mobility that limits in-person contact.

There is one wrinkle worth flagging. Replika's 2023 repositioning away from romance flows toward "wellness and emotional support" was widely reported and, for the existing user base, was traumatic — users described feelings of betrayal and grief as their companions' behavior changed underneath them. That episode is a structural risk for any deployment to older adults: the vendor controls the personality model and can change it unilaterally on a quarterly product cycle. Family caregivers who deploy Replika or Character.ai for a parent are accepting a non-trivial chance that the companion their parent has formed an attachment to becomes a different entity overnight. ElliQ and the dedicated elder-care products are more conservative on personality updates for this reason, though the risk is not zero anywhere in the category.

WHAT THE RESEARCH ACTUALLY MEASURES VS WHAT IT CLAIMS

Most studies in this category measure self-reported momentary loneliness on instruments validated against general-population samples (UCLA Loneliness Scale v3 most commonly). They do not measure clinical depression remission, cognitive engagement durability, social-network density change, or hospitalization rates. The strong effect-size claims travel by analogy to those harder outcomes, but the underlying measurements do not support the analogy. Read the methods section before you read the headline.
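For readers who have not seen the instrument scored, here is a minimal sketch of totaling the 20-item UCLA Loneliness Scale (Version 3). It assumes the commonly published conventions: responses coded 1 to 4, and items 1, 5, 6, 9, 10, 15, 16, 19, and 20 reverse-keyed. Check the item wording and scoring key against the published instrument before using this on real data. The function name is hypothetical.

```python
# Items worded in the "not lonely" direction; assumed reverse-keyed per the
# commonly published UCLA Loneliness Scale v3 scoring key (verify before use).
REVERSE_KEYED = {1, 5, 6, 9, 10, 15, 16, 19, 20}


def ucla_v3_total(responses):
    """Total score (20-80, higher = lonelier) from 20 responses coded 1-4."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for item, value in enumerate(responses, start=1):
        if value not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: response {value} outside 1-4")
        # Reverse-keyed items flip 1<->4 and 2<->3.
        total += 5 - value if item in REVERSE_KEYED else value
    return total


# Usage: one hypothetical respondent answering "2" to every item.
print(ucla_v3_total([2] * 20))  # prints 49
```

A single total like this is the "momentary loneliness" number most studies report. Nothing in it captures depression, cognition, or how many people the respondent actually talks to.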

Pi, post-Microsoft

The status of Pi is worth its own paragraph because the press cycle has misrepresented it twice over. Inflection AI, founded by Mustafa Suleyman (a DeepMind cofounder), Karén Simonyan, and Reid Hoffman in 2022, built Pi as a "personal AI" specifically tuned for emotional support and conversational warmth, distinct from the assistant framing of ChatGPT or Claude. By early 2024 it had measurable traction with non-technical users and was repeatedly praised, anecdotally, in older-user contexts for not being overwhelming.

Then in March 2024, in a deal structured as a licensing agreement rather than an acquisition but widely described as an acquihire, Microsoft hired Suleyman, Simonyan, and most of Inflection's 70-person team to run its new consumer AI unit (Microsoft AI), and paid roughly $650 million, mostly for non-exclusive licensing of Inflection's models, with the remainder covering a waiver of legal claims over the hiring. Inflection itself pivoted to a B2B enterprise licensing business under new leadership. Pi the product still exists at pi.ai, but it is no longer the funded centerpiece of an independent research effort; the institutional energy moved into Microsoft Copilot.

For elderly-loneliness deployments specifically, this means three things. First, the founder-driven product vision that made Pi notably gentle and patient may not survive the integration cycle into Copilot. Second, Microsoft has not (publicly, as of mid-2026) released any rigorous study of Pi or Copilot outcomes in older-adult populations, despite plausibly having more deployment data than anyone else in the category. Third, the most common quote from caregivers who put Pi in front of an elderly relative in 2024 — "it doesn't rush her" — applies to a product that may be functionally a different product in 12 months. Building a care plan around Pi specifically in 2026 is building on shifting ground. Pointing at the Microsoft asset is more defensible than pointing at the standalone Pi brand.

The legitimate concerns, named

Cataloged honestly and without softening, here is what the evidence base actually warrants worrying about:

Parasocial replacement of human contact, observed measurably in the 2025 Stanford study. The risk is highest in users who already lack a baseline social network, which is exactly the elderly-loneliness deployment target. Mitigations exist (proactive scaffolding around human contact, scheduled family-call prompts in the device itself, social-network density measurement as part of any deployment KPI), but they are vendor-dependent and most consumer products do not implement them.

Data privacy on health-adjacent conversation. Companion chatbots elicit disclosure about medications, symptoms, mental-health concerns, financial circumstances, and family dynamics — all of which are health-adjacent or directly health-related but generally not protected under HIPAA because the companion vendor is not a covered entity. The 2024 AARP technology survey found data privacy is the single largest barrier to elder tech adoption (named by roughly one-third of respondents); the concern is structurally correct, not paranoid.

Vendor-side personality changes. The Replika 2023 episode is the canonical example. Any LLM-driven companion runs on a model that can be retrained, refined, or replaced; behavior is not stable across product versions; users who have formed attachment can experience genuine grief when the vendor ships a model update. Dedicated elder-care products move slower on this and are better for it.

Cognitive-engagement claims unsupported by the data. The trade press routinely conflates "interacts with the device often" with "cognitive engagement, slower cognitive decline, dementia prevention." None of those claims have RCT support in this category. The Broadbent–Doraiswamy paper specifically measured social connection, not cognitive trajectory. Anyone selling AI companions on a dementia-prevention pitch is ahead of the evidence.

Over-attribution of the deployment effect to the AI. The NYSOFA program includes case manager outreach, family check-ins on device data, and the act of being part of a state program that screens and supports the participant. Some of the loneliness reduction the headline number captures is plausibly attributable to the wraparound services, not the AI companion itself. None of the published deployment data isolate the AI effect from the program effect.
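Separating the device from the program is mostly a design problem. A minimal sketch, assuming a hypothetical two-arm deployment in which both arms receive the same wraparound services (case-manager outreach, family check-ins) and only one arm also receives the companion device: the services-only arm's change estimates the program effect, and the difference between the arms' mean changes estimates what the device adds. All scores are invented for illustration, and the helper name is hypothetical.

```python
from statistics import mean


def mean_change(before, after):
    """Average within-person change in score (negative = less lonely)."""
    return mean(a - b for b, a in zip(before, after))


# Hypothetical UCLA v3 totals; both arms get identical wraparound services.
device_plus_services = {"before": [56, 60, 52, 58], "after": [47, 50, 45, 50]}
services_only = {"before": [57, 59, 53, 57], "after": [52, 54, 49, 52]}

device_arm = mean_change(**device_plus_services)
program_only = mean_change(**services_only)

print(f"change with device + services: {device_arm:+.2f}")    # -8.50
print(f"change with services alone:    {program_only:+.2f}")  # -4.75
print(f"estimated device contribution: {device_arm - program_only:+.2f}")  # -3.75
```

A pre-post design with no services-only arm would report the full device-arm change. In this toy example, more than half of that change comes from the program rather than the device.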

What this means for a real deployment in 2026

If you are a family caregiver, an agency director, a geriatrician, or an operator thinking about putting an AI companion in front of a lonely older adult in the next quarter, the evidence base supports a narrow and specific recommendation set.

For an elderly user living alone with low-to-moderate baseline loneliness and intact cognition, a dedicated elder-care device with proactive scaffolding (ElliQ is the deployed example with the most evidence) is the supported choice. The expected effect size is real but moderate — a 20-30 percentage-point reduction in self-reported loneliness on validated instruments is a reasonable expectation, not a 95% solution. The deployment will work better with wraparound human contact than without.

For an elderly user with high baseline loneliness and an already-thin social network, the consumer chatbot products (Replika, Character.ai) carry the most documented downside risk. The Stanford 2025 finding that heavy users get worse, not better, applies most directly to this population. If a consumer product is the only available option, supervised use with family-set time limits and explicit complement-not-substitute framing is the harm-reduction stance.

For an elderly user with cognitive impairment, the entire category is under-studied and most products explicitly screen out this population. The Paro-style robotic-pet evidence base from the 2010s is the closest analog and shows some affect benefit in dementia care — but transferring that finding to LLM-driven chatbots is unsupported.

For the operator running a fleet deployment at agency scale, measurement matters more than product selection. Use a validated instrument (UCLA Loneliness Scale v3 is the most common; the CoBot-I-7 from the Broadbent–Doraiswamy work is also defensible). Run a control arm where ethically feasible. Track social-network density alongside loneliness — the heavy-user-substitution risk is invisible if you only measure the loneliness number. Disclose vendor relationships in any reported result.
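Here is a minimal sketch of what "track social-network density alongside loneliness" could look like in an operator's pipeline. It assumes the deployment records, at baseline and follow-up, a loneliness score, a weekly count of calls or visits with other people, and average daily minutes with the companion. The record fields, the 25% contact-drop threshold, and the 60-minute heavy-use threshold are illustrative assumptions, not values from the Stanford work or any vendor.

```python
from dataclasses import dataclass


@dataclass
class ParticipantWave:
    """Hypothetical baseline and follow-up record for one participant."""
    participant_id: str
    loneliness_before: float    # e.g. a UCLA v3 total (higher = lonelier)
    loneliness_after: float
    contacts_before: int        # weekly calls or visits with other people
    contacts_after: int
    ai_minutes_per_day: float   # average daily time with the companion


def substitution_flags(records, contact_drop=0.25, heavy_use_minutes=60.0):
    """Flag heavy users whose human contact fell by at least `contact_drop`.

    Returns (participant_id, loneliness_improved) pairs, so reviewers can see
    flagged participants who would look like successes on loneliness alone.
    """
    flagged = []
    for r in records:
        if r.contacts_before == 0:
            continue  # no baseline contact to lose; review these cases by hand
        drop = 1 - r.contacts_after / r.contacts_before
        if drop >= contact_drop and r.ai_minutes_per_day >= heavy_use_minutes:
            flagged.append((r.participant_id, r.loneliness_after < r.loneliness_before))
    return flagged


records = [
    ParticipantWave("a", 52, 44, 8, 8, 25.0),   # score improves, contacts stable
    ParticipantWave("b", 58, 50, 8, 3, 140.0),  # score improves, contacts collapse
    ParticipantWave("c", 60, 61, 4, 4, 90.0),   # heavy use, contacts stable
]
print(substitution_flags(records))  # prints [('b', True)]
```

Participant "b" is a loneliness success on the headline metric and the clearest substitution case in the set. Only the contact count reveals it.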

The four product shapes are converging in the press but they do not converge in the evidence. ElliQ-style dedicated deployments have a small-but-real RCT base. Pi-class general-purpose models have one strong experimental paper and an ownership transition that complicates the next two years of research. Replika-class consumer companions have meaningfully cautionary heavy-user data and a vendor-update risk profile that the elder-care deployments do not share. Paro-class legacy social robots have the deepest critical literature anchored in Turkle.

The honest version of "AI for elderly loneliness" in 2026 is not "this works." It is "in a specific deployment shape, with specific scaffolding, measured on specific instruments, the effect is real and moderate. Outside that shape, the risks are real and growing." That is a less viral framing than the 95% number. It is closer to the evidence.

"The technology is real. The need is real. The risk is real. None of the three justify the other two if you flatten them into a single headline number. Read the methods."


Cite this article

Ascero AI. “AI Companions and Elder Loneliness — What the Research Actually Says.” May 28, 2026. https://asceroai.com/news/ai-companions-elder-loneliness-2026

Free to reference with attribution and a link back to this page.
