Artificial Intelligence

A Cautionary AI Tale: Why IBM’s Dazzling Watson Supercomputer Made a Lousy Tutor

With a new race underway to create the next teaching chatbot, IBM’s abandoned 5-year, $100M ed push offers lessons about AI’s promise and its limits.

By Greg Toppo | April 9, 2024

In the annals of artificial intelligence, Feb. 16, 2011, was a watershed moment.

That day, IBM’s Watson supercomputer finished off a three-game shellacking of Jeopardy! champions Ken Jennings and Brad Rutter. Trailing by over $30,000, Jennings, now the show’s host, wrote out his Final Jeopardy answer in mock resignation: “I, for one, welcome our computer overlords.”

A lark to some, the experience galvanized Satya Nitta, a longtime computer researcher at IBM’s Watson Research Center in Yorktown Heights, New York. Tasked with figuring out how to apply the supercomputer’s powers to education, he soon envisioned tackling ed tech’s most sought-after challenge: the world’s first tutoring system driven by artificial intelligence. It would offer truly personalized instruction to any child with a laptop, no human required.

“I felt that they’re ready to do something very grand in the space,” he said in an interview.

Nitta persuaded his bosses to throw more than $100 million at the effort, bringing together 130 technologists, including 30 to 40 Ph.D.s, across research labs on four continents. 

But by 2017, the tutoring moonshot was essentially dead, and Nitta had concluded that effective, long-term, one-on-one tutoring is “a terrible use of AI, and that remains today.”

For all its jaw-dropping power, Watson the computer overlord was a weak teacher. It couldn’t engage or motivate kids, inspire them to reach new heights or even keep them focused on the material, all qualities of the best mentors.

It’s a finding with some resonance to our current moment of AI-inspired doomscrolling about the future of humanity in a world of ascendant machines. “There are some things AI is actually very good for,” Nitta said, “but it’s not great as a replacement for humans.”

His five-year journey to what was essentially a dead end could also prove instructive as ChatGPT and other programs like it fuel a renewed, multimillion-dollar experiment to, in essence, prove him wrong.

Some of the leading lights of ed tech are trying to pick up where Watson left off, offering AI tools that promise to help teach students. Sal Khan, founder of Khan Academy, last year said AI has the potential to bring “probably the biggest positive transformation” that education has ever seen. He wants to give “every student on the planet an artificially intelligent but amazing personal tutor.”

A 25-year journey

To be sure, research on high-dosage, one-on-one, in-person tutoring is clear: It’s among the most effective interventions available, offering significant improvement in students’ academic performance, particularly in subjects like math, reading and writing.

But traditional tutoring is also “breathtakingly expensive and hard to scale,” said Paige Johnson, a vice president of education at Microsoft. One school district in West Texas, for example, recently spent federal pandemic relief funds to tutor 6,000 students. The expense, Johnson said, puts it out of reach for most parents and school districts.

For IBM, the opportunity to rebalance the equation in kids’ favor was hard to resist.

The Watson lab is legendary in the computer science field, with six Turing Award winners among its ranks. It’s home to countless innovations, such as barcodes and the magnetic stripes on credit cards. It’s also where, in 1997, Deep Blue beat Garry Kasparov, essentially inventing the notion that AI could “think” like a person.

Chess enthusiasts watch World Chess champion Garry Kasparov on a television monitor as he holds his head in his hands at the start of the sixth and final match May 11, 1997 against IBM’s Deep Blue computer in New York. Kasparov lost this match in just 19 moves. (Stan Honda/Getty)

The heady atmosphere, Nitta recalled, inspired “a very deep responsibility to do something significant and not something trivial.”

Within a few years of Watson’s victory, Nitta, who had arrived in 2000 as a chip technologist, rose to become IBM Research’s global head of AI solutions for learning. For the Watson project, he said, “I was just given a very open-ended responsibility: Take Watson and do something with it in education.”

Nitta spent a year simply reading up on how learning works. He studied cognitive science, neuroscience and the decades-long history of “intelligent tutoring systems” in academia. Foremost in his reading list was the research of Stanford neuroscientist Vinod Menon, who’d put elementary schoolers through a 12-week math tutoring session, collecting before-and-after scans of their brains using an MRI. Tutoring, he found, produced nothing less than an increase in neural connectivity.

Nitta returned to his bosses with the idea of an AI-powered cognitive tutor. “There’s something I can do here that’s very compelling,” he recalled saying, “that can broadly transform learning itself. But it’s a 25-year journey. It’s not a two-, three-, four-year journey.”

IBM drafted two of the highest-profile partners possible in education: the children’s media powerhouse Sesame Workshop and Pearson, the international publisher.

One product envisioned was a voice-activated Elmo doll that would serve as a kind of digital tutoring companion, interacting fully with children. Through brief conversations, it would assess their skills and provide spoken responses to help kids advance.

One proposed application of IBM’s planned Watson tutoring app was to create a voice-activated Elmo doll that would be an interactive digital companion. (Getty)

Meanwhile, Pearson promised that it could soon allow college students to “dialogue with Watson in real time.”

Nitta’s team began designing lessons and putting them in front of students, both in classrooms and in the lab. In order to nurture a back-and-forth between student and machine, they didn’t simply present kids with multiple-choice questions, instead asking them to write responses in their own words.

It didn’t go well.

Some students engaged with the chatbot, Nitta said. “Other students were just saying, ‘IDK’ [I don’t know]. So they simply weren’t responding.” Even those who did began giving shorter and shorter answers.

Nitta and his team concluded that a cold reality lay at the heart of the problem: For all its power, Watson was not very engaging. Perhaps as a result, it also showed “little to no discernible impact” on learning. It wasn’t just dull; it was ineffective.

Satya Nitta (left) and part of his team at IBM’s Watson Research Center, which spent five years trying to create an AI-powered interactive tutor using the Watson supercomputer.

“Human conversation is very rich,” he said. “In the back and forth between two people, I’m watching the evolution of your own worldview.” The tutor influences the student, and vice versa. “There’s this very shared understanding of the evolution of discourse that’s very profound, actually. I just don’t know how you can do that with a soulless bot. And I’m a guy who works in AI.”

When students’ usage time dropped, “we had to be very honest about that,” Nitta said. “And so we basically started saying, ‘OK, I don’t think this is actually correct. I don’t think this idea, that an intelligent tutoring system will tutor all kids, everywhere, all the time, is correct.’”

‘We missed something important’

IBM soon switched gears, debuting another crowd-pleasing Watson variation, this time a touching throwback: It engaged in debate. In a televised demonstration in 2019, it went up against debate champ Harish Natarajan on the topic “Should we subsidize preschools?” Among its arguments for funding, the supercomputer offered, without a whiff of irony, that good preschools can prevent “future crime.” Its current iteration focuses on helping businesses build AI applications like “intelligent customer care.”

Nitta left IBM, eventually taking several colleagues with him to create a startup. It uses voice-activated AI to safely help teachers do workaday tasks such as updating digital gradebooks, opening PowerPoint presentations and emailing students and parents.

Thirteen years after Watson’s stratospheric Jeopardy! victory and more than one year into the Age of ChatGPT, Nitta’s expectations about AI couldn’t be more down-to-earth: His AI powers what’s basically “a carefully designed assistant” to fit into the flow of a teacher’s day.

To be sure, AI can do sophisticated things such as generating quizzes from a class reading and editing student writing. But the idea that a machine or a chatbot can actually teach as a human can, he said, represents “a profound misunderstanding of what AI is actually capable of.”

Nitta, who still holds deep respect for the Watson lab, admits, “We missed something important. At the heart of education, at the heart of any learning, is engagement. And that’s kind of the Holy Grail.”

These notions aren’t news to those who do tutoring for a living. Varsity, which offers live and online tutoring in 500 school districts, relies on AI to power a lesson plan creator that helps personalize instruction. But when it comes to the actual tutoring, humans deliver it, said Anthony Salcito, chief institution officer at Nerdy, which operates Varsity.

“The AI isn’t far enough along yet to do things like facial recognition and understanding of student focus,” said Salcito, who spent 15 years at Microsoft, most of them as vice president of worldwide education. “One of the things that we hear from teachers is that the students love their tutors. I’m not sure we’re at a point where students are going to love an AI agent.”

Students love their tutors. I'm not sure we're at a point where students are going to love an AI agent.

Anthony Salcito, Nerdy

The No. 1 factor in a student’s tutoring success is showing up consistently, research suggests. As smart and efficient as an AI chatbot might be, it’s an open question whether most students, especially struggling ones, would show up for an inanimate agent or develop a sense of respect for its time.

When Salcito thinks about what AI bots now do in education, he’s not impressed. Most, he said, “aren’t going far enough to really rethink how learning can take place.” They end up simply as fast, spiffed-up search engines.

In most cases, he said, the power of one-on-one, in-person tutoring emerges as students begin to develop more honesty about their abilities, advocate for themselves and, in a word, demand more of school. “In the classroom, a student may say they understand a problem. But they come clean to the tutor, where they expose, ‘Hey, I need help.’”

Cognitive science suggests that students who aren’t motivated or who are uncertain about a topic need a particular kind of help: a focused, caring human, watching carefully, asking tons of questions and reading students’ cues.

Jeremy Roschelle, a learning scientist and an executive director of Digital Promise, a federally funded research center, said usage with most ed tech products tends to drop off. “Kids get a little bored with it. It’s not unique to tutors. There’s a newness factor for students. They want the next new thing.”

Even now, Nitta points out, research shows that big commercial AI applications don’t seem to hold users’ attention as well as top entertainment and social media sites like YouTube, Instagram and TikTok. One analysis dubbed the user engagement of sites like ChatGPT “lackluster,” finding that the proportion of monthly active users who engage with them in a single day was only about 14%, suggesting that such sites aren’t very “sticky” for most users.

For social media sites, by contrast, it’s between 60% and 65%.

One notable AI exception: an app that allows users to create companions of their own among figures from history and fiction and chat with the likes of Socrates and Bart Simpson. It has a stickiness score of 41%.

As startups offer “your child’s superhuman tutor,” starting at $29 per month, and Khan Academy publicly tests its popular Khanmigo AI tool, Nitta maintains that there’s little evidence from learning science that, absent a strong outside motivation, people will spend enough time with a chatbot to master a topic.

“We are a very deeply social species,” said Nitta, “and we learn from each other.”

IBM declined to comment on its work in AI and education, as did Sesame Workshop. A Pearson spokesman said that since last fall it has been beta-testing AI study tools keyed to its e-textbooks, among other efforts, with plans this spring to expand the number of titles covered.

Getting ‘unstuck’

IBM’s experiences notwithstanding, the search for an AI tutor has continued apace, this time with more players than just a legacy research lab in suburban New York. Using the latest affordances of so-called large language models, or LLMs, technologists at Khan Academy believe they are finally making the first halting steps in the direction of an effective AI tutor.

Kristen DiCerbo remembers the moment her mind began to change about AI. 

It was September 2022, and she’d only been at Khan Academy for a year-and-a-half when she and founder Khan got access to a beta version of ChatGPT. OpenAI, ChatGPT’s creator, had asked Microsoft co-founder Bill Gates for more funding, but he told them not to come back until the chatbot could pass an Advanced Placement biology exam.

Khan Academy founder Sal Khan has said AI has the potential to bring “probably the biggest positive transformation” that education has ever seen. He wants to give every student an “artificially intelligent but amazing personal tutor.” (Getty)

So OpenAI queried Khan for sample AP biology questions. He and DiCerbo said they’d help in exchange for a peek at the bot, and a chance to work with the startup. They were among the first people outside of OpenAI to get their hands on GPT-4, the LLM that powers the upgraded version of ChatGPT. They were able to test out the AI and, in the process, become amateur AI prompt engineers before anyone had even heard of the term.

Like many users typing in queries in those first heady days, the pair initially just marveled at the sophistication of the tool and its ability to return what felt, for all the world, like personalized answers. With DiCerbo working from her home in Phoenix and Khan from the nonprofit’s Silicon Valley office, they traded messages via Slack.

Kristen DiCerbo introduces users to Khanmigo in a Khan Academy promotional video. (YouTube)

“We spent a couple of days just going back and forth, Sal and I, going, ‘Oh my gosh, look what we did! Oh my gosh, look what it’s saying. This is crazy!’” she told an audience during a recent talk at the University of Notre Dame.

She recounted asking the AI to help write a mystery story in which shoes go missing in an apartment complex. In the back of her mind, DiCerbo said, she planned to make a dog the shoe thief, but didn’t reveal that to ChatGPT. “I started writing it, and it did the reveal,” she recalled. “It knew that I was thinking it was going to be a dog that did this, from just the little clues I was planting along the way.”

More tellingly, it seemed to do something Watson never could: have engaging conversations with students.

DiCerbo recounted talking to a high school student they were working with who told them about an interaction she’d had with ChatGPT around The Great Gatsby. She asked it about F. Scott Fitzgerald’s famous green light, which scholars have long interpreted as symbolizing Jay Gatsby’s out-of-reach hopes and dreams.

“It comes back to her and asks, ‘Do you have hopes and dreams just out of reach?’” DiCerbo recalled. “It had this whole conversation” with the student.

The pair soon tore up their 2023 plans for Khan Academy. 

It was a stunning turn of events for DiCerbo, a Ph.D. educational psychologist and former senior Pearson research scientist who had spent more than a year on the failed Watson project. In 2016, Pearson promised that Watson would soon be able to chat with college students in real time to guide them in their studies. But it was DiCerbo’s teammates, about 20 colleagues, who had to actually train the supercomputer on thousands of student-generated answers to questions from textbooks, and tempt instructors to rate those answers.

Like Nitta, DiCerbo recalled that at first things went well. They found a natural science textbook with a large user base and set Watson to work. “You would ask it a couple of questions and it would seem like it was doing what we wanted to,” answering student questions via text.

But invariably if a student’s question strayed from what the computer expected, she said, “it wouldn’t know how to answer that. It had no ability to freeform-answer questions, or it would do so in ways that didn’t make any sense.”

After more than a year of labor, she realized, “I had never seen the ‘OK, this is going to work’ version” of the hoped-for tutor. “I was always at the ‘OK, I hope the next version’s better.’”

But when she got a taste of ChatGPT, DiCerbo immediately saw that, even in beta form, the new bot was different. Using software that quickly predicted the most likely next word in any conversation, ChatGPT was able to engage with its human counterpart in what seemed like a personal way.

Since its debut in March 2023, Khanmigo has turned heads with what many users say is a helpful, easy-to-use, natural language interface, though a few users have pointed out shortcomings.

Surprisingly, DiCerbo doesn’t consider the popular chatbot a full-time tutor. As sophisticated as AI might now be in motivating students to, for instance, try again when they make a mistake, “It’s not a human,” she said. “It’s also not their friend.”

Khan Academy’s research shows their tool is effective with as little as 30 minutes of practice and feedback per week. But even as many startups promise the equivalent of a one-on-one human tutor, DiCerbo cautions that 30 minutes is not going to produce miracles. Khanmigo, she said, “is not a solution that’s going to replace a human in your life. It’s a tool in your toolbox that can help you get unstuck.”

‘A couple of million years of human evolution’

For his part, Nitta says that for all the progress in AI, he’s not persuaded that we’re any closer to a real-live tutor that would offer long-term help to most students. If anything, Khanmigo and probabilistic tools like it may prove to be effective “homework helpers.” But that’s where he draws the line.

“I have no problem calling it that, but don’t call it a tutor,” he said. “You’re trying to endow it with human-like capabilities when there are none.”

Unlike humans, who will typically do their best to respond genuinely to a question, the way AI bots work (by digesting pre-existing texts and other information to come up with responses that seem human) is akin to a “statistical illusion,” writes one Harvard Business School professor. “They’ve just been well-trained by humans to respond to humans.”

Researcher Sidney Pressey’s 1928 Testing Machine, one of a series of so-called “teaching machines” that he and others believed would advance education through automation.

Largely because of this, Nitta said, there’s little evidence that a chatbot will continuously engage people as a good human tutor would.

What would change his mind? Several years of research by an independent third party showing that tools like Khanmigo actually make a difference on a large scale, something that doesn’t exist yet.

DiCerbo also maintains her hard-won skepticism. She knows all about the halting early decades of computers a century ago, when experimental, punch-card operated “teaching machines” guided students through rudimentary multiple-choice lessons, often with simple rewards at the end.

In her talks, DiCerbo urges caution about AI revolutionizing education. As much as anyone, she is aware of the expensive failures that have come before. 

Two women stand beside open drawers of computer punch card filing cabinets. (American Stock/Getty Images)

In her recent talk at Notre Dame, she did her best to manage expectations of the new AI, which seems so limitless. In one-to-one teaching, she said, there’s an element of humanity “that we have not been able to, and probably should not try to, replicate in artificial intelligence.” In that respect, she’s in agreement with Nitta: Human relationships are key to learning. In the talk, she noted that students who have a person in school who cares about their learning have higher graduation rates.

But still.

ChatGPT now reportedly has 100 million weekly users. That record-fast uptake makes her think “there’s something interesting and sticky about this for people that we haven’t seen in other places.”

Being able to engineer prompts in plain English opens the door for more people, not just engineers, to create tools quickly and iterate on what works, she said. That democratization could mean the difference between another failed promise and agile tools that actually deliver at least a version of Watson’s promise.

An early prototype of IBM’s Watson supercomputer in Yorktown Heights, New York. In 2011, the system was the size of a master bedroom. (Wikimedia Commons)

Seven years after he left IBM to start his new endeavor, Nitta is philosophical about the effort. He takes virtually full responsibility for the failure of the Watson moonshot. In retrospect, even his 25-year timeline for success may have been naive.

“What I didn’t appreciate is, I actually was stepping into a couple of million years of human evolution,” he said. “That’s the thing I didn’t appreciate at the time, which I do in the fullness of time: Mistakes happen at various levels, but this was an important one.”
