A blog post from Wharton Interactive’s Faculty Director Ethan Mollick gave me a heads-up about an AI study with potential implications for the future of work. Mollick wrote that he and several other academics from Harvard and MIT had spent the past several months working with the Boston Consulting Group (BCG).
The research project measured the impact of LLM tools, specifically ChatGPT4, on the productivity of 758 individual contributor-level consultants at BCG. The results of the study are presented in a working paper titled Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality.
The researchers designed two pre-registered randomized experiments using BCG consultants to “assess the impact of AI on high human capital professionals.” Two distinct tests were conducted: one inside the frontier of the AI tool’s capabilities and the other outside it.
Consultant participants initially conducted a task without the aid of AI to establish a baseline for performance. Following the completion of that task, participants were randomly assigned to one of three groups. Group 1 was a control condition without AI support. Group 2 had access to ChatGPT4. Group 3 had access to ChatGPT4 and received a prompt engineering overview.
All tasks assigned to participants came with a specific time allocation. Participants with access to AI utilized a BCG platform that mirrored ChatGPT and enabled the collection of all participant prompts and ChatGPT’s responses. As previously mentioned, the first task was within the tech frontier of ChatGPT4, and the second task was designed so that ChatGPT would make an error.
For tasks inside the boundaries of ChatGPT4’s capabilities, AI increased performance and quality across all 18 assigned tasks. Consultants using AI completed more than 12% more tasks, worked more than 25% faster, and produced work rated more than 40% higher in quality by human graders. While all participants benefited from the use of AI, the bottom-half performers benefited the most.
Tasks outside the frontier of ChatGPT4 saw performance decreases for the groups that used AI. The researchers noted that these findings “highlight the importance of validating and interrogating AI and of continuing to exert cognitive effort and experts’ judgement when working with AI.” Expertise is required to navigate the frontier.
The researchers note that expertise can be built “through formal education, on-the-job training, and employee-driven upskilling.” They argue that these findings should end the debate over whether knowledge workers should use AI. Organizations should instead focus on knowledge workflows and evaluate the value of different combinations of AI and humans.
The researchers also noted a few potential red flags from their research. These included the potential for “diminished diversity of ideas stemming from AI usage.” In addition, the optimal AI strategy can vary with an organization’s production function, depending on whether it prioritizes high output or “maximum exploration and innovation.”
I found the research designed and conducted by the academic team and BCG to be insightful and thought-provoking. The 58-page paper provides far more detail on experimental design and analysis, along with suggestions for practical applications and future research.
For anyone with an interest in the future of work or the future of education, I recommend reading the full paper. BCG followed up the publication of the paper with a September 26 Weekly Brief blog post.
The BCG blog post referenced the paper and its key findings. In addition, the firm noted that “the study is both promising and sobering in its implications.”
Four points were noted for how companies should use AI tools like ChatGPT4. They should: (1) Build a hiring, training, and reskilling plan; (2) Use GenAI technologies selectively and check results; (3) Protect diversity of thought; and (4) Build a data advantage.
As I read the paper and later the BCG blog, I couldn’t help thinking that this is another example where a forward-thinking organization like BCG is far ahead of most companies as well as most educators. How long will it be before we see non-knowledge worker companies or universities and K-12 school systems conduct experiments like this? If the recent WCET survey of higher ed administrators is any indicator, it will be a while.
WCET has an AI resources page. Only one item has been added since the survey and my July blog post: an August 31 blog post from a lecturer at the University of Mississippi who trained two dozen UM faculty members this summer on how to use generative AI. At the bottom of that post, I found a link to a five-part instructional video series from Wharton’s Ethan Mollick on how teachers can use generative AI in the classroom. Neither of these projects demonstrates the scale and rigor of the BCG experiment.
The researchers write that these findings should end the debate over whether knowledge workers should use AI. I agree. I also wonder how many of our country’s school districts continue to debate that issue. If university faculty and staff are considered knowledge workers, what efficiency gains might result if all of them received prompt engineering training?
Access to AI, knowledge of its capabilities, and the skill to utilize LLM tools like ChatGPT4 make a substantial difference in performance, even for very smart consultants like those who work at BCG. Those without that access, knowledge, and skill will be expendable. The gap between the “Haves” and “Have Nots” will continue to widen.
Policymakers should not continue to ignore technology innovations like LLMs or attempt to block their pace. They should encourage additional research like this and incorporate the findings into incentives for education, upskilling, and reskilling. While they’re at it, they need to find ways to improve baseline education, K-20. The rest of the developed world is not sitting still.