Data Analytics Applications for Student Retention – Traditional Techniques versus Machine Learning

The launch of the James Webb Space Telescope triggered more than a few memories for me. One of those memories was the quote often attributed to astrophysicist and TV host Carl Sagan, who supposedly described the number of stars in the universe as “billions and billions.”

Another memory was of the time Dr. Phil Ice and I presented a paper about online student retention at the biennial conference of the Oxford Internet Institute at Oxford University. After we presented our findings, there was a short break during which conference attendees could ask questions. A gentleman introduced himself as an Oxford astrophysicist and wanted to know why we had selected neural network analysis as the methodology for analyzing 3,000,000 online student records. He explained that astrophysicists used a different technique, given that there are billions of stars and planets in the universe. I deferred to Dr. Ice, whose knowledge of machine learning was much more advanced than mine.

It’s a given that there are far too many stars and planets for astronomers and astrophysicists to examine through the James Webb Space Telescope. Most of them likely use machine learning techniques to determine which objects they want to analyze if they are awarded time on the instrument. To update my thinking, as well as to provide an interesting article for my readers, I decided to conduct an “interview” with my friend and former colleague, Dr. Ice.

Wally: Phil, the launch of the James Webb Space Telescope triggered a memory of our astrophysicist acquaintance at Oxford and his query about why we selected neural networks as the tool for our predictive analytics model. How should we think about using machine learning tools versus some of our tried-and-true statistical tools like multiple regression?

Phil: I would like to begin by reflecting on where I am today, where the industry is, and how we got to this point. Some 15 years ago, I was focused on research around the Community of Inquiry (CoI) Framework and how it could help inform best practices in online course development. Naturally, online course development was constantly changing as emerging technologies were integrated by early innovators.

After presenting some findings at the Sloan-C (now the Online Learning Consortium) conference in 2006, I was approached by one of the most interesting and brilliant characters I have ever had the pleasure of knowing, Dr. Frank McCluskey (Dr. Frank to those who know him), Provost at the American Public University System (APUS). After concurring with the importance of using analytics to measure engagement, he asked, “Why don’t you come work for me?” Shortly thereafter, I joined the APUS family.

When I met you, Wally, I realized that you were one of the few individuals I knew who could weave together the complexities of business and higher education. Your professional interest in retention and student success led me to build a data analytics team at APUS that could work with its faculty to understand the characteristics and conditions that made the difference between a student who graduated and one who did not.

Drawing on the mountains of student persistence and retention research generated by Vincent Tinto, Alexander Astin, and others, we began systematically exploring the topic using traditional analytical techniques such as multivariate regression, decision trees, and factor analysis. The results were compelling, the findings were quickly communicated across various parts of the university, and student persistence began to improve at APUS.

Two years later, after I concluded a presentation on our work at the first Sloan-C Emerging Technology Conference, a gentleman approached me and introduced himself as Josh Jarrett, Deputy Director of Postsecondary Success at the Bill and Melinda Gates Foundation. He complimented me on the work we had done at APUS and, after exchanging cards, asked if we could talk further about retention. Thus began approximately 18 months of calls and trips to Seattle to discuss online student retention and student success with other like-minded individuals from across the nation.

The time invested paid off in 2011, when the Gates Foundation funded a proposal to examine online student retention across six institutions and systems, a project on which I had the honor of serving as Principal Investigator. Much of the initial work consisted of the less-than-glamorous task of developing common data definitions, but six months of painstaking effort led to the analysis phase, where we employed the same conventional techniques used in the foundational work at APUS and published the findings in a paper.

After that project, the Gates Foundation funded further analysis of student persistence that included additional colleges and universities. The next project increased our dataset from 660,000 student records to more than 3,000,000. For the data analysis conducted in that study, we decided to use neural network analysis.

The results of the PAR (Predictive Analytics Reporting) project are well documented and still live on in retention and progression work by numerous institutions and companies, such as Civitas, Hobsons, and Analytikus. However, the techniques used in analysis have become progressively more complex as we move from retrospective analysis to predictive and prescriptive analytics. Thus, we come full circle back to the question of whether there are advantages and disadvantages to using traditional techniques, such as regression, as opposed to more sophisticated models, such as neural networks.

Classic research techniques, such as regression, assume that criterion and predictor variables have a linear association. In the online enrollment pathways utilized by non-traditional students, it is quite clear that the series of events influencing each student’s enrollment decisions is far from linear. Most machine learning techniques do not assume linearity; instead, they accommodate a certain messiness in the interactions and feedback loops being examined.
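To make that distinction concrete, here is a minimal, purely illustrative Python sketch (synthetic data, not code from any of the projects discussed). A hypothetical “enrollment gap” variable drives drop-out risk in a U-shaped, non-linear way, so a logistic regression scores near chance while a small neural network recovers the pattern:

```python
# Hypothetical sketch: a non-linear drop-out pattern that a linear model misses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 5000
gap_months = rng.uniform(0, 12, size=n)              # months between course enrollments
risk = 0.8 * np.abs(gap_months - 6) / 6              # U-shaped: risk lowest for mid-length gaps
retained = (rng.uniform(size=n) > risk).astype(int)  # 1 = retained, 0 = dropped

X = gap_months.reshape(-1, 1)
linear_model = LogisticRegression()
neural_net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)

print("logistic AUC:", cross_val_score(linear_model, X, retained, cv=5, scoring="roc_auc").mean())
print("MLP AUC:     ", cross_val_score(neural_net, X, retained, cv=5, scoring="roc_auc").mean())
```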

Based on these underlying premises, it would be natural to assume that neural networks would be an ideal tool for looking at messy sets of data that are the product of incredibly complex and messy interactions. But not so fast. While neural nets are often extremely good at predicting what will happen, they are often so convoluted that one can’t reasonably interpret how a conclusion was reached. In many instances it becomes necessary to break out the old, familiar tools to examine the data a second time, first seeing whether the findings are similar and, if they are, using one to inform and validate the other. And if this sounds like it is equal parts art and science, that’s because it is; like the underlying data, the process is messy.
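Here is one hedged sketch of what that second look can involve, again using synthetic data and hypothetical feature counts: fit a neural network and a logistic regression on the same records, compare their held-out performance, and, when the two roughly agree, lean on the regression’s interpretable coefficients to explain what the black-box model appears to be picking up on:

```python
# Hypothetical cross-check: a black-box model validated against a familiar one.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=4000, n_features=8, n_informative=5,
                           random_state=7)          # stand-in for student records
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=7).fit(X_tr, y_tr)
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("MLP AUC:     ", roc_auc_score(y_te, mlp.predict_proba(X_te)[:, 1]))
print("logistic AUC:", roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]))
# If the two scores are close, the interpretable coefficients below offer a
# reasonable narrative for what the less transparent model is also learning.
print("logistic coefficients:", logit.coef_.round(2))
```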

To come back to your original question: neither traditional techniques nor contemporary machine learning is always the best solution. In fact, it is increasingly the case that AIs, assisted by a human, provide the best analysis. In some circles this is referred to as a human-assisted Turing machine. Though I am leaving out numerous details related to formalization, reducibility, and the formation of equivalency classes, the fact is that the answer to retention and other messy problems will, at least for a while, depend on a skill set whose evolution we are witnessing in real time.

Wally: Thank you for the compliment, Phil, and thank you for the explanation. I’d like to add some commentary as well as ask another question. After working with APUS for approximately eight years, you left for an entrepreneurial opportunity with Analytikus, a company specializing in artificial intelligence (AI) applications.

I can’t recall whether Analytikus had a higher ed retention analysis tool before you joined them, but using the knowledge and experience you gained from your analytics work at APUS, the Gates Foundation-sponsored PAR project, and elsewhere, you and your partners built an analytical tool for retention called Foresight.

My understanding of the Foresight tool is that it collects student data such as courses completed, grades, and demographics from the Student Information System (SIS) and combines that data with current individual course activity data from the Learning Management System (LMS). The analytical tool aggregates the data collected for each student and predicts their likelihood to (A) pass the course or courses they’re currently enrolled in, (B) drop or fail those courses, (C) re-enroll for future courses, and (D) graduate. Is that correct?

Phil: Yes, Wally, you are correct. Analytikus had tools that addressed student retention before I came on board. However, we took what I had learned about forecasting student success in the US market, consolidated the tools they were using, and integrated other techniques that made the package more robust, resulting in Foresight. It’s important to mention that Foresight did everything you mentioned, but it did so on an ongoing basis. Typically, schools would submit data once or twice a week. The new data was tested against the models currently in use and, where necessary, the model was tweaked. Sometimes this was a small change, sometimes it was large. As institutions utilize the data and the dashboards that report it, they change the inputs. Hopefully, timely reporting helps to retain some students who would otherwise have dropped out. However, whenever an action is taken to support a student predicted to be at risk, that action also changes the data, which in turn changes the model. That’s why frequent updates are necessary.
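To illustrate that cadence, here is a hedged, hypothetical sketch of a weekly update cycle (not Foresight’s actual code): score the deployed model on the newest batch of records to monitor drift, fold the batch into the history, and refit:

```python
# Hypothetical weekly refresh loop for a retention model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def weekly_update(current_model, history_X, history_y, batch_X, batch_y):
    """Monitor the deployed model on this week's batch, then refit on everything."""
    # How well did last week's model anticipate the newest student behavior?
    batch_auc = roc_auc_score(batch_y, current_model.predict_proba(batch_X)[:, 1])
    # Fold the new records into the accumulated history and retrain.
    X_all = np.vstack([history_X, batch_X])
    y_all = np.concatenate([history_y, batch_y])
    refreshed = GradientBoostingClassifier(random_state=0).fit(X_all, y_all)
    # Sometimes the resulting tweak is small, sometimes large: interventions
    # taken this week alter next week's data, so the cycle never really ends.
    return refreshed, X_all, y_all, batch_auc
```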

Wally: Going back to your earlier explanation about linear versus non-linear data, it’s clear to me that any college or university with thousands of students whose online or on-ground course attendance habits reflect non-traditional patterns (enrolling and attending class whenever it fits their schedule) will have a lot of non-linear data. I can see how traditional tools like multivariate regression would not necessarily generate the predictive outcomes you’re looking for. Can you explain how the tool provides those predictions? I would also like to understand why you and your partners at Analytikus are comfortable with the predictive data (presented on dashboards) and do not need to manually “tweak it,” as you explained at the end of your answer to my first question.

Phil: This is where the conversation gets a little uncomfortable, because sometimes we can’t determine exactly how neural nets arrive at their answers. Certainly, there are cases where cause and effect are obvious. However, unlike regression analysis, you don’t get back a score that tells you how much variance is accounted for, just a ranking of how important each variable is. This is one of the problems with deep learning. We allow the AI software to train itself on volumes of data, which we supervise to the best of our ability. At the end of the day, we have faith in what it tells us. Usually it is accurate, even though some people don’t want it to be, primarily because additional student engagement requires additional time that they haven’t planned for.
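For readers who want to see what such a ranking can look like in practice, here is a minimal, hypothetical sketch using permutation importance (synthetic data and made-up feature names): predictors are ranked by how much shuffling each one degrades a fitted neural network’s score, with nothing resembling regression’s variance-accounted-for:

```python
# Hypothetical example: ranking variable importance for a black-box model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=3000, n_features=5, n_informative=3, random_state=1)
names = ["gpa", "credits_attempted", "days_since_login", "age", "transfer_credits"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=1).fit(X_tr, y_tr)
result = permutation_importance(net, X_te, y_te, n_repeats=20, random_state=1)

# Larger mean drops in score indicate more influential predictors.
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{names[idx]:>20}: {result.importances_mean[idx]:.3f}")
```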

Why were we comfortable with this approach? We weren’t always completely comfortable with the results we received. That’s why we ran other types of analyses as well, to compare the results and make them as interpretable as possible. To be clear, we weren’t tweaking the algorithms so much as interpreting them and adding clarity where we could. This is done in numerous fields and is referred to as an ensemble approach. In these situations, conducting multiple analyses and seeing what fits best may mean making some compromises. This is where predictive analytics requires you to be a dominantly left-brain person with just enough right brain to interpret the results creatively.
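A hedged sketch of that ensemble idea, using synthetic data and off-the-shelf models rather than anything from the projects above: run several model families over the same records, compare them individually, and then combine them with soft voting so that no single opaque model is the last word:

```python
# Hypothetical ensemble: compare model families, then blend their predictions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

X, y = make_classification(n_samples=3000, n_features=10, n_informative=6, random_state=3)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=3),
    "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=3),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:>8}: {auc:.3f}")

# Soft voting averages the predicted probabilities of the individual models.
ensemble = VotingClassifier(estimators=list(models.items()), voting="soft")
print(f"ensemble: {cross_val_score(ensemble, X, y, cv=5, scoring='roc_auc').mean():.3f}")
```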

Wally: Thank you, Phil. We’ve come a long way in our ability to analyze retention and persistence for students enrolling in online courses. I enjoyed hearing about how the data assembled and analyzed by a tool like Foresight can predict outcomes for individual students much more accurately than the statistical tools we utilized in our graduate school classes and in research projects after grad school. At the same time, your experience “training” models on these datasets indicates to me that these tools are better positioned as smart assistants than as total solutions. An institution looking to improve its student retention needs to train faculty and staff to recognize the warning signs for at-risk students and to assist them effectively.
