How to Generate Actionable Predictions on Student Engagement: Hands-on Tutorial with Python Scikit-Learn

Hands-on Tutorial Session in #LAK19, Tempe, Arizona

5th of March, 2019, 13:30-17:00.

Please use the form at the end of this page to tell me about yourself. This will help me tune the tutorial content. See you in Tempe!


Predictive analytics has gained increasing attention from the research community since the emergence of massive open online courses (MOOCs). Thus far, prediction research has largely relied on data from a single past course to build and test predictive models with post-hoc approaches (e.g., cross-validation) (Gardner & Brooks, 2018). However, these approaches are not valid for real-world use, since they require the true training labels, which cannot be known until the target event (e.g., dropout) takes place (Bote-Lorenzo & Gómez-Sánchez, 2017; Boyer & Veeramachaneni, 2015; Gardner & Brooks, 2018; Whitehill, Mohan, Seaton, Rosen, & Tingley, 2017).
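A minimal sketch of the post-hoc setup described above. The feature matrix `X` (weekly activity counts) and the dropout labels `y` are synthetic and invented purely for illustration; they stand in for a completed course. The point is that cross-validation needs `y` for every student, which is not available while a course is still running.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: 200 students, 3 weekly activity counts each.
X = rng.poisson(lam=5, size=(200, 3)).astype(float)

# Hypothetical true labels (1 = dropped out), loosely tied to low activity.
# In a real course these are only known after the course has ended.
y = (X.sum(axis=1) + rng.normal(0, 2, size=200) < 14).astype(int)

# Post-hoc evaluation: every fold trains and tests on the same finished course.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
mean_auc = scores.mean()
print(f"Mean AUC over 5 folds: {mean_auc:.2f}")
```

The score estimates how well the model fits a finished course, but the fitted model arrives too late to support interventions in that same course.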

To overcome the limitations of post-hoc prediction models, several works have explored transferring across courses, in which a prediction model is built using a completed MOOC and then used for designing interventions in a follow-up MOOC (Boyer & Veeramachaneni, 2015, 2016). Studies investigating the transfer of models across MOOCs reported accurate predictions (Boyer & Veeramachaneni, 2016; Boyer, Gelman, Schreck, & Veeramachaneni, 2015), showing that a model trained on one course can be used to make predictions in another.
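The idea of transferring across courses can be sketched as follows, under stated assumptions: `make_course` is a hypothetical helper that generates synthetic data, and both courses are assumed to expose the same features (three weekly activity counts). The model is fitted on the completed course only and then scores students in the follow-up course.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def make_course(n_students, seed):
    """Hypothetical helper: synthetic weekly activity counts and dropout labels."""
    rng = np.random.default_rng(seed)
    X = rng.poisson(lam=5, size=(n_students, 3)).astype(float)
    y = (X.sum(axis=1) + rng.normal(0, 2, size=n_students) < 14).astype(int)
    return X, y


# Completed course: features and true labels are both available.
X_past, y_past = make_course(300, seed=1)
# Follow-up course: in practice only the features would be known so far.
X_new, y_new = make_course(150, seed=2)

# Train on the past course, predict dropout risk in the new one.
model = LogisticRegression(max_iter=1000).fit(X_past, y_past)
risk = model.predict_proba(X_new)[:, 1]

# y_new is used here only to check the sketch retrospectively.
auc = roc_auc_score(y_new, risk)
print(f"AUC on the follow-up course: {auc:.2f}")
```

This only works when features are defined identically in both courses, so in practice the feature extraction code would need to be shared between them.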

In contrast to transferring models across different MOOCs, Boyer & Veeramachaneni (2015) proposed the in-situ learning approach, which allows training a model based on proxy labels (e.g., students are considered dropouts if they have no interactions in a specific week; Kurka, Godoy, & Von Zuben, 2016). A few studies have investigated the use of in-situ learning in MOOCs and shown its effectiveness in creating models that are actionable in ongoing courses (Bote-Lorenzo & Gómez-Sánchez, 2017, 2018).
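A minimal sketch of in-situ learning with a proxy label, assuming synthetic data. The `activity` matrix, the two-week feature window, and the "no interactions in the latest week" proxy are illustrative choices, not the exact setup of the cited studies. The model is trained entirely on weeks already observed in the ongoing course and then applied to the most recent weeks to estimate risk for the coming week.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_students, weeks_so_far, window = 300, 4, 2

# Hypothetical activity counts per student and week, decaying over time.
base = rng.gamma(shape=2.0, scale=2.0, size=n_students)
decay = 0.8 ** np.arange(weeks_so_far)
activity = rng.poisson(base[:, None] * decay[None, :])  # shape (300, 4)

# Training set built from the ongoing course only:
# features = activity in weeks 2-3, proxy label = no activity in week 4.
X_train = activity[:, weeks_so_far - window - 1 : weeks_so_far - 1]
y_proxy = (activity[:, weeks_so_far - 1] == 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_train, y_proxy)

# Slide the window forward: activity in weeks 3-4 predicts inactivity in week 5.
X_now = activity[:, weeks_so_far - window :]
risk_next_week = model.predict_proba(X_now)[:, 1]

# Indices of the 10 students with the highest estimated risk.
at_risk = np.argsort(risk_next_week)[::-1][:10]
print(at_risk)
```

Because the proxy label is observable as soon as a week ends, the model can be retrained every week without waiting for the course to finish.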

Although transfer across courses and in-situ learning can provide actionable information for creating real-world interventions, their use is very limited in MOOC prediction research (Gardner & Brooks, 2018).


Actionable predictions regarding students’ future learning behaviour can offer a wide range of pedagogical utilities. If widely adopted and practised by researchers and practitioners, machine learning techniques that produce timely, actionable information can promote LA-empowered educational interventions in real-world practice. This tutorial session will introduce two techniques, namely transferring across courses and in-situ learning, for creating actionable predictions in ongoing courses, and will demonstrate their use through several hands-on examples. Python Scikit-Learn, one of the most widely used machine learning libraries in the field, will be used in the tutorial.

This tutorial is closely related to several past LAK workshops and tutorials, including, but not limited to, “Building predictive models of student success with the Weka toolkit” and “Python Bootcamp for Learning Analytics Practitioners”. These previous sessions mainly focused on fundamental machine learning topics (e.g., unsupervised learning, text mining). Building on this evolving knowledge base in the learning analytics community, the proposed session will motivate and inspire LA researchers and practitioners to train machine learning models that can be actionable for designing real-world interventions.

Intended Outcome

The main objective of the proposed tutorial session is to teach participants the techniques of transferring across courses and in-situ learning through several hands-on exercises. After this session, participants will be able to train predictive models using both techniques in Python Scikit-Learn. At the end of the session, participants will reflect on their experience and share their ideas on how transferring across courses and in-situ learning can relate to their own research (if possible), as well as how these techniques can be used for creating educational interventions. Social media (e.g., hashtags on Twitter) will be used to disseminate these emerging ideas about the pedagogical utilities of actionable predictions.

Tutorial Plan

Theory (15 mins.)

  • Introduction to Machine Learning
  • Training Paradigms: Cross-Validation, Transferring Across Courses and In-Situ Learning

Hands-On Exercise - Part 1 (60 mins.)

  • Getting Familiar with Jupyter Notebook Environment
  • Introduction to Python, Pandas, and Scikit-Learn
  • Understanding the Data

Hands-On Exercise - Part 2 (90 mins.)

  • Building Machine Learning Models in Scikit-Learn with:
    • Cross-Validation
    • Transferring Across Courses
    • In-Situ Learning

Concluding Remarks (15 mins.)

  • Reflecting on the session: Ideas to Put into Practice What Was Learned

WHAT YOU NEED: A fully charged laptop with Anaconda (Python 3.7 version) installed. Feel free to contact me if you need any help before the session.


Erkan Er received his PhD degree in Learning, Design, and Technology from the University of Georgia, USA, in 2016. He is currently working as a postdoctoral researcher in the GSIC-EMIC research group in the Department of Telecommunications Engineering at the University of Valladolid, Spain. His recent research interests include using machine learning and educational data mining techniques to understand and support student learning in massive contexts.

He currently works in the project called WeLearnAtScale.




Bote-Lorenzo, M. L., & Gómez-Sánchez, E. (2017). Predicting the decrease of engagement indicators in a MOOC. In Proceedings of Seventh International Conference on Learning Analytics and Knowledge (pp. 143–147). Vancouver, Canada.

Bote-Lorenzo, M. L., & Gómez-Sánchez, E. (2018). An approach to build in situ models for the prediction of the decrease of academic engagement indicators in Massive Open Online Courses. Journal of Universal Computer Science. Accepted.

Boyer, S., Gelman, B. U., Schreck, B., & Veeramachaneni, K. (2015). Data science foundry for MOOCs. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (pp. 1–10). Paris, France.

Boyer, S., & Veeramachaneni, K. (2015). Transfer learning for predictive models in Massive Open Online Courses. In Proceedings of the 17th Conference on Artificial Intelligence in Education (pp. 54–63). Madrid, Spain.

Boyer, S., & Veeramachaneni, K. (2016). Robust predictive models on MOOCs: Transferring knowledge across courses. In Proceedings of the 9th International Conference on Educational Data Mining (pp. 298–305). Raleigh, NC, USA.

Gardner, J., & Brooks, C. (2018). Student success prediction in MOOCs. User Modeling and User-Adapted Interaction, 28(2), 127–203.

Kurka, D. B., Godoy, A., & Von Zuben, F. J. (2016). Delving deeper into MOOC student dropout prediction. CEUR Workshop Proceedings, 1691, 21–27.

Whitehill, J., Mohan, K., Seaton, D., Rosen, Y., & Tingley, D. (2017). MOOC dropout prediction: How to measure accuracy? In Proceedings of the Fourth ACM Conference on Learning@Scale (pp. 161–164).