EDM 2023 Tutorial: Introduction to Neural Networks and Uses in EDM

In this tutorial, participants explore the fundamentals of feedforward neural networks, including the backpropagation mechanism, as well as Long Short-Term Memory (LSTM) neural networks. The tutorial also covers the basics of Deep Knowledge Tracing, the attention mechanism, and the application of neural networks in education. There will be hands-on applications on open educational datasets. Participants should leave the tutorial with the ability to use neural networks in their own research.

Description

Neural networks (NN) are as old as the relatively young history of computer science: McCulloch and Pitts already proposed nets of abstract neurons in 1943, as Haigh and Priestley report in [7]. However, their successful use in recent years, especially in the form of convolutional neural networks (CNN) or Long Short-Term Memory (LSTM) neural networks, in areas such as image recognition and language translation, has made them widely known, also in the Educational Data Mining (EDM) community. This is reflected in the contributions that are published each year in the proceedings of the conference.

In [11], we counted the percentage of contributions in the proceedings of the Educational Data Mining (EDM) conference, from its beginning in 2008 till 2019 (long and short papers, posters and demos, young research track, doctoral consortium, and papers of the industry track), that used some kind of neural network in their research. While the percentage stayed below 10% till 2015, it started to increase in 2016 and reached 28% in 2019. This trend has continued since then, with 14 of the 26 long papers in the EDM 2022 proceedings mentioning some kind of neural network in their research.

Recognizing the growing importance of neural networks in the EDM community, this tutorial aims to provide 1) an introduction to neural networks in general and to LSTM neural networks, with a focus on the attention mechanism and the Transformer neural networks, and 2) a discussion venue on these exciting techniques. Compared with our previous tutorial [11], the main difference is the introduction to Transformer neural networks. This tutorial targets 1) participants who have no or very little prior knowledge about neural networks and would like to use them in their future work or would like to better understand the work of others, and 2) participants interested in exchanging and discussing their experience with the use of neural networks.

Learning outcome

The objectives of this tutorial are twofold:

Newcomers should leave the tutorial with a good understanding of neural networks and the ability to use them in their own research, or to better appreciate research works that use neural networks. Participants already knowledgeable about neural networks get a chance to discuss and share their experience on this topic and to connect with others.

Schedule

Duration | Item | Presenter
45 minutes | Introduction - Feedforward neural networks and backpropagation | Agathe Merceron
45 minutes | Application - Discussion | Agathe Merceron
30 minutes | LSTM and Attention Mechanism | Ange Tato
30 minutes | Break |
15 minutes | DKT and an Introduction to Transformer neural networks | Ange Tato
45 minutes | Application - Implementation of an LSTM + attention mechanism for student performance prediction - Discussion | Ange Tato

Introduction to feedforward neural networks

This part begins with artificial neurons and their structure - inputs, weights, output, and activation function - and the calculations that are feasible and not feasible with one neuron only. It continues with feedforward neural networks or multi-layer perceptrons (MLP). A hands-on example taken from [8] illustrates how a feedforward neural network calculates its output. Further, this part introduces loss functions and the backpropagation algorithm and makes clear what a feedforward neural network learns. Backpropagation is demonstrated with the hands-on example introduced before.
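To make the forward pass and the weight updates concrete, the sketch below trains a tiny one-hidden-layer network with sigmoid activations on the XOR task using plain NumPy. It is an illustration of the mechanics discussed in this part, not the worked example taken from [8], and all sizes and hyperparameters are illustrative.

```python
# Minimal sketch: forward pass and backpropagation for a one-hidden-layer MLP (NumPy only).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, which a single neuron cannot separate but an MLP can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights and biases: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0

for epoch in range(10000):
    # Forward pass: each layer computes sigmoid(inputs . weights + bias).
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Squared-error loss: gradients ("deltas") flow backwards through the layers.
    err = out - y
    grad_out = err * out * (1 - out)          # delta at the output units
    grad_h = (grad_out @ W2.T) * h * (1 - h)  # delta propagated back to the hidden units

    # Gradient-descent updates (the backpropagation step).
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

print(np.round(out, 2))  # predictions should approach [0, 1, 1, 0]
```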

Application of feedforward NN - Discussion

This part discusses the use of feedforward neural networks in EDM research. These networks are often used to predict students’ performance and students at risk of dropping out, see for example [5, 1, 24]. It must be noted that feedforward neural networks do not necessarily give better results than other algorithms for this kind of task. Other uses emerge. For example, Ren et al. use them to model the influence of the courses a student has co-taken on the grade of a given course [16]. As another example, Orr and Russell [13] intentionally use a feedforward “neural network model to both automatically assess the design of a program and provide personalized feedback to guide students on how to make corrections”.
It must also be noted that neural networks are considered not interpretable, see [12]. When explanations are crucial, it might be worthwhile to evaluate whether interpretable algorithms could be used instead; another way is to generate explanations with other algorithms, see [20] for challenges in doing so.
The main activity of this part is for participants to solve a classification task on an educational dataset; participants will create, inspect and evaluate a feedforward neural network with Python and relevant libraries.
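As a rough preview of what this activity looks like, the following sketch builds, inspects and evaluates a small feedforward classifier with Keras. The dataset file and column names (students.csv, passed) are hypothetical placeholders for the educational dataset used in the session.

```python
# Minimal sketch of a feedforward classifier on a tabular educational dataset (Keras).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

df = pd.read_csv("students.csv")               # hypothetical file with numeric features
X = df.drop(columns=["passed"]).to_numpy()     # "passed" is a hypothetical binary label
y = df["passed"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()                                # inspect the network
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)
print(model.evaluate(X_test, y_test))          # evaluate on held-out students
```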

LSTM and Attention mechanism, introduction to Transformer neural networks

In this part of the tutorial, the basic concepts of LSTM are covered. We will focus on how the different elements of the architecture (cell, state, etc.) work. Participants will learn how to use an LSTM for the prediction of learners' outcomes in an educational system. Concepts such as Deep Knowledge Tracing (DKT) will also be covered. The attention mechanism is then introduced: participants will learn how this mechanism works and how to use it in different cases. We will explore concepts such as global and local attention in neural networks.
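The following sketch shows the typical shape of a DKT-style model: an LSTM that reads a student's sequence of (skill, correctness) interactions and outputs, at every step, one probability per skill. Sizes are illustrative and not tied to a particular dataset; in practice the loss is usually computed only on the skill actually attempted at the next step.

```python
# Minimal sketch of a DKT-style LSTM (illustrative shapes, not a reference implementation).
from tensorflow import keras

NUM_SKILLS = 100   # number of distinct skills in the dataset (assumed)
MAX_LEN = 200      # interaction sequences padded/truncated to this length (assumed)

model = keras.Sequential([
    # Each time step is a one-hot encoding of (skill, correctness): 2 * NUM_SKILLS dims.
    keras.Input(shape=(MAX_LEN, 2 * NUM_SKILLS)),
    keras.layers.Masking(mask_value=0.0),                # ignore padded steps
    keras.layers.LSTM(64, return_sequences=True),        # one hidden state per interaction
    # One sigmoid output per skill at every time step: P(next answer on that skill is correct).
    keras.layers.TimeDistributed(keras.layers.Dense(NUM_SKILLS, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```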

We will also introduce the Transformer neural network, focusing on its architecture, including concepts such as the multi-headed attention layer and the parallel processing of inputs on a GPU. This part will not be included in the hands-on application due to time constraints; however, an example is available here (Deep Knowledge Tracing with transformers).
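For readers who want to see the pieces named above in code, here is a minimal sketch of one Transformer encoder block built with the Keras MultiHeadAttention layer. Dimensions are illustrative, and positional encoding is omitted for brevity; this is not the model of the linked example.

```python
# Minimal sketch of a single Transformer encoder block with multi-head self-attention (Keras).
from tensorflow import keras

SEQ_LEN, D_MODEL, NUM_HEADS = 50, 64, 4   # illustrative dimensions

inputs = keras.Input(shape=(SEQ_LEN, D_MODEL))
# Multi-head self-attention: queries, keys and values all come from the same sequence,
# and all positions are processed in parallel (which is what makes GPUs effective here).
attn = keras.layers.MultiHeadAttention(
    num_heads=NUM_HEADS, key_dim=D_MODEL // NUM_HEADS)(inputs, inputs)
x = keras.layers.LayerNormalization()(inputs + attn)   # residual connection + normalization

# Position-wise feedforward sub-layer, again with a residual connection.
ff = keras.layers.Dense(128, activation="relu")(x)
ff = keras.layers.Dense(D_MODEL)(ff)
outputs = keras.layers.LayerNormalization()(x + ff)

encoder_block = keras.Model(inputs, outputs)
encoder_block.summary()
```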

Application - LSTM and Attention mechanism

In this hands-on part, we will explore existing real-life applications of LSTM in education, especially Deep Knowledge Tracing. We will also explore the combination of LSTM with expert knowledge (using the attention mechanism) for predicting socio-moral reasoning skills [21, 22]. Participants will implement an LSTM with an attention mechanism for the prediction of students’ performance in a tutoring system. We will use Python, especially the Keras library, for coding, and open educational datasets (e.g. the ASSISTments benchmark dataset).
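As a rough preview of the exercise, the sketch below combines an LSTM with a simple attention layer that weights the per-step hidden states before the final prediction. It is a generic illustration, not the exact architecture of [21, 22], and the input dimensions are assumptions about the preprocessed data.

```python
# Minimal sketch: LSTM + attention pooling for predicting student performance (Keras).
from tensorflow import keras

MAX_LEN, NUM_FEATURES = 200, 50   # padded sequence length and features per interaction (assumed)

inputs = keras.Input(shape=(MAX_LEN, NUM_FEATURES))
h = keras.layers.LSTM(64, return_sequences=True)(inputs)        # one hidden state per step

# Attention: score each step, turn the scores into weights, take the weighted sum of states.
scores = keras.layers.Dense(1, activation="tanh")(h)             # (batch, MAX_LEN, 1)
weights = keras.layers.Softmax(axis=1)(scores)                   # attention weights over time steps
context = keras.layers.Dot(axes=1)([weights, h])                 # (batch, 1, 64) weighted sum
context = keras.layers.Flatten()(context)

outputs = keras.layers.Dense(1, activation="sigmoid")(context)   # e.g. correct / incorrect
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```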

Material

The tutorial material consists of slides (see the schedule section) and some files for the application parts.
Application - Part I :

Application - Part II :
Note: A laptop capable of running a Jupyter Notebook from Google Colab is required for full participation in this tutorial.

Presenters

Agathe Merceron, Beuth University of Applied Sciences Berlin.
Ange Tato, École de Technologie Supérieure de Montréal.

To do before the tutorial

  1. Please add your information here.

References

  1. J. Berens, K. Schneider, S. Görtz, S. Oster, and J. Burghoff. Early detection of students at risk - predicting student dropouts using administrative student data from German universities and machine learning methods. Journal of Educational Data Mining, 11(3):1–41, 2019.
  2. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  3. J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio. Attention-based models for speech recognition. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, page 577–585, Cambridge, MA, USA, 2015. MIT Press.
  4. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, December 2014. MIT Press, 2014.
  5. G. Dekker, M. M. Pechenizkiy, and J. Vleeshouwers. Predicting students drop out: A case study. In T. Barnes, M. Desmarais, C. Romero, and S. Ventura, editors, Proceedings of the second International Conference on Educational Data Mining (EDM 2009), pages 41–50. International Educational Data Mining Society, July 2009.
  6. A. Ghosh, N. Heffernan, and A. S. Lan. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2330–2339, 2020.
  7. T. Haigh and M. Priestley. Von Neumann thought Turing's universal machine was 'simple and neat.' But that didn't tell him how to design a computer. Communications of the ACM, 63(1):26–32, 2019.
  8. J. Han, M. Kamber, and J. Pei. Data Mining - Concepts and Techniques. Morgan Kaufmann, 2012.
  9. S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  10. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186, 2019.
  11. A. Merceron and A. Tato. An introduction to neural networks. In A. N. Rafferty, J. Whitehill, C. Romero, and V. Cavalli-Sforza, editors, Proceedings of the International Conference on Educational Data Mining (EDM 2020), pages 821–823. International Educational Data Mining Society, 2020.
  12. C. Molnar. Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/, Nov 2022. Last checked on Dec 07, 2022.
  13. J. W. Orr and N. Russell. Automatic assessment of the design quality of python programs with personalized feedback. In I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, editors, Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), pages 495–501. International Educational Data Mining Society, July 2021.
  14. C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. Guibas, and J. Sohl-Dickstein. Deep knowledge tracing. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, page 505–513, Cambridge, MA, USA, 2015. MIT Press.
  15. S. Pu, M. Yudelson, L. Ou, and Y. Huang. Deep knowledge tracing with transformers. In International Conference on Artificial Intelligence in Education, pages 252–256. Springer, 2020.
  16. Z. Ren, X. Ning, A. Lan, and H. Rangwala. Grade prediction based on cumulative knowledge and co-taken courses. In M. Desmarais, C. F. Lynch, A. Merceron, and R. Nkambou, editors, Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), pages 158–167. International Educational Data Mining Society, July 2019.
  17. B. Riordan, A. Horbach, A. Cahill, T. Zesch, and C. Lee. Investigating neural architectures for short answer scoring. In Proceedings of the 12th workshop on innovative use of NLP for building educational applications, pages 159–168, 2017.
  18. C. Romero, S. Ventura, P. Espejo, and C. Hervás. Data mining algorithms to classify students. In R. S. J. de Baker, T. Barnes, and J. E. Beck, editors, Proceedings of the first International Conference on Educational Data Mining (EDM 2008), pages 8–17. International Educational Data Mining Society, 2008.
  19. I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, page 3104–3112, Cambridge, MA, USA, 2014. MIT Press.
  20. V. Swamy, B. Radmehr, N. Krco, M. Marras, and T. Käser. Evaluating the explainers: Black-box explainable machine learning for student success prediction in MOOCs. In N. Bosch and A. Mitrovic, editors, Proceedings of the 15th International Conference on Educational Data Mining (EDM 2022), pages 98–109, Durham, United Kingdom, July 2022. International Educational Data Mining Society.
  21. A. Tato and R. Nkambou. Infusing expert knowledge into a deep neural network using attention mechanism for personalized learning environments. Frontiers in Artificial Intelligence, 5:921476, 2022.
  22. A. A. N. Tato, R. Nkambou, and A. Dufresne. Hybrid deep neural networks to predict socio-moral reasoning skills. In M. Desmarais, C. F. Lynch, A. Merceron, and R. Nkambou, editors, Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), pages 623–626. International Educational Data Mining Society, 2019.
  23. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc.
  24. K. Wagner, A. Merceron, and P. Sauer. Accuracy of a cross-program model for dropout prediction in higher education. In Companion Proceedings of the 10th International Learning Analytics & Knowledge Conference (LAK 2020), pages 744–749, 2020.
  25. X. Xiong, S. Zhao, E. G. Van Inwegen, and J. E. Beck. Going deeper with deep knowledge tracing. In T. Barnes, M. Chi, and M. Feng, editors, Proceedings of the International Conference on Educational Data Mining (EDM 2016), pages 545–550. International Educational Data Mining Society, 2016.