Winners have been announced for the 2021 Global Student Challenge (GSC-21)! The first round of the competition began in May and involved student teams solving two challenge problems. Each team was evaluated on its implementation and on reports explaining its results. This year, 487 students participated in 250 teams representing 195 universities from 41 countries.
The competition was meant to engage the global student body in solving two real-world data analytics problems. The problems were based on datasets collected from real-world computing systems (challenge #1) and from Tweets related to the pandemic (challenge #2). While failures of computing systems are a problem under normal circumstances, in the middle of the pandemic, with remote work putting more strain on such systems, their failures or even slowdowns were particularly irksome. Therefore, the first challenge problem asked participants to build models that predict when such failures are likely. System administrators could then take mitigating action if the likelihood of failure was particularly high.
The second challenge problem asked participants to build software that classifies people’s emotional state based on their Tweets. With isolation being forced upon us by the pandemic, social media posts promised an important way to gauge people’s emotions. This challenge sought to push the envelope on how accurately software can do that, despite the nuances of expression that depend on culture, nationality, and other factors. It turned out that the best data analytics software could do this surprisingly well, even though it had fewer than 140 characters to work its magic on.
On a less technical level, the competition was a powerful way to bring the global student community together, despite many students being isolated from one another, and it created a level playing field to some extent. Student teams had the freedom to innovate, to learn the skills they lacked, and to put their heads together to come up with enterprising, highly accurate solutions. It was heartening that some of the winning solutions beat the accuracy achieved by the models of the original authors. To the organizers, and to anybody else who watched this competition unfold, this was a reminder that human ingenuity is a precious commodity and respects no boundaries of country or status.
On the organizing side, we had very effective cooperation between academics (faculty members and researchers at universities) and industrial practitioners. Both the differences and the overlaps in their viewpoints helped bring out various facets of the submissions.
Challenge problem #1: Computer System Failure Data Analysis
With the growing scale of supercomputing systems, scientists are now able to solve in a matter of seconds challenging computing problems that would take hundreds of years on a personal computer. However, as scale and complexity increase, so does the probability of application failure due to hardware or software errors. Such application failures not only delay scientific progress, but also waste a tremendous amount of resources, both in time and in energy consumption. If we are able to predict when an application will fail due to a system error or a software bug, preventive mechanisms such as checkpointing can be initiated to save intermediate results, thereby reducing the amount of wasted computation.
This challenge problem deals with predicting the failure of application executions (referred to as “jobs”) on Purdue’s central computing cluster. The teams are given data about the status of each compute node and the resource usage of all jobs running on the cluster.
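To illustrate how a failure prediction could drive the checkpointing decision described above, here is a minimal sketch. It is not the method used by the organizers or by any team. The function name, its parameters, and the expected-cost rule (checkpoint when the expected lost work exceeds the cost of taking a checkpoint) are all assumptions made for illustration.

```python
def should_checkpoint(p_fail, work_at_risk_s, checkpoint_cost_s):
    """Decide whether to checkpoint a job now (hypothetical rule).

    p_fail            -- predicted probability that the job fails (0.0 to 1.0)
    work_at_risk_s    -- seconds of computation lost if the job fails now
    checkpoint_cost_s -- seconds needed to write a checkpoint

    Returns True when the expected lost work exceeds the checkpoint cost.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability between 0 and 1")
    if work_at_risk_s < 0 or checkpoint_cost_s < 0:
        raise ValueError("durations must be non-negative")
    expected_loss_s = p_fail * work_at_risk_s
    return expected_loss_s > checkpoint_cost_s


# Illustrative numbers only: one hour of work at risk, 60-second checkpoint.
high_risk = should_checkpoint(0.2, 3600, 60)   # expected loss: 720 s
low_risk = should_checkpoint(0.01, 3600, 60)   # expected loss: 36 s
```

With these made-up numbers, the high-risk job would be checkpointed and the low-risk one would not. A real policy would also need to account for how reliable the predicted probability is.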
Challenge problem #2: Sentiment Analysis of COVID-19 related Tweets
This challenge is on sentiment analysis of Tweets related to the COVID-19 pandemic, which is a multi-label text classification task. Since its outbreak, the pandemic has affected more than 180 countries, causing massive economic and job losses globally and confining about 58% of the global population. Research on people’s feelings is essential for ensuring positive mental health outcomes and for keeping people informed about COVID-19.
In this competition, the released training data contains 5,000 labeled tweets, while the released validation data contains 2,500 unlabeled tweets. The classes for each tweet are: Optimistic (0), Thankful (1), Empathetic (2), Pessimistic (3), Anxious (4), Sad (5), Annoyed (6), Denial (7), Surprise (8), Official report (9), Joking (10). A tweet can be labeled with multiple classes. The contestants have to automatically label the tweets in the test dataset.
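Because a tweet can carry several of the eleven classes at once, multi-label classifiers commonly represent the target as a fixed-length 0/1 vector with one position per class. The sketch below shows that encoding using the class indices listed above. The helper names are hypothetical, and the actual file format of the released data is not described here, so the input is assumed to be a plain list of class indices.

```python
# Class names in index order, as listed in the challenge description.
CLASSES = [
    "Optimistic", "Thankful", "Empathetic", "Pessimistic", "Anxious", "Sad",
    "Annoyed", "Denial", "Surprise", "Official report", "Joking",
]


def to_multi_hot(label_ids, num_classes=len(CLASSES)):
    """Turn a list of class indices into a 0/1 vector of length num_classes."""
    vector = [0] * num_classes
    for class_id in label_ids:
        if not 0 <= class_id < num_classes:
            raise ValueError(f"unknown class index: {class_id}")
        vector[class_id] = 1
    return vector


def from_multi_hot(vector):
    """Turn a 0/1 vector back into the list of class names it marks."""
    return [CLASSES[i] for i, flag in enumerate(vector) if flag]


# A tweet labeled both Optimistic (0) and Anxious (4).
encoded = to_multi_hot([0, 4])
decoded = from_multi_hot(encoded)
```

Here `encoded` has ones at positions 0 and 4 and zeros elsewhere, and `decoded` recovers the two class names. A model trained on such vectors predicts each class independently, which is what distinguishes this task from single-label sentiment classification.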
About Global Student Challenge
The Global Student Challenge is a platform for students from all corners of the globe to create innovative solutions to data analytics problems. The 2021 judging panel consisted of IEEE Computer Society volunteers, academics, and industry representatives.