Data Science Half Day 2022

April 26, 2022 | 2:25PM - 5:00PM

College High Hall

Schedule Overview

2:25pm

Welcome

COHH 3117

Dr. Bud Fischer, Provost
Dr. Cate Webb, Director, Applied Research & Technology Program
Dr. Richard Schugart, Director, Applied Center for Data Science

2:30 - 4:53pm

General Session I

COHH 3117 (see below for specific times of talks, titles, presenters, and abstracts)

2:30 - 4:23pm

General Session II

COHH 3123 (see below for specific times of talks, titles, presenters, and abstracts)

General Session I

Location: COHH 3117

2:30-2:50 | Machine Learning in Experimental Materials Chemistry

Presenter: Dr. Ranjit Koodali, Associate Provost for Research & Graduate Education

Abstract: The development of advanced materials is an important aspect of modern life. However, discovering novel materials requires searching a vast chemical space for materials with the desired properties. Recent applications of machine learning (ML) in materials chemistry show promise for accelerating the materials discovery process. In this perspective article, we highlight the importance of ML in materials chemistry and discuss a few examples of ML applications in the synthesis, characterization, and activity prediction of materials. Finally, we discuss the challenges in this field and how progress in ML for chemistry is being combined with advanced robotics to automate the optimization of materials discovery.

2:53-2:58 | Classification of Seizures Using Electroencephalogram (EEG) Data and Convolutional Neural Networks

Presenters: Hannah Laney, Gatton Academy; Jenna Waltrip, Gatton Academy

Abstract: Manual electroencephalogram (EEG) analysis requires extensive training and is a subjective process. A computerized method for classifying seizure types from EEG readings would make seizure diagnosis less subjective, more precise, and faster. Our project combines simulated EEG data with convolutional neural networks built from different layers to classify three types of seizures (absence, tonic, and myoclonic) from images of their EEG graphs.
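The abstract does not specify the network architecture. As a minimal sketch of the core operation a convolutional layer applies to an image such as a plotted EEG trace, the pure-Python code below slides a small kernel across a toy image and applies a ReLU activation. The image and kernel are hypothetical; a real CNN learns its kernels during training and stacks many such layers with pooling and dense layers.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep-learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Zero out negative activations, element by element."""
    return [[max(0, x) for x in row] for row in feature_map]

# Hypothetical 4x4 binary image with a vertical edge, and a vertical-edge kernel.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]
print(relu(conv2d_valid(image, kernel)))  # strongest response where the edge is
```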

3:00-3:20 | The Future of Weather Forecasting: Right or Fright

Presenter: Dr. Eric Rappin, Kentucky Mesonet

Abstract: In this talk we will discuss one of the original big data questions: how to use vast amounts of weather observations to improve weather forecasting. We will trace the history of weather forecasting from the “simple” models of World War II to the data-hungry machine learning algorithms being developed today. We will also discuss how the data are generated, whether from observations or from models, spanning spatiotemporal scales from local to global and from diurnal to decadal. The focus will be on how weather models have adapted to the growth in computing power and on the transition from a fluid-dynamical basis to a machine learning focus. Finally, we will discuss how well a nonlinear problem such as the weather can be forecast at all.
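The limit on forecasting a nonlinear system is classically illustrated by Edward Lorenz's 1963 convection model. The sketch below is not taken from the talk: it integrates the Lorenz equations with the standard parameters (σ = 10, ρ = 28, β = 8/3) using a classical fourth-order Runge-Kutta step, and tracks how far apart two runs drift when their initial states differ by only 10⁻⁸.

```python
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def max_separation(perturbation=1e-8, dt=0.01, steps=4000):
    """Run two forecasts whose initial x differs by `perturbation`; return their largest distance."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)
    largest = 0.0
    for _ in range(steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        largest = max(largest, math.dist(a, b))
    return largest

print(f"within 0.1 time units: {max_separation(steps=10):.1e}")
print(f"within 40 time units: {max_separation():.1f}")
```

The initial difference, far below any realistic observational precision, grows until the two runs are as far apart as unrelated states of the system, the same mechanism that limits deterministic weather forecasts to a finite horizon.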

3:23-3:28 | Determining the Optimal Launch Conditions for Small-Scale Orbital Satellites

Presenters: Nathan Hogg, Gatton Academy; Armaan Rai, Gatton Academy

Abstract: Satellites are vital to the functioning of our ever-evolving technological world. They are the go-between for nearly all Earth-based communications, the Global Positioning System (GPS), and many other technical applications. Currently, few accessible, user-friendly programs can calculate the launch conditions for these satellites. Our project aims to bridge this gap by providing the user with streamlined input methods and clear outputs of critical data.
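The abstract does not list which launch conditions the program computes. As one hedged example of a quantity such a tool might report, the speed and period of a circular orbit follow from v = sqrt(μ / r) and T = 2π sqrt(r³ / μ), where r is the orbital radius measured from Earth's center and μ is Earth's standard gravitational parameter. The constants are standard values; the 400 km altitude is only an illustrative choice.

```python
import math

# Earth's standard gravitational parameter (m^3/s^2) and mean radius (m).
MU_EARTH = 3.986004418e14
R_EARTH = 6_371_000.0

def circular_orbit(altitude_m: float) -> tuple[float, float]:
    """Return (speed in m/s, period in s) for a circular orbit at the given altitude."""
    r = R_EARTH + altitude_m  # orbital radius, measured from Earth's center
    speed = math.sqrt(MU_EARTH / r)
    period = 2 * math.pi * math.sqrt(r**3 / MU_EARTH)
    return speed, period

speed, period = circular_orbit(400_000)  # an illustrative low-Earth-orbit altitude
print(f"speed = {speed / 1000:.2f} km/s, period = {period / 60:.1f} min")
```

This ignores atmospheric drag, Earth's rotation, and the energy needed to climb to altitude, all of which a full launch calculation would have to include.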

3:30-3:50 | The Interplay of Computation and Psychology from Neural Nets to Eliza

Presenter: Dr. Lance Hahn, Department of Psychological Sciences

Abstract: Psychological science and computer science share an important history of intellectual exchange. Image recognition, learning, language comprehension, and intelligence are among the topics with rich literatures in both fields. Marr’s interdisciplinary “levels of analysis” framework is useful for defining a focus and constructing a problem space. Example data and problems will demonstrate the value of an interdisciplinary approach, with a focus ranging from pixels to Rogerian talk therapy.

3:53-3:58 | The Relationship Between Stress and Performance

Presenters: Desmond Harris, Gatton Academy; Reagan Phelps, Gatton Academy

Abstract: Theories of stress and performance have been poorly represented in education. We develop a random forest model to help athletes assess their stress and how it affects their performance. The model's variables were chosen in part from critical variables identified in psychology textbooks on performance theory.

4:00-4:20 | Books versus Reality: Who is a Good Leader? Application of Topic Modeling to Leadership Research

Presenter: Dr. Xiaowen Chen, Department of Psychological Sciences

Abstract: This study aimed to identify the leadership competencies sought by employers and to compare them with the desired leadership traits described in the leadership literature. To achieve these two goals, we generated two massive text datasets: one consisting of job descriptions for leadership positions posted on a job-search engine, and the other consisting of leadership book chapters. Topic modeling and other ML techniques were used for automatic competency modeling.

4:23-4:28 | Sentiment Analysis of Text Using MATLAB

Presenters: Brody Johnson, Gatton Academy; Diego Moreno, Gatton Academy

Abstract: Using MATLAB’s Text Analytics Toolbox, we implemented the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm to perform sentiment analysis, a subtype of text analysis in which the positive and negative sentiments expressed in a text are evaluated, analyzed, and given a score. The model is polar, meaning all scores lie on a one-dimensional scale, in this case the interval from -1 to 1. Such a program is useful for analyzing writing in a wide variety of fields, from academic work (research papers, economic journals) to more commonplace text (social media posts, customer reviews).
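VADER's final "compound" score is what places every text on the -1 to 1 scale: the summed word valences x are passed through x / sqrt(x² + α) with α = 15. The sketch below shows only that normalization step, in Python rather than MATLAB. The four-word lexicon and its valence values are hypothetical, and the real model's rules for negation, intensifiers, punctuation, and capitalization are omitted.

```python
import math

# Hypothetical toy lexicon (word -> valence). Real VADER uses a large
# human-rated lexicon plus rules for negation, boosters, and punctuation,
# none of which are modeled here.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}

def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into (-1, 1), as VADER's compound score does."""
    return score / math.sqrt(score * score + alpha)

def compound(text: str) -> float:
    """Sum the valences of known words, then normalize the sum."""
    total = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    return normalize(total)

print(round(compound("great food but terrible service"), 3))  # 1.0 / sqrt(16) = 0.25
```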

4:30-4:50 | Deep Learning for Video Game Genre Classification

Presenter: Dr. Lukun Zheng, Department of Mathematics

Abstract: Classifying a video game's genre from its cover and textual description would be highly beneficial to many modern identification, collocation, and retrieval systems. At the same time, it is an extremely challenging task, for three reasons. First, there is a wide variety of video game genres, many of which are not concretely defined. Second, video game covers vary in many ways, such as colors, styles, and textual information, even among games of the same genre. Third, cover designs and textual descriptions may vary with many external factors, such as country, culture, and target audience. With growing competition in the video game industry, cover designers and typographers push cover designs to their limits in the hope of attracting sales. Computer-based automatic video game genre classification has become a particularly exciting research topic in recent years. In this paper, we propose a multi-modal deep learning framework to solve this problem. The contribution of this paper is four-fold. First, we compile a large dataset of 50,000 video games from 21 genres, consisting of cover images, description text, title text, and genre information. Second, state-of-the-art image-based and text-based models are evaluated thoroughly for the task of video game genre classification. Third, we develop an efficient and scalable multi-modal framework based on both images and texts. Fourth, we give a thorough analysis of the experimental results and suggest future work to improve performance. The results show that the multi-modal framework outperforms the current state-of-the-art image-based and text-based models. Several challenges for this task are outlined. More effort and resources are needed for this classification task to reach a satisfactory level of performance.

General Session II

Location: COHH 3123

2:30-2:50 | A Machine Learning (ML)-Based Model to Estimate PM2.5 Concentration Levels in an Air Quality Dataset

Presenter: Shaharina Shoha, Department of Mathematics

Abstract: In recent years, air pollution has amplified risks to human health and crop production. Every year, a large number of people are diagnosed with asthma and other breathing-related problems, and a major reason is the high concentration of life-threatening PM2.5 particles suspended in the atmosphere. In our project, we propose three ML models, a random forest tree (RFT), a gradient boosting tree (GBT), and a neural network (NN), to forecast the concentration of these particles and to analyze the factors that affect it. Such forecasts may help people prepare careful prevention strategies against a major risk factor for diseases including chronic obstructive pulmonary disease, reduced lung function, pneumonia, cardiovascular diseases, premature death, and leukaemia. The dataset contains 71,149 instances of hourly averaged responses from an array of six metal oxide chemical sensors embedded in an air quality chemical multisensor device. To compare the ML models, we used mean absolute error (MAE), root-mean-square error (RMSE), and ROC as performance metrics.
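For reference, the two error metrics are MAE = (1/n) Σ |yᵢ − ŷᵢ| and RMSE = sqrt((1/n) Σ (yᵢ − ŷᵢ)²). The short sketch below computes both on made-up PM2.5 values in µg/m³, not on the project's dataset. RMSE is always at least as large as MAE because it weights large errors more heavily.

```python
import math

def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: the average magnitude of the forecast errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error: squares errors before averaging, so large misses dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical hourly PM2.5 observations and model forecasts (µg/m³).
observed = [35.0, 40.0, 52.0, 61.0]
forecast = [30.0, 42.0, 55.0, 61.0]
print(f"MAE = {mae(observed, forecast):.2f}, RMSE = {rmse(observed, forecast):.2f}")
```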

2:53-2:58 | Menstrual Cycle Predictions Using Machine Learning

Presenters: Gracie Davis, Gatton Academy; Aubrey Morse, Gatton Academy; Maria Pfeifer, Gatton Academy

Abstract: Menstrual cycle predictions carry uncertainty from many sources: some is inherent to the physiological process, while some comes from the data collection itself. This project examines how to identify and minimize uncertainty, an idea paramount to data science in the health field, through the lens of menstrual cycle predictions.

3:00-3:20 | Molecular Evolution of Acetylcholine Pathway and Novel Tools for Molecular Analysis of Diseases

Presenter: Dr. Chandrakanth Emani, Department of Biology

Abstract: The presence of non-neuronal acetylcholine in plants and animals is implicated in the regulation of cell differentiation, phytochrome-mediated processes, cytoskeletal organization, immune function, and ion transport. Non-neuronal acetylcholine is clinically significant because of its role in the pathogenesis of diseases such as acute and chronic inflammation, local and systemic infection, dementia, atherosclerosis, and cancer. Tracing the molecular evolution of the acetylcholine pathway, followed by bioinformatics analysis, may reveal its transition from a non-neuronal role to a neuronal role. My talk will focus on how bioinformatics analysis of specific acetylcholine-related enzymes provides valuable model systems for exploring the molecular basis of diseases.

3:23-3:28 | Using Differential Equations to Predict Changes in Animal Populations

Presenters: Addison Hoskins, Gatton Academy; Tasha Otieno, Gatton Academy

Abstract: Using synthetic data generated from user input, our project analyzes population growth trends and predicts future population changes with differential equations such as the Verhulst model. Our program will prompt the user to select a biome and an animal and to input values for the current and past populations of the species. Using the logistic growth model, the program will predict future population changes at small time increments and fit a curve to the results to produce a function matching the predicted population sizes. This information will then be displayed onscreen, along with a brief overview of the ideal environmental conditions.
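The Verhulst (logistic) model is dP/dt = rP(1 − P/K), where r is the intrinsic growth rate and K the carrying capacity; it has the closed-form solution P(t) = K / (1 + ((K − P₀)/P₀) e^(−rt)). The sketch below steps the equation forward at small time increments using Euler's method (an assumption; the abstract does not name a numerical scheme) and compares the result with the exact solution. The parameter values are hypothetical, and the curve-fitting step is not shown.

```python
import math

def logistic_exact(p0: float, r: float, k: float, t: float) -> float:
    """Closed-form solution of dP/dt = r P (1 - P/K) with P(0) = p0."""
    return k / (1 + (k - p0) / p0 * math.exp(-r * t))

def logistic_euler(p0: float, r: float, k: float, t_end: float, dt: float) -> list[float]:
    """Step the Verhulst equation forward with Euler's method at increments of dt."""
    populations = [p0]
    p = p0
    for _ in range(round(t_end / dt)):
        p += r * p * (1 - p / k) * dt  # dP = r P (1 - P/K) dt
        populations.append(p)
    return populations

# Hypothetical parameters: 50 animals, r = 0.3 per year, carrying capacity of 1,000.
p0, r, k = 50.0, 0.3, 1000.0
trajectory = logistic_euler(p0, r, k, t_end=20.0, dt=0.01)
print(f"Euler: {trajectory[-1]:.1f}, exact: {logistic_exact(p0, r, k, 20.0):.1f}")
```

Shrinking `dt` brings the Euler values closer to the exact curve at the cost of more steps.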

3:30-3:50 | Improving Automated Scoring of Student Open Responses in Mathematics

Presenter: Dr. John Erickson, Department of Information Systems

Abstract: Open-ended questions in mathematics are commonly used by teachers to monitor and assess students’ deeper conceptual understanding of content. Student answers to these types of questions often combine language, drawn diagrams and tables, and mathematical formulas and expressions, giving teachers insight into the processes and strategies students adopt in formulating their responses. While these responses help inform teachers about their students’ progress and understanding, the amount of variation among them can make it difficult and time-consuming for teachers to manually read, assess, and provide feedback on student work. For this reason, there has been a growing body of research on AI-powered tools that support teachers in this task. This work builds on that prior research by introducing a model designed to help automate the assessment of student responses to open-ended questions in mathematics through sentence-level semantic representations. We find that this model outperforms previously published benchmarks across three different metrics. Using this model, we also conduct an error analysis to identify characteristics of student responses that could be addressed to further improve the method.
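The abstract does not detail how the model turns sentence-level semantic representations into scores. One simple way such representations can be used, shown here only as a hypothetical sketch and not as the presented model, is nearest-neighbour scoring: embed a new response, find the most similar previously graded response by cosine similarity, and reuse its teacher-assigned score. The three-dimensional embeddings and scores below are invented; in practice they would come from a sentence-embedding model.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def predict_score(new_embedding: list[float], graded: list[tuple[list[float], int]]) -> int:
    """Return the teacher's score of the most similar previously graded response.

    `graded` holds (embedding, score) pairs; the embeddings are assumed to come
    from some sentence-embedding model that is not shown here.
    """
    _, best_score = max(graded, key=lambda pair: cosine(new_embedding, pair[0]))
    return best_score

# Hypothetical embeddings and scores; real sentence embeddings have hundreds of dimensions.
graded = [([0.9, 0.1, 0.0], 4), ([0.1, 0.8, 0.2], 2), ([0.0, 0.2, 0.9], 0)]
print(predict_score([0.8, 0.2, 0.1], graded))  # closest to the first graded response
```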

3:53-3:58 | Predicting Cheapest Flight Destination Through Predictive Modeling

Presenters: Holly McClure, Gatton Academy; Sophie Wielawski, Gatton Academy

Abstract: Our project will use existing government data on U.S. domestic flights, together with MATLAB's predictive analysis features, to estimate the cheapest destination options for each user. The user will be prompted to enter their current state, desired vacation region (urban, coastal, mountainous, etc.), and the season in which they plan to vacation. The code will compare the user inputs to the predicted data and display information on the most affordable travel destinations.

4:00-4:20 | Data Exploration and Feature Extraction from Existing AIA and GOES Data on Solar Flares

Presenters: Jake Boils, Department of Physics; Lars Hebenstiel, Department of Physics; Dr. Gordon Emslie, Department of Physics; Dr. Ivan Novikov, Department of Physics

Abstract: Solar flares are extremely intense releases of mass and energy that can hurl charged particles toward Earth and damage electronics, so it is important to be able to predict them. In March 2024, NASA plans to launch two instruments to study solar flares. Using various solar data sources totaling on the order of 50 petabytes, we plan to construct a convolutional neural network (CNN) to predict solar flares for this NASA mission. So far, we have studied our data sets, constructed a small four-dimensional datacube in (t, λ, x, y), and trained a small CNN to classify flares.

Sponsored by:

Gatton Academy
