Mushtari Sadia
About me
I'm Sadia (she/her), a first-year PhD student in the Department of CSE at the University of Michigan, Ann Arbor, and a member of Professor Ang Chen's research group. My research interests lie at the intersection of systems, security, privacy, and machine learning. I previously graduated from the Department of CSE, BUET, and worked as a lecturer in the Department of CSE, BRAC University, and as an adjunct lecturer in the Department of CSE, BUET.
In my free time, I love travelling, dancing and binge-watching sitcoms.
Work Experience
-
Graduate Student Research Assistant
Department of Computer Science and Engineering,
University of Michigan, Ann Arbor
August 2024 - Present
Working with Professor Ang Chen.
-
Lecturer (Full-time)
Department of Computer Science and Engineering,
BRAC University
June 2023 - July 2024
Course Instructor:
(Summer 2023, Fall 2023, Spring 2024) CSE 220: Data Structures
(Summer 2023, Fall 2023, Spring 2024, Summer 2024) CSE 221: Algorithms
(Summer 2023, Fall 2023, Summer 2024) CSE 220: Data Structures Sessional
(Summer 2023, Spring 2024, Summer 2024) CSE 110: Programming Language I Sessional
-
Adjunct Lecturer
Department of Computer Science and Engineering,
Bangladesh University of Engineering and Technology
November 2023 - March 2024
Course Instructor:
CSE391: Embedded Systems and Interfacing
CSE392: Embedded Systems and Interfacing Sessional
CSE102: Structured Programming Language Sessional
CSE412: Simulation and Modeling Sessional
-
Research Fellow
August 2023 - February 2024
Worked with Dr. Praneeth Vepakomma (MIT) on a distributed machine learning project, implementing layer-wise parallelization of the training process for deep learning models.
Research Experience
-
Effectiveness of Transformer-based Language Models in Detecting Advanced Persistent Threats from System Provenance Graphs (2022 - Current)
Computer Security, Natural Language Processing
Undergraduate thesis project under Dr. Anindya Iqbal and Dr. Shahrear Iqbal (Research Officer, National Research Council (NRC) Canada). In this work, I co-implemented a framework that builds a provenance graph from raw log data in the DARPA OPTC and DARPA TC E3 datasets, generates event sequences from that graph, and transforms the data into a format suitable for transformer-based language models such as BERT, RoBERTa, and GPT. My personal contributions were building the PostgreSQL database from the raw log datasets, designing SQL queries to generate the provenance graph, co-writing the code that preprocesses the graph data to extract traces with relevant attributes, and building the experiments with various pre-trained LLMs. Our framework achieved state-of-the-art performance in detecting APT attacks. This work is currently under review for publication.
Read the preprint: [arXiv]
-
Advancing Parallelization in Deep Learning Training: A Novel HSIC-Based Approach and Its Comparative Performance Against Traditional Federated Models (2023 - Current)
Federated Learning, Privacy, Optimization
Working with Dr. Praneeth Vepakomma (MIT Camera Culture Group) as part of the Fatima Fellowship Research Program on a distributed machine learning project. We developed a method for parallelizing forward propagation in neural network training, utilizing the HSIC objective function to eliminate the need for backpropagation while maintaining the same level of accuracy. We also discovered that, in some instances, employing slightly outdated local updates can significantly reduce communication costs without compromising accuracy. We are currently reviewing the manuscript as we prepare it for publication.
-
Development of Flood Forecasting System for Bangladesh-India Using Different Machine Learning Techniques (2020)
Machine Learning
In this study, I preprocessed datasets of weather parameters and employed five machine learning algorithms: exponent back propagation neural network (EBPNN), multilayer perceptron (MLP), support vector regression (SVR), decision tree regression (DTR), and extreme gradient boosting (XGBoost). These were used to develop a total of 180 independent models, each based on a different combination of input time lag and forecast lead time. The models were developed for the Someshwari-Kangsa sub-watershed of Bangladesh’s North Central hydrological region, which has a drainage area of 5772 km².
EGU General Assembly 2021: [Poster Presentation]
-
Dengue Forecasting System (2020-2022)
Machine Learning
Worked with Dr. ABM Alim Al Islam, Ramisa Alam, Mashiat Mustaq and Tahiea Taz. In this project, we built a dengue forecasting system, based on a time series forecasting model, that predicts the number of dengue cases in a given region from the recent cases in that region and the state of different weather parameters. My contributions to the project were preprocessing the datasets and developing the models using MS Azure Services.
Presented the idea and won the 1st Runner Up position at the Microsoft Virtual Hackathon 2022.
Education
-
PhD in Computer Science and Engineering
University of Michigan, Ann Arbor
August 2024 - Present
-
B.Sc. in Computer Science and Engineering
Bangladesh University of Engineering and Technology
April 2018 - May 2023
CGPA: 3.84/4.00
Notable Courses:
CSE471: Machine Learning
CSE405: Computer Security
CSE453: High Performance Database Systems
CSE309: Compiler Design
CSE321: Computer Networks
CSE313: Operating Systems
CSE463: Introduction to Bioinformatics
CSE409: Computer Graphics
CSE411: Simulation and Modeling
MATH247: Linear Algebra
MATH245: Statistics and Probability
CSE305: Computer Architecture
-
Higher Secondary School Certificate (HSC)
Engineering University School and College
2017
GPA: 5.00/5.00
- Board General Scholarship
-
Secondary School Certificate (SSC)
Engineering University School and College
2017
GPA: 5.00/5.00
Technical Skills
-
Programming Languages
C/C++, x86 Assembly, Bison/Flex, Python, Java, JavaScript, Bash, MySQL
-
Frameworks
Docker, PyTorch, NS3, xv6, Django REST, ReactJS, Git, Oracle DBMS, LaTeX, Wireshark
-
Libraries
Sklearn, Pandas, Matplotlib, Seaborn