Mushtari Sadia


About me

I'm Sadia (she/her), a second-year PhD student in the Department of Computer Science and Engineering (CSE) at the University of Michigan, Ann Arbor. I work as a Research Assistant at the Michigan Systems Laboratory and am co-advised by Professor Ang Chen and Professor Amrita Roy Chowdhury.

My research reimagines core systems problems using computer vision, natural language processing, and multimodal models, with the goal of building systems that are more secure, more capable, or both. I am especially motivated by problems at the intersection of (1) systems and security and (2) AI.

I received my B.Sc. from the Department of CSE at BUET. Before starting my PhD, I worked as a Lecturer in the Department of CSE at BRAC University and as an Adjunct Lecturer in the Department of CSE at BUET.

In my free time, I love travelling, dancing and binge-watching sitcoms.

My Resume

Work Experience

  1. Graduate Student Research Assistant

    Michigan Systems Laboratory

    Advisors: Ang Chen, Amrita Roy Chowdhury.

    Department of Computer Science and Engineering,

    University of Michigan, Ann Arbor

    August 2024 — Present
  2. Lecturer

    Department of Computer Science and Engineering,

    BRAC University

    June 2023 — July 2024

    Course Instructor:

    (Summer 2023, Fall 2023, Spring 2024) CSE 220: Data Structures

    (Summer 2023, Fall 2023, Spring 2024, Summer 2024) CSE 221: Algorithms

    (Summer 2023, Fall 2023, Summer 2024) CSE 220: Data Structures Sessional

    (Summer 2023, Spring 2024, Summer 2024) CSE 110: Programming Language I Sessional

  3. Adjunct Lecturer

    Department of Computer Science and Engineering,

    Bangladesh University of Engineering and Technology

    November 2023 — March 2024

    Course Instructor:

    CSE 391: Embedded Systems and Interfacing

    CSE 392: Embedded Systems and Interfacing Sessional

    CSE 102: Structured Programming Language Sessional

    CSE 412: Simulation and Modeling Sessional

Research Experience

(* denotes equal contribution)

  1. SQUiD: Synthesizing Relational Databases from Unstructured Text (2025)

    Mushtari Sadia, Zhenning Yang, Yunming Xiao, Ang Chen, Amrita Roy Chowdhury

    Accepted to EMNLP 2025.

    Over 80% of real-world information exists as unstructured text. Although this vast resource holds immense potential value, traditional relational database systems cannot analyze such data directly. We introduce SQUiD (SQL on Unstructured Data), a neurosymbolic framework that automatically synthesizes SQLite databases from raw text on the user's system, generating both the schema and the tuples. Unlike direct prompting, which often produces hallucinated values and invalid SQL, SQUiD decomposes the task into four precise stages: schema generation, value identification, table population, and database materialization. By reliably converting free-form text into SQL databases, SQUiD makes unstructured data as analyzable as any relational database.


    Read the preprint: [arXiv]
  2. Multi-modal Swarm Intelligence for Secure UAV Missions (2024)

    Yunming Xiao, Mushtari Sadia, Ang Chen

    Genzero Workshop Poster (2024)

    Unmanned Aerial Vehicles (UAVs) are widely used across many tasks, but developing swarm intelligence for intrusion detection is far from easy. In this project, we propose leveraging recent advances in large multimodal models (LMMs), which can fuse multiple data sources, to secure UAV missions. We plan to fine-tune LMMs of varying sizes and deploy them on edge/fog UAVs, combining sensory inputs (e.g., camera, IMU) with internal operational data (e.g., syscall logs). This would enable a real-time detection and response system that thwarts threats through swarm-wide coordination.

  3. LogShield: A Transformer-based APT Detection System Leveraging Self-attention (2022)

    Sihat Afnan*, Mushtari Sadia*, Shahrear Iqbal, Anindya Iqbal

    Undergraduate thesis project under Dr. Anindya Iqbal and Dr. Shahrear Iqbal (Research Officer, National Research Council (NRC) Canada). In this work, I co-implemented a framework that builds a provenance graph from raw log data in the DARPA OPTC and DARPA TC E3 datasets, generates event sequences from that graph, and transforms them into a format suitable for transformer-based language models such as BERT, RoBERTa, and GPT. My personal contributions were building the PostgreSQL database from the raw log datasets, designing the SQL queries that generate the provenance graph, co-writing the preprocessing code that extracts traces with relevant attributes from the graph, and running the experiments with various pre-trained language models. Our framework achieved state-of-the-art performance in detecting advanced persistent threat (APT) attacks.


    Read the preprint: [arXiv]
  4. Advancing Parallelization in Deep Learning Training: A Novel HSIC-Based Approach and Its Comparative Performance Against Traditional Federated Models (2023)

    Mushtari Sadia, Praneeth Vepakomma

    Worked with Dr. Praneeth Vepakomma (MIT Camera Culture Group) on a distributed machine learning project as part of the Fatima Fellowship research program. We developed a method for parallelizing forward propagation in neural network training that uses the HSIC (Hilbert-Schmidt Independence Criterion) objective to eliminate the need for backpropagation while maintaining the same level of accuracy. We also found that, in some cases, using slightly stale local updates can significantly reduce communication costs without compromising accuracy.

Education

  1. PhD in Computer Science and Engineering

    University of Michigan, Ann Arbor

    Advisors: Ang Chen, Amrita Roy Chowdhury.

    August 2024 - Present
  2. B.Sc. in Computer Science and Engineering

    Bangladesh University of Engineering and Technology

    April 2018 - May 2023

    CGPA: 3.84/4.00

Technical Skills

  1. Programming Languages

    C/C++, x86 Assembly, Bison/Flex, Python, Java, JavaScript, Bash, MySQL

  2. Frameworks & Tools

    Docker, PyTorch, ns-3, xv6, Django REST, ReactJS, MP-SPDZ, Git, Oracle DBMS, LaTeX

  3. Libraries

    scikit-learn, Pandas, Matplotlib, Seaborn


Achievements

Competitions

  1. Microsoft Virtual Hackathon

    1st Runner-Up, 2022 (among 700 teams worldwide)

    We built an AI-based dengue forecasting system on Microsoft Azure services that predicts the number of dengue cases in a given region from recent case counts in that region and the state of various weather parameters.

    Code: https://github.com/Mushtari-Sadia/Predictado-A-Dengue-Forecasting-Dashboard

    Image: The dashboard, built with Microsoft Power BI [competition link]
  2. HerWILL Datathon (2022)

    Champion (among 110 female contestants worldwide)

    A machine-learning-based system for forecasting taxi demand in a city.

    Code: https://github.com/ramisa2108/Taxi-Demand-Forecasting-System

    Image: What an honor and a privilege it is to have received this certificate from two of the most prominent scientists and educators of our lifetime, Dr. Pascal Van Hentenryck (https://en.wikipedia.org/wiki/Pascal_Van_Hentenryck) and Dr. M. Zafar Iqbal (https://en.wikipedia.org/wiki/Muhammed_Zafar_Iqbal). [article link]
  3. UNDP Women’s Digital Innovation Hackathon (2021)

    2nd Runner-Up (among 30 teams)

    Built an AI-based Dengue Monitor & Control System.

    Image: The jury listens to the final pitch about improving dengue fever control [news article link]
  4. Dhaka-AI 2020

    Participant

    A month-long competition on vehicle detection and classification from traffic images, using object detection models such as YOLOv5 and EfficientDet.

    Code: https://github.com/Mushtari-Sadia/Vehicle-Detection-with-State-of-the-Art-Deep-Learning-Models

  5. Ada Lovelace Datathon 2021

    Participant

    A competition on ML-based analysis of COVID-19 mental health data.

    Code: https://www.kaggle.com/code/mushtarisadia/team5-thepowerpuffcoders-ensemble-rf-log-reg/notebook

  6. NLP Hackathon 2023

    Participant

    A competition on named entity recognition (NER) for a Bangla-language dataset.

    Code: https://www.kaggle.com/code/mushtarisadia/nlp-hackathon-2023/notebook

  7. Robi Datathon 2.0

    Participant

    A competition on ML-based analysis of data collected by Robi, a Bangladeshi mobile network operator.

    Code: https://www.kaggle.com/code/mushtarisadia/robi-datahon-2-0-final-e646e6/notebook

  8. Kaggle Days competition

    Participant

    The goal of this competition was to use ML algorithms to predict the unit load power generation of a steam turbine from given operating factors in specific working environments.

    Code: https://www.kaggle.com/code/mushtarisadia/fastai-power-lgb/notebook

Honors & Awards

  1. Fatima Fellowship

    August 2023-Present

    [Visit Their Website]

    The Fatima Al-Fihri Predoctoral Fellowship is a free, 9-month program in which students from around the world who plan to apply to computer science or machine learning PhD programs in the United States or Europe work with current PhD students or researchers on research projects, gaining research experience and strengthening their applications.

  2. GHC Scholarship

    2022

    A merit-based scholarship awarded to a small number of selected candidates. Through it, I attended the Grace Hopper Celebration of Women in Computing, a conference that brings together women in tech from around the world.

  3. Dean’s List Scholarship

    2018-2022

    Awarded to undergraduate students for academic excellence.

  4. Dhaka Board General Scholarship (HSC)

    2017

Leadership Experience

  1. President

    Bangladeshi Women In Computer Science & Engineering (BWCSE)

    April 2022 - May 2023

    [Visit the Website]

    The mission of BWCSE is to empower Bangladeshi women by fostering academic, social, and professional growth in computer science and engineering. As president, I coordinated and organized competitive programming contests, as well as workshops and seminars across various areas of CS.

  2. Batch Representative

    Bangladeshi Women In Computer Science & Engineering (BWCSE)

    April 2021 - April 2022
  3. Organizer

    BUET CSE FEST 2022

    June 2022 - August 2022

    [Visit our Facebook Page]

    Coordinated several inter-university competitions, including a hackathon, a programming contest, a deep learning competition, and an AI contest, as well as cultural programs on behalf of the graduating class.

  4. Student Tutor

    December 2017 - April 2023

    Tutored many students, ranging from kindergarten through primary, middle, and high school to the undergraduate level, during my undergraduate years.