BME2133: Medical Data Privacy and Ethics in the Age of Artificial Intelligence (Projects)

Syllabus

Who / When / Where
Instructor: Zhiyu Wan
Teaching Assistants: Sihan Xie, Hongzhu Jiang
Semester: Fall 2025
Time: Wednesdays & Fridays (Odd Week), 15:00-15:45, 15:55-16:40
Location: School of Life Science and Technology, Room A103
Office Hours: Upon request, Location: BME Building, Room 228

(Note: This page is tentative and subject to change.)

Instructions for Final Projects

Your project should be a group study (2-4 people) or an independent study (1 person) on a data privacy or ethics issue related to biology, medicine, or health more generally. Topics related to your own research area are preferred.

You may design your own project or choose from a predefined set of topics (these will be available on the course website later in the semester).

Do not be afraid to discuss your project ideas with the instructor!

Criteria	Due Date	% of the Project’s Grade
Project Description: A one-pager that describes the project area and how you intend to address the research within the confines of this semester.	Dec 10	5%
Mid-term Project Proposal Presentation: Briefing for the class on project area and first phase of research. (No more than 5 minutes)	Dec 17	10%
Written Project Proposal Report: A summary of the progress you have made (No more than 4 pages).	Dec 19	10%
Final Project Presentation: Showcase of research methods and results. (No more than 10 minutes)	Dec 26	40%
Final Project Report: This will be in the form of a conference-style paper. It will summarize the research area, your methodology, experience, and contributions of your work.	Jan 9	35%

Project Proposals

Project Proposals are due on Dec 19.
The project description should be no more than 4 pages.
Sections to include:
- Introduction
- Pilot Project
- Extended Project
- Timeline

Part 1: Introduction

A brief description of the problem area that you will investigate and your goals.
A brief description of the problem’s significance.

Part 2: Pilot

Describe a small pilot project that you will complete before Dec 17 (this is when you will make your first oral presentation to the class on the status of your project).
Experiment(s) to demonstrate the feasibility of your investigation.

Part 3: Extended Project

Provide a roadmap for the experiments and/or system development that you will undertake to achieve your project goals.
Describe the potential risks that could prevent you from achieving your goals and how you intend to resolve the risks.

Part 4: Timeline

Outline the milestones of your project and when you expect to achieve your milestones.
This is not a contract! It’s an exercise to get you thinking about how your project should be structured.
Your timeline can be a description in the text, table(s), or figure(s).

Final Project

Your final project will be in the form of a research paper:

Introduction
Background
Related Research
Methods
Experiments
Results
Discussion
Conclusions
References

Past Projects

1. Sanitizing Clinical Data Using Big Data Frameworks | Hongzhu Jiang

2. Evaluating Age and Gender Biases in Acoustic Diagnostic Models for Dysphonia Detection | Sihan Xie

Project Description
Proposal Presentation
Proposal Report
Final Presentation
Final Report

3. Trustworthy LLM in Clinical Notes: A Dual Evaluation of Utility and Security | Jiayue Hou

Project Description
Proposal Presentation
Proposal Report
Final Presentation
Final Report

Sample Project Topics

Fairness Analyses / Improvements of Medical AI Systems
Explainability Analyses / Improvements of Medical AI Systems
Evaluation and Design of Privacy Technologies for Personal Health Records
Finding & Relating Publicly Available Repositories of Person-Specific Biomedical Information
Anonymization of Clinical Profiles / Sets of Diagnoses / Medical Imaging Data
Applications of AI tools for Biomedical Data Protection
Building and Evaluating Clinical Text De-identification Tools
Applications of Security Frameworks (e.g., Blockchain) for Medical Data Privacy
Applications of Game-Theoretic Models for Optimizing Biomedical Data Protection
Safety and Privacy Issues of Large Language Models and Generative AI tools
Evaluating and Mitigating Privacy Risks of Multimodal Large Language Models
Others

Sample Project 1 (Topic 1 – Fairness)

•Fairness analysis of an AI algorithm (machine learning, deep learning, or large language model) for disease diagnosis, treatment recommendation, etc.

•Find a dataset with gender, age, or racial information

•Use or design an AI algorithm to perform a health prediction task (e.g., diagnosis or treatment recommendation)

•Use statistical analysis to detect performance differences of the AI algorithm across demographic groups

•Try to mitigate the corresponding bias
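
The statistical step above can be sketched as follows. This is a minimal illustration, not a prescribed method: the data are synthetic, the function names (`group_accuracy`, `two_proportion_z_test`) are hypothetical, and accuracy is only one of several metrics you might compare (true-positive rate or calibration may be more appropriate for your task). It uses a pooled two-proportion z-test, which assumes reasonably large groups.

```python
import math

def group_accuracy(y_true, y_pred, groups):
    """Return {group: (n_correct, n_total)} for each demographic group."""
    stats = {}
    for t, p, g in zip(y_true, y_pred, groups):
        correct, total = stats.get(g, (0, 0))
        stats[g] = (correct + int(t == p), total + 1)
    return stats

def two_proportion_z_test(c1, n1, c2, n2):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Synthetic example: 100 patients per group, all truly positive.
y_true = [1] * 200
y_pred = [1] * 90 + [0] * 10 + [1] * 70 + [0] * 30
groups = ["A"] * 100 + ["B"] * 100

stats = group_accuracy(y_true, y_pred, groups)
(c_a, n_a), (c_b, n_b) = stats["A"], stats["B"]
z, p_value = two_proportion_z_test(c_a, n_a, c_b, n_b)
print(stats, round(z, 3), p_value)
```

A small p-value here suggests the accuracy gap between the two groups is unlikely to be due to chance alone; it says nothing about the cause of the gap or how to mitigate it.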

Sample Project 2 (Topic 2 – Explainability)

•Find an AI algorithm (machine learning, deep learning, or large language model) for disease diagnosis, treatment recommendation, etc.

•Find a dataset with a set of features

•Apply one of the explainability models (e.g., SHAP, LIME, Grad-CAM)
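
To make the idea behind SHAP concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. It is not the `shap` library's API; the function `shapley_values`, the toy `risk_score` model, and the all-zero baseline are assumptions made for illustration. Exact enumeration is exponential in the number of features, so it is only practical for very small models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy risk score with one interaction term (features 0 and 2).
def risk_score(z):
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk_score, x, baseline)
print(phi, sum(phi), risk_score(x) - risk_score(baseline))
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split evenly between the two features involved.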

Sample Project 3 (Topic 5 – Anonymization)

•Re-identifiability of discharge databases through demographics

•HCUP discharge databases that we discussed in an earlier lecture

•How many people are vulnerable to re-identification within a group of size k?

•How can we prevent such a re-identification?

•Experimental analysis

•Do NOT actually conduct the re-identifications
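
The group-size question above can be sketched on synthetic records as follows. No real discharge data and no re-identification are involved; the field names, the `generalize` rule, and the helper `records_in_small_groups` are hypothetical choices for illustration. The sketch counts how many records share their demographic combination with at most k records in total, before and after a simple generalization.

```python
from collections import Counter

def records_in_small_groups(records, quasi_identifiers, k):
    """Count records whose quasi-identifier combination occurs at most k times."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    return sum(1 for key in keys if sizes[key] <= k)

def generalize(record):
    """Coarsen age to a decade and ZIP code to its first three digits."""
    return {
        "age": record["age"] // 10 * 10,
        "sex": record["sex"],
        "zip": record["zip"][:3],
    }

# Entirely synthetic records.
records = [
    {"age": 34, "sex": "F", "zip": "20012"},
    {"age": 34, "sex": "F", "zip": "20012"},
    {"age": 35, "sex": "F", "zip": "20013"},
    {"age": 61, "sex": "M", "zip": "20012"},
    {"age": 67, "sex": "M", "zip": "20019"},
    {"age": 62, "sex": "M", "zip": "20015"},
]
qi = ["age", "sex", "zip"]
generalized = [generalize(r) for r in records]

for k in (1, 2):
    print(k,
          records_in_small_groups(records, qi, k),
          records_in_small_groups(generalized, qi, k))
```

Generalization trades detail for larger groups; part of the experimental analysis is measuring how much analytic utility is lost for a given reduction in small groups.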

Sample Project 4 (Topic 4 – Resource for Attacker)

•How can we use publicly available resources to construct profiles on individuals?

•For instance, could you use information from Intelius and birthday databases to append birthdays to familial records?

•How can you integrate obituary records with external resources?

•Think about simple record linkage likelihoods.

•How can we design an automated mechanism to define accessibility, population coverage, and cost for available resources?
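
One common way to reason about record linkage likelihoods is a Fellegi-Sunter style match weight: each field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability that the field agrees for a true match and u the probability that it agrees for a non-match. The sketch below is illustrative only; the field names and the m and u values are made-up assumptions, not estimates from any real resource.

```python
import math

def match_weight(agreements, m, u):
    """Sum of per-field log-likelihood-ratio weights for one record pair."""
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total

# Hypothetical probabilities, chosen only for illustration.
m = {"surname": 0.95, "birth_year": 0.90, "city": 0.80}
u = {"surname": 0.01, "birth_year": 0.02, "city": 0.10}

w_all = match_weight({"surname": True, "birth_year": True, "city": True}, m, u)
w_none = match_weight({"surname": False, "birth_year": False, "city": False}, m, u)
print(round(w_all, 3), round(w_none, 3))
```

Rare values (small u) carry more weight when they agree, which is why population coverage of a resource matters when estimating how reliably records can be linked.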

Sample Project 5 (Topic 4 – Resource for Attacker)

•An investigation of medical information that can be found online in open or “semi-open” forums.

•Data gathering that can be done without the approval of an IRB

•How can we find webpages that include personal identifiers as well as medical information / status?

•Web crawling + standardized medical vocabularies (UMLS) + dictionaries of demographic terms

•http://news.bbc.co.uk/2/hi/technology/8521598.stm

•http://pleaserobme.com/

Sample Project 6 (Topic 3 – Protection)

•Apply differential privacy or synthetic data generation approaches (e.g., generative adversarial network, diffusion model, generative AI tool) to a health-related AI algorithm or model.

•Find a health-related dataset (e.g. medical imaging data, voice data, text, tabular data, genomic data, or multimodal data)

•Find a medical AI task (e.g., diagnosis or drug recommendation)

•Apply the privacy protection approach to the data and compare the performance of the downstream analysis task (i.e., the medical AI task) before and after applying it
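
As a minimal sketch of the before/after comparison, the example below applies the Laplace mechanism to a simple counting query and measures the error introduced at several privacy budgets. The synthetic ages, the helper names, and the choice of a count as the "downstream task" are assumptions for illustration; a real project would compare a model's performance rather than a single count.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(values, predicate, epsilon, rng):
    """Epsilon-differentially-private count (a counting query has sensitivity 1)."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)

def mean_abs_error(values, predicate, epsilon, trials=20000, seed=0):
    """Average absolute gap between the noisy and the exact count."""
    rng = random.Random(seed)
    true_count = sum(1 for v in values if predicate(v))
    errors = [abs(dp_count(values, predicate, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials

# Synthetic ages; the query counts patients aged 65 or older.
ages = [23, 35, 41, 52, 58, 64, 67, 71, 79, 83]

def is_senior(age):
    return age >= 65

for eps in (0.1, 1.0, 10.0):
    print(eps, round(mean_abs_error(ages, is_senior, eps), 3))
```

Smaller epsilon gives stronger privacy and larger error; the expected absolute error of the Laplace mechanism equals its scale, 1/epsilon for this query.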

Sample Project 7 (Topic 6 – AI for Privacy)

•Test a set of LLMs to see whether they can anonymize a public health dataset for you

•Consider both proprietary and open-source LLMs

•Quantify and compare their privacy protection capabilities

•Try different data modalities

Sample Project 8 (Topic 7 – Text Scrubbing)

•How can we scrub clinical text with provable guarantees of anonymity?

•What is the accuracy of open-source scrubbing technologies, such as:

•http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1421388

•Imagine combining various “scrubbing” methods with Formal Anonymity Models

•Look at Gardner & Xiong. Data and Knowledge Engineering. 2009.

•What are the features in text that make someone identifiable?

•How many features are necessary to make someone unique?

•Can we generalize clinical concepts to satisfy formal anonymity?

•Dataset available for this project

•i2b2 discharge data (you will need to sign a data use agreement)
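
Scrubbing accuracy is usually reported as precision and recall over annotated identifier spans. The sketch below evaluates a deliberately simple regex-based scrubber on one synthetic note; the patterns, the note, and the gold annotations are all invented for illustration and are far weaker than a real de-identification tool.

```python
import re

# Toy patterns; a real scrubber needs many more rules or a trained model.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN\s*\d+\b"),
}

def find_phi(text):
    """Return the set of (start, end, label) spans matched by the patterns."""
    spans = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), label))
    return spans

def precision_recall(predicted, gold):
    """Exact-span precision and recall."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def span(text, substring, label):
    """Helper to build a gold annotation from the first occurrence of a substring."""
    start = text.index(substring)
    return (start, start + len(substring), label)

# Synthetic note with four annotated identifiers.
note = "Pt John Doe (MRN 12345) seen 03/14/2024; call 555-123-4567."
gold = {
    span(note, "John Doe", "NAME"),
    span(note, "MRN 12345", "MRN"),
    span(note, "03/14/2024", "DATE"),
    span(note, "555-123-4567", "PHONE"),
}

predicted = find_phi(note)
precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

The patterns catch the structured identifiers but miss the patient name, so recall falls short even though precision is perfect. For de-identification, missed identifiers are usually the more serious error.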

Sample Project 9 (Topic 9 – Game-Theoretic Genomic Data Protection)

•How can we share genomic data in aggregated or disaggregated form without revealing identity?

•Builds on protection methods of Wan and others

Sample Project 10 (Topic 10 – Privacy of LLMs & Topic 11 – Privacy of MLLMs)

•Privacy attacks on large language models or generative AI tools

•Showcase privacy attacks (e.g., re-identification, membership inference, attribute inference) targeting LLMs or GAI tools

•Evaluate the corresponding privacy risks

•Propose mitigation strategies
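
One simple way to quantify membership inference risk is a loss-threshold test: examples the model saw during training tend to have lower loss, so an auditor guesses "member" when the loss falls below a threshold and reports the gap between the true-positive and false-positive rates. The sketch below uses made-up loss values and a hypothetical function name; in a real evaluation the losses would come from the model under audit on known training and held-out examples.

```python
def attack_advantage(member_losses, nonmember_losses, threshold):
    """True-positive rate minus false-positive rate of a loss-threshold test."""
    tpr = sum(loss < threshold for loss in member_losses) / len(member_losses)
    fpr = sum(loss < threshold for loss in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr

# Synthetic per-example losses, for illustration only.
member_losses = [0.05, 0.10, 0.20, 0.80]
nonmember_losses = [0.30, 0.90, 1.20, 1.50]

advantage = attack_advantage(member_losses, nonmember_losses, threshold=0.25)
print(advantage)
```

An advantage near 0 means the test cannot distinguish training examples from unseen ones; values near 1 indicate substantial leakage. Re-measuring the advantage after applying a mitigation is one way to compare mitigation strategies.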

Sample Project 11 (Topic 1 – Fairness of LLMs)

•Prompt engineering (in-context learning) and evaluation of a pretrained LLM for healthcare-related tasks

•Bias Detection: Showcase racial or gender bias of the pretrained LLM with real-world data and engineered prompts

•Evaluate racial or gender bias with experiments, quantitative analyses, and statistical evidence

•Mitigate racial or gender bias with prompt engineering, retrieval-augmented generation, or fine-tuning