Projects

Transcriptional condensates at super-enhancers mediate pH-dependent transcriptional regulation in innate immunity and human cancer

In close coordination with experimental collaborators, this project combined genome-wide region-centric analysis, differential signal modeling, and interpretation of pH-sensitive regulatory behavior, using multi-omics integration to identify potential biomarkers associated with BRD4-mediated transcriptional condensates.

By integrating epigenomic and transcriptomic datasets across multiple conditions, including BRD4 and H3K27ac ChIP-seq, ATAC-seq, and RNA-seq, I identified RELA, IRF family, and STAT family transcription factors as candidate regulators associated with BRD4-mediated, pH-sensitive transcriptional condensates in mouse macrophages. Experimental validation further confirmed that these transcription factors showed binding patterns aligned with BRD4 across pH conditions.
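Region-centric integration of this kind usually starts by asking which peaks from one assay overlap peaks from another, for example which BRD4 peaks fall within H3K27ac-marked regions. The following is a minimal sketch, assuming BED-style half-open intervals held in memory as (chromosome, start, end) tuples; the function `overlapping_peaks` and the toy coordinates are hypothetical and not taken from the project's code. At genome scale a dedicated tool such as bedtools would normally do this.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def overlapping_peaks(query, reference):
    """Return query intervals overlapping at least one reference interval.

    Intervals are (chrom, start, end) tuples, half-open like BED files,
    so two intervals overlap when ref_start < end and ref_end > start.
    (Hypothetical helper for illustration.)
    """
    grouped = defaultdict(list)
    for chrom, start, end in reference:
        grouped[chrom].append((start, end))

    # Per chromosome: sorted starts plus the running maximum end, so one
    # binary search answers "does any reference starting before `end`
    # extend past `start`?"
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)

    hits = []
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        n_before = bisect_left(starts, end)  # references with start < end
        if n_before and max_ends[n_before - 1] > start:
            hits.append((chrom, start, end))
    return hits


# Hypothetical toy peaks, not project data.
brd4_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
h3k27ac_peaks = [("chr1", 150, 300), ("chr2", 80, 90)]
print(overlapping_peaks(brd4_peaks, h3k27ac_peaks))  # [('chr1', 100, 200)]
```

The chr2 pair only touches at position 80, so under half-open coordinates it is correctly not reported as an overlap.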

Using statistical modeling and survival analysis, I also identified candidate transcriptional regulators associated with patient survival in colon cancer. Putative transcriptional condensate activity in colon tumor cells is more likely to correlate with poor prognosis, whereas T cell-associated transcriptional condensate activity is more likely to be linked to better survival.
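A common way to relate a regulator's activity to survival is to split patients into high- and low-activity groups and compare them with a log-rank test. The sketch below implements a two-group log-rank test in plain Python under that assumption; it is illustrative only, not the specific statistical model used in this project, and the function name and toy cohorts are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (hypothetical helper for illustration).

    times: follow-up times; events: 1 if death observed, 0 if censored.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    patients = [(t, e, 0) for t, e in zip(times_a, events_a)]
    patients += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in patients if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(ti, ei, g) for ti, ei, g in patients if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d_a = sum(1 for ti, ei, g in at_risk if ti == t and ei and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # deaths expected in group A under H0
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    stat = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical cohorts: "high" activity patients die earlier than "low".
high = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
low = ([10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
stat, p = logrank_test(*high, *low)
print(f"chi2={stat:.2f}, p={p:.4f}")
```

The loop rescans the cohort at every event time, which keeps the logic readable but is quadratic; a dedicated survival-analysis library would be the practical choice for real patient cohorts.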

Source: bioRxiv | GitHub | Poster

CTCFexplorer: a systematic framework and comprehensive data resource for CTCF binding in the human and mouse genomes

I built CTCFexplorer (www.ctcf.info), an SQL-backed, user-friendly data resource and web platform that integrates more than 4,000 public CTCF ChIP-seq datasets from human and mouse into a searchable database. Using a robust processing pipeline that includes metadata curation, quality control, and statistical analysis, the platform identifies constitutive and cell type-specific CTCF binding events at genome scale. It is designed to turn large public datasets into practical and accessible resources for data mining, querying, and downstream biological interpretation of CTCF binding.
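The constitutive versus cell type-specific distinction can be expressed as a query over a table of binding events. This sketch uses Python's built-in sqlite3 module with a hypothetical two-table schema (`datasets`, `binding`) and a hypothetical 90% threshold for calling a site constitutive; it is not CTCFexplorer's actual schema or calling rule.

```python
import sqlite3


def classify_sites(conn, species, constitutive_fraction=0.9):
    """Label each site by how many cell types of `species` show binding.

    Hypothetical rule: bound in >= constitutive_fraction of cell types ->
    "constitutive"; bound in exactly one -> "cell_type_specific";
    otherwise "shared".
    """
    total = conn.execute(
        "SELECT COUNT(DISTINCT cell_type) FROM datasets WHERE species = ?",
        (species,),
    ).fetchone()[0]
    rows = conn.execute(
        """
        SELECT b.site_id, COUNT(DISTINCT d.cell_type)
        FROM binding AS b
        JOIN datasets AS d ON b.dataset_id = d.dataset_id
        WHERE d.species = ?
        GROUP BY b.site_id
        """,
        (species,),
    ).fetchall()
    labels = {}
    for site_id, n_cell_types in rows:
        if n_cell_types >= constitutive_fraction * total:
            labels[site_id] = "constitutive"
        elif n_cell_types == 1:
            labels[site_id] = "cell_type_specific"
        else:
            labels[site_id] = "shared"
    return labels


# Hypothetical schema and toy data, not CTCFexplorer's.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (dataset_id TEXT PRIMARY KEY, species TEXT, cell_type TEXT);
    CREATE TABLE binding (site_id TEXT, dataset_id TEXT REFERENCES datasets(dataset_id));
    """
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [("ds1", "human", "GM12878"), ("ds2", "human", "K562"), ("ds3", "human", "HepG2")],
)
conn.executemany(
    "INSERT INTO binding VALUES (?, ?)",
    [("site_A", "ds1"), ("site_A", "ds2"), ("site_A", "ds3"),
     ("site_B", "ds2"), ("site_C", "ds1"), ("site_C", "ds3")],
)
print(classify_sites(conn, "human"))
```

Counting distinct cell types rather than datasets keeps heavily profiled cell lines from inflating a site's apparent ubiquity.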

Source: ctcf.info | Cancer Research | GitHub | Poster

Genomic clustering tendency of transcription factors reflects phase-separated transcriptional condensates at super-enhancers

I developed a statistical framework to quantify genome-wide clustering patterns of transcription factor motifs and binding sites, and linked these clustering patterns to transcriptional condensates at super-enhancers. Genomic regions with densely clustered transcription factor binding sites are more enriched at cell type-specific super-enhancers and exhibit higher chromatin accessibility, stronger chromatin interactions, and a stronger association with cancer outcomes.
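One generic way to quantify clustering tendency is to count, for each binding site, how many other sites lie within a fixed window, then compare that average with random placement of the same number of sites. The permutation sketch below is a stand-in under simple assumptions (a uniform null over the region and a hypothetical 50 bp window in the example); it is not the published framework, and a realistic null would likely also need to account for factors such as mappability and sequence composition. Function names are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right


def mean_neighbors(positions, window):
    """Average number of other sites within `window` bp of each site."""
    pos = sorted(positions)
    total = 0
    for p in pos:
        # Sites in [p - window, p + window], minus the site itself.
        total += bisect_right(pos, p + window) - bisect_left(pos, p - window) - 1
    return total / len(pos)


def clustering_pvalue(positions, region_length, window, n_perm=999, seed=0):
    """Permutation test against uniform random placement (assumed null)."""
    rng = random.Random(seed)
    observed = mean_neighbors(positions, window)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(region_length) for _ in positions]
        if mean_neighbors(shuffled, window) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: 20 sites packed into 200 bp vs. 20 sites spread
# every 500 bp, both within a 10 kb region and a 50 bp window.
clustered = list(range(0, 200, 10))
spread = list(range(0, 10_000, 500))
obs_c, p_c = clustering_pvalue(clustered, region_length=10_000, window=50)
obs_s, p_s = clustering_pvalue(spread, region_length=10_000, window=50)
print(obs_c, p_c, obs_s, p_s)
```

The "+1" in the p-value counts the observed arrangement as one permutation, so the estimate never reaches exactly zero.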

Source: Nucleic Acids Research | GitHub

Topology-based machine learning analysis of chromatin structure for cell-type classification using multiplexed chromatin tracing data

I developed machine learning models to classify cell types from multiplexed chromatin tracing data, with a focus on feature design, model development, evaluation, and biological interpretation. I applied a topology-based representation of chromatin structure using Delaunay tessellation and constructed features from thousands of spatial topological units. Using these features, I trained and evaluated random forest classifiers on mouse cortex chromatin tracing data. The models achieved an overall accuracy of ~0.75 for distinguishing major cell types, compared with ~0.5 accuracy from random controls. Pairwise classification performance further revealed clustering patterns consistent with known major cortical cell classes.
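One way to turn a Delaunay tessellation into classifier features is to treat each tetrahedron (a set of four traced loci) as a topological unit and record its presence or absence in each cell. The sketch below assumes exactly that indicator encoding, which may differ from the project's feature design. The per-cell simplices would come from something like `scipy.spatial.Delaunay(coords).simplices`, and the resulting rows could be passed to a random forest such as scikit-learn's `RandomForestClassifier`. Function and variable names are hypothetical.

```python
def tetrahedron_features(cells_simplices):
    """Binary presence/absence features from per-cell Delaunay tetrahedra.

    cells_simplices: one entry per cell, each a list of 4-locus tetrahedra
    (e.g. rows of scipy.spatial.Delaunay(coords).simplices for that cell's
    3D trace). Returns (vocabulary of units, cells x units 0/1 matrix).
    Hypothetical encoding for illustration.
    """
    # Canonicalize vertex order so (3, 2, 1, 0) and (0, 1, 2, 3) are one unit.
    per_cell = [{tuple(sorted(int(v) for v in t)) for t in simplices}
                for simplices in cells_simplices]
    vocabulary = sorted(set().union(*per_cell))
    column = {unit: j for j, unit in enumerate(vocabulary)}
    matrix = []
    for units in per_cell:
        row = [0] * len(vocabulary)
        for unit in units:
            row[column[unit]] = 1
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical tetrahedra for two cells; loci are indexed along the trace.
cells = [
    [(0, 1, 2, 3), (1, 2, 3, 4)],
    [(3, 2, 1, 0), (0, 2, 3, 4)],
]
vocab, X = tetrahedron_features(cells)
print(vocab)
print(X)
```

Indicator features keep each column tied to a specific set of loci, which makes it easier to inspect which topological units a trained forest relies on.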

In this project, I designed and implemented the full analysis pipeline, including data preprocessing and curation, coordinate imputation, feature engineering, machine learning model training, performance evaluation, and interpretation of results. This work demonstrates the feasibility of applying machine learning and topology-based feature design to extract biologically meaningful patterns from spatial chromatin organization data.

Source: GitHub