Navigating the USC BUGS Jr. Summer Research Program 2023: A Handbook for High School Participants

Shaunak Kapur, a rising senior at Seven Lakes High School in Houston, Texas, demonstrated his commitment to research through his summer term in the BUGS research program at the University of Southern California (USC). His foray into immunogenetics research began in the Mangul Lab, which he joined through the BUGS program. Under the mentorship of Yu Ning, a current PhD student at the university, Shaunak immersed himself in the challenging yet intriguing domain of immunogenetics. The experience gave him a platform to apply his academic knowledge and provided the real-world experience that is pivotal in scientific research. During this time, Shaunak was introduced to two significant projects, each with its own challenges and lessons.

The first research project, titled “Assessing the Completeness of Immunogenetic Databases Across European Populations,” gave Shaunak the opportunity to analyze how completely the IMGT database represents the genes observed in published studies. He selected two human TCR-Seq studies from the Sequence Read Archive (SRA) and used the bioinformatics tool MiXCR to evaluate how well the IMGT database captures and represents the V and J genes in those studies. Along the way, Shaunak learned data analysis in Python. Using several Python data science libraries, he focused his evaluation on the number of mismatches in the V gene, counting substitutions, insertions, and deletions. His analysis, together with figures he created based on European ancestries, offered insight into how well the selected studies are represented in the IMGT database. Analyses of this kind are crucial for ensuring that the IMGT database is a reliable and comprehensive resource for immunogeneticists globally.
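
The mismatch counting described above can be sketched in Python. This is a minimal illustration, not the script used in the project. It assumes each V gene alignment carries a MiXCR-style mutation string in which a token such as `SG29A` denotes a substitution, `DT120` a deletion, and `I130C` an insertion. The function name `count_v_mismatches` and the sample string are hypothetical.

```python
import re
from collections import Counter

# Assumed token grammar for a MiXCR-style mutation string:
#   S<from><position><to>  substitution, e.g. SG29A
#   D<from><position>      deletion,     e.g. DT120
#   I<position><to>        insertion,    e.g. I130C
MUTATION_TOKEN = re.compile(r"S[ACGTN]\d+[ACGTN]|D[ACGTN]\d+|I\d+[ACGTN]")

KIND_BY_PREFIX = {"S": "substitutions", "D": "deletions", "I": "insertions"}


def count_v_mismatches(mutations: str) -> dict:
    """Count substitutions, insertions, and deletions in one mutation string."""
    counts = Counter({kind: 0 for kind in KIND_BY_PREFIX.values()})
    for match in MUTATION_TOKEN.finditer(mutations):
        # The first character of each token identifies the mismatch type.
        counts[KIND_BY_PREFIX[match.group(0)[0]]] += 1
    counts["total"] = sum(counts[kind] for kind in KIND_BY_PREFIX.values())
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical mutation string for a single V gene alignment.
    print(count_v_mismatches("SG29ASA117GDT120I130C"))
```

Summing these per-read counts by study or by ancestry group would give the kind of totals that can then be plotted.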

Venturing further into his research journey, Shaunak also dedicated himself to a separate project, titled “The Systematic Assessment of the Completeness of Public Metadata Accompanying Omics Studies.” In this research, he manually examined approximately 100 papers for the metadata they reported, focusing on characteristics such as disease, age, sex, tissue type, organism, and strain. He then applied a customized Python script to extract and document the same metadata characteristics for those studies from the public GEO repository. Through this work, Shaunak helped emphasize the significance of comprehensive and transparent metadata in omics studies.
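
One way to summarize such a comparison is to compute, for each metadata characteristic, the fraction of studies that report it in the paper and the fraction that report it in the repository. The sketch below is an assumption about how that could be done, not the project's actual script. The field names, the `completeness` function, and the sample records are hypothetical, and the code does not query GEO itself.

```python
# Metadata characteristics named in the project description.
FIELDS = ["disease", "age", "sex", "tissue", "organism", "strain"]


def is_reported(value) -> bool:
    """Treat None and blank strings as missing; keep values such as 0."""
    return value is not None and str(value).strip() != ""


def completeness(records: list) -> dict:
    """Return, per field, the fraction of records with a reported value."""
    if not records:
        return {field: 0.0 for field in FIELDS}
    return {
        field: sum(is_reported(r.get(field)) for r in records) / len(records)
        for field in FIELDS
    }


if __name__ == "__main__":
    # Hypothetical records: what two papers report versus their repository entries.
    paper_records = [
        {"disease": "asthma", "age": 34, "sex": "F", "tissue": "blood",
         "organism": "Homo sapiens"},
        {"disease": "lupus", "sex": "M", "tissue": "skin",
         "organism": "Homo sapiens"},
    ]
    repository_records = [
        {"disease": "asthma", "tissue": "blood", "organism": "Homo sapiens"},
        {"organism": "Homo sapiens"},
    ]
    paper = completeness(paper_records)
    repository = completeness(repository_records)
    for field in FIELDS:
        print(f"{field:10s} paper={paper[field]:.2f} repository={repository[field]:.2f}")
```

A gap between the two fractions for a given field points to information that is stated in the publication but missing from the public record.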

Shaunak Kapur’s involvement in these intensive research projects underscores the importance of hands-on experience in shaping future bioinformatics scientists. His work exploring immunogenetic databases and systematically assessing public metadata contributed significantly to the projects and enriched his own learning, offering him a pragmatic view of the challenges and triumphs of research work. As he forges ahead in his academic journey, the knowledge and experience Shaunak gained through these projects will serve as a robust foundation, enabling him to navigate, contribute to, and explore scientific research.