CTDS Accepted to Google Summer of Code
We are thrilled to announce that the Center for Translational Data Science has been selected to participate in Google Summer of Code for the 2025 season. A total of 185 organizations from around the globe were chosen as mentor organizations for this prestigious program.
Google Summer of Code is focused on bringing new contributors into open-source software development. During this 12+ week program, accepted GSoC contributors spend a few weeks becoming familiar with the community norms and codebase while determining expected milestones with their mentor for the summer. After onboarding, they spend at least 12 weeks working alongside mentors coding on their projects.
As with other work by CTDS, our projects broadly focus on supporting translational research around biomedical data. Specific projects involve FHIR (a standard for exchanging electronic health records), graph databases, downloads of very large datasets, cloud infrastructure monitoring, GPU cluster orchestration, and more!
Contributors may register and submit project proposals on the GSoC site starting March 24. We encourage you to visit our projects page and submit a proposal during the contributor application period. Please reach out to ctds-jobs@lists.uchicago.edu if you have any questions. Google has prepared a guide for contributors who are interested in learning more about the application process and submitting solid project proposals.
Some key dates to keep in mind are:
Starting February 27 - Potential GSoC contributors discuss application ideas with mentoring organizations
March 24 - GSoC contributor application period begins
April 8 - GSoC contributor application deadline
Don’t miss out on this exciting opportunity!
UChicago data scientists update powerful cancer data sharing platform
The University of Chicago Center for Translational Data Science (CTDS) has completed an update to the National Cancer Institute’s Genomic Data Commons, a repository and computational platform for cancer researchers who seek to understand cancer, its clinical progression and response to therapy. This new version is known as GDC 2.0.
Read the full press release here: https://biologicalsciences.uchicago.edu/news/gdc-2-update
ARDM June 6-7 in Chicago
Registration is now open for the inaugural ARDM workshop hosted by the Center for Translational Data Science. Accelerating Research Using Data Meshes and Data Fabrics (ARDM) will focus on the interoperability and integration of data platforms into data meshes, data fabrics, and other types of data ecosystems.
In recent years, data platforms, including data commons, data repositories, and databases, have seen tremendous growth. These platforms are tailored for biomedical data, environmental data, social determinants of health data, and other data relevant to improving health outcomes. This workshop will be an opportunity to reduce silos while maximizing data use and the potential for innovative discoveries.
This workshop will cover:
Developing a data mesh: five pillars
Technical requirements and standards for adding a data commons or data repository to a data ecosystem
Policy and governance for data ecosystems
Standard agreements for data ecosystems
Use cases and success stories
If you are interested in attending, please visit our events page for more information and to register.
Biomedical Research Hub selected as key partner for international genomic data standards initiative
The Global Alliance for Genomics and Health (GA4GH) has named the Biomedical Research Hub one of ten genomic data initiatives with clinical connections selected as its newest Driver Projects. The collaborations will allow genomic data standards to make new inroads into medicine and biomedical research, including applying machine learning to data from different regions around the globe.
Read the full press release here: https://biologicalsciences.uchicago.edu/news/ga4gh-driver-projects
SC23: Exploring the Data Frontier
This month, the SC23 communications team interviewed Dr. Robert Grossman, who shared insight into his impact on high-performance computing (HPC).
Dr. Grossman also discusses his current work in health and wellness, including the Genomic Data Commons and Gen3, as well as future challenges for HPC.
“The HPC challenge is to build the data platforms that can manage, explore, analyze, and share biomedical data at the scale needed and with the governance, security, and compliance required so we can tease out interesting small effects.”
The Center for Translational Data Science will be an exhibitor at SC23 in Denver from November 12-17. SC23 is the International Conference for High Performance Computing, Networking, Storage, and Analysis. Stop by booth 525 to say hello and learn about our latest research projects!
To read the full interview with Dr. Grossman click here.
Department of Medicine - Woman in the Spotlight
Dr. Aarti Venkat, CTDS Director of Clinical Informatics, was featured as a Woman in the Spotlight by the Department of Medicine.
CTDS Gives Demo to UChicago Undergrad Students
On Wednesday, April 12th, several CTDS staff members (Aarti Venkat, Ph.D., Kyle Hernandez, Ph.D., Sara Volk de Garcia, Ph.D., Fay Booker, Ph.D., and Hillary Carroll) gave a presentation to UChicago undergraduates who are exploring opportunities in computational biology, where they can harness the power of computer science and analytics to answer key questions in the life sciences.
The demo was designed to inform students of possible career paths and covered topics including what Translational Data Science is, CTDS projects and roles, as well as challenges in the field.
Gen3 Community Event - How to Set Up a Gen3 Data Commons Using Helm Charts
We will take you through the current best practices for setting up and configuring your own Gen3 Data Commons in multiple clouds by using Helm Charts. Helm is a tool that streamlines installing and managing applications on Kubernetes, a system for automating the deployment, scaling, and management of containerized applications. Using Helm will greatly simplify standing up, configuring, and maintaining your own Gen3 Data Commons. This is the first of a series of community events through 2023.
Gen3 Community Forum 2022
The Gen3 platform consists of open-source software services that support the emergence of healthy data ecosystems by enabling the interoperation and creation of cloud-based data resources, including data commons and analysis workspaces. Gen3 aims to accelerate and democratize the process of scientific discovery by making it easy to manage, analyze, harmonize, and share large and complex datasets in the cloud.
With Gen3 use spreading globally, there is a demand to coalesce shared knowledge and activities into a community. A new Gen3 Community will meet for the first time at a virtual forum co-hosted by the Center for Translational Data Science at the University of Chicago and the Australian BioCommons. The forum will run for three days, three hours each day: 4 pm to 7 pm, October 10 to 12 (US Central Time), which is 8 am to 11 am, October 11 to 13 (AEDT). It will include presentations from Gen3 operators and developers, as well as breakout sessions to craft ideas for new features.
The inaugural Gen3 Community Forum will:
Share knowledge about Gen3, its architecture, and the Gen3 roadmaps and priorities.
Strengthen the connection between the core team and those developing, operating and using Gen3 platforms.
Design a set of ongoing community engagement activities.
Discuss and agree on key shared development priorities between the Gen3 core team and the community.
Further details of the program and free registration are now available.
University of Kentucky's Commonwealth Computational Summit 2022
On April 13, 2022, Dr. Robert Grossman, Director of the Center for Translational Data Science, gave an academic keynote talk at the Commonwealth Computational Summit. This was the fifth annual summit hosted by the University of Kentucky’s Center for Computational Science.
Talk Title: The Data Gap in Machine Learning and AI: Why Much of Machine Learning and AI is Still Data Limited, and Some of the Options Available.
Abstract: Although large amounts of online text, images and audio have provided enough data that deep learning models can be developed that significantly improve language translation, image recognition, speech recognition and related applications, developing and deploying machine learning and AI models that provide value and limit bias is still quite difficult in many application areas due to the lack of suitable data. This is especially the case in biology, medicine and health care. We discuss some of the reasons that many important AI problems are still data-limited and some of the approaches that have been taken to address this challenge. We use case studies from machine learning models in COVID-19 and cancer to illustrate some of the challenges and some of the options available.
CTDS Google Summer of Code Applications Now Live
We are excited to announce that the applications for the 2022 Google Summer of Code program are now open! The Center for Translational Data Science is among 203 open-source organizations accepted as Mentor Organizations to the program. Google Summer of Code is a global program focused on bringing new contributors into open-source software development. During this 12+ week program, accepted GSoC contributors spend a few weeks becoming familiar with the community norms and codebase while determining expected milestones with their mentor for the summer, then spend 12+ weeks coding on their projects. Contributors may register and submit project proposals on the GSoC site from now until Tuesday, April 19th at 18:00 UTC. Don’t miss out on this exciting opportunity!
CTDS Accepted to Google Summer of Code
We are excited to announce that the Center for Translational Data Science has been accepted as a Mentor Organization for the 2022 Google Summer of Code program. Google Summer of Code is a global program focused on bringing new contributors into open-source software development. This year a total of 203 open-source organizations were accepted to the program and will be mentoring GSoC Contributors. Applications for GSoC Contributors open on April 4th. Don’t miss this exciting opportunity to contribute to one of our projects.
President Biden announces Reignition of the Cancer Moonshot
As Vice President, in 2016, Joe Biden launched the Cancer Moonshot with the mission to accelerate the rate of progress against cancer. The Center for Translational Data Science (CTDS) has maintained involvement in two projects that received support as part of the Cancer Moonshot and have strategic importance for CTDS: the Genomic Data Commons and the Blood Profiling Atlas in Cancer (BloodPAC). Today, President Biden is reigniting the Cancer Moonshot with renewed White House leadership of this effort. President Biden announced new goals for the Cancer Moonshot: to reduce the death rate from cancer by at least 50 percent over the next 25 years and improve the experience of people and their families living with and surviving cancer—and, by doing these and more, to end cancer as we know it today.
Summer 2022 Internship Applications Opening Soon
Internship applications for our summer 2022 program will be opening soon. Interns will contribute toward biomedical research through analytical solutions and will develop technical skills across data engineering, data science, bioinformatics, and software engineering. Interns will have opportunities to learn from staff mentors with experience building petabyte-scale research infrastructure. Please check back soon for more details!