{"id":7318,"date":"2022-09-28T09:10:01","date_gmt":"2022-09-28T09:10:01","guid":{"rendered":"http:\/\/escience.sdu.dk\/?post_type=news&#038;p=7318"},"modified":"2022-09-28T09:10:01","modified_gmt":"2022-09-28T09:10:01","slug":"national-health-data-science-sandbox-for-training-and-research","status":"publish","type":"news","link":"http:\/\/escience.sdu.dk\/index.php\/news\/national-health-data-science-sandbox-for-training-and-research\/","title":{"rendered":"National Health Data Science Sandbox for Training and Research"},"content":{"rendered":"\n<p><em>UCloud is not just an ideal platform for the individual researcher who wants interactive access to HPC resources or an easy way to collaborate with national or international partners. It is also highly suitable for teaching. Jennifer Bartell and Samuele Soraggi, who both work on the project National Health Data Science Sandbox for Training and Research, share their experiences using UCloud.<\/em><\/p>\n\n\n\n<p><strong>National \u201csandbox\u201d platform<\/strong><\/p>\n\n\n\n<p>The growing amounts of data in all research fields offer researchers new opportunities for scientific breakthroughs. In health science, the use of large amounts of data has great potential to improve health care, for example by expanding our ability to understand and diagnose diseases. One of the constraints of using health data is that many datasets (e.g. person-specific health records or genomics data) are sensitive from a patient privacy perspective and governed by strict access and usage guidelines. 
This can be a major challenge, particularly for students or researchers who are just learning best practices for handling health data while also developing data science skills.<\/p>\n\n\n\n<p>The idea behind the project <em>National Health Data Science Sandbox for Training and Research<\/em> is to provide students and researchers with a \u201csandbox\u201d environment where they can develop their skills, test their ideas and learn how to analyze large biomedical and clinical datasets in an HPC setting. The national Sandbox will contain only public, anonymous data and non-person-sensitive synthetic\/simulated data, which can be used without having to worry about GDPR regulations. It will also contain learning modules that guide trainees through major areas of health data science with a focus on new technologies and tools, simultaneously advancing their HPC skills and their grasp of, for example, the latest single-cell RNA-Seq analysis workflows.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cWe provide the tools and non-sensitive data sets for people to learn and test new tools and algorithms without having to worry about working with sensitive health data which adds tons of restrictions and constraints on what you do,\u201d explains Jennifer Bartell, the project coordinator and a data scientist.<\/em><\/p><\/blockquote>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" src=\"http:\/\/escience.sdu.dk\/wp-content\/uploads\/2022\/09\/Sandbox_workflow_ai_horizontal2w-01.png\" alt=\"\" class=\"wp-image-7319\" width=\"2438\" height=\"1688\"\/><\/figure>\n\n\n\n<p>The Sandbox team is researching how to safely develop useful, privacy-preserving synthetic data and how to leverage published datasets. 
Their aim is that researchers and students with interesting or promising Sandbox test results can easily transition their approach and tools to sensitive datasets &#8211; with more evidence that their applications for data access and additional security measures are worth pursuing!<\/p>\n\n\n\n<p>The National Health Data Science Sandbox is funded by the Novo Nordisk Foundation and led by Professor Anders Krogh from the University of Copenhagen. The Sandbox is being built and supported by a consortium of data scientists from the universities of Copenhagen, Aarhus, Aalborg and Southern Denmark and the Technical University of Denmark.<\/p>\n\n\n\n<p><strong>UCloud as part of the Sandbox<\/strong><\/p>\n\n\n\n<p>The National Health Data Science Sandbox is using the secure cloud technologies available at the Computerome2 installation in Roskilde and the UCloud platform, originally developed by the University of Southern Denmark and now used as the basis for the DeiC Interactive HPC service. Using existing facilities ensures that students and researchers become familiar with the systems they may later use for projects involving real sensitive data.<\/p>\n\n\n\n<p>This summer, UCloud was used for a course, <em>Introduction to Next Generation Sequencing Data<\/em>, which was organized in connection with the National Health Data Science Sandbox project.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cThe Sandbox is a project supporting teaching and training in health data science, and we are running training ourselves in how to use the Sandbox \/ HPC environments to perform health data science. We are packaging Sandbox datasets and tools into topical training modules which have self-tutorial, stand-alone versions but which we also use to support in-person workshops. 
This is how we\u2019re refining the material we provide in the Sandbox,\u201d says Jennifer Bartell.<\/em><\/p><\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cUCloud is pretty ideal for organizing pedagogic material because it is interactive. The user support team at the eScience Center helped us so we basically had an app running and starting the packages so the users would not have to install anything and could directly use the course material interactively,\u201d says Samuele Soraggi, also a data scientist working on the Sandbox, who organized the course together with researchers at Aarhus University.<\/em><\/p><\/blockquote>\n\n\n\n<p>Jennifer Bartell also expresses her satisfaction with UCloud and the support the project received from the eScience Center\u2019s staff:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cIt has been really nice working with Emiliano Molinaro and Claudio Pica. 
They have been really responsive and engaged in helping us roll out these education-focused apps on UCloud. We are happy to be working with partners that are excited about supporting teaching and training in an HPC environment,\u201d says Jennifer Bartell.<\/em><\/p><\/blockquote>\n\n\n\n<p>The course has been published as an app on UCloud, <a href=\"https:\/\/docs.cloud.sdu.dk\/Apps\/genomics.html\">Genomics Sandbox<\/a>, which can be used by other researchers organizing similar courses or for independent study by any student or researcher. The course materials can also be applied to new datasets uploaded by users. Each course has a companion webpage with additional material (slides, explorable code, notes, etc.) that is hosted on the Sandbox <a href=\"https:\/\/hds-sandbox.github.io\/modules\">webpage<\/a> and linked from each app\u2019s UCloud documentation.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cI think it\u2019s very nice that the course participants did not have to worry about packages and installation. They could also upload their own datasets and try out the same analysis for their research projects. This allowed us to mostly concentrate on the tutorial for the course,\u201d says Samuele Soraggi.<\/em><\/p><\/blockquote>\n\n\n\n<p>More recently, UCloud was also used for a bulk RNA-Seq workshop held on the 18th and 19th of August, developed by the Sandbox staff at the University of Copenhagen in collaboration with the <a href=\"https:\/\/heads.ku.dk\/datalab\/\" target=\"_blank\" rel=\"noreferrer noopener\">HeaDS DataLab<\/a>. The in-person workshop had 26 participants and, as with the <em>Introduction to Next Generation Sequencing Data<\/em> course, received positive reviews. This material will soon be made available as the Transcriptomics Sandbox app on UCloud. 
A Proteomics Sandbox app is also under active construction, with associated workshops rolling out in 2023.<\/p>\n\n\n\n<p>All course material and associated datasets are open source and available through <a href=\"https:\/\/github.com\/hds-sandbox\/NGS_summer_course_Aarhus\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a> and the Sandbox <a href=\"https:\/\/hds-sandbox.github.io\/\">website<\/a>, where future courses and training modules hosted on UCloud or Computerome will also be advertised.<\/p>\n\n\n\n<p><strong>Would you like to host a similar course on UCloud?<\/strong><\/p>\n\n\n\n<p>If you have course material and would like help implementing it in the Sandbox, or would like the Sandbox team\u2019s assistance in running a course or workshop related to health data science, you can contact Jennifer Bartell at <a href=\"mailto:NHDS_sandbox@sund.ku.dk\">NHDS_sandbox@sund.ku.dk<\/a> or the Sandbox data scientist at your university (listed <a href=\"https:\/\/hds-sandbox.github.io\/contact\/contact\/\">here<\/a>).<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cIf anybody has a course related to health data science, which they think would be fitting to host on UCloud in this way, but they maybe do not have the time or the resources to set it up, we would be happy to discuss deployment options in the Sandbox,\u201d says Jennifer Bartell.<\/em><\/p><\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>\u201cIt\u2019s important to say that we also support the course when it\u2019s running, both at a Slack-channel level, but we also have people on site at five universities who can help you,\u201d says Samuele Soraggi.<\/em><\/p><\/blockquote>\n\n\n\n<p>UCloud can also be used to host workshops and courses in all research disciplines. 
If you need help using UCloud for your course, please contact the support team: <a href=\"mailto:support@escience.sdu.dk\" target=\"_blank\" rel=\"noreferrer noopener\">support@escience.sdu.dk<\/a>.<\/p>\n\n\n\n<p><strong>Find out more<\/strong><\/p>\n\n\n\n<p>You can read more about the National Health Data Science Sandbox for Training and Research <a href=\"https:\/\/datascience.novonordiskfonden.dk\/projects\/national-health-data-science-sandbox-for-training-and-research\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> and visit the project\u2019s website <a href=\"https:\/\/hds-sandbox.github.io\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p>Prof. Anders Krogh of the University of Copenhagen, who leads the project, presents the concept of the National Health Data Science Sandbox in a video <a href=\"https:\/\/video.ku.dk\/video\/66908910\/new-data-sandbox-will-be-a-training\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p>UCloud has also been used in a similar way for a workshop organized by the Royal Danish Library. You can read the organizer\u2019s and the participants\u2019 reflections <a href=\"https:\/\/interactivehpc.dk\/#\/localNews\/Organizer%20reflections%20on%20data(Tinget)%20and%20using%20UCloud\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> and <a href=\"https:\/\/interactivehpc.dk\/#\/localNews\/Participant%20reflections%20on%20data(Tinget)%20and%20using%20UCloud\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>UCloud is not just an ideal platform for the individual researcher who wants interactive access to HPC resources or an easy way to collaborate with national or international partners. 
It is also highly suitable for<a class=\"moretag\" href=\"http:\/\/escience.sdu.dk\/index.php\/news\/national-health-data-science-sandbox-for-training-and-research\/\"> Read more&hellip;<\/a><\/p>\n","protected":false},"author":11,"featured_media":7234,"comment_status":"closed","ping_status":"closed","template":"","tags":[],"news-category":[],"_links":{"self":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/news\/7318"}],"collection":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/news"}],"about":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/types\/news"}],"author":[{"embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/comments?post=7318"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/media\/7234"}],"wp:attachment":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/media?parent=7318"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/tags?post=7318"},{"taxonomy":"news-category","embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/news-category?post=7318"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}