Researchers within the field of humanities are typically not heavy users of HPC (High Performance Computing) or cloud computing. However, a book, once digitalized, is actually quite a big data set. Zhiru Sun, Assistant Professor at the Department of Design and Communication, tells us how she has been helping researchers from the Faculty of Humanities at SDU solve their research problems through digital methods, and how computing resources such as UCloud, also called DeiC Interactive HPC, can be a highly viable option if your project involves, for example, looking for patterns and similarities in digitalized texts.
Students’ online learning behavior and digital humanities
Assistant Professor Zhiru Sun has been employed at SDU since 2019. In her own research, she uses data science techniques to study individual and collaborative learning processes in digital learning environments, especially on online learning platforms.
“On these types of platforms, we can get some students’ behaviour data, which can help us understand when, where, and possibly why the students have difficulties in learning, so the teacher can provide targeted feedback and support to help them overcome the difficulties,” Sun explains.
This type of research has become especially relevant during the past two years of on-and-off online teaching due to COVID-19. However, Sun stresses that her research topic was not prompted by the global pandemic.
“I’ve been having an interest in online learning for a long time – long before COVID-19. My master’s thesis is in educational psychology and my PhD is in educational technology, and I’ve been combining these two subjects for a while.”
In addition to her own research and teaching, Zhiru Sun is the local SDU representative in the digital curriculum and digital literacy support team. Here, she helps colleagues at SDU who participate in the digital curriculum or digital literacy programmes with text analysis, statistical analysis and general IT support.
“I have a strong interest in digital humanities, where I explore various ways to represent, analyze and communicate ideas and data in humanities through computational tools and techniques,” Sun explains.
The term “digital humanities” refers both to the digitalization of material – library books, photographs, maps, literature, historical texts and the like – and to the use of digital methods and tools within the field of humanities.
Sun is furthermore the representative from the Faculty of Humanities in the SDU eScience Center’s Operational Board.
Collaboration with the Hans Christian Andersen Centre at SDU
One of the projects in which Zhiru Sun has applied her digital skills to assist a humanities project is a collaboration with Nils Holger Berg and Ane Grum-Schwensen from the Hans Christian Andersen Centre at SDU. The project is called Detecting text reuse in H.C. Andersen’s work and is part of a larger publication project, Hans Christian Andersen’s Fairy Tales and Stories – the digital manuscript edition, which aims to digitalize and publish all the preserved manuscripts of Hans Christian Andersen in an online, genetic edition. (If you are interested in knowing more about the digital edition, it is described in more detail at http://andersen.sdu.dk/ms. The updated description is in Danish, but an older English version is available at http://beta.auh.sdu.dk/en/.)
The collaboration came about when Holger Berg participated in a digital literacy seminar, looking for digital methods to solve a problem. In 2019, one of his colleagues from Odense City Museums, senior researcher Ejnar Stig Askgaard, had started comparing Hans Christian Andersen’s notes, written between approximately 1833 and 1875, with the 162 fairy tales, novels and autobiographies. This had led to the discovery that Hans Christian Andersen liked to use symbols such as cross marks or deletions in his notes to indicate that the note had been reused in his fairytales.
Now, Berg wanted to find out where each note had been reused. Earlier research had managed to manually identify where 278 notes had been reused in Hans Christian Andersen’s published work, but this had been a time-consuming effort, taking many months of work.
“Because comparing the texts manually was taking such a long time, Holger came to me and asked whether we could do something using digital methods instead,” Sun explains.
As 861 of the notes had been digitalized, Sun was able to apply methods from Natural Language Processing (NLP) to find similarities between the notes and Hans Christian Andersen’s work. This generated a number of tables, which indicated how similar a specific note was to a specific fairytale.
“It only took me around 8 hours to generate these tables and find a good indication of where all the 861 digitalized notes had been reused.”
In the tables, each note has received a score from -1 to 1 for each fairytale. The closer the score is to 1, the more similar the note is to the fairytale; the closer it is to -1, the less similar. Note_61, for example, scores low against every fairytale, which indicates that it is very different from all of them: it is a shopping list.
Projects such as these illustrate the vast potential of digital humanities methods – they can potentially save researchers years of manual work, time and resources which they can spend on analyzing and interpreting the results instead.
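The article does not name the similarity measure or the text representation Sun used. As a rough illustration only, the sketch below assumes cosine similarity over simple word counts, which is one common way to score how similar two texts are. Cosine similarity in general ranges from -1 to 1; with plain word counts, as here, it stays between 0 and 1. The function names, note and tale identifiers, and the English toy texts are all hypothetical and are not taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_table(notes, works):
    """Return {note_id: {work_id: score}} for every note/work pair."""
    return {
        note_id: {
            work_id: cosine_similarity(note, work)
            for work_id, work in works.items()
        }
        for note_id, note in notes.items()
    }


# Hypothetical toy data, invented for illustration only.
notes = {
    "note_a": "a tin soldier with one leg",
    "note_b": "buy bread, butter and candles",  # a shopping list
}
works = {
    "tale_1": "There was once a tin soldier who had only one leg",
    "tale_2": "The little mermaid lived far out in the sea",
}

table = similarity_table(notes, works)
for note_id, scores in table.items():
    print(note_id, {work_id: round(s, 2) for work_id, s in scores.items()})
```

In this toy example the shopping-list note shares no words with either tale and therefore scores 0 against both, which mirrors how a note like Note_61 stands out in the tables. A real project would likely need a richer representation (for instance lemmatized Danish text or word embeddings) than raw word counts.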
“When I’m out presenting to researchers from humanities who are not familiar with these tools, I try to emphasize that what digital humanities methods can do is to shed light, or place the spotlight, on where it might be relevant to narrow down your angles and start to dig deeper. At the end of the day, a digital humanist still does a humanist’s research, i.e. qualitative studies and analysis of conversations and communication, but he or she has saved a lot of time by using the spotlight,” says Sun.
Digital humanities and UCloud
Programmes used in digital humanities, such as Python and Voyant, can all be accessed from your own computer. However, if you have a large enough data set, you may need additional computing resources. Thankfully, these programmes are available through UCloud as well. For Detecting text reuse in H.C. Andersen’s work, for example, Sun used the Python application on UCloud to analyze the data.
“If you want to do a digital humanities project, you will often get a huge data set because transforming a book to digital material creates a big data set. So I can easily see that there is a big potential that many researchers from the humanities will use resources such as UCloud in the future.”
Together with the SDU Front Office at the eScience Center, Sun and the team behind Detecting text reuse in H.C. Andersen’s work also initiated the process of having new applications installed on UCloud, e.g. Intertext, a visualization tool developed by the Yale Digital Humanities Lab. Other researchers from humanities who want to use digital methods will be able to benefit from these.
“When I presented the tables to the researchers behind Detecting text reuse in H.C. Andersen’s work, they found the results interesting, but also the tables a bit hard to read. In order to reach a wider audience, we used Intertext on UCloud to present our results in a visual and interactive way.”
Want to learn more?
If you are interested in learning more about digital humanities and finding out whether digital methods might be relevant for your project, please feel free to contact Zhiru Sun (email@example.com) or her colleagues at the Center for Special Collections & Digital Humanities (SCDH) at SDU.
You are also welcome to contact the eScience Center’s Front Office where we can tell you more about the eScience options available to researchers at SDU.
You can also check out DIGHUMLAB’s website for more information on workshops and upcoming events, as well as the video below where Zhiru Sun and Rune Jørgensen talk about the work they do as part of the Center for Special Collections & Digital Humanities (SCDH).