Workshop 1: First steps to text mining

Presenter: Prof Menno van Zaanen

Helpers: Inonge Lupinda & Anelda van der Walt

Date: 25 August 2023

Time: 8:30 - 12:30

Venue: Room (TBC), Premier Hotel O.R. Tambo, Johannesburg

Max capacity: 30 participants

Eligibility: Workshop participation is open to attendees of DH-IGNITE at no cost. Space is limited. Successful DH-IGNITE applicants will be invited to register for one of the workshops.

What to bring: A laptop

What to expect

In this workshop, we will look at some fundamental skills that will get you started on your journey with text mining. To kick off, we will learn how to tell the computer what to search for. We will start with simple search operations and explore their limitations. After that, we will look at more complex search operations. We will also introduce the first data wrangling steps, for example converting text data into other formats for further processing. The workshop is accessible to humanities and social sciences students and researchers with no prior exposure to programming. We will not cover any advanced text mining strategies or tools. The skills learned will be applicable in other aspects of research as well, e.g. literature reviews.

Workshop aim

Most people know how to perform simple searches in text using a computer. However, search (as well as search and replace) operations allow for quite complex applications. In this workshop, we will look at how to get from relatively simple search operations to more complex search and search-and-replace operations, for instance for initial text analysis.
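To give a flavour of the step from simple to more complex search, here is a minimal sketch in Python. The language choice, the sample sentence, and the list of pronouns are assumptions made for illustration only; the workshop itself may use different tools. It shows how a plain search for "he" also finds those letters inside other words, while a regular expression with word boundaries finds only the whole word.

```python
import re

# Invented sample sentence, used only for illustration.
text = "She said the theory helped her, and he agreed with them."

# Simple search: plain substring matching also finds "he" inside
# other words such as "the", "theory" and "helped".
simple_hits = text.lower().count("he")

# Regular expression: \b marks a word boundary, so only the
# whole word "he" matches.
whole_word_hits = re.findall(r"\bhe\b", text, flags=re.IGNORECASE)

# A slightly more complex pattern: any of several pronouns,
# each matched as a whole word.
pronouns = re.findall(
    r"\b(?:she|he|her|him|they|them)\b", text, flags=re.IGNORECASE
)

print(simple_hits)      # number of times the letters "he" occur anywhere
print(whole_word_hits)  # only the standalone word "he"
print(pronouns)         # all listed pronouns found in the sentence
```

The difference between the first count and the two lists is the kind of limitation of simple search that more powerful search options are meant to address.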

Workshop format

The workshop will combine practical examples with a bit of the theory underlying search. The main focus will be on practical examples and exercises, which will be tackled with the whole group, in smaller groups, and individually. As the background knowledge of the participants may vary, we will work through the different topics slowly so that everybody can follow.

Participants who would be interested in attending

The workshop will be relevant to anybody who regularly works with text on a computer. In particular, people who are interested in identifying specific parts of a text, such as pronouns (relevant to gender studies) or diminutives (relevant to linguistics), will benefit from this workshop.

Topics that will be included

  • Simple search
  • Limitations of simple search
  • More powerful search options
  • Regular expressions
  • Search and replace
  • Finite-state machines
  • (Several exercises will be tackled in between)

Topics that will not be included

  • Searching using search engines
  • Text or information extraction (although initial steps are covered)

Workshop outcome

After following this workshop, you will be able to perform simple and more complex search operations in texts. You will be able to automatically identify patterns in text. You will also be able to perform search and replace operations that allow for simple data wrangling (in other words, to convert text into data formats for further processing).
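As a rough illustration of search and replace used for simple data wrangling, the sketch below rewrites lines of the form "Surname, Name (year)" into tab-separated values. The use of Python, the input format, and the example entries are all invented for this illustration and are not taken from the workshop material.

```python
import re

# Invented example entries in the form "Surname, Name (year)".
lines = [
    "Smith, Jane (1999)",
    "Mokoena, Peter (2004)",
]

# Three capture groups: surname, first name, and a four-digit year.
pattern = re.compile(r"^(\w+), (\w+) \((\d{4})\)$")

# Search and replace: reorder the captured groups and separate
# them with tabs, giving "Name<TAB>Surname<TAB>year".
rows = [pattern.sub(r"\2\t\1\t\3", line) for line in lines]

for row in rows:
    print(row)
```

Each output line can then be opened in a spreadsheet or read by another program, which is the sense in which text is converted into a data format for further processing.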