Digital research methods for textual data


Introduction

This course introduces a set of digital research methods (DRM). With these innovative methods, it is possible to analyse large textual datasets from social media, news articles, interviews, and other sources, and to render these data as networks, which offers an alternative analytical perspective. These techniques are becoming increasingly popular in virtually all disciplines of the social sciences and humanities.

The course is specifically designed for people who do not feel comfortable using technical programming software. We will focus on how DRM can be applied with accessible software that has user-friendly interfaces. However, those who are more inclined to learn or use programming are welcome to do so, as the course material also includes instructions for executing DRM using R (a statistical programming language).

Course information

ECTS: 2.5
Number of sessions: 4
Hours per session: 3

Key Facts & Figures

Type: Course
Duration: 12 hours
Instruction language: English
Mode of instruction: Offline

What will you achieve?

  • After completion of this workshop, you will be able to scrape and clean textual data from social media and news articles.
  • You will be able to apply several digital research methods, particularly text analysis, topic modelling, sentiment analysis, and network analysis.
  • You will be able to visualise and interpret the results of these analyses.

Start dates

In the academic year 2023-2024, this course will take place offline.

Session 1
September 22 (Friday) 2023
10.00-13.00
Langeveld building, room 2.24

Session 2
September 29 (Friday) 2023
10.00-13.00
Langeveld building, room 2.24

Session 3
October 6 (Friday) 2023
10.00-13.00
Langeveld building, room 2.24

Session 4
October 13 (Friday) 2023
10.00-13.00
Langeveld building, room 1.06

Aim and working method

The first class introduces the concepts and structure of digital data. We will also cover basic approaches to scraping social media content (namely Twitter) and news articles (LexisNexis), as well as steps for cleaning textual data and performing basic text analysis.

In the second class, more advanced text analysis approaches will be introduced. These include topic modelling (a powerful yet easy-to-use method for uncovering hidden themes across many text documents) and sentiment analysis (a method for assessing the polarity of texts).

In the third class, we will explore additional social media scraping tools (for Facebook, YouTube, and Instagram) and introduce network analysis, a relational perspective that can also be applied to text data. We will examine topic models rendered as networks. Network depictions of textual content can reveal new perspectives and lead to richer interpretations.

The fourth class continues the exploration of text as networks, including entity and semantic networks. We will also work through the steps for using network analysis to visualise and analyse qualitative content coding.

› There will be four 3-hour sessions. Each session will include a mix of lectures (40%), demonstrations (5%), and in-class exercises (55%).

› Participants can work with the text and network data supplied for the course OR they can explore text/network data of their own.

How to prepare

To participate actively in the course, you are required to read the following literature:

› Levallois, C. (2017). A primer on text mining for business. (https://seinecle.github.io/mk99/generated-pdf/text-mining-for-business.pdf)

› Levallois, C. (2017). A primer on network analysis for business. (https://seinecle.github.io/mk99/generated-pdf/network-analysis-for-business.pdf)

› Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. (https://cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models/fulltext)

(Focus on the sections leading up to ‘LDA and probabilistic models’.)

› Thelwall, M. (2017). Heart and soul: Sentiment strength detection in the social web with SentiStrength. In Cyberemotions: Collective Emotion in Cyberspace (pp. 119-134).

(Paper available on the SentiStrength website; focus on the sections Introduction, Using, Core, Additional, Sarcasm, and Application; you may skim the rest.)

› Lee, J. (2021). Digital methods and tools: A step-by-step guide. Erasmus University Rotterdam. (URL will be emailed to participants)

The first two readings are very short introductions that also apply to domains beyond business.

You should also familiarise yourself with the instructor’s Digital Research Methods Step-by-Step Guide, particularly the sections on topic modelling (4.8), topic networks (6.9), and data scraping: Mozdeh (3.8), LexisNexis (4.1), SNScrape (3.9), and Netvizz (for YouTube, 2.4).

If the course is not held in a PC lab, please bring your own laptop for the in-class exercises. Note that you may need Administrator rights on your laptop to install some of the software. The following programs need to be installed:

› ConText 1.2 or 2.0: http://context.lis.illinois.edu

› Gephi 0.9.2: https://gephi.org

› Mozdeh (Big Data Text Analysis, Windows only): http://mozdeh.wlv.ac.uk/

› SNScrape (for Twitter scraping; available only through the DRM Dropbox ‘tools/Extra’ folder): URL to be emailed to participants

These tools may be acquired from either the course instructor’s Digital Research Methods Dropbox ‘tools’ folder (see below) or the original websites.

› DRM Dropbox ‘tools’ folder: (URL to be emailed to participants)

Session description

Session 1
  • This session introduces you to the world of digital data, including text data.
  • You will also learn to scrape data from Twitter and LexisNexis using several online and offline tools, extract their textual elements, and perform basic but necessary data cleaning in the ConText text analysis software.
  • Finally, you will learn to conduct basic text analysis.

Session 2
  • In this session, you will learn how topic models operate and how they are applied, and you will then perform and interpret topic modelling on the acquired data.
  • We will cover other approaches to social media scraping (for Facebook, YouTube, and Instagram) and more rigorous text cleaning in Excel.
  • You will also learn about automated sentiment analysis, which can detect the polarity of text segments.

Session 3
  • In this session, you will learn about (social) network analysis, a relational perspective on data analysis.
  • You will learn how textual data can be viewed as networks, specifically topic model networks, using the Gephi program.

Session 4
  • This session extends the network treatment of textual data and covers various semantic networks.
  • We will also investigate the network approach to qualitative coding and analysis.

Instructor

Ju-Sung (Jay) Lee is assistant professor of digital research methods at the Department of Media and Communication of Erasmus University Rotterdam (EUR). His research focuses on various digital, network, and statistical methodologies and their application to online and offline discourse and interactions, recently in the context of the refugee crisis and artist communities. Jay holds a PhD in sociology from Carnegie Mellon University (USA) and has a background in computer science, organisation and decision sciences, and quantitative sociology.

Contact

  • Enrolment-related questions: enrolment@egsh.eur.nl
  • Course-related questions: gruber@ese.eur.nl
  • Telephone: +31 (0)10 4082607

Facts & Figures

Fee:
  • Free for PhD candidates of the Graduate School
  • €575 for non-members
  • Consult our enrolment policy for more information
Tax: Not applicable
Application deadline: Friday 8 September 2023
Offered by: Erasmus Graduate School of Social Sciences and the Humanities
