Quantitative and qualitative text analysis with MATLAB

Introduction

Analysis of textual data, such as interview transcripts, policy documents, and social media content, is common practice within the social sciences, humanities, and other faculties. In this course, you will learn how to do such analysis with the program MATLAB. We will discuss how to pre-process and analyse textual data, using both quantitative and qualitative approaches.

Learning and using MATLAB has several valuable advantages:

  • The program’s Text Analytics Toolbox offers a pleasant and effective tool for analysis of textual data.
  • The field of text analysis is rapidly evolving, with strong connections to data-intensive science and machine learning applications. MATLAB helps you master these advanced methods by providing a versatile learning platform.
  • MATLAB also helps you work with the heterogeneous, multimodal and large datasets that are becoming increasingly important in the social sciences and humanities.
  • With MATLAB you can produce high-quality visualisations and build a strong Open Science visibility record.
  • Learning MATLAB is a long-term investment; it is worth your time if you want to prepare for a ‘data-intensive’ career that may draw on various methods, including text analysis.

 

Course information

ECTS: 2.5
Number of sessions: 4
Hours per session: 3

Key Facts & Figures

Type
Course
Instruction language
English
Mode of instruction
Online

What will you achieve?

  • By completing this course you will acquire essential data engineering skills to organise, structure and prepare text data for qualitative and quantitative analysis in MATLAB.
  • By completing this course you will be able to work independently with the MATLAB Text Analytics Toolbox, and to apply various text analysis research methods and functions in this program.
  • By completing this course you will be able to visualise text-analysis results and produce high-quality graphics with MATLAB.

Start dates

The course will be offered in academic year 2024-2025.

New dates will be published around mid July 2024.

Please keep an eye on our website!

Aims and working method

This online course follows a learning-by-doing approach with practical hands-on examples and interactive notebooks. After an introduction to MATLAB’s fundamentals, you will learn to work with the Text Analytics Toolbox. For instance, you will learn to work with tokenised and labelled datasets and to apply various methods for text analysis research, such as TF-IDF, bagOfWords, bagOfNgrams, text search, word embeddings, and sentiment analysis. Students are encouraged to bring their own dataset(s) to work on.

Preparations and requirements

Before session 1, students should complete the MATLAB Onramp (a 2-hour, self-paced online course).

Further, students should install MATLAB together with the Text Analytics Toolbox and the Statistics and Machine Learning Toolbox before the course starts. More information on how to install MATLAB and MATLAB toolboxes can be found here and at the EUR employee work support page.

Students can install MATLAB R2023a directly from the EUR Software Center or download the latest MATLAB version from MathWorks. A MathWorks account is needed to download the latest version of MATLAB (choose "MATLAB individual" as the license type). Click here to register for a MathWorks account. A MathWorks account is also required to use MATLAB Drive, where all course materials will be shared.

MATLAB’s minimum system requirements are described here. A minimum of 8 GB of RAM is advised. 

Entry level 

No previous programming experience is required. The course is suitable for students and researchers who have no prior knowledge of or experience with MATLAB or text analysis. Some familiarity with a statistical package (SPSS, Stata, R, SAS) and/or a programming language (such as Python) is recommended but not required.

Students should expect about 2 hours of homework per session.

Session descriptions

In the first session, students are familiarised with the MATLAB user interface, working with interactive notebooks, MATLAB Drive and installing toolboxes (i.e., the Text Analytics Toolbox). Through hands-on examples students learn to work with (among other things) chars, strings, tokenised documents, text search and simple regular expressions. Students will be introduced to visualising text data in MATLAB using basic 2D scatter plots. Home exercises are provided for further exploration and deepening of working with text in MATLAB.
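
As a rough illustration of the kind of exercise this session involves, here is a minimal sketch, assuming the Text Analytics Toolbox is installed. The example sentence and all variable names are invented for illustration and are not taken from the course materials.

```matlab
% Hypothetical example sentence (not course material).
txt = "Respondent 12 said housing costs rose by 15% in 2023.";

% Simple regular expression: extract every run of digits.
numbers = regexp(txt, "\d+", "match");

% Plain text search on a string.
mentionsHousing = contains(txt, "housing");

% Tokenise the sentence with the Text Analytics Toolbox
% and convert the tokens back to a string array.
doc = tokenizedDocument(txt);
words = string(doc);

disp(numbers)
disp(mentionsHousing)
disp(words)
```

The same `regexp` and `contains` calls also work on character vectors, though the return types differ slightly between chars and strings.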

 

In the second session, students will master various pre-processing methods for text analysis, such as frequency counts, TF-IDF and custom labelled datasets, and learn to make use of MATLAB’s supporting functions for text analysis, such as bagOfWords, bagOfNgrams and word embeddings. The second session will also cover practical data management skills to handle and organise (large) collections of text data. Students are encouraged to bring their own dataset to work on.
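
The following is a minimal sketch of how frequency counts and TF-IDF weights can be obtained with the Text Analytics Toolbox, assuming the toolbox is installed. The three toy sentences and the variable names are assumptions made for illustration; the actual course exercises may be organised differently.

```matlab
% Hypothetical toy corpus; replace with your own documents.
textData = [
    "The policy document discusses housing and migration."
    "Interview transcripts mention housing costs repeatedly."
    "Social media posts discuss migration policy."];

% Basic pre-processing: lower-case, tokenise, strip punctuation and stop words.
documents = tokenizedDocument(lower(textData));
documents = erasePunctuation(documents);
documents = removeStopWords(documents);

% Bag-of-words model: one row per document, one column per vocabulary word.
bag = bagOfWords(documents);

% Frequency counts: the most frequent words across the corpus.
topWords = topkwords(bag, 5);
disp(topWords)

% TF-IDF weights as a sparse documents-by-vocabulary matrix.
weights = tfidf(bag);
disp(size(weights))
```

Replacing `bagOfWords` with `bagOfNgrams` yields counts of word sequences (bigrams by default) instead of single words.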

In the third session, students will explore data structures relevant to text analysis (e.g., graphs) and learn to apply clustering, classification, and data reduction methods to text data. Students will also learn to use MATLAB’s advanced capabilities for visualising text data.

In the fourth session, each student presents a case study and shares lessons learned from working with text data and text analysis in MATLAB. Where relevant, MATLAB will be compared with other software for text analysis (ATLAS.ti, R) and with programming languages (e.g., Python).

Instructor

  • Rob Grim has held positions as a Data Analyst, as a Research Data Specialist and as Head of Research Support. He currently works as Business/Economics & Data Librarian at the EUR and as a member of the Erasmus Data Service Centre (EDSC) team. Rob is a Carpentries instructor and has extensive experience with data pre-processing and data analytics in various scientific disciplines. He has an interest in statistics, cognitive science, and machine learning. Rob has a background in Psychology.

Facts & Figures

Fee
  • free for PhD candidates of the Graduate School
  • €575,- for non-members
  • consult our enrolment policy for more information
Tax
Not applicable
Offered by
Erasmus Graduate School of Social Sciences and the Humanities