5 min read

Apply signal detection theory on behavioral data in R

Signal detection theory (SDT) is one of the most popular modeling methods for behavioral experiments. This post uses a data set from a previously published paper to demonstrate how to apply SDT to behavioral data in R.

There are (at least) two main approaches to applying SDT (and subsequent statistics) in R: 1. Calculate d’ in each condition and then perform a repeated-measures ANOVA; 2. Fit a generalized linear mixed model with a probit link.

In this post, we will focus on the first (and easier) approach. If you are interested in the second approach, you may refer to another paper, where the data analysis code is available online.

Getting data

Data used here are a subset of Experiment 1b in Jin et al. (2022). You may get the data by installing the custom package with remotes::install_github("haiyangjin/psychr") and loading psychr::jin2022noncon. (Alternatively, you may download the data directly from GitHub.)

# to install psychr package used here:
# remotes::install_github("haiyangjin/psychr")
df_clean <- psychr::jin2022noncon 

head(df_clean, 10)

There are three independent variables:

  • Congruency: con vs inc;
  • Alignment: ali vs mis;
  • SD: same vs diff (we will treat same as signal and diff as noise).

The dependent variables are:

  • isSame: whether the response was “same” (1) or “different” (0);
  • Correct: whether the response was “correct” (1) or “incorrect” (0);
  • RT: response time (this DV will not be used in this post).

Apply signal detection theory in R

In this application, we treat same trials as “signal” and diff trials as “noise”. The general steps for calculating the sensitivity d’ are:

  1. Calculate the hit (hit) and false alarm rates (fa);
  2. Apply corrections if needed;
  3. Convert the rates (proportions) to standard (Z) values (z_hit & z_fa);
  4. Get the sensitivity d’ with z_hit - z_fa.

It is noteworthy that we may use either isSame or Correct to calculate the hit and false alarm rates, so we will try both here and compare the two approaches.
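The four steps can be sketched with toy numbers (the counts below are made up for illustration):

```r
# toy counts: 32 "same" responses on 40 same trials,
#             6 "same" responses on 40 diff trials
n_same <- 40; n_diff <- 40
hit <- 32 / n_same            # Step 1: hit rate (0.8)
fa  <- 6 / n_diff             #         false alarm rate (0.15)
# Step 2: corrections only apply when a rate is exactly 0 or 1 (not here)
z_hit <- qnorm(hit)           # Step 3: convert to standard (Z) values
z_fa  <- qnorm(fa)
d <- z_hit - z_fa             # Step 4: sensitivity d'
round(d, 2)                   # 1.88
```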

# load packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

With Correct

It is probably more popular to calculate the hit and false alarm rates with Correct, i.e., whether the response was correct on each trial.

The specific steps for calculating d’ with Correct are as follows:

df_sdt_correct <- df_clean %>% 
  group_by(SubjID, Congruency, Alignment, SD) %>% 
  summarize(acc = mean(Correct), # hit and correct rejection
            count = n(), # number of trials in each condition
            .groups = "drop") %>% 
  mutate(acc = if_else(SD=="diff", 1-acc, acc), # hit and false alarm (Step 1)
         acc = if_else(SD=="same"&acc==1, (2*count-1)/(2*count), acc), # correction for hit
         acc = if_else(SD=="diff"&acc==0, 1/(2*count), acc), # correction for false alarm
         z = qnorm(acc)) %>% # convert to standard Z value
  pivot_wider(id_cols = c(SubjID, Congruency, Alignment), # make hit and false alarm into two columns
              names_from = SD,   
              values_from = z) %>% 
  mutate(d = same - diff, # calculate d from hit and false alarm
         d = round(d,2)) %>% 
  select(-c(same, diff))
  
# display results as wide format
df_sdt_correct %>% 
  pivot_wider(names_from = c(Congruency, Alignment),
              values_from = d)
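The two correction lines above exist because qnorm() maps rates of exactly 0 or 1 to infinite Z values; the standard 1/(2N) adjustment keeps them finite (N here is a made-up trial count for illustration):

```r
n <- 20                          # toy number of trials in one condition
qnorm(1)                         # a perfect hit rate gives Inf
hit_corr <- (2*n - 1) / (2*n)    # 1 becomes 39/40 = 0.975
fa_corr  <- 1 / (2*n)            # 0 becomes  1/40 = 0.025
qnorm(hit_corr)                  # now finite (about 1.96)
```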

With isSame

The other approach is to use isSame, i.e., whether the response was “same” on each trial.

df_sdt_issame <- df_clean %>% 
  group_by(SubjID, Congruency, Alignment, SD) %>% 
  summarize(acc = mean(isSame), # hit and false alarm
            count = n(), # number of trials in each condition
            .groups = "drop") %>% 
  mutate(acc = if_else(SD=="same"&acc==1, (2*count-1)/(2*count), acc), # correction for hit
         acc = if_else(SD=="diff"&acc==0, 1/(2*count), acc), # correction for false alarm
         z = qnorm(acc)) %>% # convert to standard Z value
  pivot_wider(id_cols = c(SubjID, Congruency, Alignment), # make hit and false alarm into two columns
              names_from = SD,   
              values_from = z) %>% 
  mutate(d = same - diff, # calculate d from hit and false alarm
         d = round(d,2)) %>% 
  select(-c(same, diff))
  
df_sdt_issame %>% 
  pivot_wider(names_from = c(Congruency, Alignment),
              values_from = d) 

isSame vs. Correct

We may compare the data frames obtained from both approaches:

all.equal(df_sdt_correct, df_sdt_issame)
## [1] TRUE

The data frames obtained from both approaches are identical. For those wondering why the second approach is worth discussing: it is more consistent with how SDT is applied via a probit link in generalized linear mixed-effects modeling. For more, please see this paper and its open code.
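This equivalence is no accident: on diff trials a “same” response is an error, so isSame equals 1 - Correct there, while on same trials the two columns agree. The 1-acc step in the Correct pipeline simply undoes this recoding (a toy check with made-up responses):

```r
SD      <- c("same", "same", "diff", "diff")
isSame  <- c(1, 0, 1, 0)              # made-up responses
Correct <- ifelse(SD == "same", isSame, 1 - isSame)
# recoding accuracy on diff trials recovers the "same"-response rate
all(ifelse(SD == "diff", 1 - Correct, Correct) == isSame)  # TRUE
```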

Use library(psychr)

In addition to calculating d’ manually, I also made an R function (psychr::sdt()) to calculate d’.

# to install psychr package used here:
# remotes::install_github("haiyangjin/psychr")
library(psychr)

An example with the same data set is:

sdt_clean <- psychr::sdt(df_clean, 
                         SN = "SD", # the column containing the signal/noise information
                         isSignal = "isSame", # the column indicating whether the response was "signal"
                         SubjID = "SubjID",
                         group_other = c("Congruency", "Alignment"),
                         signal = "same") # which level in SN is signal

sdt_clean$df %>% 
  mutate(d = round(d,2)) %>% 
  pivot_wider(names_from = c(Congruency, Alignment),
              values_from = d) 
all.equal(df_sdt_correct, sdt_clean$df)
## [1] "Component \"d\": Mean relative difference: 0.001039788"
all.equal(df_sdt_issame, sdt_clean$df)
## [1] "Component \"d\": Mean relative difference: 0.001039788"

The d’ values from psychr::sdt() match those from the first two approaches up to rounding: the tiny difference reported by all.equal() arises only because d was rounded to two decimals in the manual pipelines. So you may use either custom code or psychr::sdt() to apply SDT in R!
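With d’ in hand, the repeated-measures ANOVA mentioned in the first approach can be run with base R’s aov() and an Error() term for the within-subject factors. A minimal sketch (the toy data frame below is a stand-in for df_sdt_correct computed above):

```r
# stand-in for df_sdt_correct (toy d' values, 8 subjects x 2 x 2 design)
set.seed(42)
toy <- expand.grid(SubjID     = factor(1:8),
                   Congruency = c("con", "inc"),
                   Alignment  = c("ali", "mis"))
toy$d <- rnorm(nrow(toy), mean = 1.5, sd = 0.5)

# repeated-measures ANOVA: within-subject factors go in the Error() term
fit <- aov(d ~ Congruency * Alignment +
             Error(SubjID / (Congruency * Alignment)),
           data = toy)
summary(fit)
```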