Tag&Tab: Pre‑training Data Detection in Large Language Models Using Keyword‑Based Membership Inference Attack

EMNLP 2025

Ben‑Gurion University of the Negev
Tag&Tab method diagram

Tag&Tab detects whether a given LLM was trained on specific documents by selecting named‑entity spans and low‑entropy keywords and examining the model's probabilities for them.

Abstract

Large language models (LLMs) have become essential tools for digital task assistance. Their training relies on collecting vast amounts of data, which may include copyright-protected or sensitive content.

Recent efforts to detect pretraining data in LLMs have focused on sentence- or paragraph-level membership inference attacks (MIAs), typically by analyzing token probabilities. However, these methods often fail to capture the semantic importance of individual words, leading to poor accuracy.

To overcome these limitations, we introduce Tag&Tab, a novel black-box MIA method for detecting pretraining data in LLMs. Our approach consists of two steps: Tagging keywords in the input using advanced NLP techniques, and Tabbing—querying the LLM to obtain and average the log-likelihoods of these keywords to compute a robust membership score.

Experiments on four benchmark datasets (BookMIA, MIMIR, PatentMIA, and The Pile) using several open-source LLMs of varying sizes show that Tag&Tab achieves an AUC improvement of 5.3–17.6% over previous state-of-the-art methods. These results highlight the critical role of word-level signals in identifying pretraining data leakage.

Method

The Tag&Tab pipeline is a two‑stage black‑box attack:

  • Tagging. A lightweight keyword‑extraction module marks informative tokens—low‑entropy words and named entities.
  • Tabbing. The attacker queries the target LLM and records the log‑likelihoods of the tagged keywords only. Their average yields a membership score, which is compared against a predefined threshold to decide whether the passage was part of the model's pre‑training data.

This focus on semantically meaningful words drastically reduces noise from filler tokens and boosts detection accuracy, especially for long texts.
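
A minimal sketch of this two‑stage scoring in Python is shown below. It is our own illustration rather than the authors' released implementation: the Tagging heuristic (capitalized or long words) stands in for the paper's NER‑ and entropy‑based keyword selection, and the model name, helper names, and decision threshold are placeholder assumptions.

```python
# Minimal Tag&Tab-style sketch (illustrative only, not the authors' code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # placeholder: any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def tag_keywords(words, k=4):
    """Tagging (placeholder heuristic): prefer capitalized or long words.
    The paper selects named entities and low-entropy keywords instead."""
    candidates = [w for w in words if w[:1].isupper() or len(w) > 7]
    return set(candidates[:k]) or set(words[:k])

@torch.no_grad()
def tag_and_tab_score(text, k=4):
    """Tabbing: average log-likelihood of tokens inside tagged keywords."""
    keywords = tag_keywords(text.split(), k)
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(input_ids).logits
    # log-probability of each token given its prefix
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lls = log_probs.gather(2, input_ids[:, 1:].unsqueeze(-1))[0, :, 0]
    keyword_lls = []
    for pos in range(1, input_ids.shape[1]):
        piece = tokenizer.decode(input_ids[0, pos]).strip()
        if piece and any(piece in kw for kw in keywords):
            keyword_lls.append(token_lls[pos - 1].item())
    return sum(keyword_lls) / len(keyword_lls) if keyword_lls else float("-inf")

score = tag_and_tab_score("The Project Gutenberg eBook of Moby-Dick ...", k=4)
is_member = score > -3.0  # placeholder threshold, calibrated on held-out data
```

Averaging over only the tagged keywords, rather than all tokens, is what distinguishes this score from the plain Loss baseline in the tables below.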


BookMIA Results

| Method               | LLaMa-7b    | LLaMa-13b   | LLaMa-30b   | Pythia-6.9b | Pythia-12b  | GPT-3.5     | Average     |
|----------------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Neighbor             | 0.65 / 0.27 | 0.71 / 0.38 | 0.90 / 0.73 | 0.65 / 0.26 | 0.71 / 0.36 | 0.96 / 0.88 | 0.76 / 0.48 |
| Loss                 | 0.59 / 0.25 | 0.70 / 0.43 | 0.89 / 0.74 | 0.62 / 0.24 | 0.69 / 0.32 | 0.97 / 0.90 | 0.74 / 0.48 |
| Zlib                 | 0.53 / 0.22 | 0.67 / 0.42 | 0.89 / 0.74 | 0.55 / 0.19 | 0.61 / 0.25 | 0.96 / 0.88 | 0.70 / 0.45 |
| Min-20% Prob         | 0.61 / 0.24 | 0.70 / 0.42 | 0.87 / 0.70 | 0.65 / 0.25 | 0.70 / 0.34 | 0.95 / 0.86 | 0.75 / 0.47 |
| MinK++-20% Prob      | 0.60 / 0.23 | 0.68 / 0.38 | 0.78 / 0.60 | 0.59 / 0.20 | 0.56 / 0.20 | 0.95 / 0.86 | 0.69 / 0.41 |
| Max-20% Prob         | 0.51 / 0.15 | 0.66 / 0.34 | 0.87 / 0.69 | 0.51 / 0.13 | 0.59 / 0.20 | 0.96 / 0.91 | 0.68 / 0.40 |
| ReCaLL               | 0.58 / 0.22 | 0.70 / 0.42 | 0.84 / 0.64 | 0.66 / 0.29 | 0.72 / 0.37 | 0.74 / 0.50 | 0.70 / 0.41 |
| DC-PDD               | 0.61 / 0.27 | 0.71 / 0.47 | 0.88 / 0.77 | 0.68 / 0.34 | 0.74 / 0.44 | 0.95 / 0.89 | 0.76 / 0.53 |
| Ours (Tag&Tab K=4)   | 0.69 / 0.28 | 0.78 / 0.48 | 0.91 / 0.76 | 0.72 / 0.30 | 0.75 / 0.36 | 0.97 / 0.90 | 0.80 / 0.51 |
| Ours (Tag&Tab K=10)  | 0.67 / 0.26 | 0.77 / 0.46 | 0.91 / 0.77 | 0.72 / 0.30 | 0.76 / 0.36 | 0.96 / 0.87 | 0.80 / 0.50 |

AUC and T@F5 scores for BookMIA membership inference; each cell shows AUC / T@F5, where T@F5 is the true-positive rate at a 5% false-positive rate.
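
Both metrics can be computed directly from membership scores with scikit-learn. The sketch below uses synthetic labels and scores as stand-ins, not data from the paper:

```python
# Compute AUC and TPR at 5% FPR (T@F5) from membership scores.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)              # 1 = member, 0 = non-member
scores = labels * 0.5 + rng.normal(0.0, 1.0, 200)  # synthetic membership scores

auc = roc_auc_score(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)
t_at_f5 = tpr[np.searchsorted(fpr, 0.05, side="right") - 1]
print(f"AUC = {auc:.2f}, T@F5 = {t_at_f5:.2f}")
```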

The Pile & MIMIR Results

Each cell lists AUC for Pythia models at 160M / 1.4B / 2.8B / 6.9B / 12B, in that order.

| Method               | DM Mathematics           | Github                   | Pile CC                  | C4                       | Ubuntu IRC               | Gutenberg                | EuroParl                 | Average                  |
|----------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
| Loss                 | 0.85/0.76/0.84/0.68/0.86 | 0.80/0.85/0.86/0.88/0.88 | 0.53/0.54/0.54/0.55/0.55 | 0.50/0.51/0.51/0.51/0.51 | 0.63/0.59/0.60/0.58/0.58 | 0.53/0.53/0.53/0.53/0.53 | 0.52/0.52/0.50/0.52/0.51 | 0.67/0.67/0.69/0.66/0.70 |
| Zlib                 | 0.68/0.59/0.66/0.55/0.69 | 0.84/0.88/0.89/0.90/0.90 | 0.51/0.53/0.53/0.54/0.54 | 0.51/0.51/0.51/0.51/0.51 | 0.52/0.52/0.53/0.54/0.54 | 0.53/0.60/0.53/0.53/0.53 | 0.51/0.51/0.50/0.51/0.51 | 0.63/0.63/0.65/0.62/0.66 |
| Min-20% Prob         | 0.61/0.53/0.70/0.50/0.82 | 0.80/0.85/0.86/0.88/0.88 | 0.52/0.53/0.54/0.55/0.55 | 0.51/0.51/0.51/0.51/0.50 | 0.58/0.57/0.52/0.51/0.52 | 0.53/0.53/0.53/0.53/0.60 | 0.53/0.54/0.52/0.50/0.51 | 0.61/0.61/0.65/0.61/0.69 |
| Max-20% Prob         | 0.63/0.67/0.61/0.58/0.51 | 0.78/0.85/0.85/0.87/0.86 | 0.52/0.53/0.53/0.53/0.54 | 0.51/0.50/0.50/0.50/0.50 | 0.69/0.69/0.71/0.68/0.67 | 0.67/0.73/0.60/0.67/0.67 | 0.53/0.54/0.55/0.53/0.55 | 0.61/0.64/0.62/0.62/0.60 |
| MinK++-20% Prob      | 0.81/0.79/0.66/0.81/0.73 | 0.57/0.57/0.61/0.63/0.66 | 0.51/0.50/0.52/0.53/0.53 | 0.52/0.51/0.51/0.50/0.50 | 0.52/0.51/0.52/0.54/0.61 | 0.67/0.60/0.60/0.60/0.60 | 0.54/0.53/0.51/0.51/0.51 | 0.60/0.59/0.57/0.62/0.61 |
| ReCaLL               | 0.80/0.73/0.78/0.64/0.86 | 0.79/0.76/0.74/0.71/0.72 | 0.53/0.54/0.54/0.55/0.55 | 0.51/0.51/0.51/0.51/0.51 | 0.72/0.64/0.69/0.64/0.60 | 0.53/0.80/0.67/0.73/0.80 | 0.51/0.51/0.51/0.55/0.57 | 0.67/0.64/0.65/0.62/0.68 |
| DC-PDD               | 0.90/0.86/0.86/0.85/0.86 | 0.87/0.91/0.92/0.93/0.93 | 0.54/0.55/0.56/0.57/0.57 | 0.51/0.51/0.51/0.51/0.51 | 0.58/0.53/0.53/0.53/0.53 | 0.53/0.60/0.60/0.53/0.53 | 0.51/0.52/0.50/0.51/0.54 | 0.70/0.70/0.72/0.71/0.70 |
| Ours (Tag&Tab K=4)   | 0.96/0.96/0.96/0.95/0.95 | 0.78/0.82/0.83/0.84/0.85 | 0.54/0.56/0.56/0.57/0.57 | 0.53/0.52/0.52/0.52/0.51 | 0.64/0.65/0.64/0.66/0.64 | 0.67/0.67/0.67/0.67/0.67 | 0.55/0.54/0.55/0.54/0.56 | 0.70/0.72/0.73/0.72/0.73 |
| Ours (Tag&Tab K=10)  | 0.92/0.92/0.93/0.92/0.95 | 0.79/0.83/0.84/0.85/0.86 | 0.55/0.56/0.56/0.57/0.56 | 0.53/0.52/0.52/0.52/0.51 | 0.61/0.63/0.62/0.61/0.62 | 0.60/0.67/0.67/0.67/0.67 | 0.56/0.54/0.55/0.54/0.55 | 0.70/0.71/0.73/0.72/0.72 |

AUC scores across MIMIR (the first four dataset columns) and The Pile (the remaining dataset columns).

BibTeX

@article{antebi2025tag,
  title={Tag\&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack},
  author={Antebi, Sagiv and Habler, Edan and Shabtai, Asaf and Elovici, Yuval},
  journal={arXiv preprint arXiv:2501.08454},
  year={2025}
}