Tag&Tab: Pre‑training Data Detection in Large Language Models Using Keyword‑Based Membership Inference Attack

EMNLP 2025

Ben‑Gurion University of the Negev
Tag&Tab method diagram

Tag&Tab detects whether a given LLM was trained on specific documents by selecting named‑entity spans and low‑entropy keywords and examining the model's probabilities for them.

Abstract

Large language models (LLMs) have become essential tools for digital task assistance. Their training relies on collecting vast amounts of data, which may include copyright-protected or sensitive content.

Recent efforts to detect pretraining data in LLMs have focused on sentence- or paragraph-level membership inference attacks (MIAs), typically by analyzing token probabilities. However, these methods often fail to capture the semantic importance of individual words, leading to poor accuracy.

To overcome these limitations, we introduce Tag&Tab, a novel black-box MIA method for detecting pretraining data in LLMs. Our approach consists of two steps: Tagging keywords in the input using advanced NLP techniques, and Tabbing—querying the LLM to obtain and average the log-likelihoods of these keywords to compute a robust membership score.

Experiments on four benchmark datasets (BookMIA, MIMIR, PatentMIA, and The Pile) using several open-source LLMs of varying sizes show that Tag&Tab achieves an AUC improvement of 5.3–17.6% over previous state-of-the-art methods. These results highlight the critical role of word-level signals in identifying pretraining data leakage.

Method

The Tag&Tab pipeline is a two‑stage black‑box attack:

  • Tagging. A lightweight keyword‑extraction module marks informative tokens—low‑entropy words and named entities.
  • Tabbing. The attacker queries the target LLM and records the log‑likelihoods of the tagged keywords only. Their average yields a membership score, which is compared against a predefined threshold to decide whether the passage was part of the model's pre‑training data.

This focus on semantically meaningful words drastically reduces noise from filler tokens and boosts detection accuracy, especially for long texts.
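
A minimal sketch of this two‑stage scoring in Python is shown below. It is our own illustration rather than the authors' released implementation: the Tagging heuristic (capitalized or long words) stands in for the paper's NER‑ and entropy‑based keyword selection, and the model name, helper names, and decision threshold are placeholder assumptions.

```python
# Minimal Tag&Tab-style sketch (illustrative only, not the authors' code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # placeholder: any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def tag_keywords(words, k=4):
    """Tagging (placeholder heuristic): prefer capitalized or long words.
    The paper selects named entities and low-entropy keywords instead."""
    candidates = [w for w in words if w[:1].isupper() or len(w) > 7]
    return set(candidates[:k]) or set(words[:k])

@torch.no_grad()
def tag_and_tab_score(text, k=4):
    """Tabbing: average log-likelihood of tokens inside tagged keywords."""
    keywords = tag_keywords(text.split(), k)
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(input_ids).logits
    # log-probability of each token given its prefix
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lls = log_probs.gather(2, input_ids[:, 1:].unsqueeze(-1))[0, :, 0]
    keyword_lls = []
    for pos in range(1, input_ids.shape[1]):
        piece = tokenizer.decode(input_ids[0, pos]).strip()
        if piece and any(piece in kw for kw in keywords):
            keyword_lls.append(token_lls[pos - 1].item())
    return sum(keyword_lls) / len(keyword_lls) if keyword_lls else float("-inf")

score = tag_and_tab_score("The Project Gutenberg eBook of Moby-Dick ...", k=4)
is_member = score > -3.0  # placeholder threshold, calibrated on held-out data
```

Averaging over only the tagged keywords, rather than all tokens, is what distinguishes this score from the plain Loss baseline in the tables below.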


BookMIA Results

| Method               | LLaMa-7b    | LLaMa-13b   | LLaMa-30b   | Pythia-6.9b | Pythia-12b  | GPT-3.5     | Average     |
|----------------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Neighbor             | 0.65 / 0.27 | 0.71 / 0.38 | 0.90 / 0.73 | 0.65 / 0.26 | 0.71 / 0.36 | 0.96 / 0.88 | 0.76 / 0.48 |
| Loss                 | 0.59 / 0.25 | 0.70 / 0.43 | 0.89 / 0.74 | 0.62 / 0.24 | 0.69 / 0.32 | 0.97 / 0.90 | 0.74 / 0.48 |
| Zlib                 | 0.53 / 0.22 | 0.67 / 0.42 | 0.89 / 0.74 | 0.55 / 0.19 | 0.61 / 0.25 | 0.96 / 0.88 | 0.70 / 0.45 |
| Min-20% Prob         | 0.61 / 0.24 | 0.70 / 0.42 | 0.87 / 0.70 | 0.65 / 0.25 | 0.70 / 0.34 | 0.95 / 0.86 | 0.75 / 0.47 |
| MinK++-20% Prob      | 0.60 / 0.23 | 0.68 / 0.38 | 0.78 / 0.60 | 0.59 / 0.20 | 0.56 / 0.20 | 0.95 / 0.86 | 0.69 / 0.41 |
| Max-20% Prob         | 0.51 / 0.15 | 0.66 / 0.34 | 0.87 / 0.69 | 0.51 / 0.13 | 0.59 / 0.20 | 0.96 / 0.91 | 0.68 / 0.40 |
| ReCaLL               | 0.58 / 0.22 | 0.70 / 0.42 | 0.84 / 0.64 | 0.66 / 0.29 | 0.72 / 0.37 | 0.74 / 0.50 | 0.70 / 0.41 |
| DC-PDD               | 0.61 / 0.27 | 0.71 / 0.47 | 0.88 / 0.77 | 0.68 / 0.34 | 0.74 / 0.44 | 0.95 / 0.89 | 0.76 / 0.53 |
| Ours (Tag&Tab K=4)   | 0.69 / 0.28 | 0.78 / 0.48 | 0.91 / 0.76 | 0.72 / 0.30 | 0.75 / 0.36 | 0.97 / 0.90 | 0.80 / 0.51 |
| Ours (Tag&Tab K=10)  | 0.67 / 0.26 | 0.77 / 0.46 | 0.91 / 0.77 | 0.72 / 0.30 | 0.76 / 0.36 | 0.96 / 0.87 | 0.80 / 0.50 |

AUC and T@F5 scores for BookMIA membership inference; each cell shows AUC / T@F5, where T@F5 is the true-positive rate at a 5% false-positive rate.
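
Both metrics can be computed directly from membership scores with scikit-learn. The sketch below uses synthetic labels and scores as stand-ins, not data from the paper:

```python
# Compute AUC and TPR at 5% FPR (T@F5) from membership scores.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)              # 1 = member, 0 = non-member
scores = labels * 0.5 + rng.normal(0.0, 1.0, 200)  # synthetic membership scores

auc = roc_auc_score(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)
t_at_f5 = tpr[np.searchsorted(fpr, 0.05, side="right") - 1]
print(f"AUC = {auc:.2f}, T@F5 = {t_at_f5:.2f}")
```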

The Pile & MIMIR Results

Each cell lists AUC for Pythia models at 160M / 1.4B / 2.8B / 6.9B / 12B, in that order.

| Method               | DM Mathematics           | Github                   | Pile CC                  | C4                       | Ubuntu IRC               | Gutenberg                | EuroParl                 | Average                  |
|----------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
| Loss                 | 0.85/0.76/0.84/0.68/0.86 | 0.80/0.85/0.86/0.88/0.88 | 0.53/0.54/0.54/0.55/0.55 | 0.50/0.51/0.51/0.51/0.51 | 0.63/0.59/0.60/0.58/0.58 | 0.53/0.53/0.53/0.53/0.53 | 0.52/0.52/0.50/0.52/0.51 | 0.67/0.67/0.69/0.66/0.70 |
| Zlib                 | 0.68/0.59/0.66/0.55/0.69 | 0.84/0.88/0.89/0.90/0.90 | 0.51/0.53/0.53/0.54/0.54 | 0.51/0.51/0.51/0.51/0.51 | 0.52/0.52/0.53/0.54/0.54 | 0.53/0.60/0.53/0.53/0.53 | 0.51/0.51/0.50/0.51/0.51 | 0.63/0.63/0.65/0.62/0.66 |
| Min-20% Prob         | 0.61/0.53/0.70/0.50/0.82 | 0.80/0.85/0.86/0.88/0.88 | 0.52/0.53/0.54/0.55/0.55 | 0.51/0.51/0.51/0.51/0.50 | 0.58/0.57/0.52/0.51/0.52 | 0.53/0.53/0.53/0.53/0.60 | 0.53/0.54/0.52/0.50/0.51 | 0.61/0.61/0.65/0.61/0.69 |
| Max-20% Prob         | 0.63/0.67/0.61/0.58/0.51 | 0.78/0.85/0.85/0.87/0.86 | 0.52/0.53/0.53/0.53/0.54 | 0.51/0.50/0.50/0.50/0.50 | 0.69/0.69/0.71/0.68/0.67 | 0.67/0.73/0.60/0.67/0.67 | 0.53/0.54/0.55/0.53/0.55 | 0.61/0.64/0.62/0.62/0.60 |
| MinK++-20% Prob      | 0.81/0.79/0.66/0.81/0.73 | 0.57/0.57/0.61/0.63/0.66 | 0.51/0.50/0.52/0.53/0.53 | 0.52/0.51/0.51/0.50/0.50 | 0.52/0.51/0.52/0.54/0.61 | 0.67/0.60/0.60/0.60/0.60 | 0.54/0.53/0.51/0.51/0.51 | 0.60/0.59/0.57/0.62/0.61 |
| ReCaLL               | 0.80/0.73/0.78/0.64/0.86 | 0.79/0.76/0.74/0.71/0.72 | 0.53/0.54/0.54/0.55/0.55 | 0.51/0.51/0.51/0.51/0.51 | 0.72/0.64/0.69/0.64/0.60 | 0.53/0.80/0.67/0.73/0.80 | 0.51/0.51/0.51/0.55/0.57 | 0.67/0.64/0.65/0.62/0.68 |
| DC-PDD               | 0.90/0.86/0.86/0.85/0.86 | 0.87/0.91/0.92/0.93/0.93 | 0.54/0.55/0.56/0.57/0.57 | 0.51/0.51/0.51/0.51/0.51 | 0.58/0.53/0.53/0.53/0.53 | 0.53/0.60/0.60/0.53/0.53 | 0.51/0.52/0.50/0.51/0.54 | 0.70/0.70/0.72/0.71/0.70 |
| Ours (Tag&Tab K=4)   | 0.96/0.96/0.96/0.95/0.95 | 0.78/0.82/0.83/0.84/0.85 | 0.54/0.56/0.56/0.57/0.57 | 0.53/0.52/0.52/0.52/0.51 | 0.64/0.65/0.64/0.66/0.64 | 0.67/0.67/0.67/0.67/0.67 | 0.55/0.54/0.55/0.54/0.56 | 0.70/0.72/0.73/0.72/0.73 |
| Ours (Tag&Tab K=10)  | 0.92/0.92/0.93/0.92/0.95 | 0.79/0.83/0.84/0.85/0.86 | 0.55/0.56/0.56/0.57/0.56 | 0.53/0.52/0.52/0.52/0.51 | 0.61/0.63/0.62/0.61/0.62 | 0.60/0.67/0.67/0.67/0.67 | 0.56/0.54/0.55/0.54/0.55 | 0.70/0.71/0.73/0.72/0.72 |

AUC scores across MIMIR (the first four dataset columns) and The Pile (the remaining dataset columns).

BibTeX

@article{antebi2025tag,
  title={Tag\&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack},
  author={Antebi, Sagiv and Habler, Edan and Shabtai, Asaf and Elovici, Yuval},
  journal={arXiv preprint arXiv:2501.08454},
  year={2025}
}