Large language models (LLMs) have become essential tools for digital task assistance. Their training relies on collecting vast amounts of data, which may include copyright-protected or sensitive content.
Recent efforts to detect pretraining data in LLMs have focused on sentence- or paragraph-level membership inference attacks (MIAs), typically by analyzing token probabilities. However, these methods often fail to capture the semantic importance of individual words, leading to poor accuracy.
To overcome these limitations, we introduce Tag&Tab, a novel black-box MIA method for detecting pretraining data in LLMs. Our approach consists of two steps: Tagging, which selects the semantically important keywords in the input text using established NLP techniques, and Tabbing, which queries the LLM for the log-likelihoods of those keywords and averages them into a robust membership score.
Experiments on four benchmark datasets (BookMIA, MIMIR, PatentMIA, and The Pile) using several open-source LLMs of varying sizes show that Tag&Tab achieves an AUC improvement of 5.3–17.6% over previous state-of-the-art methods. These results highlight the critical role of word-level signals in identifying pretraining data leakage.
The Tag&Tab pipeline is a two-stage black-box attack:

1. **Tag:** identify the most semantically informative keywords in the input text.
2. **Tab:** query the target LLM for the log-likelihood of each tagged keyword and average these values into a membership score.

This focus on semantically meaningful words drastically reduces noise from filler tokens and boosts detection accuracy, especially for long texts.
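The two stages above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `word_freq` table and the `token_logprob` callable are hypothetical stand-ins for a reference-corpus frequency list and a real LLM query, and rarity-based tagging is an assumed proxy for the paper's NLP-based keyword selection.

```python
import re

def tag_keywords(text, word_freq, k=4):
    """Tag step: select the K most informative words in the text.
    Informativeness is approximated here by corpus rarity; `word_freq`
    is an assumed reference frequency table, not the paper's tagger."""
    words = set(re.findall(r"[A-Za-z']+", text.lower()))
    # rarer words (lower frequency) are treated as more informative
    return sorted(words, key=lambda w: word_freq.get(w, 0))[:k]

def tab_score(keywords, token_logprob):
    """Tab step: average the model's log-likelihoods of the tagged
    keywords into one membership score (higher = more likely seen)."""
    lls = [token_logprob(w) for w in keywords]
    return sum(lls) / len(lls)

# Toy example with an assumed frequency table and a fake "model".
freq = {"the": 100, "of": 80, "cat": 50, "quantum": 1, "entanglement": 1}
tags = tag_keywords("The quantum entanglement of the cat", freq, k=2)
score = tab_score(tags, lambda w: {"quantum": -2.0, "entanglement": -4.0}.get(w, -10.0))
```

In a real attack, `token_logprob` would sum the log-probabilities the target LLM assigns to the keyword's tokens at their positions in the passage.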
Method | LLaMa-7b AUC | LLaMa-7b T@F5 | LLaMa-13b AUC | LLaMa-13b T@F5 | LLaMa-30b AUC | LLaMa-30b T@F5 | Pythia-6.9b AUC | Pythia-6.9b T@F5 | Pythia-12b AUC | Pythia-12b T@F5 | GPT-3.5 AUC | GPT-3.5 T@F5 | Average AUC | Average T@F5
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Neighbor | 0.65 | 0.27 | 0.71 | 0.38 | 0.90 | 0.73 | 0.65 | 0.26 | 0.71 | 0.36 | 0.96 | 0.88 | 0.76 | 0.48 |
Loss | 0.59 | 0.25 | 0.70 | 0.43 | 0.89 | 0.74 | 0.62 | 0.24 | 0.69 | 0.32 | 0.97 | 0.90 | 0.74 | 0.48 |
Zlib | 0.53 | 0.22 | 0.67 | 0.42 | 0.89 | 0.74 | 0.55 | 0.19 | 0.61 | 0.25 | 0.96 | 0.88 | 0.70 | 0.45 |
Min-20% Prob | 0.61 | 0.24 | 0.70 | 0.42 | 0.87 | 0.70 | 0.65 | 0.25 | 0.70 | 0.34 | 0.95 | 0.86 | 0.75 | 0.47 |
MinK++-20% Prob | 0.60 | 0.23 | 0.68 | 0.38 | 0.78 | 0.60 | 0.59 | 0.20 | 0.56 | 0.20 | 0.95 | 0.86 | 0.69 | 0.41 |
Max-20% Prob | 0.51 | 0.15 | 0.66 | 0.34 | 0.87 | 0.69 | 0.51 | 0.13 | 0.59 | 0.20 | 0.96 | 0.91 | 0.68 | 0.40 |
ReCaLL | 0.58 | 0.22 | 0.70 | 0.42 | 0.84 | 0.64 | 0.66 | 0.29 | 0.72 | 0.37 | 0.74 | 0.50 | 0.70 | 0.41 |
DC-PDD | 0.61 | 0.27 | 0.71 | 0.47 | 0.88 | 0.77 | 0.68 | 0.34 | 0.74 | 0.44 | 0.95 | 0.89 | 0.76 | 0.53 |
Ours (Tag&Tab K=4) | 0.69 | 0.28 | 0.78 | 0.48 | 0.91 | 0.76 | 0.72 | 0.30 | 0.75 | 0.36 | 0.97 | 0.90 | 0.80 | 0.51 |
Ours (Tag&Tab K=10) | 0.67 | 0.26 | 0.77 | 0.46 | 0.91 | 0.77 | 0.72 | 0.30 | 0.76 | 0.36 | 0.96 | 0.87 | 0.80 | 0.50 |
AUC and T@F=5% scores for BookMIA membership inference. Best in bold, second-best underlined.
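Both reported metrics, AUC and T@F=5% (true-positive rate at a 5% false-positive rate), can be computed directly from member and non-member scores. A small self-contained sketch, assuming raw scores are already available:

```python
def auc(member_scores, nonmember_scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else (0.5 if m == n else 0.0)
    return wins / (len(member_scores) * len(nonmember_scores))

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.05):
    """T@F: true-positive rate at the threshold that keeps the
    false-positive rate at or below `fpr`."""
    allowed = int(fpr * len(nonmember_scores))  # false positives we may admit
    ranked = sorted(nonmember_scores, reverse=True)
    threshold = ranked[allowed] if allowed < len(ranked) else ranked[-1]
    return sum(1 for m in member_scores if m > threshold) / len(member_scores)
```

Libraries such as scikit-learn provide equivalent routines (`roc_auc_score`, `roc_curve`); the explicit version above just makes the two metrics concrete.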
Method | DM Mathematics 160M | DM Mathematics 1.4B | DM Mathematics 2.8B | DM Mathematics 6.9B | DM Mathematics 12B | Github 160M | Github 1.4B | Github 2.8B | Github 6.9B | Github 12B | Pile CC 160M | Pile CC 1.4B | Pile CC 2.8B | Pile CC 6.9B | Pile CC 12B | C4 160M | C4 1.4B | C4 2.8B | C4 6.9B | C4 12B | Ubuntu IRC 160M | Ubuntu IRC 1.4B | Ubuntu IRC 2.8B | Ubuntu IRC 6.9B | Ubuntu IRC 12B | Gutenberg 160M | Gutenberg 1.4B | Gutenberg 2.8B | Gutenberg 6.9B | Gutenberg 12B | EuroParl 160M | EuroParl 1.4B | EuroParl 2.8B | EuroParl 6.9B | EuroParl 12B | Average 160M | Average 1.4B | Average 2.8B | Average 6.9B | Average 12B
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Loss | 0.85 | 0.76 | 0.84 | 0.68 | 0.86 | 0.80 | 0.85 | 0.86 | 0.88 | 0.88 | 0.53 | 0.54 | 0.54 | 0.55 | 0.55 | 0.50 | 0.51 | 0.51 | 0.51 | 0.51 | 0.63 | 0.59 | 0.60 | 0.58 | 0.58 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.52 | 0.52 | 0.50 | 0.52 | 0.51 | 0.67 | 0.67 | 0.69 | 0.66 | 0.70 |
Zlib | 0.68 | 0.59 | 0.66 | 0.55 | 0.69 | 0.84 | 0.88 | 0.89 | 0.90 | 0.90 | 0.51 | 0.53 | 0.53 | 0.54 | 0.54 | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 | 0.52 | 0.52 | 0.53 | 0.54 | 0.54 | 0.53 | 0.60 | 0.53 | 0.53 | 0.53 | 0.51 | 0.51 | 0.50 | 0.51 | 0.51 | 0.63 | 0.63 | 0.65 | 0.62 | 0.66 |
Min-20% Prob | 0.61 | 0.53 | 0.70 | 0.50 | 0.82 | 0.80 | 0.85 | 0.86 | 0.88 | 0.88 | 0.52 | 0.53 | 0.54 | 0.55 | 0.55 | 0.51 | 0.51 | 0.51 | 0.51 | 0.50 | 0.58 | 0.57 | 0.52 | 0.51 | 0.52 | 0.53 | 0.53 | 0.53 | 0.53 | 0.60 | 0.53 | 0.54 | 0.52 | 0.50 | 0.51 | 0.61 | 0.61 | 0.65 | 0.61 | 0.69 |
Max-20% Prob | 0.63 | 0.67 | 0.61 | 0.58 | 0.51 | 0.78 | 0.85 | 0.85 | 0.87 | 0.86 | 0.52 | 0.53 | 0.53 | 0.53 | 0.54 | 0.51 | 0.50 | 0.50 | 0.50 | 0.50 | 0.69 | 0.69 | 0.71 | 0.68 | 0.67 | 0.67 | 0.73 | 0.60 | 0.67 | 0.67 | 0.53 | 0.54 | 0.55 | 0.53 | 0.55 | 0.61 | 0.64 | 0.62 | 0.62 | 0.60 |
MinK++-20% Prob | 0.81 | 0.79 | 0.66 | 0.81 | 0.73 | 0.57 | 0.57 | 0.61 | 0.63 | 0.66 | 0.51 | 0.50 | 0.52 | 0.53 | 0.53 | 0.52 | 0.51 | 0.51 | 0.50 | 0.50 | 0.52 | 0.51 | 0.52 | 0.54 | 0.61 | 0.67 | 0.60 | 0.60 | 0.60 | 0.60 | 0.54 | 0.53 | 0.51 | 0.51 | 0.51 | 0.60 | 0.59 | 0.57 | 0.62 | 0.61 |
RECALL | 0.80 | 0.73 | 0.78 | 0.64 | 0.86 | 0.79 | 0.76 | 0.74 | 0.71 | 0.72 | 0.53 | 0.54 | 0.54 | 0.55 | 0.55 | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 | 0.72 | 0.64 | 0.69 | 0.64 | 0.60 | 0.53 | 0.80 | 0.67 | 0.73 | 0.80 | 0.51 | 0.51 | 0.51 | 0.55 | 0.57 | 0.67 | 0.64 | 0.65 | 0.62 | 0.68 |
DC-PDD | 0.90 | 0.86 | 0.86 | 0.85 | 0.86 | 0.87 | 0.91 | 0.92 | 0.93 | 0.93 | 0.54 | 0.55 | 0.56 | 0.57 | 0.57 | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 | 0.58 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.60 | 0.60 | 0.53 | 0.53 | 0.51 | 0.52 | 0.50 | 0.51 | 0.54 | 0.70 | 0.70 | 0.72 | 0.71 | 0.70 |
Ours (Tag&Tab K=4) | 0.96 | 0.96 | 0.96 | 0.95 | 0.95 | 0.78 | 0.82 | 0.83 | 0.84 | 0.85 | 0.54 | 0.56 | 0.56 | 0.57 | 0.57 | 0.53 | 0.52 | 0.52 | 0.52 | 0.51 | 0.64 | 0.65 | 0.64 | 0.66 | 0.64 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | 0.55 | 0.54 | 0.55 | 0.54 | 0.56 | 0.70 | 0.72 | 0.73 | 0.72 | 0.73 |
Ours (Tag&Tab K=10) | 0.92 | 0.92 | 0.93 | 0.92 | 0.95 | 0.79 | 0.83 | 0.84 | 0.85 | 0.86 | 0.55 | 0.56 | 0.56 | 0.57 | 0.56 | 0.53 | 0.52 | 0.52 | 0.52 | 0.51 | 0.61 | 0.63 | 0.62 | 0.61 | 0.62 | 0.60 | 0.67 | 0.67 | 0.67 | 0.67 | 0.56 | 0.54 | 0.55 | 0.54 | 0.55 | 0.70 | 0.71 | 0.73 | 0.72 | 0.72 |
Best AUCs bolded, second-best underlined, across MIMIR (first four dataset groups) and The Pile (last four). Column sizes refer to Pythia model scale.
@article{antebi2025tag,
title={Tag\&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack},
author={Antebi, Sagiv and Habler, Edan and Shabtai, Asaf and Elovici, Yuval},
journal={arXiv preprint arXiv:2501.08454},
year={2025}
}