The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

Published on arXiv, 2023

Recommended citation: Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, XiaoFeng Wang, Haixu Tang. The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks. arXiv 2023. https://arxiv.org/abs/2311.16189


Abstract

We propose a novel attack, termed the Janus attack, in which an adversary constructs a personally identifiable information (PII) association task and fine-tunes a large language model (LLM) on a minimal dataset of identity-related information, potentially causing the model to re-expose and leak hidden PII. We successfully demonstrated the attack on GPT-3.5, and our research was reported on the front page of The New York Times.
