Privacy-Aware Infrastructure in the AI-Native Era: An Asset Classification Case Study

Source Domain: engineering.fb.com

Privacy controls (systems that enforce retention, access, allowed-purpose, downstream-sharing, or anonymization policies) require a reliable understanding of data to function. Before such a control can operate effectively, it must know exactly what it is looking at. This can be complex, as demonstrated by a field simply named “age”: In one context, it might describe a person and require strict protections, while in another, it could be a numerical cache time-to-live (TTL) value in an infrastructure pipeline.

Figure 1: One column name, two governance outcomes. The identical field “age” is personal data when it describes a person, but ordinary system metadata when it is a cache TTL, which is why a name alone cannot determine the privacy requirement.
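
To make the “age” example concrete, here is a minimal sketch of context-dependent classification. Everything in it is an assumption made for illustration: the `ColumnContext` record, the keyword hint lists, and the label names are hypothetical and do not describe Meta's actual system, which would draw on far richer signals than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnContext:
    """Hypothetical context record for one column (illustrative fields only)."""
    column: str
    table: str
    description: str


# Hypothetical keyword hints, chosen only for this example.
PERSON_HINTS = ("user", "profile", "member", "person")
SYSTEM_HINTS = ("cache", "ttl", "seconds", "expiry")


def classify(ctx: ColumnContext) -> str:
    """Classify a column from its surrounding context, not from its name."""
    text = f"{ctx.table} {ctx.description}".lower()
    looks_personal = any(hint in text for hint in PERSON_HINTS)
    looks_system = any(hint in text for hint in SYSTEM_HINTS)

    if looks_personal and not looks_system:
        return "personal_data"
    if looks_system and not looks_personal:
        return "system_metadata"
    # Conflicting or missing signals: defer instead of guessing.
    return "needs_review"


profile_age = ColumnContext("age", "user_profile", "Age of the account holder in years")
cache_age = ColumnContext("age", "cdn_cache_entries", "Seconds since the entry was cached")
bare_age = ColumnContext("age", "t1", "")

print(classify(profile_age))  # personal_data
print(classify(cache_age))    # system_metadata
print(classify(bare_age))     # needs_review
```

All three columns share the name “age”, yet each gets a different outcome, and the one with no usable context is deferred rather than guessed.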

This is the everyday problem behind privacy-aware infrastructure (PAI): The inputs are noisy and probabilistic, but the outputs need to be precise enough to drive enforcement.

AI-native products make that problem harder. They introduce new data modalities, faster iteration cycles, derived features, embeddings, multimodal inputs, and changing policy interpretations. Manual review remains important for judgment and accountability, but it cannot keep up with the volume and pace of change.

At Meta, we apply a hybrid pattern for asset classification at scale:

Build a rich context before asking a model to reason.
Use LLMs to handle ambiguity, cold start, and novelty.
Keep human-reviewed labels separate from model-generated recommendations.
Distill stable behavior into deterministic, versioned rules for routine enforcement.
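
The shape of this pattern can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the production design: `Rule`, `Classifier`, the provenance strings, and the `llm` callable (a stand-in for a model call) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical deterministic rule with an explicit version for auditing."""
    version: str
    matches: Callable[[dict], bool]
    label: str


@dataclass
class Classifier:
    rules: List[Rule]
    llm: Callable[[dict], str]  # stand-in for a model call (assumption)
    # Human-reviewed labels and model output live in separate stores.
    human_labels: Dict[str, str] = field(default_factory=dict)
    model_recommendations: Dict[str, str] = field(default_factory=dict)

    def classify(self, asset_id: str, context: dict) -> Tuple[str, str]:
        """Return (label, provenance). Rules are tried before the model."""
        for rule in self.rules:
            if rule.matches(context):
                return rule.label, f"rule:{rule.version}"
        # No rule matched: ask the model and store its answer as a recommendation.
        label = self.llm(context)
        self.model_recommendations[asset_id] = label
        return label, "llm_recommendation"

    def record_review(self, asset_id: str, label: str) -> None:
        """Store a human-reviewed label without touching model recommendations."""
        self.human_labels[asset_id] = label


rules = [Rule("v1", lambda c: c.get("table", "").startswith("cache_"), "system_metadata")]
clf = Classifier(rules=rules, llm=lambda c: "personal_data")

routine = clf.classify("a1", {"table": "cache_entries"})
novel = clf.classify("a2", {"table": "user_profile"})
print(routine)  # ('system_metadata', 'rule:v1')
print(novel)    # ('personal_data', 'llm_recommendation')
```

Returning the provenance with every label makes it visible whether a decision came from a versioned rule or from a model recommendation. Once reviewed labels confirm a stable pattern, that pattern could be promoted into a new `Rule` so the routine path no longer needs the model.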

The end goal is not “LLMs everywhere.” Instead, it is a system that can learn from ambiguous signals while moving production enforcement toward logic that is low latency, replayable, and easier to audit.

In the common case, deterministic rules make the production decision, not the LLM. We use LLMs deliberately and narrowly, to interpret novel or ambiguous assets, and then to…
