Easier Inputs, Harder Questions: AI and the PA-X Database

https://peacerep.org/2026/06/01/ai-and-the-pa-x-database/

Publish Date: 2026-06-01 12:27:00

Source Domain: peacerep.org

A Different Kind of Difficulty

When I started working on the Peace Agreements Database (PA-X) as a data and research officer almost five years ago, a surprising share of the job was data pipeline work. PA-X itself is built on years of legal analysis, conflict expertise and careful coding decisions, as well as the data-processing infrastructure that supports them. Much of my role, especially at the start, was taken up by that infrastructure. Agreements arrived as scanned images of printed pages, sometimes folded, sometimes photographed at an angle in a badly lit room. Getting the text out of the image was a project in itself. Once we had the text, we had to wrestle it into a format we could actually work with. Then, if the agreement was not originally in English, we needed to translate it, and in every case we had to extract the key metadata. The structure of the document also had to be evident to the user, because where a provision sits within an agreement provides key context.

Five years on, much of this process is in far better shape, in large part because of improvements in artificial intelligence (AI). Text extraction is dramatically easier. Translation, at least during the discovery phase where we are just trying to work out whether a document belongs in the database, is fast and usually good enough; for the canonical version we add to PA-X, we still rely on a hybrid approach with a human translator. The way we code agreements is shifting too, from copying and pasting text into category fields towards AI-assisted tagging of segments, which has opened the database up to new kinds of analysis, such as how provisions sit in relation to each other within an agreement’s structure. Generative AI is genuinely useful, particularly for helping people interact with complex, often non-AI systems through natural-language queries at scale.
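To make the difference between category fields and segment tagging concrete, here is a minimal sketch, assuming a hypothetical schema in which each tagged segment records its position in the agreement as a path of section labels. It is not PA-X's actual data model: the `Segment` class, the `co_occurring_tags` helper and the tag names are invented for illustration. Because each tag stays attached to a position, we can ask which tags appear together within the same section, something a flat list of categories per agreement cannot answer.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Segment:
    """One tagged span of an agreement (hypothetical schema, not PA-X's)."""
    path: tuple[str, ...]  # position in the document, e.g. ("Chapter 1", "Article 2")
    text: str
    tags: set[str] = field(default_factory=set)


def co_occurring_tags(segments: list[Segment], depth: int = 1) -> Counter:
    """Count tag pairs that appear under the same section prefix of length `depth`.

    depth=1 groups segments by top-level section; depth=0 treats the whole
    agreement as one section, which is roughly what flat category fields give you.
    """
    tags_by_section: dict[tuple[str, ...], set[str]] = {}
    for seg in segments:
        section = seg.path[:depth]
        tags_by_section.setdefault(section, set()).update(seg.tags)

    pairs: Counter = Counter()
    for tags in tags_by_section.values():
        # Sort so each pair is counted under a single canonical ordering.
        for pair in combinations(sorted(tags), 2):
            pairs[pair] += 1
    return pairs


# Invented toy agreement: three tagged segments across two chapters.
agreement = [
    Segment(("Chapter 1", "Article 1"), "(text)", {"ceasefire"}),
    Segment(("Chapter 1", "Article 2"), "(text)", {"ceasefire", "monitoring"}),
    Segment(("Chapter 2", "Article 3"), "(text)", {"elections"}),
]

print(co_occurring_tags(agreement))           # grouped by chapter
print(co_occurring_tags(agreement, depth=0))  # whole agreement as one unit
```

Grouping by chapter pairs only "ceasefire" with "monitoring", while collapsing the agreement to a single unit pairs every tag with every other; the positional information is what separates the two answers.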

So, the data pipeline got easier. What I did not expect is that the conceptual work would get harder.

A…