The US government is developing AI technology to unmask anonymous writers, citing, among other goals, the tracking of disinformation campaigns and other malicious online activity


The research branch of the intelligence community is preparing to develop new artificial intelligence systems capable of identifying who or what wrote a specific text and, conversely, advanced systems designed to protect the privacy of authors.

“We believe this effort is potentially revolutionary for tracking disinformation campaigns, and for things like combating human trafficking and other nefarious activities that take place in online text forums and elsewhere using text,” said Dr. Timothy McKinnon in a recent interview.

McKinnon is the Intelligence Advanced Research Projects Activity (IARPA) program manager leading this work, which is dubbed HIATUS, short for Human Interpretable Attribution of Text Using Underlying Structure.

Writing anonymously may not hide your identity any longer if the US government’s latest artificial intelligence project proves successful.

The Office of the Director of National Intelligence (ODNI) has announced a new AI project managed by the Intelligence Advanced Research Projects Activity (IARPA), which focuses on linguistic fingerprinting technology. IARPA describes itself as investing in high-risk, high-reward research programs to address some of the toughest challenges facing intelligence community agencies and disciplines, and this project certainly counts as one.

This is the Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) program which aims to advance human language technology to the point where authors can be identified simply by their writing style. The goal is for HIATUS to be multilingual and able to differentiate between authors based on stylistic characteristics such as word choice, sentence formulation, and information organization.

While this may set off alarm bells for anyone wishing to write anonymously, IARPA points out that HIATUS can also protect an identity. By automatically altering the linguistic patterns of a known author, it should become impossible for an AI to determine who the author is. HIATUS also aims to be able to explain to novice users how it attributes a piece of writing to a specific author.

Quote from the Office of the Director of National Intelligence

The Intelligence Advanced Research Projects Activity (IARPA), the research and development arm of the Office of the Director of National Intelligence, today announced the launch of a program to design new artificial intelligence technologies capable of attributing authorship of writings and protecting the privacy of authors.

The Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) program represents the intelligence community’s latest research effort to advance human language technology. The resulting innovations could have far-reaching impacts, with the potential to counter malign foreign influence activities, identify counterintelligence risks, and help protect authors who might be endangered if their writings are linked to them.

The goals of the program are to create technologies that:

  • Perform multilingual authorship attribution by identifying stylistic features, such as word choice, sentence formulation, and organization of information, that help determine the authorship of a given text.
  • Protect the privacy of the author by modifying the linguistic patterns that indicate the identity of the author.
  • Implement explainable AI techniques that enable novice users to understand, trust, and verify why a particular text is attributable to a specific author or why a particular revision will preserve an author’s privacy.
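To make the first goal more concrete, here is a minimal Python sketch of style-based attribution. It is only an illustration and not the HIATUS approach, whose methods the announcement does not describe: the `FUNCTION_WORDS` list, the two features, and the function names are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "but"]

def style_vector(text):
    """Build a small stylistic feature vector for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = max(len(words), 1)
    # Word choice: relative frequency of each function word.
    vec = [counts[w] / total for w in FUNCTION_WORDS]
    # Sentence formulation: average sentence length in words, scaled down.
    vec.append(len(words) / max(len(sentences), 1) / 100.0)
    return vec

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def attribute(unknown, candidates):
    """Return the candidate author whose known sample is stylistically closest."""
    target = style_vector(unknown)
    return max(candidates,
               key=lambda name: cosine(target, style_vector(candidates[name])))
```

Because each feature is a named, countable quantity, a user can inspect which ones drove a match, which is the kind of explainability the third goal asks for. A real system would need far richer features and much more text per author.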

“Each of the selected performers brings a unique, fresh and compelling approach to the HIATUS challenge,” said program manager Dr. Tim McKinnon. “We have a strong chance of achieving our goals, providing much-needed capabilities to the intelligence community, and greatly expanding our understanding of human language variation using the latest advances in computational linguistics and deep learning.”

With the right model, IARPA believes it can identify consistencies in a writer’s style across different samples, modify those linguistic patterns to anonymize the writing, and do all of this in a way that is explainable to novice users, ODNI said. HIATUS AIs should also be language independent.

To develop robust models, HIATUS plans to treat its goals as an adversarial AI problem: authorship attribution and text anonymization are two sides of the same coin, so the HIATUS research teams will be pitted against each other.

Quote from IARPA

Humans and machines produce vast amounts of textual content every day. Text contains linguistic features that may reveal the identity of its author. To support and protect the IC mission, the goal of the HIATUS program is to develop multilingual tools to attribute authorship and protect the privacy of authors. These tools must implement new explainable artificial intelligence techniques to provide reliable and verifiable results to human users, regardless of the background of the author or the genre, subject and length of the document.

The HIATUS program views authorship attribution and privacy as different aspects of the same underlying challenge: understanding author-level linguistic variation by identifying the stable markers of individual authors across various text types. The program pits the performer teams’ attribution and privacy systems against each other. The teams compete to generate ever more faithful representations of the unique linguistic fingerprints of individual authors.

Successful systems are submitted to HIATUS Test and Evaluation (T&E) teams for blind evaluation against the opposing teams’ systems on an escrowed dataset comprising multilingual documents representing various text and author characteristics. Attribution systems are rated on their ability to match items by the same author in large collections, while privacy systems are rated on their ability to thwart attribution systems. The systems’ explainability will be assessed using a protocol developed by the performers, T&E teams and government partners at the start of the program. The HIATUS program begins at the end of 2022 and has a duration of 42 months.
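The scoring described in that quote can be sketched in a few lines of Python. This is a hedged illustration, not IARPA's evaluation protocol: `toy_attribute`, the word-overlap measure, and the `evaluate` helper are hypothetical stand-ins. The idea is simply that an attribution system is scored by how often it matches a document to its true author, and a privacy system by how far it pushes that score down.

```python
def word_overlap(a, b):
    """Jaccard overlap between the word sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def toy_attribute(text, collection):
    """Hypothetical attribution system: pick the author with the closest sample."""
    return max(collection,
               key=lambda author: word_overlap(text, collection[author]))

def accuracy(attribute, queries, collection):
    """Share of (true_author, text) queries matched to the right author."""
    hits = sum(1 for author, text in queries
               if attribute(text, collection) == author)
    return hits / len(queries)

def evaluate(attribute, privacy, queries, collection):
    """Attribution accuracy before and after the privacy system rewrites the queries."""
    before = accuracy(attribute, queries, collection)
    rewritten = [(author, privacy(text)) for author, text in queries]
    after = accuracy(attribute, rewritten, collection)
    return before, after
```

Here `privacy` is any function that takes a text and returns a rewritten version of it. The larger the drop from `before` to `after`, the better the privacy system did against that particular attribution system.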

McKinnon said that part of what HIATUS is trying to do is demystify some of the unknowns around neural language models (the focus of HIATUS’s efforts), which he says work well but are essentially black boxes: their developers do not know why they make a particular decision.

Ideally, McKinnon said, “when we’re doing authorship attribution or privacy, we’re able to really understand why the system is behaving the way it is, and be able to verify that it’s not picking up on spurious things and that it’s doing the right thing.”

If successful, HIATUS could have far-reaching impacts, ranging from countering foreign influence activities to identifying counterintelligence risks and protecting authors whose work could put them at risk, ODNI said. McKinnon adds that HIATUS AIs may also be able to identify whether text is machine-generated rather than human-authored.

Approximately 70% of IARPA’s completed research is handed over to other government partners for implementation, a stage in which IARPA is not involved: the agency develops technology but does not turn it into an operational product. That said, the odds are in favor of HIATUS, according to the intelligence agency.

Don’t expect this technology to appear in full form any time soon: now that HIATUS has started, it will take 42 months (three and a half years) until the program is complete, and only then will other government agencies likely be able to take HIATUS for a spin, if McKinnon and his team are successful.

A technical approach reminiscent of that of GANs

If you look at the images on the ThisPersonDoesnotExist.com website, you might think you’ve come across random high school portraits or photos from some other source. Yet every photo on the site was created using a special type of artificial intelligence algorithm called a generative adversarial network (GAN).

Each time the site is refreshed, it presents a lifelike image of a person’s face. Phillip Wang, a software engineer at Uber, created the page to demonstrate the capabilities of GANs, then posted it to the public group “Artificial Intelligence and Deep Learning”.

The underlying code that made this possible, called StyleGAN, was written by Nvidia and was described in a paper that had not yet been peer-reviewed at the time. This type of neural network has the potential to revolutionize gaming and 3D modeling technology, but like almost any technology, it could also be used for more sinister purposes. For example, deepfakes, or computer-generated images overlaid on existing images or videos, can be used to spread fake news stories or other hoaxes. Wang therefore chose to build the web page in order to raise awareness.

How do GANs work?

Put simply, a GAN consists of two networks working against each other. The first is fed raw data, which it breaks down and uses to attempt to create an image. It then submits this image to a second network, which has been trained on real photos or images. This second network judges the image and reports its verdict to the first. If the image does not look like the expected result, the first network tries again. If the result passes, the first network learns that it is on the right track and gradually works out what a good image looks like. Once it is sufficiently trained, it can mass-produce such images.
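The back-and-forth between the two networks can be sketched with a deliberately tiny Python example. It is a toy under stated assumptions, not StyleGAN: the "images" are single numbers, the real data is drawn from a normal distribution centered on 4, the generator is a line `a * noise + b`, and the discriminator is a single logistic unit. All names and hyperparameters are illustrative.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator: fake = a * noise + b
    w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 1.0)   # a sample of "real" data
        z = rng.gauss(0.0, 1.0)      # random noise fed to the generator
        fake = a * z + b

        # Discriminator step: descend on -log D(real) - log(1 - D(fake)),
        # i.e. learn to score real samples high and fakes low.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w -= lr * (-(1.0 - d_real) * real + d_fake * fake)
        c -= lr * (-(1.0 - d_real) + d_fake)

        # Generator step: descend on -log D(fake),
        # i.e. adjust the fake so the updated discriminator scores it higher.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1.0 - d_fake) * w
        a -= lr * grad_fake * z
        b -= lr * grad_fake
    return a, b, w, c
```

Each loop iteration mirrors the description above: the generator proposes a sample, the discriminator judges it against real data, and both adjust. The same adversarial pattern is what links GANs to HIATUS, where attribution systems play the judge and privacy systems play the forger.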

Sources: DNI, IARPA

And you?

What do you think of such a project?
Do the stated objectives, such as tracking disinformation campaigns and other malicious activities online, justify, in your opinion, the deployment of such an arsenal? Why?
Are you reassured when it is indicated that the privacy of the authors will be protected?
Do you see any potential for abuse?
On the technical side, what do you think of this approach, which is reminiscent of that of GANs?

