Dimosthenis Karatzas

Universitat Autònoma de Barcelona

Dimosthenis Karatzas is a professor at the Universitat Autònoma de Barcelona and associate director of the Computer Vision Center, Barcelona, where he leads the Vision and Language research group. He received his Ph.D. from the University of Liverpool. His main research interests are robust reading systems, document image analysis, human-document interaction, and human perception modeling.

Keynote abstract - Trustworthy End-to-End Document Understanding

Automatic document processing enables the vast majority of daily interactions with and between institutions. From a research viewpoint, document understanding is a multimodal endeavour that combines the visual analysis of document images with language processing. Document Visual Question Answering (DocVQA), introduced in 2019, quickly reshaped the state of the art and turned document understanding into a key benchmark for modern vision-language models (VLMs). However, it is easy to show that such models tend to memorize information from their training set and often hallucinate responses that draw on it, which raises concerns about the treatment of sensitive information in the training data. In this talk I will give an overview of the research carried out in the context of the European Lighthouse on Safe and Secure AI, aimed mainly at addressing privacy concerns, but also adversarial robustness and explainability, in the domain of document understanding.