On the verification of technology systems in ethics: validation, reproducibility, and responsibility.

Published in Ethics


The necessity of regulating and standardising AI systems from an ethical point of view has been argued by many scholars and experts in the field. Giuseppe Primiero, researcher at the University of Milan, highlighted the importance of creating formal criteria for assessing compliance with regulations and standards. Formal criteria are more easily understood by the technicians who design AI systems, and they can be verified quickly by experts external to the company designing the system. During the International Conference on Artificial Intelligence: Ethics, Law and Governance in Milan, Primiero suggested using white boxes as tools for assessing formal criteria such as demographic parity and equality of opportunity. Demographic parity must be built into the training of the model (a priori), so that the result does not depend on protected attributes such as gender; equality of opportunity is instead a criterion for checking the model obtained (a posteriori): there should be no difference in outcomes between subjects with similar relevant characteristics. In theoretical computer science, a 'specification' is a description with normative power. Given that, it is important to underline that specifications for responsibility are almost absent in a black-box AI model: the system would be determined (and judged) only by its training, and if that training is obscure, it is impossible to determine the model's normative correctness a priori. Delving into the potential repercussions of using the system or model helps identify areas where responsibility may come into play: the selection and labelling processes, the control of results, and the satisfaction of responsibility requirements through formal criteria. The strategy would therefore be to construct a transparent box expressing the desirable features, against which we can measure the distance (i.e. statistical divergences) from the closed-box model.
By comparing the results of the white box with those of the black box, it is possible to verify whether the model meets the expectations set in the design phase and behaves in the ways we consider ethical, without losing the efficiency of opaque models. This solution helps overcome the opacity of complex systems such as Large Language Models (LLMs), which are often considered black boxes: inaccessible and therefore difficult to understand and assess. But how can we verify whether such a system or model is really responsible for its actions?
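To make the idea concrete, the following is a minimal illustrative sketch — not Primiero's formalism — of what checking such formal criteria might look like in practice: two fairness gaps (demographic parity as an a priori criterion, equality of opportunity as an a posteriori one) and one simple statistical divergence (total variation) for comparing a transparent reference model with an opaque one. All function names, the toy data, and the choice of divergence are our own assumptions for illustration.

```python
# Illustrative sketch only: hypothetical checks for two formal fairness
# criteria, plus a simple divergence for white-box vs black-box comparison.

def demographic_parity_gap(preds, groups):
    """Gap in positive-prediction rates between groups (a priori criterion)."""
    rate = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rate[g] = sum(members) / len(members)
    vals = sorted(rate.values())
    return vals[-1] - vals[0]

def equal_opportunity_gap(preds, labels, groups):
    """Gap in true-positive rates between groups (a posteriori criterion):
    among qualified subjects (label == 1), outcomes should not differ by group."""
    tpr = {}
    for g in set(groups):
        pos = [p for p, l, gg in zip(preds, labels, groups) if gg == g and l == 1]
        tpr[g] = sum(pos) / len(pos)
    vals = sorted(tpr.values())
    return vals[-1] - vals[0]

def total_variation(p, q):
    """Total-variation distance between two output distributions -- one simple
    'statistical divergence' for measuring white-box vs black-box distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Toy data: binary predictions, true labels, and a protected attribute.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_gap(preds, groups))        # → 0.5
print(equal_opportunity_gap(preds, labels, groups)) # → 0.5
print(total_variation([0.7, 0.3], [0.6, 0.4]))      # ≈ 0.1
```

In this toy example both gaps are large (0.5), signalling that the model violates both criteria; a real assessment along the lines sketched above would compute such metrics on the black box's actual outputs and compare them against the thresholds declared in the transparent reference.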


While we agree with the points raised by Primiero, we also see some limitations of this approach. As noted by Viola Schiaffonati (professor at the Politecnico di Milano) during the same conference, a reproducibility crisis is ongoing in the AI field. The scientific validity of many computational models is contested by scholars who claim that those systems' results are not reproducible in the way results in other scientific disciplines are. Prof. Schiaffonati raised this point to question whether reproducibility should be the sole criterion for the scientific validation of experimental results. But this point also highlights the need to assess current AI systems differently from how researchers have classically assessed the security of systems. If AI results are not reproducible, since these systems are constantly evolving and learning in different ways, then verifying their security and ethicality cannot be a process limited to the design phase. Using white boxes at the beginning can be very useful for finding biases within the data set or in the way the system treats data. But new biases and unsafe behaviours may emerge during later phases, undermining the potential trustworthiness of the system.


For these reasons, we consider it fundamental for a correct assessment of the fairness of AI systems to create an iterative process that analyses the continuous evolution of the system. Multiple field experts in 'Frontier AI Regulation: Managing Emerging Risks to Public Safety' (OpenAI) described the unexpected ways in which current AI systems are evolving. New information, new data, and new uses of the system create unpredictable scenarios and results. By structuring ethical considerations into an iterative process, this approach fosters a comprehensive strategy: it enables a thorough exploration of potential ethical challenges, promotes inclusive decision-making, and ensures that ethical principles are not only acknowledged but actively embedded in the design and implementation of technology, adapting the assessment to every new evolution. Companies should collaborate with external experts in order to gain a holistic view of the ongoing changes in the system and full awareness of the ethicality of the systems they produce.

Having formal criteria is of paramount importance for the objectivity of the assessment, but those criteria should be applied repeatedly and consistently to be effective. Moreover, clear definitions of terms from an ethical point of view are necessary to avoid lexical confusion and conceptual vacuum. In fact, one of the main reasons for the inefficiency of AI policies is the confusion created by vague and undefined terms. Only the use of formal definitions that are continuously updated in light of the evolution of emerging technologies can ensure clarity and the effectiveness of policies in the AI field. The work on the AI Act has underlined the necessity of finding shared lexical definitions in order to agree on conceptual terms: the question "what is an AI?" turned out not to be so easy to answer. This is where the power of language is transformed into normative power, where the conceptualisation of ideas translates into principles, norms and policies through the use of ethics.
This is why any system, model, or corpus of law needs ethics to effectively reach its goals. If we want democratic and fair access to these systems, we must always take into account that their design is embedded in a social and political context. In this sense, embedding ethics in the design of technology is a way of opening the 'black box' and letting people look inside it.


This analysis originates from the international conference of philosophy "Artificial Intelligence: Ethics, Law and Governance", held on 23-24 November 2023 at the University of Milan and organised by PHILTECH – Research Center for the Philosophy of Technology.




Primiero Giuseppe – Università degli Studi di Milano

Verifica e Validità: criteri formali per una AI più giusta e affidabile (Verification and Validity: formal criteria for a fairer and more reliable AI)


Schiaffonati Viola – Politecnico di Milano

La crisi della riproducibilità e il suo impatto sulla Trustworthy AI (The reproducibility crisis and its impact on Trustworthy AI)


Van den Hoven Jeroen – Delft University of Technology

The design turn in ethics of technology


Frontier AI Regulation: Managing Emerging Risks to Public Safety (OpenAI), arXiv:2307.03718
