
Sentiment Analysis in the PERSONA project

Published in Social Acceptance by AA

What is Sentiment Analysis in the PERSONA context, and is it privacy-friendly?

Over the last couple of decades, well-known advances in computing power and techniques have enabled the development of a practical application of computational linguistics known as natural language processing (NLP). One of the main applications of NLP is sentiment analysis (SA), that is, the interpretation of the emotions underlying a text. SA allows analysts to identify, categorize, and extract subjective information from written source material, helping an organization understand the sentiments that people express online about it or its work.

At CyberEthics Lab., we recently conducted an SA of Twitter and Google data for the PERSONA project, whose goal is to develop an integrated impact assessment method for evaluating the all-around acceptability of automated technologies in border crossing (1). PERSONA intended to disseminate questionnaires about these technologies to large samples of travellers at airports, ports, and train stations. However, the travel restrictions triggered by the COVID-19 pandemic made that effort much less likely to succeed, so SA was adopted as a valid alternative for gathering stakeholder feedback. Our goal was to assess the feelings users expressed online towards border control technologies such as aerial, land, and water drones, artificial intelligence and facial recognition, and automated gates.

Our analyses were conducted in full compliance with Twitter's and Google's terms and conditions and did not involve the processing of personal data. In the case of Google, data is anonymized by the search engine itself (2). In the case of Twitter, which informs its users that the content of the messages they decide to post is public (3), our algorithm discarded any personal information during the mining process.
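
The mining code itself is not published here. The following is a minimal sketch of how personal information could be discarded during mining. It assumes each collected Tweet arrives as a Python dict with hypothetical field names (text, created_at, lang, plus identifying fields such as id and user). Only non-identifying fields are kept, and @-handles and links inside the text are masked.

```python
import re

# Hypothetical field names: the real Twitter API payload differs, and this
# is not the project's actual mining code.
KEEP_FIELDS = ("text", "created_at", "lang")

MENTION = re.compile(r"@\w+")       # @-handles that could identify users
URL = re.compile(r"https?://\S+")   # links that could point to profiles


def strip_personal_info(tweet: dict) -> dict:
    """Keep only non-identifying fields and mask handles and links in the text."""
    clean = {k: tweet[k] for k in KEEP_FIELDS if k in tweet}
    text = clean.get("text", "")
    text = MENTION.sub("@user", text)
    text = URL.sub("<url>", text)
    clean["text"] = " ".join(text.split())  # normalise whitespace
    return clean
```

For example, a Tweet carrying an id, a user object and the text "@alice the e-gates at https://t.co/x were fast" is reduced to its language and the text "@user the e-gates at <url> were fast".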

The Google service used for this analysis is Google Trends, one of the many services provided by Google LLC. At the time of our analysis, the EU-US Privacy Shield, which gave us the legal basis for the analysis, had not yet been invalidated by the Court of Justice of the European Union. We used Google Trends to measure interest in a particular topic over time in a given geographical area (4).
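
Google Trends does not report raw search counts. Each data point is divided by the total searches in its geography and time range, and the results are scaled from 0 to 100 relative to the peak. The sketch below illustrates that scaling. It rests on a hypothetical assumption that raw counts are available, which they are not, so the function trends_index and its inputs are illustrative only.

```python
def trends_index(searches: list[int], totals: list[int]) -> list[int]:
    """Scale a keyword's share of all searches to a 0-100 range (hypothetical inputs).

    searches[i] is the keyword's count in period i; totals[i] is the count of
    all searches in the same geography and period. 100 marks the peak share.
    """
    shares = [s / t if t else 0.0 for s, t in zip(searches, totals)]
    peak = max(shares)
    if peak == 0:
        return [0] * len(shares)
    return [round(100 * x / peak) for x in shares]
```

For instance, trends_index([40, 80, 30], [1000, 1000, 500]) gives [50, 100, 75]. The third period has fewer searches than the first but a larger share of total traffic, so it scores higher.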

Methodology

Our analysis was conducted in two distinct phases over the course of three months. In April 2020, we produced preliminary results, which confirmed the soundness of our method. Similar data were analysed more extensively in July 2020. The frequency of social network mentions and web searches may have increased following specific events. For example, the use of facial recognition by American law enforcement agencies during activist protests may have spurred Google users to learn more about the technology. Even so, the associated sentiments were shown to remain stable over time.

We analysed how frequently the following technologies were searched for and mentioned:

  • Artificial Intelligence;
  • Automated Border Control gates;
  • Drones (aerial, land, water);
  • e-passport;
  • Facial recognition;
  • Fingerprint enrolment;
  • Iris enrolment & ID;
  • Sensors.

Google Trends

To extract trends from the Google Trends data, we used a powerful and fast algorithm called STL (Seasonal and Trend decomposition using Loess). Although STL is an iterative procedure, it is well suited to "long" time series and to Big Data contexts involving multidimensional analysis. In our work, we grouped Google search queries (5) into the following four dimensions (i.e. categories):

  • All categories (AC)
  • Intelligence and Counterterrorism (IC)
  • Law Enforcement (LE)
  • Public Safety (PS)

The first category accounts for the total number of searches performed in a given country or at the global level. It indicates the overall direction taken by a single keyword over time. The other three categories were chosen not only for their relevance to the project but also because they were relatively likely to provide valuable information. Especially for “technical” topics, a category-driven investigation often yields poor results or none at all, because too specific a category contains too few searches.
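
Google Trends reports a value of 0 when there is not enough data for a term, so a category that is too narrow tends to appear as a series dominated by zeros. A hypothetical screening step could discard such categories before further analysis. The function name and the 0.5 threshold below are illustrative assumptions, not the project's rule.

```python
def usable_categories(series_by_category: dict[str, list[int]],
                      max_zero_share: float = 0.5) -> list[str]:
    """Return the categories whose weekly series are not dominated by zero weeks.

    The default threshold is an illustrative assumption, not the project's rule.
    """
    keep = []
    for name, series in series_by_category.items():
        if not series:
            continue  # no data at all for this category
        zero_share = sum(1 for v in series if v == 0) / len(series)
        if zero_share <= max_zero_share:
            keep.append(name)
    return keep
```

With made-up series such as {"AC": [40, 55, 60, 52], "LE": [0, 0, 12, 0], "PS": [0, 8, 10, 0]}, the function keeps AC and PS and drops LE.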

As for the time window, all the time series were sampled at a weekly frequency, starting from the first week of January 2015. Our analysis is based on data extracted up until July 7, 2020, a span of about five and a half years.

Twitter

We applied the NRC Emotion Lexicon, a popular lexicon developed by the National Research Council of Canada, to categorize sentiment. The Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Despite cultural differences, most affective norms have been shown to be stable across languages. Non-English sentences can therefore be translated automatically and their words compared against the same list. In practical terms, the average Twitter user needs to reach a certain level of emotional activation before tweeting: this internal motivation must be “strong enough” to follow through with posting on the social network. Given the strength of the inner state behind it, the words in a user's Tweet can be compared with the associations listed in the Lexicon. This generates scores that measure the strength of the emotions and sentiments the Tweet expresses.
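
A minimal sketch of lexicon-based scoring follows. It assumes the Tweet has already been translated into English. It uses a tiny hypothetical lexicon in the NRC word-to-emotion format, and the entries are illustrative, not taken from the actual NRC file. The function counts, for each emotion or sentiment, how many words in the Tweet carry that association.

```python
from collections import Counter

# Toy lexicon in the NRC format (word -> associated emotions/sentiments).
# These entries are illustrative assumptions, NOT copied from the real NRC file.
TOY_LEXICON: dict[str, set[str]] = {
    "invasive": {"fear", "negative"},
    "spying": {"anger", "fear", "negative"},
    "safe": {"trust", "positive"},
    "fast": {"anticipation", "positive"},
}


def score_tweet(text: str, lexicon: dict[str, set[str]] = TOY_LEXICON) -> Counter:
    """Count, per emotion/sentiment, how many tokens of the text carry it."""
    counts: Counter = Counter()
    for token in text.lower().split():
        token = token.strip(".,!?;:'\"()#@")  # drop surrounding punctuation
        counts.update(lexicon.get(token, ()))  # unknown words add nothing
    return counts
```

For example, "Facial recognition feels invasive, like spying!" scores 2 for fear, 2 for negative and 1 for anger. Summing such counters over many Tweets gives an aggregate emotional profile for a technology.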

The Twitter analysis was based on data extracted in April and July 2020. We collected Tweets geolocated both near four of the ten busiest airports in Europe (6) (London Heathrow, Madrid Adolfo Suárez, Paris Charles de Gaulle, and Rome Fiumicino) and far from them.
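
How "near" an airport was defined is not specified here. As a sketch under that uncertainty, a geolocated Tweet could be assigned to an airport using the haversine great-circle distance and a hypothetical 10 km radius. The airport coordinates below are approximate.

```python
import math

# Approximate airport coordinates (latitude, longitude in degrees).
AIRPORTS = {
    "LHR": (51.4700, -0.4543),   # London Heathrow
    "MAD": (40.4983, -3.5676),   # Madrid Adolfo Suárez
    "CDG": (49.0097, 2.5479),    # Paris Charles de Gaulle
    "FCO": (41.8003, 12.2389),   # Rome Fiumicino
}
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_airport(point: tuple[float, float], radius_km: float = 10.0):
    """Return the code of the closest airport if within radius_km, else None ("far")."""
    code, dist = min(((c, haversine_km(point, xy)) for c, xy in AIRPORTS.items()),
                     key=lambda pair: pair[1])
    return code if dist <= radius_km else None
```

A point in central London, about 23 km from Heathrow, returns None with the default radius, so it would fall in the "far" group.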

Results

While SA results do not entirely replace questionnaire data, they have provided valuable additional insights. Some of the technologies examined did not yield significant results in one or more types of search. This is itself telling, since it may point to weak interest in these technologies and little desire to comment on them.

Nevertheless, the overall picture emerging from the SA is that the public has mixed feelings towards these technologies. Based on the available data, the two of the eight that seem to have the best chance of being readily accepted by the general public in border crossing scenarios are e-passports and artificial intelligence.

Both technologies might succeed mainly because they induce a positive sense of familiarity. In the case of electronic documents, the sensitive data are supposed to be collected and processed by an official entity that is very likely familiar to the people involved. As for artificial intelligence, the general public is likely to know that automated methods based on it are routinely employed in many contexts (e.g. in supermarkets, to study purchasing behaviour).

Older technologies seem to be more positively accepted than newer ones. Notoriously more invasive technologies, whose users and beneficiaries might be perceived as separate entities, also seem to instill a greater sense of fear than those whose users and beneficiaries are one and the same person. To illustrate this final point, consider facial recognition. It shows fluctuating interest on Google Trends and is generally perceived negatively on Twitter. It shares with drones, which are perceived similarly, the characteristic that its operators and its subjects are different (i.e. law enforcement officers and citizens, respectively). E-passports, on the other hand, are operated by their subjects themselves.

For a more in-depth report, feel free to contact us.

Notes

1 During the year 2019, following a growing trend, approximately 3.8 billion passengers flew across the globe according to the United Nations’ International Civil Aviation Organization. As rigid infrastructures, airports require large investments to increase capacity, which is nevertheless bound by physical limits. Therefore, a logical solution for accommodating a growing number of passengers is not to expand airport structures, but rather to expedite procedures through automation.
2 https://support.google.com/trends/answer/4365533?hl=it&ref_topic=6248052
3 Clause 1.2, third paragraph https://twitter.com/en/privacy. The language in the privacy policy meets the criteria set forth at page 13, Box 4 of the document at the following link: https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/ethics/h2020_hi_ethics-data-protection_en.pdf
4 Rogers, S. What is Google Trends data — and what does it mean? Jul. 1, 2016 https://medium.com/google-news-lab/what-is-google-trends-data-and-what-does-it-mean-b48f07342ee8
5 Especially in Europe, Google search queries are a good proxy for overall internet searches, since Google has a volume market share for searches of over 90%. https://gs.statcounter.com/search-engine-market-share/all/europe
6 Eurostat, Airport traffic data by reporting airport and airlines, Consulted: 20-07-2020 https://appsso.eurostat.ec.europa.eu/nui/submitViewTableAction.do

Service involved

Assessment of technology impact on privacy
We help our clients and partners achieve their business goals while addressing ethics, privacy and cybersecurity concerns in a way that prevents the conflicts, sanctions and financial losses resulting from a lack of ethical and legal compliance with applicable national and European regulations. All information technologies must respect fundamental human rights and protect people's rights to private life, personal data and freedom. The new EU General Data Protection Regulation (GDPR), which replaced the Data Protection Directive in all EU member states in May 2018, introduces many new obligations for companies and a comprehensive set of rights for data subjects. These include the right to an effective judicial remedy against a controller or a processor and the right to compensation. Therefore, in addition to facing enforcement action, data controllers and processors may be taken to court and required to pay compensation to data subjects for infringements of the GDPR.
Our approach to helping our clients avoid these kinds of issues is a holistic service made up of the following main components:
- providing a Data Protection Officer to drive the organisation's legal compliance;
- mapping the data processed by the organisation to measure its impact on ethical principles and the legal framework;
- assessing the cybersecurity mechanisms used by the organisation's technologies;
- conducting an impact assessment of all data processing mechanisms to identify ethical, legal and security risks;
- making recommendations on the organisational and technical means needed to comply with the legal framework while ensuring data confidentiality (preserving authorized restrictions on information access and disclosure, including personal privacy and proprietary information protection), integrity (assurance that data is not modified or deleted in an unauthorized and undetected manner), availability (ensuring timely and reliable access to and use of information) and accountability (supporting non‐repudiation, deterrence, fault isolation, intrusion detection and prevention, and after‐action recovery and legal action).
Social acceptance of technologies assessment
Connected, disruptive technologies permeate all aspects of our daily lives and challenge the very foundations of human rights, such as the right to privacy or freedom of speech. Human values such as trust, accountability, and dignity and the social acceptance of technologies could be said to influence each other. We support our clients in conceiving a novel way of aligning the thus-far divergent concepts of sustainability, ethics impact, and technological innovation. By combining these three concepts, we respond to the need for a socially responsible innovation ecosystem. We do so by developing a tailored methodology for assessing users'/citizens' social acceptance of technologies, a fundamental driver of technology market adoption. Our social acceptance framework includes six fundamental dimensions (i.e. perception, motivation, trust, awareness, capacity enabling, and accountability) over which social acceptability is measured and assessed. The assessment follows a two-step approach. The first step is an online Sentiment Analysis (SA), which creates structured and actionable knowledge from the web. The second is the engagement of our client's stakeholders (e.g. relevant target groups, associations of citizens, domain operators, decision makers, etc.) in co-creating the technology and communicating about its social acceptance.
Responsible Research & Innovation
We love discovering and staying on top of new research to continuously advance our knowledge and transform it into responsible innovation, taking into account effects and potential impacts on ethics, privacy and data protection. We help national and international partners handle ethical, legal and cybersecurity concerns in both the research process and the project outcomes. This support covers legal advice on the involvement of human beings in research activities, analysis of the national and regional legal framework applicable to the technology being implemented, and recommendations for its secure and compliant development. We are a multidisciplinary team that promotes the inclusion of legal and ethical concerns in technology design, researching and producing new knowledge and best practices towards a conscious and transparent adoption of technology.