Bridging the Green Datascape: Data Scraping for Sustainability Purposes

Tarantino, Matteo
First author
2020

Abstract

Environmental data are abundant, but they are disseminated on the Internet through idiosyncratic methods, without unified disclosure standards. As a result, the data are available but difficult to aggregate, synthesize and interpret. This article explores the roots and implications of “scraping” practices, i.e. the automatic, unauthorized collection of data published on the web, carried out by public and private actors for sustainability purposes. Drawing on the concept of ‘datascape’ to describe the overall socio-technical environment in which these data circulate, the paper examines two case studies. The first, EDGI/DataRefuge, is a systematic attempt to collect and preserve environmental data and documents published by environmental management agencies, which are at risk of removal under US Government policies. The second, WorldAQI, is a platform that collects, refines and publishes on maps the air quality indexes of hundreds of countries in the world. The first case reveals the human component of large-scale efforts to scrape highly heterogeneous data, and highlights the resources such efforts require. The second case shows how collating, formatting and normalizing heterogeneous data to maximize readability has implications for data quality and representativeness. In conclusion, we observe that, through data scraping, stakeholders can enhance the spatial and temporal comparability of data and open new avenues for public participation in complex decision-making processes.
English
Tarantino, M., Bridging the Green Datascape: Data Scraping for Sustainability Purposes, «COMUNICAZIONI SOCIALI», 2020; (3): 346-358 [http://hdl.handle.net/10807/168256]

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/168256