Data analysis

Murals in Granada, photo by Pietro Masi, CC BY-SA

This activity is led by Prof. Karol Jan Borowiecki, University of Southern Denmark (SDU).

INCULTUM aims to identify, collect and analyse data on various dimensions of urban and regional development, cultural tourism and a wide selection of socio-economic indicators. The data collection measures are defined ex ante together with the pilot coordinators.

The data analysis draws on three approaches to data collection.

First, the pilot studies are closely monitored, and a wide range of data are collected before, during and after each intervention, that is, the introduction of innovative approaches to urban and regional development. In this regard, the pilot site of Bibracte has been particularly active in this field recently (2018-2019), setting up an observatory that characterises visitors and their behaviour in the territory, as well as how both visitors and local actors perceive the territory's heritage offer (with the aim of raising local actors' interest in tourism issues). This protocol will be shared among the project partners so that comparable data can be collected and performance measured across the different contexts.

Second, official statistics from regional and national statistical offices, as well as from international sources such as Eurostat, the OECD and the Compendium of Cultural Policies & Trends, are collected, translated, harmonised and processed.

Third, data are collected from novel and unique sources using creative, digital approaches such as Google Trends, which is used to measure how the prominence of particular destinations changes over time across regions and languages. Data are also scraped with self-developed computer programs (written in Python) that extract big data from social media platforms (e.g., Instagram, Facebook, Twitter) and travel blogs. These data are then used to approximate the prominence and reputation of tourism destinations or to identify problems associated with (over)tourism in those locations.
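As a minimal sketch of how scraped posts might be turned into a prominence measure, the hypothetical function below assumes the posts have already been collected as (month, text) pairs. It computes the monthly share of posts that mention a destination and rescales the series so that the peak month equals 100, analogous to how Google Trends reports relative search interest. The function name, the sample posts and the simple substring matching are illustrative assumptions, not the project's scraping code.

```python
from collections import Counter


def prominence_index(posts, destination):
    """Hypothetical helper: monthly share of posts mentioning `destination`,
    rescaled so the peak month equals 100 (all zeros if never mentioned).

    `posts` is an iterable of (month, text) pairs, e.g. ("2021-07", "...").
    Matching is a plain case-insensitive substring test, a simplifying
    assumption; real pipelines would need language-aware matching.
    """
    totals, hits = Counter(), Counter()
    needle = destination.lower()
    for month, text in posts:
        totals[month] += 1
        if needle in text.lower():
            hits[month] += 1
    # Dividing by the month's total volume removes swings in overall activity.
    shares = {m: hits[m] / totals[m] for m in sorted(totals)}
    peak = max(shares.values(), default=0.0)
    return {m: round(100 * s / peak) if peak else 0 for m, s in shares.items()}


# Hypothetical sample of already-scraped posts.
posts = [
    ("2021-06", "Sunset at the Alhambra, Granada"),
    ("2021-06", "Hiking in the Sierra Nevada"),
    ("2021-07", "Granada murals tour"),
    ("2021-07", "Beach day in Malaga"),
    ("2021-08", "Too crowded in Granada, queues everywhere"),
    ("2021-08", "Granada again!"),
    ("2021-08", "Leaving Granada"),
    ("2021-08", "Granada street art"),
]

print(prominence_index(posts, "Granada"))
```

Normalising by each month's total post volume keeps a general rise in platform activity from being mistaken for growing interest in one destination.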

The collected data are then analysed to establish convincingly how each of the innovative approaches to urban and regional development relates to cultural tourism. State-of-the-art econometric methods are used, with a particular focus on identifying causal relationships rather than mere correlations. The project also explores in depth the mechanisms (i.e., the channels) through which an intervention affects development.


Use of data analysis results

The results of the data analysis are expected to provide important insights into how to design effective and sustainable cultural policy and to facilitate the mapping of good practices.

The research design constructs synthetic control groups that make it possible to assess the effect of each introduced innovative approach on urban and regional development. This involves building a weighted combination of tourist destinations as controls, to which the treatment destination (i.e., the one where the pilot study is conducted) is compared. The comparison makes it possible to estimate what would have happened to the treatment destination had it not received the treatment. Because the control group is weighted to match the treatment destination as closely as possible before the intervention, the method can account for the effects of confounders that change over time. It may be complemented by difference-in-differences techniques, which estimate the differential effect of a treatment (the innovative approach implemented in a pilot study) on a ‘treatment destination’ versus a ‘control destination’. Finally, a regression discontinuity design will be used: a quasi-experimental pretest-posttest design that identifies the causal effects of an intervention by exploiting a cutoff or threshold above or below which the intervention is assigned.
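To make the first two estimators concrete, here is a minimal, dependency-free Python sketch on made-up data. The synthetic control weights are found by least squares under the usual constraints (non-negative, summing to one), solved here with projected gradient descent; this solver, all function names and all figures are illustrative assumptions rather than INCULTUM's implementation, and applied work would typically also match on covariates and rely on a dedicated package.

```python
from statistics import fmean


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(treated_pre, donors_pre, steps=5000):
    """Non-negative donor weights summing to 1 that best reproduce the treated
    destination's pre-intervention outcomes (least squares, projected gradient)."""
    J, T0 = len(donors_pre), len(treated_pre)
    # Safe step size: 1 / (2 * trace(X'X)) is below 1 / Lipschitz constant.
    lr = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    w = [1.0 / J] * J
    for _ in range(steps):
        resid = [sum(w[j] * donors_pre[j][t] for j in range(J)) - treated_pre[t]
                 for t in range(T0)]
        grad = [2.0 * sum(resid[t] * donors_pre[j][t] for t in range(T0))
                for j in range(J)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(J)])
    return w


def synthetic_gap(w, treated_post, donors_post):
    """Treated outcome minus the synthetic counterfactual, period by period."""
    return [y - sum(wj * d[t] for wj, d in zip(w, donors_post))
            for t, y in enumerate(treated_post)]


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on period means."""
    return ((fmean(treated_post) - fmean(treated_pre))
            - (fmean(control_post) - fmean(control_pre)))


# Hypothetical visitor numbers (thousands): 4 pre- and 2 post-intervention periods.
treated_pre, treated_post = [9, 11, 11, 13], [16, 15]
donors_pre = [[12, 8, 14, 10], [6, 14, 8, 16], [20, 20, 4, 4]]  # destinations A, B, C
donors_post = [[11, 13], [15, 9], [5, 5]]

w = synthetic_control_weights(treated_pre, donors_pre)
print([round(x, 3) for x in w])
print([round(g, 2) for g in synthetic_gap(w, treated_post, donors_post)])
print(diff_in_diff(treated_pre, treated_post, donors_pre[0], donors_post[0]))
```

In this toy example the treated destination's pre-intervention path is reproduced exactly by half of A and half of B, so the post-intervention gaps (about 3 and 4 thousand visitors) are read as the effect. The 2x2 difference-in-differences with destination A alone as the control gives the same average effect of 3.5 only because A's mean change happens to equal that of the synthetic control; with a poorly chosen control the two estimates would diverge.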

In this light, decisions on how, where and when to implement the pilot studies are essential. For example, an intervention may be targeted only at potential visitors based in a certain geographic area. Comparing observations that lie just on either side of the boundary of this area (e.g., potential visitors reached by the intervention vs. those not reached) then makes it possible to estimate the causal effect of the intervention near the boundary much as a randomised controlled trial would, because people just inside and just outside the area can be expected to be similar in all other respects. Finally, the reliability of the results is tested with placebo tests and a range of robustness checks, such as adding further control variables or re-estimating on sub-samples.
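The boundary design can be sketched as follows, assuming a sharp rule (every potential visitor whose signed distance to the edge of the targeted area is at least zero is treated), a local linear fit with a uniform kernel on each side, and noise-free made-up data. The function names, bandwidths and figures are hypothetical; real applications would choose the bandwidth in a data-driven way and report standard errors. The sketch also runs a placebo test (a fake cutoff placed among untreated observations should show no jump) and a simple robustness check (varying the bandwidth).

```python
from statistics import fmean


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_estimate(distance, outcome, cutoff=0.0, bandwidth=5.0):
    """Sharp regression discontinuity: jump in the fitted outcome at `cutoff`,
    from separate local linear fits within `bandwidth` on each side."""
    left = [(d - cutoff, y) for d, y in zip(distance, outcome)
            if cutoff - bandwidth <= d < cutoff]
    right = [(d - cutoff, y) for d, y in zip(distance, outcome)
             if cutoff <= d <= cutoff + bandwidth]
    a_left, _ = linear_fit(*zip(*left))
    a_right, _ = linear_fit(*zip(*right))
    return a_right - a_left  # intercepts are the fitted values at the cutoff


# Hypothetical data: signed distance (km) to the edge of the targeted area and
# a booking rate (%) that rises smoothly with distance, plus a jump of 3 points
# for those inside the area (distance >= 0).
distance = list(range(-10, 11))
outcome = [20 + 0.5 * d + (3.0 if d >= 0 else 0.0) for d in distance]

# Robustness check: the estimate should be stable across bandwidths.
for bw in (3, 5, 8):
    print(bw, rd_estimate(distance, outcome, bandwidth=bw))

# Placebo test: a fake cutoff among untreated observations only.
untreated = [(d, y) for d, y in zip(distance, outcome) if d < 0]
print("placebo", rd_estimate(*zip(*untreated), cutoff=-5.0))
```

On this noise-free data every bandwidth recovers the built-in jump of 3.0 and the placebo estimate is zero. With real data, estimates that shift markedly across bandwidths, or a sizeable placebo jump, would signal that the design's assumptions may not hold.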

The unique and novel data sources and data collection approaches, including the pioneering application of machine learning tools to collect big data in tourism, allow INCULTUM to shed new light on various dimensions of cultural tourism and how it interrelates with urban and regional development.

The project pushes the boundaries of knowledge forward with novel insights into the short- to medium-term effects of the various pilot interventions. The scientific community will benefit from the data collected in the project, which are made available as Open Data together with an extensive catalogue describing the data collection process and each of the variables.


Presentations and videos of the Data Workshop provide insights into cultural statistics and into data collection and analysis in tourism, contributing to the design of effective and sustainable cultural policy through the mapping and replication of good practices.

Presentations and video recordings from the INCULTUM Data Workshop


The following deliverables are available for download:

D3.2 Intermediate findings presentation (PDF)

D3.3 Findings analysis report (PDF)