Big data and infrastructure
Big data is a widely used term for the collection and analysis of quantities of data too vast to be evaluated using traditional methods. The term covers a broad range of scientific research and innovation areas. The Earth and space sciences have long been using and interpreting big datasets, and few other scientific disciplines can claim such an extensive history of dealing with them. The Earth sciences’ use of big data is aiding advances in climate change modelling, energy usage and generation, Earth observation and mapping, and natural hazard analysis, amongst others. In addition, the experience geoscientists have gained in handling these data can be applied to understanding other big data applications.
Current EU policy
In July 2014, the European Commission outlined a new strategy on big data with the aim of accelerating ‘the transition towards a data-driven economy in Europe’. The strategy aims to create better services and products for EU citizens. The main focus areas include:
- environment (tackling climate change and reducing energy consumption using new national and local datasets)
- agriculture (safer food and increased productivity through a more efficient use of natural resources and by using real time weather and crop data)
- manufacturing and retail
Current and future big data challenges lie in the data analysis, namely in handling the volume, frequency, variety and veracity of the collected data. The amount of data collected globally is currently expanding at a rate of 40 % per year. Robust analytical methods must be ensured in order to extract meaningful and reliable conclusions.
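To give a rough sense of what a 40 % annual growth rate means, the short Python sketch below works out the implied doubling time and the growth over a decade. It assumes the rate stays constant and compounds once per year, which is a simplification rather than something the strategy states, and the function names are hypothetical ones chosen for this illustration.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double under constant compound growth.

    Assumption: the rate is constant and compounds once per year.
    """
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def growth_factor(annual_growth_rate: float, years: float) -> float:
    """Factor by which the quantity has multiplied after the given number of years."""
    return (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.40  # the 40 % per year figure quoted in the text
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
    print(f"Growth over 10 years: x{growth_factor(rate, 10):.1f}")
```

Under these assumptions the data volume doubles roughly every two years and grows almost thirty-fold over a decade, which is why analytical methods that scale with the data are a recurring concern.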
Recent EGU papers covering big data in the geosciences
- Maximizing ozone signals among chemical, meteorological, and climatological variability (ACP, 2018)
- Accessing diverse data comprehensively – CODM, the COSYNA data portal (OS, 2016)
- Core operational Sentinel-3 marine data product services as part of the Copernicus Space Component (OS, 2016)
- A century of sea level data and the UK’s 2013/14 storm surges: an assessment of extremes and clustering using the Newlyn tide gauge record (OS, 2014)
- Implementation and scaling of the fully coupled Terrestrial Systems Modeling Platform (TerrSysMP v1.0) in a massively parallel supercomputing environment – a case study on JUQUEEN (IBM Blue Gene/Q) (GMD, 2014)
If you have a comment or suggestion, or if you would like more information, please email firstname.lastname@example.org.