Making the most of big data

The European Central Bank's Per Nymand-Andersen discusses embracing big data and what this could mean for the industry

The concept of big data can be hard to pin down – how would you define it? 

Per Nymand-Andersen: Big data can be defined as a source of information and intelligence resulting from the recording of operations, or from the combination of such records. There are many examples of recorded operations – records of supermarket purchases, robot and sensor information in production processes, satellite sensors, images, as well as behaviour, event and opinion-driven records from search engines, including information from social media and speech recognition tools. The list seems endless, with more and more information becoming public and digital as a result – for example, the use of credit and debit payments, trading and settlement platforms, and housing, health, education and work-related records.

Should central banks take advantage of big data? 

Per Nymand-Andersen: Yes, though central banks do not have to be ahead of the curve. They should not miss this opportunity to extract economic signals in near real time and learn from the new methodologies. Big data can help to enhance economic forecasts and obtain more precise and timely evaluations of the impact of policies. 

We may also have to manage expectations to strike a balance between the perceived big data hype and the reality of the future. Big data is part of the ‘data service evolution’ – it is borderless and impacts the structure and functioning of financial markets, our economies and societies. 

Therefore, central banks must monitor closely, assess and adapt. We must explore and experiment with new technologies to avoid falling behind the technological curve, and to anticipate their impact on central banks’ policies and their transmission throughout the economy.  

Central banks may need to join forces in exploring and assessing the usefulness of selective big data sources and their relevance to central banking, as has been initiated by the Irving Fisher Committee on Central Bank Statistics (IFC), for instance. 

What are some of the most promising uses for big data at central banks?  

Per Nymand-Andersen: The availability and accessibility of large data sources is a new and rich field for statisticians, economists, econometricians and forecasters, and is relatively unexploited for central banking purposes. It would be of particular interest if such sources could help to detect trends and turning points within the economy, providing supplementary and more timely information compared with central banks’ traditional toolkits.

Central banks already have access to a large amount of statistics, intelligence, structured data and information, which are regularly fed into their decision-making processes in the course of fulfilling their mandates. Central banks therefore appear well positioned to apply their existing models and econometric techniques to new datasets and develop innovative methods of obtaining timelier or new statistics and indicators. These supplementary statistics may provide further insights that support central bankers’ decision-making processes, and assess the subsequent impact and associated risks of decisions on the financial system and real economy. Big data could help central bankers obtain a near real-time snapshot of the economy, as well as providing early warning indicators to identify turning points in the economic cycle. 
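As a stylised illustration of that idea – using synthetic data and an arbitrary threshold, not an ECB indicator – a rolling z-score on a high-frequency series can flag unusually sharp deviations as candidate early-warning signals:

    # Toy early-warning flag: large deviations of a daily series from its recent trend.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    dates = pd.date_range("2020-01-01", periods=200, freq="D")
    activity = pd.Series(np.cumsum(rng.normal(0.05, 1.0, 200)), index=dates)  # synthetic daily indicator

    window = 30
    z_score = (activity - activity.rolling(window).mean()) / activity.rolling(window).std()

    # Flag dates where the indicator deviates sharply from its recent behaviour.
    warnings = z_score[z_score.abs() > 2]
    print(warnings.tail())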

Additionally, there are new methods and techniques being developed by academic and private researchers to deal with new big data sources. For instance, text-mining techniques open up new possibilities to assess what John Maynard Keynes referred to as “animal spirits”, which cannot be captured in standard economic equations and quantitative variables. Sentiment indexes harvested from internet articles, social media and internet search engines may, by applying adequate statistical algorithms, provide useful and timely insight into consumer sentiment, market uncertainty or systemic risk assessments. Furthermore, new machine-learning techniques and tools are used to provide predictions on the basis of large, complex datasets. 

Turning to banking supervision and regulation, there is a clear drive for regulatory authorities to obtain more micro-level data. Since the financial crisis, regulators have been keen to expand their data collections to improve their ability to monitor financial risks and vulnerabilities. New big data sources may support these supervisory tasks – such sources include online operations in trading platforms, credit card payment transactions, mobile banking data, records related to securities settlement and cash payment systems, clearing houses, repurchase operations and derivatives settlement, and commercial and retail transactions.

How is the European Central Bank (ECB) putting this to work?  

Per Nymand-Andersen: Collaborative efforts among central banks have been initiated by the IFC, bringing central banks together to showcase pilot projects on the use of big data for central banking purposes, applying the same methodological framework and working to a similar timetable. The ECB is a member of the IFC and is actively contributing to this work. The pilots are organised around four main themes, extracting insight from the following: the internet, including search engines and price information; commercial sources, such as credit card operations; administrative sources, such as fiscal and corporate balance-sheet data; and financial markets, looking at liquidity, transactions and prices. Further work is also being undertaken on text analysis of media reports. 

Likewise, the ECB has the money market statistical reporting (MMSR) regulation and AnaCredit, which provide high-volume, intraday, transaction-level micro data on interbank loans and banks’ loans to corporates and households, and we are experimenting with text analysis and search engines. 

TARGET2-Securities – which provides a unique platform for the settlement of securities transactions in near real time – is a one-stop shop for securities settlement operated by central banks, with central securities depositories as its core customers. It could likewise be a potential source of data for exploring intraday and daily securities transactions.

There’s currently a lot of discussion around machine learning – is its impact being exaggerated?  

Per Nymand-Andersen: No, though it may take some time for it to enter the mainstream. Central banks are well equipped to experiment with these new techniques. Machine-learning – or artificial intelligence – techniques may be utilised to find new and interesting patterns in large datasets, visualise such datasets, provide summary statistics and predictions, and even generate new hypotheses and theories derived from the patterns observed.
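As a stylised example of this kind of exercise – synthetic data and a generic off-the-shelf model, not an ECB application – a tree-based learner can be fitted to a large dataset and inspected for which inputs drive its predictions:

    # Fit a random forest to synthetic data and inspect predictive performance and drivers.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X = rng.normal(size=(10_000, 5))                      # five candidate indicators
    y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=10_000)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

    print("out-of-sample R^2:", round(model.score(X_test, y_test), 3))
    print("feature importances:", model.feature_importances_.round(3))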

How can central banks overcome the communication challenges presented by black-box models?

Per Nymand-Andersen: Communication is key to conveying and obtaining support for your policies. This includes providing the underlying evidence and assessments that lead to these decisions. Off-the-shelf models and their components should be transparent and available for replication. They should comply with the same statistical quality standards that already prevail, such as transparency of sources, methodology, reliability and consistency over time. Big data and models remain tools to assist with the decision-making process, but they are only of substantial value if they are appropriately understood and analysed. There is no automaticity in complex decision-making.

What other pitfalls exist in the use and interpretation of big data?  

Per Nymand-Andersen: One misperception about big data that I hear frequently is that we do not need to worry about sample bias and representativeness, because large volumes of information supposedly supersede standard sampling theory, with big data sources providing de facto census-type information. This is incorrect.

For instance, access to all tweets would mean access to the characteristics of the entire tweeting population – corporates and members of the general public using a Twitter account. But the characteristics of this population may differ from those who do not tweet and are therefore excluded from the sample dataset. Thus, not all groups are represented by data sourced from Twitter and their characteristics may vary across countries, cultures, nationalities and ages. Therefore additional information is required to adjust the figures and to gross these up to the entire population as part of securing the quality of data – particularly if the aim is to extract signals and indicators on household sentiment, or to start producing household indexes using Twitter.
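A stylised example of such an adjustment, with entirely hypothetical figures: group-level sentiment measured in a skewed sample is re-weighted from the sample’s age composition to the population’s.

    # Post-stratification: gross a sample-based index up to known population shares.
    sample_sentiment = {"18-34": 0.30, "35-54": 0.10, "55+": -0.05}  # mean sentiment by age group
    sample_share = {"18-34": 0.60, "35-54": 0.30, "55+": 0.10}       # age shares among tweeters
    population_share = {"18-34": 0.25, "35-54": 0.35, "55+": 0.40}   # age shares in the population

    unadjusted = sum(sample_sentiment[g] * sample_share[g] for g in sample_sentiment)
    adjusted = sum(sample_sentiment[g] * population_share[g] for g in sample_sentiment)
    print(f"unadjusted index: {unadjusted:.3f}, adjusted index: {adjusted:.3f}")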

Secondly, statistical corrections will still have to be made for other features relating to unit measurements, double-counting – re-tweeting the same message – over-representativeness and over-fitting of models. In an event-driven context – such as tweets or internet searches – volume changes do not necessarily reflect reporting units or changes in demand. Take, for instance, the increased focus on the emissions scandal in the automotive industry – and the subsequent expected increase in internet searches and tweets at that time. These increases in searches and tweets may be driven by concern and a desire to follow the impact of the scandal, rather than by an increased interest in purchasing cars. Therefore, the data has to be adjusted if it is to be used as an indicator. 
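One simple correction of this kind, shown here on hypothetical records, is to drop re-tweets before counting volumes, so that the series reflects original messages rather than amplification:

    # De-duplicate an event stream: keep only original tweets, not re-tweets.
    tweets = [
        {"id": 1, "user": "a", "text": "Emissions scandal deepens", "retweet_of": None},
        {"id": 2, "user": "b", "text": "Emissions scandal deepens", "retweet_of": 1},
        {"id": 3, "user": "c", "text": "Looking to buy a new car", "retweet_of": None},
    ]
    originals = [t for t in tweets if t["retweet_of"] is None]
    print(f"raw volume: {len(tweets)}, de-duplicated volume: {len(originals)}")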

Thirdly, another pitfall refers to the misperception that correlation means causation. A high correlation between variables does not necessarily mean causation. Thus, no conclusions can be drawn purely on the basis of correlations between two variables. The similarity could be coincidental – additional controls therefore need to be conducted. 
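A small synthetic demonstration: two unrelated series that happen to share a common trend correlate almost perfectly in levels, yet the apparent relationship largely disappears after a simple control such as first differencing.

    # Spurious correlation from a shared trend, largely removed by differencing.
    import numpy as np

    rng = np.random.default_rng(1)
    trend = np.cumsum(rng.normal(0.1, 0.2, 500))
    series_a = trend + rng.normal(0, 1, 500)   # e.g. volume of internet searches
    series_b = trend + rng.normal(0, 1, 500)   # e.g. an unrelated price index

    corr_levels = np.corrcoef(series_a, series_b)[0, 1]
    corr_diff = np.corrcoef(np.diff(series_a), np.diff(series_b))[0, 1]
    print(f"correlation in levels: {corr_levels:.2f}, in first differences: {corr_diff:.2f}")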

A fourth – and equally important – pitfall refers to ensuring sufficient quality. Statistical quality cannot be taken for granted and needs to be taken seriously to provide an accurate reflection of the structure and dynamics of our economies. Large datasets do not speak for themselves – they have to be described and contextualised before they can provide useful insights. Similarly, it is important that new big data sources are transparent in terms of their methodology and how data is generated. Otherwise, the value of policy advice and forecasting using big data will be seriously undermined.

Data governance seems to be a challenge at many central banks – how successful has the ECB been in ensuring the right people have access to the right data?  

Per Nymand-Andersen: Data governance is vital for credibility and trust in institutions. Strict confidentiality protection is crucial, and is well defined in European legislation – Council Regulation (EC) No 2533/98 and ECB/1998/NP28.

When moving from macro-level to micro-level data and statistics, governance components must be revisited to ensure they apply to granular data. This means finding flexible methods to clearly define the roles and responsibilities of each actor, and organising access profiles, controls and audit trails at each point of the production process. This is needed to manage new incoming data and metadata – including linking and mapping data dictionaries – organise updates and revisions, enrich data and apply micro-aggregation methods, produce representative statistics and ensure users can benefit from the disaggregation of statistics. Data governance is challenging and remains important for central banks. Several central banks and organisations acknowledge that data and statistics are strategic assets for the institution, and are initiating organisational changes to create chief data officer functions as part of streamlining and enhancing data governance across the organisation. 
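In practice, the access-profile and audit-trail idea can be as simple as the sketch below – roles, datasets and users are entirely hypothetical – in which every request for granular data is checked against a role profile and recorded:

    # Minimal role-based access check with an audit trail for granular data requests.
    from datetime import datetime, timezone

    ACCESS_PROFILES = {
        "supervisor": {"anacredit_micro", "mmsr_transactions"},
        "researcher": {"aggregated_statistics"},
    }
    audit_log = []

    def request_data(user: str, role: str, dataset: str) -> bool:
        granted = dataset in ACCESS_PROFILES.get(role, set())
        audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user, "role": role, "dataset": dataset, "granted": granted,
        })
        return granted

    print(request_data("analyst_1", "researcher", "anacredit_micro"))  # False: outside the profile
    print(audit_log[-1])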

When it comes to private data sources, one could ask whether big data sources on individual behaviours and patterns could be commercialised, or whether they should become a public commodity that complies with statistical confidentiality and privacy rules. My vision would be for the availability of new private data sources in public domains to be fostered. The wealth of information and derived knowledge should be a public commodity and made freely available – at least to universities, researchers and government agencies.

How must central banks adapt their operations to cope with larger volumes of data? 

Per Nymand-Andersen: Central banks’ IT environments must become significantly more flexible to accommodate and manage multiple, large-volume data streams, and to provide the necessary tools for data exploration. I believe this is well under way within the European System of Central Banks.

More importantly, central banks need to attract data scientists with the ability to structure and process unstructured big datasets and swiftly perform statistical and analytical data exploration tasks. These new skills are in high demand, and central banks must compete with attractive private employers. Central banks will also need to invest in training and reskilling existing staff to acquire these new skill sets. Pooling available resources within the central banking community or creating partnerships are interesting options in this regard.

What areas of big data analysis are still closed off because of computational limits?  

Per Nymand-Andersen: In today’s IT world, storage and processing power may no longer be the main bottleneck. IT environments must become significantly more flexible as a vital supporting tool for generating knowledge and value. 

How might central banks’ use of data change over the next 10 years?  

Per Nymand-Andersen: The data service evolution is changing our society – the way we communicate, socialise, date, collaborate, work, and use and share data and information. Applying technological enhancements – such as mobile devices, cloud services, artificial intelligence and the internet of things – to valuable services will also change central banks’ data usage. Ten years from now is a lifetime in this field, so any answer becomes a vision. 

My vision would be that public authorities move from being data owners to being data sharers. This requires mindset changes, collaboration and trust. I am a great believer in linking and sharing micro-level datasets among public authorities in such a way that data is collected only once and then made available to other authorities, while ensuring the necessary privacy and confidentiality protection. In our central banking world, this would involve the relevant national and European authorities in banking, markets and supervision. A precondition for managing, linking and sharing micro-level datasets is the use of standards and the mapping of datasets, so we have a common semantic understanding of how to describe financial markets, actors, instruments and their characteristics.

Having worked with data for more than 20 years, I know the pool of unstructured data must be conceptualised and structured. Financial market actors and financial regulators must therefore collaborate, intensify their working relationships beyond institutional boundaries and agree on standards to become an integrated part of the digital transformation. We need a kind of “Schengen agreement for data”. For instance, we must develop and adopt common identifiers for institutions – legal entity identifiers; instruments – unique product identifiers; transactions – unique transaction identifiers; and decision-makers – enabling large datasets to be linked, sliced and diced by authorities, irrespective of where the data is physically stored. This provides the ability to identify ownership structures between legal entities and their decision-makers, and clarifies the relationships between creditors, instruments, guarantees, trades and decision-makers in near real time. Showing the close relationships between borrowers and lenders in this transparent way will mitigate excessive risk exposures within the financial system and avoid negative spillover effects on the real economy and citizens. This is of great interest to financial actors and will provide significant cost savings for the industry, as well as efficiency gains for public and international authorities and regulators. 
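A minimal sketch of identifier-based linking, using hypothetical records: trades carrying unique transaction identifiers are joined to an entity register on the legal entity identifier, so exposures can be aggregated by parent group.

    # Link trades to an entity register via the LEI and aggregate exposures by parent.
    import pandas as pd

    trades = pd.DataFrame({
        "uti": ["T-001", "T-002", "T-003"],            # unique transaction identifiers
        "lei": ["LEI-A", "LEI-A", "LEI-B"],            # counterparty legal entity identifiers
        "notional": [10.0, 25.0, 7.5],
    })
    entities = pd.DataFrame({
        "lei": ["LEI-A", "LEI-B"],
        "name": ["Bank Alpha", "Bank Beta"],
        "parent_lei": ["LEI-GROUP-1", "LEI-GROUP-1"],  # ownership structure
    })

    linked = trades.merge(entities, on="lei", how="left")
    print(linked.groupby("parent_lei")["notional"].sum())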

Per Nymand-Andersen - ECB

Per Nymand-Andersen is an adviser to senior management at the European Central Bank (ECB) and a lecturer at Goethe University Frankfurt. Nymand-Andersen has worked as an economist and statistician with the ECB since 1995, becoming principal market infrastructure expert in 2007 before taking up his current role in 2010. He completed an MBA at Copenhagen Business School.

 

The views expressed are those of the interviewee and may not necessarily represent those of the European Central Bank or the European System of Central Banks.

This interview is part of the Central Banking focus report, Big data in central banks, published in association with BearingPoint.

 
