The way central banks think about data is changing. As the world becomes more digital, there is an ever-growing data pool from which central banks and regulators can draw information about the economy. Central Banking convened a webinar in association with TCSA to discuss the evolving data landscape.
- Eric Santor, Adviser, Bank of Canada
- Per Nymand-Andersen, Adviser, European Central Bank
- Howard Chang, Vice-President of Global Affairs, TCSA
The past few years have seen an explosion of data, with all the attendant challenges and opportunities for statisticians and data scientists, central bankers among them. Central banks worldwide have been embracing higher data volumes and speeds, new types of data and new tools for processing, analysing and sharing it.
During the webinar, Improved central bank data management in a digital age, convened in January, our panel examined how central banks are approaching existing data issues, whether new technology is a help or hindrance when it comes to upgrading legacy systems, the challenges related to data privacy, and why monetary transaction data might be the key to more accurate and efficient data-driven policy-making.
Central Banking: How do you see the current data environment?
Per Nymand-Andersen, European Central Bank (ECB): There has been a significant change in the way we handle data. The main driver was not the Covid-19 pandemic but the financial crisis [of 2007–08], after which there was a need for more detailed information on the financial sector. Alternative data sources have also been extremely helpful during the recent pandemic, to the extent that even Federal Reserve chairman Jerome Powell has said that high-frequency data has become much more important.
There is a need for more timely information on consumer spending and saving patterns. At the same time, there are also new initiatives at the European level, in particular the new European data strategy, which tries to foster a common marketplace for the release and exchange of digital data.
For central banking, these new data sources and the way we work with micro-level data should provide supplementary insight. There is no need to replace any macroeconomic indicators but, by using new information sources and more granular data, central banks are able to gain a near real-time snapshot of what is happening in the economy. Where you have a bulk of large micro-level datasets, you can potentially combine them with other fields to give you additional insight that may result in new theories.
At the ECB, we are engaged with a lot of new datasets as well as micro-level data, which we are collecting via regulation. We are collecting data on a daily basis from MTS Markets, a digital trading platform for government bonds. This data allows us to produce eurozone yield curves daily, which are released using Nelson-Siegel-Svensson modelling. We are using Google data and, together with Google’s chief economist Hal Varian, we are nowcasting. We are also embarking on some text mining projects with Factiva.
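For readers unfamiliar with the Nelson-Siegel-Svensson form mentioned above, the following is a minimal sketch of the functional form only. The function name and all parameter values are illustrative assumptions; they are not ECB estimates, and the sketch does not show how the parameters are fitted to bond prices.

```python
import math


def nss_yield(tau, beta0, beta1, beta2, beta3, lambda1, lambda2):
    """Zero-coupon yield at maturity tau (years, tau > 0) under the
    Svensson extension of the Nelson-Siegel curve."""
    x1 = tau / lambda1
    x2 = tau / lambda2
    # (1 - exp(-x)) / x, written with expm1 to stay accurate for short maturities
    load1 = -math.expm1(-x1) / x1
    load2 = -math.expm1(-x2) / x2
    return (
        beta0                                  # long-run level
        + beta1 * load1                        # slope
        + beta2 * (load1 - math.exp(-x1))      # first hump
        + beta3 * (load2 - math.exp(-x2))      # second hump (Svensson term)
    )


# Illustrative parameters only (assumed, in per cent); not estimated from data
params = dict(beta0=2.0, beta1=-1.5, beta2=1.0, beta3=0.5, lambda1=1.5, lambda2=8.0)
curve = {tau: nss_yield(tau, **params) for tau in (0.25, 1, 2, 5, 10, 30)}
```

In this form the yield tends to beta0 + beta1 at very short maturities and to beta0 at very long ones, which is why the first two parameters are usually read as the level and slope of the curve.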
Then, relating to our core mandate, we are using Prisma data, which provides us with granular data on the prices of goods. We use this for volatility and resilience analysis of certain categories within the product basket.
The proliferation of new fintech companies also provides us with a rich paradigm of records. I cited the example of the euro yield curves, but there are many payment and clearing systems that have popped up. Credit card use and consumer spending, mobile payment systems, price scanning data, social media, digital trading platforms and digital consumption platforms are also interesting sources. There is enormous potential in drawing on all these platforms to generate insight that you have to get into your production system.
There is significant value in at least exploring these new sources for economic and financial activities, and there is certainly political drive to develop digital data strategies across borders. Eventually we will need to move from these experimental datasets to actually including them within our toolkits, and we should leverage partnerships between central banks, academia and the private sector to do this.
Eric Santor, Bank of Canada: The world is going digital. The commercial viability of artificial intelligence (AI), machine learning, big data, robotic process automation and the internet of things all mean we are now in a digital world. And, if central banks want to be effective, we need to be able to operate using the tools and the technologies that are available.
At the Bank of Canada, we have undertaken two big things in the past year or so. We have initiated a digital transformation strategy. We simply want to be ‘digital first’ in every aspect of our business. This is how we will integrate new technology to allow us to do our work better, faster and cheaper. To be clear, it is not just about technology; it is about new ways of working and working in a digital mind-set, failing fast, trying new things, being innovative and taking risks where they’re appropriate.
We have also implemented an enterprise data analytics strategy, which aims to make us a leading central bank in the use and analysis of data. The idea is to explore how we can enable and promote the responsible use of data, so we can make better business decisions, have better business insights and economic analysis, and achieve operational efficiencies across the institution while, at the same time, enabling a culture of innovation and exploration. More broadly, we are now implementing machine learning across the institution, wherever it makes sense to do so. Right now we have over 25 machine learning projects. There is a wide array of applications, not just on the economic side but also within the institution.
We have performed sentiment analysis on our surveys, as well as on our own monetary policy report. Machine learning is also employed to try to improve our forecasting. We also use it for anomaly detection: when we take in data, we run machine learning over it.
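The panel does not describe which anomaly detection method is used at data intake. As a rough illustration of the idea only, the sketch below swaps in a simple statistical check (a modified z-score based on the median absolute deviation) rather than a machine learning model; the function name, threshold and sample values are assumptions.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by the outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread around the median: this rule cannot score the data
        return []
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical incoming series with one suspicious observation
incoming = [100, 102, 98, 101, 99, 500, 100]
suspect_positions = flag_anomalies(incoming)
```

A check like this would typically sit at the point where data enters the platform, so that flagged observations can be reviewed before they feed into forecasts.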
There are challenges when using machine learning tools. The biggest is getting access to the data in a usable form and in a timely fashion. It is nice to talk about big data, but actually getting it onto a platform that is usable, making sure it is high quality and making sure it has appropriate data that is compliant with our guidelines is difficult. Another big challenge is the bias that can come in if you are using a dataset that has some bias already embedded into it. The machine learning algorithm will simply replicate that bias.
Algorithms are alive, they are digital workers and they need to be invested in, so you have to ensure you are on top of your algorithms and see how they work as new data comes in and whether they adjust to it. At the same time, we are automating a lot of things inside the bank. We can take a lot of processes that were manual and apply some simple process automation or robotic process automation. The reporting processes can be automated as well.
For all this to work, for us to be digital first and use the data in the way we want to do, we have to have the appropriate data infrastructure. This is often a big challenge inside our institutions because we often have lots of legacy systems. At the Bank of Canada, there are three big bins of data that you have to think about. One is the analytic data, which often comes from official statistics. Operations data is the second set – this relates to our many functions. And the last set is corporate data. This is the HR data or the audit data, anything that is happening in the building. Now the biggest question is what to do with the data: where do you put it – should it be in a data lake or warehouse? It really depends on what the data looks like, how you want to use it and how much structure it contains.
We are developing an integrated strategy concerning all the different elements of the data that you have to consider, including security, privacy, transparency, how accessible is this and to whom inside the institution. It becomes a balancing act between wanting to allow the analysts and the people in the business to be able to innovate and have the flexibility to use the data, and what I would call the efficiency of managing the data.
There is so much accessible data out there, we need to prioritise how we want to use it. You really need to have a governance structure in place and to think about the trade-offs of how you want to prioritise the investments in your data. Not all data is great – you need to make choices about what data you are going to have and how you’re going to put it into the system. Big data has arrived, all these technologies are there; the question is how do we use them in a way that will allow us to do our work better, faster and cheaper?
Howard Chang, TCSA: What our team has been researching and discovering for the past few years is the National Data Brain. It is a data-driven monetary policy mechanism for financial stability to be used by central banks.
As an international consultancy, we have been serving central governments and central banks around the world in modernising and digitalising national economic governance capabilities. In the process, we have discovered that, in the past 20 years or so since the 2007–08 financial crisis, much of the world has changed and yet much has remained the same. Negative interest rates, excessive quantitative easing and massive balance sheets are becoming the norm. This situation has been exacerbated during the pandemic, repeating itself through boom and bust cycles because our currency in each country lacks a real hard-value anchor. This leads to a lot of money printing and high debt.
From the gold standard to the dollar standard, what is possible now is a whole new realm of possibility given the advancement of digital technology: a data-driven currency value and data-empowered value-based standard where your currency is pegged to quantifiable, real national economic strength. The question is: do we have the data we need for a data-driven currency model? We do have a lot of data but this data is limited in its sources. Data also comes in an overwhelming number of varieties, different volumes, varying levels of granularity, changing access requirements and fragmented formats.
The new National Data Brain solution aims to address each of those issues, and is divided into three major components: 1) the collection of data via a national data collection network; 2) the management, processing and analytics of the data via a national data central processing unit (CPU), similar to a CPU or chip within a computer; and 3) the application and use of the data via a national open access data platform.
For example, if I walk into a supermarket to buy a bottle of water, the clerk will scan the bottle of water and I will pay £1. At that moment, at the point of sale, many pieces of data are generated: commodity data on the bottle of water, financial data on the £1 I paid, the business data on the store and, also, consumer data on who I am and where I came from. These pieces of data are collected separately by different entities – public and/or private – and scattered through our society.
What is possible here is that all these dimensions of data can be captured right at the point of sale and piped directly to the central bank as a small standardised data package. This is a huge breakthrough innovation and should cut the cost of collecting and standardising data units.
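To make the idea of a small standardised data package concrete, here is a minimal sketch using the bottle of water example above. The source does not specify a schema, so the class name, every field name and the sample identifiers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PointOfSaleRecord:
    # All field names are hypothetical; the source does not define a schema
    commodity: str      # commodity data: what was bought
    amount: float       # financial data: price paid
    currency: str       # financial data: currency code
    merchant_id: str    # business data: which store
    consumer_ref: str   # consumer data: a pseudonymous reference, not a name


record = PointOfSaleRecord(
    commodity="bottled water",
    amount=1.00,
    currency="GBP",
    merchant_id="store-001",
    consumer_ref="consumer-abc",
)

# One compact, uniformly structured message per transaction
payload = json.dumps(asdict(record), sort_keys=True)
```

Fixing the field set at the point of capture is what would let records from many different collectors be combined without per-source conversion later on.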
Once all the data has been collected in the CPU of the National Data Brain, it is automatically filtered and organised into six dimensions, which subsequently form very neat little algorithmic units that are aligned at the micro individual level all the way to the country. This allows for the easy computation of a large volume of data.
With the tremendous amount of data gathered, we can then use it for different needs. Using our open access data platform, not only can the central bank make timely monetary decisions based on real-time data, but other departments within government or academia can take advantage of it too. The private sector would also be able to access this type of data, which will likely spur further data innovation within the country.
Central Banking: How can TCSA’s platform deal with smaller institutions that might want to submit data through manual processing? How do you overcome that kind of reliance on legacy systems?
Howard Chang: There are two aspects to my answer to this question. First, this system needs to be built on the existing data platform of the central bank because the central bank already has access to the most ubiquitous source of data in society, which is the monetary transaction data. Whether or not it needs to be automated, we predict that more than 30% of automation or electronic payment is already sufficient to start building the model and have an ever-closing reflection of the real economy. Once the data is collected, we open it back up to the private sector, to the smaller institutions or small businesses, so it is almost a democratisation of data access for all. The idea is to standardise the data for easier access.
Central Banking: What are the challenges relating to privacy when you’re dealing with data?
Howard Chang: The heated debate about data protection and privacy raises a common question for all and should be an important concern. First and foremost, our data privacy is already a concern because our data is already captured by many players.
Data sharing and privacy are not in binary opposition; privacy can be ensured through technology during the data preparation stage, for example, through anonymisation, output checking, physical controls, authorisation and cloud technology. It could merely be a technical issue. User data can be collected, but rights, access and use of it should be clearly defined within systemic policies and regulation. TCSA encourages the construction of a national open access data platform to be used for policy-making as well as research and commercial projects. However, who gets access to what is something that needs to be explored further.
Transparency towards data should help the government build better protection, rules and regulations to protect the use and access of such data. Would you rather have a profit-driven private sector player owning all of your data or under the rules and regulations of a transparent public sphere by the government? That is also an interesting point to consider.
Central Banking: Do you encounter challenges relating to privacy when you’re working with data? How does that change the way you work with data?
Eric Santor: Part of the data strategy is that we want responsible use of data and so we take privacy very seriously. Central banks are built on public trust, so we need to maintain the highest standard with regard to privacy and other elements of how you use data responsibly.
We have to make sure we have the appropriate protections for data that we collect and access while, at the same time, we have to ensure the data is appropriately anonymised and responsibly used. We have established rules to the very highest standard because we cannot use data irresponsibly. We need to make sure we are in the vanguard of the highest level of trust on this dimension. A lot of effort goes into making sure that our use of data conforms to those standards of responsible use.
Per Nymand-Andersen: This is the bread and butter of central banks so, in that sense, we have confidentiality regulations in place in order to secure data. This is crucial in terms of having the trust of the financial intermediaries that we are dealing with as part of their reporting requirements.
Central banks are very familiar with confidentiality: protecting individual data, anonymising data and aggregating it to the macro level so individuals cannot be identified. Outside of the central bank community, there is the General Data Protection Regulation at the European Union level, and that needs to be tailored more to the requirements of the new data space. Particularly, when you have micro data, you want to have identifiers. I believe we have to think about the enablers for data sharing among public institutions, and between private and public institutions. People should still have the ability to remain confidential and have their information submitted to public authorities with assurances it will be protected.
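One common way to aggregate data so that individuals cannot be identified is to publish a group total only when enough contributors sit behind it. The sketch below illustrates that idea; the minimum count of three, the function name and the sample records are assumptions and do not represent the ECB's actual confidentiality rules.

```python
from collections import defaultdict


def aggregate_with_suppression(records, min_count=3):
    """Sum amounts per group, returning None for any group with fewer
    than min_count contributors so that small cells are not disclosed."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, amount in records:
        totals[group] += amount
        counts[group] += 1
    return {
        group: (totals[group] if counts[group] >= min_count else None)
        for group in totals
    }


# Hypothetical micro-level records: (sector, loan amount)
loans = [("retail", 10.0), ("retail", 20.0), ("retail", 30.0), ("mining", 500.0)]
published = aggregate_with_suppression(loans)
```

Here the single mining loan is withheld because publishing its total would reveal one institution's figure, while the retail total can be released.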
Central Banking: Is there a contradiction or a difficulty in terms of adopting AI and machine learning and similar advanced analytics capabilities where central banks do not have sufficient granular data and the right infrastructure to gather it all in one place?
Eric Santor: One thing we have seen inside machine learning communities is that the algorithms are quite well established. They have been working commercially and on a research basis since the early 2010s. The challenge is not whether you can find a good algorithm that would suit your task, it is whether you have good data to run the algorithms on.
Part of the art of being a data scientist and being an economist these days is to make sure you have the right data for the question you are trying to answer. I think a lot of the effort really does have to go into curating the right kind of data to address the things you want to address. Seven years ago, central banks were not in that place because we did not have the infrastructure and we had not even attempted to build it. But we now have the capacity to do so. At the same time, we have hired data scientists – in fact, a whole team at the Bank of Canada. What we have found is, for every success, there is a failure as we try to figure out how this is best going to work, but we do have live examples running now in production.
We also have to think about whether machine learning on a big dataset improves your forecast. It is important to remember that forecasting is not just about having a number to present but how you build your narrative. One of the nice things about big data is it can help you tell stories you were not able to tell with official sector data that would come with a two- or three-month lag and monthly or quarterly frequency.
Central Banking: Is it feasible to establish a direct link between the central bank and the commercial banks to gather data in real time?
Howard Chang: Current supervision regulatory tech used by commercial banks for reporting compliance aspects to the central bank is a well-established legacy way of reporting the data. What TCSA is proposing is to go back to the source of the data where every human being’s daily activities are captured within monetary transactions. Every action and every activity within a person’s life leaves an actual transactional mark and, when you can capture the data source surrounding each person, we believe that provides a more complete picture.
Commercial banks and their financial transactions or their reporting to the central banks is still very important for compliance purposes in terms of forecasting, decision-making and observing the real economy. But personal monetary transaction data will give a better picture and reflection of it.
Per Nymand-Andersen: In an ideal world, the question would be: what value added would it give? It would be wonderful. When we talk about data sharing among institutions, we actually mean interoperability; it is not about moving data across different databases but about giving public authorities, or whoever needs access, a link to the datasets.
If you have a straw in two different datasets that are stored and managed by different public authorities then, yes, you can have a real-time look at the individual firms that you have a need for. For monetary policy and financial stability, this would be wonderful. This is why we have micro-level data, so it is not really a big issue for central banking.
I can see there could be a potential need in the future for real-time access, but monetary policy decisions and those relating to banking stability are made over the medium term. You can have daily snapshots of where the inflation rate is going, but are they signalling anything, given the day-to-day volatility, or will they just add confusion? You want to look at trends and at turning points, and that is where more real-time information helps.
I think TCSA seems to have already implemented what is already in the thinking of the governance structure of data. When we are talking about data, we are talking about both numeric and textual data.
Central Banking: If a central bank wanted to adopt some of these advanced analytics tools and overhaul its data governance framework, what initial steps would you need to think about and focus on there?
Eric Santor: I think there are different elements you will want to think about. The first is you have to look at your existing data infrastructure. What are your legacy systems? Think about what end-state data infrastructure you want.
It is very tempting just to jump right in, start onboarding big data and running algorithms on it but, as you do that, every time you bring in more data, run algorithms and then produce more data out of that, you’re building legacy. You do not want to end up with a legacy debt on the data side that you need to unwind. Before you jump in you have to think about what your system looks like today and how it is going to evolve.
You also need to make sure you have the skills to use it. Having someone that has learned some Python is not going to be enough. You need to hire the expertise on the data science side so you have people in place who know what a good dataset looks like. You will also need the skills to implement machine learning algorithms because, later down the line, you will need to be able to explain what you are doing.
Not everyone has to be using machine learning but your managers, your leaders need to be familiar and versed in data literacy. You need to build that capacity and know how to lead that kind of work. I would start with that and talk to central banks that are already experimenting. This is a great way of gaining information and lessons learned of what has worked and what has not.
Per Nymand-Andersen: You have to start with what it is you need and where you want to go. You should not start with what you have. Then pilot it and adjust and learn as you go; there is not going to be a ‘big bang’.
I agree skill set and knowledge need to be within the business. It is the business and the understanding of the data that should drive the skill set needed for the data science. Many data scientists do not really understand central banking, but they have good data skills and will be able to use advanced analytical tools. But, at the end of the day, the business should drive the data science people to help you.
Central Banking: How do you standardise data, for example, transaction data, which might come from many sources or several trading venues?
Howard Chang: TCSA has been working with each country to create a joint task force to customise our solution depending on the local requirements and needs. In terms of standardisation of data, we have a new economic framework where all human activities can fall under six dimensions, each of which will represent an account. Each of these will have sub-accounts and are based on, for example, natural resources for environmental spheres or industrial or public sector or basic necessities. We have a whole system we need to be able to categorise and standardise.
Eric Santor: If you want data to be accessible, open and transparent, of course standardisation is very helpful but it is a great challenge. It is going to be difficult to achieve because there are so many players generating data in different formats. I am not sure it will be possible to get a standard in place.
Data that is generated is going to be proprietary to the firms that generate it; they in turn will have their own reasons for formatting it in the way they do. There will be standardisation over time, but it is important to have the capacity inside your institution to curate data and bring it in in a wide variety of forms. This is where you have to think about whether you want a data warehouse, where the data is more structured, or a data lake, where there is more flexibility.
Per Nymand-Andersen: When you are working with micro data you need two types of standards. One relates to identifiers of the institutions and of the product and of the transaction; this can only be done if regulated. There needs to be regulation and you need to have an authority with the responsibility to generate identifiers and release this to the public so everyone can use it. But methodology is also important. We need guidelines in order to have standards.
Central Banking: How are the ECB and Bank of Canada addressing challenges such as data citations for data science and privacy preserving decentralised analysis of centralised data?
Eric Santor: We have very strong guidelines and rules about compliance and meeting our privacy standards, and so the sharing of data really is restricted by those guidelines and we adhere to them. Central banks have to take that very seriously and that limits the sharing of many things. That is a good thing because we simply cannot be handing out data just because it is convenient to do so.
Per Nymand-Andersen: It is about providing less aggregated data and sharing this among public institutions, as well as academia. It takes time, but it is certainly the right way to go. We need to ensure we have the right protection in place to be able to share the data at a less aggregated level while ensuring the protection and confidentiality of the individuals. Where is that border and how can we go forward? This is what we need to solve and that is the beauty of having the micro-level data.
Central Banking: Which technology are you the most optimistic about in the near future?
Howard Chang: In my opinion, all the advancements made in regard to fancy technologies are admirable but they are not necessary. When we return to the basics, when we calculate each monetary transaction data point, a huge pool of data is already available. Using new technology to decipher and make sense out of it is admirable but counterproductive.
By returning to monetary transactional data, only simple addition and subtraction are needed in order to get a full picture. The best solutions are usually simple, but require a fundamental rethink of our data structure, data management and data infrastructure.
Eric Santor: Machine learning is marvellous technology, but it is already used widely throughout the economy and is only going to gain more traction, so central banks need to go there as well and use it. What I am excited about at the Bank of Canada is our new quantum lab for advanced analytics. As a result, we have started proofs of concept using quantum computers. This is still an emerging technology and has a long way to go before being commercially viable within the economy and for use in general. However, we are laying some tracks now to understand how it works so when quantum computing becomes more widespread we will be ready to use it. I’m very excited about that.
Per Nymand-Andersen: Data science techniques are what is required to master and visualise all of this micro-level data. I would support Howard in what he has said because data cleaning techniques are vital. If we do not get good-quality data in then it does not matter what we do with the output.
At the moment, we spend 95% of our time resolving quality issues when we would like to spend that time looking at the output and making an assessment or analysis using the data. I would say data cleaning techniques – whether they require technology or not – are the most important. It is certainly what is needed and what I would advise looking into.
Watch the full webinar, Improved central bank data management in a digital age
Copyright Infopro Digital Limited. All rights reserved.
You may share this content using our article tools. Printing this content is for the sole use of the Authorised User (named subscriber), as outlined in our terms and conditions - https://www.infopro-insight.com/terms-conditions/insight-subscriptions/
If you would like to purchase additional rights please email [email protected]