AnaCredit: banking with (pretty) big data

The supranational credit database in Europe is intended to help policy-makers and banks assess cross-border risk


In May this year, the governing council of the European Central Bank (ECB) approved a new statistical regulation (ECB/2016/13) establishing, as of end-2018, a common granular credit database, known as AnaCredit, shared between eurozone member states and comprising harmonised data on credit and credit risk.

The establishment of AnaCredit marks the beginning of a new era for central banking statistics, a genuine paradigm shift triggered by the need to “move beyond the aggregates”, to quote the title of the Eighth ECB Statistics Conference in July 2016.

Aggregate statistics failings

At the same time, it can be seen as the final achievement of a lengthy and, at times, very challenging process, the start of which can be traced back to 2008–2009. At that time, just a few months after the onset of the financial crisis, a small, high-level committee chaired by Otmar Issing¹ identified the creation of a ‘global credit register’ as one of the necessary elements of a more resilient international financial architecture. In particular, referring to the data from a credit register, the Issing Committee observed that “while the value of such information is appreciated almost universally on a national level, there is nothing commensurate on an international level”, thus suggesting that “given the current high level of international lending and exposures, a global credit register will greatly enhance risk management, both at the firm level (improving due diligence of cross-border exposures) and at the systemic level (adding a cross-border dimension to financial stability stress testing and to an evaluation of real effects on the economy).” In concrete terms, the committee proposed that “a harmonised approach should be adopted, where harmonisation refers to the standardisation across countries.”

Ten years after these recommendations were made, AnaCredit will finally offer policy-makers in the eurozone the possibility of making their decisions based on highly detailed, timely, accurate and standardised information on credit and credit risk.

Bank credit plays a key role in the eurozone economy, where the share of loans in the total external financing of small and mid-sized enterprises (SMEs) is overwhelmingly higher than in the Anglo-Saxon world, where market financing plays the most important role. Moreover, non-performing loans in the eurozone have surged since the onset of the sovereign debt crisis and have recently been estimated at almost €1 trillion ($1.12 trillion), a value that is similar to the annual GDP of a large EU member state. Good statistics on credit are therefore clearly essential for the ECB – or any other central bank – to fulfil its mandate.

A complete credit picture

At the same time, the recent economic and financial crisis suddenly revealed the inadequacy of ‘traditional’ aggregate statistics to support ‘unconventional’ policy-making. In particular, we have learned that aggregate statistics, however high their quality, are not sufficient for an adequate understanding of underlying developments in the face of increased heterogeneity and market fragmentation, which have led to diverging developments across different segments of the economy. Information at the granular level is thus becoming increasingly necessary to support policy decisions.

It is worth recalling that the collection of detailed information on credit and credit risk is not new. Several countries within the EU operate granular credit data sets, either via national credit registers – normally run by the national central bank (NCB) – or via private credit bureaus. Nevertheless, information is very limited or totally absent in some countries, while significant methodological differences prevent meaningful supranational analysis of the information currently available – for instance, data is collected on a loan-by-loan basis in some cases and on a borrower-by-borrower basis in others. The reporting threshold, defining the minimum exposure recorded in the credit register, also varies considerably across countries, ranging from effectively no threshold up to a few hundred thousand euros. The result is that no meaningful insight can be derived from a cross-country analysis of the current national data.

By contrast, the AnaCredit data set will deliver, mostly on a monthly basis, loan-by-loan information on credit to companies and other legal entities (no natural persons will be covered) extended, at a minimum, by eurozone banks and their foreign branches.

In particular, the AnaCredit data collection has been designed to obtain a complete picture of the credit exposure of the reporting population. By linking ‘observed agents’ with ‘reporting agents’, data is collected on loans granted by credit institutions resident in the eurozone irrespective of whether such loans are provided directly by the credit institutions or, indirectly, via subsidiaries or branches under their control. The information collected comprises almost 100 different ‘attributes’ covering various aspects of the credit exposure (outstanding amount, maturity, interest rate, collateral or guarantee, information on the counterparty, and so on), and is organised in several ‘tables’ connected to each other via the ‘instrument’, which is the centre of the data model.
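To make the instrument-centric structure concrete, here is a minimal sketch of such a data model as a small relational schema. The table and attribute names below are illustrative only, not the official AnaCredit ones:

```python
import sqlite3

# Illustrative, simplified schema: the 'instrument' table sits at the centre
# of the model, with counterparty and protection tables linked to it.
# (The real AnaCredit data model has many more tables and ~100 attributes.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE counterparty (cp_id TEXT PRIMARY KEY, name TEXT, sector TEXT);
CREATE TABLE instrument (
    inst_id TEXT PRIMARY KEY,
    debtor_id TEXT REFERENCES counterparty(cp_id),
    outstanding_amount REAL,
    maturity_date TEXT,
    interest_rate REAL
);
CREATE TABLE protection (
    prot_id TEXT PRIMARY KEY,
    inst_id TEXT REFERENCES instrument(inst_id),
    protection_value REAL
);
""")
conn.execute("INSERT INTO counterparty VALUES ('CP1', 'Example SME', 'S11')")
conn.execute(
    "INSERT INTO instrument VALUES ('I1', 'CP1', 50000.0, '2026-12-31', 0.031)"
)
conn.execute("INSERT INTO protection VALUES ('P1', 'I1', 30000.0)")

# A cross-table query: exposure and collateral per debtor, joined via the
# instrument at the centre of the model.
row = conn.execute("""
    SELECT c.name, i.outstanding_amount, p.protection_value
    FROM instrument i
    JOIN counterparty c ON c.cp_id = i.debtor_id
    JOIN protection p ON p.inst_id = i.inst_id
""").fetchone()
print(row)  # ('Example SME', 50000.0, 30000.0)
```

Because every attribute table links back to the instrument, any subset of the roughly 100 attributes can be assembled with joins rather than collected as a separate aggregate.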

Besides eurozone member states, other EU countries can join this endeavour on a voluntary basis. Some have already declared their intention to do so, and others will hopefully follow in the coming years. This will result in a complete, detailed, timely and accurate overview of credit and credit risk developments in all participating countries, finally allowing a complete and meaningful cross-country analysis, as urged by the Issing Committee.

The decentralised approach, with NCBs collecting information from their reporting agents and then transmitting it to the ECB, has been designed to allow the necessary national flexibility and exploit the synergies with existing credit registers to the maximum extent. A satisfactory balance has been found between the need to take into account country-specific features, such as the degree of concentration of the banking sector, and the strong demand for common rules allowing a level playing field across all participating countries. The relatively low reporting threshold (€25,000 calculated at the borrower level) allows a sufficient coverage of credit to SMEs, which are the backbone of the European economy both in terms of employment and value added. At the same time, to alleviate the reporting burden and in line with proportionality, some discretion has been left to NCBs in granting – partial or complete – derogations to small institutions in their respective countries.
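The effect of calculating the threshold at the borrower rather than the loan level can be illustrated with a short sketch (our reading of the rule, with invented data): loans below €25,000 individually still become reportable once the borrower's total reaches the threshold.

```python
# Sketch of the borrower-level threshold rule (an illustration, not the
# official AnaCredit logic): a debtor's instruments are reportable if the
# debtor's total exposure reaches EUR 25,000.
THRESHOLD_EUR = 25_000

def reportable_debtors(exposures):
    """exposures: list of (debtor_id, amount) pairs, one per instrument."""
    totals = {}
    for debtor_id, amount in exposures:
        totals[debtor_id] = totals.get(debtor_id, 0) + amount
    return {d for d, total in totals.items() if total >= THRESHOLD_EUR}

# Two loans of 15,000 each cross the threshold together, even though
# neither would on its own; SME-B stays below it.
loans = [("SME-A", 15_000), ("SME-A", 15_000), ("SME-B", 10_000)]
print(reportable_debtors(loans))  # {'SME-A'}
```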

Early estimates point to around 100 million exposures reported every month, representing loans granted by about 5,000 credit institutions to more than 15 million counterparties. In view of these large expected data volumes, a state-of-the-art IT infrastructure is currently under development, which also takes into account the need to ensure an adequate protection of confidentiality.

An unprecedented step

All in all, this is clearly nothing short of a ‘jump into the world of big data’ for central bank statistics. Although the big data definition might not be fully appropriate in absolute terms (we are still far from the billions of intra-day data points flooding in from other contexts), it certainly represents an unprecedented step for the ECB, which traditionally relies on a relatively small set of aggregate statistics for supporting policy analysis and decisions: a true paradigm shift.

Still, in the face of the obvious initial investment and adjustment required by a project of this scale, AnaCredit will bring huge benefits to many stakeholders, reporting banks included.

Collecting very granular information will support the needs of policy-makers in several important fields: monetary policy analysis and operations, risk and collateral management, financial stability and economic research. The AnaCredit data sets will allow a better understanding of the monetary policy transmission channel, particularly with regard to transactions involving SMEs. Although in the initial stage of the project no data will be specifically collected for supervisory purposes, banking supervision will also find the data very useful in many respects, thanks especially to the information on the links between lenders and to the unique identification of counterparties across the entire lender population.

Obviously, potential users – the ECB and participating NCBs, national and European supervisory authorities, national and European resolution authorities, the European Systemic Risk Board and the European Commission – will have access to the AnaCredit information at different levels of detail depending strictly on their proven needs, and in any case always according to strict confidentiality rules as set out under existing relevant European law.

Just as importantly – and looking ahead – we will be in the position to respond to rapidly changing data requests from users in a timely and cost-efficient manner, without having to rely on very costly ad hoc data requests or new reporting requirements, and with clear savings and benefits for the banks. It will be truly multipurpose and allow flexibility for analysis. Moreover, more complete and comparable information returned to reporting agents – via feedback loops that might be established at national level by the respective NCB – will also be beneficial for banks in assessing the creditworthiness of new potential borrowers, even when the latter have multiple cross-border exposures.

Network analysis

Network analysis, or graph theory, is the study of systems of nodes and the edges that connect them. In the field of financial stability analysis, nodes can be used to represent financial entities, and edges the links they have to counterparties.

While it need not involve big data, network analysis often draws on very large quantities of data. An obvious source is information on derivatives trades, which is accumulating rapidly in trade repositories. 

A recent study by the Bank of England used this data to map the interconnections between dealers, central counterparties and end-users in the derivatives market. Such a graphical representation can immediately shed light on where the most important connections lie, and therefore where shocks are likely to have the greatest contagion effects. The authors found the links between counterparties were highly correlated with measures of systemic importance.
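As a toy illustration of the idea – not the Bank of England's actual methodology – even a simple degree-centrality calculation on an invented bilateral-exposure network singles out the most connected node:

```python
from collections import defaultdict

# Toy interbank network: each edge is a bilateral trading relationship.
# Names and links are invented for illustration.
edges = [
    ("DealerA", "CCP"), ("DealerB", "CCP"), ("DealerC", "CCP"),
    ("DealerD", "CCP"), ("DealerA", "DealerB"),
    ("DealerA", "Fund1"), ("DealerB", "Fund2"),
]

# Degree centrality: the number of counterparties per node, a crude proxy
# for interconnectedness, and hence for potential contagion channels.
degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

most_connected = max(degree, key=degree.get)
print(most_connected, degree[most_connected])  # CCP 4
```

Real studies use richer measures (eigenvector centrality, exposure-weighted links), but the principle is the same: the network structure itself identifies where a shock would propagate most widely.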

It is also possible to map networks on the basis of less structured data, as a study published in January 2016 by the European Central Bank (ECB) showed. Researchers focused on how mentions of bank names in news stories could produce a network of interrelations between banks, allowing them to rank institutions by their systemic importance. 

Examples of the ECB’s interactive visualisations are available online, and show how the network grew denser and more connected as the 2008 crisis developed. Banks such as RBS, which was ultimately bailed out, moved to the centre of the network as the situation worsened. As the crisis faded, the links became weaker and the major players moved to less central positions. Tugging a node shows how ripples spread through the system – RBS affects every other bank, while less connected players such as Nordea or DZ Bank have much less effect.

Other forms of text mining, particularly sentiment analysis, have the potential to offer richer insights into such networks, the authors say.

Being fully aware of the paradigm shift we are confronted with, the ECB statistical function is working, with the involvement of the financial industry, towards designing and implementing co-ordinated data management, comprising information collected under different statistical and legal frameworks. The main workstreams in this field, already under way for some time, are: the Banks’ Integrated Reporting Dictionary (BIRD), defining a sort of ‘common language’ for the information provided by reporting agents; an ECB Single Data Dictionary; and an integrated European Reporting Framework, covering both statistical reporting – monetary policy, financial stability, and so on – and supervisory reporting. They all point in the direction of providing financial institutions with a unified reporting framework based on a consistent and stable set of rules and definitions, with a twofold goal: to alleviate the statistical reporting burden while, on the side of the authorities, ensuring data quality and consistency; and to allow a combined use of all granular information – for example, data on debt securities and, in due course, credit exposures.

The gradual development of a fully fledged master data set with reference data on all counterparties involved in the exposures covered under the various requirements – lenders, borrowers, holders, issuers, protection providers, etc. – is an important example of this effort and of the challenges entailed in the move from aggregates to granular statistics. Such a reference data set directly supports the unique identification of counterparties, which is a precondition for calculating the total exposure of a borrower (and/or issuer) vis-à-vis the whole lender population. Together with complete and up-to-date reference information on counterparties – for example, sector of activity, size, geographic location, annual turnover – this will allow very informative analysis of specific segments of the economy.
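A small sketch (with invented identifiers and attributes) shows how unique counterparty identification lets exposures reported by different lenders be totalled per borrower and then aggregated by segment using the reference data:

```python
# Invented reference data set: one record per uniquely identified counterparty.
reference = {
    "LEI-1": {"name": "Alpha SpA", "sector": "manufacturing", "size": "SME"},
    "LEI-2": {"name": "Beta GmbH", "sector": "construction", "size": "SME"},
}

# Exposures reported by different lenders, keyed to the same identifiers:
# (lender, borrower_id, amount).
exposures = [
    ("BankX", "LEI-1", 40_000),
    ("BankY", "LEI-1", 60_000),
    ("BankY", "LEI-2", 25_000),
]

# Total exposure per borrower across the whole lender population...
per_borrower = {}
for _, lei, amount in exposures:
    per_borrower[lei] = per_borrower.get(lei, 0) + amount

# ...and the same exposures aggregated by economic segment.
per_sector = {}
for lei, total in per_borrower.items():
    sector = reference[lei]["sector"]
    per_sector[sector] = per_sector.get(sector, 0) + total

print(per_borrower)  # {'LEI-1': 100000, 'LEI-2': 25000}
print(per_sector)    # {'manufacturing': 100000, 'construction': 25000}
```

Without a single identifier per counterparty, the two loans to Alpha SpA could not be recognised as exposures to the same borrower, and its €100,000 total would be invisible.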

The definition of the AnaCredit requirement has posed several challenges, and others now arise in preparation for the first reporting, due in autumn 2018. Still, the ECB and the Eurosystem are confident that all such challenges have been addressed in the best possible way and that the benefits of AnaCredit will definitely outweigh the efforts put into its establishment and continued operation.

Clearly, AnaCredit by itself might not save us from a new financial crisis but, as the Issing Committee observed 10 years ago, a credit register “would capture the longer-term trends that history shows have often posed the biggest threat to financial stability”. It will definitely put policy-makers in a better position to mitigate risks ex-ante and to limit their potential impact ex-post.

The views expressed are those of the author and do not necessarily represent the views of the ECB or the ESCB. The author thanks Riccardo Bonci for his help in preparing this contribution.

1. Besides Otmar Issing, the Committee comprised Jörg Asmussen, Jan Pieter Krahnen, Klaus Regling, Jens Weidmann and William White. 

About the author

Aurel Schubert has been director-general of the European Central Bank’s (ECB) statistics department since 2010. He also chairs the Statistics Committee of the European System of Central Banks, and is vice-chair of the Bank for International Settlements’ Irving Fisher Committee on Central Bank Statistics. Schubert spent 25 years working for the National Bank of Austria before moving to the ECB.

This article is part of the Central Banking focus report, Big data in central banks, published in association with BearingPoint.
