22 May, 23

What is Data Availability?

Beau Chaseling

Innovation Analyst

Blockchains are often touted for their ability to provide secure and transparent record-keeping. Transactions are recorded, grouped into blocks and added to the blockchain, which is then stored locally by each node in the network. Ensuring that the transaction data behind each proposed block is actually published is crucial to maintaining the integrity of, and trust in, the network. However, it is not enough to simply trust the nodes in the network to act honestly; their actions must be independently verified, hence the saying “don’t trust, verify.” The challenge of verifying that the data for each newly proposed block has been made available is known as the data availability problem. This article explores why data availability matters for the functioning of a blockchain network, as well as the active developments in the field.

What is Data Availability?

Data availability (DA) is the assurance that data in a system is present and accessible. In the context of blockchains, DA is the guarantee that the data of every transaction included in a proposed block has been published and is accessible to every participant in the network. Without this data, network participants are unable to confirm that a particular event has occurred, which from the blockchain’s perspective means the event did not occur. This is why DA is essential for maintaining the integrity of, and trust in, a blockchain network.

When building and proposing new blocks, block producers must make all data contained within the block, including the transaction data in the block body, available for others to see. A blockchain’s capacity for including data in new blocks is referred to as blockspace. Blockspace is limited, so as it fills up, publishing additional data becomes scarcer and more expensive. During the settlement and consensus process, full nodes can download the proposed blocks and re-execute the transactions. Through this process, the validity of each transaction can be confirmed and the security of the network upheld. It also allows full nodes to detect any invalid or malicious transactions that the block proposer may have included in the block.
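To make the re-execution step concrete, the sketch below shows how a full node might check a proposed block: it replays the block’s transactions against its own copy of the state and compares the result with the state commitment the proposer claimed. Note that this is only possible if the node can obtain the block’s data. The account-balance model, the `apply_tx` rules and the SHA-256 “state root” are simplifying assumptions for illustration, not any real client’s implementation.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    recipient: str
    amount: int


def state_root(balances: dict[str, int]) -> str:
    # Toy commitment: SHA-256 of the canonically serialised balances.
    # Real chains use Merkle trees or tries instead.
    encoded = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_tx(balances: dict[str, int], tx: Tx) -> None:
    # Reject non-positive amounts and transfers that would overdraw the sender.
    if tx.amount <= 0 or balances.get(tx.sender, 0) < tx.amount:
        raise ValueError(f"invalid transaction: {tx}")
    balances[tx.sender] -= tx.amount
    balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount


def verify_block(pre_state: dict[str, int], txs: list[Tx], claimed_root: str) -> bool:
    # Re-execute every transaction on a copy of the node's local state,
    # then compare the result with the root the proposer claimed.
    state = dict(pre_state)
    try:
        for tx in txs:
            apply_tx(state, tx)
    except ValueError:
        return False  # the block contains an invalid transaction
    return state_root(state) == claimed_root


genesis = {"alice": 10, "bob": 0}
honest_root = state_root({"alice": 6, "bob": 4})
print(verify_block(genesis, [Tx("alice", "bob", 4)], honest_root))  # True
print(verify_block(genesis, [Tx("bob", "alice", 5)], honest_root))  # False: bob has no funds
```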

Full Nodes and Light Nodes

A blockchain node is a device that is connected to a blockchain network and participates in its operations. Nodes are responsible for maintaining the integrity of the blockchain network and the data stored on it. There are two main categories of node: full nodes and light nodes.

A full node is a node that stores a copy of the entire blockchain and participates in the network’s consensus protocol. It validates transactions and blocks, and propagates them throughout the network. A full node also verifies the integrity of the blockchain by checking that the blocks and transactions conform to the network’s consensus rules. Full nodes perform important functions such as keeping the network secure and decentralised. They are usually run by individuals or organisations that are committed to the long-term success of the blockchain network.

A light node, also known as a “light client” or “SPV (Simple Payment Verification) node”, on the other hand, does not store a full copy of the blockchain. Instead, it typically downloads only block headers and relies on full nodes to provide the information it needs to validate transactions. Light nodes are typically used by individuals or organisations that want to participate in the network but do not want to run a full node. They are less resource-intensive and can be run on less powerful devices, such as mobile phones. While an individual light node offers a lower level of security than a full node, light nodes are more convenient for users who want to use the blockchain without running a full node; in this way, they lower the barrier to entry and help make the network more decentralised.
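The way an SPV light node checks a transaction without holding the whole block can be sketched with a Merkle inclusion proof: a full node supplies the sibling hashes along the path from the transaction to the Merkle root, and the light node recomputes the root and compares it with the one in the block header it already trusts. This is a minimal sketch loosely modelled on Bitcoin’s transaction tree; it assumes single SHA-256 (Bitcoin uses double SHA-256) and the “duplicate the last node on odd levels” rule, and the transaction bytes are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    # Hash the leaves, then hash pairs upward; duplicate the last node on odd levels.
    if not leaves:
        raise ValueError("no leaves")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    # Full-node side: collect sibling hashes from the leaf up to the root.
    # Each entry is (sibling_hash, sibling_is_on_the_right).
    proof = []
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # Light-node side: needs only the leaf, the proof and the root from the header.
    node = h(leaf)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root


txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(txs)        # committed to in the block header
proof = merkle_proof(txs, 2)   # supplied to the light node by a full node
print(verify_proof(b"tx2", proof, root))     # True
print(verify_proof(b"forged", proof, root))  # False
```

The proof grows with the logarithm of the number of transactions, which is why a light node can verify inclusion cheaply. It cannot, however, tell on its own whether the rest of the block’s data was ever published.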

Storing a full copy of the blockchain and participating in transaction execution, validation and propagation require large amounts of computational resources from full nodes, frequently necessitating specialised hardware. Light nodes, on the other hand, require less storage and computational power and can be run on less powerful devices with slower internet connections.

The Data Availability Guarantee

DA guarantees refer to the assurance that all data necessary to verify a block’s transactions is available on the blockchain network, and that data stored on the blockchain can be accessed by all nodes at any time. DA is commonly guaranteed through decentralisation, by requiring each full node in the network to download each proposed block and verify the availability of its transaction data before the block is finalised. The degree of DA a blockchain guarantees depends on the method it uses, and some blockchains have been specifically designed to provide particularly strong guarantees.

Data availability guarantees play a vital role in maintaining the security and trustlessness of blockchains, even when nodes fail or go offline. They ensure that all nodes have consistent access to the same data and can check that it is accurate and valid, which helps to prevent the inclusion of malicious data and attempts to tamper with existing data. Furthermore, DA guarantees enable nodes to independently verify transactions and compute the blockchain’s state without needing to trust one another, ensuring that the blockchain remains decentralised with no single point of failure.

Costs of Data Availability

Blockchain networks incur a variety of costs to provide DA guarantees. Foremost among these are the computational resources required to replicate, verify and store data. Because these processes are vital to network function and demand time and effort from nodes, nodes must be adequately incentivised to perform them. Ensuring DA and providing data storage are therefore two of the primary costs of running a blockchain. As blockchains such as Ethereum implement new methods for providing DA guarantees, these costs may fall dramatically.

Rollups and Data Availability

Rollups, a layer 2 scaling solution, execute transactions off-chain before compressing them and posting them in batches to the base layer. DA is important for rollups because it allows users to trustlessly verify the correctness of computations performed off the main chain. There are presently two main types of rollup: optimistic rollups and zero-knowledge rollups (ZKR). Both publish transaction data to the base layer; the primary difference between them lies in how the validity of that data is verified.

Optimistic rollups use an economic incentive model. The party that posts batches to the base layer (typically the rollup’s sequencer or proposer) must lock a deposit in a smart contract, which is forfeited if it submits an invalid state transition. Batches are assumed to be valid unless challenged: optimistic rollups rely on fraud proofs to prevent invalid state transitions from being finalised. If a batch is invalid, any user can use the published transaction data to construct a fraud proof during a challenge period, in which case the deposit is slashed and the challenger is rewarded. This is why DA is critical for optimistic rollups: if the transaction data were withheld, no one could construct a fraud proof.
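The challenge step can be sketched as a challenger re-executing a batch from its published transaction data and comparing the outcome with the state the proposer claimed. This is a heavily simplified, hypothetical model: the `Batch` structure, the toy transfer rule and paying the whole bond to the challenger are illustrative assumptions, and production fraud-proof systems (for example, interactive bisection games) are far more involved.

```python
from dataclasses import dataclass


def execute(state: dict[str, int], tx: tuple[str, str, int]) -> dict[str, int]:
    # Toy transfer rule: an invalid transfer leaves the state unchanged.
    sender, recipient, amount = tx
    new_state = dict(state)
    if 0 < amount <= new_state.get(sender, 0):
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


@dataclass
class Batch:
    pre_state: dict[str, int]
    txs: list[tuple[str, str, int]]   # transaction data published to the base layer
    claimed_post_state: dict[str, int]
    bond: int                         # proposer's deposit held by the contract


def challenge(batch: Batch) -> tuple[bool, int]:
    """Re-execute the published data; return (fraud_found, reward_to_challenger)."""
    state = batch.pre_state
    for tx in batch.txs:
        state = execute(state, tx)
    if state != batch.claimed_post_state:
        return True, batch.bond  # toy rule: the whole bond goes to the challenger
    return False, 0


honest = Batch({"a": 5}, [("a", "b", 3)], {"a": 2, "b": 3}, bond=100)
fraudulent = Batch({"a": 5}, [("a", "b", 3)], {"a": 5, "b": 3}, bond=100)
print(challenge(honest))      # (False, 0)
print(challenge(fraudulent))  # (True, 100)
```

Everything the challenger needs is in `batch.txs`; if that field were unavailable, the fraudulent batch could not be disputed.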

Conversely, ZK rollups rely on cryptographic validity proofs. For each batch, the rollup’s prover must submit a cryptographic proof of the validity of the state transition, which is then verified on the base layer to confirm its integrity. The zero-knowledge property means a proof can be checked without revealing any information beyond the fact that the statement it proves is true, which can help maintain user privacy while guaranteeing that the data is valid. Furthermore, ZK rollups process transactions in batches, so a single proof covers the entire batch, saving time and resources. However, generating ZK proofs can be computationally expensive, and this cost is usually passed on to the network’s users through additional fees.
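ZK rollups use SNARK- or STARK-based proof systems whose details are well beyond a short example. To illustrate only the zero-knowledge idea described above (checking a proof without learning the secret behind it), here is a toy non-interactive Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. This is a swapped-in and much simpler technique than anything a rollup uses, it does not show batching, and the group parameters are deliberately tiny and insecure.

```python
import hashlib
import secrets

# Toy group (far too small to be secure): P = 2Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def challenge_hash(*values: int) -> int:
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret_x: int) -> tuple[int, int, int]:
    """Prove knowledge of x such that y = G^x mod P, without revealing x."""
    y = pow(G, secret_x, P)
    r = secrets.randbelow(Q)       # one-time random nonce
    t = pow(G, r, P)               # commitment
    c = challenge_hash(G, y, t)    # Fiat-Shamir challenge replaces the verifier
    s = (r + c * secret_x) % Q     # response: x is masked by the random r
    return y, t, s


def verify(y: int, t: int, s: int) -> bool:
    # Checks G^s == t * y^c, which holds only if s was built from the secret x.
    c = challenge_hash(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


y, t, s = prove(secret_x=123)
print(verify(y, t, s))            # True
print(verify(y, t, (s + 1) % Q))  # False
```

The verifier sees only `y`, `t` and `s`, yet becomes convinced that the prover knows `x`.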

The costs of DA and data storage for rollups have proven to be Ethereum’s foremost roadblock to adopting rollups as the network’s primary means of execution. This is due to the immense amounts of calldata that rollups produce. After rollups were introduced on Ethereum they were rapidly adopted, quickly inundating the network with calldata. While currently manageable, the volume of rollup calldata could become problematic were rollups to become Ethereum’s primary means of execution. Yet without rollup calldata, the base layer would be unable to verify and validate the transactions that occurred off-chain. Hence, Ethereum’s current plans are to expand its data capacity and implement more efficient DA guarantees.
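A rough sense of why calldata is expensive comes from Ethereum’s calldata gas schedule as set by EIP-2028: 4 gas per zero byte and 16 gas per non-zero byte. The sketch below prices only the calldata bytes (it ignores the 21,000 base transaction cost, execution costs, and any later changes to calldata pricing), and the batch size is a hypothetical example rather than measured rollup data.

```python
# Calldata gas schedule introduced by EIP-2028 (Istanbul upgrade).
CALLDATA_ZERO_BYTE_GAS = 4
CALLDATA_NONZERO_BYTE_GAS = 16


def calldata_gas(data: bytes) -> int:
    """Gas charged for the calldata bytes alone (excludes base cost and execution)."""
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    return zero_bytes * CALLDATA_ZERO_BYTE_GAS + nonzero_bytes * CALLDATA_NONZERO_BYTE_GAS


# Hypothetical rollup batch: 1,000 compressed transactions of ~100 bytes each,
# modelled as all non-zero bytes (a worst case for pricing).
batch = bytes([0xAB]) * (1_000 * 100)
print(calldata_gas(batch))  # 1600000
```

Because compressed data contains few zero bytes, rollups tend to pay close to the 16-gas rate, and that data competes for the same blockspace as ordinary transactions.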

Innovations Benefitting Data Availability

Innovations in blockchain technology are providing new ways to resolve the cost and storage issues relating to the DA guarantee. Transaction costs have risen on several major blockchains due to the sheer amounts of calldata produced by DeFi applications and rollups. Furthermore, as these networks are inundated with data, storage issues compound, as most nodes are required to maintain a complete network history. Nevertheless, recent innovations such as data sharding, data availability sampling and randomly sampled committees have introduced more efficient methods for ensuring DA and providing data storage.

Data Availability Sampling

One recent innovation in the field of DA is data availability sampling (DAS). Popularised in a paper by Mustafa Al-Bassam, Alberto Sonnino and Vitalik Buterin, DAS is a technique for verifying the DA of a blockchain without requiring every node to download every block in full. Instead, each node randomly samples small pieces of the data in a proposed block and checks that those pieces are available; if all of its queries are answered, the node can be highly confident that the entire block has been published. Nodes can also be assigned subsets of the data to store, rather than all nodes storing the same data, which further reduces the resources required. An example of DAS conducted on a BLOB (Binary Large Object) rather than a block is exhibited below.

Figure: Data availability sampling

Source: https://hackmd.io/@vbuterin/sharding_proposal
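Why a handful of random samples is enough can be shown with simple probability. If a fraction `f` of a block’s chunks is withheld and a node takes `k` independent, uniformly random samples, the chance that every sample happens to hit an available chunk (so the node is fooled) is `(1 - f) ** k`. The sketch below computes this and cross-checks it with a small simulation; sampling with replacement and the chunk counts are illustrative assumptions.

```python
import random


def prob_fooled(withheld_fraction: float, samples: int) -> float:
    """Chance that every uniformly random sample lands on an available chunk."""
    return (1.0 - withheld_fraction) ** samples


def simulate_fooled(num_chunks: int, withheld_fraction: float, samples: int,
                    trials: int, seed: int = 0) -> float:
    """Monte Carlo check: fraction of sampling nodes whose samples all succeed."""
    rng = random.Random(seed)
    num_withheld = int(num_chunks * withheld_fraction)
    withheld = set(rng.sample(range(num_chunks), num_withheld))
    fooled = 0
    for _ in range(trials):
        # Each node samples chunk indices independently (with replacement).
        picks = (rng.randrange(num_chunks) for _ in range(samples))
        if not any(p in withheld for p in picks):
            fooled += 1
    return fooled / trials


print(prob_fooled(0.5, 30))  # about 9.3e-10
print(simulate_fooled(num_chunks=1_000, withheld_fraction=0.5, samples=2, trials=20_000))  # close to 0.25
```

With half the data withheld, 30 samples leave roughly a one-in-a-billion chance of being fooled. The catch is that withholding only a tiny fraction is hard to detect this way, which is the weakness erasure coding addresses.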

On its own, DAS is vulnerable to a malicious block producer who makes most, but not all, of a block’s data available (for example, 50-99%). Such a block may cause some clients to accept it at first while others reject it, depending on which pieces they happened to sample. To account for this weakness, DAS is used in conjunction with erasure coding. Erasure coding extends a dataset, typically doubling it, by adding redundant pieces (called erasure codes). If some pieces are missing, the original data can be reconstructed from any sufficiently large subset of the extended data; with a 2x extension, any 50% of the pieces suffice. This is why erasure coding is necessary for DAS: to make a block unrecoverable, an attacker must withhold more than half of the extended data, which random sampling detects with very high probability, and any smaller omission can simply be repaired. This ensures that all clients on the network eventually accept the data, even if some initially rejected it. There is no time limit for when a client must accept the data; it will do so once it has received responses to all of its queries.
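The erasure coding described above can be sketched in the Reed-Solomon style: treat `k` data chunks as evaluations of a polynomial of degree below `k`, evaluate that polynomial at `k` extra points to double the data, and rebuild the originals from any `k` surviving chunks by Lagrange interpolation. The toy prime field, the integer “chunks” and the function names are assumptions for illustration; production designs use much larger fields and add commitments (such as KZG) so nodes can check that the extension was computed correctly.

```python
PRIME = 2**31 - 1  # toy prime field; chunk values must be smaller than this


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod PRIME)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Division in the field via Fermat's little theorem: den^(PRIME-2) is den^-1.
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def extend(data: list[int]) -> list[int]:
    """Treat k values as evaluations at x = 0..k-1 and append k more at x = k..2k-1."""
    k = len(data)
    points = list(enumerate(data))
    return data + [lagrange_eval(points, x) for x in range(k, 2 * k)]


def recover(available: dict[int, int], k: int) -> list[int]:
    """Rebuild the original k values from any k surviving (index, value) pairs."""
    if len(available) < k:
        raise ValueError("not enough chunks to reconstruct")
    points = list(available.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


data = [5, 17, 42, 8]                                # k = 4 original chunks
extended = extend(data)                              # 8 chunks after 2x extension
survivors = {i: extended[i] for i in (1, 4, 6, 7)}   # any 4 of the 8
print(recover(survivors, k=4))                       # [5, 17, 42, 8]
```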

As well as the potential for data withholding, there is potential for edge cases and unknown complications due to the untested nature of DAS. For example, because the beacon block proposer is one of the first to make DAS queries, an attacker may publish unavailable BLOBs and respond only to the beacon node’s queries, which may lead to a fork of the chain. To account for these eventualities, randomly sampled committees (RSC) are used in conjunction with DAS. RSC are randomly chosen groups of validators that verify DA. Used alongside DAS, RSC enable lower latency, compatibility with in-shard transaction execution and an extra layer of security against potential edge cases.

Figure: Data availability coding

Implementing DAS reduces the resource requirements for running a node, even allowing light nodes to participate in verifying DA. It also helps to increase the scalability of a network, as the data each node must handle is reduced and the network can process more transactions in a given period. DAS and data sharding can be used together to improve scalability further, with nodes randomly assigned to sample and store data in each shard, reducing storage requirements. DAS can also improve security, as it is more difficult for malicious actors to target a specific node and attack the network; because each node checks availability independently, even an attacker controlling 51% of validators cannot force nodes to accept unavailable blocks.

Data Availability Layer

Recently, the concept of adopting a modular blockchain architecture has gained wider acceptance. Modular blockchains separate the many roles of a blockchain into distinct layers, allowing networks to specialise and operate with greater efficiency. DA layers are systems responsible for ensuring that data is available when it is needed. There are two types of DA layer: on-chain and off-chain.

The on-chain DA layer is the standard approach among many blockchains, in which data is stored on-chain by the nodes that execute transactions. This ensures high DA, but limits scalability due to the additional burden placed on nodes.

An off-chain DA layer may be another blockchain or any data storage system chosen by developers. In this case, the DA layer focuses on storing data rather than executing transactions. Such DA layers allow blockchains to access off-chain resources in a secure and trustless manner, expanding the computational power and data available to smart contracts.

Using a dedicated DA layer for data storage and accessibility improves the flexibility, scalability and decentralisation of the network. It allows DA-layer nodes to specialise in data storage and accessibility without performing other blockchain functions such as execution. Additionally, it can improve security by making the network less vulnerable to attacks, and it makes the network more adaptable to specific use cases and industries. The customisability provided by demarcating different layers of the blockchain stack also results in greater flexibility when upgrading or changing the network.
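The separation of roles in a modular design can be sketched as an interface: the DA layer only accepts data and returns a commitment, while the execution layer records that small commitment on-chain and fetches the full data from the DA layer when it needs it. Every class and method name here is hypothetical, and the in-memory store with a SHA-256 commitment stands in for a real DA network, which would also provide replication and sampling.

```python
import hashlib
from typing import Protocol


class DALayer(Protocol):
    def submit(self, blob: bytes) -> str: ...
    def retrieve(self, commitment: str) -> bytes: ...


class InMemoryDALayer:
    """Stores blobs keyed by a hash commitment; a stand-in for a real DA network."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def submit(self, blob: bytes) -> str:
        commitment = hashlib.sha256(blob).hexdigest()
        self._store[commitment] = blob
        return commitment

    def retrieve(self, commitment: str) -> bytes:
        blob = self._store[commitment]  # a KeyError here means the data is unavailable
        if hashlib.sha256(blob).hexdigest() != commitment:
            raise ValueError("data does not match commitment")
        return blob


class ExecutionLayer:
    """Keeps only small commitments on-chain; bulk data lives on the DA layer."""

    def __init__(self, da: DALayer) -> None:
        self.da = da
        self.onchain_commitments: list[str] = []

    def publish_batch(self, batch: bytes) -> str:
        commitment = self.da.submit(batch)
        self.onchain_commitments.append(commitment)
        return commitment


da = InMemoryDALayer()
chain = ExecutionLayer(da)
commitment = chain.publish_batch(b"compressed rollup batch")
print(len(commitment))          # 64
print(da.retrieve(commitment))  # b'compressed rollup batch'
```

Typing the dependency as a `Protocol` is one way to express the modular idea: any DA backend that offers `submit` and `retrieve` could be swapped in without changing the execution layer.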

Danksharding and Proto-Danksharding

Presently, each full node on the Ethereum network is required to store a complete copy of the blockchain, which places a large storage burden on each node. This burden could be mitigated by sharding the blockchain’s data between nodes.

First proposed in the paper “New sharding design with tight beacon and shard block integration”, danksharding is a new sharding architecture, created by Dankrad Feist, designed to mitigate MEV (Maximal Extractable Value), increase decentralisation and provide additional security. The design simplifies sharding compared to previous proposals by using binary large objects (BLOBs) for data storage. BLOBs represent a significant change from current blockchain data storage methods, in which all data, irrespective of its nature, is stored with transactions in blocks. Because data is inherently different from transactions, treating it as transactions is costly and slows the chain down by filling blocks without increasing the number of transactions they contain. BLOBs allow data to be stored as data rather than as transactions within a block. This segregation between the two types of information is what makes danksharding and proto-danksharding so transformative for Ethereum.
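To give a sense of what a BLOB holds, EIP-4844 defines a blob as 4,096 field elements of 32 bytes each, or 131,072 bytes. Each field element must be a valid element of the BLS12-381 scalar field, so arbitrary bytes cannot be copied in directly. The sketch below assumes a simple, conservative packing: 31 data bytes per element with a leading zero byte, so each element stays below the field modulus under a big-endian encoding. This packing scheme is an illustrative assumption, not a rollup’s actual encoding, and it discards the original data length, which a real encoding would also need to record.

```python
FIELD_ELEMENTS_PER_BLOB = 4096   # EIP-4844
BYTES_PER_FIELD_ELEMENT = 32     # EIP-4844
USABLE_BYTES_PER_ELEMENT = 31    # assumption: top byte left as zero to stay below the modulus
BLOB_SIZE = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT  # 131072 bytes


def pack_into_blobs(data: bytes) -> list[bytes]:
    """Split data into 31-byte chunks, prefix each with a zero byte, and group into blobs."""
    step = USABLE_BYTES_PER_ELEMENT
    elements = [
        b"\x00" + data[i:i + step].ljust(step, b"\x00")
        for i in range(0, len(data), step)
    ]
    blobs = []
    for i in range(0, len(elements), FIELD_ELEMENTS_PER_BLOB):
        group = elements[i:i + FIELD_ELEMENTS_PER_BLOB]
        # Pad the final blob with all-zero field elements up to the fixed size.
        group += [bytes(BYTES_PER_FIELD_ELEMENT)] * (FIELD_ELEMENTS_PER_BLOB - len(group))
        blobs.append(b"".join(group))
    return blobs


blobs = pack_into_blobs(b"\x01" * 200_000)
print(len(blobs), len(blobs[0]))  # 2 131072
```

Under this assumption each blob carries up to 126,976 bytes of payload, and the fixed size makes the cost of publishing data predictable.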

These changes are designed to facilitate Ethereum’s rollup-centric roadmap, with BLOBs expected to be filled with rollup data. To mitigate the harmful effects of MEV, PBS (Proposer-Builder Separation) will be implemented alongside danksharding. This system splits block production into two roles: builders, who watch the Ethereum mempool and order transactions to create the most profitable block, and proposers, who are selected to validate and propose the next block. Working via a merged fee market, danksharding enables builders to bid for the right to choose the contents of each block, including its BLOBs, so proposers need only select the builder with the highest bid.
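The proposer’s side of PBS reduces to a simple auction rule: accept the bid that pays the most, without needing to inspect or order transactions itself. The sketch below shows only that selection step. The `BuilderBid` fields, builder names and bid values are hypothetical, and real designs (such as today’s MEV-Boost relays or in-protocol PBS proposals) add commitments, relays and reveal steps that this omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuilderBid:
    builder: str
    block_hash: str  # commitment to the block the builder has assembled
    value_wei: int   # payment offered to the proposer


def select_bid(bids: list[BuilderBid]) -> BuilderBid:
    """Proposer-side rule: accept the highest-paying bid without inspecting block contents."""
    if not bids:
        raise ValueError("no bids received")
    return max(bids, key=lambda bid: bid.value_wei)


bids = [
    BuilderBid("builder-a", "0xaaa", 120_000_000_000_000_000),
    BuilderBid("builder-b", "0xbbb", 150_000_000_000_000_000),
    BuilderBid("builder-c", "0xccc", 90_000_000_000_000_000),
]
print(select_bid(bids).builder)  # builder-b
```

Keeping the proposer’s job this simple is the point of the design: the sophistication needed to extract MEV sits with specialised builders rather than with every validator.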

One remaining risk of sharding is that dishonest nodes could concentrate in a single shard and gain a majority there. The use of RSC (randomly sampled committees) mitigates this risk by distributing dishonest nodes randomly across shards, making it extremely unlikely that they gain a majority in any one of them. To further ensure network security in preparation for sharding, DAS is planned to be implemented in conjunction with RSC. Together, these two measures help protect the network against potential attacks and maintain its integrity.
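How unlikely a dishonest majority is can be computed directly. If a committee is drawn uniformly at random without replacement from `total` validators, of whom `dishonest` are malicious, the number of malicious members follows a hypergeometric distribution, and the probability of a dishonest majority is its upper tail. The validator counts and committee sizes in the example are hypothetical.

```python
from math import comb


def prob_dishonest_majority(total: int, dishonest: int, committee_size: int) -> float:
    """Probability that a uniformly random committee has a strict dishonest majority."""
    threshold = committee_size // 2 + 1
    favourable = sum(
        comb(dishonest, k) * comb(total - dishonest, committee_size - k)
        for k in range(threshold, committee_size + 1)
    )
    return favourable / comb(total, committee_size)


# Hypothetical network: 10,000 validators, one third of them dishonest.
for size in (16, 64, 128):
    print(size, prob_dishonest_majority(total=10_000, dishonest=3_333, committee_size=size))
```

The probability falls rapidly as committees grow, which is why random sampling, rather than letting validators choose their shard, is what keeps attackers from concentrating.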

Proto-danksharding, outlined in EIP-4844, is an intermediate step towards Ethereum’s complete sharding roadmap. It aims to lay the groundwork for sharding and differs from danksharding in that it requires all validators to download BLOB data and verify its availability directly. It introduces a new transaction format, the BLOB-carrying transaction, which carries additional data in the form of BLOBs and is cheaper than posting the same data as calldata. It uses KZG commitments to the BLOB data, laying the groundwork for DAS in full danksharding. The use of BLOBs by both models is a significant improvement over current DA and data storage methods, reducing storage requirements for nodes and thereby increasing the scalability of the network; consensus nodes only need to keep BLOBs for a limited period (roughly 18 days) rather than indefinitely. Additionally, the mitigation of front-running and other forms of MEV decreases losses incurred by transactors, making danksharded networks more attractive to users. Finally, large, fixed-size BLOBs allow rollups and other users to publish large amounts of data at a lower cost.
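BLOB data is priced through its own fee market, separate from regular gas. The sketch below follows the blob base-fee mechanism as specified in EIP-4844: each block tracks “excess blob gas” above a target, and the base fee per unit of blob gas grows exponentially with that excess, approximated with integer arithmetic by `fake_exponential`. The constants are those of the original EIP-4844 specification; later upgrades have adjusted the targets and update fraction, and the sequence of full blocks in the example is hypothetical.

```python
# Constants from the original EIP-4844 specification (later upgrades changed some of them).
MIN_BASE_FEE_PER_BLOB_GAS = 1
BLOB_BASE_FEE_UPDATE_FRACTION = 3338477
GAS_PER_BLOB = 2**17                          # 131072
TARGET_BLOB_GAS_PER_BLOCK = 3 * GAS_PER_BLOB  # 393216


def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    """Integer approximation of factor * e ** (numerator / denominator) via a Taylor series."""
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator


def calc_excess_blob_gas(parent_excess: int, parent_blob_gas_used: int) -> int:
    """Excess accumulates when blocks use more blob gas than the target."""
    if parent_excess + parent_blob_gas_used < TARGET_BLOB_GAS_PER_BLOCK:
        return 0
    return parent_excess + parent_blob_gas_used - TARGET_BLOB_GAS_PER_BLOCK


def blob_base_fee(excess_blob_gas: int) -> int:
    return fake_exponential(MIN_BASE_FEE_PER_BLOB_GAS, excess_blob_gas, BLOB_BASE_FEE_UPDATE_FRACTION)


excess = 0
for _ in range(10):
    # Hypothetical: ten consecutive blocks each use 6 blobs (the original per-block maximum).
    excess = calc_excess_blob_gas(excess, 6 * GAS_PER_BLOB)
print(blob_base_fee(0))       # 1
print(blob_base_fee(excess))
```

When demand stays at or below the target, the excess decays to zero and blob data costs the minimum fee. Sustained demand above the target pushes the fee up exponentially until usage falls back.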

Conclusion

Data availability is the backbone of any blockchain network; without it, a network would be dysfunctional and unusable. It ensures that all network participants have access to the data stored on the blockchain, allowing anyone to verify transactions efficiently, securely and accurately. Recent advancements, such as dedicated DA layers, new BLOB-carrying transaction formats and danksharding, are helping to reduce the cost of meeting the guarantee. Furthermore, as perception shifts from monolithic to modular blockchains, adequately managing the DA layer is becoming ever more important. Nevertheless, DA remains a ripe area for future research and innovation, with new solutions constantly being developed and refined to address its storage and cost issues.

About Zerocap

Zerocap provides digital asset investment and digital asset custodial services to forward-thinking investors and institutions globally. For frictionless access to digital assets with industry-leading security, contact our team or visit our website www.zerocap.com

DISCLAIMER

Zerocap Pty Ltd carries out regulated and unregulated activities.

Spot crypto-asset services and products offered by Zerocap are not regulated by ASIC. Zerocap Pty Ltd is registered with AUSTRAC as a DCE (digital currency exchange) service provider (DCE100635539-001).

Regulated services and products, which include structured products (derivatives) and funds (managed investment schemes), are available to Wholesale Clients only as per Sections 761GA and 708(10) of the Corporations Act 2001 (Cth) (Sophisticated/Wholesale Client). To serve these products, Zerocap Pty Ltd is a Corporate Authorised Representative (CAR: 001289130) of AFSL 340799.

All material in this website is intended for illustrative purposes and general information only. It does not constitute financial advice nor does it take into account your investment objectives, financial situation or particular needs. You should consider the information in light of your objectives, financial situation and needs before making any decision about whether to acquire or dispose of any digital asset. Investments in digital assets can be risky and you may lose your investment. Past performance is no indication of future performance.

FAQs

What is Data Availability in the context of blockchain technology?

Data Availability (DA) in the context of blockchains is the guarantee that the data of every transaction included in a proposed block has been published and is accessible to every participant in the network. Without this data, network participants are unable to confirm that a particular event has occurred, which from the blockchain’s perspective means the event did not occur. This is why DA is essential for maintaining the integrity of, and trust in, a blockchain network.

What is the difference between a full node and a light node in a blockchain network?

A full node in a blockchain network stores a copy of the entire blockchain and participates in the network’s consensus protocol. It validates transactions and blocks, and propagates them throughout the network. A light node, also known as a “light client” or “SPV (Simple Payment Verification) node”, does not store a full copy of the blockchain. Instead, it relies on full nodes to provide it with the information it needs to validate transactions. Light nodes are typically used by individuals or organisations that want to participate in the network but do not want to run a full node.

What are the costs associated with Data Availability in blockchain networks?

Blockchain networks incur a variety of costs to provide DA guarantees. Foremost among these are the computational resources required to replicate, verify and store data. Because these processes are vital to network function and demand time and effort from nodes, nodes must be adequately incentivised to perform them. Ensuring DA and providing data storage are therefore two of the primary costs of running a blockchain.

What are rollups and how do they relate to Data Availability?

Rollups are a layer 2 scaling solution that execute transactions off-chain before compressing them and posting them in batches to the base layer. DA is important for rollups because it allows users to trustlessly verify the correctness of computations performed off the main chain. There are two main types of rollup: optimistic rollups and zero-knowledge rollups (ZKR). Both publish transaction data to the base layer; the primary difference between them lies in how the validity of that data is verified.

What are some innovations benefiting Data Availability in blockchain technology?

Innovations in blockchain technology are providing new ways to resolve cost and storage issues relating to the DA guarantee. Recent innovations such as data sharding, data availability sampling and randomly sampled committees have introduced more efficient methods for ensuring DA and providing data storage. Additionally, the concept of adopting a modular blockchain architecture, which separates the many roles of a blockchain into unique layers, has gained wider acceptance. This includes the concept of a dedicated DA layer for data storage and accessibility.
