
Philip Williams
on 22 December 2021

A look forward to storage in 2022

It’s that time of year when we start to look ahead and think about the ongoing trends in our various industries. One thing is certain in the storage industry: capacity demand remains high, with the industry observing continued exponential growth.

Growth, growth, growth

More and more data is being created every day. It truly is non-stop. In 2021 alone, it was predicted that enterprise storage vendors would ship almost 150 Exabytes in capacity, and this number is only expected to increase again in 2022!

We now see 20TB hard drives on the market to help with these needs, but we have to remain vigilant when building storage clusters, as the access speed of these drives hasn’t really changed over the last few years. In failure scenarios, where we have to recreate replicas or erasure-coded shards of data, recovery can take many hours with drives of such high capacity.

So the rule of thumb remains the same: a larger number of smaller drives leads to a more predictable system for any amount of capacity. Of course, you do have to remain pragmatic to balance capacity needs with the cost of increasing the number of spindles.
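To put rough numbers on this, here is a minimal back-of-the-envelope sketch in Python. It assumes the simplest case, where a single drive's sustained throughput is the bottleneck when rewriting that drive's worth of data. The 200 MB/s figure and the function name are illustrative assumptions, not measurements. Real clusters spread recovery across many drives and usually throttle it to protect client I/O, so treat the result as a rough indication only.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float, fill_ratio: float = 1.0) -> float:
    """Estimate the hours needed to rewrite one drive's data at a sustained rate.

    capacity_tb     -- drive capacity in decimal terabytes (1 TB = 10**12 bytes)
    throughput_mb_s -- assumed sustained throughput in MB/s (1 MB = 10**6 bytes)
    fill_ratio      -- fraction of the drive that actually holds data
    """
    bytes_to_copy = capacity_tb * 1e12 * fill_ratio
    seconds = bytes_to_copy / (throughput_mb_s * 1e6)
    return seconds / 3600


# Assumption for illustration only: the same sustained rate for every drive size,
# reflecting the point that throughput has not kept pace with capacity.
ASSUMED_THROUGHPUT_MB_S = 200.0

for capacity_tb in (4, 10, 20):
    hours = rebuild_hours(capacity_tb, ASSUMED_THROUGHPUT_MB_S)
    print(f"{capacity_tb:>2} TB drive: roughly {hours:.1f} hours")
```

Under these assumptions, recovery time grows linearly with drive capacity, which is why the same total capacity spread over more, smaller drives gives shorter and more predictable recovery windows.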

Flash, denser, and faster

Over the last few years, we have seen huge leaps forward in capacity-oriented flash. Intel recently launched a 30TB QLC 3D NAND drive, surpassing even the largest of traditional spinning drives. Whilst we wouldn’t suggest using these for very write-heavy workloads, there is definitely a place for them in storage systems to increase throughput above traditional spindle-based configurations. Additionally, there are power usage benefits, which become more and more important in large-scale clusters as you scale, and even at the Edge, where power budgets might be quite limited!

Computational storage

An interesting and novel area in hard drive technology is the concept of computational storage, that is, adding more intelligence to the hard drives and SSDs that we use in servers and storage clusters.

We have seen work in this area before, but the use case was almost too narrow. Seagate created a hard drive called Kinetic, which exposed a key/value object storage interface over Ethernet, rather than the usual block interfaces of SAS or SATA. This was interesting for those of us building larger scale object stores. It meant that, with each hard drive added to a cluster, an additional amount of compute resource was added too, leading to a highly scalable sea-of-compute-and-storage. Furthermore, it reduced failure domains significantly to a single disk, rather than a whole server containing multiple disks. However, this concept didn’t really gain much traction, as it required significant changes to the software used to build storage clusters. In the case of Ceph, there just weren’t enough resources on each drive to run an entire OSD.

Fast forward to 2021, and we see some smaller companies start to offer products that maintain typical SAS and SATA interfaces, but also provide capacity efficiency options like compression, or encryption, on-drive, without the requirement of any host processing power, or changes to the software running on the server.

This is a lot like what we have seen already in the Ethernet space, where certain tasks are offloaded to Smart-NICs. With some computationally aware storage devices, it is already possible to access the compute resources on these drives and use them for pre-processing datasets. When you have a storage system with thousands of drives, this becomes a huge amount of additional computing power at your disposal.

Data repatriation – post pandemic splurge

Over the last two years, we have all seen huge changes in the way that we work. To support that, many companies have turned to public clouds to help them scale their operations immediately and maintain business as usual. Cost optimisation has largely been a secondary consideration.

However, as companies have settled into these new ways of operating, we now see a renewed focus on cost optimisation and efficiency. Storage remains the least cloud-friendly piece of infrastructure, as usage is typically static or expanding, and doesn’t have peaks and troughs like compute might.

More and more companies are waking up to the costs of storing data in the cloud, and are considering near-cloud solutions where they operate their own hardware in co-location facilities adjacent to major cloud provider facilities, and link them together with private interconnects. Not only does this reduce costs immediately, it also means that there are no penalties when migrating to other cloud providers in the future!

Wrap up

We wish you all Happy Holidays and a wonderful New Year!

Open source storage solutions such as Ceph can readily help solve the growth and scaling challenges seen across the industry. Learn more about deploying Ceph from our recent webinar here.
