Why mass data infrastructure strategy needs to be built with this powerful force in mind.
More than ever, today’s mass data sets are on the move. According to the 2021 IDC Cloud Data Storage and Infrastructure Trends Survey, commissioned by Seagate Technology, 47 per cent of enterprises use a centralised cloud storage architecture; respondents expect that share to fall to 22 per cent within two years. Conversely, 25 per cent currently run a hybrid storage architecture (a combination of centralised and edge locations); that share is expected to rise to 47 per cent over the same period.
As a result, data increasingly needs real-time processing at the edge and transport to the cloud, where computationally intensive tasks (such as training large machine learning models) can extract further value from it. Additionally, the volume of enterprise data is rising exponentially – at an astonishing average annual growth rate of 42 per cent.
Mass data matters
As this proliferating data spreads from cloud to edge, organisations must contend with a shift in data gravity.
According to another new Seagate-sponsored report by IDC, Future-Proofing Storage: Modernizing Infrastructure for Data Growth Across Hybrid, Edge, and Cloud Ecosystems, as storage associated with massive data sets continues to grow, so will its gravitational force on other elements within the IT universe.
Just as stars form from scattered clouds of dust that collapse over time from their own gravitational attraction, concentrations of data have a gravitational impact too. Data gravity is the power of data to attract applications, services and other data. “Workloads with the largest volumes of stored data exhibit the largest mass within their ‘universe,’ attracting applications, services, and other infrastructure resources into their orbit,” according to the IDC report.
Generally speaking, data gravity is a consequence of the amount of data (mass) and its level of activation. A body of data with greater mass exerts a stronger pull on the infrastructure surrounding it.
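The Newtonian analogy above can be sketched in code. The following toy score – mass multiplied by activation, attenuated by the square of network latency – is purely illustrative; the weighting and the function name are assumptions for this sketch, not a standard industry metric.

```python
# Illustrative sketch only: a toy "data gravity" score following the article's
# Newtonian analogy. The formula (mass x activation / latency^2) is an
# assumption for illustration, not an established measure.

def data_gravity_score(mass_tb: float, activation: float, latency_ms: float) -> float:
    """Heavier, more active data pulls harder; distance (latency) weakens the pull."""
    return (mass_tb * activation) / (latency_ms ** 2)

# Two candidate placements for a new analytics service against the same
# 500 TB, highly active data set:
near_large_set = data_gravity_score(mass_tb=500, activation=0.9, latency_ms=2)   # co-located
far_large_set = data_gravity_score(mass_tb=500, activation=0.9, latency_ms=40)   # remote region

# The co-located placement scores far higher, so the service is "attracted"
# into the orbit of the large, active data set.
assert near_large_set > far_large_set
```

The point of the sketch is simply that as mass and activation grow, co-location wins by an ever-wider margin – which is the pull the IDC report describes.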
What does all this mean for enterprise leaders? They must implement a strategy to efficiently manage mass data sprawling across cloud, edge, and endpoint environments – especially when designing data storage infrastructure at scale.
What worked for terabytes doesn’t work for petabytes. As enterprises aim to overcome the cost and complexity of storing and activating data at scale, they should seek better economics, less friction, and a simpler experience that’s open, limitless and built for the data-driven, distributed enterprise.
Specific attention should be given to the economics of data movement. Physical data shuttles such as the Seagate® Lyve Mobile, for example, can often prove a more cost-effective and faster solution than network transfer for large data ecosystems, shortening time to insights. As the Future-Proofing Storage report finds, IT environments need architectures that enable “the migration and management of stored data, along with the applications and services that rely on it, regardless of operational location.”
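A back-of-envelope calculation shows why physical shuttles can win at scale. The numbers below (1 PB of data, a 1 Gbps link at 80 per cent utilisation) are assumed for illustration, not drawn from the article.

```python
# Back-of-envelope sketch with assumed numbers: when does physically
# shipping a data shuttle beat transferring the data over the network?

def network_transfer_days(data_tb: float, bandwidth_gbps: float,
                          utilisation: float = 0.8) -> float:
    """Days needed to move data_tb terabytes over a link at the given utilisation."""
    bits = data_tb * 8e12                                  # terabytes -> bits
    seconds = bits / (bandwidth_gbps * 1e9 * utilisation)  # effective link rate
    return seconds / 86400                                 # seconds -> days

# Moving 1 PB (1,000 TB) over a 1 Gbps link at 80% utilisation:
days = network_transfer_days(1000, 1.0)  # about 115 days
```

Against a few days of shuttle logistics, the network transfer loses by months here – before egress charges are even counted. Only at much higher sustained bandwidths does the comparison tighten.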
You can learn more about how enterprises can overcome the challenges posed by data gravity by watching Seagate’s Datasphere 2021 General Session, where Seagate experts offer guidance on managing mass data, its exponential growth and sprawl.
Ensuring access to data
One way to ensure access is to store data closer to the applications that require low latency. This can be accomplished with cloud-native designs that containerise applications so they execute close to users and interact with, create, and store data near its point of origin. Containerising applications also simplifies management and deployment.
Containerisation provides a clean separation of concerns: developers focus on application logic and dependencies, while IT operations teams concentrate on deployment and management without dealing with details such as software versions and app-specific configurations. The benefits to businesses are agility and efficiency, and often improved security and a lower total cost of ownership (TCO).
Enterprise data infrastructure should follow the five key principles of cloud-native architecture, as outlined by Google:
1. Design for automation (of infrastructure).
2. Be smart with state.
3. Favour managed services.
4. Practice defence in depth.
5. Always be architecting.
Data-centric architecture means accessibility. It makes a data pipeline easier to use and smoother to operate, and it can fuel future business innovation: improving the ability to generate metadata and new data sets, enabling search and discovery of the data, and empowering data scientists to deploy the resulting machine learning models.
Accessibility can also positively impact application performance, reduce latency, curb or eliminate egress charges, and make it easier to manage security and compliance.
A truly data-centric storage infrastructure means awareness of which data sets are being pulled where, what the most efficient path is to move the data, and what helps application workloads run best. It can also include automating the movement of data to reduce storage costs – for example, shifting data sets that are not immediately or actively needed onto lower-performing, lower-cost tiers.
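Such automated movement can be driven by a simple access-based policy. The sketch below is a minimal illustration; the inventory structure, tier names, and 90-day threshold are assumptions for this example, not details from the article.

```python
# Minimal sketch of a policy-driven tiering decision, assuming a hypothetical
# inventory mapping data-set names to last-access timestamps. The 90-day
# threshold is an illustrative policy choice.
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=90)  # assumed policy: untouched for 90 days -> cold tier

def plan_tier_moves(datasets: dict, now: datetime) -> list:
    """Return the names of data sets that policy would move to lower-cost storage."""
    return [name for name, last_access in datasets.items()
            if now - last_access > COLD_AFTER]

now = datetime(2021, 6, 1)
inventory = {
    "ml-training-frames": datetime(2021, 5, 28),   # actively used, stays on the hot tier
    "2019-sensor-archive": datetime(2020, 11, 3),  # dormant, candidate for the cold tier
}
moves = plan_tier_moves(inventory, now)  # ["2019-sensor-archive"]
```

A real implementation would also weigh retrieval cost, compliance constraints, and the application workloads orbiting each data set, but the decision structure is the same.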
The business benefits of storage infrastructure designed with data gravity in mind include an excellent customer experience, protection of data sets, policy-driven access, the lowest costs for retention, preservation of data for analysis, and the management simplicity that ensures service resiliency.
Learn more about data gravity, hybrid architecture, overcoming network constraints and the growing complexity of storage management in the new Seagate-sponsored report from IDC, Future-Proofing Storage: Modernizing Infrastructure for Data Growth Across Hybrid, Edge, and Cloud Ecosystems.
By John Morris, Senior Vice President and Chief Technology Officer, Seagate Technology