An Introduction to Dark Data (Exploring Dark Data Series – Part 1)

February 12, 2023

Dark data is a growing concern for enterprises. As data volumes grow exponentially, managing and leveraging that data effectively has become a critical challenge, and a valuable pursuit, for organizations. In this series of articles, we will delve into the world of dark data, exploring what it is and the challenges and opportunities it presents. We will also take a deep dive into how to unearth the data hiding in darkness and how to tackle it. Join us as we embark on a journey of discovery, unlocking the potential of dark data in organizations.

Data is everywhere, but it's not always used. This so-called "dark data" is all the unseen, unexplored, and untapped information that organizations generate as they interact with countless devices and systems. This data is often old, incomplete, redundant, or in a format that can't be accessed with current tools. In the era of big data, competitive companies must take advantage of all the data they have at their disposal — and dark data may be the key.

But how do companies research, access, and analyze dark data? How would an enterprise, with petabytes of data and constant daily production of new data, even approach managing dark data? What about the risks and challenges? It starts with understanding what dark data is and creating a comprehensive strategy for extracting value from it. This can include developing the tools and processes necessary to process and manage the data, as well as being prepared for the stricter data regulations that may come with it. This article is the first of our Exploring Dark Data series, and together we will explore the definition of dark data, how it can affect your organization, and how organizations can research, access, and analyze their dark data.

The digital revolution has ushered in a new data age, and businesses must be ready to take full advantage of their dark data in order to stay ahead. With the right strategy in place, organizations can unlock the untapped potential of this hidden information and use it to power their growth.

What is Dark Data?

Dark data is a term used to describe data that is collected but not used to further a company's goals. It's data that's left on the cutting room floor, sitting untouched in databases and archives. It's a digital trove of untapped potential, and it's estimated that dark data makes up more than 90 percent of all the data that companies collect.

While it's easy to understand why this data is left untouched—it's often difficult to interpret, or it simply isn't seen as valuable—the potential of dark data is being increasingly recognized. Companies are turning to advanced analytics, machine learning, and artificial intelligence to help them make sense of the data and extract insights that could help them better understand their customers, improve their products and services, and stay ahead of their competition.

Dark data is an untapped resource, but it's also a potential liability. Without taking the necessary steps to secure the data, companies risk exposing it to hackers, putting their customers' data at risk. Companies need to ensure that they have the right security measures in place to protect their dark data and prevent a data breach. It's also important for companies to be aware of any legal requirements for collecting and storing data, such as GDPR, and ensure that they are properly adhering to them.

The Bigger, The Darker

As the digital age continues to advance, the sheer volume of data organizations produce on a daily basis is becoming increasingly difficult to manage. Organizations produce unknown, undiscovered, unquantified, underutilized, or completely untapped data at a rate that far exceeds their ability to analyze it.

The problem is further exacerbated by the rapid increase in unstructured data, which is data that is not organized by any pre-defined model. Forbes reports that this type of data is rising at a rate of 55-65% per year. With an estimated 1.7 MB of data being created per person per second, it's projected that by 2025, there will be 175 trillion gigabytes (175 zettabytes) of data globally. Of this, 80% will be unstructured, and 90% of that unstructured data will never be analyzed or used in regular business activities. Despite the potential value of this data and the cost of storage, many organizations are failing to tap into it due to the sheer volume and complexity of managing it.
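
To put those percentages in perspective, the short sketch below works through the arithmetic using only the figures quoted above. The variable and function names are illustrative, and decimal units are assumed (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes).

```python
# Figures quoted in the article; names are illustrative, not from any source.
TOTAL_ZB = 175               # projected global data by 2025, in zettabytes
UNSTRUCTURED_SHARE = 0.80    # share expected to be unstructured
NEVER_ANALYZED_SHARE = 0.90  # share of unstructured data never analyzed

# Assumes decimal units: 1 ZB = 10^21 bytes, 1 GB = 10^9 bytes.
GB_PER_ZB = 10**21 // 10**9


def dark_unstructured_zb(total_zb, unstructured_share, never_analyzed_share):
    """Zettabytes of unstructured data that would never be analyzed."""
    return total_zb * unstructured_share * never_analyzed_share


total_gb = TOTAL_ZB * GB_PER_ZB
unstructured_zb = TOTAL_ZB * UNSTRUCTURED_SHARE
dark_zb = dark_unstructured_zb(TOTAL_ZB, UNSTRUCTURED_SHARE, NEVER_ANALYZED_SHARE)

print(f"Total: {total_gb:,} GB")
print(f"Unstructured: {unstructured_zb:.0f} ZB")
print(f"Unstructured and never analyzed: {dark_zb:.0f} ZB")
```

Taken at face value, the quoted figures imply roughly 140 ZB of unstructured data, of which about 126 ZB would never be analyzed.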

Types of Dark Data

The concept of dark data is not a one-size-fits-all phenomenon. Different industries have their own unique types of data that fall under the "dark" category. For example, in the fitness industry, a running app might collect background weather data as a user runs. This data could include information such as temperature, humidity, and wind speed. While this data may be collected, it may not be analyzed or used to improve the app's features. Similarly, a shopping app might collect information about a user's browsing history. The goal of this might be to personalize the shopping experience for the user, but more often than not, the majority of the collected data won't be analyzed or used to improve the app's features.

In healthcare, medical imaging is a common source of dark data. These images, such as X-rays and MRIs, are often stored in a digital format, but may not be analyzed or used to diagnose or treat patients. Healthcare organizations do collect large amounts of patient data but may not have the resources or technology to analyze it and extract valuable insights.

In the finance industry, dark data can take the form of transaction data, customer data, and financial reports. Despite being collected and stored, these data may forever remain unanalyzed and unused in efforts to enhance financial forecasting or detect fraud.

In the retail industry, customer purchase history, browsing history, demographic data, and in-store IoT data all contribute to the ever-growing cache of dark data.

The examples above are just a small sampling of how different industries may collect and store different types of data that can be considered "dark." The key takeaway is that while data is being collected at an unprecedented rate, not all of it is being analyzed or used to its fullest potential.

Where is Dark Data Hiding?

According to Gartner, dark data makes up the majority of an enterprise's information universe, but many companies are unaware of how much dark data they possess. Not only does the storage of dark data increase compliance and cybersecurity risks, but it also comes at a significant cost.

Uncovering the existence and location of dark data within an organization is crucial for securing valuable information and getting rid of what's not needed. However, the true potential of dark data lies in its ability to benefit the business. By identifying, understanding, and utilizing this data, organizations can gain valuable insights, improve decision-making, and potentially uncover new revenue streams.

The challenge for organizations is to find ways to extract value from the vast amount of data they collect and store. This requires not only the right technology but also a clear understanding of what data is valuable, where it is stored, and how to use it. Organizations that can successfully navigate this challenge will not only be able to reduce costs and risks but also gain a competitive advantage by leveraging the insights hidden within their dark data.

Mining dark data is a challenging task that requires a comprehensive approach. The main obstacle is the diversity of formats that dark data can come in. It can be unstructured data, such as text, images, audio, video, or scanned documents, or structured data, such as spreadsheets and databases. This variety makes it hard to identify the data and extract insights from it.
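
As a rough first step toward understanding that diversity, an organization could count the files in a repository by broad format category. The sketch below is a minimal illustration, not a discovery tool: the extension-to-category mapping and the function names are assumptions made for this example, and a file extension is only a hint about what a file really contains.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to a coarse format category.
# A real inventory would need a much longer list and content inspection.
CATEGORY_BY_EXTENSION = {
    ".txt": "text", ".eml": "text", ".docx": "text",
    ".csv": "structured", ".xlsx": "structured", ".db": "structured",
    ".jpg": "image", ".png": "image", ".tiff": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
    ".pdf": "document",
}


def categorize(path):
    """Return the coarse category for a path, or 'unknown' if unmapped."""
    return CATEGORY_BY_EXTENSION.get(Path(path).suffix.lower(), "unknown")


def inventory_by_category(root):
    """Count the files under root (recursively) by coarse format category."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[categorize(path)] += 1
    return counts
```

Calling inventory_by_category on a shared folder returns a Counter of categories. A large "unknown" bucket is itself a useful signal that part of the repository is in formats nobody has accounted for.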

Another obstacle is that dark data can be completely unformatted, making it hard to analyze or understand. For example, dark data may be stored in the form of unstructured text in emails, social media posts, and customer service interactions. This type of data can be hard to analyze because it is not organized in any pre-defined manner.

Dark data can also be hidden in legacy systems and applications that organizations no longer use. For example, an organization that is using a new CRM system may not be able to access customer data that is still stored in an older system. This data becomes "dark" if it is not properly migrated or integrated with the new system.

Even if we are thinking about applications and information storage repositories that enterprises are actively using, just think about how many such places there are in any given organization. From SharePoint and cloud storage services like Google Drive, Box, and Dropbox, to network drives with shared folders, the list just keeps on expanding!
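
For repositories that are reachable as an ordinary file system, such as a mounted network drive, one simple heuristic is to flag files that have not been modified for a long time. The sketch below assumes that last-modified time is a reasonable proxy for "untouched" and uses a one-year threshold; both are assumptions for illustration, as is the function name. Cloud services such as SharePoint, Google Drive, Box, and Dropbox would need their own APIs and are not covered here.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_stale_files(root, max_age_days=365, now=None):
    """Return (path, age_in_days) for files not modified within max_age_days.

    The result is sorted with the oldest file first. The 365-day default
    is an arbitrary threshold chosen for this example.
    """
    now = time.time() if now is None else now
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        age_days = (now - path.stat().st_mtime) / SECONDS_PER_DAY
        if age_days > max_age_days:
            stale.append((path, age_days))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

A stale file is not necessarily dark data, and a recently modified file may still go unused, so the output is best treated as a list of candidates for review.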

We have barely scratched the surface of dark data with this first instalment. In part 2 of our series, we will delve into why dark data is a problem for organizations and explore the many challenges it presents.
