Deep Dive: How Data Provides Businesses With Competitive Analytics

U.S. businesses are constantly battling to stay ahead of fraudsters, and 80 percent of IT business leaders expect cyberattacks or critical breaches to occur on their watches within the year. Data holds the key to helping modern enterprises develop effective anti-fraud strategies. Many businesses are sitting on massive troves of it, but they are also facing down the three “V’s” of data complexity — velocity, variety and volume — which can make tackling fraud even harder.

Fraud solutions are needed to help firms understand the meaning behind increasingly complex data sets. These solutions use Big Data analytics and machine learning (ML) to help businesses better detect fraud and reduce the risks of financial losses. Such insights can also help firms improve customers’ experiences and lower operational costs.

Businesses that do not implement fraud solutions risk underutilizing the data they have and overemphasizing ineffective anti-fraud strategies. These businesses’ strategies are often hamstrung by legacy fraud solutions that rely on data warehouse technology. Such warehouses store data from various sources but often lack the flexibility to consider new types, collect additional sources or modify queries.

Data lakes offer an alternative that could help businesses update their anti-fraud solutions. They collect and store data in large repositories, but, unlike data warehouses, they can also give structure and meaning to information whose significance might not be obvious. Firms can use artificial intelligence (AI) and ML tools to glean insights from data lakes, allowing them to analyze new fraud threats and determine how best to respond. The following Deep Dive uncovers how data lakes are causing a sea change in the fight against fraud.

Structured Versus Unstructured Data

Comprehending how data lakes function requires understanding how the data stored in them works. Data falls into two categories: structured and unstructured. The former refers to information kept in businesses’ databases that has readily discernible meaning. It is estimated that businesses use less than half of the structured data available to them, even though it is already sorted and organized for use.

Unstructured data is more elusive and refers to the contextual information stored outside most businesses’ internal systems that gives meaning to structured data. Understanding how unstructured data can be analyzed for actionable insights is necessary for determining how it functions in data lakes.

AI and ML systems often struggle to determine the meaning and sentiment behind messages. Understanding unstructured data such as photos, images, videos, text messages, social media posts, PDFs, text documents and emails can be particularly challenging for such systems. Data lakes can collect and process this information, as well as other details like server logs, individual device data and international blacklists, to enable advanced learning tools to analyze data more comprehensively.

Most companies looking to fight fraud are failing to tap into this information. Some sources estimate that modern businesses use as little as 1 percent of their unstructured data, meaning many do not consider the context when scanning for fraud. Unstructured data is nonetheless projected to account for approximately 80 percent of the data enterprises process on a daily basis by 2025, indicating that firms face a significant gap between the amount of data they use and the high volume that will be available to them.

Role Of Human Analysts

Human analysts are often the most adept at detecting contextual particularities inherent in unstructured data. Such information usually consists of written and spoken language, the nuances of which algorithmic tools like AI and ML have difficulty understanding.

Using human employees to assess each transaction for potential fraud can drain funds and deplete resources, leading most modern companies to rely on digital tools to power large-scale, anti-fraud initiatives. This means they fail to adequately invest in technologies capable of parsing unstructured data, and many possess core operating systems that cannot even store such information.

This is a problem that can be easily remedied. There are a host of technologies that can help businesses make sense of syntactic and grammatical data, but firms must first invest in technologies to store it. Data lakes are therefore crucial to enabling comprehensive data and analytics strategies. These repositories are capable of storing not only structured and unstructured data, but also semi-structured data, which exhibits characteristics of both previous forms.
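The three forms of data described above can be illustrated with a small Python sketch. The records, field names and the `describe` helper are hypothetical, and the classification rule (a table-like row is structured, a parseable JSON log line is semi-structured, anything else is unstructured) is a simplifying assumption made for illustration, not a description of how any particular data lake works.

```python
import json

# Hypothetical records, for illustration only.
structured_row = {"txn_id": 1001, "amount": 249.99, "currency": "USD"}  # fixed, known fields
semi_structured = '{"txn_id": 1001, "device": {"os": "android", "ip": "203.0.113.7"}}'  # JSON log line
unstructured = "Customer called twice asking to rush the refund to a new account."  # free text

def describe(record):
    """Label a record by form, using a deliberately simple rule of thumb."""
    if isinstance(record, dict):
        return "structured"
    try:
        parsed = json.loads(record)
        if isinstance(parsed, dict):
            return "semi-structured"
    except json.JSONDecodeError:
        pass  # not JSON, so treat it as free text
    return "unstructured"

# A stand-in for a data lake: every record is kept in its raw form,
# and meaning is assigned only when the record is read.
lake = [structured_row, semi_structured, unstructured]
labels = [describe(r) for r in lake]
```

The point of the sketch is that all three records sit side by side in the same store, whereas a repository limited to fixed fields would only have room for the first.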

Insurance providers have been particularly quick to adopt data lakes and other tools that tap into businesses’ unstructured data reserves, as the success of the insurance business model hinges on providers’ abilities to detect and thwart fraudulent claims. These insurers collect unstructured data in data lakes and apply special tools to sift through it.

Some of the more common tools include decision logic and language processing functions. These solutions power a sophisticated form of text mining that allows insurers to scan text stored in data lakes for keywords indicating fraud and even examine handwritten claims to assess their validity.
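A keyword scan of this kind can be sketched in a few lines of Python. The keyword list, claim identifiers and claim notes below are invented for illustration, and the `flag_claim` helper is hypothetical; production text mining tools rely on far richer language processing than whole-word matching.

```python
import re

# Hypothetical indicator phrases; a real insurer would maintain its own list.
FRAUD_KEYWORDS = ["staged", "total loss", "cash only", "no witnesses"]

def flag_claim(text, keywords=FRAUD_KEYWORDS):
    """Return the sorted keywords found in a claim note (case-insensitive, whole-word match)."""
    found = set()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(kw)
    return sorted(found)

# Invented claim notes standing in for text pulled from a data lake.
claims = {
    "C-001": "Vehicle was a total loss. No witnesses were present at the scene.",
    "C-002": "Minor windshield chip repaired at an approved shop.",
}

flagged = {claim_id: flag_claim(note) for claim_id, note in claims.items()}
# Claims with at least one hit would be routed to a human analyst for review.
needs_review = [claim_id for claim_id, hits in flagged.items() if hits]
```

A match here is only a signal for further review, not proof of fraud, which is why the sketch routes hits to an analyst rather than rejecting the claim.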

The applicability of data lakes, text mining and other decision logic- and language processing-based functions extends beyond the insurance and financial services sectors. Strategies that rely on both structured and unstructured data have been so successful that U.S. agencies now regularly employ them. The Department of the Treasury utilizes a cloud-based solution called the Workplace.gov Community Cloud (WC2), for example, which provides it with data analysis capabilities far beyond those offered by its previous system. The WC2 can not only collect audio files, but also transcribe them and even provide sentiment analyses based on their content.

The collection and analysis of contextual data is particularly important as more consumers’ personally identifiable information (PII) is compromised via targeted scams and large-scale breaches. Some estimates claim that as much as 34 percent of U.S. consumers had their PII compromised in 2018 alone, and half of all consumers can find their birthdates, passwords, credit card details or even their Social Security numbers floating on the dark web. This means fraudsters have plenty of resources available as they attempt to access accounts and steal money.

Checking the PII businesses gather from their customers is often not enough to root out fraudsters, and having plans in place to collect and analyze contextual data is crucial to firms’ anti-fraud efforts. Most businesses have a long way to go before they are fully equipped to combat fraudsters, but the path they must follow is clear. They must learn to store, analyze and utilize their unstructured data, and data lakes are the first step in this journey.

Businesses must contend with the three V’s of data, but the right AI and ML tools can add two more V’s that can aid them: visibility and value. Data lakes and AI tools can provide enterprises with greater transparency, and insights into their data can help convert unstructured data into actionable intelligence.