Culling the data set for effective eDiscovery

By Staff

In the first instalment of a two-part series on effective eDiscovery, Epiq's Jason Bell-Masterson explores objectives, budgetary limitations and how to cull data to create a focused data set.

Electronic discovery is increasingly used to collect, sort through and review large volumes of electronic data in the course of a lawsuit, investigation or arbitration, and following a logical process and sequence ensures it’s effective and efficient, says Jason Bell-Masterson, director of the Toronto branch of the legal technology company Epiq.

“It’s the natural evolution of the old paper discovery process: if you’re involved in a lawsuit or being investigated by the government, you are required to make information that’s relevant to the matter available to the other side,” Bell-Masterson says. With so much data created daily, this can result in the production of reams of documents.

“When everything was on paper, you had an army of paralegals or lawyers who would sometimes sit in warehouses and they would just go through boxes of paper” by hand, copying whatever was relevant to share with the other side, he says.

The digital age, which introduced the ability to store vast amounts of information on a computer, storage device or server, or in the cloud, ushered in alternative ways to sort through gigabytes or even terabytes of electronic information for legal purposes.

That process, usually referred to as eDiscovery, involves a team working to get through all of that data in a prescribed period of time and within a specific budget. That team often includes the end client who is the subject of the litigation or investigation, lawyers knowledgeable in the applicable case law and experts in the subject matter.

An eDiscovery vendor brings in any needed forensics expertise and handles day-to-day project and review management, Bell-Masterson explains, adding that the process needs to be defensible and to follow a detailed and reasonable sequence.

“The first stage is collecting the potentially relevant set of data, which is presumably going to contain all the pertinent files related to that case,” he says.

Collection starts with keying in on what’s relevant and sifting out what isn’t, Bell-Masterson says, and often that means having the lawyers work with the individual closest to the data, who will know what is likely to be most applicable and where relevant data is stored.

“The reason this is important is that the cost of eDiscovery is largely based on the volume of data you end up collecting and then moving through the process. So if you can target what you’re collecting to pull only the relevant data, that will reduce your cost throughout the rest of the process,” he explains.

“The next opportunity to reduce cost is in deciding what collected data needs to be processed,” Bell-Masterson says. “If the focus of the search is user-generated data, you might target certain file extensions. At the same time, certain file extensions can be eliminated from the data set if they’re not relevant.”

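As a rough illustration of that kind of extension-based selection, the short Python sketch below keeps only files whose extensions appear on an include list and drops those on an exclude list. The extension lists, file names and the function name select_for_processing are hypothetical, chosen for illustration only; they are not drawn from Epiq's tools or from any particular eDiscovery platform.

```python
from pathlib import PurePath

# Hypothetical lists for illustration; a real matter would define its own.
USER_GENERATED = {".docx", ".xlsx", ".pptx", ".pdf", ".msg", ".eml"}
EXCLUDED = {".exe", ".dll", ".sys", ".tmp"}


def select_for_processing(paths, include=USER_GENERATED, exclude=EXCLUDED):
    """Return the paths whose extension is on the include list and not excluded."""
    selected = []
    for path in paths:
        # Compare extensions case-insensitively (".XLSX" matches ".xlsx").
        ext = PurePath(path).suffix.lower()
        if ext in include and ext not in exclude:
            selected.append(path)
    return selected


# Made-up file names standing in for a collected data set.
collected = [
    "reports/Q3 forecast.XLSX",
    "system/kernel32.dll",
    "mail/offer.msg",
    "cache/a1.tmp",
]
print(select_for_processing(collected))
# prints ['reports/Q3 forecast.XLSX', 'mail/offer.msg']
```
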
Once that pool of data is identified, it needs to be processed. Through various industry tools, the raw information, which includes a variety of document types such as spreadsheets, presentations, PDFs and emails, is put into a standardized, searchable format. That provides another opportunity to reduce the data set using targeted search terms and date filters, Bell-Masterson says.
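A simplified sketch of culling by search terms and date range might look like the following Python example. It assumes the documents have already been processed into searchable text with a date attached; the Document class, the cull function, the sample records and the search term are all hypothetical. Production review platforms typically rely on indexed search rather than a plain substring match like this one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical processed record: an ID, a date and extracted text."""
    doc_id: str
    dated: date
    text: str


def cull(docs, terms, start, end):
    """Keep documents dated within [start, end] that contain any search term."""
    lowered = [term.lower() for term in terms]
    return [
        doc for doc in docs
        if start <= doc.dated <= end
        and any(term in doc.text.lower() for term in lowered)
    ]


# Made-up sample data for illustration only.
docs = [
    Document("A-001", date(2019, 3, 14), "Revised supply agreement attached."),
    Document("A-002", date(2017, 1, 5), "Old supply agreement draft."),
    Document("A-003", date(2019, 6, 2), "Lunch on Friday?"),
]

responsive = cull(docs, ["supply agreement"], date(2019, 1, 1), date(2019, 12, 31))
print([doc.doc_id for doc in responsive])
# prints ['A-001']
```

Here A-002 falls outside the date range and A-003 contains no search term, so only A-001 moves on to review.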

Stay tuned for part two where Bell-Masterson explores the review, predictive coding and document preparation phases of eDiscovery.
