Legal Supplier

The dos and don'ts of predictive coding

By Kirsten McMahon, Associate Editor

Predictive coding can be a cheaper, faster and better way to get through large volumes of data in order to meet production obligations, but it’s important to know the dos and don’ts before starting a project, says Candice Chan-Glasgow, director of legal review services with Toronto-based Heuristica Discovery Counsel.

Chan-Glasgow, who advises clients on how to use technology to minimize the cost of document review, says predictive coding is the process of having computer software classify documents. The software uses machine learning to learn from user input and coding decisions, then applies those classifications to a larger data set.
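To make the train-then-apply idea concrete, here is a minimal Python sketch. It is not how any particular review platform works, since commercial predictive coding tools use their own algorithms. As a stand-in, it assumes a simple multinomial Naive Bayes classifier trained on a handful of reviewer-coded seed documents. The class `NaiveBayesCoder`, the function `classify` and the 0.3/0.7 score thresholds are all hypothetical. Documents that score between the thresholds are routed to manual review rather than forced into a category.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens: a deliberately simple stand-in for real text processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial Naive Bayes 'coder' trained on reviewer decisions."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, seed_set: list[tuple[str, bool]]) -> None:
        # Each seed document carries a reviewer's relevant / not-relevant decision.
        for text, relevant in seed_set:
            self.doc_counts[relevant] += 1
            self.word_counts[relevant].update(tokenize(text))

    def prob_relevant(self, text: str) -> float:
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            # Smoothed log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Normalize the two log scores into P(relevant) without overflow.
        top = max(log_scores.values())
        rel = math.exp(log_scores[True] - top)
        not_rel = math.exp(log_scores[False] - top)
        return rel / (rel + not_rel)


def classify(coder: NaiveBayesCoder, documents: dict[str, str],
             low: float = 0.3, high: float = 0.7) -> dict[str, list[str]]:
    """Apply the trained coder to a larger set; uncertain scores go to a human."""
    result = {"relevant": [], "not_relevant": [], "manual_review": []}
    for doc_id, text in documents.items():
        p = coder.prob_relevant(text)
        if p >= high:
            result["relevant"].append(doc_id)
        elif p <= low:
            result["not_relevant"].append(doc_id)
        else:
            result["manual_review"].append(doc_id)
    return result


# Hypothetical seed set coded by a small, consistent review team.
seed = [
    ("construction report foundation crack inspection", True),
    ("site inspection report for foundation repairs", True),
    ("office holiday party menu", False),
    ("cafeteria lunch menu and parking notice", False),
]
coder = NaiveBayesCoder()
coder.train(seed)
docs = {"D1": "foundation inspection report", "D2": "holiday lunch menu", "D3": "quarterly budget"}
print(classify(coder, docs))
# {'relevant': ['D1'], 'not_relevant': ['D2'], 'manual_review': ['D3']}
```

D3 shares no vocabulary with the seed set, so the sketch cannot classify it with confidence and leaves it for a reviewer. In practice the verification step means checking samples of these outputs and retraining as needed.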

“Document review tends to be the most expensive part of the litigation process simply because the digital universe is just so large and expected to double in size every two years going forward,” she says.

“When litigants are preparing to disclose documents, they frequently have to sift through large volumes of data in order to identify those that are relevant. Predictive coding is a way to get through the data quickly in order to save time and money while meeting your production obligations under the Rules of Civil Procedure.”

That said, predictive coding isn’t appropriate in all cases.

“It’s not going to be appropriate where your data set has a great deal of drawings, photographs, handwritten or scanned documents or spreadsheets,” Chan-Glasgow explains.

She says one of the first things she will ask a client is the size of the data set. If there is a small number of documents, it doesn't make sense to go through the predictive coding process.

“I find that we've had the most success where there is a large volume of data. We recently had a case with a broad collection of two or three million documents. There were many different concepts in the documents, and we were looking for something very specific, so the software was able to cull out a large amount of junk,” Chan-Glasgow says.

The biggest misconception she’s come across about predictive coding is that it’s a magic button that will instantly provide a set of coded documents.

“Predictive coding is a process; it's not just a click of a button. It's a form of machine learning, so it employs user input and coding decisions, which means you have to train the system and then verify that it's making correct decisions,” Chan-Glasgow says.

“This process takes time, but if you're dealing with large sets of documents, it would take much longer to conduct a manual review.”

Here are some of Chan-Glasgow's dos and don’ts when starting a predictive coding project:

  • Don't use a large team of people to train the system. Ideally, you want one to three people because the more people involved, the more inconsistencies you have in training the system.

    “If one person is saying these types of documents are relevant, and somebody else says they’re not, the software is getting conflicting decisions. With a small team, you'll get a better work product, and it will go much faster,” Chan-Glasgow says.

  • Don’t cull the data set by keywords before starting the predictive coding process or it will artificially skew the results, she says.

    “We strongly advise against doing that because this will likely result in missing relevant documents. When using keywords, you can inadvertently exclude documents that might have typos, and you may miss synonyms or code names,” Chan-Glasgow says.

  • Don’t confuse the algorithm with conflicting coding based on date alone. For example, if documents dealing with the same concepts are relevant before 2015 but not after, coding them differently on that basis won't work.

    “The system is not going to know the difference. It just sees this as a construction report, and it doesn't know why it's relevant in one case and not relevant in another. If the only criterion you're looking at to make that decision is a date, you'll have to code them consistently for concepts and apply your date range cutoff either before or after,” Chan-Glasgow says.

  • Do seek agreement with opposing counsel before you engage in the process, she says.

    “Don't just unilaterally decide to use predictive coding; it should be part of your discovery planning, and you should have agreement in advance,” Chan-Glasgow says. “Share information with opposing counsel about your process, be open about the methodology used and agree to an acceptable level of overturns.”

    This level of transparency not only makes the results more defensible but also reduces opposing counsel's fears about what's happening behind the scenes, she says.

    “You're not going to give away any sort of litigation strategy by just being transparent about what you're doing,” Chan-Glasgow notes.

  • Do plan your timeline around having to do some manual review after the predictive coding process is finished.

    “It's not just a click of a button,” Chan-Glasgow says. “You will need time at the end to review any documents that the software wasn't able to classify and to do a privilege review.”
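The date-cutoff advice above amounts to keeping two decisions separate: code every document on its subject matter alone, then apply the date range as a plain filter. Here is a minimal Python sketch of the "after" option. The `Document` record, the function `produce_before_cutoff` and the exact cutoff of January 1, 2015 are hypothetical (the article gives only the year).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sent: date
    # Coded on subject matter only; the date plays no part in this decision.
    concept_relevant: bool


def produce_before_cutoff(docs: list[Document], cutoff: date) -> list[str]:
    """Return IDs of concept-relevant documents dated before the cutoff.

    The date test runs as a separate step after concept coding, so the
    classifier is never trained on labels that differ only by date.
    """
    return [d.doc_id for d in docs if d.concept_relevant and d.sent < cutoff]


# Hypothetical example: two construction reports on the same topic, coded alike.
docs = [
    Document("R-2014", date(2014, 6, 1), concept_relevant=True),
    Document("R-2016", date(2016, 6, 1), concept_relevant=True),
    Document("MENU", date(2014, 6, 1), concept_relevant=False),
]
print(produce_before_cutoff(docs, date(2015, 1, 1)))  # ['R-2014']
```

Both reports receive the same concept coding, which is what keeps the training signal consistent. Only the date filter separates them. Applying the cutoff "before" would instead mean dropping out-of-range documents from the set before any coding begins.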
