They are called "Data Scientists" and are part of the Machine Learning team. Their mission: to build, maintain and evolve the AI that automates the management and analysis of contracts. Who are they? In our behind-the-scenes series on life at a legaltech, Ahmed and Romain, Data Scientists at Hyperlex, describe their job and their day-to-day work to help us understand how Machine Learning works and how it is applied in the legal field. Meet the team of experts behind the software.
Can you present your background and what motivated you to join a legaltech like Hyperlex?
Ahmed: I joined at the very beginning of the Hyperlex adventure, two years ago, straight out of engineering school. The subject appealed to me a lot, since I had already worked on analysing and exploiting large text corpora. It was an opportunity to work on state-of-the-art NLP applied to the legal field: clause detection, document classification, information extraction from contracts, etc.
A branch of artificial intelligence, NLP (natural language processing) consists of processing and analysing raw text to accomplish a number of tasks: understanding what the text is about, locating elements in the text, identifying recurring sequences, etc.
Unlike a form, plain text is not always precise and can even be ambiguous, which introduces complexity. Its linguistic structure has several dimensions (syntactic, grammatical and semantic) that NLP has to decode.
Romain: I discovered artificial intelligence and machine learning during a gap year in engineering school, which allowed me to do research internships in this field and to specialise from the start. I came across Hyperlex by chance on the Internet, and joining the project came naturally.
A: It's not the legal field per se that attracted me. In text analysis, contracts exist in large numbers and are, moreover, relatively standardised documents, with implicit and explicit rules: they are therefore very good candidates for Machine Learning. We feel that we can achieve interesting things with AI in this field.
A: Law is a challenging and interesting area in which to apply reading comprehension methods. Legal information is complex and crucial, and there is a lot at stake. Take amounts and due dates, for example. This is where NLP comes into its own: it identifies the key elements in the contract.
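To make the amounts-and-due-dates example concrete, here is a minimal rule-based sketch in Python. It is not Hyperlex's method (the interview describes learned models); the regular expressions, the function name `extract_key_elements` and the sample clause are all hypothetical, chosen only to illustrate what "identifying key elements" means.

```python
import re

# Hypothetical patterns: real contracts use many more formats than these two.
AMOUNT_RE = re.compile(r"(?:EUR|USD|€|\$)\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)

def extract_key_elements(text):
    """Return the amounts and dates found in a piece of contract text."""
    return {
        "amounts": AMOUNT_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

# Illustrative clause, not taken from a real contract.
clause = "The Client shall pay EUR 12,500.00 no later than 31 March 2021."
print(extract_key_elements(clause))
# {'amounts': ['EUR 12,500.00'], 'dates': ['31 March 2021']}
```

Hand-written rules like these break as soon as the wording or the format changes, which is one reason plain, ambiguous text tends to call for models that learn from examples.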
What are the different jobs in Machine Learning? Can you tell us a little more about your team?
Our team consists of three profiles and divides its time between R&D and customer projects.
- Research Scientists: they work on rather theoretical subjects. Their mission is to model and design new ways of extracting data.
- Data Engineers: they implement and deploy Machine Learning algorithms(1).
- Data Scientists: they make the link between theoretical research and the applied world. They integrate business constraints, put algorithms into practice by adapting highly theoretical concepts to their sector of activity, and prepare the work for the data engineers.
What is machine learning and how does it work?
These are techniques that identify hidden similarities in a dataset and thus recognise recurring structures: these are called "patterns". The machine learns and evolves. It uses these patterns to make classifications and predictions on new data sets.
Data is the lifeblood of machine learning. Indeed, to find correlations between data, there must first be data! This data is initially provided by the business experts, i.e. the lawyers. Only they can explain to the data scientists the constraints and specificities of their field: what makes the difference between two contracts, between two clauses, between two legal concepts, etc. Subsequently, machine learning will be able to identify similarities within a set of contracts and automatically recognise the different typical elements. This is what saves time and gives visibility to the user who handles large volumes of contracts.
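The idea of learning patterns from expert-annotated examples and then classifying new text can be sketched in a few lines. The example below uses a tiny naive Bayes classifier written in plain Python; this is a stand-in technique chosen for brevity, not the model Hyperlex uses, and the clause texts and labels (`governing_law`, `termination`) are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical annotations, standing in for what lawyers would provide.
annotated = [
    ("this agreement is governed by the laws of france", "governing_law"),
    ("the laws of england shall govern this contract", "governing_law"),
    ("either party may terminate with thirty days notice", "termination"),
    ("this contract may be terminated upon written notice", "termination"),
]
model = train(annotated)
print(predict(model, "the agreement may be terminated with notice"))  # termination
```

With only four examples the "patterns" are just word frequencies per label, but the workflow is the same at scale: annotated data in, a model that generalises to unseen clauses out.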
So a user can train the machine themselves and make it evolve?
Yes. In the Hyperlex interface, from day one, users are given models that have already been trained, but they can also train models on their own contracts by annotating their documents with the information that particularly interests them. By validating their summary sheets, they allow the AI to learn and to return even faster and more precise results afterwards.
At the same time, our Customer Success team trains the AI on the specifics of the client's contracts during the onboarding phase, in order to deliver a solution that is well trained on the client's contract library.
What are your interactions with Customer Success? Can you give us an example of a project?
We get involved when existing data is migrated. Initially, Customer Success works with the customer to identify the characteristics they want to track in their contracts, so that the machine can be trained to recognise them. The Machine Learning team then sets up the tools specific to the customer's case so that Customer Success can carry out this migration as quickly and efficiently as possible.
Each use case can help us implement new features, which we then deploy in Hyperlex. This contributes to the scalability of the solution and allows us to deliver an increasingly reliable and well-trained AI.
How many contracts are needed to train an AI?
To identify patterns, the AI needs more than one contract, of course. But there is no fixed number, simply because it depends on the problem and on the task to be performed. You have to take into account the variability, complexity and quality of the source documents; on this last point, the OCR(2) component also plays a part.
What are the qualities of a good Data Scientist?
Be honest and pragmatic with the data. To avoid mistakes, one must be aware that Machine Learning cannot yet achieve perfect results: the field is still evolving. This is why the data scientist must be creative and ambitious: there is still a lot to do. There are many ways of "cracking" a problem; the trick is to be able to imagine the solution.
Furthermore, although the Data Scientist is not there to prove a theorem, he or she must be able to rigorously understand new advances in the field in order to see whether they apply to his or her case. In addition to mathematical, theoretical and technical knowledge, this requires keeping a constant watch on new research.
What are the main challenges of this profession in the legal sector?
A: Constantly improving reliability. All algorithms have "accuracy" levels, which means that they can be wrong. In a legal context, we cannot afford to give users wrong information or cause them to miss important information. It is a real challenge for us to deliver tools with the highest level of reliability.
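The two kinds of error mentioned here, returning wrong information and missing important information, are commonly measured with precision and recall. The sketch below is a generic illustration, not a description of how Hyperlex evaluates its models; the clause identifiers are made up.

```python
def precision_recall(predicted, expected):
    """Compare the items a model returned with the items a lawyer marked.

    precision: share of returned items that are correct (few wrong answers).
    recall:    share of expected items that were found (few missed answers).
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall

# Hypothetical data: clauses flagged by a model vs. clauses a lawyer annotated.
model_output = {"clause_3", "clause_7", "clause_9", "clause_12"}
lawyer_annotations = {"clause_3", "clause_7", "clause_15"}

precision, recall = precision_recall(model_output, lawyer_annotations)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.50 recall=0.67
```

Here the model returned two clauses that the lawyer did not mark (lower precision) and missed one that the lawyer did mark (lower recall); improving reliability means pushing both numbers up.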
A: Human-machine interaction. When humans make a mistake, they don't know they are wrong. When the machine makes a mistake, we have clues that reveal it, and we can take advantage of this! It is a transparent approach: making inaccuracies visible to the user, for example via confidence scores, makes the automated work even more reliable. What must be understood is that the machine assists the analysis: by suggesting to users the key information it identifies in their contracts, it puts them in a position of control.
What counts as a victory for you?
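One simple way to surface uncertainty through confidence scores is to flag any extraction that falls below a threshold so the user knows where to look first. This is a sketch under assumptions: the field names, the scores and the 0.80 threshold are hypothetical and do not come from the Hyperlex product.

```python
REVIEW_THRESHOLD = 0.80  # hypothetical cut-off; would be tuned per use case

def triage(extractions, threshold=REVIEW_THRESHOLD):
    """Split extractions into confident ones and ones flagged for review."""
    confident, flagged = [], []
    for item in extractions:
        target = confident if item["confidence"] >= threshold else flagged
        target.append(item)
    return confident, flagged

# Invented model output: each extracted field carries a confidence score.
extractions = [
    {"field": "effective_date", "value": "1 January 2021", "confidence": 0.97},
    {"field": "amount", "value": "EUR 12,500.00", "confidence": 0.91},
    {"field": "governing_law", "value": "France", "confidence": 0.55},
]

confident, flagged = triage(extractions)
print([item["field"] for item in flagged])  # ['governing_law']
```

The user still validates everything, but the flagged list shows where the machine itself is unsure.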
From a project point of view: when a client expresses his enthusiasm at the end of an onboarding, that little "wow" effect when he feels that the machine meets his needs. From a research point of view: when we make a surprising discovery, when we test new techniques and see that they work well on our use cases. Then we feel we have gone one step further.
A message to lawyers who are considering deploying an AI solution within their legal department?
A: Before setting up an "AI", you have to find out about its limits and ask yourself: "Can it really save me time?" It's a shame to install an AI without using it. Using an AI is already, in itself, actively participating in innovation in the legal field. Moreover, you must take the time to rethink internal processes so that the technology becomes part of the lawyer's daily work.
A: Before starting, it is important to identify the tasks that are systematic (for example, filling in an Excel file with the effective dates): it is on this type of task that automation will be most useful. While AI is not designed to understand a legal concept in detail, it is capable of instantly finding the precise clauses on which the lawyer needs to focus their analysis.
Check out the interview with our Head of Product right here: Interview with Silvana de Santis, Head of Product 💖