Ken Chatfield


13.11.2023

5 min read

Efficient Learning of Domain-specific Visual Cues with Self-supervision

Introducing Perceptual MAE, a new method for efficiently learning domain-specific visual cues using self-supervision. This work is part of our AI 2.0 initiative and was presented at CVPR 2023.

As part of our work to build models which generalise better over different types of damage to vehicles and property, we have developed a new method which can automatically learn to understand visual concepts (such as ‘cracks’ or ‘dents’) directly from images.

The method we developed, based on techniques from generative modelling:

  • Achieves state-of-the-art classification performance (ranking #2 on ImageNet globally at time of writing with an accuracy of 88.6%)
  • Is significantly more data and compute efficient than alternative methods including the top ranking method (with a model that is over 3x smaller)

We further show these properties generalise across domains and tasks, providing a way to accelerate the creation of performant image classification models in real-world settings, such as those we face at Tractable, with a much reduced requirement for annotated data.

The work was recently presented at CVPR 2023, and we are open-sourcing our code so that others can build on our approach.

Learning through Generation

To learn from images without relying on any additional information such as labels, we take the following approach: we mask out parts of the image and ask our model to learn how to fill in (or 'generate') the missing patches:

Generating missing patches as a learning task. Perceptual MAE is trained to reconstruct the image of the dog using only the visible patches on the left-hand side, with the right-hand side showing actual output from our trained model for the missing patches. The model learns from the training data what dogs generally look like, enabling it to reconstruct the head and legs (outlined in green on the right-hand side, both occluded on the left-hand side). For a further example of actual model output on real-world data, see the header image of this blog post.

This is an example of using a generative model (as we are learning by generating parts of the image) for what is known as self-supervision (as learning occurs not by trying to predict a separately provided label, but by predicting properties of the image itself).

It turns out that in learning to generate the missing patches of an image, the model picks up knowledge that is useful for understanding its contents. Rather than learning to identify dogs directly from human annotations, for example, by learning to complete images of dogs the model builds up an understanding of what dogs generally look like.

A similar approach is used by language models such as ChatGPT and GPT-4, where by predicting missing words the model learns to generate its own sentences. In the case of images, rather than words we learn to directly predict missing pixel values in the image. This was shown to be effective by the masked autoencoders (MAE) work on which we build.
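To make the idea concrete, the sketch below shows the core of this kind of masked pre-training in PyTorch: a large fraction of image patches is hidden at random, and the pixel reconstruction error is measured only on the hidden patches. This is an illustrative simplification rather than our actual training code, and the `model` argument stands in for the full encoder-decoder architecture.

```python
# A minimal sketch of MAE-style masked pre-training (illustrative, not the
# exact Perceptual MAE implementation): hide a random subset of patches and
# train the model to reconstruct their pixel values.
import torch
import torch.nn.functional as F

def random_mask(patches, mask_ratio=0.75):
    # patches: (batch, num_patches, patch_dim)
    B, N, _ = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=patches.device)
    ids_shuffle = noise.argsort(dim=1)       # random permutation per image
    ids_keep = ids_shuffle[:, :num_keep]     # indices of visible patches
    ids_masked = ids_shuffle[:, num_keep:]   # indices the model must fill in
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, patches.size(-1))
    )
    return visible, ids_keep, ids_masked

def reconstruction_loss(model, patches, mask_ratio=0.75):
    # `model` is assumed to be an encoder-decoder that maps the visible
    # patches (plus their positions) to predicted pixel values for every
    # patch -- a placeholder for the actual architecture.
    visible, ids_keep, ids_masked = random_mask(patches, mask_ratio)
    pred = model(visible, ids_keep)          # assumed shape: (B, N, patch_dim)
    target = torch.gather(
        patches, 1, ids_masked.unsqueeze(-1).expand(-1, -1, patches.size(-1))
    )
    pred_masked = torch.gather(
        pred, 1, ids_masked.unsqueeze(-1).expand(-1, -1, pred.size(-1))
    )
    # Pixel-level loss computed only on the patches that were hidden.
    return F.mse_loss(pred_masked, target)
```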

Addressing the Grounding Problem

What motivated our approach was the observation that by predicting missing pixel values directly, MAE has a tendency to overly focus on getting individual pixels exactly correct. If we visualise what the trained model is attending to, we can see this in the focus on pixel-level details in the background:

The original masked autoencoder (MAE) overly focuses on individual pixels, leading to a diffuse attention map characterised by high attention on background water pixels in this image

However, for ImageNet and similar datasets, where the evaluation task is to identify the contents of the overall image, it is more desirable to emphasise how consistently the different parts of the image fit together to form a whole. This has also been shown to align with the way humans assess the contents of images, an insight which inspired our work.

We add more global image-level information to Perceptual MAE by following two steps:

  1. We train a separate neural network to assess how natural the overall image is compared to data seen in training (equivalent to the ‘discriminator’ in the generative adversarial learning literature)
  2. Crucially, we then encourage our generation model to use information learnt by this model to guide generation

Step 2 uses a technique known as perceptual loss by feature matching, illustrated in the figure below. This ties the internal representation used by our generation model to the hidden-layer representations of the ‘discriminator’ model trained to distinguish real images from generated ones:

We use perceptual loss by feature matching: a method which implicitly encourages the internal representation of the data used by the generator network to be similar for images where the contents are the same (e.g. both being of a cat), even if individual pixels may differ
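For concreteness, a minimal sketch of a feature-matching perceptual loss is shown below. It is an assumed, simplified form of the technique rather than our exact implementation, and the discriminator's `features()` method is a hypothetical interface for extracting its hidden-layer activations.

```python
# A minimal sketch of perceptual loss by feature matching: compare the
# discriminator's hidden-layer activations on real vs reconstructed images.
import torch
import torch.nn.functional as F

def feature_matching_loss(discriminator, real_images, generated_images):
    # `discriminator.features()` is a hypothetical method returning a list of
    # intermediate activation tensors, one per hidden layer.
    with torch.no_grad():
        real_feats = discriminator.features(real_images)
    fake_feats = discriminator.features(generated_images)
    # Match activations layer by layer: the generator is pushed to produce
    # images that "look the same" to the discriminator internally, even if
    # individual pixel values differ.
    loss = sum(
        F.l1_loss(f_fake, f_real)
        for f_fake, f_real in zip(fake_feats, real_feats)
    ) / len(fake_feats)
    return loss

# During training this term would be added to the pixel reconstruction loss,
# e.g. total_loss = recon_loss + lambda_fm * feature_matching_loss(...)
```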

The result is that the Perceptual MAE generation model focuses much more on object outlines and layout than the original MAE:

Comparing the attention maps of MAE vs our proposed method Perceptual MAE: our method focuses much more on overall image layout and the outlines of objects

By using knowledge of the task at hand (ImageNet classification) we have ‘shaped’ the focus of our model towards relevant details in the image, in this case overall image semantics. This contributes greatly to the efficiency of our approach.

Performance and Results

Our method achieves strong performance on ImageNet, setting a new state of the art of 88.1% accuracy without using any additional training data.

If we loosen this restriction and use a pre-trained model for feature matching, we can match the recently released DINOv2 method at the same input image size, attaining 88.6% accuracy. This, however, is achieved with a much smaller model (only 307M parameters), speaking to the efficiency of our method.

A summary of our results compared to recent alternative methods, both with and without additional training data, along with the parameter count of each model, is shown below:

Performance of Perceptual MAE on ImageNet compared to other recent methods, with the number of trainable parameters for each model shown in the grey bars

We found that these results also generalised across different visual tasks, beating previous methods by a similar margin on object detection and semantic segmentation (see the paper for further details).

We also evaluated performance on different domains, such as the tasks we face at Tractable, to see if the above results translated to real-world settings. We found that Perceptual MAE trained on domain-specific images could provide a significant boost in performance over a supervised baseline (trained purely with labels), particularly when the budget of labelled images is limited.

This is shown below when fine-tuning for the Tractable task of vehicle damage assessment across a dataset comprising 500K annotated images:

Accuracy over a Tractable classification task (vehicle damage assessment) when either training with conventional supervised learning or pre-training with Perceptual MAE over a training set of limited size. We obtain improved accuracy when the number of annotated training images is small compared to conventional supervised learning.
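As a rough illustration of this fine-tuning setup, the sketch below attaches a small classification head to an encoder pre-trained with Perceptual MAE and trains it on a limited labelled set. The `load_pretrained_encoder()` helper and `labelled_loader` are hypothetical placeholders rather than part of our released code, and the feature dimension and class count are assumed values.

```python
# A rough, illustrative sketch of fine-tuning a self-supervised encoder on a
# small labelled dataset (hypothetical names, not our released training code).
import torch
import torch.nn as nn

class DamageClassifier(nn.Module):
    def __init__(self, pretrained_encoder, embed_dim=1024, num_classes=2):
        super().__init__()
        self.encoder = pretrained_encoder   # weights from self-supervised pre-training
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        features = self.encoder(images)     # assumed pooled features: (batch, embed_dim)
        return self.head(features)

# With a small labelled budget, most of the useful visual knowledge comes from
# pre-training; the labelled set mainly adapts the encoder and the head.
model = DamageClassifier(load_pretrained_encoder())   # hypothetical loader
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for images, labels in labelled_loader:                # small annotated dataset
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```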

Guided learning = more efficient learning 

Recently, there has been a trend in research towards ever larger models. Perceptual MAE demonstrates that with some carefully selected assumptions, it is possible to achieve better or on-par performance with state-of-the-art methods without requiring ever more data and compute.

This is also visible in training, with our method requiring around 10x fewer GPU resources to train than DINOv2. Perceptual MAE also makes learning image classification and other downstream tasks much more data efficient, requiring fewer labelled examples.

What’s Next?

This work furthers the path of training performant specialist domain-specific models directly from targeted collections of images in an efficient way, providing an alternative to relying solely on scale as a method for boosting the performance of practical ML systems.

It is also a step towards addressing the robustness and bias issues typically associated with the long-tail when using supervised methods. As part of the Tractable AI 2.0 initiative, we are working to ensure that the computer vision models we train rely directly on expert-defined cues such as ‘cracks’ and ‘scratches’ rather than other superfluous correlations. This work forms one part of this broader initiative.

-> Read the paper

-> Get the code
