Data Labelling is Good for You (video coming soon)

2024-04-19 | ElixirConf EU 2024

In the age of AI, data is crucial. In this talk, I convince you that creating your own data labeling pipeline is a critical part of using AI in your business, and that Elixir is a great tool for it. While it’s unlikely you will produce a dataset of the size used by tech giants, small data specific to your use case is extremely valuable, and the process of labeling is valuable in itself. I walk through how we did this internally, what challenges we faced, how we solved them, and what we learned in the process.

Despicable machines: how computers can be assholes

2017-07-13 | EuroPython 2017

When working on a new ML solution to a given problem, do you think that you are simply using objective reality to infer a set of unbiased rules that will let you predict the future? Do you think that worrying about the morality of your work is something other people should do? If so, this talk is for you. I will convince you that you hold great power over what the future world will look like, and that you should incorporate thinking about morality into the set of ML tools you use every day. We will take a short journey through several problems that have surfaced over the last few years as ML, and AI generally, became more widely used. We will look at bias present in training data, at some real-world consequences of not considering it (including one or two hair-raising stories), and at cutting-edge research on how to counteract it.

Removing Soft Shadows with Hard Data

2016-05-20 | PyData Berlin 2016 (and SIGGRAPH 2016)

In this talk I presented my PhD thesis. I originally gave it at SIGGRAPH 2015, but then also at PyData in Berlin, where a recording is available :)

Manipulated images lose believability if the user's edits fail to account for shadows. We propose a method that makes removal and editing of soft shadows easy. Soft shadows are ubiquitous, but remain notoriously difficult to extract and manipulate. We posit that soft shadows can be segmented, and therefore edited, by learning a mapping function for image patches that generates shadow mattes. We validate this premise by removing soft shadows from photographs with only a small amount of user input. Given only broad user brush strokes that indicate the region to be processed, our new supervised regression algorithm automatically unshadows an image, removing the umbra and penumbra. The resulting lit image is frequently perceived as a believable shadow-free version of the scene. We tested the approach on a large set of soft shadow images, and performed a user study that compared our method to the state of the art and to real lit scenes. Our results are more difficult to identify as being altered, and are perceived as preferable compared to prior work.
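The core idea above is to learn a mapping from shadowed image patches to shadow mattes, then divide the shadow out. The following is a minimal sketch of that idea on synthetic data, not the actual method from the thesis; the multiplicative shadow model, the random-forest regressor, and all parameters here are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training data: roughly uniform lit patches darkened by a
# per-patch shadow matte in [0.3, 1.0] (multiplicative shadow model).
n_patches, patch_side = 500, 8
mattes = rng.uniform(0.3, 1.0, size=(n_patches, 1))
lit = 0.8 + rng.normal(0, 0.02, size=(n_patches, patch_side * patch_side))
shadowed = lit * mattes  # observed pixels = lit pixels * matte

# Supervised regression: shadowed patch -> scalar shadow matte.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(shadowed, mattes.ravel())

# "Unshadow" an unseen patch by dividing out the predicted matte.
true_matte = 0.5
test_patch = (0.8 + rng.normal(0, 0.02, patch_side * patch_side)) * true_matte
pred_matte = model.predict(test_patch[None, :])[0]
unshadowed = test_patch / pred_matte
```

A real pipeline would predict a spatially varying matte per pixel (so the umbra and penumbra are handled separately) and restrict processing to the user's brush strokes; this sketch collapses that to one scalar per patch to keep the idea visible.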

Gotta catch'em all: recognizing sloppy work in crowdsourcing tasks

2016-03-13 | PyData Amsterdam 2016

Crowdsourced work can be a solution to many problems, from data labeling to gathering subjective opinions to producing transcripts. It turns out it can also work really well for functional software testing - but it's not easy to get right.

One well-known problem with crowdsourcing is sloppy work - people performing only the absolute minimum of actions that get them paid, without actually completing the intended task. In many scenarios this can be counteracted by asking multiple workers to complete the same task, but that dramatically increases cost and can still be error-prone. Detecting lazy work is another way to increase the quality of gathered data, and we have found a way to do this reliably for quite a large variety of tasks.
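The redundancy approach mentioned above typically boils down to majority voting over repeated answers. A minimal sketch (the helper is hypothetical, not from the talk), which also makes the cost problem obvious: every vote is another paid task.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among redundant worker answers."""
    return Counter(labels).most_common(1)[0][0]

# Three workers label the same item; the majority resolves disagreement,
# at the price of paying three times for one item.
majority_label(["cat", "cat", "dog"])  # "cat"
```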

In this talk, I describe how we have trained a machine learning model to discriminate between good and sloppy work.
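In spirit, such a model is a binary classifier over behavioural signals of each submission. The sketch below is an illustrative stand-in, not the model from the talk; the feature set (time spent, interaction count, fraction of steps completed) and the synthetic data are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical features per submitted task:
# [seconds spent, number of interactions, fraction of steps completed]
good = np.column_stack([
    rng.normal(120, 20, 200),   # careful workers take their time
    rng.normal(30, 5, 200),
    rng.uniform(0.8, 1.0, 200),
])
sloppy = np.column_stack([
    rng.normal(15, 5, 200),     # rushed, bare-minimum submissions
    rng.normal(4, 2, 200),
    rng.uniform(0.0, 0.4, 200),
])
X = np.vstack([good, sloppy])
y = np.array([1] * 200 + [0] * 200)  # 1 = good work, 0 = sloppy work

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Flag a new submission that looks rushed.
verdict = clf.predict([[10.0, 3.0, 0.1]])[0]
```

In practice the hard part is not the classifier but getting honest labels for "sloppy" in the first place - which is exactly what the talk digs into.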