Intro to Art-Focused Machine Learning

Mike Heavers
12 min read · Dec 5, 2018

TLDR: Download the course material for: Visual Machine Learning, Machine Learning in Writing and Speech, and Machine Learning in Audio/Music.

Goals for this course material

I recently taught a series of workshops on Machine Learning at Pacific Northwest College of the Art’s Make+Think+Code program, and thought I’d make my course materials available online, with 3 goals in mind:

  • Make the material available to a wider audience of artists looking for a code-minimal, broad overview of machine learning
  • Provide a starting point for those wishing to teach their own Machine Learning workshops (it’s a lot of work to generate all this stuff from scratch!)
  • Get feedback on inaccuracies, suggestions on improvements, questions on things that aren’t clear, or recommendations on other technologies, artists, etc. that are accessible to beginning coders / machine learning enthusiasts.

Credit

I should say that I am not a data scientist or a machine learning expert — just a dedicated hobbyist looking to delve into a topic I find fascinating. By understanding Machine Learning, I think we learn a bit about our own brains, as well as get a sense of where man and machine can and will ultimately intersect to achieve new and fascinating things.

Much of my own research and course material was based on my primary learning resource, Gene Kogan’s excellent Neural Aesthetic course at ITP. Kyle McDonald’s Medium article on Machine Learning in Music figured heavily as well, along with Distill’s amazing Colab notebooks, among many other sources. I’ve tried to link out to my source material where possible, but I have to admit I wasn’t very judicious in documenting as I went along. Please let me know if you see any oversights.

What is Machine Learning?

Machine Learning is the concept of assigning an objective to a computer, giving it a concept of success, failure, and progress, and giving it the ability to evaluate and adapt along the way to find an optimal solution. Most commonly, this is done through neural networks.

They process information similarly to the way the brain does: through neurons. Neurons receive input, perform some sort of processing on that input, and send it off as output (like a filter).

Essentially, that output can only be a number or a label. But in our models and frameworks, we can turn those numbers and labels into something much, much more meaningful.

The magic of machine learning is in the fact that we can have thousands of them working together in unison to accomplish a task, intuitively reacting to each other, firing in response to each other just like the brain. Cells that fire together wire together.

Is It Art?

Clearly some think so. Paris collective Obvious’s Portrait of Edmond de Belamy became the first AI-generated artwork sold at auction at Christie’s.

And Mario Klingemann won the Lumen Prize in technology-driven art for teaching an AI to recognize and render the human form from a stick figure.

HISTORY

1800s

Ada Lovelace (the first computer programmer) aspires to build a “calculus of the nervous system”

Charles Babbage (Ada’s mentor) — designs the first mechanical computer

1940s

Neural nets are pioneered with Turing’s B-Type machines — organized logic gates responding to each other. They were based on research into neuroplasticity — the idea that through repeated activations of neighboring neurons, efficiencies are gained — which forms the basis for unsupervised learning.

1950s

Marvin Minsky’s SNARC is the first Neural Network Machine.

1990s

Until this point, the primary applications use techniques like support vector machines — essentially drawing the best dividing line between categories of data. Examples include image classification and text categorization.

  • IBM’s Deep Blue beats Garry Kasparov in chess.
  • LSTMs invented, improving efficiencies.
  • MNIST database for handwriting recognition, built from samples by the U.S. Census Bureau and some high school kids (free labor!). MNIST is sort of the “hello world” of Machine Learning.

2009

  • “Deep belief networks” are introduced for speech recognition.
  • ImageNet dataset introduced — the catalyst for our current AI boom.

2011

IBM’s Watson beats human competitors in Jeopardy.

2012

Convolutional Neural Networks (CNNs) win the ImageNet competition with AlexNet — a huge advancement for image processing!

2014

  • DeepFace by Facebook cuts facial recognition errors by more than a quarter compared with previous systems, rivaling human performance.

2015

  • Deep Dream brings AI to the attention of artists and tech generalists.

TODAY

Deep Learning is more accessible than ever. Computation can be done in the cloud, we have more and more data from which to learn, and frameworks for machine learning are increasingly numerous. Popular frameworks such as TensorFlow have been wrapped in more accessible layers, including Keras and ML5, which we’ll be using in this class.

WHAT CAN WE DO WITH MACHINE LEARNING?

In theory, everything the human brain can / could do, and then some! Right now common applications include:

  • Prediction — The core goal of machine learning, and essentially all a neuron can do! Comes in two basic forms:
      • Classification — assigning a label
      • Regression — assigning a quantity
  • Sentiment analysis — determining whether words as a corpus are positive, negative, etc.

https://youtu.be/PCBTZh41Ris

  • Image to text — A machine captions what it sees in a photo using natural language
  • Speech synthesis (generating voice from text)
  • Word vectors — define the relationship between words.
  • Translation — improving word-by-word translation by better identifying common word combinations across languages.
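
The classification/regression split above can be made concrete with a toy example. Everything here is made up for illustration — a hypothetical "hours studied" dataset, a nearest-neighbour classifier, and a line whose slope and intercept are eyeballed rather than fitted:

```python
# Toy dataset: hours studied -> (outcome label, exam score). Numbers are made up.
data = [(1, "fail", 52), (2, "fail", 58), (3, "pass", 65), (5, "pass", 80)]

def classify(hours):
    # Classification: assign a LABEL, here via the nearest neighbour in the data.
    nearest = min(data, key=lambda row: abs(row[0] - hours))
    return nearest[1]

def regress(hours):
    # Regression: assign a QUANTITY, here by reading off a line (slope and
    # intercept chosen by eye for illustration, not actually fitted).
    return 45 + 7 * hours

print(classify(4))   # a label: "pass"
print(regress(4))    # a quantity: 73
```

Same input, two kinds of prediction — that distinction is the one the bullet list above draws.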

THE BASICS OF ML

What are Neural Networks?

Artificial Neural Networks (ANN) are a collection of connected units or nodes called artificial neurons which loosely model the neurons in a biological brain.

Each connection, like the synapses in a biological brain, can transmit a signal from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it.

In common ANN implementations, the signal at a connection between artificial neurons is a number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs.

Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs.
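
The paragraphs above boil down to something small: a weighted sum of inputs pushed through a non-linear function. Here is a minimal sketch of a single artificial neuron — the weights, bias, and sigmoid activation are illustrative choices, not anyone's trained model:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias...
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...squashed by a non-linear function (sigmoid) into a number in (0, 1).
    return 1 / (1 + math.exp(-total))

# Hypothetical weights: this neuron fires strongly only when both inputs are high.
out = neuron([1.0, 1.0], [2.0, 2.0], -3.0)
print(round(out, 3))  # prints 0.731
```

Stacking many of these, layer on layer, is all an artificial neural network is.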

How do we evaluate the findings of a single neuron?

Take linear regression as a simple example: finding a line of best fit. Our evaluation criterion is called a loss function: it measures the distance between the line and the data points, penalizing us for greater distances. But doing this naively across all neurons would be incredibly slow — it would mean evaluating every single data point!
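
A common loss function for a line of best fit is mean squared error. This sketch uses a few hypothetical (x, y) points to show how the loss rewards a line that hugs the data:

```python
# Mean squared error: the "loss function" penalizing distance from line to points.
points = [(1, 2.1), (2, 3.9), (3, 6.2)]  # hypothetical (x, y) data

def mse(slope, intercept):
    errors = [(slope * x + intercept - y) ** 2 for x, y in points]
    return sum(errors) / len(errors)

# A line close to the data is penalized far less than one far from it.
print(mse(2.0, 0.0))   # small loss: y = 2x nearly fits these points
print(mse(0.0, 0.0))   # large loss: y = 0 misses them badly
```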

So How Do We Train Neural Networks?

The most common way to improve a neural network as a whole is called gradient descent. It’s sort of like finding your way down a mountain in the dark with a weak flashlight. You look in all directions, see which step takes you farthest down, take a step, then repeat.

We are looking for one thing: the values of our weights that minimize the cost.

What is Cost?

A cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and Y.
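
The whole look-step-repeat loop fits in a few lines. This sketch minimizes a toy one-variable cost, `(w - 3)**2`, whose minimum is obviously at w = 3 — a stand-in for the real, many-dimensional cost surface:

```python
# Gradient descent on a one-variable cost: cost(w) = (w - 3)**2, minimized at w = 3.
def cost(w):
    return (w - 3) ** 2

def gradient(w):
    # The derivative of the cost: the slope of the "mountain" where we stand.
    return 2 * (w - 3)

w = 0.0               # start somewhere in the dark
learning_rate = 0.1   # how big a step we take each iteration
for _ in range(100):
    w -= learning_rate * gradient(w)  # step downhill

print(round(w, 4))    # prints 3.0 — we've walked down to the minimum
```

With real networks the only differences are scale (thousands of weights instead of one) and that the gradient comes from backpropagation rather than a hand-written derivative.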

Gradient Descent IRL

Neural networks for practical problems are rarely linear! They have many dimensions / variables. Rather than having a nice easy bowl shape, they’re full of hills and troughs, which can greatly complicate things.

The more variables we have to solve for, the more neurons we need, increasing processing time exponentially. With brute-force guessing alone, trying to classify a set of digits from 0–9 (784 input neurons, 15 hidden, 10 output) would take on the order of 10^12,000 guesses. There are only about 10^80 atoms in the universe! So we need to optimize.
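
Where does that astronomical number come from? Count the parameters of a 784 → 15 → 10 network, then imagine brute-forcing even a coarse grid of 10 candidate values per parameter:

```python
# Parameter count for a 784 -> 15 -> 10 network: weights between layers plus biases.
weights = 784 * 15 + 15 * 10   # input->hidden and hidden->output connections
biases = 15 + 10               # one bias per non-input neuron
params = weights + biases
print(params)                  # prints 11935

# Even a crude 10 candidate values per parameter means 10**params combinations --
# a number with nearly 12,000 digits, dwarfing the ~10**80 atoms in the universe.
digits = len(str(10 ** params))
print(digits)                  # prints 11936
```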

Optimization

One way to do this is to tune how far we step in a given direction. This is the learning rate. Step too far and we may end up back uphill; step too short and we’ll never get down.

There’s going to be some redundancy in our progress — if we have a long stretch of downhill, we don’t need to evaluate every single step. In ML we can batch those evaluations out (mini-batching).

When we have high confidence or continued success in our path, we can use momentum to keep going in more or less the same path. If we keep choosing uphill, we eventually run out of momentum and need to find a better path.
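
Momentum amounts to blending the previous step into the current one. This sketch reuses a toy cost `(w - 3)**2` with its minimum at w = 3; the learning rate and momentum values are arbitrary illustrative choices:

```python
# Momentum: keep some of the previous step's velocity, so a long downhill
# stretch builds speed and occasional uphill gradients gradually cancel it.
def gradient(w):
    return 2 * (w - 3)   # derivative of the toy cost (w - 3)**2, minimum at w = 3

w, velocity = 0.0, 0.0
learning_rate, momentum = 0.01, 0.9
for _ in range(500):
    velocity = momentum * velocity - learning_rate * gradient(w)
    w += velocity

print(round(w, 3))   # prints 3.0 -- momentum carried us to the minimum
```

Notice the tiny learning rate: momentum lets small steps accumulate into fast progress, which is exactly the "continued success in our path" intuition above.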

Another key technique is back propagation — working backward from the error at the output, using the chain rule to determine how much each earlier weight contributed to that error, so that all the weights can be corrected in a single backward pass. This makes training vastly more efficient.
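
The chain rule mechanics can be shown on the smallest possible "network": one input, two weights, one sigmoid. All the numbers here are hypothetical starting values, not a trained model:

```python
import math

# Backpropagation on a tiny chain: x -> (w1 * x) -> sigmoid -> (w2 * h) -> loss.
# One backward pass yields the gradient for EVERY weight, instead of
# re-evaluating the whole network once per weight.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x, target = 1.0, 0.5
w1, w2 = 0.8, 0.4                        # hypothetical starting weights

# Forward pass: compute the prediction and its loss.
z = w1 * x
h = sigmoid(z)
y = w2 * h
loss = (y - target) ** 2

# Backward pass: chain rule, applied layer by layer from the loss backward.
dloss_dy = 2 * (y - target)
dloss_dw2 = dloss_dy * h                 # gradient for the output weight
dloss_dh = dloss_dy * w2
dloss_dz = dloss_dh * h * (1 - h)        # sigmoid's derivative is h * (1 - h)
dloss_dw1 = dloss_dz * x                 # gradient for the input weight

print(dloss_dw1, dloss_dw2)
```

Both gradients are now available for a gradient-descent step, from a single forward and backward sweep.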

As we train our model, we might notice patterns, called features. If we were analyzing hand writing, those features might ultimately be loops, ascenders, descenders, etc. If we were identifying cats, we might look for eyes, ears, noses, tails.

But features exist in different layers, and the initial layers might just be edges, patterns, colors. Only as these layers build on each other do they take the shape of what we’re looking for.

Identifying features allows us to vastly speed up and increase the accuracy of our models.

How Do We Evaluate the Network As a Whole?

We could run our model over our full dataset, but we are in danger of overfitting — presuming the findings from our data are representative of all data in existence. This is a hard-to-avoid problem in all sciences.

To evaluate, we can split the data into training and test sets. This allows us to make assumptions based on our learnings, and then test them to find out if they hold up.
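
The split itself is simple: shuffle, then hold some data back. The 80/20 ratio below is a common but arbitrary convention, and the "data" is just a stand-in list:

```python
import random

# Hold out part of the data so the model is tested on examples it never saw.
random.seed(0)                   # fixed seed so the split is reproducible
data = list(range(100))          # stand-in for 100 labelled examples
random.shuffle(data)

split = int(len(data) * 0.8)     # a common (but arbitrary) 80/20 split
train, test = data[:split], data[split:]

print(len(train), len(test))     # prints 80 20
assert not set(train) & set(test)  # no example leaks into both sets
```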

How Do We Know How Much to Train?

Consider studying for a test. You improve by studying, but at some point all that studying doesn’t actually improve results / retention, or rather it’s not worth the cost. We can’t remember everything.

To figure out how much to study, you can add a validation set in which you examine all your assumptions.
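
This "when to stop studying" idea is usually implemented as early stopping: keep training while the validation loss improves, stop once it turns back up. The loss curves below are made up purely to illustrate the shape — training loss keeps falling while validation loss bottoms out and then rises (overfitting):

```python
# Early stopping: stop once the validation set stops improving.
# These loss values are invented for illustration.
train_loss = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12, 0.08]  # keeps falling
val_loss   = [1.1, 0.8, 0.6, 0.50, 0.45, 0.47, 0.52, 0.60]  # turns upward

best_epoch, best_val = 0, float("inf")
for epoch, loss in enumerate(val_loss):
    if loss < best_val:
        best_epoch, best_val = epoch, loss
    else:
        break  # validation stopped improving: time to stop studying

print(best_epoch, best_val)  # prints 4 0.45
```

Training loss alone would have told us to keep going forever; the validation set is what reveals the point of diminishing (then negative) returns.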

ML ARTISTS

Mario Klingemann

Excellent Work in Style Transfer

https://twitter.com/quasimondo/status/1043458889969221632

Gene Kogan

I got a lot of the class material from him. Artist and instructor at ITP

https://twitter.com/genekogan/status/857922705412239362

Botnik Studios

Great tools for AI-driven NLP

AI Weirdness

Awesome examples of AI gone weird — e.g. the Neural Network College Course List:

  • General Almosts of Anthropology
  • Deathchip Study
  • Advanced Smiling Equations
  • Genies and Engineering
  • Practicum Geology-Love
  • Electronics of Faces
  • Devilogy
  • Psychology of Pictures in Archaeology
  • Melodic Studies in Collegine Mathematics
  • Advanced Computational Collegy
  • The Papering II
  • Professional Professional Pattering II
  • Introduction to Economic Projects and Advanced Care and Station Amazies
  • Every Methods
  • Chemistry of Chemistry
  • Internship to the Great
  • The Sciences of Prettyniss
  • Geophing and Braining
  • Survivery

Jun-Yan Zhu

Most ML advancements seem to start at the research level. This guy has been responsible for so many advancements, including CycleGAN and Vid2Vid, which we’ll look at soon.

Sofia Crespo

So What Machine Learning Architecture Should I Use?

There are a bunch, and it depends on what you are doing. Here is a (vastly oversimplified) summary of a few key networks:

Convolutional Neural Networks (CNNs) — In these networks, neurons only concern themselves with data from neighboring cells. They are very efficient at simplifying information down to only its essential bits and filtering out noise, which makes them a great fit for data-heavy inputs like images, sound, and video.
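
The "neighboring cells" idea is the convolution itself: a small filter slides along the input, and each output value looks only at a few adjacent input values. This 1-D example uses a hypothetical hand-picked edge-detecting filter:

```python
# The heart of a CNN: a small kernel slides over the input, each output
# depending only on neighbouring values. Kernel chosen by hand for illustration.
signal = [0, 0, 0, 1, 1, 1, 0, 0]     # a "step" in the data
kernel = [-1, 1]                       # responds to changes between neighbours

output = [
    sum(k * signal[i + j] for j, k in enumerate(kernel))
    for i in range(len(signal) - len(kernel) + 1)
]
print(output)  # prints [0, 0, 1, 0, 0, -1, 0] -- spikes where the signal changes
```

A real CNN learns its kernels from data and stacks many of them in 2-D over images, but the sliding-window arithmetic is exactly this.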

Recurrent Neural Networks (RNNs) — These networks have a concept of memory — carrying what they’ve found forward from one step of a sequence to the next. These networks are good at prediction — finding the next thing in a sequence, and adapting as that sequence evolves — such as with sequences of words, arrangements of like objects, etc.
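
The "memory" is a hidden state threaded through the sequence. This sketch feeds an input pulse followed by silence through a one-neuron recurrent cell; the two weights are hypothetical, not trained:

```python
import math

# An RNN's defining trick: a hidden state carried from step to step, so each
# output can depend on everything seen so far. Weights are illustrative only.
w_in, w_state = 0.5, 0.9

state = 0.0
for x in [1.0, 0.0, 0.0, 0.0]:       # an input pulse, then silence
    state = math.tanh(w_in * x + w_state * state)
    print(round(state, 3))           # the pulse's echo decays but persists
```

Even after the input goes quiet, the state stays nonzero — the network "remembers" the pulse, which is what lets RNNs react to context earlier in a sequence.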

Deep Belief Networks — Supervised training comes at a cost: data must be labeled, which is expensive. Deep belief networks instead perform unsupervised learning, with each layer making decent decisions about its locally relevant data that are perhaps not optimal for the network as a whole. If you’ve seen Deep Dream’s puppy slugs, you can understand how such a network might keep seeing one kind of thing in an image, rather than looking at the image as a whole. Deep Dream has been programmed to be hyper-aware of local neurons, instructed to find the features each is in charge of wherever possible.

Generative Adversarial Networks (GANs) — Two networks working together by working against each other. The first network generates content, and the second judges the outcome. Does the generated content look natural, or artificial? The generator tries to fool the discriminator, and the discriminator tries to catch the generator in the act of forgery.
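
The adversarial loop can be caricatured with "networks" that are single numbers and hand-made update rules standing in for real gradient steps — everything below is invented for illustration, and shows only the structure, not a real GAN:

```python
import random

# A caricature of GAN training: real data clusters around 5, the generator's
# one parameter g is the centre of its forgeries, and the discriminator's one
# parameter d is its belief about where real data lives.
random.seed(0)

def sample_real():
    return 5 + random.gauss(0, 0.5)   # "real" data: noise around 5

g = 0.0
d = 0.0

for _ in range(3000):
    real = sample_real()
    fake = g + random.gauss(0, 0.5)   # the generator produces a forgery
    # Discriminator step: pull its belief toward real data, push it away from fakes.
    d += 0.01 * (real - d) - 0.005 * (fake - d)
    # Generator step: move its forgeries toward whatever the discriminator deems real.
    g += 0.01 * (d - g)

print(round(g, 1), round(d, 1))       # both settle near 5: the fakes mimic the real data
```

The tug-of-war settles where forgeries are indistinguishable from the real distribution — the same equilibrium a true GAN aims for, just with actual networks and gradients in place of these toy rules.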

Machine Learning Workshops:

We will be looking at Machine Learning in 3 different core areas. Click the links to go to the course material for each specific section.

Course Files

This Google Drive contains all of the applications and source code we will be using.

Visual Machine Learning

ML as it relates to image processing, video, and other visual mediums

Machine Learning in Language

ML with respect to text, speech, and language processing

Machine Learning in Sound+Music

ML in audible contexts: voice, music, sound, etc.

I’ve Done Them All. Where to Next?

This course barely scratches the surface of what is possible with Machine Learning. Here are some good resources to take your explorations further.

Courses

Machine Learning for Musicians and Artists — Haven’t taken it, heard it’s good

Instructor: Rebecca Fiebrink

Source: Kadenze

The Neural Aesthetic — the course from which much of this material was drawn

Instructor: Gene Kogan

Source: ITP at NYU

Fast AI — A great library as well as courses to get you started in a framework that prioritizes quick algorithms.

Google Colab — Collection of ML-related Jupyter Notebooks to work through including a Machine Learning crash course and intro to Pandas.


Tools

Lobe — Visual AI model training tools

RunwayML — Suite of plugins, models, and tools for more advanced ML applications from fellow ITP alums.

ML4A-OFX — Tools and code samples / applications using openFrameworks

Fast AI — A tool and course for AI using Python/PyTorch

Pix2Pix Unity — Style Transfer within a Game Engine. openCV for Unity also has some ML models incorporated, including YOLO

Puppeteer — A headless web browser that can be used in conjunction with the language of your choice (node.js?) for scraping data.

Inspiration

Google Chrome AI Experiments

Community

ML4A Slack — Started at ITP, open to anyone

Data

Andy Warhol Photography Archive

Internet Archive Book Archive

List of Datasets on Wikipedia — Generally, searching for “datasets for machine learning” on Google will yield a wealth of public data sources.

Data Scraping — A list of courses / tools for data scraping from Coursera. Arguably more important than machine learning knowledge is access to data, so this is worth learning.

Beautiful Soup — The HTML parsing library I’ve used for scraping (Python)
