Building a Better Back Off: Interview with Eloquent intern Justin Dieter

Eloquent’s head of product, Josh Issler, sat down with Eloquent’s summer intern, Justin Dieter, to discuss the work he did over the summer building a smart back off for Eloquent’s chatbot.

Hi Justin! Why don’t you introduce yourself?

I’m from Colorado, just recently started my sophomore year. I came into Stanford really interested in AI. During my freshman year, I took as many AI classes as I could so I could do research in my areas of interest: NLP, Reinforcement Learning, and Computer Vision.

What have you been working on at Eloquent?

I’ve been working on an NLP model that helps the bot more gracefully back off when it doesn’t understand what a customer says. This means the bot will be able to respond intelligently to requests or questions it’s not able to answer.

Before my model, if a customer said something our bot wasn’t sure how to respond to, like “When is my birthday?”, we would respond by saying something like, “I’m sorry, I must have missed something in there.” That’s a frustrating response and can really make it feel like the customer isn’t being fully listened to.

Now, with my model, if someone says “When is my birthday?”, our bot can respond by saying “I do not know when your birthday is.” Although the bot still doesn’t understand what the customer said, it backs off by specifically referencing the customer’s words. It makes you feel more listened to and is a much nicer customer experience.

Justin’s model helps Elle rephrase the confusing utterance.

What was the most difficult problem you faced while building your model?

Collecting data specific to the task. There’s no pre-existing dataset for this, so I had to build one myself. I used Amazon Mechanical Turk crowd-workers to get my dataset, but it’s extremely difficult to frame the instructions just the right way. You want the crowd-workers to correctly answer your prompt, of course. But they need to answer in a way that doesn’t show too much creativity, so a neural model can learn the correct patterns.

How’d you handle that?

I gradually iterated on instructions for the task. By being extremely specific in the details of what the crowd-workers had to do, I created a set of instructions that allowed them to reliably produce data fit to the task.

Justin presenting at BayLearn 2018. Link to poster at the bottom of the interview.

How would you describe working at Eloquent?

Eloquent has a simultaneously laid-back and urgent atmosphere. There’s always important work to be done, but the culture still has a calm and happy feel to it. It was a really fun place to intern because the work you do is important to the actual product. You get hands-on experience and you feel valued, but you’re also not crazy stressed out.

Working for Eloquent has really improved my ability to do meaningful research. I came in with a very good theoretical understanding of AI, but building something to be actually used in the product definitely sharpened my skills and improved the speed at which I can build models and collect data. Access to experienced researchers at Eloquent also really helped me learn quickly. After my time at Eloquent, I would consider myself a much more competent researcher.

What’s next for you?

I still have more work to do in order to publish a paper for the research I did at Eloquent. Outside of school, that’s my top priority. After that, I have lots of ideas I hope to try out. I want to produce more exciting research!

Best of luck to you Justin, from everyone at Eloquent!

Link to Justin’s poster: “I don’t know what to title this poster”

If you have any questions or comments for Justin, please email them to [email protected] and they will be forwarded to Justin’s personal email. Thanks for reading!



5 Fundamental AI Principles

If everyone had the time and desire to go to college and get an AI degree, you very likely wouldn’t be reading this blog. AI works in mysterious ways, but these five AI principles ought to help you avoid errors when dealing with this tech.

A quick rundown of this post for the AI acolyte on the go:

1. Evaluate AI systems on unseen data
2. More data leads to better models
3. An ounce of clean data is worth a pound of dirty data
4. Start with the stupid stuff
5. AI isn’t magic

Brief caveat — this post will make much more sense with a basic understanding of machine learning. We wrote a blog post a few weeks ago explaining those basics. It’s not required reading to understand this post, but it would certainly be helpful!

1. Evaluate AI systems on unseen data

In our last post, we walked through how we’d build a classifier to label images as either cats (green circles) or dogs (blue triangles). After converting our training data to vectors, we got the below graph with the red line representing our “decision boundary” (the line that separates images into cats and dogs after they’ve been converted to vectors).

Clearly, the decision boundary wrongly labels one green-circle cat as a blue-triangle dog. It’s missed a training example. So, what’s to stop the training algorithm from picking the following as the decision boundary?

In both cases, we’re classifying the training set with the same accuracy — both decision boundaries miss one example. But when we pass in a new unseen cat, like below, only one of the decision boundaries will correctly predict the point as a cat:

A classifier can look great on the dataset it was trained on, but it may not work well on data it was not trained on. Furthermore, even if a classifier works well on a particular type of input (e.g., cats in indoor scenes), it may not work well on different data for the same task (e.g., cats in outdoor scenes).

Blindly purchasing an AI system without testing it on relevant, unseen information can lead to costly mistakes. A practical method to test on unseen data — withhold some of the data you give the company or person developing your AI, then run the withheld data through the resulting system yourself. Or, at minimum, insist that you be able to try out demos yourself.
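As a rough illustration of the withholding advice, here is a minimal Python sketch. The function names `holdout_split` and `accuracy`, the 20% holdout fraction, and the toy data are all assumptions made for the example; they don't come from any particular vendor workflow.

```python
import random

def holdout_split(examples, holdout_fraction=0.2, seed=0):
    """Shuffle examples and split them into (shared, withheld) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_withheld = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_withheld:], shuffled[:n_withheld]

def accuracy(classifier, labeled_examples):
    """Fraction of (input, label) pairs that the classifier labels correctly."""
    correct = sum(1 for x, label in labeled_examples if classifier(x) == label)
    return correct / len(labeled_examples)

# Hypothetical toy data: (feature, label) pairs.
data = [(i, "cat" if i < 50 else "dog") for i in range(100)]

# Hand over `shared` to whoever builds the system; keep `withheld` to yourself.
shared, withheld = holdout_split(data)
```

Once the system comes back, calling `accuracy(system, withheld)` gives a number measured on data the system has never seen, which is the number worth caring about.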

2. More data leads to better models

Given the training dataset below, where would you draw the decision boundary?

Your answer likely isn’t wrong: many decision boundaries could split this data accurately. While each of the hypothetical decision boundaries below correctly splits the data, they are all very different from one another, and as we saw above, some of them are likely to work worse on unseen data (i.e., the data you care about):

From this small dataset, we don’t know which of these very different decision boundaries most accurately represents the real world. The lack of data leads to uncertainty. So, we collect more data points and add them to our initial graph, getting the graph below:

The additional data helps us significantly narrow our options. We’re able to immediately rule out the green and blue decision boundaries, so we know our decision boundary has to be something like the below:

When an ML model behaves unexpectedly, the underlying problem is oftentimes that the model wasn’t trained on enough data, or on the right kind of data. It’s also important to keep in mind, though, that while more data almost always helps, the returns are diminishing. The increase in accuracy is large when we double the data of the first graph. However, if we take that graph, now with double the data, and double it again, the increase in accuracy would not be as large. Accuracy grows roughly logarithmically with the amount of training data, so going from 1k to 10k examples is likely to have a much bigger effect on accuracy than going from 10k to 20k.
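To put rough numbers on that rule of thumb, suppose (purely as an assumption for illustration) that accuracy follows a + b * log10(n) for n training examples, where a and b are unknown constants. The sketch below compares the two jumps in units of b; the function name `log_gain` is made up for the example.

```python
import math

def log_gain(n_before, n_after):
    """Accuracy gain, in units of b, assuming accuracy = a + b * log10(n)."""
    return math.log10(n_after / n_before)

gain_1k_to_10k = log_gain(1_000, 10_000)    # log10(10) = 1.0
gain_10k_to_20k = log_gain(10_000, 20_000)  # log10(2), about 0.301
```

Under this assumed curve, the first jump buys a bit more than three times as much accuracy as the second, even though the second jump adds more raw examples (10k versus 9k).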

A last note on data in AI and a personal pet peeve of mine, especially in the tight-budgeted startup world: you’re paying your ML engineers often hundreds of thousands of dollars in salary; make sure you give them a sufficient budget for collecting data, and give them the time to collect the data carefully.

3. An ounce of clean data is worth a pound of dirty data

While more data is clearly helpful in the example above, it is only helpful if it is accurate. In the previous example, after we collected our additional data, we had a graph and a decision boundary that looked like the below:

But what if some of these new data points were mislabeled and the real world looked more like this?

Note that although the changed dots occupy the same coordinates as the first graph, their meaning has changed. This leads to an entirely different decision boundary:

Even with only a quarter of the dataset mislabeled, it’s clear how much impact wrong data can have on how we create our model. There are techniques we can use during training to mitigate mistakes in labelling our data, but at the end of the day these can only do so much, and in most cases it’s easier and more reliable to clean the underlying data instead.

The point here is that “clean data” is vital. Clean data means the data is accurately labeled; it means the data covers a reasonable portion of the space of interest; it means there are easy cases and hard cases in the training set, so that the decision boundary doesn’t have as much wiggle room and there’s only one “right” answer; and so on.

4. Start with the stupid stuff

This isn’t to say that you should end with the stupid stuff. However, even if the final method you land on is modern and sophisticated, starting simple means you’ll have developed it faster and the final result will be better.

To give an example of this in action, back when I was a first year grad student, Angel (a fellow student in our lab and researcher at Eloquent) and I each worked on separate projects grounding natural language descriptions of time to a machine-readable representation. Essentially, we were trying to get computers to understand such phrases as “last Friday” or “noon tomorrow”.

Since these projects were required for our grant, Angel worked on a practical, deterministic rule-based system. She was on the hook for making something actually work. On the other hand, I was a wee little rotation student. The team let me pick whatever fancy method I wanted, and I was like a kid in a candy store. I naively explored the newest, shiniest semantic parsing approaches. I went all out, playing with EM, conjugate priors, a whole custom semantic parser…the fun stuff.

Nearly a decade later, I’m grateful to be left with a somewhat well-received and moderately cited paper. However, Angel’s project, SUTime, is now one of the most used components in Stanford’s popular CoreNLP toolkit. The simple approach beat the shiny one.

You’d think I’d have learned my lesson. Just a few years later, now a senior grad student, I was working on getting another system up and running for another grant project. Again, I was trying to get a fancy ML model to train correctly with only modest success. On a particularly frustrating day, I got so fed up that I started writing patterns. Patterns are simple deterministic rules. For example, if the sentence contains “born in”, assume this is a location of birth. Patterns don’t learn and can only get you so far, but they’re easy to write and easy to reason about.

In the end, the pattern-based system didn’t just outperform our original system — it placed in the top 5 systems in the final NIST (National Institute of Standards and Technology) bakeoff organized by the grant and ended up heavily influencing our top-performing ML based model.
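For a sense of what such a pattern looks like, here is a minimal Python sketch of the “born in” rule. The regular expression and the function name `birthplace` are illustrative assumptions, not the patterns from the actual system.

```python
import re

# Hypothetical pattern: a capitalized phrase right after "born in" is taken
# to be a place of birth.
BORN_IN = re.compile(r"born in ([A-Z]\w*(?: [A-Z]\w*)*)")

def birthplace(sentence):
    """Return the capitalized phrase after "born in", or None if no match."""
    match = BORN_IN.search(sentence)
    return match.group(1) if match else None
```

The rule is easy to reason about: “She was born in New York City.” yields “New York City”, while “He was born in 1990.” yields nothing because a year is not capitalized. It is just as easy to see where it breaks, which is part of the appeal.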

The conclusion: do the simple thing first. Anecdotes aside, there are a number of good reasons for this:

1. It’ll lower-bound the performance of the final model. You’d hope anything clever would beat the simple baseline. You should rarely, if ever, do worse than a rule based model. It’s good to know that if you’re doing worse, it means something is very broken, and it’s not just that the task is hard.

2. Often, the simple thing requires less (or no!) training data, which lets you prototype without a large investment in data.

3. It’ll often reveal what’s difficult about the task at hand, which will often inform the correct ML method to use to handle these difficult parts. Moreover, this’ll inform the data you collect for more data-intensive methods.

4. Simple methods tend to generalize to unseen data with less effort (remember: always evaluate on unseen data!). Simpler models also tend to be more explainable, which makes them more predictable and makes it clearer how they’d generalize to unseen data.

5. AI isn’t magic

This is something I regularly say. Everyone nods along, but the sentiment rarely sinks in. AI just seems like magic. When speaking about grand future plans for Eloquent’s AI, I’m guilty of reinforcing this faulty notion. The further I get from the nitty-gritty of training ML models, the less the models seem like curve-fitting and the more they seem like arbitrary magic black boxes I can manipulate to do my bidding.

It’s easy to forget that, as a field, modern ML is still very young — only 2-3 decades old. Contrasted with the maturity and sophistication of modern ML toolkits, the field as a whole is still rather immature. Rapid advancement makes it easy to forget this.

Part of the nefariousness of ML is that it’s inherently probabilistic. It technically can do anything, just not necessarily at the level of accuracy that you’d like. I suspect that in many orgs, as news spreads up the org chart, the nuance surrounding “level of accuracy” gets dropped, leaving only the “AI can do anything” part of the narrative.

How do you separate the impossible from the possible? Some best practices that I try to follow:

1. Talk to the person actually training the model. Not the team lead, not the department head, but the person who pushes “Go” on the model training code. They often have a much better insight into how the model works and what its limits are. Make sure they feel comfortable telling you that the model has limits and performs poorly on certain things. I promise you that it does, whether they tell you or not.

2. For NLP projects at least, you can often check the feasibility of a task with a quick and dirty rule-based system. ML is a wonderful way to generate a very large and fuzzy rule set that you could never write down manually, but it’s usually a bad sign if it’s hard to even start writing down a plausible set of rules to do your task. Then, collect a small dataset and try a learned system. Then a somewhat larger one, and so on while you’re still getting improvements. An important rule-of-thumb: accuracy grows approximately logarithmically with the dataset size.

3. Never trust accuracies that seem like magic: anything above ~95 or 97%. Certainly never trust accuracies above human-level, or above inter-annotator agreement. With overwhelming probability, either your dataset or your evaluation is broken. Both happen frequently, even to seasoned researchers.

4. Everything you read on the internet about ML (news, blogs, papers) is misleading or false until proven otherwise — including this post :).

Thank you for reading! Also, I just wanted to make a quick note thanking everyone for their great response to our last post. It was cool seeing people across the community engaged with the material.

As always, if you have any questions, comments or refutations, please send them to me at [email protected]. Sign up for the list to get these posts directly mailed to your inbox and visit our main site at eloquent.ai. Alright, that’s enough out of me. Talk with you again soon!

Eloquent’s Machine Learning Essentials

Google wants to use machine learning to make restaurant reservations for you. China wants to use it to help with healthcare. Facebook wants to use it to put an end to fake news. It’s easy to find flashy headlines about machine learning. However, a question remains – what actually is it?

Without an understanding of the core essentials of machine learning, applying the technology to solve real world problems is an exercise in futility. It’s like a blind man wearing glasses because he heard they help you see. One ought to understand the fundamentals of machine learning to both make good use of the technology and to avoid costly mistakes when acquiring it.

In this post, I’ll provide a concise introduction to machine learning (ML). If you’re looking for an overview of the AI landscape, please visit our other post here.

As we go along, I’ll call out technical definitions in bold in an attempt to demystify common ML jargon. Each section of this post will act like a domino in a chain, with the biggest question from the first section driving the content of the second section and so on until you have a solid grasp of the basics of machine learning. After this post, you’ll be better able to evaluate the ML at the core of artificial intelligence technology.

At its core, ML isn’t actually that complex. Most ML you’re likely to see in the real world can be understood intuitively by understanding classifiers.

What is a classifier?

At a high level, a classifier is a labeling machine. Feed it a discrete input (e.g., an image, a word, a sentence) and it outputs one of a set of known labels.

In the example above, the enigmatic “black box” represents our classifier. It takes as input a discrete object of interest (the image) and produces a label (“Cat”) from a fixed set of possible labels (the output space).

In the real world, most of the exciting ML models you might have read about are either more complex flavors of the classifier we describe in this blog post, or are composed by chaining simple classifiers together to perform more complex tasks (e.g., sentence translation, self-driving cars, chatbots).[1] For the sake of this introductory post, we’re only going to examine a simple classifier that sorts images into two classes: cat or dog. This straightforward example will make things more intuitive without sacrificing technical rigor, and the same insights will apply to practical applications such as self-driving cars.

How are classifiers constructed?

In the previous section, we imagined a classifier as a black box: an input (the image) went in and a label (“Cat”) came out. Now, we’ll discuss how classifiers are made, which will lay the groundwork to explain how they work. The graphic below helps illustrate the key components of classifiers:

Calling out each of the three components:

  1. The training data consists of pairs of known inputs and outputs that look just like the inputs and outputs that the classifier will emulate. In our example, our data consists of pictures and their corresponding labels. Our (input, output) data looks like this: (Image of cat, “Cat”), (Image of dog, “Dog”), (Image of dog, “Dog”) and so on.
  2. The training algorithm uses this labeled data to produce a classifier that can emulate the task demonstrated in the data. In our example, this means that the resulting classifier can label an image as “Cat” or “Dog”.[2]
  3. The classifier — defined here precisely — is a model with all of its parameters[3] filled with actual values by the training algorithm. A simple way to think about parameters is to recall the equation of a line: y = m*x + b. In this equation, “m” and “b” are the parameters.

A model is an algorithm used by our classifier to make its predictions.[4] For now, you can think of a model like an empty shell. It needs to be filled with information before it can make predictions. While a model has the capacity[5] to perform complex predictions, it requires the right parameters to do so. Asking a model to make a prediction without providing parameters is like expecting an empty DVD player to play “Interstellar”. While the DVD player has the capacity to show anything on the screen, it lacks the instructions for showing Interstellar in particular.

To develop these parameters and fill this empty shell, we use a process known as training. Basically, we teach our model how to make predictions by using training data.

Here’s how that works at a high level: we construct an algorithm that takes as input labeled data and produces as output a classifier, or, more precisely, the filled in parameters that complete a model. Labeled data goes in (images labeled as “Cats”; images labeled as “Dogs”) and a set of numbers (the parameters of the model) comes out that defines how the model should predict whether a new image is a cat or a dog. The training algorithm fills the values of our parameters.
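The split between a model (the empty shell) and its parameters can be sketched in a few lines of Python. Everything here is an assumption for illustration: the names `LineParameters` and `line_model`, the choice of a line as the model, and the hand-picked values standing in for what a training algorithm would produce.

```python
from typing import Callable, NamedTuple

class LineParameters(NamedTuple):
    """The numbers a training algorithm would fill in for a line model."""
    m: float
    b: float

def line_model(params: LineParameters) -> Callable[[float, float], str]:
    """Fill the empty model with parameters to get a usable classifier."""
    def classify(x: float, y: float) -> str:
        # Points above the line y = m*x + b are labeled "Dog", the rest "Cat".
        return "Dog" if y > params.m * x + params.b else "Cat"
    return classify

# Hand-picked parameters, standing in for the output of training.
classifier = line_model(LineParameters(m=1.0, b=0.0))
```

The function `line_model` can represent any line, just as the DVD player can show any movie. Only once `m` and `b` are supplied does it become one specific classifier.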

In the real world, how classifiers are constructed is massively important to the ultimate usefulness of the model. For example, a classifier which is fed examples of stop signs wouldn’t be able to detect traffic lights, just like a DVD player playing Interstellar isn’t showing Lord of the Rings. So, why can we train a classifier on street signs and expect it to work on unseen street signs, but not expect it to work on traffic lights? In the next section, we’ll go over how classifiers are actually trained — how we fill the values of our parameters — and gain insight into what types of unseen data we can expect a classifier to be accurate on.

How do we fill the values of our parameters?

It might help if we yank the lid off our black box in earnest and get into the step by step process of how our mysterious classifier labels images as either “Cat” or “Dog”. The following graphic represents the first few steps:

First, we have to convert our discrete input – for example, an image of a cat – into something we can do math on. For practically all machine learning applications, this is a vector.[6] A vector, in turn, can be thought of as a point on a Cartesian grid.

These are usually very long vectors, which means these points are not in 2- or 3-dimensional space, but in something like 500-dimensional or 10 million-dimensional space. I recommend doing what every ML expert does, which is to visualize 2-dimensions and think “10 million” really hard.

Converting an image to a vector is a relatively straightforward process. Every pixel of the image is a dimension of the vector, and the value of each dimension is, for example, the grayscale value of that pixel.[7]
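As a minimal sketch of that conversion, assuming the image arrives as a nested list of grayscale values (the function name `image_to_vector` and the tiny image are made up for the example):

```python
def image_to_vector(image):
    """Flatten a 2-D grid of grayscale pixel values into one flat list."""
    return [pixel for row in image for pixel in row]

# Hypothetical 2x3 grayscale image; 0 is black and 255 is white.
tiny_image = [
    [0, 128, 255],
    [64, 32, 16],
]
vector = image_to_vector(tiny_image)  # six pixels give a 6-dimensional vector
```

A real photo works the same way, only with far more pixels, which is where the very long vectors come from.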

Given that each discrete input is converted to be some point in space, we just have to split this space in two: some of the space for “Cats” and some for “Dogs”. In the simplest case, we do this by drawing a line — or in the high-dimensional case a hyperplane — that separates the space.[8] This is our decision boundary.

The job of the training algorithm is to define the parameters of this decision boundary. In high dimensional space, this decision boundary exists as a high dimensional plane. In two dimensions, the decision boundary exists as a one dimensional plane – which is simply a line. Just like before, when someone refers to a decision boundary or hyperplane it’s safe to visualize a line and think “lots of dimensions” really hard:

A line requires two parameters to be fit: recall again the equation of a line, y = m*x + b. The parameters m and b are the values our training algorithm sets. For higher dimensions, we have more parameters, but the idea remains the same.

Putting this all together, our black box of a classifier starts to look pretty straightforward. We convert our input to a vector, we plot the resulting vector, and then we measure which side of the decision boundary (line) it’s on. For bonus points, to get a measure of confidence we can measure how far away we are from the decision boundary — the further we are, the more confident we are. In our example, the further a point is from our decision boundary the cattier the cat or the doggier the dog that point represents.
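Here is a minimal Python sketch of that last step. It assumes the boundary is written as weights . x + bias = 0 rather than y = m*x + b, since that form extends directly to more dimensions; the function names and the example boundary are hypothetical.

```python
import math

def signed_distance(weights, bias, point):
    """Signed distance from point to the hyperplane weights . x + bias = 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return score / math.sqrt(sum(w * w for w in weights))

def classify(weights, bias, point):
    """Return (label, confidence): the side of the boundary and the distance."""
    distance = signed_distance(weights, bias, point)
    return ("Cat" if distance > 0 else "Dog"), abs(distance)

# Hypothetical boundary: the vertical line x = 2, with cats to the right.
weights, bias = [1.0, 0.0], -2.0
```

With this boundary, the point (5, 1) comes out as a cat at distance 3, while (1.5, 0) comes out as a dog at distance 0.5: a much less confident call, because it sits close to the line.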

In the real world, these parameters can define how a car decides whether a stoplight is displaying a go or stop signal, whether “ciao” means hello or goodbye, and whether a chatbot thinks “I’m just joshing you” is a joke or someone telling the bot their name. But it should also be clear that we’re dividing our space in our example into just a few classes. If we’re classifying between cats and dogs, how would we be expected to pick up on what a badger is? If we’re classifying whether something is a stop sign, how would the model decide whether it’s a traffic light? If we get a new image of a cat, we can expect it to fall into the portion of the space labelled “cat”, but as we move away to more and more different inputs, it becomes less and less clear that they’ll vectorize to the right place in the space.

How does training actually work?

Short answer: training is curve fitting.

Knowing how the black box works, the training algorithm is much more intuitive. Training an ML system, at its core, can be viewed as an exercise in curve fitting. In fact, you may have done least squares regression back in school — this is a perfectly valid and used machine learning algorithm.[9]

The line drawn in the plot below shows how we draw our decision boundary (each point is a training example):

Intuitively, the goal of a training algorithm is to construct a decision boundary that separates our data with as few mistakes as possible, with as many elements as far from the decision boundary as possible. For example, the boundary line in the plot above makes only one mistake — mislabeling a cat as a dog.
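As one concrete instance of training as curve fitting, the sketch below runs ordinary least squares on a toy dataset. To keep it short, it assumes a single feature per animal instead of a full image vector, and encodes cats as -1 and dogs as +1; the data and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance
    return m, mean_y - m * mean_x

# Hypothetical training set: one feature per animal, cats = -1, dogs = +1.
features = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [-1, -1, -1, 1, 1, 1]

# "Training" is nothing more than fitting m and b to the labeled points.
m, b = fit_line(features, labels)
boundary = -b / m  # the feature value where the fitted line crosses zero

def predict(x):
    """Label a new feature value by which side of the boundary it falls on."""
    return "Dog" if m * x + b > 0 else "Cat"
```

On this data the fitted line crosses zero at a feature value of 5, halfway between the two clusters, so that point acts as the decision boundary.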

In the real world, much of the challenge of machine learning is to take the limited training data we have, and find the “right” curve, the curve that classifies the most unseen data correctly. It should be clear then that more data creates a more accurate boundary, or that the right data — the data that’s closest to your unseen data — should create a more reliable decision boundary. When we talk about “big data” or “clean data” or any of these ML buzzwords, what we mean is data that will help fit the right curve to separate, in our example, cats and dogs.

Where do we go from here?

This post has focused on the crucial technical underpinnings of machine learning, explaining classifiers, how classifiers make predictions and how you can train a classifier from training data. When people ask me practical questions about machine learning, the intuitions gained from these underpinnings ground my answers.

Practical questions these intuitions have helped me answer: why is it important to evaluate ML systems on unseen data? Why are neural nets such powerful classifiers? What type of data is valuable for machine learning? In upcoming blog posts, I’ll leverage the insights from this post to answer those and other questions. Other posts to look forward to include an explanation of AI axioms, how to evaluate AI systems, how to tell if AI is right for your task and more. If you’d like to be alerted when new posts go up, please subscribe.

If you still have questions about the points in this post and want to discuss further, please shoot me an email at [email protected]. Perhaps your question will turn into a future blog post!


[1] A more accurate statement here would be “a lot of tasks that don’t look like classifiers can nonetheless be understood with the same intuitions.” For instance, neural machine translation can be viewed as a sequence of classifiers that work like the below image:

Here, each word of the translated English sentence is predicted from (1) the foreign sentence, and (2) the English sentence generated so far. Each of these decisions is essentially a classifier, with a huge output space consisting of every word in the English language. The details are complex, but for the most part the intuition holds.

[2] Training algorithms broadly encompass algorithms you may have heard of, like stochastic gradient descent (SGD) and variants, least squares regression, or reinforcement learning.

[3] Parameters, too, is a technical term. The parameters of the model are the learned numbers that determine how the model will perform its task.

[4] You may have heard of LSTMs, SVMs, and Logistic Regression. These are types of models. Neural networks are a family of models, including models such as the LSTM.

[5] Capacity serves as a technical term as well — high capacity models (e.g., neural networks) have the ability to learn more complex tasks, but tend to be more difficult to train.  High capacity models tend to have many parameters.

[6] For many applications, this is a higher-order tensor. But, for the purposes of intuition a tensor is just a vector with extra mental gymnastics — a vector is a rank 1 tensor, a matrix is a rank 2 tensor, and so on.

[7] For language, converting text into vectors requires an extra step. This is known as embedding words into a vector space (usually, around 100-1000 dimensions); the resulting vectors are called word embeddings. You may have heard of word2vec and GloVe, which are two popular methods for generating word embeddings, along with the corresponding dictionaries for mapping words to vectors.

[8] A hyperplane is an (n-1) dimensional space embedded into an n-dimensional space. Much like most things in high-dimensional space, most AI experts imagine either a 2-dimensional space and a 1-dimensional “hyperplane”, or a 3-dimensional space and a 2-dimensional “hyperplane” (in that case, just a plane).

[9] Expanding on this a bit: least squares regression will fit a line to a set of points. This line is actually along one more dimension than the embedding space. In the 2D space we’ve been using as an example, imagine now a third dimension coming out of the screen, where every positive example has a value of 1 and every negative example has a value of -1. The “line” we’re fitting is then the line between the 1 and -1 values; the decision boundary hyperplane is the intercept of the fitted line with the feature space.

The AI Landscape

The “AI landscape” is vast, complicated and obscured by hype. Unfortunately, in such a space, it’s frighteningly easy to get hopelessly lost. With this blog, we’ll provide you with a map that will help you navigate the landscape. By combining learnings from our research at Stanford’s AI lab and insights from our time here at Eloquent, we hope to deliver the most useful, understandable, and honest AI blog possible. Depending on who you are, you can expect different benefits from our blog:

For business leaders, we’ll provide practical knowledge that’ll help you apply AI tech to your needs. Upcoming posts will address topics such as: evaluating AI solutions, discerning which problems are well-suited for AI, and why neural nets have been so influential in modern AI. Be sure to subscribe for the latest updates!

For engineers, we’ll provide insights about the enterprise market for AI and what we’ve learned while building Eloquent Labs, from how to build a sane REST API to tips for running stateful AI at scale in production.

For Machine Learning and NLP researchers, we’ll occasionally post deeply technical articles based on our research.

In this first post, I will clearly lay out and explain some common AI terms you may have heard before. We’ll go over important AI Techniques and conclude with a brief discussion of the Subfields of AI that apply those techniques.

AI Techniques

At a high level, the goal of AI is to perform actions that appear “intelligent” — performing tasks that emulate a person. Various techniques have been developed to accomplish this. The figure below illustrates a summary of how those techniques are divided:

We can subdivide AI techniques into four big categories: Rule-Based Systems, Search, Logic, and Machine Learning.

Rule-Based Systems

An early technique, rule-based systems define exactly what the computer should do in particular scenarios. For example, Eliza is a simple chatbot from the 60s meant to emulate a therapist. It uses strict rules, not logic, to generate responses. One fun rule: if someone types “I am XYZ,” Eliza always asks some variant of “how do you feel about being XYZ,” no matter what “XYZ” actually is. Interact with Eliza here.

For the record, I do enjoy being Gabor.
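A rule of this kind takes only a few lines of Python. The regular expression below is an illustrative guess at the rule’s shape, not Eliza’s actual script, and the fallback reply is likewise made up.

```python
import re

# One Eliza-style rule: capture whatever follows "I am", ignoring any
# trailing punctuation.
I_AM = re.compile(r"\bI am (.+?)[.!?]*$", re.IGNORECASE)

def respond(utterance):
    """Apply the "I am XYZ" rule, falling back to a generic prompt."""
    match = I_AM.search(utterance.strip())
    if match:
        return f"How do you feel about being {match.group(1)}?"
    return "Please tell me more."
```

Note that the rule never looks at what “XYZ” means. It copies the words back verbatim, which is exactly why it works on any input and understands none of it.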


Search

Search techniques find the best path from one state to another, dependent on your goal. If you wanted to find the shortest path in a maze, you’d use search to transition from the state of “at start” to the state of “at finish” in the shortest possible way.
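The maze example can be sketched with breadth-first search, one of the simplest search techniques. The grid encoding (“#” for walls, “.” for open floor) and the function name are assumptions made for this example.

```python
from collections import deque

def shortest_path_length(maze, start, finish):
    """Breadth-first search over open cells; returns step count or None."""
    rows, cols = len(maze), len(maze[0])
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        (row, col), steps = queue.popleft()
        if (row, col) == finish:
            return steps
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and maze[nxt[0]][nxt[1]] != "#" and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + 1))
    return None  # every reachable state was explored without finding the finish

# Hypothetical maze: "#" is a wall, "." is open floor.
maze = [
    "..#",
    ".##",
    "...",
]
```

Each cell is a state, each move is a transition, and because breadth-first search explores states in order of distance from the start, the first time it reaches the finish it has found a shortest path.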

A surprising number of AI techniques boil down to search. For example:

  • Constraint satisfaction. Tasks like finding a coloring of a map so that no two bordering countries share a color. Turns out, this is a search over possible colorings of a map until we reach the desired state: a coloring which fits the criteria.
  • Genetic algorithms. With genetic algorithms, we’re “searching” for the optimal solution to a problem (e.g., the shortest path in a maze) by randomly trying a bunch of solutions, seeing what works well, and then combining two good solutions in a way that’s loosely inspired by genetics. Genetic algorithms are used in cases where finding an exact solution is difficult, but sampling and evaluating possible solutions is easy, such as fluid dynamics simulations for aerodynamics. At their core, genetic algorithms are a type of search. They search over the “family tree” of solutions until we reach the best solution we can find.
  • Reinforcement Learning and Gameplaying. We often see reinforcement learning used to teach computers how to play games, and in other settings where certain choices are rewarded and others are punished. Most of the excitement around game-playing systems like AlphaGo centers on the machine learning component. However, the contribution the machine learning makes is to drastically limit the amount we must search. The backbone of these systems is still search: finding the ideal way to transition between states in order to get our reward at the end (e.g., win the game).
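
To illustrate how one of these boils down to search, here is a small sketch of map coloring solved by backtracking, a simple search over partial colorings. The region names, borders, and color list are hypothetical, and real constraint solvers add many refinements this sketch leaves out.

```python
def color_map(neighbors, colors):
    """Search for a coloring where no two bordering regions share a color.

    neighbors maps each region to the regions it borders.
    Returns a dict of region -> color, or None if no valid coloring exists.
    """
    regions = list(neighbors)
    assignment = {}

    def search(index):
        if index == len(regions):
            return True  # every region is colored: desired state reached
        region = regions[index]
        for color in colors:
            # Only try colors that no already-colored neighbor is using.
            if all(assignment.get(n) != color for n in neighbors[region]):
                assignment[region] = color
                if search(index + 1):
                    return True
                del assignment[region]  # undo the choice and try the next color
        return False

    return assignment if search(0) else None


# Hypothetical map: A, B, and C all border each other; D borders only C.
BORDERS = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

print(color_map(BORDERS, ["red", "green", "blue"]))
```

With only two colors the same call returns `None`, because the search exhausts every possible coloring without reaching a state that fits the criteria.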


Logic

While modern AI systems are rarely built entirely on the logic-heavy techniques of the 80’s, logic is still an essential underpinning for many AI applications and complements other, more recent techniques. At a high level, logic in AI has been used for proving theorems, providing a backbone for representing meaning in AI applications, and performing logical inference. For instance, a logic-based AI system would be able to infer that, if we know you were born in Arizona, then you must have been born in the United States.
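
The Arizona example can be sketched as a tiny forward-chaining inference loop. This is only an illustration: the string encoding of facts, the `forward_chain` function, and the second rule (about North America) are assumptions added for the example, and real logic systems use far richer representations.

```python
def forward_chain(facts, rules):
    """Apply simple if-then rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Each rule reads: if the first fact holds, the second must hold too.
RULES = [
    ("born_in(Arizona)", "born_in(United States)"),
    ("born_in(United States)", "born_in(North America)"),  # hypothetical extra rule
]

derived = forward_chain({"born_in(Arizona)"}, RULES)
print("born_in(United States)" in derived)
```

The system was never told directly where you were born beyond Arizona. Everything else follows from chaining the rules together.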

Machine Learning

Machine Learning is such a popular technique for modern AI that “AI” and “Machine Learning” are often used interchangeably. It turns out that learning patterns from large amounts of data has been the most successful way to mimic intelligence so far. This is the technique of Machine Learning: given exemplars of how to perform a task, learn how to emulate that task.

The hard part of machine learning is generalizing lessons from exemplars and applying them to unseen data. Simply put, if I feed a machine learning system images of cats and tell it that the images are cats (exemplars), then we want to be able to feed the system a never-before-seen image of a cat and have the system label that new image as a cat.

At a high level, the two most popular techniques for this are Statistical Learning and Deep Learning / Neural Nets. Statistical learning collects statistics from the examples that it sees to try to probabilistically generalize to unseen inputs. Deep learning and neural nets are a more powerful way to learn from data: instead of relying on hand-designed statistics, they learn their own internal representations of the examples. Much more on this in later blog posts.
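
As a toy illustration of the statistical approach, the sketch below counts words in labeled exemplars and uses those counts to guess the label of an unseen sentence. It is a simplified Naive Bayes classifier over text rather than images, and the training sentences, labels, and function names are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect statistics: how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Pick the label whose word statistics make the text most probable."""
    word_counts, label_counts = model
    vocab = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical exemplars: short descriptions paired with a label.
EXAMPLES = [
    ("fluffy cat purrs", "cat"),
    ("cat chases mouse", "cat"),
    ("loyal dog barks", "dog"),
    ("dog fetches ball", "dog"),
]

model = train(EXAMPLES)
print(predict(model, "a sleepy cat purrs"))
```

The sentence passed to `predict` never appears in the training data, so the classifier has to generalize from the word statistics it collected.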

Subfields of AI

With a basic understanding of the various AI techniques, we can now turn our focus to the subfields of AI that use those techniques. Although each field has different goals, they all use many of the same techniques described above.

While many smaller subfields in AI exist, the big three are Computer Vision, Robotics, and Natural Language Processing.

Diving into the specific goals of each subfield:

Computer Vision is the task of parsing and understanding images and videos. For example, detecting that an image of a cat is a cat, or matching faces to names on Facebook.

Robotics has two objectives: build physical robots and imbue them with useful intelligence. Roboticists employ AI to accomplish the second objective, using AI techniques to teach their robots to understand physical input (e.g., is the object in front of me a cup?) and plan what the robot should do as a result (e.g., this is how I should move my joints to pick up the cup).

Natural Language Processing is the field of parsing and understanding language. For example, extracting the sentiment of a sentence, or building a chatbot. Eloquent’s expertise lies here.

About Eloquent Labs

As you might be able to tell from this blog post, I’m not always perfect with my language. It might seem somewhat ironic, then, that I received my PhD in natural language processing. But I went into NLP because I have a lofty and perhaps very nerdy dream: to create seamless conversations between computers and humans. My co-founder, Keenon Werling, and I founded Eloquent Labs to leverage our research at Stanford to accomplish this dream and bring our results to where some of the most important conversations occur — enterprises.

Since our founding, our AI has been deployed in major insurance and logistics companies across the globe. We’ve reduced call and chat volume to live agents, improved the efficiency of service representatives, and increased employee and customer satisfaction. At our core, however, we’re a team of dedicated, passionate engineers and researchers. All of us at Eloquent look forward to providing a valuable resource to the community through this blog.

Please subscribe if you want to be updated when new posts go up, and if you have any questions or topics you’d like to see posts on, shoot me an email at [email protected]. I look forward to speaking with you!