Notes.

This dataset, is horribly inefficent but is perfect as a jumping off point for
this discussion.

The data comes from IMDB and here is a thank you to them and their copyright
and licensing agreement.

NOTE: IMDB organizes their data with a very good data model, I on the other hand
have destroyed their good model and prepared a horrific set of data for this
presentation.

The data in this dataset is stored as a simple CSV.

It is 150,000 records from the TV episodes database, which has millions of
records.   This file is about 51 megs.

Because it is only 51 megs, this presentation will actually be doable, if it
were a really large db say 1TB like the full IMDB, these queries that I am
performing would be excruciating.

The criteria for choice was tv episodes that were marked as a single episode
of a tv show that only showed once.  But upon reviewing the data it looks more
like a bunch of outlier records that probably have incorrect information.

Incorrect data or outlier data is a big problem in data management.   You may
have millions of records, but if the data in those records are not correct it
will make it difficult to find that data, and some times nearly impossible.

So we are inadvertantly working from thier mostly broken data, but Julian was
too lazy to curate a different set of data, and this will help us in
"reorganizing" this data, which is one thing that a DBA would be doing.

So this SQL 101 presentation puts you in as a DBA who has to curate some
outlier data and make more sense of it.

This data that I have curated for you, is also organized in a horrific manner
and anyone here who has worked in Databases in the past are going to cringe
once I show this data and think I am insane.

Well, that is the point, we are going to "reorganize" this data into a more
proper manner.

First, SQL is Structured Query Language has been around since the early 1970's
but became a "thing" in the mid 1980's with IBM's adoption of it for their
mainline database products.   From that point on we have a better set of tools
for the collation and reporting from large datasets.

Now SQL is actually a defined language with a group that manages the language
itself.   But in practice every database vendor does it differently so when
working in the SQL field you will have some schizophrenia and try and figure
out what DB your using so you can get the native syntax or semantics, sorry
but that is the way this works.

Some popular databases:

Mysql/Mariadb
PostgreSQL
Oracle
SQLServer
DB2
SQLite3

Today for simplicity we are going to use SQLite3 because it is very light and
is actually delivered as a library that tools are built around and can do very
sophisticated SQL with a light footprint.

BTW my favorite heavy lifting database is by far PostgreSQL, it is enterprise
class and fully open source.   Oracle and SQLServer are owned by Oracle and
Microsoft respectively and can do a lot for you, but you will pay through the
nose to use them in a commercial environment.  But developers and students can
use them for free.

All the other databases are awesome battleships for Enterprise, but would be
expensive to maintain and install.  For the purposes of this presentation the
lightness of SQLite3 is perfect.

I on the other hand have used all the above databases except DB2, but my
Mother was a DBA for IBM and DB2, so I have DB2 in my genes :)

A quick look at the data:

Julians-Macbook-Pro:202004_SQL101 julianbrown$ head data_set.csv
Episode,Show,Date,Role,Person,Birthday,Deathday
1019 Henessy,In the Crack,2015,actress,Henessy,1989,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,actor,Cauã Reymond,1980,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,actress,Taís Araújo,1978,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,actor,Reynaldo Gianecchini,1972,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,actress,Giovanna Antonelli,1976,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,actress,Rosi Campos,1954,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,director,Denise Saraceni,Unknown,Unknown
Episode dated 16 August 2004,Da Cor do Pecado,2004,writer,João Emanuel Carneiro,1970,-
Episode dated 16 August 2004,Da Cor do Pecado,2004,composer,Roberto Schilling,Unknown,Unknown

Import a simple csv directly into a new table in a new database

Julians-Macbook-Pro:202004_SQL101 julianbrown$ rm version1.db
Julians-Macbook-Pro:202004_SQL101 julianbrown$ sqlite3 version1.db
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
sqlite> .mode csv
sqlite> .import data_set.csv episode_people
sqlite> .quit

The sqlite3 csv import code is pretty cool and automatically creates the "schema" or the layout of the data in the table.

What is a schema? Different databases call it slightly different but what I mean in this light is all the "columns" in a table and what their datatypes are.

In this case what schema was generated.

Julians-Macbook-Pro:202004_SQL101 julianbrown$ sqlite3 version1.db
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
sqlite> .schema episode_people
CREATE TABLE episode_people(
  "Episode" TEXT,
  "Show" TEXT,
  "Date" TEXT,
  "Role" TEXT,
  "Person" TEXT,
  "Birthday" TEXT,
  "Deathday" TEXT
);

Note it returned the schema in the form of a SQL statement.  Just as a note here, each database will do the same thing.  And unfortunately each DB has different syntax for their create table formats.

Let's talk about data types.  Each column in a table has a data type that it must conform to (except SQLite, but I don't want to talk about that).

The acceptable datatypes for SQLite3 are in this slide ...

The acceptable datatypes for PostgresSQL are in this slide ...

Note the real databases have much more to choose from in regards to datatypes.

I want to digress a little on:

TEXT, CHAR and VARCHAR.

sqlite3 does not support CHAR or VARCHAR which are the most common datatypes used in all the databases.  This is one of the simplifications that sqlite3 uses to make it lightweight.

Show the difference with CHAR and VARCHAR.

Now for what you all came for, let us query the data and see what we see.

sqlite> SELECT * FROM episode_people WHERE 1 LIMIT 10;
1019 Henessy|In the Crack|2015|actress|Henessy|1989|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|actor|Cauã Reymond|1980|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|actress|Taís Araújo|1978|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|actor|Reynaldo Gianecchini|1972|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|actress|Giovanna Antonelli|1976|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|actress|Rosi Campos|1954|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|director|Denise Saraceni|Unknown|Unknown
Episode dated 16 August 2004|Da Cor do Pecado|2004|writer|João Emanuel Carneiro|1970|-
Episode dated 16 August 2004|Da Cor do Pecado|2004|composer|Roberto Schilling|Unknown|Unknown
Episode dated 16 August 2004|Da Cor do Pecado|2004|actress|Alinne Moraes|1982|-

That is a trivial search so let's go over it.

SELECT
FROM
WHERE, discuss that in this case WHERE is optional
LIMIT optional so that I did not print out all 600k elements.

What is it doing in the background ...

Now lets count the number of rows.

sqlite> SELECT COUNT(Episode) FROM episode_people WHERE 1;
630293

Counting is cool.

But in the first query you notice that the Show is duplicated many times, note I have some tricks for you.

sqlite> .headers on
sqlite> .mode column
sqlite> SELECT Show,COUNT(Show) FROM episode_people WHERE 1 GROUP BY Show LIMIT 10;
Show             COUNT(Show)
---------------  -----------
#BristolDebates  1
#Euroroast       7
#Freerayshawn    3
#FriendsDriveSa  1
#LionelNationus  8
#TheWorstDay     10
#WestonSuperMar  4
#love #likes #t  9
$2 Bill          16
'Wag kukurap     3

Define GROUP BY
Define LIMIT

But I am not done yet.

sqlite> SELECT Show,COUNT(Show) FROM episode_people WHERE 1 GROUP BY Show ORDER BY 2 LIMIT 10;
Show             COUNT(Show)
---------------  -----------
#BristolDebates  1
#FriendsDriveSa  1
+ Brasileiros    1
0800             1
100% question    1
13.11            1
1Love 1House TV  1
2 on the Town    1
20/20 Downtown   1
30 Minuten Deut  1

Discuss ORDER BY, and using 2 instead of the column name

I can rename a column if I so choose.

sqlite> SELECT Show,COUNT(Show) AS Count FROM episode_people WHERE 1 GROUP BY Show ORDER BY Count LIMIT 10;
Show             Count
---------------  ----------
#BristolDebates  1
#FriendsDriveSa  1
+ Brasileiros    1
0800             1
100% question    1
13.11            1
1Love 1House TV  1
2 on the Town    1
20/20 Downtown   1
30 Minuten Deut  1

sqlite> SELECT Show,COUNT(Show) FROM episode_people WHERE 1 GROUP BY Show ORDER BY 2 DESC LIMIT 10;
Show        COUNT(Show)
----------  -----------
EastEnders  26634
The Tonigh  11528
Late Show   10570
Top of the  8010
Howard Ste  6660
The Rosie   5845
Ellen: The  5365
As the Wor  4778
Telediario  4755
Biography   4711

Discuss ASC and DESC

Oh I found something interesting let's explore.

sqlite> SELECT Episode FROM episode_people WHERE Show = "Telediario" LIMIT 10;
Episode
--------------------------
Episode dated 18 June 1962
Episode dated 18 June 1962
Episode dated 18 June 1962
Episode dated 18 June 1962
Episode dated 18 June 1962
Episode dated 18 February
Episode dated 18 February
Episode dated 18 February
Episode dated 18 February
Episode dated 18 February

sqlite> SELECT Episode FROM episode_people WHERE Show = "Telediario" GROUP BY Episode LIMIT 10;
Episode
---------------------------
Episode dated 18 April 1958
Episode dated 18 April 1959
Episode dated 18 April 1960
Episode dated 18 April 1961
Episode dated 18 April 1962
Episode dated 18 April 1963
Episode dated 18 April 1964
Episode dated 18 April 1965
Episode dated 18 April 1966
Episode dated 18 April 1967
sqlite> SELECT Episode FROM episode_people WHERE Show = "Telediario" GROUP BY Episode ORDER BY Episode ASC LIMIT 10;
Episode
---------------------------
Episode dated 18 April 1958
Episode dated 18 April 1959
Episode dated 18 April 1960
Episode dated 18 April 1961
Episode dated 18 April 1962
Episode dated 18 April 1963
Episode dated 18 April 1964
Episode dated 18 April 1965
Episode dated 18 April 1966
Episode dated 18 April 1967

sqlite> SELECT Episode FROM episode_people WHERE Show = "Telediario" GROUP BY Episode ORDER BY Episode DESC LIMIT 10;
Episode
----------------------------
Episode dated 8 January 2007
Episode dated 6 November 200
Episode dated 18 September 2
Episode dated 18 September 2
Episode dated 18 September 2
Episode dated 18 September 2
Episode dated 18 September 2
Episode dated 18 September 2
Episode dated 18 September 2
Episode dated 18 September 1

sqlite> SELECT COUNT(Episode) FROM episode_people WHERE Show = "Telediario" GROUP BY Episode ORDER BY 1 LIMIT 10;
COUNT(Episode)
--------------
2
3
3
3
3
3
3
3
3
3

sqlite> SELECT COUNT(Episode) FROM episode_people WHERE Show = "Telediario" GROUP BY Episode ORDER BY 1 DESC LIMIT 10;
COUNT(Episode)
--------------
10
10
10
10
10
10
10
10
10
10

sqlite> SELECT Episode, COUNT(Episode) FROM episode_people WHERE Show = "Telediario" GROUP BY Episode ORDER BY 2 DESC LIMIT 10;
Episode                      COUNT(Episode)
---------------------------  --------------
Episode dated 18 April 1974  10
Episode dated 18 April 1975  10
Episode dated 18 April 1976  10
Episode dated 18 April 1977  10
Episode dated 18 April 1978  10
Episode dated 18 April 1979  10
Episode dated 18 April 1980  10
Episode dated 18 April 1983  10
Episode dated 18 April 1986  10
Episode dated 18 April 1987  10

sqlite> SELECT Show, Person FROM episode_people WHERE Person LIKE "%Fred A%" GROUP BY Show;
Show                Person
------------------  ------------
ABC News Nightline  Fred Astaire
ABC's Wide World o  Fred Astaire
Au théâtre ce soir  Alfred Adam
Biography           Fred Astaire
Cinéma 16           Alfred Adam
CollegeHumor Origi  Alfred Aquin
Doppelpass          Manfred Amer
Home                Fred Astaire
I Love the 1990s    Fred Aylward
Le petit théâtre d  Alfred Adam
Les amours de la b  Alfred Adam
Makro               Manfred Aign
Peter Steiners The  Alfred Ander
Planet Wissen       Manfred Ande
Planet e.           Manfred Aign
Premium Blend       Fred Armisen
Pro & Contra        Manfred Abel
Project Twenty      Fred Allen
Reel Comedy         Fred Armisen
The Christophers    Fred Allen
The Merv Griffin S  Fred Astaire
The Tonight Show S  Fred Astaire
Tilt                Fred Adison
Your Movie Show     Fred Armisen

sqlite> SELECT Show, Person FROM episode_people WHERE Person = "Fred Astaire" GROUP BY Show;
Show                Person
------------------  ------------
ABC News Nightline  Fred Astaire
ABC's Wide World o  Fred Astaire
Biography           Fred Astaire
Home                Fred Astaire
The Merv Griffin S  Fred Astaire
The Tonight Show S  Fred Astaire

sqlite> SELECT Show, Date, Person FROM episode_people WHERE Person = "Fred Astaire" GROUP BY Show;
Show                Date        Person
------------------  ----------  ------------
ABC News Nightline  1987        Fred Astaire
ABC's Wide World o  1974        Fred Astaire
Biography           2001        Fred Astaire
Home                1957        Fred Astaire
The Merv Griffin S  1975        Fred Astaire
The Tonight Show S  1976        Fred Astaire

sqlite> SELECT Show, Date, Person FROM episode_people WHERE Person IN("Fred Astaire", "Fred Allen")  GROUP BY Show;
Show                Date        Person
------------------  ----------  ------------
ABC News Nightline  1987        Fred Astaire
ABC's Wide World o  1974        Fred Astaire
Biography           2001        Fred Astaire
Home                1957        Fred Astaire
Project Twenty      1956        Fred Allen
The Christophers    1956        Fred Allen
The Merv Griffin S  1975        Fred Astaire
The Tonight Show S  1976        Fred Astaire

Be careful with GROUP BY, know your data

sqlite> SELECT Show, Date, Person FROM episode_people WHERE Person IN("Fred Astaire", "Fred Allen");
Show            Date        Person
--------------  ----------  ----------
Project Twenty  1956        Fred Allen
Biography       2003        Fred Astai
ABC News Night  1987        Fred Astai
The Christophe  1956        Fred Allen
Biography       1999        Fred Astai
Home            1957        Fred Astai
Biography       2001        Fred Astai
The Tonight Sh  1975        Fred Astai
ABC's Wide Wor  1974        Fred Astai
The Tonight Sh  1976        Fred Astai
The Merv Griff  1975        Fred Astai

This will blow your mind.

sqlite> SELECT a.Episode, a.Count FROM(
	SELECT Episode, COUNT(Episode) AS Count FROM episode_people WHERE Show = "Telediario" GROUP BY Episode ORDER BY 2 DESC) a
    WHERE a.Count > 9 ORDER BY a.Count DESC LIMIT 10;
Episode                      Count
---------------------------  ----------
Episode dated 18 April 1974  10
Episode dated 18 April 1975  10
Episode dated 18 April 1976  10
Episode dated 18 April 1977  10
Episode dated 18 April 1978  10
Episode dated 18 April 1979  10
Episode dated 18 April 1980  10
Episode dated 18 April 1983  10
Episode dated 18 April 1986  10
Episode dated 18 April 1987  10

Is SELECT the only thing you can do with SQL?

No you can INSERT, DELETE and UPDATE records,
you can also work complicated transactions with
syntax like BEGIN TRANSACTION, COMMIT TRANSACTION
or ROLLBACK TRANSACTION.

I will show some of that, but keep in mind that
these forms are database dependant, they usually
follow a form, but usually differ on details.

But before we look at those, let's now become a
DBA and let's start doing proper management of
data.

First thing I want to introduce is indexing.
You will find that indexing is one of the most
powerful tools SQL provides for you, and often
overlooked by neophytes.

INDEXING ...

++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++

sqlite> .schema episodes
CREATE TABLE episodes(
  "title_id" TEXT,
  "Date" TEXT,
  "Title" TEXT
);


CREATE TEMP TABLE episode_manip("episode_id" TEXT, "Date" TEXT, "Title" TEXT);
INSERT INTO episode_manip (episode_id, Date, Title) SELECT episode_id, Episode, Episode FROM episode_people WHERE 1 GROUP BY episode_id;
SELECT COUNT(episode_id) FROM episode_manip;

UPDATE episode_manip SET Date = replace(Date, "dated 1 ", "dated 01 ");
UPDATE episode_manip SET Date = replace(Date, "dated 2 ", "dated 02 ");
UPDATE episode_manip SET Date = replace(Date, "dated 3 ", "dated 03 ");
UPDATE episode_manip SET Date = replace(Date, "dated 4 ", "dated 04 ");
UPDATE episode_manip SET Date = replace(Date, "dated 5 ", "dated 05 ");
UPDATE episode_manip SET Date = replace(Date, "dated 6 ", "dated 06 ");
UPDATE episode_manip SET Date = replace(Date, "dated 7 ", "dated 07 ");
UPDATE episode_manip SET Date = replace(Date, "dated 8 ", "dated 08 ");
UPDATE episode_manip SET Date = replace(Date, "dated 9 ", "dated 09 ");

UPDATE episode_manip SET Date = replace(Date, "Episode dated ", "");
UPDATE episode_manip SET Date = replace (Date, "January", "01");
UPDATE episode_manip SET Date = replace (Date, "February", "02");
UPDATE episode_manip SET Date = replace (Date, "March", "03");
UPDATE episode_manip SET Date = replace (Date, "April", "04");
UPDATE episode_manip SET Date = replace (Date, "May", "05");
UPDATE episode_manip SET Date = replace (Date, "June", "06");
UPDATE episode_manip SET Date = replace (Date, "July", "07");
UPDATE episode_manip SET Date = replace (Date, "August", "08");
UPDATE episode_manip SET Date = replace (Date, "September", "09");
UPDATE episode_manip SET Date = replace (Date, "October", "10");
UPDATE episode_manip SET Date = replace (Date, "November", "11");
UPDATE episode_manip SET Date = replace (Date, "December", "12");

SELECT Date FROM episode_manip WHERE LENGTH(Date) > 10;
SELECT Date FROM episode_manip WHERE LENGTH(Date) != 10;

#01234567890
#08 10 1984

UPDATE episode_manip SET Date = substr(Date,7,4) || "-" || substr(Date,4,2) || "-" || substr(Date,1,2);

CREATE TABLE episode_dates ("episode_id" TEXT, "Date" TEXT, "Title" TEXT);
INSERT INTO episode_dates (episode_id, Date, Title) SELECT episode_id, Date, Title FROM episode_manip WHERE 1;
CREATE INDEX episode_dates_episode_id ON episode_dates(episode_id);
DROP TABLE episode_manip;

SELECT Show, episode_people.Date, Person, episode_dates.Date AS "Real Date" FROM episode_people JOIN episode_dates ON episode_people.episode_id = episode_dates.episode_id WHERE Person IN("Fred Astaire", "Fred Allen");

SELECT ep.Show, ep.Date, ep.Person, ed.Date AS "Real Date" FROM episode_people AS ep JOIN episode_dates AS ed ON ep.episode_id = ed.episode_id WHERE ep.Person IN("Fred Astaire", "Fred Allen");

SELECT ep.Show, ep.Date, ep.Person, ed.Date AS "Real Date" FROM episode_people AS ep JOIN episode_dates AS ed ON ep.episode_id = ed.episode_id WHERE ep.Person IN("Fred Astaire", "Fred Allen") GROUP BY ep.Show;

BEGIN TRANSACTION;
CREATE TABLE new_episode_people(
  "Show" TEXT,
  "Date" TEXT,
  "Role" TEXT,
  "Person" TEXT,
  "Birthday" TEXT,
  "Deathday" TEXT,
  "episode_id" TEXT,
  "show_id" TEXT
);
INSERT INTO new_episode_people
    (Show, Date, Role, Person, Birthday, Deathday, episode_id, show_id) 
SELECT 
    Show, Date, Role, Person, Birthday, Deathday, episode_id, show_id
FROM episode_people
WHERE 1;
DROP TABLE episode_people;
ALTER TABLE new_episode_people RENAME TO episode_people;
COMMIT;

SELECT ep.Show, ed.Title, ed.Date, ep.Role, ep.Person FROM episode_people AS ep JOIN episode_dates AS ed ON ep.episode_id = ed.episode_id GROUP BY Show LIMIT 10;
Show           Title                       Date        Role        Person
-------------  --------------------------  ----------  ----------  -------------
#Freerayshawn  Episode dated 6 April 2020  2020-04-06  actress     Alyshia Ochse
#LakeShow      Episode dated 14 April 201  2014-04-14  self        James Worthy
#somos         Episode dated 1 January 20  2017-01-01  actor       Juan Camilo Y
't is gebeurd  Episode dated 22 November   2016-11-22  self        Toby Van Camp
+ Clair        Episode dated 25 March 200  2006-03-25  self        Sandrine Bonn
+ de cinéma    Episode dated 23 October 2  2001-10-23  self        Salomé Stéven
...18 - 20 nu  Episode dated 29 April 197  1971-04-29  self        Pompilia Stoi
...Amb Manel   Episode dated 17 August 20  2004-08-17  self        Marc Parrot
...mal ehrlic  Episode dated 3 March 2000  2000-03-03  self        Rosa von Prau
01-For London  Episode dated 22 September  1988-09-22  director    Henry Murray

.schema episode_people
CREATE TABLE IF NOT EXISTS "episode_people"(
  "Show" TEXT,
  "Date" TEXT,
  "Role" TEXT,
  "Person" TEXT,
  "Birthday" TEXT,
  "Deathday" TEXT,
  "episode_id" TEXT,
  "show_id" TEXT
);

SELECT Show, show_id FROM episode_people WHERE 1 GROUP BY Show LIMIT 10;
CREATE TABLE shows("show_id" TEXT, "Show" TEXT);
INSERT INTO shows (show_id, Show) SELECT show_id, Show FROM episode_people WHERE 1 GROUP BY Show;
SELECT * FROM shows LIMIT 10;
CREATE INDEX shows_show_id ON shows(show_id);
CREATE INDEX shows_show ON shows(Show);

BEGIN TRANSACTION;
CREATE TABLE new_episode_people(
  "Date" TEXT,
  "Role" TEXT,
  "Person" TEXT,
  "Birthday" TEXT,
  "Deathday" TEXT,
  "episode_id" TEXT,
  "show_id" TEXT
);
INSERT INTO new_episode_people
    (Date, Role, Person, Birthday, Deathday, episode_id, show_id) 
SELECT 
    Date, Role, Person, Birthday, Deathday, episode_id, show_id
FROM episode_people
WHERE 1;
DROP TABLE episode_people;
ALTER TABLE new_episode_people RENAME TO episode_people;
COMMIT;

CREATE TABLE people("person_id" INTEGER PRIMARY KEY AUTOINCREMENT, "Person" TEXT, "Birthday" TEXT, "Deathday" TEXT);
SELECT Person, Birthday, Deathday FROM episode_people WHERE 1 GROUP BY Person LIMIT 10;

INSERT INTO people (Person, Birthday, Deathday) SELECT Person, Birthday, Deathday FROM episode_people WHERE 1 GROUP BY Person;
CREATE UNIQUE INDEX people_person ON people(Person);

CREATE TABLE roles("role_id" INTEGER PRIMARY KEY AUTOINCREMENT, "Role" TEXT);
INSERT INTO roles(Role) SELECT Role FROM episode_people WHERE 1 GROUP BY Role;
SELECT COUNT(Role) FROM roles;
SELECT Role FROM roles ORDER BY Role ASC;

WORKING ON THIS

CREATE TABLE new_episode_people(
  "episode_people_id" INTEGER PRIMARY KEY AUTOINCREMENT,
  "Date" TEXT,
  "role_id" INTEGER,
  "person_id" INTEGER,
  "episode_id" TEXT,
  "show_id" TEXT
);
INSERT INTO new_episode_people
    (Date, role_id, person_id, episode_id, show_id) 
SELECT
  ep.Date, r.role_id, p.person_id, ep.episode_id, ep.show_id
FROM
  episode_people AS ep
JOIN
  roles AS r ON ep.Role = r.Role
JOIN 
  people AS p ON ep.Person = p.Person
WHERE 1;

SELECT * FROM new_episode_people LIMIT 10;
DROP TABLE episode_people;
ALTER TABLE new_episode_people RENAME TO episode_people;


