Python - Reading and Writing csv and Excel Files!

Edit subtitles

0:00 - 0:01

PROFESSOR: Hi, everyone.
0:01 - 0:05

In this video, I'll show you how
to read and write CSV and Excel
0:05 - 0:08

files within Python.
0:08 - 0:13

This is a valuable skill to have
in industry and in academia.
0:13 - 0:18

We can see on a job site like
indeed.com, plenty of employers
0:18 - 0:20

are looking for
data analysts that
0:20 - 0:23

can make sense of all
the CSV and Excel files
0:23 - 0:23

they have stored.
0:23 - 0:27
0:27 - 0:27

Great.
0:27 - 0:30

Let's get started.
0:30 - 0:33

We'll be working with
three different modules
0:33 - 0:40

in this tutorial-- pandas,
pyexcel, and openpyxl.
0:40 - 0:46

If you're working in a web IDE
like myself and, in particular,
0:46 - 0:52

Google Colab, the way that we
have this loaded in is we pip
0:52 - 0:54

install the pyexcel module.
0:54 - 0:58
0:58 - 1:00

And once starts
installed, we need
1:00 - 1:06

to restart the runtime in order
to be able to use pyexcel.
1:06 - 1:08

So we'll run this
cell one more time.
1:08 - 1:11
1:11 - 1:15

And we can see that all the
requirements are satisfied.
1:15 - 1:18

So we should be set.
1:18 - 1:22

The next thing we need to do
is we need to load in the file
1:22 - 1:23

that we're working with.
1:23 - 1:25

Again, since I'm
in Google Colab,
1:25 - 1:30

I'm just going to drag and
drop the CSV file I'll work
1:30 - 1:32

with into the Content folder.
1:32 - 1:36
1:36 - 1:39

So the first module I'll
highlight is pandas.
1:39 - 1:44

And pandas is probably the
best module to work with.
1:44 - 1:48

And I think it will become
more obvious as we go
1:48 - 1:50

through some of the examples.
1:50 - 1:52

So the first thing
that we'll do is
1:52 - 1:54

we'll create a data
fram which is going
1:54 - 1:59

to store all the data in
the CSV file in tabular form
1:59 - 2:01

within Python.
2:01 - 2:07

So for data frames, they're
almost always called df.
2:07 - 2:10

And that will be the variable
name for the data frame.
2:10 - 2:14

We're going to call
the pandas module pd.
2:14 - 2:17

Then we need to read CSV.
2:17 - 2:22

And we can see here that
pandas is a very powerful tool.
2:22 - 2:25

And it can read
CSVs, read Excels,
2:25 - 2:32

and far more different file
types, such as SAS or SQL files.
2:32 - 2:37

So we're going to select
the CSV, parentheses.
2:37 - 2:42

Then within the apostrophes,
we're going to copy the path.
2:42 - 2:46

And we're going to
put the path in here.
2:46 - 2:48

And our data frame
should be loaded.
2:48 - 2:51

So now, let's take
a look at the data.
2:51 - 2:54

The way that we can
do that is df head,
2:54 - 2:57

which will show the
first n number of rows
2:57 - 2:58

that we put into the parameter.
2:58 - 3:02

So I'm going to take a
look at the first 10 rows.
3:02 - 3:03

Great.
3:03 - 3:05

So here, I have financial data.
3:05 - 3:10

And these are daily
prices for the S&P 500
3:10 - 3:13

as well as the NASDAQ.
3:13 - 3:16

And I'm going to do a few
different things into this.
3:16 - 3:19

But before we can
do anything, let's
3:19 - 3:20

take a look at the data types.
3:20 - 3:27

So if we look at df.dtypes,
this will give us the data types
3:27 - 3:30

of each of these.
3:30 - 3:30

Great.
3:30 - 3:36

So it looks like we have
this all in numeric form.
3:36 - 3:43

And we actually don't have
to make any edits to turn
3:43 - 3:46

these objects into numbers.
3:46 - 3:49

So let's say that
what I want to do
3:49 - 3:51

is I want to get
the daily returns.
3:51 - 3:55

And what I'll do is I'll
get the daily log returns.
3:55 - 3:57

So the way that we
can do this is I'm
3:57 - 4:03

going to create two new
columns within this data frame.
4:03 - 4:05

The first one, I'll call df.
4:05 - 4:13

And within the brackets, I'll
call that NASDAQ Log Return.
4:13 - 4:16
4:16 - 4:20

And this is going to
create our column here.
4:20 - 4:23

Now, I have to actually
put in the operation
4:23 - 4:25

to create the column.
4:25 - 4:29

And the way that we get
the log return is we
4:29 - 4:36

can just take the log
difference between the price
4:36 - 4:41

at time t subtracted by
the price at t minus 1.
4:41 - 4:49

So we'll take this 4,441
and subtract it by 4,405.
4:49 - 4:51

And that will give
us the log return.
4:51 - 4:57

So we'll actually call
the NumPy to do this.
4:57 - 5:05

So np.log df and NASDAQ Close.
5:05 - 5:11

And this is going to get all the
closing prices for the NASDAQ
5:11 - 5:14

here in this column.
5:14 - 5:18

And what I have to do is I
have to get the lag difference.
5:18 - 5:20

And pandas makes
this very easy to do.
5:20 - 5:24

And we'll see how
relatively easy this is.
5:24 - 5:30

So if I go subtract
this by NASDAQ Close--
5:30 - 5:36

and what I have to do is I
have to get this previous day.
5:36 - 5:38

So subtract this
day by this day.
5:38 - 5:42

And do that for all of
the rows all the way down.
5:42 - 5:46

And we'll have one missing
row because there's always
5:46 - 5:50

going to be one day that you
don't have the previous day's
5:50 - 5:53

return.
5:53 - 5:58

Pandas has a very great function
that makes this easy to do.
5:58 - 6:00

It's called the
dot shift function.
6:00 - 6:02

And I'm going to
put in negative 1.
6:02 - 6:07

And this will enable us to
get the previous day's return.
6:07 - 6:08

So let's take a look at that.
6:08 - 6:12
6:12 - 6:13

Great.
6:13 - 6:15

And you can always
check this yourself
6:15 - 6:16

to make sure it's right.
6:16 - 6:24

So I'll just do
np.log minus np.log.
6:24 - 6:34
6:34 - 6:36

And it matches up great.
6:36 - 6:40

So we were able to
calculate the log return.
6:40 - 6:43

And we'll do the same
thing for the S&P 500.
6:43 - 6:44

I'm just going to
change the name around.
6:44 - 6:49
6:49 - 6:50

Great.
6:50 - 6:51

Now, let's take a
look at that as well.
6:51 - 6:54
6:54 - 7:01

So we were able to calculate the
daily log returns very easily
7:01 - 7:04

within pandas.
7:04 - 7:06

At this point,
what I'll do now is
7:06 - 7:10

I'll take the S&P 500
data and the NASDAQ data
7:10 - 7:14

and split it into
different data frames.
7:14 - 7:17

And the way I'll do this is
I'll create two new data frames.
7:17 - 7:20

The first, I'll call S&P 500.
7:20 - 7:29

So S&P is equal to and then
call the df log function.
7:29 - 7:34

And what I need to do is I
need to grab a certain column.
7:34 - 7:38

So what I'll do is, in
this, I put in a colon
7:38 - 7:43

because I want all the
rows, put in my comma.
7:43 - 7:50

And what I'll do is I'll
get df.columns, which
7:50 - 7:53

brings in all the column names.
7:53 - 7:56

And then I want
columns 1 through 5.
7:56 - 8:10

And just to illustrate what
this looks like before I run it,
8:10 - 8:12

these are all the
columns I want.
8:12 - 8:16

And I want to recalculate
the returns using
8:16 - 8:19

either pyexcel or openpyxl.
8:19 - 8:23

So I have the S&P 500 dataframe.
8:23 - 8:24

We can take a
quick look at that.
8:24 - 8:27
8:27 - 8:30

And I'll create
a NASDAQ one too.
8:30 - 8:36

And the way I'll do that
is I'll drop the columns--
8:36 - 8:38

so df drop.
8:38 - 8:44

And that will create another
instance of the df dataframe.
8:44 - 8:48

And I'm just going to copy
and paste this in here.
8:48 - 8:51

So this gets rid
of those columns.
8:51 - 8:58

And then I also want
to drop the returns.
8:58 - 9:01

Because again,
for each of these,
9:01 - 9:05

I'm going to recalculate
the returns using
9:05 - 9:09

the different modules
and NASDAQ columns.
9:09 - 9:14
9:14 - 9:20

And we're going to drop
the last two just to--
9:20 - 9:22

we'll run this and give an idea.
9:22 - 9:25

And that just drops
the last two columns.
9:25 - 9:27
9:27 - 9:29

So if we take a
look at the NASDAQ--
9:29 - 9:35
9:35 - 9:36

great.
9:36 - 9:39

So we have another dataframe.
9:39 - 9:44

And pandas makes it very easy
to bring these data frames
9:44 - 9:48

into CSV or Excel format.
9:48 - 9:50

So I'll write this
in Excel format.
9:50 - 9:53

And the way that we could
do this is we take writer
9:53 - 9:54

is equal to.
9:54 - 9:58

And pandas has a function
called ExcelWriter.
9:58 - 10:01
10:01 - 10:08

And we are going to
write a new file.
10:08 - 10:12

We'll just call this
financial data XLS.
10:12 - 10:13

Copy path.
10:13 - 10:17

And what we'll actually
do is we'll take this S&P.
10:17 - 10:20

And we'll put it in one
Excel sheet and this NASDAQ
10:20 - 10:24

and put it in a
second Excel sheet.
10:24 - 10:33

And it will be overall in this
financial data Excel sheet.
10:33 - 10:36

And we need to put the right
extension name-- so XLSX.
10:36 - 10:39
10:39 - 10:42

And we have to write the
S&P to the first sheet--
10:42 - 10:44

so sandp.to_excel.
10:44 - 10:49
10:49 - 10:54

Then we need to specify that
it's going to the ExcelWriter.
10:54 - 10:56

Get a sheet name for it.
10:56 - 11:05

So we'll call the sheet
name S&P500, no spaces.
11:05 - 11:07

And finally, we'll just
get rid of the index.
11:07 - 11:09

So index is equal to false.
11:09 - 11:11

And we'll do the same
thing for the NASDAQ.
11:11 - 11:29
11:29 - 11:34

Finally, we just need to put
in writer.save to save this.
11:34 - 11:37
11:37 - 11:40

So we were able
to save our sheet.
11:40 - 11:45

And we can see it here in the
Google Colab Content folder.
11:45 - 11:50

So let's move on to the pyexcel
module and see how it works.
11:50 - 11:56

So I'll create a new
variable called fin_data.
11:56 - 11:58

And I'll call pyexcel.
11:58 - 12:00

And we put that
in as p.get_book.
12:00 - 12:04
12:04 - 12:11

And we'll take that financial
data Excel sheet we just
12:11 - 12:13

created and put it in there.
12:13 - 12:14

Let's run that.
12:14 - 12:18
12:18 - 12:20

I need to specify the file name.
12:20 - 12:26
12:26 - 12:36

And if we return this, we'll
get a PySpark looking table.
12:36 - 12:38

And it separates it by sheets.
12:38 - 12:42

So if we scroll up enough, we'll
see that the NASDAQ and S&P
12:42 - 12:44

are separated.
12:44 - 12:46

So this is the S&P 500 sheet.
12:46 - 12:48

If I scroll down
far enough, I'll
12:48 - 12:52

be able to find
the NASDAQ sheet.
12:52 - 12:55

So what I want to do next
is I want to take this.
12:55 - 12:58

And I'll actually put
it in dictionary format.
12:58 - 13:09

So what I can do with fin_data
I can say fin dot dict.
13:09 - 13:11

And this returns a dictionary.
13:11 - 13:15

And it indexes the sheets.
13:15 - 13:18

So the first sheet
will be the S&P 500.
13:18 - 13:20

And the second sheet
will be the NASDAQ.
13:20 - 13:22

And we can see that
this is an ordered dict,
13:22 - 13:24

a special type of dictionary.
13:24 - 13:32

And let's actually save this
to fin_dict is equal to this.
13:32 - 13:35
13:35 - 13:38

And I'll have to use
a different approach
13:38 - 13:40

to calculate the returns.
13:40 - 13:44

And I think if anything, this
shows the power of pandas
13:44 - 13:46

and how easy it makes it.
13:46 - 13:49

So what I'll have
to do is I'll have
13:49 - 13:53

to create an empty list
called percent change.
13:53 - 13:56
13:56 - 14:00

And I'll have to use a for
loop to calculate the returns.
14:00 - 14:04

And we'll also calculate
the log returns for this.
14:04 - 14:06

So let's do that now.
14:06 - 14:09

And what I'll have
to do is say for row.
14:09 - 14:13

And then I'll say
row lag because we
14:13 - 14:19

need the S&P 500
closing price at time t
14:19 - 14:22

but also the closing
price at t minus 1.
14:22 - 14:26
14:26 - 14:30

And we'll use the zip function.
14:30 - 14:32

And we'll say fin_data--
14:32 - 14:35
14:35 - 14:40

or fin_dict because we're
calling the dictionary.
14:40 - 14:42

And we need to
specify the S&P 500
14:42 - 14:46

because this is a dictionary.
14:46 - 14:49

So that will be the row.
14:49 - 14:56

And then for the row
lag, we are going to do--
14:56 - 15:04

and we actually also need
to index this, put it at 1.
15:04 - 15:10

Because at 0, the first cell
is the actual title of it.
15:10 - 15:13

So we don't want to get
back the string instance.
15:13 - 15:15

We just want the returns.
15:15 - 15:17

So what I'll do
is just copy this.
15:17 - 15:21

And this will get
the price at t.
15:21 - 15:26

And the second part will
get the price at t minus 1--
15:26 - 15:27

so the previous day's price.
15:27 - 15:30
15:30 - 15:35

And what I want to do is I want
to append the percent change
15:35 - 15:40

list with that calculation.
15:40 - 15:43

So it will be np.log.
15:43 - 15:46
15:46 - 15:49

Then I need to put row.
15:49 - 15:54

And it's going to be the last--
15:54 - 15:57

so we want the closing price.
15:57 - 16:00

And if we just have
the row, it will
16:00 - 16:07

return us a full list of
each of these elements.
16:07 - 16:13

We just want the last element,
the S&P 500 closing price.
16:13 - 16:17

And we want to divide
that by row_lag.
16:17 - 16:22

And same thing, we want
just the closing price.
16:22 - 16:25

So let's try to run this.
16:25 - 16:27

Now that we have
the log percentage
16:27 - 16:33

change for the S&P 500, you can
take a quick look at that here.
16:33 - 16:41

And that looks like it matches
the calculation that we had.
16:41 - 16:45
16:45 - 16:48

And we were able to
calculate the log returns.
16:48 - 16:53

So now, we have to append
this percentage change
16:53 - 16:58

list to the fin_dict
that we created.
16:58 - 17:01

So what we'll do is for row.
17:01 - 17:14

And then I'll put rtn for
return in zip fin_dict S&P500.
17:14 - 17:20

And we need to,
again, put it as going
17:20 - 17:25

from one to the end because
we'll have to later append
17:25 - 17:27

the title, log return to it.
17:27 - 17:30

But right now, we're just
appending the returns.
17:30 - 17:32

Percentage change.
17:32 - 17:40

And we want to
row.append the return.
17:40 - 17:43
17:43 - 17:50

And we'll also append
manually the title fin_dict
17:50 - 17:53

for the S&P 500.
17:53 - 17:56

And this is going
to be the title.
17:56 - 18:04

So it's going to be 0 or index
position 0 dot append S&P
18:04 - 18:05

500 Log Return.
18:05 - 18:09
18:09 - 18:11

Let's run this.
18:11 - 18:11

Great.
18:11 - 18:14

Now, it should be all set.
18:14 - 18:19

And we can save this
to an Excel file.
18:19 - 18:24

So save as array is equal to--
18:24 - 18:27

and this is just going
to be the fin_dict.
18:27 - 18:31

We don't want to get the NASDAQ
because we didn't edit it.
18:31 - 18:36

And the destination file name.
18:36 - 18:40
18:40 - 18:42

So it's going to go
to the Content folder.
18:42 - 18:45

I'll just copy this
and then edit it.
18:45 - 18:49

So it'll be the S&P 500.
18:49 - 18:58

And I'll just call
this S&P data.
18:58 - 19:00

And save that.
19:00 - 19:03

And we can check that we were
able to write this correctly.
19:03 - 19:08

So the way that we can do
that is we can copy the path.
19:08 - 19:11

And we can use
pandas-- pd.read--
19:11 - 19:14
19:14 - 19:16

and this time, we're
reading an Excel file.
19:16 - 19:18
19:18 - 19:19

Great.
19:19 - 19:25

So it looks like we were able
to successfully iterate over
19:25 - 19:32

this within pyexcel and
write to an Excel file
19:32 - 19:37

and correctly add the
returns from that sheet.
19:37 - 19:42

Now, finally, let's take a look
at the last module, openpyxl.
19:42 - 19:47

And we'll now calculate
the returns for the NASDAQ.
19:47 - 19:50

So the way that we
work with openpyxl
19:50 - 19:51

is we work with workbooks.
19:51 - 19:59

And I'll call this workbook
is equal to load_workbook.
19:59 - 20:02
20:02 - 20:06

And I am going to put in
the file name is equal to--
20:06 - 20:10

and it's going to be
this financial data
20:10 - 20:13

sheet or workbook.
20:13 - 20:16
20:16 - 20:18

Let's read that in.
20:18 - 20:19

Great.
20:19 - 20:24

So if I take a look at
this, this is the object--
20:24 - 20:26

so workbook object.
20:26 - 20:29

And what I could do
is I can actually
20:29 - 20:32

get the NASDAQ sheet from this.
20:32 - 20:34

So we can see worksheet NASDAQ.
20:34 - 20:38

And as we recall earlier, when
we wrote this from pandas,
20:38 - 20:41

we labeled the sheet NASDAQ.
20:41 - 20:47

So I'll save this into a
variable called nasdaq_sheet.
20:47 - 20:47

Run this.
20:47 - 20:50
20:50 - 20:52

And great.
20:52 - 20:58

So similar to what we did over
with pyexcel for openpyxl,
20:58 - 21:03

we need to iterate
over the rows in order
21:03 - 21:09

to be able to get
the returns for this.
21:09 - 21:13

So I'll call this
nasdaq_l for NASDAQ list
21:13 - 21:18

and create an empty list.
21:18 - 21:24

And I'm going to go
for row in range 2.
21:24 - 21:31

And for the NASDAQ
sheet, we're going
21:31 - 21:40

to get the number of
rows, max row plus 1.
21:40 - 21:43

So what this is
going to do is it's
21:43 - 21:51

going to get us all those values
within the NASDAQ sheet rows.
21:51 - 21:54

And what we'll do
is nasdaq_l.append.
21:54 - 22:01
22:01 - 22:05

So for openpyxl, we're
actually going to work with
22:05 - 22:11

the underlying cells like you
would in an Excel workbook--
22:11 - 22:12

so nasdaq_sheet.cell.
22:12 - 22:16
22:16 - 22:17

We're going to get the row.
22:17 - 22:20
22:20 - 22:32

nasdaq_sheet.max_column minus 1.
22:32 - 22:34

And then we're going
to get the values--
22:34 - 22:34

value.
22:34 - 22:38
22:38 - 22:39

Let's run this.
22:39 - 22:42

And let's return
the NASDAQ list.
22:42 - 22:48

So I was able to copy
all of the NASDAQ prices
22:48 - 22:50

into this NASDAQ list.
22:50 - 22:52
22:52 - 22:57

And like I did before
with the pyexcel module,
22:57 - 23:01

I need to also calculate
the percentage change.
23:01 - 23:05

And I'll just write over it.
23:05 - 23:09

And what I'm going to
do is for row, similar
23:09 - 23:12

to what we did before, row_lag.
23:12 - 23:21
23:21 - 23:26

And let's just copy
and paste this.
23:26 - 23:29

And this is the same thing I
did for the S&P 500 before.
23:29 - 23:32

So we have the price at t.
23:32 - 23:35

And we're getting the
difference at t minus 1.
23:35 - 23:41
23:41 - 23:47

So I'm also going to
insert into the list
23:47 - 23:53

the name of the title
of the log returns.
23:53 - 23:57

So at index position
0, I'm going
23:57 - 24:09

to put in the title, NASDAQ Log
Returns and then percent change.
24:09 - 24:13

At the end, I'm just going
to append a none value.
24:13 - 24:19

Because again, for one
of the calculations,
24:19 - 24:23

we won't have a value
because you'll always
24:23 - 24:27

have a null value because
you won't always have
24:27 - 24:29

a previous day at the very end.
24:29 - 24:31

So let's run this.
24:31 - 24:34

And I should have
the percent change
24:34 - 24:37

calculations for the NASDAQ.
24:37 - 24:40

Awesome.
24:40 - 24:44

What we have to
do within openpyxl
24:44 - 24:48

is we need to add this in.
24:48 - 24:53

So what we'll do is we'll
get the max column value.
24:53 - 24:57

And we're going to
create a new column
24:57 - 25:01

to put this into our workbook--
25:01 - 25:05
25:05 - 25:08

plus 1.
25:08 - 25:10

And I'm going to iterate
this with a for loop--
25:10 - 25:19

for row, value in enumerate,
25:19 - 25:23

And I'm going to iterate
over the percent change.
25:23 - 25:27

Start is equal to 1.
25:27 - 25:30
25:30 - 25:38

And for the nasdaq_sheet,
for the cells,
25:38 - 25:39

I'm going to iterate over.
25:39 - 25:43

row is going to be equal to row.
25:43 - 25:47

column is going to
be equal to max_col.
25:47 - 25:51

So while the rows are
going to iterate over--
25:51 - 25:54

so it'll go from row 0 to n--
25:54 - 25:56

the column will stay the same.
25:56 - 25:59

So we're just writing
to the same column.
25:59 - 26:04

And finally, we're
going to put in all
26:04 - 26:10

the values from the percentage
change list that we created.
26:10 - 26:12

OK, let's run this.
26:12 - 26:18

Finally, let's write
the workbook to Excel.
26:18 - 26:21

So we'll call workbook save.
26:21 - 26:26

And we'll call this--
26:26 - 26:32

similar to what we
did with the S&P data,
26:32 - 26:34

let's call this
nasdaq_data.xlsx.
26:34 - 26:38
26:38 - 26:42

And that will save it.
26:42 - 26:44

And we can check this as well.
26:44 - 26:46

So we'll do pd.read.
26:46 - 26:48

So we'll use pandas
just to verify
26:48 - 26:51

that we did this correctly.
26:51 - 26:57

We need to do this
for the NASDAQ data.
26:57 - 26:58

Copy that path.
26:58 - 27:02
27:02 - 27:05

And we also need to specify
the sheet name for this one
27:05 - 27:09

because we have two sheets-- the
S&P 500, which we didn't edit,
27:09 - 27:12

and the NASDAQ data.
27:12 - 27:13

Let's run this.
27:13 - 27:14

Great.
27:14 - 27:20

So we were also able to
do this with openpyxl,
27:20 - 27:25

where we're able to iterate over
and calculate the log returns.
27:25 - 27:27

So I think, if anything,
this really showed
27:27 - 27:29

the powers of pandas.
27:29 - 27:34

And you can see how pandas fared
relative to two other modules
27:34 - 27:38

and how much easier it was
to calculate the returns
27:38 - 27:45

and to, in general, work
and edit the tabulated data.
27:45 - 27:48

If you want to learn
more about these modules,
27:48 - 27:51

I highly recommend checking
out the documentation.
27:51 - 27:54

It's very easy to
follow and intuitive.
27:54 - 27:58

And it gives a lot
of useful pointers.
27:58 - 28:01

Finally, if you enjoyed the
video, feel free to subscribe.
28:01 - 28:05

You can also connect with me on
LinkedIn, Twitter, and GitHub.
28:05 - 28:08

Happy coding, everyone.
28:08 - 28:08

Title:: Python - Reading and Writing csv and Excel Files!
Description:: more » « less
Video Language:: English
Duration:: 28:08

TTU_OAL edited English subtitles for Python - Reading and Writing csv and Excel Files!

Jun 12, 2025, 9:07 PM

English subtitles

Revisions

Revision 1 Uploaded

TTU_OAL Jun 12, 2025, 9:07 PM

Python - Reading and Writing csv and Excel Files!

Revisions

Our website uses cookies

Operating cookies (Required)