< Return to Video

Python - Reading and Writing csv and Excel Files!

  • 0:00 - 0:01
    PROFESSOR: Hi, everyone.
  • 0:01 - 0:05
    In this video, I'll show you how
    to read and write CSV and Excel
  • 0:05 - 0:08
    files within Python.
  • 0:08 - 0:13
    This is a valuable skill to have
    in industry and in academia.
  • 0:13 - 0:18
    We can see on a job site like
    indeed.com, plenty of employers
  • 0:18 - 0:20
    are looking for
    data analysts that
  • 0:20 - 0:23
    can make sense of all
    the CSV and Excel files
  • 0:23 - 0:23
    they have stored.
  • 0:23 - 0:27
  • 0:27 - 0:27
    Great.
  • 0:27 - 0:30
    Let's get started.
  • 0:30 - 0:33
    We'll be working with
    three different modules
  • 0:33 - 0:40
    in this tutorial-- pandas,
    pyexcel, and openpyxl.
  • 0:40 - 0:46
    If you're working in a web IDE
    like myself and, in particular,
  • 0:46 - 0:52
    Google Colab, the way that we
    have this loaded in is we pip
  • 0:52 - 0:54
    install the pyexcel module.
  • 0:54 - 0:58
  • 0:58 - 1:00
    And once starts
    installed, we need
  • 1:00 - 1:06
    to restart the runtime in order
    to be able to use pyexcel.
  • 1:06 - 1:08
    So we'll run this
    cell one more time.
  • 1:08 - 1:11
  • 1:11 - 1:15
    And we can see that all the
    requirements are satisfied.
  • 1:15 - 1:18
    So we should be set.
  • 1:18 - 1:22
    The next thing we need to do
    is we need to load in the file
  • 1:22 - 1:23
    that we're working with.
  • 1:23 - 1:25
    Again, since I'm
    in Google Colab,
  • 1:25 - 1:30
    I'm just going to drag and
    drop the CSV file I'll work
  • 1:30 - 1:32
    with into the Content folder.
  • 1:32 - 1:36
  • 1:36 - 1:39
    So the first module I'll
    highlight is pandas.
  • 1:39 - 1:44
    And pandas is probably the
    best module to work with.
  • 1:44 - 1:48
    And I think it will become
    more obvious as we go
  • 1:48 - 1:50
    through some of the examples.
  • 1:50 - 1:52
    So the first thing
    that we'll do is
  • 1:52 - 1:54
    we'll create a data
    fram which is going
  • 1:54 - 1:59
    to store all the data in
    the CSV file in tabular form
  • 1:59 - 2:01
    within Python.
  • 2:01 - 2:07
    So for data frames, they're
    almost always called df.
  • 2:07 - 2:10
    And that will be the variable
    name for the data frame.
  • 2:10 - 2:14
    We're going to call
    the pandas module pd.
  • 2:14 - 2:17
    Then we need to read CSV.
  • 2:17 - 2:22
    And we can see here that
    pandas is a very powerful tool.
  • 2:22 - 2:25
    And it can read
    CSVs, read Excels,
  • 2:25 - 2:32
    and far more different file
    types, such as SAS or SQL files.
  • 2:32 - 2:37
    So we're going to select
    the CSV, parentheses.
  • 2:37 - 2:42
    Then within the apostrophes,
    we're going to copy the path.
  • 2:42 - 2:46
    And we're going to
    put the path in here.
  • 2:46 - 2:48
    And our data frame
    should be loaded.
  • 2:48 - 2:51
    So now, let's take
    a look at the data.
  • 2:51 - 2:54
    The way that we can
    do that is df head,
  • 2:54 - 2:57
    which will show the
    first n number of rows
  • 2:57 - 2:58
    that we put into the parameter.
  • 2:58 - 3:02
    So I'm going to take a
    look at the first 10 rows.
  • 3:02 - 3:03
    Great.
  • 3:03 - 3:05
    So here, I have financial data.
  • 3:05 - 3:10
    And these are daily
    prices for the S&P 500
  • 3:10 - 3:13
    as well as the NASDAQ.
  • 3:13 - 3:16
    And I'm going to do a few
    different things into this.
  • 3:16 - 3:19
    But before we can
    do anything, let's
  • 3:19 - 3:20
    take a look at the data types.
  • 3:20 - 3:27
    So if we look at df.dtypes,
    this will give us the data types
  • 3:27 - 3:30
    of each of these.
  • 3:30 - 3:30
    Great.
  • 3:30 - 3:36
    So it looks like we have
    this all in numeric form.
  • 3:36 - 3:43
    And we actually don't have
    to make any edits to turn
  • 3:43 - 3:46
    these objects into numbers.
  • 3:46 - 3:49
    So let's say that
    what I want to do
  • 3:49 - 3:51
    is I want to get
    the daily returns.
  • 3:51 - 3:55
    And what I'll do is I'll
    get the daily log returns.
  • 3:55 - 3:57
    So the way that we
    can do this is I'm
  • 3:57 - 4:03
    going to create two new
    columns within this data frame.
  • 4:03 - 4:05
    The first one, I'll call df.
  • 4:05 - 4:13
    And within the brackets, I'll
    call that NASDAQ Log Return.
  • 4:13 - 4:16
  • 4:16 - 4:20
    And this is going to
    create our column here.
  • 4:20 - 4:23
    Now, I have to actually
    put in the operation
  • 4:23 - 4:25
    to create the column.
  • 4:25 - 4:29
    And the way that we get
    the log return is we
  • 4:29 - 4:36
    can just take the log
    difference between the price
  • 4:36 - 4:41
    at time t subtracted by
    the price at t minus 1.
  • 4:41 - 4:49
    So we'll take this 4,441
    and subtract it by 4,405.
  • 4:49 - 4:51
    And that will give
    us the log return.
  • 4:51 - 4:57
    So we'll actually call
    the NumPy to do this.
  • 4:57 - 5:05
    So np.log df and NASDAQ Close.
  • 5:05 - 5:11
    And this is going to get all the
    closing prices for the NASDAQ
  • 5:11 - 5:14
    here in this column.
  • 5:14 - 5:18
    And what I have to do is I
    have to get the lag difference.
  • 5:18 - 5:20
    And pandas makes
    this very easy to do.
  • 5:20 - 5:24
    And we'll see how
    relatively easy this is.
  • 5:24 - 5:30
    So if I go subtract
    this by NASDAQ Close--
  • 5:30 - 5:36
    and what I have to do is I
    have to get this previous day.
  • 5:36 - 5:38
    So subtract this
    day by this day.
  • 5:38 - 5:42
    And do that for all of
    the rows all the way down.
  • 5:42 - 5:46
    And we'll have one missing
    row because there's always
  • 5:46 - 5:50
    going to be one day that you
    don't have the previous day's
  • 5:50 - 5:53
    return.
  • 5:53 - 5:58
    Pandas has a very great function
    that makes this easy to do.
  • 5:58 - 6:00
    It's called the
    dot shift function.
  • 6:00 - 6:02
    And I'm going to
    put in negative 1.
  • 6:02 - 6:07
    And this will enable us to
    get the previous day's return.
  • 6:07 - 6:08
    So let's take a look at that.
  • 6:08 - 6:12
  • 6:12 - 6:13
    Great.
  • 6:13 - 6:15
    And you can always
    check this yourself
  • 6:15 - 6:16
    to make sure it's right.
  • 6:16 - 6:24
    So I'll just do
    np.log minus np.log.
  • 6:24 - 6:34
  • 6:34 - 6:36
    And it matches up great.
  • 6:36 - 6:40
    So we were able to
    calculate the log return.
  • 6:40 - 6:43
    And we'll do the same
    thing for the S&P 500.
  • 6:43 - 6:44
    I'm just going to
    change the name around.
  • 6:44 - 6:49
  • 6:49 - 6:50
    Great.
  • 6:50 - 6:51
    Now, let's take a
    look at that as well.
  • 6:51 - 6:54
  • 6:54 - 7:01
    So we were able to calculate the
    daily log returns very easily
  • 7:01 - 7:04
    within pandas.
  • 7:04 - 7:06
    At this point,
    what I'll do now is
  • 7:06 - 7:10
    I'll take the S&P 500
    data and the NASDAQ data
  • 7:10 - 7:14
    and split it into
    different data frames.
  • 7:14 - 7:17
    And the way I'll do this is
    I'll create two new data frames.
  • 7:17 - 7:20
    The first, I'll call S&P 500.
  • 7:20 - 7:29
    So S&P is equal to and then
    call the df log function.
  • 7:29 - 7:34
    And what I need to do is I
    need to grab a certain column.
  • 7:34 - 7:38
    So what I'll do is, in
    this, I put in a colon
  • 7:38 - 7:43
    because I want all the
    rows, put in my comma.
  • 7:43 - 7:50
    And what I'll do is I'll
    get df.columns, which
  • 7:50 - 7:53
    brings in all the column names.
  • 7:53 - 7:56
    And then I want
    columns 1 through 5.
  • 7:56 - 8:10
    And just to illustrate what
    this looks like before I run it,
  • 8:10 - 8:12
    these are all the
    columns I want.
  • 8:12 - 8:16
    And I want to recalculate
    the returns using
  • 8:16 - 8:19
    either pyexcel or openpyxl.
  • 8:19 - 8:23
    So I have the S&P 500 dataframe.
  • 8:23 - 8:24
    We can take a
    quick look at that.
  • 8:24 - 8:27
  • 8:27 - 8:30
    And I'll create
    a NASDAQ one too.
  • 8:30 - 8:36
    And the way I'll do that
    is I'll drop the columns--
  • 8:36 - 8:38
    so df drop.
  • 8:38 - 8:44
    And that will create another
    instance of the df dataframe.
  • 8:44 - 8:48
    And I'm just going to copy
    and paste this in here.
  • 8:48 - 8:51
    So this gets rid
    of those columns.
  • 8:51 - 8:58
    And then I also want
    to drop the returns.
  • 8:58 - 9:01
    Because again,
    for each of these,
  • 9:01 - 9:05
    I'm going to recalculate
    the returns using
  • 9:05 - 9:09
    the different modules
    and NASDAQ columns.
  • 9:09 - 9:14
  • 9:14 - 9:20
    And we're going to drop
    the last two just to--
  • 9:20 - 9:22
    we'll run this and give an idea.
  • 9:22 - 9:25
    And that just drops
    the last two columns.
  • 9:25 - 9:27
  • 9:27 - 9:29
    So if we take a
    look at the NASDAQ--
  • 9:29 - 9:35
  • 9:35 - 9:36
    great.
  • 9:36 - 9:39
    So we have another dataframe.
  • 9:39 - 9:44
    And pandas makes it very easy
    to bring these data frames
  • 9:44 - 9:48
    into CSV or Excel format.
  • 9:48 - 9:50
    So I'll write this
    in Excel format.
  • 9:50 - 9:53
    And the way that we could
    do this is we take writer
  • 9:53 - 9:54
    is equal to.
  • 9:54 - 9:58
    And pandas has a function
    called ExcelWriter.
  • 9:58 - 10:01
  • 10:01 - 10:08
    And we are going to
    write a new file.
  • 10:08 - 10:12
    We'll just call this
    financial data XLS.
  • 10:12 - 10:13
    Copy path.
  • 10:13 - 10:17
    And what we'll actually
    do is we'll take this S&P.
  • 10:17 - 10:20
    And we'll put it in one
    Excel sheet and this NASDAQ
  • 10:20 - 10:24
    and put it in a
    second Excel sheet.
  • 10:24 - 10:33
    And it will be overall in this
    financial data Excel sheet.
  • 10:33 - 10:36
    And we need to put the right
    extension name-- so XLSX.
  • 10:36 - 10:39
  • 10:39 - 10:42
    And we have to write the
    S&P to the first sheet--
  • 10:42 - 10:44
    so sandp.to_excel.
  • 10:44 - 10:49
  • 10:49 - 10:54
    Then we need to specify that
    it's going to the ExcelWriter.
  • 10:54 - 10:56
    Get a sheet name for it.
  • 10:56 - 11:05
    So we'll call the sheet
    name S&P500, no spaces.
  • 11:05 - 11:07
    And finally, we'll just
    get rid of the index.
  • 11:07 - 11:09
    So index is equal to false.
  • 11:09 - 11:11
    And we'll do the same
    thing for the NASDAQ.
  • 11:11 - 11:29
  • 11:29 - 11:34
    Finally, we just need to put
    in writer.save to save this.
  • 11:34 - 11:37
  • 11:37 - 11:40
    So we were able
    to save our sheet.
  • 11:40 - 11:45
    And we can see it here in the
    Google Colab Content folder.
  • 11:45 - 11:50
    So let's move on to the pyexcel
    module and see how it works.
  • 11:50 - 11:56
    So I'll create a new
    variable called fin_data.
  • 11:56 - 11:58
    And I'll call pyexcel.
  • 11:58 - 12:00
    And we put that
    in as p.get_book.
  • 12:00 - 12:04
  • 12:04 - 12:11
    And we'll take that financial
    data Excel sheet we just
  • 12:11 - 12:13
    created and put it in there.
  • 12:13 - 12:14
    Let's run that.
  • 12:14 - 12:18
  • 12:18 - 12:20
    I need to specify the file name.
  • 12:20 - 12:26
  • 12:26 - 12:36
    And if we return this, we'll
    get a PySpark looking table.
  • 12:36 - 12:38
    And it separates it by sheets.
  • 12:38 - 12:42
    So if we scroll up enough, we'll
    see that the NASDAQ and S&P
  • 12:42 - 12:44
    are separated.
  • 12:44 - 12:46
    So this is the S&P 500 sheet.
  • 12:46 - 12:48
    If I scroll down
    far enough, I'll
  • 12:48 - 12:52
    be able to find
    the NASDAQ sheet.
  • 12:52 - 12:55
    So what I want to do next
    is I want to take this.
  • 12:55 - 12:58
    And I'll actually put
    it in dictionary format.
  • 12:58 - 13:09
    So what I can do with fin_data
    I can say fin dot dict.
  • 13:09 - 13:11
    And this returns a dictionary.
  • 13:11 - 13:15
    And it indexes the sheets.
  • 13:15 - 13:18
    So the first sheet
    will be the S&P 500.
  • 13:18 - 13:20
    And the second sheet
    will be the NASDAQ.
  • 13:20 - 13:22
    And we can see that
    this is an ordered dict,
  • 13:22 - 13:24
    a special type of dictionary.
  • 13:24 - 13:32
    And let's actually save this
    to fin_dict is equal to this.
  • 13:32 - 13:35
  • 13:35 - 13:38
    And I'll have to use
    a different approach
  • 13:38 - 13:40
    to calculate the returns.
  • 13:40 - 13:44
    And I think if anything, this
    shows the power of pandas
  • 13:44 - 13:46
    and how easy it makes it.
  • 13:46 - 13:49
    So what I'll have
    to do is I'll have
  • 13:49 - 13:53
    to create an empty list
    called percent change.
  • 13:53 - 13:56
  • 13:56 - 14:00
    And I'll have to use a for
    loop to calculate the returns.
  • 14:00 - 14:04
    And we'll also calculate
    the log returns for this.
  • 14:04 - 14:06
    So let's do that now.
  • 14:06 - 14:09
    And what I'll have
    to do is say for row.
  • 14:09 - 14:13
    And then I'll say
    row lag because we
  • 14:13 - 14:19
    need the S&P 500
    closing price at time t
  • 14:19 - 14:22
    but also the closing
    price at t minus 1.
  • 14:22 - 14:26
  • 14:26 - 14:30
    And we'll use the zip function.
  • 14:30 - 14:32
    And we'll say fin_data--
  • 14:32 - 14:35
  • 14:35 - 14:40
    or fin_dict because we're
    calling the dictionary.
  • 14:40 - 14:42
    And we need to
    specify the S&P 500
  • 14:42 - 14:46
    because this is a dictionary.
  • 14:46 - 14:49
    So that will be the row.
  • 14:49 - 14:56
    And then for the row
    lag, we are going to do--
  • 14:56 - 15:04
    and we actually also need
    to index this, put it at 1.
  • 15:04 - 15:10
    Because at 0, the first cell
    is the actual title of it.
  • 15:10 - 15:13
    So we don't want to get
    back the string instance.
  • 15:13 - 15:15
    We just want the returns.
  • 15:15 - 15:17
    So what I'll do
    is just copy this.
  • 15:17 - 15:21
    And this will get
    the price at t.
  • 15:21 - 15:26
    And the second part will
    get the price at t minus 1--
  • 15:26 - 15:27
    so the previous day's price.
  • 15:27 - 15:30
  • 15:30 - 15:35
    And what I want to do is I want
    to append the percent change
  • 15:35 - 15:40
    list with that calculation.
  • 15:40 - 15:43
    So it will be np.log.
  • 15:43 - 15:46
  • 15:46 - 15:49
    Then I need to put row.
  • 15:49 - 15:54
    And it's going to be the last--
  • 15:54 - 15:57
    so we want the closing price.
  • 15:57 - 16:00
    And if we just have
    the row, it will
  • 16:00 - 16:07
    return us a full list of
    each of these elements.
  • 16:07 - 16:13
    We just want the last element,
    the S&P 500 closing price.
  • 16:13 - 16:17
    And we want to divide
    that by row_lag.
  • 16:17 - 16:22
    And same thing, we want
    just the closing price.
  • 16:22 - 16:25
    So let's try to run this.
  • 16:25 - 16:27
    Now that we have
    the log percentage
  • 16:27 - 16:33
    change for the S&P 500, you can
    take a quick look at that here.
  • 16:33 - 16:41
    And that looks like it matches
    the calculation that we had.
  • 16:41 - 16:45
  • 16:45 - 16:48
    And we were able to
    calculate the log returns.
  • 16:48 - 16:53
    So now, we have to append
    this percentage change
  • 16:53 - 16:58
    list to the fin_dict
    that we created.
  • 16:58 - 17:01
    So what we'll do is for row.
  • 17:01 - 17:14
    And then I'll put rtn for
    return in zip fin_dict S&P500.
  • 17:14 - 17:20
    And we need to,
    again, put it as going
  • 17:20 - 17:25
    from one to the end because
    we'll have to later append
  • 17:25 - 17:27
    the title, log return to it.
  • 17:27 - 17:30
    But right now, we're just
    appending the returns.
  • 17:30 - 17:32
    Percentage change.
  • 17:32 - 17:40
    And we want to
    row.append the return.
  • 17:40 - 17:43
  • 17:43 - 17:50
    And we'll also append
    manually the title fin_dict
  • 17:50 - 17:53
    for the S&P 500.
  • 17:53 - 17:56
    And this is going
    to be the title.
  • 17:56 - 18:04
    So it's going to be 0 or index
    position 0 dot append S&P
  • 18:04 - 18:05
    500 Log Return.
  • 18:05 - 18:09
  • 18:09 - 18:11
    Let's run this.
  • 18:11 - 18:11
    Great.
  • 18:11 - 18:14
    Now, it should be all set.
  • 18:14 - 18:19
    And we can save this
    to an Excel file.
  • 18:19 - 18:24
    So save as array is equal to--
  • 18:24 - 18:27
    and this is just going
    to be the fin_dict.
  • 18:27 - 18:31
    We don't want to get the NASDAQ
    because we didn't edit it.
  • 18:31 - 18:36
    And the destination file name.
  • 18:36 - 18:40
  • 18:40 - 18:42
    So it's going to go
    to the Content folder.
  • 18:42 - 18:45
    I'll just copy this
    and then edit it.
  • 18:45 - 18:49
    So it'll be the S&P 500.
  • 18:49 - 18:58
    And I'll just call
    this S&P data.
  • 18:58 - 19:00
    And save that.
  • 19:00 - 19:03
    And we can check that we were
    able to write this correctly.
  • 19:03 - 19:08
    So the way that we can do
    that is we can copy the path.
  • 19:08 - 19:11
    And we can use
    pandas-- pd.read--
  • 19:11 - 19:14
  • 19:14 - 19:16
    and this time, we're
    reading an Excel file.
  • 19:16 - 19:18
  • 19:18 - 19:19
    Great.
  • 19:19 - 19:25
    So it looks like we were able
    to successfully iterate over
  • 19:25 - 19:32
    this within pyexcel and
    write to an Excel file
  • 19:32 - 19:37
    and correctly add the
    returns from that sheet.
  • 19:37 - 19:42
    Now, finally, let's take a look
    at the last module, openpyxl.
  • 19:42 - 19:47
    And we'll now calculate
    the returns for the NASDAQ.
  • 19:47 - 19:50
    So the way that we
    work with openpyxl
  • 19:50 - 19:51
    is we work with workbooks.
  • 19:51 - 19:59
    And I'll call this workbook
    is equal to load_workbook.
  • 19:59 - 20:02
  • 20:02 - 20:06
    And I am going to put in
    the file name is equal to--
  • 20:06 - 20:10
    and it's going to be
    this financial data
  • 20:10 - 20:13
    sheet or workbook.
  • 20:13 - 20:16
  • 20:16 - 20:18
    Let's read that in.
  • 20:18 - 20:19
    Great.
  • 20:19 - 20:24
    So if I take a look at
    this, this is the object--
  • 20:24 - 20:26
    so workbook object.
  • 20:26 - 20:29
    And what I could do
    is I can actually
  • 20:29 - 20:32
    get the NASDAQ sheet from this.
  • 20:32 - 20:34
    So we can see worksheet NASDAQ.
  • 20:34 - 20:38
    And as we recall earlier, when
    we wrote this from pandas,
  • 20:38 - 20:41
    we labeled the sheet NASDAQ.
  • 20:41 - 20:47
    So I'll save this into a
    variable called nasdaq_sheet.
  • 20:47 - 20:47
    Run this.
  • 20:47 - 20:50
  • 20:50 - 20:52
    And great.
  • 20:52 - 20:58
    So similar to what we did over
    with pyexcel for openpyxl,
  • 20:58 - 21:03
    we need to iterate
    over the rows in order
  • 21:03 - 21:09
    to be able to get
    the returns for this.
  • 21:09 - 21:13
    So I'll call this
    nasdaq_l for NASDAQ list
  • 21:13 - 21:18
    and create an empty list.
  • 21:18 - 21:24
    And I'm going to go
    for row in range 2.
  • 21:24 - 21:31
    And for the NASDAQ
    sheet, we're going
  • 21:31 - 21:40
    to get the number of
    rows, max row plus 1.
  • 21:40 - 21:43
    So what this is
    going to do is it's
  • 21:43 - 21:51
    going to get us all those values
    within the NASDAQ sheet rows.
  • 21:51 - 21:54
    And what we'll do
    is nasdaq_l.append.
  • 21:54 - 22:01
  • 22:01 - 22:05
    So for openpyxl, we're
    actually going to work with
  • 22:05 - 22:11
    the underlying cells like you
    would in an Excel workbook--
  • 22:11 - 22:12
    so nasdaq_sheet.cell.
  • 22:12 - 22:16
  • 22:16 - 22:17
    We're going to get the row.
  • 22:17 - 22:20
  • 22:20 - 22:32
    nasdaq_sheet.max_column minus 1.
  • 22:32 - 22:34
    And then we're going
    to get the values--
  • 22:34 - 22:34
    value.
  • 22:34 - 22:38
  • 22:38 - 22:39
    Let's run this.
  • 22:39 - 22:42
    And let's return
    the NASDAQ list.
  • 22:42 - 22:48
    So I was able to copy
    all of the NASDAQ prices
  • 22:48 - 22:50
    into this NASDAQ list.
  • 22:50 - 22:52
  • 22:52 - 22:57
    And like I did before
    with the pyexcel module,
  • 22:57 - 23:01
    I need to also calculate
    the percentage change.
  • 23:01 - 23:05
    And I'll just write over it.
  • 23:05 - 23:09
    And what I'm going to
    do is for row, similar
  • 23:09 - 23:12
    to what we did before, row_lag.
  • 23:12 - 23:21
  • 23:21 - 23:26
    And let's just copy
    and paste this.
  • 23:26 - 23:29
    And this is the same thing I
    did for the S&P 500 before.
  • 23:29 - 23:32
    So we have the price at t.
  • 23:32 - 23:35
    And we're getting the
    difference at t minus 1.
  • 23:35 - 23:41
  • 23:41 - 23:47
    So I'm also going to
    insert into the list
  • 23:47 - 23:53
    the name of the title
    of the log returns.
  • 23:53 - 23:57
    So at index position
    0, I'm going
  • 23:57 - 24:09
    to put in the title, NASDAQ Log
    Returns and then percent change.
  • 24:09 - 24:13
    At the end, I'm just going
    to append a none value.
  • 24:13 - 24:19
    Because again, for one
    of the calculations,
  • 24:19 - 24:23
    we won't have a value
    because you'll always
  • 24:23 - 24:27
    have a null value because
    you won't always have
  • 24:27 - 24:29
    a previous day at the very end.
  • 24:29 - 24:31
    So let's run this.
  • 24:31 - 24:34
    And I should have
    the percent change
  • 24:34 - 24:37
    calculations for the NASDAQ.
  • 24:37 - 24:40
    Awesome.
  • 24:40 - 24:44
    What we have to
    do within openpyxl
  • 24:44 - 24:48
    is we need to add this in.
  • 24:48 - 24:53
    So what we'll do is we'll
    get the max column value.
  • 24:53 - 24:57
    And we're going to
    create a new column
  • 24:57 - 25:01
    to put this into our workbook--
  • 25:01 - 25:05
  • 25:05 - 25:08
    plus 1.
  • 25:08 - 25:10
    And I'm going to iterate
    this with a for loop--
  • 25:10 - 25:19
    for row, value in enumerate,
  • 25:19 - 25:23
    And I'm going to iterate
    over the percent change.
  • 25:23 - 25:27
    Start is equal to 1.
  • 25:27 - 25:30
  • 25:30 - 25:38
    And for the nasdaq_sheet,
    for the cells,
  • 25:38 - 25:39
    I'm going to iterate over.
  • 25:39 - 25:43
    row is going to be equal to row.
  • 25:43 - 25:47
    column is going to
    be equal to max_col.
  • 25:47 - 25:51
    So while the rows are
    going to iterate over--
  • 25:51 - 25:54
    so it'll go from row 0 to n--
  • 25:54 - 25:56
    the column will stay the same.
  • 25:56 - 25:59
    So we're just writing
    to the same column.
  • 25:59 - 26:04
    And finally, we're
    going to put in all
  • 26:04 - 26:10
    the values from the percentage
    change list that we created.
  • 26:10 - 26:12
    OK, let's run this.
  • 26:12 - 26:18
    Finally, let's write
    the workbook to Excel.
  • 26:18 - 26:21
    So we'll call workbook save.
  • 26:21 - 26:26
    And we'll call this--
  • 26:26 - 26:32
    similar to what we
    did with the S&P data,
  • 26:32 - 26:34
    let's call this
    nasdaq_data.xlsx.
  • 26:34 - 26:38
  • 26:38 - 26:42
    And that will save it.
  • 26:42 - 26:44
    And we can check this as well.
  • 26:44 - 26:46
    So we'll do pd.read.
  • 26:46 - 26:48
    So we'll use pandas
    just to verify
  • 26:48 - 26:51
    that we did this correctly.
  • 26:51 - 26:57
    We need to do this
    for the NASDAQ data.
  • 26:57 - 26:58
    Copy that path.
  • 26:58 - 27:02
  • 27:02 - 27:05
    And we also need to specify
    the sheet name for this one
  • 27:05 - 27:09
    because we have two sheets-- the
    S&P 500, which we didn't edit,
  • 27:09 - 27:12
    and the NASDAQ data.
  • 27:12 - 27:13
    Let's run this.
  • 27:13 - 27:14
    Great.
  • 27:14 - 27:20
    So we were also able to
    do this with openpyxl,
  • 27:20 - 27:25
    where we're able to iterate over
    and calculate the log returns.
  • 27:25 - 27:27
    So I think, if anything,
    this really showed
  • 27:27 - 27:29
    the powers of pandas.
  • 27:29 - 27:34
    And you can see how pandas fared
    relative to two other modules
  • 27:34 - 27:38
    and how much easier it was
    to calculate the returns
  • 27:38 - 27:45
    and to, in general, work
    and edit the tabulated data.
  • 27:45 - 27:48
    If you want to learn
    more about these modules,
  • 27:48 - 27:51
    I highly recommend checking
    out the documentation.
  • 27:51 - 27:54
    It's very easy to
    follow and intuitive.
  • 27:54 - 27:58
    And it gives a lot
    of useful pointers.
  • 27:58 - 28:01
    Finally, if you enjoyed the
    video, feel free to subscribe.
  • 28:01 - 28:05
    You can also connect with me on
    LinkedIn, Twitter, and GitHub.
  • 28:05 - 28:08
    Happy coding, everyone.
  • 28:08 - 28:08
Title:
Python - Reading and Writing csv and Excel Files!
Description:

more » « less
Video Language:
English
Duration:
28:08

English subtitles

Revisions