1
00:00:00,100 --> 00:00:02,350
♪ [music] ♪

2
00:00:03,700 --> 00:00:05,700
- [narrator] Welcome
to Nobel Conversations.

3
00:00:07,000 --> 00:00:10,128
In this episode, Josh Angrist
and Guido Imbens

4
00:00:10,128 --> 00:00:13,700
sit down with Isaiah Andrews
to discuss and disagree

5
00:00:13,700 --> 00:00:16,580
over the role of machine learning
in applied econometrics.

6
00:00:18,300 --> 00:00:19,769
- [Isaiah] So, of course,
there are a lot of topics

7
00:00:19,769 --> 00:00:21,087
where you guys largely agree,

8
00:00:21,087 --> 00:00:22,313
but I'd like to turn to one

9
00:00:22,313 --> 00:00:24,240
where maybe you have
some differences of opinion.

10
00:00:24,240 --> 00:00:25,728
So I'd love to hear
some of your thoughts

11
00:00:25,728 --> 00:00:26,883
about machine learning

12
00:00:26,883 --> 00:00:29,900
and the role that it's playing
and is going to play in economics.

13
00:00:30,200 --> 00:00:33,352
- [Guido] I've looked at some data,
like, proprietary,

14
00:00:33,352 --> 00:00:35,100
so there's
no published paper there.

15
00:00:36,719 --> 00:00:38,159
There was an experiment
that was done

16
00:00:38,159 --> 00:00:39,500
on some search algorithm.

17
00:00:39,700 --> 00:00:41,497
And the question was...

18
00:00:42,901 --> 00:00:45,600
it was about ranking things
and changing the ranking.

19
00:00:45,900 --> 00:00:47,500
That was sort of clear...

20
00:00:48,400 --> 00:00:50,600
that there was going to be
a lot of heterogeneity there.

21
00:00:50,600 --> 00:00:51,700
Mmm,

22
00:00:51,700 --> 00:00:58,120
You know, if you look for, say,

23
00:00:58,300 --> 00:01:00,350
a picture of Britney Spears

24
00:01:00,350 --> 00:01:02,400
it doesn't really matter
where you rank it

25
00:01:02,400 --> 00:01:05,500
because you're going to figure out
what you're looking for,

26
00:01:06,200 --> 00:01:07,867
whether you put it
in the first or second

27
00:01:07,867 --> 00:01:09,800
or third position of the ranking.

28
00:01:10,100 --> 00:01:12,500
But if you're looking
for the best econometrics book,

29
00:01:13,300 --> 00:01:16,500
if you put your book
first or your book tenth,

30
00:01:16,500 --> 00:01:18,100
that's going to make
a big difference

31
00:01:18,600 --> 00:01:21,829
in how often people
are going to click on it.

32
00:01:21,829 --> 00:01:23,417
And so there you go --

33
00:01:23,417 --> 00:01:27,218
- [Josh] Why do I need
machine learning to discover that?

34
00:01:27,218 --> 00:01:29,195
It seems like
I can discover it simply.

35
00:01:29,195 --> 00:01:30,435
- [Guido] So in general--

36
00:01:30,435 --> 00:01:32,100
- [Josh] There were lots
of possible...

37
00:01:32,100 --> 00:01:35,490
- You want to think about
there being lots of characteristics

38
00:01:35,490 --> 00:01:37,610
of the items

39
00:01:37,610 --> 00:01:41,682
that you want to understand
what drives the heterogeneity

40
00:01:42,300 --> 00:01:43,427
in the effect of--

41
00:01:43,427 --> 00:01:45,600
- But you're just predicting.

42
00:01:45,600 --> 00:01:47,700
In some sense, you're solving
a marketing problem.

43
00:01:48,400 --> 00:01:49,580
- [inaudible] it's causal effect,

44
00:01:49,580 --> 00:01:51,800
- It's causal, but it has
no scientific content.

45
00:01:51,800 --> 00:01:53,300
Think about...

46
00:01:54,100 --> 00:01:57,300
- No, but it's similar things
in medical settings.

47
00:01:58,000 --> 00:02:01,300
If you do an experiment, 
you may actually be very interested

48
00:02:01,300 --> 00:02:03,900
in whether the treatment
works for some groups or not.

49
00:02:03,900 --> 00:02:06,500
And you have a lot of individual
characteristics,

50
00:02:06,500 --> 00:02:08,000
and you want
to systematically search.

51
00:02:08,000 --> 00:02:09,500
- Yeah. I'm skeptical about that --

52
00:02:09,500 --> 00:02:12,603
that sort of idea that there's
this personal causal effect

53
00:02:12,603 --> 00:02:13,900
that I should care about,

54
00:02:14,000 --> 00:02:16,063
and that machine learning
can discover it

55
00:02:16,063 --> 00:02:17,596
in some way that's useful.

56
00:02:17,596 --> 00:02:21,400
So think about -- I've done
a lot of work on schools,

57
00:02:21,400 --> 00:02:23,950
going to, say, a charter school,

58
00:02:23,950 --> 00:02:25,225
a publicly funded private school,

59
00:02:25,225 --> 00:02:26,500
effectively, you know,
that's free to structure

60
00:02:26,500 --> 00:02:29,300
its own curriculum
for context there.

61
00:02:29,300 --> 00:02:31,000
Some types of charter schools

62
00:02:31,000 --> 00:02:32,700
generate spectacular
achievement gains,

63
00:02:32,700 --> 00:02:36,400
and in the data set
that produces that result,

64
00:02:36,400 --> 00:02:37,800
I have a lot of covariates.

65
00:02:37,800 --> 00:02:41,200
So I have baseline scores,
and I have family background,

66
00:02:41,200 --> 00:02:45,800
the education of the parents, the sex
of the child, the race of the child.

67
00:02:45,800 --> 00:02:48,300
And, well, as soon as I put

68
00:02:48,400 --> 00:02:51,900
half a dozen of those together, I
have a very high-dimensional space.

69
00:02:52,300 --> 00:02:54,900
I'm definitely interested
in sort of coarse

70
00:02:54,900 --> 00:02:59,400
features of that treatment effect,
like whether it's better for people who

71
00:02:59,900 --> 00:03:02,100
come from lower income families.

72
00:03:02,600 --> 00:03:06,000
I have a hard time believing
that there's an application,

73
00:03:06,400 --> 00:03:10,300
you know, for the very high
dimensional version of that, where

74
00:03:10,500 --> 00:03:13,200
I discover that for
non-white children who have

75
00:03:13,800 --> 00:03:17,800
high family incomes but baseline
scores in the third quartile,

76
00:03:18,300 --> 00:03:23,000
and only went to public school in the
third grade but not the sixth grade...

77
00:03:23,000 --> 00:03:25,500
So that's what that
high-dimensional analysis produces:

78
00:03:25,800 --> 00:03:28,100
this very elaborate conditional statement.

79
00:03:28,300 --> 00:03:31,000
There are two things that are wrong
with that, in my view. First,

80
00:03:31,000 --> 00:03:34,000
I don't see it as... I just can't
imagine why it's actionable.

81
00:03:34,600 --> 00:03:36,600
I don't know why you'd want to act on it.

82
00:03:36,600 --> 00:03:41,200
And I know also that there's some
alternative model that fits almost as well

83
00:03:41,800 --> 00:03:43,000
that flips everything,

84
00:03:43,200 --> 00:03:47,500
right? Because machine learning doesn't
tell me that this is really the predictor

85
00:03:47,900 --> 00:03:48,100
that...

86
00:03:48,400 --> 00:03:52,300
it just tells me that this
is a good predictor. And so,

87
00:03:52,800 --> 00:03:55,900
you know, I think there is
something different about the

88
00:03:56,000 --> 00:03:58,400
social science context. So I think

89
00:03:58,500 --> 00:04:02,600
the social science applications
you're talking about are ones where

90
00:04:03,400 --> 00:04:08,100
I think there's not a huge amount
of heterogeneity in the effects.

91
00:04:08,400 --> 00:04:14,000
- Well, there might be, if you
allow me to fill that space. - No,

92
00:04:14,600 --> 00:04:18,100
not even then. I think for
a lot of those interventions,

93
00:04:18,300 --> 00:04:22,000
you would expect
that the effect is the same sign

94
00:04:22,100 --> 00:04:22,900
for everybody.

95
00:04:23,400 --> 00:04:27,600
There may be small differences
in the magnitude, but it's not...

96
00:04:28,200 --> 00:04:31,700
For a lot of these education
interventions, they're good for everybody.

97
00:04:31,800 --> 00:04:32,300
They're

98
00:04:32,900 --> 00:04:37,600
It's not that they're bad for some
people and good for other people, and

99
00:04:37,600 --> 00:04:40,800
that there are kind of very small
pockets where they're bad.

100
00:04:40,900 --> 00:04:43,900
There may be some
variation in the magnitude,

101
00:04:44,000 --> 00:04:48,200
but you would need very, very big
data sets to find those. And

102
00:04:48,400 --> 00:04:51,400
then in those cases, they probably
wouldn't be very actionable anyway.

103
00:04:51,700 --> 00:04:53,800
But I think there are
a lot of other settings

104
00:04:54,100 --> 00:04:56,600
where there is much more heterogeneity.

105
00:04:57,400 --> 00:05:01,600
- Well, I'm open to that possibility,
and I think the example you gave

106
00:05:01,900 --> 00:05:05,000
is essentially a marketing example.

107
00:05:06,400 --> 00:05:08,400
- No, but maybe they...
they

108
00:05:08,500 --> 00:05:10,700
have implications for
industrial organization,

109
00:05:10,700 --> 00:05:13,900
how you actually...
whether you need to worry about

110
00:05:14,000 --> 00:05:17,900
the market power.
- Well, I'd want to see that paper.

111
00:05:18,400 --> 00:05:21,200
- So the sense... the
sense I'm getting is that

112
00:05:21,500 --> 00:05:23,500
we still disagree on something. - Yes.

113
00:05:24,100 --> 00:05:26,700
- We haven't converged on
everything. - I'm getting that sense.

114
00:05:27,200 --> 00:05:31,000
- Actually, we've diverged on this because
this wasn't around to argue about.

115
00:05:33,200 --> 00:05:38,000
- Is it getting a little warm here? - Yeah.
- Warmed up. Warmed up is good.

116
00:05:38,100 --> 00:05:40,800
The sense I'm getting is, Josh,
sort of, you're not, you're not

117
00:05:40,900 --> 00:05:43,400
saying that you're confident
that there is no way

118
00:05:43,400 --> 00:05:45,400
that there is an application
where this stuff is useful.

119
00:05:45,400 --> 00:05:48,200
You are saying you're
unconvinced by the existing

120
00:05:48,300 --> 00:05:52,200
applications to date. - Fair.
- That, I'm very confident. - Yeah.

121
00:05:54,200 --> 00:05:55,000
- In this case,

122
00:05:55,300 --> 00:05:57,500
I think Josh does have a point that,

123
00:05:58,000 --> 00:06:02,100
even in the prediction cases, where

124
00:06:02,300 --> 00:06:05,000
a lot of the machine learning
methods really shine is

125
00:06:05,000 --> 00:06:06,600
where there's just a lot of heterogeneity.

126
00:06:07,300 --> 00:06:10,600
You don't really care much
about the details there, right?

127
00:06:10,900 --> 00:06:15,000
- Yes. - It doesn't have
a policy angle or something.

128
00:06:15,200 --> 00:06:18,100
Kind of, recognizing
handwritten digits and stuff,

129
00:06:18,300 --> 00:06:24,000
it does much better there than
building some complicated model.

130
00:06:24,400 --> 00:06:28,100
But in a lot of the social science, a
lot of the economic applications,

131
00:06:28,300 --> 00:06:32,100
we actually know a huge amount about the
relationship between various variables.

132
00:06:32,100 --> 00:06:34,600
A lot of the relationships
are strictly monotone.

133
00:06:35,400 --> 00:06:39,400
Education is going
to increase people's earnings,

134
00:06:39,800 --> 00:06:44,100
irrespective of the demographic,
irrespective of the level of education

135
00:06:44,100 --> 00:06:47,800
you already have. - Until they get to a
PhD. - Yeah. There is graduate school.

136
00:06:49,500 --> 00:06:50,700
- Over a reasonable range,

137
00:06:51,600 --> 00:06:55,900
it's not going to
go down very much, whereas

138
00:06:56,100 --> 00:06:59,700
in a lot of the settings where these
machine learning methods shine,

139
00:06:59,700 --> 00:07:01,900
there's a lot
of non-monotonicity,

140
00:07:02,100 --> 00:07:04,900
kind of multimodality
in these relationships,

141
00:07:05,300 --> 00:07:11,500
and they're going to be very
powerful. But I still stand by that:

142
00:07:11,700 --> 00:07:16,100
these methods just
have a huge amount to offer

143
00:07:16,400 --> 00:07:18,100
for economists, and they're going

144
00:07:18,200 --> 00:07:21,700
to be a big part of the future.

145
00:07:23,400 --> 00:07:25,800
- It feels like there's something interesting
to be said about machine learning here.

146
00:07:25,800 --> 00:07:27,700
So, Guido, I was wondering,
could you give some more...

147
00:07:28,000 --> 00:07:29,000
maybe some examples

148
00:07:29,000 --> 00:07:32,500
of the sorts of examples you're thinking
about, with applications at the moment?

149
00:07:32,500 --> 00:07:34,100
- So one area is where,

150
00:07:34,700 --> 00:07:36,400
instead of looking for average

151
00:07:36,500 --> 00:07:42,200
causal effects, we're looking for
individualized estimates and predictions

152
00:07:42,400 --> 00:07:47,500
of causal effects, and there, machine
learning algorithms have been very effective,

153
00:07:48,000 --> 00:07:48,100
too.

154
00:07:48,300 --> 00:07:51,500
We would have done
these things using kernel methods.

155
00:07:51,600 --> 00:07:54,500
And theoretically, they work great, and

156
00:07:54,600 --> 00:07:57,400
there are some arguments that,
formally, you can't do any better.

157
00:07:57,600 --> 00:08:00,500
But in practice, they
don't work very well and

158
00:08:00,900 --> 00:08:05,400
random forest, random causal forest
type things that Stefan Wager and Susan

159
00:08:05,400 --> 00:08:09,500
Athey have been working
on, are used very widely.

160
00:08:09,600 --> 00:08:12,200
They've been very effective,
kind of, in these settings,

161
00:08:12,400 --> 00:08:18,100
to actually get causal effects
that vary by

162
00:08:18,200 --> 00:08:19,900
covariates. And this kind of...

163
00:08:20,700 --> 00:08:25,700
I think this is still just the beginning
of these methods. But in many cases,

164
00:08:26,400 --> 00:08:31,600
these algorithms are very
effective at searching over big spaces

165
00:08:31,800 --> 00:08:35,600
and finding the functions that fit

166
00:08:35,900 --> 00:08:41,100
very well in ways that we
couldn't really do beforehand.

167
00:08:41,500 --> 00:08:45,300
- I don't know of an example where
machine learning has generated insights

168
00:08:45,300 --> 00:08:48,100
about a causal effect that
I'm interested in. And I

169
00:08:48,300 --> 00:08:51,300
do know of examples where it's
potentially very misleading.

170
00:08:51,300 --> 00:08:53,700
So I've done some work
with Brigham Frandsen,

171
00:08:54,100 --> 00:08:55,100
using, for example,

172
00:08:55,100 --> 00:08:59,900
random forest to model covariate effects
in an instrumental variables problem.

173
00:09:00,200 --> 00:09:01,200
where you need,

174
00:09:01,600 --> 00:09:03,500
you need to condition on covariates,

175
00:09:04,400 --> 00:09:08,200
and you don't particularly have strong
feelings about the functional form for that,

176
00:09:08,200 --> 00:09:10,000
so maybe you should,

177
00:09:10,500 --> 00:09:10,900
I think,

178
00:09:10,900 --> 00:09:14,500
be open to flexible curve fitting.
And that leads you down a path

179
00:09:14,500 --> 00:09:18,000
where there are a lot of
nonlinearities in the model, and

180
00:09:18,200 --> 00:09:23,000
that's very dangerous with IV because
any sort of excluded nonlinearity

181
00:09:23,300 --> 00:09:27,600
potentially generates a spurious causal
effect. And Brigham and I showed that

182
00:09:27,900 --> 00:09:32,200
very powerfully, I think, in
the case of two instruments

183
00:09:32,700 --> 00:09:36,000
that come from a paper of mine
with Bill Evans, where if you,

184
00:09:36,500 --> 00:09:37,600
you know, replace

185
00:09:38,100 --> 00:09:42,600
a traditional two-stage least squares
estimator with some kind of random forest,

186
00:09:42,900 --> 00:09:48,000
you get very precisely
estimated nonsense estimates. And,

187
00:09:49,000 --> 00:09:51,100
you know, I think that's
a big caution.

188
00:09:51,100 --> 00:09:53,400
And I, you know, in view of those findings

189
00:09:53,700 --> 00:09:57,100
in an example I care about, where
the instruments are very simple

190
00:09:57,400 --> 00:09:59,100
and I believe that they're valid,

191
00:09:59,300 --> 00:10:01,600
you know, I would be skeptical of that. So

192
00:10:02,900 --> 00:10:06,800
nonlinearity and IV don't mix
very comfortably. - Now, as I said,

193
00:10:07,200 --> 00:10:11,400
you know, in some sense that's already
a more complicated... - Well, it's IV.

194
00:10:11,600 --> 00:10:11,900
- Yeah,

195
00:10:12,500 --> 00:10:16,700
but then we work on that and find out.

196
00:10:18,600 --> 00:10:22,300
As editor of Econometrica, a lot
of these papers cross my desk,

197
00:10:22,700 --> 00:10:29,500
but the motivation is not
clear and, in fact, really lacking.

198
00:10:29,800 --> 00:10:35,100
They're not the type of
semiparametric foundational papers.

199
00:10:35,400 --> 00:10:37,100
So that's a big problem,

200
00:10:38,000 --> 00:10:42,400
and a kind of related problem is that
we have this tradition in econometrics

201
00:10:42,600 --> 00:10:47,500
of being very focused on these formal
asymptotic results.

202
00:10:48,800 --> 00:10:52,600
We just have a lot of papers
where people propose

203
00:10:52,800 --> 00:10:55,700
a method and then establish
the asymptotic properties

204
00:10:56,300 --> 00:11:01,900
in a very kind of
standardized way. - Is that bad?

205
00:11:02,900 --> 00:11:07,200
- Well, I think it's sort of closed
the door for a lot of work

206
00:11:07,200 --> 00:11:11,600
that doesn't fit into that, whereas
in the machine learning literature,

207
00:11:11,900 --> 00:11:14,300
a lot of things are
more algorithmic. People

208
00:11:15,700 --> 00:11:18,500
had algorithms for coming
up with predictions

209
00:11:18,800 --> 00:11:23,600
that turn out to actually work much better
than, say, nonparametric kernel regression.

210
00:11:24,000 --> 00:11:26,800
For a long time, we were doing all
the nonparametrics in econometrics,

211
00:11:26,800 --> 00:11:31,100
and we did it using kernel regression, and
it was great for proving theorems.

212
00:11:31,300 --> 00:11:34,800
You could get confidence intervals and
consistency and asymptotic normality,

213
00:11:34,800 --> 00:11:37,000
and it was all great, but
it wasn't very useful.

214
00:11:37,300 --> 00:11:40,900
And the things they did in machine
learning are just way, way better,

215
00:11:41,000 --> 00:11:45,100
but they didn't have the... - That's
not my beef with machine learning, that the theory

216
00:11:45,300 --> 00:11:51,200
is weak. - No, but I'm saying
there, for the prediction part,

217
00:11:51,400 --> 00:11:54,500
it does much better. - Yeah, it's
a better curve-fitting tool.

218
00:11:54,900 --> 00:11:56,500
- But it did so

219
00:11:57,100 --> 00:12:02,700
in a way that would not have made
those papers initially easy to get into

220
00:12:03,000 --> 00:12:06,300
the econometrics journals because it
wasn't proving the type of things.

221
00:12:06,400 --> 00:12:11,200
You know, when Breiman was doing his
regression trees, that just didn't fit in,

222
00:12:11,800 --> 00:12:15,100
and I think he would have
had a very hard time.

223
00:12:15,200 --> 00:12:18,400
publishing these things
in econometrics journals.

224
00:12:18,900 --> 00:12:24,400
So I think we limited
ourselves too much,

225
00:12:24,700 --> 00:12:27,900
and that left us closed off

226
00:12:28,000 --> 00:12:30,800
from a lot of these machine learning
methods that are actually very useful.

227
00:12:30,900 --> 00:12:34,000
I mean, I think, in general,

228
00:12:34,900 --> 00:12:36,200
that literature, the computer

229
00:12:36,200 --> 00:12:39,300
scientists, have brought a huge
number of these algorithms...

230
00:12:39,600 --> 00:12:43,900
have proposed a huge number of these
algorithms that are actually very useful

231
00:12:44,000 --> 00:12:44,700
and that are

232
00:12:45,500 --> 00:12:49,100
affecting the way we're going
to be doing empirical work,

233
00:12:49,800 --> 00:12:55,100
but we've not fully internalized that,
because we're still very focused on getting

234
00:12:55,300 --> 00:12:57,500
point estimates and
getting standard errors

235
00:12:58,600 --> 00:13:01,200
and getting P values in a way that

236
00:13:01,700 --> 00:13:03,100
we need to move beyond

237
00:13:03,300 --> 00:13:04,300
to fully harness

238
00:13:04,300 --> 00:13:10,700
the force, the benefits
from the machine learning literature.

239
00:13:10,900 --> 00:13:15,100
- On the one hand, I guess I very
much take your point that sort of the

240
00:13:15,200 --> 00:13:18,600
traditional econometrics framework
of sort of propose a method,

241
00:13:18,600 --> 00:13:22,600
prove a limit theorem under some
asymptotic story, story, story,

242
00:13:22,600 --> 00:13:26,900
story, story, publish a
paper is constraining,

243
00:13:26,900 --> 00:13:29,700
and that, in some sense, by thinking more

244
00:13:29,700 --> 00:13:33,200
broadly about what a methods paper could
look like, we may, right, in some sense...

245
00:13:33,200 --> 00:13:35,900
Certainly the machine learning
literature has found a bunch of things,

246
00:13:35,900 --> 00:13:38,300
which seem to work quite
well for a number of problems

247
00:13:38,300 --> 00:13:42,400
and are now having substantial influence
in economics. I guess a question

248
00:13:42,400 --> 00:13:44,800
I'm interested in is, how do you think about

249
00:13:45,200 --> 00:13:47,600
the role of theory?

250
00:13:47,900 --> 00:13:51,200
Sort of, do you think there's
no value in the theory part of it?

251
00:13:51,600 --> 00:13:54,800
Because I guess a question
that I often have, sort of seeing

252
00:13:54,800 --> 00:13:56,900
the output from a machine learning tool --

253
00:13:56,900 --> 00:13:59,400
and actually a number of the
methods that you talked about

254
00:13:59,400 --> 00:14:01,800
actually do have inferential
results developed for them --

255
00:14:02,600 --> 00:14:06,400
something that I always wonder about is sort
of uncertainty quantification, and just,

256
00:14:06,500 --> 00:14:08,000
you know, I have my prior,

257
00:14:08,000 --> 00:14:11,000
I come into the world with my view.
I see the result of this thing.

258
00:14:11,000 --> 00:14:14,500
How should I update based on it? And
in some sense, if I'm in a world where

259
00:14:14,600 --> 00:14:15,100
things are

260
00:14:15,200 --> 00:14:18,200
normally distributed, I know
how to do it. Here, I don't.

261
00:14:18,200 --> 00:14:21,400
And so I'm interested to hear
how you think about that. - So

262
00:14:21,500 --> 00:14:24,300
I don't see this as sort
of closing it off, saying, well,

263
00:14:24,400 --> 00:14:26,500
these results
are not interesting.

264
00:14:26,600 --> 00:14:27,700
But there are going to be a lot of cases

265
00:14:28,000 --> 00:14:31,200
where it's going to be incredibly hard to
get those results and we may not be able

266
00:14:31,200 --> 00:14:33,200
to get there and

267
00:14:33,400 --> 00:14:37,700
we may need to do it in stages, where
first someone says, "Hey, I have this

268
00:14:39,600 --> 00:14:44,800
interesting algorithm for doing
something, and it works well by some

269
00:14:45,600 --> 00:14:49,900
criterion on
this particular data set,

270
00:14:51,000 --> 00:14:53,400
and I'm going to put it
out there." And then

271
00:14:53,700 --> 00:14:58,000
maybe someone will figure out a way that
you can later actually still do inference

272
00:14:58,000 --> 00:14:59,100
under some conditions.

273
00:14:59,100 --> 00:15:02,100
And maybe those are not
particularly realistic conditions;

274
00:15:02,100 --> 00:15:05,500
then we kind of go further.
But I think we've been

275
00:15:06,700 --> 00:15:11,400
constraining things too much, where we
said, you know, this is the type of thing

276
00:15:12,100 --> 00:15:14,400
that we need to do. And in some sense,

277
00:15:15,700 --> 00:15:18,200
that goes back to kind of
the way Josh and I

278
00:15:19,700 --> 00:15:21,900
thought about things for the
local average treatment effect.

279
00:15:21,900 --> 00:15:24,600
That wasn't quite the way people
were thinking about these problems

280
00:15:24,600 --> 00:15:29,200
before. There was a sense
that some of the people said, you know,

281
00:15:29,500 --> 00:15:31,900
the way you need to do these
things is you first say

282
00:15:32,200 --> 00:15:36,300
what you're interested in estimating,
and then you do the best job you can

283
00:15:36,500 --> 00:15:37,700
in estimating that.

284
00:15:38,100 --> 00:15:44,200
And what you guys are doing is...
you guys are doing it backwards.

285
00:15:44,300 --> 00:15:46,700
You're going to say,
here, I have an estimator,

286
00:15:47,300 --> 00:15:49,600
and now I'm going to figure out what

287
00:15:49,800 --> 00:15:51,400
it's estimating, and then, ex post,

288
00:15:51,400 --> 00:15:53,900
you're going to say why you
think that's interesting

289
00:15:53,900 --> 00:15:56,600
or maybe why it's not interesting,
and that's not okay.

290
00:15:56,600 --> 00:15:58,600
You're not allowed to do it that way.

291
00:15:59,000 --> 00:16:04,100
And I think we should just be a little
bit more flexible in thinking about

292
00:16:04,300 --> 00:16:06,300
how to look at

293
00:16:06,400 --> 00:16:11,300
problems, because I think we've missed
some things by not doing that.

294
00:16:13,000 --> 00:16:16,600
- So you've heard our views,
Isaiah. You've seen that we have

295
00:16:17,000 --> 00:16:20,400
some points of disagreement. Why
don't you referee this dispute for us?

296
00:16:22,500 --> 00:16:28,100
- Oh, so nice of you to ask me
a small question. So I guess, for one,

297
00:16:28,200 --> 00:16:33,200
I very much agree with something
that Guido said earlier, of...

298
00:16:36,000 --> 00:16:36,300
So...

299
00:16:36,500 --> 00:16:37,900
where it seems... where

300
00:16:37,900 --> 00:16:41,400
the case for machine learning seems
relatively clear is in settings where,

301
00:16:41,500 --> 00:16:45,100
you know, we're interested in some version
of a nonparametric prediction problem.

302
00:16:45,100 --> 00:16:49,700
So I'm interested in estimating a conditional
expectation or conditional probability

303
00:16:50,000 --> 00:16:52,100
and in the past, maybe I
would have run a kernel...

304
00:16:52,100 --> 00:16:55,800
I would have run a kernel regression, or
I would have run a series regression, or

305
00:16:56,100 --> 00:16:57,400
something along those lines.

306
00:16:57,700 --> 00:16:58,000
Sort of,

307
00:16:58,000 --> 00:16:58,700
it seems like

308
00:16:58,700 --> 00:17:02,000
at this point we have a fairly good
sense that in a fairly wide range

309
00:17:02,000 --> 00:17:06,300
of applications, machine learning
methods seem to do better for,

310
00:17:06,400 --> 00:17:06,800
you know,

311
00:17:06,800 --> 00:17:08,800
estimating conditional mean functions

312
00:17:08,800 --> 00:17:12,000
or conditional probabilities or
various other nonparametric objects

313
00:17:12,400 --> 00:17:16,600
than more traditional nonparametric
methods that were studied in econometrics

314
00:17:16,600 --> 00:17:19,100
and statistics, especially
in high dimensional settings.

315
00:17:19,500 --> 00:17:23,100
- So you're thinking of maybe the propensity
score or something like that?

316
00:17:23,100 --> 00:17:25,300
- Exactly. So, nuisance functions. Yeah.

317
00:17:25,300 --> 00:17:28,900
So things like propensity scores,
or, I mean, even objects

318
00:17:28,900 --> 00:17:30,100
of more direct

319
00:17:30,200 --> 00:17:32,400
interest, like conditional
average treatment effects, right?

320
00:17:32,400 --> 00:17:35,100
Which are the difference of two
conditional expectation functions,

321
00:17:35,100 --> 00:17:36,300
potentially. Things like that.

322
00:17:36,500 --> 00:17:40,400
Of course, even there,
right, the theory

323
00:17:40,500 --> 00:17:43,700
for inference, or the theory for
sort of how to interpret,

324
00:17:43,700 --> 00:17:45,900
how to make large-sample statements
about some of these things, is

325
00:17:46,000 --> 00:17:50,100
less well-developed, depending on the
machine learning estimator used.

326
00:17:50,100 --> 00:17:53,800
And so I think something
that is tricky is that we

327
00:17:53,900 --> 00:17:55,700
can have these methods which work a lot...

328
00:17:55,700 --> 00:17:58,000
which seem to work a lot
better for some purposes,

329
00:17:58,000 --> 00:18:01,600
but which we need to be a bit
careful in how we plug them in or how

330
00:18:01,600 --> 00:18:03,300
we interpret the resulting statements.

331
00:18:03,600 --> 00:18:06,200
But of course, that's a very,
very active area right now, where

332
00:18:06,400 --> 00:18:10,400
people are doing tons of great work.
And so I fully expect and hope

333
00:18:10,400 --> 00:18:12,800
to see much more going forward there.

334
00:18:13,000 --> 00:18:17,300
So one issue with machine learning
that always seems a danger, or

335
00:18:17,400 --> 00:18:20,300
that is sometimes a danger
and has sometimes led to

336
00:18:20,500 --> 00:18:22,600
applications that have
made less sense, is

337
00:18:22,800 --> 00:18:25,100
when folks start with a method that...

338
00:18:25,300 --> 00:18:28,500
start with a method that they're very
excited about rather than a question,

339
00:18:28,900 --> 00:18:32,100
right? So sort of starting with
a question, where here's the

340
00:18:32,500 --> 00:18:36,200
object I'm interested in, here is
the parameter of interest, let me,

341
00:18:36,700 --> 00:18:37,100
you know,

342
00:18:37,300 --> 00:18:39,500
think about how I would
identify that thing,

343
00:18:39,500 --> 00:18:41,800
how I would recover that
thing if I had a ton of data.

344
00:18:41,900 --> 00:18:44,000
Oh, here's a conditional
expectation function.

345
00:18:44,000 --> 00:18:47,100
Let me plug in a
machine learning estimator for that.

346
00:18:47,200 --> 00:18:48,800
That seems very, very sensible.

347
00:18:49,000 --> 00:18:53,100
Whereas, you know, if I
regress quantity on price

348
00:18:53,700 --> 00:18:56,000
and say that I used a
machine learning method,

349
00:18:56,300 --> 00:18:58,900
maybe I'm satisfied that that
solves the endogeneity problem

350
00:18:58,900 --> 00:19:01,200
we're usually worried
about there. Maybe I'm not.

351
00:19:01,500 --> 00:19:03,200
But again, that's something where

352
00:19:03,400 --> 00:19:06,300
the way to address it seems
relatively clear, right?

353
00:19:06,500 --> 00:19:09,000
It's to find your object of interest and

354
00:19:09,200 --> 00:19:11,600
think about... - Is that just
bringing in the economics?

355
00:19:11,700 --> 00:19:12,200
- Exactly.

356
00:19:12,200 --> 00:19:15,400
- And kind of think about
identification, but harness

357
00:19:15,400 --> 00:19:18,300
the power of the machine
learning methods for

358
00:19:18,500 --> 00:19:22,800
some of the components. - Precisely.
Exactly. So sort of, you know,

359
00:19:22,900 --> 00:19:25,600
the question of interest is the same as
the question of interest has always been,

360
00:19:25,600 --> 00:19:29,500
but we now have better methods for estimating
some pieces of this, right?

361
00:19:29,900 --> 00:19:31,600
The place that seems harder to, uh,

362
00:19:31,900 --> 00:19:33,400
harder to forecast is, right,

363
00:19:33,400 --> 00:19:36,300
obviously, there's a huge amount
going on in the machine

364
00:19:36,400 --> 00:19:37,400
learning literature,

365
00:19:37,500 --> 00:19:39,700
and the sort of limited ways

366
00:19:39,700 --> 00:19:42,900
of plugging it in that I've referenced
so far are a limited piece of that.

367
00:19:43,000 --> 00:19:46,100
And so I think there are all sorts of
other interesting questions about where,

368
00:19:46,300 --> 00:19:46,900
right, sort of,

369
00:19:47,100 --> 00:19:49,300
where does this interaction
go? What else can we learn?

370
00:19:49,300 --> 00:19:52,000
And that's something where,
you know, I think there's

371
00:19:52,200 --> 00:19:56,400
a ton going on which seems very promising
and I have no idea what the answer is.

372
00:19:57,000 --> 00:20:01,200
- No, no. No, I totally
agree with that, but, no,

373
00:20:01,800 --> 00:20:03,500
that makes it very exciting.

374
00:20:03,800 --> 00:20:06,100
And I think there's just a
lot of work to be done there.

375
00:20:06,600 --> 00:20:11,400
- All right. So Isaiah agrees
with me there. - I didn't say that per se.

376
00:20:14,500 --> 00:20:17,700
If you'd like to watch more
Nobel Conversations, click here,

377
00:20:18,000 --> 00:20:20,400
or if you'd like to learn
more about econometrics,

378
00:20:20,500 --> 00:20:23,100
check out Josh's Mastering
Econometrics series.

379
00:20:23,600 --> 00:20:26,500
If you'd like to learn more
about Guido, Josh, and Isaiah,

380
00:20:26,700 --> 00:20:28,200
check out the links in the description.