Daniel Reetz' DIY Book Scanner Tech Talk at Google HQ

  • 0:17 - 0:21
    You've seen the very latest thing that I created
  • 0:21 - 0:23
    with my computer
  • 0:23 - 0:25
    So, uh, January 2009 I was a grad student
  • 0:25 - 0:29
    and I was searching Amazon for copies of textbooks
  • 0:29 - 0:30
    that I couldn't afford
  • 0:30 - 0:33
    and Amazon recommended that I look at some cameras
  • 0:33 - 0:34
    The cameras just so happened to be
  • 0:34 - 0:37
    a quarter the price of my textbook.
  • 0:37 - 0:39
    And so I did what I always do when I'm in trouble:
  • 0:39 - 0:41
    I jumped in a dumpster
  • 0:41 - 0:42
    and started building things.
  • 0:42 - 0:46
    Over three days I built the very first DIY book scanner
  • 0:46 - 0:49
    and showed it to my friend Aaron.
  • 0:49 - 0:50
    Aaron's a software wizard
  • 0:50 - 0:54
    and understood that I needed postprocessing software, which I had written as a Photoshop script
  • 0:54 - 0:56
    So he turned around and wrote software in exchange for his own scanner.
  • 0:56 - 0:58
    As I built the second scanner
  • 0:58 - 0:59
    I documented the thing obsessively
  • 0:59 - 1:02
    in total detail
  • 1:02 - 1:04
    and put the thing on Instructables
  • 1:04 - 1:06
    as a 79-step tutorial on
  • 1:06 - 1:08
    how to build your own book scanner.
  • 1:08 - 1:11
    So what happened after that kind of blew my mind.
  • 1:11 - 1:16
    First of all, it got over 100,000 views in the first month.
  • 1:16 - 1:18
    It's now 204,000 views as of today.
  • 1:18 - 1:20
    Second of all, I won a laser cutter.
  • 1:20 - 1:23
    Third off, I won a qbuild for laser-cut art.
  • 1:23 - 1:26
    But what blew my mind wasn't actually winning things.
  • 1:26 - 1:28
    What blew my mind was the people that contacted me
  • 1:28 - 1:29
    after that.
  • 1:29 - 1:31
    Immediately after I put up the Instructable,
  • 1:31 - 1:33
    I got dozens and dozens of messages
  • 1:33 - 1:34
    from people all over the world
  • 1:34 - 1:35
    saying "I really want a book scanner like that,
  • 1:35 - 1:37
    I'm so glad you put these plans up."
  • 1:37 - 1:38
    And they started telling their stories
  • 1:38 - 1:39
    as to why they wanted them,
  • 1:39 - 1:40
    what they were going to do with them,
  • 1:40 - 1:42
    and they also started sending back improvements,
  • 1:42 - 1:44
    which in a way, isn't really all that difficult,
  • 1:44 - 1:45
    because mine was made of trash.
  • 1:45 - 1:51
    So, what I did was grab the first guy who showed up
  • 1:51 - 1:52
    his name's Rob. He's a mathematician.
  • 1:52 - 1:55
    We founded diybookscanner.org
  • 1:55 - 1:58
    diybookscanner.org today
  • 1:58 - 2:00
    Excuse me for flipping back and forth here
  • 2:00 - 2:04
    let me back up one more step
  • 2:04 - 2:06
    The first five people to sign up were
  • 2:06 - 2:07
    two mechanical engineers, two software engineers,
  • 2:07 - 2:09
    and an intellectual property lawyer.
  • 2:09 - 2:12
    Today, we have 600 members,
  • 2:12 - 2:14
    between 90 and 150 builds,
  • 2:14 - 2:16
    it's hard to say because they're not all complete
  • 2:16 - 2:20
    and it seems like we get a new build
  • 2:20 - 2:21
    almost every day.
  • 2:21 - 2:24
    Just showing you a selection of builds here,
  • 2:24 - 2:26
    giving you an idea of the enormous variety
  • 2:26 - 2:29
    of stuff people have constructed to do this job.
  • 2:29 - 2:31
    And this is only about half.
  • 2:31 - 2:37
    So what is the DIY book scanner?
  • 2:37 - 2:38
    What exactly are we doing?
  • 2:38 - 2:40
    DIY book scanning at its core
  • 2:40 - 2:42
    is nothing more than using cheap
  • 2:42 - 2:45
    compact cameras to digitize books and other materials.
  • 2:45 - 2:47
    So the essential insight that we have
  • 2:47 - 2:50
    is not only that books and cameras
  • 2:50 - 2:52
    cost about the same amount,
  • 2:52 - 2:54
    but that the quality of the point-and-shoots
  • 2:54 - 2:55
    has gotten good enough that we can do serious
  • 2:55 - 2:57
    digitization work with them
  • 2:57 - 2:59
    As you know, scanning books with cameras
  • 2:59 - 3:02
    has many, many advantages,
  • 3:02 - 3:04
    the first being that it's an order of magnitude faster
  • 3:04 - 3:06
    than scanning with a flatbed scanner.
  • 3:06 - 3:08
    Anybody who's tried to scan a book on a flatbed scanner
  • 3:08 - 3:09
    knows it's miserable.
  • 3:09 - 3:11
    You break the binding, it takes forever...
  • 3:11 - 3:13
    it's an exercise in frustration.
  • 3:13 - 3:18
    Where we go from there is kind of interesting.
  • 3:18 - 3:23
    When you start trying to digitize books
  • 3:23 - 3:24
    using compact cameras,
  • 3:24 - 3:25
    you run into some interesting problems.
  • 3:25 - 3:27
    There is not an equivalence
  • 3:27 - 3:28
    between compact cameras and SLRs.
  • 3:28 - 3:30
    So many of our people, including myself,
  • 3:30 - 3:31
    showed up saying, oh, well, to get better sharpness
  • 3:31 - 3:33
    we'll stop down the aperture.
  • 3:33 - 3:35
    Well, that's actually a problem because
  • 3:35 - 3:37
    believe it or not, these cameras don't have apertures.
  • 3:37 - 3:39
    They have a neutral density filter
  • 3:39 - 3:40
    that drops in the optical path
  • 3:40 - 3:41
    and degrades image quality.
  • 3:41 - 3:43
    So in the case of compact cameras
  • 3:43 - 3:44
    just as one example,
  • 3:44 - 3:46
    you leave the aperture as wide open as you possibly can.
  • 3:46 - 3:51
    And that's one of the big differences between DIY
  • 3:51 - 3:52
    book scanning and all of the commercial stuff
  • 3:52 - 3:54
    is that we treat compact cameras
  • 3:54 - 3:55
    for what they are
  • 3:55 - 3:58
    and optimize them for the best possible parameters.
  • 3:58 - 4:00
    Now I'll talk a little bit about the construction
  • 4:00 - 4:01
    of the DIY book scanner. This is in
  • 4:01 - 4:03
    my old -- the garage I used to live in.
  • 4:03 - 4:06
    There you're looking at lighting.
  • 4:06 - 4:07
    And for DIY book scanning
  • 4:07 - 4:08
    I almost always recommend
  • 4:08 - 4:09
    tungsten or halogen lighting.
  • 4:09 - 4:11
    The reason for recommending tungsten
  • 4:11 - 4:12
    or halogen lighting is technical:
  • 4:12 - 4:14
    they have a pretty flat spectral output.
  • 4:14 - 4:16
    The sensors on a camera
  • 4:16 - 4:18
    have essentially red, green, and blue sensitivities
  • 4:18 - 4:19
    that peak some way.
  • 4:19 - 4:20
    A lot of people that come to our forums
  • 4:20 - 4:22
    want to use compact fluorescents.
  • 4:22 - 4:23
    They say "oh they're eco-friendly, and
  • 4:23 - 4:25
    they're in a variety and you get them in
  • 4:25 - 4:26
    any color balance you want."
  • 4:26 - 4:27
    But the output is spiky.
  • 4:27 - 4:30
    And it might not match the spectral sensitivity of your sensor.
  • 4:30 - 4:31
    So generally we recommend tungsten or halogen
  • 4:31 - 4:34
    because they work best with all platforms.
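
The point about flat versus spiky light can be illustrated with a small sketch. Everything numeric here is an assumption made for illustration only: the sensor channel is modeled as a hypothetical Gaussian sensitivity curve, the "flat" light is constant across the visible range, and the "spiky" light is three narrow peaks at illustrative wavelengths. None of it is measured data from any real lamp or camera.

```python
import math

WAVELENGTHS = range(400, 701, 5)  # visible range in nm, sampled every 5 nm


def gaussian(x, center, width):
    """Bell curve used for both sensor sensitivity and emission peaks."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def flat_light(nm):
    """Idealized flat light: equal output at every wavelength."""
    return 1.0


def spiky_light(nm):
    """Idealized spiky light: three narrow peaks (positions are illustrative)."""
    return sum(gaussian(nm, peak, 5.0) for peak in (435, 545, 610))


def channel_response(light, center, width=30.0):
    """Total signal one color channel records: light times sensitivity, summed."""
    return sum(light(nm) * gaussian(nm, center, width) for nm in WAVELENGTHS)


def relative_change(light, center, shift):
    """How much the channel signal changes if the sensor's peak sits
    `shift` nm away, as it might on a different camera model."""
    base = channel_response(light, center)
    moved = channel_response(light, center + shift)
    return abs(moved - base) / base


if __name__ == "__main__":
    for name, light in (("flat", flat_light), ("spiky", spiky_light)):
        change = relative_change(light, center=540, shift=15)
        print(f"{name} light: green channel changes by {change:.1%}")
```

Under these assumptions, moving the sensor's green peak by 15 nm barely changes the signal under flat light but changes it noticeably under spiky light, which is one way to read the claim that the smoother sources behave more consistently across cameras.
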
  • 4:34 - 4:36
    However in cases such as this tungsten
  • 4:36 - 4:37
    isn't practical, and so we've developed
  • 4:37 - 4:39
    high-output lighting systems
  • 4:39 - 4:41
    using pre-LEDs (?)
  • 4:41 - 4:44
    and that's the folding scanner
  • 4:44 - 4:45
    that I produced with my laser cutter
  • 4:45 - 4:46
    right after I got it.
  • 4:46 - 4:48
    and this is (?).
  • 4:48 - 4:52
    So the next part of the book scanner
  • 4:52 - 4:54
    is camera support.
  • 4:54 - 4:55
    On this camera we just had
  • 4:55 - 4:56
    two columns here
  • 4:56 - 4:58
    and on this scanner which we call "The New Standard"
  • 4:58 - 5:00
    it's sort of a standard
  • 5:00 - 5:01
    build for beginners in our forum.
  • 5:01 - 5:02
    You can see there are two 2×4s.
  • 5:02 - 5:03
    And the reason for going for camera support
  • 5:03 - 5:06
    is simply that if your cameras move while you're digitizing
  • 5:06 - 5:07
    it makes post-processing really hard.
  • 5:07 - 5:10
    If your book moves while you're scanning
  • 5:10 - 5:13
    it makes things really hard.
  • 5:13 - 5:17
    The next part of the scanner is called the platen.
  • 5:17 - 5:19
    And the platen is a v-shaped piece of glass
  • 5:19 - 5:23
    or plexiglass, or in our case now
  • 5:23 - 5:24
    we're starting to use Gorilla Glass,
  • 5:24 - 5:26
    which mechanically flattens the page of the book.
  • 5:26 - 5:29
    We don't mess around with computational dewarping.
  • 5:29 - 5:30
    We don't mess around with structured light
  • 5:30 - 5:31
    and other methods of dewarping the page.
  • 5:31 - 5:33
    Unless we have to.
  • 5:33 - 5:34
    It's much better to get good input
  • 5:34 - 5:36
    and good output.
  • 5:36 - 5:38
    So, what goes under the platen is the cradle.
  • 5:38 - 5:41
    And the cradle is what holds the book
  • 5:41 - 5:43
    and the cradle also accommodates
  • 5:43 - 5:45
    the thickness of the spine.
  • 5:45 - 5:46
    So you'll notice that the gap
  • 5:46 - 5:48
    on the cradle that
  • 5:48 - 5:49
    John (?) is holding is
  • 5:49 - 5:50
    adjustable.
  • 5:50 - 5:53
    You have to put all that stuff somewhere
  • 5:53 - 5:54
    so we have a thing called the base.
  • 5:54 - 5:56
    Where we put everything.
  • 5:56 - 5:58
    And finally we have the electronics.
  • 5:58 - 5:59
    Now, one of the fundamental things
  • 5:59 - 6:02
    that enabled the DIY book scanning technology
  • 6:02 - 6:03
    in the first place
  • 6:03 - 6:06
    was that there's a hack for these cameras
  • 6:06 - 6:08
    that allows them to be triggered simultaneously.
  • 6:08 - 6:10
    So there's extra firmware
  • 6:10 - 6:11
    authored by David Sykes
  • 6:11 - 6:13
    called StereoData Maker
  • 6:13 - 6:14
    and when they boot up,
  • 6:14 - 6:17
    they're waiting for a signal to fire at the same time
  • 6:17 - 6:18
    I can demonstrate later.
  • 6:18 - 6:28
    So, all of that gets you photographs of the pages.
  • 6:28 - 6:34
    And you need to post-process them somehow.
  • 6:34 - 6:35
    And this first software that Aaron wrote
  • 6:35 - 6:36
    was great for our first software
  • 6:36 - 6:37
    -- Photoshop scripts worked great --
  • 6:37 - 6:38
    but we really needed something more serious.
  • 6:38 - 6:39
    And we hooked up with the author
  • 6:39 - 6:40
    of a package called ScanTailor
  • 6:40 - 6:43
    which is the closest thing we have to pure magic.
  • 6:43 - 6:45
    It can eliminate lighting issues,
  • 6:45 - 6:47
    binarize extremely difficult pages.
  • 6:47 - 6:52
    It's amazing how it cleanly handles really poor quality input.
  • 6:52 - 6:54
    And many of our scanners do produce poor quality.
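
For readers unfamiliar with the term, binarizing means turning a grey photograph of a page into pure black ink on pure white paper. The sketch below uses Otsu's method, a standard textbook thresholding technique chosen here purely for illustration. The talk does not say which algorithm ScanTailor uses, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that best separates dark ink from light
    paper by maximizing the between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        if weight_dark == 0:
            continue  # no pixels this dark yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels):
    """Map every pixel at or below the threshold to black (0), the rest to white (255)."""
    threshold = otsu_threshold(pixels)
    return [0 if value <= threshold else 255 for value in pixels]


if __name__ == "__main__":
    # A tiny fake "page": three dark ink pixels and four light paper pixels.
    page = [30, 40, 35, 200, 210, 220, 205]
    print(binarize(page))
```

A single global threshold like this is the simplest possible approach. It fails on unevenly lit pages, which is part of why a dedicated tool is valuable for the poor-quality input described here.
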
  • 6:54 - 6:58
    What's so amazing about it also
  • 6:58 - 6:59
    is that he spent a lot of time
  • 6:59 - 7:01
    making it really simple to use,
  • 7:01 - 7:02
    free to download and modify,
  • 7:02 - 7:05
    and it also works on all operating systems.
  • 7:05 - 7:06
    We have a forum specifically to support it
  • 7:06 - 7:08
    and we're continuously
  • 7:08 - 7:09
    developing it and pushing it further,
  • 7:09 - 7:10
    including dewarping.
  • 7:10 - 7:16
    So, dewarping progress which was posted two days ago
  • 7:16 - 7:17
    -- this is an example
  • 7:17 - 7:20
    where we moved from the page on the left
  • 7:20 - 7:25
    to the page in the center, using some spline detection.
  • 7:25 - 7:27
    So, that's a lot of talk about technology.
  • 7:27 - 7:32
    What I'd really like to talk about are people.
  • 7:32 - 7:35
    This is Rob, the co-founder of my forum.
  • 7:35 - 7:36
    And the reason I like to talk about people is
  • 7:36 - 7:37
    because what people do with this
  • 7:37 - 7:40
    is what's important, not some gadgets with cameras
  • 7:40 - 7:41
    or whatever.
  • 7:41 - 7:43
    Or garbage for that matter.
  • 7:43 - 7:44
    So, Rob is sort of the canonical user
  • 7:44 - 7:46
    of DIY book scanners.
  • 7:46 - 7:48
    His home is absolutely full of
  • 7:48 - 7:49
    Science Fiction books, technical books,
  • 7:49 - 7:51
    just bending every MDF bookshelf
  • 7:51 - 7:52
    in the house.
  • 7:52 - 7:55
    And Rob, like many of our forum users
  • 7:55 - 7:56
    doesn't find it acceptable
  • 7:56 - 7:58
    to repurchase books continuously
  • 7:58 - 8:01
    like we did with vinyl, 8-track, cassette, mp3,
  • 8:01 - 8:03
    and now whatever comes out of the iTunes store.
  • 8:03 - 8:05
    So Rob built a book scanner
  • 8:05 - 8:07
    which he did when he's not building a
  • 8:07 - 8:11
    rod-logic-based (?) mechanical computer
  • 8:11 - 8:13
    and digitized all of his books.
  • 8:13 - 8:16
    So, go from there
  • 8:16 - 8:19
    to Ben Verady (?) who's a graduate student
  • 8:19 - 8:20
    at Tulane University
  • 8:20 - 8:21
    who's working on a project called
  • 8:21 - 8:22
    The Durationator.
  • 8:22 - 8:23
    The Durationator, which is currently in process
  • 8:23 - 8:29
    right now, will be a database of
  • 8:29 - 8:32
    all copyright (?) records.
  • 8:32 - 8:34
    Such that you can actually search for legal advice
  • 8:34 - 8:35
    on whether or not you can scan something.
  • 8:35 - 8:39
    And this is one way that DIY book scanning
  • 8:39 - 8:40
    technology.
  • 8:40 - 8:41
    A recent member of our forum
  • 8:41 - 8:43
    is Patrick Hall, he's a linguist.
  • 8:43 - 8:47
    And he works specifically with file slips,
  • 8:47 - 8:48
    which are a linguistics tool.
  • 8:48 - 8:50
    They're slips of paper
  • 8:50 - 8:50
    on which are written grammatical fragments,
  • 8:50 - 8:52
    words, definitions, etc.
  • 8:52 - 8:56
    And these file slips are a precursor to a linguistic dictionary.
  • 8:56 - 8:58
    And he's particularly interested in Native American
  • 8:58 - 9:00
    languages, California Native American languages,
  • 9:00 - 9:02
    so there might be, in his case,
  • 9:02 - 9:05
    17,000 slips that never actually became a dictionary.
  • 9:05 - 9:08
    And he'd like to (?) them, and use them.
  • 9:08 - 9:09
    And he did exactly that, using exactly
  • 9:09 - 9:15
    the same firmware (?) plus a cardboard box.
  • 9:15 - 9:16
    We have quite a number of disabled people
  • 9:16 - 9:19
    on the forums, and this is one of my favorite stories
  • 9:19 - 9:21
    This man's name is Tristan
  • 9:21 - 9:22
    he's a mechanical engineering major
  • 9:22 - 9:24
    who built his own scanner
  • 9:24 - 9:25
    because he has difficulty reading with his eyes
  • 9:25 - 9:28
    He's a perfectly normal human being,
  • 9:28 - 9:29
    hears fine with his ears,
  • 9:29 - 9:30
    super intelligent,
  • 9:30 - 9:31
    just can't read with his eyes.
  • 9:31 - 9:33
    Built a scanner, his computer now reads
  • 9:33 - 9:34
    his books to him.
  • 9:34 - 9:36
    Probably the greatest story to come out of
  • 9:36 - 9:37
    our forum is also
  • 9:37 - 9:38
    one of, let's say the second person,
  • 9:41 - 9:42
    Surya Darnu (?)
  • 9:42 - 9:43
    Surya Darnu is a village official in Indonesia
  • 9:43 - 9:45
    who wrote me and said
  • 9:45 - 9:46
    "This is the first time I'm writing anybody
  • 9:46 - 9:47
    on the internet but
  • 9:47 - 9:48
    we have this problem.
  • 9:48 - 9:50
    We have village holy books
  • 9:50 - 9:52
    which are wet, being eaten by bugs,
  • 9:52 - 9:53
    and being destroyed by fires and floods.
  • 9:53 - 9:57
    These books are everywhere, nobody can scan
  • 9:57 - 9:58
    them fast enough, and scanning them
  • 9:58 - 10:00
    is destroying them.
  • 10:00 - 10:01
    I want to make a book scanner
  • 10:01 - 10:02
    like yours, but we can't afford the cameras."
  • 10:02 - 10:04
    So we took up a donation, we sent him cameras,
  • 10:04 - 10:07
    He built a scanner, and
  • 10:07 - 10:08
    started scanning.
  • 10:08 - 10:09
    And as far as I know, the work
  • 10:09 - 10:10
    is ongoing right now.
  • 10:10 - 10:12
    In many First Nations communities
  • 10:12 - 10:13
    in the far north of Canada,
  • 10:13 - 10:15
    it's extremely difficult to get resources
  • 10:15 - 10:17
    especially books.
  • 10:17 - 10:19
    The schools there have chronic shortages
  • 10:19 - 10:20
    of books.
  • 10:20 - 10:21
    And when they can bring in either books
  • 10:21 - 10:22
    or milk on a plane, they'll probably bring in milk.
  • 10:22 - 10:26
    So the University of Toronto started a pilot project
  • 10:26 - 10:28
    using DIY book scanner technology
  • 10:28 - 10:31
    to enable these people to copy
  • 10:31 - 10:32
    their own books.
  • 10:32 - 10:33
    It's called the On-Demand Book Service
  • 10:33 - 10:34
    and it's a DIY book scanner
  • 10:34 - 10:36
    and a bunch of other printing and binding equipment.
  • 10:36 - 10:37
    Including DIY binding equipment.
  • 10:37 - 10:41
    Now, there's a lot of people doing things
  • 10:41 - 10:43
    for themselves, but there's also a lot of institutional use.
  • 10:43 - 10:46
    Misty De Meo is the
  • 10:46 - 10:49
    Digitization Assistant at the County of Brant Public Library in Brant County, Ontario.
  • 10:49 - 10:55
    And she exemplifies the use of DIY technology in libraries.
  • 10:55 - 10:58
    She digitizes materials for her community:
  • 10:58 - 11:01
    maps, etc, books of all kinds.
  • 11:01 - 11:04
    For example militia handbooks.
  • 11:04 - 11:08
    Misty not only works to use DIY technology in an archival fashion,
  • 11:08 - 11:09
    not just an ad-hoc fashion,
  • 11:09 - 11:13
    but she also is I think at the forefront of digitization
  • 11:13 - 11:14
    digitizing things now.
  • 11:14 - 11:15
    Not just historical things,
  • 11:15 - 11:17
    not just old dictionaries but things we can use now (?).
  • 11:17 - 11:19
    She also has a great digitization blog.
  • 11:19 - 11:24
    So, what this comes down to is that
  • 11:24 - 11:26
    DIY technology isn't sitting still.
  • 11:26 - 11:28
    We don't have any one scanner
  • 11:28 - 11:29
    We don't have one project.
  • 11:29 - 11:30
    We don't have any one software.
  • 11:30 - 11:32
    We're just continuously producing new designs
  • 11:32 - 11:33
    new software, new innovation.
  • 11:33 - 11:37
    And in this case I worked with a man named Dário de Moura
  • 11:37 - 11:39
    in Brazil, who helped me package
  • 11:39 - 11:42
    and license under the GPL the artwork for my
  • 11:42 - 11:45
    first laser-cut scanner that goes into carry-on luggage
  • 11:45 - 11:49
    and we've also produced now
  • 11:49 - 11:50
    ruggedized scanners that can be used
  • 11:50 - 11:52
    all over the world
  • 11:52 - 11:57
    and what has come up over and over and over again
  • 11:57 - 11:58
    and this is the really important point
  • 11:58 - 12:02
    is that there are all kinds of situations
  • 12:02 - 12:04
    where by social contract or circumstance
  • 12:04 - 12:07
    economic circumstances, you name it
  • 12:07 - 12:08
    you can't just bring in
  • 12:08 - 12:12
    they can't buy a perfect scanner for $10,000
  • 12:12 - 12:14
    these are book scanners that you can't buy
  • 12:14 - 12:17
    for people and situations where you can't afford them.
  • 12:17 - 12:22
    And that is where DIY Book Scanning technology fits
  • 12:22 - 12:25
    and that is absolutely what it does best.
  • 12:25 - 12:26
    And our goal as a community
  • 12:26 - 12:29
    in sharing these things absolutely wide open, no secrets,
  • 12:29 - 12:33
    is that we help people help themselves.
  • 12:33 - 12:35
    They allow people in situ to digitize things
  • 12:35 - 12:38
    share things with their community that they can't share elsewhere.
  • 12:38 - 12:44
    I know it may seem a little foreign from here in the Googleplex,
  • 12:44 - 12:46
    but there are many people who believe
  • 12:46 - 12:47
    that Google's settlement (?) of the class actions in Google Book Search
  • 12:47 - 12:52
    compromises everyone else's fair use rights,
  • 12:52 - 12:53
    and I'm one of them.
  • 12:53 - 12:55
    And there are many people who feel that their data in general
  • 12:55 - 12:56
    can't be trusted in the cloud.
  • 12:56 - 12:59
    And just for some examples of that
  • 12:59 - 13:00
    there are personal journals,
  • 13:00 - 13:02
    I don't know, porn collections,
  • 13:02 - 13:04
    medical records, and other things which can't be shared widely.
  • 13:04 - 13:10
    Particularly religious books.
  • 13:10 - 13:12
    And in these situations the most important thing we can do
  • 13:12 - 13:16
    is put technologies together that people can use to help themselves.
  • 13:16 - 13:25
    So going forward in a world which is irrevocably changed
  • 13:25 - 13:26
    by the diffusion and wide distribution
  • 13:26 - 13:29
    of e-reading technology by amazing, incredible services like Google Books,
  • 13:29 - 13:32
    probably the best thing we can do
  • 13:32 - 13:34
    is empower people to go on with reading
  • 13:34 - 13:38
    on their own terms, in their own place.
  • 13:38 - 13:40
    One of the ways we can do this is to iterate
  • 13:40 - 13:42
    on designs like this and share them freely
  • 13:42 - 13:44
    so that anybody with access to the machinery
  • 13:44 - 13:46
    will produce them.
  • 13:46 - 13:49
    This machinery needs help.
  • 13:49 - 13:50
    It's too complicated, it's too expensive,
  • 13:50 - 13:52
    and it's not widely applicable just yet.
  • 13:52 - 13:55
    But we're just a few steps from each thing.
  • 13:55 - 13:56
    But the reason it is that way is, we were in a hurry.
  • 13:56 - 14:03
    So, I'm personally pursuing this next version of the Standard,
  • 14:03 - 14:04
    I'll be sharing it with the world.
  • 14:04 - 14:09
    I hope to see all of you at the DIY Book Scanner forum, where this works out.
  • 14:09 -
    Thank you.