% Conan the Librarian
% 120 hour epic
% sax marathon
In which we befriend the REPL and create a library catalog.
- Using the REPL for prototyping.
- Building a set of functions that help each other.
In the secret dossier called Project Gutenberg data you will find a catalog of a subset of the Finnish books on Project Gutenberg. The data is in a Clojure vector and can be read into a program easily:
(def books (load-file "books.clj"))
The elements in `books` are maps that each represent a book, like this:
{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
 :author {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}}
A book is a Clojure map with two keys: `:title` and `:author`.
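At the REPL, both keys can be read with ordinary map lookups. The following is a small sketch; `sample-book` is a name made up for illustration, and its value is the example book shown above.

```clojure
;; sample-book is a hypothetical name; the value is the example book above.
(def sample-book
  {:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
   :author {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}})

;; A keyword can be called as a function to look itself up in a map.
(:title sample-book)
;=> "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"

;; get-in follows a path of keys into nested maps.
(get-in sample-book [:author :name])
;=> "Hoffmann, Franz"

;; The same lookup written as nested keyword calls.
(:name (:author sample-book))
;=> "Hoffmann, Franz"
```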
An author is also a Clojure map. However, while the data contains the birth and death years for most authors, there are books whose authors have neither:
{:title "Contigo Pan y Cebolla",
 :author {:name "Gorostiza, Manuel Eduardo de"}}
In other words, an author is a map with at least the key `:name`, and optionally also `:birth-year` and `:death-year`. All authors who have either `:birth-year` or `:death-year` have both. (We checked.)
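To see what an optional key looks like in practice, here is a short REPL-style sketch. The names `author-with-years` and `author-without-years` are invented for this example; the maps themselves are the two authors shown above.

```clojure
;; Hypothetical names; the maps are copied from the examples above.
(def author-with-years
  {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882})
(def author-without-years
  {:name "Gorostiza, Manuel Eduardo de"})

;; contains? checks whether a map has a given key.
(contains? author-with-years :birth-year)       ;=> true
(contains? author-without-years :birth-year)    ;=> false

;; Looking up a missing key returns nil instead of throwing.
(:death-year author-without-years)              ;=> nil

;; get accepts a default value for missing keys.
(get author-without-years :birth-year :unknown) ;=> :unknown
```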
This is a feature of the original data. The data has another peculiarity: some authors' names contain a birth year in the form "Name of Author, 1234-", where 1234 is the birth year. This is an artifact of the data extraction script we used to transform the data into a Clojure map.
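If you want to spot names with an embedded birth year, a regular expression is one option. The sketch below assumes the year is always four digits and sits at the very end of the name, as in the "Name of Author, 1234-" form described above; `embedded-birth-year` is a hypothetical helper, not part of the exercises.

```clojure
;; Hypothetical helper: returns the year embedded in a name such as
;; "Name of Author, 1234-" as a string, or nil when there is none.
;; Assumes a four-digit year followed by "-" at the end of the name.
(defn embedded-birth-year [author-name]
  (second (re-find #", (\d{4})-$" author-name)))

(embedded-birth-year "Name of Author, 1234-") ;=> "1234"
(embedded-birth-year "Hoffmann, Franz")       ;=> nil
```

Note that the result is a string, so it would still need parsing before it could be compared with the numeric `:birth-year` values.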
However, there is a lot we can do, even with this imperfect data. Imperfect data is also a good example of the kind of data you often encounter in real-world situations, such as scraping websites or, in general, communicating with third-party systems you have no control over.
Our goal is to write a catalog, a data structure that we can query to find all the titles by a certain author.
Now that we have prototyped in the REPL and hopefully have some idea of what we want to implement, we can start writing code in a file. We're going to cheat and not tell you what the final function is like. Instead, we will build up to it, bottom-up, with successive helper functions.
First, this is how the function we will eventually implement will work:
(author-catalog books {:name "Gorostiza, Manuel Eduardo de"})
;=> (TODO)
Let's start building the helper functions we need to implement `author-catalog`.
First, though, let's write a few practice functions to make sure we understand how the data is structured. An interesting feature in the data is the missing birth and death years for some authors. Let's see how we could detect these authors.
Implement the function `(author-has-years? author)`, which returns `true` or `false` depending on whether the given `author` has the `:birth-year` and `:death-year` entries.
(author-has-years? {:name "Hoffmann, Franz",
                    :birth-year 1814,
                    :death-year 1882})
;=> true
(author-has-years? {:name "Gorostiza, Manuel Eduardo de"})
;=> false
Now that we have a function to detect whether an author map contains the year information, we can, for example, write a function that returns all the books whose authors do have the year information. We do not need this for our catalog, but it is a good practice function.
Write the function `(books-with-author-years books)`, which returns those books from `books` whose `:author` has `:death-year` and `:birth-year`.
(books-with-author-years
  [{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
    :author {:name "Hoffmann, Franz",
             :birth-year 1814,
             :death-year 1882}}
   {:title "Contigo Pan y Cebolla",
    :author {:name "Gorostiza, Manuel Eduardo de"}}])
;=> ({:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
;     :author {:name "Hoffmann, Franz",
;              :birth-year 1814,
;              :death-year 1882}})
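If you get stuck, here is one possible shape for the two practice functions, offered as a sketch rather than the intended solution. It assumes that checking both keys with `contains?` is enough, and that `filter` with a small anonymous function is an acceptable way to select books; `sample-books` is a name made up for the usage example.

```clojure
;; One possible author-has-years?: true only when both keys are present.
(defn author-has-years? [author]
  (and (contains? author :birth-year)
       (contains? author :death-year)))

;; Keep the books whose :author map passes the check above.
(defn books-with-author-years [books]
  (filter (fn [book] (author-has-years? (:author book))) books))

;; sample-books is a hypothetical name; the data comes from the example above.
(def sample-books
  [{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
    :author {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}}
   {:title "Contigo Pan y Cebolla",
    :author {:name "Gorostiza, Manuel Eduardo de"}}])

(map :title (books-with-author-years sample-books))
;=> ("Nuoren Robertin matka Grönlantiin isäänsä hakemaan")
```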
Now we begin to write the actual helper functions we know we will need to implement `author-catalog`.
`(authors books)` returns a collection of authors. The returned collection should not contain the same author multiple times even if the author has multiple books in `books`.
You can use `distinct` to remove duplicates from a sequence.
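A quick REPL check shows how `distinct` behaves. The important detail for this exercise is that Clojure maps are compared by value, so two author maps with the same entries count as duplicates. The data below is taken from the examples in this chapter.

```clojure
;; distinct keeps the first occurrence of each element, in order.
(distinct [1 2 1 3 2])
;=> (1 2 3)

;; Maps are compared by value, so equal author maps are duplicates.
(distinct [{:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}
           {:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}
           {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}])
;=> ({:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}
;    {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882})
```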
(authors [{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
           :author {:name "Hoffmann, Franz",
                    :birth-year 1814,
                    :death-year 1882}}
          {:title "Ihmiskohtaloja"
           :author {:name "Järnefelt, Arvid",
                    :birth-year 1861,
                    :death-year 1932}}
          {:title "Elämän meri"
           :author {:name "Järnefelt, Arvid",
                    :birth-year 1861,
                    :death-year 1932}}])
;=> ({:name "Järnefelt, Arvid",
;     :birth-year 1861,
;     :death-year 1932}
;    {:name "Hoffmann, Franz",
;     :birth-year 1814,
;     :death-year 1882})
Another useful function will be `author-names`:
`(author-names books)` returns a collection of author names, without duplicates.
(author-names [{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
                :author {:name "Hoffmann, Franz",
                         :birth-year 1814,
                         :death-year 1882}}
               {:title "Ihmiskohtaloja"
                :author {:name "Järnefelt, Arvid",
                         :birth-year 1861,
                         :death-year 1932}}])
;=> ("Hoffmann, Franz", "Järnefelt, Arvid")
We now implement the `author-catalog` function. It is an example of one kind of view (or catalog) into the Project Gutenberg data. You are encouraged to write other kinds of views. For example, it would be interesting to catalog authors by their birth year.
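As a hint for such alternative views, `group-by` from `clojure.core` does most of the work. The sketch below groups a few author maps by birth year; `sample-authors` is a hypothetical name, and the maps are taken from the examples in this chapter.

```clojure
;; sample-authors is a hypothetical name; the maps come from this chapter.
(def sample-authors
  [{:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}
   {:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}
   {:name "Gorostiza, Manuel Eduardo de"}])

;; group-by builds a map from (f element) to a vector of matching elements.
(group-by :birth-year sample-authors)
;=> {1814 [{:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}]
;    1861 [{:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}]
;    nil  [{:name "Gorostiza, Manuel Eduardo de"}]}
```

Authors without a `:birth-year` end up under the key `nil`, so a real view would probably want to filter them out or handle them separately.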
Write the function `(author-catalog books)` that returns a new map of the catalog data with author names as keys and the respective book titles as values. For example:
(author-catalog [{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
                  :author {:name "Hoffmann, Franz",
                           :birth-year 1814,
                           :death-year 1882}}
                 {:title "Ihmiskohtaloja"
                  :author {:name "Järnefelt, Arvid",
                           :birth-year 1861,
                           :death-year 1932}}
                 {:title "Elämän meri"
                  :author {:name "Järnefelt, Arvid",
                           :birth-year 1861,
                           :death-year 1932}}])
;=> {"Hoffmann, Franz"
;     ("Nuoren Robertin matka Grönlantiin isäänsä hakemaan")
;    "Järnefelt, Arvid"
;     ("Elämän meri", "Ihmiskohtaloja")}
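For comparison with your own attempt, here is one possible implementation, sketched with `reduce` and `update-in`. It is not necessarily the intended solution and does not use the `authors` or `author-names` helpers. It relies on `conj` treating a missing entry (`nil`) as an empty list, which is why the titles come out newest-first, as in the example above. `sample-books` is a name made up for the usage example.

```clojure
;; One possible author-catalog: fold the books into a map from
;; author name to a list of titles.
(defn author-catalog [books]
  (reduce (fn [catalog book]
            ;; (conj nil title) yields (title); later titles are prepended.
            (update-in catalog
                       [(get-in book [:author :name])]
                       conj
                       (:title book)))
          {}
          books))

;; sample-books is a hypothetical name; the data comes from the example above.
(def sample-books
  [{:title "Nuoren Robertin matka Grönlantiin isäänsä hakemaan"
    :author {:name "Hoffmann, Franz", :birth-year 1814, :death-year 1882}}
   {:title "Ihmiskohtaloja"
    :author {:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}}
   {:title "Elämän meri"
    :author {:name "Järnefelt, Arvid", :birth-year 1861, :death-year 1932}}])

(get (author-catalog sample-books) "Järnefelt, Arvid")
;=> ("Elämän meri" "Ihmiskohtaloja")
```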