
I used to work in libraries for a while, and there's a pretty wide range of things this is useful for.

Lots of interesting data analysis, similar to what people are doing with Google n-gram data for culturomics[0]. For example, since you have publication year and subject heading, you can look at the shift in popularity of certain subjects over time. I remember that for fun I once plotted the life spans of various people by their area of research (art, math, sciences, etc.). It was interesting because there did seem to be some trends.
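
A rough sketch of the subject-over-time idea, assuming each record has already been parsed into a dict with a publication year and a single subject heading. The keys `year` and `subject` and the sample records are made up for illustration; the real data would need its own parsing step first.

```python
from collections import Counter, defaultdict

def subject_share_by_decade(records):
    """Return {decade: {subject: share of that decade's records}}."""
    counts = defaultdict(Counter)
    for rec in records:
        decade = (rec["year"] // 10) * 10  # e.g. 1968 -> 1960
        counts[decade][rec["subject"]] += 1

    shares = {}
    for decade, counter in counts.items():
        total = sum(counter.values())
        shares[decade] = {subj: n / total for subj, n in counter.items()}
    return shares

# Hypothetical sample records, not taken from the actual data set.
sample = [
    {"year": 1961, "subject": "Mathematics"},
    {"year": 1965, "subject": "Art"},
    {"year": 1968, "subject": "Mathematics"},
    {"year": 1972, "subject": "Art"},
]
shares = subject_share_by_decade(sample)
```

Using shares rather than raw counts keeps decades with very different publishing volumes comparable.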

If you're doing any text classification research, you now have a great way to label data when all you have is title and author. Or if you have texts with poor metadata, you might be able to use this set to clean that up.
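
Something like this could be a starting point for the labeling idea: normalize title and author, then look up the subject heading in the catalog. The field names (`title`, `author`, `subject`) and the sample records are hypothetical, and exact matching on normalized strings is only a simple stand-in for real record matching, which would probably need fuzzy comparison.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def build_index(catalog):
    """Map (normalized title, normalized author) -> subject heading."""
    return {
        (normalize(rec["title"]), normalize(rec["author"])): rec["subject"]
        for rec in catalog
    }

def label(items, index):
    """Return a subject for each item, or None when there is no match."""
    return [
        index.get((normalize(item["title"]), normalize(item["author"])))
        for item in items
    ]

# Hypothetical catalog record and unlabeled items, for illustration only.
catalog = [
    {"title": "Moby-Dick; or, The Whale",
     "author": "Melville, Herman",
     "subject": "Whaling -- Fiction"},
]
items = [
    {"title": "MOBY-DICK; OR, THE WHALE.", "author": "melville, herman"},
    {"title": "An Unknown Book", "author": "Nobody, A."},
]
labels = label(items, build_index(catalog))
```

Items that come back as None are the ones you'd still have to label some other way.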

For libraries themselves, I would love to see some machine learning approaches to cleaning up messy records, or just replacing bad records with good ones directly.

The big thing is that this is a very large set of curated bibliographic metadata from a reputable source. If you have any large project related to books (directly or indirectly), this could be a huge asset.

[0] http://en.wikipedia.org/wiki/Culturomics


