The Data Studio


Quite Interesting

This is a list of books and other resources that we have found interesting and useful.

Title | Author | Medium | What We Think About This | Publisher
Peopleware: Productive Projects and Teams | Tom DeMarco and Timothy Lister | Paper, Kindle | If you don't read anything else, at least read this. It is the most important book ever written about software development, and it is fun to read. | Addison-Wesley
Extreme Programming Explained: Embrace Change | Kent Beck | Paper, Kindle | This is a nice slim book that describes how to do agile development properly. Don't bother reading about Scrum (which, as it is used in most corporates, is just Waterfall with stand-ups); read Kent Beck's book instead. | Addison-Wesley Longman
The Mythical Man-Month: Essays on Software Engineering | Frederick P. Brooks Jr. | | This is another one that you must read. Then your challenge is to get your manager to read it. If you succeed, your manager should be more understanding about the challenges you face. The book was originally written in 1975 by a man who already had a huge amount of experience, and he bravely reviewed his work 20 years later. The anniversary edition is the one to get. Fred Brooks is honest and thorough in his criticism of his own work, and that was made a little easier because he did such a good job in the first place. Your manager probably has no appreciation of Brooks' ideas. If they do, you've got a good one. | Addison-Wesley
The Pragmatic Programmer: From Journeyman to Master | Andrew Hunt and David Thomas | Paper, eBook | Every developer should read this. It contains loads of great advice from people who really know how to build good systems. It's worth more than all the methodologies and "best practices" you will be lectured about in most organisations. | Addison-Wesley
Secrets Behind Ruby on Rails | David Heinemeier Hansson | Podcast | Great insights from a brilliant programmer with an exceptional understanding of how to build great software. The whole talk is worth listening to, but at least catch the last minute and a half. This is an old podcast from 2005; see the next entry to learn what David Heinemeier Hansson has done in the years since then. |
David Heinemeier Hansson | David Heinemeier Hansson | Website | I've singled out an old talk in the previous entry and directed you to its last minute and a half. I'm putting the reference to David Heinemeier Hansson's whole website here because it is full of enthusiasm, energy and passion for building great software. It is worth a long browse. | David Heinemeier Hansson
An Introduction to Database Systems | C. J. Date | Paper only | Brilliant, and hugely important in the development and establishment of relational databases. Now in its 8th edition. It is hard work, though. I'll be cheeky and suggest that you read my little book first, then read Date's big, solid book. | Addison-Wesley Longman
“One Size Fits All”: An Idea Whose Time Has Come and Gone | Michael Stonebraker and Uğur Çetintemel | PDF | An important and very readable 10-page academic paper. It describes the shortcomings of general-purpose relational databases (Oracle, Microsoft SQL Server, etc.) and how we can build highly effective database systems that are specialised for particular tasks. This is a good approach, and we are now seeing the results in products like Vertica, VoltDB, StreamBase and SciDB, which stand on the shoulders of the best relational systems rather than just kicking them out of the way. |
Rethinking Main Memory OLTP Recovery | Nirmesh Malviya, Ariel Weisberg, Samuel Madden and Michael Stonebraker | PDF | An important and very readable 12-page academic paper describing the work behind VoltDB. It follows on from the “One Size Fits All” paper. The idea is that we can build specialised relational database management systems that take advantage of current hardware to deliver outstanding functionality and performance for specific types of application. In the case of VoltDB, the result is a ludicrously fast, in-memory (really!) transaction processing engine. |
MapReduce: A major step backwards | David J. DeWitt | Website | In January 2008, David J. DeWitt and Michael Stonebraker bravely stuck their heads above the parapet to argue that the MapReduce upstart was a wannabe emperor, sorely lacking in clothes. The response was furious. Nine years later, we can see that the upstart has indeed made a lot of money but, as I have experienced recently, it still does not work well for most applications. We owe DeWitt and Stonebraker a debt of gratitude for their courage in taking a critical look at this technology bubble early on. |
PASS Talks | David J. DeWitt | Website | We also owe David DeWitt a huge debt of gratitude for the lifetime of work he has contributed as one of the small band of founding fathers behind technology that we all use every day, mostly without realising it. All major businesses rely not on Big Data tools but on relational databases. Your bank, your insurance company, your mobile phone provider, your police force, your hospitals, your online retailers, even Google and Facebook, along with most other websites: just about every organisation with more than a handful of data uses and relies on relational databases. The PASS talks are all interesting. I found the 2011 PASS talk particularly interesting because of my recent experience on a Big Data project: slides 55-57 support my experience comparing Hive with SQL Server Parallel Data Warehouse (PDW). | University of Wisconsin-Madison, Computer Sciences Department
Industry Standard Data Models | Margy Ross | Website | Industry standard data models: an expensive bad idea. Don't use them. Written in 2010 and just as true today. | DecisionWorks (formerly doing business as The Kimball Group)
The Checklist Manifesto: How to Get Things Right | Atul Gawande | | | Profile Books
I Think You’ll Find It’s a Bit More Complicated Than That | Ben Goldacre | Paper, Kindle, Audio | Ben Goldacre is a champion of science and the value of evidence-based information. We need him now more than ever. You should visit his website and read his other books too. | Fourth Estate
Effective XML | Elliotte Rusty Harold | Paper | XML is over-used and over-rated, but for those times when you do have to use it, this book gives sound advice. W3C describes XML as: “Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.” |
Pragmatic Guide to Subversion | Mike Mason | Paper, eBook | A well-written, comprehensive and easy-to-follow reference for users of Subversion. Subversion is more appropriate than Git for most projects carried out by teams in organisations or by individual developers. Git is designed for massively distributed development of open-source projects, and its complexity and flexibility are inappropriate for most projects. So, if you have chosen the right source-code control system for your work, this book will help you use it well. | The Pragmatic Bookshelf
Agile Web Development with Rails 5 | Sam Ruby, Dave Thomas and David Heinemeier Hansson | Paper, eBook | An excellent tutorial on building web applications with Ruby on Rails. Along the way you pick up the philosophy of Rails, which really is an excellent way of working and a refreshing alternative to the feature-obese systems that many of us have to suffer. | The Pragmatic Bookshelf
More or Less | Tim Harford and others | Radio, podcast | Thoughtful and informed analysis of numbers in the news. Another champion of evidence-based information. | The BBC
Everybody Lies | Seth Stephens-Davidowitz | Paper, Kindle, Audio | The sensational title is probably very smart; I guess it increases the number of people who buy the book. But this is not a sensationalist book. Sure, it has many sections that will make your eyes pop, but the thrust of the book is good, solid research. Seth Stephens-Davidowitz is the first person I have come across in this area who actually deserves the title "Data Scientist". He describes rigorous research and statistical techniques, and he has a curiosity that drives him to question every conclusion. Is it valid? Is there some other reason we could be getting this result? Have we made any assumptions that could give us the wrong answer? The answers he has found so far are fascinating and mostly very important. Definitely worth reading. | Bloomsbury
The Attention Merchants | Tim Wu | Paper, Kindle | This is what Big Data is really all about: a fascinating history describing how we got here. | Atlantic Books
Big Data: The Broken Promise of Anonymisation | Martyn Thomas | Video, audio, PDF | One of a series of very fine public lectures at Gresham College, London. | Gresham College, London
Are You The Customer Or The Product? | Martyn Thomas | Video, audio, PDF | One of a series of very fine public lectures at Gresham College, London. | Gresham College, London
Spurious Correlations | Tyler Vigen | Website | This is very important. It shows how easily unrelated data series can appear strongly correlated, which has implications for data science and machine learning, to say nothing of the crazy stories that appear in our news media from time to time. | Website
Evolutionary Database Design | Martin Fowler | | |
Planning Extreme Programming | Kent Beck and Martin Fowler | | | Addison-Wesley
Structured Programming | O.-J. Dahl, E. W. Dijkstra and C. A. R. Hoare | | | Academic Press
Refactoring: Improving the Design of Existing Code | Martin Fowler, Kent Beck, John Brant, William Opdyke and Don Roberts | | | Addison-Wesley
Patterns of Enterprise Application Architecture | Martin Fowler | | | Addison-Wesley Longman
The Oberon System | Martin Reiser | | | Addison-Wesley