 
                
             
 For your data analysis pleasure, I give you a giant list of super cool publicly available data. If you’re looking at the data sets and wondering “now what?”, you can find this list and tutorials on how to use H2O for analysis at the H2O docs page (http://docs.0xdata.com).
 You can also get detailed hands-on experience analyzing any of this data, random numbers you might have lying around, stuff you made up, or whatever you want by coming to any of our upcoming meetups and hanging out with the 0xdata math team (http://www.meetup.com/H2Omeetup/).
 Open City Datasets 
 Palo Alto Open Data
 http://www.cityofpaloalto.org/gov/depts/it/open_data/default.asp
 Chicago 
 https://data.cityofchicago.org/
 Crime data, 2001 to present
 https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
 NYC 
 https://nycopendata.socrata.com/
 Rents & Neighborhoods 
 http://www.huduser.org/portal/datasets/HUD_data_matrix.html
 Transportation and Travel 
 Airlines Dataset 
 http://stat-computing.org/dataexpo/2009/the-data.html (so far it contains years 1987-2007, based on http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html)
 Data source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236
 Open Flights Database 
 http://openflights.org/data.html
 Capital Bikeshare Data
 https://www.capitalbikeshare.com/trip-history-data
 Sciences and Engineering 
 NASA Open Data 
 http://data.nasa.gov/
 Seismic Data 
 http://sioseis.ucsd.edu/segy.header.html
 Weather Public Data 
 http://OpenWeatherMap.org
 http://OpenMeteoData.org
 Diverse Data Sets 
 
 Many Eyes Community Datasets 
 http://www-958.ibm.com/software/analytics/manyeyes/
 Kaggle Competitions 
 http://www.kaggle.com/
 UCI Machine Learning Repository
 http://archive.ics.uci.edu/ml/datasets.html
 Human Activity Recognition Using Smartphones
 http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
 MLData repository 
 http://mldata.org/
 GitHub Challenge 
 https://github.com/blog/1450-the-github-data-challenge-ii
 Yelp Dataset Challenge 
 https://www.yelp.com/dataset_challenge
 Netflix Prize 
 http://stackoverflow.com/questions/1407957/netflix-prize-dataset
 Infochimps 
 
 Stanford Large Network Dataset Collection (SNAP)
 http://snap.stanford.edu/data/index.html
 Million Song Dataset
 http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
 Caret 
 http://caret.r-forge.r-project.org/datasets.html
 Public Policy Data 
 European Open Data 
 http://open-data.europa.eu/en/
 US Open Data 
opendatasites
 
 WorldBank Data 
 http://data.worldbank.org/data-catalog
 Guardian Data 
 http://www.guardian.co.uk/news/datablog/interactive/2013/jan/14/all-our-datasets-index
 Statistics Netherlands 
 http://www.cbs.nl/en-GB/menu/home/default.htm?Languageswitch=on
 Quandl: 6M Financial, Economic, and Social Datasets
 http://www.quandl.com/
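
 If you've grabbed one of the CSV-style data sets above and are still at the “now what?” stage, a quick first look can be as simple as the sketch below. It is only an illustration, not the H2O workflow: it uses plain Python, and the sample rows are made up. The column names (Year, Month, UniqueCarrier, ArrDelay) and the use of NA for missing values are assumptions modeled on the airline on-time layout, and the function name mean_arrival_delay is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample shaped like the airline on-time data; not real flights.
# Column names and the "NA" missing-value marker are assumptions.
SAMPLE_CSV = """Year,Month,UniqueCarrier,ArrDelay
2007,1,WN,10
2007,1,WN,-4
2007,1,AA,25
2007,2,AA,NA
"""


def mean_arrival_delay(csv_file):
    """Return {carrier: mean ArrDelay}, skipping rows with a missing delay."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        delay = row["ArrDelay"]
        if delay in ("", "NA"):  # treat empty or NA as missing
            continue
        totals[row["UniqueCarrier"]] += float(delay)
        counts[row["UniqueCarrier"]] += 1
    return {carrier: totals[carrier] / counts[carrier] for carrier in totals}


# Run on the in-memory sample; a real file object works the same way.
delays = mean_arrival_delay(io.StringIO(SAMPLE_CSV))
print(delays)
```

 To try it on a real download, pass an open file instead of the in-memory sample, for example open("2007.csv", newline="") where 2007.csv is a hypothetical local file name. Reading row by row keeps memory use small, which helps with the larger files in this list.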