Download Package: http://ekeepo.s3.amazonaws.com/Census Public Data Sets Walk Through Package.zip
I wrote a step-by-step walkthrough (with screenshots) that shows how to use the Year 2000 Census Public Data Set with AWS services: Amazon EC2, Amazon SimpleDB, and Amazon Simple Queue Service (Amazon SQS). The following scenarios show how Public Data Sets can be used to solve business problems:
- Simple Census Parser – Using Amazon EC2, Visual Studio 2008, and C# to parse data from the Year 2000 Census Public Data Set. The parsed census data is used to hydrate .NET objects, which are then serialized to an XML file. This scenario shows how easy it is to load, parse, and manipulate data in a Public Data Set.
- Census Excel Viewer – Using Amazon EC2, Visual Studio 2008 Tools for Office, C#, and Excel 2007 to create an Excel 2007 add-in that parses the Year 2000 Census Public Data Set (using the library created for the Simple Census Parser) and creates Excel sheets with formatted census data. This scenario shows how to enable business users who are familiar with Excel 2007 to easily interact with a Public Data Set from within Amazon EC2.
- Simple Census Parser to Amazon SimpleDB – An extension of the Simple Census Parser which serializes the data into Amazon SimpleDB. This scenario shows how a Public Data Set can be exposed as structured data using Amazon SimpleDB.
- Census Excel Viewer from Amazon SimpleDB – An extension of the Census Excel Viewer that loads the census data from Amazon SimpleDB instead of from the Public Data Set volume attached to Amazon EC2. This component builds on the Simple Census Parser to Amazon SimpleDB code, which structures, organizes, and exposes the census data via the Amazon SimpleDB web services. This scenario shows how business users can consume Public Data Set data from Excel 2007 running on any machine with access to Amazon SimpleDB.
- Parallel Query Processor using Amazon EC2 and Amazon Simple Queue Service (Amazon SQS) – A national restaurant/bar chain with a presence in the top five US markets is investigating the next five markets for expansion. The company wants an easy way to access and process the large US Census data set in order to find the 10 US cities/regions with the most people between the ages of 20 and 34. This scenario uses Amazon EC2, Amazon SQS, and two C# WinForms applications (Query Manager and Query Processor) to build a parallel query processor with a map-reduce style architecture. With it, the business can easily access the census data, scale the hardware to the requirements of this specific query, and parallelize the processing, which helps it reach a decision more quickly while using fewer resources. The Query Manager application uses Amazon SQS to map the work, that is, to break the census data down into components for the Query Processor to handle. The Query Processor, which can run on any number of Amazon EC2 instances, receives instructions from the Amazon SQS queue to process a subset of the query and reports its results back to the Query Manager, again through Amazon SQS.
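
To make the Query Manager / Query Processor split in the last scenario more concrete, here is a minimal sketch of the same map-reduce pattern. It is not the code from the package, which is written in C# against Amazon SQS. As assumptions for illustration only: Python's in-process `queue.Queue` stands in for the SQS queues, threads stand in for Amazon EC2 instances, and `RECORDS` is made-up placeholder data rather than real census figures. The names `query_manager`, `query_processor`, and `chunk` are hypothetical.

```python
import queue
import threading
from collections import Counter

# Placeholder records: (region, age, number of people). NOT real census data.
RECORDS = [
    ("Region A", 25, 100),
    ("Region A", 40, 50),
    ("Region B", 22, 300),
    ("Region B", 33, 20),
    ("Region C", 19, 500),
    ("Region C", 34, 10),
]


def chunk(records, size):
    """Split the records into fixed-size work items (the 'map' step)."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def query_processor(work_queue, result_queue):
    """Worker: take a chunk, count people aged 20-34 per region, report back."""
    while True:
        task = work_queue.get()
        if task is None:  # sentinel: no more work for this worker
            break
        partial = Counter()
        for region, age, count in task:
            if 20 <= age <= 34:
                partial[region] += count
        result_queue.put(partial)


def query_manager(records, workers=2, chunk_size=2, top_n=10):
    """Queue the chunks, start the workers, then merge the partial results."""
    work_queue = queue.Queue()    # stand-in for the SQS work queue
    result_queue = queue.Queue()  # stand-in for the SQS result queue

    chunks = chunk(records, chunk_size)
    for item in chunks:
        work_queue.put(item)
    for _ in range(workers):
        work_queue.put(None)  # one sentinel per worker, queued after all chunks

    threads = [
        threading.Thread(target=query_processor, args=(work_queue, result_queue))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The 'reduce' step: sum the partial counts and rank the regions.
    totals = Counter()
    for _ in range(len(chunks)):
        totals.update(result_queue.get())
    return totals.most_common(top_n)


if __name__ == "__main__":
    for region, people in query_manager(RECORDS):
        print(region, people)
```

In a real deployment each processor would poll the shared queue from its own instance, so adding instances adds throughput without changing the manager's logic; that is the property the scenario relies on.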
Send me a message if you have built an exciting integration between your own data and Amazon Public Data Sets, or if you have ideas about how this type of data integration could be useful to your business.