
ekeepo

Helping teams build better software

Archive for June, 2009

Download Package: http://ekeepo.s3.amazonaws.com/Census Public Data Sets Walk Through Package.zip

I wrote a step-by-step walkthrough (with screenshots) that shows how to use the Year 2000 Census Public Data Set with AWS services (Amazon EC2, Amazon SimpleDB, and Amazon Simple Queue Service (Amazon SQS)). The following scenarios show how Public Data Sets can be used to solve business problems:

  1. Simple Census Parser – Using Amazon EC2, Visual Studio 2008, and C# to parse data from the Year 2000 Census public data set.  The parsed census data is used to hydrate .NET objects which are then serialized to an XML file.  This scenario shows how easy it is to load, parse, and manipulate data in a Public Data Set.
  2. Census Excel Viewer – Using Amazon EC2, Visual Studio 2008 Tools for Office, C#, and Excel 2007 to create an Excel 2007 add-in that parses the Year 2000 Census Public Data Set (using a library created in the Simple Census Parser) and creates Excel worksheets with formatted census data.  This scenario shows how business users who are familiar with Excel 2007 can easily work with a Public Data Set from within Amazon EC2.
  3. Simple Census Parser to Amazon SimpleDB – An extension of the Simple Census Parser which serializes the data into Amazon SimpleDB.  This scenario shows how a Public Data Set can be exposed as structured data using Amazon SimpleDB.
  4. Census Excel Viewer from Amazon SimpleDB – An extension of the Census Excel Viewer that loads the census data from Amazon SimpleDB (instead of from the Amazon EC2 PDS volume).  This component relies on the Simple Census Parser to Amazon SimpleDB code, which structures, organizes, and exposes the census data through the Amazon SimpleDB web services.  This scenario shows how, once the Public Data Set is exposed as structured data in Amazon SimpleDB, business users can consume it inside Excel 2007 on any machine with access to Amazon SimpleDB.
  5. Parallel Query Processor using Amazon EC2 and Amazon Simple Queue Service (Amazon SQS) – A national restaurant/bar chain with a presence in the top five US markets is investigating the next five markets to expand into.  The company is looking for an easy way to access and process the large US Census data set in order to determine the top 10 US cities/regions with the highest number of people between the ages of 20 and 34.  Using Amazon EC2, Amazon SQS, and two C# WinForms applications (Query Manager and Query Processor) that implement a parallel query processor with a map-reduce style architecture, the business can easily access the census data, scale the hardware to the requirements of this specific query, and parallelize the processing, which helps it reach a decision sooner while using fewer resources.  The Query Manager application uses Amazon SQS for the map step, breaking the census data into components to be processed by the Query Processor.  The Query Processor (which can run on any number of Amazon EC2 instances) receives instructions from the Amazon SQS queue to process a subset of the query and reports back to the Query Manager, again through Amazon SQS.
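The parse-then-serialize flow of the Simple Census Parser (scenario 1) can be sketched in a few lines. This is a minimal Python sketch, not the C# code from the package: the comma-delimited input and the `area_name`/`state`/`population` columns are hypothetical stand-ins for the real census file layout, and `parse_records`/`to_xml` are illustrative names.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class CensusRecord:
    # Hypothetical fields; the real census files use a different layout.
    area_name: str
    state: str
    population: int


def parse_records(text: str) -> list[CensusRecord]:
    """Hydrate one CensusRecord per data row of comma-delimited text with a header."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CensusRecord(
            area_name=row["area_name"],
            state=row["state"],
            population=int(row["population"]),
        )
        for row in reader
    ]


def to_xml(records: list[CensusRecord]) -> str:
    """Serialize the records as <CensusRecords><Record>...</Record></CensusRecords>."""
    root = ET.Element("CensusRecords")
    for record in records:
        element = ET.SubElement(root, "Record")
        for field in fields(record):
            child = ET.SubElement(element, field.name)
            child.text = str(getattr(record, field.name))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample rows, not real census figures.
    sample = "area_name,state,population\nSample Town A,ZZ,1200\nSample Town B,ZZ,3400\n"
    print(to_xml(parse_records(sample)))
```

To write an actual XML file as the scenario describes, the same element tree could be saved with `ET.ElementTree(root).write(path)` instead of returning a string.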
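Scenario 3 stores the parsed records in Amazon SimpleDB. One detail worth keeping in mind there: SimpleDB stores every attribute value as a string and compares values lexicographically in Select queries, so numeric attributes such as population counts are commonly zero-padded (with an offset applied to negative numbers) before being stored. A minimal Python sketch of that encoding step follows, assuming non-negative counts and a hypothetical 10-digit width; the actual PutAttributes call to SimpleDB is left out, and `encode_count`, `decode_count`, and `to_item_attributes` are illustrative names, not part of any AWS SDK.

```python
PAD_WIDTH = 10  # assumption: wide enough for any count in the data set


def encode_count(value: int) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative values need an offset before padding")
    text = str(value)
    if len(text) > PAD_WIDTH:
        raise ValueError(f"{value} does not fit in {PAD_WIDTH} digits")
    return text.zfill(PAD_WIDTH)


def decode_count(text: str) -> int:
    """Recover the integer when reading the attribute back (e.g. in scenario 4)."""
    return int(text)


def to_item_attributes(record: dict) -> dict[str, str]:
    """Convert one parsed record into name/value string pairs for a SimpleDB item."""
    return {
        name: encode_count(value) if isinstance(value, int) else str(value)
        for name, value in record.items()
    }


if __name__ == "__main__":
    # Unpadded strings sort incorrectly: "9" sorts after "100".
    print(sorted(["9", "10", "100"]))  # ['10', '100', '9']
    print(sorted(encode_count(n) for n in (9, 10, 100)))
    print(to_item_attributes({"area_name": "Sample Town A", "population": 1200}))
```

With padded values, a Select comparison such as `population > '0000001000'` orders the same way the underlying numbers do.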
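The Query Manager / Query Processor split in scenario 5 follows a map-reduce shape: the manager splits the work into messages, workers compute partial counts, and the manager merges them and picks the top regions. The Python sketch below is illustrative only, not the C# WinForms code from the package: an in-process `queue.Queue` stands in for Amazon SQS, threads stand in for Query Processor instances on Amazon EC2, and the `(region, age, people)` row layout and the `query_manager`/`query_processor` names are hypothetical.

```python
import queue
import threading
from collections import Counter

MIN_AGE, MAX_AGE = 20, 34  # target age range from the scenario


def query_processor(work_queue: queue.Queue, result_queue: queue.Queue) -> None:
    """Worker: count people aged 20-34 per region in each chunk it receives."""
    while True:
        chunk = work_queue.get()
        if chunk is None:  # stop signal from the manager
            break
        partial = Counter()
        for region, age, people in chunk:
            if MIN_AGE <= age <= MAX_AGE:
                partial[region] += people
        result_queue.put(partial)


def query_manager(rows, chunk_size=2, workers=3, top_n=10):
    """Manager: split rows into work messages (map), then merge partial counts (reduce)."""
    work_queue: queue.Queue = queue.Queue()
    result_queue: queue.Queue = queue.Queue()

    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    for chunk in chunks:
        work_queue.put(chunk)
    for _ in range(workers):
        work_queue.put(None)  # one stop signal per worker, queued after all work

    threads = [
        threading.Thread(target=query_processor, args=(work_queue, result_queue))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    totals = Counter()
    for _ in chunks:  # exactly one partial result per chunk
        totals.update(result_queue.get())
    return totals.most_common(top_n)


if __name__ == "__main__":
    # Made-up (region, age, people) rows, not real census figures.
    rows = [
        ("Region A", 25, 100), ("Region A", 40, 500),
        ("Region B", 22, 300), ("Region B", 34, 50),
        ("Region C", 19, 900), ("Region C", 20, 10),
    ]
    print(query_manager(rows, top_n=2))  # [('Region B', 350), ('Region A', 100)]
```

Against real Amazon SQS, a worker would likely also need to delete each message after processing it (a received message becomes visible again once its visibility timeout expires) and tolerate occasional duplicates, since delivery is at-least-once. Sending a reference to a chunk rather than the rows themselves would also keep messages small.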

Send me a message if you have built an interesting integration between your data and Amazon Public Data Sets, or if you have ideas about how this type of data integration could benefit your business.

The installer for TFS 2010 looks great: a very nice UI, and it even has dependency validation (similar to the SQL Server install, with the green checkboxes). I decided to brave an install on Windows Server 2008 R2 (release candidate) for a variety of reasons (Hyper-V support and the Windows 7 UI, among others) and ran into a problem with the SharePoint part of the install. After you install SharePoint, there are a couple of command-line commands (stsadm.exe ….) that you’ll need to run (see the TFS 2010 Install Guide). These fail with the following error if you’ve installed SharePoint without SP2: “Value does not fall within the expected range.”

See the post at http://blogs.msdn.com/dstfs/archive/2009/05/15/installing-tfs-2010-on-windows-server-2008-r2-rc.aspx for details on how to work around it. The fix takes a couple of steps, and it worked like a charm.

I have now created my first Team Project, and there are a few things I like already:
1. Project Collections – Great features for organizing and working with Team Projects in a group.
2. Team Foundation Administration Console – Centralized place to manage all things TFS.
3. Workflow-based Team Builds – A very neat and appropriate use of Windows Workflow Foundation. This will make customizing builds very interesting (lots of possibilities for customizations and extensions).

More to come :)