Schedule: June 2-3, Tutorials

  • Most tutorials’ materials can be found in this listing or on the individual tutorial pages.

    We’re pleased to announce our Tutorials schedule.  The HPCS 2013 tutorials precede the symposium, with an HPC Summer School taking place Tuesday, May 28th through Friday, May 31st, and the HPCS 2013 tutorials occurring Sunday, June 2nd and Monday, June 3rd.  All tutorials will take place on the University of Ottawa main campus, near the symposium venue; room locations will be announced shortly.

    The schedule is as follows:

    Tues, May 28th through Friday, May 31st

    • Ontario HPC Summer School East [Level: Beginner] (separate, free registration), covering an introduction to HPC, the terminal, OpenMP, MPI, POSIX threads, GPGPU with CUDA, and visualization. See the Summer School website for program details.

    Sunday, June 2nd: AM

    • High Performance Storage Systems Part I.  Paul Lu.  Vendor-led training on measuring, troubleshooting, and tuning Lustre (see the detailed schedule below).
    • Intro to Parallel I/O.  Craig Lucas.  MPI-IO and the parallel I/O libraries NetCDF and HDF5.
    • Introduction to Data Analysis with R.  Joey Bernard.  Attendees should bring laptops with R and RStudio installed.

    Sunday, June 2nd: PM

    • Genomics Data Workflow.  A hands-on exome-analysis pipeline run on an HPC cluster.
    • High Performance Storage Systems Part II.  Paul Lu.  Further Lustre troubleshooting and tuning, plus PanFS, EMC/Isilon, and GPFS training (see the detailed schedule below).
    • Large-Scale Data Analysis with Parallel R.  Joey Bernard.  Parallel computing techniques and packages for analyzing large amounts of data in R.

    Monday, June 3rd: AM

    • Visualization with Paraview Part I.  [Level: Beginner].  Brian Corrie.  This tutorial will explore the basics of scientific visualization with 1-, 2- and 3-dimensional data sets through hands-on exercises. Attendees will need to bring a laptop and should have Paraview installed on it before the workshop.
    • GPFS Training.  [Level: Advanced].  Ray Paden, IBM.  This tutorial will walk through advanced GPFS configuration and administration techniques to improve performance and scalability.
    • Advanced Data Journalism Workshop Part I.  [Level: Intermediate; separate registration].  Fred Vallance-Jones.  In this hands-on session and demonstration, Fred will take you through basic methods of finding trends and leads on stories in large datasets, and show how to visualize your results using online tools. Participants will be introduced to MySQL, an enterprise-scale database program that can run on a desktop PC or Mac, and to Tableau Public and Google Fusion Tables, two online tools for visualizing data.

    Monday, June 3rd: PM

    • A Hands-On Introduction To Hadoop.  Mike Nolta, Jonathan Dursi.  Basics of the Hadoop ecosystem, with simple analysis tasks implemented in Hadoop Map-Reduce on a laptop VM.
    • Visualization with Paraview Part II.  Brian Corrie.  Continuation of the morning session; attendees should bring a laptop with Paraview installed.
    • Advanced Data Journalism Workshop Part II.  [Level: Intermediate; separate registration].  Fred Vallance-Jones.  Continuation of the morning session.

    Welcome Reception: Monday, June 3rd, 6pm-8pm

    •  University of Ottawa, Social Sciences Bldg, Rm FSS-4007, 6-8 pm


    A more detailed schedule follows:

    Sunday

    • 8:30am - 12:30pm.  Tutorial: High Performance Storage Systems Part 1.  Paul Lu.

      Intended audience: systems administrators and other technical staff (no papers or research-oriented material). Format: roughly 80% technical training provided by different vendors, and 20% “state-of-the-union” updates (status of OpenSFS, etc.) and user reports (e.g., experiences with storage systems at different Canadian sites).

      Agenda:
      8:30 – 8:40    Welcome and Introduction
      8:40 – 9:00    OpenSFS Update (Peter Jones, Intel)
      9:00 – 10:00   Measuring Lustre Performance (John Fragalla, Xyratex)
      10:00 – 11:00  Lustre Troubleshooting and Tuning, Part 1 (Brett Lee, Intel)
      11:00 – 12:00  Lustre Troubleshooting and Tuning, Part 2 (Brett Lee, Intel)

      Measuring Lustre Performance (Instructor: John Fragalla, Principal Architect, High Performance Computing, Xyratex): When baselining the performance of a Lustre file system, there are many methods and tools available. This session will walk through using the IOR benchmark and show how changing the Lustre stripe size affects performance for file-per-process and single-shared-file benchmarks. IOR itself has many flags and options that can also affect performance, and matching these parameters to the Lustre stripe size plays a part in the results. Lastly, the session will provide tips on matching the number of clients to the number of OSTs for a balanced benchmark setup, which also affects the results.

      Lustre Troubleshooting and Tuning (Instructor: Brett Lee, Intel): Lustre is an open-source, multi-vendor, global, parallel, high-performance scalable file system. Lustre has earned a significant reputation in the High Performance Computing (HPC) community for its speed and scalability; it also has a reputation for being hard to manage. The purpose of this presentation is not to discuss the speed (now over 1 terabyte/sec) or scale (well into the petabytes) of Lustre; instead, we will pull back the curtain on some of the mysteries of Lustre and learn how to manage it better and more easily. The presentation will cover the most common sources of problems, as well as the information needed to diagnose and resolve them. Tuning likewise has many elements: Lustre, like Linux, provides many ways to tune the file system. The second half of the presentation will look at some key Lustre configuration options, including macro-level options that apply to most installations as well as more granular options that are typically specific to a given site.

    • 9:00am - 12:30pm.  Tutorial: Intro to Parallel I/O.  Craig Lucas.

      This tutorial will introduce MPI-IO, which allows a program to read and write to a single file from multiple processes. It will also cover the parallel I/O libraries NetCDF and HDF5, which are built on top of MPI-IO.

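      The tutorial’s own materials are not reproduced in this listing, but as a flavour of what MPI-IO looks like in practice, here is a minimal sketch (using the mpi4py Python bindings rather than the C interface; the file name is illustrative) in which every MPI rank writes its own block of one shared file:

        # Minimal MPI-IO sketch (not from the tutorial materials): each rank
        # writes its own contiguous block of a single shared file.
        # Run with e.g.:  mpiexec -n 4 python write_shared.py
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        # Each rank prepares a small block of data (1000 doubles here).
        block = np.full(1000, rank, dtype='d')

        # Open one file collectively; every rank writes at its own offset.
        # (On Lustre, an MPI.Info object with hints such as "striping_factor"
        # could also be passed to File.Open to control striping.)
        fh = MPI.File.Open(comm, "shared_output.dat",
                           MPI.MODE_WRONLY | MPI.MODE_CREATE)
        offset = rank * block.nbytes
        fh.Write_at_all(offset, block)   # collective write, one block per rank
        fh.Close()
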
    • 9:00am - 12:30pm.  Tutorial: Introduction to Data Analysis with R.  Joey Bernard.

      This tutorial will introduce computing in R. Attendees will be expected to bring laptops with R and RStudio installed.

    • 1:30pm - 5:00pm.  Tutorial: Genomics Data Workflow.

      High-throughput genomics technologies are revolutionizing genetics and medical research. While next-generation sequencing is enabling a wealth of new applications that promise to have a significant impact on biology and health care, the major limiting factor has shifted from data production to data processing and interpretation. The scale of data that needs to be analyzed in genomics means that efficient use of High-Performance Computing (HPC) resources has become essential. This workshop is geared towards bioinformaticians and informaticians interested in working with next-generation sequencing datasets and will provide hands-on experience running genomics tools on an HPC cluster. The specific analysis pipeline to be tested is for exome analysis and will cover all the steps required for variant calling: adapter removal and trimming, mapping of reads to a reference genome, the variant calling itself, and annotation.

      Objectives: obtain basic knowledge of analysis pipelines in genomics; get an overview of the HPC resources needed for genomics; learn how various genomics tools work; run all the steps of a genomics analysis pipeline on an HPC cluster.

      Tools to be discussed: Trimmomatic, Burrows-Wheeler Aligner (BWA), Samtools/bcftools, Picard.

      Requirements: basic Linux knowledge.

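      The tutorial’s own exercises are not included here; purely as an illustration of the kind of pipeline described above, the sketch below chains the mapping and variant-calling steps with BWA, samtools, and bcftools. It assumes recent (1.x) releases of those tools on the PATH, assumes adapter trimming with Trimmomatic has already been done, and uses placeholder file names throughout:

        #!/usr/bin/env python3
        # Illustrative driver for the mapping/variant-calling steps named above
        # (not the tutorial's material). All file names are placeholders.
        import subprocess

        REF = "ref.fa"                                   # reference genome
        R1, R2 = "sample_R1.fq.gz", "sample_R2.fq.gz"    # trimmed paired-end reads

        def run(cmd):
            """Echo and execute one shell pipeline, stopping on any error."""
            print("+", cmd)
            subprocess.run(cmd, shell=True, check=True)

        # 1. Index the reference once.
        run(f"bwa index {REF}")

        # 2. Map reads, then coordinate-sort and index the resulting BAM.
        run(f"bwa mem -t 8 {REF} {R1} {R2} | samtools sort -o sample.sorted.bam -")
        run("samtools index sample.sorted.bam")

        # 3. Call variants (bcftools 1.x syntax); annotation would follow.
        run(f"bcftools mpileup -f {REF} sample.sorted.bam"
            " | bcftools call -mv -Oz -o sample.vcf.gz")
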
    • 1:30pm - 6:00pm.  Tutorial: High Performance Storage Systems Part 2.  Paul Lu.

      Intended audience: systems administrators and other technical staff (no papers or research-oriented material). Format: roughly 80% technical training provided by different vendors, and 20% “state-of-the-union” updates (status of OpenSFS, etc.) and user reports (e.g., experiences with storage systems at different Canadian sites).

      Agenda:
      1:00 – 2:00  Lustre Troubleshooting and Tuning, Part 3 (Brett Lee, Intel)
      2:00 – 3:00  Lustre Troubleshooting and Tuning, Part 4 (Brett Lee, Intel)
      3:00 – 4:00  PanFS Tuning and Application I/O Design (Bill Loewe, Panasas)
      4:00 – 5:00  EMC/Isilon Training: Abstract TBA (David Bickle, EMC)
      5:00 – 6:00  GPFS Training (Ray Paden, IBM)

      Lustre Troubleshooting and Tuning (Instructor: Brett Lee, Intel): A continuation of the morning sessions; see the abstract under Part 1 above.

      PanFS Tuning and Application I/O Design (Instructor: Bill Loewe, Panasas): The Panasas PanFS parallel file system is designed to provide high performance by maximizing parallelism and avoiding bottlenecks. Understanding the file system tuning options and application I/O design can help in achieving the best results for different usage cases. This training seminar will focus on the interaction between the file system and the application, as well as issues in programming for parallel I/O. The session will cover the Panasas architecture, general parallel I/O models, and the tuning topics of prefetching, caching, data layout, and MPI-IO hints to control file access, before concluding with best practices.

    • 1:30pm - 5:00pm.  Tutorial: Large-Scale Data Analysis with Parallel R.  Joey Bernard.

      This tutorial will introduce parallel computing techniques and packages for analyzing large amounts of data in R.

    Monday

    • 9:00am - 12:00pm.  Tutorial: GPFS Training, IBM.  Ray Paden.

      GPFS (General Parallel File System) is IBM's clustered/parallel file system, commonly used for HPC and cluster applications. It has been generally available since 1998, giving it both maturity and market presence. This workshop is flexible and dynamic: it surveys GPFS's features, semantics, tuning and optimization guidelines, best practices, and environment, with specific topics emphasized on the basis of attendee interest. The seminar is delivered in a comfortable environment that encourages question-and-answer dialogue.

    • 9:00am - 12:00pm.  Tutorial: Visualization with Paraview Part 1.  Brian Corrie.

      This workshop will explore the basics of scientific visualization with 1-, 2- and 3-dimensional data sets through hands-on exercises. Attendees will need to bring a laptop and should have Paraview installed on it before the workshop.

      Location: Colonel By Building, CBY-E015 (enter off of King Edward Street). Breakfast, badge pick-up, and coffee breaks outside FSS-1007; breakfast at 8:30, lunch at 12:00, coffee breaks at 10:30 and 2:30.

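      The workshop itself is hands-on and GUI-driven, but Paraview also ships a Python scripting interface (pvpython). As a small preview, not part of the workshop materials, the sketch below renders Paraview’s built-in Wavelet test source and saves a screenshot:

        # Minimal Paraview scripting sketch (illustrative only).
        # Run with Paraview's bundled interpreter:  pvpython wavelet_demo.py
        from paraview.simple import Wavelet, Show, Render, SaveScreenshot

        source = Wavelet()             # a simple built-in 3-D image-data source
        display = Show(source)         # add it to the current render view
        display.Representation = 'Surface'
        Render()                       # draw the scene
        SaveScreenshot('wavelet.png')  # write the rendered view to disk
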
    • 9:00am - 12:00pm.  Workshop: Advanced Data Journalism Part 1.  Fred Vallance-Jones.

      You've mastered simple data analysis using Excel, and you know something about where and how to obtain data for stories. Now find out how to take your practice of data journalism to a new level with more advanced data analysis and visualization. In this hands-on session and demonstration, Fred will take you through basic methods of finding trends and leads on stories in large datasets, and show how to visualize your results using online tools. Participants will be introduced to MySQL, an enterprise-scale database program that can run on a desktop PC or Mac, and to Tableau Public and Google Fusion Tables, two online tools for visualizing data. Computers and software will be provided, but participants who wish to bring their own machines with MySQL and Tableau Public installed may use them.

      Location: Cube building, 2nd floor, CBE-202. Breakfast, badge pick-up, and coffee breaks outside FSS-1007. Time: Monday, June 3rd, 9am-5pm; breakfast at 8:30, lunch at 12:00, coffee breaks at 10:30 and 2:30.

    • 12:00pm - 5:00pm.  Registration; Exhibit/Poster Setup.
    • 12:00pm - 2:00pm.  TECC-Access.  Meeting of the TECC-Access working group.

    • 1:30pm - 5:00pm.  A Hands-On Introduction To Hadoop.  Mike Nolta, Jonathan Dursi.

      In this session, attendees will learn the basics of the Hadoop ecosystem and will implement simple analysis tasks using Hadoop Map-Reduce on a VM on their laptop.

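      The session’s own exercises are not included in this listing; as a taste of the Map-Reduce model, here is the classic word count written as a Hadoop Streaming mapper/reducer pair in Python (Streaming is only one of several ways to run Map-Reduce jobs, and the streaming-jar path and input/output directories in the comment are placeholders):

        #!/usr/bin/env python3
        # Word count in the Hadoop Streaming style (illustrative only).
        # One script acts as mapper or reducer depending on its first argument:
        #   hadoop jar hadoop-streaming.jar -input in/ -output out/ \
        #       -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
        #       -file wordcount.py
        import sys

        def mapper():
            # Emit "word<TAB>1" for every word read from stdin.
            for line in sys.stdin:
                for word in line.split():
                    print(f"{word}\t1")

        def reducer():
            # Streaming sorts mapper output by key, so all counts for a word
            # arrive together and can be summed on the fly.
            current, count = None, 0
            for line in sys.stdin:
                word, _, value = line.rstrip("\n").partition("\t")
                if word != current:
                    if current is not None:
                        print(f"{current}\t{count}")
                    current, count = word, 0
                count += int(value)
            if current is not None:
                print(f"{current}\t{count}")

        if __name__ == "__main__":
            mapper() if sys.argv[1:] == ["map"] else reducer()
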
    • 1:30pm - 5:00pm.  Tutorial: Visualization with Paraview Part 2.  Brian Corrie.

      This workshop will explore the basics of scientific visualization with 1-, 2- and 3-dimensional data sets through hands-on exercises. Attendees will need to bring a laptop and should have Paraview installed on it before the workshop.

    • 1:30pm - 5:00pm.  Workshop: Advanced Data Journalism Part 2.  Fred Vallance-Jones.

      A continuation of the morning session; see the Part 1 description above. Computers and software will be provided, but participants who wish to bring their own machines with MySQL and Tableau Public installed may use them.

    • 3:00pm - 6:00pm.  TECC-SC.  Meeting of the TECC-SC working group.
    • 6:00pm - 8:00pm.  Welcome Reception, Registration.