
GTC 2012 Program Guide - GPU Technology Conference


PRESENTED BY PLATINUM SPONSORS<br />

MAY 14-17, <strong>2012</strong> | SAN JOSE, CA<br />

PROGRAM<br />

GUIDE


The Power<br />

to do More<br />

HP <strong>GPU</strong> computing ranging from personal<br />

supercomputing Z-workstations to the<br />

world’s most self-sufficient <strong>GPU</strong> enabled<br />

servers. Come and talk with the HP<br />

<strong>GPU</strong> experts about the performance,<br />

efficiency and agility you get with HP.<br />

Visit HP Booth #47 for<br />

more information.<br />

www.hp.com/go/zworkstations<br />

www.hp.com/go/accelerators


WELCOME<br />

TO <strong>GTC</strong><br />

Dear <strong>GTC</strong> Attendees,<br />

Back in 2009 we had an idea to bring together the wide<br />

variety of people who use <strong>GPU</strong>s in their work.<br />

Disciplines from quantum chemistry to computational<br />

fluid dynamics and astrophysics. People from every<br />

corner of the world. We hosted our first <strong>GPU</strong><br />

<strong>Technology</strong> <strong>Conference</strong>.<br />

We were proud of its success. More than 90 percent of<br />

the presentations were made by people outside<br />

NVIDIA. Meetings spilled into the hallways. The place<br />

buzzed with energy. We realized that <strong>GPU</strong> computing<br />

was bigger than NVIDIA. And that <strong>GTC</strong> was a conduit<br />

into the collective power of brilliant scientists,<br />

technologists and thought leaders. It was an honor to<br />

host it on your behalf.<br />

In 2010, we doubled down, with more than 280 sessions,<br />

and 2,000 attendees. The reach of <strong>GPU</strong> computing was<br />

growing. And its impact was truly breathtaking.<br />

Researchers from Adobe showcased their work in<br />

computational photography, which one day will redefine<br />

the field. A surgeon described how <strong>GPU</strong>s were vital in<br />

performing surgery on a beating heart.<br />

And <strong>GTC</strong> <strong>2012</strong> promises to be better still.<br />

You can choose from among hundreds of sessions.<br />

Among them are talks by Oak Ridge National<br />

Laboratory on using <strong>GPU</strong>s to build Titan, the world’s<br />

largest supercomputer; Tokyo Institute of <strong>Technology</strong>,<br />

winner of last year’s Gordon Bell Prize, on<br />

stereoscopic 3D visualization; and Beijing’s BGI on<br />

using <strong>GPU</strong>s for bioinformatics research. A variety of<br />

entrepreneurs will speak at the Emerging Companies<br />

Summit about how their startups use <strong>GPU</strong>s.<br />

<strong>GTC</strong> will also play host for the first time to two other<br />

events. Los Alamos National Laboratory will hold its<br />

Accelerated HPC Symposium, bringing together world<br />

leaders in supercomputing. InPar will provide a<br />

first-tier academic venue for peer-reviewed, archival<br />

publications in the emerging fields of parallel<br />

computing.<br />

And NVIDIA will discuss Kepler, our first new<br />

architecture in two years, and its impact on computing.<br />

Our first Kepler-based graphics card recently launched<br />

to fantastic reviews. We can’t wait to share how these<br />

powerful, super energy-efficient <strong>GPU</strong>s open up new<br />

horizons in high performance computing and scientific<br />

discovery.<br />

We will also be talking much more about <strong>GPU</strong> and<br />

cloud computing, as well as our Maximus technology,<br />

which creates a workstation so powerful that it<br />

simulates the physics of a design while it is being<br />

created.<br />

It should make for the best <strong>GTC</strong> yet.<br />

Enjoy the conference!<br />

Sincerely,<br />

The NVIDIA <strong>GTC</strong> Team<br />

CONFERENCE GUIDE


“0.1 of a second can be the difference<br />

between winning and losing in Formula One.<br />

Data analysis doesn’t get much<br />

more critical than that.”<br />

See how Dell helped Caterham F1 Team deploy an<br />

enterprise-class IT system able to do real-time analysis of<br />

data sent from the car, while withstanding the intense heat<br />

and vibration of the Formula 1™ trackside environment.<br />

Learn more at Dell.com/EfficientIT.<br />

Mark Smith<br />

Technical Director<br />

Caterham F1 Team<br />

Join our breakout session on Wednesday, May 16th at 2pm, room M, where Dr. Jeff Layton,<br />

HPC Enterprise Technologist, will discuss compelling new technology advancements in <strong>GPU</strong> Computing.


IMPORTANT INFORMATION<br />

If there is anything we can do to make your conference experience better, please stop by the<br />

info desk and let us know.<br />

REGISTRATION / INFORMATION DESK HOURS<br />

SUNDAY, MAY 13<br />

16:00 to 18:00<br />

MONDAY, MAY 14<br />

08:00 to 18:00<br />

TUESDAY, MAY 15<br />

07:00 to 19:00<br />

WEDNESDAY, MAY 16<br />

08:00 to 18:00<br />

THURSDAY, MAY 17<br />

08:00 to 16:00<br />

EXHIBIT AND MEAL HOURS<br />

TUESDAY, MAY 15<br />

12:00 to 14:00 Lunch / Exhibits Open<br />

18:00 to 20:00 Reception / Exhibits Open<br />

WEDNESDAY, MAY 16<br />

12:00 to 14:00 Lunch / Exhibits Open<br />

18:00 to 20:00 Reception / Exhibits Open<br />

THURSDAY, MAY 17<br />

12:00 to 14:00 Lunch / Exhibits Open<br />

ENROLL IN YOUR SESSIONS Go to https://registration.gputechconf.com/schedule and log in to<br />

start adding sessions to your personal schedule. Priority access<br />

into each session will be given to those who enroll. Enrolling in<br />

sessions also helps us schedule the most popular sessions in the<br />

largest rooms.<br />

WIRELESS INTERNET ACCESS Free wireless internet access can be found under <strong>GTC</strong><strong>2012</strong> and is<br />

available in most session rooms, keynote hall, exhibit hall and<br />

throughout the concourse.<br />

DOWNLOAD THE MOBILE APP Keep up-to-date with the latest news and information at the<br />

conference through the <strong>GTC</strong> <strong>2012</strong> Mobile App. Download it from the<br />

Android market at https://play.google.com/store. You can also<br />

access news and announcements from the home page of<br />

www.gputechconf.com.<br />

BUSINESS CENTER / SHIPPING The Marriott Hotel and the Hilton Hotel both have business centers<br />

located on the first floor, near their respective front lobbies.<br />

Alternatively, there is a FedEx Office Print & Ship Center at 93 E. San<br />

Carlos Street, near 3rd Street (3 blocks from the Convention Center,<br />

call 408-295-4336 for hours).<br />

GO GREEN! Take part in the shared goal of minimizing our collective impact on<br />

the environment. Please take only the conference materials you<br />

need, and recycle and reuse whenever possible throughout the<br />

week. Please turn in your badges for recycling at the conclusion of<br />

the event.<br />

BAG AND COAT CHECK Bag check is available at the bell desk of the Marriott and Hilton<br />

hotels, connected to the Convention Center. It is also available on the<br />

concourse of the Convention Center.<br />

LOST AND FOUND Please check the information desk should you lose or find an article.<br />

FIRST AID / EMERGENCY Should there be a medical emergency, please dial 911 and alert the<br />

nearest conference personnel.


Lenovo® recommends Windows® 7 Professional.<br />

MONTHS OF PLANNING | A FUTURISTIC MOVIE SET | AN IDEA TURNED TO REALITY.<br />

DREAM.<br />

CREATE.<br />

INTRODUCING THE LENOVO® THINKSTATION® 30 SERIES,<br />

FEATURING THE D30 FOR HIGH-END GRAPHICS AND PROCESSING<br />

POWER.<br />

The Lenovo ThinkStation® 30 series was designed for those who push technology to the limits and depend<br />

on professional applications and platforms to get there. The ThinkStation® 30 Series is certified to run the<br />

applications you need most from Adobe, Autodesk, Dassault Systemes, PTC and Siemens. Designed to tackle<br />

the biggest challenges, the D30 delivers the ultimate in performance and expandability. And now armed with<br />

the latest generation of Intel® Xeon® processors, Genuine Windows® 7 Professional and supporting discrete<br />

Quadro and Tesla graphics technology from NVIDIA® - you can defy expectations like never before.<br />

Energy-efficient • Quiet Acoustics • Scalable Storage • ISV-certified<br />

www.lenovo.com/thinkstation<br />

Lenovo, the Lenovo logo, For Those Who Do and ThinkStation are trademarks or registered trademarks of Lenovo. Microsoft and Windows are registered trademarks of Microsoft Corporation in<br />

the U.S. and other countries. Intel and Intel Xeon are registered trademarks of Intel Corporation in the U.S. and other countries. Nvidia is a registered trademark of Nvidia Corporation in the<br />

U.S. and other countries.<br />

© Lenovo <strong>2012</strong>. All rights reserved.


TABLE OF CONTENTS<br />

Welcome Letter<br />

Important Information<br />

<strong>Conference</strong> Highlights - Don’t Miss These Events!<br />

Emerging Companies Summit<br />

Los Alamos National Laboratory Accelerated High Performance Computing Symposium<br />

Sessions Listing - Monday<br />

Sessions Listing - Tuesday<br />

Sessions Listing - Wednesday<br />

Sessions Listing - Thursday<br />

Research Posters Listing<br />

Speakers and Panelists Listing<br />

Sponsors and Exhibitors<br />

Stay Connected!


CONFERENCE<br />

HIGHLIGHTS –<br />

DON’T MISS<br />

THESE EVENTS!<br />

NVIDIA® Nsight Lab<br />

The lab will be open daily for product discussions, for testing your application with<br />

the latest version of Nsight, or simply as a place to hang out and relax with the Nsight<br />

development team. The lab is located on the first floor.<br />

C++ AMP LOUNGE, by Microsoft<br />

While attending <strong>GTC</strong>, come learn from the experts at the C++ AMP Lounge by<br />

Microsoft, a casual environment for hands-on learning and instruction. Experts<br />

will be available each day to answer questions and provide instruction. The<br />

lounge is located on the concourse.<br />

Ask the CUDA Expert<br />

Stop by Ask the CUDA Expert on the main concourse for a quick consultation with<br />

NVIDIA software engineers and developer technology experts. Experts on CUDA<br />

C, Fortran, OpenACC, <strong>GPU</strong>-Accelerated Libraries and more will be on hand to<br />

answer your questions. No question is too challenging or too easy for this crew!<br />

Ask the CUDA Expert will be open as follows:<br />

Monday 10:00 to 16:00<br />

Tuesday 12:00 to 19:00<br />

Wednesday 10:00 to 11:00, 12:00 to 19:00<br />

Thursday 10:00 to 11:00, 12:00 to 16:00<br />

DigitalGuru: Where Smart People Get Smarter<br />

DigitalGuru Technical Bookshop of Cupertino, California is pleased to<br />

participate in <strong>GTC</strong> <strong>2012</strong>. Please visit our table during the conference for a wide<br />

and relevant selection of books on parallel programming, computer science,<br />

application tools and more. Books sold at <strong>GTC</strong> are available at 20% off list<br />

price. For more info visit www.digitalguru.com.<br />

Dinner with Strangers<br />

Over a meal in some of the best restaurants in Silicon Valley, engage in lively<br />

conversation and share your best ideas. Pre-reserved tables for small groups<br />

will be made available to <strong>GTC</strong> attendees to mix and mingle with fellow attendees.<br />

Dinner with Strangers is open to all, but space is limited and offered on a first-come,<br />

first-served basis. Stop by the sign-up board located on the concourse. Dinner<br />

with Strangers happens on Monday and Tuesday night with reservations at 20:00.


SUNDAY, MAY 13<br />

08:30 to 17:35 InPar <strong>2012</strong>, Foundations & Applications of <strong>GPU</strong>, Manycore, and<br />

Heterogeneous Systems (Room J)<br />

MONDAY, MAY 14<br />

08:40 to 17:00 InPar <strong>2012</strong>, Foundations & Applications of <strong>GPU</strong>, Manycore, and<br />

Heterogeneous Systems (Room J)<br />

09:00 to 15:50 Pre-<strong>Conference</strong> Tutorials<br />

16:00 to 18:00 Research Poster Showcase and Reception<br />

TUESDAY, MAY 15<br />

10:30 to 11:50 Opening Keynote with Jen-Hsun Huang, NVIDIA CEO and Co-Founder<br />

(Keynote Hall, Hall 1)<br />

12:00 to 14:00 Exhibits Open / Networking Lunch (Exhibit Hall)<br />

14:00 to 18:00 <strong>GPU</strong>-accelerated Science on Titan: Tapping into the World’s<br />

Preeminent <strong>GPU</strong> Supercomputer to Achieve Better Science, Jack<br />

Wells, Director of Science, Oak Ridge Leadership Computing Facility, Oak<br />

Ridge National Laboratory (Room A2)<br />

16:00 to 16:50 CUDA 5 and Beyond, Mark Harris, Chief Technologist, <strong>GPU</strong> Computing,<br />

NVIDIA (Hall 1)<br />

18:00 to 20:00 Exhibits Open / Networking Reception (Exhibit Hall)<br />

WEDNESDAY, MAY 16<br />

09:00 to 09:30 Emerging Companies Summit Opening Address with Jeff Herbst, VP<br />

Business Development, NVIDIA (Marriott Hotel, Ballroom 4)<br />

09:00 to 10:20 Exascaling Your Apps, moderated by Mike Bernhardt, Publisher, The<br />

Exascale Report (Room C)<br />

11:00 to 11:50 Day 2 Keynote with Dr. Iain Couzin, Professor, Princeton University<br />

(Keynote Hall, Hall 1)<br />

12:00 to 14:00 Exhibits Open / Networking Lunch (Exhibit Hall)<br />

14:00 to 14:50 Emerging Companies Summit Fireside Chat with Jen-Hsun Huang,<br />

NVIDIA CEO and Co-Founder (Marriott Hotel, Ballroom 4)<br />

14:00 to 15:20 Inside Kepler, Stephen Jones, CUDA Developer, NVIDIA, Lars Nyland,<br />

Senior Architect, NVIDIA (Hall 1)<br />

14:00 to 17:55 Los Alamos National Laboratory Accelerated High Performance<br />

Computing Symposium (Room J1)<br />

18:00 to 20:00 Exhibits Open / Networking Reception (Exhibit Hall)<br />

20:00 to 23:00 <strong>GTC</strong> Party (Civic Auditorium)<br />

During a week of rigorous learning, it’s important to cut loose and<br />

celebrate with fellow members of the <strong>GPU</strong> community. Come party and<br />

enjoy the comedic and juggling talents of The Passing Zone and try your<br />

luck in the casino. And don’t forget to raise a glass to your success!<br />

THURSDAY, MAY 17<br />

09:00 to 15:50 Los Alamos National Laboratory Accelerated High Performance<br />

Computing Symposium (Room J1)<br />

11:00 to 11:50 Day 3 Keynote with Robert Boehme, CEO & Team Lead, Part-Time<br />

Scientists and Wes Faler, Head of Software Development, Part-Time<br />

Scientists (Keynote Hall, Hall 1)<br />

12:00 to 14:00 Exhibits Open / Networking Lunch (Exhibit Hall)


OPEN GENOMICS ENGINE <br />

Accelerating the DNA-analysis pipeline<br />

for cancer research<br />

Visit the Open Genomics Engine booth (#118) in the<br />

<strong>GTC</strong> exhibit hall to learn more.<br />


An NVIDIA Foundation Initiative


Welcome to<br />

NVIDIA’s Emerging<br />

Companies<br />

Summit (ECS) <strong>2012</strong>!<br />

We are thrilled to once again showcase promising<br />

startups that are using the massive computing power<br />

of <strong>GPU</strong> technology to transform existing industries and<br />

create new ones.<br />

From gesture-recognition technology and interactive<br />

video to virtualization and cloud computing, the dozens<br />

of companies from around the world participating in<br />

ECS <strong>2012</strong> are at the cutting-edge of technology. <strong>GPU</strong>s<br />

have recently stormed the handheld computing<br />

market, so you’ll also find a large number of mobile<br />

companies participating in this year’s summit.<br />

ECS itself has become something of a growth industry.<br />

In addition to this being our fourth event in Silicon<br />

Valley, we have recently held successful summits in<br />

Israel and China, with more planned in the near future.<br />

The conference has proven to be a great venue for<br />

startups, analysts, executives and industry experts to<br />

exchange information and understand where<br />

technology is heading.<br />

As a key part of the <strong>GPU</strong> <strong>Technology</strong> <strong>Conference</strong>, ECS<br />

<strong>2012</strong> will be host to hundreds of participants –<br />

including panelists, presenters, analysts, industry<br />

execs and others in our growing audience. Awaiting<br />

them is our best program yet.<br />

This year sees the return of our hugely popular “CEO<br />

on Stage” format, where a select group of CEOs<br />

present their companies to a distinguished panel of<br />

experienced investors, analysts and technology<br />

leaders, who in turn respond with insightful feedback.<br />

NVIDIA CEO and founder Jen-Hsun Huang will also sit<br />

down for another thoughtful and entertaining fireside<br />

chat, this year with Tim Bajarin, president of Creative<br />

Strategies Inc., a leading Silicon Valley industry<br />

analysis and market intelligence firm.<br />

New this year are special events like Startup<br />

University, where presenting and exhibiting companies<br />

will hold workshops on topics such as “Protecting Your<br />

IP Assets in a Global Marketplace” and “Best Practices<br />

for Building Valuable Relationships with <strong>Technology</strong><br />

Industry Analysts.” In addition, the exhibit halls will be<br />

filled with the innovative work of companies in a<br />

diverse array of fields. And this year a jury will select<br />

the most promising companies with the “One to<br />

Watch” awards, announced Wednesday evening in the<br />

Hilton ballroom.<br />

The <strong>GPU</strong> computing ecosystem is growing rapidly –<br />

and you, as an ECS attendee, are a key part of its<br />

success. I encourage you to participate in as many<br />

sessions as possible and thank you for joining us at<br />

what promises to be another superb event.<br />

In closing, I’d like to express gratitude to our sponsors<br />

who are helping to make this event possible, including<br />

Cooley LLP, Morgan Stanley, Silicon Valley Bank,<br />

Deloitte, mergermarket, and Dow Jones Private Equity<br />

& Venture Capital.<br />

Jeff Herbst<br />

Vice President of Business Development, NVIDIA


AGENDA<br />

WEDNESDAY, MAY 16, <strong>2012</strong><br />

MARRIOTT SAN JOSE BALLROOM 4<br />

09:00 to 09:50 S2000 Emerging Companies Summit Opening with Jeff Herbst (VP of<br />

Business Development, NVIDIA), followed by CEO on Stage featuring<br />

• Rocketick (Tomer Ben-David, VP R&D)<br />

• Cortexica (Iain McCready, CEO)<br />

Panelists:<br />

• Jon Peddie, President, Jon Peddie Research<br />

• Neil Sequeira, Managing Director, General Catalyst Partners<br />

• Savitha Srinivasan, Partner, IBM Venture Capital Group<br />

• Jeff Herbst, V.P. of Business Development, NVIDIA<br />

10:00 to 10:50 S2001 Emerging Companies Summit: CEO on Stage featuring<br />

• Unity Technologies (David Helgason, CEO)<br />

• MirriAd (Mark Popkiewicz, CEO)<br />

• BioDigital (Aaron Oliker, Partner/Director of 3D <strong>Technology</strong> and Frank Sculli,<br />

Co-Founder/Informatics Director)<br />

Panelists:<br />

• Jon Peddie, President, Jon Peddie Research<br />

• Neil Sequeira, Managing Director, General Catalyst Partners<br />

• Savitha Srinivasan, Partner, IBM Venture Capital Group<br />

• Jeff Herbst, V.P. of Business Development, NVIDIA<br />

11:00 to 11:50 S2002 Emerging Companies Summit: CEO on Stage featuring<br />

• eyeSight Mobile (Gideon Shmuel, CEO)<br />

• Numira Biosciences (David Weinstein, CTO)<br />

• Ubitus (Wesley Kuo, CEO)<br />

Panelists:<br />

• Jon Peddie, President, Jon Peddie Research<br />

• Neil Sequeira, Managing Director, General Catalyst Partners<br />

• Savitha Srinivasan, Partner, IBM Venture Capital Group<br />

• Jeff Herbst, V.P. of Business Development, NVIDIA<br />

12:00 to 13:50 Networking Lunch and Exhibits (Hall 2 – San Jose Convention Center)


14:00 to 14:50 S2003 Emerging Companies Summit Fireside Chat with Jen-Hsun Huang<br />

(CEO, President and Co-Founder, NVIDIA) and Tim Bajarin (President of<br />

Creative Strategies)<br />

15:00 to 15:50 S2004 Emerging Companies Summit: CEO on Stage featuring<br />

• GAIKAI (David Perry, CEO and Co-Founder)<br />

• Immersive Media (Myles M. McGovern, CEO)<br />

• Numecent (Osman Kent, Co-Founder & CEO)<br />

Panelists:<br />

• Tom Furlong, Managing Director, Granite Ventures<br />

• Rob Enderle, Principal Analyst, Enderle Group<br />

• Flip Gianos, General Partner, Interwest Partners<br />

• Jeff Herbst, V.P. of Business Development, NVIDIA<br />

16:00 to 16:50 S2005 Emerging Companies Summit: CEO on Stage featuring<br />

• RealView Imaging (Shaul Gelman, Co-Founder and VP of R&D)<br />

• Elemental Technologies (Sam Blackman, CEO and Co-Founder)<br />

• Mersive (Robert Balgley, CEO)<br />

Panelists:<br />

• Tom Furlong, Managing Director, Granite Ventures<br />

• Rob Enderle, Principal Analyst, Enderle Group<br />

• Flip Gianos, General Partner, Interwest Partners<br />

• Jeff Herbst, V.P. of Business Development, NVIDIA<br />

17:00 to 17:50 S2006 Emerging Companies Summit: CEO on Stage featuring<br />

• Raytrix (Christian Perwass, CEO)<br />

• Playcast (Guy De Beer, CEO)<br />

• Universal Robotics (David Peters, CEO)<br />

Panelists:<br />

• Tom Furlong, Managing Director, Granite Ventures<br />

• Rob Enderle, Principal Analyst, Enderle Group<br />

• Flip Gianos, General Partner, Interwest Partners<br />

• Jeff Herbst, V.P. of Business Development, NVIDIA<br />

18:00 to 19:50 Networking Reception (Hall 2 - San Jose Convention Center)


Cooley is a proud Platinum Sponsor of<br />

the <strong>2012</strong> NVIDIA <strong>GTC</strong> <strong>Conference</strong><br />

Emerging Companies Summit.<br />

Cooley attorneys have served as counselors, strategists<br />

and advocates to technology entrepreneurs and<br />

investment funds since 1959.<br />

Cooley, a global law firm for the converging worlds of high<br />

technology, high finance and high-stakes litigation.<br />

For more information, visit us at www.cooley.com<br />

Experienced <strong>Guide</strong>s<br />

PALO ALTO | NEW YORK | SAN DIEGO | SAN FRANCISCO | RESTON, VA | BROOMFIELD, CO | WASHINGTON, DC | BOSTON | SEATTLE | SHANGHAI<br />

© <strong>2012</strong> Cooley LLP, 101 California Street, 5th Floor, San Francisco, CA 94111. 415/693-2000.


CEO ON STAGE LISTING<br />

BIODIGITAL<br />

BioDigital is the leading developer of state-of-the-art biomedical visualization.<br />

BioDigital recently launched The BioDigital Human - a 3D visualization platform<br />

with a revolutionary approach for communicating health and medical information<br />

with interactive tools for exploring human anatomy, physiology and conditions.<br />

www.biodigital.com<br />

Speakers Aaron Oliker, Partner/Director of 3D <strong>Technology</strong> and<br />

Frank Sculli, Co-Founder/Informatics Director<br />

Session Time Wednesday, May 16 at 10:45<br />

CORTEXICA VISION SYSTEMS<br />

Cortexica Vision Systems are the award-winning creators of a bio-inspired vision<br />

system enabling intelligent image recognition using principles derived from the<br />

human visual cortex. Cortexica provides a patented platform for radically new<br />

Visual Search products that deliver exciting new experiences and value for<br />

consumers and businesses.<br />

www.cortexica.com<br />

Speaker Iain McCready, CEO<br />

Session Time Wednesday, May 16 at 09:35<br />

ELEMENTAL TECHNOLOGIES<br />

Elemental Technologies is a leading supplier of video solutions for multiscreen<br />

content delivery. Founded in 2006 and headquartered in Portland, Oregon, the<br />

company pioneered the use of graphics processors to power adaptive video<br />

streaming over IP networks. Top media and entertainment companies around the<br />

world rely on solutions from Elemental to drive next-generation video services.<br />

www.elementaltechnologies.com<br />

Speaker Sam Blackman, CEO and Co-Founder<br />

Session Time Wednesday, May 16 at 16:30<br />

13 CONFERENCE GUIDE EMERGING<br />

COMPANIES SUMMIT


EYESIGHT MOBILE TECHNOLOGIES<br />

eyeSight Mobile Technologies Ltd. presents innovative gesture recognition<br />

technology that powers Touch Free UI solutions, creating an enhanced user<br />

experience when interacting with a variety of digital devices. The technology is<br />

entirely software based, requiring only a standard 2D camera, while operating on<br />

the full range of operating systems.<br />

www.eyesight-tech.com<br />

Speaker Gideon Shmuel, CEO<br />

Session Time Wednesday, May 16 at 11:00<br />

GAIKAI<br />

GAIKAI offers a fully managed cloud platform that is optimized to deliver<br />

high-end video games and applications within seconds to all leading web<br />

browsers, operating systems, and devices, even in Facebook.<br />

www.gaikai.com<br />

Speaker David Perry, CEO and Co-Founder<br />

Session Time Wednesday, May 16 at 15:00<br />

IMMERSIVE MEDIA COMPANY<br />

Immersive Media is the pioneer and world’s leading provider of 360°, full-motion,<br />

interactive video. Our immersive 360° video content is delivered via the internet to<br />

PC, iPad or mobile device. Immersive Media provides the enabling technologies<br />

for interactive video to record, process, live stream and deliver images from<br />

our own or other wide-field cameras, with a patent portfolio covering key discoveries<br />

and capabilities of interactive and immersive video.<br />

www.immersivemedia.com<br />

Speaker Myles M. McGovern, President/CEO<br />

Session Time Wednesday, May 16 at 15:30


MERSIVE<br />

Since it was founded in 2006, Mersive has revolutionized high performance<br />

display setup and maintenance enabling a new class of displays. Mersive’s Sol<br />

software automatically aligns multiple commodity projectors into one seamless<br />

image of extraordinary quality and resolution without the expense of specialized<br />

hardware and services.<br />

www.mersive.com<br />

Speaker Robert Balgley, CEO<br />

Session Time Wednesday, May 16 at 16:45<br />

MIRRIAD<br />

MirriAd is an end-to-end marketing solution that can be implemented quickly,<br />

easily and cost-effectively using our online campaign management system. We<br />

provide a new and innovative way for advertisers to reach their target audiences,<br />

and for content owners to generate additional revenue. We have an ever-expanding<br />

library of content, from films and TV series to corporate training<br />

videos and user-generated material – and we’re always on the lookout for new<br />

and exciting content owners to work with.<br />

www.mirriad.com<br />

Speaker Mark Popkiewicz, CEO<br />

Session Time Wednesday, May 16 at 10:30<br />

NUMECENT<br />

Numecent is a start-up which came out of stealth with a bang in March <strong>2012</strong> and<br />

is the inventor of ‘cloudpaging’. This patented technology enables friction-free<br />

digital delivery of native software and other non-linear assets through<br />

virtualization. One of the benefits of cloudpaging is that it can reduce the<br />

network footprint of digital downloads between 20x and 100x and execute them<br />

natively, at full speed, without actually requiring installation. Once cloudpaged,<br />

applications can even run off-line and always under license control.<br />

www.numecent.com<br />

Speaker Osman Kent, Co-Founder and CEO<br />

Session Time Wednesday, May 16 at 15:45<br />



NUMIRA BIOSCIENCES<br />

Numira Biosciences is a leading provider of specialty contract research services<br />

for preclinical drug and device development. Numira’s customers include the top<br />

biopharmaceutical companies and academic research institutions. Through its<br />

next-generation study portal, Numira provides its customers with interactive tools<br />

for accessing, exploring, and communicating about their preclinical study data.<br />

www.numirabio.com<br />

Speaker David Weinstein, CTO<br />

Session Time Wednesday, May 16 at 11:30<br />

PLAYCAST MEDIA SYSTEM<br />

Playcast Media System brings video games to the world’s largest media<br />

distribution platform – Pay TV networks. The Company’s solution delivers<br />

off-the-shelf next generation video games to existing cable, IPTV and hybrid<br />

satellite platforms. We bring cloud gaming to the world’s hundreds of millions of<br />

paying TV subscribers.<br />

www.playcast-media.com<br />

Speaker Guy De Beer, CEO<br />

Session Time Wednesday, May 16 at 17:15<br />

RAYTRIX<br />

Raytrix develops and markets single-lens 3D video cameras based on their<br />

patented high resolution light field technology, offering solutions for Particle<br />

Image Velocimetry (PIV), optical inspection, face capturing, microscopy – as well<br />

as IP for consumer products (mobile phones).<br />

www.raytrix.de<br />

Speaker Christian Perwass, CEO<br />

Session Time Wednesday, May 16 at 17:00


REALVIEW IMAGING LTD.<br />

RealView Imaging Ltd. is developing a revolutionary 3D holographic display and<br />

interface system, initially for medical imaging applications. RealView’s<br />

proprietary technology projects high-res., full color, dynamic, real-time 3D<br />

holograms “floating in open air” allowing direct and precise interaction with and<br />

within the “in air” image by literally touching the image.<br />

www.realview.co.il<br />

Speaker Shaul Gelman, Co-Founder and VP of R&D<br />

Session Time Wednesday, May 16 at 16:00<br />

ROCKETICK<br />

Rocketick is a leading provider of software simulation acceleration, enabling<br />

acceleration of 10x or more for Verilog simulations. The company’s flagship<br />

product, RocketSim, helps semiconductor companies reduce the overall<br />

time to market of new chip designs by up to 30%, allowing development teams to<br />

tape-out with greater confidence.<br />

www.rocketick.com<br />

Speaker Tomer Ben-David, Co-Founder and VP of R&D<br />

Session Time Wednesday, May 16 at 09:20<br />

UBITUS<br />

Ubitus Inc., the technology leader in deploying Cloud-enabled rich media<br />

services, offers innovative cloud computing solutions for device manufacturers,<br />

wired/wireless communication service providers, telecommunication operators<br />

and digital content developers. Founded in 2007 and headquartered in Taipei,<br />

Taiwan, the company now has 150 employees and 4 offices in Tokyo, Beijing,<br />

Guangzhou and Seoul.<br />

www.ubitus.com<br />

Speaker Wesley Kuo, CEO<br />

Session Time Wednesday, May 16 at 11:45<br />



UNITY TECHNOLOGIES<br />

Unity Technologies is revolutionizing the game industry with Unity, its award-winning<br />

breakthrough development platform. Unity Technologies has more than<br />

450,000 registered users worldwide — including Bigpoint, Cartoon Network,<br />

Coca-Cola, Disney, Electronic Arts, LEGO, Microsoft, NASA, Nickelodeon,<br />

Ubisoft, Warner Bros., large and small studios, indies, students and hobbyists<br />

— all using Unity to create games and interactive 3D on the web, mobile,<br />

consoles and beyond. Unity Technologies is aggressively innovating to expand<br />

usability, power and platform reach along with its Asset Store digital content<br />

marketplace and Union distribution service.<br />

www.unity3d.com<br />

Speaker David Helgason, CEO<br />

Session Time Wednesday, May 16 at 10:00<br />

UNIVERSAL ROBOTICS<br />

Universal Robotics is a software company which has brought to market a new<br />

form of artificial intelligence that uses sensor information to learn. Called<br />

Neocortex, it discovers patterns in chaotic environments that are relevant to an<br />

assigned task. It then analyzes those patterns to understand complexity,<br />

improving process. The company has targeted the materials handling industry<br />

as its first market, increasing the flexibility in automated machines. Among<br />

various accolades, Universal won an “Emerging Company to Watch” award from<br />

NVIDIA in 2010.<br />

www.universalrobotics.com<br />

Speaker David Peters, CEO<br />

Session Time Wednesday, May 16 at 17:45




WEDNESDAY, MAY 16 & THURSDAY, MAY 17, <strong>2012</strong><br />

ROOM J<br />

Los Alamos National Laboratory, a leading U.S. national security research<br />

institution, co-locates the Accelerated HPC Symposium at <strong>GTC</strong> <strong>2012</strong> and brings<br />

together world leaders in supercomputing to share knowledge and help solve<br />

the world’s most crucial technology challenges.<br />

Symposium highlights include:<br />

• Learning how accelerator technologies can be leveraged in innovative ways to<br />

advance the state-of-the-art for simulations on large-scale systems<br />

• Establishing hardware and software requirements that can meet the<br />

demands of power, scalability and fault tolerance needed for the next<br />

generation of HPC<br />

• Understanding how legacy codes can be adapted to make use of modern<br />

computing architectures<br />

• Providing a forum for feedback to the vendor community to aid in the adoption<br />

of accelerator technologies


AGENDA<br />

WEDNESDAY, MAY 16<br />

Plenary Session I 14:00–14:45 Opening Keynote with Bill Barth of TACC<br />

14:50–15:15 A New <strong>GPU</strong> Appliance Sorin Faibish (EMC)<br />

15:20–15:45 Accelerator Architectures for HPC Justin Tripp (LANL)<br />

Plenary Session II 16:00–16:25 Adaptive Heterogeneous Computing with OpenCL Simon McIntosh-Smith<br />

(University of Bristol)<br />

16:30–16:55 Accelerating Iterative Linear Solvers Hui Liu (University of Calgary)<br />

17:00–17:25 Efficient AMG on Hybrid <strong>GPU</strong> Clusters Thomas Brandes (SCAI)<br />

17:30–17:55 PISTON: Visualization Portability and Performance Christopher Sewell (LANL)<br />

THURSDAY, MAY 17<br />

Scalability: Hardware and Software<br />

9:00–9:10 Introduction: Justin Tripp (Chair)<br />

9:10–9:20 The FPGA: Another Piece of the Puzzle Justin Tripp (LANL)<br />

9:20–9:30 Increasing Efficiency with Kepler Stephen Jones (NVIDIA)<br />

9:30–9:50 Discussion<br />

9:50–10:00 Break<br />

10:00–10:10 Can You Keep All of the Astronomers Happy All of the Time? Christopher Fluke (Swinburne University of <strong>Technology</strong>)<br />

10:10–10:20 In situ Image Analysis for Large Scale Visualization Christopher Sewell (LANL)<br />

10:20–10:40 <strong>GPU</strong> Acceleration of MapReduce Miao Xin (Junnan University)<br />

10:40–10:50 Discussion<br />

Applications – Methods and <strong>Program</strong>ming Models, Part 1<br />

9:00–9:10 Introduction: Guillaume Colin de Verdiere (Chair)<br />

9:10–9:20 Preconditioning for Large-Scale Linear Solvers Dimitar Lukarski (Karlsruhe Institute of <strong>Technology</strong>)<br />

9:20–9:30 Changing Data Structures for a Changing World Hui Liu (University of Calgary)<br />

9:30–9:40 Leveraging Roadrunner Experiences Jamaludin Mohd-Yusof (LANL)<br />

9:40–9:50 Discussion<br />

9:50–10:00 Break<br />

10:00–10:30 Taming Laser Plasma Interactions: PICon<strong>GPU</strong> Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf)<br />

10:30–10:50 Discussion<br />

Applications – Methods and <strong>Program</strong>ming Models, Part 2<br />

14:00–14:10 The Portability Wall: How hard can it really be? John Stone (Urbana Champaign)<br />

14:10–14:20 Accelerating NAMD James Phillips (University of Illinois)<br />

14:20–14:30 Refitting Legacy Software for the New Reality John Humphrey (EM Photonics)<br />

14:30–14:40 Unstructured Data Structures: An Achilles Heel? Raphael Poncet (CEA)<br />

14:40–14:50 Discussion<br />

14:50–15:00 Break<br />

15:00–15:10 Power: The New Metric Simon McIntosh-Smith (University of Bristol)<br />

15:10–15:20 It’s About Concurrency, Stupid! Stanley Tzeng (UC Davis)<br />

15:20–15:40 Discussion<br />

*Please note: Session details can be found within the daily sessions pages that follow.<br />



SPONSORED BY:<br />

SYNNEX<br />

<strong>GTC</strong> NETWORK<br />

Please visit these Tesla Preferred Partners exhibits and be<br />

entered into a daily drawing to win a free NVIDIA Tesla C2075!<br />

ACE Computers AMAX Appro Aspen Systems<br />

Colfax International Creative Consultants Exxact Technologies Microway<br />

Penguin Computing, Inc Seneca Data Themis


SESSION INFORMATION –<br />

PRE-CONFERENCE TUTORIALS –<br />

MONDAY, MAY 14<br />

MONDAY, MAY 14, 09:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A5<br />

S0005 Languages, APIs and Development Tools for<br />

<strong>GPU</strong> Computing<br />

Get a head start on the conference with this first-day introduction<br />

to key technologies for <strong>GPU</strong> Computing. This 80-minute tutorial<br />

session will cover the key features and differences between the<br />

major programming languages, APIs and development tools<br />

available today. Attendees will also learn several high level design<br />

patterns for consumer, professional and HPC applications, with<br />

practical programming considerations for each.<br />

Speaker(s): Will Ramey (Sr. Product Manager, <strong>GPU</strong><br />

Computing, NVIDIA)<br />

Topic(s): General Interest, Development Tools & Libraries, Application<br />

Design & Porting Techniques (Beginner)<br />

MONDAY, MAY 14, 09:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A3<br />

S0023 NVIDIA OpenGL for <strong>2012</strong><br />

Attend this session to get the most out of OpenGL on NVIDIA<br />

Quadro and GeForce <strong>GPU</strong>s. Topics covered include the latest<br />

advances available for Cg 3.1, the OpenGL Shading Language<br />

(GLSL); programmable tessellation; improved support for<br />

Direct3D conventions; integration with Direct3D and CUDA<br />

resources; bindless graphics; and more. When you utilize the<br />

latest OpenGL innovations from NVIDIA in your graphics<br />

applications, you benefit from NVIDIA’s leadership driving OpenGL<br />

as a cross-platform, open industry standard.<br />

Speaker(s): Mark Kilgard (Principal Software Engineer, NVIDIA)<br />

Topic(s): Computer Graphics, Development Tools & Libraries,<br />

Visualization, Audio, Image and Video Processing (Intermediate)<br />

MONDAY, MAY 14, 09:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM C<br />

S0614 Part 1: Introduction to <strong>GPU</strong> <strong>Program</strong>ming<br />

(Presented by Acceleware)<br />

Join us for an informative introduction to <strong>GPU</strong> <strong>Program</strong>ming. The<br />

session will begin with a brief overview of CUDA and data parallelism<br />

before focusing on the <strong>GPU</strong> programming model. We<br />

will explore the fundamentals of <strong>GPU</strong> kernels, host and device<br />

responsibilities, CUDA syntax and thread hierarchy. A<br />

programming demonstration of a simple CUDA kernel will<br />

be provided.<br />

Introduction to <strong>GPU</strong> <strong>Program</strong>ming<br />


<strong>GPU</strong> kernels<br />

Host vs. device responsibilities<br />

CUDA syntax<br />

Thread hierarchy<br />


Speaker(s): Chris Mason (Product Manager, Acceleware)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />

Development Tools & Libraries (Beginner)<br />

MONDAY, MAY 14, 10:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A2<br />

S0341 See the Big Picture Scalable Visualization<br />

Solutions for System Integrators<br />

NVIDIA Quadro Scalable Visualization Solutions provide many<br />

features for System Integrators who are building large-scale<br />

displays. Come join us in this tutorial session on how to configure<br />

multi-projector systems, stereoscopic and immersive displays.<br />

Speaker(s): Doug Traill (Senior Solutions Architect, NVIDIA)<br />

Topic(s): Visualization (Beginner)<br />

MONDAY, MAY 14, 10:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM B<br />

S0517A <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 1 of 3)<br />

OpenACC is a programming standard for parallel computing on<br />

accelerators (including <strong>GPU</strong>s) using directives. It is designed to<br />

harness the transformative power of heterogeneous computing<br />

systems easily and quickly. In this tutorial you will learn how to<br />

add simple compiler hints to your code to expose parallelism to<br />

the compiler, allowing it to map computation onto an accelerator.<br />

OpenACC directives allow developers to make simple and<br />

portable code changes, enabling an easier migration to<br />

accelerated computing.<br />

This is part 1 of a 3-part tutorial that will take you from an<br />

overview through how to optimize your code. The tutorial starts<br />

with an overview of OpenACC programming in which you will learn<br />

about applying basic OpenACC directives to your code, with<br />

examples. You will also learn more about how <strong>GPU</strong>s execute<br />

parallel programs, and apply this understanding to optimizing<br />

more advanced OpenACC examples to gain larger speedups and<br />

accelerate applications with various types of parallelism.<br />

Lastly, you will see how to use NVIDIA profiling tools to target<br />

your optimizations.<br />

Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA),<br />

Duncan Poole (Senior Manager, HPC, NVIDIA), Cliff Woolley (CUDA<br />

Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

MONDAY, MAY 14, 10:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A3<br />

S0603 <strong>GPU</strong> Ray Tracing<br />

Learn the latest approaches in leveraging <strong>GPU</strong>s for the fastest<br />

possible ray tracing results from experts developing and<br />

leveraging the NVIDIA OptiX ray tracing engine, the team behind<br />

NVIDIA iray, and those making custom renderers. Multiple<br />

rendering techniques, <strong>GPU</strong> programming languages, out-of-core<br />

rendering, and optimal hardware configurations will be covered in<br />

this cutting-edge discussion.<br />

Speaker(s): Phillip Miller (Director, Workstation Software Product<br />

Management, NVIDIA)<br />

Topic(s): Ray Tracing (Beginner)<br />

MONDAY, MAY 14, 10:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM C<br />

S0615 Part 2: Introduction to the <strong>GPU</strong> Architecture and<br />

Memory Model (Presented by Acceleware)<br />

Explore the memory model of the <strong>GPU</strong>. The first part of the<br />

session covers task parallelism and thread cooperation in <strong>GPU</strong><br />

computing. The second part focuses on the different memory<br />

types available on the <strong>GPU</strong>. We will define shared, constant and<br />

global memory and discuss the best locations to store your<br />


application data for optimized performance. A programming<br />

demonstration of shared memory will be delivered.<br />

Introduction to the <strong>GPU</strong> Architecture and Memory Model<br />


Shared memory<br />

Constant memory<br />

Global memory<br />


Speaker(s): Chris Mason (Product Manager, Acceleware)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />

Development Tools & Libraries (Beginner)<br />

MONDAY, MAY 14, 10:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A5<br />

S0624 Introduction to CUDA C<br />

Starting with a background in C or C++, learn everything you need<br />

to know in order to start programming in CUDA C. Beginning with<br />

a “Hello, World” CUDA C program, explore parallel programming<br />

with CUDA through a number of hands-on code examples.<br />

Examine more deeply the various APIs available to CUDA<br />

applications and learn the best (and worst) ways in which to<br />

employ them in applications.<br />

Speaker(s): Justin Luitjens (Devtech Engineer, NVIDIA)<br />

Topic(s): <strong>Program</strong>ming Languages & Techniques (Beginner)<br />

MONDAY, MAY 14, 13:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM B<br />

S0517B <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 2 of 3)<br />

OpenACC is a programming standard for parallel computing on<br />

accelerators (including <strong>GPU</strong>s) using directives. It is designed to<br />

harness the transformative power of heterogeneous computing<br />

systems easily and quickly. In this tutorial you will learn how to<br />

add simple compiler hints to your code to expose parallelism to<br />

the compiler, allowing it to map computation onto an accelerator.<br />

OpenACC directives allow developers to make simple and<br />

portable code changes, enabling an easier migration to<br />

accelerated computing.<br />

This is part 2 of a 3-part tutorial that will take you from an<br />

overview through how to optimize your code. The tutorial starts<br />

with an overview of OpenACC programming in which you will learn<br />

about applying basic OpenACC directives to your code, with<br />

examples. You will also learn more about how <strong>GPU</strong>s execute<br />

parallel programs, and apply this understanding to optimizing<br />

more advanced OpenACC examples to gain larger speedups and<br />

accelerate applications with various types of parallelism.<br />

Lastly, you will see how to use NVIDIA profiling tools to target<br />

your optimizations.<br />

Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA),<br />

Duncan Poole (Senior Manager, HPC, NVIDIA), Cliff Woolley (CUDA<br />

Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

MONDAY, MAY 14, 13:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A2<br />

S0530 Multi-Display Roundtable<br />

Join NVIDIA's multi-display product manager and application engineers<br />

for an interactive discussion of current trends in video walls<br />

and blended multi-projector systems,<br />

and their deployment.<br />

Speaker(s): Andrew Page (Senior Product Manager, NVIDIA), Shalini<br />

Venkataraman (Senior Applied Engineer, NVIDIA), Ian Williams (NVIDIA)<br />

Topic(s): Visualization (Beginner)<br />

MONDAY, MAY 14, 13:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A3<br />

S0604 NVIDIA Advanced Rendering Solutions<br />

The full range of advanced rendering solutions and frameworks<br />

from NVIDIA will be explored in this insightful product and<br />

technology discussion and demonstration. Come learn about the<br />

latest possibilities involving advanced rendering techniques and<br />

how they integrate within commercial products – from production<br />

ray tracing to volumetric and distributed rendering.<br />

Speaker(s): Phillip Miller (Director, Workstation Software Product<br />

Management, NVIDIA)<br />

Topic(s): Ray Tracing (Advanced)<br />

MONDAY, MAY 14, 13:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM C<br />

S0616 Part 3: Debugging <strong>GPU</strong> <strong>Program</strong>s (Presented<br />

by Acceleware)<br />

Get the lowdown on debugging your <strong>GPU</strong> program. This session<br />

includes discussion on debugging techniques and tools to help<br />

you identify issues in your kernels. The latest debugging tools<br />

provided in CUDA 4.1, including Parallel Nsight, cuda-gdb and<br />

cuda-memcheck, will be discussed. A programming<br />

demonstration of Parallel Nsight will be provided.<br />


Speaker(s): Chris Mason (Product Manager, Acceleware)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />

Development Tools & Libraries (Beginner)<br />

MONDAY, MAY 14, 13:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A5<br />

S0629 CUDA Accelerated Compute Libraries<br />

The libraries distributed in the CUDA SDK and offered by third<br />

parties provide a wealth of functions commonly encountered in a<br />

<strong>GPU</strong> acceleration project. Using these libraries can often<br />

significantly shorten the development time of a <strong>GPU</strong> project while<br />

leading to high-performance, high-quality software. In this<br />

tutorial, we will provide an overview of the libraries in the CUDA<br />

SDK, including cuBLAS, cuRAND, NPP and Thrust, and introduce<br />

common use cases. The audience will not only learn about the<br />

strengths of the individual libraries, but also learn about the<br />

decision-making process for selecting the best-suited library for<br />

their project.<br />

Speaker(s): Peter Messmer (NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

MONDAY, MAY 14, 13:00 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A8<br />

S0630 Part 1 of 2: <strong>Program</strong>ming Heterogeneous Manycores<br />

Using Directives (Presented by CAPS)<br />

Directive-based programming is a very promising technology for<br />

dealing with many-core systems. In this context, HPC users can rely on<br />

emerging standards such as OpenACC and OpenHMPP. CAPS will<br />

introduce OpenACC and HMPP directive-based programming


models with companion tools (e.g. for tracing, tuning, debugging):<br />

HMPP Wizard, CULA, ArrayFire, Vampir, Paraver, DDT,<br />

CodeletFinder, etc. The speakers will provide insights on how <strong>GPU</strong><br />

/ CPU can be exploited in a unified manner and how code tuning<br />

issues can be minimized. The discussion will also cover the use of<br />

libraries, which is essential when addressing Many-Core<br />

<strong>Program</strong>ming. Pathscale will present its product supporting<br />

the OpenHMPP programming model.<br />

Speaker(s): Francois Bodin (CAPS), Christopher Bergström (Pathscale)<br />

Topic Area(s): Parallel <strong>Program</strong>ming Languages & Compilers;<br />

Development Tools & Libraries (Beginner)<br />

MONDAY, MAY 14, 14:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A5<br />

S0027A All-In-One Debugging Experience with CUDA-<br />

GDB and CUDA-MEMCHECK<br />

CUDA Debugger tools CUDA-GDB and CUDA-MEMCHECK provide<br />

a whole new feature set to help improve your CUDA application<br />

development cycle. This session is a detailed walk-through of the<br />

key new features and advanced techniques on using CUDA-GDB<br />

and CUDA-MEMCHECK together to improve overall code<br />

productivity. This tutorial will also include live demos.<br />

This session will repeat on Wednesday at 14:00.<br />

Speaker(s): Geoff Gerfin (Technical Manager and Senior Engineer,<br />

NVIDIA), Vyas Venkataraman (Software Engineer, NVIDIA)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

MONDAY, MAY 14, 14:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM B<br />

S0517C <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 3 of 3)<br />

OpenACC is a programming standard for parallel computing on<br />

accelerators (including <strong>GPU</strong>s) using directives. It is designed to<br />

harness the transformative power of heterogeneous computing<br />

systems easily and quickly. In this tutorial you will learn how to<br />

add simple compiler hints to your code to expose parallelism to the<br />

compiler, allowing it to map computation onto an accelerator.<br />

OpenACC directives allow developers to make simple and<br />

portable code changes, enabling an easier migration to<br />

accelerated computing.<br />

This is part 3 of a 3-part tutorial that will take you from an overview<br />

through how to optimize your code. The tutorial starts with an<br />

overview of OpenACC programming in which you will learn about<br />

applying basic OpenACC directives to your code, with examples. You<br />

will also learn more about how <strong>GPU</strong>s execute parallel programs,<br />

and apply this understanding to optimizing more advanced<br />

OpenACC examples to gain larger speedups and accelerate<br />

applications with various types of parallelism. Lastly, you will see<br />

how to use NVIDIA profiling tools to target your optimizations.<br />

Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA),<br />

Duncan Poole (Senior Manager, HPC, NVIDIA), Cliff Woolley (CUDA<br />

Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

MONDAY, MAY 14, 14:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A3<br />

S0522 Introduction to CUDA Fortran<br />

This tutorial will cover various aspects of writing code in CUDA<br />

Fortran, which is the Fortran interface to the CUDA architecture.<br />

Topics covered will include a basic introduction to parallel<br />

programming concepts using CUDA, performance measurements<br />

and metrics, optimization, and multi-<strong>GPU</strong> programming via CUDA<br />

4.0’s peer-to-peer capability and MPI. Several case studies will be<br />

presented as well.<br />

Speaker(s): Massimiliano Fatica (Manager, NVIDIA), Gregory Ruetsch<br />

(Applied Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

MONDAY, MAY 14, 14:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A2<br />

S0601 <strong>GPU</strong>-Based Video Processing Round Table<br />

Have questions, concerns or thoughts about the direction of<br />

<strong>GPU</strong>-based video and image processing? Join NVIDIA engineers<br />

and product managers for a lively discussion of such topics as<br />

application design, multi-<strong>GPU</strong> architecture, data movement,<br />

threading, APIs, and color management as they apply to video and<br />

image processing applications.<br />

Speaker(s): Alina Alt (Applied Engineer, NVIDIA), Andrew Page (Senior<br />

Product Manager, NVIDIA), Thomas True (Senior Applied Engineer,<br />

NVIDIA), Ian Williams (Director of Applied Engineering, NVIDIA), Eric<br />

Young (Manager of Applied Research, NVIDIA)<br />

Topic(s): Audio, Image and Video Processing (Beginner)<br />

MONDAY, MAY 14, 14:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM C<br />

S0617 Part 4: Introduction to Optimizations and Profiling<br />

(Presented by Acceleware)<br />

Learn how to optimize and profile your algorithms for the <strong>GPU</strong>.<br />

This session will cover the essentials of code optimization and will<br />

include: arithmetic optimizations, warps, branching efficiency,<br />

memory latency/occupancy and memory performance<br />

optimizations. Real life commercial examples will be discussed to<br />

highlight the critical aspects of <strong>GPU</strong> optimization techniques. A<br />

programming demonstration using the NVIDIA Visual Profiler will<br />

be included.<br />


Speaker(s): Chris Mason (Product Manager, Acceleware)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />

Development Tools & Libraries (Beginner)<br />

MONDAY, MAY 14, 14:30 (80 MINUTES)<br />

PRE-CONFERENCE TUTORIAL - ROOM A8<br />

S0631 Part 2: <strong>Program</strong>ming Heterogeneous Many-cores<br />

Using Directives (Presented by CAPS)<br />

Directive-based programming is a very promising technology for<br />

dealing with many-core systems. In this context, HPC users can rely on<br />

emerging standards such as OpenACC and OpenHMPP. CAPS will<br />

introduce OpenACC and HMPP directive-based programming<br />

models with companion tools (e.g. for tracing, tuning, debugging):<br />

HMPP Wizard, CULA, ArrayFire, Vampir, Paraver, DDT,<br />

CodeletFinder, etc. The speakers will provide insights on how <strong>GPU</strong><br />

/ CPU can be exploited in a unified manner and how code tuning<br />

issues can be minimized. The discussion will also cover the use of<br />

libraries, which is essential when addressing Many-Core<br />

<strong>Program</strong>ming. Pathscale will present its product supporting<br />

the OpenHMPP programming model.<br />

Speaker(s): Francois Bodin (CAPS), Christopher Bergström (Pathscale)<br />

Topic Area(s): Parallel <strong>Program</strong>ming Languages & Compilers;<br />

Development Tools & Libraries (Beginner)<br />



SESSION INFORMATION<br />

TUESDAY, MAY 15<br />

TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />

ROOM J1<br />

S0102 Flame On: Real-Time Fire Simulation for<br />

Video Games<br />

Fire and explosions are common elements in video games and<br />

other virtual environments. We present a real-time fire simulator<br />

inspired by the paper “Directable, High-Resolution Simulation of<br />

Fire on the <strong>GPU</strong>” [Horvath and Geiger 2009], but this time<br />

implemented entirely in CUDA and targeted at adding interactive<br />

fire to video games. This talk will describe both the tricks necessary<br />

to implement an efficient fluid simulator in CUDA, and techniques<br />

for rendering the results to achieve realistic looking fire.<br />

Speaker(s): Simon Green (Senior Software Engineer, NVIDIA),<br />

Christopher Horvath (Global <strong>Technology</strong> Technical Director, Pixar)<br />

Topic(s): Computer Graphics, Computational Fluid Dynamics (Intermediate)<br />

TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0248 Excitements, Challenges, and Rewards In<br />

Optimizing GP<strong>GPU</strong> Kernels<br />

Learn about the excitements and challenges in optimizing CUDA<br />

kernels for the last two generations of NVIDIA GP<strong>GPU</strong>s.<br />

Autotuning, although crucially important, is not a silver bullet<br />

for porting code from one generation of <strong>GPU</strong> to another. The process<br />

requires many steps: (a) architecture-specific algorithms, (b)<br />

tuning algorithms, (c) finding innovative tricks to handle generic<br />

cases, (d) tweaking <strong>GPU</strong>’s internal scheduling to handle partition<br />

camping, and (e) above all, the dedication of many enthusiastic<br />

programmers. We will share our experiences and discoveries<br />

through the development of MAGMABLAS - a subset of CUDA<br />

BLAS, highly optimized for NVIDIA GP<strong>GPU</strong>s.<br />

Speaker(s): Rajib Nath (Student, University of California San Diego),<br />

Stanimire Tomov (Research Director, University of Tennessee, Knoxville)<br />

Topic(s): Algorithms & Numerical Techniques, Application Design &<br />

Porting Techniques, Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />

ROOM A8<br />

S0268 Virtual Process Engineering - Realtime<br />

Simulation of Multiphase Systems<br />

Realtime simulation and virtual reality with quantitatively correct<br />

physics for industrial processes with multi-scale and multiphase<br />

systems was once a remote dream for process engineering, but is<br />

now becoming reality with CPU-<strong>GPU</strong> hybrid supercomputing.<br />

Numerical and visualization methods for such simulations on<br />

thousands of <strong>GPU</strong>s will be reported with applications in chemical<br />

and energy industries.<br />

Speaker(s): Wei Ge (Professor, Institute of Process Engineering,<br />

Chinese Academy of Sciences)<br />

Topic(s): Computational Fluid Dynamics, Molecular Dynamics,<br />

Computational Physics, Algorithms & Numerical Techniques (Advanced)<br />

TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />

ROOM A7<br />

S0296 A <strong>GPU</strong>-Enabled SPH Method for Micro and<br />

Nanofluidic Simulations<br />

With SPH methods, multi-phase flows within complex geometries<br />

can be efficiently investigated. Physical effects present in<br />

micro- and nanofluidic applications can also be described with little effort<br />

using the SPH methodology. In order to investigate microfluidic<br />

applications relevant to industry, large domains and high spatial<br />

resolutions are required. Therefore, an SPH method for accelerated<br />

computations on <strong>GPU</strong>s is currently being developed. The code features<br />

dynamic casting of computational data into blocks of appropriate<br />

size to fit the <strong>GPU</strong> memory layout. Also, tree-like data structures<br />

for efficient manipulation of particle distributions help to obtain<br />

significant performance gains on <strong>GPU</strong> hardware.<br />

Speaker(s): Daniel Gaudlitz (Research Associate, Technische<br />

Universität München)<br />

Topic(s): Computational Fluid Dynamics, Algorithms & Numerical<br />

Techniques (Intermediate)<br />

TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />

ROOM J3<br />

S0317 Compiling a Parallel Domain Specific Language<br />

to <strong>GPU</strong>s<br />

Discuss techniques for compiling Parallel DSLs to <strong>GPU</strong>s. Verilog<br />

is a Domain Specific Language for Hardware Description. Verilog<br />

users express parallelism with guarded processes similar to<br />

Occam’s guarded commands. Review Verilog semantics, and<br />

different approaches to compiling Verilog to parallel architectures<br />

and to <strong>GPU</strong>s. Discuss challenges with (a) a Verilog description’s<br />

runtime behavior and (b) managing process dependencies. Discuss<br />

approaches and challenges in compiling a parallel DSL to CUDA C.<br />

Speaker(s): Ramesh Narayanaswamy (Principal Engineer, Synopsys Inc.)<br />

Topic(s): Electronic Design Automation, Application Design & Porting<br />

Techniques (Intermediate)<br />

TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />

ROOM K<br />

S0337 High-Throughput Epistasis Screening Using <strong>GPU</strong>s<br />

Epistasis is the interaction of two or more genes in coding for a<br />

biological property. Epistasis is believed to be an important factor<br />

in an individual’s susceptibility to disease, and the search for<br />

epistasis is a major component in the development of<br />

personalized approaches to genomic medicine. Statistical tests<br />

for epistasis are typically confounded by the multiple-testing<br />

problem, that is, the aggregated loss of precision incurred through<br />

repeated hypothesis testing. One way to circumvent this problem<br />

is to simulate a false-discovery rate via resampling. We report<br />

success in using <strong>GPU</strong>s to accelerate these highly compute-intensive<br />

resampling techniques.<br />

Speaker(s): Mark Seligman (Senior Scientist, Insilicos LLC)<br />

Topic(s): Bioinformatics, Life Sciences, Supercomputing,<br />

Cloud Computing (Intermediate)<br />
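
The resampling idea in the S0337 abstract can be illustrated with a small permutation sketch. It is not the speakers' method: the statistic (`interaction_score`), the function names, and the threshold-based false-discovery estimate are all assumptions chosen for illustration. The phenotype labels are shuffled to break any real association, every gene pair is re-scored, and the average number of null hits is compared with the number of observed hits. Each pair is scored independently, which is the kind of work that parallelizes naturally.

```python
import itertools
import random


def interaction_score(g1, g2, phenotype):
    """Toy statistic (an assumption): |covariance| between g1*g2 and the phenotype."""
    product = [a * b for a, b in zip(g1, g2)]
    n = len(phenotype)
    mean_x = sum(product) / n
    mean_y = sum(phenotype) / n
    return abs(sum((x - mean_x) * (y - mean_y)
                   for x, y in zip(product, phenotype)) / n)


def count_hits(genotypes, phenotype, threshold):
    """Number of gene pairs whose score reaches the threshold."""
    pairs = itertools.combinations(range(len(genotypes)), 2)
    return sum(interaction_score(genotypes[i], genotypes[j], phenotype) >= threshold
               for i, j in pairs)


def empirical_fdr(genotypes, phenotype, threshold, n_perm=100, seed=0):
    """Estimate the false-discovery rate at `threshold` by label permutation."""
    observed_hits = count_hits(genotypes, phenotype, threshold)
    if observed_hits == 0:
        return 0.0
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        null_hits += count_hits(genotypes, shuffled, threshold)
    # Expected false hits per permutation, relative to the observed hits.
    return min(1.0, (null_hits / n_perm) / observed_hits)
```

The cost grows with the number of pairs times the number of permutations, which is why the abstract describes these techniques as compute-intensive.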

TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />

ROOM A2<br />

S0395 <strong>GPU</strong> Enablement in Adobe Photoshop<br />

Photoshop is one of the most popular products in history. It<br />

attempts to delight its customers with an immersive experience.<br />

Since CS4, Adobe has been tapping into the horsepower of the<br />

<strong>GPU</strong> to create a compelling playground for the imaginations of<br />

creative pros. Please join us to review the latest developments on<br />

how <strong>GPU</strong>s have been an enabling force.<br />

Speaker(s): Jeff Chien (Adobe Systems), Jerry Harris (Senior Computer<br />

Scientist II, Adobe Systems)<br />

Topic(s): Digital Content Creation & Film, Audio, Image and Video<br />

Processing (Beginner)<br />

TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />

ROOM C<br />

S0419A Optimizing Application Performance with CUDA<br />

Profiling Tools<br />

NVIDIA provides two powerful profiling tools that you can use to<br />

maximize your application’s performance. The NVIDIA Visual Profiler<br />

helps you understand your application’s behavior with a detailed<br />

timeline and data from <strong>GPU</strong> performance counters. The Visual<br />

Profiler also provides an automatic, data-driven analysis engine that<br />

provides suggestions on potential optimization strategies for your<br />

application. Nvprof is a command-line profiler that provides<br />

gprof-like functionality for the <strong>GPU</strong>. Nvprof provides summary<br />

information about where your application is spending the most time,<br />

so that you can focus your optimization efforts. This session will<br />

provide a step-by-step walkthrough of both of these profiling tools,<br />

showing how you can use these tools to identify optimization<br />

opportunities at the application, kernel, and source-line levels.<br />

This session will repeat Wednesday at 14:00 (S0419B).<br />

Speaker(s): David Goodwin (Software Engineer, NVIDIA)<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />

ROOM J2<br />

S0527 <strong>GPU</strong>s and the Next-Generation Aerial Surveillance<br />

Graphics processors are already used for computationally<br />

intensive video tasks in many ISR (Intelligence, Surveillance,<br />

Reconnaissance) applications; a <strong>GPU</strong>-based system for video<br />

enhancement and analytics outperforms a similarly priced<br />

CPU-based system 5-to-1 at HD resolutions. Our initial tests on 64<br />

megapixel Wide Area Aerial Surveillance (WAAS) data show at least<br />

10x speedup with tasks such as super-resolution or moving target<br />

indication. In this talk, we’ll discuss unique design and<br />

implementation challenges of real-time processing of very large<br />

video data sets. We will demonstrate our existing <strong>GPU</strong>-based<br />

software, IKENA ISR, and discuss its video-processing pipeline and<br />

innovative processing solutions that promise to dramatically<br />

expand capabilities of emerging aerial surveillance platforms.<br />

Speaker(s): Nikola Bozinovic (CTO, MotionDSP)<br />

Topic(s): General Interest (Beginner)<br />

TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />

ROOM A1<br />

S0607 High Performance 3D Perception<br />

The path to general purpose graphics programming was driven by<br />

computer graphics: the process of rendering 3D models into 2D<br />

viewpoints. With the advent of flexible programming of GP<strong>GPU</strong><br />

processing, this process can be reversed. 3D perception is the<br />

problem of inferring structure and motion of the physical world<br />

from 2D and 3D measurements. In this talk, we will demonstrate<br />

the role GP<strong>GPU</strong> plays in a diverse set of applications in high-speed<br />

3D perception and discuss optimization of these techniques for the<br />

GP<strong>GPU</strong>. We also demonstrate several capabilities of future<br />

systems which are enabled by GP<strong>GPU</strong> technologies.<br />

Speaker(s): Chris Slaughter (President, University of Texas Perception,<br />

Lynx Labs)<br />

Topic(s): Computer Vision (Beginner)<br />

TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />

ROOM J2<br />

S0040 Introducing CUDA in KBE Applications for Digital<br />

Vehicle Development <strong>Program</strong>s<br />

Get the latest developments in next-generation Knowledge Based<br />

Engineering (KBE) software, which provides real results over the<br />

traditional design approach. Today there are numerous KBE<br />

applications in vehicle ergonomics, suspension, NVH, safety,<br />

regulations, etc., which involve huge numbers of iterations<br />

and mathematical algorithms. With <strong>GPU</strong> computing and CUDA, the<br />

KBE kernel is restructured to incorporate a parallel programming<br />

model, which helps the applications run faster, reducing run times<br />

from hours to seconds. The KBE geometry kernel also benefits from<br />

enabling CUDA in topology-based operations, which<br />

take a lot of time when performed on the CPU.<br />

Speaker(s): Avijit Santra (Project Manager, Knowledge Based<br />

Engineering, Tata Motors Limited)<br />

Topic(s): General Interest (Intermediate)<br />

TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />

ROOM K<br />

S0083 Swift: A <strong>GPU</strong>-based Smith-Waterman Sequence<br />

Alignment <strong>Program</strong><br />

This session describes Swift, a <strong>GPU</strong>-based Smith-Waterman<br />

implementation for aligning short DNA sequences to large<br />

genomes. Swift has been designed to reduce computation time<br />

and lower hardware cost. Also, unlike other leading <strong>GPU</strong>-based<br />

Smith-Waterman sequence alignment programs like CUDASW++<br />

and SWCUDA which focus on protein sequence alignment, Swift<br />

has been developed for DNA sequence alignment. Swift performs<br />

200x faster than CUDASW++ using a test data set containing 1000<br />

reads (100 bases each) and 1000 references (1000 bases each),<br />

and it performs 11x faster than the CPU-based implementation of<br />

Smith-Waterman using 24 million reads (100 bases each) and<br />

human chromosome 1.<br />

Speaker(s): Pankaj Gupta (Bioinformatics Application Developer, St<br />

Jude Children’s Research Hospital)<br />

Topic(s): Bioinformatics (Beginner)<br />
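
For readers unfamiliar with the algorithm behind S0083, the following is a minimal CPU reference sketch of Smith-Waterman local alignment scoring. It is not Swift's implementation: the scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are assumptions chosen for illustration, and the function name is hypothetical.

```python
def smith_waterman_score(read, reference, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Uses a linear gap penalty and keeps only two rows of the DP table.
    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(reference) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            substitution = match if read[i - 1] == reference[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align read[i-1] with reference[j-1]
                prev[j] + gap,               # gap in the reference
                curr[j - 1] + gap,           # gap in the read
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

The work is proportional to the product of the two sequence lengths for every read-reference pair, which is what makes large read sets against long references expensive on a CPU.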

TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />

ROOM A7<br />

S0258 Sailfish: Lattice Boltzmann Fluid Simulations with<br />

<strong>GPU</strong>s and Python<br />

Learn how Run-Time Code Generation (RTCG) techniques allowed<br />

for fast development of a lattice Boltzmann (LB) fluid dynamics<br />

solver called Sailfish. Sailfish is completely open source, supports<br />

a wide variety of LB models (single and multiple relaxation times,<br />

the entropic model; single and binary fluids) and can take<br />

advantage of multiple <strong>GPU</strong>s. Even though the project is written<br />

predominantly in Python, no performance compromises are made.<br />

This talk will introduce the basic design principles of Sailfish and<br />

illustrate how RTCG allows one to exploit the power of <strong>GPU</strong>s with<br />

minimal programmer effort.<br />

Speaker(s): Michal Januszewski (PhD Student/Software Engineer,<br />

University of Silesia in Katowice/Google Switzerland)<br />

Topic(s): Computational Fluid Dynamics, Computational Physics,<br />

Development Tools & Libraries (Intermediate)<br />
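
The run-time code generation idea from S0258 can be sketched in a few lines of Python. This does not reproduce Sailfish's actual templates or API: the function `generate_density_kernel`, the memory layout, and the generated kernel are hypothetical. The sketch only shows the principle of describing a lattice model in Python and emitting unrolled CUDA C source for it as a string, which a run-time compiler could then build.

```python
# D2Q9 lattice: nine discrete velocity directions in two dimensions.
D2Q9 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
        (1, 1), (-1, 1), (-1, -1), (1, -1)]


def generate_density_kernel(velocities, name="compute_density"):
    """Emit CUDA C source that sums all distributions at a node.

    The sum over directions is unrolled at generation time, so the same
    Python code serves any lattice model passed in `velocities`.
    Assumes a structure-of-arrays layout: f[direction * n + node].
    """
    terms = " + ".join(f"f[{i} * n + idx]" for i in range(len(velocities)))
    return (
        f"__global__ void {name}(const float *f, float *rho, int n) {{\n"
        "    int idx = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (idx < n) {\n"
        f"        rho[idx] = {terms};\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(generate_density_kernel(D2Q9))
```

Because the loop over directions is resolved in Python, the emitted kernel contains no model-specific branching, which is one way a high-level project can avoid paying a run-time cost for its flexibility.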

TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />

ROOM J3<br />

S0329 Using <strong>GPU</strong>s to Speedup Computational Lithography<br />

In this paper we show how <strong>GPU</strong>s can be used to significantly<br />

speed up computational lithography, which is heavily used in the<br />

Electronic Design Automation (EDA) industry. In particular, we<br />

demonstrate a noticeable performance increase in several basic<br />

optical lithography algorithms as well as the speedup of the<br />

full-chip verification software, crucial parts of which were ported


to NVIDIA’s <strong>GPU</strong>s. We summarize the advantages, disadvantages<br />

and challenges of using <strong>GPU</strong>s and compare them to more traditional<br />

multithreading and distributed computing alternatives for the<br />

same applications.<br />

Speaker(s): Constantin Chuyeshov (Algorithm Engineer, Cadence<br />

Design Systems)<br />

Topic(s): Electronic Design Automation (Intermediate)<br />

TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />

ROOM A1<br />

S0404 Computer Vision Libraries with <strong>GPU</strong>s<br />

Learn how Computer Vision libraries can take advantage of <strong>GPU</strong>s.<br />

Computer Vision algorithms are extremely well suited for <strong>GPU</strong><br />

architectures because they demand large computational power<br />

that <strong>GPU</strong>s offer over CPUs. This talk provides an overview of the<br />

different <strong>GPU</strong> libraries (such as OpenCV, <strong>GPU</strong>CV, PCL, and NPP)<br />

and online resources (<strong>GPU</strong>4Vision and OpeNVIDIA)<br />

available for developers today. Examples and demonstrations of<br />

practical applications making use of these libraries will also be<br />

shown throughout the talk.<br />

Speaker(s): Eric Young (Manager of Developer <strong>Technology</strong> Professional<br />

and Consumer Applications, NVIDIA)<br />

Topic(s): Computer Vision, Audio, Image and Video Processing (Beginner)<br />

TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />

ROOM B<br />

S0430 Developing Next-Generation CUDA Acceleration<br />

in Wolfram’s Mathematica with Parallel Nsight<br />

Since version 8, Mathematica offers advanced support for <strong>GPU</strong><br />

acceleration with optimized CUDA functions and a built-in<br />

framework for developing scientific CUDA kernel code. In this<br />

session, the Wolfram development team will share their<br />

experience developing their next-generation CUDA support in<br />

Mathematica. From the unique ability of Parallel Nsight to attach<br />

its CUDA debugger to a running process, the new parallel Warp<br />

Watch for warp-wide variable views and expression evaluation, to<br />

the latest runtime CUDA profiling experiments, they will<br />

demonstrate how they were able to take advantage of Parallel<br />

Nsight to get the most out of CUDA and the <strong>GPU</strong>.<br />

Speaker(s): Abdul Dakkak (Kernel Developer, Wolfram), Sebastien Domine<br />

(Sr. Director, Software Engineering, Developer Tools, NVIDIA), Ulises<br />

Cervantes-Pimentel (Senior Kernel Developer, Wolfram)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />

ROOM M<br />

S0618 Best Practices of an 800TFlop Hybrid<br />

Supercomputer Implementation (Presented by Appro)<br />

Learn about the “Frontier Computing System”, deployed by Appro<br />

for the University of Tsukuba Center for Computational Sciences in<br />

Japan containing over half a million <strong>GPU</strong> cores. Learn how<br />

reliability, availability, manageability and compatibility were<br />

essential for this successful 800TF hybrid supercomputing<br />

implementation. Explore new techniques in how HA-PACS is<br />

accelerating large scale parallel code by combining CPU/<strong>GPU</strong><br />

processing cluster configurations for scientific research, such as<br />

astrophysics and climate modeling. Learn how to improve data I/O<br />

performance and overcome memory size limitations in hybrid systems<br />

configured with the Lustre File System, offering the best<br />

performance per dollar and excellent memory capacity per FLOP.<br />

Speaker(s): Taisuke Boku (Deputy Director of Center for Computational<br />

Sciences at University of Tsukuba), Steve Lyness (VP of HPC Solutions<br />

Engineering, Appro)<br />

Topic(s): Supercomputing, Astronomy & Astrophysics (Intermediate)<br />

TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0800 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight<br />

development team! Whether you would like a private meeting to<br />

discuss specific product features or test out your application with<br />

the latest version of Nsight, or you just want to hang out with the<br />

team after attending one of the exciting training sessions, the lab is<br />

a great place to learn everything you ever wanted to know about the<br />

tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM J2<br />

S0013 <strong>GPU</strong>s for Fast Triggering in NA62 Experiment<br />

We discuss an approach for using commercial graphics processors<br />

(<strong>GPU</strong>s) at the earliest trigger stages in high-energy physics<br />

experiments, and study its implementation on a real trigger<br />

system in preparation. In particular, we focus on the possibility of<br />

reconstructing rings in a Cherenkov detector as a building block of a<br />

selective trigger condition for rare decay searches. Latency and<br />

processing rate measurements on several state-of-the-art<br />

devices are presented, and the potential issues related to<br />

processing time jitter and data transfer throughput are discussed.<br />

Speaker(s): Gianluca Lamanna (Researcher, CERN), Marco Sozzi<br />

(Associate Professor, Physics Department of Pisa)<br />

Topic(s): General Interest (Intermediate)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM A8<br />

S0031 Unstructured Grid Numbering Schemes for <strong>GPU</strong><br />

Coalescing Requirements<br />

Learn how to achieve high performance for computational fluid<br />

dynamics (CFD) solvers over unstructured grids using numbering<br />

schemes tailored for <strong>GPU</strong> coalescing requirements. Using these<br />

techniques, unstructured grid CFD solvers can make more<br />

effective use of memory bandwidth, which is an otherwise<br />

significant performance bottleneck that has so far led to relatively<br />

limited performance gains on <strong>GPU</strong>s in comparison to structured<br />

grid CFD solvers. Performance benchmarks will be shown using<br />

the Jet Engine Noise Reduction (JENRE) code.<br />

Speaker(s): Andrew Corrigan (Research Mathematician, Naval<br />

Research Laboratory), Johann Dahm (University of Michigan)<br />

Topic(s): Computational Fluid Dynamics, Algorithms & Numerical<br />

Techniques, Computational Physics (Advanced)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM A7<br />

S0251 RANS CFD Solver on Fermi<br />

SJTU-NS3D is an in-house CFD code co-developed by SJTU and<br />

COMAC for large civil airplanes. It solves the 3D Reynolds-averaged<br />

Navier-Stokes (RANS) equations on structured grids using the finite<br />

volume method and can be used in designing wing models. In<br />

this talk, we will present the design and further optimization of the<br />

CUDA version of SJTU-NS3D, which achieves a 20-fold speedup for the<br />

standard M6 wing model and a 37-fold speedup for a wing model<br />

candidate from COMAC on a single Fermi C2050.<br />

Speaker(s): James Lin (Assistant Professor, Shanghai Jiao<br />

Tong University)<br />

Topic(s): Computational Fluid Dynamics (Intermediate)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0255 Telecom Systems Simulations Acceleration via<br />

CPU/<strong>GPU</strong> Co-Processing: Turbo Codes Case Study<br />

Learn how the effort to accelerate simulations of a<br />

Serially Concatenated turbo code (SCCC) led to<br />

new techniques applicable to a broad range of non-natively<br />

parallel physical layer telecommunication systems simulations.<br />

The overall architectural features of CUDA inspired<br />

newer parallelization techniques involving algorithm engineering;<br />

the simulation acceleration attained for the iterative SCCC decoder<br />

is an example of the efficiency gained by leveraging<br />

heterogeneous <strong>GPU</strong>-CPU coprocessing concepts. Attendees<br />

will dive deep into data set and task organization strategies,<br />

as well as results and insights, all presented<br />

and discussed in detail.<br />

Speaker(s): Paolo Spallaccini (System Engineer, Ericsson)<br />

Topic(s): Algorithms & Numerical Techniques, Audio, Image and Video<br />

Processing, Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM A2<br />

S0300 Jet: A Domain-Specific Approach to Parallelism<br />

for Film Fluid Simulation<br />

Discover how a domain-specific language can provide not only fast<br />

parallel performance but also a simpler user experience in an<br />

environment that highly values flexibility. This talk will present the<br />

Jet language and heterogeneous compiler built on the LLVM<br />

compiler framework that enables efficient generation of X86<br />

machine code or NVIDIA PTX for stencil computation on<br />

structured grids. We show that moving target-specific<br />

optimizations upstream into the compiler can greatly improve the<br />

ability to manipulate the logic of the solver and thus lower the<br />

barrier-to-entry for artists and developers without compromising<br />

on performance.<br />

Speaker(s): Dan Bailey (R&D, Double Negative)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Digital<br />

Content Creation & Film, Computational Fluid Dynamics (Intermediate)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM L<br />

S0343 A Quantum Chemistry Domain-Specific Language<br />

For Heterogeneous Clusters<br />

This talk discusses the development of a Domain-Specific Language<br />

(DSL), the tools and the related runtime for efficiently generating<br />

Tensor Contractions (generalized matrix multiplications), an<br />

important part of many quantum chemistry methods (e.g. Coupled<br />

Cluster Theory). Starting from a high level description of the<br />

computation, the tool analyses it and generates optimized C,<br />

OpenCL or CUDA implementations. The runtime, supporting a<br />

task based computation model, is then able to execute the<br />

generated code on <strong>GPU</strong>-accelerated heterogeneous large scale<br />

clusters, maximizing the utilization of the processing elements<br />

and minimizing communication costs.<br />

Speaker(s): Antonino Tumeo (Research Scientist, Pacific Northwest<br />

National Laboratory), Oreste Villa (Research Scientist, Pacific<br />

Northwest National Laboratory)<br />

Topic(s): Quantum Chemistry, Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM K<br />

S0376 Dynamic <strong>Program</strong>ming on CUDA: Finding the Most<br />

Similar DNA Sequence<br />

Learn a couple of techniques to speed up compute-heavy Dynamic<br />

<strong>Program</strong>ming algorithms on the <strong>GPU</strong>. Our particular problem<br />

concerned DNA sequences: given a reference sequence, how do we find<br />

the one most similar to it in a large database? The sequences<br />

are millions of characters long, and their similarity is calculated with<br />

a (quadratic) DP algorithm, which makes the problem very tough<br />

even for <strong>GPU</strong>s. We speed up both the theoretical and practical<br />

sides: we present programming techniques that enable Dynamic<br />

<strong>Program</strong>ming to be performed at the hardware speed, and<br />

improvements to the algorithm itself that drastically lower the<br />

execution time.<br />

Speaker(s): Grzegorz Kokosinski (Software Engineer, IBM Poland),<br />

Krzysztof Zarzycki (Senior Software Developer, IBM Poland)<br />

Topic(s): Bioinformatics, Algorithms & Numerical Techniques<br />

(Intermediate)<br />

TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />

ROOM J3<br />

S0520 Using <strong>GPU</strong>s to Speedup Chip Verification<br />

As VLSI designs become more complex, the process of verifying<br />

them becomes increasingly expensive and time consuming.<br />

Verification of such designs has become quite taxing as they take<br />

simulators to the edge in terms of both runtime demands and<br />

host memory requirements. In order to reduce verification time,<br />

different verification methodologies have been adopted including<br />

the use of emulators. However, emulators’ price point is high and<br />

so is the engineering time to set them up. Rocketick develops a<br />

Verilog co-simulator that uses <strong>GPU</strong>s as an acceleration platform.<br />

Rocketick’s product, RocketSim®, is now part of NVIDIA’s design<br />

flow and it is being used to accelerate simulations by 10X-30X<br />

compared to the standard simulator and to reduce the memory<br />

footprint by 5X. In this session RocketSim® will be presented using<br />

some real-world examples of verification flows.<br />

Speaker(s): Tomer Ben-David (Co-Founder and Vice President,<br />

R&D, Rocketick)<br />

Topic(s): Electronic Design Automation (Beginner)<br />

TUESDAY, MAY 15, 10:30 (80 MINUTES)<br />

KEYNOTE – HALL 1<br />

S3000 Opening Keynote<br />

Do not miss this opening keynote, featuring Jen-Hsun Huang, CEO<br />

and Co-Founder of NVIDIA. Hear about what’s next in computing<br />

and graphics, and preview disruptive technologies and exciting<br />

demonstrations from across industries. Jen-Hsun co-founded<br />

NVIDIA in 1993 and has served since its inception as president,<br />

chief executive officer and a member of the board of directors.<br />

Speaker(s): Jen-Hsun Huang (CEO & Co-Founder, NVIDIA)<br />

Topic(s): General Interest (All Levels)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM A3<br />

S0024 <strong>GPU</strong>-Accelerated Path Rendering<br />

Standards such as Scalable Vector Graphics (SVG), PostScript,<br />

TrueType outline fonts, and immersive web content such as Flash<br />

depend on a resolution-independent 2D rendering paradigm that<br />

<strong>GPU</strong>s have not traditionally accelerated. This session explains a<br />

new opportunity to greatly accelerate vector graphics, path<br />

rendering, and immersive web standards using the <strong>GPU</strong>. By<br />

attending, you will learn how to write OpenGL applications that<br />

accelerate the full range of path rendering functionality. Not only<br />

will you learn how to render sophisticated 2D graphics with<br />

OpenGL, you will learn to mix such resolution-independent<br />

2D rendering with 3D rendering and do so at dynamic,<br />

real-time rates.<br />

Speaker(s): Mark Kilgard (Principal Software Engineer, NVIDIA)<br />

Topic(s): Computer Graphics, <strong>GPU</strong> Accelerated Internet, Digital<br />

Content Creation & Film, Visualization (Beginner)<br />

TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />

ROOM J3<br />

S0069 <strong>GPU</strong> Computing Advances in 3D<br />

Electromagnetic Simulation<br />

Learn about the latest developments in <strong>GPU</strong> acceleration for 3D<br />

Full Wave Electromagnetic simulation. The latest version of CST<br />

Studio Suite supports the full range of Tesla products on both<br />

Windows and Linux operating systems. Using <strong>GPU</strong>, multi-<strong>GPU</strong> and<br />

MPI-<strong>GPU</strong> Computing drastically reduces the simulation times for<br />

CST customers. We will provide a status update on current and future <strong>GPU</strong><br />

developments at CST and share detailed simulation results.<br />

Speaker(s): Andreas Buhr (Department Manager - Performance<br />

Optimization, CST AG), Fabrizio Zanella (Systems Manager, CST<br />

of America)<br />

Topic(s): Electronic Design Automation (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM C<br />

S0088 Point Cloud Library (PCL) on CUDA<br />

The Point Cloud Library (PCL - http://pointclouds.org) is a large<br />

scale, open project for 3D point cloud processing. The PCL<br />

framework contains numerous state-of-the-art algorithms<br />

including filtering, feature estimation, surface reconstruction,<br />

registration, model fitting and segmentation. Due to the massively<br />

parallel nature of many of the above algorithms, GP<strong>GPU</strong><br />

acceleration holds great potential for achieving real-time<br />

performance in numerous applications. In this work we<br />

demonstrate some of the recent advances in GP<strong>GPU</strong><br />

programming for 3D point cloud processing, and outline plans for<br />

future development.<br />

Speaker(s): Michael Dixon (Research Engineer, Willow Garage, Inc),<br />

Radu Rusu (Research Scientist, Willow Garage, Inc)<br />

Topic(s): Computer Vision, Algorithms & Numerical Techniques,<br />

Stereoscopic 3D, Machine Vision (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM A5<br />

S0254 Graphics in the Cloud - How NVIDIA is Enabling<br />

Cloud Visualization<br />

Engineers, artists, scientists, and gamers are the most<br />

demanding visual thinkers on the planet, and as such have not<br />

been willing to move their computing environments to the<br />

infamous “cloud”. These remotely accessed systems are seen as<br />

slow and not up to the visual experience that users expect when<br />

dealing with these types of applications. NVIDIA aims to change<br />

that perception with the NVIDIA Virtual Graphics Platform. In this<br />

session you will hear about the technologies behind accelerating<br />

graphics in the cloud, and some of the industry partnerships that<br />

are enabling it.<br />

Speaker(s): Will Wade (Manager, Quadro Advanced Technologies, NVIDIA)<br />

Topic(s): Cloud Computing, Visualization, Computer Graphics<br />

(Intermediate)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0313 Understanding and using Atomic<br />

Memory Operations<br />

Atomic memory operations provide powerful communication and<br />

coordination capabilities for parallel programs, including the<br />

well-known operations compare-and-swap and fetch-and-add. The<br />

atomic operations enable the creation of parallel algorithms and<br />

data structures that would be very difficult (or<br />

impossible) to express without them - for example: shared parallel<br />

data structures, parallel data aggregation, and control primitives<br />

such as semaphores and mutexes. In this talk we will use examples<br />

to describe atomic operations, explain how they work, and discuss<br />

performance considerations and pitfalls when using them.<br />

Speaker(s): Stephen Jones (CUDA Developer, NVIDIA), Lars Nyland<br />

(Compute Architect, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques, Parallel <strong>Program</strong>ming<br />

Languages & Compilers (Advanced)<br />
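
The two operations named in the S0313 abstract can be illustrated with a small CPU-side simulation. This is only a sketch of the semantics, not GPU code and not the speakers' material: the `AtomicCell` class is hypothetical, and a Python lock stands in for the atomicity that hardware provides. It shows compare-and-swap returning the old value, and fetch-and-add built from a compare-and-swap retry loop.

```python
import threading


class AtomicCell:
    """A single integer with atomic load and compare-and-swap (simulated)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell holds `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def fetch_and_add(cell, delta):
    """Add `delta` atomically using a CAS retry loop; return the prior value."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta) == old:
            return old  # no other thread changed the cell in between


def parallel_count(n_threads=4, n_increments=1000):
    """Have several threads increment one shared counter without losing updates."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(n_increments):
            fetch_and_add(cell, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

The retry loop also hints at one of the performance pitfalls the abstract mentions: when many threads contend for the same location, failed compare-and-swap attempts are repeated work.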

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM N<br />

S0319 Advanced Driver Assistance System Testing<br />

using OptiX<br />

Learn in this session how AUDI AG and its partners make use<br />

of OptiX as a unified platform for the simulation of perception<br />

sensors utilizing different physical measurement principles, e.g.<br />

Video Camera, LIDAR, Ultrasonic, etc. The aim is to generate<br />

synthetic sensor data with realistic measurement errors for<br />

testing Advanced Driver Assistance Systems. Get details about the<br />

challenges they faced during the implementation of the necessary<br />

tools for validating the sensor models and join the discussion<br />

when they describe the upcoming challenges related to real-time<br />

ray tracing and advanced material descriptions when multiple<br />

sensors are simulated simultaneously.<br />

Speaker(s): Erwin Roth (Researcher, Technische Universitaet<br />

Muenchen), Tugkan Calapoglu (Lead Graphics Software Developer,<br />

VIRES Simulationstechnologie GmbH)<br />

Topic(s): Ray Tracing, Machine Vision (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />

ROOM A8<br />

S0321 <strong>GPU</strong>-Based Monte Carlo Ray Tracing Simulation<br />

for Solar Power Plants<br />

Learn about real-time simulations of Concentrating Thermal Solar<br />

Power using <strong>GPU</strong> technology to enable performance optimization<br />

of these utility-scale plants. By leveraging the power of <strong>GPU</strong>s and<br />

the parallel aspect of the field of thousands of sun-tracking mirrors,<br />

we have been successful in cutting the computation time by<br />

orders of magnitude compared with the minutes to hours of<br />

runtime previously required. We will present an overview of the problem<br />

domain and describe how we used the <strong>GPU</strong> to derive a Monte<br />

Carlo physics ray tracing method to simulate the flux reflected by<br />

the mirrors onto the solar receiver.<br />

Speaker(s): Michel Izygon (Tietronix Software), Claus Nilsson<br />

(<strong>Program</strong>mer, Tietronix Software)<br />

Topic(s): Energy Exploration, Computational Physics, Ray Tracing<br />

(Beginner)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM J2<br />

S0328 Best Practices in <strong>GPU</strong>-Based Video Processing<br />

The combination of the <strong>GPU</strong>’s massively parallel compute engine<br />

with extremely high memory bandwidth and new programming<br />

paradigms such as CUDA and OpenCL has made the <strong>GPU</strong> well<br />

suited for image and video processing applications. This session<br />

will explore best practices and techniques for the development of<br />

efficient <strong>GPU</strong>-based video and image processing applications.<br />

Topics to be discussed include image segmentation and threading<br />

models for efficient parallelism, optimal memory usage strategies<br />

to reduce expensive data movement, as well as multi-<strong>GPU</strong><br />

considerations. Case studies and examples specific to video and<br />

image processing will be presented.<br />

Speaker(s): Thomas True (Applied Engineer, NVIDIA)<br />

Topic(s): Audio, Image and Video Processing, Digital Content Creation &<br />

Film, Computer Vision, Medical Imaging & Visualization (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM J1<br />

S0364 Interacting with Huge Particle Simulations in<br />

Maya with the <strong>GPU</strong><br />

We present a plug-in for Maya which enables an artist to simulate<br />

huge particle counts in real-time by leveraging the NVIDIA <strong>GPU</strong>.<br />

Being able to interact with the simulation opens up new<br />

possibilities for modifying the workflow. We will demonstrate the<br />

plug-in, and provide insight into the algorithms used.<br />

Speaker(s): Wil Braithwaite (Senior Applied Engineer, NVIDIA)<br />

Topic(s): Digital Content Creation & Film, Computational Fluid<br />

Dynamics, Visualization (Beginner)<br />

TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />

ROOM A1<br />

S0412 A 2-Petaflops Stencil Application with<br />

Stereoscopic 3D Visualization - Gordon Bell Prize 2011<br />

Most stencil applications such as CFD and structure analysis are<br />

memory-bound problems. <strong>GPU</strong>s offer high performance in both<br />

computation and memory bandwidth, which suits such problems. The<br />

TSUBAME 2.0 supercomputer, with 4224 <strong>GPU</strong>s, has been in operation since<br />

November 2010. We study dendritic solidification of metal by solving<br />

the phase-field model. A performance of 2.0 petaflops was<br />

achieved for a 4,096x6,500x10,400 mesh on 4000 <strong>GPU</strong>s, and we<br />

received the ACM Gordon Bell Prize in 2011. We also demonstrated<br />

several large-scale stencil applications (Lattice Boltzmann,<br />

weather prediction and so on) with stereoscopic 3D visualization.<br />

Speaker(s): Takayuki Aoki (Professor, Tokyo Institute of <strong>Technology</strong>)<br />

Topic(s): Supercomputing, Computational Fluid Dynamics, Climate &<br />

Weather Modeling, Stereoscopic 3D (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM L<br />

S0418 High Productivity Computational Finance on <strong>GPU</strong>s<br />

Learn how Aon Benfield helps clients use <strong>GPU</strong>s to develop and<br />

accelerate Monte Carlo derivatives pricing models. We will<br />

present our PathWise software tools used by actuaries and quants<br />

in order to rapidly develop and deploy production-quality, <strong>GPU</strong>-grid-enabled<br />

Monte Carlo models, using only high-level languages and<br />

tools without requiring any knowledge of CUDA or C/C++. We will<br />

describe our approach of using Code Generation, Visual<br />

<strong>Program</strong>ming, Domain Specific Languages and scripting<br />

languages to create a High Productivity Computing software stack<br />

for financial services applications.<br />

Speaker(s): Aamir Mohammad (Associate Director, Aon Benfield<br />

Securities), Peter Phillips (SVP, Aon Benfield Securities)<br />

Topic(s): Finance, Application Design & Porting Techniques, Parallel<br />

<strong>Program</strong>ming Languages & Compilers (Beginner)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM A7<br />

S0434 Schlumberger LiveQuest: Application Delivery<br />

and Collaboration Solution<br />

The LiveQuest application delivery and collaboration solution<br />

allows petro-technical professionals to securely access and share<br />

exploration and production (E&P) applications and data, including<br />

3D visualization applications, anytime, anywhere. By utilizing web<br />

and thin-client technologies, LiveQuest provides platform-independent<br />

and application-agnostic real-time collaboration. In<br />

this session, Mario Dean will provide an introduction to the needs<br />

of O&G exploration from an application and large-data 3D<br />

visualization perspective. He will discuss the LiveQuest solution<br />

stack, with specific focus on the 3D remote visualization<br />

technology, and share customer deployment examples and overall<br />

ROI considerations.<br />

Speaker(s): Mario Dean (Schlumberger)<br />

Topic(s): Energy Exploration (Beginner)<br />

TUESDAY, MAY 15, 14:00 (90 MINUTES)<br />

HALL 1<br />

S0515 Multi-<strong>GPU</strong> <strong>Program</strong>ming<br />

CUDA releases starting with 4.0 include a number of features that<br />

facilitate multi-<strong>GPU</strong> programming and computing. In this session<br />

we will review the features useful for programming for multiple<br />

<strong>GPU</strong>s, both within a single node and across a network. We will cover<br />

peer-to-peer <strong>GPU</strong> communication, communication patterns for<br />

various <strong>GPU</strong> topologies, as well as streams in the context of<br />

multiple <strong>GPU</strong>s. Concepts will be illustrated with a case study of 3D<br />

forward wave modeling, common in seismic computing.<br />

Speaker(s): Paulius Micikevicius (Developer <strong>Technology</strong><br />

Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />

TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />

ROOM K<br />

S0519 <strong>GPU</strong> Accelerated Bioinformatics Research at BGI<br />

After the DNA double helix is digitized by sequencing, computation is<br />

the key to connecting raw sequences with life science discoveries. As<br />

massive amounts of data are generated, processing, analyzing and<br />

storing them efficiently has become a major<br />

challenge. By developing <strong>GPU</strong>-accelerated bioinformatics tools<br />

and integrating them into pipelines, BGI researchers now run<br />

analysis pipelines in several hours instead of several days. These<br />

tools include the SOAP3 aligner, SNP calling and a tool for population<br />

genomics. The speedup is generally around 10-50x compared<br />

with traditional counterparts.<br />

Speaker(s): BingQiang Wang (Head of High Performance<br />

Computing, BGI)<br />

Topic(s): Bioinformatics, Life Sciences, Algorithms & Numerical<br />

Techniques, Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (240 MINUTES)<br />

ROOM A2<br />

S0606 <strong>GPU</strong>-accelerated Science on Titan: Tapping into the<br />

World’s Preeminent <strong>GPU</strong> Supercomputer to Achieve<br />

Better Science<br />

This year, the leadership-class computing facility at Oak Ridge<br />

National Laboratory is upgrading its largest supercomputer for open<br />

science, “Jaguar”, to employ high-performance, power-efficient<br />

<strong>GPU</strong>s. Once the transition is complete, the machine will be known<br />

as “Titan”. In this extended <strong>GTC</strong> session, we will feature a range of<br />

presenters showcasing research codes that will run<br />

computational science on the <strong>GPU</strong> at scale. Through these<br />

selected presentations, we will investigate the progress and<br />

anticipated results of <strong>GPU</strong>-acceleration of these significant codes.<br />

In this session, we will also explain how research scientists<br />

interested in tapping into the immense capabilities of Titan can do<br />

so, through programs such as the INCITE program sponsored by<br />

the US Department of Energy. The presenters include:<br />

Jacqueline H. Chen (Combustion Research Facility, Sandia<br />

National Laboratories)<br />

“Direct Numerical Simulation of Turbulence-Chemistry<br />

Interactions: Fundamental Insights Towards Predictive Models”<br />

[…]<br />

“S3D Direct Numerical Simulation - Preparations for the<br />

10-100PF Era”<br />

[…]<br />

Princeton Plasma Physics Laboratory (PPPL), Princeton)<br />

“Fusion Energy Sciences & Computing at the Extreme Scale”<br />

[…]<br />

“Computer Simulation of Lignocellulosic Biomass”<br />

[…]<br />

Science, Princeton)<br />

“Toward Global Seismic Imaging based on Spectral-Element<br />

and Adjoint Methods”<br />

Speaker(s): Jack Wells, Ph.D. (Director of Science, Oak Ridge<br />

Leadership Computing Facility, Oak Ridge National Laboratory)<br />

Topic(s): Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />

ROOM B<br />

S0609 Computational Graphics: An Overview of Graphics<br />

Research at NVIDIA<br />

The future of computer graphics presents many challenges. The<br />

worlds we render will be vastly more complex in geometry and<br />

artistic “texture”. Real-time rendering will use global illumination<br />

to achieve a far richer appearance, robustly. And content creation,<br />

which has grown to be the dominant cost of producing both games<br />

and film, must get simpler and less expensive. The NVIDIA<br />

Graphics Research group addresses these challenges with a focus<br />

on “Computational Graphics”: using general-purpose computation<br />

to enhance and extend the traditional pipelines and capabilities of<br />

real-time rendering. In this talk David Luebke, who leads graphics<br />

research, will give an overview of recent and ongoing work in<br />

computational graphics at NVIDIA Research.<br />

Speaker(s): David Luebke (Senior Director of Graphics<br />

Research, NVIDIA)<br />

Topic(s): Computer Graphics (Intermediate)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM M<br />

S0632 Learn how Adobe After Effects CS6 takes<br />

advantage of NVIDIA OptiX technology for 3D Ray Tracing<br />

(Presented by Adobe)<br />

Adobe After Effects CS6 unveils an amazing new 3D ray-traced<br />

rendering engine based on NVIDIA OptiX technology, with <strong>GPU</strong><br />

acceleration making it up to 50x faster than a CPU alone. This enables<br />

simple and quick designs of realistic geometric text and shapes in<br />

3D space. Motion graphics artists can now create more physically<br />

accurate scenes with beautiful results such as reflections,<br />

transparency, soft shadows, and depth-of-field blur directly in<br />

After Effects. <strong>GPU</strong>-accelerated ray tracing drastically improves<br />

the workflow by enabling motion graphics artists to develop these<br />

3D effects entirely within After Effects.<br />

Speaker(s): Steve Forde (Senior Product Manager, After Effects)<br />

Topic(s): Digital Content Creation (Beginner)<br />

TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0801 CUDA Debugger Training on Windows<br />

Nsight offers a powerful CUDA debugging feature set<br />

that enables developers to quickly spot bugs. From the memory<br />

checker to advanced breakpoints and the variable warp watch panel, a<br />

developer can quickly isolate memory access errors, narrow<br />

thousands of threads down to a specific thread and quickly spot<br />

abnormal variable value ranges. Through a set of comprehensive<br />

exercises, the attendee will be able to utilize these features to<br />

become fully proficient at developing CUDA code.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />

ROOM J3<br />

S0046 Application of the <strong>GPU</strong> to a Two-Part<br />

Computational Electromagnetic Algorithm<br />

The shooting and bouncing ray (SBR) method is one way to<br />

simulate electromagnetic field radiation. Like all methods, there<br />

are certain problems where it does not yield accurate results. In<br />

this presentation, we will explain one such case that consists of an<br />

antenna resonating between two metal plates. We will discuss<br />

how we used the graphics processing unit (<strong>GPU</strong>) to separate the<br />

problem into two parts. Each part is simulated individually with<br />

SBR, producing an improved result. Such a <strong>GPU</strong>-accelerated,<br />

two-part approach can be applied to other more general<br />

hybrid simulations.<br />

Speaker(s): Eric Dunn (Electromagnetic Research Scientist, SAIC)<br />

Topic(s): Computational Physics, Algorithms & Numerical Techniques,<br />

Ray Tracing (Beginner)<br />

TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />

ROOM A1<br />

S0351 Strong Scaling for Molecular Dynamics<br />

Applications<br />

In this session we will talk about how to improve strong scaling for<br />

molecular dynamics applications. Using the NAMD molecular<br />

dynamics code as our primary case study, we will discuss the<br />

types of issues that can impede scaling, how to use already<br />

available and custom tools to discover such issues, and how to<br />

build a model to help analyze and predict scaling performance.<br />

Although this session is primarily focused on molecular dynamics<br />

applications, most of the lessons can be applied equally well to<br />

many other areas and applications.<br />

Speaker(s): Sarah Tariq (Software Engineer, NVIDIA)<br />

Topic(s): Molecular Dynamics, Cluster Management, Life Sciences<br />

(Intermediate)<br />

TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />

ROOM A8<br />

S0379 <strong>GPU</strong>-based High-Performance Simulations<br />

for Spintronics<br />

The joint utilization of the electron’s charge and spin in<br />

“spintronics” represents a promising technology for data<br />

processing and storage in nanostructures. The complex quantum<br />

effects like the spin-Hall effect in these devices require<br />

demanding numerical simulations providing a convenient link<br />

between idealized analytical models and the often very complex results<br />

from measurements. The simulations involving multiplications<br />

and inversions of large matrices provide an ideal showcase for<br />

performance gain by employing GP<strong>GPU</strong>s in the execution of the<br />

algebraic routines on these matrices in computing environments<br />

with shared execution of algorithms on multiple nodes with<br />

multiple GP<strong>GPU</strong>s and CPU cores.<br />

Speaker(s): Jan Jacob (Postdoctoral Researcher, University of Hamburg)<br />

Topic(s): General Interest, Computational Physics, Application Design<br />

& Porting Techniques (Intermediate)<br />

TUESDAY, MAY 15, 14:30 (50 MINUTES)<br />

ROOM K<br />

S0516 The Advantage of <strong>GPU</strong> Computation for Analyzing<br />

Complex Traits<br />

Most important agricultural traits and human diseases are complex<br />

traits controlled by gene networks with gene-by-gene<br />

interaction (epistasis) and gene-by-environment interaction (GE).<br />

New statistical methods and software have been developed for analyzing<br />

the genetic architecture of complex traits based on genome-wide<br />

association studies (GWAS). When dealing with large mapping<br />

populations and huge amounts of molecular information, <strong>GPU</strong><br />

computation has an advantage over CPU computation. We will<br />

demonstrate the newly developed <strong>GPU</strong>-based software<br />

QTLNetwork V3.0 and GWAS-GMDR for mapping genes with<br />

epistasis and GE interaction for complex traits of humans, crops,<br />

and mice.<br />

Speaker(s): Jun Zhu (Professor, Zhejiang University)<br />

Topic(s): Bioinformatics, Life Sciences (Intermediate)<br />

TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />

ROOM B<br />

S0610 Octree-Based Sparse Voxelization For Real-Time<br />

Global Illumination<br />

Discrete voxel representations are generating growing interest in<br />

a wide range of applications in computational sciences and<br />

particularly in computer graphics. A new real-time usage of<br />

dynamic voxelization inside a sparse voxel octree is to compute<br />

voxel-based global illumination. When used in real-time contexts,<br />

it becomes critical to achieve fast 3D scan conversion (also called<br />

voxelization) of traditional triangle-based surface representations.<br />

This talk describes a new surface voxelization algorithm that<br />

produces a sparse voxel representation of a triangle mesh scene<br />

in the form of an octree structure using the <strong>GPU</strong> hardware<br />

rasterizer. In order to scale to very large scenes, our approach<br />

avoids relying on an intermediate full regular grid to build the<br />

structure and constructs the octree directly.<br />

Speaker(s): Cyril Crassin (Postdoctoral Research Scientist, NVIDIA)<br />

Topic(s): Computer Graphics (Intermediate)<br />

TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />

ROOM A2<br />

S0655 Direct Numerical Simulation of Turbulence-<br />

Chemistry Interactions: Fundamental Insights Towards<br />

Predictive Models<br />

Recent petascale direct numerical simulations (DNS) of turbulent<br />

combustion have transformed our ability to interrogate fine-grained<br />

‘turbulence-chemistry’ interactions in canonical<br />

laboratory configurations. In particular, three-dimensional DNS,<br />

at moderate Reynolds numbers and with complex chemistry, is<br />

providing unprecedented levels of detail to understand<br />

fundamental coupling between turbulence, mixing and reaction.<br />

This information is leading to new physical insight and is providing<br />

unique validation data for assessing model assumptions in<br />

coarse-grained engineering CFD approaches used to design<br />

modern combustors. The role of petascale DNS is illustrated<br />

through selected examples relevant to controlling ignition and<br />

combustion rates in homogeneous charge compression ignition<br />

engines and to fuel injection processes in stationary gas turbines<br />

for power generation. Petascale simulations presently generate<br />

upwards of a petabyte of complex, multi-scale, time-varying data<br />

used by combustion modelers to validate subfilter combustion and<br />

mixing models in large-eddy simulation. With the advent of 10-20<br />

petaflop hybrid architectures with accelerators like Titan at Oak<br />

Ridge National Laboratory, it will be possible to dramatically<br />

increase the chemical complexity of DNS. This will help accelerate<br />

the development of predictive subprocess models which will be<br />

used by engine developers to better understand and tailor the<br />

combustion of gasoline and new, more complex types of fuels in<br />

advanced engines. With Titan, simulations will move beyond<br />

today’s studies of simple fuels—hydrogen, syngas and methane—<br />

to more complex, larger-molecule hydrocarbon fuels like<br />

isooctane (a surrogate for gasoline), commercially important<br />

oxygenated alcohols (for example, ethanol and butanol), and<br />

biofuel surrogates.<br />

Speaker(s): Jacqueline H. Chen (Combustion Research Facility, Sandia<br />

National Laboratories)<br />

Topic(s): Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM L<br />

S0034 Real-Time Risk Simulation: The <strong>GPU</strong> Revolution In<br />

Profit Margin Analysis<br />

Discover how ICHEC helped a world-leading company in its sector<br />

dramatically speed up and improve the quality of its real-time<br />

risk management tool chain. In this session, we present the<br />

method used for porting the core part of the simulation engines<br />

to <strong>GPU</strong>s using CUDA. This porting was realized on two very<br />

different simulation algorithms and resulted in speed-ups of 2 to<br />

3 orders of magnitude, allowing much greater accuracy of the<br />

results in a real-time environment.<br />

Speaker(s): Gilles Civario (Senior Software Architect, ICHEC), Renato<br />

Miceli (Computational Scientist, ICHEC)<br />

Topic(s): Finance, Application Design & Porting Techniques, Algorithms<br />

& Numerical Techniques (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM C<br />

S0036 Multiparticle Collision Dynamics on <strong>GPU</strong>s<br />

See how we employ <strong>GPU</strong>s to simulate the interaction of millions of<br />

solvent and solute particles of a fluid system. Often the domain of<br />

large cluster systems, the most time-consuming part of our<br />

simulations can now be done on desktop PCs in reasonable time.<br />

This contribution shows how <strong>GPU</strong>s can effectively be used to<br />

accelerate existing programs and how techniques like streaming<br />

and increased data locality significantly enhance calculation<br />

throughput. It also shows how a <strong>GPU</strong>-optimized program<br />

structure yields usually expensive additional functionality “almost<br />

free”. Furthermore, a well-scaling single-node/multi-<strong>GPU</strong><br />

implementation of the program is presented.<br />

Speaker(s): Elmar Westphal (Software Developer,<br />

Forschungszentrum Juelich)<br />

Topic(s): Computational Physics, Computational Fluid Dynamics,<br />

Molecular Dynamics (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM J2<br />

S0049 Using the <strong>GPU</strong> Direct for Video API<br />

This tutorial will demonstrate how video I/O devices can take<br />

advantage of the <strong>GPU</strong> Direct for Video API to optimize the data<br />

transfer performance for digital video, film and broadcast<br />

applications and computer vision applications. The <strong>GPU</strong> Direct for<br />

Video API is a technology that permits the DMA transfer of data<br />

buffers between video I/O devices and the <strong>GPU</strong> through the use of<br />

a shared system memory buffer for immediate processing by<br />

OpenGL, DirectX, CUDA and OpenCL. This direct transfer can<br />

improve synchronization and eliminate latency between video<br />

capture, <strong>GPU</strong> processing and video output.<br />

Speaker(s): Alina Alt (Applied Engineer, NVIDIA), Thomas True (Applied<br />

Engineer, NVIDIA)<br />

Topic(s): Audio, Image and Video Processing, Development Tools &<br />

Libraries, Digital Content Creation & Film, Machine Vision (Advanced)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM A8<br />

S0067 PICon<strong>GPU</strong> - Bringing large-scale Laser Plasma<br />

Simulations to <strong>GPU</strong> Supercomputing<br />

With powerful lasers breaking the Petawatt barrier, applications<br />

for laser-accelerated particle beams are gaining more interest<br />

than ever. Ion beams accelerated by intense laser pulses foster<br />

new ways of treating cancer and make them available to more<br />

people than ever before. Laser-generated electron beams can<br />

drive new compact x-ray sources to create snapshots of ultrafast<br />

processes in materials. With PICon<strong>GPU</strong>, laser-driven particle<br />

acceleration can be computed in hours compared to weeks on<br />

standard CPU clusters. We present the techniques behind<br />

PICon<strong>GPU</strong>, detailed performance analysis and the benefits of<br />

PICon<strong>GPU</strong> for real-world physics cases.<br />

Speaker(s): Michael Bussmann (Junior Group Leader Computational<br />

Radiation Physics, Helmholtz-Zentrum Dresden-Rossendorf), Guido<br />

Juckeland (System Engineer (HPC), Technical University Dresden)<br />

Topic(s): Computational Physics, Algorithms & Numerical Techniques,<br />

Application Design & Porting Techniques, Supercomputing (Advanced)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM A1<br />

S0075 Oculus Real-Time Modular Cognitive Vision System<br />

This session will explore ways to integrate <strong>GPU</strong> processing into a<br />

real-time computer vision architecture. While there has been a<br />

rapid push to move vision algorithms onto <strong>GPU</strong>s, integration into<br />

an efficient vision system architecture remains elusive. We will<br />

discuss our development of a modular vision system architecture<br />

that enables rapid prototyping of complex pipelines using multiple<br />

<strong>GPU</strong>s. The system incorporates modules for segmentation,<br />

disparity mapping, optical flow and particle filter tracking on the<br />

<strong>GPU</strong>. Our talk will explore the various difficulties associated with<br />

developing such a system and will give a hands-on demonstration<br />

of Oculus, our vision platform.<br />

Speaker(s): Jeremie Papon (PhD Student, University of Gottingen),<br />

Alexey Abramov (PhD Student, University of Gottingen)<br />

Topic(s): Computer Vision, Audio, Image and Video Processing,<br />

Application Design & Porting Techniques, Machine Vision (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM N<br />

S0223 Rapid Training of Acoustic Models Using <strong>GPU</strong>s<br />

Learn how to realize robust and accurate speech recognition<br />

systems by training acoustic models on <strong>GPU</strong>s. For common<br />

languages, state-of-the-art systems are now trained on<br />

thousands of hours of speech data, which can take weeks even<br />

with a large cluster of machines. To overcome this development<br />

bottleneck, we propose a new framework for rapid training of<br />

acoustic models using highly parallel <strong>GPU</strong>s. With a single NVIDIA<br />

GTX580 <strong>GPU</strong>, our proposed approach is shown to be 51x faster<br />

than a sequential CPU implementation, enabling a moderately<br />

sized acoustic model to be trained on 1,000 hours of speech data in<br />

just over 9 hours.<br />

Speaker(s): Jike Chong (Co-Director of CUDA Research Center,<br />

Carnegie Mellon University), Ian Lane (Assistant Research Professor,<br />

Carnegie Mellon University)<br />

Topic(s): Audio, Image and Video Processing, Machine Learning & AI<br />

(Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0308 Recent Trends in Hierarchical N-body Methods<br />

on <strong>GPU</strong>s<br />

See the newest developments in the area of hierarchical N-body<br />

methods for <strong>GPU</strong> computing. Hierarchical N-body methods have<br />

O(N) complexity, are compute bound, and require very little<br />

synchronization, which makes them a favorable algorithm on<br />

next-generation supercomputers. In this session we will cover<br />

topics such as hybridization of treecodes and fast multipole<br />

methods, auto-tuning kernels for heterogeneous systems, fast tree<br />

construction based on prefix sums, fast load balancing of global<br />

trees, and more. Examples will be given using ExaFMM, an<br />

open-source hierarchical N-body library for heterogeneous systems<br />

developed by the speaker (released at SC11).<br />

Speaker(s): Rio Yokota (Research Scientist, King Abdullah University of<br />

Science and <strong>Technology</strong>)<br />

Topic(s): Algorithms & Numerical Techniques, Supercomputing,<br />

Development Tools & Libraries (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />

ROOM J3<br />

S0349 Tree Accumulation on the <strong>GPU</strong><br />

Learn how to map irregular tree-structured computations to the<br />

<strong>GPU</strong> efficiently. See how extremely irregular data-dependent<br />

computations can be implemented by composing them out of<br />

regular data-parallel primitives. In particular we focus on the<br />

problem of tree accumulation, a generalization of the scan primitive<br />

to arbitrary tree data structures. We first show how tree orderings<br />

and properties can be computed using the Euler tour technique and<br />

standard scan primitives. Using these orderings we then develop<br />

our new approach to computing tree accumulations in parallel.<br />

Speaker(s): Scott Rostrup (Software Engineer, Synopsys Inc)<br />

Topic(s): Algorithms & Numerical Techniques, Application Design &<br />

Porting Techniques (Advanced)<br />

TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />

ROOM J1<br />

S0403 NURBS Tessellation with CUDA<br />

NURBS, or Non-Uniform Rational B-Splines, are a curved surface<br />

representation commonly used in computer aided design and<br />

digital content creation. This recursive representation gives a great<br />

deal of flexibility, allowing arbitrary surface order and knot vectors,<br />

enabling a single NURBS surface to contain many contiguous<br />

patches. However, this recursive representation is also expensive to<br />

compute, so a NURBS surface is often converted into multiple<br />

Bezier patches before being tessellated. In this implementation, we<br />

present an efficient method for directly tessellating NURBS<br />

surfaces using the NVIDIA CUDA computing API.<br />

Speaker(s): Brent Oster (Applied Engineer, NVIDIA)<br />

Topic(s): Computer Graphics (Advanced)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM A3<br />

S0407 A High Level <strong>Program</strong>ming Environment for<br />

Accelerated Computing<br />

One of the critical hurdles for the widespread adoption of accelerated<br />

computing in HPC is programming difficulty. Users need a simple<br />

programming model that is portable and is not significantly different<br />

from the approaches used on current multi-core x86 processors. In<br />

this talk I will present Cray’s approach to accelerator programming,<br />

which is based on a high level programming environment with tightly<br />

coupled compilers, libraries, and tools. Ease of use comes from a<br />

compiler that makes it feasible for users to write applications in Fortran,<br />

C, and C++, tools that help users port and optimize for accelerators, and<br />

auto-tuned scientific libraries.<br />

Speaker(s): Luiz DeRose (Director of <strong>Program</strong>ming Environment,<br />

Cray Inc.)<br />

Topic(s): Development Tools & Libraries, Parallel <strong>Program</strong>ming<br />

Languages & Compilers (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM A5<br />

S0413 Delivering 3D Professional Graphics from the<br />

Cloud with Citrix XenDesktop<br />

Recent technological advances have made it practical to deliver<br />

3D professional graphics applications from the Cloud (private or<br />

public) with a high quality user experience and at an attractive<br />

cost. Organizations can keep their intellectual property safe in the<br />

data center since only fully-rendered screen images are sent over<br />

the network. Users in remote locations no longer have to wait for<br />

large file transfers. And they can access 3D models from a wide<br />

variety of devices, including iPads and Android tablets. Learn how<br />

Citrix XenDesktop, XenServer and Receiver technologies have<br />

made all of this a reality for many organizations today.<br />

Speaker(s): Derek Thorslund (Director of Product Management, Citrix<br />

Systems, Inc.)<br />

Topic(s): Cloud Computing, Computer Graphics, Visualization (Beginner)<br />

TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />

ROOM A7<br />

S0436 Integrated <strong>GPU</strong> Acceleration With Real Time<br />

Visualization Of Terabyte Data<br />

Computation and visualization don’t necessarily have to act as<br />

two separate entities. This talk explains the integration of real-time<br />

compute with real-time visualization. Industry and academia have<br />

provided attractive solutions for compiler-directive optimized code<br />

for computations. To support cases that involve massive yet ad-hoc<br />

data I/O and computation with interactive visualization, Hue<br />

developed a different model which bridges the gap between<br />

“complete system rewrite” and “compiler directive optimized code”.<br />

The talk explains how highly optimized data I/O mechanisms<br />

coupled with predefined input and output definitions for kernels<br />

provide excellent scalability and interactivity during runtime.<br />

Speaker(s): Kelly Walker (Senior Software Developer, Hue)<br />

Topic(s): Visualization, Energy Exploration (Beginner)<br />

TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />

ROOM B<br />

S0611 Edge-Aware Shaders for Real-Time<br />

Computer Graphics<br />

The most common approach in rendering is to define behavior at a<br />

point in terms of material properties and incident illumination.<br />

That approach works well when the geometry and material<br />

properties are well-known, and the light physics are simulated<br />

accurately. We present a technique to help situations where the<br />

model and/or physics is incomplete. This technique augments<br />

shaders with information about nearby edges, such as corners<br />

and boundaries between materials, and makes it natural to add<br />

richness procedurally near these visually critical regions.<br />

Speaker(s): Peter-Pike Sloan (Principal Research Scientist, NVIDIA)<br />

Topic(s): Computer Graphics (Intermediate)<br />

TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM M<br />

S0620 VSIPL++: A High-Level <strong>Program</strong>ming Model<br />

for Productivity and Performance (Presented by<br />

Mentor Graphics)<br />

Learn how VSIPL++ can improve your productivity and provide<br />

software portability, without sacrificing performance. We will<br />

describe how VSIPL++’s open-standard high-level programming<br />

model addresses the challenges of writing high-performance<br />

embedded software on GP-<strong>GPU</strong>s and other heterogeneous<br />

hardware, using advanced C++ techniques and data abstraction –<br />

and how we make this work in the real world. We will also present<br />

a comparison of performance results from various configurations<br />

of CPU and GP-<strong>GPU</strong> processing engines for a signal processing<br />

application developed using VSIPL++.<br />

Speaker(s): Brooks Moses, Ph.D. (Sourcerer, Mentor<br />

Graphics Corporation)<br />

Topic(s): Supercomputing (Beginner)<br />

TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />

ROOM A2<br />

S0625 S3D Direct Numerical Simulation - Preparations<br />

for the 10-100PF Era<br />

The evolution of supercomputing into the mid-petaflop era has<br />

been typified by heterogeneous compute nodes with the majority of<br />

the compute capability delivered by a large number of lightweight<br />

cores. In order to prepare for the extension of this trend, the DNS<br />

code S3D has been retooled in anticipation of a target architecture<br />

offering tens of thousands of heterogeneous nodes containing many<br />

x86 cores as well as <strong>GPU</strong>-derived accelerators. Movement of outer<br />

loops to the highest level in the code facilitates hybrid MPI-OpenMP<br />

performance and an elegant path to accelerated kernels using<br />

OpenACC. It is anticipated that relevant scientific simulations at this<br />

scale will have a per-node footprint that can be contained entirely<br />

on the accelerator, so provision is made to maintain primary<br />

solution variables in accelerator memory with specific regions<br />

moved to the CPU for inter-node communication and workload<br />

balancing. With the current performance it is estimated that the<br />

new code will make it possible to meet early science goals with the<br />

full build-out of the anticipated Titan system as well as provide a<br />

platform to transition into the exascale software research space.<br />

Speaker(s): Ray Grout (National Renewable Energy Laboratory)<br />

Topic(s): Supercomputing (Beginner)


TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0802 CUDA Profiler Training on Windows<br />

Nsight offers a comprehensive set of performance analysis tools.<br />

From tracing complete-system multi-core CPU and<br />

multi-<strong>GPU</strong> activity to profiling CUDA kernels with precise profiling<br />

experiments, developers can identify system level optimization<br />

opportunities as well as expensive and inefficient CUDA kernels<br />

requiring in-depth analysis with the CUDA profiler. Through a set<br />

of comprehensive exercises, the attendee will be able to utilize<br />

these features to become fully proficient at optimizing complex<br />

CUDA applications.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />

ROOM K<br />

S0152 Accurate Sequence Alignment using Distributed<br />

Filtering on <strong>GPU</strong> Clusters<br />

Learn how <strong>GPU</strong>s enable new ways to rethink a complex<br />

bioinformatics problem: Accurate sequence alignment. What was<br />

once prohibitive to compute can become the basic block of novel<br />

<strong>GPU</strong>-based algorithms. Modern DNA sequencing machines<br />

generate enormous amounts of short sequences within minutes,<br />

and they should be aligned to a reference genome in real time.<br />

Most solutions only find a few locations that match a short<br />

sequence. We introduce a new technique to find all matching<br />

locations inside a reference sequence for a given number of<br />

mismatches. Our technique is based on a distributed filtering<br />

scheme and <strong>GPU</strong> based processing.<br />

Speaker(s): Reza Farivar (PhD Student, University of Illinois at Urbana-<br />

Champaign), Shivaram Venkataraman (PhD Student, UC Berkeley)<br />

Topic(s): Bioinformatics, Algorithms & Numerical Techniques<br />

(Intermediate)<br />

TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />

ROOM J3<br />

S0316 Using <strong>GPU</strong>s to Accelerate Synthetic Aperture<br />

Sonar Imaging via Backpropagation<br />

This presentation describes our development of a <strong>GPU</strong>-accelerated<br />

backpropagation implementation for Synthetic<br />

Aperture Sonar systems that supports multiple nodes via MPI and<br />

multi-<strong>GPU</strong> nodes. This implementation can form a complex-valued<br />

gigapixel image in one hour on a single C2050. We further<br />

scale this implementation to the Keeneland system where we can<br />

form the same gigapixel image in 21 seconds on 48 nodes with<br />

144 C2070 Tesla <strong>GPU</strong>s. Our talk will discuss the details of our<br />

implementation, including our optimizations and scaling results<br />

for various node and <strong>GPU</strong> configurations, as well as the<br />

applicability to other domains, including Synthetic Aperture Radar.<br />

Speaker(s): Thomas Benson (Research Engineer II, Georgia Tech<br />

Research Institute)<br />

Topic(s): Application Design & Porting Techniques (Intermediate)<br />

TUESDAY, MAY 15, 15:30 (50 MINUTES)<br />

ROOM J1<br />

S0366 OptiX Out-of-Core and CPU Rendering<br />

OptiX has broken some major barriers recently by enabling<br />

out-of-<strong>GPU</strong>-core memory rendering and by adding a CPU<br />

rendering back-end when an OptiX-capable <strong>GPU</strong> is not present in<br />

the system. OptiX users and CUDA developers will be interested in<br />

how we accomplished these feats within the existing <strong>GPU</strong><br />

architecture. This talk will provide a brief introduction to OptiX and<br />

then dive into what the new features provide. We will then go<br />

under the covers and show how we pulled it off.<br />

Speaker(s): David McAllister (OptiX Manager, NVIDIA, OptiX group)<br />

Topic(s): Ray Tracing, Computer Graphics (Intermediate)<br />

TUESDAY, MAY 15, 15:30 (50 MINUTES)<br />

ROOM B<br />

S0409 Stochastic Rasterization<br />

Learn how to render transparency, motion blur, and depth of field<br />

effects in real time using random sampling. These effects<br />

combine multiple objects in each pixel, making them expensive to<br />

compute directly. But recent research shows that, with stratified<br />

sampling and clever reconstruction, good image quality can be<br />

achieved with surprisingly small numbers of samples per pixel.<br />

We will explain how to do this on the <strong>GPU</strong>, and explore trade-offs<br />

of performance, quality, accuracy, and noise.<br />

Speaker(s): Eric Enderton (Research Scientist, NVIDIA), Morgan<br />

McGuire (Visiting Professor, NVIDIA and Williams College)<br />

Topic(s): Computer Graphics, Digital Content Creation & Film<br />

(Intermediate)<br />

TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />

ROOM A7<br />

S0444 Explore New Techniques in Volume Rendering/<br />

Segmentation with Open Inventor<br />

The goal of this session is to show the improvements in quality,<br />

performance and flexibility of the volume rendering implementation<br />

of Open Inventor. The latest <strong>GPU</strong> techniques, such as virtual<br />

textures and ray casting, have been combined into a flexible shader<br />

API and applied on out of core data. The techniques of volume<br />

rendering, sugarcube rendering, basic and complex clipping,<br />

sculpting, editing and segmentation will be demonstrated using<br />

examples from a geobody extraction workflow. The great ease and<br />

flexibility of the shader pipeline API will be illustrated, and we will<br />

discuss the broad future perspectives of that technology.<br />

Speaker(s): Mike Heck (<strong>Technology</strong> Advisor, VSG)<br />

Topic(s): Computer Graphics (Advanced)<br />

TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />

ROOM A2<br />

S0654 Fusion Energy Sciences & Computing at the<br />

Extreme Scale<br />

The fusion energy sciences community has made excellent progress<br />

in developing advanced codes for which computer run-time and<br />

problem size scale well with the number of processors on massively<br />

parallel supercomputers. A good example is the effective usage of<br />

the full power of modern leadership class computational platforms<br />

from the terascale to the petascale and beyond to produce nonlinear<br />

particle-in-cell simulations which have accelerated progress in<br />

understanding the nature of plasma turbulence in magnetically-confined<br />

high temperature plasmas. Illustrative results provide great<br />

encouragement for being able to include increasingly realistic<br />

dynamics in extreme-scale computing campaigns to enable<br />

predictive simulations with unprecedented physics fidelity.<br />

Speaker(s): William Tang (Fusion Simulation <strong>Program</strong> at the Princeton<br />

Plasma Physics Laboratory (PPPL), Princeton)<br />

Topic(s): Supercomputing (Intermediate)<br />


The Many-Core Company<br />

Discover our global solutions for many-core programming:<br />

Software tools<br />

Expertise<br />

and the methodology to safely port your code<br />

www.caps-entreprise.com


TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM K<br />

S0008 Algorithms and Tools for Bioinformatics on <strong>GPU</strong>s<br />

Learn how to use <strong>GPU</strong>s to accelerate compute- and data-intensive<br />

applications and algorithms in bioinformatics. High-throughput<br />

techniques for DNA sequencing and gene expression analysis with<br />

microarrays have led to a rapid growth in the amount of digital<br />

biological data, e.g. the NCBI Sequence Read Archive (SRA) houses<br />

raw sequence data generated by next-generation sequencing (NGS)<br />

technologies, which exceeds 25 trillion base pairs. Therefore,<br />

modern bioinformatics tools need to be scalable; i.e. they need to<br />

deal with an ever growing amount of data. <strong>GPU</strong>s and CUDA provide<br />

the opportunity to significantly reduce the runtime of many<br />

biological algorithms on inexpensive hardware.<br />

Speaker(s): Bertil Schmidt (Nanyang Technological University)<br />

Topic(s): Bioinformatics, Life Sciences (Intermediate)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM J3<br />

S0050 High Performance Logic Simulation with <strong>GPU</strong>s<br />

Verification has become the bottleneck of the IC design process due to<br />

its fast-increasing complexity. The fundamental means of verifying<br />

digital circuits is logic simulation, which can be performed at both<br />

register-transfer level (RTL) and gate level. In this work, we<br />

developed <strong>GPU</strong> based logic simulation solutions. We implemented<br />

a Chandy-Misra-Bryant parallel simulation protocol on <strong>GPU</strong>s for<br />

sufficient parallelism. A dynamic <strong>GPU</strong> memory allocator was<br />

introduced to efficiently manage <strong>GPU</strong> memory resources. RTL<br />

simulation is performed in a compiled-code scheme by translating<br />

Verilog code into equivalent CUDA code. Experimental results<br />

proved that the <strong>GPU</strong> simulators significantly outperform their<br />

CPU counterparts.<br />

Speaker(s): Yangdong Deng (Associate Professor, Tsinghua University)<br />

Topic(s): General Interest, Algorithms & Numerical Techniques<br />

(Advanced)<br />

TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />

ROOM A1<br />

S0062 Inverse 3D Vision: Detection and Tracking of<br />

NVIDIA Glasses<br />

Computer Vision is becoming increasingly popular and important<br />

nowadays. With the advent of powerful mobile devices and<br />

increasing power of desktop PCs, it is important to improve user<br />

experience by tackling the hardest problems of real-time<br />

interaction with the user. These include body part tracking and face<br />

and gesture recognition. This talk discusses techniques behind an<br />

interaction pattern between a user and a 3D visualization system, in<br />

which the system tracks the position of NVIDIA 3D Vision Glasses,<br />

and accounts for this information during rendering. The mentioned<br />

techniques include Histograms of Oriented Gradients and Template<br />

Matching. The system implementation is discussed too.<br />

Speaker(s): Anton Obukhov (Engineering Consultant, Ubiquiti Networks)<br />

Topic(s): Computer Vision, Machine Vision, Development Tools &<br />

Libraries (Advanced)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM A3<br />

S0089 Accelerator Directives, OpenACC and OpenMP4ACC<br />

Rather than require the programmer to rewrite code for<br />

accelerators, several directive sets have been created and<br />

proposed to support non-cache coherent and cache coherent<br />

accelerators. This talk will present the OpenACC specification and<br />

its implementation for Cray developers, as well as touch on a<br />

similar proposal being evaluated by the OpenMP language<br />

committee. The presentation will start by discussing the Memory<br />

and Execution model needed to allow a programmer to write<br />

codes that will run effectively on both distinct memory systems<br />

and unified memory systems. Once a proper background has been<br />

set, the directives will be examined via usage examples.<br />

Speaker(s): James Beyer (Software Engineer, Cray Inc), David Oehmke<br />

(Cray Inc.)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers,<br />

Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />

ROOM C<br />

S0108 An Innovative Massively Parallelized Molecular<br />

Dynamic Software<br />

In this paper, we present how we sped up the<br />

electronic structure calculator VASP by more than an order of<br />

magnitude. Recent work at IFP Energies<br />

Nouvelles has shown that by coupling traditional clusters or<br />

High Performance Computing (HPC) machines with accelerators<br />

based on graphics processing units (<strong>GPU</strong>s), recoding the most<br />

time-consuming parts of the codes (with programming languages<br />

like CUDA or OpenCL), and offloading them to the graphics chips, it is<br />

possible to reduce the computing time by a<br />

factor of 5 to 15.<br />

Speaker(s): Thomas Guignon (Research Engineer, IFPEN), Ani Anciaux<br />

Sedrakian (IFP Energies Nouvelles)<br />

Topic(s): Molecular Dynamics, Supercomputing, Application Design &<br />

Porting Techniques (Intermediate)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0221 1024 Bit Parallel Rational Arithmetic Operators for<br />

the <strong>GPU</strong><br />

Learn how to create a set of rational arithmetic operators that<br />

manipulate 1024 bit operands on a Tesla C2050. These operators<br />

are used to create a numerically stable implementation for Bessel<br />

functions. Naive implementations of the Bessel functions produce<br />

unreliable results when they are used to solve Maxwell’s<br />

equations by way of Mie theory. Maxwell’s equations are used to<br />

model the scattering of light by small particles. Light scatter is<br />

used in Particle Characterization to measure the quality of<br />

materials like cocoa, cement and pharmaceuticals.<br />

Speaker(s): Robert Zigon (Sr. Staff Development Engineer,<br />

Beckman Coulter)<br />

Topic(s): Algorithms & Numerical Techniques, Computational Physics<br />

(Intermediate)<br />

TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />

ROOM A8<br />

S0245 Porting Legacy Plasma Codes to <strong>GPU</strong><br />

Learn how to port legacy Fortran plasma codes to <strong>GPU</strong>. Many legacy<br />

plasma codes are written in Fortran and have many lines of code.<br />

We will discuss techniques in porting such legacy codes easily and<br />

efficiently to CUDA C/C++. Performance analysis of major algorithmic<br />

patterns in plasma codes will be discussed. The discussion will use<br />

the <strong>GTC</strong> and GeFi plasma codes as realistic examples.<br />

Speaker(s): Peng Wang (Devtech Engineer, NVIDIA)<br />

Topic(s): Computational Physics (Intermediate)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM A5<br />

S0261 Scalable <strong>GPU</strong> Computing Service Architecture<br />

In this session we describe our <strong>GPU</strong> accelerated computing<br />

service which supports several internal business processes in a<br />

large scale company setup. The service supports diverse<br />

computational needs such as on-demand rendering, mesh<br />

optimization, a Massively Multiplayer Online Game (MMO), product<br />

visualizations and other demanding computational tasks. We<br />

present the architectural considerations for a service-oriented<br />

computational framework and the practical lessons and<br />

opportunities encountered during development of an enterprise<br />

system using NVIDIA technologies such as CUDA, OptiX, OpenGL<br />

and OpenCL. Our aim is to share knowledge and present LEGO’s<br />

vision for a <strong>GPU</strong> accelerated computational platform as a<br />

business-driven technology.<br />

Speaker(s): Henrik Høj Madsen (Solution Architect, LEGO), Michael<br />

Schøler (Senior Consultant, LEGO)<br />

Topic(s): Cloud Computing, Computer Graphics, Ray Tracing<br />

(Intermediate)<br />

TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />

ROOM A7<br />

S0336 <strong>GPU</strong> Acceleration for Seismic<br />

Interpretation Algorithms<br />

The oil and gas industry is already leveraging <strong>GPU</strong>s for seismic<br />

data processing, but what about 3D seismic interpretation? This<br />

session will cover how the <strong>GPU</strong> is being used by TerraSpark<br />

Geosciences to dramatically decrease the runtime of algorithms<br />

for enhancing faults, computing horizon orientation, and<br />

calculating volumetric curvature. We will share our experiences in<br />

porting these techniques to the <strong>GPU</strong>, the challenges encountered,<br />

the solutions found, and, of course, the benefits to execution time.<br />

Speaker(s): Jonathan Marbach (Director, Software Architecture and<br />

Engineering, TerraSpark Geosciences)<br />

Topic(s): Energy Exploration (Beginner)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM J2<br />

S0356 Optimized Texture Transfers<br />

Many real world graphics applications need to transfer textures<br />

efficiently in and out of <strong>GPU</strong> memory in the form of 2D images,<br />

2.5D terrains or 3D volumes as well as their time-varying<br />

counterparts. The first part of this talk covers technical pointers<br />

on how to optimize your OpenGL application to overlap transfers<br />

with rendering using the NVIDIA Copy Engines. The second part<br />

demonstrates the integration and performance of this feature<br />

within a real-world latency-sensitive broadcast graphics<br />

application from VizRT.<br />

Speaker(s): Shalini Venkataraman (Senior Applied Engineer, NVIDIA),<br />

Gerhard Lang (Chief Engineering Officer, VizRT)<br />

Topic(s): Computer Graphics, Visualization<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM L<br />

S0435 Leveraging GP<strong>GPU</strong> <strong>Technology</strong> for Valuation of<br />

Complex Insurance Products<br />

We share our experiences moving a mature, large scale insurance<br />

application from a CPU to <strong>GPU</strong> environment. This session explores<br />

the nuances of porting a C++ application when ‘blank sheet’<br />

re-architecture is not an option. This session will cover: how insurance<br />

differs from other financial products (and the implications for<br />

the <strong>GPU</strong>); considerations when moving an existing, fully featured<br />

C++ system to a GP<strong>GPU</strong> platform; supporting CPU and <strong>GPU</strong><br />

implementations from a single code base; supporting user-defined<br />

code extensions on the <strong>GPU</strong>; CUDA 4.0 C++ extensions (experiences,<br />

challenges, and limitations); and a performance case study.<br />

Speaker(s): Chris Stiefeling (Oliver Wyman Financial Services)<br />

Topic(s): Finance (Intermediate)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM N<br />

S0526 Tools for Mobile Computational Photography<br />

This session will talk about advances in Mobile Computational<br />

Photography and the tools that NVIDIA is putting together to<br />

enable these on Tegra powered devices. It will demonstrate the<br />

use of FCam, an Application <strong>Program</strong>ming Interface (API) that<br />

allows for easy and precise control of the camera system. In<br />

addition, the FCam API can enable the application developer to<br />

replace basic camera routines such as metering, which are<br />

typically hidden inside black boxes in traditional camera<br />

programming models.<br />

Speaker(s): Alejandro Troccoli (Mobile Imaging Researcher, NVIDIA)<br />

Topic(s): Computational Photography (Intermediate)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM M<br />

S0638 Lenovo ThinkStation Accelerates Medical<br />

Research with Beckman Coulter (Presented by Lenovo)<br />

Lenovo ThinkStations utilize NVIDIA Maximus technology to<br />

accelerate mission critical applications across multiple industries,<br />

including manufacturing, media & entertainment, and Life<br />

Sciences. Discover how <strong>GPU</strong>s are used to accelerate medical<br />

research from product experts with Lenovo and Beckman Coulter.<br />

Beckman Coulter has utilized NVIDIA <strong>GPU</strong>s to reduce software<br />

development and test cycles by 50% with their Kaluza software.<br />

Kaluza is a revolutionary flow cytometry analysis software solution<br />

that provides visualization tools, speed and an innovative<br />

simplicity to the flow community. See how Kaluza allows users to<br />

analyze 10 million cells in real time. Session attendees will<br />

receive a drawing entry to win a brand new ThinkPad Tablet.<br />

Speaker(s): Scott Ruppert (ThinkStation Technical Solutions Manager,<br />

Lenovo), Tanmay Dharmadhikari (Senior Software Development<br />

Engineer, Beckman Coulter)<br />

Topic(s): Computer Graphics, Life Sciences (Beginner)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

HALL 1<br />

S0641 CUDA 5 and Beyond<br />

CUDA, NVIDIA’s platform for parallel computing, has grown<br />

rapidly in the past 5 years. The performance and efficiency of<br />

software built on CUDA, combined with a thriving ecosystem of<br />

programming languages, libraries, tools, training, and service<br />

providers, have helped make <strong>GPU</strong> computing a leading HPC<br />

technology. CUDA 5 and the Kepler <strong>GPU</strong> architecture don’t just<br />

increase application performance; they enable a more powerful<br />

parallel programming model that expands the possibilities of <strong>GPU</strong><br />

computing, and language features that improve programmer<br />

productivity. In this talk you’ll hear about these revolutionary<br />

features and get insight into the philosophy driving the<br />

development of new CUDA hardware and software. You will learn<br />

about NVIDIA’s vision for CUDA and the challenges for the future<br />

of parallel software development.


Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0803 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight<br />

development team! Whether you would like a private meeting to<br />

discuss specific product features or test out your application with<br />

the latest version of Nsight, or you just want to hang out with the<br />

team after attending one of the exciting training sessions, the<br />

lounge is a great place to learn everything you ever wanted to know<br />

about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />

ROOM J1<br />

S0021 OptiX for DirectX <strong>Program</strong>mers - EVE Online’s<br />

<strong>GPU</strong>-Raytraced Portraits<br />

By integrating NVIDIA’s OptiX system for real-time <strong>GPU</strong> raytracing<br />

into a DirectX9 based engine, CCP Games enables high-quality<br />

raytraced player portraits for the single shard MMO EVE Online,<br />

reusing the game’s assets and pipeline. We selectively add<br />

stochastic effects while closely maintaining the look of the<br />

DX9-based renderer that Art Direction aimed for. In this talk we<br />

approach OptiX from the point of view of a programmer familiar<br />

with DirectX, discuss integrating these two systems, and show<br />

how we reproduced some DirectX-based effects like transparency<br />

and subsurface scattering within OptiX.<br />

Speaker(s): Bert Peers (Senior Graphics <strong>Program</strong>mer, CCP Games)<br />

Topic(s): Ray Tracing, Computer Graphics, Application Design &<br />

Porting Techniques (Intermediate)<br />

TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />

ROOM A1<br />

S0104 <strong>GPU</strong> Implementation of Deep Learning for<br />

Intelligent Computer Vision<br />

Learn how to use <strong>GPU</strong> supercomputing for intelligent computer<br />

vision, via deep learning algorithms. We will focus on a case study<br />

of visual object and event recognition in a humanoid robotics<br />

context, involving a port to CUDA of the DeSTIN “compositional<br />

spatiotemporal deep learning network” vision processing<br />

algorithm (originally implemented at the University of Tennessee<br />

in Knoxville for conventional serial computers). The audience will<br />

learn how to use the open-source DeSTIN CUDA code, and also<br />

how to port other deep learning algorithms to CUDA.<br />

Speaker(s): Ben Goertzel (CEO, Novamente LLC)<br />

Topic(s): Computer Vision, Algorithms & Numerical Techniques<br />

(Advanced)<br />

TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />

ROOM C<br />

S0314 Efficient k-Nearest Neighbor Search Algorithms<br />

on <strong>GPU</strong>s<br />

Come see how to select the k smallest elements from an unsorted<br />

list. We present a selection and combination of different<br />

algorithms that perform exact k-nearest neighbors search<br />

(k-NNS) on <strong>GPU</strong>s and outperform the competition. In this session<br />

we present four different selection algorithms designed to exploit<br />

the parallelism of the <strong>GPU</strong> differently according to the relative<br />

size of the corpus data set, the size of the query set and the<br />

number of neighbors sought. We show the application of Logo<br />

Retrieval with SIFT vector matching on two different <strong>GPU</strong>s, the<br />

Tesla C1060 and the Fermi GTX480.<br />

Speaker(s): Nikos Pitsianis (Assistant Professor, Aristotle University,<br />

Greece), Xiaobai Sun (Professor, Duke University)<br />

Topic(s): Machine Learning & AI, Databases, Data Mining, Business<br />

Intelligence, Algorithms & Numerical Techniques (Beginner)<br />
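The selection step described in the abstract above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not one of the four GPU algorithms from the session: it computes squared Euclidean distances from one query vector to every corpus vector and keeps the k smallest with a heap. The name `k_nearest` and its arguments are hypothetical.

```python
import heapq


def k_nearest(query, corpus, k):
    """Return the indices of the k corpus vectors closest to query.

    Distances are squared Euclidean; ties are broken by the lower index.
    """
    distances = [
        (sum((q - c) ** 2 for q, c in zip(query, point)), index)
        for index, point in enumerate(corpus)
    ]
    # nsmallest selects the k smallest (distance, index) pairs without
    # fully sorting the list, which is the "select k smallest" problem.
    return [index for _, index in heapq.nsmallest(k, distances)]


corpus = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(k_nearest((0.0, 0.0), corpus, 2))  # prints [0, 3]
```

Selecting the k smallest entries avoids a full sort of the distance list, which matters when k is much smaller than the corpus.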

TUESDAY, MAY 15, 16:30 (90 MINUTES)<br />

ROOM A7<br />

S0628 <strong>GPU</strong>s in Energy & Exploration: Software<br />

Development and Production<br />

This session will feature expert panelists who will share their<br />

experience adopting <strong>GPU</strong>s in their respective environments. Since<br />

2009, these production systems have been boosting throughput<br />

and shortening cycle times while delivering enhanced images using<br />

NVIDIA technologies. Featured panelists will include: Hess,<br />

Schlumberger, Petrobras, Chevron and more.<br />

Speaker(s): Paulius Micikevicius (Developer <strong>Technology</strong> Engineer, NVIDIA),<br />

Alexander Loddoch (Chevron), Dave Nichols (Schlumberger), Paulo Souza<br />

(Petrobras), Mauricio Araya (Repsol)<br />

Topic(s): Energy Exploration (Beginner)<br />

TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />

ROOM A2<br />

S0659 Computer Simulation of Lignocellulosic Biomass<br />

Biomass from terrestrial plants offers the potential of an<br />

abundant source of cellulosic ethanol. However, technical<br />

problems arising from the recalcitrance of biomass to hydrolysis<br />

still hinder the cost-effective conversion of biomass to ethanol.<br />

Here, computer simulation of biomass is employed to understand<br />

the physical origins of biomass recalcitrance. The temperature-dependent<br />

structure and dynamics of lignin polymers in aqueous<br />

solution are examined using extensive molecular dynamics<br />

simulations. Neutron scattering experiments and molecular<br />

dynamics simulations reveal the structure of lignin aggregates.<br />

Finally, the interaction of lignin with cellulose is examined and<br />

differential binding to crystalline and amorphous cellulose<br />

explained thermodynamically.<br />

Speaker(s): Loukas Petridis (Staff Scientist, Oak Ridge National<br />

Laboratory)<br />

Topic(s): Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 17:00 (25 MINUTES)<br />

ROOM K<br />

S0037 SeqNFind: Application Of CUDA <strong>GPU</strong><br />

Technologies To Sequence Alignment Techniques<br />

Explosive growth in the amount of genomic data has created a<br />

need for faster systems that align and compare nucleotide<br />

sequences. With the development of tools for leveraging the<br />

massively parallel architecture of NVIDIA <strong>GPU</strong>s it is a logical next<br />

step to construct algorithms for genomic analysis on <strong>GPU</strong> clouds/<br />

clusters. Although a seemingly simple task, there are a number of<br />

challenges to deploying the current algorithms. Every algorithm<br />

from Smith-Waterman to BLAST has its own unique set of<br />

barriers. Presented here are some of the lessons learned and how<br />

ongoing genomic research projects have benefitted from the<br />

increased speed and accuracy.<br />

Speaker(s): D. Andrew Carr (Director of Bioinformatics, Accelerated<br />

<strong>Technology</strong> Laboratories)<br />

Topic(s): Bioinformatics, Algorithms & Numerical Techniques<br />

(Advanced)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

HALL 1<br />

S0156 Towards Computing the Cure for Cancer<br />

Attend this session to learn about how to create “designer”<br />

genomic analysis pipelines as part of the “Compute the Cure” for<br />

cancer initiative from NVIDIA Foundation. It will offer an overview<br />

of an open-source framework that enables the creation of<br />

customized genomic analysis pipelines. It will discuss how<br />

different plug-ins from the “mapping/realignment/discovery”<br />

repositories, respectively, can be composed to form a genomic<br />

analysis pipeline. Attendees will learn to use next-generation<br />

sequencing data to characterize previously undetectable genetic<br />

changes between normal and malignant cells and ways to<br />

contribute to the “Compute the Cure” cause.<br />

Speaker(s): Wu Feng (Professor, Virginia Tech), Heshan Lin (Research<br />

Scientist, Virginia Tech)<br />

Topic(s): Bioinformatics, Life Sciences, Supercomputing, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

TUESDAY, MAY 15, 17:00 (25 MINUTES)<br />

ROOM C<br />

S0219 Efficient Top-Down Planning in<br />

Business Intelligence<br />

In business intelligence, tasks like corporate planning or what-if<br />

analysis complement traditional reporting and analysis. One main<br />

difference is that while the latter only read data, the former<br />

require the change of possibly large numbers of existing and<br />

creation of new data records in the business model, preferably in<br />

real time. In this session, we describe the extension of an existing<br />

BI tool, Jedox OLAP, by <strong>GPU</strong>-based parallel algorithms for<br />

interactive planning scenarios. Compared to sequential in-memory<br />

algorithms, our CUDA approach yields tremendous<br />

speedups and can also cope with large amounts of data by using<br />

multiple <strong>GPU</strong>s.<br />

Speaker(s): Tobias Lauer (Senior Researcher, Jedox AG), Alexander<br />

Haberstroh (Software Developer, Jedox AG)<br />

Topic(s): Databases, Data Mining, Business Intelligence, Finance,<br />

Algorithms & Numerical Techniques (Intermediate)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0247 3D ADI Method for Fluid Simulation on<br />

Multiple <strong>GPU</strong>s<br />

Find out about a multiple <strong>GPU</strong> implementation of the Alternating<br />

Direction Implicit method for large 3D domains. The ADI technique<br />

is applied towards direct numerical fluid simulation. Modeling<br />

complex flows demands extremely large grids and a distributed<br />

computation is required for sharing the memory among multiple<br />

<strong>GPU</strong>s. In this session a novel distributed tridiagonal solver as well<br />

as parallelization and load balancing strategies will be covered in<br />

detail. Finally, a comprehensive performance analysis and scaling<br />

studies for different input geometries and possible future<br />

improvements will be discussed.<br />

Speaker(s): Nikolay Markovskiy (HPC DevTech Engineer, NVIDIA),<br />

Nikolai Sakharnykh (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques, Computational<br />

Fluid Dynamics (Intermediate)<br />
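
For readers new to the ADI method named in S0247: each implicit sweep along one grid axis typically reduces to many independent tridiagonal systems, one per grid line. The sketch below shows the classic serial Thomas algorithm for a single such system. It is not the distributed solver presented in the session, and the name `solve_tridiagonal` and its argument layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solves a tridiagonal system with the Thomas algorithm:
//   lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i]
// lower[0] and upper[n-1] are ignored. All four vectors must have the
// same length. No pivoting is performed, so the matrix is assumed to be
// diagonally dominant (or otherwise safe to factor without pivoting).
std::vector<double> solve_tridiagonal(const std::vector<double>& lower,
                                      const std::vector<double>& diag,
                                      const std::vector<double>& upper,
                                      const std::vector<double>& rhs) {
    const std::size_t n = diag.size();
    std::vector<double> c(n), d(n), x(n);
    if (n == 0) {
        return x;
    }

    // Forward elimination: remove the sub-diagonal.
    c[0] = upper[0] / diag[0];
    d[0] = rhs[0] / diag[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double denom = diag[i] - lower[i] * c[i - 1];
        c[i] = upper[i] / denom;  // c[n-1] is computed but never used
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom;
    }

    // Back substitution.
    x[n - 1] = d[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] = d[i] - c[i] * x[i + 1];
    }
    return x;
}
```

The forward and backward loops each depend on the previous iteration, so a single system offers little parallelism on its own. A multi-GPU implementation would typically either solve many systems concurrently or switch to a parallel tridiagonal algorithm.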

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM J2<br />

S0267A Mixing Graphics and Compute with Multiple <strong>GPU</strong>s<br />

In this session we will cover all the different aspects of interaction<br />

between graphics and compute. The first part of the session will<br />

focus on compute API interoperability with OpenGL (using CUDA<br />

and OpenCL APIs), while the second part of the session will delve<br />

into interoperability at a system level. In particular we will go<br />

through the challenges and benefits of dedicating one <strong>GPU</strong> for<br />

compute and another for graphics, how different system<br />

configurations affect data transfer between two <strong>GPU</strong>s, and how it<br />

translates into application design decisions helping to enable an<br />

efficient, cross-<strong>GPU</strong> interoperability between compute and<br />

graphics contexts.<br />

This session is repeated on Thursday at 15:30 (S0267B).<br />

Speaker(s): Alina Alt (Applied Engineer, NVIDIA)<br />

Topic(s): Visualization, Application Design & Porting Techniques<br />

(Beginner)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM A5<br />

S0359 VMware and NVIDIA: Delivering 3D Workstations<br />

from the Cloud<br />

This session will detail the delivery of the most demanding<br />

Workstation class workloads from the private cloud using<br />

technologies from NVIDIA and VMware. We will cover the<br />

configuration and performance metrics of the combined VMware,<br />

NVIDIA direct pass through hardware accelerated graphics<br />

solution. Using sample workloads, we will demonstrate how<br />

customers can realize the operational and security benefits of<br />

cloud based personal computing without sacrificing performance.<br />

Speaker(s): Aaron Blasius (Sr. Product Manager, VMware), Warren<br />

Ponder (Director, Product Management, VMware)<br />

Topic(s): Visualization, Cloud Computing (Advanced)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM L<br />

S0427 Intra-Day Risk-Management with Parallelized<br />

Algorithms on <strong>GPU</strong>s<br />

The challenge with intra-day risk management is that a very large<br />

number of calculations are required to be performed in a very<br />

short amount of time. Typically, we may be interested in<br />

calculating VaR for 100 to 1000 securities per second based on<br />

100 million potential scenarios. The magnitude of these<br />

calculations is not Utopian but it reflects the reality of modern<br />

financial institutions and exchanges. In this presentation, we<br />

outline how the complex problem of intra-day risk management<br />

can be solved using parallelized algorithms on <strong>GPU</strong>s. The<br />

methodology has been proven in a POC at 2 financial institutions.<br />

Speaker(s): Partha Sen (CEO, Fuzzy Logix)<br />

Topic(s): Databases, Data Mining, Business Intelligence, Finance,<br />

Algorithms & Numerical Techniques, Supercomputing (Advanced)<br />
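
To make the VaR calculation mentioned in S0427 concrete, the sketch below computes scenario-based Value at Risk for one position as an empirical quantile of simulated profit-and-loss values. It is a serial illustration under stated assumptions, not the presenter's methodology: the function name `value_at_risk` is hypothetical, and the quantile convention (taking the floor((1 - confidence) * n)-th smallest P&L, counting from zero) is one of several in common use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scenario-based Value at Risk for a single position.
// `pnl` holds one simulated profit-and-loss value per scenario
// (negative values are losses). The result is reported as a positive
// number when the selected tail scenario is a loss.
// `pnl` is taken by value because selection reorders the elements.
double value_at_risk(std::vector<double> pnl, double confidence) {
    assert(!pnl.empty());
    assert(confidence > 0.0 && confidence < 1.0);

    const std::size_t n = pnl.size();
    // Zero-based rank of the tail scenario under the convention above.
    std::size_t tail = static_cast<std::size_t>(
        std::floor((1.0 - confidence) * static_cast<double>(n)));
    if (tail >= n) {
        tail = n - 1;
    }

    // Only the tail-th smallest P&L is needed, so select instead of sorting.
    std::nth_element(pnl.begin(),
                     pnl.begin() + static_cast<std::ptrdiff_t>(tail),
                     pnl.end());
    return -pnl[tail];
}
```

With the scenario counts and per-second throughput quoted in the abstract, this quantile selection would be repeated for every security, which is the kind of independent, data-parallel work the session proposes moving to GPUs.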

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM A3<br />

S0602 An Introduction to the Thrust Parallel<br />

Algorithms Library<br />

Thrust is a parallel algorithms library which resembles the C++<br />

Standard Template Library (STL). Thrust’s high-level interface<br />

greatly enhances developer productivity while enabling performance<br />

portability between <strong>GPU</strong>s and multicore CPUs. Interoperability with<br />

established technologies (such as CUDA, TBB and OpenMP)<br />

facilitates integration with existing software. In this talk we’ll walk<br />

through the library’s main features and explain how developers can<br />

build high-performance applications rapidly with Thrust.


Speaker(s): Nathan Bell (Senior Research Scientist, NVIDIA), Julien<br />

Demouth (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />

Development Tools and Libraries (Beginner)<br />

TUESDAY, MAY 15, 17:00 (25 MINUTES)<br />

ROOM A2<br />

S0608 Toward Global Seismic Imaging based on<br />

Spectral-Element and Adjoint Methods<br />

Precise information about the structure of the solid Earth comes<br />

from seismograms recorded at the surface of a highly<br />

heterogeneous lithosphere. Seismic imaging based on spectral-element<br />

and adjoint methods can assimilate this information into<br />

three-dimensional models of elastic and anelastic structure.<br />

These methods fully account for the physics of wave excitation,<br />

propagation, and interaction by numerically solving the<br />

inhomogeneous equations of motion for a heterogeneous<br />

anelastic solid. Such methods require the execution of complex<br />

computational procedures that challenge the most advanced<br />

high-performance computing systems. Current research is<br />

petascale; future research will require exascale capabilities. We<br />

illustrate the current state-of-the-art based on an inversion for<br />

European upper-mantle structure. Our ultimate goal is to move<br />

toward “adjoint tomography” of the entire planet.<br />

Speaker(s): Jeroen Tromp (Director, Princeton Institute for<br />

Computational Science, Princeton)<br />

Topic(s): Supercomputing (Intermediate)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM M<br />

S0643 Hybrid Architectures for Advanced Seismic<br />

Imaging: Recent Experiences at Bull (Presented by Bull)<br />

The two-part presentation describes Bull’s system architecture<br />

for accelerated seismic applications using <strong>GPU</strong>s, together with<br />

the parallel programming aspects involved and some examples of<br />

recent work. The first part covers hybrid system architectures,<br />

basic principles of Reverse Time Migration and the numerical<br />

methods used to implement it in various forms, together with the<br />

architectural features needed, depending on the specific<br />

algorithms used. The second part examines CUDA programming<br />

aspects and the use of compiler-based directives and libraries to<br />

convert existing codes for maximum performance and scalability<br />

on <strong>GPU</strong> architectures.<br />

Speaker(s): Mathieu Dubois (Senior HPC Consultant, Bull), Guy Gueritz<br />

(Oil & Gas Business Development Director, Bull)<br />

Topic(s): Energy Exploration, High Performance Computing<br />

(Intermediate)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM A8<br />

S0646 Massively Parallel Code Development on Stelletto<br />

CDA (Presented by Creative Consultants)<br />

Come participate in the global launch of Stelletto – a multi-Node,<br />

office based, <strong>GPU</strong> accelerated conSTELLAtion compute platform.<br />

Join Rob Farber (author/scientist), Denis Gerrer (CAPS<br />

Enterprise), and Greg Scantlen (Creative Consultants) to learn<br />

how to create and leverage massively parallel applications.<br />

Whether you are porting legacy code or developing new code from<br />

scratch, the Stelletto Code Development Appliance offers a<br />

cost-effective methodology for producing scalable apps. In 50<br />

minutes you will learn the essentials of assembling a complete<br />

hardware and software solution for scalable Many-Core and <strong>GPU</strong><br />

accelerated code development from plug-in Stelletto to massively<br />

parallel executable code.<br />

Speaker(s): Rob Farber (BlackDog Endeavors, LLC), Denis Gerrer<br />

(CAPS enterprise), Greg Scantlen (CreativeC.com)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

(Intermediate)<br />

TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0804 CUDA Debugger Training on Windows<br />

Nsight offers a powerful CUDA debugging feature set<br />

that enables developers to quickly spot bugs. From the memory<br />

checker to advanced breakpoints and the variable warp watch panel, a<br />

developer can quickly isolate memory access errors, narrow<br />

thousands of threads down to a specific thread and quickly spot<br />

abnormal variable value ranges. Through a set of comprehensive<br />

exercises, the attendee will be able to utilize these features to<br />

become fully proficient at developing CUDA code.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 17:30 (25 MINUTES)<br />

ROOM C<br />

S0043 30x Faster Regular Expressions on a <strong>GPU</strong><br />

We present a regular expression (regex) engine on a <strong>GPU</strong>. We<br />

utilize the highly parallel architecture of <strong>GPU</strong>s to accelerate such<br />

searches. We believe that previous attempts to utilize the <strong>GPU</strong> for<br />

this task did not fully tap its potential. Regexes present imbalanced<br />

compute workloads which are very different from common <strong>GPU</strong><br />

applications (CFD, CG and image processing). Hence, they can<br />

teach us general lessons on how to utilize <strong>GPU</strong>s for more general<br />

workloads. Our initial results show a 30x improvement in running<br />

time relative to single-threaded commercial regex engines.<br />

Speaker(s): David Lehavi (Senior Research Scientist, HP)<br />

Topic(s): Databases, Data Mining, Business Intelligence (Advanced)<br />

TUESDAY, MAY 15, 17:30 (25 MINUTES)<br />

ROOM K<br />

S0287 Jacket for Multidimensional Scaling in Genomics<br />

In this tutorial, we will present AccelerEyes’ Jacket software<br />

which enables <strong>GPU</strong> computing in MATLAB through a user case<br />

study entitled “Multidimensional Scaling for Genomics”. We show<br />

how Jacket enables developers to write and run code on the <strong>GPU</strong><br />

in the native M-Language used in MATLAB. By simply casting data<br />

to Jacket’s <strong>GPU</strong> data structure, MATLAB functions are<br />

transformed into <strong>GPU</strong> functions. Additionally, we will also include<br />

demos of running MATLAB code on the <strong>GPU</strong> for image and signal<br />

processing, life science, finance, and other applications. A Q/A<br />

session will enable audience members to ask specific questions<br />

about Jacket.<br />

Speaker(s): Chris McClanahan (Software Engineer, AccelerEyes)<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

TUESDAY, MAY 15, 17:30 (25 MINUTES)<br />

ROOM A2<br />

S0657 Applying for INCITE <strong>Program</strong>, Conclusions, Q&A<br />

This session offers a wrap-up of “<strong>GPU</strong>-accelerated Science on<br />

Titan: Tapping into the World’s Preeminent <strong>GPU</strong> Supercomputer to<br />

Achieve Better Science” with Jack Wells.<br />

Speaker(s): Jack Wells, Ph.D. (Director of Science, Oak Ridge<br />

Leadership Computing Facility, Oak Ridge National Laboratory)<br />

Topic(s): Supercomputing (Intermediate)<br />


C++ Accelerated Massive Parallelism (C++ AMP)<br />

What is C++ AMP, how can it help me, and where can I get it?<br />

C++ AMP is a key new C++ language feature plus an STL-like library. It's designed to help you increase the performance of<br />

What platforms and hardware does C++ AMP support?<br />

What new language feature does C++ AMP introduce?<br />

Microsoft added the restrict(amp) […]<br />

function can be executed on a C++ AMP accelerator. The restrict keyword instructs the compiler to statically check that the<br />

[…] void myFunc() restrict(amp) {…}<br />

[…] for purposes that are unrelated to C++ AMP.<br />

What new classes (APIs) does C++ AMP introduce?<br />

What does C++ AMP code look like?<br />

#include <amp.h><br />

void AddArrays(int n, int m, int * pA, int * pB, int * pSum) {<br />

concurrency::array_view<int, 2> a(n, m, pA), b(n, m, pB), sum(n, m, pSum);<br />

concurrency::parallel_for_each(sum.extent, [=](concurrency::index<2> i) restrict(amp)<br />

{<br />

sum[i] = a[i] + b[i];<br />

});<br />

}<br />


SESSION INFORMATION<br />

WEDNESDAY, MAY 16<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM N<br />

S0010 Towards Routine Microsecond Molecular<br />

Dynamics Simulations on Commodity Hardware<br />

The original AMBER 11 provided performance on one <strong>GPU</strong><br />

equivalent to an 8 node cluster and almost 60ns/day for 8 <strong>GPU</strong>s<br />

running the JAC production benchmark without additional<br />

approximations, outstripping the performance of all conventional<br />

supercomputers. Here we describe further optimization of the<br />

code, coupled with hardware and software advances on the part of<br />

NVIDIA, that provides performance of >50ns/day on a single <strong>GPU</strong><br />

with multiple <strong>GPU</strong>s providing simulation rates on systems the size<br />

of DHFR approaching a microsecond per day. This brings<br />

performance levels on desktops and commodity hybrid clusters to<br />

levels previously only considered possible using custom silicon.<br />

Speaker(s): Ross Walker (Assistant Professor, University of California<br />

San Diego)<br />

Topic(s): Molecular Dynamics, Life Sciences (Advanced)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM A8<br />

S0017 4D Medical Image Processing with CUDA<br />

Learn how to do 4D image processing with CUDA, especially for<br />

medical imaging applications. In this session we will give a couple<br />

of examples of how 4D image processing can take advantage of<br />

the computational power of the <strong>GPU</strong>. We will present how to use<br />

the <strong>GPU</strong> for functional magnetic resonance imaging (fMRI)<br />

analysis and true 4D image denoising. Most of our examples use<br />

the <strong>GPU</strong> both to speed up the analysis and to visualize the results.<br />

Speaker(s): Anders Eklund (PhD Student, Linköping University)<br />

Topic(s): Medical Imaging & Visualization, Audio, Image and Video<br />

Processing, Neuroscience, Visualization (Advanced)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM B<br />

S0072 <strong>GPU</strong>-Enabled Spatiotemporal Model of Stochastic<br />

Cardiac Calcium Dynamics and Arrhythmias<br />

Calcium ions play a central role in controlling the contraction of the<br />

heart to pump blood. This requires tight regulation of cellular<br />

calcium dynamics which depends upon over 1,000,000 calcium<br />

channels that open and close stochastically and have a very specific<br />

spatial arrangement. In the School of Systems Biology at George<br />

Mason University, CUDA technology coupled to novel algorithms for<br />

Monte Carlo simulation has made possible this computationally<br />

expensive spatiotemporal model of calcium dynamics in the heart<br />

muscle cell to study the regulation of calcium dynamics and which<br />

aberrations lead to cardiac arrhythmia.<br />

Speaker(s): Mohsin Jafri (Professor and Chair, George Mason University),<br />

Hoang-Tron Minh Tuan (PhD Student, George Mason University)<br />

Topic(s): Life Sciences, Bioinformatics (Beginner)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM A7<br />

S0171 Numerical Modeling Of 3D Anisotropic Seismic<br />

Wave Propagation On Multi<strong>GPU</strong> Platforms<br />

We present an efficient and accurate numerical algorithm for the<br />

simulation of seismic experiments. The basis of the approach is a<br />

heterogeneous spectral element method implemented on<br />

Multi<strong>GPU</strong> and applied to the anisotropic elastic wave equation. The<br />

approach was designed to simulate wave propagation in 3D<br />

arbitrary anisotropic elastic media. Due to the use of an<br />

unstructured grid, the spectral element algorithm enables<br />

handling complicated geometries of the layers. We discuss results<br />

and the computational cost of simulation on a Multi<strong>GPU</strong> platform.<br />

Several aspects of the code implementation are considered:<br />

optimal domain decomposition, data transfers between <strong>GPU</strong>s by<br />

means of P2P and UVA, etc.<br />

Speaker(s): Denis Sabitov (Schlumberger)<br />

Topic(s): Energy Exploration, Algorithms & Numerical Techniques,<br />

Supercomputing, Molecular Dynamics (Intermediate)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM M<br />

S0253 Sensor Processing with Rugged Kepler <strong>GPU</strong>s<br />

(Presented by GE Intelligent Platforms)<br />

Swimming in sensors and drowning in data? Turn the tide on<br />

high-bandwidth sensors with rugged next-generation Kepler <strong>GPU</strong>s<br />

from NVIDIA. See how we deploy Kepler into the most extreme of<br />

environments, providing GP<strong>GPU</strong> capabilities onboard platforms<br />

where SWaP and GFLOPS/watt are key. Dig into four realtime CUDA<br />

sensor processing applications - Hyperspectral Imaging, Wide-Area<br />

Surveillance, 360° Situational Awareness, and GSM Cellular SIGINT.<br />

Discuss the CUDA algorithms, interconnects, and rugged platforms<br />

behind each. Learn how we utilize <strong>GPU</strong>Direct and realtime Linux for<br />

improved latency and determinism.<br />

Speaker(s): Dustin Franklin (GP<strong>GPU</strong> Applications Engineer, GE<br />

Intelligent Platforms)<br />

Topic(s): Audio, Image and Video Processing, General Interest, Machine<br />

Vision, Computer Vision (Intermediate)<br />

WEDNESDAY, MAY 16, 09:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0289 Fine-Grained Parallel Preconditioners for Fast<br />

<strong>GPU</strong>-based Solvers<br />

Leverage the power of <strong>GPU</strong>s for efficient parallel solution of large<br />

sparse linear systems of equations by means of fine-grained and<br />

scalable parallel preconditioners. In this session we describe<br />

parallel preconditioners for <strong>GPU</strong>s based on multicolor re-ordering<br />

for Gauss-Seidel-type and ILU-type preconditioners as well as<br />

approximate inverse (FSAI) preconditioners. With the power(q)-pattern<br />

method we detail a novel method for controlling the fill-in<br />

pattern of ILU(p) factorizations that introduces a high degree of<br />

parallelism in the preconditioning phase. We demonstrate<br />

significant improvements with respect to solver time for various<br />

problem scenarios and different Krylov-type solvers.<br />

Speaker(s): Dimitar Lukarski (Research Associate, Karlsruhe Institute<br />

of <strong>Technology</strong> (KIT)), Jan-Philipp Weiss (Junior Professor, Karlsruhe<br />

Institute of <strong>Technology</strong>)<br />

Topic(s): Algorithms & Numerical Techniques (Advanced)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM A1<br />

S0353 <strong>Program</strong>ming Multi-<strong>GPU</strong>s for Scalable Rendering<br />

Multi-<strong>GPU</strong> configurations are becoming common, affordable<br />

options for OpenGL applications to scale performance, data size,<br />

display size and image quality. We show how to structure your<br />

application for multi-<strong>GPU</strong> rendering by using multiple threads and<br />

OpenGL contexts and handle the synchronization and data<br />

transfer. We conclude with a discussion of how to implement<br />

common parallel rendering approaches such as sort-first,<br />

sort-last and hybrid techniques.<br />

Speaker(s): Shalini Venkataraman (Senior Applied Engineer, NVIDIA)<br />

Topic(s): Visualization (Advanced)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM L<br />

S0383 Speedup Derivatives and Structured Products<br />

Pricing, Reduce TCO Using <strong>GPU</strong>s<br />

Numerix will share its experience using <strong>GPU</strong> to significantly<br />

reduce its customers’ Total Cost of Ownership (TCO) and<br />

accelerate forward Monte Carlo pricing methods and hybrid<br />

models of complex financial structured products and variable<br />

annuities. Numerix will describe how it combines complex<br />

financial and actuarial modeling with user scripting to drive <strong>GPU</strong><br />

execution from a script interpreted at run time. This architecture<br />

is well suited to financial services firms with portfolios of many<br />

different types of structured products where deals are<br />

represented independently from the models used to price them.<br />

Speaker(s): Steve Karmesin (Senior Developer, Numerix)<br />

Topic(s): Finance, Algorithms & Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM A5<br />

S0420 NSight IDE for Linux and Mac<br />

NSight IDE for Linux and Mac is an all-in-one development<br />

environment that lets you develop, debug and optimize CUDA code in<br />

an integrated UI environment. If you were waiting for an IDE on Linux<br />

and Mac then this session is for you. This session provides a detailed<br />

usage walk-through of a fully CUDA aware source editor, build<br />

integration of the CUDA toolchain, graphical debugger for both CPU<br />

and <strong>GPU</strong>, and graphical profiler to enable performance optimization.<br />

Speaker(s): David Goodwin (Software Engineer, NVIDIA), Eugene<br />

Ostroukhov (Tools Developer, NVIDIA)<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 09:00 (25 MINUTES)<br />

ROOM K<br />

S0431 Evolving Use of <strong>GPU</strong> for Dassault Systems<br />

Simulation Products<br />

SIMULIA, the Dassault Systems brand for simulation, has been<br />

working with NVIDIA GP<strong>GPU</strong> cards to accelerate the computation<br />

required in doing large-scale structural finite-element<br />

simulations with the widely used Abaqus product line. SIMULIA’s<br />

initial efforts with GP<strong>GPU</strong>s have been focused on accelerating<br />

particularly costly parts of the code when running both on<br />

workstations and clusters. We will look at success in these areas<br />

with existing products. Further, SIMULIA is now looking at how<br />

evolving programming models like OpenACC open the door to<br />

using <strong>GPU</strong>s as a compute platform rather than just an accelerator for<br />

limited parts of an application.<br />

Speaker(s): Luis Crivelli (Dassault Systemes, SIMULIA)<br />

Topic(s): Computational Structural Mechanics, Parallel <strong>Program</strong>ming<br />

Languages & Compilers (Intermediate)<br />

WEDNESDAY, MAY 16, 09:00 (90 MINUTES)<br />

ROOM C<br />

S0531 Exascaling Your Apps<br />

In the global exascale race, hardware often takes center stage.<br />

But the race might ultimately be won or lost based on how well<br />

the industry optimizes new and existing applications for extreme<br />

parallelism. Today’s apps will not just run on tomorrow’s systems,<br />

so we must think strategically and creatively about how to design<br />

applications that take maximum advantage of the first power-<br />

efficient, accelerator-driven exascale systems. This panel of HPC,<br />

software and computer science experts will discuss what we can<br />

and should be doing, including a review of new scientific and<br />

commercial HPC requirements, programming model options and<br />

how to best align architecture and software design processes.<br />

Speaker(s): Mike Bernhardt (The Exascale Report), Olav Lindtjorn<br />

(Schlumberger), Satoshi Matsuoka (Titech), Steve Scott (CTO, Tesla<br />

Business, NVIDIA), Jeff Vetter (Oak Ridge National Laboratory)<br />

Topic(s): Supercomputing (Beginner)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0805 CUDA Profiler Training on Windows<br />

Nsight offers a comprehensive set of performance analysis tools.<br />

From tracing system-wide multi-core CPU and<br />

multi-<strong>GPU</strong> activity to profiling CUDA kernels with precise profiling<br />

experiments, developers can identify system level optimization<br />

opportunities as well as expensive and inefficient CUDA kernels<br />

requiring in-depth analysis with the CUDA profiler. Through a set<br />

of comprehensive exercises, the attendee will be able to utilize<br />

these features to become fully proficient at optimizing complex<br />

CUDA applications.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2000 Emerging Companies Summit Opening Address,<br />

Followed by CEO on Stage featuring Rocketick and Cortexica<br />

The Emerging Companies Summit is a unique forum for startup<br />

companies to showcase innovative applications that leverage the<br />

<strong>GPU</strong> to solve visual and compute-intensive problems. The opening<br />

address includes an overview of NVIDIA’s <strong>GPU</strong> ecosystem<br />

development activities. ECS is a great opportunity to discover new<br />

players in the <strong>GPU</strong> ecosystem, find great investments, explore<br />

partnership and customer/vendor opportunities, network/build<br />

relationships, and discuss the future of an industry that is<br />

reshaping computing. Immediately following the opening address is<br />

the ECS CEO on Stage session featuring two startups who will each<br />

have 15 minutes to introduce their companies and interact with a<br />

panel of leading venture capitalists, technology executives, and<br />

industry analysts.<br />

Speaker(s): Jeff Herbst (Vice President of Business Development, NVIDIA),<br />

Tomer Ben-David (VP R&D, Rocketick), Iain McCready (CEO, Cortexica)<br />

Topic(s): General Interest<br />

WEDNESDAY, MAY 16, 09:30 (25 MINUTES)<br />

ROOM K<br />

S0225 Speedup Altair RADIOSS Solvers Using NVIDIA <strong>GPU</strong><br />

Solvers are the heart of Altair’s HyperWorks computer aided<br />

engineering simulation software. In this session, you will learn how<br />

<strong>GPU</strong>s can improve their performance. The direct solver is widely used in<br />

structural analysis and sensitivity calculations. By offloading the<br />

intensive matrix computation to the <strong>GPU</strong> and using heterogeneous<br />

computing, you will discover how its speed can be increased<br />

compared to a multi-core approach. The iterative solver is particularly<br />

suited to solve large problems with millions of degrees of freedom.<br />

An innovative hybrid parallelization using multiple <strong>GPU</strong>s and MPI<br />

that dramatically reduces solution time will be presented.<br />

Speaker(s): Eric Lequiniou (Director, High Performance Computing,<br />

Altair), Hongwei Zhou (Senior Software Development Engineer, Altair)<br />

Topic(s): Computational Structural Mechanics (Beginner)


WEDNESDAY, MAY 16, 09:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0415 An Accelerated Weeks Method for Numerical<br />

Laplace Transform Inversion<br />

Mathematical methods based on the use of the Laplace transform<br />

are a standard component of undergraduate education. Real-world<br />

problems, however, often yield Laplace-space solutions that are<br />

too complex to be analytically inverted to expressions in physically<br />

meaningful variables. A robust numerical inversion approach is<br />

thus desirable. In this talk, I present one of the approaches to<br />

compute an approximate inverse, the Weeks method. I will also<br />

discuss the difficulties in performing numerical inversion. Finally,<br />

I will show how we have been able to utilize Jacket from<br />

AccelerEyes in MATLAB to more efficiently and robustly<br />

implement the Weeks method.<br />

Speaker(s): Patrick Kano (Co-Owner, Acunum Algorithms and<br />

Simulations, LLC)<br />

Topic(s): Algorithms & Numerical Techniques (Beginner)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

ROOM A2<br />

S0016 NVIDIA Grad Fellowship Fast Forward<br />

We invite you to a special presentation from our 2011-<strong>2012</strong><br />

Graduate Fellowship recipients to learn “what’s next” in the world<br />

of research and academia. The NVIDIA Graduate Fellowship<br />

recipients were selected from 200 applications in 27 countries.<br />

Sponsored projects involve a variety of technical challenges,<br />

including computer architecture, computer vision, programmability<br />

and optimization for heterogeneous systems, automotive computing<br />

and much more. We believe that these minds lead the future in our<br />

industry and we are proud to support the 2011-<strong>2012</strong> NVIDIA<br />

Graduate Fellows. For more information on the 2011-<strong>2012</strong> NVIDIA<br />

Graduate Fellows, please visit www.NVIDIA.com/fellowship.<br />

Speaker(s): David Luebke (Director, NVIDIA Research)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

ROOM N<br />

S0058 Advancing <strong>GPU</strong> Molecular Dynamics: Rigid Bodies<br />

in HOOMD-blue<br />

Learn how rigid body dynamics are implemented in HOOMD-blue.<br />

Previous releases were capable of executing classical molecular<br />

dynamics -- where free particles interact via smooth potentials and<br />

their motion through time is computed using Newton’s laws. The<br />

latest version allows particles to be grouped into bodies that move<br />

as rigid units. Users can now simulate materials made of cubes,<br />

rods, bent rods, jacks, plates, patchy particles, bucky balls, or any<br />

other arbitrary shapes. This talk covers how these algorithms are<br />

implemented on the <strong>GPU</strong>, tuned to perform well for bodies of any<br />

size, and discusses several use-cases relevant to research.<br />

Speaker(s): Joshua Anderson (Research Area Specialist, University of<br />

Michigan), Trung Dac Nguyen (University of Michigan)<br />

Topic(s): Molecular Dynamics, Computational Physics (Intermediate)<br />

WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />

ROOM K<br />

S0066 Particleworks: Particle-based CAE Software<br />

Fully Ported on Multi-<strong>GPU</strong><br />

Get the latest information on Particle-based fluid simulation +<br />

multi-<strong>GPU</strong> computing as a commercial CAE software named<br />

“Particleworks” in Japan. In this session, we provide<br />

information on (1) Particle simulation trends in CAE, (2)<br />

Particle simulation development in Japanese industry, (3)<br />

Implementation and performance of full <strong>GPU</strong> porting and (4)<br />

Multi-<strong>GPU</strong> scaling in several client cases.<br />

Speaker(s): Yoshiaki Hanada (CEO, Prometech Software), Issei Masaie<br />

(Chief Engineer, Prometech Software)<br />

Topic(s): Computational Fluid Dynamics (Intermediate)<br />

WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />

ROOM A7<br />

S0125 Memory Efficient Reverse Time Migration in 3D<br />

Learn how we can image the interior of the Earth in three dimensions<br />

using Reverse Time Migration. We discuss how <strong>GPU</strong>s accelerate this<br />

method using parallel wave propagation kernels, texture memories<br />

and minimal device-to-host transfers. Further, we discuss how the<br />

progression to 3D presents a multitude of new problems, particularly<br />

memory-based ones that cause the system to be I/O limited. By manipulating<br />

boundary positions and values to a pseudo-random form we show<br />

how many of these memory restrictions can be diminished and how<br />

detailed subsurface images can be fully constructed using <strong>GPU</strong>s.<br />

Speaker(s): Chris Leader (Research Assistant, Stanford<br />

Exploration Project)<br />

Topic(s): Energy Exploration, Computational Physics (Intermediate)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

ROOM A5<br />

S0235 Compiling CUDA and Other Languages for <strong>GPU</strong>s<br />

This talk gives an overview of the technology behind NVIDIA’s<br />

CUDA C and OpenCL C compilers, as well as the <strong>GPU</strong> architecture<br />

as seen from a compiler’s perspective. Similarities and<br />

differences with compiling to a CPU are also discussed. We<br />

provide insights into how compiler optimizations affect performance<br />

and how other languages could be targeted to <strong>GPU</strong>s.<br />

Speaker(s): Vinod Grover (Senior Manager, NVIDIA), Yuan Lin (Senior<br />

Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

ROOM L<br />

S0250 From <strong>GPU</strong> Computing Toward Full HPC In Finance<br />

with <strong>GPU</strong>s<br />

At the previous <strong>GTC</strong>, Murex showed how the company had<br />

adapted its generic Monte Carlo & PDE codes to be compatible with a<br />

payoff language. With one more year of experience with <strong>GPU</strong>s and<br />

OpenCL, Murex will show how the company has broadened the<br />

usage of <strong>GPU</strong>s for other subjects like vanilla screening or model<br />

calibration, and will focus on its new challenge: ‘use as many <strong>GPU</strong>s<br />

as possible’ for one single computation.<br />

Speaker(s): Pierre Spatz (Head of Quantitative Research, Murex SAS)<br />

Topic(s): Finance (Intermediate)<br />

WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />

ROOM B<br />

S0262 <strong>GPU</strong>-Accelerated Model-Based Drug Development<br />

Explore how <strong>GPU</strong>s can be used to improve the efficiency of drug<br />

development. Drug development is a very time-consuming,<br />

complex and expensive process that has a low success rate. A<br />

model-based drug development paradigm has been proposed as a<br />

possible solution to overcome these problems. A key challenge is<br />

to develop computationally intensive drug- and disease-specific<br />

models from a large quantity of highly complicated preclinical and<br />

clinical data. This session will describe how <strong>GPU</strong>s can and will<br />

play a key role in shortening the model development times and<br />

improving the efficiency of model-based drug development.<br />

Speaker(s): Chee Ng (Research Assistant Professor of Pediatrics,<br />

Children’s Hospital of Philadelphia/University of Pennsylvania)<br />

Topic(s): Life Sciences, Algorithms & Numerical Techniques,<br />

Bioinformatics (Beginner)<br />

WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />

ROOM A8<br />

S0312 <strong>GPU</strong> Implementation for Rapid Iterative Image<br />

Reconstruction in Nuclear Medicine<br />

<strong>GPU</strong> implementation can greatly accelerate iterative techniques of<br />

3D image reconstruction in nuclear medicine imaging. Single<br />

Photon Emission Computed Tomography (SPECT) is a functional<br />

imaging modality widely used in clinical diagnosis. To obtain high<br />

quality images within reduced scanning times, high-sensitivity<br />

collimators need to be used and their response function modeled<br />

in the reconstruction. This is in general very computationally<br />

intensive and unfeasible with CPU and algorithm<br />

implementations. Our software is able to perform the<br />

reconstruction of patient data within clinically acceptable times<br />

using relatively low cost and widely available hardware.<br />

Speaker(s): Jakub Pietrzak (Software Engineer, University of Warsaw)<br />

Topic(s): Medical Imaging & Visualization, Computational Physics,<br />

Computer Graphics (Intermediate)<br />

WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />

ROOM A1<br />

S0322 Warping & Blending for Multi-Display Systems<br />

This talk will describe how to scale up from one to many displays for<br />

high end visualization. You will learn about NVIDIA’s new Warp and<br />

Blend capability that allows you to create a truly seamless logical<br />

display comprised of many individual display outputs. With this new<br />

capability you can project your graphics onto curved surfaces and<br />

implement the correct transformation entirely on the <strong>GPU</strong> without<br />

any external hardware to get the correct display transformations.<br />

Speaker(s): Shalini Venkataraman (Senior Applied Engineer, NVIDIA)<br />

Topic(s): Visualization, Computer Graphics (Beginner)<br />

WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />

ROOM A3<br />

S0325 ArrayFire Graphics: A Tutorial<br />

Learn how to use the graphics primitives for <strong>GPU</strong> computing<br />

available in ArrayFire, a new C and C++ library for <strong>GPU</strong> computing<br />

in both CUDA and OpenCL. In this session, we will cover the<br />

capabilities of ArrayFire’s graphics primitives and show how to<br />

build fast, visual computing applications. The tutorial centers<br />

around the construction of an application for the computation of<br />

optical flow on the <strong>GPU</strong> and will illustrate how to couple graphics<br />

with compute using ArrayFire’s graphics primitives. We will also<br />

show how the graphics primitives can be composed to result in<br />

scalable, fast graphics that complement <strong>GPU</strong> applications.<br />

Speaker(s): Chris McClanahan (Software Engineer, AccelerEyes)<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

ROOM M<br />

S0633 Learn about new Hewlett-Packard <strong>GPU</strong><br />

Systems, Solutions, and Applications! (Presented by<br />

Hewlett-Packard)<br />

Learn how to shorten time to discovery, gain faster insight, and<br />

beat the barriers to innovation, with performance, efficiency and<br />

agility! Hear the latest on how you can do this and more with HP’s<br />

purpose-built SL server line. Servers are specifically designed for<br />

<strong>GPU</strong>s with HP ProActive Insight Architecture. Discover what a new<br />

generation of workstation desktop <strong>GPU</strong> computing technology<br />

from HP and NVIDIA can do for you! HP will compare and contrast<br />

<strong>GPU</strong> compute performance on the PCI Express Gen2 architecture<br />

available in HP’s Z800 Workstation to the PCI Express Gen3<br />

architecture in HP’s latest Z820 Workstation.<br />

Speaker(s): David Korf (Senior Marketing Manager, Hewlett-Packard),<br />

John Brown (Principal Engineer, Hewlett-Packard)<br />

Topic(s): Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0806 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight development<br />

team! Whether you would like a private meeting to discuss specific<br />

product features or test out your application with the latest version<br />

of Nsight, or you just want to hang out with the team after attending<br />

one of the exciting training sessions, the lounge is a great place to<br />

learn everything you ever wanted to know about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2001 Emerging Companies Summit: CEO on Stage<br />

Featuring Unity Technologies, MirriAd, and BioDigital<br />

See the hottest new technologies from startups that are<br />

transforming computing. In a lively and fast-paced exchange, the<br />

Emerging Companies Summit CEO on Stage sessions will feature<br />

CEOs from three startups who will each have 15 minutes to<br />

introduce their companies and interact with a panel of leading<br />

venture capitalists, technology executives, and industry analysts.<br />

Speaker(s): David Helgason (CEO, Unity Technologies), Mark<br />

Popkiewicz (CEO, MirriAd), Aaron Oliker (Partner/Director of 3D<br />

<strong>Technology</strong>, BioDigital), and Frank Sculli (Co-Founder/Informatics<br />

Director, BioDigital)<br />

Panelist(s): Jon Peddie (President, Jon Peddie Research), Neil<br />

Sequeira (Managing Director, General Catalyst Partners), Savitha<br />

Srinivasan (Partner, IBM Venture Capital Group)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0115 Specialized Sparse Matrix Formats and SpMV<br />

Kernel Tuning for <strong>GPU</strong>s<br />

This session is focused on optimizing sparse matrix-vector product<br />

for NVIDIA <strong>GPU</strong>s. This is a frequently studied kernel that appears in<br />

applications employing iterative methods for solving systems of<br />

linear equations. In the majority of cases the computation is<br />

memory bandwidth bound. Our study focuses on developing<br />

specialized sparse matrix storage formats and corresponding<br />

CUDA SpMV implementations that achieve high performance at the<br />

cost of additional start-up time required for conversion and tuning.<br />

The proposed storage formats reduce the required memory<br />

bandwidth by providing compact coding for locations of some<br />

frequently observed patterns of non-zero elements.<br />

Speaker(s): Arutyun Avetisyan (Deputy Director, ISP, Russian Academy<br />

of Sciences), Alexander Monakov (Researcher, ISP, Russian Academy<br />

of Sciences)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)


WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

ROOM A3<br />

S0209 Performance of 3-D FFT Using Multiple <strong>GPU</strong>s with<br />

CUDA 4<br />

Get the latest information on performance of 3-D fast Fourier<br />

transform using multiple <strong>GPU</strong> devices. CUDA 4.0 enables efficient<br />

data transfer between <strong>GPU</strong>s. This is especially important in FFT computation<br />

since it requires a large amount of all-to-all data exchange between<br />

<strong>GPU</strong>s. The peer-to-peer communication feature of <strong>GPU</strong>Direct V2<br />

improves the communication between the devices on the same node.<br />

<strong>GPU</strong>Direct also accelerates the communication between <strong>GPU</strong>s on<br />

different nodes. We will present the latest performance results on a<br />

four-<strong>GPU</strong> system and up to 128 compute nodes of TSUBAME 2.0.<br />

Speaker(s): Akira Nukada (Researcher, Tokyo Institute of <strong>Technology</strong>)<br />

Topic(s): Algorithms & Numerical Techniques, Development Tools<br />

& Libraries (Advanced)<br />

WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

ROOM B<br />

S0272 <strong>GPU</strong> GWAS - CUDA Based Genome Wide<br />

Association Studies<br />

We have developed a CUDA based GWAS analyzer that has<br />

achieved a 10x analysis speed-up per <strong>GPU</strong>. Genome-wide<br />

association studies scan through millions of SNP markers across<br />

the human genome seeking the genetic basis of life-threatening<br />

diseases such as coronary artery disease and prostate cancer. The<br />

prospect of the $1,000 genome heralds a potential new scale of<br />

GWAS involving hundreds of thousands of patients. We will<br />

discuss how we utilized the Python, R, and C languages to produce<br />

a robust GWAS algorithm that can be extended to multiple <strong>GPU</strong>s<br />

and <strong>GPU</strong> clusters.<br />

Speaker(s): Tim Bi (Graduate Research Analyst, Johns Hopkins<br />

University / George Mason University)<br />

Topic(s): Life Sciences, Bioinformatics (Intermediate)<br />

WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

ROOM K<br />

S0304 Large Scale Computational Fluid Dynamics<br />

Simulations on Hybrid Supercomputers<br />

Learn how to approach the all-too-common problem of trying to<br />

retrofit a major application for speed in the modern era of the<br />

hybrid supercomputer. In this talk, we will focus on computational<br />

fluid dynamics (CFD) codes that are run on Top500<br />

Supercomputers. Many of these applications have existed for 20 or<br />

more years, so the process of adding the <strong>GPU</strong> and getting<br />

wall-clock improvements in performance can be very challenging!<br />

Our talk will discuss how to properly target your effort, the impact<br />

of directives-based coding, and how to maintain efficiency across<br />

a hybrid cluster.<br />

Speaker(s): John Humphrey (Engineering Director, EM Photonics), Eric<br />

Kelmelis (CEO, EM Photonics)<br />

Topic(s): Computational Fluid Dynamics, Supercomputing<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

ROOM A8<br />

S0348 <strong>GPU</strong>s Open New Avenues in Medical MRI<br />

See how <strong>GPU</strong>s enable exciting new developments in medical<br />

Magnetic Resonance Imaging (MRI). Their computational power<br />

now makes practical new MRI techniques that can bring shorter<br />

imaging sessions, better images, and more insight into human<br />

physiology. Learn about the characteristics of the general<br />

computational approach for obtaining the final image, and how it<br />

can be implemented using an iterative conjugate gradient<br />

algorithm. The algorithm exhibits massive parallelism and fits the<br />

<strong>GPU</strong> architecture well. Learn about its CUDA implementation<br />

details and Matlab integration. See throughput measurements of<br />

Tesla <strong>GPU</strong>s compared to top of the line many-core and large RAM<br />

CPU systems.<br />

Speaker(s): Chris A. Cocosco (Scientist, University Medical Center<br />

Freiburg, Dept. of Radiology, Medical Physics)<br />

Topic(s): Medical Imaging & Visualization (Beginner)<br />

WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

ROOM A7<br />

S0352 <strong>GPU</strong>-Accelerated Parallel Computing for<br />

Simulation of Seismic Wave Propagation<br />

We adopted <strong>GPU</strong>s to accelerate large-scale, parallel finite-difference<br />

(FDTD) simulation of seismic wave propagation.<br />

Effective parallel implementation is needed because the size of<br />

the memory of a single <strong>GPU</strong> is too small for real applications.<br />

Thus we describe the memory optimization, the three-dimensional<br />

domain decomposition, and overlapping the<br />

communication and computation adopted in our program. So far, we<br />

have achieved single-precision performance of about 61<br />

TFlops by using 1200 <strong>GPU</strong>s of TSUBAME-2.0, the <strong>GPU</strong><br />

supercomputer at the Tokyo Institute of <strong>Technology</strong>, Japan. As an<br />

important application, we show the results of the simulation of the<br />

2011 Tohoku-Oki mega-quake.<br />

Speaker(s): Taro Okamoto (Assistant Professor, Tokyo Institute<br />

of <strong>Technology</strong>)<br />

Topic(s): Energy Exploration, Computational Physics, General Interest<br />

(Advanced)<br />

WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />

ROOM A1<br />

S0355 Seamless Scalable Displays - Using NVIDIA Warp +<br />

Intensity API<br />

In this talk we will discuss how we use the NVIDIA Warp and<br />

Intensity API to create seamless displays made up of<br />

multiple projectors, based on our camera feedback systems. We will<br />

show and discuss case studies in production including a 25<br />

megapixel touch wall, military dome simulation systems, VR<br />

Walls, VR Caves, and immersive conference rooms that are made<br />

affordable and enabled by this technology.<br />

Speaker(s): Rajeev Surati (President, Scalable Display Technologies)<br />

Topic(s): Visualization, Audio, Image and Video Processing, Computer<br />

Vision, Computer Graphics (Beginner)<br />

WEDNESDAY, MAY 16, 11:00 (50 MINUTES)<br />

HALL 1<br />

S3001 Day 2 Keynote: From Democratic Consensus to<br />

Cannibalistic Hordes: <strong>GPU</strong> Computing Reveals the<br />

Principles of Collective Behavior<br />

Collective behavior is one of the most pervasive features of the<br />

natural world. Our brains are composed of billions of<br />

interconnected cells communicating with chemical and electrical<br />

signals. We are integrated in our own human society. Elsewhere in<br />

the natural world a fish school convulses, as if one entity, when<br />

being attacked by a predator. How does individual behavior<br />

produce dynamic group-level properties? Do animal groups, or<br />

even cells in a tumor, function as some form of ‘collective mind’?<br />

How does socially contagious behavior spread through natural<br />

human crowds? In his keynote address, Prof. Iain D. Couzin will<br />

demonstrate how <strong>GPU</strong> computing has been pivotal in the study of<br />

collective behavior, helping reveal how collective action emerges<br />

in a wide range of groups from plague locusts to human crowds,<br />

and the critical role that uninformed, or weakly-opinionated,<br />

individuals play in democratic consensus decision-making.<br />

Speaker(s): Iain Couzin (Assistant Professor, Princeton University)<br />

Topic(s): General Interest (All Levels)<br />

WEDNESDAY, MAY 16, 11:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2002 Emerging Companies Summit: CEO on Stage<br />

Featuring eyeSight Mobile, Numira Biosciences, and Ubitus<br />

See the hottest new technologies from startups that are<br />

transforming computing. In a lively and fast-paced exchange, the<br />

Emerging Companies Summit CEO on Stage sessions will feature<br />

CEOs from three startups who will each have 15 minutes to<br />

introduce their companies and interact with a panel of leading<br />

venture capitalists, technology executives, and industry analysts.<br />

Speaker(s): Gideon Shmuel (CEO, eyeSight Mobile), David Weinstein<br />

(CTO, Numira Biosciences), Wesley Kuo (CEO, Ubitus)<br />

Panelist(s): Jon Peddie (President, Jon Peddie Research), Neil<br />

Sequeira (Managing Director, General Catalyst Partners), Savitha<br />

Srinivasan (Partner, IBM Venture Capital Group)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM C<br />

S0027B All-In-One Debugging Experience with CUDA-<br />

GDB and CUDA-MEMCHECK<br />

CUDA Debugger tools CUDA-GDB and CUDA-MEMCHECK provide<br />

a whole new feature set to help improve your CUDA application<br />

development cycle. This session is a detailed walk-through of the<br />

key debugger features and advanced techniques on using printf,<br />

CUDA-GDB and MEMCHECK together to improve overall code<br />

productivity on Linux and MacOS platforms. This tutorial will also<br />

include live demos.<br />

Speaker(s): Geoff Gerfin (Technical Manager / Senior Engineer,<br />

NVIDIA), Vyas Venkataraman (Software Engineer, NVIDIA)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0029 Leveraging Matrix Block Structure In Sparse<br />

Matrix-Vector Multiplication<br />

The commonly occurring block structure of sparse matrices can<br />

be effectively leveraged to improve the performance of Sparse<br />

Matrix-Vector multiplication (SpMV) on <strong>GPU</strong>s. This session will<br />

present one such algorithm and discuss both its design and its<br />

performance relative to other SpMV algorithms. In particular,<br />

aspects of <strong>GPU</strong> floating point performance, <strong>GPU</strong> memory use, and<br />

data structure translation effort will be detailed.<br />

Speaker(s): Steve Rennich (HPC Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />

ROOM K<br />

S0064 MSC Nastran Sparse Direct Solvers for Tesla <strong>GPU</strong>s<br />

The current implementation of MSC Nastran’s MSCLDL and<br />

MSCLU sparse direct solvers for multiple Tesla <strong>GPU</strong>s is<br />

presented. The matrix is first statically decomposed into a<br />

prescribed number of domains. The Schur complements are then<br />

calculated with CPUs and <strong>GPU</strong>s, and the residual structure is<br />

solved afterward. Back-substitution is used to find the solution at<br />

every grid point. Merits of this method are discussed and<br />

performance comparisons are made.<br />

Speaker(s): Cheng Liao (Development Manager, MSCsoftware)<br />

Topic(s): Computational Structural Mechanics (Beginner)<br />

WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />

ROOM A7<br />

S0140 Accelerating Reservoir Simulation and Algebraic<br />

Multigrid with <strong>GPU</strong>s<br />

Given a model of a reservoir’s rock and well properties, a<br />

reservoir simulator solves the PDEs for the multiphase flow<br />

through porous rock to predict well production. Over the past<br />

several decades, simulation has progressed from coarse 2D<br />

models to detailed 3D models, providing strong fidelity to<br />

empirical production rates. By reformulating the Marathon Oil<br />

Corporation’s Multiscale Flow Simulator to use <strong>GPU</strong>s, we improve<br />

the overall execution speed by a factor of over 100, allowing fast<br />

turnaround on a <strong>GPU</strong> workstation. We also introduce GAMPACK, a<br />

fully-accelerated <strong>GPU</strong> algebraic multigrid solver, and demonstrate<br />

its performance relative to CPU solvers.<br />

Speaker(s): Kenneth Esler (Computational Physicist, Stone Ridge<br />

<strong>Technology</strong>), Vincent Natoli (Founder & CEO, Stone Ridge <strong>Technology</strong>)<br />

Topic(s): Energy Exploration (Intermediate)<br />

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM N<br />

S0142 VMD: High Performance Molecular Visualization<br />

and Analysis on <strong>GPU</strong>s<br />

This talk will present recent successes in the use of <strong>GPU</strong>s to<br />

accelerate interactive molecular visualization and analysis tasks<br />

on desktop computers, and batch-mode simulation and analysis<br />

jobs on <strong>GPU</strong>-accelerated HPC clusters. We’ll present Fermi-specific<br />

algorithms and optimizations and compare with those for<br />

other devices. We’ll also present performance and performance/<br />

watt results for VMD analysis calculations on <strong>GPU</strong> clusters, and<br />

conclude with a discussion of ongoing work and future<br />

opportunities for <strong>GPU</strong> acceleration, particularly as applied to the<br />

analysis of petascale simulations of large biomolecular complexes<br />

and long simulation timescales.<br />

Speaker(s): John Stone (Senior Research <strong>Program</strong>mer, University of<br />

Illinois at Urbana-Champaign)<br />

Topic(s): Molecular Dynamics, Algorithms & Numerical Techniques,<br />

Computer Graphics (Intermediate)<br />

WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />

ROOM A3<br />

S0307 New Advances in <strong>GPU</strong> Linear Algebra<br />

Hear product experts explain how we have created two of the most<br />

widely used libraries in the <strong>GPU</strong> computing ecosystem. The CULA<br />

library for dense linear algebra has been expanding to multi-<strong>GPU</strong><br />

and out-of-core applications, meaning that users are no longer<br />

limited by the onboard <strong>GPU</strong> memory for their work. In this field,<br />

effectively using multiple <strong>GPU</strong>s is significantly more challenging than<br />

a single <strong>GPU</strong>! The brand new CULA Sparse library tackles the tough<br />

world of sparse linear algebra and achieves 10x speedups. Learn<br />

more about what makes these two libraries work in this session.<br />

Speaker(s): John Humphrey (Engineering Director, EM Photonics), Kyle<br />

Spagnoli (Research Engineer, EM Photonics)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />


WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM B<br />

S0327 Large and Sparse – Mass Spectrometry Data<br />

Processing in the <strong>GPU</strong><br />

Learn how the <strong>GPU</strong> helps identify millions of ions in datasets of<br />

several billion points of four-dimensional sparse data. The data is<br />

first reduced to 3D to locate regions of dense data, and then only<br />

those regions are processed in 4D. Processing involves combining<br />

several steps of convolution filters in three axes, finding local<br />

maximums in volumes of data, and extracting information from<br />

the data around each local maximum.<br />

Speaker(s): Jose de Corral (Principal Consulting Engineer,<br />

Waters Corporation)<br />

Topic(s): Life Sciences, Bioinformatics (Intermediate)<br />
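
The filter-then-find-maxima processing described for S0327 can be sketched in one dimension. This is a simplified illustration only: the session works on 3D and 4D data on the GPU, and the function names `smooth` and `local_maxima`, the kernel and the threshold are assumptions made for the example.

```python
def smooth(signal, kernel):
    """Convolve a 1-D signal with a symmetric odd-length kernel (zero-padded, same size)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def local_maxima(signal, threshold):
    """Indices whose value exceeds the threshold and both immediate neighbours."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold and signal[i - 1] < signal[i] > signal[i + 1]]

raw = [0, 0, 4, 0, 0, 0, 8, 0, 0]
filtered = smooth(raw, [0.25, 0.5, 0.25])
peaks = local_maxima(filtered, threshold=0.5)
```

In higher dimensions the same separable filter would be applied along each axis in turn, and every output sample can be computed independently, which is why the pattern maps well to a GPU.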

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM A1<br />

S0335 Live 3D-Video with a Lightfield Camera<br />

In this session you will learn what a lightfield camera is, how it<br />

works and what you can do with it. In addition to the theoretical<br />

presentation, we give a live demo of the camera system developed<br />

by our company, Raytrix. It delivers live 3D video from a single<br />

camera through a single lens, currently at up to 10 fps, with a<br />

maximum effective resolution of 3 megapixels synthesized from<br />

an 11-megapixel sensor using CUDA algorithms on a GTX580.<br />

Post-production features include pixel-wise focusing, depth zoom,<br />

variable stereo base-line and base-line rotation.<br />

Speaker(s): Christian Perwass (CEO, Raytrix GmbH)<br />

Topic(s): Computational Photography, Audio, Image and Video<br />

Processing, Stereoscopic 3D, Computer Vision (Beginner)<br />

WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />

ROOM A8<br />

S0342 Volumetric Processing and Visualization on<br />

Heterogeneous Architecture<br />

Volumetric data is typically very large and involves intensive<br />

computation for processing and visualization. We have developed an<br />

OpenCL-based framework that can utilize all available resources in<br />

a system or a cluster of systems. The framework manages one or<br />

more OpenCL devices. A large volume is partitioned into bricks.<br />

Each OpenCL device is associated with a set of brick producers that<br />

generates the contents of bricks while optionally utilizing other<br />

bricks as input. The framework is also composed of a scheduler<br />

that distributes brick workloads to different devices and chooses an<br />

optimized processing order aiming at certain criteria.<br />

Speaker(s): Wei Li (Research Scientist, Siemens Corporation)<br />

Topic(s): Visualization, Supercomputing (Advanced)<br />
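
The brick partitioning and scheduling idea in S0342 can be sketched as follows. The abstract does not state the scheduler's criteria, so the greedy least-loaded policy here is an assumption, and `partition` and `schedule` are hypothetical names rather than part of the framework.

```python
from itertools import product

def partition(shape, brick):
    """Yield (origin, size) for bricks tiling a 3-D volume; edge bricks are clipped."""
    ranges = [range(0, n, b) for n, b in zip(shape, brick)]
    for origin in product(*ranges):
        size = tuple(min(b, n - o) for o, b, n in zip(origin, brick, shape))
        yield origin, size

def schedule(bricks, n_devices):
    """Assign bricks, largest first, to whichever device currently has the least work."""
    load = [0] * n_devices
    plan = [[] for _ in range(n_devices)]
    by_volume = sorted(bricks, key=lambda b: -(b[1][0] * b[1][1] * b[1][2]))
    for origin, size in by_volume:
        device = load.index(min(load))
        plan[device].append(origin)
        load[device] += size[0] * size[1] * size[2]
    return plan, load

bricks = list(partition((4, 4, 6), (4, 4, 4)))
plan, load = schedule(bricks, n_devices=2)
```

A real scheduler would also need to account for bricks that depend on other bricks as input, which this sketch ignores.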

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM L<br />

S0369 Running Risk On <strong>GPU</strong>s<br />

A key component of Basel III is the Credit Value Adjustment (CVA)<br />

which is in essence the value of counter-party credit risk.<br />

Quantifying the CVA on simple products already poses<br />

considerable computational challenges and considering many<br />

banks have hundreds of thousands of positions it becomes clear<br />

that the computational challenges of CVA are massive. Calculating<br />

CVA sensitivities for hedging only adds to this burden. In this talk<br />

we will discuss real world applications of <strong>GPU</strong>s in risk<br />

management and show how, using CUDA, <strong>GPU</strong> computing is an<br />

enabling technology to address the computational challenges of<br />

an evolving regulatory environment.<br />

Speaker(s): Norbert Hari (Trading Quantitative Analyst, ING Bank nv), Tim<br />

Wood (Quantitative Analyst, ING Bank nv)<br />

Topic(s): Finance (Intermediate)<br />

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM A5<br />

S0419B Optimizing Application Performance with CUDA<br />

Profiling Tools<br />

NVIDIA provides two powerful profiling tools that you can use to<br />

maximize your application’s performance. The NVIDIA Visual<br />

Profiler helps you understand your application’s behavior with a<br />

detailed timeline and data from <strong>GPU</strong> performance counters. The<br />

Visual Profiler also provides an automatic, data-driven analysis<br />

engine that provides suggestions on potential optimization<br />

strategies for your application. Nvprof is a command-line profiler<br />

that provides gprof-like functionality for the <strong>GPU</strong>. Nvprof provides<br />

summary information about where your application is spending<br />

the most time, so that you can focus your optimization efforts.<br />

This session will provide a step-by-step walkthrough of both of<br />

these profiling tools, showing how you can use these tools to<br />

identify optimization opportunities at the application, kernel, and<br />

source-line levels.<br />

Speaker(s): David Goodwin (Software Engineer, NVIDIA)<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />

ROOM A2<br />

S0600 Scalable <strong>GPU</strong> Graph Traversal<br />

Breadth-first search (BFS) is a core primitive for graph traversal<br />

and a basis for many higher-level graph analysis algorithms. It is<br />

also representative of a class of parallel computations whose<br />

memory accesses and work distribution are both irregular and<br />

data-dependent. Recent work has demonstrated the plausibility of<br />

<strong>GPU</strong> sparse graph traversal, but has tended to focus on<br />

asymptotically inefficient algorithms that perform poorly on<br />

graphs with non-trivial diameter. We present a BFS parallelization<br />

focused on fine-grained task management constructed from<br />

efficient prefix sum that achieves an asymptotically optimal<br />

O(|V|+|E|) work complexity. Our implementation delivers excellent<br />

performance on diverse graphs, achieving traversal rates in<br />

excess of 3.3 billion and 8.3 billion traversed edges per second<br />

using single and quad-<strong>GPU</strong> configurations, respectively. This level<br />

of performance is several times faster than state-of-the-art<br />

implementations on both CPU and <strong>GPU</strong> platforms.<br />

Speaker(s): Duane Merrill (Research Scientist, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques (Beginner)<br />
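
The role of prefix sum in the BFS described for S0600 can be sketched with a sequential, level-synchronous traversal. This is an illustration of the general idea only, not the speaker's implementation: the function name `bfs` and the CSR arrays are assumptions, and the loops marked below stand in for work that a GPU would perform in parallel.

```python
from itertools import accumulate

def bfs(row_offsets, col_indices, source):
    """Level-synchronous BFS over a CSR graph; returns depth per vertex (-1 if unreachable)."""
    depth = [-1] * (len(row_offsets) - 1)
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Each frontier vertex contributes deg(v) edges. An exclusive prefix sum
        # over the degrees gives every vertex its own write offset, so neighbours
        # can be gathered without contention.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
        offsets = [0] + list(accumulate(degrees))
        edge_frontier = [None] * offsets[-1]
        for i, v in enumerate(frontier):          # parallel on a GPU
            start = row_offsets[v]
            for k in range(degrees[i]):
                edge_frontier[offsets[i] + k] = col_indices[start + k]
        # Contraction: keep only vertices not visited before.
        frontier = []
        for u in edge_frontier:
            if depth[u] == -1:
                depth[u] = level
                frontier.append(u)
    return depth

# Edges: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
depths = bfs([0, 2, 3, 4, 5, 5, 5], [1, 2, 3, 3, 4], source=0)
```

Each edge is gathered once and each vertex enters the frontier once, which is where the O(|V|+|E|) work bound comes from.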

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM M<br />

S0637 Analyzing performance and power of applications<br />

with <strong>GPU</strong>s on Dell 12G platforms (Presented by Dell)<br />

In this talk, both performance and power aspects of running<br />

various applications on NVIDIA <strong>GPU</strong>s on Dell 12G platforms will be<br />

presented. These platforms utilize the latest PCIe Gen 3 slots and<br />

processors in conjunction with a varying number of NVIDIA <strong>GPU</strong>s<br />

and are tested with several applications both from a performance<br />

perspective and a power perspective.<br />

Speaker(s): Dr. Jeff Layton (HPC Enterprise Technologist, Dell)<br />

Topic(s): Supercomputing, Visualization (Intermediate)


WEDNESDAY, MAY 16, 14:00 (80 MINUTES)<br />

HALL 1<br />

S0642 Inside Kepler<br />

In this talk, individuals from the <strong>GPU</strong> architecture and CUDA<br />

software groups will dive into the features of the compute<br />

architecture for “Kepler” – NVIDIA’s new <strong>GPU</strong>. From the<br />

reorganized processing cores with new instructions and<br />

processing capabilities, to an improved memory system with<br />

faster atomic processing and low-overhead ECC, we will explore<br />

how the Kepler <strong>GPU</strong> achieves world leading performance and<br />

efficiency, and how it enables wholly new types of parallel<br />

problems to be solved.<br />

Speaker(s): Stephen Jones (CUDA Developer, NVIDIA), Lars Nyland<br />

(Senior Architect, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM J1<br />

S0700 Stampede System Architecture and Early<br />

Accelerator <strong>Program</strong>ming Experiences<br />

We present a description of the design of the Stampede system to<br />

be deployed at TACC over the course of <strong>2012</strong>. Stampede comprises<br />

a 2PF Intel Sandy Bridge cluster with FDR InfiniBand augmented by<br />

8PF of Intel MIC Architecture co-processors. We will describe the<br />

design of the system, the datacenter that houses it, and expected<br />

programming models and usage modes. In support of this, we will<br />

present early experiences programming for the Intel MIC<br />

Architecture using the Knights Ferry Software Development<br />

Platform. Key to this will be the presentation of several different<br />

programming models and the scalability of the resulting codes.<br />

Speaker(s): Bill Barth (Director of High Performance Computing, Texas<br />

Advanced Computing Center, University of Texas at Austin)<br />

Topic(s): Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0807 CUDA Debugger Training on Windows<br />

Nsight offers a powerful CUDA debugging feature set<br />

that enables developers to quickly spot bugs. From the memory<br />

checker to advanced breakpoints and variable warp watch panel, a<br />

developer can quickly isolate memory access errors, filter<br />

thousands of threads down to a specific thread and quickly spot<br />

abnormal variable value ranges. Through a set of comprehensive<br />

exercises, the attendee will be able to utilize these features to<br />

become fully proficient at developing CUDA code.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2003 Emerging Companies Summit: Fireside Chat with<br />

Jen-Hsun Huang (CEO and Co-Founder, NVIDIA) and Tim<br />

Bajarin (President, Creative Strategies)<br />

NVIDIA CEO and co-founder Jen-Hsun Huang will take part in a<br />

fireside chat with Tim Bajarin, one of the IT world’s pre-eminent<br />

analysts and president of Creative Strategies. They will discuss<br />

trends in mobile, visual and parallel computing, and the<br />

transformational changes ahead for the industry.<br />

Speaker(s): Jen-Hsun Huang (CEO, President and Co-Founder,<br />

NVIDIA), Tim Bajarin (President, Creative Strategies)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />

ROOM A3<br />

S0085 Floating Point and IEEE 754 Compliance for<br />

NVIDIA <strong>GPU</strong>s: Precision & Performance<br />

As a result of continuing improvements, NVIDIA offers <strong>GPU</strong>-accelerated<br />

floating-point performance in compliance with IEEE<br />

754. It is our experience that a number of issues related to floating<br />

point accuracy and compliance are a frequent source of confusion<br />

both on CPUs and <strong>GPU</strong>s. The purpose of this talk is to discuss the<br />

most common ones related to NVIDIA <strong>GPU</strong>s and to supplement<br />

the documentation in the CUDA C <strong>Program</strong>ming <strong>Guide</strong>.<br />

Speaker(s): Alex Fit-Florea (Senior Engineer, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques, Development Tools<br />

& Libraries (Intermediate)<br />
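
One commonly cited source of CPU/GPU discrepancies is that floating-point addition is not associative, so a parallel reduction can legitimately differ from a sequential loop. Whether S0085 covers this particular issue is an assumption. The sketch below uses Python floats, which are IEEE 754 binary64; `sequential_sum` and `pairwise_sum` are hypothetical helper names.

```python
import math

def sequential_sum(xs):
    """Left-to-right accumulation, as in a simple serial loop."""
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    """Tree-shaped accumulation, as in a typical parallel reduction."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

values = [1e16, 1.0, -1e16, 1.0]

serial = sequential_sum(values)   # 1.0: the first 1.0 is absorbed by 1e16
tree = pairwise_sum(values)       # 0.0: both 1.0 terms are absorbed
exact = math.fsum(values)         # 2.0: correctly rounded sum
```

All three results follow the IEEE 754 rounding rules; they differ only because the operations are grouped differently.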

WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />

ROOM A8<br />

S0105 Hardware Acceleration for Vessel<br />

Visualization Tasks<br />

To analyze datasets visually, systems with fast feedback loops on<br />

user interaction are beneficial. In this session rendering and<br />

preprocessing techniques for medical volume data will be<br />

presented using OpenGL and CUDA. In the context of coronary<br />

artery disease, the analysis of individual vessel branches is<br />

important. We show how local transfer function application and<br />

generation by means of histogram analysis can help in navigating<br />

and finding details in the datasets. Furthermore, domain-specific<br />

acceleration and illustration techniques for volume rendering are<br />

also applied to datasets from brain aneurysms.<br />

Speaker(s): Christoph Kubisch (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Medical Imaging & Visualization, Computer Graphics (Beginner)<br />

WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />

ROOM K<br />

S0143 Fluid-Structure-Interaction Using SPH and<br />

GP<strong>GPU</strong> <strong>Technology</strong><br />

There are two goals when developing engineering analysis<br />

software: one is accuracy and the other is speed. In the area of<br />

Fluid-Structure Interaction (FSI) computational time has always<br />

been the major impediment to solving large realistic engineering<br />

problems. In our implementation the fluid/structural dynamics<br />

solver uses a combination of <strong>GPU</strong>/CPU processing. The added<br />

benefit of using a powerful <strong>GPU</strong> workstation is that it is roughly 10<br />

times less expensive than a regular CPU cluster. In this paper, we<br />

present the use of <strong>GPU</strong> <strong>Technology</strong> as implemented in the explicit<br />

dynamic finite element software IMPETUS Afea Solver®.<br />

Speaker(s): Jean Luc Lacome (IMPETUS Afea SAS), Jerome Limido<br />

(IMPETUS Afea SAS)<br />

Topic(s): Computational Structural Mechanics, Algorithms &<br />

Numerical Techniques, Computational Fluid Dynamics (Intermediate)<br />

WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />

ROOM A7<br />

S0190 Large-Scale Reservoir Simulation on <strong>GPU</strong><br />

We develop a highly parallel <strong>GPU</strong>-based GMRES solver and several<br />

preconditioners, and couple them with the in-house reservoir<br />

simulator to speed up large-scale reservoir simulation with over<br />

one million grid blocks. For the preconditioners, we develop<br />

highly parallelized ILU(k), ILUT, block ILU(k), and block ILUT on the<br />

<strong>GPU</strong>, with matrix partitioning by METIS. The speedup and<br />

accurate results demonstrate the promise of<br />

the <strong>GPU</strong> as a parallel device for reservoir simulation.<br />


Speaker(s): Song Yu (Chemical & Petroleum Department, University<br />

of Calgary)<br />

Topic(s): Application Design & Porting Techniques, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0271 Fast Adaptive Sampling Technique for Multi-<br />

Dimensional Integral Estimation Using <strong>GPU</strong>s<br />

Evaluating multi-dimensional integrals is a commonly encountered<br />

problem in many areas of science including Physics and Volume<br />

estimation of convex bodies. One of the widely used techniques for<br />

integral evaluation in large dimensions is the Monte Carlo method.<br />

Vanilla Monte Carlo methods of Integral Estimation use uniform<br />

sampling techniques. The standard error of such uniform sampling decreases<br />

as 1/√(sample size), which is too slow for most real-life applications.<br />

In this study, we discuss an adaptive sampling technique<br />

called VEGAS which reduces the variance at a much faster rate than<br />

uniform sampling. We present a new parallel implementation for<br />

VEGAS based on CUDA that can significantly reduce the<br />

computation time of multi-dimensional integrals. We show that our<br />

<strong>GPU</strong> based implementation of VEGAS achieves up to a 45x speed up<br />

over an equivalent CPU based implementation.<br />

Speaker(s): Srinivasa Prasanna (Professor, International Institute of<br />

Information <strong>Technology</strong> Bangalore), Pradeep Rao (<strong>Technology</strong><br />

Architect, Infosys Technologies Ltd)<br />

Topic(s): Algorithms & Numerical Techniques, Finance (Intermediate)<br />
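
The motivation for non-uniform sampling in S0271 can be illustrated with a small comparison between uniform Monte Carlo and importance sampling with a fixed density. This is not VEGAS, which adapts its sampling grid iteratively; the integrand, the density p(x) = 2x and the function names are assumptions chosen for the example.

```python
import math
import random

def mc_estimate(samples):
    """Return the mean and its standard error for a list of per-sample estimates."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, math.sqrt(var / n)

def uniform_mc(f, n, rng):
    """Plain Monte Carlo on [0, 1) with uniformly distributed points."""
    return mc_estimate([f(rng.random()) for _ in range(n)])

def importance_mc(f, n, rng):
    """Sample x from p(x) = 2x on (0, 1] via x = sqrt(u), and weight by f(x) / p(x)."""
    samples = []
    for _ in range(n):
        x = math.sqrt(1.0 - rng.random())   # 1 - u lies in (0, 1], so x > 0
        samples.append(f(x) / (2.0 * x))
    return mc_estimate(samples)

def integrand(x):
    return 3.0 * x * x                      # exact integral over [0, 1] is 1

rng = random.Random(0)
mean_u, err_u = uniform_mc(integrand, 20000, rng)
mean_i, err_i = importance_mc(integrand, 20000, rng)
```

Both estimators converge to the same value, but concentrating samples where the integrand is large gives a smaller standard error for the same number of samples. Each sample is independent, so the evaluation loop parallelizes directly.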

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0035 <strong>GPU</strong> Parallelization of Gibbs Sampling:<br />

Abstractions, Results, and Lessons Learned<br />

Markov Chain Monte Carlo (MCMC) estimation of Hierarchical<br />

Bayesian (HB) models is not only time-consuming, but also<br />

difficult to parallelize due to its sequential (Markovian) nature. We<br />

present an abstraction of a widely-used MCMC algorithm, called<br />

Gibbs sampling. We define a taxonomy of variable blocks, and for<br />

each type of variable block we offer suitable parallelization<br />

strategies, along with their corresponding CUDA implementations.<br />

For large problems where model estimation may take several<br />

hours or days using single-threaded software, we see speedups<br />

in the 30x-100x range, thereby reducing estimation time to a few<br />

hours. In addition to lower computation cost relative to MPI-based<br />

parallelization, the reduction in estimation time allows for a more<br />

interactive modeling experience. We offer an extensive discussion<br />

of lessons learned for the broader scientific computing field,<br />

including an analysis of tradeoffs between computation costs and<br />

development costs, implications of our tradeoff analysis for<br />

optimal software development and parallelization, and some<br />

practical tips and gotchas for rookie <strong>GPU</strong> programmers.<br />

Speaker(s): Alireza Mahani (Quantitative Modeler, Sentrana)<br />

Topic(s): Algorithms & Numerical Techniques, Databases, Data Mining,<br />

Business Intelligence (Intermediate)<br />
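
For readers unfamiliar with Gibbs sampling, the sequential structure mentioned in S0035 can be seen in a minimal sampler for a standard bivariate normal with correlation rho. The target distribution and the names `gibbs_bivariate_normal` and `correlation` are assumptions for illustration; the session's hierarchical Bayesian models and CUDA implementations are not shown here.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_sweeps, rng, burn_in=500):
    """Alternate draws from x | y and y | x for a standard bivariate normal."""
    x = y = 0.0
    sd = math.sqrt(1.0 - rho * rho)          # conditional standard deviation
    draws = []
    for sweep in range(n_sweeps + burn_in):
        x = rng.gauss(rho * y, sd)           # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)           # y | x ~ N(rho * x, 1 - rho^2)
        if sweep >= burn_in:
            draws.append((x, y))
    return draws

def correlation(draws):
    """Sample Pearson correlation of the (x, y) draws."""
    n = len(draws)
    mx = sum(x for x, _ in draws) / n
    my = sum(y for _, y in draws) / n
    sxy = sum((x - mx) * (y - my) for x, y in draws)
    sxx = sum((x - mx) ** 2 for x, _ in draws)
    syy = sum((y - my) ** 2 for _, y in draws)
    return sxy / math.sqrt(sxx * syy)

draws = gibbs_bivariate_normal(0.8, 20000, random.Random(1))
estimated_rho = correlation(draws)
```

Each draw depends on the one before it, so sweeps cannot simply be split across threads. Parallelism has to come from within a sweep, for example from variables that are conditionally independent given the rest.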

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

ROOM A3<br />

S0042 Solving Challenging Numerical Linear Algebra<br />

Algorithms using Multiple <strong>GPU</strong> Accelerators<br />

See the newest features integrated in MAGMA (Matrix Algebra on<br />

<strong>GPU</strong> and Multicore Architectures) to tackle the multiple <strong>GPU</strong>-based<br />

systems for numerical linear algebra. In this talk, we describe how<br />

we leveraged MAGMA to solve existing and new challenging<br />

numerical problems on multiple hardware accelerators. Using a<br />

hybridization methodology, the new multi<strong>GPU</strong>-enabled MAGMA is<br />

characterized by a representation of linear algebra algorithms as<br />

directed acyclic graphs, where nodes correspond to tasks and edges<br />

to data dependencies among them, and a dynamic runtime system<br />

environment StarPU used to schedule various computational kernels<br />

over hybrid architectures of <strong>GPU</strong>s and homogeneous multicores.<br />

Speaker(s): Hatem Ltaief (Computational Scientist, KAUST<br />

Supercomputing Laboratory), Stanimire Tomov (University of Tennessee)<br />

Topic(s): Algorithms & Numerical Techniques, Development Tools<br />

& Libraries (Intermediate)<br />
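
The DAG representation described for S0042 can be sketched with Python's standard `graphlib` module. The task names below loosely follow the first steps of a tiled Cholesky factorization and are illustrative assumptions. `schedule_waves` is a hypothetical helper, not part of MAGMA or StarPU, and it releases tasks in synchronous waves rather than dynamically.

```python
from graphlib import TopologicalSorter

def schedule_waves(dependencies):
    """Group tasks into waves; every task in a wave has all of its inputs ready."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # tasks that could run concurrently
        waves.append(ready)
        sorter.done(*ready)                  # mark the wave as finished
    return waves

# Maps each task to the set of tasks it depends on (edges are data dependencies).
tasks = {
    "POTRF(0)": set(),
    "TRSM(1,0)": {"POTRF(0)"},
    "TRSM(2,0)": {"POTRF(0)"},
    "SYRK(1,1)": {"TRSM(1,0)"},
    "GEMM(2,1)": {"TRSM(1,0)", "TRSM(2,0)"},
    "SYRK(2,2)": {"TRSM(2,0)"},
    "POTRF(1)": {"SYRK(1,1)"},
}
waves = schedule_waves(tasks)
```

A dynamic runtime would instead start each task as soon as its own inputs complete and choose a CPU core or GPU for it, rather than waiting for a whole wave to finish.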

WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM A5<br />

S0099 Debugging <strong>GPU</strong> Applications For Correctness<br />

and Performance<br />

This session reveals how debugging CUDA applications is made<br />

straightforward with the powerful Allinea DDT debugger. New<br />

features enabling greater understanding of performance<br />

optimizations will be explored, showing how they can be used to<br />

produce better, faster CUDA code. Coupled with newly released<br />

support for multiple languages and compilers we will also show<br />

how Allinea DDT is enabling developers on desktops and the<br />

largest supercomputers to achieve both correct and efficient<br />

<strong>GPU</strong> applications.<br />

Speaker(s): David Lecomber (CTO, Allinea Software)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM N<br />

S0127 Petascale Molecular Dynamics Simulations on<br />

<strong>GPU</strong>-Accelerated Supercomputers<br />

The highly parallel molecular dynamics code NAMD was chosen in<br />

2006 as a target application for the NSF petascale supercomputer<br />

now known as Blue Waters. NAMD was also one of the first codes<br />

to run on a <strong>GPU</strong> cluster when G80 and CUDA were introduced in<br />

2007. How do the Cray XK6 and modern <strong>GPU</strong> clusters compare to<br />

300,000 CPU cores for a hundred-million-atom Blue Waters<br />

acceptance test? Come learn the opportunities and pitfalls of<br />

taking <strong>GPU</strong> computing to the petascale and the importance of<br />

CUDA 4.0 features in combining multicore host processors and<br />

<strong>GPU</strong>s in a legacy message-driven application.<br />

Speaker(s): James Phillips (Senior Research <strong>Program</strong>mer, University<br />

of Illinois)<br />

Topic(s): Molecular Dynamics, Application Design & Porting<br />

Techniques, Parallel <strong>Program</strong>ming Languages & Compilers,<br />

Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM K<br />

S0214 <strong>GPU</strong> Based Stacking Sequence Optimization For<br />

Composite Skins Using GA<br />

The goal of this session is to showcase how <strong>GPU</strong>s can be used to<br />

achieve high performance in genetic-algorithm-based optimization.<br />

The application domain is stacking sequence optimization of<br />

aircraft wing skins. The concepts illustrated use CUDA but are<br />

generic to any other <strong>GPU</strong> language. It is assumed that<br />

registrants have exposure to optimization in the engineering domain.<br />

Speaker(s): Sathya Narayana K. (Principal Consultant, Infosys Ltd.),<br />

Ravikumar G.V.V. (Infosys Ltd, Bangalore)<br />

Topic(s): Computational Structural Mechanics, Algorithms &<br />

Numerical Techniques, Parallel <strong>Program</strong>ming Languages &<br />

Compilers (Advanced)


WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM A8<br />

S0259 A High Performance Platform for Real-Time<br />

X-Ray Imaging<br />

We will share our experience developing a <strong>GPU</strong>-based<br />

platform for synchrotron-based X-ray imaging aimed at the analysis<br />

of dynamic processes. The complete data flow from the camera to<br />

the data storage will be discussed with a special focus on I/O<br />

issues, hardware platform, and ways to utilize the available<br />

system resources. An efficient <strong>GPU</strong>-implementation of filtered<br />

back projection will be presented, highlighting differences between<br />

implementations for GT200, Fermi, and AMD Cypress<br />

architectures. We will introduce our software platform used to<br />

abstract the current configuration of the imaging station and to<br />

simplify the development of parallel image processing algorithms.<br />

Speaker(s): Suren Chilingaryan (Researcher, Karlsruhe Institute<br />

of <strong>Technology</strong>)<br />

Topic(s): General Interest, Supercomputing, Audio, Image and Video<br />

Processing, Algorithms & Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM A1<br />

S0281 Accelerate a Fully Functional Photo Editing<br />

Software with <strong>GPU</strong><br />

We show how to design a fully functional <strong>GPU</strong>-based photo<br />

editing software, which provides features like layering and<br />

selecting, and integrates various adjusting tools and image filters.<br />

This design contains a fast layer rendering engine, an image filter<br />

framework which manages different filters supporting visual<br />

feedback for filter parameter adjustment. We will also introduce<br />

how to design an undo system for <strong>GPU</strong>-based image processing<br />

software. Specifically, a CUDA-accelerated HDR tool will be<br />

presented in detail.<br />

Speaker(s): Kaiyong Zhao (PhD Student, Hong Kong Baptist University)<br />

Topic(s): Computational Photography, Computer Graphics (Beginner)<br />

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

ROOM C<br />

S0365 Delite: A Framework for Implementing<br />

Heterogeneous Parallel DSLs<br />

Domain-specific languages can be a solution for heterogeneous<br />

parallel computing since they provide higher productivity and<br />

performance. To lower the barrier for DSL development, we<br />

implemented the Delite compiler framework and runtime. DSL<br />

developers can easily extend the framework to build a new DSL.<br />

The framework provides various optimization facilities and<br />

automatically generates code for heterogeneous hardware<br />

including <strong>GPU</strong>. The runtime executes the generated code in<br />

parallel by scheduling the kernels on target devices and managing<br />

the memory allocations and data transfers. This talk will cover the<br />

details of Delite with examples from OptiML, a machine learning<br />

DSL implemented with the framework.<br />

Speaker(s): HyoukJoong Lee (PhD Student, Stanford University), Kevin<br />

J. Brown (Research Assistant, Stanford University)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Intermediate)<br />

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

ROOM L<br />

S0405 New Generation <strong>GPU</strong> Accelerated Financial<br />

Quant Libraries<br />

Learn from industry experts how new generation <strong>GPU</strong> accelerated<br />

solutions for derivative pricing, hedging, and risk management<br />

can be built more efficiently with modern technology and<br />

functional programming languages like F# on .NET or Scala on<br />

the Java VM. As a concrete example we report from a large<br />

derivative pricing project developed in F# on .NET. We will<br />

introduce the key design concepts and parallelization strategies,<br />

which lead to an efficient and transparent <strong>GPU</strong> acceleration.<br />

Several examples will illustrate the benefits of the functional approach<br />

compared to the classical object-oriented approach.<br />

Speaker(s): Daniel Egloff (Managing Partner, QuantAlea GmbH)<br />

Topic(s): Finance, Application Design & Porting Techniques, Algorithms<br />

& Numerical Techniques, Cloud Computing (Advanced)<br />

WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM A7<br />

S0432 New Ideas for Massively Parallel Preconditioners<br />

Linear Solvers on serial machines tend to be highly recursive, but<br />

that’s not an option on <strong>GPU</strong>s. In this paper we describe a new<br />

preconditioner for GMRES and similar Krylov subspace linear<br />

solvers that is highly parallel, but also provides effective<br />

mechanisms to reconcile remote driving forces in a spatially<br />

discretized system. We will present results, taken from some<br />

real-world studies using a commercial oil reservoir simulator,<br />

showing how it compares with a state of the art serial solver, and<br />

showing how performance scales in a domain decomposition<br />

formulation run on a multiple CPU+<strong>GPU</strong> cluster.<br />

Speaker(s): John Appleyard (Managing Director, Polyhedron Software<br />

Ltd), Jeremy Appleyard (Analyst, Polyhedron Software Ltd)<br />

Topic(s): Algorithms & Numerical Techniques, Computational Fluid<br />

Dynamics, Energy Exploration (Advanced)<br />

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

ROOM M<br />

S0635 How to Bake Portable Many-Core <strong>Program</strong>s<br />

(Presented by CAPS enterprise)<br />

A legacy code, a cool many-core accelerator and a directive-based<br />

programming environment are the main ingredients of the recipe to<br />

transform your legacy code into a portable many-core one. This<br />

presentation shows by example how to exploit accelerators in<br />

legacy code without sacrificing portability. We describe a<br />

methodology and the use of directives, such as HMPP and OpenACC,<br />

to exploit the massive parallelism provided by many-core devices.<br />

During the presentation we show through numerous illustrations<br />

how to analyze performance, tune accelerator code, reduce data<br />

transfers, deal with libraries, exploit multiple accelerators, etc.<br />

Speaker(s): François Bodin (Chief Technology Officer, CAPS enterprise)<br />

Topic(s): Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />

ROOM J1<br />

S0701 New <strong>GPU</strong> Appliance for Co-processing<br />

In the Petascale era, supercomputers were used both for<br />

simulation and for in-situ graphical visualization of the results. At<br />

Exascale, compute resources will be more precious than<br />

before, and using them for co-processing tasks will not be<br />

efficient. We are designing a new appliance that moves the<br />

processing required for graphical visualization onto separate<br />

hardware, allowing visualization to run as co-processing to the<br />

simulation. We showcased the appliance at SC11. Running a<br />

pipeline of computational simulation and visualization, we show<br />

that our prototype system reduces total time to simulation<br />

completion by up to 30%.<br />


Speaker(s): Sorin Faibish (EMC Corporation)<br />

Panelist(s): Tom Furlong (Managing Director, Granite Ventures), Rob<br />

Enderle (Principal Analyst, Enderle Group), Flip Gianos (General Partner,<br />

InterWest Partners), Jeff Herbst (VP of Business Development, NVIDIA)<br />

Topic(s): HW/SW Architectures for Co-processing (Intermediate)<br />

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0808 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight<br />

development team! Whether you would like a private meeting to<br />

discuss specific product features or test out your application with<br />

the latest version of Nsight, or you just want to hang out with the<br />

team after attending one of the exciting training session, the<br />

lounge is great place to learn everything you ever wanted to know<br />

about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2004 Emerging Companies Summit: CEO on Stage<br />

Featuring GAIKAI, Immersive Media, and Numecent<br />

See the hottest new technologies from startups that are<br />

transforming computing. In a lively and fast-paced exchange, the<br />

Emerging Companies Summit CEO on Stage sessions will feature<br />

CEOs from three startups who will each have 15 minutes to<br />

introduce their companies and interact with a panel of leading<br />

venture capitalists, technology executives, and industry analysts.<br />

Speaker(s): David Perry (CEO and Co-Founder, GAIKAI), Mark<br />

McGovern (CEO, Immersive Media), Osman Kent (CEO, Numecent)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM A1<br />

S0073 Cost-effective <strong>GPU</strong> Acceleration of a Video<br />

Restoration and Archiving Workflow<br />

The goal of this session is to present a complex <strong>GPU</strong>-accelerated<br />

video restoration and archiving workflow. The workflow consists of<br />

many different processing steps and a final review application.<br />

Fast and cost-effective processing and real-time display of the<br />

processed video material is a key requirement. We will show in<br />

detail how <strong>GPU</strong>-based acceleration can be achieved for many<br />

different processing steps and for the review application using<br />

OpenCV, OpenCL, and OpenGL. Furthermore, an object-oriented<br />

software architecture supporting the acceleration of<br />

several different processing tasks on the same graphics adapter<br />

will be presented.<br />

Speaker(s): Klaus Gaedke (Lab Manager, Technicolor)<br />

Topic(s): Audio, Image and Video Processing (Intermediate)<br />

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM B<br />

S0103 Accelerating Protein Sequences and Classification<br />

using <strong>GPU</strong>-HMMER Search<br />

In this paper we present the results of parallelizing HMMer, which<br />

is a widely used tool for protein sequence homology detection, as<br />

well as functional annotation of homologous protein sequences,<br />

and protein family classification. The HMMer program is based<br />

upon a Viterbi algorithm coded in C, and is quite time consuming.<br />

We modify the logic of the Viterbi algorithm to port it to GP<strong>GPU</strong>. We<br />

test multiple enhancements in our <strong>GPU</strong> kernels in order to<br />

demonstrate the effectiveness of each strategy. Our<br />

implementation, cuda_hmmsearch, achieves an overall speedup of up to 30x<br />

over a single Intel CPU core.<br />

Speaker(s): Mahesh Khadtare (PhD Student - Scientist ESP, I2IT,<br />

Pune University)<br />

Topic(s): Life Sciences, Bioinformatics (Intermediate)<br />
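For readers unfamiliar with the Viterbi recurrence mentioned in the abstract above, here is a minimal sketch on a generic hidden Markov model. It is not HMMer's profile-HMM code and not the speaker's cuda_hmmsearch kernel; the function name `viterbi`, the two-state toy model, and its probabilities are illustrative assumptions. A production implementation would normally work with log scores to avoid underflow.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a generic HMM."""
    # score[s]: probability of the best path that ends in state s
    score = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per observation after the first
    for o in obs[1:]:
        prev = score
        score, pointers = {}, {}
        # Each state s is updated independently of the others within a step;
        # this inner loop is the part that maps naturally onto parallel threads.
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans_p[r][s])
            score[s] = prev[best] * trans_p[best][s] * emit_p[s][o]
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical toy model (not protein data): two hidden states, three symbols.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The steps over observations are inherently sequential, so a GPU port typically finds its parallelism across states within a step and across the many independent sequences in a database search.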

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM A8<br />

S0141 <strong>GPU</strong>-Accelerated Optical Coherence<br />

Tomography Imaging<br />

We developed a series of <strong>GPU</strong>-based technologies to accelerate<br />

the imaging reconstruction and visualization for optical coherence<br />

tomography (OCT). Several <strong>GPU</strong>-based algorithms such as<br />

non-uniform fast Fourier transform, numerical dispersion<br />

compensation, simultaneous phase modulation and multi-<strong>GPU</strong><br />

implementation were developed to achieve improved impulse<br />

response, better SNR, doubled imaging range and higher system<br />

stability. The <strong>GPU</strong>-accelerated 4D-OCT system was validated by<br />

imaging both in vivo and ex vivo biological tissues. This technology<br />

overcomes the imaging reconstruction and visualization<br />

bottlenecks that widely exist in current ultrahigh speed OCT<br />

systems and opens the way to interventional OCT imaging for<br />

applications in guided microsurgery.<br />

Speaker(s): Kang Zhang (Research Scientist, GE Global Research)<br />

Topic(s): Medical Imaging & Visualization (Beginner)<br />

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM N<br />

S0207 <strong>GPU</strong> Enabled Macromolecular Simulation:<br />

Challenges and Opportunities<br />

<strong>GPU</strong>-enabled, fully atomistic macromolecular<br />

simulation is rapidly gaining momentum, driven by massive<br />

parallelism and by the parallelizability of various components of<br />

the underlying algorithms and methodologies. This massive<br />

parallelism, on the order of several hundred to a few thousand<br />

cores, presents opportunities as well as implementation<br />

challenges. In this talk, dive deep into the key aspects of<br />

simulation methodologies for macromolecular systems<br />

specifically adapted to <strong>GPU</strong>s. Learn some of the underlying<br />

challenges and get the latest solutions devised to tackle them in<br />

the FEN ZI code for fully atomistic macromolecular simulations.<br />

Speaker(s): Michela Taufer (Assistant Professor, University of<br />

Delaware), Sandeep Patel (University of Delaware)<br />

Topic(s): Molecular Dynamics, Algorithms & Numerical Techniques<br />

(Advanced)<br />

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM K<br />

S0293 Culises – A Library for Accelerated CFD on Hybrid<br />

<strong>GPU</strong>-CPU Systems<br />

The vast majority of CFD simulations rely on the solution of<br />

large-scale systems of linear equations (SLE), where the solution of<br />

a system can consume most of the total CPU time. We have<br />

developed a library (Culises) for state-of-the-art solution of SLE that<br />

targets hybrid <strong>GPU</strong>-CPU platforms. Culises can be connected<br />

to MPI-parallelized CFD codes (e.g. OpenFOAM) via an<br />

application-specific interface. In this talk, we focus on efficient implementation<br />

of preconditioned Krylov subspace methods. Using the computing<br />

power of <strong>GPU</strong>s, Culises can significantly accelerate pure CPU<br />

computations for a multitude of industrial CFD applications.


Speaker(s): Bjoern Landmann (Development Engineer, FluiDyna GmbH)<br />

Topic(s): Computational Fluid Dynamics, Algorithms & Numerical<br />

Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 15:30 (50 MINUTES)<br />

ROOM A5<br />

S0340 Debug Multi-<strong>GPU</strong> Applications on CUDA-<br />

Accelerated Clusters with TotalView<br />

Learn how TotalView can help you develop CUDA applications on<br />

single servers, multi-<strong>GPU</strong> servers, and HPC-style clusters. For<br />

more than 20 years the TotalView debugger has set the standard<br />

for parallel and multi-core debugging on Linux, HPC clusters and<br />

custom supercomputers such as the Cray XT/XE/XK series. CUDA<br />

developers deal with the same types of complexity and can realize<br />

the same productivity benefits. This talk will introduce TotalView<br />

for CUDA and show how you can program more easily with CUDA<br />

3.2, 4.0 and 4.1.<br />

Speaker(s): Chris Gottbrath (Principal Product Manager, Rogue<br />

Wave Software)<br />

Topic(s): Development Tools & Libraries, Supercomputing<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM A7<br />

S0433 Accelerated FDTD Technique for Marine<br />

Controlled Source Electromagnetic Imaging<br />

Find out about the newest method for Marine Hydrocarbon<br />

Exploration. In this session we will profile the use of the Finite<br />

Difference Time Domain (FDTD) technique in combination with<br />

Mittet’s method and <strong>GPU</strong>s to produce faster, cheaper, more<br />

accurate forward modeling for electromagnetic imaging<br />

(Controlled Source Electromagnetic or CSEM). Unlike many<br />

frequency domain CSEM techniques this accelerated method does<br />

not require simplifying assumptions to reduce the memory and<br />

computational burden and has excellent scaling properties<br />

(essentially linear) across clusters of <strong>GPU</strong> accelerated nodes.<br />

CSEM is used in the industry to enhance confidence in<br />

hydrocarbon reservoir discoveries.<br />

Speaker(s): Geoff Clark (CEO, Acceleware Ltd.), Michal Okoniewski<br />

(Director of Marketing, Acceleware Ltd.)<br />

Topic(s): Energy Exploration (Intermediate)<br />

WEDNESDAY, MAY 16, 15:30 (180 MINUTES)<br />

HALL 1<br />

S0514 <strong>GPU</strong> Performance Analysis and Optimization<br />

This session will present the fundamental performance-optimization<br />

concepts and illustrate their practical application in<br />

the context of programming for Fermi and Kepler <strong>GPU</strong>s. The goal<br />

is twofold: make the optimization process a methodical sequence<br />

of steps, and facilitate making performance-aware algorithmic<br />

decisions before coding even starts. In order to maximize <strong>GPU</strong><br />

performance, a code should have sufficient parallelism, access<br />

memory in a coalesced pattern, and be amenable to vector<br />

execution within warps (groups of 32 threads). We will show how<br />

to quantify these requirements for a specific <strong>GPU</strong> in order to<br />

determine performance limiters and their importance for a given<br />

code. To address the limiters, we will review hardware operation<br />

specifics and related optimization techniques. The optimization<br />

process will be illustrated using NVIDIA profiling tools and kernel<br />

case studies.<br />

Speaker(s): Paulius Micikevicius (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />

WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />

ROOM J1<br />

S0702 The Architecture of Acceleration in HPC<br />

High Performance Computing applications push the envelope of<br />

what can be computed today. Acceleration technologies play a<br />

critical role in extending and enhancing capability. Balancing the<br />

impact of acceleration within hardware and software is a difficult<br />

art, where critical decisions can have dramatic impacts. We<br />

present the role of acceleration in tightly and loosely coupled<br />

settings, as well as data structures and execution models.<br />

Speaker(s): Justin Tripp (Technical Staff Member, Los Alamos National<br />

Laboratory), Zack Baker (Los Alamos National Laboratory)<br />

Topic(s): Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM K<br />

S0055 Particle Dynamics with MBD and FEA using CUDA<br />

Systems of many spherical particles are solved with DEM (Discrete Element<br />

Method) and simulated with <strong>GPU</strong> technology. A fast algorithm is<br />

applied to calculate Hertzian contact forces between the<br />

particles (from 100,000 to 1,000,000), and NVIDIA’s CUDA is used to<br />

accelerate the calculation. The particles are simulated together with MBD and<br />

FEA entities within the commercial software RecurDyn.<br />

Several models are built and simulated: a forklift with sand,<br />

oil in an oil tank, an oil-filled engine system, and a water-filled<br />

washing machine. All models are simulated on NVIDIA<br />

<strong>GPU</strong>s and the results are shown.<br />

Speaker(s): Graham Sanborn (Lead Software Developer, FunctionBay)<br />

Topic(s): Computational Structural Mechanics, Computational Physics,<br />

Computational Fluid Dynamics (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM B<br />

S0109 SOAP3: <strong>GPU</strong>-based Compressed Indexing and<br />

Ultra-fast Parallel Alignment of Short Reads<br />

We give the first implementation of a compressed index<br />

(Burrows-Wheeler Transform) on the <strong>GPU</strong>, supporting very<br />

efficient parallel alignment of short patterns (reads) onto the<br />

human genome. The new alignment software SOAP3 is tens of<br />

times faster than existing ones and can keep up with the throughput<br />

(Giga to Tera bp) of next-generation DNA sequencers. It takes 2.4<br />

seconds to perform exact matching for one million length-100<br />

reads (tens of seconds for small-error approximate matching).<br />

Technically, we show how to minimize memory accesses to the<br />

index from individual threads and to control the branching and<br />

divergence of the threads.<br />

Speaker(s): BingQiang Wang (BGI)<br />

Topic(s): Bioinformatics (Advanced)<br />
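The exact-matching step that a Burrows-Wheeler index supports can be sketched as the classic backward search. The sketch below is a CPU-only illustration, not SOAP3's data layout or GPU kernel. It builds the suffix array by naive sorting and keeps a full, uncompressed occurrence table, both of which a real compressed index would avoid. The names `build_index` and `count_matches` and the sample text are illustrative assumptions.

```python
def build_index(text):
    """Build BWT-based lookup tables for `text` (naive construction)."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    # first[c]: number of characters in the text strictly smaller than c
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][k]: occurrences of c in bwt[:k] (stored in full here for clarity)
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, ch in enumerate(bwt):
        for c in occ:
            occ[c][k + 1] = occ[c][k] + (1 if c == ch else 0)
    return first, occ, len(bwt)


def count_matches(pattern, first, occ, n):
    """Count exact occurrences of `pattern` using backward search."""
    lo, hi = 0, n  # half-open range of suffix-array rows still matching
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


first, occ, n = build_index("ACGTACGA")
print(count_matches("ACG", first, occ, n))  # number of exact hits for "ACG"
```

Each pattern character costs two table lookups regardless of text length, so the work per read is proportional to the read length. Those lookups are the memory accesses that the abstract says must be minimized per thread.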

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM A8<br />

S0131 Multi-<strong>GPU</strong> Real-Time Ptychographic X-ray<br />

Image Reconstruction<br />

Learn how a new imaging technique, combined with the<br />

computational power of <strong>GPU</strong>s and the brightness of modern X-ray<br />

synchrotrons can quickly and easily produce images with<br />

nanometer level resolution. Ptychography is a recent X-ray<br />

imaging technique in which overlapping regions of a sample are<br />

exposed in quick succession and the resulting scattering is used<br />

to reconstruct a high resolution image of the sample. Discover<br />

why <strong>GPU</strong>s can compensate for the lack of X-ray lenses and how they<br />

enabled a dramatic reduction in the feedback time for users of the<br />

technique from days to seconds.<br />

Speaker(s): Filipe Maia (Postdoctoral Fellow, Lawrence Berkeley<br />

National Laboratory)<br />

Topic(s): Audio, Image and Video Processing, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM A3<br />

S0149 On the Parallel Solution of Sparse Triangular<br />

Linear Systems<br />

A parallel algorithm for solving a sparse triangular linear system on<br />

the <strong>GPU</strong> is proposed. It implements the solution of the triangular<br />

system in two phases. The analysis phase builds a dependency graph<br />

based on the matrix sparsity pattern and groups the independent<br />

rows into levels. The solve phase obtains the full solution by iterating<br />

sequentially across the constructed levels. The solution elements<br />

corresponding to each level are obtained in parallel. Numerical<br />

experiments are presented, showing that the incomplete-LU<br />

and Cholesky preconditioned iterative methods can achieve a 2x<br />

speedup on the <strong>GPU</strong> over their CPU implementation.<br />

Speaker(s): Maxim Naumov (Software Engineer, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques, Development Tools &<br />

Libraries (Intermediate)<br />
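The two phases described in the abstract above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the speaker's GPU implementation: the matrix is lower triangular with a nonzero diagonal, it is stored as a hypothetical list of `(column, value)` pairs per row, and the function names `build_levels` and `solve_lower` are made up for this example.

```python
def build_levels(rows):
    """Analysis phase: group rows so each level depends only on earlier levels.

    rows[i] lists the (column, value) nonzeros of row i, diagonal included.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        # Row i must wait for every row j < i that it references.
        deps = [level[j] for j, _ in rows[i] if j != i]
        level[i] = 1 + max(deps) if deps else 0
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(rows, b, levels):
    """Solve phase: sequential over levels, independent rows within a level."""
    x = [0.0] * len(b)
    for rows_in_level in levels:
        # On a GPU, each row in this inner loop could be handled in parallel.
        for i in rows_in_level:
            acc, diag = b[i], None
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    acc -= v * x[j]
            x[i] = acc / diag
    return x


# Small 4x4 lower-triangular example (values chosen for illustration only).
rows = [
    [(0, 2.0)],
    [(1, 1.0)],
    [(0, 1.0), (2, 4.0)],
    [(1, 3.0), (2, 1.0), (3, 2.0)],
]
b = [2.0, 3.0, 9.0, 15.0]
levels = build_levels(rows)
x = solve_lower(rows, b, levels)
print(levels, x)
```

The analysis depends only on the sparsity pattern, so it can be done once and reused for every solve with the same matrix structure, for example across the iterations of a preconditioned method.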

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM L<br />

S0206 Monte-Carlo Pricing Under a Hybrid Local<br />

Volatility Model<br />

This session shows how to calculate the prices of several financial<br />

products, vanilla and exotic, under Dupire’s Local Volatility model.<br />

We start with vanilla options on the foreign exchange rate and<br />

explain how to rescale the Local Volatility matrix in order to take<br />

advantage of the fast texture memory interpolation. We then extend<br />

this framework to two factors by including stochastic interest rates<br />

following the Hull-White model, and show how to price Power-Reverse<br />

Dual Coupon swaps with an exotic TARN feature. We provide details<br />

of the algorithms and compare accuracy and speed with typical<br />

performances of single-core production implementations.<br />

Speaker(s): Sebastien Gurrieri (Quantitative Analyst, Mizuho<br />

International)<br />

Topic(s): Finance, Algorithms & Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM A1<br />

S0273 Fast JPEG Coding on the <strong>GPU</strong><br />

The goal of this session is to demonstrate how high speed JPEG<br />

compression and decompression can be efficiently implemented<br />

on the <strong>GPU</strong> using CUDA. In this session we will present a detailed<br />

analysis of the Baseline JPEG compression and decompression<br />

processes and their constituent parts (such as Huffman Coding,<br />

RLE, Differential Coding, Quantization, Discrete Cosine Transform)<br />

and their suitability for the <strong>GPU</strong> architecture; an analysis of achieved<br />

results and comparison with existing implementations; and<br />

applications to high-speed imaging.<br />

Speaker(s): Fyodor Serzhenko (CEO, Fastvideo), Victor Podlozhnyuk<br />

(NVIDIA)<br />

Topic(s): Audio, Image and Video Processing, Algorithms &<br />

Numerical Techniques (Advanced)<br />

WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />

ROOM A2<br />

S0286 Scaling Applications to a Thousand <strong>GPU</strong>s<br />

and Beyond<br />

Discover how to scale scientific applications to thousands of <strong>GPU</strong>s<br />

in parallel. We will demonstrate our techniques using two codes<br />

representative of a wide spectrum of programming methods. The<br />

Ludwig lattice Boltzmann package, capable of simulating<br />

extremely complex fluid dynamics models, combines C, MPI and<br />

CUDA. The Himeno three-dimensional Poisson equation solver<br />

benchmark combines Fortran (using the new coarray feature for<br />

communication) with prototype OpenMP accelerator directives (a<br />

promising new high-productivity <strong>GPU</strong> programming method). We<br />

will present performance results using the cutting-edge<br />

massively-parallel Cray XK6 hybrid supercomputer featuring the<br />

latest NVIDIA Tesla 2090 <strong>GPU</strong>s.<br />

Speaker(s): Alan Gray (HPC Architect, The University of Edinburgh),<br />

Roberto Ansaloni (Cray Italy)<br />

Topic(s): Supercomputing, Computational Fluid Dynamics, Parallel<br />

<strong>Program</strong>ming Languages & Compilers, Application Design &<br />

Porting Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM C<br />

S0299 Exploiting Fault Tolerant Heterogeneous<br />

Parallelism with SPM.Python<br />

In this session, we shall review how SPM.Python enables the<br />

exploitation of parallelism across servers, cores and <strong>GPU</strong>s in a<br />

fault tolerant manner. We will start off by describing how and<br />

why SPM.Python augments traditional (serial) Python<br />

with parallel concepts like parallel task managers and<br />

communication primitives. Specifically, the context for and<br />

solutions to three formally open technical problems will be<br />

described. We will conclude by reviewing examples of how<br />

SPM.Python can be used to exploit both coarse and fine grain<br />

parallelism using <strong>GPU</strong>s within and across servers in a fault<br />

tolerant manner.<br />

Speaker(s): Minesh B Amin (Founder / CEO, MBA Sciences)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0332 Efficient Graph Matching and Coloring on the <strong>GPU</strong><br />

The goal of this session is to compare the performance of graph<br />

matching and graph coloring algorithms on massively parallel<br />

devices such as <strong>GPU</strong>s. We present novel algorithms, which produce<br />

superior results for certain graphs and also discuss the techniques<br />

used to efficiently implement these algorithms on the <strong>GPU</strong>.<br />

Speaker(s): Patrice Castonguay (Emerging Applications Intern,<br />

NVIDIA), Jonathan Cohen (Emerging Applications, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM N<br />

S0363 Efficient Molecular Dynamics on Heterogeneous<br />

<strong>GPU</strong> Architectures in GROMACS<br />

Molecular Dynamics is an important application for <strong>GPU</strong><br />

acceleration, but many algorithmic optimizations and features still<br />

rely on code that prefers traditional CPUs. It is only with the latest<br />

hardware and software that we have been able to realize a<br />

heterogeneous <strong>GPU</strong>/CPU implementation and reach performance<br />

significantly beyond the state-of-the-art of hand-tuned CPU code<br />

in our GROMACS program. The sub-millisecond iteration time<br />

poses challenges on all levels of parallelization. Come and learn<br />

about our new atom-cluster pair interaction approach for<br />

non-bonded force evaluation that achieves 60% work-efficiency<br />

and other innovative solutions for heterogeneous <strong>GPU</strong> systems.<br />

Speaker(s): Berk Hess (PhD Student, KTH Royal Institute of <strong>Technology</strong>),<br />

Szilárd Páll (PhD Student, KTH Royal Institute of <strong>Technology</strong>)<br />

Topic(s): Molecular Dynamics, Computational Physics, Life Sciences<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM A7<br />

S0507 Interactive and Scalable Subsurface Data<br />

Visualization Framework<br />

The goal is to present an interactive visualization framework for<br />

large geo-spatial data. This framework has been developed by<br />

NVIDIA Advanced Rendering Center for the oil and gas<br />

(hydrocarbon) industry. The CUDA-based application runs in<br />

the cloud at interactive frame rates. The visualization is delivered<br />

remotely to clients in a browser, including tablets. The scalable<br />

visualization framework can handle terabytes of data.<br />

Speaker(s): Tom-Michael Thamm (Director, Software Product<br />

Management, NVIDIA ARC), Marc Nienhaus (NVIDIA ARC)<br />

Topic(s): Visualization, Cloud Computing (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />

ROOM M<br />

S0636 Supermicro: Worldwide leader in GP/<strong>GPU</strong> Servers<br />

and Workstation Platforms (Presented by Supermicro)<br />

Discover the measurable advantages that make Supermicro the<br />

time-to-market leader in <strong>GPU</strong> platform enablement. See how<br />

Supermicro’s innovative Application-Optimized designs enable<br />

partners to both scale-up and scale-out for maximum return on<br />

investment. Review actual case studies that highlight Supermicro’s<br />

leadership in Compute Density, Peak Performance, Scalability,<br />

Power Efficiency, Manageability, Reliability and Cost Effectiveness.<br />

Speaker(s): Don Clegg (VP, Supermicro)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />

ROOM J1<br />

S0703 Adaptive Heterogeneous Computing with OpenCL:<br />

A Molecular Docking Case Study<br />

Modern computer systems routinely include multiple types of fully<br />

programmable computing resources, such as multi-core CPUs and<br />

many-core <strong>GPU</strong>s. Most research into accelerator-based<br />

computing tends to focus on just one part of the system, typically<br />

the <strong>GPU</strong>. In our work we have developed methods to harness all of<br />

the available computing resources in a system simultaneously,<br />

including CPUs and <strong>GPU</strong>s, using OpenCL as the underpinning<br />

cross-platform layer. In this paper we shall include results from a<br />

molecular docking program, which has been shown to scale<br />

across hundreds of hybrid CPU/<strong>GPU</strong> systems, yielding significant<br />

increases in performance and energy efficiency.<br />

Speaker(s): Simon McIntosh-Smith (University of Bristol)<br />

Topic(s): Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0809 CUDA Profiler Training on Windows<br />

Nsight offers a comprehensive set of performance analysis tools.<br />

From tracing system-wide multi-core CPU and<br />

multi-<strong>GPU</strong> activity to profiling CUDA kernels with precise profiling<br />

experiments, developers can identify system-level optimization<br />

opportunities as well as expensive and inefficient CUDA kernels<br />

requiring in-depth analysis with the CUDA profiler. Through a set<br />

of comprehensive exercises, the attendee will be able to utilize<br />

these features to become fully proficient at optimizing complex<br />

CUDA applications.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2005 Emerging Companies Summit: CEO on Stage<br />

Featuring RealView Imaging, Elemental Technologies,<br />

and Mersive<br />

See the hottest new technologies from startups that are<br />

transforming computing. In a lively and fast-paced exchange, the<br />

Emerging Companies Summit CEO on Stage sessions will feature<br />

CEOs from three startups who will each have 15 minutes to<br />

introduce their companies and interact with a panel of leading<br />

venture capitalists, technology executives, and industry analysts.<br />

Speaker(s): Shaul Geldman (Co-Founder and VP of R&D, RealView<br />

Imaging), Sam Blackman (CEO and Co-Founder, Elemental<br />

Technologies), Robert Balgley (CEO, Mersive)<br />

Panelist(s): Tom Furlong (Managing Director, Granite Ventures), Rob<br />

Enderle (Principal Analyst, Enderle Group), Flip Gianos (General<br />

Partner, InterWest Partners), Jeff Herbst (VP of Business Development,<br />

NVIDIA)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM A1<br />

S0052 Fast High Quality Image and Video Background<br />

Removal with CUDA<br />

A tool to efficiently and easily cut out objects from a picture<br />

has great practical value. In this session we present how<br />

to efficiently implement such a tool with CUDA and the NPP library<br />

based on the GrabCut approach by Rother et al. Through <strong>GPU</strong><br />

acceleration both runtime and accuracy are improved compared to<br />

CPU based implementations such as the one in MS Word 2011.<br />

Further we show how to extend our <strong>GPU</strong> implementation to enable<br />

live background removal in a webcam video stream.<br />

Speaker(s): Timo Stich (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Audio, Image and Video Processing, Machine Learning & AI<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM K<br />

S0070 Large-Scale Matrix-Free Topology Optimization<br />

on the <strong>GPU</strong><br />

Popular topology optimization methods today are based on the<br />

SIMP concept. Unfortunately, SIMP leads to ill-conditioned<br />

stiffness matrices that are difficult to solve on <strong>GPU</strong> architectures.<br />

In this talk, I will present a new topology optimization method<br />

called PareTO that relies on the concepts of topological sensitivity<br />

and pareto-tracing. The resulting stiffness matrices are well


conditioned, and one can now fully exploit <strong>GPU</strong> architectures for<br />

fast matrix-free implementation of the finite element method.<br />

Numerical experiments demonstrate the efficacy of PareTO.<br />

Speaker(s): Krishnan Suresh (Associate Professor, University<br />

of Wisconsin)<br />

Topic(s): Computational Structural Mechanics, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM B<br />

S0084 CUMACH - A Fast <strong>GPU</strong>-based Genotype<br />

Imputation Tool<br />

The goal of this session is to introduce a <strong>GPU</strong>-implemented tool in<br />

bioinformatics. Genotype imputation is a method that extrapolates<br />

genetic correlations from a densely characterized reference panel<br />

to a sparsely typed study sample. Many CPU-based tools already<br />

exist, but they all take a long time on large data sets.<br />

In this session, we present a <strong>GPU</strong>-based imputation tool<br />

that achieves relatively good results at high speed. The session has<br />

three main parts: 1) background and<br />

the HMM-based algorithm, 2) <strong>GPU</strong> implementation and<br />

optimization, 3) results.<br />

Speaker(s): Agatha Hu (NVIDIA)<br />

Topic(s): Bioinformatics (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM N<br />

S0121 Software Architecture to Facilitate CUDA<br />

Development<br />

We describe a workflow architecture and its use in developing<br />

Schrödinger’s core-hopping application. The application supplies<br />

the stages as callbacks. A stage may have multiple<br />

implementations; for example, CUDA and CPU. An implementation<br />

can be assigned a maximum number of simultaneous threads.<br />

When any stage completes, a scheduling algorithm determines<br />

which implementation of which stage will be launched next. The<br />

application may detect “special” environments, such as CUDA, and<br />

set up its stages accordingly, or it may allow specification of which<br />

implementation of each stage to run. This makes it easy to develop<br />

and debug CUDA stages flexibly and incrementally.<br />

Speaker(s): Peter Shenkin (Vice President, Schrodinger), K. Patrick<br />

Lorton (Principal Developer, Schrodinger)<br />

Topic(s): Development Tools & Libraries, Life Sciences (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM A8<br />

S0202 Terascale Volume Visualization in Neuroscience<br />

Learn how to create a scalable volume visualization system for<br />

interactive rendering of terascale EM data. We will describe the<br />

major design principles, how we can avoid the standard approach<br />

of pre-computing a 3D multi-resolution hierarchy such as an<br />

octree, and how to handle continuous streaming of newly acquired<br />

data. For rendering we build upon a visibility-driven approach and<br />

3D virtual texturing, and perform interactive volume rendering of<br />

a “virtual” volume, where the corresponding physical storage is<br />

only represented and populated in a sparse manner with 2D<br />

instead of 3D image data on the fly during rendering.<br />

Speaker(s): Johanna Beyer (Postdoctoral Fellow, King Abdullah<br />

University of Science and <strong>Technology</strong>), Markus Hadwiger (Assistant<br />

Professor, KAUST)<br />

Topic(s): Visualization, Neuroscience (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0241 Large Graphs on Multi-<strong>GPU</strong>s<br />

The goal of this session is to propose new paradigms to explore<br />

large graphs on <strong>GPU</strong>s. Graphs with billions of edges don’t fit<br />

within the memory of a single <strong>GPU</strong>. A possible solution is to resort<br />

to multiple <strong>GPU</strong>s. Most common graph algorithms show low<br />

arithmetic intensity and irregular access patterns. These features<br />

lead to poor load balance among threads and uncoalesced<br />

access to memory. We show how to balance the load so that all<br />

threads are exploited as much as possible, and then how to use fast algorithms,<br />

such as radix sort and scan, to rearrange data before processing it.<br />

Speaker(s): Enrico Mastrostefano (PhD Student, Sapienza Università<br />

di Roma)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM A5<br />

S0257 Trace Based Performance Analysis For <strong>GPU</strong><br />

Accelerated Multi-Hybrid Applications<br />

Get in contact with performance tuning experts for multi-hybrid<br />

applications and see firsthand how VampirTrace/Vampir can<br />

significantly speed up application porting and development.<br />

Speaker(s): Guido Juckeland (System Engineer (HPC), Leader<br />

Hardware Accelerator Group, TU Dresden - ZIH)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM C<br />

S0367 Physis: An Implicitly Parallel Framework for<br />

Stencil Computations<br />

This session presents how to implement finite difference methods<br />

in a concise, readable, and portable way, yet achieving good<br />

scalability over hundreds of <strong>GPU</strong>s, using the Physis high-level<br />

application framework. Physis extends the standard C language<br />

with a small set of custom declarative constructs for expressing<br />

stencil computations with multidimensional structured grids,<br />

which are automatically translated to CUDA for <strong>GPU</strong> acceleration<br />

and MPI for node-level parallelization with automatic domain-specific<br />

optimizations such as overlapped boundary exchanges.<br />

We demonstrate the programmability improvement and<br />

performance of Physis using hundreds of <strong>GPU</strong>s on TSUBAME2.0.<br />

Speaker(s): Naoya Maruyama (Assistant Professor, Tokyo Institute<br />

of <strong>Technology</strong>)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers,<br />

Supercomputing, Development Tools & Libraries, Computational Fluid<br />

Dynamics (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM L<br />

S0377 C++ Data Marshalling Best Practices<br />

When integrating CUDA C++ kernels into existing C++ applications,<br />

it is at times desirable to migrate a C++ object instance from the<br />

host to the device or vice versa. Given variations among host<br />

compilers regarding structure layout, accomplishing this data<br />

marshalling in a manner that is reliable, simple, and efficient is a<br />

complex issue. cudaMemcpy is our primary means to transfer<br />

data to the <strong>GPU</strong>, but memcpy-style operations are more readily<br />

amenable to C-style structures and arrays than to C++ objects or<br />

collections of objects. In this session, we will cover the caveats<br />

and best practices for marshalling C++ data.<br />

Speaker(s): Cliff Woolley (CUDA Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Finance, Application Design & Porting Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM A7<br />

S0511 3D Helmholtz Solver with a Shifted Laplace<br />

Multigrid on Multi-<strong>GPU</strong>s<br />

Learn about an iterative solver of the 3D Helmholtz equation on<br />

multi-<strong>GPU</strong> systems using CUDA. The Helmholtz equation, discretized by<br />

second-order finite differences, is solved with Bi-CGSTAB<br />

preconditioned by a shifted Laplace multigrid method. Two<br />

multi-<strong>GPU</strong> approaches are considered: data parallelism and<br />

algorithm-split. Their implementations on multi-<strong>GPU</strong> architecture<br />

are compared to a multi-threaded CPU and single <strong>GPU</strong><br />

implementation. The results show that the data parallel<br />

implementation suffers from communication between <strong>GPU</strong>s<br />

and CPU, but is still several times faster than the many-core CPU version.<br />

The algorithm-split across <strong>GPU</strong>s limits communication and<br />

delivers speedups comparable to a single <strong>GPU</strong> implementation.<br />

Speaker(s): Kees Lemmens (Delft University of <strong>Technology</strong>)<br />

Topic(s): Energy Exploration, Algorithms & Numerical Techniques<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM A3<br />

S0525 Copperhead: Data Parallel Python<br />

Copperhead is a data parallel language suitable for <strong>GPU</strong><br />

programming, embedded in Python, which aims to provide both a<br />

productive programming environment and excellent<br />

computational efficiency. Copperhead programs are written in a<br />

small, restricted subset of the Python language, using standard<br />

constructs like map and reduce, along with traditional data<br />

parallel primitives like scan and sort. Copperhead programs<br />

interoperate with existing Python numerical and visualization<br />

libraries such as NumPy, SciPy, and Matplotlib. In this talk, we will<br />

discuss the Copperhead language, the open-source Copperhead<br />

runtime, and selected example programs.<br />

Speaker(s): Bryan Catanzaro (Research Scientist, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Intermediate)<br />

WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />

ROOM J1<br />

S0704 Accelerating Iterative Linear Solvers on <strong>GPU</strong>s<br />

In this talk, we present our work on solving sparse linear systems<br />

on NVIDIA Tesla <strong>GPU</strong>s. We develop a new matrix format for <strong>GPU</strong>s,<br />

HEC (Hybrid of ELL and CSR). The corresponding sparse matrix<br />

vector multiplication kernel and other related BLAS 1/2<br />

subroutines are developed. Based on these subroutines, seven<br />

Krylov subspace solvers and two algebraic multigrid solvers<br />

(AMG) are implemented. Several commonly used preconditioners,<br />

such as Neumann polynomial, approximate inverse, ILU(k), ILUT,<br />

block ILU(k), block ILUT, domain decomposition (DDM) and AMG<br />

preconditioners, are also developed. In addition, a new parallel<br />

triangular solver for <strong>GPU</strong>s is designed. With this solver, a unified<br />

framework for ILU-related preconditioners is implemented.<br />

Speaker(s): Hui Liu (University of Calgary)<br />

Topic(s): Supercomputing (Intermediate)<br />
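
For readers unfamiliar with the ELL and CSR formats mentioned in S0704, the following is a minimal Python sketch of a hybrid ELL/CSR sparse matrix-vector product. It is not the HEC format from the talk: the split rule (the first `ell_width` nonzeros of each row go to the ELL part, the remainder to the CSR part) and all function names are assumptions made for illustration only.

```python
def build_hybrid(dense, ell_width):
    """Split a dense matrix into an ELL part and a CSR overflow part.

    Hypothetical split rule: the first `ell_width` nonzeros of each row are
    stored in fixed-width ELL arrays, any further nonzeros go to CSR.
    """
    n = len(dense)
    ell_cols = [[-1] * ell_width for _ in range(n)]   # -1 marks padding
    ell_vals = [[0.0] * ell_width for _ in range(n)]
    csr_ptr, csr_cols, csr_vals = [0], [], []
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if v == 0:
                continue
            if k < ell_width:
                ell_cols[i][k] = j
                ell_vals[i][k] = v
                k += 1
            else:
                csr_cols.append(j)
                csr_vals.append(v)
        csr_ptr.append(len(csr_cols))
    return ell_cols, ell_vals, csr_ptr, csr_cols, csr_vals


def hybrid_spmv(hybrid, x):
    """Compute y = A @ x from the hybrid representation."""
    ell_cols, ell_vals, csr_ptr, csr_cols, csr_vals = hybrid
    y = [0.0] * len(ell_cols)
    for i in range(len(y)):
        # Regular part: every row has the same number of ELL slots.
        for c, v in zip(ell_cols[i], ell_vals[i]):
            if c >= 0:
                y[i] += v * x[c]
        # Irregular part: only rows that overflowed have CSR entries.
        for p in range(csr_ptr[i], csr_ptr[i + 1]):
            y[i] += csr_vals[p] * x[csr_cols[p]]
    return y


A = [[1, 0, 2, 3],
     [0, 4, 0, 0],
     [5, 6, 7, 8]]
H = build_hybrid(A, ell_width=2)
y = hybrid_spmv(H, [1, 1, 1, 1])
```

The usual motivation for such a split is that the fixed-width ELL arrays give regular memory access, while the CSR part absorbs the few long rows without padding every row to the maximum length.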

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM L<br />

S0100 Mathematica as a Practical Platform for <strong>GPU</strong>-<br />

Accelerated Finance<br />

With the introduction of <strong>GPU</strong> support in version 8, Mathematica<br />

has become an excellent environment for integrating CUDA with<br />

high level code for interpretation or visualization. In this<br />

presentation, we will show the usefulness of Mathematica in the<br />

field of computational finance. In addition to demonstrating the<br />

<strong>GPU</strong>-accelerated financial computations which can be readily<br />

performed within Mathematica, we will show that these<br />

calculations can easily be integrated with third-party data sources<br />

including Microsoft Excel and databases. Furthermore, we will<br />

cover the UnRisk Mathematica package written by MathConsult,<br />

which seamlessly adds <strong>GPU</strong>-accelerated complex model<br />

calibration algorithms to Mathematica’s repertoire.<br />

Speaker(s): Abdul Dakkak (Kernel Developer, Wolfram Research),<br />

Dylan Roeh (Kernel Developer, Wolfram Research)<br />

Topic(s): Finance, Development Tools & Libraries (Intermediate)<br />

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM A1<br />

S0128 V:Screen: A Real-Time Augmented Video Method<br />

This presentation introduces an image editing tool that lets us<br />

replace a region of any image or video with another image or<br />

video. This application is useful for advertisements, commercials,<br />

music videos, movies, etc. We call our development “Virtual Screen”, or just<br />

VScreen. The main difference between editing (augmenting) videos<br />

and still images is that occlusions need to<br />

be managed: moving objects in the foreground may occlude the<br />

augmented region in the background. We therefore use a procedure for<br />

foreground-background video segmentation, implemented<br />

on NVIDIA video cards to meet the real-time requirement.<br />

Speaker(s): Francisco J. Hernandez-Lopez (PhD Student, CIMAT A.C.),<br />

Mariano Rivera (Researcher-Professor, CIMAT A.C.)<br />

Topic(s): Computer Vision (Beginner)<br />

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM N<br />

S0139 <strong>GPU</strong>-Based Molecular Dynamics Simulations of<br />

Protein and RNA Assembly<br />

Protein and RNA biomolecular folding and assembly problems<br />

have important applications because misfolding is associated with<br />

diseases like Alzheimer’s and Parkinson’s. However, simulating<br />

complex biomolecules on the same timescales as experiments is<br />

an extraordinary challenge due to a bottleneck in the force<br />

calculations. To overcome these hurdles, we perform coarse-grained<br />

molecular dynamics simulations where biomolecules are<br />

reduced into simpler components. Furthermore, our <strong>GPU</strong>-based<br />

simulations show a significant performance improvement over<br />

CPU-based simulations, which are limited to systems of 50-150<br />

residues/nucleotides. The <strong>GPU</strong>-based code can simulate protein/<br />

RNA systems of 400-10,000+ residues/nucleotides, and we<br />

present ribosome assembly simulations.<br />

Speaker(s): Samuel Cho (Assistant Professor, Wake Forest University)<br />

Topic(s): Molecular Dynamics, Computational Physics (Intermediate)<br />

WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />

ROOM A3<br />

S0242 Harnessing <strong>GPU</strong> Compute with C++ AMP (Part 1 of 2)<br />

C++ AMP is an open specification for taking advantage of<br />

accelerators like the <strong>GPU</strong>. In this session we will explore the C++<br />

AMP implementation in Microsoft Visual Studio 11. After a quick<br />

overview of the technology, its goals, and how it<br />

differs from other approaches, we will dive into<br />

the programming model and its modern C++ API. This is a<br />

code-heavy, interactive, two-part session, where every part of the<br />

library will be explained. Demos will include showing off the<br />

richest parallel and <strong>GPU</strong> debugging story on the market, in the<br />

upcoming Visual Studio release.<br />

Speaker(s): Daniel Moth (Principal <strong>Program</strong> Manager, Microsoft)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Development<br />

Tools & Libraries (Intermediate)<br />

WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />

ROOM A5<br />

S0298 Performance Tools for <strong>GPU</strong>-Powered Scalable<br />

Heterogeneous Systems<br />

Discover the latest parallel performance tool technology for<br />

understanding and optimizing parallel computations on scalable<br />

heterogeneous platforms. The session will present the TAU<br />

performance system and its support of measurement and analysis<br />

of heterogeneous platforms composed of clusters of shared-memory<br />

nodes with <strong>GPU</strong>s. In particular, TAU’s integration of the<br />

CUPTI 4.1+ technology will be described and demonstrated<br />

through CUDA SDK examples and the SHOC benchmarks.<br />

Attendees will be provided with LiveDVDs containing the TAU toolsuite<br />

and many pre-installed parallel tool packages. The DVDs will also include<br />

the latest CUDA driver, runtime library, and CUPTI.<br />

Speaker(s): Allen Malony (Professor, University of Oregon)<br />

Topic(s): Development Tools & Libraries, Parallel <strong>Program</strong>ming<br />

Languages & Compilers, Application Design & Porting Techniques<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />

ROOM B<br />

S0361 Lossless Data Compression on <strong>GPU</strong>s<br />

In this talk, we will discuss common data compression algorithms<br />

used in the bzip2 implementation. We will also discuss our efforts<br />

towards parallelizing the Burrows-Wheeler Transform, Move-to-<br />

Front Transform, and Huffman encoding. The Burrows-Wheeler<br />

Transform is an algorithm used in both lossless data compression<br />

and bioinformatics. We’ll explain how it was computed using a<br />

parallel string-sorting algorithm. We will also show performance<br />

comparisons to serial implementations of each algorithm.<br />

Speaker(s): Jason Mak (Graduate Student, UC Davis), Ritesh Patel<br />

(Student, University of California Davis)<br />

Topic(s): Algorithms & Numerical Techniques, Bioinformatics<br />

(Intermediate)<br />
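
As background for S0361, the two transforms named in the abstract can be sketched serially in a few lines of Python. This is a naive reference version for illustration, not the parallel string-sorting approach from the talk; the `$` end-of-string sentinel and the function names are assumptions.

```python
def bwt(text, eos="$"):
    """Burrows-Wheeler Transform via sorting all rotations (naive version).

    Assumes `eos` does not occur in `text` and sorts before every other
    character in it.
    """
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def move_to_front(text, alphabet):
    """Move-to-Front Transform: emit each symbol's current list index,
    then move that symbol to the front of the list."""
    table = list(alphabet)
    output = []
    for ch in text:
        index = table.index(ch)
        output.append(index)
        table.insert(0, table.pop(index))
    return output


transformed = bwt("banana")
ranks = move_to_front(transformed, "$abn")
```

The BWT tends to group equal characters together, so the Move-to-Front output contains many small values and zeros, which a subsequent Huffman coder can encode compactly.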

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0410 Computing Hausdorff Distances Between<br />

Freeforms on the <strong>GPU</strong><br />

We present new <strong>GPU</strong> algorithms for computing the directed<br />

Hausdorff distance between freeform surfaces, with applications in<br />

shape matching, mesh simplification, and geometric approximation<br />

and optimization. Our algorithms run in real-time with very small<br />

error bounds for parametric models defined by complex NURBS<br />

surfaces and can be used to interactively compute the Hausdorff<br />

distance for models made of dynamic deformable surfaces. We<br />

discuss implementation decisions and tradeoffs between OpenGL,<br />

CUDA, and Thrust, and the advantages and disadvantages of parallel<br />

hierarchical culling methods for this application.<br />

Speaker(s): Sara McMains (Professor, UC Berkeley), Adarsh<br />

Krishnamurthy (Post-doctoral Researcher, UC San Diego)<br />

Topic(s): Algorithms & Numerical Techniques, Computer Graphics,<br />

Computer Vision (Intermediate)<br />
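
For reference, the directed Hausdorff distance discussed in S0410 can be written down directly for finite point sets. The brute-force Python sketch below works on sampled points rather than NURBS surfaces and uses no culling, so it only illustrates the definition, not the speakers' algorithm; the function name is an assumption.

```python
import math


def directed_hausdorff(a, b):
    """Directed Hausdorff distance h(A, B) = max over p in A of
    (min over q in B of |p - q|), by brute force in O(|A| * |B|)."""
    return max(min(math.dist(p, q) for q in b) for p in a)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]
h_ab = directed_hausdorff(A, B)  # farthest point of A from its nearest point in B
h_ba = directed_hausdorff(B, A)  # generally differs from h_ab
```

The distance is not symmetric: here the two directions give 1.0 and 3.0. The symmetric Hausdorff distance is the larger of the two directed values.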

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM K<br />

S0518 <strong>GPU</strong> Computing: From Sand to Tank Dynamics<br />

This talk explores the use of heterogeneous CPU/<strong>GPU</strong> computing,<br />

as enabled by an in-house developed Heterogeneous Computing<br />

Template (HCT), for physics-based simulations of mechanical<br />

systems. HCT draws on five components: advanced modeling<br />

techniques (formulating the governing equations); algorithmic<br />

support (solving these equations); proximity computation; domain<br />

decomposition/data exchange (for multi-node distributed CPU/<strong>GPU</strong><br />

computing); and post-processing/visualization. These five<br />

components provide the foundation of a computational framework<br />

used to analyze mechanical systems with millions of interacting<br />

elements. Example applications will include granular terrain<br />

simulation, tracked and wheeled vehicle mobility studies (tanks,<br />

rovers), fluid-solid interaction and nonlinear finite element analysis.<br />

Speaker(s): Dan Negrut (Associate Professor, University of<br />

Wisconsin-Madison)<br />

Topic(s): Computational Structural Mechanics, Computational<br />

Fluid Dynamics (Advanced)<br />

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM C<br />

S0605 cudaDMA: Emulating DMA engines on <strong>GPU</strong>s for<br />

Performance and <strong>Program</strong>mability<br />

The CudaDMA library is a collection of DMA objects that support<br />

efficient movement of data between off-chip global memory and<br />

on-chip shared memory in CUDA kernels. CudaDMA objects<br />

support many different data transfer patterns including sequential,<br />

strided, gather, scatter, and halo patterns. The library encapsulates<br />

efficient synchronization and data transfer implementations to<br />

achieve high memory bandwidth utilization. <strong>Program</strong>mer<br />

productivity is achieved by avoiding the need for thread array<br />

shapes to match data layout. Using CudaDMA, speedups of up to<br />

1.37x on synthetic micro-benchmarks and 1.15x-3.2x on kernels<br />

from scientific applications have been demonstrated.<br />

Speaker(s): Brucek Khailany (Senior Research Scientist, NVIDIA)<br />

Topic(s): Development Tools and Libraries (Intermediate)<br />

WEDNESDAY, MAY 16, 17:00 TBD (25 MINUTES)<br />

ROOM A8<br />

S0623 Visualizing Heterogeneous Performance Tested<br />

on MPI+CUDA Gigapixel Panorama Stitching<br />

This session consists of two technical parts. In the first part, we<br />

explain the use and implementation of a hybrid Poisson solver for<br />

gradient domain processing of massive images. Specifically, we<br />

provide a parallel out-of-core method for the seamless stitching<br />

of gigapixel panoramas in a parallel CUDA + MPI environment. In<br />

the second part, we shall cover the ongoing work of using novel<br />

visualization techniques to understand performance data of<br />

heterogeneous computing clusters. The Poisson solver application<br />

shall be taken up as an example to demonstrate various features<br />

of this performance visualization tool.<br />

Speaker(s): Valerio Pascucci (Director of the Center for Extreme Data<br />

Management, Analysis and Visualization, University of Utah)<br />

Topic(s): Supercomputing, Visualization, Development Tools and<br />

Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM J1<br />

S0705 Efficient AMG on Hybrid <strong>GPU</strong> Clusters<br />

This talk presents the implementation of an AMG solver for a hybrid<br />

cluster that exploits distributed and shared memory parallelization<br />

and uses the available <strong>GPU</strong> accelerators on each node. This solver<br />

has been written by using LAMA (Library for Accelerated Math<br />

Applications). This library not only provides an easy-to-use<br />

framework for solvers that might run on different devices with<br />

different matrix formats, but also comes with features to optimize<br />

and hide communication and memory transfers between CPUs and<br />

<strong>GPU</strong>s. These features are explained and their impact on the<br />

efficiency of the AMG solver is shown. The benchmark results<br />

demonstrate that an efficient use of hybrid clusters is even possible<br />

for multi-level methods like AMG where fast solutions are needed<br />

on all levels for multiple problem sizes.<br />

Speaker(s): Thomas Brandes (Senior Scientist, Fraunhofer Institute for<br />

Algorithms and Scientific Computing SCAI), Jiri Krau (Fraunhofer<br />

Institute for Algorithms and Scientific Computing SCAI)<br />

Topic(s): Supercomputing (Intermediate)<br />

WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S2006 Emerging Companies Summit: CEO on Stage<br />

Featuring Raytrix,<br />

Playcast and Universal Robotics<br />

See the hottest new technologies from startups that are<br />

transforming computing. In a lively and fast-paced exchange, the<br />

Emerging Companies Summit CEO on Stage sessions will feature<br />

CEOs from three startups who will each have 15 minutes to<br />

introduce their companies and interact with a panel of leading<br />

venture capitalists, technology executives, and industry analysts.<br />

Speaker(s): Christian Perwass (CEO, Raytrix), Guy De Beer (CEO,<br />

Playcast), David Peters (CEO, Universal Robotics)<br />

Panelist(s): Tom Furlong (Managing Director, Granite Ventures), Rob<br />

Enderle (Principal Analyst, Enderle Group), Flip Gianos (General Partner,<br />

InterWest Partners), Jeff Herbst (VP of Business Development, NVIDIA)<br />

Topic(s): General Interest (Beginner)<br />

WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />

ROOM M<br />

S0639 Presented by Penguin<br />

Description unavailable at press time.<br />

Topic(s): General (Beginner)<br />

WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />

ROOM A7<br />

S0647 Effective HPC Architecture - Design, Develop,<br />

Implement (Presented by ELEKS)<br />

An effective HPC system is so much more than just GP<strong>GPU</strong>. Real-world<br />

applications often need to stream large amounts of data from<br />

across system boundaries to dozens of worker nodes in the most<br />

scalable and efficient way. They usually require storing huge<br />

amounts of data, scheduling of computation jobs, monitoring of<br />

system health and results visualization. Having first-hand<br />

experience in design, development and implementation of end-to-end<br />

HPC solutions, our engineers will share their experience on<br />

some of the pitfalls to avoid and things to consider when planning<br />

your next HPC system that works.<br />

Speaker(s): Oleh Khoma (Head of HPC Unit, ELEKS)<br />

Topic(s): Supercomputing; Application Design & Porting Techniques<br />

Intermediate (Beginner)<br />

WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0810 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight development<br />

team! Whether you would like a private meeting to discuss specific<br />

product features or test out your application with the latest version<br />

of Nsight, or you just want to hang out with the team after attending<br />

one of the exciting training sessions, the lounge is a great place to<br />

learn everything you ever wanted to know about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0096 Summed Area Ripmaps<br />

In this presentation, we show how ripmaps can replace Summed<br />

Area Tables (SATs) for the purpose of computing a large number<br />

of spatially varying box filter kernels throughout the input data,<br />

providing both higher accuracy and higher speed for typical use<br />

cases. For this purpose, we demonstrate an implementation of<br />

ripmap generation in CUDA C (accelerated by shared memory<br />

usage), and a texture-cache based box filter for spatially varying<br />

kernel sizes, which can be implemented in both CUDA C and<br />

graphics-based APIs (e.g. OpenGL and DirectX).<br />

Speaker(s): Gernot Ziegler (Compute Developer <strong>Technology</strong>, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques, Computer Vision,<br />

Computer Graphics (Intermediate)<br />
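
As context for S0096, this minimal Python sketch shows the Summed Area Table baseline that the session proposes to replace: once the table is built, the sum over any axis-aligned box costs four lookups regardless of box size. The ripmap method itself is not shown, and the function names and the half-open box convention are assumptions.

```python
def summed_area_table(image):
    """Build a SAT with one extra row and column of zeros, so that
    sat[y][x] holds the sum of image[0:y][0:x]."""
    height, width = len(image), len(image[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, in four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
total = box_sum(sat, 0, 0, 3, 3)
corner = box_sum(sat, 1, 1, 3, 3)
```

Dividing a box sum by the box area gives the box-filtered value. With floating-point data the table entries grow with image size, which is one known source of the accuracy loss that alternatives to SATs aim to avoid.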

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

ROOM K<br />

S0217 Efficient Implementation of CFD Algorithms on<br />

<strong>GPU</strong> Accelerated Supercomputers<br />

The goal of this session is to introduce the concepts necessary to<br />

solve large computational fluid dynamics (CFD) problems on<br />

collections of many <strong>GPU</strong>s. Communication and computation<br />

overlapping schemes become even more critical when using fast<br />

compute engines such as <strong>GPU</strong>s that are connected via a relatively<br />

slow interconnect (such as MPI on InfiniBand). The algorithms<br />

presented are validated on unsteady CFD simulations of<br />

turbulence using 192 graphics processors to update half-a-billion<br />

unknowns per computational timestep. The performance results<br />

from three different <strong>GPU</strong> accelerated supercomputers (Lincoln,<br />

Forge, and Keeneland) are compared with a large CPU based<br />

supercomputer (Ranger).<br />

Speaker(s): Ali Khajeh Saeed (PhD Candidate, University of<br />

Massachusetts, Amherst), Blair Perot (University of Massachusetts,<br />

Amherst)<br />

Topic(s): Computational Fluid Dynamics, Computational Physics,<br />

Supercomputing, Application Design & Porting Techniques (Intermediate)<br />

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

ROOM C<br />

S0311 Teaching Applied Parallel Computing with <strong>GPU</strong>s<br />

Learn how the next generation of HPC developers are learning<br />

hands-on skills with <strong>GPU</strong>s, and how <strong>GPU</strong> computing is being<br />

incorporated into Computer Science courses. We will discuss how<br />

<strong>GPU</strong>s are being used to enhance student learning of parallel<br />

computing concepts through a cross-teaching approach, where<br />

students with different domain expertise are grouped into teams<br />

and tasked with parallelizing an application such as ray tracing.<br />

We’ll show that student projects that emphasize optimization of<br />

architectural resources and performance tuning allow students<br />

with no prior experience to parallelize a large-scale application with<br />

significant performance improvement in as little as six weeks.<br />

Speaker(s): Chris Lupo (Assistant Professor, California Polytechnic<br />

State University)<br />

Topic(s): General Interest, Ray Tracing (Intermediate)<br />

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

ROOM N<br />

S0346 GP<strong>GPU</strong> Accelerated Protein Similarity Measures<br />

Identifying Biologically Relevant Structure<br />

Atomic structure similarity measures for proteins help in de novo<br />

protein structure prediction. For a large set of computationally<br />

generated protein structures (~20k) all pairwise similarities have to<br />

be calculated to cluster structures. Common similarity measures<br />

are root mean square deviation (RMSD) and global distance test<br />

total score (GDT_TS). Although GDT_TS has advantages over RMSD,<br />

it is not used due to its time-consuming calculation. The aforementioned<br />

and other similarity measures are ported for parallel<br />

execution on GP<strong>GPU</strong>s to make them amenable for clustering de<br />

novo generated structural models to find the largest cluster<br />

representing the biologically relevant protein conformations.<br />

Speaker(s): Edward Lowe (Research Assistant Professor, Vanderbilt<br />

University), Nils Woetzel (Research Assistant, Vanderbilt University)<br />

Topic(s): Bioinformatics, Application Design & Porting Techniques<br />

(Intermediate)<br />
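
To make the cost described in S0346 concrete, here is a small Python sketch of the RMSD measure and the all-pairs loop it implies. It assumes the structures are already superimposed and list their atoms in the same order, which a real pipeline would have to ensure first; GDT_TS is not shown, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of
    (x, y, z) atom coordinates, without any superposition step."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


def pairwise_rmsd(structures):
    """RMSD for every unordered pair: n * (n - 1) / 2 independent tasks."""
    n = len(structures)
    return {(i, j): rmsd(structures[i], structures[j])
            for i in range(n) for j in range(i + 1, n)}


models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
distances = pairwise_rmsd(models)
```

For the roughly 20,000 structures mentioned in the abstract, the same loop would produce about 200 million pairs. Each pair is independent of the others, which is what makes the problem a natural fit for parallel execution.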

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

ROOM A1<br />

S0425 File Sharing Plus Real Time Media and<br />

Document Collaboration<br />

Studiopass is a cloud based file sharing and visual collaboration<br />

tool which allows participants to collaborate on Microsoft<br />

documents and media files including 1080p video. It is graphic<br />

intensive and requires the best <strong>GPU</strong> performance to push<br />

playback of heavy files. This session will discuss how NVIDIA<br />

Tegra powered devices deliver the graphics and video<br />

performance needed for efficient collaboration and how it<br />

will bring more acceleration with the new Tegra 3 Quad Core plus<br />

1. Studiopass collaboration is not only accelerated by Tegra<br />

devices but also leverages NVIDIA Tesla accelerated transcoding<br />

running on Amazon Web Services.<br />

Speaker(s): Kevin Jackson (Founder / CEO, Viewpartners)<br />

Topic(s): Mobile Applications & Interfaces, Cloud Computing (Beginner)<br />

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

ROOM L<br />

S0656 kdb+ and <strong>GPU</strong>s for Market Data Analytics<br />

and Trading<br />

Market data volumes increase year-on-year with the occasional<br />

extraordinary capacity-breaking peak. We must capture, store and<br />

process these data to gain insights for quantitative and<br />

algorithmic trading using a variety of market data analytics and<br />

techniques. kdb+ from KX Systems is a memory-based column<br />

database, written in the vector-functional language q, often used<br />

in finance for these analyses. In this session we demonstrate a<br />

method for the enhanced performance of general programs<br />

written in q and kdb+ by executing them on the <strong>GPU</strong>.<br />

Speaker(s): Philip A. Beasley-Harling (Bank of America Merrill Lynch)<br />

Topic(s): Finance (Beginner)<br />

WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />

ROOM J1<br />

S0706 PISTON: Portability and Performance for Data-<br />

Parallel Visualization and Analysis Operators<br />

Due to the wide variety of current and next-generation<br />

supercomputing architectures, the development of high-performance<br />

parallel visualization and analysis operators<br />

frequently requires re-writing the underlying algorithms for many<br />

different platforms. In order to facilitate portability, we have<br />

devised a framework for creating such operators that employs the<br />

data-parallel programming model.<br />

Speaker(s): Christopher Sewell (Los Alamos National Laboratory),<br />

Li-Ta Lo (Los Alamos National Laboratory)<br />

Topic(s): <strong>GPU</strong>/Hybrid Computing, Data Science and Visualization<br />

(Intermediate)<br />

WEDNESDAY, MAY 16, 18:00 (50 MINUTES)<br />

ROOM L<br />

S0653 C++ and CUDA Birds-of-a-Feather<br />

This birds-of-a-feather will provide an opportunity for C++ and<br />

<strong>GPU</strong> users to learn about how the powerful C++ language can be<br />

used on the CUDA platform. NVIDIA and guest speakers will<br />

present details of the latest C++ features in CUDA and the Thrust<br />

open source template library, as well as discuss some goals and<br />

directions for C++ on the CUDA platform. It will also provide<br />

attendees a valuable opportunity to network with other attendees<br />

and NVIDIA engineers who share their interest in C++.<br />

Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

SESSION INFORMATION<br />

THURSDAY, MAY 17<br />

THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0057 <strong>GPU</strong>-Accelerated Molecular Dynamics Simulation<br />

of Solid Covalent Crystals<br />

An efficient and highly scalable algorithm for molecular dynamics<br />

(MD) simulation (using sophisticated many-body potentials) of solid<br />

covalent crystals is presented. Its effective memory throughput on a<br />

single C2050 <strong>GPU</strong> board reached 102 GB/s (81% of the peak), the<br />

instruction throughput reached 412 Ginstr/s (80% of the peak), and<br />

27% of the peak flops of a single <strong>GPU</strong> was obtained. Parallel<br />

efficiency of the algorithm can be as high as 95% on all 7168 <strong>GPU</strong>s<br />

of Tianhe-1A, reaching possibly a record in high performance of MD<br />

simulations, 1.87Pflops in single precision.<br />

Speaker(s): Wei Ge (Professor, Institute of Process Engineering,<br />

Chinese Academy of Sciences)<br />

Topic(s): Molecular Dynamics, Algorithms & Numerical Techniques,<br />

Supercomputing (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />

ROOM A8<br />

S0129 A Monte Carlo Thermal Radiation Solver in <strong>GPU</strong>/<br />

CPU Hybrid Architecture<br />

A Monte Carlo ray-tracing code is developed to predict radiative<br />

heat transfer behaviours in CFD simulation of combustion<br />

phenomena. Using the emission-reciprocal method, the random ray<br />

casting for each node can be conducted independently, which suits<br />

parallel computation. The code is efficiently implemented on<br />

hybrid <strong>GPU</strong>/CPU HPC resources using a dedicated dynamic load<br />

balancing strategy. Linear speedup scaling on hybrid HPC<br />

resources is demonstrated for the calculation of<br />

radiative heat transfer in a helicopter engine’s combustion<br />

chamber, where adding one <strong>GPU</strong> to the HPC resource pool is roughly<br />

equivalent to adding nine CPU cores.<br />

Speaker(s): Oliver Gicquel (Professor, Laboratoire E.M2.C, Ecole<br />

Centrale Paris), Gaofeng Wang (Postdoc Fellow, Laboratoire E.M2.C,<br />

Ecole Centrale Paris)<br />

Topic(s): Computational Fluid Dynamics, Computational Physics,<br />

Ray Tracing (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />

ROOM A3<br />

S0133 Improving Mars Rover Image Compression Via<br />

<strong>GPU</strong>s And Genetic Algorithms<br />

Learn how to use Jacket to accelerate genetic algorithm (GA)<br />

image compression. Our research uses a GA to optimize lossy<br />

compression transforms that outperform state-of-the-art<br />

wavelet-based approaches for a variety of image classes,<br />

including fingerprint, satellite, and medical images, as well as images<br />

transmitted from the Mars Exploration Rovers. A typical training run evolves a<br />

population of transforms over many generations; since each<br />

transform must be applied to each image from the training set,<br />

each run entails thousands of independent, parallelizable fitness<br />

evaluations. By using MATLAB and Jacket to perform 2D<br />

convolution on the <strong>GPU</strong>, we have greatly reduced the total<br />

computation time needed.<br />

Speaker(s): Brendan Babb (Student/Research Technician, University of<br />

Alaska Anchorage)<br />

Topic(s): Machine Learning & AI, Audio, Image and Video Processing,<br />

Development Tools & Libraries (Beginner)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM N<br />

S0256 A Stencil Library for the New Dynamic Core<br />

of COSMO<br />

We will present a stencil library used in the heart of the COSMO<br />

numerical weather prediction model. During the talk we’ll show<br />

how we implemented an abstraction that allows easy development<br />

of new stencils and solvers on top of a framework allowing<br />

execution on both CPU and <strong>GPU</strong>. The library makes efficient use<br />

of <strong>GPU</strong> resources and we will show how to structure memory<br />

accesses and computation optimally. Developers involved in<br />

porting or writing fully-featured C++ libraries for CUDA will also<br />

be interested in attending.<br />

Speaker(s): Tobias Gysi (Supercomputing Systems AG),<br />

Paul Messner (NVIDIA)<br />

Topic(s): Climate & Weather Modeling, Development Tools &<br />

Libraries (Advanced)<br />

THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0302 Accelerating miniFE: A Finite Element<br />

Mini-application<br />

The Mantevo performance project is a collection of self-contained<br />

proxy applications that illustrate the main performance<br />

characteristics of important algorithms. miniFE is intended to be<br />

an approximation to an unstructured implicit finite element or<br />

finite volume application. Our work investigated algorithms for<br />

assembling a matrix on the <strong>GPU</strong>. Parallelization algorithms using<br />

both 1 thread and 8 threads per element were investigated. Using<br />

these approaches, we achieved a significant speedup (over 60x for<br />

double precision) compared to the serial algorithm.<br />

Speaker(s): Justin Luitjens (Developer <strong>Technology</strong>, Compute, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM C<br />

S0303 <strong>GPU</strong> Acceleration for Threshold Based Region<br />

Growth Algorithms<br />

Come learn how the massively parallel computing power of<br />

modern <strong>GPU</strong>s helps to create faster and more accurate volume<br />

rendered images for the medical imaging community. Attendees<br />

of this session will gain insight into how <strong>GPU</strong>s can accelerate<br />

region growth algorithms and how these algorithms can be<br />

optimized for the latest generation of NVIDIA hardware. Topics<br />

covered will include fundamentals of region growth, <strong>GPU</strong><br />

implementations, and practical examples of vessel tracking<br />

algorithms based on <strong>GPU</strong> accelerated algorithms.<br />

Speaker(s): Supratik Moulik (Cardiovascular Imaging Fellow, University<br />

of Pennsylvania), Jason Walsh (University of Pennsylvania 3D Lab)<br />

Topic(s): Medical Imaging & Visualization, Bioinformatics (Beginner)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM A1<br />

S0326 Next Generation InfoWall<br />

Learn how you can use a multiple display configuration to render<br />

video content captured from multiple sources, utilizing the power<br />

of <strong>GPU</strong>s to achieve unprecedented performance.<br />

Speaker(s): Alina Alt (Applied Engineer, NVIDIA), Andrew Page (Sr.<br />

Product Manager, NVIDIA), Shalini Venkataraman (Senior Applied<br />

Engineer, NVIDIA), Ian Williams (NVIDIA)<br />

Topic(s): Visualization, Computer Graphics (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM B<br />

S0333 GMAC-2: Easy and Efficient <strong>Program</strong>ming for<br />

CUDA-Based Systems<br />

In this talk we introduce GMAC-2, a framework that eases the<br />

development of CUDA applications and tools while achieving<br />

similar or better performance than hand-tuned code. The new<br />

features implemented in GMAC-2 allow programmers to further<br />

fine-tune their code and remove some limitations found in the<br />

original GMAC library. For example, memory objects can now be<br />

arbitrarily mapped on several devices without restrictions and a<br />

host thread can launch kernels on any <strong>GPU</strong> in the system.<br />

Moreover, GMAC-2 transparently takes advantage of new hardware<br />

features such as <strong>GPU</strong>Direct 2 peer-to-peer<br />

communication.<br />

Speaker(s): Javier Cabezas (PhD Student, Barcelona Supercomputing<br />

Center), Isaac Gelado (Senior Researcher, Barcelona<br />

Supercomputing Center)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM M<br />

S0347 Accelerating Radio Astronomy Cross-Correlation<br />

beyond 1 Tflops using Fermi<br />

Radio astronomy is a signal processing application that requires<br />

extreme supercomputing. While today’s radio telescopes require<br />

10-100 Tflops of computational power, by the end of the decade<br />

this will increase to 1 Exaflops. The most compute-intensive part<br />

of this problem is the so-called cross-correlation algorithm, which<br />

is a linear-algebra problem. In this session we demonstrate that<br />

the Fermi architecture is ideally suited to this problem, and<br />

through exploiting the Fermi memory hierarchy it is possible to<br />

achieve close to 80% of peak performance in a real application.<br />

Speaker(s): Michael Clark (Compute DevTech Engineer, NVIDIA)<br />

Topic(s): Astronomy & Astrophysics, Supercomputing (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />

HALL 1<br />

S0362 Maximizing Performance on Multi-<strong>GPU</strong> Systems<br />

Are 512 CUDA Cores not enough? This session is for power users<br />

that are looking to scale applications to multi-<strong>GPU</strong> systems. We<br />

will take a holistic approach towards optimization. Rather than<br />

just focusing on CUDA programming, this session will cover<br />

techniques for reducing pressure on the PCIe bus, using CUDA<br />

Streams to improve load balance, dealing with NUMA impacts,<br />

and taking advantage of CPU threads. This talk will also cover<br />

strategies for developing applications that run on clusters with<br />

100 or more <strong>GPU</strong>s.<br />

Speaker(s): Kenneth Czechowski (Student, Georgia Tech)<br />

Topic(s): Supercomputing (Advanced)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM L<br />

S0619 Hate to Wait? Flash Memory for Full-Throttle<br />

<strong>GPU</strong> Acceleration (Presented by Fusion-io)<br />

Are you guilty of ever not trying out an idea because of the time it<br />

would take to process the effect? With flash memory throttling your<br />

system like jet fuel for your <strong>GPU</strong>, you can finally make sluggish<br />

application performance a bad memory. This session will couple a<br />

technical overview of the latest in PCIe-attached flash memory<br />

technology for accelerating graphics processing with developer best<br />

practices and tuning for <strong>GPU</strong> applications using flash memory for<br />

image compositing, editing, video playback, 3D content creation,<br />

video capture and many other data-intensive tasks.<br />

Speaker(s): Vincent Brisebois (Visual Computing Product Manager,<br />

Fusion-io), Robert Wipfel (Fellow, Fusion-io)<br />

Topic(s): Digital Content Creation & Film, Computer Graphics<br />

(Intermediate)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM A2<br />

S0648 Presented by ASUS<br />

Description unavailable at press time.<br />

Topic(s): General<br />

THURSDAY, MAY 17, 09:00 (110 MINUTES)<br />

ROOM J2<br />

S0707 Accelerated HPC Symposium: Scalability:<br />

Hardware and Software (Presented by LANL)<br />

This session will feature an introduction by Justin Tripp, followed<br />

by a short talk on “The FPGA: Another Piece of the Puzzle” and<br />

a talk on “Increasing Efficiency with Kepler.” After a<br />

short discussion and break, we’ll end this session with three short<br />

talks: “Image Analysis for Terascale Radio Astronomy,” “In situ<br />

Image Analysis for Large Scale Visualization,” and “<strong>GPU</strong><br />

Acceleration of MapReduce.”<br />

Speaker(s): Justin Tripp (LANL), Stephen Jones (NVIDIA), Christopher<br />

Fluke (Swinburne University of <strong>Technology</strong>), Christopher Sewel (LANL),<br />

Miao Xin (Junnan University)<br />

Topic(s): Supercomputing (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (110 MINUTES)<br />

ROOM J3<br />

S0708 Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 1 (Presented<br />

by LANL)<br />

This session will feature an introduction by Guillaume Colin de<br />

Verdiere, followed by a short talk on “Precondition for Large-Scale<br />

Linear Solvers.” Following this segment are two short talks,<br />

“Changing Data Structures for a Changing World” and<br />

“Leveraging Roadrunner Experiences.” After a short discussion<br />

and break, we will end Part 1 of 2 with the talk “Taming<br />

Laser Plasma Interactions: PICon<strong>GPU</strong>.”<br />

Speaker(s): Dimitar Lukarski (Karlsruhe Institute of <strong>Technology</strong>), Hui<br />

Liu (University of Calgary), Michael Bussmann (Helmholtz-Zentrum<br />

Dresden-Rossendorf), Jamal Mohd-Yusof (Los Alamos National<br />

Laboratory)<br />

Topic(s): Supercomputing (Intermediate)<br />

THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0811 CUDA Debugger Training on Windows<br />

Nsight offers a powerful CUDA debugging feature set<br />

that enables developers to quickly spot bugs. From the memory<br />

checker to advanced breakpoints and the variable warp watch panel, a<br />

developer can quickly isolate memory access errors, filter<br />

thousands of threads down to a specific thread, and quickly spot<br />

abnormal variable value ranges. Through a set of comprehensive<br />

exercises, the attendee will be able to utilize these features to<br />

become fully proficient at developing CUDA code.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)


THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />

ROOM A3<br />

S0081 Parallel Computing In Mobile Robotics for RISE<br />

RISE (Risky Intervention and Surveillance Environment) is a very<br />

demanding task. This presentation discusses three areas of research:<br />

3D data registration, robot navigation, and 3D point cloud<br />

processing. We show an approach based on robust k-nearest-neighbor<br />

(KNN) search applied to improve the ICP algorithm, a parallel<br />

path-planning approach based on the wave propagation method, and<br />

online segmentation of 3D point clouds based on normal vector<br />

computation. The proposed algorithms were tested with NVIDIA CUDA<br />

on a GF 580 GP<strong>GPU</strong>, with satisfying results.<br />

Speaker(s): Janusz Bedkowski (Researcher)<br />

Topic(s): Machine Vision (Beginner)<br />

THURSDAY, MAY 17, 09:30 (50 MINUTES)<br />

ROOM K<br />

S0238 Tesla Cluster Monitoring & Management APIs<br />

Learn more about cluster management and monitoring of Tesla<br />

and Quadro products. This includes a detailed description of the<br />

NVIDIA Management Library (NVML) and user-facing third-party<br />

software. Additionally, a brief summary of our out-of-band<br />

capabilities will be provided.<br />

Speaker(s): Robert Alexander (CUDA Tools Software Engineer, NVIDIA)<br />

Topic(s): Cluster Management (Beginner)<br />

THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />

ROOM A8<br />

S0264 CU++: An Object-Oriented Framework for<br />

Computational Fluid Dynamics (CFD) Applications<br />

In this session, I will elucidate the power of blending C++<br />

expression templates and CUDA, which has resulted in CU++, a smart<br />

framework for solving Computational Fluid Dynamics<br />

problems on structured and unstructured meshes. Briefly, CU++<br />

allows a code developer with just C/C++ knowledge to write<br />

computer programs that will execute on the <strong>GPU</strong> with minimal<br />

knowledge of specific programming techniques in CUDA. It allows<br />

the user to reuse existing C/C++ CFD codes with minimal<br />

changes. Codes written in CU++ can also be compiled in serial<br />

mode to be executed on a CPU using the tool ugc.<br />

Speaker(s): Dominic Chandar (Postdoctoral Research Associate,<br />

University of Wyoming)<br />

Topic(s): Computational Fluid Dynamics, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0290 Algorithm Acceleration for Geospatial Analysis<br />

Learn how the power of <strong>GPU</strong> computing is being leveraged to<br />

accelerate algorithms in the field of geospatial image analysis.<br />

The data volume and computation requirements associated with<br />

geospatial imagery are rapidly expanding as a result of the<br />

increasing number of satellite and airborne sensors, greater data<br />

accessibility, and expanded utilization of data intensive<br />

technologies. This equates to a growing need for high-performance<br />

computing in this field. We demonstrate the capacity<br />

for <strong>GPU</strong> computing to meet this need by accelerating a complex<br />

non-linear optimization algorithm used for the mapping and<br />

assessment of coral reef ecosystems.<br />

Speaker(s): James Goodman (President/CEO, HySpeed Computing<br />

LLC), Matthew Sellitto (Northeastern University)<br />

Topic(s): Algorithms & Numerical Techniques, General Interest<br />

(Intermediate)<br />

THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0354 Bcl::ChemInfo Suite Enables Machine Learning-<br />

Based Drug Discovery Using <strong>GPU</strong>s<br />

High-throughput screening data allows the training of machine<br />

learning quantitative structure-activity relationship models, which<br />

can be used for in silico drug discovery screening. Here, we present<br />

a <strong>GPU</strong>-accelerated suite for descriptor generation, model training,<br />

feature selection, and data set similarity analysis, bcl::ChemInfo.<br />

The suite provides functionality for the analysis of constructed<br />

models as well as for screening external libraries of compounds.<br />

We examine case studies illustrating how this workflow can now be<br />

completed in a single day on a Tesla-equipped workstation, with<br />

speedups reaching 300x, providing a complete <strong>GPU</strong>-accelerated<br />

cheminformatics framework for drug discovery.<br />

Speaker(s): Edward Lowe (Research Assistant Professor, Vanderbilt<br />

University), Nils Woetzel (PhD Candidate, Vanderbilt University)<br />

Topic(s): General Interest (Intermediate)<br />

THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />

HALL 1<br />

S0360 Set <strong>GPU</strong>s Free: Integrating a File System with<br />

CUDA <strong>Program</strong>s<br />

This session seeks the answer to the question: “Can we simplify<br />

and speed up CUDA programs by allowing them to access files<br />

residing on a host?” To prove our affirmative answer, we<br />

demonstrate how the concept of a file system enables programs<br />

with non-trivial CPU-<strong>GPU</strong> and <strong>GPU</strong>-<strong>GPU</strong> interactions to be<br />

efficiently and easily implemented on top of a new <strong>GPU</strong> file-system<br />

layer. We also show that such a file system enables implementation<br />

of fully stand-alone <strong>GPU</strong> programs without any CPU wrapper code.<br />

Finally we outline the details of the file system design which<br />

contributed to scalability, data consistency and performance.<br />

Speaker(s): Mark Silberstein (Post-doctoral Researcher, UT Austin),<br />

Emmett Witchel (University of Texas, Austin)<br />

Topic(s): General Interest (Intermediate)<br />

THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />

ROOM A5<br />

S0621 NVIDIA OpenACC<br />

OpenACC is a directives-based programming standard for parallel<br />

computing on accelerators (including <strong>GPU</strong>s). It is designed to<br />

harness the transformative power of heterogeneous computing<br />

systems easily and quickly. Adding simple compiler hints to your<br />

code to express parallelism allows the compiler to map<br />

computation onto an accelerator. OpenACC directives allow<br />

developers to make simple and portable code changes, enabling an<br />

easier migration to accelerated computing. This talk discusses the<br />

merits of this model and provides an overview of, and guidance on, the<br />

tools available to the developer from the OpenACC members.<br />

Speaker(s): Duncan Poole (Senior Manager, HPC, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0039 Data-Driven GP<strong>GPU</strong> Ideology Extension<br />

In this session we will demonstrate how the GP<strong>GPU</strong> ideology can<br />

be extended so that it can be used at the scale of an InfiniBand hybrid<br />

system. The approach that we are presenting combines delayed<br />

execution, scheduling techniques and, most importantly, carries<br />

the CPU multi-core ideology down to that of the streaming<br />

multiprocessor, enforcing a full-fledged “GP<strong>GPU</strong> as a co-processor”<br />

way of programming for large-scale MPI hybrid<br />

applications. While staying compatible with modern CPU/GP<strong>GPU</strong><br />

libraries, it provides more than fine-grained control over<br />

resources - more than you wanted, that is.<br />

Speaker(s): Bela Bauer (Postdoc, Microsoft Research), Alexandr<br />

Kosenkov (Software Engineer, University of Geneva)<br />

Topic(s): Application Design & Porting Techniques, Computational<br />

Physics, Parallel <strong>Program</strong>ming Languages & Compilers, Development<br />

Tools & Libraries (Advanced)<br />

THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />

ROOM N<br />

S0053 Real Time <strong>GPU</strong>-Based Marine Scenes Simulation<br />

Marine survey, carried out by sea or by air, is of major concern for<br />

current defense and security applications. Essential surveillance/<br />

observation/identification systems involve electro-optics (visible<br />

and infra-red) and radar. Optimizing their performance requires<br />

large amounts of expensive observational data spanning the wide<br />

variability of the marine environment. Computer simulation<br />

provides a valuable, flexible and inexpensive alternative. Since<br />

2007, ALYOTECH, in partnership with the IFREMER (French<br />

Research Institute for Exploration of the Sea), has been developing<br />

a <strong>GPU</strong>-based real-time ocean scene simulator for visible, infrared<br />

and radar sensors, in order to meet the challenging requirements<br />

arising from marine survey issues.<br />

Speaker(s): Jérôme Graindorge (Project Manager, ALYOTECH), Julien<br />

Houssay (Software Engineer, ALYOTECH)<br />

Topic(s): Climate & Weather Modeling, Visualization (Intermediate)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM B<br />

S0078 Panoptes: A Binary Instrumentation Framework<br />

for CUDA<br />

Traditional CPU-based computing environments offer a variety of<br />

binary instrumentation frameworks, while the instrumentation<br />

and analysis tools available to date for <strong>GPU</strong> environments have<br />

been more limited. Here we present Panoptes, a binary<br />

instrumentation framework for CUDA that targets the <strong>GPU</strong>. By<br />

exploiting the <strong>GPU</strong> to run modified kernels, Panoptes allows<br />

computationally intensive programs to be run at the native<br />

parallelism of the device during analysis. To demonstrate the<br />

instrumentation capabilities of Panoptes, we will present our work<br />

on a memory addressability and validity checker that targets<br />

CUDA programs.<br />

Speaker(s): Christopher Kennelly (Research Scientist,<br />

D. E. Shaw Research)<br />

Topic(s): Development Tools & Libraries (Advanced)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM M<br />

S0124 Signal Processing on <strong>GPU</strong>s for Radio Telescopes<br />

This session will present <strong>GPU</strong> implementations of four highly<br />

compute-intensive algorithms used by radio telescopes.<br />

Speaker(s): John Romein (Senior Researcher, ASTRON)<br />

Topic(s): Astronomy & Astrophysics (Intermediate)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM C<br />

S0244 Harnessing <strong>GPU</strong> Compute with C++ AMP (Part 2 of 2)<br />

C++ AMP is an open specification for taking advantage of<br />

accelerators like the <strong>GPU</strong>. In this session we will explore the C++<br />

AMP implementation in Microsoft Visual Studio 11. After a quick<br />

overview of the technology, its goals, and its<br />

differentiation from other approaches, we will dive into<br />

the programming model and its modern C++ API. This is a code-heavy,<br />

interactive, two-part session, where every part of the<br />

library will be explained. Demos will include showing off the<br />

richest parallel and <strong>GPU</strong> debugging story on the market, in the<br />

upcoming Visual Studio release.<br />

Speaker(s): Daniel Moth (Principal <strong>Program</strong> Manager, Microsoft)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Development<br />

Tools & Libraries (Intermediate)<br />

THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />

ROOM A8<br />

S0305 Classical Algebraic Multigrid for CFD with CUDA<br />

Classical algebraic multigrid (AMG) is one of the most popular<br />

algorithms used in engineering, and the engine in many<br />

successful commercial packages. Among sparse linear solvers, it<br />

is known for being fast, parallel and scalable, yet it maps to <strong>GPU</strong><br />

architecture with some considerable difficulty. We have tackled<br />

these difficulties and currently have a full CUDA implementation<br />

of classical AMG, which has been validated against the gold standard,<br />

Hypre. Significant effort was dedicated to reducing<br />

thread divergence and optimizing memory access, and we<br />

continue to work on performance improvements. We are aiming<br />

for a competitive AMG code for fluid dynamics applications.<br />

Speaker(s): Simon Layton (PhD Candidate, Boston University)<br />

Topic(s): Computational Fluid Dynamics, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0315 Probing Bio-Nano Interface Structure from<br />

Microsecond Molecular Dynamics on <strong>GPU</strong>s<br />

Using the latest algorithmic developments in molecular dynamics<br />

on multiple <strong>GPU</strong>s over MPI, and technologies like <strong>GPU</strong>Direct, it is<br />

now possible to address problems of interaction at the bio-nano<br />

interface via large-scale atomistic simulations. This talk will<br />

discuss aspects of DNA-nanotube interactions and SWCNT-induced<br />

conformational changes in DNA nucleosome structure.<br />

We will also address technical challenges in porting and tuning<br />

the AMBER 11 code on the Condor <strong>GPU</strong> cluster at AFRL.<br />

Speaker(s): Olexandr Isayev (Research Scientist, Case Western<br />

Reserve University)<br />

Topic(s): Molecular Dynamics, Life Sciences (Advanced)<br />

THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />

ROOM A1<br />

S0324 Content Generation and Real-Time Hologram<br />

Computation for Holographic 3D-Displays<br />

This session will introduce SeeReal’s sub-hologram technology, which<br />

massively reduces hologram computation effort in comparison to<br />

classic holography, and show how SeeReal implemented these still<br />

compute-intensive algorithms on the <strong>GPU</strong> to enable<br />

viewing of interactive, rich 3D-content on holographic 3D-displays<br />

using off-the-shelf graphics hardware. In contrast, you will<br />

explore why classic holography is not well suited to interactive<br />

applications. Furthermore, guidelines to create appropriate<br />

3D-content are presented, including aspects regarding<br />

transparency in holograms. Finally the specification and some<br />

impressions of SeeReal’s 20” holographic prototype will be<br />

presented, which allows viewing of live computed holograms<br />

showing 3D-content and 3D-video.<br />

Speaker(s): Enrico Zschau (Lead Software Architect, SeeReal<br />

Technologies GmbH)<br />

Topic(s): Visualization, Stereoscopic 3D, Algorithms & Numerical<br />

Techniques, Audio, Image and Video Processing (Beginner)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

HALL 1<br />

S0338 New Features In the CUDA <strong>Program</strong>ming Model<br />

The continuing evolution of the <strong>GPU</strong> brings with it new hardware<br />

capabilities and new functionality. Simultaneously, ongoing<br />

development of CUDA and its tools, libraries and ecosystem<br />

brings new features to the software stack as well. Come and learn<br />

from one of CUDA’s programming model architects about what’s<br />

new in the <strong>GPU</strong>, what’s coming in the next release of CUDA, how it<br />

works, and how it all fits together.<br />

Speaker(s): Stephen Jones (CUDA Developer, NVIDIA)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Intermediate)<br />

THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />

ROOM A2<br />

S0508 Faster Finite Elements for Wave Propagation Codes<br />

Learn how to develop faster and better finite-element codes for<br />

wave propagation using <strong>GPU</strong>s and MPI combined with overlapping<br />

techniques to hide the cost of communications and of host/device<br />

memory copies. Different options based on mesh coloring or on<br />

atomic operations will be presented. The difficulty of defining<br />

speedup will also be discussed (speedup versus what? Using what<br />

definition of “cost”?). Examples will be given using SPECFEM3D, a<br />

highly optimized spectral finite-element code that has won the<br />

Gordon Bell SuperComputing award and the BULL Joseph Fourier<br />

award, and that can run on CPU or <strong>GPU</strong> clusters.<br />

Speaker(s): Max Rietmann (PhD Student, Institute for Computational<br />

Science / USI Lugano, Switzerland)<br />

Topic(s): Algorithms & Numerical Techniques, Computational<br />

Physics (Intermediate)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM A3<br />

S0521 Desktop Supercomputing in the Soft-Matter<br />

Physics Laboratory<br />

While many GP<strong>GPU</strong> applications reside on large clusters, in many<br />

laboratories the time to move data to an external cluster would<br />

exceed the time to analyze it upon arrival. By bringing high-throughput<br />

computational power to the data in the laboratory,<br />

<strong>GPU</strong>s offer new capabilities in doing science. This session offers a<br />

number of ways in which <strong>GPU</strong>s are making a significant impact on<br />

our research in experimental physics, biology and chemistry, from<br />

designing and building apparatus (Quadro and Tesla), to collecting<br />

data on portable devices (Tegra), to high-throughput analysis of<br />

large data sets (Tesla). It also presents results from studies<br />

investigating the motion of diffusing and aggregating colloidal<br />

particles and swimming bacteria, observing liquid-gas phase<br />

separation onboard the International Space Station, applying high<br />

dynamic-range techniques to optical tomography, and using<br />

low-cost devices to detect chemical and microbial contamination<br />

in the third world.<br />

Speaker(s): Peter Lu (Post-Doctoral Research Fellow, Harvard University)<br />

Topic(s): General Interest (Beginner)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM A5<br />

S0622 The PGI Fortran and C99 OpenACC Compilers<br />

Experienced <strong>GPU</strong> programmers will learn about the latest PGI<br />

OpenACC Fortran and C compilers. This session discusses how<br />

and where to apply the Parallel and Kernels constructs and the<br />

differences between the two. It includes a review of the latest PGI<br />

release and a comparison of the OpenACC standard to the PGI<br />

Accelerator Model. A live component demonstrates how to interpret<br />

compiler feedback, how to use it to enable better performance,<br />

and how to interoperate with lower-level explicit <strong>GPU</strong> languages<br />

like CUDA and OpenCL. The presentation wraps up with a look at<br />

planned future enhancements.<br />

Speaker(s): Brent Leback (Portland Group)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM L<br />

S0644 Molecular Dynamics, <strong>GPU</strong>s, and EC2 (Presented by<br />

Amazon Web Services)<br />

<strong>GPU</strong>s have made molecular dynamics simulations faster, better,<br />

and cheaper, achieving supercomputer performance from a single<br />

<strong>GPU</strong> without sacrificing stability or accuracy. In this talk we<br />

demonstrate how the <strong>GPU</strong> refactoring of AMBER 12 Molecular<br />

Dynamics has led to an implementation that produces results that<br />

are indistinguishable from the original CPU code. In addition, we<br />

describe the <strong>GPU</strong> compute instances available on the Amazon EC2<br />

platform to show how anyone can run any number of AMBER 12<br />

simulations, anytime from anywhere.<br />

Speaker(s): Scott Le Grand (Principal Engineer, Amazon Web Services)<br />

Topic(s): Molecular Dynamics, Computational Fluid Dynamics<br />

(Intermediate)<br />

THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0812 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight<br />

development team! Whether you would like a private meeting to<br />

discuss specific product features or test out your application with<br />

the latest version of Nsight, or you just want to hang out with the<br />

team after attending one of the exciting training sessions, the<br />

lounge is a great place to learn everything you ever wanted to know<br />

about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />

ROOM A2<br />

S0079 Warped Parallel Nearest Neighbor Searches<br />

Using KD-Trees<br />

We propose a nearest neighbor search algorithm for a set of<br />

closely located query points that utilizes <strong>GPU</strong> parallelism and is<br />

optimized for a single CUDA warp. Instead of each query point<br />

traversing its own distinct path, a combined non-divergent path<br />

suitable for the entire query set can be constructed. Therefore, for a<br />

single warp a single stack can be maintained for the entire set of<br />

query points, allowing for efficient utilization of the shared<br />

memory and a number of simultaneous queries equal to the<br />

number of threads in a warp.<br />

Speaker(s): Roman Sokolov (Director of System Architecture, D4D<br />

Technologies), Andrei Tchouprakov (Director of System Architecture,<br />

D4D Technologies)<br />

Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />

THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />

ROOM N<br />

S0107 Acceleration of Long-Wave Rapid Radiative<br />

Transfer Model on GP<strong>GPU</strong><br />

The WRF model is a next-generation mesoscale numerical<br />

weather prediction system designed to serve both operational<br />

forecasting and atmospheric research communities. WRF offers<br />

multiple physics options, one of which is the Long-Wave Rapid<br />

Radiative Transfer Model. We found porting the rtrn() subroutine to<br />

CUDA challenging. It has a couple of recursive loops, for which<br />

GP<strong>GPU</strong>s are not well suited. We developed a new technique<br />

called loop inversion, which gave us a 7.7x speedup for the<br />

individual rtrn() subroutine without memory transfer, and in<br />

turn a 10x speedup for the overall RRTM module including initialization<br />

and memory transfer.<br />

Speaker(s): Mahesh Khadtare (PhD Student - Scientist ESP, I2IT, Pune<br />

University), Prakalp Somawanshi (CRL India)<br />

Topic(s): Climate & Weather Modeling, Application Design & Porting<br />

Techniques (Intermediate)<br />

THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0122 Computational Screening of Novel Carbon<br />

Capture Materials<br />

Discover how <strong>GPU</strong>s are used to identify optimal framework<br />

structures for carbon dioxide separation with the goal of reducing<br />

carbon emission. We describe the algorithm behind our <strong>GPU</strong><br />

software tool that iterates through a database of hypothetical<br />

zeolites and computes the selectivity of each of the structures.<br />

The code can be easily extended to simulate other adsorbent<br />

structures such as ZIFs (zeolitic imidazolate frameworks) and<br />

provide valuable insights to both theorists and experimentalists<br />

who have interest in carbon capture research.<br />

Speaker(s): Jihan Kim (Postdoctoral Researcher, Berkeley Lab),<br />

Berend Smit (UC Berkeley/Berkeley Lab)<br />

Topic(s): Molecular Dynamics (Intermediate)<br />

THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />

ROOM A1<br />

S0252 Building Real-Time Professional Visualization<br />

Solutions with OpenCL<br />

Professional visualization solutions, like high-quality high-resolution<br />

medical displays or very large screens for surveillance<br />

or entertainment, benefit from the <strong>GPU</strong>’s image and graphics<br />

compute capabilities to achieve real-time performance, but add<br />

specific constraints, like low latency, multiple HD streams and<br />

strict synchronization. This talk first motivates the industrial<br />

relevance of development in OpenCL on heterogeneous devices. It<br />

then explains the techniques currently explored to meet the<br />

specific design constraints, with a main focus on parallel data<br />

transfer and compute. The lessons learned are illustrated with a<br />

real-life example.<br />

Speaker(s): Kristof Denolf (Research Engineer, Barco), Ronny Dewaele<br />

(Director <strong>Technology</strong> Center, Barco)<br />

Topic(s): Audio, Image and Video Processing, Visualization (Intermediate)<br />

THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0291 LAtoolbox: A Multi-platform Sparse Linear<br />

Algebra Toolbox<br />

Find out about an easy way to build sparse linear solvers for<br />

<strong>GPU</strong>s and multi-/many-core platforms. Based on data abstraction<br />

and virtualization of the hardware, the LAtoolbox supports several<br />

platforms such as <strong>GPU</strong>s, multi-core CPUs, and accelerators. The<br />

various backends (CUDA, OpenCL, OpenMP, ...) utilize optimized and<br />

platform-specific routines and allow seamless integration of <strong>GPU</strong>s<br />

into scientific applications. By means of unified interfaces across all<br />

platforms the library enables you to build generic linear solvers and<br />

preconditioners on a single code base without specific information about<br />

your hardware. We demonstrate portability and flexibility of our<br />

open-source approach on heterogeneous platforms.<br />

Speaker(s): Dimitar Lukarski (Research Associate, Karlsruhe Institute<br />

of <strong>Technology</strong> (KIT)), Jan-Philipp Weiss (Junior Professor, Karlsruhe<br />

Institute of <strong>Technology</strong>)<br />

Topic(s): Application Design & Porting Techniques (Intermediate)<br />

THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />

ROOM K<br />

S0309 Dynamically Allocating GP<strong>GPU</strong> to Host<br />

Nodes (Servers)<br />

Learn how to remotely change the mapping of <strong>GPU</strong>s to hosts<br />

based on application needs. The audience will then be presented with<br />

example scripts and a demo illustrating how this can be<br />

implemented to improve system resource utilization.<br />

Speaker(s): Alaa Yousif (Software Solution Architect, Dell), Saeed Iqbal<br />

(Senior Systems Engineer, Dell)<br />

Topic(s): Cluster Management (Beginner)<br />

THURSDAY, MAY 17, 11:00 (50 MINUTES)<br />

KEYNOTE HALL 1<br />

S3002 Day 3 Keynote: Not Your Grandfather’s Moon<br />

Landing<br />

Do not miss the day 3 keynote, featuring Part-Time Scientists<br />

Robert Boehme and Wes Faler. Boehme and Faler are part of a<br />

team of international scientists and engineers who want to send a<br />

rover to the moon before the end of the year 2013. In this<br />

presentation, they will discuss their goals, recent<br />

accomplishments and milestones, and how <strong>GPU</strong>s have helped in<br />

unexpected ways.<br />

Speaker(s): Robert Boehme (CEO & Team Lead, Part-Time Scientists),<br />

Wes Faler (Head of Software Development, Part-Time Scientists)<br />

Topic(s): General Interest (All Levels)<br />

THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />

ROOM N<br />

S0044 A Massively Parallel Two-Phase Solver for<br />

Incompressible Fluids on Multi-<strong>GPU</strong> Clusters<br />

Join our presentation of a multi-<strong>GPU</strong> fluid solver for high<br />

performance <strong>GPU</strong> compute clusters. We use high-order scientific<br />

techniques to simulate the interaction of two fluids like air and<br />

water. Scientists, engineers and even the computer animation<br />

industry will profit from the enormous compute power of tens or<br />

hundreds of <strong>GPU</strong>s. A major focus in this talk will be on the applied<br />

<strong>GPU</strong> implementation techniques and the performance results<br />

including performance per Watt and performance per dollar<br />

results. We also highlight the lessons we learned from porting the<br />

complex CPU CFD code NaSt3DGPF to the <strong>GPU</strong>.<br />

Speaker(s): Peter Zaspel (Research Assistant, University of Bonn)<br />

Topic(s): Computational Fluid Dynamics, Supercomputing, Algorithms &<br />

Numerical Techniques, Digital Content Creation & Film (Intermediate)<br />

THURSDAY, MAY 17, 14:00 (25 MINUTES)<br />

ROOM C<br />

S0054 PFAC Library: <strong>GPU</strong>-Based String Matching Algorithm<br />

In this session, we first propose an exact string matching<br />

algorithm, called the Parallel-Failureless Aho-Corasick (PFAC)<br />

algorithm, which is used to match input texts against a set of<br />

string patterns on <strong>GPU</strong>s. The string patterns are compiled into a<br />

finite state machine similar to the well-known Aho-Corasick<br />

algorithm. Furthermore, to accommodate a large number of<br />

patterns, we present two kinds of hash functions which are<br />

adopted to compress the state transition table. The experimental<br />

results show that the PFAC library achieves significant<br />

performance on NVIDIA <strong>GPU</strong>s. Finally, the PFAC library has been<br />

released on Google Code (http://code.google.com/p/pfac/).<br />

Speaker(s): Cheng-Hung Lin (Associate Professor, National Taiwan<br />

Normal University)<br />

Topic(s): Development Tools & Libraries, Algorithms & Numerical<br />

Techniques (Beginner)<br />

THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />

ROOM K<br />

S0119 Best Practices for Architecting and Managing<br />

High-Performance <strong>GPU</strong> Clusters<br />

An overview of designing, deploying, and managing <strong>GPU</strong> clusters<br />

for HPC. Learn to build and operate Top500-class <strong>GPU</strong> computing<br />

resources that provide users with the latest CUDA features.<br />

Speaker(s): Dale Southard (Senior Solution Architect, NVIDIA)<br />

Topic(s): Cluster Management, Supercomputing (Intermediate)<br />

THURSDAY, MAY 17, 14:00 (25 MINUTES)<br />

ROOM M<br />

S0187 <strong>GPU</strong>s for Radio Imaging<br />

A new breed of telescopes, like the Low<br />

Frequency Array (LOFAR), relies on software to<br />

process the large data sets they generate, so that software<br />

must run as fast as possible to process<br />

the data in a reasonable time. In this session we<br />

describe how we have used the computing power of <strong>GPU</strong>s to<br />

improve the performance of the standard radio imaging<br />

techniques as well as how this computational power is useful for<br />

creating a new generation of radio imaging algorithms.<br />

Speaker(s): Vamsi Krishna Veligatla (<strong>GPU</strong> <strong>Program</strong>mer, University<br />

of Groningen)<br />

Topic(s): Astronomy & Astrophysics (Intermediate)<br />

THURSDAY, MAY 17, 14:00 (25 MINUTES)<br />

ROOM L<br />

S0285 Optimization of a Sparse Matrix-Matrix<br />

Multiplication on the <strong>GPU</strong><br />

The goal of this session is to present advanced techniques to<br />

optimize CUDA code on the <strong>GPU</strong>. In particular, we will<br />

demonstrate the use of advanced CUDA instructions (inline PTX,<br />

warp instructions, “extended” syncthreads) and load-balancing<br />

strategies to improve the performance of a sparse matrix-matrix<br />

multiplication on the <strong>GPU</strong>.<br />

Speaker(s): Julien Demouth (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />

Topic(s): Algorithms & Numerical Techniques (Advanced)<br />

THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />

ROOM B<br />

S0320 PTask: OS Support for <strong>GPU</strong> Dataflow <strong>Program</strong>ming<br />

This session considers the PTask API, OS-level abstractions that<br />

support <strong>GPU</strong>s as first-class computing resources and support a<br />

dataflow programming model. With PTask, the programmer<br />

specifies where data goes, rather than how and when it should get<br />

there, allowing the system to provide fairness and isolation<br />

guarantees, streamline data movement in ways that currently<br />

require direct programmer involvement, and enable code<br />

portability across diverse <strong>GPU</strong>-based platforms. Our experience<br />

building the PTask APIs shows that PTask can provide important<br />

system-wide guarantees and can enable significant performance<br />

benefits, for example improving the throughput of hand-tuned<br />

CUDA programs by up to 2x.<br />

Speaker(s): Jon Currey (Microsoft Research Silicon Valley), Christopher<br />

Rossbach (Researcher, Microsoft Research Silicon Valley)<br />

Topic(s): Development Tools & Libraries, General Interest, Parallel<br />

<strong>Program</strong>ming Languages & Compilers (Advanced)<br />

THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0378 VASP Accelerated with <strong>GPU</strong>s<br />

This session will detail the performance and capabilities of<br />

<strong>GPU</strong>-accelerated VASP, explain design decisions made in porting<br />

VASP to CUDA, and present a roadmap for <strong>GPU</strong> accelerated<br />

VASP development. We’ve achieved performance improvements<br />

up to around 20x on systems of around 100 ions and have<br />

implemented exact-exchange. We are working on ports of more<br />

conventional functionality.<br />

Speaker(s): Maxwell Hutchinson (PhD Student, University of Chicago)<br />

Topic(s): Quantum Chemistry, Application Design & Porting<br />

Techniques, Computational Physics (Intermediate)<br />

THURSDAY, MAY 17, 14:00 (110 MINUTES)<br />

ROOM J1<br />

S0709 Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models: Part 2 (Presented<br />

by LANL)<br />

This session is part 2 of Applications - Methods and <strong>Program</strong>ming<br />

Models and will feature short talks on “The Portability Wall: How<br />

hard can it really be?,” followed by a talk on “Accelerating NAMD”<br />

as well as “Refitting Legacy Software for the New Reality” and<br />

“Unstructured Data Structures: An Achilles Heel?” After a<br />

discussion and break, the session will end with short talks on<br />

“Power: The New Metric” and “It’s about Concurrency, Stupid!”<br />

Speaker(s): John Stone (Urbana Champaign), James Phillips<br />

(University of Illinois), John Humphrey (EM Photonics), Raphael Poncet<br />

(CEA), Simon McIntosh-Smith (University of Bristol), Stanley Tzeng<br />

(UC Davis)<br />

Topic(s): Supercomputing (Intermediate)<br />

THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0813 CUDA Profiler Training on Windows<br />

Nsight offers a comprehensive set of performance analysis tools.<br />

From the ability to trace system-wide multi-core CPU and<br />

multi-<strong>GPU</strong> activity to the ability to profile CUDA kernels with precise profiling<br />

experiments, developers can identify system-level optimization<br />

opportunities as well as expensive and inefficient CUDA kernels<br />

requiring in-depth analysis with the CUDA profiler. Through a set<br />

of comprehensive exercises, the attendee will be able to utilize


these features to become fully proficient at optimizing complex<br />

CUDA applications.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

THURSDAY, MAY 17, 14:30 (25 MINUTES)<br />

ROOM M<br />

S0022 Scalable Frameworks and Algorithms for<br />

Terascale Radio Astronomy Images<br />

Learn how the oldest science is using the newest processors to<br />

solve a critical problem: how to accomplish traditional image<br />

analysis and visualization tasks when the images are terabytes in<br />

size? Simple, standard operations such as displaying 2-d slices,<br />

evaluating image statistics, and applying histogram equalization<br />

become manifestly challenging when images dramatically exceed<br />

single-node memory capacity. We will explain how our hybrid<br />

CPU-<strong>GPU</strong> cluster framework – which can volume render a 200GB<br />

image at >50fps! – will support traditional radio astronomy tasks<br />

for the colossal images that the Square Kilometre Array and its<br />

precursor, the Australian SKA Pathfinder, will generate.<br />

Speaker(s): Christopher Fluke (Senior Lecturer, Swinburne University of<br />

<strong>Technology</strong> - Centre for Astrophysics and Supercomputing)<br />

Topic(s): Astronomy & Astrophysics, Visualization (Intermediate)<br />

THURSDAY, MAY 17, 14:30 (25 MINUTES)<br />

ROOM C<br />

S0032 Teraflop <strong>GPU</strong> Acceleration Of Large Matrix Algebra<br />

Learn how Multipath’s Fast Matrix Solver (FMS) is setting<br />

performance records using multiple <strong>GPU</strong>s to solve large matrices<br />

in production applications. By (1) leveraging NVIDIA’s CUBLAS<br />

library, (2) operating multiple <strong>GPU</strong>s in parallel and (3) overlapping<br />

data transfers with computation, FMS averages over 2 teraflops of<br />

performance, even on jobs lasting for days. The presentation also<br />

includes a description of what problems FMS solves and how it is<br />

incorporated into application programs.<br />

Speaker(s): Ronald Young (President, Multipath Corporation)<br />

Topic(s): Development Tools & Libraries, General Interest (Beginner)<br />

THURSDAY, MAY 17, 14:30 (50 MINUTES)<br />

ROOM L<br />

S0106 <strong>GPU</strong> Based Numerical Methods in Mathematica<br />

A fast way of developing, prototyping and deploying numerical<br />

algorithms that can take advantage of CUDA capable systems is<br />

available in Mathematica 8. Over the past year, educators,<br />

scientists, and business users have taken advantage of the<br />

benefits of <strong>GPU</strong> programming support in Mathematica. By<br />

integrating and implementing CUDA/OpenCL in their programs,<br />

users make use of a hybrid approach, combining the speed-up<br />

that <strong>GPU</strong>s offer and a powerful numerical development system. This<br />

presentation covers several examples of numerical<br />

applications, including deconvolution of MRI images, linear<br />

solvers for FEM, systems of ODEs, and line integral convolution<br />

visualization.<br />

Speaker(s): Ulises Cervantes-Pimentel (Senior Kernel Developer,<br />

Wolfram Research), Abdul Dakkak (Kernel Developer, Wolfram Research)<br />

Topic(s): Algorithms & Numerical Techniques, Visualization,<br />

Application Design & Porting Techniques, Development Tools &<br />

Libraries (Intermediate)<br />

THURSDAY, MAY 17, 14:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0231 Levenberg-Marquardt Using Block Sparse Matrices<br />

on CUDA<br />

This session describes the experience of constructing <strong>GPU</strong>-based<br />

matrix-vector functions for block sparse matrices having multiple<br />

block sizes and a domain-specific numerical Jacobian generation<br />

function. The bundle adjustment algorithm is an optimization<br />

procedure which attempts to refine the relative camera pose and<br />

3D structure location variables estimated from multiple sets of<br />

images. The Conjugate Gradient algorithm is used to solve the<br />

normal equations which appear in the inner loop of the non-linear<br />

least squares problem.<br />

Speaker(s): Tetsuo Tawara (Software Engineer, Koozyt)<br />

Topic(s): Application Design & Porting Techniques, Algorithms &<br />

Numerical Techniques (Intermediate)<br />

THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />

ROOM C<br />

S0071 The High-Level Linear Algebra Library ViennaCL<br />

and Its Applications<br />

Get to know ViennaCL, a high-level OpenCL linear algebra<br />

library that delivers the speed of <strong>GPU</strong> computing at the<br />

convenience level of the C++ Boost libraries. Decrease the<br />

development and execution time of applications by utilizing our<br />

well-tested and widely used library, instead of spending days on<br />

learning details of <strong>GPU</strong> architectures and debugging. We provide<br />

examples that demonstrate not only how quickly existing<br />

applications are ported efficiently from single-threaded execution<br />

to fully utilizing multi-threaded environments, but also how to<br />

utilize the rich set of functionalities ranging from common BLAS<br />

routines to iterative solvers.<br />

Speaker(s): Karl Rupp (Project Assistant, TU Wien)<br />

Topic(s): Development Tools & Libraries, Algorithms & Numerical<br />

Techniques, Computational Physics (Intermediate)<br />

THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />

ROOM M<br />

S0087 <strong>GPU</strong> Acceleration of Dense Stellar<br />

Clusters Simulation<br />

Computing the interactions between stars within dense stellar<br />

clusters is a problem of fundamental importance in theoretical<br />

astrophysics. This paper presents the parallelization of a Monte<br />

Carlo algorithm for simulating stellar cluster evolution using<br />

programmable Graphics Processing Units. The kernels of this<br />

algorithm exhibit high levels of data dependent decision making<br />

and unavoidable non-contiguous memory accesses. However, we<br />

adopt various parallelization strategies and utilize the high<br />

computing power of the <strong>GPU</strong> to obtain substantial near-linear<br />

speedups which cannot be easily achieved on a CPU-based<br />

system. This acceleration allows us to explore physical regimes<br />

which were out of reach of current simulations.<br />

Speaker(s): Bharath Pattabiraman (PhD Student, Northwestern University),<br />

Stefan Umbreit (Postdoctoral Associate, Northwestern University)<br />

Topic(s): Astronomy & Astrophysics, Computational Physics, Algorithms<br />

& Numerical Techniques (Intermediate)<br />

THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />

ROOM N<br />

S0091 Sustainable Hybrid Parallelization of an<br />

Unstructured Hydrodynamic Code<br />

The goal of this presentation is to share our methodology for<br />

porting a numerical code to hybrid supercomputing architectures<br />

using MPI coupled with directive-based languages (OpenMP for<br />

multicore CPUs, and HMPP for <strong>GPU</strong>s). Our code, VOLNA, is an<br />

unstructured partial differential equation hydrodynamic solver<br />

developed for the simulation of tsunamis. Our results<br />

demonstrate that using directive-based languages such as HMPP<br />

for <strong>GPU</strong> programming, one can retain good performance (e.g.<br />

speedup of 15 compared to 1 CPU core, 3 compared to 8 CPU<br />

cores) with minimal modifications of the original CPU source code<br />

(about 30 lines of directives in our case).<br />

Speaker(s): Raphaël Poncet (Research Scientist, Commissariat à<br />

l’Energie Atomique et aux Energies Alternatives)<br />

Topic(s): Application Design & Porting Techniques, Algorithms &<br />

Numerical Techniques, Computational Fluid Dynamics,<br />

Computational Physics (Advanced)<br />

THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />

ROOM B<br />

S0157 A Study of Persistent Threads Style <strong>Program</strong>ming<br />

Model for <strong>GPU</strong> Computing<br />

We present the usefulness of a new style of <strong>GPU</strong> programming<br />

called Persistent Threads, known to be useful on irregular<br />

workloads. First, we will formally define the PT model.<br />

Second, we will categorize uses of PT into four “use cases” and<br />

present micro-benchmark analyses of when this model is useful<br />

over traditional kernel formulations. Third, we will show a full<br />

speech recognition application that uses all four PT use cases.<br />

Finally, we will conclude our talk by suggesting appropriate<br />

modifications to <strong>GPU</strong> hardware, software, and APIs that make PT<br />

kernels both easier to implement and more efficient.<br />

Speaker(s): Kshitij Gupta (Graduate Student Researcher, UC Davis),<br />

Jeff Stuart (PhD Student, UC Davis)<br />

Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Audio, Image<br />

and Video Processing (Advanced)<br />

THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0334 The Fast Multipole Method on CPU and <strong>GPU</strong><br />

Processors<br />

The fast multipole method (FMM) is a widely used numerical<br />

algorithm in computational engineering. Accelerating the FMM on<br />

CUDA-enabled <strong>GPU</strong>s is challenging because the FMM has a<br />

complicated data access pattern, mostly during the so-called<br />

multipole-to-local (M2L) operation. We have created several<br />

schemes to optimize the M2L and have attained a performance of<br />

over 350 (resp. 160) Gflop/s for single (resp. double) precision<br />

arithmetic. The optimal algorithm was incorporated into a<br />

complete FMM code, which can accept any smooth kernel as<br />

specified by the user, making it very flexible. We have also<br />

developed a highly efficient CPU version.<br />

Speaker(s): Eric Darve (Professor, Stanford)<br />

Topic(s): Computational Physics, Molecular Dynamics, Algorithms &<br />

Numerical Techniques (Advanced)<br />

THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />

ROOM K<br />

S0368 Unraveling the Mysteries of Quarks with<br />

Hundreds of <strong>GPU</strong>s<br />

Dive into the world of quarks and gluons, and hear how <strong>GPU</strong><br />

computing is revolutionizing the way many calculations in lattice<br />

quantum chromodynamics (lattice QCD) are performed. The main<br />

computational challenge in such calculations is to repeatedly<br />

solve large systems of linear equations arising from a four-dimensional<br />

finite-difference problem. In this session, we’ll<br />

discuss strategies for parallelizing such a solver across hundreds<br />

of <strong>GPU</strong>s. These include techniques and algorithms for reducing<br />

memory traffic and inter-<strong>GPU</strong> communication. The net result is an<br />

implementation that achieves better than 20 Tflops on 256 <strong>GPU</strong>s,<br />

realized in the open-source “QUDA” library.<br />

Speaker(s): Ronald Babich (Research Scientist, NVIDIA)<br />

Topic(s): Computational Physics, Application Design & Porting<br />

Techniques, Algorithms & Numerical Techniques, Supercomputing<br />

(Intermediate)<br />

THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0429 Quantum Chemistry: Automated Code Generation<br />

and Optimization for <strong>GPU</strong> Kernels<br />

In this session we discuss the challenges encountered in<br />

development of quantum chemistry software for <strong>GPU</strong>s from<br />

scratch and optimization of the kernels for the best performance.<br />

We attempt to create a unified framework for automatic<br />

generation of efficient quantum chemistry codes tailored<br />

individually for various <strong>GPU</strong> (NVIDIA, ATI) and CPU architectures<br />

and programming (CUDA, OpenCL, C/C++) languages using a<br />

meta-programming approach based on a computer algebra<br />

system. We demonstrate its utility by generating highly optimized<br />

<strong>GPU</strong> and CPU kernels dealing with various integrals over<br />

Gaussian basis functions implemented in the TeraChem quantum<br />

chemistry package.<br />

Speaker(s): Alexey Titov (Engineering Research Associate, Stanford),<br />

Ivan Ufimtsev (Postdoc, Stanford)<br />

Topic(s): Quantum Chemistry (Advanced)<br />

THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0814 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight<br />

development team! Whether you would like a private meeting to<br />

discuss specific product features or test out your application with<br />

the latest version of Nsight, or you just want to hang out with the<br />

team after attending one of the exciting training sessions, the<br />

lounge is a great place to learn everything you ever wanted to know<br />

about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

THURSDAY, MAY 17, 15:30 (50 MINUTES)<br />

ROOM M<br />

S0111 An Efficient CUDA Implementation of a Tree-Based<br />

N-Body Algorithm<br />

This session presents a complete CUDA implementation of the<br />

irregular Barnes-Hut n-body algorithm. This algorithm repeatedly<br />

builds and traverses unbalanced trees, making it difficult to map<br />

to <strong>GPU</strong>s. We explain in detail how our code exploits the<br />

architectural features of <strong>GPU</strong>s, including lockstep operation and<br />

thread divergence, both of which are commonly viewed as hurdles<br />

to achieving high performance, especially for irregular codes. On<br />

a five million body simulation running on a Tesla C2050, our CUDA<br />

implementation is 30 times faster than a parallel pthreads version<br />

running on a high-end 6-core Xeon.<br />

Speaker(s): Martin Burtscher (Associate Professor, Texas State University)<br />

Topic(s): Application Design & Porting Techniques, Astronomy &<br />

Astrophysics, Molecular Dynamics, Supercomputing (Advanced)


THURSDAY, MAY 17, 15:30 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0138 <strong>GPU</strong> Task-Parallelism: Primitives and Applications<br />

We explore how a task-parallel model can be implemented on the<br />

<strong>GPU</strong> and address concerns and programming techniques for<br />

doing so. We discuss the primitives for building a task-parallel<br />

system on the <strong>GPU</strong>. This includes novel ideas for mapping tasking<br />

systems onto the <strong>GPU</strong> including task granularity, load balancing,<br />

memory management, and dependency resolution. We also<br />

present several applications which demonstrate how a task-parallel<br />

model is more suitable than the regular data parallel<br />

model. These applications include a Reyes renderer, tiled deferred<br />

lighting renderer, and a video encoding demo.<br />

Speaker(s): Anjul Patney (PhD Candidate, UC Davis), Stanley Tzeng<br />

(Graduate Student, UC Davis)<br />

Topic(s): Application Design & Porting Techniques, Development Tools<br />

& Libraries, Computer Graphics (Intermediate)<br />

THURSDAY, MAY 17, 15:30 (50 MINUTES)<br />

ROOM L<br />

S0267B Mixing Graphics and Compute with Multiple <strong>GPU</strong>s<br />

In this session we will cover all the different aspects of interaction<br />

between graphics and compute. The first part of the session will<br />

focus on compute API interoperability with OpenGL (using CUDA<br />

and OpenCL APIs), while the second part of the session will delve<br />

into interoperability at a system level. In particular we will go<br />

through the challenges and benefits of dedicating one <strong>GPU</strong> for<br />

compute and another for graphics, how different system<br />

configurations affect data transfer between two <strong>GPU</strong>s, and how it<br />

translates into application design decisions helping to enable an<br />

efficient, cross-<strong>GPU</strong> interoperability between compute and<br />

graphics contexts.<br />

Speaker(s): Alina Alt (Applied Engineer, NVIDIA)<br />

Topic(s): Visualization, Application Design & Porting Techniques<br />

(Beginner)<br />

THURSDAY, MAY 17, 15:30 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0392 Large-Scale First Principle Pseudopotential DFT<br />

Calculations on <strong>GPU</strong> Clusters<br />

In this session, we will present a series of work on density<br />

functional theory (DFT) plane wave pseudopotential (PWP)<br />

calculations on <strong>GPU</strong> clusters. The <strong>GPU</strong> version is developed based<br />

on a CPU DFT-PWP code: PEtot, which can calculate ~1000 atoms<br />

on thousands of processors. Our test indicates that the <strong>GPU</strong><br />

version can have a ~20 times speedup over CPU code. A detailed<br />

analysis of the speed-up and the scaling with the number of CPUs/<br />

<strong>GPU</strong>s (up to 256) will be presented. As far as we know, this is the<br />

first <strong>GPU</strong> DFT-PWP code scalable to a large number of CPUs/<strong>GPU</strong>s.<br />

Speaker(s): WeiLe Jia (Postgraduate Student, Supercomputing Center of<br />

CNIC, Chinese Academy of Sciences), Long Wang (Associate Professor,<br />

Supercomputing Center of CNIC, Chinese Academy of Sciences)<br />

Topic(s): Quantum Chemistry, General Interest (Advanced)<br />

THURSDAY, MAY 17, 16:00 (25 MINUTES)<br />

MARRIOTT BALLROOM 3<br />

S0038 Designing Killer CUDA Applications for X86,<br />

multi<strong>GPU</strong>, and CPU+<strong>GPU</strong><br />

CUDA redefined software development with 10 to 1000-times<br />

faster <strong>GPU</strong> applications. Now a single CUDA source tree can<br />

support the x86 mass market (no <strong>GPU</strong> required) and 1/3 billion<br />

CUDA-enabled <strong>GPU</strong>s. Multi<strong>GPU</strong> and CPU+<strong>GPU</strong> apps utilize all<br />

system resources. <strong>GPU</strong>direct, UVA, caches, prefetching, ILP<br />

(Instruction level Parallelism), automated analysis tools and more<br />

offer ease, capability, and performance. The overall impact on<br />

software investment, scalability, balance metrics, programming<br />

API, and lifecycle will be considered. Working real-time video and<br />

other examples from my book, “CUDA Application Design and<br />

Development” provide practical insight to enable augmented<br />

reality and your killer apps.<br />

Speaker(s): Robert Farber (Chief Scientist, BlackDog Endeavors, LLC)<br />

Topic(s): Machine Learning & AI, Supercomputing, Databases, Data<br />

Mining, Business Intelligence, Computer Vision (Intermediate)<br />

THURSDAY, MAY 17, 16:00 (50 MINUTES)<br />

ROOM N<br />

S0063 Robust Preconditioned Conjugate Gradient for the<br />

<strong>GPU</strong> and Parallel Implementations<br />

Get a closer look at how a parallel conjugate gradient (CG) method<br />

can get an edge over its optimized CPU implementation. We have<br />

developed preconditioning techniques for CG which are suited to<br />

the <strong>GPU</strong> and match Block-IC in terms of numerical performance.<br />

We present our results for two-level preconditioned CG on the <strong>GPU</strong><br />

and also compare them with multi-CPU implementations. Our results<br />

show that for large problem sizes (1 million unknowns and above)<br />

it is possible to achieve speedups of an order of magnitude or more<br />

for the two-level preconditioned CG method.<br />

Speaker(s): Rohit Gupta (PhD Student, Delft University of <strong>Technology</strong>)<br />

Topic(s): Computational Fluid Dynamics, Algorithms &<br />

Numerical Techniques (Intermediate)<br />
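
For readers new to preconditioned CG (session S0063), the following is a minimal sketch only. It uses a simple Jacobi (diagonal) preconditioner as a stand-in, not the two-level or Block-IC preconditioners from the talk, and the function name `pcg` and the dense list-of-lists matrix layout are assumptions made for illustration.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a small dense SPD matrix.

    A is a list of rows, b a list of floats. Illustrative only: a GPU code
    would run the matrix-vector product and dot products in parallel.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

For example, `pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates the solution (1/11, 7/11).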

THURSDAY, MAY 17, 16:00 (25 MINUTES)<br />

ROOM K<br />

S0282 Leveraging NVIDIA <strong>GPU</strong>Direct on APEnet+ 3D<br />

Torus Cluster Interconnect<br />

APEnet+ is a novel cluster interconnect, based on a custom PCI<br />

card which features a PCI Express Gen2 X8 link and a reconfigurable<br />

HW component (FPGA). It supports a 3D Torus<br />

topology and has special acceleration features specifically<br />

developed for NVIDIA Fermi <strong>GPU</strong>s. An introduction to the basic<br />

features and the programming model of APEnet+ will be followed<br />

by a description of its performance on some numerical<br />

simulations, e.g. High Energy Physics simulations.<br />

Speaker(s): Davide Rossetti (Researcher, Italian National Institute for<br />

Nuclear Physics)<br />

Topic(s): Supercomputing, Computational Physics (Intermediate)<br />

THURSDAY, MAY 17, 16:00 (25 MINUTES)<br />

ROOM B<br />

S0428 Panini: A <strong>GPU</strong> Aware Array Class<br />

We present a new templated C++ class library, PANINI, for use in<br />

the development of large-scale scientific simulations in a<br />

heterogeneous computing environment. The key feature of this new<br />

library is a generic parallel array class built on advanced generic<br />

programming methodologies where details of parallelization are<br />

hidden inside the array class itself. This library will be used for<br />

Poisson solvers, advection-diffusion, and other equations.<br />

Speaker(s): Priyanka Sah (Compute DevTech Engineer, NVIDIA),<br />

Santosh Ansumali (Faculty Fellow, Engineering Mechanics Unit,<br />

JNCASR, Bangalore)<br />

Topic(s): Development Tools & Libraries (Beginner)<br />


THURSDAY, MAY 17, 16:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0815 CUDA Debugger Training on Windows<br />

Nsight offers a powerful CUDA debugging feature set<br />

that enables developers to quickly spot bugs. From the memory<br />

checker to advanced breakpoints and variable warp watch panel, a<br />

developer can quickly isolate memory access errors, filter the<br />

thousands of threads down to a specific thread and quickly spot<br />

abnormal variable value ranges. Through a set of comprehensive<br />

exercises, the attendee will be able to utilize these features to<br />

become fully proficient at developing CUDA code.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

THURSDAY, MAY 17, 16:30 (50 MINUTES)<br />

ROOM M<br />

S0065 Satellite HUB Communication System <strong>GPU</strong> Based<br />

In the last few years, increasing <strong>GPU</strong> computational power has<br />

opened new perspectives in telecommunications through the SDR<br />

(software-defined radio) approach. Some tasks, such as the one<br />

we had to deal with, leave no margin for negotiation on<br />

execution speed because of the real-time analysis of a radio signal. We<br />

tackled the implementation of the lowest layer in the protocol<br />

stack for a land mobile satellite communication system, and we<br />

were able to deliver a product with a reduced time to market compared<br />

with a traditional FPGA approach.<br />

Speaker(s): Gaetano Mendola (Principal Engineer, MBI srl), Francesco<br />

Basile (Software Engineer, MBI srl)<br />

Topic(s): General Interest (Intermediate)<br />

THURSDAY, MAY 17, 16:30 (25 MINUTES)<br />

ROOM B<br />

S0218 ASI Parallel Fortran: A General-Purpose Fortran<br />

to <strong>GPU</strong> Translator<br />

Over the last 3 years we have developed a general-purpose<br />

Fortran to <strong>GPU</strong> translator: ASI Parallel Fortran. The talk will<br />

detail its purpose, design layout and capabilities, and show how it<br />

is used and implemented. The use of ASI Parallel Fortran will be<br />

shown for large-scale CFD/CEM codes as well as other general<br />

purpose Fortran codes.<br />

Speaker(s): Rainald Lohner (Professor, George Mason University)<br />

Topic(s): Development Tools & Libraries, Computational Fluid<br />

Dynamics, Computational Physics, Parallel <strong>Program</strong>ming Languages<br />

& Compilers (Advanced)<br />

THURSDAY, MAY 17, 16:30 (50 MINUTES)<br />

MARRIOTT BALLROOM 4<br />

S0220 Enabling Faster Material Science Modeling Using<br />

the Accelerated Quantum ESPRESSO<br />

The goal of this session is to present the advantages of mixing<br />

CUDA libraries and CUDA kernels to deliver a robust community<br />

package for material science modeling that fully exploits multicore<br />

systems equipped with <strong>GPU</strong>s. The Plane-Wave Self-<br />

Consistent Field (PWscf) code of the Quantum ESPRESSO suite is<br />

the focus of this work. During the session the main computation-dependent<br />

components, that also represent fundamental building<br />

blocks for many other quantum chemistry codes, will be<br />

discussed and analyzed. Subsequently an in-depth performance<br />

assessment of several realistic scientific cases will be presented,<br />

starting from single workstations to large clusters equipped with<br />

hundreds of <strong>GPU</strong>s.<br />

Speaker(s): Filippo Spiga (Computational Scientist, Irish Centre for<br />

High-End Computing)<br />

Topic(s): Quantum Chemistry, Supercomputing, Application Design &<br />

Porting Techniques (Intermediate)<br />

THURSDAY, MAY 17, 16:30 (25 MINUTES)<br />

ROOM L<br />

S0411 Artifact-Free Cloud-Based CAD Rendering<br />

Cloud computing for mechanical CAD provides centrally stored and<br />

synchronized models for concurrent engineering. For compactness,<br />

trimmed parametric NURBS surface representations are optimal<br />

for data transfer to client devices, which must evaluate and render<br />

models locally. Direct <strong>GPU</strong> rendering without pre-tessellation is an<br />

attractive solution in this context, both for speed and to preserve<br />

fidelity to the original geometry. However, existing data-parallel<br />

direct rendering approaches for NURBS suffer from rendering<br />

artifacts at trim boundaries. This talk proposes a solution to<br />

address these rendering artifacts that are still preventing wide-scale<br />

adoption of all such direct rendering algorithms for trimmed<br />

parametric models.<br />

Speaker(s): Sara McMains (Professor, UC Berkeley), Sushrut<br />

Pavanaskar (PhD Candidate, UC Berkeley)<br />

Topic(s): Algorithms & Numerical Techniques, Computer Graphics,<br />

Cloud Computing, Visualization (Beginner)<br />

THURSDAY, MAY 17, 17:00 (25 MINUTES)<br />

ROOM L<br />

S0074 Techniques for Designing GP<strong>GPU</strong> Games<br />

Learn how to develop faster and better games with the use of<br />

GP<strong>GPU</strong> through the use of game <strong>GPU</strong> tricks. Normally, games<br />

process most of their tasks on the CPU, using the <strong>GPU</strong> only for<br />

graphics processing. This session shows some techniques on how<br />

to better use the GP<strong>GPU</strong> power to process all the game logic,<br />

achieving speedups when compared to CPU and traditional <strong>GPU</strong><br />

models. This session also shows some examples of this technique<br />

in practice.<br />

Speaker(s): Mark E S Joselli (Researcher, UFF), Esteban Clua<br />

(Professor, UFF)<br />

Topic(s): Development Tools & Libraries (Intermediate)<br />

THURSDAY, MAY 17, 17:00 (50 MINUTES)<br />

ROOM NVIDIA NSIGHT LAB<br />

S0816 NVIDIA Nsight Lounge<br />

Come to the NVIDIA Nsight Lounge to meet the Nsight<br />

development team! Whether you would like a private meeting to<br />

discuss specific product features or test out your application with<br />

the latest version of Nsight, or you just want to hang out with the<br />

team after attending one of the exciting training sessions, the<br />

lounge is a great place to learn everything you ever wanted to know<br />

about the tool.<br />

Speaker(s): NVIDIA Developer Tools Team<br />

Topic(s): Development Tools & Libraries (Beginner)<br />

THURSDAY, MAY 17, 17:30 (25 MINUTES)<br />

ROOM M<br />

S0134 On the Integration of OpenCL into a Software<br />

Defined Radio<br />

We will present a software defined radio system that allows for<br />

heterogeneous processing using a host computer’s CPUs and<br />

<strong>GPU</strong>s, via dynamic runtime resource allocation provided by our<br />

Surfer framework and extensions to it using OpenCL. This system


collects runtime statistics including samples / second throughput<br />

for each signal processing block, data transfer latency between<br />

different processors, and the host CPU cores’ loads. Using this<br />

information, a supervisor can move computations between<br />

processors during runtime, without interrupting data processing.<br />

We will demonstrate an OFDM transmitter, graphing the system<br />

throughput and CPU loads while selecting where processing<br />

occurs for each block.<br />

Speaker(s): Michael Dickens (Graduate Student, University of Notre Dame)<br />

Topic(s): General Interest (Intermediate)<br />





ALGORITHMS & NUMERICAL TECHNIQUES<br />

AN01 - A Novel Parallel Realisation of the<br />

Element-by-Element FEM Technique<br />

The element-by-element (EbE) finite element<br />

method (FEM) is a long known technique, by which<br />

a conjugate gradient (CG) type iterative solution<br />

scheme can be entirely decomposed into<br />

computations on the element level, i.e., without<br />

assembling the global system matrix. In our<br />

implementation a CUDA capable <strong>GPU</strong> is utilized to<br />

perform the required element-wise computations in<br />

parallel. Since element matrices need not be stored,<br />

the memory requirement can be kept extremely low.<br />

This low-storage but computation intensive<br />

technique is better suited for <strong>GPU</strong>s than those<br />

requiring the massive manipulation of large data<br />

sets, enabling handling of millions of tetrahedrons.<br />

Contact: Zsolt Badics (Tensor Research, LLC)<br />
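
As a rough illustration of the element-by-element idea in AN01, the sketch below applies a stiffness operator without ever assembling the global matrix. It assumes a 1D mesh of linear elements with uniform spacing `h` (the poster itself deals with tetrahedral meshes), and the name `ebe_matvec` is hypothetical.

```python
def ebe_matvec(x, h=1.0):
    """Compute y = K x for the 1D Laplacian stiffness matrix, element by element.

    Each element e couples nodes e and e+1 through the 2x2 element matrix
    (1/h) * [[1, -1], [-1, 1]]; contributions are accumulated directly into y,
    so neither the global matrix nor the element matrices are stored.
    """
    n = len(x)
    y = [0.0] * n
    for e in range(n - 1):          # on a GPU, elements are processed in parallel
        d = (x[e] - x[e + 1]) / h   # action of the element matrix on (x[e], x[e+1])
        y[e] += d
        y[e + 1] -= d
    return y
```

Inside a CG iteration, a routine of this kind replaces the usual sparse matrix-vector product; a parallel version additionally has to avoid write conflicts on shared nodes, for example with atomics or element coloring.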

AN02 - ExaFMM: An Open Source Library for<br />

Fast Multipole Methods<br />

The fast multipole method (FMM) is a numerical<br />

engine used in many applications, including acoustics,<br />

electrostatics, fluid simulations, wave scattering<br />

and more. Despite its importance, there is a lack of<br />

open community code, which arguably has<br />

affected its wider adoption. It is also a difficult<br />

algorithm to understand and to program, making<br />

availability of open-source implementations even<br />

more desirable. We developed a novel treecode-<br />

FMM hybrid algorithm with auto-tuning<br />

capabilities. It is highly parallel and <strong>GPU</strong>-capable.<br />

Its usage in the simulation of homogeneous<br />

isotropic turbulence achieved 0.5 petaflop/s on<br />

2048 <strong>GPU</strong>s of the Tsubame system.<br />

Contact: Lorena Barba (Boston University)<br />

AN03 - Collatz-Type Conjectures on <strong>GPU</strong><br />

We verify two types of Collatz conjectures: on the<br />

set of rational numbers and the set of matrices<br />

modulo p, where p is prime. In both cases, the<br />

number of pairs of rational numbers and matrices<br />

grows exponentially. However, our algorithm<br />

exhibits simple parallel patterns which exploit<br />

<strong>GPU</strong>s in an efficient way. The preliminary results<br />

show that the conjecture holds for both cases for<br />

large sets.<br />

Contact: Peter Yoon (Trinity College)<br />

AN04 - CUDA Implementation of Recurrence<br />

Equation Solvers Using P-scheme approach<br />

The recurrence equation solver is used in many<br />

numerical applications and other general-purpose<br />

applications, but it is inherently a sequential<br />

algorithm, so it is difficult to implement the<br />

parallel program for it. We implement a parallel<br />

and scalable algorithm for solving recurrence<br />

equations on <strong>GPU</strong>s by using CUDA and evaluate<br />

its effectiveness. The algorithm was originally<br />

implemented for MIMD parallel computers by the<br />

authors, and we adapt the algorithm to<br />

the GP<strong>GPU</strong> system by rearranging array<br />

configurations. We also show how to determine<br />

the optimal number of threads in a thread block and<br />

evaluate its validity.<br />

Contact: Akiyoshi Wakatani (Konan University)<br />
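
To show why a first-order linear recurrence such as x[i] = a[i] * x[i-1] + b[i] can be parallelized at all (AN04), here is a generic blocked sketch. It is not the P-scheme from the poster; it only relies on the fact that composing affine maps is associative. The function names and the block size are assumptions.

```python
def solve_sequential(a, b, x0):
    """Reference solver: x[i] = a[i] * x[i-1] + b[i], starting from x0."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def compose(f, g):
    """Affine map equal to 'apply f, then g'; a pair (a, b) means x -> a*x + b."""
    return (g[0] * f[0], g[0] * f[1] + g[1])


def solve_blocked(a, b, x0, block=4):
    steps = list(zip(a, b))
    blocks = [steps[i:i + block] for i in range(0, len(steps), block)]
    # Phase 1 (independent per block): collapse each block into one affine map.
    summaries = []
    for blk in blocks:
        m = (1, 0)                      # identity map
        for s in blk:
            m = compose(m, s)
        summaries.append(m)
    # Phase 2 (short sequential pass): starting value of every block.
    starts, x = [], x0
    for m in summaries:
        starts.append(x)
        x = m[0] * x + m[1]
    # Phase 3 (independent per block): expand each block from its start value.
    xs = []
    for blk, start in zip(blocks, starts):
        x = start
        for ai, bi in blk:
            x = ai * x + bi
            xs.append(x)
    return xs
```

Phases 1 and 3 contain no dependencies between blocks, so each block could be given to its own thread block; only the short phase 2 remains sequential.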

AN05 - Accelerating Symmetric Matrix-Vector<br />

Product on Fermi <strong>GPU</strong><br />

We aim in the work presented here to describe an<br />

optimized numerical kernel computing the<br />

symmetric matrix-vector product (Level 2 BLAS)<br />

on the latest NVIDIA Tesla <strong>GPU</strong> family, codenamed<br />

Fermi (C2070). Due to its inherent memory-bound<br />

nature, this kernel represents one of the most<br />

critical operations in computing the tridiagonal<br />

form of a symmetric dense matrix, which is the<br />

preprocessing step toward calculating the<br />

eigenpairs. Using a novel design to address the<br />

irregular memory accesses by hiding latency and<br />

increasing bandwidth, our preliminary asymptotic<br />

results show up to 3.5-fold speedups over existing<br />

numerical libraries.<br />

Contact: Hatem Ltaief (KAUST Supercomputing<br />

Laboratory)<br />

AN06 - Rapid Matrix Construction for Wavelet-<br />

Galerkin Schemes<br />

The wavelet Galerkin scheme is an efficient<br />

numerical method used to improve Boundary<br />

Element Methods and Finite Element Methods for<br />

solving partial differential equations given<br />

resulting matrix features like sparseness and<br />

conditionality. Using CUDA C/C++ we have<br />

implemented the open-source C++ Library of<br />

Adaptive Wavelet Applications (LAWA) on the <strong>GPU</strong><br />

and achieved a significant performance gain for<br />

matrix construction.<br />

Contact: Yuri Nesterenko (Dantec Dynamics A/S)<br />

AN07 - Big Number Modulo Exponentiations For<br />

Zero-Knowledge Protocols on <strong>GPU</strong>s<br />

In this work we implement parallel big number<br />

exponentiations having a fixed base on the <strong>GPU</strong>.<br />

For this task we develop a new implementation of<br />

the Montgomery multiplication algorithm. Although<br />

big number exponentiations benefit from large<br />

caches like those on a CPU, we show that the lack of them on a <strong>GPU</strong> can be<br />

compensated for by a high level of parallelization and<br />

an adaptation of the algorithms.<br />

Contact: Tobias Jeske (TU Hamburg-Harburg)<br />
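
For context on AN07, the following is a textbook sketch of Montgomery multiplication (REDC) and a square-and-multiply exponentiation built on it. It is not the poster's GPU implementation and does not include fixed-base precomputation; the function names are hypothetical, and it assumes an odd modulus `n` with 2**bits > n and Python 3.8 or later (for the modular inverse via `pow`).

```python
def montgomery_setup(n, bits):
    """Return (R, n_prime) with R = 2**bits and n * n_prime = -1 (mod R)."""
    R = 1 << bits
    n_prime = (-pow(n, -1, R)) % R   # requires n odd, so that n is invertible mod R
    return R, n_prime


def mont_mul(a, b, n, R, n_prime, bits):
    """Montgomery product a * b * R^-1 mod n, for 0 <= a, b < n."""
    t = a * b
    m = ((t & (R - 1)) * n_prime) & (R - 1)  # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits                  # exact division by R
    return u - n if u >= n else u            # single conditional subtraction


def mont_pow(base, exp, n, bits):
    """Compute base**exp mod n using Montgomery arithmetic throughout."""
    R, n_prime = montgomery_setup(n, bits)
    x = (base * R) % n     # base in Montgomery form
    acc = R % n            # 1 in Montgomery form
    while exp:
        if exp & 1:
            acc = mont_mul(acc, x, n, R, n_prime, bits)
        x = mont_mul(x, x, n, R, n_prime, bits)
        exp >>= 1
    return mont_mul(acc, 1, n, R, n_prime, bits)  # convert back to normal form
```

The appeal for parallel hardware is that the reduction uses only multiplications, masks and shifts, with no division by the modulus.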

AN08 - Tuning a Finite Difference Stencil<br />

Several ways of tuning a finite difference stencil<br />

computation are discussed. The combination of<br />

vectorization and a modified data layout, a cache<br />

aware algorithm, loop unrolling, parallelization<br />

and parameter tuning leads to optimized<br />

implementations at a level of up to 90% peak<br />

performance of the floating point pipelines on<br />

NVIDIA Fermi <strong>GPU</strong>s and on CPUs.<br />

Contact: Gerhard Zumbusch (University Jena)<br />



POSTER LISTINGS<br />

AN09 - Parallel <strong>Program</strong>ming on CPU-<strong>GPU</strong> for<br />

Solving Population Balance Equation<br />

The population balance equation (PBE) is one of<br />

those. The Dual Quadrature Method of Generalized<br />

Moments (DuQMoGeM) is a promising method for<br />

solving the PBE. The drawback of this methodology<br />

is the large computational cost associated with the<br />

adaptive numerical integration. Therefore, the<br />

adaptive cubature algorithm was implemented in<br />

hybrid architecture (MPI-CUDA) to accelerate the<br />

DuQMoGeM. The maximum speedup was about<br />

48x using 4 <strong>GPU</strong>s and 4 nodes, and<br />

about 40x using 2 <strong>GPU</strong>s and 1 node.<br />

Contact: Fabio Pereira dos Santos (Institute for<br />

Medical Physics)<br />

AN10 - <strong>GPU</strong> Enabled Comparison Between<br />

Stochastic Decomposition Methods<br />

The scale of engineering problems has sharply<br />

increased over the last twenty years. The ability to<br />

learn the coupling (inter-dependence) structure of<br />

a problem during the solution process could lead<br />

to large reductions in the time to analyze complex<br />

problems. Such decomposition methods could<br />

also provide engineering insight on the<br />

fundamental physics driving problem solution.<br />

This work advances the current state of the art in<br />

engineering decomposition through the<br />

application of techniques originally developed<br />

within computer science and information theory.<br />

CUDA enabled a detailed comparison between the<br />

current practice of using Genetic Algorithms and a<br />

newly introduced method called MIMIC.<br />

Contact: Richard Otero (Los Alamos National Lab)<br />

AN11 - <strong>GPU</strong>-Accelerated 3-D Electromagnetic<br />

Particle-in-Cell Implementations in VORPAL<br />

We present recent developments in implementing<br />

3D <strong>GPU</strong>-accelerated electromagnetic particle-in-cell<br />

particle updates in the plasma physics<br />

framework VORPAL. The primary challenge in PIC<br />

methods on <strong>GPU</strong>s is thread contention during the<br />

current deposition stage: we resolve these thread<br />

contentions by sorting particles into ‘tiles’ of many<br />

cells each time step. Multiple thread blocks may<br />

be assigned to each tile, and each block<br />

accumulates the contribution from a moderate<br />

number of particles via an unsegmented<br />

Esirkepov 1st-order scheme. We achieve update<br />

times of 50 ns per-particle per-timestep for a<br />

variety of realistic self-consistent double-precision<br />

EM simulations.<br />

Contact: Keegan Amyx (Tech-X Corporation)<br />

AN12 - LU Factorization for 10,000s of Small<br />

Dense Matrices<br />

LU factorization is a “high-level” algebraic<br />

description for Gaussian elimination and is a<br />

fundamental operation performed in linear<br />

algebra. By implementing a register heavy<br />

mapping in CUDA specifically for small matrices,<br />

speed-up factors of more than 10 are achieved vs.<br />

an OpenMP parallelized Intel MKL implementation<br />

running on a high-end quad-core CPU.<br />

Contact: Ian Wainwright (High Performance<br />

Consulting)<br />
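
As a reminder of what is being computed in AN12, here is a minimal Doolittle LU factorization for one small dense matrix. It is a plain sequential sketch without pivoting (so it assumes nonzero pivots), not the register-heavy CUDA mapping from the poster; the name `lu_decompose` is an assumption.

```python
def lu_decompose(A):
    """Return (L, U) with A = L U, L unit lower triangular, U upper triangular.

    No pivoting is performed, so every pivot U[k][k] must be nonzero.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(v) for v in row] for row in A]   # work on a copy of A
    for k in range(n):                           # eliminate column k
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            L[i][k] = factor                     # store the multiplier in L
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return L, U
```

With tens of thousands of such matrices, each factorization is independent, which is where the batch-level parallelism comes from.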

AN13 - <strong>GPU</strong> Implementation of a Streaming<br />

Broadband RF Receiver<br />

An experimental radio broadcasting system<br />

spreads the signal with a PN code. To reconstruct<br />

the original signal, the receiver correlates the PN<br />

code with the signal received, numerically<br />

sampled, requiring a direct as well as an inverse<br />

Fast Fourier Transform, plus other conditioning<br />

and filtering operations. Since the target speed of<br />

the system is 625 megasamples per second,<br />

processed in segments of one megasample<br />

each, performing this computation on a standard<br />

CPU system is prohibitive and <strong>GPU</strong> processing is<br />

an attractive option. This project describes an<br />

initial CUDA implementation that performs almost<br />

at target speed.<br />

Contact: Andrea Di Blas (University of California,<br />

Santa Cruz)<br />

AN14 - Efficient Algebraic Multigrid Methods<br />

on <strong>GPU</strong>s<br />

Algebraic multigrid methods for large, sparse<br />

linear systems are a necessity in many<br />

computational simulations, yet parallel algorithms<br />

for such solvers are generally decomposed into<br />

coarse-grained tasks suitable for distributed<br />

computers with traditional processing cores. We<br />

develop a parallel algebraic multigrid method<br />

which exposes substantial fine-grained<br />

parallelism in both the construction of the<br />

multigrid hierarchy as well as the cycling or solve<br />

stage. The resulting solver achieves an average<br />

speedup of 1.8x in the setup phase and 5.7x in the<br />

cycling phase when compared to a representative<br />

CPU implementation.<br />

Contact: Steven Dalton (University of Illinois at<br />

Urbana-Champaign)<br />

APPLICATION DESIGN & PORTING<br />

TECHNIQUES<br />

AP01 - Debugging Floating Point<br />

Implementations on <strong>GPU</strong>s<br />

To debug <strong>GPU</strong> code it is important to understand<br />

the differences between CPU and <strong>GPU</strong><br />

implementations. The differences arise due to<br />

floating point (FP) differences and casting from<br />

floating point to fixed point. FP differences arise<br />

due to the lack of associativity of FP, differences in<br />

instruction implementation, and choices made by<br />

the compiler. We analyzed medical image<br />

reconstruction code for breast reconstruction and<br />

showed that <strong>GPU</strong> and CPU code could be made to<br />

produce identical results. We also analyze the<br />

performance implications of choosing different<br />

implementation options on the <strong>GPU</strong> and CPU to<br />

make the codes match.<br />

Contact: Miriam Leeser (Northeastern University)
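
The lack of associativity mentioned in AP01 is easy to reproduce on any IEEE 754 machine. The sketch below is illustrative only: it shows the same three values summed with two groupings, plus a tree-shaped reduction of the kind a parallel sum typically uses. The helper name `tree_sum` is hypothetical.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # grouping used by a left-to-right sequential loop
right = a + (b + c)   # a different grouping of the same three values

# The two results differ in the last bit, although the math is identical.
results_differ = left != right


def tree_sum(values):
    """Sum by pairwise (tree) reduction, the order a parallel reduction often uses."""
    vals = list(values)
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:           # carry an unpaired last element forward
            paired.append(vals[-1])
        vals = paired
    return vals[0] if vals else 0.0
```

Because a GPU reduction and a sequential CPU loop group the additions differently, bitwise-identical results generally require forcing both sides to use the same order of operations.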


AP02 - KILO Transactional Memory for <strong>GPU</strong><br />

<strong>GPU</strong>s are designed to efficiently execute 1000s<br />

of concurrent threads on multiple SIMT cores to<br />

hide long latency operations. Currently, threads in<br />

different CUDA blocks can only communicate via<br />

global memory accesses, and programmers have<br />

to consider data-races. Although fine-grained<br />

locks can be constructed using 32-/64-bit word<br />

atomic operations in recent <strong>GPU</strong>s, operations<br />

involving multiple locks can have deadlocks. We<br />

propose to solve these problems by extending<br />

<strong>GPU</strong>s to support transactional memory. Some of<br />

the major challenges are to support 1000s of<br />

concurrent transactions, to commit non-conflicting<br />

transactions in parallel, and to<br />

integrate with stack-based SIMT execution.<br />

Contact: Wilson Wai Lun Fung (University of<br />

British Columbia)<br />

AP03 - CUDA-Based <strong>GPU</strong> Computing Framework<br />

for GNU Octave<br />

This poster presents the design of a CUDA-<strong>GPU</strong><br />

based parallel processing framework for GNU<br />

Octave. Octave is a high-level interpreted<br />

language, primarily intended for numerical<br />

computations. GNU Octave, being an open-source<br />

alternative to MATLAB, is widely used in academic<br />

and research institutes. The <strong>GPU</strong> framework<br />

allows Octave users to accelerate their software<br />

written in Octave high-level ‘M’ language on <strong>GPU</strong>s<br />

with minimal code modifications. To my<br />

knowledge, this is the first attempt to build a <strong>GPU</strong><br />

framework for Octave, contrary to previous<br />

attempts to provide <strong>GPU</strong> variants for a set of<br />

Octave functions.<br />

Contact: John Melonakos (AccelerEyes)<br />

ASTRONOMY & ASTROPHYSICS<br />

AA01 - Adaptive Beam-Forming for Radio<br />

Astronomy on <strong>GPU</strong>s<br />

With the advent of a new breed of telescopes like<br />

the Low Frequency Array (LOFAR), which rely on<br />

software to process the large data sets<br />

that they generate, there is a need to make the<br />

software run as fast as possible in order to<br />

process these data sets in a reasonable time.<br />

In this session we describe how we have used the<br />

computing power of <strong>GPU</strong>s to improve the<br />

performance of the standard radio imaging<br />

techniques as well as how this computational<br />

power is useful for creating a new generation of<br />

Radio Imaging Algorithms.<br />

Contact: Vamsi Krishna Veligatla (University<br />

of Groningen)<br />

AA02 - Accelerating Real-Time Processing of the<br />

ATST Adaptive Optics System<br />

The real-time processing of the four-meter<br />

Advanced <strong>Technology</strong> Solar Telescope (ATST)<br />

adaptive optics (AO) system with approximately<br />

1750 sub-apertures and 1900 actuators requires<br />

massive parallel processing to complete the task.<br />

The parallel processing is harnessed with the<br />

addition of hardware accelerators such as<br />

Graphics Processing Units (<strong>GPU</strong>s). We investigate<br />

the hybrid data processing architecture of the<br />

Shack-Hartmann correlation and wavefront<br />

reconstruction using FPGAs and <strong>GPU</strong>s. The ATST<br />

AO algorithm is implemented, benchmarked on<br />

the FPGA-<strong>GPU</strong> system and compared with the<br />

existing legacy Digital Signal Processing (DSP)<br />

based hardware system.<br />

Contact: Vivek Venugopal (United Technologies<br />

Research Center)<br />

AA03 - Cosmological Calculations on the <strong>GPU</strong><br />

Cosmological measurements often involve the<br />

calculation of non-trivial quantities over<br />

increasingly large datasets. The next generation of<br />

survey telescopes will yield information for billions<br />

of galaxies. The scale of the datasets, and the type<br />

of calculations involved, are ideal models for use<br />

of the <strong>GPU</strong>. We present two cosmological<br />

measurements, and describe the implementation<br />

and improvements found with the <strong>GPU</strong>.<br />

Contact: Deborah Bard (SLAC National Accelerator<br />

Laboratory)<br />

AA04 - Fast Cross-Matching of Astronomical<br />

Catalogs on <strong>GPU</strong>s<br />

We present a method of cross-matching objects of<br />

large astronomical catalogs, over 150 million<br />

objects, in under 4 minutes. We utilize up to 6<br />

NVIDIA C2050 <strong>GPU</strong>s and have achieved a speedup of over 40x<br />

versus conventional methods.<br />

Contact: Matthias Lee (Johns Hopkins University)<br />

AUDIO, IMAGE & VIDEO PROCESSING<br />

AV01 - Rapid Training of Acoustic Models Using<br />

<strong>GPU</strong>s<br />

Robust and accurate speech recognition systems<br />

can only be realized with adequately trained<br />

acoustic models. For common languages,<br />

state-of-the-art systems are now trained on<br />

thousands of hours of speech data, which can take<br />

weeks even with a large cluster of machines. To<br />

overcome this development bottleneck, we<br />

propose a new framework for rapid training of<br />

acoustic models using highly parallel <strong>GPU</strong>s. With<br />

a single NVIDIA GTX580 <strong>GPU</strong>, our proposed<br />

approach is shown to be 51x faster than a<br />

sequential CPU implementation, enabling a<br />

moderately sized acoustic model to be trained on<br />

1000-hour speech data in just over 9 hours.<br />

Contact: Jike Chong (Carnegie Mellon University)<br />

AV02 - 2 Million Pixel Experiment<br />

This experimental application has been created as<br />

a piece of computational art using visual computing<br />

technologies. It maps a high definition video source<br />

(1080p) into 3D space. The pixel transformation is<br />

accelerated by a CUDA kernel to achieve realtime<br />


accuracy. Besides the production of visual effects in<br />

arts this method may be utilized for video quality<br />

checking on lower pixel level.<br />

Contact: Philipp Drieger (Noumentalia.de - Digital<br />

Arts / KU Eichstätt-Ingolstadt)<br />

AV03 - Speeding Up Camera Sabotage Detection<br />

on CUDA<br />

Camera Sabotage Detection (CSD) algorithms,<br />

namely Camera Moved Detection, Camera Out of<br />

Focus Detection and Camera Covered Detection,<br />

are used to detect tampering attempts on<br />

surveillance cameras. CSD algorithms are required<br />

to be run on a high number of cameras in realtime,<br />

bringing high computational load to the video<br />

analytics systems. In this work, the CSD algorithms<br />

are accelerated by using CUDA. The overall system<br />

test results show that parallelization in <strong>GPU</strong> makes<br />

the system 18 times faster than its CPU<br />

counterpart and up to 400 cameras can be<br />

supported in real time on a GTX 470.<br />

Contact: Alptekin Temizel (Middle East Technical<br />

University)<br />

AV04 - Remote Sensing on <strong>GPU</strong>: A Case Study<br />

Satellite images have become widely available; as<br />

a result, there is an increasing number of<br />

commercial applications utilizing these images.<br />

Satellites provide data in different wavelengths,<br />

and the images have higher resolution and larger data<br />

size than typical images. Running complex<br />

algorithms on large volumes of satellite images<br />

is highly time consuming on CPUs and<br />

can be sped up using <strong>GPU</strong>s. In this paper,<br />

the performance of shadow detection and vegetation<br />

detection algorithms is investigated and<br />

compared on <strong>GPU</strong> and CPU.<br />

Results show that a speedup of up to 10.2 times can<br />

be achieved using a <strong>GPU</strong>.<br />

Contact: Alptekin Temizel (Middle East Technical<br />

University)<br />

AV05 - Finite Difference-Based Sound Synthesis<br />

Using <strong>GPU</strong>s<br />

Finite Difference (FD) methods can be the basis<br />

for physics-based music instrument models that<br />

generate realistic audio output. However, such<br />

methods are compute-intensive; large simulations<br />

cannot run in real time on current CPUs. In this<br />

poster, we describe the current state of our<br />

implementation of a real-time sound synthesizer<br />

using an FD-based simulation of a two-dimensional<br />

membrane executed on <strong>GPU</strong>s. We<br />

demonstrate that it is possible to use this method<br />

to create a usable real-time audio synthesizer.<br />

Contact: Marc Sosnick (San Francisco State<br />

University)<br />
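
The following is only an illustrative sketch of the kind of finite-difference<br />
update such a membrane model relies on, not the poster authors' code. It assumes<br />
a square grid with clamped edges, a single-point initial "strike", and a<br />
hypothetical pickup location; the function name and all parameters are invented<br />
for this example. Every interior grid point is updated independently within a<br />
time step, which is what makes the method a natural fit for a GPU.<br />

```python
def simulate_membrane(n=16, steps=200, courant=0.5, pickup=(5, 7)):
    """Leapfrog finite-difference update of the 2D wave equation.

    The n x n grid has clamped (zero) edges. The return value is the list of
    displacements at one "pickup" point, which would act as the audio signal.
    courant = c * dt / dx must stay below 1/sqrt(2) for stability in 2D.
    """
    prev = [[0.0] * n for _ in range(n)]
    curr = [[0.0] * n for _ in range(n)]
    # Initial "strike": displace the centre point, starting from rest.
    curr[n // 2][n // 2] = 1.0
    prev[n // 2][n // 2] = 1.0
    c2 = courant * courant
    samples = []
    for _ in range(steps):
        nxt = [[0.0] * n for _ in range(n)]
        # Each (i, j) below depends only on the two previous time levels,
        # so all interior points could be computed in parallel.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                laplacian = (curr[i - 1][j] + curr[i + 1][j]
                             + curr[i][j - 1] + curr[i][j + 1]
                             - 4.0 * curr[i][j])
                nxt[i][j] = 2.0 * curr[i][j] - prev[i][j] + c2 * laplacian
        prev, curr = curr, nxt
        samples.append(curr[pickup[0]][pickup[1]])
    return samples
```

With these assumed defaults the disturbance needs four time steps to reach the<br />
pickup point, so the first three samples are zero.<br />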

AV06 - Parallelization of Hough Transform for<br />

Circles Using CUDA<br />

Hough Transform (HT) is a well-known technique<br />

used for detection of parametric shapes in image<br />

processing. However, various optimizations are<br />

necessary in its implementation due to large<br />

memory and computational requirements. In this<br />

paper, we consider the case of parallelization of<br />

Hough Transform for circles. A number of different<br />

implementation approaches of the algorithm are<br />

compared in CUDA. Results show that a speedup of up to 360<br />

times can be achieved compared to the<br />

CPU version, enabling real-time applications.<br />

Contact: Alptekin Temizel (Middle East Technical<br />

University)<br />
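
As a rough illustration of the voting step behind a circle Hough Transform, here<br />
is a minimal sequential sketch. It assumes the radius is already known and that<br />
edge points have been extracted beforehand; the function names and the synthetic<br />
test circle are hypothetical and do not come from the poster. Each edge point<br />
votes independently, which is the part a CUDA version would parallelize (with<br />
atomic increments on the accumulator).<br />

```python
import math


def hough_circle_centres(edge_points, radius, width, height, n_angles=360):
    """Accumulate votes for circle centres of one known radius."""
    acc = [[0] * width for _ in range(height)]
    for (x, y) in edge_points:
        seen = set()  # let each edge point vote at most once per cell
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            a = int(round(x - radius * math.cos(theta)))
            b = int(round(y - radius * math.sin(theta)))
            if 0 <= a < width and 0 <= b < height and (a, b) not in seen:
                seen.add((a, b))
                acc[b][a] += 1
    return acc


def best_centre(acc):
    """Return the (x, y) cell with the most votes."""
    votes, x, y = max((v, x, y)
                      for y, row in enumerate(acc)
                      for x, v in enumerate(row))
    return x, y


def demo():
    # Synthetic edge points on a circle centred at (20, 15) with radius 8.
    cx, cy, r = 20, 15, 8
    points = sorted({(int(round(cx + r * math.cos(2 * math.pi * k / 90))),
                      int(round(cy + r * math.sin(2 * math.pi * k / 90))))
                     for k in range(90)})
    return best_centre(hough_circle_centres(points, r, width=40, height=30))
```

Calling `demo()` recovers the centre of the synthetic circle, `(20, 15)`.<br />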

AV07 - Accelerating an Imaging Spectroscopy<br />

Algorithm Using <strong>GPU</strong>s<br />

Graphics Processing Units (<strong>GPU</strong>s) have proven to<br />

be effective at accelerating a range of scientific<br />

applications. As data needs increase, and more<br />

complex data analysis methods are used, the<br />

processing requirements for solving scientific<br />

problems also increase. The parallel processing<br />

power of <strong>GPU</strong>s can be harnessed and used<br />

alongside multi-core CPUs to address this. As an<br />

example, many problems require solving<br />

optimization problems of multiple variables across<br />

large arrays of data. By utilizing modern<br />

optimization techniques and combining them with<br />

the computational throughput of a CPU-<strong>GPU</strong><br />

computing platform, we can greatly decrease the<br />

processing time required to solve these problems.<br />

Contact: Matthew Sellitto (LLC IntroVision)<br />

AV08 - CUVILib - <strong>GPU</strong> Accelerated Vision &<br />

Imaging Library<br />

Image Processing algorithms are used in a variety<br />

of different domains, from surveillance to medicine<br />

to industry. CUVI (CUDA Vision and Imaging Library)<br />

provides <strong>GPU</strong> accelerated Vision and Imaging<br />

functionality with plug-and-play ease of use, simple<br />

yet powerful interface and support for both NVIDIA<br />

and AMD <strong>GPU</strong>s. With over 1000 users of the Beta<br />

version, CUVI has quickly grown into a mature solution<br />

of choice when it comes to delivering real-time<br />

performance for your Imaging/Vision applications<br />

and software-frameworks.<br />

Contact: Salman Ul Haq (TunaCode)<br />

AV09 - Implementation of Raptor Code on <strong>GPU</strong><br />

Raptor Code is an improvement on<br />

LT-Code; it performs as close as possible to<br />

Shannon’s channel limit and provides linear<br />

encoding and decoding time. It has been chosen<br />

as the forward error correction (FEC) scheme in the<br />

3GPP and DVB-H standards. We implement<br />

Raptor Codes on <strong>GPU</strong> in order to<br />

process large block and symbol sizes<br />

effectively and efficiently. Our <strong>GPU</strong> decoding<br />

achieves up to a 40x speedup over sequential<br />

CPU decoding.<br />

Contact: Linjia Hu (Michigan Technological<br />

University)<br />

AV10 - Real-Time Wind Velocity Estimation from<br />

Aerosol Lidar Data Using <strong>GPU</strong>s<br />

The REAL is an atmospheric light detection and<br />

ranging (LIDAR) system. It produces near-horizontal<br />

and vertical cross-sectional images of<br />

the lower atmosphere. The images reveal the<br />

spatial distribution of atmospheric aerosol<br />

(particulate matter). By applying motion<br />

estimation algorithms to image sequences,<br />

two-dimensional vector wind fields can be<br />

determined. We will explore the use of <strong>GPU</strong><br />

computing in the real-time computation of wind<br />

vector fields.<br />

Contact: Chris Mauzey (Johns Hopkins University,<br />

Applied Physics Laboratory)<br />

AV11 - <strong>GPU</strong> Based Feature Extraction<br />

Implementation<br />

In this poster, we introduce an efficient parallel<br />

implementation of Mel-frequency Cepstral<br />

Coefficient (MFCC)-based feature extraction and<br />

describe the optimizations required for effective<br />

throughput on many-core Graphics Processing<br />

Unit (<strong>GPU</strong>) processors. We demonstrate that the<br />

feature extraction process in automatic speech<br />

recognition is well suited for <strong>GPU</strong>s and a<br />

substantial reduction in computation time can be<br />

obtained by performing feature extraction on<br />

these platforms. Using a single NVIDIA GTX460<br />

<strong>GPU</strong>, our proposed approach is shown to be<br />

approximately 25x faster than a sequential CPU<br />

implementation, enabling feature extraction to be<br />

performed in real-time.<br />

Contact: Haofeng Kou (SCU)<br />

BIOINFORMATICS<br />

BI01 - Acceleration of Complex Network Analysis<br />

Complex networks now play a scientific role<br />

of great importance. Their universal characteristics<br />

can be applied across scientific<br />

fields such as network pharmacology. There is a need for<br />

acceleration that substantially reduces the execution time<br />

of the algorithms used. The<br />

breakthrough is the use of <strong>GPU</strong>s and parallel<br />

computing to accelerate the whole<br />

process. Transforming common algorithms<br />

such as matrix multiplication to a parallel model has<br />

shown large acceleration, which is promising<br />

for the field of network analysis.<br />

Contact: Athanasios Grivas (Newcastle University)<br />

BI02 - GHOSTM: A <strong>GPU</strong>-Accelerated Homology<br />

Search Tool for Metagenomics<br />

A vast number of sensitive homology searches is<br />

required for mapping sequence data to known<br />

protein sequence databases in metagenomic<br />

analysis. However, fast search tools such as BLAT<br />

do not have enough search sensitivity for<br />

metagenomic analysis. Thus a sensitive and<br />

efficient homology search tool is much needed.<br />

We developed a <strong>GPU</strong>-optimized algorithm for<br />

performing sensitive sequence homology<br />

searches. Our implementation, the <strong>GPU</strong>-Accelerated<br />

Homology Search Tool for<br />

Metagenomics (GHOSTM), achieves faster calculation<br />

speeds and higher search accuracy than the<br />

BLAT program. Our results indicate that GHOSTM<br />

offers a potentially cost-efficient solution to the<br />

increasingly difficult computational analysis of<br />

metagenomic data.<br />

Contact: Shuji Suzuki (Tokyo Institute of <strong>Technology</strong>)<br />

CLIMATE & WEATHER MODELING<br />

CW01 - CUDA/JAVA Model for Gas Line-by-Line<br />

Absorption of Atmospheric Radiation<br />

The potential of graphics processing units (<strong>GPU</strong>) to<br />

speed up the calculation of radiative energy<br />

absorption by atmospheric gases is presented. Gas<br />

absorption calculations are needed at millions of<br />

electromagnetic waves to have an accurate<br />

depiction of the Earth’s incoming and outgoing<br />

radiative energies. The CUDA/<strong>GPU</strong> portion obtains<br />

the gases’ Voigt lineshapes, whereas the Java/CPU<br />

portion performs efficient I/O tasks on the large<br />

HITRAN database of molecular gas parameters. A<br />

modular combination of the lower-level CUDA<br />

algorithms and the higher-level Java language<br />

results in an accessible interface for end users<br />

who are not <strong>GPU</strong> experts.<br />

Contact: William Godoy (NASA Langley Research<br />

Center)<br />

CW02 - Heat Transfer Ray Tracing with OptiX<br />

QUIC Radiant is part of a suite of <strong>GPU</strong>-assisted<br />

tools developed by our research group that aim to<br />

increase knowledge of how environment and<br />

urban form interact. Our hypothesis is that urban<br />

structures exist that can minimize energy use<br />

while also minimizing air pollution exposure. Our<br />

efforts investigate the complex interactions of<br />

various types of urban structures by developing<br />

design strategies for optimizing urban form under<br />

a variety of constraints.<br />

Contact: Scot Halverson (University of<br />

Minnesota Duluth)<br />

COMPUTATIONAL FLUID DYNAMICS<br />

CD01 - Coalesced Simulation of Incompressible<br />

Navier-Stokes Equations Over Airfoil Using <strong>GPU</strong><br />

This work presents a <strong>GPU</strong>-based implementation of<br />

Finite Difference Time Domain (FDTD) methods<br />

for solving unsteady incompressible viscous flow<br />

over an airfoil using the stream function-vorticity<br />

formulation on a structured grid. For<br />

large-scale simulations, FDTD methods can be<br />

computationally expensive and require a<br />

considerable amount of time to solve on CPUs. In<br />

contrast, modern GP<strong>GPU</strong>s are designed to<br />

accelerate large numbers of independent calculations thanks to<br />

their parallel architecture. Our<br />

FDTD implementation achieves efficient global<br />

memory coalescing with 66.67% occupancy.<br />

The <strong>GPU</strong>-based version of the flow solver is over 28 times<br />

faster than a sequential CPU version.<br />

Contact: Iman Gohari (University of Tehran)<br />

CD02 - Parallel Computations on <strong>GPU</strong> in 3D<br />

Vortex Particle Method<br />

This poster briefly presents the Vortex in Cell (VIC) method for<br />

solving the fluid equations in 3D and its<br />

implementation for parallel computation on the<br />

multicore architecture of graphics cards.<br />

One of the most important<br />

components of the VIC algorithm is the solution of<br />

the Poisson equation. Multigrid and Full Multigrid<br />

methods were chosen for its solution, giving a<br />

12 times speedup compared to the<br />

direct fast solution algorithm on a single processor.<br />

The VIC method was fully implemented on the <strong>GPU</strong>,<br />

and a 46 times speedup was obtained. Tests of<br />

the method are also shown.<br />

Contact: Andrzej Kosior (Wroclaw University<br />

of <strong>Technology</strong>)<br />

CD03 - Reynolds Equation Solver on GP<strong>GPU</strong> for<br />

Gas Film Lubrication Problem<br />

In the present study, we implemented a Reynolds<br />

equation solver on GP<strong>GPU</strong> for the gas film lubrication<br />

problem. Using a Red-Black Gauss-Seidel<br />

iteration scheme, we achieved a 106x speedup for the<br />

core calculation and an overall 12x speedup<br />

(double precision), relative to one core of an AMD Llano<br />

A8-3850. A small serial part becomes a critical<br />

bottleneck and degrades overall speedup as the<br />

problem size gets bigger and <strong>GPU</strong> efficiency<br />

increases. Future work will include the<br />

development of a general gas film analysis solver<br />

and of a parallelization scheme for the<br />

remaining serial parts, such as integration and error<br />

checking, among others.<br />

Contact: Ji-Hoon Kang (KISTI)<br />
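
To make the Red-Black Gauss-Seidel idea concrete, here is a minimal sketch. It<br />
is an assumption-laden stand-in: it solves a plain Poisson problem on a square<br />
grid rather than the Reynolds equation, and the function names are hypothetical.<br />
The point it illustrates is the colouring: points of one colour depend only on<br />
points of the other colour, so each half-sweep can be updated fully in parallel.<br />

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Iterate on -laplace(u) = f with fixed boundary values.

    u: n x n list of lists, updated in place (boundary entries stay fixed).
    f: n x n list of lists holding the right-hand side.
    h: grid spacing.
    """
    n = len(u)
    for _ in range(sweeps):
        # Colour 0 ("red") first, then colour 1 ("black"). Within one
        # colour no point reads another point of the same colour, so a GPU
        # could assign one thread per point in each half-sweep.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1]
                                          + h * h * f[i][j])
    return u


def demo():
    """Return the maximum error for a case with a known exact solution."""
    n = 9
    h = 1.0 / (n - 1)
    # Boundary values u = x + y; with f = 0 the exact solution is x + y.
    u = [[(i + j) * h if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    f = [[0.0] * n for _ in range(n)]
    red_black_gauss_seidel(u, f, h, sweeps=500)
    return max(abs(u[i][j] - (i + j) * h)
               for i in range(n) for j in range(n))
```

On this small test grid the iteration converges to the known solution well<br />
within the 500 sweeps used in `demo()`.<br />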

CD04 - Digital Core Analysis with <strong>GPU</strong><br />

Application<br />

The use of computed<br />

tomography (CT) for calculating core<br />

characteristics is one of the fast-growing markets<br />

in oilfield services. A multi-<strong>GPU</strong> system<br />

processes raw data from a CT scanner in a<br />

cheaper and more efficient way than CPU<br />

clusters. Key core parameters<br />

such as porosity, absolute permeability and<br />

acoustic properties were calculated using MPI and<br />

CUDA technologies. Special attention was paid to<br />

optimizing memory usage and computational<br />

algorithms. The algorithms were tested on the<br />

“Lomonosov” supercomputer and showed a close to<br />

linear increase in computation speed<br />

with the number of <strong>GPU</strong> devices in use.<br />

Contact: Dmitry Senin (University of Illinois at<br />

Urbana-Champaign)<br />

CD05 - Immersed Boundary Turbulent Flow<br />

Simulations on <strong>GPU</strong> Clusters<br />

A survey of recent literature reveals that <strong>GPU</strong><br />

speedup factors are generally much higher for<br />

structured Cartesian mesh methods than<br />

unstructured mesh methods. However, Cartesian<br />

mesh methods do not readily extend to complex<br />

geometries. To this end, immersed boundary (IB)<br />

methods extend Cartesian methods to complex<br />

geometry flow problems by imposing the boundary<br />

conditions on the equations as a forcing term. In<br />

this study we extend our multi-<strong>GPU</strong><br />

parallel flow solver, GIN3D, to complex geometry<br />

turbulent flow problems by implementing the IB<br />

method along with the Lagrangian dynamic<br />

large-eddy simulation (LES) technique, which is<br />

suitable for arbitrarily complex shapes.<br />

Contact: Rey DeLeon (Boise State University)<br />

CD06 - Framework for Advanced Plasma<br />

Simulations on <strong>GPU</strong> HPC Clusters<br />

We present a fluid code called WARPM utilizing<br />

modern many-core computing devices – namely<br />

<strong>GPU</strong>s. WARPM is designed to both minimize data<br />

movement and maximize data-parallel<br />

computation. The code is a hybrid combination of<br />

OpenCL for parallel computation, MPI for<br />

communication between nodes, and threads for<br />

task-parallelism. The OpenCL standard is central<br />

to the code. <strong>GPU</strong>s and/or multi-core CPUs are<br />

utilized simultaneously to compute updates to the<br />

system of fluid equations using patch sequencing.<br />

We believe this new framework is representative of<br />

the future of high-performance fluid simulations<br />

and can be useful now to others in the community.<br />

Contact: Noah Reddell (University of Utah)<br />

COMPUTATIONAL PHYSICS<br />

CP01 - High Performance Beam Dynamics<br />

Simulator for the LANSCE Linear Accelerator<br />

The LANSCE accelerator complex located at the<br />

Los Alamos National Laboratory is a multi-beam<br />

facility that provides high-intensity H+ and H-<br />

particle beams for a variety of user programs. At<br />

the heart of the facility is a ½-mile long linear<br />

accelerator (linac). During beam operations, linac<br />

parameters are adjusted to maintain minimal<br />

beam spill, but without detailed knowledge of the<br />

beam distribution. We are presently developing a<br />

high performance multiparticle beam dynamics<br />

simulator using <strong>GPU</strong> that will provide fast and<br />

valuable information about the beam distribution<br />

in pseudo real-time during accelerator operations.<br />

Contact: Xiaoying Pang (The University of Plymouth)<br />

CP02 - Accelerating Atomic Collisions<br />

Calculations with CUDA: Atomic Basis Overlaps<br />

Atomic collisions calculations are relevant in many<br />

areas of science, from research in new materials<br />

to atmospheric studies, and even radiation therapy<br />

treatments. Accurate atomic computations are<br />

difficult and time consuming, so computer codes in<br />

those areas rely mainly on approximate models.<br />

The high performance computing power of <strong>GPU</strong>s<br />

will allow precise computations to be included in those<br />

codes. We started our research using simple ways<br />

to accelerate basic atomic collisions calculations<br />

using CUDA, and found excellent speedups.<br />

Contact: Flavio Colavecchia (Div. Colisiones<br />

Atómicas/Instituto Balseiro)<br />

CP03 - Fast Discrete Element Simulations Using<br />

<strong>GPU</strong>s in the Million-Particle-Range<br />

The Discrete Element Method (DEM) was introduced<br />

as early as 1979. For a long time, however,<br />

limited computational power made it a challenge to<br />

run a simulation of granular assemblies of a few<br />

hundred disks in two dimensions.<br />

Today, three-dimensional simulations in the<br />

range of 10,000 to 200,000 particles are standard<br />

and can be achieved on workstations and clusters,<br />

enabling simulated process times of up to several<br />

minutes in the latter case. Smart implementations<br />

tailored to the specific architecture of a <strong>GPU</strong><br />

allow for millions of particles on a single<br />

<strong>GPU</strong> under your desk.<br />

Contact: Charles Radeke (University of Washington)<br />

CP04 - Discontinuous Galerkin Time-Domain<br />

Simulations of Plasmonic Nanostructures on<br />

NVIDIA <strong>GPU</strong>s<br />

The discontinuous Galerkin time-domain (DGTD)<br />

method is a powerful method to explore the<br />

electromagnetic properties of nano-scale<br />

plasmonic and dielectric systems. Here, we present<br />

the method’s advantages and disadvantages when<br />

implemented to run on graphic processing units<br />

(<strong>GPU</strong>s). The <strong>GPU</strong>’s superior performance is<br />

demonstrated for realistic nanophotonic setups<br />

characterized by both optical spectroscopy and<br />

electron energy loss spectroscopy. Compared to<br />

modern CPU hardware, <strong>GPU</strong>-based DGTD reduces<br />

computational time by up to two orders of<br />

magnitude.<br />

Contact: Richard Diehl (Karlsruhe Institute<br />

of <strong>Technology</strong>)<br />

CP05 - Inversion of a Sequence of Matrices<br />

Differing in Diagonal Elements<br />

We propose an implementation of a <strong>GPU</strong><br />

algorithm for the inversion of a special set of matrices.<br />

Each matrix in the set differs from the others only in<br />

its diagonal elements. The algorithm uses a direct<br />

product procedure for the matrix inversion. The<br />

ability to massively parallelize the<br />

calculation of the direct product allows effective<br />

use of <strong>GPU</strong> calculations, which speeds up the solution<br />

of this problem. We implement and study the<br />

properties of this algorithm for complex-valued<br />

matrices. Using the <strong>GPU</strong> algorithm for simulation<br />

of disordered 2D-lattice systems allows us to<br />

achieve a significant speedup in calculations.<br />

Contact: Alexey Osipov (Jet Propulsion Laboratory)<br />

CP06 - Accelerating Particle Simulations with<br />

<strong>GPU</strong> Computing<br />

RandomWalk is a program designed to model<br />

particle dispersion for a city-scale environment. It<br />

is used to model airborne hazards in urban<br />

environments. We reimplemented RandomWalk in<br />

CUDA to achieve significantly faster results.<br />

Contact: Scot Halverson (University of<br />

Minnesota Duluth)<br />
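
The abstract does not describe RandomWalk's internals, so the following is only<br />
a generic sketch of random-walk particle dispersion under stated assumptions: a<br />
constant mean wind, Gaussian turbulent steps, and no buildings or boundaries.<br />
All names and parameters are hypothetical. Each particle is advanced<br />
independently of the others, which is why this kind of model maps well to CUDA.<br />

```python
import random


def disperse(n_particles, n_steps, wind=(1.0, 0.0), sigma=0.5, dt=1.0,
             seed=0):
    """Advance particles by a mean wind plus a Gaussian random step."""
    rng = random.Random(seed)
    particles = [(0.0, 0.0)] * n_particles  # all released at the origin
    for _ in range(n_steps):
        # Every particle update is independent of the others, so a GPU
        # could process one particle per thread.
        particles = [(x + wind[0] * dt + rng.gauss(0.0, sigma),
                      y + wind[1] * dt + rng.gauss(0.0, sigma))
                     for (x, y) in particles]
    return particles


def mean_position(particles):
    """Return the centroid of the particle cloud."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n,
            sum(p[1] for p in particles) / n)
```

For example, `mean_position(disperse(2000, 100))` gives a plume centroid that<br />
has drifted about 100 units downwind while the cloud spreads around it.<br />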

CP07 - Accelerating Particle-Tracking Based<br />

Beam Dynamics Simulations with <strong>GPU</strong>s<br />

Efficient implementation of general-purpose<br />

particle tracking on <strong>GPU</strong>s can result in significant<br />

performance benefits to large-scale particle<br />

tracking and tracking-based accelerator<br />

optimization simulations. We present our work on<br />

CUDA kernels for transfer maps of single-particle-dynamics<br />

and collective-effects beamline elements,<br />

to be incorporated into a <strong>GPU</strong>-accelerated version<br />

of the Argonne National Lab’s accelerator code<br />

ELEGANT. In particular, we discuss techniques for<br />

efficient utilization of the device shared, cache, and<br />

local memory in the design of single-particle and<br />

collective-effects kernels. We also discuss the use<br />

of data-parallel and hardware-assisted approaches<br />

for resolving memory contention issues in collective<br />

effects kernels.<br />

Contact: Keegan Amyx (Tech-X Corporation)<br />

COMPUTER GRAPHICS<br />

CG01 - CUDA-Based Interactive Design of Urban<br />

Ecosystems<br />

We address the problem of interactive design of<br />

urban spaces by integrating plants in urban<br />

environments. We have developed an interactive<br />

simulation and procedural system for 3D urban<br />

models. Using our CUDA-based interactive system<br />

we can simulate spatial distribution of a large<br />

ecosystem embedded in a city. We have achieved a<br />

performance of 50M-70M collision tests per<br />

second allowing for 250,000 plants being<br />

simulated at 5-6 fps on a Tesla C2050.<br />

Contact: Michel Abdul Massih (Purdue University)<br />

CG02 - Robust <strong>GPU</strong> Algorithm for Exact 3D<br />

Minkowski Sum Computation<br />

We present a robust <strong>GPU</strong> algorithm to compute<br />

exact 3D Minkowski sum of two polyhedral<br />

objects. While Minkowski sum is of great<br />

importance in mathematics, geometric modeling,<br />

and robotics, it is hard to compute efficiently and<br />

robustly. The proposed algorithm achieves high<br />

performance by mainly running on <strong>GPU</strong>, while<br />

filtering out unsafe predicates caused from<br />

degenerate cases by using interval arithmetic. The<br />

filtered unsafe predicates are tossed to CPU<br />

where they are robustly evaluated by using<br />

extended arithmetic (MPFR). The performance<br />

result shows speedup of one order of magnitude<br />

versus a pure CPU algorithm.<br />

Contact: Min-Ho Kyung (Ajou University)


CG03 - Real-Time Mixed Water Simulation and<br />

Rendering Techniques for Visual Effects<br />

The synthesis of realistic scenes is an important<br />

research area for applications in games and<br />

visual effects. Research groups have developed<br />

techniques for realistic water rendering, but<br />

no research work describes these techniques<br />

and makes a comparative analysis of them. The<br />

present work analyzes the most<br />

important techniques for water simulation and<br />

visualization, compares their performance,<br />

and creates a system designed for artists. The system<br />

can choose between algorithms and combine<br />

them using layers to achieve the desired result.<br />

Finally, it can use a virtual camera to output the<br />

final render in multiple passes for post-production.<br />

Contact: Rodrigo Marques (California State<br />

University, Chico)<br />

COMPUTER VISION<br />

CV01 - Efficient Dense Stereo Matching Using<br />

CUDA<br />

The proposed work demonstrates the general<br />

strategy for parallelization of dense matching<br />

methods on <strong>GPU</strong>s, shows the potential capability<br />

of common graphics cards for general<br />

computation, and compares the implementations<br />

between local and global methods with the<br />

example of Sum of Absolute Differences (SAD) and<br />

Semi-Global Matching (SGM).<br />

Contact: Ke Zhu (Technische Universität München)<br />
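
As a small illustration of the local method named above, here is a sketch of<br />
SAD block matching for a rectified stereo pair. It is a plain sequential<br />
version written for clarity, not the poster's implementation; the function<br />
names, window size, and synthetic test images are assumptions. Each output<br />
pixel is computed independently, which is the property a GPU version exploits.<br />

```python
def sad_disparity(left, right, max_disp, half_win=1):
    """Per-pixel disparity by minimising the sum of absolute differences.

    left, right: equally sized 2D lists of grey values (rectified pair).
    Border pixels without a full window keep disparity 0.
    """
    height, width = len(left), len(left[0])
    disp = [[0] * width for _ in range(height)]
    for y in range(half_win, height - half_win):
        for x in range(half_win, width - half_win):
            best_cost, best_d = None, 0
            for d in range(max_disp + 1):
                if x - d - half_win < 0:
                    break  # shifted window would leave the right image
                cost = 0
                for dy in range(-half_win, half_win + 1):
                    for dx in range(-half_win, half_win + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp


def demo():
    """Synthetic pair: the left image is the right one shifted by 2 pixels."""
    width, height, shift = 12, 5, 2
    right = [[x * x + 3 * y for x in range(width)] for y in range(height)]
    left = [[right[y][x - shift] if x >= shift else 0
             for x in range(width)] for y in range(height)]
    return sad_disparity(left, right, max_disp=4)
```

In the synthetic example every pixel with a full matching window is assigned<br />
the true shift of 2.<br />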

CV02 - Scalable Local Feature Extraction with<br />

Orientation Maps and <strong>GPU</strong> Computing<br />

This paper presents scalable computational<br />

techniques for extracting local invariant features.<br />

Although several investigators have developed<br />

efficient algorithms and implementations for<br />

feature extraction, the scalability in terms of the<br />

number of extracted features still remains as an<br />

issue. We introduce the data structure called<br />

orientation maps and <strong>GPU</strong> computing to improve<br />

the scalability of feature extraction. Experimental<br />

results demonstrate that using orientation maps<br />

and a <strong>GPU</strong> enables us to improve the scalability as<br />

well as the efficiency of computation compared to<br />

a CPU.<br />

Contact: Naoyuki Ichimura (National Institute of<br />

Advanced Industrial Science and <strong>Technology</strong> (AIST))<br />

CV03 - <strong>GPU</strong>-Accelerated Detection of Severe<br />

Video Distortions<br />

We show how to port a previously proposed<br />

algorithm for detection of severe analog and digital<br />

video distortions (termed ‘video breakup’), efficiently<br />

to Fermi Architecture <strong>GPU</strong>s with CUDA. By porting<br />

to a <strong>GPU</strong>, the runtime of the CPU implementations<br />

can be reduced by an order of magnitude. Thus our<br />

<strong>GPU</strong> algorithm is capable of analyzing up to ten Full<br />

HD (1920 x 1080) video streams in real-time. The<br />

<strong>GPU</strong> implementation is integrated in the AV-<br />

Inspector application, which allows the user to get<br />

an automatic assessment of the quality of video and<br />

film material in very short time.<br />

Contact: Hannes Fassold (JOANNEUM RESEARCH)<br />

CV04 - VScreen: A Real-Time Augmented<br />

Video Method<br />

We present a tool for image editing that allows us<br />

to modify a region of any image or video by another<br />

image or video. This application is useful for<br />

advertisements, commercials, music videos,<br />

movies, etc. The main difference between editing<br />

(augmenting) videos and fixed images is that<br />

occlusions need to be managed: moving objects in the<br />

foreground may occlude the augmented region in the<br />

background. We therefore use a procedure for<br />

Foreground/Background (FgBg) video<br />

segmentation, which is implemented on NVIDIA<br />

video cards to fulfill the real-time requirement.<br />

Contact: Francisco J. Hernandez-Lopez (CIMAT A.C.)<br />

CV05 - Accelerated Multiple Region Evaluation<br />

for Human Motion Tracking<br />

In this work we present a study of different<br />

NVIDIA CUDA approaches to the problem of<br />

evaluating the pixels in a region of interest (ROI) of<br />

an image. This problem is usually integrated as<br />

part of other higher-level methods, such as image<br />

retargeting, completion, video summarization,<br />

object detection, visual tracking, etc. Because<br />

these methods evaluate millions of ROIs, in<br />

many cases performance is far from<br />

interactive.<br />

Contact: David Concha Gomez (Universidad Rey<br />

Juan Carlos)<br />

CV06 - Efficient Segmentation Trees on the <strong>GPU</strong><br />

There are numerous computer vision tasks which<br />

demand a high-performance algorithm for<br />

building segmentation trees. Unfortunately,<br />

current state-of-the-art methods aimed at the<br />

CPU are far too slow. The present work describes an<br />

efficient <strong>GPU</strong> implementation of a popular<br />

algorithm. Performance evaluations show that<br />

unlike its CPU counterpart the proposed method<br />

is suitable for real-time applications.<br />

Contact: Yaroslav Ganin (NVIDIA)<br />

CV07 - <strong>GPU</strong> Vision: OpenCV’s <strong>GPU</strong> Module<br />

Accelerates Computer Vision<br />

OpenCV is the world’s most used library for<br />

computer vision with over 3 million downloads<br />

worldwide. Using the power of CUDA and the<br />

NVPP library, the most computationally<br />

demanding of OpenCV’s more than 500 functions<br />

have been ported for an average speedup of 33X<br />

over the already highly optimized CPU code.<br />

Several application work flows have been<br />

dramatically improved, including HOG pedestrian<br />

detection, face detection, stereo correspondence,<br />

and feature detection and matching.<br />

Contact: Colin Tracey (NVIDIA)<br />

CV08 - Orientation Flows: <strong>GPU</strong> Implementation<br />

Clarifies Cortical Computation<br />

Orientation flows play an important role in shape<br />

inference. We have developed a model of<br />

orientation flow extraction that explains the<br />

statistics of neurophysiologically observed<br />

connection structure through second order (mean<br />

and variance). Our <strong>GPU</strong>-based implementation of<br />

this model realizes dramatic performance<br />

improvements over the original C implementation,<br />

enabling us to pursue formerly prohibitively<br />

time-consuming studies.<br />

Contact: Daniel Holtmann-Rice (Yale University)<br />

CV09 - Michigan Visual Sonification System:<br />

Driving Efficient Mobile Vision Designs<br />

Visual Sonification is the process of converting<br />

visual properties of objects into audio. The<br />

Michigan Visual Sonification System (MVSS)<br />

utilizes this process to assist the visually impaired<br />

in distinguishing objects in their surroundings.<br />

MVSS uses computer vision to analyze scenes and<br />

create a dynamic audio representation of each<br />

object which is presented to the user using 3D<br />

audio. The performance of MVSS on mobile<br />

processors exposed a need for improved mobile<br />

vision performance. Our benchmark suite,<br />

MEVBench, was used to further analyze the<br />

computational characteristics of mobile vision.<br />

The EFFEX architecture was developed for<br />

efficient feature extraction in mobile vision.<br />

Contact: Jason Clemons (University of Michigan)<br />

DATABASES, DATA MINING, BUSINESS<br />

INTELLIGENCE<br />

DB01 - Parallel Data Mining Techniques on<br />

Graphics Processing Unit with CUDA<br />

Data mining is widely used in various domains and<br />

has significant applications. However, current data<br />

mining tools cannot meet the requirement of<br />

applications with large-scale databases in terms<br />

of speed. We propose three techniques to<br />

accelerate fundamental kernels in data mining<br />

algorithms on the CUDA platform: a scalable thread<br />

scheduling scheme for irregular patterns, a parallel<br />

distributed top-k scheme, and a parallel<br />

high-dimension reduction scheme. They play a key role<br />

in our GUCAS_CU-Miner, which includes three<br />

representative data mining algorithms: CU-Apriori,<br />

CU-KNN and CU-K-means. The<br />

experiments have shown that <strong>GPU</strong> + CUDA<br />

parallel architecture is feasible and promising for<br />

data mining applications.<br />

Contact: Ying Liu (Graduate University of Chinese<br />

Academy of Sciences)<br />
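
The abstract names a parallel distributed top-k scheme without describing it. One common way to structure such a scheme is to let each partition find its own top-k and then merge the candidates. The following is a minimal Python sketch of that pattern under stated assumptions, not the authors' CUDA method; the function names and the partitioning are hypothetical.

```python
import heapq


def local_top_k(chunk, k):
    """Top-k of one partition (the work one thread block might do)."""
    return heapq.nlargest(k, chunk)


def distributed_top_k(values, k, num_partitions):
    """Split values into partitions, take each local top-k, then merge.

    Assumes num_partitions >= 1. The global top-k is always contained in
    the union of the local top-k lists, so the merge step is exact.
    """
    size = max(1, -(-len(values) // num_partitions))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    candidates = [v for chunk in chunks for v in local_top_k(chunk, k)]
    return heapq.nlargest(k, candidates)
```

Each partition is independent until the final merge, which is what makes the pattern attractive for parallel hardware.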

DB02 - Parallel Spectral Graph Partitioning<br />

on CUDA<br />

Spectral graph partitioning is a widely used<br />

technique in many fields such as image<br />

processing, scientific computing and machine<br />

learning. In this study, we analyze the subroutines<br />

of the spectral graph partitioning algorithm on CUDA.<br />

Each step is analyzed using several<br />

techniques to reach a conclusion about the suitability of<br />

the step for <strong>GPU</strong> implementation. Two different<br />

<strong>GPU</strong> configurations are implemented and their<br />

results are compared against the CPU version.<br />

Contact: Alptekin Temizel (Middle East Technical<br />

University)<br />

DB03 - Red Fox: Accelerating Data Warehousing<br />

Applications Using GP<strong>GPU</strong>s<br />

Red Fox is a compiler optimization framework for<br />

accelerating large scale data warehousing<br />

applications on cloud architectures augmented<br />

with <strong>GPU</strong>s. Currently, the framework is structured<br />

around the program transformations based on the<br />

concepts of kernel fusion and fission, drawing<br />

upon the analogy with classical loop fusion and<br />

fission transformations. These transformations<br />

seek to improve <strong>GPU</strong> utilization and optimize data<br />

movement throughout the CPU/<strong>GPU</strong> memory<br />

hierarchy. Coupled with the Ocelot dynamic<br />

compiler, this framework can optimize the<br />

execution of applications across the CPU and<br />

<strong>GPU</strong>. The initial application domain includes<br />

relational operators and arithmetic functions<br />

found in data warehousing applications.<br />

Contact: Haicheng Wu (Georgia Institute of<br />

<strong>Technology</strong>)<br />
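
As a rough illustration of kernel fusion, the sketch below contrasts running a relational SELECT and an arithmetic operator as two passes with a fused single pass that never materializes the intermediate result. It is plain Python rather than Red Fox output, and `select_kernel`, `map_kernel`, `unfused`, and `fused` are hypothetical names.

```python
def select_kernel(rows, predicate):
    """Relational SELECT: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def map_kernel(rows, fn):
    """Arithmetic operator applied to every row."""
    return [fn(r) for r in rows]


def unfused(rows, predicate, fn):
    """Two separate passes; the intermediate list is written, then re-read."""
    intermediate = select_kernel(rows, predicate)
    return map_kernel(intermediate, fn)


def fused(rows, predicate, fn):
    """One pass; each row is filtered and transformed while it is in hand."""
    return [fn(r) for r in rows if predicate(r)]
```

Both versions return the same result; the fused one avoids storing and reloading the intermediate data, which is the kind of data movement the abstract says the transformations seek to optimize.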

DEVELOPMENT TOOLS & LIBRARIES<br />

DL01 - AutoTune: Automatic Online Code Tuning<br />

Performance analysis and tuning is an important<br />

step in programming multicore and manycore<br />

architectures. There are several tools to help<br />

developers analyze application performance; still,<br />

no tool provides recommendations about how to<br />

tune the code. AutoTune will extend Periscope, an<br />

automatic online and distributed performance<br />

analysis tool developed by Technische Universität<br />

München, with plugins for performance and<br />

energy efficiency tuning. The resulting Periscope<br />

Tuning Framework will be able to tune serial and<br />

parallel codes with and without <strong>GPU</strong> kernels; in<br />

addition, it will return tuning recommendations<br />

that can be integrated into the production version<br />

of the code.<br />

Contact: Renato Miceli (Aon Benfield Securities)<br />

DL02 - Interactive Linked Visualizations for<br />

Performance Analysis Of Heterogeneous<br />

Computing Clusters<br />

Performance analysis is a vital step in identifying<br />

execution bottlenecks to help target optimizations.<br />

This analysis is derived from observations of<br />

performance data collected from the computing<br />

hardware. Data obtained from computing clusters<br />

is necessarily complicated because its collection<br />

involves multiple interacting nodes as opposed to<br />

just a single serial execution. Further,<br />

heterogeneous clusters, having CPUs working


together with several <strong>GPU</strong>s, add additional layers<br />

of complexity. These characteristics pose a<br />

serious challenge to the analysis and<br />

improvement of application performance. We<br />

present a tool that assists performance analysis<br />

by visualizing performance data with the help of<br />

various interactive linked views.<br />

Contact: Aaditya Landge (Scientific Computing and<br />

Imaging Institute, University of Utah)<br />

DL03 - High-Performance Pedestrian Multi-<br />

Simulation Using <strong>GPU</strong> Cluster<br />

We have created a tool that could potentially help<br />

with decision support and planning of large-scale<br />

emergency pedestrian evacuations. Through the<br />

use of our simulation software distributed over a<br />

<strong>GPU</strong> cluster, many evacuation scenarios can be<br />

simultaneously simulated at faster than real-time<br />

speeds and compared for their effectiveness.<br />

Contact: Twin Karmakharm (University of Sheffield)<br />

DL04 - ttgLib - Middleware for Dynamic Software<br />

Adaptation to Heterogeneous Architectures<br />

We present ttgLib, a middleware that efficiently<br />

distributes computational tasks between CPUs<br />

and <strong>GPU</strong>s and provides load balancing between<br />

them on the fly. This enables an application to use<br />

all available processing units of a heterogeneous<br />

HPC system simultaneously. ttgLib performs<br />

several dynamic optimization procedures that<br />

significantly facilitate the development of new<br />

applications for and porting of existing software to<br />

heterogeneous platforms. ttgLib can be<br />

considered an extension of widely used parallel<br />

programming tools that can be easily integrated<br />

into the software development process. This<br />

middleware efficiently solves the most tedious<br />

problems of ‘heterogeneous coding’ that<br />

developers usually encounter.<br />

Contact: Sergey Grizan (Moscow State University<br />

and Siberian Federal University, ttgLabs)<br />

DL05 - Efficient Formal Verification of CUDA<br />

SIMD and Atomics<br />

Detecting and debugging assertion failures and<br />

runtime errors in CUDA programs is usually hard.<br />

Typical multithreaded program verification<br />

methods are not effective for verifying the large-scale<br />

fine-grained concurrency of CUDA. Our novel<br />

contribution is a technique to handle CUDA SIMD<br />

plus Atomics using concolic execution methods.<br />

Contact: Wei-Fan Chiang (School of Computing,<br />

University of Utah)<br />

DL06 - Performance Optimizations And<br />

Modeling For Large-Scale Heterogeneous<br />

Computing Systems<br />

This poster proposes to address the following at<br />

every level of parallelism in heterogeneous<br />

computing systems: 1) performance optimizations<br />

of applications, and 2) performance modeling<br />

and prediction.<br />

Contact: Ashwin Aji (Virginia Tech)<br />

ELECTRONIC DESIGN AUTOMATION<br />

EA01 - Parallel VLSI CAD Algorithms for Energy<br />

Efficient Heterogeneous Computing Platforms<br />

In the past decade, parallel VLSI CAD tools have<br />

been successfully developed by major EDA<br />

vendors to leverage multi-core/distributed parallel<br />

computing power. However, for recent energy<br />

efficient heterogeneous computing platforms that<br />

integrate multi-core CPUs and many-core <strong>GPU</strong>s,<br />

very limited progress has been made in the VLSI CAD<br />

research community. Developing efficient CAD<br />

algorithms for such heterogeneous platforms can<br />

be extremely challenging, requiring strong<br />

domain-specific CAD algorithm knowledge as well<br />

as thorough understanding of the latest hardware<br />

properties. In this abstract, we show our latest<br />

research progress on large scale circuit electrical<br />

and thermal modeling and simulation methods.<br />

Contact: Zhuo Feng (Michigan Technological<br />

University)<br />

EA02 - Ultra-Low Power Transceivers for<br />

High-Bandwidth Interconnects<br />

A low-power transceiver for highly parallel<br />

chip-to-chip data communication is presented.<br />

The receiver is implemented in a 45nm SOI<br />

technology. High data rate and low power<br />

dissipation are achieved using a switched-capacitor<br />

S/H/summer front-end which enables FEXT<br />

cancellation with 33µW/Gbps power overhead. It<br />

operates up to 15Gb/s and dissipates 7.5mW from<br />

a 1.2V supply. The 15Gb/s transmitter employs an<br />

analog filtering pre-emphasis equalization<br />

technique and dissipates 10mW from a 1.2V supply<br />

while occupying 0.01mm². It was fabricated in<br />

65nm CMOS technology and compensates for<br />

channel losses up to 20dB at Nyquist-rate.<br />

Contact: Meisam Honarvar Nazari (California<br />

Institute of <strong>Technology</strong>)<br />

ENERGY EXPLORATION<br />

EE01 - The Maven Vector-Thread Architecture<br />

We present a taxonomy and modular<br />

implementation approach for data-parallel<br />

accelerators, including the MIMD, vector-SIMD,<br />

subword-SIMD, SIMT, and vector-thread (VT)<br />

architectural design patterns. We have developed<br />

a new VT microarchitecture, Maven, based on the<br />

traditional vector-SIMD microarchitecture that is<br />

simpler to implement and easier to program than<br />

previous VT designs. Using an extensive design-space<br />

exploration of full VLSI implementations of<br />

many accelerator design points, we evaluate the<br />

varying tradeoffs between programmability and<br />

implementation efficiency among the different<br />

architectural patterns. Our results suggest that<br />

the Maven VT microarchitecture is superior to the<br />

vector-SIMD architecture, providing both greater<br />

efficiency and easier programmability.<br />

Contact: Yunsup Lee (UC Berkeley)<br />





FINANCE<br />

FA01 - PathWise High Productivity<br />

Computing Platform<br />

The PathWise High Productivity Computing (HPC)<br />

platform is a financial modeling environment<br />

targeting <strong>GPU</strong> grids.<br />

Contact: Aamir Mohammad (Tsinghua University)<br />

GENERAL INTEREST<br />

GI01 - High Throughput MIMO-OFDM Detection<br />

with Graphics Processing Units<br />

A novel strategy is proposed to implement<br />

a reconfigurable MMSE-based detector for<br />

multiple-input multiple-output (MIMO) wireless<br />

communication systems with orthogonal<br />

frequency-division multiplexing (OFDM). The key<br />

component of the strategy is a massively parallel<br />

implementation of the scalable matrix inversion<br />

on <strong>GPU</strong>s. A series of optimization methods<br />

including multi-threaded matrix inversion with<br />

multiple data frames, maximizing the utilization<br />

of the fast on-chip memories, and overlapping<br />

kernel execution with data transfer, are proposed.<br />

Experiments demonstrate that the throughput<br />

for a 4×4 64QAM MIMO-OFDM system can<br />

exceed 100 Mbit/s, satisfying 4G wireless<br />

communication standards like LTE/LTE-Advanced.<br />

Contact: Dan Sui (Wireless & Mobile<br />

Communication R&D Center, Tsinghua University)<br />

GI02 - A Fast Irregular LDPC Decoder on NVIDIA<br />

Fermi<br />

Low-Density Parity-Check (LDPC) codes are<br />

widely used in many wireless communication<br />

systems. The decoding algorithms are often<br />

time-consuming. The Graphics Processing Unit (<strong>GPU</strong>)<br />

is an attractive co-processor to the CPU for<br />

massively parallel computing. The <strong>GPU</strong>-based<br />

LDPC decoder is studied, especially for irregular<br />

LDPC codes. Optimization techniques for <strong>GPU</strong> are<br />

considered. Experimental results demonstrate<br />

that, compared to the CPU, the <strong>GPU</strong> can achieve a<br />

speedup of more than 80 times.<br />

Contact: Dan Sui (Wireless & Mobile<br />

Communication R&D Center, Tsinghua University)<br />

GI03 - Actual Power Consumption in Pattern<br />

Matching on CUDA <strong>GPU</strong>s<br />

For many embedded applications in e.g. the<br />

Aerospace/Defense industry, power efficiency is<br />

very important as both cooling and power are<br />

often difficult to supply. We show that the specified<br />

max power of a CUDA <strong>GPU</strong> is not a good measure<br />

of actual power consumption under a CUDA load,<br />

and that writing efficient code which reaches high<br />

utilization is of the essence when it comes to<br />

power efficiency.<br />

Contact: Ian Wainwright (High Performance<br />

Consulting)<br />

GI04 - <strong>GPU</strong>-Accelerated Fingerprint Matching<br />

As biometric databases approach hundreds of<br />

millions of identities in size, it becomes more<br />

costly and time-consuming to search these<br />

databases. Using a <strong>GPU</strong>-accelerated coarse<br />

filtering algorithm, we demonstrate that a large<br />

fingerprint database can be searched very quickly<br />

for a matching individual by isolating a small list<br />

of potential matches using <strong>GPU</strong>s, such that only<br />

these few records will be given further scrutiny by<br />

the matching system.<br />

Contact: Scott Bai (The MITRE Corporation)<br />

GI05 - Towards Task-Pipelined General Purpose<br />

Computing on <strong>GPU</strong>s<br />

Many real-world applications, especially those<br />

following a stream processing pattern, feature<br />

interleaved task-pipelined and data parallelisms.<br />

Current <strong>GPU</strong>s are ill-equipped for such<br />

applications due to the insufficient usage of<br />

computing resources and/or the excessive off-chip<br />

memory traffic. This paper focuses on architectural<br />

enhancements to enable task-pipelined execution<br />

of data-parallel kernels on <strong>GPU</strong>s. We propose an<br />

efficient adaptive dynamic scheduling mechanism<br />

and a moderately modified L2 cache structure to<br />

orchestrate both task-pipelined and data<br />

parallelisms. Simulation results show that the<br />

proposed <strong>GPU</strong> architecture improves IPC by 18%<br />

and reduces the overall access to off-chip <strong>GPU</strong><br />

memory by 11% on average.<br />

Contact: Shuai Mu (ABB Corporate Research)<br />

GI06 - High Performance Computing in<br />

Volumetric Velocimetry<br />

Since the advent of Particle Image Velocimetry (PIV)<br />

in experimental fluids measurements, there has<br />

been a steady and sustained increase in the<br />

throughput capability and resolution of hardware<br />

devices (i.e. CMOS cameras) needed to acquire and<br />

transfer the copious amounts of image data. With<br />

the introduction of tomographic measurement<br />

techniques the amount of data suddenly increased<br />

by an order of magnitude. While the development of<br />

hardware keeps pace reasonably well with the<br />

acquisition demand placed by current experiments,<br />

the ability of computers and current algorithms to<br />

process and further reduce the data within a<br />

reasonable period has fallen dramatically behind.<br />

Contact: Thomas Nonn (Moscow State University,<br />

Physics Department)<br />

GI07 - Conformal Transformations of 3D Meshes<br />

in Parallel<br />

Arbitrary deformations applied on 3D meshes<br />

pose significant restrictions in many design<br />

applications. Conformal transformations, those<br />

that preserve oriented angles for a given 3D mesh<br />

parametrization, however, offer the right balance<br />

between flexibility of the geometric form and<br />

structural preservation. The advantage of using<br />

such transformations is two-fold: one can<br />

maintain flexibility of the design process, and<br />

preserve texture and emblematic features of the<br />


mesh. To this end, we investigate efficient and<br />

scalable implementations of the methodology<br />

introduced by Crane et al. in <strong>GPU</strong> architectures.<br />

Contact: Nikolaos Yiotis (London College of<br />

Fashion/ University of Arts London)<br />

<strong>GPU</strong> ACCELERATED INTERNET<br />

GA01 - Accelerating Greater Than-Strong<br />

Conditional Oblivious Transfer Multiparty<br />

Protocol Using <strong>GPU</strong><br />

Greater Than-Strong Conditional Oblivious<br />

Transfer (GT-SCOT) is a protocol used for sharing<br />

data between two parties without revealing any<br />

private information. Due to the large number of<br />

iterative operations and the increasing size of the<br />

input, the algorithm is computationally intensive,<br />

and hence cannot be used for large credentials or<br />

secure database mining. This work presents an<br />

implementation of GT-SCOT using <strong>GPU</strong> in order to<br />

accelerate the operations and handle large<br />

messages. Results show that <strong>GPU</strong><br />

implementation achieved a speedup of 7x for<br />

messages with size of 1024 bits using 64 bits of<br />

encryption for each bit of the message.<br />

Contact: Axel Rivera (The University of Tokyo)<br />

LIFE SCIENCES<br />

LS01 - <strong>GPU</strong>-Enabled Stochastic Spatiotemporal<br />

Model of Rat Ventricular Myocyte Calcium<br />

Dynamics<br />

Some cardiac arrhythmias are thought to result<br />

from Ca2+ waves arising through the spark-induced<br />

spark phenomenon. Calcium sparks, local elevations<br />

of calcium, may recruit sparks at<br />

neighboring sites. However, studying such<br />

calcium dynamics in a detailed whole-cell model is<br />

computationally prohibitive. We introduce a novel<br />

Markov-Chain Monte Carlo simulation. The time<br />

step ranges from 10 ns to 1 µs, so the<br />

simulation can capture the dynamics of<br />

individual ion channel kinetics. We present<br />

an ongoing effort to study calcium<br />

dynamics that, for the first time, incorporates the detailed<br />

structure of rat ventricular myocytes.<br />

Contact: Tuan Hoang-Trong (George Mason<br />

University)<br />

LS02 - <strong>GPU</strong> Accelerated Signal Processing in Ion<br />

Torrent Analysis Pipeline<br />

We have adopted solutions to provide fast analysis<br />

results to our customers by accelerating our<br />

signal processing pipeline using Tesla C2050 <strong>GPU</strong>.<br />

This poster presents a high level view of <strong>GPU</strong><br />

application to our processing pipeline.<br />

Contact: Mohit Gupta (Life Technologies)<br />

LS03 - A Fast CUDA Compatible Short Read<br />

Aligner to Large Genomes<br />

We present CUSHAW, a parallelized short read<br />

aligner that exploits CUDA-compatible graphics<br />

hardware as accelerators to achieve high speed. It<br />

employs a quality-aware bounded search<br />

approach based on the Burrows-Wheeler<br />

transform (BWT) and the FM-index to reduce the<br />

search space and achieve high alignment quality.<br />

Performance evaluation reveals that CUSHAW<br />

running on one or two <strong>GPU</strong>s achieves significant<br />

speedups in terms of execution time, while<br />

yielding comparable or even better alignment<br />

quality for paired-end alignments compared to<br />

three popular BWT-based aligners: Bowtie, BWA<br />

and SOAP2 (availability:<br />

http://cushaw.sourceforge.net).<br />

Contact: Bertil Schmidt (Johannes Gutenberg<br />

University Mainz)<br />
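
The BWT and FM-index mentioned in the abstract support backward search, which narrows a suffix-array interval one pattern character at a time. The Python sketch below shows exact-match counting only, on a toy text; it is not CUSHAW's quality-aware bounded search, and all function names are illustrative assumptions.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end marker."""
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character is the one preceding each sorted suffix.
    return "".join(text[i - 1] for i in suffix_array)


def build_fm_index(bwt_text):
    """Return (C, Occ): C[c] counts characters smaller than c,
    Occ[c][i] counts occurrences of c in bwt_text[:i]."""
    counts = Counter(bwt_text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_text) + 1) for ch in counts}
    for i, ch in enumerate(bwt_text):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return c_table, occ


def count_exact_matches(pattern, c_table, occ, n):
    """Backward search over the half-open suffix-array interval [lo, hi)."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

The naive suffix sort and the dense `Occ` table are chosen for readability; practical aligners sample these structures to save memory.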

MACHINE LEARNING & AI<br />

ML01 - Accelerating Parallel Monte Carlo Tree<br />

Search Using CUDA<br />

The poster presents a parallel implementation of<br />

the Monte Carlo Tree Search algorithm on the <strong>GPU</strong> using<br />

CUDA. It is run on the Tesla-equipped TSUBAME<br />

supercomputer and the results show that in a<br />

2-player game such as Reversi, the <strong>GPU</strong> version is<br />

much stronger than the CPU one. Additionally, it<br />

can be easily scaled to thousands of <strong>GPU</strong> cores.<br />

The scalability factors are presented.<br />

Contact: Kamil Rocki (KPIT Cummins Infosystems Ltd.)<br />

ML02 - Message Passing Parallelism for Belief<br />

Propagation in Junction Trees<br />

Belief propagation over junction trees is known to be<br />

computationally intensive in the general case. One<br />

way of addressing this computational challenge is<br />

to use parallel computing on <strong>GPU</strong>. In this paper, we<br />

develop a two dimensional parallel computing<br />

model for node level message passing. Based on<br />

this approach, we further develop a novel clique<br />

merging technique that leverages the two<br />

dimensions of parallelism to adapt various<br />

Bayesian networks to the parallel computing platform.<br />

We implement our approach on an NVIDIA <strong>GPU</strong> and<br />

test it using BNs from several applications.<br />

Contact: Lu Zheng (Carnegie Mellon)<br />

ML03 - Parallel Memetic Algorithm<br />

Implementation on CUDA<br />

In this poster, a parallel memetic algorithm<br />

implementation for the CUDA platform is described.<br />

The conventional genetic operators are adapted to<br />

the <strong>GPU</strong> architecture. In this<br />

population-based optimization technique, there are<br />

one or more islands and each island consists of a<br />

constant number of individuals. Each CUDA thread<br />

is responsible for evolution of one individual, and<br />

islands are mapped as CUDA blocks to benefit from<br />

the shared memory. The results show up to 38x<br />

speedup compared to the CPU implementation.<br />

Contact: Alptekin Temizel (Middle East Technical<br />

University)
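
The island layout described above can be sketched sequentially in Python: each island plays the role of a CUDA block and each loop iteration over an individual stands in for one thread. This is a hedged illustration, not the poster's implementation; the sphere fitness function, the operators, and all parameter values are assumptions, and migration between islands is omitted.

```python
import random


def fitness(ind):
    """Toy objective to maximize: the optimum is at the origin."""
    return -sum(v * v for v in ind)


def local_search(ind, rng, step=0.1, tries=5):
    """The 'memetic' part: a few hill-climbing moves after variation."""
    best = ind
    for _ in range(tries):
        cand = [v + rng.uniform(-step, step) for v in best]
        if fitness(cand) > fitness(best):
            best = cand
    return best


def evolve_island(island, rng):
    """One generation for one island (one CUDA block in the poster)."""
    elite = max(island, key=fitness)
    new = []
    for _ in island:  # one iteration per individual (one thread each)
        a, b = rng.sample(island, 2)
        parent = a if fitness(a) > fitness(b) else b           # tournament
        mate = rng.choice(island)
        child = [(p + q) / 2 for p, q in zip(parent, mate)]    # crossover
        child = [v + rng.gauss(0, 0.05) for v in child]        # mutation
        new.append(local_search(child, rng))
    # Elitism: the previous best individual replaces the worst child.
    worst = min(range(len(new)), key=lambda i: fitness(new[i]))
    new[worst] = elite
    return new


def run_islands(num_islands=4, island_size=8, dims=3, generations=20, seed=0):
    """Evolve all islands and record the best fitness per generation."""
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dims)]
                for _ in range(island_size)] for _ in range(num_islands)]
    history = []
    for _ in range(generations):
        islands = [evolve_island(isl, rng) for isl in islands]
        history.append(max(fitness(i) for isl in islands for i in isl))
    return history
```

Keeping a constant number of individuals per island, as the abstract describes, is what lets an island's population fit a fixed-size block and its shared memory.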


ML04 - <strong>GPU</strong>-Accelerated Action Acquisition<br />

Through Multiple Time Scales Recurrent<br />

Neural Network<br />

This poster presents novel results of complex action<br />

learning experiments based on the use of extended<br />

multiple timescales recurrent neural<br />

networks (MTRNN). The experiments were carried<br />

out with the iCub humanoid robot, as a model of the<br />

developmental learning of motor primitives as the<br />

basis of sensorimotor and linguistic<br />

compositionality. The model was implemented<br />

through the <strong>GPU</strong>-accelerated Aquila cognitive<br />

robotics toolkit. The results presented herein show<br />

that the model was able to learn and successfully<br />

reproduce multiple actions in an object manipulation<br />

task scenario using large-scale MTRNNs. This<br />

forms the basis of ongoing experiments on action<br />

and language compositionality.<br />

Contact: Martin Peniak (Federal University of Rio<br />

de Janeiro)<br />

MACHINE VISION<br />

MV01 - <strong>GPU</strong> Based Fast Block Matching Using<br />

Orthogonal Thread Transformation<br />

The block matching (BM) technique is extensively used<br />

in object tracking and defect detection problems.<br />

BM has moderate accuracy for defect detection but<br />

it suffers from heavy performance drawbacks.<br />

Modifications of BM with compromised accuracy<br />

and increased performance have been reported in<br />

the literature. BM is an exhaustive search<br />

technique, but it is also highly data<br />

parallel in nature. We present an implementation of<br />

the BM algorithm in CUDA using a novel orthogonal<br />

thread transformation technique to maintain the<br />

data parallelism throughout the processing. We<br />

have achieved a 350x speedup against the CPU and 2.3x<br />

against other <strong>GPU</strong> implementations.<br />

Contact: Sudhakar Sah (University of Oregon)<br />
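
To make the "exhaustive but data parallel" point concrete, here is a minimal Python sketch of block matching with a sum-of-absolute-differences (SAD) cost. It does not implement the orthogonal thread transformation, which the abstract does not describe; the SAD cost, the function names, and the list-of-lists image format are assumptions.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and one image patch."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def block_match(image, template):
    """Exhaustive search: return ((row, col), cost) of the best position."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    # Every candidate position is scored independently of the others,
    # which is why the search maps naturally onto many GPU threads.
    candidates = (
        (sad(image, template, r, c), (r, c))
        for r in range(rows)
        for c in range(cols)
    )
    cost, position = min(candidates)
    return position, cost
```

On a GPU, one option is to assign one candidate position to each thread and finish with a parallel minimum reduction.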

MV02 - Integrating Machine Vision and<br />

Kinematics for a Robotic EV Charger<br />

This poster explains the use of the Tegra II ULP<br />

GeForce <strong>GPU</strong> for Integrated Machine Vision and<br />

Inverse Kinematics (IK) on a Robotic (SCARA)<br />

Electric Vehicle Charging System. The<br />

convergence of wireless tech, mobile chip sets<br />

and powerful software environments enabled the<br />

smartphone revolution. Applying these economies<br />

with the addition of powerful imaging and GP<strong>GPU</strong><br />

capabilities enables a low-cost, high-performance,<br />

easily-engineered, embedded machine-to-machine<br />

(M2M) solution to an emergent problem<br />

in vehicle transportation. The result:<br />

PowerHydrant® eliminates electric vehicle<br />

charging inconvenience. Robotic conductive<br />

chargers beat wireless inductive chargers on<br />

efficiency, charger-time and constraint-free use.<br />

Contact: Kevin Leary (PowerHydrant)<br />

MEDICAL IMAGING & VISUALIZATION<br />

MI01 - Optimal Speed Gain for CUDA<br />

Implementation of SPECT Image Reconstruction<br />

<strong>GPU</strong> implementation can greatly accelerate<br />

iterative techniques of 3D image reconstruction in<br />

nuclear medicine imaging. To obtain high quality<br />

images in Single Photon Emission Computed<br />

Tomography (SPECT) within reduced scanning<br />

times, high sensitivity collimators need to be used<br />

and their response function modeled in the<br />

reconstruction. This is in general very<br />

computationally intensive and unfeasible with<br />

conventional PCs and algorithm implementations.<br />

Our software is able to perform the reconstruction<br />

of patient data within clinically acceptable times<br />

(18 s vs 17 min on CPU) using relatively low cost<br />

and widely available hardware.<br />

Contact: Jakub Pietrzak (RCPE-TU GRAZ)<br />

MI02 - Accelerating Mutual Information<br />

Computation for Nonrigid Registration on the <strong>GPU</strong><br />

Nonrigid registration is a technique for defining a<br />

geometric relation between each point in images.<br />

Although this technique helps medical doctors in<br />

detecting cancers by monitoring changes in size,<br />

some registration algorithms cannot be efficiently<br />

implemented due to small shared memory. The<br />

main objective of this poster is to show how such a<br />

capacity issue can be tackled for intra-operative<br />

registration. As an example, we present a CUDA-based<br />

method capable of rapidly computing joint<br />

histograms using shared memory. Our method<br />

achieved a three-fold speedup by exploiting the<br />

sparse structure of joint histograms, with<br />

successful registration of liver CT datasets.<br />

Contact: Kei Ikeda (Osaka University)<br />
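
For readers unfamiliar with the quantity being accelerated, the sketch below computes mutual information from a joint histogram of two images in plain Python. Storing only the non-empty bins in a `Counter` loosely mirrors the sparsity the abstract exploits, but this is an illustrative assumption and not the poster's CUDA method; images are given as flat lists of intensities.

```python
import math
from collections import Counter


def joint_histogram(fixed, moving):
    """Count co-occurring intensity pairs; only non-empty bins are stored."""
    return Counter(zip(fixed, moving))


def mutual_information(fixed, moving):
    """Mutual information in bits between two equally sized images."""
    joint = joint_histogram(fixed, moving)
    n = sum(joint.values())
    hist_fixed, hist_moving = Counter(), Counter()
    for (a, b), count in joint.items():
        hist_fixed[a] += count    # marginal histogram of the fixed image
        hist_moving[b] += count   # marginal histogram of the moving image
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (hist_fixed[a] * hist_moving[b]))
    return mi
```

Registration methods of this kind typically re-evaluate the measure for many candidate deformations, so the histogram step dominates the cost.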

MI03 - CUDA Accelerated Real Time Steered<br />

Spatial Compounding in Diagnostic Ultrasound<br />

Spatial compounding is a real time transmit and<br />

receive beam steering technique which acquires<br />

images from multiple lines of sight to increase the<br />

information content in medical ultrasound<br />

images. This function is implemented in the latest<br />

release of the ACUSON SC2000 platform for high<br />

frequency vascular imaging using CUDA texture<br />

lookups for geometric transformation to a<br />

common view. CUDA and the Quadro 2000 enable<br />

a substantial increase in processing performance<br />

(>8×) over conventional CPU based processing.<br />

Contact: Ismayil Guracar (Siemens Healthcare,<br />

Ultrasound Business Unit)<br />

MI04 - Ultrafast Multipinhole SPECT<br />

Iterative Reconstruction Using CUDA-Based<br />

<strong>GPU</strong> Computing<br />

We have developed an ultrafast SIR method for<br />

multipinhole SPECT programmed in CUDA and<br />

tested using a high performance graphics<br />

processing unit. We show significant performance<br />

improvement in reconstruction using both<br />

computer-generated and experimental<br />

sinograms, demonstrating an up-to fifty-fold<br />


speed enhancement with virtually the same<br />

accuracy as the CPU-based SIR (with 0.15%<br />

normalized root mean square error).<br />

Contact: Fares Alhassen (University of California,<br />

San Francisco)<br />

MOBILE APPLICATIONS & INTERFACES<br />

MA01 - Accelerating Computer Vision with<br />

Tegra <strong>GPU</strong><br />

The mobile platform is quickly becoming a serious<br />

computing device, capable of tackling complex<br />

computer vision tasks. The ability to share memory<br />

space between <strong>GPU</strong> and CPU on Tegra 3 offers a<br />

unique opportunity to utilize the <strong>GPU</strong> without<br />

expensive memory copies. We have demonstrated<br />

that acceleration of just a handful of compute-intensive<br />

CV operations on the Tegra 3 <strong>GPU</strong> can<br />

free some common bottlenecks and achieve real<br />

time performance on Video Stabilization and<br />

Panoramic Stitching applications.<br />

Contact: Colin Tracey (NVIDIA)<br />

MOLECULAR DYNAMICS<br />

MD01 - <strong>GPU</strong>-Based Molecular Dynamic<br />

Simulations Optimized with CUDPP and CURAND<br />

Libraries<br />

Computer simulations are indispensable tools for<br />

deciphering how biomolecular structures and<br />

folding correspond to functions. These simulations<br />

benefit greatly from advances in parallel<br />

computations (e.g., <strong>GPU</strong>s) because the calculated<br />

forces are inherently independent computations.<br />

However, a major limitation of <strong>GPU</strong>s is that the<br />

transfer of data between the CPU and <strong>GPU</strong> must be<br />

minimized. We introduce a new algorithm for<br />

calculating neighbor lists and transferring them to<br />

<strong>GPU</strong>s with minimal memory transfer. This<br />

algorithm is readily implemented with CUDPP and<br />

CURAND libraries. Using simulations of the<br />

ribosome, we observe a significant improvement in<br />

the performance, which is system size dependent.<br />

Contact: Tyson Lipscomb (Wake Forest University)<br />
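
A neighbor list records which particle pairs lie within a cutoff distance, so that force calculations can skip distant pairs. The abstract does not describe the new algorithm, so the Python sketch below shows only the straightforward all-pairs construction as a baseline; the function name and the tuple-based coordinates are assumptions.

```python
def build_neighbor_list(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is within cutoff."""
    cutoff_sq = cutoff * cutoff
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Compare squared distances to avoid a square root per pair.
            dist_sq = sum((a - b) ** 2
                          for a, b in zip(positions[i], positions[j]))
            if dist_sq <= cutoff_sq:
                pairs.append((i, j))
    return pairs
```

Because each pair test is independent, the construction itself parallelizes well; the cost that matters on a GPU is often moving the resulting list, which is the transfer the abstract aims to minimize.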

MD02 - Plane Wave Pseudopotential Density<br />

Functional Theory Calculations on <strong>GPU</strong> Clusters<br />

In this poster, we present our implementation of the<br />

density functional theory (DFT) plane wave pseudopotential<br />

(PWP) calculation on <strong>GPU</strong> clusters. This<br />

<strong>GPU</strong> version is developed based on a CPU DFT-PWP<br />

code: PEtot. Our test indicates that the <strong>GPU</strong> version<br />

can have a ~10 times speed-up over the CPU version<br />

and is about 5 times faster than the legendary VASP<br />

code. An analysis of the speed-up and the scaling on<br />

the number of CPU/<strong>GPU</strong> computing units (up to 256)<br />

is presented. The success of our speed-up relies<br />

on a hybrid reciprocal-space and band-index<br />

parallelization scheme.<br />

Contact: WeiLe Jia (Supercomputing Center of<br />

CNIC, Chinese Academy of Sciences)<br />

MD03 - Single vs. Double Precision MD<br />

Simulations: Correlation is Length-Scale<br />

Dependent<br />

This poster evaluates how single vs. double<br />

precision operations affect Molecular Dynamics<br />

simulations using <strong>GPU</strong>-optimized MD simulation<br />

software by performing coarse-grained MD<br />

simulations of many biologically relevant systems of<br />

various sizes. Three different measures of structural<br />

similarity are used to analyze the structure of<br />

trajectories and to determine when single-precision<br />

calculations are appropriate and when they are<br />

not. The conclusion is that using faster<br />

single-precision implementations of<br />

MD simulations makes no significant difference in<br />

the accuracy and precision of MD simulations if the<br />

system size is sufficiently large.<br />

Contact: Anqi Zou (Wake Forest University)<br />
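
As a rough illustration of why the single vs. double precision question arises at all (this is not the poster's software, nor one of its similarity measures), the Python sketch below adds a small increment many times, once with rounding to single precision after every operation and once in plain double precision. The helper names `to_float32` and `accumulate` are hypothetical and exist only for this example.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, n, single):
    """Add `step` to a running total n times.

    With single=True, the increment and every partial sum are rounded to
    single precision, mimicking a float32 accumulator.
    """
    inc = to_float32(step) if single else step
    total = 0.0
    for _ in range(n):
        total += inc
        if single:
            total = to_float32(total)
    return total


n = 10_000
exact = n / 10  # the mathematically exact result of adding 0.1 n times

single_err = abs(accumulate(0.1, n, single=True) - exact)
double_err = abs(accumulate(0.1, n, single=False) - exact)

print("single-precision drift:", single_err)
print("double-precision drift:", double_err)
```

The drift of the single-precision accumulator is larger than that of the double-precision one; whether such a difference matters for a particular structural observable is the kind of question the poster examines.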

MD04 - <strong>GPU</strong>-Based Monte Carlo Simulations for<br />

Canonical and Gibbs Ensembles<br />

Markov Chain Monte Carlo (MCMC) simulation of<br />

chemical systems allows examination of<br />

nanoscopic thermodynamics and associated<br />

behavior at small time scales. These simulations<br />

tend to be computationally expensive, requiring<br />

days or more of CPU time to collect data.<br />

Optimization work is essential in order to remedy<br />

the inherent time complexity of these simulations.<br />

To date, there is no multi-ensemble molecular<br />

MCMC engine for the simulation of chemical<br />

systems that leverages <strong>GPU</strong>s. Speedups of 6.3<br />

and 14.4 times were achieved for a problem size of<br />

131072 particles for the canonical and Gibbs<br />

ensemble implementations, respectively.<br />

Contact: Loren Schwiebert (Northeastern University)<br />

MD05 - Simultaneous Evolution of Multiple<br />

Molecular Dynamics Simulations<br />

The need to generate statistically significant data<br />

from time-intensive molecular dynamics (MD)<br />

simulations drives the search for algorithms that<br />

can take advantage of inherent parallelism in<br />

computer architectures. CUDA is an ideal platform<br />

for performing multiple MD simulations for<br />

ensemble averaging. We demonstrate a proof of<br />

concept highlighting the potential of CUDA in<br />

performing multiple MD simulations with different<br />

initial conditions. Compared to the traditional<br />

implementation, CUDA is able to deliver the output<br />

ten times faster. Work is in progress for improving<br />

the performance through memory optimization.<br />

Contact: Cory Slep (NC State University)<br />

MD06 - <strong>GPU</strong> Accelerated Molecular Dynamics<br />

Enabling Transformative Drug Development<br />

One powerful computational technique for the<br />

science of drug development has been the use of<br />

molecular dynamics (MD) simulations. MD<br />

simulations can explore the interactions between<br />

small molecule drugs and membrane-bound<br />

proteins on an atomic level. It is now possible to<br />

understand the biological function of drug targets<br />

through their structural motions. <strong>GPU</strong> computing<br />

is revolutionizing the field of MD, with <strong>GPU</strong><br />

accelerated MD code competing with national<br />

supercomputers. Our research goal is to use <strong>GPU</strong><br />

technology to not only improve MD performance,<br />

but to improve MD development and workflow for<br />

drug development.<br />

Contact: Benjamin Madej (University of California<br />

San Diego, San Diego Supercomputer Center)<br />

NEUROSCIENCE<br />

NS01 - Realtime Cerebellum: Realtime<br />

Simulation of a Realistic Cerebellar Model<br />

Realtime computing is a natural requirement for<br />

realtime signal processing and control. The<br />

cerebellum plays an essential role in motor<br />

learning and control. Once we build a cerebellar<br />

model running in realtime, the model could be<br />

used as a neural controller of hardware such as<br />

robots. We built a large-scale spiking network<br />

model of the cerebellum composed of more than<br />

100,000 neurons that runs in realtime. We<br />

succeeded in controlling a humanoid robot to hit a<br />

ball thrown by a pitching machine through online<br />

learning of the proper timing to swing a bat.<br />

Contact: Tadashi Yamazaki (RIKEN Brain<br />

Science Institute)<br />

NS02 - Computational Modeling of Human Head<br />

Electromagnetics Using <strong>GPU</strong>s<br />

This poster presents a computational environment<br />

ACSON that leverages <strong>GPU</strong> technology to<br />

accelerate the solution of the EEG forward problem,<br />

which is necessary to solve the neuroimaging<br />

inverse problem. Two finite difference algorithms,<br />

ADI and VAI, for solving the Poisson equation are<br />

presented. The ADI algorithm can only handle<br />

isotropic conductivities of the head tissue, while VAI<br />

can handle anisotropic conductivities as well. Their<br />

performance on different <strong>GPU</strong>s is evaluated and<br />

compared with an OpenMP implementation.<br />

Contact: Allen D. Malony (University of Chicago)<br />

PARALLEL PROGRAMMING LANGUAGES<br />

& COMPILERS<br />

PC01 - Automatic Mapping of Shared Memory<br />

<strong>Program</strong>s to <strong>GPU</strong>-Based Heterogeneous Systems<br />

Realizing the potential of <strong>GPU</strong>-based<br />

heterogeneous systems is challenging due to the<br />

complexity of programming. We have developed a<br />

compiler-based approach to automatically generate<br />

optimised OpenCL code from shared memory<br />

OpenMP programs. A key feature of our scheme is<br />

that it leverages existing transformations, especially<br />

data transformations, to improve performance on<br />

<strong>GPU</strong> architectures. As not all programs are suitable<br />

for <strong>GPU</strong> execution, it uses predictive modeling to<br />

automatically determine if it is worthwhile running<br />

the OpenCL code on the <strong>GPU</strong> or OpenMP code on<br />

the multi-core host.<br />

Contact: Dominik Grewe (University of Edinburgh)<br />

PC02 - GKLEE: Practical Concolic Verification<br />

and Test Generation for <strong>GPU</strong>s<br />

We provide a new framework called GKLEE that can<br />

analyze C++ <strong>GPU</strong> programs, locating the important<br />

correctness and performance bugs. For these<br />

programs, GKLEE can also automatically generate<br />

tests that provide high coverage, and these tests<br />

can later be run on the hardware to cross-check<br />

results. It helps pin-point memory accesses and<br />

execution steps that cause performance<br />

degradation. It also provides a versatile user<br />

interface. GKLEE has detected bugs and issues in<br />

many CUDA SDK kernels, and also has been able to<br />

handle non-trivial multi-kernel examples.<br />

Contact: Peng Li (School of Computing, University<br />

of Utah)<br />

PC03 - <strong>GPU</strong> Ocelot: Dynamic Compilation for PTX<br />

<strong>GPU</strong> Ocelot is an open-source dynamic JIT<br />

compilation framework for <strong>GPU</strong> compute<br />

applications targeting a range of <strong>GPU</strong> and non-<strong>GPU</strong><br />

execution targets. Ocelot supports CUDA<br />

applications and provides an implementation of the<br />

CUDA Runtime API enabling seamless integration<br />

with existing CUDA applications. Its JIT compiler<br />

supports four backend execution targets - (1) an<br />

emulator that implements NVIDIA’s Parallel Thread<br />

Execution (PTX) instruction set architecture, (2)<br />

NVIDIA <strong>GPU</strong>s, (3) AMD <strong>GPU</strong>s, and (4) a translator to<br />

LLVM for efficient parallel execution of <strong>GPU</strong> kernels<br />

on multicore CPUs. Existing CUDA applications are<br />

seamlessly supported.<br />

Contact: Andrew Kerr (Georgia Institute<br />

of <strong>Technology</strong>)<br />

PC04 - Legion: Expressing Locality and<br />

Independence with Logical Regions<br />

Modern parallel architectures have both<br />

heterogeneous processors and deep, complex<br />

memory hierarchies. We present Legion, a<br />

programming model and runtime system for<br />

programming these machines. Legion is<br />

organized around logical regions, which express<br />

both locality and independence of program data.<br />

Legion also enables explicit, programmer-<br />

controlled movement of data through the memory<br />

hierarchy and placement of tasks based on locality<br />

information via a novel mapping interface.<br />

Running on a 4-node cluster with 8 total <strong>GPU</strong>s and<br />

4 levels of memory hierarchy, our implementation<br />

of Legion achieves a 5.9X speedup over a single<br />

CPU-<strong>GPU</strong> node on real-world applications.<br />

Contact: Michael Bauer (Stanford University)<br />

PC05 - Compilation Techniques for Demand-<br />

Driven Execution on Heterogeneous<br />

Architectures<br />

In order to leverage massive parallelism, there has<br />

been a resurgence of demand-driven programming<br />

models. The goal of this work is to develop<br />

compilation techniques and language extensions<br />

for existing imperative parallel programming<br />

languages that will then be mapped onto<br />

heterogeneous parallel architectures. In particular,<br />

this work addresses the following topics: automatic<br />

generation of task-graphs from explicitly parallel<br />

loops, programming language extensions to<br />

provide the ordering constraints between sections<br />

of code, and the mapping of data and computation<br />

onto massively parallel architectures.<br />

Contact: Albert Sidelnik (Globo Network)<br />

PC06 - DL: A Data Layout Transformation System<br />

for Heterogeneous Computing<br />

DL combines two contributions. First, it introduces a<br />

novel approach to laying out arrays of aggregate types<br />

across <strong>GPU</strong> and CPU architectures that further<br />

improves memory parallelism and kernel performance<br />

beyond what is achieved by human programmers using<br />

discrete arrays today. Our proposed new layout can be<br />

derived in situ from the traditional Array of<br />

Structures, Structure of Arrays, and adjacent<br />

Discrete Arrays layouts used by programmers.<br />

Second, DL has novel in-place layout conversion<br />

algorithms implemented as part of a run-time<br />

library for OpenCL that transparently converts<br />

data to accommodate application components<br />

that have different data layout requirements.<br />

Contact: I-Jui Sung (University of Illinois at<br />

Urbana-Champaign)<br />
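
For readers unfamiliar with the two traditional layouts named in this abstract, the Python sketch below shows how the same records are indexed in a flat buffer under Array of Structures (AoS) and Structure of Arrays (SoA). It is only an illustration: the conversion is a simple out-of-place copy, not DL's in-place algorithm or its proposed new layout, and all function names are hypothetical.

```python
def aos_index(i, f, n, num_fields):
    """Flat index of field f of element i when fields of one element are adjacent (AoS)."""
    return i * num_fields + f


def soa_index(i, f, n, num_fields):
    """Flat index of field f of element i when each field has its own contiguous array (SoA)."""
    return f * n + i


def aos_to_soa(flat, n, num_fields):
    """Copy an AoS buffer into a new SoA buffer (out-of-place, for illustration only)."""
    out = [None] * (n * num_fields)
    for i in range(n):
        for f in range(num_fields):
            out[soa_index(i, f, n, num_fields)] = flat[aos_index(i, f, n, num_fields)]
    return out


def soa_to_aos(flat, n, num_fields):
    """Inverse of aos_to_soa: copy an SoA buffer into a new AoS buffer."""
    out = [None] * (n * num_fields)
    for i in range(n):
        for f in range(num_fields):
            out[aos_index(i, f, n, num_fields)] = flat[soa_index(i, f, n, num_fields)]
    return out


# Three particles with two fields each, stored as (x0, y0), (x1, y1), (x2, y2).
particles_aos = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1]
particles_soa = aos_to_soa(particles_aos, n=3, num_fields=2)
print(particles_soa)  # [0.0, 1.0, 2.0, 0.1, 1.1, 2.1]
```

In the SoA buffer, consecutive elements of the same field sit next to each other, which is the access pattern that adjacent GPU threads reading one field typically prefer.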

RAY TRACING<br />

RT01 - Searching for Cold Trapped Resources in<br />

the Lunar Regolith<br />

Our poster describes a ray tracing technique<br />

applied to the latest digital elevation models of the<br />

Moon in an effort to find permanent shadows<br />

where water ice may be cold trapped. Some of the<br />

shadows we found are characterized with surface<br />

temperature measurements from the Diviner<br />

mid-infrared radiometer on the Lunar<br />

Reconnaissance Orbiter.<br />

Contact: Andy McGovern (Irish Centre for High-End<br />

Computing (ICHEC))<br />

SUPERCOMPUTING<br />

SC01 - Multi-<strong>GPU</strong> Computing<br />

Our poster details several projects that make<br />

multi-<strong>GPU</strong> computing easy. It presents our work on<br />

a callback method for <strong>GPU</strong>s (presented at<br />

UCHPC 2010), a message-passing interface for <strong>GPU</strong>s<br />

(IPDPS 2009), a heterogeneous computational-resource<br />

scheduler (EG 2009), and a multi-<strong>GPU</strong><br />

MapReduce implementation (IPDPS 2011).<br />

Contact: Jeffery Stuart (UC Davis)<br />

SC02 - Automatic Generation of FFT Libraries<br />

for <strong>GPU</strong>s<br />

In this poster we present an extension of the<br />

Spiral code generation system to <strong>GPU</strong>s. We<br />

address the key problems of <strong>GPU</strong> memory<br />

hierarchy and parallelism, and we introduce a<br />

variety of FFT algorithms which avoid shared<br />

memory bank conflicts without wasting space<br />

on padding and that optimize global memory<br />

bandwidth with minimum register<br />

allocation, even at low occupancy. We demonstrate<br />

high performance results against cuFFT 1-D and<br />

2-D DFTs for single precision. This research is still<br />

in progress, but at the moment we are able to<br />

match and beat the cuFFT library on the sizes for which we<br />

have generated optimized code.<br />

Contact: Christos Angelopoulos (Carnegie<br />

Mellon University)<br />

SC03 - Computational and Simulation Sciences:<br />

Applications of Heterogeneous Computing<br />

As the size and complexity of scientific problems<br />

grow, scientists from a broad range of discipline<br />

areas are relying more on computational methods<br />

and simulations to help solve their problems. This<br />

work presents a summary of heterogeneous<br />

algorithms and applications that have been<br />

developed by CSIRO for solving practical and<br />

challenging science problems faster than is<br />

possible with conventional multi-core CPUs alone.<br />

The problem domains include: CFD, imaging and<br />

visualization, advanced materials modeling,<br />

computational biology, geosciences and climate<br />

research. The algorithms utilize NVIDIA <strong>GPU</strong>s<br />

and multi-core CPUs on a scale ranging from<br />

single workstation installations through to large<br />

<strong>GPU</strong> clusters.<br />

Contact: Tomasz Bednarz (CSIRO)<br />

SC04 - 75-Round SHA-1 Collision Search Using<br />

<strong>GPU</strong> Clusters<br />

SHA-1 is one of the most widely used<br />

cryptographic hash functions. We ported the method of<br />

characteristics for SHA-1 collision search to<br />

<strong>GPU</strong> clusters. Using it, we found a collision for the<br />

75-round version of SHA-1, which is currently the<br />

world record.<br />

Contact: Andrew Adinetz (Lomonosov Moscow<br />

State University)<br />

SC05 - <strong>GPU</strong> Clusters for Large-Scale Analysis of<br />

X-Ray Scattering Data<br />

X-ray scattering is a valuable tool for measuring<br />

the structural properties of materials used in the<br />

design and fabrication of energy-relevant<br />

nanodevices. A primary challenge here is in the<br />

analysis of data due to its generation rate and<br />

size. We are developing novel HPC algorithms and<br />

codes for such analyses. Here we present two<br />

advances using <strong>GPU</strong>s. First, a flexible Grazing Incidence<br />

Small Angle Scattering simulation code that<br />

can compute the scattered light intensity from any<br />

given sample in all directions of space. Second, an<br />

efficient inverse modeling code for structural<br />

fitting problems using the Reverse Monte Carlo (RMC)<br />

simulation algorithm.<br />

Contact: Abhinav Sarje (Wayne State University)<br />

VISUALIZATION<br />

VZ01 - CNC Tool Path Planning and Machining<br />

Simulation on <strong>GPU</strong><br />

Today, a major part of the cost of low-volume<br />

manufacturing involving CNC machining is the cost of tool<br />

path planning performed by an engineer. The goal<br />

of this research is to develop an automatic CNC<br />

machine tool path planning and simulation<br />

system. To achieve reasonable<br />

performance, we use a GP<strong>GPU</strong> approach for<br />

geometry processing and propose to develop a<br />

new solid geometry representation designed specifically<br />

for parallel processing and GP<strong>GPU</strong>,<br />

which will become the basis for a new automatic tool<br />

path planning system and will also significantly<br />

increase the speed and accuracy of machining<br />

process simulation.<br />

Contact: Dmytro Konobrytskyi (Clemson University)<br />

VZ02 - <strong>GPU</strong>-Accelerated Power System<br />

State Visualization<br />

Modern energy management systems aim to<br />

provide situational awareness to grid operators<br />

using a variety of tools. Advances in technology<br />

such as high-frequency data from phasor<br />

measurement units distributed across the system<br />

support the display and analysis of the dynamic<br />

state of the power grid. Scattered data interpolation<br />

is a computationally intensive problem that benefits<br />

massively from parallel implementations on <strong>GPU</strong>s.<br />

This poster presents a highly optimized network<br />

state visualization system that fully exploits<br />

programmable graphics hardware and delivers<br />

three orders of magnitude performance<br />

improvements while offering extra features<br />

compared to a traditional, CPU-based approach.<br />

Contact: Martin Naef (NVIDIA)<br />

VZ03 - Image Treatment Implementing Extended<br />

Depth of Field with NVIDIA CUDA<br />

Extended depth of field (EDF) is a specific method<br />

used to analyze and treat specific image zones in<br />

optical research. Due to the complexity of the EDF<br />

and the large volume of data processed in optics<br />

problems, EDF is a good candidate for processing on<br />

parallel architectures. This work is an<br />

implementation of parallel extended depth of field<br />

using NVIDIA CUDA. We propose a solution<br />

algorithm aimed at a multicomputer cluster and<br />

shared memory, represented by a hybrid parallel<br />

machine based on NVIDIA <strong>GPU</strong>s. Moreover, a<br />

performance evaluation in terms of execution<br />

time is presented, followed by a discussion of<br />

this approach.<br />

Contact: Mónica Liliana Hernández Ariza<br />

(Universidad Industrial de Santander)<br />

VZ04 - Diderot: A Parallel DSL for Image<br />

Analysis and Visualization<br />

The analysis of structure in three-dimensional<br />

images is increasingly important for biomedical<br />

research and computational science. In this<br />

poster, we outline ongoing work developing<br />

Diderot, a parallel domain-specific language for<br />

three-dimensional image visualization and<br />

analysis algorithms, such as volume rendering,<br />

fiber tractography, and particle systems. Diderot<br />

supports a high-level mathematical computation<br />

model coupled with a batch-synchronous<br />

parallelism model. The poster further describes<br />

Diderot’s <strong>GPU</strong> implementation and its high<br />

performance measurements on <strong>GPU</strong>s versus<br />

other sequential and parallel platforms.<br />

Contact: Lamont Samuels (Lawrence Berkeley<br />

National Laboratory)<br />

<strong>GTC</strong> <strong>2012</strong><br />

SPEAKERS & PANELISTS<br />

Alexey Abramov<br />

PhD Student (University of Gottingen)<br />

Alexey Abramov received the M.Sc. degree in Computer<br />

Science from the Moscow Engineering and Physics<br />

Institute (State University), Moscow, Russia. Currently he<br />

is a PhD student at the Georg-August University,<br />

Goettingen, Germany. His research interests include<br />

image processing, image segmentation and object<br />

tracking, stereo image processing and real-time<br />

computer vision with high-performance computing on<br />

parallel hardware.<br />

Session(s): S0075 - Oculus Real-Time Modular<br />

Cognitive Vision System (Tuesday, 15:00, Room: A1)<br />

Robert Alexander<br />

CUDA Tools Software Engineer (NVIDIA)<br />

Robert Alexander is a software engineer on the NVIDIA<br />

Tesla Platform Software team. His focus is on<br />

management, monitoring and diagnostics of <strong>GPU</strong>s in a<br />

cluster environment. His work includes the NVIDIA<br />

Management Library (NVML), the NVIDIA System<br />

Management Interface (nvidia-smi), and he is<br />

responsible for the Perl and Python NVML bindings.<br />

Robert has a BS in Computer Science from the<br />

Rochester Institute of <strong>Technology</strong>.<br />

Session(s): S0238 - Tesla Cluster Monitoring &<br />

Management APIs (Thursday, 09:30, Room: K)<br />

Alina Alt<br />

Applied Engineer (NVIDIA)<br />

Alina Alt is an Applied Engineer at NVIDIA where her<br />

responsibilities include helping users incorporate<br />

NVIDIA’s <strong>GPU</strong>s, video products and video related driver<br />

features into their solutions and applications. Her past<br />

experience includes developing augmented reality<br />

applications for live sports telecasts and developing a<br />

scalable, CPU-based cluster graphics driver.<br />

Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />

Round Table (Monday, 14:30, Room: A2)<br />

S0049 - Using the <strong>GPU</strong> Direct for Video API<br />

(Tuesday, 15:00, Room: J2)<br />

S0267A - Mixing Graphics and Compute with<br />

Multiple <strong>GPU</strong>s (Tuesday, 17:00, Room: J2)<br />

S0326 - Next Generation InfoWall<br />

(Thursday, 09:00, Room: A1)<br />

S0267B - Mixing Graphics and Compute<br />

with Multiple <strong>GPU</strong>s (Thursday, 15:30, Room: L)<br />

Minesh B. Amin<br />

Founder / CEO (MBA Sciences)<br />

Dr. Minesh B. Amin is Founder & CEO of MBA Sciences,<br />

Inc. MBA Sciences enables engineers and scientists to<br />

rapidly prototype, analyze and deploy robust parallel<br />

solutions across heterogeneous computing resources<br />

spanning servers, cores and <strong>GPU</strong>s from either data<br />

centers or public clouds. Previously he worked at<br />

Synopsys, Inc, where he helped prototype, implement<br />

and deploy several parallel versions of existing serial<br />

products including the TetraMax TenX ATPG product and<br />

PrimeTime DMSA. Dr. Amin received his PhD from the<br />

University of Minnesota.<br />

Session(s): S0299 - Exploiting Fault Tolerant<br />

Heterogeneous Parallelism with SPM.Python<br />

(Wednesday, 16:00, Room: C)<br />

Joshua Anderson<br />

Research Area Specialist (University of Michigan)<br />

Joshua Anderson is a Research Area Specialist in the<br />

Laboratory for Computational Nanoscience & Soft<br />

Matter Simulation at the University of Michigan. Dr.<br />

Anderson holds a Ph.D. degree in Condensed Matter<br />

Physics from Iowa State University and is the lead<br />

developer of HOOMD-blue, a high performance particle<br />

simulation tool. His current research interests include<br />

<strong>GPU</strong> computing, polymer physics, and nanoparticle<br />

self-assembly.<br />

Session(s): S0058 - Advancing <strong>GPU</strong> Molecular<br />

Dynamics: Rigid Bodies in HOOMD-blue<br />

(Wednesday, 10:00, Room: N)<br />

Roberto Ansaloni<br />

(Cray Italy)<br />

Biography unavailable at press time.<br />

Session(s): S0286 – Scaling Applications to a<br />

Thousand <strong>GPU</strong>s and Beyond<br />

(Wednesday, 16:00, Room: A2)<br />

Santosh Ansumali<br />

(Faculty Fellow, Engineering Mechanics Unit, JNCASR,<br />

Bangalore)<br />

Dr. Ansumali is a faculty member at EMU, JNCASR, and has<br />

held a Ramanujan Fellowship from DST India since July<br />

2009. Prior to this, he was an Assistant Professor at NTU,<br />

Singapore, from August 2005. He received his PhD from<br />

ETH Zurich (Switzerland) for work on mesoscale simulation<br />

methods. His research area is mesoscale simulation<br />

methods and high performance computing based on<br />

Kinetic theory.<br />

Session(s): S0428 – Panini: A <strong>GPU</strong> Aware Array<br />

Class (Thursday, 16:00, Room: B)<br />

Takayuki Aoki<br />

Professor (Tokyo Institute of <strong>Technology</strong>)<br />

Takayuki Aoki received a Dr. Sci (1989) from Tokyo<br />

Institute of <strong>Technology</strong>, was a visiting researcher at the<br />

Max-Planck Institute in Germany for one year, and has been<br />

a professor at Tokyo Institute of <strong>Technology</strong> since 2001.<br />

He has received the Computational Mechanics<br />

Achievement Award from Japan Society of Mechanical<br />

Engineers and many awards and honors in visualization.<br />

He is also the vice president of the Japan Association for<br />

Computational Mechanics. He authored the first<br />

Japanese-language book on CUDA<br />

programming and applications. His research covers<br />

numerical schemes for CFD, numerical weather models,<br />

HPC applications on graphics processors, multi-phase<br />

flows, and simulation of natural disasters.<br />

Session(s): S0412 - A 2-Petaflops Stencil<br />

Application with Stereoscopic 3D Visualization -<br />

Gordon Bell Prize 2011 (Tuesday, 14:00, Room: A1)<br />

Jeremy Appleyard<br />

Analyst (Polyhedron Software Ltd)<br />

Biography unavailable at press time.<br />

Session(s): S0432 – New Ideas for Massively<br />

Parallel Preconditioners<br />

(Wednesday, 15:00, Room: A7)<br />

John Appleyard<br />

Managing Director (Polyhedron Software Ltd)<br />

BA, MA and PhD from Cambridge University. One of the<br />

original developers of the Eclipse Oil Reservoir<br />

Simulator and an MD of Polyhedron Software Ltd.<br />

Session(s): S0432 - New Ideas for Massively<br />

Parallel Preconditioners<br />

(Wednesday, 15:00, Room: A7)<br />

Arutyun Avetisyan<br />

Deputy Director (Institute for System <strong>Program</strong>ming,<br />

Russian Academy of Sciences)<br />

Arutyun Avetisyan is Deputy Director of the Institute for<br />

System <strong>Program</strong>ming of the Russian Academy of<br />

Sciences (ISP RAS). His research interests are in the<br />

areas of compiler technologies, HPC and Cloud<br />

computing. He leads several projects, including<br />

research on compiler support for heterogeneous<br />

systems. He represents RAS on the Steering Committee of<br />

the Open Cirrus Community, the global cloud computing<br />

testbed for research projects. He is PI of the national<br />

“University Cluster” program, which includes the<br />

technology platform (unihub.ru) for<br />

creating a wide range of services within a single<br />

infrastructure, e.g. subject-specific web-labs.<br />

Session(s): S0115 – Specialized Sparse Matrix<br />

Formats and SpMV Kernel Tuning for <strong>GPU</strong>s<br />

(Wednesday, 10:30, Marriott Ballroom 3)<br />

Brendan Babb<br />

Student/Research Technician (University of<br />

Alaska Anchorage)<br />

Brendan Babb has over 20 years of experience as a<br />

software programmer and analyst in the engineering<br />

and telecommunications industries with a background in<br />

Mathematics. He holds three patents in error detection<br />

and correction and his current interests are in<br />

Evolutionary Computation, Biomimicry, GP<strong>GPU</strong> and their<br />

collective application to optimizing renewable energy<br />

solutions. Since 2005 he has used evolutionary<br />

computation to evolve wavelet-like transforms that<br />

improve image compression for photographic, fingerprint,<br />

satellite, CT scan, ultrasound and Mars Rover images.<br />

Session(s): S0133 - Improving Mars Rover Image<br />

Compression Via <strong>GPU</strong>s And Genetic Algorithms<br />

(Thursday, 09:00, Room: A3)<br />

Ronald Babich<br />

Research Scientist (NVIDIA)<br />

Ron Babich is a Research Scientist at NVIDIA, where he<br />

works at the intersection of algorithms and architecture,<br />

with a particular focus on high-performance computing. He<br />

was previously a postdoctoral fellow in Boston University’s<br />

Center for Computational Science and received his PhD in<br />

Physics from Boston University in 2009.<br />

Session(s): S0368 - Unraveling the Mysteries of<br />

Quarks with Hundreds of <strong>GPU</strong>s<br />

(Thursday, 15:00, Room: K)<br />

Philip A. Beasley-Harling<br />

(Bank of America Merrill Lynch)<br />

Biography unavailable at press time.<br />

Session(s): S0656 - kdb+ and <strong>GPU</strong>s for Market Data<br />

Analytics and Trading (Wednesday, 17:30, Room: L)<br />

Dan Bailey<br />

R&D (Double Negative)<br />

Dan Bailey is working in Research and Development at<br />

Double Negative, where he is driving the adoption of the<br />

<strong>GPU</strong> and increased parallelism in general. His primary<br />

focus is the proprietary fluid solver, where a strong<br />

educational background in Computer Science has<br />

complemented an interest in fluid simulation. His<br />

research concentrates on languages and parallel<br />

compiler technology, but with a strong leaning towards<br />

its use in production.<br />

Session(s): S0300 - Jet: A Domain-Specific<br />

Approach to Parallelism for Film Fluid Simulation<br />

(Tuesday, 10:00, Room: A2)<br />

Tim Bajarin<br />

President (Creative Strategies)<br />

Tim Bajarin is recognized as one of the leading industry<br />

consultants, analysts and futurists, covering the field of<br />

personal computers and consumer technology. Mr.<br />

Bajarin has been with Creative Strategies since 1981 and<br />

has served as a consultant to most of the leading<br />

hardware and software vendors in the industry including<br />

IBM, Apple, Xerox, Hewlett Packard/Compaq, Dell, AT&T,<br />

Microsoft, Polaroid, Lotus, Epson, Toshiba and<br />

numerous others. His articles and/or analysis have<br />

appeared in USA Today, Wall Street Journal, The New<br />

York Times, Time and Newsweek magazines,<br />

BusinessWeek and most of the leading business and<br />

trade publications. He has appeared as a business<br />

analyst commenting on the computer industry on all of<br />

the major television networks and was a frequent guest<br />

on PBS’ The Computer Chronicles. Mr. Bajarin has been<br />

a columnist for US computer industry publications such<br />

as PC Week and Computer Reseller News and wrote for<br />

ABCNEWS.COM for two years and Mobile Computing for<br />

10 years. His columns currently appear in Asia<br />

Computer Weekly, Personal Computer World (UK), and<br />

Microscope (UK) as well as Mobile Enterprise Magazine.<br />

His various columns and analyses are syndicated in over<br />

30 countries.<br />

Session(s): S2003 – Emerging Companies Summit<br />

Fireside Chat with Jen-Hsun Huang (CEO,<br />

President and Co-Founder, NVIDIA) and Tim<br />

Bajarin (President, Creative Strategies)<br />

(Wednesday, 14:00, Marriott Ballroom 4)<br />

Zack Baker<br />

(Los Alamos National Laboratory)<br />

Biography unavailable at press time.<br />

Session(s): S0702 - Los Alamos AHPC Symposium,<br />

The Architecture of Acceleration in HPC<br />

(Wednesday, 15:30, Room: J1)<br />

Robert Balgley<br />

CEO (Mersive)<br />

Over the past 20 years Balgley has worked as CEO of<br />

several category-defining companies funded by some of<br />

the most successful venture capital firms in the world.<br />

Prior to Mersive, he was CEO of SkyeTek, the worldwide<br />

market share leader in embedded RFID readers and<br />

technology. Prior to that, Balgley was CEO of Jabber, the<br />

pioneer and leader in enterprise instant messaging,<br />

which was later acquired by Cisco Systems. Before<br />

Jabber, he was CEO of Mobile Logic, an early market<br />

leader of mobile data networking software which was<br />

acquired in 2000. Earlier in his career, Balgley held<br />

executive positions in sales and marketing at GE, 3Com,<br />

Hughes Aircraft and Case Communications.<br />

h Session(s): S2005 – Emerging Companies Summit:<br />

CEO on Stage Featuring RealView Imaging,<br />

Elemental Technologies, and Mersive<br />

(Wednesday, 16:00, Marriott Ballroom 4)<br />

Bill Barth<br />

Director of High Performance Computing (Texas Advanced<br />

Computing Center, University of Texas at Austin)<br />

Bill Barth is the Director of High Performance<br />

Computing at the Texas Advanced Computing Center<br />

where he oversees the use of TACC’s large-scale HPC<br />

resources by a diverse international community of<br />

scientists and researchers. Dr. Barth received his PhD<br />

from the Aerospace Engineering Department of The<br />

University of Texas in 2004 where he worked on finite<br />

element methods for incompressible flow and transport<br />

problems. His current interests include network topology<br />

aware job scheduling and MPI communication,<br />

physics-based flow visualization, software tools for<br />


large-scale clusters, and the design and deployment of<br />

leadership-class supercomputers.<br />

h Session(s): Los Alamos AHPC Symposium,<br />

Stampede System Architecture and Early<br />

Accelerator <strong>Program</strong>ming Experiences<br />

(Wednesday, 14:00, Room: J1)<br />

Francesco Basile<br />

Software Engineer (MBI srl)<br />

Basile obtained his joint PhD in Mathematical Physics at<br />

University of Pisa / Brunel University London in 2008.<br />

Since 2008 he has devoted his strong mathematical<br />

background to the analysis of digital radio signal processing.<br />

h Session(s): S0065 – Satellite HUB Communication<br />

System <strong>GPU</strong> Based (Thursday, 16:30, Room: M)<br />

Bela Bauer<br />

Postdoc (Microsoft Research)<br />

Biography unavailable at press time.<br />

h Session(s): S0039 – Data-Driven GP<strong>GPU</strong> Ideology<br />

Extension (Thursday, 10:00, Marriott Ballroom 3)<br />

Janusz Bedkowski<br />

Researcher<br />

Janusz has been a researcher in the area of mobile robotics<br />

- navigation, 3D modeling and simulation since 2006. He<br />

is working in cooperation with the following institutions:<br />

Warsaw University of <strong>Technology</strong>, faculty of Mechatronics<br />

(education), Industrial Research Institute for<br />

Automation and Robotics (researcher, mobile robot<br />

design and programming), Institute of Mathematical<br />

Machines (researcher, simulation and modeling using<br />

parallel computing).<br />

h Session(s): S0081 - Parallel Computing In Mobile<br />

Robotics for RISE (Thursday, 09:30, Room: A3)<br />

Nathan Bell<br />

Senior Research Scientist (NVIDIA)<br />

Nathan Bell joined NVIDIA Research in August 2008. His<br />

current research interests include sparse linear algebra<br />

and programming models for parallel computing.<br />

Nathan contributes to several open source projects<br />

including Thrust, a high-level parallel template library,<br />

Cusp, a library for sparse linear algebra and graph<br />

algorithms, and PyAMG, a library of algebraic multigrid<br />

methods in Python. Nathan received a bachelor’s degree<br />

in Computer Science from Georgia Tech and a Ph.D. in<br />

Computer Science from the University of Illinois at<br />

Urbana-Champaign (UIUC).<br />

h Session(s): S0602 - An Introduction to the<br />

Thrust Parallel Algorithms Library<br />

(Tuesday, 17:00, Room: A3)<br />

Tomer Ben-David<br />

Co-Founder and Vice President, R&D (Rocketick)<br />

Tomer co-founded Rocketick in 2008 and has since<br />

served the company as VP of R&D. Tomer<br />

brings 15 years of experience in management and<br />

engineering of software and hardware products. He<br />

previously worked at Intel Corporation, Siliquent<br />

(acquired by Broadcom) and Mellanox. Tomer holds a<br />

B.Sc. (Cum Laude) in Computer Engineering from the<br />

Technion – Israel Institute of <strong>Technology</strong>, and an<br />

Executive MBA from Recanati School of Business,<br />

Tel-Aviv University.<br />

h Session(s): S0520 - Using <strong>GPU</strong>s to Speedup Chip<br />

Verification (Tuesday, 10:00, Room: J3)<br />

h S2004 – Emerging Companies Summit: CEO on<br />

Stage Featuring Raytrix, Rocketick, and Ubitus<br />

(Wednesday, 17:00, Marriott Ballroom 4)<br />

Thomas Benson<br />

Research Engineer II (Georgia Tech Research Institute)<br />

Thomas Benson is a Research Engineer with Georgia<br />

Tech Research Institute, where his research focus and<br />

interests include high-performance computing,<br />

high-performance embedded computing, radar signal<br />

processing and medical imaging, heterogeneous<br />

computing, and programming models related to such<br />

systems. He holds a Ph.D. in Computer Science from the<br />

University of Tennessee, Knoxville, and has nearly five<br />

years of post-graduate industrial research experience<br />

with GE Global Research in the field of medical imaging,<br />

specifically image reconstruction and related algorithms<br />

for X-ray computed tomography (CT). His experience<br />

includes developing large-scale real-time processing<br />

implementations for several of his fields of research.<br />

h Session(s): S0316 - Using <strong>GPU</strong>s to Accelerate<br />

Synthetic Aperture Sonar Imaging via<br />

Backpropagation (Tuesday, 15:30, Room: J3)<br />

Mike Bernhardt<br />

(The Exascale Report)<br />

Mike Bernhardt is a well-respected strategic marketing,<br />

communications, media relations and electronic<br />

publishing consultant with 25 years of experience<br />

serving the HPC community. Bernhardt founded The<br />

Exascale Report in 2010 to serve as the voice of the<br />

emerging exascale community. Today, the subscription-based<br />

Exascale Report is a widely read publication from<br />

which articles and extracts have been presented to<br />

numerous governmental bodies to help drive funding<br />

and political commitment discussions on a global scale.<br />

As an independent consultant, Bernhardt has worked<br />

with dozens of companies throughout the global HPC<br />

ecosystem on branding, marketing, strategic<br />

communications and public speaking programs.<br />

Bernhardt is a former Intel marketing executive and<br />

currently serves as a consultant or Board-level advisor<br />

to a number of privately held organizations.<br />

h Session(s): S0531 - Exascaling Your Apps<br />

(Wednesday, 09:00, Room: C)<br />

James Beyer<br />

Software Engineer (Cray Inc)<br />

James Beyer received his Ph.D. from University of<br />

Minnesota. He has been a member of the Cray<br />

<strong>Program</strong>ming Environment Optimization team for more<br />

than 12 years. He has represented Cray on the OpenMP<br />

language committee and ARB since Cray rejoined the<br />

organization. He led the effort to redesign the Cray<br />

OpenMP implementation to improve optimizer<br />

integration. He authored the original OpenMP for<br />

Accelerators, OpenMP4ACC, proposal and co-chairs the<br />

OpenMP language subcommittee on Accelerators.<br />

James was the primary Cray representative during the<br />

design of the OpenACC specification. He is currently<br />

actively involved in the Cray implementations of OpenMP,<br />

OpenACC and OpenMP4ACC.<br />

h Session(s): S0089 - Accelerator Directives,<br />

OpenACC and OpenMP4ACC<br />

(Tuesday, 16:00, Room: A3)<br />

Johanna Beyer<br />

Postdoctoral Fellow (King Abdullah University of Science<br />

and <strong>Technology</strong>)<br />

Johanna Beyer is a postdoctoral fellow at the Geometric<br />

Modeling and Scientific Visualization Center at King<br />

Abdullah University of Science and <strong>Technology</strong> (KAUST),<br />

Saudi Arabia. She holds an M.Sc. in medical software<br />

engineering (2004, University of Applied Sciences<br />

Hagenberg, Austria) and a Ph.D. in computer science<br />

(2009, University of <strong>Technology</strong> Vienna, Austria). Her<br />

research focuses on <strong>GPU</strong>-based volume rendering<br />

techniques for medical and neuroscience applications,<br />

with emphasis on visualization of large and multi-modal<br />

data. She regularly publishes at IEEE TVCG/IEEE<br />

Visualization.<br />

h Session(s): S0202 - Terascale Volume Visualization<br />

in Neuroscience (Wednesday, 16:30, Room: A8)<br />

Tim Bi<br />

Graduate Research Analyst (Johns Hopkins University /<br />

George Mason University)<br />

Tim Bi is a Bioinformatics Ph.D. candidate at George<br />

Mason University currently working as a GRA for Dr.<br />

Saleet Jafri and contributing to the efforts of improving<br />

the <strong>GPU</strong> program for Calcium Induced Calcium Release.<br />

He is also working for Dr. Diane Becker at the Johns<br />

Hopkins School of Medicine contributing to the GWAS<br />

studies being conducted at the GeneSTAR lab.<br />

h Session(s): S0272 - <strong>GPU</strong> GWAS - CUDA Based<br />

Genome Wide Association Studies<br />

(Wednesday, 10:30, Room: B)<br />

James Bigler<br />

Sr. Software Engineer (NVIDIA)<br />

James Bigler is currently working for NVIDIA as a Sr.<br />

Software Engineer developing OptiX, a <strong>GPU</strong> accelerated<br />

ray tracing framework. His work with ray tracing dates<br />

back to 2000 at the University of Utah where he worked<br />

under Dr. Steven Parker researching and developing<br />

parallel ray tracing applications for rendering and<br />

scientific visualization. Since coming to NVIDIA in 2008,<br />

James has strived to bring more ray tracing<br />

awesomeness to everyone through OptiX. James holds a<br />

B.S. and M.S. in Computer Science from the University of<br />

Utah.<br />

h Session(s): S0366 - OptiX Out-of-Core and CPU<br />

Rendering (Tuesday, 15:30, Room: J1)<br />

Sam Blackman<br />

CEO and Co-Founder (Elemental Technologies)<br />

Sam Blackman co-founded Elemental Technologies in<br />

2006 and has grown the company into a leading supplier<br />

of video solutions for multiscreen content delivery. Prior<br />

to co-founding Elemental, Sam designed integrated<br />

circuit products for Pixelworks. He has also held<br />

engineering positions at Silicon Graphics and Intel<br />

Corporation. Sam holds an M.B.A. from University of<br />

Oregon, an M.S. in electrical engineering from University<br />

of California at Berkeley and a B.S. in electrical<br />

engineering from Brown University.<br />

h Session(s): S2005 – Emerging Companies Summit:<br />

CEO on Stage Featuring RealView Imaging,<br />

Elemental Technologies, and Mersive<br />

(Wednesday, 16:00, Marriott Ballroom 4)<br />

Aaron Blasius<br />

Sr. Product Manager (VMware)<br />

Biography unavailable at press time.<br />

h Session(s): S0359 - VMware and NVIDIA: Delivering<br />

3D Workstations from the Cloud<br />

(Tuesday, 17:00, Room: A5)<br />

François Bodin<br />

Chief <strong>Technology</strong> Officer (CTO) (CAPS enterprise)<br />

As chief scientist, François Bodin plans, advises and<br />

advocates the research and development projects which<br />

led to the creation of innovative software tools. François<br />

carries on with his research activities at the Irisa lab,<br />

which focus on code optimization and compiler<br />

technologies for high performance computers and<br />

embedded systems. François is a member of HIPEAC, the<br />

European Network of Excellence on High-Performance<br />

Embedded Architecture and Compilation. François has<br />

degrees in computer science from the University of<br />

Rennes I. François Bodin is also Chairman of IRISA<br />

Rennes, a research unit at the forefront of information<br />

and communication science and technology.<br />

h Session(s): S0630 Part 1 of 2: <strong>Program</strong>ming<br />

Heterogeneous Many-cores Using Directives<br />

(Presented by CAPS) (Monday, 13:00, Room: A8)<br />

h S0631 Part 2 of 2: <strong>Program</strong>ming Heterogeneous<br />

Many-cores Using Directives (Presented by CAPS)<br />

(Monday, 14:30, Room: A8)<br />

h S0635 - How to Bake Portable Many-Core<br />

<strong>Program</strong>s (Wednesday, 15:00, Room: M)<br />

Robert Boehme<br />

Team Lead & CEO (Part-Time Scientists)<br />

Robert Boehme is Team Lead and CEO of Part-Time<br />

Scientists. The Part-Time Scientists Team consists of<br />

100 international engineers and scientists working in<br />

their free time on the first private mission to the moon.<br />

Over the past two years they managed to get the full<br />

technical development kick-started with a lot of<br />

prototypes and technology taken from the industry back<br />

into space. With five prototype lines, 50 business<br />

partnerships, several cooperations and many hours<br />

testing, the team is amongst the leading competitors for<br />

the 30 million dollar Google Lunar X-PRIZE competition.<br />

h Session(s): S3002 – Day 3 Keynote: Not Your<br />

Grandfather’s Moon Landing<br />

(Thursday, 11:00, Keynote Hall)<br />

Taisuke Boku<br />

Deputy Director of Center for Computational Sciences at<br />

University of Tsukuba (University of Tsukuba)<br />

Biography unavailable at press time.<br />

h Session(s): S0618 – Best Practices of a 800TFlop<br />

Hybrid Supercomputer Implementation<br />

(Tuesday, 09:30, Room: M)<br />

Nikola Bozinovic<br />

CTO (MotionDSP)<br />

Nikola Bozinovic is Chief <strong>Technology</strong> Officer at<br />

MotionDSP where he leads all technical efforts and<br />

oversees product development. As the company’s key<br />

technologist, he leverages his expertise in signal<br />

processing, image and video analysis, and video<br />

compression to provide people and organizations around<br />

the world with groundbreaking video technology. Prior to<br />

establishing MotionDSP’s engineering department,<br />

Nikola was a senior software engineer at Veodia, a<br />

video streaming and distribution company, and a<br />

research scientist at Microsoft. Nikola holds M.S. and<br />

Ph.D. degrees from Boston University, where he was a<br />

Dean’s Fellow.<br />

h Session(s): S0527 - <strong>GPU</strong>s and the Next-Generation<br />

Aerial Surveillance (Tuesday, 09:00, Room: J2)<br />

Wil Braithwaite<br />

Senior Applied Engineer (NVIDIA)<br />

Wil Braithwaite has worked for 15 years in VisualFX at<br />

studios in London and Los Angeles, including<br />

FrameStore, MPC, and the Jim Henson Company.<br />

His positions ranged from technical direction and compositing<br />

to CG supervision and mocap supervision. He has<br />

pioneered the use of graphics hardware in the VFX<br />

workflow, which led to his role at NVIDIA as a Senior<br />

Applied Engineer for VFX, where he specializes in<br />

consulting, training and assisting development for studio<br />

projects utilizing NVIDIA technologies.<br />

h Session(s): S0364 - Interacting with Huge<br />

Particle Simulations in Maya with the <strong>GPU</strong><br />

(Tuesday, 14:00, Room: J1)


Thomas Brandes<br />

Senior Scientist (Fraunhofer Scientific Computing<br />

Institute (FhG-SCAI))<br />

Thomas Brandes received his PhD in Applied<br />

Mathematics in 1988 from the University in Marburg. He<br />

joined Fraunhofer’s Scientific Computing Institute<br />

(FhG-SCAI) in 1989. He is working as a senior scientist<br />

on the design, parallelization and optimization of<br />

scientific applications for all kinds of parallel<br />

architectures. His research interests are centered<br />

around parallelization tools, cache optimization, <strong>GPU</strong><br />

programming and object-oriented design of parallel<br />

software.<br />

h Session(s): S0705 - Los Alamos AHPC Symposium,<br />

Efficient AMG on Hybrid <strong>GPU</strong> Clusters<br />

(Wednesday, 17:00, Room: J1)<br />

Vincent Brisebois<br />

Visual Computing Product Manager (Fusion-io)<br />

As Visual Computing Product Manager at Fusion-io,<br />

Vincent Brisebois works closely with entertainment<br />

production studios on implementing solutions that<br />

facilitate new levels of creativity, productivity and<br />

worldwide collaboration. Vincent has designed<br />

technology solutions for 2D and 3D production in the<br />

visual effects, video game and design industries for over<br />

15 years.<br />

h Session(s): S0619 - Hate to Wait? Flash Memory<br />

for Full-Throttle <strong>GPU</strong> Acceleration<br />

(Thursday, 09:00, Room: L)<br />

John Brown<br />

Principal Engineer (Hewlett-Packard)<br />

John is a Principal Engineer in Hewlett-Packard’s<br />

Workstation Graphics Research and Development,<br />

engineering graphics and workstation solutions since<br />

1984. He has contributed to a wide variety of HP<br />

products and projects for 24 years, ranging from HP’s<br />

SRX graphics processor, to HP’s SV6 Scalable<br />

Visualization solution, to HP’s latest family of high-end<br />

workstation platforms.<br />

h Session(s): S0633 – Learn about new Hewlett-<br />

Packard <strong>GPU</strong> Systems, Solutions, and Applications!<br />

(Wednesday, 10:00, Room: M)<br />

Kevin J. Brown<br />

Research Assistant (Stanford University)<br />

Biography unavailable at press time.<br />

h Session(s): S0365 – Delite: A Framework for<br />

Implementing Heterogeneous Parallel DSLs<br />

(Wednesday, 15:00, Room: C)<br />

Andreas Buhr<br />

Department Manager - Performance Optimization<br />

(CST AG)<br />

Andreas Buhr has worked on performance optimization at<br />

CST AG since 2009. He holds a bachelor’s degree in<br />

physics and a master’s degree in applied physics from<br />

the Technical University Darmstadt. He has been working with<br />

CUDA since version 0.9.<br />

h Session(s): S0069 – <strong>GPU</strong> Computing Advances<br />

in 3D Electromagnetic Simulation<br />

(Tuesday, 14:00, Room: J3)<br />

Martin Burtscher<br />

Associate Professor (Texas State University)<br />

Martin Burtscher is Associate Professor in the<br />

Department of Computer Science at Texas State<br />

University. He received the combined BS/MS degree in<br />

computer science from the Swiss Federal Institute of<br />

<strong>Technology</strong> (ETH) Zurich in 1996 and the Ph.D. degree in<br />

computer science from the University of Colorado at<br />

Boulder in 2000. Martin’s research interests include<br />

efficient parallelization of programs for <strong>GPU</strong>s as well as<br />

automatic performance assessment and optimization of<br />

HPC applications. He is a senior member of the IEEE, its<br />

Computer Society, and the ACM. Martin has co-authored<br />

over 60 peer-reviewed publications, including a <strong>GPU</strong><br />

Computing Gems chapter.<br />

h Session(s): S0111 - An Efficient CUDA<br />

Implementation of a Tree-Based N-Body Algorithm<br />

(Thursday, 15:30, Room: M)<br />

Michael Bussmann<br />

Junior Group Leader Computational Radiation Physics<br />

(Helmholtz-Zentrum Dresden-Rossendorf)<br />

Michael Bussmann is a member of the Laser Particle<br />

Acceleration Group at the Helmholtz-Zentrum Dresden-<br />

Rossendorf (HZDR). He leads the Junior Group on<br />

Computational Radiation Physics, looking for ways to<br />

create and optimize new sources of radiation using<br />

high-intensity lasers. His goal is to create low-cost,<br />

compact, laser-driven sources of ion, electron and X-ray<br />

beams that can be used to understand the properties of<br />

matter on the atomic scale. Besides his interest in<br />

fundamental physics, Michael helps to make laser-driven<br />

ion beams available to cancer patients for ion beam<br />

treatment of tumors. With <strong>GPU</strong>s he has been able to<br />

simulate the generation of laser-driven particle beams<br />

in a new, much faster way. Since then, Michael has been used to<br />

thinking of computation speed in frames per second.<br />

h Session(s): S0067 - PICon<strong>GPU</strong> - Bringing large-scale<br />

Laser Plasma Simulations to <strong>GPU</strong><br />

Supercomputing (Tuesday, 15:00, Room: A8)<br />

h S0708 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 1<br />

(Thursday, 9:00, Room: J3)<br />

Javier Cabezas<br />

PhD Student (Barcelona Supercomputing Center)<br />

Javier Cabezas received a bachelor’s degree in Computer<br />

Science and a master’s degree in Computer Architecture<br />

from Universitat Politècnica de Catalunya (UPC). Since<br />

2008, he has been a PhD student in the Computer Architecture<br />

Department at UPC. He has also worked at the Barcelona<br />

Supercomputing Center as a resident student since 2009.<br />

He has contributed to projects done in collaboration with<br />

companies like Hewlett-Packard, NXP and Repsol. His<br />

research is focused on operating system and run-time<br />

support for heterogeneous massively-parallel computing<br />

systems and massively-parallel accelerators.<br />

h Session(s): S0333 - GMAC-2: Easy and Efficient<br />

<strong>Program</strong>ming for CUDA-Based Systems<br />

(Thursday, 09:00, Room: B)<br />

Tugkan Calapoglu<br />

Lead Graphics Software Developer (VIRES<br />

Simulationstechnologie GmbH)<br />

Tugkan Calapoglu is the lead graphics software<br />

developer at Vires GmbH, Germany, with more than 10<br />

years of experience in the visual simulation industry. He<br />

works on the design and development of 3D rendering<br />

software for real-time hardware-in-the-loop and<br />

human-in-the-loop simulation applications.<br />

h Session(s): S0319 – Advanced Driver<br />

Assistance System Testing using OptiX<br />

(Tuesday, 14:00, Room: N)<br />

D. Andrew Carr<br />

Director of Bioinformatics (Accelerated <strong>Technology</strong><br />

Laboratories, Inc.)<br />

D. Andrew Carr, Ph.D. is the Director of Bioinformatics for<br />

Accelerated <strong>Technology</strong> Laboratories where he oversees<br />

the design and development of new high-throughput<br />

computational and database tools for use in human<br />

genomic scale analysis projects. Andrew received his<br />

Ph.D. in Computational Science<br />

Bioinformatics from George Mason University in 2006.<br />

After spending a year as a research assistant professor in<br />

Computational Materials Science Center and<br />

Nanotechnology at GMU, he took a postdoctoral position at<br />

University of North Carolina Charlotte, where he worked on<br />

developing algorithms, databases and visualization<br />

tools for genomic microarray and sequence analysis.<br />

h Session(s): S0037 - SeqNFind: Application Of<br />

CUDA <strong>GPU</strong> Technologies To Sequence Alignment<br />

Techniques (Tuesday, 17:00, Room: K)<br />

Patrice Castonguay<br />

Emerging Applications Intern (NVIDIA)<br />

Patrice Castonguay is completing his Ph.D. in the<br />

Aeronautics and Astronautics department at Stanford<br />

University working under the supervision of Professor<br />

Antony Jameson at the Aerospace Computing Lab. His<br />

research focuses on unstructured high-order methods<br />

for fluid flow simulations and on the use of <strong>GPU</strong>s for<br />

algorithm developments in high performance<br />

computing. Recently, he worked in the Emerging<br />

Applications group at NVIDIA on the development of<br />

algebraic multigrid methods.<br />

h Session(s): S0332 - Efficient Graph Matching<br />

and Coloring on the <strong>GPU</strong><br />

(Wednesday, 16:00, Marriott Ballroom 3)<br />

Bryan Catanzaro<br />

Research Scientist (NVIDIA)<br />

Bryan recently received his PhD from the University of<br />

California at Berkeley, where he researched compilation<br />

techniques for embedded data parallel languages. He<br />

then joined NVIDIA Research, where he focuses on<br />

developing the Copperhead runtime and compiler.<br />

h Session(s): S0525 - Copperhead: Data Parallel<br />

Python (Wednesday, 16:30, Room: A3)<br />

Ulises Cervantes-Pimentel<br />

Senior Kernel Developer (Wolfram Research)<br />

Ulises Cervantes-Pimentel has been Wolfram Research’s lead<br />

kernel developer in visualization, computational geometry<br />

and <strong>GPU</strong> development since 2001. Ulises is a graduate<br />

of the University of Illinois at Urbana-Champaign.<br />

h Session(s): S0430 – Developing Next-Generation<br />

CUDA Acceleration in Wolfram’s Mathematica with<br />

Parallel Nsight (Tuesday, 09:30, Room: B)<br />

h S0106 - <strong>GPU</strong> Based Numerical Methods in<br />

Mathematica (Thursday, 14:30, Room: L)<br />

Dominic Chandar<br />

Postdoctoral Research Associate (University of Wyoming)<br />

Dominic is a Postdoc at the University of Wyoming, and<br />

works on <strong>GPU</strong> acceleration for CFD codes. He has a PhD<br />

in Mechanical and Aerospace Engineering from Nanyang<br />

Technological University, Singapore, and a Masters in<br />

Aerospace Engineering from Indian Institute of Science,<br />

India. He has also held the position of a Scientist in the<br />

Defense Research and Development Organization, India.<br />

h Session(s): S0264 - CU++: An Object-Oriented<br />

Framework for Computational Fluid Dynamics<br />

(CFD) Applications (Thursday, 09:30, Room: A8)<br />

Jacqueline H. Chen<br />

Combustion Research Facility, Sandia National Laboratories

Jacqueline H. Chen is a Distinguished Member of<br />

Technical Staff at the Combustion Research Facility at<br />

Sandia National Laboratories. She has contributed<br />

broadly to research in petascale direct numerical<br />

simulations (DNS) of turbulent combustion focusing on<br />

fundamental turbulence-chemistry interactions. These<br />

benchmark simulations provide fundamental insight into<br />

combustion processes and are used by the combustion<br />

modeling community to develop and validate turbulent<br />

combustion models for engineering CFD simulations. She<br />

is the Director of the Center for Exascale Simulation of<br />

Combustion in Turbulence (ExaCT), where, in collaboration<br />

with computer scientists and applied mathematicians, she<br />

co-designs exascale DNS algorithms together with exascale<br />

computer architectures, including in-situ data mining and<br />

visualization.<br />

h Session(s): S0655 - Direct Numerical Simulation of<br />

Turbulence-Chemistry Interactions: Fundamental<br />

Insights Towards Predictive Models<br />

(Tuesday, 14:30, Room: A2)<br />

Jeff Chien<br />

Principal Scientist (Adobe Systems)<br />

Biography unavailable at press time.<br />

h Session(s): S0395 – <strong>GPU</strong> Enablement in Adobe<br />

Photoshop (Tuesday, 09:00, Room: A2)<br />

Suren Chilingaryan<br />

Researcher (Karlsruhe Institute of <strong>Technology</strong>)<br />

Suren Chilingaryan is a data processing expert at the<br />

Institute for Data Processing and Electronics at<br />

Karlsruhe Institute of <strong>Technology</strong>. He graduated in<br />

mathematics from Moscow State University and was<br />

awarded a Ph.D. degree in Computer Science from<br />

Armenian National Academy of Sciences. He works on<br />

data acquisition and slow control systems for long-running<br />

scientific experiments. His current research<br />

focus is high-performance data processing.<br />

h Session(s): S0259 - A High Performance<br />

Platform for Real-Time X-Ray Imaging<br />

(Wednesday, 15:00, Room: A8)<br />

Samuel Cho<br />

Assistant Professor (Wake Forest University)<br />

Sam graduated from the University of Maryland, Baltimore<br />

County with B.S. degrees in Biochemistry and Computer<br />

Science. He went on to receive a Ph.D. in Physical<br />

Chemistry at the University of California, San Diego. He<br />

then performed post-doctoral research at the<br />

University of Maryland, College Park, where he was<br />

awarded the NIH (NRSA) Post-doctoral Fellowship. He has<br />

published his interdisciplinary computational biophysics<br />

research in protein and RNA dynamics, folding and<br />

assembly in over 15 papers in peer-reviewed journals,<br />

including four as first author in the high impact factor<br />

journal, Proceedings of the National Academy of Sciences.<br />

h Session(s): S0139 - <strong>GPU</strong>-Based Molecular<br />

Dynamics Simulations of Protein and RNA<br />

Assembly (Wednesday, 17:00, Room: N)<br />

Jike Chong<br />

Co-Director of CUDA Research Center<br />

(Carnegie Mellon University)<br />

Jike Chong is an adjunct professor at Carnegie Mellon<br />

Silicon Valley and directs the CUDA Teaching Center and<br />

the CUDA Research Center there. For the past 10 years,<br />

he has been working on multicore, manycore and<br />

parallel computing technologies at Carnegie Mellon<br />

University, Intel Research Labs, Sun Microsystems<br />

and the University of California, Berkeley. His research<br />

interests include speech recognition and analytics,<br />

quantitative financial analytics, and design patterns for<br />

parallel programming. Jike earned his Ph.D. from UC<br />

Berkeley and his M.S. and B.S. from Carnegie Mellon University.<br />

h Session(s): S0223 - Rapid Training of Acoustic<br />

Models Using <strong>GPU</strong>s (Tuesday, 15:00, Room: N)


Constantin Chuyeshov<br />

Algorithm Engineer (Cadence Design Systems)<br />

Constantin Chuyeshov is an Algorithm Engineer with the<br />

Computational Lithography Solutions Group at Cadence<br />

Design Systems. He focuses on computational<br />

lithography, image processing and high-performance<br />

computing. Constantin was born in 1979 in Kharkov,<br />

Ukraine. He received his BS degree in Mathematical Physics<br />

and Applied Mathematics from Karazin Kharkov National<br />

University (Ukraine) and MSc degree in Computational<br />

Mathematics from Stanford University.<br />

h Session(s): S0329 - Using <strong>GPU</strong>s to Speedup<br />

Computational Lithography<br />

(Tuesday, 9:30, Room: J3)<br />

Gilles Civario<br />

Senior Software Architect (ICHEC)<br />

Gilles Civario is a <strong>GPU</strong> software architect at ICHEC, PI of<br />

ICHEC’s NVIDIA CUDA Research Center, and an NVIDIA<br />

Certified CUDA <strong>Program</strong>mer. Gilles is involved directly or<br />

indirectly in all of ICHEC’s <strong>GPU</strong>-related projects. His<br />

involvement ranges from software or hardware<br />

architectural advice to code development and tuning,<br />

debugging and implementation. Gilles also regularly<br />

presents talks to explain <strong>GPU</strong> computing and its benefits,<br />

and runs NVIDIA certified CUDA training courses. His<br />

unique expertise in both hardware and software allows<br />

him to design and propose tailored solutions to address<br />

each user’s particular needs. Gilles is particularly involved<br />

in ICHEC’s technology transfer activities.<br />

h Session(s): S0034 - Real-Time Risk Simulation:<br />

The <strong>GPU</strong> Revolution In Profit Margin Analysis<br />

(Tuesday, 15:00, Room: L)<br />

Geoff Clark<br />

CEO (Acceleware Ltd.)<br />

Before joining Acceleware, Geoff was CFO of SQFive, a<br />

private oil and gas technology company, and of TSX-listed<br />

Guest-Tek Interactive Entertainment Ltd. While with<br />

Guest-Tek, Geoff was instrumental in completing two<br />

major acquisitions, a share buyback, and several private<br />

placements of debt and equity. Geoff was a co-founder of<br />

Revolve Magnetic Bearings Inc., a supplier of magnetic<br />

levitation systems. Geoff secured several rounds of<br />

financing for Revolve and was instrumental in Revolve’s<br />

eventual sale to Sweden’s SKF. Geoff holds an MBA<br />

degree from the University of Western Ontario, and a<br />

BSc in Electrical Engineering from the University of<br />

Calgary.<br />

h Session(s): S0433 - Accelerated FDTD Technique<br />

for Marine Controlled Source Electromagnetic<br />

Imaging (Wednesday, 15:30, Room: A7)<br />

Michael Clark<br />

Compute DevTech Engineer (NVIDIA)<br />

Dr. Clark’s background is in high energy physics, having<br />

completed his doctoral research in Monte Carlo<br />

algorithms for lattice QCD in 2005, graduating from the<br />

University of Edinburgh. He subsequently moved to<br />

Boston University, developing adaptive multi-grid<br />

algorithms and symplectic integrators. There, he initiated<br />

research into harnessing <strong>GPU</strong>s for lattice QCD<br />

computation. Dr. Clark spent 2009-2011 at Harvard<br />

University, where he continued to work on algorithms for<br />

<strong>GPU</strong>s and many-core processors, with focus on signal<br />

processing and multigrid. Dr. Clark moved to NVIDIA in<br />

2011, where his present work lies at the interface between<br />

applications, algorithms and parallel computation.<br />

h Session(s): S0347 - Accelerating Radio Astronomy<br />

Cross-Correlation beyond 1 Tflops using Fermi<br />

(Thursday, 09:00, Room: M)<br />

Don Clegg<br />

VP (Supermicro)<br />

Biography unavailable at press time.<br />

h Session(s): S0636 - Supermicro: Worldwide leader<br />

in GP/<strong>GPU</strong> Servers and Workstation Platforms<br />

(Wednesday, 16:00, Room: M)<br />

Esteban Clua<br />

Professor (Computer Science Department of<br />

Universidade Federal Fluminense, Rio de Janeiro, Brazil)<br />

Esteban is an associate professor at Universidade Federal<br />

Fluminense, Rio de Janeiro, and director of UFF<br />

Medialab. He is one of the founders of SBGames -<br />

Brazilian Symposium of Digital Entertainment and Video<br />

Games, director of Academia of IGDA-Rio, and president of<br />

the Brazilian Computing Society Game. In 2007 he received<br />

an award for contributing to the growth of the video<br />

game industry in Brazil and in 2009 received the prize of<br />

Young Scientist of the State of Rio de Janeiro. Esteban is<br />

coordinator of the first Latin America CUDA NVIDIA<br />

Research Center, at UFF Medialab.<br />

h Session(s): S0074 – Techniques for Designing<br />

GP<strong>GPU</strong> Games (Thursday, 17:00, Room: L)<br />

Jonathan Cohen<br />

Emerging Applications (NVIDIA)<br />

Jonathan Cohen leads the Emerging Applications group<br />

as part of NVIDIA’s Content and <strong>Technology</strong> organization.<br />

Emerging Applications seeks to develop enabling<br />

technologies that will allow end-users to access the<br />

power of <strong>GPU</strong> computing in a wide variety of application<br />

areas. Previously, he spent three years as a senior<br />

research scientist with NVIDIA Research developing<br />

scientific computing and real-time physical simulation<br />

applications on NVIDIA’s massively parallel <strong>GPU</strong>s. Cohen<br />

was awarded an Academy Award (Technical Achievement<br />

Award) in 2007 from the Academy of Motion Picture<br />

Arts and Sciences for his work on fluid simulation and<br />

volumetric modeling for visual effects. He received an<br />

undergraduate degree from Brown in Mathematics and<br />

Computer Science.<br />

h Session(s): S0332 – Efficient Graph Matching<br />

and Coloring on the <strong>GPU</strong><br />

(Wednesday, 16:00, Marriott Ballroom 3)<br />

Chris A. Cocosco<br />

Scientist (University Medical Center Freiburg, Dept. of<br />

Radiology, Medical Physics)<br />

Chris A. Cocosco has spent over 15 years in research &<br />

development at the intersection of medical imaging,<br />

electrical engineering, computer science, and high<br />

performance computing, in both academic/clinical and<br />

industrial/commercial environments.<br />

h Session(s): S0348 - <strong>GPU</strong>s Open New Avenues in<br />

Medical MRI (Wednesday, 10:30, Room: A8)<br />

Andrew Corrigan<br />

Research Mathematician (Naval Research Laboratory)<br />

Andrew Corrigan has been a scientist at the Laboratory<br />

for Computational Physics and Fluid Dynamics at the US<br />

Naval Research Laboratory since 2010, where he is<br />

developing the Jet Engine Noise Reduction (JENRE)<br />

code. His research interests are in supersonic jet noise<br />

reduction and algorithms for high performance CFD<br />

solvers. He received his Ph.D. in 2009 from George<br />

Mason University, where he also worked as a<br />

postdoctoral researcher in the GMU CFD Center, porting<br />

the unstructured grid CFD code FEFLO to run on <strong>GPU</strong>s.<br />

h Session(s): S0031 - Unstructured Grid Numbering<br />

Schemes for <strong>GPU</strong> Coalescing Requirements<br />

(Tuesday, 10:00, Room: A8)<br />

Iain Couzin<br />

Professor, Department of Ecology and Evolutionary<br />

Biology (Princeton University)<br />

Iain Couzin joined the Princeton faculty in late 2007.<br />

Prior to joining the faculty there, he was a Royal Society<br />

University Research Fellow in the Department of<br />

Zoology, University of Oxford, and a Junior Research<br />

Fellow in the Sciences at Balliol College, Oxford. His<br />

work aims to reveal the fundamental principles that<br />

underlie evolved collective behavior, and consequently<br />

his research includes the study of a wide range of<br />

biological systems, from brain tumors to insect swarms,<br />

fish schools and human crowds. Couzin is a member of<br />

the Faculty of 1000 Biology and in recognition of his<br />

research he was a recipient of the Searle Scholar Award<br />

in 2008, the Mohammed Dahleh Award in 2009 and<br />

Popular Science magazine’s “Brilliant 10” award in 2010.<br />

Couzin holds a PhD in Biology from the University of<br />

Bath, UK.<br />

h Session(s): S3001: Day 2 Keynote: From Democratic<br />

Consensus to Cannibalistic Hordes: <strong>GPU</strong> Computing<br />

Reveals the Principles of Collective Behavior<br />

(Wednesday, 11:00, Keynote Hall)<br />

Cyril Crassin<br />

Postdoctoral Research Scientist (NVIDIA)<br />

Cyril Crassin joined NVIDIA Research in 2011 as a<br />

postdoctoral research scientist. Cyril obtained his Ph.D.<br />

degree from Grenoble University at INRIA in France in<br />

2011. His research interests include realistic rendering,<br />

voxel-based representations, global illumination,<br />

real-time ray-tracing and out-of-core data management.<br />

During his Ph.D., he developed the GigaVoxels approach<br />

that proposed the use of pre-filtered voxel representations<br />

for real-time rendering of large, detailed scenes, complex<br />

objects, as well as global illumination effects.<br />

h Session(s): S0610 - Octree-Based Sparse<br />

Voxelization For Real-Time Global Illumination<br />

(Tuesday, 14:30, Room: B)<br />

Luis Crivelli<br />

Director of Solver Development (Dassault Systemes,<br />

SIMULIA)<br />

Biography unavailable at press time.<br />

h Session(s): S0431 - Evolving Use of <strong>GPU</strong> for<br />

Dassault Systems Simulation Products<br />

(Wednesday, 09:00, Room: K)<br />

Jon Currey<br />

(Microsoft Research Silicon Valley)<br />

Jon Currey joined Microsoft Research in 2007, initially<br />

working on the Dryad and DryadLINQ cluster computing<br />

projects. His current research focus is systems support for<br />

<strong>GPU</strong>-accelerated computation. Jon previously worked for<br />

Apple, Oracle, Nortel and some startups. He holds a BA<br />

and MA in philosophy from the University of Cambridge.<br />

h Session(s): S0320 – PTask: OS Support for <strong>GPU</strong><br />

Dataflow <strong>Program</strong>ming (Thursday, 14:00, Room: B)<br />

Kenneth Czechowski<br />

Student (Georgia Tech)<br />

Kenneth Czechowski is a PhD student in the School of<br />

Computational Science and Engineering at the Georgia<br />

Institute of <strong>Technology</strong>. His research interests include<br />

algorithm-architecture codesign, performance modeling<br />

for <strong>GPU</strong>/manycore architectures, and parallel and<br />

distributed algorithms. Czechowski holds a master’s in<br />

computer science from the Georgia Institute of <strong>Technology</strong>.<br />

h Session(s): S0362 - Maximizing Performance on<br />

Multi-<strong>GPU</strong> Systems (Thursday, 09:00, Hall 1)<br />

Johann Dahm<br />

(University of Michigan)<br />

Biography unavailable at press time.<br />

h Session(s): S0031 – Unstructured Grid Numbering<br />

Schemes for <strong>GPU</strong> Coalescing Requirements<br />

(Tuesday, 10:00, Room: A8)<br />

Abdul Dakkak<br />

(Wolfram Research)<br />

Biography unavailable at press time.<br />

h Session(s): S0100 – Mathematica as a Practical<br />

Platform for <strong>GPU</strong>-Accelerated Finance<br />

(Wednesday, 17:00, Room: L)<br />

h S0106 – <strong>GPU</strong> Based Numerical Methods in<br />

Mathematica (Thursday, 14:30, Room: L)<br />

Eric Darve<br />

Professor (Stanford)<br />

Prof. Darve received his PhD in Applied Mathematics from<br />

Pierre et Marie Curie University, Paris, France (1999),<br />

while working in the Jacques-Louis Lions Numerical<br />

Analysis Laboratory under the supervision of Prof. Olivier<br />

Pironneau. He was a postdoctoral fellow at Stanford in the<br />

Center for Turbulence Research, under the supervision of<br />

Prof. Parviz Moin and Dr. Andrew Pohorille (NASA Ames<br />

Research Center). He became an assistant professor of<br />

Mechanical Engineering at Stanford University in 2001<br />

and was promoted to Associate Professor in 2010. He is a<br />

member of the Institute for Computational and<br />

Mathematical Engineering, a CUDA Center of Excellence.<br />

This work is in collaboration with Dr. Toru Takahashi<br />

(Nagoya University) and Dr. Cris Cecka (Harvard).<br />

h Session(s): S0334 - The Fast Multipole Method<br />

on CPU and <strong>GPU</strong> Processors<br />

(Thursday, 15:00, Marriott Ballroom 3)<br />

Guy De Beer<br />

CEO (Playcast Media System)<br />

Guy founded Playcast Media System. During his 16 years<br />

in the digital media communications industry, he led the<br />

successful development and commercialization of<br />

dozens of digital media communications products and<br />

services. Prior to founding Playcast, Guy managed<br />

Harmonic’s (NASDAQ: HLIT) Broadcast and VoD edge<br />

product lines. Before joining Harmonic, he held several<br />

product marketing and business development<br />

management positions with the MRV group (NASDAQ:<br />

MRVC). Guy holds a BA in Media from the University of<br />

Bar-Ilan in Israel and an MA in Philosophy of Digital<br />

Culture from the University of Tel Aviv.<br />

h Session(s): S2006 - Emerging Companies Summit:<br />

CEO on Stage Featuring Raytrix, Playcast and<br />

Universal Robotics<br />

(Wednesday, 17:00, Marriott Ballroom 4)<br />

Jose de Corral<br />

Principal Consulting Engineer (Waters Corporation)<br />

Jose is currently Principal Consulting Engineer at Waters<br />

Corporation. Jose de Corral received his B.S. in Electrical<br />

Engineering from Universidad Politécnica de Madrid, and<br />

his M.S. in Software Engineering from Harvard University.<br />

Jose has a long career at Waters, where he started in<br />

1983. He has been involved in many R&D design projects,<br />

specializing in analog electronic design, feedback control<br />

systems, and embedded software development. Jose’s<br />

preferences evolved toward the design of complex<br />

algorithms for data processing and instrument control.<br />

Since 2007, his main focus has been on Computer<br />

Graphics and <strong>GPU</strong> Computing.<br />

h Session(s): S0327 - Large and Sparse– Mass<br />

Spectrometry Data Processing in the <strong>GPU</strong><br />

(Wednesday, 14:00, Room: B)


Mario Dean<br />

(Schlumberger)<br />

Mario Dean’s current role is remote application delivery<br />

product champion at Schlumberger Information<br />

Solutions.<br />

h Session(s): S0434 - Schlumberger LiveQuest: Application<br />

Delivery and Collaboration Solution<br />

(Tuesday, 14:00, Room: A7)<br />

Julien Demouth<br />

Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Julien Demouth is a Developer <strong>Technology</strong> Engineer at<br />

NVIDIA where he works mainly on CUDA for high<br />

performance computing. Julien obtained his Ph.D.<br />

degree in Computational Geometry from Nancy<br />

University at INRIA in France.<br />

h Session(s): S0602 – An Introduction to the<br />

Thrust Parallel Algorithms Library<br />

(Tuesday, 17:00, Room: A3)<br />

h S0285 - Optimization of a Sparse Matrix-Matrix<br />

Multiplication on the <strong>GPU</strong><br />

(Thursday, 14:00, Room: L)<br />

Yangdong Deng<br />

Associate Professor (Tsinghua University)<br />

Yangdong Deng received his Ph.D. degree in Electrical<br />

and Computer Engineering from Carnegie Mellon<br />

University, Pittsburgh, PA, in 2006. He received his MS<br />

and BE degrees from the Electronic Department of Tsinghua<br />

University, Beijing, in 1998 and 1995, respectively. He has<br />

been an associate professor at the Institute of<br />

Microelectronics, Tsinghua University, since 2008. He<br />

also leads the systems modeling team of the Tsinghua-<br />

Intel Center of Advanced Mobile Computing <strong>Technology</strong>.<br />

His research interests include VLSI verification, parallel<br />

microarchitecture, and parallel algorithms. He is the<br />

author or co-author of three books and over 30 papers.<br />

h Session(s): S0050 - High Performance Logic<br />

Simulation with <strong>GPU</strong>s (Tuesday, 16:00, Room: J3)<br />

Kristof Denolf<br />

Research Engineer (Barco)<br />

Kristof Denolf received the M.Eng. degree in electronics<br />

from the KHBO (Belgium) in 1998, the M.Sc. degree in<br />

electronic system design from LMU (U.K.) in 2000 and a<br />

PhD from the Technische Universiteit Eindhoven in 2007.<br />

He joined IMEC in August 1998 as a research engineer<br />

focusing on optimized, low power video implementations.<br />

During 2008, he spent six months as a visiting<br />

researcher at Xilinx research labs to work with high-level<br />

synthesis tools. In 2010, he was a SW architect at<br />

Philips. Recently he joined Barco’s technology center,<br />

working on cost efficient design of advanced video<br />

processing systems.<br />

h Session(s): S0252 - Building Real-Time<br />

Professional Visualization Solutions with OpenCL<br />

(Thursday, 10:30, Room: A1)<br />

Luiz DeRose<br />

Director of <strong>Program</strong>ming Environment (Cray Inc.)<br />

Dr. Luiz DeRose is a Senior Principal Engineer and the<br />

<strong>Program</strong>ming Environments Director at Cray Inc, where<br />

he is responsible for the programming environment<br />

strategy for all Cray systems. Dr. DeRose has a Ph.D. in<br />

Computer Science from the University of Illinois at<br />

Urbana-Champaign. With more than 20 years of high<br />

performance computing experience and a deep knowledge<br />

of its programming environments, he has published more<br />

than 50 peer-reviewed articles in scientific journals,<br />

conferences, and book chapters, primarily on the topics of<br />

compilers and tools for high performance computing.<br />

h Session(s): S0407 - A High Level <strong>Program</strong>ming<br />

Environment for Accelerated Computing<br />

(Tuesday, 15:00, Room: A3)<br />

Ronny Dewaele<br />

Director <strong>Technology</strong> Center (Barco)<br />

Biography unavailable at press time.<br />

h Session(s): S0252 – Building Real-Time<br />

Professional Visualization Solutions with OpenCL<br />

(Thursday, 10:30, Room: A1)<br />

Tanmay Dharmadhikari<br />

Senior Software Development Engineer (Beckman-Coulter)<br />

Biography unavailable at press time.<br />

h Session(s): S0638 – Lenovo ThinkStation<br />

Accelerates Medical Research with Beckman<br />

Coulter (Presented by Lenovo)<br />

(Tuesday, 16:00, Room: M)<br />

Michael Dickens<br />

Graduate Student (University of Notre Dame)<br />

Michael L. Dickens is a Ph.D. candidate in Electrical<br />

Engineering at the University of Notre Dame. He<br />

received a B.S. from MIT in 1991, and an M.S. degree from<br />

the University of Notre Dame in 2001. He has more than<br />

10 years of industry experience, having worked at the<br />

Oak Ridge National Labs (Oak Ridge, TN), Bolt Beranek<br />

and Newman (“BBN”, Cambridge, MA), and most<br />

recently the MITRE Corporation (Bedford, MA). His<br />

current research interests span all aspects of<br />

programming for software-defined radios -- from<br />

system boot codes to kernels, signal-processing<br />

algorithm implementations to user interfaces.<br />

h Session(s): S0134 - On the Integration of<br />

OpenCL into a Software Defined Radio<br />

(Thursday, 17:30, Room: M)<br />

Michael Dixon<br />

Research Engineer (Willow Garage, Inc)<br />

Biography unavailable at press time.<br />

h Session(s): S0088 – Point Cloud Library (PCL) on<br />

CUDA (Tuesday, 14:00, Room: C)<br />

Sebastien Domine<br />

Sr. Director, Software Engineering, Developer<br />

Tools (NVIDIA)<br />

Sébastien is the Sr. Director of Developer <strong>Technology</strong><br />

Tools at NVIDIA. He runs various software engineering<br />

teams and oversees the development of software<br />

products dedicated to easing the developer’s life and to<br />

foster the creation of more applications that can take<br />

advantage of the <strong>GPU</strong>. Prior to NVIDIA, he worked on PC<br />

games at GameFX/THQ and 3D digital content creation<br />

tools at Katrix and Nichimen Graphics. He holds a<br />

Diplôme d’Ingénieur in Computer Science from EPITA,<br />

Paris, France.<br />

h Session(s): S0430 - Developing Next-Generation<br />

CUDA Acceleration in Wolfram’s Mathematica with<br />

Parallel Nsight (Tuesday, 09:30, Room: B)<br />

Mathieu Dubois<br />

(Bull)<br />

Mathieu joined Bull in 2009 as a <strong>GPU</strong> and hardware<br />

accelerator expert. After an engineering degree in<br />

electronics and a PhD in theoretical physics and<br />

nano-sciences, he started porting electronic transport<br />

applications to Graphics Processing Units in 2007, as<br />

part of a postdoctoral project for the simulation of new<br />

materials for nano-electronics. Now a member of<br />

Bull’s Applications & Performance Team based in<br />

Grenoble, France, his main <strong>GPU</strong> activities are<br />

benchmarking, CUDA and OpenCL training, Proofs Of<br />

Concept and new technology evaluations. In 2011, he<br />

was heavily involved in the deployment of the three<br />

largest <strong>GPU</strong> clusters in Europe, at CEA, GENCI and the<br />

Barcelona Supercomputing Centre.<br />

h Session(s): S0643 - Hybrid Architectures for<br />

Advanced Seismic Imaging: Recent Experiences at<br />

Bull (Presented by Bull) (Tuesday, 17:00, Room: M)<br />

Eric Dunn<br />

Electromagnetic Research Scientist (SAIC)<br />

Dr. Dunn has been a research scientist at SAIC since<br />

2005 responsible for planning and executing a diverse<br />

range of solutions to problems that employ<br />

computational electromagnetics. His current<br />

responsibilities involve serving as a principal investigator<br />

to research high frequency asymptotic methods and<br />

hybrid techniques. His research interests involve<br />

studying hardware and software acceleration for<br />

high-performance scientific computing. He has been<br />

involved with product development and training for many<br />

SAIC software tools as well as outreach to Universities<br />

for collaboration and research mentoring. BSEE/<br />

UMCP/1999, MS/UIUC/2000, PhD/UIUC/2005.<br />

h Session(s): S0046 - Application of the <strong>GPU</strong> to a<br />

Two-Part Computational Electromagnetic<br />

Algorithm (Tuesday, 14:30, Room: J3)<br />

Daniel Egloff<br />

Managing Partner (QuantAlea GmbH)<br />

Dr. Daniel Egloff studied mathematics, theoretical<br />

physics, and computer science at the University of<br />

Zurich and the ETH Zurich. He has been working for the<br />

last 17 years in the financial industry, mainly in risk<br />

management, credit risk, and derivative pricing. Since<br />

2007 he has been actively working with <strong>GPU</strong>s to accelerate<br />

quantitative financial calculations. In 2010 he founded<br />

QuantAlea, a niche consulting firm providing specialized<br />

project services in the area of derivative modeling,<br />

statistical arbitrage strategies and risk management<br />

paired with first class software engineering.<br />

h Session(s): S0405 - New Generation <strong>GPU</strong><br />

Accelerated Financial Quant Libraries<br />

(Wednesday, 15:00, Room: L)<br />

Anders Eklund<br />

PhD Student (Linköping University)<br />

Anders Eklund is a Ph.D. student at Linköping University,<br />

Sweden, with an M.Sc. in applied physics and electrical<br />

engineering. He is focused on medical image analysis,<br />

especially functional magnetic resonance imaging<br />

(fMRI). His current work involves using <strong>GPU</strong>s for<br />

non-parametric fMRI analysis (e.g. random permutation<br />

tests), real-time fMRI analysis (e.g. brain computer<br />

interfaces), interactive functional connectivity analysis<br />

and general medical image processing in 4D (e.g.<br />

denoising of large computed tomography (CT) datasets,<br />

512 x 512 x 450 x 20).<br />

h Session(s): S0017 - 4D Medical Image Processing<br />

with CUDA (Wednesday, 09:00, Room: A8)<br />

Rob Enderle<br />

Principal Analyst (Enderle Group)<br />

Rob is President and Principal Analyst of the Enderle<br />

Group, a forward-looking emerging technology advisory<br />

firm. With over 25 years of experience with emerging<br />

technologies he has provided regional and global<br />

companies with guidance on how to be successful in this<br />

changing world. Before founding the Enderle Group Rob<br />

was the Senior Research Fellow for Forrester Research<br />

and the Giga Information Group. While there he worked<br />

for and with companies like Microsoft, TI, HP, IBM, Dell,<br />

Toshiba, Gateway, Sony, USAA, Texas Instruments, AMD,<br />

Intel, Credit Suisse First Boston, GM, Ford, ROLM, and<br />

Siemens. Prior to that he worked for IBM and held<br />

positions in Internal Audit, Competitive Analysis,<br />

Marketing, Finance, and Security. Currently Rob writes<br />

on Emerging Personal <strong>Technology</strong>, Security, and Linux<br />

for a wide variety of publications including<br />

TechNewsWorld, CIO, Forbes, TGdaily, TMCNET,<br />

Datamation, and IT Business Edge and international<br />

news organizations like CNBC, CNN, Bloomberg, and<br />

NPR. Rob also does a semiweekly radio spot for Wall<br />

Street Journal radio on consumer technology. Rob sits<br />

on the advisory councils for a variety of technology<br />

companies.<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Eric Enderton<br />

Research Scientist (NVIDIA)<br />

Eric Enderton is a research scientist at NVIDIA, focusing<br />

on transparency, shadows, and film rendering. He was a<br />

principal engineer on NVIDIA Gelato, the first <strong>GPU</strong>-accelerated<br />

film rendering software. Previously, Eric<br />

developed rendering and animation software at<br />

Lucasfilm’s Industrial Light & Magic and at other major<br />

film studios. His film credits include “Terminator 2”,<br />

“Jurassic Park”, and “Star Wars Episode I”. Eric has a<br />

master’s degree in computer science from the University<br />

of California at Berkeley.<br />

h Session(s): S0409 - Stochastic Rasterization<br />

(Tuesday, 15:30, Room: B)<br />

Kenneth Esler<br />

Computational Physicist (Stone Ridge <strong>Technology</strong>)<br />

Dr. Esler is a computational physicist at Stone Ridge<br />

<strong>Technology</strong> in Bel Air, Maryland. He received his<br />

bachelor’s degree in physics from MIT in 1999. He<br />

completed his Ph.D. in computational condensed matter<br />

physics at the University of Illinois at Urbana-Champaign<br />

in 2006, developing methods for quantum-level<br />

simulation of matter at finite temperature. He accepted<br />

postdoctoral appointments at the Carnegie Institution of<br />

Washington and the National Center for Supercomputing<br />

Applications. His professional interests include<br />

computational methods development, algorithm<br />

optimization, and heterogeneous computing platforms.<br />

h Session(s): S0140 - Accelerating Reservoir<br />

Simulation and Algebraic Multigrid with <strong>GPU</strong>s<br />

(Wednesday, 14:00, Room: A7)<br />

Sorin Faibish<br />

(EMC Corporation)<br />

Sorin Faibish has designed and built innovative shared<br />

high-performance storage solutions, including the<br />

architecture design of NFS clusters, and architected the<br />

performance strategy of the Celerra file system. Sorin is a<br />

technology consultant and evangelist for pNFS, a member<br />

of IETF, and a contributor to the pNFS protocol who has<br />

promoted pNFS in research forums. Sorin’s wider<br />

expertise includes clustered file systems, storage<br />

systems, high-performance computing, robotic<br />

architectures, complex systems design and artificial<br />

intelligence. Sorin holds a Master’s degree in EE from<br />

Technion, Israel, is a member of IEEE, ACM, USENIX,<br />

IETF and SNIA, and has 50 papers and 36 patents.<br />

h Session(s): S0701 - Los Alamos AHPC Symposium,<br />

New <strong>GPU</strong> Appliance for Co-processing<br />

(Wednesday, 15:00, Room: J)<br />

Wes Faler<br />

Head of Software Development (Part-Time Scientists)<br />

Wesley Faler is Head of Software Development at<br />

Part-Time Scientists. He is also a software engineer with<br />

25 years of broad experience. Unusual skills include<br />

<strong>GPU</strong>-based simulations, genetic programming, FPGAs,


high voltage electronics, ion engines, and sending a<br />

rover to the moon with the Part-Time Scientists for the<br />

Google Lunar X Prize.<br />

h Session(s): S3002 – Day 3 Keynote: Not Your<br />

Grandfather’s Moon Landing<br />

(Thursday, 11:00, Keynote Hall)<br />

Robert Farber<br />

Chief Scientist (BlackDog Endeavors, LLC)<br />

Rob is recognized for his work in High Performance<br />

Computing (HPC), machine learning, complex dynamical<br />

systems and high energy physics. Lately, he has been<br />

focused on advancing the state of the art through his<br />

publications and computational research including his<br />

book CUDA Application Design and Development, online<br />

venues Doctor Dobb’s Journal and The Code Project,<br />

peer-reviewed journals, conferences, and magazines such<br />

as Scientific Computing. Rob has co-founded two<br />

companies that achieved liquidity events, and has worked as a<br />

theoretical division scientist at Los Alamos and on staff at SFI, Berkeley<br />

and PNNL. Currently, he is working with and teaching at<br />

research and educational organizations around the world.<br />

h Session(s): S0038 - Designing Killer CUDA<br />

Applications for X86, multi<strong>GPU</strong>, and CPU+<strong>GPU</strong><br />

(Thursday, 16:00, Marriott Ballroom 3)<br />

h S0646 Massively Parallel Code Development on<br />

Stelletto CDA (Presented by Creative Consultants)<br />

(Tuesday, 17:00, Room: A8)<br />

Reza Farivar<br />

PhD Student (University of Illinois at Urbana-Champaign)<br />

Reza Farivar received his B.S. degree in electrical<br />

engineering in 2003, and his M.S. degree in computer<br />

engineering in 2005. He is currently finishing his PhD in<br />

Electrical and Computer Engineering at the University of<br />

Illinois at Urbana-Champaign. His major research<br />

interests include parallel cloud computing programming<br />

models, heterogeneous computing algorithms<br />

(specifically with <strong>GPU</strong>s) and combining <strong>GPU</strong>s and cloud<br />

computing paradigms. He has also worked on reliability<br />

and security as well as ubiquitous computing.<br />

h Session(s): S0152 - Accurate Sequence Alignment<br />

using Distributed Filtering on <strong>GPU</strong> Clusters<br />

(Tuesday, 15:30, Room: K)<br />

Massimiliano Fatica<br />

Manager (NVIDIA)<br />

Massimiliano Fatica is a manager of the Tesla<br />

Performance Group at NVIDIA where he works in the<br />

area of <strong>GPU</strong> computing (high-performance computing<br />

and clusters). He holds a laurea in Aeronautical<br />

Engineering and a PhD in Theoretical and Applied<br />

Mechanics from the University of Rome “La Sapienza”.<br />

Prior to joining NVIDIA, he was a research staff member<br />

at Stanford University where he worked at the Center for<br />

Turbulence Research and Center for Integrated<br />

Turbulent Simulations on applications for the Stanford<br />

Streaming Supercomputer.<br />

h Session(s): S0522 – Introduction to CUDA Fortran<br />

(Monday, 14:30, Room: A3)<br />

Wu Feng<br />

Professor (Virginia Tech)<br />

Wu Feng holds dual appointments in Computer Science<br />

and Electrical & Computer Engineering at Virginia Tech<br />

(VT) and an adjunct professorship in Cancer Biology and<br />

Translational Science Institute at Wake Forest University.<br />

He is an internationally recognized expert in high-performance<br />

computing (HPC), as evidenced by his<br />

presence on HPCwire’s People to Watch List in 2011. His<br />

lab works at the synergistic intersection of HPC and the<br />

domain sciences. He is an ACM Distinguished Scientist<br />

and an IEEE Senior Member.<br />

h Session(s): S0156 - Towards Computing the Cure<br />

for Cancer (Tuesday, 17:00, Hall 1)<br />

Alex Fit-Florea<br />

Senior Engineer (NVIDIA)<br />

Alex Fit-Florea currently works for NVIDIA as the CUDA<br />

software manager in charge of core mathematical<br />

functionality, random number generators, and FFT<br />

algorithms. His main professional and research<br />

interests revolve around computer arithmetic and<br />

numerical methods. He served as a member of the<br />

IEEE 754-2008 Standard for Floating Point Arithmetic<br />

Review Committee. Alex holds B.S. and M.S. degrees<br />

from UB-B, and a PhD from SMU.<br />

h Session(s): S0085 - Floating Point and IEEE 754<br />

Compliance for NVIDIA <strong>GPU</strong>s: Precision &<br />

Performance (Wednesday, 14:30, Room: A3)<br />

Christopher Fluke<br />

Senior Lecturer (Swinburne University of <strong>Technology</strong> -<br />

Centre for Astrophysics and Supercomputing)<br />

Dr. Christopher Fluke is a Senior Lecturer at the Centre<br />

for Astrophysics and Supercomputing, Swinburne<br />

University of <strong>Technology</strong>. His main research interests are<br />

in gravitational lensing, astronomy visualization, and<br />

advanced computation, with an emphasis on the adoption<br />

of <strong>GPU</strong>s to accelerate the rate of astronomical discovery.<br />

His <strong>GPU</strong> work has included advancements in gravitational<br />

microlensing computations (teraflop/s rates achieved on<br />

the desktop), real-time terascale visualization and data<br />

analysis on <strong>GPU</strong>-clusters (for next generation radio<br />

telescopes), and strategies for adoption of <strong>GPU</strong>s by<br />

astronomers. He is the Principal Investigator of the<br />

NVIDIA CUDA Research Centre at Swinburne University.<br />

h Session(s): S0707 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Scalability:<br />

Hardware and Software (Thursday, 9:00, Room: J2)<br />

h S0022 - Scalable Frameworks and Algorithms<br />

for Terascale Radio Astronomy Images<br />

(Thursday, 14:30, Room: M)<br />

Steve Forde<br />

Senior Product Manager (Adobe)<br />

Steve Forde joined Adobe in 2011 as senior product<br />

manager for After Effects, the industry-leading software<br />

for creating sophisticated motion graphics and cinematic<br />

visual effects. In this role, Forde oversees extending<br />

After Effects into new markets and workflows. Forde is<br />

an experienced executive and co-founder of multiple<br />

businesses within media and emerging technology. He<br />

joined Adobe from Gridiron Software where he was<br />

co-founder/CEO and CTO. Gridiron develops<br />

complementary technologies for After Effects, and<br />

software for managing overall workflow in the creative<br />

enterprise. Forde grew the company from venture<br />

funding to a global operation and from a perpetual<br />

license revenue base to a SaaS model. Forde was<br />

co-founder/CEO of Creative Shack Inc. and oversaw an<br />

acquisition by Mitel Networks. Forde sits on the board of<br />

Black Cherry Digital Media.<br />

h Session(s): S0632 Learn how Adobe After Effects<br />

CS6 takes advantage of NVIDIA OptiX technology<br />

for 3D Ray Tracing (Presented by Adobe)<br />

(Tuesday, 14:00, Room: M)<br />

Dustin Franklin<br />

GP<strong>GPU</strong> Applications Engineer (GE Intelligent Platforms)<br />

Dustin is a <strong>GPU</strong> expert in the defense & aerospace<br />

industry. Originally a 3D rendering architect for games<br />

and simulations, he changed focus in 2005 to GP<strong>GPU</strong>.<br />

Dustin has years of experience in deploying high-performance<br />

CUDA applications onto rugged platforms<br />

like tanks, humvees, and UAVs. Currently, he works for<br />

GE as a GP<strong>GPU</strong> Applications Engineer and lives near<br />

Washington DC.<br />

h Session(s): S0253 - Sensor Processing with<br />

Rugged Kepler <strong>GPU</strong>s (Wednesday, 09:00, Room: M)<br />

Tom Furlong<br />

Managing Director (Granite Ventures LLC)<br />

Tom joined Granite Ventures in 2000, after a successful<br />

career in Silicon Valley that included stints as a vice<br />

president at Zhone Technologies, a communications<br />

equipment provider, and as a partner with a leading<br />

valley law firm, where he spent 13 years counseling<br />

technology companies, venture capitalists and<br />

investment banks. Tom currently serves on the Boards<br />

of Directors for Aspen Avionics, GoingOn Networks,<br />

Indicee, Mixamo and Skytide. Prior investments include<br />

Biz360 (acquired by Attensity), Digital Fountain (acquired<br />

by Qualcomm), Five Across (acquired by Cisco), Kinecta<br />

(acquired by Stellent), and TuVox (acquired by West<br />

Interactive).<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Ravikumar G.V.V.<br />

(Infosys Ltd, Bangalore)<br />

Biography unavailable at press time.<br />

h Session(s): S0214 – <strong>GPU</strong> Based Stacking Sequence<br />

Optimization For Composite Skins Using GA<br />

(Wednesday, 15:00, Room: K)<br />

Klaus Gaedke<br />

Lab Manager (Technicolor)<br />

Klaus Gaedke studied Electrical and Electronic<br />

Engineering at the University of Hannover, Germany, and<br />

received his Dipl.-Ing. and PhD degree from this<br />

institution. In 1996 he started to work for Technicolor<br />

Research and Innovation. Currently, he is responsible for<br />

Technicolor’s Image Processing Lab. His research<br />

interests include parallel programming, parallel real-time<br />

processing architectures and real-time implementation<br />

of image processing algorithms.<br />

h Session(s): S0073 - Cost-effective <strong>GPU</strong><br />

Acceleration of a Video Restoration and Archiving<br />

Workflow (Wednesday, 15:30, Room: A1)<br />

Daniel Gaudlitz<br />

Research Associate (Technische Universität München)<br />

As a research associate at Technische Universität<br />

München, Daniel Gaudlitz works on complex multiphase<br />

flows and their numerical modelling. Efficient<br />

methods for HPC in academia and industry are also a major<br />

research focus. Daniel Gaudlitz also leads R&D activities at<br />

the engineering company FluiDyna GmbH. After graduating<br />

with a master’s degree from TU Dresden in 2003, he joined<br />

TU München and received a PhD in 2008 for his research<br />

on numerical simulations of multiphase flows.<br />

h Session(s): S0296 - A <strong>GPU</strong>-Enabled SPH Method<br />

for Micro and Nanofluidic Simulations<br />

(Tuesday, 09:00, Room: A7)<br />

Wei Ge<br />

Professor (Institute of Process Engineering, Chinese<br />

Academy of Sciences)<br />

Prof. Ge received his PhD from Harbin Institute of<br />

<strong>Technology</strong> in 1998 and has been a professor of chemical<br />

engineering at the Institute of Process Engineering, Chinese<br />

Academy of Sciences since 2006. He is mainly engaged<br />

in multi-scale simulation of particle-fluid two-phase<br />

systems. He proposed the so-called “pseudo-particle”<br />

model which enables simulation of macro-scale flow<br />

phenomena from microscopic physics through large-scale<br />

parallel computation. As project leader, he has<br />

been working on the multi-scale software and hardware<br />

systems to bridge the simulation of molecular details to<br />

reactor performance.<br />

h Session(s): S0268 - Virtual Process Engineering<br />

- Realtime Simulation of Multiphase Systems<br />

(Tuesday, 09:00, Room: A8)<br />

h S0057 - <strong>GPU</strong>-Accelerated Molecular Dynamics<br />

Simulation of Solid Covalent Crystals<br />

(Thursday, 09:00, Marriott Ballroom 4)<br />

Isaac Gelado<br />

Senior Researcher (Barcelona Supercomputing Center)<br />

Isaac Gelado is a Senior Researcher at the Barcelona<br />

Supercomputing Center and a Visiting Scholar at the<br />

Coordinated Science Laboratory at the University of<br />

Illinois. At BSC, Isaac is working on the Mont-Blanc<br />

project and the NVIDIA CUDA Center of Excellence. Isaac<br />

holds a Master’s degree in Telecommunications<br />

Engineering from the Universidad de Valladolid, and a<br />

PhD degree from the Department of Computer<br />

Architecture at the Universitat Politecnica de Catalunya,<br />

where he also held a teaching position in the Computer<br />

Architecture Department.<br />

h Session(s): S0333 – GMAC-2: Easy and Efficient<br />

<strong>Program</strong>ming for CUDA-Based Systems<br />

(Thursday, 09:00, Room: B)<br />

Shaul Gelman<br />

Co-Founder and VP of R&D (RealView Imaging Ltd.)<br />

Mr. Gelman is an experienced R&D executive with over<br />

twelve years of hands-on experience in cutting edge<br />

projects in the field of multidisciplinary display<br />

technologies. Mr. Gelman co-founded RealView Imaging<br />

in 2008 and has been leading all the company’s R&D<br />

activities since inception. Prior to that, Shaul worked for<br />

Elbit Systems (NASDAQ: ESLT), one of Israel’s largest<br />

defense companies, leading the development of<br />

high-end helmet-mounted display systems for aviation/<br />

pilot applications. Mr. Gelman earned his Executive MBA<br />

from the Haifa University, and a B.Sc. in Industrial<br />

Engineering & Management from the Technion, Israel<br />

Institute of <strong>Technology</strong>.<br />

h Session(s): S2005 – Emerging Companies Summit:<br />

CEO on Stage Featuring RealView Imaging,<br />

Elemental Technologies, and Mersive<br />

(Wednesday, 16:00, Marriott Ballroom 4)<br />

Geoff Gerfin<br />

Sr. System Software Engineer and Technical Manager<br />

(NVIDIA)<br />

Geoff Gerfin is currently a Sr. System Software Engineer<br />

and Technical Manager in the CUDA Tools Group at<br />

NVIDIA, where he develops and manages tools for<br />

next-generation <strong>GPU</strong> architectures. Geoff has worked in<br />

the HPC community since receiving his degree in<br />

Computer Engineering from the University of Delaware<br />

in 2005.<br />

h Session(s): S0027A - All-In-One Debugging<br />

Experience with CUDA-GDB and CUDA-MEMCHECK<br />

(Monday, 14:30, Room: A5)<br />

h S0027B - All-In-One Debugging Experience with<br />

CUDA-GDB and CUDA-MEMCHECK<br />

(Wednesday, 14:00, Room: C)<br />

Denis Gerrer<br />

Denis Gerrer has 20 years of experience in HPC,<br />

having previously worked for SGI and Altair Engineering. As<br />

CAPS VP and General Manager Americas, he is now in<br />

charge of relations with CAPS Enterprise partners.<br />

h Session(s): S0646 Massively Parallel Code<br />

Development on Stelletto CDA (Presented by<br />

Creative Consultants) (Tuesday, 17:00, Room: A8)


Flip Gianos<br />

General Partner (InterWest Partners)<br />

Philip “Flip” Gianos has been part of InterWest’s IT team<br />

since 1982. With a background in engineering, he has<br />

invested in multiple areas of information technology,<br />

including semiconductors, computing and networking<br />

equipment, and infrastructure and applications software.<br />

He is chairman of the board of Xilinx (XLNX), a publicly<br />

held company, and is also a board member of several<br />

privately held companies, including: Bivio Networks,<br />

Brand.net, Convey Computer, and SpectraLinear. Gianos<br />

also serves on the advisory board of Storm Ventures II,<br />

and is a past president of the Western Association of<br />

Venture Capitalists.<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Oliver Gicquel<br />

Professor (Laboratoire E.M2.C, Ecole Centrale Paris)<br />

Biography unavailable at press time.<br />

h Session(s): S0129 – A Monte Carlo Thermal<br />

Radiation Solver in <strong>GPU</strong>/CPU Hybrid Architecture<br />

(Thursday, 09:00, Room: A8)<br />

Ben Goertzel<br />

CEO (Novamente LLC)<br />

Biography unavailable at press time.<br />

h Session(s): S0104 - <strong>GPU</strong> Implementation of Deep<br />

Learning for Intelligent Computer Vision<br />

(Tuesday, 16:30, Room: A1)<br />

James Goodman<br />

President/CEO (HySpeed Computing LLC)<br />

Dr. Goodman is founder and President/CEO of HySpeed<br />

Computing, a technology company specializing in<br />

developing advanced algorithms and analytic tools for<br />

the geospatial community. His expertise includes remote<br />

sensing, image analysis, mathematical modeling, and<br />

high performance computing. Dr. Goodman maintains<br />

academic affiliations with the University of Puerto Rico<br />

at Mayaguez and the University of Miami, where<br />

research is focused on remote sensing of coastal<br />

ecosystems. He has been awarded grants from NASA,<br />

NSF and NOAA, and collaborated with investigators from<br />

around the world. He is also active in the scientific<br />

community, publishing research and leading sessions at<br />

international conferences.<br />

h Session(s): S0290 - Algorithm Acceleration for<br />

Geospatial Analysis (Thursday, 09:30, Marriott<br />

Ballroom 3)<br />

David Goodwin<br />

Software Engineer (NVIDIA)<br />

David is technical lead for the CUDA Visual Profiler<br />

at NVIDIA.<br />

h Session(s): S0419A - Optimizing Application<br />

Performance with CUDA Profiling Tools<br />

(Tuesday, 09:00, Room: C)<br />

h S0420 - NSight IDE for Linux and Mac<br />

(Wednesday, 09:00, Room: A5)<br />

h S0419B - Optimizing Application Performance with<br />

CUDA Profiling Tools (Wednesday, 14:00, Room: A5)<br />

Chris Gottbrath<br />

Principal Product Manager (Rogue Wave Software)<br />

Chris Gottbrath is Principal Product Manager for<br />

TotalView, MemoryScape, ReplayEngine and<br />

ThreadSpotter at Rogue Wave Software. He’s worked<br />

with the TotalView debugger for more than a decade in a<br />

range of technical and marketing roles. Prior to that he<br />

wrote his fair share of bugs in Linux-based numerical<br />

simulations of galaxy dynamics and large scale structure<br />

as a graduate student in Tucson, AZ. He has a Master<br />

of Science in Astronomy and Astrophysics from the<br />

University of Arizona.<br />

h Session(s): S0340 - Debug Multi-<strong>GPU</strong> Applications<br />

on CUDA-Accelerated Clusters with TotalView<br />

(Wednesday, 15:30, Room: A5)<br />

Jérôme Graindorge<br />

Project Manager (ALYOTECH)<br />

Graindorge has been working for six years for ALYOTECH<br />

(a software services company) first as a software<br />

engineer, and most recently as a project manager<br />

specially dedicated to HPC and particularly <strong>GPU</strong>-based<br />

scientific applications.<br />

h Session(s): S0053 - Real Time <strong>GPU</strong>-Based Marine<br />

Scenes Simulation (Thursday, 10:00, Room: N)<br />

Alan Gray<br />

HPC Architect (The University of Edinburgh)<br />

Dr. Alan Gray was awarded a Ph.D. at The University of<br />

Glasgow in Theoretical Particle Physics in 2003, winning<br />

the 2004 Ogden Prize for the best UK thesis in particle<br />

physics phenomenology. He furthered this work under a<br />

fellowship at The Ohio State University, and since joining<br />

EPCC in 2005 he has been involved with a wide range of<br />

HPC-related projects: lately his research has focused on<br />

the role <strong>GPU</strong>s will play in future generations of<br />

supercomputers, including participation in the OpenMP<br />

language committee exploring adoption of accelerators.<br />

He has authored a large number of refereed and<br />

highly cited publications.<br />

h Session(s): S0286 - Scaling Applications to a<br />

Thousand <strong>GPU</strong>s and Beyond<br />

(Wednesday, 16:00, Room: A2)<br />

Simon Green<br />

Senior Software Engineer (NVIDIA)<br />

Simon Green is a senior member of the Developer<br />

<strong>Technology</strong> group at NVIDIA, specializing in real-time<br />

compute, rendering and physical simulation. He started<br />

graphics programming on the Sinclair ZX-81, which had<br />

1 kB of RAM and a screen resolution of 64 by 48 pixels,<br />

and has been trying to improve the quality of real-time<br />

graphics ever since.<br />

h Session(s): S0102 - Flame On: Real-Time Fire<br />

Simulation for Video Games<br />

(Tuesday, 09:00, Room: J1)<br />

Ray Grout<br />

(National Renewable Energy Laboratory)<br />

Dr. Grout’s research interests as part of the<br />

Computational Science Center at the National<br />

Renewable Energy Laboratory include algorithmic<br />

advances to facilitate integrating partial differential<br />

equations (PDEs) numerically on future architectures<br />

and development of future computational fluid dynamics<br />

(CFD) capabilities with particular emphasis on reacting<br />

flows. Dr. Grout has expertise in development of<br />

turbulent combustion submodels and has a wealth of<br />

experience developing several combustion codes at<br />

different institutions. His recent work has focused on the<br />

development of DNS (direct numerical simulation)<br />

databases for jets in cross flow from peta-scale,<br />

high-fidelity simulations in collaboration with the gas<br />

turbine industry. A key outcome of this work has been<br />

insight into the importance of low-velocity recirculation<br />

zones and stratified combustion in the stabilization of<br />

flames above a jet in cross flow. Earlier work involved<br />

using DNS to probe fundamental understanding of<br />

stratified combustion, to investigate appropriate flame<br />

markers (progress variables, tracers), and to propose<br />

new models for the combined effects of flame<br />

propagation and mixing. Dr. Grout also has experience<br />

deploying models for gaseous auto-ignition using<br />

commercial CFD codes.<br />

h Session(s): S0625 S3D Direct Numerical<br />

Simulation - Preparations for the 10-100PF Era<br />

(Tuesday, 15:00, Room: A2)<br />

Vinod Grover<br />

Senior Manager (NVIDIA)<br />

Vinod Grover manages the compiler team at NVIDIA and<br />

is responsible for compilation of CUDA and OpenCL to PTX<br />

ISA. Vinod has been with NVIDIA for 4 years and was at<br />

Microsoft and Sun Microsystems before that. He<br />

holds a Master’s degree in computer science from<br />

Syracuse University.<br />

h Session(s): S0235 - Compiling CUDA and Other<br />

Languages for <strong>GPU</strong>s (Wednesday, 10:00, Room: A5)<br />

Guy Gueritz<br />

(Bull)<br />

Guy Gueritz joined Bull in 2008 to develop Bull’s HPC<br />

business in the upstream oil and gas industry, with<br />

particular focus on <strong>GPU</strong>-accelerated hybrid systems for<br />

advanced seismic imaging applications such as Reverse<br />

Time Migration. He has over twenty years’ experience in<br />

HPC and visualization applied to the geosciences, with<br />

previous roles in Hewlett-Packard, Linux Networx and<br />

SGI. His worldwide responsibilities include working with<br />

oil companies, seismic contractors, independent<br />

software vendors and technology partners to deploy<br />

advanced imaging capabilities on scalable HPC systems.<br />

He regularly participates in oil industry seminars and<br />

conferences and is a member of SEG and EAGE.<br />

h Session(s): S0643 Hybrid Architectures for<br />

Advanced Seismic Imaging: Recent Experiences at<br />

Bull (Presented by Bull) (Tuesday, 17:00, Room: M)<br />

Thomas Guignon<br />

Research Engineer (IFPEN)<br />

Biography unavailable at press time.<br />

h Session(s): S0108 - An Innovative Massively<br />

Parallelized Molecular Dynamic Software<br />

(Tuesday, 16:00, Room: C)<br />

Kshitij Gupta<br />

Graduate Student Researcher (UC Davis)<br />

Kshitij Gupta is a Ph.D. candidate in the Department of<br />

Electrical & Computer Engineering at UC Davis. He is<br />

interested in a variety of application domains like audio,<br />

image, and video. His primary interests are in exploring<br />

novel ways of transforming today’s high-performance<br />

algorithms onto emerging low-end, low-power, hybrid<br />

(CPU/<strong>GPU</strong>/DSP/ASIP) processors targeted towards<br />

mobile and automotive platforms. In his spare time, he<br />

likes procrastinating about novel user interfaces, and<br />

hopes to work more actively on them some day. Kshitij<br />

received his Master’s in EE from the University of Pittsburgh<br />

(PA, USA), and his Bachelor’s in ECE from Osmania<br />

University (Hyderabad, India).<br />

h Session(s): S0157 - A Study of Persistent Threads<br />

Style <strong>Program</strong>ming Model for <strong>GPU</strong> Computing<br />

(Thursday, 15:00, Room: B)<br />

Pankaj Gupta<br />

Bioinformatics Application Developer (St Jude Children’s<br />

Research Hospital)<br />

Pankaj is working as a Bioinformatics Application<br />

Developer at St. Jude Children’s Research Hospital in<br />

Memphis, TN. He received his bachelor’s degree in<br />

Computer Science from Rutgers University and his<br />

master’s degree in Computational Bioscience from<br />

Arizona State University. He likes working with open-source<br />

technologies whenever possible.<br />

h Session(s): S0083 - Swift: A <strong>GPU</strong>-based Smith-<br />

Waterman Sequence Alignment <strong>Program</strong><br />

(Tuesday, 09:30, Room: K)<br />

Rohit Gupta<br />

PhD Student (Delft University of <strong>Technology</strong>)<br />

Rohit completed his master’s at the Delft University of<br />

<strong>Technology</strong> in computer engineering. During his<br />

master’s thesis he worked on implementing a<br />

preliminary version of a preconditioned conjugate<br />

gradient solver on the <strong>GPU</strong>. He continued at the Delft<br />

Institute of Applied Mathematics as a PhD student after<br />

graduating. His primary focus is to find new<br />

preconditioning methods that are suited to the <strong>GPU</strong> and<br />

at the same time are on par with established parallelizable<br />

preconditioning techniques like Block Incomplete<br />

Cholesky in terms of achievable precision and<br />

mathematical stability.<br />

h Session(s): S0063 - Robust Preconditioned Conjugate Gradient<br />

for the <strong>GPU</strong> and Parallel Implementations<br />

(Thursday, 16:00, Room: N)<br />

Sebastien Gurrieri<br />

Quantitative Analyst (Mizuho International)<br />

With a background of research in Theoretical Physics<br />

(String Theory), Gurrieri switched to finance 4 years ago.<br />

He is now working in the London branch of a Japanese<br />

investment bank and specializes in Risk Management of<br />

Fixed Income and Equity products. Until now he has<br />

been mostly interested in calibration and Monte-Carlo<br />

simulation issues, although he has also done some work<br />

on Finite Difference methods.<br />

h Session(s): S0206 - Monte-Carlo Pricing<br />

Under a Hybrid Local Volatility Model<br />

(Wednesday, 16:00, Room: L)<br />

Tobias Gysi<br />

(Supercomputing Systems AG)<br />

Tobias Gysi graduated in computer science from<br />

ETH Zurich, Switzerland, in 2005. He joined the R&D service<br />

provider Supercomputing Systems AG (SCS), working on<br />

advanced topics such as cryptography, image<br />

processing, speech recognition, and Monte-Carlo<br />

pricing. Tobias’ work has a strong focus on performance<br />

optimizations - developing more efficient<br />

implementation strategies and algorithms, and<br />

employing accelerators such as <strong>GPU</strong>s or FPGAs.<br />

Currently Tobias is dealing with a community code<br />

project where software maintainability and<br />

(performance) portability are key issues.<br />

h Session(s): S0256 – A Stencil Library for the New<br />

Dynamic Core of COSMO (Thursday, 09:00, Room: N)<br />

Alexander Haberstroh<br />

Software Developer (Jedox AG)<br />

Alexander Haberstroh studied computer science with a<br />

focus on image processing at the University of Freiburg,<br />

Germany, where he obtained his Master’s degree in<br />

2010. Between 2008 and 2010, he was also working at<br />

the Fraunhofer Institute for Solar Energy Systems.<br />

During his studies he worked on his first CUDA project,<br />

developing algorithms for comparing depth maps which<br />

are used in mobile robot mapping. Since 2011, he has<br />

been working at Jedox, concentrating on <strong>GPU</strong><br />

algorithms for multidimensional databases in the area<br />

of Business Intelligence.<br />

h Session(s): S0219 – Efficient Top-Down Planning in<br />

Business Intelligence (Tuesday, 17:00, Room: C)<br />

Markus Hadwiger<br />

Assistant Professor (KAUST)<br />

Markus Hadwiger is an assistant professor of computer<br />

science at King Abdullah University of Science and<br />

<strong>Technology</strong> (KAUST) in Saudi Arabia. His research<br />

interests are petascale visual computing and scientific<br />

visualization, volume rendering, and <strong>GPU</strong> algorithms in<br />

general. He is currently teaching classes on scientific<br />

visualization, and <strong>GPU</strong> and GP<strong>GPU</strong> programming. He<br />

obtained a PhD in computer science from the Vienna<br />

University of <strong>Technology</strong>. He has taught a series of<br />

courses on various aspects of visualization and volume<br />

rendering at ACM SIGGRAPH, IEEE Visualization, and<br />

Eurographics, and is a coauthor of the book Real-Time<br />

Volume Graphics (A.K. Peters, 2006).<br />

h Session(s): S0202 – Terascale Volume Visualization<br />

in Neuroscience (Wednesday, 16:30, Room: A8)<br />

Yoshiaki Hanada<br />

CEO (Prometech Software, Inc.)<br />

Yoshiaki Hanada is CEO of Prometech Software and<br />

works on promoting a particle simulation technology<br />

from Japan to the world. In his former job, he worked at<br />

Accenture Japan as a management consultant. In 2006<br />

he received a master’s degree from the Department of<br />

Advanced Energy, Graduate School of Frontier Sciences,<br />

The University of Tokyo.<br />

h Session(s): S0066 - Particleworks: Particle-based<br />

CAE Software Fully Ported on Multi-<strong>GPU</strong><br />

(Wednesday, 10:00, Room: K)<br />

Jerry Harris<br />

Senior Computer Scientist II (Adobe Systems)<br />

For 25+ years, Jerry has focused on deploying engaging<br />

commercial imaging applications: first as part of a<br />

startup that delivered the first commercial color paint<br />

program to the Macintosh, later at Apple, and for the<br />

past 15 years at Adobe working on Photoshop. He has been<br />

an engineer on the Photoshop team since version<br />

5.0 and is responsible for Layer Effects, Painting, Warping,<br />

and <strong>GPU</strong> acceleration. His current focus is on <strong>GPU</strong><br />

enablement, and the delivery of joy of use via immersive<br />

fluid workflows.<br />

h Session(s): S0395 - <strong>GPU</strong> Enablement in Adobe<br />

Photoshop (Tuesday, 09:00, Room: A2)<br />

Mark Harris<br />

Chief Technologist, <strong>GPU</strong> Computing (NVIDIA)<br />

Mark Harris is Chief Technologist for <strong>GPU</strong> Computing at<br />

NVIDIA, where he works as a developer advocate and<br />

helps drive NVIDIA’s <strong>GPU</strong> computing software strategy.<br />

His research interests include parallel computing,<br />

general-purpose computation on <strong>GPU</strong>s, physically based<br />

simulation, and real-time rendering. Mark founded<br />

www.GP<strong>GPU</strong>.org while he was earning his PhD in computer<br />

science from the University of North Carolina at Chapel<br />

Hill. Mark brews his own beer and cures his own bacon<br />

in Brisbane, Australia, where he lives with his wife and<br />

daughter.<br />

h Session(s): S0517A - <strong>Program</strong>ming <strong>GPU</strong>s with<br />

OpenACC (Part 1 of 3) (Monday, 10:30, Room: B)<br />

h S0517B - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />

2 of 3) (Monday, 13:00, Room: B)<br />

h S0517C - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 3<br />

of 3) (Monday, 14:30, Room: B)<br />

h S0641 - CUDA 5 and Beyond (Tuesday, 16:00, Hall 1)<br />

h S0653 - C++ and CUDA Birds-of-a-Feather<br />

(Wednesday, 18:00, Room: L)<br />

Mike Heck<br />

<strong>Technology</strong> Advisor (VSG)<br />

Biography unavailable at press time.<br />

h Session(s): S0444 - Explore New Techniques in<br />

Volume Rendering/Segmentation with Open<br />

Inventor (Tuesday, 15:30, Room: A7)<br />

Francisco J. Hernandez-Lopez<br />

PhD Student (CIMAT A.C.)<br />

Francisco received a bachelor’s degree in computer<br />

systems engineering from the San Luis Potosi Institute<br />

of <strong>Technology</strong>, Mexico in 2005. He received the MSc<br />

degree in Computer Science from the Center for<br />

Research in Mathematics (CIMAT) in 2009. Since then, he<br />

has been a doctoral student at CIMAT, where he has been<br />

granted a CONACYT scholarship. His main interests are<br />

in the area of computer vision and in particular the<br />

development of efficient, parallel algorithms for video<br />

processing and analysis.<br />

h Session(s): S0128 - V:Screen: A Real-Time<br />

Augmented Video Method<br />

(Wednesday, 17:00, Room: A1)<br />

David Helgason<br />

CEO (Unity Technologies)<br />

David Helgason, an entrepreneur, visionary and<br />

ex-programmer, has served as the CEO of Unity<br />

Technologies since co-founding it in 2003. The vision is<br />

to democratize game development and develop<br />

technology for the next generation of the industry. David<br />

founded and participated in startups in fields such as<br />

news and community integration, music distribution and<br />

consulting. He serves on the boards of several games<br />

and technology startups.<br />

h Session(s): S2001 – Emerging Companies Summit:<br />

CEO on Stage Featuring Unity Technologies,<br />

MirriAd and BioDigital<br />

(Wednesday, 10:00, Marriott Ballroom 4)<br />

Jeff Herbst<br />

Vice President of Business Development (NVIDIA)<br />

Jeff is the Vice President of Business Development at<br />

NVIDIA Corporation, the world leader in visual<br />

computing technologies (and inventor of the <strong>GPU</strong>). In<br />

this role, which he has held since 2001, Jeff leads<br />

NVIDIA’s worldwide business development efforts,<br />

including overall ecosystem development, mergers and<br />

acquisitions strategy, investments, partnerships and<br />

other strategic business relationships and transactions.<br />

Prior to NVIDIA, Jeff was the worldwide head of<br />

corporate and business development at AltaVista, and<br />

also served as general manager for a start-up focused<br />

on content delivery infrastructure for wireless networks.<br />

Earlier in his career, Jeff was a partner with the law firm<br />

of Wilson Sonsini where he specialized in corporate<br />

finance, joint ventures, mergers and acquisitions and<br />

other strategic business and intellectual property-related<br />

transactions. Jeff holds a B.S. degree in<br />

Computer Science from Brown University (where he<br />

studied computer graphics), and a law degree from<br />

Stanford Law School.<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Berk Hess<br />

PhD Student (KTH Royal Institute of <strong>Technology</strong>)<br />

Biography unavailable at press time.<br />

h Session(s): S0363 – Efficient Molecular Dynamics<br />

on Heterogeneous <strong>GPU</strong> Architectures in GROMACS<br />

(Wednesday, 16:00, Room: N)<br />

Christopher Horvath<br />

Global <strong>Technology</strong> Technical Director (Pixar)<br />

Biography unavailable at press time.<br />

h Session(s): S0102 – Flame On: Real-Time Fire<br />

Simulation for Video Games<br />

(Tuesday, 09:00, Room: J1)<br />

Julien Houssay<br />

Software Engineer (ALYOTECH)<br />

Julien is a software engineer at ALYOTECH, specialized<br />

in <strong>GPU</strong> computing in scientific applications. He is<br />

currently working on a marine scene simulator mixing<br />

electro-optics and radar, using <strong>GPU</strong> for both general<br />


purpose computing (CUDA and/or OpenCL) and<br />

rendering (OpenGL).<br />

h Session(s): S0053 – Real Time <strong>GPU</strong>-Based Marine<br />

Scenes Simulation (Thursday, 10:00, Room: N)<br />

Agatha Hu<br />

Developer Technology Engineer (NVIDIA)<br />

Agatha Hu is a Developer <strong>Technology</strong> Engineer at NVIDIA<br />

Corporation. She received a master’s degree in Biomedical<br />

Engineering from Shanghai Jiaotong University. Her work<br />

includes developing data parallel algorithms on <strong>GPU</strong> for<br />

bioinformatics as well as image processing.<br />

h Session(s): S0084 - CUMACH - A Fast <strong>GPU</strong>-based<br />

Genotype Imputation Tool<br />

(Wednesday, 16:30, Room: B)<br />

Jen-Hsun Huang<br />

Co-Founder, President and CEO (NVIDIA)<br />

Jen-Hsun Huang co-founded NVIDIA in 1993 and has<br />

served since its inception as president, chief executive<br />

officer and a member of the board of directors. Under<br />

his leadership, NVIDIA invented the graphics processing<br />

unit (<strong>GPU</strong>) in 1999. Since then, it has consistently set<br />

new standards in visual computing with breathtaking,<br />

interactive graphics available on devices ranging from<br />

tablets and portable media players to notebooks and<br />

workstations. NVIDIA’s expertise in programmable <strong>GPU</strong>s<br />

has led to breakthroughs in parallel processing which<br />

make supercomputing inexpensive and widely<br />

accessible. The company holds more than 1,100 U.S.<br />

patents, including ones covering designs and insights<br />

fundamental to modern computing.<br />

h Session(s): S3000: Opening Keynote<br />

(Tuesday, 10:30, Keynote Hall)<br />

h S2003: Emerging Companies Summit Fireside Chat<br />

(Wednesday, 14:00, Marriott Ballroom 4)<br />

John Humphrey<br />

Engineering Director (EM Photonics)<br />

John received his MSEE from the University of Delaware<br />

in 2004 and has been working in the field of accelerated<br />

computing for 10 years. The past six years have focused<br />

primarily on <strong>GPU</strong> applications, in areas ranging from<br />

computational electromagnetics to computational fluid<br />

dynamics and linear algebra libraries.<br />

h Session(s): S0304 - Large Scale Computational<br />

Fluid Dynamics Simulations on Hybrid<br />

Supercomputers (Wednesday, 10:30, Room: K)<br />

h S0307 - New Advances in <strong>GPU</strong> Linear Algebra<br />

(Wednesday, 14:00, Room: A3)<br />

h S0709 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 2<br />

(Thursday, 14:00, Room: J1)<br />

Maxwell Hutchinson<br />

PhD Student (University of Chicago)<br />

Maxwell is currently a physics PhD student at the<br />

University of Chicago, funded by a Department of Energy<br />

Computational Science Graduate Fellowship. He has<br />

been working with GP<strong>GPU</strong>s since 2008, applying them to<br />

problems in electronic structure, Ising models, error<br />

correction in radio systems, and post-processing for<br />

particle detectors.<br />

h Session(s): S0378 - VASP Accelerated with <strong>GPU</strong>s<br />

(Thursday, 14:00, Marriott Ballroom 4)<br />

Saeed Iqbal<br />

Senior Systems Engineer (Dell)<br />

Saeed Iqbal is a Senior Systems Engineer in the Global<br />

Solutions Engineering Group at Dell. Currently, he is the<br />

lead engineer on integration and performance analysis<br />

of <strong>GPU</strong>s in the Dell HPC solutions. He is also the lead<br />

engineer of the HPC advisor online tool at Dell.com/hpc.<br />

This tool is used by HPC customers to configure<br />

<strong>GPU</strong>-enabled HPC clusters and associated high performance<br />

parallel storage clusters.<br />

h Session(s): S0309 – Dynamically Allocating GP<strong>GPU</strong><br />

to Host Nodes (Servers) (Thursday, 10:30, Room: K)<br />

Olexan Isayev<br />

Research Scientist (Case Western Reserve University)<br />

Olexan Isayev was born in Ukraine and earned his Ph.D.<br />

in Theoretical Chemistry under the supervision of Jerzy<br />

Leszczynski at Jackson State University. He is currently a<br />

joint Postdoctoral Fellow at Case Western Reserve<br />

University and the US Army Engineering Research and<br />

Development Center (ERDC). Dr. Isayev’s research<br />

interests focus on structure and dynamics at bio-nano<br />

interfaces, first principles and hybrid QM/MM simulations<br />

and high performance computing.<br />

h Session(s): S0315 - Probing Bio-Nano Interface<br />

Structure from Microsecond Molecular Dynamics<br />

on <strong>GPU</strong>s (Thursday, 10:00, Marriott Ballroom 4)<br />

Michel Izygon<br />

CTO (Tietronix Software, Inc.)<br />

Dr. Izygon has been involved in Solar Energy Projects<br />

since 1982, when he became the Principal Investigator<br />

on a French-Israeli research project to build and assess<br />

the performance of different solar energy concentrating<br />

systems. Since 1999, Dr. Izygon has been the co-founder<br />

and CTO of Tietronix Software, a company specializing<br />

in custom software development for customers such<br />

as NASA.<br />

h Session(s): S0321 – <strong>GPU</strong>-Based Monte Carlo Ray<br />

Tracing Simulation for Solar Power Plants<br />

(Tuesday, 14:00, Room: A8)<br />

Kevin Jackson<br />

Founder / CEO (Viewpartners)<br />

Kevin Jackson is founder and CEO of Viewpartners. He<br />

has 20+ years of visual media experience as one of the<br />

first in L.A.’s special effects market. He has worked with<br />

the biggest names in the film and advertising industry –<br />

Sony, Disney, BBDO, JWT, and others.<br />

h Session(s): S0425 - File Sharing Plus Real Time<br />

Media and Document Collaboration<br />

(Wednesday, 17:30, Room: A1)<br />

Jan Jacob<br />

Postdoctoral Researcher (University of Hamburg)<br />

Dr. Jan Jacob is a postdoctoral researcher at the<br />

Institute of Applied Physics of the University of Hamburg,<br />

Germany. He studied physics in Hamburg and graduated<br />

in 2007 with his diploma thesis “Preparation and<br />

Characterization of Spin Filters based on InAs Quantum-<br />

Point Contacts”. Two years later in 2009 he received his<br />

Ph.D. from the University of Hamburg for his thesis<br />

“All-electrical InAs Spin Filters”. Since then he has expanded<br />

his research from low-temperature magnetotransport<br />

measurements to numerical high-performance<br />

computing simulations of spin and charge transport in<br />

mesoscopic systems to model spintronic devices.<br />

h Session(s): S0379 - <strong>GPU</strong>-based High-Performance<br />

Simulations for Spintronics<br />

(Tuesday, 14:30, Room: A8)<br />

M. Saleet Jafri<br />

Professor and Chair (George Mason University)<br />

M. Saleet Jafri is a Professor in the School of Systems<br />

Biology at George Mason University. His current research<br />

uses detailed multi-scale models consisting of the<br />

subcellular, cellular, and tissue components to


understand the mechanisms that give rise to complex<br />

diseases in the heart such as cardiac arrhythmia,<br />

ischemic heart disease, and heart failure. <strong>GPU</strong> computing<br />

plays a central role in these studies. He received his PhD<br />

from Mount Sinai School of Medicine/CUNY in the<br />

Biomathematical Sciences, his MS in Mathematics from the<br />

Courant Institute of Mathematical Sciences at NYU and his<br />

BS in mathematics from Duke University.<br />

h Session(s): S0072 - <strong>GPU</strong>-Enabled Spatiotemporal<br />

Model of Stochastic Cardiac Calcium Dynamics and<br />

Arrhythmias (Wednesday, 09:00, Room: B)<br />

Michal Januszewski<br />

PhD Student and Software Engineer (University of Silesia<br />

in Katowice; Google Switzerland)<br />

Michał Januszewski is a Software Engineer at Google<br />

Switzerland and a PhD student at the University of<br />

Silesia in Katowice under the supervision of Prof. Marcin<br />

Kostur. His current research is centered around applying<br />

mesoscale hydrodynamics simulation methods to<br />

biologically relevant flows. Michał is also the leader of<br />

the Sailfish project, an open source effort to build a<br />

highly scalable lattice Boltzmann fluid dynamics solver<br />

for <strong>GPU</strong>s.<br />

h Session(s): S0258 - Sailfish: Lattice Boltzmann<br />

Fluid Simulations with <strong>GPU</strong>s and Python<br />

(Tuesday, 09:30, Room: A7)<br />

WeiLe Jia<br />

Postgraduate Student (Supercomputing Center of CNIC,<br />

Chinese Academy of Sciences)<br />

Weile Jia is a postgraduate student at the<br />

Supercomputing Center of the Chinese Academy of Sciences.<br />

h Session(s): S0392 - Large-Scale First Principle<br />

Pseudopotential DFT Calculations on <strong>GPU</strong> Clusters<br />

(Thursday, 15:30, Marriott Ballroom 4)<br />

Stephen Jones<br />

CUDA Developer (NVIDIA)<br />

Stephen Jones is a member of CUDA’s parallel<br />

algorithms group. Having first worked on the CUFFT<br />

library, he moved on to architect the parallel system<br />

software framework which enables system I/O from <strong>GPU</strong><br />

kernels, and wrote the first parallel system calls. He has<br />

made a particular study of thread execution on the <strong>GPU</strong>,<br />

and now works on future <strong>GPU</strong> architectures and<br />

development of the CUDA programming model.<br />

h Session(s): S0313 – Understanding and using<br />

Atomic Memory Operations<br />

(Tuesday, 14:00, Marriott Ballroom 3)<br />

h S0642 - Inside Kepler (Wednesday, 14:00, Hall 1)<br />

h S0338 - New Features In the CUDA <strong>Program</strong>ming<br />

Model (Thursday, 10:00, Hall 1)<br />

h S0707 - Los Alamos AHPC Symposium, Accelerated<br />

HPC Symposium: Scalability: Hardware and<br />

Software (Thursday, 09:00, Room: J2)<br />

Mark E S Joselli<br />

Researcher (UFF)<br />

Mark graduated as an Industrial Engineer with an Electrical and Electronic<br />

emphasis from the Federal Center for Technological<br />

Education Celso Suckow da Fonseca (CEFET-RJ, 2005)<br />

and holds an MSc in Computer Science from Federal Fluminense<br />

University (2007). He has experience in Computer<br />

Science with an emphasis on Computer Methods and<br />

Techniques, working on the following topics: games,<br />

simulation and GP<strong>GPU</strong>.<br />

h Session(s): S0074 - Techniques for Designing<br />

GP<strong>GPU</strong> Games (Thursday, 17:00, Room: L)<br />

Guido Juckeland<br />

System Engineer (HPC), Leader Hardware Accelerator<br />

Group (TU Dresden - ZIH)<br />

Guido is a computer engineer at Technische Universität<br />

Dresden where he is responsible for the design, setup and<br />

operation of the HPC resources for the state of Saxony. He<br />

is also working on a Ph.D. thesis titled “Trace Based<br />

Performance Analysis for Hardware Accelerators”.<br />

h Session(s): S0067 – PICon<strong>GPU</strong> - Bringing Large-Scale<br />

Laser Plasma Simulations to <strong>GPU</strong><br />

Supercomputing (Tuesday, 15:00, Room: A8)<br />

h S0257 - Trace Based Performance Analysis For<br />

<strong>GPU</strong> Accelerated Multi-Hybrid Applications<br />

(Wednesday, 16:30, Room: A5)<br />

Patrick Kano<br />

Co-Owner (Acunum Algorithms and Simulations, LLC)<br />

Patrick Kano’s background lies in algorithm and<br />

simulation development and physics based modeling. In<br />

addition to being a co-owner of Acunum, he is a<br />

consultant with PsiNapse <strong>Technology</strong> in the San<br />

Francisco Bay Area. He studied at the University of<br />

Arizona (2001-2005) and was a student at the Arizona<br />

Center for Mathematical Sciences. He received a<br />

Diplom-Physik from the Dresden University of<br />

<strong>Technology</strong> in 2000 and a BS in physics from the<br />

University of Nevada, Reno in 1998. From 1998 to 2000,<br />

he was a research assistant at the Max Planck Institute<br />

for the Physics of Complex Systems.<br />

h Session(s): S0415 - An Accelerated Weeks Method<br />

for Numerical Laplace Transform Inversion<br />

(Wednesday, 09:30, Marriott Ballroom 3)<br />

Steve Karmesin<br />

Senior Developer (Numerix)<br />

Dr. Steve Karmesin is a senior developer at Numerix LLC<br />

working with many aspects of the CrossAsset derivatives<br />

pricing and analytics software, from software<br />

architecture to numerical modeling to <strong>GPU</strong><br />

development. His <strong>GPU</strong> work rests on his background in<br />

supercomputing at the Los Alamos Advanced Computing<br />

Laboratory where he worked on numerous massively<br />

parallel projects including leading the POOMA (Parallel<br />

Object Oriented Methods and Algorithms) team for<br />

applying advanced C++ techniques to large scale<br />

scientific codes.<br />

h Session(s): S0383 - Speedup Derivatives and<br />

Structured Products Pricing, Reduce TCO Using<br />

<strong>GPU</strong>s (Wednesday, 09:00, Room: L)<br />

Eric Kelmelis<br />

CEO (EM Photonics)<br />

Biography unavailable at press time.<br />

h Session(s): S0304 – Large Scale Computational<br />

Fluid Dynamics Simulations on Hybrid<br />

Supercomputers (Wednesday, 10:30, Room: K)<br />

Christopher Kennelly<br />

Research Scientist (D. E. Shaw Research)<br />

Chris Kennelly received his B.S. in computer science<br />

from Caltech. During his time as an Amgen Scholar at<br />

Caltech, he developed algorithms for simulating DNA<br />

self-assembly. Since then, Chris has been employed at<br />

D.E. Shaw Research developing algorithms and software<br />

for Desmond.<br />

h Session(s): S0078 - Panoptes: A Binary<br />

Instrumentation Framework for CUDA<br />

(Thursday, 10:00, Room: B)<br />


Osman Kent<br />

Co-Founder & CEO (Numecent)<br />

Osman Kent is a serial technology and media<br />

entrepreneur. He is best known as the co-founder & CEO<br />

of 3Dlabs – at one time a $1B company on NASDAQ and<br />

one of the fathers of the <strong>GPU</strong> and of OpenGL on the PC.<br />

He has a First Class double-major in Computer Science<br />

and Electronics from University of Birmingham (UK), is a<br />

fellow of the Royal Society of Arts (RSA) and was recently given<br />

the Freedom of London for lifetime contributions to the<br />

IT industry. He is the inventor of numerous patents in<br />

computing and graphics. In his spare time, Osman<br />

incubates musicians through his record label<br />

Songphonic, recites live poetry while improvising on the<br />

piano and produces music for films.<br />

h Session(s): S2003 – Emerging Companies<br />

Summit: CEO on Stage Featuring GAIKAI,<br />

Immersive Media and Numecent<br />

(Wednesday, 15:00, Marriott Ballroom 4)<br />

Mahesh Khadtare<br />

PhD Student - Scientist ESP (I2IT, Pune University)<br />

Biography unavailable at press time.<br />

h Session(s): S0103 - Accelerating Protein<br />

Sequences and Classification using <strong>GPU</strong>-HMMER<br />

Search (Wednesday, 15:30, Room: B)<br />

h S0107 - Acceleration of Long-Wave Rapid<br />

Radiative Transfer Model on GP<strong>GPU</strong><br />

(Thursday, 10:30, Room: N)<br />

Brucek Khailany<br />

Senior Research Scientist (NVIDIA)<br />

Brucek Khailany joined NVIDIA in December 2009 as a<br />

member of the Computer Architecture Research Group.<br />

Previously, Dr. Khailany was a Co-Founder and Principal<br />

Architect at Stream Processors, Inc. (SPI) where he led<br />

research and development activities related to highly parallel<br />

programmable processor architectures. He<br />

received his Ph.D. and Masters in Electrical Engineering<br />

from Stanford University and received B.S.E. degrees in<br />

Electrical Engineering and Computer Engineering from<br />

the University of Michigan.<br />

h Session(s): S0605 - cudaDMA: Emulating DMA<br />

engines on <strong>GPU</strong>s for Performance and<br />

<strong>Program</strong>mability (Wednesday, 17:00, Room: C)<br />

Ali Khajeh-Saeed<br />

PhD Candidate (University of Massachusetts, Amherst)<br />

Ali Khajeh-Saeed obtained his Ph.D. in Mechanical<br />

Engineering and in Computer Science at the University<br />

of Massachusetts Amherst in November 2011. Ali was<br />

awarded bachelor’s and master’s degrees in Aerospace<br />

Engineering from Sharif University of <strong>Technology</strong>, Iran in<br />

2008. His main research interests are Computational<br />

Fluid Dynamics (CFD), parallel computation and<br />

General-Purpose computation on Graphics Processing<br />

Units (GP<strong>GPU</strong>). He is currently working as a software<br />

engineer at CD-Adapco.<br />

h Session(s): S0217 - Efficient Implementation of<br />

CFD Algorithms on <strong>GPU</strong> Accelerated<br />

Supercomputers (Wednesday, 17:30, Room: K)<br />

Oleh Khoma<br />

Head of HPC Unit (ELEKS)<br />

With a background in Applied Mathematics and more than<br />

12 years of experience in Software Engineering, Oleh is<br />

Head of the HPC Unit at ELEKS and leads the company’s<br />

efforts in the realm of High Performance Computing.<br />

During the last couple of years Oleh and his team have<br />

successfully completed several complex bespoke HPC<br />

solutions utilizing the power of NVIDIA GP<strong>GPU</strong> cards.<br />

Passionate about engineering, his largest affection is his<br />

team. When you have the right people, no problem is<br />

challenging enough.<br />

h Session(s): S6047 - Effective HPC Architecture -<br />

Design, Develop, Implement (Presented by ELEKS)<br />

(Wednesday, 17:00, Room: A7)<br />

Mark Kilgard<br />

Principal Software Engineer (NVIDIA)<br />

Mark J. Kilgard is a Principal System Software Engineer<br />

and an NVIDIA Distinguished Inventor based in Austin,<br />

Texas. Mark works on OpenGL, programmable shading<br />

languages, and <strong>GPU</strong>-rendering algorithms. Mark wrote<br />

numerous important OpenGL extension specifications<br />

and implemented the popular OpenGL Utility Toolkit<br />

(GLUT) for developing portable OpenGL examples and<br />

demos. Mark co-authored the book The Cg Tutorial: The<br />

Definitive Guide to Programmable Real-Time Graphics.<br />

Mark’s Karaoke rendition of Dolly Parton’s “9 to 5” can’t<br />

be beat.<br />

h Session(s): S0023 - NVIDIA OpenGL for <strong>2012</strong><br />

(Monday, 09:00, Room: A3)<br />

h S0024 - <strong>GPU</strong>-Accelerated Path Rendering<br />

(Tuesday, 14:00, Room: A3)<br />

Jihan Kim<br />

Postdoctoral Researcher (Berkeley Lab)<br />

Jihan Kim began his new postdoctoral researcher position<br />

at NERSC in August 2009, after earning his doctorate<br />

degree in electrical engineering at the University of Illinois<br />

Urbana-Champaign. For his dissertation, Kim wrote a<br />

quantum Monte Carlo code in C used to conduct<br />

simulations of quantum dots. He also worked on the<br />

device simulator Charon, during a summer internship at<br />

the Sandia National Laboratory. Currently, he is<br />

collaborating with Prof. Berend Smit from UC Berkeley on<br />

a carbon capture and separation project.<br />

h Session(s): S0122 - Computational Screening<br />

of Novel Carbon Capture Materials<br />

(Thursday, 10:30, Marriott Ballroom 4)<br />

Grzegorz Kokosiński<br />

Software Engineer (IBM Poland)<br />

Grzegorz Kokosiński received his MSc in Computer Science from<br />

Warsaw University of <strong>Technology</strong> in Poland, with a thesis<br />

on ray tracing implementation on CUDA, in 2010.<br />

Since February 2011, he has been a Software Engineer at the IBM<br />

Netezza R&D Department in Warsaw, Poland. He has<br />

been involved in an HPC appliance project as a CUDA team<br />

member, where he contributed to many proofs of<br />

concept, including implementations of advanced analytics, bioinformatics<br />

and geospatial algorithms on CUDA.<br />

h Session(s): S0376 - Dynamic <strong>Program</strong>ming on<br />

CUDA: Finding the Most Similar DNA Sequence<br />

(Tuesday, 10:00, Room: K)<br />

David Korf<br />

Senior Marketing Manager (Hewlett-Packard)<br />

Mr. Korf has 16 years of engineering experience with the<br />

last 25 years in various senior marketing, product<br />

management and partner management positions.<br />

Accelerators, partner relationships and competitive<br />

analysis are currently some of his focus areas.<br />

h Session(s): S0633 - Learn about new Hewlett-<br />

Packard <strong>GPU</strong> Systems, Solutions, and Applications!<br />

(Wednesday, 10:00, Room: M)<br />

Alexandr Kosenkov<br />

Software Engineer (University of Geneva)<br />

Highly qualified software engineer in the field of HPC<br />

and distributed applications under Linux with over five<br />

years of experience. Possesses strong understanding


of hardware architecture designs down to the physics<br />

level and high-level technologies/programming<br />

languages. Open-minded leader, delivering useable,<br />

well-designed products.<br />

h Session(s): S0039 - Data-Driven GP<strong>GPU</strong> Ideology<br />

Extension (Thursday, 10:00, Marriott Ballroom 3)<br />

Jiri Kraus<br />

(Fraunhofer Institute for Algorithms and Scientific<br />

Computing (FhG-SCAI))<br />

Biography unavailable at press time.<br />

h Session(s): S0706 - Los Alamos AHPC Symposium,<br />

Efficient AMG on Hybrid <strong>GPU</strong> Clusters<br />

(Wednesday, 17:00, Room: J)<br />

Adarsh Krishnamurthy<br />

Post-Doctoral Researcher (UC San Diego)<br />

Adarsh Krishnamurthy is a post-doctoral researcher in<br />

the department of bioengineering at UC San Diego. His<br />

research interests include computer-aided design (CAD),<br />

geometric modeling, parallel <strong>GPU</strong> algorithms,<br />

biomechanics, and heart modeling. He received his Ph.D.<br />

in mechanical engineering from UC Berkeley, specializing<br />

in parallel <strong>GPU</strong> algorithms for CAD. He received his<br />

bachelor’s and master’s degrees in mechanical engineering from the<br />

Indian Institute of <strong>Technology</strong>, Madras, India.<br />

h Session(s): S0410 - Computing Hausdorff<br />

Distances between Freeforms on the <strong>GPU</strong><br />

(Wednesday, 17:00, Marriott Ballroom 3)<br />

Christoph Kubisch<br />

Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Prior to joining NVIDIA as a Developer <strong>Technology</strong> Engineer<br />

(Professional Solutions), Christoph was a Ph.D. student<br />

on hardware accelerated visualization techniques for<br />

medical datasets at the Otto-von-Guericke University of<br />

Magdeburg. During his studies he co-authored<br />

luxinia, a scriptable 3D game engine for games and<br />

research projects. Furthermore, he has worked for the<br />

games industry as a technical artist doing game art,<br />

shader and 3dsmax plugin development.<br />

h Session(s): S0105 - Hardware Acceleration<br />

for Vessel Visualization Tasks<br />

(Wednesday, 14:30, Room: A8)<br />

Wesley Kuo<br />

CEO (Ubitus)<br />

Wesley Kuo founded Ubitus Inc. in 2007. Ubitus<br />

specializes in providing cutting-edge cloud computing<br />

technology for multimedia applications and has won<br />

recognition from leading carriers and hand-held device<br />

manufacturers around the world including NTT, NTT<br />

Docomo and Samsung Electronics. Wesley is a<br />

successful entrepreneur who founded i@Solution Inc. in<br />

2000, which was later merged with Aplix Corporation in<br />

2004, where he was a board member and held several<br />

managerial positions in the field of international sales,<br />

marketing and OEM business. Wesley holds a Bachelor’s<br />

degree in Computer Science and Information<br />

Engineering from National Taiwan University and has<br />

dedicated his career to cloud computing, distributed<br />

computing and embedded solutions.<br />

h Session(s): S2002 – Emerging Companies Summit:<br />

CEO on Stage Featuring eyesight Mobile,<br />

Numira Biosciences, and Ubitus<br />

(Wednesday, 11:00, Marriott Ballroom 4)<br />

Jean Luc Lacome<br />

CEO (IMPETUS Afea SAS)<br />

Jean Luc Lacome has a background in Applied<br />

Mathematics and has been working for the past 10 years<br />

on the development of Smoothed Particle<br />

Hydrodynamics. Jean Luc has interests in fluid-structure<br />

interaction and defense applications. Jean Luc is CEO of<br />

IMPETUS Afea France.<br />

h Session(s): S0143 - Fluid-Structure-Interaction<br />

Using SPH and GP<strong>GPU</strong> <strong>Technology</strong><br />

(Wednesday, 14:30, Room: K)<br />

Gianluca Lamanna<br />

Researcher (CERN)<br />

Gianluca is a physicist working at CERN, the European<br />

Laboratory for Particle Physics. In particular, at the<br />

moment, he’s involved in building the trigger system and<br />

the data acquisition system for an experiment searching<br />

for very rare processes. He obtained his PhD in physics<br />

in 2006 at the University of Pisa with a thesis in data<br />

analysis on the search for possible violations of the<br />

particle physics Standard Model. After his PhD he spent<br />

a few years acquiring skills in electronics design and<br />

FPGA programming, which are very useful in his field for building<br />

detectors and acquisition systems.<br />

h Session(s): S0013 - <strong>GPU</strong>s for Fast Triggering in<br />

NA62 Experiment (Tuesday, 10:00, Room: J2)<br />

Bjoern Landmann<br />

Development Engineer (FluiDyna GmbH)<br />

Landmann has been a development engineer at FluiDyna GmbH,<br />

Munich, Germany, since 2011. His research interests<br />

include: computational multiphysics; high-performance<br />

computing; and turbulence and aeroacoustics.<br />

h Session(s): S0293 - Culises – A Library for<br />

Accelerated CFD on Hybrid <strong>GPU</strong>-CPU Systems<br />

(Wednesday, 15:30, Room: K)<br />

Ian Lane<br />

Assistant Research Professor (Carnegie Mellon<br />

University)<br />

Biography unavailable at press time.<br />

h Session(s): S0223 – Rapid Training of Acoustic<br />

Models Using <strong>GPU</strong>s (Tuesday, 15:00, Room: N)<br />

Gerhard Lang<br />

Chief Engineering Officer (VizRT)<br />

Biography unavailable at press time.<br />

h Session(s): S0356 - Optimizing Texture Transfers<br />

(Tuesday, 16:00, Room: J2)<br />

Tobias Lauer<br />

Senior Researcher (Jedox AG)<br />

Tobias Lauer received his PhD in computer science from the<br />

University of Freiburg (Germany) in 2007. From 2008 to<br />

2011, he did research on parallel algorithms for OLAP<br />

applications in a project sponsored by the German<br />

Research Foundation (DFG). He is now a Senior<br />

Researcher at Jedox AG, a software company specializing<br />

in Business Intelligence.<br />

h Session(s): S0219 - Efficient Top-Down Planning in<br />

Business Intelligence (Tuesday, 17:00, Room: C)<br />

Jeff Layton<br />

Enterprise Technologist for HPC (Dell)<br />

Dr. Jeffrey Layton is the Enterprise Technologist for HPC<br />

within Dell. Dr. Layton’s Ph.D. is from Purdue in<br />

Aeronautical and Astronautical Engineering. In his 25+<br />

years of experience with Supercomputing technologies,<br />

Dr. Layton has served in roles as a Professor, Engineer<br />

and Scientist at Boeing, Lockheed Martin, NASA, and<br />

Clarkson University, and has led technical efforts for High<br />

Performance Computing companies such as Linux<br />

Networx, Panasas, and Dell. In these roles he has been a<br />

cluster builder, a cluster user and code writer, a cluster<br />

administrator, as well as a systems engineer, manager,<br />

and benchmark engineer for HPC vendors. He is also an<br />


active contributor to multiple open source projects and<br />

contributes to technical publications, including<br />

magazines, books and websites.<br />

h Session(s): S0637 - Analyzing performance<br />

and power of applications with <strong>GPU</strong>s on<br />

Dell 12G platforms (Presented by Dell)<br />

(Wednesday, 14:00, Room: M)<br />

Simon Layton<br />

PhD Candidate (Boston University)<br />

Simon Layton obtained his Masters in Mechanical<br />

Engineering from Boston University in 2011, and a<br />

Bachelor’s in mathematics and computer science from<br />

the University of Bristol in 2008. He is a PhD candidate<br />

under the supervision of Professor Barba at Boston<br />

University. During his postgraduate studies, he has<br />

worked on <strong>GPU</strong>-based projects, including the Fast Gauss<br />

transform and a CUDA based implementation of the<br />

immersed boundary method in fluid dynamics. Currently<br />

he is working on a <strong>GPU</strong> accelerated classical algebraic<br />

multigrid, work begun while interning at NVIDIA in<br />

Jonathan Cohen’s emerging applications group during<br />

the Summer of 2011.<br />

h Session(s): S0305 - Classical Algebraic Multigrid<br />

for CFD with CUDA (Thursday, 10:00, Room: A8)<br />

Scott Le Grand<br />

Principal Engineer (Amazon Web Services)<br />

Scott Le Grand is currently a principal engineer at<br />

Amazon Web Services. He developed the first molecular<br />

modeling system for home computers, Genesis, in 1987,<br />

Folderol, the distributed computing project targeted at<br />

the protein folding problem in 2000, and BattleSphere, a<br />

networkable 3D space shooter for the Atari Jaguar the<br />

same year. Surprisingly, all three of these efforts shared<br />

a common codebase. More recently, he ported the<br />

Folding@Home codebase to CUDA, achieving a 5x<br />

speedup over previous efforts; the port currently<br />

accounts for ~2.6 petaFLOPs of the project’s<br />

computational firepower. He is best known for his work<br />

porting the AMBER molecular dynamics package to<br />

CUDA, attaining record-breaking performance in the<br />

process. In a previous life, Scott picked up a B.S. in<br />

biology from Siena College and a Ph.D. in biochemistry<br />

from the Pennsylvania State University. In the current<br />

life, he is developing life science services on Amazon’s<br />

Elastic Compute Cloud (EC2).<br />

h Session(s): S0644 - Molecular Dynamics, <strong>GPU</strong>s, and<br />

EC2 (Presented by Amazon Web Services)<br />

(Thursday, 10:00, Room: L)<br />

Chris Leader<br />

Research Assistant (Stanford Exploration Project)<br />

Chris Leader is currently working towards a PhD in<br />

Geophysics with the Stanford Exploration Project, under<br />

the supervision of Biondo Biondi and Jon Claerbout. He<br />

received an MSc in Geophysics from Imperial College<br />

London whilst working on imaging 3D land seismic data<br />

and a BA in Physics from The University of Oxford whilst<br />

working on astrophysics and atmospheric phenomena.<br />

His interests include imaging blended seismic data,<br />

geophysical algorithm acceleration using advanced<br />

computing architectures and using micro-seismic data<br />

for imaging purposes.<br />

h Session(s): S0125 - Memory Efficient Reverse Time<br />

Migration in 3D (Wednesday, 10:00, Room: A7)<br />

Brent Leback<br />

Engineering Manager (Portland Group)<br />

Brent Leback is an Engineering Manager for PGI. He has<br />

worked in various positions over the last 26 years in HPC<br />

customer support, math library development,<br />

applications engineering and consulting at QTC, Axian,<br />

PGI and STMicroelectronics.<br />

h Session(s): S0622 - The Portland Group OpenACC<br />

(Thursday, 10:00, Room: A5)<br />

David Lecomber<br />

CTO (Allinea Software)<br />

Dr. David Lecomber is a founder of Allinea and leads the<br />

research, development and support teams behind its<br />

software products. David’s history in High Performance<br />

Computing began with the Oxford BSP group in 1993,<br />

working on alternatives for parallel programming to the<br />

emerging complex MPI standard. He obtained a DPhil in<br />

Parallel Computing, on the simulation of shared memory and<br />

formal semantics for distributed-memory<br />

clusters, continuing to research parallel libraries and<br />

languages afterwards. After two years developing<br />

software for online services on clusters, he returned to<br />

HPC at Allinea, building the development tools needed<br />

for parallel and multithreaded software.<br />

h Session(s): S0099 - Debugging <strong>GPU</strong> Applications<br />

For Correctness and Performance<br />

(Wednesday, 15:00, Room: A5)<br />

HyoukJoong Lee<br />

PhD Student (Stanford University)<br />

HyoukJoong Lee is a PhD candidate in electrical<br />

engineering at Stanford University. His research<br />

interests include parallel computer architecture and<br />

general-purpose <strong>GPU</strong> computing, along with their<br />

programming models. He has an MS in electrical<br />

engineering from Stanford University.<br />

h Session(s): S0365 - Delite: A Framework for<br />

Implementing Heterogeneous Parallel DSLs<br />

(Wednesday, 15:00, Room: C)<br />

David Lehavi<br />

Senior Research Scientist (HP)<br />

David Lehavi is a senior research scientist with HP Labs<br />

Israel. He got his Ph.D. in algebraic geometry from the<br />

Hebrew University of Jerusalem in 2002. He has done<br />

research in algebraic geometry, bioinformatics,<br />

computerized proofs, communication networks, and<br />

semantics. He is currently interested in machine<br />

learning, and in various models for execution of general<br />

purpose algorithms on <strong>GPU</strong>s.<br />

h Session(s): S0043 - 30x Faster Regular<br />

Expressions on a <strong>GPU</strong> (Tuesday, 17:30, Room: C)<br />

Eric Lequiniou<br />

Director, High Performance Computing (Altair)<br />

Eric Lequiniou is director of High Performance Computing at<br />

Altair. An expert in software optimization and parallelization<br />

on clusters and multi-core architectures, he developed<br />

the Hybrid MPP parallel version of the RADIOSS finite<br />

element software. After earning an MSc degree in computer<br />

science, Eric started his career in 1994 at CNRS, in<br />

Laboratoire Informatique du Parallélisme. He joined<br />

Mecalog in 1994 and worked for the French<br />

company until it was acquired by Altair in 2006. He also<br />

holds an Executive MBA, obtained in 2007, from the French<br />

business school HEC.<br />

h Session(s): S0225 - Speedup Altair RADIOSS<br />

Solvers Using NVIDIA <strong>GPU</strong><br />

(Wednesday, 09:30, Room: K)<br />

Wei Li<br />

Research Scientist (Siemens Corporation)<br />

Wei Li is a research scientist at Siemens Corporation,<br />

Corporate Research & <strong>Technology</strong>, with the responsibility<br />

focused on <strong>GPU</strong>-related innovations for Siemens’<br />

products. He is the creator of the volume renderer that


is widely deployed in Syngo.via, the medical imaging<br />

platform of Siemens Healthcare. Wei Li received a PhD<br />

in computer science from Stony Brook University. His<br />

research interests include visualization, medical<br />

imaging, and <strong>GPU</strong> acceleration for graphics and non-graphics<br />

applications. He has published 20+ papers in<br />

prestigious journals and conferences, and has produced<br />

10+ approved and pending patents.<br />

h Session(s): S0342 - Volumetric Processing and<br />

Visualization on Heterogeneous Architecture<br />

(Wednesday, 14:00, Room: A8)<br />

Cheng Liao<br />

Development Manager (MSCsoftware)<br />

Cheng Liao received a PhD degree from Georgia Tech, and<br />

is a development manager with MSCsoftware. His<br />

professional interests include high performance matrix<br />

computing, I/O, and other FEA related technologies. Prior<br />

to MSC, Cheng spent many years with SGI and Convex.<br />

h Session(s): S0064 - MD.Nastran Sparse Direct<br />

Solvers for Tesla <strong>GPU</strong>s<br />

(Wednesday, 14:00, Room: K)<br />

Jerome Limido<br />

Research & Development (IMPETUS Afea SAS)<br />

Jérôme Limido has experience in research and<br />

advanced engineering for aerospace applications.<br />

The main work of Jérôme has focused on processes<br />

involving large deformations, both experimentally and<br />

numerically. Jérôme has special interests in advanced<br />

numerical methods and fatigue of materials. Jérôme is<br />

responsible for R&D at IMPETUS Afea France and teaches<br />

Advanced Computational Mechanics and Numerical<br />

Methods at ISAE.<br />

h Session(s): S0143 – Fluid-Structure-Interaction<br />

Using SPH and GP<strong>GPU</strong> <strong>Technology</strong><br />

(Wednesday, 14:30, Room: K)<br />

Cheng-Hung Lin<br />

Associate Professor (National Taiwan Normal University)<br />

Cheng-Hung Lin received the Ph.D. degree in computer<br />

science from the National Tsing Hua University in 2008. He<br />

is currently an associate professor with National Taiwan<br />

Normal University. His current research interests include<br />

multicore programming and parallel algorithm design.<br />

h Session(s): S0054 - PFAC Library: <strong>GPU</strong>-Based<br />

String Matching Algorithm<br />

(Thursday, 14:00, Room: C)<br />

Heshan Lin<br />

Research Scientist (Virginia Tech)<br />

Heshan Lin is a Research Scientist in the Department of<br />

Computer Science at Virginia Tech. His current research<br />

focuses on the intersection of High Performance<br />

Computing and Bioinformatics. Specifically, his research<br />

aims at massively accelerating biological discoveries<br />

with emergent computational techniques including<br />

graphics processing units (<strong>GPU</strong>) and cloud computing.<br />

He is the author of the latest version of mpiBLAST, a<br />

popular parallel sequence-search software that has<br />

received thousands of downloads worldwide. He received<br />

a Ph.D. degree in Computer Science from North Carolina<br />

State University in 2009.<br />

h Session(s): S0156 - Towards Computing the Cure<br />

for Cancer (Tuesday, 17:00, Hall 1)<br />

James Lin<br />

Technical Director, High Performance Computing Center<br />

(Shanghai Jiao Tong University)<br />

James Lin is technical director of the High Performance<br />

Computing Center at Shanghai Jiao Tong University and<br />

co-founder of the HMPP Competence Center for AP & Japan.<br />

His major research area is parallel programming,<br />

especially applying CUDA to CFD. He received an<br />

NVIDIA Academic Partnership <strong>Program</strong> award in 2010 and<br />

is on the review committee for the CUDA Campus Contest.<br />

h Session(s): S0251 - RANS CFD Solver on Fermi<br />

(Tuesday, 10:00, Room: A7)<br />

Yuan Lin<br />

Senior Engineer (NVIDIA)<br />

Yuan Lin is a senior engineer and manages the compute<br />

compiler code generation team at NVIDIA. His team’s<br />

responsibilities include PTX code generation, tools and<br />

platform support. Yuan has been at NVIDIA for 3 years.<br />

He was at Sun Microsystems and Motorola before that.<br />

He holds a doctorate in computer science from<br />

University of Illinois at Urbana-Champaign.<br />

h Session(s): S0235 – Compiling CUDA and Other<br />

Languages for <strong>GPU</strong>s (Wednesday, 10:00, Room: A5)<br />

Olav Lindtjorn<br />

(Schlumberger)<br />

Biography unavailable at press time.<br />

h Session(s): S0531 - Exascaling Your Apps<br />

(Wednesday, 09:00, Room: C)<br />

Hui Liu<br />

(University of Calgary)<br />

Hui Liu is working for the reservoir simulation group at<br />

the University of Calgary. He is leading the development of<br />

<strong>GPU</strong>-based parallel iterative solvers. He has successfully<br />

designed/implemented a sparse BLAS library, four Krylov<br />

subspace solvers, two algebraic multigrid solvers, parallel<br />

triangular solvers and several preconditioners. He<br />

received his PhD degree in Computational Mathematics<br />

and Parallel Computing from the Chinese Academy of<br />

Sciences in 2010, and his BSc. degree in Computational<br />

Mathematics from the University of Science and<br />

<strong>Technology</strong> of China (USTC) in 2005.<br />

h Session(s): S0704 - Los Alamos AHPC Symposium,<br />

Accelerating Iterative Linear Solvers on <strong>GPU</strong>s<br />

(Wednesday, 16:30, Room: J1)<br />

h S0708 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 1<br />

(Thursday, 09:00, Room: J3)<br />

Li-Ta Lo<br />

(Los Alamos National Laboratory)<br />

Biography unavailable at press time.<br />

h Session(s): S0706 - Los Alamos AHPC Symposium,<br />

PISTON: Portability and Performance for Data-<br />

Parallel Visualization and Analysis Operators<br />

(Wednesday, 17:30, Room: J1)<br />

Alex Loddoch<br />

Sr. Research Scientist (Chevron)<br />

Alex Loddoch is a Senior Research Scientist in Chevron’s<br />

Technical Computing group. His work includes the<br />

evaluation of emerging High Performance Computing<br />

technologies and their application to algorithms in<br />

Seismic Imaging and Processing and Reservoir<br />

Simulation. Before joining Chevron he was a Research<br />

Assistant at the University of Muenster, Germany where<br />

he worked on topics such as Computational Fluid<br />

Dynamics, Visualization and Data Compression. Alex<br />

received a M.Sc. in Physics and a Ph.D. in Geophysics<br />

from University of Muenster, studying the internal<br />

dynamics of terrestrial planets.<br />

h Session(s): S0628 - Panel Session: Learn from<br />

Experts in the Oil & Gas Industry<br />

(Wednesday, 16:30, Room: A7)<br />

Rainald Lohner<br />

Professor (George Mason University)<br />

Biography unavailable at press time.<br />

h Session(s): S0218 - ASI Parallel Fortran: A<br />

General-Purpose Fortran to <strong>GPU</strong> Translator<br />

(Thursday, 16:30, Room: B)<br />

K. Patrick Lorton<br />

Principal Developer (Schrodinger)<br />

Patrick Lorton is a Principal Developer and the Technical<br />

Lead for the Core Hopping and Combiglide products at<br />

Schrödinger. He received bachelor’s degrees in Computer<br />

Science, Mathematics and Chemistry from Indiana<br />

University, where he published in the fields of Parallel<br />

Computing and Computational Chemistry. He has<br />

worked with Schrödinger since graduation.<br />

h Session(s): S0121 – Software Architecture to<br />

Facilitate CUDA Development<br />

(Wednesday, 16:30, Room: N)<br />

Edward Lowe<br />

Research Assistant Professor (Vanderbilt University)<br />

Dr. Lowe is a research assistant professor at Vanderbilt<br />

University developing novel computational methods for<br />

drug discovery. His interests include <strong>GPU</strong> acceleration,<br />

algorithmic techniques in massively parallel<br />

programming, machine learning, computational<br />

chemistry, and enzyme mechanisms. He currently leads<br />

a cheminformatics core in the laboratory of Professor<br />

Jens Meiler as a member of the Vanderbilt Center for<br />

Structural Biology and Institute in Chemical Biology.<br />

h Session(s): S0346 - GP<strong>GPU</strong> Accelerated Protein<br />

Similarity Measures Identifying Biological<br />

Relevant Structure (Wednesday, 17:30, Room: N)<br />

h S0354 - Bcl::ChemInfo Suite Enables Machine<br />

Learning-Based Drug Discovery Using <strong>GPU</strong>s<br />

(Thursday, 09:30, Marriott Ballroom 4)<br />

Hatem Ltaief<br />

Computational Scientist (KAUST<br />

Supercomputing Laboratory)<br />

Dr. Hatem Ltaief received the MSc degree from ISITIL, a<br />

school of engineering at the University of Claude<br />

Bernard Lyon I, France, the MSc in applied mathematics<br />

at the University of Houston and the PhD degree in<br />

computer science from the University of Houston. He<br />

was a Research Scientist II in the Innovative Computing<br />

Laboratory in the Department of Electrical Engineering<br />

and Computer Science at the University of Tennessee,<br />

Knoxville. He is currently a Computational Scientist at<br />

KAUST Supercomputing Laboratory, Saudi Arabia.<br />

h Session(s): S0042 - Solving Challenging Numerical<br />

Linear Algebra Algorithms using Multiple <strong>GPU</strong><br />

Accelerators (Wednesday, 15:00, Room: A3)<br />

Peter Lu<br />

Post-Doctoral Research Fellow (Harvard University)<br />

Peter J. Lu received his AB summa cum laude in physics<br />

(2000) from Princeton University, and AM (2002) and PhD<br />

(2008) in physics from Harvard University. He is presently<br />

a post-doctoral research fellow in the Department of<br />

Physics and SEAS at Harvard University; his main focus<br />

is on the physics of attractive colloids and the integration<br />

of high-performance imaging and analysis techniques.<br />

He conducts experiments aboard the International Space<br />

Station, examining phase separation of colloid mixtures<br />

in the absence of gravity. He has published his<br />

discoveries of modern quasicrystal geometry in medieval<br />

Islamic architectural tilings; the first precision<br />

compound machines, from ancient China; the first use of<br />

diamond, in prehistoric China; and the first<br />

quasicrystalline mineral found in nature.<br />

h Session(s): S0521 - Desktop Supercomputing<br />

in the Soft-Matter Physics Laboratory<br />

(Thursday, 10:00, Room: A3)<br />

David Luebke<br />

Senior Director of Graphics Research (NVIDIA)<br />

David Luebke helped found NVIDIA Research in 2006<br />

after eight years on the faculty of the University of<br />

Virginia. Luebke received his Ph.D. under Fred Brooks at<br />

the University of North Carolina in 1998. His principal<br />

research interests are <strong>GPU</strong> computing and real-time<br />

computer graphics. Luebke’s honors include the NVIDIA<br />

Distinguished Inventor award, the NSF CAREER and DOE<br />

Early Career PI awards, and the ACM Symposium on<br />

Interactive 3D Graphics “Test of Time Award”. Dr. Luebke<br />

has co-authored a book, a SIGGRAPH Electronic<br />

Theater piece, a major museum exhibit visited by over<br />

110,000 people, and dozens of papers, articles, chapters,<br />

and patents.<br />

h Session(s): S0609 - Computational Graphics: An<br />

Overview of Graphics Research at NVIDIA<br />

(Tuesday, 14:00, Room: B)<br />

h S0016 - NVIDIA Grad Fellowship Fast Forward<br />

(Wednesday, 10:00, Room: A2)<br />

Justin Luitjens<br />

Devtech Engineer (NVIDIA)<br />

Justin Luitjens is a Devtech Engineer at NVIDIA and<br />

works with applications engineers to optimize and port<br />

their applications to CUDA. He joined NVIDIA after<br />

receiving his Ph.D. in Scientific Computing from the<br />

University of Utah in 2011.<br />

h Session(s): S0624 - Introduction to CUDA C<br />

(Monday, 10:30, Room: A5)<br />

h S0302 - Accelerating miniFE:<br />

A Finite Element Mini-application<br />

(Thursday, 09:00, Marriott Ballroom 3)<br />

Dimitar Lukarski<br />

Research Associate (Karlsruhe Institute of<br />

<strong>Technology</strong> (KIT))<br />

Dimitar Lukarski holds a bachelor’s degree from Technical<br />

University of Sofia, Bulgaria and a master’s degree from<br />

Technical University of Karlsruhe, Germany. Currently, he<br />

is working at the Engineering Mathematics and Computing<br />

Lab (EMCL) at Karlsruhe Institute of <strong>Technology</strong> (KIT) on<br />

interdisciplinary topics in the area of parallel numerical<br />

methods and emerging hardware such as <strong>GPU</strong>s and<br />

multi-core CPUs. His focus is on robust and fine-grained<br />

parallel preconditioners with implementations on<br />

stream-based platforms such as CUDA.<br />

h Session(s): S0289 - Fine-Grained Parallel<br />

Preconditioners for Fast <strong>GPU</strong>-based Solvers<br />

(Wednesday, 09:00, Marriott Ballroom 3)<br />

h S0708 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 1<br />

(Thursday, 09:00, Room: J3)<br />

h S0291 - LAtoolbox: A Multi-platform Sparse<br />

Linear Algebra Toolbox<br />

(Thursday, 10:30, Marriott Ballroom 3)<br />

Chris Lupo<br />

Assistant Professor (California Polytechnic<br />

State University)<br />

Chris Lupo is an Assistant Professor of Computer<br />

Science and Computer Engineering at California<br />

Polytechnic State University in San Luis Obispo. His<br />

teaching and research interests include parallel<br />

computing, computer architecture, embedded system<br />

design and code generation. Chris earned his PhD in<br />

Computer Engineering from UC Davis in 2008.


h Session(s): S0311 - Teaching Applied Parallel<br />

Computing with <strong>GPU</strong>s (Wednesday, 17:30, Room: C)<br />

Steve Lyness<br />

VP of HPC Solutions Engineering (Appro)<br />

In November of 2007, Steve Lyness joined Appro as Vice<br />

President of HPC Solutions Engineering. Steve is<br />

responsible for the success of Appro’s closed-loop<br />

solution management, up-front consulting and<br />

pre-integration of Appro’s HPC solutions across a wide<br />

range of HPC applications. Steve also acts as a key<br />

member of the management team for project<br />

management, planning and coordinating of worldwide<br />

pre-sales and post-sales customer solution programs.<br />

Before joining Appro, Steve was Director of Sales<br />

Engineering for NetEffects, a provider of 10 GigE adapter<br />

technologies for HPC and Enterprise customers. Steve<br />

graduated from Drexel University with a Bachelor’s<br />

degree in Electrical Engineering with an emphasis on<br />

radar and signal processing technologies.<br />

h Session(s): S0618 - Best Practices of a 800TFlop<br />

Hybrid Supercomputer Implementation<br />

(Tuesday, 09:30, Room: M)<br />

Henrik Høj Madsen<br />

Solution Architect (LEGO)<br />

Henrik holds a Master’s degree in<br />

Computer Science and Engineering from the Technical<br />

University of Denmark, where he designed and<br />

implemented a real-time ray-tracing architecture on FPGA<br />

hardware. Henrik was CEO and lead game developer at<br />

DogOnFire Interactive, a small game development<br />

company dedicated to producing core MMO technologies<br />

for indie market developers. He is currently<br />

a Solution Architect at LEGO, where he is the architect<br />

of the 3D rendering backend technologies for “LEGO<br />

Universe”, LEGO’s massively multiplayer online game for<br />

LEGO fans worldwide.<br />

h Session(s): S0261 - Scalable <strong>GPU</strong> Computing<br />

Service Architecture (Tuesday, 16:00, Room: A5)<br />

Alireza Mahani<br />

Quantitative Modeler (Sentrana)<br />

Dr. Alireza S Mahani works as a computational scientist<br />

at Sentrana Inc., a quantitative marketing company in<br />

Washington, DC. His recent work has been focused on<br />

building high-performance software (using CUDA/<br />

OpenMP/MPI) for Markov Chain Monte Carlo (MCMC)<br />

sampling of high-dimensional conditional posterior<br />

distributions arising in Gibbs sampling of Hierarchical<br />

Bayesian models. Prior to joining Sentrana, Dr. Mahani<br />

worked as a management consultant at McKinsey & Co.<br />

He holds a Ph.D. in Physics from Washington University<br />

in St. Louis, where his research on statistical modeling<br />

of neuronal motion processing in the avian brain<br />

resulted in six articles in peer-reviewed journals.<br />

h Session(s): S0035 - <strong>GPU</strong> Parallelization of Gibbs<br />

Sampling: Abstractions, Results, and Lessons<br />

Learned (Wednesday, 15:00, Marriott Ballroom 3)<br />

Filipe Maia<br />

Fellow (Lawrence Berkeley National Laboratory)<br />

Filipe Maia graduated in biochemistry from Oporto<br />

University, Portugal, in 2004 and completed his PhD in<br />

Physics at Uppsala University, Sweden. He is currently a<br />

Petascale Postdoctoral Fellow at NERSC, Lawrence<br />

Berkeley National Laboratory. His main research<br />

interests, besides <strong>GPU</strong> computing, are diffraction<br />

imaging, image reconstruction and compressive sensing.<br />

h Session(s): S0131 - Multi-<strong>GPU</strong> Real-Time<br />

Ptychographic X-ray Image Reconstruction<br />

(Wednesday, 16:00, Room: A8)<br />

Jason Mak<br />

Graduate Student (UC Davis)<br />

Jason is a computer science Ph.D. student at U.C. Davis.<br />

He received his B.S. in computer science from California<br />

Polytechnic State University. His research interests<br />

include <strong>GPU</strong> computing, parallel algorithms and<br />

architectures, and scientific computing.<br />

h Session(s): S0361 – Lossless Data Compression on<br />

<strong>GPU</strong>s (Wednesday, 17:00, Room: B)<br />

Allen Malony<br />

Professor (University of Oregon)<br />

Allen D. Malony is a Professor in the Department of<br />

Computer and Information Science at the University of<br />

Oregon where he directs the TAU parallel performance<br />

system project. His research interests are in parallel<br />

computing, performance tools, and computational<br />

science. Malony was awarded the NSF National Young<br />

Investigator award, was a Fulbright Research Scholar to<br />

The Netherlands and Austria, and received the<br />

prestigious Alexander von Humboldt Research Award for<br />

Senior U.S. Scientists by the Alexander von Humboldt<br />

Foundation. He also received a Professor Partnership<br />

award from NVIDIA Corporation. Malony is CEO of<br />

ParaTools, Inc., founded in 2005.<br />

h Session(s): S0298 - Performance Tools for<br />

<strong>GPU</strong>-Powered Scalable Heterogeneous Systems<br />

(Wednesday, 17:00, Room: A5)<br />

Jonathan Marbach<br />

Director, Software Architecture and Engineering<br />

(TerraSpark Geosciences)<br />

Jonathan Marbach is Director of Software Architecture<br />

and Engineering at TerraSpark Geosciences, makers of<br />

the 3D Seismic Interpretation package Insight Earth. He<br />

received his PhD from the University of Colorado and<br />

specializes in 3D graphics, virtual reality, and<br />

visualization. He presented at <strong>GTC</strong> 2010 on <strong>GPU</strong><br />

accelerated stereographic rendering.<br />

h Session(s): S0336 - <strong>GPU</strong> Acceleration for Seismic<br />

Interpretation Algorithms (Tuesday, 16:00, Room: A7)<br />

Nikolay Markovskiy<br />

HPC DevTech Engineer (NVIDIA)<br />

Nikolay Markovskiy is a developer technology engineer<br />

at NVIDIA and specializes in high performance<br />

computing using CUDA. He has a background in<br />

computational condensed matter physics and earned his<br />

PhD in multi-level Monte Carlo algorithms at the University<br />

of Southern California.<br />

h Session(s): S0247 – 3D ADI Method for Fluid<br />

Simulation on Multiple <strong>GPU</strong>s<br />

(Tuesday, 17:00, Marriott Ballroom 3)<br />

Samuel Maroy<br />

Software Engineer (Barco)<br />

Samuel Maroy received the M.Sc. degree in computer<br />

science from the Universiteit Gent in 2008. He joined<br />

Barco in August 2008 as a software engineer working on<br />

the development of a networked visualization system.<br />

Since 2011, Samuel has focused on the use of <strong>GPU</strong>s to<br />

power the video streaming and video processing in<br />

Barco’s next generation visualization platform. Outside<br />

of work, Samuel is interested in graphics rendering and<br />

hopes someday to build his own game. Furthermore, he<br />

enjoys cycling, soccer, racing and spending time with<br />

friends.<br />

h Session(s): S0252 - Building Real-Time<br />

Professional Visualization Solutions with OpenCL<br />

(Thursday, 10:30, Room: A1)<br />

Naoya Maruyama<br />

Assistant Professor (Tokyo Institute of <strong>Technology</strong>)<br />

Naoya Maruyama received his Ph.D. degree in Computer<br />

Science from Tokyo Institute of <strong>Technology</strong> in 2008, and<br />

is an Assistant Professor at Global Scientific Information<br />

and Computing Center, Tokyo Institute of <strong>Technology</strong>. He<br />

has been working on research topics related to<br />

large-scale high performance computing, including fault<br />

tolerance, low power computing, and programming<br />

models for heterogeneous systems.<br />

h Session(s): S0367 - Physis: An Implicitly Parallel<br />

Framework for Stencil Computations<br />

(Wednesday, 16:30, Room: C)<br />

Issei Masaie<br />

Chief Engineer (Prometech Software, Inc.)<br />

Issei Masaie is a chief engineer of Prometech Software<br />

and works on developing physics simulation and<br />

acceleration technology on GPU / Cell / multicore<br />

hardware for particle-based CAE software. In 2005 he<br />

received a master’s degree from the Department of<br />

Quantum Engineering and System Science, at the<br />

Graduate School of Engineering, The University of Tokyo.<br />

h Session(s): S0066 – Particleworks: Particle-based<br />

CAE Software Fully Ported on Multi-<strong>GPU</strong><br />

(Wednesday, 10:00, Room: K)<br />

Chris Mason<br />

Product Manager (Acceleware)<br />

Chris is the Product Manager for Acceleware’s <strong>GPU</strong><br />

accelerated electromagnetic product line. He is<br />

responsible for the successful development and launch of<br />

Acceleware products used by companies world-wide.<br />

Chris has seven years of experience in developing<br />

commercial applications for the <strong>GPU</strong> and has delivered<br />

over 20 CUDA courses to students in a diverse range of<br />

industries. His previous experience also includes<br />

parallelization of algorithms on digital signal processors<br />

(DSPs) for cellular phones and base stations. Chris has a<br />

Masters in Electrical Engineering from Stanford University.<br />

h Session(s): S0614 - Part 1: Introduction to <strong>GPU</strong><br />

<strong>Program</strong>ming (Monday, 09:00, Room: C)<br />

h S0615 - Part 2: Introduction to the <strong>GPU</strong><br />

Architecture and Memory Model<br />

(Monday, 10:30, Room: C)<br />

h S0616 - Part 3: Debugging <strong>GPU</strong> <strong>Program</strong>s<br />

(Monday, 13:00, Room: C)<br />

h S0617 - Part 4: Introduction to Optimizations and<br />

Profiling (Monday, 14:30, Room: C)<br />

Enrico Mastrostefano<br />

PhD Student (Sapienza Università di Roma)<br />

Enrico is a PhD student at Sapienza University of Rome.<br />

h Session(s): S0241 - Large Graphs on Multi-<strong>GPU</strong>s<br />

(Wednesday, 16:30, Marriott Ballroom 3)<br />

Satoshi Matsuoka<br />

Titech<br />

Biography unavailable at press time.<br />

h Session(s): S0531 - Exascaling Your Apps<br />

(Wednesday, 09:00, Room: C)<br />

David McAllister<br />

OptiX Manager (NVIDIA, OptiX group)<br />

Bio unavailable at press time.<br />

h Session(s): S0366 - OptiX Out-of-Core and CPU<br />

Rendering (Tuesday, 15:30, Room: J1)<br />

Chris McClanahan<br />

Software Engineer (AccelerEyes)<br />

Chris McClanahan is a software engineer at<br />

AccelerEyes. He has a Master’s Degree in Computer<br />

Science from the Georgia Institute of <strong>Technology</strong>, with a<br />

focus on computer vision and computational<br />

photography.<br />

h Session(s): S0287 - Jacket for Multidimensional<br />

Scaling in Genomics (Tuesday, 17:30, Room: K)<br />

h S0325 - ArrayFire Graphics: A Tutorial<br />

(Wednesday, 10:00, Room: A3)<br />

Iain McCready<br />

CEO (Cortexica)<br />

Iain has over 25 years of experience within the world’s<br />

telecommunications and IT industries. Until recently he<br />

was the CEO of NeoMedia Inc., a public US-based<br />

software business that is the world leader in state-of-the-art<br />

barcode creation, capture, delivery and reading<br />

technology. Prior to that Iain was CEO of Mobiqa Limited,<br />

an Edinburgh-based business where he led the company<br />

from a start-up to the world leader in mobile ticketing,<br />

mobile boarding pass and couponing solutions based on<br />

the creation, optimisation, delivery and redemption of<br />

barcodes to mobile phones. He was also Chairman of<br />

Scolocate Limited, a co-location and managed services<br />

business specialising in IT architecture, design and<br />

planning, project management and implementation<br />

services. Prior to that he was Chief Operating Officer of<br />

KSCL, Scotland’s largest software house and a leading<br />

supplier of customer care and billing applications to the<br />

world’s mobile phone operators.<br />

h Session(s): S2000 – Emerging Companies Summit<br />

Opening with Jeff Herbst (VP of Business<br />

Development, NVIDIA), Followed by CEO on<br />

Stage Featuring Rocketick and Cortexica<br />

(Wednesday, 09:00, Marriott Ballroom 4)<br />

Myles M. McGovern<br />

President/CEO (Immersive Media)<br />

Myles McGovern has served as the President and CEO of<br />

Immersive Media since 2004. Under Myles’ direction, IMC<br />

has pioneered and become the world’s leading provider of<br />

360° interactive video experiences. Prior to<br />

joining IMC Myles was the Founder, President and CEO<br />

of Centrinity/MC2 where he spearheaded the company’s<br />

rapid growth in 55 countries and was twice nominated<br />

for Canadian Entrepreneur of the Year. After his post-secondary<br />

education at Simon Fraser University, Myles<br />

gained valuable technology experience during his 10<br />

years at Xerox culminating in product management for<br />

their digital product integration strategy.<br />

h Session(s): S2004 – Emerging Companies<br />

Summit: CEO on Stage Featuring GAIKAI,<br />

Immersive Media, and Numecent<br />

(Wednesday, 15:00, Marriott Ballroom 4)<br />

Morgan McGuire<br />

Visiting Professor (NVIDIA and Williams College)<br />

Morgan McGuire is a visiting professor in the NVIDIA<br />

Research Graphics Group, where he works on real-time<br />

special effects and future <strong>GPU</strong>s, and an assistant<br />

professor of Computer Science at Williams College<br />

where he teaches computer graphics and game design.<br />

He is also the editor in chief of the Journal of Graphics<br />

Tools. Dr. McGuire contributed to many commercial<br />

products including the E-Ink display for the Amazon<br />

Kindle, the PeakStream high-performance computing<br />

infrastructure acquired by Google, the Titan Quest role-playing<br />

game, and the Marvel Ultimate Alliance 2 video<br />

game for Xbox 360.<br />

h Session(s): S0409 – Stochastic Rasterization<br />

(Tuesday, 15:30, Room: B)


Simon McIntosh-Smith<br />

(The University of Bristol)<br />

Simon McIntosh-Smith has spent most of his life<br />

designing and programming multi-core and many-core<br />

systems. He began his career as a microprocessor<br />

architect at Inmos and STMicroelectronics, before<br />

co-designing the world’s first fully programmable <strong>GPU</strong><br />

at Pixelfusion in 2000. In 2002 he co-founded ClearSpeed<br />

where, as Director of Architecture and Applications, he<br />

led the development of the first modern many-core HPC<br />

accelerators. In 2003 he designed the first accelerated<br />

BLAS/LAPACK and FFT libraries, leading to the first<br />

modern accelerated Top500 system, TSUBAME 1.0 at<br />

Tokyo Tech in 2006. He now leads the Microelectronics<br />

Research Group at the University of Bristol, UK.<br />

h Session(s): S0703 - Los Alamos AHPC Symposium,<br />

Adaptive Heterogeneous Computing with<br />

OpenCL: A Molecular Docking Case Study<br />

(Wednesday, 16:00, Room: J1)<br />

h S0709 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 2<br />

(Thursday, 14:00, Room: J1)<br />

Sara McMains<br />

Professor (UC Berkeley)<br />

Dr. McMains is an Associate Professor of Mechanical<br />

Engineering at Berkeley. Her research interests include<br />

geometric solid modeling, CAD/CAM, <strong>GPU</strong> algorithms,<br />

geometric Design for Manufacturing feedback, computer<br />

aided process planning, layered manufacturing,<br />

computer graphics, visualization, and virtual prototyping.<br />

Applications of her research include haptic design<br />

environments, accessibility analysis for manufacturing,<br />

design for cleanability, layered manufacturing, and<br />

machining. She received her AB from Harvard and her<br />

MS and PhD from Berkeley, all in Computer Science. She<br />

is the recipient of Best Paper Awards from Usenix, ASME<br />

and the ACM Solid and Physical Modeling Symposium,<br />

and the NSF CAREER Award.<br />

h Session(s): S0410 - Computing Hausdorff<br />

Distances between Freeforms on the <strong>GPU</strong><br />

(Wednesday, 17:00, Marriott Ballroom 3)<br />

h S0411 - Artifact-Free Cloud-Based CAD Rendering<br />

(Thursday, 16:30, Room: L)<br />

Gaetano Mendola<br />

Principal Engineer (MBI srl)<br />

Gaetano Mendola is a Principal Software Engineer for MBI srl. MBI develops<br />

exclusive mission-critical solutions. He graduated in<br />

computer engineering at the University of Pisa. His interests are<br />

related to low-latency systems. Since 2008, exploiting the<br />

Software Defined Radio approach, he has been leading the<br />

building of real demodulators completely in software,<br />

offloading to the <strong>GPU</strong> what others normally do with FPGAs.<br />

h Session(s): S0065 - Satellite HUB Communication<br />

System <strong>GPU</strong> Based (Thursday, 16:30, Room: M)<br />

Duane Merrill<br />

Research Scientist (NVIDIA)<br />

Duane Merrill joined NVIDIA Research after completing<br />

his Ph.D. in Computer Science at the University of<br />

Virginia. His research interests include algorithmic<br />

primitives, design idioms, and programming models with<br />

a particular focus on dynamic, irregular, and cooperative<br />

parallelism. He contributes to the B40C and Thrust open<br />

source libraries of <strong>GPU</strong> computing primitives.<br />

h Session(s): S0600 - Scalable <strong>GPU</strong> Graph Traversal<br />

(Wednesday, 14:00, Room: A2)<br />

Peter Messmer<br />

Compute Devtech Engineer (NVIDIA)<br />

Peter Messmer has been developing and optimizing<br />

parallel scientific software for over 15 years. After<br />

completing his PhD in solar plasma-physics at ETH Zurich<br />

in 2001, Peter joined Tech-X Corp in Boulder, CO, where he<br />

was leading a group of scientists solving space-related<br />

simulation and data analysis problems. As part of a NASA<br />

project, he became an early adopter of <strong>GPU</strong> computing<br />

and the lead developer of <strong>GPU</strong>Lib, a library for accelerating<br />

data analysis tasks with <strong>GPU</strong>s. Since joining NVIDIA in<br />

2011, he has been working with clients to optimize their<br />

massively parallel <strong>GPU</strong> applications.<br />

h Session(s): S0629 - CUDA Accelerated Compute<br />

Libraries (Monday, 13:00, Room: A5)<br />

h S0256 - A Stencil Library for the New Dynamic<br />

Core of COSMO (Thursday, 09:00, Room: N)<br />

Renato Miceli<br />

Computational Scientist (ICHEC)<br />

Renato Miceli is a Computational Scientist and <strong>GPU</strong><br />

Developer at the Irish Centre for High-End Computing.<br />

He has a BSc in Computer Science (hons) from<br />

Universidade Federal de Campina Grande, Brazil, where<br />

he focused on Software Engineering and Distributed<br />

Systems, especially Grid and Cloud Computing for HPC.<br />

At ICHEC, Renato works primarily on analyzing,<br />

developing, optimizing and porting applications to<br />

many-core architectures; his past projects involved<br />

cryptography, financial simulation, geophysical analysis<br />

and molecular dynamics. Renato also works on the<br />

European FP7 projects PRACE, in enabling scientific<br />

computing on <strong>GPU</strong>s; and AutoTune, for automatic tuning<br />

of <strong>GPU</strong> codes.<br />

h Session(s): S0034 – Real-Time Risk Simulation:<br />

The <strong>GPU</strong> Revolution In Profit Margin Analysis<br />

(Tuesday, 15:00, Room: L)<br />

Paulius Micikevicius<br />

Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Paulius Micikevicius is a Developer <strong>Technology</strong> Engineer<br />

at NVIDIA with a focus on parallel computation and<br />

performance analysis. He has been involved in the<br />

analysis and optimization of both industrial and scientific<br />

codes over several generations of <strong>GPU</strong>s starting with<br />

G80, the first CUDA-capable architecture. Prior to joining<br />

NVIDIA, Paulius was an assistant professor of Computer<br />

Science at Armstrong Atlantic State University as well as<br />

a research associate at the Media Convergence<br />

Laboratory at UCF. Paulius holds a PhD in Computer<br />

Science from the University of Central Florida and a B.S.<br />

in Computer Science from Midwestern State University.<br />

h Session(s): S0515 - Multi-<strong>GPU</strong> <strong>Program</strong>ming<br />

(Tuesday, 14:00, Room: Hall 1)<br />

h S0628 - Panel Session: Learn from Experts in the<br />

Oil & Gas Industry (Tuesday, 16:30, Room: A7)<br />

h S0514 - <strong>GPU</strong> Performance Analysis and<br />

Optimization (Wednesday, 15:30, Hall 1)<br />

Phillip Miller<br />

Director, Workstation Software Product<br />

Management (NVIDIA)<br />

Bio unavailable at press time.<br />

h Session(s): S0603 - <strong>GPU</strong> Ray Tracing<br />

(Monday, 10:30, Room: A3)<br />

h S0604 - NVIDIA Advanced Rendering Solutions<br />

(Monday, 13:00, Room: A3)<br />

Aamir Mohammad<br />

Associate Director (Aon Benfield Securities)<br />

Aamir leads the development of High Productivity<br />

Computing solutions for Variable Annuity derivatives<br />

models at Aon Benfield Securities. Prior to joining Aon,<br />

Aamir worked in US Variable Annuity Hedging at a global<br />

insurance company, and began his career in quantitative<br />

finance at a hedge fund in Toronto. Aamir has over five<br />

years of experience in computational finance, trading<br />

and software development. Aamir holds an Honors B.Sc.<br />

in Applied Mathematics & Statistics from the University<br />

of Toronto.<br />

h Session(s): S0418 - High Productivity<br />

Computational Finance on <strong>GPU</strong>s<br />

(Tuesday, 14:00, Room: L)<br />

Jamal Mohd-Yusof<br />

(Los Alamos National Laboratory)<br />

Jamal Mohd-Yusof is a member of the Collaborative<br />

<strong>Program</strong>ming team in the Applied Computer Science group<br />

at LANL. He was part of the team that worked on Open<br />

Science programming for Roadrunner, where he was<br />

responsible for refactoring and porting of the CFDNS-RR<br />

fluid dynamics code, including development of a novel<br />

low-communication tridiagonal solver. He has been<br />

working with advanced architectures for several years,<br />

and teaches OpenCL courses at LANL. He is currently<br />

developing and profiling physics algorithms for a variety<br />

of advanced architectures. Prior to coming to LANL he<br />

worked at the Center for Turbulence Research at<br />

Stanford University. He received his MS and PhD from<br />

Cornell University in fluid mechanics, where he<br />

developed novel computational techniques for multiphase<br />

flow simulation.<br />

h Session(s): S0708 - Accelerated HPC Symposium:<br />

Applications - Methods and <strong>Program</strong>ming Models,<br />

Part 1 (Thursday, 09:00, Room: J3)<br />

Alexander Monakov<br />

Researcher (ISP RAS)<br />

Alexander Monakov is a PhD candidate at Moscow State<br />

University and a researcher at Institute for System<br />

<strong>Program</strong>ming, specializing in program optimization and<br />

compiler technology. He has provided improvements to<br />

the GCC compiler, including contributions to Graphite-<br />

OpenCL, an automatic translation pass that generates<br />

OpenCL code from parallel loops.<br />

h Session(s): S0115 - Specialized Sparse Matrix<br />

Formats and SpMV Kernel Tuning for <strong>GPU</strong>s<br />

(Wednesday, 10:30, Marriott Ballroom 3)<br />

Brooks Moses<br />

Ph.D., Sourcerer (Mentor Graphics Corporation)<br />

Dr. Moses leads the High Performance Computing<br />

Solutions team in Mentor Graphics’ Embedded Software<br />

Division. He also participates directly in the development<br />

of the Sourcery VSIPL++ library and other high-performance<br />

library products. Dr. Moses worked<br />

extensively on the Cell/B.E. and NVIDIA CUDA ports of<br />

Sourcery VSIPL++. Dr. Moses holds a Ph.D. in<br />

Mechanical Engineering from Stanford University where<br />

he conducted advanced research into algorithms for<br />

computational fluid dynamics simulation.<br />

h Session(s): S0620 - VSIPL++: A High-Level<br />

<strong>Program</strong>ming Model for Productivity and<br />

Performance (Tuesday, 15:00, Room: M)<br />

Daniel Moth<br />

Principal <strong>Program</strong> Manager (Microsoft)<br />

As a Principal <strong>Program</strong> Manager in the Developer<br />

Division, Daniel Moth is responsible for parallel runtimes<br />

and tools that ship with Visual Studio. He has been with<br />

Microsoft for over five years; before that he worked in<br />

the UK as a consultant for Avanade, and before that as a<br />

developer for a Honeywell company for seven years. In<br />

his free time you can find him on FICS playing chess or<br />

near a beach SCUBA diving with his wife.<br />

h Session(s): S0242 - Harnessing <strong>GPU</strong> Compute<br />

with C++ AMP (Part 1 of 2)<br />

(Wednesday, 17:00, Room: A3)<br />

h S0244 - Harnessing <strong>GPU</strong> Compute with C++ AMP<br />

(Part 2 of 2) (Thursday, 10:00, Room: C)<br />

Supratik Moulik<br />

Cardiovascular Imaging Fellow (University of Pennsylvania)<br />

Biography unavailable at press time.<br />

h Session(s): S0303 - <strong>GPU</strong> Acceleration for<br />

Threshold Based Region Growth Algorithms<br />

(Thursday, 09:00, Room: C)<br />

Sathya Narayana K.<br />

Principal Consultant (Infosys Ltd.)<br />

Sathya Narayana K. is a Principal Consultant with the<br />

Advanced Engineering Group (AEG) of Infosys. He has<br />

more than twenty years of experience in the areas of<br />

high performance scientific computing (HPC), Computer<br />

Graphics (CG), Mathematical Modeling & Simulation and<br />

Engineering Software Development. His research<br />

interests include Mathematical Modeling, Simulation,<br />

Optimization and Operations Research in Aerospace,<br />

Gaming, Oil and Gas industry. He has a Master of Science<br />

degree in structural engineering (1993) and information<br />

technology. He has published 5 papers in national and<br />

international conferences.<br />

h Session(s): S0214 - <strong>GPU</strong> Based Stacking Sequence<br />

Optimization For Composite Skins Using GA<br />

(Wednesday, 15:00, Room: K)<br />

Ramesh Narayanaswamy<br />

Principal Engineer (Synopsys Inc.)<br />

Ramesh works on Optimizing Compilers and Special<br />

Purpose Supercomputers for Hardware Description<br />

Language execution. Notable architectures from past<br />

projects include a 96 core Heterogeneous Computer with<br />

MIPS Core + ASIC Coprocessor, a 1024 core HDL<br />

Processor, and a Multicore CPU + Array of FPGAs. These<br />

architectures provide orders of magnitude performance<br />

improvement. Ramesh has been granted seven patents.<br />

h Session(s): S0317 - Compiling a Parallel<br />

Domain Specific Language to <strong>GPU</strong>s<br />

(Tuesday, 09:00, Room: J3)<br />

Rajib Nath<br />

Student (University of California San Diego)<br />

Biography unavailable at press time.<br />

h Session(s): S0248 - Excitements, Challenges, and<br />

Rewards In Optimizing GP<strong>GPU</strong> Kernels<br />

(Tuesday, 09:00, Marriott Ballroom 3)<br />

Vincent Natoli<br />

Founder & CEO (Stone Ridge <strong>Technology</strong>)<br />

Dr. Vincent Natoli is the founder and CEO of Stone Ridge<br />

<strong>Technology</strong>. Stone Ridge is an NVIDIA partner that<br />

develops, optimizes and ports complex scientific and<br />

engineering codes to <strong>GPU</strong> and multi-core platforms. The<br />

company focusses on work in the energy industry and has<br />

experience with seismic, reservoir simulation and other<br />

industry applications. Dr. Natoli has a BS and MS from MIT,<br />

a PhD in Physics from the University of Illinois Urbana-<br />

Champaign and an MS in technology management from<br />

the University of Pennsylvania and Wharton School. He<br />

worked for 10 years with ExxonMobil Corporate research<br />

before starting Stone Ridge <strong>Technology</strong>.<br />

h Session(s): S0140 – Accelerating Reservoir<br />

Simulation and Algebraic Multigrid with <strong>GPU</strong>s<br />

(Wednesday, 14:00, Room: A7)


Maxim Naumov<br />

Software Engineer (NVIDIA)<br />

Maxim Naumov’s expertise is in the area of parallel<br />

numerical linear algebra. In particular, he has worked<br />

on parallel iterative linear systems and eigenvalue<br />

solvers. He received his Ph.D. in Computer Science (with<br />

specialization in Computational Science and<br />

Engineering) in 2009 and his B.Sc. in Computer Science<br />

and Mathematics in 2003, all from Purdue University<br />

– West Lafayette. He currently works on the NVIDIA CUDA<br />

Platform team developing parallel numerical algorithms<br />

for Graphics Processing Units (<strong>GPU</strong>s). He has previously<br />

worked in the Intel Corporation Microprocessor<br />

<strong>Technology</strong> Lab and Computational Software Lab, and<br />

received a 2008-09 Intel Foundation Ph.D. Fellowship.<br />

h Session(s): S0149 - On the Parallel Solution of<br />

Sparse Triangular Linear Systems<br />

(Wednesday, 16:00, Room: A3)<br />

Dan Negrut<br />

Associate Professor (University of Wisconsin-Madison)<br />

Dan Negrut received his Mechanical Engineering Ph.D.<br />

in 1998 from the University of Iowa after which he spent<br />

six years in the CAE industry. In 2004 he served as<br />

Adjunct Assistant Professor in the Department of<br />

Mathematics at the University of Michigan. He spent<br />

2005 as a Visiting Scientist at Argonne National<br />

Laboratory in the Mathematics and Computer Science<br />

Division. At the end of 2005 Dan joined the Mechanical<br />

Engineering faculty at the University of Wisconsin-<br />

Madison. His interests are in Computational Science and<br />

he leads the Simulation-Based Engineering Lab<br />

(http://sbel.wisc.edu) and Wisconsin Applied Computing Center.<br />

h Session(s): S0518 - <strong>GPU</strong> Computing: From Sand to<br />

Tank Dynamics (Wednesday, 17:00, Room: K)<br />

Chee Ng<br />

Research Assistant Professor of Pediatrics (Children’s<br />

Hospital of Philadelphia/University of Pennsylvania)<br />

Dr. Chee M. Ng, PharmD, PhD, FCP, is a Research<br />

Assistant Professor of Pediatrics at the University of<br />

Pennsylvania and an investigator of the Laboratory for<br />

Applied Pharmacokinetic/Pharmacodynamic in the<br />

Division of Clinical Pharmacology and Therapeutics at<br />

the Children’s Hospital of Philadelphia (CHOP). He is<br />

also an investigator of Kinetic Modeling and Simulation<br />

(KMAS) core of the University of Pennsylvania. He<br />

received his B.S. from the State University of New York at<br />

Buffalo, his Doctor of Pharmacy with High Honor from the<br />

University of Illinois, and his PhD in pharmaceutics from the<br />

University of North Carolina at Chapel Hill.<br />

h Session(s): S0262 - <strong>GPU</strong>-Accelerated Model-Based<br />

Drug Development (Wednesday, 10:00, Room: B)<br />

Trung Dac Nguyen<br />

(University of Michigan)<br />

Biography unavailable at press time.<br />

h Session(s): S0058 – Advancing <strong>GPU</strong> Molecular<br />

Dynamics: Rigid Bodies in HOOMD-blue<br />

(Wednesday, 10:00, Room: N)<br />

Dave Nichols<br />

(Schlumberger)<br />

Biography unavailable at press time.<br />

h Session(s): S0628 - Panel Session: Learn from<br />

Experts in the Oil & Gas Industry<br />

(Wednesday, 16:30, Room: A7)<br />

Marc Nienhaus<br />

(NVIDIA ARC)<br />

Biography unavailable at press time.<br />

h Session(s): S0507 – Interactive and Scalable<br />

Subsurface Data Visualization Framework<br />

(Wednesday, 16:00, Room: A7)<br />

Claus Nilsson<br />

<strong>Program</strong>mer (Tietronix Software, Inc.)<br />

Biography unavailable at press time.<br />

h Session(s): S0321 - <strong>GPU</strong>-Based Monte Carlo Ray<br />

Tracing Simulation for Solar Power Plants<br />

(Tuesday, 14:00, Room: A8)<br />

Lars Nyland<br />

Senior Architect (NVIDIA)<br />

Lars Nyland has been a Senior Architect in the Compute-<br />

Architecture Group at NVIDIA for over 6 years. Among his<br />

concerns is memory performance for <strong>GPU</strong> computing,<br />

and one of the more interesting sub-problems has been<br />

the implementation and performance evaluation of atomic<br />

memory operations on Tesla, Fermi and Kepler <strong>GPU</strong>s.<br />

Prior to joining NVIDIA, Lars was a professor of Computer<br />

Science at the University of North Carolina and the<br />

Colorado School of Mines. Lars earned his Ph.D. studying<br />

parallel programming at Duke University in 1991.<br />

h Session(s): S0313 - Understanding and using<br />

Atomic Memory Operations<br />

(Tuesday, 14:00, Marriott Ballroom 3)<br />

h S0642 – Inside Kepler<br />

(Wednesday, 14:00, Hall 1)<br />

Akira Nukada<br />

Researcher (Tokyo Institute of <strong>Technology</strong>)<br />

Akira Nukada is a researcher at the Global Scientific<br />

Information and Computing Center, Tokyo Institute of<br />

<strong>Technology</strong>, Japan. His research interests include<br />

high-performance computing, especially fast Fourier<br />

transforms and <strong>GPU</strong> computing. He has developed the<br />

FFTSS library and NukadaFFT library, which are for<br />

superscalar processor systems and for NVIDIA CUDA<br />

<strong>GPU</strong>s, respectively. Both of them have a kind of<br />

auto-tuning mechanism, and their performance is often<br />

competitive with vendors’ libraries.<br />

h Session(s): S0209 - Performance of 3-D FFT<br />

Using Multiple <strong>GPU</strong>s with CUDA 4<br />

(Wednesday, 10:30, Room: A3)<br />

Anton Obukhov<br />

Engineering Consultant (Ubiquiti Networks)<br />

Anton Obukhov’s specialization lies in the field of<br />

computer vision, multimedia processing, and systems<br />

design. Prior to joining Ubiquiti Networks, he was an<br />

engineer at NVIDIA in the Developer <strong>Technology</strong> group<br />

for four years. He graduated from Moscow State<br />

University with a master’s degree in Computer Science<br />

from the Computational Mathematics and Cybernetics<br />

department in Russia. Before joining NVIDIA, he<br />

conducted research and development in the Graphics<br />

and Multimedia Lab at Moscow State University while<br />

also working at YUVsoft Corporation.<br />

h Session(s): S0062 - Histograms of Oriented<br />

Gradients with CUDA: Performance Analysis and<br />

Optimization Tips (Tuesday, 16:00, Room: A1)<br />

David Oehmke<br />

(Cray Inc.)<br />

Biography unavailable at press time.<br />

h Session(s): S0089 – Accelerator Directives, OpenACC<br />

and OpenMP4ACC (Tuesday, 16:00, Room: A3)<br />

Taro Okamoto<br />

Assistant Professor (Department of Earth and Planetary<br />

Sciences, Tokyo Institute of <strong>Technology</strong>)<br />

Taro Okamoto’s major research fields include:<br />

geophysics, in particular seismology: simulating and<br />

analyzing seismic waves to study the structure of the<br />

Earth and other planets, and to study the earthquake<br />

source physics.<br />

h Session(s): S0352 - <strong>GPU</strong>-Accelerated Parallel<br />

Computing for Simulation of Seismic Wave<br />

Propagation (Wednesday, 10:30, Room: A7)<br />

Michal Okoniewski<br />

Director of Marketing (Acceleware Ltd.)<br />

Biography unavailable at press time.<br />

h Session(s): S0433 – Accelerated FDTD Technique<br />

for Marine Controlled Source Electromagnetic<br />

Imaging (Wednesday, 15:30, Room: A7)<br />

Aaron Oliker<br />

Partner/Director of 3D <strong>Technology</strong> (BioDigital)<br />

Aaron is a partner and Director of 3D <strong>Technology</strong> at<br />

BioDigital. Aaron is an expert in the field of 3D computer<br />

based medical simulation and his work has created a<br />

new paradigm in medical education. Aaron is also a<br />

Research Assistant Professor of Educational Informatics at<br />

New York University School of Medicine. He has taught<br />

3D programming and medical visualization at the<br />

undergraduate and graduate levels at NYU and SVA for<br />

the past 12 years. Prior to BioDigital, Aaron founded<br />

CyberFiber, Inc. and was the Director of Animation and<br />

<strong>Program</strong>ming at the New York University School of<br />

Medicine Virtual Surgery Research Laboratory.<br />

h Session(s): S2001 – Emerging Companies<br />

Summit: CEO on Stage Featuring Unity<br />

Technologies, MirriAd, and BioDigital<br />

(Wednesday, 10:00, Marriott Ballroom 4)<br />

Brent Oster<br />

Applied Engineer (NVIDIA)<br />

Brent Oster is an applied engineer at NVIDIA, with 17<br />

years of experience in computer graphics and simulation,<br />

having worked with Bioware, LucasFilm, Electronic Arts,<br />

and holding a degree in Aerospace Engineering and<br />

graduate studies in scientific computing.<br />

h Session(s): S0403 - NURBS Tessellation with CUDA<br />

(Tuesday, 15:00, Room: J1)<br />

Eugene Ostroukhov<br />

Tools Developer (NVIDIA)<br />

Eugene Ostroukhov is currently a part of the NVIDIA<br />

CUDA developer tools team, developing NVIDIA Nsight<br />

for Linux and Mac platforms. He believes in visual tools<br />

as an important way to combat ever-increasing software<br />

complexity and spent almost a decade working on visual<br />

tools and popular integrated development environments<br />

for Java, web and mobile application developers. He<br />

holds a B.S. and an M.S. from KNEU.<br />

h Session(s): S0420 – NSight IDE for Linux and Mac<br />

(Wednesday, 09:00, Room: A5)<br />

Andrew Page<br />

Senior Product Manager (NVIDIA)<br />

Andrew Page is the Senior Product Manager for<br />

multi-display and broadcast video products in NVIDIA’s<br />

Quadro product line. Over his 15 years in hardware and<br />

software industries he has held engineering and<br />

marketing roles in professional photo imaging, color<br />

management and high performance 3D graphics toolkits.<br />

h Session(s): S0530 - Multi-Display Roundtable<br />

(Monday, 13:00, Room: A2)<br />

h S0326 - Next Generation InfoWall<br />

(Thursday, 09:00, Room: A1)<br />

Szilárd Páll<br />

PhD Student (KTH Royal Institute of <strong>Technology</strong>)<br />

Szilárd is a PhD student at KTH Royal Institute of<br />

<strong>Technology</strong>, working on parallel algorithms for Molecular<br />

Dynamics; developer of the GROMACS MD package.<br />

h Session(s): S0363 - Efficient Molecular Dynamics<br />

on Heterogeneous <strong>GPU</strong> Architectures in GROMACS<br />

(Wednesday, 16:00, Room: N)<br />

Jeremie Papon<br />

PhD Student (University of Göttingen)<br />

Biography unavailable at press time.<br />

h Session(s): S0075 - Oculus Real-Time Modular<br />

Cognitive Vision System (Tuesday, 15:00, Room: A1)<br />

Valerio Pascucci<br />

(University of Utah)<br />

Dr. Valerio Pascucci is the Director of the Center for<br />

Extreme Data Management, Analysis and Visualization<br />

(CEDMAV.COM) of the University of Utah, established in<br />

collaboration with the Pacific Northwest National<br />

Laboratory (PNNL). Valerio is a Professor in the School of<br />

Computing, Associate Director of the Scientific<br />

Computing and Imaging (SCI) Institute, and a Laboratory<br />

Fellow at the PNNL. Before joining SCI, Dr. Pascucci<br />

served as a Group Leader and a Project Leader at the<br />

Lawrence Livermore National Laboratory, Center for<br />

Applied Scientific Computing and as Adjunct Professor<br />

at the Computer Science Department of University of<br />

California Davis.<br />

h Session(s): S0623 - Visualizing Heterogeneous<br />

Performance Tested on MPI+CUDA Gigapixel<br />

Panorama Stitching (Wednesday, 17:00, Room: A8)<br />

Ritesh Patel<br />

Student (University of California Davis)<br />

Ritesh is a graduate student pursuing his M.S. degree in<br />

Electrical and Computer Engineering at the University of<br />

California, Davis. His interests are in the area of GP<strong>GPU</strong><br />

applications.<br />

h Session(s): S0361 - Lossless Data Compression on<br />

<strong>GPU</strong>s (Wednesday, 17:00, Room: B)<br />

Sandeep Patel<br />

Assistant Professor (University of Delaware)<br />

Sandeep Patel is an Assistant Professor in the<br />

Department of Chemistry and Biochemistry at the<br />

University of Delaware. He earned his Ph.D. in Chemical<br />

Engineering from the Massachusetts Institute of<br />

<strong>Technology</strong> (MIT). His research interests include the<br />

broad areas to which simulation techniques of<br />

biophysical systems and development of advanced<br />

molecular modeling technologies are applied.<br />

h Session(s): S0207 – <strong>GPU</strong> Enabled Macromolecular<br />

Simulation: Challenges and Opportunities<br />

(Wednesday, 15:30, Room: N)<br />

Anjul Patney<br />

PhD Candidate (University of California, Davis)<br />

Anjul is a fifth-year PhD student in the Department of<br />

Electrical and Computer Engineering at the University of<br />

California, Davis. He works under the guidance of Prof.<br />

John Owens in the area of graphics and computer<br />

architecture. In his research, he is interested in pursuing<br />

hardware and software challenges in the design of<br />

programmable rendering architectures.<br />

h Session(s): S0138 – <strong>GPU</strong> Task-Parallelism:<br />

Primitives and Applications<br />

(Thursday, 15:30, Marriott Ballroom 3)


Bharath Pattabiraman<br />

PhD Student (Northwestern University)<br />

Biography unavailable at press time.<br />

h Session(s): S0087 - <strong>GPU</strong> Acceleration of<br />

Dense Stellar Clusters Simulation<br />

(Thursday, 15:00, Room: M)<br />

Sushrut Pavanaskar<br />

PhD Candidate (UC Berkeley)<br />

Sushrut Pavanaskar is a PhD candidate in Mechanical<br />

Engineering at UC Berkeley. His research interests<br />

include CAD/CAM, geometric modeling, <strong>GPU</strong> algorithms,<br />

computer graphics, and manufacturing. Applications of<br />

his research include solid model rendering, toolpath<br />

planning, and methods to improve efficiency in<br />

manufacturing. He received his BE in Mechanical<br />

Engineering from Pune University and his M. Tech. from<br />

IIT Bombay in Manufacturing. Currently at Berkeley, he<br />

works in the computer-aided design and manufacturing<br />

laboratory, advised by Prof. Sara McMains. He recently<br />

won the Audi Production Award 2011 for his concept on<br />

applying advanced geometric algorithms in automobile<br />

manufacturing for resource efficiency.<br />

h Session(s): S0411 – Artifact-Free Cloud-Based CAD<br />

Rendering (Thursday, 16:30, Room: L)<br />

Jon Peddie<br />

President (Jon Peddie Research)<br />

Jon Peddie is one of the pioneers of the graphics<br />

industry, starting his career in computer graphics in<br />

1962. After the successful launch of several graphics<br />

manufacturing companies, Peddie began JPA in 1984 to<br />

provide comprehensive data, information and<br />

management expertise to the computer graphics<br />

industry. Peddie lectures at numerous conferences on<br />

topics pertaining to graphics technology and the<br />

emerging trends in digital media technology. Recently<br />

named one of the most influential analysts, he is<br />

frequently quoted in trade and business publications,<br />

and contributes articles to numerous publications,<br />

as well as appearing on CNN and TechTV.<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Bert Peers<br />

Senior Graphics <strong>Program</strong>mer (CCP Games)<br />

Bert Peers is a senior graphics programmer with<br />

Iceland-based CCP Games, the company behind the<br />

single-shard space MMO EVE Online. After working in the<br />

games industry as a freelancer for over a decade, as<br />

well as a few years in the field of medical imaging and<br />

rapid prototyping, he joined CCP to focus on high fidelity<br />

avatar customization, rendering, and all things<br />

characters.<br />

h Session(s): S0021 - OptiX for DirectX <strong>Program</strong>mers<br />

- EVE Online’s <strong>GPU</strong>-Raytraced Portraits<br />

(Tuesday, 16:30, Room: J1)<br />

Blair Perot<br />

Professor (University of Massachusetts, Amherst)<br />

Prof. Perot is the Director of the Theoretical and<br />

Computational Fluid Dynamics Laboratory at the<br />

University of Massachusetts, Amherst. He obtained his<br />

Ph.D. and M.S. degrees in Mechanical Engineering and<br />

in Computer Science from Stanford University and a<br />

B.S.E in Engineering Physics with highest honors from<br />

Princeton University in 1987. Research in the Theoretical<br />

and Computational Fluid Dynamics Laboratory focuses<br />

on high performance computing, the computer<br />

simulation of fluid flow, and the study of fluid turbulence.<br />

The Laboratory is funded, in part, by the Office of Naval<br />

Research, the Air Force Office of Scientific Research, the<br />

DOE and the NSF.<br />

h Session(s): S0217 – Efficient Implementation of<br />

CFD Algorithms on <strong>GPU</strong> Accelerated<br />

Supercomputers (Wednesday, 17:30, Room: K)<br />

David Perry<br />

CEO and Co-Founder (GAIKAI)<br />

David Perry was the founder & president of Shiny<br />

Entertainment, Inc. for over 12 years (bought by Atari);<br />

he’s one of the best-known video game industry<br />

veterans. Over 29 years, Perry has developed or<br />

programmed over 100 games across 29 video game<br />

platforms. All told, Perry’s games (including #1 hits like<br />

The Terminator, Teenage Mutant Ninja Turtles, Disney’s<br />

Aladdin & Warner’s Matrix projects) have totaled over a<br />

billion dollars in retail sales. Perry sits on the advisory<br />

board of the Game Developers <strong>Conference</strong>, Indiecade,<br />

VGEXPO, and has spoken at TED, E3, Hollywood and<br />

Games Summit, CGDC, MIT, USC, UCI, UCLA, QUB,<br />

Montreal Game Summit, Digital Hollywood, What Teens<br />

Want, etc. In his last position Perry was the co-founder<br />

& chief creative officer of Acclaim.com, directing<br />

multiple MMORPG games, Social Network Games &<br />

Casual Titles. All games used the ‘free-to-play’ model,<br />

supported by in-game advertising, subscriptions or<br />

micro-transactions. Now Perry is the CEO and co-founder<br />

of Gaikai.com, a company that’s developed a<br />

cutting-edge video game streaming technology that<br />

allows any Windows game or application to run in any<br />

browser with just one click. Perry also recently launched<br />

a book for students called David Perry on Game Design<br />

- GameDesignBook.org (the largest non-profit book on<br />

Game Design ever written).<br />

h Session(s): S2004 – Emerging Companies<br />

Summit: CEO on Stage Featuring GAIKAI,<br />

Immersive Media, and Numecent<br />

(Wednesday, 15:00, Marriott Ballroom 4)<br />

Christian Perwass<br />

CEO (Raytrix GmbH)<br />

Dr. Christian Perwass received an MSci degree in Physics<br />

from the University of London, UK, in 1996, and a Ph.D.<br />

in engineering from Cambridge University, UK, in 1999.<br />

He then held a post-doctoral position at the University of<br />

Kiel, Germany, until 2006, where he worked on image<br />

processing, machine learning and camera models. From<br />

2006 until 2009 he worked at Robert Bosch GmbH,<br />

Germany, where he developed image processing<br />

software for automated optical inspection machines. In<br />

2009 he co-founded Raytrix GmbH to develop and build<br />

lightfield cameras.<br />

h Session(s): S0335 - Live 3D-Video with a Lightfield<br />

Camera (Wednesday, 14:00, Room: A1)<br />

h S2006 - Emerging Companies Summit: CEO on<br />

Stage Featuring Raytrix, Playcast and Universal<br />

Robotics (Wednesday, 17:00, Marriott Ballroom 4)<br />

David Peters<br />

CEO (Universal Robotics)<br />

David launched Universal Robotics in April of 2008,<br />

having raised private equity to capitalize operations. He<br />

is the Chairman of the Board. Before founding Universal,<br />

he was an entrepreneur in the motion picture industry,<br />

working as a producer for 17 years. He is a seasoned<br />

operations executive and fund raiser. David is a member<br />

of the Director’s Guild of America and the Robotics and<br />

Smart Device Committee of the World Economic Forum<br />

Network of Global Agenda Councils. He has a Bachelor<br />

of Fine Arts from the Cleveland Institute of Art.<br />

h Session(s): S2006 - Emerging Companies Summit: CEO on<br />

Stage Featuring Raytrix, Playcast and Universal<br />

Robotics (Wednesday, 17:00, Marriott Ballroom 4)<br />

Loukas Petridis<br />

Staff Scientist (Oak Ridge National Laboratory)<br />

Loukas Petridis obtained his PhD in theoretical physics<br />

from Cambridge University in 2006. He was a postdoctoral<br />

fellow at Oak Ridge National Laboratory from 2007 to<br />

2009, where he is currently a Staff Scientist.<br />

h Session(s): S0659 - Computer Simulation of<br />

Lignocellulosic Biomass (Tuesday, 16:30, Room: A2)<br />

James Phillips<br />

Senior Research <strong>Program</strong>mer (University of Illinois)<br />

James Phillips is a Senior Research <strong>Program</strong>mer in the<br />

Theoretical and Computational Biophysics Group at the<br />

Beckman Institute for Advanced Science and <strong>Technology</strong><br />

at the University of Illinois at Urbana-Champaign. He<br />

has a Ph.D. in Physics from the University of Illinois.<br />

Since 1999, James has been the lead developer of the<br />

highly scalable parallel molecular dynamics program<br />

NAMD, for which he received a Gordon Bell Award in<br />

2002. His research interests include improving the<br />

performance and accuracy of biomolecular simulations<br />

through parallelization, optimization, hardware<br />

acceleration, better algorithms, and new methods.<br />

h Session(s): S0127 - Petascale Molecular Dynamics<br />

Simulations on <strong>GPU</strong>-Accelerated Supercomputers<br />

(Wednesday, 15:00, Room: N)<br />

h S0709 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 2<br />

(Thursday, 14:00, Room: J1)<br />

Peter Phillips<br />

SVP (Aon Benfield Securities)<br />

Biography unavailable at press time.<br />

h Session(s): S0418 – High Productivity<br />

Computational Finance on <strong>GPU</strong>s<br />

(Tuesday, 14:00, Room: L)<br />

Jakub Pietrzak<br />

Software Engineer (University of Warsaw)<br />

Jakub Pietrzak is a member of the research team in the<br />

Department of Medical Physics, Maria Skłodowska-<br />

Curie Memorial Cancer Centre - Institute of Oncology.<br />

He is an experienced C++ developer interested in image<br />

processing and analysis techniques and their<br />

applications in medical imaging. He has also worked as a<br />

software engineer for a video postproduction company.<br />

Jakub is a final-year student of Inter-faculty<br />

Individual Studies in Mathematics and Natural Sciences<br />

at the University of Warsaw, where he simultaneously<br />

studies physics (specialization in nuclear<br />

medicine) and mathematics.<br />

h Session(s): S0312 - <strong>GPU</strong> Implementation for Rapid<br />

Iterative Image Reconstruction in Nuclear<br />

Medicine (Wednesday, 10:00, Room: A8)<br />

Nikos Pitsianis<br />

Assistant Professor (Aristotle University, Greece)<br />

Nikos Pitsianis is an assistant professor at the<br />

Department of Electrical and Computer Engineering,<br />

Aristotle University of Thessaloniki, Greece, and an<br />

adjunct professor with the Departments of Computer<br />

Science and Electrical and Computer Engineering of<br />

Duke University, Durham, North Carolina. His research<br />

interests include high-performance algorithms and<br />

architectures for signal and image processing.<br />

h Session(s): S0314 - Efficient k-Nearest<br />

Neighbor Search Algorithms on <strong>GPU</strong>s<br />

(Tuesday, 16:30, Room: C)<br />

Victor Podlozhnyuk<br />

Software Engineer (NVIDIA)<br />

Victor Podlozhnyuk is a performance optimization expert<br />

currently working on the NVIDIA FFT library. In his spare time<br />

he investigates opportunities for putting to use<br />

the tremendous amount of horsepower that modern<br />

<strong>GPU</strong>-based systems have. In his previous role as a devtech<br />

engineer at NVIDIA, he authored a number of sample<br />

algorithm implementations in CUDA and OpenCL for<br />

the NVIDIA <strong>GPU</strong> Computing SDK. Victor holds a Master’s and<br />

a Bachelor’s degree in Electrical Engineering from<br />

Moscow Institute of Physics and <strong>Technology</strong>.<br />

h Session(s): S0273 - Fast JPEG Coding on the <strong>GPU</strong><br />

(Wednesday, 16:00, Room: A1)<br />

Raphaël Poncet<br />

Research Scientist (Commissariat à l’Energie Atomique<br />

et aux Energies Alternatives)<br />

Raphaël Poncet is a research scientist at CEA (the French<br />

Alternative Energies and Atomic Energy Commission), a<br />

French government-funded technological research<br />

institution, where he works on a high performance<br />

industrial multi-physics multi-material hydrodynamic code.<br />

h Session(s): S0091 - Sustainable Hybrid<br />

Parallelization of an Unstructured Hydrodynamic<br />

Code (Thursday, 15:00, Room: N)<br />

Warren Ponder<br />

Director, Product Management (VMware)<br />

Biography unavailable at press time.<br />

h Session(s): S0359 – VMware and NVIDIA:<br />

Delivering 3D Workstations from the Cloud<br />

(Tuesday, 17:00, Room: A5)<br />

Duncan Poole<br />

Senior Manager, HPC (NVIDIA)<br />

Duncan Poole is the CEO of the OpenACC organization<br />

and Senior Manager in HPC for NVIDIA, where he<br />

works with 3rd party tools providers to deliver <strong>GPU</strong>-enabled<br />

capabilities. Duncan’s interests include<br />

fostering strong academic research relationships, most<br />

recently in the area of computational chemistry. Duncan<br />

is a graduate in Electrical Engineering from the<br />

University of Toronto.<br />

h Session(s): S0517A – <strong>Program</strong>ming <strong>GPU</strong>s with<br />

OpenACC (Part 1 of 3) (Monday, 10:30, Room: B)<br />

h S0517B – <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />

2 of 3) (Monday, 13:00, Room: B)<br />

h S0517C – <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />

3 of 3) (Monday, 14:30, Room: B)<br />

h S0621 - NVIDIA OpenACC<br />

(Thursday, 09:30, Room: A5)<br />

Mark Popkiewicz<br />

CEO (MirriAd)<br />

Mark has extensive executive experience in high-growth<br />

companies and has grown businesses from small to<br />

large and from local to global market leadership,<br />

having set up 30 operations around the world. After<br />

holding executive roles at companies<br />

such as Eicon Network, SDX Business Systems, Lucent<br />

Technologies, Mobile Media, BBC Ventures Group,<br />

and BBC Vecta, he is now CEO of MirriAd. Mark has a<br />

thorough understanding of contemporary technology and<br />

business models in digital media and on-line advertising, as<br />

well as telecoms and mobile.<br />

h Session(s): S2001 – Emerging Companies<br />

Summit: CEO on Stage Featuring Unity<br />

Technologies, MirriAd, BioDigital<br />

(Wednesday, 10:00, Marriott Ballroom 4)<br />


Srinivasa Prasanna<br />

Professor (International Institute of Information<br />

<strong>Technology</strong> Bangalore)<br />

Biography unavailable at press time.<br />

h Session(s): S0271 – Fast Adaptive Sampling<br />

Technique for Multi-Dimensional Integral Estimation<br />

Using <strong>GPU</strong>s (Wednesday, 14:30, Marriott Ballroom 3)<br />

Will Ramey<br />

Sr. Product Manager, <strong>GPU</strong> Computing (NVIDIA)<br />

As NVIDIA’s Senior Product Manager for <strong>GPU</strong><br />

Computing, Will helps define and promote platforms,<br />

libraries and developer tools for CUDA architecture<br />

<strong>GPU</strong>s. Prior to joining NVIDIA in 2003, he managed an<br />

independent game studio and developed advanced<br />

technology for the entertainment industry as a product<br />

manager and software engineer. He holds a BA in<br />

Computer Science from Willamette University and<br />

completed the Japan Studies <strong>Program</strong> at the Tokyo<br />

International University. Outside of work, Will learns<br />

something new every day, usually from his two kids. He<br />

enjoys hiking, camping, swimming, spending time with<br />

his wonderful wife, and playing The Game.<br />

h Session(s): S0005 - Languages, APIs and<br />

Development Tools for <strong>GPU</strong> Computing<br />

(Monday, 09:00, Room: A5)<br />

Pradeep Rao<br />

<strong>Technology</strong> Architect (Infosys Technologies Ltd)<br />

Pradeep is a <strong>Technology</strong> Architect at Infosys Limited,<br />

Bangalore, India. He has nine years of experience in the<br />

IT industry. His core focus area has been building<br />

solutions and applied research in the field of High<br />

Performance Computing (HPC). He has experience in<br />

many HPC technologies such as CUDA, OpenCL and<br />

multi-core technologies such as Microsoft HPC Server.<br />

As part of the HPC team at Infosys, his responsibilities<br />

include providing consulting services to Fortune 500<br />

clients for their HPC needs and building solutions<br />

leveraging suitable HPC technology. He has also worked<br />

on various Microsoft platforms including .NET<br />

technologies and SQL Server.<br />

h Session(s): S0271 - Fast Adaptive Sampling<br />

Technique for Multi-Dimensional Integral<br />

Estimation Using <strong>GPU</strong>s<br />

(Wednesday, 14:30, Marriott Ballroom 3)<br />

Steve Rennich<br />

HPC Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Steve Rennich is a CUDA Developer <strong>Technology</strong> Engineer<br />

at NVIDIA, where he supports the use of <strong>GPU</strong>s by the<br />

computational structural mechanics community. Steve<br />

holds a PhD in Aeronautics and Astronautics from<br />

Stanford University where he studied computational fluid<br />

mechanics and vortex system instabilities. Prior to<br />

joining NVIDIA, Steve spent 10 years developing structural<br />

analysis codes.<br />

h Session(s): S0029 - Leveraging Matrix Block<br />

Structure In Sparse Matrix-Vector Multiplication<br />

(Wednesday, 14:00, Marriott Ballroom 3)<br />

Max Rietmann<br />

PhD Student (Institute for Computational Science / USI<br />

Lugano, Switzerland)<br />

Max Rietmann is a PhD Student in computer science at<br />

the Institute for Computational Science at the USI<br />

Lugano in Switzerland. As a developer for the <strong>GPU</strong><br />

version of the seismology code SPECFEM3D, his research is<br />

focused on both computational and algorithmic<br />

challenges associated with numerical wave propagation.<br />

h Session(s): S0508 - Faster Finite Elements for Wave<br />

Propagation Codes (Thursday, 10:00, Room: A2)<br />

Mariano Rivera<br />

Researcher-Professor (CIMAT A.C.)<br />

Biography unavailable at press time.<br />

h Session(s): S0128 - V:Screen: A Real-Time<br />

Augmented Video Method<br />

(Wednesday, 17:00, Room: A1)<br />

Dylan Roeh<br />

Kernel Developer (Wolfram Research Inc)<br />

Dylan Roeh is a Kernel Developer for Wolfram Research<br />

Inc., the company that makes Mathematica and<br />

Wolfram|Alpha. He is one of the developers responsible<br />

for the recently-added CUDA and OpenCL support.<br />

h Session(s): S0100 - Mathematica as a Practical<br />

Platform for <strong>GPU</strong>-Accelerated Finance<br />

(Wednesday, 17:00, Room: L)<br />

John Romein<br />

Senior Researcher (ASTRON)<br />

John W. Romein is a senior system researcher in<br />

high-performance computing at ASTRON, where he is<br />

responsible for the central, real-time data processing of<br />

LOFAR telescope data. He obtained his Ph.D. degree on<br />

distributed search algorithms for board-game playing at<br />

Vrije Universiteit, Amsterdam. As a postdoctoral<br />

researcher, he solved the game of Awari using a large<br />

computer cluster and did research on parallel<br />

algorithms for bioinformatics. His research interests<br />

include high-performance computing, parallel<br />

algorithms, networks, programming languages, and<br />

compiler construction.<br />

h Session(s): S0124 - Signal Processing on <strong>GPU</strong>s for<br />

Radio Telescopes (Thursday, 10:00, Room: M)<br />

Christopher Rossbach<br />

Researcher (Microsoft Research Silicon Valley)<br />

Chris Rossbach is a Researcher with Microsoft Research<br />

Silicon Valley.<br />

h Session(s): S0320 - PTask: OS Support for <strong>GPU</strong><br />

Dataflow <strong>Program</strong>ming (Thursday, 14:00, Room: B)<br />

Davide Rossetti<br />

Researcher (Italian National Institute for Nuclear Physics)<br />

Davide Rossetti has a degree in Theoretical Physics and<br />

is currently a staff researcher at the Italian National Institute<br />

for Nuclear Physics (INFN). He has been a member of the<br />

Array Processor Experiment (APE) research group for<br />

more than 15 years. His interests range from numerical<br />

simulations and HPC to processor architectures,<br />

compilers, and computer graphics. He has spent the last 10<br />

years working on the development of software and<br />

hardware for high performance interconnection<br />

networks on clusters.<br />

h Session(s): S0282 - Leveraging NVIDIA <strong>GPU</strong>Direct<br />

on APEnet+ 3D Torus Cluster Interconnect<br />

(Thursday, 16:00, Room: K)<br />

Scott Rostrup<br />

Software Engineer (Synopsys Inc)<br />

After completing a Masters Thesis at the University of<br />

Waterloo on developing fluid simulation algorithms for<br />

the Cell and <strong>GPU</strong> architectures, Scott joined Synopsys’s<br />

<strong>GPU</strong> computing effort. Since joining Synopsys, Scott has<br />

become interested in developing <strong>GPU</strong> algorithms for<br />

applications not typically thought suitable for<br />

acceleration such as sparse linear algebra, graph<br />

algorithms, and circuit simulation.<br />

h Session(s): S0349 - Tree Accumulation on the <strong>GPU</strong><br />

(Tuesday, 15:00, Room: J3)<br />

Erwin Roth<br />

Researcher (Technische Universitaet Muenchen)<br />

Erwin graduated from Technische Universität München<br />

in 2008 with a Master of Science (Dipl.-Ing.) degree in<br />

Mechanical Engineering with a solid background in<br />

computer vision and model-based tracking. He is<br />

currently working as a PhD candidate at the Ingolstadt<br />

Institute of the Technische Universität München, a<br />

scientific research center founded by AUDI AG and<br />

the Technische Universität München in the field of<br />

sensor data simulation for the computer-based testing<br />

of Advanced Driver Assistance Systems.<br />

h Session(s): S0319 - Advanced Driver<br />

Assistance System Testing using OptiX<br />

(Tuesday, 14:00, Room: N)<br />

Gregory Ruetsch<br />

Applied Engineer (NVIDIA)<br />

Greg Ruetsch is an applications engineer in <strong>GPU</strong><br />

Computing at NVIDIA. Prior to this he held positions at<br />

Clearspeed Technologies and at Sun Microsystems. He<br />

received his Bachelor’s degree in mechanical and<br />

aerospace engineering from Rutgers University and a<br />

Ph.D. in applied mathematics from Brown University,<br />

after which he was a postdoctoral fellow in the<br />

Aerospace Engineering Department at the University of<br />

Southern California and in the Center for Turbulence<br />

Research at Stanford University.<br />

h Session(s): S0522 - Introduction to CUDA Fortran<br />

(Monday, 14:30, Room: A3)<br />

Karl Rupp<br />

Project Assistant (TU Wien)<br />

Karl Rupp received the BSc degree in electrical<br />

engineering from the Technische Universität Wien in<br />

2006, the MSc in computational mathematics from<br />

Brunel University in 2007, and the degree of<br />

Diplomingenieur in microelectronics and in technical<br />

mathematics from the Technische Universität Wien in<br />

2009. He completed his doctoral degree on deterministic<br />

numerical solutions of the Boltzmann transport<br />

equation in 2011. His scientific interests are in the field<br />

of semiconductor device simulation and include generic<br />

programming, advanced discretization schemes for<br />

partial differential equations and parallel computing.<br />

h Session(s): S0071 - The High-Level Linear<br />

Algebra Library ViennaCL And Its Applications<br />

(Thursday, 15:00, Room: C)<br />

Scott Ruppert<br />

ThinkStation Technical Solutions Manager (Lenovo)<br />

Scott Ruppert is Technical Solutions Manager for the<br />

worldwide ThinkStation business unit at Lenovo.<br />

h Session(s): S0638 - Lenovo ThinkStation<br />

Accelerates Medical Research with Beckman<br />

Coulter (Presented by Lenovo)<br />

(Tuesday, 16:00, Room: M)<br />

Radu Rusu<br />

Research Scientist (Willow Garage, Inc)<br />

Radu B. Rusu is a Research Scientist at Willow Garage<br />

and a Visiting Lecturer at Stanford University. Dr. Rusu<br />

received his Ph.D. in Computer Science from the<br />

Technische Universitaet Muenchen, Germany. During the<br />

last few years, Dr. Rusu has been on the board of many<br />

workshops and scientific events held at prestigious<br />

conferences, such as RSS, ICRA, IROS, AAAI, etc. He has<br />

authored over 50 scientific publications, including 1 book<br />

and 1 best paper award at ICAR 2009. Dr. Rusu’s current<br />

research interests include realtime perception and 3D<br />

semantic mapping. He is currently a maintainer of the<br />

PCL project.<br />

h Session(s): S0088 - Point Cloud Library (PCL) on<br />

CUDA (Tuesday, 14:00, Room: C)<br />

Denis Sabitov<br />

(Schlumberger)<br />

Biography unavailable at press time.<br />

h Session(s): S0171 - Numerical Modeling Of 3D<br />

Anisotropic Seismic Wave Propagation On<br />

Multi<strong>GPU</strong> Platforms (Wednesday, 09:00, Room: A7)<br />

Priyanka Sah<br />

Compute DevTech Engineer (NVIDIA)<br />

Having spent two years with the Indian Space Research<br />

Organization, developing and implementing parallel<br />

image processing algorithms for satellite imagery,<br />

Priyanka Sah went on to attain her masters in Computer<br />

Science and Engineering at IIT Delhi. Priyanka<br />

subsequently worked on life science and weather<br />

simulation codes as a CUDA consultant, before joining<br />

NVIDIA in their Developer <strong>Technology</strong> group. With NVIDIA<br />

Priyanka works in a number of HPC application<br />

domains, helping customers develop with the <strong>GPU</strong> and<br />

working at the leading edge of HPC performance.<br />

h Session(s): S0428 - Panini: A <strong>GPU</strong> Aware Array<br />

Class (Thursday, 16:00, Room: B)<br />

Nikolai Sakharnykh<br />

Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Nikolai Sakharnykh is a developer technology engineer<br />

at NVIDIA. He has been working with game developers<br />

and HPC CUDA customers providing support for<br />

graphics technology and <strong>GPU</strong> compute. Currently he is<br />

working on CFD and linear algebra related projects for<br />

current and future <strong>GPU</strong> hardware. His interests include<br />

computational fluid dynamics, sparse matrix solvers and<br />

visualization techniques. Nikolai graduated with honours<br />

from Moscow State University, the department of<br />

Computational Mathematics and Cybernetics as a<br />

specialist in applied mathematics and informatics.<br />

Currently he’s also working on his PhD at MSU.<br />

h Session(s): S0247 - 3D ADI Method for<br />

Fluid Simulation on Multiple <strong>GPU</strong>s<br />

(Tuesday, 17:00, Marriott Ballroom 3)<br />

Graham Sanborn<br />

Lead Software Developer (FunctionBay)<br />

Graham Sanborn is a research engineer at FunctionBay,<br />

Inc. He is a member of the multi-flexible-body dynamics<br />

(MFBD) development team, where his research and<br />

development focus is finite element technologies for<br />

nonlinear dynamics, the integration of these technologies<br />

with multi-body formulations for system-level analysis of<br />

dynamic systems, and the numerical methods<br />

appropriate for these systems. He has a bachelor’s<br />

degree in computer science and a PhD in mechanical<br />

engineering. He received his PhD in 2008 from the<br />

University of Illinois at Chicago, where he studied<br />

computational rigid and flexible body system dynamics.<br />

h Session(s): S0055 - Particle Dynamics with MBD<br />

and FEA using CUDA (Wednesday, 16:00, Room: K)<br />

Avijit Santra<br />

Project Manager - Knowledge Based Engineering (Tata<br />

Motors Limited)<br />

Avijit Santra received his Master’s in Mechanical<br />

Engineering from IIT Kharagpur in 2001. He then joined Tata<br />

Technologies Ltd in 2001 and was deputed to Tata Motors Ltd<br />

Engineering Research Center. With 10 years of<br />

experience in Knowledge Based Engineering Kernel and<br />

Application development, he is also involved in various<br />

initiatives in the Tata Motors Digital Vehicle Development<br />

<strong>Program</strong>, which includes PLM, 3D for All, etc.<br />


h Session(s): S0040 - Introducing CUDA in KBE<br />

Applications for Digital Vehicle Development<br />

<strong>Program</strong>s (Tuesday, 09:30, Room: J2)<br />

Greg Scantlen<br />

Greg Scantlen is CEO of CreativeC.com, a supplier of<br />

high-performance computing machines and expertise to<br />

scientists and researchers at academic institutions and<br />

US national laboratories, such as Los Alamos National<br />

Laboratory and Sandia National Laboratory.<br />

h Session(s): S0646 - Massively Parallel Code<br />

Development on Stelletto CDA (Presented by<br />

Creative Consultants) (Tuesday, 17:00, Room: A8)<br />

Bertil Schmidt<br />

(Nanyang Technological University)<br />

Biography unavailable at press time.<br />

h Session(s): S0008 - Algorithms and Tools for<br />

Bioinformatics on <strong>GPU</strong>s (Tuesday, 16:00, Room: K)<br />

Michael Schøler<br />

Senior Consultant (LEGO)<br />

Michael has a Master’s degree in Computer Science<br />

from Aalborg University within the fields of Computer<br />

Vision and Artificial Intelligence systems. As a Senior<br />

Consultant and CEO of Hinnerup Net A/S, Michael has<br />

participated in a number of projects for LEGO. One of<br />

these projects is LEGO 3DServices, a service-oriented<br />

distributed HPC framework (<strong>GPU</strong>/CPU) that this<br />

session will focus on. Michael has worked on numerous<br />

other projects, ranging from simple websites to<br />

cutting-edge technology development. The most recent primary<br />

customers for Hinnerup Net A/S are Vestas, TrygVesta,<br />

the Danish Road Directorate and LEGO.<br />

h Session(s): S0261 – Scalable <strong>GPU</strong> Computing<br />

Service Architecture (Tuesday, 16:00, Room: A5)<br />

Steve Scott<br />

CTO, Tesla Business (NVIDIA)<br />

Dr. Steve Scott is Chief <strong>Technology</strong> Officer of the Tesla<br />

business unit at NVIDIA, where he is responsible for the<br />

evolution of NVIDIA’s <strong>GPU</strong> computing roadmap. Prior to<br />

joining NVIDIA in August 2011, Steve spent 19 years at<br />

Cray, where he had been CTO since 2004. He was the Chief<br />

Architect of multiple systems at Cray, architected the<br />

routers for the Cray XT, XE and Cascade systems, and<br />

led the Cray Cascade project funded by the DARPA High<br />

Productivity Computing Systems program. Steve holds<br />

twenty-eight US patents, and has served on numerous<br />

advisory boards and program committees. He was the<br />

recipient of the 2005 ACM Maurice Wilkes Award and the<br />

2005 IEEE Seymour Cray Computer Engineering Award.<br />

He received his PhD in computer architecture in 1992<br />

from the University of Wisconsin at Madison, where he<br />

was a Wisconsin Alumni Research Foundation and Hertz<br />

Foundation Fellow.<br />

h Session(s): S0531 - Exascaling Your Apps<br />

(Wednesday, 09:00, Room: C)<br />

Frank Sculli<br />

Co-Founder/Informatics Director (BioDigital)<br />

Frank cofounded BioDigital on the premise that<br />

advancements in 3D and information technology will<br />

revolutionize the understanding of health, and this vision<br />

continues to drive innovation. With extensive experience<br />

in health informatics, Frank has consulted for numerous<br />

prestigious medical institutions. Most notably, Frank led<br />

the development of the Caisis cancer data management<br />

project, which is used globally by leading cancer<br />

hospitals. Prior to cofounding BioDigital, Frank worked<br />

at Honeywell, and later as a consultant to major<br />

organizations such as the Bank of New York, Pfizer and<br />

the Pennsylvania Treasury Department. Frank received<br />

his MS in Engineering from Columbia University.<br />

h Session(s): S2001 – Emerging Companies Summit:<br />

CEO on Stage Featuring Unity Technologies,<br />

Numecent, and BioDigital<br />

(Wednesday, 10:00, Marriott Ballroom 4)<br />

Ani Anciaux Sedrakian<br />

(IFP Energie Nouvelles)<br />

Biography unavailable at press time.<br />

h Session(s): S0108 – An Innovative Massively<br />

Parallelized Molecular Dynamic Software<br />

(Tuesday, 16:00, Room: C)<br />

Mark Seligman<br />

Senior Scientist (Insilicos LLC)<br />

Mark was a compiler developer for supercomputer<br />

vendors for many years. In recent years, he became<br />

more interested in the interplay of algorithms with<br />

hardware and now prefers to work directly with other<br />

researchers. His original training was in pure math, but<br />

nowadays he tends to focus on bioinformatics,<br />

computational statistics and optimization.<br />

h Session(s): S0337 - High-Throughput Epistasis<br />

Screening Using <strong>GPU</strong>s (Tuesday, 09:00, Room: K)<br />

Matthew Sellitto<br />

(Northeastern University)<br />

Biography unavailable at press time.<br />

h Session(s): S0290 – Algorithm Acceleration<br />

for Geospatial Analysis<br />

(Thursday, 09:30, Marriott Ballroom 3)<br />

Partha Sen<br />

CEO (Fuzzy Logix)<br />

Partha Sen is the Co-founder and CEO of Fuzzy Logix.<br />

He has a passion for solving complex business problems<br />

using quantitative methods, data mining and pattern<br />

recognition. Since 1995, Partha has pursued this passion<br />

and has developed numerous high-performance<br />

quantitative algorithms. Today, these algorithms and<br />

models are the basis for the products being brought to<br />

market by Fuzzy Logix. Before founding Fuzzy Logix,<br />

Partha worked at Bank of America where he held senior<br />

management positions in the commercial and<br />

investment bank and in the portfolio strategies group.<br />

Previously Partha held managerial positions at Ernst<br />

and Young and Tata. He has an Engineering degree from<br />

the Indian Institute of <strong>Technology</strong> and an MBA from<br />

Wake Forest University.<br />

h Session(s): S0427 - Intra-Day Risk-Management<br />

with Parallelized Algorithms on <strong>GPU</strong>s<br />

(Tuesday, 17:00, Room: L)<br />

Neil Sequeira<br />

Managing Director (General Catalyst Partners)<br />

As a Managing Director of General Catalyst Partners,<br />

Neil invests in both new and existing technology<br />

businesses. His areas of special interest include:<br />

Internet and new media; software; consumer services;<br />

and network infrastructure. He is based in our Palo Alto<br />

office. Before joining General Catalyst Partners, Neil<br />

held positions at Time Warner where he was most<br />

recently Managing Director, <strong>Technology</strong> for Time Warner<br />

Investments, formerly AOL Time Warner Ventures, the<br />

early-stage private investment vehicle for the world’s<br />

largest media company. During his four years at Time<br />

Warner, Neil worked closely with various operating<br />

groups including AOL, HBO, Time Inc., Time Warner<br />

Cable, Turner and Warner Brothers to identify<br />

investment opportunities. Neil sourced, led and was a<br />

board director or observer for several of the companies<br />

within the Time Warner Investments portfolio including:<br />


Arroyo Video Solutions (CSCO), BigBand Networks<br />

(BBND), Entropic (ENTR), Goldpocket Interactive (ERIC),<br />

N2Broadband (ERIC) and Waterfront Media.<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Fyodor Serzhenko<br />

CEO (Fastvideo)<br />

Fyodor Serzhenko is CEO of Fastvideo company. His<br />

research interests include high speed cameras and<br />

software for high speed imaging, high performance<br />

computing. He graduated from the Moscow Institute of<br />

Physics and <strong>Technology</strong> in 1989 and received a PhD in physics<br />

of semiconductors in 1993.<br />

h Session(s): S0273 - Fast JPEG Coding on the <strong>GPU</strong><br />

(Wednesday, 16:00, Room: A1)<br />

Christopher Sewell<br />

(Los Alamos National Laboratory)<br />

Biography unavailable at press time.<br />

h Session(s): S0706 - Los Alamos AHPC Symposium,<br />

PISTON: Portability and Performance for Data-<br />

Parallel Visualization and Analysis Operators<br />

(Wednesday, 17:30, Room: J2)<br />

h S0707 - Los Alamos AHPC Symposium, Accelerated<br />

HPC Symposium: Scalability: Hardware and<br />

Software (Thursday, 09:00, Room: J2)<br />

Peter Shenkin<br />

Vice President (Schrödinger)<br />

Peter S. Shenkin, Vice President, joined Schrödinger in<br />

1999. Previously, he was the lead developer of the<br />

MacroModel molecular-modeling package at Columbia<br />

University. He received his Ph.D. in Chemistry from<br />

Princeton University in 1979. After working for Owens-<br />

Corning Fiberglass Corporation for four years, he taught<br />

and carried out research at Columbia University and<br />

Barnard College prior to joining the MacroModel group<br />

in 1992. He has published in the areas of biosequence<br />

diversity analysis, protein structure determination,<br />

implicit solvation models for molecular mechanics and<br />

fast methods for determining solvent-accessible surface<br />

areas for atoms in molecules.<br />

h Session(s): S0121 - Software Architecture to<br />

Facilitate CUDA Development<br />

(Wednesday, 16:30, Room: N)<br />

Gideon Shmuel<br />

CEO (eyeSight Mobile Technologies, Ltd.)<br />

Gideon joined eyeSight with 20 years of experience in<br />

the Telecoms and Enterprise Software markets.<br />

Gideon has been involved in growing technology<br />

organizations and running and establishing the business<br />

activities and operations of several companies across<br />

international markets. Most recently Gideon served<br />

as VP Sales at cVidya Networks. Prior to that<br />

Gideon held executive roles in several<br />

countries at Olista, Top Image Systems, LCR Telecom<br />

and Esprit Telecom.<br />

h Session(s): S2002 - Emerging Companies Summit:<br />

CEO on Stage Featuring eyeSight Mobile,<br />

Numira Biosciences, and Ubitus<br />

(Wednesday, 11:00, Marriott Ballroom 4)<br />

Mark Silberstein<br />

Post-doctoral Researcher (UT Austin)<br />

Mark Silberstein is a Post-doctoral fellow at the<br />

University of Texas at Austin, with Prof. Emmett Witchel.<br />

He earned his PhD from the Technion, Israel Institute of<br />

<strong>Technology</strong>. His current research focuses on improving<br />

the integration of <strong>GPU</strong>s with the Operating Systems, as<br />

well as optimized execution of hybrid applications<br />

involving both <strong>GPU</strong>s and CPUs. He can be reached at<br />

marks@cs.utexas.edu.<br />

h Session(s): S0360 - Set <strong>GPU</strong>s Free: Integrating a<br />

File System with CUDA <strong>Program</strong>s<br />

(Thursday, 09:30, Hall 1)<br />

Chris Slaughter<br />

President (University of Texas Perception, Lynx Labs)<br />

Chris Slaughter is the President of Lynx Laboratories<br />

and a member of the Perception Laboratory at the<br />

University of Texas at Austin. Along with a team of<br />

engineers and researchers, he investigates theoretical<br />

problems in Computer Vision with an emphasis on high<br />

performance. His current research direction is focused<br />

on compressive motion analysis, real-time data<br />

clustering, and statistical localization on large maps. As<br />

the President of Lynx Labs, he also oversees the<br />

development of high performance algorithms for<br />

tracking, dense reconstruction, and SLAM as well as the<br />

commercialization of these technologies.<br />

h Session(s): S0607 - High Performance 3D<br />

Perception (Tuesday, 09:00, Room: A1)<br />

Peter-Pike Sloan<br />

Principal Research Scientist (NVIDIA)<br />

Peter-Pike Sloan recently moved to NVIDIA Research.<br />

Prior to that he was part of a research group for Disney<br />

Interactive Studios and also spent nearly 10 years at<br />

Microsoft, where he worked in the graphics research<br />

group, DirectX and on the many-core incubation team.<br />

He is interested in all areas of computer graphics,<br />

particularly interactive rendering techniques.<br />

h Session(s): S0611 - Edge-Aware Shaders<br />

for Real-Time Computer Graphics<br />

(Tuesday, 15:00, Room: B)<br />

Berend Smit<br />

(UC Berkeley/Berkeley Lab)<br />

Biography unavailable at press time.<br />

h Session(s): S0122 – Computational Screening<br />

of Novel Carbon Capture Materials<br />

(Thursday, 10:30, Marriott Ballroom 4)<br />

Roman Sokolov<br />

Director of System Architecture (D4D Technologies)<br />

Roman Sokolov received his Ph.D. in Physics from UCSD in<br />

2005. He has been working at D4D technologies since 2007<br />

as a software engineer. His main interests include applied<br />

mathematics, numerical methods and image processing.<br />

h Session(s): S0079 – Warped Parallel Nearest<br />

Neighbor Searches using KD-Trees<br />

(Thursday, 10:30, Room: A2)<br />

Prakalp Somawanshi<br />

(CRL India)<br />

Biography unavailable at press time.<br />

h Session(s): S0107 – Acceleration of Long-Wave<br />

Rapid Radiative Transfer Model on GP<strong>GPU</strong><br />

(Thursday, 10:30, Room: N)<br />

Paulo Souza<br />

HPC Consultant / Software Engineer (Petrobras)<br />

Paulo Souza has spent 9+ years working with E&P<br />

production geophysics software, seismic imaging on<br />

HPC clusters, RTM, One Way Wave Equation, Kirchhoff,<br />

multiple architecture optimization (GP<strong>GPU</strong>, x86, Power,<br />

Cell) and cluster deployment. He has been working with<br />

GP<strong>GPU</strong> since 2006 porting seismic imaging applications<br />

to CUDA with gains up to 10X in performance/price and<br />

performance/watt over a traditional multi-million dollar<br />

x86 cluster.


h Session(s): S0628 - Panel Session: Learn from<br />

Experts in the Oil & Gas Industry<br />

(Wednesday, 16:30, Room: A7)<br />

Dale Southard<br />

Senior Solution Architect (NVIDIA)<br />

Dale Southard is a senior solution architect with NVIDIA.<br />

In the past he was a HW architect in the LLNL systems<br />

group designing the vis/post-processing solutions and<br />

on-call for capability systems.<br />

h Session(s): S0119 - Best Practices for Architecting<br />

and Managing High-Performance <strong>GPU</strong> Clusters<br />

(Thursday, 14:00, Room: K)<br />

Marco Sozzi<br />

Associate Professor (Physics Department of Pisa)<br />

Marco Sozzi is associate professor of physics at the<br />

University of Pisa, working in particle physics and<br />

focusing on discrete symmetry violations in Nature. His<br />

areas of interest include high-performance triggering<br />

and event selection, and he coordinates the Trigger and<br />

Data Acquisition project for the NA62 experiment in<br />

preparation at CERN, for which a pilot project using<br />

<strong>GPU</strong>s is foreseen.<br />

h Session(s): S0013 – <strong>GPU</strong>s for Fast Triggering in<br />

NA62 Experiment (Tuesday, 10:00, Room: J2)<br />

Kyle Spagnoli<br />

Research Engineer (EM Photonics)<br />

Kyle has been working in <strong>GPU</strong> accelerated algorithms and<br />

applications since the pre-CUDA era. At the University of<br />

Delaware, he received his Master’s degree in electrical<br />

engineering with a focus in parallel computing<br />

architectures. Since then, as a research engineer at EM<br />

Photonics, he has worked on a number of GP<strong>GPU</strong> projects<br />

including: accelerated physical optics simulations,<br />

computational fluid dynamics, biomedical processing,<br />

advanced image processing, and computational linear<br />

algebra. Currently, he is researching new algorithms and<br />

techniques for large scale sparse linear algebra solvers.<br />

h Session(s): S0307 – New Advances in <strong>GPU</strong> Linear<br />

Algebra (Wednesday, 14:00, Room: A3)<br />

Paolo Spallaccini<br />

System Engineer (Ericsson)<br />

Paolo Spallaccini is working at Ericsson R&D Italy,<br />

Microwave Department, as a system engineer. His<br />

research interests lie in diverse digital signal processing<br />

areas, with focus on source and channel coding, as well<br />

as in software engineering and in algorithm<br />

engineering, with focus on parallel computing. His<br />

work experience ranges from joining and leading<br />

technical development groups for signal processing<br />

systems to pioneering long-term innovative<br />

and strategic projects for telecommunication network<br />

backbone and mobile backhaul systems. He received a<br />

master’s degree in Electronic Engineering from the University<br />

of Perugia in 1999. He is an IEEE Member.<br />

h Session(s): S0255 - Telecom Systems Simulations<br />

Acceleration via CPU/<strong>GPU</strong> Co-Processing: Turbo<br />

Codes Case Study<br />

(Tuesday, 10:00, Marriott Ballroom 3)<br />

Pierre Spatz<br />

Head of Quantitative Research (Murex SAS)<br />

Pierre joined Murex in 1989 and has a master’s<br />

degree in computer science and applied mathematics<br />

from ENSIMAG. After various leading positions in the<br />

Murex software development team, Pierre launched<br />

the Murex Analytics initiative in 2002.<br />

h Session(s): S0250 - From <strong>GPU</strong> Computing<br />

Toward Full HPC In Finance with <strong>GPU</strong>s<br />

(Wednesday, 10:00, Room: L)<br />

Filippo Spiga<br />

Computational Scientist (Irish Centre for High-End<br />

Computing)<br />

Filippo joined ICHEC in January 2011 as a<br />

Computational Scientist after six months at the IBM T.J.<br />

Watson Research Center as Research Engineer. His<br />

main interests include general GP-<strong>GPU</strong> programming,<br />

numerical algorithms for GP-<strong>GPU</strong>, development of<br />

mixed multi-core CPU and <strong>GPU</strong> code and scientific<br />

application porting. Inside ICHEC Filippo is directly<br />

involved in the GP-<strong>GPU</strong> porting of the PWSCF package<br />

(QUANTUM ESPRESSO suite), enabling the package for<br />

efficient and highly scalable serial and parallel<br />

calculations on large <strong>GPU</strong> clusters.<br />

h Session(s): S0220 - Enabling faster material<br />

science modeling using the accelerated Quantum<br />

ESPRESSO (Thursday, 16:30, Marriott Ballroom 4)<br />

Savitha Srinivasan<br />

Partner (IBM Venture Capital Group)<br />

Savitha Srinivasan is a Partner in IBM’s Venture Capital<br />

Group in Corporate Strategy where she develops<br />

strategic relationships with venture capitalists and their<br />

portfolio companies to leverage external innovation for<br />

mutual strategic advantage. She has over 20 years of<br />

experience at IBM in leadership roles addressing the<br />

strategic priorities of IBM’s Services businesses and<br />

leads the development of IBM’s Services venture<br />

ecosystem, with each of the Global <strong>Technology</strong> Services<br />

business units – Strategic Outsourcing, Integrated<br />

<strong>Technology</strong> Services, Managed Business Process<br />

Services and Industry Analytics with early identification<br />

of companies, fostering pilots, partnerships and M&A<br />

insights. She is currently engaged in driving IBM<br />

Watson’s content partnership strategy.<br />

h Session(s): Emerging Companies Summit<br />

(Wednesday all day, Marriott Ballroom 4)<br />

Timo Stich<br />

Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Timo Stich is a Developer <strong>Technology</strong> Engineer for<br />

NVIDIA Corporation. His focus is on image processing<br />

and general purpose compute applications of <strong>GPU</strong>s.<br />

Prior to joining NVIDIA he was research staff at the<br />

Graphics, Optics and Vision Group at the Max-Planck-<br />

Institute for Computer Science, Saarbruecken and the<br />

Computer Graphics Lab at Brunswick University. He<br />

received a diploma degree in Computer Science from<br />

Mannheim University, Germany and a Ph.D. degree from<br />

the Brunswick University, Germany.<br />

h Session(s): S0052 - Fast High Quality Image and<br />

Video Background Removal with CUDA<br />

(Wednesday, 16:30, Room: A1)<br />

Chris Stiefeling<br />

(Oliver Wyman Financial Services)<br />

Chris has more than 15 years of experience in designing<br />

and implementing software solutions for the Financial<br />

Sector. He has an in-depth knowledge of spreadsheet,<br />

database and automation technologies and has<br />

developed expertise in many different programming<br />

languages and technologies. He has developed a<br />

significant amount of experience in the areas of<br />

economic scenario generation as well as pricing and<br />

valuation of derivatives and insurance products using<br />

Monte Carlo simulation techniques. Chris has expertise<br />

in implementing HPC solutions including large scale<br />

cloud computing implementations, programming on<br />

general purpose <strong>GPU</strong> cards and distributed computing<br />

frameworks such as Windows HPC.<br />

h Session(s): S0435 - Leveraging GP<strong>GPU</strong> <strong>Technology</strong><br />

for Valuation of Complex Insurance Products<br />

(Tuesday, 16:00, Room: L)<br />


John Stone<br />

Senior Research <strong>Program</strong>mer (University of Illinois at<br />

Urbana-Champaign)<br />

John Stone is a Senior Research <strong>Program</strong>mer in the<br />

Theoretical and Computational Biophysics Group, and<br />

Associate Director of the NVIDIA CUDA Center of<br />

Excellence at the University of Illinois. Stone is the lead<br />

developer of VMD, a high performance molecular<br />

visualization tool used by researchers all over the world.<br />

His research interests include molecular visualization,<br />

<strong>GPU</strong> computing, parallel processing, ray tracing, haptics,<br />

and virtual environments. Mr. Stone was named an<br />

NVIDIA CUDA Fellow in 2010. Stone provides consulting<br />

services for projects involving computer graphics and<br />

<strong>GPU</strong> computing.<br />

h Session(s): S0142 - VMD: High Performance<br />

Molecular Visualization and Analysis on <strong>GPU</strong>s<br />

(Wednesday, 14:00, Room: N)<br />

h S0709 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 2<br />

(Thursday, 14:00, Room: J1)<br />

Jeff Stuart<br />

PhD Student (UC Davis)<br />

Biography unavailable at press time.<br />

h Session(s): S0157 – A Study of Persistent Threads<br />

Style <strong>Program</strong>ming Model for <strong>GPU</strong> Computing<br />

(Thursday, 15:00, Room: B)<br />

Xiaobai Sun<br />

Professor (Duke University)<br />

Xiaobai Sun is a professor of computer science at Duke<br />

University. Her research interests and efforts focus on<br />

numerical algorithm design and analysis, especially, in<br />

bridging and blending mathematical models and<br />

computer architectures for scientific simulation and<br />

signal processing.<br />

h Session(s): S0314 – Efficient k-Nearest<br />

Neighbor Search Algorithms on <strong>GPU</strong>s<br />

(Tuesday, 16:30, Room: C)<br />

Rajeev Surati<br />

President (Scalable Display Technologies)<br />

Biography unavailable at press time.<br />

h Session(s): S0355 - Seamless Scalable Displays-<br />

using NVIDIA Warp + Intensity API<br />

(Wednesday, 10:30, Room: A1)<br />

Krishnan Suresh<br />

Associate Professor (University of Wisconsin)<br />

Krishnan Suresh is currently an Associate Professor in<br />

the Department of Mechanical Engineering,<br />

University of Wisconsin, Madison. He graduated in 1998<br />

from Cornell with a Ph.D. in Mechanical Engineering. He<br />

later served as an Engineering Manager at Kulicke and<br />

Soffa Industries, Philadelphia from 1998 through 2002.<br />

His research interests are in representational and<br />

computational challenges underlying computational and<br />

bio-mechanics.<br />

h Session(s): S0070 - <strong>GPU</strong>-Friendly<br />

Preconditioners for Thin Structure Analysis<br />

(Wednesday, 16:30, Room: K)<br />

William Tang<br />

Director of Fusion Simulation <strong>Program</strong> at the Princeton<br />

Plasma Physics Laboratory (Princeton)<br />

William Tang is the Director of the Fusion Simulation<br />

<strong>Program</strong> at the Princeton Plasma Physics Laboratory<br />

(PPPL) and Lecturer with Rank & Title of Professor in<br />

the Department of Astrophysical Sciences at Princeton<br />

University. He is a Fellow of the American Physical<br />

Society and received the 2005 Chinese Institute of<br />

Engineers-USA (CIE-USA) Distinguished Achievement<br />

Award “for his outstanding leadership in fusion research<br />

and contributions to fundamentals of plasma science.”<br />

He is internationally recognized for his theoretical<br />

contributions as well as associated HPC applications<br />

dealing with electromagnetic kinetic plasma behavior in<br />

complex geometries. He has over 200 publications – with<br />

more than 140 peer-reviewed papers and an “h-index”<br />

or “impact factor” of 42 on the Web of Science, including<br />

over 5400 total citations. He is currently the U.S. PI for<br />

the G8 Exascale Project in Fusion Energy -- an<br />

international HPC collaboration involving the US, UK,<br />

France, Germany, Japan, and Russia.<br />

h Session(s): S0654 - Fusion Energy Sciences &<br />

Computing at the Extreme Scale<br />

(Tuesday, 15:30, Room: A2)<br />

Sarah Tariq<br />

Software Engineer (NVIDIA)<br />

Sarah is a senior engineer in NVIDIA’s Developer<br />

<strong>Technology</strong> team focusing on High Performance <strong>GPU</strong><br />

Computing in the Life Sciences domain. As part of her job<br />

she works collaboratively with external developers to<br />

research and develop <strong>GPU</strong> computing algorithms and<br />

ensure the best performance of <strong>GPU</strong> computing<br />

applications on current and next-generation architectures.<br />

h Session(s): S0351 - Strong Scaling for Molecular<br />

Dynamics Applications (Tuesday, 14:30, Room: A1)<br />

Michela Taufer<br />

Assistant Professor (University of Delaware)<br />

Michela Taufer is an Assistant Professor in Computer<br />

and Information Sciences at the University of Delaware.<br />

She earned her MS in Computer Engineering from the<br />

University of Padova and her Ph.D. in Computer Science<br />

from ETH. She was a post-doc at UC San Diego and The<br />

Scripps Research Institute. Michela has a long history of<br />

interdisciplinary work with computational biophysics<br />

groups. Her research interests include software<br />

applications and their advanced programmability in<br />

heterogeneous computing (i.e., multi-core platforms and<br />

<strong>GPU</strong>s); cloud computing and volunteer computing; and<br />

performance analysis, modeling and optimization of<br />

multi-scale applications.<br />

h Session(s): S0207 - <strong>GPU</strong> Enabled Macromolecular<br />

Simulation: Challenges and Opportunities<br />

(Wednesday, 15:30, Room: N)<br />

Tetsuo Tawara<br />

Software Engineer (Koozyt)<br />

Tetsuo Tawara is currently a software engineer at Koozyt<br />

where he works on augmented reality and data mining<br />

projects. He received a Master’s degree in Mechanical<br />

Engineering from Aoyama Gakuin University.<br />

h Session(s): S0231 - Levenberg-Marquardt using<br />

Block Sparse Matrices on CUDA<br />

(Thursday, 14:30, Marriott Ballroom 3)<br />

Andrei Tchouprakov<br />

Director of System Architecture (D4D Technologies)<br />

Andrei Tchouprakov is a Director of System Architecture<br />

at D4D Technologies where he is currently working on<br />

developing a 3D dental scanner. His background is in 3D<br />

data acquisition, point cloud processing, surface<br />

generation, image processing and parallel computing.<br />

He received his MS degree in Mathematics in 1998 from<br />

Irkutsk State University, Russia.<br />

h Session(s): S0079 - Warped Parallel Nearest<br />

Neighbor Searches using KD-Trees<br />

(Thursday, 10:30, Room: A2)


Tom-Michael Thamm<br />

Director, Software Product Management (NVIDIA ARC)<br />

Tom-Michael Thamm is the Director for Software<br />

Product Management at NVIDIA ARC and is responsible<br />

for all products, such as iray, mental ray and the<br />

geo-spatial library. He also manages direct customer<br />

support. Thamm has worked for mental images<br />

and NVIDIA ARC for over 20 years. He has led several<br />

key projects such as integration of mental ray into many<br />

of the major CAD systems. He has studied Mathematics<br />

and has developed various 3D file formats, such as<br />

extended OBJ, and free-form surface algorithms.<br />

h Session(s): S0507 - Interactive and Scalable<br />

Subsurface Data Visualization Framework<br />

(Wednesday, 16:00, Room: A7)<br />

Derek Thorslund<br />

Director of Product Management (Citrix Systems, Inc.)<br />

Derek Thorslund drives Citrix’s product strategy for HDX<br />

(high definition experience) multimedia virtualization<br />

technologies and leads the company’s HDX Product<br />

Management group across XenDesktop, XenApp,<br />

VDI-in-a-Box, Citrix Receiver and CloudGateway. Upon<br />

joining Citrix in 2003, he played a key role in introducing<br />

the Citrix Access Suite, forerunner to XenDesktop<br />

Platinum Edition. Thorslund has had an extensive career<br />

in the high-tech industry as Director of Product<br />

Management at Avotus and Manager of New Business<br />

Applications at Bell-Northern Research.<br />

h Session(s): S0413 - Delivering 3D Professional<br />

Graphics from the Cloud with Citrix XenDesktop<br />

(Tuesday, 15:00, Room: A5)<br />

Alexey Titov<br />

Engineering Research Associate (Stanford)<br />

Dr. Alexey Titov is an Engineering Research Associate in<br />

the Martinez Group at Stanford University. His research<br />

efforts are focused on exploring, implementing and<br />

optimizing computational chemistry algorithms for novel<br />

architectures. He is one of the developers of TeraChem,<br />

quantum chemistry software created from scratch for<br />

<strong>GPU</strong>s. Alexey Titov’s research interests also include<br />

parallel algorithms, various applications of symbolic<br />

algebra systems in optimization of performance-critical<br />

computational routines for novel architectures.<br />

h Session(s): S0429 - Quantum Chemistry: Automated<br />

Code Generation and Optimization for <strong>GPU</strong> Kernels<br />

(Thursday, 15:00, Marriott Ballroom 4)<br />

Stanimire Tomov<br />

Research Director (University of Tennessee, Knoxville)<br />

Biography unavailable at press time.<br />

h Session(s): S0248 – Excitements, Challenges,<br />

and Rewards In Optimizing GP<strong>GPU</strong> Kernels<br />

(Tuesday, 09:00, Marriott Ballroom 3)<br />

h S0042 – Solving Challenging Numerical Linear<br />

Algebra Algorithms using Multiple <strong>GPU</strong><br />

Accelerators (Wednesday, 15:00, Room: A3)<br />

Doug Traill<br />

Senior Solutions Architect (NVIDIA)<br />

Doug Traill is a Senior Solutions Architect at NVIDIA for<br />

scalable visualization solutions. He has over 15 years of<br />

experience in designing and building some of the world’s<br />

most complex visualization systems.<br />

h Session(s): S0341 - See the Big Picture Scalable<br />

Visualization Solutions for System Integrators<br />

(Monday, 10:30, Room: A2)<br />

Justin Tripp<br />

Technical Staff Member (Los Alamos National Laboratory)<br />

Dr. Justin L. Tripp is a Technical Staff Member on the<br />

Advanced Architectures team at Los Alamos National<br />

Laboratory. Dr. Tripp works on tools and methodologies<br />

for creating high-performance computing systems,<br />

which have been applied to systems from<br />

supercomputers to satellites and airborne video<br />

surveillance. Dr. Tripp received an R&D100 Award for his<br />

work on the Trident C-to-FPGA Compiler. Dr. Tripp<br />

received his PhD in Electrical Engineering from Brigham<br />

Young University in 2004 and has nineteen publications<br />

relating to FPGAs and high-performance computing, and<br />

more than 15 years of experience with FPGAs, high-performance<br />

computing, advanced architectures, and<br />

system-level design and analysis tools.<br />

h Session(s): S0702 - Los Alamos AHPC Symposium,<br />

The Architecture of Acceleration in HPC<br />

(Wednesday, 15:30, Room: J1)<br />

h S0707 - Los Alamos AHPC Symposium, Accelerated<br />

HPC Symposium: Scalability: Hardware and<br />

Software (Thursday, 09:00, Room: J2)<br />

Alejandro Troccoli<br />

Mobile Imaging Researcher (NVIDIA)<br />

Alejandro has been with NVIDIA since 2006 and joined<br />

NVIDIA Research in March 2011 to work in mobile<br />

computer vision and applications. As a 3D Systems<br />

Software Engineer he led the development of NVIDIA’s<br />

Optimus technology, contributed to NVIDIA’s hybrid<br />

technology and did development work for the Direct3D<br />

Driver. Alejandro received a Licenciatura en Ciencias de<br />

la Computacion from the Universidad de Buenos Aires,<br />

Argentina, in 2001. He did his graduate work at Columbia<br />

University in the City of New York, where he received a<br />

Ph.D. in 2006.<br />

h Session(s): S0526 - Tools for Mobile Computational<br />

Photography (Tuesday, 16:00, Room: N)<br />

Jeroen Tromp<br />

Director, Princeton Institute for Computational<br />

Science (Princeton)<br />

Seismologist Jeroen Tromp, Blair Professor of Geology,<br />

Professor of Applied & Computational Mathematics, and<br />

Director of the Princeton Institute for Computational<br />

Science joined the Princeton faculty in 2008. Tromp’s<br />

main research interests are in theoretical &<br />

computational seismology, including simulations of<br />

acoustic, (an)elastic, and poroelastic seismic wave<br />

propagation on local, regional and global scales. The<br />

current focus of his research involves imaging Earth’s<br />

interior based on spectral-element and adjoint methods.<br />

He received the Macelwane Medal of the American<br />

Geophysical Union in 1999 and a Gordon Bell Award in<br />

2003. He is a corresponding member of the Royal<br />

Netherlands Academy of Sciences.<br />

h Session(s): S0608 - Toward Global Seismic Imaging<br />

based on Spectral-Element and Adjoint Methods<br />

(Tuesday, 17:00, Room: A2)<br />

Thomas True<br />

Applied Engineer (NVIDIA)<br />

Tom is a Senior Applied Engineer in NVIDIA’s<br />

Professional Solutions Group where he focuses on the<br />

use of <strong>GPU</strong>s in broadcast, video and film applications<br />

ranging from pre-visualization to post production and<br />

live to air. Prior to joining NVIDIA, Tom was an<br />

Applications Engineer at SGI. Thomas has a M.S. degree<br />

in Computer Science from the Graphics Lab at Brown<br />

University and a B.S. Degree from the Rochester<br />

Institute of <strong>Technology</strong>.<br />

h Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />

Round Table (Monday, 14:30, Room: A2)<br />

h S0328 - Best Practices in <strong>GPU</strong>-Based Video<br />

Processing (Tuesday, 14:00, Room: J2)<br />

h S0049 - Using the <strong>GPU</strong> Direct for Video API<br />

(Tuesday, 15:00, Room: J2)<br />


Hoang-Tron Minh Tuan<br />

PhD Student (George Mason University)<br />

Tuan is currently a PhD student at George Mason<br />

University, School of System Biology. His research has<br />

been focusing on calcium dynamics, cardiac cell<br />

modeling, and high performance computing. Currently,<br />

he’s working on developing a computational model for<br />

cardiac cell at a microscale level using <strong>GPU</strong> technology<br />

to study the underlying mechanisms of calcium-entrained<br />

arrhythmias.<br />

h Session(s): S0072 – <strong>GPU</strong>-Enabled Spatiotemporal<br />

Model of Stochastic Cardiac Calcium Dynamics and<br />

Arrhythmias (Wednesday, 09:00, Room: B)<br />

Antonino Tumeo<br />

Research Scientist (Pacific Northwest National<br />

Laboratory)<br />

Dr. Antonino Tumeo received the M.S. degree in<br />

Informatic Engineering, in 2005, and the Ph.D. degree in<br />

Computer Engineering, in 2009, from Politecnico di<br />

Milano in Italy. Since February 2011, he has been a<br />

research scientist in PNNL’s High Performance<br />

Computing group. He joined PNNL in 2009 as a post<br />

doctoral research associate. Previously, he was a post<br />

doctoral researcher at Politecnico di Milano. His<br />

research interests are modeling and simulation of high<br />

performance architectures, hardware-software<br />

codesign, FPGA prototyping and GP<strong>GPU</strong> computing.<br />

h Session(s): S0343 - A Quantum Chemistry<br />

Domain-Specific Language For Heterogeneous<br />

Clusters (Tuesday, 10:00, Room: L)<br />

Stanley Tzeng<br />

Graduate Student (University of California, Davis)<br />

Stanley Tzeng is a graduate student at the University of<br />

California, Davis. His main research is into task-parallel<br />

systems on the <strong>GPU</strong> and he is interested in its applications.<br />

h Session(s): S0138 - <strong>GPU</strong> Task-Parallelism:<br />

Primitives and Applications<br />

(Thursday, 15:30, Marriott Ballroom 3)<br />

h S0709 - Los Alamos AHPC Symposium,<br />

Accelerated HPC Symposium: Applications -<br />

Methods and <strong>Program</strong>ming Models, Part 2<br />

(Thursday, 14:00, Room: J1)<br />

Ivan Ufimtsev<br />

Postdoc (Stanford)<br />

Biography unavailable at press time.<br />

h Session(s): S0429 – Quantum Chemistry: Automated<br />

Code Generation and Optimization for <strong>GPU</strong> Kernels<br />

(Thursday, 15:00, Marriott Ballroom 4)<br />

Stefan Umbreit<br />

Postdoctoral Associate (Northwestern University)<br />

Biography unavailable at press time.<br />

h Session(s): S0087 – <strong>GPU</strong> Acceleration of<br />

Dense Stellar Clusters Simulation<br />

(Thursday, 15:00, Room: M)<br />

Vamsi Krishna Veligatla<br />

<strong>GPU</strong> <strong>Program</strong>mer (University Of Groningen)<br />

Vamsi Krishna Veligatla received his Masters in<br />

Computer Science (IIIT Hyderabad 2006) and BTech in<br />

Computer Science (IIIT Hyderabad 2004). He<br />

worked as a Software Developer at<br />

NVIDIA (Pune, India), then as a Software<br />

Developer at AMD (Hyderabad, India), and most recently<br />

has been working as a <strong>GPU</strong> <strong>Program</strong>mer at the Kapteyn<br />

Astronomical Institute, University Of Groningen<br />

(Groningen, The Netherlands).<br />

h Session(s): S0187 - <strong>GPU</strong>s for Radio Imaging<br />

(Thursday, 14:00, Room: M)<br />

Shalini Venkataraman<br />

Senior Applied Engineer (NVIDIA)<br />

Shalini Venkataraman is a Senior Applied Engineer<br />

at NVIDIA.<br />

h S0530 - Multi-Display Roundtable<br />

(Monday, 13:00, Room: A2)<br />

h Session(s): S0356 - Optimized Texture Transfers<br />

(Tuesday, 16:00, Room: J2)<br />

h S0353 - <strong>Program</strong>ming Multi-<strong>GPU</strong>’s for Scalable<br />

Rendering (Wednesday, 09:00, Room: A1)<br />

h S0322 - Warping & Blending for Multi-Display<br />

Systems (Wednesday, 10:00, Room: A1)<br />

h S0326 - Next Generation InfoWall<br />

(Thursday, 09:00, Room: A1)<br />

Shivaram Venkataraman<br />

PhD Student (UC Berkeley)<br />

Shivaram Venkataraman is a PhD student at the<br />

University of California, Berkeley and is a part of the<br />

AMP Lab. He completed his M.S. at the University of<br />

Illinois in 2011 and his B.E. from the Birla Institute of<br />

<strong>Technology</strong> and Science, Pilani, India. His research<br />

interests are in design of storage systems and analytics<br />

platforms for big-data applications.<br />

h Session(s): S0152 – Accurate Sequence Alignment<br />

using Distributed Filtering on <strong>GPU</strong> Clusters<br />

(Tuesday, 15:30, Room: K)<br />

Vyas Venkataraman<br />

Software Engineer (NVIDIA)<br />

Vyas Venkataraman is a software engineer in the CUDA<br />

developer tools group at NVIDIA. He is primarily<br />

responsible for CUDA-MEMCHECK, and contributes to<br />

the CUDA Driver and backend code shared by clients of<br />

the debug API. He joined NVIDIA in 2010 from Boston<br />

University where he was doing research on abstractions<br />

for high level modeling of synthesizable communicating<br />

systems. Vyas received his Doctor of Philosophy from the<br />

College of Engineering at Boston University.<br />

h Session(s): S0027A – All-In-One Debugging<br />

Experience with CUDA-GDB and CUDA-MEMCHECK<br />

(Monday, 14:30, Room: A5)<br />

h S0027B – All-In-One Debugging Experience<br />

with CUDA-GDB and CUDA-MEMCHECK<br />

(Wednesday, 14:00, Room: C)<br />

Jeff Vetter<br />

(Oak Ridge National Laboratory)<br />

Biography unavailable at press time.<br />

h Session(s): S0531 - Exascaling Your Apps<br />

(Wednesday, 09:00, Room: C)<br />

Oreste Villa<br />

Research Scientist (Pacific Northwest National Laboratory)<br />

Biography unavailable at press time.<br />

h Session(s): S0343 – A Quantum Chemistry<br />

Domain-Specific Language For Heterogeneous<br />

Clusters (Tuesday, 10:00, Room: L)<br />

Will Wade<br />

Manager, Quadro Advanced Technologies (NVIDIA)<br />

Will Wade manages the Quadro Advanced Technologies<br />

Team at NVIDIA, responsible for some of the most<br />

demanding visual computing solutions on the planet.<br />

This team creates technologies for virtual reality caves,<br />

3D stereoscopic professional visualization, real-time<br />

broadcast graphics, and remote and virtualized<br />

interactive graphics. Will has been a leader in the field<br />

for over 15 years, with work at both NVIDIA and HP.<br />

h Session(s): S0254 - Graphics in the Cloud -<br />

How NVIDIA is Enabling Cloud Visualization<br />

(Tuesday, 14:00, Room: A5)


Kelly Walker<br />

Senior Software Developer (Hue)<br />

Biography unavailable at press time.<br />

h Session(s): S0436 - Integrated <strong>GPU</strong> Acceleration<br />

With Real Time Visualization Of Terabyte Data<br />

(Tuesday, 15:00, Room: A7)<br />

Ross Walker<br />

Assistant Professor (University of California San Diego)<br />

Ross Walker is an Assistant Research Professor at the<br />

San Diego Supercomputer Center, an Adjunct Assistant<br />

Professor in the Department of Chemistry and<br />

Biochemistry at the University of California, San Diego<br />

and an NVIDIA Fellow. He runs the Walker Molecular<br />

Dynamics Lab where he leads a team developing<br />

advanced techniques for Molecular Dynamics Simulations<br />

supporting work improving drug and biocatalyst design.<br />

His work includes improved Quantum Mechanical/<br />

Molecular Mechanical models, development of force<br />

fields for simulation of lipid membranes, simulations of<br />

cellulase enzymes for improved cellulosic bioethanol<br />

production and the development of <strong>GPU</strong> accelerated<br />

versions of the AMBER Molecular Dynamics engine.<br />

h Session(s): S0010 - Towards Routine Microsecond<br />

Molecular Dynamics Simulations on Commodity<br />

Hardware (Wednesday, 09:00, Room: N)<br />

Jason Walsh<br />

(University of Pennsylvania 3D Lab)<br />

Biography unavailable at press time.<br />

h Session(s): S0303 – <strong>GPU</strong> Acceleration for<br />

Threshold Based Region Growth Algorithms<br />

(Thursday, 09:00, Room: C)<br />

BingQiang Wang<br />

Head of High Performance Computing (BGI)<br />

BingQiang Wang completed his doctorate in<br />

computational chemistry at East China University of<br />

Science and <strong>Technology</strong> (ECUST) in 2006. From March<br />

2005, he was a research scientist at Shanghai<br />

Supercomputer Center, dedicated to enabling high performance<br />

computing in computational chemistry and life<br />

science research. In March 2010 he joined BGI as group<br />

head of high performance computing, to develop<br />

solutions for challenging life science problems.<br />

h Session(s): S0519 - <strong>GPU</strong> Accelerated<br />

Bioinformatics Research at BGI<br />

(Tuesday, 14:00, Room: K)<br />

h S0109 - SOAP3: <strong>GPU</strong>-based Compressed Indexing<br />

and Ultra-fast Parallel Alignment of Short Reads<br />

(Wednesday, 16:00, Room: B)<br />

Gaofeng Wang<br />

Postdoc Fellow (Laboratoire E.M2.C, Ecole Centrale Paris)<br />

Dr. Gaofeng Wang is a postdoctoral fellow at Laboratory EM2C,<br />

CNRS UPR288, Ecole Centrale Paris. His research<br />

interests are in the area of turbulent combustion modeling<br />

and high fidelity CFD.<br />

h Session(s): S0129 - A Monte Carlo Thermal<br />

Radiation Solver in <strong>GPU</strong>/CPU Hybrid Architecture<br />

(Thursday, 09:00, Room: A8)<br />

Long Wang<br />

Associate Professor (Supercomputing Center of CNIC,<br />

Chinese Academy of Sciences)<br />

Biography unavailable at press time.<br />

h Session(s): S0392 – Large-Scale First Principle<br />

Pseudopotential DFT Calculations on <strong>GPU</strong> Clusters<br />

(Thursday, 15:30, Marriott Ballroom 4)<br />

Peng Wang<br />

Devtech Engineer (NVIDIA)<br />

Peng Wang is currently the manager of HPC developer<br />

technology at NVIDIA China, where he works with HPC<br />

developers on porting and optimizing HPC codes for <strong>GPU</strong>s.<br />

Previously he worked at NVIDIA US as an HPC developer<br />

technology engineer, where he mainly worked on CAE<br />

solvers on <strong>GPU</strong> and molecular dynamics. He received a Ph.D.<br />

in computational physics from Stanford, where he<br />

worked on developing massively parallel adaptive mesh<br />

fluid simulation code and applying it to astrophysical<br />

turbulence simulations. He also holds an MS in Physics and<br />

a BS in Scientific Computing from Nankai University.<br />

h Session(s): S0245 - Porting Legacy Plasma Codes<br />

to <strong>GPU</strong> (Tuesday, 16:00, Room: A8)<br />

David Weinstein<br />

CTO (Numira Biosciences)<br />

Dr. David Weinstein is the Chief <strong>Technology</strong> Officer and<br />

Senior Director of Salt Lake Operations for Numira<br />

Biosciences. As a PhD student at the University of Utah<br />

in the early 90’s, David was a founding member of the<br />

Scientific Computing and Imaging (SCI) Institute. In 2004,<br />

he co-founded Visual Influence (VI), a SCI startup<br />

focused on custom visualization and analysis software<br />

for the medical imaging industry. In 2007, VI was<br />

acquired by Numira Biosciences, where David and his<br />

team now develop high-throughput processing and<br />

cloud-based interactive visual analysis tools for<br />

preclinical imaging. David has co-authored over 40<br />

peer-reviewed scientific publications.<br />

h Session(s): S2002 – Emerging Companies Summit:<br />

CEO on Stage Featuring eyeSight Mobile,<br />

Numira Biosciences, and Ubitus<br />

(Wednesday, 11:00, Marriott Ballroom 4)<br />

Jack Wells, Ph.D.<br />

Director of Science, Oak Ridge Leadership Computing<br />

Facility (Oak Ridge National Laboratory)<br />

Jack Wells is the director of science for the National<br />

Center for Computational Sciences (NCCS) at Oak Ridge<br />

National Laboratory (ORNL). He is responsible for<br />

devising a strategy to ensure cost-effective, state-of-the-art<br />

scientific computing at the NCCS, which houses the<br />

Department of Energy’s Oak Ridge Leadership<br />

Computing Facility (OLCF). In ORNL’s Computing and<br />

Computational Sciences Directorate, Wells has worked<br />

as group leader of both the Computational Materials<br />

Sciences group in the Computer Science and<br />

Mathematics Division and the Nanomaterials Theory<br />

Institute in the Center for Nanophase Materials<br />

Sciences. During a sabbatical, he served as a legislative<br />

fellow for Senator Lamar Alexander, providing<br />

information about high-performance computing, energy<br />

technology, and science, technology, engineering, and<br />

mathematics education issues. Wells began his ORNL<br />

career in 1990 for resident research on his Ph.D. in<br />

Physics from Vanderbilt University. Following a<br />

three-year postdoctoral fellowship at Harvard University,<br />

he returned to ORNL as a staff scientist in 1997 as a<br />

Wigner postdoctoral fellow. Jack is an accomplished<br />

practitioner of computational physics and has been<br />

supported by the Department of Energy’s Office of Basic<br />

Energy Sciences. Jack has authored or co-authored over<br />

70 scientific papers and edited one book, spanning<br />

nanoscience, materials science and engineering,<br />

nuclear and atomic physics, computational science, and<br />

applied mathematics.<br />

h Session(s): S0606 - <strong>GPU</strong>-accelerated Science on<br />

Titan: Tapping into the World’s Preeminent <strong>GPU</strong><br />

Supercomputer to Achieve Better Science<br />

(Tuesday, 14:00, Room: A2)<br />

h S0657 - Applying for INCITE <strong>Program</strong>, Conclusions,<br />

Q&A (Tuesday, 17:30, Room: A2)<br />

Elmar Westphal<br />

Software Developer (Forschungszentrum Juelich)<br />

Elmar Westphal has been working at Forschungszentrum<br />

Juelich for 15 years in the group that is now PGI/JCNS-TA<br />

Scientific IT-Systems. His main tasks include planning the<br />

institute’s compute clusters and writing/porting scientific<br />

software for multi-core and <strong>GPU</strong> environments. His latest<br />

projects include the CUDA-port of the micromagnetic<br />

simulation software TetraMag and the creation of a<br />

framework of accelerator routines for <strong>GPU</strong>-assisted<br />

molecular dynamics simulations.<br />

h Session(s): S0036 - Multiparticle Collision<br />

Dynamics on <strong>GPU</strong>s (Tuesday, 15:00, Room: C)<br />

Jan-Philipp Weiss<br />

Junior Professor (Karlsruhe Institute of <strong>Technology</strong>)<br />

Jan-Philipp Weiss is a junior professor at the Karlsruhe<br />

Institute of <strong>Technology</strong> (KIT), Germany. He is heading the<br />

Computing Lab Hardware-Aware Numerics at the<br />

Engineering Mathematics and Computing Labs (EMCL).<br />

From 2008 to <strong>2012</strong> he was heading a Shared Research<br />

Group on multicore and coprocessor technologies at KIT in<br />

joint collaboration with the company Hewlett-Packard.<br />

Research of his group addresses parallel numerical<br />

methods and programming techniques for emerging<br />

multi- and manycore technologies in numerical simulation<br />

and scientific computing. He received a Ph.D. from<br />

University Karlsruhe (TH) in applied mathematics in 2006.<br />

h Session(s): S0289 – Fine-Grained Parallel<br />

Preconditioners for Fast <strong>GPU</strong>-based Solvers<br />

(Wednesday, 09:00, Marriott Ballroom 3)<br />

h S0291 – LAtoolbox: A Multi-platform Sparse<br />

Linear Algebra Toolbox<br />

(Thursday, 10:30, Marriott Ballroom 3)<br />

Ian Williams<br />

Director of Applied Engineering (NVIDIA)<br />

Ian Williams is currently Director of Applied Engineering<br />

within NVIDIA’s Professional Solutions Group. Within the<br />

Applied Engineering team he has been closely involved<br />

in the design and development of many of NVIDIA’s<br />

industry-focused professional solutions and key<br />

technologies. In addition, the Applied Engineering team<br />

helps customers and partners integrate these<br />

technologies into their solutions. Prior to NVIDIA, he<br />

worked for 8 years at Silicon Graphics in various<br />

technical roles within Application Engineering and the<br />

Desktop Product Group. Prior to Silicon Graphics, he<br />

worked at Rolls-Royce Commercial Aerospace<br />

developing applications to numerically simulate<br />

manufacturing processes. He holds a Bachelor of<br />

Science degree in Engineering Science and <strong>Technology</strong><br />

from Loughborough University (UK) as well as a Master<br />

of Business Administration from Pepperdine University<br />

(CA, USA). He is a Chartered Mechanical Engineer with<br />

the Institution of Mechanical Engineers (UK) and<br />

throughout his career has been awarded several<br />

patents. For the past 10 years he has been Chairman of the<br />

SPEC/GPC committee, which is part of the Standard<br />

Performance Evaluation Corporation and is responsible for<br />

developing the industry-wide SPECViewperf benchmark.<br />

h S0530 - Multi-Display Roundtable<br />

(Monday, 13:00, Room: A2)<br />

h Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />

Round Table (Monday, 14:30, Room: A2)<br />

h S0326 - Next Generation InfoWall<br />

(Thursday, 09:00, Room: A1)<br />

Robert Wipfel<br />

Fellow (Fusion-io)<br />

Robert Wipfel is a Fellow at Fusion-io. Prior to that, at<br />

Novell, Robert was an architect or engineering lead for<br />

various Data Center products that integrated clustering,<br />

virtualization, and shared storage. Robert also helped<br />

Unisys and Intel jointly enter the commercial parallel<br />

processing market. Robert is co-author of Novell’s <strong>Guide</strong><br />

to Storage Area Networks and Novell Cluster Services<br />

and frequently speaks at Novell’s Brainshare and other<br />

technology conferences. Robert earned a BSc (Hons) in<br />

Computer Systems Engineering from the University of<br />

Kent at Canterbury, U.K. He holds ten patents on parallel<br />

processing, clustering, server and storage virtualization.<br />

h Session(s): S0619 – Hate to Wait? Flash Memory<br />

for Full-Throttle <strong>GPU</strong> Acceleration<br />

(Thursday, 09:00, Room: L)<br />

Emmett Witchel<br />

(University of Texas, Austin)<br />

Biography unavailable at press time.<br />

h Session(s): S0360 – Set <strong>GPU</strong>s Free: Integrating<br />

a File System with CUDA <strong>Program</strong>s<br />

(Thursday, 09:30, Hall 1)<br />

Nils Woetzel<br />

PhD Candidate (Vanderbilt University)<br />

Nils Woetzel, a native German, was exposed to the Basic<br />

programming language in the second grade. In his<br />

senior year of high school, he wrote a Delphi program,<br />

“TitraCom”, that aided in chemical analysis experiments,<br />

and entered it in the German “Jugend forscht”<br />

high school science competition in 2001. After studying<br />

Chemistry at the University of Leipzig, Germany, he<br />

started his PhD in computational structural biology at<br />

Vanderbilt University in Nashville in 2005, where he<br />

could combine his computational and chemical skills to<br />

develop a novel protein structure prediction algorithm.<br />

h Session(s): S0346 – GP<strong>GPU</strong> Accelerated Protein<br />

Similarity Measures Identifying Biological<br />

Relevant Structure (Wednesday, 17:30, Room: N)<br />

h S0354 – Bcl::ChemInfo Suite Enables Machine<br />

Learning-Based Drug Discovery Using <strong>GPU</strong>s<br />

(Thursday, 09:30, Marriott Ballroom 4)<br />

Tim Wood<br />

Quantitative Analyst (ING Bank nv)<br />

Tim Wood is a Quantitative Analyst and Developer at ING<br />

Bank in the Netherlands. Tim joined ING after studying<br />

Computational Science and Computational Finance at<br />

the University of Amsterdam. Since joining ING in 2009<br />

Tim has played a key role in the development and<br />

deployment of computationally demanding risk analytics<br />

systems leveraging massively parallel architectures<br />

within the bank.<br />

h Session(s): S0369 - Running Risk On <strong>GPU</strong>s<br />

(Wednesday, 14:00, Room: L)<br />

Cliff Woolley<br />

CUDA Developer <strong>Technology</strong> Engineer (NVIDIA)<br />

Cliff Woolley is a CUDA Developer <strong>Technology</strong> Engineer<br />

with NVIDIA Corporation. He received his Master’s degree<br />

in Computer Science from the University of Virginia in<br />

2003. He was among the earliest academic researchers to<br />

investigate the use of graphics processors for general<br />

purpose computation, having applied these early GP<strong>GPU</strong><br />

ideas both to non-traditional graphics rendering<br />

techniques as well as to non-graphical algorithms such<br />

as a multigrid solver for PDEs.<br />

h Session(s): S0517A - <strong>Program</strong>ming <strong>GPU</strong>s with<br />

OpenACC (Part 1 of 3) (Monday, 10:30, Room: B)<br />

h S0517B - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />

2 of 3) (Monday, 13:00, Room: B)<br />

h S0517C - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />

3 of 3) (Monday, 14:30, Room: B)<br />

h S0377 - C++ Data Marshalling Best Practices<br />

(Wednesday, 16:30, Room: L)


Rio Yokota<br />

Research Scientist (King Abdullah University of Science<br />

and <strong>Technology</strong>)<br />

Rio Yokota obtained his PhD in Mechanical Engineering<br />

from Keio University, Japan, in 2009, and was a<br />

postdoctoral researcher at the Department of<br />

Mathematics at University of Bristol from 2009-2010,<br />

and also at Mechanical Engineering Department at<br />

Boston University from 2010-2011. During his PhD, he<br />

worked on the implementation of fast multipole methods<br />

on special purpose machines such as MDGRAPE-3, and<br />

then on <strong>GPU</strong>s after CUDA was released. During his<br />

post-doc he has continued to work on fast multipole<br />

methods, and was part of the team that won the Gordon<br />

Bell prize for price/performance in 2009 using 760 <strong>GPU</strong>s.<br />

h Session(s): S0308 - Recent Trends in<br />

Hierarchical N-body Methods on <strong>GPU</strong>s<br />

(Tuesday, 15:00, Marriott Ballroom 3)<br />

Eric Young<br />

Manager of Developer <strong>Technology</strong> Professional and<br />

Consumer Applications (NVIDIA)<br />

Eric Young works in developer technology engineering<br />

at NVIDIA, supporting developers in<br />

professional graphics and computer vision.<br />

h Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />

Round Table (Monday, 14:30, Room: A2)<br />

h S0404 - Computer Vision Libraries with <strong>GPU</strong>s<br />

(Tuesday, 09:30, Room: A1)<br />

Ronald Young<br />

President (Multipath Corporation)<br />

Dr. Young received his PhD in Engineering and<br />

Numerical Analysis from UC Berkeley in 1972. His<br />

career has focused on designing matrix algebra<br />

algorithms which exploit all hardware features for<br />

achieving the highest performance possible. In 1989 Dr.<br />

Young founded Multipath Corporation which develops the<br />

Fast Matrix Solver (FMS) software. FMS is an out-of-core<br />

matrix algebra package used to solve extremely large<br />

problems in production applications.<br />

h Session(s): S0032 - Teraflop <strong>GPU</strong> Acceleration Of<br />

Large Matrix Algebra (Thursday, 14:30, Room: C)<br />

Alaa Yousif<br />

Software Solution Architect (Dell)<br />

Alaa Yousif is a Principal Engineer at Dell and has spent<br />

the last 12 years in the area of Dell Remote Management<br />

Products. He is currently responsible for integrating Hadoop<br />

(Big Data) with HPC clusters. Alaa was also a lead<br />

engineer in custom solutions engineering, leading 12<br />

engineers in the Austin and Bangalore design centers.<br />

h Session(s): S0309 - Dynamically Allocating GP<strong>GPU</strong><br />

to Host Nodes (servers) (Thursday, 10:30, Room: K)<br />

Song Yu<br />

(Chemical & Petroleum Department, University of Calgary)<br />

Song Yu is a petroleum engineering M.Sc. student who<br />

joined the Department of Chemical and Petroleum<br />

Engineering at the University of Calgary in January 2010.<br />

He holds a B.Sc. degree in software engineering (ISS)<br />

from Wuhan University (WHU) in China and an M.Sc. degree<br />

in computer software and theory from the State Key<br />

Laboratory of Software Engineering (SKLSE) of Wuhan<br />

University (WHU) in China. His research topic is parallel<br />

reservoir simulation using <strong>GPU</strong> computing: developing<br />

a parallel sparse linear solver package for the <strong>GPU</strong> parallel<br />

computing environment and integrating it into<br />

reservoir simulation to enhance performance for<br />

large-scale simulation problems.<br />

h Session(s): S0190 - Large-Scale Reservoir<br />

Simulation on <strong>GPU</strong> (Wednesday, 14:30, Room: A7)<br />

Fabrizio Zanella<br />

Systems Manager (CST of America)<br />

Fabrizio Zanella has been at CST of America, a<br />

worldwide provider of full wave electromagnetic<br />

software, for 6 years. His current role consists of IT<br />

management for North America, and customer support<br />

for topics including hardware, licensing and high<br />

performance computing solutions. Prior to joining CST,<br />

Fabrizio had 15 years of experience performing Signal<br />

Integrity characterization of high speed digital systems.<br />

He has worked at various companies including EMC<br />

Corporation and Teradyne.<br />

h Session(s): S0069 - <strong>GPU</strong> Computing Advances<br />

in 3D Electromagnetic Simulation<br />

(Tuesday, 14:00, Room: J3)<br />

Krzysztof Zarzycki<br />

Senior Software Developer (IBM Poland)<br />

Krzysztof Zarzycki is a Senior Software Developer in the<br />

Netezza R&D Department at IBM Poland, where he is the<br />

technical lead of the CUDA Development team. His<br />

research covers using <strong>GPU</strong>s to accelerate various<br />

methods, from AI, data mining & analytics, through<br />

data warehouse operations, to solving<br />

bioinformatics problems. He studied at Warsaw<br />

University in Poland, where he earned a Master’s degree in<br />

Computer Science.<br />

h Session(s): S0376 – Dynamic <strong>Program</strong>ming on<br />

CUDA: Finding the Most Similar DNA Sequence<br />

(Tuesday, 10:00, Room: K)<br />

Peter Zaspel<br />

Research Assistant (University of Bonn)<br />

Peter Zaspel is a research assistant at the Institute for<br />

Numerical Simulation of the University of Bonn,<br />

Germany. He studied Computer Science and is now<br />

working on his PhD. His research topics are<br />

computational fluid dynamics, general-purpose<br />

computations on graphics hardware and visualization.<br />

h Session(s): S0044 - A Massively Parallel Two-<br />

Phase Solver for Incompressible Fluids on<br />

Multi-<strong>GPU</strong> Clusters (Thursday, 14:00, Room: N)<br />

Kang Zhang<br />

Research Scientist (GE Global Research)<br />

Kang Zhang is currently a research scientist at GE<br />

Global Research Center, New York. He obtained his<br />

Ph.D. and M.S.E. degrees in Electrical and Computer<br />

Engineering from Johns Hopkins University in 2011 and<br />

2009, respectively, and his B.S. degree in physics from<br />

Nankai University, China, in 2007. His research interests<br />

include GP<strong>GPU</strong> applications, high-data-throughput<br />

imaging platforms, real-time imaging systems, and optical<br />

sensing & imaging. From 2009 to 2010, Kang worked as<br />

an ORISE Research Fellow for the U.S. Food and Drug<br />

Administration (FDA), where he developed optical<br />

metrology methods for medical device evaluation.<br />

h Session(s): S0141 - <strong>GPU</strong>-Accelerated Optical<br />

Coherence Tomography Imaging<br />

(Wednesday, 15:30, Room: A8)<br />

Kaiyong Zhao<br />

PhD Student (Hong Kong Baptist University)<br />

Kaiyong received his B.Eng. degree in Aircraft Design<br />

and <strong>Technology</strong> from Beijing Institute of <strong>Technology</strong> (BIT),<br />

Beijing, P. R. China, in 2005. He then worked at CCUR for<br />

two years before earning his master’s degree at HKBU. He<br />

is currently a PhD student in the Department of<br />

Computer Science, Hong Kong Baptist University.<br />

h Session(s): S0281 - Accelerate a Fully Functional<br />

Photo Editing Software with <strong>GPU</strong><br />

(Wednesday, 15:00, Room: A1)<br />

Hongwei Zhou<br />

Senior Software Development Engineer (Altair)<br />

Hongwei Zhou is a senior software developer. He has<br />

experience with sparse direct solvers and with Lanczos and<br />

automatic multilevel-substructuring eigenvalue solvers<br />

at Altair Engineering. He received a B.S. degree in 2003<br />

and an M.S. degree in 2006 from the Department of Mechanics,<br />

Peking University, China.<br />

h Session(s): S0225 – Speedup Altair RADIOSS<br />

Solvers Using NVIDIA <strong>GPU</strong><br />

(Wednesday, 09:30, Room: K)<br />

Jun Zhu<br />

Professor (Zhejiang University)<br />

Jun Zhu is currently the Director and a Professor within<br />

the Institute of Bioinformatics at Zhejiang University.<br />

Previously, he was Vice President of Zhejiang University<br />

(2005-2009). Before that, Zhu was the Dean of the<br />

College of Agriculture and Biotechnology at Zhejiang<br />

University (1999-2005). His education<br />

includes a Ph.D. in Statistics and Genetics from NC State,<br />

USA (1989).<br />

h Session(s): S0516 - The Advantage of <strong>GPU</strong><br />

Computation for Analyzing Complex Traits<br />

(Tuesday, 14:30, Room: K)<br />

Gernot Ziegler<br />

Compute Developer <strong>Technology</strong> (NVIDIA)<br />

Gernot Ziegler (MSc/civ.ing.) is an Austrian engineer with<br />

an MSc degree in Computer Science and Engineering<br />

from Linköping University, Sweden. He pursued his PhD<br />

studies at the Max-Planck-Institute for Informatics in<br />

Saarbrücken, Germany, where he specialized in <strong>GPU</strong><br />

algorithms for computer vision and data-parallel<br />

algorithms for spatial data structures. As a member of<br />

NVIDIA’s DevTech-Compute team, Gernot now consults<br />

in high performance computing on graphics hardware.<br />

h Session(s): S0096 - Summed Area Ripmaps<br />

(Wednesday, 17:30, Marriott Ballroom 3)<br />

Robert Zigon<br />

Sr Staff Development Engineer (Beckman Coulter)<br />

Bob Zigon is a Sr. Staff Research Engineer and has<br />

worked at Beckman Coulter for 10 years. He has<br />

degrees in Computer Science and Mathematics from<br />

Purdue University. He was the architect of Kaluza, an<br />

NVIDIA Tesla powered analysis application for flow<br />

cytometry. He’s now working in particle characterization<br />

and analytical ultracentrifugation. His interests include<br />

high performance computing, numerical analysis and<br />

information retrieval theory.<br />

h Session(s): S0221 - 1024 Bit Parallel Rational<br />

Arithmetic Operators for the <strong>GPU</strong><br />

(Tuesday, 16:00, Marriott Ballroom 3)<br />

Enrico Zschau<br />

Lead Software Architect (SeeReal Technologies GmbH)<br />

Enrico Zschau received the diploma in computer science<br />

from Technical University Dresden, Germany, in 2004.<br />

Since 2000 he has been working as an assistant with the<br />

3D-group at Technical University Dresden. In 2002 he<br />

joined Dresden 3D GmbH, a spin-off from the TU<br />

Dresden 3D-group, which became SeeReal Technologies<br />

shortly after. Mr. Zschau’s activities focus on research<br />

and development of software solutions in the fields of<br />

image-processing and GP<strong>GPU</strong>-based algorithms for<br />

holography. He holds the position of Lead Software<br />

Architect and is responsible for a variety of software solutions,<br />

especially eye-tracking on PCs and DSPs and<br />

real-time holography on <strong>GPU</strong>s and FPGAs.<br />

h Session(s): S0324 - Content Generation and<br />

Real-Time Hologram Computation for Holographic<br />

3D-Displays (Thursday, 10:00, Room: A1)


PLATINUM SPONSORS<br />

ASUS<br />

BULL<br />

CAPS<br />

Cooley LLP<br />

Dell<br />

ASUS comes from the last four letters of Pegasus, the winged horse in<br />

Greek mythology that represents the inspiration of art and learning. ASUS<br />

embodies the strength, creative spirit and purity symbolized by this regal<br />

and agile mythical creature, soaring to new heights of quality and innovation<br />

with each product it introduces to the market.<br />

Bull, the premier European-based global IT supplier, has made Extreme<br />

Computing one of its key strategic priorities. In only a few years, Bull has<br />

won over 150 customers in 15 countries across 3 continents. Bull has a<br />

proven track record of building Extreme Computing systems for prestigious<br />

academic and industry customers, most notably in France, Germany, UK,<br />

Spain, Netherlands and Brazil. Bull’s Extreme Computing solutions are<br />

based on bullx, a range of innovative systems designed for uncompromised<br />

performance, which has gained worldwide recognition. For more information<br />

visit: http://www.bull.com/extreme-computing<br />

CAPS is a major supplier of solutions dedicated to application migration and<br />

deployment on manycore processors. CAPS global solution for manycore<br />

leads the developer to performance by providing top-of-the-range<br />

technology (HMPP hybrid compiler and wizard), code porting methodology<br />

and ecosystem (third-party software tools, expertise, training…). Its directive-based<br />

& multi-target HMPP compiler enables developers to safely move to<br />

a hybrid CPU / <strong>GPU</strong> model and quickly get performance by leveraging the<br />

computing power of stream processors without the pain associated with <strong>GPU</strong><br />

programming. HMPP is offered within CAPS DevDeck package: an<br />

ALL-IN-ONE multi-level suite for manycore application definition, porting<br />

and optimization with tools (HMPP compiler, development tools such as<br />

HMPP Wizard, debugging & profiling software and scientific libraries),<br />

methodology and resources (tutorials, use cases…).<br />

Cooley LLP is a global law firm for the converging worlds of high technology,<br />

high finance and high-stakes litigation. We are counselors, strategists and<br />

advocates for the foremost private and public companies and investors in all<br />

major technology fields. Our Emerging Companies practice has a long<br />

tradition of representing emerging and high-growth companies worldwide.<br />

The <strong>GPU</strong> space is an exciting growth area in the technology arena, and<br />

Cooley has been at the forefront, advising both established and start-up<br />

companies on the issues facing businesses in this industry. Our attorneys’<br />

extensive experience in intellectual property protection and business<br />

counseling along with the Firm’s deep roots in the technology sector give us<br />

a unique perspective on the issues facing our clients. Cooley’s team consists<br />

of experienced counselors and litigators that are equally skilled at<br />

representing and advising clients on the protection and commercialization of<br />

their intellectual property in a wide range of areas, including copyright,<br />

trademark, patent, technology licensing, privacy, electronic security and<br />

electronic commerce. We are dedicated to offering comprehensive and<br />

creative legal support, utilizing the full resources of the Firm.<br />

For more than 26 years, Dell has played a critical role in transforming<br />

computing, enabling more affordable and more pervasive access to technology<br />

around the world. The company’s technology solutions improve customers’<br />

productivity, enhance their lives and meet their distinct needs.<br />

Headquartered in Round Rock, Texas, Dell serves customers ranging from the<br />

world’s largest and most demanding businesses and public-sector<br />

organizations, to small and medium businesses, and consumers worldwide.<br />

Recognized for its ability to provide customers personalized, built-to-order<br />

technology through direct, online and retail channels, nearly 80 percent of<br />

Dell’s $53 billion in revenue last year was driven by enterprise products,<br />

services and solutions it delivers to businesses and organizations. Dell’s nearly<br />

100,000 team members worldwide are deeply committed to corporate<br />

responsibility. The company ranks among Working Mother Magazine’s 100 Best<br />

Companies and first among Newsweek’s Greenest Companies in America.<br />

At Dell, we promote an environment that thrives on innovation. To deliver<br />

effective solutions that meet customer challenges, Dell employs an open,<br />

standards-based approach to technology innovation. Each year, Dell honors<br />

the outstanding inventors among its employees.<br />


SPONSORS AND<br />

EXHIBITORS<br />

PLATINUM SPONSORS, continued<br />

HP<br />

IBM<br />

Lenovo<br />

Los Alamos National Laboratory<br />

Microsoft Corporation<br />

HP creates new possibilities for technology to have a meaningful impact on<br />

people, businesses, governments and society. The world’s largest technology<br />

company, HP brings together a portfolio that spans printing, personal<br />

computing, software, services and IT infrastructure to solve customer problems.<br />

More information about HP (NYSE: HPQ) is available at http://www.hp.com.<br />

IBM is involved in more than 150 smart grid engagements around the world,<br />

in both mature and emerging markets. IBM is the founding member of the<br />

Global Intelligent Utility Network Coalition, a unique collaboration of utilities<br />

from around the globe who are working to accelerate the use of smart grid<br />

technologies and move the industry forward through its most challenging<br />

transformation. More about IBM’s vision to bring a new level of intelligence<br />

to how the world works—how every person, business, organization,<br />

government, natural system, and man-made system interacts, can be found<br />

here: http://www.ibm.com/smarterplanet.<br />

Lenovo is one of the world’s largest makers of personal computers and<br />

makes the world’s most innovative PCs, including the renowned ThinkPad®<br />

notebook as well as products carrying the ThinkCentre®, ThinkStation®,<br />

ThinkServer®, IdeaCentre®, and IdeaPad® sub-brands.<br />

Today, Lenovo is a global corporation with significant operations on six<br />

continents, operating in more than 60 countries and selling products in<br />

160. Everyone at Lenovo takes great pride in our ability to attract top talent<br />

from diverse backgrounds and from around the world. We view our<br />

differences and diversity as a source of strength in building a collaborative<br />

culture that helps us achieve our goals. We have no world headquarters and,<br />

instead, have put in place a distributed management structure that places<br />

operational hubs in centers of excellence around the world integrating this<br />

talented, diverse group into a cohesive Next Generation company.<br />

Los Alamos National Laboratory, a multidisciplinary research institution<br />

engaged in strategic science on behalf of national security, is operated by<br />

Los Alamos National Security, LLC, a team composed of Bechtel National,<br />

the University of California, The Babcock & Wilcox Company, and URS for the<br />

Department of Energy’s National Nuclear Security Administration.<br />

Los Alamos enhances national security by ensuring the safety and reliability<br />

of the U.S. nuclear stockpile, developing technologies to reduce threats from<br />

weapons of mass destruction, and solving problems related to energy,<br />

environment, infrastructure, health, and global security concerns.<br />

Microsoft Visual Studio® development system is an integrated environment<br />

that helps simplify the entire development process from design to<br />

deployment. Customers can unleash their creativity with powerful<br />

prototyping, modeling, and design tools that bring a vision to life. Work<br />

within a personalized environment, and target a growing number of<br />

platforms. With integrated testing and debugging tools that enable delivery<br />

of high-quality solutions, developers and testers can work more efficiently.


PNY<br />

Supermicro<br />

SYNNEX Corporation<br />

TSMC<br />

Established in 1985, PNY Technologies®, Inc. is the authorized NVIDIA®<br />

Quadro® channel partner for North America, Latin America and Europe. PNY<br />

provides unsurpassed service and commitment to its professional graphics<br />

customers, offering a 3-year warranty, pre- and post-sales support, dedicated<br />

Quadro Field Application engineers and direct tech support hot lines. PNY<br />

recently introduced a new line of high-performance solid state drives, the Prevail<br />

Series SSD, designed specifically for the professional and enterprise<br />

markets. The company also offers a full line of commercial and consumer<br />

graphics cards, computer memory upgrade modules, flash memory cards,<br />

USB flash drives, and HDMI cables. Headquartered in Parsippany, NJ, PNY<br />

maintains facilities in North America, Europe, Asia, and Latin America. For<br />

more information, please visit http://www.pny.com.<br />

Supermicro, the leader in server technology innovation and green computing,<br />

provides customers around the world with application-optimized server,<br />

workstation, blade, storage and <strong>GPU</strong> systems. Based on its advanced Server<br />

Building Block Solutions, Supermicro offers the most optimized selection for IT,<br />

datacenter and HPC deployments. The company’s system architecture<br />

innovations include Twin server, double-sided storage and SuperBlade® product<br />

families. Offering the most comprehensive product lines in the industry,<br />

Supermicro delivers energy-efficient solutions with unmatched performance<br />

and value. Founded in 1993, Supermicro is headquartered in Silicon Valley with<br />

worldwide operations and manufacturing centers in Europe and Asia. For more<br />

information, visit www.supermicro.com.<br />

SYNNEX Corporation, a Fortune 300 corporation, is a leading business<br />

process services company, partnering with resellers and original equipment<br />

manufacturers in multiple regions around the world. The Company provides<br />

services in IT distribution, supply chain management, contract assembly and<br />

global business services. Founded in 1980, SYNNEX employs more than<br />

10,000 associates worldwide and operates in the United States, Canada,<br />

China, Japan, Mexico, the Philippines and the United Kingdom. Our value-added<br />

service model streamlines business processes to help customers<br />

across the globe lower their costs and create greater efficiencies. We<br />

provide a variety of professional and marketing services, including: demand<br />

generation; education and training; pre- and post-sale technical support;<br />

end-user enablement; server assessment; design and integration; recycling<br />

and trade-in; contract design and assembly; and IT resource planning.<br />

TSMC is the world’s largest dedicated semiconductor foundry, providing the<br />

industry’s leading process technology and the foundry segment’s largest<br />

portfolio of process-proven libraries, IPs, design tools and reference flows.<br />

The Company’s managed capacity in 2011 totaled 13.22 million (8-inch<br />

equivalent) wafers, including capacity from three advanced 12-inch<br />

GIGAFAB facilities, four eight-inch fabs, one six-inch fab, as well as<br />

TSMC’s wholly owned subsidiaries, WaferTech and TSMC China, and its joint<br />

venture fab, SSMC. TSMC is the first foundry to provide 28nm production<br />

capabilities. Its corporate headquarters are in Hsinchu, Taiwan. For more<br />

information about TSMC please visit http://www.tsmc.com.<br />

GOLD SPONSORS<br />

Amazon Web Services<br />

Fusion-io<br />

NextIO<br />

SGI<br />

SILVER SPONSORS<br />

Acceleware Corporation<br />

Adobe<br />

Appro International, Inc.<br />

Built upon the same world-class technology that powers Amazon.com,<br />

Amazon Web Services (AWS) provides businesses with a secure, reliable,<br />

easy-to-scale, low-cost computing platform “in the cloud.” Companies of all<br />

sizes, from all around the globe use AWS to build applications, store data,<br />

manage business processes, and more. Learn more: http://aws.amazon.com<br />

The Fusion-io storage memory platform significantly improves processing<br />

capabilities within a data center by moving active data closer to the CPU<br />

where it is processed. Called shared data decentralization, this reduces<br />

latency while increasing data center efficiency. Fusion’s software and<br />

hardware solutions leverage non-volatile memory for enterprise-grade<br />

performance, reliability and manageability.<br />

NextIO was founded based upon the vision of creating shared server I/O<br />

resource pools. Today, NextIO simplifies complex server I/O and enables<br />

any-to-any connectivity among a wide variety of data center resources. With<br />

the NextIO architecture, server I/O is consolidated at the top of the rack and may<br />

be shared and dynamically allocated across servers within the rack. NextIO<br />

currently offers a complete portfolio of I/O consolidation and I/O<br />

virtualization products that are easily managed, highly flexible, and provide<br />

customers with greater operational efficiencies that reduce CapEx and OpEx<br />

costs, and deliver the utmost in data center flexibility and business agility,<br />

which drives productivity and economic efficiencies.<br />

SGI is the trusted leader in technical computing. The company develops,<br />

markets and sells a broad line of mid-range and high-end scale-out and<br />

scale-up servers plus data storage solutions and differentiating software.<br />

SGI solutions are used by the scientific, technical and business communities<br />

to solve challenging, data-intensive compute and data management<br />

problems requiring large amounts of computing power and fast, efficient<br />

data movement both within the computing system and to and from large-scale<br />

data storage installations.<br />

Acceleware delivers industry leading CUDA training and HPC consulting<br />

services to organisations looking to unlock the parallel processing potential of<br />

the <strong>GPU</strong>. Acceleware’s software solutions include <strong>GPU</strong> accelerated Seismic<br />

Migration libraries for the Oil & Gas industry and Electromagnetic solvers for<br />

CAE markets. At Acceleware the goal is always the same – Go Faster.<br />

Whether it’s a smartphone or tablet app, a game, a video, a digital magazine,<br />

a website, or an online experience, chances are that it was touched by Adobe<br />

technology. Our tools and services enable our customers to create<br />

groundbreaking digital content, deploy it across media and devices, and then<br />

continually measure and optimize it based on user data. By providing<br />

complete solutions that combine digital media creation with data-driven<br />

marketing, we help businesses improve their communications, strengthen<br />

their brands, and ultimately achieve greater business success.<br />

Appro is a leading developer of innovative supercomputing solutions and is<br />

positioned to support High Performance Computing markets. Appro<br />

accelerates technical applications and business results through outstanding<br />

price/performance, power efficiency and fast time-to-market solutions<br />

based on the latest open standards technologies. Appro enables scientists<br />

and engineers to use data-intensive, capacity, capability and hybrid<br />

computing for scientific research, data modeling, engineering simulations,<br />

and seismic visualization. To learn more, visit www.appro.com


Deloitte<br />

ELEKS<br />

GE Intelligent Platforms<br />

Morgan Stanley<br />

SK Hynix<br />

SVB<br />

In the United States, Deloitte LLP and its subsidiaries have 45,000<br />

professionals with a single focus: serving our clients and helping them solve<br />

their toughest problems. We work in four key business areas — audit,<br />

financial advisory, tax and consulting — but our real strength comes from<br />

combining the talents of those groups to address clients’ needs. Fortune and<br />

BusinessWeek consistently rank our organization among the best places to<br />

work, which is good news for our talent and our clients alike. When the best<br />

people tackle the most compelling challenges, everyone wins.<br />

Multi-year expertise in building complex science-intensive solutions<br />

including HPC has determined our value proposition of delivering<br />

sophisticated custom computing systems for power, finance, automation,<br />

entertainment and other industries. ELEKS’ engineering culture, combined<br />

with aspiration for technological excellence and solid project management<br />

skills, ensures superior business value we deliver to our highly valued<br />

customers. For more information about ELEKS’ software development,<br />

localization and testing services go to www.eleks.com.<br />

GE Intelligent Platforms is a leading manufacturer of rugged COTS computer<br />

boards and systems for military programs. As a partner to NVIDIA for<br />

Embedded Applications, GE brings GP<strong>GPU</strong> technology into a wide range of<br />

defense-related programs, where it can now be used in ground tanks, fighter<br />

aircraft, military helicopters, and UAVs for Radar, ISR, DSP, Sensor<br />

Processing, Imaging and many other military applications.<br />

Morgan Stanley is a leading global financial services firm providing a wide<br />

range of investment banking, securities, investment management and<br />

wealth management services. The Firm’s employees serve clients worldwide<br />

including corporations, governments, institutions and individuals from more<br />

than 1,300 offices in 43 countries. For further information about Morgan<br />

Stanley, please visit www.morganstanley.com.<br />

SK Hynix designs, manufactures and markets a wide variety of DRAM and<br />

NAND Flash memories and CMOS Image Sensors.<br />

SK Hynix is the new corporate name of Hynix Semiconductor Inc. following<br />

the merger with SK Telecom on February 14, <strong>2012</strong>. In synergy with SK<br />

Telecom, SK Hynix expects to enhance its competitiveness in<br />

semiconductors, and expand into new global markets.<br />

Silicon Valley Bank is the premier commercial bank for companies in the<br />

technology, life science, cleantech, venture capital, private equity and<br />

premium wine industries. SVB provides a comprehensive suite of financing<br />

solutions, treasury management, corporate investment and international<br />

banking services to its clients worldwide. Through its focus on specialized<br />

markets and extensive knowledge of the people and business issues driving<br />

them, Silicon Valley Bank provides a level of service and partnership that<br />

measurably impacts its clients’ success. Founded in 1983 and headquartered<br />

in Santa Clara, Calif., the company serves clients around the world through<br />

26 U.S. offices and international operations in China, India, Israel and the<br />

United Kingdom. Silicon Valley Bank is a member of global financial services<br />

firm SVB Financial Group (Nasdaq: SIVB), with SVB Analytics, SVB Capital<br />

and SVB Private Bank. More information on the company can be found at<br />

www.svb.com.<br />

PLATINUM MEDIA PARTNERS<br />

Dow Jones & Company<br />

Dr. Dobb’s<br />

HPCwire<br />

insideHPC<br />

mergermarket<br />

GOLD MEDIA PARTNERS<br />

HPC in the Cloud<br />

Dow Jones Private Equity & Venture Capital is a division of Dow Jones & Co.,<br />

a News Corporation company. Dow Jones Private Equity & Venture Capital<br />

offers integrated content solutions for deal-sourcing, due diligence and<br />

fundraising needs of today’s venture capital and private equity investors,<br />

corporate investors, advisors, and portfolio companies. Core products<br />

include the deal database VentureSource and the fundraising database LP<br />

Source, as well as the highly-respected publications Private Equity Analyst,<br />

VentureWire, Daily Bankruptcy Review and LBO Wire.<br />

Dr. Dobb’s is the most respected development-focused brand helping<br />

application and software development professionals make the right<br />

decisions for their businesses. Dr. Dobb’s provides deep content that<br />

challenges developers to think of new and dynamic ways to create business-focused<br />

applications while balancing “what can be developed” with practical,<br />

real-world analysis. http://drdobbs.com<br />

HPCwire is the leading publication for news and information on high<br />

performance and data-intensive computing for business and technology<br />

professionals. HPCwire is the #1 resource selected by academic, government,<br />

industrial and vendor communities who are interested in computationally intensive<br />

computing, including systems, software, applications, middleware,<br />

networking and storage. Subscribe at: www.hpcwire.com.<br />

insideHPC is the web’s premier high performance computing (HPC) short<br />

format news site. insideHPC distills news and events, and presents them in<br />

bite-sized nuggets of helpfulness as a resource for supercomputing<br />

professionals. insideHPC, along with its sister publication, inside-BigData,<br />

pumps out more than 1.2 million monthly page views to a growing<br />

community of readers that now exceeds 61,000 unique monthly visitors.<br />

mergermarket, part of The Mergermarket Group, is an unparalleled,<br />

independent M&A intelligence tool used by the world’s foremost financial<br />

institutions to originate deals. It provides proprietary intelligence on<br />

potential deal flow, potential mandates and valuations via the world’s largest<br />

group of M&A journalists and analysts who have direct access to the most<br />

senior decision-makers and corporates.<br />

HPC in the Cloud is dedicated to covering data-intensive cloud computing<br />

in science, industry and the data center. The publication informs<br />

technology decision-makers and stakeholders in the high performance<br />

computing industry about developments happening at the point where high<br />

performance and cloud computing intersect. Subscribe now at:<br />

http://www.hpcinthecloud.com/xs/register.


EXHIBITING COMPANIES<br />

3dmx<br />

AccelerEyes LLC<br />

ACE Computers<br />

Advantest<br />

Allinea Software<br />

AMAX<br />

Aspen Systems<br />

BioDigital<br />

BOXX Technologies, Inc.<br />

Since 2003, 3dmx has been creating extraordinary 3D animation,<br />

stereoscopic 3D, visual effects, visualizations, live action, stop motion<br />

and video games for the medical, technology and entertainment<br />

industries. Whether you need to present a groundbreaking invention,<br />

provide user tutorials for specialized machinery and processes,<br />

produce training material or architectural walkthroughs, or prepare an<br />

appealing set of art for marketing campaigns, 3dmx is able to do it<br />

for you, on time and within budget.<br />

AccelerEyes develops and markets fast, simple <strong>GPU</strong> software<br />

libraries. Today, AccelerEyes delivers products which are used to<br />

accelerate C, C++, Fortran, Python, and MATLAB® codes on CUDA<br />

and OpenCL <strong>GPU</strong>s.<br />

Founded in 1983, Ace Computers is a respected systems integrator<br />

focused on custom requirements and regularly works with major<br />

Universities, Federal Labs, and Corporate clients. We hold WSCA<br />

and GSA Prime contracts in addition to multiple GWACs. Ace is<br />

ISO9001:2008 Certified and is well associated with NVIDIA, Intel<br />

and AMD.<br />

A world-class technology company, Advantest is the leading<br />

producer of automatic test equipment (ATE) for the semiconductor<br />

industry and a premier manufacturer of measuring instruments. Its<br />

leading-edge products are integrated into the most advanced<br />

semiconductor production lines in the world. Founded in Tokyo in<br />

1954, Advantest now operates in 21 countries worldwide.<br />

www.advantest.co.jp<br />

We’re recognized as the leading vendor of tools for parallel software<br />

development and High Performance Computing (HPC). One of the<br />

fastest growing companies in HPC, we were recently honored as a Red<br />

Herring Top 100 company. We have offices in the US and the UK, as<br />

well as a network of resellers and partners in most parts of the world.<br />

AMAX, pioneer of the Personal Supercomputer, is a leading<br />

technology provider with over 30 years of solidified partnerships<br />

with technology innovators such as NVIDIA. AMAX excels at<br />

delivering unique and customized HPC cluster, server and storage<br />

solutions that continually push the limits of innovation with<br />

maximum performance and exceptional efficiency.<br />

Aspen Systems, founded in 1982, is an established, privately-held,<br />

two-time Inc. 500 corporation that designs, manufactures, and<br />

services computing products including high-performance compute<br />

clusters, systems software, storage/file systems, and visualization.<br />

Aspen Systems places its highest priority on first class technical<br />

support and the creation of fully customized products that always<br />

incorporate the latest technologies. This allows our customers to<br />

enjoy the highest performing solutions at very competitive prices.<br />

BioDigital is the leading developer of state of the art biomedical<br />

visualization. BioDigital recently launched The BioDigital Human,<br />

a 3D visualization platform with a revolutionary approach for<br />

communicating health and medical information with interactive<br />

tools for exploring human anatomy, physiology and conditions.<br />

BOXX is the leading innovator of high-performance workstations<br />

and rendering systems for product design, engineering, visual<br />

effects, animation, architectural visualization, and more. For over<br />

15 years, we’ve combined record-setting performance, speed, and<br />

reliability with unparalleled industry knowledge to become the<br />

trusted choice for creative professionals worldwide.<br />

Bright Computing<br />

Cirrascale<br />

Colfax International<br />

Concurrent<br />

Creative Consultants<br />

Cyberpower<br />

Digital Storm<br />

Bright Computing, a leader in integrated cluster management<br />

software, provides seamless management of NVIDIA <strong>GPU</strong> and<br />

hybrid clusters. Bright is a single solution for provisioning,<br />

scheduling, monitoring and managing clusters. Every Bright-managed<br />

cluster is also cloud-ready, enabling users to extend their<br />

system into AWS EC2 for access to additional CPUs and NVIDIA<br />

<strong>GPU</strong>s, with a few mouse clicks. All of this capability is accessed via<br />

its intuitive GUI or using Bright’s powerful cluster management<br />

shell. Bright Computing is headquartered in San Jose, CA.<br />

http://www.brightcomputing.com<br />

Cirrascale Corporation is a premier provider of advanced GP/<strong>GPU</strong><br />

blade-based workstation and server solutions for conventional and<br />

containerized data centers that are scalable, reliable and offer the best<br />

price/performance value in the industry. Cirrascale leverages its<br />

patented Vertical Cooling <strong>Technology</strong> to provide the industry’s most<br />

energy-efficient standards-based platforms with the lowest possible<br />

total cost of ownership in the densest form factor. To learn more<br />

about Cirrascale and its unique GP/<strong>GPU</strong> solutions, please visit<br />

http://www.cirrascale.com or call (888) 942-3800.<br />

Buy it from a trusted expert. Colfax provides the most comprehensive<br />

range of innovative, cutting-edge and highly customized <strong>GPU</strong><br />

solutions. With outstanding price/performance and technical<br />

support, Colfax is a leading choice of scientists and engineers for<br />

<strong>GPU</strong>-accelerated data modeling, simulation and real-time<br />

visualization solutions. Visit www.colfax-intl.com for more details.<br />

Concurrent Computer Corporation (NASDAQ:CCUR) is a worldwide<br />

leader in real-time Linux® computing technology including real-time<br />

operating systems; advanced debugging and analysis tools;<br />

simulation tools; and fully-integrated multiprocessing/<strong>GPU</strong> computer<br />

platforms. Concurrent focuses on hardware-in-the-loop and<br />

man-in-the-loop simulation, data acquisition and industrial systems.<br />

For more information, please visit www.real-time.ccur.com.<br />

Creative Consultants demonstrates a Multi-Projector Semi-<br />

Immersive Virtual Reality (VR) environment with <strong>GPU</strong> enabled<br />

warping and blending. Our parallel code development appliance<br />

Stelletto computes hundreds of thousands of threads, in real time,<br />

driving the VR display; thus creating an interactive HPC<br />

demonstration with live scaling of calculations for 250,000 particles.<br />

CyberPower, Inc. is one of the leading computer system<br />

manufacturers nationwide. As published in the Los Angeles Business Journal<br />

in 2003, we were the fastest growing private company in Los<br />

Angeles. With vision, commitment, and steadfast determination, we<br />

manufacture and distribute various customized high-end gaming<br />

machines, notebook systems and high performance workstations<br />

to meet the unique needs of gamers, businesses, government<br />

agencies, educational institutions and other end-users.<br />

Founded in 2002, Digital Storm has rapidly emerged as the<br />

predominant name in system integration. With expertise in<br />

workstation computers, Digital Storm’s mission is to deliver its<br />

customers bleeding edge technology with direct support. As a<br />

validation of Digital Storm’s success, its systems have received the<br />

industry’s most prestigious awards.


EM Photonics<br />

Exxact Corporation<br />

eyeSight Mobile Technologies Ltd.<br />

Fuzzy Logix<br />

GraphStream Incorporated<br />

Green Revolution Cooling<br />

Immersive Media<br />

JMR Electronics, Inc.<br />

MathWorks<br />

EM Photonics’ core competency lies in its strength with using <strong>GPU</strong>s,<br />

FPGAs, and other parallel computing platforms to accelerate extremely<br />

complex computational applications. We have developed products in the<br />

areas of image processing, linear algebra, and scientific computing and<br />

worked with clients in fields from finance to defense to life sciences.<br />

Founded in 1992, Exxact Corporation is both a value-added<br />

distributor of professional workstation graphics cards and a<br />

manufacturer of solutions for visualization and compute-intensive<br />

applications. In addition, Exxact offers software and services to<br />

develop, port, maintain, and deploy applications for <strong>GPU</strong> computing.<br />

eyeSight’s Touch Free technology provides an enhanced user<br />

experience, allowing users to easily and intuitively control a variety of devices<br />

using simple hand gestures. eyeSight’s Natural User Interface<br />

solution utilizes the device’s standard 2D camera, along with advanced<br />

real-time image processing and machine vision algorithms, to track<br />

the user’s hand gestures and convert them into actions.<br />

Fuzzy Logix is the leading provider of in-database analytics software<br />

and <strong>GPU</strong>-based analytics solutions. Our <strong>GPU</strong> Appliance, TANAY,<br />

makes accessing the power of <strong>GPU</strong> technology easy and includes a<br />

library of over 300 analytic functions that can be invoked from DLLs<br />

or Shared Objects. Additional Information: http://www.fuzzl.com<br />

GraphStream is a supplier of advanced scalable systems for data<br />

networking, processing, and storage. These systems are custom-configured<br />

to meet specific application requirements with superior<br />

simplicity, reliability, scalability, and efficiency. Since 2003,<br />

GraphStream has worked together with PNY and NVIDIA to deliver<br />

some of the world’s most powerful <strong>GPU</strong>-accelerated systems.<br />

Green Revolution Cooling (GRC) provides the highest performance,<br />

lowest cost-per-Watt cooling system available today for data centers.<br />

The CarnotJet system submerges fanless OEM servers into a<br />

managed dielectric fluid environment, reducing cooling energy by<br />

95% while providing powerful and continuous heat removal for even<br />

the highest density servers.<br />

Immersive Media is the pioneer and leading world provider of 360°,<br />

full motion, interactive video. Our immersive 360° video content is<br />

delivered via the internet to PC, iPad or mobile device. Immersive Media<br />

provides the enabling technologies for interactive video to record,<br />

process, live stream and deliver images from our own or other wide<br />

field cameras, with a patent portfolio covering key discoveries and<br />

capabilities of interactive and immersive video.<br />

JMR ELECTRONICS INC. is a 30-year established ISO 9001 certified<br />

design, development and manufacturing resource for high<br />

performance computing and storage systems based in Chatsworth,<br />

CA. JMR’s award-winning BlueStor and SilverStor systems are<br />

widely used in broadcast, digital intermediate, geophysical survey,<br />

post-production and scientific applications.<br />

Over one million people around the world use MATLAB for technical<br />

computing. They rely on MATLAB to help them develop cancer<br />

therapies, search for new sources of energy, make our cars safer<br />

and more fuel efficient, and explore outer space. By combining a<br />

powerful numeric engine and technical programming environment<br />

with interactive exploration and visualization tools, MATLAB has<br />

become the language of technical computing. For more<br />

information, visit www.mathworks.com<br />



SPONSORS AND<br />

EXHIBITORS<br />

MBA Sciences<br />

Mellanox Technologies<br />

Mentor Graphics Corp.<br />

Mersive<br />

Microway Inc.<br />

migenius<br />

Morgan Kaufmann<br />

MulticoreWare Inc.<br />

Deliver on the promise of Data and Graph Analytics. MBA Sciences<br />

enables engineers and scientists to rapidly prototype, analyze and<br />

deploy robust parallel solutions across heterogeneous computing<br />

resources spanning servers, cores and <strong>GPU</strong>s from either data<br />

centers or public clouds.<br />

Mellanox Technologies (NASDAQ: MLNX, TASE: MLNX) is a leading<br />

supplier of end-to-end InfiniBand and Ethernet connectivity<br />

solutions and services for servers and storage. Mellanox products<br />

optimize data center performance and deliver industry-leading<br />

bandwidth, scalability, power conservation and cost-effectiveness<br />

while converging multiple legacy network technologies into one<br />

future-proof architecture. www.mellanox.com<br />

The Mentor Graphics® Embedded Software Division comprises the<br />

Mentor® Embedded family of products and services, including<br />

embedded software intellectual property (IP), tools, and professional<br />

consultant services to help embedded developers and silicon<br />

partners optimize their products for design and cost efficiency. The<br />

Mentor Embedded team continues to lead the industry with<br />

involvement in the open source community, with Inflexion® 2D and 3D<br />

UI development, Sourcery open source tools, and Nucleus® RTOS<br />

solutions. More information on Mentor Embedded products and<br />

services can be found at www.mentor.com/embedded<br />

Since it was founded in 2006, Mersive has revolutionized high<br />

performance display setup and maintenance enabling a new class of<br />

displays. Mersive’s Sol software automatically aligns multiple<br />

commodity projectors into one seamless image of extraordinary<br />

quality and resolution without the expense of specialized hardware<br />

and services. For more information, visit www.mersive.com<br />

Since 1982, Microway has earned an international reputation for<br />

building screaming fast HPC clusters, servers, and<br />

WhisperStations. Since 2007, these have included Tesla <strong>GPU</strong>s.<br />

Utilizing multi-core CPUs, high-efficiency power, robust designs<br />

and excellent cooling, Microway’s <strong>GPU</strong> clusters deliver more<br />

TFLOPs with fewer watts. Our unique Tesla systems offer full PCI-E<br />

Gen3 support and optional FDR InfiniBand.<br />

The migenius mission is to bring software and web services to the<br />

market that enable ‘live 3D for all’ for better and much faster<br />

decision making in design and marketing. Leveraging the power of<br />

the cloud, <strong>GPU</strong> and NVIDIA iray, migenius provides platforms and<br />

applications to make this a reality.<br />

Morgan Kaufmann delivers the knowledge of experts to the<br />

computing community. Through superior print and digital content,<br />

our authors aim to educate our readers and inspire innovation.<br />

MulticoreWare, Inc. develops tools and software solutions for<br />

homogeneous and heterogeneous architectures for profiling,<br />

optimization and portability. With significant expertise in <strong>GPU</strong> and<br />

multicore CPU programming models and in OpenCL, the company<br />

has delivered tools and software solutions in architectures such as<br />

OpenMP and CUDA to high-performance applications including<br />

video and image processing.


NeST/SFO Technologies<br />

Numecent<br />

Numira Biosciences<br />

Patriot Technologies<br />

PEER 1 Hosting<br />

Penguin Computing<br />

PGI<br />

SFO Technologies, a NeST Group company, offers end-to-end<br />

engineering solutions to OEMs in Healthcare, Industrial,<br />

Communications and Transportation verticals. Services include<br />

hardware and software design, embedded product engineering,<br />

application development, prototyping, testing and manufacturing.<br />

An early adopter of GP<strong>GPU</strong>, and a CUDA Design Partner of NVIDIA,<br />

NeST specializes in <strong>GPU</strong> computing and 3D Graphics solutions,<br />

leveraging a highly skilled team and a streamlined process to<br />

deliver industry leading speedup and optimization.<br />

Numecent (www.numecent.com) is a start-up which came out of<br />

stealth with a bang in March <strong>2012</strong> and is the inventor of<br />

‘cloudpaging’. This patented technology enables friction-free digital<br />

delivery of native software and other non-linear assets through<br />

virtualization. One of the benefits of cloudpaging is that it can<br />

reduce the network footprint of digital downloads by between 20x and<br />

100x and execute them natively, at full speed, without actually<br />

requiring installation. Once cloudpaged, applications can even run<br />

off-line and always under license control.<br />

Numira Biosciences is a leading provider of specialty contract<br />

research services for preclinical drug and device development.<br />

Numira’s customers include the top biopharmaceutical companies<br />

and academic research institutions. Through its next-generation<br />

study portal, Numira provides its customers with interactive tools<br />

for accessing, exploring, and communicating about their preclinical<br />

study data.<br />

Patriot’s Manufacturing and Logistics Services enables software<br />

developers, application users and solution providers to optimize their<br />

software applications on a reliable, branded and customized hardware<br />

platform. By choosing Patriot, customers can leverage an appliance-based<br />

model with minimal investment and realize the benefits of<br />

faster time-to-market, increased profitability and business growth.<br />

Two obsessions – Ping & People – have made us one of the world’s<br />

leading hosting providers. Our proprietary 10Gbps FastFiber Network <br />

and 18 datacenters connect our customers to the world. And our<br />

FirstCall Promise supports over 10,000 businesses 24x7x365. The first<br />

large-scale <strong>GPU</strong> Cloud is just one of our hosting innovations.<br />

For well over a decade, Penguin Computing has been delivering<br />

integrated, Linux-based solutions for the enterprise and HPC space.<br />

With Linux expertise that is unmatched in the industry, Penguin<br />

Computing offers an end-to-end portfolio of products that range<br />

from Linux servers and workstations to integrated, turn-key HPC<br />

clusters and cluster management software.<br />

The Portland Group® is a premier supplier of software compilers<br />

and development tools for parallel computing. PGI® offers high<br />

performance scalar and parallel Fortran, C and C++ compilers and<br />

tools for systems based on 64-bit x86 processors from Intel and<br />

AMD, and NVIDIA CUDA-enabled <strong>GPU</strong>s running under Linux,<br />

MacOS and Windows operating systems.<br />




Polywell<br />

PQ Labs, Inc<br />

Prefixa<br />

Ramtron International<br />

Corporation<br />

Raytrix GmbH<br />

Reservoir Labs<br />

RTT<br />


Scalable Display <strong>Technology</strong><br />

Polywell, established in 1987, is a manufacturer of high quality<br />

computer products. Its lineup ranges from industrial embedded<br />

PCs and storage solutions to high-performance workstations and<br />

high-end servers. Polywell has been serving the needs of various<br />

commercial and government entities with systems for CAD/CAM,<br />

animation, content creation, and for data centers. Polywell also<br />

offers OEM/ODM services for various vertical markets, such as<br />

Digital Signage, Kiosk, POS, Surveillance, IPTV, entertainment,<br />

gaming, medical equipment, network appliance and IP Phone.<br />

Established in Silicon Valley, PQ Labs, Inc. is a leading worldwide provider of<br />

Multi-Touch solutions, providing revolutionary hardware<br />

and software to eliminate the need for a keyboard and mouse in<br />

future computers. PQ Labs’ Multi-Touch G³ enables people to<br />

interact with computers directly using just fingers and gestures.<br />

The company’s key technology improvement is enabling a next<br />

generation of natural user interface to be widely adopted in the<br />

computer industry.<br />

Prefixa develops 3D solutions for 3D data capture, modeling and<br />

rendering, accelerated with NVIDIA <strong>Technology</strong>. Our core technology is a<br />

3D Photorealistic Render Engine natively implemented in NVIDIA<br />

CUDA, and scalable to multi-<strong>GPU</strong>, multi-CPU nodes. We are<br />

looking for key partners to scale our solution to the cloud, and build<br />

business around our platform.<br />

Ramtron International Corporation, headquartered in Colorado<br />

Springs, Colorado, is a fabless semiconductor company that<br />

designs, develops and markets specialized semiconductor memory<br />

and integrated semiconductor solutions used in a wide range of<br />

product applications and markets worldwide. For more information,<br />

visit www.ramtron.com.<br />

Raytrix develops and markets single-lens 3D video cameras based<br />

on their patented high resolution light field technology, offering<br />

solutions for Particle Image Velocimetry (PIV), optical inspection,<br />

face capturing, microscopy – as well as IP for consumer products<br />

(mobile phones).<br />

Privately owned and in business since 1990, Reservoir Labs<br />

specializes in advanced compiler, network and reasoning<br />

technologies with an emphasis on mapping innovative algorithms<br />

to high performance and embedded architectures. We deliver<br />

cutting-edge technology products, customized solutions and<br />

advanced R&D services to our commercial and government clients.<br />

RTT stands for creative and fascinating 3D visualization solutions,<br />

which bring products to life in realtime and portray them in a<br />

natural and realistic environment. Our RTT Virtual Prototyping and<br />

RTT Virtual Marketing products and services combine software,<br />

support and customized strategic solutions, allowing us to turn<br />

dreams into reality.<br />

Scalable Display Technologies is a global leader providing software<br />

tools to construct and manage ultra high-resolution displays.<br />

Scalable’s software is used by the military and Global 1000<br />

accounts to enhance productivity through higher resolution and<br />

increased visual realism of displays. Scalable’s products are<br />

spawning a new class of displays called “multi-megapixel displays”.


SECO<br />

Seneca<br />

Splashtop Inc<br />

Terascala, Inc.<br />

Themis Computer<br />

TunaCode<br />

TYAN<br />

Seco, an international leader in embedded electronic<br />

solutions, has shown over its 30 years the ability to adapt its<br />

know-how to meet challenging new customer needs, guiding<br />

customers to its most innovative solutions. Collaborations with<br />

leading scientific universities and partnerships with leading companies<br />

worldwide have helped transform Seco into an<br />

international company that has established itself in the market by meeting the<br />

new challenges of everyday life.<br />

Seneca is a premier U.S.-based custom system manufacturer and<br />

value-added technology distributor with over 30 years of experience.<br />

As a designer and manufacturer of High Performance Computing<br />

Clusters, Seneca supports academic, lab, government, and defense<br />

researchers across the nation. Our HPC practice includes solutions for<br />

compute clusters, NVIDIA GP<strong>GPU</strong> platforms, technical computing<br />

workstations, storage systems, and management software.<br />

Splashtop aspires to touch people’s lives by delivering the best-in-class<br />

remote desktop experience - bridging tablets, phones,<br />

computers and TVs. Splashtop technology empowers consumer<br />

and business users with high-performance, secure, interactive<br />

access to their favorite applications, media content and files<br />

anytime, anywhere. Splashtop is headquartered in San Jose with<br />

offices in Beijing, Hangzhou, Shanghai, Taipei and Tokyo. For more<br />

information, visit http://www.splashtop.com.<br />

Terascala’s high throughput storage solutions make big data fast.<br />

With Terascala, organizations transition from storing and sifting their<br />

data to leveraging that data to drive applications. Combining a<br />

parallel file system, extensive analysis and optimization, Terascala’s appliances<br />

enable rapid analysis of big data sets using large server installations.<br />

Themis combines industry leadership, high-performance<br />

computing, and advanced thermal and mechanical design<br />

techniques to deliver reliable, rugged standards-based and custom<br />

embedded computing solutions. From small form factor computers<br />

to large blade servers, Themis is committed to building products<br />

that achieve a superior balance between standard commercial<br />

technology and ruggedness to keep mission-critical applications<br />

available in the most demanding environments. Our diverse product<br />

portfolio includes: board-level computers, rack mounted servers,<br />

bladed server systems, mission and payload systems, small form<br />

factors, and storage appliances.<br />

TunaCode delivers accelerated computing solutions making<br />

innovative use of multi-core and manycore processors. We develop<br />

and market CUVILib which offers <strong>GPU</strong>-accelerated Vision and<br />

Imaging functionality with plug-and-play ease of use resulting in<br />

instant speedups of 10X. With over 1000 active users and<br />

commercial deployments in Medical Imaging, Industrial/Defense<br />

Imaging and Entertainment domains, CUVILib offers a cost-effective<br />

way to achieve real-time performance in Imaging applications.<br />

PNY and TYAN have established a new EMEA partnership to offer a<br />

wide range of NVIDIA <strong>GPU</strong>-based computing platforms designed for<br />

High Performance Computing (HPC) and massively parallel computing<br />

environments. As a companion processor to the CPU in a server,<br />

NVIDIA TESLA <strong>GPU</strong>s accelerate HPC applications by up to 10x.<br />




Ubitus Inc.<br />

USEFULPROGRESS<br />

WILD Systems (HPC Project)<br />

Wolfram Research, Inc.<br />

Wurth Electronics Midcom<br />

Zoobe<br />


Ubitus Inc., the technology leader in deploying Cloud-enabled rich<br />

media services, offers innovative cloud computing solutions for<br />

device manufacturers, wired/wireless communication service<br />

providers, telecommunication operators and digital content<br />

developers. Founded in 2007 and headquartered in Taipei, Taiwan,<br />

the company now has 150 employees and 4 offices in Tokyo, Beijing,<br />

Guangzhou and Seoul.<br />

Developments in computer graphics enable huge progress in our<br />

knowledge of life and matter. In medical science, CT scanners<br />

make it possible to investigate the whole body transparently. A very<br />

important step in data analysis consists of converting signals (X-ray, MR,<br />

US) into digital data that can be processed by computers.<br />

UsefulProgress develops new software strategies based on<br />

computer graphics for high-performance visualisation.<br />

Wild Systems is a recognized expert in software performance<br />

optimization. At Wild Systems, we combine know-how and tools for<br />

automatic code parallelization. This allows users to run their<br />

optimized applications on hybrid architecture appliances. Connected<br />

to the network, these appliances, each fully dedicated to a given<br />

optimized application, boost its execution performance.<br />

Wolfram Research is the company where “computation meets knowledge.”<br />

A powerhouse in technical innovation, the company is the developer of<br />

Mathematica, the ultimate computation platform, and Wolfram|Alpha,<br />

the computational knowledge engine. Wolfram also sponsors the<br />

world’s largest free network of technical information websites,<br />

including MathWorld and the Wolfram Demonstrations Project.<br />

Würth Elektronik is one of the world’s leading manufacturers of<br />

passive and electromechanical components. Our product range<br />

contains EMC ferrites, filter chokes, common mode chokes, circuit<br />

protection, EMI shielding material, power inductors, power<br />

transformers, LAN and telecom transformers, RF inductors,<br />

LTCC components, connectors, switches, assembly technique and<br />

power elements.<br />

Zoobe is a messaging service that allows you to voice an animated<br />

character. From your voice or text message and your chosen<br />

character we generate a personal animation clip within seconds<br />

which you can send to your friends or post on your wall.


<strong>GTC</strong> WORLDWIDE<br />

EVENTS<br />

SAVE THE DATES<br />

<strong>GTC</strong> JAPAN <strong>2012</strong><br />

July 26<br />

Tokyo Midtown Hall<br />

www.gputechconf.jp<br />

<strong>GTC</strong> U.S. 2013<br />

March 19–22<br />

San Jose McEnery Convention Center<br />

www.gputechconf.com


STAY EDUCATED!<br />

<strong>GTC</strong> is comprised of year-round international<br />

conferences, workshops and online events. It is an<br />

essential resource for the scientists, engineers,<br />

researchers, and developers who rely on <strong>GPU</strong>s to tackle<br />

enormous computational challenges. <strong>GTC</strong> On-Demand<br />

gives you archival access to the world-class education<br />

delivered at <strong>GTC</strong>, as well as the latest research and<br />

insights presented by NVIDIA staff at other important<br />

industry events. Explore and learn from the best and<br />

brightest minds working in High Performance<br />

Computing today. Visit www.gputechconf.com<br />

Blog - http://blogs.nvidia.com/category/supercomputing/<br />

Facebook - https://www.facebook.com/gputechnologyconference<br />

Twitter - http://twitter.com/#!/gpucomputing<br />

LinkedIn - http://www.linkedin.com/groups?about=&gid=2159196<br />

Flickr - http://www.flickr.com/photos/nvidia/collections/<br />

YouTube - http://youtube.com/user/nvidiatesla<br />

Meetup - http://hpc.meetup.com/<br />

STAY CONNECTED!<br />

<strong>GTC</strong> attendees are talented. No doubt you’ve had firsthand<br />

experience of this here at <strong>GTC</strong> <strong>2012</strong>. Attendees<br />

work in major industry verticals such as Finance,<br />

Government, Life Sciences, Energy, Computer Software<br />

Development, Manufacturing, as well as Academia. <strong>GTC</strong><br />

provides invaluable opportunities for peer-to-peer<br />

learning and connection within and across industries all<br />

year long. Build on the relationships you made this week.<br />

Stay connected!


VISUALIZE A GREEN EVENT<br />

Place compostables and recyclables in proper bins<br />

Use public transportation during the show<br />

In hotel, decline new sheets and towels<br />

Also, unplug phone and laptop chargers<br />

Offset your travel at www.cool-it.us<br />

Take only collateral/giveaways you will use<br />

What We’re Doing<br />

> 100% of convention center’s greenhouse gas is offset<br />

> Extensive composting and recycling<br />

> Producers and vendors agree to green guidelines<br />

> Minimizing printed materials<br />

> Using recycled and biodegradable paper/non-toxic inks<br />

> Monitoring lighting and A/C usage<br />

> Local-based food options when available<br />

> Non-toxic cleaning materials<br />



[Venue floor map, first and second floors. Labeled areas: Main Entrance, Registration, Check-In, Parking, Keynote Hall, Exhibit Hall, Hall 1, Hall 2, Posters, Lab, Almaden Concourse, Sales Office, Speaker Ready Room, Speaker & Sponsor Lounge, Press Lounge, Microsoft Lounge, Think Tank, Silicon Valley Board Room, NVIDIA Meeting Room, VIP Meeting Room, Staff & Show Management, and session rooms including A1, A2, A3, A5, A7, A8, B, C, D, E, F1, F2, G, H, J1, J2, J3, K, L, M and N. Stairs lead down to Rooms K, L, M, N and to the Lab; elevators serve the 2nd floor and the Blossom Hill, Almaden and 3rd floor meeting rooms. The Marriott, the Hilton and the St. Claire Hotel ballrooms (across the street) are also marked.]<br />

© <strong>2012</strong> NVIDIA CORPORATION. ALL RIGHTS RESERVED.<br />
