GTC 2012 Program Guide - GPU Technology Conference
PRESENTED BY PLATINUM SPONSORS
MAY 14-17, 2012 | SAN JOSE, CA
PROGRAM GUIDE
The Power to Do More
HP GPU computing ranges from personal supercomputing Z-workstations to the world’s most self-sufficient GPU-enabled servers. Come and talk with the HP GPU experts about the performance, efficiency and agility you get with HP.
Visit HP Booth #47 for more information.
www.hp.com/go/zworkstations
www.hp.com/go/accelerators
WELCOME TO GTC
Dear GTC Attendees,
Back in 2009 we had an idea: to bring together the wide variety of people who use GPUs in their work. People in disciplines from quantum chemistry to computational fluid dynamics and astrophysics. People from every corner of the world. So we hosted our first GPU Technology Conference.
We were proud of its success. More than 90 percent of the presentations were made by people outside NVIDIA. Meetings spilled into the hallways. The place buzzed with energy. We realized that GPU computing was bigger than NVIDIA, and that GTC was a conduit into the collective power of brilliant scientists, technologists and thought leaders. It was an honor to host it on your behalf.
In 2010, we doubled down, with more than 280 sessions and 2,000 attendees. The reach of GPU computing was growing. And its impact was truly breathtaking. Researchers from Adobe showcased their work in computational photography, which one day will redefine the field. A surgeon described how GPUs were vital in performing surgery on a beating heart.
And GTC 2012 promises to be better still. You can choose from hundreds of sessions, including talks by Oak Ridge National Laboratory on using GPUs to build Titan, the world’s largest supercomputer; Tokyo Institute of Technology, winner of last year’s Gordon Bell Prize, on stereoscopic 3D visualization; and Beijing’s BGI on using GPUs for bioinformatics research. A variety of entrepreneurs will speak at the Emerging Companies Summit about how their startups use GPUs.
For the first time, GTC will also host two other events. Los Alamos National Laboratory will hold its Accelerated HPC Symposium, bringing together world leaders in supercomputing. InPar will provide a first-tier academic venue for peer-reviewed, archival publications in the emerging fields of parallel computing.
And NVIDIA will discuss Kepler, our first new architecture in two years, and its impact on computing. Our first Kepler-based graphics card recently launched to fantastic reviews. We can’t wait to share how these powerful, super energy-efficient GPUs open up new horizons in high performance computing and scientific discovery.
We will also be talking much more about GPU and cloud computing, as well as our Maximus technology, which creates a workstation so powerful that it can simulate the physics of a design while the design is being created.
It should make for the best GTC yet.
Enjoy the conference!
Sincerely,
The NVIDIA GTC Team
CONFERENCE GUIDE
“0.1 of a second can be the difference between winning and losing in Formula One. Data analysis doesn’t get much more critical than that.”
Mark Smith, Technical Director, Caterham F1 Team
See how Dell helped Caterham F1 Team deploy an enterprise-class IT system able to do real-time analysis of data sent from the car, while withstanding the intense heat and vibration of the Formula 1™ trackside environment.
Learn more at Dell.com/EfficientIT.
Join our breakout session on Wednesday, May 16 at 14:00 in Room M, where Dr. Jeff Layton, HPC Enterprise Technologist, will discuss compelling new technology advancements in GPU computing.
IMPORTANT INFORMATION
If there is anything we can do to make your conference experience better, please stop by the info desk and let us know.
REGISTRATION / INFORMATION DESK HOURS
SUNDAY, MAY 13
16:00 to 18:00
MONDAY, MAY 14
08:00 to 18:00
TUESDAY, MAY 15
07:00 to 19:00
WEDNESDAY, MAY 16
08:00 to 18:00
THURSDAY, MAY 17
08:00 to 16:00
EXHIBIT AND MEAL HOURS
TUESDAY, MAY 15
12:00 to 14:00 Lunch / Exhibits Open
18:00 to 20:00 Reception / Exhibits Open
WEDNESDAY, MAY 16
12:00 to 14:00 Lunch / Exhibits Open
18:00 to 20:00 Reception / Exhibits Open
THURSDAY, MAY 17
12:00 to 14:00 Lunch / Exhibits Open
ENROLL IN YOUR SESSIONS
Go to https://registration.gputechconf.com/schedule and log in to start adding sessions to your personal schedule. Attendees who enroll will be given priority access to each session. Enrolling in sessions also helps us schedule the most popular sessions in the largest rooms.
WIRELESS INTERNET ACCESS
Free wireless internet access is available on the GTC2012 network in most session rooms, the keynote hall, the exhibit hall and throughout the concourse.
DOWNLOAD THE MOBILE APP
Keep up to date with the latest news and information at the conference through the GTC 2012 Mobile App. Download it from the Android Market at https://play.google.com/store. You can also access news and announcements from the home page of www.gputechconf.com.
BUSINESS CENTER / SHIPPING
The Marriott Hotel and the Hilton Hotel both have business centers on the first floor, near their respective front lobbies. Alternatively, there is a FedEx Office Print & Ship Center at 93 E. San Carlos Street, near 3rd Street (3 blocks from the Convention Center; call 408-295-4336 for hours).
GO GREEN!
Take part in the shared goal of minimizing our collective impact on the environment. Please take only the conference materials you need, and recycle and reuse whenever possible throughout the week. Please turn in your badges for recycling at the conclusion of the event.
BAG AND COAT CHECK
Bag check is available at the bell desks of the Marriott and Hilton hotels, which are connected to the Convention Center. It is also available on the concourse of the Convention Center.
LOST AND FOUND
Please check the information desk should you lose or find an article.
FIRST AID / EMERGENCY
Should there be a medical emergency, please dial 911 and alert the nearest conference personnel.
Lenovo® recommends Windows® 7 Professional.
MONTHS OF PLANNING | A FUTURISTIC MOVIE SET | AN IDEA TURNED TO REALITY.
DREAM.
CREATE.
INTRODUCING THE LENOVO® THINKSTATION® 30 SERIES, FEATURING THE D30 FOR HIGH-END GRAPHICS AND PROCESSING POWER.
The Lenovo ThinkStation® 30 Series was designed for those who push technology to the limits and depend on professional applications and platforms to get there. The ThinkStation® 30 Series is certified to run the applications you need most from Adobe, Autodesk, Dassault Systèmes, PTC and Siemens. Designed to tackle the biggest challenges, the D30 delivers the ultimate in performance and expandability. And now, with the latest generation of Intel® Xeon® processors, Genuine Windows® 7 Professional and support for discrete NVIDIA® Quadro and Tesla graphics technology, you can defy expectations like never before.
Energy-efficient | Quiet Acoustics | Scalable Storage | ISV-certified
www.lenovo.com/thinkstation
Lenovo, the Lenovo logo, For Those Who Do and ThinkStation are trademarks or registered trademarks of Lenovo. Microsoft and Windows are registered trademarks of Microsoft Corporation in the U.S. and other countries. Intel and Intel Xeon are registered trademarks of Intel Corporation in the U.S. and other countries. NVIDIA is a registered trademark of NVIDIA Corporation in the U.S. and other countries.
© Lenovo 2012. All rights reserved.
TABLE OF CONTENTS
Welcome Letter
Important Information
Conference Highlights – Don’t Miss These Events!
Emerging Companies Summit
Los Alamos National Laboratory Accelerated High Performance Computing Symposium
Sessions Listing – Monday
Sessions Listing – Tuesday
Sessions Listing – Wednesday
Sessions Listing – Thursday
Research Posters Listing
Speakers and Panelists Listing
Sponsors and Exhibitors
Stay Connected!
CONFERENCE HIGHLIGHTS – DON’T MISS THESE EVENTS!
NVIDIA® Nsight Lab
The lab will be open daily for product discussions and for testing your application with the latest version of Nsight. It is also a place to simply hang out and relax with the Nsight development team. The lab is located on the first floor.
C++ AMP LOUNGE, by Microsoft
While attending GTC, come learn from the experts at the C++ AMP Lounge by Microsoft, a casual environment for hands-on learning and instruction. Experts will be available each day to answer questions and provide instruction. The lounge is located on the concourse.
Ask the CUDA Expert
Stop by Ask the CUDA Expert on the main concourse for a quick consultation with NVIDIA software engineers and developer technology experts. Experts on CUDA C, Fortran, OpenACC, GPU-Accelerated Libraries and more will be on hand to answer your questions. No question is too challenging or too easy for this crew!
Ask the CUDA Expert will be open as follows:
Monday 10:00 to 16:00
Tuesday 12:00 to 19:00
Wednesday 10:00 to 11:00, 12:00 to 19:00
Thursday 10:00 to 11:00, 12:00 to 16:00
DigitalGuru: Where Smart People Get Smarter
DigitalGuru Technical Bookshop of Cupertino, California, is pleased to participate in GTC 2012. Please visit our table during the conference for a wide and relevant selection of books on parallel programming, computer science, application tools and more. Books sold at GTC are available at 20% off list price. For more info, visit www.digitalguru.com.
Dinner with Strangers
Over a meal at some of the best restaurants in Silicon Valley, engage in lively conversation and share your best ideas. Pre-reserved tables for small groups will be made available so GTC attendees can mix and mingle with one another. Dinner with Strangers is open to all, but space is limited and is available on a first-come, first-served basis. Stop by the sign-up board on the concourse. Dinner with Strangers takes place on Monday and Tuesday nights, with reservations at 20:00.
SUNDAY, MAY 13
08:30 to 17:35 InPar 2012, Foundations & Applications of GPU, Manycore, and Heterogeneous Systems (Room J)
MONDAY, MAY 14
08:40 to 17:00 InPar 2012, Foundations & Applications of GPU, Manycore, and Heterogeneous Systems (Room J)
09:00 to 15:50 Pre-Conference Tutorials
16:00 to 18:00 Research Poster Showcase and Reception
TUESDAY, MAY 15
10:30 to 11:50 Opening Keynote with Jen-Hsun Huang, NVIDIA CEO and Co-Founder (Keynote Hall, Hall 1)
12:00 to 14:00 Exhibits Open / Networking Lunch (Exhibit Hall)
14:00 to 18:00 GPU-accelerated Science on Titan: Tapping into the World’s Preeminent GPU Supercomputer to Achieve Better Science, Jack Wells, Director of Science, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory (Room A2)
16:00 to 16:50 CUDA 5 and Beyond, Mark Harris, Chief Technologist, GPU Computing, NVIDIA (Hall 1)
18:00 to 20:00 Exhibits Open / Networking Reception (Exhibit Hall)
WEDNESDAY, MAY 16
09:00 to 09:30 Emerging Companies Summit Opening Address with Jeff Herbst, VP Business Development, NVIDIA (Marriott Hotel, Ballroom 4)
09:00 to 10:20 Exascaling Your Apps, moderated by Mike Bernhardt, Publisher, The Exascale Report (Room C)
11:00 to 11:50 Day 2 Keynote with Dr. Iain Couzin, Professor, Princeton University (Keynote Hall, Hall 1)
12:00 to 14:00 Exhibits Open / Networking Lunch (Exhibit Hall)
14:00 to 14:50 Emerging Companies Summit Fireside Chat with Jen-Hsun Huang, NVIDIA CEO and Co-Founder (Marriott Hotel, Ballroom 4)
14:00 to 15:20 Inside Kepler, Stephen Jones, CUDA Developer, NVIDIA, and Lars Nyland, Senior Architect, NVIDIA (Hall 1)
14:00 to 17:55 Los Alamos National Laboratory Accelerated High Performance Computing Symposium (Room J1)
18:00 to 20:00 Exhibits Open / Networking Reception (Exhibit Hall)
20:00 to 23:00 GTC Party (Civic Auditorium)
During a week of rigorous learning, it’s important to cut loose and celebrate with fellow members of the GPU community. Come party, enjoy the comedic and juggling talents of The Passing Zone and try your luck in the casino. And don’t forget to raise a glass to your success!
THURSDAY, MAY 17
09:00 to 15:50 Los Alamos National Laboratory Accelerated High Performance Computing Symposium (Room J1)
11:00 to 11:50 Day 3 Keynote with Robert Boehme, CEO & Team Lead, Part-Time Scientists, and Wes Faler, Head of Software Development, Part-Time Scientists (Keynote Hall, Hall 1)
12:00 to 14:00 Exhibits Open / Networking Lunch (Exhibit Hall)
OPEN GENOMICS ENGINE
Accelerating the DNA-analysis pipeline for cancer research
Visit the Open Genomics Engine booth (#118) in the GTC exhibit hall to learn more.
An NVIDIA Foundation Initiative
Welcome to NVIDIA’s Emerging Companies Summit (ECS) 2012!
We are thrilled to once again showcase promising startups that are using the massive computing power of GPU technology to transform existing industries and create new ones.
From gesture-recognition technology and interactive video to virtualization and cloud computing, the dozens of companies from around the world participating in ECS 2012 are at the cutting edge of technology. GPUs have recently stormed the handheld computing market, so you’ll also find a large number of mobile companies participating in this year’s summit.
ECS itself has become something of a growth industry. This is our fourth event in Silicon Valley, and we have recently held successful summits in Israel and China, with more planned in the near future. The conference has proven to be a great venue for startups, analysts, executives and industry experts to exchange information and understand where technology is heading.
As a key part of the GPU Technology Conference, ECS 2012 will host hundreds of participants, including panelists, presenters, analysts, industry executives and others in our growing audience. Awaiting them is our best program yet.
This year sees the return of our hugely popular “CEO on Stage” format, where a select group of CEOs present their companies to a distinguished panel of experienced investors, analysts and technology leaders, who in turn respond with insightful feedback. NVIDIA CEO and co-founder Jen-Hsun Huang will also sit down for another thoughtful and entertaining fireside chat, this year with Tim Bajarin, president of Creative Strategies Inc., a leading Silicon Valley industry analysis and market intelligence firm.
New this year are special events like Startup University, where presenting and exhibiting companies will hold workshops on topics such as “Protecting Your IP Assets in a Global Marketplace” and “Best Practices for Building Valuable Relationships with Technology Industry Analysts.” In addition, the exhibit halls will be filled with the innovative work of companies in a diverse array of fields. And this year a jury will select the most promising companies for the “One to Watch” awards, announced Wednesday evening in the Hilton ballroom.
The GPU computing ecosystem is growing rapidly, and you, as an ECS attendee, are a key part of its success. I encourage you to participate in as many sessions as possible, and I thank you for joining us at what promises to be another superb event.
In closing, I’d like to express my gratitude to our sponsors, who are helping to make this event possible, including Cooley LLP, Morgan Stanley, Silicon Valley Bank, Deloitte, mergermarket, and Dow Jones Private Equity & Venture Capital.
Jeff Herbst
Vice President of Business Development, NVIDIA
AGENDA
WEDNESDAY, MAY 16, 2012
MARRIOTT SAN JOSE BALLROOM 4
09:00 to 09:50 S2000 Emerging Companies Summit Opening with Jeff Herbst (VP of Business Development, NVIDIA), followed by CEO on Stage featuring:
• Rocketick (Tomer Ben-David, VP of R&D)
• Cortexica (Iain McCready, CEO)
Panelists:
• Jon Peddie, President, Jon Peddie Research
• Neil Sequeira, Managing Director, General Catalyst Partners
• Savitha Srinivasan, Partner, IBM Venture Capital Group
• Jeff Herbst, VP of Business Development, NVIDIA
10:00 to 10:50 S2001 Emerging Companies Summit: CEO on Stage featuring:
• Unity Technologies (David Helgason, CEO)
• MirriAd (Mark Popkiewicz, CEO)
• BioDigital (Aaron Oliker, Partner/Director of 3D Technology, and Frank Sculli, Co-Founder/Informatics Director)
Panelists:
• Jon Peddie, President, Jon Peddie Research
• Neil Sequeira, Managing Director, General Catalyst Partners
• Savitha Srinivasan, Partner, IBM Venture Capital Group
• Jeff Herbst, VP of Business Development, NVIDIA
11:00 to 11:50 S2002 Emerging Companies Summit: CEO on Stage featuring:
• eyeSight Mobile (Gideon Shmuel, CEO)
• Numira Biosciences (David Weinstein, CTO)
• Ubitus (Wesley Kuo, CEO)
Panelists:
• Jon Peddie, President, Jon Peddie Research
• Neil Sequeira, Managing Director, General Catalyst Partners
• Savitha Srinivasan, Partner, IBM Venture Capital Group
• Jeff Herbst, VP of Business Development, NVIDIA
12:00 to 13:50 Networking Lunch and Exhibits (Hall 2 – San Jose Convention Center)
14:00 to 14:50 S2003 Emerging Companies Summit Fireside Chat with Jen-Hsun Huang (CEO, President and Co-Founder, NVIDIA) and Tim Bajarin (President of Creative Strategies)
15:00 to 15:50 S2004 Emerging Companies Summit: CEO on Stage featuring:
• GAIKAI (David Perry, CEO and Co-Founder)
• Immersive Media (Myles M. McGovern, CEO)
• Numecent (Osman Kent, Co-Founder & CEO)
Panelists:
• Tom Furlong, Managing Director, Granite Ventures
• Rob Enderle, Principal Analyst, Enderle Group
• Flip Gianos, General Partner, Interwest Partners
• Jeff Herbst, VP of Business Development, NVIDIA
16:00 to 16:50 S2005 Emerging Companies Summit: CEO on Stage featuring:
• RealView Imaging (Shaul Gelman, Co-Founder and VP of R&D)
• Elemental Technologies (Sam Blackman, CEO and Co-Founder)
• Mersive (Robert Balgley, CEO)
Panelists:
• Tom Furlong, Managing Director, Granite Ventures
• Rob Enderle, Principal Analyst, Enderle Group
• Flip Gianos, General Partner, Interwest Partners
• Jeff Herbst, VP of Business Development, NVIDIA
17:00 to 17:50 S2006 Emerging Companies Summit: CEO on Stage featuring:
• Raytrix (Christian Perwass, CEO)
• Playcast (Guy De Beer, CEO)
• Universal Robotics (David Peters, CEO)
Panelists:
• Tom Furlong, Managing Director, Granite Ventures
• Rob Enderle, Principal Analyst, Enderle Group
• Flip Gianos, General Partner, Interwest Partners
• Jeff Herbst, VP of Business Development, NVIDIA
18:00 to 19:50 Networking Reception (Hall 2 – San Jose Convention Center)
Cooley is a proud Platinum Sponsor of the 2012 NVIDIA GTC Emerging Companies Summit.
Cooley attorneys have served as counselors, strategists and advocates to technology entrepreneurs and investment funds since 1959.
Cooley: a global law firm for the converging worlds of high technology, high finance and high-stakes litigation.
For more information, visit us at www.cooley.com.
Experienced Guides
PALO ALTO | NEW YORK | SAN DIEGO | SAN FRANCISCO | RESTON, VA | BROOMFIELD, CO | WASHINGTON, DC | BOSTON | SEATTLE | SHANGHAI
© 2012 Cooley LLP, 101 California Street, 5th Floor, San Francisco, CA 94111. 415/693-2000.
CEO ON STAGE LISTING
BIODIGITAL
BioDigital is the leading developer of state-of-the-art biomedical visualization. BioDigital recently launched The BioDigital Human, a 3D visualization platform that takes a revolutionary approach to communicating health and medical information, with interactive tools for exploring human anatomy, physiology and conditions.
www.biodigital.com
Speakers: Aaron Oliker, Partner/Director of 3D Technology, and Frank Sculli, Co-Founder/Informatics Director
Session Time: Wednesday, May 16 at 10:45
CORTEXICA VISION SYSTEMS
Cortexica Vision Systems is the award-winning creator of a bio-inspired vision system that enables intelligent image recognition using principles derived from the human visual cortex. Cortexica provides a patented platform for radically new Visual Search products that deliver exciting new experiences and value for consumers and businesses.
www.cortexica.com
Speaker: Iain McCready, CEO
Session Time: Wednesday, May 16 at 09:35
ELEMENTAL TECHNOLOGIES
Elemental Technologies is a leading supplier of video solutions for multiscreen content delivery. Founded in 2006 and headquartered in Portland, Oregon, the company pioneered the use of graphics processors to power adaptive video streaming over IP networks. Top media and entertainment companies around the world rely on solutions from Elemental to drive next-generation video services.
www.elementaltechnologies.com
Speaker: Sam Blackman, CEO and Co-Founder
Session Time: Wednesday, May 16 at 16:30
EYESIGHT MOBILE TECHNOLOGIES
eyeSight Mobile Technologies Ltd. presents innovative gesture recognition technology that powers Touch Free UI solutions, creating an enhanced user experience when interacting with a variety of digital devices. The technology is entirely software-based, requires only a standard 2D camera and runs on the full range of operating systems.
www.eyesight-tech.com
Speaker: Gideon Shmuel, CEO
Session Time: Wednesday, May 16 at 11:00
GAIKAI
GAIKAI offers a fully managed cloud platform that is optimized to deliver high-end video games and applications within seconds to all leading web browsers, operating systems and devices, and even within Facebook.
www.gaikai.com
Speaker: David Perry, CEO and Co-Founder
Session Time: Wednesday, May 16 at 15:00
IMMERSIVE MEDIA COMPANY
Immersive Media is the pioneer and world’s leading provider of 360°, full-motion, interactive video. Our immersive 360° video content is delivered via the internet to PCs, iPads and mobile devices. Immersive Media provides the enabling technologies to record, process, live stream and deliver interactive video from our own or other wide-field cameras, backed by a patent portfolio covering key discoveries and capabilities of interactive and immersive video.
www.immersivemedia.com
Speaker: Myles M. McGovern, President/CEO
Session Time: Wednesday, May 16 at 15:30
MERSIVE
Since its founding in 2006, Mersive has revolutionized high-performance display setup and maintenance, enabling a new class of displays. Mersive’s Sol software automatically aligns multiple commodity projectors into one seamless image of extraordinary quality and resolution without the expense of specialized hardware and services.
www.mersive.com
Speaker: Robert Balgley, CEO
Session Time: Wednesday, May 16 at 16:45
MIRRIAD
MirriAd is an end-to-end marketing solution that can be implemented quickly, easily and cost-effectively using our online campaign management system. We provide a new and innovative way for advertisers to reach their target audiences, and for content owners to generate additional revenue. We have an ever-expanding library of content, from films and TV series to corporate training videos and user-generated material, and we’re always on the lookout for new and exciting content owners to work with.
www.mirriad.com
Speaker: Mark Popkiewicz, CEO
Session Time: Wednesday, May 16 at 10:30
NUMECENT
Numecent is a start-up that came out of stealth with a bang in March 2012 and is the inventor of ‘cloudpaging’. This patented technology enables friction-free digital delivery of native software and other non-linear assets through virtualization. One of the benefits of cloudpaging is that it can reduce the network footprint of digital downloads by 20x to 100x and execute them natively, at full speed, without actually requiring installation. Once cloudpaged, applications can even run offline, always under license control.
www.numecent.com
Speaker: Osman Kent, Co-Founder and CEO
Session Time: Wednesday, May 16 at 15:45
NUMIRA BIOSCIENCES
Numira Biosciences is a leading provider of specialty contract research services for preclinical drug and device development. Numira’s customers include the top biopharmaceutical companies and academic research institutions. Through its next-generation study portal, Numira provides its customers with interactive tools for accessing, exploring, and communicating about their preclinical study data.
www.numirabio.com
Speaker: David Weinstein, CTO
Session Time: Wednesday, May 16 at 11:30
PLAYCAST MEDIA SYSTEM
Playcast Media System brings video games to the world’s largest media distribution platform: pay TV networks. The company’s solution delivers off-the-shelf, next-generation video games to existing cable, IPTV and hybrid satellite platforms. We bring cloud gaming to the world’s hundreds of millions of pay TV subscribers.
www.playcast-media.com
Speaker: Guy De Beer, CEO
Session Time: Wednesday, May 16 at 17:15
RAYTRIX
Raytrix develops and markets single-lens 3D video cameras based on its patented high-resolution light field technology, offering solutions for Particle Image Velocimetry (PIV), optical inspection, face capturing and microscopy, as well as IP for consumer products (mobile phones).
www.raytrix.de
Speaker: Christian Perwass, CEO
Session Time: Wednesday, May 16 at 17:00
REALVIEW IMAGING LTD.
RealView Imaging Ltd. is developing a revolutionary 3D holographic display and interface system, initially for medical imaging applications. RealView’s proprietary technology projects high-resolution, full-color, dynamic, real-time 3D holograms “floating in open air,” allowing direct and precise interaction with and within the “in air” image by literally touching it.
www.realview.co.il
Speaker: Shaul Gelman, Co-Founder and VP of R&D
Session Time: Wednesday, May 16 at 16:00
ROCKETICK
Rocketick is a leading provider of software simulation acceleration, enabling speedups of 10x or more for Verilog simulations. The company’s flagship product, RocketSim, helps semiconductor companies reduce the overall time to market of new chip designs by up to 30%, allowing development teams to tape out with greater confidence.
www.rocketick.com
Speaker: Tomer Ben-David, Co-Founder and VP of R&D
Session Time: Wednesday, May 16 at 09:20
UBITUS
Ubitus Inc., the technology leader in deploying cloud-enabled rich media services, offers innovative cloud computing solutions for device manufacturers, wired/wireless communication service providers, telecommunication operators and digital content developers. Founded in 2007 and headquartered in Taipei, Taiwan, the company now has 150 employees and 4 offices in Tokyo, Beijing, Guangzhou and Seoul.
www.ubitus.com
Speaker: Wesley Kuo, CEO
Session Time: Wednesday, May 16 at 11:45
UNITY TECHNOLOGIES
Unity Technologies is revolutionizing the game industry with Unity, its award-winning breakthrough development platform. Unity Technologies has more than 450,000 registered users worldwide, including Bigpoint, Cartoon Network, Coca-Cola, Disney, Electronic Arts, LEGO, Microsoft, NASA, Nickelodeon, Ubisoft, Warner Bros., large and small studios, indies, students and hobbyists, all using Unity to create games and interactive 3D content for the web, mobile, consoles and beyond. Unity Technologies is aggressively innovating to expand usability, power and platform reach, along with its Asset Store digital content marketplace and Union distribution service.
www.unity3d.com
Speaker: David Helgason, CEO
Session Time: Wednesday, May 16 at 10:00
UNIVERSAL ROBOTICS
Universal Robotics is a software company that has brought to market a new form of artificial intelligence that uses sensor information to learn. Called Neocortex, it discovers patterns in chaotic environments that are relevant to an assigned task. It then analyzes those patterns to understand complexity and improve processes. The company has targeted the materials handling industry as its first market, increasing the flexibility of automated machines. Among various accolades, Universal Robotics won an “Emerging Company to Watch” award from NVIDIA in 2010.
www.universalrobotics.com
Speaker: David Peters, CEO
Session Time: Wednesday, May 16 at 17:45
WEDNESDAY, MAY 16 & THURSDAY, MAY 17, <strong>2012</strong><br />
ROOM J<br />
Los Alamos National Laboratory, a leading U.S. national security research<br />
institution, co-locates the Accelerated HPC Symposium at GTC 2012, bringing<br />
together world leaders in supercomputing to share knowledge and help solve<br />
the world’s most crucial technology challenges.<br />
Symposium highlights include:<br />
• Learning how accelerator technologies can be leveraged in innovative ways to<br />
advance the state of the art for simulations on large-scale systems<br />
• Establishing hardware and software requirements that can meet the power,<br />
scalability and fault-tolerance demands of the next generation of HPC<br />
• Understanding how legacy codes can be adapted to make use of modern<br />
computing architectures<br />
• Providing a forum for feedback to the vendor community to aid in the adoption<br />
of accelerator technologies
AGENDA<br />
WEDNESDAY, MAY 16<br />
Plenary Session I 14:00–14:45 Opening Keynote with Bill Barth of TACC<br />
14:50–15:15 A New <strong>GPU</strong> Appliance Sorin Faibish (EMC)<br />
15:20–15:45 Accelerator Architectures for HPC Justin Tripp (LANL)<br />
Plenary Session II 16:00–16:25 Adaptive Heterogeneous Computing with OpenCL Simon McIntosh-Smith<br />
(University of Bristol)<br />
16:30–16:55 Accelerating Iterative Linear Solvers Hui Liu (University of Calgary)<br />
17:00–17:25 Efficient AMG on Hybrid <strong>GPU</strong> Clusters Thomas Brandes (SCAI)<br />
17:30–17:55 PISTON: Visualization Portability and Performance Christopher Sewell (LANL)<br />
THURSDAY, MAY 17<br />
Scalability: Hardware and Software<br />
9:00–9:10 Introduction: Justin Tripp (Chair)<br />
9:10–9:20 The FPGA: Another Piece of the Puzzle Justin Tripp (LANL)<br />
9:20–9:30 Increasing Efficiency with Kepler Stephen Jones (NVIDIA)<br />
9:30–9:50 Discussion<br />
9:50–10:00 Break<br />
10:00–10:10 Can You Keep All of the Astronomers Happy All of the Time? Christopher Fluke (Swinburne University of Technology)<br />
10:10–10:20 In Situ Image Analysis for Large-Scale Visualization Christopher Sewell (LANL)<br />
10:20–10:40 <strong>GPU</strong> Acceleration of MapReduce Miao Xin (Junnan University)<br />
10:40–10:50 Discussion<br />
Applications – Methods and Programming Models, Part 1<br />
9:00–9:10 Introduction: Guillaume Colin de Verdiere (Chair)<br />
9:10–9:20 Preconditioning for Large-Scale Linear Solvers Dimitar Lukarski (Karlsruhe Institute of Technology)<br />
9:20–9:30 Changing Data Structures for a Changing World Hui Liu (University of Calgary)<br />
9:30–9:40 Leveraging Roadrunner Experiences Jamaludin Mohd-Yusof (LANL)<br />
9:40–9:50 Discussion<br />
9:50–10:00 Break<br />
10:00–10:30 Taming Laser Plasma Interactions: PIConGPU Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf)<br />
10:30–10:50 Discussion<br />
Applications – Methods and Programming Models, Part 2<br />
14:00–14:10 The Portability Wall: How Hard Can It Really Be? John Stone (University of Illinois at Urbana-Champaign)<br />
14:10–14:20 Accelerating NAMD James Phillips (University of Illinois)<br />
14:20–14:30 Refitting Legacy Software for the New Reality John Humphrey (EM Photonics)<br />
14:30–14:40 Unstructured Data Structures: An Achilles Heel? Raphael Poncet (CEA)<br />
14:40–14:50 Discussion<br />
14:50–15:00 Break<br />
15:00–15:10 Power: The New Metric Simon McIntosh-Smith (University of Bristol)<br />
15:10–15:20 It’s About Concurrency, Stupid! Stanley Tzeng (UC Davis)<br />
15:20–15:40 Discussion<br />
*Please note: Session details can be found on the daily session pages that follow.<br />
SPONSORED BY:<br />
SYNNEX<br />
<strong>GTC</strong> NETWORK<br />
Please visit these Tesla Preferred Partners exhibits and be<br />
entered into a daily drawing to win a free NVIDIA Tesla C2075!<br />
ACE Computers AMAX Appro Aspen Systems<br />
Colfax International Creative Consultants Exxact Technologies Microway<br />
Penguin Computing, Inc Seneca Data Themis
SESSION INFORMATION –<br />
PRE-CONFERENCE TUTORIALS –<br />
MONDAY, MAY 14<br />
MONDAY, MAY 14, 09:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A5<br />
S0005 Languages, APIs and Development Tools for<br />
<strong>GPU</strong> Computing<br />
Get a head start on the conference with this first-day introduction<br />
to key technologies for GPU Computing. This 80-minute tutorial<br />
session will cover the key features and differences between the<br />
major programming languages, APIs and development tools<br />
available today. Attendees will also learn several high level design<br />
patterns for consumer, professional and HPC applications, with<br />
practical programming considerations for each.<br />
Speaker(s): Will Ramey (Sr. Product Manager, <strong>GPU</strong><br />
Computing, NVIDIA)<br />
Topic(s): General Interest, Development Tools & Libraries, Application<br />
Design & Porting Techniques (Beginner)<br />
MONDAY, MAY 14, 09:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A3<br />
S0023 NVIDIA OpenGL for <strong>2012</strong><br />
Attend this session to get the most out of OpenGL on NVIDIA<br />
Quadro and GeForce <strong>GPU</strong>s. Topics covered include the latest<br />
advances available for Cg 3.1 and the OpenGL Shading Language<br />
(GLSL); programmable tessellation; improved support for<br />
Direct3D conventions; integration with Direct3D and CUDA<br />
resources; bindless graphics; and more. When you use the<br />
latest OpenGL innovations from NVIDIA in your graphics<br />
applications, you benefit from NVIDIA’s leadership driving OpenGL<br />
as a cross-platform, open industry standard.<br />
Speaker(s): Mark Kilgard (Principal Software Engineer, NVIDIA)<br />
Topic(s): Computer Graphics, Development Tools & Libraries,<br />
Visualization, Audio, Image and Video Processing (Intermediate)<br />
MONDAY, MAY 14, 09:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM C<br />
S0614 Part 1: Introduction to <strong>GPU</strong> <strong>Program</strong>ming<br />
(Presented by Acceleware)<br />
Join us for an informative introduction to <strong>GPU</strong> <strong>Program</strong>ming. The<br />
session will begin with a brief overview of CUDA and data parallelism<br />
before focusing on the <strong>GPU</strong> programming model. We<br />
will explore the fundamentals of <strong>GPU</strong> kernels, host and device<br />
responsibilities, CUDA syntax and thread hierarchy. A<br />
programming demonstration of a simple CUDA kernel will<br />
be provided.<br />
Introduction to GPU Programming<br />
GPU kernels<br />
Host vs. device responsibilities<br />
CUDA syntax<br />
Thread hierarchy<br />
Speaker(s): Chris Mason (Product Manager, Acceleware)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />
Development Tools & Libraries (Beginner)<br />
MONDAY, MAY 14, 10:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A2<br />
S0341 See the Big Picture: Scalable Visualization<br />
Solutions for System Integrators<br />
NVIDIA Quadro Scalable Visualization Solutions provide many<br />
features for system integrators who are building large-scale<br />
displays. Join this tutorial session to learn how to configure<br />
multi-projector systems and stereoscopic and immersive displays.<br />
Speaker(s): Doug Traill (Senior Solutions Architect, NVIDIA)<br />
Topic(s): Visualization (Beginner)<br />
MONDAY, MAY 14, 10:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM B<br />
S0517A <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 1 of 3)<br />
OpenACC is a programming standard for parallel computing on<br />
accelerators (including <strong>GPU</strong>s) using directives. It is designed to<br />
harness the transformative power of heterogeneous computing<br />
systems easily and quickly. In this tutorial you will learn how to<br />
add simple compiler hints to your code to expose parallelism to<br />
the compiler, allowing it to map computation onto an accelerator.<br />
OpenACC directives allow developers to make simple and<br />
portable code changes, enabling an easier migration to<br />
accelerated computing.<br />
This is part 1 of a 3-part tutorial that will take you from an<br />
overview through how to optimize your code. The tutorial starts<br />
with an overview of OpenACC programming in which you will learn<br />
about applying basic OpenACC directives to your code, with<br />
examples. You will also learn more about how <strong>GPU</strong>s execute<br />
parallel programs, and apply this understanding to optimizing<br />
more advanced OpenACC examples to gain larger speedups and<br />
accelerate applications with various types of parallelism.<br />
Lastly, you will see how to use NVIDIA profiling tools to target<br />
your optimizations.<br />
Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA),<br />
Duncan Poole (Senior Manager, HPC, NVIDIA), Cliff Woolley (CUDA<br />
Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
MONDAY, MAY 14, 10:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A3<br />
S0603 <strong>GPU</strong> Ray Tracing<br />
Learn the latest approaches to using GPUs for the fastest<br />
possible ray tracing results from experts who develop and<br />
use the NVIDIA OptiX ray tracing engine, the team behind<br />
NVIDIA iray, and developers of custom renderers. Multiple<br />
rendering techniques, <strong>GPU</strong> programming languages, out-of-core<br />
rendering, and optimal hardware configurations will be covered in<br />
this cutting-edge discussion.<br />
Speaker(s): Phillip Miller (Director, Workstation Software Product<br />
Management, NVIDIA)<br />
Topic(s): Ray Tracing (Beginner)<br />
MONDAY, MAY 14, 10:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM C<br />
S0615 Part 2: Introduction to the <strong>GPU</strong> Architecture and<br />
Memory Model (Presented by Acceleware)<br />
Explore the memory model of the <strong>GPU</strong>. The first part of the<br />
session covers task parallelism and thread cooperation in <strong>GPU</strong><br />
computing. The second part focuses on the different memory<br />
types available on the <strong>GPU</strong>. We will define shared, constant and<br />
global memory and discuss the best locations to store your<br />
application data for optimized performance. A programming<br />
demonstration of shared memory will be delivered.<br />
Introduction to the GPU Architecture and Memory Model<br />
Shared memory<br />
Constant memory<br />
Global memory<br />
Speaker(s): Chris Mason (Product Manager, Acceleware)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />
Development Tools & Libraries (Beginner)<br />
MONDAY, MAY 14, 10:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A5<br />
S0624 Introduction to CUDA C<br />
Starting with a background in C or C++, learn everything you need<br />
to know in order to start programming in CUDA C. Beginning with<br />
a “Hello, World” CUDA C program, explore parallel programming<br />
with CUDA through a number of hands-on code examples.<br />
Examine more deeply the various APIs available to CUDA<br />
applications and learn the best (and worst) ways in which to<br />
employ them in applications.<br />
Speaker(s): Justin Luitjens (Devtech Engineer, NVIDIA)<br />
Topic(s): <strong>Program</strong>ming Languages & Techniques (Beginner)<br />
MONDAY, MAY 14, 13:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM B<br />
S0517B <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 2 of 3)<br />
OpenACC is a programming standard for parallel computing on<br />
accelerators (including <strong>GPU</strong>s) using directives. It is designed to<br />
harness the transformative power of heterogeneous computing<br />
systems easily and quickly. In this tutorial you will learn how to<br />
add simple compiler hints to your code to expose parallelism to<br />
the compiler, allowing it to map computation onto an accelerator.<br />
OpenACC directives allow developers to make simple and<br />
portable code changes, enabling an easier migration to<br />
accelerated computing.<br />
This is part 2 of a 3-part tutorial that will take you from an<br />
overview through how to optimize your code. The tutorial starts<br />
with an overview of OpenACC programming in which you will learn<br />
about applying basic OpenACC directives to your code, with<br />
examples. You will also learn more about how <strong>GPU</strong>s execute<br />
parallel programs, and apply this understanding to optimizing<br />
more advanced OpenACC examples to gain larger speedups and<br />
accelerate applications with various types of parallelism.<br />
Lastly, you will see how to use NVIDIA profiling tools to target<br />
your optimizations.<br />
Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA),<br />
Duncan Poole (Senior Manager, HPC, NVIDIA), Cliff Woolley (CUDA<br />
Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
MONDAY, MAY 14, 13:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A2<br />
S0530 Multi-Display Roundtable<br />
Join NVIDIA product managers and application engineers<br />
specializing in multi-display systems for an interactive discussion of<br />
current trends in video walls and blended multi-projector systems<br />
and their deployment.<br />
Speaker(s): Andrew Page (Senior Product Manager, NVIDIA), Shalini<br />
Venkataraman (Senior Applied Engineer, NVIDIA), Ian Williams (NVIDIA)<br />
Topic(s): Visualization (Beginner)<br />
MONDAY, MAY 14, 13:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A3<br />
S0604 NVIDIA Advanced Rendering Solutions<br />
The full range of advanced rendering solutions and frameworks<br />
from NVIDIA will be explored in this insightful product and<br />
technology discussion and demonstration. Come learn about the<br />
latest possibilities involving advanced rendering techniques and<br />
how they integrate within commercial products – from production<br />
ray tracing to volumetric and distributed rendering.<br />
Speaker(s): Phillip Miller (Director, Workstation Software Product<br />
Management, NVIDIA)<br />
Topic(s): Ray Tracing (Advanced)<br />
MONDAY, MAY 14, 13:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM C<br />
S0616 Part 3: Debugging <strong>GPU</strong> <strong>Program</strong>s (Presented<br />
by Acceleware)<br />
Get the lowdown on debugging your GPU program. This session<br />
includes a discussion of debugging techniques and tools to help<br />
you identify issues in your kernels. The latest debugging tools<br />
provided with CUDA 4.1, including Parallel Nsight, cuda-gdb and<br />
cuda-memcheck, will be discussed. A programming<br />
demonstration of Parallel Nsight will be provided.<br />
Speaker(s): Chris Mason (Product Manager, Acceleware)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />
Development Tools & Libraries (Beginner)<br />
MONDAY, MAY 14, 13:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A5<br />
S0629 CUDA Accelerated Compute Libraries<br />
The libraries distributed in the CUDA SDK and offered by third<br />
parties provide a wealth of functions commonly encountered in a<br />
<strong>GPU</strong> acceleration project. Using these libraries can often<br />
significantly shorten the development time of a <strong>GPU</strong> project while<br />
leading to high-performance, high-quality software. In this<br />
tutorial, we will provide an overview of the libraries in the CUDA<br />
SDK, including cuBLAS, cuRAND, NPP and Thrust, and introduce<br />
common use cases. The audience will not only learn about the<br />
strengths of the individual libraries, but also learn about the<br />
decision making process to select the best suited library for<br />
their project.<br />
Speaker(s): Peter Messner (NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
MONDAY, MAY 14, 13:00 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A8<br />
S0630 Part 1 of 2: Programming Heterogeneous Many-cores<br />
Using Directives (Presented by CAPS)<br />
Directive-based programming is a very promising technology for<br />
dealing with many-core architectures. In this context, HPC users can rely on<br />
emerging standards such as OpenACC and OpenHMPP. CAPS will<br />
introduce the OpenACC and HMPP directive-based programming<br />
models with companion tools (e.g. for tracing, tuning and debugging):<br />
HMPP Wizard, CULA, ArrayFire, Vampir, Paraver, DDT,<br />
CodeletFinder, etc. The speakers will provide insights on how GPUs<br />
and CPUs can be exploited in a unified manner and how code tuning<br />
issues can be minimized. The discussion will also cover the use of<br />
libraries, which is essential when addressing many-core<br />
programming. Pathscale will present its product supporting the<br />
OpenHMPP programming model.<br />
Speaker(s): Francois Bodin (CAPS), Christopher Bergström (Pathscale)<br />
Topic Area(s): Parallel <strong>Program</strong>ming Languages & Compilers;<br />
Development Tools & Libraries (Beginner)<br />
MONDAY, MAY 14, 14:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A5<br />
S0027A All-In-One Debugging Experience with CUDA-<br />
GDB and CUDA-MEMCHECK<br />
The CUDA debugging tools CUDA-GDB and CUDA-MEMCHECK provide<br />
a new set of features to help improve your CUDA application<br />
development cycle. This session is a detailed walk-through of the<br />
key new features and advanced techniques for using CUDA-GDB<br />
and CUDA-MEMCHECK together to improve overall<br />
productivity. This tutorial will also include live demos.<br />
This session will repeat on Wednesday at 14:00.<br />
Speaker(s): Geoff Gerfin (Technical Manager and Senior Engineer,<br />
NVIDIA), Vyas Venkataraman (Software Engineer, NVIDIA)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
MONDAY, MAY 14, 14:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM B<br />
S0517C <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 3 of 3)<br />
OpenACC is a programming standard for parallel computing on<br />
accelerators (including <strong>GPU</strong>s) using directives. It is designed to<br />
harness the transformative power of heterogeneous computing<br />
systems easily and quickly. In this tutorial you will learn how to<br />
add simple compiler hints to your code to expose parallelism to the<br />
compiler, allowing it to map computation onto an accelerator.<br />
OpenACC directives allow developers to make simple and<br />
portable code changes, enabling an easier migration to<br />
accelerated computing.<br />
This is part 3 of a 3-part tutorial that will take you from an overview<br />
through how to optimize your code. The tutorial starts with an<br />
overview of OpenACC programming in which you will learn about<br />
applying basic OpenACC directives to your code, with examples. You<br />
will also learn more about how <strong>GPU</strong>s execute parallel programs,<br />
and apply this understanding to optimizing more advanced<br />
OpenACC examples to gain larger speedups and accelerate<br />
applications with various types of parallelism. Lastly, you will see<br />
how to use NVIDIA profiling tools to target your optimizations.<br />
Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA),<br />
Duncan Poole (Senior Manager, HPC, NVIDIA), Cliff Woolley (CUDA<br />
Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
MONDAY, MAY 14, 14:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A3<br />
S0522 Introduction to CUDA Fortran<br />
This tutorial will cover various aspects of writing code in CUDA<br />
Fortran, which is the Fortran interface to the CUDA architecture.<br />
Topics covered will include a basic introduction to parallel<br />
programming concepts using CUDA, performance measurements<br />
and metrics, optimization, and multi-<strong>GPU</strong> programming via CUDA<br />
4.0’s peer-to-peer capability and MPI. Several case studies will be<br />
presented as well.<br />
Speaker(s): Massimiliano Fatica (Manager, NVIDIA), Gregory Ruetsch<br />
(Applied Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
MONDAY, MAY 14, 14:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A2<br />
S0601 <strong>GPU</strong>-Based Video Processing Round Table<br />
Have questions, concerns or thoughts about the direction of<br />
<strong>GPU</strong>-based video and image processing? Join NVIDIA engineers<br />
and product managers for a lively discussion of topics such as<br />
application design, multi-GPU architecture, data movement,<br />
threading, APIs, and color management as they apply to video and<br />
image processing applications.<br />
Speaker(s): Alina Alt (Applied Engineer, NVIDIA), Andrew Page (Senior<br />
Product Manager, NVIDIA), Thomas True (Senior Applied Engineer,<br />
NVIDIA), Ian Williams (Director of Applied Engineering, NVIDIA), Eric<br />
Young (Manager of Applied Research, NVIDIA)<br />
Topic(s): Audio, Image and Video Processing (Beginner)<br />
MONDAY, MAY 14, 14:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM C<br />
S0617 Part 4: Introduction to Optimizations and Profiling<br />
(Presented by Acceleware)<br />
Learn how to optimize and profile your algorithms for the <strong>GPU</strong>.<br />
This session will cover the essentials of code optimization and will<br />
include: arithmetic optimizations, warps, branching efficiency,<br />
memory latency/occupancy and memory performance<br />
optimizations. Real-life commercial examples will be discussed to<br />
highlight the critical aspects of <strong>GPU</strong> optimization techniques. A<br />
programming demonstration using the NVIDIA Visual Profiler will<br />
be included.<br />
Speaker(s): Chris Mason (Product Manager, Acceleware)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />
Development Tools & Libraries (Beginner)<br />
MONDAY, MAY 14, 14:30 (80 MINUTES)<br />
PRE-CONFERENCE TUTORIAL - ROOM A8<br />
S0631 Part 2: <strong>Program</strong>ming Heterogeneous Many-cores<br />
Using Directives (Presented by CAPS)<br />
Directive-based programming is a very promising technology for<br />
dealing with many-core architectures. In this context, HPC users can rely on<br />
emerging standards such as OpenACC and OpenHMPP. CAPS will<br />
introduce the OpenACC and HMPP directive-based programming<br />
models with companion tools (e.g. for tracing, tuning and debugging):<br />
HMPP Wizard, CULA, ArrayFire, Vampir, Paraver, DDT,<br />
CodeletFinder, etc. The speakers will provide insights on how GPUs<br />
and CPUs can be exploited in a unified manner and how code tuning<br />
issues can be minimized. The discussion will also cover the use of<br />
libraries, which is essential when addressing many-core<br />
programming. Pathscale will present its product supporting the<br />
OpenHMPP programming model.<br />
Speaker(s): Francois Bodin (CAPS), Christopher Bergström (Pathscale)<br />
Topic Area(s): Parallel <strong>Program</strong>ming Languages & Compilers;<br />
Development Tools & Libraries (Beginner)<br />
SESSION INFORMATION<br />
TUESDAY, MAY 15<br />
TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />
ROOM J1<br />
S0102 Flame On: Real-Time Fire Simulation for<br />
Video Games<br />
Fire and explosions are common elements in video games and<br />
other virtual environments. We present a real-time fire simulator<br />
inspired by the paper “Directable, High-Resolution Simulation of<br />
Fire on the <strong>GPU</strong>” [Horvath and Geiger 2009], but this time<br />
implemented entirely in CUDA and targeted at adding interactive<br />
fire to video games. This talk will describe both the tricks necessary<br />
to implement an efficient fluid simulator in CUDA, and techniques<br />
for rendering the results to achieve realistic looking fire.<br />
Speaker(s): Simon Green (Senior Software Engineer, NVIDIA),<br />
Christopher Horvath (Global <strong>Technology</strong> Technical Director, Pixar)<br />
Topic(s): Computer Graphics, Computational Fluid Dynamics (Intermediate)<br />
TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0248 Excitements, Challenges, and Rewards In<br />
Optimizing GP<strong>GPU</strong> Kernels<br />
Learn about the excitements and challenges in optimizing CUDA<br />
kernels for the last two generations of NVIDIA GP<strong>GPU</strong>s.<br />
Autotuning, although crucially important, is not a silver bullet<br />
for porting code from one generation of GPU to another. The process<br />
requires many steps: (a) architecture-specific algorithms, (b)<br />
tuning algorithms, (c) finding innovative tricks to handle generic<br />
cases, (d) tweaking <strong>GPU</strong>’s internal scheduling to handle partition<br />
camping, and (e) above all, the dedication of many enthusiastic<br />
programmers. We will share our experiences and discoveries<br />
through the development of MAGMABLAS - a subset of CUDA<br />
BLAS, highly optimized for NVIDIA GP<strong>GPU</strong>s.<br />
Speaker(s): Rajib Nath (Student, University of California San Diego),<br />
Stanimire Tomov (Research Director, University of Tennessee, Knoxville)<br />
Topic(s): Algorithms & Numerical Techniques, Application Design &<br />
Porting Techniques, Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />
ROOM A8<br />
S0268 Virtual Process Engineering - Realtime<br />
Simulation of Multiphase Systems<br />
Real-time simulation and virtual reality with quantitatively correct<br />
physics for industrial processes involving multi-scale, multiphase<br />
systems was once a remote dream for process engineering, but is<br />
now becoming a reality with CPU-GPU hybrid supercomputing.<br />
Numerical and visualization methods for such simulations on<br />
thousands of <strong>GPU</strong>s will be reported with applications in chemical<br />
and energy industries.<br />
Speaker(s): Wei Ge (Professor, Institute of Process Engineering,<br />
Chinese Academy of Sciences)<br />
Topic(s): Computational Fluid Dynamics, Molecular Dynamics,<br />
Computational Physics, Algorithms & Numerical Techniques (Advanced)<br />
TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />
ROOM A7<br />
S0296 A <strong>GPU</strong>-Enabled SPH Method for Micro and<br />
Nanofluidic Simulations<br />
SPH methods allow multiphase flows within complex geometries<br />
to be investigated efficiently. Physical effects present in<br />
micro- and nanofluidic applications can also be described with little effort<br />
using the SPH methodology. In order to investigate microfluidic<br />
applications relevant to industry, large domains and high spatial<br />
resolutions are required. Therefore, an SPH method for accelerated<br />
computations on GPUs is currently being developed. The code features<br />
dynamic casting of computational data into blocks of appropriate<br />
size to fit the GPU memory layout. In addition, tree-like data structures<br />
for efficient manipulation of particle distributions help achieve<br />
significant performance gains on GPU hardware.<br />
Speaker(s): Daniel Gaudlitz (Research Associate, Technische<br />
Universität München)<br />
Topic(s): Computational Fluid Dynamics, Algorithms & Numerical<br />
Techniques (Intermediate)<br />
TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />
ROOM J3<br />
S0317 Compiling a Parallel Domain Specific Language<br />
to <strong>GPU</strong>s<br />
Discuss techniques for compiling Parallel DSLs to <strong>GPU</strong>s. Verilog<br />
is a Domain Specific Language for Hardware Description. Verilog<br />
users express parallelism with guarded processes similar to<br />
Occam’s guarded commands. Review Verilog semantics, and<br />
different approaches to compiling Verilog to parallel architectures<br />
and to GPUs. Discuss challenges with (a) the runtime behavior of<br />
Verilog descriptions and (b) managing process dependency. Discuss<br />
approaches and challenges in compiling a parallel DSL to CUDA C.<br />
Speaker(s): Ramesh Narayanaswamy (Principal Engineer, Synopsys Inc.)<br />
Topic(s): Electronic Design Automation, Application Design & Porting<br />
Techniques (Intermediate)<br />
TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />
ROOM K<br />
S0337 High-Throughput Epistasis Screening Using <strong>GPU</strong>s<br />
Epistasis is the interaction of two or more genes in coding for a<br />
biological property. Epistasis is believed to be an important factor<br />
in an individual’s susceptibility to disease, and the search for<br />
epistasis is a major component in the development of<br />
personalized approaches to genomic medicine. Statistical tests<br />
for epistasis are typically confounded by the multiple-testing<br />
problem, that is, the aggregated loss of precision incurred through<br />
repeated hypothesis testing. One way to circumvent this problem<br />
is to simulate a false-discovery rate via resampling. We report<br />
success in using <strong>GPU</strong>s to accelerate these highly compute-intensive<br />
resampling techniques.<br />
Speaker(s): Mark Seligman (Senior Scientist, Insilicos LLC)<br />
Topic(s): Bioinformatics, Life Sciences, Supercomputing,<br />
Cloud Computing (Intermediate)<br />
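The resampling idea above can be sketched as a permutation estimate of the false-discovery rate: shuffle the phenotype labels to break any real gene-gene association, count how many tests still pass the threshold, and compare that with the observed count. This is a minimal CPU sketch under stated assumptions, not the speaker's method; the function `permutation_fdr` and the caller-supplied statistic `stat_fn` are hypothetical names.

```python
import random


def permutation_fdr(stat_fn, labels, threshold, n_perm=200, seed=0):
    """Estimate the false-discovery rate of calling every test whose
    statistic is >= threshold, using label permutations as the null.

    stat_fn(labels) must return one statistic per test (e.g. per gene
    pair); it is a hypothetical, caller-supplied function.
    """
    rng = random.Random(seed)
    observed = stat_fn(labels)
    n_called = sum(s >= threshold for s in observed)
    if n_called == 0:
        return 0.0  # nothing called, so no false discoveries

    shuffled = list(labels)
    null_hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any true label association
        null_hits += sum(s >= threshold for s in stat_fn(shuffled))

    expected_false = null_hits / n_perm  # mean null calls per permutation
    return min(1.0, expected_false / n_called)
```

Each permutation is independent of the others, which is what makes this step a natural candidate for parallel hardware: the outer loop could, for example, be spread across GPU threads or blocks.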
TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />
ROOM A2<br />
S0395 <strong>GPU</strong> Enablement in Adobe Photoshop<br />
Photoshop is one of the most popular products in history. It<br />
aims to delight customers with an immersive experience.<br />
Since CS4, Adobe has been tapping into the horsepower of the<br />
<strong>GPU</strong> to create a compelling playground for the imaginations of<br />
creative pros. Join us to review the latest developments in<br />
how <strong>GPU</strong>s have become an enabling force.<br />
Speaker(s): Jeff Chien (Adobe Systems), Jerry Harris (Senior Computer<br />
Scientist II, Adobe Systems)<br />
Topic(s): Digital Content Creation & Film, Audio, Image and Video<br />
Processing (Beginner)<br />
TUESDAY, MAY 15, 09:00 (50 MINUTES)<br />
ROOM C<br />
S0419A Optimizing Application Performance with CUDA<br />
Profiling Tools<br />
NVIDIA provides two powerful profiling tools that you can use to<br />
maximize your application’s performance. The NVIDIA Visual Profiler<br />
helps you understand your application’s behavior with a detailed<br />
timeline and data from <strong>GPU</strong> performance counters. The Visual<br />
Profiler also provides an automatic, data-driven analysis engine that<br />
offers suggestions on potential optimization strategies for your<br />
application. Nvprof is a command-line profiler that provides<br />
gprof-like functionality for the <strong>GPU</strong>. Nvprof provides summary<br />
information about where your application is spending the most time,<br />
so that you can focus your optimization efforts. This session will<br />
provide a step-by-step walkthrough of both of these profiling tools,<br />
showing how you can use these tools to identify optimization<br />
opportunities at the application, kernel, and source-line levels.<br />
This session will repeat Wednesday at 14:00 (S0419B).<br />
Speaker(s): David Goodwin (Software Engineer, NVIDIA)<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />
ROOM J2<br />
S0527 <strong>GPU</strong>s and the Next-Generation Aerial Surveillance<br />
Graphics processors are already used for computationally<br />
intensive video tasks in many ISR (Intelligence, Surveillance,<br />
Reconnaissance) applications; a <strong>GPU</strong>-based system for video<br />
enhancement and analytics outperforms a similarly priced<br />
CPU-based system by 5 to 1 at HD resolutions. Our initial tests on<br />
64-megapixel Wide Area Aerial Surveillance (WAAS) data show a<br />
speedup of at least 10x for tasks such as super-resolution or moving<br />
target indication. In this talk, we’ll discuss the unique design and<br />
implementation challenges of real-time processing of very large<br />
video data sets. We will demonstrate our existing <strong>GPU</strong>-based<br />
software, IKENA ISR, and discuss its video-processing pipeline and<br />
innovative processing solutions that promise to dramatically<br />
expand the capabilities of emerging aerial surveillance platforms.<br />
Speaker(s): Nikola Bozinovic (CTO, MotionDSP)<br />
Topic(s): General Interest (Beginner)<br />
TUESDAY, MAY 15, 09:00 (25 MINUTES)<br />
ROOM A1<br />
S0607 High Performance 3D Perception<br />
The path to general-purpose <strong>GPU</strong> programming was driven by<br />
computer graphics: the process of rendering 3D models into 2D<br />
viewpoints. With the advent of flexible GP<strong>GPU</strong> programming,<br />
this process can be reversed. 3D perception is the<br />
problem of inferring the structure and motion of the physical world<br />
from 2D and 3D measurements. In this talk, we will demonstrate<br />
the role GP<strong>GPU</strong> plays in a diverse set of applications in high-speed<br />
3D perception and discuss optimization of these techniques for the<br />
GP<strong>GPU</strong>. We also demonstrate several capabilities of future<br />
systems that are enabled by GP<strong>GPU</strong> technologies.<br />
Speaker(s): Chris Slaughter (President, University of Texas Perception,<br />
Lynx Labs)<br />
Topic(s): Computer Vision (Beginner)<br />
TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />
ROOM J2<br />
S0040 Introducing CUDA in KBE Applications for Digital<br />
Vehicle Development <strong>Program</strong>s<br />
Get the latest developments in next-generation Knowledge Based<br />
Engineering (KBE) software, which delivers real results compared with<br />
the traditional design approach. Numerous KBE applications exist<br />
today in the fields of vehicle ergonomics, suspension, NVH,<br />
safety, regulations, etc., which involve huge numbers of iterations<br />
and mathematical algorithms. With <strong>GPU</strong> computing and CUDA, the<br />
KBE kernel is restructured to incorporate a parallel programming<br />
model, which makes the applications run faster and reduces run<br />
times from hours to seconds. The KBE geometry kernel also<br />
benefits from enabling CUDA in topology-based operations, which<br />
take a lot of time when performed on the CPU.<br />
Speaker(s): Avijit Santra (Project Manager, Knowledge Based<br />
Engineering, Tata Motors Limited)<br />
Topic(s): General Interest (Intermediate)<br />
TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />
ROOM K<br />
S0083 Swift: A <strong>GPU</strong>-based Smith-Waterman Sequence<br />
Alignment <strong>Program</strong><br />
This session describes Swift, a <strong>GPU</strong>-based Smith-Waterman<br />
implementation for aligning short DNA sequences to large<br />
genomes. Swift has been designed to reduce computation time<br />
and lower hardware cost. Also, unlike other leading <strong>GPU</strong>-based<br />
Smith-Waterman sequence alignment programs like CUDASW++<br />
and SWCUDA which focus on protein sequence alignment, Swift<br />
has been developed for DNA sequence alignment. Swift runs<br />
200x faster than CUDASW++ on a test data set containing 1000<br />
reads (100 bases each) and 1000 references (1000 bases each),<br />
and it runs 11x faster than the CPU-based implementation of<br />
Smith-Waterman on 24 million reads (100 bases each) and<br />
human chromosome 1.<br />
Speaker(s): Pankaj Gupta (Bioinformatics Application Developer,<br />
St. Jude Children’s Research Hospital)<br />
Topic(s): Bioinformatics (Beginner)<br />
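For readers new to Smith-Waterman, the recurrence that Swift and CUDASW++ accelerate can be written compactly. The sketch below is a plain CPU reference that returns only the best local-alignment score; the linear gap penalty and the scoring values (match 2, mismatch -1, gap -1) are assumptions for illustration, not Swift's parameters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of sequences a and b (linear gaps)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 is what makes the alignment local.
            cur[j] = max(0,
                         prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `smith_waterman_score("TTACGTTT", "ACGT")` finds the read `ACGT` inside the reference and scores 8 (four matches). Only the previous row is kept, so memory grows with the length of `b` alone rather than with the product of both lengths.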
TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />
ROOM A7<br />
S0258 Sailfish: Lattice Boltzmann Fluid Simulations with<br />
<strong>GPU</strong>s and Python<br />
Learn how Run-Time Code Generation (RTCG) techniques allowed<br />
for fast development of a lattice Boltzmann (LB) fluid dynamics<br />
solver called Sailfish. Sailfish is completely open source, supports<br />
a wide variety of LB models (single and multiple relaxation times,<br />
the entropic model; single and binary fluids) and can take<br />
advantage of multiple <strong>GPU</strong>s. Even though the project is written<br />
predominantly in Python, no performance compromises are made.<br />
This talk will introduce the basic design principles of Sailfish and<br />
illustrate how RTCG makes it possible to exploit the power of <strong>GPU</strong>s with<br />
minimal programmer effort.<br />
Speaker(s): Michal Januszewski (PhD Student/Software Engineer,<br />
University of Silesia in Katowice/Google Switzerland)<br />
Topic(s): Computational Fluid Dynamics, Computational Physics,<br />
Development Tools & Libraries (Intermediate)<br />
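Run-time code generation can be illustrated in a few lines: a generator emits specialized source text with the model constants baked in, and the program compiles it on the fly. Sailfish's own generator targets GPU code; the sketch below emits Python instead so that it runs anywhere, and builds the standard D2Q9 lattice Boltzmann equilibrium function. The names `generate_feq_source` and `build_feq` are hypothetical, and this is not Sailfish's actual template machinery.

```python
# Standard D2Q9 lattice: discrete velocities and their weights.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def generate_feq_source():
    """Emit source for a fully unrolled equilibrium function."""
    lines = ["def feq(rho, ux, uy):",
             "    usq = ux * ux + uy * uy",
             "    return ["]
    for (ex, ey), w in zip(E, W):
        eu = f"({ex} * ux + {ey} * uy)"  # e_i . u with e_i inlined
        lines.append(f"        {w!r} * rho * (1 + 3 * {eu}"
                     f" + 4.5 * {eu} ** 2 - 1.5 * usq),")
    lines.append("    ]")
    return "\n".join(lines)


def build_feq():
    """Compile the generated source at run time and return the function."""
    namespace = {}
    exec(compile(generate_feq_source(), "<generated>", "exec"), namespace)
    return namespace["feq"]
```

The generated function conserves mass and momentum: its nine outputs sum to `rho`, and their first moment equals `rho * u`. A real generator could go further, for example by dropping the zero terms, which is the kind of specialization that is tedious to write by hand.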
TUESDAY, MAY 15, 09:30 (25 MINUTES)<br />
ROOM J3<br />
S0329 Using <strong>GPU</strong>s to Speedup Computational Lithography<br />
In this session we show how <strong>GPU</strong>s can be used to significantly<br />
speed up computational lithography, which is heavily used in the<br />
Electronic Design Automation (EDA) industry. In particular, we<br />
demonstrate a noticeable performance increase in several basic<br />
optical lithography algorithms, as well as a speedup of the<br />
full-chip verification software, crucial parts of which were ported<br />
to NVIDIA’s <strong>GPU</strong>s. We summarize the advantages, disadvantages<br />
and challenges of using <strong>GPU</strong>s and compare them to more traditional<br />
multithreading and distributed computing alternatives for the<br />
same applications.<br />
Speaker(s): Constantin Chuyeshov (Algorithm Engineer, Cadence<br />
Design Systems)<br />
Topic(s): Electronic Design Automation (Intermediate)<br />
TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />
ROOM A1<br />
S0404 Computer Vision Libraries with <strong>GPU</strong>s<br />
Learn how Computer Vision libraries can take advantage of <strong>GPU</strong>s.<br />
Computer Vision algorithms are extremely well suited to <strong>GPU</strong><br />
architectures because they demand the large computational power<br />
that <strong>GPU</strong>s offer over CPUs. This talk provides an overview of<br />
<strong>GPU</strong> libraries (such as OpenCV, <strong>GPU</strong>CV, PCL, and NPP)<br />
and online resources (<strong>GPU</strong>4Vision and OpenVIDIA)<br />
available for developers today. Examples and demonstrations of<br />
practical applications making use of these libraries will also be<br />
shown throughout the talk.<br />
Speaker(s): Eric Young (Manager of Developer <strong>Technology</strong> Professional<br />
and Consumer Applications, NVIDIA)<br />
Topic(s): Computer Vision, Audio, Image and Video Processing (Beginner)<br />
TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />
ROOM B<br />
S0430 Developing Next-Generation CUDA Acceleration<br />
in Wolfram’s Mathematica with Parallel Nsight<br />
Since version 8, Mathematica offers advanced support for <strong>GPU</strong><br />
acceleration with optimized CUDA functions and a built-in<br />
framework for developing scientific CUDA kernel code. In this<br />
session, the Wolfram development team will share their<br />
experience developing their next-generation CUDA support in<br />
Mathematica. From the unique ability of Parallel Nsight to attach<br />
its CUDA debugger to a running process, the new parallel Warp<br />
Watch for warp-wide variable views and expression evaluation, to<br />
the latest runtime CUDA profiling experiments, they will<br />
demonstrate how they were able to take advantage of Parallel<br />
Nsight to get the most out of CUDA and the <strong>GPU</strong>.<br />
Speaker(s): Abdul Dakkak (Kernel Developer, Wolfram), Sebastien Domine<br />
(Sr. Director, Software Engineering, Developer Tools, NVIDIA), Ulises<br />
Cervantes-Pimentel (Senior Kernel Developer, Wolfram)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />
ROOM M<br />
S0618 Best Practices of a 800TFlop Hybrid<br />
Supercomputer Implementation (Presented by Appro)<br />
Learn about the “Frontier Computing System” deployed by Appro<br />
for the University of Tsukuba Center for Computational Sciences in<br />
Japan, which contains over half a million <strong>GPU</strong> cores. Learn how<br />
reliability, availability, manageability and compatibility were<br />
essential for this successful 800TF hybrid supercomputing<br />
implementation. Explore the new techniques HA-PACS uses to<br />
accelerate large-scale parallel code by combining CPU/<strong>GPU</strong><br />
processing cluster configurations for scientific research, such as<br />
astrophysics and climate modeling. Learn how to improve data I/O<br />
performance and overcome memory size limitations in hybrid systems<br />
configured with the Lustre File System, offering the best<br />
performance per dollar and excellent memory capacity per FLOP.<br />
Speaker(s): Taisuke Boku (Deputy Director of Center for Computational<br />
Sciences at University of Tsukuba), Steve Lyness (VP of HPC Solutions<br />
Engineering, Appro)<br />
Topic(s): Supercomputing, Astronomy & Astrophysics (Intermediate)<br />
TUESDAY, MAY 15, 09:30 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0800 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight<br />
development team! Whether you would like a private meeting to<br />
discuss specific product features or test out your application with<br />
the latest version of Nsight, or you just want to hang out with the<br />
team after attending one of the exciting training sessions, the lab is a<br />
great place to learn everything you ever wanted to know about the<br />
tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM J2<br />
S0013 <strong>GPU</strong>s for Fast Triggering in NA62 Experiment<br />
We discuss an approach for using commercial graphics processors<br />
(<strong>GPU</strong>s) at the earliest trigger stages in high-energy physics<br />
experiments, and study its implementation on a real trigger<br />
system in preparation. In particular, we focus on the possibility of<br />
reconstructing rings in a Cherenkov detector as a building block of a<br />
selective trigger condition for rare decay searches. Latency and<br />
processing rate measurements on several state-of-the-art<br />
devices are presented, and the potential issues related to<br />
processing time jitter and data transfer throughput are discussed.<br />
Speaker(s): Gianluca Lamanna (Researcher, CERN), Marco Sozzi<br />
(Associate Professor, Physics Department of Pisa)<br />
Topic(s): General Interest (Intermediate)<br />
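Ring reconstruction ultimately means fitting circles to hit positions. As one illustration, the algebraic least-squares (Kåsa) circle fit below reduces a single ring to a 3×3 linear system, cheap enough to run once per event. It assumes the hits of one ring are already grouped and clean enough for an algebraic fit; the talk may use an entirely different algorithm, and `fit_ring` is a hypothetical name.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ring(points):
    """Kasa fit: minimize sum (x^2 + y^2 + D*x + E*y + F)^2 over D, E, F.

    Returns (center_x, center_y, radius).
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    z = [x * x + y * y for x, y in points]
    sxz = sum(x * zi for (x, _), zi in zip(points, z))
    syz = sum(y * zi for (_, y), zi in zip(points, z))
    sz = sum(z)

    # Normal equations M @ [D, E, F] = rhs, solved by Cramer's rule.
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [-sxz, -syz, -sz]
    d = det3(M)
    D, E, F = (det3([[rhs[r] if c == k else M[r][c] for c in range(3)]
                     for r in range(3)]) / d
               for k in range(3))
    cx, cy = -D / 2, -E / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - F)
```

Because each event's fit uses only a handful of sums and is independent of every other event, work of this shape could be mapped onto many GPU threads at once, which is where the latency and throughput questions raised in the abstract come in.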
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM A8<br />
S0031 Unstructured Grid Numbering Schemes for <strong>GPU</strong><br />
Coalescing Requirements<br />
Learn how to achieve high performance for computational fluid<br />
dynamics (CFD) solvers over unstructured grids using numbering<br />
schemes tailored for <strong>GPU</strong> coalescing requirements. Using these<br />
techniques, unstructured grid CFD solvers can make more<br />
effective use of memory bandwidth, which is an otherwise<br />
significant performance bottleneck that has so far led to relatively<br />
limited performance gains on <strong>GPU</strong>s in comparison to structured<br />
grid CFD solvers. Performance benchmarks will be shown using<br />
the Jet Engine Noise Reduction (JENRE) code.<br />
Speaker(s): Andrew Corrigan (Research Mathematician, Naval<br />
Research Laboratory), Johann Dahm (University of Michigan)<br />
Topic(s): Computational Fluid Dynamics, Algorithms & Numerical<br />
Techniques, Computational Physics (Advanced)<br />
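The abstract does not spell out its numbering schemes, so as a stand-in the sketch below uses reverse Cuthill-McKee (RCM), a classic reordering that gives adjacent cells nearby indices. A smaller index bandwidth means the neighbor gathers issued by consecutive threads tend to touch nearby addresses, which is the property memory coalescing rewards. The adjacency-list input format and the function names are assumptions.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return new_id[old] for a graph given as adjacency lists."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    order, seen = [], [False] * n
    # Start each connected component from a lowest-degree cell.
    for start in sorted(range(n), key=degree.__getitem__):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first, visiting low-degree neighbors first.
            for w in sorted(adj[v], key=degree.__getitem__):
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" in RCM
    new_id = [0] * n
    for new, old in enumerate(order):
        new_id[old] = new
    return new_id


def bandwidth(adj, ids):
    """Largest index distance between any two adjacent cells."""
    return max((abs(ids[v] - ids[w]) for v in range(len(adj)) for w in adj[v]),
               default=0)
```

On a chain of eight cells whose original labels are scrambled, the original numbering has bandwidth 5 and the RCM numbering has bandwidth 1. Real unstructured meshes will not reach 1, but the same reordering idea applies.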
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM A7<br />
S0251 RANS CFD Solver on Fermi<br />
SJTU-NS3D is an in-house CFD code co-developed by SJTU and<br />
COMAC for large civil airplanes. It solves the 3D Reynolds-Averaged<br />
Navier-Stokes (RANS) equations on structured grids with the finite<br />
volume method and can be used in wing design. In<br />
this talk, we will present the design and further optimization of<br />
the CUDA version of SJTU-NS3D, which achieves a 20-fold speedup for<br />
the standard M6 wing model and a 37-fold speedup for a candidate<br />
wing model from COMAC on a single Fermi C2050.<br />
Speaker(s): James Lin (Assistant Professor, Shanghai Jiao<br />
Tong University)<br />
Topic(s): Computational Fluid Dynamics (Intermediate)<br />
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0255 Telecom Systems Simulations Acceleration via<br />
CPU/<strong>GPU</strong> Co-Processing: Turbo Codes Case Study<br />
Learn how the effort to accelerate simulations of a<br />
Serially Concatenated turbo code (SCCC) led to new<br />
techniques applicable to a broad range of physical layer<br />
telecommunication system simulations that are not natively parallel.<br />
CUDA’s overall architectural features inspired<br />
new parallelization techniques involving algorithm engineering;<br />
the acceleration attained for the iterative SCCC decoder simulation<br />
illustrates the efficiency of<br />
heterogeneous <strong>GPU</strong>-CPU co-processing. Attendees<br />
will take a deep dive into strategies for organizing data sets and tasks,<br />
as well as into the results and insights, all presented<br />
and discussed in detail.<br />
Speaker(s): Paolo Spallaccini (System Engineer, Ericsson)<br />
Topic(s): Algorithms & Numerical Techniques, Audio, Image and Video<br />
Processing, Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM A2<br />
S0300 Jet: A Domain-Specific Approach to Parallelism<br />
for Film Fluid Simulation<br />
Discover how a domain-specific language can provide not only fast<br />
parallel performance but also a simpler user experience in an<br />
environment that highly values flexibility. This talk will present the<br />
Jet language and heterogeneous compiler built on the LLVM<br />
compiler framework that enables efficient generation of x86<br />
machine code or NVIDIA PTX for stencil computation on<br />
structured grids. We show that moving target-specific<br />
optimizations upstream into the compiler can greatly improve the<br />
ability to manipulate the logic of the solver and thus lower the<br />
barrier-to-entry for artists and developers without compromising<br />
on performance.<br />
Speaker(s): Dan Bailey (R&D, Double Negative)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Digital<br />
Content Creation & Film, Computational Fluid Dynamics (Intermediate)<br />
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM L<br />
S0343 A Quantum Chemistry Domain-Specific Language<br />
For Heterogeneous Clusters<br />
This talk discusses the development of a Domain-Specific Language<br />
(DSL), the tools and the related runtime for efficiently generating<br />
Tensor Contractions (generalized matrix multiplications), an<br />
important part of many quantum chemistry methods (e.g. Coupled<br />
Cluster Theory). Starting from a high level description of the<br />
computation, the tool analyses it and generates optimized C,<br />
OpenCL or CUDA implementations. The runtime, supporting a<br />
task based computation model, is then able to execute the<br />
generated code on <strong>GPU</strong>-accelerated heterogeneous large scale<br />
clusters, maximizing the utilization of the processing elements<br />
and minimizing communication costs.<br />
Speaker(s): Antonino Tumeo (Research Scientist, Pacific Northwest<br />
National Laboratory), Oreste Villa (Research Scientist, Pacific<br />
Northwest National Laboratory)<br />
Topic(s): Quantum Chemistry, Supercomputing (Intermediate)<br />
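The phrase “generalized matrix multiplication” can be made concrete: a contraction over several shared indices becomes an ordinary matrix product once those indices are flattened into one. The sketch below checks this for the hypothetical contraction C[a][b] = Σ over i, j of A[a][i][j]·B[i][j][b]; a generator like the one described would emit tuned C, OpenCL or CUDA for such loops instead, and the function names here are illustrative only.

```python
def contract_direct(A, B):
    """C[a][b] = sum over i, j of A[a][i][j] * B[i][j][b]."""
    na, ni, nj, nb = len(A), len(B), len(B[0]), len(B[0][0])
    return [[sum(A[a][i][j] * B[i][j][b]
                 for i in range(ni) for j in range(nj))
             for b in range(nb)]
            for a in range(na)]


def matmul(X, Y):
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def contract_as_gemm(A, B):
    """Flatten (i, j) into k = i * nj + j, then do one matrix multiply."""
    nj = len(B[0])
    A2 = [[A[a][k // nj][k % nj] for k in range(len(B) * nj)]
          for a in range(len(A))]                # shape (na, ni*nj)
    B2 = [row for plane in B for row in plane]   # shape (ni*nj, nb)
    return matmul(A2, B2)
```

Rewriting contractions this way lets a code generator hand the heavy lifting to a single well-optimized matrix-multiply kernel, at the cost of possibly reordering data in memory first.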
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM K<br />
S0376 Dynamic <strong>Program</strong>ming on CUDA: Finding the Most<br />
Similar DNA Sequence<br />
Learn a couple of techniques to speed up compute-heavy Dynamic<br />
<strong>Program</strong>ming algorithms on the <strong>GPU</strong>. Our particular problem<br />
concerned DNA sequences: given a reference sequence, how can we find<br />
the most similar one in a large database? The sequences<br />
are millions of characters long, and their similarity is calculated with<br />
a (quadratic) DP algorithm, which makes the problem very tough<br />
even for <strong>GPU</strong>s. We address both the theoretical and practical<br />
side: we present programming techniques that enable Dynamic<br />
<strong>Program</strong>ming to be performed at the hardware speed, and<br />
improvements to the algorithm itself that drastically lower the<br />
execution time.<br />
Speaker(s): Grzegorz Kokosinski (Software Engineer, IBM Poland),<br />
Krzysztof Zarzycki (Senior Software Developer, IBM Poland)<br />
Topic(s): Bioinformatics, Algorithms & Numerical Techniques<br />
(Intermediate)<br />
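The abstract does not name the similarity measure, so the sketch below assumes plain edit (Levenshtein) distance to show the structural point behind most GPU dynamic-programming kernels: every cell depends only on its left, upper and upper-left neighbors, so all cells on one anti-diagonal are independent and could be computed by parallel threads. This is a sequential reference for that wavefront order, not the speakers' implementation.

```python
def edit_distance_wavefront(a, b):
    """Levenshtein distance, filling the DP table one anti-diagonal at a time."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        D[0][j] = j  # insert all of b[:j]

    # Cell (i, j) lies on anti-diagonal d = i + j and reads only cells on
    # diagonals d-1 and d-2, so the inner loop has no dependencies between
    # iterations: on a GPU each i could be handled by a separate thread.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # deletion
                          D[i][j - 1] + 1,         # insertion
                          D[i - 1][j - 1] + cost)  # match / substitution
    return D[n][m]
```

For sequences millions of characters long, a practical kernel would keep only the last two diagonals rather than the full table; the full table is kept here only for readability.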
TUESDAY, MAY 15, 10:00 (25 MINUTES)<br />
ROOM J3<br />
S0520 Using <strong>GPU</strong>s to Speedup Chip Verification<br />
As VLSI designs become more complex, the process of verifying<br />
them becomes increasingly expensive and time consuming.<br />
Verification of such designs has become quite taxing as they take<br />
simulators to the edge in terms of both runtime demands and<br />
host memory requirements. In order to reduce verification time,<br />
different verification methodologies have been adopted including<br />
the use of emulators. However, emulators’ price point is high and<br />
so is the engineering time to set them up. Rocketick develops a<br />
Verilog co-simulator that uses <strong>GPU</strong>s as an acceleration platform.<br />
Rocketick’s product, RocketSim® is now part of NVIDIA’s design<br />
flow and it is being used to accelerate simulations by 10X-30X<br />
compared to the standard simulator and to reduce the memory<br />
footprint by 5X. In this session RocketSim® will be presented using<br />
some real-world examples of verification flows.<br />
Speaker(s): Tomer Ben-David (Co-Founder and Vice President,<br />
R&D, Rocketick)<br />
Topic(s): Electronic Design Automation (Beginner)<br />
TUESDAY, MAY 15, 10:30 (80 MINUTES)<br />
KEYNOTE – HALL 1<br />
S3000 Opening Keynote<br />
Do not miss this opening keynote, featuring Jen-Hsun Huang, CEO<br />
and Co-Founder of NVIDIA. Hear about what’s next in computing<br />
and graphics, and preview disruptive technologies and exciting<br />
demonstrations from across industries. Jen-Hsun co-founded<br />
NVIDIA in 1993 and has served since its inception as president,<br />
chief executive officer and a member of the board of directors.<br />
Speaker(s): Jen-Hsun Huang (CEO & Co-Founder, NVIDIA)<br />
Topic(s): General Interest (All Levels)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM A3<br />
S0024 <strong>GPU</strong>-Accelerated Path Rendering<br />
Standards such as Scalable Vector Graphics (SVG), PostScript,<br />
TrueType outline fonts, and immersive web content such as Flash<br />
depend on a resolution-independent 2D rendering paradigm that<br />
<strong>GPU</strong>s have not traditionally accelerated. This session explains a<br />
new opportunity to greatly accelerate vector graphics, path<br />
rendering, and immersive web standards using the <strong>GPU</strong>. By<br />
attending, you will learn how to write OpenGL applications that<br />
accelerate the full range of path rendering functionality. Not only<br />
will you learn how to render sophisticated 2D graphics with<br />
OpenGL, you will learn to mix such resolution-independent<br />
2D rendering with 3D rendering and do so at dynamic,<br />
real-time rates.<br />
Speaker(s): Mark Kilgard (Principal Software Engineer, NVIDIA)<br />
Topic(s): Computer Graphics, <strong>GPU</strong> Accelerated Internet, Digital<br />
Content Creation & Film, Visualization (Beginner)<br />
TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />
ROOM J3<br />
S0069 <strong>GPU</strong> Computing Advances in 3D<br />
Electromagnetic Simulation<br />
Learn about the latest developments in <strong>GPU</strong> acceleration for 3D<br />
Full Wave Electromagnetic simulation. The latest version of CST<br />
Studio Suite supports the full range of Tesla products on both<br />
Windows and Linux operating systems. <strong>GPU</strong>, multi-<strong>GPU</strong> and<br />
MPI-<strong>GPU</strong> computing drastically reduce simulation times for<br />
CST customers. We will report the status of current and future <strong>GPU</strong><br />
developments at CST and share detailed simulation results.<br />
Speaker(s): Andreas Buhr (Department Manager - Performance<br />
Optimization, CST AG), Fabrizio Zanella (Systems Manager, CST<br />
of America)<br />
Topic(s): Electronic Design Automation (Intermediate)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM C<br />
S0088 Point Cloud Library (PCL) on CUDA<br />
The Point Cloud Library (PCL - http://pointclouds.org) is a<br />
large-scale, open project for 3D point cloud processing. The PCL<br />
framework contains numerous state-of-the-art algorithms<br />
including filtering, feature estimation, surface reconstruction,<br />
registration, model fitting and segmentation. Due to the massively<br />
parallel nature of many of the above algorithms, GP<strong>GPU</strong><br />
acceleration holds great potential for achieving real-time<br />
performance in numerous applications. In this work we<br />
demonstrate some of the recent advances in GP<strong>GPU</strong><br />
programming for 3D point cloud processing, and outline plans for<br />
future development.<br />
Speaker(s): Michael Dixon (Research Engineer, Willow Garage, Inc.),<br />
Radu Rusu (Research Scientist, Willow Garage, Inc.)<br />
Topic(s): Computer Vision, Algorithms & Numerical Techniques,<br />
Stereoscopic 3D, Machine Vision (Intermediate)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM A5<br />
S0254 Graphics in the Cloud - How NVIDIA is Enabling<br />
Cloud Visualization<br />
Engineers, artists, scientists, and gamers are the most<br />
demanding visual thinkers on the planet, and as such have not<br />
been willing to move their computing environments to the<br />
infamous “cloud”. These remotely accessed systems are seen as<br />
slow and not up to the visual experience that users expect when<br />
dealing with these types of applications. NVIDIA aims to change<br />
that perception with the NVIDIA Virtual Graphics Platform. In this<br />
session you will hear about the technologies behind accelerating<br />
graphics in the cloud, and some of the industry partnerships that<br />
are enabling it.<br />
Speaker(s): Will Wade (Manager, Quadro Advanced Technologies, NVIDIA)<br />
Topic(s): Cloud Computing, Visualization, Computer Graphics<br />
(Intermediate)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0313 Understanding and using Atomic<br />
Memory Operations<br />
Atomic memory operations provide powerful communication and<br />
coordination capabilities for parallel programs, including the<br />
well-known operations compare-and-swap and fetch-and-add. The<br />
atomic operations enable the creation of parallel algorithms and<br />
data structures that would otherwise be very difficult (or<br />
impossible) to express without them - for example: shared parallel<br />
data structures, parallel data aggregation, and control primitives<br />
such as semaphores and mutexes. In this talk we will use examples<br />
to describe atomic operations, explain how they work, and discuss<br />
performance considerations and pitfalls when using them.<br />
Speaker(s): Stephen Jones (CUDA Developer, NVIDIA), Lars Nyland<br />
(Compute Architect, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques, Parallel <strong>Program</strong>ming<br />
Languages & Compilers (Advanced)<br />
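The relationship between the two operations named above can be shown directly: fetch-and-add can be built from a compare-and-swap retry loop, the same pattern commonly used on GPUs to synthesize atomics the hardware does not provide. Python has no hardware CAS, so the sketch emulates one with a lock inside a hypothetical `AtomicCell` class; only the retry loop in `fetch_and_add` is the point being illustrated.

```python
import threading


class AtomicCell:
    """Hypothetical stand-in for one memory word with a hardware CAS."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # emulates the atomicity of real CAS

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`;
        always return the value that was actually there."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def fetch_and_add(cell, delta):
    """Atomically add delta and return the previous value (CAS retry loop)."""
    old = cell.load()
    while True:
        seen = cell.compare_and_swap(old, old + delta)
        if seen == old:
            return old  # our update won
        old = seen      # another thread got there first: retry with its value
```

Because every successful CAS observes a distinct previous value, concurrent callers each receive a unique ticket, which is how fetch-and-add is typically used for parallel data aggregation such as reserving output slots. Under heavy contention the retry loop can spin many times, one of the performance pitfalls worth measuring.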
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM N<br />
S0319 Advanced Driver Assistance System Testing<br />
using OptiX<br />
Learn in this session how AUDI AG and its partners make use<br />
of OptiX as a unified platform for the simulation of perception<br />
sensors utilizing different physical measurement principles, e.g.<br />
video camera, LIDAR, ultrasonic, etc. The aim is to generate<br />
synthetic sensor data with realistic measurement errors for<br />
testing Advanced Driver Assistance Systems. Get details about the<br />
challenges they faced during the implementation of the necessary<br />
tools for validating the sensor models and join the discussion<br />
when they describe the upcoming challenges related to real-time<br />
Ray Tracing and advanced material descriptions, when multiple<br />
sensors are simulated simultaneously.<br />
Speaker(s): Erwin Roth (Researcher, Technische Universitaet<br />
Muenchen), Tugkan Calapoglu (Lead Graphics Software Developer,<br />
VIRES Simulationstechnologie GmbH)<br />
Topic(s): Ray Tracing, Machine Vision (Intermediate)<br />
TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />
ROOM A8<br />
S0321 <strong>GPU</strong>-Based Monte Carlo Ray Tracing Simulation<br />
for Solar Power Plants<br />
Learn about real-time simulations of Concentrating Thermal Solar<br />
Power using <strong>GPU</strong> technology to enable performance optimization<br />
of these utility-scale plants. By leveraging the power of <strong>GPU</strong>s and<br />
the parallel nature of a field of thousands of sun-tracking mirrors,<br />
we have cut the computation time by<br />
orders of magnitude compared with the minutes to<br />
hours previously required. We will present an overview of the problem<br />
domain and describe how we used the <strong>GPU</strong> to develop a Monte<br />
Carlo physics-based ray tracing method to simulate the flux reflected by<br />
the mirrors onto the solar receiver.<br />
Speaker(s): Michel Izygon (Tietronix Software), Claus Nilsson<br />
(<strong>Program</strong>mer, Tietronix Software)<br />
Topic(s): Energy Exploration, Computational Physics, Ray Tracing<br />
(Beginner)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM J2<br />
S0328 Best Practices in <strong>GPU</strong>-Based Video Processing<br />
The combination of the <strong>GPU</strong>’s massively parallel compute engine<br />
with extremely high memory bandwidth and new programming<br />
paradigms such as CUDA and OpenCL has made the <strong>GPU</strong> well<br />
suited for image and video processing applications. This session<br />
will explore best practices and techniques for the development of<br />
efficient <strong>GPU</strong>-based video and image processing applications.<br />
Topics to be discussed include image segmentation and threading<br />
models for efficient parallelism, optimal memory usage strategies<br />
to reduce expensive data movement, and multi-<strong>GPU</strong><br />
considerations. Case studies and examples specific to video and<br />
image processing will be presented.<br />
Speaker(s): Thomas True (Applied Engineer, NVIDIA)<br />
Topic(s): Audio, Image and Video Processing, Digital Content Creation &<br />
Film, Computer Vision, Medical Imaging & Visualization (Intermediate)<br />
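One simple way to segment a frame for parallel processing is to cut it into fixed-size tiles, one per thread block. The Python sketch below, with a hypothetical 16x16 tile size and a trivial invert kernel, shows the ceiling-division grid and edge clipping; it is an illustration, not the threading model presented in the session.

```python
def tile_frame(width, height, tile_w=16, tile_h=16):
    """Split a width x height frame into (x, y, w, h) tiles.

    The tile grid uses ceiling division, like a CUDA launch with
    gridDim = (ceil(width / tile_w), ceil(height / tile_h)); edge tiles are
    clipped so every pixel belongs to exactly one tile.
    """
    grid_x = (width + tile_w - 1) // tile_w
    grid_y = (height + tile_h - 1) // tile_h
    tiles = []
    for by in range(grid_y):
        for bx in range(grid_x):
            x, y = bx * tile_w, by * tile_h
            tiles.append((x, y, min(tile_w, width - x), min(tile_h, height - y)))
    return tiles


def invert_tile(frame, out, tile):
    """Example per-tile kernel: invert 8-bit pixel values inside one tile."""
    x, y, w, h = tile
    for j in range(y, y + h):
        for i in range(x, x + w):
            out[j][i] = 255 - frame[j][i]


def invert_frame(frame, tile_w=16, tile_h=16):
    """Process a frame (list of rows) tile by tile; tiles are independent,
    so on a GPU each would be handled by its own thread block."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for tile in tile_frame(width, height, tile_w, tile_h):
        invert_tile(frame, out, tile)
    return out
```

For a 1920x1080 frame with 16x16 tiles this gives a 120x68 grid, and the bottom row of tiles is only 8 pixels tall.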
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM J1<br />
S0364 Interacting with Huge Particle Simulations in<br />
Maya with the <strong>GPU</strong><br />
We present a plug-in for Maya that enables an artist to simulate<br />
huge numbers of particles in real time by leveraging the NVIDIA <strong>GPU</strong>.<br />
Being able to interact with the simulation opens up new<br />
possibilities for modifying the workflow. We will demonstrate the<br />
plug-in, and provide insight into the algorithms used.<br />
Speaker(s): Wil Braithwaite (Senior Applied Engineer, NVIDIA)<br />
Topic(s): Digital Content Creation & Film, Computational Fluid<br />
Dynamics, Visualization (Beginner)<br />
TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />
ROOM A1<br />
S0412 A 2-Petaflops Stencil Application with<br />
Stereoscopic 3D Visualization - Gordon Bell Prize 2011<br />
Most stencil applications, such as CFD and structural analysis, are<br />
memory-bound problems. <strong>GPU</strong>s offer high performance in both<br />
computation and memory bandwidth, which suits such problems. The<br />
TSUBAME 2.0 supercomputer, with 4,224 <strong>GPU</strong>s, has been in operation since<br />
November 2010. We study dendritic solidification of a metal by solving<br />
the phase-field model. A performance of 2.0 Petaflops was<br />
achieved for a 4,096x6,500x10,400 mesh on 4,000 <strong>GPU</strong>s, and we<br />
received the ACM Gordon Bell Prize in 2011. We also demonstrated<br />
several large-scale stencil applications (Lattice Boltzmann,<br />
weather prediction and so on) with stereoscopic 3D visualization.<br />
Speaker(s): Takayuki Aoki (Professor, Tokyo Institute of <strong>Technology</strong>)<br />
Topic(s): Supercomputing, Computational Fluid Dynamics, Climate &<br />
Weather Modeling, Stereoscopic 3D (Intermediate)<br />
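To illustrate why such stencils are memory-bound, here is a minimal sketch of one explicit time step of the Allen-Cahn equation on a periodic 2D grid. Allen-Cahn is a much simpler phase-field model than the dendritic-solidification model in the talk; the function names, time step and grid size are assumptions chosen for illustration.

```python
def allen_cahn_step(phi, dt=0.1, dx=1.0, eps2=1.0):
    """One explicit Euler step of d(phi)/dt = eps2 * laplacian(phi) - (phi**3 - phi)
    on a periodic 2D grid stored as a list of rows.
    Stability of the diffusion part needs dt * eps2 / dx**2 <= 0.25."""
    ny, nx = len(phi), len(phi[0])
    inv_dx2 = 1.0 / (dx * dx)
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        jn, js = (j + 1) % ny, (j - 1) % ny          # periodic neighbours in y
        for i in range(nx):
            ie, iw = (i + 1) % nx, (i - 1) % nx      # periodic neighbours in x
            c = phi[j][i]
            # 5-point Laplacian: five reads for one write.
            lap = (phi[j][ie] + phi[j][iw] + phi[jn][i] + phi[js][i] - 4.0 * c) * inv_dx2
            out[j][i] = c + dt * (eps2 * lap - (c * c * c - c))
    return out


def evolve(phi, steps, **kwargs):
    """Apply allen_cahn_step repeatedly."""
    for _ in range(steps):
        phi = allen_cahn_step(phi, **kwargs)
    return phi
```

Each grid point needs roughly a dozen floating-point operations but reads five values and writes one. In double precision that is 16 to 48 bytes of memory traffic per point, depending on how well caches or shared memory reuse neighbours, so memory bandwidth rather than arithmetic throughput limits the speed.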
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM L<br />
S0418 High Productivity Computational Finance on <strong>GPU</strong>s<br />
Learn how Aon Benfield helps clients use <strong>GPU</strong>s to develop and<br />
accelerate Monte Carlo derivatives pricing models. We will<br />
present our PathWise software tools used by actuaries and quants<br />
to rapidly develop and deploy production-quality, <strong>GPU</strong>-grid-enabled<br />
Monte Carlo models using only high-level languages and<br />
tools, without requiring any knowledge of CUDA or C/C++. We will<br />
describe our approach of using Code Generation, Visual<br />
<strong>Program</strong>ming, Domain Specific Languages and scripting<br />
languages to create a High Productivity Computing software stack<br />
for financial services applications.<br />
Speaker(s): Aamir Mohammad (Associate Director, Aon Benfield<br />
Securities), Peter Phillips (SVP, Aon Benfield Securities)<br />
Topic(s): Finance, Application Design & Porting Techniques, Parallel<br />
<strong>Program</strong>ming Languages & Compilers (Beginner)<br />
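A minimal Monte Carlo pricer for a European call under geometric Brownian motion, with the Black-Scholes formula as a check. This is a generic textbook example in plain Python, not PathWise or its generated code; the model, parameters and function names are assumptions for illustration. Every path is independent, which is what makes such models map well onto GPUs and GPU grids.

```python
import math
import random


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion.

    Each path needs only one normal draw because the terminal price has a
    closed form; paths are independent, so they parallelize trivially.
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(s_t - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, r, sigma, t):
    """Black-Scholes closed-form call price, used here to check the simulation."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
```

`bs_call(100.0, 100.0, 0.05, 0.2, 1.0)` is about 10.45, and the simulated price converges toward it at the usual 1/sqrt(n) Monte Carlo rate.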
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM A7<br />
S0434 Schlumberger LiveQuest: Application Delivery<br />
and Collaboration Solution<br />
The LiveQuest application delivery and collaboration solution<br />
allows petro-technical professionals to securely access and share<br />
exploration and production (E&P) applications and data, including<br />
3D visualization applications, anytime, anywhere. By utilizing web<br />
and thin-client technologies, LiveQuest provides platform-independent<br />
and application-agnostic real-time collaboration. In<br />
this session, Mario Dean will introduce the needs<br />
of O&G exploration from an application and large-data 3D<br />
visualization perspective. He will discuss the LiveQuest solution<br />
stack, with specific focus on the 3D remote visualization<br />
technology, and share customer deployment examples and overall<br />
ROI considerations.<br />
Speaker(s): Mario Dean (Schlumberger)<br />
Topic(s): Energy Exploration (Beginner)<br />
TUESDAY, MAY 15, 14:00 (90 MINUTES)<br />
HALL 1<br />
S0515 Multi-<strong>GPU</strong> <strong>Program</strong>ming<br />
CUDA releases starting with 4.0 include a number of features that<br />
facilitate multi-<strong>GPU</strong> programming and computing. In this session<br />
we will review the features useful for programming for multiple<br />
<strong>GPU</strong>s, both within a single node and across a network. We will cover<br />
peer-to-peer <strong>GPU</strong> communication, communication patterns for<br />
various <strong>GPU</strong> topologies, as well as streams in the context of<br />
multiple <strong>GPU</strong>s. Concepts will be illustrated with a case study of 3D<br />
forward wave modeling, common in seismic computing.<br />
Speaker(s): Paulius Micikevicius (Developer <strong>Technology</strong><br />
Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />
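The halo-exchange pattern behind multi-GPU stencil codes can be sketched without any GPU. In the Python sketch below, each hypothetical device owns a contiguous chunk of a periodic 1D domain plus one ghost cell on each side; ghosts are refreshed from neighbouring chunks before each step, and the result matches the single-domain computation exactly. In a CUDA 4.0+ code, the ghost update is where a peer copy such as `cudaMemcpyPeerAsync` on a separate stream would go. The 1D domain and 3-point stencil are simplifying assumptions, not the 3D wave-modeling case study.

```python
def global_step(u):
    """Reference: 3-point averaging stencil on a periodic 1D domain."""
    n = len(u)
    return [(u[(i - 1) % n] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]


def split(u, n_dev):
    """Give each (hypothetical) device a contiguous chunk padded with one ghost
    cell on each side. Requires len(u) >= n_dev."""
    n = len(u)
    bounds = [n * d // n_dev for d in range(n_dev + 1)]
    return [[0.0] + u[bounds[d]:bounds[d + 1]] + [0.0] for d in range(n_dev)]


def exchange_halos(parts):
    """Fill each ghost cell from the neighbouring chunk's boundary cell
    (periodic ring of devices). Only interior cells are read, so the
    update order does not matter."""
    n_dev = len(parts)
    for d in range(n_dev):
        parts[d][0] = parts[(d - 1) % n_dev][-2]   # left neighbour's last interior cell
        parts[d][-1] = parts[(d + 1) % n_dev][1]   # right neighbour's first interior cell


def local_step(part):
    """Update interior cells only; ghost cells are read but not updated."""
    inner = [(part[i - 1] + part[i] + part[i + 1]) / 3.0 for i in range(1, len(part) - 1)]
    return [0.0] + inner + [0.0]


def run_multi(u, n_dev, steps):
    parts = split(u, n_dev)
    for _ in range(steps):
        exchange_halos(parts)
        parts = [local_step(p) for p in parts]
    return [x for p in parts for x in p[1:-1]]


def run_global(u, steps):
    for _ in range(steps):
        u = global_step(u)
    return u
```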
TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />
ROOM K<br />
S0519 <strong>GPU</strong> Accelerated Bioinformatics Research at BGI<br />
Once the DNA double helix has been digitized by sequencing, computation is<br />
the key to connecting raw sequences with life-science discoveries. As<br />
massive amounts of data are generated, processing, analyzing and<br />
storing them efficiently becomes a major<br />
challenge. By developing <strong>GPU</strong>-accelerated bioinformatics tools<br />
and integrating them into pipelines, BGI researchers now run<br />
analysis pipelines in several hours instead of several days. These<br />
tools include the SOAP3 aligner, SNP calling and a tool for population<br />
genomics. The speedup is generally around 10-50x compared<br />
with traditional counterparts.<br />
Speaker(s): BingQiang Wang (Head of High Performance<br />
Computing, BGI)<br />
Topic(s): Bioinformatics, Life Sciences, Algorithms & Numerical<br />
Techniques, Supercomputing (Intermediate)<br />
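BWT-based aligners (SOAP3 is generally described as one) find exact matches with FM-index backward search. The Python sketch below shows that core step on a toy reference. It builds the BWT naively from sorted rotations and is not SOAP3's implementation; production aligners add compressed occurrence tables, mismatch handling and batching of many reads. Function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for a demo;
    real indexers build it from a suffix array). '$' must sort first."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def fm_index(b):
    """C[c]: number of characters in b smaller than c;
    occ[c][i]: number of occurrences of c in b[:i]."""
    alphabet = sorted(set(b))
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += b.count(ch)
    occ = {ch: [0] * (len(b) + 1) for ch in alphabet}
    for i, x in enumerate(b):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if x == ch else 0)
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Backward search: number of exact matches of pattern in the indexed text.
    n is len(bwt); [lo, hi) is the range of sorted rotations prefixed by the
    suffix of pattern processed so far."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each query touches only a handful of table entries per pattern character, which is why millions of short reads can be searched independently in parallel.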
TUESDAY, MAY 15, 14:00 (240 MINUTES)<br />
ROOM A2<br />
S0606 <strong>GPU</strong>-accelerated Science on Titan: Tapping into the<br />
World’s Preeminent <strong>GPU</strong> Supercomputer to Achieve<br />
Better Science<br />
This year, the leadership-class computing facility at Oak Ridge<br />
National Labs is upgrading its largest supercomputer for open<br />
science, “Jaguar”, to employ high-performance, power-efficient<br />
<strong>GPU</strong>s. Once the transition is complete, the machine will be known<br />
as “Titan”. In this extended <strong>GTC</strong> session, we will feature a range of<br />
presenters showcasing research codes that will run<br />
computational science on the <strong>GPU</strong> at scale. Through these<br />
selected presentations, we will investigate the progress and<br />
anticipated results of <strong>GPU</strong>-acceleration of these significant codes.<br />
In this session, we will also explain how research scientists<br />
interested in tapping into the immense capabilities of Titan can do<br />
so, through programs such as the INCITE program sponsored by<br />
the US Department of Energy. The presenters include:<br />
Jacqueline H. Chen (Combustion Research Facility, Sandia<br />
National Laboratories)<br />
“Direct Numerical Simulation of Turbulence-Chemistry<br />
Interactions: Fundamental Insights Towards Predictive Models”<br />
[…]<br />
“S3D Direct Numerical Simulation - Preparations for the<br />
10-100PF Era”<br />
[…] Princeton Plasma Physics Laboratory (PPPL), Princeton)<br />
“Fusion Energy Sciences & Computing at the Extreme Scale”<br />
[…]<br />
“Computer Simulation of Lignocellulosic Biomass”<br />
[…] Science, Princeton)<br />
“Toward Global Seismic Imaging based on Spectral-Element<br />
and Adjoint Methods”<br />
Speaker(s): Jack Wells, Ph.D. (Director of Science, Oak Ridge<br />
Leadership Computing Facility, Oak Ridge National Laboratory)<br />
Topic(s): Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 14:00 (25 MINUTES)<br />
ROOM B<br />
S0609 Computational Graphics: An Overview of Graphics<br />
Research at NVIDIA<br />
The future of computer graphics presents many challenges. The<br />
worlds we render will be vastly more complex in geometry and<br />
artistic “texture”. Real-time rendering will use global illumination<br />
to achieve a far richer appearance, robustly. And content creation,<br />
which has grown to be the dominant cost of producing both games<br />
and film, must get simpler and less expensive. The NVIDIA<br />
Graphics Research group addresses these challenges with a focus<br />
on “Computational Graphics”: using general-purpose computation<br />
to enhance and extend the traditional pipelines and capabilities of<br />
real-time rendering. In this talk David Luebke, who leads graphics<br />
research, will give an overview of recent and ongoing work in<br />
computational graphics at NVIDIA Research.<br />
Speaker(s): David Luebke (Senior Director of Graphics<br />
Research, NVIDIA)<br />
Topic(s): Computer Graphics (Intermediate)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM M<br />
S0632 Learn how Adobe After Effects CS6 takes<br />
advantage of NVIDIA OptiX technology for 3D Ray Tracing<br />
(Presented by Adobe)<br />
Adobe After Effects CS6 unveils an amazing new 3D ray-traced<br />
rendering engine based on NVIDIA OptiX technology, with <strong>GPU</strong><br />
acceleration up to 50x faster than a CPU alone. This enables<br />
simple and quick design of realistic geometric text and shapes in<br />
3D space. Motion graphics artists can now create more physically<br />
accurate scenes with beautiful results such as reflections,<br />
transparency, soft shadows, and depth-of-field blur directly in<br />
After Effects. <strong>GPU</strong>-accelerated ray tracing drastically improves<br />
the workflow by enabling motion graphics artists to develop these<br />
3D effects entirely within After Effects.<br />
Speaker(s): Steve Forde (Senior Product Manager, After Effects)<br />
Topic(s): Digital Content Creation (Beginner)<br />
TUESDAY, MAY 15, 14:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0801 CUDA Debugger Training on Windows<br />
Nsight offers a powerful CUDA debugging feature set<br />
that enables developers to quickly spot bugs. With the memory<br />
checker, advanced breakpoints and the warp watch panel, a<br />
developer can isolate memory access errors, filter<br />
thousands of threads down to a specific thread and<br />
spot abnormal ranges of variable values. Through a set of comprehensive<br />
exercises, attendees will learn to use these features and<br />
become fully proficient at developing CUDA code.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />
ROOM J3<br />
S0046 Application of the <strong>GPU</strong> to a Two-Part<br />
Computational Electromagnetic Algorithm<br />
The shooting and bouncing ray (SBR) method is one way to<br />
simulate electromagnetic field radiation. Like all methods, there<br />
are certain problems where it does not yield accurate results. In<br />
this presentation, we will explain one such case that consists of an<br />
antenna resonating between two metal plates. We will discuss<br />
how we used the graphics processing unit (<strong>GPU</strong>) to separate the<br />
problem into two parts. Each part is simulated individually with<br />
SBR producing an improved result. Such a <strong>GPU</strong>-accelerated,<br />
two-part approach can be applied to other more general<br />
hybrid simulations.<br />
Speaker(s): Eric Dunn (Electromagnetic Research Scientist, SAIC)<br />
Topic(s): Computational Physics, Algorithms & Numerical Techniques,<br />
Ray Tracing (Beginner)<br />
TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />
ROOM A1<br />
S0351 Strong Scaling for Molecular Dynamics<br />
Applications<br />
In this session we will talk about how to improve strong scaling for<br />
molecular dynamics applications. Using the NAMD molecular<br />
dynamics code as our primary case study, we will discuss the<br />
types of issues that can impede scaling, how to use already<br />
available and custom tools to discover such issues, and how to<br />
build a model to help analyze and predict scaling performance.<br />
Although this session is primarily focused on molecular dynamics<br />
applications, most of the lessons can be applied equally well to<br />
many other areas and applications.<br />
Speaker(s): Sarah Tariq (Software Engineer, NVIDIA)<br />
Topic(s): Molecular Dynamics, Cluster Management, Life Sciences<br />
(Intermediate)<br />
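A common starting point for a scaling model splits runtime into serial work, perfectly parallel work and a communication term that grows with the number of processors. The sketch below assumes a hypothetical log2(p) communication cost, as for tree-based reductions; the model used in the session for NAMD may differ.

```python
import math


def predicted_time(p, t_serial, t_parallel, t_comm):
    """Hypothetical strong-scaling model: fixed serial work, perfectly divisible
    work, and a communication cost proportional to log2(p)."""
    return t_serial + t_parallel / p + t_comm * math.log2(p)


def scaling_table(ps, t_serial, t_parallel, t_comm):
    """Rows of (processors, time, speedup, parallel efficiency)."""
    t1 = predicted_time(1, t_serial, t_parallel, t_comm)
    rows = []
    for p in ps:
        tp = predicted_time(p, t_serial, t_parallel, t_comm)
        speedup = t1 / tp
        rows.append((p, tp, speedup, speedup / p))
    return rows
```

With 1 s of serial work, 100 s of parallel work and 1 s per communication level, `scaling_table([1, 2, 4, ..., 256], 1.0, 100.0, 1.0)` predicts a speedup that peaks at 64 processors and then declines, the signature of a strong-scaling limit. Fitting the three coefficients to measured timings is what turns such a model into a predictive tool.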
TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />
ROOM A8<br />
S0379 <strong>GPU</strong>-based High-Performance Simulations<br />
for Spintronics<br />
The joint utilization of the electron’s charge and spin in<br />
“spintronics” represents a promising technology for data<br />
processing and storage in nanostructures. Complex quantum<br />
effects in these devices, such as the spin-Hall effect, require<br />
demanding numerical simulations that provide a convenient link<br />
between idealized analytical models and the often very complex results<br />
from measurements. These simulations, which involve multiplying<br />
and inverting large matrices, are an ideal showcase for the<br />
performance gains of running the algebraic routines on GP<strong>GPU</strong>s<br />
in computing environments that share the execution of algorithms<br />
across multiple nodes with multiple GP<strong>GPU</strong>s and CPU cores.<br />
Speaker(s): Jan Jacob (Postdoctoral Researcher, University of Hamburg)<br />
Topic(s): General Interest, Computational Physics, Application Design<br />
& Porting Techniques (Intermediate)<br />
TUESDAY, MAY 15, 14:30 (50 MINUTES)<br />
ROOM K<br />
S0516 The Advantage of <strong>GPU</strong> Computation for Analyzing<br />
Complex Traits<br />
Most important agricultural traits and human diseases are complex<br />
traits controlled by gene networks with gene-by-gene<br />
interaction (epistasis) and gene-by-environment interaction (GE).<br />
New statistical methods and software have been developed for analyzing the<br />
genetic architecture of complex traits based on genome-wide<br />
association studies (GWAS). When dealing with large mapping<br />
populations and huge amounts of molecular information, <strong>GPU</strong><br />
computation has an advantage over CPU computation. We will<br />
demonstrate the newly developed <strong>GPU</strong> based software<br />
QTLNetwork V3.0 and GWAS-GMDR for mapping genes with<br />
epistasis and GE interaction for complex traits of human, crops,<br />
and mouse.<br />
Speaker(s): Jun Zhu (Professor, Zhejiang University)<br />
Topic(s): Bioinformatics, Life Sciences (Intermediate)<br />
TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />
ROOM B<br />
S0610 Octree-Based Sparse Voxelization For Real-Time<br />
Global Illumination<br />
Discrete voxel representations are generating growing interest in<br />
a wide range of applications in computational sciences and<br />
particularly in computer graphics. A new real-time usage of<br />
dynamic voxelization inside a sparse voxel octree is to compute<br />
voxel-based global illumination. When used in real-time contexts,<br />
it becomes critical to achieve fast 3D scan conversion (also called<br />
voxelization) of traditional triangle-based surface representations.<br />
This talk describes a new surface voxelization algorithm that<br />
produces a sparse voxel representation of a triangle mesh scene<br />
in the form of an octree structure using the <strong>GPU</strong> hardware<br />
rasterizer. In order to scale to very large scenes, our approach<br />
avoids relying on an intermediate full regular grid to build the<br />
structure and constructs the octree directly.<br />
Speaker(s): Cyril Crassin (Postdoctoral Research Scientist, NVIDIA)<br />
Topic(s): Computer Graphics (Intermediate)<br />
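The session builds the octree directly with the GPU rasterizer. The simpler Python sketch below only illustrates why a sparse octree needs no dense grid: starting from a hypothetical set of occupied leaf voxels, each parent level is obtained by halving coordinates and de-duplicating, and every node records an 8-bit child-occupancy mask. Function names are hypothetical.

```python
def build_sparse_octree(voxels, depth):
    """Occupied nodes per level from a sparse set of leaf voxels (x, y, z),
    each coordinate in [0, 2**depth). levels[depth] holds the leaves and
    levels[0] the root; no dense (2**depth)**3 grid is ever allocated."""
    levels = [set() for _ in range(depth + 1)]
    levels[depth] = set(voxels)
    for d in range(depth, 0, -1):
        # A node's parent is found by halving each coordinate.
        levels[d - 1] = {(x >> 1, y >> 1, z >> 1) for (x, y, z) in levels[d]}
    return levels


def child_masks(levels, d):
    """8-bit child-occupancy mask for every node at level d (d < depth)."""
    masks = {node: 0 for node in levels[d]}
    for (x, y, z) in levels[d + 1]:
        octant = (x & 1) | ((y & 1) << 1) | ((z & 1) << 2)
        masks[(x >> 1, y >> 1, z >> 1)] |= 1 << octant
    return masks
```

Memory grows with the number of occupied voxels times the depth rather than with the full grid volume, which is what lets such structures scale to very large scenes.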
TUESDAY, MAY 15, 14:30 (25 MINUTES)<br />
ROOM A2<br />
S0655 Direct Numerical Simulation of Turbulence-<br />
Chemistry Interactions: Fundamental Insights Towards<br />
Predictive Models<br />
Recent petascale direct numerical simulations (DNS) of turbulent<br />
combustion have transformed our ability to interrogate fine-grained<br />
‘turbulence-chemistry’ interactions in canonical<br />
laboratory configurations. In particular, three-dimensional DNS,<br />
at moderate Reynolds numbers and with complex chemistry, is<br />
providing unprecedented levels of detail to understand<br />
fundamental coupling between turbulence, mixing and reaction.<br />
This information is leading to new physical insight and is providing<br />
unique validation data for assessing model assumptions in<br />
coarse-grained engineering CFD approaches used to design<br />
modern combustors. The role of petascale DNS is illustrated<br />
through selected examples relevant to controlling ignition and<br />
combustion rates in homogeneous charge compression ignition<br />
engines and to fuel injection processes in stationary gas turbines<br />
for power generation. Petascale simulations presently generate<br />
upwards of a petabyte of complex, multi-scale, time-varying data<br />
used by combustion modelers to validate subfilter combustion and<br />
mixing models in large-eddy simulation. With the advent of 10-20<br />
petaflop hybrid architectures with accelerators like Titan at Oak<br />
Ridge National Laboratory, it will be possible to dramatically<br />
increase the chemical complexity of DNS. This will help accelerate<br />
the development of predictive subprocess models which will be<br />
used by engine developers to better understand and tailor the<br />
combustion of gasoline and new, more complex types of fuels in<br />
advanced engines. With Titan, simulations will move beyond<br />
today’s studies of simple fuels—hydrogen, syngas and methane—<br />
to more complex, larger-molecule hydrocarbon fuels like<br />
isooctane (a surrogate for gasoline), commercially important<br />
oxygenated alcohols (for example, ethanol and butanol), and<br />
biofuel surrogates.<br />
Speaker(s): Jacqueline H. Chen (Combustion Research Facility, Sandia<br />
National Laboratories)<br />
Topic(s): Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM L<br />
S0034 Real-Time Risk Simulation: The <strong>GPU</strong> Revolution In<br />
Profit Margin Analysis<br />
Discover how ICHEC helped a world-leading company in its sector<br />
dramatically speed up and improve the quality of its real-time<br />
risk management tool chain. In this session, we present the<br />
method used for porting the core of the simulation engines<br />
to <strong>GPU</strong>s using CUDA. This porting was realized on two very<br />
different simulation algorithms and resulted in speed-ups of 2 to<br />
3 orders of magnitude, allowing much greater accuracy of the<br />
results in a real-time environment.<br />
Speaker(s): Gilles Civario (Senior Software Architect, ICHEC), Renato<br />
Miceli (Computational Scientist, ICHEC)<br />
Topic(s): Finance, Application Design & Porting Techniques, Algorithms<br />
& Numerical Techniques (Intermediate)<br />
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM C<br />
S0036 Multiparticle Collision Dynamics on <strong>GPU</strong>s<br />
See how we employ <strong>GPU</strong>s to simulate the interaction of millions of<br />
solvent and solute particles of a fluid system. The most time-consuming<br />
part of our simulations, traditionally the domain of large cluster systems,<br />
can now be done on desktop PCs in reasonable time.<br />
This contribution shows how <strong>GPU</strong>s can effectively be used to<br />
accelerate existing programs and how techniques like streaming<br />
and increased data locality significantly enhance calculation<br />
throughput. It also shows how a <strong>GPU</strong>-optimized program<br />
structure makes normally expensive additional functionality “almost<br />
free”. Furthermore, a well-scaling single-node/multi-<strong>GPU</strong><br />
implementation of the program is presented.<br />
Speaker(s): Elmar Westphal (Software Developer,<br />
Forschungszentrum Juelich)<br />
Topic(s): Computational Physics, Computational Fluid Dynamics,<br />
Molecular Dynamics (Intermediate)
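Multiparticle collision dynamics is often implemented as stochastic rotation dynamics (SRD). A minimal 2D collision step is sketched below: particles are binned into cells, and each particle's velocity relative to its cell's mean is rotated by a random +alpha or -alpha, which conserves momentum and kinetic energy in every cell. The streaming step and the random grid shift normally used for Galilean invariance are omitted, and nothing here is taken from the speaker's code; function names are hypothetical.

```python
import math
import random


def srd_collision(pos, vel, cell_size, alpha, rng):
    """One 2D SRD collision step. pos and vel are lists of (x, y) tuples.
    Returns the new velocities; positions are unchanged by collisions."""
    cells = {}
    for i, (x, y) in enumerate(pos):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells.setdefault(key, []).append(i)
    new_vel = list(vel)
    for members in cells.values():
        n = len(members)
        ux = sum(vel[i][0] for i in members) / n      # cell mean velocity
        uy = sum(vel[i][1] for i in members) / n
        a = alpha if rng.random() < 0.5 else -alpha   # random rotation sense per cell
        c, s = math.cos(a), math.sin(a)
        for i in members:
            dx, dy = vel[i][0] - ux, vel[i][1] - uy
            new_vel[i] = (ux + c * dx - s * dy, uy + s * dx + c * dy)
    return new_vel


def total_momentum(vel):
    return (sum(v[0] for v in vel), sum(v[1] for v in vel))


def kinetic_energy(vel):
    return 0.5 * sum(v[0] * v[0] + v[1] * v[1] for v in vel)
```

Because cells are independent once particles are binned, both the binning (a sort by cell key) and the per-cell rotations map naturally onto GPU threads.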
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM J2<br />
S0049 Using the <strong>GPU</strong> Direct for Video API<br />
This tutorial will demonstrate how video I/O devices can take<br />
advantage of the <strong>GPU</strong> Direct for Video API to optimize the data<br />
transfer performance for digital video, film, broadcast<br />
and computer vision applications. The <strong>GPU</strong> Direct for<br />
Video API is a technology that permits the DMA transfer of data<br />
buffers between video I/O devices and the <strong>GPU</strong> through the use of<br />
a shared system memory buffer for immediate processing by<br />
OpenGL, DirectX, CUDA and OpenCL. This direct transfer can<br />
improve synchronization and eliminate latency between video<br />
capture, <strong>GPU</strong> processing and video output.<br />
Speaker(s): Alina Alt (Applied Engineer, NVIDIA), Thomas True (Applied<br />
Engineer, NVIDIA)<br />
Topic(s): Audio, Image and Video Processing, Development Tools &<br />
Libraries, Digital Content Creation & Film, Machine Vision (Advanced)<br />
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM A8<br />
S0067 PICon<strong>GPU</strong> - Bringing large-scale Laser Plasma<br />
Simulations to <strong>GPU</strong> Supercomputing<br />
With powerful lasers breaking the Petawatt barrier, applications<br />
for laser-accelerated particle beams are gaining more interest<br />
than ever. Ion beams accelerated by intense laser pulses foster<br />
new ways of treating cancer and make them available to more<br />
people than ever before. Laser-generated electron beams can<br />
drive new compact x-ray sources to create snapshots of ultrafast<br />
processes in materials. With PICon<strong>GPU</strong>, laser-driven particle<br />
acceleration can be computed in hours compared to weeks on<br />
standard CPU clusters. We present the techniques behind<br />
PICon<strong>GPU</strong>, detailed performance analysis and the benefits of<br />
PICon<strong>GPU</strong> for real-world physics cases.<br />
Speaker(s): Michael Bussmann (Junior Group Leader Computational<br />
Radiation Physics, Helmholtz-Zentrum Dresden-Rossendorf), Guido<br />
Juckeland (System Engineer (HPC), Technical University Dresden)<br />
Topic(s): Computational Physics, Algorithms & Numerical Techniques,<br />
Application Design & Porting Techniques, Supercomputing (Advanced)<br />
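Particle-in-cell codes advance their particles with a pusher, and the Boris scheme is the standard choice. The sketch below is the non-relativistic form for a single particle in plain Python. Laser-plasma codes such as PIConGPU need the relativistic variant plus field interpolation and current deposition, none of which is shown, and this is not PIConGPU's code; helper names are hypothetical.

```python
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, k):
    return tuple(x * k for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_push(x, v, q_over_m, e_field, b_field, dt):
    """Advance one non-relativistic particle by dt (3-tuples throughout)."""
    half = 0.5 * q_over_m * dt
    v_minus = add(v, scale(e_field, half))            # first half of electric kick
    t = scale(b_field, half)                           # rotation vector
    s = scale(t, 2.0 / (1.0 + sum(c * c for c in t)))
    v_prime = add(v_minus, cross(v_minus, t))
    v_plus = add(v_minus, cross(v_prime, s))           # magnetic rotation, |v| preserved
    v_new = add(v_plus, scale(e_field, half))          # second half of electric kick
    return add(x, scale(v_new, dt)), v_new


def trajectory(x, v, q_over_m, e_field, b_field, dt, steps):
    """Apply boris_push repeatedly with constant fields; returns final (x, v)."""
    for _ in range(steps):
        x, v = boris_push(x, v, q_over_m, e_field, b_field, dt)
    return x, v
```

The magnetic step is an exact rotation, so in a pure magnetic field the particle speed stays constant to rounding error; each particle is pushed independently, which is why the particle update parallelizes well on GPUs.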
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM A1<br />
S0075 Oculus Real-Time Modular Cognitive Vision System<br />
This session will explore ways to integrate <strong>GPU</strong> processing into a<br />
real-time computer vision architecture. While there has been a<br />
rapid push to move vision algorithms onto <strong>GPU</strong>s, integration into<br />
an efficient vision system architecture remains elusive. We will<br />
discuss our development of a modular vision system architecture<br />
that enables rapid prototyping of complex pipelines using multiple<br />
<strong>GPU</strong>s. The system incorporates modules for segmentation,<br />
disparity mapping, optical flow and particle filter tracking on the<br />
<strong>GPU</strong>. Our talk will explore the various difficulties associated with<br />
developing such a system and will give a hands-on demonstration<br />
of Oculus, our vision platform.<br />
Speaker(s): Jeremie Papon (PhD Student, University of Gottingen),<br />
Alexey Abramov (PhD Student, University of Gottingen)<br />
Topic(s): Computer Vision, Audio, Image and Video Processing,<br />
Application Design & Porting Techniques, Machine Vision (Intermediate)<br />
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM N<br />
S0223 Rapid Training of Acoustic Models Using <strong>GPU</strong>s<br />
Learn how to realize robust and accurate speech recognition<br />
systems by training acoustic models on <strong>GPU</strong>s. For common<br />
languages, state-of-the-art systems are now trained on<br />
thousands of hours of speech data, which can take weeks even<br />
with a large cluster of machines. To overcome this development<br />
bottleneck, we propose a new framework for rapid training of<br />
acoustic models using highly parallel <strong>GPU</strong>s. With a single NVIDIA<br />
GTX580 <strong>GPU</strong>, our proposed approach is shown to be 51x faster<br />
than a sequential CPU implementation, enabling a moderately<br />
sized acoustic model to be trained on 1000-hour speech data in<br />
just over 9 hours.<br />
Speaker(s): Jike Chong (Co-Director of CUDA Research Center,<br />
Carnegie Mellon University), Ian Lane (Assistant Research Professor,<br />
Carnegie Mellon University)<br />
Topic(s): Audio, Image and Video Processing, Machine Learning & AI<br />
(Intermediate)<br />
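The abstract's figures can be cross-checked with simple arithmetic. The sketch below assumes that the 51x speedup also applies to the 1000-hour training run (the abstract does not say so explicitly) and treats "just over 9 hours" as 9 hours, so the results are lower bounds.

```python
def training_figures(speech_hours, gpu_hours, speedup):
    """Derived quantities from the abstract's numbers: how much faster than
    real time the GPU trains, and the implied sequential CPU runtime."""
    real_time_factor = speech_hours / gpu_hours
    cpu_hours = gpu_hours * speedup
    return real_time_factor, cpu_hours, cpu_hours / 24.0


rtf, cpu_h, cpu_days = training_figures(1000, 9, 51)
print(f"{rtf:.0f}x faster than real time; sequential CPU ~{cpu_h:.0f} h (~{cpu_days:.0f} days)")
# prints: 111x faster than real time; sequential CPU ~459 h (~19 days)
```

Roughly 19 days on a single CPU is consistent with the "weeks" the abstract cites for conventional training.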
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0308 Recent Trends in Hierarchical N-body Methods<br />
on <strong>GPU</strong>s<br />
See the newest developments in the area of hierarchical N-body<br />
methods for <strong>GPU</strong> computing. Hierarchical N-body methods have<br />
O(N) complexity, are compute bound, and require very little<br />
synchronization, which makes them a favorable algorithm on<br />
next-generation supercomputers. In this session we will cover<br />
topics such as hybridization of treecodes and fast multipole<br />
methods, auto-tuning kernels for heterogeneous systems, fast tree<br />
construction based on prefix sums, fast load balancing of global<br />
trees, and more. Examples will be given using ExaFMM, an open-source<br />
hierarchical N-body library for heterogeneous systems<br />
developed by the speaker and released at SC11.<br />
Speaker(s): Rio Yokota (Research Scientist, King Abdullah University of<br />
Science and <strong>Technology</strong>)<br />
Topic(s): Algorithms & Numerical Techniques, Supercomputing,<br />
Development Tools & Libraries (Intermediate)<br />
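Prefix-sum-based tree construction typically starts by bucketing particles by cell key with three steps that parallelize well on GPUs: a histogram, an exclusive scan of the counts, and a scatter. Below is a minimal sequential Python sketch of that bucketing step, with hypothetical function names and a simple row-major 2D cell key (tree codes commonly use Morton keys instead); it is not ExaFMM's code.

```python
def cell_keys(points, cells_per_side, box_size):
    """Row-major key of the 2D grid cell containing each point in [0, box_size)^2."""
    keys = []
    for x, y in points:
        ix = min(int(x / box_size * cells_per_side), cells_per_side - 1)
        iy = min(int(y / box_size * cells_per_side), cells_per_side - 1)
        keys.append(iy * cells_per_side + ix)
    return keys


def bucket_by_cell(keys, n_cells):
    """Counting sort of particle indices by cell key.
    Returns (offsets, order): the particles of cell c are order[offsets[c]:offsets[c + 1]]."""
    counts = [0] * n_cells
    for k in keys:                        # 1) histogram (atomic adds on a GPU)
        counts[k] += 1
    offsets = [0] * (n_cells + 1)
    for c in range(n_cells):              # 2) exclusive prefix sum of the counts
        offsets[c + 1] = offsets[c] + counts[c]
    cursor = offsets[:-1]                 # copy: next free slot in each cell
    order = [0] * len(keys)
    for i, k in enumerate(keys):          # 3) scatter each particle into its slot
        order[cursor[k]] = i
        cursor[k] += 1
    return offsets, order
```

Repeating the bucketing with finer keys, or sorting once by Morton key and scanning for cell boundaries, yields each tree level without pointer chasing.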
TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />
ROOM J3<br />
S0349 Tree Accumulation on the <strong>GPU</strong><br />
Learn how to map irregular, tree-structured computations to the<br />
<strong>GPU</strong> efficiently. See how extremely irregular data-dependent<br />
computations can be implemented by composing them out of<br />
regular data-parallel primitives. In particular we focus on the<br />
problem of tree accumulation, a generalization of the scan primitive<br />
to arbitrary tree data structures. We first show how tree orderings<br />
and properties can be computed using the Euler tour technique and<br />
standard scan primitives. Using these orderings we then develop<br />
our new approach to computing tree accumulations in parallel.<br />
Speaker(s): Scott Rostrup (Software Engineer, Synopsys Inc)<br />
Topic(s): Algorithms & Numerical Techniques, Application Design &<br />
Porting Techniques (Advanced)<br />
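A minimal Python sketch of the Euler-tour idea: the tour records when each node is entered and exited, after which one scan yields subtree sums (an upward accumulation) and another yields node depths. The sequential DFS that builds the tour stands in for the parallel list-ranking step a GPU implementation would use; function names are hypothetical.

```python
from itertools import accumulate


def euler_tour(children, root=0):
    """Iterative DFS giving each node's entry and exit positions in an Euler
    tour where every node appears twice (positions 0 .. 2n-1)."""
    n = len(children)
    enter, exit_ = [0] * n, [0] * n
    pos = 0
    stack = [(root, False)]
    while stack:
        node, leaving = stack.pop()
        if leaving:
            exit_[node] = pos
        else:
            enter[node] = pos
            stack.append((node, True))         # exit after all descendants
            for c in reversed(children[node]):
                stack.append((c, False))
        pos += 1
    return enter, exit_


def subtree_sums(children, values, root=0):
    """Upward accumulation from one scan: place each value at its node's entry
    slot, prefix-sum, and take differences between exit and entry."""
    enter, exit_ = euler_tour(children, root)
    seq = [0] * (2 * len(children))
    for v, val in enumerate(values):
        seq[enter[v]] = val
    prefix = [0] + list(accumulate(seq))       # prefix[i] = sum(seq[:i])
    return [prefix[exit_[v]] - prefix[enter[v]] for v in range(len(children))]


def depths(children, root=0):
    """Node depths from a scan of +1 (entry) and -1 (exit) markers."""
    enter, exit_ = euler_tour(children, root)
    seq = [0] * (2 * len(children))
    for v in range(len(children)):
        seq[enter[v]], seq[exit_[v]] = 1, -1
    prefix = list(accumulate(seq))             # inclusive scan
    return [prefix[enter[v]] - 1 for v in range(len(children))]
```

For the tree `children = [[1, 2], [3, 4], [], [], []]` with values `[1, 2, 3, 4, 5]`, the subtree sums are `[15, 11, 3, 4, 5]` and the depths are `[0, 1, 1, 2, 2]`.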
TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />
ROOM J1<br />
S0403 NURBS Tessellation with CUDA<br />
NURBS, or Non-Uniform Rational B-Splines, are a curved surface<br />
representation commonly used in computer aided design and<br />
digital content creation. This recursive representation gives a great<br />
deal of flexibility, allowing arbitrary surface order and knot vectors,<br />
enabling a single NURBS surface to contain many contiguous<br />
patches. However, this recursive representation is also expensive to<br />
compute, so a NURBS surface is often converted into multiple<br />
Bezier patches before being tessellated. In this implementation, we<br />
present an efficient method for directly tessellating NURBS<br />
surfaces using the NVIDIA CUDA computing API.<br />
Speaker(s): Brent Oster (Applied Engineer, NVIDIA)<br />
Topic(s): Computer Graphics (Advanced)<br />
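The recursive basis evaluation that makes direct NURBS tessellation expensive can be sketched in a few lines. Below is a minimal Python sketch of NURBS curve evaluation using the Cox-de Boor recursion. A surface point is the tensor-product analogue (a basis in u times a basis in v), and in a CUDA tessellator each output vertex could be evaluated by its own thread. This is not the speaker's method, and the function names are hypothetical.

```python
def basis(i, p, u, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so u == knots[-1] is still covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, u, knots))
    return left + right


def nurbs_point(u, degree, ctrl, weights, knots):
    """2D NURBS curve point: weighted basis combination divided by total weight.
    Requires len(knots) == len(ctrl) + degree + 1."""
    wx = wy = w = 0.0
    for i, ((x, y), wi) in enumerate(zip(ctrl, weights)):
        b = basis(i, degree, u, knots) * wi
        wx += b * x
        wy += b * y
        w += b
    return wx / w, wy / w
```

With control points (1, 0), (1, 1), (0, 1), weights 1, sqrt(2)/2, 1 and knots [0, 0, 0, 1, 1, 1], every evaluated point lies on the unit circle, the classic demonstration of what the rational form adds over plain B-splines.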
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM A3<br />
S0407 A High Level <strong>Program</strong>ming Environment for<br />
Accelerated Computing<br />
One of the critical hurdles for the widespread adoption of accelerated<br />
computing in HPC is programming difficulty. Users need a simple<br />
programming model that is portable and is not significantly different<br />
from the approaches used on current multi-core x86 processors. In<br />
this talk I will present Cray’s strategy for accelerator programming,<br />
which is based on a high-level programming environment with tightly<br />
coupled compilers, libraries, and tools. Ease of use comes from<br />
compilers that let users write applications in Fortran, C and<br />
C++, tools that help users port and optimize for accelerators, and<br />
auto-tuned scientific libraries.<br />
Speaker(s): Luiz DeRose (Director of <strong>Program</strong>ming Environment,<br />
Cray Inc.)<br />
Topic(s): Development Tools & Libraries, Parallel <strong>Program</strong>ming<br />
Languages & Compilers (Intermediate)<br />
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM A5<br />
S0413 Delivering 3D Professional Graphics from the<br />
Cloud with Citrix XenDesktop<br />
Recent technological advances have made it practical to deliver<br />
3D professional graphics applications from the Cloud (private or<br />
public) with a high quality user experience and at an attractive<br />
cost. Organizations can keep their intellectual property safe in the<br />
data center since only fully-rendered screen images are sent over<br />
the network. Users in remote locations no longer have to wait for<br />
large file transfers. And they can access 3D models from a wide<br />
variety of devices, including iPads and Android tablets. Learn how<br />
Citrix XenDesktop, XenServer and Receiver technologies have<br />
made all of this a reality for many organizations today.<br />
Speaker(s): Derek Thorslund (Director of Product Management, Citrix<br />
Systems, Inc.)<br />
Topic(s): Cloud Computing, Computer Graphics, Visualization (Beginner)<br />
TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />
ROOM A7<br />
S0436 Integrated <strong>GPU</strong> Acceleration With Real Time<br />
Visualization Of Terabyte Data<br />
Computation and visualization don’t necessarily have to act as<br />
two separate entities. This talk explains the integration of real-time<br />
compute with real-time visualization. Industry and academia have<br />
provided attractive solutions for compiler-directive-optimized code<br />
for computations. To support cases that involve massive yet ad-hoc<br />
data I/O and computation with interactive visualization, Hue<br />
developed a different model which bridges the gap between<br />
“complete system rewrite” and “compiler directive optimized code”.<br />
The talk explains how highly optimized data I/O mechanisms<br />
coupled with predefined input and output definitions for kernels<br />
provide excellent scalability and interactivity during runtime.<br />
Speaker(s): Kelly Walker (Senior Software Developer, Hue)<br />
Topic(s): Visualization, Energy Exploration (Beginner)<br />
TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />
ROOM B<br />
S0611 Edge-Aware Shaders for Real-Time<br />
Computer Graphics<br />
The most common approach in rendering is to define behavior at a<br />
point in terms of material properties and incident illumination.<br />
That approach works well when the geometry and material<br />
properties are well known and the physics of light is simulated<br />
accurately. We present a technique to help in situations where the<br />
model, the physics, or both are incomplete. This technique augments<br />
shaders with information about nearby edges, such as corners<br />
and boundaries between materials, and makes it natural to add<br />
richness procedurally near these visually critical regions.<br />
Speaker(s): Peter-Pike Sloan (Principal Research Scientist, NVIDIA)<br />
Topic(s): Computer Graphics (Intermediate)<br />
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM M<br />
S0620 VSIPL++: A High-Level <strong>Program</strong>ming Model<br />
for Productivity and Performance (Presented by<br />
Mentor Graphics)<br />
Learn how VSIPL++ can improve your productivity and provide<br />
software portability, without sacrificing performance. We will<br />
describe how VSIPL++’s open-standard high-level programming<br />
model addresses the challenges of writing high-performance<br />
embedded software on GP-<strong>GPU</strong>s and other heterogeneous<br />
hardware, using advanced C++ techniques and data abstraction –<br />
and how we make this work in the real world. We will also present<br />
a comparison of performance results from various configurations<br />
of CPU and GP-<strong>GPU</strong> processing engines for a signal processing<br />
application developed using VSIPL++.<br />
Speaker(s): Brooks Moses, Ph.D. (Sourcerer, Mentor<br />
Graphics Corporation)<br />
Topic(s): Supercomputing (Beginner)<br />
TUESDAY, MAY 15, 15:00 (25 MINUTES)<br />
ROOM A2<br />
S0625 S3D Direct Numerical Simulation - Preparations<br />
for the 10-100PF Era<br />
The evolution of supercomputing into the mid-petaflop era has<br />
been typified by heterogeneous compute nodes with the majority of<br />
the compute capability delivered by a large number of lightweight<br />
cores. To prepare for the continuation of this trend, the DNS<br />
code S3D has been retooled in anticipation of a target architecture<br />
offering tens of thousands of heterogeneous nodes containing many<br />
x86 cores as well as <strong>GPU</strong>-derived accelerators. Movement of outer<br />
loops to the highest level in the code facilitates hybrid MPI-OpenMP<br />
performance and an elegant path to accelerated kernels using<br />
OpenACC. It is anticipated that relevant scientific simulations at this<br />
scale will have a per-node footprint that can be contained entirely<br />
on the accelerator, so provision is made to maintain primary<br />
solution variables in accelerator memory with specific regions<br />
moved to the CPU for inter-node communication and workload<br />
balancing. Based on current performance, it is estimated that the<br />
new code will make it possible to meet early science goals with the<br />
full build-out of the anticipated Titan system as well as provide a<br />
platform to transition into the exascale software research space.<br />
Speaker(s): Ray Grout (National Renewable Energy Laboratory)<br />
Topic(s): Supercomputing (Beginner)
TUESDAY, MAY 15, 15:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0802 CUDA Profiler Training on Windows<br />
Nsight offers a comprehensive set of performance analysis tools.<br />
With the ability to trace complete-system multi-core CPU and<br />
multi-<strong>GPU</strong> activity, and to profile CUDA kernels with precise<br />
profiling experiments, developers can identify system-level optimization<br />
opportunities as well as expensive and inefficient CUDA kernels<br />
that require in-depth analysis with the CUDA profiler. Through a set<br />
of comprehensive exercises, attendees will learn to use<br />
these features and become proficient at optimizing complex<br />
CUDA applications.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />
ROOM K<br />
S0152 Accurate Sequence Alignment using Distributed<br />
Filtering on <strong>GPU</strong> Clusters<br />
Learn how <strong>GPU</strong>s enable new ways to rethink a complex<br />
bioinformatics problem: accurate sequence alignment. What was<br />
once prohibitive to compute can become the basic building block of novel<br />
<strong>GPU</strong>-based algorithms. Modern DNA sequencing machines<br />
generate enormous numbers of short sequences within minutes,<br />
which should be aligned to a reference genome in real time.<br />
Most solutions only find a few locations that match a short<br />
sequence. We introduce a new technique to find all matching<br />
locations inside a reference sequence for a given number of<br />
mismatches. Our technique is based on a distributed filtering<br />
scheme and <strong>GPU</strong> based processing.<br />
Speaker(s): Reza Farivar (PhD Student, University of Illinois at Urbana-<br />
Champaign), Shivaram Venkataraman (PhD Student, UC Berkeley)<br />
Topic(s): Bioinformatics, Algorithms & Numerical Techniques<br />
(Intermediate)<br />
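A hedged illustration of the all-locations, k-mismatch problem described above. The abstract does not detail the distributed filtering scheme, so this Python sketch swaps in a textbook pigeonhole filter (split the read into k + 1 pieces; any placement with at most k mismatches contains at least one piece exactly) followed by independent verification of each candidate. `find_k_mismatch` is a hypothetical name, and this is a CPU reference, not the speakers' GPU code.

```python
def find_k_mismatch(reference: str, read: str, k: int) -> list[int]:
    """Return every start offset where `read` matches `reference` with at most k mismatches."""
    m = len(read)
    if m < k + 1:
        raise ValueError("read must be at least k + 1 characters long")
    # Filter: split the read into k + 1 non-overlapping pieces. A placement with at most
    # k mismatches leaves at least one piece mismatch-free (pigeonhole principle).
    piece_len = m // (k + 1)
    candidates = set()
    for p in range(k + 1):
        start = p * piece_len
        end = m if p == k else start + piece_len
        piece = read[start:end]
        pos = reference.find(piece)
        while pos != -1:
            origin = pos - start  # where the whole read would begin
            if 0 <= origin <= len(reference) - m:
                candidates.add(origin)
            pos = reference.find(piece, pos + 1)  # keep scanning, overlaps included
    # Verification: each candidate is checked independently, so this step is easy
    # to spread across GPU threads or cluster nodes.
    return [
        origin
        for origin in sorted(candidates)
        if sum(a != b for a, b in zip(reference[origin:origin + m], read)) <= k
    ]


print(find_k_mismatch("ACGTACGT", "ACGA", 1))  # → [0, 4]
```

Because every candidate is verified on its own, both the piece lookups and the verification pass can be partitioned across GPUs or nodes; how the speakers distribute that work is not described here.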
TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />
ROOM J3<br />
S0316 Using <strong>GPU</strong>s to Accelerate Synthetic Aperture<br />
Sonar Imaging via Backpropagation<br />
This presentation describes our development of a <strong>GPU</strong>-accelerated<br />
backpropagation implementation for Synthetic<br />
Aperture Sonar systems that supports multiple nodes via MPI and<br />
multi-<strong>GPU</strong> nodes. This implementation can form a complex-valued<br />
gigapixel image in one hour on a single C2050. We further<br />
scale this implementation to the Keeneland system where we can<br />
form the same gigapixel image in 21 seconds on 48 nodes with<br />
144 C2070 Tesla <strong>GPU</strong>s. Our talk will discuss the details of our<br />
implementation, including our optimizations and scaling results<br />
for various node and <strong>GPU</strong> configurations, as well as the<br />
applicability to other domains, including Synthetic Aperture Radar.<br />
Speaker(s): Thomas Benson (Research Engineer II, Georgia Tech<br />
Research Institute)<br />
Topic(s): Application Design & Porting Techniques (Intermediate)<br />
TUESDAY, MAY 15, 15:30 (50 MINUTES)<br />
ROOM J1<br />
S0366 OptiX Out-of-Core and CPU Rendering<br />
OptiX has recently broken some major barriers by enabling<br />
out-of-core rendering of scenes that exceed <strong>GPU</strong> memory and by adding<br />
a CPU rendering back-end for systems in which no OptiX-capable <strong>GPU</strong> is<br />
present. OptiX users and CUDA developers will be interested in<br />
how we accomplished these feats within the existing <strong>GPU</strong><br />
architecture. This talk will provide a brief introduction to OptiX and<br />
then dive into what the new features provide. We will then go<br />
under the covers and show how we pulled it off.<br />
Speaker(s): David McAllister (OptiX Manager, NVIDIA, OptiX group)<br />
Topic(s): Ray Tracing, Computer Graphics (Intermediate)<br />
TUESDAY, MAY 15, 15:30 (50 MINUTES)<br />
ROOM B<br />
S0409 Stochastic Rasterization<br />
Learn how to render transparency, motion blur, and depth of field<br />
effects in real time using random sampling. These effects<br />
combine multiple objects in each pixel, making them expensive to<br />
compute directly. But recent research shows that, with stratified<br />
sampling and clever reconstruction, good image quality can be<br />
achieved with surprisingly small numbers of samples per pixel.<br />
We will explain how to do this on the <strong>GPU</strong>, and explore trade-offs<br />
of performance, quality, accuracy, and noise.<br />
Speaker(s): Eric Enderton (Research Scientist, NVIDIA), Morgan<br />
McGuire (Visiting Professor, NVIDIA and Williams College)<br />
Topic(s): Computer Graphics, Digital Content Creation & Film<br />
(Intermediate)<br />
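As a rough illustration of why stratification helps, the Python sketch below places one jittered sample in each cell of an n x n grid over a pixel and estimates how much of the pixel a hypothetical edge covers. Only cells the edge actually crosses can contribute error, so with a 4 x 4 grid and the diagonal edge used here the estimate can only fall between 0.375 and 0.625, whereas 16 purely random samples could land anywhere from 0 to 1. The function names are illustrative; the reconstruction filters mentioned in the session are not shown.

```python
import random


def stratified_samples(n: int, rng: random.Random) -> list[tuple[float, float]]:
    """Return n * n jittered sample positions in the unit pixel, one per grid cell."""
    cell = 1.0 / n
    return [((i + rng.random()) * cell, (j + rng.random()) * cell)
            for j in range(n) for i in range(n)]


def coverage(inside, samples: list[tuple[float, float]]) -> float:
    """Estimate the fraction of the pixel for which `inside(x, y)` is true."""
    return sum(1 for x, y in samples if inside(x, y)) / len(samples)


def below_diagonal(x: float, y: float) -> bool:
    """Hypothetical edge: the half-plane x + y < 1 covers exactly half the pixel."""
    return x + y < 1.0


rng = random.Random(7)
# With a 4 x 4 grid the estimate stays between 0.375 and 0.625.
print(coverage(below_diagonal, stratified_samples(4, rng)))
```

Motion blur and depth of field apply the same idea to extra sampling dimensions (time and lens position), which this 2D sketch omits.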
TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />
ROOM A7<br />
S0444 Explore New Techniques in Volume Rendering/<br />
Segmentation with Open Inventor<br />
The goal of this session is to show the improvements in quality,<br />
performance and flexibility of the volume rendering implementation<br />
of Open Inventor. The latest <strong>GPU</strong> techniques, such as virtual<br />
textures and ray casting, have been combined into a flexible shader<br />
API and applied to out-of-core data. The techniques of volume<br />
rendering, sugarcube rendering, basic and complex clipping,<br />
sculpting, editing and segmentation will be demonstrated using<br />
examples from a geobody extraction workflow. The ease of use and<br />
flexibility of the shader pipeline API will be illustrated, and we will<br />
discuss future directions for the technology.<br />
Speaker(s): Mike Heck (<strong>Technology</strong> Advisor, VSG)<br />
Topic(s): Computer Graphics (Advanced)<br />
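Volume ray casting, one of the techniques listed above, ultimately reduces to compositing the samples taken along each viewing ray. The Python sketch below shows generic front-to-back alpha compositing with early ray termination; it is not Open Inventor's shader API, and `composite_front_to_back` and the 0.99 cutoff are assumptions made for illustration.

```python
def composite_front_to_back(samples, opacity_cutoff: float = 0.99) -> tuple[float, float]:
    """Blend (color, alpha) samples taken front to back along one viewing ray.

    Colors are single grayscale values for brevity; an RGB version repeats the
    color update per channel. Returns the accumulated (color, alpha).
    """
    color_acc, alpha_acc = 0.0, 0.0
    for color, alpha in samples:
        weight = (1.0 - alpha_acc) * alpha  # share of this sample not hidden in front of it
        color_acc += weight * color
        alpha_acc += weight
        if alpha_acc >= opacity_cutoff:  # early ray termination: the rest is invisible
            break
    return color_acc, alpha_acc


# A half-transparent white sample in front of a half-transparent black one.
print(composite_front_to_back([(1.0, 0.5), (0.0, 0.5)]))  # → (0.5, 0.75)
```

Front-to-back order is the usual choice on the GPU because a ray can stop as soon as it becomes nearly opaque, skipping samples that could no longer change the pixel.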
TUESDAY, MAY 15, 15:30 (25 MINUTES)<br />
ROOM A2<br />
S0654 Fusion Energy Sciences & Computing at the<br />
Extreme Scale<br />
The fusion energy sciences community has made excellent progress<br />
in developing advanced codes for which computer run-time and<br />
problem size scale well with the number of processors on massively<br />
parallel supercomputers. A good example is the effective usage of<br />
the full power of modern leadership class computational platforms<br />
from the terascale to the petascale and beyond to produce nonlinear<br />
particle-in-cell simulations, which have accelerated progress in<br />
understanding the nature of plasma turbulence in magnetically confined<br />
high-temperature plasmas. Illustrative results provide great<br />
encouragement for being able to include increasingly realistic<br />
dynamics in extreme-scale computing campaigns to enable<br />
predictive simulations with unprecedented physics fidelity.<br />
Speaker(s): William Tang (Fusion Simulation <strong>Program</strong> at the Princeton<br />
Plasma Physics Laboratory (PPPL), Princeton)<br />
Topic(s): Supercomputing (Intermediate)<br />
39 CONFERENCE GUIDE TUESDAY
The Many-Core Company<br />
Discover our global solutions for many-core programming:<br />
Software tools<br />
Expertise<br />
and the methodology to safely port your code<br />
www.caps-entreprise.com
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM K<br />
S0008 Algorithms and Tools for Bioinformatics on <strong>GPU</strong>s<br />
Learn how to use <strong>GPU</strong>s to accelerate compute- and data-intensive<br />
applications and algorithms in bioinformatics. High-throughput<br />
techniques for DNA sequencing and gene expression analysis with<br />
microarrays have led to a rapid growth in the amount of digital<br />
biological data; e.g., the NCBI Sequence Read Archive (SRA) houses<br />
raw sequence data generated by next-generation sequencing (NGS)<br />
technologies that exceeds 25 trillion base pairs. Therefore,<br />
modern bioinformatics tools need to be scalable; that is, they need to<br />
handle an ever-growing amount of data. <strong>GPU</strong>s and CUDA provide<br />
the opportunity to significantly reduce the runtime of many<br />
biological algorithms on inexpensive hardware.<br />
Speaker(s): Bertil Schmidt (Nanyang Technological University)<br />
Topic(s): Bioinformatics, Life Sciences (Intermediate)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM J3<br />
S0050 High Performance Logic Simulation with <strong>GPU</strong>s<br />
Verification has become the bottleneck of the IC design process due to<br />
rapidly increasing design complexity. The fundamental means of verifying<br />
digital circuits is logic simulation, which can be performed at both<br />
the register-transfer level (RTL) and the gate level. In this work, we<br />
developed <strong>GPU</strong>-based logic simulation solutions. We implemented<br />
a Chandy-Misra-Bryant parallel simulation protocol on <strong>GPU</strong>s to expose<br />
sufficient parallelism. A dynamic <strong>GPU</strong> memory allocator was<br />
introduced to efficiently manage <strong>GPU</strong> memory resources. RTL<br />
simulation is performed in a compiled-code scheme by translating<br />
Verilog code into equivalent CUDA code. Experimental results<br />
show that the <strong>GPU</strong> simulators significantly outperform their<br />
CPU counterparts.<br />
Speaker(s): Yangdong Deng (Associate Professor, Tsinghua University)<br />
Topic(s): General Interest, Algorithms & Numerical Techniques<br />
(Advanced)<br />
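The Chandy-Misra-Bryant protocol used in this work is an asynchronous, event-driven scheme and is too involved for a short example. As a simpler and clearly different point of comparison, the Python sketch below shows levelized compiled-code gate simulation: gates are sorted by depth once, then every gate is evaluated in order for each input vector. The netlist format, the gate set and the full-adder example are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Two-input gate functions on 0/1 values.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def levelize(netlist: dict[str, tuple[str, str, str]]) -> list[list[str]]:
    """Group gates by depth; gates in one level depend only on earlier levels.

    `netlist` maps an output net to (op, input_a, input_b). Nets that are not
    keys are treated as primary inputs.
    """
    graph = {out: {a, b} for out, (_op, a, b) in netlist.items()}
    depth: dict[str, int] = {}
    for net in TopologicalSorter(graph).static_order():  # inputs before their users
        if net in netlist:
            _op, a, b = netlist[net]
            depth[net] = 1 + max(depth[a], depth[b])
        else:
            depth[net] = 0  # primary input
    levels: list[list[str]] = [[] for _ in range(max(depth.values()))]
    for net, d in depth.items():
        if d > 0:
            levels[d - 1].append(net)
    return levels


def simulate(netlist, levels, inputs: dict[str, int]) -> dict[str, int]:
    """Evaluate every gate once for one input vector, level by level."""
    values = dict(inputs)
    for level in levels:
        # Gates within a level are independent, so a GPU could assign one thread each.
        for net in level:
            op, a, b = netlist[net]
            values[net] = OPS[op](values[a], values[b])
    return values


# Hypothetical example netlist: a one-bit full adder.
FULL_ADDER = {
    "s1": ("XOR", "a", "b"),
    "sum": ("XOR", "s1", "cin"),
    "c1": ("AND", "a", "b"),
    "c2": ("AND", "s1", "cin"),
    "cout": ("OR", "c1", "c2"),
}
LEVELS = levelize(FULL_ADDER)
print(simulate(FULL_ADDER, LEVELS, {"a": 1, "b": 1, "cin": 0}))  # sum 0, cout 1
```

Levelization makes the parallelism explicit (all gates in one level are independent) but evaluates every gate on every vector, whereas event-driven protocols such as Chandy-Misra-Bryant only process gates whose inputs change.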
TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />
ROOM A1<br />
S0062 Inverse 3D Vision: Detection and Tracking of<br />
NVIDIA Glasses<br />
Computer vision is becoming increasingly popular and important.<br />
With the advent of powerful mobile devices and the<br />
increasing power of desktop PCs, it is important to improve user<br />
experience by tackling the hardest problems of real-time<br />
interaction with the user. These include body-part tracking and face<br />
and gesture recognition. This talk discusses techniques behind an<br />
interaction pattern between a user and a 3D visualization system, in<br />
which the system tracks the position of NVIDIA 3D Vision Glasses<br />
and takes this information into account during rendering. These<br />
techniques include Histograms of Oriented Gradients and template<br />
matching. The system implementation is also discussed.<br />
Speaker(s): Anton Obukhov (Engineering Consultant, Ubiquiti Networks)<br />
Topic(s): Computer Vision, Machine Vision, Development Tools &<br />
Libraries (Advanced)<br />
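Template matching, one of the techniques named above, slides a small reference image across a frame and scores every position. The Python sketch below uses the sum of squared differences as the score; the scoring function, the names and the toy image are assumptions (production trackers often prefer normalized cross-correlation, which tolerates lighting changes), and Histograms of Oriented Gradients are not shown.

```python
def match_template_ssd(image: list[list[float]], template: list[list[float]]) -> tuple[int, int, float]:
    """Return (row, col, score) of the window with the smallest sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template must fit inside the image")
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Each window score is independent of the others, which maps naturally
            # onto one GPU thread (or thread block) per candidate position.
            score = sum((image[r + y][c + x] - template[y][x]) ** 2
                        for y in range(th) for x in range(tw))
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


# A 5 x 5 dark image with a bright 2 x 2 patch whose top-left corner is at row 2, col 1.
img = [[0] * 5 for _ in range(5)]
img[2][1], img[2][2], img[3][1], img[3][2] = 9, 8, 7, 6
print(match_template_ssd(img, [[9, 8], [7, 6]]))  # → (2, 1, 0)
```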
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM A3<br />
S0089 Accelerator Directives, OpenACC and OpenMP4ACC<br />
Rather than requiring programmers to rewrite code for<br />
accelerators, several directive sets have been created and<br />
proposed to support both non-cache-coherent and cache-coherent<br />
accelerators. This talk will present the OpenACC specification and<br />
its implementation for Cray developers, and will also touch on a<br />
similar proposal being evaluated by the OpenMP language<br />
committee. The presentation will start by discussing the memory<br />
and execution models needed to allow a programmer to write<br />
code that runs effectively on both distinct-memory systems<br />
and unified-memory systems. Once this background has been<br />
established, the directives will be examined through usage examples.<br />
Speaker(s): James Beyer (Software Engineer, Cray Inc.), David Oehmke<br />
(Cray Inc.)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers,<br />
Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />
ROOM C<br />
S0108 An Innovative Massively Parallelized Molecular<br />
Dynamics Software<br />
In this session, we present how we improved the performance of the<br />
electronic structure calculator VASP by more than an order of<br />
magnitude. Recent research at IFP Energies<br />
Nouvelles has shown that by coupling traditional clusters or<br />
High Performance Computing (HPC) machines with accelerators<br />
based on graphics processing units (<strong>GPU</strong>s), recoding the most<br />
time-consuming parts of the codes (with programming languages<br />
like CUDA and OpenCL) and offloading them onto the graphics chips, it is<br />
possible to reduce computing time, achieving speedups by a<br />
factor of 5 to 15.<br />
Speaker(s): Thomas Guignon (Research Engineer, IFPEN), Ani Anciaux<br />
Sedrakian (IFP Energies Nouvelles)<br />
Topic(s): Molecular Dynamics, Supercomputing, Application Design &<br />
Porting Techniques (Intermediate)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0221 1024 Bit Parallel Rational Arithmetic Operators for<br />
the <strong>GPU</strong><br />
Learn how to create a set of rational arithmetic operators that<br />
manipulate 1024 bit operands on a Tesla C2050. These operators<br />
are used to create a numerically stable implementation for Bessel<br />
functions. Naive implementations of the Bessel functions produce<br />
unreliable results when they are used to solve Maxwell’s<br />
equations by way of Mie theory. Maxwell’s equations are used to<br />
model the scattering of light by small particles. Light scattering is<br />
used in particle characterization to measure the quality of<br />
materials like cocoa, cement and pharmaceuticals.<br />
Speaker(s): Robert Zigon (Sr. Staff Development Engineer,<br />
Beckman Coulter)<br />
Topic(s): Algorithms & Numerical Techniques, Computational Physics<br />
(Intermediate)<br />
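To make the idea of fixed-width rational operators concrete, here is a Python sketch that keeps every value as a reduced fraction and rejects any result whose numerator or denominator would not fit in a signed 1024-bit word. It relies on Python's arbitrary-precision integers, so it is a CPU reference under stated assumptions (the class name, the signed range and the overflow policy are hypothetical), not the Tesla C2050 implementation described in the session. The last lines contrast it with double precision, where adding 1 to 10^16 is lost to rounding.

```python
from math import gcd

BITS = 1024
LIMIT = 1 << (BITS - 1)  # assumed: numerator and denominator must fit a signed 1024-bit word


class Rat1024:
    """Exact rational number whose reduced numerator and denominator fit in 1024 bits."""

    __slots__ = ("num", "den")

    def __init__(self, num: int, den: int = 1):
        if den == 0:
            raise ZeroDivisionError("zero denominator")
        if den < 0:  # keep the sign in the numerator
            num, den = -num, -den
        g = gcd(num, den)  # reduce so equal values share one representation
        num, den = num // g, den // g
        if not (-LIMIT <= num < LIMIT and den < LIMIT):
            raise OverflowError("result does not fit in 1024-bit operands")
        self.num, self.den = num, den

    # Intermediate products may need up to 2048 bits before reduction; a fixed-width
    # GPU implementation would need double-width scratch space for them.
    def __add__(self, other: "Rat1024") -> "Rat1024":
        return Rat1024(self.num * other.den + other.num * self.den, self.den * other.den)

    def __sub__(self, other: "Rat1024") -> "Rat1024":
        return Rat1024(self.num * other.den - other.num * self.den, self.den * other.den)

    def __mul__(self, other: "Rat1024") -> "Rat1024":
        return Rat1024(self.num * other.num, self.den * other.den)

    def __truediv__(self, other: "Rat1024") -> "Rat1024":
        return Rat1024(self.num * other.den, self.den * other.num)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Rat1024) and (self.num, self.den) == (other.num, other.den)

    def __repr__(self) -> str:
        return f"Rat1024({self.num}, {self.den})"


big, one = Rat1024(10**16), Rat1024(1)
print((1e16 + 1.0) - 1e16)  # → 0.0: the 1 is lost to double-precision rounding
print((big + one) - big)    # → Rat1024(1, 1)
```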
TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />
ROOM A8<br />
S0245 Porting Legacy Plasma Codes to <strong>GPU</strong><br />
Learn how to port legacy Fortran plasma codes to <strong>GPU</strong>s. Many legacy<br />
plasma codes are written in Fortran and contain many lines of code.<br />
We will discuss techniques for porting such legacy codes easily and<br />
efficiently to CUDA C/C++. Performance analysis of major algorithmic<br />
patterns in plasma codes will be discussed. The discussion will use<br />
the <strong>GTC</strong> and GeFi plasma codes as realistic examples.<br />
Speaker(s): Peng Wang (Devtech Engineer, NVIDIA)<br />
Topic(s): Computational Physics (Intermediate)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM A5<br />
S0261 Scalable <strong>GPU</strong> Computing Service Architecture<br />
In this session we describe our <strong>GPU</strong> accelerated computing<br />
service which supports several internal business processes in a<br />
large scale company setup. The service supports diverse<br />
computational needs such as on-demand rendering, mesh<br />
optimization, a Massive Multiplayer Online Game (MMO), product<br />
visualizations and other demanding computational tasks. We<br />
present the architectural considerations for a service-oriented<br />
computational framework and the practical lessons learned and<br />
opportunities encountered while developing an enterprise<br />
system using NVIDIA technologies such as CUDA, OptiX, OpenGL<br />
and OpenCL. Our aim is to share knowledge and present LEGO’s<br />
vision for a <strong>GPU</strong> accelerated computational platform as a<br />
business-driven technology.<br />
Speaker(s): Henrik Høj Madsen (Solution Architect, LEGO), Michael<br />
Schøler (Senior Consultant, LEGO)<br />
Topic(s): Cloud Computing, Computer Graphics, Ray Tracing<br />
(Intermediate)<br />
TUESDAY, MAY 15, 16:00 (25 MINUTES)<br />
ROOM A7<br />
S0336 <strong>GPU</strong> Acceleration for Seismic<br />
Interpretation Algorithms<br />
The oil and gas industry is already leveraging <strong>GPU</strong>s for seismic<br />
data processing, but what about 3D seismic interpretation? This<br />
session will cover how the <strong>GPU</strong> is being used by TerraSpark<br />
Geosciences to dramatically decrease the runtime of algorithms<br />
for enhancing faults, computing horizon orientation, and<br />
calculating volumetric curvature. We will share our experiences in<br />
porting these techniques to the <strong>GPU</strong>, the challenges encountered,<br />
the solutions found, and, of course, the benefits to execution time.<br />
Speaker(s): Jonathan Marbach (Director, Software Architecture and<br />
Engineering, TerraSpark Geosciences)<br />
Topic(s): Energy Exploration (Beginner)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM J2<br />
S0356 Optimized Texture Transfers<br />
Many real-world graphics applications need to transfer textures<br />
efficiently in and out of <strong>GPU</strong> memory in the form of 2D images,<br />
2.5D terrains or 3D volumes as well as their time-varying<br />
counterparts. The first part of this talk covers technical pointers<br />
on how to optimize your OpenGL application to overlap transfers<br />
with rendering using the NVIDIA Copy Engines. The second part<br />
demonstrates the integration and performance of this feature<br />
within a real-world, latency-sensitive broadcast graphics<br />
application from VizRT.<br />
Speaker(s): Shalini Venkataraman (Senior Applied Engineer, NVIDIA),<br />
Gerhard Lang (Chief Engineering Officer, VizRT)<br />
Topic(s): Computer Graphics, Visualization
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM L<br />
S0435 Leveraging GP<strong>GPU</strong> <strong>Technology</strong> for Valuation of<br />
Complex Insurance Products<br />
We share our experiences moving a mature, large scale insurance<br />
application from a CPU to <strong>GPU</strong> environment. This session explores<br />
the nuances of porting a C++ application when ‘blank sheet’<br />
re-architecture is not an option. This session will cover: how insurance<br />
products differ from other financial products (and the implications for<br />
the <strong>GPU</strong>); considerations when moving an existing, fully featured<br />
C++ system to a GP<strong>GPU</strong> platform; supporting CPU and <strong>GPU</strong><br />
implementations from a single code base; supporting user-defined<br />
code extensions on the <strong>GPU</strong>; CUDA 4.0 C++ extensions (experiences,<br />
challenges and limitations); and a performance case study.<br />
Speaker(s): Chris Stiefeling (Oliver Wyman Financial Services)<br />
Topic(s): Finance (Intermediate)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM N<br />
S0526 Tools for Mobile Computational Photography<br />
This session covers advances in mobile computational<br />
photography and the tools that NVIDIA is putting together to<br />
enable them on Tegra-powered devices. It will demonstrate the<br />
use of FCam, an Application <strong>Program</strong>ming Interface (API) that<br />
allows for easy and precise control of the camera system. In<br />
addition, the FCam API can enable the application developer to<br />
replace basic camera routines such as metering, which are<br />
typically hidden inside black boxes in traditional camera<br />
programming models.<br />
Speaker(s): Alejandro Troccoli (Mobile Imaging Researcher, NVIDIA)<br />
Topic(s): Computational Photography (Intermediate)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM M<br />
S0638 Lenovo ThinkStation Accelerates Medical<br />
Research with Beckman Coulter (Presented by Lenovo)<br />
Lenovo ThinkStations utilize NVIDIA Maximus technology to<br />
accelerate mission critical applications across multiple industries,<br />
including manufacturing, media & entertainment, and Life<br />
Sciences. Discover how <strong>GPU</strong>s are used to accelerate medical<br />
research from product experts with Lenovo and Beckman Coulter.<br />
Beckman Coulter has utilized NVIDIA <strong>GPU</strong>s to reduce software<br />
development and test cycles by 50% with their Kaluza software.<br />
Kaluza is a revolutionary flow cytometry analysis software solution<br />
that provides visualization tools, speed and an innovative<br />
simplicity to the flow community. See how Kaluza allows users to<br />
analyze 10 million cells in real time. Session attendees will<br />
receive a drawing entry to win a brand new ThinkPad Tablet.<br />
Speaker(s): Scott Ruppert (ThinkStation Technical Solutions Manager,<br />
Lenovo), Tanmay Dharmadhikari (Senior Software Development<br />
Engineer, Beckman-Coulter)<br />
Topic(s): Computer Graphics, Life Sciences (Beginner)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
HALL 1<br />
S0641 CUDA 5 and Beyond<br />
CUDA, NVIDIA’s platform for parallel computing, has grown<br />
rapidly in the past 5 years. The performance and efficiency of<br />
software built on CUDA, combined with a thriving ecosystem of<br />
programming languages, libraries, tools, training, and service<br />
providers, have helped make <strong>GPU</strong> computing a leading HPC<br />
technology. CUDA 5 and the Kepler <strong>GPU</strong> architecture don’t just<br />
increase application performance; they enable a more powerful<br />
parallel programming model that expands the possibilities of <strong>GPU</strong><br />
computing, and language features that improve programmer<br />
productivity. In this talk you’ll hear about these revolutionary<br />
features and get insight into the philosophy driving the<br />
development of new CUDA hardware and software. You will learn<br />
about NVIDIA’s vision for CUDA and the challenges for the future<br />
of parallel software development.
Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
TUESDAY, MAY 15, 16:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0803 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight<br />
development team! Whether you would like a private meeting to<br />
discuss specific product features or test out your application with<br />
the latest version of Nsight, or you just want to hang out with the<br />
team after attending one of the exciting training sessions, the<br />
lounge is a great place to learn everything you ever wanted to know<br />
about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />
ROOM J1<br />
S0021 OptiX for DirectX <strong>Program</strong>mers - EVE Online’s<br />
<strong>GPU</strong>-Raytraced Portraits<br />
By integrating NVIDIA’s OptiX system for real-time <strong>GPU</strong> ray tracing<br />
into a DirectX 9-based engine, CCP Games enables high-quality<br />
ray-traced player portraits for the single-shard MMO EVE Online,<br />
reusing the game’s assets and pipeline. We selectively add<br />
stochastic effects while closely maintaining the look of the<br />
DX9-based renderer that Art Direction aimed for. In this talk we<br />
approach OptiX from the point of view of a programmer familiar<br />
with DirectX, discuss integrating these two systems, and show<br />
how we reproduced some DirectX-based effects like transparency<br />
and subsurface scattering within OptiX.<br />
Speaker(s): Bert Peers (Senior Graphics <strong>Program</strong>mer, CCP Games)<br />
Topic(s): Ray Tracing, Computer Graphics, Application Design &<br />
Porting Techniques (Intermediate)<br />
TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />
ROOM A1<br />
S0104 <strong>GPU</strong> Implementation of Deep Learning for<br />
Intelligent Computer Vision<br />
Learn how to use <strong>GPU</strong> supercomputing for intelligent computer<br />
vision, via deep learning algorithms. We will focus on a case study<br />
of visual object and event recognition in a humanoid robotics<br />
context, involving a port to CUDA of the DeSTIN “compositional<br />
spatiotemporal deep learning network” vision processing<br />
algorithm (originally implemented at the University of Tennessee<br />
in Knoxville for conventional serial computers). The audience will<br />
learn how to use the open-source DeSTIN CUDA code, and also<br />
how to port other deep learning algorithms to CUDA.<br />
Speaker(s): Ben Goertzel (CEO, Novamente LLC)<br />
Topic(s): Computer Vision, Algorithms & Numerical Techniques<br />
(Advanced)<br />
TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />
ROOM C<br />
S0314 Efficient k-Nearest Neighbor Search Algorithms<br />
on <strong>GPU</strong>s<br />
Come see how to select the k smallest elements from an unsorted<br />
list. We present a selection and combination of different<br />
algorithms that perform exact k-nearest neighbors search<br />
(k-NNS) on <strong>GPU</strong>s and outperform competing approaches. In this session<br />
we present four different selection algorithms, each designed to exploit<br />
<strong>GPU</strong> parallelism differently according to the relative<br />
size of the corpus data set, the size of the query set and the<br />
number of neighbors sought. We show the application to logo<br />
retrieval with SIFT vector matching on two different <strong>GPU</strong>s, the<br />
Tesla C1060 and the Fermi GTX480.<br />
Speaker(s): Nikos Pitsianis (Assistant Professor, Aristotle University,<br />
Greece), Xiaobai Sun (Professor, Duke University)<br />
Topic(s): Machine Learning & AI, Databases, Data Mining, Business<br />
Intelligence, Algorithms & Numerical Techniques (Beginner)<br />
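For reference, the Python sketch below computes exact k-NN by brute force and selects the k smallest distances per query with a bounded max-heap, one common selection strategy. It is not one of the four GPU algorithms presented in the session; the function names and the toy data are hypothetical.

```python
import heapq


def k_smallest(values: list[float], k: int) -> list[int]:
    """Indices of the k smallest values, in ascending order, via a bounded max-heap."""
    if k <= 0:
        return []
    heap: list[tuple[float, int]] = []  # holds (-value, index); heap[0] is the worst kept value
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (-v, i))
        elif v < -heap[0][0]:
            heapq.heapreplace(heap, (-v, i))  # evict the current largest of the k kept
    return [i for _, i in sorted(heap, key=lambda t: (-t[0], t[1]))]


def knn(corpus: list[tuple[float, ...]], queries: list[tuple[float, ...]], k: int) -> list[list[int]]:
    """Exact k-nearest neighbors by squared Euclidean distance (brute force)."""
    result = []
    for q in queries:
        # One independent row of distances per query: the bulk of the GPU work.
        dists = [sum((a - b) ** 2 for a, b in zip(q, p)) for p in corpus]
        result.append(k_smallest(dists, k))
    return result


corpus = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (2.0, 2.0)]
print(knn(corpus, [(0.0, 0.0), (4.0, 4.0)], k=2))  # → [[0, 1], [2, 3]]
```

The heap keeps only k candidates per query, so memory stays small when k is much smaller than the corpus; when k approaches the corpus size, a full sort may be simpler, which illustrates why the best selection strategy can depend on the relative sizes mentioned above.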
TUESDAY, MAY 15, 16:30 (90 MINUTES)<br />
ROOM A7<br />
S0628 <strong>GPU</strong>s in Energy & Exploration: Software<br />
Development and Production<br />
This session will feature expert panelists that will share their<br />
experience adopting <strong>GPU</strong>s in their respective environments. Since<br />
2009, these production systems have been boosting throughput<br />
and shortening cycle times while delivering enhanced images using<br />
NVIDIA technologies. Featured panelists include representatives from Hess,<br />
Schlumberger, Petrobras, Chevron and more.<br />
Speaker(s): Paulius Micikevicius (Developer <strong>Technology</strong> Engineer, NVIDIA),<br />
Alexander Loddoch (Chevron), Dave Nichols (Schlumberger), Paulo Souza<br />
(Petrobras), Mauricio Araya (Repsol)<br />
Topic(s): Energy Exploration (Beginner)<br />
TUESDAY, MAY 15, 16:30 (25 MINUTES)<br />
ROOM A2<br />
S0659 Computer Simulation of Lignocellulosic Biomass<br />
Biomass from terrestrial plants offers the potential of an<br />
abundant source of cellulosic ethanol. However, technical<br />
problems arising from the recalcitrance of biomass to hydrolysis<br />
still hinder its cost-effective conversion to ethanol.<br />
Here, computer simulation of biomass is employed to understand<br />
the physical origins of biomass recalcitrance. The temperature-dependent<br />
structure and dynamics of lignin polymers in aqueous<br />
solution are examined using extensive molecular dynamics<br />
simulations. Neutron scattering experiments and molecular<br />
dynamics simulations reveal the structure of lignin aggregates.<br />
Finally, the interaction of lignin with cellulose is examined and<br />
differential binding to crystalline and amorphous cellulose<br />
explained thermodynamically.<br />
Speaker(s): Loukas Petridis (Staff Scientist, Oak Ridge National<br />
Laboratory)<br />
Topic(s): Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 17:00 (25 MINUTES)<br />
ROOM K<br />
S0037 SeqNFind: Application Of CUDA <strong>GPU</strong><br />
Technologies To Sequence Alignment Techniques<br />
Explosive growth in the amount of genomic data has created a<br />
need for faster systems that align and compare nucleotide<br />
sequences. With the development of tools for leveraging the<br />
massively parallel architecture of NVIDIA <strong>GPU</strong>s it is a logical next<br />
step to construct algorithms for genomic analysis on <strong>GPU</strong> clouds/<br />
clusters. Although this may seem a simple task, there are a number of<br />
challenges to deploying the current algorithms. Every algorithm<br />
from Smith-Waterman to BLAST has its own unique set of<br />
barriers. Presented here are some of the lessons learned and how<br />
ongoing genomic research projects have benefitted from the<br />
increased speed and accuracy.<br />
Speaker(s): D. Andrew Carr (Director of Bioinformatics, Accelerated<br />
<strong>Technology</strong> Laboratories)<br />
Topic(s): Bioinformatics, Algorithms & Numerical Techniques<br />
(Advanced)<br />
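The Smith-Waterman algorithm named above fills a dynamic-programming matrix in which each cell holds the best score of a local alignment ending at that pair of positions. The following is a minimal serial C++ sketch of that recurrence with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -1) and the function name `smithWatermanScore` are illustrative assumptions; this is not SeqNFind's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Smith-Waterman local-alignment score with a linear gap penalty.
// Scoring values are illustrative defaults, not SeqNFind parameters.
int smithWatermanScore(const std::string& a, const std::string& b,
                       int match = 2, int mismatch = -1, int gap = -1) {
    assert(gap <= 0 && "gap penalty must be a non-positive score");
    // H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    std::vector<std::vector<int>> H(a.size() + 1,
                                    std::vector<int>(b.size() + 1, 0));
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int diag = H[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            const int up = H[i - 1][j] + gap;    // gap inserted in b
            const int left = H[i][j - 1] + gap;  // gap inserted in a
            // Clamping at 0 lets a new local alignment start anywhere.
            H[i][j] = std::max({0, diag, up, left});
            best = std::max(best, H[i][j]);
        }
    }
    return best;
}
```

Cells on the same anti-diagonal depend only on earlier anti-diagonals, so one common way to parallelize this recurrence on a GPU is to compute each anti-diagonal as a wavefront.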
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
HALL 1<br />
S0156 Towards Computing the Cure for Cancer<br />
Attend this session to learn how to create “designer”
genomic analysis pipelines as part of the “Compute the Cure” for
cancer initiative from the NVIDIA Foundation. It will offer an overview
of an open-source framework that enables the creation of
customized genomic analysis pipelines. It will discuss how
plug-ins from each of the “mapping”, “realignment” and “discovery”
repositories can be composed to form a genomic
analysis pipeline. Attendees will learn to use next-generation
sequencing data to characterize previously undetectable genetic<br />
changes between normal and malignant cells and ways to<br />
contribute to the “Compute the Cure” cause.<br />
Speaker(s): Wu Feng (Professor, Virginia Tech), Heshan Lin (Research<br />
Scientist, Virginia Tech)<br />
Topic(s): Bioinformatics, Life Sciences, Supercomputing, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
TUESDAY, MAY 15, 17:00 (25 MINUTES)<br />
ROOM C<br />
S0219 Efficient Top-Down Planning in<br />
Business Intelligence<br />
In business intelligence, tasks like corporate planning or what-if<br />
analysis complement traditional reporting and analysis. One main
difference is that while the latter only read data, the former
require changing possibly large numbers of existing data records,
and creating new ones, in the business model, preferably in
real time. In this session, we describe the extension of an existing
BI tool, Jedox OLAP, with <strong>GPU</strong>-based parallel algorithms for
interactive planning scenarios. Compared to sequential in-memory
algorithms, our CUDA approach yields tremendous
speedups and can also cope with large amounts of data by using
multiple <strong>GPU</strong>s.<br />
Speaker(s): Tobias Lauer (Senior Researcher, Jedox AG), Alexander<br />
Haberstroh (Software Developer, Jedox AG)<br />
Topic(s): Databases, Data Mining, Business Intelligence, Finance,<br />
Algorithms & Numerical Techniques (Intermediate)<br />
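Top-down planning typically means editing an aggregate value and pushing the change down to the many leaf records beneath it. The abstract does not say how Jedox distributes such changes, so the sketch below assumes a simple proportional rule. The function name `distributeProportionally` is hypothetical and is not part of the Jedox OLAP API.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical top-down planning step: set a new total for an aggregate
// and spread it over the leaf records in proportion to their current values.
std::vector<double> distributeProportionally(const std::vector<double>& leaves,
                                             double newTotal) {
    assert(!leaves.empty());
    const double oldTotal = std::accumulate(leaves.begin(), leaves.end(), 0.0);
    std::vector<double> result(leaves.size());
    for (std::size_t i = 0; i < leaves.size(); ++i) {
        // Each leaf is computed independently of the others.
        result[i] = (oldTotal != 0.0)
                        ? leaves[i] * (newTotal / oldTotal)
                        // All leaves zero: fall back to an even split.
                        : newTotal / static_cast<double>(leaves.size());
    }
    return result;
}
```

Because every leaf is rescaled independently, an update of this kind maps naturally onto one GPU thread per record.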
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0247 3D ADI Method for Fluid Simulation on<br />
Multiple <strong>GPU</strong>s<br />
Find out about a multi-<strong>GPU</strong> implementation of the Alternating
Direction Implicit (ADI) method for large 3D domains. The ADI technique
is applied to direct numerical simulation of fluid flow. Modeling
complex flows demands extremely large grids, so the computation
must be distributed to share memory among multiple
<strong>GPU</strong>s. In this session, a novel distributed tridiagonal solver as well
as parallelization and load-balancing strategies will be covered in
detail. Finally, a comprehensive performance analysis, scaling
studies for different input geometries and possible future
improvements will be discussed.<br />
Speaker(s): Nikolay Markovskiy (HPC DevTech Engineer, NVIDIA),<br />
Nikolai Sakharnykh (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques, Computational<br />
Fluid Dynamics (Intermediate)<br />
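Each implicit ADI half-step reduces to many independent tridiagonal systems, one per grid line in the current direction. As a point of reference, here is a serial sketch of the classic Thomas algorithm for a single such system. It assumes a diagonally dominant matrix (so no pivoting is needed) and is not the distributed solver presented in the session.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
// i = 0..n-1 (a[0] and c[n-1] are ignored). Assumes diagonal dominance.
std::vector<double> solveTridiagonal(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     const std::vector<double>& c,
                                     std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);
    std::vector<double> cp(n);  // modified super-diagonal
    cp[0] = c[0] / b[0];
    d[0] = d[0] / b[0];
    // Forward sweep: eliminate the sub-diagonal.
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    // Back substitution, from the last unknown to the first.
    std::vector<double> x(n);
    x[n - 1] = d[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] = d[i] - cp[i] * x[i + 1];
    }
    return x;
}
```

Both sweeps are sequential along a line, which is one reason a line that spans several GPUs calls for a different, distributed solver.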
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM J2<br />
S0267A Mixing Graphics and Compute with Multiple <strong>GPU</strong>s<br />
In this session we will cover all the different aspects of interaction<br />
between graphics and compute. The first part of the session will<br />
focus on compute API interoperability with OpenGL (using CUDA<br />
and OpenCL APIs), while the second part of the session will delve<br />
into interoperability at a system level. In particular we will go<br />
through the challenges and benefits of dedicating one <strong>GPU</strong> for<br />
compute and another for graphics, how different system
configurations affect data transfer between two <strong>GPU</strong>s, and how this
translates into application design decisions that enable
efficient cross-<strong>GPU</strong> interoperability between compute and
graphics contexts.<br />
This session is repeated on Thursday at 15:30 (S0267B).<br />
Speaker(s): Alina Alt (Applied Engineer, NVIDIA)<br />
Topic(s): Visualization, Application Design & Porting Techniques<br />
(Beginner)<br />
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM A5<br />
S0359 VMware and NVIDIA: Delivering 3D Workstations<br />
from the Cloud<br />
This session will detail the delivery of the most demanding
workstation-class workloads from the private cloud using
technologies from NVIDIA and VMware. We will cover the
configuration and performance metrics of the combined VMware and
NVIDIA direct pass-through, hardware-accelerated graphics
solution. Using sample workloads, we will demonstrate how
customers can realize the operational and security benefits of
cloud-based personal computing without sacrificing performance.<br />
Speaker(s): Aaron Blasius (Sr. Product Manager, VMware), Warren<br />
Ponder (Director, Product Management, VMware)<br />
Topic(s): Visualization, Cloud Computing (Advanced)<br />
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM L<br />
S0427 Intra-Day Risk-Management with Parallelized<br />
Algorithms on <strong>GPU</strong>s<br />
The challenge with intra-day risk management is that a very large
number of calculations must be performed in a very
short amount of time. Typically, we may be interested in
calculating value at risk (VaR) for 100 to 1000 securities per second based on
100 million potential scenarios. Calculations of this magnitude
are not hypothetical; they reflect the reality of modern
financial institutions and exchanges. In this presentation, we
outline how the complex problem of intra-day risk management
can be solved using parallelized algorithms on <strong>GPU</strong>s. The
methodology has been proven in proofs of concept (POCs) at two financial institutions.<br />
Speaker(s): Partha Sen (CEO, Fuzzy Logix)<br />
Topic(s): Databases, Data Mining, Business Intelligence, Finance,<br />
Algorithms & Numerical Techniques, Supercomputing (Advanced)<br />
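Scenario-based VaR is, at its core, a quantile of simulated profit-and-loss values. The sketch below computes an empirical VaR for one position from a vector of scenario P&L values. The quantile convention and the function name `valueAtRisk` are assumptions for illustration, not Fuzzy Logix's methodology.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Empirical Value-at-Risk for one position from scenario P&L values.
// Returns VaR as a positive loss at confidence level alpha (e.g. 0.99).
double valueAtRisk(std::vector<double> pnl, double alpha) {
    assert(!pnl.empty() && alpha > 0.0 && alpha < 1.0);
    // Position of the (1 - alpha) lower-tail quantile in sorted order.
    const std::size_t k = static_cast<std::size_t>(
        std::floor((1.0 - alpha) * static_cast<double>(pnl.size())));
    const std::size_t idx = std::min(k, pnl.size() - 1);
    // Partial selection instead of a full sort: O(n) on average.
    std::nth_element(pnl.begin(), pnl.begin() + static_cast<std::ptrdiff_t>(idx), pnl.end());
    return -pnl[idx];
}
```

Empirical-quantile conventions differ (interpolation, rounding of the tail index), so results can vary slightly between implementations. With 100 million scenarios per security, this selection step would likely dominate the cost and is the natural part to spread across GPU threads.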
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM A3<br />
S0602 An Introduction to the Thrust Parallel<br />
Algorithms Library<br />
Thrust is a parallel algorithms library which resembles the C++<br />
Standard Template Library (STL). Thrust’s high-level interface<br />
greatly enhances developer productivity while enabling performance<br />
portability between <strong>GPU</strong>s and multicore CPUs. Interoperability with<br />
established technologies (such as CUDA, TBB and OpenMP)<br />
facilitates integration with existing software. In this talk we’ll walk
through the library’s main features and explain how developers can
build high-performance applications rapidly with Thrust.
Speaker(s): Nathan Bell (Senior Research Scientist, NVIDIA), Julien<br />
Demouth (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages and Compilers,<br />
Development Tools and Libraries (Beginner)<br />
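To give a feel for the STL-like style described above, here is a SAXPY-plus-sum written with the standard library only. Thrust provides `thrust::transform` and `thrust::reduce` with the same iterator-range style, so porting would mainly mean swapping `std::vector` for `thrust::device_vector` and the lambda for a device-callable functor. Treat this as an illustrative sketch rather than Thrust code; `saxpySum` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// SAXPY (y = a*x + y) followed by a sum, in STL style.
// Thrust counterparts: thrust::transform and thrust::reduce over
// thrust::device_vector, with a device-callable functor for the operation.
float saxpySum(float a, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size());
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
    return std::accumulate(y.begin(), y.end(), 0.0f);
}
```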
TUESDAY, MAY 15, 17:00 (25 MINUTES)<br />
ROOM A2<br />
S0608 Toward Global Seismic Imaging based on<br />
Spectral-Element and Adjoint Methods<br />
Precise information about the structure of the solid Earth comes<br />
from seismograms recorded at the surface of a highly<br />
heterogeneous lithosphere. Seismic imaging based on spectral-element
and adjoint methods can assimilate this information into<br />
three-dimensional models of elastic and anelastic structure.<br />
These methods fully account for the physics of wave excitation,<br />
propagation, and interaction by numerically solving the<br />
inhomogeneous equations of motion for a heterogeneous<br />
anelastic solid. Such methods require the execution of complex<br />
computational procedures that challenge the most advanced<br />
high-performance computing systems. Current research is<br />
petascale; future research will require exascale capabilities. We<br />
illustrate the current state-of-the-art based on an inversion for<br />
European upper-mantle structure. Our ultimate goal is to move<br />
toward “adjoint tomography” of the entire planet.<br />
Speaker(s): Jeroen Tromp (Director, Princeton Institute for<br />
Computational Science, Princeton)<br />
Topic(s): Supercomputing (Intermediate)<br />
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM M<br />
S0643 Hybrid Architectures for Advanced Seismic<br />
Imaging: Recent Experiences at Bull (Presented by Bull)<br />
The two-part presentation describes Bull’s system architecture<br />
for accelerated seismic applications using <strong>GPU</strong>s, together with<br />
the parallel programming aspects involved and some examples of<br />
recent work. The first part covers hybrid system architectures,<br />
basic principles of Reverse Time Migration and the numerical<br />
methods used to implement it in various forms, together with the<br />
architectural features needed, depending on the specific<br />
algorithms used. The second part examines CUDA programming<br />
aspects and the use of compiler-based directives and libraries to<br />
convert existing codes for maximum performance and scalability<br />
on <strong>GPU</strong> architectures.<br />
Speaker(s): Mathieu Dubois (Senior HPC Consultant, Bull), Guy Gueritz<br />
(Oil & Gas Business Development Director, Bull)<br />
Topic(s): Energy Exploration, High Performance Computing<br />
(Intermediate)<br />
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM A8<br />
S0646 Massively Parallel Code Development on Stelletto<br />
CDA (Presented by Creative Consultants)<br />
Come participate in the global launch of Stelletto, a multi-node,
office-based, <strong>GPU</strong>-accelerated conSTELLAtion compute platform.<br />
Join Rob Farber (author/scientist), Denis Gerrer (CAPS
Enterprise), and Greg Scantlen (Creative Consultants) to learn
how to create and leverage massively parallel applications.
Whether you are porting legacy code or developing new code from
scratch, the Stelletto Code Development Appliance offers a
cost-effective methodology for producing scalable apps. In 50
minutes you will learn the essentials of assembling a complete
hardware and software solution for scalable many-core and <strong>GPU</strong>-accelerated
code development, from plug-in Stelletto to massively
parallel executable code.<br />
Speaker(s): Rob Farber (BlackDog Endeavors, LLC), Denis Gerrer<br />
(CAPS enterprise), Greg Scantlen (CreativeC.com)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner/Intermediate)<br />
TUESDAY, MAY 15, 17:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0804 CUDA Debugger Training on Windows<br />
Nsight offers a powerful set of CUDA debugging features
that enable developers to quickly spot bugs. With tools ranging from the memory
checker to advanced breakpoints and the warp watch panel, a
developer can quickly isolate memory access errors, filter
thousands of threads down to a specific thread and quickly spot
abnormal ranges of variable values. Through a set of comprehensive
exercises, attendees will learn to use these features and
become fully proficient at developing CUDA code.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 17:30 (25 MINUTES)<br />
ROOM C<br />
S0043 30x Faster Regular Expressions on a <strong>GPU</strong><br />
We present a regular expression (regex) engine on a <strong>GPU</strong>. We
utilize the highly parallel architecture of <strong>GPU</strong>s to accelerate such
searches. We believe that previous attempts to utilize the <strong>GPU</strong> for
this task did not fully tap its potential. Regexes present imbalanced
compute workloads which are very different from common <strong>GPU</strong>
applications (CFD, CG and image processing). Hence, they can
teach us broader lessons on how to utilize <strong>GPU</strong>s for more general
workloads. Our initial results show a 30x improvement in running
time relative to single-threaded commercial regex engines.<br />
Speaker(s): David Lehavi (Senior Research Scientist, HP)<br />
Topic(s): Databases, Data Mining, Business Intelligence (Advanced)<br />
TUESDAY, MAY 15, 17:30 (25 MINUTES)<br />
ROOM K<br />
S0287 Jacket for Multidimensional Scaling in Genomics<br />
In this tutorial, we will present AccelerEyes’ Jacket software,
which enables <strong>GPU</strong> computing in MATLAB, through a user case
study entitled “Multidimensional Scaling for Genomics”. We show
how Jacket enables developers to write and run code on the <strong>GPU</strong>
in the native M-Language used in MATLAB. By simply casting data
to Jacket’s <strong>GPU</strong> data structure, MATLAB functions are
transformed into <strong>GPU</strong> functions. We will also include
demos of running MATLAB code on the <strong>GPU</strong> for image and signal
processing, life science, finance, and other applications. A Q&A
session will enable audience members to ask specific questions
about Jacket.<br />
Speaker(s): Chris McClanahan (Software Engineer, AccelerEyes)<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
TUESDAY, MAY 15, 17:30 (25 MINUTES)<br />
ROOM A2<br />
S0657 Applying for INCITE <strong>Program</strong>, Conclusions, Q&A<br />
This session offers a wrap-up of “<strong>GPU</strong>-accelerated Science on<br />
Titan: Tapping into the World’s Preeminent <strong>GPU</strong> Supercomputer to<br />
Achieve Better Science” with Jack Wells.<br />
Speaker(s): Jack Wells, Ph.D. (Director of Science, Oak Ridge<br />
Leadership Computing Facility, Oak Ridge National Laboratory)<br />
Topic(s): Supercomputing (Intermediate)<br />
C++ Accelerated Massive Parallelism (C++ AMP)<br />
What is C++ AMP, how can it help me, and where can I get it?<br />
C++ AMP is a key new C++ language feature plus an STL-like library. It's designed to help you increase the performance of<br />
What platforms and hardware does C++ AMP support?<br />
What new language feature does C++ AMP introduce?<br />
Microsoft added the restrict(amp) specifier, which can be applied to a function or lambda to indicate that the<br />
function can be executed on a C++ AMP accelerator. The restrict keyword instructs the compiler to statically check that the<br />
function uses only the subset of C++ that C++ AMP supports: void myFunc() restrict(amp) {…}<br />
[…] for purposes that are unrelated to C++ AMP.<br />
What new classes (APIs) does C++ AMP introduce?<br />
What does C++ AMP code look like?<br />
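#include &lt;amp.h&gt;<br />
void AddArrays(int n, int m, int * pA, int * pB, int * pSum) {<br />
// Wrap the host arrays as n x m views; the inputs are read-only.<br />
concurrency::array_view&lt;const int, 2&gt; a(n, m, pA), b(n, m, pB);<br />
concurrency::array_view&lt;int, 2&gt; sum(n, m, pSum);<br />
sum.discard_data(); // old contents of pSum need not be copied to the accelerator<br />
concurrency::parallel_for_each(sum.extent, [=](concurrency::index&lt;2&gt; i) restrict(amp)<br />
{<br />
sum[i] = a[i] + b[i];<br />
});<br />
sum.synchronize(); // copy the results back to pSum<br />
}<br />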
SESSION INFORMATION<br />
WEDNESDAY, MAY 16<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM N<br />
S0010 Towards Routine Microsecond Molecular<br />
Dynamics Simulations on Commodity Hardware<br />
The original AMBER 11 provided performance on one <strong>GPU</strong>
equivalent to an 8-node cluster, and almost 60 ns/day on 8 <strong>GPU</strong>s
running the JAC production benchmark without additional
approximations, outstripping the performance of all conventional
supercomputers. Here we describe further optimization of the
code, coupled with hardware and software advances on the part of
NVIDIA, that provides performance of >50 ns/day on a single <strong>GPU</strong>,
with multiple <strong>GPU</strong>s providing simulation rates approaching a
microsecond per day for systems the size of DHFR. This brings
performance on desktops and commodity hybrid clusters to
levels previously considered possible only with custom silicon.<br />
Speaker(s): Ross Walker (Assistant Professor, University of California<br />
San Diego)<br />
Topic(s): Molecular Dynamics, Life Sciences (Advanced)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM A8<br />
S0017 4D Medical Image Processing with CUDA<br />
Learn how to do 4D image processing with CUDA, especially for<br />
medical imaging applications. In this session we will give a couple<br />
of examples of how 4D image processing can take advantage of<br />
the computational power of the <strong>GPU</strong>. We will present how to use<br />
the <strong>GPU</strong> for functional magnetic resonance imaging (fMRI)<br />
analysis and true 4D image denoising. Most of our examples use
the <strong>GPU</strong> both to speed up the analysis and to visualize the results.<br />
Speaker(s): Anders Eklund (PhD Student, Linköping University)<br />
Topic(s): Medical Imaging & Visualization, Audio, Image and Video<br />
Processing, Neuroscience, Visualization (Advanced)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM B<br />
S0072 <strong>GPU</strong>-Enabled Spatiotemporal Model of Stochastic<br />
Cardiac Calcium Dynamics and Arrhythmias<br />
Calcium ions play a central role controlling the contraction of the<br />
heart to pump blood. This requires tight regulation of cellular<br />
calcium dynamics which depends upon over 1,000,000 calcium<br />
channels that open and close stochastically and have a very specific<br />
spatial arrangement. In the School of Systems Biology at George<br />
Mason University, CUDA technology coupled to novel algorithms for
Monte Carlo simulation has made possible this computationally
expensive spatiotemporal model of calcium dynamics in the heart
muscle cell, which is used to study the regulation of calcium dynamics
and the aberrations that lead to cardiac arrhythmia.<br />
Speaker(s): Mohsin Jafri (Professor and Chair, George Mason University),<br />
Hoang-Tron Minh Tuan (PhD Student, George Mason University)<br />
Topic(s): Life Sciences, Bioinformatics (Beginner)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM A7<br />
S0171 Numerical Modeling Of 3D Anisotropic Seismic<br />
Wave Propagation On Multi<strong>GPU</strong> Platforms<br />
We present an efficient and accurate numerical algorithm for the
simulation of seismic experiments. The basis of the approach is a
heterogeneous spectral element method implemented on
Multi<strong>GPU</strong> platforms and applied to the anisotropic elastic wave equation. The
approach was designed to simulate wave propagation in 3D
arbitrary anisotropic elastic media. Because it uses an
unstructured grid, the spectral element algorithm can
handle complicated layer geometries. We discuss the results
and computational effort of simulation on a Multi<strong>GPU</strong> platform.
Several aspects of the code implementation are considered,
including optimal domain decomposition and data transfers between <strong>GPU</strong>s by
means of peer-to-peer (P2P) access and unified virtual addressing (UVA).<br />
Speaker(s): Denis Sabitov (Schlumberger)<br />
Topic(s): Energy Exploration, Algorithms & Numerical Techniques,<br />
Supercomputing, Molecular Dynamics (Intermediate)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM M<br />
S0253 Sensor Processing with Rugged Kepler <strong>GPU</strong>s<br />
(Presented by GE Intelligent Platforms)<br />
Swimming in sensors and drowning in data? Turn the tide on<br />
high-bandwidth sensors with rugged next-generation Kepler <strong>GPU</strong>s<br />
from NVIDIA. See how we deploy Kepler into the most extreme of
environments, providing GP<strong>GPU</strong> capabilities onboard platforms
where SWaP (size, weight and power) and GFLOPS/watt are key. Dig into four real-time CUDA
sensor processing applications: Hyperspectral Imaging, Wide-Area
Surveillance, 360° Situational Awareness, and GSM Cellular SIGINT.<br />
Discuss the CUDA algorithms, interconnects, and rugged platforms
behind each. Learn how we utilize <strong>GPU</strong>Direct and real-time Linux for
improved latency and determinism.<br />
Speaker(s): Dustin Franklin (GP<strong>GPU</strong> Applications Engineer, GE<br />
Intelligent Platforms)<br />
Topic(s): Audio, Image and Video Processing, General Interest, Machine<br />
Vision, Computer Vision (Intermediate)<br />
WEDNESDAY, MAY 16, 09:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0289 Fine-Grained Parallel Preconditioners for Fast<br />
<strong>GPU</strong>-based Solvers<br />
Leverage the power of <strong>GPU</strong>s for efficient parallel solution of large<br />
sparse linear systems of equations by means of fine-grained and<br />
scalable parallel preconditioners. In this session we describe
parallel preconditioners for <strong>GPU</strong>s based on multicolor re-ordering
for Gauss-Seidel-type and ILU-type preconditioners as well as
factorized sparse approximate inverse (FSAI) preconditioners. We also detail the
power(q)-pattern method, a novel technique for controlling the fill-in
pattern of ILU(p) factorizations that introduces a high degree of
parallelism in the preconditioning phase. We demonstrate
significant improvements in solver time for various
problem scenarios and different Krylov-type solvers.<br />
Speaker(s): Dimitar Lukarski (Research Associate, Karlsruhe Institute<br />
of <strong>Technology</strong> (KIT)), Jan-Philipp Weiss (Junior Professor, Karlsruhe<br />
Institute of <strong>Technology</strong>)<br />
Topic(s): Algorithms & Numerical Techniques (Advanced)<br />
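The simplest instance of multicolor re-ordering is red-black Gauss-Seidel: once the unknowns are split into two colors, every update within one color is independent. The sketch below applies it to a 1D Poisson model problem (-u'' = f with fixed boundary values), an assumption chosen for brevity; the session's preconditioners target general sparse matrices, and `redBlackSweep` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One red-black Gauss-Seidel sweep for -u'' = f on a uniform grid with
// spacing h; u[0] and u[n-1] are fixed boundary values.
void redBlackSweep(std::vector<double>& u, const std::vector<double>& f, double h) {
    assert(u.size() == f.size() && u.size() >= 3);
    for (std::size_t color = 0; color < 2; ++color) {
        // color 0 updates odd interior points, color 1 even interior points.
        // Points of one color read only neighbours of the other color,
        // so all updates inside this loop could run in parallel.
        for (std::size_t i = 1 + color; i + 1 < u.size(); i += 2) {
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
        }
    }
}
```

For a general sparse matrix, a graph coloring of the matrix's adjacency structure plays the role of the red/black split.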
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM A1<br />
S0353 <strong>Program</strong>ming Multi-<strong>GPU</strong>s for Scalable Rendering<br />
Multi-<strong>GPU</strong> configurations are becoming common, affordable
options for OpenGL applications to scale performance, data size,
display size and image quality. We show how to structure your
application for multi-GPU rendering by using multiple threads and
OpenGL contexts, and how to handle synchronization and data
transfer. We conclude with a discussion of how to implement
common parallel rendering approaches such as sort-first,
sort-last and hybrid techniques.<br />
Speaker(s): Shalini Venkataraman (Senior Applied Engineer, NVIDIA)<br />
Topic(s): Visualization (Advanced)<br />
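In sort-last rendering, each GPU renders part of the scene at full resolution and the partial images are then merged by depth. Below is a minimal per-pixel depth-compositing sketch over flat arrays. The `Framebuffer` struct and `compositeDepth` function are hypothetical, it assumes smaller depth means closer (OpenGL's default `GL_LESS` test), and reading back or transferring buffers between GPUs is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-GPU render result: one color and one depth per pixel.
struct Framebuffer {
    std::vector<std::uint32_t> color;  // packed RGBA
    std::vector<float> depth;          // window-space depth in [0, 1]
};

// Merge two partial images, keeping the nearer fragment at each pixel.
Framebuffer compositeDepth(const Framebuffer& a, const Framebuffer& b) {
    assert(a.color.size() == a.depth.size());
    assert(b.color.size() == b.depth.size() && a.depth.size() == b.depth.size());
    Framebuffer out = a;
    for (std::size_t p = 0; p < out.depth.size(); ++p) {
        if (b.depth[p] < out.depth[p]) {  // b's fragment is closer
            out.depth[p] = b.depth[p];
            out.color[p] = b.color[p];
        }
    }
    return out;
}
```

With more than two GPUs, the same pairwise merge can be applied repeatedly, for example in a tree.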
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM L<br />
S0383 Speedup Derivatives and Structured Products<br />
Pricing, Reduce TCO Using <strong>GPU</strong>s<br />
Numerix will share its experience using <strong>GPU</strong>s to significantly
reduce its customers’ Total Cost of Ownership (TCO) and
accelerate forward Monte Carlo pricing methods and hybrid<br />
models of complex financial structured products and variable<br />
annuities. Numerix will describe how it combines complex<br />
financial and actuarial modeling with user scripting to drive <strong>GPU</strong><br />
execution from a script interpreted at run time. This architecture<br />
is well suited to financial services firms with portfolios of many<br />
different types of structured products where deals are<br />
represented independently from the models used to price them.<br />
Speaker(s): Steve Karmesin (Senior Developer, Numerix)<br />
Topic(s): Finance, Algorithms & Numerical Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM A5<br />
S0420 Nsight IDE for Linux and Mac<br />
Nsight IDE for Linux and Mac is an all-in-one development
environment that lets you develop, debug and optimize CUDA code in
an integrated UI environment. If you were waiting for an IDE on Linux
and Mac, then this session is for you. This session provides a detailed
usage walk-through of a fully CUDA-aware source editor, build
integration of the CUDA toolchain, a graphical debugger for both CPU
and <strong>GPU</strong>, and a graphical profiler to enable performance optimization.<br />
Speaker(s): David Goodwin (Software Engineer, NVIDIA), Eugene<br />
Ostroukhov (Tools Developer, NVIDIA)<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 09:00 (25 MINUTES)<br />
ROOM K<br />
S0431 Evolving Use of <strong>GPU</strong> for Dassault Systèmes<br />
Simulation Products<br />
SIMULIA, the Dassault Systèmes brand for simulation, has been
working with NVIDIA GP<strong>GPU</strong> cards to accelerate the computation
required in doing large-scale structural finite-element
simulations with the widely used Abaqus product line. SIMULIA’s
initial efforts with GP<strong>GPU</strong>s have focused on accelerating
particularly costly parts of the code when running both on
workstations and clusters. We will look at successes in these areas
with existing products. Further, SIMULIA is now looking at how
evolving programming models like OpenACC open the door to
using <strong>GPU</strong>s as a full compute platform rather than only accelerating
limited parts of an application.<br />
Speaker(s): Luis Crivelli (Dassault Systèmes, SIMULIA)<br />
Topic(s): Computational Structural Mechanics, Parallel <strong>Program</strong>ming<br />
Languages & Compilers (Intermediate)<br />
WEDNESDAY, MAY 16, 09:00 (90 MINUTES)<br />
ROOM C<br />
S0531 Exascaling Your Apps<br />
In the global exascale race, hardware often takes center stage.<br />
But the race might ultimately be won or lost based on how well<br />
the industry optimizes new and existing applications for extreme<br />
parallelism. Today’s apps will not simply run unmodified on tomorrow’s systems,
so we must think strategically and creatively about how to design
applications that take maximum advantage of the first power-efficient,
accelerator-driven exascale systems. This panel of HPC,
software and computer science experts will discuss what we can
and should be doing, including a review of new scientific and
commercial HPC requirements, programming model options and<br />
how to best align architecture and software design processes.<br />
Speaker(s): Mike Bernhardt (The Exascale Report), Olav Lindtjorn<br />
(Schlumberger), Satoshi Matsuoka (Titech), Steve Scott (CTO, Tesla<br />
Business, NVIDIA), Jeff Vetter (Oak Ridge National Laboratory)<br />
Topic(s): Supercomputing (Beginner)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0805 CUDA Profiler Training on Windows<br />
Nsight offers a comprehensive set of performance analysis tools.<br />
From the ability to trace complete system activity across multi-core CPUs and
multiple <strong>GPU</strong>s, to profiling CUDA kernels with precise profiling
experiments, developers can identify system-level optimization
opportunities as well as expensive and inefficient CUDA kernels
requiring in-depth analysis with the CUDA profiler. Through a set
of comprehensive exercises, the attendee will be able to utilize<br />
these features to become fully proficient at optimizing complex<br />
CUDA applications.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 09:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2000 Emerging Companies Summit Opening Address,<br />
Followed by CEO on Stage featuring Rocketick and Cortexica<br />
The Emerging Companies Summit is a unique forum for startup<br />
companies to showcase innovative applications that leverage the<br />
<strong>GPU</strong> to solve visual and compute-intensive problems. The opening<br />
address includes an overview of NVIDIA’s <strong>GPU</strong> ecosystem<br />
development activities. ECS is a great opportunity to discover new<br />
players in the <strong>GPU</strong> ecosystem, find great investments, explore
partnership and customer/vendor opportunities, network and build
relationships, and discuss the future of an industry that is
reshaping computing. Immediately following the opening address is<br />
the ECS CEO on Stage session featuring two startups who will each<br />
have 15 minutes to introduce their companies and interact with a<br />
panel of leading venture capitalists, technology executives, and<br />
industry analysts.<br />
Speaker(s): Jeff Herbst (Vice President of Business Development, NVIDIA),<br />
Tomer Ben-David (VP R&D, Rocketick), Iain McCready (CEO, Cortexica)<br />
Topic(s): General Interest<br />
WEDNESDAY, MAY 16, 09:30 (25 MINUTES)<br />
ROOM K<br />
S0225 Speedup Altair RADIOSS Solvers Using NVIDIA <strong>GPU</strong><br />
Solvers are the heart of Altair’s HyperWorks computer aided<br />
engineering simulation software. In this session, you will learn how<br />
<strong>GPU</strong>s can improve their performance. The direct solver is widely used in
structural analysis and sensitivity calculations. By offloading the
intensive matrix computation to the <strong>GPU</strong> and using heterogeneous
computing, you will discover how its speed can be increased
compared to a multi-core approach. The iterative solver is particularly
suited to solving large problems with millions of degrees of freedom.
An innovative hybrid parallelization using multiple <strong>GPU</strong>s and MPI,
which dramatically reduces solution time, will be presented.<br />
Speaker(s): Eric Lequiniou (Director, High Performance Computing,<br />
Altair), Hongwei Zhou (Senior Software Development Engineer, Altair)<br />
Topic(s): Computational Structural Mechanics (Beginner)
WEDNESDAY, MAY 16, 09:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0415 An Accelerated Weeks Method for Numerical<br />
Laplace Transform Inversion<br />
Mathematical methods based on the use of the Laplace transform<br />
are a standard component of undergraduate education. Real-world<br />
problems, however, often yield Laplace-space solutions that are<br />
too complex to be analytically inverted to expressions in physically<br />
meaningful variables. A robust numerical inversion approach is<br />
thus desirable. In this talk, I present one approach to computing<br />
an approximate inverse: the Weeks method. I will also<br />
discuss the difficulties in performing numerical inversion. Finally,<br />
I will show how we have been able to utilize Jacket from<br />
AccelerEyes in MATLAB to more efficiently and robustly<br />
implement the Weeks method.<br />
Speaker(s): Patrick Kano (Co-Owner, Acunum Algorithms and<br />
Simulations, LLC)<br />
Topic(s): Algorithms & Numerical Techniques (Beginner)<br />
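The following is a minimal sketch of the Weeks method in its common Laguerre-series form. It is not the speaker's Jacket/MATLAB implementation. The method writes f(t) = e^(sigma t) sum_n a_n e^(-bt/2) L_n(bt). The coefficients a_n are the Taylor coefficients of a transformed F, and the sketch estimates them with a trapezoidal (DFT-style) rule on the unit circle. Choosing sigma and b by hand (here sigma = 0, b = 1) is an assumption of the sketch. sigma must exceed the real part of every singularity of F, and practical codes tune both parameters.

```python
import cmath
import math


def weeks_coefficients(F, sigma, b, n_terms, n_nodes=None):
    """Laguerre coefficients a_0..a_{n_terms-1} of the Weeks expansion.

    They are the Taylor coefficients of Psi(z) = b/(1-z) * F(s(z)) with
    s(z) = sigma + b/(1-z) - b/2, estimated by the trapezoidal rule on the
    unit circle; midpoint nodes avoid the singular point z = 1.
    """
    m = n_nodes or 2 * n_terms
    samples = []
    for k in range(m):
        z = cmath.exp(2j * math.pi * (k + 0.5) / m)
        s = sigma + b / (1 - z) - b / 2
        samples.append((z, b / (1 - z) * F(s)))
    # For a real-valued f(t) the coefficients are real; drop round-off imaginary parts.
    return [(sum(psi * z ** (-n) for z, psi in samples) / m).real
            for n in range(n_terms)]


def weeks_invert(F, t, sigma=0.0, b=1.0, n_terms=32):
    """Approximate f(t) from its Laplace transform F (n_terms >= 2)."""
    a = weeks_coefficients(F, sigma, b, n_terms)
    x = b * t
    l_prev, l_curr = 1.0, 1.0 - x          # L_0(x) and L_1(x)
    total = a[0] * l_prev + a[1] * l_curr
    for n in range(1, n_terms - 1):
        # Three-term Laguerre recurrence gives L_{n+1}(x).
        l_next = ((2 * n + 1 - x) * l_curr - n * l_prev) / (n + 1)
        total += a[n + 1] * l_next
        l_prev, l_curr = l_curr, l_next
    return math.exp(sigma * t - x / 2) * total
```

For example, `weeks_invert(lambda s: 1 / (s + 1), 1.0)` recovers e^(-1) to near machine precision. For this F the coefficients decay geometrically, so the Laguerre series converges quickly.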
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
ROOM A2<br />
S0016 NVIDIA Grad Fellowship Fast Forward<br />
We invite you to a special presentation from our 2011-<strong>2012</strong><br />
Graduate Fellowship recipients to learn “what’s next” in the world<br />
of research and academia. The NVIDIA Graduate Fellowship<br />
recipients were selected from 200 applications in 27 countries.<br />
Sponsored projects involve a variety of technical challenges,<br />
including computer architecture, computer vision, programmability<br />
and optimization for heterogeneous systems, automotive computing<br />
and much more. We believe that these minds will lead the future of our<br />
industry, and we are proud to support the 2011-<strong>2012</strong> NVIDIA<br />
Graduate Fellows. For more information on the 2011-<strong>2012</strong> NVIDIA<br />
Graduate Fellows, please visit www.NVIDIA.com/fellowship.<br />
Speaker(s): David Luebke (Director, NVIDIA Research)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
ROOM N<br />
S0058 Advancing <strong>GPU</strong> Molecular Dynamics: Rigid Bodies<br />
in HOOMD-blue<br />
Learn how rigid body dynamics are implemented in HOOMD-blue.<br />
Previous releases were capable of executing classical molecular<br />
dynamics -- where free particles interact via smooth potentials and<br />
their motion through time is computed using Newton’s laws. The<br />
latest version allows particles to be grouped into bodies that move<br />
as rigid units. Users can now simulate materials made of cubes,<br />
rods, bent rods, jacks, plates, patchy particles, bucky balls, or any<br />
other arbitrary shapes. This talk covers how these algorithms are<br />
implemented on the <strong>GPU</strong>, tuned to perform well for bodies of any<br />
size, and discusses several use-cases relevant to research.<br />
Speaker(s): Joshua Anderson (Research Area Specialist, University of<br />
Michigan), Trung Dac Nguyen (University of Michigan)<br />
Topic(s): Molecular Dynamics, Computational Physics (Intermediate)<br />
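Any rigid-body integrator must reduce the forces on a body's constituent particles to a net force and a torque about the body's center of mass. The sketch below shows that reduction for a single body in plain Python. It is an illustration of the general step only. HOOMD-blue's GPU kernels, which the abstract does not detail, are not modeled.

```python
def body_force_and_torque(positions, forces, masses):
    """Reduce per-particle forces to the net force and torque on one rigid body.

    positions, forces: lists of (x, y, z) tuples; masses: list of floats.
    The torque is taken about the body's center of mass.
    """
    total_mass = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, positions)) / total_mass
           for k in range(3)]
    net_force = [sum(f[k] for f in forces) for k in range(3)]
    torque = [0.0, 0.0, 0.0]
    for p, f in zip(positions, forces):
        r = [p[k] - com[k] for k in range(3)]
        # Accumulate the cross product r x f.
        torque[0] += r[1] * f[2] - r[2] * f[1]
        torque[1] += r[2] * f[0] - r[0] * f[2]
        torque[2] += r[0] * f[1] - r[1] * f[0]
    return net_force, torque
```

Two equal and opposite forces at opposite ends of a dumbbell give zero net force but a nonzero torque. The body then rotates rigidly instead of its particles moving independently.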
WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />
ROOM K<br />
S0066 Particleworks: Particle-based CAE Software<br />
Fully Ported on Multi-<strong>GPU</strong><br />
Get the latest information on particle-based fluid simulation and<br />
multi-<strong>GPU</strong> computing in “Particleworks”, a commercial CAE software<br />
package developed in Japan. In this session, we cover<br />
(1) particle simulation trends in CAE, (2)<br />
particle simulation development in Japanese industry, (3)<br />
implementation and performance of the full <strong>GPU</strong> port, and (4)<br />
multi-<strong>GPU</strong> scaling in several client cases.<br />
Speaker(s): Yoshiaki Hanada (CEO, Prometech Software), Issei Masaie<br />
(Chief Engineer, Prometech Software)<br />
Topic(s): Computational Fluid Dynamics (Intermediate)<br />
WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />
ROOM A7<br />
S0125 Memory Efficient Reverse Time Migration in 3D<br />
Learn how we can image the interior of the Earth in three dimensions<br />
using Reverse Time Migration. We discuss how <strong>GPU</strong>s accelerate this<br />
method using parallel wave propagation kernels, texture memories<br />
and minimal device-to-host transfers. Further, we discuss how the<br />
progression to 3D presents a multitude of new problems, particularly<br />
memory-related ones, causing the system to become I/O-limited. By manipulating<br />
boundary positions and values to a pseudo-random form we show<br />
how many of these memory restrictions can be diminished and how<br />
detailed subsurface images can be fully constructed using <strong>GPU</strong>s.<br />
Speaker(s): Chris Leader (Research Assistant, Stanford<br />
Exploration Project)<br />
Topic(s): Energy Exploration, Computational Physics (Intermediate)<br />
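Random-boundary schemes for reverse time migration generally rely on the time-reversibility of explicit wave propagation. Because of it, the source wavefield can be recomputed backward from its last two snapshots instead of being stored at every time step, which is where the memory savings come from. The minimal 1-D sketch below illustrates only that reversibility. It uses a leapfrog stencil with fixed zero boundaries, which is an assumption. The talk's pseudo-random boundaries, 3-D kernels and texture memory are not modeled.

```python
def laplacian_1d(u):
    """Second difference of u; the end points get zero, so zero ends stay zero."""
    return [0.0] + [u[i - 1] - 2.0 * u[i] + u[i + 1]
                    for i in range(1, len(u) - 1)] + [0.0]


def leapfrog(u_a, u_b, r2):
    """Given wavefields at two consecutive steps (a, b), return the next one.

    Discretizes u_tt = c^2 u_xx with r2 = (c * dt / dx) ** 2. The update is
    symmetric in time, so swapping the arguments steps backward instead.
    """
    lap = laplacian_1d(u_b)
    return [2.0 * b - a + r2 * l for a, b, l in zip(u_a, u_b, lap)]


def propagate(u_a, u_b, r2, n_steps):
    """Advance the pair (u_a, u_b) by n_steps leapfrog updates."""
    for _ in range(n_steps):
        u_a, u_b = u_b, leapfrog(u_a, u_b, r2)
    return u_a, u_b
```

Propagating forward N steps and then calling `propagate(last, second_to_last, r2, N)` returns the initial wavefield, up to round-off, without storing any intermediate snapshot.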
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
ROOM A5<br />
S0235 Compiling CUDA and Other Languages for <strong>GPU</strong>s<br />
This talk gives an overview of the technology behind NVIDIA’s<br />
CUDA C and OpenCL C compilers, as well as the <strong>GPU</strong> architecture<br />
as seen from a compiler’s perspective. Similarities and<br />
differences with compiling to a CPU are also discussed. We<br />
provide insights into how compiler optimizations affect performance<br />
and how other languages could be targeted to <strong>GPU</strong>s.<br />
Speaker(s): Vinod Grover (Senior Manager, NVIDIA), Yuan Lin (Senior<br />
Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
ROOM L<br />
S0250 From <strong>GPU</strong> Computing Toward Full HPC In Finance<br />
with <strong>GPU</strong>s<br />
At the previous <strong>GTC</strong>, Murex showed how the company had<br />
adapted its generic Monte Carlo and PDE codes to work with a<br />
payoff language. With one more year of experience with <strong>GPU</strong>s and<br />
OpenCL, Murex will show how the company has broadened its<br />
use of <strong>GPU</strong>s to other areas such as vanilla screening and model<br />
calibration, and will focus on its new challenge: ‘use as many <strong>GPU</strong>s<br />
as possible’ for one single computation.<br />
Speaker(s): Pierre Spatz (Head of Quantitative Research, Murex SAS)<br />
Topic(s): Finance (Intermediate)<br />
WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />
ROOM B<br />
S0262 <strong>GPU</strong>-Accelerated Model-Based Drug Development<br />
Explore how <strong>GPU</strong>s can be used to improve the efficiency of drug<br />
development. Drug development is a very time-consuming,<br />
complex and expensive process with a low success rate. A<br />
model-based drug development paradigm has been proposed as a<br />
possible solution to overcome these problems. A key challenge is<br />
to develop computationally intensive drug- and disease-specific<br />
models from a large quantity of highly complicated preclinical and<br />
clinical data. This session will describe how <strong>GPU</strong>s can and will<br />
play a key role in shortening the model development times and<br />
improving the efficiency of model-based drug development.<br />
Speaker(s): Chee Ng (Research Assistant Professor of Pediatrics,<br />
Children Hospital of Philadelphia/University of Pennsylvania)<br />
Topic(s): Life Sciences, Algorithms & Numerical Techniques,<br />
Bioinformatics (Beginner)<br />
WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />
ROOM A8<br />
S0312 <strong>GPU</strong> Implementation for Rapid Iterative Image<br />
Reconstruction in Nuclear Medicine<br />
<strong>GPU</strong> implementation can greatly accelerate iterative techniques of<br />
3D image reconstruction in nuclear medicine imaging. Single<br />
Photon Emission Computed Tomography (SPECT) is a functional<br />
imaging modality widely used in clinical diagnosis. To obtain<br />
high-quality images with reduced scanning times, high-sensitivity<br />
collimators must be used and their response function modeled<br />
in the reconstruction. This is generally very computationally<br />
intensive and infeasible with CPU-based algorithm<br />
implementations. Our software can perform the<br />
reconstruction of patient data within clinically acceptable times<br />
using relatively low cost and widely available hardware.<br />
Speaker(s): Jakub Pietrzak (Software Engineer, University of Warsaw)<br />
Topic(s): Medical Imaging & Visualization, Computational Physics,<br />
Computer Graphics (Intermediate)<br />
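The abstract does not name its reconstruction algorithm. MLEM (maximum-likelihood expectation maximization) is one widely used iterative scheme for emission tomography such as SPECT, and the sketch below shows it with a small dense system matrix. In a real scanner the system matrix would encode the collimator response and be far too large to store this way, and this sketch makes no claim about how the talk's GPU code handles it. Each iteration forward-projects, compares with measured counts, back-projects the ratio, and normalizes by voxel sensitivity.

```python
def mlem(A, y, n_iter=100):
    """MLEM reconstruction for y ~ A x with x >= 0.

    A: system matrix as a list of rows (detector bins x voxels).
    y: measured counts per detector bin.
    """
    n_bins, n_vox = len(A), len(A[0])
    # Sensitivity of each voxel: column sums of A.
    sens = [sum(row[j] for row in A) for j in range(n_vox)]
    x = [1.0] * n_vox                                  # uniform, positive start
    for _ in range(n_iter):
        proj = [sum(a * xj for a, xj in zip(row, x)) for row in A]      # forward project
        ratio = [yi / pi if pi > 0 else 0.0 for yi, pi in zip(y, proj)]
        back = [sum(A[i][j] * ratio[i] for i in range(n_bins))          # back-project
                for j in range(n_vox)]
        x = [xj * bj / sj if sj > 0 else 0.0 for xj, bj, sj in zip(x, back, sens)]
    return x
```

A useful property for checking an implementation: after every MLEM update, the sensitivity-weighted sum of the image equals the total measured counts.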
WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />
ROOM A1<br />
S0322 Warping & Blending for Multi-Display Systems<br />
This talk will describe how to scale up from one to many displays for<br />
high end visualization. You will learn about NVIDIA’s new Warp and<br />
Blend capability that allows you to create a truly seamless logical<br />
display composed of many individual display outputs. With this new<br />
capability, you can project your graphics onto curved surfaces and<br />
implement the correct transformation entirely on the <strong>GPU</strong>, without<br />
any external hardware.<br />
Speaker(s): Shalini Venkataraman (Senior Applied Engineer, NVIDIA)<br />
Topic(s): Visualization, Computer Graphics (Beginner)<br />
WEDNESDAY, MAY 16, 10:00 (25 MINUTES)<br />
ROOM A3<br />
S0325 ArrayFire Graphics: A Tutorial<br />
Learn how to use the graphics primitives for <strong>GPU</strong> computing<br />
available in ArrayFire, a new C and C++ library for <strong>GPU</strong> computing<br />
in both CUDA and OpenCL. In this session, we will cover the<br />
capabilities of ArrayFire’s graphics primitives and show how to<br />
build fast, visual computing applications. The tutorial centers<br />
around the construction of an application for the computation of<br />
optical flow on the <strong>GPU</strong> and will illustrate how to couple graphics<br />
with compute using ArrayFire’s graphics primitives. We will also<br />
show how the graphics primitives can be composed to result in<br />
scalable, fast graphics that complement <strong>GPU</strong> applications.<br />
Speaker(s): Chris McClanahan (Software Engineer, AccelerEyes)<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
ROOM M<br />
S0633 Learn about new Hewlett-Packard <strong>GPU</strong><br />
Systems, Solutions, and Applications! (Presented by<br />
Hewlett-Packard)<br />
Learn how to shorten time to discovery, gain faster insight, and<br />
beat the barriers to innovation, with performance, efficiency and<br />
agility! Hear the latest on how you can do this and more with HP’s<br />
purpose-built SL server line. Servers are specifically designed for<br />
<strong>GPU</strong>s with HP ProActive Insight Architecture. Discover what a new<br />
generation of workstation desktop <strong>GPU</strong> computing technology<br />
from HP and NVIDIA can do for you! HP will compare and contrast<br />
<strong>GPU</strong> compute performance on the PCI Express Gen2 architecture<br />
available in HP’s Z800 Workstation to the PCI Express Gen3<br />
architecture in HP’s latest Z820 Workstation.<br />
Speaker(s): David Korf (Senior Marketing Manager, Hewlett-Packard),<br />
John Brown (Principal Engineer, Hewlett-Packard)<br />
Topic(s): Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0806 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight development<br />
team! Whether you would like a private meeting to discuss specific<br />
product features or test out your application with the latest version<br />
of Nsight, or you just want to hang out with the team after attending<br />
one of the exciting training sessions, the lounge is a great place to<br />
learn everything you ever wanted to know about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 10:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2001 Emerging Companies Summit: CEO on Stage<br />
Featuring Unity Technologies, MirriAd, and BioDigital<br />
See the hottest new technologies from startups that are<br />
transforming computing. In a lively and fast-paced exchange, the<br />
Emerging Companies Summit CEO on Stage sessions will feature<br />
CEOs from three startups who will each have 15 minutes to<br />
introduce their companies and interact with a panel of leading<br />
venture capitalists, technology executives, and industry analysts.<br />
Speaker(s): David Helgason (CEO, Unity Technologies), Mark<br />
Popkiewicz (CEO, MirriAd), Aaron Oliker (Partner/Director of 3D<br />
<strong>Technology</strong>, BioDigital), and Frank Sculli (Co-Founder/Informatics<br />
Director, BioDigital)<br />
Panelist(s): Jon Peddie (President, Jon Peddie Research), Neil<br />
Sequeira (Managing Director, General Catalyst Partners), Savitha<br />
Srinivasan (Partner, IBM Venture Capital Group)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0115 Specialized Sparse Matrix Formats and SpMV<br />
Kernel Tuning for <strong>GPU</strong>s<br />
This session is focused on optimizing sparse matrix-vector product<br />
for NVIDIA <strong>GPU</strong>s. This is a frequently studied kernel that appears in<br />
applications employing iterative methods for solving systems of<br />
linear equations. In the majority of cases the computation is<br />
memory bandwidth bound. Our study focuses on developing<br />
specialized sparse matrix storage formats and a corresponding<br />
CUDA SpMV implementation that achieves high performance at the<br />
cost of additional start-up time required for conversion and tuning.<br />
The proposed storage formats reduce the required memory<br />
bandwidth by providing compact coding for locations of some<br />
frequently observed patterns of non-zero elements.<br />
Speaker(s): Arutyun Avetisyan (Deputy Director, ISP, Russian Academy<br />
of Sciences), Alexander Monakov (Researcher, ISP, Russian Academy<br />
of Sciences)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)
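The abstract does not specify the authors' formats. As a hypothetical illustration of "compact coding for locations" of nonzeros, the sketch below uses a made-up "run" format: each row's nonzeros are stored as runs of consecutive columns, with one start index per run instead of one column index per nonzero as in CSR. Because SpMV is usually bandwidth-bound, fewer index bytes per nonzero translate into less memory traffic when rows contain dense segments.

```python
def to_runs(dense):
    """Encode each row's nonzeros as (start_col, values) runs of consecutive columns."""
    rows = []
    for row in dense:
        runs, j = [], 0
        while j < len(row):
            if row[j] != 0:
                start = j
                while j < len(row) and row[j] != 0:
                    j += 1
                runs.append((start, row[start:j]))   # one index for the whole run
            else:
                j += 1
        rows.append(runs)
    return rows


def spmv_runs(rows, x):
    """y = A x using the run-encoded matrix."""
    y = []
    for runs in rows:
        acc = 0.0
        for start, vals in runs:
            for k, v in enumerate(vals):
                acc += v * x[start + k]
        y.append(acc)
    return y
```

For a tridiagonal matrix, each row becomes a single run, so only one column index is stored per row instead of three.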
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
ROOM A3<br />
S0209 Performance of 3-D FFT Using Multiple <strong>GPU</strong>s with<br />
CUDA 4<br />
Get the latest information on the performance of 3-D fast Fourier<br />
transform using multiple <strong>GPU</strong> devices. CUDA 4.0 enables efficient<br />
data transfer between <strong>GPU</strong>s. This is especially important for FFT computation,<br />
which requires a large amount of all-to-all data exchange between<br />
<strong>GPU</strong>s. The peer-to-peer communication feature of <strong>GPU</strong>Direct V2<br />
improves the communication between devices on the same node.<br />
<strong>GPU</strong>Direct also accelerates the communication between <strong>GPU</strong>s on<br />
different nodes. We will present the latest performance results on a<br />
four-<strong>GPU</strong> system and up to 128 compute nodes of TSUBAME 2.0.<br />
Speaker(s): Akira Nukada (Researcher, Tokyo Institute of <strong>Technology</strong>)<br />
Topic(s): Algorithms & Numerical Techniques, Development Tools<br />
& Libraries (Advanced)<br />
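The all-to-all exchange in a multi-GPU FFT comes from the transpose step of a distributed multidimensional transform. The sketch below shows the usual slab scheme in 2-D, with simulated devices. This is an assumption about the general approach, not the speaker's 3-D code. Each device transforms its own rows, an all-to-all exchange redistributes the data by columns, and each device then transforms its columns. In 3-D the same pattern applies with 2-D transforms on each slab.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def distributed_fft2(grid, n_dev):
    """2-D FFT of an n x n grid split row-wise across n_dev simulated devices.

    n must be a power of two and divisible by n_dev.
    """
    n = len(grid)
    per = n // n_dev
    # Step 1: each device transforms its own block of rows.
    local = [[fft(row) for row in grid[d * per:(d + 1) * per]] for d in range(n_dev)]
    # Step 2: all-to-all exchange; device d receives the column block it owns.
    # On real hardware this is the traffic carried by peer-to-peer copies or MPI.
    cols = [[[local[src][r][c] for src in range(n_dev) for r in range(per)]
             for c in range(d * per, (d + 1) * per)] for d in range(n_dev)]
    # Step 3: each device transforms its columns, now contiguous in memory.
    done = [[fft(col) for col in cols[d]] for d in range(n_dev)]
    # Gather into row-major order (result[k1][k2]) for checking.
    flat = [col for d in range(n_dev) for col in done[d]]
    return [[flat[c][r] for c in range(n)] for r in range(n)]
```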
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
ROOM B<br />
S0272 <strong>GPU</strong> GWAS - CUDA Based Genome Wide<br />
Association Studies<br />
We have developed a CUDA based GWAS analyzer that has<br />
achieved a 10x analysis speed-up per <strong>GPU</strong>. Genome-wide<br />
association studies scan millions of SNP markers across<br />
the human genome, seeking the genetic basis of life-threatening<br />
diseases such as coronary artery disease and prostate cancer. The<br />
prospect of the $1,000 genome heralds a potential new scale of<br />
GWAS involving hundreds of thousands of patients. We will<br />
discuss how we utilized the Python, R, and C languages to produce<br />
a robust GWAS algorithm that can be extended to multiple <strong>GPU</strong>s<br />
and <strong>GPU</strong> clusters.<br />
Speaker(s): Tim Bi (Graduate Research Analyst, Johns Hopkins<br />
University / George Mason University)<br />
Topic(s): Life Sciences, Bioinformatics (Intermediate)<br />
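The abstract does not say which statistic the analyzer computes. A common choice in GWAS is the allelic chi-square test on a 2x2 table of allele counts in cases and controls. The sketch below shows that per-SNP work in Python. Because every SNP is tested independently, a scan over millions of markers is embarrassingly parallel, which is what makes a per-GPU speed-up and extension to GPU clusters natural.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) and p-value for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Upper tail of a 1-df chi-square: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def scan(snps):
    """Test every SNP independently; each SNP is one unit of parallel work."""
    return [allelic_chi2(*counts) for counts in snps]
```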
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
ROOM K<br />
S0304 Large Scale Computational Fluid Dynamics<br />
Simulations on Hybrid Supercomputers<br />
Learn how to approach the all-too-common problem of trying to<br />
retrofit a major application for speed in the modern era of the<br />
hybrid supercomputer. In this talk, we will focus on computational<br />
fluid dynamics (CFD) codes that are run on Top500<br />
Supercomputers. Many of these applications have existed for 20 or<br />
more years, so the process of adding the <strong>GPU</strong> and getting<br />
wall-clock improvements in performance can be very challenging!<br />
Our talk will discuss how to properly target your effort, the impact<br />
of directives-based coding, and how to maintain efficiency across<br />
a hybrid cluster.<br />
Speaker(s): John Humphrey (Engineering Director, EM Photonics), Eric<br />
Kelmelis (CEO, EM Photonics)<br />
Topic(s): Computational Fluid Dynamics, Supercomputing<br />
(Intermediate)<br />
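Amdahl's law explains why targeting effort matters for wall-clock time when retrofitting a legacy code. It is a standard result, not specific to this talk. If a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s).

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Whole-application speedup when only part of the runtime is accelerated."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)
```

Porting kernels that cover 90% of the runtime with a 10x kernel speedup gives only about 5.3x overall. Even an infinitely fast GPU would cap the gain at 10x, so the remaining serial work is often the real target.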
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
ROOM A8<br />
S0348 <strong>GPU</strong>s Open New Avenues in Medical MRI<br />
See how <strong>GPU</strong>s enable exciting new developments in medical<br />
Magnetic Resonance Imaging (MRI). Their computational power<br />
now makes practical new MRI techniques that can bring shorter<br />
imaging sessions, better images, and more insight into human<br />
physiology. Learn about the characteristics of the general<br />
computational approach for obtaining the final image, and how it<br />
can be implemented using an iterative conjugate gradient<br />
algorithm. The algorithm exhibits massive parallelism and fits the<br />
<strong>GPU</strong> architecture well. Learn about its CUDA implementation<br />
details and MATLAB integration. See throughput measurements of<br />
Tesla <strong>GPU</strong>s compared to top-of-the-line many-core and large-RAM<br />
CPU systems.<br />
Speaker(s): Chris A. Cocosco (Scientist, University Medical Center<br />
Freiburg, Dept. of Radiology, Medical Physics)<br />
Topic(s): Medical Imaging & Visualization (Beginner)<br />
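The conjugate gradient method can be written so that the system matrix is touched only through a matrix-vector product. This is what lets iterative reconstructions apply their encoding operator on the fly instead of storing it. The minimal sketch below takes the operator as a callable and assumes a real symmetric positive-definite system. MRI reconstructions work with complex data and typically solve normal equations, which the sketch does not model.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                         # residual b - A x for x = 0
    p = list(r)                         # first search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Each iteration costs one operator application plus a few vector operations. All of these are data-parallel, which is why the method maps well onto GPUs.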
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
ROOM A7<br />
S0352 <strong>GPU</strong>-Accelerated Parallel Computing for<br />
Simulation of Seismic Wave Propagation<br />
We adopted <strong>GPU</strong>s to accelerate large-scale, parallel finite-difference time-domain<br />
(FDTD) simulation of seismic wave propagation.<br />
An effective parallel implementation is needed because the memory<br />
of a single <strong>GPU</strong> is too small for real applications.<br />
We therefore describe the memory optimization, the three-dimensional<br />
domain decomposition, and the overlapping of<br />
communication and computation adopted in our program. So far, we<br />
have achieved about 61 TFlops (single precision)<br />
using 1200 <strong>GPU</strong>s of TSUBAME-2.0, the <strong>GPU</strong><br />
supercomputer in Tokyo Institute of <strong>Technology</strong>, Japan. As an<br />
important application, we show the results of the simulation of the<br />
2011 Tohoku-Oki mega-quake.<br />
Speaker(s): Taro Okamoto (Assistant Professor, Tokyo Institute<br />
of <strong>Technology</strong>)<br />
Topic(s): Energy Exploration, Computational Physics, General Interest<br />
(Advanced)<br />
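Domain decomposition for finite-difference codes relies on halo (ghost-cell) exchange. Each GPU updates only its own subdomain, but it needs one layer of neighbor values per unit of stencil radius. The minimal 1-D sketch below uses a three-point stencil and checks that the decomposed update matches the global one. It shows only the bookkeeping; the talk's 3-D FDTD stencils and the overlap of transfers with computation are not modeled.

```python
def stencil(u):
    """Explicit diffusion-style update of u's interior points (ends not returned)."""
    return [u[i] + 0.1 * (u[i - 1] - 2.0 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]


def decomposed_step(global_u, n_sub):
    """Apply stencil() on n_sub subdomains, each padded with one halo cell per side.

    global_u holds two fixed boundary cells plus an interior whose length
    must be divisible by n_sub.
    """
    size = (len(global_u) - 2) // n_sub
    new_interior = []
    for d in range(n_sub):
        lo, hi = 1 + d * size, 1 + (d + 1) * size      # owned cells: global_u[lo:hi]
        # Halo values come from the neighbours (or the global boundary); on a
        # cluster they would arrive by device or MPI transfers each step.
        left_halo, right_halo = global_u[lo - 1], global_u[hi]
        new_interior.extend(stencil([left_halo] + global_u[lo:hi] + [right_halo]))
    return [global_u[0]] + new_interior + [global_u[-1]]
```

A common optimization is to update the subdomain's inner cells, which need no halo data, while the halo transfer is in flight. This is one way communication and computation can overlap.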
WEDNESDAY, MAY 16, 10:30 (25 MINUTES)<br />
ROOM A1<br />
S0355 Seamless Scalable Displays- Using NVIDIA Warp +<br />
Intensity API<br />
In this talk we will discuss how we use the NVIDIA Warp and<br />
Intensity API to create seamless displays made up of<br />
multiple projectors based on our camera feedback systems. We will<br />
show and discuss case studies in production including a 25<br />
megapixel touch wall, military dome simulation systems, VR<br />
Walls, VR Caves, and immersive conference rooms that are made<br />
affordable and enabled by this technology.<br />
Speaker(s): Rajeev Surati (President, Scalable Display Technologies)<br />
Topic(s): Visualization, Audio, Image and Video Processing, Computer<br />
Vision, Computer Graphics (Beginner)<br />
WEDNESDAY, MAY 16, 11:00 (50 MINUTES)<br />
HALL 1<br />
S3001 Day 2 Keynote: From Democratic Consensus to<br />
Cannibalistic Hordes: <strong>GPU</strong> Computing Reveals the<br />
Principles of Collective Behavior<br />
Collective behavior is one of the most pervasive features of the<br />
natural world. Our brains are composed of billions of<br />
interconnected cells communicating with chemical and electrical<br />
signals. We are integrated in our own human society. Elsewhere in<br />
the natural world a fish school convulses, as if one entity, when<br />
being attacked by a predator. How does individual behavior<br />
produce dynamic group-level properties? Do animal groups (or<br />
even cells in a tumor) function as some form of ‘collective mind’?<br />
How does socially contagious behavior spread through natural<br />
human crowds? In his keynote address, Prof. Iain D. Couzin will<br />
demonstrate how <strong>GPU</strong> computing has been pivotal in the study of<br />
collective behavior, helping reveal how collective action emerges<br />
in a wide range of groups from plague locusts to human crowds,<br />
and the critical role that uninformed, or weakly-opinionated,<br />
individuals play in democratic consensus decision-making.<br />
Speaker(s): Iain Couzin (Assistant Professor, Princeton University)<br />
Topic(s): General Interest (All Levels)<br />
WEDNESDAY, MAY 16, 11:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2002 Emerging Companies Summit: CEO on Stage<br />
Featuring eyeSight Mobile, Numira Biosciences, and Ubitus<br />
See the hottest new technologies from startups that are<br />
transforming computing. In a lively and fast-paced exchange, the<br />
Emerging Companies Summit CEO on Stage sessions will feature<br />
CEOs from three startups who will each have 15 minutes to<br />
introduce their companies and interact with a panel of leading<br />
venture capitalists, technology executives, and industry analysts.<br />
Speaker(s): Gideon Shmuel (CEO, eyeSight Mobile), David Weinstein<br />
(CTO, Numira Biosciences), Wesley Kuo (CEO, Ubitus)<br />
Panelist(s): Jon Peddie (President, Jon Peddie Research), Neil<br />
Sequeira (Managing Director, General Catalyst Partners), Savitha<br />
Srinivasan (Partner, IBM Venture Capital Group)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM C<br />
S0027B All-In-One Debugging Experience with CUDA-<br />
GDB and CUDA-MEMCHECK<br />
The CUDA debugging tools CUDA-GDB and CUDA-MEMCHECK provide<br />
a whole new feature set to help improve your CUDA application<br />
development cycle. This session is a detailed walk-through of the<br />
key debugger features and advanced techniques on using printf,<br />
CUDA-GDB and MEMCHECK together to improve overall code<br />
productivity on Linux and MacOS platforms. This tutorial will also<br />
include live demos.<br />
Speaker(s): Geoff Gerfin (Technical Manager / Senior Engineer,<br />
NVIDIA), Vyas Venkataraman (Software Engineer, NVIDIA)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0029 Leveraging Matrix Block Structure In Sparse<br />
Matrix-Vector Multiplication<br />
The commonly occurring block structure of sparse matrices can<br />
be effectively leveraged to improve the performance of Sparse<br />
Matrix-Vector multiplication (SpMV) on <strong>GPU</strong>s. This session will<br />
present one such algorithm and discuss both its design and its<br />
performance relative to other SpMV algorithms. In particular,<br />
aspects of <strong>GPU</strong> floating point performance, <strong>GPU</strong> memory use, and<br />
data structure translation effort will be detailed.<br />
Speaker(s): Steve Rennich (HPC Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />
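A standard way to exploit block structure is block CSR (BSR), which stores small dense blocks instead of individual nonzeros. The session's algorithm may differ, so the sketch below shows only the general idea. It reads one column index per block rather than per nonzero, and each block contributes a small dense matrix-vector product, which is what makes block formats attractive on bandwidth-limited hardware.

```python
def dense_to_bsr(dense, bs):
    """Convert a square dense matrix (size divisible by bs) to block CSR.

    A bs x bs block is kept if any of its entries is nonzero.
    Returns (row_ptr, col_idx, blocks).
    """
    nb = len(dense) // bs
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            blk = [row[bj * bs:(bj + 1) * bs] for row in dense[bi * bs:(bi + 1) * bs]]
            if any(v != 0 for r in blk for v in r):
                col_idx.append(bj)
                blocks.append(blk)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, blocks


def bsr_spmv(row_ptr, col_idx, blocks, x, bs):
    """y = A x with one column index read per block."""
    y = [0.0] * ((len(row_ptr) - 1) * bs)
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            xs = x[col_idx[k] * bs:(col_idx[k] + 1) * bs]
            # Small dense block times the matching slice of x.
            for i, brow in enumerate(blocks[k]):
                y[bi * bs + i] += sum(a * b for a, b in zip(brow, xs))
    return y
```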
WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />
ROOM K<br />
S0064 MSC Nastran Sparse Direct Solvers for Tesla <strong>GPU</strong>s<br />
The current implementation of MSC Nastran’s MSCLDL and<br />
MSCLU sparse direct solvers for multiple Tesla <strong>GPU</strong>s is<br />
presented. The matrix is first statically decomposed into a<br />
prescribed number of domains. The Schur complements are then<br />
calculated with CPUs and <strong>GPU</strong>s, and the residual structure is<br />
solved afterward. Back-substitution is used to find the solution at<br />
every grid point. Merits of this method are discussed and<br />
performance comparisons are made.<br />
Speaker(s): Cheng Liao (Development Manager, MSCsoftware)<br />
Topic(s): Computational Structural Mechanics (Beginner)<br />
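The sequence in the abstract (decompose into domains, form Schur complements, solve the residual interface system, back-substitute) can be sketched on a small dense system. The sketch uses a plain Gaussian-elimination helper; MSC Nastran's sparse factorizations are not modeled. With several domains, the interior block A_II is block diagonal, so each domain's contribution to the Schur complement can be computed independently, for example on separate CPUs or GPUs.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def schur_solve(A, b, interior, boundary):
    """Solve A x = b by condensing interior unknowns onto the boundary unknowns."""
    def block(rows, cols):
        return [[A[r][c] for c in cols] for r in rows]

    A_II, A_IB = block(interior, interior), block(interior, boundary)
    A_BI, A_BB = block(boundary, interior), block(boundary, boundary)
    b_I, b_B = [b[i] for i in interior], [b[i] for i in boundary]
    nI, nB = len(interior), len(boundary)
    # Z[j] = A_II^{-1} (column j of A_IB); w = A_II^{-1} b_I.
    Z = [solve(A_II, [row[j] for row in A_IB]) for j in range(nB)]
    w = solve(A_II, b_I)
    # Schur complement S = A_BB - A_BI A_II^{-1} A_IB and condensed right-hand side.
    S = [[A_BB[i][j] - sum(A_BI[i][k] * Z[j][k] for k in range(nI)) for j in range(nB)]
         for i in range(nB)]
    g = [b_B[i] - sum(A_BI[i][k] * w[k] for k in range(nI)) for i in range(nB)]
    x_B = solve(S, g)
    # Back-substitution: recover interior unknowns from the boundary solution.
    x_I = solve(A_II, [b_I[k] - sum(A_IB[k][j] * x_B[j] for j in range(nB))
                       for k in range(nI)])
    x = [0.0] * len(b)
    for k, i in enumerate(interior):
        x[i] = x_I[k]
    for k, i in enumerate(boundary):
        x[i] = x_B[k]
    return x
```

In the usage check, unknowns 0-1 and 3-4 form two domains separated by interface unknown 2.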
WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />
ROOM A7<br />
S0140 Accelerating Reservoir Simulation and Algebraic<br />
Multigrid with <strong>GPU</strong>s<br />
Given a model of a reservoir’s rock and well properties, a<br />
reservoir simulator solves the PDEs for the multiphase flow<br />
through porous rock to predict well production. Over the past<br />
several decades, simulation has progressed from coarse 2D<br />
models to detailed 3D models, providing strong fidelity to<br />
empirical production rates. By reformulating the Marathon Oil<br />
Corporation’s Multiscale Flow Simulator to use <strong>GPU</strong>s, we improve<br />
the overall execution speed by a factor of over 100, allowing fast<br />
turnaround on a <strong>GPU</strong> workstation. We also introduce GAMPACK, a<br />
fully-accelerated <strong>GPU</strong> algebraic multigrid solver, and demonstrate<br />
its performance relative to CPU solvers.<br />
Speaker(s): Kenneth Esler (Computational Physicist, Stone Ridge<br />
<strong>Technology</strong>), Vincent Natoli (Founder & CEO, Stone Ridge <strong>Technology</strong>)<br />
Topic(s): Energy Exploration (Intermediate)<br />
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM N<br />
S0142 VMD: High Performance Molecular Visualization<br />
and Analysis on <strong>GPU</strong>s<br />
This talk will present recent successes in the use of <strong>GPU</strong>s to<br />
accelerate interactive molecular visualization and analysis tasks<br />
on desktop computers, and batch-mode simulation and analysis<br />
jobs on <strong>GPU</strong>-accelerated HPC clusters. We’ll present Fermi-specific<br />
algorithms and optimizations and compare with those for<br />
other devices. We’ll also present performance and performance/<br />
watt results for VMD analysis calculations on <strong>GPU</strong> clusters, and<br />
conclude with a discussion of ongoing work and future<br />
opportunities for <strong>GPU</strong> acceleration, particularly as applied to the<br />
analysis of petascale simulations of large biomolecular complexes<br />
and long simulation timescales.<br />
Speaker(s): John Stone (Senior Research <strong>Program</strong>mer, University of<br />
Illinois at Urbana-Champaign)<br />
Topic(s): Molecular Dynamics, Algorithms & Numerical Techniques,<br />
Computer Graphics (Intermediate)<br />
WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />
ROOM A3<br />
S0307 New Advances in <strong>GPU</strong> Linear Algebra<br />
Hear product experts explain how we have created two of the most<br />
widely used libraries in the <strong>GPU</strong> computing ecosystem. The CULA<br />
library for dense linear algebra has been expanding to multi-<strong>GPU</strong><br />
and out-of-core applications, meaning that users are no longer<br />
limited by the onboard <strong>GPU</strong> memory for their work. In this field,<br />
effectively using multiple <strong>GPU</strong>s is significantly more challenging than<br />
using a single <strong>GPU</strong>! The brand-new CULA Sparse library tackles the tough<br />
world of sparse linear algebra and achieves 10x speedups. Learn<br />
more about what makes these two libraries work in this session.<br />
Speaker(s): John Humphrey (Engineering Director, EM Photonics), Kyle<br />
Spagnoli (Research Engineer, EM Photonics)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM B<br />
S0327 Large and Sparse– Mass Spectrometry Data<br />
Processing in the <strong>GPU</strong><br />
Learn how the <strong>GPU</strong> helps identify millions of ions in datasets of<br />
several billion points of four-dimensional sparse data. The data is<br />
first reduced to 3D to locate regions of dense data, and then only<br />
those regions are processed in 4D. Processing involves combining<br />
several steps of convolution filters along three axes, finding local<br />
maxima in volumes of data, and extracting information from<br />
the data around each local maximum.<br />
Speaker(s): Jose de Corral (Principal Consulting Engineer,<br />
Waters Corporation)<br />
Topic(s): Life Sciences, Bioinformatics (Intermediate)<br />
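Two of the processing steps described, separable convolution along each axis followed by a local-maximum search, are sketched below in 2-D for brevity. The talk works in 3-D/4-D with its own filters. The [1, 2, 1]/4 kernel, edge clamping, and threshold are assumptions chosen for illustration.

```python
def smooth_axis(grid, axis):
    """3-point [1, 2, 1]/4 smoothing along one axis of a 2-D grid (edges clamped)."""
    n_r, n_c = len(grid), len(grid[0])

    def at(r, c):
        return grid[min(max(r, 0), n_r - 1)][min(max(c, 0), n_c - 1)]

    dr, dc = (1, 0) if axis == 0 else (0, 1)
    return [[(at(r - dr, c - dc) + 2 * at(r, c) + at(r + dr, c + dc)) / 4
             for c in range(n_c)] for r in range(n_r)]


def local_maxima(grid, threshold=0.0):
    """Interior cells strictly greater than all 8 neighbours and above threshold."""
    peaks = []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            v = grid[r][c]
            neighbours = [grid[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1)
                          if (i, j) != (0, 0)]
            if v > threshold and all(v > w for w in neighbours):
                peaks.append((r, c))
    return peaks
```

Smoothing one axis at a time keeps the filter separable, so its cost grows with the kernel length rather than the kernel volume. Each output point is also independent, which suits one GPU thread per point.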
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM A1<br />
S0335 Live 3D-Video with a Lightfield Camera<br />
In this session you will learn what a lightfield camera is, how it<br />
works and what you can do with it. In addition to the theoretical<br />
presentation, we give a live demo of the camera system developed<br />
by our company Raytrix that gives you 3D live video from a single<br />
camera through a single lens, currently at up to 10 fps with a<br />
maximum effective resolution of 3 megapixels synthesized from<br />
an 11-megapixel sensor using CUDA algorithms on a GTX 580.<br />
Post-production features include pixel-wise focusing, depth zoom,<br />
variable stereo base-line and base-line rotation.<br />
Speaker(s): Christian Perwass (CEO, Raytrix GmbH)<br />
Topic(s): Computational Photography, Audio, Image and Video<br />
Processing, Stereoscopic 3D, Computer Vision (Beginner)<br />
WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />
ROOM A8<br />
S0342 Volumetric Processing and Visualization on<br />
Heterogeneous Architecture<br />
Volumetric data is typically very large and involves intensive<br />
computation for processing and visualization. We have developed an<br />
OpenCL-based framework that can utilize all available resources in<br />
a system or a cluster of systems. The framework manages one or<br />
more OpenCL devices. A large volume is partitioned into bricks.<br />
Each OpenCL device is associated with a set of brick producers that<br />
generate the contents of bricks, optionally using other<br />
bricks as input. The framework also includes a scheduler<br />
that distributes brick workloads to different devices and chooses a<br />
processing order optimized for given criteria.<br />
Speaker(s): Wei Li (Research Scientist, Siemens Corporation)<br />
Topic(s): Visualization, Supercomputing (Advanced)<br />
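The brick-based design in S0342 can be sketched as two steps: partition the volume into fixed-size bricks, then let a scheduler hand bricks to devices. The cost model (time proportional to voxel count divided by a per-device speed), the greedy largest-first policy and all names below are assumptions for illustration; the abstract does not say which criteria the Siemens scheduler optimizes, and real brick producers may also read neighbouring bricks as input.

```python
def partition_bricks(shape, brick):
    """Split a volume of size shape=(X, Y, Z) into bricks no larger than brick=(bx, by, bz).

    Returns a list of (origin, size) tuples; bricks at the volume edge may be smaller.
    """
    bricks = []
    for x in range(0, shape[0], brick[0]):
        for y in range(0, shape[1], brick[1]):
            for z in range(0, shape[2], brick[2]):
                origin = (x, y, z)
                size = tuple(min(b, s - o) for b, s, o in zip(brick, shape, origin))
                bricks.append((origin, size))
    return bricks


def schedule_bricks(bricks, device_speeds):
    """Greedy list scheduling of bricks onto devices (hypothetical cost model).

    Processing time of a brick on a device = voxel count / device speed.
    Bricks are taken largest first; each goes to the device that would finish it earliest.
    """
    finish = [0.0] * len(device_speeds)
    assignment = {}

    def voxels(b):
        sx, sy, sz = b[1]
        return sx * sy * sz

    for b in sorted(bricks, key=voxels, reverse=True):
        cost = [voxels(b) / speed for speed in device_speeds]
        d = min(range(len(device_speeds)), key=lambda i: finish[i] + cost[i])
        finish[d] += cost[d]
        assignment[b[0]] = d  # brick origin -> device index
    return assignment, finish
```

With two devices where the first is twice as fast, four equal bricks end up split three to one, so both devices finish at comparable times.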
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM L<br />
S0369 Running Risk On <strong>GPU</strong>s<br />
A key component of Basel III is the Credit Value Adjustment (CVA),<br />
which is in essence the value of counterparty credit risk.<br />
Quantifying CVA even on simple products poses<br />
considerable computational challenges, and since many<br />
banks hold hundreds of thousands of positions, it becomes clear<br />
that the computational challenges of CVA are massive. Calculating<br />
CVA sensitivities for hedging only adds to this burden. In this talk<br />
we will discuss real world applications of <strong>GPU</strong>s in risk<br />
management and show how, using CUDA, <strong>GPU</strong> computing is an<br />
enabling technology to address the computational challenges of<br />
an evolving regulatory environment.<br />
Speaker(s): Norbert Hari (Trading Quantitative Analyst, ING Bank nv), Tim<br />
Wood (Quantitative Analyst, ING Bank nv)<br />
Topic(s): Finance (Intermediate)<br />
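For readers new to CVA, one common textbook discretization of unilateral CVA is shown below. It is given only for orientation and is not necessarily the formulation used by the speakers. Here R is the counterparty's recovery rate, DF(t_i) the discount factor to time t_i, EE(t_i) the expected positive exposure at t_i, and S(t) the counterparty's survival probability, so S(t_{i-1}) - S(t_i) is the probability of default in the interval (t_{i-1}, t_i].

```latex
\mathrm{CVA} \approx (1 - R) \sum_{i=1}^{n} DF(t_i)\, EE(t_i)\, \bigl[ S(t_{i-1}) - S(t_i) \bigr]
```

In practice EE(t_i) is usually estimated by Monte Carlo simulation of each netting set over many market scenarios and time steps, and sensitivities require repeating or differentiating that simulation, which is where the computational burden described above comes from.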
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM A5<br />
S0419B Optimizing Application Performance with CUDA<br />
Profiling Tools<br />
NVIDIA provides two powerful profiling tools that you can use to<br />
maximize your application’s performance. The NVIDIA Visual<br />
Profiler helps you understand your application’s behavior with a<br />
detailed timeline and data from <strong>GPU</strong> performance counters. The<br />
Visual Profiler also includes an automatic, data-driven analysis<br />
engine that suggests potential optimization<br />
strategies for your application. nvprof is a command-line profiler<br />
that provides gprof-like functionality for the <strong>GPU</strong>. nvprof provides<br />
summary information about where your application is spending<br />
the most time, so that you can focus your optimization efforts.<br />
This session will provide a step-by-step walkthrough of both of<br />
these profiling tools, showing how you can use them to<br />
identify optimization opportunities at the application, kernel, and<br />
source-line levels.<br />
Speaker(s): David Goodwin (Software Engineer, NVIDIA)<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 14:00 (25 MINUTES)<br />
ROOM A2<br />
S0600 Scalable <strong>GPU</strong> Graph Traversal<br />
Breadth-first search (BFS) is a core primitive for graph traversal<br />
and a basis for many higher-level graph analysis algorithms. It is<br />
also representative of a class of parallel computations whose<br />
memory accesses and work distribution are both irregular and<br />
data-dependent. Recent work has demonstrated the plausibility of<br />
<strong>GPU</strong> sparse graph traversal, but has tended to focus on<br />
asymptotically inefficient algorithms that perform poorly on<br />
graphs with non-trivial diameter. We present a BFS parallelization<br />
focused on fine-grained task management, built from<br />
efficient prefix sums, that achieves an asymptotically optimal<br />
O(|V|+|E|) work complexity. Our implementation delivers excellent<br />
performance on diverse graphs, achieving traversal rates in<br />
excess of 3.3 billion and 8.3 billion traversed edges per second<br />
using single- and quad-<strong>GPU</strong> configurations, respectively. This level<br />
of performance is several times faster than state-of-the-art<br />
implementations on both CPU and <strong>GPU</strong> platforms.<br />
Speaker(s): Duane Merrill (Research Scientist, NVIDIA)<br />
Topic(s): Algorithms and Numerical Techniques (Beginner)<br />
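The prefix-sum idea behind S0600 can be illustrated with a sequential Python sketch of level-synchronous BFS. Each frontier vertex obtains a disjoint slice of the output array from an exclusive prefix sum of the frontier degrees (expansion), and newly discovered vertices are compacted into the next frontier with a second prefix sum (contraction), so every vertex and edge is handled once, consistent with O(|V|+|E|) work. This is a simplified illustration, not the speaker's CUDA implementation; it omits the fine-grained task management the abstract describes, and the function names are hypothetical.

```python
from itertools import accumulate


def exclusive_scan(xs):
    """Exclusive prefix sum: [3, 1, 2] -> [0, 3, 4]."""
    return [0] + list(accumulate(xs))[:-1] if xs else []


def bfs_prefix_sum(adj, source):
    """Level-synchronous BFS whose expansion and compaction steps use prefix sums.

    adj is an adjacency list; returns the BFS depth of every vertex (-1 if unreachable).
    """
    depth = [-1] * len(adj)
    depth[source] = 0
    frontier, level = [source], 0
    while frontier:
        # Expansion: a scan of the degrees gives each frontier vertex its output range.
        degrees = [len(adj[v]) for v in frontier]
        offsets = exclusive_scan(degrees)
        gathered = [0] * sum(degrees)
        for v, off in zip(frontier, offsets):      # independent per vertex (parallel on a GPU)
            gathered[off:off + len(adj[v])] = adj[v]
        # Contraction: flag newly discovered vertices, then compact them with a second scan.
        level += 1
        flags = []
        for u in gathered:
            is_new = depth[u] == -1
            if is_new:
                depth[u] = level   # a GPU version must handle concurrent discovery here
            flags.append(1 if is_new else 0)
        positions = exclusive_scan(flags)
        frontier = [0] * sum(flags)
        for u, flag, pos in zip(gathered, flags, positions):
            if flag:
                frontier[pos] = u
    return depth
```

For a small graph with edges 0-1, 0-2, 1-3, 2-3, 3-4 and an isolated vertex 5, a traversal from vertex 0 yields depths 0, 1, 1, 2, 3 and -1.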
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM M<br />
S0637 Analyzing performance and power of applications<br />
with <strong>GPU</strong>s on Dell 12G platforms (Presented by Dell)<br />
In this talk, both performance and power aspects of running<br />
various applications on NVIDIA <strong>GPU</strong>s on Dell 12G platforms will be<br />
presented. These platforms utilize the latest PCIe Gen 3 slots and<br />
processors in conjunction with varying numbers of NVIDIA <strong>GPU</strong>s<br />
and are tested with several applications both from a performance<br />
perspective and a power perspective.<br />
Speaker(s): Dr. Jeff Layton (HPC Enterprise Technologist, Dell)<br />
Topic(s): Supercomputing, Visualization (Intermediate)
WEDNESDAY, MAY 16, 14:00 (80 MINUTES)<br />
HALL 1<br />
S0642 Inside Kepler<br />
In this talk, individuals from the <strong>GPU</strong> architecture and CUDA<br />
software groups will dive into the features of the compute<br />
architecture for “Kepler” – NVIDIA’s new <strong>GPU</strong>. From the<br />
reorganized processing cores with new instructions and<br />
processing capabilities, to an improved memory system with<br />
faster atomic processing and low-overhead ECC, we will explore<br />
how the Kepler <strong>GPU</strong> achieves world-leading performance and<br />
efficiency, and how it enables wholly new types of parallel<br />
problems to be solved.<br />
Speaker(s): Stephen Jones (CUDA Developer, NVIDIA), Lars Nyland<br />
(Senior Architect, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM J1<br />
S0700 Stampede System Architecture and Early<br />
Accelerator <strong>Program</strong>ming Experiences<br />
We present a description of the design of the Stampede system to<br />
be deployed at TACC over the course of <strong>2012</strong>. Stampede comprises<br />
a 2 PF Intel Sandy Bridge cluster with FDR InfiniBand, augmented with<br />
8 PF of Intel MIC Architecture coprocessors. We will describe the<br />
design of the system, the datacenter that houses it, and expected<br />
programming models and usage modes. In support of this, we will<br />
present early experiences programming for the Intel MIC<br />
Architecture using the Knights Ferry Software Development<br />
Platform. Key to this will be the presentation of several different<br />
programming models and the scalability of the resulting codes.<br />
Speaker(s): Bill Barth (Director of High Performance Computing, Texas<br />
Advanced Computing Center, University of Texas at Austin)<br />
Topic(s): Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0807 CUDA Debugger Training on Windows<br />
Nsight offers a powerful CUDA debugging feature set<br />
that enables developers to quickly spot bugs. From the memory<br />
checker to advanced breakpoints and the warp watch panel, a<br />
developer can quickly isolate memory access errors, filter the<br />
thousands of threads down to a specific thread and quickly spot<br />
abnormal variable value ranges. Through a set of comprehensive<br />
exercises, attendees will learn to use these features and<br />
become fully proficient at developing CUDA code.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 14:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2003 Emerging Companies Summit: Fireside Chat with<br />
Jen-Hsun Huang (CEO and Co-Founder, NVIDIA) and Tim<br />
Bajarin (President, Creative Strategies)<br />
NVIDIA CEO and co-founder Jen-Hsun Huang will take part in a<br />
fireside chat with Tim Bajarin, one of the IT world’s pre-eminent<br />
analysts and president of Creative Strategies. They will discuss<br />
trends in mobile, visual and parallel computing, and the<br />
transformational changes ahead for the industry.<br />
Speaker(s): Jen-Hsun Huang (CEO, President and Co-Founder,<br />
NVIDIA), Tim Bajarin (President, Creative Strategies)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />
ROOM A3<br />
S0085 Floating Point and IEEE 754 Compliance for<br />
NVIDIA <strong>GPU</strong>s: Precision & Performance<br />
As a result of continuing improvements, NVIDIA offers <strong>GPU</strong>-accelerated<br />
floating-point performance in compliance with IEEE<br />
754. In our experience, a number of issues related to floating-point<br />
accuracy and compliance are a frequent source of confusion<br />
on both CPUs and <strong>GPU</strong>s. The purpose of this talk is to discuss the<br />
most common ones related to NVIDIA <strong>GPU</strong>s and to supplement<br />
the documentation in the CUDA C <strong>Program</strong>ming <strong>Guide</strong>.<br />
Speaker(s): Alex Fit-Florea (Senior Engineer, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques, Development Tools<br />
& Libraries (Intermediate)<br />
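One common source of the CPU/GPU discrepancies discussed in S0085 is the fused multiply-add (FMA), which computes a*b + c with a single rounding instead of two. The sketch below emulates both variants in Python: the unfused version rounds the product to double first, while the fused version evaluates the expression exactly with `fractions.Fraction` and rounds once. The inputs are chosen so that the intermediate rounding wipes out the entire result. The function names are illustrative; on the device side, nvcc can contract a*b + c into FMA instructions (controlled by its `--fmad` option), so the same source line may round differently on host and device.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """a*b is rounded to double, then the sum is rounded again (two roundings)."""
    product = a * b
    return product + c


def mul_add_fused(a, b, c):
    """Emulated FMA: a*b + c computed exactly with rationals, rounded once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# The exact product a*b = 1 - 2**-60 rounds to 1.0 in double precision.
print(mul_add_unfused(a, b, c))                # 0.0
print(mul_add_fused(a, b, c) == -2.0 ** -60)   # True
```

Neither result is "wrong": both are IEEE 754-conforming for the operations actually performed, which is why such differences should be expected rather than treated as bugs. Python 3.13 and later also expose a native `math.fma`.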
WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />
ROOM A8<br />
S0105 Hardware Acceleration for Vessel<br />
Visualization Tasks<br />
To analyze datasets visually, systems with fast feedback loops on<br />
user interaction are beneficial. In this session rendering and<br />
preprocessing techniques for medical volume data will be<br />
presented using OpenGL and CUDA. In the context of coronary<br />
artery disease, the analysis of individual vessel branches is<br />
important. We show how local transfer function application and<br />
generation by means of histogram analysis can help with navigating<br />
and finding details in the datasets. Furthermore, domain-specific<br />
acceleration and illustration techniques for volume rendering are<br />
also applied to datasets from brain aneurysms.<br />
Speaker(s): Christoph Kubisch (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Medical Imaging & Visualization, Computer Graphics (Beginner)<br />
WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />
ROOM K<br />
S0143 Fluid-Structure-Interaction Using SPH and<br />
GP<strong>GPU</strong> <strong>Technology</strong><br />
There are two goals when developing engineering analysis<br />
software: accuracy and speed. In the area of<br />
Fluid-Structure Interaction (FSI), computational time has always<br />
been the major impediment to solving large, realistic engineering<br />
problems. In our implementation, the fluid/structural dynamics<br />
solver uses a combination of <strong>GPU</strong>/CPU processing. An added<br />
benefit of using a powerful <strong>GPU</strong> workstation is that it is roughly 10<br />
times less expensive than a regular CPU cluster. In this paper, we<br />
present the use of <strong>GPU</strong> <strong>Technology</strong> as implemented in the explicit<br />
dynamic finite element software IMPETUS Afea Solver®.<br />
Speaker(s): Jean Luc Lacome (IMPETUS Afea SAS), Jerome Limido<br />
(IMPETUS Afea SAS)<br />
Topic(s): Computational Structural Mechanics, Algorithms &<br />
Numerical Techniques, Computational Fluid Dynamics (Intermediate)<br />
WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />
ROOM A7<br />
S0190 Large-Scale Reservoir Simulation on <strong>GPU</strong><br />
We develop a highly parallel <strong>GPU</strong>-based GMRES solver and several<br />
preconditioners, and couple them with an in-house reservoir<br />
simulator to speed up large-scale reservoir simulations with over<br />
one million grid blocks. For the preconditioners, we develop<br />
highly parallelized ILU(k), ILUT, block ILU(k) and block ILUT on the <strong>GPU</strong>,<br />
with the matrix partitioned by METIS. The excellent speedups and<br />
accurate results demonstrate the promising future of<br />
the <strong>GPU</strong> as a parallel device for reservoir simulation.<br />
Speaker(s): Song Yu (Chemical & Petroleum Department, University<br />
of Calgary)<br />
Topic(s): Application Design & Porting Techniques, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 14:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0271 Fast Adaptive Sampling Technique for Multi-<br />
Dimensional Integral Estimation Using <strong>GPU</strong>s<br />
Evaluating multi-dimensional integrals is a commonly encountered<br />
problem in many areas of science, including physics and volume<br />
estimation of convex bodies. One of the most widely used techniques for<br />
integral evaluation in high dimensions is the Monte Carlo method.<br />
Vanilla Monte Carlo integral estimation uses uniform<br />
sampling. The statistical error (standard deviation) of such an estimate<br />
decreases only as 1/√N, where N is the sample size, which is too slow for most real-life applications.<br />
In this study, we discuss an adaptive sampling technique<br />
called VEGAS, which reduces the variance at a much faster rate than<br />
uniform sampling. We present a new parallel implementation of<br />
VEGAS based on CUDA that can significantly reduce the<br />
computation time of multi-dimensional integrals. We show that our<br />
<strong>GPU</strong>-based implementation of VEGAS achieves up to a 45x speedup<br />
over an equivalent CPU-based implementation.<br />
Speaker(s): Srinivasa Prasanna (Professor, International Institute of<br />
Information <strong>Technology</strong> Bangalore), Pradeep Rao (<strong>Technology</strong><br />
Architect, Infosys Technologies Ltd)<br />
Topic(s): Algorithms & Numerical Techniques, Finance (Intermediate)<br />
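The core VEGAS idea in S0271, adapting the sampling density so that samples concentrate where the integrand is large, can be shown in one dimension. The sketch below is a simplified stand-in, not the speakers' CUDA implementation. It keeps a grid of bins on [0, 1], samples each bin with equal probability (so narrow bins get a high density), and after each iteration moves the bin edges so that every bin holds roughly the same share of the estimated integral of |f|. Full VEGAS uses a separable grid per dimension, weights bins by squared contributions, damps the refinement and combines iterations by inverse variance; all of that is omitted here, and the parameter defaults are arbitrary.

```python
import math
import random


def vegas_1d(f, n_bins=50, n_iter=5, n_samples=5000, seed=0):
    """Simplified 1D VEGAS-style adaptive importance sampling of f on [0, 1].

    Returns (estimates, variances): per-iteration integral estimates and the
    sample variance of the importance weights f(x) / p(x).
    """
    rng = random.Random(seed)
    edges = [i / n_bins for i in range(n_bins + 1)]
    estimates, variances = [], []
    for _ in range(n_iter):
        abs_sum, counts = [0.0] * n_bins, [0] * n_bins
        s1 = s2 = 0.0
        for _ in range(n_samples):
            i = rng.randrange(n_bins)                # each bin chosen with probability 1/n_bins
            width = edges[i + 1] - edges[i]
            x = edges[i] + rng.random() * width      # uniform inside the chosen bin
            fx = f(x)
            w = fx * n_bins * width                  # f(x) / p(x) with p(x) = 1 / (n_bins * width)
            s1 += w
            s2 += w * w
            abs_sum[i] += abs(fx)
            counts[i] += 1
        mean = s1 / n_samples
        estimates.append(mean)
        variances.append(s2 / n_samples - mean * mean)
        # Estimated integral of |f| over each bin: width * mean |f| of its samples.
        mass = [(edges[i + 1] - edges[i]) * abs_sum[i] / counts[i] if counts[i] else 0.0
                for i in range(n_bins)]
        cum = [0.0]
        for m in mass:
            cum.append(cum[-1] + m)
        if cum[-1] == 0.0:
            continue                                 # nothing to adapt to
        # Place new edges so that each new bin holds an equal share of the |f| mass.
        new_edges, j = [0.0], 0
        for k in range(1, n_bins):
            target = k * cum[-1] / n_bins
            while j < n_bins - 1 and cum[j + 1] < target:
                j += 1
            frac = (target - cum[j]) / mass[j] if mass[j] > 0 else 0.0
            new_edges.append(edges[j] + frac * (edges[j + 1] - edges[j]))
        new_edges.append(1.0)
        edges = new_edges
    return estimates, variances
```

For a narrow peak such as f(x) = exp(-100 (x - 0.5)^2), the first iteration is plain uniform Monte Carlo, and later iterations typically show a much smaller variance of the weights. All samples within an iteration are independent, which is what makes the method a natural fit for one GPU thread per sample.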
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0035 <strong>GPU</strong> Parallelization of Gibbs Sampling:<br />
Abstractions, Results, and Lessons Learned<br />
Markov Chain Monte Carlo (MCMC) estimation of Hierarchical<br />
Bayesian (HB) models is not only time-consuming, but also<br />
difficult to parallelize due to its sequential (Markovian) nature. We<br />
present an abstraction of a widely-used MCMC algorithm, called<br />
Gibbs sampling. We define a taxonomy of variable blocks, and for<br />
each type of variable block we offer suitable parallelization<br />
strategies, along with their corresponding CUDA implementations.<br />
For large problems where model estimation may take several<br />
hours or days using single-threaded software, we see speedups<br />
in the 30x-100x range, thereby reducing estimation time to a few<br />
hours. In addition to lower computation cost relative to MPI-based<br />
parallelization, the reduction in estimation time allows for a more<br />
interactive modeling experience. We offer an extensive discussion<br />
of lessons learned for the broader scientific computing field,<br />
including an analysis of tradeoffs between computation costs and<br />
development costs, implications of our tradeoff analysis for<br />
optimal software development and parallelization, and some<br />
practical tips and gotchas for rookie <strong>GPU</strong> programmers.<br />
Speaker(s): Alireza Mahani (Quantitative Modeler, Sentrana)<br />
Topic(s): Algorithms & Numerical Techniques, Databases, Data Mining,<br />
Business Intelligence (Intermediate)<br />
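The block-parallel idea in S0035 can be illustrated with a two-level normal model: y_ij ~ N(theta_j, sigma2), theta_j ~ N(mu, tau2), a flat prior on mu, and known sigma2 and tau2. Given mu, the group means theta_j are conditionally independent, so the whole theta block can be drawn in one parallel step (one GPU thread per group), while mu needs a reduction over all theta_j. The model, the known variances and the function name are assumptions for illustration; the speaker's taxonomy of variable blocks is more general.

```python
import random


def gibbs_hierarchical(groups, sigma2=1.0, tau2=1.0, n_iter=2000, burn_in=500, seed=1):
    """Gibbs sampler for y_ij ~ N(theta_j, sigma2), theta_j ~ N(mu, tau2), flat prior on mu.

    Returns the post-burn-in draws of mu.
    """
    rng = random.Random(seed)
    sums = [sum(g) for g in groups]
    sizes = [len(g) for g in groups]
    n_groups = len(groups)
    mu = 0.0
    mu_draws = []
    for it in range(n_iter):
        # Parallel block: each theta_j depends only on its own data and on mu.
        theta = []
        for s, n in zip(sums, sizes):
            prec = n / sigma2 + 1.0 / tau2
            mean = (s / sigma2 + mu / tau2) / prec
            theta.append(rng.gauss(mean, (1.0 / prec) ** 0.5))
        # Scalar step: mu | theta ~ N(mean(theta), tau2 / J); a reduction on the GPU.
        mu = rng.gauss(sum(theta) / n_groups, (tau2 / n_groups) ** 0.5)
        if it >= burn_in:
            mu_draws.append(mu)
    return mu_draws
```

For a balanced toy data set with three observations in each of three groups whose means are 3, 5 and 4, the exact posterior mean of mu under this model is the average of the group means, 4.0, and the average of the draws should land close to it.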
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
ROOM A3<br />
S0042 Solving Challenging Numerical Linear Algebra<br />
Algorithms using Multiple <strong>GPU</strong> Accelerators<br />
See the newest features integrated in MAGMA (Matrix Algebra on<br />
<strong>GPU</strong> and Multicore Architectures) to tackle numerical linear algebra<br />
on multi-<strong>GPU</strong> systems. In this talk, we describe how<br />
we leveraged MAGMA to solve existing and new challenging<br />
numerical problems on multiple hardware accelerators. Using a<br />
hybridization methodology, the new multi-<strong>GPU</strong>-enabled MAGMA is<br />
characterized by a representation of linear algebra algorithms as<br />
directed acyclic graphs, where nodes correspond to tasks and edges<br />
to data dependencies among them, and by the dynamic runtime system<br />
StarPU, which schedules various computational kernels<br />
over hybrid architectures of <strong>GPU</strong>s and homogeneous multicores.<br />
Speaker(s): Hatem Ltaief (Computational Scientist, KAUST<br />
Supercomputing Laboratory), Stanimire Tomov (University of Tennessee)<br />
Topic(s): Algorithms & Numerical Techniques, Development Tools<br />
& Libraries (Intermediate)<br />
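The task-graph representation described in S0042 can be illustrated with a right-looking tiled Cholesky factorization, a standard example of this style. The sketch derives the DAG from the tiles each task reads and writes (read-after-write, write-after-write and write-after-read hazards) and then groups tasks into wavefronts that could run concurrently. A dynamic runtime such as StarPU launches tasks as soon as their inputs are ready rather than level by level; the grouping here only visualizes the available parallelism, and the task names and tuple layout are assumptions, not MAGMA's or StarPU's API.

```python
from collections import defaultdict


def tiled_cholesky_dag(nt):
    """Tasks and dependencies of a right-looking tiled Cholesky on an nt x nt tile grid.

    Each task is (name, tiles_read, tile_written); the written tile is also read.
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k},{k})", [], (k, k)))
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", [(k, k)], (i, k)))
        for i in range(k + 1, nt):
            tasks.append((f"SYRK({i},{i})", [(i, k)], (i, i)))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j})", [(i, k), (j, k)], (i, j)))
    last_writer, readers, deps = {}, defaultdict(list), defaultdict(set)
    for t, (_, reads, write) in enumerate(tasks):
        for tile in reads:                        # read-after-write
            if tile in last_writer:
                deps[t].add(last_writer[tile])
            readers[tile].append(t)
        if write in last_writer:                  # write-after-write
            deps[t].add(last_writer[write])
        deps[t].update(readers[write])            # write-after-read
        last_writer[write] = t
        readers[write] = []
    return tasks, deps


def wavefronts(tasks, deps):
    """Group tasks by longest dependency chain; tasks in one group are independent."""
    level = []
    for t in range(len(tasks)):                   # tasks are listed in a valid sequential order
        level.append(1 + max((level[d] for d in deps[t]), default=-1))
    groups = defaultdict(list)
    for t, lv in enumerate(level):
        groups[lv].append(t)
    return [groups[lv] for lv in sorted(groups)]
```

For a 3 x 3 tile grid this yields 10 tasks; after the first POTRF and the two TRSM calls, SYRK(1,1), SYRK(2,2) and GEMM(2,1) form one wavefront that a runtime could dispatch to different devices at the same time.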
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM A5<br />
S0099 Debugging <strong>GPU</strong> Applications For Correctness<br />
and Performance<br />
This session reveals how debugging CUDA applications is made<br />
straightforward with the powerful Allinea DDT debugger. New<br />
features enabling greater understanding of performance<br />
optimizations will be explored, showing how they can be used to<br />
produce better, faster CUDA code. Drawing on newly released<br />
support for multiple languages and compilers, we will also show<br />
how Allinea DDT is enabling developers on desktops and the<br />
largest supercomputers to achieve both correct and efficient<br />
<strong>GPU</strong> applications.<br />
Speaker(s): David Lecomber (CTO, Allinea Software)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM N<br />
S0127 Petascale Molecular Dynamics Simulations on<br />
<strong>GPU</strong>-Accelerated Supercomputers<br />
The highly parallel molecular dynamics code NAMD was chosen in<br />
2006 as a target application for the NSF petascale supercomputer<br />
now known as Blue Waters. NAMD was also one of the first codes<br />
to run on a <strong>GPU</strong> cluster when G80 and CUDA were introduced in<br />
2007. How do the Cray XK6 and modern <strong>GPU</strong> clusters compare to<br />
300,000 CPU cores for a hundred-million-atom Blue Waters<br />
acceptance test? Come learn the opportunities and pitfalls of<br />
taking <strong>GPU</strong> computing to the petascale and the importance of<br />
CUDA 4.0 features in combining multicore host processors and<br />
<strong>GPU</strong>s in a legacy message-driven application.<br />
Speaker(s): James Phillips (Senior Research <strong>Program</strong>mer, University<br />
of Illinois)<br />
Topic(s): Molecular Dynamics, Application Design & Porting<br />
Techniques, Parallel <strong>Program</strong>ming Languages & Compilers,<br />
Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM K<br />
S0214 <strong>GPU</strong> Based Stacking Sequence Optimization For<br />
Composite Skins Using GA<br />
The goal of this session is to showcase how <strong>GPU</strong>s can be used to<br />
achieve high performance in genetic-algorithm-based optimization.<br />
The particular application domain is stacking sequence optimization of<br />
aircraft wing skins. The concepts illustrated use CUDA but are<br />
generic to any other <strong>GPU</strong> language. Registrants are assumed<br />
to have some exposure to optimization in the engineering domain.<br />
Speaker(s): Sathya Narayana K. (Principal Consultant, Infosys Ltd.),<br />
Ravikumar G.V.V. (Infosys Ltd, Bangalore)<br />
Topic(s): Computational Structural Mechanics, Algorithms &<br />
Numerical Techniques, Parallel <strong>Program</strong>ming Languages &<br />
Compilers (Advanced)
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM A8<br />
S0259 A High Performance Platform for Real-Time<br />
X-Ray Imaging<br />
We will share our experience developing a <strong>GPU</strong>-based<br />
platform for synchrotron-based X-ray imaging aimed at the analysis<br />
of dynamic processes. The complete data flow from the camera to<br />
the data storage will be discussed, with a special focus on I/O<br />
issues, the hardware platform, and ways to utilize the available<br />
system resources. An efficient <strong>GPU</strong> implementation of filtered<br />
back projection will be presented, highlighting differences between<br />
implementations for the GT200, Fermi, and AMD Cypress<br />
architectures. We will introduce our software platform, which is used to<br />
abstract the current configuration of the imaging station and to<br />
simplify the development of parallel image processing algorithms.<br />
Speaker(s): Suren Chilingaryan (Researcher, Karlsruhe Institute<br />
of <strong>Technology</strong>)<br />
Topic(s): General Interest, Supercomputing, Audio, Image and Video<br />
Processing, Algorithms & Numerical Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM A1<br />
S0281 Accelerate a Fully Functional Photo Editing<br />
Software with <strong>GPU</strong><br />
We introduce how to design fully functional <strong>GPU</strong>-based photo<br />
editing software that provides features like layering and<br />
selection and integrates various adjustment tools and image filters.<br />
This design contains a fast layer rendering engine and an image filter<br />
framework that manages different filters and supports visual<br />
feedback for filter parameter adjustment. We will also show<br />
how to design an undo system for <strong>GPU</strong>-based image processing<br />
software. Specifically, a CUDA-accelerated HDR tool will be<br />
presented in detail.<br />
Speaker(s): Kaiyong Zhao (PhD Student, Hong Kong Baptist University)<br />
Topic(s): Computational Photography, Computer Graphics (Beginner)<br />
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
ROOM C<br />
S0365 Delite: A Framework for Implementing<br />
Heterogeneous Parallel DSLs<br />
Domain-specific languages can be a solution for heterogeneous<br />
parallel computing since they provide higher productivity and<br />
performance. To lower the barrier for DSL development, we<br />
implemented the Delite compiler framework and runtime. DSL<br />
developers can easily extend the framework to build a new DSL.<br />
The framework provides various optimization facilities and<br />
automatically generates code for heterogeneous hardware<br />
including <strong>GPU</strong>. The runtime executes the generated code in<br />
parallel by scheduling the kernels on target devices and managing<br />
the memory allocations and data transfers. This talk will cover the<br />
details of Delite with examples from OptiML, a machine learning<br />
DSL implemented with the framework.<br />
Speaker(s): HyoukJoong Lee (PhD Student, Stanford University), Kevin<br />
J. Brown (Research Assistant, Stanford University)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Intermediate)<br />
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
ROOM L<br />
S0405 New Generation <strong>GPU</strong> Accelerated Financial<br />
Quant Libraries<br />
Learn from industry experts how new-generation <strong>GPU</strong>-accelerated<br />
solutions for derivative pricing, hedging, and risk management<br />
can be built more efficiently with modern technology and<br />
functional programming languages like F# on .NET or Scala on<br />
the Java VM. As a concrete example, we report on a large<br />
derivative pricing project developed in F# on .NET. We will<br />
introduce the key design concepts and parallelization strategies<br />
that lead to efficient and transparent <strong>GPU</strong> acceleration.<br />
Several examples will illustrate the benefits of the functional<br />
approach compared to the classical object-oriented approach.<br />
Speaker(s): Daniel Egloff (Managing Partner, QuantAlea GmbH)<br />
Topic(s): Finance, Application Design & Porting Techniques, Algorithms<br />
& Numerical Techniques, Cloud Computing (Advanced)<br />
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM A7<br />
S0432 New Ideas for Massively Parallel Preconditioners<br />
Linear solvers on serial machines tend to be highly recursive, but<br />
that’s not an option on <strong>GPU</strong>s. In this paper we describe a new<br />
preconditioner for GMRES and similar Krylov subspace linear<br />
solvers that is highly parallel, but also provides effective<br />
mechanisms to reconcile remote driving forces in a spatially<br />
discretized system. We will present results, taken from some<br />
real-world studies using a commercial oil reservoir simulator,<br />
showing how it compares with a state of the art serial solver, and<br />
showing how performance scales in a domain decomposition<br />
formulation run on a multiple CPU+<strong>GPU</strong> cluster.<br />
Speaker(s): John Appleyard (Managing Director, Polyhedron Software<br />
Ltd), Jeremy Appleyard (Analyst, Polyhedron Software Ltd)<br />
Topic(s): Algorithms & Numerical Techniques, Computational Fluid<br />
Dynamics, Energy Exploration (Advanced)<br />
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
ROOM M<br />
S0635 How to Bake Portable Many-Core <strong>Program</strong>s<br />
(Presented by CAPS enterprise)<br />
A legacy code, a cool many-core accelerator and a directive-based<br />
programming environment are the main ingredients of the recipe to<br />
transform your legacy code into a portable many-core one. This<br />
presentation shows by example how to exploit accelerators in<br />
legacy code without sacrificing portability. We describe a<br />
methodology and the use of directives, such as HMPP and OpenACC,<br />
to exploit the massive parallelism provided by many-core devices.<br />
Using numerous examples, we illustrate<br />
how to analyze performance, tune accelerator code, reduce data<br />
transfers, deal with libraries, exploit multiple accelerators, etc.<br />
Speaker(s): François Bodin (Chief Technology Officer, CAPS enterprise)<br />
Topic(s): Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 15:00 (25 MINUTES)<br />
ROOM J1<br />
S0701 New <strong>GPU</strong> Appliance for Co-processing<br />
In the petascale era, supercomputers were used both for<br />
simulation and for in-situ graphical visualization of the results. At<br />
exascale, compute resources will be more precious than<br />
before, and using them for co-processing tasks will not be<br />
efficient. We are designing a new appliance that moves the<br />
processing required for graphical visualization to a separate<br />
system, allowing visualization to run as co-processing alongside the<br />
simulation. We showcased the appliance at SC11. Running a<br />
pipeline of computational simulation and visualization, we show<br />
that our prototype system reduces total time to simulation<br />
completion by up to 30%.<br />
Speaker(s): Sorin Faibish (EMC Corporation)<br />
Panelist(s): Tom Furlong (Managing Director, Granite Ventures), Rob<br />
Enderle (Principal Analyst, Enderle Group), Flip Gianos (General Partner,<br />
InterWest Partners), Jeff Herbst (VP of Business Development, NVIDIA)<br />
Topic(s): HW/SW Architectures for Co-processing (Intermediate)<br />
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0808 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight<br />
development team! Whether you would like a private meeting to<br />
discuss specific product features or test out your application with<br />
the latest version of Nsight, or you just want to hang out with the<br />
team after attending one of the exciting training sessions, the<br />
lounge is a great place to learn everything you ever wanted to know<br />
about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 15:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2004 Emerging Companies Summit: CEO on Stage<br />
Featuring GAIKAI, Immersive Media, and Numecent<br />
See the hottest new technologies from startups that are<br />
transforming computing. In a lively and fast-paced exchange, the<br />
Emerging Companies Summit CEO on Stage sessions will feature<br />
CEOs from three startups who will each have 15 minutes to<br />
introduce their companies and interact with a panel of leading<br />
venture capitalists, technology executives, and industry analysts.<br />
Speaker(s): David Perry (CEO and Co-Founder, GAIKAI), Mark<br />
McGovern (CEO, Immersive Media), Osman Kent (CEO, Numecent)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM A1<br />
S0073 Cost-effective <strong>GPU</strong> Acceleration of a Video<br />
Restoration and Archiving Workflow<br />
The goal of this session is to present a complex <strong>GPU</strong>-accelerated<br />
video restoration and archiving workflow. The workflow consists of<br />
many different processing steps and a final review application.<br />
Fast and cost-effective processing and real-time display of the<br />
processed video material are key requirements. We will show in<br />
detail how <strong>GPU</strong>-based acceleration can be achieved for many<br />
different processing steps and for the review application using<br />
OpenCV, OpenCL, and OpenGL. Furthermore, an object-oriented<br />
software architecture supporting the acceleration of<br />
several different processing tasks on the same graphics adapter<br />
will be presented.<br />
Speaker(s): Klaus Gaedke (Lab Manager, Technicolor)<br />
Topic(s): Audio, Image and Video Processing (Intermediate)<br />
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM B<br />
S0103 Accelerating Protein Sequences and Classification<br />
using <strong>GPU</strong>-HMMER Search<br />
In this paper we present the results of parallelizing HMMer, which<br />
is a widely used tool for protein sequence homology detection, as<br />
well as functional annotation of homologous protein sequences,<br />
and protein family classification. The HMMer program is based<br />
upon a Viterbi algorithm coded in C, and is quite time-consuming.<br />
We restructure the logic of the Viterbi algorithm to port it to the GP<strong>GPU</strong>. We<br />
test multiple enhancements in our <strong>GPU</strong> kernels in order to<br />
demonstrate the effectiveness of each strategy. Our<br />
implementation, cuda_hmmsearch, achieves an overall speedup of up to 30x<br />
over a single Intel CPU core.<br />
Speaker(s): Mahesh Khadtare (PhD Student - Scientist ESP, I2IT,<br />
Pune University)<br />
Topic(s): Life Sciences, Bioinformatics (Intermediate)<br />
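The abstract for S0103 notes that HMMer is built on a Viterbi algorithm. The sketch below is the generic Viterbi recursion for a small discrete hidden Markov model, computed in log space to avoid underflow on long sequences; it is not HMMER's profile-HMM (match/insert/delete) recursion and not the cuda_hmmsearch code. It does show the structure a GPU port exploits: within each time step, every state's maximization over its predecessors is independent of the others. The two-state "Healthy/Fever" parameters are a common textbook toy example, used here only for illustration.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability) for an observation sequence."""
    # score[s]: best log-probability of any path ending in state s at the current step
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:                          # independent per state: parallel on a GPU
            best = max(states, key=lambda r: score[r] + _log(trans_p[r][s]))
            ptr[s] = best
            new_score[s] = score[best] + _log(trans_p[best][s]) + _log(emit_p[s][o])
        backpointers.append(ptr)
        score = new_score
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


STATES = ("Healthy", "Fever")
START = {"Healthy": 0.6, "Fever": 0.4}
TRANS = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, logp = viterbi(["normal", "cold", "dizzy"], STATES, START, TRANS, EMIT)
print(path, round(math.exp(logp), 5))  # ['Healthy', 'Healthy', 'Fever'] 0.01512
```

Working in log space turns products into sums, which is also how production HMM search codes typically avoid underflow on long protein sequences.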
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM A8<br />
S0141 <strong>GPU</strong>-Accelerated Optical Coherence<br />
Tomography Imaging<br />
We developed a series of <strong>GPU</strong>-based technologies to accelerate<br />
the imaging reconstruction and visualization for optical coherence<br />
tomography (OCT). Several <strong>GPU</strong>-based algorithms such as<br />
non-uniform fast Fourier transform, numerical dispersion<br />
compensation, simultaneous phase modulation and multi-<strong>GPU</strong><br />
implementation were developed to achieve improved impulse<br />
response, better SNR, doubled imaging range and higher system<br />
stability. The <strong>GPU</strong>-accelerated 4D-OCT system was validated by<br />
imaging both in vivo and ex vivo biological tissues. This technology<br />
overcomes the imaging reconstruction and visualization<br />
bottlenecks that widely exist in current ultrahigh speed OCT<br />
systems and opens the way to interventional OCT imaging for<br />
applications in guided microsurgery.<br />
Speaker(s): Kang Zhang (Research Scientist, GE Global Research)<br />
Topic(s): Medical Imaging & Visualization (Beginner)<br />
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM N<br />
S0207 <strong>GPU</strong> Enabled Macromolecular Simulation:<br />
Challenges and Opportunities<br />
<strong>GPU</strong>-enabled, fully atomistic macromolecular<br />
simulation is rapidly gaining momentum, driven by the massive<br />
parallelism of <strong>GPU</strong>s and the parallelizability of many components of<br />
the underlying algorithms and methodologies. This massive<br />
parallelism, on the order of several hundred to a few thousand<br />
cores, presents opportunities as well as implementation<br />
challenges. In this talk, dive deep into key aspects of<br />
simulation methodologies for macromolecular systems<br />
specifically adapted to <strong>GPU</strong>s. Learn about the underlying<br />
challenges and the latest solutions devised to tackle them in<br />
the FEN ZI code for fully atomistic macromolecular simulations.<br />
Speaker(s): Michela Taufer (Assistant Professor, University of<br />
Delaware), Sandeep Patel (University of Delaware)<br />
Topic(s): Molecular Dynamics, Algorithms & Numerical Techniques<br />
(Advanced)<br />
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM K<br />
S0293 Culises – A Library for Accelerated CFD on Hybrid<br />
<strong>GPU</strong>-CPU Systems<br />
The vast majority of CFD simulations rely on the solution of<br />
large-scale systems of linear equations (SLE), where the solution of<br />
a system can consume most of the total CPU time. We have<br />
developed a library (Culises) for state-of-the-art solution of SLE that<br />
is targeted at hybrid <strong>GPU</strong>-CPU platforms. Culises can be connected<br />
to MPI-parallelized CFD codes (e.g. OpenFOAM) via an application-specific<br />
interface. In this talk, we focus on efficient implementation<br />
of preconditioned Krylov subspace methods. By exploiting the computing<br />
power of <strong>GPU</strong>s, Culises can deliver significant speedups over pure CPU<br />
computation for a multitude of industrial CFD applications.
Speaker(s): Bjoern Landmann (Development Engineer, FluiDyna GmbH)<br />
Topic(s): Computational Fluid Dynamics, Algorithms & Numerical<br />
Techniques (Intermediate)<br />
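For readers unfamiliar with preconditioned Krylov methods, the following minimal, dense, pure-Python sketch shows one such method: conjugate gradients with a Jacobi (diagonal) preconditioner for a symmetric positive-definite system. The choice of CG and of the Jacobi preconditioner is an assumption for illustration. The abstract does not say which solvers or preconditioners Culises uses, and a production solver would use sparse storage and run the matrix-vector and dot products on the <strong>GPU</strong>.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradients for a dense SPD matrix A."""
    n = len(b)

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # r = b - A x, with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # converged
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```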
WEDNESDAY, MAY 16, 15:30 (50 MINUTES)<br />
ROOM A5<br />
S0340 Debug Multi-<strong>GPU</strong> Applications on CUDA-<br />
Accelerated Clusters with TotalView<br />
Learn how TotalView can help you develop CUDA applications on<br />
single servers, multi-<strong>GPU</strong> servers, and HPC-style clusters. For<br />
more than 20 years the TotalView debugger has set the standard<br />
for parallel and multi-core debugging on Linux, HPC clusters and<br />
custom supercomputers such as the Cray XT/XE/XK series. CUDA<br />
developers deal with the same types of complexity and can realize<br />
the same productivity benefits. This talk will introduce TotalView<br />
for CUDA and show how you can program more easily with CUDA<br />
3.2, 4.0 and 4.1.<br />
Speaker(s): Chris Gottbrath (Principal Product Manager, Rogue<br />
Wave Software)<br />
Topic(s): Development Tools & Libraries, Supercomputing<br />
(Intermediate)<br />
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM A7<br />
S0433 Accelerated FDTD Technique for Marine<br />
Controlled Source Electromagnetic Imaging<br />
Find out about the newest method for Marine Hydrocarbon<br />
Exploration. In this session we will profile the use of Finite<br />
Difference Time Domain (FDTD) technique in combination with<br />
Mittet’s method and <strong>GPU</strong>s to produce faster, cheaper, more<br />
accurate forward modeling for electromagnetic imaging<br />
(Controlled Source Electromagnetic or CSEM). Unlike many<br />
frequency domain CSEM techniques this accelerated method does<br />
not require simplifying assumptions to reduce the memory and<br />
computational burden and has excellent scaling properties<br />
(essentially linear) across clusters of <strong>GPU</strong> accelerated nodes.<br />
CSEM is used in the industry to enhance confidence in<br />
hydrocarbon reservoir discoveries.<br />
Speaker(s): Geoff Clark (CEO, Acceleware Ltd.), Michal Okoniewski<br />
(Director of Marketing, Acceleware Ltd.)<br />
Topic(s): Energy Exploration (Intermediate)<br />
WEDNESDAY, MAY 16, 15:30 (180 MINUTES)<br />
HALL 1<br />
S0514 <strong>GPU</strong> Performance Analysis and Optimization<br />
This session will present fundamental performance-optimization<br />
concepts and illustrate their practical application in<br />
the context of programming for Fermi and Kepler <strong>GPU</strong>s. The goal<br />
is twofold: to make the optimization process a methodical sequence<br />
of steps, and to facilitate performance-aware algorithmic<br />
decisions before coding even starts. To maximize <strong>GPU</strong><br />
performance, code should have sufficient parallelism, access<br />
memory in a coalesced pattern, and be amenable to vector<br />
execution within warps (groups of 32 threads). We will show how<br />
to quantify these requirements for a specific <strong>GPU</strong> in order to<br />
determine performance limiters and their importance for a given<br />
code. To address the limiters, we will review hardware operation<br />
specifics and related optimization techniques. The optimization<br />
process will be illustrated using NVIDIA profiling tools and kernel<br />
case studies.<br />
Speaker(s): Paulius Micikevicius (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />
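One way to quantify the coalescing requirement is to count how many memory segments a single warp touches. The sketch below uses a simplified model, which is our assumption rather than the session's: 32 threads per warp, each loading one 4-byte word from a 128-byte-aligned array, and one transaction per distinct 128-byte segment (the L1 cache-line size on Fermi). Actual behavior depends on the architecture and on caching.

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128   # assumed transaction granularity
WORD_BYTES = 4        # each thread loads one 32-bit word

def segments_per_warp(stride_words, offset_words=0):
    """Distinct 128-byte segments touched when thread t of a warp reads
    word (offset_words + t * stride_words) of a 128-byte-aligned array."""
    addresses = [(offset_words + t * stride_words) * WORD_BYTES
                 for t in range(WARP_SIZE)]
    return len({addr // SEGMENT_BYTES for addr in addresses})

def efficiency(stride_words, offset_words=0):
    """Fraction of the fetched bytes that the warp actually uses."""
    useful = WARP_SIZE * WORD_BYTES
    fetched = segments_per_warp(stride_words, offset_words) * SEGMENT_BYTES
    return useful / fetched
```

Under this model, the results fall into three cases:

- `segments_per_warp(1)` is 1, so the access is fully coalesced.
- Shifting the same accesses by one word touches 2 segments.
- A stride of 32 words touches 32 segments, so only 1/32 of the fetched bytes are used.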
WEDNESDAY, MAY 16, 15:30 (25 MINUTES)<br />
ROOM J1<br />
S0702 The Architecture of Acceleration in HPC<br />
High Performance Computing applications push the envelope of<br />
what can be computed today. Acceleration technologies play a<br />
critical role in extending and enhancing capability. Balancing the<br />
impact of acceleration within hardware and software is a difficult<br />
art, where critical decisions can have dramatic impacts. We<br />
present the role of acceleration in tightly and loosely coupled<br />
settings, as well as the role of data structures and the execution model.<br />
Speaker(s): Justin Tripp (Technical Staff Member, Los Alamos National<br />
Laboratory), Zack Baker (Los Alamos National Laboratory)<br />
Topic(s): Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM K<br />
S0055 Particle Dynamics with MBD and FEA using CUDA<br />
Large numbers of spherical particles are solved with the DEM (Discrete Element<br />
Method) and simulated using <strong>GPU</strong> technology. A fast algorithm is<br />
applied to calculate Hertzian contact forces between spherical<br />
particles (from 100,000 to 1,000,000), and NVIDIA’s CUDA is used to<br />
accelerate the calculation. The particles, together with MBD and<br />
FEA entities, are simulated within the commercial software RecurDyn.<br />
Several models are built and simulated: a forklift with sand,<br />
oil in an oil tank, an oil-filled engine system, and a water-filled<br />
washing machine. All models are simulated on NVIDIA<br />
<strong>GPU</strong>s and the results are shown.<br />
Speaker(s): Graham Sanborn (Lead Software Developer, FunctionBay)<br />
Topic(s): Computational Structural Mechanics, Computational Physics,<br />
Computational Fluid Dynamics (Intermediate)<br />
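For reference, the classical Hertz model gives the normal force between two elastic spheres as a function of their overlap. The minimal sketch below implements only that textbook formula, with no damping, tangential friction or neighbor search. We do not know which contact variant RecurDyn implements. In a <strong>GPU</strong> version, each thread would typically evaluate this for one candidate particle pair.

```python
import math

def hertz_normal_force(c1, r1, c2, r2, E1, nu1, E2, nu2):
    """Magnitude of the Hertzian normal contact force between two spheres.

    c1, c2: centers (x, y, z); r1, r2: radii; E, nu: Young's modulus and
    Poisson's ratio of each sphere. Returns 0.0 if the spheres do not touch.
    """
    overlap = r1 + r2 - math.dist(c1, c2)                     # delta
    if overlap <= 0.0:
        return 0.0
    e_eff = 1.0 / ((1 - nu1**2) / E1 + (1 - nu2**2) / E2)     # E*
    r_eff = 1.0 / (1.0 / r1 + 1.0 / r2)                       # R*
    # F = (4/3) * E* * sqrt(R*) * delta^(3/2)
    return 4.0 / 3.0 * e_eff * math.sqrt(r_eff) * overlap**1.5
```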
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM B<br />
S0109 SOAP3: <strong>GPU</strong>-based Compressed Indexing and<br />
Ultra-fast Parallel Alignment of Short Reads<br />
We give the first implementation of a compressed index<br />
(Burrows-Wheeler Transform) on the <strong>GPU</strong>, supporting very<br />
efficient parallel alignment of short patterns (reads) onto the<br />
human genome. The new alignment software SOAP3 is tens of<br />
times faster than existing ones and can keep up with the throughput<br />
(Giga to Tera bp) of next-generation DNA sequencers. It takes 2.4<br />
seconds to perform exact matching for one million length-100<br />
reads (tens of seconds for small-error approximate matching).<br />
Technically, we show how to minimize memory accesses to the<br />
index from individual threads and to control the branching and<br />
divergence of the threads.<br />
Speaker(s): BingQiang Wang (BGI)<br />
Topic(s): Bioinformatics (Advanced)<br />
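Exact matching against a Burrows-Wheeler index is usually done with FM-index backward search, which extends the match one character at a time from the right. The plain-Python sketch below illustrates that search in a deliberately naive, uncompressed form, with a full suffix sort and full occurrence tables. It is not SOAP3's compressed <strong>GPU</strong> data layout. In a <strong>GPU</strong> aligner, each read would typically be searched by its own thread, which is where the memory-access and divergence concerns in the abstract come in.

```python
def build_fm_index(text):
    """Naive FM-index construction: suffix array, C table and Occ table."""
    text += "$"                                   # unique, smallest sentinel
    sa = sorted(range(len(text)), key=lambda i: text[i:])   # suffix array
    bwt = "".join(text[i - 1] for i in sa)        # Burrows-Wheeler transform
    alphabet = sorted(set(text))
    # C[c]: number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ

def backward_search(pattern, C, occ, n):
    """Suffix-array interval [lo, hi) of suffixes that start with pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):       # extend the match one character leftwards
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi
```

The number of matches is `hi - lo`, and `sa[lo:hi]` holds their start positions in the text.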
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM A8<br />
S0131 Multi-<strong>GPU</strong> Real-Time Ptychographic X-ray<br />
Image Reconstruction<br />
Learn how a new imaging technique, combined with the<br />
computational power of <strong>GPU</strong>s and the brightness of modern X-ray<br />
synchrotrons can quickly and easily produce images with<br />
nanometer level resolution. Ptychography is a recent X-ray<br />
imaging technique in which overlapping regions of a sample are<br />
exposed in quick succession and the resulting scattering is used<br />
to reconstruct a high resolution image of the sample. Discover<br />
why <strong>GPU</strong>s can substitute for the lack of X-ray lenses and how they<br />
enabled a dramatic reduction in the feedback time for users of the<br />
technique from days to seconds.<br />
Speaker(s): Filipe Maia (Postdoctoral Fellow, Lawrence Berkeley<br />
National Laboratory)<br />
Topic(s): Audio, Image and Video Processing, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM A3<br />
S0149 On the Parallel Solution of Sparse Triangular<br />
Linear Systems<br />
A parallel algorithm for solving a sparse triangular linear system on<br />
the <strong>GPU</strong> is proposed. It implements the solution of the triangular<br />
system in two phases. The analysis phase builds a dependency graph<br />
based on the matrix sparsity pattern and groups the independent<br />
rows into levels. The solve phase obtains the full solution by iterating<br />
sequentially across the constructed levels. The solution elements<br />
corresponding to each level are obtained in parallel. Numerical<br />
experiments show that incomplete-LU and incomplete-Cholesky<br />
preconditioned iterative methods can achieve a 2x<br />
speedup on the <strong>GPU</strong> over their CPU implementations.<br />
Speaker(s): Maxim Naumov (Software Engineer, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques, Development Tools &<br />
Libraries (Intermediate)<br />
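The analysis and solve phases described in this abstract follow the pattern commonly called level scheduling. The minimal sequential sketch below applies it to a lower-triangular matrix in CSR format (row_ptr, col_idx, vals). The function names and storage format are our assumptions. On a <strong>GPU</strong>, the rows within one level would be distributed across threads.

```python
def analyze_levels(row_ptr, col_idx):
    """Analysis phase: group rows of a lower-triangular CSR matrix into levels.

    Row i depends on row j (j < i) if L[i][j] != 0. Rows in the same level
    do not depend on each other, so they can be solved in parallel.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        deps = [j for j in col_idx[row_ptr[i]:row_ptr[i + 1]] if j < i]
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels

def solve_lower(row_ptr, col_idx, vals, b, levels):
    """Solve phase: L x = b, sweeping the levels in order."""
    x = [0.0] * len(b)
    for rows in levels:            # levels are processed sequentially...
        for i in rows:             # ...but rows within a level are independent
            s, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    s -= vals[k] * x[j]
            x[i] = s / diag
    return x
```

Take a 4x4 matrix whose rows have nonzeros in columns {0}, {0, 1}, {2} and {1, 2, 3}. For this matrix the analysis yields levels `[[0, 2], [1], [3]]`: rows 0 and 2 are solved together, then row 1, then row 3.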
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM L<br />
S0206 Monte-Carlo Pricing Under a Hybrid Local<br />
Volatility Model<br />
This session shows how to calculate the prices of several financial<br />
products, vanilla and exotic, under Dupire’s Local Volatility model.<br />
We start with vanilla options on the foreign exchange rate and<br />
explain how to rescale the Local Volatility matrix in order to take<br />
advantage of fast texture-memory interpolation. We then extend<br />
this framework to two factors by including stochastic interest rates<br />
following the Hull-White model, and show how to price Power-Reverse<br />
Dual-Currency swaps with an exotic TARN feature. We provide details<br />
of the algorithms and compare accuracy and speed with the typical<br />
performance of single-core production implementations.<br />
Speaker(s): Sebastien Gurrieri (Quantitative Analyst, Mizuho<br />
International)<br />
Topic(s): Finance, Algorithms & Numerical Techniques (Intermediate)<br />
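The sketch below shows a single-factor version of the Monte Carlo approach for a vanilla FX option. Paths of the log spot are advanced with an Euler scheme, and the volatility at each step comes from a local volatility function. It rests on several assumptions: constant domestic and foreign rates, a plain Python callable in place of a rescaled texture lookup, and no Hull-White factor. The abstract does not describe the session's actual implementation or interpolation scheme.

```python
import math
import random

def mc_fx_call_local_vol(spot, strike, maturity, r_dom, r_for, local_vol,
                         n_paths=20000, n_steps=50, seed=1):
    """Monte Carlo price of a European FX call under a local volatility model.

    local_vol(s, t) returns the instantaneous volatility at spot s and time t.
    Rates are constant; the stochastic Hull-White rate factor is not modelled.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, t = math.log(spot), 0.0
        for _ in range(n_steps):
            sigma = local_vol(math.exp(log_s), t)
            # Euler step on the log-spot:
            # d ln S = (r_dom - r_for - sigma^2 / 2) dt + sigma dW
            log_s += ((r_dom - r_for - 0.5 * sigma * sigma) * dt
                      + sigma * sqrt_dt * rng.gauss(0.0, 1.0))
            t += dt
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    # Discount the average payoff at the domestic rate.
    return math.exp(-r_dom * maturity) * payoff_sum / n_paths
```

For example, `mc_fx_call_local_vol(100.0, 100.0, 1.0, 0.03, 0.01, lambda s, t: 0.15 + 0.1 * abs(math.log(s / 100.0)))` prices an at-the-money call under a hypothetical smile-shaped volatility function. With a constant volatility, the result should agree with the Garman-Kohlhagen closed form up to Monte Carlo error.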
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM A1<br />
S0273 Fast JPEG Coding on the <strong>GPU</strong><br />
The goal of this session is to demonstrate how high speed JPEG<br />
compression and decompression can be efficiently implemented<br />
on the <strong>GPU</strong> using CUDA. In this session we will present a detailed<br />
analysis of the Baseline JPEG compression and decompression<br />
processes and their constituent parts (such as Huffman Coding,<br />
RLE, Differential Coding, Quantization, Discrete Cosine Transform)<br />
and their suitability for the <strong>GPU</strong> architecture, an analysis of the achieved<br />
results and a comparison with existing implementations, and<br />
applications to high-speed imaging.<br />
Speaker(s): Fyodor Serzhenko (CEO, Fastvideo), Victor Podlozhnyuk<br />
(NVIDIA)<br />
Topic(s): Audio, Image and Video Processing, Algorithms &<br />
Numerical Techniques (Advanced)<br />
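Among the JPEG stages, the DCT and quantization are the most naturally data-parallel, because every 8x8 block is independent. The sketch below shows the baseline-JPEG forward path for one block (level shift by 128, 2D DCT-II, quantization) in deliberately naive form. The uniform quantizer step is an assumption standing in for a real quantization table. Zigzag ordering, RLE and Huffman coding are omitted.

```python
import math

N = 8  # JPEG block size

def dct_2d(block):
    """Orthonormal 2D DCT-II of an 8x8 block (the JPEG forward DCT)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def encode_block(pixels, q_step=16):
    """Level-shift, transform and uniformly quantize one 8x8 block of
    8-bit samples. A single step for every coefficient is a simplification;
    baseline JPEG uses an 8x8 quantization table instead."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = dct_2d(shifted)
    return [[round(cf / q_step) for cf in row] for row in coeffs]
```

A flat block of value 200 quantizes to a single DC coefficient of 36 with all AC coefficients zero, which is what makes the subsequent run-length coding effective.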
WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />
ROOM A2<br />
S0286 Scaling Applications to a Thousand <strong>GPU</strong>s<br />
and Beyond<br />
Discover how to scale scientific applications to thousands of <strong>GPU</strong>s<br />
in parallel. We will demonstrate our techniques using two codes<br />
representative of a wide spectrum of programming methods. The<br />
Ludwig lattice Boltzmann package, capable of simulating<br />
extremely complex fluid dynamics models, combines C, MPI and<br />
CUDA. The Himeno three-dimensional Poisson equation solver<br />
benchmark combines Fortran (using the new coarray feature for<br />
communication) with prototype OpenMP accelerator directives (a<br />
promising new high-productivity <strong>GPU</strong> programming method). We<br />
will present performance results using the cutting-edge<br />
massively-parallel Cray XK6 hybrid supercomputer featuring the<br />
latest NVIDIA Tesla 2090 <strong>GPU</strong>s.<br />
Speaker(s): Alan Gray (HPC Architect, The University of Edinburgh),<br />
Roberto Ansaloni (Cray Italy)<br />
Topic(s): Supercomputing, Computational Fluid Dynamics, Parallel<br />
<strong>Program</strong>ming Languages & Compilers, Application Design &<br />
Porting Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM C<br />
S0299 Exploiting Fault Tolerant Heterogeneous<br />
Parallelism with SPM.Python<br />
In this session, we shall review how SPM.Python enables the<br />
exploitation of parallelism across servers, cores and <strong>GPU</strong>s in a<br />
fault-tolerant manner. We will start by describing how and why<br />
SPM.Python augments traditional (serial) Python<br />
with parallel concepts such as parallel task managers and<br />
communication primitives. Specifically, the context for and<br />
solutions to three formally open technical problems will be<br />
described. We will conclude by reviewing examples of how<br />
SPM.Python can be used to exploit both coarse- and fine-grain<br />
parallelism using <strong>GPU</strong>s within and across servers in a<br />
fault-tolerant manner.<br />
Speaker(s): Minesh B Amin (Founder / CEO, MBA Sciences)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Advanced)<br />
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0332 Efficient Graph Matching and Coloring on the <strong>GPU</strong><br />
The goal of this session is to compare the performance of graph<br />
matching and graph coloring algorithms on massively parallel<br />
devices such as <strong>GPU</strong>s. We present novel algorithms, which produce<br />
superior results for certain graphs and also discuss the techniques<br />
used to efficiently implement these algorithms on the <strong>GPU</strong>.<br />
Speaker(s): Patrice Castonguay (Emerging Applications Intern,<br />
NVIDIA), Jonathan Cohen (Emerging Applications, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />
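The abstract does not describe the novel algorithms themselves. As background, a classic parallel baseline for graph coloring is the Jones-Plassmann scheme, sketched sequentially below. It is a standard textbook method, not necessarily the one presented in this session.

```python
import random

def jones_plassmann_coloring(adj, seed=0):
    """Jones-Plassmann style coloring of an undirected graph.

    adj: dict mapping each (integer) vertex to the set of its neighbors.
    Each round, every uncolored vertex whose random priority beats all of
    its uncolored neighbors takes the smallest color unused by its neighbors.
    Those vertices form an independent set, so a GPU could color them all
    at once.
    """
    rng = random.Random(seed)
    prio = {v: (rng.random(), v) for v in adj}    # ties broken by vertex id
    color = {}
    while len(color) < len(adj):
        winners = [v for v in adj if v not in color and
                   all(prio[v] > prio[u] for u in adj[v] if u not in color)]
        for v in winners:
            used = {color[u] for u in adj[v] if u in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
    return color
```

The uncolored vertex with the highest priority always wins its round, so the loop terminates, and no vertex needs more than (its degree + 1) colors.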
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM N<br />
S0363 Efficient Molecular Dynamics on Heterogeneous<br />
<strong>GPU</strong> Architectures in GROMACS<br />
Molecular Dynamics is an important application for <strong>GPU</strong><br />
acceleration, but many algorithmic optimizations and features still<br />
rely on code that prefers traditional CPUs. Only with the latest<br />
hardware and software have we been able to realize a<br />
heterogeneous <strong>GPU</strong>/CPU implementation and reach performance<br />
significantly beyond the state-of-the-art of hand-tuned CPU code<br />
in our GROMACS program. The sub-millisecond iteration time<br />
poses challenges on all levels of parallelization. Come and learn<br />
about our new atom-cluster pair interaction approach for<br />
non-bonded force evaluation that achieves 60% work-efficiency<br />
and other innovative solutions for heterogeneous <strong>GPU</strong> systems.<br />
Speaker(s): Berk Hess (PhD Student, KTH Royal Institute of <strong>Technology</strong>),<br />
Szilárd Páll (PhD Student, KTH Royal Institute of <strong>Technology</strong>)<br />
Topic(s): Molecular Dynamics, Computational Physics, Life Sciences<br />
(Intermediate)<br />
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM A7<br />
S0507 Interactive and Scalable Subsurface Data<br />
Visualization Framework<br />
The goal is to present an interactive visualization framework for<br />
large geospatial data. This framework has been developed by the<br />
NVIDIA Advanced Rendering Center for the oil and gas<br />
(hydrocarbon) industry. The CUDA-based application runs in<br />
the cloud at interactive frame rates. The visualization is delivered remotely<br />
to clients in a browser, including on tablets. The scalable<br />
visualization framework can handle terabytes of data.<br />
Speaker(s): Tom-Michael Thamm (Director, Software Product<br />
Management, NVIDIA ARC), Marc Nienhaus (NVIDIA ARC)<br />
Topic(s): Visualization, Cloud Computing (Intermediate)<br />
WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />
ROOM M<br />
S0636 Supermicro: Worldwide leader in GP/<strong>GPU</strong> Servers<br />
and Workstation Platforms (Presented by Supermicro)<br />
Discover the measurable advantages that make Supermicro the<br />
time-to-market leader in <strong>GPU</strong> platform enablement. See how<br />
Supermicro’s innovative Application-Optimized designs enable<br />
partners to both scale-up and scale-out for maximum return on<br />
investment. Review actual case studies that highlight Supermicro’s<br />
leadership in Compute Density, Peak Performance, Scalability,<br />
Power Efficiency, Manageability, Reliability and Cost Effectiveness.<br />
Speaker(s): Don Clegg (VP, Supermicro)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 16:00 (25 MINUTES)<br />
ROOM J1<br />
S0703 Adaptive Heterogeneous Computing with OpenCL:<br />
A Molecular Docking Case Study<br />
Modern computer systems routinely include multiple types of fully<br />
programmable computing resource, such as multi-core CPUs and<br />
many-core <strong>GPU</strong>s. Most research into accelerator-based<br />
computing tends to focus on just one part of the system, typically<br />
the <strong>GPU</strong>. In our work we have developed methods to harness all of<br />
the available computing resources in a system simultaneously,<br />
including CPUs and <strong>GPU</strong>s, using OpenCL as the underpinning<br />
cross-platform layer. In this paper we shall include results from a<br />
molecular docking program, which has been shown to scale<br />
across hundreds of hybrid CPU/<strong>GPU</strong> systems, yielding significant<br />
increases in performance and energy efficiency.<br />
Speaker(s): Simon McIntosh-Smith (University of Bristol)<br />
Topic(s): Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0809 CUDA Profiler Training on Windows<br />
Nsight offers a comprehensive set of performance analysis tools.<br />
From tracing complete-system multi-core CPU and<br />
multi-<strong>GPU</strong> activity to profiling CUDA kernels with precise profiling<br />
experiments, developers can identify system-level optimization<br />
opportunities as well as expensive and inefficient CUDA kernels<br />
requiring in-depth analysis with the CUDA profiler. Through a set<br />
of comprehensive exercises, the attendee will be able to utilize<br />
these features to become fully proficient at optimizing complex<br />
CUDA applications.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 16:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2005 Emerging Companies Summit: CEO on Stage<br />
Featuring RealView Imaging, Elemental Technologies,<br />
and Mersive<br />
See the hottest new technologies from startups that are<br />
transforming computing. In a lively and fast-paced exchange, the<br />
Emerging Companies Summit CEO on Stage sessions will feature<br />
CEOs from three startups who will each have 15 minutes to<br />
introduce their companies and interact with a panel of leading<br />
venture capitalists, technology executives, and industry analysts.<br />
Speaker(s): Shaul Geldman (Co-Founder and VP of R&D, RealView<br />
Imaging), Sam Blackman (CEO and Co-Founder, Elemental<br />
Technologies), Robert Balgley (CEO, Mersive)<br />
Panelist(s): Tom Furlong (Managing Director, Granite Ventures), Rob<br />
Enderle (Principal Analyst, Enderle Group), Flip Gianos (General<br />
Partner, InterWest Partners), Jeff Herbst (VP of Business Development,<br />
NVIDIA)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM A1<br />
S0052 Fast High Quality Image and Video Background<br />
Removal with CUDA<br />
A tool to efficiently and easily cut out objects from a photograph<br />
has great practical value. In this session we present aspects of how<br />
to efficiently implement such a tool with CUDA and the NPP library<br />
based on the GrabCut approach by Rother et al. Through <strong>GPU</strong><br />
acceleration, both runtime and accuracy are improved compared to<br />
CPU-based implementations such as the one in MS Word 2011.<br />
Further we show how to extend our <strong>GPU</strong> implementation to enable<br />
live background removal in a webcam video stream.<br />
Speaker(s): Timo Stich (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Audio, Image and Video Processing, Machine Learning & AI<br />
(Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM K<br />
S0070 Large-Scale Matrix-Free Topology Optimization<br />
on the <strong>GPU</strong><br />
Popular topology optimization methods today are based on the<br />
SIMP concept. Unfortunately, SIMP leads to ill-conditioned<br />
stiffness matrices that are difficult to solve on <strong>GPU</strong> architectures.<br />
In this talk, I will present a new topology optimization method<br />
called PareTO that relies on the concepts of topological sensitivity<br />
and Pareto tracing. The resulting stiffness matrices are<br />
well-conditioned, and one can now fully exploit <strong>GPU</strong> architectures for<br />
fast matrix-free implementation of the finite element method.<br />
Numerical experiments demonstrate the efficacy of PareTO.<br />
Speaker(s): Krishnan Suresh (Associate Professor, University<br />
of Wisconsin)<br />
Topic(s): Computational Structural Mechanics, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM B<br />
S0084 CUMACH - A Fast <strong>GPU</strong>-based Genotype<br />
Imputation Tool<br />
The goal of this session is to introduce a <strong>GPU</strong>-implemented tool in<br />
bioinformatics. Genotype imputation is a method that extrapolates<br />
genetic correlations from a densely characterized reference panel<br />
to a sparsely typed study sample. Many CPU-based tools<br />
already exist, but they are all time-consuming for large data sets.<br />
In this session, we implement a <strong>GPU</strong>-based imputation tool<br />
that achieves relatively good results at high speed. The session<br />
has three main parts: 1) the background and<br />
its HMM-based algorithm, 2) <strong>GPU</strong> implementation and<br />
optimization, 3) results.<br />
Speaker(s): Agatha Hu (NVIDIA)<br />
Topic(s): Bioinformatics (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM N<br />
S0121 Software Architecture to Facilitate CUDA<br />
Development<br />
We describe a workflow architecture and its use in developing<br />
Schrödinger’s core-hopping application. The application supplies<br />
the stages as callbacks. A stage may have multiple<br />
implementations; for example, CUDA and CPU. An implementation<br />
can be assigned a maximum number of simultaneous threads.<br />
When any stage completes, a scheduling algorithm determines<br />
which implementation of which stage will be launched next. The<br />
application may detect “special” environments, such as CUDA, and<br />
set up its stages accordingly, or it may allow specification of which<br />
implementation of each stage to run. This makes it easy to develop<br />
and debug CUDA stages flexibly and incrementally.<br />
Speaker(s): Peter Shenkin (Vice President, Schrödinger), K. Patrick<br />
Lorton (Principal Developer, Schrödinger)<br />
Topic(s): Development Tools & Libraries, Life Sciences (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM A8<br />
S0202 Terascale Volume Visualization in Neuroscience<br />
Learn how to create a scalable volume visualization system for<br />
interactive rendering of terascale EM data. We will describe the<br />
major design principles, how we can avoid the standard approach<br />
of pre-computing a 3D multi-resolution hierarchy such as an<br />
octree, and how to handle continuous streaming of newly acquired<br />
data. For rendering we build upon a visibility-driven approach and<br />
3D virtual texturing, and perform interactive volume rendering of<br />
a “virtual” volume, where the corresponding physical storage is<br />
only represented and populated in a sparse manner with 2D<br />
instead of 3D image data on the fly during rendering.<br />
Speaker(s): Johanna Beyer (Postdoctoral Fellow, King Abdullah<br />
University of Science and <strong>Technology</strong>), Markus Hadwiger (Assistant<br />
Professor, KAUST)<br />
Topic(s): Visualization, Neuroscience (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0241 Large Graphs on Multi-<strong>GPU</strong>s<br />
The goal of this session is to propose new paradigms to explore<br />
large graphs on <strong>GPU</strong>s. Graphs with billions of edges don’t fit<br />
within the memory of a single <strong>GPU</strong>. A possible solution is to resort<br />
to multiple <strong>GPU</strong>s. Most common graph algorithms show low<br />
arithmetic intensity and irregular access patterns. These features<br />
lead to poor load balance among threads and uncoalesced<br />
memory access. We show how to balance the load to exploit all<br />
threads as fully as possible, and then how to use fast algorithms<br />
such as radix sort and scan to rearrange data before processing it.<br />
Speaker(s): Enrico Mastrostefano (PhD Student, Sapienza Università<br />
di Roma)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />
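To illustrate rearranging graph data with sort and scan, the sketch below converts an unsorted edge list into compressed sparse row (CSR) adjacency, a layout in which each vertex's neighbors are stored contiguously. It is our own sequential example. The session's actual data layout and multi-<strong>GPU</strong> partitioning are not described in the abstract.

```python
from itertools import accumulate

def edges_to_csr(num_vertices, edges):
    """Rearrange an unsorted edge list into CSR form (row_ptr, col_idx).

    This is one pass of a counting (radix) sort keyed on the source vertex:
    a histogram of out-degrees, an exclusive scan to get each vertex's
    offset, then a stable scatter of the destinations. Each step maps onto
    a standard data-parallel GPU primitive.
    """
    degree = [0] * num_vertices
    for src, _ in edges:                  # histogram of out-degrees
        degree[src] += 1
    # exclusive scan: row_ptr[v] = number of edges whose source is < v
    row_ptr = [0] + list(accumulate(degree))
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1]                 # next free slot for each vertex
    for src, dst in edges:                # stable scatter
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx
```

The neighbors of vertex `v` are then `col_idx[row_ptr[v]:row_ptr[v + 1]]`.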
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM A5<br />
S0257 Trace Based Performance Analysis For <strong>GPU</strong><br />
Accelerated Multi-Hybrid Applications<br />
Get in contact with performance tuning experts for multi-hybrid<br />
applications and see first hand how VampirTrace/Vampir can<br />
significantly speed up application porting and development.<br />
Speaker(s): Guido Juckeland (System Engineer (HPC), Leader<br />
Hardware Accelerator Group, TU Dresden - ZIH)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM C<br />
S0367 Physis: An Implicitly Parallel Framework for<br />
Stencil Computations<br />
This session presents how to implement finite difference methods<br />
in a concise, readable, and portable way, yet achieving good<br />
scalability over hundreds of <strong>GPU</strong>s, using the Physis high-level<br />
application framework. Physis extends the standard C language<br />
with a small set of custom declarative constructs for expressing<br />
stencil computations with multidimensional structured grids,<br />
which are automatically translated to CUDA for <strong>GPU</strong> acceleration<br />
and MPI for node-level parallelization with automatic domain-specific<br />
optimizations such as overlapped boundary exchanges.<br />
We demonstrate the programmability improvement and<br />
performance of Physis using hundreds of <strong>GPU</strong>s on TSUBAME2.0.<br />
Speaker(s): Naoya Maruyama (Assistant Professor, Tokyo Institute<br />
of <strong>Technology</strong>)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers,<br />
Supercomputing, Development Tools & Libraries, Computational Fluid<br />
Dynamics (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM L<br />
S0377 C++ Data Marshalling Best Practices<br />
When integrating CUDA C++ kernels into existing C++ applications,<br />
it is at times desirable to migrate a C++ object instance from the<br />
host to the device or vice versa. Given variations among host<br />
compilers regarding structure layout, accomplishing this data<br />
marshalling in a manner that is reliable, simple, and efficient is a<br />
complex issue. cudaMemcpy is our primary means to transfer<br />
data to the <strong>GPU</strong>, but memcpy-style operations are more readily<br />
amenable to C-style structures and arrays than to C++ objects or<br />
collections of objects. In this session, we will cover the caveats<br />
and best practices for marshalling C++ data.<br />
Speaker(s): Cliff Woolley (CUDA Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Finance, Application Design & Porting Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM A7<br />
S0511 3D Helmholtz Solver with a Shifted Laplace<br />
Multigrid on Multi-<strong>GPU</strong>s<br />
Learn about an iterative solver for the 3D Helmholtz equation on<br />
multi-<strong>GPU</strong> using CUDA. The Helmholtz equation, discretized by<br />
second-order finite differences, is solved with Bi-CGSTAB<br />
preconditioned by a shifted Laplace multigrid method. Two<br />
multi-<strong>GPU</strong> approaches are considered: data parallelism and<br />
algorithm-split. Their implementations on multi-<strong>GPU</strong> architecture<br />
are compared to multi-threaded CPU and single-<strong>GPU</strong><br />
implementations. The results show that the data-parallel<br />
implementation suffers from communication between <strong>GPU</strong>s<br />
and CPU, but is still several times faster than the many-core CPU implementation.<br />
The algorithm-split across <strong>GPU</strong>s limits communication and<br />
delivers speedups comparable to a single <strong>GPU</strong> implementation.<br />
Speaker(s): Kees Lemmens (Delft University of <strong>Technology</strong>)<br />
Topic(s): Energy Exploration, Algorithms & Numerical Techniques<br />
(Intermediate)<br />
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM A3<br />
S0525 Copperhead: Data Parallel Python<br />
Copperhead is a data parallel language suitable for <strong>GPU</strong><br />
programming, embedded in Python, which aims to provide both a<br />
productive programming environment as well as excellent<br />
computational efficiency. Copperhead programs are written in a<br />
small, restricted subset of the Python language, using standard<br />
constructs like map and reduce, along with traditional data<br />
parallel primitives like scan and sort. Copperhead programs<br />
interoperate with existing Python numerical and visualization<br />
libraries such as NumPy, SciPy, and Matplotlib. In this talk, we will<br />
discuss the Copperhead language, the open-source Copperhead<br />
runtime, and selected example programs.<br />
Speaker(s): Bryan Catanzaro (Research Scientist, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Intermediate)<br />
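Scan (prefix sum) is one of the data-parallel primitives the abstract mentions. The sketch below is plain Python, not Copperhead syntax. It simulates the work-efficient (Blelloch) exclusive scan, in which every iteration of each inner loop is independent and could run as a separate GPU thread. The function name `exclusive_scan` is hypothetical.

```python
def exclusive_scan(values):
    """Work-efficient (Blelloch) exclusive prefix sum.

    Each inner `for` loop is one data-parallel step: its iterations touch
    disjoint elements, so on a GPU they could all run concurrently.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)  # pad to a power of two

    # Up-sweep (reduce): build partial sums up a balanced binary tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data[:n]


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Both sweeps perform O(n) additions in total over O(log n) parallel steps, which is why this formulation is preferred on GPUs over the simpler O(n log n) scan.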
WEDNESDAY, MAY 16, 16:30 (25 MINUTES)<br />
ROOM J1<br />
S0704 Accelerating Iterative Linear Solvers on <strong>GPU</strong>s<br />
In this talk, we present our work on solving sparse linear systems<br />
on NVIDIA Tesla <strong>GPU</strong>s. We develop a new matrix format for <strong>GPU</strong>s,<br />
HEC (Hybrid of ELL and CSR), together with the corresponding sparse<br />
matrix-vector multiplication (SpMV) kernel and other related BLAS 1/2<br />
subroutines. Based on these subroutines, seven<br />
Krylov subspace solvers and two algebraic multigrid (AMG) solvers<br />
are implemented. Several commonly used preconditioners,<br />
such as Neumann polynomial, approximate inverse, ILU(k), ILUT,<br />
block ILU(k), block ILUT, domain decomposition (DDM) and AMG<br />
preconditioners, are also developed. In addition, a new parallel<br />
triangular solver for <strong>GPU</strong>s is designed. With this solver, a unified<br />
framework for ILU-related preconditioners is implemented.<br />
Speaker(s): Hui Liu (University of Calgary)<br />
Topic(s): Supercomputing (Intermediate)<br />
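The abstract does not spell out the HEC layout. As a rough, hypothetical sketch, assume each row keeps its first `k` nonzeros in a padded ELL block and spills the rest into a CSR block (similar in spirit to the well-known HYB format, which pairs ELL with COO). The helpers `build_hec` and `hec_spmv` are illustrative, not the speaker's code.

```python
def build_hec(rows, k):
    """Split a sparse matrix into an ELL part (first k entries per row)
    and a CSR part (the remaining entries of longer rows).

    `rows` is a list of rows, each a list of (column, value) pairs.
    """
    n = len(rows)
    # ELL part: exactly k slots per row, padded with column -1 and value 0.0.
    ell_cols = [[-1] * k for _ in range(n)]
    ell_vals = [[0.0] * k for _ in range(n)]
    # CSR part: row pointers plus the overflow entries.
    csr_ptr, csr_cols, csr_vals = [0], [], []
    for r, entries in enumerate(rows):
        for slot, (c, v) in enumerate(entries[:k]):
            ell_cols[r][slot] = c
            ell_vals[r][slot] = v
        for c, v in entries[k:]:
            csr_cols.append(c)
            csr_vals.append(v)
        csr_ptr.append(len(csr_cols))
    return ell_cols, ell_vals, csr_ptr, csr_cols, csr_vals


def hec_spmv(hec, x):
    """Compute y = A x. Each row would map to one GPU thread in both passes."""
    ell_cols, ell_vals, csr_ptr, csr_cols, csr_vals = hec
    y = []
    for r in range(len(ell_cols)):
        acc = 0.0
        # Regular ELL part: fixed trip count, padding slots are skipped.
        for c, v in zip(ell_cols[r], ell_vals[r]):
            if c >= 0:
                acc += v * x[c]
        # Irregular CSR part: only rows longer than k have entries here.
        for j in range(csr_ptr[r], csr_ptr[r + 1]):
            acc += csr_vals[j] * x[csr_cols[j]]
        y.append(acc)
    return y


A = [[(0, 4.0), (1, -1.0)],
     [(0, -1.0), (1, 4.0), (2, -1.0), (3, 2.0)],
     [(2, 3.0)]]
x = [1.0, 2.0, 3.0, 4.0]
print(hec_spmv(build_hec(A, 2), x))  # [2.0, 12.0, 9.0]
```

On a GPU the ELL arrays would normally be stored column-major so that consecutive threads (rows) read consecutive addresses; the CSR part absorbs the few long rows that would otherwise force heavy padding.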
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM L<br />
S0100 Mathematica as a Practical Platform for <strong>GPU</strong>-<br />
Accelerated Finance<br />
With the introduction of <strong>GPU</strong> support in version 8, Mathematica<br />
has become an excellent environment for integrating CUDA with<br />
high-level code for interpretation and visualization. In this<br />
presentation, we will show the usefulness of Mathematica in<br />
computational finance. In addition to demonstrating the<br />
<strong>GPU</strong>-accelerated financial computations which can be readily<br />
performed within Mathematica, we will show that these<br />
calculations can easily be integrated with third-party data sources<br />
including Microsoft Excel and databases. Furthermore, we will<br />
cover the UnRisk Mathematica package written by MathConsult,<br />
which seamlessly adds <strong>GPU</strong>-accelerated complex model<br />
calibration algorithms to Mathematica’s repertoire.<br />
Speaker(s): Abdul Dakkak (Kernel Developer, Wolfram Research),<br />
Dylan Roeh (Kernel Developer, Wolfram Research)<br />
Topic(s): Finance, Development Tools & Libraries (Intermediate)<br />
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM A1<br />
S0128 V:Screen: A Real-Time Augmented Video Method<br />
This presentation introduces an image-editing tool that replaces<br />
a region of any image or video with another image or<br />
video. This application is useful for advertisements, commercials,<br />
music videos, movies, etc. We named our tool “Virtual Screen,” or just<br />
VScreen. The main difference between editing<br />
(augmenting) videos and editing still images is that occlusions must<br />
be managed: moving objects in the foreground may occlude the<br />
augmented region in the background. We therefore use a procedure for<br />
foreground-background video segmentation that is implemented<br />
on NVIDIA graphics cards to meet the real-time requirement.<br />
Speaker(s): Francisco J. Hernandez-Lopez (PhD Student, CIMAT A.C.),<br />
Mariano Rivera (Researcher-Professor, CIMAT A.C.)<br />
Topic(s): Computer Vision (Beginner)<br />
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM N<br />
S0139 <strong>GPU</strong>-Based Molecular Dynamics Simulations of<br />
Protein and RNA Assembly<br />
Protein and RNA biomolecular folding and assembly problems<br />
have important applications because misfolding is associated with<br />
diseases like Alzheimer’s and Parkinson’s. However, simulating<br />
complex biomolecules on the same timescales as experiments is<br />
an extraordinary challenge due to a bottleneck in the force<br />
calculations. To overcome these hurdles, we perform coarse-grained<br />
molecular dynamics simulations in which biomolecules are<br />
reduced into simpler components. Furthermore, our <strong>GPU</strong>-based<br />
simulations offer a significant performance improvement over<br />
CPU-based simulations, which are limited to systems of 50-150<br />
residues/nucleotides. The <strong>GPU</strong>-based code can simulate protein/<br />
RNA systems of 400-10,000+ residues/nucleotides, and we<br />
present ribosome assembly simulations.<br />
Speaker(s): Samuel Cho (Assistant Professor, Wake Forest University)<br />
Topic(s): Molecular Dynamics, Computational Physics (Intermediate)<br />
WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />
ROOM A3<br />
S0242 Harnessing <strong>GPU</strong> Compute with C++ AMP (Part 1 of 2)<br />
C++ AMP is an open specification for taking advantage of<br />
accelerators like the <strong>GPU</strong>. In this session we will explore the C++
AMP implementation in Microsoft Visual Studio 11. After a quick<br />
overview of the technology, covering its goals and how it<br />
differs from other approaches, we will dive into<br />
the programming model and its modern C++ API. This is a<br />
code-heavy, interactive, two-part session, in which every part of the<br />
library will be explained. Demos will include showing off the<br />
richest parallel and <strong>GPU</strong> debugging story on the market, in the<br />
upcoming Visual Studio release.<br />
Speaker(s): Daniel Moth (Principal <strong>Program</strong> Manager, Microsoft)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Development<br />
Tools & Libraries (Intermediate)<br />
WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />
ROOM A5<br />
S0298 Performance Tools for <strong>GPU</strong>-Powered Scalable<br />
Heterogeneous Systems<br />
Discover the latest parallel performance tool technology for<br />
understanding and optimizing parallel computations on scalable<br />
heterogeneous platforms. The session will present the TAU<br />
performance system and its support of measurement and analysis<br />
of heterogeneous platforms composed of clusters of shared-memory<br />
nodes with <strong>GPU</strong>s. In particular, TAU’s integration of the<br />
CUPTI 4.1+ technology will be described and demonstrated<br />
through CUDA SDK examples and the SHOC benchmarks.<br />
Attendees will be provided with LiveDVDs containing the TAU tool suite<br />
and many pre-installed parallel tool packages. The DVDs will also include<br />
the latest CUDA driver, runtime library, and CUPTI.<br />
Speaker(s): Allen Malony (Professor, University of Oregon)<br />
Topic(s): Development Tools & Libraries, Parallel <strong>Program</strong>ming<br />
Languages & Compilers, Application Design & Porting Techniques<br />
(Intermediate)<br />
WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />
ROOM B<br />
S0361 Lossless Data Compression on <strong>GPU</strong>s<br />
In this talk, we will discuss common data compression algorithms<br />
used in the bzip2 implementation. We will also discuss our efforts<br />
towards parallelizing the Burrows-Wheeler Transform, Move-to-<br />
Front Transform, and Huffman encoding. The Burrows-Wheeler<br />
Transform is an algorithm used in both lossless data compression<br />
and bioinformatics. We’ll explain how it was computed using a<br />
parallel string-sorting algorithm. We will also show performance<br />
comparisons to serial implementations of each algorithm.<br />
Speaker(s): Jason Mak (Graduate Student, UC Davis), Ritesh Patel<br />
(Student, University of California Davis)<br />
Topic(s): Algorithms & Numerical Techniques, Bioinformatics<br />
(Intermediate)<br />
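A serial reference for the first two bzip2 stages helps show what the talk parallelizes. This sketch builds the Burrows-Wheeler Transform by sorting all rotations of the input (the step the speakers replace with a parallel string sort) and then applies Move-to-Front. It assumes a `$` sentinel that does not occur in the input and sorts before every other character. The function names are illustrative.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform by sorting all rotations.

    Reference version only: it materializes every rotation, so it is
    O(n^2 log n) and unsuitable for large inputs.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def move_to_front(data, alphabet):
    """Emit each symbol's index in a list, then move that symbol to the front."""
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


transformed = bwt("banana")
print(transformed)  # annb$aa
print(move_to_front(transformed, sorted(set(transformed))))  # [1, 3, 0, 3, 3, 3, 0]
```

The BWT groups equal symbols into runs, and Move-to-Front turns those runs into small integers (mostly zeros on real data), which is what makes the final Huffman stage effective.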
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0410 Computing Hausdorff Distances Between<br />
Freeforms on the <strong>GPU</strong><br />
We present new <strong>GPU</strong> algorithms for computing the directed<br />
Hausdorff distance between freeform surfaces, with applications in<br />
shape matching, mesh simplification, and geometric approximation<br />
and optimization. Our algorithms run in real-time with very small<br />
error bounds for parametric models defined by complex NURBS<br />
surfaces and can be used to interactively compute the Hausdorff<br />
distance for models made of dynamic deformable surfaces. We<br />
discuss implementation decisions and tradeoffs between OpenGL,<br />
CUDA, and Thrust, and the advantages and disadvantages of parallel<br />
hierarchical culling methods for this application.<br />
Speaker(s): Sara McMains (Professor, UC Berkeley), Adarsh<br />
Krishnamurthy (Post-doctoral Researcher, UC San Diego)<br />
Topic(s): Algorithms & Numerical Techniques, Computer Graphics,<br />
Computer Vision (Intermediate)<br />
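The directed Hausdorff distance is h(A, B) = max over a in A of (min over b in B of |a − b|). The brute-force sketch below assumes both freeform surfaces have already been sampled into point sets. It therefore only approximates the surface distance and provides none of the error bounds the talk describes. `directed_hausdorff` is a hypothetical helper.

```python
import math


def directed_hausdorff(a_points, b_points):
    """h(A, B): the largest distance from a point of A to its nearest point of B.

    The outer max is embarrassingly parallel (one GPU thread per sample of A);
    the inner min is where hierarchical culling could skip distant parts of B.
    """
    return max(
        min(math.dist(a, b) for b in b_points)
        for a in a_points
    )


A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
print(directed_hausdorff(A, B))  # 1.0
print(directed_hausdorff(B, A))  # 2.0
```

As the example shows, h(A, B) and h(B, A) generally differ; the symmetric Hausdorff distance is the larger of the two.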
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM K<br />
S0518 <strong>GPU</strong> Computing: From Sand to Tank Dynamics<br />
This talk explores the use of heterogeneous CPU/<strong>GPU</strong> computing,<br />
as enabled by an in-house developed Heterogeneous Computing<br />
Template (HCT), for physics-based simulations of mechanical<br />
systems. HCT draws on five components: advanced modeling<br />
techniques (formulating the governing equations); algorithmic<br />
support (solving these equations); proximity computation; domain<br />
decomposition/data exchange (for multi-node distributed CPU/<strong>GPU</strong><br />
computing); and post-processing/visualization. These five<br />
components provide the foundation of a computational framework<br />
used to analyze mechanical systems with millions of interacting<br />
elements. Example applications will include granular terrain<br />
simulation, tracked and wheeled vehicle mobility studies (tanks,<br />
rovers), fluid-solid interaction and nonlinear finite element analysis.<br />
Speaker(s): Dan Negrut (Associate Professor, University of<br />
Wisconsin-Madison)<br />
Topic(s): Computational Structural Mechanics, Computational<br />
Fluid Dynamics (Advanced)<br />
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM C<br />
S0605 cudaDMA: Emulating DMA engines on <strong>GPU</strong>s for<br />
Performance and <strong>Program</strong>mability<br />
The CudaDMA library is a collection of DMA objects that support<br />
efficient movement of data between off-chip global memory and<br />
on-chip shared memory in CUDA kernels. CudaDMA objects<br />
support many different data transfer patterns including sequential,<br />
strided, gather, scatter, and halo patterns. The library encapsulates<br />
efficient synchronization and data transfer implementations to<br />
achieve high memory bandwidth utilization. <strong>Program</strong>mer<br />
productivity is achieved by avoiding the need for thread array<br />
shapes to match data layout. Using CudaDMA, speedups of up to<br />
1.37x on synthetic micro-benchmarks and 1.15x-3.2x on kernels<br />
from scientific applications have been demonstrated.<br />
Speaker(s): Brucek Khailany (Senior Research Scientist, NVIDIA)<br />
Topic(s): Development Tools and Libraries (Intermediate)<br />
WEDNESDAY, MAY 16, 17:00 TBD (25 MINUTES)<br />
ROOM A8<br />
S0623 Visualizing Heterogeneous Performance Tested<br />
on MPI+CUDA Gigapixel Panorama Stitching<br />
This session consists of two technical parts. In the first part, we<br />
explain the use and implementation of a hybrid Poisson solver for<br />
gradient domain processing of massive images. Specifically, we<br />
provide a parallel out-of-core method for the seamless stitching<br />
of gigapixel panoramas in a parallel CUDA + MPI environment. In<br />
the second part, we will cover ongoing work on using novel<br />
visualization techniques to understand performance data from<br />
heterogeneous computing clusters. The Poisson solver application<br />
will serve as an example to demonstrate various features<br />
of this performance visualization tool.<br />
Speaker(s): Valerio Pascucci (Director of the Center for Extreme Data<br />
Management, Analysis and Visualization, University of Utah)<br />
Topic(s): Supercomputing, Visualization, Development Tools and<br />
Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM J1<br />
S0705 Efficient AMG on Hybrid <strong>GPU</strong> Clusters<br />
This talk presents the implementation of an AMG solver for a hybrid<br />
cluster that exploits distributed and shared memory parallelization<br />
and uses the available <strong>GPU</strong> accelerators on each node. This solver<br />
has been written using LAMA (Library for Accelerated Math<br />
Applications). This library not only provides an easy-to-use<br />
framework for solvers that can run on different devices with<br />
different matrix formats, but also comes with features to optimize<br />
and hide communication and memory transfers between CPUs and<br />
<strong>GPU</strong>s. These features are explained and their impact on the<br />
efficiency of the AMG solver is shown. The benchmark results<br />
demonstrate that efficient use of hybrid clusters is possible even<br />
for multi-level methods like AMG, where fast solutions are needed<br />
on all levels for multiple problem sizes.<br />
Speaker(s): Thomas Brandes (Senior Scientist, Fraunhofer Institute for<br />
Algorithms and Scientific Computing SCAI), Jiri Kraus (Fraunhofer<br />
Institute for Algorithms and Scientific Computing SCAI)<br />
Topic(s): Supercomputing (Intermediate)<br />
WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S2006 Emerging Companies Summit: CEO on Stage<br />
Featuring Raytrix, Playcast and Universal Robotics<br />
See the hottest new technologies from startups that are<br />
transforming computing. In a lively and fast-paced exchange, the<br />
Emerging Companies Summit CEO on Stage sessions will feature<br />
CEOs from three startups who will each have 15 minutes to<br />
introduce their companies and interact with a panel of leading<br />
venture capitalists, technology executives, and industry analysts.<br />
Speaker(s): Christian Perwass (CEO, Raytrix), Guy De Beer (CEO,<br />
Playcast), David Peters (CEO, Universal Robotics)<br />
Panelist(s): Tom Furlong (Managing Director, Granite Ventures), Rob<br />
Enderle (Principal Analyst, Enderle Group), Flip Gianos (General Partner,<br />
InterWest Partners), Jeff Herbst (VP of Business Development, NVIDIA)<br />
Topic(s): General Interest (Beginner)<br />
WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />
ROOM M<br />
S0639 Presented by Penguin<br />
Description unavailable at press time.<br />
Topic(s): General (Beginner)<br />
WEDNESDAY, MAY 16, 17:00 (25 MINUTES)<br />
ROOM A7<br />
S0647 Effective HPC Architecture - Design, Develop,<br />
Implement (Presented by ELEKS)<br />
An effective HPC system is much more than just GP<strong>GPU</strong>. Real-world<br />
applications often need to stream large amounts of data across<br />
system boundaries to dozens of worker nodes in a<br />
scalable and efficient way. They usually require storing huge<br />
amounts of data, scheduling computation jobs, monitoring<br />
system health and visualizing results. With first-hand<br />
experience in the design, development and implementation of end-to-end<br />
HPC solutions, our engineers will share their experience, including<br />
pitfalls to avoid and things to consider when planning<br />
an HPC system that works.<br />
Speaker(s): Oleh Khoma (Head of HPC Unit, ELEKS)<br />
Topic(s): Supercomputing; Application Design & Porting Techniques<br />
Intermediate (Beginner)<br />
WEDNESDAY, MAY 16, 17:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0810 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight development<br />
team! Whether you would like a private meeting to discuss specific<br />
product features or test out your application with the latest version<br />
of Nsight, or you just want to hang out with the team after attending<br />
one of the exciting training sessions, the lounge is a great place to<br />
learn everything you ever wanted to know about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0096 Summed Area Ripmaps<br />
In this presentation, we show how ripmaps can replace Summed<br />
Area Tables (SATs) for the purpose of computing a large number<br />
of spatially varying box filter kernels throughout the input data,<br />
providing both higher accuracy and higher speed for typical use<br />
cases. For this purpose, we demonstrate an implementation of<br />
ripmap generation in CUDA C (accelerated by shared memory<br />
usage), and a texture-cache based box filter for spatially varying<br />
kernel sizes, which can be implemented in both CUDA C and<br />
graphics-based APIs (e.g. OpenGL and DirectX).<br />
Speaker(s): Gernot Ziegler (Compute Developer <strong>Technology</strong>, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques, Computer Vision,<br />
Computer Graphics (Intermediate)<br />
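For comparison, the Summed Area Table baseline that the talk proposes to replace can be sketched as follows: after one prefix-sum pass, any axis-aligned box sum costs four lookups regardless of box size. This is a generic illustration, not the speaker's code, and the helper names are hypothetical. One commonly cited drawback of SATs is that the running sums grow with image size, which costs precision when they are stored in floating point.

```python
def summed_area_table(img):
    """sat[y][x] = sum of img over rows < y and columns < x.

    An extra zero row and column make the box lookups branch-free.
    """
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = img[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    return sat


def box_mean(sat, x0, y0, x1, y1):
    """Mean over the half-open box [x0, x1) x [y0, y1) from four SAT lookups."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


img = [[4 * y + x for x in range(4)] for y in range(4)]  # values 0..15
sat = summed_area_table(img)
print(box_mean(sat, 1, 1, 3, 3))  # 7.5  (mean of 5, 6, 9, 10)
```

Because the box size is a free parameter of `box_mean`, the filter width can vary per output pixel, which is the "spatially varying box filter" use case named in the abstract.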
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
ROOM K<br />
S0217 Efficient Implementation of CFD Algorithms on<br />
<strong>GPU</strong> Accelerated Supercomputers<br />
The goal of this session is to introduce the concepts necessary to<br />
solve large computational fluid dynamics (CFD) problems on<br />
collections of many <strong>GPU</strong>s. Schemes that overlap communication and computation<br />
become even more critical when using fast<br />
compute engines such as <strong>GPU</strong>s that are connected via a relatively<br />
slow interconnect (such as MPI on InfiniBand). The algorithms<br />
presented are validated on unsteady CFD simulations of<br />
turbulence using 192 graphics processors to update half-a-billion<br />
unknowns per computational timestep. The performance results<br />
from three different <strong>GPU</strong> accelerated supercomputers (Lincoln,<br />
Forge, and Keeneland) are compared with a large CPU based<br />
supercomputer (Ranger).<br />
Speaker(s): Ali Khajeh Saeed (PhD Candidate, University of<br />
Massachusetts, Amherst), Blair Perot (University of Massachusetts,<br />
Amherst)<br />
Topic(s): Computational Fluid Dynamics, Computational Physics,<br />
Supercomputing, Application Design & Porting Techniques (Intermediate)<br />
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
ROOM C<br />
S0311 Teaching Applied Parallel Computing with <strong>GPU</strong>s<br />
Learn how the next generation of HPC developers are learning<br />
hands-on skills with <strong>GPU</strong>s, and how <strong>GPU</strong> computing is being<br />
incorporated into Computer Science courses. We will discuss how<br />
<strong>GPU</strong>s are being used to enhance student learning of parallel<br />
computing concepts through a cross-teaching approach, where<br />
students with different domain expertise are grouped into teams<br />
and tasked with parallelizing an application such as ray tracing.<br />
We’ll show that student projects that emphasize optimization of<br />
architectural resources and performance tuning allow students
with no prior experience to parallelize a large-scale application with<br />
significant performance improvement in as little as six weeks.<br />
Speaker(s): Chris Lupo (Assistant Professor, California Polytechnic<br />
State University)<br />
Topic(s): General Interest, Ray Tracing (Intermediate)<br />
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
ROOM N<br />
S0346 GP<strong>GPU</strong> Accelerated Protein Similarity Measures<br />
Identifying Biologically Relevant Structure<br />
Atomic structure similarity measures for proteins help in de novo<br />
protein structure prediction. For a large set of computationally<br />
generated protein structures (~20k) all pairwise similarities have to<br />
be calculated to cluster structures. Common similarity measures<br />
are root mean square deviation (RMSD) and global distance test<br />
total score (GDT_TS). Although GDT_TS has advantages over RMSD,<br />
it is often not used because its calculation is time-consuming. The<br />
aforementioned and other similarity measures are ported for parallel<br />
execution on GP<strong>GPU</strong>s to make them amenable to clustering de<br />
novo generated structural models, in order to find the largest cluster,<br />
representing the biologically relevant protein conformations.<br />
Speaker(s): Edward Lowe (Research Assistant Professor, Vanderbilt<br />
University), Nils Woetzel (Research Assistant, Vanderbilt University)<br />
Topic(s): Bioinformatics, Application Design & Porting Techniques<br />
(Intermediate)<br />
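Both measures named in the abstract can be sketched in a few lines. This simplified version assumes the two structures are already superimposed and residue-aligned. Real RMSD uses an optimal superposition, and real GDT_TS searches many superpositions and keeps the best fraction per cutoff, which is the expensive part. The 1, 2, 4 and 8 Å cutoffs are the standard GDT_TS thresholds; the function names are hypothetical.

```python
import math


def rmsd(p, q):
    """Root mean square deviation between two aligned lists of 3D coordinates."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(p, q)) / len(p))


def gdt_ts_fixed(p, q, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS for one fixed superposition, on a 0-100 scale.

    For each cutoff, take the fraction of residues whose deviation is within
    that cutoff, then average the fractions over all cutoffs.
    """
    dists = [math.dist(a, b) for a, b in zip(p, q)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


p = [(0.0, 0.0, 0.0)] * 4
q = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(gdt_ts_fixed(p, q))  # 56.25
print(rmsd(p, q))
```

The example shows why GDT_TS is preferred for clustering: the single 10 Å outlier dominates the RMSD, while GDT_TS still credits the three well-placed residues.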
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
ROOM A1<br />
S0425 File Sharing Plus Real Time Media and<br />
Document Collaboration<br />
Studiopass is a cloud-based file sharing and visual collaboration<br />
tool that allows participants to collaborate on Microsoft<br />
documents and media files, including 1080p video. It is<br />
graphics-intensive and requires the best <strong>GPU</strong> performance to push<br />
playback of heavy files. This session will discuss how NVIDIA<br />
Tegra-powered devices deliver the graphics and video<br />
performance needed for efficient collaboration, and how<br />
the new quad-core Tegra 3 (4-PLUS-1) will bring further acceleration.<br />
Studiopass collaboration is not only accelerated by Tegra<br />
devices but also leverages NVIDIA Tesla-accelerated transcoding<br />
running on Amazon Web Services.<br />
Speaker(s): Kevin Jackson (Founder / CEO, Viewpartners)<br />
Topic(s): Mobile Applications & Interfaces, Cloud Computing (Beginner)<br />
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
ROOM L<br />
S0656 kdb+ and <strong>GPU</strong>s for Market Data Analytics<br />
and Trading<br />
Market data volumes increase year-on-year with the occasional<br />
extraordinary capacity-breaking peak. We must capture, store and<br />
process these data to gain insights for quantitative and<br />
algorithmic trading using a variety of market data analytics and<br />
techniques. kdb+ from KX Systems is a memory-based column<br />
database, written in the vector-functional language q, often used<br />
in finance for these analyses. In this session we demonstrate a<br />
method for the enhanced performance of general programs<br />
written in q and kdb+ by executing them on the <strong>GPU</strong>.<br />
Speaker(s): Philip A. Beasley-Harling (Bank of America Merrill Lynch)<br />
Topic(s): Finance (Beginner)<br />
WEDNESDAY, MAY 16, 17:30 (25 MINUTES)<br />
ROOM J1<br />
S0706 PISTON: Portability and Performance for Data-<br />
Parallel Visualization and Analysis Operators<br />
Due to the wide variety of current and next-generation<br />
supercomputing architectures, the development of high-performance<br />
parallel visualization and analysis operators<br />
frequently requires re-writing the underlying algorithms for many<br />
different platforms. In order to facilitate portability, we have<br />
devised a framework for creating such operators that employs the<br />
data-parallel programming model.<br />
Speaker(s): Christopher Sewell (Los Alamos National Laboratory),<br />
Li-Ta Lo (Los Alamos National Laboratory)<br />
Topic(s): <strong>GPU</strong>/Hybrid Computing, Data Science and Visualization<br />
(Intermediate)<br />
WEDNESDAY, MAY 16, 18:00 (50 MINUTES)<br />
ROOM L<br />
S0653 C++ and CUDA Birds-of-a-Feather<br />
This birds-of-a-feather will provide an opportunity for C++ and<br />
<strong>GPU</strong> users to learn about how the powerful C++ language can be<br />
used on the CUDA platform. NVIDIA and guest speakers will<br />
present details of the latest C++ features in CUDA and the Thrust<br />
open source template library, as well as discuss some goals and<br />
directions for C++ on the CUDA platform. It will also provide<br />
attendees a valuable opportunity to network with other attendees<br />
and NVIDIA engineers who share their interest in C++.<br />
Speaker(s): Mark Harris (Chief Technologist, <strong>GPU</strong> Computing, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
SESSION INFORMATION<br />
THURSDAY, MAY 17<br />
THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0057 <strong>GPU</strong>-Accelerated Molecular Dynamics Simulation<br />
of Solid Covalent Crystals<br />
An efficient and highly scalable algorithm for molecular dynamics<br />
(MD) simulation (using sophisticated many-body potentials) of solid<br />
covalent crystals is presented. Its effective memory throughput on a<br />
single C2050 <strong>GPU</strong> board reached 102 GB/s (81% of the peak), the<br />
instruction throughput reached 412 Ginstr/s (80% of the peak), and<br />
27% of the peak flops of a single <strong>GPU</strong> was achieved. Parallel<br />
efficiency of the algorithm can be as high as 95% on all 7168 <strong>GPU</strong>s<br />
of Tianhe-1A, reaching 1.87 Pflops in single precision, possibly a<br />
record for MD simulations.<br />
Speaker(s): Wei Ge (Professor, Institute of Process Engineering,<br />
Chinese Academy of Sciences)<br />
Topic(s): Molecular Dynamics, Algorithms & Numerical Techniques,<br />
Supercomputing (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />
ROOM A8<br />
S0129 A Monte Carlo Thermal Radiation Solver in <strong>GPU</strong>/<br />
CPU Hybrid Architecture<br />
A Monte Carlo ray-tracing code is developed to predict radiative<br />
heat transfer in CFD simulations of combustion<br />
phenomena. Using the emission reciprocity method, the random rays cast<br />
from each node can be traced independently, which allows<br />
parallel computation. The code is efficiently implemented on<br />
hybrid <strong>GPU</strong>/CPU HPC resources using a dedicated dynamic load<br />
balancing strategy. Linear speedup with hybrid HPC<br />
resources is demonstrated on the calculation of<br />
radiative heat transfer in a helicopter engine’s combustion<br />
chamber, where adding one <strong>GPU</strong> to the HPC resource pool is<br />
equivalent to adding nine CPU cores.<br />
Speaker(s): Oliver Gicquel (Professor, Laboratoire E.M2.C, Ecole<br />
Centrale Paris), Gaofeng Wang (Postdoc Fellow, Laboratoire E.M2.C,<br />
Ecole Centrale Paris)<br />
Topic(s): Computational Fluid Dynamics, Computational Physics,<br />
Ray Tracing (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />
ROOM A3<br />
S0133 Improving Mars Rover Image Compression Via<br />
<strong>GPU</strong>s And Genetic Algorithms<br />
Learn how to use Jacket to accelerate genetic algorithm (GA)<br />
image compression. Our research uses a GA to optimize lossy<br />
compression transforms that outperform state-of-the-art<br />
wavelet-based approaches for a variety of image classes,<br />
including fingerprint, satellite, and medical images, as well as images transmitted<br />
from the Mars Exploration Rovers. A typical training run evolves a<br />
population of transforms over many generations; since each<br />
transform must be applied to each image from the training set,<br />
each run entails thousands of independent, parallelizable fitness<br />
evaluations. By using MATLAB and Jacket to perform 2D<br />
convolution on the <strong>GPU</strong>, we have greatly reduced the total<br />
computation time needed.<br />
Speaker(s): Brendan Babb (Student/Research Technician, University of<br />
Alaska Anchorage)<br />
Topic(s): Machine Learning & AI, Audio, Image and Video Processing,<br />
Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM N<br />
S0256 A Stencil Library for the New Dynamic Core<br />
of COSMO<br />
We will present a stencil library used at the heart of the COSMO<br />
numerical weather prediction model. During the talk we’ll show<br />
how we implemented an abstraction that allows easy development<br />
of new stencils and solvers on top of a framework allowing<br />
execution on both CPU and <strong>GPU</strong>. The library makes efficient use<br />
of <strong>GPU</strong> resources and we will show how to structure memory<br />
accesses and computation optimally. Developers involved in<br />
porting or writing fully-featured C++ libraries for CUDA will also<br />
be interested in attending.<br />
Speaker(s): Tobias Gysi (Supercomputing Systems AG),<br />
Paul Messner (NVIDIA)<br />
Topic(s): Climate & Weather Modeling, Development Tools &<br />
Libraries (Advanced)<br />
THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0302 Accelerating miniFE: A Finite Element<br />
Mini-application<br />
The Mantevo performance project is a collection of self-contained<br />
proxy applications that illustrate the main performance<br />
characteristics of important algorithms. miniFE is intended to be<br />
an approximation to an unstructured implicit finite element or<br />
finite volume application. Our work investigated algorithms for<br />
assembling a matrix on the <strong>GPU</strong>. Parallelization algorithms using<br />
both 1 thread and 8 threads per element were investigated. Using<br />
these approaches, a significant speedup (over 60x in double<br />
precision) was obtained compared with the serial algorithm.<br />
Speaker(s): Justin Luitjens (Developer <strong>Technology</strong>, Compute, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM C<br />
S0303 <strong>GPU</strong> Acceleration for Threshold Based Region<br />
Growth Algorithms<br />
Come learn how the massively parallel computing power of<br />
modern <strong>GPU</strong>s helps create faster and more accurate<br />
volume-rendered images for the medical imaging community. Attendees<br />
of this session will gain insight into how <strong>GPU</strong>s can accelerate<br />
region growth algorithms and how these algorithms can be<br />
optimized for the latest generation of NVIDIA hardware. Topics<br />
covered will include the fundamentals of region growth, <strong>GPU</strong><br />
implementations, and practical examples of vessel tracking<br />
algorithms based on <strong>GPU</strong>-accelerated region growth.<br />
Speaker(s): Supratik Moulik (Cardiovascular Imaging Fellow, University<br />
of Pennsylvania), Jason Walsh (University of Pennsylvania 3D Lab)<br />
Topic(s): Medical Imaging & Visualization, Bioinformatics (Beginner)<br />
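A serial reference for threshold-based region growing is a breadth-first flood fill from a seed voxel. The sketch assumes 6-connectivity and a fixed intensity window `[low, high]`; both are illustrative choices, not details from the talk, and `region_grow` is a hypothetical name. A common GPU formulation instead processes the whole frontier in parallel, one thread per voxel, and repeats until no new voxel joins the region.

```python
from collections import deque

# The six face-adjacent neighbors of a voxel, as (dz, dy, dx) offsets.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def region_grow(volume, seed, low, high):
    """Return the set of (z, y, x) voxels connected to `seed` whose
    intensity lies in [low, high]. volume is indexed as volume[z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z0, y0, x0 = seed
    if not low <= volume[z0][y0][x0] <= high:
        return set()
    region = {seed}
    frontier = deque([seed])
    while frontier:
        z, y, x = frontier.popleft()
        for dz, dy, dx in NEIGHBORS:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                    and (nz, ny, nx) not in region
                    and low <= volume[nz][ny][nx] <= high):
                region.add((nz, ny, nx))
                frontier.append((nz, ny, nx))
    return region


vol = [[[100, 100, 0],
        [0, 100, 0],
        [100, 0, 0]]]
print(sorted(region_grow(vol, (0, 0, 0), 50, 150)))  # [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
```

The voxel at (0, 2, 0) passes the threshold but is not connected to the seed, so it stays outside the region; connectivity is what distinguishes region growing from plain thresholding.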
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM A1<br />
S0326 Next Generation InfoWall<br />
Learn how you can use a multiple display configuration to render<br />
video content captured from multiple sources, utilizing the power<br />
of <strong>GPU</strong>s to achieve unprecedented performance.<br />
Speaker(s): Alina Alt (Applied Engineer, NVIDIA), Andrew Page (Sr.<br />
Product Manager, NVIDIA), Shalini Venkataraman (Senior Applied<br />
Engineer, NVIDIA), Ian Williams (NVIDIA)<br />
Topic(s): Visualization, Computer Graphics (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM B<br />
S0333 GMAC-2: Easy and Efficient <strong>Program</strong>ming for<br />
CUDA-Based Systems<br />
In this talk we introduce GMAC-2, a framework that eases the<br />
development of CUDA applications and tools while achieving<br />
similar or better performance than hand-tuned code. The new<br />
features implemented in GMAC-2 allow programmers to further<br />
fine-tune their code and remove some limitations found in the<br />
original GMAC library. For example, memory objects can now be<br />
arbitrarily mapped on several devices without restrictions, and a<br />
host thread can launch kernels on any <strong>GPU</strong> in the system.<br />
Moreover, GMAC-2 transparently takes advantage of new<br />
hardware features such as <strong>GPU</strong>Direct 2 peer-to-peer<br />
communication.<br />
Speaker(s): Javier Cabezas (PhD Student, Barcelona Supercomputing<br />
Center), Isaac Gelado (Senior Researcher, Barcelona<br />
Supercomputing Center)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM M<br />
S0347 Accelerating Radio Astronomy Cross-Correlation<br />
beyond 1 Tflops using Fermi<br />
Radio astronomy is a signal processing application that requires<br />
extreme supercomputing. While today’s radio telescopes require<br />
10-100 Tflops of computational power, by the end of the decade<br />
this will increase to 1 Exaflops. The most compute-intensive part<br />
of this problem is the so-called cross-correlation algorithm, which<br />
is a linear-algebra problem. In this session we demonstrate that<br />
the Fermi architecture is ideally suited to this problem, and<br />
through exploiting the Fermi memory hierarchy it is possible to<br />
achieve close to 80% of peak performance in a real application.<br />
Speaker(s): Michael Clark (Compute DevTech Engineer, NVIDIA)<br />
Topic(s): Astronomy & Astrophysics, Supercomputing (Intermediate)<br />
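The cross-correlation step can be viewed as linear algebra as follows. Stack the samples of N receivers for one frequency channel as the rows of a matrix X. The correlation (visibility) matrix is then V = X X^H. The minimal pure-Python sketch below assumes a receivers-by-samples layout and uses a hypothetical `correlate` function. It is not the code presented in the session.

```python
from typing import List


def correlate(samples: List[List[complex]]) -> List[List[complex]]:
    """Return V[i][j] = sum_t x_i(t) * conj(x_j(t)) for one frequency channel.

    `samples` has one row per receiver and one column per time sample,
    so V = X X^H, a Hermitian matrix product.
    """
    n = len(samples)
    vis = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # V is Hermitian: compute the lower triangle only
            acc = 0j
            for xi, xj in zip(samples[i], samples[j]):
                acc += xi * xj.conjugate()
            vis[i][j] = acc
            vis[j][i] = acc.conjugate()  # mirror into the upper triangle
    return vis
```

Every receiver's samples are reused across all N baselines it belongs to, so arithmetic intensity grows with the number of receivers. Keeping those samples in fast on-chip memory is the kind of memory-hierarchy optimization the abstract refers to.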
THURSDAY, MAY 17, 09:00 (25 MINUTES)<br />
HALL 1<br />
S0362 Maximizing Performance on Multi-<strong>GPU</strong> Systems<br />
Are 512 CUDA Cores not enough? This session is for power users<br />
who are looking to scale applications to multi-<strong>GPU</strong> systems. We<br />
will take a holistic approach towards optimization. Rather than<br />
just focusing on CUDA programming, this session will cover<br />
techniques for reducing pressure on the PCIe bus, using CUDA<br />
Streams to improve load balance, dealing with NUMA impacts,<br />
and taking advantage of CPU threads. This talk will also cover<br />
strategies for developing applications that run on clusters with<br />
100 or more <strong>GPU</strong>s.<br />
Speaker(s): Kenneth Czechowski (Student, Georgia Tech)<br />
Topic(s): Supercomputing (Advanced)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM L<br />
S0619 Hate to Wait? Flash Memory for Full-Throttle<br />
<strong>GPU</strong> Acceleration (Presented by Fusion-io)<br />
Have you ever held off on trying out an idea because of the time it<br />
would take to process the result? With flash memory acting like jet<br />
fuel for your <strong>GPU</strong>, you can finally make sluggish<br />
application performance a bad memory. This session will couple a<br />
technical overview of the latest in PCIe-attached flash memory<br />
technology for accelerating graphics processing with developer best<br />
practices and tuning for <strong>GPU</strong> applications using flash memory for<br />
image compositing, editing, video playback, 3D content creation,<br />
video capture and many other data-intensive tasks.<br />
Speaker(s): Vincent Brisebois (Visual Computing Product Manager,<br />
Fusion-io), Robert Wipfel (Fellow, Fusion-io)<br />
Topic(s): Digital Content Creation & Film, Computer Graphics<br />
(Intermediate)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM A2<br />
S0648 Presented by ASUS<br />
Description unavailable at press time.<br />
Topic(s): General<br />
THURSDAY, MAY 17, 09:00 (110 MINUTES)<br />
ROOM J2<br />
S0707 Accelerated HPC Symposium: Scalability:<br />
Hardware and Software (Presented by LANL)<br />
This session will feature an introduction by Justin Tripp, followed<br />
by a short talk on “The FPGA: Another Piece of the Puzzle”<br />
and a talk on “Increasing Efficiency with Kepler.” After a<br />
short discussion and break, we’ll end this session with three short<br />
talks: “Image Analysis for Terascale Radio Astronomy,” “In situ<br />
Image Analysis for Large Scale Visualization,” and “<strong>GPU</strong><br />
Acceleration of MapReduce.”<br />
Speaker(s): Justin Tripp (LANL), Stephen Jones (NVIDIA), Christopher<br />
Fluke (Swinburne University of <strong>Technology</strong>), Christopher Sewell (LANL),<br />
Miao Xin (Junnan University)<br />
Topic(s): Supercomputing (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (110 MINUTES)<br />
ROOM J3<br />
S0708 Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 1 (Presented<br />
by LANL)<br />
This session will feature an introduction by Guillaume Colin de<br />
Verdiere, followed by a short talk on “Precondition for Large-Scale<br />
Linear Solvers.” Following this segment are two short talks,<br />
“Changing Data Structures for a Changing World” and<br />
“Leveraging Roadrunner Experiences.” After a short discussion<br />
and break, we will end Part 1 of this two-part session with “Taming<br />
Laser Plasma Interactions: PICon<strong>GPU</strong>.”<br />
Speaker(s): Dimitar Lukarski (Karlsruhe Institute of <strong>Technology</strong>), Hui<br />
Liu (University of Calgary), Michael Bussmann (Helmholtz-Zentrum<br />
Dresden-Rossendorf) and Jamal Mohd-Yusof (Los Alamos National<br />
Laboratory)<br />
Topic(s): Supercomputing (Intermediate)<br />
THURSDAY, MAY 17, 09:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0811 CUDA Debugger Training on Windows<br />
Nsight offers a powerful set of CUDA debugging features<br />
that enable developers to quickly spot bugs. From the memory<br />
checker to advanced breakpoints and the warp watch panel for variables, a<br />
developer can quickly isolate memory access errors, narrow<br />
thousands of threads down to a specific thread, and quickly spot<br />
abnormal ranges of variable values. Through a set of comprehensive<br />
exercises, attendees will learn to use these features to<br />
become fully proficient at developing CUDA code.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)
THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />
ROOM A3<br />
S0081 Parallel Computing In Mobile Robotics for RISE<br />
RISE (Risky Intervention and Surveillance Environment) is a very<br />
demanding task. In this presentation, three areas of research are<br />
discussed: 3D data registration, robot navigation,<br />
and 3D point-cloud processing. An approach based on robust<br />
k-nearest-neighbor (KNN) search, applied to improve the ICP<br />
algorithm, is shown, as is a parallel path-planning approach based on the<br />
wave propagation method. Online segmentation of 3D<br />
point clouds based on normal vector computation is also presented. The<br />
proposed algorithms were tested with CUDA on an NVIDIA GeForce<br />
GTX 580, with satisfactory results.<br />
Speaker(s): Janusz Bedkowski (Researcher)<br />
Topic(s): Machine Vision (Beginner)<br />
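The wave propagation method for path planning is commonly implemented as a wavefront expansion from the goal over an occupancy grid. Each free cell is labeled with its step distance to the goal, and the robot then descends those labels. Cells in one wavefront are independent of each other, which is what makes the method parallel-friendly. The sequential Python sketch below assumes a 4-connected 2D grid with 1 marking obstacles. `wavefront` and `extract_path` are hypothetical names, not the authors' code.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
Field = List[List[Optional[int]]]

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected neighbourhood


def wavefront(grid: List[List[int]], goal: Cell) -> Field:
    """Step distance to `goal` for every reachable free cell (1 = obstacle).

    Cells in the same wave are independent, so a GPU can expand them in parallel.
    """
    rows, cols = len(grid), len(grid[0])
    dist: Field = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    wave, step = [goal], 0
    while wave:
        step += 1
        next_wave = []
        for r, c in wave:
            for dr, dc in MOVES:
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] == 0 and dist[nr][nc] is None):
                    dist[nr][nc] = step
                    next_wave.append((nr, nc))
        wave = next_wave
    return dist


def extract_path(dist: Field, start: Cell) -> List[Cell]:
    """Follow strictly decreasing distances from `start` down to the goal."""
    if dist[start[0]][start[1]] is None:
        return []  # goal is unreachable from start
    path = [start]
    r, c = start
    while dist[r][c] != 0:
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(dist) and 0 <= nc < len(dist[0])
                    and dist[nr][nc] == dist[r][c] - 1):
                r, c = nr, nc
                break
        path.append((r, c))
    return path
```

On the grid `[[0, 0, 0], [1, 1, 0], [0, 0, 0]]` with the goal at `(2, 0)`, the path from `(0, 0)` detours around the obstacles through the right-hand column.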
THURSDAY, MAY 17, 09:30 (50 MINUTES)<br />
ROOM K<br />
S0238 Tesla Cluster Monitoring & Management APIs<br />
Learn more about cluster management and monitoring of Tesla<br />
and Quadro products. This includes a detailed description of the<br />
NVIDIA Management Library (NVML) and user-facing third-party<br />
software. Additionally, a brief summary of our out-of-band<br />
capabilities will be provided.<br />
Speaker(s): Robert Alexander (CUDA Tools Software Engineer, NVIDIA)<br />
Topic(s): Cluster Management (Beginner)<br />
THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />
ROOM A8<br />
S0264 CU++: An Object-Oriented Framework for<br />
Computational Fluid Dynamics (CFD) Applications<br />
In this session, I will elucidate the power of blending C++<br />
expression templates and CUDA, which has resulted in CU++, a smart<br />
framework for solving Computational Fluid Dynamics<br />
problems on structured and unstructured meshes. Briefly, CU++<br />
allows a code developer with just C/C++ knowledge to write<br />
computer programs that will execute on the <strong>GPU</strong> with minimal<br />
knowledge of specific programming techniques in CUDA. It allows<br />
the user to reuse existing C/C++ CFD codes with minimal<br />
changes. Codes written in CU++ can also be compiled in serial<br />
mode to be executed on a CPU using the tool ugc.<br />
Speaker(s): Dominic Chandar (Postdoctoral Research Associate,<br />
University of Wyoming)<br />
Topic(s): Computational Fluid Dynamics, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0290 Algorithm Acceleration for Geospatial Analysis<br />
Learn how the power of <strong>GPU</strong> computing is being leveraged to<br />
accelerate algorithms in the field of geospatial image analysis.<br />
The data volume and computation requirements associated with<br />
geospatial imagery are rapidly expanding as a result of the<br />
increasing number of satellite and airborne sensors, greater data<br />
accessibility, and expanded utilization of data intensive<br />
technologies. This equates to a growing need for high-performance<br />
computing in this field. We demonstrate the capacity<br />
for <strong>GPU</strong> computing to meet this need by accelerating a complex<br />
non-linear optimization algorithm used for the mapping and<br />
assessment of coral reef ecosystems.<br />
Speaker(s): James Goodman (President/CEO, HySpeed Computing<br />
LLC), Matthew Sellitto (Northeastern University)<br />
Topic(s): Algorithms & Numerical Techniques, General Interest<br />
(Intermediate)<br />
THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0354 Bcl::ChemInfo Suite Enables Machine Learning-<br />
Based Drug Discovery Using <strong>GPU</strong>s<br />
High-throughput screening data allows the training of machine-learning<br />
quantitative structure-activity relationship (QSAR) models, which<br />
can be used for in silico drug discovery screening. Here, we present<br />
bcl::ChemInfo, a <strong>GPU</strong>-accelerated suite for descriptor generation,<br />
model training, feature selection, and data set similarity analysis.<br />
The suite provides functionality for the analysis of constructed<br />
models as well as for screening external libraries of compounds.<br />
We examine case studies illustrating how this workflow can now be<br />
completed in a single day on a Tesla-equipped workstation, with<br />
speedups reaching 300x, providing a complete <strong>GPU</strong>-accelerated<br />
cheminformatics framework for drug discovery.<br />
Speaker(s): Edward Lowe (Research Assistant Professor, Vanderbilt<br />
University), Nils Woetzel (PhD Candidate, Vanderbilt University)<br />
Topic(s): General Interest (Intermediate)<br />
THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />
HALL 1<br />
S0360 Set <strong>GPU</strong>s Free: Integrating a File System with<br />
CUDA <strong>Program</strong>s<br />
This session seeks the answer to the question: “Can we simplify<br />
and speed up CUDA programs by allowing them to access files<br />
residing on a host?” To support our affirmative answer, we<br />
demonstrate how the concept of a file system enables programs<br />
with non-trivial CPU-<strong>GPU</strong> and <strong>GPU</strong>-<strong>GPU</strong> interactions to be<br />
efficiently and easily implemented on top of a new <strong>GPU</strong> file-system<br />
layer. We also show that such a file system enables implementation<br />
of fully stand-alone <strong>GPU</strong> programs without any CPU wrapper code.<br />
Finally, we outline the details of the file-system design that<br />
contributed to scalability, data consistency and performance.<br />
Speaker(s): Mark Silberstein (Post-doctoral Researcher, UT Austin),<br />
Emmet Witchel (University of Texas, Austin)<br />
Topic(s): General Interest (Intermediate)<br />
THURSDAY, MAY 17, 09:30 (25 MINUTES)<br />
ROOM A5<br />
S0621 NVIDIA OpenACC<br />
OpenACC is a directives-based programming standard for parallel<br />
computing on accelerators (including <strong>GPU</strong>s). It is designed to<br />
harness the transformative power of heterogeneous computing<br />
systems easily and quickly. Adding simple compiler hints to express<br />
parallelism in your code allows the compiler to map<br />
computation onto an accelerator. OpenACC directives allow<br />
developers to make simple and portable code changes, enabling an<br />
easier migration to accelerated computing. This talk discusses the<br />
merits of this model and provides an overview of, and guidance on, the<br />
tools available to developers from the OpenACC members.<br />
Speaker(s): Duncan Poole (Senior Manager, HPC, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0039 Data-Driven GP<strong>GPU</strong> Ideology Extension<br />
In this session we will demonstrate how the GP<strong>GPU</strong> ideology can<br />
be extended so that it can be used at the scale of an InfiniBand hybrid<br />
system. The approach that we are presenting combines delayed<br />
execution, scheduling techniques and, most importantly, maps<br />
the CPU multi-core model down to the streaming<br />
multiprocessor, enforcing a full-fledged “GP<strong>GPU</strong> as a co-processor”<br />
style of programming for large-scale MPI hybrid<br />
applications. While staying compatible with modern CPU/GP<strong>GPU</strong><br />
libraries, it provides more than fine-grained control over<br />
resources: more than you wanted, that is.<br />
Speaker(s): Bela Bauer (Postdoc, Microsoft Research), Alexandr<br />
Kosenkov (Software Engineer, University of Geneva)<br />
Topic(s): Application Design & Porting Techniques, Computational<br />
Physics, Parallel <strong>Program</strong>ming Languages & Compilers, Development<br />
Tools & Libraries (Advanced)<br />
THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />
ROOM N<br />
S0053 Real Time <strong>GPU</strong>-Based Marine Scenes Simulation<br />
Marine survey, carried out by sea or by air, is of major concern for<br />
current defense and security applications. Essential surveillance/<br />
observation/ identification systems involve electro-optics (visible<br />
and infra-red) and radar. Optimizing their performance requires large<br />
amounts of expensive observational data spanning the wide<br />
variability of the marine environment. Computer simulation<br />
provides a valuable, flexible and inexpensive alternative. Since<br />
2007, ALYOTECH, in partnership with IFREMER (French<br />
Research Institute for Exploration of the Sea), has been developing<br />
a <strong>GPU</strong>-based real-time ocean scene simulator for visible, infrared<br />
and radar sensors, in order to meet the challenging requirements<br />
arising from marine survey issues.<br />
Speaker(s): Jérôme Graindorge (Project Manager, ALYOTECH), Julien<br />
Houssay (Software Engineer, ALYOTECH)<br />
Topic(s): Climate & Weather Modeling, Visualization (Intermediate)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM B<br />
S0078 Panoptes: A Binary Instrumentation Framework<br />
for CUDA<br />
Traditional CPU-based computing environments offer a variety of<br />
binary instrumentation frameworks, while the instrumentation<br />
and analysis tools available to date for <strong>GPU</strong> environments have<br />
been more limited. Here we present Panoptes, a binary<br />
instrumentation framework for CUDA that targets the <strong>GPU</strong>. By<br />
exploiting the <strong>GPU</strong> to run modified kernels, Panoptes allows<br />
computationally intensive programs to be run at the native<br />
parallelism of the device during analysis. To demonstrate the<br />
instrumentation capabilities of Panoptes, we will present our work<br />
on a memory addressability and validity checker that targets<br />
CUDA programs.<br />
Speaker(s): Christopher Kennelly (Research Scientist,<br />
D. E. Shaw Research)<br />
Topic(s): Development Tools & Libraries (Advanced)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM M<br />
S0124 Signal Processing on <strong>GPU</strong>s for Radio Telescopes<br />
This session will present <strong>GPU</strong> implementations of four highly<br />
compute-intensive algorithms used by radio telescopes.<br />
Speaker(s): John Romein (Senior Researcher, ASTRON)<br />
Topic(s): Astronomy & Astrophysics (Intermediate)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM C<br />
S0244 Harnessing <strong>GPU</strong> Compute with C++ AMP (Part 2 of 2)<br />
C++ AMP is an open specification for taking advantage of<br />
accelerators like the <strong>GPU</strong>. In this session we will explore the C++<br />
AMP implementation in Microsoft Visual Studio 11. After a quick<br />
overview of the technology, covering its goals and how it<br />
differs from other approaches, we will dive into<br />
the programming model and its modern C++ API. This is a code<br />
heavy, interactive, two-part session, where every part of the<br />
library will be explained. Demos will include showing off the<br />
richest parallel and <strong>GPU</strong> debugging story on the market, in the<br />
upcoming Visual Studio release.<br />
Speaker(s): Daniel Moth (Principal <strong>Program</strong> Manager, Microsoft)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Development<br />
Tools & Libraries (Intermediate)<br />
THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />
ROOM A8<br />
S0305 Classical Algebraic Multigrid for CFD with CUDA<br />
Classical algebraic multigrid (AMG) is one of the most popular<br />
algorithms used in engineering, and the engine in many<br />
successful commercial packages. Among sparse linear solvers, it<br />
is known for being fast, parallel and scalable, yet it maps to <strong>GPU</strong><br />
architecture with some considerable difficulty. We have tackled<br />
these difficulties and currently have a full CUDA implementation<br />
of classical AMG, which has been validated against the gold standard,<br />
Hypre. Significant effort was dedicated to reducing<br />
thread divergence and optimizing memory access, and we<br />
continue to work on performance improvements. We are aiming<br />
for a competitive AMG code for fluid dynamics applications.<br />
Speaker(s): Simon Layton (PhD Candidate, Boston University)<br />
Topic(s): Computational Fluid Dynamics, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0315 Probing Bio-Nano Interface Structure from<br />
Microsecond Molecular Dynamics on <strong>GPU</strong>s<br />
Using the latest algorithmic developments in molecular dynamics<br />
on multiple <strong>GPU</strong>s over MPI, and technologies like <strong>GPU</strong>Direct, it is<br />
now possible to address problems of interaction at the bio-nano<br />
interface via large-scale atomistic simulations. This talk will<br />
discuss aspects of DNA-nanotube interactions and SWCNT-induced<br />
conformational changes in DNA nucleosome structure.<br />
We will also address technical challenges in porting and tuning<br />
the AMBER 11 code on the Condor <strong>GPU</strong> cluster at AFRL.<br />
Speaker(s): Olexandr Isayev (Research Scientist, Case Western<br />
Reserve University)<br />
Topic(s): Molecular Dynamics, Life Sciences (Advanced)<br />
THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />
ROOM A1<br />
S0324 Content Generation and Real-Time Hologram<br />
Computation for Holographic 3D-Displays<br />
This session will introduce SeeReal’s sub-hologram technology, which<br />
massively reduces hologram computation effort compared with<br />
classic holography, and show how SeeReal implemented these still<br />
compute-intensive algorithms on the <strong>GPU</strong> to enable<br />
viewing of interactive, rich 3D content on holographic 3D displays<br />
using off-the-shelf graphics hardware. You will also<br />
explore why classic holography is ill-suited to interactive<br />
applications. Furthermore, guidelines for creating appropriate<br />
3D content are presented, including aspects regarding<br />
transparency in holograms. Finally, the specification and some<br />
impressions of SeeReal’s 20” holographic prototype will be<br />
presented; the prototype allows viewing of live-computed holograms<br />
showing 3D content and 3D video.<br />
Speaker(s): Enrico Zschau (Lead Software Architect, SeeReal<br />
Technologies GmbH)<br />
Topic(s): Visualization, Stereoscopic 3D, Algorithms & Numerical<br />
Techniques, Audio, Image and Video Processing (Beginner)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
HALL 1<br />
S0338 New Features In the CUDA <strong>Program</strong>ming Model<br />
The continuing evolution of the <strong>GPU</strong> brings with it new hardware<br />
capabilities and new functionality. Simultaneously, ongoing<br />
development of CUDA and its tools, libraries and ecosystem<br />
brings new features to the software stack as well. Come and learn<br />
from one of CUDA’s programming model architects about what’s<br />
new in the <strong>GPU</strong>, what’s coming in the next release of CUDA, how it<br />
works, and how it all fits together.<br />
Speaker(s): Stephen Jones (CUDA Developer, NVIDIA)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Intermediate)<br />
THURSDAY, MAY 17, 10:00 (25 MINUTES)<br />
ROOM A2<br />
S0508 Faster Finite Elements for Wave Propagation Codes<br />
Learn how to develop faster and better finite-element codes for<br />
wave propagation using <strong>GPU</strong>s and MPI combined with overlapping<br />
techniques to hide the cost of communications and of host/device<br />
memory copies. Different options based on mesh coloring or on<br />
atomic operations will be presented. The difficulty of defining<br />
speedup will also be discussed (speedup versus what? Using what<br />
definition of “cost”?). Examples will be given using SPECFEM3D, a<br />
highly optimized spectral finite-element code that has won the<br />
Gordon Bell SuperComputing award and the BULL Joseph Fourier<br />
award, and that can run on CPU or <strong>GPU</strong> clusters.<br />
Speaker(s): Max Rietmann (PhD Student, Institute for Computational<br />
Science / USI Lugano, Switzerland)<br />
Topic(s): Algorithms & Numerical Techniques, Computational<br />
Physics (Intermediate)<br />
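Mesh coloring, one of the two options mentioned above, groups elements so that no two elements of the same color share a node. All elements of one color can then add into the global vector concurrently without atomics, typically with one kernel launch per color. The Python sketch below uses a simple greedy coloring and assumes each element is a tuple of global node indices. The names are hypothetical, and this is not SPECFEM3D's code.

```python
from typing import Dict, List, Sequence, Set


def color_elements(elements: Sequence[Sequence[int]]) -> List[int]:
    """Greedy colouring: elements that share any node get different colours."""
    node_colors: Dict[int, Set[int]] = {}  # colours already used around each node
    colors: List[int] = []
    for elem in elements:
        used: Set[int] = set()
        for node in elem:
            used |= node_colors.get(node, set())
        color = 0
        while color in used:  # smallest colour not used by any neighbour
            color += 1
        colors.append(color)
        for node in elem:
            node_colors.setdefault(node, set()).add(color)
    return colors


def assemble(elements: Sequence[Sequence[int]],
             local_values: Sequence[Sequence[float]],
             n_nodes: int, colors: List[int]) -> List[float]:
    """Scatter-add per-element values into a global vector, one colour at a time.

    Within one colour no two elements write the same node, so each colour's
    inner loop could run as a single race-free parallel kernel.
    """
    global_vec = [0.0] * n_nodes
    for color in sorted(set(colors)):
        for elem, vals, c in zip(elements, local_values, colors):
            if c != color:
                continue
            for node, v in zip(elem, vals):
                global_vec[node] += v
    return global_vec
```

For a 1D chain of elements `(0, 1), (1, 2), (2, 3)`, the greedy pass alternates colors 0, 1, 0, so two launches suffice. The atomic-operation alternative skips the coloring step and instead relies on hardware atomics for the shared nodes.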
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM A3<br />
S0521 Desktop Supercomputing in the Soft-Matter<br />
Physics Laboratory<br />
While many GP<strong>GPU</strong> applications reside on large clusters, in many<br />
laboratories the time to move data to an external cluster would<br />
exceed the time to analyze it upon arrival. By bringing high-throughput<br />
computational power to the data in the laboratory,<br />
<strong>GPU</strong>s offer new capabilities for doing science. This session presents a<br />
number of ways in which <strong>GPU</strong>s are making a significant impact on<br />
our research in experimental physics, biology and chemistry, from<br />
designing and building apparatus (Quadro and Tesla), to collecting<br />
data on portable devices (Tegra), to high-throughput analysis of<br />
large data sets (Tesla). It also presents results from studies<br />
investigating the motion of diffusing and aggregating colloidal<br />
particles and swimming bacteria, observing liquid-gas phase<br />
separation onboard the International Space Station, applying high<br />
dynamic-range techniques to optical tomography, and using<br />
low-cost devices to detect chemical and microbial contamination<br />
in the third world.<br />
Speaker(s): Peter Lu (Post-Doctoral Research Fellow, Harvard University)<br />
Topic(s): General Interest (Beginner)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM A5<br />
S0622 The PGI Fortran and C99 OpenACC Compilers<br />
Experienced <strong>GPU</strong> programmers will learn about the latest PGI<br />
OpenACC Fortran and C compilers. This session discusses how<br />
and where to apply the Parallel and Kernels constructs and the<br />
differences between the two. It includes a review of the latest PGI<br />
release and a comparison of the OpenACC standard to the PGI<br />
Accelerator Model. A live component demonstrates how to interpret<br />
compiler feedback, how to use it to achieve better performance,<br />
and how to interoperate with lower-level explicit <strong>GPU</strong> languages<br />
such as CUDA and OpenCL. The presentation wraps up with a look at<br />
planned future enhancements.<br />
Speaker(s): Brent Leback (Portland Group)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers (Beginner)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM L<br />
S0644 Molecular Dynamics, <strong>GPU</strong>s, and EC2 (Presented by<br />
Amazon Web Services)<br />
<strong>GPU</strong>s have made molecular dynamics simulations faster, better,<br />
and cheaper, achieving supercomputer performance from a single<br />
<strong>GPU</strong> without sacrificing stability or accuracy. In this talk we<br />
demonstrate how the <strong>GPU</strong> refactoring of AMBER 12 Molecular<br />
Dynamics has led to an implementation that produces results that<br />
are indistinguishable from the original CPU code. In addition, we<br />
describe the <strong>GPU</strong> compute instances available on the Amazon EC2<br />
platform to show how anyone can run any number of AMBER 12<br />
simulations, anytime from anywhere.<br />
Speaker(s): Scott Le Grand (Principal Engineer, Amazon Web Services)<br />
Topic(s): Molecular Dynamics, Computational Fluid Dynamics<br />
(Intermediate)<br />
THURSDAY, MAY 17, 10:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0812 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight<br />
development team! Whether you would like a private meeting to<br />
discuss specific product features or test out your application with<br />
the latest version of Nsight, or you just want to hang out with the<br />
team after attending one of the exciting training sessions, the<br />
lounge is a great place to learn everything you ever wanted to know<br />
about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />
ROOM A2<br />
S0079 Warped Parallel Nearest Neighbor Searches<br />
Using KD-Trees<br />
We propose a nearest neighbor search algorithm for a set of<br />
closely located query points that utilizes <strong>GPU</strong> parallelism and is<br />
optimized for a single CUDA warp. Instead of each query point<br />
traversing its own distinct path, a combined non-divergent path<br />
suitable for the entire query set can be constructed. Therefore, for a<br />
single warp a single stack can be maintained for the entire set of<br />
query points, allowing for efficient utilization of the shared<br />
memory and a number of simultaneous queries equal to the<br />
number of threads in a warp.<br />
Speaker(s): Roman Sokolov (Director of System Architecture, D4D<br />
Technologies), Andrei Tchouprakov (Director of System Architecture,<br />
D4D Technologies)<br />
Topic(s): Algorithms & Numerical Techniques (Intermediate)<br />
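The shared-stack idea can be sketched in Python. A group of nearby queries, standing in for the threads of a warp, walks one KD-tree traversal together. A subtree is skipped only when it cannot improve the current best distance of any query in the group, so every query follows the same path. The tree layout and the names `build` and `group_nearest` are assumptions for illustration, not the speakers' CUDA implementation. The search still returns the exact nearest neighbor for each query.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    """KD-tree node that splits on `axis` at `point[axis]`."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Median-split KD-tree: left holds coords <= split, right holds coords >= split."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def group_nearest(root: Optional[Node], queries: Sequence[Point]) -> List[Optional[Point]]:
    """Exact nearest neighbour for every query, using ONE shared traversal stack."""
    best: List[Optional[Point]] = [None] * len(queries)
    best_d = [math.inf] * len(queries)
    # Stack entries: (node, per-query lower bound on distance to that subtree).
    stack = [(root, [0.0] * len(queries))] if root is not None else []
    while stack:
        node, bounds = stack.pop()
        # Group-uniform pruning: skip only if no query can improve here.
        if all(b >= d for b, d in zip(bounds, best_d)):
            continue
        for i, q in enumerate(queries):  # on a GPU: one lane per query
            d = math.dist(q, node.point)
            if d < best_d[i]:
                best_d[i], best[i] = d, node.point
        a, split = node.axis, node.point[node.axis]
        diffs = [q[a] - split for q in queries]
        left_bounds = [max(b, x) if x > 0 else b for b, x in zip(bounds, diffs)]
        right_bounds = [max(b, -x) if x < 0 else b for b, x in zip(bounds, diffs)]
        # Push the side most queries fall on last, so it is visited first.
        left_first = 2 * sum(1 for x in diffs if x < 0) >= len(queries)
        children = [(node.right, right_bounds), (node.left, left_bounds)]
        if not left_first:
            children.reverse()
        for child, child_bounds in children:
            if child is not None:
                stack.append((child, child_bounds))
    return best
```

Because the queries are assumed to lie close together, pruning for the whole group costs little compared with per-query traversal. The payoff on a GPU is that the single stack fits in shared memory and the warp never diverges.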
THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />
ROOM N<br />
S0107 Acceleration of Long-Wave Rapid Radiative<br />
Transfer Model on GP<strong>GPU</strong><br />
The WRF model is a next-generation mesoscale numerical<br />
weather prediction system designed to serve both operational<br />
forecasting and atmospheric research communities. WRF offers<br />
multiple physics options, one of which is the Long-Wave Rapid<br />
Radiative Transfer Model. We found porting the rtrn() subroutine to<br />
CUDA challenging: it has a couple of recursive loops, for which<br />
GP<strong>GPU</strong>s are not well suited. We developed a new technique<br />
called loop inversion, which yielded a 7.7x speedup for<br />
the rtrn() subroutine alone (excluding memory transfer) and, in<br />
turn, a 10x speedup for the overall RRTM module, including initialization<br />
and memory transfer.<br />
Speaker(s): Mahesh Khadtare (PhD Student - Scientist ESP, I2IT, Pune<br />
University), Prakalp Somawanshi (CRL India)<br />
Topic(s): Climate & Weather Modeling, Application Design & Porting<br />
Techniques (Intermediate)<br />
THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0122 Computational Screening of Novel Carbon<br />
Capture Materials<br />
Discover how <strong>GPU</strong>s are used to identify optimal framework<br />
structures for carbon dioxide separation with the goal of reducing<br />
carbon emission. We describe the algorithm behind our <strong>GPU</strong><br />
software tool that iterates through a database of hypothetical<br />
zeolites and computes the selectivity of each of the structures.<br />
The code can be easily extended to simulate other adsorbent<br />
structures such as ZIFs (zeolitic imidazolate frameworks) and<br />
provide valuable insights to both theorists and experimentalists<br />
who have interest in carbon capture research.<br />
Speaker(s): Jihan Kim (Postdoctoral Researcher, Berkeley Lab),<br />
Berend Smit (UC Berkeley/Berkeley Lab)<br />
Topic(s): Molecular Dynamics (Intermediate)<br />
THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />
ROOM A1<br />
S0252 Building Real-Time Professional Visualization<br />
Solutions with OpenCL<br />
Professional visualization solutions, like high-quality high-resolution<br />
medical displays or very large screens for surveillance<br />
or entertainment, benefit from the <strong>GPU</strong>’s image and graphics<br />
compute capabilities to achieve real-time performance, but add<br />
specific constraints, like low-latency, multiple HD streams and<br />
strict synchronization. This talk first motivates the industrial<br />
relevance of development in OpenCL on heterogeneous devices. It<br />
then explains the techniques currently explored to meet the<br />
specific design constraints, with a main focus on parallel data<br />
transfer and compute. The lessons learned are illustrated with a<br />
real-life example.<br />
Speaker(s): Kristof Denolf (Research Engineer, Barco), Ronny Dewaele<br />
(Director <strong>Technology</strong> Center, Barco)<br />
Topic(s): Audio, Image and Video Processing, Visualization (Intermediate)<br />
THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0291 LAtoolbox: A Multi-platform Sparse Linear<br />
Algebra Toolbox<br />
Find out about an easy way to build sparse linear solvers for<br />
<strong>GPU</strong>s and multi-/many-core platforms. Based on data abstraction<br />
and virtualization of the hardware, the LAtoolbox supports several<br />
platforms such as <strong>GPU</strong>s, multi-core CPUs, and accelerators. The<br />
various backends (CUDA, OpenCL, OpenMP, ...) utilize optimized and<br />
platform-specific routines and allow seamless integration of <strong>GPU</strong>s<br />
into scientific applications. By means of unified interfaces across all<br />
platforms the library enables you to build generic linear solvers and<br />
preconditioners on a single code base without hardware-specific<br />
knowledge. We demonstrate the portability and flexibility of our<br />
open-source approach on heterogeneous platforms.<br />
Speaker(s): Dimitar Lukarski (Research Associate, Karlsruhe Institute<br />
of <strong>Technology</strong> (KIT)), Jan-Philipp Weiss (Junior Professor, Karlsruhe<br />
Institute of <strong>Technology</strong>)<br />
Topic(s): Application Design & Porting Techniques (Intermediate)<br />
THURSDAY, MAY 17, 10:30 (25 MINUTES)<br />
ROOM K<br />
S0309 Dynamically Allocating GP<strong>GPU</strong> to Host<br />
Nodes (Servers)<br />
Learn how to remotely change the mapping of <strong>GPU</strong>s to hosts<br />
based on application needs. The audience will then be presented with<br />
example scripts and a demo illustrating how this can be<br />
implemented to improve system resource utilization.<br />
Speaker(s): Alaa Yousif (Software Solution Architect, Dell), Saeed Iqbal<br />
(Senior Systems Engineer, Dell)<br />
Topic(s): Cluster Management (Beginner)<br />
THURSDAY, MAY 17, 11:00 (50 MINUTES)<br />
KEYNOTE HALL 1<br />
S3002 Day 3 Keynote: Not Your Grandfather’s Moon<br />
Landing<br />
Do not miss the day 3 keynote, featuring Part-Time Scientists<br />
Robert Boehme and Wes Faler. Boehme and Faler are part of a<br />
team of international scientists and engineers who want to send a<br />
rover to the moon before the end of the year 2013. In this<br />
presentation, they will discuss their goals, recent<br />
accomplishments and milestones, and how <strong>GPU</strong>s have helped in<br />
unexpected ways.<br />
Speaker(s): Robert Boehme (CEO & Team Lead, Part-Time Scientists),<br />
Wes Faler (Head of Software Development, Part-Time Scientists)<br />
Topic(s): General Interest (All Levels)<br />
THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />
ROOM N<br />
S0044 A Massively Parallel Two-Phase Solver for<br />
Incompressible Fluids on Multi-<strong>GPU</strong> Clusters<br />
Join our presentation of a multi-<strong>GPU</strong> fluid solver for high<br />
performance <strong>GPU</strong> compute clusters. We use high-order scientific<br />
techniques to simulate the interaction of two fluids like air and<br />
water. Scientists, engineers and even the computer animation<br />
industry will profit from the enormous compute power of tens or<br />
hundreds of <strong>GPU</strong>s. A major focus in this talk will be on the applied<br />
<strong>GPU</strong> implementation techniques and the performance results,<br />
including performance per Watt and performance per dollar.<br />
We also highlight the lessons we learned from porting the<br />
complex CPU CFD code NaSt3DGPF to the <strong>GPU</strong>.<br />
75 CONFERENCE GUIDE THURSDAY
THURSDAY<br />
Speaker(s): Peter Zaspel (Research Assistant, University of Bonn)<br />
Topic(s): Computational Fluid Dynamics, Supercomputing, Algorithms &<br />
Numerical Techniques, Digital Content Creation & Film (Intermediate)<br />
THURSDAY, MAY 17, 14:00 (25 MINUTES)<br />
ROOM C<br />
S0054 PFAC Library: <strong>GPU</strong>-Based String Matching Algorithm<br />
In this session, we first propose an exact string matching<br />
algorithm, called the Parallel-Failureless Aho-Corasick (PFAC)<br />
algorithm, which matches input texts against a set of<br />
string patterns on <strong>GPU</strong>s. The string patterns are compiled into a<br />
finite state machine similar to the well-known Aho-Corasick<br />
algorithm. Furthermore, to accommodate a large number of<br />
patterns, we present two kinds of hash functions that are<br />
used to compress the state transition table. The experimental<br />
results show that the PFAC library achieves high<br />
performance on NVIDIA <strong>GPU</strong>s. Finally, the PFAC library has been<br />
released on Google Code (http://code.google.com/p/pfac/).<br />
Speaker(s): Cheng-Hung Lin (Associate Professor, National Taiwan<br />
Normal University)<br />
Topic(s): Development Tools & Libraries, Algorithms & Numerical<br />
Techniques (Beginner)<br />
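As a rough illustration of the failureless idea described in S0054, the Python sketch below builds only the goto (trie) function of an Aho-Corasick automaton and starts one logical thread per input position; each thread stops as soon as no transition exists, so no failure links are needed. This is a minimal serial sketch, not the PFAC library's API: the function names `build_trie` and `pfac_match` are hypothetical, and the hash-based compression of the transition table mentioned in the abstract is not modeled.

```python
def build_trie(patterns):
    """Build only the goto function of an Aho-Corasick trie (no failure links).

    goto[state] maps a character to the next state; output[state] is the
    pattern that ends in that state.
    """
    goto = [{}]
    output = {}
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = pattern
    return goto, output


def pfac_match(text, patterns):
    """Return (start, pattern) pairs for every occurrence of every pattern.

    Each start position is handled independently, as one GPU thread would be:
    walk the trie from the root and stop when no transition exists.
    """
    goto, output = build_trie(patterns)
    matches = []
    for start in range(len(text)):  # hypothetically: one thread per start position
        state, pos = 0, start
        while pos < len(text) and text[pos] in goto[state]:
            state = goto[state][text[pos]]
            pos += 1
            if state in output:
                matches.append((start, output[state]))
    return matches


print(pfac_match("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because no thread ever follows a failure transition, the threads share no state beyond the read-only trie, which is what makes the one-thread-per-position mapping straightforward on a GPU.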
THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />
ROOM K<br />
S0119 Best Practices for Architecting and Managing<br />
High-Performance <strong>GPU</strong> Clusters<br />
An overview of designing, deploying, and managing <strong>GPU</strong> clusters<br />
for HPC. Learn to build and operate top500-class <strong>GPU</strong> computing<br />
resources that provide users with the latest CUDA features.<br />
Speaker(s): Dale Southard (Senior Solution Architect, NVIDIA)<br />
Topic(s): Cluster Management, Supercomputing (Intermediate)<br />
THURSDAY, MAY 17, 14:00 (25 MINUTES)<br />
ROOM M<br />
S0187 <strong>GPU</strong>s for Radio Imaging<br />
With the advent of a new breed of telescopes, such as the Low<br />
Frequency Array (LOFAR), which rely on software to<br />
process the large data sets they generate, there is a need to<br />
make that software run as fast as possible so that the<br />
data sets can be processed in a reasonable time. In this session we<br />
describe how we have used the computing power of <strong>GPU</strong>s to<br />
improve the performance of standard radio imaging<br />
techniques, and how this computational power is useful for<br />
creating a new generation of radio imaging algorithms.<br />
Speaker(s): Vamsi Krishna Veligatla (<strong>GPU</strong> <strong>Program</strong>mer, University<br />
of Groningen)<br />
Topic(s): Astronomy & Astrophysics (Intermediate)<br />
THURSDAY, MAY 17, 14:00 (25 MINUTES)<br />
ROOM L<br />
S0285 Optimization of a Sparse Matrix-Matrix<br />
Multiplication on the <strong>GPU</strong><br />
The goal of this session is to present advanced techniques to<br />
optimize CUDA code on the <strong>GPU</strong>. In particular, we will<br />
demonstrate the use of advanced CUDA instructions (inline PTX,<br />
warp instructions, “extended” syncthreads) and load-balancing<br />
strategies to improve the performance of a sparse matrix-matrix<br />
multiplication on the <strong>GPU</strong>.<br />
Speaker(s): Julien Demouth (Developer <strong>Technology</strong> Engineer, NVIDIA)<br />
Topic(s): Algorithms & Numerical Techniques (Advanced)<br />
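For readers less familiar with the problem in S0285, the following sketch shows a serial reference for sparse matrix-matrix multiplication in CSR format using Gustavson's row-by-row algorithm. It is an assumption that this is close to the formulation the speaker optimizes; the advanced CUDA techniques from the abstract (inline PTX, warp instructions, load balancing) are not shown, and the function name and argument layout are hypothetical.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Compute C = A * B for CSR matrices with Gustavson's algorithm.

    Row i of C is the sum of rows k of B scaled by A[i][k]; a dictionary
    serves as the sparse accumulator for the current row.
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # column -> partial sum for row i of C
        for k_pos in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[k_pos], a_val[k_pos]
            for j_pos in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[j_pos]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[j_pos]
        for j in sorted(acc):  # emit row i with sorted column indices
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2], [0, 3, 0]], B = [[0, 1], [4, 0], [5, 6]]
c = spgemm_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
               [0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])
print(c)  # ([0, 2, 3], [0, 1, 0], [10.0, 13.0, 12.0])
```

Each output row depends on only one row of A, so rows can be processed independently; because the work per row varies with the number of nonzeros touched, assigning rows to threads or warps naively leads to the kind of load imbalance the abstract mentions.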
THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />
ROOM B<br />
S0320 PTask: OS Support for <strong>GPU</strong> Dataflow <strong>Program</strong>ming<br />
This session considers the PTask API, a set of OS-level abstractions that<br />
support <strong>GPU</strong>s as first-class computing resources and a<br />
dataflow programming model. With PTask, the programmer<br />
specifies where data goes, rather than how and when it should get<br />
there, allowing the system to provide fairness and isolation<br />
guarantees, streamline data movement in ways that currently<br />
require direct programmer involvement, and enable code<br />
portability across diverse <strong>GPU</strong>-based platforms. Our experience<br />
building the PTask APIs shows that PTask can provide important<br />
system-wide guarantees and can enable significant performance<br />
benefits, for example improving the throughput of hand-tuned<br />
CUDA programs by up to 2x.<br />
Speaker(s): Jon Currey (Microsoft Research Silicon Valley), Christopher<br />
Rossbach (Researcher, Microsoft Research Silicon Valley)<br />
Topic(s): Development Tools & Libraries, General Interest, Parallel<br />
<strong>Program</strong>ming Languages & Compilers (Advanced)<br />
THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0378 VASP Accelerated with <strong>GPU</strong>s<br />
This session will detail the performance and capabilities of<br />
<strong>GPU</strong>-accelerated VASP, explain design decisions made in porting<br />
VASP to CUDA, and present a roadmap for <strong>GPU</strong> accelerated<br />
VASP development. We’ve achieved performance improvements<br />
of up to around 20x on systems of around 100 ions and have<br />
implemented exact exchange. We are working on ports of more<br />
conventional functionality.<br />
Speaker(s): Maxwell Hutchinson (PhD Student, University of Chicago)<br />
Topic(s): Quantum Chemistry, Application Design & Porting<br />
Techniques, Computational Physics (Intermediate)<br />
THURSDAY, MAY 17, 14:00 (110 MINUTES)<br />
ROOM J1<br />
S0709 Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models: Part 2 (Presented<br />
by LANL)<br />
This session is part 2 of Applications - Methods and <strong>Program</strong>ming<br />
Models, which will feature short talks on “The Portability Wall: How<br />
hard can it really be?,” followed by talks on “Accelerating NAMD,”<br />
“Refitting Legacy Software for the New Reality” and<br />
“Unstructured Data Structures: An Achilles Heel?” After a<br />
discussion and a break, the session will end with short talks on<br />
“Power: The New Metric” and “It’s about Concurrency, Stupid!”<br />
Speaker(s): John Stone (Urbana Champaign), James Phillips<br />
(University of Illinois), John Humphrey (EM Photonics), Raphael Poncet<br />
(CEA), Simon MacIntosh-Smith (University of Bristol), Stanley Tzeng<br />
(UC Davis)<br />
Topic(s): Supercomputing (Intermediate)<br />
THURSDAY, MAY 17, 14:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0813 CUDA Profiler Training on Windows<br />
Nsight offers a comprehensive set of performance analysis tools.<br />
From tracing complete system activity across multi-core CPUs and<br />
multiple <strong>GPU</strong>s to profiling CUDA kernels with precise profiling<br />
experiments, it helps developers identify system-level optimization<br />
opportunities as well as expensive and inefficient CUDA kernels<br />
that require in-depth analysis with the CUDA profiler. Through a set<br />
of comprehensive exercises, attendees will learn to use<br />
these features to become fully proficient at optimizing complex<br />
CUDA applications.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 14:30 (25 MINUTES)<br />
ROOM M<br />
S0022 Scalable Frameworks and Algorithms for<br />
Terascale Radio Astronomy Images<br />
Learn how the oldest science is using the newest processors to<br />
solve a critical problem: how to accomplish traditional image<br />
analysis and visualization tasks when the images are terabytes in<br />
size? Simple, standard operations such as displaying 2-d slices,<br />
evaluating image statistics, and applying histogram equalization<br />
become manifestly challenging when images dramatically exceed<br />
single-node memory capacity. We will explain how our hybrid<br />
CPU-<strong>GPU</strong> cluster framework – which can volume render a 200GB<br />
image at >50fps! – will support traditional radio astronomy tasks<br />
for the colossal images that the Square Kilometre Array and its<br />
precursor, the Australian SKA Pathfinder, will generate.<br />
Speaker(s): Christopher Fluke (Senior Lecturer, Swinburne University of<br />
<strong>Technology</strong> - Centre for Astrophysics and Supercomputing)<br />
Topic(s): Astronomy & Astrophysics, Visualization (Intermediate)<br />
THURSDAY, MAY 17, 14:30 (25 MINUTES)<br />
ROOM C<br />
S0032 Teraflop <strong>GPU</strong> Acceleration Of Large Matrix Algebra<br />
Learn how Multipath’s Fast Matrix Solver (FMS) is setting<br />
performance records using multiple <strong>GPU</strong>s to solve large matrices<br />
in production applications. By (1) leveraging NVIDIA’s CUBLAS<br />
library, (2) operating multiple <strong>GPU</strong>s in parallel and (3) overlapping<br />
data transfers with computation, FMS averages over 2 teraflops of<br />
performance, even on jobs lasting for days. The presentation also<br />
includes a description of what problems FMS solves and how it is<br />
incorporated into application programs.<br />
Speaker(s): Ronald Young (President, Multipath Corporation)<br />
Topic(s): Development Tools & Libraries, General Interest (Beginner)<br />
THURSDAY, MAY 17, 14:30 (50 MINUTES)<br />
ROOM L<br />
S0106 <strong>GPU</strong> Based Numerical Methods in Mathematica<br />
A fast way of developing, prototyping and deploying numerical<br />
algorithms that can take advantage of CUDA capable systems is<br />
available in Mathematica 8. Over the past year, educators,<br />
scientists, and business users have taken advantage of<br />
Mathematica’s support for <strong>GPU</strong> programming. By<br />
integrating CUDA/OpenCL into their programs,<br />
users take a hybrid approach, combining the speed-up<br />
that <strong>GPU</strong>s offer with a powerful numerical development system. This<br />
presentation walks through several numerical applications, ranging<br />
from MRI image deconvolution, linear solvers for FEM and<br />
systems of ODEs to line integral convolution<br />
visualization.<br />
Speaker(s): Ulises Cervantes-Pimentel (Senior Kernel Developer,<br />
Wolfram Research), Abdul Dakkak (Kernel Developer, Wolfram Research)<br />
Topic(s): Algorithms & Numerical Techniques, Visualization,<br />
Application Design & Porting Techniques, Development Tools &<br />
Libraries (Intermediate)<br />
THURSDAY, MAY 17, 14:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0231 Levenberg-Marquardt Using Block Sparse Matrices<br />
on CUDA<br />
This session describes the experiences of constructing <strong>GPU</strong> based<br />
matrix-vector functions for block sparse matrices having multiple<br />
block sizes and a domain-specific numerical Jacobian generation<br />
function. The bundle adjustment algorithm is an optimization<br />
procedure which attempts to refine the relative camera pose, and<br />
3D structure location variables, estimated from multiple sets of<br />
images. The Conjugate Gradient algorithm is used to solve the<br />
normal equations that appear in the inner loop of the non-linear<br />
least squares problem.<br />
Speaker(s): Tetsuo Tawara (Software Engineer, Koozyt)<br />
Topic(s): Application Design & Porting Techniques, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
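To make the structure of S0231's inner loop concrete, here is a minimal dense sketch of one Levenberg-Marquardt update in which the damped normal equations (J^T J + λI) δ = -J^T r are solved with conjugate gradient, applying J^T J matrix-free rather than forming it. The block-sparse storage, numerical Jacobian generation and bundle-adjustment model from the talk are not modeled; all function names are hypothetical, and the toy line fit is only an illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def conjugate_gradient(apply_a, b, tol=1e-12, max_iter=200):
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        ap = apply_a(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def lm_step(jac, res, lam):
    """One Levenberg-Marquardt step: solve (J^T J + lam*I) delta = -J^T r.

    J^T J is never formed; each CG iteration applies J and then J^T.
    """
    jac_t = [list(col) for col in zip(*jac)]

    def normal_op(v):
        return [a + lam * b for a, b in zip(matvec(jac_t, matvec(jac, v)), v)]

    rhs = [-g for g in matvec(jac_t, res)]
    return conjugate_gradient(normal_op, rhs)


# Toy problem: fit y = a*x + b to (0, 1), (1, 3), (2, 5), starting at a = b = 0.
xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
jac = [[x, 1.0] for x in xs]   # d(model)/d(a, b)
res = [-y for y in ys]         # model (0 at a = b = 0) minus data
print(lm_step(jac, res, lam=0.0))  # approximately [2.0, 1.0]
```

With λ = 0 on this linear problem the step lands on the least-squares solution; a larger λ shortens the step toward gradient descent, which is the trade-off LM adjusts between iterations.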
THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />
ROOM C<br />
S0071 The High-Level Linear Algebra Library ViennaCL<br />
and Its Applications<br />
Get to know ViennaCL, an OpenCL-based high-level linear algebra<br />
library that delivers the speed of <strong>GPU</strong> computing at the<br />
convenience level of the C++ Boost libraries. Decrease the<br />
development and execution time of applications by utilizing our<br />
well-tested and widely used library, instead of spending days on<br />
learning details of <strong>GPU</strong> architectures and debugging. We provide<br />
examples that demonstrate not only how quickly existing<br />
applications are ported efficiently from single-threaded execution<br />
to fully utilizing multi-threaded environments, but also how to<br />
utilize the rich set of functionalities ranging from common BLAS<br />
routines to iterative solvers.<br />
Speaker(s): Karl Rupp (Project Assistant, TU Wien)<br />
Topic(s): Development Tools & Libraries, Algorithms & Numerical<br />
Techniques, Computational Physics (Intermediate)<br />
THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />
ROOM M<br />
S0087 <strong>GPU</strong> Acceleration of Dense Stellar<br />
Clusters Simulation<br />
Computing the interactions between stars within dense stellar<br />
clusters is a problem of fundamental importance in theoretical<br />
astrophysics. This session presents the parallelization of a Monte<br />
Carlo algorithm for simulating stellar cluster evolution using<br />
programmable Graphics Processing Units. The kernels of this<br />
algorithm exhibit high levels of data-dependent decision making<br />
and unavoidable non-contiguous memory accesses. However, we<br />
adopt various parallelization strategies and utilize the high<br />
computing power of the <strong>GPU</strong> to obtain substantial near-linear<br />
speedups that cannot be easily achieved on a CPU-based<br />
system. This acceleration allows us to explore physical regimes<br />
that were out of reach of existing simulations.<br />
Speaker(s): Bharath Pattabiraman (PhD Student, Northwestern University),<br />
Stefan Umbreit (Postdoctoral Associate, Northwestern University)<br />
Topic(s): Astronomy & Astrophysics, Computational Physics, Algorithms<br />
& Numerical Techniques (Intermediate)<br />
THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />
ROOM N<br />
S0091 Sustainable Hybrid Parallelization of an<br />
Unstructured Hydrodynamic Code<br />
The goal of this presentation is to share our methodology for<br />
porting a numerical code to hybrid supercomputing architectures<br />
using MPI coupled with directive-based languages (OpenMP for<br />
multicore CPUs, and HMPP for <strong>GPU</strong>s). Our code, VOLNA, is an<br />
unstructured partial differential equation hydrodynamic solver<br />
developed for the simulation of tsunamis. Our results<br />
demonstrate that using directive-based languages such as HMPP<br />
for <strong>GPU</strong> programming, one can retain good performance (e.g., a<br />
speedup of 15x over 1 CPU core and 3x over 8 CPU<br />
cores) with minimal modifications to the original CPU source code<br />
(about 30 lines of directives in our case).<br />
Speaker(s): Raphaël Poncet (Research Scientist, Commissariat à<br />
l’Energie Atomique et aux Energies Alternatives)<br />
Topic(s): Application Design & Porting Techniques, Algorithms &<br />
Numerical Techniques, Computational Fluid Dynamics,<br />
Computational Physics (Advanced)<br />
THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />
ROOM B<br />
S0157 A Study of Persistent Threads Style <strong>Program</strong>ming<br />
Model for <strong>GPU</strong> Computing<br />
We examine a new style of <strong>GPU</strong> programming<br />
called Persistent Threads (PT), which is known to be useful for irregular<br />
workloads. First, we formally define the PT model.<br />
Second, we categorize uses of PT into four “use cases” and<br />
present micro-benchmark analyses of when this model is preferable<br />
to traditional kernel formulations. Third, we show a full<br />
speech recognition application that uses all four PT use cases.<br />
Finally, we suggest appropriate<br />
modifications to <strong>GPU</strong> hardware, software, and APIs that make PT<br />
kernels both easier to implement and more efficient.<br />
Speaker(s): Kshitij Gupta (Graduate Student Researcher, UC Davis),<br />
Jeff Stuart (PhD Student, UC Davis)<br />
Topic(s): Parallel <strong>Program</strong>ming Languages & Compilers, Audio, Image<br />
and Video Processing (Advanced)<br />
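The sketch below illustrates the basic persistent-threads pattern from S0157, using Python threads as a stand-in for GPU threads: a fixed pool of long-lived workers repeatedly claims the next work item from a shared counter (a lock here, where a GPU kernel would typically use an atomic increment) until the work is exhausted. This is an illustrative assumption, not the speakers' formal PT model or any of their four use cases; the function name is hypothetical.

```python
import threading


def persistent_threads_run(work_items, num_workers, process):
    """Process work_items with a fixed pool of long-lived workers.

    Instead of launching one thread per item, each worker loops, claiming
    the next index from a shared counter until all items are taken.
    """
    next_index = 0
    lock = threading.Lock()
    results = [None] * len(work_items)

    def worker():
        nonlocal next_index
        while True:
            with lock:  # stands in for an atomic fetch-and-add on a GPU
                i = next_index
                next_index += 1
            if i >= len(work_items):
                return  # queue drained: this persistent worker exits
            results[i] = process(work_items[i])

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(persistent_threads_run(list(range(8)), num_workers=3, process=lambda v: v * v))
# [0, 1, 4, 9, 16, 25, 36, 49]
```

The appeal for irregular workloads is that a worker finishing a cheap item immediately claims another, so the pool stays busy without relaunching threads or relying on the hardware scheduler.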
THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0334 The Fast Multipole Method on CPU and <strong>GPU</strong><br />
Processors<br />
The fast multipole method (FMM) is a widely used numerical<br />
algorithm in computational engineering. Accelerating the FMM on<br />
CUDA-enabled <strong>GPU</strong>s is challenging because the FMM has a<br />
complicated data access pattern, mostly during the so-called<br />
multipole-to-local (M2L) operation. We have created several<br />
schemes to optimize the M2L and have attained a performance of<br />
over 350 Gflop/s in single precision and 160 Gflop/s in double<br />
precision. The optimal algorithm was incorporated into a<br />
complete FMM code, which can accept any smooth kernel as<br />
specified by the user, making it very flexible. We have also<br />
developed a highly efficient CPU version.<br />
Speaker(s): Eric Darve (Professor, Stanford)<br />
Topic(s): Computational Physics, Molecular Dynamics, Algorithms &<br />
Numerical Techniques (Advanced)<br />
THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />
ROOM K<br />
S0368 Unraveling the Mysteries of Quarks with<br />
Hundreds of <strong>GPU</strong>s<br />
Dive into the world of quarks and gluons, and hear how <strong>GPU</strong><br />
computing is revolutionizing the way many calculations in lattice<br />
quantum chromodynamics (lattice QCD) are performed. The main<br />
computational challenge in such calculations is to repeatedly<br />
solve large systems of linear equations arising from a four-dimensional<br />
finite-difference problem. In this session, we’ll<br />
discuss strategies for parallelizing such a solver across hundreds<br />
of <strong>GPU</strong>s. These include techniques and algorithms for reducing<br />
memory traffic and inter-<strong>GPU</strong> communication. The net result is an<br />
implementation that achieves better than 20 Tflops on 256 <strong>GPU</strong>s,<br />
realized in the open-source “QUDA” library.<br />
Speaker(s): Ronald Babich (Research Scientist, NVIDIA)<br />
Topic(s): Computational Physics, Application Design & Porting<br />
Techniques, Algorithms & Numerical Techniques, Supercomputing<br />
(Intermediate)<br />
THURSDAY, MAY 17, 15:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0429 Quantum Chemistry: Automated Code Generation<br />
and Optimization for <strong>GPU</strong> Kernels<br />
In this session we discuss the challenges encountered in<br />
development of quantum chemistry software for <strong>GPU</strong>s from<br />
scratch and optimization of the kernels for the best performance.<br />
We attempt to create a unified framework for automatic<br />
generation of efficient quantum chemistry codes tailored<br />
individually for various <strong>GPU</strong> (NVIDIA, ATI) and CPU architectures<br />
and programming languages (CUDA, OpenCL, C/C++) using a<br />
meta-programming approach based on a computer algebra<br />
system. We demonstrate its utility by generating highly optimized<br />
<strong>GPU</strong> and CPU kernels dealing with various integrals over<br />
Gaussian basis functions implemented in the TeraChem quantum<br />
chemistry package.<br />
Speaker(s): Alexey Titov (Engineering Research Associate, Stanford),<br />
Ivan Ufimtsev (Postdoc, Stanford)<br />
Topic(s): Quantum Chemistry (Advanced)<br />
THURSDAY, MAY 17, 15:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0814 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight<br />
development team! Whether you would like a private meeting to<br />
discuss specific product features or test out your application with<br />
the latest version of Nsight, or you just want to hang out with the<br />
team after attending one of the exciting training sessions, the<br />
lounge is a great place to learn everything you ever wanted to know<br />
about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 15:30 (50 MINUTES)<br />
ROOM M<br />
S0111 An Efficient CUDA Implementation of a Tree-Based<br />
N-Body Algorithm<br />
This session presents a complete CUDA implementation of the<br />
irregular Barnes-Hut n-body algorithm. This algorithm repeatedly<br />
builds and traverses unbalanced trees, making it difficult to map<br />
to <strong>GPU</strong>s. We explain in detail how our code exploits the<br />
architectural features of <strong>GPU</strong>s, including lockstep operation and<br />
thread divergence, both of which are commonly viewed as hurdles<br />
to achieving high performance, especially for irregular codes. On<br />
a five million body simulation running on a Tesla C2050, our CUDA<br />
implementation is 30 times faster than a parallel pthreads version<br />
running on a high-end 6-core Xeon.<br />
Speaker(s): Martin Burtscher (Associate Professor, Texas State University)<br />
Topic(s): Application Design & Porting Techniques, Astronomy &<br />
Astrophysics, Molecular Dynamics, Supercomputing (Advanced)
THURSDAY, MAY 17, 15:30 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0138 <strong>GPU</strong> Task-Parallelism: Primitives and Applications<br />
We explore how a task-parallel model can be implemented on the<br />
<strong>GPU</strong> and address concerns and programming techniques for<br />
doing so. We discuss the primitives for building a task-parallel<br />
system on the <strong>GPU</strong>. This includes novel ideas for mapping tasking<br />
systems onto the <strong>GPU</strong> including task granularity, load balancing,<br />
memory management, and dependency resolution. We also<br />
present several applications that demonstrate how a task-parallel<br />
model is more suitable than the regular data-parallel<br />
model. These applications include a Reyes renderer, tiled deferred<br />
lighting renderer, and a video encoding demo.<br />
Speaker(s): Anjul Patney (PhD Candidate, UC Davis), Stanley Tzeng<br />
(Graduate Student, UC Davis)<br />
Topic(s): Application Design & Porting Techniques, Development Tools<br />
& Libraries, Computer Graphics (Intermediate)<br />
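As a hedged illustration of the dependency-resolution primitive mentioned in S0138, the following sketch groups tasks into "waves" using a per-task counter of unfinished dependencies (Kahn's topological sort); every task in a wave could run concurrently. The wave-by-wave structure, the function name and the rendering-flavored task names are assumptions for illustration, not the speakers' scheduler.

```python
def schedule_waves(deps):
    """Group tasks into waves of mutually independent, ready-to-run tasks.

    deps maps each task to the list of tasks it depends on; every task must
    appear as a key. A task becomes ready when its count of unfinished
    dependencies drops to zero.
    """
    remaining = {t: len(d) for t, d in deps.items()}
    dependents = {t: [] for t in deps}
    for t, d in deps.items():
        for u in d:
            dependents[u].append(t)
    ready = sorted(t for t, n in remaining.items() if n == 0)
    waves = []
    while ready:
        waves.append(ready)
        next_ready = []
        for t in ready:  # "finish" each task and release its dependents
            for v in dependents[t]:
                remaining[v] -= 1
                if remaining[v] == 0:
                    next_ready.append(v)
        ready = sorted(next_ready)
    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("dependency cycle detected")
    return waves


deps = {"load": [], "split": ["load"], "shadeA": ["split"],
        "shadeB": ["split"], "composite": ["shadeA", "shadeB"]}
print(schedule_waves(deps))
# [['load'], ['split'], ['shadeA', 'shadeB'], ['composite']]
```

A real GPU tasking system would more likely let a finished task push its newly ready dependents onto a queue immediately rather than waiting for a whole wave; the wave form is used here only because it is easy to inspect.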
THURSDAY, MAY 17, 15:30 (50 MINUTES)<br />
ROOM L<br />
S0267B Mixing Graphics and Compute with Multiple <strong>GPU</strong>s<br />
In this session we will cover all the different aspects of interaction<br />
between graphics and compute. The first part of the session will<br />
focus on compute API interoperability with OpenGL (using CUDA<br />
and OpenCL APIs), while the second part of the session will delve<br />
into interoperability at a system level. In particular we will go<br />
through the challenges and benefits of dedicating one <strong>GPU</strong> for<br />
compute and another for graphics, how different system<br />
configurations affect data transfer between two <strong>GPU</strong>s, and how it<br />
translates into application design decisions helping to enable an<br />
efficient, cross-<strong>GPU</strong> interoperability between compute and<br />
graphics contexts.<br />
Speaker(s): Alina Alt (Applied Engineer, NVIDIA)<br />
Topic(s): Visualization, Application Design & Porting Techniques<br />
(Beginner)<br />
THURSDAY, MAY 17, 15:30 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0392 Large-Scale First Principle Pseudopotential DFT<br />
Calculations on <strong>GPU</strong> Clusters<br />
In this session, we will present a body of work on density<br />
functional theory (DFT) plane wave pseudopotential (PWP)<br />
calculations on <strong>GPU</strong> clusters. The <strong>GPU</strong> version is developed based<br />
on a CPU DFT-PWP code, PEtot, which can calculate ~1000 atoms<br />
on thousands of processors. Our tests indicate that the <strong>GPU</strong><br />
version can achieve a ~20 times speedup over the CPU code. A detailed<br />
analysis of the speed-up and of the scaling with the number of CPUs/<br />
<strong>GPU</strong>s (up to 256) will be presented. As far as we know, this is the<br />
first <strong>GPU</strong> DFT-PWP code scalable to a large number of CPUs/<strong>GPU</strong>s.<br />
Speaker(s): WeiLe Jia (Postgraduate Student, Supercomputing Center of<br />
CNIC, Chinese Academy of Sciences), Long Wang (Associate Professor,<br />
Supercomputing Center of CNIC, Chinese Academy of Sciences)<br />
Topic(s): Quantum Chemistry, General Interest (Advanced)<br />
THURSDAY, MAY 17, 16:00 (25 MINUTES)<br />
MARRIOTT BALLROOM 3<br />
S0038 Designing Killer CUDA Applications for X86,<br />
multi<strong>GPU</strong>, and CPU+<strong>GPU</strong><br />
CUDA redefined software development with 10 to 1000-times<br />
faster <strong>GPU</strong> applications. Now a single CUDA source tree can<br />
support the x86 mass market (no <strong>GPU</strong> required) and 1/3 billion<br />
CUDA-enabled <strong>GPU</strong>s. Multi<strong>GPU</strong> and CPU+<strong>GPU</strong> apps utilize all<br />
system resources. <strong>GPU</strong>direct, UVA, caches, prefetching, ILP<br />
(Instruction level Parallelism), automated analysis tools and more<br />
offer ease, capability, and performance. The overall impact on<br />
software investment, scalability, balance metrics, programming<br />
API, and lifecycle will be considered. Working real-time video and<br />
other examples from my book, “CUDA Application Design and<br />
Development” provide practical insight to enable augmented<br />
reality and your killer apps.<br />
Speaker(s): Robert Farber (Chief Scientist, BlackDog Endeavors, LLC)<br />
Topic(s): Machine Learning & AI, Supercomputing, Databases, Data<br />
Mining, Business Intelligence, Computer Vision (Intermediate)<br />
THURSDAY, MAY 17, 16:00 (50 MINUTES)<br />
ROOM N<br />
S0063 Robust Preconditioned Conjugate Gradient for the<br />
<strong>GPU</strong> and Parallel Implementations<br />
Get a closer look at how the parallel conjugate gradient (CG) method<br />
can gain an edge over its optimized CPU implementation. We have<br />
developed preconditioning techniques for CG that are suited to<br />
the <strong>GPU</strong> and match Block-IC in terms of numerical performance.<br />
We present our results for two-level preconditioned CG on the <strong>GPU</strong><br />
and also compare them with multi-CPU implementations. Our results<br />
show that for large problem sizes (1 million unknowns and above)<br />
it is possible to achieve speedups of an order of magnitude and higher<br />
for the two-level preconditioned CG method.<br />
Speaker(s): Rohit Gupta (PhD Student, Delft University of <strong>Technology</strong>)<br />
Topic(s): Computational Fluid Dynamics, Algorithms &<br />
Numerical Techniques (Intermediate)<br />
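For context on S0063, the sketch below shows a standard preconditioned conjugate gradient loop. It uses a simple Jacobi (diagonal) preconditioner as a stand-in; the two-level preconditioner developed by the speaker is not reproduced here, and the function name is hypothetical. On a GPU, the matrix-vector product, dot products and vector updates in each iteration are the parts one would expect to parallelize.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def jacobi_pcg(a, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a dense SPD matrix `a`, with M = diag(a).

    Returns (x, iterations).
    """
    inv_diag = [1.0 / a[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    if math.sqrt(dot(r, r)) < tol:
        return x, 0
    z = [d * ri for d, ri in zip(inv_diag, r)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


a = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
x, iters = jacobi_pcg(a, [3.0, 2.0, 3.0])
print([round(v, 6) for v in x], iters)  # x is close to [1.0, 1.0, 1.0]
```

Swapping in a stronger preconditioner only changes the two lines that compute `z`; that is the step where GPU-friendly preconditioners such as the ones described in the abstract would plug in.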
THURSDAY, MAY 17, 16:00 (25 MINUTES)<br />
ROOM K<br />
S0282 Leveraging NVIDIA <strong>GPU</strong>Direct on APEnet+ 3D<br />
Torus Cluster Interconnect<br />
APEnet+ is a novel cluster interconnect, based on a custom PCI<br />
card which features a PCI Express Gen2 X8 link and a reconfigurable<br />
HW component (FPGA). It supports a 3D Torus<br />
topology and has special acceleration features specifically<br />
developed for NVIDIA Fermi <strong>GPU</strong>s. An introduction to the basic<br />
features and the programming model of APEnet+ will be followed<br />
by a description of its performance on some numerical<br />
simulations, e.g. High Energy Physics simulations.<br />
Speaker(s): Davide Rossetti (Researcher, Italian National Institute for<br />
Nuclear Physics)<br />
Topic(s): Supercomputing, Computational Physics (Intermediate)<br />
THURSDAY, MAY 17, 16:00 (25 MINUTES)<br />
ROOM B<br />
S0428 Panini: A <strong>GPU</strong> Aware Array Class<br />
We present a new templated C++ class library, PANINI, for use in<br />
the development of large-scale scientific simulations in a<br />
heterogeneous computing environment. The key feature of this new<br />
library is a generic parallel array class built on advanced generic<br />
programming methodologies, where the details of parallelization are<br />
hidden inside the array class itself. This library will be used for a<br />
Poisson solver, advection-diffusion and other equations.<br />
Speaker(s): Priyanka Sah (Compute DevTech Engineer, NVIDIA),<br />
Santosh Ansumali (Faculty Fellow, Engineering Mechanics Unit,<br />
JNCASR, Bangalore)<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 16:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0815 CUDA Debugger Training on Windows<br />
Nsight offers a powerful CUDA debugging feature set<br />
that enables developers to quickly spot bugs. With the memory<br />
checker, advanced breakpoints and the variable warp watch panel, a<br />
developer can quickly isolate memory access errors, filter<br />
thousands of threads down to a specific thread and quickly spot<br />
abnormal variable value ranges. Through a set of comprehensive<br />
exercises, attendees will learn to use these features to<br />
become fully proficient at developing CUDA code.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 16:30 (50 MINUTES)<br />
ROOM M<br />
S0065 Satellite HUB Communication System <strong>GPU</strong> Based<br />
In the last few years, increasing <strong>GPU</strong> computational power has<br />
opened new perspectives in telecommunications through the SDR<br />
(software defined radio) approach. Some tasks, such as the one<br />
we had to deal with, leave no margin on<br />
execution speed because the radio signal must be analyzed in real time. We<br />
implemented the lowest layer in the protocol<br />
stack for a land mobile satellite communication system, and we<br />
were able to deliver a product with a reduced time to market<br />
compared with the traditional FPGA approach.<br />
Speaker(s): Gaetano Mendola (Principal Engineer, MBI srl), Francesco<br />
Basile (Software Engineer, MBI srl)<br />
Topic(s): General Interest (Intermediate)<br />
THURSDAY, MAY 17, 16:30 (25 MINUTES)<br />
ROOM B<br />
S0218 ASI Parallel Fortran: A General-Purpose Fortran<br />
to <strong>GPU</strong> Translator<br />
Over the last 3 years we have developed a general-purpose<br />
Fortran-to-<strong>GPU</strong> translator, ASI Parallel Fortran. The talk will<br />
detail its purpose, design layout and capabilities, and show how it<br />
is used and implemented. The use of ASI Parallel Fortran will be<br />
shown for large-scale CFD/CEM codes as well as other general-purpose<br />
Fortran codes.<br />
Speaker(s): Rainald Lohner (Professor, George Mason University)<br />
Topic(s): Development Tools & Libraries, Computational Fluid<br />
Dynamics, Computational Physics, Parallel <strong>Program</strong>ming Languages<br />
& Compilers (Advanced)<br />
THURSDAY, MAY 17, 16:30 (50 MINUTES)<br />
MARRIOTT BALLROOM 4<br />
S0220 Enabling Faster Material Science Modeling Using<br />
the Accelerated Quantum ESPRESSO<br />
The goal of this session is to present the advantages of mixing<br />
CUDA libraries and CUDA kernels to deliver a robust community<br />
package for material science modeling that fully exploits multicore<br />
systems equipped with <strong>GPU</strong>s. The Plane-Wave Self-<br />
Consistent Field (PWscf) code of the Quantum ESPRESSO suite is<br />
the focus of this work. During the session the main computation-dependent<br />
components, that also represent fundamental building<br />
blocks for many other quantum chemistry codes, will be<br />
discussed and analyzed. Subsequently an in-depth performance<br />
assessment of several realistic scientific cases will be presented,<br />
starting from single workstations to large clusters equipped with<br />
hundreds of <strong>GPU</strong>s.<br />
Speaker(s): Filippo Spiga (Computational Scientist, Irish Centre for<br />
High-End Computing)<br />
Topic(s): Quantum Chemistry, Supercomputing, Application Design &<br />
Porting Techniques (Intermediate)<br />
THURSDAY, MAY 17, 16:30 (25 MINUTES)<br />
ROOM L<br />
S0411 Artifact-Free Cloud-Based CAD Rendering<br />
Cloud computing for mechanical CAD provides centrally stored and<br />
synchronized models for concurrent engineering. For compactness,<br />
trimmed parametric NURBS surface representations are optimal<br />
for data transfer to client devices, which must evaluate and render<br />
models locally. Direct <strong>GPU</strong> rendering without pre-tessellation is an<br />
attractive solution in this context, both for speed and to preserve<br />
fidelity to the original geometry. However, existing data-parallel<br />
direct rendering approaches for NURBS suffer from rendering<br />
artifacts at trim boundaries. This talk proposes a solution to<br />
these rendering artifacts, which still prevent wide-scale<br />
adoption of such direct rendering algorithms for trimmed<br />
parametric models.<br />
Speaker(s): Sara McMains (Professor, UC Berkeley), Sushrut<br />
Pavanaskar (PhD Candidate, UC Berkeley)<br />
Topic(s): Algorithms & Numerical Techniques, Computer Graphics,<br />
Cloud Computing, Visualization (Beginner)<br />
THURSDAY, MAY 17, 17:00 (25 MINUTES)<br />
ROOM L<br />
S0074 Techniques for Designing GP<strong>GPU</strong> Games<br />
Learn how to develop faster and better games by applying<br />
GP<strong>GPU</strong> techniques to game logic. Games normally<br />
process most of their tasks on the CPU, using the <strong>GPU</strong> only for<br />
graphics processing. This session shows techniques for<br />
using GP<strong>GPU</strong> power to process all of the game logic,<br />
achieving speedups compared with both CPU and traditional <strong>GPU</strong><br />
models. The session also shows examples of these techniques<br />
in practice.<br />
Speaker(s): Mark E S Joselli (Researcher, UFF), Esteban Clua<br />
(Professor, UFF)<br />
Topic(s): Development Tools & Libraries (Intermediate)<br />
THURSDAY, MAY 17, 17:00 (50 MINUTES)<br />
ROOM NVIDIA NSIGHT LAB<br />
S0816 NVIDIA Nsight Lounge<br />
Come to the NVIDIA Nsight Lounge to meet the Nsight<br />
development team! Whether you would like a private meeting to<br />
discuss specific product features or test out your application with<br />
the latest version of Nsight, or you just want to hang out with the<br />
team after attending one of the exciting training sessions, the<br />
lounge is a great place to learn everything you ever wanted to know<br />
about the tool.<br />
Speaker(s): NVIDIA Developer Tools Team<br />
Topic(s): Development Tools & Libraries (Beginner)<br />
THURSDAY, MAY 17, 17:30 (25 MINUTES)<br />
ROOM M<br />
S0134 On the Integration of OpenCL into a Software<br />
Defined Radio<br />
We will present a software defined radio system that allows for<br />
heterogeneous processing using a host computer’s CPUs and<br />
<strong>GPU</strong>s, via dynamic runtime resource allocation provided by our<br />
Surfer framework and extensions to it using OpenCL. This system<br />
collects runtime statistics, including samples-per-second throughput<br />
for each signal processing block, data transfer latency between<br />
different processors, and the host CPU cores’ loads. Using this<br />
information, a supervisor can move computations between<br />
processors during runtime, without interrupting data processing.<br />
We will demonstrate an OFDM transmitter, graphing the system<br />
throughput and CPU loads while selecting where processing<br />
occurs for each block.<br />
Speaker(s): Michael Dickens (Graduate Student, University of Notre Dame)<br />
Topic(s): General Interest (Intermediate)<br />
ALGORITHMS & NUMERICAL TECHNIQUES<br />
AN01 - A Novel Parallel Realisation of the<br />
Element-by-Element FEM Technique<br />
The element-by-element (EbE) finite element<br />
method (FEM) is a long-known technique by which<br />
a conjugate gradient (CG) type iterative solution<br />
scheme can be entirely decomposed into<br />
computations on the element level, i.e., without<br />
assembling the global system matrix. In our<br />
implementation a CUDA capable <strong>GPU</strong> is utilized to<br />
perform the required element-wise computations in<br />
parallel. Since element matrices need not be stored,<br />
the memory requirement can be kept extremely low.<br />
This low-storage but computation-intensive<br />
technique is better suited to <strong>GPU</strong>s than techniques<br />
requiring the massive manipulation of large data<br />
sets, and it can handle millions of tetrahedra.<br />
Contact: Zsolt Badics (Tensor Research, LLC)<br />
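A minimal sketch of the element-by-element idea, assuming a 1D Poisson problem -u'' = 1 on [0, 1] with u(0) = u(1) = 0 and 2-node linear elements (the poster works with tetrahedra). The names `ebe_matvec` and `ebe_cg` are hypothetical. Inside conjugate gradient, each matrix-vector product is accumulated element by element from element matrices formed on the fly, so no global matrix is ever stored.

```python
def ebe_matvec(x, n_elems, h, fixed):
    """Apply the stiffness operator element by element, without assembling it."""
    y = [0.0] * len(x)
    for e in range(n_elems):
        i, j = e, e + 1                  # global node ids of element e
        d = (x[i] - x[j]) / h            # element stiffness (1/h)[[1,-1],[-1,1]] applied on the fly
        y[i] += d
        y[j] -= d
    for k in fixed:                      # Dirichlet nodes: identity rows keep the operator SPD
        y[k] = x[k]
    return y

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ebe_cg(f, n_elems, h, fixed, tol=1e-12, max_iter=1000):
    """Conjugate gradient whose only access to the matrix is ebe_matvec."""
    x = [0.0] * len(f)
    r = list(f)                          # residual f - A x for x = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = ebe_matvec(p, n_elems, h, fixed)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

n = 8
h = 1.0 / n
f = [0.0] + [h] * (n - 1) + [0.0]        # consistent load vector for a unit source
u = ebe_cg(f, n, h, fixed=(0, n))        # nodal values of x(1 - x) / 2
```

In a parallel version, one thread per element would compute `d`; the scatter into nodes shared by neighboring elements then needs atomic adds or element coloring.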
AN02 - ExaFMM: An Open Source Library for<br />
Fast Multipole Methods<br />
The fast multipole method (FMM) is a numerical<br />
engine used in many applications, from acoustics and<br />
electrostatics to fluid simulations, wave scattering<br />
and more. Despite its importance, there is a lack of<br />
open community code, which arguably has<br />
affected its wider adoption. It is also a difficult<br />
algorithm to understand and to program, making<br />
availability of open-source implementations even<br />
more desirable. We developed a novel treecode-<br />
FMM hybrid algorithm with auto-tuning<br />
capabilities. It is highly parallel and <strong>GPU</strong>-capable.<br />
Its usage in the simulation of homogeneous<br />
isotropic turbulence achieved 0.5 petaflop/s on<br />
2048 <strong>GPU</strong>s of the Tsubame system.<br />
Contact: Lorena Barba (Boston University)<br />
AN03 - Collatz-Type Conjectures on <strong>GPU</strong><br />
We verify two types of Collatz conjectures: on the<br />
set of rational numbers and the set of matrices<br />
modulo p, where p is prime. In both cases, the<br />
number of rational-number pairs and matrices to check<br />
grows exponentially. However, our algorithm<br />
exhibits simple parallel patterns that exploit<br />
<strong>GPU</strong>s efficiently. The preliminary results<br />
show that the conjecture holds in both cases for<br />
large sets.<br />
Contact: Peter Yoon (Trinity College)<br />
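The poster's rational-number and matrix-mod-p variants are not specified here, so this sketch shows the same embarrassingly parallel pattern on the classic integer Collatz map (n to n/2 if even, else 3n + 1). The function names and the step cap are hypothetical; each start value is checked independently, so a GPU version could assign one thread per value.

```python
def collatz_steps(n, max_steps=10_000):
    """Steps for a positive integer n to reach 1, or -1 if max_steps is exceeded."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return -1
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def verify_range(lo, hi, max_steps=10_000):
    """Check every start value in [lo, hi); return those that did not reach 1."""
    return [n for n in range(lo, hi) if collatz_steps(n, max_steps) < 0]

failures = verify_range(1, 10_000)
```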
AN04 - CUDA Implementation of Recurrence<br />
Equation Solvers Using P-scheme approach<br />
Recurrence equation solvers are used in many<br />
numerical and other general-purpose<br />
applications, but they are inherently sequential<br />
algorithms, so they are difficult to<br />
parallelize. We implement a parallel<br />
and scalable algorithm for solving recurrence<br />
equations on <strong>GPU</strong>s using CUDA and evaluate<br />
its effectiveness. The authors originally<br />
implemented the algorithm for MIMD parallel computers,<br />
and we adapt it to<br />
the GP<strong>GPU</strong> system by rearranging the array<br />
layouts. We also show how to determine<br />
the optimal number of threads per thread block and<br />
evaluate its validity.<br />
Contact: Akiyoshi Wakatani (Konan University)<br />
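The P-scheme itself is not described in this abstract, so the sketch below uses a different, well-known technique for the same problem: the first-order linear recurrence x[i] = a[i] * x[i-1] + b[i] is rewritten as a prefix scan over affine maps. Because composing affine maps is associative, the scan finishes in O(log n) parallel steps. All function names are hypothetical.

```python
def compose(first, second):
    """Compose affine maps x -> a*x + b: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def scan_recurrence(a, b, x_init):
    """x[0..n-1] for x[i] = a[i] * x[i-1] + b[i], with x[-1] = x_init."""
    maps = list(zip(a, b))
    n = len(maps)
    step = 1
    while step < n:
        # Hillis-Steele step: every i is independent (one GPU thread each)
        maps = [maps[i] if i < step else compose(maps[i - step], maps[i])
                for i in range(n)]
        step *= 2
    return [ai * x_init + bi for ai, bi in maps]

def sequential_recurrence(a, b, x_init):
    """Reference loop: the inherently sequential formulation."""
    out, x = [], x_init
    for ai, bi in zip(a, b):
        x = ai * x + bi
        out.append(x)
    return out
```

Integer coefficients are used in the checks so that both versions can be compared exactly; with floating-point data the two orders of evaluation may round differently.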
AN05 - Accelerating Symmetric Matrix-Vector<br />
Product on Fermi <strong>GPU</strong><br />
In this work we describe an<br />
optimized numerical kernel computing the<br />
symmetric matrix-vector product (Level 2 BLAS)<br />
on the latest NVIDIA TESLA <strong>GPU</strong> family, codenamed<br />
Fermi (C2070). Due to its inherently memory-bound<br />
nature, this kernel is one of the most<br />
critical operations in computing the tridiagonal<br />
form of a symmetric dense matrix, which is the<br />
preprocessing step toward calculating the<br />
eigenpairs. Using a novel design that addresses the<br />
irregular memory accesses by hiding latency and<br />
increasing bandwidth, our preliminary asymptotic<br />
results show speedups of up to 3.5x over existing<br />
numerical libraries.<br />
Contact: Hatem Ltaief (KAUST Supercomputing<br />
Laboratory)<br />
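A reference (CPU) sketch of the symmetric matrix-vector product y = A x that reads only the lower triangle, as BLAS SYMV does with its lower-triangle option. Each stored off-diagonal entry is used twice, which is why this memory-bound kernel can halve its memory traffic. This is not the poster's GPU kernel design, and `symv_lower` is a hypothetical name.

```python
def symv_lower(lower, x):
    """y = A x for symmetric A, where lower[i] holds A[i][0..i]."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] += lower[i][i] * x[i]
        for j in range(i):
            aij = lower[i][j]            # read once from memory ...
            y[i] += aij * x[j]           # ... used as A[i][j]
            y[j] += aij * x[i]           # ... and as A[j][i]
    return y

# A = [[4, 1, 2], [1, 3, 0], [2, 0, 5]] stored as its lower triangle
y = symv_lower([[4.0], [1.0, 3.0], [2.0, 0.0, 5.0]], [1.0, 2.0, 3.0])
```

On a GPU, the `y[j] += ...` updates from different rows collide, which is one source of the irregular memory accesses the abstract mentions.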
AN06 - Rapid Matrix Construction for Wavelet-<br />
Galerkin Schemes<br />
The wavelet Galerkin scheme is an efficient<br />
numerical method used to improve Boundary<br />
Element Methods and Finite Element Methods for<br />
solving partial differential equations, thanks to<br />
properties of the resulting matrices such as sparsity and<br />
good conditioning. Using CUDA C/C++ we have<br />
implemented the open-source C++ Library of<br />
Adaptive Wavelet Applications (LAWA) on the <strong>GPU</strong><br />
and achieved significant performance gains for<br />
matrix construction.<br />
Contact: Yuri Nesterenko (Dantec Dynamics A/S)<br />
AN07 - Big Number Modulo Exponentiations For<br />
Zero-Knowledge Protocols on <strong>GPU</strong>s<br />
In this work we implement parallel fixed-base big-number<br />
exponentiations on the <strong>GPU</strong>.<br />
For this task we develop a new implementation of<br />
the Montgomery multiplication algorithm. Although<br />
big-number exponentiations benefit from large<br />
caches such as those on a CPU, we show that the <strong>GPU</strong>’s lack of such caches can be<br />
compensated for by a high level of parallelization and<br />
an adaptation of the algorithms.<br />
Contact: Tobias Jeske (TU Hamburg-Harburg)<br />
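A sketch of modular exponentiation with Montgomery multiplication (left-to-right square-and-multiply), assuming an odd modulus n and R = 2**k > n. The REDC step replaces division by n with shifts and masks. It is not the poster's GPU implementation. A fixed-base variant could additionally precompute the Montgomery form of the base (and of its powers) once and share it across threads. Function names are hypothetical; `pow(n, -1, r)` needs Python 3.8+.

```python
def montgomery_params(n):
    """For odd n, pick R = 2**k > n and n' with n * n' == -1 (mod R)."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return k, r, n_prime

def redc(t, n, k, n_prime):
    """Return t * R^-1 mod n for 0 <= t < n * R, using only shifts and masks."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k                 # exact: t + m*n is divisible by R
    return u - n if u >= n else u

def mont_pow(base, exp, n):
    k, r, n_prime = montgomery_params(n)
    x = (base * r) % n                   # base in Montgomery form
    acc = r % n                          # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = redc(acc * acc, n, k, n_prime)
        if bit == "1":
            acc = redc(acc * x, n, k, n_prime)
    return redc(acc, n, k, n_prime)      # convert back out of Montgomery form
```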
AN08 - Tuning a Finite Difference Stencil<br />
Several ways of tuning a finite difference stencil<br />
computation are discussed. The combination of<br />
vectorization and a modified data layout, a cache<br />
aware algorithm, loop unrolling, parallelization<br />
and parameter tuning leads to optimized<br />
implementations reaching up to 90% of the peak<br />
performance of the floating-point pipelines on<br />
NVIDIA Fermi <strong>GPU</strong>s and on CPUs.<br />
Contact: Gerhard Zumbusch (University Jena)<br />
CONFERENCE GUIDE POSTER LISTINGS<br />
83
POSTER LISTINGS<br />
AN09 - Parallel <strong>Program</strong>ming on CPU-<strong>GPU</strong> for<br />
Solving Population Balance Equation<br />
The population balance equation (PBE) is one of<br />
those. The Dual Quadrature Method of Generalized<br />
Moments (DuQMoGeM) is a promising method for<br />
solving the PBE. The drawback of this methodology<br />
is the large computational cost associated with the<br />
adaptive numerical integration. Therefore, the<br />
adaptive cubature algorithm was implemented on a<br />
hybrid (MPI-CUDA) architecture to accelerate the<br />
DuQMoGeM. The maximum speedup was about<br />
48x using 4 <strong>GPU</strong>s on 4 nodes, and about<br />
40x using 2 <strong>GPU</strong>s on 1 node.<br />
Contact: Fabio Pereira dos Santos (Institute for<br />
Medical Physics)<br />
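The poster's adaptive cubature is multidimensional; this 1D adaptive Simpson rule is only a sketch of the same idea, with hypothetical names. Intervals are split recursively where the local error estimate is large, and the resulting independent subintervals are the units of work a parallel (MPI-CUDA) version can distribute.

```python
import math

def _simpson(a, b, fa, fm, fb):
    """Simpson's rule on [a, b] from the endpoint and midpoint values."""
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb)

def adaptive_simpson(f, a, b, tol=1e-10, max_depth=50):
    def refine(a, b, fa, fm, fb, whole, tol, depth):
        m = 0.5 * (a + b)
        flm, frm = f(0.5 * (a + m)), f(0.5 * (m + b))
        left = _simpson(a, m, fa, flm, fm)
        right = _simpson(m, b, fm, frm, fb)
        delta = left + right - whole            # local error estimate
        if depth <= 0 or abs(delta) <= 15.0 * tol:
            return left + right + delta / 15.0  # Richardson extrapolation
        # the two halves are independent tasks (candidates for parallel work)
        return (refine(a, m, fa, flm, fm, left, tol / 2.0, depth - 1)
                + refine(m, b, fm, frm, fb, right, tol / 2.0, depth - 1))

    fa, fm, fb = f(a), f(0.5 * (a + b)), f(b)
    return refine(a, b, fa, fm, fb, _simpson(a, b, fa, fm, fb), tol, max_depth)

area = adaptive_simpson(math.sin, 0.0, math.pi)  # exact value: 2
```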
AN10 - <strong>GPU</strong> Enabled Comparison Between<br />
Stochastic Decomposition Methods<br />
The scale of engineering problems has sharply<br />
increased over the last twenty years. The ability to<br />
learn the coupling (inter-dependence) structure of<br />
a problem during the solution process could lead<br />
to large reductions in the time to analyze complex<br />
problems. Such decomposition methods could<br />
also provide engineering insight into the<br />
fundamental physics driving problem solution.<br />
This work advances the current state of the art in<br />
engineering decomposition through the<br />
application of techniques originally developed<br />
within computer science and information theory.<br />
CUDA enabled a detailed comparison between the<br />
current practice of using Genetic Algorithms and a<br />
newly introduced method called MIMIC.<br />
Contact: Richard Otero (Los Alamos National Lab)<br />
AN11 - <strong>GPU</strong>-Accelerated 3-D Electromagnetic<br />
Particle-in-Cell Implementations in VORPAL<br />
We present recent developments in implementing<br />
3D <strong>GPU</strong>-accelerated electromagnetic particle-in-cell<br />
particle updates in the plasma physics<br />
framework VORPAL. The primary challenge in PIC<br />
methods on <strong>GPU</strong>s is thread contention during the<br />
current deposition stage: we resolve these thread<br />
contentions by sorting particles into ‘tiles’ of many<br />
cells each time step. Multiple thread blocks may<br />
be assigned to each tile, and each block<br />
accumulates the contribution from a moderate<br />
number of particles via an unsegmented<br />
Esirkepov 1st-order scheme. We achieve update<br />
times of 50 ns per-particle per-timestep for a<br />
variety of realistic self-consistent double-precision<br />
EM simulations.<br />
Contact: Keegan Amyx (Tech-X Corporation)<br />
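A sketch of sorting particles into tiles with a counting sort, assuming a 2D domain with non-negative coordinates and square tiles of `tile_size` cells (VORPAL works in 3D, and the exact sort it uses is not given here). After sorting, each tile's particles are contiguous, so a thread block can accumulate current for one tile without contending with other tiles. The function name and parameters are hypothetical.

```python
def sort_into_tiles(positions, cell_size, tile_size, tiles_x, tiles_y):
    """Return (order, offsets): particles of tile t are order[offsets[t]:offsets[t+1]]."""
    tile_w = cell_size * tile_size
    def tile_of(p):
        tx = min(int(p[0] // tile_w), tiles_x - 1)
        ty = min(int(p[1] // tile_w), tiles_y - 1)
        return ty * tiles_x + tx
    n_tiles = tiles_x * tiles_y
    keys = [tile_of(p) for p in positions]
    counts = [0] * n_tiles                   # histogram (atomic adds on a GPU)
    for k in keys:
        counts[k] += 1
    offsets = [0] * (n_tiles + 1)            # exclusive prefix sum of counts
    for t in range(n_tiles):
        offsets[t + 1] = offsets[t] + counts[t]
    cursor = offsets[:-1]                    # copy: next free slot per tile
    order = [0] * len(positions)
    for idx, k in enumerate(keys):           # stable scatter in original order
        order[cursor[k]] = idx
        cursor[k] += 1
    return order, offsets

pts = [(0.5, 0.5), (5.0, 0.2), (0.1, 5.5), (1.0, 1.0), (7.9, 7.9)]
order, offsets = sort_into_tiles(pts, cell_size=1.0, tile_size=4, tiles_x=2, tiles_y=2)
```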
AN12 - LU Factorization for 10,000s of Small<br />
Dense Matrices<br />
LU factorization is a “high-level” algebraic<br />
description of Gaussian elimination and is a<br />
fundamental operation in linear<br />
algebra. By implementing a register-heavy<br />
mapping in CUDA specifically for small matrices,<br />
speedups of more than 10x are achieved over<br />
an OpenMP-parallelized Intel MKL implementation<br />
running on a high-end quad-core CPU.<br />
Contact: Ian Wainwright (High Performance<br />
Consulting)<br />
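A sketch of batched LU factorization with partial pivoting, assuming nonsingular square matrices stored as lists of rows. The poster maps each small matrix onto GPU registers; here each matrix is simply factored independently, which is where the batch parallelism comes from. `lu_inplace` and `batched_lu` are hypothetical names.

```python
def lu_inplace(a):
    """Factor a in place so that P*A = L*U.

    L (unit diagonal, strictly below) and U (on and above the diagonal)
    share a's storage; perm[i] is the original row index of row i.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))  # pivot row
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                             # multiplier l_ik
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm

def batched_lu(matrices):
    """Factor each matrix independently (one thread or block per matrix on a GPU)."""
    return [lu_inplace(m) for m in matrices]
```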
AN13 - <strong>GPU</strong> Implementation of a Streaming<br />
Broadband RF Receiver<br />
An experimental radio broadcasting system<br />
spreads the signal with a PN code. To reconstruct<br />
the original signal, the receiver correlates the PN<br />
code with the sampled received signal, which requires<br />
a forward as well as an inverse<br />
Fast Fourier Transform, plus other conditioning<br />
and filtering operations. Since the target speed of<br />
the system is 625 megasamples per second,<br />
processed in segments of one megasample<br />
each, performing this computation on a standard<br />
CPU system is prohibitive and <strong>GPU</strong> processing is<br />
an attractive option. This project describes an<br />
initial CUDA implementation that runs at almost<br />
the target speed.<br />
Contact: Andrea Di Blas (University of California,<br />
Santa Cruz)<br />
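The forward-plus-inverse FFT step follows from the correlation theorem: corr = IFFT(FFT(received) * conj(FFT(code))). This is a sketch with a small recursive radix-2 FFT, assuming a real-valued PN code and a segment length that is a power of two. The function names and the 8-chip example code are hypothetical; a CUDA implementation would call a library FFT such as cuFFT instead.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circular_correlate(received, code):
    """corr[lag] = sum_i received[(i + lag) % n] * code[i], computed via FFTs."""
    n = len(received)
    r = fft([complex(v) for v in received])
    c = fft([complex(v) for v in code])
    prod = [ri * ci.conjugate() for ri, ci in zip(r, c)]
    return [v.real / n for v in fft(prod, inverse=True)]

code = [1, -1, 1, 1, -1, -1, 1, -1]
received = [code[(i - 3) % 8] for i in range(8)]   # code delayed by 3 samples
corr = circular_correlate(received, code)          # peak at lag 3
```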
AN14 - Efficient Algebraic Multigrid Methods<br />
on <strong>GPU</strong>s<br />
Algebraic multigrid methods for large, sparse<br />
linear systems are a necessity in many<br />
computational simulations, yet parallel algorithms<br />
for such solvers are generally decomposed into<br />
coarse-grained tasks suitable for distributed<br />
computers with traditional processing cores. We<br />
develop a parallel algebraic multigrid method<br />
which exposes substantial fine-grained<br />
parallelism in both the construction of the<br />
multigrid hierarchy as well as the cycling or solve<br />
stage. The resulting solver achieves an average<br />
speedup of 1.8x in the setup phase and 5.7x in the<br />
cycling phase when compared to a representative<br />
CPU implementation.<br />
Contact: Steven Dalton (University of Illinois at<br />
Urbana-Champaign)<br />
APPLICATION DESIGN & PORTING<br />
TECHNIQUES<br />
AP01 - Debugging Floating Point<br />
Implementations on <strong>GPU</strong>s<br />
To debug <strong>GPU</strong> code it is important to understand<br />
the differences between CPU and <strong>GPU</strong><br />
implementations. The differences arise from<br />
floating-point (FP) differences and from casting from<br />
floating point to fixed point. FP differences arise<br />
from the non-associativity of FP arithmetic, differences in<br />
instruction implementation, and choices made by<br />
the compiler. We analyzed medical image<br />
reconstruction code for breast imaging and<br />
showed that <strong>GPU</strong> and CPU code could be made to<br />
produce identical results. We also analyze the<br />
performance implications of choosing different<br />
implementation options on the <strong>GPU</strong> and CPU to<br />
make the codes match.<br />
Contact: Miriam Leeser (Northeastern University)
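A small illustration of how the lack of associativity in FP addition can make CPU and GPU results diverge. A sequential loop (typical CPU code) and a strided tree reduction (the classic CUDA shared-memory reduction pattern) add the same values in different orders; `math.fsum` gives the correctly rounded sum for comparison. The input values are chosen only to make the effect visible, and the function names are hypothetical.

```python
import math

def sequential_sum(values):
    total = 0.0
    for v in values:
        total += v
    return total

def strided_tree_sum(values):
    """Add element i + half into element i, halving each round (zero-padded to a power of two)."""
    vals = [float(v) for v in values]
    if not vals:
        return 0.0
    size = 1
    while size < len(vals):
        size *= 2
    vals += [0.0] * (size - len(vals))
    half = size // 2
    while half >= 1:
        for i in range(half):
            vals[i] += vals[i + half]
        half //= 2
    return vals[0]

data = [1e100, 1.0, -1e100, 1.0]
cpu_style = sequential_sum(data)      # 1e100 + 1.0 rounds back to 1e100
gpu_style = strided_tree_sum(data)    # the large terms cancel first
exact = math.fsum(data)
```

Here `cpu_style` is 1.0 while `gpu_style` and `exact` are 2.0. Making both sides produce identical results means fixing the summation order (or using exact summation) on both.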
AP02 - KILO Transactional Memory for <strong>GPU</strong><br />
<strong>GPU</strong>s are designed to efficiently execute 1000s<br />
of concurrent threads on multiple SIMT cores to<br />
hide long-latency operations. Currently, threads in<br />
different CUDA blocks can only communicate via<br />
global memory accesses, and programmers have<br />
to consider data races. Although fine-grained<br />
locks can be constructed using 32-/64-bit word<br />
atomic operations in recent <strong>GPU</strong>s, operations<br />
involving multiple locks can have deadlocks. We<br />
propose to solve these problems by extending<br />
<strong>GPU</strong>s to support transactional memory. Some of<br />
the major challenges are to support 1000s of<br />
concurrent transactions, to commit nonconflicting<br />
transactions in parallel, and to<br />
integrate with stack-based SIMT execution.<br />
Contact: Wilson Wai Lun Fung (University of<br />
British Columbia)<br />
AP03 - CUDA-Based <strong>GPU</strong> Computing Framework<br />
for GNU Octave<br />
This poster presents the design of a CUDA-<strong>GPU</strong><br />
based parallel processing framework for GNU<br />
Octave. Octave is a high-level interpreted<br />
language, primarily intended for numerical<br />
computations. As an open-source<br />
alternative to MATLAB, GNU Octave is widely used in academic<br />
and research institutions. The <strong>GPU</strong> framework<br />
allows Octave users to accelerate their software<br />
written in Octave’s high-level ‘M’ language on <strong>GPU</strong>s<br />
with minimal code modifications. To my<br />
knowledge, this is the first attempt to build a <strong>GPU</strong><br />
framework for Octave, as opposed to previous<br />
attempts that provided <strong>GPU</strong> variants of a set of<br />
Octave functions.<br />
Contact: John Melonakos (AccelerEyes)<br />
ASTRONOMY & ASTROPHYSICS<br />
AA01 - Adaptive Beam-Forming for Radio<br />
Astronomy on <strong>GPU</strong>s<br />
With the advent of a new breed of telescopes such as<br />
the Low Frequency Array (LOFAR), which rely on<br />
software to process the large data sets<br />
they generate, the software needs to<br />
run as fast as possible in order to<br />
process those data sets in a reasonable time.<br />
In this session we describe how we have used the<br />
computing power of <strong>GPU</strong>s to improve the<br />
performance of standard radio imaging<br />
techniques, as well as how this computational<br />
power is useful for creating a new generation of<br />
radio imaging algorithms.<br />
Contact: Vamsi Krishna Veligatla (University<br />
of Groningen)<br />
AA02 - Accelerating Real-Time Processing of the<br />
ATST Adaptive Optics System<br />
The real-time processing of the four-meter<br />
Advanced <strong>Technology</strong> Solar Telescope (ATST)<br />
adaptive optics (AO) system with approximately<br />
1750 sub-apertures and 1900 actuators requires<br />
massively parallel processing.<br />
This parallel processing is harnessed by<br />
adding hardware accelerators such as<br />
Graphics Processing Units (<strong>GPU</strong>s). We investigate<br />
the hybrid data processing architecture of the<br />
Shack-Hartmann correlation and wavefront<br />
reconstruction using FPGAs and <strong>GPU</strong>s. The ATST<br />
AO algorithm is implemented, benchmarked on<br />
the FPGA-<strong>GPU</strong> system and compared with the<br />
existing legacy Digital Signal Processing (DSP)<br />
based hardware system.<br />
Contact: Vivek Venugopal (United Technologies<br />
Research Center)<br />
AA03 - Cosmological Calculations on the <strong>GPU</strong><br />
Cosmological measurements often involve the<br />
calculation of non-trivial quantities over<br />
increasingly large datasets. The next generation of<br />
survey telescopes will yield information for billions<br />
of galaxies. The scale of the datasets and the type<br />
of calculations involved make them ideal candidates<br />
for the <strong>GPU</strong>. We present two cosmological<br />
measurements, and describe the implementation<br />
and the improvements obtained with the <strong>GPU</strong>.<br />
Contact: Deborah Bard (SLAC National Accelerator<br />
Laboratory)<br />
AA04 - Fast Cross-Matching of Astronomical<br />
Catalogs on <strong>GPU</strong>s<br />
We present a method of cross-matching objects of<br />
large astronomical catalogs, over 150 million<br />
objects, in under 4 minutes. We use up to 6<br />
NVIDIA C2050 <strong>GPU</strong>s and have achieved a speedup of over 40x<br />
versus conventional methods.<br />
Contact: Matthias Lee (Johns Hopkins University)<br />
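A sketch of grid-based cross-matching, assuming separations small enough that (RA, Dec) in degrees can be treated as flat 2D coordinates. A production cross-match would use spherical distances, scale RA by cos(Dec) and handle RA wrap-around; the poster's actual method is not described here. Catalog B is bucketed into cells whose side equals the match radius, so each object in A is only compared against the 3x3 neighboring cells, and every object in A is independent (one GPU thread each). `cross_match` is a hypothetical name.

```python
import math
from collections import defaultdict

def cross_match(cat_a, cat_b, radius):
    """Return (i, j, distance) for the nearest cat_b object within radius of each cat_a[i]."""
    grid = defaultdict(list)
    for j, (x, y) in enumerate(cat_b):
        grid[(math.floor(x / radius), math.floor(y / radius))].append(j)
    matches = []
    for i, (x, y) in enumerate(cat_a):
        cx, cy = math.floor(x / radius), math.floor(y / radius)
        best = None
        for dx in (-1, 0, 1):                    # any match lies in a neighboring cell
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    d = math.hypot(x - cat_b[j][0], y - cat_b[j][1])
                    if d <= radius and (best is None or d < best[1]):
                        best = (j, d)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches

m = cross_match([(10.0, 20.0), (50.0, 50.0)], [(10.0001, 20.0001), (30.0, 30.0)], radius=0.001)
```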
AUDIO, IMAGE & VIDEO PROCESSING<br />
AV01 - Rapid Training of Acoustic Models Using<br />
<strong>GPU</strong>s<br />
Robust and accurate speech recognition systems<br />
can only be realized with adequately trained<br />
acoustic models. For common languages,<br />
state-of-the-art systems are now trained on<br />
thousands of hours of speech data, which can take<br />
weeks even with a large cluster of machines. To<br />
overcome this development bottleneck, we<br />
propose a new framework for rapid training of<br />
acoustic models using highly parallel <strong>GPU</strong>s. With<br />
a single NVIDIA GTX580 <strong>GPU</strong>, our proposed<br />
approach is shown to be 51x faster than a<br />
sequential CPU implementation, enabling a<br />
moderately sized acoustic model to be trained on<br />
1000-hour speech data in just over 9 hours.<br />
Contact: Jike Chong (Carnegie Mellon University)<br />
AV02 - 2 Million Pixel Experiment<br />
This experimental application has been created as<br />
a piece of computational art using visual computing<br />
technologies. It maps a high-definition video source<br />
(1080p) into 3D space. The pixel transformation is<br />
accelerated by a CUDA kernel to achieve real-time<br />
accuracy. Besides producing visual effects in<br />
the arts, this method may be used for video quality<br />
checking at the pixel level.<br />
Contact: Philipp Drieger (Noumentalia.de - Digital<br />
Arts / KU Eichstätt-Ingolstadt)<br />
AV03 - Speeding Up Camera Sabotage Detection<br />
on CUDA<br />
Camera Sabotage Detection (CSD) algorithms,<br />
namely Camera Moved Detection, Camera Out of<br />
Focus Detection and Camera Covered Detection,<br />
are used to detect tampering attempts on<br />
surveillance cameras. CSD algorithms are required<br />
to run on a large number of cameras in real time,<br />
bringing a high computational load to the video<br />
analytics systems. In this work, the CSD algorithms<br />
are accelerated by using CUDA. The overall system<br />
test results show that parallelization on the <strong>GPU</strong> makes<br />
the system 18 times faster than its CPU<br />
counterpart and up to 400 cameras can be<br />
supported in real time on a GTX 470.<br />
Contact: Alptekin Temizel (Middle East Technical<br />
University)<br />
AV04 - Remote Sensing on <strong>GPU</strong>: A Case Study<br />
Satellite images have become widely available; as<br />
a result, an increasing number of<br />
commercial applications utilize these images.<br />
Satellites provide data in different wavelengths,<br />
and their images have higher resolution and larger data<br />
size compared to typical images. Running complex<br />
algorithms on satellite images for large data<br />
volumes is highly time-consuming on CPUs and<br />
can be sped up using <strong>GPU</strong>s. In this paper,<br />
shadow detection and vegetation<br />
detection algorithms are investigated and their<br />
performance on the <strong>GPU</strong> and CPU is compared.<br />
Results show that a speedup of up to 10.2x can<br />
be achieved using the <strong>GPU</strong>.<br />
Contact: Alptekin Temizel (Middle East Technical<br />
University)<br />
AV05 - Finite Difference-Based Sound Synthesis<br />
Using <strong>GPU</strong>s<br />
Finite Difference (FD) methods can be the basis<br />
for physics-based music instrument models that<br />
generate realistic audio output. However, such<br />
methods are compute-intensive; large simulations<br />
cannot run in real time on current CPUs. In this<br />
poster, we describe the current state of our<br />
implementation of a real-time sound synthesizer<br />
using an FD-based simulation of a two-dimensional<br />
membrane executed on <strong>GPU</strong>s. We<br />
demonstrate that it is possible to use this method<br />
to create a usable real-time audio synthesizer.<br />
Contact: Marc Sosnick (San Francisco State<br />
University)<br />
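A sketch of an explicit finite-difference update for a 2D membrane (the wave equation) with clamped edges, the kind of stencil a GPU synthesizer could run with one thread per grid point. The Courant parameter lam2 = (c*dt/h)**2 must satisfy lam2 <= 0.5 for this scheme to be stable. The damping factor, grid size, strike position and pickup point are illustrative assumptions, not the poster's parameters, and the function names are hypothetical.

```python
def membrane_step(u, u_prev, lam2, damping=0.999):
    """One leapfrog step of the damped 2D wave equation; edges stay clamped at 0."""
    n, m = len(u), len(u[0])
    u_next = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):                    # one GPU thread per interior point
        for j in range(1, m - 1):
            lap = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            u_next[i][j] = damping * (2.0 * u[i][j] - u_prev[i][j] + lam2 * lap)
    return u_next

def synthesize(n=16, steps=200, lam2=0.25, pickup=(5, 7)):
    """Strike the center of an n x n membrane and record one sample per time step."""
    u_prev = [[0.0] * n for _ in range(n)]
    u = [[0.0] * n for _ in range(n)]
    u[n // 2][n // 2] = 1.0                      # impulse excitation
    samples = []
    for _ in range(steps):
        u, u_prev = membrane_step(u, u_prev, lam2), u
        samples.append(u[pickup[0]][pickup[1]])
    return samples

samples = synthesize()
```

For real-time audio, one time step corresponds to one output sample, so the whole grid update has to finish within one sample period; that is the constraint that motivates running the stencil on a GPU.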
AV06 - Parallelization of Hough Transform for<br />
Circles Using CUDA<br />
Hough Transform (HT) is a well-known technique<br />
used for detection of parametric shapes in image<br />
processing. However, various optimizations are<br />
necessary in its implementation due to large<br />
memory and computational requirements. In this<br />
paper, we consider the parallelization of the<br />
Hough Transform for circles. Several different<br />
implementation approaches of the algorithm are<br />
compared in CUDA. Results show that a speedup of up to 360<br />
times can be achieved compared to the<br />
CPU version, enabling real-time applications.<br />
Contact: Alptekin Temizel (Middle East Technical<br />
University)<br />
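A sketch of the voting stage of a Hough transform for circles, assuming a single known radius and 1-degree angle steps. Every edge pixel votes once for each candidate center lying at that radius from it, and accumulator peaks mark circle centers. Each edge pixel (or each pixel-angle pair) is independent, which is what a CUDA version parallelizes, typically with atomic adds into an accumulator array. `hough_circle_votes` is a hypothetical name.

```python
import math
from collections import Counter

def hough_circle_votes(edge_points, radius, n_angles=360):
    """Accumulator mapping candidate center (x, y) to its number of votes."""
    acc = Counter()
    angles = [2.0 * math.pi * t / n_angles for t in range(n_angles)]
    for x, y in edge_points:
        centers = {(round(x - radius * math.cos(t)), round(y - radius * math.sin(t)))
                   for t in angles}            # set: one vote per pixel per cell
        acc.update(centers)
    return acc

# integer points on a circle of radius 5 centered at (10, 10)
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5), (3, 4), (3, -4),
           (-3, 4), (-3, -4), (4, 3), (4, -3), (-4, 3), (-4, -3)]
edges = [(10 + dx, 10 + dy) for dx, dy in offsets]
acc = hough_circle_votes(edges, radius=5)
```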
AV07 - Accelerating an Imaging Spectroscopy<br />
Algorithm Using <strong>GPU</strong>s<br />
Graphics Processing Units (<strong>GPU</strong>s) have proven to<br />
be effective at accelerating a range of scientific<br />
applications. As data needs increase, and more<br />
complex data analysis methods are used, the<br />
processing requirements for solving scientific<br />
problems also increase. The parallel processing<br />
power of <strong>GPU</strong>s can be harnessed and used<br />
alongside multi-core CPUs to address this. As an<br />
example, many problems require solving<br />
optimization problems of multiple variables across<br />
large arrays of data. By utilizing modern<br />
optimization techniques and combining them with<br />
the computational throughput of a CPU-<strong>GPU</strong><br />
computing platform, we can greatly decrease the<br />
processing time required to solve these problems.<br />
Contact: Matthew Sellitto (LLC IntroVision)<br />
AV08 - CUVILib - <strong>GPU</strong> Accelerated Vision &<br />
Imaging Library<br />
Image processing algorithms are used in a variety<br />
of different domains, from surveillance to medicine<br />
to industry. CUVI (CUDA Vision and Imaging Library)<br />
provides <strong>GPU</strong> accelerated Vision and Imaging<br />
functionality with plug-and-play ease of use, simple<br />
yet powerful interface and support for both NVIDIA<br />
and AMD <strong>GPU</strong>s. With over 1000 users of the Beta<br />
version, CUVI has quickly grown into a mature solution<br />
of choice when it comes to delivering real-time<br />
performance for your Imaging/Vision applications<br />
and software-frameworks.<br />
Contact: Salman Ul Haq (TunaCode)<br />
AV09 - Implementation of Raptor Code on <strong>GPU</strong><br />
Raptor codes are an improvement on<br />
LT codes; they perform very close to<br />
the Shannon channel limit and provide linear-time<br />
encoding and decoding. Raptor codes have been chosen<br />
for the forward error correction (FEC) scheme in<br />
the 3GPP and DVB-H standards. We implement<br />
Raptor codes on the <strong>GPU</strong> for the purpose of<br />
processing large block and symbol sizes<br />
effectively and efficiently. Our <strong>GPU</strong> decoder<br />
achieves up to a 40x speedup over sequential<br />
CPU decoding.<br />
Contact: Linjia Hu (Michigan Technological<br />
University)<br />
AV10 - Real-Time Wind Velocity Estimation from<br />
Aerosol Lidar Data Using <strong>GPU</strong>s<br />
The REAL is an atmospheric light detection and<br />
ranging (LIDAR) system. It produces near-horizontal<br />
and vertical cross-sectional images of<br />
the lower atmosphere. The images reveal the<br />
spatial distribution of atmospheric aerosol<br />
(particulate matter). By applying motion<br />
estimation algorithms to image sequences,<br />
two-dimensional vector wind fields can be<br />
determined. We will explore the use of <strong>GPU</strong><br />
computing in the real-time computation of wind<br />
vector fields.<br />
Contact: Chris Mauzey (Johns Hopkins University,<br />
Applied Physics Laboratory)<br />
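A sketch of exhaustive block matching between two frames: for one block, find the integer displacement (dy, dx) that minimizes the sum of absolute differences (SAD). Repeating this for every block yields a 2D motion (wind) vector field, and every block and candidate displacement can be evaluated independently on a GPU. The motion estimator actually used for REAL may differ (for example, a correlation-based one); `block_displacement` and its parameters are hypothetical.

```python
def block_displacement(prev, curr, top, left, size, search):
    """Best (dy, dx) within +/-search such that the block of prev at (top, left) matches curr."""
    rows, cols = len(curr), len(curr[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= rows and
                    0 <= left + dx and left + dx + size <= cols):
                continue                       # candidate block falls outside the frame
            sad = sum(abs(prev[top + i][left + j] - curr[top + dy + i][left + dx + j])
                      for i in range(size) for j in range(size))
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best

# synthetic frames: the second is the first moved down 2 rows and left 1 column
prev = [[i * 20 + j for j in range(16)] for i in range(16)]
curr = [[prev[i - 2][j + 1] if 0 <= i - 2 < 16 and 0 <= j + 1 < 16 else -1
         for j in range(16)] for i in range(16)]
shift = block_displacement(prev, curr, top=6, left=6, size=4, search=3)
```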
AV11 - <strong>GPU</strong> Based Feature Extraction<br />
Implementation<br />
In this poster, we introduce an efficient parallel<br />
implementation of Mel-frequency Cepstral<br />
Coefficient (MFCC)-based feature extraction and<br />
describe the optimizations required for effective<br />
throughput on many-core Graphics Processing<br />
Units (<strong>GPU</strong>s). We demonstrate that the<br />
feature extraction process in automatic speech<br />
recognition is well suited to <strong>GPU</strong>s and that a<br />
substantial reduction in computation time can be<br />
obtained by performing feature extraction on<br />
these platforms. Using a single NVIDIA GTX460<br />
<strong>GPU</strong>, our proposed approach is shown to be<br />
approximately 25x faster than a sequential CPU<br />
implementation, enabling feature extraction to be<br />
performed in real-time.<br />
Contact: Haofeng Kou (SCU)<br />
BIOINFORMATICS<br />
BI01 - Acceleration of Complex Network Analysis<br />
Complex networks are of great scientific importance<br />
today. Their universal characteristics<br />
can be applied across scientific<br />
fields, such as network pharmacology. The execution<br />
time of the algorithms used needs to be reduced on a large scale. The<br />
breakthrough is the use of <strong>GPU</strong>s and parallel<br />
computing to accelerate the whole<br />
process. Transforming common algorithms<br />
such as matrix multiplication into a parallel model has<br />
shown large acceleration, which is a promising<br />
point for the field of network analysis.<br />
Contact: Athanasios Grivas (Newcastle University)<br />
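As one illustration of how matrix multiplication underpins network analysis, the number of triangles in an undirected graph with adjacency matrix A equals trace(A³)/6, since each triangle yields six closed walks of length 3. The sequential sketch below uses hypothetical helper names and is not the poster's specific algorithm; a GPU version would replace `matmul` with a parallel (for example, tiled or cuBLAS-based) multiplication.

```python
def matmul(a, b):
    """Dense matrix product; the part a GPU implementation would parallelize."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def count_triangles(adj):
    """Triangles in an undirected simple graph: trace(A^3) / 6."""
    a3 = matmul(matmul(adj, adj), adj)
    return sum(a3[i][i] for i in range(len(adj))) // 6
```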
BI02 - GHOSTM: A <strong>GPU</strong>-Accelerated Homology<br />
Search Tool for Metagenomics<br />
A vast number of sensitive homology searches are<br />
required to map sequence data to known<br />
protein sequence databases in metagenomic<br />
analysis. However, fast search tools such as BLAT<br />
are not sensitive enough for metagenomic<br />
analysis, so a sensitive and efficient homology<br />
search tool is needed.<br />
We have developed a <strong>GPU</strong>-optimized algorithm for<br />
performing sensitive sequence homology<br />
searches and implemented it as the<br />
<strong>GPU</strong>-Accelerated Homology Search Tool for<br />
Metagenomics (GHOSTM), which achieves faster<br />
calculation speeds and higher search accuracy than<br />
the BLAT program. Our results indicate that GHOSTM<br />
offers a potentially cost-efficient solution to the<br />
increasingly difficult computational analysis of<br />
metagenomic data.<br />
Contact: Shuji Suzuki (Tokyo Institute of <strong>Technology</strong>)<br />
CLIMATE & WEATHER MODELING<br />
CW01 - CUDA/JAVA Model for Gas Line-by-Line<br />
Absorption of Atmospheric Radiation<br />
The potential of graphics processing units (<strong>GPU</strong>) to<br />
speed up the calculation of radiative energy<br />
absorption by atmospheric gases is presented. Gas<br />
absorption calculations are needed at millions of<br />
electromagnetic wavelengths to accurately<br />
depict the Earth’s incoming and outgoing<br />
radiative energies. The CUDA/<strong>GPU</strong> portion obtains<br />
the gases’ Voigt lineshapes, whereas the Java/CPU<br />
portion performs efficient I/O tasks on the large<br />
HITRAN database of molecular gas parameters. A<br />
modular combination of the lower-level CUDA<br />
algorithms and the higher-level Java language<br />
results in an interface that is accessible to end users<br />
who are not <strong>GPU</strong> experts.<br />
Contact: William Godoy (NASA Langley Research<br />
Center)<br />
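The Voigt lineshape is the convolution of a Gaussian (Doppler) profile with a Lorentzian (pressure-broadened) profile. Production line-by-line codes usually evaluate it through the Faddeeva function. The minimal sketch below instead integrates the convolution numerically with the midpoint rule, which is slower but easy to check. It assumes `sigma` is the Gaussian standard deviation and `gamma` the Lorentzian half-width at half-maximum; the function names are illustrative, not the poster's API.

```python
import math

def gaussian(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def lorentzian(x, gamma):
    # gamma is the half-width at half-maximum (assumed convention).
    return gamma / (math.pi * (x * x + gamma * gamma))

def voigt(x, sigma, gamma, n_steps=4000, width=8.0):
    """V(x) = integral of G(t) * L(x - t) dt, midpoint rule over
    t in [-width*sigma, width*sigma]; Gaussian tails beyond that are negligible."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        t = lo + (i + 0.5) * h
        total += gaussian(t, sigma) * lorentzian(x - t, gamma)
    return total * h

def spectrum(wavenumbers, center, sigma, gamma):
    # Each spectral point is independent: one GPU thread per wavenumber.
    return [voigt(nu - center, sigma, gamma) for nu in wavenumbers]
```

The independence of the spectral points in `spectrum` is the property that lets the lineshape evaluation move to the GPU while the CPU handles database I/O.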
CW02 - Heat Transfer Ray Tracing with OptiX<br />
QUIC Radiant is part of a suite of <strong>GPU</strong>-assisted<br />
tools developed by our research group that aim to<br />
improve understanding of how the environment and<br />
urban form interact. Our hypothesis is that urban<br />
structures exist that can minimize energy use<br />
while also minimizing air pollution exposure. Our<br />
efforts investigate the complex interactions of<br />
various types of urban structures by developing<br />
design strategies for optimizing urban form under<br />
a variety of constraints.<br />
Contact: Scot Halverson (University of<br />
Minnesota Duluth)<br />
COMPUTATIONAL FLUID DYNAMICS<br />
CD01 - Coalesced Simulation of Incompressible<br />
Navier-Stokes Equations Over Airfoil Using <strong>GPU</strong><br />
This work presents a <strong>GPU</strong>-based implementation of<br />
finite-difference time-domain (FDTD) methods<br />
for solving unsteady incompressible viscous flow<br />
over an airfoil using the stream function-vorticity<br />
formulation on a structured grid. For<br />
large-scale simulations, FDTD methods can be<br />
computationally expensive and require a<br />
considerable amount of time to solve on CPUs. In<br />
contrast, modern GP<strong>GPU</strong>s are designed to<br />
accelerate many independent calculations thanks<br />
to their parallel architecture. Our FDTD<br />
implementation achieves efficiently coalesced global<br />
memory access at 66.67% occupancy.<br />
The <strong>GPU</strong>-based flow solver is over 28 times<br />
faster than a sequential CPU version.<br />
Contact: Iman Gohari (University of Tehran)<br />
CD02 - Parallel Computations on <strong>GPU</strong> in 3D<br />
Vortex Particle Method<br />
This poster briefly presents the Vortex-in-Cell (VIC)<br />
method for solving the 3D fluid equations and its<br />
parallel implementation on the many-core<br />
architecture of graphics cards. One of the most<br />
important components of the VIC algorithm is the<br />
solution of the Poisson equation, for which multigrid<br />
and full multigrid methods were chosen. A 12x<br />
speedup was obtained compared with the fast direct<br />
solution algorithm on a single processor.<br />
With the VIC method fully implemented on the <strong>GPU</strong>,<br />
a 46x speedup was obtained. Test results for<br />
the method are also shown.<br />
Contact: Andrzej Kosior (Wroclaw University<br />
of <strong>Technology</strong>)<br />
CD03 - Reynolds Equation Solver on GP<strong>GPU</strong> for<br />
Gas Film Lubrication Problem<br />
In the present study, we implemented a Reynolds<br />
equation solver on GP<strong>GPU</strong> for gas film lubrication<br />
problem. By using a red-black Gauss-Seidel<br />
iteration scheme, we achieved a 106x speedup for the<br />
core calculation part and an overall 12x speedup<br />
(double precision), relative to 1 core of AMD Llano<br />
A8-3850. A small serial part becomes a critical<br />
bottleneck and degrades overall speedup as the<br />
problem size gets bigger and <strong>GPU</strong> efficiency<br />
increases. Future work will include the<br />
development of a general gas film analysis solver<br />
and of a parallelization scheme for the<br />
remaining serial parts, such as integration and<br />
error checking.<br />
Contact: Ji-Hoon Kang (KISTI)<br />
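In red-black Gauss-Seidel, grid points are colored like a checkerboard. All red points are updated at once using only black neighbors, then all black points using the freshly updated red values, so each half-sweep is fully data-parallel. The sketch below applies the scheme to a model 2D Poisson problem ∇²p = f with fixed (Dirichlet) boundary values, not to the Reynolds equation itself, whose coefficients depend on film thickness and pressure. The function name and the list-of-lists grid layout are assumptions.

```python
def red_black_gauss_seidel(p, f, h, sweeps):
    """Model-problem sketch: solve the 5-point discretization of laplacian(p) = f
    in place. p includes boundary values (held fixed); f has the same shape;
    h is the uniform grid spacing."""
    ny, nx = len(p), len(p[0])
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = red, 1 = black
            # All points of one color are independent: one GPU thread each.
            for j in range(1, ny - 1):
                for i in range(1, nx - 1):
                    if (i + j) % 2 == color:
                        p[j][i] = 0.25 * (p[j][i - 1] + p[j][i + 1] +
                                          p[j - 1][i] + p[j + 1][i] - h * h * f[j][i])
    return p
```

Because red points depend only on black neighbors (and vice versa), no update within a half-sweep reads a value written in the same half-sweep, which avoids the race conditions a naive parallel Gauss-Seidel would have.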
CD04 - Digital Core Analysis with <strong>GPU</strong><br />
Application<br />
The market for computed tomography (CT)-based<br />
calculation of core characteristics is one of the<br />
fastest-growing markets in oilfield services. A<br />
multi-<strong>GPU</strong> system processes raw CT-scanner data<br />
more cheaply and efficiently than CPU clusters.<br />
Key core parameters such as porosity, absolute<br />
permeability and acoustic properties were<br />
calculated using MPI and CUDA technologies.<br />
Special attention was paid to optimizing memory<br />
usage and computational algorithms. The<br />
algorithms were tested on the “Lomonosov”<br />
supercomputer and showed close to linear<br />
speedup with the number of <strong>GPU</strong> devices in use.<br />
Contact: Dmitry Senin (University of Illinois at<br />
Urbana-Champaign)<br />
CD05 - Immersed Boundary Turbulent Flow<br />
Simulations on <strong>GPU</strong> Clusters<br />
A survey of recent literature reveals that <strong>GPU</strong><br />
speedup factors are generally much higher for<br />
structured Cartesian mesh methods than<br />
unstructured mesh methods. However, Cartesian<br />
mesh methods do not readily extend to complex<br />
geometries. To this end, immersed boundary (IB)<br />
methods extend Cartesian methods to complex<br />
geometry flow problems by imposing the boundary<br />
conditions on the equations as a forcing term. In<br />
this study we extend our multi-<strong>GPU</strong><br />
parallel flow solver, GIN3D, to complex-geometry<br />
turbulent flow problems by implementing the IB<br />
method along with the Lagrangian dynamic<br />
large-eddy simulation (LES) technique, which is<br />
suitable for arbitrarily complex shapes.<br />
Contact: Rey DeLeon (Boise State University)<br />
CD06 - Framework for Advanced Plasma<br />
Simulations on <strong>GPU</strong> HPC Clusters<br />
We present a fluid code called WARPM utilizing<br />
modern many-core computing devices – namely<br />
<strong>GPU</strong>s. WARPM is designed to both minimize data<br />
movement and maximize data-parallel<br />
computation. The code is a hybrid combination of<br />
OpenCL for parallel computation, MPI for<br />
communication between nodes, and threads for<br />
task-parallelism. The OpenCL standard is central<br />
to the code. <strong>GPU</strong>s and/or multi-core CPUs are<br />
utilized simultaneously to compute updates to the<br />
system of fluid equations using patch sequencing.<br />
We believe this new framework is representative of<br />
the future of high-performance fluid simulations<br />
and can be useful now to others in the community.<br />
Contact: Noah Reddell (University of Utah)<br />
COMPUTATIONAL PHYSICS<br />
CP01 - High Performance Beam Dynamics<br />
Simulator for the LANSCE Linear Accelerator<br />
The LANSCE accelerator complex located at the<br />
Los Alamos National Laboratory is a multi-beam<br />
facility that provides high-intensity H+ and H-<br />
particle beams for a variety of user programs. At<br />
the heart of the facility is a ½-mile long linear<br />
accelerator (linac). During beam operations, linac<br />
parameters are adjusted to maintain minimal<br />
beam spill, but without detailed knowledge of the<br />
beam distribution. We are presently developing a<br />
high performance multiparticle beam dynamics<br />
simulator using <strong>GPU</strong> that will provide fast and<br />
valuable information about the beam distribution<br />
in pseudo real-time during accelerator operations.<br />
Contact: Xiaoying Pang (The University of Plymouth)<br />
CP02 - Accelerating Atomic Collisions<br />
Calculations with CUDA: Atomic Basis Overlaps<br />
Atomic collisions calculations are relevant in many<br />
areas of science, from research in new materials<br />
to atmospheric studies, and even radiation therapy<br />
treatments. Accurate atomic computations are<br />
difficult and time-consuming, so computer codes in<br />
these areas rely mainly on approximate models.<br />
The high-performance computing power of <strong>GPU</strong>s<br />
will allow precise computations to be included in these<br />
codes. We began our research by using simple techniques<br />
to accelerate basic atomic collision calculations<br />
with CUDA, and obtained excellent speedups.<br />
Contact: Flavio Colavecchia (Div. Colisiones<br />
Atómicas/Instituto Balseiro)<br />
CP03 - Fast Discrete Element Simulations Using<br />
<strong>GPU</strong>s in the Million-Particle-Range<br />
The Discrete Element Method (DEM) was introduced<br />
as early as 1979. For a long time, however, limited<br />
computational power made it a challenge to<br />
simulate granular assemblies of even a few<br />
hundred disks in two dimensions. Today,<br />
three-dimensional simulations in the range of<br />
10,000 to 200,000 particles are standard<br />
and can be run on workstations and clusters,<br />
enabling simulated process times of up to several<br />
minutes in the latter case. Implementations tailored<br />
to the specific architecture of a <strong>GPU</strong><br />
allow millions of particles to be simulated on a single<br />
<strong>GPU</strong> under your desk.<br />
Contact: Charles Radeke (University of Washington)<br />
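The core of a DEM time step is evaluating contact forces between overlapping particles; each contact pair is independent, which is what a GPU implementation exploits. The sketch below shows the textbook linear spring-dashpot normal force between two 2D disks. The contact model, parameter names (`k_n` stiffness, `gamma_n` damping) and function name are assumptions for illustration; the poster does not state which force law it uses, and tangential friction is omitted.

```python
import math

def normal_contact_force(p1, p2, v1, v2, r1, r2, k_n, gamma_n):
    """Hypothetical linear spring-dashpot normal force on disk 1 from disk 2 (2D).
    Returns (fx, fy); zero if the disks do not overlap."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    dist = math.hypot(dx, dy)
    overlap = r1 + r2 - dist
    if overlap <= 0.0 or dist == 0.0:  # no contact (or coincident centers)
        return (0.0, 0.0)
    nx, ny = dx / dist, dy / dist                          # unit normal from 2 to 1
    v_rel_n = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny  # < 0 when approaching
    magnitude = k_n * overlap - gamma_n * v_rel_n          # spring repels, dashpot damps
    return (magnitude * nx, magnitude * ny)
```

Swapping the two particles returns the equal and opposite force, so a GPU kernel can either compute each pair once and scatter both contributions or, to avoid write conflicts, compute each pair from both sides.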
CP04 - Discontinuous Galerkin Time-Domain<br />
Simulations of Plasmonic Nanostructures on<br />
NVIDIA <strong>GPU</strong>s<br />
The discontinuous Galerkin time-domain (DGTD)<br />
method is a powerful method to explore the<br />
electromagnetic properties of nano-scale<br />
plasmonic and dielectric systems. Here, we present<br />
the method’s advantages and disadvantages when<br />
implemented to run on graphics processing units<br />
(<strong>GPU</strong>s). The <strong>GPU</strong>’s superior performance is<br />
demonstrated for realistic nanophotonic setups<br />
characterized by both optical spectroscopy and<br />
electron energy loss spectroscopy. Compared to<br />
modern CPU hardware, <strong>GPU</strong>-based DGTD yields up<br />
to two orders of magnitude decreased<br />
computational time.<br />
Contact: Richard Diehl (Karlsruhe Institute<br />
of <strong>Technology</strong>)<br />
CP05 - Inversion of a Sequence of Matrices<br />
Differing in Diagonal Elements<br />
We propose a <strong>GPU</strong> implementation of an<br />
algorithm for inverting a special set of matrices,<br />
each of which differs from the others only in<br />
its diagonal elements. The algorithm uses a direct<br />
product procedure for the matrix inversion. Because<br />
the direct product can be computed with massive<br />
parallelism, the <strong>GPU</strong> can be used effectively, which<br />
speeds up the solution of this problem. We implement<br />
and study the properties of this algorithm for<br />
complex-valued matrices. Using the <strong>GPU</strong> algorithm<br />
to simulate disordered 2D lattice systems yields<br />
a significant speedup.<br />
Contact: Alexey Osipov (Jet Propulsion Laboratory)<br />
CP06 - Accelerating Particle Simulations with<br />
<strong>GPU</strong> Computing<br />
RandomWalk is a program designed to model<br />
particle dispersion for a city-scale environment. It<br />
is used to model airborne hazards in urban<br />
environments. We reimplemented RandomWalk in<br />
CUDA to achieve significantly faster results.<br />
Contact: Scot Halverson (University of<br />
Minnesota Duluth)<br />
CP07 - Accelerating Particle-Tracking Based<br />
Beam Dynamics Simulations with <strong>GPU</strong>s<br />
Efficient implementation of general-purpose<br />
particle tracking on <strong>GPU</strong>s can result in significant<br />
performance benefits to large-scale particle<br />
tracking and tracking-based accelerator<br />
optimization simulations. We present our work on<br />
CUDA kernels for transfer maps of single-particle-dynamics<br />
and collective-effects beamline elements,<br />
to be incorporated into a <strong>GPU</strong>-accelerated version<br />
of the Argonne National Lab’s accelerator code<br />
ELEGANT. In particular, we discuss techniques for<br />
efficient utilization of the device shared, cache, and<br />
local memory in the design of single-particle and<br />
collective-effects kernels. We also discuss the use<br />
of data-parallel and hardware-assisted approaches<br />
for resolving memory contention issues in collective<br />
effects kernels.<br />
Contact: Keegan Amyx (Tech-X Corporation)<br />
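One common data-parallel way to resolve contention when many particles deposit into the same grid cells, as in collective-effects kernels, is privatization. Each worker accumulates into its own private histogram, and the private copies are summed at the end. The sketch below assumes a simple 1D nearest-bin deposit and uses hypothetical names; it is not the ELEGANT code. On a GPU, the private copies typically live in per-block shared memory, followed by a reduction or a small number of global atomics.

```python
import math

def deposit_privatized(positions, n_bins, lo, hi, n_workers):
    """Hypothetical sketch: nearest-bin histogram of positions in [lo, hi),
    accumulated into per-worker private copies and then reduced."""
    width = (hi - lo) / n_bins
    private = [[0] * n_bins for _ in range(n_workers)]
    for idx, x in enumerate(positions):
        w = idx % n_workers                  # particle-to-worker assignment
        b = math.floor((x - lo) / width)     # floor, so x < lo maps to a negative bin
        if 0 <= b < n_bins:
            private[w][b] += 1               # no contention: worker owns this copy
    # Reduction: sum the private histograms bin by bin.
    return [sum(private[w][b] for w in range(n_workers)) for b in range(n_bins)]
```

Privatization trades extra memory (one histogram per worker) for contention-free writes; it pays off when many particles land in few bins, which is exactly the case for dense bunches.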
COMPUTER GRAPHICS<br />
CG01 - CUDA-Based Interactive Design of Urban<br />
Ecosystems<br />
We address the problem of interactive design of<br />
urban spaces by integrating plants in urban<br />
environments. We have developed an interactive<br />
simulation and procedural system for 3D urban<br />
models. Using our CUDA-based interactive system<br />
we can simulate spatial distribution of a large<br />
ecosystem embedded in a city. We have achieved a<br />
performance of 50M-70M collision tests per<br />
second, allowing 250,000 plants to be<br />
simulated at 5-6 fps on a Tesla C2050.<br />
Contact: Michel Abdul Massih (Purdue University)<br />
CG02 - Robust <strong>GPU</strong> Algorithm for Exact 3D<br />
Minkowski Sum Computation<br />
We present a robust <strong>GPU</strong> algorithm to compute<br />
the exact 3D Minkowski sum of two polyhedral<br />
objects. While the Minkowski sum is of great<br />
importance in mathematics, geometric modeling,<br />
and robotics, it is hard to compute efficiently and<br />
robustly. The proposed algorithm achieves high<br />
performance by running mainly on the <strong>GPU</strong>, while<br />
using interval arithmetic to filter out unsafe<br />
predicates caused by degenerate cases. The<br />
filtered unsafe predicates are passed to the CPU,<br />
where they are robustly evaluated using<br />
extended-precision arithmetic (MPFR). Performance<br />
results show a speedup of one order of magnitude<br />
over a pure CPU algorithm.<br />
Contact: Min-Ho Kyung (Ajou University)
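The filter-then-fallback strategy can be illustrated on the 2D orientation predicate, a building block of many geometric algorithms. The sketch evaluates the determinant in interval arithmetic, widening every rounded result outward by one ulp with `math.nextafter` (Python 3.9+). Only when the interval contains zero is the predicate re-evaluated exactly with `fractions.Fraction`, which stands in here for MPFR. The function names are hypothetical, and this is not the poster's 3D Minkowski-sum code.

```python
import math
from fractions import Fraction

def _down(x):
    return math.nextafter(x, -math.inf)  # next float toward -inf

def _up(x):
    return math.nextafter(x, math.inf)   # next float toward +inf

def iv_sub(a, b):
    # [a_lo - b_hi, a_hi - b_lo], rounded outward so the exact value stays enclosed.
    return (_down(a[0] - b[1]), _up(a[1] - b[0]))

def iv_mul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (_down(min(products)), _up(max(products)))

def point(x):
    return (x, x)  # a float input is represented exactly

def orient_exact(p, q, r):
    """Exact sign of the orientation determinant (fallback path)."""
    px, py, qx, qy, rx, ry = map(Fraction, (*p, *q, *r))
    det = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    return (det > 0) - (det < 0)

def orient_filtered(p, q, r):
    """Returns (sign, used_exact_fallback)."""
    dqx = iv_sub(point(q[0]), point(p[0]))
    dry = iv_sub(point(r[1]), point(p[1]))
    dqy = iv_sub(point(q[1]), point(p[1]))
    drx = iv_sub(point(r[0]), point(p[0]))
    det = iv_sub(iv_mul(dqx, dry), iv_mul(dqy, drx))
    if det[0] > 0:
        return 1, False
    if det[1] < 0:
        return -1, False
    return orient_exact(p, q, r), True  # unsafe: interval contains zero
```

Most predicates are decided by the cheap interval test; only near-degenerate ones pay for exact arithmetic, which mirrors the GPU-filter, CPU-fallback split described above.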
CG03 - Real-Time Mixed Water Simulation and<br />
Rendering Techniques for Visual Effects<br />
The synthesis of realistic scenes is an important<br />
research area for applications in games and<br />
visual effects. Research groups have developed<br />
techniques for realistic water rendering, but no<br />
existing work describes these techniques and<br />
compares them. The present work analyzes the most<br />
important techniques for water simulation and<br />
visualization, compares their performance,<br />
and creates an artist-driven system. The system<br />
can choose between algorithms and combine<br />
them using layers to achieve the desired result.<br />
Finally, it can use a virtual camera to output the<br />
final render in multiple passes for post production.<br />
Contact: Rodrigo Marques (California State<br />
University, Chico)<br />
COMPUTER VISION<br />
CV01 - Efficient Dense Stereo Matching Using<br />
CUDA<br />
The proposed work demonstrates the general<br />
strategy for parallelization of dense matching<br />
methods on <strong>GPU</strong>s, shows the potential capability<br />
of common graphics cards for general<br />
computation, and compares local and global<br />
methods, using Sum of Absolute Differences (SAD)<br />
and Semi-Global Matching (SGM) as examples.<br />
Contact: Ke Zhu (Technische Universität München)<br />
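A local SAD matcher can be sketched in a few lines. For each pixel of the left image, a small window is compared against windows in the right image shifted by each candidate disparity, and the disparity with the smallest sum of absolute differences wins. This assumes rectified grayscale images (so matches lie on the same row) and uses hypothetical names; it is not the poster's code. Each pixel's search is independent, which is what the GPU parallelizes; SGM would add path-wise smoothness costs on top of this matching cost.

```python
def sad_disparity(left, right, max_disp, radius):
    """Hypothetical winner-takes-all SAD disparity map for rectified
    grayscale images given as 2D lists."""
    rows, cols = len(left), len(left[0])
    disp = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):  # one GPU thread per pixel
            best_d, best_cost = 0, float("inf")
            for d in range(0, min(max_disp, x) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy = min(max(y + dy, 0), rows - 1)  # clamp at borders
                        xl = min(max(x + dx, 0), cols - 1)
                        xr = min(max(x + dx - d, 0), cols - 1)
                        cost += abs(left[yy][xl] - right[yy][xr])
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```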
CV02 - Scalable Local Feature Extraction with<br />
Orientation Maps and <strong>GPU</strong> Computing<br />
This paper presents scalable computational<br />
techniques for extracting local invariant features.<br />
Although several investigators have developed<br />
efficient algorithms and implementations for<br />
feature extraction, scalability in terms of the<br />
number of extracted features remains an<br />
issue. We introduce a data structure called<br />
orientation maps and use <strong>GPU</strong> computing to improve<br />
the scalability of feature extraction. Experimental<br />
results demonstrate that using orientation maps<br />
and a <strong>GPU</strong> enables us to improve the scalability as<br />
well as the efficiency of computation compared to<br />
a CPU.<br />
Contact: Naoyuki Ichimura (National Institute of<br />
Advanced Industrial Science and <strong>Technology</strong> (AIST))<br />
CV03 - <strong>GPU</strong>-Accelerated Detection of Severe<br />
Video Distortions<br />
We show how to efficiently port a previously proposed<br />
algorithm for detecting severe analog and digital<br />
video distortions (termed ‘video breakup’)<br />
to Fermi-architecture <strong>GPU</strong>s with CUDA. Porting<br />
to a <strong>GPU</strong> reduces the runtime of the CPU<br />
implementation by an order of magnitude. As a result, our<br />
<strong>GPU</strong> algorithm is capable of analyzing up to ten Full<br />
HD (1920 x 1080) video streams in real time. The<br />
<strong>GPU</strong> implementation is integrated in the AV-Inspector<br />
application, which allows the user to get<br />
an automatic assessment of the quality of video and<br />
film material within a very short time.<br />
Contact: Hannes Fassold (JOANNEUM RESEARCH)<br />
CV04 - VScreen: A Real-Time Augmented<br />
Video Method<br />
We present an image editing tool that allows us<br />
to replace a region of any image or video with another<br />
image or video. This application is useful for<br />
advertisements, commercials, music videos,<br />
movies, etc. The main difference between editing<br />
(augmenting) videos and still images is that<br />
occlusions must be managed: moving objects in the<br />
foreground may occlude the augmented region in the<br />
background. We therefore use a<br />
Foreground/Background (FgBg) video<br />
segmentation procedure, implemented on NVIDIA<br />
graphics cards to meet the real-time requirement.<br />
Contact: Francisco J. Hernandez-Lopez (CIMAT A.C.)<br />
CV05 - Accelerated Multiple Region Evaluation<br />
for Human Motion Tracking<br />
In this work we present a study about different<br />
NVIDIA CUDA approaches to the problem of the<br />
evaluation of a region of interest (ROI) of pixels in<br />
an image. This problem is usually integrated as<br />
part of other higher-level methods, such as image<br />
retargeting, completion, video summarization,<br />
object detection, visual tracking, etc. Because<br />
these problems evaluate millions of ROIs,<br />
performance is often far from<br />
interactive.<br />
Contact: David Concha Gomez (Universidad Rey<br />
Juan Carlos)<br />
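One standard way to make evaluating millions of rectangular ROIs cheap is a summed-area table (integral image): after one prefix-sum pass, the sum over any axis-aligned rectangle costs four lookups. The poster does not say which CUDA approaches it compares, so this is offered only as a hedged example of the underlying problem; the names are hypothetical. On a GPU, the table is typically built with parallel row and column scans, and each ROI query becomes an independent thread.

```python
def integral_image(img):
    """S[y][x] = sum of img over rows < y and columns < x
    (one extra leading row and column of zeros)."""
    rows, cols = len(img), len(img[0])
    s = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s

def roi_sum(s, y0, x0, y1, x1):
    """Sum of img[y][x] for y0 <= y < y1 and x0 <= x < x1, in O(1)."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
```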
CV06 - Efficient Segmentation Trees on the <strong>GPU</strong><br />
Numerous computer vision tasks<br />
demand a high-performance algorithm for<br />
building segmentation trees. Unfortunately,<br />
current state-of-the-art methods designed for the<br />
CPU are far too slow. This work describes an<br />
efficient <strong>GPU</strong> implementation of a popular<br />
algorithm. Performance evaluations show that,<br />
unlike its CPU counterpart, the proposed method<br />
is suitable for real-time applications.<br />
Contact: Yaroslav Ganin (NVIDIA)<br />
CV07 - <strong>GPU</strong> Vision: OpenCV’s <strong>GPU</strong> Module<br />
Accelerates Computer Vision<br />
OpenCV is the world’s most used library for<br />
computer vision with over 3 million downloads<br />
worldwide. Using the power of CUDA and the<br />
NVPP library, the most computationally<br />
demanding of OpenCV’s more than 500 functions<br />
have been ported for an average speedup of 33X<br />
over the already highly optimized CPU code.<br />
Several application work flows have been<br />
dramatically improved, including HOG pedestrian<br />
detection, face detection, stereo correspondence,<br />
and feature detection and matching.<br />
Contact: Colin Tracey (NVIDIA)<br />
CV08 - Orientation Flows: <strong>GPU</strong> Implementation<br />
Clarifies Cortical Computation<br />
Orientation flows play an important role in shape<br />
inference. We have developed a model of<br />
orientation flow extraction that explains the<br />
statistics of neurophysiologically observed<br />
connection structure through second order (mean<br />
and variance). Our <strong>GPU</strong>-based implementation of<br />
this model realizes dramatic performance<br />
improvements over the original C implementation,<br />
enabling us to pursue formerly prohibitively<br />
time-consuming studies.<br />
Contact: Daniel Holtmann-Rice (Yale University)<br />
CV09 - Michigan Visual Sonification System:<br />
Driving Efficient Mobile Vision Designs<br />
Visual Sonification is the process of converting<br />
visual properties of objects into audio. The<br />
Michigan Visual Sonification System (MVSS)<br />
utilizes this process to assist the visually impaired<br />
in distinguishing objects in their surroundings.<br />
MVSS uses computer vision to analyze scenes and<br />
create a dynamic audio representation of each<br />
object which is presented to the user using 3D<br />
audio. The performance of MVSS on mobile<br />
processors exposed a need for improved mobile<br />
vision performance. Our benchmark suite,<br />
MEVBench, was used to further analyze the<br />
computational characteristics of mobile vision.<br />
The EFFEX architecture was developed for<br />
efficient feature extraction in mobile vision.<br />
Contact: Jason Clemons (University of Michigan)<br />
DATABASES, DATA MINING, BUSINESS<br />
INTELLIGENCE<br />
DB01 - Parallel Data Mining Techniques on<br />
Graphics Processing Unit with CUDA<br />
Data mining is widely used in various domains and<br />
has significant applications. However, current data<br />
mining tools cannot meet the speed requirements of<br />
applications with large-scale databases. We propose<br />
three techniques to accelerate fundamental kernels in<br />
data mining algorithms on the CUDA platform: a scalable<br />
thread scheduling scheme for irregular patterns, a<br />
parallel distributed top-k scheme, and a parallel<br />
high-dimension reduction scheme. They play a key role<br />
in our GUCAS_CU-Miner, which includes three<br />
representative data mining algorithms: CU-Apriori,<br />
CU-KNN and CU-K-means. Experiments<br />
show that the <strong>GPU</strong> + CUDA<br />
parallel architecture is feasible and promising for<br />
data mining applications.<br />
Contact: Ying Liu (Graduate University of Chinese<br />
Academy of Sciences)<br />
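A parallel top-k is often organized as a two-level reduction. Each partition (for example, each thread block) keeps its own k best candidates, and the partial results are then merged. This is correct because every globally selected element is necessarily among the k best of its own partition. The sketch below shows the idea with Python's `heapq`; the partitioning scheme and names are assumptions, not the GUCAS_CU-Miner implementation. In a KNN query, the values would be distances and the k smallest are kept.

```python
import heapq

def local_top_k(chunk, k):
    # Parallel stage: each partition independently keeps its k smallest values.
    return heapq.nsmallest(k, chunk)

def distributed_top_k(values, k, n_partitions):
    size = max(1, -(-len(values) // n_partitions))   # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    partials = [local_top_k(c, k) for c in chunks]
    merged = [v for part in partials for v in part]
    return heapq.nsmallest(k, merged)                 # final merge stage
```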
DB02 - Parallel Spectral Graph Partitioning<br />
on CUDA<br />
Spectral graph partitioning is a widely used<br />
technique in many fields such as image<br />
processing, scientific computing and machine<br />
learning. In this study, we analyze the subroutines<br />
of the spectral graph partitioning algorithm on CUDA.<br />
Each step is analyzed using various techniques<br />
to determine its suitability for<br />
<strong>GPU</strong> implementation. Two different<br />
<strong>GPU</strong> configurations are implemented and their<br />
results are compared against the CPU version.<br />
Contact: Alptekin Temizel (Middle East Technical<br />
University)<br />
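The central subroutine of spectral bisection is computing the Fiedler vector, the eigenvector of the graph Laplacian L = D - A for its second-smallest eigenvalue, and splitting vertices by its sign. The sketch below uses power iteration on cI - L, projecting out the constant eigenvector each step, with c = 2 × (maximum degree) as a Gershgorin upper bound on the largest Laplacian eigenvalue. It is one simple choice among several (production codes typically use Lanczos), and the function names are assumptions; the repeated matrix-vector product is the step most naturally offloaded to the GPU.

```python
import math

def laplacian_matvec(adj, v):
    """(L v)_i = deg_i * v_i - sum_j A_ij * v_j for L = D - A."""
    return [sum(row) * v[i] - sum(a * x for a, x in zip(row, v))
            for i, row in enumerate(adj)]

def fiedler_vector(adj, iters=500):
    """Power iteration on (c*I - L), kept orthogonal to the all-ones vector."""
    n = len(adj)
    c = 2.0 * max(sum(row) for row in adj)      # Gershgorin bound on lambda_max(L)
    v = [i - (n - 1) / 2.0 for i in range(n)]   # arbitrary zero-mean start vector
    for _ in range(iters):
        lv = laplacian_matvec(adj, v)
        w = [c * x - y for x, y in zip(v, lv)]
        mean = sum(w) / n
        w = [x - mean for x in w]               # remove the constant component
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def spectral_bisection(adj):
    # Sign split; a median split is a common alternative for balanced halves.
    return [0 if x < 0 else 1 for x in fiedler_vector(adj)]
```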
DB03 - Red Fox: Accelerating Data Warehousing<br />
Applications Using GP<strong>GPU</strong>s<br />
Red Fox is a compiler optimization framework for<br />
accelerating large scale data warehousing<br />
applications on cloud architectures augmented<br />
with <strong>GPU</strong>s. Currently, the framework is structured<br />
around program transformations based on the<br />
concepts of kernel fusion and fission, drawing<br />
upon the analogy with classical loop fusion and<br />
fission transformations. These transformations<br />
seek to improve <strong>GPU</strong> utilization and optimize data<br />
movement throughout the CPU/<strong>GPU</strong> memory<br />
hierarchy. Coupled with the Ocelot dynamic<br />
compiler, this framework can optimize the<br />
execution of applications across the CPU and<br />
<strong>GPU</strong>. The initial application domain includes<br />
relational operators and arithmetic functions<br />
found in data warehousing applications.<br />
Contact: Haicheng Wu (Georgia Institute of<br />
<strong>Technology</strong>)<br />
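Kernel fusion, in the sense used above, merges operators that would otherwise run as separate kernels with an intermediate result written to memory and read back. The sketch contrasts an unfused SELECT-then-PROJECT pipeline with a fused single pass over a toy table. The operators and column names are hypothetical; the point is only that the fused version produces the same result without materializing the filtered intermediate table, which on a GPU saves a round trip through global memory.

```python
def select(rows, predicate):
    # Unfused kernel 1: materializes an intermediate table.
    return [r for r in rows if predicate(r)]

def project(rows, fn):
    # Unfused kernel 2: reads the intermediate table back.
    return [fn(r) for r in rows]

def fused_select_project(rows, predicate, fn):
    # Fused kernel: each row is filtered and transformed in one pass,
    # with no intermediate table.
    return [fn(r) for r in rows if predicate(r)]
```

Kernel fission goes the other way, splitting a large kernel so that stages can overlap with data transfers.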
DEVELOPMENT TOOLS & LIBRARIES<br />
DL01 - AutoTune: Automatic Online Code Tuning<br />
Performance analysis and tuning is an important<br />
step in programming multicore and manycore<br />
architectures. There are several tools to help<br />
developers analyze application performance; still,<br />
no tool provides recommendations about how to<br />
tune the code. AutoTune will extend Periscope, an<br />
automatic online and distributed performance<br />
analysis tool developed by Technische Universität<br />
München, with plugins for performance and<br />
energy efficiency tuning. The resulting Periscope<br />
Tuning Framework will be able to tune serial and<br />
parallel codes with and without <strong>GPU</strong> kernels; in<br />
addition, it will return tuning recommendations<br />
that can be integrated into the production version<br />
of the code.<br />
Contact: Renato Miceli (Aon Benfield Securities)<br />
DL02 - Interactive Linked Visualizations for<br />
Performance Analysis Of Heterogeneous<br />
Computing Clusters<br />
Performance analysis is a vital step in identifying<br />
execution bottlenecks to help target optimizations.<br />
This analysis is derived from observations of<br />
performance data collected from the computing<br />
hardware. Data obtained from computing clusters<br />
is necessarily complicated because its collection<br />
involves multiple interacting nodes as opposed to<br />
just a single serial execution. Further,<br />
heterogeneous clusters, having CPUs working
together with several <strong>GPU</strong>s, add additional layers<br />
of complexity. These characteristics pose a<br />
serious challenge to the analysis and<br />
improvement of application performance. We<br />
present a tool that assists performance analysis<br />
by visualizing performance data with the help of<br />
various interactive linked views.<br />
Contact: Aaditya Landge (Scientific Computing and<br />
Imaging Institute, University of Utah)<br />
DL03 - High-Performance Pedestrian Multi-<br />
Simulation Using <strong>GPU</strong> Cluster<br />
We have created a tool that could potentially help<br />
with decision support and planning of large-scale<br />
emergency pedestrian evacuations. Through the<br />
use of our simulation software distributed over a<br />
<strong>GPU</strong> cluster, many evacuation scenarios can be<br />
simultaneously simulated at faster than real-time<br />
speeds and compared for their effectiveness.<br />
Contact: Twin Karmakharm (University of Sheffield)<br />
DL04 - ttgLib - Middleware for Dynamic Software<br />
Adaptation to Heterogeneous Architectures<br />
We present ttgLib, a middleware that efficiently<br />
distributes computational tasks between CPUs<br />
and <strong>GPU</strong>s and provides load balancing between<br />
them on the fly. This enables an application to use<br />
all available processing units of a heterogeneous<br />
HPC system simultaneously. ttgLib performs<br />
several dynamic optimization procedures that<br />
significantly ease both the development of new<br />
applications for, and the porting of existing software to,<br />
heterogeneous platforms. ttgLib can be<br />
considered an extension of widely used parallel<br />
programming tools that can be easily integrated<br />
into the software development process. This<br />
middleware efficiently solves the most tedious<br />
problems of ‘heterogeneous coding’ that<br />
developers usually encounter.<br />
Contact: Sergey Grizan (Moscow State University<br />
and Siberian Federal University, ttgLabs)<br />
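A simple form of on-the-fly CPU/GPU load balancing splits each batch of work in proportion to the throughput each device achieved on the previous batch. The sketch below is a hypothetical illustration of that policy only; ttgLib's actual optimization procedures are not described in this abstract, and the function names are made up.

```python
def split_work(n_items, throughputs):
    """Hypothetical policy: divide n_items among devices in proportion to
    measured throughput (items/s); rounding leftovers go to the fastest devices."""
    total = sum(throughputs)
    shares = [int(n_items * t / total) for t in throughputs]
    leftover = n_items - sum(shares)
    order = sorted(range(len(throughputs)), key=lambda i: throughputs[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares

def update_throughput(items_done, seconds):
    # Measured rate from the last batch, fed into the next split.
    return items_done / seconds
```

For example, if the GPU processed three times as many items per second as the CPU on the last batch, the next batch of 100 items is split 25/75.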
DL05 - Efficient Formal Verification of CUDA<br />
SIMD and Atomics<br />
Detecting and debugging assertion failures and<br />
runtime errors in CUDA programs is usually hard.<br />
Typical multithreaded program verification<br />
methods are not effective for verifying the large-scale<br />
fine-grained concurrency of CUDA. Our novel<br />
contribution is a technique to handle CUDA SIMD<br />
plus atomics using concolic execution methods.<br />
Contact: Wei-Fan Chiang (School of Computing,<br />
University of Utah)<br />
DL06 - Performance Optimizations And<br />
Modeling For Large-Scale Heterogeneous<br />
Computing Systems<br />
This poster proposes to address the following at<br />
every level of parallelism in heterogeneous<br />
computing systems: 1) performance optimizations<br />
of applications, and 2) performance modeling<br />
and prediction.<br />
Contact: Ashwin Aji (Virginia Tech)<br />
ELECTRONIC DESIGN AUTOMATION<br />
EA01 - Parallel VLSI CAD Algorithms for Energy<br />
Efficient Heterogeneous Computing Platforms<br />
In the past decade, parallel VLSI CAD tools have<br />
been successfully developed by major EDA<br />
vendors to leverage multi-core/distributed parallel<br />
computing power. However, for recent<br />
energy-efficient heterogeneous computing platforms that<br />
integrate multi-core CPUs and many-core <strong>GPU</strong>s,<br />
very limited progress has been made in the VLSI CAD<br />
research community. Developing efficient CAD<br />
algorithms for such heterogeneous platforms can<br />
be extremely challenging, requiring strong<br />
domain-specific CAD algorithm knowledge as well<br />
as thorough understanding of the latest hardware<br />
properties. In this abstract, we show our latest<br />
research progress on large-scale circuit electrical<br />
and thermal modeling and simulation methods.<br />
Contact: Zhuo Feng (Michigan Technological<br />
University)<br />
EA02 - Ultra-Low Power Transceivers for<br />
High-Bandwidth Interconnects<br />
A low-power transceiver for highly parallel<br />
chip-to-chip data communication is presented.<br />
The receiver is implemented in a 45nm SOI<br />
technology. High data rate and low power<br />
dissipation are achieved using a switched-capacitor<br />
S/H/summer front-end, which enables FEXT<br />
cancellation with 33µW/Gbps power overhead. It<br />
operates at up to 15Gb/s and dissipates 7.5mW from<br />
a 1.2V supply. The 15Gb/s transmitter employs an<br />
analog filtering pre-emphasis equalization<br />
technique and dissipates 10mW from a 1.2V supply<br />
while occupying 0.01mm². It was fabricated in<br />
65nm CMOS technology and compensates for<br />
channel losses of up to 20dB at the Nyquist rate.<br />
Contact: Meisam Honarvar Nazari (California<br />
Institute of <strong>Technology</strong>)<br />
ENERGY EXPLORATION<br />
EE01 - The Maven Vector-Thread Architecture<br />
We present a taxonomy and modular<br />
implementation approach for data-parallel<br />
accelerators, including the MIMD, vector-SIMD,<br />
subword-SIMD, SIMT, and vector-thread (VT)<br />
architectural design patterns. We have developed<br />
a new VT microarchitecture, Maven, based on the<br />
traditional vector-SIMD microarchitecture, that is<br />
simpler to implement and easier to program than<br />
previous VT designs. Using an extensive design-space<br />
exploration of full VLSI implementations of<br />
many accelerator design points, we evaluate the<br />
varying tradeoffs between programmability and<br />
implementation efficiency among the different<br />
architectural patterns. Our results suggest that<br />
the Maven VT microarchitecture is superior to the<br />
vector-SIMD architecture, providing both greater<br />
efficiency and easier programmability.<br />
Contact: Yunsup Lee (UC Berkeley)<br />
CONFERENCE GUIDE POSTER LISTINGS<br />
FINANCE<br />
FA01 - PathWise High Productivity<br />
Computing Platform<br />
PathWise High Productivity Computing (HPC)<br />
platform is a financial modeling environment<br />
targeting <strong>GPU</strong> grids.<br />
Contact: Aamir Mohammad (Tsinghua University)<br />
GENERAL INTEREST<br />
GI01 - High Throughput MIMO-OFDM Detection<br />
with Graphics Processing Units<br />
A novel strategy is proposed to implement<br />
a reconfigurable MMSE-based detector for<br />
multiple-input multiple-output (MIMO) wireless<br />
communication systems with orthogonal<br />
frequency-division multiplexing (OFDM). The key<br />
component of the strategy is a massively parallel<br />
implementation of the scalable matrix inversion<br />
on <strong>GPU</strong>s. A series of optimization methods<br />
including multi-threaded matrix inversion with<br />
multiple data frames, maximizing the utilization<br />
of the fast on-chip memories, and overlapping<br />
kernel execution with data transfer, are proposed.<br />
Experiments demonstrate that the throughput<br />
for a 4×4 64QAM MIMO-OFDM system can<br />
exceed 100 Mbit/s, satisfying 4G wireless<br />
communication standards such as LTE/LTE-Advanced.<br />
Contact: Dan Sui (Wireless & Mobile<br />
Communication R&D Center, Tsinghua University)<br />
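The core of an MMSE detector is the filter W = (HᴴH + σ²I)⁻¹Hᴴ applied to each received vector y. The sketch below is an illustrative CPU reference, not the authors' GPU code: it assumes a 2×2 channel so that the inverse has a closed form (the poster targets 4×4), and on a GPU many such small inversions, one per subcarrier and data frame, would run in parallel.

```python
def conj_transpose(m):
    """Hermitian transpose of a matrix given as a list of rows."""
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inv2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mmse_detect(h, y, noise_var):
    """Linear MMSE estimate x_hat = (H^H H + noise_var * I)^-1 H^H y for a 2x2 channel."""
    hh = conj_transpose(h)
    gram = matmul(hh, h)
    regularized = [[gram[i][j] + (noise_var if i == j else 0) for j in range(2)]
                   for i in range(2)]
    w = matmul(inv2x2(regularized), hh)          # MMSE filter W
    return [row[0] for row in matmul(w, [[v] for v in y])]


# Usage: noiseless check, so the estimate should reproduce the sent symbols.
h = [[1 + 1j, 0.5], [0.2j, 1 - 0.5j]]
x = [1 + 1j, -1 + 1j]
y = [row[0] for row in matmul(h, [[v] for v in x])]
print(mmse_detect(h, y, 0.0))
```

With `noise_var` greater than zero the regularization term trades a small bias for robustness against noise amplification, which is the point of MMSE over zero-forcing.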
GI02 - A Fast Irregular LDPC Decoder on NVIDIA<br />
Fermi<br />
Low-Density Parity-Check (LDPC) codes are<br />
widely used in many wireless communication<br />
systems, and their decoding algorithms are often<br />
time-consuming. The graphics processing unit (<strong>GPU</strong>)<br />
is an attractive co-processor to the CPU for<br />
massively parallel computing. This work studies a<br />
<strong>GPU</strong>-based LDPC decoder, especially for irregular<br />
LDPC codes, and considers <strong>GPU</strong>-specific optimization techniques.<br />
Experimental results demonstrate<br />
that the <strong>GPU</strong> decoder achieves more<br />
than 80 times speedup over the CPU.<br />
Contact: Dan Sui (Wireless & Mobile<br />
Communication R&D Center, Tsinghua University)<br />
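The abstract does not name the decoding algorithm. A common choice for LDPC decoding is the min-sum approximation of belief propagation; the sketch below is a plain-Python reference of that technique under that assumption, with a dense parity-check matrix H and channel log-likelihood ratios where a positive value favors bit 0. A small (7,4) Hamming parity-check matrix stands in for a real, much larger LDPC matrix. On a GPU, each check-node and variable-node update would typically map to its own thread.

```python
def min_sum_decode(H, llr, max_iters=20):
    """Min-sum LDPC decoding.

    H   : parity-check matrix as a list of 0/1 rows (each row needs >= 2 ones)
    llr : channel log-likelihood ratios, positive meaning "bit is probably 0"
    Returns the hard-decision bit list.
    """
    m, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    c2v = {(i, j): 0.0 for i in range(m) for j in checks[i]}  # check-to-variable messages
    total = list(llr)                                          # posterior LLR per bit
    for _ in range(max_iters):
        bits = [0 if t >= 0 else 1 for t in total]
        if all(sum(bits[j] for j in row) % 2 == 0 for row in checks):
            return bits                                        # all parity checks satisfied
        new_c2v = {}
        for i, row in enumerate(checks):
            # Variable-to-check message: posterior minus this check's own input.
            v2c = {j: total[j] - c2v[(i, j)] for j in row}
            for j in row:
                others = [v2c[k] for k in row if k != j]
                sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
                new_c2v[(i, j)] = sign * min(abs(v) for v in others)
        c2v = new_c2v
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
    return [0 if t >= 0 else 1 for t in total]


H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
received = [2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0]   # bit 2 arrives weakly flipped
print(min_sum_decode(H, received))  # [0, 0, 0, 0, 0, 0, 0]
```

Irregular codes simply have rows and columns of different weights; the per-row lists in `checks` handle that without change, though on a GPU the uneven work per node is what makes irregular codes harder to balance.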
GI03 - Actual Power Consumption in Pattern<br />
Matching on CUDA <strong>GPU</strong>s<br />
For many embedded applications, e.g. in the<br />
aerospace/defense industry, power efficiency is<br />
very important because both cooling and power are<br />
often difficult to supply. We show that the specified<br />
maximum power of a CUDA <strong>GPU</strong> is not a good measure<br />
of actual power consumption under a CUDA load,<br />
and that writing efficient code that reaches high<br />
utilization is essential for<br />
power efficiency.<br />
Contact: Ian Wainwright (High Performance<br />
Consulting)<br />
GI04 - <strong>GPU</strong>-Accelerated Fingerprint Matching<br />
As biometric databases approach hundreds of<br />
millions of identities in size, it becomes more<br />
costly and time-consuming to search these<br />
databases. Using a <strong>GPU</strong>-accelerated coarse<br />
filtering algorithm, we demonstrate that a large<br />
fingerprint database can be searched very quickly<br />
for a matching individual by isolating a small list<br />
of potential matches using <strong>GPU</strong>s, such that only<br />
these few records will be given further scrutiny by<br />
the matching system.<br />
Contact: Scott Bai (The MITRE Corporation)<br />
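The coarse-filtering idea can be sketched as a two-stage search: a cheap score is computed for every record (the part a GPU would evaluate in parallel), only the k best-scoring records are kept, and the expensive matcher runs on that shortlist alone. The feature format and the scoring functions `l1_distance` and `exact_match` below are hypothetical placeholders, not MITRE's algorithm.

```python
import heapq


def coarse_filter(query, database, k, coarse_score):
    """Ids of the k records with the lowest (best) cheap score."""
    scored = ((coarse_score(query, feats), rec_id) for rec_id, feats in database.items())
    return [rec_id for _, rec_id in heapq.nsmallest(k, scored)]


def search(query, database, k, coarse_score, fine_match, threshold):
    """Run the expensive matcher only on the coarse shortlist."""
    shortlist = coarse_filter(query, database, k, coarse_score)
    return [rec_id for rec_id in shortlist
            if fine_match(query, database[rec_id]) >= threshold]


def l1_distance(q, f):          # hypothetical cheap score
    return sum(abs(x - y) for x, y in zip(q, f))


def exact_match(q, f):          # hypothetical stand-in for a full matcher
    return 1.0 if q == f else 0.0


db = {"a": [0, 0], "b": [5, 5], "c": [1, 0], "d": [9, 9]}
print(coarse_filter([0, 0], db, 2, l1_distance))                    # ['a', 'c']
print(search([0, 0], db, 2, l1_distance, exact_match, 1.0))         # ['a']
```

The choice of k sets the trade-off: a larger shortlist lowers the chance of discarding the true match but sends more records to the slow stage.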
GI05 - Towards Task-Pipelined General Purpose<br />
Computing on <strong>GPU</strong>s<br />
Many real-world applications, especially those<br />
following a stream processing pattern, interleave<br />
task-pipelined and data parallelism.<br />
Current <strong>GPU</strong>s are ill-equipped for such<br />
applications due to underutilization of<br />
computing resources and/or excessive off-chip<br />
memory traffic. This paper focuses on architectural<br />
enhancements to enable task-pipelined execution<br />
of data-parallel kernels on <strong>GPU</strong>s. We propose an<br />
efficient adaptive dynamic scheduling mechanism<br />
and a moderately modified L2 cache structure to<br />
orchestrate both task-pipelined and data<br />
parallelism. Simulation results show that the<br />
proposed <strong>GPU</strong> architecture improves IPC by 18%<br />
and reduces the overall access to off-chip <strong>GPU</strong><br />
memory by 11% on average.<br />
Contact: Shuai Mu (ABB Corporate Research)<br />
GI06 - High Performance Computing in<br />
Volumetric Velocimetry<br />
Since the advent of Particle Image Velocimetry (PIV)<br />
in experimental fluid measurements, there has<br />
been a steady and sustained increase in the<br />
throughput capability and resolution of the hardware<br />
devices (e.g., CMOS cameras) needed to acquire and<br />
transfer the copious amounts of image data. With<br />
the introduction of tomographic measurement<br />
techniques, the amount of data suddenly increased<br />
by an order of magnitude. While hardware development<br />
keeps pace reasonably well with the<br />
acquisition demands of current experiments,<br />
the ability of computers and current algorithms to<br />
process and further reduce the data within a<br />
reasonable period has fallen dramatically behind.<br />
Contact: Thomas Nonn (Moscow State University,<br />
Physics Department)<br />
GI07 - Conformal Transformations of 3D Meshes<br />
in Parallel<br />
Arbitrary deformations applied to 3D meshes<br />
pose significant restrictions in many design<br />
applications. Conformal transformations, those<br />
that preserve oriented angles for a given 3D mesh<br />
parametrization, however, offer the right balance<br />
between flexibility of the geometric form and<br />
structural preservation. The advantage of using<br />
such transformations is two-fold: one can<br />
maintain flexibility of the design process, and<br />
preserve texture and emblematic features of the<br />
mesh. To this end, we investigate efficient and<br />
scalable implementations of the methodology<br />
introduced by Crane et al. on <strong>GPU</strong> architectures.<br />
Contact: Nikolaos Yiotis (London College of<br />
Fashion/ University of Arts London)<br />
<strong>GPU</strong> ACCELERATED INTERNET<br />
GA01 - Accelerating Greater Than-Strong<br />
Conditional Oblivious Transfer Multiparty<br />
Protocol Using <strong>GPU</strong><br />
Greater Than-Strong Conditional Oblivious<br />
Transfer (GT-SCOT) is a protocol used for sharing<br />
data between two parties without revealing any<br />
private information. Due to the large number of<br />
iterative operations and the increasing size of the<br />
input, the algorithm is computationally intensive<br />
and hence impractical for large credentials or<br />
secure database mining. This work presents an<br />
implementation of GT-SCOT on the <strong>GPU</strong> in order to<br />
accelerate the operations and handle large<br />
messages. Results show that the <strong>GPU</strong><br />
implementation achieves a 7x speedup for<br />
messages of 1024 bits, using 64 bits of<br />
encryption for each bit of the message.<br />
Contact: Axel Rivera (The University of Tokyo)<br />
LIFE SCIENCES<br />
LS01 - <strong>GPU</strong>-Enabled Stochastic Spatiotemporal<br />
Model of Rat Ventricular Myocyte Calcium<br />
Dynamics<br />
Some cardiac arrhythmias are thought to result<br />
from Ca²⁺ waves arising from the spark-induced-spark<br />
phenomenon: calcium sparks, local elevations<br />
of calcium, may recruit sparks at<br />
neighboring sites. However, studying such<br />
calcium dynamics in a detailed whole-cell model is<br />
computationally prohibitive. We introduce a novel<br />
Markov chain Monte Carlo simulation whose time<br />
step lies between 10 ns and 1 µs, so the<br />
simulation can capture the kinetics of<br />
individual ion channels. The authors present<br />
an ongoing effort to study calcium<br />
dynamics that, for the first time, incorporates the detailed<br />
structure of rat ventricular myocytes.<br />
Contact: Tuan Hoang-Trong (George Mason<br />
University)<br />
LS02 - <strong>GPU</strong> Accelerated Signal Processing in Ion<br />
Torrent Analysis Pipeline<br />
We have accelerated our signal processing pipeline<br />
using the Tesla C2050 <strong>GPU</strong> to provide fast analysis<br />
results to our customers.<br />
This poster presents a high-level view of how the <strong>GPU</strong><br />
is applied in our processing pipeline.<br />
Contact: Mohit Gupta (Life Technologies)<br />
LS03 - A Fast CUDA Compatible Short Read<br />
Aligner to Large Genomes<br />
We present CUSHAW, a parallelized short read<br />
aligner that exploits CUDA-compatible graphics<br />
hardware as accelerators to achieve high speed. It<br />
employs a quality-aware bounded search<br />
approach based on the Burrows-Wheeler<br />
transform (BWT) and the FM-index to reduce the<br />
search space and achieve high alignment quality.<br />
Performance evaluation reveals that CUSHAW<br />
running on one or two <strong>GPU</strong>s achieves significant<br />
speedups in terms of execution time, while<br />
yielding comparable or even better alignment<br />
quality for paired-end alignments compared to<br />
three popular BWT-based aligners: Bowtie, BWA<br />
and SOAP2 (availability: http://cushaw.sourceforge.net).<br />
Contact: Bertil Schmidt (Johannes Gutenberg<br />
University Mainz)<br />
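The FM-index lets an aligner count and locate exact occurrences of a read by processing it right to left (backward search), using only the BWT, a table C of character counts and the occurrence counts Occ. The sketch below shows exact-match backward search only; CUSHAW's quality-aware bounded search, which tolerates mismatches, is not reproduced, and the naive suffix-array construction is meant for small illustrative strings, not genomes.

```python
def bwt_index(text):
    """Build suffix array, C table and Occ table for text + '$' (naive, O(n^2 log n))."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)       # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    C, running = {}, 0                           # C[c]: characters smaller than c
    for c in alphabet:
        C[c] = running
        running += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}   # occ[c][i]: c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)                          # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


print(backward_search("ana", *bwt_index("banana")))  # [1, 3]
```

Each step narrows a suffix-array interval using two table lookups, which is why the method maps well to many independent GPU threads, one per read.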
MACHINE LEARNING & AI<br />
ML01 - Accelerating Parallel Monte Carlo Tree<br />
Search Using CUDA<br />
The poster presents a parallel implementation of the<br />
Monte Carlo Tree Search algorithm on the <strong>GPU</strong> using<br />
CUDA. It is run on the Tesla-equipped TSUBAME<br />
supercomputer and the results show that in a<br />
2-player game such as Reversi, the <strong>GPU</strong> version is<br />
much stronger than the CPU one. Additionally, it<br />
can be easily scaled to thousands of <strong>GPU</strong> cores.<br />
The scalability factors are presented.<br />
Contact: Kamil Rocki (University of Tokyo)<br />
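Monte Carlo Tree Search typically chooses which child to explore with the UCB1 rule (as in UCT), balancing a move's observed win rate against how rarely it has been tried. One common way to use many GPU threads or blocks is root parallelism, in which independent trees are searched and their per-move statistics are summed; the abstract does not say which scheme the poster uses, so treat this as a generic sketch. The exploration constant and the move names are illustrative.

```python
import math


def ucb1_select(children, c=math.sqrt(2)):
    """Pick the child maximizing wins/visits + c*sqrt(ln N / visits).

    children: list of (wins, visits); N is approximated by the sum of child visits.
    Unvisited children are tried first.
    """
    parent_visits = sum(visits for _, visits in children)
    best, best_score = None, -math.inf
    for idx, (wins, visits) in enumerate(children):
        if visits == 0:
            return idx
        score = wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = idx, score
    return best


def merge_root_stats(trees):
    """Root parallelism: sum per-move (wins, visits) from independent searches."""
    merged = {}
    for stats in trees:
        for move, (wins, visits) in stats.items():
            w, v = merged.get(move, (0, 0))
            merged[move] = (w + wins, v + visits)
    return merged


# Usage: combine two independent searches, then play the most-visited move.
merged = merge_root_stats([{"d3": (6, 10), "c4": (2, 5)}, {"d3": (4, 8), "f5": (1, 2)}])
print(max(merged, key=lambda mv: merged[mv][1]))  # d3
```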
ML02 - Message Passing Parallelism for Belief<br />
Propagation in Junction Trees<br />
Belief propagation over junction trees is known to be<br />
computationally intensive in the general case. One<br />
way of addressing this computational challenge is<br />
to use parallel computing on the <strong>GPU</strong>. In this paper, we<br />
develop a two-dimensional parallel computing<br />
model for node-level message passing. Based on<br />
this approach, we further develop a novel clique<br />
merging technique that leverages the two<br />
dimensions of parallelism to adapt various<br />
Bayesian networks (BNs) to the parallel computing platform.<br />
We implement our approach on an NVIDIA <strong>GPU</strong> and<br />
test it using BNs from several applications.<br />
Contact: Lu Zheng (Carnegie Mellon)<br />
ML03 - Parallel Memetic Algorithm<br />
Implementation on CUDA<br />
In this poster, a parallel memetic algorithm<br />
implementation for CUDA platform is described.<br />
Conventional genetic operators are adapted to<br />
the <strong>GPU</strong> architecture. In this<br />
population-based optimization technique, there are<br />
one or more islands, and each island consists of a<br />
constant number of individuals. Each CUDA thread<br />
is responsible for evolution of one individual, and<br />
islands are mapped as CUDA blocks to benefit from<br />
the shared memory. The results show up to 38x<br />
speedup compared to the CPU implementation.<br />
Contact: Alptekin Temizel (Middle East Technical<br />
University)
ML04 - <strong>GPU</strong>-Accelerated Action Acquisition<br />
Through Multiple Time Scales Recurrent<br />
Neural Network<br />
This poster presents novel results of complex action<br />
learning experiments based on the use of extended<br />
multiple timescales recurrent neural<br />
networks (MTRNNs). The experiments were carried<br />
out with the iCub humanoid robot, as a model of the<br />
developmental learning of motor primitives as the<br />
basis of sensorimotor and linguistic<br />
compositionality. The model was implemented<br />
through the <strong>GPU</strong>-accelerated Aquila cognitive<br />
robotics toolkit. The results presented herein show<br />
that the model was able to learn and successfully<br />
reproduce multiple actions in an object manipulation<br />
task scenario using large-scale MTRNNs. This<br />
forms the basis of ongoing experiments on action<br />
and language compositionality.<br />
Contact: Martin Peniak (Plymouth University)<br />
MACHINE VISION<br />
MV01 - <strong>GPU</strong> Based Fast Block Matching Using<br />
Orthogonal Thread Transformation<br />
The block matching (BM) technique is extensively used<br />
in object tracking and defect detection problems.<br />
BM has moderate accuracy for defect detection, but<br />
it suffers from heavy performance drawbacks.<br />
Modifications of BM that trade accuracy for<br />
increased performance have been reported in<br />
the literature. BM is an exhaustive search<br />
technique, but it is also highly data-parallel<br />
in nature. We present an implementation of the<br />
BM algorithm in CUDA using a novel orthogonal<br />
thread transformation technique to maintain<br />
data parallelism throughout the processing. We<br />
have achieved a 350x speedup over the CPU and 2.3x<br />
over other <strong>GPU</strong> implementations.<br />
Contact: Sudhakar Sah (University of Oregon)<br />
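Exhaustive block matching compares a block from the current frame against every candidate position within a search radius in the reference frame, usually with a sum-of-absolute-differences (SAD) cost. Each candidate displacement is independent, which is what makes the method data-parallel. The plain-Python sketch below shows only the baseline full search; the poster's orthogonal thread transformation is not reproduced.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(size) for j in range(size))


def full_search(ref, cur, cy, cx, size, radius):
    """Best (cost, dy, dx) for the block of `cur` at (cy, cx), searching +/-radius in `ref`."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):          # every candidate is independent,
        for dx in range(-radius, radius + 1):      # so a GPU can give each one a thread
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cost = sad(ref, cur, ry, rx, cy, cx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best


# Usage: a bright 2x2 patch sits one row and one column further down-right in ref.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        ref[4 + i][5 + j] = 9
        cur[3 + i][4 + j] = 9
print(full_search(ref, cur, 3, 4, 2, 2))  # (0, 1, 1)
```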
MV02 - Integrating Machine Vision and<br />
Kinematics for a Robotic EV Charger<br />
This poster explains the use of the Tegra 2 ULP<br />
GeForce <strong>GPU</strong> for integrated machine vision and<br />
inverse kinematics (IK) on a robotic (SCARA)<br />
electric vehicle charging system. The<br />
convergence of wireless technology, mobile chipsets<br />
and powerful software environments enabled the<br />
smartphone revolution. Applying these economies,<br />
with the addition of powerful imaging and GP<strong>GPU</strong><br />
capabilities, enables a low-cost, high-performance,<br />
easily engineered, embedded machine-to-machine<br />
(M2M) solution to an emergent problem<br />
in vehicle transportation. The result:<br />
PowerHydrant® eliminates electric vehicle<br />
charging inconvenience. Robotic conductive<br />
chargers beat wireless inductive chargers on<br />
efficiency, charging time and constraint-free use.<br />
Contact: Kevin Leary (PowerHydrant)<br />
MEDICAL IMAGING & VISUALIZATION<br />
MI01 - Optimal Speed Gain for CUDA<br />
Implementation of SPECT Image Reconstruction<br />
<strong>GPU</strong> implementation can greatly accelerate<br />
iterative techniques of 3D image reconstruction in<br />
nuclear medicine imaging. To obtain high quality<br />
images in Single Photon Emission Computed<br />
Tomography (SPECT) within reduced scanning<br />
times, high sensitivity collimators need to be used<br />
and their response function modeled in the<br />
reconstruction. This is in general very<br />
computationally intensive and infeasible with<br />
conventional PCs and algorithm implementations.<br />
Our software is able to perform the reconstruction<br />
of patient data within clinically acceptable times<br />
(18 s vs 17 min on CPU) using relatively low cost<br />
and widely available hardware.<br />
Contact: Jakub Pietrzak (RCPE-TU GRAZ)<br />
MI02 - Accelerating Mutual Information<br />
Computation for Nonrigid Registration on the <strong>GPU</strong><br />
Nonrigid registration is a technique for defining a<br />
geometric relation between corresponding points in images.<br />
Although this technique helps medical doctors<br />
detect cancers by monitoring changes in size,<br />
some registration algorithms cannot be efficiently<br />
implemented on the <strong>GPU</strong> due to its small shared memory. The<br />
main objective of this poster is to show how such a<br />
capacity issue can be tackled for intra-operative<br />
registration. As an example, we present a CUDA-based<br />
method capable of rapidly computing joint<br />
histograms using shared memory. Our method<br />
achieved a three-fold speedup by exploiting the<br />
sparse structure of joint histograms, with<br />
successful registration of liver CT datasets.<br />
Contact: Kei Ikeda (Osaka University)<br />
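Mutual information is computed from the joint intensity histogram of the two images, MI = Σ p(x,y) log₂[p(x,y) / (p(x)p(y))], and only non-empty bins contribute to the sum. As a minimal sketch, assuming intensities are already quantized to bin indices, the code below stores only non-empty bins in a dictionary; it illustrates why sparsity helps but is not the authors' shared-memory scheme.

```python
import math
from collections import Counter


def sparse_joint_histogram(a, b):
    """Joint histogram of two quantized images, storing only non-empty bins."""
    return Counter(zip(a, b))


def mutual_information(hist):
    """MI in bits from a sparse joint histogram {(x, y): count}."""
    total = sum(hist.values())
    px, py = Counter(), Counter()
    for (x, y), count in hist.items():
        px[x] += count
        py[y] += count
    # p(x,y) / (p(x) p(y)) simplifies to count * total / (px * py).
    return sum(count / total * math.log2(count * total / (px[x] * py[y]))
               for (x, y), count in hist.items())


fixed = [0, 1, 0, 1]
print(mutual_information(sparse_joint_histogram(fixed, fixed)))  # 1.0
```

For well-aligned images most of the mass sits near the diagonal of the joint histogram, so the number of non-empty bins is far smaller than bins², which is the kind of sparsity a compact on-chip representation can exploit.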
MI03 - CUDA Accelerated Real Time Steered<br />
Spatial Compounding in Diagnostic Ultrasound<br />
Spatial compounding is a real-time transmit and<br />
receive beam steering technique that acquires<br />
images from multiple lines of sight to increase the<br />
information content in medical ultrasound<br />
images. This function is implemented in the latest<br />
release of the ACUSON SC2000 platform for high<br />
frequency vascular imaging using CUDA texture<br />
lookups for geometric transformation to a<br />
common view. CUDA and the Quadro 2000 enable<br />
a substantial increase in processing performance<br />
(>8×) over conventional CPU based processing.<br />
Contact: Ismayil Guracar (Siemens Healthcare,<br />
Ultrasound Business Unit)<br />
MI04 - Ultrafast Multipinhole SPECT<br />
Iterative Reconstruction Using CUDA-Based<br />
<strong>GPU</strong> Computing<br />
We have developed an ultrafast SIR method for<br />
multipinhole SPECT programmed in CUDA and<br />
tested using a high-performance graphics<br />
processing unit. We show significant performance<br />
improvement in reconstruction using both<br />
computer-generated and experimental<br />
sinograms, demonstrating up to a fifty-fold<br />
speed enhancement with virtually the same<br />
accuracy as the CPU-based SIR (with 0.15%<br />
normalized root mean square error).<br />
Contact: Fares Alhassen (University of California,<br />
San Francisco)<br />
MOBILE APPLICATIONS & INTERFACES<br />
MA01 - Accelerating Computer Vision with<br />
Tegra <strong>GPU</strong><br />
The mobile platform is quickly becoming a serious<br />
computing device, capable of tackling complex<br />
computer vision tasks. The ability to share memory<br />
space between <strong>GPU</strong> and CPU on Tegra 3 offers a<br />
unique opportunity to utilize the <strong>GPU</strong> without<br />
expensive memory copies. We have demonstrated<br />
that accelerating just a handful of compute-intensive<br />
CV operations on the Tegra 3 <strong>GPU</strong> can<br />
relieve some common bottlenecks and achieve real-time<br />
performance on Video Stabilization and<br />
Panoramic Stitching applications.<br />
Contact: Colin Tracey (NVIDIA)<br />
MOLECULAR DYNAMICS<br />
MD01 - <strong>GPU</strong>-Based Molecular Dynamic<br />
Simulations Optimized with CUDPP and CURAND<br />
Libraries<br />
Computer simulations are indispensable tools for<br />
deciphering how biomolecular structures and<br />
folding correspond to functions. These simulations<br />
benefit greatly from advances in parallel<br />
computing (e.g., <strong>GPU</strong>s) because the<br />
force calculations are inherently independent.<br />
However, a major limitation of <strong>GPU</strong>s is that the<br />
transfer of data between the CPU and <strong>GPU</strong> must be<br />
minimized. We introduce a new algorithm for<br />
calculating neighbor lists and transferring them to<br />
<strong>GPU</strong>s with minimal memory transfer. This<br />
algorithm is readily implemented with CUDPP and<br />
CURAND libraries. Using simulations of the<br />
ribosome, we observe a significant improvement in<br />
the performance, which is system size dependent.<br />
Contact: Tyson Lipscomb (Wake Forest University)<br />
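For reference, a standard way to build neighbor lists is a cell list: particles are binned into cells at least one cutoff wide, so each particle only needs to be compared with particles in its own and the 26 adjacent cells. The sketch assumes a cubic periodic box and the minimum-image convention; it is the textbook technique, not the new algorithm or the CUDPP-based transfer scheme described in the poster.

```python
import itertools
import math
from collections import defaultdict


def min_image_distance(a, b, box):
    """Distance between two points under periodic boundaries (minimum image)."""
    deltas = [(p - q) - box * round((p - q) / box) for p, q in zip(a, b)]
    return math.sqrt(sum(d * d for d in deltas))


def neighbor_list(positions, cutoff, box):
    """All pairs (i, j), i < j, closer than `cutoff` in a cubic periodic box."""
    ncell = max(1, int(box // cutoff))       # cells are at least `cutoff` wide
    cell_size = box / ncell
    cells = defaultdict(list)
    for idx, point in enumerate(positions):
        key = tuple(int(c // cell_size) % ncell for c in point)
        cells[key].append(idx)
    pairs = set()                            # a set absorbs duplicates when ncell < 3
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            other = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and min_image_distance(positions[i], positions[j], box) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)


atoms = [(0.1, 0.1, 0.1), (9.9, 0.1, 0.1), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
print(neighbor_list(atoms, cutoff=1.5, box=10.0))  # [(0, 1), (2, 3)]
```

The first pair is found only through the periodic wrap, which is why the cell index and the distance both use modular arithmetic.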
MD02 - Plane Wave Pseudopotential Density<br />
Functional Theory Calculations on <strong>GPU</strong> Clusters<br />
In this poster, we present our implementation of the<br />
density functional theory (DFT) plane wave pseudopotential<br />
(PWP) calculation on <strong>GPU</strong> clusters. This<br />
<strong>GPU</strong> version is developed based on a CPU DFT-PWP<br />
code: PEtot. Our test indicates that the <strong>GPU</strong> version<br />
can have a ~10 times speed-up over the CPU version<br />
and is about 5 times faster than the legendary VASP<br />
code. An analysis of the speed-up and of the scaling with<br />
the number of CPU/<strong>GPU</strong> computing units (up to 256)<br />
is presented. The success of our speed-up relies<br />
on a hybrid reciprocal-space and band-index<br />
parallelization scheme.<br />
Contact: WeiLe Jia (Supercomputing Center of<br />
CNIC, Chinese Academy of Sciences)<br />
MD03 - Single vs. Double Precision MD<br />
Simulations: Correlation is Length-Scale<br />
Dependent<br />
This poster evaluates how single vs. double<br />
precision operations affect Molecular Dynamics<br />
simulations using a <strong>GPU</strong>-optimized MD simulation<br />
software by performing coarse-grained MD<br />
simulations of many biologically relevant systems of<br />
various sizes. Three different measures of structural<br />
similarity are used to analyze the structure of<br />
trajectories and to determine when single-precision<br />
calculations would be appropriate and when they would<br />
not. We conclude that single-precision<br />
implementations of MD simulations, which run faster,<br />
make no significant difference to<br />
the accuracy and precision of MD simulations if the<br />
system size is sufficiently large.<br />
Contact: Anqi Zou (Wake Forest University)<br />
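A small illustration of why precision matters in long accumulations: with a 24-bit significand, float32 cannot represent 2²⁴ + 1, so adding small contributions to a large running total can lose them entirely. The sketch emulates float32 rounding with Python's `struct` module; it does not reproduce the poster's structural-similarity analysis.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single):
    """Running sum in emulated single precision or native double precision."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v)) if single else total + v
    return total


contributions = [2.0 ** 24] + [1.0] * 4
print(accumulate(contributions, single=False))  # 16777220.0
print(accumulate(contributions, single=True))   # 16777216.0
```

Each single-precision step computes 16777217, which rounds back to 16777216 under round-to-nearest-even, so all four small contributions vanish.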
MD04 - <strong>GPU</strong>-Based Monte Carlo Simulations for<br />
Canonical and Gibbs Ensembles<br />
Markov Chain Monte Carlo (MCMC) simulation of<br />
chemical systems allows examination of<br />
nanoscopic thermodynamics and associated<br />
behavior at small time scales. These simulations<br />
tend to be computationally expensive, requiring<br />
days or more of CPU time to collect data.<br />
Optimization work is essential in order to remedy<br />
the inherent time complexity of these simulations.<br />
To date, there is no multi-ensemble molecular<br />
MCMC engine for the simulation of chemical<br />
systems that leverages <strong>GPU</strong>s. Speedups of 6.3x<br />
and 14.4x were achieved for a problem size of<br />
131,072 particles for the canonical and Gibbs<br />
ensemble implementations, respectively.<br />
Contact: Loren Schwiebert (Wayne State University)<br />
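The basic step of a canonical-ensemble (NVT) Metropolis Monte Carlo engine is the acceptance test: a trial move that changes the energy by ΔE is accepted with probability min(1, exp(-βΔE)), where β = 1/(k_B T). The sketch below shows only this rule; Gibbs-ensemble simulations add volume-exchange and particle-transfer moves with their own acceptance rules, which are not shown.

```python
import math
import random


def metropolis_accept(delta_energy, beta, rng=random):
    """Canonical-ensemble Metropolis test: accept with prob. min(1, exp(-beta * dE))."""
    if delta_energy <= 0:
        return True                      # downhill moves are always accepted
    return rng.random() < math.exp(-beta * delta_energy)


# Usage: estimate the acceptance rate of an uphill move with beta * dE = 1.
gen = random.Random(42)
trials = 100_000
accepted = sum(metropolis_accept(1.0, 1.0, gen) for _ in range(trials))
print(accepted / trials)  # approximately 0.37, i.e. exp(-1)
```

On a GPU, the energy change for many trial moves (or many independent chains) is where the parallel work lies; the acceptance test itself is cheap.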
MD05 - Simultaneous Evolution of Multiple<br />
Molecular Dynamics Simulations<br />
The need to generate statistically significant data<br />
from time intensive molecular dynamics (MD)<br />
simulations drives the search for algorithms that<br />
can take advantage of inherent parallelism in<br />
computer architectures. CUDA is an ideal platform<br />
for performing multiple MD simulations for<br />
ensemble averaging. We demonstrate a proof of<br />
concept highlighting the potential of CUDA in<br />
performing multiple MD simulations with different<br />
initial conditions. Compared to the traditional<br />
implementation, the CUDA version delivers the output<br />
ten times faster. Work is in progress to improve<br />
the performance through memory optimization.<br />
Contact: Cory Slep (NC State University)<br />
MD06 - <strong>GPU</strong> Accelerated Molecular Dynamics<br />
Enabling Transformative Drug Development<br />
One powerful computational technique for the<br />
science of drug development has been the use of<br />
molecular dynamics (MD) simulations. MD<br />
simulations can explore the interactions between<br />
small molecule drugs and membrane-bound<br />
proteins on an atomic level. It is now possible to<br />
understand the biological function of drug targets<br />
through their structural motions. <strong>GPU</strong> computing<br />
is revolutionizing the field of MD, with <strong>GPU</strong><br />
accelerated MD code competing with national<br />
supercomputers. Our research goal is to use <strong>GPU</strong><br />
technology not only to improve MD performance<br />
but also to improve MD development and workflow for<br />
drug development.<br />
Contact: Benjamin Madej (University of California<br />
San Diego, San Diego Supercomputer Center)<br />
NEUROSCIENCE<br />
NS01 - Realtime Cerebellum: Realtime<br />
Simulation of a Realistic Cerebellar Model<br />
Realtime computing is a natural requirement for<br />
realtime signal processing and control. The<br />
cerebellum plays an essential role in motor<br />
learning and control. A cerebellar<br />
model running in realtime could be<br />
used as a neural controller of hardware such as<br />
robots. We built a large-scale spiking network<br />
model of the cerebellum composed of more than<br />
100,000 neurons that runs in realtime. We<br />
succeeded in controlling a humanoid robot to hit a<br />
ball thrown by a pitching machine through online<br />
learning of the proper timing to swing a bat.<br />
Contact: Tadashi Yamazaki (RIKEN Brain<br />
Science Institute)<br />
NS02 - Computational Modeling of Human Head<br />
Electromagnetics Using <strong>GPU</strong>s<br />
This poster presents ACSON, a computational environment<br />
that leverages <strong>GPU</strong> technology to<br />
accelerate the solution of the EEG forward problem,<br />
which is necessary to solve the neuroimaging<br />
inverse problem. Two finite difference algorithms,<br />
ADI and VAI, for solving the Poisson equation are<br />
presented. The ADI algorithm can only handle<br />
isotropic conductivities of the head tissue, while VAI<br />
can handle anisotropic conductivities as well. Their<br />
performance on different <strong>GPU</strong>s is evaluated and<br />
compared with an OpenMP implementation.<br />
Contact: Allen D. Malony (University of Oregon)<br />
PARALLEL PROGRAMMING LANGUAGES<br />
& COMPILERS<br />
PC01 - Automatic Mapping of Shared Memory<br />
<strong>Program</strong>s to <strong>GPU</strong>-Based Heterogeneous Systems<br />
Realizing the potential of <strong>GPU</strong>-based<br />
heterogeneous systems is challenging due to the<br />
complexity of programming. We have developed a<br />
compiler-based approach to automatically generate<br />
optimised OpenCL code from shared memory<br />
OpenMP programs. A key feature of our scheme is<br />
that it leverages existing transformations, especially<br />
data transformations, to improve performance on<br />
<strong>GPU</strong> architectures. As not all programs are suitable<br />
for <strong>GPU</strong> execution, it uses predictive modeling to<br />
automatically determine if it is worthwhile running<br />
the OpenCL code on the <strong>GPU</strong> or OpenMP code on<br />
the multi-core host.<br />
Contact: Dominik Grewe (University of Edinburgh)<br />
PC02 - GKLEE: Practical Concolic Verification<br />
and Test Generation for <strong>GPU</strong>s<br />
We provide a new framework called GKLEE that can<br />
analyze C++ <strong>GPU</strong> programs, locating important<br />
correctness and performance bugs. For these<br />
programs, GKLEE can also automatically generate<br />
tests that provide high coverage, and these tests<br />
can later be run on the hardware to cross-check<br />
results. It helps pin-point memory accesses and<br />
execution steps that cause performance<br />
degradation. It also provides a versatile user<br />
interface. GKLEE has detected bugs and issues in<br />
many CUDA SDK kernels, and also has been able to<br />
handle non-trivial multi-kernel examples.<br />
Contact: Peng Li (School of Computing, University<br />
of Utah)<br />
PC03 - <strong>GPU</strong> Ocelot: Dynamic Compilation for PTX<br />
<strong>GPU</strong> Ocelot is an open-source dynamic JIT<br />
compilation framework for <strong>GPU</strong> compute<br />
applications targeting a range of <strong>GPU</strong> and non-<strong>GPU</strong><br />
execution targets. Ocelot supports CUDA<br />
applications and provides an implementation of the<br />
CUDA Runtime API enabling seamless integration<br />
with existing CUDA applications. Its JIT compiler<br />
supports four backend execution targets: (1) an<br />
emulator that implements NVIDIA’s Parallel Thread<br />
Execution (PTX) instruction set architecture, (2)<br />
NVIDIA <strong>GPU</strong>s, (3) AMD <strong>GPU</strong>s, and (4) a translator to<br />
LLVM for efficient parallel execution of <strong>GPU</strong> kernels<br />
on multicore CPUs.<br />
Contact: Andrew Kerr (Georgia Institute<br />
of <strong>Technology</strong>)<br />
PC04 - Legion: Expressing Locality and<br />
Independence with Logical Regions<br />
Modern parallel architectures have both<br />
heterogeneous processors and deep, complex<br />
memory hierarchies. We present Legion, a<br />
programming model and runtime system for<br />
programming these machines. Legion is<br />
organized around logical regions, which express<br />
both locality and independence of program data.<br />
Legion also enables explicit, programmer-controlled<br />
movement of data through the memory<br />
hierarchy and placement of tasks based on locality<br />
information via a novel mapping interface.<br />
Running on a four-node cluster with eight <strong>GPU</strong>s in total and<br />
four levels of memory hierarchy, our implementation<br />
of Legion achieves a 5.9X speedup over a single<br />
CPU-<strong>GPU</strong> node on real-world applications.<br />
Contact: Michael Bauer (Stanford University)<br />
PC05 - Compilation Techniques for Demand-<br />
Driven Execution on Heterogeneous<br />
Architectures<br />
To leverage massive parallelism, there has<br />
been a resurgence of demand-driven programming<br />
models. The goal of this work is to develop<br />
compilation techniques and language extensions<br />
for existing imperative parallel programming<br />
languages so that their programs can be mapped onto<br />
heterogeneous parallel architectures. In particular,<br />
this work addresses the following topics: automatic<br />
generation of task graphs from explicitly parallel<br />
loops, programming language extensions that express<br />
ordering constraints between sections<br />
of code, and the mapping of data and computation<br />
onto massively parallel architectures.<br />
Contact: Albert Sidelnik (Globo Network)<br />
PC06 - DL: A Data Layout Transformation System<br />
for Heterogeneous Computing<br />
DL combines two contributions. First, it offers a novel<br />
approach to laying out arrays of aggregate types across<br />
<strong>GPU</strong> and CPU architectures that improves memory<br />
parallelism and kernel performance beyond what<br />
human programmers achieve today using discrete<br />
arrays. The proposed layout can be<br />
derived in situ from the traditional Array of<br />
Structures, Structure of Arrays, and adjacent<br />
Discrete Arrays layouts used by programmers.<br />
Second, DL provides novel in-place layout conversion<br />
algorithms, implemented as part of a run-time<br />
library for OpenCL, that transparently convert<br />
data to accommodate application components<br />
that have different data layout requirements.<br />
Contact: I-Jui Sung (University of Illinois at<br />
Urbana-Champaign)<br />
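To make the layout terms above concrete, here is a minimal Python sketch of converting an Array of Structures (AoS) into a Structure of Arrays (SoA) and back. It is an illustration under stated assumptions, not DL's algorithm: it converts out of place into new lists, whereas DL's conversion runs in place, and the `Particle` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Particle:
    # Hypothetical aggregate type: in AoS form, one record's fields sit together.
    x: float
    y: float
    mass: float


def aos_to_soa(records: List[Particle]) -> Dict[str, List[float]]:
    """Gather each field into its own contiguous list (out of place)."""
    return {
        "x": [r.x for r in records],
        "y": [r.y for r in records],
        "mass": [r.mass for r in records],
    }


def soa_to_aos(fields: Dict[str, List[float]]) -> List[Particle]:
    """Inverse conversion: rebuild one record per index."""
    return [
        Particle(x, y, m)
        for x, y, m in zip(fields["x"], fields["y"], fields["mass"])
    ]


aos = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]
soa = aos_to_soa(aos)
print(soa["x"])  # prints [0.0, 3.0]: all x values are now adjacent
```

In SoA form, consecutive GPU threads reading `x[i]` touch consecutive addresses, which favors coalesced memory access; the poster's own proposed layout is a different arrangement that this sketch does not reproduce.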
RAY TRACING<br />
RT01 - Searching for Cold Trapped Resources in<br />
the Lunar Regolith<br />
Our poster describes a ray tracing technique<br />
applied to the latest digital elevation models of the<br />
Moon in an effort to find permanent shadows<br />
where water ice may be cold trapped. Some of the<br />
shadows we found are characterized using surface<br />
temperature measurements from the Diviner<br />
mid-infrared radiometer on the Lunar<br />
Reconnaissance Orbiter.<br />
Contact: Andy McGovern (Irish Centre for High-End<br />
Computing (ICHEC))<br />
SUPERCOMPUTING<br />
SC01 - Multi-<strong>GPU</strong> Computing<br />
Our poster details several projects that make<br />
multi-<strong>GPU</strong> computing easy. It presents our work on<br />
a callback method for <strong>GPU</strong>s (presented at<br />
UCHPC 2010), a message-passing interface for <strong>GPU</strong>s<br />
(IPDPS 2009), a heterogeneous computational-resource<br />
scheduler (EG 2009), and a multi-<strong>GPU</strong><br />
MapReduce implementation (IPDPS 2011).<br />
Contact: Jeffery Stuart (UC Davis)<br />
SC02 - Automatic Generation of FFT Libraries<br />
for <strong>GPU</strong>s<br />
In this poster we present an extension of the<br />
Spiral code generation system to <strong>GPU</strong>s. We<br />
address the key problems of <strong>GPU</strong> memory<br />
hierarchy and parallelism, and we introduce a<br />
variety of FFT algorithms that avoid shared<br />
memory bank conflicts without wasting space on<br />
padding and that optimize global memory<br />
bandwidth with minimal register<br />
allocation, even at low occupancy. We demonstrate<br />
high performance against cuFFT for single-precision<br />
1-D and 2-D DFTs. This research is still<br />
in progress, but at the moment we are able to<br />
match or beat the cuFFT library on the sizes for which we have<br />
generated optimized code.<br />
Contact: Christos Angelopoulos (Carnegie<br />
Mellon University)<br />
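The abstract does not say how its FFTs avoid bank conflicts without padding. One widely used padding-free technique, shown here only as an illustration and not necessarily what Spiral generates, is XOR swizzling of shared-memory indices. The Python sketch below assumes 32 banks of 4-byte words (typical of NVIDIA GPUs of this generation) and a hypothetical 32 x 32 tile of floats, and counts how many distinct words land in the same bank when a warp of 32 threads reads one column.

```python
from collections import defaultdict

NUM_BANKS = 32  # assumption: 32 shared-memory banks, 4-byte words
TILE = 32       # hypothetical 32 x 32 tile of 4-byte floats


def naive_index(row: int, col: int) -> int:
    # Row-major word index with no padding.
    return row * TILE + col


def swizzled_index(row: int, col: int) -> int:
    # XOR swizzle: permute columns within each row by the row number.
    # Still uses exactly TILE * TILE words, so no space is wasted.
    return row * TILE + (col ^ row)


def conflict_degree(indices) -> int:
    """Largest number of distinct words that fall in the same bank."""
    words_per_bank = defaultdict(set)
    for word in indices:
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


# A warp of 32 threads reads column 5: thread t reads element (t, 5).
column = 5
naive = [naive_index(t, column) for t in range(TILE)]
swizzled = [swizzled_index(t, column) for t in range(TILE)]
print(conflict_degree(naive), conflict_degree(swizzled))  # prints 32 1
```

Because `col ^ row` is a permutation of 0..31 within each row, row-wise reads stay conflict-free as well, and the tile needs no extra padding column.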
SC03 - Computational and Simulation Sciences:<br />
Applications of Heterogeneous Computing<br />
As the size and complexity of scientific problems<br />
grow, scientists from a broad range of disciplines<br />
are relying more on computational methods<br />
and simulations to help solve their problems. This<br />
work presents a summary of heterogeneous<br />
algorithms and applications that have been<br />
developed by CSIRO for solving practical and<br />
challenging science problems faster than is<br />
possible with conventional multi-core CPUs alone.<br />
The problem domains include: CFD, imaging and<br />
visualization, advanced materials modeling,<br />
computational biology, geosciences and climate<br />
research. The algorithms utilize NVIDIA <strong>GPU</strong>s<br />
and multi-core CPUs on a scale ranging from<br />
single workstation installations through to large<br />
<strong>GPU</strong> clusters.<br />
Contact: Tomasz Bednarz (CSIRO)<br />
SC04 - 75-Round SHA-1 Collision Search Using<br />
<strong>GPU</strong> Clusters<br />
SHA-1 is one of the most widely used<br />
cryptographic hash functions. We ported the method of<br />
characteristics for SHA-1 collision search to<br />
<strong>GPU</strong> clusters. Using it, we found a collision for<br />
the 75-round version of SHA-1, which is currently the<br />
world record.<br />
Contact: Andrew Adinetz (Lomonosov Moscow<br />
State University)<br />
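For readers unfamiliar with the term, a "75-round" SHA-1 is commonly taken to mean the compression function truncated to its first 75 of 80 steps, with the final feed-forward addition kept. The Python sketch below assumes that convention (the abstract does not define it) and only computes the step-reduced hash; it does not implement the method of characteristics or any collision search. The function name `sha1_reduced` is hypothetical.

```python
import struct


def _rotl(x: int, n: int) -> int:
    # 32-bit left rotation.
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_reduced(message: bytes, steps: int = 80) -> str:
    """SHA-1 with each compression call truncated to the first `steps` steps."""
    if not 1 <= steps <= 80:
        raise ValueError("steps must be between 1 and 80")
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Standard padding: 0x80, zeros, then the 64-bit big-endian bit length.
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_len)

    for offset in range(0, len(padded), 64):
        # Message expansion: 16 input words grow to 80 schedule words.
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 80):
            w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(steps):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (_rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, _rotl(b, 30), c, d

        # Feed-forward: add the chaining value back in (kept in reduced variants).
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{x:08x}" for x in h)
```

With `steps=80` the function reproduces standard SHA-1, which is a quick way to check the step function before truncating it to 75 steps.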
SC05 - <strong>GPU</strong> Clusters for Large-Scale Analysis of<br />
X-Ray Scattering Data<br />
X-ray scattering is a valuable tool for measuring<br />
the structural properties of materials used in the<br />
design and fabrication of energy-relevant<br />
nanodevices. A primary challenge is<br />
analyzing the data, given its generation rate and<br />
size. We are developing novel HPC algorithms and<br />
codes for such analyses. Here we present two<br />
advances using <strong>GPU</strong>s. First, a flexible Grazing Incidence<br />
Small Angle Scattering simulation code that<br />
can compute the scattered light intensity from any<br />
given sample in all directions of space. Second, an<br />
efficient inverse modeling code for structural<br />
fitting problems using the Reverse Monte Carlo (RMC)<br />
simulation algorithm.<br />
Contact: Abhinav Sarje (Wayne State University)
VISUALIZATION<br />
VZ01 - CNC Tool Path Planning and Machining<br />
Simulation on <strong>GPU</strong><br />
Today, a major part of the cost of low-volume manufacturing<br />
involving CNC machining is the cost of tool<br />
path planning performed by an engineer. The goal<br />
of this research is to develop an automatic CNC<br />
machine tool path planning and simulation<br />
system. To achieve reasonable<br />
performance, we use a GP<strong>GPU</strong> approach for<br />
geometry processing and propose a<br />
new solid geometry representation designed specifically<br />
for parallel processing on GP<strong>GPU</strong>s. This representation<br />
will become the basis for a new automatic tool<br />
path planning system and will also significantly<br />
increase the speed and accuracy of machining<br />
process simulation.<br />
Contact: Dmytro Konobrytskyi (Clemson University)<br />
VZ02 - <strong>GPU</strong>-Accelerated Power System<br />
State Visualization<br />
Modern energy management systems aim to<br />
provide situational awareness to grid operators<br />
using a variety of tools. Advances in technology<br />
such as high-frequency data from phasor<br />
measurement units distributed across the system<br />
support the display and analysis of the dynamic<br />
state of the power grid. Scattered data interpolation<br />
is a computationally intensive problem that benefits<br />
massively from parallel implementations on <strong>GPU</strong>s.<br />
This poster presents a highly optimized network<br />
state visualization system that fully exploits<br />
programmable graphics hardware and delivers<br />
a three-orders-of-magnitude performance<br />
improvement over a traditional, CPU-based approach<br />
while offering additional features.<br />
Contact: Martin Naef (NVIDIA)<br />
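The abstract does not name its interpolation scheme. As an illustration only, the Python sketch below uses inverse distance weighting (Shepard's method), a common scattered-data interpolator, to estimate a grid quantity at every pixel of a small image from a few measurement sites. The site coordinates and per-unit voltage values are hypothetical. Every output pixel is computed independently, which is the property that lets this kind of problem map onto one GPU thread per pixel.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, measured value)


def idw(samples: List[Sample], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance weighting: a weighted mean with weights 1 / distance**power."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample: return it unchanged
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


def render(samples: List[Sample], width: int, height: int) -> List[List[float]]:
    # Each pixel is independent of every other, so this double loop is the
    # part a GPU version would launch as one thread per pixel.
    return [[idw(samples, px, py) for px in range(width)] for py in range(height)]


# Hypothetical per-unit voltage readings at three measurement sites.
sites = [(0.0, 0.0, 1.00), (4.0, 0.0, 0.96), (0.0, 3.0, 1.02)]
image = render(sites, width=5, height=4)
```

IDW always returns a weighted average of the measured values, so the rendered field never overshoots the measurements; smoother schemes such as radial basis functions trade that guarantee for better-behaved gradients.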
VZ03 - Image Treatment Implementing Extended<br />
Depth of Field with NVIDIA CUDA<br />
Extended depth of field (EDF) is a method<br />
used in optics research to analyze and process specific image zones.<br />
Due to the complexity of EDF<br />
and the large volume of data processed in optics<br />
problems, EDF is a good candidate for processing on<br />
parallel architectures. This work presents a parallel<br />
implementation of extended depth of field<br />
using NVIDIA CUDA. We propose an<br />
algorithm that targets a hybrid parallel machine combining a multicomputer cluster<br />
with shared-memory NVIDIA <strong>GPU</strong>s. We also<br />
evaluate performance in terms of execution<br />
time and discuss<br />
this approach.<br />
Contact: Mónica Liliana Hernández Ariza<br />
(Universidad Industrial de Santander)<br />
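One simple way to compute an extended-depth-of-field image, shown here as an assumption since the abstract does not describe its algorithm, is focus stacking: for every pixel, take the value from whichever slice of the focal stack is locally sharpest. The Python sketch below uses the absolute discrete Laplacian as the sharpness measure; the function names and the tiny test stack are hypothetical.

```python
from typing import List

Image = List[List[float]]


def laplacian_magnitude(img: Image, r: int, c: int) -> float:
    """|4*I(r,c) - sum of 4 neighbours|, edges clamped; larger means sharper."""
    h, w = len(img), len(img[0])

    def at(y: int, x: int) -> float:
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return abs(4 * at(r, c) - at(r - 1, c) - at(r + 1, c) - at(r, c - 1) - at(r, c + 1))


def extended_depth_of_field(stack: List[Image]) -> Image:
    """Per pixel, copy the value from the slice with the highest local sharpness."""
    h, w = len(stack[0]), len(stack[0][0])
    result = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # max() keeps the first slice on ties, so ordering the stack matters.
            best = max(range(len(stack)), key=lambda k: laplacian_magnitude(stack[k], r, c))
            result[r][c] = stack[best][r][c]
    return result


flat = [[0.5] * 3 for _ in range(3)]                          # out-of-focus slice
spike = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # in-focus detail
fused = extended_depth_of_field([flat, spike])
```

Practical EDF pipelines often smooth the sharpness map or use multi-scale decompositions to avoid noisy pixel-by-pixel switching; the per-pixel independence of the selection step is what makes it a natural fit for one CUDA thread per pixel.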
VZ04 - Diderot: A Parallel DSL for Image<br />
Analysis and Visualization<br />
The analysis of structure in three-dimensional<br />
images is increasingly important for biomedical<br />
research and computational science. In this<br />
poster, we outline ongoing work developing<br />
Diderot, a parallel domain-specific language for<br />
three-dimensional image visualization and<br />
analysis algorithms, such as volume rendering,<br />
fiber tractography, and particle systems. Diderot<br />
supports a high-level mathematical computation<br />
model coupled with a bulk-synchronous<br />
parallelism model. The poster further describes<br />
Diderot’s <strong>GPU</strong> implementation and its<br />
high performance on <strong>GPU</strong>s compared with<br />
other sequential and parallel platforms.<br />
Contact: Lamont Samuels (Lawrence Berkeley<br />
National Laboratory)<br />
<strong>GTC</strong> <strong>2012</strong><br />
SPEAKERS & PANELISTS<br />
Alexey Abramov<br />
PhD Student (University of Göttingen)<br />
Alexey Abramov received the M.Sc. degree in Computer<br />
Science from the Moscow Engineering and Physics<br />
Institute (State University), Moscow, Russia. Currently he<br />
is a PhD student at the Georg-August University,<br />
Goettingen, Germany. His research interests include<br />
image processing, image segmentation and object<br />
tracking, stereo image processing and real-time<br />
computer vision with high-performance computing on<br />
parallel hardware.<br />
Session(s): S0075 - Oculus Real-Time Modular<br />
Cognitive Vision System (Tuesday, 15:00, Room: A1)<br />
Robert Alexander<br />
CUDA Tools Software Engineer (NVIDIA)<br />
Robert Alexander is a software engineer on the NVIDIA<br />
Tesla Platform Software team. His focus is on<br />
management, monitoring and diagnostics of <strong>GPU</strong>s in a<br />
cluster environment. His work includes the NVIDIA<br />
Management Library (NVML) and the NVIDIA System<br />
Management Interface (nvidia-smi), and he is<br />
responsible for the Perl and Python NVML bindings.<br />
Robert has a BS in Computer Science from the<br />
Rochester Institute of <strong>Technology</strong>.<br />
Session(s): S0238 - Tesla Cluster Monitoring &<br />
Management APIs (Thursday, 09:30, Room: K)<br />
Alina Alt<br />
Applied Engineer (NVIDIA)<br />
Alina Alt is an Applied Engineer at NVIDIA where her<br />
responsibilities include helping users incorporate<br />
NVIDIA’s <strong>GPU</strong>s, video products and video related driver<br />
features into their solutions and applications. Her past<br />
experience includes developing augmented reality<br />
applications for live sports telecasts and developing a<br />
scalable, CPU-based cluster graphics driver.<br />
Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />
Round Table (Monday, 14:30, Room: A2)<br />
S0049 - Using the <strong>GPU</strong> Direct for Video API<br />
(Tuesday, 15:00, Room: J2)<br />
S0267A - Mixing Graphics and Compute with<br />
Multiple <strong>GPU</strong>s (Tuesday, 17:00, Room: J2)<br />
S0326 - Next Generation InfoWall<br />
(Thursday, 09:00, Room: A1)<br />
S0267B - Mixing Graphics and Compute<br />
with Multiple <strong>GPU</strong>s (Thursday, 15:30, Room: L)<br />
Minesh B. Amin<br />
Founder / CEO (MBA Sciences)<br />
Dr. Minesh B. Amin is Founder & CEO of MBA Sciences,<br />
Inc. MBA Sciences enables engineers and scientists to<br />
rapidly prototype, analyze and deploy robust parallel<br />
solutions across heterogeneous computing resources<br />
spanning servers, cores and <strong>GPU</strong>s from either data<br />
centers or public clouds. Previously he worked at<br />
Synopsys, Inc., where he helped prototype, implement<br />
and deploy several parallel versions of existing serial<br />
products, including the TetraMax TenX ATPG product and<br />
PrimeTime DMSA. Dr. Amin received his PhD from the<br />
University of Minnesota.<br />
Session(s): S0299 - Exploiting Fault Tolerant<br />
Heterogeneous Parallelism with SPM.Python<br />
(Wednesday, 16:00, Room: C)<br />
Joshua Anderson<br />
Research Area Specialist (University of Michigan)<br />
Joshua Anderson is a Research Area Specialist in the<br />
Laboratory for Computational Nanoscience & Soft<br />
Matter Simulation at the University of Michigan. Dr.<br />
Anderson holds a Ph.D. degree in Condensed Matter<br />
Physics from Iowa State University and is the lead<br />
developer of HOOMD-blue, a high performance particle<br />
simulation tool. His current research interests include<br />
<strong>GPU</strong> computing, polymer physics, and nanoparticle<br />
self-assembly.<br />
Session(s): S0058 - Advancing <strong>GPU</strong> Molecular<br />
Dynamics: Rigid Bodies in HOOMD-blue<br />
(Wednesday, 10:00, Room: N)<br />
Roberto Ansaloni<br />
(Cray Italy)<br />
Biography unavailable at press time.<br />
Session(s): S0286 – Scaling Applications to a<br />
Thousand <strong>GPU</strong>s and Beyond<br />
(Wednesday, 16:00, Room: A2)<br />
Santosh Ansumali<br />
Faculty Fellow (Engineering Mechanics Unit, JNCASR,<br />
Bangalore)<br />
Dr. Ansumali is a faculty member at EMU, JNCASR, and has<br />
held a Ramanujan Fellowship from DST India since July<br />
2009. Prior to this, he was an assistant professor at NTU,<br />
Singapore, from August 2005. He received his PhD from<br />
ETH Zurich (Switzerland) for work on mesoscale simulation<br />
methods. His research areas are mesoscale simulation<br />
methods and high-performance computing based on<br />
kinetic theory.<br />
Session(s): S0428 – Panini: A <strong>GPU</strong> Aware Array<br />
Class (Thursday, 16:00, Room: B)<br />
Takayuki Aoki<br />
Professor (Tokyo Institute of <strong>Technology</strong>)<br />
Takayuki Aoki received a Dr. Sci. (1989) from Tokyo<br />
Institute of <strong>Technology</strong>, was a visiting researcher at the<br />
Max Planck Institute in Germany for one year, and has been<br />
a professor at Tokyo Institute of <strong>Technology</strong> since 2001.<br />
He has received the Computational Mechanics<br />
Achievement Award from the Japan Society of Mechanical<br />
Engineers and many awards and honors in visualization.<br />
He is also the vice president of the Japan Association for<br />
Computational Mechanics. He has authored the first<br />
book in the Japanese language on CUDA<br />
programming and applications. His research covers<br />
numerical schemes for CFD, numerical weather models,<br />
HPC applications on graphics processors, multi-phase<br />
flows, and simulation of natural disasters.<br />
Session(s): S0412 - A 2-Petaflops Stencil<br />
Application with Stereoscopic 3D Visualization -<br />
Gordon Bell Prize 2011 (Tuesday, 14:00, Room: A1)<br />
Jeremy Appleyard<br />
Analyst (Polyhedron Software Ltd)<br />
Biography unavailable at press time.<br />
Session(s): S0432 – New Ideas for Massively<br />
Parallel Preconditioners<br />
(Wednesday, 15:00, Room: A7)<br />
John Appleyard<br />
Managing Director (Polyhedron Software Ltd)<br />
John Appleyard holds a BA, MA and PhD from Cambridge University. He was one of the<br />
original developers of the Eclipse oil reservoir<br />
simulator and is Managing Director of Polyhedron Software Ltd.<br />
Session(s): S0432 - New Ideas for Massively<br />
Parallel Preconditioners<br />
(Wednesday, 15:00, Room: A7)<br />
Arutyun Avetisyan<br />
Deputy Director (Institute for System <strong>Program</strong>ming,<br />
Russian Academy of Sciences)<br />
Arutyun Avetisyan is Deputy Director of the Institute for<br />
System <strong>Program</strong>ming of the Russian Academy of<br />
Sciences (ISP RAS). His research interests are in the<br />
areas of compiler technologies, HPC and Cloud<br />
computing. He leads several projects, including<br />
research on compiler support for heterogeneous<br />
systems. He represents RAS on the Steering Committee of<br />
the Open Cirrus Community – the global cloud computing<br />
testbed for research projects. He is PI of the national<br />
“University Cluster” program, including in particular its<br />
technology platform (unihub.ru), which makes it possible<br />
to create a wide range of services within a single<br />
infrastructure, e.g. subject-specific web labs.<br />
Session(s): S0115 – Specialized Sparse Matrix<br />
Formats and SpMV Kernel Tuning for <strong>GPU</strong>s<br />
(Wednesday, 10:30, Marriott Ballroom 3)<br />
Brendan Babb<br />
Student/Research Technician (University of<br />
Alaska Anchorage)<br />
Brendan Babb has over 20 years of experience as a<br />
software programmer and analyst in the engineering<br />
and telecommunications industries with a background in<br />
Mathematics. He holds three patents in error detection<br />
and correction and his current interests are in<br />
Evolutionary Computation, Biomimicry, GP<strong>GPU</strong> and their<br />
collective application to optimizing renewable energy<br />
solutions. Since 2005 he has used evolutionary<br />
computation to evolve wavelet-like transforms that<br />
improve image compression for photo, fingerprint,<br />
satellite, CT scan, ultrasound and Mars Rover images.<br />
Session(s): S0133 - Improving Mars Rover Image<br />
Compression Via <strong>GPU</strong>s And Genetic Algorithms<br />
(Thursday, 09:00, Room: A3)<br />
Ronald Babich<br />
Research Scientist (NVIDIA)<br />
Ron Babich is a Research Scientist at NVIDIA, where he<br />
works at the intersection of algorithms and architecture,<br />
with a particular focus on high-performance computing. He<br />
was previously a postdoctoral fellow in Boston University’s<br />
Center for Computational Science and received his PhD in<br />
Physics from Boston University in 2009.<br />
Session(s): S0368 - Unraveling the Mysteries of<br />
Quarks with Hundreds of <strong>GPU</strong>s<br />
(Thursday, 15:00, Room: K)<br />
Philip A. Beasley-Harling<br />
(Bank of America Merrill Lynch)<br />
Biography unavailable at press time.<br />
Session(s): S0656 - kdb+ and <strong>GPU</strong>s for Market Data<br />
Analytics and Trading (Wednesday, 17:30, Room: L)<br />
Dan Bailey<br />
R&D (Double Negative)<br />
Dan Bailey is working in Research and Development at<br />
Double Negative, where he is driving the adoption of the<br />
<strong>GPU</strong> and increased parallelism in general. His primary<br />
focus is the proprietary fluid solver, where his strong<br />
educational background in Computer Science<br />
complements an interest in fluid simulation. His<br />
research concentrates on languages and parallel<br />
compiler technology, but with a strong leaning towards<br />
their use in production.<br />
Session(s): S0300 - Jet: A Domain-Specific<br />
Approach to Parallelism for Film Fluid Simulation<br />
(Tuesday, 10:00, Room: A2)<br />
Tim Bajarin<br />
President (Creative Strategies)<br />
Tim Bajarin is recognized as one of the leading industry<br />
consultants, analysts and futurists, covering the field of<br />
personal computers and consumer technology. Mr.<br />
Bajarin has been with Creative Strategies since 1981 and<br />
has served as a consultant to most of the leading<br />
hardware and software vendors in the industry including<br />
IBM, Apple, Xerox, Hewlett Packard/Compaq, Dell, AT&T,<br />
Microsoft, Polaroid, Lotus, Epson, Toshiba and<br />
numerous others. His articles and/or analysis have<br />
appeared in USA Today, Wall Street Journal, The New<br />
York Times, Time and Newsweek magazines,<br />
BusinessWeek and most of the leading business and<br />
trade publications. He has appeared as a business<br />
analyst commenting on the computer industry on all of<br />
the major television networks and was a frequent guest<br />
on PBS’ The Computer Chronicles. Mr. Bajarin has been<br />
a columnist for US computer industry publications such<br />
as PC Week and Computer Reseller News and wrote for<br />
ABCNEWS.COM for two years and Mobile Computing for<br />
10 years. His columns currently appear in Asia<br />
Computer Weekly, Personal Computer World (UK), and<br />
Microscope (UK) as well as Mobile Enterprise Magazine.<br />
His various columns and analyses are syndicated in over<br />
30 countries.<br />
Session(s): S2003 – Emerging Companies Summit<br />
Fireside Chat with Jen-Hsun Huang (CEO,<br />
President and Co-Founder, NVIDIA) and Tim<br />
Bajarin (President, Creative Strategies)<br />
(Wednesday, 14:00, Marriott Ballroom 4)<br />
Zack Baker<br />
(Los Alamos National Laboratory)<br />
Biography unavailable at press time.<br />
Session(s): S0702 - Los Alamos AHPC Symposium,<br />
The Architecture of Acceleration in HPC<br />
(Wednesday, 15:30, Room: J1)<br />
Robert Balgley<br />
CEO (Mersive)<br />
Over the past 20 years Balgley has worked as CEO of<br />
several category-defining companies funded by some of<br />
the most successful venture capital firms in the world.<br />
Prior to Mersive, he was CEO of SkyeTek, the worldwide<br />
market share leader in embedded RFID readers and<br />
technology. Prior to that, Balgley was CEO of Jabber, the<br />
pioneer and leader in enterprise instant messaging,<br />
which was later acquired by Cisco Systems. Before<br />
Jabber, he was CEO of Mobile Logic, an early market<br />
leader of mobile data networking software which was<br />
acquired in 2000. Earlier in his career, Balgley held<br />
executive positions in sales and marketing at GE, 3Com,<br />
Hughes Aircraft and Case Communications.<br />
Session(s): S2005 – Emerging Companies Summit:<br />
CEO on Stage Featuring RealView Imaging,<br />
Elemental Technologies, and Mersive<br />
(Wednesday, 16:00, Marriott Ballroom 4)<br />
Bill Barth<br />
Director of High Performance Computing (Texas Advanced<br />
Computing Center, University of Texas at Austin)<br />
Bill Barth is the Director of High Performance<br />
Computing at the Texas Advanced Computing Center<br />
where he oversees the use of TACC’s large-scale HPC<br />
resources by a diverse international community of<br />
scientists and researchers. Dr. Barth received his PhD<br />
from the Aerospace Engineering Department of The<br />
University of Texas in 2004 where he worked on finite<br />
element methods for incompressible flow and transport<br />
problems. His current interests include network-topology-aware<br />
job scheduling and MPI communication,<br />
physics-based flow visualization, software tools for<br />
large-scale clusters, and the design and deployment of<br />
leadership-class supercomputers.<br />
Session(s): Los Alamos AHPC Symposium,<br />
Stampede System Architecture and Early<br />
Accelerator <strong>Program</strong>ming Experiences<br />
(Wednesday, 14:00, Room: J1)<br />
Francesco Basile<br />
Software Engineer (MBI srl)<br />
Basile obtained a joint PhD in Mathematical Physics from the<br />
University of Pisa and Brunel University London in 2008.<br />
Since then he has devoted his strong mathematical<br />
background to digital radio signal processing.<br />
Session(s): S0065 – Satellite HUB Communication<br />
System <strong>GPU</strong> Based (Thursday, 16:30, Room: M)<br />
Bela Bauer<br />
Postdoc (Microsoft Research)<br />
Biography unavailable at press time.<br />
Session(s): S0039 – Data-Driven GP<strong>GPU</strong> Ideology<br />
Extension (Thursday, 10:00, Marriott Ballroom 3)<br />
Janusz Bedkowski<br />
Researcher<br />
Since 2006, Janusz has been a researcher in mobile robotics<br />
(navigation, 3D modeling and simulation). He<br />
works in cooperation with the following institutions:<br />
Warsaw University of <strong>Technology</strong>, Faculty of Mechatronics<br />
(education); Industrial Research Institute for<br />
Automation and Robotics (researcher, mobile robot<br />
design and programming); and Institute of Mathematical<br />
Machines (researcher, simulation and modeling using<br />
parallel computing).<br />
Session(s): S0081 - Parallel Computing In Mobile<br />
Robotics for RISE (Thursday, 09:30, Room: A3)<br />
Nathan Bell<br />
Senior Research Scientist (NVIDIA)<br />
Nathan Bell joined NVIDIA Research in August 2008. His<br />
current research interests include sparse linear algebra<br />
and programming models for parallel computing.<br />
Nathan contributes to several open source projects<br />
including Thrust, a high-level parallel template library,<br />
Cusp, a library for sparse linear algebra and graph<br />
algorithms, and PyAMG, a library of algebraic multigrid<br />
methods in Python. Nathan received a bachelor’s degree<br />
in Computer Science from Georgia Tech and a Ph.D. in<br />
Computer Science from the University of Illinois at<br />
Urbana-Champaign (UIUC).<br />
Session(s): S0602 - An Introduction to the<br />
Thrust Parallel Algorithms Library<br />
(Tuesday, 17:00, Room: A3)<br />
Tomer Ben-David<br />
Co-Founder and Vice President, R&D (Rocketick)<br />
Tomer co-founded Rocketick in 2008 and has since<br />
served as the company’s VP of R&D. Tomer<br />
brings 15 years of experience in management and<br />
engineering of software and hardware products. He<br />
previously worked at Intel Corporation, Siliquent<br />
(acquired by Broadcom) and Mellanox. Tomer holds a<br />
B.Sc. (cum laude) in Computer Engineering from the<br />
Technion – Israel Institute of <strong>Technology</strong>, and an<br />
Executive MBA from the Recanati School of Business,<br />
Tel-Aviv University.<br />
Session(s): S0520 - Using <strong>GPU</strong>s to Speedup Chip<br />
Verification (Tuesday, 10:00, Room: J3)<br />
S2004 – Emerging Companies Summit: CEO on<br />
Stage Featuring Raytrix, Rocketick, and Ubitus<br />
(Wednesday, 17:00, Marriott Ballroom 4)<br />
Thomas Benson<br />
Research Engineer II (Georgia Tech Research Institute)<br />
Thomas Benson is a Research Engineer with Georgia<br />
Tech Research Institute, where his research focus and<br />
interests include high-performance computing,<br />
high-performance embedded computing, radar signal<br />
processing and medical imaging, heterogeneous<br />
computing, and programming models related to such<br />
systems. He holds a Ph.D. in Computer Science from the<br />
University of Tennessee, Knoxville, and has nearly five<br />
years of post-graduate industrial research experience<br />
with GE Global Research in the field of medical imaging,<br />
specifically image reconstruction and related algorithms<br />
for X-ray computed tomography (CT). His experience<br />
includes developing large-scale real-time processing<br />
implementations for several of his fields of research.<br />
Session(s): S0316 - Using <strong>GPU</strong>s to Accelerate<br />
Synthetic Aperture Sonar Imaging via<br />
Backpropagation (Tuesday, 15:30, Room: J3)<br />
Mike Bernhardt<br />
(The Exascale Report)<br />
Mike Bernhardt is a well-respected strategic marketing,<br />
communications, media relations and electronic<br />
publishing consultant with 25 years of experience<br />
serving the HPC community. Bernhardt founded The<br />
Exascale Report in 2010 to serve as the voice of the<br />
emerging exascale community. Today, the subscription-based<br />
Exascale Report is a widely read publication from<br />
which articles and extracts have been presented to<br />
numerous governmental bodies to help drive funding<br />
and political commitment discussions on a global scale.<br />
As an independent consultant, Bernhardt has worked<br />
with dozens of companies throughout the global HPC<br />
ecosystem on branding, marketing, strategic<br />
communications and public speaking programs.<br />
Bernhardt is a former Intel marketing executive and<br />
currently serves as a consultant or Board-level advisor<br />
to a number of privately held organizations.<br />
Session(s): S0531 - Exascaling Your Apps<br />
(Wednesday, 09:00, Room: C)<br />
James Beyer<br />
Software Engineer (Cray Inc)<br />
James Beyer received his Ph.D. from University of<br />
Minnesota. He has been a member of the Cray<br />
<strong>Program</strong>ming Environment Optimization team for more<br />
than 12 years. He has represented Cray on the OpenMP<br />
language committee and ARB since Cray rejoined the<br />
organization. He led the effort to redesign the Cray<br />
OpenMP implementation to improve optimizer<br />
integration. He authored the original OpenMP for<br />
Accelerators (OpenMP4ACC) proposal and co-chairs the<br />
OpenMP language subcommittee on Accelerators.<br />
James was the primary Cray representative during the<br />
design of the OpenACC specification. He is currently<br />
actively involved in the Cray implementations of OpenMP,<br />
OpenACC and OpenMP4ACC.<br />
Session(s): S0089 - Accelerator Directives,<br />
OpenACC and OpenMP4ACC<br />
(Tuesday, 16:00, Room: A3)<br />
Johanna Beyer<br />
Postdoctoral Fellow (King Abdullah University of Science<br />
and <strong>Technology</strong>)<br />
Johanna Beyer is a postdoctoral fellow at the Geometric<br />
Modeling and Scientific Visualization Center at King<br />
Abdullah University of Science and <strong>Technology</strong> (KAUST),<br />
Saudi Arabia. She holds an M.Sc. in medical software<br />
engineering (2004, University of Applied Sciences<br />
Hagenberg, Austria) and a Ph.D. in computer science<br />
(2009, University of <strong>Technology</strong> Vienna, Austria). Her<br />
research focuses on <strong>GPU</strong>-based volume rendering<br />
techniques for medical and neuroscience applications,<br />
with emphasis on visualization of large and multi-modal<br />
data. She regularly publishes at IEEE TVCG/IEEE<br />
Visualization.<br />
h Session(s): S0202 - Terascale Volume Visualization<br />
in Neuroscience (Wednesday, 16:30, Room: A8)<br />
Tim Bi<br />
Graduate Research Analyst (Johns Hopkins University /<br />
George Mason University)<br />
Tim Bi is a Bioinformatics Ph.D. candidate at George<br />
Mason University currently working as a GRA for Dr.<br />
Saleet Jafri and contributing to efforts to improve<br />
the <strong>GPU</strong> program for Calcium-Induced Calcium Release.<br />
He is also working for Dr. Diane Becker at the Johns<br />
Hopkins School of Medicine contributing to the GWAS<br />
studies being conducted at the GeneSTAR lab.<br />
h Session(s): S0272 - <strong>GPU</strong> GWAS - CUDA Based<br />
Genome Wide Association Studies<br />
(Wednesday, 10:30, Room: B)<br />
James Bigler<br />
Sr. Software Engineer (NVIDIA)<br />
James Bigler is currently working for NVIDIA as a Sr.<br />
Software Engineer developing OptiX, a <strong>GPU</strong> accelerated<br />
ray tracing framework. His work with ray tracing dates<br />
back to 2000 at the University of Utah where he worked<br />
under Dr. Steven Parker researching and developing<br />
parallel ray tracing applications for rendering and<br />
scientific visualization. Since coming to NVIDIA in 2008,<br />
James has strived to bring more ray tracing<br />
awesomeness to everyone through OptiX. James holds a<br />
B.S. and M.S. in Computer Science from the University of<br />
Utah.<br />
h Session(s): S0366 - OptiX Out-of-Core and CPU<br />
Rendering (Tuesday, 15:30, Room: J1)<br />
Sam Blackman<br />
CEO and Co-Founder (Elemental Technologies)<br />
Sam Blackman co-founded Elemental Technologies in<br />
2006 and has grown the company into a leading supplier<br />
of video solutions for multiscreen content delivery. Prior<br />
to co-founding Elemental, Sam designed integrated<br />
circuit products for Pixelworks. He has also held<br />
engineering positions at Silicon Graphics and Intel<br />
Corporation. Sam holds an M.B.A. from the University of<br />
Oregon, an M.S. in electrical engineering from the University<br />
of California at Berkeley and a B.S. in electrical<br />
engineering from Brown University.<br />
h Session(s): S2005 – Emerging Companies Summit:<br />
CEO on Stage Featuring RealView Imaging,<br />
Elemental Technologies, and Mersive<br />
(Wednesday, 16:00, Marriott Ballroom 4)<br />
Aaron Blasius<br />
Sr. Product Manager (VMware)<br />
Biography unavailable at press time.<br />
h Session(s): S0359 - VMware and NVIDIA: Delivering<br />
3D Workstations from the Cloud<br />
(Tuesday, 17:00, Room: A5)<br />
François Bodin<br />
Chief <strong>Technology</strong> Officer (CTO) (CAPS enterprise)<br />
As chief scientist, François Bodin plans, advises and<br />
advocates the research and development projects which<br />
led to the creation of innovative software tools. François<br />
continues his research activities at the IRISA lab,<br />
which focus on code optimization and compiler<br />
technologies for high performance computers and<br />
embedded systems. François is a member of HIPEAC, the<br />
European Network of Excellence on High-Performance<br />
Embedded Architecture and Compilation. François has<br />
degrees in computer science from the University of<br />
Rennes I. François Bodin is also Chairman of IRISA<br />
Rennes, a research unit at the forefront of information<br />
and communication science and technology.<br />
h Session(s): S0630 - Part 1 of 2: <strong>Program</strong>ming<br />
Heterogeneous Many-cores Using Directives<br />
(Presented by CAPS) (Monday, 13:00, Room: A8)<br />
h S0631 - Part 2 of 2: <strong>Program</strong>ming Heterogeneous<br />
Many-cores Using Directives (Presented by CAPS)<br />
(Monday, 14:30, Room: A8)<br />
h S0635 - How to Bake Portable Many-Core<br />
<strong>Program</strong>s (Wednesday, 15:00, Room: M)<br />
Robert Boehme<br />
Team Lead & CEO (Part-Time Scientists)<br />
Robert Boehme is Team Lead and CEO of Part-Time<br />
Scientists. The Part-Time Scientists Team consists of<br />
100 international engineers and scientists working in<br />
their free time on the first private mission to the moon.<br />
Over the past two years they have kick-started the full<br />
technical development, building many prototypes and<br />
taking technology from industry back<br />
into space. With five prototype lines, 50 business<br />
partnerships, several collaborations and many hours of<br />
testing, the team is among the leading competitors for<br />
the $30 million Google Lunar X-PRIZE competition.<br />
h Session(s): S3002 – Day 3 Keynote: Not Your<br />
Grandfather’s Moon Landing<br />
(Thursday, 11:00, Keynote Hall)<br />
Taisuke Boku<br />
Deputy Director of Center for Computational Sciences at<br />
University of Tsukuba (University of Tsukuba)<br />
Biography unavailable at press time.<br />
h Session(s): S0618 – Best Practices of an 800 TFlop<br />
Hybrid Supercomputer Implementation<br />
(Tuesday, 09:30, Room: M)<br />
Nikola Bozinovic<br />
CTO (MotionDSP)<br />
Nikola Bozinovic is Chief <strong>Technology</strong> Officer at<br />
MotionDSP where he leads all technical efforts and<br />
oversees product development. As the company’s key<br />
technologist, he leverages his expertise in signal<br />
processing, image and video analysis, and video<br />
compression to provide people and organizations around<br />
the world with groundbreaking video technology. Prior to<br />
establishing MotionDSP’s engineering department,<br />
Nikola was a senior software engineer at Veodia, a<br />
video streaming and distribution company, and a<br />
research scientist at Microsoft. Nikola holds M.S. and<br />
Ph.D. degrees from Boston University, where he was a<br />
Dean’s Fellow.<br />
h Session(s): S0527 - <strong>GPU</strong>s and the Next-Generation<br />
Aerial Surveillance (Tuesday, 09:00, Room: J2)<br />
Wil Braithwaite<br />
Senior Applied Engineer (NVIDIA)<br />
Wil Braithwaite has worked for 15 years in visual effects at<br />
studios in London and Los Angeles, including<br />
FrameStore, MPC, and the Jim Henson Company.<br />
His positions ranged from technical direction and compositing<br />
to CG supervision and mocap supervision. He has<br />
pioneered the use of graphics hardware in the VFX<br />
workflow, which led to his role at NVIDIA as a Senior<br />
Applied Engineer for VFX, where he specializes in<br />
consulting, training and assisting development for studio<br />
projects utilizing NVIDIA technologies.<br />
h Session(s): S0364 - Interacting with Huge<br />
Particle Simulations in Maya with the <strong>GPU</strong><br />
(Tuesday, 14:00, Room: J1)
Thomas Brandes<br />
Senior Scientist (Fraunhofer Scientific Computing<br />
Institute (FhG-SCAI))<br />
Thomas Brandes received his PhD in Applied<br />
Mathematics in 1988 from the University of Marburg. He<br />
joined Fraunhofer’s Scientific Computing Institute<br />
(FhG-SCAI) in 1989. He is working as a senior scientist<br />
on the design, parallelization and optimization of<br />
scientific applications for all kinds of parallel<br />
architectures. His research interests are centered<br />
around parallelization tools, cache optimization, <strong>GPU</strong><br />
programming and object-oriented design of parallel<br />
software.<br />
h Session(s): S0705 - Los Alamos AHPC Symposium,<br />
Efficient AMG on Hybrid <strong>GPU</strong> Clusters<br />
(Wednesday, 17:00, Room: J1)<br />
Vincent Brisebois<br />
Visual Computing Product Manager (Fusion-io)<br />
As Visual Computing Product Manager at Fusion-io,<br />
Vincent Brisebois works closely with entertainment<br />
production studios on implementing solutions that<br />
facilitate new levels of creativity, productivity and<br />
worldwide collaboration. Vincent has designed<br />
technology solutions for 2D and 3D production in the<br />
visual effects, video game and design industries for over<br />
15 years.<br />
h Session(s): S0619 - Hate to Wait? Flash Memory<br />
for Full-Throttle <strong>GPU</strong> Acceleration<br />
(Thursday, 09:00, Room: L)<br />
John Brown<br />
Principal Engineer (Hewlett-Packard)<br />
John is a Principal Engineer in Hewlett-Packard’s<br />
Workstation Graphics Research and Development, and has been<br />
engineering graphics and workstation solutions since<br />
1984. He has contributed to a wide variety of HP<br />
products and projects for 24 years, ranging from HP’s<br />
SRX graphics processor, to HP’s SV6 Scalable<br />
Visualization solution, to HP’s latest family of high-end<br />
workstation platforms.<br />
h Session(s): S0633 – Learn about new Hewlett-<br />
Packard <strong>GPU</strong> Systems, Solutions, and Applications!<br />
(Wednesday, 10:00, Room: M)<br />
Kevin J. Brown<br />
Research Assistant (Stanford University)<br />
Biography unavailable at press time.<br />
h Session(s): S0365 – Delite: A Framework for<br />
Implementing Heterogeneous Parallel DSLs<br />
(Wednesday, 15:00, Room: C)<br />
Andreas Buhr<br />
Department Manager - Performance Optimization<br />
(CST AG)<br />
Andreas Buhr has worked on performance optimization at<br />
CST AG since 2009. He holds a bachelor’s degree in<br />
physics and a master’s degree in applied physics from<br />
the Technical University of Darmstadt. He has been working with<br />
CUDA since version 0.9.<br />
h Session(s): S0069 – <strong>GPU</strong> Computing Advances<br />
in 3D Electromagnetic Simulation<br />
(Tuesday, 14:00, Room: J3)<br />
Martin Burtscher<br />
Associate Professor (Texas State University)<br />
Martin Burtscher is Associate Professor in the<br />
Department of Computer Science at Texas State<br />
University. He received the combined BS/MS degree in<br />
computer science from the Swiss Federal Institute of<br />
<strong>Technology</strong> (ETH) Zurich in 1996 and the Ph.D. degree in<br />
computer science from the University of Colorado at<br />
Boulder in 2000. Martin’s research interests include<br />
efficient parallelization of programs for <strong>GPU</strong>s as well as<br />
automatic performance assessment and optimization of<br />
HPC applications. He is a senior member of the IEEE, its<br />
Computer Society, and the ACM. Martin has co-authored<br />
over 60 peer-reviewed publications, including a <strong>GPU</strong><br />
Computing Gems chapter.<br />
h Session(s): S0111 - An Efficient CUDA<br />
Implementation of a Tree-Based N-Body Algorithm<br />
(Thursday, 15:30, Room: M)<br />
Michael Bussmann<br />
Junior Group Leader Computational Radiation Physics<br />
(Helmholtz-Zentrum Dresden-Rossendorf)<br />
Michael Bussmann is a member of the Laser Particle<br />
Acceleration Group at the Helmholtz-Zentrum Dresden-<br />
Rossendorf (HZDR). He leads the Junior Group on<br />
Computational Radiation Physics, looking for ways to<br />
create and optimize new sources of radiation using<br />
high-intensity lasers. His goal is to create low-cost,<br />
compact, laser-driven sources of ion, electron and X-ray<br />
beams that can be used to understand the properties of<br />
matter on the atomic scale. Besides his interest in<br />
fundamental physics, Michael helps to make laser-driven<br />
ion beams available to cancer patients for ion beam<br />
treatment of tumors. With <strong>GPU</strong>s he has been able to<br />
simulate the generation of laser-driven particle beams<br />
in a new, much faster way. Since then, Michael has grown used to<br />
thinking of computation speed in frames per second.<br />
h Session(s): S0067 - PICon<strong>GPU</strong> - Bringing Large-Scale<br />
Laser Plasma Simulations to <strong>GPU</strong><br />
Supercomputing (Tuesday, 15:00, Room: A8)<br />
h S0708 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 1<br />
(Thursday, 09:00, Room: J3)<br />
Javier Cabezas<br />
PhD Student (Barcelona Supercomputing Center)<br />
Javier Cabezas received a bachelor’s degree in Computer<br />
Science and a master’s degree in Computer Architecture<br />
from Universitat Politècnica de Catalunya (UPC). Since<br />
2008, he has been a PhD student in the Computer Architecture<br />
Department at UPC. He has also worked at the Barcelona<br />
Supercomputing Center as a resident student since 2009.<br />
He has contributed to projects done in collaboration with<br />
companies like Hewlett-Packard, NXP and Repsol. His<br />
research is focused on operating system and run-time<br />
support for heterogeneous massively-parallel computing<br />
systems and massively-parallel accelerators.<br />
h Session(s): S0333 - GMAC-2: Easy and Efficient<br />
<strong>Program</strong>ming for CUDA-Based Systems<br />
(Thursday, 09:00, Room: B)<br />
Tugkan Calapoglu<br />
Lead Graphics Software Developer (VIRES<br />
Simulationstechnologie GmbH)<br />
Tugkan Calapoglu is the lead graphics software<br />
developer at Vires GmbH, Germany, with more than 10<br />
years of experience in the visual simulation industry. He<br />
works on the design and development of 3D rendering<br />
software for real-time hardware-in-the-loop and<br />
human-in-the-loop simulation applications.<br />
h Session(s): S0319 – Advanced Driver<br />
Assistance System Testing using OptiX<br />
(Tuesday, 14:00, Room: N)<br />
D. Andrew Carr<br />
Director of Bioinformatics (Accelerated <strong>Technology</strong><br />
Laboratories, Inc.)<br />
D. Andrew Carr, Ph.D. is the Director of Bioinformatics for<br />
Accelerated <strong>Technology</strong> Laboratories where he oversees<br />
the design and development of new high-throughput<br />
computational and database tools for use in human<br />
genomic-scale analysis projects. Andrew received his<br />
Ph.D. in Computational Science<br />
Bioinformatics from George Mason University in 2006.<br />
After spending a year as a research assistant professor in<br />
Computational Materials Science Center and<br />
Nanotechnology at GMU, he took a postdoctoral position at<br />
University of North Carolina at Charlotte, where he developed<br />
algorithms, database and visualization<br />
tools for genomic microarray and sequence analysis.<br />
h Session(s): S0037 - SeqNFind: Application Of<br />
CUDA <strong>GPU</strong> Technologies To Sequence Alignment<br />
Techniques (Tuesday, 17:00, Room: K)<br />
Patrice Castonguay<br />
Emerging Applications Intern (NVIDIA)<br />
Patrice Castonguay is completing his Ph.D. in the<br />
Aeronautics and Astronautics department at Stanford<br />
University working under the supervision of Professor<br />
Antony Jameson at the Aerospace Computing Lab. His<br />
research focuses on unstructured high-order methods<br />
for fluid flow simulations and on the use of <strong>GPU</strong>s for<br />
algorithm development in high performance<br />
computing. Recently, he worked in the Emerging<br />
Applications group at NVIDIA on the development of<br />
algebraic multigrid methods.<br />
h Session(s): S0332 - Efficient Graph Matching<br />
and Coloring on the <strong>GPU</strong><br />
(Wednesday, 16:00, Marriott Ballroom 3)<br />
Bryan Catanzaro<br />
Research Scientist (NVIDIA)<br />
Bryan recently received his PhD from the University of<br />
California at Berkeley, where he researched compilation<br />
techniques for embedded data parallel languages. He<br />
then joined NVIDIA Research, where he focuses on<br />
developing the Copperhead runtime and compiler.<br />
h Session(s): S0525 - Copperhead: Data Parallel<br />
Python (Wednesday, 16:30, Room: A3)<br />
Ulises Cervantes-Pimentel<br />
Senior Kernel Developer (Wolfram Research)<br />
Ulises Cervantes-Pimentel has been Wolfram Research’s lead<br />
kernel developer in visualization, computational geometry<br />
and <strong>GPU</strong> development since 2001. Ulises is a graduate<br />
of the University of Illinois at Urbana-Champaign.<br />
h Session(s): S0430 – Developing Next-Generation<br />
CUDA Acceleration in Wolfram’s Mathematica with<br />
Parallel Nsight (Tuesday, 09:30, Room: B)<br />
h S0106 - <strong>GPU</strong> Based Numerical Methods in<br />
Mathematica (Thursday, 14:30, Room: L)<br />
Dominic Chandar<br />
Postdoctoral Research Associate (University of Wyoming)<br />
Dominic is a Postdoc at the University of Wyoming, and<br />
works on <strong>GPU</strong> acceleration for CFD codes. He has a PhD<br />
in Mechanical and Aerospace Engineering from Nanyang<br />
Technological University, Singapore, and a Masters in<br />
Aerospace Engineering from Indian Institute of Science,<br />
India. He has also served as a scientist at the<br />
Defense Research and Development Organization, India.<br />
h Session(s): S0264 - CU++: An Object-Oriented<br />
Framework for Computational Fluid Dynamics<br />
(CFD) Applications (Thursday, 09:30, Room: A8)<br />
Jacqueline H. Chen<br />
Combustion Research Facility, Sandia National Laboratories<br />
Jacqueline H. Chen is a Distinguished Member of<br />
Technical Staff at the Combustion Research Facility at<br />
Sandia National Laboratories. She has contributed<br />
broadly to research in petascale direct numerical<br />
simulations (DNS) of turbulent combustion focusing on<br />
fundamental turbulence-chemistry interactions. These<br />
benchmark simulations provide fundamental insight into<br />
combustion processes and are used by the combustion<br />
modeling community to develop and validate turbulent<br />
combustion models for engineering CFD simulations. In<br />
She is the Director of the Center for<br />
Exascale Simulation of Combustion in Turbulence<br />
(ExaCT), which, in collaboration with computer scientists and<br />
applied mathematicians, co-designs exascale DNS algorithms<br />
together with exascale computer architectures, including<br />
in-situ data mining and visualization.<br />
h Session(s): S0655 - Direct Numerical Simulation of<br />
Turbulence-Chemistry Interactions: Fundamental<br />
Insights Towards Predictive Models<br />
(Tuesday, 14:30, Room: A2)<br />
Jeff Chien<br />
Principal Scientist (Adobe Systems)<br />
Biography unavailable at press time.<br />
h Session(s): S0395 – <strong>GPU</strong> Enablement in Adobe<br />
Photoshop (Tuesday, 09:00, Room: A2)<br />
Suren Chilingaryan<br />
Researcher (Karlsruhe Institute of <strong>Technology</strong>)<br />
Suren Chilingaryan is a data processing expert at the<br />
Institute for Data Processing and Electronics at the<br />
Karlsruhe Institute of <strong>Technology</strong>. He graduated in<br />
mathematics from Moscow State University and was<br />
awarded a Ph.D. degree in Computer Science by the<br />
Armenian National Academy of Sciences. He works on<br />
data acquisition and slow control systems for long-running<br />
scientific experiments. His current research<br />
focus is high-performance data processing.<br />
h Session(s): S0259 - A High Performance<br />
Platform for Real-Time X-Ray Imaging<br />
(Wednesday, 15:00, Room: A8)<br />
Samuel Cho<br />
Assistant Professor (Wake Forest University)<br />
Sam graduated from the University of Maryland, Baltimore<br />
County with B.S. degrees in Biochemistry and Computer<br />
Science. He went on to receive a Ph.D. in Physical<br />
Chemistry at the University of California, San Diego. He<br />
then performed post-doctoral research at the<br />
University of Maryland, College Park, where he was<br />
awarded the NIH (NRSA) Post-doctoral Fellowship. He has<br />
published his interdisciplinary computational biophysics<br />
research in protein and RNA dynamics, folding and<br />
assembly in over 15 papers in peer-reviewed journals,<br />
including four as first author in the high impact factor<br />
journal, Proceedings of the National Academy of Sciences.<br />
h Session(s): S0139 - <strong>GPU</strong>-Based Molecular<br />
Dynamics Simulations of Protein and RNA<br />
Assembly (Wednesday, 17:00, Room: N)<br />
Jike Chong<br />
Co-Director of CUDA Research Center<br />
(Carnegie Mellon University)<br />
Jike Chong is an adjunct professor at Carnegie Mellon<br />
Silicon Valley and directs the CUDA Teaching Center and<br />
the CUDA Research Center there. For the past 10 years,<br />
he has been working on multicore, manycore and<br />
parallel computing technologies at Carnegie Mellon<br />
University, Intel Research Labs, Sun Microsystems<br />
and the University of California, Berkeley. His research<br />
interests include speech recognition and analytics,<br />
quantitative financial analytics, and design patterns for<br />
parallel programming. Jike earned his Ph.D. from UC<br />
Berkeley and his M.S. and B.S. from Carnegie Mellon University.<br />
h Session(s): S0223 - Rapid Training of Acoustic<br />
Models Using <strong>GPU</strong>s (Tuesday, 15:00, Room: N)
Constantin Chuyeshov<br />
Algorithm Engineer (Cadence Design Systems)<br />
Constantin Chuyeshov is an Algorithm Engineer with<br />
Computational Lithography Solutions Group at Cadence<br />
Design Systems. He focuses on computational<br />
lithography, image processing and high-performance<br />
computing. Constantin was born in 1979 in Kharkov,<br />
Ukraine. He received his BS degree in Mathematical Physics<br />
and Applied Mathematics from Karazin Kharkov National<br />
University (Ukraine) and MSc degree in Computational<br />
Mathematics from Stanford University.<br />
h Session(s): S0329 - Using <strong>GPU</strong>s to Speedup<br />
Computational Lithography<br />
(Tuesday, 09:30, Room: J3)<br />
Gilles Civario<br />
Senior Software Architect (ICHEC)<br />
Gilles Civario is the <strong>GPU</strong> software architect at ICHEC, PI of<br />
ICHEC’s NVIDIA CUDA Research Center, and an NVIDIA<br />
Certified CUDA <strong>Program</strong>mer. Gilles is involved directly or<br />
indirectly in all of ICHEC’s <strong>GPU</strong>-related projects. His<br />
involvement ranges from software or hardware<br />
architectural advice to code development and tuning,<br />
debugging and implementation. Gilles also regularly<br />
presents talks to explain <strong>GPU</strong> computing and its benefits,<br />
and runs NVIDIA certified CUDA training courses. His<br />
unique expertise in both hardware and software allows<br />
him to design and propose tailored solutions to address<br />
each user’s particular needs. Gilles is particularly involved<br />
in ICHEC’s technology transfer activities.<br />
h Session(s): S0034 - Real-Time Risk Simulation:<br />
The <strong>GPU</strong> Revolution In Profit Margin Analysis<br />
(Tuesday, 15:00, Room: L)<br />
Geoff Clark<br />
CEO (Acceleware Ltd.)<br />
Before joining Acceleware, Geoff was CFO of SQFive, a<br />
private oil and gas technology company, and of TSX listed<br />
Guest-Tek Interactive Entertainment Ltd. While with<br />
Guest-Tek, Geoff was instrumental in completing two<br />
major acquisitions, a share buyback, and several private<br />
placements of debt and equity. Geoff was a co-founder of<br />
Revolve Magnetic Bearings Inc., a supplier of magnetic<br />
levitation systems. Geoff secured several rounds of<br />
financing for Revolve and was instrumental in Revolve’s<br />
eventual sale to Sweden’s SKF. Geoff holds an MBA<br />
degree from the University of Western Ontario, and a<br />
BSc in Electrical Engineering from the University of<br />
Calgary.<br />
h Session(s): S0433 - Accelerated FDTD Technique<br />
for Marine Controlled Source Electromagnetic<br />
Imaging (Wednesday, 15:30, Room: A7)<br />
Michael Clark<br />
Compute DevTech Engineer (NVIDIA)<br />
Dr. Clark’s background is in high energy physics, having<br />
completed his doctoral research in Monte Carlo<br />
algorithms for lattice QCD in 2005, graduating from the<br />
University of Edinburgh. He subsequently moved to<br />
Boston University, developing adaptive multi-grid<br />
algorithms and symplectic integrators. There, he initiated<br />
research into harnessing <strong>GPU</strong>s for lattice QCD<br />
computation. Dr. Clark spent 2009-2011 at Harvard<br />
University, where he continued to work on algorithms for<br />
<strong>GPU</strong>s and many-core processors, with focus on signal<br />
processing and multigrid. Dr. Clark moved to NVIDIA in<br />
2011, where his present work lies at the interface between<br />
applications, algorithms and parallel computation.<br />
h Session(s): S0347 - Accelerating Radio Astronomy<br />
Cross-Correlation beyond 1 Tflops using Fermi<br />
(Thursday, 09:00, Room: M)<br />
Don Clegg<br />
VP (Supermicro)<br />
Biography unavailable at press time.<br />
h Session(s): S0636 - Supermicro: Worldwide leader<br />
in GP/<strong>GPU</strong> Servers and Workstation Platforms<br />
(Wednesday, 16:00, Room: M)<br />
Esteban Clua<br />
Professor (Computer Science Department of<br />
Universidade Federal Fluminense, Rio de Janeiro, Brazil)<br />
Esteban is an associate professor at Universidade Federal<br />
Fluminense, Rio de Janeiro, and director of UFF<br />
Medialab. He is one of the founders of SBGames -<br />
Brazilian Symposium of Digital Entertainment and Video<br />
Games, director of Academia of IGDA-Rio, and president of<br />
the Brazilian Computing Society Game. In 2007 he received<br />
an award for contributing to the growth of the video<br />
game industry in Brazil, and in 2009 he received the<br />
Young Scientist of the State of Rio de Janeiro prize. Esteban is<br />
coordinator of the first NVIDIA CUDA Research Center<br />
in Latin America, at UFF Medialab.<br />
h Session(s): S0074 – Techniques for Designing<br />
GP<strong>GPU</strong> Games (Thursday, 17:00, Room: L)<br />
Jonathan Cohen<br />
Emerging Applications (NVIDIA)<br />
Jonathan Cohen leads the Emerging Applications group<br />
as part of NVIDIA’s Content and <strong>Technology</strong> organization.<br />
Emerging Applications seeks to develop enabling<br />
technologies that will allow end-users to access the<br />
power of <strong>GPU</strong> computing in a wide variety of application<br />
areas. Previously, he spent three years as a senior<br />
research scientist with NVIDIA Research developing<br />
scientific computing and real-time physical simulation<br />
applications on NVIDIA’s massively parallel <strong>GPU</strong>s. Cohen<br />
was awarded an Academy Award (Technical Achievement<br />
Award) in 2007 from the Academy of Motion Picture<br />
Arts and Sciences for his work on fluid simulation and<br />
volumetric modeling for visual effects. He received an<br />
undergraduate degree from Brown in Mathematics and<br />
Computer Science.<br />
h Session(s): S0332 – Efficient Graph Matching<br />
and Coloring on the <strong>GPU</strong><br />
(Wednesday, 16:00, Marriott Ballroom 3)<br />
Chris A. Cocosco<br />
Scientist (University Medical Center Freiburg, Dept. of<br />
Radiology, Medical Physics.)<br />
Chris A. Cocosco has spent over 15 years in research &<br />
development at the intersection of medical imaging,<br />
electrical engineering, computer science, and high<br />
performance computing, in both academic/clinical and<br />
industrial/commercial environments.<br />
h Session(s): S0348 - <strong>GPU</strong>s Open New Avenues in<br />
Medical MRI (Wednesday, 10:30, Room: A8)<br />
Andrew Corrigan<br />
Research Mathematician (Naval Research Laboratory)<br />
Andrew Corrigan has been a scientist at the Laboratory<br />
for Computational Physics and Fluid Dynamics at the US<br />
Naval Research Laboratory since 2010, where he is<br />
developing the Jet Engine Noise Reduction (JENRE)<br />
code. His research interests are in supersonic jet noise<br />
reduction and algorithms for high performance CFD<br />
solvers. He received his Ph.D. in 2009 from George<br />
Mason University, where he also worked as a<br />
postdoctoral researcher in the GMU CFD Center, porting<br />
the unstructured grid CFD code FEFLO to run on <strong>GPU</strong>s.<br />
h Session(s): S0031 - Unstructured Grid Numbering<br />
Schemes for <strong>GPU</strong> Coalescing Requirements<br />
(Tuesday, 10:00, Room: A8)<br />
Iain Couzin<br />
Professor, Department of Ecology and Evolutionary<br />
Biology (Princeton University)<br />
Iain Couzin joined the Princeton faculty in late 2007.<br />
Prior to joining the faculty there, he was a Royal Society<br />
University Research Fellow in the Department of<br />
Zoology, University of Oxford, and a Junior Research<br />
Fellow in the Sciences at Balliol College, Oxford. His<br />
work aims to reveal the fundamental principles that<br />
underlie evolved collective behavior, and consequently<br />
his research includes the study of a wide range of<br />
biological systems, from brain tumors to insect swarms,<br />
fish schools and human crowds. Couzin is a member of<br />
the Faculty of 1000 Biology and in recognition of his<br />
research he was a recipient of the Searle Scholar Award<br />
in 2008, the Mohammed Dahleh Award in 2009 and<br />
Popular Science magazine’s “Brilliant 10” award in 2010.<br />
Couzin holds a PhD in Biology from the University of<br />
Bath, UK.<br />
h Session(s): S3001 – Day 2 Keynote: From Democratic<br />
Consensus to Cannibalistic Hordes: <strong>GPU</strong> Computing<br />
Reveals the Principles of Collective Behavior<br />
(Wednesday, 11:00, Keynote Hall)<br />
Cyril Crassin<br />
Postdoctoral Research Scientist (NVIDIA)<br />
Cyril Crassin joined NVIDIA Research in 2011 as a<br />
postdoctoral research scientist. Cyril obtained his Ph.D.<br />
degree from Grenoble University at INRIA in France in<br />
2011. His research interests include realistic rendering,<br />
voxel-based representations, global illumination,<br />
real-time ray-tracing and out-of-core data management.<br />
During his Ph.D., he developed the GigaVoxels approach<br />
that proposed the use of pre-filtered voxel representations<br />
for real-time rendering of large detailed scenes, complex<br />
objects, as well as global illumination effects.<br />
h Session(s): S0610 - Octree-Based Sparse<br />
Voxelization For Real-Time Global Illumination<br />
(Tuesday, 14:30, Room: B)<br />
Luis Crivelli<br />
Director of Solver Development (Dassault Systemes,<br />
SIMULIA)<br />
Biography unavailable at press time.<br />
h Session(s): S0431 - Evolving Use of <strong>GPU</strong> for<br />
Dassault Systems Simulation Products<br />
(Wednesday, 09:00, Room: K)<br />
Jon Currey<br />
(Microsoft Research Silicon Valley)<br />
Jon Currey joined Microsoft Research in 2007, initially<br />
working on the Dryad and DryadLINQ cluster computing<br />
projects. His current research focus is systems support for<br />
<strong>GPU</strong>-accelerated computation. Jon previously worked for<br />
Apple, Oracle, Nortel and some startups. He holds a BA<br />
and MA in philosophy from the University of Cambridge.<br />
h Session(s): S0320 – PTask: OS Support for <strong>GPU</strong><br />
Dataflow <strong>Program</strong>ming (Thursday, 14:00, Room: B)<br />
Kenneth Czechowski<br />
Student (Georgia Tech)<br />
Kenneth Czechowski is a PhD student in the School of<br />
Computational Science and Engineering at the Georgia<br />
Institute of <strong>Technology</strong>. His research interests include<br />
algorithm-architecture codesign, performance modeling<br />
for <strong>GPU</strong>/manycore architectures, and parallel and<br />
distributed algorithms. Czechowski holds a master’s degree in<br />
computer science from the Georgia Institute of <strong>Technology</strong>.<br />
h Session(s): S0362 - Maximizing Performance on<br />
Multi-<strong>GPU</strong> Systems (Thursday, 09:00, Hall 1)<br />
Johann Dahm<br />
(University of Michigan)<br />
Biography unavailable at press time.<br />
h Session(s): S0031 – Unstructured Grid Numbering<br />
Schemes for <strong>GPU</strong> Coalescing Requirements<br />
(Tuesday, 10:00, Room: A8)<br />
Abdul Dakkak<br />
(Wolfram Research)<br />
Biography unavailable at press time.<br />
h Session(s): S0100 – Mathematica as a Practical<br />
Platform for <strong>GPU</strong>-Accelerated Finance<br />
(Wednesday, 17:00, Room: L)<br />
h S0106 – <strong>GPU</strong> Based Numerical Methods in<br />
Mathematica (Thursday, 14:30, Room: L)<br />
Eric Darve<br />
Professor (Stanford)<br />
Prof. Darve received his PhD in Applied Mathematics from<br />
Pierre et Marie Curie University, Paris, France (1999),<br />
while working in the Jacques-Louis Lions Numerical<br />
Analysis Laboratory under the supervision of Prof. Olivier<br />
Pironneau. He was a postdoctoral fellow at Stanford in the<br />
Center for Turbulence Research, under the supervision of<br />
Prof. Parviz Moin and Dr. Andrew Pohorille (NASA Ames<br />
Research Center). He became an assistant professor of<br />
Mechanical Engineering at Stanford University in 2001<br />
and was promoted to Associate Professor in 2010. He is a<br />
member of the Institute for Computational and<br />
Mathematical Engineering, a CUDA Center of Excellence.<br />
This work is in collaboration with Dr. Toru Takahashi<br />
(Nagoya University) and Dr. Cris Cecka (Harvard).<br />
h Session(s): S0334 - The Fast Multipole Method<br />
on CPU and <strong>GPU</strong> Processors<br />
(Thursday, 15:00, Marriott Ballroom 3)<br />
Guy De Beer<br />
CEO (Playcast Media System)<br />
Guy founded Playcast Media System. During his 16 years<br />
in the digital media communications industry, he led the<br />
successful development and commercialization of<br />
dozens of digital media communications products and<br />
services. Prior to founding Playcast, Guy managed<br />
Harmonic’s (NASDAQ: HLIT) Broadcast and VoD edge<br />
product lines. Before joining Harmonic, he held several<br />
product marketing and business development<br />
management positions with the MRV group (NASDAQ:<br />
MRVC). Guy holds a BA in Media from the University of<br />
Bar-Ilan in Israel and an MA in Philosophy of Digital<br />
Culture from the University of Tel Aviv.<br />
h Session(s): S2006 – Emerging Companies Summit:<br />
CEO on Stage Featuring Raytrix, Playcast and Universal Robotics<br />
(Wednesday, 17:00, Marriott Ballroom 4)<br />
Jose de Corral<br />
Principal Consulting Engineer (Waters Corporation)<br />
Jose is currently Principal Consulting Engineer at Waters<br />
Corporation. Jose de Corral received his B.S. in Electrical<br />
Engineering from Universidad Politécnica de Madrid, and<br />
his M.S. in Software Engineering from Harvard University.<br />
Jose has a long career at Waters, where he started in<br />
1983. He has been involved in many R&D design projects,<br />
specializing in analog electronic design, feedback control<br />
systems, and embedded software development. Jose’s<br />
preferences evolved toward the design of complex<br />
algorithms for data processing and instrument control.<br />
Since 2007, his main focus has been on Computer<br />
Graphics and <strong>GPU</strong> Computing.<br />
h Session(s): S0327 - Large and Sparse: Mass<br />
Spectrometry Data Processing in the <strong>GPU</strong><br />
(Wednesday, 14:00, Room: B)
Mario Dean<br />
Schlumberger<br />
Mario Dean’s current role is remote application delivery<br />
product champion at Schlumberger Information<br />
Solutions.<br />
h Session(s): S0434 - Schlumberger LiveQuest: Application<br />
Delivery and Collaboration Solution<br />
(Tuesday, 14:00, Room: A7)<br />
Julien Demouth<br />
Developer <strong>Technology</strong> Engineer (NVIDIA)<br />
Julien Demouth is a Developer <strong>Technology</strong> Engineer at<br />
NVIDIA where he works mainly on CUDA for high<br />
performance computing. Julien obtained his Ph.D.<br />
degree in Computational Geometry from Nancy<br />
University at INRIA in France.<br />
h Session(s): S0602 – An Introduction to the<br />
Thrust Parallel Algorithms Library<br />
(Tuesday, 17:00, Room: A3)<br />
h S0285 - Optimization of a Sparse Matrix-Matrix<br />
Multiplication on the <strong>GPU</strong><br />
(Thursday, 14:00, Room: L)<br />
Yangdong Deng<br />
Associate Professor (Tsinghua University)<br />
Yangdong Deng received his Ph.D. degree in Electrical<br />
and Computer Engineering from Carnegie Mellon<br />
University, Pittsburgh, PA, in 2006. He received his MS<br />
and BE degrees from the Department of Electronic Engineering of Tsinghua<br />
University, Beijing, in 1998 and 1995, respectively. He has<br />
been an associate professor at the Institute of<br />
Microelectronics, Tsinghua University, since 2008. He<br />
also leads the systems modeling team of the Tsinghua-<br />
Intel Center of Advanced Mobile Computing <strong>Technology</strong>.<br />
His research interests include VLSI verification, parallel<br />
microarchitecture, and parallel algorithms. He is the<br />
author or co-author of three books and over 30 papers.<br />
h Session(s): S0050 - High Performance Logic<br />
Simulation with <strong>GPU</strong>s (Tuesday, 16:00, Room: J3)<br />
Kristof Denolf<br />
Research Engineer (Barco)<br />
Kristof Denolf received the M.Eng. degree in electronics<br />
from the KHBO (Belgium) in 1998, the M.Sc. degree in<br />
electronic system design from LMU (U.K.) in 2000 and a<br />
PhD from the Technische Universiteit Eindhoven in 2007.<br />
He joined IMEC in August 1998 as a research engineer<br />
focusing on optimized, low power video implementations.<br />
During 2008, he spent six months as a visiting<br />
researcher at Xilinx research labs to work with high-level<br />
synthesis tools. In 2010, he worked as a SW architect at<br />
Philips. Recently he joined Barco’s technology center,<br />
working on cost efficient design of advanced video<br />
processing systems.<br />
h Session(s): S0252 - Building Real-Time<br />
Professional Visualization Solutions with OpenCL<br />
(Thursday, 10:30, Room: A1)<br />
Luiz DeRose<br />
Director of <strong>Program</strong>ming Environment (Cray Inc.)<br />
Dr. Luiz DeRose is a Senior Principal Engineer and the<br />
<strong>Program</strong>ming Environments Director at Cray Inc, where<br />
he is responsible for the programming environment<br />
strategy for all Cray systems. Dr. DeRose has a Ph.D. in<br />
Computer Science from the University of Illinois at<br />
Urbana-Champaign. With more than 20 years of high<br />
performance computing experience and a deep knowledge<br />
of its programming environments, he has published more<br />
than 50 peer-reviewed articles in scientific journals,<br />
conferences, and book chapters, primarily on the topics of<br />
compilers and tools for high performance computing.<br />
h Session(s): S0407 - A High Level <strong>Program</strong>ming<br />
Environment for Accelerated Computing<br />
(Tuesday, 15:00, Room: A3)<br />
Ronny Dewaele<br />
Director <strong>Technology</strong> Center (Barco)<br />
Biography unavailable at press time.<br />
h Session(s): S0252 – Building Real-Time<br />
Professional Visualization Solutions with OpenCL<br />
(Thursday, 10:30, Room: A1)<br />
Tanmay Dharmadhikari<br />
Senior Software Development Engineer (Beckman-Coulter)<br />
Biography unavailable at press time.<br />
h Session(s): S0638 – Lenovo ThinkStation<br />
Accelerates Medical Research with Beckman<br />
Coulter (Presented by Lenovo)<br />
(Tuesday, 16:00, Room: M)<br />
Michael Dickens<br />
Graduate Student (University of Notre Dame)<br />
Michael L. Dickens is a Ph.D. candidate in Electrical<br />
Engineering at the University of Notre Dame. He<br />
received a B.S. from MIT in 1991, and an M.S. degree from<br />
the University of Notre Dame in 2001. He has more than<br />
10 years of industry experience, having worked at the<br />
Oak Ridge National Labs (Oak Ridge, TN), Bolt Beranek<br />
and Newman (“BBN”, Cambridge, MA), and most<br />
recently the MITRE Corporation (Bedford, MA). His<br />
current research interests span all aspects of<br />
programming for software-defined radios -- from<br />
system boot codes to kernels, signal-processing<br />
algorithm implementations to user interfaces.<br />
h Session(s): S0134 - On the Integration of<br />
OpenCL into a Software Defined Radio<br />
(Thursday, 17:30, Room: M)<br />
Michael Dixon<br />
Research Engineer (Willow Garage, Inc)<br />
Biography unavailable at press time.<br />
h Session(s): S0088 – Point Cloud Library (PCL) on<br />
CUDA (Tuesday, 14:00, Room: C)<br />
Sebastien Domine<br />
Sr. Director, Software Engineering, Developer<br />
Tools (NVIDIA)<br />
Sébastien is the Sr. Director of Developer <strong>Technology</strong><br />
Tools at NVIDIA. He runs various software engineering<br />
teams and oversees the development of software<br />
products dedicated to easing the developer’s life and to<br />
foster the creation of more applications that can take<br />
advantage of the <strong>GPU</strong>. Prior to NVIDIA, he worked on PC<br />
games at GameFX/THQ and 3D digital content creation<br />
tools at Katrix and Nichimen Graphics. He holds a<br />
Diplôme d’Ingénieur in Computer Science from EPITA,<br />
Paris, France.<br />
h Session(s): S0430 - Developing Next-Generation<br />
CUDA Acceleration in Wolfram’s Mathematica with<br />
Parallel Nsight (Tuesday, 09:30, Room: B)<br />
Mathieu Dubois<br />
(Bull)<br />
Mathieu joined Bull in 2009 as a <strong>GPU</strong> and hardware<br />
accelerator expert. After an engineering degree in<br />
electronics and a PhD in theoretical physics and<br />
nano-sciences, he started porting electronic transport<br />
applications to Graphics Processing Units in 2007, as<br />
part of a postdoctoral project for the simulation of new<br />
materials for nano-electronics. Now a member of<br />
Bull’s Applications & Performance Team based in<br />
Grenoble, France, his main <strong>GPU</strong> activities are<br />
benchmarking, CUDA and OpenCL training, Proofs of<br />
Concept and new technology evaluations. In 2011, he<br />
was heavily involved in the deployment of the three<br />
largest <strong>GPU</strong> clusters in Europe, at CEA, GENCI and the<br />
Barcelona Supercomputing Centre.<br />
h Session(s): S0643 - Hybrid Architectures for<br />
Advanced Seismic Imaging: Recent Experiences at<br />
Bull (Presented by Bull) (Tuesday, 17:00, Room: M)<br />
Eric Dunn<br />
Electromagnetic Research Scientist (SAIC)<br />
Dr. Dunn has been a research scientist at SAIC since<br />
2005 responsible for planning and executing a diverse<br />
range of solutions to problems that employ<br />
computational electromagnetics. His current<br />
responsibilities involve serving as a principal investigator<br />
to research high frequency asymptotic methods and<br />
hybrid techniques. His research interests involve<br />
studying hardware and software acceleration for<br />
high-performance scientific computing. He has been<br />
involved with product development and training for many<br />
SAIC software tools as well as outreach to universities<br />
for collaboration and research mentoring. He received a<br />
BSEE from UMCP (1999) and an MS (2000) and PhD (2005) from UIUC.<br />
h Session(s): S0046 - Application of the <strong>GPU</strong> to a<br />
Two-Part Computational Electromagnetic<br />
Algorithm (Tuesday, 14:30, Room: J3)<br />
Daniel Egloff<br />
Managing Partner (QuantAlea GmbH)<br />
Dr. Daniel Egloff studied mathematics, theoretical<br />
physics, and computer science at the University of<br />
Zurich and the ETH Zurich. He has been working for the<br />
last 17 years in the financial industry, mainly in risk<br />
management, credit risk, and derivative pricing. Since<br />
2007 he has been actively working with <strong>GPU</strong>s to accelerate<br />
quantitative financial calculations. In 2010 he founded<br />
QuantAlea, a niche consulting firm providing specialized<br />
project services in the area of derivative modeling,<br />
statistical arbitrage strategies and risk management<br />
paired with first-class software engineering.<br />
h Session(s): S0405 - New Generation <strong>GPU</strong><br />
Accelerated Financial Quant Libraries<br />
(Wednesday, 15:00, Room: L)<br />
Anders Eklund<br />
PhD Student (Linköping University)<br />
Anders Eklund is a Ph.D. student at Linköping University,<br />
Sweden, with an M.Sc. in applied physics and electrical<br />
engineering. He is focused on medical image analysis,<br />
especially functional magnetic resonance imaging<br />
(fMRI). His current work involves using <strong>GPU</strong>s for<br />
non-parametric fMRI analysis (e.g. random permutation<br />
tests), real-time fMRI analysis (e.g. brain computer<br />
interfaces), interactive functional connectivity analysis<br />
and general medical image processing in 4D (e.g.<br />
denoising of large computed tomography (CT) datasets,<br />
512 x 512 x 450 x 20).<br />
h Session(s): S0017 - 4D Medical Image Processing<br />
with CUDA (Wednesday, 09:00, Room: A8)<br />
Rob Enderle<br />
Principal Analyst (Enderle Group)<br />
Rob is President and Principal Analyst of the Enderle<br />
Group, a forward-looking emerging technology advisory<br />
firm. With over 25 years of experience with emerging<br />
technologies he has provided regional and global<br />
companies with guidance on how to be successful in this<br />
changing world. Before founding the Enderle Group, Rob<br />
was the Senior Research Fellow for Forrester Research<br />
and the Giga Information Group. While there he worked<br />
for and with companies like Microsoft, TI, HP, IBM, Dell,<br />
Toshiba, Gateway, Sony, USAA, AMD,<br />
Intel, Credit Suisse First Boston, GM, Ford, ROLM, and<br />
Siemens. Prior to that he worked for IBM and held<br />
positions in Internal Audit, Competitive Analysis,<br />
Marketing, Finance, and Security. Currently Rob writes<br />
on Emerging Personal <strong>Technology</strong>, Security, and Linux<br />
for a wide variety of publications including<br />
TechNewsWorld, CIO, Forbes, TGdaily, TMCNET,<br />
Datamation, and IT Business Edge and international<br />
news organizations like CNBC, CNN, Bloomberg, and<br />
NPR. Rob also does a semiweekly radio spot for Wall<br />
Street Journal radio on consumer technology. Rob sits<br />
on the advisory councils for a variety of technology<br />
companies.<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Eric Enderton<br />
Research Scientist (NVIDIA)<br />
Eric Enderton is a research scientist at NVIDIA, focusing<br />
on transparency, shadows, and film rendering. He was a<br />
principal engineer on NVIDIA Gelato, the first <strong>GPU</strong>-accelerated<br />
film rendering software. Previously, Eric<br />
developed rendering and animation software at<br />
Lucasfilm’s Industrial Light & Magic and at other major<br />
film studios. His film credits include “Terminator 2”,<br />
“Jurassic Park”, and “Star Wars Episode I”. Eric has a<br />
master’s degree in computer science from the University<br />
of California at Berkeley.<br />
h Session(s): S0409 - Stochastic Rasterization<br />
(Tuesday, 15:30, Room: B)<br />
Kenneth Esler<br />
Computational Physicist (Stone Ridge <strong>Technology</strong>)<br />
Dr. Esler is a computational physicist at Stone Ridge<br />
<strong>Technology</strong> in Bel Air, Maryland. He received his<br />
bachelor’s degree in physics from MIT in 1999. He<br />
completed his Ph.D. in computational condensed matter<br />
physics at the University of Illinois at Urbana-Champaign<br />
in 2006, developing methods for quantum-level<br />
simulation of matter at finite temperature. He accepted<br />
postdoctoral appointments at the Carnegie Institution of<br />
Washington and the National Center for Supercomputing<br />
Applications. His professional interests include<br />
computational methods development, algorithm<br />
optimization, and heterogeneous computing platforms.<br />
h Session(s): S0140 - Accelerating Reservoir<br />
Simulation and Algebraic Multigrid with <strong>GPU</strong>s<br />
(Wednesday, 14:00, Room: A7)<br />
Sorin Faibish<br />
(EMC Corporation)<br />
Sorin Faibish has designed and built innovative shared<br />
high-performance storage solutions, including the<br />
architecture of NFS clusters, and architected the<br />
performance strategy of the Celerra file system. Sorin is a<br />
technology consultant and evangelist for pNFS, a member<br />
of the IETF and a contributor to the pNFS protocol, and<br />
has promoted pNFS in research forums. Sorin’s wider<br />
expertise includes clustered file systems, storage<br />
systems, high performance computing, robotic<br />
architectures, complex systems design and artificial<br />
intelligence. Sorin holds a master’s degree in EE from the<br />
Technion, Israel, is a member of IEEE, ACM, USENIX,<br />
IETF and SNIA, and has 50 papers and 36 patents.<br />
h Session(s): S0701 - Los Alamos AHPC Symposium,<br />
New <strong>GPU</strong> Appliance for Co-processing<br />
(Wednesday, 15:00, Room: J)<br />
Wes Faler<br />
Head of Software Development (Part-Time Scientists)<br />
Wesley Faler is Head of Software Development at<br />
Part-Time Scientists. He is also a software engineer with<br />
25 years of broad experience. Unusual skills include<br />
<strong>GPU</strong>-based simulations, genetic programming, FPGAs,
high voltage electronics, ion engines, and sending a<br />
rover to the moon with the Part-Time Scientists for the<br />
Google Lunar X Prize.<br />
h Session(s): S3002 – Day 3 Keynote: Not Your<br />
Grandfather’s Moon Landing<br />
(Thursday, 11:00, Keynote Hall)<br />
Robert Farber<br />
Chief Scientist (BlackDog Endeavors, LLC)<br />
Rob is recognized for his work in High Performance<br />
Computing (HPC), machine learning, complex dynamical<br />
systems and high energy physics. Lately, he has been<br />
focused on advancing the state of the art through his<br />
publications and computational research including his<br />
book CUDA Application Design and Development, online<br />
venues Doctor Dobb’s Journal and The Code Project,<br />
peer-review journals, conferences, and magazines such<br />
as Scientific Computing. Rob has co-founded two<br />
companies that achieved liquidity events, and has<br />
worked as a theoretical division scientist at Los Alamos<br />
and on staff at SFI, Berkeley and PNNL. Currently, he is<br />
working with and teaching at research and educational<br />
organizations around the world.<br />
h Session(s): S0038 - Designing Killer CUDA<br />
Applications for X86, multi<strong>GPU</strong>, and CPU+<strong>GPU</strong><br />
(Thursday, 16:00, Marriott Ballroom 3)<br />
h S0646 - Massively Parallel Code Development on<br />
Stelletto CDA (Presented by Creative Consultants)<br />
(Tuesday, 17:00, Room: A8)<br />
Reza Farivar<br />
PhD Student (University of Illinois at Urbana-Champaign)<br />
Reza Farivar received his B.S. degree in electrical<br />
engineering in 2003, and his M.S. degree in computer<br />
engineering in 2005. He is currently finishing his PhD in<br />
Electrical and Computer Engineering at the University of<br />
Illinois at Urbana-Champaign. His major research<br />
interests include parallel cloud computing programming<br />
models, heterogeneous computing algorithms<br />
(specifically with <strong>GPU</strong>s) and combining <strong>GPU</strong>s and cloud<br />
computing paradigms. He has also worked on reliability<br />
and security as well as ubiquitous computing.<br />
h Session(s): S0152 - Accurate Sequence Alignment<br />
using Distributed Filtering on <strong>GPU</strong> Clusters<br />
(Tuesday, 15:30, Room: K)<br />
Massimiliano Fatica<br />
Manager (NVIDIA)<br />
Massimiliano Fatica is a manager of the Tesla<br />
Performance Group at NVIDIA where he works in the<br />
area of <strong>GPU</strong> computing (high-performance computing<br />
and clusters). He holds a laurea in Aeronautical<br />
Engineering and a PhD in Theoretical and Applied<br />
Mechanics from the University of Rome “La Sapienza”.<br />
Prior to joining NVIDIA, he was a research staff member<br />
at Stanford University where he worked at the Center for<br />
Turbulence Research and Center for Integrated<br />
Turbulent Simulations on applications for the Stanford<br />
Streaming Supercomputer.<br />
h Session(s): S0522 – Introduction to CUDA Fortran<br />
(Monday, 14:30, Room: A3)<br />
Wu Feng<br />
Professor (Virginia Tech)<br />
Wu Feng holds dual appointments in Computer Science<br />
and Electrical & Computer Engineering at Virginia Tech<br />
(VT) and an adjunct professorship in Cancer Biology and<br />
Translational Science Institute at Wake Forest University.<br />
He is an internationally recognized expert in high-performance<br />
computing (HPC), as evidenced by his<br />
presence on HPCwire’s People to Watch List in 2011. His<br />
lab works at the synergistic intersection of HPC and the<br />
domain sciences. He is an ACM Distinguished Scientist<br />
and an IEEE Senior Member.<br />
h Session(s): S0156 - Towards Computing the Cure<br />
for Cancer (Tuesday, 17:00, Hall 1)<br />
Alex Fit-Florea<br />
Senior Engineer (NVIDIA)<br />
Alex Fit-Florea currently works for NVIDIA as the CUDA<br />
software manager in charge of core mathematical<br />
functionality, random number generators, and FFT<br />
algorithms. His main professional and research<br />
interests revolve around computer arithmetic and<br />
numerical methods. He served as a member of the<br />
IEEE 754-2008 Standard for Floating Point Arithmetic<br />
Review Committee. Alex holds B.S. and M.S. degrees<br />
from UB-B, and a PhD from SMU.<br />
h Session(s): S0085 - Floating Point and IEEE 754<br />
Compliance for NVIDIA <strong>GPU</strong>s: Precision &<br />
Performance (Wednesday, 14:30, Room: A3)<br />
Christopher Fluke<br />
Senior Lecturer (Swinburne University of <strong>Technology</strong> -<br />
Centre for Astrophysics and Supercomputing)<br />
Dr. Christopher Fluke is a Senior Lecturer at the Centre<br />
for Astrophysics and Supercomputing, Swinburne<br />
University of <strong>Technology</strong>. His main research interests are<br />
in gravitational lensing, astronomy visualization, and<br />
advanced computation, with an emphasis on the adoption<br />
of <strong>GPU</strong>s to accelerate the rate of astronomical discovery.<br />
His <strong>GPU</strong> work has included advancements in gravitational<br />
microlensing computations (teraflop/s rates achieved on<br />
the desktop), real-time terascale visualization and data<br />
analysis on <strong>GPU</strong>-clusters (for next generation radio<br />
telescopes), and strategies for adoption of <strong>GPU</strong>s by<br />
astronomers. He is the Principal Investigator of the<br />
NVIDIA CUDA Research Centre at Swinburne University.<br />
h Session(s): S0707 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Scalability:<br />
Hardware and Software (Thursday, 09:00, Room: J2)<br />
h S0022 - Scalable Frameworks and Algorithms<br />
for Terascale Radio Astronomy Images<br />
(Thursday, 14:30, Room: M)<br />
Steve Forde<br />
Senior Product Manager (Adobe)<br />
Steve Forde joined Adobe in 2011 as senior product<br />
manager for After Effects, the industry-leading software<br />
for creating sophisticated motion graphics and cinematic<br />
visual effects. In this role, Forde oversees extending<br />
After Effects into new markets and workflows. Forde is<br />
an experienced executive and co-founder of multiple<br />
businesses within media and emerging technology. He<br />
joined Adobe from Gridiron Software where he was<br />
co-founder/CEO and CTO. Gridiron develops<br />
complementary technologies for After Effects, and<br />
software for managing overall workflow in the creative<br />
enterprise. Forde grew the company from venture<br />
funding to a global operation and from a perpetual<br />
license revenue base to a SaaS model. Forde was<br />
co-founder/CEO of Creative Shack Inc. and oversaw an<br />
acquisition by Mitel Networks. Forde sits on the board of<br />
Black Cherry Digital Media.<br />
h Session(s): S0632 - Learn how Adobe After Effects<br />
CS6 takes advantage of NVIDIA Optix technology<br />
for 3D Ray Tracing (Presented by Adobe)<br />
(Tuesday, 14:00, Room: M)<br />
Dustin Franklin<br />
GP<strong>GPU</strong> Applications Engineer (GE Intelligent Platforms)<br />
Dustin is a <strong>GPU</strong> expert in the defense & aerospace<br />
industry. Originally a 3D rendering architect for games<br />
and simulations, he changed focus in 2005 to GP<strong>GPU</strong>.<br />
Dustin has years of experience in deploying high-performance<br />
CUDA applications onto rugged platforms<br />
like tanks, humvees, and UAVs. Currently, he works for<br />
GE as a GP<strong>GPU</strong> Applications Engineer and lives near<br />
Washington DC.<br />
h Session(s): S0253 - Sensor Processing with<br />
Rugged Kepler <strong>GPU</strong>s (Wednesday, 09:00, Room: M)<br />
Tom Furlong<br />
Managing Director (Granite Ventures LLC)<br />
Tom joined Granite Ventures in 2000, after a successful<br />
career in Silicon Valley that included stints as a vice<br />
president at Zhone Technologies, a communications<br />
equipment provider, and as a partner with a leading<br />
valley law firm, where he spent 13 years counseling<br />
technology companies, venture capitalists and<br />
investment banks. Tom currently serves on the Boards<br />
of Directors for Aspen Avionics, GoingOn Networks,<br />
Indicee, Mixamo and Skytide. Prior investments include<br />
Biz360 (acquired by Attensity), Digital Fountain (acquired<br />
by Qualcomm), Five Across (acquired by Cisco), Kinecta<br />
(acquired by Stellent), and TuVox (acquired by West<br />
Interactive).<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Ravikumar G.V.V.<br />
(Infosys Ltd, Bangalore)<br />
Biography unavailable at press time.<br />
h Session(s): S0214 – <strong>GPU</strong> Based Stacking Sequence<br />
Optimization For Composite Skins Using GA<br />
(Wednesday, 15:00, Room: K)<br />
Klaus Gaedke<br />
Lab Manager (Technicolor)<br />
Klaus Gaedke studied Electrical and Electronic<br />
Engineering at the University of Hannover, Germany, and<br />
received his Dipl.-Ing. and PhD degrees from this<br />
institution. In 1996 he started to work for Technicolor<br />
Research and Innovation. Currently, he is responsible for<br />
Technicolor’s Image Processing Lab. His research<br />
interests include parallel programming, parallel real-time<br />
processing architectures and real-time implementation<br />
of image processing algorithms.<br />
h Session(s): S0073 - Cost-effective <strong>GPU</strong><br />
Acceleration of a Video Restoration and Archiving<br />
Workflow (Wednesday, 15:30, Room: A1)<br />
Daniel Gaudlitz<br />
Research Associate (Technische Universität München)<br />
As a research associate at Technische Universität<br />
München, Daniel Gaudlitz works on complex multiphase<br />
flows and their numerical modelling. Efficient methods<br />
for HPC in academia and industry are also a major<br />
research focus. Daniel Gaudlitz also leads R&D activities at<br />
the engineering company FluiDyna GmbH. After graduating<br />
with a master’s degree from TU Dresden in 2003, he joined<br />
TU München and received a PhD in 2008 for his research<br />
on numerical simulations of multiphase flows.<br />
h Session(s): S0296 - A <strong>GPU</strong>-Enabled SPH Method<br />
for Micro and Nanofluidic Simulations<br />
(Tuesday, 09:00, Room: A7)<br />
Wei Ge<br />
Professor (Institute of Process Engineering, Chinese<br />
Academy of Sciences)<br />
Prof. Ge received his PhD degree at Harbin Institute of<br />
<strong>Technology</strong> in 1998 and has been professor of chemical<br />
engineering at Institute of Process Engineering, Chinese<br />
Academy of Sciences since 2006. He is mainly engaged<br />
in multi-scale simulation of particle-fluid two-phase<br />
systems. He proposed the so-called “pseudo-particle”<br />
model which enables simulation of macro-scale flow<br />
phenomena from microscopic physics through large-scale<br />
parallel computation. As project leader, he has<br />
been working on the multi-scale software and hardware<br />
systems to bridge the simulation of molecular details to<br />
reactor performance.<br />
h Session(s): S0268 - Virtual Process Engineering<br />
- Realtime Simulation of Multiphase Systems<br />
(Tuesday, 09:00, Room: A8)<br />
h S0057 - <strong>GPU</strong>-Accelerated Molecular Dynamics<br />
Simulation of Solid Covalent Crystals<br />
(Thursday, 09:00, Marriott Ballroom 4)<br />
Isaac Gelado<br />
Senior Researcher (Barcelona Supercomputing Center)<br />
Isaac Gelado is a Senior Researcher at the Barcelona<br />
Supercomputing Center and a Visiting Scholar at the<br />
Coordinated Science Laboratory at the University of<br />
Illinois. At BSC, Isaac is working in the Mont-Blanc<br />
project and the NVIDIA CUDA Center of Excellence. Isaac<br />
holds a Master’s degree in Telecommunications<br />
Engineering from the Universidad de Valladolid, and a<br />
PhD degree from the Department of Computer<br />
Architecture at the Universitat Politècnica de Catalunya,<br />
where he also held a teaching position in the Computer<br />
Architecture Department.<br />
h Session(s): S0333 – GMAC-2: Easy and Efficient<br />
<strong>Program</strong>ming for CUDA-Based Systems<br />
(Thursday, 09:00, Room: B)<br />
Shaul Gelman<br />
Co-Founder and VP of R&D (RealView Imaging Ltd.)<br />
Mr. Gelman is an experienced R&D executive with over<br />
twelve years of hands-on experience in cutting edge<br />
projects in the field of multidisciplinary display<br />
technologies. Mr. Gelman co-founded RealView Imaging<br />
in 2008 and has been leading all the company’s R&D<br />
activities since inception. Prior to that, Shaul worked for<br />
Elbit Systems (NASDAQ: ESLT), one of Israel’s largest<br />
defense companies, leading the development of<br />
high-end helmet-mounted display systems for aviation/<br />
pilot applications. Mr. Gelman earned his Executive MBA<br />
from the University of Haifa, and a B.Sc. in Industrial<br />
Engineering & Management from the Technion, Israel<br />
Institute of <strong>Technology</strong>.<br />
h Session(s): S2005 – Emerging Companies Summit:<br />
CEO on Stage Featuring RealView Imaging,<br />
Elemental Technologies, and Mersive<br />
(Wednesday, 16:00, Marriott Ballroom 4)<br />
Geoff Gerfin<br />
Sr. System Software Engineer and Technical Manager<br />
(NVIDIA)<br />
Geoff Gerfin is currently a Sr. System Software Engineer<br />
and Technical Manager in the CUDA Tools Group at<br />
NVIDIA, where he develops and manages tools for<br />
next-generation <strong>GPU</strong> architectures. Geoff has worked in<br />
the HPC community since receiving his degree in<br />
Computer Engineering from the University of Delaware<br />
in 2005.<br />
h Session(s): S0027A - All-In-One Debugging<br />
Experience with CUDA-GDB and CUDA-MEMCHECK<br />
(Monday, 14:30, Room: A5)<br />
h S0027B - All-In-One Debugging Experience with<br />
CUDA-GDB and CUDA-MEMCHECK<br />
(Wednesday, 14:00, Room: C)<br />
Denis Gerrer<br />
Denis Gerrer has 20 years of experience in HPC<br />
previously working for SGI and Altair Engineering. As<br />
CAPS VP and General Manager Americas, he is now in<br />
charge of relations with CAPS Enterprise partners.<br />
h Session(s): S0646 - Massively Parallel Code<br />
Development on Stelletto CDA (Presented by<br />
Creative Consultants) (Tuesday, 17:00, Room: A8)
Flip Gianos<br />
General Partner (InterWest Partners)<br />
Philip “Flip” Gianos has been part of InterWest’s IT team<br />
since 1982. With a background in engineering, he has<br />
invested in multiple areas of information technology,<br />
including semiconductors, computing and networking<br />
equipment, and infrastructure and applications software.<br />
He is chairman of the board of Xilinx (XLNX), a publicly<br />
held company, and is also a board member of several<br />
privately held companies, including: Bivio Networks,<br />
Brand.net, Convey Computer, and SpectraLinear. Gianos<br />
also serves on the advisory board of Storm Ventures II,<br />
and is a past president of the Western Association of<br />
Venture Capitalists.<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Oliver Gicquel<br />
Professor (Laboratoire E.M2.C, Ecole Centrale Paris)<br />
Biography unavailable at press time.<br />
h Session(s): S0129 – A Monte Carlo Thermal<br />
Radiation Solver in <strong>GPU</strong>/CPU Hybrid Architecture<br />
(Thursday, 09:00, Room: A8)<br />
Ben Goertzel<br />
CEO (Novamente LLC)<br />
Biography unavailable at press time.<br />
h Session(s): S0104 - <strong>GPU</strong> Implementation of Deep<br />
Learning for Intelligent Computer Vision<br />
(Tuesday, 16:30, Room: A1)<br />
James Goodman<br />
President/CEO (HySpeed Computing LLC)<br />
Dr. Goodman is founder and President/CEO of HySpeed<br />
Computing, a technology company specializing in<br />
developing advanced algorithms and analytic tools for<br />
the geospatial community. His expertise includes remote<br />
sensing, image analysis, mathematical modeling, and<br />
high performance computing. Dr. Goodman maintains<br />
academic affiliations with the University of Puerto Rico<br />
at Mayaguez and the University of Miami, where<br />
research is focused on remote sensing of coastal<br />
ecosystems. He has been awarded grants from NASA,<br />
NSF and NOAA, and collaborated with investigators from<br />
around the world. He is also active in the scientific<br />
community, publishing research and leading sessions at<br />
international conferences.<br />
h Session(s): S0290 - Algorithm Acceleration for<br />
Geospatial Analysis (Thursday, 09:30, Marriott<br />
Ballroom 3)<br />
David Goodwin<br />
Software Engineer (NVIDIA)<br />
David is technical lead for the CUDA Visual Profiler<br />
at NVIDIA.<br />
h Session(s): S0419A - Optimizing Application<br />
Performance with CUDA Profiling Tools<br />
(Tuesday, 09:00, Room: C)<br />
h S0420 - NSight IDE for Linux and Mac<br />
(Wednesday, 09:00, Room: A5)<br />
h S0419B - Optimizing Application Performance with<br />
CUDA Profiling Tools (Wednesday, 14:00, Room: A5)<br />
Chris Gottbrath<br />
Principal Product Manager (Rogue Wave Software)<br />
Chris Gottbrath is Principal Product Manager for<br />
TotalView, MemoryScape, ReplayEngine and<br />
ThreadSpotter at Rogue Wave Software. He’s worked<br />
with the TotalView debugger for more than a decade in a<br />
range of technical and marketing roles. Prior to that he<br />
wrote his fair share of bugs in Linux-based numerical<br />
simulations of galaxy dynamics and large-scale structure<br />
as a graduate student in Tucson, AZ. He has a Master<br />
of Science in Astronomy and Astrophysics from the<br />
University of Arizona.<br />
h Session(s): S0340 - Debug Multi-<strong>GPU</strong> Applications<br />
on CUDA-Accelerated Clusters with TotalView<br />
(Wednesday, 15:30, Room: A5)<br />
Jérôme Graindorge<br />
Project Manager (ALYOTECH)<br />
Jérôme Graindorge has worked for six years at ALYOTECH<br />
(a software services company), first as a software<br />
engineer and most recently as a project manager<br />
dedicated to HPC, particularly <strong>GPU</strong>-based<br />
scientific applications.<br />
h Session(s): S0053 - Real Time <strong>GPU</strong>-Based Marine<br />
Scenes Simulation (Thursday, 10:00, Room: N)<br />
Alan Gray<br />
HPC Architect (The University of Edinburgh)<br />
Dr. Alan Gray was awarded a Ph.D. at The University of<br />
Glasgow in Theoretical Particle Physics in 2003, winning<br />
the 2004 Ogden Prize for the best UK thesis in particle<br />
physics phenomenology. He furthered this work under a<br />
fellowship at The Ohio State University, and since joining<br />
EPCC in 2005 he has been involved with a wide range of<br />
HPC-related projects: lately his research has focused on<br />
the role <strong>GPU</strong>s will play in future generations of<br />
supercomputers, including participation in the OpenMP<br />
language committee exploring adoption of accelerators.<br />
He has authored a large number of refereed and<br />
highly-cited publications.<br />
h Session(s): S0286 - Scaling Applications to a<br />
Thousand <strong>GPU</strong>s and Beyond<br />
(Wednesday, 16:00, Room: A2)<br />
Simon Green<br />
Senior Software Engineer (NVIDIA)<br />
Simon Green is a senior member of the Developer<br />
<strong>Technology</strong> group at NVIDIA, specializing in real-time<br />
compute, rendering and physical simulation. He started<br />
graphics programming on the Sinclair ZX-81, which had<br />
1 kB of RAM and a screen resolution of 64 by 48 pixels,<br />
and has been trying to improve the quality of real-time<br />
graphics ever since.<br />
h Session(s): S0102 - Flame On: Real-Time Fire<br />
Simulation for Video Games<br />
(Tuesday, 09:00, Room: J1)<br />
Ray Grout<br />
(National Renewable Energy Laboratory)<br />
Dr. Grout’s research interests as part of the<br />
Computational Science Center at the National<br />
Renewable Energy Laboratory include algorithmic<br />
advances to facilitate integrating partial differential<br />
equations (PDEs) numerically on future architectures<br />
and development of future computational fluid dynamics<br />
(CFD) capabilities with particular emphasis on reacting<br />
flows. Dr. Grout has expertise in development of<br />
turbulent combustion submodels and has a wealth of<br />
experience developing several combustion codes at<br />
different institutions. His recent work has focused on the<br />
development of DNS (direct numerical simulation)<br />
databases for jets in cross flow from peta-scale,<br />
high-fidelity simulations in collaboration with the gas<br />
turbine industry. A key outcome of this work has been<br />
insight into the importance of low-velocity recirculation<br />
zones and stratified combustion in the stabilization of<br />
flames above a jet in cross flow. Earlier work involved<br />
using DNS to probe fundamental understanding of<br />
stratified combustion, to investigate appropriate flame<br />
markers (progress variables, tracers), and to propose<br />
new models for the combined effects of flame<br />
propagation and mixing. Dr. Grout also has experience<br />
deploying models for gaseous auto-ignition using<br />
commercial CFD codes.<br />
h Session(s): S0625 S3D Direct Numerical<br />
Simulation - Preparations for the 10-100PF Era<br />
(Tuesday, 15:00, Room: A2)<br />
Vinod Grover<br />
Senior Manager (NVIDIA)<br />
Vinod Grover manages the compiler team at NVIDIA and is<br />
responsible for compilation of CUDA and OpenCL to the PTX<br />
ISA. Vinod has been with NVIDIA for 4 years and at<br />
Microsoft and Sun Microsystems before that. He<br />
holds a Master’s degree in computer science from<br />
Syracuse University.<br />
h Session(s): S0235 - Compiling CUDA and Other<br />
Languages for <strong>GPU</strong>s (Wednesday, 10:00, Room: A5)<br />
Guy Gueritz<br />
(Bull)<br />
Guy Gueritz joined Bull in 2008 to develop Bull’s HPC<br />
business in the upstream oil and gas industry, with<br />
particular focus on <strong>GPU</strong>-accelerated hybrid systems for<br />
advanced seismic imaging applications such as Reverse<br />
Time Migration. He has over twenty years’ experience in<br />
HPC and visualization applied to the geosciences, with<br />
previous roles in Hewlett-Packard, Linux Networx and<br />
SGI. His worldwide responsibilities include working with<br />
oil companies, seismic contractors, independent<br />
software vendors and technology partners to deploy<br />
advanced imaging capabilities on scalable HPC systems.<br />
He regularly participates in oil industry seminars and<br />
conferences and is a member of SEG and EAGE.<br />
h Session(s): S0643 Hybrid Architectures for<br />
Advanced Seismic Imaging: Recent Experiences at<br />
Bull (Presented by Bull) (Tuesday, 17:00, Room: M)<br />
Thomas Guignon<br />
Research Engineer (IFPEN)<br />
Biography unavailable at press time.<br />
h Session(s): S0108 - An Innovative Massively<br />
Parallelized Molecular Dynamic Software<br />
(Tuesday, 16:00, Room: C)<br />
Kshitij Gupta<br />
Graduate Student Researcher (UC Davis)<br />
Kshitij Gupta is a Ph.D. candidate in the Department of<br />
Electrical & Computer Engineering at UC Davis. He is<br />
interested in a variety of application domains such as audio,<br />
image, and video. His primary interests are in exploring<br />
novel ways of transforming today’s high-performance<br />
algorithms onto emerging low-end, low-power, hybrid<br />
(CPU/<strong>GPU</strong>/DSP/ASIP) processors targeted towards<br />
mobile and automotive platforms. In his spare time, he<br />
likes procrastinating about novel user interfaces, and<br />
hopes to work more actively on them someday. Kshitij<br />
received his master’s in EE from the University of Pittsburgh<br />
(PA, USA), and his Bachelors in ECE from Osmania<br />
University (Hyderabad, India).<br />
h Session(s): S0157 - A Study of Persistent Threads<br />
Style <strong>Program</strong>ming Model for <strong>GPU</strong> Computing<br />
(Thursday, 15:00, Room: B)<br />
Pankaj Gupta<br />
Bioinformatics Application Developer (St. Jude Children’s<br />
Research Hospital)<br />
Pankaj is working as a Bioinformatics Application<br />
Developer at St. Jude Children’s Research Hospital in<br />
Memphis, TN. He received his bachelor’s degree in<br />
Computer Science from Rutgers University and his<br />
master’s degree in Computational Bioscience from<br />
Arizona State University. He likes working with open-source<br />
technologies whenever possible.<br />
h Session(s): S0083 - Swift: A <strong>GPU</strong>-based Smith-<br />
Waterman Sequence Alignment <strong>Program</strong><br />
(Tuesday, 09:30, Room: K)<br />
Rohit Gupta<br />
PhD Student (Delft University of <strong>Technology</strong>)<br />
Rohit completed his master’s degree in computer engineering at the Delft University of<br />
<strong>Technology</strong>. During his<br />
master’s thesis he worked on implementing a<br />
preliminary version of a preconditioned conjugate<br />
gradient solver on the <strong>GPU</strong>. After graduating, he continued at the Delft<br />
Institute of Applied Mathematics as a PhD student.<br />
His primary focus is to find new<br />
preconditioning methods that are suited to the <strong>GPU</strong> and<br />
at the same time are on par with established parallelizable<br />
preconditioning techniques such as Block Incomplete<br />
Cholesky in terms of achievable precision and<br />
mathematical stability.<br />
h Session(s): S0063 - Robust Preconditioned Conjugate Gradient<br />
for the <strong>GPU</strong> and Parallel Implementations<br />
(Thursday, 16:00, Room: N)<br />
Sebastien Gurrieri<br />
Quantitative Analyst (Mizuho International)<br />
With a background of research in Theoretical Physics<br />
(String Theory), Gurrieri switched to finance 4 years ago.<br />
He is now working in the London branch of a Japanese<br />
investment bank and specializes in Risk Management of<br />
Fixed Income and Equity products. Until now he has<br />
been mostly interested in calibration and Monte-Carlo<br />
simulation issues, although he has also done some work<br />
on Finite Difference methods.<br />
h Session(s): S0206 - Monte-Carlo Pricing<br />
Under a Hybrid Local Volatility Model<br />
(Wednesday, 16:00, Room: L)<br />
Tobias Gysi<br />
(Supercomputing Systems AG)<br />
Tobias Gysi graduated in 2005 with a degree in computer science from<br />
ETH Zurich, Switzerland. He joined the R&D service<br />
provider Supercomputing Systems AG (SCS), working on<br />
advanced topics such as cryptography, image<br />
processing, speech recognition, and Monte-Carlo<br />
pricing. Tobias’ work has a strong focus on performance<br />
optimizations - developing more efficient<br />
implementation strategies and algorithms, and<br />
employing accelerators such as <strong>GPU</strong>s or FPGAs.<br />
Currently, Tobias is working on a community code<br />
project where software maintainability and<br />
(performance) portability are key issues.<br />
h Session(s): S0256 – A Stencil Library for the New<br />
Dynamic Core of COSMO (Thursday, 09:00, Room: N)<br />
Alexander Haberstroh<br />
Software Developer (Jedox AG)<br />
Alexander Haberstroh studied computer science with a<br />
focus on image processing at the University of Freiburg,<br />
Germany, where he obtained his Master’s degree in<br />
2010. Between 2008 and 2010, he also worked at<br />
the Fraunhofer Institute for Solar Energy Systems.<br />
During his studies he worked on his first CUDA project,<br />
developing algorithms for comparing depth maps which<br />
are used in mobile robot mapping. Since 2011, he has<br />
been working at Jedox, concentrating on <strong>GPU</strong><br />
algorithms for multidimensional databases in the area<br />
of Business Intelligence.<br />
h Session(s): S0219 – Efficient Top-Down Planning in<br />
Business Intelligence (Tuesday, 17:00, Room: C)<br />
Markus Hadwiger<br />
Assistant Professor (KAUST)<br />
Markus Hadwiger is an assistant professor of computer<br />
science at King Abdullah University of Science and<br />
<strong>Technology</strong> (KAUST) in Saudi Arabia. His research<br />
interests are petascale visual computing and scientific<br />
visualization, volume rendering, and <strong>GPU</strong> algorithms in<br />
general. He is currently teaching classes on scientific
visualization, and <strong>GPU</strong> and GP<strong>GPU</strong> programming. He<br />
obtained a PhD in computer science from the Vienna<br />
University of <strong>Technology</strong>. He has taught a series of<br />
courses on various aspects of visualization and volume<br />
rendering at ACM SIGGRAPH, IEEE Visualization, and<br />
Eurographics, and is a coauthor of the book Real-Time<br />
Volume Graphics (A.K. Peters, 2006).<br />
h Session(s): S0202 – Terascale Volume Visualization<br />
in Neuroscience (Wednesday, 16:30, Room: A8)<br />
Yoshiaki Hanada<br />
CEO (Prometech Software, Inc.)<br />
Yoshiaki Hanada is CEO of Prometech Software and<br />
works to promote particle simulation technology<br />
from Japan to the world. In his former job, he worked at<br />
Accenture Japan as a management consultant. In 2006<br />
he received a master’s degree from the Department of<br />
Advanced Energy, Graduate School of Frontier Sciences,<br />
The University of Tokyo.<br />
h Session(s): S0066 - Particleworks: Particle-based<br />
CAE Software Fully Ported on Multi-<strong>GPU</strong><br />
(Wednesday, 10:00, Room: K)<br />
Jerry Harris<br />
Senior Computer Scientist II (Adobe Systems)<br />
For 25+ years, Jerry has focused on deploying engaging<br />
commercial imaging applications: first as part of a<br />
startup that delivered the first commercial color paint<br />
program for the Macintosh, later at Apple, and for the<br />
past 15 years at Adobe working on Photoshop. He has been<br />
an engineer on the Photoshop team since version<br />
5.0 and is responsible for Layer Effects, Painting, Warping,<br />
and <strong>GPU</strong> acceleration. His current focus is on <strong>GPU</strong><br />
enablement and on delivering joy of use through immersive,<br />
fluid workflows.<br />
h Session(s): S0395 - <strong>GPU</strong> Enablement in Adobe<br />
Photoshop (Tuesday, 09:00, Room: A2)<br />
Mark Harris<br />
Chief Technologist, <strong>GPU</strong> Computing (NVIDIA)<br />
Mark Harris is Chief Technologist for <strong>GPU</strong> Computing at<br />
NVIDIA, where he works as a developer advocate and<br />
helps drive NVIDIA’s <strong>GPU</strong> computing software strategy.<br />
His research interests include parallel computing,<br />
general-purpose computation on <strong>GPU</strong>s, physically based<br />
simulation, and real-time rendering. Mark founded<br />
www.GP<strong>GPU</strong>.org while he was earning his PhD in computer<br />
science from the University of North Carolina at Chapel<br />
Hill. Mark brews his own beer and cures his own bacon<br />
in Brisbane, Australia, where he lives with his wife and<br />
daughter.<br />
h Session(s): S0517A - <strong>Program</strong>ming <strong>GPU</strong>s with<br />
OpenACC (Part 1 of 3) (Monday, 10:30, Room: B)<br />
h S0517B - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />
2 of 3) (Monday, 13:00, Room: B)<br />
h S0517C - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part 3<br />
of 3) (Monday, 14:30, Room: B)<br />
h S0641 - CUDA 5 and Beyond (Tuesday, 16:00, Hall 1)<br />
h S0653 - C++ and CUDA Birds-of-a-Feather<br />
(Wednesday, 18:00, Room: L)<br />
Mike Heck<br />
<strong>Technology</strong> Advisor (VSG)<br />
Biography unavailable at press time.<br />
h Session(s): S0444 - Explore New Techniques in<br />
Volume Rendering/Segmentation with Open<br />
Inventor (Tuesday, 15:30, Room: A7)<br />
Francisco J. Hernandez-Lopez<br />
PhD Student (CIMAT A.C.)<br />
Francisco received a bachelor’s degree in computer<br />
systems engineering from the San Luis Potosi Institute<br />
of <strong>Technology</strong>, Mexico in 2005. He received the MSc<br />
degree in Computer Science from the Center for<br />
Research in Mathematics (CIMAT) in 2009. Since then, he<br />
has been a doctoral student at CIMAT, where he has been<br />
granted a CONACYT scholarship. His main interests are<br />
in the area of computer vision, in particular the<br />
development of efficient parallel algorithms for video<br />
processing and analysis.<br />
h Session(s): S0128 - V:Screen: A Real-Time<br />
Augmented Video Method<br />
(Wednesday, 17:00, Room: A1)<br />
David Helgason<br />
CEO (Unity Technologies)<br />
David Helgason, an entrepreneur, visionary and<br />
ex-programmer, has served as the CEO of Unity<br />
Technologies since co-founding it in 2003. The company’s vision is<br />
to democratize game development and develop<br />
technology for the next generation of the industry. David<br />
founded and participated in startups in fields such as<br />
news and community integration, music distribution and<br />
consulting. He serves on the boards of several games<br />
and technology startups.<br />
h Session(s): S2001 – Emerging Companies Summit:<br />
CEO on Stage Featuring Unity Technologies,<br />
MirriAd and BioDigital<br />
(Wednesday, 10:00, Marriott Ballroom 4)<br />
Jeff Herbst<br />
Vice President of Business Development (NVIDIA)<br />
Jeff is the Vice President of Business Development at<br />
NVIDIA Corporation, the world leader in visual<br />
computing technologies (and inventor of the <strong>GPU</strong>). In<br />
this role, which he has held since 2001, Jeff leads<br />
NVIDIA’s worldwide business development efforts,<br />
including overall ecosystem development, mergers and<br />
acquisitions strategy, investments, partnerships and<br />
other strategic business relationships and transactions.<br />
Prior to NVIDIA, Jeff was the worldwide head of<br />
corporate and business development at AltaVista, and<br />
also served as general manager for a start-up focused<br />
on content delivery infrastructure for wireless networks.<br />
Earlier in his career, Jeff was a partner with the law firm<br />
of Wilson Sonsini where he specialized in corporate<br />
finance, joint ventures, mergers and acquisitions and<br />
other strategic business and intellectual property-related<br />
transactions. Jeff holds a B.S. degree in<br />
Computer Science from Brown University (where he<br />
studied computer graphics), and a law degree from<br />
Stanford Law School.<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Berk Hess<br />
PhD Student (KTH Royal Institute of <strong>Technology</strong>)<br />
Biography unavailable at press time.<br />
h Session(s): S0363 – Efficient Molecular Dynamics<br />
on Heterogeneous <strong>GPU</strong> Architectures in GROMACS<br />
(Wednesday, 16:00, Room: N)<br />
Christopher Horvath<br />
Global <strong>Technology</strong> Technical Director (Pixar)<br />
Biography unavailable at press time.<br />
h Session(s): S0102 – Flame On: Real-Time Fire<br />
Simulation for Video Games<br />
(Tuesday, 09:00, Room: J1)<br />
Julien Houssay<br />
Software Engineer (ALYOTECH)<br />
Julien is a software engineer at ALYOTECH, specializing<br />
in <strong>GPU</strong> computing for scientific applications. He is<br />
currently working on a marine scene simulator combining<br />
electro-optics and radar, using <strong>GPU</strong>s for both general<br />
purpose computing (CUDA and/or OpenCL) and<br />
rendering (OpenGL).<br />
h Session(s): S0053 – Real Time <strong>GPU</strong>-Based Marine<br />
Scenes Simulation (Thursday, 10:00, Room: N)<br />
Agatha Hu<br />
Developer Technology Engineer (NVIDIA)<br />
Agatha Hu is a Developer <strong>Technology</strong> Engineer at NVIDIA<br />
Corporation. She received a master’s degree in Biomedical<br />
Engineering from Shanghai Jiaotong University. Her work<br />
includes developing data-parallel <strong>GPU</strong> algorithms for<br />
bioinformatics and image processing.<br />
h Session(s): S0084 CUMACH - A Fast <strong>GPU</strong>-based<br />
Genotype Imputation Tool<br />
(Wednesday, 16:30, Room: B)<br />
Jen-Hsun Huang<br />
Co-Founder, President and CEO (NVIDIA)<br />
Jen-Hsun Huang co-founded NVIDIA in 1993 and has<br />
served since its inception as president, chief executive<br />
officer and a member of the board of directors. Under<br />
his leadership, NVIDIA invented the graphics processing<br />
unit (<strong>GPU</strong>) in 1999. Since then, it has consistently set<br />
new standards in visual computing with breathtaking,<br />
interactive graphics available on devices ranging from<br />
tablets and portable media players to notebooks and<br />
workstations. NVIDIA’s expertise in programmable <strong>GPU</strong>s<br />
has led to breakthroughs in parallel processing which<br />
make supercomputing inexpensive and widely<br />
accessible. The company holds more than 1,100 U.S.<br />
patents, including ones covering designs and insights<br />
fundamental to modern computing.<br />
h Session(s): S3000: Opening Keynote<br />
(Tuesday, 10:30, Keynote Hall)<br />
h S2003: Emerging Companies Summit Fireside Chat<br />
(Wednesday, 14:00, Marriott Ballroom 4)<br />
John Humphrey<br />
Engineering Director (EM Photonics)<br />
John received his MSEE from the University of Delaware<br />
in 2004 and has been working in the field of accelerated<br />
computing for 10 years. The past six years have focused<br />
primarily on <strong>GPU</strong> applications, in areas ranging from<br />
computational electromagnetics to computational fluid<br />
dynamics and linear algebra libraries.<br />
h Session(s): S0304 - Large Scale Computational<br />
Fluid Dynamics Simulations on Hybrid<br />
Supercomputers (Wednesday, 10:30, Room: K)<br />
h S0307 - New Advances in <strong>GPU</strong> Linear Algebra<br />
(Wednesday, 14:00, Room: A3)<br />
h S0709 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 2<br />
(Thursday, 14:00, Room: J1)<br />
Maxwell Hutchinson<br />
PhD Student (University of Chicago)<br />
Maxwell is currently a physics PhD student at the<br />
University of Chicago, funded by a Department of Energy<br />
Computational Science Graduate Fellowship. He has<br />
been working with GP<strong>GPU</strong>s since 2008, applying them to<br />
problems in electronic structure, Ising models, error<br />
correction in radio systems, and post-processing for<br />
particle detectors.<br />
h Session(s): S0378 - VASP Accelerated with <strong>GPU</strong>s<br />
(Thursday, 14:00, Marriott Ballroom 4)<br />
Saeed Iqbal<br />
Senior Systems Engineer (Dell)<br />
Saeed Iqbal is a Senior Systems Engineer in the Global<br />
Solutions Engineering Group at Dell. Currently, he is the<br />
lead engineer on integration and performance analysis<br />
of <strong>GPU</strong>s in the Dell HPC solutions. He is also the lead<br />
engineer of the HPC advisor online tool at Dell.com/hpc.<br />
This tool is used by HPC customers to configure <strong>GPU</strong><br />
enabled HPC clusters and associated high performance<br />
parallel storage clusters.<br />
h Session(s): S0309 – Dynamically Allocating GP<strong>GPU</strong><br />
to Host Nodes (Servers) (Thursday, 10:30, Room: K)<br />
Olexan Isayev<br />
Research Scientist (Case Western Reserve University)<br />
Olexan Isayev was born in Ukraine and earned his Ph.D.<br />
in Theoretical Chemistry under the supervision of Jerzy<br />
Leszczynski at Jackson State University. He is currently<br />
a joint Postdoctoral Fellow at Case Western Reserve<br />
University and the US Army Engineer Research and<br />
Development Center (ERDC). Dr. Isayev’s research<br />
interests focus on structure and dynamics at bio-nano<br />
interfaces, first-principles and hybrid QM/MM simulations,<br />
and high performance computing.<br />
h Session(s): S0315 - Probing Bio-Nano Interface<br />
Structure from Microsecond Molecular Dynamics<br />
on <strong>GPU</strong>s (Thursday, 10:00, Marriott Ballroom 4)<br />
Michel Izygon<br />
CTO (Tietronix Software, Inc.)<br />
Dr. Izygon has been involved in solar energy projects<br />
since 1982, when he became the Principal Investigator<br />
on a French-Israeli research project to build and assess<br />
the performance of different solar energy concentrating<br />
systems. In 1999, Dr. Izygon co-founded<br />
Tietronix Software, where he serves as CTO; the company specializes<br />
in custom software development for customers such<br />
as NASA.<br />
h Session(s): S0321 – <strong>GPU</strong>-Based Monte Carlo Ray<br />
Tracing Simulation for Solar Power Plants<br />
(Tuesday, 14:00, Room: A8)<br />
Kevin Jackson<br />
Founder / CEO (Viewpartners)<br />
Kevin Jackson is founder and CEO of Viewpartners. He<br />
has 20+ years of visual media experience and was one of the<br />
first in L.A.’s special effects market. He has worked with<br />
the biggest names in the film and advertising industry –<br />
Sony, Disney, BBDO, JWT, and others.<br />
h Session(s): S0425 - File Sharing Plus Real Time<br />
Media and Document Collaboration<br />
(Wednesday, 17:30, Room: A1)<br />
Jan Jacob<br />
Postdoctoral Researcher (University of Hamburg)<br />
Dr. Jan Jacob is a postdoctoral researcher at the<br />
Institute of Applied Physics of the University of Hamburg,<br />
Germany. He studied physics in Hamburg and graduated<br />
in 2007 with his diploma thesis “Preparation and<br />
Characterization of Spin Filters based on InAs Quantum-<br />
Point Contacts”. Two years later, in 2009, he received his<br />
Ph.D. from the University of Hamburg for his thesis<br />
“All-electrical InAs Spin Filters”. Since then, he has expanded<br />
his research from low-temperature magnetotransport<br />
measurements to include numerical high-performance<br />
computing simulations of spin and charge transport in<br />
mesoscopic systems to model spintronic devices.<br />
h Session(s): S0379 - <strong>GPU</strong>-based High-Performance<br />
Simulations for Spintronics<br />
(Tuesday, 14:30, Room: A8)<br />
M. Saleet Jafri<br />
Professor and Chair (George Mason University)<br />
M. Saleet Jafri is a Professor in the School of Systems<br />
Biology at George Mason University. His current research<br />
uses detailed multi-scale models consisting of the<br />
subcellular, cellular, and tissue components to
understand the mechanisms that give rise to complex<br />
diseases in the heart such as cardiac arrhythmia,<br />
ischemic heart disease, and heart failure. <strong>GPU</strong> computing<br />
plays a central role in these studies. He received his PhD<br />
from Mount Sinai School of Medicine/CUNY in the<br />
Biomathematical Sciences, MS in Mathematics from the<br />
Courant Institute of Mathematical Sciences at NYU, and his<br />
BS in mathematics from Duke University.<br />
h Session(s): S0072 - <strong>GPU</strong>-Enabled Spatiotemporal<br />
Model of Stochastic Cardiac Calcium Dynamics and<br />
Arrhythmias (Wednesday, 09:00, Room: B)<br />
Michal Januszewski<br />
PhD Student and Software Engineer (University of Silesia<br />
in Katowice; Google Switzerland)<br />
Michał Januszewski is a Software Engineer at Google<br />
Switzerland and a PhD student at the University of<br />
Silesia in Katowice under the supervision of Prof. Marcin<br />
Kostur. His current research is centered around applying<br />
mesoscale hydrodynamics simulation methods to<br />
biologically relevant flows. Michał is also the leader of<br />
the Sailfish project, an open source effort to build a<br />
highly scalable lattice Boltzmann fluid dynamics solver<br />
for <strong>GPU</strong>s.<br />
h Session(s): S0258 - Sailfish: Lattice Boltzmann<br />
Fluid Simulations with <strong>GPU</strong>s and Python<br />
(Tuesday, 09:30, Room: A7)<br />
WeiLe Jia<br />
Postgraduate Student (Supercomputing Center of CNIC,<br />
Chinese Academy of Sciences)<br />
Weile Jia is a postgraduate student at the<br />
Supercomputing Center of the Chinese Academy of Sciences.<br />
h Session(s): S0392 - Large-Scale First Principle<br />
Pseudopotential DFT Calculations on <strong>GPU</strong> Clusters<br />
(Thursday, 15:30, Marriott Ballroom 4)<br />
Stephen Jones<br />
CUDA Developer (NVIDIA)<br />
Stephen Jones is a member of CUDA’s parallel<br />
algorithms group. Having first worked on the CUFFT<br />
library, he moved on to architect the parallel system<br />
software framework which enables system I/O from <strong>GPU</strong><br />
kernels, and wrote the first parallel system calls. He has<br />
made a particular study of thread execution on the <strong>GPU</strong>,<br />
and now works on future <strong>GPU</strong> architectures and<br />
development of the CUDA programming model.<br />
h Session(s): S0313 – Understanding and using<br />
Atomic Memory Operations<br />
(Tuesday, 14:00, Marriott Ballroom 3)<br />
h S0642 - Inside Kepler (Wednesday, 14:00, Hall 1)<br />
h S0338 - New Features In the CUDA <strong>Program</strong>ming<br />
Model (Thursday, 10:00, Hall 1)<br />
h S0707 - Los Alamos AHPC Symposium, Accelerated<br />
HPC Symposium: Scalability: Hardware and<br />
Software (Thursday, 09:00, Room: J2)<br />
Mark E S Joselli<br />
Researcher (UFF)<br />
Mark graduated in Industrial Engineering, with an emphasis in<br />
Electrical and Electronic Engineering, from the Federal Center for Technological<br />
Education Celso Suckow da Fonseca (CEFET-RJ, 2005)<br />
and received an MSc in Computer Science from Federal Fluminense<br />
University (2007). He has experience in Computer<br />
Science with an emphasis on Computer Methods and<br />
Techniques, working mainly on games,<br />
simulation, and GP<strong>GPU</strong>.<br />
h Session(s): S0074 - Techniques for Designing<br />
GP<strong>GPU</strong> Games (Thursday, 17:00, Room: L)<br />
Guido Juckeland
System Engineer (HPC), Leader Hardware Accelerator Group (TU Dresden - ZIH)
Guido is a computer engineer at Technische Universität Dresden, where he is responsible for the design, setup and operation of the HPC resources for the state of Saxony. He is also working on a Ph.D. thesis titled “Trace Based Performance Analysis for Hardware Accelerators”.
• Session(s): S0067 - PIConGPU - Bringing Large-Scale Laser Plasma Simulations to GPU Supercomputing (Tuesday, 15:00, Room: A8)
• S0257 - Trace Based Performance Analysis For GPU Accelerated Multi-Hybrid Applications (Wednesday, 16:30, Room: A5)
Patrick Kano
Co-Owner (Acunum Algorithms and Simulations, LLC)
Patrick Kano’s background lies in algorithm and simulation development and physics-based modeling. In addition to being a co-owner of Acunum, he is a consultant with PsiNapse Technology in the San Francisco Bay Area. He studied at the University of Arizona (2001-2005) and was a student at the Arizona Center for Mathematical Sciences. He received a Diplom-Physik from the Dresden University of Technology in 2000 and a BS in physics from the University of Nevada, Reno in 1998. From 1998 to 2000, he was a research assistant at the Max Planck Institute for the Physics of Complex Systems.
• Session(s): S0415 - An Accelerated Weeks Method for Numerical Laplace Transform Inversion (Wednesday, 09:30, Marriott Ballroom 3)
Steve Karmesin
Senior Developer (Numerix)
Dr. Steve Karmesin is a senior developer at Numerix LLC, working on many aspects of the CrossAsset derivatives pricing and analytics software, from software architecture to numerical modeling to GPU development. His GPU work rests on his background in supercomputing at the Los Alamos Advanced Computing Laboratory, where he worked on numerous massively parallel projects, including leading the POOMA (Parallel Object Oriented Methods and Algorithms) team in applying advanced C++ techniques to large-scale scientific codes.
• Session(s): S0383 - Speedup Derivatives and Structured Products Pricing, Reduce TCO Using GPUs (Wednesday, 09:00, Room: L)
Eric Kelmelis
CEO (EM Photonics)
Biography unavailable at press time.
• Session(s): S0304 - Large Scale Computational Fluid Dynamics Simulations on Hybrid Supercomputers (Wednesday, 10:30, Room: K)
Christopher Kennelly
Research Scientist (D. E. Shaw Research)
Chris Kennelly received his B.S. in computer science from Caltech. During his time as an Amgen Scholar at Caltech, he developed algorithms for simulating DNA self-assembly. Since then, Chris has been employed at D. E. Shaw Research, developing algorithms and software for Desmond.
• Session(s): S0078 - Panoptes: A Binary Instrumentation Framework for CUDA (Thursday, 10:00, Room: B)
Osman Kent
Co-Founder & CEO (Numecent)
Osman Kent is a serial technology and media entrepreneur. He is best known as the co-founder and CEO of 3Dlabs, at one time a $1B company on NASDAQ, and as one of the fathers of the GPU and of OpenGL on the PC. He has a First Class double major in Computer Science and Electronics from the University of Birmingham (UK), is a fellow of the Royal Society of Arts (RSA), and was recently given the Freedom of the City of London for lifetime contributions to the IT industry. He is the inventor on numerous patents in computing and graphics. In his spare time, Osman incubates musicians through his record label Songphonic, recites live poetry while improvising on the piano, and produces music for films.
• Session(s): S2003 - Emerging Companies Summit: CEO on Stage Featuring GAIKAI, Immersive Media and Numecent (Wednesday, 15:00, Marriott Ballroom 4)
Mahesh Khadtare
PhD Student - Scientist ESP (I2IT, Pune University)
Biography unavailable at press time.
• Session(s): S0103 - Accelerating Protein Sequences and Classification using GPU-HMMER Search (Wednesday, 15:30, Room: B)
• S0107 - Acceleration of Long-Wave Rapid Radiative Transfer Model on GPGPU (Thursday, 10:30, Room: N)
Brucek Khailany
Senior Research Scientist (NVIDIA)
Brucek Khailany joined NVIDIA in December 2009 as a member of the Computer Architecture Research Group. Previously, Dr. Khailany was a Co-Founder and Principal Architect at Stream Processors, Inc. (SPI), where he led research and development activities related to highly parallel programmable processor architectures. He received his Ph.D. and Masters in Electrical Engineering from Stanford University and received B.S.E. degrees in Electrical Engineering and Computer Engineering from the University of Michigan.
• Session(s): S0605 - cudaDMA: Emulating DMA engines on GPUs for Performance and Programmability (Wednesday, 17:00, Room: C)
Ali Khajeh-Saeed
PhD Candidate (University of Massachusetts, Amherst)
Ali Khajeh-Saeed obtained his Ph.D. in Mechanical Engineering and in Computer Science at the University of Massachusetts Amherst in November 2011. Ali received bachelor’s and master’s degrees in Aerospace Engineering from Sharif University of Technology, Iran, in 2008. His main research interests are computational fluid dynamics (CFD), parallel computation and general-purpose computation on graphics processing units (GPGPU). He is currently working as a software engineer at CD-Adapco.
• Session(s): S0217 - Efficient Implementation of CFD Algorithms on GPU Accelerated Supercomputers (Wednesday, 17:30, Room: K)
Oleh Khoma
Head of HPC Unit (ELEKS)
With a background in applied mathematics and more than 12 years of experience in software engineering, Oleh is Head of the HPC Unit at ELEKS and leads the company’s efforts in the realm of high performance computing. Over the last couple of years, Oleh and his team have successfully completed several complex bespoke HPC solutions utilizing the power of NVIDIA GPGPU cards. Passionate about engineering, his greatest affection is for his team: when you have the right people, no problem is too challenging.
• Session(s): S6047 - Effective HPC Architecture - Design, Develop, Implement (Presented by ELEKS) (Wednesday, 17:00, Room: A7)
Mark Kilgard
Principal Software Engineer (NVIDIA)
Mark J. Kilgard is a Principal System Software Engineer and an NVIDIA Distinguished Inventor based in Austin, Texas. Mark works on OpenGL, programmable shading languages, and GPU-rendering algorithms. Mark wrote numerous important OpenGL extension specifications and implemented the popular OpenGL Utility Toolkit (GLUT) for developing portable OpenGL examples and demos. Mark co-authored the book The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. Mark’s karaoke rendition of Dolly Parton’s “9 to 5” can’t be beat.
• Session(s): S0023 - NVIDIA OpenGL for 2012 (Monday, 09:00, Room: A3)
• S0024 - GPU-Accelerated Path Rendering (Tuesday, 14:00, Room: A3)
Jihan Kim
Postdoctoral Researcher (Berkeley Lab)
Jihan Kim began his postdoctoral position at NERSC in August 2009, after earning his doctorate in electrical engineering at the University of Illinois at Urbana-Champaign. For his dissertation, Kim wrote a quantum Monte Carlo code in C used to conduct simulations of quantum dots. He also worked on the device simulator Charon during a summer internship at Sandia National Laboratories. Currently, he is collaborating with Prof. Berend Smit from UC Berkeley on a carbon capture and separation project.
• Session(s): S0122 - Computational Screening of Novel Carbon Capture Materials (Thursday, 10:30, Marriott Ballroom 4)
Grzegorz Kokosiński
Software Engineer (IBM Poland)
Grzegorz Kokosiński received his MSc in Computer Science from Warsaw University of Technology in Poland in 2010, with a thesis on a ray tracing implementation in CUDA. Since February 2011, he has been a Software Engineer in the IBM Netezza R&D Department in Warsaw, Poland. He has been involved in an HPC appliance project as a CUDA team member, where he contributed to many proofs of concept, including CUDA implementations of advanced analytics, bioinformatics and geospatial algorithms.
• Session(s): S0376 - Dynamic Programming on CUDA: Finding the Most Similar DNA Sequence (Tuesday, 10:00, Room: K)
David Korf
Senior Marketing Manager (Hewlett-Packard)
Mr. Korf has 16 years of engineering experience and has spent the last 25 years in various senior marketing, product management and partner management positions. His current focus areas include accelerators, partner relationships and competitive analysis.
• Session(s): S0633 - Learn about new Hewlett-Packard GPU Systems, Solutions, and Applications! (Wednesday, 10:00, Room: M)
Alexandr Kosenkov
Software Engineer (University of Geneva)
Alexandr is a highly qualified software engineer in the field of HPC and distributed applications under Linux, with over five years of experience. He has a strong understanding of hardware architecture design down to the physics level, as well as of high-level technologies and programming languages. An open-minded leader, he delivers usable, well-designed products.
• Session(s): S0039 - Data-Driven GPGPU Ideology Extension (Thursday, 10:00, Marriott Ballroom 3)
Jiri Kraus
(Fraunhofer Institute for Algorithms and Scientific Computing (FhG-SCAI))
Biography unavailable at press time.
• Session(s): S0706 - Los Alamos AHPC Symposium, Efficient AMG on Hybrid GPU Clusters (Wednesday, 17:00, Room: J)
Adarsh Krishnamurthy
Post-Doctoral Researcher (UC San Diego)
Adarsh Krishnamurthy is a post-doctoral researcher in the department of bioengineering at UC San Diego. His research interests include computer-aided design (CAD), geometric modeling, parallel GPU algorithms, biomechanics, and heart modeling. He received his Ph.D. in mechanical engineering from UC Berkeley, specializing in parallel GPU algorithms for CAD. He received his bachelor’s and master’s degrees in mechanical engineering from the Indian Institute of Technology, Madras, India.
• Session(s): S0410 - Computing Hausdorff Distances between Freeforms on the GPU (Wednesday, 17:00, Marriott Ballroom 3)
Christoph Kubisch
Developer Technology Engineer (NVIDIA)
Prior to joining NVIDIA as a Developer Technology Engineer (Professional Solutions), Christoph was a Ph.D. student working on hardware-accelerated visualization techniques for medical datasets at the Otto-von-Guericke University of Magdeburg. During his studies he co-authored luxinia, a scriptable 3D game engine for games and research projects. He has also worked in the games industry as a technical artist, doing game art as well as shader and 3ds Max plugin development.
• Session(s): S0105 - Hardware Acceleration for Vessel Visualization Tasks (Wednesday, 14:30, Room: A8)
Wesley Kuo
CEO (Ubitus)
Wesley Kuo founded Ubitus Inc. in 2007. Ubitus specializes in providing cutting-edge cloud computing technology for multimedia applications and has won recognition from leading carriers and handheld device manufacturers around the world, including NTT, NTT Docomo and Samsung Electronics. Wesley is a successful entrepreneur who founded i@Solution Inc. in 2000, which later merged with Aplix Corporation in 2004, where he was a board member and held several managerial positions in international sales, marketing and OEM business. Wesley holds a Bachelor’s degree in Computer Science and Information Engineering from National Taiwan University and has dedicated his career to cloud computing, distributed computing and embedded solutions.
• Session(s): S2002 - Emerging Companies Summit: CEO on Stage Featuring eyesight Mobile, Numira Biosciences, and Ubitus (Wednesday, 11:00, Marriott Ballroom 4)
Jean Luc Lacome
CEO (IMPETUS Afea SAS)
Jean Luc Lacome has a background in applied mathematics and has been working for the past 10 years on the development of Smoothed Particle Hydrodynamics. Jean Luc has interests in fluid-structure interaction and defense applications. Jean Luc is CEO of IMPETUS Afea France.
• Session(s): S0143 - Fluid-Structure-Interaction Using SPH and GPGPU Technology (Wednesday, 14:30, Room: K)
Gianluca Lamanna
Researcher (CERN)
Gianluca is a physicist working at CERN, the European Laboratory for Particle Physics. He is currently involved in building the trigger system and the data acquisition system for an experiment searching for very rare processes. He obtained his PhD in physics in 2006 at the University of Pisa with a thesis in data analysis on the search for possible violations of the Standard Model of particle physics. After his PhD he spent a few years acquiring skills in electronics design and FPGA programming, which are very useful in his field for building detectors and acquisition systems.
• Session(s): S0013 - GPUs for Fast Triggering in NA62 Experiment (Tuesday, 10:00, Room: J2)
Bjoern Landmann
Development Engineer (FluiDyna GmbH)
Landmann has been a development engineer at FluiDyna GmbH, Munich, Germany, since 2011. His research interests include computational multiphysics, high-performance computing, and turbulence and aeroacoustics.
• Session(s): S0293 - Culises - A Library for Accelerated CFD on Hybrid GPU-CPU Systems (Wednesday, 15:30, Room: K)
Ian Lane
Assistant Research Professor (Carnegie Mellon University)
Biography unavailable at press time.
• Session(s): S0223 - Rapid Training of Acoustic Models Using GPUs (Tuesday, 15:00, Room: N)
Gerhard Lang
Chief Engineering Officer (VizRT)
Biography unavailable at press time.
• Session(s): S0356 - Optimizing Texture Transfers (Tuesday, 16:00, Room: J2)
Tobias Lauer
Senior Researcher (Jedox AG)
Tobias Lauer received his PhD in computer science from the University of Freiburg (Germany) in 2007. From 2008 to 2011, he did research on parallel algorithms for OLAP applications in a project sponsored by the German Research Foundation (DFG). He is now a Senior Researcher at Jedox AG, a software company specializing in Business Intelligence.
• Session(s): S0219 - Efficient Top-Down Planning in Business Intelligence (Tuesday, 17:00, Room: C)
Jeff Layton
Enterprise Technologist for HPC (Dell)
Dr. Jeffrey Layton is the Enterprise Technologist for HPC within Dell. Dr. Layton’s Ph.D. is from Purdue, in Aeronautical and Astronautical Engineering. In his 25+ years of experience with supercomputing technologies, Dr. Layton has served as a Professor, Engineer and Scientist at Boeing, Lockheed Martin, NASA, and Clarkson University, and has led technical efforts for High Performance Computing companies such as Linux Networx, Panasas, and Dell. In these roles he has been a cluster builder, a cluster user and code writer, and a cluster administrator, as well as a systems engineer, manager, and benchmark engineer for HPC vendors. He is also an
active contributor to multiple open source projects and writes technical articles for magazines, books, and websites.
• Session(s): S0637 - Analyzing performance and power of applications with GPUs on Dell 12G platforms (Presented by Dell) (Wednesday, 14:00, Room: M)
Simon Layton
PhD Candidate (Boston University)
Simon Layton obtained his Masters in Mechanical Engineering from Boston University in 2011, and a Bachelor’s in mathematics and computer science from the University of Bristol in 2008. He is a PhD candidate under the supervision of Professor Barba at Boston University. During his postgraduate studies, he has worked on GPU-based projects, including the Fast Gauss transform and a CUDA-based implementation of the immersed boundary method in fluid dynamics. Currently he is working on a GPU-accelerated classical algebraic multigrid, work begun while interning at NVIDIA in Jonathan Cohen’s emerging applications group during the summer of 2011.
• Session(s): S0305 - Classical Algebraic Multigrid for CFD with CUDA (Thursday, 10:00, Room: A8)
Scott Le Grand
Principal Engineer (Amazon Web Services)
Scott Le Grand is currently a principal engineer at Amazon Web Services. He developed the first molecular modeling system for home computers, Genesis, in 1987; Folderol, a distributed computing project targeting the protein folding problem, in 2000; and BattleSphere, a networkable 3D space shooter for the Atari Jaguar, the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5x speedup over previous efforts; this port currently accounts for ~2.6 petaFLOPs of the project’s computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. In a previous life, Scott picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. In the current life, he is developing life science services on Amazon’s Elastic Compute Cloud (EC2).
• Session(s): S0644 - Molecule Dynamics, GPUs, and EC2 (Presented by Amazon Web Services) (Thursday, 10:00, Room: L)
Chris Leader
Research Assistant (Stanford Exploration Project)
Chris Leader is currently working towards a PhD in Geophysics with the Stanford Exploration Project, under the supervision of Biondo Biondi and Jon Claerbout. He received an MSc in Geophysics from Imperial College London, where he worked on imaging 3D land seismic data, and a BA in Physics from the University of Oxford, where he worked on astrophysics and atmospheric phenomena. His interests include imaging blended seismic data, geophysical algorithm acceleration using advanced computing architectures, and using micro-seismic data for imaging purposes.
• Session(s): S0125 - Memory Efficient Reverse Time Migration in 3D (Wednesday, 10:00, Room: A7)
Brent Leback
Engineering Manager (Portland Group)
Brent Leback is an Engineering Manager for PGI. Over the last 26 years he has worked in various positions in HPC customer support, math library development, applications engineering and consulting at QTC, Axian, PGI and STMicroelectronics.
• Session(s): S0622 - The Portland Group OpenACC (Thursday, 10:00, Room: A5)
David Lecomber
CTO (Allinea Software)
Dr. David Lecomber is a founder of Allinea and leads the research, development and support teams behind its software products. David’s history in High Performance Computing began with the Oxford BSP group in 1993, working on alternatives to the emerging, complex MPI standard for parallel programming. He obtained a DPhil in Parallel Computing, on the simulation of shared memory and formal semantics for distributed-memory clusters, and continued to research parallel libraries and languages afterwards. After two years developing software for online services on clusters, he returned to HPC at Allinea, building the development tools needed for parallel and multithreaded software.
• Session(s): S0099 - Debugging GPU Applications For Correctness and Performance (Wednesday, 15:00, Room: A5)
HyoukJoong Lee
PhD Student (Stanford University)
HyoukJoong Lee is a PhD candidate in electrical engineering at Stanford University. His research interests include parallel computer architecture and general-purpose GPU computing and its programming models. He has an MS in electrical engineering from Stanford University.
• Session(s): S0365 - Delite: A Framework for Implementing Heterogeneous Parallel DSLs (Wednesday, 15:00, Room: C)
David Lehavi
Senior Research Scientist (HP)
David Lehavi is a senior research scientist with HP Labs Israel. He received his Ph.D. in algebraic geometry from the Hebrew University of Jerusalem in 2002. He has done research in algebraic geometry, bioinformatics, computerized proofs, communication networks, and semantics. He is currently interested in machine learning and in various models for executing general-purpose algorithms on GPUs.
• Session(s): S0043 - 30x Faster Regular Expressions on a GPU (Tuesday, 17:30, Room: C)
Eric Lequiniou
Director, High Performance Computing (Altair)
Eric Lequiniou is director of High Performance Computing at Altair. An expert in software optimization and parallelization on clusters and multi-core architectures, he developed the Hybrid MPP parallel version of the RADIOSS finite element software. After an MSc degree in computer science, Eric started his career in 1994 at CNRS, in the Laboratoire Informatique du Parallélisme. He joined Mecalog in 1994 and worked for the French company until it was acquired by Altair in 2006. He also holds an Executive MBA from the French business school HEC, obtained in 2007.
• Session(s): S0225 - Speedup Altair RADIOSS Solvers Using NVIDIA GPU (Wednesday, 09:30, Room: K)
Wei Li
Research Scientist (Siemens Corporation)
Wei Li is a research scientist at Siemens Corporation, Corporate Research & Technology, responsible for GPU-related innovations in Siemens’ products. He is the creator of the volume renderer that is widely deployed in Syngo.via, the medical imaging platform of Siemens Healthcare. Wei Li received a PhD in computer science from Stony Brook University. His research interests include visualization, medical imaging, and GPU acceleration for graphics and non-graphics applications. He has published 20+ papers in prestigious journals and conferences, and has 10+ approved and pending patents.
• Session(s): S0342 - Volumetric Processing and Visualization on Heterogeneous Architecture (Wednesday, 14:00, Room: A8)
Cheng Liao
Development Manager (MSCsoftware)
Cheng Liao received a PhD degree from Georgia Tech and is a development manager with MSCsoftware. His professional interests include high performance matrix computing, I/O, and other FEA-related technologies. Prior to MSC, Cheng spent many years with SGI and Convex.
• Session(s): S0064 - MD.Nastran Sparse Direct Solvers for Tesla GPUs (Wednesday, 14:00, Room: K)
Jerome Limido
Research & Development (IMPETUS Afea SAS)
Jérôme Limido has experience in research and advanced engineering within aerospace applications. Jérôme’s main work has focused on processes involving large deformations, both experimentally and numerically. Jérôme has special interests in advanced numerical methods and fatigue of materials. Jérôme is responsible for R&D at IMPETUS Afea France and teaches Advanced Computational Mechanics and Numerical Methods at ISAE.
• Session(s): S0143 - Fluid-Structure-Interaction Using SPH and GPGPU Technology (Wednesday, 14:30, Room: K)
Cheng-Hung Lin
Associate Professor (National Taiwan Normal University)
Cheng-Hung Lin received the Ph.D. degree in computer science from the National Tsing Hua University in 2008. He is currently an associate professor with National Taiwan Normal University. His current research interests include multicore programming and parallel algorithm design.
• Session(s): S0054 - PFAC Library: GPU-Based String Matching Algorithm (Thursday, 14:00, Room: C)
Heshan Lin
Research Scientist (Virginia Tech)
Heshan Lin is a Research Scientist in the Department of Computer Science at Virginia Tech. His current research focuses on the intersection of High Performance Computing and Bioinformatics. Specifically, his research aims at massively accelerating biological discoveries with emerging computational techniques, including graphics processing units (GPUs) and cloud computing. He is the author of the latest version of mpiBLAST, a popular parallel sequence-search package that has received thousands of downloads worldwide. He received a Ph.D. in Computer Science from North Carolina State University in 2009.
• Session(s): S0156 - Towards Computing the Cure for Cancer (Tuesday, 17:00, Hall 1)
James Lin
Technical Director, High Performance Computing Center (Shanghai Jiao Tong University)
James Lin is technical director of the High Performance Computing Center at Shanghai Jiao Tong University and co-founder of the HMPP Competence Center for AP & Japan. His major research area is parallel programming, especially applying CUDA to CFD. He received an award from the NVIDIA Academic Partnership Program in 2010 and serves on the review committee for the CUDA Campus Contest.
• Session(s): S0251 - RANS CFD Solver on Fermi (Tuesday, 10:00, Room: A7)
Yuan Lin
Senior Engineer (NVIDIA)
Yuan Lin is a senior engineer and manages the compute compiler code generation team at NVIDIA. His team’s responsibilities include PTX code generation, tools and platform support. Yuan has been at NVIDIA for 3 years; before that, he was at Sun Microsystems and Motorola. He holds a doctorate in computer science from the University of Illinois at Urbana-Champaign.
• Session(s): S0235 - Compiling CUDA and Other Languages for GPUs (Wednesday, 10:00, Room: A5)
Olav Lindtjorn
(Schlumberger)
Biography unavailable at press time.
• Session(s): S0531 - Exascaling Your Apps (Wednesday, 09:00, Room: C)
Hui Liu
(University of Calgary)
Hui Liu works in the reservoir simulation group at the University of Calgary, where he leads the development of GPU-based parallel iterative solvers. He has designed and implemented a sparse BLAS library, four Krylov subspace solvers, two algebraic multigrid solvers, parallel triangular solvers and several preconditioners. He received his PhD degree in Computational Mathematics and Parallel Computing from the Chinese Academy of Sciences in 2010, and his BSc degree in Computational Mathematics from the University of Science and Technology of China (USTC) in 2005.
• Session(s): S0704 - Los Alamos AHPC Symposium, Accelerating Iterative Linear Solvers on GPUs (Wednesday, 16:30, Room: J1)
• S0708 - Los Alamos AHPC Symposium, Accelerated HPC Symposium: Applications - Methods and Programming Models, Part 1 (Thursday, 09:00, Room: J3)
Li-Ta Lo
(Los Alamos National Laboratory)
Biography unavailable at press time.
• Session(s): S0706 - Los Alamos AHPC Symposium, PISTON: Portability and Performance for Data-Parallel Visualization and Analysis Operators (Wednesday, 17:30, Room: J1)
Alex Loddoch
Sr. Research Scientist (Chevron)
Alex Loddoch is a Senior Research Scientist in Chevron’s Technical Computing group. His work includes the evaluation of emerging High Performance Computing technologies and their application to algorithms in Seismic Imaging and Processing and Reservoir Simulation. Before joining Chevron he was a Research Assistant at the University of Muenster, Germany, where he worked on topics such as Computational Fluid Dynamics, Visualization and Data Compression. Alex received an M.Sc. in Physics and a Ph.D. in Geophysics from the University of Muenster, studying the internal dynamics of terrestrial planets.
• Session(s): S0628 - Panel Session: Learn from Experts in the Oil & Gas Industry (Wednesday, 16:30, Room: A7)
Rainald Lohner
Professor (George Mason University)
Biography unavailable at press time.
• Session(s): S0218 - ASI Parallel Fortran: A General-Purpose Fortran to GPU Translator (Thursday, 16:30, Room: B)
K. Patrick Lorton
Principal Developer (Schrodinger)
Patrick Lorton is a Principal Developer and the Technical Lead for the Core Hopping and Combiglide products at Schrödinger. He received bachelor’s degrees in Computer Science, Mathematics and Chemistry from Indiana University, where he published in the fields of Parallel Computing and Computational Chemistry. He has worked with Schrödinger since graduation.
• Session(s): S0121 - Software Architecture to Facilitate CUDA Development (Wednesday, 16:30, Room: N)
Edward Lowe<br />
Research Assistant Professor (Vanderbilt University)<br />
Dr. Lowe is a research assistant professor at Vanderbilt<br />
University developing novel computational methods for<br />
drug discovery. His interests include <strong>GPU</strong> acceleration,<br />
algorithmic techniques in massively parallel<br />
programming, machine learning, computational<br />
chemistry, and enzyme mechanisms. He currently leads<br />
a cheminformatics core in the laboratory of Professor<br />
Jens Meiler as a member of the Vanderbilt Center for<br />
Structural Biology and Institute in Chemical Biology.<br />
h Session(s): S0346 - GP<strong>GPU</strong> Accelerated Protein<br />
Similarity Measures Identifying Biological<br />
Relevant Structure (Wednesday, 17:30, Room: N)<br />
h S0354 - Bcl::ChemInfo Suite Enables Machine<br />
Learning-Based Drug Discovery Using <strong>GPU</strong>s<br />
(Thursday, 09:30, Marriott Ballroom 4)<br />
Hatem Ltaief<br />
Computational Scientist (KAUST<br />
Supercomputing Laboratory)<br />
Dr. Hatem Ltaief received an MSc degree from ISITIL, a<br />
school of engineering at the University of Claude<br />
Bernard Lyon I, France, an MSc in applied mathematics<br />
from the University of Houston, and a PhD degree in<br />
computer science from the University of Houston. He<br />
was a Research Scientist II in the Innovative Computing<br />
Laboratory in the Department of Electrical Engineering<br />
and Computer Science at the University of Tennessee,<br />
Knoxville. He is currently a Computational Scientist at<br />
KAUST Supercomputing Laboratory, Saudi Arabia.<br />
h Session(s): S0042 - Solving Challenging Numerical<br />
Linear Algebra Algorithms using Multiple <strong>GPU</strong><br />
Accelerators (Wednesday, 15:00, Room: A3)<br />
Peter Lu<br />
Post-Doctoral Research Fellow (Harvard University)<br />
Peter J. Lu received his AB summa cum laude in physics<br />
(2000) from Princeton University, and AM (2002) and PhD<br />
(2008) in physics from Harvard University. He is presently<br />
a post-doctoral research fellow in the Department of<br />
Physics and SEAS at Harvard University; his main focus<br />
is on the physics of attractive colloids and the integration<br />
of high-performance imaging and analysis techniques.<br />
He conducts experiments aboard the International Space<br />
Station, examining phase separation of colloid mixtures<br />
in the absence of gravity. He has published his<br />
discoveries of modern quasicrystal geometry in medieval<br />
Islamic architectural tilings; the first precision<br />
compound machines, from ancient China; the first use of<br />
diamond, in prehistoric China; and the first<br />
quasicrystalline mineral found in nature.<br />
h Session(s): S0521 - Desktop Supercomputing<br />
in the Soft-Matter Physics Laboratory<br />
(Thursday, 10:00, Room: A3)<br />
David Luebke<br />
Senior Director of Graphics Research (NVIDIA)<br />
David Luebke helped found NVIDIA Research in 2006<br />
after eight years on the faculty of the University of<br />
Virginia. Luebke received his Ph.D. under Fred Brooks at<br />
the University of North Carolina in 1998. His principal<br />
research interests are <strong>GPU</strong> computing and real-time<br />
computer graphics. Luebke’s honors include the NVIDIA<br />
Distinguished Inventor award, the NSF CAREER and DOE<br />
Early Career PI awards, and the ACM Symposium on<br />
Interactive 3D Graphics “Test of Time Award”. Dr. Luebke<br />
has co-authored a book, a SIGGRAPH Electronic<br />
Theater piece, a major museum exhibit visited by over<br />
110,000 people, and dozens of papers, articles, chapters,<br />
and patents.<br />
h Session(s): S0609 - Computational Graphics: An<br />
Overview of Graphics Research at NVIDIA<br />
(Tuesday, 14:00, Room: B)<br />
h S0016 - NVIDIA Grad Fellowship Fast Forward<br />
(Wednesday, 10:00, Room: A2)<br />
Justin Luitjens<br />
Devtech Engineer (NVIDIA)<br />
Justin Luitjens is a Devtech Engineer at NVIDIA and<br />
works with applications engineers to optimize and port<br />
their applications to CUDA. He joined NVIDIA after<br />
receiving his Ph.D. in Scientific Computing from the<br />
University of Utah in 2011.<br />
h Session(s): S0624 - Introduction to CUDA C<br />
(Monday, 10:30, Room: A5)<br />
h S0302 - Accelerating miniFE:<br />
A Finite Element Mini-application<br />
(Thursday, 09:00, Marriott Ballroom 3)<br />
Dimitar Lukarski<br />
Research Associate (Karlsruhe Institute of<br />
<strong>Technology</strong> (KIT))<br />
Dimitar Lukarski holds a bachelor’s degree from the Technical<br />
University of Sofia, Bulgaria, and a master’s degree from the<br />
Technical University of Karlsruhe, Germany. Currently, he<br />
is working at the Engineering Mathematics and Computing<br />
Lab (EMCL) at Karlsruhe Institute of <strong>Technology</strong> (KIT) on<br />
interdisciplinary topics in the area of parallel numerical<br />
methods and emerging hardware such as <strong>GPU</strong>s and<br />
multi-core CPUs. His focus is on robust and fine-grained<br />
parallel preconditioners with implementations on<br />
stream-based platforms such as CUDA.<br />
h Session(s): S0289 - Fine-Grained Parallel<br />
Preconditioners for Fast <strong>GPU</strong>-based Solvers<br />
(Wednesday, 09:00, Marriott Ballroom 3)<br />
h S0708 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 1<br />
(Thursday, 09:00, Room: J3)<br />
h S0291 - LAtoolbox: A Multi-platform Sparse<br />
Linear Algebra Toolbox<br />
(Thursday, 10:30, Marriott Ballroom 3)<br />
Chris Lupo<br />
Assistant Professor (California Polytechnic<br />
State University)<br />
Chris Lupo is an Assistant Professor of Computer<br />
Science and Computer Engineering at California<br />
Polytechnic State University in San Luis Obispo. His<br />
teaching and research interests include parallel<br />
computing, computer architecture, embedded system<br />
design and code generation. Chris earned his PhD in<br />
Computer Engineering from UC Davis in 2008.
h Session(s): S0311 - Teaching Applied Parallel<br />
Computing with <strong>GPU</strong>s (Wednesday, 17:30, Room: C)<br />
Steve Lyness<br />
VP of HPC Solutions Engineering (Appro)<br />
In November of 2007, Steve Lyness joined Appro as Vice<br />
President of HPC Solutions Engineering. Steve is<br />
responsible for the success of Appro’s closed-loop<br />
solution management, up-front consulting and<br />
pre-integration of Appro’s HPC solutions across a wide<br />
range of HPC applications. Steve also acts as a key<br />
member of the management team for project<br />
management, planning and coordinating of worldwide<br />
pre-sales and post-sales customer solution programs.<br />
Before joining Appro, Steve was Director of Sales<br />
Engineering for NetEffect, a provider of 10 GigE adapter<br />
technologies for HPC and Enterprise customers. Steve<br />
graduated from Drexel University with a Bachelor’s<br />
degree in Electrical Engineering with an emphasis on<br />
radar and signal processing technologies.<br />
h Session(s): S0618 - Best Practices of an 800TFlop<br />
Hybrid Supercomputer Implementation<br />
(Tuesday, 09:30, Room: M)<br />
Henrik Høj Madsen<br />
Solution Architect (LEGO)<br />
Henrik holds a Master’s degree in<br />
Computer Science and Engineering from the Technical<br />
University of Denmark, where he designed and<br />
implemented a real-time ray tracing architecture on FPGA<br />
hardware. Henrik was CEO and lead game developer at<br />
DogOnFire Interactive, a small game development<br />
company dedicated to producing core MMO technologies<br />
for indie market developers. He is currently a<br />
Solution Architect at LEGO, where he is the architect<br />
of the 3D rendering backend technologies for “LEGO<br />
Universe”, LEGO’s massively multiplayer online game for<br />
LEGO fans worldwide.<br />
h Session(s): S0261 - Scalable <strong>GPU</strong> Computing<br />
Service Architecture (Tuesday, 16:00, Room: A5)<br />
Alireza Mahani<br />
Quantitative Modeler (Sentrana)<br />
Dr. Alireza S. Mahani works as a computational scientist<br />
at Sentrana Inc., a quantitative marketing company in<br />
Washington, DC. His recent work has been focused on<br />
building high-performance software (using CUDA/<br />
OpenMP/MPI) for Markov Chain Monte Carlo (MCMC)<br />
sampling of high-dimensional conditional posterior<br />
distributions arising in Gibbs sampling of Hierarchical<br />
Bayesian models. Prior to joining Sentrana, Dr. Mahani<br />
worked as a management consultant at McKinsey & Co.<br />
He holds a Ph.D. in Physics from Washington University<br />
in St. Louis, where his research on statistical modeling<br />
of neuronal motion processing in the avian brain<br />
resulted in six articles in peer-reviewed journals.<br />
h Session(s): S0035 - <strong>GPU</strong> Parallelization of Gibbs<br />
Sampling: Abstractions, Results, and Lessons<br />
Learned (Wednesday, 15:00, Marriott Ballroom 3)<br />
Filipe Maia<br />
Fellow (Lawrence Berkeley National Laboratory)<br />
Filipe Maia graduated in biochemistry from Oporto<br />
University, Portugal, in 2004 and completed his PhD in<br />
Physics at Uppsala University, Sweden. He is currently a<br />
Petascale Postdoctoral Fellow at NERSC, Lawrence<br />
Berkeley National Laboratory. His main research<br />
interests, besides <strong>GPU</strong> computing, are diffraction<br />
imaging, image reconstruction and compressive sensing.<br />
h Session(s): S0131 - Multi-<strong>GPU</strong> Real-Time<br />
Ptychographic X-ray Image Reconstruction<br />
(Wednesday, 16:00, Room: A8)<br />
Jason Mak<br />
Graduate Student (UC Davis)<br />
Jason is a computer science Ph.D. student at U.C. Davis.<br />
He received his B.S. in computer science from California<br />
Polytechnic State University. His research interests<br />
include <strong>GPU</strong> computing, parallel algorithms and<br />
architectures, and scientific computing.<br />
h Session(s): S0361 – Lossless Data Compression on<br />
<strong>GPU</strong>s (Wednesday, 17:00, Room: B)<br />
Allen Malony<br />
Professor (University of Oregon)<br />
Allen D. Malony is a Professor in the Department of<br />
Computer and Information Science at the University of<br />
Oregon where he directs the TAU parallel performance<br />
system project. His research interests are in parallel<br />
computing, performance tools, and computational<br />
science. Malony was awarded the NSF National Young<br />
Investigator award, was a Fulbright Research Scholar to<br />
The Netherlands and Austria, and received the<br />
prestigious Alexander von Humboldt Research Award for<br />
Senior U.S. Scientists by the Alexander von Humboldt<br />
Foundation. He also received a Professor Partnership<br />
award from NVIDIA Corporation. Malony is CEO of<br />
ParaTools, Inc., founded in 2005.<br />
h Session(s): S0298 - Performance Tools for<br />
<strong>GPU</strong>-Powered Scalable Heterogeneous Systems<br />
(Wednesday, 17:00, Room: A5)<br />
Jonathan Marbach<br />
Director, Software Architecture and Engineering<br />
(TerraSpark Geosciences)<br />
Jonathan Marbach is Director of Software Architecture<br />
and Engineering at TerraSpark Geosciences, makers of<br />
the 3D Seismic Interpretation package Insight Earth. He<br />
received his PhD from the University of Colorado and<br />
specializes in 3D graphics, virtual reality, and<br />
visualization. He presented at <strong>GTC</strong> 2010 on <strong>GPU</strong><br />
accelerated stereographic rendering.<br />
h Session(s): S0336 - <strong>GPU</strong> Acceleration for Seismic<br />
Interpretation Algorithms (Tuesday, 16:00, Room: A7)<br />
Nikolay Markovskiy<br />
HPC DevTech Engineer (NVIDIA)<br />
Nikolay Markovskiy is a developer technology engineer<br />
at NVIDIA and specializes in high performance<br />
computing using CUDA. He has a background in<br />
computational condensed matter physics and received his<br />
PhD, on multi-level Monte Carlo algorithms, from the University<br />
of Southern California.<br />
h Session(s): S0247 – 3D ADI Method for Fluid<br />
Simulation on Multiple <strong>GPU</strong>s<br />
(Tuesday, 17:00, Marriott Ballroom 3)<br />
Samuel Maroy<br />
Software Engineer (Barco)<br />
Samuel Maroy received the M.Sc. degree in computer<br />
science from the Universiteit Gent in 2008. He joined<br />
Barco in August 2008 as a software engineer working on<br />
the development of a networked visualization system.<br />
Since 2011, Samuel has focused on the use of <strong>GPU</strong>s to<br />
power the video streaming and video processing in<br />
Barco’s next generation visualization platform. Outside<br />
of work, Samuel is interested in graphics rendering and<br />
hopes someday to build his own game. Furthermore, he<br />
enjoys cycling, soccer, racing and spending time with<br />
friends.<br />
h Session(s): S0252 - Building Real-Time<br />
Professional Visualization Solutions with OpenCL<br />
(Thursday, 10:30, Room: A1)<br />
Naoya Maruyama<br />
Assistant Professor (Tokyo Institute of <strong>Technology</strong>)<br />
Naoya Maruyama received his Ph.D. degree in Computer<br />
Science from Tokyo Institute of <strong>Technology</strong> in 2008, and<br />
is an Assistant Professor at Global Scientific Information<br />
and Computing Center, Tokyo Institute of <strong>Technology</strong>. He<br />
has been working on research topics related to<br />
large-scale high performance computing, including fault<br />
tolerance, low power computing, and programming<br />
models for heterogeneous systems.<br />
h Session(s): S0367 - Physis: An Implicitly Parallel<br />
Framework for Stencil Computations<br />
(Wednesday, 16:30, Room: C)<br />
Issei Masaie<br />
Chief Engineer (Prometech Software, Inc.)<br />
Issei Masaie is a chief engineer of Prometech Software<br />
and works on developing physics simulation and<br />
acceleration technology on <strong>GPU</strong>, Cell and multicore<br />
hardware for particle-based CAE software. In 2005 he<br />
received a master’s degree from the Department of<br />
Quantum Engineering and System Science, at the<br />
Graduate School of Engineering, The University of Tokyo.<br />
h Session(s): S0066 – Particleworks: Particle-based<br />
CAE Software Fully Ported on Multi-<strong>GPU</strong><br />
(Wednesday, 10:00, Room: K)<br />
Chris Mason<br />
Product Manager (Acceleware)<br />
Chris is the Product Manager for Acceleware’s <strong>GPU</strong><br />
accelerated electromagnetic product line. He is<br />
responsible for the successful development and launch of<br />
Acceleware products used by companies world-wide.<br />
Chris has seven years of experience in developing<br />
commercial applications for the <strong>GPU</strong> and has delivered<br />
over 20 CUDA courses to students in a diverse range of<br />
industries. His previous experience also includes<br />
parallelization of algorithms on digital signal processors<br />
(DSPs) for cellular phones and base stations. Chris has a<br />
Master’s degree in Electrical Engineering from Stanford University.<br />
h Session(s): S0614 - Part 1: Introduction to <strong>GPU</strong><br />
<strong>Program</strong>ming (Monday, 09:00, Room: C)<br />
h S0615 - Part 2: Introduction to the <strong>GPU</strong><br />
Architecture and Memory Model<br />
(Monday, 10:30, Room: C)<br />
h S0616 - Part 3: Debugging <strong>GPU</strong> <strong>Program</strong>s<br />
(Monday, 13:00, Room: C)<br />
h S0617 - Part 4: Introduction to Optimizations and<br />
Profiling (Monday, 14:30, Room: C)<br />
Enrico Mastrostefano<br />
PhD Student (Sapienza Università di Roma)<br />
Enrico is a PhD student at Sapienza University of Rome.<br />
h Session(s): S0241 - Large Graphs on Multi-<strong>GPU</strong>s<br />
(Wednesday, 16:30, Marriott Ballroom 3)<br />
Satoshi Matsuoka<br />
Titech<br />
Biography unavailable at press time.<br />
h Session(s): S0531 - Exascaling Your Apps<br />
(Wednesday, 09:00, Room: C)<br />
David McAllister<br />
OptiX Manager (NVIDIA, OptiX group)<br />
Bio unavailable at press time.<br />
h Session(s): S0366 - OptiX Out-of-Core and CPU<br />
Rendering (Tuesday, 15:30, Room: J1)<br />
Chris McClanahan<br />
Software Engineer (AccelerEyes)<br />
Chris McClanahan is a software engineer at<br />
AccelerEyes. He has a Master’s Degree in Computer<br />
Science from the Georgia Institute of <strong>Technology</strong>, with a<br />
focus on computer vision and computational<br />
photography.<br />
h Session(s): S0287 - Jacket for Multidimensional<br />
Scaling in Genomics (Tuesday, 17:30, Room: K)<br />
h S0325 - ArrayFire Graphics: A Tutorial<br />
(Wednesday, 10:00, Room: A3)<br />
Iain McCready<br />
CEO (Cortexica)<br />
Iain has over 25 years of experience in the world’s<br />
telecommunications and IT industries. Until recently he<br />
was the CEO of NeoMedia Inc., a public US-based<br />
software business that is the world leader in state-of-the-art<br />
barcode creation, capture, delivery and reading<br />
technology. Prior to that Iain was CEO of Mobiqa Limited,<br />
an Edinburgh-based business, where he led the company<br />
from a start-up to world leadership in mobile ticketing,<br />
mobile boarding pass and couponing solutions based on<br />
the creation, optimisation, delivery and redemption of<br />
barcodes to mobile phones. He was also Chairman of<br />
Scolocate Limited, a co-location and managed services<br />
business specialising in IT architecture, design and<br />
planning, project management and implementation<br />
services. Prior to that he was Chief Operating Officer of<br />
KSCL, Scotland’s largest software house and a leading<br />
supplier of customer care and billing applications to the<br />
world’s mobile phone operators.<br />
h Sessions: S2000 – Emerging Companies Summit<br />
Opening with Jeff Herbst (VP of Business<br />
Development, NVIDIA), Followed by CEO on<br />
Stage Featuring Rocketick and Cortexica<br />
(Wednesday, 09:00, Marriott Ballroom 4)<br />
Myles M. McGovern<br />
President/CEO (Immersive Media)<br />
Myles McGovern has served as the President and CEO of<br />
Immersive Media since 2004. Under Myles’ direction, IMC<br />
has pioneered, and become the world’s leading provider of,<br />
360° interactive video experiences. Prior to<br />
joining IMC Myles was the Founder, President and CEO<br />
of Centrinity/MC2 where he spearheaded the company’s<br />
rapid growth in 55 countries and was twice nominated<br />
for Canadian Entrepreneur of the Year. After his post-secondary<br />
education at Simon Fraser University, Myles<br />
gained valuable technology experience during his 10<br />
years at Xerox culminating in product management for<br />
their digital product integration strategy.<br />
h Session(s): S2004 – Emerging Companies<br />
Summit: CEO on Stage Featuring GAIKAI,<br />
Immersive Media, and Numecent<br />
(Wednesday, 15:00, Marriott Ballroom 4)<br />
Morgan McGuire<br />
Visiting Professor (NVIDIA and Williams College)<br />
Morgan McGuire is a visiting professor in the NVIDIA<br />
Research Graphics Group, where he works on real-time<br />
special effects and future <strong>GPU</strong>s, and an assistant<br />
professor of Computer Science at Williams College<br />
where he teaches computer graphics and game design.<br />
He is also the editor in chief of the Journal of Graphics<br />
Tools. Dr. McGuire contributed to many commercial<br />
products including the E-Ink display for the Amazon<br />
Kindle, the PeakStream high-performance computing<br />
infrastructure acquired by Google, the Titan Quest role<br />
playing game, and the Marvel Ultimate Alliance 2 video<br />
game for Xbox 360.<br />
h Session(s): S0409 – Stochastic Rasterization<br />
(Tuesday, 15:30, Room: B)
Simon McIntosh-Smith<br />
(The University of Bristol)<br />
Simon McIntosh-Smith has spent most of his life<br />
designing and programming multi-core and many-core<br />
systems. He began his career as a microprocessor<br />
architect at Inmos and STMicroelectronics, before<br />
co-designing the world’s first fully programmable <strong>GPU</strong><br />
at Pixelfusion in 2000. In 2002 he co-founded ClearSpeed<br />
where, as Director of Architecture and Applications, he<br />
led the development of the first modern many-core HPC<br />
accelerators. In 2003 he designed the first accelerated<br />
BLAS/LAPACK and FFT libraries, leading to the first<br />
modern accelerated Top500 system, TSUBAME 1.0 at<br />
Tokyo Tech in 2006. He now leads the Microelectronics<br />
Research Group at the University of Bristol, UK.<br />
h Session(s): S0703 - Los Alamos AHPC Symposium,<br />
Adaptive Heterogeneous Computing with<br />
OpenCL: A Molecular Docking Case Study<br />
(Wednesday, 16:00, Room: J1)<br />
h S0709 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 2<br />
(Thursday, 14:00, Room: J1)<br />
Sara McMains<br />
Professor (UC Berkeley)<br />
Dr. McMains is an Associate Professor of Mechanical<br />
Engineering at Berkeley. Her research interests include<br />
geometric solid modeling, CAD/CAM, <strong>GPU</strong> algorithms,<br />
geometric Design for Manufacturing feedback, computer<br />
aided process planning, layered manufacturing,<br />
computer graphics, visualization, and virtual prototyping.<br />
Applications of her research include haptic design<br />
environments, accessibility analysis for manufacturing,<br />
design for cleanability, layered manufacturing, and<br />
machining. She received her AB from Harvard and her<br />
MS and PhD from Berkeley, all in Computer Science. She<br />
is the recipient of Best Paper Awards from Usenix, ASME<br />
and the ACM Solid and Physical Modeling Symposium,<br />
and the NSF CAREER Award.<br />
h Session(s): S0410 - Computing Hausdorff<br />
Distances between Freeforms on the <strong>GPU</strong><br />
(Wednesday, 17:00, Marriott Ballroom 3)<br />
h S0411 - Artifact-Free Cloud-Based CAD Rendering<br />
(Thursday, 16:30, Room: L)<br />
Gaetano Mendola<br />
Principal Engineer (MBI srl)<br />
Gaetano Mendola is a Principal Software Engineer at MBI srl, which develops<br />
exclusive mission-critical solutions. He graduated in<br />
computer engineering from the University of Pisa. His interests are<br />
related to low-latency systems. Since 2008, exploiting the<br />
Software Defined Radio approach, he has led the<br />
building of real demodulators implemented entirely in software,<br />
offloading to the <strong>GPU</strong> work that is normally done with FPGAs.<br />
h Session(s): S0065 - Satellite HUB Communication<br />
System <strong>GPU</strong> Based (Thursday, 16:30, Room: M)<br />
Duane Merrill<br />
Research Scientist (NVIDIA)<br />
Duane Merrill joined NVIDIA Research after completing<br />
his Ph.D. in Computer Science at the University of<br />
Virginia. His research interests include algorithmic<br />
primitives, design idioms, and programming models with<br />
a particular focus on dynamic, irregular, and cooperative<br />
parallelism. He contributes to the B40C and Thrust open<br />
source libraries of <strong>GPU</strong> computing primitives.<br />
h Session(s): S0600 - Scalable <strong>GPU</strong> Graph Traversal<br />
(Wednesday, 14:00, Room: A2)<br />
Peter Messmer<br />
Compute Devtech Engineer (NVIDIA)<br />
Peter Messmer has been developing and optimizing<br />
parallel scientific software for over 15 years. After<br />
completing his PhD in solar plasma-physics at ETH Zurich<br />
in 2001, Peter joined Tech-X Corp in Boulder, CO, where he<br />
was leading a group of scientists solving space-related<br />
simulation and data analysis problems. As part of a NASA<br />
project, he became an early adopter of <strong>GPU</strong> computing<br />
and the lead developer of <strong>GPU</strong>Lib, a library for accelerating<br />
data analysis tasks with <strong>GPU</strong>s. Since joining NVIDIA in<br />
2011, he has been working with clients to optimize their<br />
massively parallel <strong>GPU</strong> applications.<br />
h Session(s): S0629 - CUDA Accelerated Compute<br />
Libraries (Monday, 13:00, Room: A5)<br />
h S0256 - A Stencil Library for the New Dynamic<br />
Core of COSMO (Thursday, 09:00, Room: N)<br />
Renato Miceli<br />
Computational Scientist (ICHEC)<br />
Renato Miceli is a Computational Scientist and <strong>GPU</strong><br />
Developer at the Irish Centre for High-End Computing.<br />
He has a BSc in Computer Science (hons) from<br />
Universidade Federal de Campina Grande, Brazil, where<br />
he focused on Software Engineering and Distributed<br />
Systems, especially Grid and Cloud Computing for HPC.<br />
At ICHEC, Renato works primarily on analyzing,<br />
developing, optimizing and porting applications to<br />
many-core architectures; his past projects involved<br />
cryptography, financial simulation, geophysical analysis<br />
and molecular dynamics. Renato also works on the<br />
European FP7 projects PRACE, in enabling scientific<br />
computing on <strong>GPU</strong>s; and AutoTune, for automatic tuning<br />
of <strong>GPU</strong> codes.<br />
h Session(s): S0034 – Real-Time Risk Simulation:<br />
The <strong>GPU</strong> Revolution In Profit Margin Analysis<br />
(Tuesday, 15:00, Room: L)<br />
Paulius Micikevicius<br />
Developer <strong>Technology</strong> Engineer (NVIDIA)<br />
Paulius Micikevicius is a Developer <strong>Technology</strong> Engineer<br />
at NVIDIA with a focus on parallel computation and<br />
performance analysis. He has been involved in the<br />
analysis and optimization of both industrial and scientific<br />
codes over several generations of <strong>GPU</strong>s starting with<br />
G80, the first CUDA-capable architecture. Prior to joining<br />
NVIDIA, Paulius was an assistant professor of Computer<br />
Science at Armstrong Atlantic State University as well as<br />
a research associate at the Media Convergence<br />
Laboratory at UCF. Paulius holds a PhD in Computer<br />
Science from the University of Central Florida and a B.S.<br />
in Computer Science from Midwestern State University.<br />
h Session(s): S0515 - Multi-<strong>GPU</strong> <strong>Program</strong>ming<br />
(Tuesday, 14:00, Room: Hall 1)<br />
h S0628 - Panel Session: Learn from Experts in the<br />
Oil & Gas Industry (Tuesday, 16:30, Room: A7)<br />
h S0514 - <strong>GPU</strong> Performance Analysis and<br />
Optimization (Wednesday, 15:30, Room: Hall 1)<br />
Phillip Miller<br />
Director, Workstation Software Product<br />
Management (NVIDIA)<br />
Bio unavailable at press time.<br />
h Session(s): S0603 - <strong>GPU</strong> Ray Tracing<br />
(Monday, 10:30, Room: A3)<br />
h S0604 - NVIDIA Advanced Rendering Solutions<br />
(Monday, 13:00, Room: A3)<br />
Aamir Mohammad<br />
Associate Director (Aon Benfield Securities)<br />
Aamir leads the development of High Productivity<br />
Computing solutions for Variable Annuity derivatives<br />
models, at Aon Benfield Securities. Prior to joining Aon,<br />
Aamir worked in US Variable Annuity Hedging at a global<br />
insurance company, and began his career in quantitative<br />
finance at a hedge fund in Toronto. Aamir has over five<br />
years of experience in computational finance, trading<br />
and software development. Aamir holds an Honors B.Sc.<br />
in Applied Mathematics & Statistics from the University<br />
of Toronto.<br />
h Session(s): S0418 - High Productivity<br />
Computational Finance on <strong>GPU</strong>s<br />
(Tuesday, 14:00, Room: L)<br />
Jamal Mohd-Yusof<br />
(Los Alamos National Laboratory)<br />
Jamal Mohd-Yusof is a member of the Collaborative<br />
<strong>Program</strong>ming team in the Applied Computer Science group<br />
at LANL. He was part of the team that worked on Open<br />
Science programming for Roadrunner, where he was<br />
responsible for refactoring and porting of the CFDNS-RR<br />
fluid dynamics code, including development of a novel<br />
low-communication tridiagonal solver. He has been<br />
working with advanced architectures for several years,<br />
and teaches OpenCL courses at LANL. He is currently<br />
developing and profiling physics algorithms for a variety<br />
of advanced architectures. Prior to coming to LANL he<br />
worked at the Center for Turbulence Research at<br />
Stanford University. He received his MS and PhD from<br />
Cornell University in fluid mechanics, where he<br />
developed novel computational techniques for multiphase<br />
flow simulation.<br />
h Session(s): S0708 - Accelerated HPC Symposium:<br />
Applications - Methods and <strong>Program</strong>ming Models,<br />
Part 1 (Thursday, 09:00, Room: J3)<br />
Alexander Monakov<br />
Researcher (ISP RAS)<br />
Alexander Monakov is a PhD candidate at Moscow State<br />
University and a researcher at the Institute for System<br />
<strong>Program</strong>ming, specializing in program optimization and<br />
compiler technology. He has provided improvements to<br />
the GCC compiler, including contributions to Graphite-<br />
OpenCL, an automatic translation pass that generates<br />
OpenCL code from parallel loops.<br />
h Session(s): S0115 - Specialized Sparse Matrix<br />
Formats and SpMV Kernel Tuning for <strong>GPU</strong>s<br />
(Wednesday, 10:30, Marriott Ballroom 3)<br />
Brooks Moses<br />
Ph.D., Sourcerer (Mentor Graphics Corporation)<br />
Dr. Moses leads the High Performance Computing<br />
Solutions team in Mentor Graphics’ Embedded Software<br />
Division. He also participates directly in the development<br />
of the Sourcery VSIPL++ library and other high-performance<br />
library products. Dr. Moses worked<br />
extensively on the Cell/B.E. and NVIDIA CUDA ports of<br />
Sourcery VSIPL++. Dr. Moses holds a Ph.D. in<br />
Mechanical Engineering from Stanford University where<br />
he conducted advanced research into algorithms for<br />
computational fluid dynamics simulation.<br />
h Session(s): S0620 - VSIPL++: A High-Level<br />
<strong>Program</strong>ming Model for Productivity and<br />
Performance (Tuesday, 15:00, Room: M)<br />
Daniel Moth<br />
Principal <strong>Program</strong> Manager (Microsoft)<br />
As a Principal <strong>Program</strong> Manager in the Developer<br />
Division, Daniel Moth is responsible for parallel runtimes<br />
and tools that ship with Visual Studio. He has been with<br />
Microsoft for over five years; before that he worked in<br />
the UK as a consultant for Avanade, and before that as a<br />
developer for a Honeywell company for seven years. In<br />
his free time you can find him on FICS playing chess or<br />
near a beach SCUBA diving with his wife.<br />
h Session(s): S0242 - Harnessing <strong>GPU</strong> Compute<br />
with C++ AMP (Part 1 of 2)<br />
(Wednesday, 17:00, Room: A3)<br />
h S0244 - Harnessing <strong>GPU</strong> Compute with C++ AMP<br />
(Part 2 of 2) (Thursday, 10:00, Room: C)<br />
Supratik Moulik<br />
Cardiovascular Imaging Fellow (University of Pennsylvania)<br />
Biography unavailable at press time.<br />
h Session(s): S0303 - <strong>GPU</strong> Acceleration for<br />
Threshold Based Region Growth Algorithms<br />
(Thursday, 09:00, Room: C)<br />
Sathya Narayana K.<br />
Principal Consultant (Infosys Ltd.)<br />
Sathya Narayana K. is a Principal Consultant with<br />
Advanced Engineering Group (AEG) of Infosys. He has<br />
more than twenty years of experience in the areas of<br />
high performance scientific computing (HPC), Computer<br />
Graphics (CG), Mathematical Modeling & Simulation and<br />
Engineering Software Development. His research<br />
interests include Mathematical Modeling, Simulation,<br />
Optimization and Operations Research in Aerospace,<br />
Gaming, and Oil and Gas industries. He holds Master of Science<br />
degrees in structural engineering (1993) and information<br />
technology. He has published five papers in national and<br />
international conferences.<br />
h Session(s): S0214 - <strong>GPU</strong> Based Stacking Sequence<br />
Optimization For Composite Skins Using GA<br />
(Wednesday, 15:00, Room: K)<br />
Ramesh Narayanaswamy<br />
Principal Engineer (Synopsys Inc.)<br />
Ramesh works on optimizing compilers and special-purpose<br />
supercomputers for Hardware Description Language (HDL)<br />
execution. Notable architectures from past projects<br />
include a 96-core heterogeneous computer with a<br />
MIPS core + ASIC coprocessor, a 1024-core HDL<br />
processor, and a multicore CPU + array of FPGAs. These<br />
architectures delivered orders-of-magnitude performance<br />
improvements. Ramesh has been granted seven patents.<br />
h Session(s): S0317 - Compiling a Parallel<br />
Domain Specific Language to <strong>GPU</strong>s<br />
(Tuesday, 09:00, Room: J3)<br />
Rajib Nath<br />
Student (University of California San Diego)<br />
Biography unavailable at press time.<br />
h Session(s): S0248 - Excitements, Challenges, and<br />
Rewards In Optimizing GP<strong>GPU</strong> Kernels<br />
(Tuesday, 09:00, Marriott Ballroom 3)<br />
Vincent Natoli<br />
Founder & CEO (Stone Ridge <strong>Technology</strong>)<br />
Dr. Vincent Natoli is the founder and CEO of Stone Ridge<br />
<strong>Technology</strong>. Stone Ridge is an NVIDIA partner that<br />
develops, optimizes and ports complex scientific and<br />
engineering codes to <strong>GPU</strong> and multi-core platforms. The<br />
company focuses on work in the energy industry and has<br />
experience with seismic, reservoir simulation and other<br />
industry applications. Dr. Natoli has a BS and MS from MIT,<br />
a PhD in Physics from the University of Illinois Urbana-<br />
Champaign and an MS in technology management from<br />
the University of Pennsylvania and Wharton School. He<br />
worked for 10 years with ExxonMobil Corporate Research<br />
before starting Stone Ridge <strong>Technology</strong>.<br />
h Session(s): S0140 – Accelerating Reservoir<br />
Simulation and Algebraic Multigrid with <strong>GPU</strong>s<br />
(Wednesday, 14:00, Room: A7)
Maxim Naumov<br />
Software Engineer (NVIDIA)<br />
Maxim Naumov’s expertise is in the area of parallel<br />
numerical linear algebra. In particular, he has worked<br />
on parallel iterative linear systems and eigenvalue<br />
solvers. He received his Ph.D. in Computer Science (with<br />
specialization in Computational Science and<br />
Engineering) in 2009 and his B.Sc. in Computer Science<br />
and Mathematics in 2003, both from Purdue University<br />
– West Lafayette. He currently works on the NVIDIA CUDA<br />
Platform team developing parallel numerical algorithms<br />
for Graphics Processing Units (<strong>GPU</strong>s). He has previously<br />
worked in the Intel Corporation Microprocessor<br />
<strong>Technology</strong> Lab and Computational Software Lab, and<br />
received a 2008-09 Intel Foundation Ph.D. Fellowship.<br />
h Session(s): S0149 - On the Parallel Solution of<br />
Sparse Triangular Linear Systems<br />
(Wednesday, 16:00, Room: A3)<br />
Dan Negrut<br />
Associate Professor (University of Wisconsin-Madison)<br />
Dan Negrut received his Mechanical Engineering Ph.D.<br />
in 1998 from the University of Iowa after which he spent<br />
six years in the CAE industry. In 2004 he served as<br />
Adjunct Assistant Professor in the Department of<br />
Mathematics at the University of Michigan. He spent<br />
2005 as a Visiting Scientist at Argonne National<br />
Laboratory in the Mathematics and Computer Science<br />
Division. At the end of 2005 Dan joined the Mechanical<br />
Engineering faculty at the University of Wisconsin-<br />
Madison. His interests are in Computational Science and<br />
he leads the Simulation-Based Engineering Lab<br />
(http://sbel.wisc.edu) and the Wisconsin Applied Computing Center.<br />
h Session(s): S0518 - <strong>GPU</strong> Computing: From Sand to<br />
Tank Dynamics (Wednesday, 17:00, Room: K)<br />
Chee Ng<br />
Research Assistant Professor of Pediatrics (Children’s<br />
Hospital of Philadelphia/University of Pennsylvania)<br />
Dr. Chee M. Ng, PharmD, PhD, FCP, is a Research<br />
Assistant Professor of Pediatrics at the University of<br />
Pennsylvania and an investigator of the Laboratory for<br />
Applied Pharmacokinetic/Pharmacodynamic in the<br />
Division of Clinical Pharmacology and Therapeutics at<br />
the Children’s Hospital of Philadelphia (CHOP). He is<br />
also an investigator in the Kinetic Modeling and Simulation<br />
(KMAS) core of the University of Pennsylvania. He<br />
received his B.S. from the State University of New York at<br />
Buffalo, his Doctor of Pharmacy with High Honor from the<br />
University of Illinois, and his PhD in pharmaceutics from the<br />
University of North Carolina at Chapel Hill.<br />
h Session(s): S0262 - <strong>GPU</strong>-Accelerated Model-Based<br />
Drug Development (Wednesday, 10:00, Room: B)<br />
Trung Dac Nguyen<br />
(University of Michigan)<br />
Biography unavailable at press time.<br />
h Session(s): S0058 – Advancing <strong>GPU</strong> Molecular<br />
Dynamics: Rigid Bodies in HOOMD-blue<br />
(Wednesday, 10:00, Room: N)<br />
Dave Nichols<br />
(Schlumberger)<br />
Biography unavailable at press time.<br />
h Session(s): S0628 - Panel Session: Learn from<br />
Experts in the Oil & Gas Industry<br />
(Wednesday, 16:30, Room: A7)<br />
Marc Nienhaus<br />
(NVIDIA ARC)<br />
Biography unavailable at press time.<br />
h Session(s): S0507 – Interactive and Scalable<br />
Subsurface Data Visualization Framework<br />
(Wednesday, 16:00, Room: A7)<br />
Claus Nilsson<br />
<strong>Program</strong>mer (Tietronix Software, Inc.)<br />
Biography unavailable at press time.<br />
h Session(s): S0321 - <strong>GPU</strong>-Based Monte Carlo Ray<br />
Tracing Simulation for Solar Power Plants<br />
(Tuesday, 14:00, Room: A8)<br />
Lars Nyland<br />
Senior Architect (NVIDIA)<br />
Lars Nyland has been a Senior Architect in the Compute-<br />
Architecture Group at NVIDIA for over 6 years. Among his<br />
concerns is memory performance for <strong>GPU</strong> computing,<br />
and one of the more interesting sub-problems has been<br />
the implementation and performance evaluation of atomic<br />
memory operations on Tesla, Fermi and Kepler <strong>GPU</strong>s.<br />
Prior to joining NVIDIA, Lars was a professor of Computer<br />
Science at the University of North Carolina and the<br />
Colorado School of Mines. Lars earned his Ph.D. studying<br />
parallel programming at Duke University in 1991.<br />
h Session(s): S0313 - Understanding and using<br />
Atomic Memory Operations<br />
(Tuesday, 14:00, Marriott Ballroom 3)<br />
h S0642 – Inside Kepler<br />
(Wednesday, 14:00, Hall 1)<br />
Akira Nukada<br />
Researcher (Tokyo Institute of <strong>Technology</strong>)<br />
Akira Nukada is a researcher at the Global Scientific<br />
Information and Computing Center, Tokyo Institute of<br />
<strong>Technology</strong>, Japan. His research interests include<br />
high-performance computing, especially fast Fourier<br />
transforms and <strong>GPU</strong> computing. He has developed the<br />
FFTSS library and the NukadaFFT library, which target<br />
superscalar processor systems and NVIDIA CUDA<br />
<strong>GPU</strong>s, respectively. Both include an<br />
auto-tuning mechanism, and their performance is often<br />
competitive with vendors’ libraries.<br />
h Session(s): S0209 - Performance of 3-D FFT<br />
Using Multiple <strong>GPU</strong>s with CUDA 4<br />
(Wednesday, 10:30, Room: A3)<br />
Anton Obukhov<br />
Engineering Consultant (Ubiquiti Networks)<br />
Anton Obukhov’s specialization lies in the field of<br />
computer vision, multimedia processing, and systems<br />
design. Prior to joining Ubiquiti Networks, he was an<br />
engineer at NVIDIA in the Developer <strong>Technology</strong> group<br />
for four years. He graduated from Moscow State<br />
University in Russia with a master’s degree in Computer Science<br />
from the Computational Mathematics and Cybernetics<br />
department. Before joining NVIDIA, he<br />
conducted research and development in the Graphics<br />
and Multimedia Lab at Moscow State University while<br />
also working at YUVsoft Corporation.<br />
h Session(s): S0062 - Histograms of Oriented<br />
Gradients with CUDA: Performance Analysis and<br />
Optimization Tips (Tuesday, 16:00, Room: A1)<br />
David Oehmke<br />
(Cray Inc.)<br />
Biography unavailable at press time.<br />
h Session(s): S0089 – Accelerator Directives, OpenACC<br />
and OpenMP4ACC (Tuesday, 16:00, Room: A3)<br />
Taro Okamoto<br />
Assistant Professor (Department of Earth and Planetary<br />
Sciences, Tokyo Institute of <strong>Technology</strong>)<br />
Taro Okamoto’s major research field is<br />
geophysics, in particular seismology: simulating and<br />
analyzing seismic waves to study the structure of the<br />
Earth and other planets, and to study earthquake<br />
source physics.<br />
h Session(s): S0352 - <strong>GPU</strong>-Accelerated Parallel<br />
Computing for Simulation of Seismic Wave<br />
Propagation (Wednesday, 10:30, Room: A7)<br />
Michal Okoniewski<br />
Director of Marketing (Acceleware Ltd.)<br />
Biography unavailable at press time.<br />
h Session(s): S0433 – Accelerated FDTD Technique<br />
for Marine Controlled Source Electromagnetic<br />
Imaging (Wednesday, 15:30, Room: A7)<br />
Aaron Oliker<br />
Partner/Director of 3D <strong>Technology</strong> (BioDigital)<br />
Aaron is a partner and Director of 3D <strong>Technology</strong> at<br />
BioDigital. Aaron is an expert in the field of 3D<br />
computer-based medical simulation, and his work has created a<br />
new paradigm in medical education. Aaron is also a<br />
Research Assistant Professor of Educational Informatics at<br />
New York University School of Medicine. He has taught<br />
3D programming and medical visualization at the<br />
undergraduate and graduate level at NYU and SVA for<br />
the past 12 years. Prior to BioDigital, Aaron founded<br />
CyberFiber, Inc. and was the Director of Animation and<br />
<strong>Program</strong>ming at the New York University School of<br />
Medicine Virtual Surgery Research Laboratory.<br />
h Session(s): S2001 – Emerging Companies<br />
Summit: CEO on Stage Featuring Unity<br />
Technologies, MirriAd, and BioDigital<br />
(Wednesday, 10:00, Marriott Ballroom 4)<br />
Brent Oster<br />
Applied Engineer (NVIDIA)<br />
Brent Oster is an applied engineer at NVIDIA with 17<br />
years of experience in computer graphics and simulation.<br />
He has worked with BioWare, Lucasfilm and Electronic Arts,<br />
holds a degree in Aerospace Engineering, and completed<br />
graduate studies in scientific computing.<br />
h Session(s): S0403 - NURBS Tessellation with CUDA<br />
(Tuesday, 15:00, Room: J1)<br />
Eugene Ostroukhov<br />
Tools Developer (NVIDIA)<br />
Eugene Ostroukhov is currently a part of the NVIDIA<br />
CUDA developer tools team, developing NVIDIA Nsight<br />
for Linux and Mac platforms. He believes in visual tools<br />
as an important way to combat ever-increasing software<br />
complexity and spent almost a decade working on visual<br />
tools and popular integrated development environments<br />
for Java, web and mobile application developers. He<br />
holds B.S. and M.S. degrees from KNEU.<br />
h Session(s): S0420 – Nsight IDE for Linux and Mac<br />
(Wednesday, 09:00, Room: A5)<br />
Andrew Page<br />
Senior Product Manager (NVIDIA)<br />
Andrew Page is the Senior Product Manager for<br />
multi-display and broadcast video products in NVIDIA’s<br />
Quadro product line. Over his 15 years in the hardware and<br />
software industries he has held engineering and<br />
marketing roles in professional photo imaging, color<br />
management and high performance 3D graphics toolkits.<br />
h Session(s): S0530 - Multi-Display Roundtable<br />
(Monday, 13:00, Room: A2)<br />
h S0326 - Next Generation InfoWall<br />
(Thursday, 09:00, Room: A1)<br />
Szilárd Páll<br />
PhD Student (KTH Royal Institute of <strong>Technology</strong>)<br />
Szilárd is a PhD student at KTH Royal Institute of<br />
<strong>Technology</strong>, working on parallel algorithms for molecular<br />
dynamics, and a developer of the GROMACS MD package.<br />
h Session(s): S0363 - Efficient Molecular Dynamics<br />
on Heterogeneous <strong>GPU</strong> Architectures in GROMACS<br />
(Wednesday, 16:00, Room: N)<br />
Jeremie Papon<br />
PhD Student (University of Göttingen)<br />
Biography unavailable at press time.<br />
h Session(s): S0075 - Oculus Real-Time Modular<br />
Cognitive Vision System (Tuesday, 15:00, Room: A1)<br />
Valerio Pascucci<br />
(University of Utah)<br />
Dr. Valerio Pascucci is the Director of the Center for<br />
Extreme Data Management, Analysis and Visualization<br />
(CEDMAV.COM) of the University of Utah, established in<br />
collaboration with the Pacific Northwest National<br />
Laboratory (PNNL). Valerio is a Professor in the School of<br />
Computing, Associate Director of the Scientific<br />
Computing and Imaging (SCI) Institute, and a Laboratory<br />
Fellow at PNNL. Before joining SCI, Dr. Pascucci<br />
served as a Group Leader and a Project Leader at the<br />
Lawrence Livermore National Laboratory, Center for<br />
Applied Scientific Computing and as Adjunct Professor<br />
at the Computer Science Department of the University of<br />
California, Davis.<br />
h Session(s): S0623 - Visualizing Heterogeneous<br />
Performance Tested on MPI+CUDA Gigapixel<br />
Panorama Stitching (Wednesday, 17:00, Room: A8)<br />
Ritesh Patel<br />
Student (University of California, Davis)<br />
Ritesh is a graduate student pursuing his M.S. degree in<br />
Electrical and Computer Engineering at the University of<br />
California, Davis. His interests are in the area of GP<strong>GPU</strong><br />
applications.<br />
h Session(s): S0361 - Lossless Data Compression on<br />
<strong>GPU</strong>s (Wednesday, 17:00, Room: B)<br />
Sandeep Patel<br />
Assistant Professor (University of Delaware)<br />
Sandeep Patel is an Assistant Professor in the<br />
Department of Chemistry and Biochemistry at the<br />
University of Delaware. He earned his Ph.D. in Chemical<br />
Engineering from the Massachusetts Institute of<br />
<strong>Technology</strong> (MIT). His research interests include<br />
simulation techniques for biophysical systems, the<br />
development of advanced molecular modeling<br />
technologies, and the broad areas to which they are applied.<br />
h Session(s): S0207 – <strong>GPU</strong> Enabled Macromolecular<br />
Simulation: Challenges and Opportunities<br />
(Wednesday, 15:30, Room: N)<br />
Anjul Patney<br />
PhD Candidate (University of California, Davis)<br />
Anjul is a fifth-year PhD student in the Department of<br />
Electrical and Computer Engineering at the University of<br />
California, Davis. He works under the guidance of Prof.<br />
John Owens in the area of graphics and computer<br />
architecture. In his research, he is interested in pursuing<br />
hardware and software challenges in the design of<br />
programmable rendering architectures.<br />
h Session(s): S0138 – <strong>GPU</strong> Task-Parallelism:<br />
Primitives and Applications<br />
(Thursday, 15:30, Marriott Ballroom 3)
Bharath Pattabiraman<br />
PhD Student (Northwestern University)<br />
Biography unavailable at press time.<br />
h Session(s): S0087 - <strong>GPU</strong> Acceleration of<br />
Dense Stellar Clusters Simulation<br />
(Thursday, 15:00, Room: M)<br />
Sushrut Pavanaskar<br />
PhD Candidate (UC Berkeley)<br />
Sushrut Pavanaskar is a PhD candidate in Mechanical<br />
Engineering at UC Berkeley. His research interests<br />
include CAD/CAM, geometric modeling, <strong>GPU</strong> algorithms,<br />
computer graphics, and manufacturing. Applications of<br />
his research include solid model rendering, toolpath<br />
planning, and methods to improve efficiency in<br />
manufacturing. He received his BE in Mechanical<br />
Engineering from Pune University and his M. Tech. from<br />
IIT Bombay in Manufacturing. At Berkeley, he<br />
works in the computer aided design and manufacturing<br />
laboratory, advised by Prof. Sara McMains. He recently<br />
won the Audi Production Award 2011 for his concept on<br />
applying advanced geometric algorithms in automobile<br />
manufacturing for resource efficiency.<br />
h Session(s): S0411 – Artifact-Free Cloud-Based CAD<br />
Rendering (Thursday, 16:30, Room: L)<br />
Jon Peddie<br />
President (Jon Peddie Research)<br />
Jon Peddie is one of the pioneers of the graphics<br />
industry, starting his career in computer graphics in<br />
1962. After the successful launch of several graphics<br />
manufacturing companies, Peddie began JPA in 1984 to<br />
provide comprehensive data, information and<br />
management expertise to the computer graphics<br />
industry. Peddie lectures at numerous conferences on<br />
topics pertaining to graphics technology and the<br />
emerging trends in digital media technology. Recently<br />
named one of the most influential analysts, he is<br />
frequently quoted in trade and business publications,<br />
contributes articles to numerous publications,<br />
and has appeared on CNN and TechTV.<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Bert Peers<br />
Senior Graphics <strong>Program</strong>mer (CCP Games)<br />
Bert Peers is a senior graphics programmer with<br />
Iceland-based CCP Games, the company behind the<br />
single-shard space MMO EVE Online. After working in the<br />
games industry as a freelancer for over a decade, as<br />
well as a few years in the field of medical imaging and<br />
rapid prototyping, he joined CCP to focus on high-fidelity<br />
avatar customization, rendering, and all things<br />
characters.<br />
h Session(s): S0021 - OptiX for DirectX <strong>Program</strong>mers<br />
- EVE Online’s <strong>GPU</strong>-Raytraced Portraits<br />
(Tuesday, 16:30, Room: J1)<br />
Blair Perot<br />
Professor (University of Massachusetts, Amherst)<br />
Prof. Perot is the Director of the Theoretical and<br />
Computational Fluid Dynamics Laboratory at the<br />
University of Massachusetts, Amherst. He obtained his<br />
Ph.D. and M.S. degrees in Mechanical Engineering and<br />
in Computer Science from Stanford University and a<br />
B.S.E. in Engineering Physics with highest honors from<br />
Princeton University in 1987. Research in the Theoretical<br />
and Computational Fluid Dynamics Laboratory focuses<br />
on high performance computing, the computer<br />
simulation of fluid flow, and the study of fluid turbulence.<br />
The Laboratory is funded, in part, by the Office of Naval<br />
Research, the Air Force Office of Scientific Research, the<br />
DOE and the NSF.<br />
h Session(s): S0217 – Efficient Implementation of<br />
CFD Algorithms on <strong>GPU</strong> Accelerated<br />
Supercomputers (Wednesday, 17:30, Room: K)<br />
David Perry<br />
CEO and Co-Founder (GAIKAI)<br />
David Perry was the founder and president of Shiny<br />
Entertainment, Inc. (bought by Atari) for over 12 years<br />
and is one of the best-known video game industry<br />
veterans. Over 29 years, Perry has developed or<br />
programmed over 100 games across 29 video game<br />
platforms. All told, Perry’s games (including #1 hits like<br />
The Terminator, Teenage Mutant Ninja Turtles, Disney’s<br />
Aladdin & Warner’s Matrix projects) have totaled over a<br />
billion dollars in retail sales. Perry sits on the advisory<br />
boards of the Game Developers <strong>Conference</strong>, IndieCade<br />
and VGEXPO, and has spoken at TED, E3, Hollywood and<br />
Games Summit, CGDC, MIT, USC, UCI, UCLA, QUB,<br />
Montreal Game Summit, Digital Hollywood, What Teens<br />
Want and elsewhere. In his previous position Perry was the co-founder<br />
and chief creative officer of Acclaim.com, directing<br />
multiple MMORPG games, Social Network Games &<br />
Casual Titles. All games used the ‘free-to-play’ model,<br />
supported by in-game advertising, subscriptions or<br />
micro-transactions. Now Perry is the CEO and cofounder<br />
of Gaikai.com, a company that’s developed a<br />
cutting-edge video game streaming technology that<br />
allows any Windows game or application to run in any<br />
browser with just one click. Perry also recently launched<br />
a book for students called David Perry on Game Design<br />
- GameDesignBook.org (the largest non-profit book on<br />
Game Design ever written).<br />
h Session(s): S2004 – Emerging Companies<br />
Summit: CEO on Stage Featuring GAIKAI,<br />
Immersive Media, and Numecent<br />
(Wednesday, 15:00, Marriott Ballroom 4)<br />
Christian Perwass<br />
CEO (Raytrix GmbH)<br />
Dr. Christian Perwass received an MSci degree in Physics<br />
from the University of London, UK, in 1996, and a Ph.D.<br />
in engineering from Cambridge University, UK, in 1999.<br />
He then held a post-doctoral position at the University of<br />
Kiel, Germany, until 2006, where he worked on image<br />
processing, machine learning and camera models. From<br />
2006 until 2009 he worked at Robert Bosch GmbH,<br />
Germany, where he developed image processing<br />
software for automated optical inspection machines. In<br />
2009 he co-founded Raytrix GmbH to develop and build<br />
lightfield cameras.<br />
h Session(s): S0335 - Live 3D-Video with a Lightfield<br />
Camera (Wednesday, 14:00, Room: A1)<br />
h S2006 - Emerging Companies Summit: CEO on<br />
Stage Featuring Raytrix, Playcast and Universal<br />
Robotics (Wednesday, 17:00, Marriott Ballroom 4)<br />
David Peters<br />
CEO (Universal Robotics)<br />
David launched Universal Robotics in April of 2008,<br />
having raised private equity to capitalize operations. He<br />
is the Chairman of the Board. Before founding Universal,<br />
he was an entrepreneur in the motion picture industry,<br />
working as a producer for 17 years. He is a seasoned<br />
operations executive and fundraiser. David is a member<br />
of the Directors Guild of America and the Robotics and<br />
Smart Device Committee of the World Economic Forum<br />
Network of Global Agenda Councils. He has a Bachelor<br />
of Fine Arts from the Cleveland Institute of Art.<br />
h Session(s): S2006 - Emerging Companies Summit: CEO on<br />
Stage Featuring Raytrix, Playcast and Universal<br />
Robotics (Wednesday, 17:00, Marriott Ballroom 4)<br />
Loukas Petridis<br />
Staff Scientist (Oak Ridge National Laboratory)<br />
Loukas Petridis obtained his PhD in theoretical physics<br />
from Cambridge University in 2006. He was a postdoctoral<br />
fellow at Oak Ridge National Laboratory from 2007 to<br />
2009, and he is currently a Staff Scientist there.<br />
h Session(s): S0659 - Computer Simulation of<br />
Lignocellulosic Biomass (Tuesday, 16:30, Room: A2)<br />
James Phillips<br />
Senior Research <strong>Program</strong>mer (University of Illinois)<br />
James Phillips is a Senior Research <strong>Program</strong>mer in the<br />
Theoretical and Computational Biophysics Group at the<br />
Beckman Institute for Advanced Science and <strong>Technology</strong><br />
at the University of Illinois at Urbana-Champaign. He<br />
has a Ph.D. in Physics from the University of Illinois.<br />
Since 1999, James has been the lead developer of the<br />
highly scalable parallel molecular dynamics program<br />
NAMD, for which he received a Gordon Bell Award in<br />
2002. His research interests include improving the<br />
performance and accuracy of biomolecular simulations<br />
through parallelization, optimization, hardware<br />
acceleration, better algorithms, and new methods.<br />
h Session(s): S0127 - Petascale Molecular Dynamics<br />
Simulations on <strong>GPU</strong>-Accelerated Supercomputers<br />
(Wednesday, 15:00, Room: N)<br />
h S0709 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 2<br />
(Thursday, 14:00, Room: J1)<br />
Peter Phillips<br />
SVP (Aon Benfield Securities)<br />
Biography unavailable at press time.<br />
h Session(s): S0418 – High Productivity<br />
Computational Finance on <strong>GPU</strong>s<br />
(Tuesday, 14:00, Room: L)<br />
Jakub Pietrzak<br />
Software Engineer (University of Warsaw)<br />
Jakub Pietrzak is a member of the research team in the<br />
Department of Medical Physics, Maria Skłodowska-Curie<br />
Memorial Cancer Centre - Institute of Oncology.<br />
He is an experienced C++ developer interested in image<br />
processing and analysis techniques and their<br />
applications in medical imaging. He has also worked as a<br />
software engineer for a video postproduction company.<br />
Jakub is a final-year student of Inter-faculty<br />
Individual Studies in Mathematics and Natural Sciences<br />
at the University of Warsaw, where he is simultaneously studying<br />
physics (specialization in nuclear medicine) and<br />
mathematics.<br />
h Session(s): S0312 - <strong>GPU</strong> Implementation for Rapid<br />
Iterative Image Reconstruction in Nuclear<br />
Medicine (Wednesday, 10:00, Room: A8)<br />
Nikos Pitsianis<br />
Assistant Professor (Aristotle University, Greece)<br />
Nikos Pitsianis is an assistant professor at the<br />
Department of Electrical and Computer Engineering,<br />
Aristotle University of Thessaloniki, Greece, and an<br />
adjunct professor with the Departments of Computer<br />
Science and Electrical and Computer Engineering of<br />
Duke University, Durham, North Carolina. His research<br />
interests include high-performance algorithms and<br />
architectures for signal and image processing.<br />
h Session(s): S0314 - Efficient k-Nearest<br />
Neighbor Search Algorithms on <strong>GPU</strong>s<br />
(Tuesday, 16:30, Room: C)<br />
Victor Podlozhnyuk<br />
Software Engineer (NVIDIA)<br />
Victor Podlozhnyuk is a performance optimization expert<br />
currently working on the NVIDIA FFT library. In his spare time<br />
he investigates various ways to put to use<br />
the tremendous horsepower of modern<br />
<strong>GPU</strong>-based systems. In his previous role as a devtech<br />
engineer at NVIDIA he authored a number of sample<br />
algorithm implementations in CUDA and OpenCL for<br />
the NVIDIA <strong>GPU</strong> Computing SDK. Victor holds a Master’s and<br />
a Bachelor’s degree in Electrical Engineering from<br />
Moscow Institute of Physics and <strong>Technology</strong>.<br />
h Session(s): S0273 - Fast JPEG Coding on the <strong>GPU</strong><br />
(Wednesday, 16:00, Room: A1)<br />
Raphaël Poncet<br />
Research Scientist (Commissariat à l’Energie Atomique<br />
et aux Energies Alternatives)<br />
Raphaël Poncet is a research scientist at CEA (the French<br />
Alternative Energies and Atomic Energy Commission), a<br />
French government-funded technological research<br />
institution, where he works on a high performance<br />
industrial multi-physics multi-material hydrodynamic code.<br />
h Session(s): S0091 - Sustainable Hybrid<br />
Parallelization of an Unstructured Hydrodynamic<br />
Code (Thursday, 15:00, Room: N)<br />
Warren Ponder<br />
Director, Product Management (VMware)<br />
Biography unavailable at press time.<br />
h Session(s): S0359 – VMware and NVIDIA:<br />
Delivering 3D Workstations from the Cloud<br />
(Tuesday, 17:00, Room: A5)<br />
Duncan Poole<br />
Senior Manager, HPC (NVIDIA)<br />
Duncan Poole is the CEO of the OpenACC organization<br />
and a Senior Manager in HPC at NVIDIA, where he<br />
works with third-party tools providers to deliver <strong>GPU</strong>-enabled<br />
capabilities. Duncan’s interests include<br />
fostering strong academic research relationships, most<br />
recently in the area of computational chemistry. Duncan<br />
is a graduate in Electrical Engineering from the<br />
University of Toronto.<br />
h Session(s): S0517A – <strong>Program</strong>ming <strong>GPU</strong>s with<br />
OpenACC (Part 1 of 3) (Monday, 10:30, Room: B)<br />
h S0517B – <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />
2 of 3) (Monday, 13:00, Room: B)<br />
h S0517C – <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />
3 of 3) (Monday, 14:30, Room: B)<br />
h S0621 - NVIDIA OpenACC<br />
(Thursday, 09:30, Room: A5)<br />
Mark Popkiewicz<br />
CEO (MirriAd)<br />
Mark has extensive executive experience in high-growth<br />
companies such as Eicon Network, SDX Business Systems, Lucent<br />
Technologies, Mobile Media, BBC Ventures Group<br />
and BBC Vecta. He has grown businesses from small to<br />
large and from local to global market leadership,<br />
having set up 30 operations around the world, and he is<br />
now CEO of MirriAd. Mark has a<br />
thorough understanding of contemporary technology and<br />
business models in digital media and online advertising, as<br />
well as telecoms and mobile.<br />
h Session(s): S2001 – Emerging Companies<br />
Summit: CEO on Stage Featuring Unity<br />
Technologies, MirriAd, and BioDigital<br />
(Wednesday, 10:00, Marriott Ballroom 4)
Srinivasa Prasanna<br />
Professor (International Institute of Information<br />
<strong>Technology</strong> Bangalore)<br />
Biography unavailable at press time.<br />
h Session(s): S0271 – Fast Adaptive Sampling<br />
Technique for Multi-Dimensional Integral Estimation<br />
Using <strong>GPU</strong>s (Wednesday, 14:30, Marriott Ballroom 3)<br />
Will Ramey<br />
Sr. Product Manager, <strong>GPU</strong> Computing (NVIDIA)<br />
As NVIDIA’s Senior Product Manager for <strong>GPU</strong><br />
Computing, Will helps define and promote platforms,<br />
libraries and developer tools for CUDA architecture<br />
<strong>GPU</strong>s. Prior to joining NVIDIA in 2003, he managed an<br />
independent game studio and developed advanced<br />
technology for the entertainment industry as a product<br />
manager and software engineer. He holds a BA in<br />
Computer Science from Willamette University and<br />
completed the Japan Studies <strong>Program</strong> at the Tokyo<br />
International University. Outside of work, Will learns<br />
something new every day, usually from his two kids. He<br />
enjoys hiking, camping, swimming, spending time with<br />
his wonderful wife, and playing The Game.<br />
h Session(s): S0005 - Languages, APIs and<br />
Development Tools for <strong>GPU</strong> Computing<br />
(Monday, 09:00, Room: A5)<br />
Pradeep Rao<br />
<strong>Technology</strong> Architect (Infosys Technologies Ltd)<br />
Pradeep is a <strong>Technology</strong> Architect at Infosys Limited,<br />
Bangalore, India. He has nine years of experience in the<br />
IT industry. His core focus area has been building<br />
solutions and applied research in the field of High<br />
Performance Computing (HPC). He has experience in<br />
many HPC technologies such as CUDA, OpenCL and<br />
multi-core technologies such as Microsoft HPC Server.<br />
As part of the HPC team at Infosys, his responsibilities<br />
include providing consulting services to the company’s Fortune 500<br />
clients for their HPC needs and building solutions<br />
leveraging suitable HPC technology. He has also worked<br />
on various Microsoft platforms including .NET<br />
technologies and SQL Server.<br />
h Session(s): S0271 - Fast Adaptive Sampling<br />
Technique for Multi-Dimensional Integral<br />
Estimation Using <strong>GPU</strong>s<br />
(Wednesday, 14:30, Marriott Ballroom 3)<br />
Steve Rennich<br />
HPC Developer <strong>Technology</strong> Engineer (NVIDIA)<br />
Steve Rennich is a CUDA Developer <strong>Technology</strong> Engineer<br />
at NVIDIA, where he supports the use of <strong>GPU</strong>s by the<br />
computational structural mechanics community. Steve<br />
holds a PhD in Aeronautics and Astronautics from<br />
Stanford University where he studied computational fluid<br />
mechanics and vortex system instabilities. Prior to<br />
joining NVIDIA, Steve spent 10 years developing structural<br />
analysis codes.<br />
h Session(s): S0029 - Leveraging Matrix Block<br />
Structure In Sparse Matrix-Vector Multiplication<br />
(Wednesday, 14:00, Marriott Ballroom 3)<br />
Max Rietmann<br />
PhD Student (Institute for Computational Science / USI<br />
Lugano, Switzerland)<br />
Max Rietmann is a PhD Student in computer science at<br />
the Institute for Computational Science at the USI<br />
Lugano in Switzerland. As a developer for the <strong>GPU</strong><br />
version of the seismology code SPECFEM3D, his research is<br />
focused on both computational and algorithmic<br />
challenges associated with numerical wave propagation.<br />
h Session(s): S0508 - Faster Finite Elements for Wave<br />
Propagation Codes (Thursday, 10:00, Room: A2)<br />
Mariano Rivera<br />
Researcher-Professor (CIMAT A.C.)<br />
Biography unavailable at press time.<br />
h Session(s): S0128 - V:Screen: A Real-Time<br />
Augmented Video Method<br />
(Wednesday, 17:00, Room: A1)<br />
Dylan Roeh<br />
Kernel Developer (Wolfram Research Inc)<br />
Dylan Roeh is a Kernel Developer for Wolfram Research<br />
Inc., the company that makes Mathematica and<br />
Wolfram|Alpha. He is one of the developers responsible<br />
for the recently-added CUDA and OpenCL support.<br />
h Session(s): S0100 - Mathematica as a Practical<br />
Platform for <strong>GPU</strong>-Accelerated Finance<br />
(Wednesday, 17:00, Room: L)<br />
John Romein<br />
Senior Researcher (ASTRON)<br />
John W. Romein is a senior system researcher in<br />
high-performance computing at ASTRON, where he is<br />
responsible for the central, real-time data processing of<br />
LOFAR telescope data. He obtained his Ph.D. degree on<br />
distributed search algorithms for board-game playing at<br />
Vrije Universiteit, Amsterdam. As a postdoctoral<br />
researcher, he solved the game of Awari using a large<br />
computer cluster and did research on parallel<br />
algorithms for bioinformatics. His research interests<br />
include high-performance computing, parallel<br />
algorithms, networks, programming languages, and<br />
compiler construction.<br />
h Session(s): S0124 - Signal Processing on <strong>GPU</strong>s for<br />
Radio Telescopes (Thursday, 10:00, Room: M)<br />
Christopher Rossbach<br />
Researcher (Microsoft Research Silicon Valley)<br />
Chris Rossbach is a Researcher with Microsoft Research<br />
Silicon Valley.<br />
h Session(s): S0320 - PTask: OS Support for <strong>GPU</strong><br />
Dataflow <strong>Program</strong>ming (Thursday, 14:00, Room: B)<br />
Davide Rossetti<br />
Researcher (Italian National Institute for Nuclear Physics)<br />
Davide Rossetti has a degree in Theoretical Physics and<br />
is currently a staff researcher at the Italian National Institute<br />
for Nuclear Physics (INFN). He has been a member of the<br />
Array Processor Experiment (APE) research group for<br />
more than 15 years. His interests range from numerical<br />
simulations and HPC to processor architectures,<br />
compilers and computer graphics. He has spent the last 10<br />
years working on the development of software and<br />
hardware for high performance interconnection<br />
networks on clusters.<br />
h Session(s): S0282 - Leveraging NVIDIA <strong>GPU</strong>Direct<br />
on APEnet+ 3D Torus Cluster Interconnect<br />
(Thursday, 16:00, Room: K)<br />
Scott Rostrup<br />
Software Engineer (Synopsys Inc)<br />
After completing a Masters Thesis at the University of<br />
Waterloo on developing fluid simulation algorithms for<br />
the Cell and <strong>GPU</strong> architectures, Scott joined Synopsys’s<br />
<strong>GPU</strong> computing effort. Since joining Synopsys, Scott has<br />
become interested in developing <strong>GPU</strong> algorithms for<br />
applications not typically thought suitable for<br />
acceleration such as sparse linear algebra, graph<br />
algorithms, and circuit simulation.<br />
h Session(s): S0349 - Tree Accumulation on the <strong>GPU</strong><br />
(Tuesday, 15:00, Room: J3)<br />
Erwin Roth<br />
Researcher (Technische Universitaet Muenchen)<br />
Erwin graduated from Technische Universität München<br />
in 2008 with a Master of Science (Dipl.-Ing.) degree in<br />
Mechanical Engineering with a solid background in<br />
computer vision and model based tracking. He is<br />
currently working as a PhD candidate at the Ingolstadt<br />
Institute of the Technische Universität München, a<br />
scientific research center founded by the AUDI AG and<br />
the Technische Universität München in the field of<br />
sensor data simulation for the computer-based testing<br />
of Advanced Driver Assistance Systems.<br />
h Session(s): S0319 - Advanced Driver<br />
Assistance System Testing using OptiX<br />
(Tuesday, 14:00, Room: N)<br />
Gregory Ruetsch<br />
Applied Engineer (NVIDIA)<br />
Greg Ruetsch is an applications engineer in <strong>GPU</strong><br />
Computing at NVIDIA. Prior to this he held positions at<br />
Clearspeed Technologies and at Sun Microsystems. He<br />
received his Bachelor’s degree in mechanical and<br />
aerospace engineering from Rutgers University and a<br />
Ph.D. in applied mathematics from Brown University,<br />
after which he was a postdoctoral fellow in the<br />
Aerospace Engineering Department at the University of<br />
Southern California and in the Center for Turbulence<br />
Research at Stanford University.<br />
h Session(s): S0522 - Introduction to CUDA Fortran<br />
(Monday, 14:30, Room: A3)<br />
Karl Rupp<br />
Project Assistant (TU Wien)<br />
Karl Rupp received the BSc degree in electrical<br />
engineering from the Technische Universität Wien in<br />
2006, the MSc in computational mathematics from<br />
Brunel University in 2007, and the degree of<br />
Diplomingenieur in microelectronics and in technical<br />
mathematics from the Technische Universität Wien in<br />
2009. He completed his doctoral degree on deterministic<br />
numerical solutions of the Boltzmann transport<br />
equation in 2011. His scientific interests are in the field<br />
of semiconductor device simulation and include generic<br />
programming, advanced discretization schemes for<br />
partial differential equations and parallel computing.<br />
h Session(s): S0071 - The High-Level Linear<br />
Algebra Library ViennaCL And Its Applications<br />
(Thursday, 15:00, Room: C)<br />
Scott Ruppert<br />
ThinkStation Technical Solutions Manager (Lenovo)<br />
Scott Ruppert is Technical Solutions Manager for the<br />
worldwide ThinkStation business unit at Lenovo.<br />
h Session(s): S0638 - Lenovo ThinkStation<br />
Accelerates Medical Research with Beckman<br />
Coulter (Presented by Lenovo)<br />
(Tuesday, 16:00, Room: M)<br />
Radu Rusu<br />
Research Scientist (Willow Garage, Inc)<br />
Radu B. Rusu is a Research Scientist at Willow Garage<br />
and a Visiting Lecturer at Stanford University. Dr. Rusu<br />
received his Ph.D. in Computer Science from the<br />
Technische Universitaet Muenchen, Germany. During the<br />
last few years, Dr. Rusu has been on the board of many<br />
workshops and scientific events held at prestigious<br />
conferences, such as RSS, ICRA, IROS, AAAI, etc. He has<br />
authored over 50 scientific publications, including one book,<br />
and received a best paper award at ICAR 2009. Dr. Rusu’s current<br />
research interests include realtime perception and 3D<br />
semantic mapping. He is currently a maintainer of the<br />
PCL project.<br />
h Session(s): S0088 - Point Cloud Library (PCL) on<br />
CUDA (Tuesday, 14:00, Room: C)<br />
Denis Sabitov<br />
(Schlumberger)<br />
Biography unavailable at press time.<br />
h Session(s): S0171 - Numerical Modeling Of 3D<br />
Anisotropic Seismic Wave Propagation On<br />
Multi<strong>GPU</strong> Platforms (Wednesday, 09:00, Room: A7)<br />
Priyanka Sah<br />
Compute DevTech Engineer (NVIDIA)<br />
Having spent two years with the Indian Space Research<br />
Organization, developing and implementing parallel<br />
image processing algorithms for satellite imagery,<br />
Priyanka Sah went on to attain her masters in Computer<br />
Science and Engineering at IIT Delhi. Priyanka<br />
subsequently worked on life science and weather<br />
simulation codes as a CUDA consultant, before joining<br />
NVIDIA’s Developer <strong>Technology</strong> group. At NVIDIA,<br />
Priyanka works in a number of HPC application<br />
domains, helping customers develop with the <strong>GPU</strong> and<br />
working at the leading edge of HPC performance.<br />
h Session(s): S0428 - Panini: A <strong>GPU</strong> Aware Array<br />
Class (Thursday, 16:00, Room: B)<br />
Nikolai Sakharnykh<br />
Developer <strong>Technology</strong> Engineer (NVIDIA)<br />
Nikolai Sakharnykh is a developer technology engineer<br />
at NVIDIA. He has been working with game developers<br />
and HPC CUDA customers providing support for<br />
graphics technology and <strong>GPU</strong> compute. Currently he is<br />
working on CFD and linear algebra related projects for<br />
current and future <strong>GPU</strong> hardware. His interests include<br />
computational fluid dynamics, sparse matrix solvers and<br />
visualization techniques. Nikolai graduated with honours<br />
from Moscow State University, the department of<br />
Computational Mathematics and Cybernetics as a<br />
specialist in applied mathematics and informatics.<br />
Currently he’s also working on his PhD at MSU.<br />
h Session(s): S0247 - 3D ADI Method for<br />
Fluid Simulation on Multiple <strong>GPU</strong>s<br />
(Tuesday, 17:00, Marriott Ballroom 3)<br />
Graham Sanborn<br />
Lead Software Developer (FunctionBay)<br />
Graham Sanborn is a research engineer at FunctionBay,<br />
Inc. He is a member of the multi-flexible-body dynamics<br />
(MFBD) development team, where his research and<br />
development focus is finite element technologies for<br />
nonlinear dynamics, the integration of these technologies<br />
with multi-body formulations for system-level analysis of<br />
dynamic systems, and the numerical methods<br />
appropriate for these systems. He has a bachelor’s<br />
degree in computer science and a PhD in mechanical<br />
engineering. He received his PhD in 2008 from the<br />
University of Illinois at Chicago, where he studied<br />
computational rigid and flexible body system dynamics.<br />
h Session(s): S0055 - Particle Dynamics with MBD<br />
and FEA using CUDA (Wednesday, 16:00, Room: K)<br />
Avijit Santra<br />
Project Manager - Knowledge Based Engineering (Tata<br />
Motors Limited)<br />
Avijit Santra received his Masters in Mechanical<br />
Engineering from IIT Kharagpur in 2001. He then joined Tata<br />
Technologies Ltd and was deputed to the Tata Motors Ltd<br />
Engineering Research Center. With 10 years of<br />
experience in Knowledge Based Engineering kernel and<br />
application development, he is also involved in various<br />
initiatives in the Tata Motors Digital Vehicle Development<br />
<strong>Program</strong>, which includes PLM, 3D for All, etc.<br />
h Session(s): S0040 - Introducing CUDA in KBE<br />
Applications for Digital Vehicle Development<br />
<strong>Program</strong>s (Tuesday, 09:30, Room: J2)<br />
Greg Scantlen<br />
Greg Scantlen is CEO of CreativeC.com, a supplier of<br />
high-performance computing machines and expertise to<br />
scientists and researchers at academic institutions and<br />
US national laboratories, such as Los Alamos National<br />
Laboratory and Sandia National Laboratories.<br />
h Session(s): S0646 - Massively Parallel Code<br />
Development on Stelletto CDA (Presented by<br />
Creative Consultants) (Tuesday, 17:00, Room: A8)<br />
Bertil Schmidt<br />
(Nanyang Technological University)<br />
Biography unavailable at press time.<br />
h Session(s): S0008 - Algorithms and Tools for<br />
Bioinformatics on <strong>GPU</strong>s (Tuesday, 16:00, Room: K)<br />
Michael Schøler<br />
Senior Consultant (LEGO)<br />
Michael has a Masters Degree in Computer Science<br />
from Aalborg University within the fields of Computer<br />
Vision and Artificial Intelligence systems. As a Senior<br />
Consultant and CEO in Hinnerup Net A/S, Michael has<br />
participated in a number of projects for LEGO. One of<br />
these projects is LEGO 3DServices, a service-oriented<br />
distributed HPC framework (<strong>GPU</strong>/CPU) and the focus<br />
of this session. Michael has worked on numerous<br />
other projects, ranging from simple websites to cutting<br />
edge technology development. The most recent primary<br />
customers for Hinnerup Net A/S are Vestas, TrygVesta,<br />
the Danish Road Directorate and LEGO.<br />
h Session(s): S0261 – Scalable <strong>GPU</strong> Computing<br />
Service Architecture (Tuesday, 16:00, Room: A5)<br />
Steve Scott<br />
CTO, Tesla Business (NVIDIA)<br />
Dr. Steve Scott is Chief <strong>Technology</strong> Officer of the Tesla<br />
business unit at NVIDIA, where he is responsible for the<br />
evolution of NVIDIA’s <strong>GPU</strong> computing roadmap. Prior to<br />
joining NVIDIA in August 2011, Steve spent 19 years at<br />
Cray, where he had been CTO since 2004. He was the Chief<br />
Architect of multiple systems at Cray, architected the<br />
routers for the Cray XT, XE and Cascade systems, and<br />
led the Cray Cascade project funded by the DARPA High<br />
Productivity Computing Systems program. Steve holds<br />
twenty-eight US patents, and has served on numerous<br />
advisory boards and program committees. He was the<br />
recipient of the 2005 ACM Maurice Wilkes Award and the<br />
2005 IEEE Seymour Cray Computer Engineering Award.<br />
He received his PhD in computer architecture in 1992<br />
from the University of Wisconsin at Madison, where he<br />
was a Wisconsin Alumni Research Foundation and Hertz<br />
Foundation Fellow.<br />
h Session(s): S0531 - Exascaling Your Apps<br />
(Wednesday, 09:00, Room: C)<br />
Frank Sculli<br />
Co-Founder/Informatics Director (BioDigital)<br />
Frank cofounded BioDigital on the premise that<br />
advancements in 3D and information technology will<br />
revolutionize the understanding of health and this vision<br />
continues to drive innovation. With extensive experience<br />
in health informatics, Frank has consulted for numerous<br />
prestigious medical institutions. Most notably, Frank led<br />
the development of the Caisis cancer data management<br />
project which is used globally by leading cancer<br />
hospitals. Prior to cofounding BioDigital, Frank worked<br />
at Honeywell, and later as a consultant to major<br />
organizations such as the Bank of New York, Pfizer and<br />
the Pennsylvania Treasury Department. Frank received<br />
his MS in Engineering from Columbia University.<br />
h Session(s): S2001 – Emerging Companies Summit:<br />
CEO on Stage Featuring Unity Technologies,<br />
Numecent, and BioDigital<br />
(Wednesday, 10:00, Marriott Ballroom 4)<br />
Ani Anciaux Sedrakian<br />
(IFP Energie Nouvelles)<br />
Biography unavailable at press time.<br />
h Session(s): S0108 – An Innovative Massively<br />
Parallelized Molecular Dynamic Software<br />
(Tuesday, 16:00, Room: C)<br />
Mark Seligman<br />
Senior Scientist (Insilicos LLC)<br />
Mark was a compiler developer for supercomputer<br />
vendors for many years. In recent years, he became<br />
more interested in the interplay of algorithms with<br />
hardware and now prefers to work directly with other<br />
researchers. His original training was in pure math, but<br />
nowadays he tends to focus on bioinformatics,<br />
computational statistics and optimization.<br />
h Session(s): S0337 - High-Throughput Epistasis<br />
Screening Using <strong>GPU</strong>s (Tuesday, 09:00, Room: K)<br />
Matthew Sellitto<br />
(Northeastern University)<br />
Biography unavailable at press time.<br />
h Session(s): S0290 – Algorithm Acceleration<br />
for Geospatial Analysis<br />
(Thursday, 09:30, Marriott Ballroom 3)<br />
Partha Sen<br />
CEO (Fuzzy Logix)<br />
Partha Sen is the Co-founder and CEO of Fuzzy Logix.<br />
He has a passion for solving complex business problems<br />
using quantitative methods, data mining and pattern<br />
recognition. Since 1995, Partha has pursued this passion<br />
and has developed numerous high-performance<br />
quantitative algorithms. Today, these algorithms and<br />
models are the basis for the products being brought to<br />
market by Fuzzy Logix. Before founding Fuzzy Logix,<br />
Partha worked at Bank of America where he held senior<br />
management positions in the commercial and<br />
investment bank and in the portfolio strategies group.<br />
Previously Partha held managerial positions at Ernst<br />
and Young and Tata. He has an Engineering degree from<br />
the Indian Institute of <strong>Technology</strong> and an MBA from<br />
Wake Forest University.<br />
h Session(s): S0427 - Intra-Day Risk-Management<br />
with Parallelized Algorithms on <strong>GPU</strong>s<br />
(Tuesday, 17:00, Room: L)<br />
Neil Sequeira<br />
Managing Director (General Catalyst Partners)<br />
As a Managing Director of General Catalyst Partners,<br />
Neil invests in both new and existing technology<br />
businesses. His areas of special interest include:<br />
Internet and new media; software; consumer services;<br />
and network infrastructure. He is based in the firm’s Palo Alto<br />
office. Before joining General Catalyst Partners, Neil<br />
held positions at Time Warner, where he was most<br />
recently Managing Director, <strong>Technology</strong> for Time Warner<br />
Investments (formerly AOL Time Warner Ventures), the<br />
early-stage private investment vehicle for the world’s<br />
largest media company. During his four years at Time<br />
Warner, Neil worked closely with various operating<br />
groups including AOL, HBO, Time Inc., Time Warner<br />
Cable, Turner and Warner Brothers to identify<br />
investment opportunities. Neil sourced, led and was a<br />
board director or observer for several of the companies<br />
within the Time Warner Investments portfolio including:<br />
Arroyo Video Solutions (CSCO), BigBand Networks<br />
(BBND), Entropic (ENTR), Goldpocket Interactive (ERIC),<br />
N2Broadband (ERIC) and Waterfront Media.<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Fyodor Serzhenko<br />
CEO (Fastvideo)<br />
Fyodor Serzhenko is CEO of Fastvideo. His<br />
research interests include high speed cameras,<br />
software for high speed imaging, and high performance<br />
computing. He graduated from the Moscow Institute of<br />
Physics and <strong>Technology</strong> in 1989 and received his PhD in<br />
semiconductor physics in 1993.<br />
h Session(s): S0273 - Fast JPEG Coding on the <strong>GPU</strong><br />
(Wednesday, 16:00, Room: A1)<br />
Christopher Sewell<br />
(Los Alamos National Laboratory)<br />
Biography unavailable at press time.<br />
h Session(s): S0706 - Los Alamos AHPC Symposium,<br />
PISTON: Portability and Performance for Data-<br />
Parallel Visualization and Analysis Operators<br />
(Wednesday, 17:30, Room: J2)<br />
h S0707 - Los Alamos AHPC Symposium, Accelerated<br />
HPC Symposium: Scalability: Hardware and<br />
Software (Thursday, 09:00, Room: J2)<br />
Peter Shenkin<br />
Vice President (Schrödinger)<br />
Peter S. Shenkin, Vice President, joined Schrödinger in<br />
1999. Previously, he was the lead developer of the<br />
MacroModel molecular-modeling package at Columbia<br />
University. He received his Ph.D. in Chemistry from<br />
Princeton University in 1979. After working for Owens-<br />
Corning Fiberglass Corporation for four years, he taught<br />
and carried out research at Columbia University and<br />
Barnard College prior to joining the MacroModel group<br />
in 1992. He has published in the areas of biosequence<br />
diversity analysis, protein structure determination,<br />
implicit solvation models for molecular mechanics and<br />
fast methods for determining solvent-accessible surface<br />
areas for atoms in molecules.<br />
h Session(s): S0121 - Software Architecture to<br />
Facilitate CUDA Development<br />
(Wednesday, 16:30, Room: N)<br />
Gideon Shmuel<br />
CEO (eyeSight Mobile Technologies, Ltd.)<br />
Gideon joined eyeSight with 20 years of experience in<br />
the telecom and enterprise software markets.<br />
Gideon has been involved in growing technology<br />
organizations and in establishing and running the business<br />
activities and operations of several companies across<br />
international markets. Most recently, Gideon served as<br />
VP Sales at cVidya Networks. Before that, he held<br />
executive roles in several countries at Olista,<br />
Top Image Systems, LCR Telecom<br />
and Esprit Telecom.<br />
h Session(s): S2002 - Emerging Companies Summit:<br />
CEO on Stage Featuring eyeSight Mobile,<br />
Numira Biosciences, and Ubitus<br />
(Wednesday, 11:00, Marriott Ballroom 4)<br />
Mark Silberstein<br />
Post-doctoral Researcher (UT Austin)<br />
Mark Silberstein is a Post-doctoral fellow at the<br />
University of Texas at Austin, with Prof. Emmett Witchel.<br />
He earned his PhD from the Technion, Israel Institute of<br />
<strong>Technology</strong>. His current research focuses on improving<br />
the integration of <strong>GPU</strong>s with the Operating Systems, as<br />
well as optimized execution of hybrid applications<br />
involving both <strong>GPU</strong>s and CPUs. He can be reached at<br />
marks@cs.utexas.edu.<br />
h Session(s): S0360 - Set <strong>GPU</strong>s Free: Integrating a<br />
File System with CUDA <strong>Program</strong>s<br />
(Thursday, 09:30, Hall 1)<br />
Chris Slaughter<br />
President (University of Texas Perception, Lynx Labs)<br />
Chris Slaughter is the President of Lynx Laboratories<br />
and a member of the Perception Laboratory at the<br />
University of Texas at Austin. Along with a team of<br />
engineers and researchers, he investigates theoretical<br />
problems in Computer Vision with an emphasis on high<br />
performance. His current research direction is focused<br />
on compressive motion analysis, real-time data<br />
clustering, and statistical localization on large maps. As<br />
the President of Lynx Labs, he also oversees the<br />
development of high performance algorithms for<br />
tracking, dense reconstruction, and SLAM as well as the<br />
commercialization of these technologies.<br />
h Session(s): S0607 - High Performance 3D<br />
Perception (Tuesday, 09:00, Room: A1)<br />
Peter-Pike Sloan<br />
Principal Research Scientist (NVIDIA)<br />
Peter-Pike Sloan recently moved to NVIDIA Research.<br />
Prior to that he was part of a research group for Disney<br />
Interactive Studios and also spent nearly 10 years at<br />
Microsoft, where he worked in the graphics research<br />
group, DirectX and on the many-core incubation team.<br />
He is interested in all areas of computer graphics,<br />
particularly interactive rendering techniques.<br />
h Session(s): S0611 - Edge-Aware Shaders<br />
for Real-Time Computer Graphics<br />
(Tuesday, 15:00, Room: B)<br />
Berend Smit<br />
(UC Berkeley/Berkeley Lab)<br />
Biography unavailable at press time.<br />
h Session(s): S0122 – Computational Screening<br />
of Novel Carbon Capture Materials<br />
(Thursday, 10:30, Marriott Ballroom 4)<br />
Roman Sokolov<br />
Director of System Architecture (D4D Technologies)<br />
Roman Sokolov received his Ph.D. in Physics from UCSD in<br />
2005. He has been working at D4D Technologies since 2007<br />
as a software engineer. His main interests include applied<br />
mathematics, numerical methods and image processing.<br />
h Session(s): S0079 – Warped Parallel Nearest<br />
Neighbor Searches using KD-Trees<br />
(Thursday, 10:30, Room: A2)<br />
Prakalp Somawanshi<br />
(CRL India)<br />
Biography unavailable at press time.<br />
h Session(s): S0107 – Acceleration of Long-Wave<br />
Rapid Radiative Transfer Model on GP<strong>GPU</strong><br />
(Thursday, 10:30, Room: N)<br />
Paulo Souza<br />
HPC Consultant / Software Engineer (Petrobras)<br />
Paulo Souza has spent 9+ years working with E&P<br />
production geophysics software, seismic imaging on<br />
HPC clusters, RTM, One Way Wave Equation, Kirchhoff,<br />
multiple architecture optimization (GP<strong>GPU</strong>, x86, Power,<br />
Cell) and cluster deployment. He has been working with<br />
GP<strong>GPU</strong> since 2006 porting seismic imaging applications<br />
to CUDA, with gains of up to 10X in performance/price and<br />
performance/watt over a traditional multi-million-dollar<br />
x86 cluster.<br />
h Session(s): S0628 - Panel Session: Learn from<br />
Experts in the Oil & Gas Industry<br />
(Wednesday, 16:30, Room: A7)<br />
Dale Southard<br />
Senior Solution Architect (NVIDIA)<br />
Dale Southard is a senior solution architect with NVIDIA.<br />
Previously, he was a hardware architect in the LLNL systems<br />
group, where he designed visualization/post-processing solutions<br />
and was on call for capability systems.<br />
h Session(s): S0119 - Best Practices for Architecting<br />
and Managing High-Performance <strong>GPU</strong> Clusters<br />
(Thursday, 14:00, Room: K)<br />
Marco Sozzi<br />
Associate Professor (Physics Department of Pisa)<br />
Marco Sozzi is associate professor of physics at the<br />
University of Pisa, working in particle physics and<br />
focusing on discrete symmetry violations in Nature. His<br />
areas of interest include high-performance triggering<br />
and event selection, and he coordinates the Trigger and<br />
Data Acquisition project for the NA62 experiment in<br />
preparation at CERN, for which a pilot project using<br />
<strong>GPU</strong>s is foreseen.<br />
h Session(s): S0013 – <strong>GPU</strong>s for Fast Triggering in<br />
NA62 Experiment (Tuesday, 10:00, Room: J2)<br />
Kyle Spagnoli<br />
Research Engineer (EM Photonics)<br />
Kyle has been working in <strong>GPU</strong> accelerated algorithms and<br />
applications since the pre-CUDA era. At the University of<br />
Delaware, he received his Master’s degree in electrical<br />
engineering with a focus in parallel computing<br />
architectures. Since then, as a research engineer at EM<br />
Photonics, he has worked on a number of GP<strong>GPU</strong> projects<br />
including: accelerated physical optics simulations,<br />
computational fluid dynamics, biomedical processing,<br />
advanced image processing, and computational linear<br />
algebra. Currently, he is researching new algorithms and<br />
techniques for large scale sparse linear algebra solvers.<br />
h Session(s): S0307 – New Advances in <strong>GPU</strong> Linear<br />
Algebra (Wednesday, 14:00, Room: A3)<br />
Paolo Spallaccini<br />
System Engineer (Ericsson)<br />
Paolo Spallaccini works at Ericsson R&D Italy,<br />
Microwave Department, as a system engineer. His<br />
research interests lie in diverse digital signal processing<br />
areas, with a focus on source and channel coding, as well<br />
as in software engineering and in algorithm<br />
engineering, with a focus on parallel computing. His<br />
work experience ranges from joining and leading<br />
technical development groups for signal processing<br />
systems to pioneering long-term innovative<br />
and strategic projects for telecommunication backbone networks<br />
and mobile backhaul systems. He received a<br />
master’s degree in Electronic Engineering from the University<br />
of Perugia in 1999. He is an IEEE Member.<br />
h Session(s): S0255 - Telecom Systems Simulations<br />
Acceleration via CPU/<strong>GPU</strong> Co-Processing: Turbo<br />
Codes Case Study<br />
(Tuesday, 10:00, Marriott Ballroom 3)<br />
Pierre Spatz<br />
Head of Quantitative Research (Murex SAS)<br />
Pierre joined Murex in 1989 and holds a master’s<br />
degree in computer science and applied mathematics<br />
from ENSIMAG. After various leading positions in the<br />
Murex software development team, Pierre launched<br />
the Murex Analytics initiative in 2002.<br />
h Session(s): S0250 - From <strong>GPU</strong> Computing<br />
Toward Full HPC In Finance with <strong>GPU</strong>s<br />
(Wednesday, 10:00, Room: L)<br />
Filippo Spiga<br />
Computational Scientist (Irish Centre for High-End<br />
Computing)<br />
Filippo joined ICHEC in January 2011 as a<br />
Computational Scientist after six months at the IBM T.J.<br />
Watson Research Center as Research Engineer. His<br />
main interests include general GP-<strong>GPU</strong> programming,<br />
numerical algorithms for GP-<strong>GPU</strong>, development of<br />
mixed multi-core CPU and <strong>GPU</strong> code and scientific<br />
application porting. Inside ICHEC Filippo is directly<br />
involved in the GP-<strong>GPU</strong> porting of the PWSCF package<br />
(QUANTUM ESPRESSO suite), enabling the package for<br />
efficient, highly scalable serial and parallel<br />
calculations on large <strong>GPU</strong> clusters.<br />
h Session(s): S0220 - Enabling faster material<br />
science modeling using the accelerated Quantum<br />
ESPRESSO (Thursday, 16:30, Marriott Ballroom 4)<br />
Savitha Srinivasan<br />
Partner (IBM Venture Capital Group)<br />
Savitha Srinivasan is a Partner in IBM’s Venture Capital<br />
Group in Corporate Strategy where she develops<br />
strategic relationships with venture capitalists and their<br />
portfolio companies to leverage external innovation for<br />
mutual strategic advantage. She has over 20 years of<br />
experience at IBM in leadership roles addressing the<br />
strategic priorities of IBM’s Services businesses and<br />
leads the development of IBM’s Services venture<br />
ecosystem across each of the Global <strong>Technology</strong> Services<br />
business units (Strategic Outsourcing, Integrated<br />
<strong>Technology</strong> Services, Managed Business Process<br />
Services and Industry Analytics), from early identification<br />
of companies to fostering pilots, partnerships and M&A<br />
insights. She is currently engaged in driving IBM<br />
Watson’s content partnership strategy.<br />
h Session(s): Emerging Companies Summit<br />
(Wednesday all day, Marriott Ballroom 4)<br />
Timo Stich<br />
Developer <strong>Technology</strong> Engineer (NVIDIA)<br />
Timo Stich is a Developer <strong>Technology</strong> Engineer for<br />
NVIDIA Corporation. His focus is on image processing<br />
and general purpose compute applications of <strong>GPU</strong>s.<br />
Prior to joining NVIDIA he was research staff at the<br />
Graphics, Optics and Vision Group at the Max-Planck-<br />
Institute for Computer Science, Saarbruecken and the<br />
Computer Graphics Lab at Brunswick University. He<br />
received a diploma degree in Computer Science from<br />
Mannheim University, Germany and a Ph.D. degree from<br />
Brunswick University, Germany.<br />
h Session(s): S0052 - Fast High Quality Image and<br />
Video Background Removal with CUDA<br />
(Wednesday, 16:30, Room: A1)<br />
Chris Stiefeling<br />
(Oliver Wyman Financial Services)<br />
Chris has more than 15 years of experience in designing<br />
and implementing software solutions for the financial<br />
sector. He has in-depth knowledge of spreadsheet,<br />
database and automation technologies and has<br />
developed expertise in many different programming<br />
languages and technologies. He has significant<br />
experience in economic scenario generation as well as<br />
pricing and valuation of derivatives and insurance<br />
products using Monte Carlo simulation techniques. Chris<br />
has expertise in implementing HPC solutions, including<br />
large-scale cloud computing implementations,<br />
programming on general-purpose <strong>GPU</strong> cards and<br />
distributed computing frameworks such as Windows HPC.<br />
h Session(s): S0435 - Leveraging GP<strong>GPU</strong> <strong>Technology</strong><br />
for Valuation of Complex Insurance Products<br />
(Tuesday, 16:00, Room: L)<br />
CONFERENCE GUIDE SPEAKERS AND<br />
PANELISTS<br />
137
SPEAKERS AND<br />
PANELISTS<br />
John Stone<br />
Senior Research <strong>Program</strong>mer (University of Illinois at<br />
Urbana-Champaign)<br />
John Stone is a Senior Research <strong>Program</strong>mer in the<br />
Theoretical and Computational Biophysics Group, and<br />
Associate Director of the NVIDIA CUDA Center of<br />
Excellence at the University of Illinois. Stone is the lead<br />
developer of VMD, a high performance molecular<br />
visualization tool used by researchers all over the world.<br />
His research interests include molecular visualization,<br />
<strong>GPU</strong> computing, parallel processing, ray tracing, haptics,<br />
and virtual environments. Stone was named an<br />
NVIDIA CUDA Fellow in 2010. Stone provides consulting<br />
services for projects involving computer graphics and<br />
<strong>GPU</strong> computing.<br />
h Session(s): S0142 - VMD: High Performance<br />
Molecular Visualization and Analysis on <strong>GPU</strong>s<br />
(Wednesday, 14:00, Room: N)<br />
h S0709 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 2<br />
(Thursday, 14:00, Room: J1)<br />
Jeff Stuart<br />
PhD Student (UC Davis)<br />
Biography unavailable at press time.<br />
h Session(s): S0157 – A Study of Persistent Threads<br />
Style <strong>Program</strong>ming Model for <strong>GPU</strong> Computing<br />
(Thursday, 15:00, Room: B)<br />
Xiaobai Sun<br />
Professor (Duke University)<br />
Xiaobai Sun is a professor of computer science at Duke<br />
University. Her research interests and efforts focus on<br />
numerical algorithm design and analysis, especially on<br />
bridging and blending mathematical models and<br />
computer architectures for scientific simulation and<br />
signal processing.<br />
h Session(s): S0314 – Efficient k-Nearest<br />
Neighbor Search Algorithms on <strong>GPU</strong>s<br />
(Tuesday, 16:30, Room: C)<br />
Rajeev Surati<br />
President (Scalable Display Technologies)<br />
Biography unavailable at press time.<br />
h Session(s): S0355 - Seamless Scalable Displays<br />
using NVIDIA Warp + Intensity API<br />
(Wednesday, 10:30, Room: A1)<br />
Krishnan Suresh<br />
Associate Professor (University of Wisconsin)<br />
Krishnan Suresh is currently an Associate Professor in<br />
the Department of Mechanical Engineering, University of<br />
Wisconsin, Madison. He graduated in 1998 from Cornell<br />
with a Ph.D. in Mechanical Engineering. He later served<br />
as an Engineering Manager at Kulicke and Soffa<br />
Industries, Philadelphia, from 1998 through 2002. His<br />
research interests are in the representational and<br />
computational challenges underlying computational<br />
mechanics and biomechanics.<br />
h Session(s): S0070 - <strong>GPU</strong>-Friendly<br />
Preconditioners for Thin Structure Analysis<br />
(Wednesday, 16:30, Room: K)<br />
William Tang<br />
Director of Fusion Simulation <strong>Program</strong> at the Princeton<br />
Plasma Physics Laboratory (Princeton)<br />
William Tang is the Director of the Fusion Simulation<br />
<strong>Program</strong> at the Princeton Plasma Physics Laboratory<br />
(PPPL) and Lecturer with Rank & Title of Professor in<br />
the Department of Astrophysical Sciences at Princeton<br />
University. He is a Fellow of the American Physical<br />
Society and received the 2005 Chinese Institute of<br />
Engineers-USA (CIE-USA) Distinguished Achievement<br />
Award “for his outstanding leadership in fusion research<br />
and contributions to fundamentals of plasma science.”<br />
He is internationally recognized for his theoretical<br />
contributions as well as associated HPC applications<br />
dealing with electromagnetic kinetic plasma behavior in<br />
complex geometries. He has over 200 publications,<br />
including more than 140 peer-reviewed papers, with an<br />
h-index of 42 and over 5,400 total citations on the Web of<br />
Science. He is currently the U.S. PI for the G8 Exascale<br />
Project in Fusion Energy, an international HPC<br />
collaboration involving the US, UK, France, Germany,<br />
Japan, and Russia.<br />
h Session(s): S0654 - Fusion Energy Sciences &<br />
Computing at the Extreme Scale<br />
(Tuesday, 15:30, Room: A2)<br />
Sarah Tariq<br />
Software Engineer (NVIDIA)<br />
Sarah is a senior engineer in NVIDIA’s Developer<br />
<strong>Technology</strong> team focusing on High Performance <strong>GPU</strong><br />
Computing in the Life Sciences domain. As part of her job<br />
she works collaboratively with external developers to<br />
research and develop <strong>GPU</strong> computing algorithms and<br />
ensure the best performance of <strong>GPU</strong> computing<br />
applications on current and next-generation architectures.<br />
h Session(s): S0351 - Strong Scaling for Molecular<br />
Dynamics Applications (Tuesday, 14:30, Room: A1)<br />
Michela Taufer<br />
Assistant Professor (University of Delaware)<br />
Michela Taufer is an Assistant Professor in Computer<br />
and Information Sciences at the University of Delaware.<br />
She earned her MS in Computer Engineering from the<br />
University of Padova and her Ph.D. in Computer Science<br />
from ETH. She was a post-doc at UC San Diego and The<br />
Scripps Research Institute. Michela has a long history of<br />
interdisciplinary work with computational biophysics<br />
groups. Her research interests include software<br />
applications and their advanced programmability in<br />
heterogeneous computing (i.e., multi-core platforms and<br />
<strong>GPU</strong>s); cloud computing and volunteer computing; and<br />
performance analysis, modeling and optimization of<br />
multi-scale applications.<br />
h Session(s): S0207 - <strong>GPU</strong> Enabled Macromolecular<br />
Simulation: Challenges and Opportunities<br />
(Wednesday, 15:30, Room: N)<br />
Tetsuo Tawara<br />
Software Engineer (Koozyt)<br />
Tetsuo Tawara is currently a software engineer at Koozyt<br />
where he works on augmented reality and data mining<br />
projects. He received a master’s degree in Mechanical<br />
Engineering from Aoyama Gakuin University.<br />
h Session(s): S0231 - Levenberg-Marquardt using<br />
Block Sparse Matrices on CUDA<br />
(Thursday, 14:30, Marriott Ballroom 3)<br />
Andrei Tchouprakov<br />
Director of System Architecture (D4D Technologies)<br />
Andrei Tchouprakov is a Director of System Architecture<br />
at D4D Technologies where he is currently working on<br />
developing a 3D dental scanner. His background is in 3D<br />
data acquisition, point cloud processing, surface<br />
generation, image processing and parallel computing.<br />
He received his MS degree in Mathematics in 1998 from<br />
Irkutsk State University, Russia.<br />
h Session(s): S0079 - Warped Parallel Nearest<br />
Neighbor Searches using KD-Trees<br />
(Thursday, 10:30, Room: A2)
Tom-Michael Thamm<br />
Director, Software Product Management (NVIDIA ARC)<br />
Tom-Michael Thamm is the Director for Software<br />
Product Management at NVIDIA ARC and is responsible<br />
for all products, such as iray, mental ray and the<br />
geo-spatial library. He also manages direct customer<br />
support. Thamm has worked for mental images and<br />
NVIDIA ARC for over 20 years. He has led several key<br />
projects, such as the integration of mental ray into many<br />
of the major CAD systems. He studied mathematics<br />
and has developed various 3D file formats, such as<br />
extended OBJ, and free-form surface algorithms.<br />
h Session(s): S0507 - Interactive and Scalable<br />
Subsurface Data Visualization Framework<br />
(Wednesday, 16:00, Room: A7)<br />
Derek Thorslund<br />
Director of Product Management (Citrix Systems, Inc.)<br />
Derek Thorslund drives Citrix’s product strategy for HDX<br />
(high definition experience) multimedia virtualization<br />
technologies and leads the company’s HDX Product<br />
Management group across XenDesktop, XenApp,<br />
VDI-in-a-Box, Citrix Receiver and CloudGateway. Upon<br />
joining Citrix in 2003, he played a key role in introducing<br />
the Citrix Access Suite, forerunner to XenDesktop<br />
Platinum Edition. Thorslund has had an extensive career<br />
in the high-tech industry, including roles as Director of Product<br />
Management at Avotus and Manager of New Business<br />
Applications at Bell-Northern Research.<br />
h Session(s): S0413 - Delivering 3D Professional<br />
Graphics from the Cloud with Citrix XenDesktop<br />
(Tuesday, 15:00, Room: A5)<br />
Alexey Titov<br />
Engineering Research Associate (Stanford)<br />
Dr. Alexey Titov is an Engineering Research Associate in<br />
the Martinez Group at Stanford University. His research<br />
efforts are focused on exploring, implementing and<br />
optimizing computational chemistry algorithms for novel<br />
architectures. He is one of the developers of TeraChem,<br />
quantum chemistry software created from scratch for<br />
<strong>GPU</strong>s. Alexey Titov’s research interests also include<br />
parallel algorithms and applications of symbolic<br />
algebra systems to the optimization of performance-critical<br />
computational routines for novel architectures.<br />
h Session(s): S0429 - Quantum Chemistry: Automated<br />
Code Generation and Optimization for <strong>GPU</strong> Kernels<br />
(Thursday, 15:00, Marriott Ballroom 4)<br />
Stanimire Tomov<br />
Research Director (University of Tennessee, Knoxville)<br />
Biography unavailable at press time.<br />
h Session(s): S0248 – Excitements, Challenges,<br />
and Rewards In Optimizing GP<strong>GPU</strong> Kernels<br />
(Tuesday, 09:00, Marriott Ballroom 3)<br />
h S0042 – Solving Challenging Numerical Linear<br />
Algebra Algorithms using Multiple <strong>GPU</strong><br />
Accelerators (Wednesday, 15:00, Room: A3)<br />
Doug Traill<br />
Senior Solutions Architect (NVIDIA)<br />
Doug Traill is a Senior Solutions Architect at NVIDIA for<br />
scalable visualization solutions. He has over 15 years of<br />
experience in designing and building some of the world’s<br />
most complex visualization systems.<br />
h Session(s): S0341 - See the Big Picture Scalable<br />
Visualization Solutions for System Integrators<br />
(Monday, 10:30, Room: A2)<br />
Justin Tripp<br />
Technical Staff Member (Los Alamos National Laboratory)<br />
Dr. Justin L. Tripp is a Technical Staff Member on the<br />
Advanced Architectures team at Los Alamos National<br />
Laboratory. Dr. Tripp works on tools and methodologies<br />
for creating high-performance computing systems,<br />
which have been applied to systems from<br />
supercomputers to satellites and airborne video<br />
surveillance. Dr. Tripp received an R&D100 Award for his<br />
work on the Trident C-to-FPGA Compiler. Dr. Tripp<br />
received his PhD in Electrical Engineering from Brigham<br />
Young University in 2004 and has nineteen publications<br />
relating to FPGAs and high-performance computing, and<br />
more than 15 years of experience with FPGAs, high-performance<br />
computing, advanced architectures, and<br />
system-level design and analysis tools.<br />
h Session(s): S0702 - Los Alamos AHPC Symposium,<br />
The Architecture of Acceleration in HPC<br />
(Wednesday, 15:30, Room: J1)<br />
h S0707 - Los Alamos AHPC Symposium, Accelerated<br />
HPC Symposium: Scalability: Hardware and<br />
Software (Thursday, 09:00, Room: J2)<br />
Alejandro Troccoli<br />
Mobile Imaging Researcher (NVIDIA)<br />
Alejandro has been with NVIDIA since 2006 and joined<br />
NVIDIA Research in March 2011 to work in mobile<br />
computer vision and applications. As a 3D Systems<br />
Software Engineer, he led the development of NVIDIA’s<br />
Optimus technology, contributed to NVIDIA’s hybrid<br />
technology and did development work for the Direct3D<br />
driver. Alejandro received a Licenciatura en Ciencias de<br />
la Computacion from the Universidad de Buenos Aires,<br />
Argentina, in 2001. He did his graduate work at Columbia<br />
University in the City of New York, where he received a<br />
Ph.D. in 2006.<br />
h Session(s): S0526 - Tools for Mobile Computational<br />
Photography (Tuesday, 16:00, Room: N)<br />
Jeroen Tromp<br />
Director, Princeton Institute for Computational<br />
Science (Princeton)<br />
Seismologist Jeroen Tromp, Blair Professor of Geology,<br />
Professor of Applied & Computational Mathematics, and<br />
Director of the Princeton Institute for Computational<br />
Science joined the Princeton faculty in 2008. Tromp’s<br />
main research interests are in theoretical &<br />
computational seismology, including simulations of<br />
acoustic, (an)elastic, and poroelastic seismic wave<br />
propagation on local, regional and global scales. The<br />
current focus of his research involves imaging Earth’s<br />
interior based on spectral-element and adjoint methods.<br />
He received the Macelwane Medal of the American<br />
Geophysical Union in 1999 and a Gordon Bell Award in<br />
2003. He is a corresponding member of the Royal<br />
Netherlands Academy of Sciences.<br />
h Session(s): S0608 - Toward Global Seismic Imaging<br />
based on Spectral-Element and Adjoint Methods<br />
(Tuesday, 17:00, Room: A2)<br />
Thomas True<br />
Applied Engineer (NVIDIA)<br />
Tom is a Senior Applied Engineer in NVIDIA’s<br />
Professional Solutions Group where he focuses on the<br />
use of <strong>GPU</strong>s in broadcast, video and film applications<br />
ranging from pre-visualization to post production and<br />
live to air. Prior to joining NVIDIA, Tom was an<br />
Applications Engineer at SGI. He holds an M.S. degree<br />
in Computer Science from the Graphics Lab at Brown<br />
University and a B.S. degree from the Rochester<br />
Institute of <strong>Technology</strong>.<br />
h Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />
Round Table (Monday, 14:30, Room: A2)<br />
h S0328 - Best Practices in <strong>GPU</strong>-Based Video<br />
Processing (Tuesday, 14:00, Room: J2)<br />
h S0049 - Using the <strong>GPU</strong> Direct for Video API<br />
(Tuesday, 15:00, Room: J2)<br />
Hoang-Tron Minh Tuan<br />
PhD Student (George Mason University)<br />
Tuan is currently a PhD student in the School of System<br />
Biology at George Mason University. His research has<br />
focused on calcium dynamics, cardiac cell<br />
modeling, and high-performance computing. Currently,<br />
he is developing a computational model of<br />
cardiac cells at the microscale using <strong>GPU</strong> technology<br />
to study the underlying mechanisms of calcium-entrained<br />
arrhythmias.<br />
h Session(s): S0072 – <strong>GPU</strong>-Enabled Spatiotemporal<br />
Model of Stochastic Cardiac Calcium Dynamics and<br />
Arrhythmias (Wednesday, 09:00, Room: B)<br />
Antonino Tumeo<br />
Research Scientist (Pacific Northwest National<br />
Laboratory)<br />
Dr. Antonino Tumeo received the M.S. degree in<br />
Informatic Engineering in 2005 and the Ph.D. degree in<br />
Computer Engineering in 2009 from Politecnico di<br />
Milano in Italy. Since February 2011, he has been a<br />
research scientist in PNNL’s High Performance<br />
Computing group. He joined PNNL in 2009 as a<br />
postdoctoral research associate. Previously, he was a<br />
postdoctoral researcher at Politecnico di Milano. His<br />
research interests are modeling and simulation of high<br />
performance architectures, hardware-software<br />
codesign, FPGA prototyping and GP<strong>GPU</strong> computing.<br />
h Session(s): S0343 - A Quantum Chemistry<br />
Domain-Specific Language For Heterogeneous<br />
Clusters (Tuesday, 10:00, Room: L)<br />
Stanley Tzeng<br />
Graduate Student (University of California, Davis)<br />
Stanley Tzeng is a graduate student at the University of<br />
California, Davis. His main research is on task-parallel<br />
systems on the <strong>GPU</strong> and their applications.<br />
h Session(s): S0138 - <strong>GPU</strong> Task-Parallelism:<br />
Primitives and Applications<br />
(Thursday, 15:30, Marriott Ballroom 3)<br />
h S0709 - Los Alamos AHPC Symposium,<br />
Accelerated HPC Symposium: Applications -<br />
Methods and <strong>Program</strong>ming Models, Part 2<br />
(Thursday, 14:00, Room: J1)<br />
Ivan Ufimtsev<br />
Postdoc (Stanford)<br />
Biography unavailable at press time.<br />
h Session(s): S0429 – Quantum Chemistry: Automated<br />
Code Generation and Optimization for <strong>GPU</strong> Kernels<br />
(Thursday, 15:00, Marriott Ballroom 4)<br />
Stefan Umbreit<br />
Postdoctoral Associate (Northwestern University)<br />
Biography unavailable at press time.<br />
h Session(s): S0087 – <strong>GPU</strong> Acceleration of<br />
Dense Stellar Clusters Simulation<br />
(Thursday, 15:00, Room: M)<br />
Vamsi Krishna Veligatla<br />
<strong>GPU</strong> <strong>Program</strong>mer (University of Groningen)<br />
Vamsi Krishna Veligatla received his Masters in<br />
Computer Science (IIIT Hyderabad 2006) and BTech in<br />
Computer Science (IIIT Hyderabad 2004). He has worked<br />
as a Software Developer at NVIDIA (Pune, India) and<br />
later at AMD (Hyderabad, India), and most recently<br />
as a <strong>GPU</strong> <strong>Program</strong>mer at the Kapteyn<br />
Astronomical Institute, University of Groningen<br />
(Groningen, The Netherlands).<br />
h Session(s): S0187 - <strong>GPU</strong>s for Radio Imaging<br />
(Thursday, 14:00, Room: M)<br />
Shalini Venkataraman<br />
Senior Applied Engineer (NVIDIA)<br />
Shalini Venkataraman is a Senior Applied Engineer<br />
at NVIDIA.<br />
h Session(s): S0530 - Multi-Display Roundtable<br />
(Monday, 13:00, Room: A2)<br />
h S0356 - Optimized Texture Transfers<br />
(Tuesday, 16:00, Room: J2)<br />
h S0353 - <strong>Program</strong>ming Multi-<strong>GPU</strong>s for Scalable<br />
Rendering (Wednesday, 09:00, Room: A1)<br />
h S0322 - Warping & Blending for Multi-Display<br />
Systems (Wednesday, 10:00, Room: A1)<br />
h S0326 - Next Generation InfoWall<br />
(Thursday, 09:00, Room: A1)<br />
Shivaram Venkataraman<br />
PhD Student (UC Berkeley)<br />
Shivaram Venkataraman is a PhD student at the<br />
University of California, Berkeley and is part of the<br />
AMP Lab. He completed his M.S. at the University of<br />
Illinois in 2011 and his B.E. at the Birla Institute of<br />
<strong>Technology</strong> and Science, Pilani, India. His research<br />
interests are in the design of storage systems and analytics<br />
platforms for big-data applications.<br />
h Session(s): S0152 – Accurate Sequence Alignment<br />
using Distributed Filtering on <strong>GPU</strong> Clusters<br />
(Tuesday, 15:30, Room: K)<br />
Vyas Venkataraman<br />
Software Engineer (NVIDIA)<br />
Vyas Venkataraman is a software engineer in the CUDA<br />
developer tools group at NVIDIA. He is primarily<br />
responsible for CUDA-MEMCHECK, and contributes to<br />
the CUDA Driver and backend code shared by clients of<br />
the debug API. He joined NVIDIA in 2010 from Boston<br />
University where he was doing research on abstractions<br />
for high-level modeling of synthesizable communicating<br />
systems. Vyas received his Doctor of Philosophy from the<br />
College of Engineering at Boston University.<br />
h Session(s): S0027A – All-In-One Debugging<br />
Experience with CUDA-GDB and CUDA-MEMCHECK<br />
(Monday, 14:30, Room: A5)<br />
h S0027B – All-In-One Debugging Experience<br />
with CUDA-GDB and CUDA-MEMCHECK<br />
(Wednesday, 14:00, Room: C)<br />
Jeff Vetter<br />
(Oak Ridge National Laboratory)<br />
Biography unavailable at press time.<br />
h Session(s): S0531 - Exascaling Your Apps<br />
(Wednesday, 09:00, Room: C)<br />
Oreste Villa<br />
Research Scientist (Pacific Northwest National Laboratory)<br />
Biography unavailable at press time.<br />
h Session(s): S0343 – A Quantum Chemistry<br />
Domain-Specific Language For Heterogeneous<br />
Clusters (Tuesday, 10:00, Room: L)<br />
Will Wade<br />
Manager, Quadro Advanced Technologies (NVIDIA)<br />
Will Wade manages the Quadro Advanced Technologies<br />
Team at NVIDIA, responsible for some of the most<br />
demanding visual computing solutions on the planet.<br />
This team creates technologies for virtual reality caves,<br />
3D stereoscopic professional visualization, real-time<br />
broadcast graphics, and remote and virtualized<br />
interactive graphics. Will has been a leader in the field<br />
for over 15 years, with work at both NVIDIA and HP.<br />
h Session(s): S0254 - Graphics in the Cloud -<br />
How NVIDIA is Enabling Cloud Visualization<br />
(Tuesday, 14:00, Room: A5)
Kelly Walker<br />
Senior Software Developer (Hue)<br />
Biography unavailable at press time.<br />
h Session(s): S0436 - Integrated <strong>GPU</strong> Acceleration<br />
With Real Time Visualization Of Terabyte Data<br />
(Tuesday, 15:00, Room: A7)<br />
Ross Walker<br />
Assistant Professor (University of California San Diego)<br />
Ross Walker is an Assistant Research Professor at the<br />
San Diego Supercomputer Center, an Adjunct Assistant<br />
Professor in the Department of Chemistry and<br />
Biochemistry at the University of California, San Diego<br />
and an NVIDIA Fellow. He runs the Walker Molecular<br />
Dynamics Lab where he leads a team developing<br />
advanced techniques for Molecular Dynamics Simulations<br />
in support of improved drug and biocatalyst design.<br />
His work includes improved Quantum Mechanical/<br />
Molecular Mechanical models, development of force<br />
fields for simulation of lipid membranes, simulations of<br />
cellulase enzymes for improved cellulosic bioethanol<br />
production and the development of <strong>GPU</strong> accelerated<br />
versions of the AMBER Molecular Dynamics engine.<br />
h Session(s): S0010 - Towards Routine Microsecond<br />
Molecular Dynamics Simulations on Commodity<br />
Hardware (Wednesday, 09:00, Room: N)<br />
Jason Walsh<br />
(University of Pennsylvania 3D Lab)<br />
Biography unavailable at press time.<br />
h Session(s): S0303 – <strong>GPU</strong> Acceleration for<br />
Threshold Based Region Growth Algorithms<br />
(Thursday, 09:00, Room: C)<br />
BingQiang Wang<br />
Head of High Performance Computing (BGI)<br />
BingQiang Wang completed his doctorate in<br />
computational chemistry at East China University of<br />
Science and <strong>Technology</strong> (ECUST) in 2006. From March<br />
2005, he was a research scientist at the Shanghai<br />
Supercomputer Center, dedicated to enabling high performance<br />
computing in computational chemistry and life<br />
science research. In March 2010 he joined BGI as group<br />
head of high performance computing, to develop<br />
solutions for challenging life science problems.<br />
h Session(s): S0519 - <strong>GPU</strong> Accelerated<br />
Bioinformatics Research at BGI<br />
(Tuesday, 14:00, Room: K)<br />
h S0109 - SOAP3: <strong>GPU</strong>-based Compressed Indexing<br />
and Ultra-fast Parallel Alignment of Short Reads<br />
(Wednesday, 16:00, Room: B)<br />
Gaofeng Wang<br />
Postdoc Fellow (Laboratoire EM2C, Ecole Centrale Paris)<br />
Dr. Gaofeng Wang is a postdoctoral fellow at Laboratoire EM2C,<br />
CNRS UPR288, Ecole Centrale Paris. His research<br />
interests are in the area of turbulent combustion modeling<br />
and high-fidelity CFD.<br />
h Session(s): S0129 - A Monte Carlo Thermal<br />
Radiation Solver in <strong>GPU</strong>/CPU Hybrid Architecture<br />
(Thursday, 09:00, Room: A8)<br />
Long Wang<br />
Associate Professor (Supercomputing Center of CNIC,<br />
Chinese Academy of Sciences)<br />
Biography unavailable at press time.<br />
h Session(s): S0392 – Large-Scale First Principle<br />
Pseudopotential DFT Calculations on <strong>GPU</strong> Clusters<br />
(Thursday, 15:30, Marriott Ballroom 4)<br />
Peng Wang<br />
Devtech Engineer (NVIDIA)<br />
Peng Wang is currently the manager of HPC developer<br />
technology in NVIDIA China, where he works with HPC<br />
developers in porting and optimizing HPC codes on <strong>GPU</strong>s.<br />
Previously, he worked at NVIDIA US as an HPC developer<br />
technology engineer, where he mainly worked on CAE<br />
solvers on <strong>GPU</strong>s and molecular dynamics. He received a Ph.D.<br />
in computational physics from Stanford, where he<br />
worked on developing a massively parallel adaptive mesh<br />
fluid simulation code and applying it to astrophysical<br />
turbulence simulations. He also received an MS in Physics and a<br />
BS in Scientific Computing from Nankai University.<br />
h Session(s): S0245 - Porting Legacy Plasma Codes<br />
to <strong>GPU</strong> (Tuesday, 16:00, Room: A8)<br />
David Weinstein<br />
CTO (Numira Biosciences)<br />
Dr. David Weinstein is the Chief <strong>Technology</strong> Officer and<br />
Senior Director of Salt Lake Operations for Numira<br />
Biosciences. As a PhD student at the University of Utah<br />
in the early 90’s, David was a founding member of the<br />
Scientific Computing and Imaging (SCI) Institute. In 2004,<br />
he co-founded Visual Influence (VI), a SCI startup<br />
focused on custom visualization and analysis software<br />
for the medical imaging industry. In 2007, VI was<br />
acquired by Numira Biosciences, where David and his<br />
team now develop high-throughput processing and<br />
cloud-based interactive visual analysis tools for<br />
preclinical imaging. David has co-authored over 40<br />
peer-reviewed scientific publications.<br />
h Session(s): S2002 – Emerging Companies Summit:<br />
CEO on Stage Featuring eyeSight Mobile,<br />
Numira Biosciences, and Ubitus<br />
(Wednesday, 11:00, Marriott Ballroom 4)<br />
Jack Wells, Ph.D.<br />
Director of Science, Oak Ridge Leadership Computing<br />
Facility (Oak Ridge National Laboratory)<br />
Jack Wells is the director of science for the National<br />
Center for Computational Sciences (NCCS) at Oak Ridge<br />
National Laboratory (ORNL). He is responsible for<br />
devising a strategy to ensure cost-effective, state-of-the-art<br />
scientific computing at the NCCS, which houses the<br />
Department of Energy’s Oak Ridge Leadership<br />
Computing Facility (OLCF). In ORNL’s Computing and<br />
Computational Sciences Directorate, Wells has worked<br />
as group leader of both the Computational Materials<br />
Sciences group in the Computer Science and<br />
Mathematics Division and the Nanomaterials Theory<br />
Institute in the Center for Nanophase Materials<br />
Sciences. During a sabbatical, he served as a legislative<br />
fellow for Senator Lamar Alexander, providing<br />
information about high-performance computing, energy<br />
technology, and science, technology, engineering, and<br />
mathematics education issues. Wells began his ORNL<br />
career in 1990 with resident research toward his Ph.D. in<br />
Physics from Vanderbilt University. Following a<br />
three-year postdoctoral fellowship at Harvard University,<br />
he returned to ORNL as a staff scientist in 1997 as a<br />
Wigner postdoctoral fellow. Jack is an accomplished<br />
practitioner of computational physics and has been<br />
supported by the Department of Energy’s Office of Basic<br />
Energy Sciences. Jack has authored or co-authored over<br />
70 scientific papers and edited one book, spanning<br />
nanoscience, materials science and engineering,<br />
nuclear and atomic physics, computational science, and<br />
applied mathematics.<br />
h Session(s): S0606 - <strong>GPU</strong>-accelerated Science on<br />
Titan: Tapping into the World’s Preeminent <strong>GPU</strong><br />
Supercomputer to Achieve Better Science<br />
(Tuesday, 14:00, Room: A2)<br />
h S0657 - Applying for INCITE <strong>Program</strong>, Conclusions,<br />
Q&A (Tuesday, 17:30, Room: A2)<br />
Elmar Westphal<br />
Software Developer (Forschungszentrum Juelich)<br />
Elmar Westphal has been working at Forschungszentrum<br />
Juelich for 15 years in the group that is now PGI/JCNS-TA<br />
Scientific IT-Systems. His main tasks include planning the<br />
institute’s compute clusters and writing/porting scientific<br />
software for multi-core and <strong>GPU</strong> environments. His latest<br />
projects include the CUDA-port of the micromagnetic<br />
simulation software TetraMag and the creation of a<br />
framework of accelerator routines for <strong>GPU</strong>-assisted<br />
molecular dynamics simulations.<br />
h Session(s): S0036 - Multiparticle Collision<br />
Dynamics on <strong>GPU</strong>s (Tuesday, 15:00, Room: C)<br />
Jan-Philipp Weiss<br />
Junior Professor (Karlsruhe Institute of <strong>Technology</strong>)<br />
Jan-Philipp Weiss is a junior professor at the Karlsruhe<br />
Institute of <strong>Technology</strong> (KIT), Germany. He heads the<br />
Computing Lab Hardware-Aware Numerics at the<br />
Engineering Mathematics and Computing Labs (EMCL).<br />
From 2008 to <strong>2012</strong> he headed a Shared Research<br />
Group on multicore and coprocessor technologies at KIT in<br />
collaboration with Hewlett-Packard.<br />
His group’s research addresses parallel numerical<br />
methods and programming techniques for emerging<br />
multi- and manycore technologies in numerical simulation<br />
and scientific computing. He received a Ph.D. in applied<br />
mathematics from the University of Karlsruhe (TH) in 2006.<br />
h Session(s): S0289 – Fine-Grained Parallel<br />
Preconditioners for Fast <strong>GPU</strong>-based Solvers<br />
(Wednesday, 09:00, Marriott Ballroom 3)<br />
h S0291 – LAtoolbox: A Multi-platform Sparse<br />
Linear Algebra Toolbox<br />
(Thursday, 10:30, Marriott Ballroom 3)<br />
Ian Williams<br />
Director of Applied Engineering (NVIDIA)<br />
Ian Williams is currently Director of Applied Engineering<br />
within NVIDIA’s Professional Solutions Group. Within the<br />
Applied Engineering team he has been closely involved<br />
in the design and development of many of NVIDIA’s<br />
industry-focused professional solutions and key<br />
technologies. In addition, the Applied Engineering team<br />
helps customers and partners integrate these<br />
technologies into their solutions. Prior to NVIDIA he<br />
worked for 8 years at Silicon Graphics in various<br />
technical roles within Application Engineering and the<br />
Desktop Product Group. Prior to Silicon Graphics, he<br />
worked at Rolls Royce Commercial Aerospace<br />
developing applications to numerically simulate<br />
manufacturing processes. He holds a Bachelor of<br />
Science degree in Engineering Science and <strong>Technology</strong><br />
from Loughborough University (UK) as well as a Masters<br />
of Business Administration from Pepperdine University<br />
(CA, USA). He is a Chartered Mechanical Engineer with<br />
the Institute of Mechanical Engineers (UK) and<br />
throughout his career has been awarded several<br />
patents. For the past 10 years he has been Chairman<br />
SPEC/GPC committee which is part of the Standard<br />
Performance Evaluation Corporation and responsible for<br />
developing the industry wide SPECViewperf benchmark.<br />
h Session(s): S0530 - Multi-Display Roundtable<br />
(Monday, 13:00, Room: A2)<br />
h S0601 - <strong>GPU</strong>-Based Video Processing<br />
Round Table (Monday, 14:30, Room: A2)<br />
h S0326 - Next Generation InfoWall<br />
(Thursday, 09:00, Room: A1)<br />
Robert Wipfel<br />
Fellow (Fusion-io)<br />
Robert Wipfel is a Fellow at Fusion-io. Previously, at<br />
Novell, Robert was an architect or engineering lead for<br />
various data center products that integrated clustering,<br />
virtualization, and shared storage. Robert also helped<br />
Unisys and Intel jointly enter the commercial parallel<br />
processing market. Robert is co-author of Novell’s <strong>Guide</strong><br />
to Storage Area Networks and Novell Cluster Services<br />
and frequently speaks at Novell’s BrainShare and other<br />
technology conferences. Robert earned a BSc (Hons) in<br />
Computer Systems Engineering from the University of<br />
Kent at Canterbury, U.K. He holds ten patents in parallel<br />
processing, clustering, and server and storage virtualization.<br />
h Session(s): S0619 – Hate to Wait? Flash Memory<br />
for Full-Throttle <strong>GPU</strong> Acceleration<br />
(Thursday, 09:00, Room: L)<br />
Emmet Witchel<br />
(University of Texas, Austin)<br />
Biography unavailable at press time.<br />
h Session(s): S0360 – Set <strong>GPU</strong>s Free: Integrating<br />
a File System with CUDA <strong>Program</strong>s<br />
(Thursday, 09:30, Hall 1)<br />
Nils Woetzel<br />
PhD Candidate (Vanderbilt University)<br />
Nils Woetzel, a native German, was exposed to the BASIC<br />
programming language in second grade. In his<br />
senior year of high school, he wrote “TitraCom,” a Delphi<br />
program that aided in chemical analysis experiments,<br />
and entered it in the German “Jugend forscht”<br />
high school science competition in 2001. After studying<br />
chemistry at the University of Leipzig, Germany, he<br />
started his PhD in computational structural biology at<br />
Vanderbilt University in Nashville in 2005, where he<br />
combined his computational and chemical skills to<br />
develop a novel protein structure prediction algorithm.<br />
h Session(s): S0346 – GP<strong>GPU</strong> Accelerated Protein<br />
Similarity Measures Identifying Biological<br />
Relevant Structure (Wednesday, 17:30, Room: N)<br />
h S0354 – Bcl::ChemInfo Suite Enables Machine<br />
Learning-Based Drug Discovery Using <strong>GPU</strong>s<br />
(Thursday, 09:30, Marriott Ballroom 4)<br />
Tim Wood<br />
Quantitative Analyst (ING Bank nv)<br />
Tim Wood is a Quantitative Analyst and Developer at ING<br />
Bank in the Netherlands. Tim joined ING after studying<br />
Computational Science and Computational Finance at<br />
the University of Amsterdam. Since joining ING in 2009,<br />
Tim has played a key role in the development and<br />
deployment of computationally demanding risk analytics<br />
systems leveraging massively parallel architectures<br />
within the bank.<br />
h Session(s): S0369 - Running Risk On <strong>GPU</strong>s<br />
(Wednesday, 14:00, Room: L)<br />
Cliff Woolley<br />
CUDA Developer <strong>Technology</strong> Engineer (NVIDIA)<br />
Cliff Woolley is a CUDA Developer <strong>Technology</strong> Engineer<br />
with NVIDIA Corporation. He received his Master’s degree<br />
in Computer Science from the University of Virginia in<br />
2003. He was among the earliest academic researchers to<br />
investigate the use of graphics processors for general-purpose<br />
computation, having applied these early GP<strong>GPU</strong><br />
ideas both to non-traditional graphics rendering<br />
techniques and to non-graphical algorithms such<br />
as a multigrid solver for PDEs.<br />
h Session(s): S0517A - <strong>Program</strong>ming <strong>GPU</strong>s with<br />
OpenACC (Part 1 of 3) (Monday, 10:30, Room: B)<br />
h S0517B - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />
2 of 3) (Monday, 13:00, Room: B)<br />
h S0517C - <strong>Program</strong>ming <strong>GPU</strong>s with OpenACC (Part<br />
3 of 3) (Monday, 14:30, Room: B)<br />
h S0377 - C++ Data Marshalling Best Practices<br />
(Wednesday, 16:30, Room: L)
Rio Yokota<br />
Research Scientist (King Abdullah University of Science<br />
and <strong>Technology</strong>)<br />
Rio Yokota obtained his PhD in Mechanical Engineering<br />
from Keio University, Japan, in 2009, and was a<br />
postdoctoral researcher in the Department of<br />
Mathematics at the University of Bristol from 2009 to 2010,<br />
and in the Mechanical Engineering Department at<br />
Boston University from 2010 to 2011. During his PhD, he<br />
worked on the implementation of fast multipole methods<br />
on special-purpose machines such as MDGRAPE-3, and<br />
then on <strong>GPU</strong>s after CUDA was released. During his<br />
postdoc he continued to work on fast multipole<br />
methods, and was part of the team that won the Gordon<br />
Bell Prize for price/performance in 2009 using 760 <strong>GPU</strong>s.<br />
h Session(s): S0308 - Recent Trends in<br />
Hierarchical N-body Methods on <strong>GPU</strong>s<br />
(Tuesday, 15:00, Marriott Ballroom 3)<br />
Eric Young<br />
Manager of Developer <strong>Technology</strong>, Professional and<br />
Consumer Applications (NVIDIA)<br />
Eric Young is a developer technology engineer<br />
at NVIDIA, supporting developers working with<br />
professional graphics and computer vision.<br />
h Session(s): S0601 - <strong>GPU</strong>-Based Video Processing<br />
Round Table (Monday, 14:30, Room: A2)<br />
h S0404 - Computer Vision Libraries with <strong>GPU</strong>s<br />
(Tuesday, 09:30, Room: A1)<br />
Ronald Young<br />
President (Multipath Corporation)<br />
Dr. Young received his PhD in Engineering and<br />
Numerical Analysis from UC Berkeley in 1972. His<br />
career has focused on designing matrix algebra<br />
algorithms that exploit all hardware features to<br />
achieve the highest possible performance. In 1989 Dr.<br />
Young founded Multipath Corporation, which develops the<br />
Fast Matrix Solver (FMS) software. FMS is an out-of-core<br />
matrix algebra package used to solve extremely large<br />
problems in production applications.<br />
h Session(s): S0032 - Teraflop <strong>GPU</strong> Acceleration Of<br />
Large Matrix Algebra (Thursday, 14:30, Room: C)<br />
Alaa Yousif<br />
Software Solution Architect (Dell)<br />
Alaa Yousif is a Principal Engineer at Dell and has spent<br />
the last 12 years in the area of Dell Remote Management<br />
Products. Alaa is currently responsible for integrating Hadoop<br />
(Big Data) with HPC clusters. Alaa was also a lead<br />
engineer in custom solutions engineering, leading 12<br />
engineers in the Austin and Bangalore design centers.<br />
h Session(s): S0309 - Dynamically Allocating GP<strong>GPU</strong><br />
to Host Nodes (servers) (Thursday, 10:30, Room: K)<br />
Song Yu<br />
(Chemical & Petroleum Department, University of Calgary)<br />
Song Yu is a petroleum engineering M.Sc. student who<br />
joined the Department of Chemical and Petroleum<br />
Engineering at the University of Calgary in January 2010.<br />
He holds a B.Sc. degree in software engineering (ISS)<br />
from Wuhan University (WHU) in China and an M.Sc. degree<br />
in computer software and theory from the State Key<br />
Laboratory of Software Engineering (SKLSE) of Wuhan<br />
University. His research topic is parallel reservoir<br />
simulation using <strong>GPU</strong> computing: developing a<br />
parallel sparse linear solver package for <strong>GPU</strong> parallel<br />
computing environments and integrating it into<br />
reservoir simulation to improve performance on<br />
large-scale simulation problems.<br />
h Session(s): S0190 - Large-Scale Reservoir<br />
Simulation on <strong>GPU</strong> (Wednesday, 14:30, Room: A7)<br />
Fabrizio Zanella<br />
Systems Manager (CST of America)<br />
Fabrizio Zanella has been at CST of America, a<br />
worldwide provider of full-wave electromagnetic<br />
software, for 6 years. His current role covers IT<br />
management for North America and customer support<br />
on topics including hardware, licensing and<br />
high-performance computing solutions. Before joining CST,<br />
Fabrizio spent 15 years performing signal<br />
integrity characterization of high-speed digital systems.<br />
He has worked at various companies including EMC<br />
Corporation and Teradyne.<br />
h Session(s): S0069 - <strong>GPU</strong> Computing Advances<br />
in 3D Electromagnetic Simulation<br />
(Tuesday, 14:00, Room: J3)<br />
Krzysztof Zarzycki<br />
Senior Software Developer (IBM Poland)<br />
Krzysztof Zarzycki is a Senior Software Developer at IBM<br />
Poland’s Netezza R&D Department, where he is the<br />
technical lead of the CUDA development team. His<br />
research covers using <strong>GPU</strong>s to accelerate various<br />
methods, from AI, data mining and analytics, through<br />
data warehouse operations, to solving<br />
bioinformatics problems. He received a Master’s degree<br />
in Computer Science from Warsaw University in Poland.<br />
h Session(s): S0376 – Dynamic <strong>Program</strong>ming on<br />
CUDA: Finding the Most Similar DNA Sequence<br />
(Tuesday, 10:00, Room: K)<br />
Peter Zaspel<br />
Research Assistant (University of Bonn)<br />
Peter Zaspel is a research assistant at the Institute for<br />
Numerical Simulation of the University of Bonn,<br />
Germany. He studied Computer Science and is now<br />
working on his PhD. His research topics are<br />
computational fluid dynamics, general-purpose<br />
computations on graphics hardware and visualization.<br />
h Session(s): S0044 - A Massively Parallel Two-<br />
Phase Solver for Incompressible Fluids on<br />
Multi-<strong>GPU</strong> Clusters (Thursday, 14:00, Room: N)<br />
Kang Zhang<br />
Research Scientist (GE Global Research)<br />
Kang Zhang is currently a research scientist at GE<br />
Global Research Center, New York. He obtained his<br />
Ph.D. and M.S.E. degrees in Electrical and Computer<br />
Engineering from Johns Hopkins University in 2011 and<br />
2009, respectively, and his B.S. degree in physics from<br />
Nankai University, China, in 2007. His research interests<br />
include GP<strong>GPU</strong> applications, high-data-throughput<br />
imaging platforms, real-time imaging systems, and optical<br />
sensing and imaging. From 2009 to 2010, Kang worked as<br />
an ORISE Research Fellow for the U.S. Food and Drug<br />
Administration (FDA), where he developed optical<br />
metrology methods for medical device evaluation.<br />
h Session(s): S0141 - <strong>GPU</strong>-Accelerated Optical<br />
Coherence Tomography Imaging<br />
(Wednesday, 15:30, Room: A8)<br />
Kaiyong Zhao<br />
PhD Student (Hong Kong Baptist University)<br />
Kaiyong received his B.Eng. degree in Aircraft Design<br />
and <strong>Technology</strong> from the Beijing Institute of <strong>Technology</strong> (BIT),<br />
Beijing, P.R. China, in 2005. He then worked at CCUR for<br />
two years before earning his master’s degree at HKBU. He<br />
is currently a PhD student in the Department of<br />
Computer Science, Hong Kong Baptist University.<br />
h Session(s): S0281 - Accelerate a Fully Functional<br />
Photo Editing Software with <strong>GPU</strong><br />
(Wednesday, 15:00, Room: A1)<br />
CONFERENCE GUIDE SPEAKERS AND<br />
PANELISTS<br />
Hongwei Zhou<br />
Senior Software Development Engineer (Altair)<br />
Hongwei Zhou is a senior software developer. At Altair<br />
Engineering he has experience with the sparse direct solver<br />
and with Lanczos and automatic multilevel substructuring<br />
eigenvalue solvers. He received his B.S. degree in 2003<br />
and his M.S. degree in 2006 from the Department of Mechanics,<br />
Peking University, China.<br />
h Session(s): S0225 – Speedup Altair RADIOSS<br />
Solvers Using NVIDIA <strong>GPU</strong><br />
(Wednesday, 09:30, Room: K)<br />
Jun Zhu<br />
Professor (Zhejiang University)<br />
Jun Zhu is currently Director of, and a Professor in,<br />
the Institute of Bioinformatics at Zhejiang University.<br />
Previously, he was Vice President of Zhejiang University<br />
(2005-2009). Before that, Zhu was Dean of the<br />
College of Agriculture and Biotechnology at Zhejiang<br />
University (1999-2005). He holds a Ph.D. in Statistics<br />
and Genetics from NC State, USA (1989).<br />
h Session(s): S0516 - The Advantage of <strong>GPU</strong><br />
Computation for Analyzing Complex Traits<br />
(Tuesday, 14:30, Room: K)<br />
Gernot Ziegler<br />
Compute Developer <strong>Technology</strong> (NVIDIA)<br />
Gernot Ziegler (MSc/civ.ing.) is an Austrian engineer with<br />
an MSc degree in Computer Science and Engineering<br />
from Linköping University, Sweden. He pursued his PhD<br />
studies at the Max-Planck-Institute for Informatics in<br />
Saarbrücken, Germany, where he specialized in <strong>GPU</strong><br />
algorithms for computer vision and data-parallel<br />
algorithms for spatial data structures. As a member of<br />
NVIDIA’s DevTech-Compute team, Gernot now consults<br />
in high performance computing on graphics hardware.<br />
h Session(s): S0096 - Summed Area Ripmaps<br />
(Wednesday, 17:30, Marriott Ballroom 3)<br />
Robert Zigon<br />
Sr Staff Development Engineer (Beckman Coulter)<br />
Bob Zigon is a Sr. Staff Research Engineer and has<br />
worked at Beckman Coulter for 10 years. He has<br />
degrees in Computer Science and Mathematics from<br />
Purdue University. He was the architect of Kaluza, an<br />
NVIDIA Tesla powered analysis application for flow<br />
cytometry. He’s now working in particle characterization<br />
and analytical ultracentrifugation. His interests include<br />
high performance computing, numerical analysis and<br />
information retrieval theory.<br />
h Session(s): S0221 - 1024 Bit Parallel Rational<br />
Arithmetic Operators for the <strong>GPU</strong><br />
(Tuesday, 16:00, Marriott Ballroom 3)<br />
Enrico Zschau<br />
Lead Software Architect (SeeReal Technologies GmbH)<br />
Enrico Zschau received his diploma in computer science<br />
from Technical University Dresden, Germany, in 2004.<br />
Since 2000 he has worked as an assistant with the<br />
3D group at Technical University Dresden. In 2002 he<br />
joined Dresden 3D GmbH, a spin-off from the TU<br />
Dresden 3D group, which shortly afterwards became SeeReal<br />
Technologies. Mr. Zschau’s activities focus on research<br />
and development of software solutions in the fields of<br />
image processing and GP<strong>GPU</strong>-based algorithms for<br />
holography. He holds the position of Lead Software<br />
Architect and is responsible for a variety of software solutions,<br />
especially eye tracking on PCs and DSPs and<br />
real-time holography on <strong>GPU</strong>s and FPGAs.<br />
h Session(s): S0324 - Content Generation and<br />
Real-Time Hologram Computation for Holographic<br />
3D-Displays (Thursday, 10:00, Room: A1)
PLATINUM SPONSORS<br />
ASUS<br />
BULL<br />
CAPS<br />
Cooley LLP<br />
Dell<br />
ASUS comes from the last four letters of Pegasus, the winged horse in<br />
Greek mythology that represents the inspiration of art and learning. ASUS<br />
embodies the strength, creative spirit and purity symbolized by this regal<br />
and agile mythical creature, soaring to new heights of quality and innovation<br />
with each product it introduces to the market.<br />
Bull, the premier European-based global IT supplier, has made Extreme<br />
Computing one of its key strategic priorities. In only a few years, Bull has<br />
won over 150 customers in 15 countries across 3 continents. Bull has a<br />
proven track record of building Extreme Computing systems for prestigious<br />
academic and industry customers, most notably in France, Germany, UK,<br />
Spain, Netherlands and Brazil. Bull’s Extreme Computing solutions are<br />
based on bullx, a range of innovative systems designed for uncompromised<br />
performance, which has gained worldwide recognition. For more information<br />
visit: http://www.bull.com/extreme-computing<br />
CAPS is a major supplier of solutions dedicated to application migration and<br />
deployment on manycore processors. CAPS’ global solution for manycore<br />
guides developers to performance by providing top-of-the-range<br />
technology (HMPP hybrid compiler and wizard), code porting methodology<br />
and an ecosystem (third-party software tools, expertise, training…). Its directive-based,<br />
multi-target HMPP compiler enables developers to safely move to a<br />
hybrid CPU / <strong>GPU</strong> model and quickly get performance by leveraging the<br />
computing power of stream processors without the pain associated with <strong>GPU</strong><br />
programming. HMPP is offered within the CAPS DevDeck package: an<br />
ALL-IN-ONE multi-level suite for manycore application definition, porting<br />
and optimization with tools (HMPP compiler, development tools such as<br />
HMPP Wizard, debugging & profiling software and scientific libraries),<br />
methodology and resources (tutorials, use cases…).<br />
Cooley LLP is a global law firm for the converging worlds of high technology,<br />
high finance and high-stakes litigation. We are counselors, strategists and<br />
advocates for the foremost private and public companies and investors in all<br />
major technology fields. Our Emerging Companies practice has a long<br />
tradition of representing emerging and high-growth companies worldwide.<br />
The <strong>GPU</strong> space is an exciting growth area in the technology arena, and<br />
Cooley has been at the forefront, advising both established and start-up<br />
companies on the issues facing businesses in this industry. Our attorneys’<br />
extensive experience in intellectual property protection and business<br />
counseling along with the Firm’s deep roots in the technology sector give us<br />
a unique perspective on the issues facing our clients. Cooley’s team consists<br />
of experienced counselors and litigators that are equally skilled at<br />
representing and advising clients on the protection and commercialization of<br />
their intellectual property in a wide range of areas, including copyright,<br />
trademark, patent, technology licensing, privacy, electronic security and<br />
electronic commerce. We are dedicated to offering comprehensive and<br />
creative legal support, utilizing the full resources of the Firm.<br />
For more than 26 years, Dell has played a critical role in transforming<br />
computing, enabling more affordable and more pervasive access to technology<br />
around the world. The company’s technology solutions improve customers’<br />
productivity, enhance their lives and meet their distinct needs.<br />
Headquartered in Round Rock, Texas, Dell serves customers ranging from the<br />
world’s largest and most demanding businesses and public-sector<br />
organizations, to small and medium businesses, and consumers worldwide.<br />
Recognized for its ability to provide customers personalized, built-to-order<br />
technology through direct, online and retail channels, nearly 80 percent of<br />
Dell’s $53 billion in revenue last year was driven by enterprise products,<br />
services and solutions it delivers to businesses and organizations. Dell’s nearly<br />
100,000 team members worldwide are deeply committed to corporate<br />
responsibility. The company ranks among Working Mother Magazine’s 100 Best<br />
Companies and first among Newsweek’s Greenest Companies in America.<br />
At Dell, we promote an environment that thrives on innovation. To deliver<br />
effective solutions that meet customer challenges, Dell employs an open,<br />
standards-based approach to technology innovation. Each year, Dell honors<br />
the outstanding inventors among its employees.<br />
CONFERENCE GUIDE SPONSORS AND<br />
EXHIBITORS<br />
PLATINUM SPONSORS, continued<br />
HP<br />
IBM<br />
Lenovo<br />
Los Alamos National Laboratory<br />
Microsoft Corporation<br />
HP creates new possibilities for technology to have a meaningful impact on<br />
people, businesses, governments and society. The world’s largest technology<br />
company, HP brings together a portfolio that spans printing, personal<br />
computing, software, services and IT infrastructure to solve customer problems.<br />
More information about HP (NYSE: HPQ) is available at http://www.hp.com.<br />
IBM is involved in more than 150 smart grid engagements around the world,<br />
in both mature and emerging markets. IBM is the founding member of the<br />
Global Intelligent Utility Network Coalition, a unique collaboration of utilities<br />
from around the globe that are working to accelerate the use of smart grid<br />
technologies and move the industry forward through its most challenging<br />
transformation. More about IBM’s vision of bringing a new level of intelligence<br />
to how the world works (how every person, business, organization,<br />
government, natural system, and man-made system interacts) can be found<br />
here: http://www.ibm.com/smarterplanet.<br />
Lenovo is one of the world’s largest makers of personal computers and<br />
makes the world’s most innovative PCs, including the renowned ThinkPad®<br />
notebook as well as products carrying the ThinkCentre®, ThinkStation®,<br />
ThinkServer®, IdeaCentre® and IdeaPad® sub-brands.<br />
Today, Lenovo is a global corporation with significant operations on six<br />
continents, operating in more than 60 countries and selling products in<br />
160. Everyone at Lenovo takes great pride in our ability to attract top talent<br />
from diverse backgrounds and from around the world. We view our<br />
differences and diversity as a source of strength in building a collaborative<br />
culture that helps us achieve our goals. We have no world headquarters and,<br />
instead, have put in place a distributed management structure that places<br />
operational hubs in centers of excellence around the world integrating this<br />
talented, diverse group into a cohesive Next Generation company.<br />
Los Alamos National Laboratory, a multidisciplinary research institution<br />
engaged in strategic science on behalf of national security, is operated by<br />
Los Alamos National Security, LLC, a team composed of Bechtel National,<br />
the University of California, The Babcock & Wilcox Company, and URS for the<br />
Department of Energy’s National Nuclear Security Administration.<br />
Los Alamos enhances national security by ensuring the safety and reliability<br />
of the U.S. nuclear stockpile, developing technologies to reduce threats from<br />
weapons of mass destruction, and solving problems related to energy,<br />
environment, infrastructure, health, and global security concerns.<br />
Microsoft Visual Studio® development system is an integrated environment<br />
that helps simplify the entire development process from design to<br />
deployment. Customers can unleash their creativity with powerful<br />
prototyping, modeling, and design tools that bring a vision to life. Work<br />
within a personalized environment, and target a growing number of<br />
platforms. With integrated testing and debugging tools that enable delivery<br />
of high-quality solutions, developers and testers can work more efficiently.
PNY<br />
Supermicro<br />
SYNNEX Corporation<br />
TSMC<br />
Established in 1985, PNY Technologies®, Inc. is the authorized NVIDIA®<br />
Quadro® channel partner for North America, Latin America and Europe. PNY<br />
provides unsurpassed service and commitment to its professional graphics<br />
customers, offering a 3-year warranty, pre- and post-sales support, dedicated<br />
Quadro Field Application Engineers and direct tech support hotlines. PNY<br />
recently introduced the Prevail Series, a new line of high-performance solid state<br />
drives designed specifically for the professional and enterprise<br />
markets. The company also offers a full line of commercial and consumer<br />
graphics cards, computer memory upgrade modules, flash memory cards,<br />
USB flash drives, and HDMI cables. Headquartered in Parsippany, NJ, PNY<br />
maintains facilities in North America, Europe, Asia, and Latin America. For<br />
more information, please visit http://www.pny.com.<br />
Supermicro, the leader in server technology innovation and green computing,<br />
provides customers around the world with application-optimized server,<br />
workstation, blade, storage and <strong>GPU</strong> systems. Based on its advanced Server<br />
Building Block Solutions, Supermicro offers the most optimized selection for IT,<br />
datacenter and HPC deployments. The company’s system architecture<br />
innovations include Twin server, double-sided storage and SuperBlade® product<br />
families. Offering the most comprehensive product lines in the industry,<br />
Supermicro delivers energy-efficient solutions with unmatched performance<br />
and value. Founded in 1993, Supermicro is headquartered in Silicon Valley with<br />
worldwide operations and manufacturing centers in Europe and Asia. For more<br />
information, visit www.supermicro.com.<br />
SYNNEX Corporation, a Fortune 300 corporation, is a leading business<br />
process services company, partnering with resellers and original equipment<br />
manufacturers in multiple regions around the world. The Company provides<br />
services in IT distribution, supply chain management, contract assembly and<br />
global business services. Founded in 1980, SYNNEX employs more than<br />
10,000 associates worldwide and operates in the United States, Canada,<br />
China, Japan, Mexico, the Philippines and the United Kingdom. Our value-added<br />
service model streamlines business processes to help customers<br />
across the globe lower their costs and create greater efficiencies. We<br />
provide a variety of professional and marketing services, including: demand<br />
generation; education and training; pre- and post-sale technical support;<br />
end-user enablement; server assessment; design and integration; recycling<br />
and trade-in; contract design and assembly; and IT resource planning.<br />
TSMC is the world’s largest dedicated semiconductor foundry, providing the<br />
industry’s leading process technology and the foundry segment’s largest<br />
portfolio of process-proven libraries, IPs, design tools and reference flows.<br />
The Company’s managed capacity in 2011 totaled 13.22 million (8-inch<br />
equivalent) wafers, including capacity from three advanced 12-inch<br />
GIGAFAB facilities, four eight-inch fabs, one six-inch fab, as well as<br />
TSMC’s wholly owned subsidiaries, WaferTech and TSMC China, and its joint<br />
venture fab, SSMC. TSMC is the first foundry to provide 28nm production<br />
capabilities. Its corporate headquarters are in Hsinchu, Taiwan. For more<br />
information about TSMC please visit http://www.tsmc.com.<br />
GOLD SPONSORS<br />
Amazon Web Services<br />
Fusion-io<br />
NextIO<br />
SGI<br />
SILVER SPONSORS<br />
Acceleware Corporation<br />
Adobe<br />
Appro International, Inc.<br />
Built upon the same world-class technology that powers Amazon.com,<br />
Amazon Web Services (AWS) provides businesses with a secure, reliable,<br />
easy-to-scale, low-cost computing platform “in the cloud.” Companies of all<br />
sizes, from all around the globe use AWS to build applications, store data,<br />
manage business processes, and more. Learn more: http://aws.amazon.com<br />
The Fusion-io storage memory platform significantly improves processing<br />
capabilities within a data center by moving active data closer to the CPU<br />
where it is processed. This approach, called shared data decentralization, reduces<br />
latency while increasing data center efficiency. Fusion-io’s software and<br />
hardware solutions leverage non-volatile memory for enterprise-grade<br />
performance, reliability and manageability.<br />
NextIO was founded based upon the vision of creating shared server I/O<br />
resource pools. Today, NextIO simplifies complex server I/O and enables<br />
any-to-any connectivity among a wide variety of data center resources. With<br />
the NextIO architecture, server I/O is consolidated at the top of the rack, where it can<br />
be shared and dynamically allocated across servers within the rack. NextIO<br />
currently offers a complete portfolio of I/O consolidation and I/O<br />
virtualization products that are easily managed, highly flexible, and provide<br />
customers with greater operational efficiencies that reduce CapEx and OpEx<br />
costs, and deliver the utmost in data center flexibility and business agility,<br />
which drives productivity and economic efficiencies.<br />
SGI is the trusted leader in technical computing. The company develops,<br />
markets and sells a broad line of mid-range and high-end scale-out and<br />
scale-up servers plus data storage solutions and differentiating software.<br />
SGI solutions are used by the scientific, technical and business communities<br />
to solve challenging, data-intensive compute and data management<br />
problems requiring large amounts of computing power and fast, efficient<br />
data movement both within the computing system and to and from large-scale<br />
data storage installations.<br />
Acceleware delivers industry leading CUDA training and HPC consulting<br />
services to organisations looking to unlock the parallel processing potential of<br />
the <strong>GPU</strong>. Acceleware’s software solutions include <strong>GPU</strong> accelerated Seismic<br />
Migration libraries for the Oil & Gas industry and Electromagnetic solvers for<br />
CAE markets. At Acceleware the goal is always the same: Go Faster.
Whether it’s a smartphone or tablet app, a game, a video, a digital magazine,<br />
a website, or an online experience, chances are that it was touched by Adobe<br />
technology. Our tools and services enable our customers to create<br />
groundbreaking digital content, deploy it across media and devices, and then<br />
continually measure and optimize it based on user data. By providing<br />
complete solutions that combine digital media creation with data-driven<br />
marketing, we help businesses improve their communications, strengthen<br />
their brands, and ultimately achieve greater business success.<br />
Appro is a leading developer of innovative supercomputing solutions and is<br />
positioned to support High Performance Computing markets. Appro<br />
accelerates technical applications and business results through outstanding<br />
price/performance, power efficiency and fast time-to-market solutions<br />
based on the latest open standards technologies. Appro enables scientists<br />
and engineers to use data-intensive, capacity, capability and hybrid<br />
computing for scientific research, data modeling, engineering simulations,<br />
and seismic visualization. To learn more, visit www.appro.com.
Deloitte<br />
ELEKS<br />
GE Intelligent Platforms<br />
Morgan Stanley<br />
SK Hynix<br />
SVB<br />
In the United States, Deloitte LLP and its subsidiaries have 45,000<br />
professionals with a single focus: serving our clients and helping them solve<br />
their toughest problems. We work in four key business areas — audit,<br />
financial advisory, tax and consulting — but our real strength comes from<br />
combining the talents of those groups to address clients’ needs. Fortune and<br />
BusinessWeek consistently rank our organization among the best places to<br />
work, which is good news for our talent and our clients alike. When the best<br />
people tackle the most compelling challenges, everyone wins.<br />
Multi-year expertise in building complex, science-intensive solutions,<br />
including HPC, has shaped our value proposition: delivering<br />
sophisticated custom computing systems for the power, finance, automation,<br />
entertainment and other industries. ELEKS’ engineering culture, combined<br />
with an aspiration for technological excellence and solid project management<br />
skills, ensures the superior business value we deliver to our highly valued<br />
customers. For more information about ELEKS’ software development,<br />
localization and testing services go to www.eleks.com.<br />
GE Intelligent Platforms is a leading manufacturer of rugged COTS computer<br />
boards and systems for military programs. As a partner to NVIDIA for<br />
Embedded Applications, GE brings GP<strong>GPU</strong> technology to a wide range of<br />
defense-related programs, where it can now be used in ground tanks, fighter<br />
aircraft, military helicopters, and UAVs for Radar, ISR, DSP, Sensor<br />
Processing, Imaging and many other military applications.<br />
Morgan Stanley is a leading global financial services firm providing a wide<br />
range of investment banking, securities, investment management and<br />
wealth management services. The Firm’s employees serve clients worldwide<br />
including corporations, governments, institutions and individuals from more<br />
than 1,300 offices in 43 countries. For further information about Morgan<br />
Stanley, please visit www.morganstanley.com.<br />
SK Hynix designs, manufactures and markets a wide variety of DRAM and<br />
NAND Flash memories and CMOS Image Sensors.<br />
SK Hynix is the new corporate name of Hynix Semiconductor Inc. following<br />
the merger with SK Telecom on February 14, <strong>2012</strong>. In synergy with SK<br />
Telecom, SK Hynix expects to enhance its competitiveness in<br />
semiconductors, and expand into new global markets.<br />
Silicon Valley Bank is the premier commercial bank for companies in the<br />
technology, life science, cleantech, venture capital, private equity and<br />
premium wine industries. SVB provides a comprehensive suite of financing<br />
solutions, treasury management, corporate investment and international<br />
banking services to its clients worldwide. Through its focus on specialized<br />
markets and extensive knowledge of the people and business issues driving<br />
them, Silicon Valley Bank provides a level of service and partnership that<br />
measurably impacts its clients’ success. Founded in 1983 and headquartered<br />
in Santa Clara, Calif., the company serves clients around the world through<br />
26 U.S. offices and international operations in China, India, Israel and the<br />
United Kingdom. Silicon Valley Bank is a member of global financial services<br />
firm SVB Financial Group (Nasdaq: SIVB), with SVB Analytics, SVB Capital<br />
and SVB Private Bank. More information on the company can be found at<br />
www.svb.com.<br />
CONFERENCE GUIDE SPONSORS AND<br />
EXHIBITORS<br />
SPONSORS AND<br />
EXHIBITORS<br />
PLATINUM MEDIA PARTNERS<br />
Dow Jones & Company<br />
Dr. Dobb’s<br />
HPCwire<br />
insideHPC<br />
mergermarket<br />
GOLD MEDIA PARTNERS<br />
HPC in the Cloud<br />
Dow Jones Private Equity & Venture Capital is a division of Dow Jones & Co.,<br />
a News Corporation company. Dow Jones Private Equity & Venture Capital<br />
offers integrated content solutions for deal-sourcing, due diligence and<br />
fundraising needs of today’s venture capital and private equity investors,<br />
corporate investors, advisors, and portfolio companies. Core products<br />
include the deal database VentureSource and the fundraising database LP<br />
Source, as well as the highly respected publications Private Equity Analyst,<br />
VentureWire, Daily Bankruptcy Review and LBO Wire.<br />
Dr. Dobb’s is the most respected development-focused brand helping<br />
application and software development professionals make the right<br />
decisions for their businesses. Dr. Dobb’s provides deep content that<br />
challenges developers to think of new and dynamic ways to create business-focused<br />
applications while balancing “what can be developed” with practical,<br />
real-world analysis. http://drdobbs.com<br />
HPCwire is the leading publication for news and information on high<br />
performance and data-intensive computing for business and technology<br />
professionals. HPCwire is the #1 resource selected by academic, government,<br />
industrial and vendor communities who are interested in computationally intensive<br />
computing, including systems, software, applications, middleware,<br />
networking and storage. Subscribe at: www.hpcwire.com.<br />
insideHPC is the web’s premier high performance computing (HPC) short<br />
format news site. insideHPC distills news and events, and presents them in<br />
bite-sized nuggets of helpfulness as a resource for supercomputing<br />
professionals. insideHPC, along with its sister publication, inside-BigData,<br />
pumps out more than 1.2 million monthly page views to a growing<br />
community of readers that now exceeds 61,000 unique monthly visitors.<br />
mergermarket, part of The Mergermarket Group, is an unparalleled,<br />
independent M&A intelligence tool used by the world’s foremost financial<br />
institutions to originate deals. It provides proprietary intelligence on<br />
potential deal flow, potential mandates and valuations via the world’s largest<br />
group of M&A journalists and analysts who have direct access to the most<br />
senior decision-makers and corporates.<br />
HPC in the Cloud is dedicated to covering data-intensive cloud computing<br />
in science, industry and the data center. The publication keeps<br />
technology decision-makers and stakeholders in the high performance<br />
computing industry informed of developments at the point where high<br />
performance and cloud computing intersect. Subscribe now at:<br />
http://www.hpcinthecloud.com/xs/register.
EXHIBITING COMPANIES<br />
3dmx<br />
AccelerEyes LLC<br />
ACE Computers<br />
Advantest<br />
Allinea Software<br />
AMAX<br />
Aspen Systems<br />
BioDigital<br />
BOXX Technologies, Inc.<br />
Since 2003, 3dmx has been creating extraordinary 3D animation,<br />
stereoscopic 3D, visual effects, visualizations, live action, stop motion<br />
and video games for the medical, technology and entertainment<br />
industries. Whether you need to present a groundbreaking invention,<br />
provide user tutorials for specialized machinery and processes, develop<br />
training material or architectural walkthroughs, or prepare an<br />
appealing set of art for marketing campaigns, 3dmx can do it<br />
for you, on time and within budget.<br />
AccelerEyes develops and markets fast, simple <strong>GPU</strong> software<br />
libraries. Today, AccelerEyes delivers products which are used to<br />
accelerate C, C++, Fortran, Python, and MATLAB ® codes on CUDA<br />
and OpenCL <strong>GPU</strong>s.<br />
Founded in 1983, Ace Computers is a respected systems integrator<br />
focused on custom requirements and regularly works with major<br />
Universities, Federal Labs, and Corporate clients. We hold WSCA<br />
and GSA Prime contracts in addition to multiple GWACs. Ace is<br />
ISO 9001:2008 certified and has strong relationships with NVIDIA, Intel<br />
and AMD.<br />
A world-class technology company, Advantest is the leading<br />
producer of automatic test equipment (ATE) for the semiconductor<br />
industry and a premier manufacturer of measuring instruments. Its<br />
leading-edge products are integrated into the most advanced<br />
semiconductor production lines in the world. Founded in Tokyo in<br />
1954, Advantest now operates in 21 countries worldwide.<br />
www.advantest.co.jp<br />
We’re recognized as the leading vendor of tools for parallel software<br />
development and High Performance Computing (HPC). One of the<br />
fastest growing companies in HPC, we were recently honored as a Red<br />
Herring Top 100 company. We have offices in the US and the UK, as<br />
well as a network of resellers and partners in most parts of the world.<br />
AMAX, pioneer of the Personal Supercomputer, is a leading<br />
technology provider with over 30 years of solidified partnerships<br />
with technology innovators such as NVIDIA. AMAX excels at<br />
delivering unique and customized HPC cluster, server and storage<br />
solutions that continually push the limits of innovation with<br />
maximum performance and exceptional efficiency.<br />
Aspen Systems, founded in 1982, is an established, privately-held,<br />
two-time Inc. 500 corporation that designs, manufactures, and<br />
services computing products including high-performance compute<br />
clusters, systems software, storage/file systems, and visualization.<br />
Aspen Systems places its highest priority on first class technical<br />
support and the creation of fully customized products that always<br />
incorporate the latest technologies. This allows our customers to<br />
enjoy the highest performing solutions at very competitive prices.<br />
BioDigital is the leading developer of state-of-the-art biomedical<br />
visualization. BioDigital recently launched The BioDigital Human,<br />
a 3D visualization platform with a revolutionary approach for<br />
communicating health and medical information with interactive<br />
tools for exploring human anatomy, physiology and conditions.<br />
BOXX is the leading innovator of high-performance workstations<br />
and rendering systems for product design, engineering, visual<br />
effects, animation, architectural visualization, and more. For over<br />
15 years, we’ve combined record-setting performance, speed, and<br />
reliability with unparalleled industry knowledge to become the<br />
trusted choice for creative professionals worldwide.<br />
Bright Computing<br />
Cirrascale<br />
Colfax International<br />
Concurrent<br />
Creative Consultants<br />
Cyberpower<br />
Digital Storm<br />
Bright Computing, a leader in integrated cluster management<br />
software, provides seamless management of NVIDIA <strong>GPU</strong> and<br />
hybrid clusters. Bright is a single solution for provisioning,<br />
scheduling, monitoring and managing clusters. Every Bright-managed<br />
cluster is also cloud-ready, enabling users to extend their<br />
system into AWS EC2 for access to additional CPUs and NVIDIA<br />
<strong>GPU</strong>s, with a few mouse clicks. All of this capability is accessed via<br />
its intuitive GUI or using Bright’s powerful cluster management<br />
shell. Bright Computing is headquartered in San Jose, CA.<br />
http://www.brightcomputing.com<br />
Cirrascale Corporation is a premier provider of advanced GP/<strong>GPU</strong><br />
blade-based workstation and server solutions for conventional and<br />
containerized data centers that are scalable, reliable and offer the best<br />
price/performance in the industry. Cirrascale leverages its<br />
patented Vertical Cooling <strong>Technology</strong> to provide the industry’s most<br />
energy-efficient standards-based platforms with the lowest possible<br />
total cost of ownership in the densest form factor. To learn more<br />
about Cirrascale and its unique GP/<strong>GPU</strong> solutions, please visit<br />
http://www.cirrascale.com or call (888) 942-3800.<br />
Buy it from a trusted expert. Colfax provides the most comprehensive<br />
range of innovative, cutting-edge and highly customized <strong>GPU</strong><br />
solutions. With outstanding price/performance and technical<br />
support, Colfax is a leading choice of scientists and engineers for<br />
<strong>GPU</strong>-accelerated data modeling, simulation and real-time<br />
visualization solutions. Visit www.colfax-intl.com for more details.<br />
Concurrent Computer Corporation (NASDAQ:CCUR) is a worldwide<br />
leader in real-time Linux ® computing technology including real-time<br />
operating systems; advanced debugging and analysis tools;<br />
simulation tools; and fully-integrated multiprocessing/<strong>GPU</strong> computer<br />
platforms. Concurrent focuses on hardware-in-the-loop and<br />
man-in-the-loop simulation, data acquisition and industrial systems.<br />
For more information, please visit www.real-time.ccur.com.<br />
Creative Consultants demonstrates a Multi-Projector Semi-<br />
Immersive Virtual Reality (VR) environment with <strong>GPU</strong> enabled<br />
warping and blending. Our parallel code development appliance<br />
Stelletto computes hundreds of thousands of threads, in real time,<br />
driving the VR display; thus creating an interactive HPC<br />
demonstration with live scaling of calculations for 250,000 particles.<br />
CyberPower, Inc. is one of the nation’s leading computer system<br />
manufacturers. As published in the Los Angeles Business Journal<br />
in 2003, we were the fastest growing private company in Los<br />
Angeles. With vision, commitment, and steadfast determination, we<br />
manufacture and distribute various customized high-end gaming<br />
machines, notebook systems and high performance workstations<br />
to meet the unique needs of gamers, businesses, government<br />
agencies, educational institutions and other end-users.<br />
Founded in 2002, Digital Storm has rapidly emerged as the<br />
predominant name in system integration. With expertise in<br />
workstation computers, Digital Storm’s mission is to deliver its<br />
customers bleeding edge technology with direct support. As a<br />
validation of Digital Storm’s success, its systems have received the<br />
industry’s most prestigious awards.
EM Photonics<br />
Exxact Corporation<br />
eyeSight Mobile Technologies Ltd.<br />
Fuzzy Logix<br />
GraphStream Incorporated<br />
Green Revolution Cooling<br />
Immersive Media<br />
JMR Electronics, Inc.<br />
MathWorks<br />
EM Photonics’ core competency lies in its strength with using <strong>GPU</strong>s,<br />
FPGAs, and other parallel computing platforms to accelerate extremely<br />
complex computational applications. We have developed products in the<br />
areas of image processing, linear algebra, and scientific computing and<br />
worked with clients in fields from finance to defense to life sciences.<br />
Founded in 1992, Exxact Corporation is both a value-added<br />
distributor of professional workstation graphics cards and a<br />
manufacturer of solutions for visualization and compute-intensive<br />
applications. In addition, Exxact offers software and services to<br />
develop, port, maintain, and deploy applications for <strong>GPU</strong> computing.<br />
eyeSight’s Touch Free technology provides an enhanced user<br />
experience, allowing users to easily and intuitively control a variety of devices<br />
using simple hand gestures. eyeSight’s Natural User Interface<br />
solution utilizes the device’s standard 2D camera, along with advanced<br />
real-time image processing and machine vision algorithms, to track<br />
the user’s hand gestures and convert them into actions.<br />
Fuzzy Logix is the leading provider of in-database analytics software<br />
and <strong>GPU</strong>-based analytics solutions. Our <strong>GPU</strong> Appliance, TANAY,<br />
makes accessing the power of <strong>GPU</strong> technology easy and includes a<br />
library of over 300 analytic functions that can be invoked from DLLs<br />
or Shared Objects. Additional Information: http://www.fuzzl.com<br />
GraphStream is a supplier of advanced scalable systems for data<br />
networking, processing, and storage. These systems are custom-configured<br />
to meet specific application requirements with superior<br />
simplicity, reliability, scalability, and efficiency. Since 2003,<br />
GraphStream has worked together with PNY and NVIDIA to deliver<br />
some of the world’s most powerful <strong>GPU</strong>-accelerated systems.<br />
Green Revolution Cooling (GRC) provides the highest performance,<br />
lowest cost-per-Watt cooling system available today for data centers.<br />
The CarnotJet system submerges fanless OEM servers into a<br />
managed dielectric fluid environment, reducing cooling energy by<br />
95% while providing powerful and continuous heat removal for even<br />
the highest density servers.<br />
Immersive Media is the pioneer and leading world provider of 360°,<br />
full-motion, interactive video. Our immersive 360° video content is<br />
delivered via the internet to PCs, iPads and mobile devices. Immersive Media<br />
provides the enabling technologies for interactive videos to record,<br />
process, live stream and deliver images from our own or other wide-field<br />
cameras, with a patent portfolio covering key discoveries and<br />
capabilities of interactive and immersive video.<br />
JMR ELECTRONICS INC. is a 30-year established ISO 9001 certified<br />
design, development and manufacturing resource for high<br />
performance computing and storage systems based in Chatsworth,<br />
CA. JMR’s award-winning BlueStor and SilverStor systems are<br />
widely used in broadcast, digital intermediate, geophysical survey,<br />
post-production and scientific applications.<br />
Over one million people around the world use MATLAB for technical<br />
computing. They rely on MATLAB to help them develop cancer<br />
therapies, search for new sources of energy, make our cars safer<br />
and more fuel efficient, and explore outer space. By combining a<br />
powerful numeric engine and technical programming environment<br />
with interactive exploration and visualization tools, MATLAB has<br />
become the language of technical computing. For more<br />
information, visit www.mathworks.com<br />
MBA Sciences<br />
Mellanox Technologies<br />
Mentor Graphics Corp.<br />
Mersive<br />
Microway Inc.<br />
migenius<br />
Morgan Kaufmann<br />
MulticoreWare Inc.<br />
Deliver on the promise of Data and Graph Analytics. MBA Sciences<br />
enables engineers and scientists to rapidly prototype, analyze and<br />
deploy robust parallel solutions across heterogeneous computing<br />
resources spanning servers, cores and <strong>GPU</strong>s from either data<br />
centers or public clouds.<br />
Mellanox Technologies (NASDAQ: MLNX, TASE: MLNX) is a leading<br />
supplier of end-to-end InfiniBand and Ethernet connectivity<br />
solutions and services for servers and storage. Mellanox products<br />
optimize data center performance and deliver industry-leading<br />
bandwidth, scalability, power conservation and cost-effectiveness<br />
while converging multiple legacy network technologies into one<br />
future-proof architecture. www.mellanox.com<br />
The Mentor Graphics ® Embedded Software Division comprises the<br />
Mentor ® Embedded family of products and services, including<br />
embedded software intellectual property (IP), tools, and professional<br />
consultant services to help embedded developers and silicon<br />
partners optimize their products for design and cost efficiency. The<br />
Mentor Embedded team continues to lead the industry with<br />
involvement in the open source community, with Inflexion ® 2D and 3D<br />
UI development, Sourcery open source tools, and Nucleus ® RTOS<br />
solutions. More information on Mentor Embedded products and<br />
services can be found at www.mentor.com/embedded<br />
Since it was founded in 2006, Mersive has revolutionized high<br />
performance display setup and maintenance enabling a new class of<br />
displays. Mersive’s Sol software automatically aligns multiple<br />
commodity projectors into one seamless image of extraordinary<br />
quality and resolution without the expense of specialized hardware<br />
and services. For more information, visit www.mersive.com<br />
Since 1982, Microway has earned an international reputation for<br />
building screaming fast HPC clusters, servers, and<br />
WhisperStations. Since 2007, these have included Tesla <strong>GPU</strong>s.<br />
Utilizing multi-core CPUs, high-efficiency power, robust designs<br />
and excellent cooling, Microway’s <strong>GPU</strong> clusters deliver more<br />
TFLOPS with fewer watts. Our unique Tesla systems offer full PCI-E<br />
Gen3 support and optional FDR InfiniBand.<br />
The migenius mission is to bring software and web services to the<br />
market that enable ‘live 3D for all’ for better and much faster<br />
decision making in design and marketing. Leveraging the power of<br />
the cloud, <strong>GPU</strong> and NVIDIA iray, migenius provides platforms and<br />
applications to make this a reality.<br />
Morgan Kaufmann delivers the knowledge of experts to the<br />
computing community. Through superior print and digital content,<br />
our authors aim to educate our readers and inspire innovation.<br />
MulticoreWare, Inc. develops tools and software solutions for<br />
homogeneous and heterogeneous architectures for profiling,<br />
optimization and portability. With significant expertise in <strong>GPU</strong> and<br />
multicore CPU programming models and in OpenCL, the company<br />
has delivered tools and software solutions using programming models<br />
such as OpenMP and CUDA for high-performance applications including<br />
video and image processing.
NeST/SFO Technologies<br />
Numecent<br />
Numira Biosciences<br />
Patriot Technologies<br />
PEER 1 Hosting<br />
Penguin Computing<br />
PGI<br />
SFO Technologies, a NeST Group company, offers end-to-end<br />
engineering solutions to OEMs in Healthcare, Industrial,<br />
Communications and Transportation verticals. Services include<br />
hardware and software design, embedded product engineering,<br />
application development, prototyping, testing and manufacturing.<br />
An early adopter of GP<strong>GPU</strong>, and a CUDA Design Partner of NVIDIA,<br />
NeST specializes in <strong>GPU</strong> computing and 3D Graphics solutions,<br />
leveraging a highly skilled team and a streamlined process to<br />
deliver industry leading speedup and optimization.<br />
Numecent (www.numecent.com) is a start-up which came out of<br />
stealth with a bang in March <strong>2012</strong> and is the inventor of<br />
‘cloudpaging’. This patented technology enables friction-free digital<br />
delivery of native software and other non-linear assets through<br />
virtualization. One of the benefits of cloudpaging is that it can<br />
reduce the network footprint of digital downloads between 20x and<br />
100x and execute them natively, at full speed, without actually<br />
requiring installation. Once cloudpaged, applications can even run<br />
off-line and always under license control.<br />
Numira Biosciences is a leading provider of specialty contract<br />
research services for preclinical drug and device development.<br />
Numira’s customers include the top biopharmaceutical companies<br />
and academic research institutions. Through its next-generation<br />
study portal, Numira provides its customers with interactive tools<br />
for accessing, exploring, and communicating about their preclinical<br />
study data.<br />
Patriot’s Manufacturing and Logistics Services enables software<br />
developers, application users and solution providers to optimize their<br />
software applications on a reliable, branded and customized hardware<br />
platform. By choosing Patriot, customers can leverage an appliance-based<br />
model with minimal investment and realize the benefits of<br />
faster time-to-market, increased profitability and business growth.<br />
Two obsessions – Ping & People – have made us one of the world’s<br />
leading hosting providers. Our proprietary 10Gbps FastFiber Network <br />
and 18 datacenters connect our customers to the world. And our<br />
FirstCall Promise supports over 10,000 businesses 24x7x365. The first<br />
large-scale <strong>GPU</strong> Cloud is just one of our hosting innovations.<br />
For well over a decade Penguin Computing has been delivering<br />
integrated, Linux based solutions for the enterprise and HPC space.<br />
With Linux expertise that is unmatched in the industry, Penguin<br />
Computing offers an end-to-end portfolio of products that range<br />
from Linux servers and workstations to integrated, turn-key HPC<br />
clusters and cluster management software.<br />
The Portland Group ® is a premier supplier of software compilers<br />
and development tools for parallel computing. PGI ® offers high<br />
performance scalar and parallel Fortran, C and C++ compilers and<br />
tools for systems based on 64-bit x86 processors from Intel and<br />
AMD, and NVIDIA CUDA-enabled <strong>GPU</strong>s running under Linux,<br />
MacOS and Windows operating systems.<br />
Polywell<br />
PQ Labs, Inc<br />
Prefixa<br />
Ramtron International Corporation<br />
Raytrix GmbH<br />
Reservoir Labs<br />
RTT<br />
Scalable Display <strong>Technology</strong><br />
Polywell, established in 1987, is a manufacturer of high quality<br />
computer products. Its lineup ranges from industrial embedded<br />
PCs and storage solutions to high-performance workstations and<br />
high-end servers. Polywell has been serving the needs of various<br />
commercial and government entities with systems for CAD/CAM,<br />
animation, content creation, and for data centers. Polywell also<br />
offers OEM/ODM services for various vertical markets, such as<br />
Digital Signage, Kiosk, POS, Surveillance, IPTV, entertainment,<br />
gaming, medical equipment, network appliance and IP Phone.<br />
Established in Silicon Valley, PQ Labs, Inc. is a leading provider of<br />
multi-touch solutions worldwide, providing revolutionary hardware<br />
and software to eliminate the need for a keyboard and mouse in<br />
future computers. PQ Labs’ Multi-Touch G³ enables people to<br />
interact with computers directly using just fingers and gestures.<br />
The company’s key technology improvement is enabling a next<br />
generation of natural user interface to be widely adopted in the<br />
computer industry.<br />
Prefixa develops 3D solutions for 3D data capture, modeling and<br />
rendering, accelerated with NVIDIA <strong>Technology</strong>. Our core technology is a<br />
3D Photorealistic Render Engine natively implemented in NVIDIA<br />
CUDA and scalable to multi-<strong>GPU</strong>, multi-CPU nodes. We are<br />
looking for key partners to scale our solution to the cloud, and build<br />
business around our platform.<br />
Ramtron International Corporation, headquartered in Colorado<br />
Springs, Colorado, is a fabless semiconductor company that<br />
designs, develops and markets specialized semiconductor memory<br />
and integrated semiconductor solutions used in a wide range of<br />
product applications and markets worldwide. For more information,<br />
visit www.ramtron.com.<br />
Raytrix develops and markets single-lens 3D video cameras based<br />
on their patented high resolution light field technology, offering<br />
solutions for Particle Image Velocimetry (PIV), optical inspection,<br />
face capturing, microscopy – as well as IP for consumer products<br />
(mobile phones).<br />
Privately owned and in business since 1990, Reservoir Labs<br />
specializes in advanced compiler, network and reasoning<br />
technologies with an emphasis on mapping innovative algorithms<br />
to high performance and embedded architectures. We deliver<br />
cutting-edge technology products, customized solutions and<br />
advanced R&D services to our commercial and government clients.<br />
RTT stands for creative and fascinating 3D visualization solutions,<br />
which bring products to life in real time and portray them in a<br />
natural and realistic environment. Our RTT Virtual Prototyping and<br />
RTT Virtual Marketing products and services combine software,<br />
support and customized strategic solutions, allowing us to turn<br />
dreams into reality.<br />
Scalable Display Technologies is a global leader providing software<br />
tools to construct and manage ultra high-resolution displays.<br />
Scalable’s software is used by the military and Global 1000<br />
accounts to enhance productivity through higher resolution and<br />
increased visual realism of displays. Scalable’s products are<br />
spawning a new class of displays called “multi-megapixel displays”.
SECO<br />
Seneca<br />
Splashtop Inc<br />
Terascala, Inc.<br />
Themis Computer<br />
TunaCode<br />
TYAN<br />
SECO, an international leader in embedded electronic<br />
solutions, has shown over its 30 years the ability to adapt its<br />
know-how to meet challenging new customer needs, guiding<br />
customers to the most innovative solutions. Collaborations with<br />
major scientific universities and partnerships with leading companies<br />
worldwide have helped transform SECO into an international<br />
company that has built its market position by meeting the new<br />
challenges of everyday business.<br />
Seneca is a premier U.S.-based custom system manufacturer and<br />
value-added technology distributor with over 30 years of experience.<br />
As a designer and manufacturer of High Performance Computing<br />
Clusters, Seneca supports academic, lab, government, and defense<br />
researchers across the nation. Our HPC practice includes solutions for<br />
compute clusters, NVIDIA GP<strong>GPU</strong> platforms, technical computing<br />
workstations, storage systems, and management software.<br />
Splashtop aspires to touch people’s lives by delivering the best-in-class<br />
remote desktop experience - bridging tablets, phones,<br />
computers and TVs. Splashtop technology empowers consumer<br />
and business users with high-performance, secure, interactive<br />
access to their favorite applications, media content and files<br />
anytime, anywhere. Splashtop is headquartered in San Jose with<br />
offices in Beijing, Hangzhou, Shanghai, Taipei and Tokyo. For more<br />
information, visit http://www.splashtop.com.<br />
Terascala’s high throughput storage solutions make big data fast.<br />
With Terascala, organizations transition from storing and sifting their<br />
data to leveraging that data to drive applications. Combining a<br />
parallel file system with extensive analysis and optimization, Terascala appliances<br />
enable rapid analysis of big data sets using large server installations.<br />
Themis combines industry leadership, high-performance<br />
computing, and advanced thermal and mechanical design<br />
techniques to deliver reliable, rugged standards-based and custom<br />
embedded computing solutions. From small form factor computers<br />
to large blade servers, Themis is committed to building products<br />
that achieve a superior balance between standard commercial<br />
technology and ruggedness to keep mission-critical applications<br />
available in the most demanding environments. Our diverse product<br />
portfolio includes: board-level computers, rack mounted servers,<br />
bladed server systems, mission and payload systems, small form<br />
factors, and storage appliances.<br />
TunaCode delivers accelerated computing solutions making<br />
innovative use of multi-core and manycore processors. We develop<br />
and market CUVILib which offers <strong>GPU</strong>-accelerated Vision and<br />
Imaging functionality with plug-and-play ease of use resulting in<br />
instant speedups of 10X. With over 1000 active users and<br />
commercial deployments in Medical Imaging, Industrial/Defense<br />
Imaging and Entertainment domains, CUVILib offers a cost-effective<br />
way to achieve real-time performance in imaging applications.<br />
PNY and TYAN have established a new EMEA partnership to offer a<br />
wide range of NVIDIA <strong>GPU</strong>-based computing platforms designed for<br />
High Performance Computing (HPC) and massively parallel computing<br />
environments. As a companion processor to the CPU in a server,<br />
NVIDIA Tesla <strong>GPU</strong>s accelerate HPC applications by up to 10x.<br />
SPONSORS AND<br />
EXHIBITORS<br />
Ubitus Inc.<br />
USEFULPROGRESS<br />
WILD Systems (HPC Project)<br />
Wolfram Research, Inc.<br />
Wurth Electronics Midcom<br />
Zoobe<br />
Ubitus Inc., the technology leader in deploying Cloud-enabled rich<br />
media services, offers innovative cloud computing solutions for<br />
device manufacturers, wired/wireless communication service<br />
providers, telecommunication operators and digital content<br />
developers. Founded in 2007 and headquartered in Taipei, Taiwan,<br />
the company now has 150 employees and 4 offices in Tokyo, Beijing,<br />
Guangzhou and Seoul.<br />
Developments in computer graphics have enabled huge progress in our<br />
knowledge of life and matter. In medical science, CT scanners make it<br />
possible to see through the whole body. A very important step in data<br />
analysis is converting signals (X-ray, MR, ultrasound) into digital data<br />
that computers can process.<br />
UsefulProgress develops new software strategies based on<br />
computer graphics for high-performance visualisation.<br />
Wild Systems is a recognized expert in software performance<br />
optimization. At Wild Systems, we combine know how and tools for<br />
automatic code parallelization. This allows users to run their<br />
optimized applications on hybrid architecture appliances. Connected<br />
to the network and fully dedicated to a given optimized application,<br />
these appliances boost its execution performance.<br />
Wolfram Research is the company where “computation meets knowledge.”<br />
A powerhouse in technical innovation, the company is the developer of<br />
Mathematica, the ultimate computation platform, and Wolfram|Alpha,<br />
the computational knowledge engine. Wolfram also sponsors the<br />
world’s largest free network of technical information websites,<br />
including MathWorld and the Wolfram Demonstrations Project.<br />
Würth Elektronik is one of the world’s leading manufacturers of<br />
passive and electromechanical components. Our product range<br />
includes EMC ferrites, filter chokes, common mode chokes, circuit<br />
protection, EMI shielding material, power inductors, power<br />
transformers, LAN and telecom transformers, RF inductors,<br />
LTCC components, connectors, switches, assembly technique and<br />
power elements.<br />
Zoobe is a messaging service that allows you to voice an animated<br />
character. From your voice or text message and your chosen<br />
character we generate a personal animation clip within seconds<br />
which you can send to your friends or post on your wall.
<strong>GTC</strong> WORLDWIDE<br />
EVENTS<br />
SAVE THE DATES<br />
<strong>GTC</strong> JAPAN <strong>2012</strong><br />
July 26<br />
Tokyo Midtown Hall<br />
www.gputechconf.jp<br />
<strong>GTC</strong> U.S. 2013<br />
March 19–22<br />
San Jose McEnery Convention Center<br />
www.gputechconf.com
STAY EDUCATED!<br />
<strong>GTC</strong> comprises year-round international<br />
conferences, workshops and online events. It is an<br />
essential resource for the scientists, engineers,<br />
researchers, and developers who rely on <strong>GPU</strong>s to tackle<br />
enormous computational challenges. <strong>GTC</strong> On-Demand<br />
gives you archival access to the world-class education<br />
delivered at <strong>GTC</strong>, as well as the latest research and<br />
insights presented by NVIDIA staff at other important<br />
industry events. Explore and learn from the best and<br />
brightest minds working in High Performance<br />
Computing today. Visit www.gputechconf.com.<br />
Blog - http://blogs.nvidia.com/category/supercomputing/<br />
Facebook - https://www.facebook.com/gputechnologyconference<br />
Twitter - http://twitter.com/#!/gpucomputing<br />
LinkedIn - http://www.linkedin.com/groups?about=&gid=2159196<br />
Flickr - http://www.flickr.com/photos/nvidia/collections/<br />
YouTube - http://youtube.com/user/nvidiatesla<br />
Meetup - http://hpc.meetup.com/<br />
STAY CONNECTED!<br />
<strong>GTC</strong> attendees are talented. No doubt you’ve had firsthand<br />
experience of this here at <strong>GTC</strong> <strong>2012</strong>. Attendees<br />
work in major industry verticals such as Finance,<br />
Government, Life Sciences, Energy, Computer Software<br />
Development, Manufacturing, as well as Academia. <strong>GTC</strong><br />
provides invaluable opportunities for peer-to-peer<br />
learning and connection within and across industries all<br />
year long. Build on the relationships you made this week.<br />
Stay connected!
VISUALIZE A GREEN EVENT<br />
Place compostables and recyclables in proper bins<br />
Use public transportation during the show<br />
In hotel, decline new sheets and towels<br />
Also, unplug phone and laptop chargers<br />
Offset your travel at www.cool-it.us<br />
Take only collateral/giveaways you will use<br />
What We’re Doing<br />
> 100% of convention center’s greenhouse gas is offset<br />
> Extensive composting and recycling<br />
> Producers and vendors agree to green guidelines<br />
> Minimizing printed materials<br />
> Using recycled and biodegradable paper/non-toxic inks<br />
> Monitoring lighting and A/C usage<br />
> Local-based food options when available<br />
> Non-toxic cleaning materials<br />
FLOOR MAP: FIRST AND SECOND FLOORS (Keynote Hall, Exhibit Hall, Posters, Lab, Registration, meeting rooms, and connections to the Marriott, Hilton and St. Claire Hotel)<br />
© <strong>2012</strong> NVIDIA CORPORATION. ALL RIGHTS RESERVED.<br />