From pathology to human avatars, oral
papers (the top 3% of accepted papers) reveal advanced research
results
LOS ALAMITOS, Calif., May 16, 2024
/PRNewswire/ -- Co-sponsored by the IEEE Computer Society (CS) and
the Computer Vision Foundation (CVF), the 2024 Computer Vision and
Pattern Recognition (CVPR) Conference is the preeminent event
for research and development (R&D) in the hot topic areas
of computer vision, artificial intelligence (AI), machine learning
(ML), augmented, virtual and mixed reality (AR/VR/MR), deep
learning, and related fields. Over the past decade, these areas
have seen significant growth, and the emphasis on this sector by
the science and engineering community has fueled an increasingly
competitive technical program.
This year, the CVPR Program Committee received 11,532 paper
submissions—a 26% increase over 2023—but only 2,719 were accepted,
resulting in an acceptance rate of just 23.6%. Of those accepted
papers, only 3.3% were slotted for oral presentations based on
nominations from the area chairs and senior area chairs overseeing
the program.
"CVPR is not only the premier conference in computer vision,
but it's also among the highest-impact publication venues in all of
science," said David Crandall,
Professor of Computer Science at Indiana
University, Bloomington, Ind., U.S.A., and CVPR 2024 Program
Co-Chair. "Having one's paper accepted to CVPR is already a major
achievement, and then having it selected as an oral presentation is
a very rare honor that reflects its high quality and potential
impact."
Taking place 17-21 June at the Seattle Convention Center in Seattle, Wash., U.S.A., CVPR offers oral
presentations that speak to both fundamental and applied research
in areas as diverse as healthcare applications, robotics, consumer
electronics, autonomous vehicles, and more. Examples include:
- Pathology: Transcriptomics-guided Slide
Representation Learning in Computational Pathology* – Training
computer systems for pathology requires a multi-modal approach for
efficiency and accuracy. New work from a multi-disciplinary team at
Harvard University (Cambridge, Mass., U.S.A.), the Massachusetts Institute of Technology (MIT; Cambridge,
Mass., U.S.A.), Emory University
(Atlanta, Ga., U.S.A.) and others
employs modality-specific encoders; when applied to liver,
breast, and lung samples from two different species, it
demonstrated significantly better performance than
current baselines.
- Robotics: SceneFun3D: Fine-Grained Functionality
and Affordance Understanding in 3D Scenes – Creating realistic
interactions in 3D scenes has been troublesome from a technology
perspective because it has been difficult to manipulate objects in
the scene context. Research from ETH Zürich (Zürich,
Switzerland), Google
(Mountain View, Calif., U.S.A.), Technical University of
Munich (TUM; Munich, Germany), and Microsoft (Redmond, Wash., U.S.A.) has begun bridging
that divide by creating a large-scale dataset with more than
14.8k highly accurate interaction
annotations for 710 high-resolution real-world 3D indoor scenes.
This work, as the paper concludes, has the potential to "stimulate
advancements in embodied AI, robotics, and realistic human-scene
interaction modeling."
- Virtual Reality: URHand: Universal Relightable
Hands – Teams from Codec Avatars Lab at Meta (Menlo Park, Calif., U.S.A.) and Nanyang Technological University (Singapore) unveil a hand model that
generalizes to novel viewpoints, poses, identities, and
illuminations, which enables quick personalization from a phone
scan. The resulting images make for a more realistic experience of
reaching, grabbing, and interacting in a virtual environment.
- Human Avatars: Semantic Human Mesh
Reconstruction with Textures – Working to create realistic
human models, teams at Nanjing
University (Nanjing, China) and Texas A&M
University (College Station, Texas, U.S.A.) designed a method of 3D human
mesh reconstruction that is capable of producing high-fidelity and
robust semantic renderings that outperform state-of-the-art
methods. The paper concludes, "This approach bridges existing
monocular reconstruction work and downstream industrial
applications, and we believe it can promote the development of
human avatars."
- Text-to-Image Systems: Ranni: Taming
Text-to-Image Diffusion for Accurate Instruction –
Existing text-to-image models can misinterpret more difficult
prompts, but now, new research from Alibaba Group (Hangzhou, Zhejiang,
China) and Ant Group (Hangzhou, Zhejiang,
China) has made strides in addressing that issue via a
middleware layer. This approach, which they have dubbed Ranni,
supports the text-to-image generator in better following
instructions. As the paper sums up, "Ranni shows potential as a
flexible chat-based image creation system, where any existing
diffusion model can be incorporated as the generator for
interactive generation."
- Autonomous Driving: Producing and Leveraging
Online Map Uncertainty in Trajectory Prediction – To
enable autonomous driving, vehicles must be pre-trained on the
geographic region and potential pitfalls. High-definition (HD) maps
have become a standard part of a vehicle's technology stack, but
current approaches to those maps are siloed in their programming.
Now, work by a research team from the University of Toronto (Toronto, Ontario, Canada), Vector Institute
(Toronto, Ontario, Canada), NVIDIA
Research (Santa Clara, Calif.,
U.S.A.), and Stanford University
(Palo Alto, Calif., U.S.A.)
enhances current methodologies by incorporating uncertainty,
resulting in up to 50% faster training convergence and up to 15%
better prediction performance.
"As the field's leading event, CVPR introduces the latest
research in all areas of computer vision," said Crandall. "In
addition to the oral paper presentations, there will be
thousands of posters, dozens of workshops and tutorials, several
keynotes and panels, and countless opportunities for learning and
networking. You really have to attend the conference to get the
full scope of what's next for computer vision and AI
technology."
Digital copies of all final technical papers* will be available
on the conference website by the week of 10 June to allow attendees
to prepare their schedules. To register for CVPR 2024 as a member
of the press and/or request more information on a specific paper, visit
https://cvpr.thecvf.com/Conferences/2024/MediaPass or email
media@computer.org. For more information on the conference, visit
https://cvpr.thecvf.com/.
*Papers linked in this press release refer to pre-print
publications. Final, citable papers will be available just prior to
the conference.
About CVPR 2024
The Computer Vision and Pattern
Recognition Conference (CVPR) is the preeminent computer vision
event for new research in support of artificial intelligence (AI),
machine learning (ML), augmented, virtual and mixed reality
(AR/VR/MR), deep learning, and much more. Sponsored by the IEEE
Computer Society (CS) and the Computer Vision Foundation (CVF),
CVPR delivers the important advances in all areas of computer
vision and pattern recognition and the various fields and
industries they impact. With a first-in-class technical program,
including tutorials and workshops, a leading-edge expo, and robust
networking opportunities, CVPR, which is annually attended by more
than 10,000 scientists and engineers, creates a one-of-a-kind
opportunity for networking, recruiting, inspiration, and
motivation.
CVPR 2024 takes place 17-21 June at the Seattle Convention Center in Seattle, Wash., U.S.A., and participants may
also access sessions virtually. For more information about CVPR
2024, visit cvpr.thecvf.com.
About the Computer Vision Foundation
The Computer
Vision Foundation (CVF) is a non-profit organization whose purpose
is to foster and support research on all aspects of computer
vision. Together with the IEEE Computer Society, it co-sponsors the
two largest computer vision conferences, CVPR and the International
Conference on Computer Vision (ICCV). Visit thecvf.com for
more information.
About the IEEE Computer Society
Engaging computer
engineers, scientists, academia, and industry professionals from
all areas and levels of computing, the IEEE Computer Society (CS)
serves as the world's largest and most established professional
organization of its type. IEEE CS sets the standard for the
education and engagement that fuels continued global technological
advancement. Through conferences, publications, and programs that
inspire dialogue, debate, and collaboration, IEEE CS empowers,
shapes, and guides the future of not only its 375,000+ community
members, but the greater industry, enabling new opportunities to
better serve our world. Visit computer.org for more
information.
SOURCE IEEE Computer Society