by Piotr Bojanowski Mathilde Caron Edouard Grave Lucas Hosseini Gautier Izacard Armand Joulin Sebastian Riedel
Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark, our unsupervised model outperforms BM25 on 11 out of 15 datasets in Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term-matching methods.
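As a rough illustration of the training objective: a minimal InfoNCE-style contrastive loss with in-batch negatives over mean-pooled encoder outputs. This is a simplification of the paper's full setup (which also builds positive pairs by independent cropping and can draw extra negatives from a momentum queue); all names below are illustrative.

```python
# Minimal sketch of a contrastive (InfoNCE-style) retrieval loss with in-batch
# negatives; a simplification of the paper's setup, with illustrative names.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (batch, dim) mean-pooled encoder outputs.
    Row i of doc_emb is the positive for row i of query_emb; every other
    document in the batch serves as a negative."""
    scores = query_emb @ doc_emb.T / temperature           # (batch, batch) similarities
    labels = torch.arange(query_emb.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)                 # softmax over in-batch candidates
```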
Authors: | Piotr Bojanowski Mathilde Caron Edouard Grave Lucas Hosseini Gautier Izacard Armand Joulin Sebastian Riedel |
Series: | |
Publishers: | arXiv |
Publish Date: | 2022-12-05 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning Information Retrieval |
Description:
Find it at:
by Eric Breck Shanqing Cai Eric Nielsen Michael Salib D. Sculley
Available at: https://research.google/pubs/pub46555/
Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems, to help quantify these issues and provide an easy-to-follow road map to improve production readiness and pay down ML technical debt.
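Many of the paper's tests are organizational, but some can be automated directly in CI. A hypothetical example in that spirit (the threshold and names are placeholders, not from the paper):

```python
# Hypothetical pytest-style check in the spirit of the paper's rubric:
# a candidate model must not regress below a pinned quality baseline.
from sklearn.metrics import roc_auc_score

PINNED_BASELINE_AUC = 0.82  # illustrative threshold, not from the paper

def test_model_quality_no_regression(model, X_eval, y_eval):
    scores = model.predict_proba(X_eval)[:, 1]   # positive-class probabilities
    auc = roc_auc_score(y_eval, scores)
    assert auc >= PINNED_BASELINE_AUC, (
        f"AUC {auc:.3f} fell below pinned baseline {PINNED_BASELINE_AUC}")
```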
Authors: | Eric Breck Shanqing Cai Eric Nielsen Michael Salib D. Sculley |
Series: | |
Publishers: | IEEE |
Publish Date: | 2017-01-05 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning |
Description:
by Saleema Amershi Andrew Begel Christian Bird Robert DeLine Harald Gall Ece Kamar
Available at:
https://ieeexplore.ieee.org/document/8804457
Recent advances in machine learning have stimulated widespread interest within the Information Technology sector on integrating AI capabilities into software and services. This goal has forced organizations to evolve their development processes. We report on a study that we conducted on observing software teams at Microsoft as they develop AI-based applications. We consider a nine-stage workflow process informed by prior experiences developing AI applications (e.g., search and NLP) and data science tools (e.g., application diagnostics and bug reporting). We found that various Microsoft teams have integrated this workflow into preexisting, well-evolved, Agile-like software engineering processes, providing insights about several essential engineering challenges that organizations may face in creating large-scale AI solutions for the marketplace. We collected a set of best practices from Microsoft teams to address these challenges. In addition, we have identified three aspects of the AI domain that make it fundamentally different from prior software application domains: 1) discovering, managing, and versioning the data needed for machine learning applications is much more complex and difficult than other types of software engineering, 2) model customization and model reuse require very different skills than are typically found in software teams, and 3) AI components are more difficult to handle as distinct modules than traditional software components - models may be "entangled" in complex ways and experience non-monotonic error behavior. We believe that the lessons learned by Microsoft teams will be valuable to other organizations.
Authors: | Saleema Amershi Andrew Begel Christian Bird Robert DeLine Harald Gall Ece Kamar |
Series: | |
Publishers: | IEEE |
Publish Date: | 2019-01-01 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning Software Design & Engineering |
Description:
by Vinay Chaudhary Jean-François Crespo Eugene Davydov Dan Dennison Dietmar Ebner Daniel Golovin Gary Holt Todd Phillips D. Sculley Michael Young
Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
Paper available at NIPS:
https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
Authors: | Vinay Chaudhary Jean-François Crespo Eugene Davydov Dan Dennison Dietmar Ebner Daniel Golovin Gary Holt Todd Phillips D. Sculley Michael Young |
Series: | |
Publishers: | NIPS |
Publish Date: | 2015-01-01 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Programming Machine Learning |
Description:
by Jie An Oron Ashual Oran Gafni Sonal Gupta Thomas Hayes Qiyuan Hu Devi Parikh Adam Polyak Uriel Singer Yaniv Taigman Harry Yang Xi Yin Songyang Zhang
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules. First, we decompose the full temporal U-Net and attention tensors and approximate them in space and time. Second, we design a spatial temporal pipeline to generate high resolution and frame rate videos with a video decoder, interpolation model and two super resolution models that can enable various applications besides T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures.
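One way to picture the space-time factorization: approximate a full 3D convolution with a (pretrained) 2D spatial convolution followed by a 1D temporal convolution. A hedged sketch with illustrative names, not the Make-A-Video code:

```python
# Sketch of a pseudo-3D block: a full 3D conv over (time, height, width) is
# approximated by a per-frame 2D spatial conv followed by a per-pixel 1D
# temporal conv. Layer and variable names are illustrative.
import torch
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel, padding=kernel // 2)
        self.temporal = nn.Conv1d(channels, channels, kernel, padding=kernel // 2)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape
        y = self.spatial(x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w))       # 2D conv per frame
        y = y.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        y = self.temporal(y)                                                      # 1D conv over time per pixel
        return y.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)                   # back to (b, c, t, h, w)
```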
Authors: | Jie An Oron Ashual Oran Gafni Sonal Gupta Thomas Hayes Qiyuan Hu Devi Parikh Adam Polyak Uriel Singer Yaniv Taigman Harry Yang Xi Yin Songyang Zhang |
Series: | |
Publishers: | arXiv |
Publish Date: | 2022-09-29 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning Computer Vision and Pattern Recognition |
Description:
Find it at:
by Aidan N. Gomez Llion Jones Lukasz Kaiser Niki Parmar Illia Polosukhin Noam Shazeer Jakob Uszkoreit Ashish Vaswani
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
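The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal sketch of that formula:

```python
# Minimal sketch of the paper's scaled dot-product attention.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (..., seq_len, d_k); mask (if given) broadcasts over the score matrix."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (..., seq_len, seq_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block disallowed positions
    return torch.softmax(scores, dim=-1) @ v                   # weighted sum of values
```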
Authors: | Aidan N. Gomez Llion Jones Lukasz Kaiser Niki Parmar Illia Polosukhin Noam Shazeer Jakob Uszkoreit Ashish Vaswani |
Series: | |
Publishers: | arXiv |
Publish Date: | 2017-06-04 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning |
Description:
Find it at:
by Marcus Hutter Mary Phuong
This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
Authors: | Marcus Hutter Mary Phuong |
Series: | |
Publishers: | arXiv |
Publish Date: | 2022-07-22 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning |
Description:
Find it at:
by Parker Barnes Timnit Gebru Ben Hutchinson Margaret Mitchell Inioluwa Deborah Raji Elena Spitzer Lucy Vasserman Simone Wu Andrew Zaldivar
Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related AI technology, increasing transparency into how well AI technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation.
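A model card is essentially structured documentation; a skeletal sketch whose field names loosely follow the sections proposed in the paper (all values below are placeholders):

```python
# Illustrative model-card skeleton as structured data; field names loosely
# follow the paper's proposed sections, values are placeholders.
model_card = {
    "model_details": {"name": "smile-detector", "version": "1.0", "type": "image classifier"},
    "intended_use": {"primary_uses": ["smile detection in consumer photos"],
                     "out_of_scope": ["emotion recognition", "surveillance"]},
    "factors": ["age group", "gender", "Fitzpatrick skin type"],          # report metrics per group
    "metrics": {"accuracy": None, "false_positive_rate": None},
    "evaluation_data": {"dataset": "held-out benchmark", "preprocessing": "face crops"},
    "training_data": {"dataset": "labeled face images", "notes": "describe collection here"},
    "ethical_considerations": "document known risks and failure modes",
    "caveats_and_recommendations": "evaluate on your own population before deployment",
}
```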
Authors: | Parker Barnes Timnit Gebru Ben Hutchinson Margaret Mitchell Inioluwa Deborah Raji Elena Spitzer Lucy Vasserman Simone Wu Andrew Zaldivar |
Series: | |
Publishers: | arXiv |
Publish Date: | 2018-10-05 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning |
Description:
Find it at:
by Greg Brockman Jong Wook Kim Christine McLeavey Alex Radford Ilya Sutskever Tao Xu
Available at:
https://openai.com/blog/whisper/
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results, but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
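The released inference code makes transcription a few lines (pip install openai-whisper); the model size and file name below are placeholders:

```python
# Transcribing a local audio file with the released Whisper inference code.
import whisper

model = whisper.load_model("base")       # other sizes: tiny, small, medium, large
result = model.transcribe("audio.mp3")   # language is detected automatically
print(result["text"])
```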
Authors: | Greg Brockman Jong Wook Kim Christine McLeavey Alex Radford Ilya Sutskever Tao Xu |
Series: | |
Publishers: | OpenAI |
Publish Date: | 2022-09-21 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Machine Learning |
Description:
by Kfir Aberman Varun Jampani Yuanzhen Li Yael Pritch Michael Rubinstein Nataniel Ruiz
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering (all while preserving the subject's key features).
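The prior-preservation idea can be sketched as a weighted sum of two denoising losses: one on the few subject images and one on images sampled from the frozen model for the subject's class. Names below are illustrative, not from the paper's code:

```python
# Hedged sketch of the class-specific prior-preservation objective: the usual
# diffusion denoising loss on the subject images plus a weighted copy of the
# same loss on generated class images. Variable names are illustrative.
import torch.nn.functional as F

def dreambooth_loss(pred_subject, target_subject, pred_class, target_class, prior_weight=1.0):
    """Each pred/target pair is the model's noise prediction and the true noise;
    the *_class pair comes from images sampled with the frozen pretrained model."""
    instance_loss = F.mse_loss(pred_subject, target_subject)  # fit the "[identifier] subject" images
    prior_loss = F.mse_loss(pred_class, target_class)         # keep the generic class prior intact
    return instance_loss + prior_weight * prior_loss
```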
Authors: | Kfir Aberman Varun Jampani Yuanzhen Li Yael Pritch Michael Rubinstein Nataniel Ruiz |
Series: | |
Publishers: | arXiv |
Publish Date: | 2022-08-25 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Computer Science Deep Learning Paper Computer Vision and Pattern Recognition |
Description:
Find it at:
by Prafulla Dhariwal Heewoo Jun Jong Wook Kim Christine Payne Alec Radford Ilya Sutskever
We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multi-scale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.
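The discretization step at each scale can be sketched as nearest-neighbor codebook lookup with a straight-through gradient (illustrative names, not the Jukebox code):

```python
# Minimal sketch of the vector-quantization step at one scale of a VQ-VAE:
# each latent vector snaps to its nearest codebook entry, with a straight-through
# estimator so gradients still reach the encoder. Names are illustrative.
import torch

def quantize(latents, codebook):
    """latents: (n, dim); codebook: (num_codes, dim).
    Returns quantized latents and the discrete code ids."""
    dists = torch.cdist(latents, codebook)   # (n, num_codes) pairwise distances
    ids = dists.argmin(dim=1)                # nearest discrete code per vector
    quantized = codebook[ids]
    # forward pass uses the codes; backward pass copies gradients to the latents
    return latents + (quantized - latents).detach(), ids
```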
Authors: | Prafulla Dhariwal Heewoo Jun Jong Wook Kim Christine Payne Alec Radford Ilya Sutskever |
Series: | |
Publishers: | arXiv |
Publish Date: | 2020-04-30 |
Languages: | ENG |
ISBN: | None |
Rating: | |
Tags: | Deep Learning Machine Learning Music Paper Generative Models Audio and Speech Processing |
Description:
Find it at: