Best Practices for Deploying Large Language Models in Production


Large language models (LLMs) have revolutionized natural language processing (NLP) and enabled a wide range of applications, such as text generation, summarization, question answering, and conversational agents. However, deploying LLMs in real-world scenarios poses many challenges for software engineers, such as high computational costs, data quality and availability, model robustness and reliability, user satisfaction and trust, and ethical and social implications. In this session, we will discuss the current state of the art and best practices in LLM deployment for software engineers, as well as the open problems and future directions for research and development. We will also showcase examples of successful LLM deployment across domains and settings, and share the lessons learned along the way, including:

  • Choosing the right model for the task, based on the model's size and complexity and the domain of the application.
  • Optimizing the model with prompt engineering, fine-tuning, and context retrieval techniques to improve accuracy and relevance (a minimal sketch follows this list).
  • Deploying the model using scalable and secure infrastructure, such as cloud platforms or vector databases, to handle variable workloads and latency.
  • Monitoring model performance and user feedback, using MLOps metrics and tooling and a Responsible AI framework, to identify and address issues or risks.
  • Iterating the model based on the changing needs and expectations of the users and the domain, using continuous integration and deployment strategies.
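
As one concrete illustration of the prompt engineering and context retrieval bullet above, here is a minimal sketch of retrieval-augmented prompting. The `embed`, `retrieve`, and `build_prompt` helpers, the toy documents, and the hash-based embedding are all hypothetical stand-ins; a real deployment would use an actual embedding model, a vector database, and a deployed LLM endpoint.

```python
# Minimal sketch of prompt engineering plus context retrieval (RAG).
# All helpers and data here are illustrative placeholders, not a specific product API.
import numpy as np

DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm EST.",
    "Premium subscribers receive priority response within 4 hours.",
]

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash words into a fixed-size vector.
    A production system would call a real embedding model instead."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in DOCUMENTS]
    top = sorted(range(len(DOCUMENTS)), key=lambda i: scores[i], reverse=True)[:k]
    return [DOCUMENTS[i] for i in top]

def build_prompt(question: str) -> str:
    """Prompt engineering: ground the model in retrieved context and
    instruct it to admit when the answer is not in that context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # The resulting prompt would be sent to the deployed LLM endpoint.
    print(build_prompt("How long do customers have to return an item?"))
```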

Interview:

What's the focus of your work these days?

Fine-tuning LLMs: I am working on ways to better measure and fine-tune GenAI solutions.

Data Augmentation: I am working on Generative AI applications that can create synthetic data that mimics real-world distributions, helping to enhance datasets where data may be scarce or imbalanced.
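
As a toy illustration of that idea, the sketch below oversamples an imbalanced minority class by fitting a simple Gaussian to it and drawing synthetic rows. This stands in for a generative model; the speaker's actual GenAI augmentation pipeline is not shown here, and the data and shapes are invented for the example.

```python
# Toy data augmentation sketch: sample synthetic minority-class rows that mimic
# the observed distribution, bringing an imbalanced dataset closer to balance.
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced toy data: 1,000 majority rows, 30 minority rows, 4 features each.
majority = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
minority = rng.normal(loc=2.0, scale=0.5, size=(30, 4))

# Estimate the minority-class distribution and draw synthetic samples from it.
mean = minority.mean(axis=0)
cov = np.cov(minority, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=970)

augmented_minority = np.vstack([minority, synthetic])
print(majority.shape, augmented_minority.shape)  # (1000, 4) (1000, 4)
```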

What technical aspects of your role are most important?

Tuning and Generation: After training, the model is tuned to tailor it for specific applications. The generation phase involves creating content, evaluating it, and retuning the model to enhance quality and accuracy.
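
A schematic of that generate / evaluate / retune loop is sketched below. The `generate`, `score_quality`, and `fine_tune` functions are stand-in stubs that only illustrate the control flow described above; a production pipeline would call an actual deployed model, a real evaluator (human review, LLM-as-judge, or task metrics), and a real fine-tuning job.

```python
# Schematic generate / evaluate / retune loop. All helpers are illustrative stubs.
import random

def generate(model, prompt):
    # Stub: a real system would call the deployed model here.
    return f"[{model}] answer to: {prompt}"

def score_quality(prompt, output):
    # Stub evaluator: real pipelines use human review, LLM-as-judge, or metrics.
    return random.random()

def fine_tune(model, hard_cases):
    # Stub: a real system would fine-tune on the poorly scored examples.
    return f"{model}+tuned"

def tuning_loop(model, prompts, target_quality=0.8, max_rounds=5):
    for round_idx in range(max_rounds):
        outputs = [generate(model, p) for p in prompts]                   # generation
        scores = [score_quality(p, o) for p, o in zip(prompts, outputs)]  # evaluation
        avg = sum(scores) / len(scores)
        print(f"round {round_idx}: average quality {avg:.2f} with {model}")
        if avg >= target_quality:
            break
        hard = [(p, o) for p, o, s in zip(prompts, outputs, scores) if s < target_quality]
        model = fine_tune(model, hard)                                    # retuning
    return model

if __name__ == "__main__":
    random.seed(7)
    tuning_loop("base-model", ["Summarize our refund policy.", "Draft a support reply."])
```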

How does your InfoQ Dev Summit Boston session address current challenges or trends in the industry?

My session will help attendees better understand how to work with GenAI applications. Generative AI matters because it represents a significant advancement in the field of artificial intelligence.


Speaker

Francesca Lazzeri

Principal Director of Data Science and AI @Microsoft, Author of several books on applied machine learning and AI

Francesca Lazzeri, Ph.D. has over 16 years of experience in academic research, applied machine learning, AI product management, and engineering team management. She is the author of several books on applied machine learning and AI, such as:

  • Machine Learning Governance for Managers (2023, Springer Nature)
  • Impact of Artificial Intelligence in Business and Society (2023, Routledge)
  • Machine Learning for Time Series Forecasting with Python (2020, Wiley)
  • and many other publications, including technology journals (O’Reilly, InfoQ, DZone)

Francesca is Principal Director of Data Science and AI at Microsoft, where she leads an organization of talented data scientists and machine learning scientists building AI applications in the cloud, using data and techniques spanning generative AI, time series forecasting, experimentation, causal inference, computer vision, natural language processing, and reinforcement learning. Before joining Microsoft, she was a Research Fellow at Harvard University in the Technology and Operations Management Unit and Adjunct Professor of Python for AI at Columbia University.
Francesca is currently a Professor at the Machine Learning Institute (MLI), an Advisory Board Member of the AI-CUBE (Artificial Intelligence and Big Data CSA for Process Industry Users, Business Development and Exploitation) project, an Advisory Board Member of the Women in Data Science (WiDS) initiative, a Machine Learning Mentor at the Massachusetts Institute of Technology, and an active member of the AI community. You can find her on LinkedIn (https://www.linkedin.com/in/francescalazzeri/) and Medium (https://medium.com/@francescalazzeri).


Date

Monday Jun 24 / 10:20AM EDT ( 50 minutes )

Location

Metcalf Hall

Topics

LLM LLMOps Generative AI

