At Sequoia Capital’s AI Ascent on March 26, 2024, Arthur Mensch, founder of Mistral AI, spoke with Matt Miller about his vision of democratizing AI access for developers. They discussed the imperative of creating open platforms while navigating the complexities of open-source initiatives alongside commercial ambitions.
A transcript of the conversation follows:
[0:03] I’m excited to introduce our first speaker, Arthur Mensch from Mistral AI. Arthur is the founder and CEO of Mistral AI. Despite being just nine months old as a company and having far fewer resources than some of the larger foundation model companies, I think they’ve really shocked everybody by putting incredibly high-quality models, approaching GPT-4 in caliber, out into the open. So we’re thrilled to have Arthur with us today, all the way from Paris, to share more about the opportunity behind building an open-source platform. Interviewing Arthur will be my partner Matt Miller, who is dressed in his best French attire to honor Arthur today and who helps lead our efforts in Europe. So please welcome Matt.
[0:50] Matt: Thank you! Arthur, with all the efficiency of a French train, you arrived just in time. We were sweating a little bit back there because you just walked in the door. It’s good to see you! Thanks for being with us here at AI Ascent today.
[1:06] Arthur: Thank you for hosting us.
[1:08] Matt: Absolutely! I would love to start with the background story of why you chose to start Mistral. Take us back to the beginning. We all know about your successful career at DeepMind and your work on the Chinchilla paper, but we’d love to hear about the spark that gave you the idea to break out and start your own company.
[1:30] Arthur: Yeah, sure. We started the company in April, but the idea had been around for a couple of months before that. Timothée and I were in school together, so we knew each other from before, and we had been doing research in the field for about ten years. We loved the way AI progressed because of the open exchanges between academic labs and industrial companies, so it was a bit of a shame when the field stopped making open contributions.
The last important model in the field published by Google was Chinchilla. For us, it was a shame that we stopped contributing openly so early in the AI journey because we are very far from finishing it. When we saw ChatGPT at the end of the year, we reflected on the fact that there was an opportunity for doing things differently, especially from France. There were many talented people who were a bit bored in big tech companies, and that was how we realized there was an opportunity to build strong open-source models quickly with a lean and experienced team.
[2:30] Arthur: We wanted to correct the direction the field was taking and push open-source models much further. I think we did a good job at that because we’ve been followed by various companies in our trajectory.
[2:40] Matt: Wonderful! So a lot of the drive behind starting the company was the open-source movement?
[2:44] Arthur: Yes, that was one of the driving factors. Our mission is really to bring AI into the hands of every developer. The way it has been done by our competitors is very closed, so we want to push for a much more open platform, and we aim to spread and accelerate adoption through that strategy.
[3:06] Matt: Wonderful. Fast forward to today: you’ve released Mistral Large and have been on a tear of partnerships with Microsoft, Snowflake, Databricks, and more. How do you balance what you’re going to do open-source with what you’re going to pursue commercially while also thinking about tradeoffs? That’s something many open-source companies contend with.
[3:30] Arthur: It’s a hard question, and we’re currently addressing it through two families of models. We intend to stay the leader in open-source, which puts pressure on the open-source family. We need to move faster than software providers traditionally have with this kind of strategy, because AI develops faster than software.
We are constantly thinking about how we should contribute to the community while also getting some commercial adoption through enterprise deals, etc. There’s definitely tension there. For now, I think we’ve done a good job at managing this, but it’s very dynamic. Each week we consider what we should release next for both families.
[4:10] Matt: You have been the fastest at developing models and reaching different benchmarking levels. What do you think gives you the advantage to move more quickly and efficiently than your predecessors?
[4:22] Arthur: I think we like to get our hands dirty. Machine learning has always been about crunching numbers and looking at data. We hired people who were willing to do the groundwork. This has been critical to our speed, and it’s something we want to maintain.
[4:40] Matt: In addition to the large model, you also have several small models that are extremely popular. When would you tell people to spend their time working with you on the small models, and when should they focus on the large models? Where do you think the economic opportunity lies—is it in doing more of the big models or more of the small ones?
[5:06] Arthur: Every LLM provider has made the observation that one size does not fit all. Depending on your application, you typically make different large language model calls: some need low latency and not much intelligence, while others can tolerate higher latency but require more intelligence. An efficient application should leverage both.
The challenge lies in making everything work together. You end up with a system that is not just one model but really two models plus an outer loop that calls your systems and functions. We also want to address the developer challenge of evaluation: how do you verify that your application actually improves when you upgrade to a new model version?
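As a concrete illustration of that pattern, here is a minimal, hypothetical sketch in Python. The model names, routing heuristic, and tool-call protocol are invented for illustration; none of this is Mistral’s actual API:

```python
# Hypothetical sketch of the "two models plus an outer loop" pattern.
# The fake `complete` function and the CALL: protocol are illustrative.

SMALL_MODEL = "small-fast-model"   # low latency, cheap, less capable
LARGE_MODEL = "large-smart-model"  # higher latency, more capable

def complete(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your client of choice."""
    if "weather" in prompt and "[tool" not in prompt:
        return "CALL:get_weather Paris"  # model asks for a tool
    return f"({model}) final answer for: {prompt[:40]}..."

def route(task: dict) -> str:
    # Cheap heuristic routing: latency-sensitive, easy calls go to the
    # small model; complex reasoning goes to the large one.
    return SMALL_MODEL if task["difficulty"] == "easy" else LARGE_MODEL

def outer_loop(task: dict, tools: dict) -> str:
    """Outer loop: call a model, run any tool it requests, feed the
    result back into the prompt, and repeat until a final answer."""
    prompt = task["prompt"]
    while True:
        reply = complete(route(task), prompt)
        if reply.startswith("CALL:"):
            name, arg = reply[5:].split(" ", 1)
            prompt += f"\n[tool {name} returned: {tools[name](arg)}]"
        else:
            return reply

if __name__ == "__main__":
    tools = {"get_weather": lambda city: f"18°C and sunny in {city}"}
    print(outer_loop({"prompt": "weather in Paris?", "difficulty": "easy"}, tools))
```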
[5:52] Matt: What are some of the most exciting things you see being built on Mistral? What excites you about what the community or your customers are doing?
[6:05] Arthur: Pretty much every young startup in the Bay Area has been using our models, fine-tuning them for fast application development. Part of Mistral’s value is that it enables fast applications; we’ve seen web search companies and standard enterprise applications using our models.
The fact that you have access to the weights means you can pour in your editorial tone more easily. The value of the open-source part is that developers have control. They can deploy everywhere and achieve a high quality of service because they can use dedicated instances. They can also modify the weights to suit their needs and push performance close to that of large models at a much lower cost.
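To make "modifying the weights" concrete, here is a minimal sketch of parameter-efficient fine-tuning on the open mistralai/Mistral-7B-v0.1 checkpoint with Hugging Face transformers and peft; the LoRA settings are illustrative defaults rather than anything from the talk:

```python
# Minimal LoRA fine-tuning sketch for an open-weights Mistral model,
# using Hugging Face transformers + peft. Hyperparameters and the
# training loop are placeholders, not recommendations from the talk.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-v0.1"  # open weights on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small trainable LoRA adapters so an "editorial tone" can be
# learned cheaply while the vast majority of weights stay frozen.
lora = LoraConfig(
    r=8, lora_alpha=16, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically <1% of parameters trainable

# From here, train on your own examples (e.g. with transformers.Trainer),
# then deploy on your own infrastructure or a dedicated instance.
```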
[6:58] Matt: What’s the next big thing we can expect to see from you guys? Can you give us a sneak peek of what’s coming soon?
[7:05] Arthur: For sure! While Mistral Large was good, we are working on improving it significantly. We have interesting open-source models in various vertical domains that we will be announcing very soon. The platform currently operates as serverless APIs, and we’re working on making customization a part of it. We’re also heavily betting on multilingual data and models because as a European company, we’re well positioned there.
[7:40] Matt: Many people in this room are already using Mistral models. How should they engage with you and your company? What’s the best way for them to work with you?
[7:46] Arthur: They can reach out to us. We have a developer relations team that actively pushes the community forward by producing guides, gathering use cases, and showcasing what can be built with Mistral models. We’re investing a lot in the community because it ultimately makes the models better.
We want to map what people are building with our models into evaluations that can help us generate better open-source models. We invite everyone to engage with us to discuss their use cases, and we can also gather insights for new evaluations to improve our models over time.
Our commercial models are available on our platform, and they work better than the open-source ones. They’re also available on various cloud providers to facilitate adoption for enterprises. Customization capabilities like fine-tuning, which enhance the value of the open-source models, will be available very soon.
[9:02] Matt: You briefly touched on the benefits of being in Europe. What advantages do you think there are to building a business in France and Europe, along with any drawbacks?
[9:11] Arthur: One advantage is having a strong pool of junior talent. Many people from universities in France, Poland, and the UK can be trained quickly to produce at the level of a million-dollar engineer in the Bay Area for significantly lower costs.
The workforce here is made up of very good engineers and machine learning specialists. Also, we have substantial support from the state, which is more significant in Europe than in the US. European companies also prefer to work with us because we are based here and better understand European languages.
[10:07] Matt: Paint a picture for us five years from now. Where do you think Mistral will be, and what do you envision for the landscape?
[10:15] Arthur: Our bet is that the platform and infrastructure of artificial intelligence will be open. Based on that, we’ll be able to create assistants and potentially autonomous agents. We believe we can become this platform by being the most open one out there, independent from cloud providers.
In five years, I have literally no idea what this will look like. If you had looked at the field in 2019, you wouldn’t have bet on where we are today. We are evolving toward more autonomous agents able to handle more tasks. The way we work will change profoundly, and I expect that AI technology will be so easily controllable that eventually, any user could create their own assistant or autonomous agent.
[11:06] Matt: Awesome! We have about five minutes left; I want to open the floor for questions from the audience. Don’t be shy!
[11:13] Audience Member: How do you see the future of open-source versus commercial models playing out for your company? You made a big splash with open-source, and now some commercial models are even better. How do you imagine that will evolve over the next couple of years?
[11:28] Arthur: The one thing we optimize for is to be able to continuously produce open models with a sustainable business model to fuel the development of the next generation. We expect this to evolve over time, but to stay relevant, we need to maintain our status as the best at producing open-source models.
[11:52] Audience Member: Can you talk about competition with models like Llama 3 from Facebook?
[11:57] Arthur: Meta is making models with Llama, but I’m not sure how open source they will be. We’ve been delivering faster and smaller models. The good thing about open-source is that it’s generally beneficial for everybody. If competition arises, we’ll welcome it.
[12:14] Matt: Your partnerships with Snowflake and Databricks are significant, as your models run natively in their clouds. Why did you pursue those deals, and what do you see as the future for platforms like Snowflake and Databricks in the AI world?
[12:34] Arthur: AI models become much stronger when connected to data. Enterprise data often lives on platforms like Snowflake or Databricks, and we want to ensure that customers can deploy the technology where their data resides.
[12:57] Audience Member: Where do you draw the line between open and proprietary in your models? Are you comfortable sharing how you train the models, or do you limit that information to just releasing the weights?
[13:06] Arthur: We draw the line at releasing the weights because it’s a competitive landscape. There’s tension between staying relevant and disclosing everything. This is a moving line. If everyone starts doing it, we might reconsider. But, for now, we’re not taking that risk.
[13:27] Audience Member: When another company releases weights for a model, what practices do you employ to learn from it?
[13:32] Arthur: You can’t learn a lot just from the weights. While we do look at them, it’s difficult to reverse engineer the architecture or the training recipe from them. The weights compress information so much that it’s not easy to figure out what’s going on.
[13:58] Audience Member: What are your plans for model sizes? Will you continue with smaller models, move to larger ones, or maintain a balance?
[14:02] Arthur: Model sizes depend on scaling laws and your available compute. You optimize for training and inference costs. We aim to have a family of models that includes both small and large options to meet various needs.
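Arthur co-authored the Chinchilla paper, which makes this tradeoff precise. As a rough back-of-the-envelope sketch, assuming the common approximations C ≈ 6·N·D for training FLOPs and a compute-optimal ratio of roughly 20 training tokens per parameter (standard readings of Chinchilla, not figures from this talk):

```python
# Back-of-the-envelope Chinchilla-style sizing. Assumes C ≈ 6 * N * D
# (training FLOPs for N parameters on D tokens) and a compute-optimal
# ratio of ~20 tokens per parameter; both are illustrative assumptions.

def compute_optimal_size(flops_budget: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly balance a training budget."""
    # With C = 6 * N * D and D = r * N, solve N = sqrt(C / (6 * r)).
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

for budget in (1e21, 1e23, 1e25):
    n, d = compute_optimal_size(budget)
    print(f"C={budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

In practice, providers frequently train well past this ratio, accepting extra training compute in exchange for a smaller model that is cheaper to serve, which is one way to read the training-versus-inference cost tradeoff Arthur mentions.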
[14:27] Audience Member: Are there any plans for Mistral to expand into the application stack, such as releasing custom GPTs and assistants APIs?
[14:32] Arthur: We’re really focusing on developers, but the line between developers and users is thin for this technology. We’ve released an assistant demonstrator called Le Chat. The goal is to expose this to enterprises, allowing them to connect their data with our technology.
[14:56] Matt: Thank you, Arthur, for sharing your insights with us today!
[14:59] Audience: [Applause]
[15:00] Matt: Thanks for being with us!