It’s crazy how the AI and natural language processing (NLP) landscape has evolved over the last five years. Five years ago, around the time I finished my PhD, if you wanted to work on cutting-edge NLP, your options were relatively limited.
Recently, I went on the job market again and found that it has become much more diverse. In this post, I want to highlight some macro trends that I observed, as well as the reasons I joined my new company, Cohere, which may be helpful in guiding your own job search.
Note: This post reflects my personal opinions and not those of my previous and current employers. It is written from my perspective as a Europe-based researcher focused on NLP. If you are interested in AI companies but have a different skillset, some of these thoughts should still be relevant to you.
AI Job Market Trends
1. Research has become more applied.
In the past, most problems at the forefront of ML and NLP were firmly in the purview of fundamental or basic research. Because models were not yet powerful enough, datasets reflected simplified evaluation settings that were feasible at the time and typically far removed from real applications. To work on such cutting-edge problems, you generally had the choice of joining academia or one of a handful of big tech labs (Google Research/Brain, DeepMind, FAIR, MSR, etc.).
It could take a team months or years of dedicated work for research advances to make their way into products, if they made it at all. An exception is machine translation, where research breakthroughs such as the emergence of statistical MT and neural MT resulted in concrete product improvements.1 Applied research departments, on the other hand, worked directly on improving a specific application.
With the emergence of pre-training (NLP’s ‘ImageNet moment’) and increasingly powerful models, the gap between fundamental and applied research in NLP has steadily narrowed: the integration of BERT-based representations led to one of the biggest quality leaps in the history of Google Search, and the recent generation of large language models (LLMs) has enabled a plethora of new applications.
Problems that were previously in the domain of basic research (how to measure generation quality, how to teach models to reason, how to model long-range dependencies, etc.) now impact real-world applications. As a result, new research advances can have a much broader impact, which opens up many new opportunities and emerging research areas. It also means, however, that researchers must consider challenges around the safe and responsible use of such technology.
As the Generative AI space heats up, new research breakthroughs are perceived as providing an edge over the competition. As a result, publishing them has become more challenging. In addition, with increasing proximity to product, purely curiosity-driven research has become more difficult: when most other research has immediate product impact, how do you justify working on an unproven direction? Researchers thus need to balance short-term impact with long-term research potential, with the scales increasingly tipping toward the former.
To encourage long-term research in this application-oriented climate, companies need to create an environment that still rewards open-ended directions. And with industry focusing on applications, researchers in academia should turn to unexplored directions and blue-sky ideas. While compute requirements for research have increased, there are plenty of less compute-intensive directions to pursue.
2. Startups are a serious alternative to a PhD.
When people have asked me in the past whether they should do a PhD, I generally told them that it’s well worth the time investment. Not only does it unlock research jobs that require a PhD, it is also a great way to focus on your personal growth.2
In light of the applied nature of current research problems, there is another path that exposes you to cutting-edge AI work: joining a startup. To be sure, startups—particularly early-stage ones—are not for everyone. They require a person with a certain type of mindset and motivation: Someone who enjoys solving real-world problems and having a tangible, direct impact; who can work autonomously and without much guidance; who thrives in a hectic, unstructured environment and can handle ambiguity.
But if you’re comfortable in these conditions, you can acquire certain skills and knowledge much faster than in a typical PhD. You will also get hands-on experience with emerging methodologies such as instruction and preference tuning, red-teaming, and LLM alignment, which can prove invaluable for your career. Of course, at a startup your work is typically determined by the company’s needs rather than your own interests, so you need to be flexible.
A PhD is still the best option for you if you would like to follow your own curiosity and focus on your personal development; if you enjoy going deep and fully dedicating yourself to a topic; if you value collaboration and mentorship; and if you enjoy being creative and coming up with genuinely new ideas. A scientific mindset and other research skills, such as designing ablations, testing hypotheses, publishing, and developing research taste, are also more easily learned during a PhD.
3. ML has become less open and more polarized.
An amazing attribute of the ML community has been that much of ML development and research has been conducted in the open. The top ML journal, JMLR, was founded in 2000 with the goal of providing open access to its publications. Conferences provide free access to their proceedings online. ML frameworks such as TensorFlow and PyTorch were originally developed by companies and then open-sourced. In NLP, journals and conferences are similarly open-access, and open-source libraries such as Transformers are common building blocks.
Early pre-trained models such as ELMo, ULMFiT, GPT, and T5 were open-sourced, as this enabled widespread adoption. However, this landscape of radical openness has shifted. Stalwarts of open-source AI such as OpenAI and Google have gradually released less information about their models. Starting with the first generation of LLMs, models such as GPT-3 and PaLM were increasingly locked behind APIs, although their papers still described the architecture and data in detail. More recent models such as GPT-4, PaLM-2, and Gemini are not only closed-source; the corresponding papers reveal nothing about the architecture and training data.
This lack of knowledge sharing may impede progress in AI development. Fortunately, other companies and organizations have continued to release a steady stream of open-source LLMs. Still, even among open-source models there is a spectrum of openness: the exact composition of the training data, for instance, often remains a secret. Only a few models, such as BLOOM or the recently released OLMo, are truly open. Among the big tech companies, Facebook and Microsoft have shown a renewed commitment to open source. Even Apple, with its reputation for secrecy, has been quietly open-sourcing AI projects.
For industry researchers, the trend towards closed source means that it has become harder to publish. In the past, researchers at top AI industry labs were often able to publish a steady stream of papers, similar to their academic counterparts. For LLM-related work, this stream has been reduced to a trickle, and new advances may eventually see the light of day only as patents rather than research publications. It has also become more difficult to publish individual contributions, as advances are more likely to be absorbed into large collaborations that produce a single report.
4. Research is concentrated in large projects.
The average number of authors on a publication has steadily increased. The trend started in particle physics, where author counts surged due to massive global collaborations such as those at the Large Hadron Collider. The emergence of LLMs has brought this trend to ML and NLP. Recent examples of such large-scale collaborations are BLOOM (300+ authors), GPT-4 (200+ authors), and Gemini (900+ authors). While several successful LLMs have been produced by small teams, the number of authors of an LLM has generally grown with its number of parameters.
LLM projects require not only people with research skills but also strong software engineers who can design systems that scale efficiently to hundreds of billions of parameters and trillions of tokens. In addition, LLMs require disparate sets of expertise, including data processing, optimization, fine-tuning, RL, evaluation, safety, infrastructure, and multi-modality. As a result, and due to their strategic importance, the size of teams working on the latest generation of LLMs has grown rapidly.
This contrasts with the previous generation of AI breakthroughs such as AlphaGo, which were executed by much smaller, focused teams. Such scale poses challenges for effective execution and prioritization, increasing friction and making it harder to reach decisions quickly. A less direct downside is that, as more people are absorbed into LLM-related work, research directions that do not directly relate to the latest generation of LLMs, such as the development of Transformer alternatives, are deprioritized.
5. More companies, more opportunities.
The advent of LLMs led to a wave of new companies leveraging this technology, and prompted existing companies to figure out how to incorporate these models into their products. YC, the prolific startup incubator, has already funded more than 100 generative AI startups. A recent McKinsey report estimated that Generative AI’s impact on productivity could add trillions of dollars to the global economy, with most of the expected value concentrated in four use cases: customer operations, marketing and sales, software engineering, and R&D.
Generative AI, however, is still only at the beginning. Many research challenges remain, including mitigating hallucinations, ensuring trustworthiness and attribution, aligning models to reliably elicit desired behavior, and enabling robust reasoning. To use LLMs effectively for business use cases, we furthermore need to conduct successful pilot studies, assess biases and risks, define suitable guardrails, rethink core business processes, and develop new skills in the workforce, among other steps.
With all of these new AI companies, it is difficult to choose the one that is the best fit for you.
Why I Joined Cohere
Below, I highlight the criteria that led me to join Cohere. Many of these considerations are personal, but I hope they can serve as inspiration or help guide your own job search.
1. Openness and community
In addition to building powerful proprietary enterprise models, Cohere supports openness and inclusion through its non-profit research arm, Cohere for AI (C4AI). C4AI’s openness is not an afterthought but part of Cohere’s DNA: the idea for C4AI goes back to FOR.ai, a decentralized ML collaboration (which I highlighted in a 2020 newsletter) initiated by Cohere founders Aidan Gomez and Ivan Zhang, among others. C4AI published more than 30 papers in 2023. Through its Scholars Program, talented researchers from diverse backgrounds are mentored by researchers across the organization. C4AI also organizes large-scale community initiatives, including Aya, a massive collaboration to develop a large multilingual open-source instruction-tuning dataset and model. Beyond C4AI, Cohere invests in programs that make ML more accessible, such as LLM University.
2. A mature start-up
At an early-stage startup, you can move fast but things are hectic and unstructured. In big tech, there are established tools and processes available but bureaucracy can impede progress.
Cohere occupies a middle ground. It has been around for a while and had time to build structure and processes, which make it easy to hit the ground running and directly have an impact without getting bogged down by unrelated tasks. The core components of the LLM pipeline are firmly in place and are being refined and iterated upon.
At the same time, the team is small enough so that you can have impact across the entire LLM stack and own crucial parts of the pipeline. There is little friction and red tape, which makes it easy to prototype and test new improvements and to collaborate across teams.
3. Enabling remote work
With a new baby, having a flexible working arrangement that would allow me to work remotely was very important to me. However, not every company that allows its employees to work from home is truly set up for remote work. It’s worth checking: Does the culture support remote work, or will you miss out on conversations at the micro-kitchen? Do the tools enable working remotely? Are there opportunities to meet in person?
For companies with multiple offices, an important factor is where the company’s main office or a project’s center of gravity is located, as it may be harder to achieve the same level of impact from a satellite office or while working remotely. In the same vein, working with colleagues in similar time zones often facilitates collaboration. At the beginning of your career, it is often a good idea to work in person, as it makes it easier to seek advice and learn from colleagues.
At Cohere, half of all employees are remote. Tools and benefits support remote work, and events provide equal access for distributed teams. Cohere has offices in Toronto, London, San Francisco, and Palo Alto that you can visit to work with team members in person. Many people working on ML and modeling at Cohere are based in Europe or on the US East Coast, so it’s easy to collaborate across these time zones.
4. Alignment
Another thing that was important to me was being aligned with the overall goals of the company. While most companies are profit-driven, some are more serious than others about having a positive impact on society. This is reflected in the way they develop their technology (do they prioritize safety and ethics and put appropriate safeguards in place?) as well as in the programs they organize, the causes they support, and the way they interact with the community.
Cohere aims to develop AI models in a responsible manner to serve humanity, an important objective. Making models more accessible across languages is not only a personal objective but serves this overarching goal as it enables Cohere’s customers to reach their users across the world.
5. Team and culture
Cohere’s team is world-class. I had worked with a few of its members at Google DeepMind and knew others from conferences or by reputation. Cohere has teams with deep expertise across many LLM areas such as pre-training, RL, and retrieval augmentation and search; the latter two are crucial for knowledge-intensive and enterprise use cases. Beyond research, senior leaders at Cohere have experience building and scaling products and businesses to billions of users.
In addition, I enjoyed all my interactions and conversations with Cohere employees. Everyone I got to know was kind, humble, and genuine. The culture is collaborative and everyone is aligned towards the same objective.3 People are motivated by helping each other succeed.
For more information about careers at Cohere, check out the careers page. Cohere is hiring across many roles.
For a job market perspective from an RLHF researcher, check out Nathan Lambert’s Interconnects post.

1. Other examples include OCR, image recognition, predictive text, and dependency parsing (when it was still widely used as an intermediate step in NLP pipelines).
2. I’ve collected some advice in 10 tips for research and a PhD.
3. A company’s culture is difficult to assess from the outside. Try to ask current and former employees about their work environment and read reviews of the company online.