Podcast transcript: Can AI and IT teams coexist?

This automatically generated transcript is taken from the IT Pro Podcast episode 'Can AI and IT teams coexist?'. We apologise for any errors.

Adam Shepherd

Hello, and welcome to the IT Pro Podcast. My name is Adam Shepherd,

Zach Marzouk

and I'm Zach Marzouk,

Adam

And this week we're talking about artificial intelligence.

Zach

The past few years have been a boom time for AI, with machine learning tools and use cases growing in number and complexity. Larger companies have been gradually investing in the infrastructure and skills needed to deploy AI systems in the field, and launching early pilot projects to test the waters.

Adam

Fast forward to today, and organisations have now started moving beyond the proof of concept stage and rolling out wider AI projects across the business. But in many cases, the biggest obstacle to these efforts isn't a lack of board-level buy-in, or the technical complexity of training and deploying AI models.

Zach

In fact, the most significant challenges for many organisations' AI endeavours come from the IT department itself. Joining us this week to tell us more about what causes these clashes, and how businesses can prevent them from happening, is Luke Wignall, Director of Technical Product Marketing at Nvidia.

Adam

Luke, thanks for being with us.

Luke Wignall

Thank you for having me.

Adam

So Luke, I think most people would assume that IT teams and AI professionals would get along quite well. Can you explain why these two groups are butting heads?

Luke

Ooh, that's opening... that's a long-standing battle. So I really believe it boils down to, let's call it, goals out of alignment. You know, the goal of an IT department is to deliver consistent, stable services. And so they do everything in their power to figure out how to nail down the right building materials, limit the number of configurations, and do everything they possibly can to provide the best possible consistent long-term support. Now switch over to your AI practitioner, your data scientist, and they're data explorers; they're looking to solve all sorts of different problems with different amounts of data, and that right out of the gate means a need for flexibility to iterate over various different platforms, to basically pull from their palette of colours or their toolbox of choice and assemble some sort of new way to tackle that problem. And every change turns into a helpdesk request. So you've got IT saying, 'I just gave you that', and the AI person saying, 'Yeah, but now I need something different'. I think that point alone is where the conflict really begins. And also keep in mind that historically, large data analytics has happened in clusters that existed more in the corner of the data centre, and tended to operate around special budgets, special teams, whatever. Now we're seeing this democratisation of AI and data science, which means it's coming over into the core data centre, which is a very different IT team, with the sort of goals that I described. So I think that's really the crux of it, if you will. And what exacerbates it is really a lack of a common language between the two; they just operate in such different worlds.
And so they're trying to find a meeting place in the middle, in order to figure out how to support each other and how to ask the right questions. And I think that's a struggle.

Adam

So it's stability versus agility, almost.

Luke

I think that's a great way to sum it up. Thanks for taking all five minutes of me and turning it into a great sentence. Perfect, I'm stealing that.

Adam

Feel free!

Zach

What would you say is the biggest sticking point for most teams?

Luke

I think a customer quote fits here perfectly. We were talking to a university, and we offered to assist them with a POC or what have you, to get started. Their immediate reaction was: no, no, absolutely not, we're not prepared to support those users. Those users are three to five times - that was a shocking statistic, by the way, or metric - three to five times more support-intensive than any other user in their organisation.

Adam

Wow.

Luke

So I think just that, out of the gate, tells you everything, right? These are extremely... well, I mean, these are very valuable resources, right? These researchers, these data explorers, are solving some of the most challenging problems in organisations today, and so I think they have a lot of momentum behind them. And IT is looking for that sort of steady support relationship, and it's just different, right? So I would say that's the biggest early sticking point.

Adam

I mean, that makes a lot of sense, when you consider that very few teams within an organisation are going to be requesting infrastructure changes at all, let alone infrastructure modifications on the kind of frequency and timescale that AI teams deal in, right?

Luke

Oh, absolutely. It's one thing to, say, be in a virtualized environment and need a little more horsepower, or maybe you're in a vGPU-supported GPU environment and you want a little more GPU. But it's another thing to demand multiple GPUs, when you really want some big horsepower. And more critically, now you're beginning to work with larger and larger sets of data. So it's not just 'I need a lot of storage'; it's 'I need a lot of networking as well', to get to that data and pull it back and forth. And so suddenly those demands can become very extreme; you can really push the limits.

Adam

So what kind of impact do these disagreements have on organisations?

Luke

Great question. Our own IT team at Nvidia is a team that combines both IT infrastructure and AI, and in fact, the reason we merged these two teams was to try to solve some of these problems. Early on, one of the metrics that came out of that team measured the time from ideation through to a functioning solution - that could be an application, some sort of AI app - something we're very proud of speeding up, various elements of it and the entire pipeline. And 70% of the total project time was spent in exchanging helpdesk tickets to get the right-sized workbench for that AI practitioner to work on.

Adam

Wow, like the actual physical desk?

Luke

Not so much the physical desk, but, you know, that Linux virtual machine or whatever it might be that they're going to actually start the project on. But it might as well be a physical desk, right? I mean, if they don't have it, they're sitting there in a chair with nothing to work on. And if you think about it, you can bring as much horsepower as you want to the table to speed up your data analytics, your training, your inferencing, whatever; if you can't get the right resources to even get started on, then nothing's speeding you up, right? So part of the reason we merged the team was to help solve that communication problem, to get these two speaking the same language so that they could speed up that process. And I think that's absolutely critical. As is taking out some of the complexity. That's another piece we haven't talked about yet: the complexity that exists in all of these things. We've talked about all the resources that are required; now think about the sorts of support tools that will be required to deliver those and to support them over time, to increase as well as decrease the resources as needed, and to be as flexible as those practitioners require. I think there's a perception of just a vast amount of complexity there. And the reality is, we've worked really, really hard to assemble a software platform, Nvidia AI Enterprise, that collects all these libraries and puts them into a much more functional, enterprise-quality supported bundle that enterprises can now deploy. The sole purpose of this is to take what was that complex toolbox the AI practitioner expects, bundle it, support it, and put it back into IT's hands, so that it's easy to deploy and easy to support. So it's not just day zero, what am I installing, but then day two and on: how do I take care of it?
How do I make sure I can support those users properly?

Zach

Just out of curiosity, can you tell us a bit more about what these tools are?

Luke

I think the hardest part of answering that question is that it's a very long list of tools. Like any big toolbox, right, you walk into somebody's garage and look at their wall of tools; that tool collection is assembled by the AI practitioner or data scientist in order to best select what they need to get whatever job done. And so what we've assembled are the leading tools for training and the leading tools for inferencing, put into a collection of containers that we call NGC. Those containers can then be pulled down onto a variety of platforms; it doesn't just have to be a Kubernetes or a core data centre Kubernetes instance, it can just be a data science workstation or a laptop. But what's most critical is assembling those tools into an enterprise bundle, a collection that's then supported long-term and gives that confidence back to the IT side of the house, as well as the AI practitioner, that what they're getting is an optimised, high-performance, long-supported set of tools that they can then build solutions with.

Zach

And did you find that the IT department was familiar with these tools or did you have to like convince them?

Luke

Oh, no, great question. Generally speaking, this is a Greek versus Latin problem; these are two different languages, right? You put a group of IT people in a room and ask them what they think AI and data science is all about, and generally speaking, they're going to point back to that cluster community, that large collection of servers that's not in their domain, that's handling big, big data. And all of a sudden, you've got this AI practitioner, this data scientist, who is at the very beginning of some problem exploration, and all they care about is: give me a desktop to work on, give me sufficient horsepower, give me networking, storage, get me access to the data so that I can get started working. They frankly don't care about the underlying IT. So just as I would answer your question and say IT doesn't speak the language of AI, AI doesn't speak the language of IT. As far as AI is concerned, just give me a Jupyter Notebook, give me access to the data, I just want to get started. Problem is, an hour later, they're like, I think I found something, I need a lot more data. Now I need, I need, I need, in order to explore. And IT is like, oh, no, no, open a helpdesk ticket; that's a super complicated thing you're asking. And it is, to be fair; IT has a complex job, and this is outside their comfort zone. Now, one of the things that we've kicked off is, call it a service, that we call Launchpad. And Launchpad is, in short, an instantaneous POC environment. One of the challenges that a lot of organisations have, especially with supply issues and other things right now, is getting their hands on enough gear to build out a decent POC. And there's a lot of questions around that: again, IT trying to figure out how to build and support that, AI saying, I don't even know what you're asking me; a server, what's a server?
And trying to figure out how to bring all that together. So we've created a very... I want to say production-like, but that sounds like it's not production. This is actually hosted in Equinix; it's very much production, and it resembles anybody's core data centre. And these standard 2U servers, equipped with GPUs and DPUs, are something that we can then allow a customer access to for up to two weeks. We give them a wide variety; I think we've just passed 20 different labs that are full-blown experiences. Not a scripted, more traditional hands-on lab like you see at an event or something like that, where you sit down and it's sort of queued up for you. This is, essentially... it's like a cooking show: we're giving you the ingredients, we're giving you the recipe, and we're wishing you good luck. Not really; we're there to help you as well. But the idea is that you're actually going to go into Launchpad and build one of these labs. For example, we have an IT support bot that you can build; you can actually export that at the end and take it with you if you want. So we didn't want to create an environment that was like a class or a training sort of hands-on lab environment, but instead a truly roll-up-your-sleeves, work-within-these-tools experience where you create something that you could actually leave with if you wanted to. And that to us is a much more powerful learning experience. The labs are designed around two different personas. They can be executed by an IT professional, with an emphasis on the IT side of the experience, so deploying, managing and supporting that environment; they can then go forward and do the lab and learn more about AI and data science techniques, tools and their application. Or it comes from the AI persona side.
And that's weighted more heavily towards the development of a model, the training of that model, and then creating some sort of inference model or a bot, for example, but still with an opportunity to see what the IT person is having to support and do. So we're trying very hard to create an environment where both sides get a complete experience and walk away with a far greater level of comfort.

Adam

So Luke, you've mentioned that IT and AI teams are in many ways speaking different languages, and coming from different perspectives. Is there any way for the two groups to meet in the middle and reach a compromise, particularly around the needs of both teams when it comes to workflows, timescales and things of that nature?

Luke

Well, absolutely. Earlier I mentioned Launchpad, and I think we've tried to create a safe sandbox, a neutral platform. Neither side owns it; it's something that we're giving to them. I mean, if you think about the value of this, it's incredible, and I think that's really a great starting place. But I would also argue it's one of the first times we've seen somebody assemble, for example with Nvidia AI Enterprise, what was very much a community-driven collection - a lot of open source tools, a lot of the various elements that AI practitioners and data scientists are leveraging - but bundled and presented in a way that IT can understand and support. I think that's one of the first times we've seen that, and it's a powerful move on our part to try to bridge that gap and get both those teams feeling comfortable and secure with that solution. I think up till now, it's been very much the AI practitioner rolling their own; they create something that IT isn't necessarily in full control of, and that automatically creates conflict around security and support issues, and you're off to a bad start. Now, by presenting this sort of common supported platform, I think we've given them something that they can both feel comfortable using, as opposed to one side feeling like maybe they're not getting what they need. And I think that's critical.

Adam

That's a good starting point, then, but how well does that scale across the organisation when organisations want to start, for want of a better term, operationalising this technology a bit more and kind of working on really kind of business wide projects with it?

Luke

Well, I've been pretty focused on the software layer. But if we look at this as an entire platform, it's a platform solution that's built, at the very bottom, on NV-certified hardware. We're working with the OEMs to develop platforms; these are standard 2U servers, so you can go to your favourite vendor and pick the standard 2U build that you normally get, now with acceleration added. It's an accelerated version of what you were buying before, fully supported, fully configured; you're not having to figure out how to do this on your own, or how to customise existing platforms or what have you. So even starting with that layer, we're offering the sort of uniformity that's required, right? IT doesn't want to add five or six more builds, or worse, custom builds; instead, we're giving them something that is just a small tweak off what they're already buying today. And I think that gives them something that they can then use and scale out, because it fits into their model; they're looking for something that, if they had to, they could move a non-accelerated legacy application to. These are core data centres; they're almost uniformly virtualized in order to be properly supported. And thanks to that level of virtualization and flexibility, we're working with the OEMs to insert properly supported, commonly configured SKUs of server nodes that fit into their perception of how to support all their infrastructure. So it's not like they're having to figure out how to fit a unicorn node into an otherwise homogenous environment; instead, these are just accelerated versions of what they're used to. And I think that makes it really easy to scale out.

Adam

So working on the same kind of principles as hyper converged infrastructure, then.

Luke

Yeah, absolutely. Okay. Easy to deploy, easy to use, that's what it needs to be.

Adam

So then how does that interact with the process side of things? Because, as you mentioned, one of the pain points for IT is having to facilitate requests from the AI team at a much greater pace than they would normally. Adding a new 2U node to a data centre - while easier if it's a prepackaged, single converged unit than adding new storage and GPU compute capacity in a piecemeal way - is still quite a significant project for an IT team, compared to the average support requests you'd get from rank-and-file employees, right?

Luke

Yes, but I would fall back on my comments about NV cert, and then move up the stack and say: this is why we work so closely with VMware, for example, and with Red Hat and other partners. If you want to be successful in the core data centre, you have to work within that ecosystem; you have to be a first-class citizen of that ecosystem. And we are, right? We have been for some time. And I think what's different is we're taking the lead and taking what was... it's funny, I'm hesitating because I don't want to describe AI tools and data science tools as not first-class citizens, but they haven't been part of the large, core data centre ecosystem at any point recently, right? So now there's this opportunity to introduce those, and to find ways to make it... it shouldn't be an abrupt interruption, or a fork in their processes, to suddenly have to support this; it should just slipstream in. We've added accelerated servers, we've added software that comprises the tools AI and data science people are already using, and we're simply putting it into a support model. And IT, you're not exposed; you're not having to figure out how to support something you're not used to, you're not having to go it alone on things that might have come out of the open source or Linux community and don't necessarily have a support element to them. We're bundling all that together and giving it to them in a nice, stable, easy-to-deploy and easy-to-support solution.

Adam

I guess what I'm asking is, how far along are we in the process of bringing these two sides closer together; in allowing AI practitioners to have their expectations met without IT departments feeling like they're having to work massively outside their comfort zone in terms of their processes, the technologies that they're used to working with, the pace of delivery that they're used to - all of that kind of thing.

Luke

I think that's actually a great question, because I think we've finally made it over that sort of milestone, that benchmark of maturity, in terms of our solution stack. I've mentioned the certified OEM platforms, I've mentioned Nvidia AI Enterprise, and I've hinted at solutions like Riva, that's conversational AI, or Merlin, that's recommender systems, or Metropolis, safe city and computer vision. There are all these elements coming together into building blocks; so not just the bag full of Lego bits that all hurt to step on, but now we are beginning to put together full-blown Lego kits, right? And these are far enough along that, to answer your question, I think it actually does make this easier and faster and less disruptive for IT to deploy. And this, I think, is an element I don't know that we've talked about yet: it's one thing to be talking about a wise, sage old data scientist who's been doing this a long time, and she or he has got their whole toolkit already created and knows what they're doing. But, and the term gets abused repeatedly, it's about democratising this, right? If I'm making this easy, and I keep talking about making it easy, then it's putting these tools in a place where somebody doesn't need to be super deep in development skills, or super deep into data science, in order to get started. And then we go back to Launchpad, and we talked about the labs; there's a great learning platform, and you put all this together. So back to your question, I think we finally are making this very easy to deploy and use. Are we there yet? Yeah, I think we are there. Long road to go, but a lot more that we're working on. Again, GTC, don't miss it; lots of announcements about this.
But we wouldn't be at a point where we're beginning to talk about the little stuff, or the big stuff in the form of bundles, if we didn't have the level of maturity that we've gotten to.

Zach

So Luke, how can businesses, particularly technology leaders like CIOs and CTOs, preempt this kind of friction between AI and IT? Are there any specific processes that can be put in place to smooth over that relationship?

Luke

Great question. I'm sitting here thinking about how best to answer it, and the way I'm processing it is by thinking about all the customer engagements I've been on for the last, call it, 12 months. Many of those have been with IT leadership and decision makers, certainly, as well as AI leadership and decision makers, and there have been some great quotes. Just talking about accelerated servers, right, so disregard the rest of the stack for a second and just start at the bottom: I had a senior IT person at an automotive manufacturer tell me, 'When I saw this meeting on my calendar this morning, I had no idea I would leave this meeting excited about core data centre 2U servers again.' It's just not something they think about, right? That's a decision that was made I don't know how long ago, so it's an awakening that's occurring even at the leadership level. And they wouldn't be interested in something as fundamental as that core server building block if it weren't for the fact that the rest of this is an awakening for them as well. They see the value, they hear the value, they witness it, and so they know the importance. Now, in terms of what's worked for teams that are progressing through this: in almost all cases, it's bringing both parties to the table. My much earlier example of our own IT team, being a blend of AI practitioners as well as core IT, I think was critical. If those two teams aren't working together to solve the problem, then they're working separately, and that's usually not solving anything. So I think that's step one. There's a skill set question in there as well: do I need to hire the right people? Do I need a large pool of this or that sort of skill set? And I would say that we're working very, very hard to make these tools easier to consume, easier to understand and support; so again, meeting both sides at the table.
And I think that's a big piece that goes back to Adam's question about maturity; I think that maturity is beginning to ease this for those executive decision makers. And then Launchpad. I'll come back to that, because I think it's insanely valuable, and the sort of thing that I'm not sure anybody really thought of until now, which is: hey, put your teams in the sandbox together. So not just at the same table, but put them in this lab and have them work together for two weeks. And it doesn't cost you a dime; I mean, it couldn't be any easier. So I think those are the sorts of things that need to be done, and they're working very, very powerfully with our customers.

Adam

Unfortunately, that's all we've got time for in this week's episode, but our thanks to Nvidia's Luke Wignall for chatting to us today.

Luke

Thank you very much for the time, this was great.

Zach

You can find links to all of the topics we've spoken about today in the show notes and even more on our website, at itpro.co.uk.

Adam

You can also follow us on social media as well as subscribe to our daily newsletter.

Zach

Don't forget to subscribe to the IT Pro Podcast wherever you find podcasts. And if you're enjoying the show, leave us a rating and a review.

Adam

We'll be back next week with more insight from the world of IT but until then, goodbye.

Zach

Bye!

ITPro

ITPro is a global business technology website providing the latest news, analysis, and business insight for IT decision-makers. Whether it's cyber security, cloud computing, IT infrastructure, or business strategy, we aim to equip leaders with the data they need to make informed IT investments.

For regular updates delivered to your inbox and social feeds, be sure to sign up to our daily newsletter and follow us on LinkedIn and Twitter.