Feross Aboukhadijeh - Socket

[00:00:00] Introduction

Feross: Even if you had, let's say, a five-year history on the project, even if you're the maintainer yourself, you can go rogue one day. And that's happened, too. It's actually happened quite a lot in the last couple of years: people have sabotaged their own projects. And for those people, what is the defense other than evaluating the code itself?

Andrew: Hello.

[00:00:24] The Journey into Open Source

Andrew: Welcome to the DevTools FM podcast. This is a podcast about developer tools and the people who make them. I'm Andrew. And this is my cohost, Justin.

Justin: Hey, everyone. We're really excited to have Feross on the podcast today. If you've tuned into the open source world at all in the last several years, you know Feross's work. We're big fans and incredibly excited to have you on, Feross. But before we get started, would you like to tell our listeners a little bit more about yourself?

Feross: Sure. Thanks for having me on. It's great to be here. So, I'm a developer. I've been doing open source software for most of my adult life. I got into Node and the npm community kind of right out of college, and I got hooked into that world.

I had always thought it would be really cool to be an open source maintainer, because it felt like the people who developed that type of code were magicians. They were able to write this God-mode-level code that powers all the stuff that we do.

Most of the time, when we start a project, we get it going by installing some framework or some big dependencies. A lot of us are using open source operating systems, or operating systems that have open source in them in some way. And so I always thought it would be cool.

And then I got really obsessed with this project called WebTorrent, which I think we're going to talk about a little later. That was kind of my gateway into open source. I wanted it to be an open source project, and I ended up writing a whole bunch of packages in order to make that project a reality.

And before I knew it, I was an important maintainer in some sense, in that my code is relied upon by all these companies. And I was like, why are these companies using code written by a random twenty-something? They don't even know. And it was like, oh, that's cool.

And then I ended up learning a lot about the software supply chain, as they call it, and how software gets developed today, and that led to starting the company, Socket. So yeah, that's my story.

Andrew: Yeah, to say that you're a prolific open source developer is cutting it short. I think there's a handful of names that, if you asked who's done most of the npm ecosystem, it'd be you, Luke, and then Sindre. You're just in everybody's package locks. So thank you for all the work you've done over the years.

Like you said, you got into it out of college, but what was your first step into it? You said there were some open source magicians out there. Were there any projects that you had hoped to contribute to, and did you ever end up contributing to them?

Feross: Hmm. Yeah, that's a good question. First of all, I just want to make a quick point. A lot of the code that gets into people's dependency trees, from people like Sindre or me, is honestly sometimes the most boring type of code.

The code I'm most proud of is mostly in WebTorrent, and that doesn't have nearly as many downloads as some of the popular packages that are used in, say, webpack. Often it's the most boring code: you wrote some utility, a hundred-line or five-hundred-line function that solves some gnarly problem.

It's not that interesting. It's something that manipulates binary data, or some annoying parser that nobody wants to write, and then some popular project uses it, and before you know it, you're in everybody's dependency tree. I have to say, sometimes it's not even the code you're proud of that ends up getting the most downloads.

It's actually the opposite. The more boring and trivial it is, the more people install it, because more people have that problem than some really cool torrenting problem, which is going to be tailored very specifically to only certain people.

But to your question about how I got interested: I actually watched this video that Paul Irish posted about ten things he learned from the jQuery source code. I always had a desire to contribute to that project, because it was something I used in all my websites when I was learning web development.

I never ended up contributing, though. But it did break that illusion that people who do open source are some type of magicians. I was able to look at the code, and yes, it was more complicated than any other code I'd looked at, but the way he explained it and walked through it in the video left me with this feeling of, oh, actually, I could probably do this. And then I ended up writing a library in Python, my only Python library and the first open source library I wrote, called SpoofMac. I used it to get free Wi-Fi at airports by changing my MAC address. When you run out of the 30-minute time limit, you can change your computer's device identifier, also known as the MAC address, and then you appear to be a new device to the Wi-Fi router, and it gives you another 30 minutes.

So I posted that online, it got decently popular, and then I was hooked.

[00:05:51] The Challenges of Open Source Funding

Andrew: So, with lots of packages comes lots of responsibility. We've talked about this topic with many other open source developers, but how have you managed all of the different packages you've put out over the years?

Feross: Well, it depends on what time frame in my life you're asking about, because today I don't feel like I'm doing a very good job. In fact, I'm kind of not doing a good job at all. But the arc has really been like this: when you first start in open source, you're super happy when anybody uses your code.

The first time you see any downloads, or anybody opens an issue, you're so excited. Oh my gosh, they found a bug in my project. That's so great, because it means they're using it, right? And then over time you start to get more and more issues, and that's still pretty fun.

Now you're getting a lot of users, and maybe you get invited to give a conference talk about one of your packages. It's all very exciting. And then at some point you cross this threshold where you wake up and there are 40 issues that were opened yesterday. There's no way you're going to get to those, and they're all random, esoteric things. They're not even interesting problems anymore.

They're like, oh, it doesn't work on some random Linux variant that nobody uses. And you're like, I guess I have to debug this. At least for me, the way I treated it was that this GitHub issues list was my to-do list. I would wake up in the morning and just do everything on the list.

So imagine a publicly, globally writable to-do list that anybody can add items to. That was what I was doing with my life every morning. And then at some point, I don't know what it was, but it hit me: some of the people opening these issues work at FAANG companies, and they're probably making $300K or $400K a year. They're opening these issues, and I'm literally unemployed, doing this for fun, thinking, oh, I guess I'll go do this for you for the next three days. No offense to the people who use open source and open issues; that's fine. But I realized I needed to change my relationship to issues, the way I think about them. I don't have to do all of them, and I don't have to be responsive to all of them.

I can actually focus on what I think is important and where I want the project to go, which is probably obvious to a lot of people. But I know that type of mindset is decently common, and I think it contributes a lot to maintainer burnout: wanting to be a good maintainer and not let down the people who decided to use your code.

It was pretty ingrained in me and in a lot of other people. I think at some point you just have to declare bankruptcy and say, all right, I can't do that anymore. One thing that helps is giving access to the project to people who have contributed. Being very liberal with sharing access really, really helps.

I did that on a lot of projects, including WebTorrent and StandardJS, and it's helped so much. Those people are now really running those projects and deserve most of the credit for their continued success. So don't be stingy with sharing access.

I think that's the best tip I have for maintainers. You can always reserve npm publish rights for yourself if you're worried about somebody you've never met in person publishing something malicious, or something you don't agree with. But giving them the GitHub commit bit is pretty low cost.

The worst they can do is push some bad commits. You can always back those out, or even force-push over them if you really don't want them around, and then remove their access. So there's a way to do it in stages, where you build up trust first.

Justin: This does feel like the classic maintainer arc, though. You start off with this optimism. You see open source maintainers out there and you're like, oh, these people are so cool, I want to do this. And then you stumble into it, and suddenly you're a maintainer: oh, people are using my thing now. Awesome.

And then it just continues to grow. People are using my thing now. Awesome. And then you wake up one day and it's not awesome anymore. I think it's also interesting that a lot of the really prolific open source contributors and maintainers we've talked to recently, like Anthony Fu, who's just an incredible developer and a very prolific open source contributor himself, make this kind of delegation a key part of their strategy. They know there's this trough of despair at a certain point, and the only way through it is to bridge the gap by getting more help.

It's cool to hear your experience, and very validating in general. Somebody needs to write a book one day: how to be an open source maintainer, here's the arc. There are lessons we've all learned.

Feross: Yeah, there's actually a great book on this topic called Working in Public: The Making and Maintenance of Open Source Software, by Nadia Eghbal. It's great. It features Sindre Sorhus, and a couple of quotes from me and other open source people.

Justin: Oh, that's fantastic. Didn't know about that. Cool. So, speaking of open source sustainability: you were talking earlier about people, potentially from FAANG companies, creating issues on your open source projects. That comes to a topic we discuss quite a bit, which is open source sustainability, and in particular open source funding.

You've written a lot about open source funding over the years and created some tools around it. So how have your past experiences with sustainable open source and open source funding gone, and how do you feel we're doing today versus when you started thinking about this?

Feross: It's a complicated topic. How much time do you have?

Justin: Go for it.

Feross: I mean, yeah, I could take this in many directions. I feel like part of the problem with open source funding is that there's a never-ending supply of people who are excited to start doing open source, like I was in the early days.

So one really big challenge is that if you try to just charge for this stuff, you realize that what the market is willing to pay for it is basically zero. If you put a paywall in front of your open source package and say, hey, you have to pay, or even if you try to change the license to make it not pure open source, you get an insane amount of pushback.

And then there's always some other up-and-coming, excited young person who's like, oh, I'll write the open source version of that. And there's a reason for that: it's exciting to be doing open source, which is why there are people willing to do it.

There are also obvious benefits to your career from being able to say that your packages have users at all these different companies. So I think that's one big challenge to charging for this stuff. It's just hard. You can't really change the license without people getting mad.

And there's always somebody who will make a free, pure open source version to replace you if you do that. So a lot of people who've been successful have basically had to become marketers. You have to be really good on Twitter and really good on, it used to be Patreon, which was one of the first places people went to, and now there's GitHub Sponsors. You're basically marketing yourself, and that's not something every maintainer wants to do. It happened to be something I was pretty comfortable doing: talking about what I'm working on and sharing it with people. But a lot of people just aren't comfortable doing it. They don't want to have that persona. They don't want to talk about their work, or they do, but not in an aggressive enough way to actually build a life on the income from it.

The other big problem, if you try to do that, is that you could be an incredibly depended-upon maintainer, but you might be responsible for a package that is a transitive dependency for most people, so they don't even know they're using it. That's a huge challenge.

Say your package has a million downloads a week, but it's used by another package and all your downloads come through that package. Then that's the package people know about. That's the one whose docs site they visit, and whose GitHub issue tracker they go to when they open issues.
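To make the transitive-dependency problem concrete, the sketch below uses a hypothetical dependency graph with made-up package names and runs a breadth-first search from the app to find the chain through which a package gets installed. A chain longer than two entries means the app never depends on that package directly, which is why its users may never see its docs or issue tracker. Real lockfiles are more complex (versions, duplicates, optional dependencies); this only illustrates the graph idea.

```typescript
// Hypothetical dependency graph: each package maps to its direct dependencies.
type DepGraph = Record<string, string[]>;

// Breadth-first search from `root`; returns the shortest chain of packages
// through which `target` gets installed, or null if it is never installed.
function dependencyPath(graph: DepGraph, root: string, target: string): string[] | null {
  const parent = new Map<string, string | null>([[root, null]]);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (current === target) {
      // Walk parent links back to the root to rebuild the chain.
      const chain: string[] = [];
      for (let node: string | null = current; node !== null; node = parent.get(node) ?? null) {
        chain.unshift(node);
      }
      return chain;
    }
    for (const dep of graph[current] ?? []) {
      if (!parent.has(dep)) {
        parent.set(dep, current);
        queue.push(dep);
      }
    }
  }
  return null;
}

// Made-up package names for illustration.
const graph: DepGraph = {
  "my-app": ["popular-framework"],
  "popular-framework": ["tiny-binary-parser"],
  "tiny-binary-parser": [],
};

const chain = dependencyPath(graph, "my-app", "tiny-binary-parser");
console.log(chain?.join(" -> ")); // my-app -> popular-framework -> tiny-binary-parser
```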

And if you're not in those workflows, if they never go to your website or your issue tracker, then you have no chance to even make your appeal to them: to ask for donations or funding, or to give them your sales pitch, or whatever you're trying to do to make money from this.

So it's really challenging. There are a lot of people who maintain these really core infrastructural packages that nobody even knows about. There are a bunch of examples we could get into, but I think that's one big problem.

The solution to that might be tooling. You might say, well, why don't we just use tools that tell us about this? I tried that; that was the thanks project I started. It basically tells you which of your dependencies are asking for funding. GitHub Sponsors and other tools, like Open Collective, can tell you that now, too.
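For reference, npm reads funding information from the `funding` field in each package's package.json, which can be a URL string, an object with `type` and `url`, or an array of those. The sketch below is a hypothetical helper, not the thanks project's or npm's actual code, that collects those URLs from a list of already-loaded manifests (made-up names and example.com URLs), roughly the information `npm fund` surfaces.

```typescript
// The `funding` field in package.json: a URL string, an object with `type`
// and `url`, or an array mixing the two.
type FundingEntry = string | { type?: string; url: string };
type Funding = FundingEntry | FundingEntry[];

interface Manifest {
  name: string;
  version: string;
  funding?: Funding;
}

// Hypothetical helper: map "name@version" to the funding URLs it declares,
// skipping packages that declare none.
function fundingUrls(manifests: Manifest[]): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const m of manifests) {
    if (m.funding === undefined) continue;
    // Normalize the single-entry forms into an array.
    const entries = Array.isArray(m.funding) ? m.funding : [m.funding];
    const urls = entries.map((e) => (typeof e === "string" ? e : e.url));
    result.set(`${m.name}@${m.version}`, urls);
  }
  return result;
}

// Made-up manifests for illustration.
const manifests: Manifest[] = [
  { name: "some-parser", version: "2.1.0", funding: "https://example.com/sponsor" },
  {
    name: "some-util",
    version: "1.0.3",
    funding: [{ type: "individual", url: "https://example.com/a" }, "https://example.com/b"],
  },
  { name: "unfunded-lib", version: "0.4.0" },
];

for (const [pkg, urls] of fundingUrls(manifests)) {
  console.log(pkg, urls.join(" "));
}
```

A real tool would also need to walk the installed dependency tree to find these manifests, which is where the transitive-dependency visibility problem gets solved, at least for people who run it.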

Back Your Stack is another tool Open Collective made to help you find this type of information. But it's not widespread, because you have to be the kind of person or company that wants to go out of your way to back open source and ask, which open source should I donate to?

And I think there's actually a problem with the whole framing of donations. It's like you've given everything away, and then afterwards you're asking, hey, do you think this was useful? If so, could you spare some change for me? It's just kind of bad.

It's a bad way to successfully raise a lot of money. So maybe this should be funded by governments, I don't know. Maybe there should be taxes collected for this, since it's a public good. I don't know what the answer is.

What tends to actually work in practice today, from what I've seen, is people doing contracts: support contracts for the companies that use their code. That's a way to make a decent amount of money, and you don't have to go collect $5 from every user of your software, which takes forever and is hard.

You can just go to one company and they'll pay you a good amount. That's one way to go. The other way is to think about a business model for your open source from the very beginning. A lot of companies do this, some of them cynically and some of them not, out of a genuine desire to share their code with people.

You do see these businesses built around an open core model and so on, and that's one way to do it. But I have this dream that an independent, random individual, the way I was, could just do open source as a job. One day, I want there to be a world where a kid in school could say, when I grow up, I want to be an open source maintainer, and then they could just do open source.

And because they're providing value to people, they would get compensated fairly somehow. That's what I was trying to get to with all my experiments, but unfortunately it was pretty hard. So yeah, that's the short version. We can keep talking about it if you want, and there are a lot of good people still working on it, too.

I don't want to be completely fatalistic either, because maybe somebody will figure it out.

Justin: Yeah, it is hard. You said something that I think about a lot: even for the people who are able to make some money in open source, the value capture is often skewed, because more often than not it's other developers, people who recognize the value of the work, who contribute to you.

But not necessarily the companies that are really capitalizing on it at the top level. It may be employees of those companies donating individually, stuff like that. And I think that skew between where most of the value is going and where the funding is coming from is unfortunate and hard to bridge.

[00:18:06] Ad

Andrew: Once again, we'd like to thank our sponsors. Without them, this podcast wouldn't be possible. This week's episode is sponsored by Raycast. Raycast is an app for Mac that's like Spotlight, but way better. It can do all the things Spotlight does and all the things Alfred does, plus a host of other features, and they're constantly adding more.

One of my new favorite features they added is AI emoji suggestions. When I first heard about Raycast, I didn't think it was going to be one of my favorite AI companies, but here we are. They're shipping features that improve my life daily. With AI emoji search, you can search for emojis without even knowing their names, which is a really cool use of AI.

It understands natural language better than normal programs can, so it gets you better results.

That new feature is only available with Raycast Pro. With Raycast Pro you get a bunch of really cool features, but the biggest one for me is access to Raycast AI. With Raycast AI, an LLM is just a keystroke away. This has changed my workflow immensely as I've used it over the months.

With Raycast Pro you also get access to Raycast Teams, where you can share your workflows and the extensions you've made with your team.

If you want to learn more about Raycast, head over to raycast.com. Or if you want a more in-depth look at what Raycast is, where it came from, and where they plan to take it, head over to episode 38, where we interviewed the CEO, Thomas.

Do you want to advertise with devtools.fm? Head over to devtools.fm/sponsor to apply. With that, let's get back to the episode.

[00:19:29] Controversial Funding Experiment

Feross: Do you want to hear the story of the funding experiment I did that was the most, I don't know what the word is, controversial, I guess?

Andrew: Yeah, go for it.

Feross: Yeah. So one idea I had was to put an ad into the install process. When you type npm install foo and hit Enter, you'd get a little message that says this package is brought to you by whatever the company is.

I found a couple of sponsors, Linode and LogRocket. Basically, they wanted to help innovate in open source funding and try new models. So they were interested in just being part of this experiment and trying to help maintainers.

I did this in the most privacy-preserving, friendly way I possibly could. It was literally a console log that would show up during the npm install process. There was no ad network code and no data going anywhere. It was literally a console log.

It's as simple as it gets, and there was no way for them to measure its success. It was as clean and pure a thing as you could do. And I added it to a project that I had just released a new major version for, so it was only in a major version.

So you had to opt into the new version. Anyway, after all this, the news that I was trying this experiment made it onto Hacker News, and there were basically two extreme reactions. One was from maintainers who were like, yes, I'm so glad you're doing this.

I hope more people realize what the situation is with us and become aware of the problem. Thank you for doing this. A lot of them were telling me in private, thank you for being the one to poke the hornet's nest, because they didn't want to do it.

And then there were the people on the other side who were like, my terminal is the one part of my life where there's no advertising, where it's free from spam and this type of thing. They wanted nothing to do with it. They were like, burn this with fire. Get rid of this.

They were very upset. One Hacker News commenter said I should go to jail for this, that it was probably some type of computer crime. Very heated emotions on both sides. And in the end, the companies were getting customers calling them saying, I'm going to cancel my subscription to your service if you don't stop your ads from running on these packages. So I got a call from the CEO of one of those companies, and they were like, take it down, take it down, please. We're losing customers. And I was like, okay, this is supposed to have positive ROI, not negative ROI.

So I just published a new patch version that killed the experiment. Then I wrote a big postmortem on my blog about the whole thing and what went down. It was a very interesting reaction to get, and interesting to see the heated feelings on both sides.

Andrew: Yeah, it's like our society doesn't know how to grapple with the idea of a free thing and the problems it might entail from both ends.

Justin: I remember the core-js thing, which was very similar: sort of an advertisement saying, hey, I'm developing this. It was the same, but different. There was a lot of backlash from that, too.

Feross: Yeah, a bunch of things happened as a result of that and other things. In response to the experiment I did, npm now has a fund command. You can run npm fund, and it'll tell you which of your packages are seeking donations. But they also changed their policy to say that you can't do what I did. You can't put an ad into the install process; the rule is explicitly there to stop this. And with the core-js person, I think those messages are also suppressed now. So anybody who tries to print things out during the install process, that all basically goes to /dev/null and you can't see it.

So that was npm's response. They tried to help with npm fund, but they also said, no, no, no, you're not doing that.

Andrew: Yeah, so not much progress there. Let's switch gears a little bit before we move on to Socket.

[00:23:53] Exploring WebTorrent and Decentralization

Andrew: You mentioned earlier that you're heavily involved with WebTorrent. I've used torrents in the past. Maybe, maybe not. So what was the project, and what were some cool things you built with it?

Feross: Yeah, I've always been interested in peer-to-peer systems. I just think the idea of a bunch of strangers on the internet cooperating to share files with each other, where nobody knows who anyone is and nobody trusts each other, but somehow they're able to cooperate and accomplish useful work together, is a really beautiful idea.

So, separate from what most people think of with torrents, like downloading movies, I was just really fascinated by the technology and by the idea of building a decentralized internet. This was all before cryptocurrencies had really taken over the word "decentralized" in people's minds. So basically, the vision was to build a torrent client you can use from your web browser, without installing any software on your computer. That would be awesome, because then you wouldn't need to explain this technically complicated concept to beginners. They could just go to a website, click a button, and content would start coming to them through a peer-to-peer network. And I was starting to see all these new web technologies get added to the browser.

And, um, WebRTC was the one that enabled this idea to even, you know, to be possible. Because with WebRTC, you can actually create A peer to peer connection from my browser to your browser without a server involved. And that way I can literally just send you data over like a socket type interface. This is how stuff like Google Meets is implemented.

That's how this Riverside Chat is that we're using right now to record this podcast is implemented. But, um, but nobody had really I hadn't seen people doing, doing that many interesting things with it. They were mostly doing video and voice calling, which is what it was intended for. But I realized that actually this data channel component of it lets you send any data you want.

So you can actually implement any peer-to-peer protocol you can imagine over it, and that was super cool. I thought, well, why don't we do torrents? That would make this stuff more accessible to people. So that's basically what I did: I started by building a torrent client in JavaScript that worked in Node.js. Once that was working and you could torrent stuff in Node, I went in and changed the way the connections are opened. In Node you can open TCP and UDP connections, but in the browser you can't, so in the browser it uses WebRTC instead.
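The design described here, where the torrent protocol stays the same and only the way connections are opened changes, can be sketched with a small transport interface. Everything below is hypothetical and is not WebTorrent's actual API: `Transport` could be backed by a TCP socket in Node.js or a WebRTC data channel in a browser, and an in-memory pair stands in for both so the example runs anywhere. The message it sends is a BitTorrent-style `have` message (4-byte big-endian length prefix, message id 4, 4-byte piece index).

```typescript
// Hypothetical transport interface (not WebTorrent's real API): the wire
// protocol only needs to send bytes and be told when bytes arrive.
interface Transport {
  send(data: Uint8Array): void;
  onData(handler: (data: Uint8Array) => void): void;
}

// In-memory pair of connected endpoints, standing in for a TCP socket (Node)
// or a WebRTC data channel (browser) so the example runs anywhere.
function createMemoryPair(): [Transport, Transport] {
  const handlersA: Array<(data: Uint8Array) => void> = [];
  const handlersB: Array<(data: Uint8Array) => void> = [];
  const a: Transport = {
    send: (data) => handlersB.forEach((h) => h(data)),
    onData: (h) => { handlersA.push(h); },
  };
  const b: Transport = {
    send: (data) => handlersA.forEach((h) => h(data)),
    onData: (h) => { handlersB.push(h); },
  };
  return [a, b];
}

// BitTorrent-style "have" message: 4-byte big-endian length prefix (5),
// 1-byte message id (4 = have), then the 4-byte piece index.
function encodeHave(pieceIndex: number): Uint8Array {
  const msg = new Uint8Array(9);
  const view = new DataView(msg.buffer);
  view.setUint32(0, 5); // DataView writes big-endian by default
  view.setUint8(4, 4);
  view.setUint32(5, pieceIndex);
  return msg;
}

function decodeHave(msg: Uint8Array): number {
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (msg.byteLength !== 9 || view.getUint32(0) !== 5 || view.getUint8(4) !== 4) {
    throw new Error("not a have message");
  }
  return view.getUint32(5);
}

// Protocol code only sees the Transport, so it does not care how the
// connection was opened.
function announcePiece(peer: Transport, pieceIndex: number): void {
  peer.send(encodeHave(pieceIndex));
}

const [alice, bob] = createMemoryPair();
const received: number[] = [];
bob.onData((data) => received.push(decodeHave(data)));
announcePiece(alice, 7);
console.log(received); // [ 7 ]
```

Swapping in a different backend would mean writing another object that satisfies `Transport`; `encodeHave`, `decodeHave`, and `announcePiece` would not change.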

And that was the vision. Yeah, and it's still going strong. In fact, after about 10 years of us working on this project, we now have pretty much ubiquitous support in all the major desktop torrent apps. That means they now support the WebTorrent protocol, so if you go to a website that includes our WebTorrent JavaScript library, it will actually be able to connect to people who have desktop torrent apps installed. So now we have this massive network, partially browsers and partially installed apps, and it's all connected. It's super cool. It just took a long time, but now we got here, and it's pretty cool to see the vision actually working, actually being real after all this time.

Andrew: So there are newer decentralization technologies. Have things like IPFS sparked that same interest in you? Do you think the next wave of decentralized file protocols is it, or do you think torrents kind of got it more right?

Feross: Well, torrents are a really old protocol. Like, they were around before JSON was a thing. So literally, the way the data interchange happens is this weird thing called bencoding, which is like fake JSON, but worse, basically, in every way. It's just a disaster, and things are very underspecified in BitTorrent.
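To make the bencoding comment concrete: the format has only four types. Integers are written i42e, strings are prefixed with their length (4:spam), lists are wrapped in l...e, and dictionaries in d...e with their keys in sorted order. The following TypeScript sketch encodes and decodes that format; it assumes ASCII-only strings so that string length equals byte length, whereas bencoded strings are really byte strings.

```typescript
// Bencode has four value types: integers, strings, lists, and dictionaries.
type BValue = number | string | BValue[] | { [key: string]: BValue };

function encode(value: BValue): string {
  if (typeof value === "number") {
    if (!Number.isInteger(value)) throw new Error("bencode has no floating-point numbers");
    return `i${value}e`; // 42 -> "i42e"
  }
  if (typeof value === "string") {
    return `${value.length}:${value}`; // "spam" -> "4:spam"
  }
  if (Array.isArray(value)) {
    return `l${value.map(encode).join("")}e`; // list: l ... e
  }
  const dict = value;
  // Dictionary keys are strings and must be emitted in sorted order: d ... e
  const body = Object.keys(dict)
    .sort()
    .map((key) => encode(key) + encode(dict[key]))
    .join("");
  return `d${body}e`;
}

function decode(input: string): BValue {
  let pos = 0;

  function parse(): BValue {
    if (pos >= input.length) throw new Error("unexpected end of input");
    const ch = input[pos];
    if (ch === "i") {
      const end = input.indexOf("e", pos);
      const digits = end === -1 ? "" : input.slice(pos + 1, end);
      if (!/^-?\d+$/.test(digits)) throw new Error("invalid integer");
      pos = end + 1;
      return Number(digits);
    }
    if (ch === "l" || ch === "d") {
      pos++;
      const items: BValue[] = [];
      while (input[pos] !== "e") items.push(parse());
      pos++; // skip the closing "e"
      if (ch === "l") return items;
      // Dictionaries are alternating key/value pairs.
      const dict: { [key: string]: BValue } = {};
      for (let i = 0; i < items.length; i += 2) {
        const key = items[i];
        if (typeof key !== "string" || i + 1 >= items.length) throw new Error("invalid dictionary");
        dict[key] = items[i + 1];
      }
      return dict;
    }
    // Otherwise it must be a string: <length>:<characters>
    const colon = input.indexOf(":", pos);
    const lengthText = colon === -1 ? "" : input.slice(pos, colon);
    if (!/^\d+$/.test(lengthText)) throw new Error("invalid string length");
    const start = colon + 1;
    const end = start + Number(lengthText);
    if (end > input.length) throw new Error("string runs past end of input");
    pos = end;
    return input.slice(start, end);
  }

  const result = parse();
  if (pos !== input.length) throw new Error("trailing data after value");
  return result;
}
```

Key order is not cosmetic: a torrent's info hash is computed over the bencoded info dictionary, so two encoders that ordered keys differently would produce different hashes for the same torrent.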

People kind of just observe what the other clients on the network are doing and try to be compatible. So there are a lot of things that are wrong with torrents. But that said, it's also, I think, probably still the most popular decentralized protocol. I could be wrong on that, but certainly for most of the time I was working on WebTorrent, none of these new projects, these cryptocurrency-backed projects, were even 10 percent as popular as torrents in terms of real users using it to do real things.

So, yeah. But on the other hand, things like IPFS have had more thought put into them. They've had the benefit of all the learnings from the last 15 years or so about how to make this stuff work better. So I'm optimistic that that stuff is cool and people are going to use it for cool things.

But there's something about the beauty and simplicity of torrents. If you're a developer and you haven't actually looked at the BitTorrent specification, I recommend you just take five minutes, Google "BitTorrent spec," and scroll through. It's like three pages, and you can just read the spec and understand torrents.

It's really simple, and it's incredible how much you can do with it and how long the ideas have lasted. It's a really cool protocol.

Justin: That's awesome. Yeah, I'll have to do that. So, switching over to the next topic: we understand you teach a class at Stanford. What do you cover in your class?

Feross: So I teach a web security class. Honestly, the truth is it's kind of like a JavaScript and web class, but through the lens of security. By the end of it, you learn how the browser works at a pretty deep level, I would say.

And I love the web. A lot of people love to complain about JavaScript and the web in terms of security, and a lot of people will say they want to burn it all down and start over with something that's pure and doesn't have all the baggage of this legacy stuff. I actually think it's pretty awesome that the web is as good as it is, given that it evolved from being a text document viewer. So I try to convey that love for the web in the class. And then you learn what you can do with different APIs and where all the boundaries are.

Like, how does an iframe work? How do all these different attacks work, and how do you make sure you don't get affected? But all through the lens of: the web is awesome. The videos for that are actually online if anyone's interested in watching. You can search "CS253 web security" and you'll find the videos on YouTube.

[00:30:29] Socket.dev

Justin: That's really awesome. And maybe that's a good transition point to talk about what you're working on now. So you have a company called Socket, socket.dev for people who want to visit the website. Can you tell us a little bit about what Socket is and why you built it?

Feross: Yeah. So first of all, it's a team of people, not just me. There's a whole bunch of us now, about 15 people building Socket. It's a tool that helps developers and security teams ship faster and spend less time on security busywork. It really helps developers safely find and audit open source software at scale.

So the backstory is, as I mentioned before, I was a maintainer, and most of the developers on our team were actually maintainers as well. So we understand how the sausage is made as far as open source is concerned.

And we know how sketchy some maintainers' practices are. I had a friend who was the maintainer of extremely popular packages and had a six-letter password at one point, and I'm like, dude, you can't do this. You've got to make your password better.

You're going to get everybody hacked. Please. And then we've seen the different supply chain attacks that have happened over the years. I think the big one for me, the one that really got me interested in the security of open source, was the event-stream package that got compromised in 2018.

So the story behind that is that this guy, Dominic Tarr, who's an incredibly prolific open source maintainer, the author of 500 packages... obviously he couldn't maintain all of those super well, because that's an insane number of packages. He's this incredibly generative, creative guy, and he had written a streaming package called event-stream, then moved on to a better implementation, and then had actually moved on from that to yet another implementation.

So he's a very generative, creative guy, creating all these new things for us. But it turns out that early package he made was used by a lot of popular packages, and he had completely moved on from it for about four years. He wasn't even looking at it, so it was collecting a lot of issues and problems.

And some guy emailed him and said, "Hey, we use this at my company, and you're not maintaining it anymore. I'd love to help you." And he said, "Sure, here's the access." And like we said before, that's a totally normal thing to do in open source. It's actually encouraged, and it's what everybody does.

But it turns out that this person pretended to be good for about a month, and then after about a month, they included some heavily obfuscated code at the bottom of one of the files. It wasn't clear what this code was actually doing; in fact, nobody even noticed it for about a week, despite this package having tens of millions of downloads, because the code only ran in one company's project. So it was super targeted. The way it did it was actually kind of funny, too. It would go up the package.json tree to find the very top-level package.json for your project.

And it would take the description string in that file and use it to decrypt the obfuscated, encrypted code: the payload, basically. So if it was running in your project or my project, it would fail to decrypt and nothing would happen.

But for that one company they were going after, it would run. And so nobody noticed it for about a week. They were targeting a crypto wallet, and they basically stole a bunch of Bitcoin from a bunch of really rich Bitcoin people.
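The keyed-payload trick described above can be illustrated with a toy example. This TypeScript sketch uses a repeating-key XOR as a stand-in for a real cipher (it is not secure and is not what the attacker used), plus a hypothetical marker prefix so the loader can tell whether decryption succeeded. The key plays the role of the host project's package.json description, and the sketch only returns the decrypted text; it never executes anything.

```typescript
// Toy repeating-key XOR: a stand-in for a real cipher, NOT secure.
function xorWithKey(codes: number[], key: string): number[] {
  if (key.length === 0) throw new Error("key must not be empty");
  return codes.map((c, i) => c ^ key.charCodeAt(i % key.length));
}

function toCodes(text: string): number[] {
  return Array.from({ length: text.length }, (_, i) => text.charCodeAt(i));
}

function fromCodes(codes: number[]): string {
  return codes.map((c) => String.fromCharCode(c)).join("");
}

// Hypothetical marker that lets the loader recognize a successful decryption.
const MARKER = "OK:";

// Attacker side: encrypt the payload with the target project's description.
function seal(payload: string, targetDescription: string): number[] {
  return xorWithKey(toCodes(MARKER + payload), targetDescription);
}

// Loader side: try the description of whatever project it finds itself in.
// In every other project the marker check fails and nothing happens.
function tryUnseal(sealed: number[], hostDescription: string): string | null {
  const text = fromCodes(xorWithKey(sealed, hostDescription));
  return text.startsWith(MARKER) ? text.slice(MARKER.length) : null;
}
```

This is why reviewing the code by eye, or running it in an ordinary project, revealed nothing: the interesting part only exists in plaintext inside the one project whose metadata matches the key.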

And it ended up getting built into that company's Electron app, which got shipped to all their users before anybody noticed. And the way it was noticed, by the way, was the most disturbing part to me. It wasn't that somebody was reviewing the files and looking at what their open source code was doing, and there was no tool they ran that found this, right?

It was a complete accident. What happened was that a new version of Node came out about a day after the malware author put this booby-trapped code into the package, so the attacker didn't know about it when they launched the attack. And that new version happened, by total accident, to deprecate the method they were using in their attack. That caused a bunch of deprecation warnings for people running the bleeding-edge version of Node, and they traced those warnings back to this file and said, "What the heck is this code doing?"

It looks super sketchy. And then they found it. And that just made me realize: how frequent is this type of thing? How often is this happening without us finding it? Who knows? So I started asking around: "Do any of you review your open source code and see what it's doing?

Or do you just hope for the best?" And nobody was doing anything of the sort. So I realized this was a big problem, and I wanted to solve it.

Andrew: Yeah, that particular case is just crazy, because nobody would have known if he had not used a deprecated API. It just would have gotten through. And he built trust, which is the crazy part. Most people would say, "Oh, this person has a month-long commit history on the project and has started to build a rapport, so they're a safe person." But they were the not-safe person.

Feross: Yeah, totally. And the thing is, even if you had, let's say, a five-year history on the project, like you're the maintainer yourself, you can go rogue one day. And that's happened too. It's actually happened quite a lot in the last couple of years, where people have sabotaged their own projects. For those people, what is the defense other than evaluating the code itself? And this is why I personally get a little annoyed when people talk about code signing as the solution to all the problems. Some package managers, like aptitude on Linux, have code signing, but npm doesn't require authors to sign their packages in the same way.

It's a nice thing to have, but it's not going to solve the case where I'm the author, I have the signing keys, and I decide to go rogue one day. And we actually saw a lot of this around the Russia-Ukraine war recently, where people were putting protest code into their packages, including one person who even added code that would literally rm -rf your hard drive if you appeared to have a Russian IP address.

Actually, it wasn't an rm -rf. It was implemented as a loop over all your files: it would go through them one at a time and replace the contents with a flower emoji. So all your files would just become flower emojis. They published this to their popular package, and then anyone who ran it was owned.

Eventually npm caught wind of it and deleted it, because it's malware and it shouldn't be on there. But other, milder forms of this are still alive on npm. There's one, by a different person, who did another thing like this: they added secret code to their package, which gets 600,000 weekly downloads. It's called EventSource Polyfill, and it's a pretty popular web polyfill project. What it does is look at your time zone, and if you're in an Eastern European time zone, it calls setTimeout for 15 seconds and waits.

And after 15 seconds, it opens a new tab to a petition website to get you to basically sign something against the war. Regardless of what you think of the war, and who's right or who's wrong, this is definitely something that nobody expects their open source code to be doing.

Imagine you're a bank, and your bank website just starts opening pop-ups to random petitions. People are going to think the bank is hacked, and it's going to be really damaging to the company. So this is the kind of thing that hurts trust in open source, I think.

And that's the kind of thing Socket can actually detect.
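As an illustration of how behavior like this could be flagged statically, here is a hypothetical TypeScript rule that fires when source code both inspects the user's locale or time zone and triggers an unexpected side effect, such as opening a window or writing files. This is not Socket's actual detection logic; the pattern lists are assumptions, and plain regex matching is easy to evade with obfuscation.

```typescript
// Hypothetical rule: locale/time-zone checks combined with a surprising side effect.
const LOCALE_CHECKS: RegExp[] = [
  /resolvedOptions\(\)\.timeZone/, // Intl.DateTimeFormat().resolvedOptions().timeZone
  /getTimezoneOffset\(/,
  /navigator\.language/,
];

const SURPRISE_EFFECTS: RegExp[] = [
  /window\.open\(/, // opening new tabs or pop-ups
  /writeFile(?:Sync)?\(/, // overwriting files on disk
];

interface RuleResult {
  flagged: boolean;
  matched: string[]; // source text of every pattern that matched
}

function checkGeoTargeting(source: string): RuleResult {
  const matched = [...LOCALE_CHECKS, ...SURPRISE_EFFECTS]
    .filter((re) => re.test(source))
    .map((re) => re.source);
  const checksLocale = LOCALE_CHECKS.some((re) => re.test(source));
  const hasSideEffect = SURPRISE_EFFECTS.some((re) => re.test(source));
  // Either signal alone is common in benign code; the combination is what's suspicious.
  return { flagged: checksLocale && hasSideEffect, matched };
}
```

The design choice is to require both halves: plenty of legitimate libraries read the time zone, and plenty open windows, but a dependency that does one conditionally on the other deserves a closer look.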

Andrew: Yeah, with JavaScript you have just so many packages. I don't think you could really expect a normal developer to say, "Oh, you installed webpack, so you audited every single dependency in your tree," right? Nobody's doing that. So how does Socket help me solve this? What tools does it provide to alert me to these things in my own codebase?

Feross: So the first thing that Socket does that no other security tool does... well, first, let me take a moment to talk about the tools people might already be familiar with. When people think about open source security, a lot of times they think of vulnerability scanners: things like Dependabot, or tools like npm audit, that tell you about known vulnerabilities in your packages.

One problem with those is that they can be pretty noisy, and a lot of the findings are pretty low risk. They're things that probably can't even be exploited because the code isn't reachable, and even if it could be reached, it's a minor issue. A regular expression denial of service is a very common one that's super low impact. So people really obsess over these vulnerabilities, but meanwhile they're not even detecting the worst kinds of attacks, the ones we've already been talking about, like a package that gets hijacked.

They're just like, "Oh, I guess we can't scan for that. I guess we'll just hope for the best." So we're wasting developer time on all these less important issues, and we're not even looking for the really big problems. What we wanted to do differently when we launched, and what we do differently than everybody else, is that we actually download the code of every open source package we can get our hands on: on npm, PyPI, and Go today.

And we're expanding to more ecosystems over time. Once we have a copy of all that code, we do static analysis on it to figure out what it's doing. So the first thing we know for any package is whether it's going to access the network, read your files, delete your files, read your environment variables (your API tokens, your keys), or whether it has obfuscated code: all those behavioral capabilities of the package, right?

What that lets us do is, if a package doesn't do any of those dangerous things, then it's probably a pretty safe package. It's probably a pure function or something: it doesn't do any I/O, it doesn't touch your data or talk to the network. It's probably a very safe package.
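The capability idea above can be sketched in a few lines. This TypeScript example is a deliberately naive, regex-based stand-in for real static analysis (which would parse the code into a syntax tree and resolve imports rather than match text); the capability names and patterns are assumptions, not Socket's detection rules, and it covers only a subset of the behaviors listed (obfuscation detection, for instance, is left out).

```typescript
type Capability = "network" | "filesystem" | "env" | "shell" | "eval";

// Hypothetical patterns; real analysis would work on an AST, not raw text.
const CAPABILITY_PATTERNS: Record<Capability, RegExp[]> = {
  network: [
    /\bfetch\(/,
    /XMLHttpRequest/,
    /(?:require\(|import\(|from\s+)["'](?:node:)?(?:http|https|net|dgram)["']/,
  ],
  filesystem: [/(?:require\(|import\(|from\s+)["'](?:node:)?fs(?:\/promises)?["']/],
  env: [/process\.env/],
  shell: [/(?:require\(|import\(|from\s+)["'](?:node:)?child_process["']/],
  eval: [/\beval\(/, /new Function\(/],
};

function detectCapabilities(source: string): Set<Capability> {
  const found = new Set<Capability>();
  for (const cap of Object.keys(CAPABILITY_PATTERNS) as Capability[]) {
    if (CAPABILITY_PATTERNS[cap].some((re) => re.test(source))) found.add(cap);
  }
  return found;
}

// A package with no I/O or code-execution capabilities is probably a pure utility.
function looksPure(source: string): boolean {
  return detectCapabilities(source).size === 0;
}
```

For example, a one-line module that adds two numbers comes back with no capabilities, while a file that requires child_process and sends process.env contents to a URL with fetch comes back with env, network, and shell.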

On the other hand, say you've been using a package for years that was pure and didn't do any I/O, and suddenly a new version comes out that now needs to run shell commands, open child processes, read your environment variables, and send them to the network. That's a pretty significant change in the behavior of the package.
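The signal described here is not what a package can do, but what it newly does in an update. A minimal TypeScript sketch, assuming each version has already been reduced to a list of capability names; the HIGH_RISK list is an illustrative assumption, and a real system would weigh this alongside other signals, such as a new maintainer.

```typescript
// Hypothetical per-version report, e.g. produced by a capability scanner.
interface VersionReport {
  version: string;
  capabilities: string[]; // e.g. "network", "shell", "env"
}

// Capabilities that are most alarming when they appear suddenly in an update.
const HIGH_RISK = ["network", "shell", "env", "eval"];

// Which capabilities does `next` have that `prev` did not? (sorted for stable output)
function addedCapabilities(prev: VersionReport, next: VersionReport): string[] {
  const before = new Set(prev.capabilities);
  return next.capabilities.filter((cap) => !before.has(cap)).sort();
}

function shouldAlert(prev: VersionReport, next: VersionReport): boolean {
  return addedCapabilities(prev, next).some((cap) => HIGH_RISK.indexOf(cap) !== -1);
}
```

Comparing against the previous version rather than judging each release in isolation is what keeps the noise down: a networking library that has always used the network stays quiet, while a pure helper that suddenly starts spawning shells does not.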

And if you look for that type of thing, you could have caught almost all of the supply chain attacks we've been talking about, because they all happen when somebody takes over a package and they just add in... 'Cause, like, what does an attacker want to do, right? If you think about it, there aren't that many things they can do that give them value.

They almost always need to talk to the network. They almost always want to use your data in some way, so there are going to be new places where that's happening, and we can just look for those things. And then when we find them and they're correlated with, say, a new maintainer being added... we have a bunch of signals, basically. If people are interested, they're all available on our website; there are about 80 now that we look for. When we find a package that has enough of those things, we put it through a second-level scan that tries to decide how significant the risk is.

And when it makes it through that, there's a human review step. And finally, if it gets through all that, we flag it as confirmed malware. So, yeah, it's a pretty thorough process. We're also using LLMs, which is kind of cool, and we actually summarize the findings in plain English for developers.
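The funnel described here (many cheap signals, then a costlier risk assessment, then a human) can be sketched as follows. Everything in this TypeScript example is hypothetical: the signal names, the threshold of three signals, the 0.5 risk cutoff, and the RiskScorer stub that stands in for the second-level scan. What it illustrates is the ordering, and that only the human step can produce a confirmed-malware verdict.

```typescript
// Hypothetical three-tier funnel; names, thresholds and the scorer are assumptions.
type Verdict = "not-flagged" | "low-risk" | "queued-for-human-review";

interface PackageFindings {
  name: string;
  signals: string[]; // e.g. "new-maintainer", "network-access", "obfuscated-code"
}

// Stand-in for the costlier second-level scan; returns a risk score in [0, 1].
type RiskScorer = (findings: PackageFindings) => number;

function triage(
  findings: PackageFindings,
  scoreRisk: RiskScorer,
  minSignals = 3,
  riskCutoff = 0.5,
): Verdict {
  // Tier 1: cheap static signals decide whether the package is worth a closer look.
  if (findings.signals.length < minSignals) return "not-flagged";
  // Tier 2: the more expensive assessment only runs on that smaller set.
  if (scoreRisk(findings) < riskCutoff) return "low-risk";
  // Tier 3: a human reviewer makes the final "confirmed malware" call.
  return "queued-for-human-review";
}
```

Each tier only sees what survived the one before it, so the expensive steps (the deeper scan and, above all, human attention) are spent on a small fraction of packages.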

So for any malware we find, it'll literally explain to the developer, "This is going to allow a remote attacker to send commands to bash on your computer and run whatever commands they want. You should not use this package." It'll explain that very clearly inside the GitHub pull request, and it can even block the PR and stop developers from owning themselves and their company.

Justin: That's pretty cool. Good use of LLMs. So a question I was going to ask: I feel like the situation around JavaScript in particular is hard because you have different execution contexts. On npm, you have all these modules that are intended to be executed in different places.

Maybe you have something that's intended to be executed on the client, in a web browser, or something that's intended to be executed in Node, and the attack surfaces there are going to be very different. I've seen attacks from dependencies injected into the page on the client that are very clever and hard to detect.

You remember the CSS keylogger back in the MySpace days? That was just a mind-blowing thing. And then people doing things like updating an image URL to point at some random server and putting a payload in a query parameter on the end of it that leaks the password or something.

A lot of these things can be incredibly hard to detect. So you said you're detecting network access. Do you have to really divvy things up, figure out what the context is, and then gather some extra context around that? How do you actually do that?

Feross: Yeah, it's a great question. We're currently pretty Node-heavy. Most of the packages that contain these payloads, at least a lot of the stuff we see coming through, are server-side packages. But for browsers, we do explicitly detect things like fetch and XMLHttpRequest, that type of thing.

But you're totally right. There are a million ways to exfiltrate stuff, and the safest thing to do is to actually use a Content Security Policy and lock down what the page is allowed to do. Short of that, there's one other thing we use LLMs for that I'll mention here, because it's perfect for the exact question you've asked: besides all the static analysis, we also just put files through the LLM and ask it, "What is this doing?"
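For reference, a Content Security Policy is delivered as a Content-Security-Policy response header. This small TypeScript helper just assembles the header value; the directive choices and the api.example.com origin are illustrative assumptions. connect-src restricts where fetch and XMLHttpRequest can send requests, and img-src restricts where images can load from, which also blocks the image-URL exfiltration trick Justin mentioned for any origin not on the list.

```typescript
// Directive name -> list of allowed sources.
type CspDirectives = Record<string, string[]>;

function buildCsp(directives: CspDirectives): string {
  return Object.keys(directives)
    .map((name) => [name, ...directives[name]].join(" "))
    .join("; ");
}

const policy = buildCsp({
  "default-src": ["'self'"],
  // fetch(), XMLHttpRequest, WebSocket, etc. may only reach these origins.
  "connect-src": ["'self'", "https://api.example.com"],
  // Images (and therefore image-URL beacons) may only load from our own origin.
  "img-src": ["'self'"],
});
```

A policy like this does not stop a malicious dependency from running or from reading data on the page; it only limits where that data can be sent through the channels the directives cover, which is why it complements auditing dependencies rather than replacing it.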

And that's nice because it can sometimes get around gaps where we didn't write a detection, like for this CSS trick or this image trick. Maybe we don't detect an image tag with its source being set as a form of data exfiltration. But if you put it into an LLM and just ask, "What is this code doing?"

It will have no problem saying, "Oh, this is sending data to this URL." And if you combine that with further questions, like how could an attacker use this, et cetera, you can get it to actually reason its way to "this might be malicious," in which case we put it into our feed for a human reviewer to look at.

So we catch stuff that even our static analysis misses just by throwing it in front of an LLM and asking what it thinks.

[00:46:23] The Future of Open Source Security: Challenges and Opportunities

Feross: And it's funny, because a lot of people, when they think of these LLMs, think of a chatbot or something, like, "Oh, I'm going to be able to chat with my SaaS tool."

But the most interesting use case to us is that you can actually use this thing to kind of reason. It kind of knows what maliciousness is. It kind of knows what the code is doing. And if you walk it through, you can actually get it to say, "Oh, this could be malicious."

And it might have a lot of false positives, but because we have that human step, it's actually quite powerful. We can go from the insurmountable problem of one human, or even a team of humans, not being able to review every open source package, to having a feed where, let's say, one in every three packages is malicious, because there are 66 percent false positives. Okay, that sounds really bad, but a human looking through and finding malware in every third package is actually amazing, and it's super valuable to the community that we're doing that. So that's kind of how we use it. And it's not a perfect answer to your question, because we could still miss things.

But right now we're catching about 400 pieces of malware per week on npm, PyPI, and Go, so it's already really effective. For the other stuff, I think we're going to have to do dynamic analysis at some point: actually run this code, try to get it to do things, and observe what it's doing at the syscall level. But, yeah, that's a future project for us.

Andrew: Doing that at scale seems crazy hard. There's so much on npm; I myself have put out thousands of versions of my packages. And you said you run LLM stuff on top of it. So can you give us a bit of a view into how you do all that? Because I know it's a lot for humans to do, but in my mind it also seems like a lot for computers to do.

Feross: If you do it in a really naive way and just put all of it into GPT-4 and pay the bill, it's basically a great way to donate all of your company's funding to OpenAI and Sam Altman. So we don't do that. We do it with a bunch of layers, similar to the layered approach I described. If we have a static signal that something is bad, we definitely put it into an LLM. And there are other signals too, based on certain string patterns and things like that, where we'll decide it's worth asking the LLM about it.

So we aren't literally putting every single file through an LLM, because that would be cost-prohibitive. That said, we are looking at other ways; we might end up running a local model ourselves to cut down on cost. We also layer things: we use cheaper, dumber models first. We'll ask, say, five dumber models what they think, and if one of them thinks it's suspicious but the other four say it's fine, then we ask the smarter model to be the decider.
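The cost-saving cascade described here can be sketched as a small function. In this hypothetical TypeScript version, the models are synchronous stubs (real model calls would be asynchronous API requests), and it escalates whenever any cheap model raises a flag. The transcript only describes the one-suspicious, four-fine case, so the behavior when several cheap models flag a file is an assumption.

```typescript
// true means "this looks suspicious"; real model calls would be async API requests.
type Classifier = (code: string) => boolean;

interface EscalationResult {
  suspicious: boolean;
  usedExpensiveModel: boolean;
}

function classifyWithEscalation(
  code: string,
  cheapModels: Classifier[],
  expensiveModel: Classifier,
): EscalationResult {
  const votes = cheapModels.map((model) => model(code));
  if (!votes.some((vote) => vote)) {
    // Every cheap model says it's fine: skip the costly call entirely.
    return { suspicious: false, usedExpensiveModel: false };
  }
  // At least one cheap model is worried, so the smarter model decides.
  return { suspicious: expensiveModel(code), usedExpensiveModel: true };
}
```

The saving comes from the first branch: when most files are unanimously judged fine by the cheap models, the expensive model only ever sees the small fraction where they disagree.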

So there are different techniques you can use to cut down on the cost. And it's evolving a lot all the time; we're still figuring this out. I think the whole industry is honestly figuring out how to use these things in a way that scales, right? And how do you even vet this?

How do you make improvements to your prompts and measure whether you're making things better or not? Because otherwise you could just change a prompt and go, "Ah, it fixed the problem I saw here," but did it make things worse everywhere else? So there's a lot of stuff that not just us, but everybody, is figuring out right now.
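One common way to approach the "did this prompt change make things worse everywhere else?" problem is a fixed, hand-labeled evaluation set that every prompt version is scored against. This TypeScript sketch is hypothetical and not a description of Socket's process: each prompt version is modeled as a classifier function. Precision here is the same quantity as the "one in every three flagged packages is malicious" figure mentioned earlier (a precision of about 0.33).

```typescript
interface LabeledSample {
  code: string;
  isMalicious: boolean; // ground truth, e.g. from human review
}

// A prompt version, modeled as "does this code look malicious?"
type PromptVersion = (code: string) => boolean;

interface Metrics {
  precision: number; // of everything flagged, the share that was really malicious
  recall: number; // of all malicious samples, the share that was flagged
}

function evaluate(prompt: PromptVersion, samples: LabeledSample[]): Metrics {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const sample of samples) {
    const flagged = prompt(sample.code);
    if (flagged && sample.isMalicious) truePositives++;
    else if (flagged) falsePositives++;
    else if (sample.isMalicious) falseNegatives++;
  }
  const flaggedTotal = truePositives + falsePositives;
  const maliciousTotal = truePositives + falseNegatives;
  return {
    precision: flaggedTotal === 0 ? 0 : truePositives / flaggedTotal,
    recall: maliciousTotal === 0 ? 0 : truePositives / maliciousTotal,
  };
}

// Accept a new prompt only if it doesn't regress on either metric.
function isNoWorse(candidate: Metrics, baseline: Metrics): boolean {
  return candidate.precision >= baseline.precision && candidate.recall >= baseline.recall;
}
```

Because every version is scored on the same samples, a fix that helps one case but breaks others shows up as a drop in one of the two numbers instead of going unnoticed.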

Andrew: Yeah, the company I work at is integrating LLM features, and we have the same problem: of all our prompts, there's only one where we can actually measure whether it's gotten worse. So yeah, definitely unsolved.

Justin: I think this is a really novel use of AI, though, and a pretty cool use of it too. I tend to try to stay off the hype cycles, but one thing is certain: for the longest time we've had machine learning, of course, but oftentimes it's a black box, and it's hard to get magical answers out the other side unless you're training it on a very specific dataset.

So you can imagine a company in your position five or ten years ago getting a dataset of malicious code and trying to train some ML model to give a signal about whether something is potentially malicious, versus now just feeding it to a general model and saying, "Analyze this code for me."

That's pretty crazy. I also had a slightly different question for you. You were saying the best thing to do for browsers is to set a good Content Security Policy. So this is about having a good security posture, which protects you in a lot of cases.

There's an interesting transition happening where there are more popular JavaScript runtimes now: we have Bun, we have Deno. A, that's a bit of an increase in attack surface. Hopefully the APIs of these runtimes aren't too divergent, but they're divergent enough. And B, what's interesting is that Deno, which is the one I know the most about, has a different security posture around permissions.

So I'm really just curious what your thoughts are on the ecosystem right now and how it's evolving: whether you have hopes for a more secure ecosystem in the future, whether there are things that really concern you, or just your general insight on that.

Feross: Yeah. Well, some things are getting better and some things are getting worse. It's pretty mixed. With Deno, I really like the way their CLI lets you turn specific capabilities on or off. So if you have a piece of untrusted code that you want to run, you can deny it access to the file system or to the network.

That's a really great idea. I wish they took it a little further, though, because as far as I understand, that system only works at the process level. You give the whole process access to the network, and then anything in it can access the network. So in most real-world applications, you're going to end up giving every package in your project access to the file system and the network, which means you're back to the situation where any one dependency going rogue can cause problems. So it was a good start; I just wish they went further. Node has a project, or a feature, called the permissions policy.

That's a more fine-grained version of what Deno shipped, which you can use to give specific modules specific permissions to access the network or the file system, et cetera. That's pretty powerful, but it does require you to edit this massive configuration file to say what every single dependency is allowed to do.

And if you get one thing wrong, your app crashes at runtime. What if you have a code path that only triggers once a week or something, and it didn't get the permission it needed? Then you're going to get a random production crash a week after you deploy.
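To make the process-level versus per-module distinction concrete, here is a hypothetical TypeScript sketch of a per-package permission manifest and the check a loader might run before a package touches a sensitive API. The manifest shape and function names are assumptions, not the actual format of Node's policy feature or of any other tool; the thrown error models the runtime crash described above for a code path that was never granted a permission.

```typescript
type Permission = "fs" | "net" | "child_process";

// Package name -> permissions it has been granted. Unlisted packages get nothing.
type PermissionManifest = Record<string, Permission[]>;

// Process-level model: one grant list shared by every package in the process.
function processWideAllows(granted: Permission[], needed: Permission): boolean {
  return granted.indexOf(needed) !== -1;
}

// Per-package model: each dependency only gets what its own entry lists.
function assertAllowed(manifest: PermissionManifest, pkg: string, needed: Permission): void {
  const granted = manifest[pkg] ?? [];
  if (granted.indexOf(needed) === -1) {
    // A rarely used code path that was never granted this permission fails here at runtime.
    throw new Error(`${pkg} is not allowed to use ${needed}`);
  }
}
```

The trade-off shows up in the two functions: the process-wide check needs one short list but lets every dependency inherit the network once the app needs it, while the per-package check is much tighter but needs an entry for every dependency, and a missing entry becomes a runtime error.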

So it's hard to adopt. There are so many lower-level, easier security things that teams should be doing and aren't, so they're not going to use this feature at scale, unfortunately. There's a project called LavaMoat that I wanted to shout out, which is trying to make this a little bit easier.

They can auto-generate policies for your packages, and then you can check that policy file into git. If you update a package and later on it adds a new capability, you can detect that, and if you want to give it that permission, you check in the new policy file. And then there's a git diff that tracks it: "Oh, we just gave this package additional permissions."

So that's pretty cool, but it also has its own trade-offs, or downsides: the way they implement this has a lot of runtime overhead.

So far it's typically only been used by very security-sensitive products, and only in certain security-critical parts of the product. The biggest user today is MetaMask, the crypto browser extension. They basically do their signing and their crypto logic inside of a really tight sandbox that uses LavaMoat.

They also use Socket, incidentally. These things aren't competitive; you can use multiple things and have multiple levels of security, which is what you should always do.

Defense in depth, right? That's the right way to do security. But overall, I'm hopeful that something like what LavaMoat does will become part of the language. There are a couple of spec proposals to do that right now that are going through the standardization process.

So I'm hopeful that we'll actually get a native way to sandbox modules or functions, something we can use in Socket and in other community tooling, that would let us lock down these packages without a massive runtime performance hit, like 90 percent. That would be kind of the dream.

And then what we need is some community coordination, so that when you install 10,000 dependencies, like most people do in their applications, you don't have to go through and manually annotate every one of them. Some type of effort like the way community types are done, but for these permissions, would help this spread in a nice way.

And I think Socket would be happy to contribute our knowledge to that effort as well. If we see that a package is accessing the network, we have all these ways of detecting that through static analysis that we could also contribute.
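As a deliberately crude illustration of the static-analysis idea, the sketch below flags Node built-in modules that imply a capability when they appear in `require(...)`, `import(...)`, or `from "..."`. The module-to-capability table and `detectCapabilities` are hypothetical, and this is nowhere near what a production analyzer does: an obfuscated call such as `require("ht" + "tp")` slips straight past it.

```typescript
// Hypothetical mapping from Node built-in modules to coarse capabilities.
const CAPABILITY_MODULES: Record<string, string> = {
  fs: "filesystem",
  net: "network",
  http: "network",
  https: "network",
  child_process: "shell",
};

// Naive regex scan: matches require("x"), import("x"), and from "x",
// with an optional "node:" prefix and an optional subpath like "fs/promises".
function detectCapabilities(source: string): string[] {
  const found: string[] = [];
  const pattern = /(?:require\(|import\(|from\s+)\s*["'](?:node:)?([a-z_]+)(?:\/[a-z_/]*)?["']/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const moduleName = match[1];
    // Own-property check so names like "constructor" are not mistaken for entries.
    if (!Object.prototype.hasOwnProperty.call(CAPABILITY_MODULES, moduleName)) continue;
    const capability = CAPABILITY_MODULES[moduleName];
    if (found.indexOf(capability) === -1) found.push(capability);
  }
  return found.sort();
}
```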

So I don't know. I guess it's mixed, because there's some hope on the horizon, but I also see things getting worse. The number of dependencies people are using continues to climb, and I don't see that really changing. And it's funny, you guys had Isaac on recently, right?

Isaac Schlueter, the creator of npm. I was listening to that episode, and at one point he was talking about the definition of dependency hell: the old definition and the new definition. The new definition is just, "I have a lot of dependencies. I'm in dependency hell; I have too many dependencies."

But the old definition was, "I've gotten my app into a place where I can't successfully install," because package A and package B both depend on package C, but on different versions of package C. And my package manager doesn't know how to handle that, because it can only install one version of package C, but satisfying both A and B actually requires two different versions of it.
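The old kind of dependency hell can be sketched in a few lines. Assuming exact version pins (no semver ranges) and only one level of dependencies, a flat resolver that allows a single copy of each package fails on the A/B/C example, while a nested layout in the spirit of npm's `node_modules` gives each parent its own copy. The helper names are hypothetical, and real npm also hoists compatible versions to the top level, which this sketch skips.

```typescript
// Each package declares the exact version of each dependency it needs
// (exact pins instead of semver ranges is a simplifying assumption).
type Requirements = Record<string, Record<string, string>>;

const requirements: Requirements = {
  A: { C: "1.0.0" },
  B: { C: "2.0.0" },
};

type FlatResult =
  | { ok: true; versions: Record<string, string> }
  | { ok: false; conflict: string };

// Flat, single-version resolution: one copy of each package for the whole app.
function resolveFlat(roots: string[], reqs: Requirements): FlatResult {
  const versions: Record<string, string> = {};
  for (const root of roots) {
    const deps = reqs[root] || {};
    for (const name of Object.keys(deps)) {
      if (name in versions && versions[name] !== deps[name]) {
        return { ok: false, conflict: name }; // "can't successfully install"
      }
      versions[name] = deps[name];
    }
  }
  return { ok: true, versions };
}

// Nested resolution: each parent gets its own copy under its own node_modules,
// so both versions of C can coexist.
function resolveNested(roots: string[], reqs: Requirements): string[] {
  const paths: string[] = [];
  for (const root of roots) {
    const deps = reqs[root] || {};
    for (const name of Object.keys(deps)) {
      paths.push(`node_modules/${root}/node_modules/${name} (${deps[name]})`);
    }
  }
  return paths;
}
```

The nested layout removes the install failure entirely, at the cost of duplicate copies on disk.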

And I actually think part of this is Isaac's fault. Not to call out Isaac, but the npm design was so successful at solving that dependency hell problem that it basically made the cost of adding dependencies to your app zero. Before, say you were the maintainer of an open source project and you added a dependency to your package.

You were potentially screwing over all of your users, because any of them might also use that dependency. Let me make this concrete. Say I'm the author of package A, and I want to add package B as a dependency of package A. I'm now potentially screwing over any of my users who already have package B in their dependency tree.

That's because they may have a different version of package B. So as a maintainer, there's this insanely strong incentive in ecosystems like PyPI not to add dependencies, because you might be screwing over your users that way and making their apps uninstallable. So what they typically do is straight up copy-paste code into their package and say, "We have no dependencies."

But that has its own problems, right? Then you don't get bug fixes, you don't get security fixes. But anyway, my point is just that npm was so successful at making these dependencies feel zero-cost that it got us into the situation we're in now, where hello world is like a thousand dependencies.

And a production app like the Discord desktop app has something like 30,000 dependencies, and that's totally normal today. So it's not all Isaac's fault, but he was too successful, in a way.
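Part of why the counts get so large is that transitive dependencies compound. A minimal sketch, using a small hypothetical dependency graph (every package name below is made up), that counts every package reachable from a project while visiting shared dependencies only once:

```typescript
// Hypothetical dependency graph: package -> its direct dependencies.
type Graph = Record<string, string[]>;

// Counts every package reachable from `root`, excluding root itself,
// using an explicit stack so shared dependencies are counted once.
function countTransitive(graph: Graph, root: string): number {
  const seen: Record<string, boolean> = { [root]: true };
  const stack: string[] = (graph[root] || []).slice();
  let count = 0;
  while (stack.length > 0) {
    const pkg = stack.pop() as string;
    if (seen[pkg]) continue;
    seen[pkg] = true;
    count++;
    for (const dep of graph[pkg] || []) stack.push(dep);
  }
  return count;
}

const exampleGraph: Graph = {
  "hello-world": ["framework"],
  framework: ["router", "logger"],
  router: ["path-utils"],
  logger: ["path-utils", "colors"],
};
```

Here the one direct dependency of `hello-world` brings in five packages in total; real trees repeat that fan-out many levels deep.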

Andrew: He was too good.

Feross: Yeah. npm is too good.

Andrew: Yeah, that's why you built a whole company to catch exploits shipped in it. So, speaking of exploits, before we start wrapping up: you've seen lots of terrifying trends in what people are shipping in their packages. I read a blog post about ransomware and typosquatting.

What are some of the scarier trends that you've seen pop up while you've run Socket?

Feross: Yeah, let me think of which one to cover. There's so many.

Justin: That itself is kind of chilling.

Feross: Well, I think the thing that scares me the most, to be honest, is attacks that target one specific company, because they're just way less likely to get detected. There's all this bottom-of-the-barrel stuff that we catch all the time that is very low effort, and the reason they haven't had to try harder to hide what they're doing is that no one is scanning for this stuff.

No one is looking for this stuff. Just think about your own team, your own experience: do you ever open up the source code of your dependencies? I do sometimes, to see whether the code is any good. If I open up the first file and the code looks like a disaster, then I might use a different dependency, but I usually don't go further than that.

I don't go two or three layers deep and check all of its dependencies all the way down. And since no one's doing that, it makes me worry: if we're missing something at Socket, and something is really targeted at one company, what could that mean? How long is it going to be in there before somebody catches it, and what damage is going to happen?

So the really targeted stuff scares me. For the more common stuff, like the 400 malware packages I mentioned that go up every week, if you use Socket, it's a two-click install: you add it to your repo and you're protected from those. And I think everybody should do that.

Obviously I'm biased, but there's a solution to that. For the stuff that's super targeted, we just have to keep making our detections better and better so we don't miss anything for people.

Justin: Yeah. So, to lead us out, I have two questions for you, and we'll start with the first one. What's next up for Socket? What's on your roadmap?

Feross: Yeah, we've got a ton of exciting stuff. We're shipping a bunch of new languages. Right now it's mostly JavaScript and Python, and we're adding new languages like Go, which I mentioned, and we have about another eight languages coming in the new year, 2024.

That's a really big one, because when we talk to really large companies, the number of languages they use just boggles the mind. They have basically every language. One company, a bank, told us they have something like 38 languages in their environment.

We don't have 38 languages on our roadmap, so I don't know what those languages are. Maybe they just mean different file extensions, which would be less scary, but if it's really 38 languages, like they said, then it's super crazy to think about. So my point is that it's really important for a tool to support all the languages a company has.

So that's one really big priority for us. We're going to work on auto-fix PRs for more of our findings, so we can help people fix this stuff. We have some new AI stuff coming out that I can't talk about yet. And we have faster reports coming out, so we're going to get our scan times way down.

Overall, we're just trying to make the product better and better, add more ways for people to use it, and get the data into different places: GitLab support, Bitbucket support, more and more integrations with different things. And we're trying to keep our focus on making it really developer-friendly throughout the whole thing.

There's way more we could be doing there, and I'm excited to keep doing it in 2024.

Justin: That's awesome. So, following that up: what do you think the future is for this space? Thinking about the future of the web and of open source, but really specifically through the security lens, what is our future? What can we do? What should we be doing? What's going to happen? I don't know.

What are your thoughts?

Feross: It's so broad.

Justin: This is broad. Narrow it down to whatever area that you're most interested in talking about.

Feross: Specifically in the open source security space, one thing I can say is that more and more companies are recognizing that this obsession with known vulnerabilities is missing a big part of the picture. When Socket came out in 2021, we were blowing people's minds by explaining this and saying, "Hey, you should care about more than just vulnerabilities."

But now I think people are really seeing that that's important. Part of that is that "supply chain" has become kind of a buzzword. Part of it is that there are these executive orders: if you want to sell software to the government, you have to provide a full list of all of the open source packages you're using in an SBOM, or software bill of materials.
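An SBOM starts from a complete inventory of installed packages. As a minimal sketch, assuming an npm `package-lock.json` with lockfileVersion 2 or 3 (whose `packages` map is keyed by install path such as `node_modules/a/node_modules/b`), the function below flattens it into sorted `name@version` entries. It only produces the raw list, not an SPDX or CycloneDX document, and `inventory` is a hypothetical helper name.

```typescript
// Minimal typing of the lockfile fields this sketch reads; real entries have many more.
interface LockfileEntry {
  version?: string;
}
interface Lockfile {
  packages: Record<string, LockfileEntry>;
}

// Produces a sorted, de-duplicated "name@version" list of installed packages.
function inventory(lock: Lockfile): string[] {
  const marker = "node_modules/";
  const items: string[] = [];
  for (const path of Object.keys(lock.packages)) {
    const idx = path.lastIndexOf(marker);
    if (idx === -1) continue; // root project ("") or non-node_modules paths such as workspace folders
    const name = path.slice(idx + marker.length); // keeps scoped names like "@scope/pkg"
    const version = lock.packages[path].version;
    if (!version) continue;
    const id = `${name}@${version}`;
    if (items.indexOf(id) === -1) items.push(id);
  }
  return items.sort();
}
```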

So there are all these reasons why this is happening, and then of course there's the massive rise in supply chain attacks. All of that adds up to people realizing they can't just install Dependabot or some other vulnerability scanner and say, "Oh, we're good now. Our open source is safe."

They recognize that a developer could install a typosquatted package that has 20 downloads, that no one's ever looked at, and that won't have any CVEs or known vulnerabilities. That doesn't mean it's safe, right? It could be anything. So I think just that type of awareness is going to mean that people start asking more questions of their dependencies.

I think that means, hopefully, more people are going to use Socket and start thinking more consciously about their open source: what they're using, how they can cut down on the amount, how they can improve the quality of the packages they're using, and how to be more thoughtful and careful.

I think that's the future. But I'm an optimist, so who knows; maybe it'll get worse, and next time we talk we'll have like 100,000 dependencies in Discord and other apps of similar scale. But I think the future is that we're actually going to be more careful and more thoughtful about this stuff, because of the combination of things I mentioned.

Justin: It'll also be interesting, with the rise of WebAssembly, to see how some of this stuff changes. It's another distribution source, and a broader attack surface, now that more things use it as a compile target. It'd be interesting to see that.

Feross: Yeah. The other advantage is that you have a really clear boundary between Wasm and the outside world, where you can almost enforce policy. I'm actually very interested in that. It's a really good point you bring up; that's a very, very promising area.

But it doesn't work for all the code we have today. The thousands and thousands of npm packages are probably not going to be rewritten in Wasm anytime soon.

Andrew: It's a lot of work. Well, that's it for our questions this week. Thanks for coming on, Feross. This was both an interesting and a chilling look into the world of open source security.

Feross: Yeah, no worries. I hope I didn't scare you too much. There are good people in the world.

Justin: Yeah.

Feross: Cool. Yeah. Thanks for having me.

Justin: Yeah. Thanks for coming on, and also, thanks for tackling this problem. This is an interesting area where I can see a lot of the things you've been passionate about over the years culminating in a really worthwhile, very necessary initiative.

So I'm excited that Socket exists, excited to see it grow, and I wish you the best with it.

Feross: Awesome. Cool. Thanks again for having me, guys. This was really fun.