David: [00:00:00] You can't write automated tests to make sure your security rules are working correctly. Developers spend all this time building the application, they push it to production, and then they hit a load of problems, or the problems come a few months down the line. What we need is better tooling on the production side of things to bring developers into the security mindset.
[00:00:21] Introduction
Andrew: Hello, welcome to the DevTools FM podcast, the podcast about developer tools and the people who make them. I'm Andrew, and this is my co-host Justin.
Justin: Hey everyone, uh, we're really excited to be joined by David Mytton today. Uh, so David is, uh, one of the co-founders of console.dev, which I'm super excited about; we were saying before we started, it's like, wait, we should actually talk about how to run a DevTools podcast. Um, uh, David, you're also, uh, one of, or are you the founder, or one of the founders, of ArcJet?
The founder. Cool, cool, cool. Um, so you're the founder of ArcJet, which we're [00:01:00] really excited to talk about today. It's a security product, um, which I think is kind of under-discussed in the sort of front-end metaframework world, and it's really interesting that, uh, you target or market towards that audience very explicitly.
But, uh, before we dive in, uh, would you like to tell our audience any more about yourself?
David: Yeah. I'm David Mytton. I'm the founder of ArcJet, and I also write console.dev, the DevTools newsletter you mentioned, which I've been doing for four years now. Um, I started ArcJet just over a year ago, in June 2023. And prior to that, I ran a server and cloud monitoring startup that I started in 2009 and sold in 2018.
Uh, I stayed with the acquirer, StackPath, for about 18 months and then left, did a, um, a stint in sustainable computing, and then decided to get back into DevTools and infrastructure and programming [00:02:00] through console.dev.
Justin: That's awesome.
[00:02:01] Sustaining a DevTools Newsletter
Justin: One question I want to ask about console before we dig into the other stuff. How do you sustainably write a newsletter? Uh, we have a newsletter that I have not updated in a long time, uh, because it's actually, uh, kind of hard to consistently do that, but I'm curious about what your process is.
David: I'm always playing with new tools, people send me things, I see things on Twitter, and through building ArcJet, I've got to actually use a lot of them properly rather than just playing around. The whole idea with console is I include the things I think are worth using. It's not just what's new or what's cool.
Some of the stuff that I review is quite old, but I've just come across it, or I've started using it properly, and I think it's really interesting. And so it's about, as I'm browsing and reading, um, we have a tool that subscribes to about 1,500 blogs, um, from all different vendors, open source, Reddit; everything [00:03:00] goes into this tool, and it picks out things like, "this is a beta release," and then that puts it into a Notion database that we have.
Um, but I'm just keeping a list of the things I'm seeing all the time. And then every week, because it's a weekly newsletter that goes out on Thursdays, normally on Tuesday, depending which time zone I'm in, I'll put together the newsletter and spend a couple of hours writing the reviews. Each review is only 300 characters of what I like and what I don't like.
So it's not a super in-depth review; it's basically a couple of bullet points of "here's the interesting stuff, but here's also the stuff I don't like." Or, it's not really the stuff I don't like, because I don't put stuff in the newsletter that I don't think is good; it's more like, here are the things you might come across if you're actually going to use it, or things to be aware of, or limitations.
And the idea is that if you're the author of that tool, it shouldn't be a surprise. It shouldn't be me just flaming the tool or whatever; it's just, these are the limitations as it exists, because nothing's perfect. And so then it goes in the newsletter, and it only takes a couple [00:04:00] of hours to put together, because we've automated most of the actual creation of the newsletter each week. But it's kind of this ongoing process throughout the week, all the time, of things I'm seeing that just go into a long list.
Andrew: Some interesting ideas there. We've kind of shied away from automation in our newsletter, but it's interesting to hear that you guys have leaned fully into it.
David: It's automation of the discovery, um, and then creating the HTML for the newsletter. Like, I write the reviews in Google Docs, copy them into Notion, and then press a button and it generates the email. Um, so there's some automation there, but it's not like I'm asking an LLM to write the reviews or anything like that.
It's been suggested, and it is on my to-do list to try it out and see if it's actually going to provide some interesting insights. But some of the tools are new, so it's just not going to know about them, or it's not going to get the nuances, as well as making stuff up. So it'll probably end up being more effort than it's worth, having to fact-check it all, but that's going to be a fun experiment at some point.
Andrew: Yeah, I'm excited to get to the point [00:05:00] where, for all of these hallucinations that LLMs are having, we can just be like, oh, please just go implement that thing. Because more than once I've seen an API suggested to me, and I'm like, that is the golden API. And then it just never works.
David: Ah, the API just doesn't exist. I had this support interaction with, uh, with wordpress.com, and I was asking them, is there an API to do this particular thing? And, um, the support bot generated complete, full documentation for this API endpoint that just didn't exist. And then I emailed them and I was like, this doesn't actually exist.
Is it just because it's behind some feature flag and it's not available, or is it actually not there? And it wasn't there, which is disappointing.
Justin: So
[00:05:40] Console.dev Podcast Process
Justin: let's talk a little bit about the console podcast, uh, or console.dev podcast. Uh, so you said before that y'all aren't recording right now; you kind of do it in seasons, so you're kind of waiting till the next season. Uh, what is your process like?
David: So I co-hosted it with Jean Yang, who's the co-founder of Akita Software, [00:06:00] which was acquired by Postman, and she's now working on the observability tooling at Postman. And we decided that we were going to get together and do the podcast where we would interview interesting people in DevTools.
And so the idea was that we didn't want to have to do it every single week; we wanted to time-box it. And so we decided to have 10 to 12 guests. We made a list of the topics that we wanted to cover, and then we tried to find the number one person who we thought would be the most interesting to speak about each topic.
And then over a period of a couple of weeks, we recorded all the episodes and had them all produced, and then we released them as a season. So it was over kind of 12 weeks, but we'd recorded it in the space of just a couple of weeks. And that, uh, was very intense, uh, but it allowed us to, uh, spend the time on each episode and not have the feeling that, like with the newsletter, I've got to produce something every single week.[00:07:00]
I'm sure you do that with the podcast as well, trying to get the cadence right of getting guests scheduled and then rescheduling and then getting everything together. We just didn't want to have that stress. And also, with the first season, we didn't know whether anyone was going to listen to it. And so it was like, well, let's do a fixed number of episodes and see if it's interesting.
And if we want to continue doing it, or, um, we're having enough fun, then we would do another season. And so we've done three seasons now, I think. Um, but I then started ArcJet and haven't had time to do it, and Jean's startup was acquired and she hasn't had time either.
Andrew: Yeah, balancing side projects with a week to week podcast is no easy task.
David: Yep.
Andrew: Cool. I want to move on to a different topic.
[00:07:41] Ad
Andrew: We'd like to stop and thank our sponsor for the week, Mux. Mux is an awesome platform that makes adding video to your product as easy as adding an API. Video is notoriously hard to add. There's so many things you have to think about. It's a deeply technical medium and delivering it at [00:08:00] scale is no easy feat. That's where Mux comes in.
They have SDKs and front-end components that make adding an enterprise-quality video experience to your app a breeze.
One area of focus of Mux's platform that really impresses me is their data product. They've introduced a new feature called long-term storage, where they'll hold your data for up to 13 months.
Andrew: And you can view things like viewer engagement and video quality, and all of this data is accessible right through their dashboards, too.
I can easily see how having access to historical insights could really be valuable for understanding trends, planning ahead, and making data-driven decisions. It's cool to see Mux giving their customers more options and flexibility when it comes to storing and analyzing their data.
So if you want to add video to your platform and you don't want to spend weeks or even months worrying about the details, head over to mux.com.
Do you not want to hear these ads anymore? Become a member on one of the various channels where we offer it. With that, you'll get our episodes a little bit early and completely ad-free.
Now, let's get back to the episode.
[00:08:56] Sustainable Computing and Environmental Impact
Andrew: Now on to a topic we've broached here a few times on the podcast. [00:09:00] Uh, and that's sustainable computing and the environment. I've always found it very funny on Twitter where, during the crypto boom, people were like, oh, crypto bad, uses computers lots.
And then we get to the AI boom and nobody cares anymore. Uh, so can you tell us about sustainable computing and how our current world of computing might not be the best for our environment?
David: All right. So I got into sustainable computing after selling my first company in 2018, and I was thinking through what areas of the world I think are interesting and important, and climate change is one. Coming from startups and tech, I didn't know anything about it. And so I decided I'd do a master's degree in environmental technology and kind of bootstrap my knowledge in science.
My original undergrad was in law, and I'm a self-taught programmer. So a completely different area, um, nothing to do with science, and I needed to bootstrap my knowledge in science. And so I did this master's [00:10:00] degree, and that was very, very broad. It was everything you can think of related to the environment.
We covered fisheries management through to how the energy system works, um, and everything in between. And then in the final part of the master's, I decided to specialize in energy, um, because I thought that was the most applicable. We visited a power plant, and we learned how electricity works and how all the electricity grids come together.
And at that point in time, there weren't many people looking at what I describe as sustainable computing, or the energy consumption of IT, data centers, the cloud. There were a couple of research groups; um, it was probably 10 to 15 people in the world who were publishing on this on a regular basis in academia.
The big tech companies were doing things. Google has been doing a lot for a while, Microsoft too; um, Amazon's a bit slower, a bit less transparent in what they do, but they're all doing things. Um, but there wasn't really any real buzz around it. And so I thought this would be interesting, because I'm coming at it from a kind of [00:11:00] computing perspective, whereas most people were involved from the science, energy, engineering side of things. And I thought I could combine those two, having done that master's degree and specialized in energy for a little bit; I thought I could bring more on the computing side of things. And so the idea with sustainable computing is to be able to continually increase our usage of computing and technology whilst at the same time decreasing the environmental impact.
And when people talk about the environmental impact, they primarily mean the carbon footprint, and usually the carbon footprint of electricity, which is quite specific. But it's much broader than that. There's the water consumption of data centers, air conditioning, even in the production mechanisms, but then also manufacturing and the mining of materials to create computers and data centers and all the equipment that goes into it. The easiest thing for people to think about is carbon footprint, because the goal is zero; you're going for net zero or zero carbon. And that's [00:12:00] really easy to understand, because it's a goal of zero. When it comes to water footprint, that's a bit more vague. It's not necessarily zero that we're aiming for.
It really depends on the location, because if you're in an area that is water stressed, then you want to get to as little as possible. But if you're in an area that isn't: right, if you put your data center next to the sea or by a lake, then water consumption is a factor, but it's probably not the most important one.
And so there's much more of a trade-off there, and it's a lot more complicated and complex to discuss. And so the idea with sustainable computing, therefore, is to understand: when you do something on a computer, on the internet, what is the energy footprint, and then what is the environmental footprint of that?
Andrew: So how do we tackle that? How do we, like, use less energy when our world has largely moved to the cloud? All of the technology that's on the bleeding edge is always some heavy operation happening in the cloud. So [00:13:00] how do we get past that?
David: So over time, the overall energy efficiency has improved significantly, and the demand for computing has somewhat decoupled from the, um, the overall impact. So, um, if you compare back to 10, 15 years ago, the usage of computing has gone up 10,000, 100,000 times, whereas the overall energy consumption has only gone up a couple of percent.
So it's still increasing, but it's increasing at a significantly lower rate than it has in the past. And the reason for that is just efficiency. The cloud, one of the big things that they're always talking about is how efficient it is. The resources are there whether you're using them or not, but they can optimize the placement of workloads to maximize their infrastructure.
And they do that just from a cost perspective, ignoring the environment completely. It's better for them financially to maximize the usage of their infrastructure. It just so happens that the majority of the cost in running the infrastructure is energy, and so there's this [00:14:00] linked incentive. And so just using the cloud itself is typically way better than putting servers in a data center, if you're just thinking about the environmental footprint. Now, there are all sorts of different things you need to think about.
Managing servers: people don't want to do that, although people are getting more into it these days. Um, there's a kind of cycle of put everything in the cloud, and then you realize it's really expensive, so that brings certain things back into your own data center, all those kinds of things. And no one's actually building their own data centers, but that's kind of how you phrase it; you're putting it in a colocation facility or something like that. But it's very, very difficult to link what code you write to the overall impact. The level of granularity, uh, in terms of the monitoring and the observability just isn't there. And often there is no link between the amount you use and the amount of the environmental impact.
If you think about energy, um, on, uh, a per-processor basis: let's say we're using streaming software, so we're doing encoding [00:15:00] on our computers. It's going to be converting the video into the encoded version, then sending it over the network. And so you can break that down into two steps. There's the actual encoding bit on your computer, and that is directly linked to the usage.
So whilst we're streaming, our laptops are using energy; um, mine's at 11 percent right now. You can probably see how much yours is doing on your CPU and look at the energy consumption there. But as far as the network is concerned, there is no difference between us streaming now, even at a hundred percent CPU usage, and the computer being off.
The amount that the network uses is exactly the same whether you're using it or not. That's not a hundred percent true; there's a very slight variation. I just published a paper on this, um, which took a couple of years; academia is very, very slow, it took a couple of years to publish. It's a model which shows basically no impact from usage of any kind of computing on the network.
Um, there is no link; it's not directly proportional, um, because the network is provisioned for [00:16:00] maximum capacity and it uses the same amount either way.
Andrew: So was the, like, uproar over crypto, and now AI image generation, kind of just rage bait?
David: Some of it. Um, with crypto, it's a little different. It's using the network less; it's proof of work, right? So your CPU has to do work, and that uses energy, um, and that does have a huge energy consumption. Um, but when they changed the algorithm, you can see on the graphs that the energy consumption basically went to zero. With proof of stake, the energy consumption has almost gone to zero; that's Ethereum. Bitcoin is still proof of work.
And so it still has a relatively large energy footprint. Um, but most of the activity happens on Ethereum, with all the different blockchains, um, all the different tokens that are built on top of Ethereum. And so that switch was possible. Um, so you really have to separate the different types of environmental impacts. If we're thinking about processing, then that is directly proportional to the usage. The network isn't, um, and those are the [00:17:00] two interesting components that people tend to think about. Um, but then there are other, um, impacts as well, like building a laptop, manufacturing it.
Most people's usage of their computer equipment, almost all of their carbon footprint of that usage is in the manufacturing of the laptop. It's not in the usage of it. Um, it's in the actual manufacturing. And so the biggest thing that someone could do to reduce their environmental impact of computing is to buy fewer things.
Don't get a new laptop every single year; keep it for the full lifetime of the equipment, like four or five years. Um, particularly with the new Apple Silicon: the laptop I've got is the original Apple Silicon MacBook Air, um, and it's still incredibly fast. You can probably use it for five or six years before you even need to consider replacing it.
[00:17:47] ArcJet: Enhancing Application Security
Justin: So let's transition and start talking about ArcJet. Uh, so what is ArcJet?
David: So ArcJet is an SDK that developers install into their application that helps them build [00:18:00] in security. So we provide things like rate limiting, bot protection, email validation, and attack detection. And the idea is that you build it in as a developer, into your code. You can define the rules in the code that they're actually protecting, and it will then give you the response back so you can change the logic of your application.
And the idea is that developers can build this in from day one, or they can add it to an existing application. And you can kind of bootstrap these core features that build up the security profile of the application, just as you would with any other library that you might be using.
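For readers following along, here's a rough sketch of what that looks like with the Next.js SDK. This is illustrative, based on ArcJet's public docs at the time of recording; exact rule names and options may differ:

```ts
// Illustrative sketch based on ArcJet's documented Next.js SDK;
// rule names and options may differ from the shipped release.
import arcjet, { shield, detectBot, fixedWindow } from "@arcjet/next";

// The rules live in the code they protect, so they can run locally,
// in CI, and in staging exactly as they will in production.
const aj = arcjet({
  key: process.env.ARCJET_KEY!, // site key from the ArcJet dashboard
  rules: [
    shield({ mode: "LIVE" }), // suspicious request (WAF-like) detection
    detectBot({ mode: "LIVE", allow: [] }), // deny automated clients
    fixedWindow({ mode: "LIVE", window: "60s", max: 10 }), // rate limit
  ],
});

export async function POST(req: Request) {
  // Every request is analyzed against the rules; the decision comes
  // back to your code so you can change your application logic.
  const decision = await aj.protect(req);
  if (decision.isDenied()) {
    return Response.json({ error: "Forbidden" }, { status: 403 });
  }
  return Response.json({ ok: true }); // normal handler logic
}
```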
Andrew: So, as I've said a lot on this podcast, I'm a front-end developer and I really don't know too much about security. So, uh, ArcJet says it protects me from a lot of things, but like, what are the sorts of things it protects me from, maybe in the context of Shield?
David: So Shield is a WAF-like component, and a WAF is a web application firewall: it analyzes every request and looks for suspicious activity. Um, the typical thing that you might think about is a [00:19:00] SQL injection or cross-site scripting attack. The Shield component can detect that, but unlike every other WAF, it doesn't block it immediately. Most attacks don't succeed on the first request.
Typically when you're probing an application, you'll take a bit of time. You'll send requests to it, you'll see what it's doing, see how it responds, and then over time you can figure out if there are any vulnerabilities and where they are. And that process of probing an application is what we detect. So this helps reduce the number of false positives.
Because the last thing you want is to have something you think is suspicious, and actually you've just blocked someone who's trying to buy a product on a website or sign up to your service. So we use the profile of the requests over a period of time to decide whether we think a request is suspicious.
And then we return the result back to you. And as a developer, you can just accept that result, and it might just be: block the request. More often, you probably want to take some kind of custom action. So if the user's already [00:20:00] authenticated, it might be that it's flagged as suspicious and you decide, oh, well, let's ask them to reauthenticate.
Can they prove who they are? Um, or you might want to protect your API from abuse, large numbers of requests coming through. If it's an anonymous user, then you want to apply a certain rate limit, but if it's your largest paying customer, then you probably don't want to block them. You might just want to trigger an alert to your team so they can reach out and say, what are you doing with the API?
Can we help you change your design? Or do you want to pay us more for a higher quota, or something like that? And the idea with ArcJet is you can make those very granular decisions, because we have the context of the request. You have the context of where it is in your application, who's authenticated, what's the session, uh, what you want to do as a developer. Rather than it being something generic that sits on the network, which has no understanding of your application, of your tech stack, of what you're building, and will just block requests without you even knowing what's going on.
Justin: [00:21:00] Um, I want to talk about the approach that you're taking here a little bit, because it is a little bit different than what I've seen in the past. Uh, so you mentioned earlier that ArcJet is an SDK. Um, you've got a lot of really great examples on your landing page of, like, integration into Node.js or Next.js or a bunch of different frameworks and tools.
Um, and you sort of, like, call the ArcJet SDK, you set it up, uh, give it a key and some rules that you want to configure. And then, based on what you're doing: if you're doing Shield, uh, you might have it in some middleware that's handling the request and kind of looking at that; or if it's doing bot detection, you know, it might do something a little bit different; or if it's doing, um, you know, email detection, then you'll pass emails to it manually.
But, um, the thing that's interesting about this approach is that in some middleware, you're making a request to ArcJet to do this verification. Um, and this [00:22:00] has some really interesting trade-offs. And I'd like to talk, uh, sort of in depth about what those trade-offs are, um, what this makes easy, and what this might make harder.
Um, so do you kind of want to start it off from there?
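The pattern Justin is describing looks roughly like the following minimal middleware sketch, modeled on ArcJet's Next.js examples (details may differ):

```ts
// middleware.ts -- minimal sketch of the middleware integration;
// modeled on ArcJet's Next.js examples, details may differ.
import arcjet, { shield } from "@arcjet/next";
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [shield({ mode: "LIVE" })],
});

export async function middleware(req: NextRequest) {
  // The SDK analyzes the request (locally and, when needed, via the
  // ArcJet decision API) before your route handler ever runs.
  const decision = await aj.protect(req);
  if (decision.isDenied()) {
    return NextResponse.json({ error: "Forbidden" }, { status: 403 });
  }
  return NextResponse.next();
}

// Only run the check on API routes (illustrative matcher).
export const config = { matcher: ["/api/:path*"] };
```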
David: Right. So we've come at this from the perspective of a developer, um, who's probably trying to solve a problem rather than thinking about security specifically. The common example I use is they're getting spam signups. Now, to solve that problem, you need to do rate limiting, because you want to make sure that normal people can submit a form
once or twice, maybe three times in a short period of time if they make an error, not hundreds of times, and not even tens of times. Um, so you have to sort out rate limiting. So probably you've got Redis in there, or, um, you're figuring out where you're going to store the rate limit, because you need to track that across time.
Um, so you have to build something there. And then you probably need to do bot detection because you want to detect those automated clients. And to do bot detection, you're going to have to [00:23:00] figure out a product to buy. There's probably a commercial product, um, that does that. Uh, and then you need to do email validation and verification.
So you need to make sure the email is actually valid. Um, but you also want to do things like, well, is this email address actually deliverable? Does it have MX records? Um, where is that user signing up from? Are they going through a proxy? Are they going through Tor? Um, is it coming from a cloud IP address, which might be suspicious?
And so to do that, you've probably got to build that as well; you've got to write some code. And so you've got these things where you've got to figure out Redis, um, which is easy on a single server but becomes more difficult when you're deploying globally. Um, you've got to integrate an existing product, um, and then you've got to build something.
And all of that is undifferentiated work, for a small startup in particular, but really for any business. Um, developers generally prioritize fixing bugs and building features for users. Building this kind of security functionality doesn't differentiate them, but it's a pain point, because it might be costing them money, or it might be hurting their credit card processor reputation, or something like that.[00:24:00]
And so we come at this from the perspective of the developer trying to solve the problem, which we want to be as quick and easy as possible. And the way that developers normally do that is they install a dependency, install the library to help them solve that particular problem. And they also don't want to then have to manage infrastructure.
So Redis, it's not that difficult to manage on a single server, but it gets more complicated. If you're using a database, um, then that's more challenging. You're probably already using one, but is it set up to do rate limiting? Is that the right place to store rate limits? How do you design all of that, that kind of thing?
And so the way most security products do it is by having an agent, which you've got to install somewhere, or by putting it at the network level. And our philosophy is we don't want an agent, because it's another thing for the developer to manage. If they're in a larger company, it's probably another team that they've got to get it approved by or managed somewhere. It's probably yet another agent that they've installed on their, um, their servers.
Um, and we don't want to give them anything else to manage, but we also want developers to be able [00:25:00] to do things locally. You can't take a network firewall and run it locally on your development environment. And the idea there is that you can write the security functionality, you can integrate ArcJet locally.
You can integrate archette locally You can test it locally. Um, then when you push it to staging it's the exact same thing that's running there You can run it in ci and test it and it's the same as when you go into production The number of people i've met who have kind of pushed it Got a new security product.
It's only in production. They turn it on and then suddenly everything breaks because it's changed something in the environment or is filtering something. Um, and the idea with that is you can avoid that pain of breaking production by testing it locally. One of the big challenges that we saw was that, well, In serverless environments, there's no shared state.
And so the reason you have something like Redis running somewhere else is because you need to store something you're tracking the rate limit. You need to store that somewhere. And so the natural place is to put it in Redis. Um, because when you're using a service environment, the function is [00:26:00] going to recycle on sometimes every request.
More likely it'll stay warm for a little bit and then it'll shut down and then you'll have the cold start and then you've got to distribute it globally, depending on your application. And so this was a real challenge because we wanted to have no agent, no infrastructure. Um, but we also needed to do things like tracking state across requests.
And so the architecture that we came up with is: the SDK is a very lightweight wrapper around a WebAssembly module, and that does a lot of the analysis. It runs locally in your environment, and we can do things like bot detection and email validation entirely locally; that just runs as part of the Node process or the Python process or whatever it is that starts up.
But then, when you configure a rule that requires tracking state across multiple requests, or needs to do something like a network call for an MX record lookup, which might be blocked on your firewall or might be really slow, um, then we have our API. We've got this real-time decision [00:27:00] API that the ArcJet SDK will send the request to, to get a decision, and then return that back to the SDK.
And we run that API in every AWS region, as close as possible to your application. We're also deployed on Fly.io, so we're deployed in every single one of their regions. And the goal is that we are one to two milliseconds of network latency from our API, and then we give ourselves an SLA of 20 milliseconds to give you a decision, full round trip, including the network.
So it adds a bit of overhead, but not very much, 10 to 20 milliseconds, and that's only when it needs to make a request to the API. The SDK has decision logic built in so that it doesn't always have to go to the API if we can do it with the WebAssembly, and then if we return a deny decision, that gets cached.
And so there's this complicated logic that determines whether it goes to the API, what's in memory, what it caches, um, and whether we can use that WebAssembly component.
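To make that architecture concrete, here's a conceptual sketch of the local-first decision flow David describes. This is not ArcJet's source code, just an illustration; every helper here is a made-up stand-in:

```ts
// Conceptual illustration only -- not ArcJet's actual implementation.
type Decision = { conclusion: "ALLOW" | "DENY"; local: boolean };

const denyCache = new Map<string, Decision>(); // cached deny decisions

function fingerprint(req: Request): string {
  // Simplified: a real system fingerprints on more than one header
  return req.headers.get("x-forwarded-for") ?? "unknown";
}

function analyzeLocally(req: Request): Decision | null {
  // Stand-in for the WebAssembly module: bot heuristics, email syntax,
  // anything needing no shared state. Returns null if inconclusive.
  const ua = req.headers.get("user-agent") ?? "";
  return /curl|python-requests/i.test(ua)
    ? { conclusion: "DENY", local: true }
    : null;
}

async function callDecisionApi(req: Request): Promise<Decision> {
  // Stand-in for the low-latency decision API, which owns shared state
  // like rate limit counters and does slower lookups (e.g. MX records).
  return { conclusion: "ALLOW", local: false };
}

export async function protect(req: Request): Promise<Decision> {
  const key = fingerprint(req);
  const cached = denyCache.get(key);
  if (cached) return cached; // previously denied: short-circuit locally

  const local = analyzeLocally(req);
  if (local) return local; // the WebAssembly could decide on its own

  const decision = await callDecisionApi(req); // needs cross-request state
  if (decision.conclusion === "DENY") denyCache.set(key, decision);
  return decision;
}
```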
[00:27:56] Choosing WebAssembly: Benefits and Challenges
Andrew: The WebAssembly component seems pretty interesting to me. [00:28:00] Uh, what were some properties of WebAssembly that made you choose that? And then like, it's just like all of the logic for that written in some language and you compile it to a WebAssembly module, like how does that work?
David: Yeah, exactly. It's written in Rust and compiled to WebAssembly, and, uh, the API is written in Go. Um, and it actually calls the same WebAssembly code, because in some cases... you can't run WebAssembly everywhere. You can almost run it everywhere, which is one of the reasons why we chose it, because it's completely independent of the execution environment.
But in some cases, WebAssembly is not available, and so in those cases we fall back entirely to our API. But the challenge there is, you've got the SDK written in Node.js for our, um, for our Node SDK, and the API written in Go, because that's the best language for gRPC APIs. But then you start getting into the implementation differences between languages.
Like, if you're doing hashing, the hashing algorithm might be implemented very slightly differently in Node than it is in Go, and you get two different values [00:29:00] for the same input. We wanted to avoid that, and so we call our WebAssembly module in Go, and it runs the exact same code as if you were calling it in the SDK. There are other properties of WebAssembly which are really interesting. It gives you almost native speed, so there's no real overhead. Um, we've written it in Rust, which doesn't have a garbage collector, so that avoids a problem that you might have if you're using Go, for instance; um, compiling Go to WebAssembly includes the garbage collector.
And also, for security, it's entirely sandboxed, and you can see by inspecting the code exactly what we send into the WebAssembly component and exactly what comes out. The bindings show that, and there's no ability for that WebAssembly component to do anything else on your system. And that's a really nice security property.
It's partly a marketing thing, um, really, but it does actually give us proper security benefits, and it's nice for us to be able to say it's executing entirely within a secure sandbox.
Justin: So one of the questions that I [00:30:00] had was: um, so you have an API component, which hopefully you call as little as possible, to make sure that, uh, a customer's request is as fast as possible. Um, what happens if you experience an outage or downtime? And, I guess, because you said that you're deployed to so many
nodes, so every AWS region and, like, every region in Fly.io, it seems like you could have a localized outage, where one node that might be close to a customer is down, and then you could have, you know, maybe a region outage or something. Can you kind of talk through what happens in those scenarios?
David: Yeah. So we thought through as many of the failure scenarios as possible. So like you said, we try and do as much as we can locally using the WebAssembly, and where the WebAssembly can make a decision, it just reports that back to us asynchronously. Um, so it doesn't block the request. But when we need to track a rate limit, then we do block the [00:31:00] request.
And that's why we have this very tight SLA. So we go to the closest cloud region to wherever you're running, and we use anycast to do that. So we use BGP to help us route the request appropriately, either on Fly.io, which provides that natively, or on AWS, where we use the latency-based routing built into Route 53.
Um, so that will pick the closest region, and then we're deployed across multiple availability zones within the region, so we can survive a localized problem. The interesting thing is, let's say your application is deployed to us-east-1 and you're going to our API in us-east-1. If there's a regional outage, then probably the application is down as well as our API.
So we can avoid that challenge. It used to happen a bit more often than it has in recent years; us-east-1 has this reputation. It's their biggest region, um, and it has had some availability challenges, and you know when it's down because there are so many [00:32:00] services that use that region by default. But each region has been architected so that it is entirely independent.
There are no dependencies on any other region; we can take a region offline and it doesn't affect the application. And the goal of the API is to return a decision back to the SDK as quickly as it possibly can, which means it doesn't even do any database calls. Uh, it hands it off to a queue behind the scenes to record the request into ClickHouse, which is what we use for showing you the requests on the dashboard.
Um, so there's a lot of resiliency built in just to the cloud architecture. But then if all that fails, um, or the AZs that we're in are down and you're in a different one, because AWS has a lot, um, then the SDK is designed to fail open. And there's a timeout built into the SDK so that if there is an issue, it doesn't take down your application.
And we think this is a pretty good way to do it, the best default, because it means that your application doesn't get taken offline [00:33:00] if we have a problem or if we're slow. And you have the option to flip that. The SDK doesn't throw; it will just report an error, and you can do error checking within the code with decision.isErrored().
Um, and then you can decide what you want to do with that within your own code. Um, by default, we don't do anything; we just log it so you can see it in your code. And the trade-off there is that potentially the security protections that you expect to exist are no longer there. Um, but the idea is that you have that choice, and perhaps on certain APIs you're
less concerned about the security layer being unavailable for a short period of time, uh, but maybe on some very sensitive endpoints, you do want to fail closed. Like, AWS's WAF has this default of fail open, but you can switch it to fail closed. And it's really then down to the developer to decide what they want to do in those error cases, based on their knowledge of the application.
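In code, that choice looks something like the following. decision.isErrored() is the call David mentions; the handling around it is an illustrative sketch, with `aj` being an ArcJet client configured as in the earlier examples:

```ts
// Sketch of the fail-open default; `aj` is an ArcJet client configured
// as in the earlier examples, and the handling here is illustrative.
export async function GET(req: Request) {
  const decision = await aj.protect(req);

  if (decision.isErrored()) {
    // Fail open by default: log it and let the request through, so an
    // ArcJet timeout or outage can't take your application down.
    console.warn("ArcJet error, failing open:", decision.reason);

    // For a very sensitive endpoint you might flip this and fail closed:
    // return new Response("Service unavailable", { status: 503 });
  }

  if (decision.isDenied()) {
    return new Response("Forbidden", { status: 403 });
  }
  return new Response("OK");
}
```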
[00:33:54] Security Primitives and New Features
Andrew: So in reading through the docs, it seemed like there's a focus on like these security primitives, like [00:34:00] looking at the bot protection feature as a whole, it's really just kind of like a composition of the other primitives that are found in ArcJet. So are there any other primitives that you want to add that'll unlock some like new security things?
David: Right. So we've come at it with this idea of primitives that you can compose. Um, we provide what we call a product, um, which uses multiple primitives configured in our recommended way. So signup form protection uses bot protection, rate limiting, and email verification, and you can just drop in the code and use it in one go, or you can use the primitives individually and configure them directly.
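A sketch of that composition, based on ArcJet's documented protectSignup helper; the option names and values here may differ from the current release:

```ts
// The signup "product" composing three primitives in the recommended
// configuration -- a sketch; option names and values may differ.
import arcjet, { protectSignup } from "@arcjet/next";

const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    protectSignup({
      // Reject emails that are invalid, disposable, or undeliverable
      email: { mode: "LIVE", block: ["INVALID", "DISPOSABLE", "NO_MX_RECORDS"] },
      // Deny automated clients submitting the form
      bots: { mode: "LIVE", allow: [] },
      // Real users submit a form once or twice, not tens of times
      rateLimit: { mode: "LIVE", interval: "10m", max: 5 },
    }),
  ],
});
```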
And the idea behind this was something that I'd seen from console.dev: developers tend to focus on that happy path of the quick start, where you want to get the developer up and running as quickly as possible, and they use your application in a very specific way. But where I've seen tools not being successful is they don't then think about, well, how do I break out of that happy path?
How do I become a power user? Can I configure [00:35:00] things outside of the recommended way? Um, or what if I just have some custom thing that I need to be able to do? And so we wanted to provide developers with a very easy way to just drop in a couple of lines of code
and solve a particular problem. But every developer knows that it's never just a couple of lines of code; you always have to do extra things. And when you need to get into the details, that's where a lot of products fail. And so that's why we've got this model of primitives that you can get into the details of. So we've got those key primitives that I mentioned, and, um, potentially by the time this podcast is out, we'll have a new primitive, which is data redaction and personal information detection.
And that is actually a really interesting one, because it's executed entirely in the WebAssembly module; there's no API component to it. And there's a reason for that: we've written a custom parser, so we can do it entirely in WebAssembly. But also, personal information is exactly the thing you don't want to send to a cloud service, and every single other product that I've seen that [00:36:00] does this always requires you to send it to their API, and then they do the detection. Which is great, and that's one way of doing it, but then you have to trust that third party.
And the goal that we had with our module was that you could do all the detection and the redaction entirely locally, and we would report the offsets in the string to our dashboard. So you could go back and find where that personal information was, but we would never see any of that data through our API.
And that's one of the great things about WebAssembly: we can implement that entirely locally.
Andrew: What, what is the parser parsing, like just random strings that might contain like personal information? Is that like parsing natural language?
David: It will parse what you give it. So, look at the body of a request; you can give it the headers, you can give it any string. Uh, it's just an SDK that you import. Um, and you can use that as a completely independent SDK, um, provided by ArcJet, if you like, um, or you can use it as part of the rule set. [00:37:00] And the use case that we're building, the first example use case, is a wrapper around LangChain.
So that when you're calling a third-party LLM, you're not sending it personal information, and we can detect that before it goes off to the LLM API. Um, so if you have a support form on your website and you've got a chatbot that's powered by AI, you don't want customers sending a credit card number or other personal information. Particularly if you're in a regulated environment where you have healthcare information, for instance, or you are not entirely convinced that you've managed to redact all your logs, or you've got some kind of GDPR compliance where you've got to be able to, uh, delete someone's data on request.
You just don't want any of that to go through your infrastructure. And so we've got this new component that will process it entirely within your environment and then allows you to detect and optionally redact it.
Justin: I guess this would be some level of structured data, like credit card numbers, phone numbers, social security numbers, addresses. Maybe it gets harder when you're, like, doing names, [00:38:00] uh, because that would be, you know, very loose.
David: Yeah, we've had a lot of fun going through all the different formats of data and thinking, well, a name: would we have to have a database of every single name in the world? That is quite challenging to do. And even if we could do that, having that database in the SDK, um, would cause it to balloon quite a lot.
So yeah, names we don't redact; they're very difficult to detect. That's a perfect use case for an LLM or some kind of natural language analysis or something like that, and that is one of the things that you would want to send to an API that does the analysis for you. But we think that the core components of personal information that you want to ensure never appear in your, um, in your infrastructure, like a credit card number, for instance, for PCI compliance, can be done entirely locally.
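A sketch of what that looks like with ArcJet's @arcjet/redact package; the entity names here are illustrative and may differ from the shipped list:

```ts
// Illustrative sketch based on @arcjet/redact; entity names may differ.
import { redact } from "@arcjet/redact";

const input = "Card: 4111 1111 1111 1111, contact jane@example.com";

// Detection and redaction run entirely in the local WebAssembly
// module; the text never leaves your environment.
const [redacted] = await redact(input, {
  entities: ["credit-card-number", "email", "phone-number"],
});

console.log(redacted);
// e.g. "Card: <Redacted credit card number #0>, contact <Redacted email #1>"
```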
Justin: Yeah. I think there are definitely levels of severity for the kind of data that you could leak. Like, if you're leaking someone's social security number (which I [00:39:00] guess everybody in the U.S. has already had their social security number leaked now, so maybe it's a moot point), that's not something that you want to be leaked from your organization. So, you know, that's something that's really well structured and easy to block.
David: Right. The leaking is one thing; obviously you just don't want that to happen, but it has happened and the data is out there. I think what's more challenging for organizations is the privacy compliance, where you've got to remove or update personal information. Because if someone's submitted something like an email address or, um, some other personal information into your system, and it's gone through your API and it's been logged, maybe it's gone to Datadog.
Um, so it's in Datadog for 15 days, and then it ages out into an S3 bucket, and you're retaining that for two to three years, maybe. Trying to find all that information is almost impossible, and trying to show that it's been removed is very, very challenging. And so the goal there is that it never appears in any of your logs.
It never appears anywhere in your infrastructure that can't be updated, [00:40:00] like a database. Um, just redacting fields in logs is, um, not trivial. And so the goal of this is less the leak protection, which it does do, but more: can you comply with all the privacy requirements that we now have?
[00:40:17] Using LLMs for Security
Andrew: So you mentioned LLMs in there. Have you guys experimented with any interesting ways of using LLMs to do security stuff? Like, do you give it a request and say, is this bad?
David: We have done some, um, some tests and experiments on it, to look at the characteristics across multiple requests. Um, the challenge that you have with looking at requests is just the volume of logs and being able to understand them, and this hits the token window limits of LLMs very, very quickly. Uh, a 1-million-token window is very, very interesting, because that allows us to put a meaningful amount of data in, um, but it's not cheap.
And so really we're falling back on what used to be called machine learning and is now called [00:41:00] AI. Companies like Cloudflare have been doing this for probably over a decade now, where they're actually just doing machine learning on all of the, um, all of the requests that they get. Where I think LLMs will be
more interesting is the incident response, the "is something weird going on?" question. That's often where security incidents start: you notice something strange or unusual in your infrastructure or in the activity in your systems, and it's hard to know whether that is a security incident or not.
And often you have to engage security incident response teams, who are not cheap, to go in and actually do the full analysis, the forensics, to understand what's been going on. I think that's where LLMs will become more useful in security: the post-breach response, the analysis, the forensics, that kind of stuff, looking through huge volumes of data, which just takes a lot of human time to do.
And really, you want the human specialist knowledge to interpret those results and figure out [00:42:00] what you need to do next. I think the more interesting thing with AI in general, and the rise of LLMs, is how it's changed how people need to think about infrastructure. In the past, you could assume there's a zero marginal cost to an additional user. You're running your cloud infrastructure, and adding a new user into your application,
like, there is an additional cost, but it's so tiny that you wouldn't really notice it, for most applications anyway, most SaaS applications. But with AI, there is now a linear relationship between the number of users and the cost of your application, because you add another user and they start using all your AI functionality, and it's going to cost you on the inference, uh, or on the OpenAI API credits, or whichever provider you're using.
And I think people haven't really made that mindset shift, because they're still thinking in this world of, well, the infrastructure is there, it doesn't really cost me more. If we add 10,000 more users, then we have to add some more EC2 instances or increase our [00:43:00] concurrency limits on Lambda or pay Vercel a little bit more.
Um, but with AI, an additional 10,000 users could cost you a lot of money. And so that's why we're seeing quite a lot of demand, particularly for detection functionality with AI, because people are trying to do inference stealing. And so I think that's the interesting shift that we're starting to notice.
Justin: I had one question that I wanted to ask, and you've kind of already spoken to it, but just to sort of, like, iterate on the idea. Um, so ArcJet has a lot of really interesting functionality, and you've sort of described it as: the user has a problem, and it's usually a very targeted type of abuse.
Um, one challenge with rate limiting in particular is that if you're doing it in process, if you're doing it as a part of your application, then you could get a denial-of-service attack, and the rate limiting which could help you with that won't necessarily help you if it's running in process.
Um, so I'm curious to [00:44:00] hear about, uh, how you're thinking about that. I mean, it's sort of obvious that you've made a trade-off for, like, the developer and ease of use and very specific use cases, so it sounds like that's not particularly something you're trying to solve for. But I'm curious to hear where your thinking is with that, and if that's factoring into some future design.
David: DDoS protection is something that we specifically are not trying to solve because that is provided by all of the platform providers, all of the cloud providers for free now. Um, 10 years ago, that wasn't the case and this is where Cloudflare really made their name. They are the best DDoS protection still in the world.
Um, they also have their CDN and DNS, which, um, are great. Um, but they made a lot of their money, and continue to make a lot of their money, on protecting against these sophisticated volumetric DDoS attacks. That differentiation is gradually eroding as all of the platforms add this in. And whilst you will [00:45:00] see the likes of Google and AWS
blog about the largest-ever DDoS attack that they have mitigated, you'll note that it's AWS talking about it: they've done it on behalf of their customers, and none of their customers noticed. And that's their goal. They want to provide the infrastructure layer. They are responsible for security at the network level, and just having huge volumes of
traffic coming into your network is going to be solved by them. Now, you can pay extra. Like, AWS has an advanced service where they say, OK, now you need to install our WAF, and you have to set up the rate limits, and, uh, you need to ensure that you've architected it in the correct way, you need to be using our, uh, Global Accelerator,
you have to use CloudFront, you have to use all these different components. And that starts getting closer and closer to your application, and you have to pay them a certain amount of money before they will start helping you with that level of protection. And I think that's where we start to come in. Because as soon as you start needing to configure sophisticated rate [00:46:00] limits, that aren't just blocking a series of IP addresses or ASNs or anything like that, um, then you're getting into questions around, well, how is the application going to handle that?
Which API routes need to have which configuration? Can you just return a generic, um, 503 error, or do you actually need to return a JSON response with the details of the error? Because everybody using the endpoint is expecting to get a JSON response, not an HTML error page that's potentially going to break their application.
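For instance, an API route probably wants to deny with JSON the client can parse rather than a generic error page. A hypothetical sketch (the body shape and Retry-After value are made up):

```ts
import type { ArcjetDecision } from "@arcjet/next"; // type name per the docs

// Hypothetical: the JSON body shape and Retry-After value are made up.
function denialResponse(decision: ArcjetDecision): Response {
  if (decision.reason.isRateLimit()) {
    return Response.json(
      { error: "rate_limited", message: "Too many requests" },
      { status: 429, headers: { "Retry-After": "60" } },
    );
  }
  return Response.json({ error: "forbidden" }, { status: 403 });
}
```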
Um, and so these are the things that we're focused on: where you need the infrastructure team, probably, but you also need the development team. And historically, we've just had developers focus on the application, and then they give it to someone else to deal with the infrastructure, or they're dealing with the DevOps side of things.
And there's a separate team for security, but ultimately security then needs to get developers to make the fixes and to make the changes to the application. And the incentives are different: developers need to build, security people break, and they try and, you know, [00:47:00] interface somehow to get those fixes in when they need to.
And there's often this tension, because developers need to build stuff; they're not incentivized by the security team's incentives. And so that's why we're focused on these things that really need that application context. And we think that, over time, the cloud platforms will just deal with all the DDoS attacks, and we can focus on the things where you really need that understanding of the application and the context.
[00:47:23] Developer-Centric Security Tools
Andrew: So what do you think the software industry has gotten wrong about security?
David: I think the challenge has always been for developers that it's thought of as someone else's job. Um, when it comes to writing code, there's been a revolution in the tools that help you write more secure code. Um, Dependabot is probably the one that developers are most familiar with, opening up dependency updates for you.
Um, it's often very noisy, depending on how many dependencies you have; um, Snyk does, uh, does a good job at that. Um, there are static analysis tools like Semgrep, um, which is great. Um, you [00:48:00] can also think of it like linting and code formatting; they're the same kinds of things that are happening as you're writing code.
And then you've got secrets detection and the code scanning that GitHub does when you push code and you've got their security features enabled. So all of these things have been doing a great job, and doing a better job over time, as you're writing code. But then you hit this line of separation when you push your code into production, and then it's someone else's problem in security, or you put Cloudflare in front of it and just delegate it to a third party.
But often that comes at the very end of the development process, and there's no way you can test Cloudflare locally, and you can't run it in staging. You can't write automated tests to make sure your security rules are working correctly. And so developers spend all this time building the application, getting everything right.
They push it to production, and then they hit a load of problems, or the problems come a few months down the line, once they start getting real traffic, um, real traction. And then they have to go back and [00:49:00] refactor and make changes. Um, and it's all of this undifferentiated work that I mentioned. And so I think what we need is better tooling on the production side of things to bring developers into the security
mindset. Um, the tooling just hasn't been built for developers. This is from spending three and a half years on console.dev: one of the things that made me start ArcJet is just the lack of security tooling built for developers to help them once they go into production. It's all pre-prod, or built for security engineers.
Justin: It's an awesome product. Um, I've had to deal with many of these problems in production in one form or another, and, you know, it's usually either manually trying to implement something, which is fraught, or, uh, relying on Cloudflare or something else. Uh, so it's really cool to see this solution.
Um, and again, I especially think it's really well timed, in that you're [00:50:00] definitely solving for the serverless case you were talking about earlier, which is becoming more and more of a broad use case, where people are building on metaframeworks and deploying to Vercel or Cloudflare or AWS Lambda or, you know, a plethora of other things.
And there's something to really be said for having some sort of security infrastructure that works in those environments as seamlessly as it would work in a, you know, regular stateful server environment.
[00:50:25] Future Plans for ArcJet
Justin: Um, with all that said, uh, what's next for ArcJet? Uh, what does your 1.0 look like?
David: So developers never deploy in a single location; they've got apps deployed all over the place. You might have your primary infrastructure in AWS, but then you're deploying a marketing website to Vercel, and maybe you're deploying a distributed API to Fly.io. And this is a real challenge for security teams.
It's a challenge for developers to manage it once they've had the fun of building with some cool new platform. But there's no system that works across all of them for security, and, uh, particularly across languages [00:51:00] and frameworks. Uh, like you just mentioned, the amazing thing about the likes of Vercel and those platforms is you literally just git push and then your code is deployed globally and available to everybody, with unlimited scale, limited by your credit card or the budget you set, um, but essentially unlimited scale. And it's just the speed of being able to deploy those applications that has changed things for developers and for security teams. And so we started with the JavaScript SDK, because it's the most popular language for web applications. But like we were talking about with writing the WebAssembly in Rust, the great thing about WebAssembly is we can run it anywhere.
And so we're going to be building out more language SDKs, so that we can support you if you're writing a Rails application or you're using Laravel or Django. We want to have the exact same, um, API, so that you can use similar rules, or the same rules, across your applications. Um, and you can also deploy them to any environment.
You don't have to figure out how to configure AWS, and then how to sort out your security on Fly.io, [00:52:00] which is still a relatively new platform compared to AWS, and then maybe you've got your site on Vercel, and that's a completely different system. Developers just don't want to have to deal with all these different platforms.
And so we want to bring the security rules closer to the code. So developers can just build it once, build it in as part of the normal flow. And then whichever language, whichever framework, whichever platform they're deploying to, it's all the same security layer.
[00:52:24] Conclusion and Final Thoughts
Andrew: Well, that wraps it up for our questions for this episode. Thanks for coming on, David. This was a very interesting look into the world of security and how you're solving it with Wasm, something that we've grown to love on this podcast. So thanks for coming on.
David: Awesome. Thanks a lot.
Justin: Yeah, David, I'm, I'm incredibly excited about your product. It was, uh, it was fun seeing it come through and, again, I think it is really, really well timed, so I'm excited to see where y'all go next.
David: Great. Let me know what you think when you try it out.