WEBVTT

00:00:00.001 --> 00:00:03.900
Do you struggle to make sure your code is always correct before checking it in?

00:00:03.900 --> 00:00:08.320
What about your team member's code? That one person who never wants to run the linter?

00:00:08.320 --> 00:00:13.240
Tired of dealing with tons of conflicts and spurious Git changes? You need Git pre-commit

00:00:13.240 --> 00:00:18.900
hooks. Well, we're lucky to have Stefanie Molin on the show today, who has done a bunch of writing

00:00:18.900 --> 00:00:26.280
and teaching of Git hooks. This is Talk Python To Me, episode 482, recorded October 24th, 2024.

00:00:27.120 --> 00:00:32.800
Are you ready for your host? You're listening to Michael Kennedy on Talk Python To Me.

00:00:32.800 --> 00:00:36.560
Live from Portland, Oregon, and this segment was made with Python.

00:00:36.560 --> 00:00:44.780
Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy.

00:00:44.780 --> 00:00:50.020
Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

00:00:50.020 --> 00:00:56.080
both accounts over at fosstodon.org, and keep up with the show and listen to over nine years of

00:00:56.080 --> 00:01:01.820
episodes at talkpython.fm. If you want to be part of our live episodes, you can find the live streams

00:01:01.820 --> 00:01:07.600
over on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified

00:01:07.600 --> 00:01:13.000
about upcoming shows. This episode is brought to you by Sentry. Don't let those errors go unnoticed.

00:01:13.000 --> 00:01:19.880
Use Sentry like we do here at Talk Python. Sign up at talkpython.fm/sentry. And this episode is

00:01:19.880 --> 00:01:25.480
brought to you by Bluehost. Do you need a website fast? Get Bluehost. Their AI builds your WordPress site

00:01:25.480 --> 00:01:31.360
in minutes, and their built-in tools optimize your growth. Don't wait. Visit talkpython.fm

00:01:31.360 --> 00:01:36.700
slash Bluehost to get started. Hey, everyone. Before we jump into the interview with Stephanie,

00:01:36.700 --> 00:01:43.620
I want to tell you real quickly that I just released a blog for Talk Python. Now, we have had tons of RSS

00:01:43.620 --> 00:01:49.640
over there because that's what powers podcasts. You can subscribe to the episodes. You can subscribe to

00:01:49.640 --> 00:01:54.900
an RSS feed for new course announcements over at Talk Python Training. And I've had a personal blog

00:01:55.520 --> 00:02:01.740
for a long time over at mkennedy.codes, but no official Talk Python blog. And so I'm going to be posting

00:02:01.740 --> 00:02:06.260
really cool things on there. I've already got a couple of articles posted, but I have plans for

00:02:06.260 --> 00:02:11.860
some interesting series. And anytime there's some more interesting announcements or exciting news I

00:02:11.860 --> 00:02:16.540
want to share with Talk Python, it's going to be over on the Talk Python blog. So if you're interested,

00:02:16.540 --> 00:02:21.660
I would really, really appreciate it if you go to talkpython.fm, click on Blog right in the

00:02:21.660 --> 00:02:26.040
navigation or at the bottom, and just subscribe to the RSS feed. That way we can stay in touch.

00:02:26.040 --> 00:02:32.520
And with that, let's talk pre-commit hooks. Stephanie, welcome to Talk Python. It's awesome

00:02:32.520 --> 00:02:38.080
to have you. Thanks for having me. Yeah, really looking forward to talking about pre-commit hooks.

00:02:38.080 --> 00:02:42.160
You know, these are things that I'm sure a lot of people have heard of. I've certainly heard of,

00:02:42.160 --> 00:02:47.420
but to be honest, it's not something I've done very much with. And I bet a lot of people out there

00:02:47.420 --> 00:02:51.340
listening are like, yeah, that'd be a good idea. Just like continuous integration and writing tests.

00:02:51.540 --> 00:02:54.160
Now let's get back to it. You know, something like that. So I think

00:02:54.160 --> 00:03:00.640
there's a lot for people to take on, take away here. And we'll talk about what are these pre-commit

00:03:00.640 --> 00:03:05.180
hooks, when to use them, how to build them, and a whole bunch of other things that you're up to.

00:03:05.180 --> 00:03:08.160
So it should be a lot of fun. I'm looking forward to it. Me too.

00:03:08.160 --> 00:03:14.060
Yeah. Now, before we get to that, how about your story? How did you get into programming, Python, and

00:03:14.060 --> 00:03:18.900
pre-commit hooks and all these things? Hello, everyone. I'm Stefanie Molin. I am a software engineer at

00:03:18.900 --> 00:03:25.220
Bloomberg. And I would say, I guess I got into programming in Python. I initially was programming

00:03:25.220 --> 00:03:33.580
in R and I was doing more data analysis while still building some things. And I needed to build a web

00:03:33.580 --> 00:03:38.980
app. And one of my teammates had suggested that rather than battling with Shiny in R, that I just

00:03:38.980 --> 00:03:43.740
learn Python. So I took a few weeks and just forced myself to do that. And I built something

00:03:43.740 --> 00:03:50.660
with Flask. And that was how I got into it. Oh, that's really awesome. Yeah. You were doing work in

00:03:50.660 --> 00:03:56.360
not finance, but in ads or something like that with R. What kind of work was that? Just generally,

00:03:56.360 --> 00:04:03.100
you don't have to go into details. Yeah. So it was mainly reporting and doing analysis on how

00:04:03.100 --> 00:04:08.240
client campaigns were going. But what really got me started with programming was more, I had gotten

00:04:08.240 --> 00:04:13.500
involved with a hackathon team and we had built an alerting system. So just monitoring when something

00:04:13.500 --> 00:04:19.340
weird went on with the campaigns. And I really enjoyed building more, more so than the analysis.

00:04:19.340 --> 00:04:25.820
And so I had to find a way to, and I enjoy like a little bit of data and more on the coding side.

00:04:25.820 --> 00:04:28.560
So I had to find something that would let me combine those two.

00:04:28.560 --> 00:04:33.860
Yeah. Well, that sounds really fun. I'm definitely on the same wavelength as you: data analysis

00:04:33.860 --> 00:04:39.720
is fun, but building is really where things get interesting, and you look back and see

00:04:39.720 --> 00:04:42.060
like, oh, we built this thing. That's a pretty awesome feeling.

00:04:42.060 --> 00:04:47.480
Yeah. It was, it was a ton of fun and we ended up getting, I think third place on the hackathon,

00:04:47.480 --> 00:04:52.640
but yeah, that was really that moment where it was like, I got a taste of something else.

00:04:52.640 --> 00:04:54.700
And I was like, this is, this is what I want to be doing.

00:04:55.020 --> 00:04:58.800
Yeah. Oh, that's fantastic. Was that at your company, or was that somewhere else?

00:04:58.800 --> 00:05:04.220
That was at the previous, previous role. It was the ad tech company. And so that was actually

00:05:04.220 --> 00:05:10.180
all built in R, the alerting system. And then, Oh no. Yeah. Okay. Yeah. And then, and then as we

00:05:10.180 --> 00:05:16.400
worked more on it, certain things ended up moving into Python. So a lot easier to work with and to

00:05:16.400 --> 00:05:19.540
automate things and not have like some laptop running R somewhere.

00:05:19.540 --> 00:05:28.860
Yeah, exactly. That's sort of the promise of Python over a lot of these things that at first

00:05:28.860 --> 00:05:35.080
blush seem somewhat equivalent, right? It's a real programming language that can go on to do

00:05:35.080 --> 00:05:40.440
all the stuff. You don't have to try to automate some weird thing that's not really meant to be used that

00:05:40.440 --> 00:05:40.840
way. Right.

00:05:40.840 --> 00:05:46.840
I know. And now, I mean, I could not write R if I, if I had to, I wouldn't, I don't think I would.

00:05:47.320 --> 00:05:53.580
Yeah. Well, I was going to ask you now, which side of the fence do you spend more time on, R or Python?

00:05:53.580 --> 00:05:54.140
It sounds like.

00:05:54.140 --> 00:05:59.920
I haven't touched R in maybe six plus years at this point. So I, yeah. Other than the arrows,

00:05:59.920 --> 00:06:01.740
that's probably the only thing I could manage too.

00:06:01.740 --> 00:06:10.340
Yeah. No more equals signs, just arrows. Okay. Got it. Awesome. Well, that's super fun. Let's talk

00:06:10.340 --> 00:06:16.420
about pre-commit hooks, right? I've had Anthony Sottile on the show to talk about his pre-commit project.

00:06:16.420 --> 00:06:21.720
It was a long time ago and I'm sure that project will get a bit of a shout out from your work as

00:06:21.720 --> 00:06:28.240
well. But, you know, congrats, you put together a really nice series of articles and resources

00:06:28.240 --> 00:06:35.440
teaching people what commit hooks are, how to debug them, how to build them, how to choose them. So I

00:06:35.440 --> 00:06:38.420
think, you know, the stuff we're going to talk about, I'll link, of course, in the show notes.

00:06:38.600 --> 00:06:41.700
It's a really nice resource for folks. So thank you. I appreciate that.

00:06:41.700 --> 00:06:51.020
Yeah. Yeah. You bet. So let's talk about numpydoc docstring validation. This was your

00:06:51.020 --> 00:06:54.180
entryway into this whole world of pre-commit hooks, right?

00:06:54.180 --> 00:07:02.580
Yeah. So I think it was July 2022. I was at my first EuroPython, and I decided to do the sprints

00:07:02.580 --> 00:07:08.260
for the first time. I ended up working with the scikit-learn team, and they wanted to make sure that

00:07:08.260 --> 00:07:14.620
all of their docstrings were conforming to the numpydoc standard. They had a file in place, a test

00:07:14.620 --> 00:07:19.480
file that you could run to validate that whatever changes you made now passed validation

00:07:19.480 --> 00:07:27.300
as far as docstrings went. And I remember at one point, I think I had done 12 or so PRs in that sprint.

00:07:27.300 --> 00:07:32.260
So I was very productive. And there was one early on, I think in the second or so, where it just wasn't

00:07:32.260 --> 00:07:37.340
working and I couldn't figure out why it was telling me it wasn't valid. It was saying that it wasn't

00:07:37.340 --> 00:07:43.460
ending in a period. And I called over one of the maintainers, and we both stared at it. To us,

00:07:43.460 --> 00:07:48.880
it looked like a period. And I ended up just deleting the docstring and starting over. And it turned out

00:07:48.880 --> 00:07:54.460
that it was a trailing space at the end. And so I had asked the maintainer, like, how do you not have

00:07:54.460 --> 00:07:59.580
this happen to you? And the response was, you should install pre-commit. And by then,

00:07:59.580 --> 00:08:04.960
I had to leave, so I made a note to myself: I need to research this when I get home.

00:08:04.960 --> 00:08:09.740
And when I did, I was like, well, how did I not know about this before? And I set it up on things.

00:08:09.740 --> 00:08:15.180
And then I went to look: does numpydoc have that? This seems like exactly what you would want.

00:08:15.180 --> 00:08:19.220
As you're writing code, you want to make sure that it's going to check the docstring there. You don't

00:08:19.220 --> 00:08:24.420
want to have to run some other thing later on and remember to run it. So I looked and there was no

00:08:24.420 --> 00:08:29.720
pre-commit hook for numpydoc. And I made something that initially we had just

00:08:29.720 --> 00:08:35.440
used internally within my team. And then later on, I kind of wanted to use it for a personal project.

00:08:35.440 --> 00:08:42.580
And so I set about seeing how we could actually open source it. And I contacted the numpydoc team

00:08:42.580 --> 00:08:46.280
and they were very, very interested in it because there was a reason there was no hook. It's because

00:08:46.280 --> 00:08:52.060
no one knew how to do it. Right. And at that point I had the horrible realization that what I had written

00:08:52.060 --> 00:08:58.700
would never work outside because it was relying on things being installed. So, and then I felt pretty

00:08:58.700 --> 00:09:03.700
bad about promising that to them. So I managed to come up with an entirely new solution in a weekend

00:09:03.700 --> 00:09:10.360
and figured out how to use the abstract syntax tree to walk through the code. And so I built an entirely

00:09:10.360 --> 00:09:15.900
new version of it. And that is what is currently available in numpydoc. And that actually led to

00:09:15.900 --> 00:09:22.240
them inviting me to be a core developer for numpydoc. Congratulations. How cool is that?

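NOTE
A minimal sketch, not numpydoc's actual hook: Python's standard ast module can find docstrings without importing or installing the code being checked. This flags any function or class whose docstring summary line does not end in a period. Because it reads the raw docstring (clean=False), a trailing space after the period is caught, which is the failure described above. The function name summaries_missing_period is hypothetical.
```python
import ast
def summaries_missing_period(source: str) -> list[tuple[str, int]]:
    """Return (name, line) for each def or class whose summary lacks a final period."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # clean=False keeps the docstring exactly as written, trailing spaces included
        doc = ast.get_docstring(node, clean=False)
        if not doc or not doc.strip():
            continue
        # The summary is the first non-blank line; only its indentation is removed
        summary = next(line.lstrip() for line in doc.splitlines() if line.strip())
        if not summary.endswith("."):
            problems.append((node.name, node.lineno))
    return problems
```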
00:09:22.240 --> 00:09:27.380
Yeah, I know. It's like the full spectrum, right? And just having heard about it and then just

00:09:27.380 --> 00:09:30.960
seeing the connection between two things that weren't previously connected.

00:09:31.260 --> 00:09:37.120
Yeah. Yeah. Well, I think your comment about the pre-commit hook not previously existing,

00:09:37.120 --> 00:09:41.260
you know, for this project also is, it's pretty interesting, right? It's kind of like I hinted at,

00:09:41.260 --> 00:09:45.120
I mean, a lot of people hear about this kind of stuff, but that doesn't mean they're putting it

00:09:45.120 --> 00:09:45.920
into practice, right?

00:09:45.920 --> 00:09:46.840
Yeah, for sure.

00:09:47.140 --> 00:09:53.080
And so how do we, you know, let's, let's find our way over to pre-commit hooks in general. So how do we

00:09:53.080 --> 00:09:59.660
encourage people or ensure that people follow coding rules, right? We've got tools like Black,

00:09:59.660 --> 00:10:06.300
we've got tools like Ruff. Now those work awesome if you give them a consistent config file

00:10:06.300 --> 00:10:13.100
or config settings (not so much with Black, but Ruff). Anyway, they'll make those changes and do a lot of the

00:10:13.100 --> 00:10:16.640
kind of stuff that we're talking about here, but that requires, like you said, people to have it

00:10:16.640 --> 00:10:23.000
installed, people to run it and people to buy into the whole concept of the project in the first place,

00:10:23.000 --> 00:10:23.320
right?

00:10:23.320 --> 00:10:24.400
Yeah, that last bit.

00:10:24.440 --> 00:10:28.360
We're all using these tools and we're all going to run them and we're going to remember to run them

00:10:28.360 --> 00:10:33.140
until one person goes, I don't like these tools. I'm not doing it. And then their settings fight with

00:10:33.140 --> 00:10:36.380
your settings or their spacing fights with your spacing or whatever, right?

00:10:36.380 --> 00:10:41.800
Yeah. I think what really helped in my experience when you incorporate these things,

00:10:41.800 --> 00:10:46.060
even like going and approaching open source projects that didn't have a pre-commit setup and just asking

00:10:46.060 --> 00:10:51.100
if they were interested in it: you really see the value when you think about whether you've ever

00:10:51.100 --> 00:10:56.220
reviewed something or gotten review comments about, you should start a new line here. I don't like this

00:10:56.220 --> 00:11:01.800
space here. And then you think about how much time you waste at that stage. And then you still have

00:11:01.800 --> 00:11:08.180
zero consistency because you did it one way, someone else does it another way. And even further than that,

00:11:08.180 --> 00:11:13.420
it's just the time you waste in your code. Oh, I should put this on a new line and reformatting files

00:11:13.420 --> 00:11:18.140
when you could actually be writing things and thinking about how you should design this algorithm.

00:11:18.140 --> 00:11:23.240
Right. And so I think a big part of making sure that, once you find these tools that you're

00:11:23.240 --> 00:11:28.000
going to use, people actually use them is making them easy to use. Like you said, yeah,

00:11:28.000 --> 00:11:32.720
you can just run Black or Ruff, but you have to remember to run Black or Ruff. And that is the

00:11:32.720 --> 00:11:38.500
key problem. And what's so great about pre-commit or even extensions in your IDE is that these things

00:11:38.500 --> 00:11:43.100
become automatic and that's what you need to get towards for these things to actually stick.

00:11:43.100 --> 00:11:49.580
Yeah. To make them automatic. And to some degree, continuous integration can do

00:11:49.580 --> 00:11:54.400
those kinds of things. But a lot of times it's too late at that point. It's already checked in,

00:11:54.400 --> 00:11:59.620
it's already committed. And then you've got the back and forth of now it's a diff, but it's only a diff

00:11:59.620 --> 00:12:04.500
because they spaced it differently when they hit save in their IDE than when you hit save in yours and

00:12:04.500 --> 00:12:11.040
all that. So pre-commit hooks run prior to actually leaving your computer, right?

00:12:11.040 --> 00:12:16.840
Yeah. So it's actually prior to even the commit. So when you do git commit and, let's say, you pass

00:12:16.840 --> 00:12:21.700
your message and if it's successful, you normally, you see the hash that gets generated. If you have

00:12:21.700 --> 00:12:28.160
pre-commit hooks enabled, then if that, those checks don't pass, then that commit never gets created in

00:12:28.160 --> 00:12:33.280
the first place. So you still have the files staged, but nothing has made it to the commit.

00:12:33.280 --> 00:12:34.020
Yeah. That's great.

00:12:34.020 --> 00:12:38.220
Yeah. I was just going to explain maybe a little bit about how they work if you're curious.

00:12:38.220 --> 00:12:42.120
Yeah. Yeah. Well, let's start with just like, what even are, are these pre-commit hooks?

00:12:42.120 --> 00:12:49.620
Yeah. So pre-commit hooks, and I think the naming is, is quite overloaded and that leads to a lot of confusion.

00:12:49.920 --> 00:12:57.540
So at the lowest level, a Git repository in general supports a hooks system. So there's a variety of

00:12:57.540 --> 00:13:03.440
different types of actions for which Git will trigger a script on your behalf. And one of those actions

00:13:03.440 --> 00:13:08.500
is pre-commit. So as I described before, as you run git commit, this gets triggered. Another thing

00:13:08.500 --> 00:13:15.060
might be pushing. You can have Git wired to run some script when you push. Now that is Git's version

00:13:15.060 --> 00:13:21.380
of pre-commit and hook, singular hook, because only a single file, a single executable,

00:13:21.380 --> 00:13:21.860
can run.

00:13:21.860 --> 00:13:27.860
Right. If you go to your .git folder, there's a hooks subfolder and it's got little

00:13:27.860 --> 00:13:30.500
samples for all the different lifecycle things, right?

00:13:30.500 --> 00:13:35.640
Yeah. And yeah, they provide some. Like I said, they have to be executable, but they can

00:13:35.640 --> 00:13:41.100
be in any language that you have available on the machine. And so Git provides some examples. I do think

00:13:41.100 --> 00:13:46.720
there are a few stages that don't have examples, but it's basically you take the name of the stage

00:13:46.720 --> 00:13:51.600
that you're going to use and that's the name of the file. And that has to be an executable and Git

00:13:51.600 --> 00:13:53.700
will run it at the designated moment.

00:13:53.700 --> 00:14:00.040
Okay. So it could be a Python executable or it could be a Go executable or whatever, but it's just

00:14:00.040 --> 00:14:00.580
one, right?

00:14:00.580 --> 00:14:05.040
It's just one. Yeah. Cause it has to be named. And like in the case of pre-commit, it has to be called

00:14:05.040 --> 00:14:06.320
pre-commit, nothing else.

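NOTE
A sketch of a raw Git hook without any framework, assuming Python 3 is on PATH. Git runs the executable named pre-commit in .git/hooks and aborts the commit on a non-zero exit status. This hypothetical script lists staged paths with git diff --cached and reports trailing whitespace. It reads the working-tree copy of each file, which can differ from what is actually staged, so treat it as illustrative.
```python
#!/usr/bin/env python3
# Hypothetical hook: save as .git/hooks/pre-commit and make it executable (chmod +x).
import subprocess
import sys
def staged_paths() -> list[str]:
    # Added, copied, or modified paths in the index (deletions are skipped)
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
def trailing_whitespace_lines(text: str) -> list[int]:
    # 1-based numbers of the lines that end in a space or a tab
    return [n for n, line in enumerate(text.splitlines(), start=1) if line != line.rstrip(" \t")]
def main() -> int:
    failed = False
    for path in staged_paths():
        try:
            with open(path, encoding="utf-8") as f:
                text = f.read()
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable or non-UTF-8 (e.g. binary) files
        for n in trailing_whitespace_lines(text):
            print(f"{path}:{n}: trailing whitespace", file=sys.stderr)
            failed = True
    # Any non-zero exit status makes Git abort the commit
    return 1 if failed else 0
# When installed as the hook, the file must end by calling main, i.e. uncomment:
# sys.exit(main())
```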
00:14:07.280 --> 00:14:12.740
This portion of Talk Python To Me is brought to you by Sentry. Code breaks. It's a fact of life.

00:14:12.740 --> 00:14:18.680
With Sentry, you can fix it faster. As I've told you all before, we use Sentry on many of our apps

00:14:18.680 --> 00:14:24.680
and APIs here at Talk Python. I recently used Sentry to help me track down one of the weirdest bugs I've

00:14:24.680 --> 00:14:30.360
run into in a long time. Here's what happened. When signing up for our mailing list, it would crash

00:14:30.360 --> 00:14:36.700
under uncommon execution paths, like situations where someone was already subscribed or entered an

00:14:36.700 --> 00:14:42.240
invalid email address or something like this. The bizarre part was that our logging of that

00:14:42.240 --> 00:14:49.720
unusual condition itself was crashing. How is it possible for our log to crash? It's basically a

00:14:49.720 --> 00:14:54.580
glorified print statement. Well, Sentry to the rescue. I'm looking at the crash report right now,

00:14:54.580 --> 00:14:59.680
and I see way more information than you'd expect to find in any log statement. And because it's

00:14:59.680 --> 00:15:06.120
production, debuggers are out of the question. I see the traceback, of course, but also the browser version,

00:15:06.120 --> 00:15:13.000
client OS, server OS, server OS version, whether it's production or QA, the email and name of the person

00:15:13.000 --> 00:15:18.520
signing up. That's the person who actually experienced the crash. Dictionaries of data on the call stack and so much

00:15:18.520 --> 00:15:25.180
more. What was the problem? I initialized the logger with the string "info" for the level rather than the

00:15:25.180 --> 00:15:33.340
enumeration's info member, which was integer-based. So the logging statement would crash, saying that I could not use

00:15:33.340 --> 00:15:40.140
less than or equal to between strings and ints. Crazy town. But with Sentry, I captured it,

00:15:40.140 --> 00:15:46.400
fixed it, and I even helped the user who experienced that crash. Don't fly blind. Fix code faster with

00:15:46.400 --> 00:15:52.520
Sentry. Create your Sentry account now at talkpython.fm/sentry. And if you sign up with the code

00:15:52.520 --> 00:16:00.040
TALKPYTHON, all caps, no spaces, it's good for two free months of Sentry's business plan, which will give you up to

00:16:00.040 --> 00:16:03.040
20 times as many monthly events as well as other features.

00:16:04.640 --> 00:16:10.800
So if you want to run more than one thing, you basically have to write a program which itself

00:16:10.800 --> 00:16:16.440
figures out all the things to do and then delegates to running them. Like if you want to run Ruff to

00:16:16.440 --> 00:16:23.860
fix formatting issues and you want to run the checker-fixer for NumPy docstrings and all those

00:16:23.860 --> 00:16:27.100
things, you'd have to write a sort of orchestrating program for that, right?

00:16:27.100 --> 00:16:32.520
Yeah, it's almost like you're writing, in the case of a bash script, a giant bash script where you have

00:16:32.520 --> 00:16:38.160
to decide, you know, do you fail early? How do you check: do I run this one and then this one? And then

00:16:38.160 --> 00:16:43.700
even worse, in that case, you're probably running everything, you know, you're running everything

00:16:43.700 --> 00:16:49.240
sequentially. And if you don't do it carefully, then you know, maybe, maybe you want to fail early, maybe you don't.

00:16:49.300 --> 00:16:55.780
So that becomes very, very challenging to configure and also to share, because the thing about that file is that it's not

00:16:55.780 --> 00:17:01.480
included in version control. So that would be something that you would maybe have to store somewhere else and then do a

00:17:01.480 --> 00:17:05.920
symbolic link. And then that becomes already a lot trickier for everyone to manage.

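NOTE
A sketch of the orchestration problem described here, with hypothetical names: a single hook script that runs several checks in sequence and has to choose between failing fast and collecting every failure. The tool commands in the final comment are examples only.
```python
import subprocess
import sys
def run_checks(commands: list[list[str]], fail_fast: bool = True) -> int:
    """Run each command in order; return 0 only if every command succeeds."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures.append(cmd)
            if fail_fast:
                break  # stop at the first failing check
    for cmd in failures:
        print("failed:", " ".join(cmd), file=sys.stderr)
    # The hook's exit status decides whether Git goes ahead with the commit
    return 1 if failures else 0
# Example hook body (hypothetical tool choices):
# sys.exit(run_checks([["ruff", "check", "."], ["ruff", "format", "--check", "."]]))
```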
00:17:05.920 --> 00:17:09.920
Yeah, I was just doing that last night and that's an AI question. I don't remember how to do that.

00:17:09.920 --> 00:17:16.740
I know you can do it. It's not that hard. It involves ln, but you know, ChatGPT, what do I do exactly?

00:17:16.740 --> 00:17:19.120
ln -s. I've had to do that quite a bit.

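NOTE
A sketch of the symlink approach, assuming the hook script is tracked at scripts/pre-commit (a hypothetical path) and that .git is a plain directory rather than a worktree file. It does the same as running ln -s ../../scripts/pre-commit .git/hooks/pre-commit from the repository root. The tracked script itself still has to be executable.
```python
import os
from pathlib import Path
def link_hook(repo_root: str, script: str = "scripts/pre-commit") -> Path:
    """Symlink a version-controlled hook script into .git/hooks/pre-commit."""
    root = Path(repo_root)
    hooks_dir = root / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    link = hooks_dir / "pre-commit"
    # A relative target keeps working if the whole repository is moved
    target = os.path.relpath(root / script, hooks_dir)
    if link.is_symlink() or link.exists():
        link.unlink()  # replace an existing hook or an older link
    link.symlink_to(target)
    return link
```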
00:17:19.120 --> 00:17:22.780
It's burned into the brain, huh?

00:17:22.780 --> 00:17:31.240
So one of the things that you recommend so we don't have to build this orchestration piece is actually pre-commit,

00:17:31.240 --> 00:17:32.880
which is a Python project, right?

00:17:32.880 --> 00:17:37.700
Yes. And it's not the only one. So again, that's where like the naming becomes challenging.

00:17:37.700 --> 00:17:42.360
But pre-commit is built in Python, but it can run hooks in a variety of languages.

00:17:42.580 --> 00:17:48.860
And it interfaces with Git's hook system for you. So it creates that executable and plants it there.

00:17:48.860 --> 00:17:56.120
But that executable then points back to pre-commit, so that you can just define a simple YAML file like the one you can see part of on the screen right now.

00:17:56.120 --> 00:18:01.300
And it becomes very easy because essentially you're just configuring what you want to run.

00:18:01.300 --> 00:18:05.820
You're not actually coding the logic of the checks and how they relate to each other.

00:18:05.820 --> 00:18:12.720
Right. So let's assume that all the pre-commit hooks that you want to run somehow exist out there in the world, right?

00:18:12.720 --> 00:18:14.440
You don't have to create them for the moment.

00:18:14.740 --> 00:18:20.000
So what you can do with pre-commit is you can set up a TOML file.

00:18:20.000 --> 00:18:22.060
I always get those crisscrossed.

00:18:22.060 --> 00:18:30.600
A YAML file, a .pre-commit-config.yaml file, which then has a bunch of listings of: here's a Git repository.

00:18:30.600 --> 00:18:38.200
And if you install it as a Python package, here's a bunch of things that you can run on it, like check-toml, check-yaml, and so on, right?

00:18:38.200 --> 00:18:40.740
Well, it doesn't actually have to be a Python package, right?

00:18:40.740 --> 00:18:48.780
So in that repo, and we're maybe jumping ahead, but there's a special file in that repo, which will tell pre-commit how it actually needs to install it.

00:18:48.780 --> 00:18:49.580
So it could be anything.

00:18:49.580 --> 00:18:50.620
Oh, that's interesting.

00:18:50.620 --> 00:19:01.200
So the thing that integrates with the pre-commit project, it has to opt in in a sense in that it has to have a configuration file or a launch file or a setup file, something like that.

00:19:01.200 --> 00:19:04.340
Yeah. So right now we're looking at .pre-commit-config.yaml.

00:19:04.340 --> 00:19:09.600
There's also .pre-commit-hooks.yaml, and that one is kind of registering it with the pre-commit system.

00:19:09.600 --> 00:19:13.300
So it tells pre-commit how to install it once it gets a hold of it.

00:19:13.300 --> 00:19:21.660
And it also lists out these hooks that we see here under id, but they will be defined over there so that pre-commit knows, well, what is check-toml?

00:19:21.660 --> 00:19:22.760
What is check-yaml?

00:19:22.760 --> 00:19:24.400
Okay. Yeah.

00:19:24.400 --> 00:19:25.380
That's really cool.

00:19:25.380 --> 00:19:30.580
And you can have more than one of these repositories in there, right?

00:19:30.580 --> 00:19:31.080
Correct.

00:19:31.080 --> 00:19:42.420
Yeah. So the repos section is a list of repos, and then each repo has other config, like the individual hooks that you want to run from that repo.

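NOTE
A sketch of a .pre-commit-config.yaml with two hook repositories. The YAML is kept in a Python string so that a small helper can write it out. The hook ids are real, but the rev values are only example tags to replace with releases you have checked. Passing --fix with --exit-non-zero-on-fix to ruff is one possible choice, not a quote from the article. Once the file exists at the repository root, running pre-commit install wires it into Git's hook.
```python
from pathlib import Path
# Example pins only; replace each rev with a release tag you have verified.
CONFIG = """\
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-toml
      - id: check-yaml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.7.0
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format
"""
def write_config(repo_root: str) -> Path:
    """Write the config to the repository root, where pre-commit looks for it."""
    path = Path(repo_root) / ".pre-commit-config.yaml"
    path.write_text(CONFIG, encoding="utf-8")
    return path
```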
00:19:42.420 --> 00:19:43.160
Right, right.

00:19:43.160 --> 00:19:53.180
So for the first example that you have in this, and this is your article, I guess, I don't know if I give this the proper announcement, but how to set up pre-commit hooks.

00:19:53.280 --> 00:19:57.480
This is your, I perceive this as kind of your getting started article for this whole series.

00:19:57.480 --> 00:19:58.860
I don't know if you see it that way.

00:19:58.860 --> 00:19:59.280
Yeah.

00:19:59.280 --> 00:20:01.180
Yeah, this was the first one.

00:20:01.180 --> 00:20:04.540
I had gotten a lot of questions on how to do this.

00:20:04.920 --> 00:20:15.380
And I think it's always interesting, especially when you think about speaking at conferences, which I do a lot of. I feel like a lot of what gets more hits in that sense is the advanced stuff, maybe more creating it.

00:20:15.380 --> 00:20:20.740
But there's so much value in people just getting started and figuring out how do I even use this in the first place?

00:20:20.740 --> 00:20:23.060
Because this saves you so much time.

00:20:23.060 --> 00:20:25.760
So I really, this was where I got started for that reason.

00:20:25.760 --> 00:20:28.320
I think a lot of people were able to benefit from this article.

00:20:28.320 --> 00:20:30.000
Yeah, it seems like it.

00:20:30.000 --> 00:20:36.260
I know it's fun to talk about the super advanced deep dive things, but most people, they just need to get started.

00:20:36.260 --> 00:20:37.760
They just need some foundation, right?

00:20:37.760 --> 00:20:47.000
And I think, I think that's actually where most of the benefit comes from, even though it is really fun to see some cool deep dive talk that people are going into, right?

00:20:47.000 --> 00:20:57.040
So this next one that we're adding here in this example is pretty interesting, and that's ruff-pre-commit, straight from Astral, right?

00:20:57.040 --> 00:21:02.540
So this is github.com/astral-sh, which is the company behind Ruff and uv.

00:21:02.540 --> 00:21:04.340
And this is ruff-pre-commit.

00:21:04.340 --> 00:21:09.240
But what's interesting about this is, well, one, that it has nothing to do with the pre-commit project.

00:21:09.240 --> 00:21:14.080
But two, that this one also takes special arguments that you can pass to it.

00:21:14.080 --> 00:21:20.860
Yeah, so I think the ruff-pre-commit one is just a smaller version so that it works faster with pre-commit.

00:21:20.860 --> 00:21:23.480
Because pre-commit will have to install these at some point.

00:21:23.480 --> 00:21:25.020
It will have a cache.

00:21:25.020 --> 00:21:28.660
So if you don't change the version in this case, it will be able to reuse that.

00:21:28.660 --> 00:21:31.400
But that first time, you do have a bit of a delay.

00:21:31.400 --> 00:21:33.340
And that's not something you want.

00:21:33.340 --> 00:21:36.280
It's something you have to be very careful of when you want to be using these.

00:21:36.280 --> 00:21:43.360
And then the args thing is nice because you have a few options when you configure these tools, depending on what the tool supports.

00:21:43.360 --> 00:21:47.420
In this case, Ruff supports, as I think we mentioned a little bit earlier, a configuration file.

00:21:47.420 --> 00:21:50.320
So, for example, you could have stuff in your pyproject.toml.

00:21:50.560 --> 00:21:55.200
But the key here is that maybe you're using Ruff in your IDE.

00:21:55.200 --> 00:22:00.080
And maybe you don't want to do the same kind of changes that you want to do in pre-commit.

00:22:00.080 --> 00:22:03.720
Maybe you want it to ask you before it changes something.

00:22:03.720 --> 00:22:06.460
Whereas in the pre-commit stage, you definitely want it to be fixed.

00:22:06.460 --> 00:22:13.940
So you can use the args here to provide stuff that you only want to happen when it's running in the context of pre-commit.

00:22:14.180 --> 00:22:21.260
Yeah, and Ruff has an --exit-non-zero-on-fix option, which means if it goes through and you've told it to fix things, it will fix them.

00:22:21.260 --> 00:22:30.320
But then it'll error out and say that wasn't a smooth transition or whatever, which is cool because that will then fail the commit itself.

00:22:30.320 --> 00:22:30.880
Correct.

00:22:30.880 --> 00:22:34.060
It gives you the modified files and basically says, have a look.

00:22:34.060 --> 00:22:36.280
See if you like it now, right?

00:22:36.280 --> 00:22:37.740
Before it actually just ships it off.

00:22:37.740 --> 00:22:43.200
That's so important because sometimes you realize there was some rule that you hadn't reviewed before.

00:22:43.300 --> 00:22:45.860
that's not quite doing what I want, so let me tweak my setup.

00:22:45.860 --> 00:22:50.180
So it's nice to have that bit where you can verify what was actually changed is what you want.

00:22:50.180 --> 00:22:54.400
Yeah, I guess it's a little bit dangerous to just say change it and then commit it.

00:22:54.400 --> 00:22:55.500
I've had people.

00:22:55.500 --> 00:23:03.460
So I did a workshop on pre-commit both on setting it up and then making your own hooks at EuroPython this year.

00:23:03.460 --> 00:23:05.800
And I did have a few people actually.

00:23:05.800 --> 00:23:12.220
One was very insistent, asking me why there wasn't a hook, or why they don't support just fixing it

00:23:12.220 --> 00:23:14.700
and then automatically adding it and committing it on your behalf.

00:23:14.700 --> 00:23:18.020
And to me, as a person who works in security, that just sounds very scary.

00:23:18.020 --> 00:23:20.520
I don't want things doing that.

00:23:20.520 --> 00:23:23.940
I want to see what is being changed and whether I agree with it.

00:23:23.940 --> 00:23:24.780
Yeah.

00:23:24.780 --> 00:23:28.100
Why doesn't it just go ahead and push it as well?

00:23:28.100 --> 00:23:28.560
Come on.

00:23:28.560 --> 00:23:28.820
Yeah.

00:23:28.820 --> 00:23:30.840
Well, I think that was part of the suggestion.

00:23:30.840 --> 00:23:32.980
I was like, I certainly don't want that running on my machine.

00:23:34.100 --> 00:23:39.000
Yeah, it does skip out on some of the benefits of the multi-stage aspects of Git, I suppose.

00:23:39.000 --> 00:23:40.080
But it is efficient.

00:23:40.080 --> 00:23:41.360
You just get it done all at once.

00:23:41.360 --> 00:23:42.180
That's pretty cool.

00:23:42.180 --> 00:23:44.540
Yeah, but you don't know what else it's grabbing, which is the scary part.

00:23:44.540 --> 00:23:45.360
No, of course not.

00:23:45.360 --> 00:23:45.800
I know.

00:23:45.800 --> 00:23:46.900
Super bad.

00:23:47.900 --> 00:23:53.140
So this example that we're talking about here where we've got a pre-commit hook that we're grabbing

00:23:53.140 --> 00:23:57.820
and then it takes these arguments, I think this is an interesting point of discussion.

00:23:57.820 --> 00:24:01.980
So the example you have in your article just says, what we're going to tell Ruff is --fix,

00:24:01.980 --> 00:24:06.940
--exit-non-zero-on-fix, and --show-fixes, which is all good.

00:24:07.180 --> 00:24:11.920
But Ruff can be pretty complex in its configuration, right?

00:24:11.920 --> 00:24:14.720
You can say, disable this flake8 rule, turn this one on.

00:24:14.720 --> 00:24:15.320
These are warnings.

00:24:15.320 --> 00:24:16.100
These are errors.

00:24:16.100 --> 00:24:21.340
And there's a whole, you know, here's how many columns per line I want and all of this stuff, right?

00:24:21.340 --> 00:24:27.240
So you can either do this argument thing, or if it's supported, you could also potentially have,

00:24:27.240 --> 00:24:28.660
say, a ruff.toml, right?

00:24:28.660 --> 00:24:29.080
Yeah.

00:24:29.080 --> 00:24:34.000
So I tend to want to minimize the amount of configuration files I have.

00:24:34.000 --> 00:24:38.340
So in my case, I think below I talk about having it in the pyproject.toml.

00:24:38.340 --> 00:24:38.780
Yeah, exactly.

00:24:38.780 --> 00:24:42.140
So you just add a Ruff section in there and then you configure things.

00:24:42.140 --> 00:24:46.820
And this is stuff that you'd want to use both in your editor as well as in the pre-commit stage,

00:24:46.820 --> 00:24:47.820
because you want them to agree.

00:24:47.820 --> 00:24:51.300
And there's nothing worse than one telling you the line's too long and the other one going,

00:24:51.300 --> 00:24:51.960
nope, that's good.

00:24:51.960 --> 00:24:52.500
Go ahead.

00:24:52.500 --> 00:24:59.220
Or put a space after the comma in parameters and then take away the space and put the space and take away the space.

00:24:59.220 --> 00:24:59.240
Exactly.

00:24:59.240 --> 00:25:00.460
You don't want them fighting.

00:25:00.460 --> 00:25:01.320
You want them in agreement.

00:25:01.320 --> 00:25:02.600
No, no, you don't.

00:25:02.700 --> 00:25:09.800
So I suppose that's a massive bonus of having either the tool.ruff settings in your pyproject or just a ruff.toml,

00:25:09.800 --> 00:25:11.520
however you go about that, it doesn't really matter.

00:25:11.520 --> 00:25:17.120
Because then no matter how you're using Ruff, via pre-commit or for your project, it'll be the same thing, right?

00:25:17.120 --> 00:25:17.680
Exactly.

00:25:17.680 --> 00:25:18.240
Yeah.

00:25:18.240 --> 00:25:18.640
Okay.

00:25:18.640 --> 00:25:19.140
Yeah.

00:25:19.140 --> 00:25:20.520
That's pretty awesome.

00:25:20.520 --> 00:25:24.500
Now, I guess maybe we got a bit ahead of ourselves.

00:25:24.980 --> 00:25:32.300
If I want to somehow install a pre-commit hook or pre-commit so that when I then give it one of these YAML files,

00:25:32.300 --> 00:25:34.800
it'll go subsequently grab them and do the things.

00:25:34.800 --> 00:25:36.420
How do you get started with that?

00:25:36.420 --> 00:25:39.780
I think I need a rephrasing of that question.

00:25:39.780 --> 00:25:40.360
Yeah.

00:25:40.360 --> 00:25:40.700
Sorry.

00:25:40.960 --> 00:25:49.460
So if I have just a plain GitHub repository and I want to have pre-commit manage the hooks for that repository,

00:25:49.460 --> 00:25:50.640
like what do I do?

00:25:50.920 --> 00:25:51.140
Okay.

00:25:51.140 --> 00:25:54.140
So the first thing is you have to actually install pre-commit.

00:25:54.140 --> 00:25:56.920
And that's not the command that's on the screen.

00:25:56.920 --> 00:25:58.260
This is more of a pip install.

00:25:58.260 --> 00:26:02.020
So make sure you have the Python library in place.

00:26:02.020 --> 00:26:05.680
And then you need to have this configuration file.

00:26:05.680 --> 00:26:09.800
with at least one hook in there so that you have a valid file.

00:26:09.800 --> 00:26:12.240
And then you can run pre-commit install.

00:26:12.240 --> 00:26:15.620
I omitted it here, but as I talk about in a different article,

00:26:15.760 --> 00:26:21.940
when you run this command, pre-commit actually tells you that it created the .git/hooks/pre-commit file.

00:26:21.940 --> 00:26:25.060
And if you open that up, and I have an example on that other article,

00:26:25.060 --> 00:26:28.860
it's very simple and it's just calling pre-commit the tool itself.

00:26:28.860 --> 00:26:33.440
So in all cases, you need to have it installed in your environment.

00:26:33.440 --> 00:26:39.960
And then, one time, you run pre-commit install, which does the wiring on the Git side.

00:26:39.960 --> 00:26:45.340
And this is something that everyone in your project has to run on any machine that they are using.

00:26:45.740 --> 00:26:50.240
Because that file lives in the local clone's .git directory, which isn't committed, it has to be created on each machine.

00:26:50.240 --> 00:26:52.220
And that can only happen if you run this command.
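
NOTE
A simplified sketch of the wiring that pre-commit install performs. Git looks for an executable script at .git/hooks/pre-commit and aborts the commit if that script exits non-zero. The script body here is a hypothetical stand-in that just calls pre-commit run. It is not the file pre-commit actually generates, and install_hook is an illustrative name. Since nothing under .git/ is committed, every clone needs this step.
```python
import stat
from pathlib import Path
# Hypothetical hook body: hand everything off to the pre-commit tool.
HOOK_SCRIPT = """#!/bin/sh
exec pre-commit run
"""
def install_hook(repo_root: str) -> Path:
    """Write an executable .git/hooks/pre-commit script and return its path."""
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    # Write bytes so the shell script keeps LF line endings on every OS.
    hook.write_bytes(HOOK_SCRIPT.encode("utf-8"))
    # Git skips hook files that are not executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```
Running something like this by hand is roughly what each teammate does once per clone when they run pre-commit install.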

00:26:52.220 --> 00:26:53.100
Yeah.

00:26:53.100 --> 00:26:56.980
So there's a .pre-commit-config.yaml file.

00:26:56.980 --> 00:27:01.880
That's what you put into GitHub at the root of your project or something like this.

00:27:01.880 --> 00:27:07.720
But then to actually configure git itself, you've got to run this pre-commit space install.

00:27:07.720 --> 00:27:11.120
And it basically wires up the hooks to make that happen, right?

00:27:11.120 --> 00:27:11.660
Correct.

00:27:11.660 --> 00:27:14.840
So yeah, when you run this, that file gets created on your behalf.

00:27:14.920 --> 00:27:17.100
And then you don't have to worry about wiring that up.

00:27:17.100 --> 00:27:18.720
And then it's transparent.

00:27:18.720 --> 00:27:22.740
All you have to do is tweak your config and then the changes happen.

00:27:22.740 --> 00:27:23.200
Nice.

00:27:23.200 --> 00:27:26.640
I don't know how much to believe the naming.

00:27:26.640 --> 00:27:28.420
Can it do things other than pre-commit?

00:27:28.420 --> 00:27:28.940
Yes.

00:27:28.940 --> 00:27:31.880
Can it do pre-push and those kinds of things?

00:27:32.260 --> 00:27:35.660
They don't support every single one.

00:27:35.660 --> 00:27:38.280
But there are quite a few that they do support.

00:27:38.280 --> 00:27:44.980
For example, I once configured an open source project with a pre-push hook because it was a slower

00:27:44.980 --> 00:27:45.400
check.

00:27:45.400 --> 00:27:48.520
And that's something you definitely don't want running on each commit.

00:27:48.520 --> 00:27:51.780
But it might be something where you want to make sure when you push the files that you've

00:27:51.780 --> 00:27:53.600
addressed something that's maybe a little bit longer.

00:27:54.080 --> 00:28:00.000
And that is really not any different than configuring with the pre-commit config YAML.

00:28:00.000 --> 00:28:03.680
There's just a separate item that goes in there that says which stage to run.

00:28:03.680 --> 00:28:04.940
By default, it's pre-commit.

00:28:04.940 --> 00:28:06.060
So you don't see it.

00:28:06.060 --> 00:28:07.400
But if you needed to change it, you can.

00:28:07.400 --> 00:28:07.860
Yeah.

00:28:07.860 --> 00:28:08.860
I figured that was the case.

00:28:08.860 --> 00:28:09.880
But I'd never tried.

00:28:09.880 --> 00:28:13.980
And given that it's named pre-commit, you know, it's kind of named after one of the hooks,

00:28:13.980 --> 00:28:14.180
right?

00:28:14.180 --> 00:28:14.780
But of course.

00:28:15.100 --> 00:28:17.420
I think it's named after probably the most useful one.

00:28:17.420 --> 00:28:18.880
I would.

00:28:18.880 --> 00:28:19.900
Yeah, I would think so.

00:28:19.900 --> 00:28:25.980
I think a very popular example would perhaps be the commit message hook.

00:28:25.980 --> 00:28:30.220
So there's a lot of tools that work on, you know, making sure your commits are following

00:28:30.220 --> 00:28:30.920
a certain standard.

00:28:30.920 --> 00:28:32.480
I think one of them is called Commitizen.

00:28:32.480 --> 00:28:36.600
And my guess is that it runs on the commit-msg hook.
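
NOTE
A rough sketch of a commit-msg hook in the spirit of tools like Commitizen. It assumes a hypothetical, hard-coded subset of Conventional Commits types and is not Commitizen's implementation. Git invokes commit-msg with the path of the file containing the proposed message, and a non-zero exit rejects the commit.
```python
from __future__ import annotations
import re
# Hypothetical subset of Conventional Commits types, e.g. "feat(api): add endpoint".
SUBJECT = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([\w-]+\))?!?: \S"
)
def is_valid_message(message: str) -> bool:
    """Validate the subject: the first non-empty line that isn't a '#' comment."""
    for line in message.splitlines():
        if line.strip() and not line.startswith("#"):
            return SUBJECT.match(line) is not None
    return False
def main(argv: list[str]) -> int:
    # Git runs the commit-msg hook with one argument: the path of the file
    # that holds the proposed commit message.
    with open(argv[0], encoding="utf-8") as f:
        message = f.read()
    if is_valid_message(message):
        return 0
    print("Commit subject should look like 'feat(parser): add X' or 'fix: handle Y'")
    return 1
```
In practice you would more likely adopt Commitizen itself than maintain a regex like this one.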

00:28:36.600 --> 00:28:37.340
Commitizen?

00:28:37.340 --> 00:28:38.000
Yes.

00:28:38.000 --> 00:28:38.400
Okay.

00:28:38.400 --> 00:28:39.900
What is this Commitizen about?

00:28:39.900 --> 00:28:40.900
I haven't heard of this.

00:28:40.900 --> 00:28:42.920
I don't think their example uses that.

00:28:42.920 --> 00:28:44.740
But I think they do have a pre-commit hook.

00:28:45.020 --> 00:28:46.840
And I believe it works that way.

00:28:46.840 --> 00:28:47.240
Yeah.

00:28:47.240 --> 00:28:47.540
Yeah.

00:28:47.540 --> 00:28:48.240
Interesting.

00:28:48.240 --> 00:28:48.760
Okay.

00:28:48.760 --> 00:28:49.600
What's this thing?

00:28:49.600 --> 00:28:51.820
A release management tool for teams.

00:28:51.820 --> 00:28:52.180
Yeah, sure.

00:28:52.180 --> 00:28:56.320
That makes sense that you want to kind of be a little bit careful about what your commit

00:28:56.320 --> 00:28:57.140
messages are.

00:28:57.140 --> 00:29:01.320
Maybe you want to grab certain commit messages and add them to your changelog or something

00:29:01.320 --> 00:29:01.860
like that, right?

00:29:01.860 --> 00:29:02.300
Yeah.

00:29:02.300 --> 00:29:07.740
I think there's been quite a bit of talk about this one at conferences I've been to lately.

00:29:07.740 --> 00:29:09.440
I think it's gotten a lot of traction.

00:29:09.440 --> 00:29:09.960
Yeah.

00:29:09.960 --> 00:29:11.840
2,500 GitHub stars.

00:29:11.840 --> 00:29:12.380
That's pretty good.

00:29:12.380 --> 00:29:13.000
I'll check it out.

00:29:13.000 --> 00:29:14.080
This is news to me.

00:29:14.080 --> 00:29:18.860
This portion of Talk Python To Me is brought to you by Bluehost.

00:29:18.860 --> 00:29:22.100
Got ideas, but no idea how to build a website?

00:29:22.100 --> 00:29:23.220
Get Bluehost.

00:29:23.220 --> 00:29:28.640
With their AI design tool, you can quickly generate a high-quality, fast-loading WordPress

00:29:28.640 --> 00:29:29.740
site instantly.

00:29:29.740 --> 00:29:33.400
Once you've nailed the look, just hit enter and your site goes live.

00:29:33.400 --> 00:29:34.420
It's really that simple.

00:29:34.520 --> 00:29:39.000
And it doesn't matter whether you're a hobbyist, entrepreneur, or just starting your side hustle.

00:29:39.000 --> 00:29:44.620
Bluehost has you covered with built-in marketing and e-commerce tools to help you grow and scale

00:29:44.620 --> 00:29:46.060
your website for the long haul.

00:29:46.060 --> 00:29:50.300
Since you're listening to my show, you probably know Python, but sometimes it's better to focus

00:29:50.300 --> 00:29:55.520
on what you're creating rather than a custom-built website that adds another month before you launch

00:29:55.520 --> 00:29:56.000
your idea.

00:29:56.380 --> 00:30:02.160
When you upgrade to Bluehost Cloud, you get 100% uptime and 24/7 support to ensure your

00:30:02.160 --> 00:30:04.560
site stays online through heavy traffic.

00:30:04.560 --> 00:30:08.420
Bluehost really makes building your dream website easier than ever.

00:30:08.420 --> 00:30:09.700
So what's stopping you?

00:30:09.700 --> 00:30:10.980
You've already got the vision.

00:30:10.980 --> 00:30:11.760
Make it real.

00:30:11.760 --> 00:30:16.700
Visit talkpython.fm/bluehost right now and get started today.

00:30:16.900 --> 00:30:19.220
And thank you to Bluehost for supporting the show.

00:30:19.220 --> 00:30:21.720
All right.

00:30:21.720 --> 00:30:23.760
What other takeaways should we talk about in this first one?

00:30:23.760 --> 00:30:26.380
I think we maybe have pretty much covered it.

00:30:26.380 --> 00:30:26.880
Let's see.

00:30:26.880 --> 00:30:32.340
I guess, you know, we mentioned before, but if people want to see sort of examples of pre-commit

00:30:32.340 --> 00:30:37.140
hooks failing or succeeding or failing because they changed something, which is not exactly

00:30:37.140 --> 00:30:42.160
a failure, but stopping and starting over, you have a nice example of what that's like

00:30:42.160 --> 00:30:42.400
there.

00:30:42.400 --> 00:30:49.860
So one thing that I guess might be useful is sometimes maybe you don't want to run the

00:30:49.860 --> 00:30:50.560
pre-commit hooks.

00:30:50.560 --> 00:30:57.100
Maybe you need to check something in right away because the server's down, right?

00:30:57.100 --> 00:30:58.100
We have to check this in.

00:30:58.100 --> 00:31:01.560
I can't fix whatever this hook is upset about right now.

00:31:01.560 --> 00:31:03.280
It needs to go in right away.

00:31:03.280 --> 00:31:05.200
Just let me commit it, right?

00:31:05.200 --> 00:31:05.900
You can do that.

00:31:05.900 --> 00:31:10.180
I mean, I think there are probably several use cases for something like this.

00:31:10.180 --> 00:31:13.540
Maybe you're going to be squashing things later, or, you know,

00:31:13.540 --> 00:31:17.040
maybe you don't even know what the API for what you're doing is going to

00:31:17.040 --> 00:31:17.440
look like.

00:31:17.440 --> 00:31:22.260
It could be, and this kind of ties back to what we talked about earlier, perhaps Ruff's

00:31:22.260 --> 00:31:25.780
doing something you don't agree with, but you need to check with the rest of

00:31:25.780 --> 00:31:29.440
your team to make sure that everyone's in agreement with let's remove this rule.

00:31:29.440 --> 00:31:29.900
Right.

00:31:29.960 --> 00:31:34.060
So I definitely don't encourage always doing this.

00:31:34.060 --> 00:31:35.280
That defeats the purpose, right?

00:31:35.280 --> 00:31:40.100
But there is kind of a break-glass solution here where, let's say you first run git

00:31:40.100 --> 00:31:45.200
commit and something fails and it's not something that you either want to fix at the moment or

00:31:45.200 --> 00:31:45.980
really can fix.

00:31:45.980 --> 00:31:48.960
Then you can just pass in --no-verify.

00:31:48.960 --> 00:31:51.440
And none of the checks run at that point.

00:31:51.440 --> 00:31:54.400
So it's like, as if the checks were never there in the first place.

00:31:54.400 --> 00:31:54.880
Right.

00:31:55.080 --> 00:31:55.260
Right.

00:31:55.260 --> 00:31:55.380
Right.

00:31:55.380 --> 00:31:55.820
Okay.

00:31:55.820 --> 00:31:56.840
That's pretty interesting.

00:31:56.840 --> 00:32:00.160
Like you say, hopefully people don't run that all the time.

00:32:00.160 --> 00:32:03.380
At that point, just remove the pre-commit setup, save yourself.

00:32:03.380 --> 00:32:03.720
Yeah.

00:32:03.720 --> 00:32:05.680
Like what are you, what are you even doing?

00:32:05.680 --> 00:32:05.900
Right.

00:32:05.900 --> 00:32:11.460
I suppose there's an interesting interplay between pre-commit hooks and continuous integration,

00:32:11.460 --> 00:32:12.120
right?

00:32:12.120 --> 00:32:16.220
Like in a sense, they are often checking some of the same things.

00:32:16.220 --> 00:32:17.040
What do you think?

00:32:17.040 --> 00:32:22.680
So I think it's probably an example, like not, not quite a Venn diagram.

00:32:22.680 --> 00:32:29.060
Probably the circle for pre-commit is entirely contained within the circle for CI/CD.

00:32:29.060 --> 00:32:33.080
The difference is there are certain things where you can get immediate feedback, quick

00:32:33.080 --> 00:32:37.620
feedback locally, and that's what you can put in pre-commit: things like linting,

00:32:37.620 --> 00:32:38.520
formatting, et cetera.

00:32:38.520 --> 00:32:42.200
And then CI/CD may be running your test suite.

00:32:42.200 --> 00:32:44.620
That's definitely not something you want to be doing in a commit.

00:32:44.620 --> 00:32:48.700
Imagine you have a test suite that takes three minutes to run, even maybe three minutes isn't

00:32:48.700 --> 00:32:52.760
that bad, but every commit waiting three minutes is definitely not something you want to do.

00:32:52.760 --> 00:32:53.140
No.

00:32:53.140 --> 00:32:55.240
But it's still a check that you should definitely be running.

00:32:55.240 --> 00:32:57.620
So in CI/CD, I would run everything.

00:32:57.620 --> 00:32:58.920
Do the linting, do the formatting.

00:32:58.920 --> 00:33:03.680
That's your final, that's your last layer of defense and you need to be checking everything.

00:33:03.680 --> 00:33:06.480
And this just allows developers to get that feedback sooner.

00:33:06.480 --> 00:33:07.020
Right.

00:33:07.020 --> 00:33:12.480
So what you're actually checking in and you finally approve is much closer to what CI/CD

00:33:12.480 --> 00:33:14.140
would kind of want in the first place, right?

00:33:14.140 --> 00:33:14.540
Yeah.

00:33:14.540 --> 00:33:14.920
Yeah.

00:33:14.920 --> 00:33:15.380
Okay.

00:33:15.380 --> 00:33:17.180
And it's also a much faster feedback, right?

00:33:17.180 --> 00:33:20.940
So like if the thing has to run all the way through the linting, the formatting, the testing,

00:33:20.940 --> 00:33:25.380
the type checking, whatever, you might be waiting 10, 15 minutes for all the things to run when

00:33:25.380 --> 00:33:30.280
you could have had, you know, under a minute, hopefully way under a minute feedback instantly that

00:33:30.280 --> 00:33:31.740
your file wasn't formatted correctly.

00:33:31.740 --> 00:33:34.400
It should be near instantaneous, right?

00:33:34.500 --> 00:33:40.680
I mean, instant maybe is asking too much, but some of that Astral stuff is kind of ridiculous.

00:33:40.680 --> 00:33:41.240
Yeah.

00:33:41.240 --> 00:33:43.480
I think you have to be very careful, right?

00:33:43.480 --> 00:33:47.680
Because there's all these checks and I think you had up on the screen maybe earlier, like

00:33:47.680 --> 00:33:53.240
the pre-commit hooks, the general ones provided by the pre-commit organization.

00:33:53.240 --> 00:33:53.940
Yeah.

00:33:53.940 --> 00:33:56.940
There's tons of things in there, but you do have to be careful, right?

00:33:56.940 --> 00:33:59.560
Because if you're like, oh, this could be good and this could be good and this could

00:33:59.560 --> 00:33:59.820
be good.

00:33:59.820 --> 00:34:01.820
Each check is adding time.

00:34:02.000 --> 00:34:05.920
Assuming they're all running on Python files, you're adding time

00:34:05.920 --> 00:34:06.500
to how long each commit takes.

00:34:06.500 --> 00:34:09.500
So you do have to be mindful of what you actually need.

00:34:09.500 --> 00:34:15.020
And if you go to the point where you end up making the whole process take too long, people

00:34:15.020 --> 00:34:16.040
are going to stop using it.

00:34:16.040 --> 00:34:17.380
And then that defeats the...

00:34:17.380 --> 00:34:17.540
Yeah.

00:34:17.540 --> 00:34:18.180
Yeah, exactly.

00:34:18.180 --> 00:34:22.520
As soon as it becomes a point where people go, I'm not using this thing, then you're kind

00:34:22.520 --> 00:34:25.660
of lost, unless you can just say, no, you have to use it.

00:34:25.660 --> 00:34:27.640
But then you just have unhappy teammates.

00:34:27.640 --> 00:34:28.300
Exactly.

00:34:28.300 --> 00:34:29.960
Either way, it's not a real great outcome, is it?

00:34:29.960 --> 00:34:35.380
I mean, if there's something that maybe only runs on a few files every once in a while, then

00:34:35.380 --> 00:34:39.900
if you are having problems with speed, then you can also consider moving that to CI/CD.

00:34:39.900 --> 00:34:45.080
And I am definitely a big fan of Ruff; as you said, just switching from Black, flake8,

00:34:45.260 --> 00:34:50.000
all that onto Ruff, you do save a significant amount of time on these checks and it's a huge

00:34:50.000 --> 00:34:50.320
benefit.

00:34:50.320 --> 00:34:51.580
Yeah, it's pretty ridiculous.

00:34:51.580 --> 00:34:54.620
Now, this is not a Git pre-commit thing.

00:34:54.620 --> 00:34:57.160
This is a pre-commit-the-project thing.

00:34:57.160 --> 00:35:01.700
But you can, if you're using this pre-commit project we've been talking about, you can say

00:35:01.700 --> 00:35:07.080
pre-commit space run and do kind of a test without actually doing a commit, right?

00:35:07.080 --> 00:35:07.580
Correct.

00:35:07.580 --> 00:35:07.960
Yeah.

00:35:07.960 --> 00:35:09.840
So there are a few nuances.

00:35:09.840 --> 00:35:14.200
So if you just do pre-commit run, it's going to run all of your hooks, but on the staged

00:35:14.200 --> 00:35:17.280
changes, because it's thinking essentially you're doing like a dry run.

00:35:17.280 --> 00:35:22.180
If you, let's say, are adding a new hook and you want to make sure all of your files are

00:35:22.180 --> 00:35:26.260
compatible with that new hook, then you might want to do something like pre-commit run

00:35:26.260 --> 00:35:27.220
--all-files.

00:35:27.220 --> 00:35:31.500
So look through your entire repository, regardless of whether you have changes in place.

00:35:31.500 --> 00:35:37.400
So if you say pre-commit run, it only works on your, basically your changed files, not the

00:35:37.400 --> 00:35:38.760
stuff that's already there and accepted.
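
NOTE
An approximation of the difference just described, not pre-commit's internal logic. A plain pre-commit run sees only staged changes, while --all-files sees every tracked file. The helper names git_file_list_command and files_to_check are hypothetical. The git commands themselves are standard.
```python
from __future__ import annotations
import subprocess
def git_file_list_command(all_files: bool) -> list[str]:
    """Build a git command approximating which files a hook run would receive."""
    if all_files:
        # Every file tracked in the index, similar in spirit to --all-files.
        return ["git", "ls-files"]
    # Only staged files that were added, copied, modified, or renamed,
    # similar in spirit to a plain `pre-commit run`.
    return ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"]
def files_to_check(all_files: bool = False, cwd: str | None = None) -> list[str]:
    result = subprocess.run(
        git_file_list_command(all_files),
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```
Untracked files show up in neither list, which matches the point that hooks only look at what Git knows about.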

00:35:38.760 --> 00:35:39.460
Correct.

00:35:39.460 --> 00:35:44.100
And another neat thing is in the case I mentioned where you add a new hook, you might just want

00:35:44.100 --> 00:35:44.900
to run that hook.

00:35:44.900 --> 00:35:49.080
So you can say pre-commit run and then the hook ID, and then you would just run that hook

00:35:49.080 --> 00:35:52.740
and then you can define either a certain set of files or the staged ones, whatever.

00:35:52.740 --> 00:35:53.280
Yeah.

00:35:53.280 --> 00:35:56.840
That sounds pretty useful when you're building your own pre-commit hook, right?

00:35:56.840 --> 00:36:01.880
So yeah, depending on how you build it, you can either use that, or they also have a try-repo

00:36:01.880 --> 00:36:03.000
command.

00:36:03.000 --> 00:36:03.840
Right.

00:36:03.840 --> 00:36:04.300
Got it.

00:36:04.300 --> 00:36:04.580
Got it.

00:36:04.580 --> 00:36:06.340
Well, let's see.

00:36:06.340 --> 00:36:12.720
Maybe we could jump over and talk a bit through your hook creation guide, a step-by-step guide

00:36:12.720 --> 00:36:14.280
to developing your own pre-commit hook.

00:36:14.280 --> 00:36:17.380
I thought this was really, like I said, a good article.

00:36:17.380 --> 00:36:23.860
And maybe one of the first things we talk about is just what makes a good hook in the first

00:36:23.860 --> 00:36:24.380
place, right?

00:36:24.720 --> 00:36:29.760
You said that they can't be too long or people will go crazy and turn them off or skip them

00:36:29.760 --> 00:36:30.080
or whatever.

00:36:30.080 --> 00:36:31.120
But what else?

00:36:31.120 --> 00:36:36.840
So I think another big thing is if you're able to fix something, then you should fix it.

00:36:36.840 --> 00:36:41.060
In the case of formatting and you're saying, oh, this should have a trailing comma, then

00:36:41.060 --> 00:36:42.140
that's easy enough.

00:36:42.140 --> 00:36:43.240
You can add the trailing comma.

00:36:43.240 --> 00:36:44.900
You don't make more work for the user.

00:36:44.900 --> 00:36:48.800
If you can't do that, then you should be very specific saying this file.

00:36:48.960 --> 00:36:53.760
And if you have a line number saying exactly where it is, because just saying there's something

00:36:53.760 --> 00:36:57.320
wrong in this file and someone has to hunt it is also not a good user experience.
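
NOTE
A sketch of the "be specific" advice, using a hypothetical line-length hook. It accepts the filenames pre-commit passes in, plus an illustrative --max-line-length option of the kind you would set through args. It reports every problem as path:line: message so nobody has to hunt for it. The option name and the default of 88 are assumptions.
```python
from __future__ import annotations
import argparse
def check_file(path: str, max_len: int) -> list[str]:
    """Return one 'path:line: message' entry per line that is too long."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            length = len(line.rstrip("\r\n"))
            if length > max_len:
                problems.append(f"{path}:{lineno}: line is {length} characters (max {max_len})")
    return problems
def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical line-length hook")
    parser.add_argument("filenames", nargs="*", help="files passed in by pre-commit")
    parser.add_argument("--max-line-length", type=int, default=88)
    args = parser.parse_args(argv)
    problems = [p for name in args.filenames for p in check_file(name, args.max_line_length)]
    # Point at the exact file and line instead of just saying "something is wrong".
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```
For example, main(["--max-line-length", "100", "app.py"]) checks one file against a 100-character limit.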

00:36:57.320 --> 00:37:00.700
No, that's going to get frustrating super, super quickly.

00:37:00.700 --> 00:37:01.140
Yeah.

00:37:01.140 --> 00:37:03.220
So be really descriptive about it.

00:37:03.220 --> 00:37:06.820
And then also, maybe choose not to make it a pre-commit hook, right?

00:37:06.820 --> 00:37:09.400
Not everything necessarily needs to run on every commit.

00:37:09.400 --> 00:37:12.860
Yeah, I think that the speed thing is a huge factor.

00:37:12.860 --> 00:37:18.880
And in general, I think one big thing that is key to note here is that even

00:37:18.880 --> 00:37:23.900
if, let's say, you change a Python file, a Markdown file

00:37:23.900 --> 00:37:24.980
and an image file.

00:37:24.980 --> 00:37:30.400
If you're making a hook that only runs on a certain type of file, if you're careful and

00:37:30.400 --> 00:37:34.080
specify that, then it's not necessarily a bad thing to include that in there because it will

00:37:34.080 --> 00:37:36.480
only get triggered on those certain types of files.
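
NOTE
In pre-commit itself, you normally restrict a hook to certain file types in .pre-commit-config.yaml, through the hook's files option (a regex) or its types option. That way the hook is never invoked for other files. If a hook might also be run by hand, a defensive filter like this hypothetical one keeps it cheap. The suffix set is an assumption.
```python
from __future__ import annotations
from pathlib import PurePath
# Hypothetical: the image types this hook cares about.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
def select_files(filenames: list[str]) -> list[str]:
    """Keep only files whose extension matches (case-insensitive), preserving order."""
    return [name for name in filenames if PurePath(name).suffix.lower() in IMAGE_SUFFIXES]
```
Given a commit touching a Python file, a Markdown file, and an image, only the image would be passed on to the check.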

00:37:36.480 --> 00:37:40.400
And so an example I have is the EXIF stripper,

00:37:40.400 --> 00:37:44.500
which I created when I was building my website.

00:37:44.500 --> 00:37:46.820
Your EXIF stripper is super interesting.

00:37:47.000 --> 00:37:48.960
I'm starting to think maybe I want this as well.

00:37:48.960 --> 00:37:53.580
Yeah, I was just very paranoid at one point about just working with images.

00:37:53.580 --> 00:37:57.480
And so they come with, what's up here?

00:37:57.480 --> 00:38:02.100
So exchangeable image file format data, or EXIF as it's commonly called.

00:38:02.100 --> 00:38:06.340
It's metadata that is in the image that you might not realize is there.

00:38:06.340 --> 00:38:12.500
And so in this article, I talk about a picture of me presenting that I was given from a conference.

00:38:12.500 --> 00:38:15.500
And this was something that was stored, I think, in a Google Drive.

00:38:15.500 --> 00:38:18.320
So you have access to all the metadata that was available.

00:38:18.320 --> 00:38:20.620
So I never met the photographer.

00:38:20.620 --> 00:38:24.680
And yet I know the photographer's name, the camera they use, what type of computer they have,

00:38:24.680 --> 00:38:26.880
how they edited it, all kinds of information.

00:38:27.060 --> 00:38:31.200
And the dangerous part is the exact location of where this was.

00:38:31.200 --> 00:38:33.280
Now, conference, not a big deal.

00:38:33.280 --> 00:38:38.880
But you have to think about maybe you're blogging about something you did in your house or your apartment.

00:38:38.880 --> 00:38:47.180
And now you have a photo up on your website where anyone can potentially see it that has the GPS coordinates for where you live.

00:38:47.440 --> 00:38:48.460
Yeah, that wouldn't be great, no.

00:38:48.460 --> 00:38:50.980
So I was very paranoid about this.

00:38:50.980 --> 00:38:54.560
And I didn't like the idea of, oh, I'm going to add a new image,

00:38:54.560 --> 00:39:00.400
Let me go through my checklist of what I need to do because I know at some point I'm going to mess something up or forget it.

00:39:00.400 --> 00:39:03.960
And so this is a perfect use case for the pre-commit, right?

00:39:03.960 --> 00:39:08.480
Because you want something that is going to stop you and tell you, nope, you can't do this, right?

00:39:08.480 --> 00:39:14.700
And in this case, it can also remove the metadata because I am being super conservative and saying no metadata,

00:39:14.700 --> 00:39:19.800
which has the nice side benefit of shrinking files, which is good for serving them.
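
NOTE
A standard-library-only sketch of the idea behind an EXIF-stripping hook, assuming JPEG input. It walks the JPEG segments and drops any APP1 segment that starts with the Exif header, which also makes the file smaller. It is not the speaker's exif-stripper code. Other image formats and other metadata such as XMP are not handled, and a real tool would more likely use an imaging library.
```python
from __future__ import annotations
import struct
EXIF_HEADER = b"Exif\x00\x00"
def strip_exif(data: bytes) -> tuple[bytes, bool]:
    """Return (new_bytes, removed), dropping every Exif APP1 segment of a JPEG."""
    if data[:2] != b"\xff\xd8":  # SOI (start of image) marker
        raise ValueError("not a JPEG file")
    out = bytearray(data[:2])
    pos, removed = 2, False
    # Walk the metadata segments: 0xFF, marker byte, 2-byte big-endian length.
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        end = pos + 2 + length  # the length field counts itself, not the marker
        if marker == 0xE1 and data[pos + 4 : pos + 10] == EXIF_HEADER:
            removed = True  # drop this APP1/Exif segment
        else:
            out += data[pos:end]
        pos = end
    out += data[pos:]  # image data and anything after it, copied unchanged
    return bytes(out), removed
def main(argv: list[str]) -> int:
    removed_any = False
    for name in argv:
        with open(name, "rb") as f:
            cleaned, removed = strip_exif(f.read())
        if removed:
            with open(name, "wb") as f:
                f.write(cleaned)
            print(f"Removed EXIF metadata from {name}")
            removed_any = True
    # Exit non-zero so the commit stops and the cleaned files can be re-staged.
    return 1 if removed_any else 0
```
Like the other fixers, main() returns 1 when it cleaned anything, so you see what changed before the commit goes through.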

00:39:19.800 --> 00:39:20.240
Yeah.

00:39:20.240 --> 00:39:26.340
Well, what value is it to have all that metadata in there for a blog?

00:39:26.340 --> 00:39:29.540
Most of the time, people just want to read the blog.

00:39:29.540 --> 00:39:31.160
They're not going to dissect your image, right?

00:39:31.160 --> 00:39:36.480
I think it depends. I mean, maybe you have a travel blog and you want to show, like, here's that location.

00:39:36.660 --> 00:39:40.800
And then you have a one-off post where you introduce yourself and oops, you know?

00:39:40.800 --> 00:39:41.580
Yeah.

00:39:41.580 --> 00:39:42.760
There's so many ways.

00:39:42.760 --> 00:39:46.400
And I think even just thinking, oh, I'm only going to be doing this.

00:39:46.400 --> 00:39:48.740
There's always going to be something that later on happens.

00:39:48.740 --> 00:39:53.420
So you have to be very careful just upfront that everything is going to go through this track.

00:39:53.420 --> 00:39:54.200
Sure.

00:39:54.200 --> 00:39:58.140
Can your EXIF thing be selective about the metadata?

00:39:58.140 --> 00:40:00.860
That's something I do want to do in the future.

00:40:00.860 --> 00:40:03.240
Just remove the location, if you want.

00:40:03.240 --> 00:40:11.140
But the thing is, there's like, looking through all of that, it's hard to tell if there might be something in one subset of images you take that might be sensitive.

00:40:11.140 --> 00:40:16.580
You can even think of certain situations where you might not want someone to know what kind of device you were using.

00:40:16.580 --> 00:40:16.900
Right.

00:40:16.900 --> 00:40:20.780
Because maybe they're like, oh, that device is vulnerable to something and I know they have it.

00:40:20.780 --> 00:40:20.980
Right.

00:40:21.900 --> 00:40:34.300
The worst of these, I think, has happened multiple times. I'm pretty sure it was Samsung, but one of the Android companies posted a picture promoting its new phone.

00:40:34.300 --> 00:40:38.620
And, you know, the EXIF information had the picture as being from an iPhone or something like that.

00:40:38.620 --> 00:40:40.140
Oh, no, it was the other way around, I think.

00:40:40.140 --> 00:40:41.060
Oh, the other way around.

00:40:41.060 --> 00:40:41.620
I think I remember hearing that, yeah.

00:40:41.620 --> 00:40:49.400
Well, it was like one phone company was posting it, and even though the picture was about the phone, it was, you know, implying the picture came from it or something.

00:40:49.400 --> 00:40:50.160
It was like, nope.

00:40:50.160 --> 00:40:54.780
Whoever is on the marketing team just happens to have the other kind of phone and there it goes.

00:40:54.780 --> 00:40:54.960
Right.

00:40:54.960 --> 00:40:55.760
And it's a huge scandal.

00:40:55.760 --> 00:41:00.340
I mean, for those companies that talk about how awesome they are, how much better their cameras are or whatever.

00:41:00.340 --> 00:41:01.880
Well, see, that's also the thing, right?

00:41:01.880 --> 00:41:03.980
Because you never know who's going to look at the metadata either.

00:41:03.980 --> 00:41:09.540
So, and it's interesting because certain things will, certain platforms will remove it.

00:41:09.540 --> 00:41:12.320
So I mentioned Google Drive: everything is preserved there.

00:41:12.320 --> 00:41:15.520
But the thing is, you have to know ahead of time.

00:41:15.520 --> 00:41:18.480
So you'd have to say, I'm planning to put this image here.

00:41:18.480 --> 00:41:20.240
Let me upload a dummy image.

00:41:20.240 --> 00:41:23.400
One I don't care about, and check if the metadata is still there.

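NOTE
One way to do the "upload a dummy image and check" test described here is to look for the EXIF block directly. A minimal stdlib-only sketch, assuming a JPEG file: EXIF is stored in an APP1 segment (marker 0xFFE1) whose payload starts with b"Exif\x00\x00", and metadata segments come before the start-of-scan marker (0xFFDA). The helper name has_exif is hypothetical, and formats such as PNG or WebP store metadata differently and are not handled.
```python
import struct
def has_exif(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an EXIF APP1 segment."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: no more metadata
            return False
        # the 2-byte big-endian length counts itself but not the marker bytes
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False
```
Running it on the original photo and on a copy downloaded back from the platform shows whether that platform kept the metadata.
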
00:41:23.400 --> 00:41:24.500
Yeah, exactly.

00:41:24.500 --> 00:41:25.340
Yeah.

00:41:25.340 --> 00:41:27.780
I think, I think Mastodon might remove it.

00:41:27.780 --> 00:41:30.760
There are certain platforms that will take away that metadata.

00:41:30.760 --> 00:41:31.900
I think Facebook might.

00:41:31.900 --> 00:41:33.100
It's been a long time.

00:41:33.100 --> 00:41:35.780
I mean, it's a huge security concern.

00:41:35.780 --> 00:41:41.720
So I imagine more and more places are, but I just wanted to have an abundance of caution and not risk anything happening.

00:41:41.720 --> 00:41:42.580
Well, yeah.

00:41:42.580 --> 00:41:49.220
And you're putting it on the internet as well, where it goes straight from your computer through some sort of static website process.

00:41:49.220 --> 00:41:50.560
And then it's downloaded, right?

00:41:50.560 --> 00:41:53.060
There's nothing in between those two steps.

00:41:53.060 --> 00:41:53.860
Exactly.

00:41:53.860 --> 00:41:55.600
At least not in terms of image processing.

00:41:55.600 --> 00:41:55.960
Yeah.

00:41:55.960 --> 00:41:56.200
Yeah.

00:41:56.200 --> 00:41:56.560
Cool.

00:41:57.240 --> 00:41:58.060
Yeah, this is nice.

00:41:58.060 --> 00:42:01.160
I'm thinking about grabbing it and trying it out.

00:42:01.160 --> 00:42:03.420
What file types does it work on?

00:42:03.420 --> 00:42:07.440
Does it work on just JPEGs or does it do like WebP and all that?

00:42:07.440 --> 00:42:12.120
Any image: anything that pre-commit classifies as an image.

00:42:12.120 --> 00:42:15.760
And it has to work with, I'm using Pillow.

00:42:15.760 --> 00:42:17.920
So if Pillow can't read it, then it's not going to work.

00:42:17.920 --> 00:42:18.500
Right.

00:42:18.500 --> 00:42:20.540
Then it'll just skip over it or whatever.

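NOTE
The hook described here relies on Pillow and skips anything Pillow cannot read. As a stdlib-only illustration of the same idea, not exif-stripper's actual code, this sketch drops EXIF APP1 segments from a JPEG and returns None for anything that is not a JPEG so the caller can skip it. The function name is hypothetical, and other metadata (for example XMP, which can also carry location data) is left untouched, so treat it as an illustration only.
```python
from typing import Optional
def strip_jpeg_exif(data: bytes) -> Optional[bytes]:
    """Return the JPEG with EXIF APP1 segments removed, or None if it is not a JPEG."""
    if data[:2] != b"\xff\xd8":
        return None  # unsupported file: let the caller skip it
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: copy the rest verbatim
            break
        length = int.from_bytes(data[pos + 2:pos + 4], "big")  # includes the 2 length bytes
        segment = data[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment  # keep JFIF headers, ICC profiles, quantization tables, etc.
        pos += 2 + length
    out += data[pos:]
    return bytes(out)
```
Writing the returned bytes back over the original file is what would turn this from a checker into a fixer.
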
00:42:20.540 --> 00:42:21.020
Yeah.

00:42:21.200 --> 00:42:21.380
Yeah.

00:42:21.380 --> 00:42:26.480
So really quick, while we're talking about stuff on your website, your website's super nice.

00:42:26.480 --> 00:42:28.600
Did you build this yourself?

00:42:28.600 --> 00:42:29.620
Like, how is this thing built?

00:42:29.620 --> 00:42:29.960
I did.

00:42:29.960 --> 00:42:31.320
I did build it myself.

00:42:32.520 --> 00:42:38.100
I took a couple of months at the beginning of the year, and before that I had a single page where

00:42:38.100 --> 00:42:39.600
it was just like some boxes.

00:42:39.600 --> 00:42:41.720
And then I was like, this needs to be revisited.

00:42:42.220 --> 00:42:47.940
So it's built with Next.js and so React and TypeScript.

00:42:47.940 --> 00:42:50.280
And then I use Tailwind CSS.

00:42:50.280 --> 00:42:54.700
And yeah, it was kind of just like, I mean, a lot of these things are for me because sometimes,

00:42:54.700 --> 00:43:00.740
you know, I like seeing all in one place where I'm speaking next or like stats about where

00:43:00.740 --> 00:43:02.480
I've spoken, like a map and stuff.

00:43:02.480 --> 00:43:08.060
And I went through, so kind of my process would be, you know, on my iPad, I would sketch out

00:43:08.060 --> 00:43:13.180
what I envisioned a page looking like, and then I would prototype it in React and

00:43:13.180 --> 00:43:17.780
see, okay, maybe this doesn't fully work, or tweak things and iterate a few times

00:43:17.780 --> 00:43:20.600
and bit by bit the pages formed.

00:43:20.600 --> 00:43:24.280
The latest thing I added was this timeline functionality.

00:43:24.280 --> 00:43:31.240
At EuroPython this year, I had this idea for a timeline and I kind of got really, really into

00:43:31.240 --> 00:43:31.400
it.

00:43:31.400 --> 00:43:31.860
So it was funny.

00:43:31.860 --> 00:43:32.780
I was at a Python conference.

00:43:32.780 --> 00:43:34.040
I was doing tons of React.

00:43:34.640 --> 00:43:38.160
But if you scroll down a tiny bit, there's actually too much.

00:43:38.160 --> 00:43:39.040
This one, right?

00:43:39.040 --> 00:43:39.420
Yeah, yeah.

00:43:39.420 --> 00:43:41.120
Versus the little text.

00:43:41.120 --> 00:43:42.620
Oh, the complete upcoming.

00:43:42.620 --> 00:43:43.360
Yeah, I got you.

00:43:43.360 --> 00:43:44.180
So I built this.

00:43:44.180 --> 00:43:45.060
Oh, that's beautiful.

00:43:45.060 --> 00:43:45.620
I love it.

00:43:45.620 --> 00:43:48.780
It's like a little infographic of your upcoming events.

00:43:48.780 --> 00:43:49.360
Yeah.

00:43:49.360 --> 00:43:53.980
So I was like very inspired and I did this in a few days.

00:43:53.980 --> 00:43:59.480
But it's nice because, you know, going from the sketch to the React components, it's become

00:43:59.480 --> 00:44:03.380
very natural, though it takes a bit to get there.

00:44:03.380 --> 00:44:08.720
But it was nice because I did have to learn TypeScript for some changes in my team.

00:44:08.720 --> 00:44:10.540
We were going to start moving to TypeScript.

00:44:10.540 --> 00:44:15.700
So this was great to work on something that, you know, fit in my head as far as what needed

00:44:15.700 --> 00:44:16.200
to be done.

00:44:16.200 --> 00:44:17.960
And it was very, very helpful.

00:44:17.960 --> 00:44:20.300
But yeah, so I'm very proud of this.

00:44:20.300 --> 00:44:22.380
There's still more, tons more to do.

00:44:22.380 --> 00:44:23.720
I have massive lists.

00:44:24.100 --> 00:44:25.380
But yeah, I remember looking at Google.

00:44:25.380 --> 00:44:26.820
This is a nice static site.

00:44:26.820 --> 00:44:27.240
Very cool.

00:44:27.240 --> 00:44:28.820
And I didn't even see this feature.

00:44:28.820 --> 00:44:29.220
This is great.

00:44:29.220 --> 00:44:31.900
Broadvon out in the audience says fire emoji for it.

00:44:31.900 --> 00:44:32.300
Very good.

00:44:32.300 --> 00:44:33.580
Thank you.

00:44:33.580 --> 00:44:36.020
And also, thanks.

00:44:36.020 --> 00:44:38.200
I see you put the podcast appearance on here as well.

00:44:38.200 --> 00:44:38.860
That's cool.

00:44:38.860 --> 00:44:40.040
So that's happening today.

00:44:40.040 --> 00:44:41.360
Watch the live stream now.

00:44:41.360 --> 00:44:43.360
If you're not watching now, then you've probably missed it.

00:44:43.360 --> 00:44:45.000
But the recording will be there, of course.

00:44:45.000 --> 00:44:49.320
But the reason I say that is you maybe want to give a shout out to some of your upcoming

00:44:49.320 --> 00:44:50.120
events.

00:44:50.120 --> 00:44:51.000
Yeah, why not?

00:44:51.120 --> 00:44:57.360
So I'm going to be in San Francisco next week talking about my Data Morph project.

00:44:57.360 --> 00:45:02.680
And I'll also be doing a book signing there for my Hands-On Data Analysis with Pandas book,

00:45:02.680 --> 00:45:03.320
second edition.

00:45:03.320 --> 00:45:09.840
And then after that, I'm off to France to give a workshop on Pandas and then also talk about

00:45:09.840 --> 00:45:12.140
getting started in open source contributions.

00:45:12.140 --> 00:45:18.220
And then a couple of weeks after that, I will be at the final conference of the year in Australia.

00:45:18.220 --> 00:45:21.100
And I will be talking about Data Morph once again.

00:45:21.100 --> 00:45:26.280
And I'm hoping to run my third development sprint on Data Morph while I'm there.

00:45:26.280 --> 00:45:27.120
Oh, that's cool.

00:45:27.120 --> 00:45:28.980
Yeah, we'll talk about Data Morph in a second.

00:45:28.980 --> 00:45:30.380
That's some interesting stuff.

00:45:30.380 --> 00:45:33.000
But this is quite the agenda.

00:45:33.000 --> 00:45:34.400
You got a full trip coming up.

00:45:34.400 --> 00:45:35.360
No, I'm excited.

00:45:35.360 --> 00:45:39.600
It's nice to see different cultures.

00:45:39.960 --> 00:45:43.960
It definitely does land differently, you know, the topics and just the reactions.

00:45:43.960 --> 00:45:46.300
Some people are over-the-top excited.

00:45:46.300 --> 00:45:48.340
Some of them are just straight-faced.

00:45:48.340 --> 00:45:49.500
You're like, I enjoy it.

00:45:49.500 --> 00:45:54.660
I think it really comes into play as far as giving workshops.

00:45:54.660 --> 00:46:00.100
I was in Portugal last week and I did the data analysis workshop.

00:46:00.100 --> 00:46:03.500
And I think that was one of the best ones I've ever had.

00:46:03.560 --> 00:46:07.500
It was very, very highly interactive and it was a really fun time for me.

00:46:07.500 --> 00:46:09.240
And hopefully everyone else thought so as well.

00:46:09.240 --> 00:46:11.180
Yeah, that's fantastic.

00:46:11.180 --> 00:46:12.940
How did you get into public speaking?

00:46:12.940 --> 00:46:19.960
Yeah, so I wrote the Hands-On Data Analysis with Pandas book in 2019.

00:46:20.640 --> 00:46:25.240
And at that time, if you had told me, go do some public speaking, I'm like, please no.

00:46:25.240 --> 00:46:29.180
And now you're going to France and Australia, and you were in Portugal recently.

00:46:29.180 --> 00:46:30.160
So I'm like, no, no, no.

00:46:30.160 --> 00:46:30.640
Yeah.

00:46:30.640 --> 00:46:38.940
And then, well, during pandemic times, a conference reached out to me about doing a workshop on pandas

00:46:38.940 --> 00:46:41.560
because I had written the book and doing it virtually.

00:46:41.560 --> 00:46:47.160
And to me, that felt like a good stepping stone to get over that fear of public speaking and

00:46:47.160 --> 00:46:48.380
the fact that it would be virtual.

00:46:48.840 --> 00:46:50.260
I wouldn't really have to look at anyone.

00:46:50.260 --> 00:46:55.620
And I was still absolutely terrified when it came to actually delivering that talk.

00:46:55.620 --> 00:46:57.560
And when you think about it, it wasn't a talk, right?

00:46:57.560 --> 00:47:01.440
So my first thing was a four-hour workshop.

00:47:01.440 --> 00:47:08.180
And now I'm at the point where a virtual thing is much less desirable because it's so hard when

00:47:08.180 --> 00:47:12.520
you can't see people, you can't see if things are landing, are they confused, are they with me?

00:47:12.520 --> 00:47:13.740
Are they even still there?

00:47:14.740 --> 00:47:19.180
So, and then after I did, you know, I made it to the end and I was like, okay, that's

00:47:19.180 --> 00:47:22.200
definitely something I want to work on and do it again.

00:47:22.200 --> 00:47:26.240
So I did, I came up with a second workshop on data visualization.

00:47:26.640 --> 00:47:31.260
And then I think I did two or three more virtual sessions.

00:47:31.260 --> 00:47:35.820
And then some conferences started being in person again.

00:47:35.820 --> 00:47:37.940
And I was like, okay, I think I should try this.

00:47:37.940 --> 00:47:40.080
And again, it was still a long one.

00:47:40.080 --> 00:47:42.320
It may have even been a six-hour session that time.

00:47:42.320 --> 00:47:43.800
So it's like crazy, right?

00:47:43.800 --> 00:47:45.700
And then I did that in person.

00:47:45.700 --> 00:47:47.520
And I was like, okay, I survived.

00:47:47.780 --> 00:47:52.680
And then it kind of just felt like something, if I kept doing it, I would get over it or

00:47:52.680 --> 00:47:56.620
at least get to the point where, you know, I could do it without being terrified for a

00:47:56.620 --> 00:47:57.400
month ahead of time.

00:47:57.400 --> 00:47:57.800
Right.

00:47:57.800 --> 00:47:59.000
And I am at that point now.

00:47:59.000 --> 00:48:04.160
It is like, I enjoy doing it because I enjoy, I'm very passionate about knowledge sharing and

00:48:04.160 --> 00:48:09.480
just teaching people and getting that interaction that, oh, people are really like getting value

00:48:09.480 --> 00:48:10.080
out of this.

00:48:10.080 --> 00:48:11.760
And that to me is very nice.

00:48:11.760 --> 00:48:12.180
Yeah.

00:48:12.180 --> 00:48:13.060
It's super rewarding.

00:48:13.520 --> 00:48:15.340
So, but yeah, this is quite impressive.

00:48:15.340 --> 00:48:18.600
So I got the sense you kind of got started pretty recently.

00:48:18.600 --> 00:48:19.320
You said 2019.

00:48:19.320 --> 00:48:21.400
So you haven't been doing it for that long.

00:48:21.400 --> 00:48:22.140
And this is great.

00:48:22.140 --> 00:48:27.020
So maybe, you know, you brought it, maybe we could talk a bit about your book as well.

00:48:27.020 --> 00:48:29.520
I don't know what to say about this one.

00:48:29.520 --> 00:48:33.220
Just that it exists and people should check it out.

00:48:33.220 --> 00:48:33.980
It's giant.

00:48:33.980 --> 00:48:34.600
It's giant.

00:48:34.600 --> 00:48:36.500
As you can see, 788 pages.

00:48:36.500 --> 00:48:37.680
Holy moly.

00:48:37.680 --> 00:48:38.300
That is giant.

00:48:38.300 --> 00:48:40.180
Yeah.

00:48:40.180 --> 00:48:42.120
So this is the second edition.

00:48:42.480 --> 00:48:46.040
If you scroll down, there's also the covers for the Korean and Chinese editions.

00:48:46.040 --> 00:48:47.660
Oh, awesome.

00:48:47.660 --> 00:48:52.160
And I do not read either of those, but I do have copies.

00:48:52.160 --> 00:48:53.520
It's an act of faith that your name is on them.

00:48:53.520 --> 00:48:54.420
You know what?

00:48:54.420 --> 00:48:59.380
I've been told by people that read both of those languages that the name is not quite translated

00:48:59.380 --> 00:49:02.020
correctly, but you know, I'll forget about that.

00:49:02.020 --> 00:49:03.480
It's cool to have the copies.

00:49:03.480 --> 00:49:04.680
Yeah.

00:49:04.680 --> 00:49:10.400
So this book obviously covers pandas, working through the basics of data analysis.

00:49:10.400 --> 00:49:13.460
We also talk about data visualization.

00:49:13.460 --> 00:49:19.120
And then there is a little bit towards the end about like actually applying this stuff

00:49:19.120 --> 00:49:21.840
to use cases and also a little bit of machine learning.

00:49:21.840 --> 00:49:22.200
Cool.

00:49:22.200 --> 00:49:22.960
Yeah.

00:49:23.020 --> 00:49:24.540
So I'll put a link in the show notes.

00:49:24.540 --> 00:49:26.840
People can check it out if they would like to.

00:49:26.840 --> 00:49:27.240
All right.

00:49:27.240 --> 00:49:28.300
I feel like there's a few things.

00:49:28.300 --> 00:49:31.880
We didn't make it very far in our creation guide.

00:49:31.880 --> 00:49:33.820
So let's talk about the recipe.

00:49:33.820 --> 00:49:34.440
All right.

00:49:34.440 --> 00:49:35.720
What are the four steps?

00:49:35.720 --> 00:49:39.000
At least Stephanie's recipe for a pre-commit hook.

00:49:39.200 --> 00:49:39.520
Yeah.

00:49:39.520 --> 00:49:41.100
This is definitely my recipe.

00:49:41.100 --> 00:49:46.200
I mean, I think I've made two published ones and then obviously a few others

00:49:46.200 --> 00:49:48.200
for trainings and explanation purposes.

00:49:48.200 --> 00:49:50.980
And this, this is something that works well for me.

00:49:50.980 --> 00:49:53.880
And I think makes sense as far as thinking about the pieces.

00:49:53.880 --> 00:49:59.040
So the first thing, the hardest thing is actually to figure out what are you checking and how do

00:49:59.040 --> 00:50:00.100
you actually code that up?

00:50:00.100 --> 00:50:03.960
And if you want to do this in Python, this is just, okay, code your logic.

00:50:03.960 --> 00:50:04.360
Yeah.

00:50:04.360 --> 00:50:04.540
Right.

00:50:04.540 --> 00:50:04.820
Yeah.

00:50:04.820 --> 00:50:09.140
Well, and if it has a --fix, maybe that's even harder than just trying to

00:50:09.140 --> 00:50:10.020
understand, right?

00:50:10.020 --> 00:50:13.600
Because now you've got to not break somebody's code, or things like that.

00:50:13.600 --> 00:50:13.780
Yeah.

00:50:13.780 --> 00:50:18.260
But this would be where you start at the basic level, probably first, you know, find,

00:50:18.260 --> 00:50:22.140
figure out, can you find the issue and show people where it is?

00:50:22.140 --> 00:50:23.320
And then you can look into fixing it.

00:50:23.320 --> 00:50:27.160
But yeah, you have to be very careful, especially if you're going to be touching things.

00:50:27.160 --> 00:50:32.540
So I guess it's pretty straightforward, but the magic of Python is not just the language

00:50:32.540 --> 00:50:37.400
and the static, the standard library, but the 500,000 external packages, right?

00:50:37.420 --> 00:50:41.080
There's probably a ton of external packages that understand code, check different things.

00:50:41.080 --> 00:50:44.520
And you could, you can use those in your hook implementation, right?

00:50:44.520 --> 00:50:47.860
Just like a standard Python package, it can have dependencies and stuff.

00:50:47.860 --> 00:50:48.380
Yes.

00:50:48.380 --> 00:50:54.020
And so I talk about this in the third step, but I do like to make it as a package just

00:50:54.020 --> 00:50:57.820
because you know that that's going to work and grab the dependencies as long as you follow

00:50:57.820 --> 00:50:58.900
what you already know.

00:50:59.300 --> 00:51:04.580
And you will tell pre-commit, in the fourth step, in that .pre-commit-hooks.yaml

00:51:04.580 --> 00:51:06.120
file how it should be installed.

00:51:06.120 --> 00:51:11.020
So when you say this is, this is Python, then it will know, okay, so I should be using, for

00:51:11.020 --> 00:51:12.400
example, pip to install this.

00:51:12.400 --> 00:51:16.960
And if you have, for example, a pyproject.toml and you specify how it should be built, then

00:51:16.960 --> 00:51:18.800
all of that just happens as it normally would.

00:51:18.800 --> 00:51:20.500
It's just that pre-commit is doing it instead of you.

00:51:20.500 --> 00:51:20.900
Yeah.

00:51:20.900 --> 00:51:21.280
Yeah.

00:51:21.340 --> 00:51:25.880
That's kind of, instead of you doing a pip install -e . or whatever, it's

00:51:25.880 --> 00:51:26.680
kind of figuring that out.

00:51:26.680 --> 00:51:31.420
And I guess we haven't really talked too much about it, but when you pre-commit install, it

00:51:31.420 --> 00:51:36.420
looks at this hooks YAML file, and then it creates the environment and downloads

00:51:36.420 --> 00:51:39.120
all the packages the first time to kind of set it up.

00:51:39.120 --> 00:51:41.060
Then it just runs over and over after that.

00:51:41.060 --> 00:51:41.240
Right.

00:51:41.240 --> 00:51:41.640
Yeah.

00:51:41.640 --> 00:51:47.440
Unless you change something in your pre-commit config file, then it won't need to rebuild the

00:51:47.440 --> 00:51:48.560
environment for this.

00:51:48.560 --> 00:51:51.060
So if you keep the same version, then it's kind of like you said.

00:51:51.160 --> 00:51:52.840
I installed this version of the package.

00:51:52.840 --> 00:51:56.400
And as long as you don't say you need to update the package, it's kind of like a virtual

00:51:56.400 --> 00:51:56.760
environment.

00:51:56.760 --> 00:51:57.100
Okay.

00:51:57.100 --> 00:51:57.800
You already have that.

00:51:57.800 --> 00:51:58.520
There's no need to.

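NOTE
The "install once, reuse until something changes" behavior described here can be pictured as a cache keyed on whatever the environment was built from. This is a toy model, not pre-commit's internals: the key is a hash of the repository URL, the pinned rev, and any extra dependencies, and install_env is a hypothetical stand-in for creating a virtual environment and pip-installing the hook.
```python
import hashlib
from typing import Callable, Sequence, Set
def env_key(repo: str, rev: str, extra_deps: Sequence[str] = ()) -> str:
    """Hash everything whose change should force a rebuild."""
    raw = "\n".join([repo, rev, *sorted(extra_deps)])
    return hashlib.sha256(raw.encode()).hexdigest()
class EnvCache:
    """Toy cache: builds an environment the first time, then reuses it."""
    def __init__(self, install_env: Callable[[str, str], None]) -> None:
        self._install_env = install_env  # hypothetical installer (e.g. venv + pip)
        self._built: Set[str] = set()
    def ensure(self, repo: str, rev: str, extra_deps: Sequence[str] = ()) -> bool:
        """Return True if a build happened, False if a cached environment was reused."""
        key = env_key(repo, rev, extra_deps)
        if key in self._built:
            return False
        self._install_env(repo, rev)
        self._built.add(key)
        return True
```
Bumping the rev (or the extra dependencies) in the config changes the key, which is the only thing that triggers another install in this model.
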
00:51:58.520 --> 00:51:59.100
Yeah.

00:51:59.100 --> 00:51:59.700
Yeah.

00:51:59.700 --> 00:51:59.980
Excellent.

00:51:59.980 --> 00:52:07.060
So your recipe is: one, design the check function; two, turn it into a CLI, which has some interesting

00:52:07.060 --> 00:52:08.260
stuff in that one as well.

00:52:08.260 --> 00:52:08.600
That's.

00:52:08.600 --> 00:52:13.040
And I think that's kind of where the --fix comment comes into play.

00:52:13.040 --> 00:52:13.200
Right.

00:52:13.200 --> 00:52:18.980
So your logic, that check function, you should be able to say this was successful.

00:52:18.980 --> 00:52:21.140
This was not successful as in stop the commit.

00:52:21.140 --> 00:52:26.660
And then the CLI provides a very easy way to plug into that.

00:52:26.660 --> 00:52:31.460
Maybe you want to say --fix, or some other flag that says, you know, leave this type of file alone,

00:52:31.460 --> 00:52:33.720
whatever kind of modification you want to do.

00:52:33.720 --> 00:52:36.100
You can expose that in a CLI.

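NOTE
Here is what exposing a --fix flag through the CLI might look like. A minimal sketch using argparse and a hypothetical check (trailing whitespace), chosen only because it is easy to both report and fix: without --fix it prints where each problem is, with --fix it rewrites the file. It returns 1 whenever it found something, so the commit still stops and the author can review the change, which is how common fixer hooks behave. Assumes UTF-8 text files.
```python
import argparse
from pathlib import Path
from typing import Optional, Sequence
def check_file(path: Path, fix: bool) -> bool:
    """Return True if `path` has no trailing whitespace; rewrite it when `fix` is set."""
    lines = path.read_text(encoding="utf-8").split("\n")
    bad = [n for n, line in enumerate(lines, start=1) if line != line.rstrip()]
    for n in bad:
        print(f"{path}:{n}: trailing whitespace")  # show people where the issue is
    if bad and fix:
        path.write_text("\n".join(line.rstrip() for line in lines), encoding="utf-8")
    return not bad
def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Flag (and optionally fix) trailing whitespace.")
    parser.add_argument("filenames", nargs="*", help="files passed in by pre-commit")
    parser.add_argument("--fix", action="store_true", help="rewrite files instead of only reporting")
    args = parser.parse_args(argv)
    # check every file (a list, not a generator) so all problems get reported
    results = [check_file(Path(name), fix=args.fix) for name in args.filenames]
    return 0 if all(results) else 1
```
Starting with reporting only and adding --fix later keeps the risky part (touching someone's files) behind an explicit opt-in.
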
00:52:36.460 --> 00:52:42.840
And that's also a quicker way to get started versus trying to, let's say,

00:52:42.840 --> 00:52:46.160
find the pyproject.toml, read it in, parse out things.

00:52:46.160 --> 00:52:51.300
That's all stuff that can come later once you figure out exactly how you want your tool to

00:52:51.300 --> 00:52:52.060
be configured.

00:52:52.060 --> 00:52:52.560
Yeah.

00:52:52.700 --> 00:52:57.240
Especially if it just has one or two arguments, it might not be necessary to be too, too over

00:52:57.240 --> 00:52:58.460
the top with all the configuration.

00:52:58.460 --> 00:53:00.680
And then you make it installable.

00:53:00.680 --> 00:53:05.160
Basically, like you said, make it a package and then create the pre-commit hooks.

00:53:05.160 --> 00:53:05.380
Yeah.

00:53:05.380 --> 00:53:06.400
Well, those are the steps.

00:53:06.780 --> 00:53:09.280
So I think write the function, that's pretty straightforward.

00:53:09.280 --> 00:53:12.320
You just, whatever you want it to do, you just write a function that does it.

00:53:12.320 --> 00:53:19.880
You do have an example in here about checking for valid file names and snake cased file names.

00:53:19.880 --> 00:53:25.540
So things like it can't be just one letter and it has to be snake cased and so on.

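NOTE
A minimal sketch of the check function for the file-name example, assuming "valid" means the name without its extension is snake_case and longer than one character. The regular expression and the name is_valid_filename are illustrative, not taken from the original example.
```python
import re
from pathlib import Path
# lowercase letters/digits, starting with a letter, words joined by single underscores
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
def is_valid_filename(filename: str) -> bool:
    """Return True if the file's stem is snake_case and more than one character long."""
    stem = Path(filename).stem
    return len(stem) > 1 and SNAKE_CASE.fullmatch(stem) is not None
```
Keeping the rule in a plain function like this makes it easy to test on its own; a real project would probably also need exceptions such as __init__.py.
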
00:53:25.740 --> 00:53:25.900
Right.

00:53:25.900 --> 00:53:31.740
But then to turn that into a CLI, there's a lot of options in Python these days, right?

00:53:31.740 --> 00:53:37.180
You can use Click, you can use Typer, but if you want something built in, yeah, if you want something

00:53:37.180 --> 00:53:39.620
built in, argparse is pretty straightforward, right?

00:53:39.620 --> 00:53:40.140
Yeah.

00:53:40.140 --> 00:53:45.940
And I think also, I mean, if you look at the pre-commit hooks repo provided by pre-commit org,

00:53:45.940 --> 00:53:49.100
a lot of them, or maybe all of them, are just using argparse.

00:53:49.100 --> 00:53:54.300
Because for most hooks, all you'll need to say is, I have an argument parser and it accepts

00:53:54.300 --> 00:53:54.880
file names.

00:53:55.220 --> 00:53:58.180
And at that point you have this boilerplate that you can just copy and you don't even

00:53:58.180 --> 00:54:02.240
need to worry about configuring multiple, you know, different arguments.

00:54:02.240 --> 00:54:06.640
It doesn't have to be too advanced with subcommands and all that kind of stuff necessarily.

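NOTE
The argparse boilerplate described here can be as small as one positional argument that collects the file names pre-commit passes in, plus an exit code: 0 lets the commit proceed and anything else stops it. A minimal sketch, with a simplified, hypothetical snake_case rule standing in for the real check:
```python
import argparse
import re
from pathlib import Path
from typing import Optional, Sequence
def is_valid_filename(filename: str) -> bool:
    """Simplified stand-in check: stem is snake_case and longer than one character."""
    stem = Path(filename).stem
    return len(stem) > 1 and re.fullmatch(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*", stem) is not None
def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="validate-filename")
    parser.add_argument("filenames", nargs="*", help="filenames to check")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv", as a console script does
    invalid = [name for name in args.filenames if not is_valid_filename(name)]
    for name in invalid:
        print(f"{name}: file name must be snake_case and longer than one character")
    return 1 if invalid else 0
```
Taking argv as a parameter keeps main easy to call from tests, while the installed command still gets the real command-line arguments.
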
00:54:06.640 --> 00:54:07.480
Yeah.

00:54:07.480 --> 00:54:07.660
Yeah.

00:54:07.660 --> 00:54:09.380
And then make it installable.

00:54:09.380 --> 00:54:15.280
This is, you recommend a pyproject.toml, which yeah, for packages these days, that seems

00:54:15.280 --> 00:54:17.560
pretty much the de facto standard, right?

00:54:17.560 --> 00:54:18.140
Yeah.

00:54:18.140 --> 00:54:21.520
And then what's nice is, yeah, you're using current things.

00:54:21.520 --> 00:54:23.200
You're not relying on setup.py.

00:54:23.620 --> 00:54:27.080
And also in there, there's a way to expose an entry point.

00:54:27.080 --> 00:54:28.840
And that's line 24.

00:54:28.840 --> 00:54:29.460
Yeah.

00:54:29.460 --> 00:54:29.740
Yeah.

00:54:29.740 --> 00:54:29.960
Yeah.

00:54:29.960 --> 00:54:30.020
Yeah.

00:54:30.020 --> 00:54:30.800
That's really nice.

00:54:30.800 --> 00:54:31.880
I love entry points.

00:54:31.880 --> 00:54:35.480
I think it's, I think they're massively underused in Python.

00:54:35.480 --> 00:54:40.440
You know, people talk about how do I create a script that I can give it to somebody so they

00:54:40.440 --> 00:54:41.220
can run something.

00:54:41.220 --> 00:54:44.760
And that so often involves like, where is it?

00:54:44.760 --> 00:54:46.040
Where is its associated files?

00:54:46.040 --> 00:54:47.580
Where is its Python?

00:54:47.580 --> 00:54:48.720
And where are its dependencies?

00:54:48.720 --> 00:54:52.820
All of that stuff you, if you just create a package and it has an entry point, you can

00:54:52.820 --> 00:54:55.060
pipx install it or uv tool install it.

00:54:55.060 --> 00:54:58.820
Or, and now you just have all these commands and people don't have to mess with all the Python

00:54:58.820 --> 00:54:59.240
stuff.

00:54:59.240 --> 00:55:01.860
Even if you know how to do it, you don't necessarily want to do that all the time.

00:55:01.860 --> 00:55:02.080
Right?

00:55:02.080 --> 00:55:02.480
Yeah.

00:55:02.480 --> 00:55:05.520
And then it's just easy to, you can kind of call it from anywhere at that point.

00:55:05.520 --> 00:55:06.340
Yeah, exactly.

00:55:06.680 --> 00:55:12.040
So in this example, you put a validate-filename command, and you

00:55:12.040 --> 00:55:16.060
just point to, you know, what module and then what function to call.

00:55:16.060 --> 00:55:17.500
And that's the CLI.

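NOTE
An entry point is just a "module:function" string that the installer turns into a command. The standard library can resolve such a string with importlib.metadata.EntryPoint; this sketch builds one by hand and points it at a stdlib function (json:dumps) so it runs without installing anything. The command name my-command is hypothetical; in a hook package the value would name your own module and main function.
```python
from importlib.metadata import EntryPoint
# Same shape as a [project.scripts] line such as: my-command = "json:dumps"
ep = EntryPoint(name="my-command", value="json:dumps", group="console_scripts")
func = ep.load()  # imports the module part, then looks up the attribute after the colon
print(ep.module, ep.attr)
```
When a package is installed with pip, pipx, or uv tool install, the installer writes a small launcher on your PATH that does essentially this lookup and then calls the function.
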
00:55:17.500 --> 00:55:17.660
Yeah.

00:55:17.660 --> 00:55:18.140
That's really nice.

00:55:18.140 --> 00:55:22.440
And then of course that function in there is built and backed with argparse.

00:55:22.440 --> 00:55:24.640
So it all kind of comes full circle right there.

00:55:24.640 --> 00:55:24.800
Yeah.

00:55:24.800 --> 00:55:25.240
Yeah.

00:55:25.240 --> 00:55:29.840
So it's like you, it's almost like you had created, you know, some command line utility,

00:55:29.840 --> 00:55:31.380
like bash wise or something.

00:55:31.380 --> 00:55:35.520
And you just have that available, and it hooks into your CLI.

00:55:35.520 --> 00:55:39.920
I also want to call out line 21, because we talked about dependencies, right?

00:55:39.920 --> 00:55:44.640
So anything you put in there will automatically get grabbed when pre-commit installs.

00:55:44.640 --> 00:55:46.040
So in this case, there's nothing.

00:55:46.040 --> 00:55:50.340
And then in the case of the EXIF stripper I mentioned, we need to install Pillow, right?

00:55:50.340 --> 00:55:55.020
So this is how you can configure how pre-commit will grab everything.

00:55:55.020 --> 00:55:58.560
And I also see it has, yeah, I see there's a requires Python version.

00:55:58.560 --> 00:56:01.380
Does pre-commit help you get Python in any way?

00:56:01.380 --> 00:56:03.560
Or does it just assume that there's a...

00:56:03.560 --> 00:56:06.840
You need to have whatever languages you're relying on, you do need to have them installed

00:56:06.840 --> 00:56:07.140
already.

00:56:07.140 --> 00:56:07.700
Okay.

00:56:07.700 --> 00:56:12.560
So in order for you to use this pre-commit hook on your machine, you'd have to have, for

00:56:12.560 --> 00:56:16.960
example, Python 3.10, 3.11, 3.12, something like that installed, given that it says 3.10

00:56:16.960 --> 00:56:17.300
or greater.

00:56:17.300 --> 00:56:21.860
So for example, like if you saw some hook that sounded interesting, but it's written in Go

00:56:21.860 --> 00:56:24.760
and you don't have Go on your computer, you have to figure that out first.

00:56:24.760 --> 00:56:25.580
That's a no-go.

00:56:25.580 --> 00:56:28.020
It's a no-go.

00:56:28.120 --> 00:56:28.320
All right.

00:56:28.320 --> 00:56:28.660
Let's see.

00:56:28.660 --> 00:56:30.540
Yeah.

00:56:30.540 --> 00:56:36.000
And then the last thing to do, you say, is create the .pre-commit-hooks.yaml file.

00:56:36.000 --> 00:56:39.820
And is this the thing that goes into your repo?

00:56:39.820 --> 00:56:42.400
So when pre-commit sees it, it knows what to do?

00:56:42.400 --> 00:56:42.800
Yeah.

00:56:42.800 --> 00:56:47.400
So for example, in the exif-stripper repo, this file exists.

00:56:47.400 --> 00:56:51.160
So if someone uses exif-stripper, they point to that repository.

00:56:51.160 --> 00:56:54.380
And then when pre-commit goes and grabs it, it looks for this file, right?

00:56:54.380 --> 00:56:58.960
And then the key things here: for one, language.

00:56:58.960 --> 00:57:02.880
So language tells pre-commit how to install it.

00:57:02.880 --> 00:57:04.580
So in this case, it says, oh, this is Python.

00:57:04.580 --> 00:57:06.020
So then it knows, okay, pip.

00:57:06.020 --> 00:57:12.160
The ID at the top, that's the name that you reference in the pre-commit config.

00:57:12.160 --> 00:57:17.340
Like when you want to, like we saw check-toml and check-yaml in the beginning, those correspond

00:57:17.340 --> 00:57:23.140
to entries in the .pre-commit-hooks.yaml of the repository they were being referenced

00:57:23.140 --> 00:57:23.440
from.

00:57:23.440 --> 00:57:28.540
So pre-commit first finds this file so it can install the hook, then it can see, oh, which

00:57:28.540 --> 00:57:29.420
hook do you want?

00:57:29.420 --> 00:57:30.880
Validate file name in this case.

00:57:30.880 --> 00:57:32.820
And then how do I call this?

00:57:32.820 --> 00:57:33.640
And that's entry.

00:57:33.640 --> 00:57:38.480
And this is pointing to the entry point that we made, but it can be anything, right?

00:57:38.520 --> 00:57:43.180
You could call Ruff and then add, you know, 20 different command line flags if you want.

00:57:43.180 --> 00:57:44.440
And that can be your hook.

00:57:44.440 --> 00:57:46.140
And that would be fine as well.

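NOTE
Roughly, pre-commit turns a hook into a command by splitting entry the way a shell would, appending any args from the user's config, and then appending the matching file names (batched across several invocations when there are many). This sketch models only that assembly step; the function name is hypothetical and the Ruff flags are just an example.
```python
import shlex
from typing import List, Sequence
def build_hook_command(entry: str, args: Sequence[str], filenames: Sequence[str]) -> List[str]:
    """Model how a hook's entry, extra args, and matched files become one command line."""
    return [*shlex.split(entry), *args, *filenames]
cmd = build_hook_command("ruff check --fix", ["--select", "I"], ["app.py", "tests/test_app.py"])
print(cmd)  # ['ruff', 'check', '--fix', '--select', 'I', 'app.py', 'tests/test_app.py']
```
This is also why an entry-point CLI only needs to accept a list of file names: pre-commit supplies them at the end of the command.
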
00:57:46.140 --> 00:57:51.500
And what's very interesting here, though it's optional, is the types one at the bottom.

00:57:51.500 --> 00:57:55.240
So I talked before about the EXIF stripper only running on images, right?

00:57:55.240 --> 00:57:58.620
It'd be wasteful to have it look at toml and markdown, right?

00:57:58.620 --> 00:57:59.840
If it's not going to do anything with it.

00:57:59.840 --> 00:58:01.720
Can't find any EXIF information in the TOML.

00:58:01.720 --> 00:58:02.120
Yeah.

00:58:02.120 --> 00:58:04.380
So this controls that.

00:58:04.500 --> 00:58:08.680
So for example, this hook will only run on Python files.

00:58:08.680 --> 00:58:14.140
And this logic, I'm blanking on the name of the tool that pre-commit uses to figure this

00:58:14.140 --> 00:58:14.320
out.

00:58:14.320 --> 00:58:15.320
But this is handled elsewhere.

00:58:15.320 --> 00:58:17.260
So there's like certain names that you can use.

00:58:17.260 --> 00:58:18.180
Right.

00:58:18.180 --> 00:58:23.880
Some sort of category mapping over to these file extensions, or these BOMs at the beginning

00:58:23.880 --> 00:58:26.020
of the file or whatever mean that it's this thing.

00:58:26.020 --> 00:58:26.640
Exactly.

00:58:26.640 --> 00:58:31.100
There is a very dangerous thing with this, and that's that types is an AND.

00:58:31.540 --> 00:58:35.840
So if you say, if you wanted to do like this should run on Python and markdown, you can't

00:58:35.840 --> 00:58:41.360
use this because it will look for files that are both Python and markdown and will not end

00:58:41.360 --> 00:58:41.600
well.

00:58:41.600 --> 00:58:43.580
Not too many of those exist.

00:58:43.580 --> 00:58:43.800
Yeah.

00:58:43.800 --> 00:58:46.680
There's a separate types_or that you have to use.

00:58:46.680 --> 00:58:47.940
That's like a little gotcha.

00:58:47.940 --> 00:58:51.900
It's like an ORM sort of instead of a SQL statement.

00:58:51.900 --> 00:58:52.820
Kind of you got to.

00:58:52.820 --> 00:58:53.640
Yeah.

00:58:53.640 --> 00:58:54.480
Those things always get weird.

00:58:54.480 --> 00:58:56.160
Like import the or operator.

00:58:56.160 --> 00:58:57.020
Like, okay.

00:58:57.020 --> 00:58:58.780
Yeah.

00:58:58.920 --> 00:58:59.100
Cool.

00:58:59.100 --> 00:58:59.340
Okay.

00:58:59.340 --> 00:59:03.940
That is actually very good to know, because it looks like a list of options.

00:59:03.940 --> 00:59:05.060
It is.

00:59:05.060 --> 00:59:05.240
Yeah.

00:59:05.240 --> 00:59:05.900
But they combine.

00:59:05.900 --> 00:59:08.680
So you might have something like it is a file and it's Python.

00:59:08.680 --> 00:59:10.320
That might be one thing I've seen.

00:59:10.320 --> 00:59:10.740
Right.

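To make the AND-versus-OR gotcha concrete, here is a small Python model of the filtering. The extension-to-tag table is a hypothetical, heavily simplified stand-in for the identify library that pre-commit relies on to tag files. The matching rules follow pre-commit's documented behavior as I understand it: every tag listed under types must be present, at least one tag listed under types_or must be present, and types defaults to file.

```python
from pathlib import Path
from typing import Iterable, Set

# Hypothetical, heavily simplified tag table; the real identify library
# knows many more types and also inspects shebangs and file permissions.
EXTENSION_TAGS = {
    ".py": {"text", "python"},
    ".md": {"text", "markdown"},
    ".toml": {"text", "toml"},
    ".png": {"binary", "image", "png"},
}


def tags_for(path: str) -> Set[str]:
    """Return the (simplified) set of type tags for a regular file."""
    return {"file"} | EXTENSION_TAGS.get(Path(path).suffix, set())


def hook_runs_on(
    path: str,
    types: Iterable[str] = ("file",),
    types_or: Iterable[str] = (),
) -> bool:
    tags = tags_for(path)
    # types: AND, every listed tag must be present.
    if not set(types) <= tags:
        return False
    # types_or: OR, at least one listed tag must be present (empty = no filter).
    wanted_any = set(types_or)
    return not wanted_any or bool(wanted_any & tags)


files = ["app.py", "README.md", "pyproject.toml", "photo.png"]
print([f for f in files if hook_runs_on(f, types=["python", "markdown"])])
print([f for f in files if hook_runs_on(f, types_or=["python", "markdown"])])
```

The first list comes out empty, because no file is tagged as both python and markdown; the second contains app.py and README.md.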
00:59:10.740 --> 00:59:11.800
Okay.

00:59:11.800 --> 00:59:12.720
Yeah.

00:59:12.720 --> 00:59:12.920
Cool.

00:59:12.920 --> 00:59:16.940
So if I wanted to have more than one hook, I could put it into one.

00:59:16.940 --> 00:59:18.300
I could have more than one here.

00:59:18.300 --> 00:59:18.860
Is that possible?

00:59:18.860 --> 00:59:19.280
Yeah.

00:59:19.280 --> 00:59:20.480
So this looks like a list.

00:59:20.480 --> 00:59:21.100
Yeah, exactly.

00:59:21.100 --> 00:59:22.920
It's structured as a YAML list.

00:59:22.920 --> 00:59:27.460
So you could just copy that block, paste it as a new one, and then just change whatever

00:59:27.460 --> 00:59:28.420
fields you want.

00:59:28.420 --> 00:59:31.000
And then that's now the second hook that you expose.

00:59:31.000 --> 00:59:31.860
Right.

00:59:31.860 --> 00:59:36.280
And working backwards, I suppose you just expose a different entry point potentially and then

00:59:36.280 --> 00:59:38.000
just call it out or whatever you want.

00:59:38.000 --> 00:59:41.440
Well, I mean, you could like maybe you have a validate file name and maybe you have another

00:59:41.440 --> 00:59:44.820
one that's like validate long file names or something where you're like, now they have

00:59:44.820 --> 00:59:45.580
to be this long.

00:59:45.580 --> 00:59:47.720
And then it's just a shortcut for something else.

00:59:47.720 --> 00:59:49.100
So it doesn't have to be a different thing.

00:59:49.380 --> 00:59:49.860
Oh, yeah.

00:59:49.860 --> 00:59:52.860
You just put an argument in there as a default kind of for people.

00:59:52.860 --> 00:59:57.000
So we talked about args earlier and that was something the user could tweak.

00:59:57.000 --> 01:00:01.220
Anything you put in here is essentially like it will always run with these.

01:00:01.220 --> 01:00:04.220
So you could bake in certain things that have to happen.

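Here is a simplified model of how those pieces might be assembled into one command. It assumes pre-commit's behavior as I understand it: the entry string is split shell-style, the hook's args follow, and the matching file names come last. Args set in a user's .pre-commit-config.yaml replace the hook's default args rather than being appended. That is why flags baked into entry are the ones that always apply. The validate-filename command and its --min-len and --quiet flags are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def build_hook_command(
    entry: str,
    manifest_args: Sequence[str],
    user_args: Optional[Sequence[str]],
    filenames: Sequence[str],
) -> List[str]:
    """Simplified model of how a hook's command line is put together."""
    # Flags written directly into entry are always part of the command.
    cmd = shlex.split(entry)
    # args from the user's config, when given, replace the hook's default args.
    cmd += list(user_args if user_args is not None else manifest_args)
    # Finally, the matching file names are appended.
    cmd += list(filenames)
    return cmd


print(build_hook_command("validate-filename --min-len 5", ["--quiet"], None, ["a.py"]))
print(build_hook_command("validate-filename --min-len 5", ["--quiet"], [], ["a.py"]))
```

In the second call the user overrode args with an empty list, so --quiet disappears, but --min-len 5 from entry survives.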
01:00:04.220 --> 01:00:04.700
Yeah.

01:00:04.700 --> 01:00:05.160
Awesome.

01:00:05.160 --> 01:00:06.140
I love it.

01:00:06.140 --> 01:00:06.380
Okay.

01:00:06.380 --> 01:00:12.220
We're pretty much out of time, but let's talk about one final thing.

01:00:12.220 --> 01:00:13.300
Not this one.

01:00:13.300 --> 01:00:15.300
Your Data Morph project.

01:00:15.300 --> 01:00:19.060
Give a quick shout out to that before we wrap things up.

01:00:19.100 --> 01:00:19.420
What do you think?

01:00:19.420 --> 01:00:19.960
Sure.

01:00:19.960 --> 01:00:26.460
So this project started related to the pandas workshop I had mentioned.

01:00:26.460 --> 01:00:31.980
I wanted to have a visual to really drive home the point that we needed to visualize our

01:00:31.980 --> 01:00:35.700
data, because pandas is very much about data wrangling.

01:00:35.700 --> 01:00:40.320
And after talking to people for two hours about data wrangling and the statistics you can calculate

01:00:40.320 --> 01:00:41.200
on tabular data.

01:00:41.200 --> 01:00:44.060
Some people just feel like, oh, okay, we're done.

01:00:44.060 --> 01:00:45.620
I mean, you know, we're done.

01:00:45.620 --> 01:00:47.600
And that's definitely not the case.

01:00:47.600 --> 01:00:52.640
And I was thinking about, and you had it on the screen before, the Datasaurus Dozen.

01:00:52.640 --> 01:00:53.320
So yeah.

01:00:53.320 --> 01:01:01.320
So there was research in 2017 by Autodesk where they took the idea of Anscombe's Quartet, which

01:01:01.320 --> 01:01:07.960
is, sorry, just a little bit above that, which is just a set of four, yeah, four data sets.

01:01:08.360 --> 01:01:10.560
They share the same summary statistics.

01:01:10.560 --> 01:01:16.000
So the mean in X and Y, the standard deviation in X and Y, and the Pearson correlation coefficient.

01:01:16.000 --> 01:01:17.720
And they look very different.

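Anscombe's Quartet is small enough to check directly. This sketch uses the standard published values of the four data sets. It computes the statistics just mentioned (mean and sample standard deviation of x and y, plus Pearson's correlation coefficient), rounded to two decimals.

```python
import math
from statistics import mean, stdev

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summary(xs, ys):
    """Mean and sample std. dev. of x and y, plus Pearson's r, to 2 decimals."""
    stats = (mean(xs), mean(ys), stdev(xs), stdev(ys), pearson(xs, ys))
    return tuple(round(s, 2) for s in stats)


for name, (xs, ys) in QUARTET.items():
    print(name, summary(xs, ys))
```

All four rows print the same rounded statistics, even though a scatter plot of each set looks completely different.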
01:01:17.720 --> 01:01:24.680
And if you think of, naively, you think, well, I know the average and maybe how spread out

01:01:24.680 --> 01:01:25.140
things are.

01:01:25.300 --> 01:01:28.760
So I can kind of get a sense of what this data probably means.

01:01:28.760 --> 01:01:33.820
But in reality, outliers and other weird things could just completely blow up those ideas,

01:01:33.820 --> 01:01:34.100
right?

01:01:34.100 --> 01:01:34.520
Yeah.

01:01:34.520 --> 01:01:40.820
And so in 2017, they had developed this algorithm using simulated annealing.

01:01:40.820 --> 01:01:47.100
So if you scroll down once more, where they take the dinosaur at the top and they use

01:01:47.100 --> 01:01:48.480
simulated annealing to push the points.

01:01:48.480 --> 01:01:50.600
Let me describe this really quick for just people listening.

01:01:50.600 --> 01:01:56.920
So there's a matplotlib looking graph of some data points, and it has a certain standard

01:01:56.920 --> 01:01:58.680
deviation, certain mean, et cetera.

01:01:58.680 --> 01:02:02.680
But if you actually look at it, it looks like a T-Rex, right?

01:02:02.680 --> 01:02:03.320
Something like this?

01:02:03.320 --> 01:02:03.760
Yes.

01:02:03.760 --> 01:02:05.820
Is that a decent enough description?

01:02:05.820 --> 01:02:07.000
That's a perfect description.

01:02:07.000 --> 01:02:07.540
Yeah.

01:02:07.540 --> 01:02:12.860
So what the researchers have done is they use this simulated annealing algorithm to push

01:02:12.860 --> 01:02:13.960
the points around.

01:02:13.960 --> 01:02:18.420
So starting from that dinosaur and just moving the points ever so slightly in such a way where

01:02:18.420 --> 01:02:23.440
the summary statistics are unchanged, at least to the two decimal places where they're currently

01:02:23.440 --> 01:02:26.080
shown, and tried to make other shapes.

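Here is a rough sketch of the idea just described, assuming a circle as the target shape. It is not the researchers' code or Data Morph's implementation. The temperature schedule, step size, and circle radius are hypothetical choices. Each iteration nudges one random point and considers the move if it brings the point closer to the circle. With a probability that shrinks as the "temperature" cools, it also considers moves that don't. A move is kept only if the summary statistics are unchanged at two decimal places.

```python
import math
import random


def summary(xs, ys, decimals=2):
    """Means, sample standard deviations, and Pearson's r, rounded."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    stats = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
             sxy / math.sqrt(sxx * syy))
    return tuple(round(s, decimals) for s in stats)


def distance_to_circle(x, y, cx, cy, radius):
    """How far a point is from the circle's outline."""
    return abs(math.hypot(x - cx, y - cy) - radius)


def morph_toward_circle(xs, ys, iterations=20_000, step=0.1, seed=0):
    """Nudge points toward a circle while keeping the rounded statistics fixed."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)  # work on copies
    target = summary(xs, ys)
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    radius = (target[2] + target[3]) / 2  # hypothetical choice of circle size

    for i in range(iterations):
        # Linear cooling: worse moves are allowed less often as i grows.
        temperature = 0.4 * (1 - i / iterations) + 0.01
        k = rng.randrange(len(xs))
        new_x = xs[k] + rng.gauss(0, step)
        new_y = ys[k] + rng.gauss(0, step)
        closer = (distance_to_circle(new_x, new_y, cx, cy, radius)
                  < distance_to_circle(xs[k], ys[k], cx, cy, radius))
        if not (closer or rng.random() < temperature):
            continue
        old_x, old_y = xs[k], ys[k]
        xs[k], ys[k] = new_x, new_y
        if summary(xs, ys) != target:
            xs[k], ys[k] = old_x, old_y  # reject: a statistic changed
    return xs, ys
```

Every accepted move is checked against the original rounded statistics, so the output always matches them. How convincing the resulting circle looks depends on the number of iterations and the step size.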
01:02:26.080 --> 01:02:32.240
So some of the other shapes they have are a bullseye, a circle, slanted or vertical lines,

01:02:32.240 --> 01:02:33.480
or a star.

01:02:33.480 --> 01:02:38.940
And all of these can be formed from that dinosaur, some to varying degrees of success.

01:02:38.940 --> 01:02:45.100
But they're visually recognizable, which is the point that is pretty important here, right?

01:02:45.100 --> 01:02:48.960
So you cannot, as we said, rely on those summary statistics because you don't know.

01:02:48.960 --> 01:02:49.600
Is it the star?

01:02:49.600 --> 01:02:50.320
Is it the dinosaur?

01:02:50.320 --> 01:02:51.300
Is it a line?

01:02:51.300 --> 01:02:52.240
It could be anything.

01:02:52.240 --> 01:02:56.760
And they also had animation that they included.

01:02:56.760 --> 01:03:00.660
So basically, you could start from the dinosaur and then turn it into a circle.

01:03:00.660 --> 01:03:06.300
And that's even more impactful because you realize at that point that it's not just the

01:03:06.300 --> 01:03:10.280
dinosaur and the circle that have something in common, but it's the infinite number of

01:03:10.280 --> 01:03:14.020
point arrangements that you can make between them that actually share that.

01:03:14.020 --> 01:03:20.900
And so I wanted to explore if I could extend that to working for arbitrary data sets and also

01:03:20.900 --> 01:03:21.680
different shapes.

01:03:21.680 --> 01:03:27.140
So I found the research code and spent quite a bit of time hacking at it, even just trying to

01:03:27.140 --> 01:03:29.800
get it to work for their example.

01:03:29.800 --> 01:03:30.980
And that took quite a bit of time.

01:03:30.980 --> 01:03:35.860
And then I had this idea, since it was for a pandas workshop, to take a panda and

01:03:35.860 --> 01:03:36.300
turn it.

01:03:36.300 --> 01:03:38.160
Initially, I wanted to turn it into the dinosaur.

01:03:38.160 --> 01:03:44.100
I still have not found a good way to do that yet, but I also haven't been trying at all this

01:03:44.100 --> 01:03:45.520
year on that, to be honest.

01:03:45.520 --> 01:03:51.700
But by adding a lot of other things that didn't exist in the initial

01:03:51.700 --> 01:03:56.540
algorithm, things like calculating bounds of the data and different metrics, I figured

01:03:56.540 --> 01:03:59.180
out a way to get it to work regardless.

01:03:59.180 --> 01:04:04.700
So I can give it a panda data set or a soccer ball and it can perform these transformations

01:04:04.700 --> 01:04:06.300
and move the points around.

01:04:06.300 --> 01:04:11.600
So on the screen, we have the first time I shared this publicly, what I had been working on,

01:04:11.600 --> 01:04:12.780
it happened to be Easter.

01:04:12.780 --> 01:04:17.080
So I made a bunny holding an Easter egg with the words "Happy Easter" off to the side.

01:04:17.080 --> 01:04:22.460
And it turns into two vertical lines all while preserving the summary statistics.

01:04:22.880 --> 01:04:28.980
I think this makes it a very good teaching tool in, say, an introductory

01:04:28.980 --> 01:04:32.820
statistics course to encourage people that they need to visualize.

01:04:32.820 --> 01:04:38.600
There's an interesting study, I think called "A hypothesis is a liability."

01:04:38.600 --> 01:04:44.960
And they talked about taking students in a statistical analysis course and they split them into two.

01:04:44.960 --> 01:04:49.800
And one set of students were just given the data set and told, here, explore, see what you find.

01:04:49.800 --> 01:04:53.360
And then the other set were given a set of hypotheses to test.

01:04:53.360 --> 01:04:56.320
And it turns out that the data is shaped like a gorilla.

01:04:56.320 --> 01:05:02.820
And the students who were told here, test these hypotheses were five times less likely to even

01:05:02.820 --> 01:05:05.700
realize that it was shaped like a gorilla because they never plotted it.

01:05:05.700 --> 01:05:06.040
Yeah.

01:05:06.040 --> 01:05:10.760
This is such a huge thing to like get people learning this early.

01:05:10.760 --> 01:05:14.120
And the more shocking these visuals are, the better.

01:05:14.800 --> 01:05:14.920
Yeah.

01:05:14.920 --> 01:05:17.320
And I think these are super shocking, right?

01:05:17.320 --> 01:05:22.300
Having T-Rexes and bunnies and go, you know, that bunny is, you know, equivalent.

01:05:22.300 --> 01:05:28.040
And there's a continuous transformation from bunny to blob of dots with one outside dot, right?

01:05:28.040 --> 01:05:30.440
That kind of stuff surprises you, I think.

01:05:30.440 --> 01:05:37.000
And one thing I see, especially when the dinosaur came out, but even when I posted some of my first

01:05:37.000 --> 01:05:41.680
examples, is that people comment right away, "Wow, it's so cool that it's possible to do that

01:03:41.680 --> 01:03:44.120
with that dinosaur."

01:05:44.120 --> 01:05:44.780
Like, no, no, no.

01:05:44.780 --> 01:05:47.260
It's not, it's not just the dinosaur or just the panda.

01:05:47.260 --> 01:05:48.240
It's really like anything.

01:05:48.240 --> 01:05:53.260
And so the way this also works is that people can use their own data sets or they can add

01:05:53.260 --> 01:05:53.820
something new.

01:05:53.920 --> 01:05:59.420
And that's what I've done this year in the two previous development

01:05:59.420 --> 01:06:07.060
sprints. I did one at EuroPython and one at PyCon Taiwan earlier

01:06:07.060 --> 01:06:07.540
this year.

01:06:07.540 --> 01:06:11.100
And hopefully in Australia, we'll do some more.

01:06:11.100 --> 01:06:14.880
But I had people add, for example, a target shape.

01:06:15.000 --> 01:06:21.040
So what the, for example, the panda would turn into, we have a club, like the card suit,

01:06:21.040 --> 01:06:24.080
which was quite a challenge, and the spade.

01:06:24.080 --> 01:06:25.640
And I had already had the heart.

01:06:25.640 --> 01:06:30.420
The heart is actually a trigonometric equation, which, you know, blew my mind at first.

01:06:30.420 --> 01:06:35.900
There's actually a page I found on, I think, Wolfram Alpha, which was like, I want to say

01:06:35.900 --> 01:06:40.520
like 10 or 15 different equations, trigonometric equations for different types of hearts.

01:06:40.520 --> 01:06:42.980
And you can pick the exact type of heart you wanted.

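To illustrate a heart as a trigonometric equation, here is one well-known parametric heart curve sampled into points. It shows the general idea only. It is not necessarily one of the equations from the page mentioned, or the one Data Morph uses.

```python
import math


def heart_points(n=200):
    """Sample n points on a classic parametric heart curve:
    x = 16 sin^3(t),  y = 13 cos(t) - 5 cos(2t) - 2 cos(3t) - cos(4t)."""
    points = []
    for i in range(n):
        t = 2 * math.pi * i / n
        x = 16 * math.sin(t) ** 3
        y = (13 * math.cos(t) - 5 * math.cos(2 * t)
             - 2 * math.cos(3 * t) - math.cos(4 * t))
        points.append((x, y))
    return points
```

Changing the coefficients stretches or rounds the lobes, which is the kind of variation between "longer" and "more curved" hearts described here.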
01:06:43.760 --> 01:06:45.840
Social media heart, the emoji heart, what are we talking about?

01:06:45.840 --> 01:06:48.360
No, no, it was just like, this is longer, this is more curved.

01:06:48.360 --> 01:06:49.780
Yeah, yeah, yeah, that's awesome.

01:06:49.780 --> 01:06:53.200
But these are all now math problems when you think about that side of it.

01:06:53.200 --> 01:06:57.980
So this could then be used maybe in a course where they want to focus on math, but also

01:06:57.980 --> 01:06:58.700
some more coding.

01:06:58.700 --> 01:07:02.340
So there's lots of different use cases, like just giving it the data.

01:07:02.340 --> 01:07:04.800
And that's very much more just pure statistics.

01:07:04.800 --> 01:07:09.360
But, you know, I've heard from a few teachers, based on what I presented,

01:07:09.360 --> 01:07:13.180
that it sounds like something they would like to use.

01:07:13.260 --> 01:07:14.480
So hopefully that does happen.

01:07:14.480 --> 01:07:16.840
If not, it's a fun thing to put in my slides.

01:07:16.840 --> 01:07:18.700
And I did enjoy getting it to work.

01:07:18.700 --> 01:07:23.780
Yeah, I didn't pull up any good videos for the YouTube video, but there's some really nice

01:07:23.780 --> 01:07:27.140
animations of actually seeing it go from one to the other that you got.

01:07:27.140 --> 01:07:32.480
And this is, you're doing a talk at PyCon Australia, and then you're doing a sprint on

01:07:32.480 --> 01:07:33.620
this as well, right?

01:07:33.620 --> 01:07:36.440
Coming up on November 22nd, about a month from now.

01:07:36.440 --> 01:07:36.820
Correct.

01:07:37.040 --> 01:07:37.440
So cool.

01:07:37.440 --> 01:07:41.900
People can check that out if they happen to be at PyCon Australia and want to...

01:07:41.900 --> 01:07:45.580
Well, I'll also be talking about it in San Francisco next week.

01:07:45.580 --> 01:07:48.240
There won't be a sprint, but I will be talking about that.

01:07:48.240 --> 01:07:48.900
So people can...

01:07:48.900 --> 01:07:49.080
Okay.

01:07:49.080 --> 01:07:49.920
It's not a PyCon.

01:07:49.920 --> 01:07:50.720
Sure.

01:07:50.720 --> 01:07:51.520
It's still cool.

01:07:51.520 --> 01:07:52.440
All right.

01:07:52.520 --> 01:07:54.320
Well, Stefanie, thank you so much for being here.

01:07:54.320 --> 01:07:56.260
Let's wrap things up.

01:07:56.260 --> 01:08:00.920
But I guess, you know, give us a final call to action for people maybe interested in pre-commit

01:08:00.920 --> 01:08:02.180
hooks or other stuff that you're doing.

01:08:02.180 --> 01:08:06.520
Yeah, you can find everything that we mentioned here and the projects on my website.

01:08:06.520 --> 01:08:12.440
I'm putting much more effort into putting stuff on there this year now that I've rebuilt it.

01:08:12.440 --> 01:08:15.680
So definitely check there and sign up for my newsletter.

01:08:15.680 --> 01:08:16.980
Follow me on socials.

01:08:17.000 --> 01:08:19.180
There's no links down here, but you can find them.

01:08:19.180 --> 01:08:21.540
There'll be links on the episode page.

01:08:21.540 --> 01:08:22.780
So we'll put them there.

01:08:22.780 --> 01:08:23.280
All right.

01:08:23.280 --> 01:08:24.020
Well, thanks.

01:08:24.020 --> 01:08:24.900
Thanks for being here.

01:08:24.900 --> 01:08:25.760
It's great to talk to you.

01:08:25.760 --> 01:08:26.580
Thanks for coming on and sharing.

01:08:26.580 --> 01:08:27.540
Thanks for having me.

01:08:27.540 --> 01:08:27.920
Yeah.

01:08:27.920 --> 01:08:28.220
Bye-bye.

01:08:28.220 --> 01:08:32.400
This has been another episode of Talk Python To Me.

01:08:32.400 --> 01:08:34.220
Thank you to our sponsors.

01:08:34.220 --> 01:08:35.820
Be sure to check out what they're offering.

01:08:35.820 --> 01:08:37.240
It really helps support the show.

01:08:37.240 --> 01:08:39.380
Take some stress out of your life.

01:08:39.380 --> 01:08:45.180
Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.

01:08:45.660 --> 01:08:50.160
Just visit talkpython.fm/sentry and get started for free.

01:08:50.160 --> 01:08:53.740
And be sure to use the promo code talkpython, all one word.

01:08:53.740 --> 01:08:56.360
This episode is brought to you by Bluehost.

01:08:56.360 --> 01:08:58.080
Do you need a website fast?

01:08:58.080 --> 01:08:58.980
Get Bluehost.

01:08:58.980 --> 01:09:04.340
Their AI builds your WordPress site in minutes and their built-in tools optimize your growth.

01:09:04.340 --> 01:09:05.300
Don't wait.

01:09:05.300 --> 01:09:08.900
Visit talkpython.fm/bluehost to get started.

01:09:08.900 --> 01:09:10.400
Want to level up your Python?

01:09:10.400 --> 01:09:14.440
We have one of the largest catalogs of Python video courses over at Talk Python.

01:09:14.440 --> 01:09:19.620
Our content ranges from true beginners to deeply advanced topics like memory and async.

01:09:19.620 --> 01:09:22.300
And best of all, there's not a subscription in sight.

01:09:22.300 --> 01:09:25.200
Check it out for yourself at training.talkpython.fm.

01:09:25.200 --> 01:09:27.300
Be sure to subscribe to the show.

01:09:27.300 --> 01:09:30.080
Open your favorite podcast app and search for Python.

01:09:30.080 --> 01:09:31.400
We should be right at the top.

01:09:31.400 --> 01:09:36.560
You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:09:36.720 --> 01:09:40.740
and the direct RSS feed at /rss on talkpython.fm.

01:09:40.740 --> 01:09:43.720
We're live streaming most of our recordings these days.

01:09:43.720 --> 01:09:47.120
If you want to be part of the show and have your comments featured on the air,

01:09:47.120 --> 01:09:51.560
be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:09:51.560 --> 01:09:53.600
This is your host, Michael Kennedy.

01:09:53.600 --> 01:09:54.900
Thanks so much for listening.

01:09:55.040 --> 01:09:56.060
I really appreciate it.

01:09:56.060 --> 01:09:57.980
Now get out there and write some Python code.

01:09:57.980 --> 01:10:19.160
I'll see you next time.

