Plotting the trail for Django Cairn
This high-level post will go over how I design Django projects. The goal is to consider every action the application will need and determine what is needed to implement those actions. We’ll encounter some hand-waving around the details, but that’s fine. We can’t know everything. And if we did, why are we bothering with this?
The benefit of this exercise is a more complete understanding of the project and an outline of what needs to be implemented. An outline makes it much easier to split the work into chunks and determine the dependencies and priorities. The work can then be delegated to others to maximize project efficiency. Even if you’re working alone, it’s nice to know how far you need to go. Personally, if I chug along on a project without knowing where it ends, I’m likely to become demoralized and quit. Plus, I simply find this exercise fun.
My general approach is to ask myself, “How do I want this to work?” The answer is usually a list of features or requirements. Next, I go through that list and ask the question again for each item: “How do I want this to work?” I continue this process until I’m satisfied with the level of detail.
Step 1: Define the purpose of the project
The project I’ll be working on is Django Cairn (read more about what it is here). The purpose is to create an index of Django knowledge around the community and to guide folks to particularly useful resources.
Step 2: How do I want Django Cairn to work
Now that I have the two purposes for the project, I can move on to defining how the project should work to accomplish them. First, the project needs to collect and store knowledge from sources. This achieves the purpose of being an index of knowledge. The second purpose is a little more subtle. My current interpretation of “being a guide for community resources” is that there will need to be curation and reviews of content.
This all means I want Django Cairn to do the following:
- Retain a list of content sources
- Support adding new content sources
- Fetch new content from known sources
- Refetch content periodically to check staleness
- Support creating reviews of content
- Display reviews of content
- Display general content
Step 3: Skeleton data model
Some of the previous requirements are straightforward. Some hand-wave the details. I like to define the interfaces of the project first, then move directly to the data model.
When I use the term interface here, it doesn’t necessarily mean the user interface. Instead, it means the set of interactions between the system and its users. Step 2 outlined the interface as a series of statements about what the system should do and how users will interact with it.
As more of the interface gets defined, the data model will require updates. At this phase of the project changing the data model has a low cost. However, as the project gets closer to completion the cost to change the data model grows significantly. Django Cairn is almost certainly missing definitions on how it should work. This process should help me discover them.
So far, here’s what I have for model definitions to accomplish the goals as I currently understand them:
Content
- title
- description
- posted
- source
- url
Source
- url
- title
ContentReview
- content
- user
- published
- publish date
- created
- updated
- review
- must_read
- reader level (beginner, intermediate, expert, all)
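As a rough, framework-agnostic sketch, the skeleton above could be written down as plain Python dataclasses. The field types, the default values, and the use of a username string for the user are assumptions made for illustration; in the real project these would be Django models with foreign keys between them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class ReaderLevel(Enum):
    BEGINNER = "beginner"
    INTERMEDIATE = "intermediate"
    EXPERT = "expert"
    ALL = "all"


@dataclass
class Source:
    url: str
    title: str


@dataclass
class Content:
    title: str
    description: str
    posted: datetime
    source: Source  # would be a foreign key in Django
    url: str


@dataclass
class ContentReview:
    content: Content  # would be a foreign key in Django
    user: str  # assumption: a username stands in for a user record
    review: str
    published: bool = False
    publish_date: Optional[datetime] = None
    must_read: bool = False
    reader_level: ReaderLevel = ReaderLevel.ALL
    created: datetime = field(default_factory=datetime.now)
    updated: datetime = field(default_factory=datetime.now)
```

Writing the fields out with types tends to surface questions early, such as whether an unpublished review needs a publish date at all.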
Step 4: How does the data model get populated
With the basic data model established, I can focus on the next phase. Namely, how to get data into the data model.
Going back to the list of desired actions, there are four related to content creation:
- Support adding new content sources
- Fetch new content from known sources
- Refetch content periodically to check staleness
- Support creating reviews of content
Let’s consider them individually.
Support adding new content sources
This sounds fairly straightforward. There needs to be a way to add new content sources. However, I never clarified how exactly those content sources should be added. Will the application scour the web searching for any blog using Django, or will it be a minimalistic form? If you’re working with others, this is when you need to collaborate with them.
Since I’m building this alone, I get to decide. I want to get something off the ground as quickly as possible, but still support adding known content types in the future. The known content type for me will be any blog that publishes an RSS feed.
I don’t plan on automatically ingesting content from conferences such as DjangoCon US because there’s no guarantee that the data will be the same from one year to the next. And there’s especially no guarantee that DjangoCon US, DjangoCon EU, and Python Web Conference will all use the same data file structure. While I’d love to automatically pull that data in every year, it’s simply too much work. If I find that there are commonalities in the future, I can build it out then.
Hidden in that decision is a change to my data model. I’ve started talking about different types of sources. Now the data model needs to support that difference. The new model will be:
Source
- url
- title
- type
Let’s get back to actually creating a source. Who is performing this action? To start, it will mostly be me. However, the purpose of this application is to be a catalog of Django knowledge and I certainly will never know all the sources. At some level, the application needs to support accepting outside submissions for new sources. Going further along that process, after a source has been requested, I’ll need to review it. Automatically ingesting content from an unknown or anonymous source and then rendering it is an easy way to host content that violates the Django Code of Conduct and my values. I think a sufficient requirement for a new source request is an email address to contact for more information and a text-based reason for the submission.
Now, let’s consider this specific action again: “a new source is requested via a form”.
How does that work? Should it create an entry in a table that sends a notification to be reviewed, or should that table be the Source model or an entirely different model? Seeing as my driving force is speed and functionality, I think adding a few fields to the Source model and relying on email to manage source requests is an effective solution. After all, the email can contain a link to the Django administration page to modify a Source instance’s properties. Yes, I lose the historical record of reasons for why a Source instance should be created, but I don’t envision that data being useful, either now or in the future.
Getting back to our model, I’m going to store the contact. Hey, it might be nice to know who to contact if an issue were to arise in the future!
Source
- url
- title
- type
- contact
- active
We also now have information on how the view should work for requesting a new source.
- Submit new source
  - Fields
    - url
    - title
    - contact
    - reason
  - On submit
    - Create an inactive source from the data. If a Source instance already exists, don’t re-create or change content. Only send an email.
    - Send an email with all submitted fields in the body of the email.
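The submit behavior can be sketched in plain Python. This is a minimal illustration, not the real view: the dictionary keyed by URL, the `outbox` list standing in for sending email, the `submit_source` name, and the default `type` of "rss" are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Source:
    url: str
    title: str
    contact: str
    type: str = "rss"  # assumption: blogs with an RSS feed are the first type
    active: bool = False  # new requests start inactive until reviewed


def submit_source(
    sources: Dict[str, Source],
    outbox: List[str],
    url: str,
    title: str,
    contact: str,
    reason: str,
) -> Source:
    """Create an inactive Source if the URL is new, then always send an email."""
    source = sources.get(url)
    if source is None:
        source = Source(url=url, title=title, contact=contact)
        sources[url] = source
    # An existing source is left untouched; only the notification goes out.
    outbox.append(
        "New source request\n"
        f"url: {url}\ntitle: {title}\ncontact: {contact}\nreason: {reason}"
    )
    return source
```

Matching existing sources by URL is a guess on my part; the plan above only says an existing instance should not be re-created or changed.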
Phew, done with that one!
Fetch new content from known sources.
I’ve decided that blog sources should have content automatically pulled in. This implies a need to periodically run some logic to fetch data. The alternative would be to have the content creators ping the site to have the new post(s) fetched, but that isn’t how search engines conventionally discover content. Plus, they’re focused on creating their own stuff and I’d rather they keep doing that.
As soon as I say the phrase “periodically run”, I know it means background jobs. For now, I can probably get away with cron and management commands.
That said, this approach means I now need to store the last time the source was checked for new content. That can be handled with a new field on the Source model.
Source
- url
- title
- type
- contact
- active
- last_checked
The field last_checked will be used to identify when a check was last performed on the source so the system can skip it until the next period. The following is the logic of the background task that will need to be created to fetch new content.
- Identify any Source instances that have not been checked in the last X hours
- Request posts from the source URL
- Create new Content instances
- Update existing Content instances if needed
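Here is a minimal sketch of that task in plain Python. The six-hour default for X, the `fetch` callable (which stands in for requesting and parsing a feed), the skipping of inactive sources, and the choice to store posts as a URL-to-title mapping are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class Source:
    url: str
    active: bool = True
    last_checked: Optional[datetime] = None
    posts: Dict[str, str] = field(default_factory=dict)  # post URL -> title


def sources_due(sources: List[Source], now: datetime, period: timedelta) -> List[Source]:
    """Active sources that were never checked or not checked within the period."""
    cutoff = now - period
    return [
        s for s in sources
        if s.active and (s.last_checked is None or s.last_checked <= cutoff)
    ]


def fetch_new_content(
    sources: List[Source],
    fetch: Callable[[str], Dict[str, str]],
    now: datetime,
    period: timedelta = timedelta(hours=6),
) -> int:
    """Fetch posts for due sources; return how many were created or updated."""
    changed = 0
    for source in sources_due(sources, now, period):
        for post_url, title in fetch(source.url).items():
            if source.posts.get(post_url) != title:
                source.posts[post_url] = title  # create or update
                changed += 1
        source.last_checked = now
    return changed
```

Passing `fetch` and `now` in as arguments keeps the selection logic testable without touching the network or the clock.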
Refetch content periodically to check staleness.
The main goal here is to not link to material that is no longer accessible. “Inaccessible” will be defined as a non-200 response after three tries over three days. This means the Content model will require additional fields.
Content
- title
- description
- posted
- source
- url
- active
- last_checked
- next_check
- staleness_count
The field active will be used to identify when a Content instance should no longer be displayed on the site. The field last_checked will be used to identify the last time a check was performed on the content so the system can skip it until the next period. However, this will only be useful when the content is valid. If the content isn’t valid, it should be checked sooner. Right now, it makes the most sense to have an override datetime field (next_check) to identify the next time a check should be performed, regardless of last_checked.
The last new field is staleness_count. Frankly, naming is hard for me. This one will probably change in the future. The purpose of this field is to count how many times the content was fetched but failed to return valid content.
The logic of the task should be:
- Identify any active content that has not been checked in the last X days, or active content that has a next_check in the past
- Request content from Content.url
- If content is valid, clear next_check and staleness_count, and update content fields as necessary
- If content is invalid, increment staleness_count and set next_check to a future datetime
- If staleness_count exceeds Y, set active = False
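The staleness logic can be sketched as two small functions in plain Python. The constants are assumptions: a 30-day recheck period for X, a one-day retry delay, and deactivation on the third consecutive failure (my reading of "three tries over three days", equivalent to exceeding Y = 2). Updating last_checked on every check is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RECHECK_PERIOD = timedelta(days=30)  # assumption for "X days"
RETRY_DELAY = timedelta(days=1)  # assumption: retry failed content daily
MAX_TRIES = 3  # assumption: third consecutive failure deactivates


@dataclass
class Content:
    url: str
    active: bool = True
    last_checked: Optional[datetime] = None
    next_check: Optional[datetime] = None
    staleness_count: int = 0


def is_due(content: Content, now: datetime) -> bool:
    """Active content is due if its regular period elapsed or a retry is pending."""
    if not content.active:
        return False
    overdue = (
        content.last_checked is None
        or content.last_checked <= now - RECHECK_PERIOD
    )
    retry_due = content.next_check is not None and content.next_check <= now
    return overdue or retry_due


def record_check(content: Content, status_code: int, now: datetime) -> None:
    """Apply the outcome of one request to the staleness fields."""
    content.last_checked = now
    if status_code == 200:
        content.next_check = None
        content.staleness_count = 0
        return
    content.staleness_count += 1
    content.next_check = now + RETRY_DELAY
    if content.staleness_count >= MAX_TRIES:
        content.active = False
```

A single successful response resets the counter, so only consecutive failures can take content off the site.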
Support creating reviews of content.
A goal for Django Cairn is to provide some guidance to other developers. This will be accomplished by providing commentary on what a reader can expect from a given piece of content. I’d also like to highlight particularly good content and the Djangonaut experience level that the content is meant for.
For now, I plan on being the only reviewer, but in the future this may open up to others. As far as I can tell, the initial draft of the model suffices for the given requirements. Since I will be the only person reviewing content, I’m making the executive decision to use the Django Administration site to manage those reviews. After all, the goal is to deliver functionality, not build the perfect web app.
Step 5: Ask what’s missing
The application now has sources, content, and reviews. It has a way to add new sources, fetch new content, and create reviews for the content. So what’s missing?
How will Djangonauts find content that’s relevant to them!?
Welp, that’s a pretty egregious oversight. Nonetheless, it’s pretty typical for the process of planning a project. My next step is to identify the key actions enabling users to find content that’s relevant to them. I’ve received some good ideas from the general community and I settled on the following:
- Content tags
- Django version tags
- Python version tags
- Full-text search
Okay, almost there… Except not quite. Sorry!
The next part of the process is to go back to Step 2. “How do I want this to work?” This time, it will be in the context of the above features, but I need to be careful. While the more time spent planning and designing the application the better, there are diminishing returns. I could potentially spend an infinite amount of time stuck on this phase. And keeping that theme in mind, I’m going to skip a more detailed explanation in favor of concluding this blog post.
Step N-1: Cut scope.
Okay, almost there! I promise.
The last step is to re-review all the features and work that have been identified, then eliminate the excess. Every project is susceptible to feature creep, and it’s especially easy to let it slip in during the planning phase. All ideas sound great until you hit that 80% done mark and the last 80% of the project finally begins! That’s why it’s imperative to keep a tight leash on the project’s scope; it’s always better to ship something on time and add more later than to never ship at all.
That’s it. Thanks for following along to the bitter end. It means a lot to me. Have questions? Shoot me an email or reach out to me on the Fediverse.