Gender Analysis Toolkit
At this point, you have all of the tools necessary to work on our Summer 2021 projects installed -- Git, Atom, Python, Node/npm, and PyCharm -- so let's put it all together and get our first project running.
If you have not yet completed the Technical Onboarding guide, you're in the wrong place. Head over to the Intro and get started there. You'll end up back here eventually.
For the first half of the summer, we’ll focus on the Gender Analysis Toolkit. For this project, we’ll make use of Git, Python, and PyCharm. We’ll save Node/npm for the summer's second project, Sonification Techniques.
This guide covers reading material, cloning the repository, setting up the Gender Analysis Toolkit project, and tips for local development.
Reading material
- Gender Novels Project, DH Labs
- Gender Analysis Toolkit documentation, DH Labs
- Gender and Cultural Analytics: Finding or Making Stereotypes?, Laura Mandell
- Text Analysis, Melanie Walsh
- Why Data Science Needs Feminism, Catherine D'Ignazio and Laura Klein
- What Gets Counted Counts, Catherine D'Ignazio and Laura Klein
- The Numbers Don’t Speak for Themselves, Catherine D'Ignazio and Laura Klein
Cloning the Repository
Open GitHub Desktop. If this is your first time using the application and you've already been added to the dhmit organization, dhmit projects should appear in the repositories pane, under the "Filter your repositories" search box. Click on `dhmit/gender_analysis`, and then click "Clone dhmit/gender_analysis".
If this option doesn't appear, click "Clone a repository from the Internet..." or go to File -> Clone repository. In the URL tab, enter `dhmit/gender_analysis`.
Leave the "Local path" field as is, but take note of it, as you'll need it later. Click "Clone" to download the repo.
Once the repo is cloned, you should see a screen that looks like this. Try clicking on "Show in Explorer" (Windows) or "Show in Finder" (macOS) to see where the project has ended up on your computer, or click "View on GitHub" to go to our project page on GitHub.
Setting up the Gender Analysis Toolkit project
Setting up GATK in PyCharm
Open the project
Open PyCharm. In the menu bar, select File -> Open, and open the folder where you cloned the repo in the step above. By default, this is `C:\Users\YOUR_USER_NAME\Documents\GitHub\gender_analysis` on Windows and `/Users/YOUR_USER_NAME/Documents/GitHub/gender_analysis` on macOS.
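If you're unsure whether the repo ended up in the default location, a minimal sketch like the following can check for it. The function name `default_clone_path` is made up for this illustration, and the sketch assumes you left the "Local path" field at the GitHub Desktop default described above.

```python
from pathlib import Path


def default_clone_path(repo_name: str = "gender_analysis") -> Path:
    """Return the default GitHub Desktop clone location described above."""
    # Path.home() typically resolves to C:\Users\YOUR_USER_NAME on Windows
    # and /Users/YOUR_USER_NAME on macOS.
    return Path.home() / "Documents" / "GitHub" / repo_name


if __name__ == "__main__":
    repo = default_clone_path()
    if repo.is_dir():
        print(f"Found the repo at {repo}")
    else:
        print(f"No folder at {repo}; check the Local path you noted while cloning.")
```

If the folder is not found, the repo was most likely cloned to a different "Local path", and that is the folder to open in PyCharm instead.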
If you get this dialog box, always select "This Window" (the default) or "New Window" if you want to have multiple projects open at once. (Ryaan doesn't recommend the latter: PyCharm is very resource-intensive, so unless you absolutely need to have two projects open at once, click "This Window".)
Once it's loaded, you should see the `gender_analysis` folder appear in the Project pane on the left.
Set up Python environment and packages for GATK
Open the settings window (PyCharm -> Preferences on macOS; File -> Settings on Windows), and go to "Project: gender_analysis" -> Project Interpreter. Click on the gear icon in the top right and select "Add".
Select "New Environment". Make sure that the base interpreter is the path that you noted when installing Python. Leave the location for the interpreter as the default provided by PyCharm.
Apply your changes, wait for the virtual environment to be created, and exit the settings window.
Click "Terminal" in the bottom status bar. You should see `(venv)` at the beginning of your prompt. If you do not, restart PyCharm and check again. If you still don't see `(venv)` appear, go back to the beginning of this section and make sure you correctly set up a new environment.
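The `(venv)` prefix is only a prompt decoration, so it can help to ask Python itself. The sketch below is one common way to check, assuming the environment was created with the standard `venv` module or a recent `virtualenv`; the function name `in_virtualenv` is made up for this illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment
    # folder while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

Save this to a scratch file and run it with `python` from the PyCharm Terminal. If it reports that no virtual environment is active, the terminal is probably using your base Python installation.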
Go back to the Project Interpreter window, and ensure that the Python Interpreter field is populated with the name `Python 3.9 (gender_analysis)` followed by a path. If this says anything else, find Mike or Ryaan to get help fixing it.
In the Project panel, find and open the `requirements.txt` file in the top-level `gender_analysis` folder. Wait until PyCharm automatically detects that this file contains our Python package requirements. Click "Install requirements" in the banner that pops up, and click "Install" once the "Choose Packages to Install" window pops up. (If it asks you to install a plugin, you can safely click "Ignore extension".) You should see the installation process begin in the bottom status bar. We can let this run while moving on to the next step, but we'll come back to this to make sure it worked.
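Once the installation finishes, one way to confirm that it worked is to compare `requirements.txt` against what is installed in the virtual environment. The following is a rough sketch, not an official project script: the helper names `requirement_names` and `missing_packages` are made up here, and the parsing only handles simple `name==version` style lines (it skips pip options such as `-r`, and it does not understand URL-based requirements).

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text: str) -> list[str]:
    """Extract package names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep only the leading name, before extras or version specifiers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.is_file():
        absent = missing_packages(requirement_names(requirements.read_text()))
        print("Missing:", ", ".join(absent) if absent else "none")
    else:
        print("Run this from the top-level gender_analysis folder.")
```

Run it from the PyCharm Terminal so that it uses the project's virtual environment. It checks only whether each package is present, not whether the installed version matches the pinned one.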
At this point, you've completed the setup for the `gender_analysis` project! Here are a few usage notes, but if you’ve worked with the `gender_analysis` project before, feel free to skip these and get started.
Tips for local development
General usage notes
You can find user-facing documentation here that roughly outlines how to use the Gender Analysis Toolkit. Over the course of the summer, we’ll make significant updates and modifications to both the codebase and these docs.
Tests, coverage, and linting
pytest
The `gender_analysis` project includes several packages and configurations to help us keep our code well-formatted and tested. For both docstring and unit/integration testing, we use `pytest`, a standard Python testing package. To use `pytest`, you can either execute the command `pytest` in PyCharm's Terminal or set up a configuration for a one-click testing and coverage run.
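To make the difference between a docstring test and a unit test concrete, here is a minimal sketch. The function `count_word` and its test are hypothetical and not part of the `gender_analysis` codebase. The sketch also assumes that the project's pytest configuration enables doctest collection (pytest's `--doctest-modules` option), which is what lets a plain `pytest` run pick up docstring examples.

```python
import doctest


def count_word(text: str, word: str) -> int:
    """Count case-insensitive occurrences of a whole word in a text.

    The example below is a docstring test: the call is executed and its
    result is compared with the line that follows it.

    >>> count_word("She said that she would go.", "she")
    2
    """
    cleaned = (token.strip(".,;:!?\"'").lower() for token in text.split())
    return sum(1 for token in cleaned if token == word.lower())


def test_count_word_ignores_partial_matches():
    """A unit test in the form pytest discovers: a function named test_*."""
    assert count_word("The shed is hers.", "she") == 0


if __name__ == "__main__":
    # Without pytest, the standard library can still run the docstring example.
    print(doctest.testmod())
    test_count_word_ignores_partial_matches()
```

Docstring tests double as usage documentation, while separate `test_*` functions are a better fit for edge cases that would clutter a docstring.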
Running pytest from the Terminal
Click on the Terminal tab in the bottom status bar of PyCharm to open a terminal prompt for your virtual environment. Enter the command `pytest` to run all docstring and unit tests for the entire project. You should see terminal output that looks something like:
You can also run specific docstring or unit tests by specifying a filepath: `pytest gender_analysis/analysis/proximity.py`. You should see terminal output that looks something like:
Running pytest from a PyCharm configuration
You can also configure PyCharm to include a one-click option for running tests. To do so, follow these steps.
Click on "Add Configuration..." in the status bar at the top of PyCharm:
In the following menu, click the "+" button:
Navigate down to the pytest option and add any additional configuration you’d like or leave the default settings intact:
Your configuration for pytest is now in place! When you’d like to run your tests, click the 'play' button in PyCharm's top status bar:
coverage
The gender_analysis
project also includes the `coverage` package, which allows us easily to learn how much of our code is covered by tests and to produce visualizations to help us track down uncovered code. While it is not necessarily adviseable or even desirable to have complete code coverage, it’s a good idea to ensure that the majority of our code is covered by testing.
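As a small illustration of what "covered" means, consider the hypothetical function and test below (neither is part of the `gender_analysis` codebase). The test exercises only one of the function's two branches, so a line-coverage tool would be expected to report the other branch's line as not executed.

```python
def safe_ratio(count: int, total: int) -> float:
    """Return count / total, or 0.0 when total is zero."""
    if total == 0:
        return 0.0  # not executed by the test below, so it counts as uncovered
    return count / total


def test_safe_ratio():
    """Exercises only the non-zero branch of safe_ratio."""
    assert safe_ratio(1, 4) == 0.25
```

Adding a second test that calls `safe_ratio` with a `total` of zero would cover the remaining line. Whether a gap like this is worth closing is a judgment call, as noted above.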
To run a coverage report of our codebase, execute the command `coverage run` in the Terminal. You should see something like this:
You can follow this up by executing the command `coverage report` in the Terminal to view our current code coverage. You should see something like this:
And finally, you can run `coverage html` in the Terminal to generate HTML files containing detailed coverage information that you can open and navigate in the browser:
pylint
Similar to `pytest` and `coverage` above, you can run `pylint` followed by a filepath (e.g. `pylint gender_analysis`) in the PyCharm Terminal to perform automatic Python style linting on the project. In the words of the pylint docs: "pylint is a tool that checks for errors in Python code, tries to enforce a coding standard and looks for code smells. It can also look for certain type errors, it can recommend suggestions about how particular blocks can be refactored and can offer you details about the code's complexity." `pylint` runs automatically when you open a pull request on GitHub, so it’s worth the time to run `pylint` locally from time to time as you develop and to correct anything that it flags.
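For a sense of what lint-friendly code looks like, here is a small hypothetical module (not taken from the `gender_analysis` codebase). It assumes pylint's default checks; the project's own pylint configuration may enable or disable different messages. Under the defaults, dropping either docstring or leaving an unused local variable in the function would typically be reported (for example as `missing-function-docstring` or `unused-variable`).

```python
"""Hypothetical helper module used to illustrate lint-friendly style."""


def average_sentence_length(sentences: list[str]) -> float:
    """Return the mean number of words per sentence, or 0.0 if there are none."""
    if not sentences:
        return 0.0
    word_counts = [len(sentence.split()) for sentence in sentences]
    return sum(word_counts) / len(word_counts)
```

Descriptive names, docstrings on the module and the function, and no leftover variables are the kinds of habits that keep a pylint run quiet.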