Gender Analysis Toolkit

At this point, you have all of the tools necessary to work on our Summer 2021 projects installed -- Git, Atom, Python, Node/npm, and PyCharm -- so let's put it all together and get our first project running.

If you have not yet completed the Technical Onboarding guide, you're in the wrong place. Head over to the Intro and get started there. You'll end up back here eventually.

For the first half of the summer, we’ll focus on the Gender Analysis Toolkit. For this project, we’ll make use of Git, Python, and PyCharm. We’ll reserve Node/npm for the summer's second project, Sonification Techniques.

In this guide, you’ll find the following sections:

  1. Summer schedule
    1. Reading material
  2. Cloning the Repository
  3. Setting up the Gender Analysis Toolkit project
    1. Setting up GATK in PyCharm
    2. Setup Python environment and packages for GATK
  4. Tips for local development
    1. General usage notes
    2. Tests, coverage, and linting

Summer schedule

We’ll follow a regular weekly cadence of lab activities during the summer term. Unless otherwise stated, all activities will take place in the lab’s Gather space: https://gather.town/app/vQrHYbBBPjr5pMeI/dhmit.

In addition to the scheduling below, Mike will hold office hours on Tuesday and Thursday from 4:00pm to 5:00pm. Feel free to drop by!

Week of 06-07
  Monday 06-07: 12:00pm to 1:00pm, introductory meeting
  Tuesday 06-08: 12:00pm to 3:00pm, working hours
  Wednesday 06-09: -
  Thursday 06-10: -
  Friday 06-11: 3:00pm to 5:00pm, hack hours

Week of 06-14
  Monday 06-14: 2:00pm to 4:00pm, working hours
  Tuesday 06-15: 2:00pm to 3:00pm, group stand up meetings
  Wednesday 06-16: -
  Thursday 06-17: 2:00pm to 3:00pm, group stand up meetings
  Friday 06-18: Juneteenth observed, no lab activities

Week of 06-21
  Monday 06-21: 2:00pm to 4:00pm, working hours
  Tuesday 06-22: -
  Wednesday 06-23: 4:00pm to 5:00pm, group stand up meetings
  Thursday 06-24: -
  Friday 06-25: 3:00pm to 5:00pm, hack hours

Week of 06-28
  Monday 06-28: 3:00pm to 5:00pm, working hours
  Tuesday 06-29: -
  Wednesday 06-30: 4:00pm to 5:00pm, group stand up meetings
  Thursday 07-01: -
  Friday 07-02: Institute holiday, no lab activities

Week of 07-05
  Monday 07-05: Independence Day observed, no lab activities
  Tuesday 07-06: -
  Wednesday 07-07: 4:00pm to 5:00pm, group stand up meetings
  Thursday 07-08: -
  Friday 07-09: 3:00pm to 5:00pm, hack hours

Week of 07-12
  Monday 07-12: Lab closed, no lab activities
  Tuesday 07-13: Lab closed, no lab activities
  Wednesday 07-14: Lab closed, no lab activities
  Thursday 07-15: Lab closed, no lab activities
  Friday 07-16: Lab closed, no lab activities

Week of 07-19
  Monday 07-19: 3:00pm to 5:00pm, working hours
  Tuesday 07-20: -
  Wednesday 07-21: 4:00pm to 5:00pm, group stand up meetings
  Thursday 07-22: -
  Friday 07-23: 3:00pm to 5:00pm, hack hours

Week of 07-26
  Monday 07-26: 3:00pm to 5:00pm, working hours
  Tuesday 07-27: -
  Wednesday 07-28: 4:00pm to 5:00pm, group stand up meetings
  Thursday 07-29: -
  Friday 07-30: 3:00pm to 5:00pm, hack hours

Week of 08-02
  Monday 08-02: 3:00pm to 5:00pm, working hours
  Tuesday 08-03: -
  Wednesday 08-04: 4:00pm to 5:00pm, group stand up meetings
  Thursday 08-05: -
  Friday 08-06: 3:00pm to 5:00pm, hack hours

Week of 08-09
  Monday 08-09: 3:00pm to 5:00pm, working hours
  Tuesday 08-10: -
  Wednesday 08-11: 4:00pm to 5:00pm, group stand up meetings
  Thursday 08-12: -
  Friday 08-13: 3:00pm to 5:00pm, hack hours

Reading material

Cloning the Repository


Open GitHub Desktop. If this is your first time using the application and you've already been added to the dhmit organization, dhmit projects should appear in the repositories pane, under the "Filter your repositories" search box. Click on dhmit/gender_analysis, and click Clone dhmit/gender_analysis.


If this option doesn't appear, click Clone a repository from the Internet... or go to File -> Clone repository. In the URL tab, enter dhmit/gender_analysis.

Leave the Local path field as is, but take note of it, as you'll need it later. Click Clone to download the repo.


Once the repo is cloned, GitHub Desktop shows the repository's main screen. Click Show in Explorer (Windows) or Show in Finder (macOS) to see where the project has ended up on your computer, or click View on GitHub to go to our project page on GitHub.

Setting up the Gender Analysis Toolkit project

Setting up GATK in PyCharm

Open the project


Open PyCharm. In the menu bar, select File -> Open, and open the folder where you cloned the repo in the step above. By default, this is C:\Users\YOUR_USER_NAME\Documents\GitHub\gender_analysis on Windows and /Users/YOUR_USER_NAME/Documents/GitHub/gender_analysis on macOS.


If PyCharm asks where to open the project, select "This Window" (the default), or "New Window" if you want to have multiple projects open at once. (Ryaan doesn't recommend the latter: PyCharm is very resource-intensive, so unless you absolutely need to have two projects open at once, click "This Window".)


Once it's loaded, you should see the gender_analysis folder appear in the Project pane on the left.
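If the folder is hard to find, a short script can confirm that the clone landed in the default location given above. This is a minimal sketch that assumes you left GitHub Desktop's Local path unchanged; the helper name default_clone_path is made up for illustration.

```python
from pathlib import Path


def default_clone_path(repo_name="gender_analysis"):
    """Build the default GitHub Desktop clone location (assumes Local path was left as is)."""
    return Path.home() / "Documents" / "GitHub" / repo_name


if __name__ == "__main__":
    repo = default_clone_path()
    print("Expected location:", repo)
    print("Folder exists:", repo.is_dir())
    # A cloned repository contains a hidden .git folder at its top level.
    print("Looks like a Git clone:", (repo / ".git").exists())
```

If the folder does not exist, use the Local path you noted while cloning instead.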

Setup Python environment and packages for GATK


Open the settings window (PyCharm -> Preferences on macOS; File -> Settings on Windows), and go to "Project: gender_analysis" -> Project Interpreter. Click on the gear icon in the top right and select "Add".


Select "New Environment". Make sure that the base interpreter is the path that you noted when installing Python. Leave the location for the interpreter as the default provided by PyCharm.


Apply your changes, wait for the virtual environment to be created, and exit the settings window.

Click Terminal in the bottom status bar. You should see (venv) at the beginning of your prompt. If you do not, restart PyCharm and check again. If you still don't see (venv) appear, go back to the beginning of this section and make sure you correctly set up a new environment.


Go back to the Project Interpreter window, and ensure that the Python Interpreter field is populated with the name Python 3.9 (gender_analysis) followed by a path. If this says anything else, find Mike or Ryaan to get help fixing it.
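The same two checks, the (venv) prompt and the interpreter name, can also be made from Python itself. The sketch below uses only the standard library; the function name in_virtualenv is illustrative. Run it from the PyCharm Terminal with the project interpreter.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Interpreter path: {sys.executable}")
    print(f"Python version: {major}.{minor}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

For the setup described here, you would expect the version to read 3.9 and the last line to read True. If either differs, that is a good moment to ask Mike or Ryaan for help.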


In the Project panel, find and open the requirements.txt file in the top-level gender_analysis folder. Wait until PyCharm automatically detects that this file contains our Python package requirements. Click Install requirements in the banner that pops up, and click Install once the Choose Packages to Install window pops up. (If it asks you to install a plugin, you can safely click Ignore extension.) You should see the installation process begin in the bottom status bar. We can let this run while moving on to the next step, but we'll come back to this to make sure it worked.
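Once the installation finishes, one way to make sure it worked is to compare requirements.txt against what is installed in the virtual environment. The following is a rough sketch, not part of the project: the helper names are hypothetical, and the parsing is deliberately simplified (it only reads plain package names and skips pip options and comments).

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract package names from requirements.txt-style text (simplified parsing)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # run from the top-level gender_analysis folder
    if req_file.exists():
        missing = missing_packages(requirement_names(req_file.read_text()))
        print("Missing packages:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found in", Path.cwd())
```

Run it in the PyCharm Terminal so that it inspects the (venv) environment rather than your system Python.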

At this point, you've completed the setup for the gender_analysis project! Here are a few usage notes, but if you’ve worked with the gender_analysis project before, feel free to skip these and get started.

Tips for local development

General usage notes

You can find user-facing documentation here that roughly outlines how to use the Gender Analysis Toolkit. Over the course of the summer, we’ll make significant updates and modifications to both the codebase and these docs.

Tests, coverage, and linting

pytest

The gender_analysis project includes several packages and configurations to assist us in ensuring our code is well-formatted and tested. For both docstring and unit/integration testing, we use pytest, a standard Python testing package. To use pytest, you can either execute the command pytest in PyCharm's Terminal or set up a configuration for a one-click testing and coverage run.

Running pytest from the Terminal

Click on the Terminal tab in the bottom status bar of PyCharm to open a terminal prompt for your virtual environment. Enter the command pytest to run all docstring and unit tests for the entire project. When the run finishes, pytest prints a summary of how many tests passed and failed.


You can also run specific docstring or unit tests by specifying a filepath, for example pytest gender_analysis/analysis/proximity.py. The output has the same form, limited to the tests in that file.
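To make the difference between docstring tests and unit tests concrete, here is a small sketch of both styles. The function count_words is hypothetical and does not come from gender_analysis, and the sketch assumes the project's pytest configuration enables doctest collection (for example through pytest's --doctest-modules option).

```python
"""Hypothetical module showing the two test styles; not part of gender_analysis."""


def count_words(text):
    """Return the number of whitespace-separated words in text.

    The examples below are docstring tests: each >>> line is executed
    and its result is compared with the line that follows it.

    >>> count_words("she said hello")
    3
    >>> count_words("")
    0
    """
    return len(text.split())


def test_count_words_ignores_extra_spaces():
    # A unit test: a plain function named test_* that makes assertions.
    assert count_words("  he   left  ") == 2
```

Docstring tests double as usage examples for readers, while unit tests are a better home for edge cases that would clutter the documentation.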

Running pytest from PyCharm configuration

You can also configure PyCharm to include a one-click option for running tests. To do so, follow these steps.


Click on Add Configuration... in the status bar at the top of PyCharm.


In the following menu, click the + button.


Navigate down to the pytest option and add any additional configuration you’d like, or leave the default settings intact.


Your pytest configuration is now in place! When you’d like to run your tests, click the 'play' button in PyCharm's top status bar.

coverage

The gender_analysis project also includes the `coverage` package, which lets us easily learn how much of our code is covered by tests and produce visualizations to help us track down uncovered code. While complete code coverage is not necessarily advisable or even desirable, it’s a good idea to ensure that the majority of our code is covered by testing.


To run a coverage report of our codebase, execute the command coverage run in the Terminal.


You can follow this up by executing the command coverage report in the Terminal to view our current code coverage, listed file by file.


Finally, you can run coverage html in the Terminal to generate HTML files containing detailed coverage information that you can open and navigate in the browser. By default, these are written to an htmlcov folder.
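To see what "covered by tests" means in practice, the toy sketch below records which lines of a function actually run when it is called. It is only an illustration of the idea built on the standard library's sys.settrace, not a description of how the coverage package works internally, and both function names are hypothetical.

```python
import sys


def classify(count):
    """Hypothetical function under test with two branches."""
    if count < 0:
        return "invalid"
    return "valid"


def executed_line_offsets(func, *args):
    """Call func(*args) and return the line offsets, relative to its def line, that ran."""
    code = func.__code__
    offsets = set()

    def tracer(frame, event, _arg):
        # Record only "line" events that belong to the function being measured.
        if event == "line" and frame.f_code is code:
            offsets.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return offsets


if __name__ == "__main__":
    one_test = executed_line_offsets(classify, 5)
    two_tests = one_test | executed_line_offsets(classify, -1)
    print("Lines hit by one test:", sorted(one_test))
    print("Lines hit by two tests:", sorted(two_tests))
```

A test that only calls classify(5) never reaches the "invalid" branch, so that line stays uncovered until a second test exercises it. Gaps like this are what the coverage reports help us spot.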

pylint

Similar to pytest and coverage above, you can run pylint followed by a filepath (e.g. pylint gender_analysis) in the PyCharm Terminal to perform automatic Python style linting on the project. In the words of the pylint docs: "pylint is a tool that checks for errors in Python code, tries to enforce a coding standard and looks for code smells. It can also look for certain type errors, it can recommend suggestions about how particular blocks can be refactored and can offer you details about the code's complexity." pylint runs automatically when you open a pull request on GitHub, so it’s worth the time to run pylint locally every so often as you develop and to correct anything that it flags.
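As a rough idea of what pylint flags, the hypothetical module below pairs a function that pylint's default settings would typically complain about with a cleaned-up version. The project's own pylint configuration may enable or disable some of these checks, so treat the comments as examples rather than a list of what our pull request checks enforce.

```python
"""Hypothetical example module; not taken from gender_analysis."""
import os  # typically reported as unused-import (W0611): os is never used


def Ratio(a, b):  # typically reported as invalid-name (C0103): not snake_case
    # Also typically reported as missing-function-docstring (C0116).
    return a / b


def ratio(numerator, denominator):
    """Return numerator divided by denominator."""
    return numerator / denominator
```

Both functions behave identically; the second simply follows the naming and documentation conventions that pylint checks for.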
