MacPorts Statistics - Google Summer of Code 2011
Derek Ingrouville - derek AT macports.org
This page describes the implementation of the MacPorts Statistics project as part of Google Summer of Code 2011. It is based on the documentation in the svn repository available at branches/gsoc11-statistics/docs/implementation/impl.tex
The code shown here was written during GSoC 2011 and may no longer be current. The latest version of the code is available at branches/gsoc11-statistics
Client Side - MacPorts Base
Install
In order to automatically submit data at regular intervals, some small changes had to be made to the installation process. These changes include installing a script that handles data submissions (submitstats.sh), configuring launchd to regularly run submitstats.sh, and generating a unique identifier for the user submitting data.
Makefile.in
- Install submitstats.sh to $(DESTDIR)${datadir}/macports/
- Run setupstats.sh
configure.ac
Generate a universally unique identifier to identify this MacPorts installation. The UUID is generated by uuidgen and stored in the variable STATS_UUID.
Scripts
setupstats.sh
This script is responsible for generating and installing the file org.macports.stats.plist. This plist is used by launchd to regularly run submitstats.sh.
The script takes two arguments:
- The path to the script that launchd should execute
- The path to the MacPorts configuration file macports.conf
launchd will execute the script once a week. The day of the week, hour, and minute are determined as follows:
Weekday: The machine's hardware UUID modulo 7. This helps ensure that submissions are roughly evenly distributed throughout the week.
Hour: The hour that submitstats.sh was executed.
Minute: The minute that submitstats.sh was executed.
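As a rough illustration of the schedule rule above, the following Ruby sketch derives the three calendar fields. It is not the project's setupstats.sh; the helper name weekly_schedule is hypothetical, and reading the UUID as a single hexadecimal integer before taking it modulo 7 is an assumption made for this example. The hour and minute are simply taken from a timestamp supplied by the caller.

```ruby
# Hypothetical helper: derive a weekly schedule from a hardware UUID.
# Assumption: the UUID is interpreted as one hexadecimal integer
# (dashes removed) before taking it modulo 7.
def weekly_schedule(hardware_uuid, now = Time.now)
  uuid_value = hardware_uuid.delete("-").to_i(16)
  {
    "Weekday" => uuid_value % 7, # 0..6; launchd treats 0 as Sunday
    "Hour"    => now.hour,
    "Minute"  => now.min
  }
end

# Example with a made-up UUID and a fixed timestamp.
schedule = weekly_schedule("00000000-0000-1000-8000-00000000000A",
                           Time.new(2011, 8, 15, 14, 30, 0))
puts schedule.inspect
```

The keys mirror the Weekday, Hour, and Minute entries of a launchd StartCalendarInterval dictionary, so a hash like this could be written into the plist directly.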
The plist is installed to /Library/LaunchAgents/org.macports.stats.plist and then loaded by launchctl.
submitstats.sh
This script has two responsibilities:
- Check if a user is participating.
- Submit data only if the user is participating.
It takes one parameter, the path to macports.conf.
To determine if a user is participating, it checks if the variable stats_participate is set to yes. If it is, then port stats submit is executed. If the user is not participating, the script exits.
The reason this script exists is to have a lightweight tool to check if a user is participating before running port. This script will be executed once a week for every user, regardless of whether or not they are participating.
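The participation check described above can be sketched in a few lines. This is an illustration in Ruby rather than the shell code of submitstats.sh; the helper name participating? is hypothetical, and the parser assumes that macports.conf holds one whitespace-separated "name value" pair per line with "#" starting a comment line.

```ruby
# Hypothetical helper mirroring the opt-in check.
# Assumption: one "name value" pair per line; lines starting with "#"
# are comments. The first stats_participate entry decides the result.
def participating?(conf_text)
  conf_text.each_line do |line|
    name, value = line.strip.split(/\s+/, 2)
    next if name.nil? || name.start_with?("#")
    return value.to_s.strip == "yes" if name == "stats_participate"
  end
  false # variable absent: treat the user as not participating
end

sample_conf = <<~CONF
  # Example configuration fragment (made up for this sketch)
  stats_participate yes
CONF

puts participating?(sample_conf)
```

Only when this check succeeds would the wrapper go on to run port stats submit, which keeps the weekly job cheap for users who have not opted in.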
Configuration
Added several variables to macports.conf.in and appropriate descriptions to macports.conf.5.
stats_participate
This indicates whether or not a user has chosen to opt in and share their data. Its value is either yes or no.
stats_url
This is the URL where data should be submitted.
stats_id
This is the UUID used for submissions. It is initially set to the value of the autoconf variable @STATS_UUID@.
Changes to macports1.0/macports.tcl
New Globals
Added globals stats_participate, stats_url, and stats_id, which correspond to configuration options. Added deferred global gccversion.
gcc version check
Added proc setgccinfo, which is called the first time gccversion is read.
Changes to pextlib1.0/curl.c - CurlPostCmd()
Added the CurlPostCmd function. It takes two Tcl parameters: the POST data and the URL.
Example usage:
curl post "project=macports" $url
The port stats action
port stats gathers lists of all active and inactive ports as well as relevant system information. If no subaction is given, port stats prints the system information to stdout.
If the submit subaction is given, it encodes all the collected data as a JSON object and submits it via HTTP POST to the server specified in macports.conf.
JSON encoding is done through sub-procedures contained inside the procedure for the port stats action.
Changes to port/port-help.tcl
Added a help entry for the port stats action describing proper usage.
Data Format
Transmitted data is encoded as a JSON object with four fields.
{
  "id": "...",
  "os": { ... },
  "active_ports": [ {...}, ... {...} ],
  "inactive_ports": [ {...}, ... {...} ]
}
- id
This is a string containing the user’s UUID.
- os
This is a JSON object containing information about the user’s system.
"os": {
  "macports_version": "1.9.99",
  "osx_version": "10.6",
  "os_arch": "i386",
  "os_platform": "darwin",
  "build_arch": "x86_64",
  "gcc_version": "4.2.1",
  "xcode_version": "4.0"
}
- active_ports
This is an array of JSON objects. Each object represents a single port.
"active_ports": [
  { "name": "aalib", "version": "1.4rc5_4" },
  { "variants": "nonls +", "name": "aspell", "version": "0.60.6_4" }
]
- inactive_ports
This has the same structure as active_ports, except that the port objects represent installed inactive ports.
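To make the format above concrete, here is a small Ruby sketch that assembles a submission with the four fields and serializes it. The client itself builds the JSON in Tcl, so this is only an illustration; the helper names port_entry and build_submission are hypothetical, and the UUID is a made-up placeholder. The remaining sample values are copied from the examples above.

```ruby
require "json"

# Hypothetical helper: one port object; "variants" is only present
# when the port was installed with variants.
def port_entry(name, version, variants = nil)
  entry = { "name" => name, "version" => version }
  entry["variants"] = variants if variants
  entry
end

# Hypothetical helper: assemble the four top-level fields and encode them.
def build_submission(id, os, active_ports, inactive_ports)
  payload = {
    "id"             => id,
    "os"             => os,
    "active_ports"   => active_ports,
    "inactive_ports" => inactive_ports
  }
  JSON.generate(payload)
end

os_info = { "macports_version" => "1.9.99", "osx_version" => "10.6" }
active  = [port_entry("aalib", "1.4rc5_4"),
           port_entry("aspell", "0.60.6_4", "nonls +")]

# The UUID below is a placeholder, not a real installation id.
puts build_submission("00000000-0000-0000-0000-000000000000",
                      os_info, active, [])
```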
Server Side - Ruby on Rails
Database Schema
Categories table
Imported from MPWA
create_table "categories", :force => true do |t|
  t.string   "name"
  t.datetime "created_at"
  t.datetime "updated_at"
end
Relationships and Validations
has_many :ports
validates_presence_of :name
Changes from MPWA
- Validate presence of name
Ports table
Imported from MPWA
create_table "ports", :force => true do |t|
  t.string   "name"
  t.string   "path"
  t.string   "version"
  t.text     "description"
  t.string   "licenses"
  t.integer  "category_id"
  t.text     "variants"
  t.string   "maintainers"
  t.string   "platforms"
  t.string   "categories"
  t.datetime "created_at"
  t.datetime "updated_at"
end
add_index "ports", ["name"], :name => "index_ports_on_name"
Relationships and Validations
has_one :category
belongs_to :category
has_many :installed_ports
validates_presence_of :name, :version
Changes from MPWA
- Changed the variants column from string to text type
- Added index on name column
- Validate presence of name and version
- has_many installed ports
installed_ports table
The installed_ports table holds submitted port installation data. It keeps track of an installed port's version and variants as well as the id of the submitting user.
create_table "installed_ports", :force => true do |t|
  t.integer  "port_id"
  t.string   "version"
  t.text     "variants"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.integer  "user_id"
end
add_index "installed_ports", ["port_id"], :name => "index_installed_ports_on_port_id"
add_index "installed_ports", ["user_id"], :name => "index_installed_ports_on_user_id"
Relationships and Validations
belongs_to :port
has_one :user
validates_presence_of :user_id, :port_id, :version
os_statistics table
The os_statistics table holds information about a user's system.
create_table "os_statistics", :force => true do |t|
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string   "macports_version"
  t.string   "osx_version"
  t.string   "os_arch"
  t.string   "os_platform"
  t.string   "build_arch"
  t.string   "xcode_version"
  t.string   "gcc_version"
  t.integer  "user_id"
end
add_index "os_statistics", ["user_id"], :name => "index_os_statistics_on_user_id"
Relationships and Validations
belongs_to :port
has_one :user
validates_presence_of :user_id, :port_id, :version
users table
The users table holds UUIDs for each user.
create_table "users", :force => true do |t|
  t.string   "uuid"
  t.datetime "created_at"
  t.datetime "updated_at"
end
Relationships and Validations
has_one :os_statistic
has_many :installed_ports
Submissions
JSON-encoded submissions are sent via HTTPS POST to the /submissions page. All data is stored in the data POST variable.
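As a sketch of what carrying everything in a single data variable looks like on the wire, the Ruby snippet below form-encodes a JSON document under the key data and decodes it again. It uses only the standard library and does not reproduce the Rails controller; the helper names encode_body and decode_body are hypothetical, and the use of application/x-www-form-urlencoded encoding is an assumption.

```ruby
require "json"
require "uri"

# Hypothetical client-side helper: wrap the JSON document in a
# form-encoded body with a single "data" field (assumed encoding).
def encode_body(payload)
  URI.encode_www_form({ "data" => JSON.generate(payload) })
end

# Hypothetical server-side helper: pull the "data" field back out
# and parse it. Raises KeyError if the field is missing.
def decode_body(body)
  fields = URI.decode_www_form(body).to_h
  JSON.parse(fields.fetch("data"))
end

body = encode_body({ "id" => "example-uuid", "active_ports" => [] })
puts body
puts decode_body(body).inspect
```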
Submissions are stored on a month by month basis. Resubmissions in a given month cause that month's data to be updated.
Storing data happens as follows:
- Attempt to find a user with the given UUID in the database. If no user is found, add a new entry.
- Attempt to find an entry in the os_statistics table for this user that was created this month. If no such entry is found, add an entry for this month. If an entry is found, update it.
- For each submitted port, verify that it is a valid port by checking whether it exists in the ports table. If it does not exist, skip it. If it does exist, attempt to find an entry for the given user that was created this month. If an entry is found, update it; otherwise create a new entry.
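The month-by-month update rule in the steps above can be sketched with plain Ruby hashes standing in for the database tables. This is not the application's ActiveRecord code; the class StatsStore and its method names are hypothetical, and keying port entries by port name (so that active and inactive ports share one entry per name) is a simplifying assumption.

```ruby
require "date"

# In-memory stand-in for the users, os_statistics, and installed_ports
# tables. Keys include the [year, month] pair so that a resubmission in
# the same month overwrites that month's entry.
class StatsStore
  attr_reader :users, :os_statistics, :installed_ports

  def initialize(known_ports)
    @known_ports = known_ports # names that exist in the ports table
    @users = {}                # uuid => user id
    @os_statistics = {}        # [user id, month] => os hash
    @installed_ports = {}      # [user id, month, port name] => port hash
  end

  def store(submission, today = Date.today)
    month = [today.year, today.month]
    # Find the user by UUID or add a new one.
    user_id = (@users[submission["id"]] ||= @users.size + 1)
    # Create or update this month's OS entry.
    @os_statistics[[user_id, month]] = submission["os"]
    ports = submission.fetch("active_ports", []) +
            submission.fetch("inactive_ports", [])
    ports.each do |port|
      next unless @known_ports.include?(port["name"]) # skip unknown ports
      @installed_ports[[user_id, month, port["name"]]] = port
    end
    user_id
  end
end

store = StatsStore.new(["aalib"])
store.store({ "id" => "example-uuid", "os" => { "osx_version" => "10.6" },
              "active_ports" => [{ "name" => "aalib", "version" => "1.4rc5_4" }] },
            Date.new(2011, 8, 1))
puts store.installed_ports.size
```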
OS Statistics Page
The OS statistics page provides visualizations of the data in the os_statistics table. It shows a pie chart for each of the following:
- MacPorts Version
- OSX Versions
- OS Arch
- OS Platform
- Build Arch
- gcc Versions
- Xcode Versions
These pie charts show the percentage of the user population running different versions (or arch / platform) in each category.
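The percentage behind each pie slice is a simple tally. The Ruby sketch below shows one way to compute it from a list of column values; the helper name distribution is hypothetical, rounding to one decimal place is an arbitrary choice, and the sample values are invented for illustration.

```ruby
# Hypothetical helper: share of the user population per distinct value,
# as a percentage rounded to one decimal place.
def distribution(values)
  total = values.size.to_f
  values.tally.transform_values { |count| (100.0 * count / total).round(1) }
end

# Invented sample: osx_version values from four users.
puts distribution(%w[10.6 10.6 10.7 10.6]).inspect
```

The same helper would apply unchanged to any of the listed columns, since each chart is just a distribution over one column of os_statistics.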
Port Page
Every port in the MacPorts repository has an associated port page. This page displays basic information about the port such as
- Name
- Current version
- Licenses
- Categories
- Variants
This page also shows visualizations of the data in the installed_ports table for this particular port.
It has the following charts:
- Line chart of installation counts over the past 12 months.
- Top versions over the past 12 months. This finds the top 5 most popular versions in use right now and tracks how their popularity has changed over the past 12 months. Popularity is measured by the number of installations of each version per month.
- Pie chart of all versions. This shows the distribution of all different versions in use right now. It will show you that x% of users of this port are using version y.
- Similarly to all versions, there is a pie chart of all variants in use.
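The "top versions" chart described in the list above combines a ranking for the current month with a per-month series. A minimal Ruby sketch under stated assumptions: installations arrive as [month, version] pairs, the last element of months is the current month, and ties are broken alphabetically by version. The helper name top_version_series and the sample data are hypothetical.

```ruby
# Hypothetical helper: pick the most installed versions in the current
# month, then report their install counts for every month in the window.
def top_version_series(rows, months, limit = 5)
  counts = rows.tally # [month, version] => number of installations
  current = months.last
  top = counts.select { |(month, _), _| month == current }
              .sort_by { |(_, version), count| [-count, version] }
              .first(limit)
              .map { |(_, version), _| version }
  # One series per top version, aligned with the months array.
  top.to_h do |version|
    [version, months.map { |m| counts.fetch([m, version], 0) }]
  end
end

# Invented sample: one installation per row.
rows = [["2011-07", "1.0"], ["2011-08", "1.0"],
        ["2011-08", "1.1"], ["2011-08", "1.1"]]
puts top_version_series(rows, ["2011-07", "2011-08"]).inspect
```

Each resulting series can be drawn as one line on the chart, with a month that has no installations of a version contributing a zero.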
Installed Ports Page
This page shows summary information for installed ports such as:
- The number of participating users
- The number of ports in the MacPorts repository
- Average number of ports installed per user
- Most popular port this month and the number of installs
- Most popular port this year and the number of installs
- A bar chart of the top 25 most installed ports along with their install counts
- A table of the top 25 most installed ports along with their install counts
Home Page
The home page has links to all other pages as well as a search area for ports. It also displays a line chart of the number of participating users over the past 12 months.