When prospective students want to learn more about the curriculum of a major, they typically visit the department web sites of the universities they are considering. But how accurately does a department’s advertising of the major portray what the curriculum is really like? Undergraduates often realize that they are not as interested in a major as they thought, and end up switching to a different one. This can be frustrating, since they could have been focusing their efforts on a subject they were more passionate about. The goal of this visualization is to analyze how majors are advertised and identify ways to advertise a major most effectively.
To reach this goal, I chose to visualize the portrayal of the Bucknell curriculum from multiple perspectives to help uncover differences between what is advertised and what is actually offered by Bucknell University. To identify how Bucknell publicizes different majors, I created a dataset by visiting each Bucknell department’s web site and choosing one sentence from the landing page that I thought most accurately and generally described the major. Despite the variation in layout across the pages, I did my best to be consistent in the sentences I chose; in some cases, for example, I had to go to a subpage on the department site to find a representative sentence. To identify what was actually being offered to students by different majors, I scraped descriptions of every course available in the Fall 2015 semester and grouped them by department. Lastly, I created a data file that mapped every course to each College Core Curriculum (CCC) requirement the course fulfilled. Together, this data supports a visualization that provides a multidimensional perspective on the Bucknell curriculum and could offer insight to current and prospective Bucknell students as well as faculty. It can help faculty ensure that the curriculum is accurately advertised, and help students avoid picking the wrong major because of how it was described.
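The grouping and mapping steps described above can be sketched in Python. The data shapes here, `(course_id, description)` pairs and `(course, requirement)` pairs, are my own assumptions for illustration, not the project’s actual file formats:

```python
from collections import defaultdict

def group_by_department(courses):
    """Group scraped course records by department prefix.

    `courses` is assumed to be a list of (course_id, description)
    pairs, e.g. ("CSCI 203", "..."), where the department code is
    the leading token of the course id.
    """
    departments = defaultdict(list)
    for course_id, description in courses:
        dept = course_id.split()[0]
        departments[dept].append(description)
    return dict(departments)

def map_courses_to_ccc(fulfillments):
    """Build a course -> [CCC requirement] mapping from
    (course_id, requirement) pairs."""
    mapping = defaultdict(list)
    for course_id, requirement in fulfillments:
        mapping[course_id].append(requirement)
    return dict(mapping)
```

Each department’s description list can then be written to its own text file, which is the per-department corpus format used in the Jigsaw experiments below.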
The first thing I did with the course description data was import it into Jigsaw. My corpus consisted of one text file per department, each containing all of that department’s course descriptions. My most interesting finding in Jigsaw was the Document View, which contained summaries of each department’s course descriptions that appeared representative of the department and similar to the sentences I picked from the department landing pages. However, the text summarizer used by Jigsaw had some issues. For example, rather than describing the goals of the department, many of the summaries ended up describing prerequisites. Although prerequisites were common in course descriptions, they did not contribute to my target visualization.
The summaries in Jigsaw led to the first view in my visualization, the computed summary view. This view was created by passing all the course descriptions in a department through a text summarizer program. This creates a brief text that is meant to be most representative of all course descriptions in that department, which can then be compared to the web site’s description. Rather than using the text summaries computed by Jigsaw, I passed the course descriptions through an independent text summarizer called Sumy. This Python library allowed more versatility in the choice of summarization algorithm and the length of the summary text. After trying multiple algorithms on the text, I found that the Edmundson algorithm was best at avoiding irrelevant phrases, such as those mentioning prerequisites, while still producing a concise summary. One issue with these summaries is that the summarizer tends to pick out descriptions from introductory classes, most likely because multiple sections of these classes share the same description. In a future iteration, I would filter out repeated sentences and do additional research to find the ideal text summarization algorithm for my data.
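The repeated-description problem could be handled with a simple pre-filter applied before summarization. A minimal sketch follows; the sentence splitting and normalization here are my own simplifications, not the project’s code:

```python
import re

def dedupe_sentences(text):
    """Drop verbatim-duplicate sentences (e.g. descriptions shared
    by multiple sections of the same intro course) before feeding
    the text to a summarizer."""
    seen = set()
    unique = []
    # Naive sentence split on terminal punctuation; a real pipeline
    # would use a proper sentence tokenizer.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        key = re.sub(r"\s+", " ", sentence).lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(sentence)
    return " ".join(unique)
```

The deduplicated text can then be handed to Sumy as usual, so repeated intro-course boilerplate no longer dominates the summary.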
I also used the course descriptions to create a word cloud of key terms for each department. I did this by running an n-gram analysis of the text. Before generating the word cloud, I filtered out certain words using a stoplist. This consisted not only of common English words, but also of words specific to the descriptions such as “prerequisites”, “instructor”, and “permission”. These words appeared prominently in many of the word clouds, but did not provide insight relevant to my research questions. The word cloud complements the computed department description by displaying keywords for the department that may not have made it into the summarization. The word cloud is also useful because it may bring to light what is not offered by a department. For example, a prospective computer science student may scan the word cloud for words such as “database” or “server” to get an idea of how much focus the department puts on large-scale systems, if that is what they are interested in.
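The n-gram counting behind the word cloud can be sketched with the standard library. The tokenization rule and the stoplist contents here are illustrative; only the three domain-specific words are taken from the text above:

```python
import re
from collections import Counter

# Stoplist: common English words plus domain-specific noise that
# appeared prominently in course descriptions.
STOPLIST = {"the", "a", "of", "and", "to", "in", "is", "for",
            "prerequisites", "instructor", "permission"}

def term_frequencies(text, n=1):
    """Count n-grams after lowercasing and dropping stoplisted words."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPLIST]
    ngrams = [" ".join(words[i:i + n])
              for i in range(len(words) - n + 1)]
    return Counter(ngrams)
```

The resulting counts map directly to font sizes in a word cloud renderer; `term_frequencies(text, n=2)` gives bigram frequencies for multi-word key terms.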
Another visualization I created was a network diagram of the departments. The idea for this feature was inspired by previous assignments using tools such as Google Fusion Tables and Palladio. This visualization extends those by using SigmaJS to add an interactive component to the nodes. The diagram contains nodes for courses and for the CCC requirements met by the courses in that department. Edges are drawn between a course node and a CCC node if the course satisfies that requirement. Clicking on a course node displays the full description of the course, and clicking on a CCC node displays the goals of that CCC requirement. This ability to explore relationships gives the data a martini-glass structure, allowing the viewer to drill deeper and find out how specific courses influence the curriculum (Segel and Heer).
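Building the node and edge lists for such a diagram is straightforward. The sketch below emits the basic `id`/`label`/`source`/`target` fields that SigmaJS-style graph JSON uses; the input mapping is a hypothetical example, not the project’s actual data file:

```python
def build_department_graph(course_ccc):
    """Build a node/edge list for one department's network diagram.

    `course_ccc` is assumed to map course ids to the CCC
    requirements they satisfy, e.g. {"CSCI 203": ["W2", "LAB"]}.
    """
    nodes, edges = [], []
    requirements = sorted({r for reqs in course_ccc.values() for r in reqs})
    for course in course_ccc:
        nodes.append({"id": course, "label": course, "type": "course"})
    for req in requirements:
        nodes.append({"id": req, "label": req, "type": "ccc"})
    # One edge per (course, requirement) fulfillment.
    for course, satisfied in course_ccc.items():
        for req in satisfied:
            edges.append({"id": course + "->" + req,
                          "source": course, "target": req})
    return {"nodes": nodes, "edges": edges}
```

The returned dictionary can be serialized with `json.dumps` and loaded into the browser-side graph library; the `type` field is a convenient hook for coloring course and CCC nodes differently.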
All four segments of the visualization (the Bucknell summary, computed summary, word cloud, and network visualization) each uniquely contribute to the user’s experience with the data. Tanya Clement emphasizes the importance of multiple perspectives by explaining, “Ultimately, the rule of plausibility dictates that differently situated eyes panning multiple directions (or realities) not only are more powerful than a small magnifying glass but also serve different purposes and research agendas” (Clement). In her essay, Clement argues that although computers can often oversimplify a complex topic, the insight they provide can become more useful and credible if presented from multiple perspectives. In this visualization, I created different views that show the data at different levels of “zoom”, not only to help the viewer more easily find meaningful insights, but also to prevent viewers from reaching an incorrect conclusion due to a lack of differing perspectives. The organization of the views makes it easy to choose a department, look at the summary sentences for the department, and then explore deeper using the word cloud and network diagram. It is possible to zoom out even further and look at network diagrams of the entire curriculum through individual CCC requirements in the CCC tab. This high-level view allows the viewer to easily discover what the goals of each department are, based on their various academic goals.
This visualization meets all five of Lima’s principles for data visualization. It documents a system of relations that has not been documented before, doing so by merging data sets that were not otherwise easy to interact with. The layout of the visualization clarifies the system by organizing the data so that it can be viewed from multiple angles. Since the visualization is intended for a variety of users, the self-guided layout allows viewers to find their own meaning in the data. Viewers will be able to make discoveries about the curriculum that others may never have seen, including helping to ensure that all majors are equally and accurately represented.
Link to visualization: http://nadeem.io/270