Friday, August 23, 2013

Field Directed Work Week 9

This is my last week in Virginia for the summer.  I spent the majority of the week working on my final artifact (PowerPoint slides), which included all of the datasets and visual analytic tools I worked with.  The visual analytic tools were IN-SPIRE, Tool “B”, IBM’s i2 Analyst Notebook and Tableau Software.  The dataset types were library citations (database searches in Web of Science and IEEE), Twitter data, emails from database sources, and web logs.  Each raw dataset had to be formatted for its designated tool.  Each tool is built upon different theories, algorithms, and background mathematics to display the dataset visually, helping the user discover trends or contextual relationships in the data they may not have seen with the naked eye.

Introduction Slide


Visual Analytic Tools and Different Data Sources



Data flow into the visualization tools


Included with the roadmap are the initial views of the tools and the types of figures a user can obtain to discover the relationships within the dataset.  These views were created throughout the summer and can be seen in previous weeks’ blog entries.


Information can come from a variety of sources, and interpreting large sets of data can be quite daunting, but with the use of visual analytic tools, users can discover relationships and trends in a given dataset.

The final artifact, the PowerPoint presentation, was sent directly to the Directed Fieldwork Advisor; if you would like to see the final slides, feel free to email or leave a comment here.  I did not want the information to be distributed publicly.  Thanks!

Friday, August 16, 2013

Field Directed Work Week 8

This week I spent time ingesting Twitter datasets into Tableau Software.  Tableau is a great tool for making tables, charts and graphs on the fly by dragging and dropping the dimensions and measures you want calculated instantaneously.  The datasets are mostly in CSV or Excel format.  The dimensions are dictated by the field or column headings.  The data must be formatted in Excel to designate whether each field is a number, string, text, etc.
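Conceptually, the pre-formatting step looks something like the sketch below: each CSV column heading becomes a dimension, and a measure such as a tweet count is just an aggregation over one of those dimensions.  This is a minimal illustration using Python's standard library; the column names and sample rows are invented, since real Twitter exports vary in their headings.

```python
import csv
import io
from collections import Counter

# Hypothetical Twitter export; real exports differ in their column headings.
raw = """user,time_zone,text
alice,Eastern,hello world
bob,Pacific,tableau test
alice,Eastern,another tweet
"""

# Tableau treats the column headings as dimensions, so the CSV needs
# clean headers and consistently typed values before ingest.
rows = list(csv.DictReader(io.StringIO(raw)))

# A measure like "tweet count" is an aggregation over a dimension.
tweets_per_user = Counter(row["user"] for row in rows)
print(tweets_per_user["alice"])  # 2
```

Tableau performs this kind of aggregation automatically when a dimension and a measure are dropped onto the canvas; the sketch just makes the underlying operation explicit.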


(1) Arrows pointing to the dimensions (fields) from the Excel spreadsheet; (2) measured values.  This table, built with the designated dimension and measured values, displays tweet counts (4) by user (3)


Dragging and dropping dimensions and measures to formulate graphs or figures; in this bar chart, tweet counts are plotted by time zone


The types of figures you can visualize depend on the number of dimensions (fields) and measures.  You can make tables, maps, heat maps, tree maps, stacked or horizontal bar charts, side-by-side bars, line, dual-line, area, pie, scatter, circle, bullet, Gantt, packed-bubble and histogram charts.  The widget is titled “Show Me” because it shows you your options based on your criteria.



Stacked bar chart displaying the number of tweets per year, by designated Twitter ID (color block)



“Packed Bubble” visualization of Twitter data by device or method; the larger the bubble, the higher the count for that device



Graph of web log data that displays the website URL and number of hits (counts)


I also spent much of this week gathering all the datasets and writing up my final artifacts for my mentor and for my field directed work.  I also experimented with Tool “D”, ingesting photos and videos and examining the capabilities that tool can offer intelligence analysts.

Saturday, August 10, 2013

Field Directed Work Week 7

This week I spent time on a variety of tasks.  The first task was inputting web logs into IBM’s Analyst Notebook and Tableau.  This task familiarized me with the different field headings of web logs; in this case the logs were Squid proxy access logs.  I learned quite a bit about the features and capabilities of Microsoft Access and Notepad for manipulating the data into a usable format.  I also helped define the headings of what is in a Squid proxy access log.  The types of information you can obtain from these logs include the client IP address, the website (URL) being accessed, how long the request took, and whether access was successful.  Here are a few screenshots of the data in both tools.
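To give a sense of what defining those headings involves, here is a small sketch that splits one line of a Squid native-format access log into named fields.  The field layout follows Squid's default log format, but the sample line itself is made up for illustration.

```python
# One line of a Squid native access.log (sample values are invented).
line = ("1377000000.123    250 10.0.0.5 TCP_MISS/200 4512 GET "
        "http://example.com/index.html - DIRECT/93.184.216.34 text/html")

fields = line.split()
record = {
    "timestamp": float(fields[0]),   # Unix epoch seconds of the request
    "elapsed_ms": int(fields[1]),    # how long the request took
    "client_ip": fields[2],          # what IP address the user is at
    "result_code": fields[3],        # cache result / HTTP status, e.g. TCP_MISS/200
    "bytes": int(fields[4]),         # size of the response
    "method": fields[5],             # GET, POST, CONNECT, ...
    "url": fields[6],                # the website being accessed
}
print(record["client_ip"], record["url"])
```

Once every line is a dictionary like this, writing the records out as CSV for Tableau or Analyst Notebook is straightforward.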


Analyst Notebook view of IP Address node and link to the URL user is trying to access



Tableau allows quick drag and drop of attributes to create views/graphs/visualizations on the fly, and it performs calculations, statistics and trend lines as well



Squid Result Codes and their Counts



Bubble view of IP address counts



Number of websites accessed per IP address (Client Address)


The second task was inputting citations into IBM’s Analyst Notebook.  There are approximately 4,000 citations from a Web of Science search on “code reuse.”  I want to demonstrate the relationships among title, author and keyword.  The issue with this connection is that there are multiple authors and multiple keywords per record.  In the RIS format of these citations, all authors appear under the same tag (A1, A1, etc.) and all keywords under KW, KW, etc.; they aren’t designated by individual tags (A1, A2, A3 or K1, K2, K3).  Therefore the visualization includes one long line of information, which can be messy.  Separating the author fields and keyword fields per record will take some programming, because it creates a multiple-attribute system (a Cartesian product of attributes).  Here is an Analyst Notebook screenshot of the records in journals, books and chapters connected to their Web of Science addresses.
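The "some programming" needed to split those repeated tags can be sketched as below: walk the RIS text line by line and collect every repeated A1 into an authors list and every KW into a keywords list, so each record carries them separately instead of as one long line.  The sample record is invented; real Web of Science exports carry many more tags.

```python
# Splitting repeated RIS tags (authors "A1", keywords "KW") into per-record lists.
ris_text = """TY  - JOUR
TI  - Measuring code reuse
A1  - Smith, J.
A1  - Jones, K.
KW  - code reuse
KW  - software metrics
ER  -
"""

records = []
current = {}
for line in ris_text.splitlines():
    if len(line) < 2:
        continue
    # RIS lines are "<2-char tag>  - <value>".
    tag, value = line[:2], line[6:].strip()
    if tag == "TY":                      # start of a new record
        current = {"authors": [], "keywords": []}
    elif tag == "TI":
        current["title"] = value
    elif tag == "A1":                    # repeated tag: accumulate authors
        current["authors"].append(value)
    elif tag == "KW":                    # repeated tag: accumulate keywords
        current["keywords"].append(value)
    elif tag == "ER":                    # end of record
        records.append(current)

print(records[0]["authors"])  # ['Smith, J.', 'Jones, K.']
```

With each record holding its authors and keywords as separate lists, pairing every (title, author) and (title, keyword) combination for the link chart becomes a simple nested loop, which is exactly the Cartesian product mentioned above.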


Journal, Book and Chapter links to Web of Science



Literature search for the past 15 years on Web of Science, each node denotes a year




Year linked to Title linked to Keywords or Authors

The third task was gathering all datasets so I can show my mentors on Monday which sets can go into which tools thus far.

Friday, August 2, 2013

Field Directed Work Week 6

One of my big passions is working with the outreach programs at PNNL (high school, undergraduate, graduate, and community college students).  One of the programs I worked with throughout the year is SULI (Science Undergraduate Laboratory Internships), offered by the Department of Energy.  I went back home to Richland, Washington to participate in the first wave of the summer student symposium.  One of the students, Claudia Gallegos, presented a way to visualize the Twitter feeds from the Boston Marathon bombings.  I contacted the student and her mentors to obtain her raw datasets and then imported the sets into IN-SPIRE.  The sets spanned from a few months before the bombing through the days of the manhunt.  A tremendous amount of information!  Visualizing the tweets, retweets, timestamps, hashtags and buzzwords used is very interesting.  Here are a few of her slides from her presentation.




Other applications for the visualization of tweets include tracking responses to disease epidemics and natural disasters.  For people with smartphones, computers and mobile devices, social media can spread information faster than a news reporter can, but it is up to the viewer to decide whether the data is fact or fiction.  It was a really interesting way to see how social media can be used and visualized as a large dataset.

In this short week, I was also able to obtain an unrestricted Open Source Center account and set up a feed of data into email.  The database has an advanced search feature that lets you specify which source countries the records should come from.  In this case I set up a search on Syria with source countries of Qatar, Saudi Arabia, Turkey, Iran, Lebanon, Israel, Russia, Jordan and France.  The resulting records are designated either classified or unclassified.  These emails accrued for a few days, and I was able to input the text into IN-SPIRE.  In order for these emails to be imported into the other tools, templates will be needed to convert them into XML or CSV format and define the metadata fields.
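One possible shape for such a template is sketched below: parse each accrued message with Python's standard email module, then flatten the headers and body into a CSV row.  The sample message and the chosen metadata fields (date, subject, body) are illustrative assumptions; a real Open Source Center feed would carry different headers and would need its classification marking handled appropriately.

```python
import csv
import io
from email import message_from_string

# A made-up message in the style of a search-feed email; real feeds differ.
raw_email = """From: feed@example.org
Date: Fri, 02 Aug 2013 10:00:00 +0000
Subject: Syria report (Unclassified)

Body text of the record goes here.
"""

msg = message_from_string(raw_email)

# Flatten the parsed headers plus body into one CSV row, producing the
# structured metadata fields the other tools expect on import.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "subject", "body"])
writer.writeheader()
writer.writerow({
    "date": msg["Date"],
    "subject": msg["Subject"],
    "body": msg.get_payload().strip(),
})
print(buf.getvalue())
```

Run over a whole mailbox of accrued messages, this yields one CSV that Tableau or Analyst Notebook can ingest directly; emitting XML instead would follow the same parse-then-serialize pattern.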