Friday, July 12, 2013

Field Directed Work Week 3

This week I spent quite a bit of time downloading datasets through the advanced search features of Web of Science and IEEE Xplore Digital Library, using a variety of search terms given to me by my two mentors.  PNNL’s Technical Library has subscriptions to these two databases.

Web of Science provides researchers, administrators, faculty and students with access to the world’s leading citation databases.  Its multidisciplinary content covers over 12,000 of the highest impact journals worldwide, including Open Access journals, and over 150,000 conference proceedings.  It offers current and retrospective coverage in the sciences, social sciences, arts and humanities, from 1900 to the present.

The IEEE Xplore digital library is a resource for discovery of and access to scientific and technical content published by the IEEE (Institute of Electrical and Electronics Engineers) and its publishing partners.  IEEE Xplore provides Web access to more than 3 million full-text documents from some of the world's most highly cited publications in electrical engineering, computer science and electronics.  The content in IEEE Xplore comprises over 160 journals, over 1,200 conference proceedings, more than 3,800 technical standards, over 1,000 eBooks and over 300 educational courses.  Approximately 25,000 new documents are added to IEEE Xplore each month.

Web of Science Search Interface



IEEE Xplore Digital Library Search Interface

The Universal Parsing Agent, also known as UPA, is another tool written by staff at Pacific Northwest National Laboratory.  It earned them an R&D 100 award in 2007.  The tool processes text documents, extracts information, and stores that information in XML markup files for further use by other software products.  By automating document processing, it gives users more time for analysis.  Templates are needed to parse the data, and the right template depends on the source of the information.  I have determined what my dataset format needs are, and by manipulating the template I can output the dataset in the correct format to be ingested into the visual analytic tools.  For IN-SPIRE and “Tool A” my preferred data format is XML, although they do take other formats.  They essentially use the same UPA template with some minor tweaking.

The data exported in RIS format is labeled with two-character tags: either two letters, or one letter and one number.  Here are some example RIS tags:
TY  - Type of reference (must be the first tag)
A2  - Secondary Author (each author on its own line preceded by the tag)
A3  - Tertiary Author (each author on its own line preceded by the tag)
A4  - Subsidiary Author (each author on its own line preceded by the tag)
AB  - Abstract
AD  - Author Address
AN  - Accession Number
AU  - Author (each author on its own line preceded by the tag)
C1  - Custom 1
C2  - Custom 2
C3  - Custom 3
C4  - Custom 4
C5  - Custom 5
C6  - Custom 6
C7  - Custom 7
C8  - Custom 8
CA  - Caption
CN  - Call Number
CY  - Place Published
DA  - Date
DB  - Name of Database
DO  - DOI
DP  - Database Provider
EP  - End Page
ET  - Edition
IS  - Number 
J2  - Alternate Title (this field is used for the abbreviated title of a book or journal name)
KW  - Keywords (keywords should be entered each on its own line preceded by the tag)
L1  - File Attachments (this is a link to a local file on the user's system, not a URL link)
L4  - Figure (this is also meant to be a link to a local file on the user's system and not a URL link)
LA  - Language
LB  - Label
M1  - Number
M3  - Type of Work
N1  - Notes
NV  - Number of Volumes
OP  - Original Publication
PB  - Publisher
PY  - Year
RI  - Reviewed Item
RN  - Research Notes
RP  - Reprint Edition
SE  - Section
SN  - ISBN/ISSN
SP  - Start Page
ST  - Short Title
T2  - Secondary Title
T3  - Tertiary Title
TA  - Translated Author
TI  - Title
TT  - Translated Title
UR  - URL
VL  - Volume
Y2  - Access Date
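
To show how these tags look in practice, here is a minimal Python sketch that reads RIS text and swaps a few tags for full names.  It is not the UPA template itself.  The function name `parse_ris`, the mapping `RIS_FULL_NAMES`, the chosen full names, and the sample record are all made up for illustration.  It assumes each record starts with a TY line and ends with an ER line, which is the usual RIS convention.

```python
import re

# Hypothetical subset of the tag list above, mapped to readable names.
RIS_FULL_NAMES = {
    "TY": "TypeOfReference",
    "AU": "Author",
    "TI": "Title",
    "AB": "Abstract",
    "KW": "Keywords",
    "PY": "Year",
    "DO": "DOI",
}

# Tags that appear once per value (one author or keyword per line).
REPEATABLE = {"AU", "KW"}

# A tag is two characters, then two spaces, a hyphen, and the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Return a list of records, each a dict keyed by full field name."""
    records = []
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue  # skip blank or continuation lines in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            current = {}  # TY opens a new record
            records.append(current)
        if tag == "ER":
            current = None  # ER closes the record
            continue
        if current is None:
            continue
        name = RIS_FULL_NAMES.get(tag, tag)  # unknown tags keep their code
        if tag in REPEATABLE:
            current.setdefault(name, []).append(value)
        else:
            current[name] = value
    return records


# Made-up sample record, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - An example title
PY  - 2013
ER  - 
"""

records = parse_ris(SAMPLE)
```

Tags that are not in the mapping keep their two-character code, so nothing is dropped while the mapping is still being filled in.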

I preferred the tags in the XML markup files to have full names so that they are more user friendly.  So I spent time editing the templates to make them more readable, and exported the datasets with these full names to make it easier to distinguish the entities in the visualization tools.  Other things I worked on this week included contacting some scientists about Open Source Data searching, features, and data downloads, and getting help converting XML data into CSV to import into Analyst Notebook.
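
For the XML to CSV step, a rough Python sketch might look like the following.  I am assuming a simple layout where every `record` element holds one child element per field, named with the full field name.  That layout, the function name `xml_to_csv`, and the `record` tag are assumptions for illustration, not the actual UPA output format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, record_tag="record", joiner="; "):
    """Flatten <record> elements into CSV text, one row per record."""
    root = ET.fromstring(xml_text)
    rows = []
    columns = []  # column order follows first appearance in the XML
    for record in root.iter(record_tag):
        row = {}
        for child in record:
            value = (child.text or "").strip()
            if child.tag in row:
                # Repeated fields (e.g. several authors) share one cell.
                row[child.tag] = row[child.tag] + joiner + value
            else:
                row[child.tag] = value
            if child.tag not in columns:
                columns.append(child.tag)
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # fields missing from a record become empty cells
    return out.getvalue()


# Made-up sample input, for illustration only.
sample_xml = (
    "<records>"
    "<record><Title>A</Title><Author>X</Author><Author>Y</Author></record>"
    "<record><Title>B</Title><Year>2013</Year></record>"
    "</records>"
)
csv_text = xml_to_csv(sample_xml)
```

Joining repeated fields into one cell keeps one row per document, which is usually what a spreadsheet-style import expects; a tool that wants one row per author would need a different flattening.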
