Web of Science Search Interface
IEEE Xplore Digital Library Search Interface
The Universal Parsing Agent (UPA) is another tool written by staff at Pacific Northwest National Laboratory; it earned them an R&D 100 Award in 2007. UPA processes text documents, extracts information, and stores that information in XML markup files for use by other software products. Automating document processing gives users more time for analysis. Parsing requires a template, and the template depends on the source of the information. I have determined what dataset format I need, and by manipulating the template I can output the dataset in the correct format to be ingested into the visual analytic tools. For IN-SPIRE and “Tool A” my preferred data format is XML, although both tools accept other formats. They use essentially the same UPA template with some minor tweaking.
Data exported in RIS format is labeled with two-character tags, either two letters or one letter and one number. Here are some example RIS tags:
TY - Type of reference (must be the first tag)
A2 - Secondary Author (each author on its own line preceded by the tag)
A3 - Tertiary Author (each author on its own line preceded by the tag)
A4 - Subsidiary Author (each author on its own line preceded by the tag)
AB - Abstract
AD - Author Address
AN - Accession Number
AU - Author (each author on its own line preceded by the tag)
C1 - Custom 1
C2 - Custom 2
C3 - Custom 3
C4 - Custom 4
C5 - Custom 5
C6 - Custom 6
C7 - Custom 7
C8 - Custom 8
CA - Caption
CN - Call Number
CY - Place Published
DA - Date
DB - Name of Database
DO - DOI
DP - Database Provider
EP - End Page
ET - Edition
IS - Number
J2 - Alternate Title (this field is used for the abbreviated title of a book or journal name)
KW - Keywords (keywords should be entered each on its own line preceded by the tag)
L1 - File Attachments (this is a link to a local file on the user's system, not a URL link)
L4 - Figure (this is also meant to be a link to a local file on the user's system and not a URL link)
LA - Language
LB - Label
M1 - Number
M3 - Type of Work
N1 - Notes
NV - Number of Volumes
OP - Original Publication
PB - Publisher
PY - Year
RI - Reviewed Item
RN - Research Notes
RP - Reprint Edition
SE - Section
SN - ISBN/ISSN
SP - Start Page
ST - Short Title
T2 - Secondary Title
T3 - Tertiary Title
TA - Translated Author
TI - Title
TT - Translated Title
UR - URL
VL - Volume
Y2 - Access Date
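To show how these tags structure a record, here is a minimal Python sketch that reads RIS text into a list of records. It is not how UPA works internally; it is only an illustration. The function name parse_ris is hypothetical, and the sketch assumes that each record starts with TY and ends with the standard ER (end of reference) tag, which is not in the list above.

```python
import re
from collections import defaultdict

# A tag line is two characters (letter, then letter or digit), whitespace,
# a hyphen, and the value. The amount of whitespace varies between exports,
# so the pattern is deliberately tolerant.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Return a list of records; each maps a tag to the list of its values."""
    records = []
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            # Assumed end-of-record marker: close the open record, if any.
            if current is not None:
                records.append(dict(current))
            current = None
        elif tag == "TY":
            # TY must be the first tag, so it opens a new record.
            if current is not None:
                records.append(dict(current))
            current = defaultdict(list)
            current[tag].append(value)
        elif current is not None:
            # Repeated tags such as AU or KW accumulate in order.
            current[tag].append(value)
    if current is not None:
        records.append(dict(current))
    return records


sample = """TY  - JOUR
AU  - Smith, J.
AU  - Doe, A.
TI  - An example title
PY  - 2010
ER  - 
"""
records = parse_ris(sample)
```

Storing every value in a list keeps repeated tags such as AU and KW intact, since each author or keyword appears on its own line.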
I preferred the tags in the XML markup files to have full names, which is more user friendly. So I spent time editing the templates to make them more readable and exported the datasets with these full names, which makes it easier to distinguish the entities in the visualization tools. Other things I worked on this week included contacting some scientists about Open Source Data searching, features, and data downloads, and getting help converting XML data into CSV to import into Analyst Notebook.
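The renaming and the XML-to-CSV step can be sketched in Python as follows. This is an illustration under stated assumptions, not the UPA template or the conversion I was actually given. The names FULL_NAMES, records_to_xml, and xml_to_csv, the element names (Records, Record, Author, and so on), and the "; " separator for repeated values are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from short RIS tags to readable element names.
# Only a few tags are shown; unmapped tags keep their two-character name.
FULL_NAMES = {
    "TY": "ReferenceType",
    "AU": "Author",
    "TI": "Title",
    "PY": "Year",
    "KW": "Keyword",
    "AB": "Abstract",
}


def records_to_xml(records):
    """Build an XML tree with one Record element per input record."""
    root = ET.Element("Records")
    for record in records:
        node = ET.SubElement(root, "Record")
        for tag, values in record.items():
            name = FULL_NAMES.get(tag, tag)
            for value in values:
                # Repeated tags (authors, keywords) become repeated elements.
                ET.SubElement(node, name).text = value
    return root


def xml_to_csv(root, columns):
    """Flatten each Record into one CSV row, joining repeated values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for node in root.findall("Record"):
        row = [
            "; ".join(child.text or "" for child in node.findall(column))
            for column in columns
        ]
        writer.writerow(row)
    return buffer.getvalue()


example = [{"TY": ["JOUR"], "AU": ["Smith, J.", "Doe, A."],
            "TI": ["An example title"], "PY": ["2010"]}]
tree = records_to_xml(example)
print(ET.tostring(tree, encoding="unicode"))
print(xml_to_csv(tree, ["Title", "Author", "Year"]))
```

Joining repeated values into one cell is only one option; a tool that expects one author per row would need a different flattening.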