<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://rs.gbif.org/schema/eml-gbif-profile/1.1/eml.xsd"
         packageId="72cedfec-7360-4cd8-b79b-6a65e3f4dd78/v1.4" system="http://gbif.org" scope="system"
         xml:lang="eng">

<dataset>
  <alternateIdentifier>72cedfec-7360-4cd8-b79b-6a65e3f4dd78</alternateIdentifier>
  <alternateIdentifier>https://training-ipt-b.gbif.org/resource?r=uc3_birds_fallen_at_danish_lighthouses</alternateIdentifier>
  <title xml:lang="eng">uc3_birds_fallen_at_danish_lighthouses</title>
      <creator>
    <individualName>
        <givenName>Bello</givenName>
      <surName>Danmallam</surName>
    </individualName>
    <organizationName>A.P. Leventis Ornithological Research Institute (APLORI); Africa Bird Atlas Project (ABAP)</organizationName>
    <positionName>Database Administrator</positionName>
    <address>
        <country>NG</country>
    </address>
    <phone>+2348038175878</phone>
    <electronicMailAddress>adamubello001@gmail.com</electronicMailAddress>
    <onlineUrl>https://bdanmallam.com</onlineUrl>
          <userId directory="https://www.linkedin.com/profile/view?id=">bdanmallam</userId>
      </creator>
      <metadataProvider>
    <individualName>
        <givenName>Bello</givenName>
      <surName>Danmallam</surName>
    </individualName>
    <organizationName>A.P. Leventis Ornithological Research Institute (APLORI); Africa Bird Atlas Project (ABAP)</organizationName>
    <positionName>Database Administrator</positionName>
    <address>
        <country>NG</country>
    </address>
    <phone>+2348038175878</phone>
          <userId directory="https://www.linkedin.com/profile/view?id=">talatu.tende</userId>
      </metadataProvider>
      <associatedParty>
    <individualName>
        <givenName>Michael</givenName>
      <surName>Brooks</surName>
    </individualName>
    <organizationName>University of Cape Town</organizationName>
    <role></role>
      </associatedParty>
  <pubDate>
      2021-07-18
  </pubDate>
  <language>eng</language>
  <abstract>
    <para>This occurrence dataset, with its associated metadata, covers birds and was obtained from the literature source “Birds fallen at Danish Lighthouses, 1883–1939” (in Danish, ‘Fuglene ved de danske Fyr, 1883–1939’). The data were collected in Denmark during nights of bird migration (1883–1939) and documented in the book &apos;Fuglene ved de danske Fyr&apos;. To digitize the data, a copy of the book was scanned to PDF files using OCR (Optical Character Recognition), and the contents of the PDF files were then transferred into spreadsheets as individual records.</para>
    <para>The data published here come from two of the 45 lighthouses in Denmark (Lodbjerg Fyr and Hanstholm Fyr), for a total of 1212 records: 742 from Lodbjerg Fyr and 470 from Hanstholm Fyr.</para>
  </abstract>
      <keywordSet>
            <keyword>Occurrence</keyword>
        <keywordThesaurus>GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type_2015-07-10.xml</keywordThesaurus>
      </keywordSet>
      <keywordSet>
            <keyword>Observation</keyword>
        <keywordThesaurus>GBIF Dataset Subtype Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_subtype.xml</keywordThesaurus>
      </keywordSet>
  <intellectualRights>
    <para>This work is licensed under a <ulink url="http://creativecommons.org/licenses/by/4.0/legalcode"><citetitle>Creative Commons Attribution (CC-BY) 4.0 License</citetitle></ulink>.</para>
  </intellectualRights>
  <distribution scope="document">
    <online>
      <url function="information">https://danbif.dk/</url>
    </online>
  </distribution>
  <coverage>
      <geographicCoverage>
          <geographicDescription>Denmark</geographicDescription>
        <boundingCoordinates>
          <westBoundingCoordinate>6.482</westBoundingCoordinate>
          <eastBoundingCoordinate>12.722</eastBoundingCoordinate>
          <northBoundingCoordinate>57.61</northBoundingCoordinate>
          <southBoundingCoordinate>54.432</southBoundingCoordinate>
        </boundingCoordinates>
      </geographicCoverage>
          <temporalCoverage>
              <rangeOfDates>
                  <beginDate>
                    <calendarDate>1883</calendarDate>
                  </beginDate>
                <endDate>
                  <calendarDate>1939</calendarDate>
                </endDate>
              </rangeOfDates>
          </temporalCoverage>
  </coverage>
  <maintenance>
    <description>
      <para></para>
    </description>
    <maintenanceUpdateFrequency>unkown</maintenanceUpdateFrequency>
  </maintenance>

      <contact>
    <individualName>
        <givenName>Bello</givenName>
      <surName>Danmallam</surName>
    </individualName>
    <organizationName>A.P. Leventis Ornithological Research Institute (APLORI); Africa Bird Atlas Project (ABAP)</organizationName>
    <positionName>Database Administrator</positionName>
    <address>
        <country>NG</country>
    </address>
    <phone>+2348038175878</phone>
    <electronicMailAddress>adamubello001@gmail.com</electronicMailAddress>
    <onlineUrl>https://bdanmallam.com</onlineUrl>
          <userId directory="https://www.linkedin.com/profile/view?id=">bdanmallam</userId>
      </contact>
      <contact>
    <individualName>
        <givenName>Samuel T.</givenName>
      <surName>Ivande</surName>
    </individualName>
    <organizationName>A.P. Leventis Ornithological Research Institute (APLORI); Africa Bird Atlas Project (ABAP)</organizationName>
    <positionName>Scientific Director</positionName>
    <electronicMailAddress>ivande.sam@gmail.com</electronicMailAddress>
      </contact>
      <contact>
    <individualName>
        <givenName>Ulf</givenName>
      <surName>Ottoson</surName>
    </individualName>
    <organizationName>A.P. Leventis Ornithological Research Institute (APLORI); Africa Bird Atlas Project (ABAP)</organizationName>
    <positionName>Project coordinator</positionName>
    <address>
        <country>SE</country>
    </address>
      </contact>
      <contact>
    <individualName>
        <givenName>Talatu</givenName>
      <surName>Tende</surName>
    </individualName>
    <organizationName>A.P. Leventis Ornithological Research Institute</organizationName>
    <positionName>Project Manager</positionName>
      </contact>
  <methods>
        <methodStep>
          <description>
            <para>The dataset was digitized from a copy of the book &quot;Fuglene ved de danske Fyr&quot;. The book was scanned to PDF files using OCR (Optical Character Recognition), and the contents of the PDF files were then transferred into spreadsheets as individual records.

To assess data quality, Excel and OpenRefine were used. In addition, the web tools Canadensys coordinate conversion (http://data.canadensys.net/tools/coordinates), InfoXY (http://splink.cria.org.br/infoxy?criaLANG=en), and Global Names Resolver (http://resolver.globalnames.org) were used alongside Excel and OpenRefine to clean and standardize the data.

For dates, some records did not follow the conventional YYYY-MM-DD format, and this was corrected in Excel: I selected the affected cells, opened Format Cells, chose the Date category, and selected the YYYY-MM-DD format. The years 2038 and 2093 fell outside the dataset bounds (1883–1939). The records were checked carefully, and these appear to be oversights, since the ‘year’ column gives 1938 and 1893 for them. They were edited manually.

To check for spelling errors, OpenRefine was used. After launching the software, which opened in my browser, I clicked Choose Files &gt; Next, ticked &lt;Trim leading &amp; trailing whitespace from strings&gt;, and clicked Create Project. The project was created with 1212 rows loaded and 10 rows shown by default. Using the triangular column menu (Facet, then Text facet), I found a spelling error: Sylvia borin misspelt as Slyvia borin. I corrected it manually by clicking edit in the cell and pasting the right spelling. In addition, one taxon was recorded under its German name, ‘Rindrossel’, rather than a scientific name. With the aid of Google Search and Google Translate, it was edited to &apos;Turdus torquatus&apos;.

Many empty cells were found in columns such as individualCount, sex, and lifeStage. These were left untouched because they could represent missing data or no observation at all. Similarly, Not Available (NA) values were noted and left untouched. In the disposition column, two records (240 and 887) appear to be outliers or recording errors, as all other records fall under 50. For more on missing values and outliers, see https://www.coursera.org/lecture/ibm-exploratory-data-analysis-for-machine-learning/handling-missing-values-and-outliers-9O50z
For geographic errors, five records had a negative sign in their latitude. These were corrected manually by removing the negative sign and validating the record with InfoXY (http://splink.cria.org.br/infoxy?criaLANG=en). The same tool was used to validate all other geographic records against their identified locality. The coordinates were also inconsistent in format: some records used conventional decimal degrees while others used degrees, minutes, and seconds (e.g. 56Â° 49&apos; 24.3408&apos;&apos; N, 8Â° 15&apos; 46.0404&apos;&apos; E). In these records, the degrees part of the latitude and longitude carried a stray capital ‘A’ with circumflex, and ‘E’ was used in the longitude instead of ‘W’. The ‘A’ with circumflex was removed and ‘E’ was replaced with ‘W’ in the longitude before conversion with https://data.canadensys.net/tools/coordinates, which was done by pasting the wrongly formatted coordinates into the tool and submitting.
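As a rough illustration of the degrees-minutes-seconds arithmetic behind that conversion (not the Canadensys tool itself), the calculation can be sketched in Python. The function name dms_to_decimal is hypothetical, and the sketch assumes each string holds degrees, minutes, and seconds followed by a hemisphere letter:

```python
import re

def dms_to_decimal(text):
    """Convert a string such as 56° 49' 24.3408'' N to signed decimal degrees."""
    # Pull out the three numeric parts; stray characters such as a
    # mis-encoded degree sign are simply not matched by the pattern.
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    degrees, minutes, seconds = (float(n) for n in numbers[:3])
    value = degrees + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative by convention.
    hemisphere = text.strip()[-1].upper()
    return -value if hemisphere in ("S", "W") else value

print(round(dms_to_decimal("56Â° 49' 24.3408'' N"), 6))
```

Because only the numeric parts and the final hemisphere letter are read, a stray ‘A’ with circumflex before the degree sign does not change the result.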

To validate names and populate the sheet with higher taxonomy from the GBIF API, I used OpenRefine. I clicked the column menu under speciesName, then Edit column, then Add column by fetching URLs, which opened the window Add column by fetching URLs based on column scientificName. I named the new column Api_name, changed the Throttle delay to 250 milliseconds, pasted the expression &quot;http://api.gbif.org/v1/species/match?verbose=true&amp;name=&quot;+escape(value,&apos;url&apos;), and clicked OK. This took some time to process and finally generated an Api_name value for each record.
The next step was to create a column for higher taxonomy from the new Api_name column. From the column menu under Api_name, I clicked Edit column, then Add column based on this column..., which opened the window Add column based on column Api_name. I named the new column higherTaxonomy and pasted the following expression:
value.parseJson().get(&quot;kingdom&quot;)+
&quot;, &quot;+value.parseJson().get(&quot;phylum&quot;)+
&quot;, &quot;+value.parseJson().get(&quot;class&quot;)+
&quot;, &quot;+value.parseJson().get(&quot;order&quot;)+
&quot;, &quot;+value.parseJson().get(&quot;family&quot;)  
In the preview section of the new window, the kingdom, phylum, class, order, and family of the 10 preview taxa appeared. I clicked OK, which created a single column holding all the higher-taxonomy ranks, separated by commas.
Next, the higherTaxonomy column was split into separate columns, one per taxonomic rank. From the higherTaxonomy column menu, I clicked Edit column, then Split into several columns…, chose the separator ‘,’, and clicked OK. The resulting columns all carried the same header, so I renamed each one through Edit column, then Rename this column, as kingdom, phylum, class, order, and family. These columns still had leading and trailing spaces, so on each of them I applied Edit cells, then Common transforms, then Trim leading and trailing whitespace. This removed the spaces, and I then exported the cleaned file as a comma-separated values (CSV) file.
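The URL-building and JSON-parsing steps above can also be approximated outside OpenRefine. The following is a minimal Python sketch, not the workflow actually used: the function names match_url and higher_taxonomy are hypothetical, and the sample record is a hand-written stand-in that assumes the match response carries kingdom, phylum, class, order, and family keys, as the expression above does.

```python
import json
from urllib.parse import urlencode

RANKS = ("kingdom", "phylum", "class", "order", "family")

def match_url(name):
    """Build the species-match URL for one scientific name."""
    # urlencode escapes the name and joins the two query parameters.
    query = urlencode({"verbose": "true", "name": name})
    return "http://api.gbif.org/v1/species/match?" + query

def higher_taxonomy(response_text):
    """Join the higher ranks of a match response with ', '."""
    record = json.loads(response_text)
    # Ranks missing from the response become empty strings.
    return ", ".join(str(record.get(rank, "")) for rank in RANKS)

# Hand-written sample for illustration, not a real API response.
sample = json.dumps({
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Aves",
    "order": "Passeriformes", "family": "Sylviidae",
})
print(match_url("Sylvia borin"))
print(higher_taxonomy(sample))
```

Splitting the joined string on a comma followed by a space, rather than on the comma alone, would avoid the leading spaces that had to be trimmed afterwards.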

</para>
          </description>
        </methodStep>
      <sampling>
        <studyExtent>
          <description>
            <para>Lighthouses, Denmark</para>
          </description>
        </studyExtent>
        <samplingDescription>
          <para>Birds seen at lighthouses in Denmark, and their activities, were recorded by the lighthouse keepers. Fallen birds were also collected and sent to a museum in Copenhagen, where they were carefully preserved and catalogued by collection managers. The keepers also documented weather conditions on the nights when birds were observed.
</para>
        </samplingDescription>
      </sampling>
      <qualityControl>
        <description>
          <para>The dataset was opened in both Excel and OpenRefine. In addition, the web tools Canadensys coordinate conversion (http://data.canadensys.net/tools/coordinates), InfoXY (http://splink.cria.org.br/infoxy?criaLANG=en), and Global Names Resolver (http://resolver.globalnames.org) were used alongside Excel and OpenRefine to clean and standardize the data.</para>
        </description>
      </qualityControl>
  </methods>
  <project id="BID-AF2020-039-REG">
    <title>Data Mobilization Project from Literature “Birds fallen at Danish Lighthouses, 1883–1939”</title>
      <personnel>
        <individualName>
            <givenName>Bello</givenName>
          <surName>Danmallam</surName>
        </individualName>
              <userId directory="https://www.linkedin.com/profile/view?id=">bdanmallam</userId>
        <role>publisher</role>
      </personnel>
      <abstract>
        <para>This occurrence dataset, with associated metadata, covers birds and was obtained from the literature source “Birds fallen at Danish Lighthouses, 1883–1939” (in Danish, ‘Fuglene ved de danske Fyr, 1883–1939’). The data were collected in Denmark during nights of bird migration (1883–1939) and documented in the book &quot;Fuglene ved de danske Fyr, 1895-1939&quot; (English: Birds at the Danish Lighthouses, 1895-1939). To digitize the data, a copy of the book was scanned to PDF files using OCR (Optical Character Recognition), and the contents of the PDF files were then transferred into spreadsheets as individual records. The data published here come from two of the 45 lighthouses in Denmark (Lodbjerg Fyr and Hanstholm Fyr), for a total of 1212 records: 742 from Lodbjerg Fyr and 470 from Hanstholm Fyr.</para>
      </abstract>
      <funding>
        <para>Funding type: State funding |
Country: Denmark
</para>
      </funding>
      <studyAreaDescription>
        <descriptor name="generic"
                    citableClassificationSystem="false">
          <descriptorValue>45 lighthouses in Denmark</descriptorValue>
        </descriptor>
      </studyAreaDescription>
      <designDescription>
        <description>
          <para>The presence and activities of birds were recorded at lighthouses in Denmark. Collected birds were carefully preserved and catalogued by collection managers at the Natural History Museum of Denmark (NHM-DK). The lighthouse keepers also documented weather conditions on the nights when birds were observed.</para>
        </description>
      </designDescription>
  </project>
</dataset>
  <additionalMetadata>
    <metadata>
      <gbif>
          <dateStamp>2021-07-14T09:17:41.329+00:00</dateStamp>
          <hierarchyLevel>dataset</hierarchyLevel>
            <citation>Danmallam B (2021): uc3_birds_fallen_at_danish_lighthouses. v1.4. Training Organization. Dataset/Occurrence. https://training-ipt-b.gbif.org/resource?r=uc3_birds_fallen_at_danish_lighthouses&amp;v=1.4</citation>
          <dc:replaces>72cedfec-7360-4cd8-b79b-6a65e3f4dd78/v1.4.xml</dc:replaces>
      </gbif>
    </metadata>
  </additionalMetadata>
</eml:eml>
