Description

The following metadata fields can be extracted from a DESCRIPTION file.
These fields are defined in the DESCRIPTION specification and are mapped according to the CodeMeta crosswalk for DESCRIPTION files in R packages.

| Software metadata category | SOMEF metadata JSON path | DESCRIPTION metadata file field |
|---|---|---|
| authors | authors[i].result.value | Authors@R (1) |
| authors | authors[i].result.email | Authors@R (2) |
| code_repository | code_repository[i].result.value | URL (3) |
| description | description[i].result.value | Description (4) |
| has_package_file | has_package_file[i].result.value | URL of the DESCRIPTION file |
| homepage | homepage[i].result.value | URL (3) |
| issue_tracker | issue_tracker[i].result.value | BugReports (5) |
| license | license[i].result.value | License (6) |
| package_id | package_id[i].result.value | Package (7) |
| version | version[i].result.value | Version (8) |

(1), (2) Authors@R
- Regex 1: `r'Authors@R:\s*c\(([\s\S]*?)\)\s*$'` → group[1]
- Regex 2: find all persons in group[1] and extract the first name (or organization), last name, and email.
- Example:

        Authors@R: c(
            person("Hadley", "Wickham", , "hadley@posit.co", role = "aut",
                comment = c(ORCID = "0000-0003-4757-117X")),
            person("Winston", "Chang", role = "aut",
                comment = c(ORCID = "0000-0002-1576-2126"))
        )

Result:

{'result': {'value': 'Hadley Wickham', 'type': 'Agent', 'email': 'hadley@posit.co'}, 'confidence': 1, 'technique': 'code_parser', 'source': 'https://example.org/DESCRIPTION'}, {'result': {'value': 'Winston Chang', 'type': 'Agent'}, 'confidence': 1, 'technique': 'code_parser', 'source': 'https://example.org/DESCRIPTION'}
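The two-step author extraction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not SOMEF's actual code: the `parse_authors` helper, the `QUOTED` and `EMAIL` patterns, and the use of `re.MULTILINE` are illustrative, and the outer pattern is assumed to wrap the captured group in escaped literal parentheses. It also assumes that names are passed as the first positional arguments of `person()`; named arguments such as `given =` are not handled.

```python
import re

# Step 1: capture everything inside "Authors@R: c( ... )".
# re.MULTILINE (an assumption) lets "$" match at the end of a line.
AUTHORS_BLOCK = re.compile(r'Authors@R:\s*c\(([\s\S]*?)\)\s*$', re.MULTILINE)
# Step 2 helpers (hypothetical): quoted strings and a simple email shape.
QUOTED = re.compile(r'"([^"]*)"')
EMAIL = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def parse_authors(content, source):
    """Return one SOMEF-style entry per person() call found in Authors@R."""
    block = AUTHORS_BLOCK.search(content)
    if block is None:
        return []
    entries = []
    # Each chunk after "person(" holds the arguments of one person() call.
    for chunk in block.group(1).split('person(')[1:]:
        # Keep only the positional part, before the first named argument
        # (role = ..., comment = ...), so ORCID strings are ignored.
        positional = re.split(r'\b\w+\s*=', chunk, maxsplit=1)[0]
        strings = QUOTED.findall(positional)
        names = [s for s in strings if s and '@' not in s]
        emails = [s for s in strings if EMAIL.fullmatch(s)]
        result = {'value': ' '.join(names[:2]), 'type': 'Agent'}
        if emails:
            result['email'] = emails[0]
        entries.append({
            'result': result,
            'confidence': 1,
            'technique': 'code_parser',
            'source': source,
        })
    return entries


SAMPLE = '''Authors@R: c(
    person("Hadley", "Wickham", , "hadley@posit.co", role = "aut",
        comment = c(ORCID = "0000-0003-4757-117X")),
    person("Winston", "Chang", role = "aut",
        comment = c(ORCID = "0000-0002-1576-2126"))
)
'''

for entry in parse_authors(SAMPLE, 'https://example.org/DESCRIPTION'):
    print(entry)
```

For the sample input, the two entries carry the same values as the Result shown above. An organization given as a single string would end up as the whole `value`, since only one name string is found.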

(3) URL
- Regex: `r'URL:\s*([^\n]+(?:\n\s+[^\n]+)*)'`
- If a URL contains github.com or gitlab.com, it is mapped to code_repository.
- Otherwise, it is mapped to homepage.

Example:

        URL: https://ggplot2.tidyverse.org,
            https://github.com/tidyverse/ggplot2

Result code_repository: {'result': {'value': 'https://github.com/tidyverse/ggplot2', 'type': 'Url'}}
Result homepage: {'result': {'value': 'https://ggplot2.tidyverse.org', 'type': 'Url'}}
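The URL rule above can be sketched in Python as follows. This is a minimal sketch: the `classify_urls` helper and the splitting of the captured value on commas and whitespace are assumptions made for illustration, not a description of SOMEF's internals.

```python
import re

# Captures the first line of the URL field plus any indented continuation lines.
URL_FIELD = re.compile(r'URL:\s*([^\n]+(?:\n\s+[^\n]+)*)')


def classify_urls(content):
    """Split the URL field and sort each entry into code_repository or homepage."""
    classified = {'code_repository': [], 'homepage': []}
    match = URL_FIELD.search(content)
    if match is None:
        return classified
    # Entries are assumed to be separated by commas and/or whitespace,
    # including the line breaks of continuation lines.
    urls = [u for u in re.split(r'[,\s]+', match.group(1)) if u]
    for url in urls:
        is_repo = 'github.com' in url or 'gitlab.com' in url
        key = 'code_repository' if is_repo else 'homepage'
        classified[key].append({'result': {'value': url, 'type': 'Url'}})
    return classified


SAMPLE = (
    'Package: ggplot2\n'
    'URL: https://ggplot2.tidyverse.org,\n'
    '    https://github.com/tidyverse/ggplot2\n'
    'BugReports: https://github.com/tidyverse/ggplot2/issues\n'
)

print(classify_urls(SAMPLE))
```

Because the continuation part of the pattern requires leading whitespace, the unindented `BugReports:` line that follows is not pulled into the URL value.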

(4) Description
- Regex: `r'Description:\s*([^\n]+(?:\n\s+[^\n]+)*)'`
- Example:

        Description: A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

- Result:

        A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
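In a real DESCRIPTION file the Description value is usually wrapped over several indented lines, which is what the continuation part of the regex handles. A minimal Python sketch follows; the `parse_description` helper is hypothetical, and collapsing the wrapped lines into single spaces is an assumption about the desired output rather than documented behavior.

```python
import re

# First line of the field plus any continuation lines that start with whitespace.
DESCRIPTION_FIELD = re.compile(r'Description:\s*([^\n]+(?:\n\s+[^\n]+)*)')


def parse_description(content):
    """Return the Description value as one line, or None if the field is absent."""
    match = DESCRIPTION_FIELD.search(content)
    if match is None:
        return None
    # Collapse line breaks and indentation of continuation lines (assumption).
    return ' '.join(match.group(1).split())


SAMPLE = (
    'Package: ggplot2\n'
    "Description: A system for 'declaratively' creating graphics,\n"
    '    based on "The Grammar of Graphics".\n'
    'License: MIT + file LICENSE\n'
)

print(parse_description(SAMPLE))
```

The match stops at the `License:` line because that line does not begin with whitespace.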

(5) BugReports
- Regex: `r'BugReports:\s*([^\n]+)'`
- Example: `BugReports: https://github.com/tidyverse/ggplot2/issues`
- Result: `https://github.com/tidyverse/ggplot2/issues`

(6) License
- Regex: `r'License:\s*([^\n]+)'`
- Example: `License: MIT + file LICENSE`
- Result: `MIT + file LICENSE`

(7) Package
- Regex: `r'Package:\s*([^\n]+)'`
- Example: `Package: ggplot2`
- Result: `ggplot2`

(8) Version
- Regex: `r'Version:\s*([^\n]+)'`
- Example: `Version: 2.0.0.9000`
- Result: `2.0.0.9000`
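The single-line fields (BugReports, License, Package, Version) all follow the same pattern, so they can be handled by one loop. The sketch below is illustrative only: the `SINGLE_LINE_FIELDS` mapping and the `parse_single_line_fields` helper are hypothetical names, and stripping trailing whitespace from each value is an assumption.

```python
import re

# SOMEF category -> regex for the corresponding single-line DESCRIPTION field.
SINGLE_LINE_FIELDS = {
    'issue_tracker': r'BugReports:\s*([^\n]+)',
    'license': r'License:\s*([^\n]+)',
    'package_id': r'Package:\s*([^\n]+)',
    'version': r'Version:\s*([^\n]+)',
}


def parse_single_line_fields(content):
    """Return the value of each single-line field that is present in content."""
    values = {}
    for category, pattern in SINGLE_LINE_FIELDS.items():
        match = re.search(pattern, content)
        if match:
            # Everything up to the end of the line, without trailing whitespace.
            values[category] = match.group(1).strip()
    return values


SAMPLE = (
    'Package: ggplot2\n'
    'Version: 2.0.0.9000\n'
    'License: MIT + file LICENSE\n'
    'BugReports: https://github.com/tidyverse/ggplot2/issues\n'
)

print(parse_single_line_fields(SAMPLE))
```

Note that these patterns are not anchored to the start of a line, so a stricter variant might prefix each one with `^` and compile it with `re.MULTILINE`.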