Supported Metadata Files
This project supports extracting metadata from specific types of files commonly used to declare authorship and contribution in open source repositories.
Supported Metadata Files in SOMEF
SOMEF can extract metadata from a wide range of files commonly found in software repositories. Below is a list of supported file types, along with clickable examples from real projects:
| File Name | Language | Description | Detail | Source Spec. | Version Spec. | Example |
|---|---|---|---|---|---|---|
| AUTHORS.md | General | Lists contributors, authors, and affiliations relevant to the project | 📄 | | | Example |
| pom.xml | Java / Maven | Project configuration file containing metadata and dependencies | 📄 | | 4.0.0 | Example |
| bower.json | JavaScript (Bower) | Package descriptor for packages that can be used as dependencies in Bower-managed front-end projects | 📄 | | | Example |
| package.json | JavaScript / Node.js | Defines metadata, scripts, and dependencies for Node.js projects | 📄 | | 10.9.4 | Example |
| codemeta.json | JSON-LD | Metadata file for research software using JSON-LD vocabulary | 📄 | | v3.0 | Example |
| readme.md | Markdown | Main documentation file of the repository | | | | Example |
| composer.json | PHP | Manifest file that serves as the package descriptor for PHP projects | 📄 | | 2.8.12 | Example |
| juliaProject.toml | Julia | Defines the package metadata and dependencies for Julia projects, used by the Pkg package manager | 📄 | | | Example |
| pyproject.toml | Python | Modern Python project configuration file used by tools like Poetry and Flit | 📄 | | | Example |
| requirements.txt | Python | Lists Python package dependencies | 📄 | | 25.2 | Example |
| setup.py | Python | Package build and metadata script used in Python projects | 📄 | | | Example |
| DESCRIPTION | R | Metadata file for R packages including title, author, and version | 📄 | | | Example |
| *.gemspec | Ruby | Manifest file that serves as the package descriptor for Ruby gem projects | 📄 | | | Example |
| cargo.toml | Rust | Manifest file that serves as the package descriptor for Rust projects | 📄 | | | Example |
| *.cabal | Haskell | Manifest file that serves as the package descriptor for Haskell projects | 📄 | | | Example |
| dockerfile | Dockerfile | Build specification file for container images that can include software metadata via LABEL instructions (OCI specification) | 📄 | | | Example |
| publiccode.yml | YAML | YAML metadata file for public sector software projects | 📄 | | | Example |
Note: The general principles behind metadata mapping in SOMEF are based on the CodeMeta crosswalk and the CodeMeta JSON-LD context.
However, each supported file type may have specific characteristics and field interpretations.
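As a rough illustration of how the file names in the table above could be used to locate candidate metadata files in a repository listing, the sketch below matches file names against those patterns. The helper name `find_metadata_files`, the pattern list, and the case-insensitive comparison are assumptions made for this example; they are not taken from the SOMEF code base.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns modeled on the table of supported files, lower-cased for matching.
# Assumption: matching is case-insensitive and based on the file name only.
SUPPORTED_PATTERNS = [
    "authors.md", "pom.xml", "bower.json", "package.json", "codemeta.json",
    "readme.md", "composer.json", "juliaproject.toml", "pyproject.toml",
    "requirements.txt", "setup.py", "description", "*.gemspec", "cargo.toml",
    "*.cabal", "dockerfile", "publiccode.yml",
]


def find_metadata_files(paths):
    """Return the paths whose file name matches a supported pattern (hypothetical helper)."""
    matches = []
    for path in paths:
        # Compare only the final path component, lower-cased.
        name = PurePosixPath(path).name.lower()
        if any(fnmatchcase(name, pattern) for pattern in SUPPORTED_PATTERNS):
            matches.append(path)
    return matches


repo_files = ["src/main.py", "Cargo.toml", "docs/index.md", "mylib.gemspec", "DESCRIPTION"]
print(find_metadata_files(repo_files))  # ['Cargo.toml', 'mylib.gemspec', 'DESCRIPTION']
```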
Types of metadata in SOMEF
| Type | Metadata Category |
|---|---|
| Agent | authors |
| Keywords | keywords |
| License | license |
| Release | version |
| Software_application | requirements |
| String | description |
| String | name |
| String | package_id |
| String | runtime_platform |
| Url | has_package_file |
| Url | homepage |
| Url | issue_tracker |
| Url | package_distribution |
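To make the pairing of type and category more concrete, here is a minimal sketch that tags a few fields of a `package.json`-style document with the types listed in the table. The helper `to_entries` and the `{"value", "type"}` shape are illustrative assumptions, and the sketch relies on the field name being identical to the category name, which holds for the keys used here but not in general.

```python
import json

# Types from the table above, keyed by metadata category.
# Only categories whose name matches a package.json key are included here.
CATEGORY_TYPES = {
    "name": "String",
    "description": "String",
    "version": "Release",
    "homepage": "Url",
    "keywords": "Keywords",
    "license": "License",
}


def to_entries(package):
    """Tag each known, non-empty field with its type (hypothetical helper)."""
    entries = {}
    for category, type_name in CATEGORY_TYPES.items():
        value = package.get(category)
        if value:  # skip missing or empty fields
            entries[category] = {"value": value, "type": type_name}
    return entries


package_json = json.loads('{"name": "demo", "version": "1.2.0", "license": "MIT"}')
entries = to_entries(package_json)
print(entries["version"])  # {'value': '1.2.0', 'type': 'Release'}
```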
Example: Dependency Metadata Extraction from Configuration Files
SOMEF parses configuration files like pom.xml to extract structured metadata about software dependencies and other requirements.
Source File: Snippet from Widoco/pom.xml
Below is the XML fragment for the Maven dependency that is being parsed:
<dependencies>
  <dependency>
    <groupId>org.apache.maven</groupId>
    <artifactId>maven-model</artifactId>
    <version>3.9.0</version>
  </dependency>
</dependencies>
The following Python code snippet shows the logic used by the SOMEF parser to transform the XML elements into the JSON metadata structure:
# Emit one requirements entry per <dependency> element found in pom.xml.
if project_data["dependencies"]:
    for dependency in project_data["dependencies"]:
        metadata_result.add_result(
            constants.CAT_REQUIREMENTS,
            {
                # "groupId.artifactId"; strip(".") drops the stray dot if either part is missing
                "value": f'{dependency.get("groupId", "")}.{dependency.get("artifactId", "")}'.strip("."),
                "name": dependency.get("artifactId", ""),
                "version": dependency.get("version", ""),
                "type": constants.SOFTWARE_APPLICATION
            },
            1,  # confidence
            constants.TECHNIQUE_CODE_CONFIG_PARSER,
            source
        )
After the mapping logic is applied, the metadata for the dependency is stored under the requirements category (constants.CAT_REQUIREMENTS) with the following JSON structure:
```json
"requirements": [
    {
        "result": {
            "value": "org.apache.maven.maven-model",
            "name": "maven-model",
            "version": "3.9.0",
            "type": "Software_application"
        },
        "confidence": 1,
        "technique": "code_parser",
        "source": "https://raw.githubusercontent.com/dgarijo/Widoco/master/pom.xml"
    },
```
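The path from the XML fragment to the requirements entry can be sketched end to end with the standard library alone. This is a self-contained approximation, not the SOMEF implementation: `parse_dependencies` and `to_requirement` are hypothetical helpers, the literal strings stand in for the `constants` values, and the fragment is assumed to carry no XML namespace (a full pom.xml usually declares one, in which case the tag names would need the namespace prefix).

```python
import xml.etree.ElementTree as ET

# The fragment shown above, without the Maven namespace declaration.
POM_FRAGMENT = """
<dependencies>
  <dependency>
    <groupId>org.apache.maven</groupId>
    <artifactId>maven-model</artifactId>
    <version>3.9.0</version>
  </dependency>
</dependencies>
"""


def parse_dependencies(xml_text):
    """Turn each <dependency> element into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: (child.text or "").strip() for child in dep}
        for dep in root.iter("dependency")
    ]


def to_requirement(dependency, source):
    """Build one entry shaped like the JSON structure above (hypothetical helper)."""
    group_id = dependency.get("groupId", "")
    artifact_id = dependency.get("artifactId", "")
    return {
        "result": {
            # Same joining rule as the parser snippet: "groupId.artifactId"
            "value": f"{group_id}.{artifact_id}".strip("."),
            "name": artifact_id,
            "version": dependency.get("version", ""),
            "type": "Software_application",
        },
        "confidence": 1,
        "technique": "code_parser",
        "source": source,
    }


requirements = [to_requirement(d, "pom.xml") for d in parse_dependencies(POM_FRAGMENT)]
print(requirements[0]["result"]["value"])  # org.apache.maven.maven-model
```

Separating parsing from mapping keeps the mapping rule testable on plain dictionaries, independent of how the XML was read.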