Supported Metadata Files

This project supports extracting metadata from specific types of files commonly used to declare authorship and contribution in open source repositories.

Supported Metadata Files in SOMEF

SOMEF can extract metadata from a wide range of files commonly found in software repositories. Below is a list of supported file types, along with clickable examples from real projects:

| File Name | Language | Description | Detail | Source Spec. | Version Spec. | Example |
|---|---|---|---|---|---|---|
| AUTHORS.md | General | Lists contributors, authors, and affiliations relevant to the project | | 📄 | | Example |
| pom.xml | Java / Maven | Project configuration file containing metadata and dependencies | | 📄 | 4.0.0 | Example |
| bower.json | JavaScript (Bower) | Package descriptor for packages that can be used as dependencies in Bower-managed front-end projects | | 📄 | | Example |
| package.json | JavaScript / Node.js | Defines metadata, scripts, and dependencies for Node.js projects | | 📄 | 10.9.4 | Example |
| codemeta.json | JSON-LD | Metadata file for research software using the JSON-LD vocabulary | | 📄 | v3.0 | Example |
| README.md | Markdown | Main documentation file of the repository | | | | Example |
| composer.json | PHP | Manifest file that serves as the package descriptor for PHP projects | | 📄 | 2.8.12 | Example |
| juliaProject.toml | Julia | Defines the package metadata and dependencies for Julia projects, used by the Pkg package manager | | 📄 | | Example |
| pyproject.toml | Python | Modern Python project configuration file used by tools like Poetry and Flit | | 📄 | | Example |
| requirements.txt | Python | Lists Python package dependencies | | 📄 | 25.2 | Example |
| setup.py | Python | Packaging script used in Python projects | | 📄 | | Example |
| DESCRIPTION | R | Metadata file for R packages including title, author, and version | | 📄 | | Example |
| *.gemspec | Ruby | Manifest file that serves as the package descriptor for Ruby gem projects | | 📄 | | Example |
| cargo.toml | Rust | Manifest file that serves as the package descriptor for Rust projects | | 📄 | | Example |
| *.cabal | Haskell | Manifest file that serves as the package descriptor for Haskell projects | | 📄 | | Example |
| dockerfile | Dockerfile | Build specification file for container images that can include software metadata via LABEL instructions (OCI specification) | | 📄 | | Example |
| publiccode.yml | YAML | YAML metadata file for public sector software projects | | 📄 | | Example |

Note: The general principles behind metadata mapping in SOMEF are based on the CodeMeta crosswalk and the CodeMeta JSON-LD context.
However, each supported file type may have specific characteristics and field interpretations.
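As a rough illustration of how the file names in the table could be used to locate candidate metadata files in a repository checkout, the sketch below matches file names against the listed patterns. It is not SOMEF's actual detection logic: `SUPPORTED_PATTERNS`, `is_supported`, and `find_metadata_files` are hypothetical names, and case-insensitive matching is an assumption of this sketch.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# File name patterns taken from the table above (hypothetical constant,
# not part of SOMEF).
SUPPORTED_PATTERNS = [
    "AUTHORS.md", "pom.xml", "bower.json", "package.json", "codemeta.json",
    "README.md", "composer.json", "juliaProject.toml", "pyproject.toml",
    "requirements.txt", "setup.py", "DESCRIPTION", "*.gemspec", "cargo.toml",
    "*.cabal", "dockerfile", "publiccode.yml",
]


def is_supported(file_name: str) -> bool:
    """Return True if the file name matches one of the listed patterns.

    Both sides are lowercased, so matching is case-insensitive
    (an assumption of this sketch).
    """
    name = file_name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in SUPPORTED_PATTERNS)


def find_metadata_files(repo_root: str) -> list[Path]:
    """Walk a local checkout and collect files whose names are supported."""
    root = Path(repo_root)
    return sorted(p for p in root.rglob("*") if p.is_file() and is_supported(p.name))
```

Calling `find_metadata_files(".")` on a local clone would return the paths whose names appear in the table, which can then be handed to the matching parser.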

Types of metadata in SOMEF

| Type | Metadata Category |
|---|---|
| Agent | authors |
| Keywords | keywords |
| License | license |
| Release | version |
| Software_application | requirements |
| String | description |
| String | name |
| String | package_id |
| String | runtime_platform |
| Url | has_package_file |
| Url | homepage |
| Url | issue_tracker |
| Url | package_distribution |
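The pairing of categories and types can be pictured as a simple lookup. The sketch below copies the pairs from the table and wraps a value in the entry shape used by the requirements example later on this page (`result`, `confidence`, `technique`, `source`). `CATEGORY_TYPES` and `make_entry` are hypothetical names introduced for illustration and are not part of SOMEF; the default technique and confidence values are likewise assumptions.

```python
# Category -> type pairs copied from the table above
# (hypothetical constant, not part of SOMEF).
CATEGORY_TYPES = {
    "authors": "Agent",
    "keywords": "Keywords",
    "license": "License",
    "version": "Release",
    "requirements": "Software_application",
    "description": "String",
    "name": "String",
    "package_id": "String",
    "runtime_platform": "String",
    "has_package_file": "Url",
    "homepage": "Url",
    "issue_tracker": "Url",
    "package_distribution": "Url",
}


def make_entry(category, value, source, technique="code_parser", confidence=1):
    """Build one result entry for a category (illustrative helper only)."""
    if category not in CATEGORY_TYPES:
        raise KeyError(f"Unknown metadata category: {category}")
    return {
        "result": {"value": value, "type": CATEGORY_TYPES[category]},
        "confidence": confidence,
        "technique": technique,
        "source": source,
    }


entry = make_entry("homepage", "https://example.org", "package.json")
```

Here `entry["result"]["type"]` is `"Url"`, because `homepage` is listed under the Url type.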

Example: Dependency Metadata Extraction from Configuration Files

SOMEF parses configuration files like pom.xml to extract structured metadata about software dependencies and other requirements.


Source File: Snippet from Widoco/pom.xml

Below is the XML fragment for the Maven dependency that is being parsed:

<dependencies>
  <dependency>
    <groupId>org.apache.maven</groupId>
    <artifactId>maven-model</artifactId>
    <version>3.9.0</version>
  </dependency>
</dependencies>

The following Python code snippet shows the logic used by the SOMEF parser to transform the XML elements into the JSON metadata structure:

        if project_data["dependencies"]:
            for dependency in project_data["dependencies"]:
                metadata_result.add_result(
                    constants.CAT_REQUIREMENTS, 
                    {
                        "value": f'{dependency.get("groupId", "")}.{dependency.get("artifactId", "")}'.strip("."),
                        "name": dependency.get("artifactId", ""),
                        "version": dependency.get("version", ""),
                        "type": constants.SOFTWARE_APPLICATION
                    },
                    1,
                    constants.TECHNIQUE_CODE_CONFIG_PARSER,
                    source
                )

After applying the mapping logic, the metadata for the dependency is stored under the requirements category (CAT_REQUIREMENTS in this case) with the following JSON structure:

```json
"requirements": [
    {
        "result": {
            "value": "org.apache.maven.maven-model",
            "name": "maven-model",
            "version": "3.9.0",
            "type": "Software_application"
        },
        "confidence": 1,
        "technique": "code_parser",
        "source": "https://raw.githubusercontent.com/dgarijo/Widoco/master/pom.xml"
    }
]
```
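To try the mapping end to end without installing SOMEF, the following self-contained sketch parses the XML fragment with the standard library and builds the same entry shape. It is an approximation under stated assumptions, not the SOMEF implementation: `parse_dependencies` and `to_requirements` are hypothetical helpers, the string literals stand in for the `constants` values, and the fragment has no XML namespace (a full pom.xml normally declares one, which this sketch does not handle).

```python
import json
import xml.etree.ElementTree as ET

# The dependency fragment from the example above (no XML namespace).
POM_FRAGMENT = """
<dependencies>
  <dependency>
    <groupId>org.apache.maven</groupId>
    <artifactId>maven-model</artifactId>
    <version>3.9.0</version>
  </dependency>
</dependencies>
"""


def parse_dependencies(xml_text):
    """Turn each <dependency> element into a dict of its child tags."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: (child.text or "").strip() for child in dep}
        for dep in root.iter("dependency")
    ]


def to_requirements(dependencies, source):
    """Apply the same field mapping as the parser snippet shown earlier."""
    entries = []
    for dep in dependencies:
        value = f'{dep.get("groupId", "")}.{dep.get("artifactId", "")}'.strip(".")
        entries.append({
            "result": {
                "value": value,
                "name": dep.get("artifactId", ""),
                "version": dep.get("version", ""),
                "type": "Software_application",  # stands in for constants.SOFTWARE_APPLICATION
            },
            "confidence": 1,
            "technique": "code_parser",  # value shown in the JSON output above
            "source": source,
        })
    return entries


requirements = to_requirements(
    parse_dependencies(POM_FRAGMENT),
    "https://raw.githubusercontent.com/dgarijo/Widoco/master/pom.xml",
)
print(json.dumps({"requirements": requirements}, indent=4))
```

The `.strip(".")` call mirrors the original snippet: if either `groupId` or `artifactId` is missing, the leftover leading or trailing dot is removed from `value`.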