
PyEuropePMC Features

**✨ Explore what PyEuropePMC can do** - Comprehensive feature overview and workflows

[🔍 Search](search/) • [📄 Full-Text](fulltext/) • [🔬 Parsing](parsing/) • [⬅️ Back to Docs](/pyEuropePMC/)

🔍 Core Features

Query the Europe PMC database with Boolean operators, field-specific terms, date filters, and citation-based sorting

Quick Example:

from pyeuropepmc import SearchClient

with SearchClient() as client:
    results = client.search("cancer AND therapy", pageSize=50, sort="CITED desc")

Learn More →


Full-Text Retrieval

Download complete article content in multiple formats

Quick Example:

from pyeuropepmc import FullTextClient

with FullTextClient() as client:
    pdf_path = client.download_pdf_by_pmcid("PMC1234567")
    xml_content = client.download_xml_by_pmcid("PMC1234567")

Learn More →


XML Parsing

Extract structured data from full-text XML documents

Quick Example:

from pyeuropepmc import FullTextXMLParser

# xml_content is the XML obtained in the Full-Text Retrieval example above
parser = FullTextXMLParser(xml_content)

# Extract metadata
metadata = parser.extract_metadata()

# Extract tables
tables = parser.extract_tables()

# Convert to markdown
markdown = parser.to_markdown()

# Validate schema coverage
coverage = parser.validate_schema_coverage()
print(f"Coverage: {coverage['coverage_percentage']:.1f}%")
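
To show what metadata extraction involves at a lower level, here is a minimal sketch that reads a few fields from a JATS-style document using only the standard library. The sample XML and the `extract_basic_metadata` helper are hypothetical and written for illustration; they are not how `FullTextXMLParser` is implemented, and real articles carry many more elements and variants.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified JATS-style article used only for this sketch.
SAMPLE_XML = """<article>
  <front>
    <journal-meta>
      <journal-title-group><journal-title>Example Journal</journal-title></journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group><article-title>An Example Article</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Smith</surname><given-names>John</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def extract_basic_metadata(xml_text):
    """Return title, journal, and author names from a JATS-style XML string."""
    root = ET.fromstring(xml_text)
    authors = []
    # Only contributors marked as authors are collected.
    for name in root.iterfind(".//contrib[@contrib-type='author']/name"):
        given = name.findtext("given-names", default="")
        surname = name.findtext("surname", default="")
        authors.append(f"{given} {surname}".strip())
    return {
        "title": root.findtext(".//article-title", default=""),
        "journal": root.findtext(".//journal-title", default=""),
        "authors": authors,
    }


metadata = extract_basic_metadata(SAMPLE_XML)
print(metadata["title"], "by", ", ".join(metadata["authors"]))
```

Using `findtext` with a default keeps the function from failing when an element is absent, which is common across publishers.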

Learn More →


🔧 Query Builder

Advanced fluent API for building complex search queries with type safety

Quick Example:

from pyeuropepmc import QueryBuilder

qb = QueryBuilder()
query = (qb
    .keyword("cancer", field="title")
    .and_()
    .citation_count(min_count=50)
    .and_()
    .date_range(start_year=2020)
    .build())
# Result: "(TITLE:cancer) AND (CITED:[50 TO *]) AND (PUB_YEAR:[2020 TO *])"
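
To make the fluent pattern concrete, the following is a minimal sketch of how a chained builder could assemble the query string shown in the result comment. `MiniQueryBuilder` is a hypothetical stand-in written for this page, not PyEuropePMC's implementation; only the field syntax (`TITLE:`, `CITED:`, `PUB_YEAR:`) is taken from the example, and the phrase quoting is an assumption.

```python
class MiniQueryBuilder:
    """Hypothetical illustration of a fluent query builder (not the library's code)."""

    def __init__(self):
        self._parts = []  # clauses and boolean operators, in call order

    def keyword(self, term, field=None):
        # Assumption: multi-word terms are quoted so they match as a phrase.
        value = f'"{term}"' if " " in term else term
        clause = f"{field.upper()}:{value}" if field else value
        self._parts.append(f"({clause})")
        return self  # returning self is what enables chaining

    def citation_count(self, min_count):
        self._parts.append(f"(CITED:[{min_count} TO *])")
        return self

    def date_range(self, start_year):
        self._parts.append(f"(PUB_YEAR:[{start_year} TO *])")
        return self

    def and_(self):
        self._parts.append("AND")
        return self

    def build(self):
        # Reject empty queries and queries that end with a dangling operator.
        if not self._parts or self._parts[-1] == "AND":
            raise ValueError("query must end with a clause, not an operator")
        return " ".join(self._parts)


query = (MiniQueryBuilder()
    .keyword("cancer", field="title")
    .and_()
    .citation_count(min_count=50)
    .and_()
    .date_range(start_year=2020)
    .build())
print(query)
```

Each method appends one piece and returns the builder itself, so validation can be deferred to `build()`, where the whole query is visible.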

Learn More →


📋 Systematic Review Tracking

PRISMA/Cochrane-compliant search logging and audit trails

Quick Example:

from pyeuropepmc import QueryBuilder
from pyeuropepmc.utils.search_logging import start_search

log = start_search("Cancer Review", executed_by="Researcher")
qb = QueryBuilder().keyword("cancer").and_().field("open_access", True)
qb.log_to_search(log, filters={"open_access": True}, results_returned=100)

Learn More →


📊 Feature Comparison

| Feature | SearchClient | FullTextClient | FullTextXMLParser | FTPDownloader | QueryBuilder |
| --- | --- | --- | --- | --- | --- |
| Search Europe PMC | ✅ | - | - | - | ✅ |
| Build Complex Queries | - | - | - | - | ✅ |
| Type-Safe Fields | - | - | - | - | ✅ |
| Query Validation | - | - | - | - | ✅ |
| Query Translation | - | - | - | - | ✅ |
| Download PDFs | - | ✅ | - | ✅ | - |
| Download XML | - | ✅ | - | - | - |
| Parse XML | - | - | ✅ | - | - |
| Extract Metadata | - | - | ✅ | - | - |
| Extract Tables | - | - | ✅ | - | - |
| Bulk Downloads | - | - | - | ✅ | - |
| Systematic Review Logging | - | - | - | - | ✅ |
| Caching | ✅ | ✅ | - | - | - |
| Progress Tracking | - | ✅ | - | ✅ | - |

🚀 Common Workflows

Workflow 1: Advanced Query → Search → Parse

from pyeuropepmc import QueryBuilder, SearchClient, FullTextClient, FullTextXMLParser

# Step 1: Build complex query with QueryBuilder
qb = QueryBuilder()
query = (qb
    .keyword("machine learning", field="title")
    .and_()
    .citation_count(min_count=25)
    .and_()
    .date_range(start_year=2020)
    .build())

# Step 2: Search with the query
with SearchClient() as search_client, FullTextClient() as fulltext_client:
    results = search_client.search(query, pageSize=20, sort="CITED desc")

    # Step 3: Process results
    for paper in results['resultList']['result']:
        if paper.get('pmcid'):
            # Download the XML with FullTextClient (see Full-Text Retrieval), then parse it
            xml_content = fulltext_client.download_xml_by_pmcid(paper['pmcid'])
            parser = FullTextXMLParser(xml_content)
            metadata = parser.extract_metadata()
            print(f"High-impact paper: {metadata['title']}")

Workflow 2: Systematic Review with Audit Trail

from pyeuropepmc import QueryBuilder, SearchClient
from pyeuropepmc.utils.search_logging import start_search

# Start systematic review
log = start_search("ML in Biology Review", executed_by="Researcher Name")

# Build comprehensive search strategy
qb = QueryBuilder()
comprehensive_query = (qb
    .keyword("machine learning")
    .and_()
    .keyword("biology")
    .and_()
    .field("open_access", True)
    .and_()
    .date_range(start_year=2019)
    .build())

# Execute and log search
with SearchClient() as client:
    results = client.search(comprehensive_query, pageSize=100)

    # Log for systematic review compliance
    qb.log_to_search(
        search_log=log,
        filters={"open_access": True, "date_range": "2019+"},
        results_returned=len(results['resultList']['result']),
        notes="Comprehensive ML in biology search"
    )

# Save review log
log.save("systematic_review_log.json")
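
For readers wondering what such an audit trail needs to capture, here is a minimal sketch of a review log serialised to JSON. The `SearchRecord` and `ReviewLog` classes and their field names are hypothetical and chosen for illustration; they do not describe the file that `log.save()` writes.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SearchRecord:
    """One executed search (hypothetical structure for illustration)."""
    database: str
    query: str
    filters: dict
    results_returned: int
    notes: str = ""
    # Timestamp in UTC so that entries from different machines are comparable.
    executed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class ReviewLog:
    """Collects every search run for a review (hypothetical structure)."""
    title: str
    executed_by: str
    searches: list = field(default_factory=list)

    def add(self, record):
        self.searches.append(record)

    def to_json(self):
        # asdict() recurses into the list of SearchRecord dataclasses.
        return json.dumps(asdict(self), indent=2)


review = ReviewLog(title="ML in Biology Review", executed_by="Researcher Name")
review.add(SearchRecord(
    database="Europe PMC",
    query="(machine learning) AND (biology)",
    filters={"open_access": True, "date_range": "2019+"},
    results_returned=100,
    notes="Comprehensive ML in biology search",
))
print(review.to_json())
```

Recording the exact query string, the filters, the result count, and the execution time for each search is what lets a reviewer reproduce the search later.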

Workflow 3: Advanced Search → Filter → Extract

from pyeuropepmc import SearchClient, FullTextClient, FullTextXMLParser

with SearchClient() as search_client, FullTextClient() as fulltext_client:
    # Advanced search with filters
    results = search_client.search(
        query="cancer AND (therapy OR treatment)",
        sort="CITED desc",
        pageSize=100,
        resultType="core"
    )

    # Filter for high-impact papers
    high_impact = [
        paper for paper in results['resultList']['result']
        if paper.get('citedByCount', 0) > 50 and paper.get('pmcid')
    ]

    # Extract detailed information
    for paper in high_impact:
        # Download the XML with FullTextClient (see Full-Text Retrieval)
        xml_content = fulltext_client.download_xml_by_pmcid(paper['pmcid'])
        parser = FullTextXMLParser(xml_content)
        # Analyze...
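
The filtering step can be tried without network access. The sketch below applies the same rule to a hand-written response; the sample data is invented, and the `select_high_impact` helper is a hypothetical name. The response shape (`resultList`, `result`, `citedByCount`, `pmcid`) follows the workflow above.

```python
def select_high_impact(response, min_citations=50):
    """Return results that have a PMCID and more than `min_citations` citations."""
    # .get() with defaults tolerates responses that contain no results at all.
    papers = response.get("resultList", {}).get("result", [])
    return [
        paper for paper in papers
        if paper.get("pmcid") and int(paper.get("citedByCount", 0)) > min_citations
    ]


# Invented sample data covering the cases the filter must handle.
sample_response = {
    "resultList": {
        "result": [
            {"id": "1", "pmcid": "PMC0000001", "citedByCount": 120},  # kept
            {"id": "2", "citedByCount": 300},                         # no PMCID
            {"id": "3", "pmcid": "PMC0000003", "citedByCount": 50},   # not above 50
            {"id": "4", "pmcid": "PMC0000004"},                       # no citation count
        ]
    }
}

selected = select_high_impact(sample_response)
print([paper["id"] for paper in selected])
```

Note that the comparison is strict, so a paper with exactly 50 citations is excluded, matching the `> 50` condition in the workflow.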

📊 Feature Matrix

Search Features

| Capability | Supported | Notes |
| --- | --- | --- |
| Keyword search | ✅ | Full-text search across all fields |
| Boolean operators | ✅ | AND, OR, NOT |
| Field-specific | ✅ | Search specific fields (author, title, etc.) |
| Date filtering | ✅ | Publication date ranges |
| Citation sorting | ✅ | Sort by citation count |
| Pagination | ✅ | Handle large result sets |
| Multiple formats | ✅ | JSON, XML, Dublin Core |
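
Pagination over a large result set usually reduces to a cursor loop. The sketch below assumes a cursor scheme in which the first request uses `*` and each response carries a `nextCursorMark` for the following page; that scheme, the `iter_results` helper, and the fake pages are assumptions made for illustration, so check the client documentation for the actual paging interface.

```python
def iter_results(fetch_page, max_pages=100):
    """Yield results page by page until the cursor stops advancing.

    `fetch_page` is any callable that takes a cursor string and returns a
    response dict (hypothetical interface for this sketch).
    """
    cursor = "*"
    for _ in range(max_pages):  # hard cap guards against an endless loop
        page = fetch_page(cursor)
        results = page.get("resultList", {}).get("result", [])
        if not results:
            return
        yield from results
        next_cursor = page.get("nextCursorMark")
        if not next_cursor or next_cursor == cursor:
            return  # no further pages
        cursor = next_cursor


# Fake backend standing in for real HTTP requests.
FAKE_PAGES = {
    "*": {"resultList": {"result": [{"id": "1"}, {"id": "2"}]}, "nextCursorMark": "c1"},
    "c1": {"resultList": {"result": [{"id": "3"}]}, "nextCursorMark": "c2"},
    "c2": {"resultList": {"result": []}, "nextCursorMark": "c2"},
}


def fake_fetch(cursor):
    return FAKE_PAGES[cursor]


ids = [result["id"] for result in iter_results(fake_fetch)]
print(ids)
```

Writing the loop as a generator lets callers stop early, for example after the first few hundred records, without fetching the remaining pages.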

Full-Text Features

| Capability | Supported | Notes |
| --- | --- | --- |
| PDF download | ✅ | Open access articles only |
| XML download | ✅ | JATS/NLM XML format |
| HTML content | ✅ | HTML representation |
| Bulk FTP | ✅ | Efficient for large datasets |
| Progress tracking | ✅ | Real-time progress callbacks |
| Auto-retry | ✅ | Robust error handling |
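
As an illustration of what auto-retry generally means for downloads, here is a minimal sketch of retrying with exponential backoff. The `retry` helper, its parameters, and the simulated download are hypothetical; the library's own retry policy may differ in which errors it retries and how long it waits.

```python
import time


def retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, then 2x, 4x, ... between tries. `sleep` is injectable
    so the behaviour can be checked without actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Simulated download that fails twice before succeeding.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return b"%PDF-1.7"


delays = []
content = retry(flaky_download, attempts=4, base_delay=0.5, sleep=delays.append)
print(content, delays)
```

Only transient network errors are retried here; other exceptions propagate immediately, so programming mistakes are not hidden behind repeated attempts.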

Parsing Features

| Capability | Supported | Notes |
| --- | --- | --- |
| Metadata extraction | ✅ | Title, authors, journal, dates, etc. |
| Table extraction | ✅ | Structured table data |
| Reference extraction | ✅ | Complete bibliography |
| Plaintext conversion | ✅ | Full article text |
| Markdown conversion | ✅ | Formatted markdown |
| Schema validation | ✅ | Coverage analysis |
| Custom patterns | ✅ | Flexible configuration |
| Multiple XML schemas | ✅ | JATS, NLM, custom |

🎓 Learning Resources

By Feature

By Use Case

By Skill Level


💡 Best Practices

Performance

Error Handling

Rate Limiting


🔄 What’s Next?

Explore each feature in detail:

  1. Search Features - Master the search API
  2. Full-Text Retrieval - Download article content
  3. XML Parsing - Extract structured data
  4. Caching - Optimize performance

Or jump to:


| Section | Why Visit? |
| --- | --- |
| 🚀 Getting Started | Installation and basics |
| 📚 API Reference | Complete method documentation |
| 🎯 Examples | Working code samples |
| ⚙️ Advanced | Power user features |

**[⬆ Back to Top](#pyeuropepmc-features)** • [⬅️ Back to Main Docs](/pyEuropePMC/)