A modern Python package for parsing, analyzing, and manipulating RNA secondary structures. It offers a clean API, lazy parsing for performance, comprehensive motif analysis, and integration with pandas, JSON, and parallel batch processing.
✨ Modern & Easy to Use - Clean, intuitive API inspired by best practices
🚀 Performance Optimized - Lazy loading for fast parsing of large structures
🧬 Comprehensive Analysis - Extract, search, and manipulate structural motifs
🔧 Flexible Parsing - Supports multiple bracket types, pseudoknots, and alternative formats
📊 Pandas Integration - Seamless integration with pandas DataFrames via accessors
⚡ Parallel Processing - Batch processing support for large datasets
🛡️ Robust Error Handling - Graceful handling of malformed structures with warnings
🔍 Type Safe - Full type annotations with mypy support for better code quality
📦 JSON Serialization - Built-in JSON support for data exchange
🔬 Advanced Search - Pattern matching, wildcards, and complex motif queries
🔄 Structure Manipulation - Immutable operations for safe structure modification
📈 Statistics & Analysis - Comprehensive metrics and comparison tools
✅ Validation - Built-in validation and normalization utilities
Install from PyPI:
pip install rna_secstruct
Install with optional dependencies:
# With pandas support
pip install "rna_secstruct[pandas]"
# With parallel processing
pip install "rna_secstruct[parallel]"
# With all optional dependencies
pip install "rna_secstruct[all]"
Quick start:
from rna_secstruct import SecStruct
# Create a structure from sequence and dot-bracket notation
struct = SecStruct("GGGAAACCC", "(((...)))")
# Access basic properties
print(f"Sequence: {struct.sequence}") # GGGAAACCC
print(f"Structure: {struct.structure}") # (((...)))
print(f"Length: {len(struct)}") # 9
# Access motifs (lazy loading - parsing happens here)
for motif_id, motif in struct.motifs.items():
    print(f"{motif_id}: {motif.m_type} - {motif.sequence}")
# 0: HELIX - GGG&CCC
# 1: HAIRPIN - GAAAC
- Basic Usage
- Working with Motifs
- Advanced Search
- Structure Manipulation
- Connectivity Analysis
- Statistics & Analysis
- Validation & Normalization
- Comparison Operations
- JSON Serialization
- Pandas Integration
- Parallel Processing
- Multi-Strand Structures
- Pseudoknot Support
- Error Handling
- API Reference
from rna_secstruct import SecStruct
# Simple hairpin
hairpin = SecStruct("GGGAAACCC", "(((...)))")
# Multi-strand structure (use & to separate strands)
multistrand = SecStruct(
"GGGAAACCC&UUUGGGAAA",
"(((...)))&(((...)))"
)
# Structure with junction
junction = SecStruct(
"GGAAACGAAACGAAACC",
"((...)(...)(...))"
)
# Complex structure
complex_struct = SecStruct(
"GGGACCUUCGGGACCC",
"(((.((....)).)))"
)
struct = SecStruct("GGGAAACCC", "(((...)))")
# Sequence and structure
print(struct.sequence) # GGGAAACCC
print(struct.structure) # (((...)))
print(len(struct)) # 9
# String representation
print(repr(struct)) # GGGAAACCC, (((...)))
# Connectivity (pairmap)
print(struct.connectivity) # [8, 7, 6, -1, -1, -1, 2, 1, 0]
# -1 means unpaired; any other value is the index of the paired nucleotide
struct = SecStruct("GGGAAACCC", "(((...)))")
# Slicing returns new SecStruct (immutable)
substruct = struct[3:6] # Extract middle region
print(substruct.sequence) # AAA
print(substruct.structure) # ...
# Access motifs by ID (after parsing)
motif = struct[0] # Get motif with ID 0
print(motif.m_type) # HELIX
struct1 = SecStruct("GGG", "(((")
struct2 = SecStruct("AAA", "...")
combined = struct1 + struct2 # New SecStruct instance
print(combined.sequence) # GGGAAA
struct = SecStruct("GGGAAACCC", "(((...)))")
# Motifs are stored as a dictionary (lazy-loaded)
print(struct.motifs)
# {0: HELIX,GGG&CCC,(((&))), 1: HAIRPIN,GAAAC,(...)}
# Access by ID
helix = struct[0]
hairpin = struct[1]
# Iterate over motifs
for motif in struct:
    print(f"{motif.m_type}: {motif.sequence}")
# Iterate with IDs
for motif_id, motif in struct.itermotifs():
    print(f"ID {motif_id}: {motif.m_type}")
# Get specific motif types
helices = struct.get_helices()
hairpins = struct.get_hairpins()
junctions = struct.get_junctions()
single_strands = struct.get_single_strands()
# Count motifs
print(struct.get_num_motifs()) # Total number of motifs
struct = SecStruct("GGGAAACCC", "(((...)))")
motif = struct[0] # Get helix motif
# Basic properties
print(f"ID: {motif.m_id}") # 0
print(f"Type: {motif.m_type}") # HELIX
print(f"Sequence: {motif.sequence}") # GGG&CCC
print(f"Structure: {motif.structure}") # (((&)))
print(f"Token: {motif.token}") # Helix3
# Position information
print(f"Strands: {motif.strands}") # [[0, 1, 2], [6, 7, 8]]
print(f"Positions: {motif.positions}") # [0, 1, 2, 6, 7, 8]
print(f"Start: {motif.start_pos}") # 0
print(f"End: {motif.end_pos}") # 8
# Hierarchy
print(f"Has parent: {motif.has_parent()}") # False
print(f"Has children: {motif.has_children()}") # True
print(f"Children: {motif.children}") # [HAIRPIN,GAAAC,(...)]
print(f"Parent: {motif.parent}") # None
# Type checking
print(motif.is_helix()) # True
print(motif.is_hairpin()) # False
print(motif.is_junction()) # False
print(motif.is_single_strand()) # False
# Position checking
print(motif.contains(6)) # True (position 6 is one of this helix's positions)
# Recursive operations (include all children)
rec_seq = motif.recursive_sequence()
rec_struct = motif.recursive_structure()
print(f"Recursive: {rec_seq} {rec_struct}") # GGGAAACCC (((...)))
# String representation
print(motif.to_str())
# ID: 0, Helix3 GGG&CCC (((&)))
# ID: 1, Hairpin3 GAAAC (...)
from rna_secstruct import SecStruct, MotifSearchParams
struct = SecStruct("GGGACCUUCGGGACCC", "(((.((....)).)))")
# Search by sequence
results = struct.get_motifs(MotifSearchParams(sequence="GAC&GAC"))
print(results) # [JUNCTION,GAC&GAC,(.(&).)]
# Search by structure pattern
results = struct.get_motifs(MotifSearchParams(structure="(....)"))
print(results) # [HAIRPIN,CUUCGG,(....)]
# Search by motif type
helices = struct.get_motifs(MotifSearchParams(m_type="HELIX"))
hairpins = struct.get_motifs(MotifSearchParams(m_type="HAIRPIN"))
junctions = struct.get_motifs(MotifSearchParams(m_type="JUNCTION"))
struct = SecStruct("GGGACCUUCGGGACCC", "(((.((....)).)))")
# Position constraints
results = struct.get_motifs(
MotifSearchParams(
m_type="JUNCTION",
min_pos=10, # Start after position 10
max_pos=50 # End before position 50
)
)
# Length constraints
results = struct.get_motifs(
MotifSearchParams(
m_type="HAIRPIN",
min_length=4, # At least 4 nucleotides
max_length=10 # At most 10 nucleotides
)
)
# ID constraints
results = struct.get_motifs(
MotifSearchParams(
min_id=1, # Motif ID >= 1
max_id=5 # Motif ID <= 5
)
)
# Children constraints
results = struct.get_motifs(
MotifSearchParams(
m_type="HELIX",
has_children=True # Only helices with children
)
)
# Combined search
results = struct.get_motifs(
MotifSearchParams(
m_type="HAIRPIN",
min_length=4,
max_length=8,
min_pos=5,
max_pos=20
)
)
struct = SecStruct("GGGAAACCC", "(((...)))")
# Search by token (motif identifier)
helix4 = struct.get_motifs_by_token("Helix4") # Any helix of length 4
junction2 = struct.get_motifs_by_token("Junction2_5|0") # 2-way junction
# Token format examples:
# - "Helix3" - Helix with 3 base pairs
# - "Hairpin5" - Hairpin with 5 unpaired loop nucleotides
# - "Junction2_3|4" - 2-way junction with loop sizes 3 and 4
struct = SecStruct("GGGACCUUCGGGACCC", "(((.((....)).)))")
# Find motifs by strand lengths
# For a hairpin: [5] means 5 nucleotides
hairpins_5 = struct.get_motifs_by_strand_lengths([5])
# For a junction: [3, 3] means two strands of length 3
junctions_3_3 = struct.get_motifs_by_strand_lengths([3, 3])
# For a helix: [3, 3] means two strands of length 3
helices_3_3 = struct.get_motifs_by_strand_lengths([3, 3])
struct = SecStruct("GGGACCUUCGGGACCC", "(((.((....)).)))")
# Find two-way junctions by topology
# x_pos and y_pos are loop sizes (excluding flanking base pairs)
junctions = struct.get_twoway_junctions_by_topology(x_pos=1, y_pos=1)
# This finds junctions with loop sizes 1 and 1
struct = SecStruct("GGGAAACCC", "(((...)))")
# Find exact sequence matches
matches = struct.find_sequence("GAA", allow_wildcards=False)
# Returns: [(2, 5)] # (start, end) positions, end exclusive
# Find with wildcards
matches = struct.find_sequence("GNN", allow_wildcards=True)
# N matches any nucleotide
# R = A/G, Y = U/C, M = A/C, K = U/G, S = G/C, W = A/U
# B = not A, D = not C, H = not G, V = not U
# Find structure patterns
matches = struct.find_structure("(((")
# Returns: [(0, 3)] # (start, end) positions
# Find complete substructures
sub = SecStruct("AAA", "...")
matches = struct.find(sub)
# Returns: [(3, 6)] # (start, end) positions
Most manipulation operations return a new SecStruct instance and leave the original unchanged. The exception is change_motif(), which modifies the structure in place.
struct = SecStruct("GGGAAACCC", "(((...)))")
# Change helix sequence
# Note: change_motif() modifies the SecStruct in place (not immutable)
struct.change_motif(0, "AGG&CCU", "(((&)))")
print(struct.sequence) # AGGAAACCU
# Change hairpin to hexaloop
struct.change_motif(1, "CUUUUUUG", "(......)")
print(struct.sequence) # AGCUUUUUUGCU
# Replace with complex structure (auto-reparsing)
struct = SecStruct("GGGAAACCC", "(((...)))")
struct.change_motif(1, "GGGACCUUCGGGACCC", "(((.((....)).)))")
print(struct.to_str())
# ID: 0, Helix5 GGGGG&CCCCC (((((&)))))
# ID: 1, Junction2_1|1 GAC&GAC (.(&).)
# ID: 2, Helix2 CC&GG ((&))
# ID: 3, Hairpin4 CUUCGG (....)
struct = SecStruct("GGGACCUUCGGGACCC", "(((.((....)).)))")
# Get a copy (important before making changes)
struct_copy = struct.get_copy()
# Get substructure starting from a motif
sub_struct = struct.get_sub_structure(1) # From motif 1 and all its children
print(sub_struct.sequence) # GACCUUCGGGAC
print(sub_struct.structure) # (.((....)).)
struct1 = SecStruct("GGG", "(((")
struct2 = SecStruct("AAA", "...")
struct3 = SecStruct("CCC", ")))")
# Split strands (if multi-strand)
multi = SecStruct("GGGAAACCC&UUUGGGAAA", "(((...)))&(((...)))")
strands = multi.split_strands() # Returns list of SecStruct objects
# Insert at position
new_struct = struct1.insert(3, struct2)
# Returns: SecStruct("GGGAAA", "(((...")
# Join structures (with & separator)
joined = struct1.join(struct2)
# Returns: SecStruct("GGG&AAA", "(((&...")
# Replace at position
replaced = struct1.replace(struct2, 0)
# Returns: SecStruct("AAA", "...")
# Remove region
removed = struct1.remove(1, 2)
# Returns: SecStruct("GG", "((")
# Subtract substructure
subtracted = struct1.subtract(struct2)
# Removes struct2 from struct1 if found
from rna_secstruct import SecStruct, get_connectivity_list, ConnectivityList
# Get connectivity list (pairmap)
struct = SecStruct("GGGAAACCC", "(((...)))")
conn = struct.connectivity
print(conn) # [8, 7, 6, -1, -1, -1, 2, 1, 0]
# Each entry is the index of the paired nucleotide; -1 means unpaired
# Using ConnectivityList class
cl = ConnectivityList("GGGAAACCC", "(((...)))")
print(cl.connections) # [8, 7, 6, -1, -1, -1, 2, 1, 0]
print(cl.sequence) # GGGAAACCC
print(cl.structure) # (((...)))
cl = ConnectivityList("GGGAAACCC", "(((...)))")
# Check if nucleotide is paired
print(cl.is_nucleotide_paired(0)) # True
print(cl.is_nucleotide_paired(3)) # False
# Get paired nucleotide
print(cl.get_paired_nucleotide(0)) # 8
print(cl.get_paired_nucleotide(8)) # 0
# Get base pair
print(cl.get_basepair(0)) # GC
print(cl.get_basepair(3)) # . (unpaired)
# Get pair type (bracket character, letter, or number)
print(cl.get_pair_type(0)) # '('
struct = SecStruct("GGGAAACCC", "(((...)))")
# Check if position is paired
print(struct.is_paired(0)) # True
# Get base pair tuple
bp = struct.get_basepair(0)
print(bp) # (0, 8) or None if unpaired
# Count base pairs
print(struct.get_num_basepairs()) # 3
# Count unpaired
print(struct.get_num_unpaired()) # 3
from rna_secstruct import get_connectivity_list, STANDARD_BRACKET_TYPES
# Pseudoknot structure using different bracket types
pseudoknot = get_connectivity_list(
"GGAACCUU", # 8 nt to match the 8-character structure below
"(([[))]]",
bracket_types=STANDARD_BRACKET_TYPES # Supports () [] {} <>
)
print(pseudoknot.is_nucleotide_paired(0)) # True
print(pseudoknot.get_pair_type(0)) # '('
print(pseudoknot.get_pair_type(3)) # '['
# Detect pseudoknots
from rna_secstruct.connectivity import has_pseudoknot
conn = pseudoknot.connections
has_pk = has_pseudoknot(conn, STANDARD_BRACKET_TYPES)
print(has_pk) # True
from rna_secstruct import get_connectivity_list
# Letter-based format
cl = get_connectivity_list(
"GGGCCC", # 6 nt to match the 6 pairing tokens below
"a b c c b a",
format="letter"
)
print(cl.get_pair_type(0)) # 'a'
# Number-based format
cl = get_connectivity_list(
"GGGCCC", # 6 nt to match the 6 pairing tokens below
"1 2 3 3 2 1",
format="number"
)
print(cl.get_pair_type(0)) # '1'
# Auto-detect format
cl = get_connectivity_list(
"GGGAAACCC",
"(((...)))"
# format=None auto-detects
)
from rna_secstruct.connectivity import is_circular
conn = [1, 2, 0] # Circular: 0->1, 1->2, 2->0
is_circ = is_circular(0, conn)
print(is_circ) # True
struct = SecStruct("GGGAAACCC", "(((...)))")
# Base pair statistics
print(struct.get_num_basepairs()) # 3
print(struct.get_num_unpaired()) # 3
# GC content
print(struct.get_gc_content()) # 0.666... (2/3)
# Helix information
helix_lengths = struct.get_helix_lengths()
print(helix_lengths) # [3] (lengths of all helices)
# Motif counts
print(struct.get_num_motifs()) # 2
print(len(struct.get_helices())) # 1
print(len(struct.get_hairpins())) # 1
print(len(struct.get_junctions())) # 0
struct = SecStruct("GGGAAACCC", "(((...)))")
# Validate structure (raises ValueError if invalid)
try:
    struct.validate()
    print("Structure is valid!")
except ValueError as e:
    print(f"Invalid structure: {e}")
# Check validity (non-raising)
if struct.is_valid():
    print("Structure is valid")
else:
    print("Structure is invalid")
struct = SecStruct("gggaaaccc", "(((...)))")
# Normalize (uppercase, T->U conversion)
normalized = struct.normalize()
print(normalized.sequence) # GGGAAACCC (uppercase)
# T nucleotides are converted to U
struct1 = SecStruct("GGGAAACCC", "(((...)))")
struct2 = SecStruct("GGGAAACCC", "(((...)))")
struct3 = SecStruct("AAA", "...")
# Equality
print(struct1 == struct2) # True
print(struct1 == struct3) # False
# Structural similarity (0.0 to 1.0, from comparing structure strings)
similarity = struct1.structural_similarity(struct2)
print(similarity) # 1.0
# Sequence identity (0.0 to 1.0)
identity = struct1.sequence_identity(struct2)
print(identity) # 1.0
from rna_secstruct import SecStruct
struct = SecStruct("GGGAAACCC", "(((...)))")
# Convert to dictionary
data = struct.to_dict()
print(data)
# {'sequence': 'GGGAAACCC', 'structure': '(((...)))', 'motifs': [...]}
# Convert to JSON string
json_str = struct.to_json(indent=2)
print(json_str)
# Create from dictionary
struct2 = SecStruct.from_dict(data)
# Create from JSON string
struct3 = SecStruct.from_json(json_str)
struct = SecStruct("GGGAAACCC", "(((...)))")
# Save to file
struct.to_json_file("structure.json", indent=2)
# Load from file
loaded = SecStruct.from_json_file("structure.json")
from rna_secstruct.json_encoder import SecStructJSONEncoder, dumps, loads
import json
from rna_secstruct import SecStruct
struct = SecStruct("GGGAAACCC", "(((...)))")
# Use custom encoder
json_str = json.dumps(struct, cls=SecStructJSONEncoder, indent=2)
# Or use convenience function
json_str = dumps(struct, indent=2)
# Load back
data = loads(json_str)
struct2 = SecStruct.from_dict(data)
struct = SecStruct("GGGAAACCC", "(((...)))")
# CSV representation
csv_line = struct.to_comma_delimited()
print(csv_line) # GGGAAACCC,(((...)))
import pandas as pd
from rna_secstruct import SecStruct
# Create a DataFrame with sequences and structures
df = pd.DataFrame({
'sequence': ['GGGAAACCC', 'GGAAACGAAACC', 'GGGACCUUCGGGACCC'],
'structure': ['(((...)))', '((...)(...))', '(((.((....)).)))']
})
# Convert to SecStruct objects
df['secstruct'] = df.apply(
lambda row: SecStruct(row['sequence'], row['structure']),
axis=1
)
# Access motifs directly
df['num_helices'] = df['secstruct'].apply(lambda s: len(s.get_helices()))
df['num_hairpins'] = df['secstruct'].apply(lambda s: len(s.get_hairpins()))
import pandas as pd
from rna_secstruct import SecStruct
df = pd.DataFrame({
'sequence': ['GGGAAACCC', 'GGAAACGAAACC'],
'structure': ['(((...)))', '((...)(...))']
})
# Create SecStruct column using accessor
df = df.rna.add_secstruct('sequence', 'structure', column='secstruct')
# Add statistics columns
df = df.rna.add_statistics('secstruct')
# Adds: secstruct_num_bp, secstruct_num_unpaired,
# secstruct_gc_content, secstruct_length
import pandas as pd
from rna_secstruct import SecStruct
# Series of SecStruct objects
series = pd.Series([
SecStruct("GGGAAACCC", "(((...)))"),
SecStruct("GGAAACGAAACC", "((...)(...))")
])
# Get statistics
num_bp = series.rna.num_basepairs()
num_motifs = series.rna.num_motifs()
gc_content = series.rna.gc_content()
helix_lengths = series.rna.helix_lengths()
has_pk = series.rna.has_pseudoknot()
# JSON operations
json_str = series.rna.to_json(indent=2)
series2 = series.rna.from_json(json_str)
from rna_secstruct import batch_parse
# Large dataset
sequences = ["GGGAAACCC"] * 1000
structures = ["(((...)))"] * 1000
# Process in parallel
results = batch_parse(
sequences,
structures,
n_jobs=4, # Number of parallel jobs
backend="multiprocessing" # or "threading" or "sequential"
)
print(len(results)) # 1000
from rna_secstruct.parallel import batch_connectivity
sequences = ["GGGAAACCC"] * 1000
structures = ["(((...)))"] * 1000
# Generate connectivity lists in parallel
conn_lists = batch_connectivity(
sequences,
structures,
n_jobs=4,
backend="multiprocessing"
)
from rna_secstruct.parallel import batch_apply
from rna_secstruct import SecStruct
# List of structures
structs = [SecStruct("GGGAAACCC", "(((...)))")] * 1000
# Apply function in parallel
def count_motifs(s):
    return s.get_num_motifs()
results = batch_apply(
structs,
count_motifs,
n_jobs=4,
backend="multiprocessing"
)
# Multiprocessing (default, good for CPU-bound tasks)
results = batch_parse(sequences, structures, backend="multiprocessing", n_jobs=4)
# Threading (good for I/O-bound tasks)
results = batch_parse(sequences, structures, backend="threading", n_jobs=4)
# Sequential (no parallelization)
results = batch_parse(sequences, structures, backend="sequential")
from rna_secstruct import SecStruct
# Two separate RNA molecules
struct = SecStruct(
"GGGAAACCC&UUUGGGAAA",
"(((...)))&(((...)))"
)
# Access strands separately
print(struct.sequence.count('&')) # Number of strand separators
# Split into individual strands
strands = struct.split_strands()
for i, strand in enumerate(strands):
    print(f"Strand {i}: {strand.sequence} {strand.structure}")
# Iterate over motifs (includes all strands)
for motif in struct:
    print(motif.sequence) # May contain '&' for multi-strand motifs
struct1 = SecStruct("GGG", "(((")
struct2 = SecStruct("AAA", "...")
# Join with & separator
joined = struct1.join(struct2)
print(joined.sequence) # GGG&AAA
from rna_secstruct import SecStruct, get_connectivity_list, STANDARD_BRACKET_TYPES
# Pseudoknot structure using different bracket types
pseudoknot = SecStruct("GGAACCUU", "(([[))]]")
# Use connectivity module for full pseudoknot analysis
conn = get_connectivity_list(
"GGAACCUU", # 8 nt to match the 8-character structure below
"(([[))]]",
bracket_types=STANDARD_BRACKET_TYPES # Supports () [] {} <>
)
print(conn.is_nucleotide_paired(0)) # True
print(conn.get_pair_type(0)) # '('
print(conn.get_pair_type(3)) # '['
from rna_secstruct.connectivity import has_pseudoknot, STANDARD_BRACKET_TYPES
# Simple structure (no pseudoknot)
conn1 = [8, 7, 6, -1, -1, -1, 2, 1, 0]
print(has_pseudoknot(conn1, STANDARD_BRACKET_TYPES)) # False
# Pseudoknot structure
conn2 = get_connectivity_list("GGAACCUU", "(([[))]]",
bracket_types=STANDARD_BRACKET_TYPES)
print(has_pseudoknot(conn2.connections, STANDARD_BRACKET_TYPES)) # True
The parser handles invalid inputs gracefully with warnings:
import logging
from rna_secstruct import Parser
# Set up logging to see warnings
logging.basicConfig(level=logging.WARNING)
p = Parser()
# These log warnings but still parse:
# - Invalid characters (replaced with 'N' or '.')
result = p.parse("GGGXAACCC", "(((...)))") # Invalid 'X' - replaced with 'N'
# - Length mismatches (truncated/padded) and unbalanced parentheses (auto-balanced)
result = p.parse("GGGAAACCC", "(((...)))(") # Extra unmatched '(' - will auto-fix
# - Invalid bracket types (normalized)
result = p.parse("GGGAAACCC", "((([..)))") # Stray '[' - normalized
# Valid input for comparison
result = p.parse("GGGAAACCC", "(((...)))")
from rna_secstruct import SecStruct
# These will raise ValueError:
try:
    # Length mismatch: 9 nucleotides but 11 structure characters
    struct = SecStruct("GGGAAACCC", "(((...)))..")
except ValueError as e:
    print(f"Error: {e}")
# Invalid structure (if validation enabled)
try:
    struct = SecStruct("GGGAAACCC", "(((...)))")
    struct.validate() # Raises if invalid
except ValueError as e:
    print(f"Validation error: {e}")
SecStruct: main class for RNA secondary structures.
Key Methods:
- get_motifs(params) - Search for motifs with constraints
- get_motifs_by_token(token) - Search by motif identifier
- get_motifs_by_strand_lengths(lengths) - Search by strand lengths
- get_twoway_junctions_by_topology(x_pos, y_pos) - Find two-way junctions by topology
- get_helices(), get_hairpins(), get_junctions(), get_single_strands() - Get specific motif types
- get_num_motifs() - Count motifs
- change_motif(id, sequence, structure) - Modify a motif (in place)
- get_sub_structure(id) - Extract substructure
- get_copy() - Create a copy
- to_str() - Format structure representation
- split_strands() - Split multi-strand structure
- insert(pos, other) - Insert structure at position
- join(other) - Join structures with &
- replace(other, pos) - Replace at position
- remove(start, end) - Remove region
- subtract(other) - Remove substructure
- find(sub) - Find substructure positions
- find_sequence(pattern) - Find sequence pattern
- find_structure(pattern) - Find structure pattern
- is_paired(index) - Check if position is paired
- get_basepair(index) - Get base pair tuple
- get_num_basepairs() - Count base pairs
- get_num_unpaired() - Count unpaired nucleotides
- get_gc_content() - Calculate GC content
- get_helix_lengths() - Get helix lengths
- validate() - Validate structure
- is_valid() - Check validity
- normalize() - Normalize sequence
- structural_similarity(other) - Compare structures
- sequence_identity(other) - Compare sequences
- to_dict() - Convert to dictionary
- to_json() - Serialize to JSON
- from_dict(data) - Create from dictionary
- from_json(json_str) - Deserialize from JSON
- to_json_file(filepath) - Save to file
- from_json_file(filepath) - Load from file
Properties:
- sequence - RNA sequence
- structure - Secondary structure
- motifs - Dictionary of motifs (lazy-loaded)
- connectivity - Connectivity list (pairmap)
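The wildcard codes accepted by find_sequence(pattern) are the standard IUPAC nucleotide codes listed under Advanced Search. The plain-Python sketch below shows one way such matching can work. It does not call rna_secstruct; IUPAC and find_pattern are hypothetical names, and the exclusive-end (start, end) convention is assumed from the find examples.

```python
# Hypothetical sketch of IUPAC wildcard matching; not the rna_secstruct implementation.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "ACGU",
    "R": "AG", "Y": "CU", "M": "AC", "K": "GU", "S": "CG", "W": "AU",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
}


def find_pattern(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) spans, end exclusive, where pattern matches sequence."""
    sequence = sequence.upper()
    pattern = pattern.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        # Each nucleotide must be one of the bases allowed by its pattern code.
        if all(base in IUPAC[code] for base, code in zip(window, pattern)):
            hits.append((start, start + len(pattern)))
    return hits


print(find_pattern("GGGAAACCC", "GAA"))  # [(2, 5)]
print(find_pattern("GGGAAACCC", "GNN"))  # [(0, 3), (1, 4), (2, 5)]
```

A sliding window keeps the sketch simple; it is linear in the sequence length times the pattern length, which is fine for short motif patterns.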
Motif: represents an individual structural motif.
Properties:
- m_id - Motif ID
- m_type - Motif type (HELIX, HAIRPIN, JUNCTION, SINGLESTRAND)
- sequence - Motif sequence
- structure - Motif structure
- strands - List of strands, each a list of nucleotide positions
- positions - All positions in motif
- start_pos - Start position
- end_pos - End position
- parent - Parent motif
- children - List of child motifs
- token - Motif identifier token
Methods:
- contains(position) - Check if position is in motif
- has_parent() - Check if motif has a parent
- has_children() - Check if motif has children
- is_helix(), is_hairpin(), is_junction(), is_single_strand() - Type checks
- num_strands() - Number of strands
- recursive_sequence() - Sequence including children
- recursive_structure() - Structure including children
- to_str(depth) - String representation
- to_dict() - Convert to dictionary
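Motif tokens such as Helix3, Hairpin4 and Junction2_1|1 can be derived from a motif's strands. The sketch below is a hypothetical helper (make_token is not part of rna_secstruct). It assumes, based on the Hairpin4 CUUCGG and Junction2_1|1 GAC&GAC examples, that helix tokens count base pairs while hairpin and junction tokens count the unpaired nucleotides of each strand, that is, strand length minus the two flanking paired nucleotides.

```python
# Hypothetical sketch of motif token strings; not the rna_secstruct implementation.
# Assumption: hairpin and junction tokens count unpaired nucleotides per strand,
# i.e. each strand's length minus its two flanking paired nucleotides.
def make_token(m_type: str, strands: list[list[int]]) -> str:
    if m_type == "HELIX":
        # Both helix strands have the same length; each position is one base pair.
        return f"Helix{len(strands[0])}"
    if m_type == "HAIRPIN":
        return f"Hairpin{len(strands[0]) - 2}"
    if m_type == "JUNCTION":
        loop_sizes = "|".join(str(len(strand) - 2) for strand in strands)
        return f"Junction{len(strands)}_{loop_sizes}"
    raise ValueError(f"unsupported motif type in this sketch: {m_type}")


# Strands of "(((.((....)).)))", worked out by hand (0-based positions).
print(make_token("HELIX", [[0, 1, 2], [13, 14, 15]]))     # Helix3
print(make_token("JUNCTION", [[2, 3, 4], [11, 12, 13]]))  # Junction2_1|1
print(make_token("HAIRPIN", [[5, 6, 7, 8, 9, 10]]))       # Hairpin4
```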
MotifSearchParams: parameters for RNA motif search.
Attributes:
- sequence - Exact sequence to match
- structure - Exact structure to match
- m_type - Motif type to match
- min_pos, max_pos - Position constraints
- min_id, max_id - ID constraints
- token - Token to match
- min_length, max_length - Length constraints
- strand_lengths - List of strand lengths to match
- has_children - Whether the motif must have children
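Conceptually, a search keeps only the motifs that satisfy every constraint that is set. The sketch below models that with two hypothetical dataclasses (SimpleMotif and SearchParams, neither of which is part of rna_secstruct). It assumes inclusive bounds and lengths counted without the & separator; the library's exact comparison rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for illustration only; these are not rna_secstruct classes.
@dataclass
class SimpleMotif:
    m_id: int
    m_type: str
    sequence: str
    positions: list[int]


@dataclass
class SearchParams:
    m_type: Optional[str] = None
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    min_pos: Optional[int] = None
    max_pos: Optional[int] = None


def matches(motif: SimpleMotif, params: SearchParams) -> bool:
    """True only if the motif satisfies every constraint that is not None."""
    length = len(motif.sequence.replace("&", ""))  # assumption: '&' is not counted
    if params.m_type is not None and motif.m_type != params.m_type:
        return False
    if params.min_length is not None and length < params.min_length:
        return False
    if params.max_length is not None and length > params.max_length:
        return False
    # Assumption: position bounds are inclusive and apply to the whole motif span.
    if params.min_pos is not None and min(motif.positions) < params.min_pos:
        return False
    if params.max_pos is not None and max(motif.positions) > params.max_pos:
        return False
    return True


# Motifs of "GGGACCUUCGGGACCC" / "(((.((....)).)))", worked out by hand.
motifs = [
    SimpleMotif(0, "HELIX", "GGG&CCC", [0, 1, 2, 13, 14, 15]),
    SimpleMotif(1, "JUNCTION", "GAC&GAC", [2, 3, 4, 11, 12, 13]),
    SimpleMotif(2, "HELIX", "CC&GG", [4, 5, 10, 11]),
    SimpleMotif(3, "HAIRPIN", "CUUCGG", [5, 6, 7, 8, 9, 10]),
]
params = SearchParams(m_type="HAIRPIN", min_length=4, max_length=8)
print([m.m_id for m in motifs if matches(m, params)])  # [3]
```

Leaving a field as None means "no constraint", which is what lets the combined-search example above mix any subset of filters.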
ConnectivityList: connectivity/pairmap representation.
Methods:
- is_nucleotide_paired(index) - Check if paired
- get_paired_nucleotide(index) - Get paired position
- get_basepair(index) - Get base pair string
- get_pair_type(index) - Get pair type (bracket/letter/number)
Properties:
- connections - Connectivity list
- sequence - RNA sequence
- structure - Secondary structure
- pair_types - Dictionary of pair types
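A pairmap can be built from dot-bracket notation with a single stack: each ( is pushed, and each ) is paired with the most recent unmatched (. The sketch below illustrates that idea and a few ConnectivityList-style queries; it is not the ConnectivityList implementation, and dot_bracket_to_pairmap, get_basepair and is_symmetric are hypothetical names. The symmetry check flags inconsistent lists such as [1, 2, 0] from the is_circular example.

```python
# Hypothetical sketch of a pairmap (connectivity list); not the rna_secstruct code.
def dot_bracket_to_pairmap(structure: str) -> list[int]:
    """Entry i is the index paired with i, or -1 if i is unpaired."""
    pairmap = [-1] * len(structure)
    stack: list[int] = []  # positions of '(' still waiting for a partner
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairmap[i], pairmap[j] = j, i
        # Any other character ('.' here) is treated as unpaired.
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairmap


def get_basepair(sequence: str, pairmap: list[int], i: int) -> str:
    """Two-letter base pair such as 'GC', or '.' for an unpaired position."""
    j = pairmap[i]
    return sequence[i] + sequence[j] if j != -1 else "."


def is_symmetric(pairmap: list[int]) -> bool:
    """True if every partner points back, i.e. pairmap[pairmap[i]] == i."""
    return all(j == -1 or (j != i and pairmap[j] == i) for i, j in enumerate(pairmap))


sequence = "GGGAAACCC"
pairmap = dot_bracket_to_pairmap("(((...)))")
print(pairmap)                             # [8, 7, 6, -1, -1, -1, 2, 1, 0]
print(get_basepair(sequence, pairmap, 0))  # GC
print(get_basepair(sequence, pairmap, 3))  # .
print(is_symmetric(pairmap))               # True
print(is_symmetric([1, 2, 0]))             # False: 0 points to 1, but 1 points to 2
```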
Functions:
- get_connectivity_list(sequence, structure, format, bracket_types) - Create a ConnectivityList
- connectivity_list(structure, bracket_types) - Get a simple connectivity list
- has_pseudoknot(connectivity_lists, bracket_types) - Detect pseudoknots
- is_circular(start, connections) - Detect circular structures
- batch_parse(sequences, structures, n_jobs, backend) - Parallel parsing
- batch_connectivity(sequences, structures, format, n_jobs, backend) - Parallel connectivity
- batch_apply(structs, func, n_jobs, backend) - Parallel function application
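Two base pairs (a, b) and (c, d) cross when a < c < b < d, and any crossing means the structure contains a pseudoknot. The quadratic sketch below tests this directly on a pairmap. has_crossing_pairs is a hypothetical name; the library's has_pseudoknot, which also takes bracket types, may be implemented differently.

```python
# Hypothetical sketch: a pseudoknot shows up as two crossing base pairs.
def has_crossing_pairs(pairmap: list[int]) -> bool:
    """Return True if pairs (a, b) and (c, d) exist with a < c < b < d."""
    pairs = [(i, j) for i, j in enumerate(pairmap) if j > i]  # each pair listed once
    for a, b in pairs:
        for c, d in pairs:
            if a < c < b < d:
                return True
    return False


# Nested pairs from "(((...)))": no crossing.
print(has_crossing_pairs([8, 7, 6, -1, -1, -1, 2, 1, 0]))  # False
# Pairs from "(([[))]]": (0, 5) crosses (2, 7).
print(has_crossing_pairs([5, 4, 7, 6, 1, 0, 3, 2]))  # True
```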
Constants:
- STANDARD_BRACKET_TYPES - Standard bracket types for pseudoknots: [('(', ')'), ('[', ']'), ('{', '}'), ('<', '>')]
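Keeping one stack per bracket type is what allows crossing pairs written with different brackets, such as "(([[))]]", to be parsed. The sketch below assumes bracket types given as (open, close) tuples in the same shape as STANDARD_BRACKET_TYPES; multi_bracket_pairmap is a hypothetical name, and this is not the library's parser.

```python
# Hypothetical sketch: one stack per bracket type lets crossing pairs be parsed.
BRACKET_TYPES = [("(", ")"), ("[", "]"), ("{", "}"), ("<", ">")]


def multi_bracket_pairmap(structure: str, bracket_types=BRACKET_TYPES) -> list[int]:
    """Return a pairmap for dot-bracket text that may mix several bracket types."""
    closer_to_opener = {close: open_ for open_, close in bracket_types}
    stacks: dict[str, list[int]] = {open_: [] for open_, _ in bracket_types}
    pairmap = [-1] * len(structure)
    for i, char in enumerate(structure):
        if char in stacks:  # opening bracket of some type
            stacks[char].append(i)
        elif char in closer_to_opener:  # closing bracket: pop only its own type
            stack = stacks[closer_to_opener[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {i}")
            j = stack.pop()
            pairmap[i], pairmap[j] = j, i
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return pairmap


print(multi_bracket_pairmap("(([[))]]"))  # [5, 4, 7, 6, 1, 0, 3, 2]
```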
- Jupyter Notebooks: See the notebooks/ directory for detailed examples:
  - 01_basic_usage.ipynb - Basic operations
  - 02_connectivity.ipynb - Connectivity analysis
  - 03_structure_manipulation.ipynb - Structure manipulation
  - 04_search_and_analysis.ipynb - Advanced search
  - 05_json_serialization.ipynb - JSON operations
  - 06_pandas_integration.ipynb - Pandas integration
  - 07_parallel_processing.ipynb - Parallel processing
- All notebooks have been tested and work with the current version
- Run jupyter notebook from the project root to explore the examples
- API Documentation: Check docstrings in source code
- Examples: All examples in this README are runnable
- Type Hints: Full type annotations throughout for better IDE support and type checking
# Run all tests
pytest
# Run with coverage
pytest --cov=rna_secstruct --cov-report=html
# Run specific test file
pytest test/test_parser.py
# Run excluding integration tests
pytest -m "not integration"
# Format code
black rna_secstruct/ test/
# Lint and auto-fix
ruff check rna_secstruct/ test/
ruff check --fix rna_secstruct/ test/
# Type checking
mypy rna_secstruct/
# Run all checks
make check-all
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under a Non-Commercial License. See LICENSE file for details.
For commercial licensing inquiries, please contact: jyesselm@unl.edu
If you use rna_secstruct in your research, please cite:
@software{rna_secstruct,
author = {Yesselman, Joe},
title = {rna_secstruct: A Python package for RNA secondary structure analysis},
url = {https://github.com/jyesselm/rna_secstruct},
version = {0.1.1},
year = {2024}
}
- GitHub: https://github.com/jyesselm/rna_secstruct
- Issues: https://github.com/jyesselm/rna_secstruct/issues
- Author: Joe Yesselman (jyesselm@unl.edu)
Note: This package is designed for non-commercial use. For commercial applications, please contact the author for licensing options.