This vignette walks through the core features of tezr.
Each section builds on the previous one, starting with simple keyword
searches and progressing to multi-filter queries, detail retrieval, and
cache management.
Package builds show the code without running live requests. Set
TEZR_LIVE_DOCS=true before rendering if you want to refresh
all outputs.
tezr has three search functions:
search_basic(), search_advanced(), and
search_detailed(). The sections below cover each function
in order.
Basic Search
search_basic() searches the National Thesis Center (NTC) database
by keyword. It checks all fields by default, so it works well when you
do not know where your term appears.
# Search all fields for "tarımsal sulama"
ag_irrigation <- search_basic("tarımsal sulama")
The output is a tibble with one row per thesis.
# Column names and types
dplyr::glimpse(ag_irrigation)
Targeting Specific Fields
You can use the search_field argument to restrict
matching to a single field.
# Search only in thesis titles
ag_irrigation_title <- search_basic(
"tarımsal sulama",
search_field = "title")
dplyr::glimpse(ag_irrigation_title)
Available search field values are "all" (default),
"title", "author", "supervisor",
"subject", "index", and
"abstract". Use
search_detailed(thesis_no = ...) for thesis-number
lookup.
# Search abstracts
abstract_search <- search_basic(
"production function",
search_field = "abstract")
# Search by author name
author_search <- search_basic(
"Işıl Şirin Selçuk",
search_field = "author")
Filtering by Thesis Type and Access Status
search_basic() also accepts thesis_type and
access_type filters. These filters are applied server-side,
so you download fewer records.
Available thesis_type values are "all"
(default), "masters", "phd",
"medical_specialty", "arts",
"dentistry", "medical_sub", and
"pharmacy".
# PhD dissertations only
phd_results <- search_basic(
"ekonometri",
thesis_type = "phd")
dplyr::glimpse(phd_results)
Available access_type values are "all" (default),
"open", and "restricted".
# Open access theses only
open_results <- search_basic(
"hanehalkı",
access_type = "open")
The 2000-Result Limit
Basic search cannot return more than 2000 results. This is a server-side
limit. If your query matches more than 2000 records, the function warns
you. In that case, set max_search_results = Inf:
search_basic() then delegates to advanced search, which
paginates past the limit. The Auto-Pagination section below describes
how pagination works.
# This stops at 2000
climate_change <- search_basic("climate change")
# Delegate to advanced search with auto-pagination
climate_change_all <- search_basic(
keyword = "climate change",
max_search_results = Inf
)
Advanced Search
search_advanced() adds year range, language, thesis
type, access type, and thesis status filters to keyword search.
The NTC advanced search form supports up to three keyword rows
combined with Boolean operators (AND, OR,
NOT), each targeting a different field.
search_advanced() exposes only the first keyword row.
R packages that interface with academic databases, such as rentrez (PubMed) and europepmc (Europe PMC),
often pass Boolean logic as a single query string (for example,
"term1 AND term2"). NTC does not accept free-form Boolean
strings. It uses structured form fields for each keyword row, so that
pattern is not applicable here. To keep the interface simple,
search_advanced() does not expose Boolean row combinations.
For equivalent results, you can use the following approaches.
- AND: Use search_detailed() with its field-specific parameters (title, author, supervisor, etc.).
- OR: Run separate searches and combine with dplyr::bind_rows() |> dplyr::distinct().
- NOT: Run both searches and exclude with dplyr::anti_join().
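The OR and NOT approaches can be tried on mock data before running live searches. The sketch below uses two small invented tibbles in place of real search results and assumes that each result row carries a thesis_no column that identifies a thesis. All values are made up for illustration.

```r
# Mock result sets standing in for two separate search calls.
# The thesis numbers and titles are invented for illustration.
res_a <- dplyr::tibble(thesis_no = c(101L, 102L, 103L), title = c("A", "B", "C"))
res_b <- dplyr::tibble(thesis_no = c(103L, 104L), title = c("C", "D"))

# OR: stack both result sets, then keep one row per thesis
either <- dplyr::bind_rows(res_a, res_b) |>
  dplyr::distinct(thesis_no, .keep_all = TRUE)

# NOT: keep rows of res_a whose thesis_no does not occur in res_b
a_not_b <- dplyr::anti_join(res_a, res_b, by = "thesis_no")

either$thesis_no   # 101 102 103 104
a_not_b$thesis_no  # 101 102
```

Deduplicating on thesis_no rather than on whole rows is a design choice here: it still removes overlaps if two searches return slightly different columns for the same thesis.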
Year and Language Filters
# Keyword search with year range
recent_climate <- search_advanced(
keyword = "iklim değişikliği",
year_start = 2015,
year_end = 2024
)
# English-language theses only
# language accepts ISO 639 codes ("tr", "en", "fr", "de", ...), or
# full names ("Turkish", "French")
english_growth <- search_advanced(
keyword = "economic growth",
language = "en"
)
# French-language theses
french_theses <- search_advanced(
keyword = "migration",
language = "fr"
)
Thesis Status
status controls whether results include only approved
theses or also in-preparation ones: "approved" (default),
"all", "in_preparation".
# In-preparation theses (not yet defended)
ongoing_ml <- search_advanced(
keyword = "makine öğrenmesi",
status = "in_preparation"
)
Combining Filters
You can combine filters to build precise keyword queries. Start with a minimal query, then add constraints.
# Open-access PhD theses with "ekonometri" in the title, 2000-2024
complex_query <- search_advanced(
keyword = "ekonometri",
search_field = "title",
thesis_type = "phd",
year_start = 2000,
year_end = 2024,
access_type = "open"
)
Auto-Pagination
When max_search_results is greater than 2000 (including
Inf) and the server reports more than 2000 matches,
tezr switches to iterative year-range pagination. If you do
not supply year_start and year_end, the
package uses 1959:current_year as the search window. It
then creates year chunks with weighted split points (pre-2000,
2000-2010, post-2010) and a safety target below the hard 2000-row cap.
Each chunk is requested with the same filters as the original query. If
a chunk is still capped by the server limit, that chunk is split again
and retried until the range is small enough (or a single year remains).
During this process, tezr updates split weights from
observed uncapped chunk densities to bias later splits toward denser
periods. Finally, chunk results are merged, deduplicated by
thesis_no, and returned. If a single year still exceeds
2000 results, the package cannot paginate further for that year and
warns you to narrow the query with additional filters.
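The split-and-retry idea can be illustrated without contacting the server. The sketch below is not tezr's implementation: it replaces the weighted split points with plain midpoint splitting, skips the density-based weight updates, and uses an invented per-year count table (matches_per_year) as a stand-in for the server. The helper names count_range() and split_range() are hypothetical.

```r
# Invented per-year match counts standing in for the server (illustration only)
matches_per_year <- setNames(c(900, 1200, 700, 2500), 2018:2021)
cap <- 2000

# Total matches in an inclusive year range
count_range <- function(from, to) {
  sum(matches_per_year[as.character(from:to)])
}

# Split a range at its midpoint until each chunk fits under the cap
# or only a single year remains
split_range <- function(from, to) {
  n <- count_range(from, to)
  if (n <= cap || from == to) {
    return(data.frame(from = from, to = to, n = n, capped = n > cap))
  }
  mid <- (from + to) %/% 2
  rbind(split_range(from, mid), split_range(mid + 1, to))
}

chunks <- split_range(2018, 2021)
chunks
```

With these counts every range splits down to single years, and only 2021 remains flagged as capped. That is the single-year overflow case in which the package warns and asks for narrower filters.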
# Retrieve all title matches (auto-paginate by year)
all_eu <- search_advanced(
keyword = "avrupa",
search_field = "title",
year_start = 2010,
year_end = 2020,
max_search_results = Inf
)
dplyr::glimpse(all_eu)
Detailed Search
search_detailed() provides field-specific keyword
search. Use it when you need to target thesis titles, authors,
supervisors, subjects, index terms, or abstracts. It supports the same
auto-pagination flow as search_advanced().
Supported keyword parameters in search_detailed() are
title, author, supervisor,
abstract, keyword, and subject.
You can combine those with university,
university_id, group,
thesis_type, year_start,
year_end, language, access_type,
status, max_search_results, and
ignore_cache.
YÖK’s redesigned detailed form supports field-specific and
institutional filters. tezr sends university, institute,
division, subject, discipline, group, and thesis-number filters through
that form when you use search_detailed().
Finding Valid Filter Values
You can still use the list_*() functions to inspect
YÖK’s metadata tables and interpret result fields.
# All universities
unis <- list_universities()
head(unis)
# Subjects have Turkish and English names
subjects <- list_subjects()
subjects |>
dplyr::filter(stringr::str_detect(name_tr, "Ekonomi"))
# Other list functions (each returns 'name' and 'id' columns)
institutes <- list_institutes()
divisions <- list_divisions()
disciplines <- list_disciplines()
Filtering by Subject
# All econometrics theses
econ_all <- search_detailed(subject = "Ekonometri")
Filtering by Supervisor
You can also filter results by supervisor names.
# Find theses supervised by a specific supervisor
supervisor_theses <- search_detailed(supervisor = "Mustafa Kadir Doğan")
head(supervisor_theses)
Vector-Valued Parameters
The YÖK web portal accepts only one value per keyword field.
tezr removes this restriction for supported keyword fields
and selected filters. When you pass multiple values, the package expands
them into separate API calls, combines the results, and deduplicates by
thesis_no.
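To see how quickly the number of calls grows, the expansion can be approximated with base R. This is a rough sketch that assumes one request per combination of values. expand.grid() is used here only to count combinations; it is not necessarily what tezr does internally.

```r
# Vector-valued arguments a user might pass (values taken from this vignette)
params <- list(
  subject = c("Ekonomi", "Ekonometri"),
  thesis_type = c("phd", "masters"),
  language = c("tr", "en", "fr")
)

# One row per combination of values, i.e. one request per row
requests <- expand.grid(params, stringsAsFactors = FALSE)
nrow(requests)  # 2 * 2 * 3 = 12 combinations
```

Each additional multi-valued argument multiplies the number of rows, so it helps to check this count before starting a large query.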
# Search multiple subjects
multi_subject <- search_detailed(
subject = c("Ekonomi", "Ekonometri")
)
# Multiple thesis types
multi_type <- search_detailed(
subject = "Ekonomi",
thesis_type = c("phd", "masters")
)
# Multiple languages (ISO 639 codes)
multi_lang <- search_detailed(
subject = "Ekonomi",
language = c("tr", "en", "fr")
)
# Search multiple subjects with pagination
multi_subject_all <- search_detailed(
subject = c("Ekonomi", "Ekonometri"),
max_search_results = Inf,
ignore_cache = TRUE
)
Retrieving Detailed Metadata
Search results contain core metadata (title, author, university,
year, type, subject). If you need full details, such as abstracts,
keywords, supervisor names, page counts, and PDF links, use the
detail() function.
Single Thesis
Pass a search-result row to detail() to fetch the full
record. When the row includes encrypted_no,
detail() uses it automatically to request citation
metadata.
# Search and get details for the first match
econ_phd <- search_detailed(
subject = "Ekonometri",
thesis_type = "phd",
year_start = 2024,
year_end = 2025
)
econ_phd_details <- detail(econ_phd[1, ])
dplyr::glimpse(econ_phd_details)
# English abstract
econ_phd_details$abstract_translation
Batch Retrieval
You can also pass all search-result rows to fetch details for multiple theses. The function shows text progress updates by default and fetches uncached records in parallel (up to 5 active requests).
# Fetch details for all results
econ_phd <- search_detailed(
subject = "Ekonometri",
thesis_type = "phd",
year_start = 2025,
year_end = 2026
)
# Batch retrieval
econ_phd_all_details <- detail(econ_phd)
Aggregate Statistics
These functions return summary statistics tables from the NTC.
# Thesis counts by year
year_stats <- stats_years()
tail(year_stats)
# Thesis counts by university
uni_stats <- stats_universities()
head(uni_stats)
# Thesis counts by subject
subject_stats <- stats_subjects()
head(subject_stats)
# Total counts by thesis type
type_stats <- stats_types()
type_stats
Cache Management
tezr caches search results, detail records, year-range
queries, and lookup lists in memory. Caching speeds up repeated queries
and reduces server load.
Viewing Cache Status
# Shows: enabled status, item counts, and TTL settings
cache_info()
The output includes search_count,
range_count, detail_count,
search_ttl, and detail_ttl. Search cache
defaults to 3600 seconds (1 hour). Detail cache defaults to
NULL (session lifetime; entries stay until you clear them
or restart R).
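The difference between a fixed TTL and a NULL TTL can be shown with a small in-memory cache. This is a minimal sketch of the general technique using an R environment. The helpers cache_put() and cache_get() are hypothetical and do not reflect tezr's internal code.

```r
# Hypothetical helpers for illustration; not part of tezr
cache <- new.env()

# Store a value together with the time it was cached
cache_put <- function(key, value, now = Sys.time()) {
  assign(key, list(value = value, stored = now), envir = cache)
}

# ttl is in seconds; NULL means the entry never expires within the session
cache_get <- function(key, ttl, now = Sys.time()) {
  if (!exists(key, envir = cache, inherits = FALSE)) return(NULL)
  entry <- get(key, envir = cache, inherits = FALSE)
  age <- as.numeric(difftime(now, entry$stored, units = "secs"))
  if (!is.null(ttl) && age > ttl) {
    rm(list = key, envir = cache)  # drop the expired entry
    return(NULL)
  }
  entry$value
}

t0 <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
cache_put("query", "results", now = t0)
fresh   <- cache_get("query", ttl = 3600, now = t0 + 1800)  # 30 minutes old: hit
expired <- cache_get("query", ttl = 3600, now = t0 + 7200)  # 2 hours old: miss

cache_put("detail", "record", now = t0)
kept <- cache_get("detail", ttl = NULL, now = t0 + 1e6)     # NULL TTL: still a hit
```

Passing the current time as an argument keeps the example deterministic, so expiry can be checked without waiting.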
Clearing Cache
You can clear specific cache types or everything at once. The
"lookups" option clears cached university/subject/division
lists.
# Clear search results only
cache_clear("searches")
# Clear detail records only
cache_clear("details")
# Clear lookup lists (universities, subjects, etc.)
cache_clear("lookups")
# Clear everything
cache_clear("all")
Configuring Cache TTL
You can also adjust time-to-live settings or disable caching
entirely. TTL values are in seconds. NULL means entries
persist for the entire session.
# 2-hour search cache, 1-week detail cache
cache_config(
search_ttl = 7200,
detail_ttl = 604800
)
# Disable caching entirely (every call hits the server)
cache_config(enable = FALSE)
# Re-enable with defaults
cache_config(enable = TRUE, search_ttl = 3600, detail_ttl = NULL)
Working with Results
Search results are returned in tibbles, so they work directly with
dplyr and other tidyverse tools.
climate_change <- search_basic("climate change")
# Count by year
climate_change |>
dplyr::count(year)
# Filter recent PhDs
climate_change |>
dplyr::filter(thesis_type_tr == "Doktora", year >= 2020) |>
dplyr::select(author, year, title_original, university)
# Most common subjects
climate_change |>
dplyr::count(subject_tr, sort = TRUE) |>
dplyr::slice_head(n = 10)
See the Analysis Examples vignette for complete analysis workflows with visualizations.
Limitations and Best Practices
Technical Limitations
- No official API. tezr scrapes <tez.yok.gov.tr> by simulating browser requests and parsing HTML/JavaScript responses. Any change to the portal’s page structure, form parameters, or JavaScript patterns will break the package until updated. This is the primary fragility risk.
- Single-year overflow. Auto-pagination splits year ranges to work around the 2000-result server cap, but cannot split below a single calendar year. If a query matches more than 2000 theses in one year, the package retrieves only the first 2000 for that year and issues a warning. You should narrow your search with supported filters such as thesis type, language, access type, status, subject, or year range.
- In-memory cache only. All cached data (searches, details, lookups) is stored in R environment objects and lost when the session ends. You can save results to disk with readr::write_rds() or saveRDS() for persistence across sessions.
- SSL verification disabled. The YÖK server has certificate issues, so SSL peer verification is turned off (ssl_verifypeer = FALSE). This is a security trade-off required for the package to function.
- Fixed rate limiting. Requests use a built-in 2-second rate limit that is not user-configurable, so fetching large datasets still takes time.
- Vector parameter expansion. search_detailed() expands vector-valued supported parameters into separate API calls via cartesian product. Passing many multi-valued parameters can generate a large number of requests.
- Lookup tables. The list_*() lookup functions expose metadata tables from YÖK. These IDs can be passed to detailed searches to skip name lookup.
- Metadata only. The package retrieves thesis metadata. PDF URLs are included in detail records but full-text files are not downloaded. You can use URLs to download the PDFs.
Best Practices
- Cache and save results. Run large queries once and save locally with readr::write_rds(), readr::write_csv(), or similar functions. Then reload from disk in later sessions.
- Filter before paginating. Add year ranges, thesis types, languages, access types, statuses, or supported keyword fields to keep result sets manageable before setting max_search_results = Inf.
- Minimize server load. Use cached results when possible. Avoid repeating identical queries.
- Validate data quality. Metadata may have inconsistencies (missing fields, encoding issues). Clean and validate before analysis.
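One way to follow the "cache and save results" practice is a small wrapper that runs a query only when no saved copy exists. The sketch below uses base R's saveRDS() and readRDS(). The helper load_or_fetch() is hypothetical, and a mock data frame stands in for a real search call.

```r
# Hypothetical helper: run `fetch` once, then reuse the saved file afterwards
load_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# A temporary file keeps the example self-contained; use a project path in practice
path <- file.path(tempdir(), "climate_change.rds")

# Mock query standing in for a real search call
first <- load_or_fetch(path, function() data.frame(thesis_no = 1:3))

# The second call reads from disk, so the fetch function is never run
second <- load_or_fetch(path, function() stop("not called when the file exists"))
```

In a real session, the function passed as fetch would wrap the search itself, for example function() search_basic("climate change").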
