“Its my instance and I’ll Block if I want to, Block if I want to…”

Right now, updating the Iceshrimp Blocklist is a manual task; these scripts automate that process.

Note:
I am well aware of the controversy and debate around Blocklist(s) but the fact remains that they exist and people use them. This is a post about how to automate the currently manual process.

Like many Instance owners and Administrators, I employ Blocklists.

These Blocklists need regular updating, mainly because a Domain may be on a Blocklist one week and removed from that Blocklist later on.

When using Iceshrimp, this process is a manual one:

  • Find a suitable blocklist
  • Download the list (usually a .csv file)
  • Extract the column of Domains
  • Copy that column to the clipboard
  • Log onto Iceshrimp as an Administrator
  • Go to ‘Control Panel | Federation Management’
  • Delete any old Domain names in the text box
  • Paste in the clipboard contents

Clearly this is less than ideal.

Updating the Iceshrimp Blocklist Python Scripts

This first script replaces that manual task by; collecting a single online list, doing the data extraction, and then inserting the Domain info into the PostgreSQL database using Python.

update_blocklist_single.py

This requires the following:

  • The remote URL link to a .csv Blocklist file
  • A local path setting
  • The local filename for the .csv file (this can be the same as the remote name)
  • Finally, a local filename for the log file
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The url library
import urllib.request

# The pandas library (pip install pandas)
import pandas as pd

# The PostgreSQL library (pip install psycopg2)
import psycopg2

# Local path and filenames
blocklist_path = "/home/foo/path/to/update_iceshrimp_blocklist/"
blocklist_filename = "blocklist_single.csv"
blocklist_log_path = "/home/foo/Logs/"
blocklist_log_filename = "iceshrimp_blocklist_single_updated"

# Link to the Blocklist to be used
url = ("https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv")

# Go grab the Blocklist
urllib.request.urlretrieve(url, blocklist_path+blocklist_filename)

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(blocklist_path+blocklist_filename)

# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)

# Join the text Series into a single string with a comma
blocklist_string = ','.join(blocklist_series)

# Establish a Postgres DB connection
conn = psycopg2.connect(host="127.0.0.1", database="iceshrimp_db", user="iceshrimp_user", password="iceshrimp_db_user_password", port= '5432')

# Create a cursor object using the cursor() method
cursor = conn.cursor()

# Prepare the SQL query to UPDATE the 'blockedHosts' field in the 'meta' table in the database
table_modification = """UPDATE meta SET "blockedHosts" = '{""" + blocklist_string + """}' WHERE id = 'x'"""

# Execute and error-trap
try:
    # Execute the SQL command
    cursor.execute(table_modification)
    print("Executed and UPDATED")

    # Commit the changes to the database
    conn.commit()
    print("UPDATE Committed")

    # Create a log file to show that the UPDATE has successfully completed
    open(blocklist_log_path+blocklist_log_filename, 'w')
    print("UPDATE log file created")

except:
    # Roll back in case of error
    conn.rollback()
    print("UPDATE Failed and rolled back")

# Close the connection
conn.close()
print("Connection CLOSED")

This second script replaces that manual task by; collecting multiple online lists, doing the data extraction, collating them, removing any duplicates and then inserting the Domain info into the PostgreSQL database using Python.

update_blocklist_multiple.py

Note:
This process will work just as well with one entry in the list as it will with multiple entries.

This requires the following:

  • A ‘ blocklist_multiple.txt ‘ file that contains URLs to the remote .csv files
  • An optional .csv file containing URLS to be blocked; a ‘ personal_blocklist.csv
  • A local path setting
  • Finally, a local filename for the log file
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The url library
import urllib.request

# The pandas library (pip install pandas)
import pandas as pd

# The PostgreSQL library (pip install psycopg2)
import psycopg2

# Local path and filenames
blocklist_path = "/home/foo/path/to/update_iceshrimp_blocklist/"
blocklist_filename = "blocklist_multiple.txt"
blocklist_log_path = "/home/foo/Logs/"
blocklist_log_filename = "iceshrimp_blocklist_multiple_updated"

# Python pathname pattern matching
import glob

# Download all of the *.csv files in 'blocklist.txt' into the current directory
with open(blocklist_path+blocklist_filename) as f:
    for line in f:
        url = line
        blocklist_filename = url.split('/', -1)[-1]
        urllib.request.urlretrieve(url, blocklist_path+blocklist_filename.rstrip('\n'))

# Put all of the *.csv files in the current directory into a single DataFrame
files = glob.glob(blocklist_path+"*.csv")
content = []

# Loop through all the *.csv files
for filename in files:
    df = pd.read_csv(filename, index_col=None)
    print(len(df.index), "rows in", filename)
    content.append(df)

# Convert 'content' to a DataFrame
df = pd.concat(content)

# Print the # of DataFrame rows
print(len(df.index), "rows in the DataFrame")

# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)

# Make the Series unique
blocklist_series = pd.unique(blocklist_series)

# Print the # of rows in the Series
print(pd.Series(blocklist_series).count(), "unique rows in the Series")

# Join the text Series into a single string with a comma
blocklist_string = ','.join(blocklist_series)

# Establish a Postgres DB connection
conn = psycopg2.connect(host="127.0.0.1", database="iceshrimp_db", user="iceshrimp_user", password="iceshrimp_db_user_password", port= '5432')

# Create a cursor object using the cursor() method
cursor = conn.cursor()

# Prepare the SQL query to UPDATE the 'blockedHosts' field in the 'meta' table in the database
table_modification = """UPDATE meta SET "blockedHosts" = '{""" + blocklist_string + """}' WHERE id = 'x'"""

# Execute and error-trap
try:
    # Execute the SQL command
    cursor.execute(table_modification)
    print("Executed and UPDATED")

    # Commit the changes to the database
    conn.commit()
    print("UPDATE Committed")

    # Create a log file to show that the UPDATE has successfully completed
    open(blocklist_log_path+blocklist_log_filename, 'w')
    print("UPDATE log file created")

except:
    # Roll back in case of error
    conn.rollback()
    print("UPDATE Failed and rolled back")

# Close the connection
conn.close()
print("Connection CLOSED")

blocklist_multiple.txt

This is the list of URLs that link to the .csv Blocklist(s).

https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv
https://seirdy.one/pb/pleroma.envs.net.csv

personal_blocklist.csv

Additionally, a ‘personal_blocklist’ can be created (this file can be named what you like as long as it is a .csv file).

naughty_fedi.com

Updating the Iceshrimp Blocklist crontab

In addition to running the script manually, either script can be executed nightly, at midnight, from a normal crontab (crontab -e from a Terminal).

# Update the iceshrimp blocklist daily at midnight
#0 0 * * * python3 /home/foo/path/to/script/update_blocklist_single.py
0 0 * * * python3 /home/foo/path/to/script/update_blocklist_multiple.py
#

Finally

(I did mention this in a prior post but it bears repeating.)

For anyone interested in getting into the SQL side, pgAdmin is a useful tool.

I installed it using:

curl -fsS https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo gpg --dearmor -o /usr/share/keyrings/packages-pgadmin-org.gpg

sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/packages-pgadmin-org.gpg] https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/jammy pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'

sudo apt update

sudo apt install pgadmin4

pgAdmin Screenshot

However you Fediverse, Enjoy!

UPDATE

Iceshrimp is moving to a .NET framework.

Additional scripts for updating a blocklist when running Iceshrimp.NET can be found at .