This is an update to the previous post, based on a memo issued by the Tribunal Supremo Electoral (TSE), not Audisis, on the crash of the TSE servers on November 29. The update is the result of finding a technical appendix attached to another copy of the TSE memo.
First, let's make it clear that the memo comes from the TSE, not from Audisis as the press coverage implied.
The appendix describes the vote counting system as a group of clustered servers. Scanned acta images are transmitted from the INFOP warehouse to a cluster of two receiving servers at the TSE. Transmission is via JSON, a data interchange format, over a secure network connection. From there, the JSON payload is converted to a database record, passing through an application software cluster of two servers and a database cluster of two servers. We learn for the first time what we had guessed: that the database software is MS SQL. The database cluster talks to a single 600 GByte database on a SAN system which has 12 TBytes of free storage. The transcription of the actas takes place on a series of desktops connected to the application software layer over a network. We know that there are at least 60 computers doing transcription today, and probably more. The technical diagram does not show web dissemination of the data as part of this process.
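To make that data flow concrete, here is a minimal sketch of what converting an incoming JSON acta payload into a database record might look like, assuming a Python receiver talking to MS SQL through pyodbc. The memo does not say what language, driver, or schema the TSE actually uses; every field name, the table, and the connection string below are hypothetical.

```python
import json
import pyodbc  # one common way to talk to MS SQL from Python

# Hypothetical acta payload as it might arrive from the INFOP warehouse.
# Field names are illustrative; the memo does not document the real schema.
payload = """
{
  "acta_id": "00001-01",
  "mesa": 1,
  "departamento": "Francisco Morazan",
  "image_base64": "...",
  "received_at": "2017-11-29T09:15:00Z"
}
"""

record = json.loads(payload)

# Connection string and table name are assumptions for illustration only.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=app-cluster;"
    "DATABASE=actas;Trusted_Connection=yes"
)
cursor = conn.cursor()
cursor.execute(
    "INSERT INTO actas (acta_id, mesa, departamento, image_data, received_at) "
    "VALUES (?, ?, ?, ?, ?)",
    record["acta_id"],
    record["mesa"],
    record["departamento"],
    record["image_base64"],
    record["received_at"],
)
conn.commit()
```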
The appendix says that Dell EMC sized the original database at 600 GBytes based on the recommendations of the vendor of the transcription software.
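To give a sense of how a figure like 600 GBytes might be arrived at, here is a back-of-the-envelope sizing calculation. Every number in it is a placeholder chosen for illustration; the memo does not say how Dell EMC or the transcription vendor did their sizing.

```python
# Back-of-the-envelope database sizing. Every number here is a placeholder,
# not a figure from the TSE memo.
num_actas = 18_000            # hypothetical count of actas to store
image_size_mb = 25            # hypothetical size of one scanned acta image, MB
transcription_rows = 200      # hypothetical transcribed rows per acta
row_size_kb = 2               # hypothetical size of one transcription row, KB
overhead_factor = 1.3         # indexes, logs, and slack

image_gb = num_actas * image_size_mb / 1024
rows_gb = num_actas * transcription_rows * row_size_kb / (1024 * 1024)
total_gb = (image_gb + rows_gb) * overhead_factor
print(f"images: {image_gb:.0f} GB, rows: {rows_gb:.1f} GB, total: {total_gb:.0f} GB")
# With these placeholder values the estimate lands near 580 GB, the same
# order of magnitude as the 600 GB the memo reports.
```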
The technical appendix gives a confused and almost certainly incorrect set of times for events, and it contradicts the main report about what happened with the database. We will use the times reported in the appendix here, but compare them with the times reported in our previous blog post, which came from the body of the report.
Both the appendix and the body of the report agree that the servers ran out of space and went down at 9:47 am on November 29. They shut down the transcription system and enlarged the database to 1.8 TBytes. They repeat that it took 3 hours 20 minutes to enlarge the database and bring the transcription system back up, coming back online about 1:10 pm the same day. When they restarted transcription, the MS SQL database was unstable, needing to be restarted every 10 minutes. No explanation is given of why it kept crashing or why they thought the next steps would resolve it.
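For readers who have not done this, "enlarging the database" on MS SQL normally means growing its data file(s) to a larger size. Below is a sketch of the kind of command involved, run through Python and pyodbc for consistency with the earlier example; the database name, logical file name, sizes, and connection string are assumptions, not values from the memo.

```python
import pyodbc

# A minimal sketch of what "enlarging the database" usually means on MS SQL:
# growing the data file(s) to a larger size. Names and sizes are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=db-cluster;"
    "DATABASE=master;Trusted_Connection=yes",
    autocommit=True,
)
conn.execute(
    """
    ALTER DATABASE actas
    MODIFY FILE (NAME = actas_data, SIZE = 1800GB, FILEGROWTH = 50GB);
    """
)
```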
At 5:30 pm they took down Node 1 of the database cluster, leaving Node 2 to handle all database access. This supposedly restored some stability to the transcription service. When they took Node 1 off the cluster, they decided to install a new database instance on it, in case Node 2 started crashing. This involved reinstalling MS SQL on Node 1, which they renamed SQL4 and gave a 6 TByte database. They also configured another server, SQL5, as a mirror of SQL4, with a 2 TByte database. They took a snapshot of the database on Node 2 at 9:47 pm? and restored it onto SQL4 and SQL5, finishing about 1:10 am? (the appendix says PM, but that is not possible unless the restore ran into the next afternoon, long after they had restarted the transcription process). The transcription server was then reconfigured to use SQL4, and Node 2 and its database were permanently retired. SQL5 held a mirror of the database but otherwise was not part of the transcription process.
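The memo does not say how the snapshot was moved onto the new servers, but the usual MS SQL pattern is a full backup on the old node, a restore on the new instances, and database mirroring set up between them afterwards. Here is a rough sketch under those assumptions; every server name, path, and endpoint address is invented for illustration.

```python
import pyodbc

def run(server, sql):
    # Helper: run one T-SQL batch against a server. Connection details are placeholders.
    conn = pyodbc.connect(
        f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};"
        "DATABASE=master;Trusted_Connection=yes",
        autocommit=True,
    )
    conn.execute(sql)
    conn.close()

# 1. Full backup of the live database on Node 2 (names and paths are invented).
run("node2", r"BACKUP DATABASE actas TO DISK = N'\\san\backups\actas_full.bak' WITH COMPRESSION;")

# 2. Restore it onto the new 6 TB instance (SQL4) and the 2 TB mirror (SQL5).
#    The mirror must be restored WITH NORECOVERY so it can receive mirrored log records;
#    a real setup would also restore a current log backup on the mirror first.
run("sql4", r"RESTORE DATABASE actas FROM DISK = N'\\san\backups\actas_full.bak' WITH RECOVERY;")
run("sql5", r"RESTORE DATABASE actas FROM DISK = N'\\san\backups\actas_full.bak' WITH NORECOVERY;")

# 3. Point the copies at each other to establish database mirroring
#    (mirror side first, then the principal). Endpoints would need to exist already.
run("sql5", "ALTER DATABASE actas SET PARTNER = 'TCP://sql4.tse.local:5022';")
run("sql4", "ALTER DATABASE actas SET PARTNER = 'TCP://sql5.tse.local:5022';")
```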
Audisis reportedly then audited what the TSE had done. The TSE claims that what it did was completely transparent. This description of a reformat of the server and the installation of a new, larger database on the SAN matches what the Alianza reported when they stated that the TSE had formatted the server. They did.
Just as a point of normal procedure, it should have been Audisis that released a report on the changes to the system, not the TSE, and it should have been Audisis reporting in its own words on the integrity, or lack thereof, of the systems after the change. Instead the TSE chose to put words in Audisis's mouth. That they managed to nearly fill up a 12 TByte SAN with databases is remarkable.