- Jan 27, 2014
-
-
Christian Dietrich authored
Differences:
  - The task activation order is determined in the faulty experiment as well as in the golden run (which is now done by fail-generic-tracing) by observing a variable fail_virtual_port.
  - A panic value is read from fail_virtual_port.
  - The golden run task activation is determined by passing an extended trace to task_activation.py. The script collects all writes to fail_virtual_port and derives the activation order from them.
Change-Id: Id401b78933b45a4b2cf031fc0a8b5ac90151ec24
-
- Jan 24, 2014
-
-
Horst Schirmeier authored
This only compiled everywhere because all users included (i)ostream. Change-Id: I29b0fb13a01606fdffd8ebdb9701eff652065916
-
Horst Schirmeier authored
-
- Jan 23, 2014
-
-
Horst Schirmeier authored
The dependency on fail-comm exists not only at link time but also at compile time (the latter is due to protobuf header generation). Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30
-
- Jan 22, 2014
-
-
Horst Schirmeier authored
Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a
-
Horst Schirmeier authored
-
Michael Lenz authored
-
Michael Lenz authored
In some cases the write pilot is located at the upper boundary of the experiment and thus races with the experiment's end. If the experiment's end occurs first, the campaign ends and complains about missing data; otherwise everything is fine. This patch works around the problem by using "the first" writing pilot. If the only write is located at the experiment's end, the race will still occur, but according to hsc, carefully written experiment code can avoid it. Change-Id: I6a27a8c4770c04ea8dcaef8aa7bd85d18f43f0b5
-
- Jan 21, 2014
-
-
Richard Hellwig authored
The TrapListener works as it does in Bochs. For GEM5, the trap's offset is returned instead of a trap number. See: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211h/Babfeega.html Conflicts: simulators/gem5/src/cpu/simple/atomic.cc Change-Id: Ia8b2083e3c16315d9c577150f14f16995494b2e6
-
Richard Hellwig authored
-
Horst Schirmeier authored
Unfortunately this implicit dependency is currently not resolved anywhere else (e.g., FindBoost.cmake), although the 'net heavily discusses this issue. Change-Id: I8a7c8518394cdba27e591fed250623011d988067
-
Lars Rademacher authored
As the 32-bit libc6 atoi() caps values bigger than 2^31-1 (instead of just letting them overflow to the corresponding negative value, as on x86_64), it must not be used, especially not for converting 32-bit pointers. Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f
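A minimal sketch of a conversion that does not depend on the width of `long`. The commit message does not say which function replaced atoi(); this example assumes `std::strtoull()` with an explicit range check, and the helper name `parse_address` is hypothetical.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from the Fail* sources): parse a decimal string
// into a 32-bit address. Returns false if the text is empty, negative,
// contains trailing garbage, or does not fit into 32 bits.
bool parse_address(const char *text, std::uint32_t &out)
{
    if (text == nullptr || *text == '\0' || *text == '-') {
        return false;
    }
    errno = 0;
    char *end = nullptr;
    // unsigned long long is at least 64 bits wide on every platform, so
    // values above 2^31-1 are neither capped nor wrapped here.
    unsigned long long value = std::strtoull(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0') {
        return false;
    }
    if (value > UINT32_MAX) {
        return false;  // does not fit into a 32-bit pointer
    }
    out = static_cast<std::uint32_t>(value);
    return true;
}
```

Reporting failure through the return value keeps an out-of-range input from silently turning into a valid-looking address.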
-
Horst Schirmeier authored
This prevents integer overflows when using addresses > 2GiB, which are common for x86 operating systems with paging (Linux, Fiasco.OC) or some test cases on the PandaBoard. Note that this results in slightly different result table definitions when automatically translating an experiment's protobuf message in the DatabaseCampaign. This change affects all existing protobuf messages to prevent copy/paste propagation of this issue. Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
-
- Jan 20, 2014
-
-
Horst Schirmeier authored
Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10
-
Horst Schirmeier authored
The new CLIENT_JOB_INITIAL configuration option lets the client request more than one job in the first request round. If a reasonable initial value is chosen, this removes the job ramp-up after each fail-client restart and slightly improves overall throughput. Change-Id: Idac2721264ec264c520d341fac64a8311a974708
-
Horst Schirmeier authored
This change makes the JobClient act properly on communication aborts. Change-Id: I0a76489f117e9721546215e3b627002605e25452
-
Horst Schirmeier authored
The JobClient currently waits a LONG time until it really shuts down after not having reached the server in sendResultsToServer() (which is unfortunately by far the most probable point in the code to detect this):
  - A different bug (fixed in the previous commit) provoked the situation that a (way) too large number of jobs was fetched before.
  - sendResult() (called after each experiment iteration) realized that CLIENT_JOB_REQUEST_SEC seconds were over, and tried to prematurely call home to send first results (without planning to get new jobs yet).
  - If the server was gone (done, or aborted), connect in sendResultsToServer() failed after several retries and timeouts.
  - All subsequent calls to sendResult() retried connecting to the server (again, with retries and timeouts), once for each remaining job.
  - When all jobs were done, getParam() tried to connect one last time, finally telling the experiment that nobody's home.
This resulted in client shutdown times of up to four hours (for the default CLIENT_JOB_LIMIT of 1000) after the campaign server terminated. This change solves the issue by not handing out new (cached) jobs after the connect failed once, making the experiment terminate quickly. Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
-
Horst Schirmeier authored
If we don't properly initialize the job timing statistics, the number of jobs to be requested in the second request to the server is based on the wrong timings. In our test case, CLIENT_JOB_LIMIT jobs were requested at once. Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
-
Horst Schirmeier authored
Delay insertion of to-be-sent jobs into m_runningJobs until they are really sent, as getMessage() won't work anymore (as in: segfault) if this job is concurrently re-sent (due to campaign end), its result is received, and deleted in the campaign. This becomes non-hypothetical with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC. Additionally, reinsert the remaining jobs into the input queue if communication fails, instead of inefficiently delaying redistribution until the campaign end. Change-Id: If85e3c8261deda86beb8d4d93343429223753f22
-
Horst Schirmeier authored
Bounding the outgoing queue is always a good idea: If the campaign has separate threads for outgoing and incoming jobs (true for the DatabaseCampaign), this keeps memory requirements reasonable. If the campaign works in a single thread, this is not disadvantageous either. Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86
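The idea of a bounded outgoing queue can be sketched as follows. This is an illustration using the C++11 standard library, not the actual Fail* SynchronizedQueue; the class name `BoundedQueue` and its method names are assumptions.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Illustrative bounded queue (hypothetical, not the Fail* implementation).
// A capacity of 0 would make enqueue() block forever, so callers are
// expected to pass a positive capacity.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : m_capacity(capacity) {}

    // Blocks the producer while the queue is full; this is what keeps
    // memory use bounded when producer and consumer run in separate threads.
    void enqueue(T item) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_full.wait(lock, [this] { return m_items.size() < m_capacity; });
        m_items.push_back(std::move(item));
        m_not_empty.notify_one();
    }

    // Non-blocking variant: returns false instead of waiting when full.
    bool try_enqueue(T item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.size() >= m_capacity) {
            return false;
        }
        m_items.push_back(std::move(item));
        m_not_empty.notify_one();
        return true;
    }

    // Blocks the consumer while the queue is empty.
    T dequeue() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_empty.wait(lock, [this] { return !m_items.empty(); });
        T item = std::move(m_items.front());
        m_items.pop_front();
        m_not_full.notify_one();
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.size();
    }

private:
    const std::size_t m_capacity;
    mutable std::mutex m_mutex;
    std::condition_variable m_not_full;
    std::condition_variable m_not_empty;
    std::deque<T> m_items;
};
```

In a single-threaded campaign, a blocking enqueue() on a full queue would never return, so such a caller would need the capacity to cover everything it enqueues before it starts dequeuing, or use something like try_enqueue().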
-
Horst Schirmeier authored
To allow the JobServer to shut down properly, the accept() loop in JobServer::run() needs to regularly check whether we're done. This change introduces a timed, non-blocking variant of accept() into SocketComm to achieve this. Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
-
Horst Schirmeier authored
To avoid accessing destroyed resources in CommThreads talking to clients, we need to properly join them on shutdown. The m_CommMutex becomes a JobServer member to make sure it isn't destroyed before the JobServer itself. Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0
-
Horst Schirmeier authored
This change cleans up in/out queue synchronization in the job server. End-of-jobs conditions are now properly signaled through the SynchronizedQueue, which allows blocked readers to be resumed and aborted when no more input is expected. Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906
-
Horst Schirmeier authored
The Fail* tools expect trace events to be ordered in a specific way: memory-access events are supposed to come *after* the instruction event for the instruction that caused them. Using a different order may cause subtle problems with both fault-space pruning and fast forwarding. This change introduces a warning message when such a malformed trace is detected (i.e., when the instruction pointer of a memory-access event does not match the preceding instruction event). Change-Id: I8ae7420fd8ff26e2574590748bdcc5a63db76490
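The ordering check described here can be sketched as below. The `TraceEvent` struct and the function name are simplified assumptions, not the Fail* trace format, and "preceding instruction event" is interpreted as the most recent instruction event, so that several memory accesses may follow one instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical trace event: either an instruction event or a
// memory-access event, each carrying an instruction pointer.
struct TraceEvent {
    std::uint64_t ip;
    bool is_memory_access;
};

// Returns the indices of all memory-access events whose instruction pointer
// does not match the most recent instruction event (or that appear before
// any instruction event at all).
std::vector<std::size_t> find_misordered(const std::vector<TraceEvent> &trace)
{
    std::vector<std::size_t> bad;
    bool have_instruction = false;
    std::uint64_t last_ip = 0;

    for (std::size_t i = 0; i < trace.size(); ++i) {
        const TraceEvent &ev = trace[i];
        if (!ev.is_memory_access) {
            have_instruction = true;
            last_ip = ev.ip;
            continue;
        }
        if (!have_instruction || ev.ip != last_ip) {
            bad.push_back(i);  // memory access precedes its instruction event
        }
    }
    return bad;
}
```

A tool would emit one warning per returned index rather than abort, matching the warning-only behavior the commit message describes.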
-
Horst Schirmeier authored
-
Horst Schirmeier authored
According to <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>, (potentially) threaded clients should use the reentrant libmysqlclient_r. This is just a precaution; I haven't seen any issues with the normal libmysqlclient. Change-Id: Icb29df6dd54eb666e3b43b73fbda406acccd11cb
-
Horst Schirmeier authored
Change-Id: Ib68e54ba82e988db0d2d74ffafa6dc9bd54cd272
-
Horst Schirmeier authored
According to <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>, a MySQL connection handle must not be used concurrently by two threads, as happens here with an open result set and mysql_use_result() in one thread (DatabaseCampaign::run()) and mysql_query() in another (DatabaseCampaign::collect_result_thread()). This indeed leads to crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE), and maybe even more insidious effects in other cases. The solution is to create separate connections for both threads. Additionally, call mysql_library_init() before spawning any threads. Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb
-
- Jan 15, 2014
-
-
Michael Lenz authored
-
Michael Lenz authored
Up until now the JobServer was silently losing jobs and only claiming to be finished. A workaround was to restart the campaign until all jobs were finished according to the database and the campaign's output. This change fixes the underlying problem, so a single campaign run suffices and no longer loses any jobs. Debugging this was awful and took us quite some time... Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c
-
Michael Lenz authored
This removes the ability to directly parse protobufs from the socket, because google::protobuf::Message::ParseFromFileDescriptor() needs an EOF after each message, which prevents us from sending multiple Message objects over a single socket. Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656
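A usual way to send several messages over one stream without relying on EOF is length-prefixed framing. The sketch below shows that technique on plain byte strings; it is not necessarily what this commit implements, and the function names `frame` and `unframe` are hypothetical. With protobuf, the payload would typically be produced by SerializeToString() and consumed by ParseFromString().

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>
#include <cstring>
#include <string>

// Prefix the payload with its length as a 4-byte big-endian integer.
std::string frame(const std::string &payload)
{
    std::uint32_t len = htonl(static_cast<std::uint32_t>(payload.size()));
    std::string out(reinterpret_cast<const char *>(&len), sizeof len);
    out += payload;
    return out;
}

// Try to extract one complete message from the front of the receive buffer.
// On success, the message is removed from the buffer, stored in payload,
// and true is returned. If the buffer does not yet hold a complete message,
// it is left untouched and false is returned.
bool unframe(std::string &buffer, std::string &payload)
{
    std::uint32_t len = 0;
    if (buffer.size() < sizeof len) {
        return false;  // length prefix not complete yet
    }
    std::memcpy(&len, buffer.data(), sizeof len);
    len = ntohl(len);
    if (buffer.size() - sizeof len < len) {
        return false;  // message body not complete yet
    }
    payload.assign(buffer, sizeof len, len);
    buffer.erase(0, sizeof len + len);
    return true;
}
```

The receiver appends whatever recv() returns to the buffer and calls unframe() in a loop, so message boundaries no longer depend on the connection being closed.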
-
Christoph Borchert authored
Change-Id: Id0bb9400b8aa28307ed385a8c32b91b17254ba1c
-
- Jan 14, 2014
-
-
Richard Hellwig authored
-
Richard Hellwig authored
GEM5 throws a reset trap during initialization, before the startup function is called. This leads to problems because the startup function fills the m_CPUs list, which the TrapListener needs. Therefore, we only react to traps after initialization. This is needed in the following commit (see gem5/src/arch/arm/faults.cc). Change-Id: I9ec6fd453705feb54b4f8a87d024181323a2d7ef
-
Richard Hellwig authored
-
Richard Hellwig authored
Change-Id: I01fdb5e4bdd61fc761e93ef77904c830131c9ed6
-
- Jan 06, 2014
-
-
Richard Hellwig authored
Change-Id: I6ea9811c132ef7c235d5a03486ca08afc842b51f
-
- Jan 03, 2014
-
-
Richard Hellwig authored
Parameters that are specified on the command line are now also forwarded. Change-Id: I0e636f14dba43ef7877ce6e6deca1abb1f00a8a6
-
- Dec 11, 2013
-
-
Michael Lenz authored
  - "Removed" unnecessary memory-mapping ("Step 0").
  - Cleaned out ExperimentData: it now consists only of fsppilot and resultset.
  - resultset now contains bitoffset, which is part of the result table's primary key.
  - Adapted code to work with msg.fsppilot() instead of ExperimentData values.
Change-Id: I3b310e7a71d4b28479028250cd5722b3b2ce9f8c
-
- Dec 06, 2013
-
-
Martin Hoffmann authored
-